MKL 2017 Developer Reference C


Intel® Math Kernel Library

Developer Reference

Revision: 011

MKL 2017


Contents
Legal Information.............................................................................. 29
Introducing the Intel® Math Kernel Library.........................................35
Getting Help and Support................................................................... 37
What's New........................................................................................ 39
Notational Conventions...................................................................... 41

Chapter 1: Function Domains


Performance Enhancements...................................................................... 46
Parallelism.............................................................................................. 46
C Datatypes Specific to Intel MKL...............................................................47

Chapter 2: BLAS and Sparse BLAS Routines


BLAS Routines......................................................................................... 49
Naming Conventions for BLAS Routines.............................................. 49
C Interface Conventions for BLAS Routines..........................................51
Matrix Storage Schemes for BLAS Routines......................................... 52
BLAS Level 1 Routines and Functions..................................................52
cblas_?asum...........................................................................53
cblas_?axpy............................................................................54
cblas_?copy............................................................................ 55
cblas_?dot..............................................................................55
cblas_?sdot............................................................................ 56
cblas_?dotc............................................................................ 57
cblas_?dotu............................................................................ 58
cblas_?nrm2........................................................................... 58
cblas_?rot.............................................................................. 59
cblas_?rotg.............................................................................60
cblas_?rotm............................................................................61
cblas_?rotmg.......................................................................... 62
cblas_?scal............................................................................. 63
cblas_?swap........................................................................... 64
cblas_i?amax.......................................................................... 64
cblas_i?amin...........................................................................65
cblas_?cabs1.......................................................................... 66
BLAS Level 2 Routines...................................................................... 66
cblas_?gbmv...........................................................................67
cblas_?gemv...........................................................................70
cblas_?ger..............................................................................71
cblas_?gerc............................................................................ 73
cblas_?geru............................................................................ 74
cblas_?hbmv...........................................................................75
cblas_?hemv...........................................................................78
cblas_?her..............................................................................80
cblas_?her2............................................................................ 81
cblas_?hpmv...........................................................................82

cblas_?hpr..............................................................................84
cblas_?hpr2............................................................................ 85
cblas_?sbmv........................................................................... 87
cblas_?spmv........................................................................... 90
cblas_?spr.............................................................................. 91
cblas_?spr2............................................................................ 93
cblas_?symv........................................................................... 94
cblas_?syr.............................................................................. 95
cblas_?syr2............................................................................ 97
cblas_?tbmv........................................................................... 98
cblas_?tbsv...........................................................................101
cblas_?tpmv......................................................................... 104
cblas_?tpsv...........................................................................105
cblas_?trmv.......................................................................... 107
cblas_?trsv........................................................................... 108
BLAS Level 3 Routines.................................................................... 110
cblas_?gemm........................................................................ 111
cblas_?hemm........................................................................114
cblas_?herk.......................................................................... 116
cblas_?her2k.........................................................................118
cblas_?symm........................................................................ 121
cblas_?syrk...........................................................................123
cblas_?syr2k......................................................................... 125
cblas_?trmm......................................................................... 128
cblas_?trsm.......................................................................... 130
Sparse BLAS Level 1 Routines.................................................................. 132
Vector Arguments.......................................................................... 132
Naming Conventions for Sparse BLAS Routines.................................. 132
Routines and Data Types.................................................................132
BLAS Level 1 Routines That Can Work With Sparse Vectors.................. 133
cblas_?axpyi................................................................................. 133
cblas_?doti....................................................................................134
cblas_?dotci.................................................................................. 135
cblas_?dotui..................................................................................136
cblas_?gthr................................................................................... 137
cblas_?gthrz..................................................................................137
cblas_?roti.................................................................................... 138
cblas_?sctr....................................................................................139
Sparse BLAS Level 2 and Level 3 Routines................................................. 140
Naming Conventions in Sparse BLAS Level 2 and Level 3.....................140
Sparse Matrix Storage Formats for Sparse BLAS Routines.................... 141
Routines and Supported Operations..................................................141
Interface Consideration...................................................................142
Sparse BLAS Level 2 and Level 3 Routines.........................................147
mkl_?csrgemv.......................................................................150
mkl_?bsrgemv.......................................................................151
mkl_?coogemv...................................................................... 153
mkl_?diagemv.......................................................................154
mkl_?csrsymv....................................................................... 156
mkl_?bsrsymv.......................................................................157

mkl_?coosymv...................................................................... 158
mkl_?diasymv....................................................................... 160
mkl_?csrtrsv......................................................................... 161
mkl_?bsrtrsv.........................................................................163
mkl_?cootrsv........................................................................ 165
mkl_?diatrsv......................................................................... 166
mkl_cspblas_?csrgemv........................................................... 168
mkl_cspblas_?bsrgemv...........................................................169
mkl_cspblas_?coogemv.......................................................... 171
mkl_cspblas_?csrsymv........................................................... 172
mkl_cspblas_?bsrsymv........................................................... 173
mkl_cspblas_?coosymv.......................................................... 175
mkl_cspblas_?csrtrsv............................................................. 176
mkl_cspblas_?bsrtrsv............................................................. 178
mkl_cspblas_?cootrsv............................................................ 180
mkl_?csrmv.......................................................................... 182
mkl_?bsrmv.......................................................................... 183
mkl_?cscmv.......................................................................... 185
mkl_?coomv......................................................................... 187
mkl_?csrsv........................................................................... 189
mkl_?bsrsv........................................................................... 191
mkl_?cscsv........................................................................... 193
mkl_?coosv...........................................................................195
mkl_?csrmm......................................................................... 197
mkl_?bsrmm.........................................................................199
mkl_?cscmm.........................................................................202
mkl_?coomm........................................................................ 204
mkl_?csrsm.......................................................................... 206
mkl_?cscsm.......................................................................... 209
mkl_?coosm..........................................................................211
mkl_?bsrsm.......................................................................... 213
mkl_?diamv.......................................................................... 215
mkl_?skymv......................................................................... 217
mkl_?diasv........................................................................... 219
mkl_?skysv...........................................................................220
mkl_?diamm......................................................................... 222
mkl_?skymm........................................................................ 224
mkl_?diasm.......................................................................... 226
mkl_?skysm..........................................................................228
mkl_?dnscsr..........................................................................230
mkl_?csrcoo..........................................................................232
mkl_?csrbsr.......................................................................... 234
mkl_?csrcsc.......................................................................... 236
mkl_?csrdia.......................................................................... 238
mkl_?csrsky..........................................................................240
mkl_?csradd......................................................................... 242
mkl_?csrmultcsr.................................................................... 245
mkl_?csrmultd...................................................................... 248
Inspector-executor Sparse BLAS Routines..................................................250
Naming conventions in Inspector-executor Sparse BLAS Routines......... 250

Sparse Matrix Storage Formats for Inspector-executor Sparse BLAS Routines................................................................... 251
Supported Inspector-executor Sparse BLAS Operations....................... 252
Matrix Manipulation Routines........................................................... 252
mkl_sparse_?_create_csr........................................................253
mkl_sparse_?_create_csc....................................................... 255
mkl_sparse_?_create_coo.......................................................256
mkl_sparse_?_create_bsr....................................................... 258
mkl_sparse_copy...................................................................260
mkl_sparse_destroy............................................................... 261
mkl_sparse_convert_csr......................................................... 262
mkl_sparse_convert_bsr.........................................................263
mkl_sparse_?_export_csr....................................................... 264
mkl_sparse_?_export_bsr....................................................... 266
mkl_sparse_?_set_value.........................................................268
Inspector-executor Sparse BLAS Analysis Routines............................. 269
mkl_sparse_set_mv_hint........................................................ 269
mkl_sparse_set_sv_hint......................................................... 271
mkl_sparse_set_mm_hint....................................................... 273
mkl_sparse_set_sm_hint........................................................ 275
mkl_sparse_set_memory_hint.................................................278
mkl_sparse_optimize............................................................. 279
Inspector-executor Sparse BLAS Execution Routines........................... 280
mkl_sparse_?_mv..................................................................280
mkl_sparse_?_trsv.................................................................282
mkl_sparse_?_mm.................................................................284
mkl_sparse_?_trsm................................................................288
mkl_sparse_?_add................................................................. 291
mkl_sparse_spmm.................................................................292
mkl_sparse_?_spmmd............................................................293
BLAS-like Extensions.............................................................................. 294
cblas_?axpby................................................................................ 295
cblas_?gemmt............................................................................... 296
cblas_?gemm3m............................................................................299
cblas_?gemm_batch.......................................................................302
cblas_?gemm3m_batch.................................................................. 305
mkl_?imatcopy.............................................................................. 308
mkl_?omatcopy............................................................................. 310
mkl_?omatcopy2........................................................................... 312
mkl_?omatadd...............................................................................314
cblas_?gemm_alloc........................................................................ 316
cblas_?gemm_pack........................................................................ 318
cblas_?gemm_compute.................................................................. 321
cblas_?gemm_free......................................................................... 324

Chapter 3: LAPACK Routines


C Interface Conventions for LAPACK Routines.............................................327
Matrix Layout for LAPACK Routines........................................................... 329
Matrix Storage Schemes for LAPACK Routines............................................ 331
Mathematical Notation for LAPACK Routines............................................... 338

Error Analysis........................................................................................ 339


LAPACK Linear Equation Routines............................................................. 340
LAPACK Linear Equation Computational Routines................................ 340
Matrix Factorization: LAPACK Computational Routines.................342
Solving Systems of Linear Equations: LAPACK Computational Routines........................................................... 380
Estimating the Condition Number: LAPACK Computational Routines........................................................... 415
Refining the Solution and Estimating Its Error: LAPACK Computational Routines...................................................... 436
Matrix Inversion: LAPACK Computational Routines..................... 491
Matrix Equilibration: LAPACK Computational Routines................. 511
LAPACK Linear Equation Driver Routines........................................... 527
?gesv................................................................................... 528
?gesvx................................................................................. 530
?gesvxx................................................................................535
?gbsv...................................................................................542
?gbsvx................................................................................. 544
?gbsvxx................................................................................548
?gtsv................................................................................... 556
?gtsvx..................................................................................557
?dtsvb..................................................................................561
?posv...................................................................................562
?posvx................................................................................. 565
?posvxx................................................................................569
?ppsv...................................................................................575
?ppsvx................................................................................. 577
?pbsv...................................................................................580
?pbsvx................................................................................. 582
?ptsv................................................................................... 586
?ptsvx..................................................................................587
?sysv................................................................................... 590
?sysv_rook........................................................................... 591
?sysvx..................................................................................593
?sysvxx................................................................................ 596
?hesv...................................................................................603
?hesvx................................................................................. 605
?hesvxx................................................................................608
?spsv................................................................................... 614
?spsvx................................................................................. 616
?hpsv...................................................................................619
?hpsvx................................................................................. 620
LAPACK Least Squares and Eigenvalue Problem Routines............................. 623
LAPACK Least Squares and Eigenvalue Problem Computational Routines............. 624
Orthogonal Factorizations: LAPACK Computational Routines.........624
Singular Value Decomposition: LAPACK Computational Routines...680
Symmetric Eigenvalue Problems: LAPACK Computational Routines........................................................... 698
Generalized Symmetric-Definite Eigenvalue Problems: LAPACK Computational Routines...................................................... 737

Nonsymmetric Eigenvalue Problems: LAPACK Computational Routines........................................................... 748
Generalized Nonsymmetric Eigenvalue Problems: LAPACK Computational Routines...................................................... 782
Generalized Singular Value Decomposition: LAPACK Computational Routines...................................................... 811
Cosine-Sine Decomposition: LAPACK Computational Routines...... 827
LAPACK Least Squares and Eigenvalue Problem Driver Routines........... 834
Linear Least Squares (LLS) Problems: LAPACK Driver Routines.... 835
Generalized Linear Least Squares (LLS) Problems: LAPACK Driver Routines................................................................. 843
Symmetric Eigenvalue Problems: LAPACK Driver Routines........... 847
Nonsymmetric Eigenvalue Problems: LAPACK Driver Routines...... 893
Singular Value Decomposition: LAPACK Driver Routines.............. 906
Cosine-Sine Decomposition: LAPACK Driver Routines.................. 931
Generalized Symmetric Definite Eigenvalue Problems: LAPACK Driver Routines................................................................. 937
Generalized Nonsymmetric Eigenvalue Problems: LAPACK Driver Routines........................................................................... 976
LAPACK Auxiliary Routines..................................................................... 1000
?lacgv.........................................................................................1000
?syconv...................................................................................... 1000
?syr............................................................................................1002
i?max1....................................................................................... 1003
?sum1........................................................................................ 1004
?gelq2........................................................................................ 1004
?geqr2........................................................................................1006
?geqrt2.......................................................................................1007
?geqrt3.......................................................................................1008
?getf2.........................................................................................1010
?lacn2........................................................................................ 1011
?lacpy.........................................................................................1013
?lakf2......................................................................................... 1014
?lange........................................................................................ 1015
?lansy.........................................................................................1016
?lanhe........................................................................................ 1017
?lantr......................................................................................... 1018
?lapmr........................................................................................1019
?lapmt........................................................................................ 1020
?lapy2........................................................................................ 1022
?lapy3........................................................................................ 1022
?laran.........................................................................................1023
?larfb......................................................................................... 1023
?larfg......................................................................................... 1026
?larft.......................................................................................... 1027
?larfx..........................................................................................1030
?large.........................................................................................1031
?larnd.........................................................................................1032
?larnv.........................................................................................1033
?laror......................................................................................... 1034

?larot......................................................................................... 1036
?lartgp........................................................................................1039
?lartgs........................................................................................ 1040
?lascl..........................................................................................1041
?laset......................................................................................... 1042
?lasrt..........................................................................................1043
?laswp........................................................................................ 1044
?latm1........................................................................................ 1045
?latm2........................................................................................ 1047
?latm3........................................................................................ 1050
?latm5........................................................................................ 1053
?latm6........................................................................................ 1056
?latme........................................................................................ 1058
?latmr........................................................................................ 1062
?lauum....................................................................................... 1069
?syswapr.................................................................................... 1070
?heswapr.................................................................................... 1071
?sfrk...........................................................................................1072
?hfrk.......................................................................................... 1074
?tfsm..........................................................................................1076
?tfttp..........................................................................................1078
?tfttr.......................................................................................... 1079
?tpqrt2 ...................................................................................... 1080
?tprfb ........................................................................................ 1082
?tpttf..........................................................................................1085
?tpttr..........................................................................................1087
?trttf.......................................................................................... 1088
?trttp..........................................................................................1089
?lacp2........................................................................................ 1090
mkl_?tppack................................................................................1091
mkl_?tpunpack............................................................................ 1093
LAPACK Utility Functions and Routines.....................................................1096
ilaver..........................................................................................1096
?lamch....................................................................................... 1096
LAPACK Test Functions and Routines....................................................... 1097
?lagge........................................................................................ 1098
?laghe........................................................................................ 1099
?lagsy.........................................................................................1100
?latms........................................................................................ 1101

Chapter 4: ScaLAPACK Routines


Overview of ScaLAPACK Routines............................................................1107
ScaLAPACK Array Descriptors................................................................. 1108
Naming Conventions for ScaLAPACK Routines...........................................1110
ScaLAPACK Computational Routines........................................................ 1111
Systems of Linear Equations: ScaLAPACK Computational Routines...... 1111
Matrix Factorization: ScaLAPACK Computational Routines.................. 1112
p?getrf............................................................................... 1112
p?gbtrf............................................................................... 1114
p?dbtrf............................................................................... 1116

p?dttrf................................................................................1118
p?potrf............................................................................... 1121
p?pbtrf............................................................................... 1122
p?pttrf................................................................................1124
Solving Systems of Linear Equations: ScaLAPACK Computational Routines................................................................. 1127
p?getrs...............................................................................1127
p?gbtrs...............................................................................1128
p?dbtrs...............................................................................1131
p?dttrs............................................................................... 1133
p?potrs...............................................................................1135
p?pbtrs...............................................................................1137
p?pttrs............................................................................... 1139
p?trtrs................................................................................ 1141
Estimating the Condition Number: ScaLAPACK Computational Routines.............. 1143
p?gecon..............................................................................1143
p?pocon..............................................................................1146
p?trcon...............................................................................1148
Refining the Solution and Estimating Its Error: ScaLAPACK Computational Routines............................................................ 1151
p?gerfs............................................................................... 1151
p?porfs............................................................................... 1154
p?trrfs................................................................................ 1157
Matrix Inversion: ScaLAPACK Computational Routines....................... 1161
p?getri............................................................................... 1161
p?potri............................................................................... 1163
p?trtri.................................................................................1164
Matrix Equilibration: ScaLAPACK Computational Routines...................1166
p?geequ............................................................................. 1166
p?poequ............................................................................. 1168
Orthogonal Factorizations: ScaLAPACK Computational Routines.......... 1170
p?geqrf...............................................................................1170
p?geqpf.............................................................................. 1172
p?orgqr.............................................................................. 1175
p?ungqr..............................................................................1177
p?ormqr............................................................................. 1179
p?unmqr.............................................................................1181
p?gelqf............................................................................... 1184
p?orglq...............................................................................1186
p?unglq.............................................................................. 1188
p?ormlq..............................................................................1190
p?unmlq............................................................................. 1192
p?geqlf............................................................................... 1195
p?orgql...............................................................................1197
p?ungql.............................................................................. 1199
p?ormql..............................................................................1201
p?unmql............................................................................. 1203
p?gerqf...............................................................................1206
p?orgrq.............................................................................. 1208
p?ungrq..............................................................................1210

p?ormr3............................................................................. 1212
p?unmr3.............................................................................1215
p?ormrq............................................................................. 1218
p?unmrq.............................................................................1221
p?tzrzf................................................................................1224
p?ormrz..............................................................................1226
p?unmrz............................................................................. 1229
p?ggqrf...............................................................................1232
p?ggrqf...............................................................................1236
Symmetric Eigenvalue Problems: ScaLAPACK Computational Routines. 1239
p?syngst.............................................................................1240
p?syntrd............................................................................. 1242
p?sytrd...............................................................................1246
p?ormtr.............................................................................. 1249
p?hengst............................................................................ 1252
p?hentrd.............................................................................1255
p?hetrd.............................................................................. 1258
p?unmtr............................................................................. 1261
p?stebz.............................................................................. 1264
p?stedc.............................................................................. 1268
p?stein............................................................................... 1270
Nonsymmetric Eigenvalue Problems: ScaLAPACK Computational Routines................................................................. 1273
p?gehrd..............................................................................1274
p?ormhr............................................................................. 1277
p?unmhr.............................................................................1279
p?lahqr...............................................................................1282
p?trevc............................................................................... 1284
Singular Value Decomposition: ScaLAPACK Driver Routines................ 1287
p?gebrd.............................................................................. 1287
p?ormbr............................................................................. 1291
p?unmbr.............................................................................1295
Generalized Symmetric-Definite Eigenvalue Problems: ScaLAPACK Computational Routines............................................................ 1299
p?sygst...............................................................................1299
p?hegst.............................................................................. 1301
ScaLAPACK Driver Routines....................................................................1303
p?gesv........................................................................................1303
p?gesvx...................................................................................... 1305
p?gbsv........................................................................................1310
p?dbsv........................................................................................1312
p?dtsv........................................................................................ 1315
p?posv........................................................................................1317
p?posvx...................................................................................... 1319
p?pbsv........................................................................................1324
p?ptsv........................................................................................ 1326
p?gels.........................................................................................1328
p?syev........................................................................................1332
p?syevd...................................................................................... 1334
p?syevr.......................................................................................1337

p?syevx...................................................................................... 1341
p?heev....................................................................................... 1347
p?heevd......................................................................................1350
p?heevr...................................................................................... 1353
p?heevx......................................................................................1358
p?gesvd...................................................................................... 1364
p?sygvx...................................................................................... 1368
p?hegvx......................................................................................1375
ScaLAPACK Auxiliary Routines................................................................ 1383
p?lacgv....................................................................................... 1388
p?max1...................................................................................... 1389
pilaver........................................................................................ 1390
pmpcol....................................................................................... 1391
pmpim2...................................................................................... 1392
?combamax1............................................................................... 1393
p?sum1...................................................................................... 1394
p?dbtrsv..................................................................................... 1395
p?dttrsv...................................................................................... 1397
p?gebal.......................................................................................1400
p?gebd2......................................................................................1402
p?gehd2..................................................................................... 1405
p?gelq2...................................................................................... 1407
p?geql2...................................................................................... 1409
p?geqr2...................................................................................... 1411
p?gerq2...................................................................................... 1413
p?getf2....................................................................................... 1415
p?labrd....................................................................................... 1417
p?lacon.......................................................................................1420
p?laconsb....................................................................................1422
p?lacp2.......................................................................................1423
p?lacp3.......................................................................................1424
p?lacpy....................................................................................... 1426
p?laevswp................................................................................... 1427
p?lahrd....................................................................................... 1429
p?laiect.......................................................................................1431
p?lamve......................................................................................1432
p?lange.......................................................................................1434
p?lanhs.......................................................................................1436
p?lansy, p?lanhe.......................................................................... 1437
p?lantr........................................................................................1439
p?lapiv........................................................................................1441
p?lapv2.......................................................................................1443
p?laqge.......................................................................................1445
p?laqr0....................................................................................... 1447
p?laqr1....................................................................................... 1450
p?laqr2....................................................................................... 1453
p?laqr3....................................................................................... 1455
p?laqr5....................................................................................... 1458
p?laqsy....................................................................................... 1461
p?lared1d....................................................................................1462

p?lared2d....................................................................................1463
p?larf......................................................................................... 1464
p?larfb........................................................................................1467
p?larfc........................................................................................ 1470
p?larfg........................................................................................1473
p?larft........................................................................................ 1475
p?larz......................................................................................... 1477
p?larzb....................................................................................... 1480
p?larzc........................................................................................1483
p?larzt........................................................................................ 1486
p?lascl........................................................................................ 1488
p?lase2.......................................................................................1490
p?laset....................................................................................... 1491
p?lasmsub...................................................................................1493
p?lasrt........................................................................................ 1494
p?lassq....................................................................................... 1496
p?laswp...................................................................................... 1497
p?latra........................................................................................1499
p?latrd........................................................................................1500
p?latrs........................................................................................ 1503
p?latrz........................................................................................ 1505
p?lauu2...................................................................................... 1508
p?lauum..................................................................................... 1509
p?lawil........................................................................................ 1510
p?org2l/p?ung2l........................................................................... 1511
p?org2r/p?ung2r.......................................................................... 1513
p?orgl2/p?ungl2........................................................................... 1515
p?orgr2/p?ungr2.......................................................................... 1518
p?orm2l/p?unm2l......................................................................... 1520
p?orm2r/p?unm2r........................................................................ 1523
p?orml2/p?unml2......................................................................... 1526
p?ormr2/p?unmr2........................................................................ 1529
p?pbtrsv..................................................................................... 1533
p?pttrsv...................................................................................... 1536
p?potf2....................................................................................... 1539
p?rot.......................................................................................... 1541
p?rscl......................................................................................... 1543
p?sygs2/p?hegs2......................................................................... 1544
p?sytd2/p?hetd2.......................................................................... 1546
p?trord....................................................................................... 1549
p?trsen....................................................................................... 1553
p?trti2........................................................................................ 1558
?lahqr2....................................................................................... 1559
?lamsh....................................................................................... 1561
?lapst......................................................................................... 1562
?laqr6.........................................................................................1563
?lar1va....................................................................................... 1566
?laref..........................................................................................1568
?larrb2........................................................................................1570
?larrd2........................................................................................1572

?larre2........................................................................................1575
?larre2a...................................................................................... 1579
?larrf2........................................................................................ 1582
?larrv2........................................................................................1584
?lasorte...................................................................................... 1588
?lasrt2........................................................................................ 1590
?stegr2....................................................................................... 1591
?stegr2a..................................................................................... 1594
?stegr2b..................................................................................... 1597
?stein2....................................................................................... 1601
?dbtf2.........................................................................................1602
?dbtrf......................................................................................... 1604
?dttrf..........................................................................................1605
?dttrsv........................................................................................1606
?pttrsv........................................................................................1608
?steqr2....................................................................................... 1609
?trmvt........................................................................................ 1611
pilaenv....................................................................................... 1613
pilaenvx......................................................................................1614
pjlaenv....................................................................................... 1616
Additional ScaLAPACK Routines...................................................... 1617
ScaLAPACK Utility Functions and Routines................................................1619
p?labad.......................................................................................1620
p?lachkieee................................................................................. 1621
p?lamch......................................................................................1621
p?lasnbt......................................................................................1623
ScaLAPACK Redistribution/Copy Routines.................................................1623
p?gemr2d................................................................................... 1624
p?trmr2d.................................................................................... 1626

Chapter 5: Sparse Solver Routines


Intel MKL PARDISO - Parallel Direct Sparse Solver Interface....................... 1629
pardiso....................................................................................... 1634
pardisoinit...................................................................................1640
pardiso_64.................................................................................. 1641
pardiso_getenv, pardiso_setenv..................................................... 1642
mkl_pardiso_pivot........................................................................1643
pardiso_getdiag........................................................................... 1644
pardiso_handle_store................................................................... 1645
pardiso_handle_restore.................................................................1646
pardiso_handle_delete.................................................................. 1646
pardiso_handle_store_64.............................................................. 1647
pardiso_handle_restore_64........................................................... 1648
pardiso_handle_delete_64.............................................................1649
Intel MKL PARDISO Parameters in Tabular Form............................... 1649
pardiso iparm Parameter............................................................... 1653
PARDISO_DATA_TYPE................................................................... 1664
Parallel Direct Sparse Solver for Clusters Interface.................................... 1665
cluster_sparse_solver................................................................... 1666
cluster_sparse_solver_64.............................................................. 1671

cluster_sparse_solver iparm Parameter........................................... 1672


Direct Sparse Solver (DSS) Interface Routines..........................................1677
DSS Interface Description............................................................. 1678
DSS Implementation Details.......................................................... 1679
DSS Routines...............................................................................1679
dss_create.......................................................................... 1680
dss_define_structure............................................................ 1681
dss_reorder.........................................................................1682
dss_factor_real, dss_factor_complex...................................... 1684
dss_solve_real, dss_solve_complex........................................ 1685
dss_delete.......................................................................... 1687
dss_statistics.......................................................................1688
Iterative Sparse Solvers based on Reverse Communication Interface (RCI ISS)............................................................................... 1690
CG Interface Description............................................................... 1692
FGMRES Interface Description........................................................1696
RCI ISS Routines..........................................................................1702
dcg_init.............................................................................. 1702
dcg_check...........................................................................1703
dcg.................................................................................... 1704
dcg_get.............................................................................. 1706
dcgmrhs_init....................................................................... 1706
dcgmrhs_check....................................................................1707
dcgmrhs............................................................................. 1708
dcgmrhs_get....................................................................... 1710
dfgmres_init........................................................................1711
dfgmres_check.................................................................... 1712
dfgmres..............................................................................1713
dfgmres_get........................................................................1715
RCI ISS Implementation Details..................................................... 1716
Preconditioners based on Incomplete LU Factorization Technique................ 1716
ILU0 and ILUT Preconditioners Interface Description......................... 1717
dcsrilu0.......................................................................................1717
dcsrilut....................................................................................... 1720
Sparse Matrix Checker Routines............................................................. 1723
sparse_matrix_checker................................................................. 1724
sparse_matrix_checker_init........................................................... 1725

Chapter 6: Extended Eigensolver Routines


The FEAST Algorithm............................................................................ 1727
Extended Eigensolver Functionality......................................................... 1728
Parallelism in Extended Eigensolver Routines................................... 1729
Achieving Performance With Extended Eigensolver Routines............... 1729
Extended Eigensolver Interfaces............................................................. 1730
Extended Eigensolver Naming Conventions...................................... 1730
feastinit...................................................................................... 1731
Extended Eigensolver Input Parameters...........................................1732
Extended Eigensolver Output Details...............................................1733
Extended Eigensolver RCI Routines.................................................1734
Extended Eigensolver RCI Interface Description....................... 1734
?feast_srci/?feast_hrci.......................................................... 1737
Extended Eigensolver Predefined Interfaces..................................... 1739
Matrix Storage.....................................................................1739
?feast_syev/?feast_heev.......................................................1740
?feast_sygv/?feast_hegv.......................................................1741
?feast_sbev/?feast_hbev.......................................................1743
?feast_sbgv/?feast_hbgv...................................................... 1745
?feast_scsrev/?feast_hcsrev.................................................. 1747
?feast_scsrgv/?feast_hcsrgv..................................................1749

Chapter 7: Vector Mathematical Functions


VM Data Types, Accuracy Modes, and Performance Tips............................. 1753
VM Naming Conventions........................................................................ 1754
VM Function Interfaces................................................................. 1755
VM Mathematical Function Interfaces...................................... 1755
VM Pack Function Interfaces.................................................. 1755
VM Unpack Function Interfaces.............................................. 1755
VM Service Function Interfaces.............................................. 1755
VM Input Function Interfaces................................................. 1756
VM Output Function Interfaces...............................................1756
Vector Indexing Methods....................................................................... 1756
VM Error Diagnostics.............................................................................1757
VM Mathematical Functions....................................................................1758
Special Value Notations................................................................. 1759
Arithmetic Functions..................................................................... 1760
v?Add.................................................................................1760
v?Sub.................................................................................1761
v?Sqr................................................................................. 1763
v?Mul................................................................................. 1764
v?MulByConj....................................................................... 1765
v?Conj................................................................................1766
v?Abs................................................................................. 1767
v?Arg................................................................................. 1768
v?LinearFrac........................................................................1769
Power and Root Functions............................................................. 1772
v?Inv................................................................................. 1772
v?Div................................................................................. 1773
v?Sqrt................................................................................ 1774
v?InvSqrt............................................................................1776
v?Cbrt................................................................................ 1777
v?InvCbrt........................................................................... 1778
v?Pow2o3........................................................................... 1779
v?Pow3o2........................................................................... 1780
v?Pow................................................................................ 1781
v?Powx...............................................................................1783
v?Hypot..............................................................................1785
Exponential and Logarithmic Functions............................................1786
v?Exp.................................................................................1786
v?Expm1............................................................................ 1788
v?Ln...................................................................................1789
v?Log10............................................................................. 1791
v?Log1p..............................................................................1793
Trigonometric Functions................................................................ 1794
v?Cos................................................................................. 1794
v?Sin..................................................................................1796
v?SinCos............................................................................ 1797
v?CIS................................................................................. 1799
v?Tan................................................................................. 1800
v?Acos............................................................................... 1801
v?Asin................................................................................ 1803
v?Atan................................................................................1804
v?Atan2..............................................................................1806
Hyperbolic Functions.....................................................................1807
v?Cosh............................................................................... 1807
v?Sinh................................................................................1809
v?Tanh............................................................................... 1811
v?Acosh..............................................................................1813
v?Asinh.............................................................................. 1815
v?Atanh..............................................................................1816
Special Functions......................................................................... 1818
v?Erf.................................................................................. 1818
v?Erfc.................................................................................1820
v?CdfNorm..........................................................................1822
v?ErfInv..............................................................................1823
v?ErfcInv............................................................................ 1826
v?CdfNormInv..................................................................... 1827
v?LGamma..........................................................................1829
v?TGamma......................................................................... 1830
v?ExpInt1........................................................................... 1831
Rounding Functions...................................................................... 1832
v?Floor............................................................................... 1832
v?Ceil.................................................................................1833
v?Trunc.............................................................................. 1834
v?Round............................................................................. 1835
v?NearbyInt........................................................................ 1836
v?Rint................................................................................ 1837
v?Modf............................................................................... 1838
v?Frac................................................................................ 1839
VM Pack/Unpack Functions.................................................................... 1841
v?Pack........................................................................................1841
v?Unpack.................................................................................... 1842
VM Service Functions............................................................................ 1843
vmlSetMode................................................................................ 1844
vmlGetMode................................................................................ 1846
MKLFreeTls..................................................................................1846
vmlSetErrStatus...........................................................................1847
vmlGetErrStatus.......................................................................... 1848
vmlClearErrStatus........................................................................ 1848
vmlSetErrorCallBack..................................................................... 1849
vmlGetErrorCallBack.....................................................................1851
vmlClearErrorCallBack.................................................................. 1851

Chapter 8: Statistical Functions


Random Number Generators.................................................................. 1853
Random Number Generators Conventions........................................ 1854
Random Number Generators Mathematical Notation................. 1854
Random Number Generators Naming Conventions.................... 1855
Basic Generators.......................................................................... 1859
BRNG Parameter Definition....................................................1861
Random Streams................................................................. 1862
BRNG Data Types.................................................................1862
Error Reporting............................................................................ 1862
VS RNG Usage Model.................................................................... 1864
Service Routines.......................................................................... 1865
vslNewStream..................................................................... 1866
vslNewStreamEx..................................................................1867
vsliNewAbstractStream......................................................... 1869
vsldNewAbstractStream........................................................ 1870
vslsNewAbstractStream........................................................ 1871
vslDeleteStream.................................................................. 1873
vslCopyStream.................................................................... 1873
vslCopyStreamState............................................................. 1874
vslSaveStreamF...................................................................1875
vslLoadStreamF................................................................... 1876
vslSaveStreamM.................................................................. 1877
vslLoadStreamM.................................................................. 1878
vslGetStreamSize.................................................................1879
vslLeapfrogStream............................................................... 1880
vslSkipAheadStream............................................................ 1881
vslGetStreamStateBrng........................................................ 1883
vslGetNumRegBrngs.............................................................1884
Distribution Generators................................................................. 1884
Continuous Distributions....................................................... 1888
Discrete Distributions........................................................... 1912
Advanced Service Routines............................................................ 1929
Advanced Service Routine Data Types..................................... 1929
vslRegisterBrng................................................................... 1930
vslGetBrngProperties............................................................ 1931
Formats for User-Designed Generators....................................1932
Convolution and Correlation................................................................... 1934
Convolution and Correlation Naming Conventions............................. 1935
Convolution and Correlation Data Types.......................................... 1936
Convolution and Correlation Parameters.......................................... 1936
Convolution and Correlation Task Status and Error Reporting..............1938
Convolution and Correlation Task Constructors................................. 1939
vslConvNewTask/vslCorrNewTask........................................... 1940
vslConvNewTask1D/vslCorrNewTask1D................................... 1941
vslConvNewTaskX/vslCorrNewTaskX....................................... 1942
vslConvNewTaskX1D/vslCorrNewTaskX1D................................1944
Convolution and Correlation Task Editors......................................... 1946
vslConvSetMode/vslCorrSetMode........................................... 1947
vslConvSetInternalPrecision/vslCorrSetInternalPrecision............1948
vslConvSetStart/vslCorrSetStart............................................ 1949
vslConvSetDecimation/vslCorrSetDecimation........................... 1950
Task Execution Routines................................................................ 1951
vslConvExec/vslCorrExec...................................................... 1951
vslConvExec1D/vslCorrExec1D...............................................1953
vslConvExecX/vslCorrExecX...................................................1955
vslConvExecX1D/vslCorrExecX1D........................................... 1956
Convolution and Correlation Task Destructors...................................1958
vslConvDeleteTask/vslCorrDeleteTask......................................1958
Convolution and Correlation Task Copiers........................................ 1959
vslConvCopyTask/vslCorrCopyTask......................................... 1959
Convolution and Correlation Usage Examples................................... 1960
Convolution and Correlation Mathematical Notation and Definitions..... 1963
Convolution and Correlation Data Allocation..................................... 1964
Summary Statistics...............................................................................1966
Summary Statistics Naming Conventions.........................................1967
Summary Statistics Data Types...................................................... 1967
Summary Statistics Parameters......................................................1968
Summary Statistics Task Status and Error Reporting......................... 1968
Summary Statistics Task Constructors.............................................1972
vslSSNewTask..................................................................... 1972
Summary Statistics Task Editors.....................................................1973
vslSSEditTask...................................................................... 1975
vslSSEditMoments................................................................1983
vslSSEditSums.................................................................... 1984
vslSSEditCovCor.................................................................. 1985
vslSSEditCP.........................................................................1987
vslSSEditPartialCovCor..........................................................1989
vslSSEditQuantiles............................................................... 1990
vslSSEditStreamQuantiles..................................................... 1991
vslSSEditPooledCovariance.................................................... 1992
vslSSEditRobustCovariance....................................................1993
vslSSEditOutliersDetection.................................................... 1995
vslSSEditMissingValues......................................................... 1996
vslSSEditCorParameterization................................................ 2000
Summary Statistics Task Computation Routines................................2000
vslSSCompute..................................................................... 2004
Summary Statistics Task Destructor................................................2005
vslSSDeleteTask...................................................................2005
Summary Statistics Usage Examples...............................................2005
Summary Statistics Mathematical Notation and Definitions.................2007

Chapter 9: Fourier Transform Functions


FFT Functions...................................................................................... 2014
FFT Interface............................................................................... 2015
Computing an FFT........................................................................ 2015
Configuration Settings.................................................................. 2015
DFTI_PRECISION................................................................. 2018
DFTI_FORWARD_DOMAIN..................................................... 2018
DFTI_DIMENSION, DFTI_LENGTHS.........................................2019
DFTI_PLACEMENT................................................................ 2019
DFTI_FORWARD_SCALE, DFTI_BACKWARD_SCALE...................2019
DFTI_NUMBER_OF_USER_THREADS....................................... 2020
DFTI_THREAD_LIMIT............................................................2020
DFTI_INPUT_STRIDES, DFTI_OUTPUT_STRIDES...................... 2021
DFTI_NUMBER_OF_TRANSFORMS.......................................... 2022
DFTI_INPUT_DISTANCE, DFTI_OUTPUT_DISTANCE.................. 2023
DFTI_COMPLEX_STORAGE, DFTI_REAL_STORAGE, DFTI_CONJUGATE_EVEN_STORAGE............ 2023
DFTI_PACKED_FORMAT........................................................ 2026
DFTI_WORKSPACE............................................................... 2034
DFTI_COMMIT_STATUS.........................................................2035
DFTI_ORDERING................................................................. 2035
FFT Descriptor Manipulation Functions........................................... 2036
DftiCreateDescriptor.............................................................2036
DftiCommitDescriptor........................................................... 2037
DftiFreeDescriptor................................................................ 2038
DftiCopyDescriptor............................................................... 2039
FFT Descriptor Configuration Functions............................................2040
DftiSetValue........................................................................ 2040
DftiGetValue........................................................................2041
FFT Computation Functions............................................................2042
DftiComputeForward.............................................................2042
DftiComputeBackward.......................................................... 2044
Configuring and Computing an FFT in C/C++........................... 2046
Status Checking Functions.............................................................2048
DftiErrorClass...................................................................... 2048
DftiErrorMessage................................................................. 2050
Cluster FFT Functions............................................................................2050
Computing Cluster FFT..................................................................2051
Distributing Data among Processes................................................. 2052
Cluster FFT Interface.................................................................... 2054
Cluster FFT Descriptor Manipulation Functions.................................. 2054
DftiCreateDescriptorDM........................................................ 2054
DftiCommitDescriptorDM.......................................................2055
DftiFreeDescriptorDM........................................................... 2056
Cluster FFT Computation Functions................................................. 2057
DftiComputeForwardDM........................................................ 2057
DftiComputeBackwardDM...................................................... 2058
Cluster FFT Descriptor Configuration Functions................................. 2059
DftiSetValueDM....................................................................2060
DftiGetValueDM................................................................... 2061
Error Codes................................................................................. 2063

Chapter 10: PBLAS Routines


PBLAS Routines Overview...................................................................... 2065
PBLAS Routine Naming Conventions........................................................2066
PBLAS Level 1 Routines......................................................................... 2068
p?amax...................................................................................... 2068
p?asum.......................................................................................2069
p?axpy....................................................................................... 2070
p?copy........................................................................................2072
p?dot..........................................................................................2073
p?dotc........................................................................................ 2074
p?dotu........................................................................................ 2075
p?nrm2.......................................................................................2077
p?scal.........................................................................................2078
p?swap....................................................................................... 2079
PBLAS Level 2 Routines......................................................................... 2080
p?gemv...................................................................................... 2081
p?agemv.....................................................................................2083
p?ger..........................................................................................2086
p?gerc........................................................................................ 2087
p?geru........................................................................................2089
p?hemv...................................................................................... 2091
p?ahemv.....................................................................................2092
p?her......................................................................................... 2094
p?her2........................................................................................2096
p?symv.......................................................................................2098
p?asymv..................................................................................... 2100
p?syr..........................................................................................2102
p?syr2........................................................................................ 2103
p?trmv........................................................................................2105
p?atrmv...................................................................................... 2107
p?trsv.........................................................................................2110
PBLAS Level 3 Routines......................................................................... 2112
p?geadd......................................................................................2112
p?tradd.......................................................................................2114
p?gemm..................................................................................... 2116
p?hemm..................................................................................... 2118
p?herk........................................................................................ 2120
p?her2k...................................................................................... 2122
p?symm......................................................................................2124
p?syrk........................................................................................ 2126
p?syr2k...................................................................................... 2128
p?tran........................................................................................ 2131
p?tranu.......................................................................................2132
p?tranc....................................................................................... 2133
p?trmm...................................................................................... 2135
p?trsm........................................................................................2137

Chapter 11: Partial Differential Equations Support


Trigonometric Transform Routines........................................................... 2141
Trigonometric Transforms Implemented...........................................2141
Sequence of Invoking TT Routines.................................................. 2143
Trigonometric Transform Interface Description..................................2144
TT Routines................................................................................. 2144
?_init_trig_transform............................................................2145
?_commit_trig_transform......................................................2146
?_forward_trig_transform..................................................... 2148
?_backward_trig_transform................................................... 2150
free_trig_transform.............................................................. 2151
Common Parameters of the Trigonometric Transforms....................... 2152
Trigonometric Transform Implementation Details.............................. 2155
Fast Poisson Solver Routines ................................................................. 2156
Poisson Solver Implementation...................................................... 2156
Sequence of Invoking Poisson Solver Routines................................. 2162
Fast Poisson Solver Interface Description.........................................2164
Routines for the Cartesian Solver................................................... 2164
?_init_Helmholtz_2D/?_init_Helmholtz_3D.............................. 2164
?_commit_Helmholtz_2D/?_commit_Helmholtz_3D.................... 2167
?_Helmholtz_2D/?_Helmholtz_3D.......................................... 2170
free_Helmholtz_2D/free_Helmholtz_3D...................................2173
Routines for the Spherical Solver....................................................2174
?_init_sph_p/?_init_sph_np...................................................2174
?_commit_sph_p/?_commit_sph_np.......................................2176
?_sph_p/?_sph_np............................................................... 2178
free_sph_p/free_sph_np....................................................... 2179
Common Parameters for the Poisson Solver..................................... 2180
ipar....................................................................................2180
dpar and spar......................................................................2184
Caveat on Parameter Modifications......................................... 2187
Parameters That Define Boundary Conditions...........................2187
Poisson Solver Implementation Details............................................ 2190

Chapter 12: Nonlinear Optimization Problem Solvers


Nonlinear Solver Organization and Implementation................................... 2191
Nonlinear Solver Routine Naming Conventions..........................................2192
Nonlinear Least Squares Problem without Constraints................................2193
?trnlsp_init..................................................................................2193
?trnlsp_check.............................................................................. 2195
?trnlsp_solve............................................................................... 2196
?trnlsp_get..................................................................................2198
?trnlsp_delete..............................................................................2199
Nonlinear Least Squares Problem with Linear (Bound) Constraints...............2200
?trnlspbc_init...............................................................................2200
?trnlspbc_check........................................................................... 2202
?trnlspbc_solve............................................................................ 2204
?trnlspbc_get...............................................................................2205
?trnlspbc_delete.......................................................................... 2207
Jacobian Matrix Calculation Routines....................................................... 2207
?jacobi_init..................................................................................2208
?jacobi_solve...............................................................................2208
?jacobi_delete............................................................................. 2209
?jacobi........................................................................................2210
?jacobix...................................................................................... 2211

Chapter 13: Support Functions


Version Information.............................................................................. 2218
mkl_get_version.......................................................................... 2219
mkl_get_version_string.................................................................2220
Threading Control.................................................................................2221
mkl_set_num_threads.................................................................. 2222
mkl_domain_set_num_threads...................................................... 2223
mkl_set_num_threads_local.......................................................... 2224
mkl_set_dynamic......................................................................... 2225
mkl_get_max_threads.................................................................. 2226
mkl_domain_get_max_threads...................................................... 2227
mkl_get_dynamic.........................................................................2228
mkl_set_num_stripes................................................................... 2229
mkl_get_num_stripes................................................................... 2230
Error Handling..................................................................................... 2230
Error Handling for Linear Algebra Routines.......................................2230
xerbla................................................................................ 2230
pxerbla...............................................................................2232
LAPACKE_xerbla.................................................................. 2233
Handling Fatal Errors.................................................................... 2233
mkl_set_exit_handler........................................................... 2234
Character Equality Testing..................................................................... 2234
lsame......................................................................................... 2234
lsamen....................................................................................... 2235
Timing................................................................................................ 2236
second/dsecnd.............................................................................2236
mkl_get_cpu_clocks..................................................................... 2236
mkl_get_cpu_frequency................................................................ 2237
mkl_get_max_cpu_frequency........................................................ 2238
mkl_get_clocks_frequency.............................................................2238
Memory Management............................................................................2239
mkl_free_buffers..........................................................................2239
mkl_thread_free_buffers............................................................... 2240
mkl_disable_fast_mm...................................................................2240
mkl_mem_stat............................................................................ 2241
mkl_peak_mem_usage................................................................. 2241
mkl_malloc..................................................................................2242
mkl_calloc...................................................................................2243
mkl_realloc................................................................................. 2244
mkl_free..................................................................................... 2245
mkl_set_memory_limit................................................................. 2245
Usage Example for the Memory Functions........................................2246
Single Dynamic Library Control...............................................................2247
mkl_set_interface_layer................................................................ 2247
mkl_set_threading_layer...............................................................2248
mkl_set_xerbla............................................................................ 2249
mkl_set_progress.........................................................................2250
mkl_set_pardiso_pivot.................................................................. 2250
Intel Many Integrated Core Architecture Support...................................... 2251
mkl_mic_enable...........................................................................2252
mkl_mic_disable.......................................................................... 2252
mkl_mic_get_device_count........................................................... 2253
mkl_mic_set_workdivision.............................................................2253
mkl_mic_get_workdivision............................................................ 2255
mkl_mic_set_max_memory...........................................................2256
mkl_mic_free_memory................................................................. 2257
mkl_mic_register_memory............................................................ 2259
mkl_mic_set_device_num_threads................................................. 2259
mkl_mic_set_resource_limit.......................................................... 2261
mkl_mic_get_resource_limit.......................................................... 2263
mkl_mic_set_offload_report.......................................................... 2264
mkl_mic_set_flags....................................................................... 2265
mkl_mic_get_flags....................................................................... 2266
mkl_mic_get_status..................................................................... 2266
mkl_mic_clear_status................................................................... 2268
mkl_mic_get_meminfo..................................................................2269
mkl_mic_get_cpuinfo....................................................................2270
Conditional Numerical Reproducibility Control........................................... 2271
mkl_cbwr_set.............................................................................. 2272
mkl_cbwr_get..............................................................................2273
mkl_cbwr_get_auto_branch...........................................................2274
Named Constants for CNR Control.................................................. 2275
Reproducibility Conditions............................................................. 2276
Usage Examples for CNR Support Functions..................................... 2276
Miscellaneous.......................................................................................2277
mkl_progress...............................................................................2277
mkl_enable_instructions ...............................................................2279
mkl_set_env_mode...................................................................... 2280
mkl_verbose................................................................................2281
mkl_set_mpi .............................................................................. 2282
mkl_finalize.................................................................................2283

Chapter 14: BLACS Routines


Matrix Shapes...................................................................................... 2285
Repeatability and Coherence.................................................................. 2286
BLACS Combine Operations................................................................... 2289
?gamx2d.....................................................................................2290
?gamn2d.....................................................................................2291
?gsum2d.....................................................................................2293
BLACS Point To Point Communication...................................................... 2294
?gesd2d......................................................................................2296
?trsd2d....................................................................................... 2296
?gerv2d...................................................................................... 2297
?trrv2d....................................................................................... 2298
BLACS Broadcast Routines..................................................................... 2298
?gebs2d......................................................................................2300
?trbs2d....................................................................................... 2300
?gebr2d...................................................................................... 2301
?trbr2d....................................................................................... 2302
BLACS Support Routines........................................................................2303
Initialization Routines................................................................... 2303
blacs_pinfo......................................................................... 2303
blacs_setup.........................................................................2304
blacs_get............................................................................ 2304
blacs_set............................................................................ 2305
blacs_gridinit.......................................................................2307
blacs_gridmap..................................................................... 2308
Destruction Routines.....................................................................2309
blacs_freebuff..................................................................... 2310
blacs_gridexit......................................................................2310
blacs_abort......................................................................... 2310
blacs_exit........................................................................... 2311
Informational Routines..................................................................2311
blacs_gridinfo......................................................................2312
blacs_pnum........................................................................ 2312
blacs_pcoord....................................................................... 2312
Miscellaneous Routines................................................................. 2313
blacs_barrier....................................................................... 2313
Examples of BLACS Routines Usage........................................................ 2314

Chapter 15: Data Fitting Functions


Data Fitting Function Naming Conventions............................................... 2315
Data Fitting Function Data Types............................................................ 2316
Mathematical Conventions for Data Fitting Functions................................. 2316
Data Fitting Usage Model....................................................................... 2319
Data Fitting Usage Examples..................................................................2319
Data Fitting Function Task Status and Error Reporting............................... 2325
Data Fitting Task Creation and Initialization Routines.................................2327
df?NewTask1D............................................................................. 2327
Task Configuration Routines................................................................... 2329
df?EditPPSpline1D........................................................................ 2330
df?EditPtr....................................................................................2337
dfiEditVal.................................................................................... 2338
df?EditIdxPtr............................................................................... 2340
df?QueryPtr.................................................................................2341
dfiQueryVal................................................................................. 2342
df?QueryIdxPtr............................................................................ 2343
Data Fitting Computational Routines....................................................... 2344
df?Construct1D............................................................................2345
df?Interpolate1D/df?InterpolateEx1D..............................................2346
df?Integrate1D/df?IntegrateEx1D...................................................2353
df?SearchCells1D/df?SearchCellsEx1D............................................ 2357
df?InterpCallBack.........................................................................2359
df?IntegrCallBack.........................................................................2360
df?SearchCellsCallBack................................................................. 2362
Data Fitting Task Destructors................................................................. 2364
dfDeleteTask................................................................................2364

Chapter 16: Deep Neural Network Functions


Enumerated Types................................................................................ 2366
Handling Array Layouts......................................................................... 2368
dnnLayoutCreate..........................................................................2369
dnnLayoutCreateFromPrimitive.......................................................2369
dnnLayoutGetMemorySize............................................................. 2370
dnnLayoutCompare...................................................................... 2370
dnnLayoutDelete.......................................................................... 2371
Handling Attributes of DNN Operations.................................................... 2371
dnnPrimitiveAttributesCreate......................................................... 2371
dnnPrimitiveAttributesDestroy........................................................2372
dnnPrimitiveGetAttributes..............................................................2372
DNN Operations................................................................................... 2372
dnnConvolutionCreate, dnnGroupsConvolutionCreate........................ 2373
dnnInnerProductCreate................................................................. 2377
dnnReLUCreate............................................................................ 2378
dnnLRNCreate............................................................................. 2379
dnnPoolingCreate......................................................................... 2380
dnnBatchNormalizationCreate........................................................ 2382
dnnBatchNormalizationCreate_v2................................................... 2383
dnnSplitCreate.............................................................................2384
dnnConcatCreate......................................................................... 2385
dnnSumCreate.............................................................................2386
dnnScaleCreate............................................................................2386
dnnConversionCreate....................................................................2387
dnnExecute................................................................................. 2388
dnnConversionExecute.................................................................. 2392
dnnDelete................................................................................... 2392
dnnAllocateBuffer......................................................................... 2393
dnnReleaseBuffer......................................................................... 2393

Appendix A: Linear Solvers Basics


Sparse Linear Systems.......................................................................... 2395
Matrix Fundamentals.................................................................... 2395
Direct Method.............................................................................. 2396
Sparse Matrix Storage Formats...............................................................2400
DSS Symmetric Matrix Storage...................................................... 2400
DSS Nonsymmetric Matrix Storage................................................. 2401
DSS Structurally Symmetric Matrix Storage..................................... 2402
DSS Distributed Symmetric Matrix Storage...................................... 2403
Sparse BLAS CSR Matrix Storage Format......................................... 2404
Sparse BLAS CSC Matrix Storage Format......................................... 2406
Sparse BLAS Coordinate Matrix Storage Format................................2406
Sparse BLAS Diagonal Matrix Storage Format...................................2407
Sparse BLAS Skyline Matrix Storage Format.....................................2408
Sparse BLAS BSR Matrix Storage Format......................................... 2409

Appendix B: Routine and Function Arguments


Vector Arguments in BLAS..................................................................... 2413
Vector Arguments in VM........................................................................ 2414
Matrix Arguments................................................................................. 2414

Appendix C: FFTW Interface to Intel® Math Kernel Library


FFTW Notational Conventions................................................................. 2421
FFTW2 Interface to Intel® Math Kernel Library ......................................... 2421
Wrappers Reference..................................................................... 2421
One-dimensional Complex-to-complex FFTs ............................ 2421
Multi-dimensional Complex-to-complex FFTs............................ 2422
One-dimensional Real-to-half-complex/Half-complex-to-real FFTs.......... 2422
Multi-dimensional Real-to-complex/Complex-to-real FFTs.......... 2422
Multi-threaded FFTW............................................................ 2423
FFTW Support Functions....................................................... 2423
Limitations of the FFTW2 Interface to Intel MKL................................2424
Installing FFTW2 Interface Wrappers...............................................2424
Creating the Wrapper Library.................................................2424
Application Assembling ........................................................ 2425
Running FFTW2 Interface Wrapper Examples........................... 2425
MPI FFTW2 Wrappers....................................................................2425
MPI FFTW Wrappers Reference...............................................2426
Creating MPI FFTW2 Wrapper Library......................................2427
Application Assembling with MPI FFTW Wrapper Library.............2427
Running MPI FFTW2 Wrapper Examples.................................. 2428
FFTW3 Interface to Intel® Math Kernel Library.......................................... 2428
Using FFTW3 Wrappers................................................................. 2429
Building Your Own FFTW3 Interface Wrapper Library......................... 2430
Building an Application With FFTW3 Interface Wrappers.....................2430
Running FFTW3 Interface Wrapper Examples................................... 2431
MPI FFTW3 Wrappers....................................................................2431
Building Your Own Wrapper Library.........................................2431
Building an Application......................................................... 2432
Running Examples............................................................... 2432

Appendix D: Code Examples


BLAS Code Examples............................................................................ 2433
Fourier Transform Functions Code Examples.............................................2438
FFT Code Examples...................................................................... 2439
Examples of Using OpenMP* Threading for FFT Computation......2442
Examples for Cluster FFT Functions.................................................2444
Auxiliary Data Transformations.......................................................2446

Bibliography

Glossary

Legal Information
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this
document.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of
merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from
course of performance, course of dealing, or usage in trade.
This document contains information on products, services and/or processes in development. All information
provided here is subject to change without notice. Contact your Intel representative to obtain the latest
forecast, schedule, specifications and roadmaps.
The products and services described may contain defects or errors which may cause deviations from
published specifications. Current characterized errata are available on request.
Software and workloads used in performance tests may have been optimized for performance only on Intel
microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific
computer systems, components, software, operations and functions. Any change to any of those factors may
cause the results to vary. You should consult other information and performance tests to assist you in fully
evaluating your contemplated purchases, including the performance of that product when combined with
other products.
Cilk, Intel, the Intel logo, Intel Atom, Intel Core, Intel Inside, Intel NetBurst, Intel SpeedStep, Intel vPro,
Intel Xeon Phi, Intel XScale, Itanium, MMX, Pentium, Thunderbolt, Ultrabook, VTune and Xeon are
trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
Microsoft, Windows, and the Windows logo are trademarks, or registered trademarks of Microsoft Corporation
in the United States and/or other countries.
Java is a registered trademark of Oracle and/or its affiliates.

Third Party Content


Intel® Math Kernel Library (Intel® MKL) includes content from several 3rd party sources that was originally
governed by the licenses referenced below:
• Portions © Copyright 2001 Hewlett-Packard Development Company, L.P.
• Sections on the Linear Algebra PACKage (LAPACK) routines include derivative work portions that have
been copyrighted:
© 1991, 1992, and 1998 by The Numerical Algorithms Group, Ltd.

• Intel MKL supports the LAPACK 3.5 set of computational, driver, auxiliary, and utility routines under the
following license:
Copyright © 1992-2011 The University of Tennessee and The University of Tennessee Research Foundation. All rights
reserved.

Copyright © 2000-2011 The University of California Berkeley. All rights reserved.

Copyright © 2006-2012 The University of Colorado Denver. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the
following conditions are met:

• Redistributions of source code must retain the above copyright notice, this list of conditions and the following
disclaimer.
• Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following
disclaimer listed in this license in the documentation and/or other materials provided with the distribution.
• Neither the name of the copyright holders nor the names of its contributors may be used to endorse or promote
products derived from this software without specific prior written permission.

The copyright holders provide no reassurances that the source code provided does not infringe any patent, copyright,
or any other intellectual property rights of third parties. The copyright holders disclaim any liability to any recipient for
claims brought against recipient by any third party for infringement of that parties intellectual property rights.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER
IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

The original versions of LAPACK from which that part of Intel MKL was derived can be obtained from
http://www.netlib.org/lapack/index.html. The authors of LAPACK are E. Anderson, Z. Bai, C. Bischof, S.
Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D.
Sorensen.
• The original versions of the Basic Linear Algebra Subprograms (BLAS) from which the respective part of
Intel® MKL was derived can be obtained from http://www.netlib.org/blas/index.html.
• XBLAS is distributed under the following copyright:
Copyright © 2008-2009 The University of California Berkeley. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided
that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the
following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the
following disclaimer listed in this license in the documentation and/or other materials provided with the
distribution.
- Neither the name of the copyright holders nor the names of its contributors may be used to endorse or
promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
• The original versions of the Basic Linear Algebra Communication Subprograms (BLACS) from which the
respective part of Intel MKL was derived can be obtained from http://www.netlib.org/blacs/index.html.
The authors of BLACS are Jack Dongarra and R. Clint Whaley.
• The original versions of Scalable LAPACK (ScaLAPACK) from which the respective part of Intel® MKL was
derived can be obtained from http://www.netlib.org/scalapack/index.html. The authors of ScaLAPACK are
L. S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G.
Henry, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley.
• The original versions of the Parallel Basic Linear Algebra Subprograms (PBLAS) routines from which the
respective part of Intel® MKL was derived can be obtained from http://www.netlib.org/scalapack/html/
pblas_qref.html.
• PARDISO (PARallel DIrect SOlver)* in Intel® MKL was originally developed by the Department of Computer
Science at the University of Basel (http://www.unibas.ch). It can be obtained at http://www.pardiso-
project.org.
• The Extended Eigensolver functionality is based on the Feast solver package and is distributed under the
following license:


Copyright © 2009, The Regents of the University of Massachusetts, Amherst.


Developed by E. Polizzi
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided
that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the
following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the
following disclaimer in the documentation and/or other materials provided with the distribution.
3. Neither the name of the University nor the names of its contributors may be used to endorse or
promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE AUTHOR ''AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.
• Some Fast Fourier Transform (FFT) functions in this release of Intel® MKL have been generated by the
SPIRAL software generation system (http://www.spiral.net/) under license from Carnegie Mellon
University. The authors of SPIRAL are Markus Puschel, Jose Moura, Jeremy Johnson, David Padua,
Manuela Veloso, Bryan Singer, Jianxin Xiong, Franz Franchetti, Aca Gacic, Yevgen Voronenko, Kang Chen,
Robert W. Johnson, and Nick Rizzolo.
• Open MPI is distributed under the New BSD license, listed below.
Most files in this release are marked with the copyrights of the organizations who have edited them. The
copyrights below are in no particular order and generally reflect members of the Open MPI core team who
have contributed code to this release. The copyrights for code used under license from other parties are
included in the corresponding files.
Copyright © 2004-2010 The Trustees of Indiana University and Indiana University Research and
Technology Corporation. All rights reserved.
Copyright © 2004-2010 The University of Tennessee and The University of Tennessee Research
Foundation. All rights reserved.
Copyright © 2004-2010 High Performance Computing Center Stuttgart, University of Stuttgart. All rights
reserved.
Copyright © 2004-2008 The Regents of the University of California. All rights reserved.
Copyright © 2006-2010 Los Alamos National Security, LLC. All rights reserved.
Copyright © 2006-2010 Cisco Systems, Inc. All rights reserved.
Copyright © 2006-2010 Voltaire, Inc. All rights reserved.
Copyright © 2006-2011 Sandia National Laboratories. All rights reserved.
Copyright © 2006-2010 Sun Microsystems, Inc. All rights reserved. Use is subject to license terms.
Copyright © 2006-2010 The University of Houston. All rights reserved.
Copyright © 2006-2009 Myricom, Inc. All rights reserved.
Copyright © 2007-2008 UT-Battelle, LLC. All rights reserved.
Copyright © 2007-2010 IBM Corporation. All rights reserved.
Copyright © 1998-2005 Forschungszentrum Juelich, Juelich Supercomputing Centre, Federal Republic of
Germany


Copyright © 2005-2008 ZIH, TU Dresden, Federal Republic of Germany


Copyright © 2007 Evergrid, Inc. All rights reserved.
Copyright © 2008 Chelsio, Inc. All rights reserved.
Copyright © 2008-2009 Institut National de Recherche en Informatique. All rights reserved.
Copyright © 2007 Lawrence Livermore National Security, LLC. All rights reserved.
Copyright © 2007-2009 Mellanox Technologies. All rights reserved.
Copyright © 2006-2010 QLogic Corporation. All rights reserved.
Copyright © 2008-2010 Oak Ridge National Labs. All rights reserved.
Copyright © 2006-2010 Oracle and/or its affiliates. All rights reserved.
Copyright © 2009 Bull SAS. All rights reserved.
Copyright © 2010 ARM ltd. All rights reserved.
Copyright © 2010-2011 Alex Brick . All rights reserved.
Copyright © 2012 The University of Wisconsin-La Crosse. All rights reserved.
Copyright © 2013-2014 Intel, Inc. All rights reserved.
Copyright © 2011-2014 NVIDIA Corporation. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided
that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the
following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the
following disclaimer listed in this license in the documentation and/or other materials provided with the
distribution.
- Neither the name of the copyright holders nor the names of its contributors may be used to endorse or
promote products derived from this software without specific prior written permission.
The copyright holders provide no reassurances that the source code provided does not infringe any
patent, copyright, or any other intellectual property rights of third parties. The copyright holders disclaim
any liability to any recipient for claims brought against recipient by any third party for infringement of that
parties intellectual property rights.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
• The Safe C Library is distributed under the following copyright:
Copyright (c)
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and
associated documentation files (the "Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do so, subject to the
following conditions:


The above copyright notice and this permission notice shall be included in all copies or substantial portions
of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT
OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.
• HPL Copyright Notice and Licensing Terms
Redistribution and use in source and binary forms, with or without modification, are permitted provided
that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the
following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions, and the
following disclaimer in the documentation and/or other materials provided with the distribution.
3. All advertising materials mentioning features or use of this software must display the following
acknowledgement: This product includes software developed at the University of Tennessee, Knoxville,
Innovative Computing Laboratories.
4. The name of the University, the name of the Laboratory, or the names of its contributors may not be
used to endorse or promote products derived from this software without specific written permission.
Copyright© 2017, Intel Corporation. All rights reserved.

Introducing the Intel® Math Kernel Library
The Intel® Math Kernel Library (Intel® MKL) improves performance with math routines for software
applications that solve large computational problems. Intel MKL provides BLAS and LAPACK linear algebra
routines, fast Fourier transforms, vectorized math functions, random number generation functions, and other
functionality.

NOTE
It is your responsibility when using Intel MKL to ensure that input data has the required format and
does not contain invalid characters. These can cause unexpected behavior of the library.
The library requires subroutine and function parameters to be valid before being passed. While some
Intel MKL routines do limited checking of parameter errors, your application should check for NULL
pointers, for example.

Intel MKL is optimized for the latest Intel processors, including processors with multiple cores (see the Intel
MKL Release Notes for the full list of supported processors). Intel MKL also performs well on non-Intel
processors.
For more details about functionality provided by Intel MKL, see the Function Domains section.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Getting Help and Support
Intel provides a support web site that contains a rich repository of self help information, including getting
started tips, known product issues, product errata, license information, user forums, and more. Visit the Intel
MKL support website at http://www.intel.com/software/products/support/.

What's New
This Developer Reference documents Intel® Math Kernel Library (Intel® MKL) 2017 Update 2 release for the C
interface.

NOTE
This publication, the Intel Math Kernel Library Developer Reference, was previously known as the Intel
Math Kernel Library Reference Manual.

The following function domains were updated with new functions, enhancements to the existing functionality,
or improvements to the existing documentation:

• Deep Neural Network Functions have been updated to support:


• Asymmetric border types. For more details, see dnnConvolutionCreate and dnnPoolingCreate.
• Selection of the computation method to perform batch normalization. For more details, see the
description of the new function dnnBatchNormalizationCreate_v2.
• Support functions have been added to specify and return the number of partitions along the leading
dimension of the output matrix for parallel ?gemm functions. For more details, see mkl_set_num_stripes
and mkl_get_num_stripes.
• Support of ScaLAPACK by the progress routine has been implemented. For more details, see
mkl_progress and mkl_set_progress.

The manual has also been updated to reflect other enhancements to the product, and minor improvements
and error corrections have been made.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Notational Conventions
This manual uses the following terms to refer to operating systems:

Windows* OS This term refers to information that is valid on all supported Windows* operating
systems.

Linux* OS This term refers to information that is valid on all supported Linux* operating
systems.

macOS* This term refers to information that is valid on Intel®-based systems running the
macOS* operating system.

This manual uses the following notational conventions:

• Routine name shorthand (for example, ?ungqr instead of cungqr/zungqr).


• Font conventions used for distinction between the text and the code.

Routine Name Shorthand


For shorthand, names that contain a question mark "?" represent groups of routines with similar
functionality. Each group typically consists of routines used with four basic data types: single-precision real,
double-precision real, single-precision complex, and double-precision complex. The question mark is used to
indicate any or all possible varieties of a function; for example:

?swap Refers to all four data types of the vector-vector ?swap routine:
sswap, dswap, cswap, and zswap.

Font Conventions
The following font conventions are used:

lowercase courier                Code examples:
                                 a[k+i][j] = matrix[i][j];
                                 and data types; for example, const float*

lowercase courier mixed with     Function names; for example, vmlSetMode
UpperCase courier

lowercase courier italic         Variables in arguments and parameters description. For example, incx.

*                                Used as a multiplication symbol in code examples and equations and
                                 where required by the programming language syntax.

Function Domains 1
NOTE
It is your responsibility when using Intel MKL to ensure that input data has the required format and
does not contain invalid characters. These can cause unexpected behavior of the library.
The library requires subroutine and function parameters to be valid before being passed. While some
Intel MKL routines do limited checking of parameter errors, your application should check for NULL
pointers, for example.

The Intel® Math Kernel Library includes Fortran routines and functions optimized for Intel® processor-based
computers running operating systems that support multiprocessing. In addition to the Fortran interface, Intel
MKL includes a C-language interface for the Discrete Fourier transform functions, as well as for the Vector
Mathematics and Vector Statistics functions. For hardware and software requirements to use Intel MKL, see
Intel® MKL Release Notes.

BLAS Routines
The BLAS routines and functions are divided into the following groups according to the operations they
perform:

• BLAS Level 1 Routines perform operations of both addition and reduction on vectors of data. Typical
operations include scaling and dot products.
• BLAS Level 2 Routines perform matrix-vector operations, such as matrix-vector multiplication, rank-1 and
rank-2 matrix updates, and solution of triangular systems.
• BLAS Level 3 Routines perform matrix-matrix operations, such as matrix-matrix multiplication, rank-k
update, and solution of triangular systems.

Starting from release 8.0, Intel® MKL also supports the Fortran 95 interface to the BLAS routines.
Starting from release 10.1, a number of BLAS-like Extensions are added to enable the user to perform
certain data manipulation, including matrix in-place and out-of-place transposition operations combined with
simple matrix arithmetic operations.

Sparse BLAS Routines


The Sparse BLAS Level 1 Routines and Functions and Sparse BLAS Level 2 and Level 3 Routines routines and
functions operate on sparse vectors and matrices. These routines perform vector operations similar to the
BLAS Level 1, 2, and 3 routines. The Sparse BLAS routines take advantage of vector and matrix sparsity:
they allow you to store only non-zero elements of vectors and matrices. Intel MKL also supports Fortran 95
interface to Sparse BLAS routines.

LAPACK Routines
The Intel® Math Kernel Library fully supports the LAPACK 3.6 set of computational, driver, auxiliary and utility
routines.
The original versions of LAPACK from which that part of Intel MKL was derived can be obtained from http://
www.netlib.org/lapack/index.html. The authors of LAPACK are E. Anderson, Z. Bai, C. Bischof, S. Blackford, J.
Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen.
The LAPACK routines can be divided into the following groups according to the operations they perform:

• Routines for solving systems of linear equations, factoring and inverting matrices, and estimating
condition numbers (see LAPACK Routines: Linear Equations).


• Routines for solving least squares problems, eigenvalue and singular value problems, and Sylvester's
equations (see LAPACK Routines: Least Squares and Eigenvalue Problems).

Starting from release 8.0, Intel MKL also supports the Fortran 95 interface to LAPACK computational and
driver routines. This interface provides an opportunity for simplified calls of LAPACK routines with fewer
required arguments.

Sparse Solver Routines


Direct sparse solver routines in Intel MKL (see Chapter 8) solve symmetric and symmetrically-structured
sparse matrices with real or complex coefficients. For symmetric matrices, these Intel MKL subroutines can
solve both positive-definite and indefinite systems. Intel MKL includes a solver based on the PARDISO*
sparse solver, referred to as Intel MKL PARDISO, as well as an alternative set of user callable direct sparse
solver routines.
If you use the Intel MKL PARDISO sparse solver, please cite:
O.Schenk and K.Gartner. Solving unsymmetric sparse systems of linear equations with PARDISO. J. of Future
Generation Computer Systems, 20(3):475-487, 2004.
Intel MKL provides also an iterative sparse solver (see Chapter 8) that uses Sparse BLAS level 2 and 3
routines and works with different sparse data formats.

Extended Eigensolver Routines


The Extended Eigensolver RCI Routines are a set of high-performance numerical routines for solving standard
(Ax = λx) and generalized (Ax = λBx) eigenvalue problems, where A and B are symmetric or Hermitian. It
yields all the eigenvalues and eigenvectors within a given search interval. It is based on the Feast algorithm,
an innovative fast and stable numerical algorithm presented in [Polizzi09], which deviates fundamentally
from the traditional Krylov subspace iteration based techniques (Arnoldi and Lanczos algorithms [Bai00]) or
other Davidson-Jacobi techniques [Sleijpen96]. The Feast algorithm is inspired by the density-matrix
representation and contour integration technique in quantum mechanics.
It is free from orthogonalization procedures. Its main computational tasks consist of solving very few inner
independent linear systems with multiple right-hand sides and one reduced eigenvalue problem orders of
magnitude smaller than the original one. The Feast algorithm combines simplicity and efficiency and offers
many important capabilities for achieving high performance, robustness, accuracy, and scalability on parallel
architectures. This algorithm is expected to significantly augment numerical performance in large-scale
modern applications.
Some of the characteristics of the Feast algorithm [Polizzi09] are:

• Converges quickly in 2-3 iterations with very high accuracy


• Naturally captures all eigenvalue multiplicities
• No explicit orthogonalization procedure
• Can reuse the basis of pre-computed subspace as suitable initial guess for performing outer-refinement
iterations
This capability can also be used for solving a series of eigenvalue problems that are close to one another.
• The number of internal iterations is independent of the size of the system and the number of eigenpairs in
the search interval
• The inner linear systems can be solved either iteratively (even with modest relative residual error) or
directly

VM Functions
The Vector Mathematics functions (see Chapter 9) include a set of highly optimized implementations of
certain computationally expensive core mathematical functions (power, trigonometric, exponential,
hyperbolic, etc.) that operate on vectors of real and complex numbers.
Application programs that might significantly improve performance with VM include nonlinear programming
software, integrals computation, and many others. VM provides interfaces both for Fortran and C languages.

Statistical Functions
Vector Statistics (VS) contains three sets of functions (see Chapter 10) providing:
• Pseudorandom, quasi-random, and non-deterministic random number generator subroutines
implementing basic continuous and discrete distributions. To provide best performance, the VS
subroutines use calls to highly optimized Basic Random Number Generators (BRNGs) and a set of vector
mathematical functions.
• A wide variety of convolution and correlation operations.
• Initial statistical analysis of raw single and double precision multi-dimensional datasets.

Fourier Transform Functions


The Intel® MKL multidimensional Fast Fourier Transform (FFT) functions with mixed radix support (see
Chapter 11) provide uniformity of discrete Fourier transform computation and combine functionality with
ease of use. Both Fortran and C interface specification are given. There is also a cluster version of FFT
functions, which runs on distributed-memory architectures and is provided only for Intel® 64 and Intel® Many
Integrated Core architectures.
The FFT functions provide fast computation via the FFT algorithms for arbitrary lengths. See the Intel® MKL
Developer Guide for the specific radices supported.

Partial Differential Equations Support


Intel® MKL provides tools for solving Partial Differential Equations (PDE) (see Chapter 13). These tools are
Trigonometric Transform interface routines and Poisson Solver.
The Trigonometric Transform routines may be helpful to users who implement their own solvers similar to the
Intel MKL Poisson Solver. The users can improve performance of their solvers by using fast sine, cosine, and
staggered cosine transforms implemented in the Trigonometric Transform interface.
The Poisson Solver is designed for fast solving of simple Helmholtz, Poisson, and Laplace problems. The
Trigonometric Transform interface, which underlies the solver, is based on the Intel MKL FFT interface (refer
to Chapter 11), optimized for Intel® processors.

Support Functions
The Intel® MKL support functions (see Chapter 15) are used to support the operation of the Intel MKL
software and provide basic information on the library and library operation, such as the current library
version, timing, setting and measuring of CPU frequency, error handling, and memory allocation.
Starting from release 10.0, the Intel MKL support functions provide additional threading control.
Starting from release 10.1, Intel MKL selectively supports a Progress Routine feature to track progress of a
lengthy computation and/or interrupt the computation using a callback function mechanism. The user
application can define a function called mkl_progress that is regularly called from the Intel MKL routine
supporting the progress routine feature. See the Progress Routines section in Chapter 15 for reference. Refer
to a specific LAPACK or DSS/PARDISO function description to see whether the function supports this feature
or not.

Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN)
Intel® Math Kernel Library (Intel® MKL) functions for Deep Neural Networks (DNN functions) is a collection of
performance primitives for Deep Neural Networks (DNN) applications optimized for Intel® architecture. The
implementation of DNN functions includes a limited set of primitives used in the AlexNet topology.
The primitives implement forward and backward passes for several convolution, pooling, normalization,
activation, and multi-dimensional transposition operations.
Intel MKL DNN primitives implement a plain C application programming interface (API) that can be used in
the existing C/C++ DNN frameworks, as well as in custom DNN applications.


Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Performance Enhancements
The Intel® Math Kernel Library has been optimized by exploiting both processor and system features and
capabilities. Special care has been given to those routines that most profit from cache-management
techniques. These especially include matrix-matrix operation routines such as dgemm().
In addition, code optimization techniques have been applied to minimize dependencies of scheduling integer
and floating-point units on the results within the processor.
The major optimization techniques used throughout the library include:

• Loop unrolling to minimize loop management costs


• Blocking of data to improve data reuse opportunities
• Copying to reduce chances of data eviction from cache
• Data prefetching to help hide memory latency
• Multiple simultaneous operations (for example, dot products in dgemm) to eliminate stalls due to
arithmetic unit pipelines
• Use of hardware features such as the SIMD arithmetic units, where appropriate
These are techniques from which the arithmetic code benefits the most.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Parallelism
In addition to the performance enhancements discussed above, Intel® MKL offers performance gains through
parallelism provided by the symmetric multiprocessing performance (SMP) feature. You can obtain
improvements from SMP in the following ways:

• One way is based on user-managed threads in the program and further distribution of the operations over
the threads based on data decomposition, domain decomposition, control decomposition, or some other
parallelizing technique. Each thread can use any of the Intel MKL functions (except for the deprecated ?
lacon LAPACK routine) because the library has been designed to be thread-safe.
• Another method is to use the FFT and BLAS level 3 routines. They have been parallelized and require no
alterations of your application to gain the performance enhancements of multiprocessing. Performance
using multiple processors on the level 3 BLAS shows excellent scaling. Since the threads are called and
managed within the library, the application does not need to be recompiled thread-safe.

• Yet another method is to use tuned LAPACK routines. Currently these include the single- and double
precision flavors of routines for QR factorization of general matrices, triangular factorization of general
and symmetric positive-definite matrices, solving systems of equations with such matrices, as well as
solving symmetric eigenvalue problems.
For instructions on setting the number of available processors for the BLAS level 3 and LAPACK routines, see
Intel® MKL Developer Guide.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

C Datatypes Specific to Intel MKL


The mkl_types.h file defines datatypes specific to Intel MKL.
C/C++ Type                  Fortran Type          LP32                      LP64 Equivalent           ILP64 Equivalent
                            Equivalent            (Size in Bytes)           (Size in Bytes)           (Size in Bytes)

MKL_INT                     INTEGER               C/C++: int                C/C++: int                C/C++: long long
(MKL integer)               (default INTEGER)     Fortran: INTEGER*4        Fortran: INTEGER*4        (or define the MKL_ILP64 macro)
                                                  (4 bytes)                 (4 bytes)                 Fortran: INTEGER*8
                                                                                                      (8 bytes)

MKL_UINT                    N/A                   C/C++: unsigned int       C/C++: unsigned int       C/C++: unsigned long long
(MKL unsigned integer)                            (4 bytes)                 (4 bytes)                 (8 bytes)

MKL_LONG                    N/A                   C/C++: long               C/C++: long               C/C++: long
(MKL long integer)                                (4 bytes)                 (Windows: 4 bytes)        (8 bytes)
                                                                            (Linux, Mac: 8 bytes)

MKL_Complex8                COMPLEX*8             (8 bytes)                 (8 bytes)                 (8 bytes)
(Like C99 complex float)

MKL_Complex16               COMPLEX*16            (16 bytes)                (16 bytes)                (16 bytes)
(Like C99 complex double)

You can redefine datatypes specific to Intel MKL. One reason to do this is if you have your own types which
are binary-compatible with Intel MKL datatypes, with the same representation or memory layout. To redefine
a datatype, use one of these methods:


• Insert the #define statement redefining the datatype before the mkl.h header file #include statement.
For example,
#define MKL_INT size_t
#include "mkl.h"
• Use the compiler -D option to redefine the datatype. For example,
...-DMKL_INT=size_t...

NOTE
As the user, if you redefine Intel MKL datatypes you are responsible for making sure that your
definition is compatible with that of Intel MKL. If not, it might cause unpredictable results or crash the
application.

BLAS and Sparse BLAS Routines 2
This chapter describes the Intel® Math Kernel Library implementation of the BLAS and Sparse BLAS routines,
and BLAS-like extensions. The routine descriptions are arranged in several sections:
• BLAS Level 1 Routines (vector-vector operations)
• BLAS Level 2 Routines (matrix-vector operations)
• BLAS Level 3 Routines (matrix-matrix operations)
• Sparse BLAS Level 1 Routines (vector-vector operations).
• Sparse BLAS Level 2 and Level 3 Routines (matrix-vector and matrix-matrix operations)
• BLAS-like Extensions
Each section presents the routine and function group descriptions in alphabetical order by routine or function
group name; for example, the ?asum group, the ?axpy group. The question mark in the group name
corresponds to different character codes indicating the data type (s, d, c, and z or their combination); see
Routine Naming Conventions.
When BLAS or Sparse BLAS routines encounter an error, they call the error reporting routine xerbla.
In BLAS Level 1 groups i?amax and i?amin, an "i" is placed before the data-type indicator and corresponds
to the index of an element in the vector. These groups are placed in the end of the BLAS Level 1 section.

BLAS Routines

Naming Conventions for BLAS Routines


BLAS routine names have the following structure:
<character> <name> <mod> ( )
The <character> field indicates the data type:

s real, single precision

c complex, single precision

d real, double precision

z complex, double precision

Some routines and functions can have combined character codes, such as sc or dz.
For example, the function scasum uses a complex input array and returns a real value.
The <name> field, in BLAS level 1, indicates the operation type. For example, the BLAS level 1 routines ?
dot, ?rot, ?swap compute a vector dot product, vector rotation, and vector swap, respectively.
In BLAS level 2 and 3, <name> reflects the matrix argument type:

ge general matrix

gb general band matrix

sy symmetric matrix


sp symmetric matrix (packed storage)

sb symmetric band matrix

he Hermitian matrix

hp Hermitian matrix (packed storage)

hb Hermitian band matrix

tr triangular matrix

tp triangular matrix (packed storage)

tb triangular band matrix.

The <mod> field, if present, provides additional details of the operation. BLAS level 1 names can have the
following characters in the <mod> field:

c conjugated vector

u unconjugated vector

g Givens rotation construction

m modified Givens rotation


mg modified Givens rotation construction

BLAS level 2 names can have the following characters in the <mod> field:

mv matrix-vector product

sv solving a system of linear equations with a single unknown vector

r rank-1 update of a matrix

r2 rank-2 update of a matrix.

BLAS level 3 names can have the following characters in the <mod> field:

mm matrix-matrix product

sm solving a system of linear equations with multiple unknown vectors

rk rank-k update of a matrix

r2k rank-2k update of a matrix.

The examples below illustrate how to interpret BLAS routine names:

ddot <d> <dot>: double-precision real vector-vector dot product

cdotc <c> <dot> <c>: complex vector-vector dot product, conjugated

scasum <sc> <asum>: sum of magnitudes of vector elements, single precision real
output and single precision complex input

cdotu <c> <dot> <u>: vector-vector dot product, unconjugated, complex

sgemv <s> <ge> <mv>: matrix-vector product, general matrix, single precision

ztrmm <z> <tr> <mm>: matrix-matrix product, triangular matrix, double-precision
complex.

Sparse BLAS level 1 naming conventions are similar to those of BLAS level 1. For more information, see
Naming Conventions.

C Interface Conventions for BLAS Routines


CBLAS, the C interface to the Basic Linear Algebra Subprograms (BLAS), provides a C language interface to
BLAS routines for Intel MKL. While you can call the Fortran implementation of BLAS, for coding in C the
CBLAS interface has some advantages such as allowing you to specify column-major or row-major ordering
with the layout parameter.
For more information about calling Fortran routines from C in general, and specifically about calling BLAS and
CBLAS routines, see " Mixed-language Programming with the Intel Math Kernel Library" in the Intel Math
Kernel Library Developer Guide.

NOTE
This reference contains syntax in C for both the CBLAS interface and the Fortran BLAS routines.

In CBLAS, the Fortran routine names are prefixed with cblas_ (for example, dasum becomes
cblas_dasum). Names of all CBLAS functions are in lowercase letters.
Complex functions ?dotc and ?dotu become CBLAS subroutines (void functions); they return the complex
result via a void pointer, added as the last parameter. CBLAS names of these functions are suffixed with
_sub. For example, the BLAS function cdotc corresponds to cblas_cdotc_sub.

WARNING
Users of the CBLAS interface should be aware that the CBLAS are just a C interface to the BLAS, which
is based on the FORTRAN standard and subject to the FORTRAN standard restrictions. In particular, the
output parameters should not be referenced through more than one argument.

NOTE
This interface is not implemented in the Sparse BLAS Level 2 and Level 3 routines.

The arguments of CBLAS functions comply with the following rules:


• Input arguments are declared with the const modifier.
• Non-complex scalar input arguments are passed by value.
• Complex scalar input arguments are passed as void pointers.
• Array arguments are passed by address.
• BLAS character arguments are replaced by the appropriate enumerated type.
• Level 2 and Level 3 routines acquire an additional parameter of type CBLAS_LAYOUT as their first
argument. This parameter specifies whether two-dimensional arrays are row-major (CblasRowMajor) or
column-major (CblasColMajor).

Enumerated Types
The CBLAS interface uses the following enumerated types:

enum CBLAS_LAYOUT {
CblasRowMajor=101, /* row-major arrays */
CblasColMajor=102}; /* column-major arrays */
enum CBLAS_TRANSPOSE {
CblasNoTrans=111, /* trans='N' */
CblasTrans=112, /* trans='T' */
CblasConjTrans=113}; /* trans='C' */
enum CBLAS_UPLO {
CblasUpper=121, /* uplo ='U' */
CblasLower=122}; /* uplo ='L' */
enum CBLAS_DIAG {
CblasNonUnit=131, /* diag ='N' */
CblasUnit=132}; /* diag ='U' */
enum CBLAS_SIDE {
CblasLeft=141, /* side ='L' */
CblasRight=142}; /* side ='R' */
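
For illustration only, the following minimal sketch shows how the layout and transpose enumerations are passed to a Level 3 routine (cblas_dgemm). It assumes a standard Intel MKL installation (mkl.h on the include path, application linked against Intel MKL); the matrix sizes and values are arbitrary.

#include <stdio.h>
#include "mkl.h"

int main(void) {
    /* C := alpha*A*B + beta*C, with a 2x3 A, 3x2 B, and 2x2 C, all stored row-major. */
    double a[] = {1, 2, 3,
                  4, 5, 6};
    double b[] = {1, 0,
                  0, 1,
                  1, 1};
    double c[] = {0, 0,
                  0, 0};

    /* With CblasRowMajor the leading dimension of each array is its number of columns. */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 3, 1.0, a, 3, b, 2, 0.0, c, 2);

    printf("C = [%g %g; %g %g]\n", c[0], c[1], c[2], c[3]);
    return 0;
}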

Matrix Storage Schemes for BLAS Routines


Matrix arguments of BLAS and CBLAS routines can use the following storage schemes:

• Full storage: a matrix A is stored in a two-dimensional array a, with the matrix element Aij stored in the
array element a[i + j*lda] for column-major layout and a[j + i*lda] for row-major layout, where
lda is the leading dimension for the array.
• Packed storage scheme allows you to store symmetric, Hermitian, or triangular matrices more compactly.
For column-major layout, the upper or lower triangle of the matrix is packed by columns in a one
dimensional array. For row-major layout, the upper or lower triangle of the matrix is packed by rows in a
one dimensional array.
• Band storage: a band matrix is stored compactly in a two-dimensional array. For column-major layout,
columns of the matrix are stored in the corresponding columns of the array, and diagonals of the matrix
are stored in a specific row of the array. For row-major layout, rows of the matrix are stored in the
corresponding rows of the array, and diagonals of the matrix are stored in a specific column of the array.
For more information on matrix storage schemes, see Matrix Arguments in Appendix B.

Row-Major and Column-Major Layout


The BLAS routines follow the Fortran convention of storing two-dimensional arrays using column-major
layout. When calling BLAS routines from C, remember that they require arrays to be in column-major format,
not the row-major format that is the convention for C. Unless otherwise specified, the pseudocode examples
for the BLAS routines illustrate matrices stored using column-major layout.
The CBLAS interface allows you to specify either column-major or row-major layout for BLAS Level 2 and
Level 3 routines, by setting the layout parameter to CblasColMajor or CblasRowMajor.
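
As an informal sketch (assuming mkl.h is available and the program is linked against Intel MKL), the example below stores the same 2-by-3 matrix in both layouts and calls cblas_dgemv with the matching layout argument; both calls produce the same result.

#include <stdio.h>
#include "mkl.h"

int main(void) {
    /* Element A(i,j) of the 2x3 matrix A = [1 2 3; 4 5 6] is stored at
       a_col[i + j*lda] with lda = 2 (column-major) and at
       a_row[j + i*lda] with lda = 3 (row-major). */
    double a_col[] = {1, 4,  2, 5,  3, 6};  /* column-major */
    double a_row[] = {1, 2, 3,  4, 5, 6};   /* row-major */
    double x[] = {1, 1, 1};
    double y1[2] = {0}, y2[2] = {0};

    cblas_dgemv(CblasColMajor, CblasNoTrans, 2, 3, 1.0, a_col, 2, x, 1, 0.0, y1, 1);
    cblas_dgemv(CblasRowMajor, CblasNoTrans, 2, 3, 1.0, a_row, 3, x, 1, 0.0, y2, 1);

    /* Both calls compute A*x = (6, 15). */
    printf("y1 = (%g, %g), y2 = (%g, %g)\n", y1[0], y1[1], y2[0], y2[1]);
    return 0;
}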

BLAS Level 1 Routines and Functions


BLAS Level 1 includes routines and functions, which perform vector-vector operations. Table “BLAS Level 1
Routine Groups and Their Data Types” lists the BLAS Level 1 routine and function groups and the data types
associated with them.
BLAS Level 1 Routine and Function Groups and Their Data Types
Routine or Function Group    Data Types            Description

cblas_?asum                  s, d, sc, dz          Sum of vector magnitudes (functions)

cblas_?axpy                  s, d, c, z            Scalar-vector product (routines)

cblas_?copy                  s, d, c, z            Copy vector (routines)

cblas_?dot                   s, d                  Dot product (functions)

cblas_?sdot                  sd, d                 Dot product with double precision (functions)

cblas_?dotc                  c, z                  Dot product conjugated (functions)

cblas_?dotu                  c, z                  Dot product unconjugated (functions)

cblas_?nrm2                  s, d, sc, dz          Vector 2-norm (Euclidean norm) (functions)

cblas_?rot                   s, d, cs, zd          Plane rotation of points (routines)

cblas_?rotg                  s, d, c, z            Generate Givens rotation of points (routines)

cblas_?rotm                  s, d                  Modified Givens plane rotation of points (routines)

cblas_?rotmg                 s, d                  Generate modified Givens plane rotation of points (routines)

cblas_?scal                  s, d, c, z, cs, zd    Vector-scalar product (routines)

cblas_?swap                  s, d, c, z            Vector-vector swap (routines)

cblas_i?amax                 s, d, c, z            Index of the maximum absolute value element of a vector (functions)

cblas_i?amin                 s, d, c, z            Index of the minimum absolute value element of a vector (functions)

cblas_?cabs1                 s, d                  Auxiliary functions; compute the absolute value of a complex number of single or double precision

cblas_?asum
Computes the sum of magnitudes of the vector
elements.

Syntax
float cblas_sasum (const MKL_INT n, const float *x, const MKL_INT incx);
float cblas_scasum (const MKL_INT n, const void *x, const MKL_INT incx);
double cblas_dasum (const MKL_INT n, const double *x, const MKL_INT incx);
double cblas_dzasum (const MKL_INT n, const void *x, const MKL_INT incx);

Include Files
• mkl.h

Description

The ?asum routine computes the sum of the magnitudes of elements of a real vector, or the sum of
magnitudes of the real and imaginary parts of elements of a complex vector:

res = |Re(x1)| + |Im(x1)| + |Re(x2)| + |Im(x2)| + ... + |Re(xn)| + |Im(xn)|,

where x is a vector with n elements.

Input Parameters

n Specifies the number of elements in vector x.

x Array, size at least (1 + (n-1)*abs(incx)).

incx Specifies the increment for indexing vector x.

Return Values
Contains the sum of magnitudes of real and imaginary parts of all elements of the vector.
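
A minimal usage sketch (values are arbitrary; it assumes mkl.h and linking against Intel MKL): with n = 3 and incx = 2, only every other element of x contributes to the sum.

#include <stdio.h>
#include "mkl.h"

int main(void) {
    double x[] = {1.0, 2.0, -3.0, 4.0, 5.0, -6.0};
    /* n = 3, incx = 2 selects x[0], x[2], x[4]; the result is |1| + |-3| + |5| = 9. */
    double res = cblas_dasum(3, x, 2);
    printf("dasum = %g\n", res);
    return 0;
}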

cblas_?axpy
Computes a vector-scalar product and adds the result
to a vector.

Syntax
void cblas_saxpy (const MKL_INT n, const float a, const float *x, const MKL_INT incx,
float *y, const MKL_INT incy);
void cblas_daxpy (const MKL_INT n, const double a, const double *x, const MKL_INT incx,
double *y, const MKL_INT incy);
void cblas_caxpy (const MKL_INT n, const void *a, const void *x, const MKL_INT incx,
void *y, const MKL_INT incy);
void cblas_zaxpy (const MKL_INT n, const void *a, const void *x, const MKL_INT incx,
void *y, const MKL_INT incy);

Include Files
• mkl.h

Description

The ?axpy routines perform a vector-vector operation defined as

y := a*x + y

where:
a is a scalar
x and y are vectors each with a number of elements that equals n.

Input Parameters

n Specifies the number of elements in vectors x and y.

a Specifies the scalar a.

x Array, size at least (1 + (n-1)*abs(incx)).

incx Specifies the increment for the elements of x.

y Array, size at least (1 + (n-1)*abs(incy)).

incy Specifies the increment for the elements of y.

Output Parameters

y Contains the updated vector y.
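
For example, a short sketch with arbitrary data (assuming mkl.h and the Intel MKL link line):

#include <stdio.h>
#include "mkl.h"

int main(void) {
    double x[] = {1.0, 2.0, 3.0};
    double y[] = {10.0, 20.0, 30.0};
    /* y := 2*x + y, so y becomes (12, 24, 36). */
    cblas_daxpy(3, 2.0, x, 1, y, 1);
    printf("y = (%g, %g, %g)\n", y[0], y[1], y[2]);
    return 0;
}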

cblas_?copy
Copies vector to another vector.

Syntax
void cblas_scopy (const MKL_INT n, const float *x, const MKL_INT incx, float *y, const
MKL_INT incy);
void cblas_dcopy (const MKL_INT n, const double *x, const MKL_INT incx, double *y,
const MKL_INT incy);
void cblas_ccopy (const MKL_INT n, const void *x, const MKL_INT incx, void *y, const
MKL_INT incy);
void cblas_zcopy (const MKL_INT n, const void *x, const MKL_INT incx, void *y, const
MKL_INT incy);

Include Files
• mkl.h

Description

The ?copy routines perform a vector-vector operation defined as

y = x,

where x and y are vectors.

Input Parameters

n Specifies the number of elements in vectors x and y.

x Array, size at least (1 + (n-1)*abs(incx)).

incx Specifies the increment for the elements of x.

y Array, size at least (1 + (n-1)*abs(incy)).

incy Specifies the increment for the elements of y.

Output Parameters

y Contains a copy of the vector x if n is positive. Otherwise, parameters are unaltered.

cblas_?dot
Computes a vector-vector dot product.

Syntax
float cblas_sdot (const MKL_INT n, const float *x, const MKL_INT incx, const float *y,
const MKL_INT incy);
double cblas_ddot (const MKL_INT n, const double *x, const MKL_INT incx, const double
*y, const MKL_INT incy);


Include Files
• mkl.h

Description

The ?dot routines perform a vector-vector reduction operation defined as

res = x1*y1 + x2*y2 + ... + xn*yn,

where xi and yi are elements of vectors x and y.

Input Parameters

n Specifies the number of elements in vectors x and y.

x Array, size at least (1+(n-1)*abs(incx)).

incx Specifies the increment for the elements of x.

y Array, size at least (1+(n-1)*abs(incy)).

incy Specifies the increment for the elements of y.

Return Values
The result of the dot product of x and y, if n is positive. Otherwise, returns 0.
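
For example, a minimal sketch with arbitrary data (assuming mkl.h and the Intel MKL link line):

#include <stdio.h>
#include "mkl.h"

int main(void) {
    double x[] = {1.0, 2.0, 3.0};
    double y[] = {4.0, 5.0, 6.0};
    /* 1*4 + 2*5 + 3*6 = 32 */
    double res = cblas_ddot(3, x, 1, y, 1);
    printf("ddot = %g\n", res);
    return 0;
}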

cblas_?sdot
Computes a vector-vector dot product with double
precision.

Syntax
float cblas_sdsdot (const MKL_INT n, const float sb, const float *sx, const MKL_INT
incx, const float *sy, const MKL_INT incy);
double cblas_dsdot (const MKL_INT n, const float *sx, const MKL_INT incx, const float
*sy, const MKL_INT incy);

Include Files
• mkl.h

Description

The ?sdot routines compute the inner product of two vectors with double precision. Both routines use double
precision accumulation of the intermediate results, but the sdsdot routine outputs the final result in single
precision, whereas the dsdot routine outputs the double precision result. The function sdsdot also adds
scalar value sb to the inner product.

Input Parameters

n Specifies the number of elements in the input vectors sx and sy.

sb Single precision scalar to be added to inner product (for the function
sdsdot only).

sx, sy Arrays, size at least (1+(n-1)*abs(incx)) and (1+(n-1)*abs(incy)), respectively. Contain the input single precision vectors.

incx Specifies the increment for the elements of sx.

incy Specifies the increment for the elements of sy.

Return Values
The result of the dot product of sx and sy (with sb added for sdsdot), if n is positive. Otherwise, returns sb
for sdsdot and 0 for dsdot.

cblas_?dotc
Computes a dot product of a conjugated vector with
another vector.

Syntax
void cblas_cdotc_sub (const MKL_INT n, const void *x, const MKL_INT incx, const void
*y, const MKL_INT incy, void *dotc);
void cblas_zdotc_sub (const MKL_INT n, const void *x, const MKL_INT incx, const void
*y, const MKL_INT incy, void *dotc);

Include Files
• mkl.h

Description

The ?dotc routines perform a vector-vector operation defined as:

res = conjg(x1)*y1 + conjg(x2)*y2 + ... + conjg(xn)*yn,

where xi and yi are elements of vectors x and y.

Input Parameters

n Specifies the number of elements in vectors x and y.

x Array, size at least (1 + (n -1)*abs(incx)).

incx Specifies the increment for the elements of x.

y Array, size at least (1 + (n -1)*abs(incy)).

incy Specifies the increment for the elements of y.

Output Parameters

dotc Contains the result of the dot product of the conjugated x and unconjugated
y, if n is positive. Otherwise, it contains 0.
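
A short sketch of the _sub calling convention for the complex dot product (assuming mkl.h, the Intel MKL link line, and the default definition of MKL_Complex16 with real and imag members; data values are arbitrary):

#include <stdio.h>
#include "mkl.h"

int main(void) {
    /* x = (1+i, 2), y = (3, 4i); conjg(x).y = (1-i)*3 + 2*4i = 3 + 5i */
    MKL_Complex16 x[] = {{1.0, 1.0}, {2.0, 0.0}};
    MKL_Complex16 y[] = {{3.0, 0.0}, {0.0, 4.0}};
    MKL_Complex16 res;
    cblas_zdotc_sub(2, x, 1, y, 1, &res);  /* result returned through the last argument */
    printf("dotc = %g + %g*i\n", res.real, res.imag);
    return 0;
}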


cblas_?dotu
Computes a vector-vector dot product.

Syntax
void cblas_cdotu_sub (const MKL_INT n, const void *x, const MKL_INT incx, const void
*y, const MKL_INT incy, void *dotu);
void cblas_zdotu_sub (const MKL_INT n, const void *x, const MKL_INT incx, const void
*y, const MKL_INT incy, void *dotu);

Include Files
• mkl.h

Description

The ?dotu routines perform a vector-vector reduction operation defined as

res = x1*y1 + x2*y2 + ... + xn*yn,

where xi and yi are elements of complex vectors x and y.

Input Parameters

n Specifies the number of elements in vectors x and y.

x Array, size at least (1 + (n -1)*abs(incx)).

incx Specifies the increment for the elements of x.

y Array, size at least (1 + (n -1)*abs(incy)).

incy Specifies the increment for the elements of y.

Output Parameters

dotu Contains the result of the dot product of x and y, if n is positive. Otherwise,
it contains 0.

cblas_?nrm2
Computes the Euclidean norm of a vector.

Syntax
float cblas_snrm2 (const MKL_INT n, const float *x, const MKL_INT incx);
double cblas_dnrm2 (const MKL_INT n, const double *x, const MKL_INT incx);
float cblas_scnrm2 (const MKL_INT n, const void *x, const MKL_INT incx);
double cblas_dznrm2 (const MKL_INT n, const void *x, const MKL_INT incx);

Include Files
• mkl.h

Description

The ?nrm2 routines perform a vector reduction operation defined as

res = ||x||,

where:
x is a vector,
res is a value containing the Euclidean norm of the elements of x.

Input Parameters

n Specifies the number of elements in vector x.

x Array, size at least (1 + (n -1)*abs (incx)).

incx Specifies the increment for the elements of x.

Return Values
The Euclidean norm of the vector x.
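
For example, a minimal sketch with arbitrary data (assuming mkl.h and the Intel MKL link line):

#include <stdio.h>
#include "mkl.h"

int main(void) {
    double x[] = {3.0, 4.0};
    /* ||x|| = sqrt(3^2 + 4^2) = 5 */
    double res = cblas_dnrm2(2, x, 1);
    printf("nrm2 = %g\n", res);
    return 0;
}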

cblas_?rot
Performs rotation of points in the plane.

Syntax
void cblas_srot (const MKL_INT n, float *x, const MKL_INT incx, float *y, const MKL_INT
incy, const float c, const float s);
void cblas_drot (const MKL_INT n, double *x, const MKL_INT incx, double *y, const
MKL_INT incy, const double c, const double s);
void cblas_csrot (const MKL_INT n, void *x, const MKL_INT incx, void *y, const MKL_INT
incy, const float c, const float s);
void cblas_zdrot (const MKL_INT n, void *x, const MKL_INT incx, void *y, const MKL_INT
incy, const double c, const double s);

Include Files
• mkl.h

Description

Given two vectors x and y, each vector element of these vectors is replaced as follows:
xi = c*xi + s*yi
yi = c*yi - s*xi

Input Parameters

n Specifies the number of elements in vectors x and y.

x Array, size at least (1 + (n-1)*abs(incx)).

incx Specifies the increment for the elements of x.


y Array, size at least (1 + (n -1)*abs(incy)).

incy Specifies the increment for the elements of y.

c A scalar.

s A scalar.

Output Parameters

x Each element is replaced by c*x + s*y.

y Each element is replaced by c*y - s*x.
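
A minimal sketch (arbitrary data; assumes mkl.h and Intel MKL linking) rotating a single point by 90 degrees, so c = 0 and s = 1:

#include <stdio.h>
#include "mkl.h"

int main(void) {
    double x[] = {1.0};
    double y[] = {0.0};
    /* x := c*x + s*y = 0, y := c*y - s*x = -1 */
    cblas_drot(1, x, 1, y, 1, 0.0, 1.0);
    printf("x = %g, y = %g\n", x[0], y[0]);
    return 0;
}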

cblas_?rotg
Computes the parameters for a Givens rotation.

Syntax
void cblas_srotg (float *a, float *b, float *c, float *s);
void cblas_drotg (double *a, double *b, double *c, double *s);
void cblas_crotg (void *a, const void *b, float *c, void *s);
void cblas_zrotg (void *a, const void *b, double *c, void *s);

Include Files
• mkl.h

Description

Given the Cartesian coordinates (a, b) of a point, these routines return the parameters c, s, r, and z
associated with the Givens rotation. The parameters c and s define a unitary matrix such that:

[  c   s ] [ a ]   [ r ]
[ -s   c ] [ b ] = [ 0 ]

The parameter z is defined such that if |a| > |b|, z is s; otherwise, if c is not 0, z is 1/c; otherwise z is 1.

Input Parameters

a Provides the x-coordinate of the point p.

b Provides the y-coordinate of the point p.

Output Parameters

a Contains the parameter r associated with the Givens rotation.

b Contains the parameter z associated with the Givens rotation.

c Contains the parameter c associated with the Givens rotation.

s Contains the parameter s associated with the Givens rotation.
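
For example, a sketch with arbitrary input (assuming mkl.h and Intel MKL linking) constructing the rotation for the point (3, 4):

#include <stdio.h>
#include "mkl.h"

int main(void) {
    double a = 3.0, b = 4.0, c, s;
    cblas_drotg(&a, &b, &c, &s);
    /* On return, a holds r = 5, c = 3/5, s = 4/5, and b holds z. */
    printf("r = %g, c = %g, s = %g, z = %g\n", a, c, s, b);
    return 0;
}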

cblas_?rotm
Performs modified Givens rotation of points in the
plane.

Syntax
void cblas_srotm (const MKL_INT n, float *x, const MKL_INT incx, float *y, const
MKL_INT incy, const float *param);
void cblas_drotm (const MKL_INT n, double *x, const MKL_INT incx, double *y, const
MKL_INT incy, const double *param);

Include Files
• mkl.h

Description

Given two vectors x and y, each vector element of these vectors is replaced as follows:
[ xi ]       [ xi ]
[ yi ]  =  H [ yi ]

for i=1 to n, where H is a modified Givens transformation matrix whose values are stored in the param[1]
through param[4] array. See discussion on the param argument.

Input Parameters

n Specifies the number of elements in vectors x and y.

x Array, size at least (1 + (n -1)*abs(incx)).

incx Specifies the increment for the elements of x.

y Array, size at least (1 + (n -1)*abs(incy)).

incy Specifies the increment for the elements of y.

param Array, size 5.


The elements of the param array are:
param[0] contains a switch, flag. param[1-4] contain h11, h21, h12, and
h22, respectively, the components of the matrix H.
Depending on the values of flag, the components of H are set as follows:

flag = -1.0:  H = [  h11  h12 ]
                  [  h21  h22 ]

flag = 0.0:   H = [  1.0  h12 ]
                  [  h21  1.0 ]

flag = 1.0:   H = [  h11  1.0 ]
                  [ -1.0  h22 ]

flag = -2.0:  H = [  1.0  0.0 ]
                  [  0.0  1.0 ]


In the last three cases, the matrix entries of 1.0, -1.0, and 0.0 are assumed
based on the value of flag and are not required to be set in the param
vector.

Output Parameters

x Each element x[i] is replaced by h11*x[i] + h12*y[i].

y Each element y[i] is replaced by h21*x[i] + h22*y[i].
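
The following short example (illustrative only) calls cblas_drotm with flag = -1.0, so that all four entries of H are taken from param; the chosen H simply scales both vectors by 2 to keep the result easy to verify:

#include <stdio.h>
#include "mkl.h"

int main(void) {
    double x[2] = {1.0, 2.0}, y[2] = {3.0, 4.0};
    /* param = {flag, h11, h21, h12, h22}; flag = -1.0 means all four entries are used */
    double param[5] = {-1.0, 2.0, 0.0, 0.0, 2.0};     /* H = [2 0; 0 2] */
    cblas_drotm(2, x, 1, y, 1, param);
    printf("x = %f %f, y = %f %f\n", x[0], x[1], y[0], y[1]);   /* x = 2 4, y = 6 8 */
    return 0;
}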

cblas_?rotmg
Computes the parameters for a modified Givens
rotation.

Syntax
void cblas_srotmg (float *d1, float *d2, float *x1, const float y1, float *param);
void cblas_drotmg (double *d1, double *d2, double *x1, const double y1, double *param);

Include Files
• mkl.h

Description

Given Cartesian coordinates (x1, y1) of an input vector, these routines compute the components of a
modified Givens transformation matrix H that zeros the y-component of the resulting vector:

    [ x1 ]        [ x1*sqrt(d1) ]
    [  0 ]  =  H  [ y1*sqrt(d2) ]

Input Parameters

d1 Provides the scaling factor for the x-coordinate of the input vector.

d2 Provides the scaling factor for the y-coordinate of the input vector.

x1 Provides the x-coordinate of the input vector.

y1 Provides the y-coordinate of the input vector.

Output Parameters

d1 Provides the first diagonal element of the updated matrix.

d2 Provides the second diagonal element of the updated matrix.

x1 Provides the x-coordinate of the rotated vector before scaling.

param Array, size 5.


The elements of the param array are:
param[0] contains a switch, flag. The other array elements param[1-4]
contain the components of the matrix H: h11, h21, h12, and h22, respectively.
Depending on the values of flag, the components of H are set as follows:

flag = -1.0:  H = [  h11  h12 ]
                  [  h21  h22 ]

flag = 0.0:   H = [  1.0  h12 ]
                  [  h21  1.0 ]

flag = 1.0:   H = [  h11  1.0 ]
                  [ -1.0  h22 ]

flag = -2.0:  H = [  1.0  0.0 ]
                  [  0.0  1.0 ]

In the last three cases, the matrix entries of 1.0, -1.0, and 0.0 are assumed
based on the value of flag and are not required to be set in the param
vector.

cblas_?scal
Computes the product of a vector by a scalar.

Syntax
void cblas_sscal (const MKL_INT n, const float a, float *x, const MKL_INT incx);
void cblas_dscal (const MKL_INT n, const double a, double *x, const MKL_INT incx);
void cblas_cscal (const MKL_INT n, const void *a, void *x, const MKL_INT incx);
void cblas_zscal (const MKL_INT n, const void *a, void *x, const MKL_INT incx);
void cblas_csscal (const MKL_INT n, const float a, void *x, const MKL_INT incx);
void cblas_zdscal (const MKL_INT n, const double a, void *x, const MKL_INT incx);

Include Files
• mkl.h

Description

The ?scal routines perform a vector operation defined as

x = a*x

where:
a is a scalar, x is an n-element vector.

Input Parameters

n Specifies the number of elements in vector x.

a Specifies the scalar a.

x Array, size at least (1 + (n -1)*abs(incx)).

incx Specifies the increment for the elements of x.


Output Parameters

x Updated vector x.
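
The following short example (illustrative only) scales a three-element vector by 2.0 with cblas_dscal:

#include <stdio.h>
#include "mkl.h"

int main(void) {
    double x[3] = {1.0, 2.0, 3.0};
    cblas_dscal(3, 2.0, x, 1);                 /* x = 2*x */
    printf("%f %f %f\n", x[0], x[1], x[2]);    /* prints 2 4 6 */
    return 0;
}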

cblas_?swap
Swaps a vector with another vector.

Syntax
void cblas_sswap (const MKL_INT n, float *x, const MKL_INT incx, float *y, const
MKL_INT incy);
void cblas_dswap (const MKL_INT n, double *x, const MKL_INT incx, double *y, const
MKL_INT incy);
void cblas_cswap (const MKL_INT n, void *x, const MKL_INT incx, void *y, const MKL_INT
incy);
void cblas_zswap (const MKL_INT n, void *x, const MKL_INT incx, void *y, const MKL_INT
incy);

Include Files
• mkl.h

Description

Given two vectors x and y, the ?swap routines return vectors y and x swapped, each replacing the other.

Input Parameters

n Specifies the number of elements in vectors x and y.

x Array, size at least (1 + (n-1)*abs(incx)).

incx Specifies the increment for the elements of x.

y Array, size at least (1 + (n-1)*abs(incy)).

incy Specifies the increment for the elements of y.

Output Parameters

x Contains the resultant vector x, that is, the input vector y.

y Contains the resultant vector y, that is, the input vector x.
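
The following short example (illustrative only) exchanges the contents of two vectors with cblas_dswap:

#include <stdio.h>
#include "mkl.h"

int main(void) {
    double x[2] = {1.0, 2.0}, y[2] = {9.0, 8.0};
    cblas_dswap(2, x, 1, y, 1);                /* x and y exchange contents */
    printf("x = %f %f, y = %f %f\n", x[0], x[1], y[0], y[1]);   /* x = 9 8, y = 1 2 */
    return 0;
}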

cblas_i?amax
Finds the index of the element with maximum
absolute value.

Syntax
CBLAS_INDEX cblas_isamax (const MKL_INT n, const float *x, const MKL_INT incx);
CBLAS_INDEX cblas_idamax (const MKL_INT n, const double *x, const MKL_INT incx);
CBLAS_INDEX cblas_icamax (const MKL_INT n, const void *x, const MKL_INT incx);

CBLAS_INDEX cblas_izamax (const MKL_INT n, const void *x, const MKL_INT incx);

Include Files
• mkl.h

Description
Given a vector x, the i?amax functions return the position of the vector element x[i] that has the largest
absolute value for real flavors, or the largest sum |Re(x[i])|+|Im(x[i])| for complex flavors.

If n is not positive, 0 is returned.

If more than one vector element is found with the same largest absolute value, the index of the first one
encountered is returned.

Input Parameters

n Specifies the number of elements in vector x.

x Array, size at least (1+(n-1)*abs(incx)).

incx Specifies the increment for the elements of x.

Return Values
Returns the position of the vector element with the largest absolute value; that is, the index such that
x[index-1] has the largest absolute value.
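
The following short example (illustrative only; the values are arbitrary) shows the calling sequence of cblas_idamax:

#include <stdio.h>
#include "mkl.h"

int main(void) {
    double x[4] = {1.0, -7.0, 3.0, 5.0};
    CBLAS_INDEX pos = cblas_idamax(4, x, 1);   /* position of -7.0, the element with the largest |x[i]| */
    printf("position = %d\n", (int)pos);
    return 0;
}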

cblas_i?amin
Finds the index of the element with the smallest
absolute value.

Syntax
CBLAS_INDEX cblas_isamin (const MKL_INT n, const float *x, const MKL_INT incx);
CBLAS_INDEX cblas_idamin (const MKL_INT n, const double *x, const MKL_INT incx);
CBLAS_INDEX cblas_icamin (const MKL_INT n, const void *x, const MKL_INT incx);
CBLAS_INDEX cblas_izamin (const MKL_INT n, const void *x, const MKL_INT incx);

Include Files
• mkl.h

Description
Given a vector x, the i?amin functions return the position of the vector element x[i] that has the smallest
absolute value for real flavors, or the smallest sum |Re(x[i])|+|Im(x[i])| for complex flavors.

If n is not positive, 0 is returned.

If more than one vector element is found with the same smallest absolute value, the index of the first one
encountered is returned.

Input Parameters

n On entry, n specifies the number of elements in vector x.

x Array, size at least (1+(n-1)*abs(incx)).


incx Specifies the increment for the elements of x.

Return Values
Indicates the position of the vector element with the smallest absolute value; that is, the index such that
x[index-1] has the smallest absolute value.

cblas_?cabs1
Computes the absolute value of a complex number.

Syntax
float cblas_scabs1 (const void *z);
double cblas_dcabs1 (const void *z);

Include Files
• mkl.h

Description

?cabs1 is an auxiliary routine for a few BLAS Level 1 routines. This routine performs an operation
defined as

res = |Re(z)| + |Im(z)|,

where z is a complex scalar and res is a value containing the absolute value of the complex number z.

Input Parameters

z Scalar.

Return Values
The absolute value of a complex number z.

BLAS Level 2 Routines


This section describes BLAS Level 2 routines, which perform matrix-vector operations. Table “BLAS Level 2
Routine Groups and Their Data Types” lists the BLAS Level 2 routine groups and the data types associated
with them.
BLAS Level 2 Routine Groups and Their Data Types
Routine Groups Data Types Description

cblas_?gbmv s, d, c, z Matrix-vector product using a general band matrix

cblas_?gemv s, d, c, z Matrix-vector product using a general matrix

cblas_?ger s, d Rank-1 update of a general matrix

cblas_?gerc c, z Rank-1 update of a conjugated general matrix

cblas_?geru c, z Rank-1 update of a general matrix, unconjugated

cblas_?hbmv c, z Matrix-vector product using a Hermitian band matrix

cblas_?hemv c, z Matrix-vector product using a Hermitian matrix

Routine Groups Data Types Description

cblas_?her c, z Rank-1 update of a Hermitian matrix

cblas_?her2 c, z Rank-2 update of a Hermitian matrix

cblas_?hpmv c, z Matrix-vector product using a Hermitian packed matrix

cblas_?hpr c, z Rank-1 update of a Hermitian packed matrix

cblas_?hpr2 c, z Rank-2 update of a Hermitian packed matrix

cblas_?sbmv s, d Matrix-vector product using a symmetric band matrix

cblas_?spmv s, d Matrix-vector product using a symmetric packed matrix

cblas_?spr s, d Rank-1 update of a symmetric packed matrix

cblas_?spr2 s, d Rank-2 update of a symmetric packed matrix

cblas_?symv s, d Matrix-vector product using a symmetric matrix

cblas_?syr s, d Rank-1 update of a symmetric matrix

cblas_?syr2 s, d Rank-2 update of a symmetric matrix

cblas_?tbmv s, d, c, z Matrix-vector product using a triangular band matrix

cblas_?tbsv s, d, c, z Solution of a linear system of equations with a triangular band matrix

cblas_?tpmv s, d, c, z Matrix-vector product using a triangular packed matrix

cblas_?tpsv s, d, c, z Solution of a linear system of equations with a triangular packed matrix

cblas_?trmv s, d, c, z Matrix-vector product using a triangular matrix

cblas_?trsv s, d, c, z Solution of a linear system of equations with a triangular matrix

cblas_?gbmv
Computes a matrix-vector product using a general
band matrix.

Syntax
void cblas_sgbmv (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE trans, const MKL_INT
m, const MKL_INT n, const MKL_INT kl, const MKL_INT ku, const float alpha, const float
*a, const MKL_INT lda, const float *x, const MKL_INT incx, const float beta, float *y,
const MKL_INT incy);
void cblas_dgbmv (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE trans, const MKL_INT
m, const MKL_INT n, const MKL_INT kl, const MKL_INT ku, const double alpha, const
double *a, const MKL_INT lda, const double *x, const MKL_INT incx, const double beta,
double *y, const MKL_INT incy);
void cblas_cgbmv (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE trans, const MKL_INT
m, const MKL_INT n, const MKL_INT kl, const MKL_INT ku, const void *alpha, const void
*a, const MKL_INT lda, const void *x, const MKL_INT incx, const void *beta, void *y,
const MKL_INT incy);


void cblas_zgbmv (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE trans, const MKL_INT
m, const MKL_INT n, const MKL_INT kl, const MKL_INT ku, const void *alpha, const void
*a, const MKL_INT lda, const void *x, const MKL_INT incx, const void *beta, void *y,
const MKL_INT incy);

Include Files
• mkl.h

Description

The ?gbmv routines perform a matrix-vector operation defined as

y := alpha*A*x + beta*y,
or
y := alpha*A'*x + beta*y,
or
y := alpha *conjg(A')*x + beta*y,
where:
alpha and beta are scalars,
x and y are vectors,
A is an m-by-n band matrix, with kl sub-diagonals and ku super-diagonals.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

trans Specifies the operation:


If trans=CblasNoTrans, then y := alpha*A*x + beta*y

If trans=CblasTrans, then y := alpha*A'*x + beta*y

If trans=CblasConjTrans, then y := alpha *conjg(A')*x + beta*y

m Specifies the number of rows of the matrix A.


The value of m must be at least zero.

n Specifies the number of columns of the matrix A.


The value of n must be at least zero.

kl Specifies the number of sub-diagonals of the matrix A.


The value of kl must satisfy 0≤kl.

ku Specifies the number of super-diagonals of the matrix A.


The value of ku must satisfy 0≤ku.

alpha Specifies the scalar alpha.

a Array, size lda*n.

Layout = CblasColMajor: Before entry, the leading (kl + ku + 1) by n
part of the array a must contain the matrix of coefficients. This matrix must
be supplied column-by-column, with the leading diagonal of the matrix in
row (ku) of the array, the first super-diagonal starting at position 1 in row
(ku - 1), the first sub-diagonal starting at position 0 in row (ku + 1),
and so on. Elements in the array a that do not correspond to elements in
the band matrix (such as the top left ku by ku triangle) are not referenced.
The following program segment transfers a band matrix from conventional
full matrix storage (matrix, with leading dimension ldm) to band storage (a,
with leading dimension lda):

for (j = 0; j < n; j++) {


k = ku - j;
for (i = max(0, j-ku); i < min(m, j+kl+1); i++) {
a[(k+i) + j*lda] = matrix[i + j*ldm];
}
}

Layout = CblasRowMajor: Before entry, the leading (kl + ku + 1) by m


part of the array a must contain the matrix of coefficients. This matrix must
be supplied row-by-row, with the leading diagonal of the matrix in column
(kl) of the array, the first super-diagonal starting at position 0 in column
(kl + 1), the first sub-diagonal starting at position 1 in column (kl - 1),
and so on. Elements in the array a that do not correspond to elements in
the band matrix (such as the top left kl by kl triangle) are not referenced.
The following program segment transfers a band matrix from row-major full
matrix storage (matrix, with leading dimension ldm) to band storage (a,
with leading dimension lda):

for (i = 0; i < m; i++) {


k = kl - i;
for (j = max(0, i-kl); j < min(n, i+ku+1); j++) {
a[(k+j) + i*lda] = matrix[j + i*ldm];
}
}

lda Specifies the leading dimension of a as declared in the calling


(sub)program. The value of lda must be at least (kl + ku + 1).

x Array, size at least (1 + (n - 1)*abs(incx)) when


trans=CblasNoTrans, and at least (1 + (m - 1)*abs(incx))
otherwise. Before entry, the array x must contain the vector x.

incx Specifies the increment for the elements of x. incx must not be zero.

beta Specifies the scalar beta. When beta is equal to zero, then y need not be
set on input.

y Array, size at least (1 +(m - 1)*abs(incy)) when


trans=CblasNoTrans and at least (1 +(n - 1)*abs(incy)) otherwise.
Before entry, the incremented array y must contain the vector y.

incy Specifies the increment for the elements of y.


The value of incy must not be zero.


Output Parameters

y Updated vector y.
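
The following short example (illustrative only) multiplies a 4-by-4 tridiagonal matrix (kl = ku = 1), stored in column-major band format, by a vector of ones; the matrix values are arbitrary and chosen only to keep the band layout easy to follow:

#include <stdio.h>
#include "mkl.h"

int main(void) {
    MKL_INT m = 4, n = 4, kl = 1, ku = 1, lda = kl + ku + 1;   /* lda = 3 */
    /* Tridiagonal matrix with 2 on the diagonal and -1 on both off-diagonals,
       stored column by column; row ku of each column holds the diagonal.
       Entries outside the band (written as 0.0 below) are never referenced. */
    double a[12] = { 0.0,  2.0, -1.0,      /* column 0 */
                    -1.0,  2.0, -1.0,      /* column 1 */
                    -1.0,  2.0, -1.0,      /* column 2 */
                    -1.0,  2.0,  0.0 };    /* column 3 */
    double x[4] = {1.0, 1.0, 1.0, 1.0};
    double y[4];
    cblas_dgbmv(CblasColMajor, CblasNoTrans, m, n, kl, ku,
                1.0, a, lda, x, 1, 0.0, y, 1);                 /* y = A*x */
    printf("%f %f %f %f\n", y[0], y[1], y[2], y[3]);           /* prints 1 0 0 1 */
    return 0;
}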

cblas_?gemv
Computes a matrix-vector product using a general
matrix.

Syntax
void cblas_sgemv (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE trans, const MKL_INT
m, const MKL_INT n, const float alpha, const float *a, const MKL_INT lda, const float
*x, const MKL_INT incx, const float beta, float *y, const MKL_INT incy);
void cblas_dgemv (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE trans, const MKL_INT
m, const MKL_INT n, const double alpha, const double *a, const MKL_INT lda, const
double *x, const MKL_INT incx, const double beta, double *y, const MKL_INT incy);
void cblas_cgemv (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE trans, const MKL_INT
m, const MKL_INT n, const void *alpha, const void *a, const MKL_INT lda, const void
*x, const MKL_INT incx, const void *beta, void *y, const MKL_INT incy);
void cblas_zgemv (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE trans, const MKL_INT
m, const MKL_INT n, const void *alpha, const void *a, const MKL_INT lda, const void
*x, const MKL_INT incx, const void *beta, void *y, const MKL_INT incy);

Include Files
• mkl.h

Description

The ?gemv routines perform a matrix-vector operation defined as

y := alpha*A*x + beta*y,
or
y := alpha*A'*x + beta*y,
or
y := alpha*conjg(A')*x + beta*y,
where:
alpha and beta are scalars,
x and y are vectors,
A is an m-by-n matrix.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

trans Specifies the operation:


if trans=CblasNoTrans, then y := alpha*A*x + beta*y;

if trans=CblasTrans, then y := alpha*A'*x + beta*y;

if trans=CblasConjTrans, then y := alpha *conjg(A')*x + beta*y.

m Specifies the number of rows of the matrix A. The value of m must be at


least zero.

n Specifies the number of columns of the matrix A. The value of n must be at


least zero.

alpha Specifies the scalar alpha.

a Array, size lda*k.

For Layout = CblasColMajor, k is n. Before entry, the leading m-by-n part


of the array a must contain the matrix A.

For Layout = CblasRowMajor, k is m. Before entry, the leading n-by-m part


of the array a must contain the matrix A.

lda Specifies the leading dimension of a as declared in the calling


(sub)program.
For Layout = CblasColMajor, the value of lda must be at least max(1,
m).
For Layout = CblasRowMajor, the value of lda must be at least max(1,
n).

x Array, size at least (1+(n-1)*abs(incx)) when trans=CblasNoTrans


and at least (1+(m - 1)*abs(incx)) otherwise. Before entry, the
incremented array x must contain the vector x.

incx Specifies the increment for the elements of x.


The value of incx must not be zero.

beta Specifies the scalar beta. When beta is set to zero, then y need not be set
on input.

y Array, size at least (1 +(m - 1)*abs(incy)) when


trans=CblasNoTrans and at least (1 +(n - 1)*abs(incy)) otherwise.
Before entry with non-zero beta, the incremented array y must contain the
vector y.

incy Specifies the increment for the elements of y.


The value of incy must not be zero.

Output Parameters

y Updated vector y.
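
The following short example (illustrative only; the matrix values are arbitrary) computes y := A*x for a 2-by-3 row-major matrix with cblas_dgemv:

#include <stdio.h>
#include "mkl.h"

int main(void) {
    MKL_INT m = 2, n = 3, lda = 3;            /* row-major layout: lda >= n */
    double a[6] = {1.0, 2.0, 3.0,             /* A = [1 2 3]  */
                   4.0, 5.0, 6.0};            /*     [4 5 6]  */
    double x[3] = {1.0, 1.0, 1.0};
    double y[2] = {0.0, 0.0};
    cblas_dgemv(CblasRowMajor, CblasNoTrans, m, n, 1.0, a, lda,
                x, 1, 0.0, y, 1);             /* y = 1.0*A*x + 0.0*y */
    printf("%f %f\n", y[0], y[1]);            /* prints 6 15 */
    return 0;
}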

cblas_?ger
Performs a rank-1 update of a general matrix.

Syntax
void cblas_sger (const CBLAS_LAYOUT Layout, const MKL_INT m, const MKL_INT n, const
float alpha, const float *x, const MKL_INT incx, const float *y, const MKL_INT incy,
float *a, const MKL_INT lda);


void cblas_dger (const CBLAS_LAYOUT Layout, const MKL_INT m, const MKL_INT n, const
double alpha, const double *x, const MKL_INT incx, const double *y, const MKL_INT
incy, double *a, const MKL_INT lda);

Include Files
• mkl.h

Description

The ?ger routines perform a matrix-vector operation defined as

A := alpha*x*y'+ A,

where:
alpha is a scalar,
x is an m-element vector,
y is an n-element vector,
A is an m-by-n general matrix.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

m Specifies the number of rows of the matrix A.


The value of m must be at least zero.

n Specifies the number of columns of the matrix A.


The value of n must be at least zero.

alpha Specifies the scalar alpha.

x Array, size at least (1 + (m - 1)*abs(incx)). Before entry, the


incremented array x must contain the m-element vector x.

incx Specifies the increment for the elements of x.


The value of incx must not be zero.

y Array, size at least (1 + (n - 1)*abs(incy)). Before entry, the


incremented array y must contain the n-element vector y.

incy Specifies the increment for the elements of y.


The value of incy must not be zero.

a Array, size lda*k.

For Layout = CblasColMajor, k is n. Before entry, the leading m-by-n part


of the array a must contain the matrix A.

For Layout = CblasRowMajor, k is m. Before entry, the leading n-by-m part


of the array a must contain the matrix A.

lda Specifies the leading dimension of a as declared in the calling


(sub)program.

For Layout = CblasColMajor, the value of lda must be at least max(1,
m).
For Layout = CblasRowMajor, the value of lda must be at least max(1,
n).

Output Parameters

a Overwritten by the updated matrix.
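
The following short example (illustrative only) performs the rank-1 update A := x*y' on a 2-by-2 row-major matrix that starts as zero:

#include <stdio.h>
#include "mkl.h"

int main(void) {
    MKL_INT m = 2, n = 2, lda = 2;
    double a[4] = {0.0, 0.0, 0.0, 0.0};       /* A starts as the zero matrix */
    double x[2] = {1.0, 2.0};
    double y[2] = {3.0, 4.0};
    cblas_dger(CblasRowMajor, m, n, 1.0, x, 1, y, 1, a, lda);   /* A = 1.0*x*y' + A */
    printf("%f %f\n%f %f\n", a[0], a[1], a[2], a[3]);           /* prints 3 4 / 6 8 */
    return 0;
}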

cblas_?gerc
Performs a rank-1 update (conjugated) of a general
matrix.

Syntax
void cblas_cgerc (const CBLAS_LAYOUT Layout, const MKL_INT m, const MKL_INT n, const
void *alpha, const void *x, const MKL_INT incx, const void *y, const MKL_INT incy,
void *a, const MKL_INT lda);
void cblas_zgerc (const CBLAS_LAYOUT Layout, const MKL_INT m, const MKL_INT n, const
void *alpha, const void *x, const MKL_INT incx, const void *y, const MKL_INT incy,
void *a, const MKL_INT lda);

Include Files
• mkl.h

Description

The ?gerc routines perform a matrix-vector operation defined as

A := alpha*x*conjg(y') + A,

where:
alpha is a scalar,
x is an m-element vector,
y is an n-element vector,
A is an m-by-n matrix.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

m Specifies the number of rows of the matrix A.


The value of m must be at least zero.

n Specifies the number of columns of the matrix A.


The value of n must be at least zero.

alpha Specifies the scalar alpha.


x Array, size at least (1 + (m - 1)*abs(incx)). Before entry, the


incremented array x must contain the m-element vector x.

incx Specifies the increment for the elements of x.


The value of incx must not be zero.

y Array, size at least (1 + (n - 1)*abs(incy)). Before entry, the


incremented array y must contain the n-element vector y.

incy Specifies the increment for the elements of y.


The value of incy must not be zero.

a Array, size lda*k.

For Layout = CblasColMajor, k is n. Before entry, the leading m-by-n part


of the array a must contain the matrix A.

For Layout = CblasRowMajor, k is m. Before entry, the leading n-by-m part


of the array a must contain the matrix A.

lda Specifies the leading dimension of a as declared in the calling


(sub)program.
For Layout = CblasColMajor, the value of lda must be at least max(1,
m).
For Layout = CblasRowMajor, the value of lda must be at least max(1,
n).

Output Parameters

a Overwritten by the updated matrix.

cblas_?geru
Performs a rank-1 update (unconjugated) of a general
matrix.

Syntax
void cblas_cgeru (const CBLAS_LAYOUT Layout, const MKL_INT m, const MKL_INT n, const
void *alpha, const void *x, const MKL_INT incx, const void *y, const MKL_INT incy,
void *a, const MKL_INT lda);
void cblas_zgeru (const CBLAS_LAYOUT Layout, const MKL_INT m, const MKL_INT n, const
void *alpha, const void *x, const MKL_INT incx, const void *y, const MKL_INT incy,
void *a, const MKL_INT lda);

Include Files
• mkl.h

Description

The ?geru routines perform a matrix-vector operation defined as

A := alpha*x*y' + A,

where:

alpha is a scalar,
x is an m-element vector,
y is an n-element vector,
A is an m-by-n matrix.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

m Specifies the number of rows of the matrix A.


The value of m must be at least zero.

n Specifies the number of columns of the matrix A.


The value of n must be at least zero.

alpha Specifies the scalar alpha.

x Array, size at least (1 + (m - 1)*abs(incx)). Before entry, the


incremented array x must contain the m-element vector x.

incx Specifies the increment for the elements of x.


The value of incx must not be zero.

y Array, size at least (1 + (n - 1)*abs(incy)). Before entry, the


incremented array y must contain the n-element vector y.

incy Specifies the increment for the elements of y.


The value of incy must not be zero.

a Array, size lda*k.

For Layout = CblasColMajor, k is n. Before entry, the leading m-by-n part


of the array a must contain the matrix A.

For Layout = CblasRowMajor, k is m. Before entry, the leading n-by-m part


of the array a must contain the matrix A.

lda Specifies the leading dimension of a as declared in the calling


(sub)program.
For Layout = CblasColMajor, the value of lda must be at least max(1,
m).
For Layout = CblasRowMajor, the value of lda must be at least max(1,
n).

Output Parameters

a Overwritten by the updated matrix.

cblas_?hbmv
Computes a matrix-vector product using a Hermitian
band matrix.


Syntax
void cblas_chbmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const MKL_INT k, const void *alpha, const void *a, const MKL_INT lda, const void *x,
const MKL_INT incx, const void *beta, void *y, const MKL_INT incy);
void cblas_zhbmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const MKL_INT k, const void *alpha, const void *a, const MKL_INT lda, const void *x,
const MKL_INT incx, const void *beta, void *y, const MKL_INT incy);

Include Files
• mkl.h

Description

The ?hbmv routines perform a matrix-vector operation defined as

y := alpha*A*x + beta*y,

where:
alpha and beta are scalars,
x and y are n-element vectors,
A is an n-by-n Hermitian band matrix, with k super-diagonals.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the upper or lower triangular part of the Hermitian band
matrix A is used:
If uplo = CblasUpper, then the upper triangular part of the matrix A is
used.
If uplo = CblasLower, then the lower triangular part of the matrix A is used.

n Specifies the order of the matrix A. The value of n must be at least zero.

k For uplo = CblasUpper: Specifies the number of super-diagonals of the


matrix A.
For uplo = CblasLower: Specifies the number of sub-diagonals of the
matrix A.
The value of k must satisfy 0≤k.

alpha Specifies the scalar alpha.

a Array, size lda*n.

Layout = CblasColMajor:
Before entry with uplo = CblasUpper, the leading (k + 1) by n part of
the array a must contain the upper triangular band part of the Hermitian
matrix. The matrix must be supplied column-by-column, with the leading
diagonal of the matrix in row k of the array, the first super-diagonal starting
at position 1 in row (k - 1), and so on. The top left k by k triangle of the
array a is not referenced.

The following program segment transfers the upper triangular part of a
Hermitian band matrix from conventional full matrix storage (matrix, with
leading dimension ldm) to band storage (a, with leading dimension lda):

for (j = 0; j < n; j++) {


m = k - j;
for (i = max( 0, j - k); i <= j; i++) {
a[(m+i) + j*lda] = matrix[i + j*ldm];
}
}

Before entry with uplo = CblasLower, the leading (k + 1) by n part of


the array a must contain the lower triangular band part of the Hermitian
matrix, supplied column-by-column, with the leading diagonal of the matrix
in row 0 of the array, the first sub-diagonal starting at position 0 in row 1,
and so on. The bottom right k by k triangle of the array a is not referenced.
The following program segment transfers the lower triangular part of a
Hermitian band matrix from conventional full matrix storage (matrix, with
leading dimension ldm) to band storage (a, with leading dimension lda):

for (j = 0; j < n; j++) {


m = -j;
for (i = j; i < min(n, j + k + 1); i++) {
a[(m+i) + j*lda] = matrix[i + j*ldm];
}
}

Layout = CblasRowMajor:
Before entry with uplo = CblasUpper, the leading (k + 1)-by-n part of
array a must contain the upper triangular band part of the Hermitian
matrix. The matrix must be supplied row-by-row, with the leading diagonal
of the matrix in column 0 of the array, the first super-diagonal starting at
position 0 in column 1, and so on. The bottom right k-by-k triangle of array
a is not referenced.
The following program segment transfers the upper triangular part of a
Hermitian band matrix from row-major full matrix storage (matrix with
leading dimension ldm) to row-major band storage (a, with leading
dimension lda):

for (i = 0; i < n; i++) {


m = -i;
for (j = i; j < min(n, i+k+1); j++) {
a[(m+j) + i*lda] = matrix[j + i*ldm];
}
}

Before entry with uplo = CblasLower, the leading (k + 1)-by-n part of


array a must contain the lower triangular band part of the Hermitian matrix,
supplied row-by-row, with the leading diagonal of the matrix in column k of
the array, the first sub-diagonal starting at position 1 in column k-1, and so
on. The top left k-by-k triangle of array a is not referenced.


The following program segment transfers the lower triangular part of a


Hermitian row-major band matrix from row-major full matrix storage
(matrix, with leading dimension ldm) to row-major band storage (a, with
leading dimension lda):

for (i = 0; i < n; i++) {


m = k - i;
for (j = max(0, i-k); j <= i; j++) {
a[(m+j) + i*lda] = matrix[j + i*ldm];
}
}

The imaginary parts of the diagonal elements need not be set and are
assumed to be zero.

lda Specifies the leading dimension of a as declared in the calling


(sub)program. The value of lda must be at least (k + 1).

x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the


incremented array x must contain the vector x.

incx Specifies the increment for the elements of x.


The value of incx must not be zero.

beta Specifies the scalar beta.

y Array, size at least (1 + (n - 1)*abs(incy)). Before entry, the


incremented array y must contain the vector y.

incy Specifies the increment for the elements of y.


The value of incy must not be zero.

Output Parameters

y Overwritten by the updated vector y.

cblas_?hemv
Computes a matrix-vector product using a Hermitian
matrix.

Syntax
void cblas_chemv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const void *alpha, const void *a, const MKL_INT lda, const void *x, const MKL_INT
incx, const void *beta, void *y, const MKL_INT incy);
void cblas_zhemv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const void *alpha, const void *a, const MKL_INT lda, const void *x, const MKL_INT
incx, const void *beta, void *y, const MKL_INT incy);

Include Files
• mkl.h

Description

The ?hemv routines perform a matrix-vector operation defined as

y := alpha*A*x + beta*y,

where:
alpha and beta are scalars,
x and y are n-element vectors,
A is an n-by-n Hermitian matrix.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the upper or lower triangular part of the array a is used.

If uplo = CblasUpper, then the upper triangular part of the array a is used.

If uplo = CblasLower, then the lower triangular part of the array a is used.

n Specifies the order of the matrix A. The value of n must be at least zero.

alpha Specifies the scalar alpha.

a Array, size lda*n.

Before entry with uplo = CblasUpper, the leading n-by-n upper triangular
part of the array a must contain the upper triangular part of the Hermitian
matrix and the strictly lower triangular part of a is not referenced. Before
entry with uplo = CblasLower, the leading n-by-n lower triangular part of
the array a must contain the lower triangular part of the Hermitian matrix
and the strictly upper triangular part of a is not referenced.
The imaginary parts of the diagonal elements need not be set and are
assumed to be zero.

lda Specifies the leading dimension of a as declared in the calling


(sub)program. The value of lda must be at least max(1, n).

x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the


incremented array x must contain the n-element vector x.

incx Specifies the increment for the elements of x.


The value of incx must not be zero.

beta Specifies the scalar beta. When beta is supplied as zero then y need not be
set on input.

y Array, size at least (1 + (n - 1)*abs(incy)). Before entry, the


incremented array y must contain the n-element vector y.

incy Specifies the increment for the elements of y.


The value of incy must not be zero.

Output Parameters

y Overwritten by the updated vector y.
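
The following short example (illustrative only; the 2-by-2 Hermitian matrix and vector are arbitrary) shows how the complex scalars alpha and beta are passed by address; it assumes the default MKL_Complex16 type from mkl.h with .real and .imag fields:

#include <stdio.h>
#include "mkl.h"

int main(void) {
    MKL_INT n = 2, lda = 2;
    MKL_Complex16 alpha = {1.0, 0.0}, beta = {0.0, 0.0};
    /* Column-major Hermitian matrix A = [2 1+i; 1-i 3];
       with uplo = CblasUpper only a[0], a[2], a[3] are referenced. */
    MKL_Complex16 a[4] = { {2.0, 0.0}, {0.0, 0.0},      /* column 0: A(1,1), A(2,1) (not referenced) */
                           {1.0, 1.0}, {3.0, 0.0} };    /* column 1: A(1,2), A(2,2) */
    MKL_Complex16 x[2] = { {1.0, 0.0}, {1.0, 0.0} };
    MKL_Complex16 y[2];
    cblas_zhemv(CblasColMajor, CblasUpper, n, &alpha, a, lda,
                x, 1, &beta, y, 1);                     /* y = A*x = (3+i, 4-i) */
    printf("y[0] = %f%+fi, y[1] = %f%+fi\n",
           y[0].real, y[0].imag, y[1].real, y[1].imag);
    return 0;
}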


cblas_?her
Performs a rank-1 update of a Hermitian matrix.

Syntax
void cblas_cher (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const float alpha, const void *x, const MKL_INT incx, void *a, const MKL_INT lda);
void cblas_zher (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const double alpha, const void *x, const MKL_INT incx, void *a, const MKL_INT lda);

Include Files
• mkl.h

Description

The ?her routines perform a matrix-vector operation defined as

A := alpha*x*conjg(x') + A,

where:
alpha is a real scalar,
x is an n-element vector,
A is an n-by-n Hermitian matrix.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the upper or lower triangular part of the array a is used.

If uplo = CblasUpper, then the upper triangular part of the array a is used.

If uplo = CblasLower, then the lower triangular part of the array a is used.

n Specifies the order of the matrix A. The value of n must be at least zero.

alpha Specifies the scalar alpha.

x Array, dimension at least (1 + (n - 1)*abs(incx)). Before entry, the


incremented array x must contain the n-element vector x.

incx Specifies the increment for the elements of x.


The value of incx must not be zero.

a Array, size lda*n.

Before entry with uplo = CblasUpper, the leading n-by-n upper triangular
part of the array a must contain the upper triangular part of the Hermitian
matrix and the strictly lower triangular part of a is not referenced.
Before entry with uplo = CblasLower, the leading n-by-n lower triangular
part of the array a must contain the lower triangular part of the Hermitian
matrix and the strictly upper triangular part of a is not referenced.

The imaginary parts of the diagonal elements need not be set and are
assumed to be zero.

lda Specifies the leading dimension of a as declared in the calling


(sub)program. The value of lda must be at least max(1, n).

Output Parameters

a With uplo = CblasUpper, the upper triangular part of the array a is


overwritten by the upper triangular part of the updated matrix.
With uplo = CblasLower, the lower triangular part of the array a is
overwritten by the lower triangular part of the updated matrix.
The imaginary parts of the diagonal elements are set to zero.

cblas_?her2
Performs a rank-2 update of a Hermitian matrix.

Syntax
void cblas_cher2 (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const void *alpha, const void *x, const MKL_INT incx, const void *y, const MKL_INT
incy, void *a, const MKL_INT lda);
void cblas_zher2 (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const void *alpha, const void *x, const MKL_INT incx, const void *y, const MKL_INT
incy, void *a, const MKL_INT lda);

Include Files
• mkl.h

Description

The ?her2 routines perform a matrix-vector operation defined as

A := alpha *x*conjg(y') + conjg(alpha)*y *conjg(x') + A,

where:
alpha is a scalar,
x and y are n-element vectors,
A is an n-by-n Hermitian matrix.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the upper or lower triangular part of the array a is used.

If uplo = CblasUpper, then the upper triangular part of the array a is used.

If uplo = CblasLower, then the lower triangular part of the array a is used.

n Specifies the order of the matrix A. The value of n must be at least zero.


alpha Specifies the scalar alpha.

x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the


incremented array x must contain the n-element vector x.

incx Specifies the increment for the elements of x.


The value of incx must not be zero.

y Array, size at least (1 + (n - 1)*abs(incy)). Before entry, the


incremented array y must contain the n-element vector y.

incy Specifies the increment for the elements of y.


The value of incy must not be zero.

a Array, size lda*n.

Before entry with uplo = CblasUpper, the leading n-by-n upper triangular
part of the array a must contain the upper triangular part of the Hermitian
matrix and the strictly lower triangular part of a is not referenced.
Before entry with uplo = CblasLower, the leading n-by-n lower triangular
part of the array a must contain the lower triangular part of the Hermitian
matrix and the strictly upper triangular part of a is not referenced.
The imaginary parts of the diagonal elements need not be set and are
assumed to be zero.

lda Specifies the leading dimension of a as declared in the calling


(sub)program. The value of lda must be at least max(1, n).

Output Parameters

a With uplo = CblasUpper, the upper triangular part of the array a is


overwritten by the upper triangular part of the updated matrix.
With uplo = CblasLower, the lower triangular part of the array a is
overwritten by the lower triangular part of the updated matrix.
The imaginary parts of the diagonal elements are set to zero.

cblas_?hpmv
Computes a matrix-vector product using a Hermitian
packed matrix.

Syntax
void cblas_chpmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const void *alpha, const void *ap, const void *x, const MKL_INT incx, const void
*beta, void *y, const MKL_INT incy);
void cblas_zhpmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const void *alpha, const void *ap, const void *x, const MKL_INT incx, const void
*beta, void *y, const MKL_INT incy);

Include Files
• mkl.h

Description

The ?hpmv routines perform a matrix-vector operation defined as

y := alpha*A*x + beta*y,

where:
alpha and beta are scalars,
x and y are n-element vectors,
A is an n-by-n Hermitian matrix, supplied in packed form.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the upper or lower triangular part of the matrix A is
supplied in the packed array ap.

If uplo = CblasUpper, then the upper triangular part of the matrix A is supplied in the packed array ap.

If uplo = CblasLower, then the lower triangular part of the matrix A is supplied in the packed array ap.

n Specifies the order of the matrix A. The value of n must be at least zero.

alpha Specifies the scalar alpha.

ap Array, size at least ((n*(n + 1))/2).

For Layout = CblasColMajor:

Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular part of the Hermitian matrix packed sequentially, column-by-
column, so that ap[0] contains A1, 1, ap[1] and ap[2] contain A1, 2 and
A2, 2 respectively, and so on. Before entry with uplo = CblasLower, the
array ap must contain the lower triangular part of the Hermitian matrix
packed sequentially, column-by-column, so that ap[0] contains A1, 1,
ap[1] and ap[2] contain A2, 1 and A3, 1 respectively, and so on.
For Layout = CblasRowMajor:

Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular part of the Hermitian matrix packed sequentially, row-by-row,
ap[0] contains A1, 1, ap[1] and ap[2] contain A1, 2 and A1, 3 respectively,
and so on. Before entry with uplo = CblasLower, the array ap must
contain the lower triangular part of the Hermitian matrix packed
sequentially, row-by-row, so that ap[0] contains A1, 1, ap[1] and ap[2]
contain A2, 1 and A2, 2 respectively, and so on.
The imaginary parts of the diagonal elements need not be set and are
assumed to be zero.

x Array, size at least (1 +(n - 1)*abs(incx)). Before entry, the


incremented array x must contain the n-element vector x.

incx Specifies the increment for the elements of x.


The value of incx must not be zero.

beta Specifies the scalar beta.


When beta is equal to zero then y need not be set on input.

y Array, size at least (1 + (n - 1)*abs(incy)). Before entry, the


incremented array y must contain the n-element vector y.

incy Specifies the increment for the elements of y.


The value of incy must not be zero.

Output Parameters

y Overwritten by the updated vector y.

cblas_?hpr
Performs a rank-1 update of a Hermitian packed
matrix.

Syntax
void cblas_chpr (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const float alpha, const void *x, const MKL_INT incx, void *ap);
void cblas_zhpr (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const double alpha, const void *x, const MKL_INT incx, void *ap);

Include Files
• mkl.h

Description

The ?hpr routines perform a matrix-vector operation defined as

A := alpha*x*conjg(x') + A,

where:
alpha is a real scalar,
x is an n-element vector,
A is an n-by-n Hermitian matrix, supplied in packed form.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the upper or lower triangular part of the matrix A is
supplied in the packed array ap.

If uplo = CblasUpper, the upper triangular part of the matrix A is supplied in the packed array ap.

If uplo = CblasLower, the lower triangular part of the matrix A is supplied in the packed array ap.

n Specifies the order of the matrix A. The value of n must be at least zero.

alpha Specifies the scalar alpha.

x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the


incremented array x must contain the n-element vector x.

incx Specifies the increment for the elements of x. incx must not be zero.

ap Array, size at least ((n*(n + 1))/2).

For Layout = CblasColMajor:

Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular part of the Hermitian matrix packed sequentially, column-by-
column, so that ap[0] contains A1, 1, ap[1] and ap[2] contain A1, 2 and
A2, 2 respectively, and so on.
Before entry with uplo = CblasLower, the array ap must contain the lower
triangular part of the Hermitian matrix packed sequentially, column-by-
column, so that ap[0] contains A1, 1, ap[1] and ap[2] contain A2, 1 and
A3, 1 respectively, and so on.
For Layout = CblasRowMajor:

Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular part of the Hermitian matrix packed sequentially, row-by-row,
ap[0] contains A1, 1, ap[1] and ap[2] contain A1, 2 and A1, 3 respectively,
and so on.
Before entry with uplo = CblasLower, the array ap must contain the lower
triangular part of the Hermitian matrix packed sequentially, row-by-row, so
that ap[0] contains A1, 1, ap[1] and ap[2] contain A2, 1 and A2, 2
respectively, and so on.
The imaginary parts of the diagonal elements need not be set and are
assumed to be zero.

Output Parameters

ap With uplo = CblasUpper, overwritten by the upper triangular part of the


updated matrix.
With uplo = CblasLower, overwritten by the lower triangular part of the
updated matrix.
The imaginary parts of the diagonal elements are set to zero.

cblas_?hpr2
Performs a rank-2 update of a Hermitian packed
matrix.

Syntax
void cblas_chpr2 (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const void *alpha, const void *x, const MKL_INT incx, const void *y, const MKL_INT
incy, void *ap);


void cblas_zhpr2 (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const void *alpha, const void *x, const MKL_INT incx, const void *y, const MKL_INT
incy, void *ap);

Include Files
• mkl.h

Description

The ?hpr2 routines perform a matrix-vector operation defined as

A := alpha*x*conjg(y') + conjg(alpha)*y*conjg(x') + A,

where:
alpha is a scalar,
x and y are n-element vectors,
A is an n-by-n Hermitian matrix, supplied in packed form.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the upper or lower triangular part of the matrix A is
supplied in the packed array ap.

If uplo = CblasUpper, then the upper triangular part of the matrix A is supplied in the packed array ap.

If uplo = CblasLower, then the lower triangular part of the matrix A is supplied in the packed array ap.

n Specifies the order of the matrix A. The value of n must be at least zero.

alpha Specifies the scalar alpha.

x Array, dimension at least (1 +(n - 1)*abs(incx)). Before entry, the


incremented array x must contain the n-element vector x.

incx Specifies the increment for the elements of x.


The value of incx must not be zero.

y Array, size at least (1 +(n - 1)*abs(incy)). Before entry, the


incremented array y must contain the n-element vector y.

incy Specifies the increment for the elements of y.


The value of incy must not be zero.

ap Array, size at least ((n*(n + 1))/2).

For Layout = CblasColMajor:

Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular part of the Hermitian matrix packed sequentially, column-by-
column, so that ap[0] contains A1, 1, ap[1] and ap[2] contain A1, 2 and
A2, 2 respectively, and so on.

Before entry with uplo = CblasLower, the array ap must contain the lower
triangular part of the Hermitian matrix packed sequentially, column-by-
column, so that ap[0] contains A1, 1, ap[1] and ap[2] contain A2, 1 and
A3, 1 respectively, and so on.
For Layout = CblasRowMajor:

Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular part of the Hermitian matrix packed sequentially, row-by-row,
ap[0] contains A1, 1, ap[1] and ap[2] contain A1, 2 and A1, 3 respectively,
and so on.
Before entry with uplo = CblasLower, the array ap must contain the lower
triangular part of the Hermitian matrix packed sequentially, row-by-row, so
that ap[0] contains A1, 1, ap[1] and ap[2] contain A2, 1 and A2, 2
respectively, and so on.
The imaginary parts of the diagonal elements need not be set and are
assumed to be zero.

Output Parameters

ap With uplo = CblasUpper, overwritten by the upper triangular part of the


updated matrix.
With uplo = CblasLower, overwritten by the lower triangular part of the
updated matrix.
The imaginary parts of the diagonal elements are set to zero.

cblas_?sbmv
Computes a matrix-vector product using a symmetric
band matrix.

Syntax
void cblas_ssbmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const MKL_INT k, const float alpha, const float *a, const MKL_INT lda, const float *x,
const MKL_INT incx, const float beta, float *y, const MKL_INT incy);
void cblas_dsbmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const MKL_INT k, const double alpha, const double *a, const MKL_INT lda, const double
*x, const MKL_INT incx, const double beta, double *y, const MKL_INT incy);

Include Files
• mkl.h

Description

The ?sbmv routines perform a matrix-vector operation defined as

y := alpha*A*x + beta*y,
where:
alpha and beta are scalars,
x and y are n-element vectors,
A is an n-by-n symmetric band matrix, with k super-diagonals.


Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the upper or lower triangular part of the band matrix A is
used:
if uplo = CblasUpper - upper triangular part;

if uplo = CblasLower - lower triangular part.

n Specifies the order of the matrix A. The value of n must be at least zero.

k Specifies the number of super-diagonals of the matrix A.


The value of k must satisfy 0≤k.

alpha Specifies the scalar alpha.

a Array, size lda*n. Before entry with uplo = CblasUpper, the leading (k
+ 1) by n part of the array a must contain the upper triangular band part
of the symmetric matrix, supplied column-by-column, with the leading
diagonal of the matrix in row k of the array, the first super-diagonal starting
at position 1 in row (k - 1), and so on. The top left k by k triangle of the
array a is not referenced.
The following program segment transfers the upper triangular part of a
symmetric band matrix from conventional full matrix storage (matrix, with
leading dimension ldm) to band storage (a, with leading dimension lda):

for (j = 0; j < n; j++) {


m = k - j;
for (i = max( 0, j - k); i <= j; i++) {
a[(m+i) + j*lda] = matrix[i + j*ldm];
}
}

Before entry with uplo = CblasLower, the leading (k + 1) by n part of


the array a must contain the lower triangular band part of the symmetric
matrix, supplied column-by-column, with the leading diagonal of the matrix
in row 0 of the array, the first sub-diagonal starting at position 0 in row 1,
and so on. The bottom right k by k triangle of the array a is not referenced.
The following program segment transfers the lower triangular part of a
symmetric band matrix from conventional full matrix storage (matrix, with
leading dimension ldm) to band storage (a, with leading dimension lda):

for (j = 0; j < n; j++) {


m = -j;
for (i = j; i < min(n, j + k + 1); i++) {
a[(m+i) + j*lda] = matrix[i + j*ldm];
}
}

Layout = CblasRowMajor:
Before entry with uplo = CblasUpper, the leading (k + 1)-by-n part of
array a must contain the upper triangular band part of the symmetric
matrix. The matrix must be supplied row-by-row, with the leading diagonal

of the matrix in column 0 of the array, the first super-diagonal starting at
position 0 in column 1, and so on. The bottom right k-by-k triangle of array
a is not referenced.
The following program segment transfers the upper triangular part of a
symmetric band matrix from row-major full matrix storage (matrix with
leading dimension ldm) to row-major band storage (a, with leading
dimension lda):

for (i = 0; i < n; i++) {


m = -i;
for (j = i; j < min(n, i+k+1); j++) {
a[(m+j) + i*lda] = matrix[j + i*ldm];
}
}

Before entry with uplo = CblasLower, the leading (k + 1)-by-n part of


array a must contain the lower triangular band part of the symmetric
matrix, supplied row-by-row, with the leading diagonal of the matrix in
column k of the array, the first sub-diagonal starting at position 1 in column
k-1, and so on. The top left k-by-k triangle of array a is not referenced.
The following program segment transfers the lower triangular part of a
symmetric row-major band matrix from row-major full matrix storage
(matrix, with leading dimension ldm) to row-major band storage (a, with
leading dimension lda):

for (i = 0; i < n; i++) {


m = k - i;
for (j = max(0, i-k); j <= i; j++) {
a[(m+j) + i*lda] = matrix[j + i*ldm];
}
}

lda Specifies the leading dimension of a as declared in the calling


(sub)program. The value of lda must be at least (k + 1).

x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the


incremented array x must contain the vector x.

incx Specifies the increment for the elements of x.


The value of incx must not be zero.

beta Specifies the scalar beta.

y Array, size at least (1 + (n - 1)*abs(incy)). Before entry, the


incremented array y must contain the vector y.

incy Specifies the increment for the elements of y.


The value of incy must not be zero.

Output Parameters

y Overwritten by the updated vector y.


cblas_?spmv
Computes a matrix-vector product using a symmetric
packed matrix.

Syntax
void cblas_sspmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const float alpha, const float *ap, const float *x, const MKL_INT incx, const float
beta, float *y, const MKL_INT incy);
void cblas_dspmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const double alpha, const double *ap, const double *x, const MKL_INT incx, const double
beta, double *y, const MKL_INT incy);

Include Files
• mkl.h

Description

The ?spmv routines perform a matrix-vector operation defined as

y := alpha*A*x + beta*y,

where:
alpha and beta are scalars,
x and y are n-element vectors,
A is an n-by-n symmetric matrix, supplied in packed form.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the upper or lower triangular part of the matrix A is
supplied in the packed array ap.

If uplo = CblasUpper, then the upper triangular part of the matrix A is supplied in the packed array ap.

If uplo = CblasLower, then the lower triangular part of the matrix A is supplied in the packed array ap.

n Specifies the order of the matrix A. The value of n must be at least zero.

alpha Specifies the scalar alpha.

ap Array, size at least ((n*(n + 1))/2).

For Layout = CblasColMajor:

Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular part of the symmetric matrix packed sequentially, column-by-
column, so that ap[0] contains A1, 1, ap[1] and ap[2] contain A1, 2 and
A2, 2 respectively, and so on. Before entry with uplo = CblasLower, the
array ap must contain the lower triangular part of the symmetric matrix
packed sequentially, column-by-column, so that ap[0] contains A1, 1,
ap[1] and ap[2] contain A2, 1 and A3, 1 respectively, and so on.

For Layout = CblasRowMajor:

Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular part of the symmetric matrix packed sequentially, row-by-row,
ap[0] contains A1, 1, ap[1] and ap[2] contain A1, 2 and A1, 3 respectively,
and so on. Before entry with uplo = CblasLower, the array ap must
contain the lower triangular part of the symmetric matrix packed
sequentially, row-by-row, so that ap[0] contains A1, 1, ap[1] and ap[2]
contain A2, 1 and A2, 2 respectively, and so on.

x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the


incremented array x must contain the n-element vector x.

incx Specifies the increment for the elements of x.


The value of incx must not be zero.

beta Specifies the scalar beta.


When beta is supplied as zero, then y need not be set on input.

y Array, size at least (1 + (n - 1)*abs(incy)). Before entry, the


incremented array y must contain the n-element vector y.

incy Specifies the increment for the elements of y.


The value of incy must not be zero.

Output Parameters

y Overwritten by the updated vector y.
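
The following short example (illustrative only; the matrix values are arbitrary) shows a 3-by-3 symmetric matrix packed column by column into ap with uplo = CblasUpper and column-major layout:

#include <stdio.h>
#include "mkl.h"

int main(void) {
    MKL_INT n = 3;
    /* Upper triangle of the symmetric matrix
           [1 2 3]
           [2 4 5]
           [3 5 6]
       packed column by column: A(1,1), A(1,2), A(2,2), A(1,3), A(2,3), A(3,3) */
    double ap[6] = {1.0, 2.0, 4.0, 3.0, 5.0, 6.0};
    double x[3] = {1.0, 1.0, 1.0};
    double y[3];
    cblas_dspmv(CblasColMajor, CblasUpper, n, 1.0, ap, x, 1, 0.0, y, 1);   /* y = A*x */
    printf("%f %f %f\n", y[0], y[1], y[2]);    /* prints 6 11 14 */
    return 0;
}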

cblas_?spr
Performs a rank-1 update of a symmetric packed
matrix.

Syntax
void cblas_sspr (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const float alpha, const float *x, const MKL_INT incx, float *ap);
void cblas_dspr (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const double alpha, const double *x, const MKL_INT incx, double *ap);

Include Files
• mkl.h

Description

The ?spr routines perform a matrix-vector operation defined as

A := alpha*x*x' + A,

where:
alpha is a real scalar,
x is an n-element vector,
A is an n-by-n symmetric matrix, supplied in packed form.


Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the upper or lower triangular part of the matrix A is
supplied in the packed array ap.

If uplo = CblasUpper, then the upper triangular part of the matrix A is supplied in the packed array ap.

If uplo = CblasLower, then the lower triangular part of the matrix A is supplied in the packed array ap.

n Specifies the order of the matrix A. The value of n must be at least zero.

alpha Specifies the scalar alpha.

x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the


incremented array x must contain the n-element vector x.

incx Specifies the increment for the elements of x.


The value of incx must not be zero.

ap For Layout = CblasColMajor:

Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular part of the symmetric matrix packed sequentially, column-by-
column, so that ap[0] contains A1, 1, ap[1] and ap[2] contain A1, 2 and
A2, 2 respectively, and so on.
Before entry with uplo = CblasLower, the array ap must contain the lower
triangular part of the symmetric matrix packed sequentially, column-by-
column, so that ap[0] contains A1, 1, ap[1] and ap[2] contain A2, 1 and
A3, 1 respectively, and so on.
For Layout = CblasRowMajor:

Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular part of the symmetric matrix packed sequentially, row-by-row,
ap[0] contains A1, 1, ap[1] and ap[2] contain A1, 2 and A1, 3 respectively,
and so on.
Before entry with uplo = CblasLower, the array ap must contain the lower
triangular part of the symmetric matrix packed sequentially, row-by-row, so
that ap[0] contains A1, 1, ap[1] and ap[2] contain A2, 1 and A2, 2
respectively, and so on.

Output Parameters

ap With uplo = CblasUpper, overwritten by the upper triangular part of the


updated matrix.
With uplo = CblasLower, overwritten by the lower triangular part of the
updated matrix.

cblas_?spr2
Performs a rank-2 update of a symmetric packed
matrix.

Syntax
void cblas_sspr2 (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const float alpha, const float *x, const MKL_INT incx, const float *y, const MKL_INT
incy, float *ap);
void cblas_dspr2 (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const double alpha, const double *x, const MKL_INT incx, const double *y, const MKL_INT
incy, double *ap);

Include Files
• mkl.h

Description

The ?spr2 routines perform a matrix-vector operation defined as

A:= alpha*x*y'+ alpha*y*x' + A,

where:
alpha is a scalar,
x and y are n-element vectors,
A is an n-by-n symmetric matrix, supplied in packed form.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the upper or lower triangular part of the matrix A is
supplied in the packed array ap.

If uplo = CblasUpper, then the upper triangular part of the matrix A is


supplied in the packed array ap .

If uplo = CblasLower, then the lower triangular part of the matrix A is


supplied in the packed array ap .

n Specifies the order of the matrix A. The value of n must be at least zero.

alpha Specifies the scalar alpha.

x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the


incremented array x must contain the n-element vector x.

incx Specifies the increment for the elements of x.


The value of incx must not be zero.

y Array, size at least (1 + (n - 1)*abs(incy)). Before entry, the


incremented array y must contain the n-element vector y.


incy Specifies the increment for the elements of y. The value of incy must not be
zero.

ap For Layout = CblasColMajor:

Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular part of the symmetric matrix packed sequentially, column-by-
column, so that ap[0] contains A1, 1, ap[1] and ap[2] contain A1, 2 and
A2, 2 respectively, and so on.
Before entry with uplo = CblasLower, the array ap must contain the lower
triangular part of the symmetric matrix packed sequentially, column-by-
column, so that ap[0] contains A1, 1, ap[1] and ap[2] contain A2, 1 and
A3, 1 respectively, and so on.
For Layout = CblasRowMajor:

Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular part of the symmetric matrix packed sequentially, row-by-row,
ap[0] contains A1, 1, ap[1] and ap[2] contain A1, 2 and A1, 3 respectively,
and so on.
Before entry with uplo = CblasLower, the array ap must contain the lower
triangular part of the symmetric matrix packed sequentially, row-by-row, so
that ap[0] contains A1, 1, ap[1] and ap[2] contain A2, 1 and A2, 2
respectively, and so on.

Output Parameters

ap With uplo = CblasUpper, overwritten by the upper triangular part of the


updated matrix.
With uplo = CblasLower, overwritten by the lower triangular part of the
updated matrix.
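
A similar informal sketch for cblas_dspr2 (again with arbitrary data),
starting from a zero matrix in column-major upper packed storage:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    MKL_INT n = 2;
    double alpha = 1.0;
    double x[2] = { 1.0, 2.0 }, y[2] = { 3.0, 4.0 };
    /* Upper triangle of A packed column-by-column: { A11, A12, A22 }; A = 0 */
    double ap[3] = { 0.0, 0.0, 0.0 };

    /* A := alpha*x*y' + alpha*y*x' + A */
    cblas_dspr2(CblasColMajor, CblasUpper, n, alpha, x, 1, y, 1, ap);

    printf("%.1f %.1f %.1f\n", ap[0], ap[1], ap[2]);   /* expected: 6.0 10.0 16.0 */
    return 0;
}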

cblas_?symv
Computes a matrix-vector product for a symmetric
matrix.

Syntax
void cblas_ssymv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const float alpha, const float *a, const MKL_INT lda, const float *x, const MKL_INT
incx, const float beta, float *y, const MKL_INT incy);
void cblas_dsymv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const double alpha, const double *a, const MKL_INT lda, const double *x, const MKL_INT
incx, const double beta, double *y, const MKL_INT incy);

Include Files
• mkl.h

Description

The ?symv routines perform a matrix-vector operation defined as

y := alpha*A*x + beta*y,

where:

alpha and beta are scalars,
x and y are n-element vectors,
A is an n-by-n symmetric matrix.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the upper or lower triangular part of the array a is used.

If uplo = CblasUpper, then the upper triangular part of the array a is


used.
If uplo = CblasLower, then the lower triangular part of the array a is used.

n Specifies the order of the matrix A. The value of n must be at least zero.

alpha Specifies the scalar alpha.

a Array, size lda*n.

Before entry with uplo = CblasUpper, the leading n-by-n upper triangular
part of the array a must contain the upper triangular part of the symmetric
matrix A and the strictly lower triangular part of a is not referenced. Before
entry with uplo = CblasLower, the leading n-by-n lower triangular part of
the array a must contain the lower triangular part of the symmetric matrix
A and the strictly upper triangular part of a is not referenced.

lda Specifies the leading dimension of a as declared in the calling


(sub)program. The value of lda must be at least max(1, n).

x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the


incremented array x must contain the n-element vector x.

incx Specifies the increment for the elements of x.


The value of incx must not be zero.

beta Specifies the scalar beta.


When beta is supplied as zero, then y need not be set on input.

y Array, size at least (1 + (n - 1)*abs(incy)). Before entry, the


incremented array y must contain the n-element vector y.

incy Specifies the increment for the elements of y.


The value of incy must not be zero.

Output Parameters

y Overwritten by the updated vector y.
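
An informal calling sketch for cblas_dsymv (not from the MKL example suite;
the data values are arbitrary), computing y := A*x for a 3-by-3 symmetric
matrix stored in full column-major form:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    MKL_INT n = 3, lda = 3;
    double alpha = 1.0, beta = 0.0;
    /* Full column-major storage of the symmetric matrix
       A = [ 1 2 3 ; 2 4 5 ; 3 5 6 ]; only the upper triangle is referenced */
    double a[9] = { 1.0, 2.0, 3.0,
                    2.0, 4.0, 5.0,
                    3.0, 5.0, 6.0 };
    double x[3] = { 1.0, 1.0, 1.0 };
    double y[3];   /* beta = 0, so y need not be set on input */

    /* y := alpha*A*x + beta*y */
    cblas_dsymv(CblasColMajor, CblasUpper, n, alpha, a, lda, x, 1, beta, y, 1);

    printf("%.1f %.1f %.1f\n", y[0], y[1], y[2]);   /* expected: 6.0 11.0 14.0 */
    return 0;
}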

cblas_?syr
Performs a rank-1 update of a symmetric matrix.


Syntax
void cblas_ssyr (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const float alpha, const float *x, const MKL_INT incx, float *a, const MKL_INT lda);
void cblas_dsyr (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const double alpha, const double *x, const MKL_INT incx, double *a, const MKL_INT lda);

Include Files
• mkl.h

Description

The ?syr routines perform a matrix-vector operation defined as

A := alpha*x*x' + A ,
where:
alpha is a real scalar,
x is an n-element vector,
A is an n-by-n symmetric matrix.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the upper or lower triangular part of the array a is used.

If uplo = CblasUpper, then the upper triangular part of the array a is


used.
If uplo = CblasLower, then the lower triangular part of the array a is used.

n Specifies the order of the matrix A. The value of n must be at least zero.

alpha Specifies the scalar alpha.

x Array, size at least (1 + (n-1)*abs(incx)). Before entry, the


incremented array x must contain the n-element vector x.

incx Specifies the increment for the elements of x.


The value of incx must not be zero.

a Array, size lda*n.

Before entry with uplo = CblasUpper, the leading n-by-n upper triangular
part of the array a must contain the upper triangular part of the symmetric
matrix A and the strictly lower triangular part of a is not referenced.

Before entry with uplo = CblasLower, the leading n-by-n lower triangular
part of the array a must contain the lower triangular part of the symmetric
matrix A and the strictly upper triangular part of a is not referenced.

lda Specifies the leading dimension of a as declared in the calling


(sub)program. The value of lda must be at least max(1, n).

Output Parameters

a With uplo = CblasUpper, the upper triangular part of the array a is


overwritten by the upper triangular part of the updated matrix.
With uplo = CblasLower, the lower triangular part of the array a is
overwritten by the lower triangular part of the updated matrix.
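
An informal sketch for cblas_dsyr with arbitrary data; note that only the
selected triangle of a is updated:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    MKL_INT n = 2, lda = 2;
    double alpha = 1.0;
    double x[2] = { 1.0, 2.0 };
    double a[4] = { 0.0, 0.0, 0.0, 0.0 };   /* column-major, A = 0 */

    /* A := alpha*x*x' + A; only the upper triangle of a is updated */
    cblas_dsyr(CblasColMajor, CblasUpper, n, alpha, x, 1, a, lda);

    /* expected (column-major): a = { 1.0, 0.0, 2.0, 4.0 } */
    printf("%.1f %.1f %.1f %.1f\n", a[0], a[1], a[2], a[3]);
    return 0;
}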

cblas_?syr2
Performs a rank-2 update of a symmetric matrix.

Syntax
void cblas_ssyr2 (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const float alpha, const float *x, const MKL_INT incx, const float *y, const MKL_INT
incy, float *a, const MKL_INT lda);
void cblas_dsyr2 (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const MKL_INT n,
const double alpha, const double *x, const MKL_INT incx, const double *y, const MKL_INT
incy, double *a, const MKL_INT lda);

Include Files
• mkl.h

Description

The ?syr2 routines perform a matrix-vector operation defined as

A := alpha*x*y'+ alpha*y*x' + A,

where:
alpha is a scalar,
x and y are n-element vectors,
A is an n-by-n symmetric matrix.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the upper or lower triangular part of the array a is used.

If uplo = CblasUpper, then the upper triangular part of the array a is


used.
If uplo = CblasLower, then the lower triangular part of the array a is used.

n Specifies the order of the matrix A. The value of n must be at least zero.

alpha Specifies the scalar alpha.

x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the


incremented array x must contain the n-element vector x.

incx Specifies the increment for the elements of x.


The value of incx must not be zero.

y Array, size at least (1 + (n - 1)*abs(incy)). Before entry, the


incremented array y must contain the n-element vector y.

incy Specifies the increment for the elements of y. The value of incy must not be
zero.

a Array, size lda*n.

Before entry with uplo = CblasUpper, the leading n-by-n upper triangular
part of the array a must contain the upper triangular part of the symmetric
matrix and the strictly lower triangular part of a is not referenced.
Before entry with uplo = CblasLower, the leading n-by-n lower triangular
part of the array a must contain the lower triangular part of the symmetric
matrix and the strictly upper triangular part of a is not referenced.

lda Specifies the leading dimension of a as declared in the calling


(sub)program. The value of lda must be at least max(1, n).

Output Parameters

a With uplo = CblasUpper, the upper triangular part of the array a is


overwritten by the upper triangular part of the updated matrix.
With uplo = CblasLower, the lower triangular part of the array a is
overwritten by the lower triangular part of the updated matrix.
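
An analogous informal sketch for cblas_dsyr2, again with arbitrary data:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    MKL_INT n = 2, lda = 2;
    double alpha = 1.0;
    double x[2] = { 1.0, 2.0 }, y[2] = { 3.0, 4.0 };
    double a[4] = { 0.0, 0.0, 0.0, 0.0 };   /* column-major, A = 0 */

    /* A := alpha*x*y' + alpha*y*x' + A; only the upper triangle is updated */
    cblas_dsyr2(CblasColMajor, CblasUpper, n, alpha, x, 1, y, 1, a, lda);

    /* expected (column-major): a = { 6.0, 0.0, 10.0, 16.0 } */
    printf("%.1f %.1f %.1f %.1f\n", a[0], a[1], a[2], a[3]);
    return 0;
}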

cblas_?tbmv
Computes a matrix-vector product using a triangular
band matrix.

Syntax
void cblas_stbmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const MKL_INT k, const
float *a, const MKL_INT lda, float *x, const MKL_INT incx);
void cblas_dtbmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const MKL_INT k, const
double *a, const MKL_INT lda, double *x, const MKL_INT incx);
void cblas_ctbmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const MKL_INT k, const
void *a, const MKL_INT lda, void *x, const MKL_INT incx);
void cblas_ztbmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const MKL_INT k, const
void *a, const MKL_INT lda, void *x, const MKL_INT incx);

Include Files
• mkl.h

Description

The ?tbmv routines perform one of the matrix-vector operations defined as

x := A*x, or x := A'*x, or x := conjg(A')*x,
where:
x is an n-element vector,
A is an n-by-n unit, or non-unit, upper or lower triangular band matrix, with (k +1) diagonals.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the matrix A is an upper or lower triangular matrix:

if uplo = CblasUpper, then the matrix is upper triangular;
if uplo = CblasLower, then the matrix is lower triangular.

trans Specifies the operation:


if trans=CblasNoTrans, then x := A*x;

if trans=CblasTrans, then x := A'*x;

if trans=CblasConjTrans, then x := conjg(A')*x.

diag Specifies whether the matrix A is unit triangular:


if diag = CblasUnit then the matrix is unit triangular;

if diag = CblasNonUnit , then the matrix is not unit triangular.

n Specifies the order of the matrix A. The value of n must be at least zero.

k On entry with uplo = CblasUpper, k specifies the number of super-diagonals
of the matrix A. On entry with uplo = CblasLower, k specifies the number
of sub-diagonals of the matrix A.
The value of k must satisfy 0≤k.

a Array, size lda*n.

Layout = CblasColMajor:
Before entry with uplo = CblasUpper, the leading (k + 1) by n part of
the array a must contain the upper triangular band part of the matrix of
coefficients, supplied column-by-column, with the leading diagonal of the
matrix in row k of the array, the first super-diagonal starting at position 1 in
row (k - 1), and so on. The top left k by k triangle of the array a is not
referenced. The following program segment transfers an upper triangular
band matrix from conventional full matrix storage (matrix, with leading
dimension ldm) to band storage (a, with leading dimension lda):

for (j = 0; j < n; j++) {
    m = k - j;
    for (i = max(0, j - k); i <= j; i++) {
        a[(m+i) + j*lda] = matrix[i + j*ldm];
    }
}


Before entry with uplo = CblasLower, the leading (k + 1) by n part of


the array a must contain the lower triangular band part of the matrix of
coefficients, supplied column-by-column, with the leading diagonal of the
matrix in row 0 of the array, the first sub-diagonal starting at position 0 in
row 1, and so on. The bottom right k by k triangle of the array a is not
referenced. The following program segment transfers a lower triangular
band matrix from conventional full matrix storage (matrix, with leading
dimension ldm) to band storage (a, with leading dimension lda):

for (j = 0; j < n; j++) {
    m = -j;
    for (i = j; i < min(n, j + k + 1); i++) {
        a[(m+i) + j*lda] = matrix[i + j*ldm];
    }
}

Note that when diag = CblasUnit , the elements of the array a


corresponding to the diagonal elements of the matrix are not referenced,
but are assumed to be unity.
Layout = CblasRowMajor:
Before entry with uplo = CblasUpper, the leading (k + 1)-by-n part of
array a must contain the upper triangular band part of the matrix of
coefficients. The matrix must be supplied row-by-row, with the leading
diagonal of the matrix in column 0 of the array, the first super-diagonal
starting at position 0 in column 1, and so on. The bottom right k-by-k
triangle of array a is not referenced.

The following program segment transfers the upper triangular part of a
triangular band matrix from row-major full matrix storage (matrix, with
leading dimension ldm) to row-major band storage (a, with leading
dimension lda):

for (i = 0; i < n; i++) {
    m = -i;
    for (j = i; j < MIN(n, i+k+1); j++) {
        a[(m+j) + i*lda] = matrix[j + i*ldm];
    }
}

Before entry with uplo = CblasLower, the leading (k + 1)-by-n part of


array a must contain the lower triangular band part of the matrix of
coefficients, supplied row-by-row, with the leading diagonal of the matrix in
column k of the array, the first sub-diagonal starting at position 1 in column
k-1, and so on. The top left k-by-k triangle of array a is not referenced.
The following program segment transfers the lower triangular part of a
triangular band matrix from row-major full matrix storage
(matrix, with leading dimension ldm) to row-major band storage (a, with
leading dimension lda):

for (i = 0; i < n; i++) {
    m = k - i;
    for (j = max(0, i-k); j <= i; j++) {
        a[(m+j) + i*lda] = matrix[j + i*ldm];
    }
}

lda Specifies the leading dimension of a as declared in the calling
(sub)program. The value of lda must be at least (k + 1).

x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the


incremented array x must contain the n-element vector x.

incx Specifies the increment for the elements of x.


The value of incx must not be zero.

Output Parameters

x Overwritten with the transformed vector x.
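
An informal sketch for cblas_dtbmv (values arbitrary), using the
column-major band storage described above for an upper triangular matrix
with one super-diagonal:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    MKL_INT n = 3, k = 1, lda = 2;   /* one super-diagonal */
    /* Column-major band storage of the upper triangular band matrix
       A = [ 1 2 0 ; 0 3 4 ; 0 0 5 ]:
       diagonal in row k, super-diagonal in row k-1; a[0] is not referenced */
    double a[6] = { 0.0, 1.0,    /* column 0:   - , A11 */
                    2.0, 3.0,    /* column 1: A12, A22 */
                    4.0, 5.0 };  /* column 2: A23, A33 */
    double x[3] = { 1.0, 1.0, 1.0 };

    /* x := A*x */
    cblas_dtbmv(CblasColMajor, CblasUpper, CblasNoTrans, CblasNonUnit,
                n, k, a, lda, x, 1);

    printf("%.1f %.1f %.1f\n", x[0], x[1], x[2]);   /* expected: 3.0 7.0 5.0 */
    return 0;
}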

cblas_?tbsv
Solves a system of linear equations whose coefficients
are in a triangular band matrix.

Syntax
void cblas_stbsv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const MKL_INT k, const
float *a, const MKL_INT lda, float *x, const MKL_INT incx);
void cblas_dtbsv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const MKL_INT k, const
double *a, const MKL_INT lda, double *x, const MKL_INT incx);
void cblas_ctbsv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const MKL_INT k, const
void *a, const MKL_INT lda, void *x, const MKL_INT incx);
void cblas_ztbsv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const MKL_INT k, const
void *a, const MKL_INT lda, void *x, const MKL_INT incx);

Include Files
• mkl.h

Description

The ?tbsv routines solve one of the following systems of equations:

A*x = b, or A'*x = b, or conjg(A')*x = b,


where:
b and x are n-element vectors,
A is an n-by-n unit, or non-unit, upper or lower triangular band matrix, with (k + 1) diagonals.

The routine does not test for singularity or near-singularity.


Such tests must be performed before calling this routine.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).


uplo Specifies whether the matrix A is an upper or lower triangular matrix:


if uplo = CblasUpper the matrix is upper triangular;

if uplo = CblasLower, the matrix is lower triangular.

trans Specifies the system of equations:


if trans=CblasNoTrans, then A*x = b;

if trans=CblasTrans, then A'*x = b;

if trans=CblasConjTrans, then conjg(A')*x = b.

diag Specifies whether the matrix A is unit triangular:


if diag = CblasUnit then the matrix is unit triangular;

if diag = CblasNonUnit, then the matrix is not unit triangular.

n Specifies the order of the matrix A. The value of n must be at least zero.

k On entry with uplo = CblasUpper, k specifies the number of super-


diagonals of the matrix A. On entry with uplo = CblasLower, k specifies
the number of sub-diagonals of the matrix A.
The value of k must satisfy 0≤k.

a Array, size lda*n.

Layout = CblasColMajor:
Before entry with uplo = CblasUpper, the leading (k + 1) by n part of
the array a must contain the upper triangular band part of the matrix of
coefficients, supplied column-by-column, with the leading diagonal of the
matrix in row k of the array, the first super-diagonal starting at position 1 in
row (k - 1), and so on. The top left k by k triangle of the array a is not
referenced.
The following program segment transfers an upper triangular band matrix
from conventional full matrix storage (matrix, with leading dimension ldm)
to band storage (a, with leading dimension lda):

for (j = 0; j < n; j++) {
    m = k - j;
    for (i = max(0, j - k); i <= j; i++) {
        a[(m+i) + j*lda] = matrix[i + j*ldm];
    }
}

Before entry with uplo = CblasLower, the leading (k + 1) by n part of


the array a must contain the lower triangular band part of the matrix of
coefficients, supplied column-by-column, with the leading diagonal of the
matrix in row 0 of the array, the first sub-diagonal starting at position 0 in
row 1, and so on. The bottom right k by k triangle of the array a is not
referenced.

The following program segment transfers a lower triangular band matrix
from conventional full matrix storage (matrix, with leading dimension ldm)
to band storage (a, with leading dimension lda):

for (j = 0; j < n; j++) {
    m = -j;
    for (i = j; i < min(n, j + k + 1); i++) {
        a[(m+i) + j*lda] = matrix[i + j*ldm];
    }
}

When diag = CblasUnit, the elements of the array a corresponding to the


diagonal elements of the matrix are not referenced, but are assumed to be
unity.
Layout = CblasRowMajor:
Before entry with uplo = CblasUpper, the leading (k + 1)-by-n part of
array a must contain the upper triangular band part of the matrix of
coefficients. The matrix must be supplied row-by-row, with the leading
diagonal of the matrix in column 0 of the array, the first super-diagonal
starting at position 0 in column 1, and so on. The bottom right k-by-k
triangle of array a is not referenced.

The following program segment transfers the upper triangular part of a
triangular band matrix from row-major full matrix storage (matrix, with
leading dimension ldm) to row-major band storage (a, with leading
dimension lda):

for (i = 0; i < n; i++) {
    m = -i;
    for (j = i; j < MIN(n, i+k+1); j++) {
        a[(m+j) + i*lda] = matrix[j + i*ldm];
    }
}

Before entry with uplo = CblasLower, the leading (k + 1)-by-n part of


array a must contain the lower triangular band part of the matrix of
coefficients, supplied row-by-row, with the leading diagonal of the matrix in
column k of the array, the first sub-diagonal starting at position 1 in column
k-1, and so on. The top left k-by-k triangle of array a is not referenced.
The following program segment transfers the lower triangular part of a
triangular band matrix from row-major full matrix storage
(matrix, with leading dimension ldm) to row-major band storage (a, with
leading dimension lda):

for (i = 0; i < n; i++) {
    m = k - i;
    for (j = max(0, i-k); j <= i; j++) {
        a[(m+j) + i*lda] = matrix[j + i*ldm];
    }
}

lda Specifies the leading dimension of a as declared in the calling


(sub)program. The value of lda must be at least (k + 1).

x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the


incremented array x must contain the n-element right-hand side vector b.


incx Specifies the increment for the elements of x.


The value of incx must not be zero.

Output Parameters

x Overwritten with the solution vector x.
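
An informal sketch for cblas_dtbsv with arbitrary data; it solves A*x = b for
the same band matrix used in the cblas_?tbmv sketch above:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    MKL_INT n = 3, k = 1, lda = 2;
    /* Same column-major band storage as in the cblas_?tbmv sketch:
       A = [ 1 2 0 ; 0 3 4 ; 0 0 5 ], upper triangular, one super-diagonal */
    double a[6] = { 0.0, 1.0, 2.0, 3.0, 4.0, 5.0 };
    double x[3] = { 3.0, 7.0, 5.0 };   /* right-hand side b on entry */

    /* Solve A*x = b; x is overwritten with the solution */
    cblas_dtbsv(CblasColMajor, CblasUpper, CblasNoTrans, CblasNonUnit,
                n, k, a, lda, x, 1);

    printf("%.1f %.1f %.1f\n", x[0], x[1], x[2]);   /* expected: 1.0 1.0 1.0 */
    return 0;
}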

cblas_?tpmv
Computes a matrix-vector product using a triangular
packed matrix.

Syntax
void cblas_stpmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const float *ap, float
*x, const MKL_INT incx);
void cblas_dtpmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const double *ap, double
*x, const MKL_INT incx);
void cblas_ctpmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const void *ap, void
*x, const MKL_INT incx);
void cblas_ztpmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const void *ap, void
*x, const MKL_INT incx);

Include Files
• mkl.h

Description

The ?tpmv routines perform one of the matrix-vector operations defined as

x := A*x, or x := A'*x, or x := conjg(A')*x,


where:
x is an n-element vector,
A is an n-by-n unit, or non-unit, upper or lower triangular matrix, supplied in packed form.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the matrix A is upper or lower triangular:

if uplo = CblasUpper, then the matrix is upper triangular;
if uplo = CblasLower, then the matrix is lower triangular.

trans Specifies the operation:


if trans=CblasNoTrans, then x := A*x;

if trans=CblasTrans, then x := A'*x;

if trans=CblasConjTrans, then x := conjg(A')*x.

diag Specifies whether the matrix A is unit triangular:


if diag = CblasUnit then the matrix is unit triangular;

if diag = CblasNonUnit, then the matrix is not unit triangular.

n Specifies the order of the matrix A. The value of n must be at least zero.

ap Array, size at least ((n*(n + 1))/2).

For Layout = CblasColMajor:

Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular matrix packed sequentially, column-by-column, so that ap[0]
contains A1, 1, ap[1] and ap[2] contain A1, 2 and A2, 2 respectively, and
so on. Before entry with uplo = CblasLower, the array ap must contain the
lower triangular matrix packed sequentially, column-by-column, so that
ap[0] contains A1, 1, ap[1] and ap[2] contain A2, 1 and A3, 1
respectively, and so on. When diag = CblasUnit, the diagonal elements of
a are not referenced, but are assumed to be unity.
For Layout = CblasRowMajor:

Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular matrix packed sequentially, row-by-row, ap[0] contains A1, 1,
ap[1] and ap[2] contain A1, 2 and A1, 3 respectively, and so on.
Before entry with uplo = CblasLower, the array ap must contain the lower
triangular matrix packed sequentially, row-by-row, so that ap[0] contains
A1, 1, ap[1] and ap[2] contain A2, 1 and A2, 2 respectively, and so on.

x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the


incremented array x must contain the n-element vector x.

incx Specifies the increment for the elements of x.


The value of incx must not be zero.

Output Parameters

x Overwritten with the transformed vector x.
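
An informal sketch for cblas_dtpmv (values arbitrary), with the upper
triangular matrix packed column-by-column as described above:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    MKL_INT n = 3;
    /* Upper triangular matrix A = [ 1 2 3 ; 0 4 5 ; 0 0 6 ] packed
       column-by-column: { A11, A12, A22, A13, A23, A33 } */
    double ap[6] = { 1.0, 2.0, 4.0, 3.0, 5.0, 6.0 };
    double x[3]  = { 1.0, 1.0, 1.0 };

    /* x := A*x */
    cblas_dtpmv(CblasColMajor, CblasUpper, CblasNoTrans, CblasNonUnit,
                n, ap, x, 1);

    printf("%.1f %.1f %.1f\n", x[0], x[1], x[2]);   /* expected: 6.0 9.0 6.0 */
    return 0;
}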

cblas_?tpsv
Solves a system of linear equations whose coefficients
are in a triangular packed matrix.

Syntax
void cblas_stpsv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const float *ap, float
*x, const MKL_INT incx);
void cblas_dtpsv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const double *ap, double
*x, const MKL_INT incx);


void cblas_ctpsv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const void *ap, void
*x, const MKL_INT incx);
void cblas_ztpsv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const void *ap, void
*x, const MKL_INT incx);

Include Files
• mkl.h

Description

The ?tpsv routines solve one of the following systems of equations

A*x = b, or A'*x = b, or conjg(A')*x = b,


where:
b and x are n-element vectors,
A is an n-by-n unit, or non-unit, upper or lower triangular matrix, supplied in packed form.
This routine does not test for singularity or near-singularity.
Such tests must be performed before calling this routine.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the matrix A is upper or lower triangular:

if uplo = CblasUpper, then the matrix is upper triangular;
if uplo = CblasLower, then the matrix is lower triangular.

trans Specifies the system of equations:


if trans=CblasNoTrans, then A*x = b;

if trans=CblasTrans, then A'*x = b;

if trans=CblasConjTrans, then conjg(A')*x = b.

diag Specifies whether the matrix A is unit triangular:


if diag = CblasUnit then the matrix is unit triangular;

if diag = CblasNonUnit , then the matrix is not unit triangular.

n Specifies the order of the matrix A. The value of n must be at least zero.

ap Array, size at least ((n*(n + 1))/2).

For Layout = CblasColMajor:

Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular part of the triangular matrix packed sequentially, column-by-
column, so that ap[0] contains A1, 1, ap[1] and ap[2] contain A1, 2 and
A2, 2 respectively, and so on.

Before entry with uplo = CblasLower, the array ap must contain the lower
triangular part of the triangular matrix packed sequentially, column-by-
column, so that ap[0] contains A1, 1, ap[1] and ap[2] contain A2, 1 and
A3, 1 respectively, and so on.
For Layout = CblasRowMajor:

Before entry with uplo = CblasUpper, the array ap must contain the upper
triangular part of the triangular matrix packed sequentially, row-by-row,
ap[0] contains A1, 1, ap[1] and ap[2] contain A1, 2 and A1, 3 respectively,
and so on. Before entry with uplo = CblasLower, the array ap must
contain the lower triangular part of the triangular matrix packed
sequentially, row-by-row, so that ap[0] contains A1, 1, ap[1] and ap[2]
contain A2, 1 and A2, 2 respectively, and so on.
When diag = CblasUnit, the diagonal elements of a are not referenced,
but are assumed to be unity.

x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the


incremented array x must contain the n-element right-hand side vector b.

incx Specifies the increment for the elements of x.


The value of incx must not be zero.

Output Parameters

x Overwritten with the solution vector x.
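
An informal sketch for cblas_dtpsv with arbitrary data, solving A*x = b for
the packed matrix used in the cblas_?tpmv sketch:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    MKL_INT n = 3;
    /* Same packed storage as in the cblas_?tpmv sketch:
       A = [ 1 2 3 ; 0 4 5 ; 0 0 6 ], upper triangular */
    double ap[6] = { 1.0, 2.0, 4.0, 3.0, 5.0, 6.0 };
    double x[3]  = { 6.0, 9.0, 6.0 };   /* right-hand side b on entry */

    /* Solve A*x = b; x is overwritten with the solution */
    cblas_dtpsv(CblasColMajor, CblasUpper, CblasNoTrans, CblasNonUnit,
                n, ap, x, 1);

    printf("%.1f %.1f %.1f\n", x[0], x[1], x[2]);   /* expected: 1.0 1.0 1.0 */
    return 0;
}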

cblas_?trmv
Computes a matrix-vector product using a triangular
matrix.

Syntax
void cblas_strmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const float *a, const
MKL_INT lda, float *x, const MKL_INT incx);
void cblas_dtrmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const double *a, const
MKL_INT lda, double *x, const MKL_INT incx);
void cblas_ctrmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const void *a, const
MKL_INT lda, void *x, const MKL_INT incx);
void cblas_ztrmv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const void *a, const
MKL_INT lda, void *x, const MKL_INT incx);

Include Files
• mkl.h
Description

The ?trmv routines perform one of the following matrix-vector operations defined as

x := A*x, or x := A'*x, or x := conjg(A')*x,


where:
x is an n-element vector,
A is an n-by-n unit, or non-unit, upper or lower triangular matrix.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the matrix A is upper or lower triangular:

if uplo = CblasUpper, then the matrix is upper triangular;
if uplo = CblasLower, then the matrix is lower triangular.

trans Specifies the operation:


if trans=CblasNoTrans, then x := A*x;

if trans=CblasTrans, then x := A'*x;

if trans=CblasConjTrans, then x := conjg(A')*x.

diag Specifies whether the matrix A is unit triangular:


if diag = CblasUnit then the matrix is unit triangular;

if diag = CblasNonUnit , then the matrix is not unit triangular.

n Specifies the order of the matrix A. The value of n must be at least zero.

a Array, size lda*n. Before entry with uplo = CblasUpper, the leading n-by-
n upper triangular part of the array a must contain the upper triangular
matrix and the strictly lower triangular part of a is not referenced. Before
entry with uplo = CblasLower, the leading n-by-n lower triangular part of
the array a must contain the lower triangular matrix and the strictly upper
triangular part of a is not referenced.
When diag = CblasUnit, the diagonal elements of a are not referenced
either, but are assumed to be unity.

lda Specifies the leading dimension of a as declared in the calling


(sub)program. The value of lda must be at least max(1, n).

x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the


incremented array x must contain the n-element vector x.

incx Specifies the increment for the elements of x.


The value of incx must not be zero.

Output Parameters

x Overwritten with the transformed vector x.
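
An informal sketch for cblas_dtrmv (values arbitrary), with the upper
triangular matrix held in full column-major storage:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    MKL_INT n = 3, lda = 3;
    /* Column-major storage of the upper triangular matrix
       A = [ 1 2 3 ; 0 4 5 ; 0 0 6 ]; the strictly lower part is not referenced */
    double a[9] = { 1.0, 0.0, 0.0,
                    2.0, 4.0, 0.0,
                    3.0, 5.0, 6.0 };
    double x[3] = { 1.0, 1.0, 1.0 };

    /* x := A*x */
    cblas_dtrmv(CblasColMajor, CblasUpper, CblasNoTrans, CblasNonUnit,
                n, a, lda, x, 1);

    printf("%.1f %.1f %.1f\n", x[0], x[1], x[2]);   /* expected: 6.0 9.0 6.0 */
    return 0;
}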

cblas_?trsv
Solves a system of linear equations whose coefficients
are in a triangular matrix.

Syntax
void cblas_strsv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const float *a, const
MKL_INT lda, float *x, const MKL_INT incx);
void cblas_dtrsv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const double *a, const
MKL_INT lda, double *x, const MKL_INT incx);
void cblas_ctrsv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const void *a, const
MKL_INT lda, void *x, const MKL_INT incx);
void cblas_ztrsv (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const CBLAS_DIAG diag, const MKL_INT n, const void *a, const
MKL_INT lda, void *x, const MKL_INT incx);

Include Files
• mkl.h

Description

The ?trsv routines solve one of the systems of equations:

A*x = b, or A'*x = b, or conjg(A')*x = b,


where:
b and x are n-element vectors,
A is an n-by-n unit, or non-unit, upper or lower triangular matrix.
The routine does not test for singularity or near-singularity.
Such tests must be performed before calling this routine.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the matrix A is upper or lower triangular:

if uplo = CblasUpper, then the matrix is upper triangular;
if uplo = CblasLower, then the matrix is lower triangular.

trans Specifies the system of equations:


if trans=CblasNoTrans, then A*x = b;

if trans=CblasTrans, then A'*x = b;

if trans=CblasConjTrans, then conjg(A')*x = b.

diag Specifies whether the matrix A is unit triangular:


if diag = CblasUnit then the matrix is unit triangular;

if diag = CblasNonUnit, then the matrix is not unit triangular.

n Specifies the order of the matrix A. The value of n must be at least zero.


a Array, size lda*n . Before entry with uplo = CblasUpper, the leading n-
by-n upper triangular part of the array a must contain the upper triangular
matrix and the strictly lower triangular part of a is not referenced. Before
entry with uplo = CblasLower, the leading n-by-n lower triangular part of
the array a must contain the lower triangular matrix and the strictly upper
triangular part of a is not referenced.
When diag = CblasUnit, the diagonal elements of a are not referenced
either, but are assumed to be unity.

lda Specifies the leading dimension of a as declared in the calling


(sub)program. The value of lda must be at least max(1, n).

x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the


incremented array x must contain the n-element right-hand side vector b.

incx Specifies the increment for the elements of x.


The value of incx must not be zero.

Output Parameters

x Overwritten with the solution vector x.
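
An informal sketch for cblas_dtrsv with arbitrary data, solving A*x = b for
the matrix used in the cblas_?trmv sketch:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    MKL_INT n = 3, lda = 3;
    /* Same matrix as in the cblas_?trmv sketch */
    double a[9] = { 1.0, 0.0, 0.0,
                    2.0, 4.0, 0.0,
                    3.0, 5.0, 6.0 };
    double x[3] = { 6.0, 9.0, 6.0 };   /* right-hand side b on entry */

    /* Solve A*x = b; x is overwritten with the solution */
    cblas_dtrsv(CblasColMajor, CblasUpper, CblasNoTrans, CblasNonUnit,
                n, a, lda, x, 1);

    printf("%.1f %.1f %.1f\n", x[0], x[1], x[2]);   /* expected: 1.0 1.0 1.0 */
    return 0;
}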

BLAS Level 3 Routines


BLAS Level 3 routines perform matrix-matrix operations. Table “BLAS Level 3 Routine Groups and Their Data
Types” lists the BLAS Level 3 routine groups and the data types associated with them.
BLAS Level 3 Routine Groups and Their Data Types
Routine Group Data Types Description

cblas_?gemm s, d, c, z Computes a matrix-matrix product with general matrices.

cblas_?hemm c, z Computes a matrix-matrix product where one input matrix


is Hermitian.

cblas_?herk c, z Performs a Hermitian rank-k update.

cblas_?her2k c, z Performs a Hermitian rank-2k update.

cblas_?symm s, d, c, z Computes a matrix-matrix product where one input matrix


is symmetric.

cblas_?syrk s, d, c, z Performs a symmetric rank-k update.

cblas_?syr2k s, d, c, z Performs a symmetric rank-2k update.

cblas_?trmm s, d, c, z Computes a matrix-matrix product where one input matrix


is triangular.

cblas_?trsm s, d, c, z Solves a triangular matrix equation.

Symmetric Multiprocessing Version of Intel® MKL


Many applications spend considerable time executing BLAS routines. This time can be scaled by the number
of processors available on the system through using the symmetric multiprocessing (SMP) feature built into
the Intel MKL Library. The performance enhancements based on the parallel use of the processors are
available without any programming effort on your part.
To enhance performance, the library uses the following methods:

• The BLAS functions are blocked where possible to restructure the code in a way that increases the
localization of data reference, enhances cache memory use, and reduces the dependency on the memory
bus.
• The code is distributed across the processors to maximize parallelism.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

cblas_?gemm
Computes a matrix-matrix product with general
matrices.

Syntax
void cblas_sgemm (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa, const
CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const float
alpha, const float *a, const MKL_INT lda, const float *b, const MKL_INT ldb, const
float beta, float *c, const MKL_INT ldc);
void cblas_dgemm (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa, const
CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const double
alpha, const double *a, const MKL_INT lda, const double *b, const MKL_INT ldb, const
double beta, double *c, const MKL_INT ldc);
void cblas_cgemm (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa, const
CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const void
*alpha, const void *a, const MKL_INT lda, const void *b, const MKL_INT ldb, const void
*beta, void *c, const MKL_INT ldc);
void cblas_zgemm (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa, const
CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const void
*alpha, const void *a, const MKL_INT lda, const void *b, const MKL_INT ldb, const void
*beta, void *c, const MKL_INT ldc);

Include Files
• mkl.h

Description

The ?gemm routines compute a scalar-matrix-matrix product and add the result to a scalar-matrix product,
with general matrices. The operation is defined as

C := alpha*op(A)*op(B) + beta*C,

where:
op(X) is one of op(X) = X, or op(X) = X^T, or op(X) = X^H,
alpha and beta are scalars,
A, B and C are matrices:


op(A) is an m-by-k matrix,


op(B) is a k-by-n matrix,
C is an m-by-n matrix.
See also
• ?gemm for the Fortran language interface to this routine
• ?gemm3m, BLAS-like extension routines, that use matrix multiplication for similar matrix-matrix operations
Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

transa Specifies the form of op(A) used in the matrix multiplication:

if transa=CblasNoTrans, then op(A) = A;

if transa=CblasTrans, then op(A) = A^T;

if transa=CblasConjTrans, then op(A) = A^H.

transb Specifies the form of op(B) used in the matrix multiplication:

if transb=CblasNoTrans, then op(B) = B;

if transb=CblasTrans, then op(B) = B^T;

if transb=CblasConjTrans, then op(B) = B^H.

m Specifies the number of rows of the matrix op(A) and of the matrix C. The
value of m must be at least zero.

n Specifies the number of columns of the matrix op(B) and the number of
columns of the matrix C.
The value of n must be at least zero.

k Specifies the number of columns of the matrix op(A) and the number of
rows of the matrix op(B).

The value of k must be at least zero.

alpha Specifies the scalar alpha.

a For Layout = CblasColMajor:

If transa=CblasNoTrans, array of size lda*k; before entry, the leading
m-by-k part of the array a must contain the matrix A.
If transa=CblasTrans or transa=CblasConjTrans, array of size lda*m;
before entry, the leading k-by-m part of the array a must contain the
matrix A.

For Layout = CblasRowMajor:

If transa=CblasNoTrans, array of size lda*m; before entry, the leading
m-by-k part of the array a must contain the matrix A.
If transa=CblasTrans or transa=CblasConjTrans, array of size lda*k;
before entry, the leading k-by-m part of the array a must contain the
matrix A.

lda Specifies the leading dimension of a as declared in the calling
(sub)program.

For Layout = CblasColMajor: lda must be at least max(1, m) if
transa=CblasNoTrans, and at least max(1, k) otherwise.
For Layout = CblasRowMajor: lda must be at least max(1, k) if
transa=CblasNoTrans, and at least max(1, m) otherwise.

b For Layout = CblasColMajor:

If transb=CblasNoTrans, array of size ldb*n; before entry, the leading
k-by-n part of the array b must contain the matrix B.
If transb=CblasTrans or transb=CblasConjTrans, array of size ldb*k;
before entry, the leading n-by-k part of the array b must contain the
matrix B.

For Layout = CblasRowMajor:

If transb=CblasNoTrans, array of size ldb*k; before entry, the leading
n-by-k part of the array b must contain the matrix B.
If transb=CblasTrans or transb=CblasConjTrans, array of size ldb*n;
before entry, the leading k-by-n part of the array b must contain the
matrix B.

ldb Specifies the leading dimension of b as declared in the calling
(sub)program.

For Layout = CblasColMajor: ldb must be at least max(1, k) if
transb=CblasNoTrans, and at least max(1, n) otherwise.
For Layout = CblasRowMajor: ldb must be at least max(1, n) if
transb=CblasNoTrans, and at least max(1, k) otherwise.

beta Specifies the scalar beta.


When beta is equal to zero, then c need not be set on input.


c For Layout = CblasColMajor: array, size ldc*n. Before entry, the leading
m-by-n part of the array c must contain the matrix C, except when beta is
equal to zero, in which case c need not be set on entry.
For Layout = CblasRowMajor: array, size ldc*m. Before entry, the leading
n-by-m part of the array c must contain the matrix C, except when beta is
equal to zero, in which case c need not be set on entry.

ldc Specifies the leading dimension of c as declared in the calling


(sub)program.

Layout = CblasColMajor ldc must be at least max(1, m).

Layout = CblasRowMajor ldc must be at least max(1, n).

Output Parameters

c Overwritten by the m-by-n matrix (alpha*op(A)*op(B) + beta*C).

Example
For examples of routine usage, see the code in the Intel MKL installation directory:

• cblas_sgemm: examples\cblas\source\cblas_sgemmx.c
• cblas_dgemm: examples\cblas\source\cblas_dgemmx.c
• cblas_cgemm: examples\cblas\source\cblas_cgemmx.c
• cblas_zgemm: examples\cblas\source\cblas_zgemmx.c
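
In addition to the installed examples, the following informal sketch
(arbitrary data, row-major storage) shows a complete double-precision call:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    MKL_INT m = 2, n = 2, k = 3;
    double alpha = 1.0, beta = 0.0;
    /* Row-major storage: A is m-by-k, B is k-by-n, C is m-by-n */
    double a[6] = { 1.0, 2.0, 3.0,
                    4.0, 5.0, 6.0 };
    double b[6] = { 1.0, 0.0,
                    0.0, 1.0,
                    1.0, 1.0 };
    double c[4];   /* beta = 0, so c need not be set on input */

    /* C := alpha*A*B + beta*C */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, alpha, a, k, b, n, beta, c, n);

    /* expected: C = [ 4 5 ; 10 11 ] */
    printf("%.1f %.1f\n%.1f %.1f\n", c[0], c[1], c[2], c[3]);
    return 0;
}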

cblas_?hemm
Computes a matrix-matrix product where one input
matrix is Hermitian.

Syntax
void cblas_chemm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const MKL_INT m, const MKL_INT n, const void *alpha, const void *a, const
MKL_INT lda, const void *b, const MKL_INT ldb, const void *beta, void *c, const
MKL_INT ldc);
void cblas_zhemm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const MKL_INT m, const MKL_INT n, const void *alpha, const void *a, const
MKL_INT lda, const void *b, const MKL_INT ldb, const void *beta, void *c, const
MKL_INT ldc);

Include Files
• mkl.h

Description

The ?hemm routines compute a scalar-matrix-matrix product using a Hermitian matrix A and a general matrix
B and add the result to a scalar-matrix product using a general matrix C. The operation is defined as

C := alpha*A*B + beta*C

or

C := alpha*B*A + beta*C,

where:
alpha and beta are scalars,
A is a Hermitian matrix,
B and C are m-by-n matrices.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

side Specifies whether the Hermitian matrix A appears on the left or right in the
operation as follows:
if side = CblasLeft, then C := alpha*A*B + beta*C;

if side = CblasRight, then C := alpha*B*A + beta*C.

uplo Specifies whether the upper or lower triangular part of the Hermitian matrix
A is used:
If uplo = CblasUpper, then the upper triangular part of the Hermitian
matrix A is used.
If uplo = CblasLower, then the lower triangular part of the Hermitian matrix
A is used.

m Specifies the number of rows of the matrix C.


The value of m must be at least zero.

n Specifies the number of columns of the matrix C.


The value of n must be at least zero.

alpha Specifies the scalar alpha.

a Array, size lda* ka, where ka is m when side = CblasLeft and is n


otherwise. Before entry with side = CblasLeft, the m-by-m part of the
array a must contain the Hermitian matrix, such that when uplo =
CblasUpper, the leading m-by-m upper triangular part of the array a must
contain the upper triangular part of the Hermitian matrix and the strictly
lower triangular part of a is not referenced, and when uplo = CblasLower,
the leading m-by-m lower triangular part of the array a must contain the
lower triangular part of the Hermitian matrix, and the strictly upper
triangular part of a is not referenced.
Before entry with side = CblasRight, the n-by-n part of the array a must
contain the Hermitian matrix, such that when uplo = CblasUpper, the
leading n-by-n upper triangular part of the array a must contain the upper
triangular part of the Hermitian matrix and the strictly lower triangular part


of a is not referenced, and when uplo = CblasLower, the leading n-by-n


lower triangular part of the array a must contain the lower triangular part of
the Hermitian matrix, and the strictly upper triangular part of a is not
referenced. The imaginary parts of the diagonal elements need not be set,
they are assumed to be zero.

lda Specifies the leading dimension of a as declared in the calling (sub)


program. When side = CblasLeft then lda must be at least max(1, m),
otherwise lda must be at least max(1,n).

b For Layout = CblasColMajor: array, size ldb*n. The leading m-by-n part
of the array b must contain the matrix B.
For Layout = CblasRowMajor: array, size ldb*m. The leading n-by-m part
of the array b must contain the matrix B.

ldb Specifies the leading dimension of b as declared in the calling


(sub)program. When Layout = CblasColMajor, ldb must be at least
max(1, m); otherwise, ldb must be at least max(1, n).

beta Specifies the scalar beta.


When beta is supplied as zero, then c need not be set on input.

c For Layout = CblasColMajor: array, size ldc*n. Before entry, the leading
m-by-n part of the array c must contain the matrix C, except when beta is
zero, in which case c need not be set on entry.
For Layout = CblasRowMajor: array, size ldc*m. Before entry, the leading
n-by-m part of the array c must contain the matrix C, except when beta is
zero, in which case c need not be set on entry.

ldc Specifies the leading dimension of c as declared in the calling


(sub)program. When Layout = CblasColMajor, ldc must be at least
max(1, m); otherwise, ldc must be at least max(1, n).

Output Parameters

c Overwritten by the m-by-n updated matrix.
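
An informal sketch for cblas_zhemm with arbitrary data; it assumes the
default MKL_Complex16 structure from mkl.h with .real and .imag members:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    MKL_INT m = 2, n = 1, lda = 2, ldb = 2, ldc = 2;
    MKL_Complex16 alpha = { 1.0, 0.0 }, beta = { 0.0, 0.0 };
    /* Column-major 2-by-2 Hermitian matrix A = [ 2  i ; -i  3 ];
       with uplo = CblasUpper the strictly lower part is not referenced */
    MKL_Complex16 a[4] = { {2.0, 0.0}, {0.0, 0.0},     /* A11, (A21 not referenced) */
                           {0.0, 1.0}, {3.0, 0.0} };   /* A12 = i, A22 */
    MKL_Complex16 b[2] = { {1.0, 0.0}, {1.0, 0.0} };   /* B is m-by-n = 2-by-1 */
    MKL_Complex16 c[2];                                /* beta = 0: c not read */

    /* C := alpha*A*B + beta*C */
    cblas_zhemm(CblasColMajor, CblasLeft, CblasUpper, m, n,
                &alpha, a, lda, b, ldb, &beta, c, ldc);

    /* expected: C = [ 2+i ; 3-i ] */
    printf("(%.1f,%.1f) (%.1f,%.1f)\n", c[0].real, c[0].imag, c[1].real, c[1].imag);
    return 0;
}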

cblas_?herk
Performs a Hermitian rank-k update.

Syntax
void cblas_cherk (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const MKL_INT n, const MKL_INT k, const float alpha, const void
*a, const MKL_INT lda, const float beta, void *c, const MKL_INT ldc);
void cblas_zherk (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const MKL_INT n, const MKL_INT k, const double alpha, const void
*a, const MKL_INT lda, const double beta, void *c, const MKL_INT ldc);

Include Files
• mkl.h

Description

The ?herk routines perform a rank-k matrix-matrix operation using a general matrix A and a Hermitian
matrix C. The operation is defined as

C := alpha*A*A^H + beta*C,

or

C := alpha*A^H*A + beta*C,

where:
alpha and beta are real scalars,
C is an n-by-n Hermitian matrix,
A is an n-by-k matrix in the first case and a k-by-n matrix in the second case.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the upper or lower triangular part of the array c is used.

If uplo = CblasUpper, then the upper triangular part of the array c is


used.
If uplo = CblasLower, then the lower triangular part of the array c is used.

trans Specifies the operation:


if trans=CblasNoTrans, then C := alpha*A*A^H + beta*C;

if trans=CblasConjTrans, then C := alpha*A^H*A + beta*C.

n Specifies the order of the matrix C. The value of n must be at least zero.

k With trans=CblasNoTrans, k specifies the number of columns of the


matrix A, and with trans=CblasConjTrans, k specifies the number of
rows of the matrix A.
The value of k must be at least zero.

alpha Specifies the scalar alpha.

a For Layout = CblasColMajor:

If trans=CblasNoTrans, array of size lda*k; before entry, the leading
n-by-k part of the array a must contain the matrix A.
If trans=CblasConjTrans, array of size lda*n; before entry, the leading
k-by-n part of the array a must contain the matrix A.

For Layout = CblasRowMajor:

If trans=CblasNoTrans, array of size lda*n; before entry, the leading
k-by-n part of the array a must contain the matrix A.
If trans=CblasConjTrans, array of size lda*k; before entry, the leading
n-by-k part of the array a must contain the matrix A.


lda For Layout = CblasColMajor: lda must be at least max(1, n) if
trans=CblasNoTrans, and at least max(1, k) otherwise.
For Layout = CblasRowMajor: lda must be at least max(1, k) if
trans=CblasNoTrans, and at least max(1, n) otherwise.

beta Specifies the scalar beta.

c Array, size ldc by n.


Before entry with uplo = CblasUpper, the leading n-by-n upper triangular
part of the array c must contain the upper triangular part of the Hermitian
matrix and the strictly lower triangular part of c is not referenced.
Before entry with uplo = CblasLower, the leading n-by-n lower triangular
part of the array c must contain the lower triangular part of the Hermitian
matrix and the strictly upper triangular part of c is not referenced.
The imaginary parts of the diagonal elements need not be set, they are
assumed to be zero.

ldc Specifies the leading dimension of c as declared in the calling


(sub)program. The value of ldc must be at least max(1, n).

Output Parameters

c With uplo = CblasUpper, the upper triangular part of the array c is


overwritten by the upper triangular part of the updated matrix.
With uplo = CblasLower, the lower triangular part of the array c is
overwritten by the lower triangular part of the updated matrix.
The imaginary parts of the diagonal elements are set to zero.
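
An informal sketch for cblas_zherk with arbitrary data, again assuming the
default MKL_Complex16 structure from mkl.h:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    MKL_INT n = 2, k = 1, lda = 2, ldc = 2;
    double alpha = 1.0, beta = 0.0;
    /* Column-major 2-by-1 matrix A = [ 1+i ; 2 ] */
    MKL_Complex16 a[2] = { {1.0, 1.0}, {2.0, 0.0} };
    MKL_Complex16 c[4] = { {0.0, 0.0} };   /* beta = 0: c is not read on input */

    /* C := alpha*A*A^H + beta*C; only the upper triangle of c is written */
    cblas_zherk(CblasColMajor, CblasUpper, CblasNoTrans,
                n, k, alpha, a, lda, beta, c, ldc);

    /* expected: C11 = 2, C12 = 2+2i, C22 = 4 (C21 is not referenced) */
    printf("C11=(%.1f,%.1f) C12=(%.1f,%.1f) C22=(%.1f,%.1f)\n",
           c[0].real, c[0].imag, c[2].real, c[2].imag, c[3].real, c[3].imag);
    return 0;
}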

cblas_?her2k
Performs a Hermitian rank-2k update.

Syntax
void cblas_cher2k (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const MKL_INT n, const MKL_INT k, const void *alpha, const void
*a, const MKL_INT lda, const void *b, const MKL_INT ldb, const float beta, void *c,
const MKL_INT ldc);
void cblas_zher2k (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const MKL_INT n, const MKL_INT k, const void *alpha, const void
*a, const MKL_INT lda, const void *b, const MKL_INT ldb, const double beta, void *c,
const MKL_INT ldc);

Include Files
• mkl.h

Description

The ?her2k routines perform a rank-2k matrix-matrix operation using general matrices A and B and a
Hermitian matrix C. The operation is defined as

C := alpha*A*B^H + conjg(alpha)*B*A^H + beta*C,

or

C := alpha*A^H*B + conjg(alpha)*B^H*A + beta*C,

where:
alpha is a scalar and beta is a real scalar,
C is an n-by-n Hermitian matrix,
A and B are n-by-k matrices in the first case and k-by-n matrices in the second case.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the upper or lower triangular part of the array c is used.

If uplo = CblasUpper, then the upper triangular part of the array c is used.

If uplo = CblasLower, then the lower triangular part of the array c is used.

trans Specifies the operation:


if trans=CblasNoTrans, then C := alpha*A*B^H + conjg(alpha)*B*A^H + beta*C;

if trans=CblasConjTrans, then C := alpha*A^H*B + conjg(alpha)*B^H*A + beta*C.

n Specifies the order of the matrix C. The value of n must be at least zero.

k With trans=CblasNoTrans, k specifies the number of columns of the matrix
A, and with trans=CblasConjTrans, k specifies the number of rows of the
matrix A.
The value of k must be at least zero.

alpha Specifies the scalar alpha.

a For Layout = CblasColMajor:

If trans=CblasNoTrans, array of size lda*k; before entry, the leading
n-by-k part of the array a must contain the matrix A.
If trans=CblasConjTrans, array of size lda*n; before entry, the leading
k-by-n part of the array a must contain the matrix A.

For Layout = CblasRowMajor:

If trans=CblasNoTrans, array of size lda*n; before entry, the leading
k-by-n part of the array a must contain the matrix A.
If trans=CblasConjTrans, array of size lda*k; before entry, the leading
n-by-k part of the array a must contain the matrix A.


lda Specifies the leading dimension of a as declared in the calling
(sub)program.

For Layout = CblasColMajor: lda must be at least max(1, n) if
trans=CblasNoTrans, and at least max(1, k) otherwise.
For Layout = CblasRowMajor: lda must be at least max(1, k) if
trans=CblasNoTrans, and at least max(1, n) otherwise.

beta Specifies the scalar beta.

b For Layout = CblasColMajor:

If trans=CblasNoTrans, array of size ldb*k; before entry, the leading
n-by-k part of the array b must contain the matrix B.
If trans=CblasConjTrans, array of size ldb*n; before entry, the leading
k-by-n part of the array b must contain the matrix B.

For Layout = CblasRowMajor:

If trans=CblasNoTrans, array of size ldb*n; before entry, the leading
k-by-n part of the array b must contain the matrix B.
If trans=CblasConjTrans, array of size ldb*k; before entry, the leading
n-by-k part of the array b must contain the matrix B.

ldb Specifies the leading dimension of b as declared in the calling
(sub)program.

For Layout = CblasColMajor: ldb must be at least max(1, n) if
trans=CblasNoTrans, and at least max(1, k) otherwise.
For Layout = CblasRowMajor: ldb must be at least max(1, k) if
trans=CblasNoTrans, and at least max(1, n) otherwise.

c Array, size ldc by n.


Before entry with uplo = CblasUpper, the leading n-by-n upper triangular
part of the array c must contain the upper triangular part of the Hermitian
matrix and the strictly lower triangular part of c is not referenced.
Before entry with uplo = CblasLower, the leading n-by-n lower triangular
part of the array c must contain the lower triangular part of the Hermitian
matrix and the strictly upper triangular part of c is not referenced.
The imaginary parts of the diagonal elements need not be set, they are
assumed to be zero.

ldc Specifies the leading dimension of c as declared in the calling


(sub)program. The value of ldc must be at least max(1, n).

Output Parameters

c With uplo = CblasUpper, the upper triangular part of the array c is


overwritten by the upper triangular part of the updated matrix.
With uplo = CblasLower, the lower triangular part of the array c is
overwritten by the lower triangular part of the updated matrix.
The imaginary parts of the diagonal elements are set to zero.
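
An informal sketch for cblas_zher2k with arbitrary data, assuming the
default MKL_Complex16 structure from mkl.h:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    MKL_INT n = 2, k = 1, lda = 2, ldb = 2, ldc = 2;
    MKL_Complex16 alpha = { 1.0, 0.0 };
    double beta = 0.0;
    /* Column-major 2-by-1 matrices A = [ 1 ; i ] and B = [ 1 ; 1 ] */
    MKL_Complex16 a[2] = { {1.0, 0.0}, {0.0, 1.0} };
    MKL_Complex16 b[2] = { {1.0, 0.0}, {1.0, 0.0} };
    MKL_Complex16 c[4] = { {0.0, 0.0} };   /* beta = 0: c is not read on input */

    /* C := alpha*A*B^H + conjg(alpha)*B*A^H + beta*C; upper triangle written */
    cblas_zher2k(CblasColMajor, CblasUpper, CblasNoTrans,
                 n, k, &alpha, a, lda, b, ldb, beta, c, ldc);

    /* expected: C11 = 2, C12 = 1-i, C22 = 0 (C21 is not referenced) */
    printf("C11=(%.1f,%.1f) C12=(%.1f,%.1f) C22=(%.1f,%.1f)\n",
           c[0].real, c[0].imag, c[2].real, c[2].imag, c[3].real, c[3].imag);
    return 0;
}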

cblas_?symm
Computes a matrix-matrix product where one input
matrix is symmetric.

Syntax
void cblas_ssymm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const MKL_INT m, const MKL_INT n, const float alpha, const float *a, const
MKL_INT lda, const float *b, const MKL_INT ldb, const float beta, float *c, const
MKL_INT ldc);
void cblas_dsymm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const MKL_INT m, const MKL_INT n, const double alpha, const double *a, const
MKL_INT lda, const double *b, const MKL_INT ldb, const double beta, double *c, const
MKL_INT ldc);
void cblas_csymm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const MKL_INT m, const MKL_INT n, const void *alpha, const void *a, const
MKL_INT lda, const void *b, const MKL_INT ldb, const void *beta, void *c, const
MKL_INT ldc);
void cblas_zsymm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const MKL_INT m, const MKL_INT n, const void *alpha, const void *a, const
MKL_INT lda, const void *b, const MKL_INT ldb, const void *beta, void *c, const
MKL_INT ldc);

Include Files
• mkl.h

Description

The ?symm routines compute a scalar-matrix-matrix product with one symmetric matrix and add the result to
a scalar-matrix product. The operation is defined as

C := alpha*A*B + beta*C,

or

C := alpha*B*A + beta*C,

where:
alpha and beta are scalars,
A is a symmetric matrix,
B and C are m-by-n matrices.


Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

side Specifies whether the symmetric matrix A appears on the left or right in the
operation:
if side = CblasLeft, then C := alpha*A*B + beta*C;

if side = CblasRight, then C := alpha*B*A + beta*C.

uplo Specifies whether the upper or lower triangular part of the symmetric
matrix A is used:
if uplo = CblasUpper, then the upper triangular part is used;

if uplo = CblasLower, then the lower triangular part is used.

m Specifies the number of rows of the matrix C.


The value of m must be at least zero.

n Specifies the number of columns of the matrix C.


The value of n must be at least zero.

alpha Specifies the scalar alpha.

a Array, size lda* ka , where ka is m when side = CblasLeft and is n


otherwise.
Before entry with side = CblasLeft, the m-by-m part of the array a must
contain the symmetric matrix, such that when uplo = CblasUpper, the
leading m-by-m upper triangular part of the array a must contain the upper
triangular part of the symmetric matrix and the strictly lower triangular part
of a is not referenced, and when uplo = CblasLower, the leading m-by-m
lower triangular part of the array a must contain the lower triangular part of
the symmetric matrix and the strictly upper triangular part of a is not
referenced.
Before entry with side = CblasRight, the n-by-n part of the array a must
contain the symmetric matrix, such that when uplo = CblasUpper, the leading
n-by-n upper triangular part of the array a must contain the upper triangular
part of the symmetric matrix and the strictly lower triangular part of a is
not referenced, and when uplo = CblasLower, the leading n-by-n lower
triangular part of the array a must contain the lower triangular part of the
symmetric matrix and the strictly upper triangular part of a is not
referenced.

lda Specifies the leading dimension of a as declared in the calling


(sub)program. When side = CblasLeft then lda must be at least max(1,
m), otherwise lda must be at least max(1, n).

b For Layout = CblasColMajor: array, size ldb*n. The leading m-by-n part
of the array b must contain the matrix B.
For Layout = CblasRowMajor: array, size ldb*m. The leading n-by-m part
of the array b must contain the matrix B.

ldb Specifies the leading dimension of b as declared in the calling
(sub)program. When Layout = CblasColMajor, ldb must be at least
max(1, m); otherwise, ldb must be at least max(1, n).

beta Specifies the scalar beta.


When beta is set to zero, then c need not be set on input.

c For Layout = CblasColMajor: array, size ldc*n. Before entry, the leading
m-by-n part of the array c must contain the matrix C, except when beta is
zero, in which case c need not be set on entry.
For Layout = CblasRowMajor: array, size ldc*m. Before entry, the leading
n-by-m part of the array c must contain the matrix C, except when beta is
zero, in which case c need not be set on entry.

ldc Specifies the leading dimension of c as declared in the calling


(sub)program. When Layout = CblasColMajor, ldc must be at least
max(1, m); otherwise, ldc must be at least max(1, n).

Output Parameters

c Overwritten by the m-by-n updated matrix.
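
For illustration only (not part of the original reference; the matrix and scalar values are arbitrary), a minimal sketch of a cblas_dsymm call that forms C := A*B with a 2-by-2 symmetric A stored in its upper triangle and row-major 2-by-3 matrices B and C:

#include <stdio.h>
#include "mkl.h"

int main(void) {
    const MKL_INT m = 2, n = 3;
    /* Symmetric A; with uplo = CblasUpper only the upper triangle is referenced. */
    double a[] = { 1.0, 2.0,
                   0.0, 3.0 };
    double b[] = { 1.0, 0.0, 2.0,
                   0.0, 1.0, 1.0 };
    double c[] = { 0.0, 0.0, 0.0,
                   0.0, 0.0, 0.0 };

    /* C := 1.0*A*B + 0.0*C; side = CblasLeft, so A is m-by-m and lda >= m. */
    cblas_dsymm(CblasRowMajor, CblasLeft, CblasUpper, m, n,
                1.0, a, m, b, n, 0.0, c, n);

    for (MKL_INT i = 0; i < m; i++) {
        for (MKL_INT j = 0; j < n; j++)
            printf(" %5.1f", c[i*n + j]);
        printf("\n");
    }
    return 0;
}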

cblas_?syrk
Performs a symmetric rank-k update.

Syntax
void cblas_ssyrk (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const MKL_INT n, const MKL_INT k, const float alpha, const float
*a, const MKL_INT lda, const float beta, float *c, const MKL_INT ldc);
void cblas_dsyrk (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const MKL_INT n, const MKL_INT k, const double alpha, const
double *a, const MKL_INT lda, const double beta, double *c, const MKL_INT ldc);
void cblas_csyrk (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const MKL_INT n, const MKL_INT k, const void *alpha, const void
*a, const MKL_INT lda, const void *beta, void *c, const MKL_INT ldc);
void cblas_zsyrk (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const MKL_INT n, const MKL_INT k, const void *alpha, const void
*a, const MKL_INT lda, const void *beta, void *c, const MKL_INT ldc);

Include Files
• mkl.h

Description

The ?syrk routines perform a rank-k matrix-matrix operation for a symmetric matrix C using a general
matrix A. The operation is defined as

C := alpha*A*A' + beta*C,

or

C := alpha*A'*A + beta*C,


where:
alpha and beta are scalars,
C is an n-by-n symmetric matrix,
A is an n-by-k matrix in the first case and a k-by-n matrix in the second case.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the upper or lower triangular part of the array c is used.

If uplo = CblasUpper, then the upper triangular part of the array c is


used.
If uplo = CblasLower, then the lower triangular part of the array c is used.

trans Specifies the operation:


if trans=CblasNoTrans, then C := alpha*A*A' + beta*C;

if trans=CblasTrans, then C := alpha*A'*A + beta*C;

if trans=CblasConjTrans, then C := alpha*A'*A + beta*C.

n Specifies the order of the matrix C. The value of n must be at least zero.

k On entry with trans=CblasNoTrans, k specifies the number of columns of


the matrix a, and on entry with trans=CblasTrans or
trans=CblasConjTrans , k specifies the number of rows of the matrix a.
The value of k must be at least zero.

alpha Specifies the scalar alpha.

a Array, size lda* ka, where ka is k when trans=CblasNoTrans, and is n


otherwise. Before entry with trans=CblasNoTrans, the leading n-by-k
part of the array a must contain the matrix A, otherwise the leading k-by-n
part of the array a must contain the matrix A.

  Layout = CblasColMajor:
    trans=CblasNoTrans: Array, size lda*k. Before entry, the leading n-by-k part
    of the array a must contain the matrix A.
    trans=CblasConjTrans: Array, size lda*n. Before entry, the leading k-by-n part
    of the array a must contain the matrix A.
  Layout = CblasRowMajor:
    trans=CblasNoTrans: Array, size lda*n. Before entry, the leading k-by-n part
    of the array a must contain the matrix A.
    trans=CblasConjTrans: Array, size lda*k. Before entry, the leading n-by-k part
    of the array a must contain the matrix A.

lda
  Layout = CblasColMajor:
    trans=CblasNoTrans: lda must be at least max(1, n).
    trans=CblasConjTrans: lda must be at least max(1, k).
  Layout = CblasRowMajor:
    trans=CblasNoTrans: lda must be at least max(1, k).
    trans=CblasConjTrans: lda must be at least max(1, n).

beta Specifies the scalar beta.

c Array, size ldc* n. Before entry with uplo = CblasUpper, the leading n-
by-n upper triangular part of the array c must contain the upper triangular
part of the symmetric matrix and the strictly lower triangular part of c is not
referenced.
Before entry with uplo = CblasLower, the leading n-by-n lower triangular
part of the array c must contain the lower triangular part of the symmetric
matrix and the strictly upper triangular part of c is not referenced.

ldc Specifies the leading dimension of c as declared in the calling


(sub)program. The value of ldc must be at least max(1, n).

Output Parameters

c With uplo = CblasUpper, the upper triangular part of the array c is


overwritten by the upper triangular part of the updated matrix.
With uplo = CblasLower, the lower triangular part of the array c is
overwritten by the lower triangular part of the updated matrix.
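
As an illustration only (not part of the original reference, values chosen arbitrarily), a minimal cblas_dsyrk sketch that forms the upper triangle of C := A*A' for a 3-by-2 row-major A:

#include <stdio.h>
#include "mkl.h"

int main(void) {
    const MKL_INT n = 3, k = 2;
    double a[] = { 1.0, 2.0,
                   0.0, 1.0,
                   3.0, 1.0 };      /* A is n-by-k, row-major, so lda = k */
    double c[9] = { 0.0 };          /* C is n-by-n; only the upper triangle is updated */

    cblas_dsyrk(CblasRowMajor, CblasUpper, CblasNoTrans, n, k,
                1.0, a, k, 0.0, c, n);

    for (MKL_INT i = 0; i < n; i++) {
        for (MKL_INT j = 0; j < n; j++)
            printf(" %5.1f", c[i*n + j]);
        printf("\n");
    }
    return 0;
}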

cblas_?syr2k
Performs a symmetric rank-2k update.

Syntax
void cblas_ssyr2k (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const MKL_INT n, const MKL_INT k, const float alpha, const float
*a, const MKL_INT lda, const float *b, const MKL_INT ldb, const float beta, float *c,
const MKL_INT ldc);
void cblas_dsyr2k (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const MKL_INT n, const MKL_INT k, const double alpha, const
double *a, const MKL_INT lda, const double *b, const MKL_INT ldb, const double beta,
double *c, const MKL_INT ldc);
void cblas_csyr2k (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const MKL_INT n, const MKL_INT k, const void *alpha, const void
*a, const MKL_INT lda, const void *b, const MKL_INT ldb, const void *beta, void *c,
const MKL_INT ldc);
void cblas_zsyr2k (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE trans, const MKL_INT n, const MKL_INT k, const void *alpha, const void
*a, const MKL_INT lda, const void *b, const MKL_INT ldb, const void *beta, void *c,
const MKL_INT ldc);

Include Files
• mkl.h


Description

The ?syr2k routines perform a rank-2k matrix-matrix operation for a symmetric matrix C using general
matrices A and B. The operation is defined as

C := alpha*A*B' + alpha*B*A' + beta*C,

or

C := alpha*A'*B + alpha*B'*A + beta*C,

where:
alpha and beta are scalars,
C is an n-by-n symmetric matrix,
A and B are n-by-k matrices in the first case, and k-by-n matrices in the second case.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the upper or lower triangular part of the array c is used.

If uplo = CblasUpper, then the upper triangular part of the array c is


used.
If uplo = CblasLower, then the lower triangular part of the array c is used.

trans Specifies the operation:


if trans=CblasNoTrans, then C := alpha*A*B'+alpha*B*A'+beta*C;

if trans=CblasTrans, then C := alpha*A'*B +alpha*B'*A +beta*C;

if trans=CblasConjTrans, then C := alpha*A'*B +alpha*B'*A


+beta*C.

n Specifies the order of the matrix C. The value of n must be at least zero.

k On entry with trans=CblasNoTrans, k specifies the number of columns of


the matrices A and B, and on entry with trans=CblasTrans or
trans=CblasConjTrans, k specifies the number of rows of the matrices A
and B. The value of k must be at least zero.

alpha Specifies the scalar alpha.

a
  Layout = CblasColMajor:
    trans=CblasNoTrans: Array, size lda*k. Before entry, the leading n-by-k part
    of the array a must contain the matrix A.
    trans=CblasConjTrans: Array, size lda*n. Before entry, the leading k-by-n part
    of the array a must contain the matrix A.
  Layout = CblasRowMajor:
    trans=CblasNoTrans: Array, size lda*n. Before entry, the leading k-by-n part
    of the array a must contain the matrix A.
    trans=CblasConjTrans: Array, size lda*k. Before entry, the leading n-by-k part
    of the array a must contain the matrix A.

lda Specifies the leading dimension of a as declared in the calling
(sub)program.

  Layout = CblasColMajor:
    trans=CblasNoTrans: lda must be at least max(1, n).
    trans=CblasConjTrans: lda must be at least max(1, k).
  Layout = CblasRowMajor:
    trans=CblasNoTrans: lda must be at least max(1, k).
    trans=CblasConjTrans: lda must be at least max(1, n).

b
  Layout = CblasColMajor:
    trans=CblasNoTrans: Array, size ldb*k. Before entry, the leading n-by-k part
    of the array b must contain the matrix B.
    trans=CblasConjTrans: Array, size ldb*n. Before entry, the leading k-by-n part
    of the array b must contain the matrix B.
  Layout = CblasRowMajor:
    trans=CblasNoTrans: Array, size ldb*n. Before entry, the leading k-by-n part
    of the array b must contain the matrix B.
    trans=CblasConjTrans: Array, size ldb*k. Before entry, the leading n-by-k part
    of the array b must contain the matrix B.

ldb Specifies the leading dimension of b as declared in the calling
(sub)program.

  Layout = CblasColMajor:
    trans=CblasNoTrans: ldb must be at least max(1, n).
    trans=CblasConjTrans: ldb must be at least max(1, k).
  Layout = CblasRowMajor:
    trans=CblasNoTrans: ldb must be at least max(1, k).
    trans=CblasConjTrans: ldb must be at least max(1, n).

beta Specifies the scalar beta.

c Array, size ldc* n. Before entry with uplo = CblasUpper, the leading n-
by-n upper triangular part of the array c must contain the upper triangular
part of the symmetric matrix and the strictly lower triangular part of c is not
referenced.
Before entry with uplo = CblasLower, the leading n-by-n lower triangular
part of the array c must contain the lower triangular part of the symmetric
matrix and the strictly upper triangular part of c is not referenced.


ldc Specifies the leading dimension of c as declared in the calling


(sub)program. The value of ldc must be at least max(1, n).

Output Parameters

c With uplo = CblasUpper, the upper triangular part of the array c is


overwritten by the upper triangular part of the updated matrix.
With uplo = CblasLower, the lower triangular part of the array c is
overwritten by the lower triangular part of the updated matrix.
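
A minimal sketch (illustrative values, not from the original reference) of a cblas_dsyr2k call that forms the lower triangle of C := A*B' + B*A' for row-major 2-by-3 matrices A and B:

#include <stdio.h>
#include "mkl.h"

int main(void) {
    const MKL_INT n = 2, k = 3;
    double a[] = { 1.0, 0.0, 2.0,
                   0.0, 1.0, 1.0 };     /* n-by-k, row-major, lda = k */
    double b[] = { 1.0, 1.0, 0.0,
                   2.0, 0.0, 1.0 };     /* n-by-k, row-major, ldb = k */
    double c[4] = { 0.0 };              /* n-by-n; only the lower triangle is updated */

    cblas_dsyr2k(CblasRowMajor, CblasLower, CblasNoTrans, n, k,
                 1.0, a, k, b, k, 0.0, c, n);

    printf("c = [ %g %g ; %g %g ] (strict upper triangle not referenced)\n",
           c[0], c[1], c[2], c[3]);
    return 0;
}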

cblas_?trmm
Computes a matrix-matrix product where one input
matrix is triangular.

Syntax

void cblas_strmm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m, const
MKL_INT n, const float alpha, const float *a, const MKL_INT lda, float *b, const
MKL_INT ldb);
void cblas_dtrmm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m, const
MKL_INT n, const double alpha, const double *a, const MKL_INT lda, double *b, const
MKL_INT ldb);
void cblas_ctrmm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m, const
MKL_INT n, const void *alpha, const void *a, const MKL_INT lda, void *b, const MKL_INT
ldb);
void cblas_ztrmm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m, const
MKL_INT n, const void *alpha, const void *a, const MKL_INT lda, void *b, const MKL_INT
ldb);

Include Files
• mkl.h

Description

The ?trmm routines compute a scalar-matrix-matrix product with one triangular matrix. The operation is
defined as

B := alpha*op(A)*B

or

B := alpha*B*op(A)

where:
alpha is a scalar,
B is an m-by-n matrix,
A is a unit, or non-unit, upper or lower triangular matrix
op(A) is one of op(A) = A, or op(A) = A', or op(A) = conjg(A').

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

side Specifies whether op(A) appears on the left or right of B in the operation:

if side = CblasLeft, then B := alpha*op(A)*B;

if side = CblasRight, then B := alpha*B*op(A).

uplo Specifies whether the matrix A is upper or lower triangular:

if uplo = CblasUpper, then the matrix is upper triangular;

if uplo = CblasLower, then the matrix is lower triangular.

transa Specifies the form of op(A) used in the matrix multiplication:

if transa=CblasNoTrans, then op(A) = A;

if transa=CblasTrans, then op(A) = A';

if transa=CblasConjTrans, then op(A) = conjg(A').

diag Specifies whether the matrix A is unit triangular:


if diag = CblasUnit then the matrix is unit triangular;

if diag = CblasNonUnit , then the matrix is not unit triangular.

m Specifies the number of rows of B. The value of m must be at least zero.

n Specifies the number of columns of B. The value of n must be at least zero.

alpha Specifies the scalar alpha.


When alpha is zero, then a is not referenced and b need not be set before
entry.

a Array, size lda by k, where k is m when side = CblasLeft and is n when


side = CblasRight. Before entry with uplo = CblasUpper, the leading k
by k upper triangular part of the array a must contain the upper triangular
matrix and the strictly lower triangular part of a is not referenced.
Before entry with uplo = CblasLower, the leading k by k lower triangular
part of the array a must contain the lower triangular matrix and the strictly
upper triangular part of a is not referenced.
When diag = CblasUnit, the diagonal elements of a are not referenced
either, but are assumed to be unity.

lda Specifies the leading dimension of a as declared in the calling


(sub)program. When side = CblasLeft, then lda must be at least max(1,
m), when side = CblasRight, then lda must be at least max(1, n).

b For Layout = CblasColMajor: array, size ldb*n. Before entry, the leading
m-by-n part of the array b must contain the matrix B.
For Layout = CblasRowMajor: array, size ldb*m. Before entry, the leading
n-by-m part of the array b must contain the matrix B.


ldb Specifies the leading dimension of b as declared in the calling


(sub)program. When Layout = CblasColMajor, ldb must be at least
max(1, m); otherwise, ldb must be at least max(1, n).

Output Parameters

b Overwritten by the transformed matrix.
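
For illustration (arbitrary values, not from the original reference), a minimal cblas_dtrmm sketch computing B := 2.0*A*B in place, with A a 2-by-2 non-unit upper triangular matrix and B a row-major 2-by-3 matrix:

#include <stdio.h>
#include "mkl.h"

int main(void) {
    const MKL_INT m = 2, n = 3;
    double a[] = { 1.0, 2.0,
                   0.0, 3.0 };          /* strict lower triangle is not referenced */
    double b[] = { 1.0, 0.0, 1.0,
                   0.0, 1.0, 1.0 };

    cblas_dtrmm(CblasRowMajor, CblasLeft, CblasUpper, CblasNoTrans, CblasNonUnit,
                m, n, 2.0, a, m, b, n);

    for (MKL_INT i = 0; i < m; i++)
        printf(" %5.1f %5.1f %5.1f\n", b[i*n], b[i*n + 1], b[i*n + 2]);
    return 0;
}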

cblas_?trsm
Solves a triangular matrix equation.

Syntax
void cblas_strsm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m, const
MKL_INT n, const float alpha, const float *a, const MKL_INT lda, float *b, const
MKL_INT ldb);
void cblas_dtrsm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m, const
MKL_INT n, const double alpha, const double *a, const MKL_INT lda, double *b, const
MKL_INT ldb);
void cblas_ctrsm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m, const
MKL_INT n, const void *alpha, const void *a, const MKL_INT lda, void *b, const MKL_INT
ldb);
void cblas_ztrsm (const CBLAS_LAYOUT Layout, const CBLAS_SIDE side, const CBLAS_UPLO
uplo, const CBLAS_TRANSPOSE transa, const CBLAS_DIAG diag, const MKL_INT m, const
MKL_INT n, const void *alpha, const void *a, const MKL_INT lda, void *b, const MKL_INT
ldb);

Include Files
• mkl.h

Description

The ?trsm routines solve one of the following matrix equations:

op(A)*X = alpha*B,

or

X*op(A) = alpha*B,

where:
alpha is a scalar,
X and B are m-by-n matrices,
A is a unit, or non-unit, upper or lower triangular matrix
op(A) is one of op(A) = A, or op(A) = A', or op(A) = conjg(A').
The matrix B is overwritten by the solution matrix X.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

side Specifies whether op(A) appears on the left or right of X in the equation:

if side = CblasLeft, then op(A)*X = alpha*B;

if side = CblasRight, then X*op(A) = alpha*B.

uplo Specifies whether the matrix A is upper or lower triangular:

if uplo = CblasUpper, then the matrix is upper triangular;

if uplo = CblasLower, then the matrix is lower triangular.

transa Specifies the form of op(A) used in the matrix multiplication:

if transa=CblasNoTrans, then op(A) = A;

if transa=CblasTrans, then op(A) = A';

if transa=CblasConjTrans, then op(A) = conjg(A').

diag Specifies whether the matrix A is unit triangular:


if diag = CblasUnit then the matrix is unit triangular;

if diag = CblasNonUnit , then the matrix is not unit triangular.

m Specifies the number of rows of B. The value of m must be at least zero.

n Specifies the number of columns of B. The value of n must be at least zero.

alpha Specifies the scalar alpha.


When alpha is zero, then a is not referenced and b need not be set before
entry.

a Array, size lda* k , where k is m when side = CblasLeft and is n when


side = CblasRight. Before entry with uplo = CblasUpper, the leading k
by k upper triangular part of the array a must contain the upper triangular
matrix and the strictly lower triangular part of a is not referenced.
Before entry with uplo = CblasLower, the leading k by k lower triangular part
of the array a must contain the lower triangular matrix and the strictly upper
triangular part of a is not referenced.
When diag = CblasUnit, the diagonal elements of a are not referenced
either, but are assumed to be unity.

lda Specifies the leading dimension of a as declared in the calling


(sub)program. When side = CblasLeft, then lda must be at least max(1,
m), when side = CblasRight, then lda must be at least max(1, n).

b For Layout = CblasColMajor: array, size ldb*n. Before entry, the leading
m-by-n part of the array b must contain the matrix B.
For Layout = CblasRowMajor: array, size ldb*m. Before entry, the leading
n-by-m part of the array b must contain the matrix B.


ldb Specifies the leading dimension of b as declared in the calling


(sub)program. When Layout = CblasColMajor, ldb must be at least
max(1, m); otherwise, ldb must be at least max(1, n).

Output Parameters

b Overwritten by the solution matrix X.
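
A minimal sketch (illustrative, not from the original reference) of a cblas_dtrsm call that solves A*X = B for X with a 2-by-2 non-unit upper triangular A; on return, b holds the solution X:

#include <stdio.h>
#include "mkl.h"

int main(void) {
    const MKL_INT m = 2, n = 2;
    double a[] = { 2.0, 1.0,
                   0.0, 4.0 };
    double b[] = { 3.0, 5.0,
                   4.0, 8.0 };

    cblas_dtrsm(CblasRowMajor, CblasLeft, CblasUpper, CblasNoTrans, CblasNonUnit,
                m, n, 1.0, a, m, b, n);

    printf("X = [ %g %g ; %g %g ]\n", b[0], b[1], b[2], b[3]);
    return 0;
}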

Sparse BLAS Level 1 Routines


This section describes Sparse BLAS Level 1, an extension of BLAS Level 1 included in the Intel® Math Kernel
Library beginning with the Intel MKL release 2.1. Sparse BLAS Level 1 is a group of routines and functions
that perform a number of common vector operations on sparse vectors stored in compressed form.
Sparse vectors are those in which the majority of elements are zeros. Sparse BLAS routines and functions
are specially implemented to take advantage of vector sparsity. This allows you to achieve large savings in
computer time and memory. If nz is the number of non-zero vector elements, the computer time taken by
Sparse BLAS operations will be O(nz).

Vector Arguments
Compressed sparse vectors. Let a be a vector stored in an array, and assume that the only non-zero
elements of a are the following:
a[k1], a[k2], a[k3] . . . a[knz],

where nz is the total number of non-zero elements in a.


In Sparse BLAS, this vector can be represented in compressed form by two arrays, x (values) and indx
(indices). Each array has nz elements:
x[0]=a[k1], x[1]=a[k2], . . . x[nz-1]= a[knz],
indx[0]=k1, indx[1]=k2, . . . indx[nz-1]= knz.

Thus, a sparse vector is fully determined by the triple (nz, x, indx). If you pass a negative or zero value of nz
to Sparse BLAS, the subroutines do not modify any arrays or variables.
Full-storage vectors. Sparse BLAS routines can also use a vector argument fully stored in a single array (a
full-storage vector). If y is a full-storage vector, its elements must be stored contiguously: the first element
in y[0], the second in y[1], and so on. This corresponds to an increment incy = 1 in BLAS Level 1. No
increment value for full-storage vectors is passed as an argument to Sparse BLAS routines or functions.
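
For illustration (not part of the original reference), the dense vector a = (0, 1.0, 0, 0, -2.0, 0, 3.0) with three non-zero elements could be held in compressed form as follows; the positions stored in indx are assumed here to be zero-based:

#include "mkl.h"

MKL_INT nz     = 3;                    /* number of non-zero elements            */
double  x[]    = { 1.0, -2.0, 3.0 };   /* values of the non-zero elements        */
MKL_INT indx[] = { 1, 4, 6 };          /* their positions in the dense vector a  */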

Naming Conventions for Sparse BLAS Routines


Similar to BLAS, the names of Sparse BLAS subprograms have prefixes that determine the data type
involved: s and d for single- and double-precision real; c and z for single- and double-precision complex
respectively.
If a Sparse BLAS routine is an extension of a "dense" one, the subprogram name is formed by appending the
suffix i (standing for indexed) to the name of the corresponding "dense" subprogram. For example, the
Sparse BLAS routine saxpyi corresponds to the BLAS routine saxpy, and the Sparse BLAS function cdotci
corresponds to the BLAS function cdotc.

Routines and Data Types


Routines and data types supported in the Intel MKL implementation of Sparse BLAS are listed in Table
“Sparse BLAS Routines and Their Data Types”.

Sparse BLAS Routines and Their Data Types
Routine/Function    Data Types    Description

cblas_?axpyi        s, d, c, z    Scalar-vector product plus vector (routines)

cblas_?doti         s, d          Dot product (functions)

cblas_?dotci        c, z          Complex dot product conjugated (functions)

cblas_?dotui        c, z          Complex dot product unconjugated (functions)

cblas_?gthr         s, d, c, z    Gathering a full-storage sparse vector into
                                  compressed form nz, x, indx (routines)

cblas_?gthrz        s, d, c, z    Gathering a full-storage sparse vector into
                                  compressed form and assigning zeros to gathered
                                  elements in the full-storage vector (routines)

cblas_?roti         s, d          Givens rotation (routines)

cblas_?sctr         s, d, c, z    Scattering a vector from compressed form to
                                  full-storage form (routines)

BLAS Level 1 Routines That Can Work With Sparse Vectors


The following BLAS Level 1 routines will give correct results when you pass to them a compressed-form array
x (with the increment incx=1):

cblas_?asum sum of absolute values of vector elements

cblas_?copy copying a vector

cblas_?nrm2 Euclidean norm of a vector

cblas_?scal scaling a vector

cblas_i?amax index of the element with the largest absolute value for real flavors, or the
largest sum |Re(x[i])|+|Im(x[i])| for complex flavors.

cblas_i?amin index of the element with the smallest absolute value for real flavors, or the
smallest sum |Re(x[i])|+|Im(x[i])| for complex flavors.

The result i returned by i?amax and i?amin should be interpreted as an index into the compressed-form array, so
that the largest (smallest) value is x[i-1]; the corresponding index in the full-storage array is indx[i-1].
You can also call cblas_?rotg to compute the parameters of Givens rotation and then pass these
parameters to the Sparse BLAS routines cblas_?roti.

cblas_?axpyi
Adds a scalar multiple of compressed sparse vector to
a full-storage vector.

Syntax
void cblas_saxpyi (const MKL_INT nz, const float a, const float *x, const MKL_INT
*indx, float *y);
void cblas_daxpyi (const MKL_INT nz, const double a, const double *x, const MKL_INT
*indx, double *y);


void cblas_caxpyi (const MKL_INT nz, const void *a, const void *x, const MKL_INT *indx,
void *y);
void cblas_zaxpyi (const MKL_INT nz, const void *a, const void *x, const MKL_INT *indx,
void *y);

Include Files
• mkl.h

Description

The ?axpyi routines perform a vector-vector operation defined as

y := a*x + y

where:
a is a scalar,
x is a sparse vector stored in compressed form,
y is a vector in full storage form.
The ?axpyi routines reference or modify only the elements of y whose indices are listed in the array indx.

The values in indx must be distinct.

Input Parameters

nz The number of elements in x and indx.

a Specifies the scalar a.

x Array, size at least nz.

indx Specifies the indices for the elements of x.


Array, size at least nz.

y Array, size at least max(indx[i]).

Output Parameters

y Contains the updated vector y.
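
A minimal sketch (illustrative values, not from the original reference; the indices in indx are assumed to be zero-based) of a cblas_daxpyi call computing y := 2.0*x + y:

#include <stdio.h>
#include "mkl.h"

int main(void) {
    const MKL_INT nz = 3;
    double  x[]    = { 1.0, -2.0, 3.0 };
    MKL_INT indx[] = { 0, 2, 4 };                       /* positions of x's elements in y */
    double  y[]    = { 1.0, 1.0, 1.0, 1.0, 1.0 };

    cblas_daxpyi(nz, 2.0, x, indx, y);                  /* only y[0], y[2], y[4] change */

    for (int i = 0; i < 5; i++)
        printf(" %g", y[i]);
    printf("\n");
    return 0;
}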

cblas_?doti
Computes the dot product of a compressed sparse real
vector by a full-storage real vector.

Syntax
float cblas_sdoti (const MKL_INT nz, const float *x, const MKL_INT *indx, const float
*y);
double cblas_ddoti (const MKL_INT nz, const double *x, const MKL_INT *indx, const
double *y);

Include Files
• mkl.h

Description

The ?doti routines return the dot product of x and y defined as

res = x[0]*y[indx[0]] + x[1]*y[indx[1]] +...+ x[nz-1]*y[indx[nz-1]]

where the triple (nz, x, indx) defines a sparse real vector stored in compressed form, and y is a real vector in
full storage form. The functions reference only the elements of y whose indices are listed in the array indx.
The values in indx must be distinct.

Input Parameters

nz The number of elements in x and indx .

x Array, size at least nz.

indx Specifies the indices for the elements of x.


Array, size at least nz.

y Array, size at least max(indx[i]).

Output Parameters

res Contains the dot product of x and y, if nz is positive. Otherwise, res


contains 0.
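
For illustration (arbitrary values, indices in indx assumed zero-based; not from the original reference), a cblas_ddoti sketch:

#include <stdio.h>
#include "mkl.h"

int main(void) {
    const MKL_INT nz = 2;
    double  x[]    = { 3.0, 4.0 };
    MKL_INT indx[] = { 1, 3 };
    double  y[]    = { 0.0, 2.0, 0.0, 5.0 };

    /* res = x[0]*y[indx[0]] + x[1]*y[indx[1]] */
    double res = cblas_ddoti(nz, x, indx, y);
    printf("res = %g\n", res);
    return 0;
}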

cblas_?dotci
Computes the conjugated dot product of a
compressed sparse complex vector with a full-storage
complex vector.

Syntax
void cblas_cdotci_sub (const MKL_INT nz, const void *x, const MKL_INT *indx, const void
*y, void *dotui);
void cblas_zdotci_sub (const MKL_INT nz, const void *x, const MKL_INT *indx, const void
*y, void *dotui);

Include Files
• mkl.h

Description

The ?dotci routines return the dot product of x and y defined as

conjg(x[0])*y[indx[0]] + ... + conjg(x[nz-1])*y[indx[nz-1]]

where the triple (nz, x, indx) defines a sparse complex vector stored in compressed form, and y is a complex
vector in full storage form. The functions reference only the elements of y whose indices are listed in the
array indx. The values in indx must be distinct.

Input Parameters

nz The number of elements in x and indx .


x Array, size at least nz.

indx Specifies the indices for the elements of x.


Array, size at least nz.

y Array, size at least max(indx[i]).

Output Parameters

dotui Contains the conjugated dot product of x and y, if nz is positive. Otherwise,


it contains 0.
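
A minimal sketch (illustrative values, indices in indx assumed zero-based; not from the original reference) showing how the conjugated dot product is returned through the last argument of cblas_zdotci_sub:

#include <stdio.h>
#include "mkl.h"

int main(void) {
    const MKL_INT nz = 2;
    MKL_Complex16 x[]    = { { 1.0, 1.0 }, { 2.0, -1.0 } };
    MKL_INT       indx[] = { 0, 2 };
    MKL_Complex16 y[]    = { { 1.0, 0.0 }, { 0.0, 0.0 }, { 0.0, 3.0 } };
    MKL_Complex16 res;

    /* res = conjg(x[0])*y[indx[0]] + conjg(x[1])*y[indx[1]] */
    cblas_zdotci_sub(nz, x, indx, y, &res);
    printf("res = %g + %g*i\n", res.real, res.imag);
    return 0;
}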

cblas_?dotui
Computes the dot product of a compressed sparse
complex vector by a full-storage complex vector.

Syntax
void cblas_cdotui_sub (const MKL_INT nz, const void *x, const MKL_INT *indx, const void
*y, void *dotui);
void cblas_zdotui_sub (const MKL_INT nz, const void *x, const MKL_INT *indx, const void
*y, void *dotui);

Include Files
• mkl.h

Description

The ?dotui routines return the dot product of x and y defined as

res = x[0]*y[indx[0]] + x[1]*y[indx[1]] +...+ x[nz-1]*y[indx[nz-1]]

where the triple (nz, x, indx) defines a sparse complex vector stored in compressed form, and y is a complex
vector in full storage form. The functions reference only the elements of y whose indices are listed in the
array indx. The values in indx must be distinct.

Input Parameters

nz The number of elements in x and indx.

x Array, size at least nz.

indx Specifies the indices for the elements of x.


Array, size at least nz.

y Array, size at least max(indx[i]).

Output Parameters

dotui Contains the dot product of x and y, if nz is positive. Otherwise, it
contains 0.

cblas_?gthr
Gathers a full-storage sparse vector's elements into
compressed form.

Syntax
void cblas_sgthr (const MKL_INT nz, const float *y, float *x, const MKL_INT *indx);
void cblas_dgthr (const MKL_INT nz, const double *y, double *x, const MKL_INT *indx);
void cblas_cgthr (const MKL_INT nz, const void *y, void *x, const MKL_INT *indx);
void cblas_zgthr (const MKL_INT nz, const void *y, void *x, const MKL_INT *indx);

Include Files
• mkl.h

Description

The ?gthr routines gather the specified elements of a full-storage sparse vector y into compressed form (nz,
x, indx). The routines reference only the elements of y whose indices are listed in the array indx:
x[i] = y[indx[i]], for i=0,1,..., nz-1.

Input Parameters

nz The number of elements of y to be gathered.

indx Specifies indices of elements to be gathered.


Array, size at least nz.

y Array, size at least max(indx[i]).

Output Parameters

x Array, size at least nz.


Contains the vector converted to the compressed form.
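
For illustration (arbitrary values, indices assumed zero-based; not from the original reference), a cblas_dgthr sketch that gathers three elements of y into the compressed vector (nz, x, indx):

#include <stdio.h>
#include "mkl.h"

int main(void) {
    const MKL_INT nz = 3;
    double  y[]    = { 0.0, 7.0, 0.0, -1.0, 4.0 };
    MKL_INT indx[] = { 1, 3, 4 };
    double  x[3];

    cblas_dgthr(nz, y, x, indx);          /* x[i] = y[indx[i]] */
    printf("x = { %g, %g, %g }\n", x[0], x[1], x[2]);
    return 0;
}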

cblas_?gthrz
Gathers a sparse vector's elements into compressed
form, replacing them by zeros.

Syntax
void cblas_sgthrz (const MKL_INT nz, float *y, float *x, const MKL_INT *indx);
void cblas_dgthrz (const MKL_INT nz, double *y, double *x, const MKL_INT *indx);
void cblas_cgthrz (const MKL_INT nz, void *y, void *x, const MKL_INT *indx);
void cblas_zgthrz (const MKL_INT nz, void *y, void *x, const MKL_INT *indx);

Include Files
• mkl.h


Description

The ?gthrz routines gather the elements with indices specified by the array indx from a full-storage vector y
into compressed form (nz, x, indx) and overwrite the gathered elements of y by zeros. Other elements of y
are not referenced or modified (see also ?gthr).

Input Parameters

nz The number of elements of y to be gathered.

indx Specifies indices of elements to be gathered.


Array, size at least nz.

y Array, size at least max(indx[i]).

Output Parameters

x Array, size at least nz.


Contains the vector converted to the compressed form.

y The updated vector y.

cblas_?roti
Applies Givens rotation to sparse vectors one of which
is in compressed form.

Syntax
void cblas_sroti (const MKL_INT nz, float *x, const MKL_INT *indx, float *y, const
float c, const float s);
void cblas_droti (const MKL_INT nz, double *x, const MKL_INT *indx, double *y, const
double c, const double s);

Include Files
• mkl.h

Description

The ?roti routines apply the Givens rotation to elements of two real vectors, x (in compressed form nz, x,
indx) and y (in full storage form):

x[i] = c*x[i] + s*y[indx[i]]


y[indx[i]] = c*y[indx[i]] - s*x[i]

The routines reference only the elements of y whose indices are listed in the array indx. The values in indx
must be distinct.

Input Parameters

nz The number of elements in x and indx.

x Array, size at least nz.

indx Specifies the indices for the elements of x.
Array, size at least nz.

y Array, size at least max(indx[i]).

c A scalar.

s A scalar.

Output Parameters

x and y The updated arrays.
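
As noted earlier in this section, cblas_?rotg can be used to compute the rotation parameters that are then passed to cblas_?roti. A minimal sketch (illustrative values, indices in indx assumed zero-based; not from the original reference):

#include <stdio.h>
#include "mkl.h"

int main(void) {
    /* Compute the Givens rotation parameters c and s from (3, 4). */
    float aa = 3.0f, bb = 4.0f, c, s;
    cblas_srotg(&aa, &bb, &c, &s);

    /* Apply the rotation to the compressed vector x and the full-storage vector y. */
    const MKL_INT nz = 2;
    float   x[]    = { 1.0f, 2.0f };
    MKL_INT indx[] = { 0, 3 };
    float   y[]    = { 0.5f, 0.0f, 0.0f, -1.0f };

    cblas_sroti(nz, x, indx, y, c, s);
    printf("x = { %g, %g }, y[0] = %g, y[3] = %g\n", x[0], x[1], y[0], y[3]);
    return 0;
}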

cblas_?sctr
Converts compressed sparse vectors into full storage
form.

Syntax
void cblas_ssctr (const MKL_INT nz, const float *x, const MKL_INT *indx, float *y);
void cblas_dsctr (const MKL_INT nz, const double *x, const MKL_INT *indx, double *y);
void cblas_csctr (const MKL_INT nz, const void *x, const MKL_INT *indx, void *y);
void cblas_zsctr (const MKL_INT nz, const void *x, const MKL_INT *indx, void *y);

Include Files
• mkl.h

Description

The ?sctr routines scatter the elements of the compressed sparse vector (nz, x, indx) to a full-storage
vector y. The routines modify only the elements of y whose indices are listed in the array indx:
y[indx[i]] = x[i], for i=0,1,... ,nz-1.

Input Parameters

nz The number of elements of x to be scattered.

indx Specifies indices of elements to be scattered.


Array, size at least nz.

x Array, size at least nz.


Contains the vector to be converted to full-storage form.

Output Parameters

y Array, size at least max(indx[i]).


Contains the vector y with updated elements.


Sparse BLAS Level 2 and Level 3 Routines


This section describes Sparse BLAS Level 2 and Level 3 routines included in the Intel® Math Kernel Library
(Intel® MKL) . Sparse BLAS Level 2 is a group of routines and functions that perform operations between a
sparse matrix and dense vectors. Sparse BLAS Level 3 is a group of routines and functions that perform
operations between a sparse matrix and dense matrices.
The terms and concepts required to understand the use of the Intel MKL Sparse BLAS Level 2 and Level 3
routines are discussed in the Linear Solvers Basics appendix.
The Sparse BLAS routines can be useful to implement iterative methods for solving large sparse systems of
equations or eigenvalue problems. For example, these routines can be considered as building blocks for
Iterative Sparse Solvers based on Reverse Communication Interface (RCI ISS) described in Chapter 8 of
the manual.
Intel MKL provides Sparse BLAS Level 2 and Level 3 routines with typical (or conventional) interface similar
to the interface used in the NIST* Sparse BLAS library [Rem05].
Some software packages and libraries (the PARDISO* Solver used in Intel MKL, Sparskit 2 [Saad94], the
Compaq* Extended Math Library (CXML)[CXML01]) use different (early) variation of the compressed sparse
row (CSR) format and support only Level 2 operations with simplified interfaces. Intel MKL provides an
additional set of Sparse BLAS Level 2 routines with similar simplified interfaces. Each of these routines
operates only on a matrix of the fixed type.
The routines described in this section support both one-based indexing and zero-based indexing of the input
data (see details in the section One-based and Zero-based Indexing).

Naming Conventions in Sparse BLAS Level 2 and Level 3


Each Sparse BLAS Level 2 and Level 3 routine has a six- or eight-character base name preceded by the prefix
mkl_ or mkl_cspblas_ .
The routines with typical (conventional) interface have six-character base names in accordance with the
template:
mkl_<character><data><operation>( )

The routines with simplified interfaces have eight-character base names in accordance with the templates:
mkl_<character><data><mtype><operation>( )

for routines with one-based indexing; and

mkl_cspblas_<character><data><mtype><operation>( )

for routines with zero-based indexing.


The <character> field indicates the data type:

s real, single precision

c complex, single precision

d real, double precision

z complex, double precision

The <data> field indicates the sparse matrix storage format (see section Sparse Matrix Storage Formats):

coo coordinate format

csr compressed sparse row format and its variations

csc compressed sparse column format and its variations

dia diagonal format

sky skyline storage format

bsr block sparse row format and its variations

The <operation> field indicates the type of operation:

mv matrix-vector product (Level 2)

mm matrix-matrix product (Level 3)

sv solving a single triangular system (Level 2)

sm solving triangular systems with multiple right-hand sides (Level 3)

The field <mtype> indicates the matrix type:

ge sparse representation of a general matrix

sy sparse representation of the upper or lower triangle of a symmetric matrix

tr sparse representation of a triangular matrix

Sparse Matrix Storage Formats for Sparse BLAS Routines


The current version of Intel MKL Sparse BLAS Level 2 and Level 3 routines supports the following point entry
[Duff86] storage formats for sparse matrices:

• compressed sparse row format (CSR) and its variations;


• compressed sparse column format (CSC);
• coordinate format;
• diagonal format;
• skyline storage format;

and one block entry storage format:

• block sparse row format (BSR) and its variations.


For more information see "Sparse Matrix Storage Formats" in Appendix A.
Intel MKL provides auxiliary routines - matrix converters - that convert sparse matrix from one storage
format to another.

Routines and Supported Operations


This section describes operations supported by the Intel MKL Sparse BLAS Level 2 and Level 3 routines. The
following notations are used here:
A is a sparse matrix;
B and C are dense matrices;
D is a diagonal scaling matrix;
x and y are dense vectors;
alpha and beta are scalars;

op(A) is one of the possible operations:


op(A) = A;


op(A) = AT - transpose of A;
op(A) = AH - conjugated transpose of A.

inv(op(A)) denotes the inverse of op(A).


The Intel MKL Sparse BLAS Level 2 and Level 3 routines support the following operations:

• computing the vector product between a sparse matrix and a dense vector:
y := alpha*op(A)*x + beta*y
• solving a single triangular system:
y := alpha*inv(op(A))*x

• computing a product between sparse matrix and dense matrix:


C := alpha*op(A)*B + beta*C
• solving a sparse triangular system with multiple right-hand sides:
C := alpha*inv(op(A))*B

Intel MKL provides an additional set of the Sparse BLAS Level 2 routines with simplified interfaces. Each of
these routines operates on a matrix of the fixed type. The following operations are supported:

• computing the vector product between a sparse matrix and a dense vector (for general and symmetric
matrices):

y := op(A)*x
• solving a single triangular system (for triangular matrices):
y := inv(op(A))*x

Matrix type is indicated by the field <mtype> in the routine name (see section Naming Conventions in Sparse
BLAS Level 2 and Level 3).

NOTE
The routines with simplified interfaces support only four sparse matrix storage formats, specifically:

CSR format in the 3-array variation accepted in the direct sparse solvers and in the CXML;
diagonal format accepted in the CXML;
coordinate format;
BSR format in the 3-array variation.

Note that routines with both typical (conventional) and simplified interfaces use the same computational
kernels that work with certain internal data structures.
The Intel MKL Sparse BLAS Level 2 and Level 3 routines do not support in-place operations.
Complete list of all routines is given in the “Sparse BLAS Level 2 and Level 3 Routines”.

Interface Consideration

One-Based and Zero-Based Indexing


The Intel MKL Sparse BLAS Level 2 and Level 3 routines support one-based and zero-based indexing of data
arrays.
Routines with typical interfaces support zero-based indexing for the following sparse data storage formats:
CSR, CSC, BSR, and COO. Routines with simplified interfaces support zero based indexing for the following
sparse data storage formats: CSR, BSR, and COO. See the complete list of Sparse BLAS Level 2 and Level 3
Routines.

The one-based indexing uses the convention of starting array indices at 1. The zero-based indexing uses the
convention of starting array indices at 0. For example, indices of the 5-element array x can be presented in
case of one-based indexing as follows:
Element index: 1 2 3 4 5

Element value: 1.0 5.0 7.0 8.0 9.0

and in case of zero-based indexing as follows:


Element index: 0 1 2 3 4

Element value: 1.0 5.0 7.0 8.0 9.0

The detailed descriptions of the one-based and zero-based variants of the sparse data storage formats are
given in the "Sparse Matrix Storage Formats" in Appendix A.
Most parameters of the routines are identical for both one-based and zero-based indexing, but some of them
have certain differences. The following table lists all these differences.

val
  One-based indexing: Array containing non-zero elements of the matrix A; its
  length is pntre[m] - pntrb[1].
  Zero-based indexing: Array containing non-zero elements of the matrix A; its
  length is pntre[m-1] - pntrb[0].

pntrb
  One-based indexing: Array of length m. This array contains row indices, such
  that pntrb[i] - pntrb[1] + 1 is the first index of row i in the arrays val and indx.
  Zero-based indexing: Array of length m. This array contains row indices, such
  that pntrb[i] - pntrb[0] is the first index of row i in the arrays val and indx.

pntre
  One-based indexing: Array of length m. This array contains row indices, such
  that pntre[i] - pntrb[1] is the last index of row i in the arrays val and indx.
  Zero-based indexing: Array of length m. This array contains row indices, such
  that pntre[i] - pntrb[0] - 1 is the last index of row i in the arrays val and indx.

ia
  One-based indexing: Array of length m + 1, containing indices of elements in the
  array a, such that ia[i] is the index in the array a of the first non-zero element
  from the row i. The value of the last element ia[m + 1] is equal to the number of
  non-zeros plus one.
  Zero-based indexing: Array of length m + 1, containing indices of elements in the
  array a, such that ia[i] is the index in the array a of the first non-zero element
  from the row i. The value of the last element ia[m] is equal to the number of
  non-zeros.

ldb
  One-based indexing: Specifies the leading dimension of b as declared in the
  calling (sub)program.
  Zero-based indexing: Specifies the second dimension of b as declared in the
  calling (sub)program.

ldc
  One-based indexing: Specifies the leading dimension of c as declared in the
  calling (sub)program.
  Zero-based indexing: Specifies the second dimension of c as declared in the
  calling (sub)program.
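
To make the two conventions concrete, a small sketch (not from the original reference) showing one 3-by-3 matrix stored in the CSR format (3-array variation) with zero-based and with one-based indexing:

#include "mkl.h"

/*        | 1 0 2 |
      A = | 0 3 0 |
          | 4 0 5 |                                                     */

/* Zero-based indexing */
double  values0[]   = { 1.0, 2.0, 3.0, 4.0, 5.0 };
MKL_INT columns0[]  = { 0, 2, 1, 0, 2 };
MKL_INT rowIndex0[] = { 0, 2, 3, 5 };     /* last entry = number of non-zeros */

/* One-based indexing */
double  values1[]   = { 1.0, 2.0, 3.0, 4.0, 5.0 };
MKL_INT columns1[]  = { 1, 3, 2, 1, 3 };
MKL_INT rowIndex1[] = { 1, 3, 4, 6 };     /* last entry = number of non-zeros + 1 */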

Differences Between Intel MKL and NIST* Interfaces


The Intel MKL Sparse BLAS Level 3 routines have the following conventional interfaces:
mkl_xyyymm(transa, m, n, k, alpha, matdescra, arg(A), b, ldb, beta, c, ldc), for matrix-
matrix product;
mkl_xyyysm(transa, m, n, alpha, matdescra, arg(A), b, ldb, c, ldc), for triangular solvers
with multiple right-hand sides.
Here x denotes data type, and yyy - sparse matrix data structure (storage format).

The analogous NIST* Sparse BLAS (NSB) library routines have the following interfaces:


xyyymm(transa, m, n, k, alpha, descra, arg(A), b, ldb, beta, c, ldc, work, lwork), for
matrix-matrix product;
xyyysm(transa, m, n, unitd, dv, alpha, descra, arg(A), b, ldb, beta, c, ldc, work,
lwork), for triangular solvers with multiple right-hand sides.
Some similar arguments are used in both libraries. The argument transa indicates what operation is
performed and is slightly different in the NSB library (see Table “Parameter transa”). The arguments m and k
are the number of rows and column in the matrix A, respectively, n is the number of columns in the matrix C.
The arguments alpha and beta are scalar alpha and beta respectively (beta is not used in the Intel MKL
triangular solvers.) The arguments b and c are rectangular arrays with the leading dimension ldb and ldc,
respectively. arg(A) denotes the list of arguments that describe the sparse representation of A.
Parameter transa

MKL interface NSB interface Operation

data type char * INTEGER

value N or n 0 op(A) = A

T or t 1 op(A) = AT

C or c 2 op(A) = AT or op(A) = AH

Parameter matdescra
The parameter matdescra describes the relevant characteristic of the matrix A. This manual describes
matdescra as an array of six elements in line with the NIST* implementation. However, only the first four
elements of the array are used in the current versions of the Intel MKL Sparse BLAS routines. Elements
matdescra[4] and matdescra[5] are reserved for future use. Note that whether matdescra is described in
your application as an array of length 6 or 4 is of no importance because the array is declared as a pointer in
the Intel MKL routines. To learn more about declaration of the matdescra array, see the Sparse BLAS
examples located in the Intel MKL installation directory: examples/spblasc/ for C. The table below lists
elements of the parameter matdescra, their Fortran values, and their meanings. The parameter matdescra
corresponds to the argument descra from NSB library.
Possible Values of the Parameter matdescra [descra - 1]

                MKL interface                        NSB interface   Matrix characteristics
                one-based        zero-based
                indexing         indexing

data type       char *           char *              int *

1st element     matdescra[1]     matdescra[0]        descra[0]       matrix structure
value           G                G                   0               general
                S                S                   1               symmetric (A = AT)
                H                H                   2               Hermitian (A = AH)
                T                T                   3               triangular
                A                A                   4               skew(anti)-symmetric (A = -AT)
                D                D                   5               diagonal

2nd element     matdescra[2]     matdescra[1]        descra[1]       upper/lower triangular indicator
value           L                L                   1               lower
                U                U                   2               upper

3rd element     matdescra[3]     matdescra[2]        descra[2]       main diagonal type
value           N                N                   0               non-unit
                U                U                   1               unit

4th element     matdescra[4]     matdescra[3]        descra[3]       type of indexing
value           F                                    1               one-based indexing
                C                                    0               zero-based indexing

In some cases possible element values of the parameter matdescra depend on the values of other elements.
The Table "Possible Combinations of Element Values of the Parameter matdescra" lists all possible
combinations of element values for both multiplication routines and triangular solvers.
Possible Combinations of Element Values of the Parameter matdescra

Routines                  matdescra[0]   matdescra[1]   matdescra[2]   matdescra[3]

Multiplication Routines   G              ignored        ignored        F (default) or C
                          S or H         L (default)    N (default)    F (default) or C
                          S or H         L (default)    U              F (default) or C
                          S or H         U              N (default)    F (default) or C
                          S or H         U              U              F (default) or C
                          A              L (default)    ignored        F (default) or C
                          A              U              ignored        F (default) or C

Multiplication Routines   T              L              U              F (default) or C
and Triangular Solvers    T              L              N              F (default) or C
                          T              U              U              F (default) or C
                          T              U              N              F (default) or C
                          D              ignored        N (default)    F (default) or C
                          D              ignored        U              F (default) or C

For a matrix in the skyline format with the main diagonal declared to be a unit, diagonal elements must be
stored in the sparse representation even if they are zero. In all other formats, diagonal elements can be
stored (if needed) in the sparse representation if they are not zero.
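
For illustration only (not from the original reference), two typical matdescra declarations in C; only the first four elements are significant in the current routines:

/* General matrix, one-based indexing. */
char descr_general[6]    = { 'G', ' ', ' ', 'F', ' ', ' ' };

/* Upper triangular matrix, non-unit main diagonal, zero-based indexing. */
char descr_triangular[6] = { 'T', 'U', 'N', 'C', ' ', ' ' };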

Operations with Partial Matrices


One of the distinctive features of the Intel MKL Sparse BLAS routines is the possibility to perform operations
only on partial matrices composed of certain parts (triangles and the main diagonal) of the input sparse
matrix. It can be done by properly setting the first three elements of the parameter matdescra.

An arbitrary sparse matrix A can be decomposed as

A = L + D + U


where L is the strict lower triangle of A, U is the strict upper triangle of A, D is the main diagonal.
Table "Output Matrices for Multiplication Routines" shows correspondence between the output matrices and
values of the parameter matdescra for the sparse matrix A for multiplication routines.
Output Matrices for Multiplication Routines
matdescra[0] matdescra[1] matdescra[2] Output Matrix

G ignored ignored alpha*op(A)*x + beta*y


alpha*op(A)*B + beta*C

S or H L N alpha*op(L+D+L')*x + beta*y
alpha*op(L+D+L')*B + beta*C

S or H L U alpha*op(L+I+L')*x + beta*y

alpha*op(L+I+L')*B + beta*C

S or H U N alpha*op(U'+D+U)*x + beta*y
alpha*op(U'+D+U)*B + beta*C

S or H U U alpha*op(U'+I+U)*x + beta*y
alpha*op(U'+I+U)*B + beta*C

T L U alpha*op(L+I)*x + beta*y
alpha*op(L+I)*B + beta*C

T L N alpha*op(L+D)*x + beta*y
alpha*op(L+D)*B + beta*C

T U U alpha*op(U+I)*x + beta*y
alpha*op(U+I)*B + beta*C

T U N alpha*op(U+D)*x + beta*y
alpha*op(U+D)*B + beta*C

A L ignored alpha*op(L-L')*x + beta*y


alpha*op(L-L')*B + beta*C

A U ignored alpha*op(U-U')*x + beta*y


alpha*op(U-U')*B + beta*C

D ignored N alpha*D*x + beta*y


alpha*D*B + beta*C

D ignored U alpha*x + beta*y


alpha*B + beta*C

Table “Output Matrices for Triangular Solvers” shows correspondence between the output matrices and values
of the parameter matdescra for the sparse matrix A for triangular solvers.
Output Matrices for Triangular Solvers
matdescra[0] matdescra[1] matdescra[2] Output Matrix

T L N alpha*inv(op(L))*x
alpha*inv(op(L))*B

T L U alpha*inv(op(L))*x

matdescra[0] matdescra[1] matdescra[2] Output Matrix
alpha*inv(op(L))*B

T U N alpha*inv(op(U))*x
alpha*inv(op(U))*B

T U U alpha*inv(op(U))*x
alpha*inv(op(U))*B

D ignored N alpha*inv(D)*x
alpha*inv(D)*B

D ignored U alpha*x
alpha*B

Sparse BLAS Level 2 and Level 3 Routines.


Table “Sparse BLAS Level 2 and Level 3 Routines” lists the sparse BLAS Level 2 and Level 3 routines
described in more detail later in this section.
Sparse BLAS Level 2 and Level 3 Routines
Routine/Function Description

Simplified interface, one-based indexing

mkl_?csrgemv Computes matrix - vector product of a sparse general matrix


in the CSR format (3-array variation)

mkl_?bsrgemv Computes matrix - vector product of a sparse general matrix


in the BSR format (3-array variation).

mkl_?coogemv Computes matrix - vector product of a sparse general matrix


in the coordinate format.

mkl_?diagemv Computes matrix - vector product of a sparse general matrix


in the diagonal format.

mkl_?csrsymv Computes matrix - vector product of a sparse symmetrical


matrix in the CSR format (3-array variation)

mkl_?bsrsymv Computes matrix - vector product of a sparse symmetrical


matrix in the BSR format (3-array variation).

mkl_?coosymv Computes matrix - vector product of a sparse symmetrical


matrix in the coordinate format.

mkl_?diasymv Computes matrix - vector product of a sparse symmetrical


matrix in the diagonal format.

mkl_?csrtrsv Triangular solvers with simplified interface for a sparse matrix


in the CSR format (3-array variation).

mkl_?bsrtrsv Triangular solver with simplified interface for a sparse matrix


in the BSR format (3-array variation).

mkl_?cootrsv Triangular solvers with simplified interface for a sparse matrix


in the coordinate format.


Routine/Function Description

mkl_?diatrsv Triangular solvers with simplified interface for a sparse matrix


in the diagonal format.

Simplified interface, zero-based indexing

mkl_cspblas_?csrgemv Computes matrix - vector product of a sparse general matrix


in the CSR format (3-array variation) with zero-based
indexing.

mkl_cspblas_?bsrgemv Computes matrix - vector product of a sparse general matrix


in the BSR format (3-array variation)with zero-based indexing.

mkl_cspblas_?coogemv Computes matrix - vector product of a sparse general matrix


in the coordinate format with zero-based indexing.

mkl_cspblas_?csrsymv Computes matrix - vector product of a sparse symmetrical


matrix in the CSR format (3-array variation) with zero-based
indexing

mkl_cspblas_?bsrsymv Computes matrix - vector product of a sparse symmetrical


matrix in the BSR format (3-array variation) with zero-based
indexing.

mkl_cspblas_?coosymv Computes matrix - vector product of a sparse symmetrical


matrix in the coordinate format with zero-based indexing.

mkl_cspblas_?csrtrsv Triangular solvers with simplified interface for a sparse matrix


in the CSR format (3-array variation) with zero-based
indexing.

mkl_cspblas_?bsrtrsv Triangular solver with simplified interface for a sparse matrix


in the BSR format (3-array variation) with zero-based
indexing.

mkl_cspblas_?cootrsv Triangular solver with simplified interface for a sparse matrix


in the coordinate format with zero-based indexing.

Typical (conventional) interface, one-based and zero-based indexing

mkl_?csrmv Computes matrix - vector product of a sparse matrix in the


CSR format.

mkl_?bsrmv Computes matrix - vector product of a sparse matrix in the


BSR format.

mkl_?cscmv Computes matrix - vector product for a sparse matrix in the


CSC format.

mkl_?coomv Computes matrix - vector product for a sparse matrix in the


coordinate format.

mkl_?csrsv Solves a system of linear equations for a sparse matrix in the


CSR format.

mkl_?bsrsv Solves a system of linear equations for a sparse matrix in the


BSR format.

mkl_?cscsv Solves a system of linear equations for a sparse matrix in the


CSC format.

Routine/Function Description

mkl_?coosv Solves a system of linear equations for a sparse matrix in the


coordinate format.

mkl_?csrmm Computes matrix - matrix product of a sparse matrix in the


CSR format

mkl_?bsrmm Computes matrix - matrix product of a sparse matrix in the


BSR format.

mkl_?cscmm Computes matrix - matrix product of a sparse matrix in the


CSC format

mkl_?coomm Computes matrix - matrix product of a sparse matrix in the


coordinate format.

mkl_?csrsm Solves a system of linear matrix equations for a sparse matrix


in the CSR format.

mkl_?bsrsm Solves a system of linear matrix equations for a sparse matrix


in the BSR format.

mkl_?cscsm Solves a system of linear matrix equations for a sparse matrix


in the CSC format.

mkl_?coosm Solves a system of linear matrix equations for a sparse matrix


in the coordinate format.

Typical (conventional) interface, one-based indexing

mkl_?diamv Computes matrix - vector product of a sparse matrix in the


diagonal format.

mkl_?skymv Computes matrix - vector product for a sparse matrix in the


skyline storage format.

mkl_?diasv Solves a system of linear equations for a sparse matrix in the


diagonal format.

mkl_?skysv Solves a system of linear equations for a sparse matrix in the


skyline format.

mkl_?diamm Computes matrix - matrix product of a sparse matrix in the


diagonal format.

mkl_?skymm Computes matrix - matrix product of a sparse matrix in the


skyline storage format.

mkl_?diasm Solves a system of linear matrix equations for a sparse matrix


in the diagonal format.

mkl_?skysm Solves a system of linear matrix equations for a sparse matrix


in the skyline storage format.
Auxiliary routines

Matrix converters

mkl_?dnscsr Converts a sparse matrix in uncompressed representation to


CSR format (3-array variation) and vice versa.

mkl_?csrcoo Converts a sparse matrix in CSR format (3-array variation) to


coordinate format and vice versa.


Routine/Function Description

mkl_?csrbsr Converts a sparse matrix in CSR format to BSR format (3-


array variations) and vice versa.

mkl_?csrcsc Converts a sparse matrix in CSR format to CSC format and


vice versa (3-array variations).

mkl_?csrdia Converts a sparse matrix in CSR format (3-array variation) to


diagonal format and vice versa.

mkl_?csrsky Converts a sparse matrix in CSR format (3-array variation) to


sky line format and vice versa.

Operations on sparse matrices

mkl_?csradd Computes the sum of two sparse matrices stored in the CSR
format (3-array variation) with one-based indexing.

mkl_?csrmultcsr Computes the product of two sparse matrices stored in the


CSR format (3-array variation) with one-based indexing.

mkl_?csrmultd Computes product of two sparse matrices stored in the CSR


format (3-array variation) with one-based indexing. The result
is stored in the dense matrix.

mkl_?csrgemv
Computes matrix - vector product of a sparse general
matrix stored in the CSR format (3-array variation)
with one-based indexing.

Syntax
void mkl_scsrgemv (const char *transa , const MKL_INT *m , const float *a , const
MKL_INT *ia , const MKL_INT *ja , const float *x , float *y );
void mkl_dcsrgemv (const char *transa , const MKL_INT *m , const double *a , const
MKL_INT *ia , const MKL_INT *ja , const double *x , double *y );
void mkl_ccsrgemv (const char *transa , const MKL_INT *m , const MKL_Complex8 *a ,
const MKL_INT *ia , const MKL_INT *ja , const MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_zcsrgemv (const char *transa , const MKL_INT *m , const MKL_Complex16 *a ,
const MKL_INT *ia , const MKL_INT *ja , const MKL_Complex16 *x , MKL_Complex16 *y );

Include Files
• mkl.h

Description

The mkl_?csrgemv routine performs a matrix-vector operation defined as

y := A*x

or

y := A^T*x,

where:
x and y are vectors,
A is an m-by-m sparse square matrix in the CSR format (3-array variation), A^T is the transpose of A.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

transa Specifies the operation.


If transa = 'N' or 'n', then y := A*x.

If transa = 'T' or 't' or 'C' or 'c', then y := A^T*x.

m Number of rows of the matrix A.

a Array containing non-zero elements of the matrix A. Its length is equal to


the number of non-zero elements in the matrix A. Refer to values array
description in Sparse Matrix Storage Formats for more details.

ia Array of length m + 1, containing indices of elements in the array a, such


that ia[i] - ia[0] is the index in the array a of the first non-zero
element from the row i. The value of the last element ia[m] - ia[0] is
equal to the number of non-zeros. Refer to rowIndex array description in
Sparse Matrix Storage Formats for more details.

ja Array containing the column indices plus one for each non-zero element of
the matrix A.
Its length is equal to the length of the array a. Refer to columns array
description in Sparse Matrix Storage Formats for more details.

x Array, size is m.

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least m.

On exit, the array y must contain the vector y.
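
For illustration, the following minimal sketch calls mkl_dcsrgemv on a small matrix; the 3-by-3 matrix and its
one-based CSR arrays are example data chosen only to demonstrate the calling sequence.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* A = | 1 0 2 |
           | 0 3 0 |
           | 4 0 5 |   stored in the CSR format (3-array variation) with one-based indexing */
    MKL_INT m    = 3;
    double  a[]  = {1.0, 2.0, 3.0, 4.0, 5.0};
    MKL_INT ja[] = {1, 3, 2, 1, 3};          /* column indices plus one */
    MKL_INT ia[] = {1, 3, 4, 6};             /* row start positions, length m + 1 */
    double  x[]  = {1.0, 1.0, 1.0};
    double  y[3];
    char    transa = 'N';                    /* compute y := A*x */

    mkl_dcsrgemv(&transa, &m, a, ia, ja, x, y);

    for (int i = 0; i < 3; i++)
        printf("y[%d] = %4.1f\n", i, y[i]);  /* expected: 3.0 3.0 9.0 */
    return 0;
}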

mkl_?bsrgemv
Computes matrix - vector product of a sparse general
matrix stored in the BSR format (3-array variation)
with one-based indexing.

Syntax
void mkl_sbsrgemv (const char *transa , const MKL_INT *m , const MKL_INT *lb , const
float *a , const MKL_INT *ia , const MKL_INT *ja , const float *x , float *y );
void mkl_dbsrgemv (const char *transa , const MKL_INT *m , const MKL_INT *lb , const
double *a , const MKL_INT *ia , const MKL_INT *ja , const double *x , double *y );
void mkl_cbsrgemv (const char *transa , const MKL_INT *m , const MKL_INT *lb , const
MKL_Complex8 *a , const MKL_INT *ia , const MKL_INT *ja , const MKL_Complex8 *x ,
MKL_Complex8 *y );


void mkl_zbsrgemv (const char *transa , const MKL_INT *m , const MKL_INT *lb , const
MKL_Complex16 *a , const MKL_INT *ia , const MKL_INT *ja , const MKL_Complex16 *x ,
MKL_Complex16 *y );

Include Files
• mkl.h

Description

The mkl_?bsrgemv routine performs a matrix-vector operation defined as

y := A*x

or

y := A^T*x,

where:
x and y are vectors,
A is an m-by-m block sparse square matrix in the BSR format (3-array variation), A^T is the transpose of A.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

transa Specifies the operation.


If transa = 'N' or 'n', then the matrix-vector product is computed as
y := A*x.
If transa = 'T' or 't' or 'C' or 'c', then the matrix-vector product is
computed as y := A^T*x.

m Number of block rows of the matrix A.

lb Size of the block in the matrix A.

a Array containing elements of non-zero blocks of the matrix A. Its length is


equal to the number of non-zero blocks in the matrix A multiplied by lb*lb.
Refer to values array description in BSR Format for more details.

ia Array of length (m + 1), containing indices of block in the array a, such


that ia[i] - ia[0] is the index in the array a of the first non-zero
element from the row i. The value of the last element ia[m] - ia[0] is
equal to the number of non-zero blocks. Refer to rowIndex array
description in BSR Format for more details.

ja Array containing the column indices plus one for each non-zero block in the
matrix A.
Its length is equal to the number of non-zero blocks of the matrix A. Refer
to columns array description in BSR Format for more details.

x Array, size (m*lb).

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least (m*lb).

On exit, the array y must contain the vector y.
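
The sketch below shows one way to call mkl_dbsrgemv; the block-diagonal matrix and its one-based BSR arrays
are example data. Each 2-by-2 block here is symmetric, so the result does not depend on the intra-block element
order; see the BSR Format description for the exact layout used by the one-based interface.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* Block-diagonal A with two 2x2 blocks:
       | 1 2 0 0 |
       | 2 1 0 0 |
       | 0 0 3 4 |
       | 0 0 4 3 |   stored in the BSR format (3-array variation) with one-based indexing */
    MKL_INT m  = 2;                          /* number of block rows */
    MKL_INT lb = 2;                          /* block size */
    double  a[]  = {1.0, 2.0, 2.0, 1.0,      /* block (1,1) */
                    3.0, 4.0, 4.0, 3.0};     /* block (2,2) */
    MKL_INT ja[] = {1, 2};                   /* block column indices plus one */
    MKL_INT ia[] = {1, 2, 3};                /* block row start positions, length m + 1 */
    double  x[]  = {1.0, 1.0, 1.0, 1.0};
    double  y[4];
    char    transa = 'N';                    /* compute y := A*x */

    mkl_dbsrgemv(&transa, &m, &lb, a, ia, ja, x, y);

    for (int i = 0; i < 4; i++)
        printf("y[%d] = %4.1f\n", i, y[i]);  /* expected: 3.0 3.0 7.0 7.0 */
    return 0;
}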

mkl_?coogemv
Computes matrix-vector product of a sparse general
matrix stored in the coordinate format with one-based
indexing.

Syntax
void mkl_scoogemv (const char *transa , const MKL_INT *m , const float *val , const
MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const float *x , float
*y );
void mkl_dcoogemv (const char *transa , const MKL_INT *m , const double *val , const
MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const double *x , double
*y );
void mkl_ccoogemv (const char *transa , const MKL_INT *m , const MKL_Complex8 *val ,
const MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const MKL_Complex8
*x , MKL_Complex8 *y );
void mkl_zcoogemv (const char *transa , const MKL_INT *m , const MKL_Complex16 *val ,
const MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const
MKL_Complex16 *x , MKL_Complex16 *y );

Include Files
• mkl.h

Description

The mkl_?coogemv routine performs a matrix-vector operation defined as

y := A*x

or

y := A^T*x,

where:
x and y are vectors,
A is an m-by-m sparse square matrix in the coordinate format, A^T is the transpose of A.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

transa Specifies the operation.


If transa = 'N' or 'n', then the matrix-vector product is computed as
y := A*x.

If transa = 'T' or 't' or 'C' or 'c', then the matrix-vector product is
computed as y := A^T*x.

m Number of rows of the matrix A.

val Array of length nnz, contains non-zero elements of the matrix A in the
arbitrary order.
Refer to values array description in Coordinate Format for more details.

rowind Array of length nnz, contains the row indices plus one for each non-zero
element of the matrix A.
Refer to rows array description in Coordinate Format for more details.

colind Array of length nnz, contains the column indices plus one for each non-zero
element of the matrix A. Refer to columns array description in Coordinate
Format for more details.

nnz Specifies the number of non-zero elements of the matrix A.


Refer to nnz description in Coordinate Format for more details.

x Array, size is m.

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least m.

On exit, the array y must contain the vector y.
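
As an illustration, the sketch below applies mkl_dcoogemv to the same example matrix used for mkl_?csrgemv
above, this time stored in the coordinate format with one-based indexing; all values are example data.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* A = | 1 0 2 |
           | 0 3 0 |
           | 4 0 5 |   stored in the coordinate format with one-based indexing */
    MKL_INT m   = 3;
    MKL_INT nnz = 5;
    double  val[]    = {1.0, 2.0, 3.0, 4.0, 5.0};
    MKL_INT rowind[] = {1, 1, 2, 3, 3};      /* row indices plus one */
    MKL_INT colind[] = {1, 3, 2, 1, 3};      /* column indices plus one */
    double  x[] = {1.0, 1.0, 1.0};
    double  y[3];
    char    transa = 'N';                    /* compute y := A*x */

    mkl_dcoogemv(&transa, &m, val, rowind, colind, &nnz, x, y);

    for (int i = 0; i < 3; i++)
        printf("y[%d] = %4.1f\n", i, y[i]);  /* expected: 3.0 3.0 9.0 */
    return 0;
}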

mkl_?diagemv
Computes matrix - vector product of a sparse general
matrix stored in the diagonal format with one-based
indexing.

Syntax
void mkl_sdiagemv (const char *transa , const MKL_INT *m , const float *val , const
MKL_INT *lval , const MKL_INT *idiag , const MKL_INT *ndiag , const float *x , float
*y );
void mkl_ddiagemv (const char *transa , const MKL_INT *m , const double *val , const
MKL_INT *lval , const MKL_INT *idiag , const MKL_INT *ndiag , const double *x , double
*y );
void mkl_cdiagemv (const char *transa , const MKL_INT *m , const MKL_Complex8 *val ,
const MKL_INT *lval , const MKL_INT *idiag , const MKL_INT *ndiag , const MKL_Complex8
*x , MKL_Complex8 *y );
void mkl_zdiagemv (const char *transa , const MKL_INT *m , const MKL_Complex16 *val ,
const MKL_INT *lval , const MKL_INT *idiag , const MKL_INT *ndiag , const MKL_Complex16
*x , MKL_Complex16 *y );

Include Files
• mkl.h

Description

The mkl_?diagemv routine performs a matrix-vector operation defined as

y := A*x

or

y := A^T*x,

where:
x and y are vectors,
A is an m-by-m sparse square matrix in the diagonal storage format, A^T is the transpose of A.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

transa Specifies the operation.


If transa = 'N' or 'n', then y := A*x.

If transa = 'T' or 't' or 'C' or 'c', then y := A^T*x.

m Number of rows of the matrix A.

val Two-dimensional array of size lval*ndiag, contains non-zero diagonals of


the matrix A. Refer to values array description in Diagonal Storage Scheme
for more details.

lval Leading dimension of val, lval≥m. Refer to lval description in Diagonal Storage Scheme for more details.

idiag Array of length ndiag, contains the distances between main diagonal and
each non-zero diagonals in the matrix A.
Refer to distance array description in Diagonal Storage Scheme for more
details.

ndiag Specifies the number of non-zero diagonals of the matrix A.

x Array, size is m.

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least m.

On exit, the array y must contain the vector y.


mkl_?csrsymv
Computes matrix - vector product of a sparse
symmetrical matrix stored in the CSR format (3-array
variation) with one-based indexing.

Syntax
void mkl_scsrsymv (const char *uplo , const MKL_INT *m , const float *a , const MKL_INT
*ia , const MKL_INT *ja , const float *x , float *y );
void mkl_dcsrsymv (const char *uplo , const MKL_INT *m , const double *a , const
MKL_INT *ia , const MKL_INT *ja , const double *x , double *y );
void mkl_ccsrsymv (const char *uplo , const MKL_INT *m , const MKL_Complex8 *a , const
MKL_INT *ia , const MKL_INT *ja , const MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_zcsrsymv (const char *uplo , const MKL_INT *m , const MKL_Complex16 *a , const
MKL_INT *ia , const MKL_INT *ja , const MKL_Complex16 *x , MKL_Complex16 *y );

Include Files
• mkl.h

Description

The mkl_?csrsymv routine performs a matrix-vector operation defined as

y := A*x

where:
x and y are vectors,
A is an upper or lower triangle of the symmetrical sparse matrix in the CSR format (3-array variation).

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

uplo Specifies whether the upper or low triangle of the matrix A is used.
If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.

If uplo = 'L' or 'l', then the low triangle of the matrix A is used.

m Number of rows of the matrix A.

a Array containing non-zero elements of the matrix A. Its length is equal to


the number of non-zero elements in the matrix A. Refer to values array
description in Sparse Matrix Storage Formats for more details.

ia Array of length m + 1, containing indices of elements in the array a, such


that ia[i] - ia[0] is the index in the array a of the first non-zero
element from the row i. The value of the last element ia[m] - ia[0] is
equal to the number of non-zeros. Refer to rowIndex array description in
Sparse Matrix Storage Formats for more details.

ja Array containing the column indices plus one for each non-zero element of
the matrix A.
Its length is equal to the length of the array a. Refer to columns array
description in Sparse Matrix Storage Formats for more details.

x Array, size is m.

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least m.

On exit, the array y must contain the vector y.
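
For illustration, the sketch below calls mkl_dcsrsymv with only the upper triangle of a symmetric matrix stored;
the matrix and its one-based CSR arrays are example data.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* Symmetric A = | 2 1 0 |
                     | 1 3 4 |
                     | 0 4 5 |   only the upper triangle is stored (CSR, one-based indexing) */
    MKL_INT m    = 3;
    double  a[]  = {2.0, 1.0, 3.0, 4.0, 5.0};
    MKL_INT ja[] = {1, 2, 2, 3, 3};          /* column indices plus one */
    MKL_INT ia[] = {1, 3, 5, 6};             /* row start positions, length m + 1 */
    double  x[]  = {1.0, 1.0, 1.0};
    double  y[3];
    char    uplo = 'U';                      /* the upper triangle is supplied */

    mkl_dcsrsymv(&uplo, &m, a, ia, ja, x, y);   /* y := A*x for the full symmetric A */

    for (int i = 0; i < 3; i++)
        printf("y[%d] = %4.1f\n", i, y[i]);  /* expected: 3.0 8.0 9.0 */
    return 0;
}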

mkl_?bsrsymv
Computes matrix-vector product of a sparse
symmetrical matrix stored in the BSR format (3-array
variation) with one-based indexing.

Syntax
void mkl_sbsrsymv (const char *uplo , const MKL_INT *m , const MKL_INT *lb , const
float *a , const MKL_INT *ia , const MKL_INT *ja , const float *x , float *y );
void mkl_dbsrsymv (const char *uplo , const MKL_INT *m , const MKL_INT *lb , const
double *a , const MKL_INT *ia , const MKL_INT *ja , const double *x , double *y );
void mkl_cbsrsymv (const char *uplo , const MKL_INT *m , const MKL_INT *lb , const
MKL_Complex8 *a , const MKL_INT *ia , const MKL_INT *ja , const MKL_Complex8 *x ,
MKL_Complex8 *y );
void mkl_zbsrsymv (const char *uplo , const MKL_INT *m , const MKL_INT *lb , const
MKL_Complex16 *a , const MKL_INT *ia , const MKL_INT *ja , const MKL_Complex16 *x ,
MKL_Complex16 *y );

Include Files
• mkl.h

Description

The mkl_?bsrsymv routine performs a matrix-vector operation defined as

y := A*x

where:
x and y are vectors,
A is an upper or lower triangle of the symmetrical sparse matrix in the BSR format (3-array variation).

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters


uplo Specifies whether the upper or low triangle of the matrix A is considered.
If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.

If uplo = 'L' or 'l', then the low triangle of the matrix A is used.

m Number of block rows of the matrix A.

lb Size of the block in the matrix A.

a Array containing elements of non-zero blocks of the matrix A. Its length is


equal to the number of non-zero blocks in the matrix A multiplied by lb*lb.
Refer to values array description in BSR Format for more details.

ia Array of length (m + 1), containing indices of block in the array a, such


that ia[i] - ia[0] is the index in the array a of the first non-zero
element from the row i. The value of the last element ia[m] - ia[0] is
equal to the number of non-zero blocks. Refer to rowIndex array
description in BSR Format for more details.

ja Array containing the column indices plus one for each non-zero block in the
matrix A.
Its length is equal to the number of non-zero blocks of the matrix A. Refer
to columns array description in BSR Format for more details.

x Array, size (m*lb).

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least (m*lb).

On exit, the array y must contain the vector y.

mkl_?coosymv
Computes matrix - vector product of a sparse
symmetrical matrix stored in the coordinate format
with one-based indexing.

Syntax
void mkl_scoosymv (const char *uplo , const MKL_INT *m , const float *val , const
MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const float *x , float
*y );
void mkl_dcoosymv (const char *uplo , const MKL_INT *m , const double *val , const
MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const double *x , double
*y );
void mkl_ccoosymv (const char *uplo , const MKL_INT *m , const MKL_Complex8 *val ,
const MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const MKL_Complex8
*x , MKL_Complex8 *y );
void mkl_zcoosymv (const char *uplo , const MKL_INT *m , const MKL_Complex16 *val ,
const MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const
MKL_Complex16 *x , MKL_Complex16 *y );

Include Files
• mkl.h

Description

The mkl_?coosymv routine performs a matrix-vector operation defined as

y := A*x

where:
x and y are vectors,
A is an upper or lower triangle of the symmetrical sparse matrix in the coordinate format.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

uplo Specifies whether the upper or low triangle of the matrix A is used.
If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.

If uplo = 'L' or 'l', then the low triangle of the matrix A is used.

m Number of rows of the matrix A.

val Array of length nnz, contains non-zero elements of the matrix A in the
arbitrary order.
Refer to values array description in Coordinate Format for more details.

rowind Array of length nnz, contains the row indices plus one for each non-zero
element of the matrix A.
Refer to rows array description in Coordinate Format for more details.

colind Array of length nnz, contains the column indices plus one for each non-zero
element of the matrix A. Refer to columns array description in Coordinate
Format for more details.

nnz Specifies the number of non-zero elements of the matrix A.


Refer to nnz description in Coordinate Format for more details.

x Array, size is m.

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least m.

On exit, the array y must contain the vector y.


mkl_?diasymv
Computes matrix - vector product of a sparse
symmetrical matrix stored in the diagonal format with
one-based indexing.

Syntax
void mkl_sdiasymv (const char *uplo , const MKL_INT *m , const float *val , const
MKL_INT *lval , const MKL_INT *idiag , const MKL_INT *ndiag , const float *x , float
*y );
void mkl_ddiasymv (const char *uplo , const MKL_INT *m , const double *val , const
MKL_INT *lval , const MKL_INT *idiag , const MKL_INT *ndiag , const double *x , double
*y );
void mkl_cdiasymv (const char *uplo , const MKL_INT *m , const MKL_Complex8 *val ,
const MKL_INT *lval , const MKL_INT *idiag , const MKL_INT *ndiag , const MKL_Complex8
*x , MKL_Complex8 *y );
void mkl_zdiasymv (const char *uplo , const MKL_INT *m , const MKL_Complex16 *val ,
const MKL_INT *lval , const MKL_INT *idiag , const MKL_INT *ndiag , const MKL_Complex16
*x , MKL_Complex16 *y );

Include Files
• mkl.h

Description

The mkl_?diasymv routine performs a matrix-vector operation defined as

y := A*x

where:
x and y are vectors,
A is an upper or lower triangle of the symmetrical sparse matrix.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

uplo Specifies whether the upper or low triangle of the matrix A is used.
If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.

If uplo = 'L' or 'l', then the low triangle of the matrix A is used.

m Number of rows of the matrix A.

val Two-dimensional array of size lval by ndiag, contains non-zero diagonals


of the matrix A. Refer to values array description in Diagonal Storage
Scheme for more details.

lval Leading dimension of val, lval≥m. Refer to lval description in Diagonal


Storage Scheme for more details.

idiag Array of length ndiag, contains the distances between main diagonal and
each non-zero diagonals in the matrix A.
Refer to distance array description in Diagonal Storage Scheme for more
details.

ndiag Specifies the number of non-zero diagonals of the matrix A.

x Array, size is m.

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least m.

On exit, the array y must contain the vector y.

mkl_?csrtrsv
Triangular solvers with simplified interface for a sparse
matrix in the CSR format (3-array variation) with one-
based indexing.

Syntax
void mkl_scsrtrsv (const char *uplo , const char *transa , const char *diag , const
MKL_INT *m , const float *a , const MKL_INT *ia , const MKL_INT *ja , const float *x ,
float *y );
void mkl_dcsrtrsv (const char *uplo , const char *transa , const char *diag , const
MKL_INT *m , const double *a , const MKL_INT *ia , const MKL_INT *ja , const double
*x , double *y );
void mkl_ccsrtrsv (const char *uplo , const char *transa , const char *diag , const
MKL_INT *m , const MKL_Complex8 *a , const MKL_INT *ia , const MKL_INT *ja , const
MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_zcsrtrsv (const char *uplo , const char *transa , const char *diag , const
MKL_INT *m , const MKL_Complex16 *a , const MKL_INT *ia , const MKL_INT *ja , const
MKL_Complex16 *x , MKL_Complex16 *y );

Include Files
• mkl.h

Description

The mkl_?csrtrsv routine solves a system of linear equations with matrix-vector operations for a sparse
matrix stored in the CSR format (3 array variation):

A*y = x

or

A^T*y = x,

where:
x and y are vectors,
A is a sparse upper or lower triangular matrix with unit or non-unit main diagonal, A^T is the transpose of A.


NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

uplo Specifies whether the upper or low triangle of the matrix A is used.
If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.

If uplo = 'L' or 'l', then the low triangle of the matrix A is used.

transa Specifies the system of linear equations.


If transa = 'N' or 'n', then A*y = x.

If transa = 'T' or 't' or 'C' or 'c', then A^T*y = x.

diag Specifies whether A is unit triangular.


If diag = 'U' or 'u', then A is a unit triangular.

If diag = 'N' or 'n', then A is not unit triangular.

m Number of rows of the matrix A.

a Array containing non-zero elements of the matrix A. Its length is equal to


the number of non-zero elements in the matrix A. Refer to values array
description in Sparse Matrix Storage Formats for more details.

NOTE
The non-zero elements of the given row of the matrix must be stored
in the same order as they appear in the row (from left to right).
No diagonal element can be omitted from a sparse storage if the solver
is called with the non-unit indicator.

ia Array of length m + 1, containing indices of elements in the array a, such


that ia[i] - ia[0] is the index in the array a of the first non-zero
element from the row i. The value of the last element ia[m] - ia[0] is
equal to the number of non-zeros. Refer to rowIndex array description in
Sparse Matrix Storage Formats for more details.

ja Array containing the column indices plus one for each non-zero element of
the matrix A.
Its length is equal to the length of the array a. Refer to columns array
description in Sparse Matrix Storage Formats for more details.

NOTE
Column indices must be sorted in increasing order for each row.

x Array, size is m.

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least m.

Contains the vector y.
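
The following sketch solves a small lower triangular system with mkl_dcsrtrsv; the matrix, its one-based CSR
arrays, and the right-hand side are example data. Column indices are sorted within each row and every diagonal
element is stored, as required when diag = 'N'.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* Lower triangular A = | 2 0 0 |
                            | 1 3 0 |
                            | 0 4 5 |   stored in the CSR format (3-array variation), one-based indexing */
    MKL_INT m    = 3;
    double  a[]  = {2.0, 1.0, 3.0, 4.0, 5.0};
    MKL_INT ja[] = {1, 1, 2, 2, 3};          /* column indices plus one, sorted within each row */
    MKL_INT ia[] = {1, 2, 4, 6};             /* row start positions, length m + 1 */
    double  x[]  = {2.0, 7.0, 13.0};         /* right-hand side */
    double  y[3];                            /* receives the solution of A*y = x */
    char    uplo = 'L', transa = 'N', diag = 'N';

    mkl_dcsrtrsv(&uplo, &transa, &diag, &m, a, ia, ja, x, y);

    for (int i = 0; i < 3; i++)
        printf("y[%d] = %4.1f\n", i, y[i]);  /* expected: 1.0 2.0 1.0 */
    return 0;
}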

mkl_?bsrtrsv
Triangular solver with simplified interface for a sparse
matrix stored in the BSR format (3-array variation)
with one-based indexing.

Syntax
void mkl_sbsrtrsv (const char *uplo , const char *transa , const char *diag , const
MKL_INT *m , const MKL_INT *lb , const float *a , const MKL_INT *ia , const MKL_INT
*ja , const float *x , float *y );
void mkl_dbsrtrsv (const char *uplo , const char *transa , const char *diag , const
MKL_INT *m , const MKL_INT *lb , const double *a , const MKL_INT *ia , const MKL_INT
*ja , const double *x , double *y );
void mkl_cbsrtrsv (const char *uplo , const char *transa , const char *diag , const
MKL_INT *m , const MKL_INT *lb , const MKL_Complex8 *a , const MKL_INT *ia , const
MKL_INT *ja , const MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_zbsrtrsv (const char *uplo , const char *transa , const char *diag , const
MKL_INT *m , const MKL_INT *lb , const MKL_Complex16 *a , const MKL_INT *ia , const
MKL_INT *ja , const MKL_Complex16 *x , MKL_Complex16 *y );

Include Files
• mkl.h

Description

The mkl_?bsrtrsv routine solves a system of linear equations with matrix-vector operations for a sparse
matrix stored in the BSR format (3-array variation):

A*y = x

or

A^T*y = x,

where:
x and y are vectors,
A is a sparse upper or lower triangular matrix with unit or non-unit main diagonal, A^T is the transpose of A.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

uplo Specifies whether the upper or low triangle of the matrix A is used.


If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.


If uplo = 'L' or 'l', then the low triangle of the matrix A is used.

transa Specifies the system of linear equations.


If transa = 'N' or 'n', then A*y = x.

If transa = 'T' or 't' or 'C' or 'c', then A^T*y = x.

diag Specifies whether A is a unit triangular matrix.


If diag = 'U' or 'u', then A is a unit triangular.

If diag = 'N' or 'n', then A is not a unit triangular.

m Number of block rows of the matrix A.

lb Size of the block in the matrix A.

a Array containing elements of non-zero blocks of the matrix A. Its length is


equal to the number of non-zero blocks in the matrix A multiplied by lb*lb.
Refer to values array description in BSR Format for more details.

NOTE
The non-zero elements of the given row of the matrix must be stored
in the same order as they appear in the row (from left to right).
No diagonal element can be omitted from a sparse storage if the solver
is called with the non-unit indicator.

ia Array of length (m + 1), containing indices of block in the array a, such


that ia[I] - ia[0] is the index in the array a of the first non-zero
element from the row I. The value of the last element ia[m] - ia[0] is
equal to the number of non-zero blocks. Refer to rowIndex array
description in BSR Format for more details.

ja
Array containing the column indices plus one for each non-zero block in the
matrix A.
Its length is equal to the number of non-zero blocks of the matrix A. Refer
to columns array description in BSR Format for more details.

x Array, size (m*lb).

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least (m*lb).

On exit, the array y must contain the vector y.

mkl_?cootrsv
Triangular solvers with simplified interface for a sparse
matrix in the coordinate format with one-based
indexing.

Syntax
void mkl_scootrsv (const char *uplo , const char *transa , const char *diag , const
MKL_INT *m , const float *val , const MKL_INT *rowind , const MKL_INT *colind , const
MKL_INT *nnz , const float *x , float *y );
void mkl_dcootrsv (const char *uplo , const char *transa , const char *diag , const
MKL_INT *m , const double *val , const MKL_INT *rowind , const MKL_INT *colind , const
MKL_INT *nnz , const double *x , double *y );
void mkl_ccootrsv (const char *uplo , const char *transa , const char *diag , const
MKL_INT *m , const MKL_Complex8 *val , const MKL_INT *rowind , const MKL_INT *colind ,
const MKL_INT *nnz , const MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_zcootrsv (const char *uplo , const char *transa , const char *diag , const
MKL_INT *m , const MKL_Complex16 *val , const MKL_INT *rowind , const MKL_INT *colind ,
const MKL_INT *nnz , const MKL_Complex16 *x , MKL_Complex16 *y );

Include Files
• mkl.h

Description

The mkl_?cootrsv routine solves a system of linear equations with matrix-vector operations for a sparse
matrix stored in the coordinate format:

A*y = x

or

A^T*y = x,

where:
x and y are vectors,
A is a sparse upper or lower triangular matrix with unit or non-unit main diagonal, A^T is the transpose of A.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

uplo Specifies whether the upper or low triangle of the matrix A is considered.
If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.

If uplo = 'L' or 'l', then the low triangle of the matrix A is used.

transa Specifies the system of linear equations.


If transa = 'N' or 'n', then A*y = x.

If transa = 'T' or 't' or 'C' or 'c', then A^T*y = x.


diag Specifies whether A is unit triangular.


If diag = 'U' or 'u', then A is unit triangular.

If diag = 'N' or 'n', then A is not unit triangular.

m Number of rows of the matrix A.

val Array of length nnz, contains non-zero elements of the matrix A in the
arbitrary order.
Refer to values array description in Coordinate Format for more details.

rowind Array of length nnz, contains the row indices plus one for each non-zero
element of the matrix A.
Refer to rows array description in Coordinate Format for more details.

colind Array of length nnz, contains the column indices plus one for each non-zero
element of the matrix A. Refer to columns array description in Coordinate
Format for more details.

nnz Specifies the number of non-zero elements of the matrix A.


Refer to nnz description in Coordinate Format for more details.

x Array, size is m.

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least m.

Contains the vector y.

mkl_?diatrsv
Triangular solvers with simplified interface for a sparse
matrix in the diagonal format with one-based
indexing.

Syntax
void mkl_sdiatrsv (const char *uplo , const char *transa , const char *diag , const
MKL_INT *m , const float *val , const MKL_INT *lval , const MKL_INT *idiag , const
MKL_INT *ndiag , const float *x , float *y );
void mkl_ddiatrsv (const char *uplo , const char *transa , const char *diag , const
MKL_INT *m , const double *val , const MKL_INT *lval , const MKL_INT *idiag , const
MKL_INT *ndiag , const double *x , double *y );
void mkl_cdiatrsv (const char *uplo , const char *transa , const char *diag , const
MKL_INT *m , const MKL_Complex8 *val , const MKL_INT *lval , const MKL_INT *idiag ,
const MKL_INT *ndiag , const MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_zdiatrsv (const char *uplo , const char *transa , const char *diag , const
MKL_INT *m , const MKL_Complex16 *val , const MKL_INT *lval , const MKL_INT *idiag ,
const MKL_INT *ndiag , const MKL_Complex16 *x , MKL_Complex16 *y );

Include Files
• mkl.h

Description

The mkl_?diatrsv routine solves a system of linear equations with matrix-vector operations for a sparse
matrix stored in the diagonal format:

A*y = x

or

A^T*y = x,

where:
x and y are vectors,
A is a sparse upper or lower triangular matrix with unit or non-unit main diagonal, A^T is the transpose of A.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

uplo Specifies whether the upper or low triangle of the matrix A is used.
If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.

If uplo = 'L' or 'l', then the low triangle of the matrix A is used.

transa Specifies the system of linear equations.


If transa = 'N' or 'n', then A*y = x.

If transa = 'T' or 't' or 'C' or 'c', then A^T*y = x.

diag Specifies whether A is unit triangular.


If diag = 'U' or 'u', then A is unit triangular.

If diag = 'N' or 'n', then A is not unit triangular.

m Number of rows of the matrix A.

val Two-dimensional array of size lval by ndiag, contains non-zero diagonals


of the matrix A. Refer to values array description in Diagonal Storage
Scheme for more details.

lval Leading dimension of val, lval≥m. Refer to lval description in Diagonal


Storage Scheme for more details.

idiag Array of length ndiag, contains the distances between main diagonal and
each non-zero diagonals in the matrix A.

NOTE
All elements of this array must be sorted in increasing order.


Refer to distance array description in Diagonal Storage Scheme for more


details.

ndiag Specifies the number of non-zero diagonals of the matrix A.

x Array, size is m.

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least m.

Contains the vector y.

mkl_cspblas_?csrgemv
Computes matrix - vector product of a sparse general
matrix stored in the CSR format (3-array variation)
with zero-based indexing.

Syntax
void mkl_cspblas_scsrgemv (const char *transa , const MKL_INT *m , const float *a ,
const MKL_INT *ia , const MKL_INT *ja , const float *x , float *y );
void mkl_cspblas_dcsrgemv (const char *transa , const MKL_INT *m , const double *a ,
const MKL_INT *ia , const MKL_INT *ja , const double *x , double *y );
void mkl_cspblas_ccsrgemv (const char *transa , const MKL_INT *m , const MKL_Complex8
*a , const MKL_INT *ia , const MKL_INT *ja , const MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_cspblas_zcsrgemv (const char *transa , const MKL_INT *m , const MKL_Complex16
*a , const MKL_INT *ia , const MKL_INT *ja , const MKL_Complex16 *x , MKL_Complex16
*y );

Include Files
• mkl.h

Description

The mkl_cspblas_?csrgemv routine performs a matrix-vector operation defined as

y := A*x

or

y := A^T*x,

where:
x and y are vectors,
A is an m-by-m sparse square matrix in the CSR format (3-array variation) with zero-based indexing, A^T is
the transpose of A.

NOTE
This routine supports only zero-based indexing of the input arrays.

Input Parameters

transa Specifies the operation.


If transa = 'N' or 'n', then the matrix-vector product is computed as
y := A*x.
If transa = 'T' or 't' or 'C' or 'c', then the matrix-vector product is
computed as y := A^T*x.

m Number of rows of the matrix A.

a Array containing non-zero elements of the matrix A. Its length is equal to


the number of non-zero elements in the matrix A. Refer to values array
description in Sparse Matrix Storage Formats for more details.

ia Array of length m + 1, containing indices of elements in the array a, such


that ia[I] is the index in the array a of the first non-zero element from the
row I. The value of the last element ia[m] is equal to the number of non-
zeros. Refer to rowIndex array description in Sparse Matrix Storage
Formats for more details.

ja Array containing the column indices for each non-zero element of the
matrix A.
Its length is equal to the length of the array a. Refer to columns array
description in Sparse Matrix Storage Formats for more details.

x Array, size is m.

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least m.

On exit, the array y must contain the vector y.
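
The sketch below repeats the earlier mkl_?csrgemv example with the zero-based interface; the only change is
that the ia and ja arrays hold zero-based indices. All values are example data.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* A = | 1 0 2 |
           | 0 3 0 |
           | 4 0 5 |   stored in the CSR format (3-array variation) with zero-based indexing */
    MKL_INT m    = 3;
    double  a[]  = {1.0, 2.0, 3.0, 4.0, 5.0};
    MKL_INT ja[] = {0, 2, 1, 0, 2};          /* zero-based column indices */
    MKL_INT ia[] = {0, 2, 3, 5};             /* zero-based row start positions, length m + 1 */
    double  x[]  = {1.0, 1.0, 1.0};
    double  y[3];
    char    transa = 'N';                    /* compute y := A*x */

    mkl_cspblas_dcsrgemv(&transa, &m, a, ia, ja, x, y);

    for (int i = 0; i < 3; i++)
        printf("y[%d] = %4.1f\n", i, y[i]);  /* expected: 3.0 3.0 9.0 */
    return 0;
}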

mkl_cspblas_?bsrgemv
Computes matrix - vector product of a sparse general
matrix stored in the BSR format (3-array variation)
with zero-based indexing.

Syntax
void mkl_cspblas_sbsrgemv (const char *transa , const MKL_INT *m , const MKL_INT *lb ,
const float *a , const MKL_INT *ia , const MKL_INT *ja , const float *x , float *y );
void mkl_cspblas_dbsrgemv (const char *transa , const MKL_INT *m , const MKL_INT *lb ,
const double *a , const MKL_INT *ia , const MKL_INT *ja , const double *x , double
*y );
void mkl_cspblas_cbsrgemv (const char *transa , const MKL_INT *m , const MKL_INT *lb ,
const MKL_Complex8 *a , const MKL_INT *ia , const MKL_INT *ja , const MKL_Complex8 *x ,
MKL_Complex8 *y );
void mkl_cspblas_zbsrgemv (const char *transa , const MKL_INT *m , const MKL_INT *lb ,
const MKL_Complex16 *a , const MKL_INT *ia , const MKL_INT *ja , const MKL_Complex16
*x , MKL_Complex16 *y );


Include Files
• mkl.h

Description

The mkl_cspblas_?bsrgemv routine performs a matrix-vector operation defined as

y := A*x

or

y := A^T*x,

where:
x and y are vectors,
A is an m-by-m block sparse square matrix in the BSR format (3-array variation) with zero-based indexing,
A^T is the transpose of A.

NOTE
This routine supports only zero-based indexing of the input arrays.

Input Parameters

transa Specifies the operation.


If transa = 'N' or 'n', then the matrix-vector product is computed as
y := A*x.
If transa = 'T' or 't' or 'C' or 'c', then the matrix-vector product is
computed as y := A^T*x.

m Number of block rows of the matrix A.

lb Size of the block in the matrix A.

a Array containing elements of non-zero blocks of the matrix A. Its length is


equal to the number of non-zero blocks in the matrix A multiplied by lb*lb.
Refer to values array description in BSR Format for more details.

ia Array of length (m + 1), containing indices of block in the array a, such


that ia[i] is the index in the array a of the first non-zero element from the
row i. The value of the last element ia[m] is equal to the number of non-
zero blocks. Refer to rowIndex array description in BSR Format for more
details.

ja Array containing the column indices for each non-zero block in the matrix A.
Its length is equal to the number of non-zero blocks of the matrix A. Refer
to columns array description in BSR Format for more details.

x Array, size (m*lb).

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least (m*lb).

On exit, the array y must contain the vector y.

mkl_cspblas_?coogemv
Computes matrix - vector product of a sparse general
matrix stored in the coordinate format with zero-
based indexing.

Syntax
void mkl_cspblas_scoogemv (const char *transa , const MKL_INT *m , const float *val ,
const MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const float *x ,
float *y );
void mkl_cspblas_dcoogemv (const char *transa , const MKL_INT *m , const double *val ,
const MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const double *x ,
double *y );
void mkl_cspblas_ccoogemv (const char *transa , const MKL_INT *m , const MKL_Complex8
*val , const MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const
MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_cspblas_zcoogemv (const char *transa , const MKL_INT *m , const MKL_Complex16
*val , const MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const
MKL_Complex16 *x , MKL_Complex16 *y );

Include Files
• mkl.h

Description

The mkl_cspblas_?coogemv routine performs a matrix-vector operation defined as

y := A*x

or

y := A^T*x,

where:
x and y are vectors,
A is an m-by-m sparse square matrix in the coordinate format with zero-based indexing, A^T is the transpose
of A.

NOTE
This routine supports only zero-based indexing of the input arrays.

Input Parameters

transa Specifies the operation.


If transa = 'N' or 'n', then the matrix-vector product is computed as
y := A*x.

If transa = 'T' or 't' or 'C' or 'c', then the matrix-vector product is
computed as y := A^T*x.

m Number of rows of the matrix A.

val Array of length nnz, contains non-zero elements of the matrix A in the
arbitrary order.
Refer to values array description in Coordinate Format for more details.

rowind Array of length nnz, contains the row indices for each non-zero element of
the matrix A.
Refer to rows array description in Coordinate Format for more details.

colind Array of length nnz, contains the column indices for each non-zero element
of the matrix A. Refer to columns array description in Coordinate Format
for more details.

nnz Specifies the number of non-zero elements of the matrix A.


Refer to nnz description in Coordinate Format for more details.

x Array, size is m.

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least m.

On exit, the array y must contain the vector y.

mkl_cspblas_?csrsymv
Computes matrix-vector product of a sparse
symmetrical matrix stored in the CSR format (3-array
variation) with zero-based indexing.

Syntax
void mkl_cspblas_scsrsymv (const char *uplo , const MKL_INT *m , const float *a , const
MKL_INT *ia , const MKL_INT *ja , const float *x , float *y );
void mkl_cspblas_dcsrsymv (const char *uplo , const MKL_INT *m , const double *a ,
const MKL_INT *ia , const MKL_INT *ja , const double *x , double *y );
void mkl_cspblas_ccsrsymv (const char *uplo , const MKL_INT *m , const MKL_Complex8
*a , const MKL_INT *ia , const MKL_INT *ja , const MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_cspblas_zcsrsymv (const char *uplo , const MKL_INT *m , const MKL_Complex16
*a , const MKL_INT *ia , const MKL_INT *ja , const MKL_Complex16 *x , MKL_Complex16
*y );

Include Files
• mkl.h

Description

The mkl_cspblas_?csrsymv routine performs a matrix-vector operation defined as

y := A*x

where:
x and y are vectors,
A is an upper or lower triangle of the symmetrical sparse matrix in the CSR format (3-array variation) with
zero-based indexing.

NOTE
This routine supports only zero-based indexing of the input arrays.

Input Parameters

uplo Specifies whether the upper or low triangle of the matrix A is used.
If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.

If uplo = 'L' or 'l', then the low triangle of the matrix A is used.

m Number of rows of the matrix A.

a Array containing non-zero elements of the matrix A. Its length is equal to


the number of non-zero elements in the matrix A. Refer to values array
description in Sparse Matrix Storage Formats for more details.

ia Array of length m + 1, containing indices of elements in the array a, such


that ia[i] is the index in the array a of the first non-zero element from the
row i. The value of the last element ia[m] is equal to the number of non-
zeros. Refer to rowIndex array description in Sparse Matrix Storage
Formats for more details.

ja Array containing the column indices for each non-zero element of the
matrix A.
Its length is equal to the length of the array a. Refer to columns array
description in Sparse Matrix Storage Formats for more details.

x Array, size is m.

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least m.

On exit, the array y must contain the vector y.

mkl_cspblas_?bsrsymv
Computes matrix-vector product of a sparse
symmetrical matrix stored in the BSR format (3-arrays
variation) with zero-based indexing.


Syntax
void mkl_cspblas_sbsrsymv (const char *uplo , const MKL_INT *m , const MKL_INT *lb ,
const float *a , const MKL_INT *ia , const MKL_INT *ja , const float *x , float *y );
void mkl_cspblas_dbsrsymv (const char *uplo , const MKL_INT *m , const MKL_INT *lb ,
const double *a , const MKL_INT *ia , const MKL_INT *ja , const double *x , double
*y );
void mkl_cspblas_cbsrsymv (const char *uplo , const MKL_INT *m , const MKL_INT *lb ,
const MKL_Complex8 *a , const MKL_INT *ia , const MKL_INT *ja , const MKL_Complex8 *x ,
MKL_Complex8 *y );
void mkl_cspblas_zbsrsymv (const char *uplo , const MKL_INT *m , const MKL_INT *lb ,
const MKL_Complex16 *a , const MKL_INT *ia , const MKL_INT *ja , const MKL_Complex16
*x , MKL_Complex16 *y );

Include Files
• mkl.h

Description

The mkl_cspblas_?bsrsymv routine performs a matrix-vector operation defined as

y := A*x

where:
x and y are vectors,
A is an upper or lower triangle of the symmetrical sparse matrix in the BSR format (3-array variation) with
zero-based indexing.

NOTE
This routine supports only zero-based indexing of the input arrays.

Input Parameters

uplo Specifies whether the upper or low triangle of the matrix A is used.
If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.

If uplo = 'L' or 'l', then the low triangle of the matrix A is used.

m Number of block rows of the matrix A.

lb Size of the block in the matrix A.

a Array containing elements of non-zero blocks of the matrix A. Its length is


equal to the number of non-zero blocks in the matrix A multiplied by lb*lb.
Refer to values array description in BSR Format for more details.

ia Array of length (m + 1), containing indices of block in the array a, such
that ia[i] is the index in the array a of the first non-zero element from the
row i. The value of the last element ia[m] is equal to the number of non-
zero blocks. Refer to rowIndex array description in BSR Format for more
details.

ja Array containing the column indices for each non-zero block in the matrix A.
Its length is equal to the number of non-zero blocks of the matrix A. Refer
to columns array description in BSR Format for more details.

x Array, size (m*lb).

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least (m*lb).

On exit, the array y must contain the vector y.

mkl_cspblas_?coosymv
Computes matrix - vector product of a sparse
symmetrical matrix stored in the coordinate format
with zero-based indexing.

Syntax
void mkl_cspblas_scoosymv (const char *uplo , const MKL_INT *m , const float *val ,
const MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const float *x ,
float *y );
void mkl_cspblas_dcoosymv (const char *uplo , const MKL_INT *m , const double *val ,
const MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const double *x ,
double *y );
void mkl_cspblas_ccoosymv (const char *uplo , const MKL_INT *m , const MKL_Complex8
*val , const MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const
MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_cspblas_zcoosymv (const char *uplo , const MKL_INT *m , const MKL_Complex16
*val , const MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const
MKL_Complex16 *x , MKL_Complex16 *y );

Include Files
• mkl.h

Description

The mkl_cspblas_?coosymv routine performs a matrix-vector operation defined as

y := A*x

where:
x and y are vectors,
A is an upper or lower triangle of the symmetrical sparse matrix in the coordinate format with zero-based
indexing.


NOTE
This routine supports only zero-based indexing of the input arrays.

Input Parameters

uplo Specifies whether the upper or low triangle of the matrix A is used.
If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.

If uplo = 'L' or 'l', then the low triangle of the matrix A is used.

m Number of rows of the matrix A.

val Array of length nnz, contains non-zero elements of the matrix A in the
arbitrary order.
Refer to values array description in Coordinate Format for more details.

rowind Array of length nnz, contains the row indices for each non-zero element of
the matrix A.
Refer to rows array description in Coordinate Format for more details.

colind Array of length nnz, contains the column indices for each non-zero element
of the matrix A. Refer to columns array description in Coordinate Format
for more details.

nnz Specifies the number of non-zero elements of the matrix A.


Refer to nnz description in Coordinate Format for more details.

x Array, size is m.

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least m.

On exit, the array y must contain the vector y.

mkl_cspblas_?csrtrsv
Triangular solvers with simplified interface for a sparse
matrix in the CSR format (3-array variation) with
zero-based indexing.

Syntax
void mkl_cspblas_scsrtrsv (const char *uplo , const char *transa , const char *diag ,
const MKL_INT *m , const float *a , const MKL_INT *ia , const MKL_INT *ja , const float
*x , float *y );
void mkl_cspblas_dcsrtrsv (const char *uplo , const char *transa , const char *diag ,
const MKL_INT *m , const double *a , const MKL_INT *ia , const MKL_INT *ja , const
double *x , double *y );
void mkl_cspblas_ccsrtrsv (const char *uplo , const char *transa , const char *diag ,
const MKL_INT *m , const MKL_Complex8 *a , const MKL_INT *ia , const MKL_INT *ja ,
const MKL_Complex8 *x , MKL_Complex8 *y );

void mkl_cspblas_zcsrtrsv (const char *uplo , const char *transa , const char *diag ,
const MKL_INT *m , const MKL_Complex16 *a , const MKL_INT *ia , const MKL_INT *ja ,
const MKL_Complex16 *x , MKL_Complex16 *y );

Include Files
• mkl.h

Description

The mkl_cspblas_?csrtrsv routine solves a system of linear equations with matrix-vector operations for a
sparse matrix stored in the CSR format (3-array variation) with zero-based indexing:

A*y = x

or

A^T*y = x,

where:
x and y are vectors,
A is a sparse upper or lower triangular matrix with unit or non-unit main diagonal, A^T is the transpose of A.

NOTE
This routine supports only zero-based indexing of the input arrays.

Input Parameters

uplo Specifies whether the upper or low triangle of the matrix A is used.
If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.

If uplo = 'L' or 'l', then the low triangle of the matrix A is used.

transa Specifies the system of linear equations.


If transa = 'N' or 'n', then A*y = x.

If transa = 'T' or 't' or 'C' or 'c', then A^T*y = x.

diag Specifies whether matrix A is unit triangular.


If diag = 'U' or 'u', then A is unit triangular.

If diag = 'N' or 'n', then A is not unit triangular.

m Number of rows of the matrix A.

a Array containing non-zero elements of the matrix A. Its length is equal to


the number of non-zero elements in the matrix A. Refer to values array
description in Sparse Matrix Storage Formats for more details.

NOTE
The non-zero elements of the given row of the matrix must be stored
in the same order as they appear in the row (from left to right).


No diagonal element can be omitted from a sparse storage if the solver


is called with the non-unit indicator.

ia Array of length m+1, containing indices of elements in the array a, such that
ia[i] is the index in the array a of the first non-zero element from the row
i. The value of the last element ia[m] is equal to the number of non-zeros.
Refer to rowIndex array description in Sparse Matrix Storage Formats for
more details.

ja Array containing the column indices for each non-zero element of the
matrix A.
Its length is equal to the length of the array a. Refer to columns array
description in Sparse Matrix Storage Formats for more details.

NOTE
Column indices must be sorted in increasing order for each row.

x Array, size is m.

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least m.

Contains the vector y.
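
For illustration, the sketch below solves a lower triangular system with mkl_cspblas_dcsrtrsv using zero-based
CSR arrays; the matrix and right-hand side are example data.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* Lower triangular A = | 2 0 0 |
                            | 1 3 0 |
                            | 0 4 5 |   stored in the CSR format (3-array variation), zero-based indexing */
    MKL_INT m    = 3;
    double  a[]  = {2.0, 1.0, 3.0, 4.0, 5.0};
    MKL_INT ja[] = {0, 0, 1, 1, 2};          /* zero-based column indices, sorted within each row */
    MKL_INT ia[] = {0, 1, 3, 5};             /* zero-based row start positions, length m + 1 */
    double  x[]  = {2.0, 7.0, 13.0};         /* right-hand side */
    double  y[3];                            /* receives the solution of A*y = x */
    char    uplo = 'L', transa = 'N', diag = 'N';

    mkl_cspblas_dcsrtrsv(&uplo, &transa, &diag, &m, a, ia, ja, x, y);

    for (int i = 0; i < 3; i++)
        printf("y[%d] = %4.1f\n", i, y[i]);  /* expected: 1.0 2.0 1.0 */
    return 0;
}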

mkl_cspblas_?bsrtrsv
Triangular solver with simplified interface for a sparse
matrix stored in the BSR format (3-array variation)
with zero-based indexing.

Syntax
void mkl_cspblas_sbsrtrsv (const char *uplo , const char *transa , const char *diag ,
const MKL_INT *m , const MKL_INT *lb , const float *a , const MKL_INT *ia , const
MKL_INT *ja , const float *x , float *y );
void mkl_cspblas_dbsrtrsv (const char *uplo , const char *transa , const char *diag ,
const MKL_INT *m , const MKL_INT *lb , const double *a , const MKL_INT *ia , const
MKL_INT *ja , const double *x , double *y );
void mkl_cspblas_cbsrtrsv (const char *uplo , const char *transa , const char *diag ,
const MKL_INT *m , const MKL_INT *lb , const MKL_Complex8 *a , const MKL_INT *ia ,
const MKL_INT *ja , const MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_cspblas_zbsrtrsv (const char *uplo , const char *transa , const char *diag ,
const MKL_INT *m , const MKL_INT *lb , const MKL_Complex16 *a , const MKL_INT *ia ,
const MKL_INT *ja , const MKL_Complex16 *x , MKL_Complex16 *y );

Include Files
• mkl.h

Description

The mkl_cspblas_?bsrtrsv routine solves a system of linear equations with matrix-vector operations for a
sparse matrix stored in the BSR format (3-array variation) with zero-based indexing:

A*y = x

or

A^T*y = x,

where:
x and y are vectors,
A is a sparse upper or lower triangular matrix with unit or non-unit main diagonal, A^T is the transpose of A.

NOTE
This routine supports only zero-based indexing of the input arrays.

Input Parameters

uplo Specifies whether the upper or low triangle of the matrix A is used.


If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.

If uplo = 'L' or 'l', then the low triangle of the matrix A is used.

transa Specifies the system of linear equations.


If transa = 'N' or 'n', then A*y = x.

If transa = 'T' or 't' or 'C' or 'c', then A^T*y = x.

diag Specifies whether matrix A is unit triangular or not.


If diag = 'U' or 'u', A is unit triangular.

If diag = 'N' or 'n', A is not unit triangular.

m Number of block rows of the matrix A.

lb Size of the block in the matrix A.

a Array containing elements of non-zero blocks of the matrix A. Its length is


equal to the number of non-zero blocks in the matrix A multiplied by lb*lb.
Refer to values array description in BSR Format for more details.

NOTE
The non-zero elements of the given row of the matrix must be stored
in the same order as they appear in the row (from left to right).
No diagonal element can be omitted from a sparse storage if the solver
is called with the non-unit indicator.


ia Array of length (m + 1), containing indices of block in the array a, such


that ia[I] is the index in the array a of the first non-zero element from the
row I. The value of the last element ia[m] is equal to the number of non-
zero blocks. Refer to rowIndex array description in BSR Format for more
details.

ja Array containing the column indices for each non-zero block in the matrix A.
Its length is equal to the number of non-zero blocks of the matrix A. Refer
to columns array description in BSR Format for more details.

x Array, size (m*lb).

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least (m*lb).

On exit, the array y must contain the vector y.

mkl_cspblas_?cootrsv
Triangular solvers with simplified interface for a sparse
matrix in the coordinate format with zero-based
indexing.

Syntax
void mkl_cspblas_scootrsv (const char *uplo , const char *transa , const char *diag ,
const MKL_INT *m , const float *val , const MKL_INT *rowind , const MKL_INT *colind ,
const MKL_INT *nnz , const float *x , float *y );
void mkl_cspblas_dcootrsv (const char *uplo , const char *transa , const char *diag ,
const MKL_INT *m , const double *val , const MKL_INT *rowind , const MKL_INT *colind ,
const MKL_INT *nnz , const double *x , double *y );
void mkl_cspblas_ccootrsv (const char *uplo , const char *transa , const char *diag ,
const MKL_INT *m , const MKL_Complex8 *val , const MKL_INT *rowind , const MKL_INT
*colind , const MKL_INT *nnz , const MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_cspblas_zcootrsv (const char *uplo , const char *transa , const char *diag ,
const MKL_INT *m , const MKL_Complex16 *val , const MKL_INT *rowind , const MKL_INT
*colind , const MKL_INT *nnz , const MKL_Complex16 *x , MKL_Complex16 *y );

Include Files
• mkl.h

Description

The mkl_cspblas_?cootrsv routine solves a system of linear equations with matrix-vector operations for a
sparse matrix stored in the coordinate format with zero-based indexing:

A*y = x

or

A^T*y = x,

where:
x and y are vectors,
A is a sparse upper or lower triangular matrix with unit or non-unit main diagonal, A^T is the transpose of A.

NOTE
This routine supports only zero-based indexing of the input arrays.

Input Parameters

uplo Specifies whether the upper or low triangle of the matrix A is considered.
If uplo = 'U' or 'u', then the upper triangle of the matrix A is used.

If uplo = 'L' or 'l', then the low triangle of the matrix A is used.

transa Specifies the system of linear equations.


If transa = 'N' or 'n', then A*y = x.

If transa = 'T' or 't' or 'C' or 'c', then A^T*y = x.

diag Specifies whether A is unit triangular.


If diag = 'U' or 'u', then A is unit triangular.

If diag = 'N' or 'n', then A is not unit triangular.

m Number of rows of the matrix A.

val Array of length nnz, contains non-zero elements of the matrix A in the
arbitrary order.
Refer to values array description in Coordinate Format for more details.

rowind Array of length nnz, contains the row indices for each non-zero element of
the matrix A.
Refer to rows array description in Coordinate Format for more details.

colind Array of length nnz, contains the column indices for each non-zero element
of the matrix A. Refer to columns array description in Coordinate Format
for more details.

nnz Specifies the number of non-zero elements of the matrix A.


Refer to nnz description in Coordinate Format for more details.

x Array, size is m.

On entry, the array x must contain the vector x.

Output Parameters

y Array, size at least m.

Contains the vector y.


mkl_?csrmv
Computes matrix - vector product of a sparse matrix
stored in the CSR format.

Syntax
void mkl_scsrmv (const char *transa , const MKL_INT *m , const MKL_INT *k , const float
*alpha , const char *matdescra , const float *val , const MKL_INT *indx , const MKL_INT
*pntrb , const MKL_INT *pntre , const float *x , const float *beta , float *y );
void mkl_dcsrmv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
double *alpha , const char *matdescra , const double *val , const MKL_INT *indx , const
MKL_INT *pntrb , const MKL_INT *pntre , const double *x , const double *beta , double
*y );
void mkl_ccsrmv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8 *val , const MKL_INT
*indx , const MKL_INT *pntrb , const MKL_INT *pntre , const MKL_Complex8 *x , const
MKL_Complex8 *beta , MKL_Complex8 *y );
void mkl_zcsrmv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16 *val , const MKL_INT
*indx , const MKL_INT *pntrb , const MKL_INT *pntre , const MKL_Complex16 *x , const
MKL_Complex16 *beta , MKL_Complex16 *y );

Include Files
• mkl.h

Description

The mkl_?csrmv routine performs a matrix-vector operation defined as

y := alpha*A*x + beta*y

or

y := alpha*AT*x + beta*y,

where:
alpha and beta are scalars,
x and y are vectors,
A is an m-by-k sparse matrix in the CSR format, AT is the transpose of A.

NOTE
This routine supports a CSR format both with one-based indexing and zero-based indexing.

Input Parameters

transa Specifies the operation.


If transa = 'N' or 'n', then y := alpha*A*x + beta*y

If transa = 'T' or 't' or 'C' or 'c', then y := alpha*AT*x + beta*y,

m Number of rows of the matrix A.

k Number of columns of the matrix A.

alpha Specifies the scalar alpha.

matdescra Array of six elements, specifies properties of the matrix used for operation.
Only first four array elements are used, their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.

val Array containing non-zero elements of the matrix A.


Its length is pntre[m-1] - pntrb[0].

Refer to values array description in CSR Format for more details.

indx For one-based indexing, array containing the column indices plus one for
each non-zero element of the matrix A. For zero-based indexing, array
containing the column indices for each non-zero element of the matrix A.
Its length is equal to length of the val array.

Refer to columns array description in CSR Format for more details.

pntrb Array of length m.

This array contains row indices, such that pntrb[i] - pntrb[0] is the
first index of row i in the arrays val and indx.

Refer to pointerb array description in CSR Format for more details.

pntre Array of length m.

This array contains row indices, such that pntre[i] - pntrb[0]-1 is the
last index of row i in the arrays val and indx.

Refer to pointerE array description in CSR Format for more details.

x Array, size at least k if transa = 'N' or 'n' and at least m otherwise.

On entry, the array x must contain the vector x.

beta Specifies the scalar beta.

y Array, size at least m if transa = 'N' or 'n' and at least k otherwise.

On entry, the array y must contain the vector y.

Output Parameters

y Overwritten by the updated vector y.
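
As a rough illustration (not part of the original reference), the sketch below computes y := A*x for a small matrix stored in zero-based CSR format with mkl_dcsrmv. The matrix, the vectors, and the descriptor characters 'G' (general matrix) and 'C' (zero-based indexing) are assumptions made for this example; consult the matdescra tables referenced above for the full set of values.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* General 3-by-4 matrix in zero-based CSR format (illustrative values):
           | 1 0 2 0 |
       A = | 0 3 0 4 |
           | 5 0 6 0 |                                                    */
    MKL_INT m = 3, k = 4;
    double  val[]      = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0};
    MKL_INT indx[]     = {0, 2, 1, 3, 0, 2};
    MKL_INT rowIndex[] = {0, 2, 4, 6};   /* pntrb = rowIndex, pntre = rowIndex + 1 */
    double  x[] = {1.0, 1.0, 1.0, 1.0};
    double  y[] = {0.0, 0.0, 0.0};
    double  alpha = 1.0, beta = 0.0;
    char    matdescra[6] = {'G', ' ', ' ', 'C', ' ', ' '};

    mkl_dcsrmv("N", &m, &k, &alpha, matdescra, val, indx,
               rowIndex, rowIndex + 1, x, &beta, y);

    for (int i = 0; i < m; i++)
        printf("y[%d] = %f\n", i, y[i]);
    return 0;
}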

mkl_?bsrmv
Computes matrix - vector product of a sparse matrix
stored in the BSR format.


Syntax
void mkl_sbsrmv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_INT *lb , const float *alpha , const char *matdescra , const float *val , const
MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const float *x , const
float *beta , float *y );
void mkl_dbsrmv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_INT *lb , const double *alpha , const char *matdescra , const double *val , const
MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const double *x , const
double *beta , double *y );
void mkl_cbsrmv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_INT *lb , const MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8
*val , const MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const
MKL_Complex8 *x , const MKL_Complex8 *beta , MKL_Complex8 *y );
void mkl_zbsrmv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_INT *lb , const MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16
*val , const MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const
MKL_Complex16 *x , const MKL_Complex16 *beta , MKL_Complex16 *y );

Include Files
• mkl.h

Description

The mkl_?bsrmv routine performs a matrix-vector operation defined as

y := alpha*A*x + beta*y

or

y := alpha*AT*x + beta*y,

where:
alpha and beta are scalars,
x and y are vectors,
A is an m-by-k block sparse matrix in the BSR format, AT is the transpose of A.

NOTE
This routine supports a BSR format both with one-based indexing and zero-based indexing.

Input Parameters

transa Specifies the operation.


If transa = 'N' or 'n', then the matrix-vector product is computed as
y := alpha*A*x + beta*y
If transa = 'T' or 't' or 'C' or 'c', then the matrix-vector product is
computed as y := alpha*AT*x + beta*y,

m Number of block rows of the matrix A.

k Number of block columns of the matrix A.

lb Size of the block in the matrix A.

alpha Specifies the scalar alpha.

matdescra Array of six elements, specifies properties of the matrix used for operation.
Only first four array elements are used, their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.

val Array containing elements of non-zero blocks of the matrix A. Its length is
equal to the number of non-zero blocks in the matrix A multiplied by lb*lb.

Refer to values array description in BSR Format for more details.

indx For one-based indexing, array containing the column indices plus one for
each non-zero block of the matrix A. For zero-based indexing, array
containing the column indices for each non-zero block of the matrix A.
Its length is equal to the number of non-zero blocks in the matrix A.
Refer to columns array description in BSR Format for more details.

pntrb Array of length m.

This array contains row indices, such that pntrb[i] - pntrb[0] is the
first index of block row i in the array indx

Refer to pointerB array description in BSR Format for more details.

pntre Array of length m.

For zero-based indexing this array contains row indices, such that
pntre[i] - pntrb[0] - 1 is the last index of block row i in the array
indx.
Refer to pointerE array description in BSR Format for more details.

x Array, size at least (k*lb) if transa = 'N' or 'n', and at least (m*lb)
otherwise. On entry, the array x must contain the vector x.

beta Specifies the scalar beta.

y Array, size at least (m*lb) if transa = 'N' or 'n', and at least (k*lb)
otherwise. On entry, the array y must contain the vector y.

Output Parameters

y Overwritten by the updated vector y.
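
A minimal sketch (not from the original reference) of calling mkl_dbsrmv with a 2-by-2 block matrix, 2-by-2 blocks (lb = 2), and zero-based indexing. The non-zero blocks chosen here are symmetric, so the intra-block storage order described in BSR Format does not change the result; all values and descriptor characters are invented for the example.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* Block matrix with non-zero blocks at block positions (0,0), (0,1), (1,1):
       block(0,0) = |1 0|   block(0,1) = |2 2|   block(1,1) = |3 0|
                    |0 1|                |2 2|                |0 3|        */
    MKL_INT m = 2, k = 2, lb = 2;
    double  val[]   = {1.0, 0.0, 0.0, 1.0,   /* block (0,0) */
                       2.0, 2.0, 2.0, 2.0,   /* block (0,1) */
                       3.0, 0.0, 0.0, 3.0};  /* block (1,1) */
    MKL_INT indx[]  = {0, 1, 1};
    MKL_INT pntrb[] = {0, 2};
    MKL_INT pntre[] = {2, 3};
    double  x[] = {1.0, 1.0, 1.0, 1.0};      /* length k*lb */
    double  y[] = {0.0, 0.0, 0.0, 0.0};      /* length m*lb */
    double  alpha = 1.0, beta = 0.0;
    char    matdescra[6] = {'G', ' ', ' ', 'C', ' ', ' '};

    mkl_dbsrmv("N", &m, &k, &lb, &alpha, matdescra, val, indx,
               pntrb, pntre, x, &beta, y);

    for (int i = 0; i < m * lb; i++)
        printf("y[%d] = %f\n", i, y[i]);
    return 0;
}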

mkl_?cscmv
Computes matrix-vector product for a sparse matrix in
the CSC format.


Syntax
void mkl_scscmv (const char *transa , const MKL_INT *m , const MKL_INT *k , const float
*alpha , const char *matdescra , const float *val , const MKL_INT *indx , const MKL_INT
*pntrb , const MKL_INT *pntre , const float *x , const float *beta , float *y );
void mkl_dcscmv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
double *alpha , const char *matdescra , const double *val , const MKL_INT *indx , const
MKL_INT *pntrb , const MKL_INT *pntre , const double *x , const double *beta , double
*y );
void mkl_ccscmv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8 *val , const MKL_INT
*indx , const MKL_INT *pntrb , const MKL_INT *pntre , const MKL_Complex8 *x , const
MKL_Complex8 *beta , MKL_Complex8 *y );
void mkl_zcscmv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16 *val , const MKL_INT
*indx , const MKL_INT *pntrb , const MKL_INT *pntre , const MKL_Complex16 *x , const
MKL_Complex16 *beta , MKL_Complex16 *y );

Include Files
• mkl.h

Description

The mkl_?cscmv routine performs a matrix-vector operation defined as

y := alpha*A*x + beta*y

or

y := alpha*AT*x + beta*y,

where:
alpha and beta are scalars,
x and y are vectors,
A is an m-by-k sparse matrix in compressed sparse column (CSC) format, AT is the transpose of A.

NOTE
This routine supports CSC format both with one-based indexing and zero-based indexing.

Input Parameters

transa Specifies the operation.


If transa = 'N' or 'n', then y := alpha*A*x + beta*y

If transa = 'T' or 't' or 'C' or 'c', then y := alpha*AT*x + beta*y,

m Number of rows of the matrix A.

k Number of columns of the matrix A.

alpha Specifies the scalar alpha.

matdescra Array of six elements, specifies properties of the matrix used for operation.
Only first four array elements are used, their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.

val Array containing non-zero elements of the matrix A.


Its length is pntre[k-1] - pntrb[0].

Refer to values array description in CSC Format for more details.

indx For one-based indexing, array containing the column indices plus one for
each non-zero element of the matrix A. For zero-based indexing, array
containing the column indices for each non-zero element of the matrix A.
Its length is equal to length of the val array.

Refer to rows array description in CSC Format for more details.

pntrb Array of length k.

This array contains column indices, such that pntrb[i] - pntrb[0] + 1 is the first index of column i in the arrays val and indx.

Refer to pointerb array description in CSC Format for more details.

pntre Array of length k.

For one-based indexing this array contains column indices, such that
pntre[i] - pntrb[1] is the last index of column i in the arrays val and
indx.
For zero-based indexing this array contains column indices, such that
pntre[i] - pntrb[1] - 1 is the last index of column i in the arrays val
and indx.

Refer to pointerE array description in CSC Format for more details.

x Array, size at least k if transa = 'N' or 'n' and at least m otherwise.

On entry, the array x must contain the vector x.

beta Specifies the scalar beta.

y Array, size at least m if transa = 'N' or 'n' and at least k otherwise.

On entry, the array y must contain the vector y.

Output Parameters

y Overwritten by the updated vector y.
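
As an illustrative sketch (not part of the original reference), the following computes y := A*x with mkl_dcscmv for a small general matrix stored column by column in zero-based CSC format. Matrix values and descriptor characters are invented for the example.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* General 3-by-4 matrix in zero-based CSC format (illustrative values):
           | 1 0 2 0 |
       A = | 0 3 0 4 |
           | 5 0 6 0 |                                                    */
    MKL_INT m = 3, k = 4;
    double  val[]      = {1.0, 5.0, 3.0, 2.0, 6.0, 4.0};  /* stored by column */
    MKL_INT indx[]     = {0, 2, 1, 0, 2, 1};              /* row indices      */
    MKL_INT colIndex[] = {0, 2, 3, 5, 6};  /* pntrb = colIndex, pntre = colIndex + 1 */
    double  x[] = {1.0, 1.0, 1.0, 1.0};
    double  y[] = {0.0, 0.0, 0.0};
    double  alpha = 1.0, beta = 0.0;
    char    matdescra[6] = {'G', ' ', ' ', 'C', ' ', ' '};

    mkl_dcscmv("N", &m, &k, &alpha, matdescra, val, indx,
               colIndex, colIndex + 1, x, &beta, y);

    for (int i = 0; i < m; i++)
        printf("y[%d] = %f\n", i, y[i]);
    return 0;
}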

mkl_?coomv
Computes matrix - vector product for a sparse matrix
in the coordinate format.

Syntax
void mkl_scoomv (const char *transa , const MKL_INT *m , const MKL_INT *k , const float
*alpha , const char *matdescra , const float *val , const MKL_INT *rowind , const
MKL_INT *colind , const MKL_INT *nnz , const float *x , const float *beta , float *y );


void mkl_dcoomv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
double *alpha , const char *matdescra , const double *val , const MKL_INT *rowind ,
const MKL_INT *colind , const MKL_INT *nnz , const double *x , const double *beta ,
double *y );
void mkl_ccoomv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8 *val , const MKL_INT
*rowind , const MKL_INT *colind , const MKL_INT *nnz , const MKL_Complex8 *x , const
MKL_Complex8 *beta , MKL_Complex8 *y );
void mkl_zcoomv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16 *val , const MKL_INT
*rowind , const MKL_INT *colind , const MKL_INT *nnz , const MKL_Complex16 *x , const
MKL_Complex16 *beta , MKL_Complex16 *y );

Include Files
• mkl.h

Description

The mkl_?coomv routine performs a matrix-vector operation defined as

y := alpha*A*x + beta*y

or

y := alpha*AT*x + beta*y,

where:
alpha and beta are scalars,
x and y are vectors,
A is an m-by-k sparse matrix in compressed coordinate format, AT is the transpose of A.

NOTE
This routine supports a coordinate format both with one-based indexing and zero-based indexing.

Input Parameters

transa Specifies the operation.


If transa = 'N' or 'n', then y := alpha*A*x + beta*y

If transa = 'T' or 't' or 'C' or 'c', then y := alpha*AT*x + beta*y,

m Number of rows of the matrix A.

k Number of columns of the matrix A.

alpha Specifies the scalar alpha.

matdescra Array of six elements, specifies properties of the matrix used for operation.
Only first four array elements are used, their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.

val Array of length nnz, contains the non-zero elements of the matrix A in arbitrary order.
Refer to values array description in Coordinate Format for more details.

rowind Array of length nnz.

For one-based indexing, contains the row indices plus one for each non-zero
element of the matrix A.
For zero-based indexing, contains the row indices for each non-zero
element of the matrix A.
Refer to rows array description in Coordinate Format for more details.

colind Array of length nnz.

For one-based indexing, contains the column indices plus one for each non-
zero element of the matrix A.
For zero-based indexing, contains the column indices for each non-zero
element of the matrix A.
Refer to columns array description in Coordinate Format for more details.

nnz Specifies the number of non-zero elements of the matrix A.

Refer to nnz description in Coordinate Format for more details.

x Array, size at least k if transa = 'N' or 'n' and at least m otherwise.

On entry, the array x must contain the vector x.

beta Specifies the scalar beta.

y Array, size at least m if transa = 'N' or 'n' and at least k otherwise.

On entry, the array y must contain the vector y.

Output Parameters

y Overwritten by the updated vector y.
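
For illustration (not from the original reference), a minimal sketch of the same kind of matrix-vector product using mkl_dcoomv with zero-based coordinate storage; the values and descriptor characters are invented for the example.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* General 3-by-4 matrix in zero-based coordinate format:
           | 1 0 2 0 |
       A = | 0 3 0 4 |
           | 5 0 6 0 |                                                    */
    MKL_INT m = 3, k = 4, nnz = 6;
    double  val[]    = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0};
    MKL_INT rowind[] = {0, 0, 1, 1, 2, 2};
    MKL_INT colind[] = {0, 2, 1, 3, 0, 2};
    double  x[] = {1.0, 1.0, 1.0, 1.0};
    double  y[] = {0.0, 0.0, 0.0};
    double  alpha = 1.0, beta = 0.0;
    char    matdescra[6] = {'G', ' ', ' ', 'C', ' ', ' '};

    mkl_dcoomv("N", &m, &k, &alpha, matdescra, val, rowind, colind, &nnz,
               x, &beta, y);

    for (int i = 0; i < m; i++)
        printf("y[%d] = %f\n", i, y[i]);
    return 0;
}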

mkl_?csrsv
Solves a system of linear equations for a sparse
matrix in the CSR format.

Syntax
void mkl_scsrsv (const char *transa , const MKL_INT *m , const float *alpha , const
char *matdescra , const float *val , const MKL_INT *indx , const MKL_INT *pntrb , const
MKL_INT *pntre , const float *x , float *y );
void mkl_dcsrsv (const char *transa , const MKL_INT *m , const double *alpha , const
char *matdescra , const double *val , const MKL_INT *indx , const MKL_INT *pntrb ,
const MKL_INT *pntre , const double *x , double *y );
void mkl_ccsrsv (const char *transa , const MKL_INT *m , const MKL_Complex8 *alpha ,
const char *matdescra , const MKL_Complex8 *val , const MKL_INT *indx , const MKL_INT
*pntrb , const MKL_INT *pntre , const MKL_Complex8 *x , MKL_Complex8 *y );


void mkl_zcsrsv (const char *transa , const MKL_INT *m , const MKL_Complex16 *alpha ,
const char *matdescra , const MKL_Complex16 *val , const MKL_INT *indx , const MKL_INT
*pntrb , const MKL_INT *pntre , const MKL_Complex16 *x , MKL_Complex16 *y );

Include Files
• mkl.h

Description

The mkl_?csrsv routine solves a system of linear equations with matrix-vector operations for a sparse
matrix in the CSR format:

y := alpha*inv(A)*x

or

y := alpha*inv(AT)*x,

where:
alpha is scalar, x and y are vectors, A is a sparse upper or lower triangular matrix with unit or non-unit main
diagonal, AT is the transpose of A.

NOTE
This routine supports a CSR format both with one-based indexing and zero-based indexing.

Input Parameters

transa Specifies the system of linear equations.


If transa = 'N' or 'n', then y := alpha*inv(A)*x

If transa = 'T' or 't' or 'C' or 'c', then y := alpha*inv(AT)*x,

m Number of columns of the matrix A.

alpha Specifies the scalar alpha.

matdescra Array of six elements, specifies properties of the matrix used for operation.
Only first four array elements are used, their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.

val Array containing non-zero elements of the matrix A.


Its length is pntre[m - 1] - pntrb[0].

Refer to values array description in CSR Format for more details.

NOTE
The non-zero elements of the given row of the matrix must be stored
in the same order as they appear in the row (from left to right).
No diagonal element can be omitted from a sparse storage if the solver
is called with the non-unit indicator.

indx For one-based indexing, array containing the column indices plus one for
each non-zero element of the matrix A. For zero-based indexing, array
containing the column indices for each non-zero element of the matrix A.
Its length is equal to length of the val array.

Refer to columns array description in CSR Format for more details.

NOTE
Column indices must be sorted in increasing order for each row.

pntrb Array of length m.

This array contains row indices, such that pntrb[i] - pntrb[0] is the
first index of row i in the arrays val and indx.

Refer to pointerb array description in CSR Format for more details.

pntre Array of length m.

This array contains row indices, such that pntre[i] - pntrb[0] - 1 is the last index of row i in the arrays val and indx.

Refer to pointerE array description in CSR Format for more details.

x Array, size at least m.

On entry, the array x must contain the vector x. The elements are accessed
with unit increment.

y Array, size at least m.

On entry, the array y must contain the vector y. The elements are accessed
with unit increment.

Output Parameters

y Contains the solution vector y.
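
As a rough illustration (not from the original reference), the sketch below solves a lower triangular system with mkl_dcsrsv using zero-based CSR storage. The matrix, right-hand side, and descriptor characters 'T' (triangular), 'L' (lower), 'N' (non-unit diagonal), 'C' (zero-based indexing) are assumptions made for the example.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* Lower triangular 3-by-3 matrix in zero-based CSR format:
             | 2 0 0 |
         A = | 1 3 0 |
             | 0 2 4 |                                                    */
    MKL_INT m = 3;
    double  val[]      = {2.0, 1.0, 3.0, 2.0, 4.0};
    MKL_INT indx[]     = {0, 0, 1, 1, 2};       /* sorted within each row  */
    MKL_INT rowIndex[] = {0, 1, 3, 5};
    double  x[] = {2.0, 5.0, 10.0};             /* right-hand side         */
    double  y[3];                               /* receives alpha*inv(A)*x */
    double  alpha = 1.0;
    char    matdescra[6] = {'T', 'L', 'N', 'C', ' ', ' '};

    mkl_dcsrsv("N", &m, &alpha, matdescra, val, indx,
               rowIndex, rowIndex + 1, x, y);

    for (int i = 0; i < m; i++)
        printf("y[%d] = %f\n", i, y[i]);
    return 0;
}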

mkl_?bsrsv
Solves a system of linear equations for a sparse
matrix in the BSR format.

Syntax
void mkl_sbsrsv (const char *transa , const MKL_INT *m , const MKL_INT *lb , const
float *alpha , const char *matdescra , const float *val , const MKL_INT *indx , const
MKL_INT *pntrb , const MKL_INT *pntre , const float *x , float *y );
void mkl_dbsrsv (const char *transa , const MKL_INT *m , const MKL_INT *lb , const
double *alpha , const char *matdescra , const double *val , const MKL_INT *indx , const
MKL_INT *pntrb , const MKL_INT *pntre , const double *x , double *y );
void mkl_cbsrsv (const char *transa , const MKL_INT *m , const MKL_INT *lb , const
MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8 *val , const MKL_INT
*indx , const MKL_INT *pntrb , const MKL_INT *pntre , const MKL_Complex8 *x ,
MKL_Complex8 *y );


void mkl_zbsrsv (const char *transa , const MKL_INT *m , const MKL_INT *lb , const
MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16 *val , const MKL_INT
*indx , const MKL_INT *pntrb , const MKL_INT *pntre , const MKL_Complex16 *x ,
MKL_Complex16 *y );

Include Files
• mkl.h

Description

The mkl_?bsrsv routine solves a system of linear equations with matrix-vector operations for a sparse
matrix in the BSR format:

y := alpha*inv(A)*x

or

y := alpha*inv(AT)* x,

where:
alpha is scalar, x and y are vectors, A is a sparse upper or lower triangular matrix with unit or non-unit main
diagonal, AT is the transpose of A.

NOTE
This routine supports a BSR format both with one-based indexing and zero-based indexing.

Input Parameters

transa Specifies the operation.


If transa = 'N' or 'n', then y := alpha*inv(A)*x

If transa = 'T' or 't' or 'C' or 'c', then y := alpha*inv(AT)* x,

m Number of block columns of the matrix A.

lb Size of the block in the matrix A.

alpha Specifies the scalar alpha.

matdescra Array of six elements, specifies properties of the matrix used for operation.
Only first four array elements are used, their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.

val Array containing elements of non-zero blocks of the matrix A. Its length is
equal to the number of non-zero blocks in the matrix A multiplied by lb*lb.

Refer to the values array description in BSR Format for more details.

NOTE
The non-zero elements of the given row of the matrix must be stored
in the same order as they appear in the row (from left to right).

No diagonal element can be omitted from a sparse storage if the solver
is called with the non-unit indicator.

indx For one-based indexing, array containing the column indices plus one for
each non-zero block of the matrix A. For zero-based indexing, array
containing the column indices for each non-zero block of the matrix A.
Its length is equal to the number of non-zero blocks in the matrix A.
Refer to the columns array description in BSR Format for more details.

pntrb Array of length m.

This array contains row indices, such that pntrb[i] - pntrb[0] is the
first index of block row i in the array indx

Refer to pointerB array description in BSR Format for more details.

pntre Array of length m.

For one-based indexing this array contains row indices, such that pntre[i]
- pntrb[1] is the last index of block row i in the array indx.
For zero-based indexing this array contains row indices, such that
pntre[i] - pntrb[0] - 1 is the last index of block row i in the array
indx.
Refer to pointerE array description in BSR Format for more details.

x Array, size at least (m*lb).

On entry, the array x must contain the vector x. The elements are accessed
with unit increment.

y Array, size at least (m*lb).

On entry, the array y must contain the vector y. The elements are accessed
with unit increment.

Output Parameters

y Contains the solution vector y.

mkl_?cscsv
Solves a system of linear equations for a sparse
matrix in the CSC format.

Syntax
void mkl_scscsv (const char *transa , const MKL_INT *m , const float *alpha , const
char *matdescra , const float *val , const MKL_INT *indx , const MKL_INT *pntrb , const
MKL_INT *pntre , const float *x , float *y );
void mkl_dcscsv (const char *transa , const MKL_INT *m , const double *alpha , const
char *matdescra , const double *val , const MKL_INT *indx , const MKL_INT *pntrb ,
const MKL_INT *pntre , const double *x , double *y );


void mkl_ccscsv (const char *transa , const MKL_INT *m , const MKL_Complex8 *alpha ,
const char *matdescra , const MKL_Complex8 *val , const MKL_INT *indx , const MKL_INT
*pntrb , const MKL_INT *pntre , const MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_zcscsv (const char *transa , const MKL_INT *m , const MKL_Complex16 *alpha ,
const char *matdescra , const MKL_Complex16 *val , const MKL_INT *indx , const MKL_INT
*pntrb , const MKL_INT *pntre , const MKL_Complex16 *x , MKL_Complex16 *y );

Include Files
• mkl.h

Description

The mkl_?cscsv routine solves a system of linear equations with matrix-vector operations for a sparse
matrix in the CSC format:

y := alpha*inv(A)*x

or

y := alpha*inv(AT)* x,

where:
alpha is scalar, x and y are vectors, A is a sparse upper or lower triangular matrix with unit or non-unit main
diagonal, AT is the transpose of A.

NOTE
This routine supports a CSC format both with one-based indexing and zero-based indexing.

Input Parameters

transa Specifies the operation.


If transa = 'N' or 'n', then y := alpha*inv(A)*x

If transa= 'T' or 't' or 'C' or 'c', then y := alpha*inv(AT)* x,

m Number of columns of the matrix A.

alpha Specifies the scalar alpha.

matdescra Array of six elements, specifies properties of the matrix used for operation.
Only first four array elements are used, their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.

val Array containing non-zero elements of the matrix A.


Its length is pntre[m-1] - pntrb[0].

Refer to values array description in CSC Format for more details.

NOTE
The non-zero elements of the given row of the matrix must be stored
in the same order as they appear in the row (from left to right).

No diagonal element can be omitted from a sparse storage if the solver
is called with the non-unit indicator.

indx For one-based indexing, array containing the row indices plus one for each
non-zero element of the matrix A.
For zero-based indexing, array containing the row indices for each non-zero
element of the matrix A.
Its length is equal to length of the val array.

Refer to columns array description in CSC Format for more details.

NOTE
Row indices must be sorted in increasing order for each column.

pntrb Array of length m.

This array contains column indices, such that pntrb[i] - pntrb[0] is the
first index of column i in the arrays val and indx.

Refer to pointerb array description in CSC Format for more details.

pntre Array of length m.

This array contains column indices, such that pntre[i] - pntrb[0] - 1 is the last index of column i in the arrays val and indx.

Refer to pointerE array description in CSC Format for more details.

x Array, size at least m.

On entry, the array x must contain the vector x. The elements are accessed
with unit increment.

y Array, size at least m.

On entry, the array y must contain the vector y. The elements are accessed
with unit increment.

Output Parameters

y Contains the solution vector y.
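
For illustration (not part of the original reference), the same kind of lower triangular solve using mkl_dcscsv with zero-based CSC storage; values and descriptor characters are invented for the example, and the row indices within each column are kept in increasing order as required above.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* Lower triangular 3-by-3 matrix in zero-based CSC format:
             | 2 0 0 |
         A = | 1 3 0 |
             | 0 2 4 |                                                    */
    MKL_INT m = 3;
    double  val[]      = {2.0, 1.0, 3.0, 2.0, 4.0};   /* stored by column */
    MKL_INT indx[]     = {0, 1, 1, 2, 2};             /* row indices      */
    MKL_INT colIndex[] = {0, 2, 4, 5};
    double  x[] = {2.0, 5.0, 10.0};
    double  y[3];
    double  alpha = 1.0;
    char    matdescra[6] = {'T', 'L', 'N', 'C', ' ', ' '};

    mkl_dcscsv("N", &m, &alpha, matdescra, val, indx,
               colIndex, colIndex + 1, x, y);

    for (int i = 0; i < m; i++)
        printf("y[%d] = %f\n", i, y[i]);
    return 0;
}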

mkl_?coosv
Solves a system of linear equations for a sparse
matrix in the coordinate format.

Syntax
void mkl_scoosv (const char *transa , const MKL_INT *m , const float *alpha , const
char *matdescra , const float *val , const MKL_INT *rowind , const MKL_INT *colind ,
const MKL_INT *nnz , const float *x , float *y );
void mkl_dcoosv (const char *transa , const MKL_INT *m , const double *alpha , const
char *matdescra , const double *val , const MKL_INT *rowind , const MKL_INT *colind ,
const MKL_INT *nnz , const double *x , double *y );


void mkl_ccoosv (const char *transa , const MKL_INT *m , const MKL_Complex8 *alpha ,
const char *matdescra , const MKL_Complex8 *val , const MKL_INT *rowind , const MKL_INT
*colind , const MKL_INT *nnz , const MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_zcoosv (const char *transa , const MKL_INT *m , const MKL_Complex16 *alpha ,
const char *matdescra , const MKL_Complex16 *val , const MKL_INT *rowind , const
MKL_INT *colind , const MKL_INT *nnz , const MKL_Complex16 *x , MKL_Complex16 *y );

Include Files
• mkl.h

Description

The mkl_?coosv routine solves a system of linear equations with matrix-vector operations for a sparse
matrix in the coordinate format:

y := alpha*inv(A)*x

or

y := alpha*inv(AT)*x,

where:
alpha is scalar, x and y are vectors, A is a sparse upper or lower triangular matrix with unit or non-unit main
diagonal, AT is the transpose of A.

NOTE
This routine supports a coordinate format both with one-based indexing and zero-based indexing.

Input Parameters

transa Specifies the system of linear equations.


If transa = 'N' or 'n', then y := alpha*inv(A)*x

If transa = 'T' or 't' or 'C' or 'c', then y := alpha*inv(AT)* x,

m Number of rows of the matrix A.

alpha Specifies the scalar alpha.

matdescra Array of six elements, specifies properties of the matrix used for operation.
Only first four array elements are used, their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.

val Array of length nnz, contains the non-zero elements of the matrix A in arbitrary order.
Refer to values array description in Coordinate Format for more details.

rowind Array of length nnz.

For one-based indexing, contains the row indices plus one for each non-zero
element of the matrix A.

For zero-based indexing, contains the row indices for each non-zero
element of the matrix A.
Refer to rows array description in Coordinate Format for more details.

colind Array of length nnz.

For one-based indexing, contains the column indices plus one for each non-
zero element of the matrix A.
For zero-based indexing, contains the column indices for each non-zero
element of the matrix A.
Refer to columns array description in Coordinate Format for more details.

nnz Specifies the number of non-zero elements of the matrix A.

Refer to nnz description in Coordinate Format for more details.

x Array, size at least m.

On entry, the array x must contain the vector x. The elements are accessed
with unit increment.

y Array, size at least m.

On entry, the array y must contain the vector y. The elements are accessed
with unit increment.

Output Parameters

y Contains the solution vector y.
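
As an illustrative sketch (not from the original reference), the same lower triangular solve with mkl_dcoosv and zero-based coordinate storage; all values and descriptor characters are invented for the example.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* Lower triangular 3-by-3 matrix in zero-based coordinate format:
             | 2 0 0 |
         A = | 1 3 0 |
             | 0 2 4 |                                                    */
    MKL_INT m = 3, nnz = 5;
    double  val[]    = {2.0, 1.0, 3.0, 2.0, 4.0};
    MKL_INT rowind[] = {0, 1, 1, 2, 2};
    MKL_INT colind[] = {0, 0, 1, 1, 2};
    double  x[] = {2.0, 5.0, 10.0};
    double  y[3];
    double  alpha = 1.0;
    char    matdescra[6] = {'T', 'L', 'N', 'C', ' ', ' '};

    mkl_dcoosv("N", &m, &alpha, matdescra, val, rowind, colind, &nnz, x, y);

    for (int i = 0; i < m; i++)
        printf("y[%d] = %f\n", i, y[i]);
    return 0;
}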

mkl_?csrmm
Computes matrix - matrix product of a sparse matrix
stored in the CSR format.

Syntax
void mkl_scsrmm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const float *alpha , const char *matdescra , const float *val , const
MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const float *b , const
MKL_INT *ldb , const float *beta , float *c , const MKL_INT *ldc );
void mkl_dcsrmm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const double *alpha , const char *matdescra , const double *val , const
MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const double *b , const
MKL_INT *ldb , const double *beta , double *c , const MKL_INT *ldc );
void mkl_ccsrmm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8
*val , const MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const
MKL_Complex8 *b , const MKL_INT *ldb , const MKL_Complex8 *beta , MKL_Complex8 *c ,
const MKL_INT *ldc );
void mkl_zcsrmm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16
*val , const MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const
MKL_Complex16 *b , const MKL_INT *ldb , const MKL_Complex16 *beta , MKL_Complex16 *c ,
const MKL_INT *ldc );


Include Files
• mkl.h

Description

The mkl_?csrmm routine performs a matrix-matrix operation defined as

C := alpha*A*B + beta*C

or

C := alpha*AT*B + beta*C

or

C := alpha*AH*B + beta*C,

where:
alpha and beta are scalars,
B and C are dense matrices, A is an m-by-k sparse matrix in compressed sparse row (CSR) format, AT is the
transpose of A, and AH is the conjugate transpose of A.

NOTE
This routine supports a CSR format both with one-based indexing and zero-based indexing.

Input Parameters

transa Specifies the operation.


If transa = 'N' or 'n', then C := alpha*A*B + beta*C,

If transa = 'T' or 't', then C := alpha*AT*B + beta*C,

If transa = 'C' or 'c', then C := alpha*AH*B + beta*C.

m Number of rows of the matrix A.

n Number of columns of the matrix C.

k Number of columns of the matrix A.

alpha Specifies the scalar alpha.

matdescra Array of six elements, specifies properties of the matrix used for operation.
Only first four array elements are used, their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.

val Array containing non-zero elements of the matrix A.


For zero-based indexing its length is pntre[m-1] - pntrb[0].

Refer to values array description in CSR Format for more details.

indx For one-based indexing, array containing the column indices plus one for
each non-zero element of the matrix A.

For zero-based indexing, array containing the column indices for each non-
zero element of the matrix A.
Its length is equal to length of the val array.

Refer to columns array description in CSR Format for more details.

pntrb Array of length m.

This array contains row indices, such that pntrb[I] - pntrb[0] is the
first index of row I in the arrays val and indx.

Refer to pointerb array description in CSR Format for more details.

pntre Array of length m.

This array contains row indices, such that pntre[I] - pntrb[0] - 1 is the last index of row I in the arrays val and indx.

Refer to pointerE array description in CSR Format for more details.

b Array, size ldb by at least n for non-transposed matrix A and at least m for
transposed for one-based indexing, and (at least k for non-transposed
matrix A and at least m for transposed, ldb) for zero-based indexing.

On entry with transa='N' or 'n', the leading k-by-n part of the array b
must contain the matrix B, otherwise the leading m-by-n part of the array b
must contain the matrix B.

ldb Specifies the leading dimension of b for one-based indexing, and the second
dimension of b for zero-based indexing, as declared in the calling
(sub)program.

beta Specifies the scalar beta.

c Array, size ldc by n for one-based indexing, and (m, ldc) for zero-based
indexing.
On entry, the leading m-by-n part of the array c must contain the matrix C,
otherwise the leading k-by-n part of the array c must contain the matrix C.

ldc Specifies the leading dimension of c for one-based indexing, and the second
dimension of c for zero-based indexing, as declared in the calling
(sub)program.

Output Parameters

c Overwritten by the matrix (alpha*A*B + beta*C), (alpha*AT*B + beta*C), or (alpha*AH*B + beta*C).
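
As a rough illustration (not part of the original reference), the sketch below computes C := A*B with mkl_dcsrmm for a zero-based CSR matrix. With zero-based indexing the dense arrays b and c are taken as row-major arrays of dimensions (k, ldb) and (m, ldc), as described above; all values and descriptor characters are invented for the example.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* General 3-by-4 matrix in zero-based CSR format (illustrative values):
           | 1 0 2 0 |
       A = | 0 3 0 4 |
           | 5 0 6 0 |                                                    */
    MKL_INT m = 3, n = 2, k = 4;
    double  val[]      = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0};
    MKL_INT indx[]     = {0, 2, 1, 3, 0, 2};
    MKL_INT rowIndex[] = {0, 2, 4, 6};
    MKL_INT ldb = 2, ldc = 2;
    double  b[8] = {1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0}; /* k-by-n, row-major */
    double  c[6] = {0.0};                                    /* m-by-n, row-major */
    double  alpha = 1.0, beta = 0.0;
    char    matdescra[6] = {'G', ' ', ' ', 'C', ' ', ' '};

    mkl_dcsrmm("N", &m, &n, &k, &alpha, matdescra, val, indx,
               rowIndex, rowIndex + 1, b, &ldb, &beta, c, &ldc);

    for (int i = 0; i < m; i++)
        printf("c[%d][*] = %f %f\n", i, c[i*ldc], c[i*ldc + 1]);
    return 0;
}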

mkl_?bsrmm
Computes matrix - matrix product of a sparse matrix
stored in the BSR format.


Syntax
void mkl_sbsrmm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_INT *lb , const float *alpha , const char *matdescra , const
float *val , const MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const
float *b , const MKL_INT *ldb , const float *beta , float *c , const MKL_INT *ldc );
void mkl_dbsrmm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_INT *lb , const double *alpha , const char *matdescra , const
double *val , const MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const
double *b , const MKL_INT *ldb , const double *beta , double *c , const MKL_INT *ldc );
void mkl_cbsrmm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_INT *lb , const MKL_Complex8 *alpha , const char *matdescra ,
const MKL_Complex8 *val , const MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT
*pntre , const MKL_Complex8 *b , const MKL_INT *ldb , const MKL_Complex8 *beta ,
MKL_Complex8 *c , const MKL_INT *ldc );
void mkl_zbsrmm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_INT *lb , const MKL_Complex16 *alpha , const char *matdescra ,
const MKL_Complex16 *val , const MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT
*pntre , const MKL_Complex16 *b , const MKL_INT *ldb , const MKL_Complex16 *beta ,
MKL_Complex16 *c , const MKL_INT *ldc );

Include Files
• mkl.h

Description

The mkl_?bsrmm routine performs a matrix-matrix operation defined as

C := alpha*A*B + beta*C

or

C := alpha*AT*B + beta*C

or

C := alpha*AH*B + beta*C,

where:
alpha and beta are scalars,
B and C are dense matrices, A is an m-by-k sparse matrix in block sparse row (BSR) format, AT is the
transpose of A, and AH is the conjugate transpose of A.

NOTE
This routine supports a BSR format both with one-based indexing and zero-based indexing.

Input Parameters

transa Specifies the operation.


If transa = 'N' or 'n', then the matrix-matrix product is computed as
C := alpha*A*B + beta*C

If transa = 'T' or 't', then the matrix-matrix product is computed as
C := alpha*AT*B + beta*C
If transa = 'C' or 'c', then the matrix-matrix product is computed as
C := alpha*AH*B + beta*C,

m Number of block rows of the matrix A.

n Number of columns of the matrix C.

k Number of block columns of the matrix A.

lb Size of the block in the matrix A.

alpha Specifies the scalar alpha.

matdescra Array of six elements, specifies properties of the matrix used for operation.
Only first four array elements are used, their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.

val Array containing elements of non-zero blocks of the matrix A. Its length is
equal to the number of non-zero blocks in the matrix A multiplied by lb*lb.
Refer to the values array description in BSR Format for more details.

indx For one-based indexing, array containing the column indices plus one for
each non-zero block in the matrix A.
For zero-based indexing, array containing the column indices for each non-
zero block in the matrix A.
Its length is equal to the number of non-zero blocks in the matrix A. Refer
to the columns array description in BSR Format for more details.

pntrb Array of length m.

This array contains row indices, such that pntrb[I] - pntrb[0] is the
first index of block row I in the array indx.

Refer to pointerB array description in BSR Format for more details.

pntre Array of length m.

This array contains row indices, such that pntre[I] - pntrb[0] - 1 is the last index of block row I in the array indx.

Refer to pointerE array description in BSR Format for more details.

b Array, size ldb by at least n for non-transposed matrix A and at least m for
transposed for one-based indexing, and (at least k for non-transposed
matrix A and at least m for transposed, ldb) for zero-based indexing.

On entry with transa = 'N' or 'n', the leading k-by-n block part of the
array b must contain the matrix B, otherwise the leading m-by-n block part
of the array b must contain the matrix B.

ldb Specifies the leading dimension (in blocks) of b as declared in the calling
(sub)program.


beta Specifies the scalar beta.

c Array, size ldc* n for one-based indexing, size k* ldc for zero-based
indexing.
On entry, the leading m-by-n block part of the array c must contain the
matrix C, otherwise the leading k-by-n block part of the array c must
contain the matrix C.

ldc Specifies the leading dimension (in blocks) of c as declared in the calling
(sub)program.

Output Parameters

c Overwritten by the matrix (alpha*A*B + beta*C) or (alpha*AT*B + beta*C) or (alpha*AH*B + beta*C).

mkl_?cscmm
Computes matrix-matrix product of a sparse matrix
stored in the CSC format.

Syntax
void mkl_scscmm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const float *alpha , const char *matdescra , const float *val , const
MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const float *b , const
MKL_INT *ldb , const float *beta , float *c , const MKL_INT *ldc );
void mkl_dcscmm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const double *alpha , const char *matdescra , const double *val , const
MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const double *b , const
MKL_INT *ldb , const double *beta , double *c , const MKL_INT *ldc );
void mkl_ccscmm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8
*val , const MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const
MKL_Complex8 *b , const MKL_INT *ldb , const MKL_Complex8 *beta , MKL_Complex8 *c ,
const MKL_INT *ldc );
void mkl_zcscmm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16
*val , const MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const
MKL_Complex16 *b , const MKL_INT *ldb , const MKL_Complex16 *beta , MKL_Complex16 *c ,
const MKL_INT *ldc );

Include Files
• mkl.h

Description

The mkl_?cscmm routine performs a matrix-matrix operation defined as

C := alpha*A*B + beta*C

or

C := alpha*AT*B + beta*C,

or

C := alpha*AH*B + beta*C,

where:
alpha and beta are scalars,
B and C are dense matrices, A is an m-by-k sparse matrix in compressed sparse column (CSC) format, AT is
the transpose of A, and AH is the conjugate transpose of A.

NOTE
This routine supports CSC format both with one-based indexing and zero-based indexing.

Input Parameters

transa Specifies the operation.


If transa = 'N' or 'n', then C := alpha*A* B + beta*C

If transa = 'T' or 't', then C := alpha*AT*B + beta*C,

If transa ='C' or 'c', then C := alpha*AH*B + beta*C

m Number of rows of the matrix A.

n Number of columns of the matrix C.

k Number of columns of the matrix A.

alpha Specifies the scalar alpha.

matdescra Array of six elements, specifies properties of the matrix used for operation.
Only first four array elements are used, their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.

val Array containing non-zero elements of the matrix A.

Its length is pntre[k-1] - pntrb[0].

Refer to values array description in CSC Format for more details.

indx For one-based indexing, array containing the row indices plus one for each
non-zero element of the matrix A.
For zero-based indexing, array containing the row indices for each non-zero
element of the matrix A.
Its length is equal to length of the val array.

Refer to rows array description in CSC Format for more details.

pntrb Array of length k.

This array contains column indices, such that pntrb[i] - pntrb[0] is the
first index of column i in the arrays val and indx.

Refer to pointerb array description in CSC Format for more details.

pntre Array of length k.


This array contains column indices, such that pntre[i] - pntrb[0] - 1 is the last index of column i in the arrays val and indx.

Refer to pointerE array description in CSC Format for more details.

b Array, size ldb by at least n for non-transposed matrix A and at least m for
transposed for one-based indexing, and (at least k for non-transposed
matrix A and at least m for transposed, ldb) for zero-based indexing.

On entry with transa = 'N' or 'n', the leading k-by-n part of the array b
must contain the matrix B, otherwise the leading m-by-n part of the array b
must contain the matrix B.

ldb Specifies the leading dimension of b for one-based indexing, and the second
dimension of b for zero-based indexing, as declared in the calling
(sub)program.

beta Specifies the scalar beta.

c Array, size ldc by n for one-based indexing, and (m, ldc) for zero-based
indexing.
On entry, the leading m-by-n part of the array c must contain the matrix C,
otherwise the leading k-by-n part of the array c must contain the matrix C.

ldc Specifies the leading dimension of c for one-based indexing, and the second
dimension of c for zero-based indexing, as declared in the calling
(sub)program.

Output Parameters

c Overwritten by the matrix (alpha*A*B + beta*C) or (alpha*AT*B + beta*C) or (alpha*AH*B + beta*C).
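
For illustration (not from the original reference), the analogous product C := A*B with mkl_dcscmm and zero-based CSC storage; the dense arrays b and c are again treated as row-major (k, ldb) and (m, ldc) arrays, and all values are invented for the example.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* General 3-by-4 matrix in zero-based CSC format (illustrative values):
           | 1 0 2 0 |
       A = | 0 3 0 4 |
           | 5 0 6 0 |                                                    */
    MKL_INT m = 3, n = 2, k = 4;
    double  val[]      = {1.0, 5.0, 3.0, 2.0, 6.0, 4.0};  /* stored by column */
    MKL_INT indx[]     = {0, 2, 1, 0, 2, 1};              /* row indices      */
    MKL_INT colIndex[] = {0, 2, 3, 5, 6};
    MKL_INT ldb = 2, ldc = 2;
    double  b[8] = {1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0};
    double  c[6] = {0.0};
    double  alpha = 1.0, beta = 0.0;
    char    matdescra[6] = {'G', ' ', ' ', 'C', ' ', ' '};

    mkl_dcscmm("N", &m, &n, &k, &alpha, matdescra, val, indx,
               colIndex, colIndex + 1, b, &ldb, &beta, c, &ldc);

    for (int i = 0; i < m; i++)
        printf("c[%d][*] = %f %f\n", i, c[i*ldc], c[i*ldc + 1]);
    return 0;
}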

mkl_?coomm
Computes matrix-matrix product of a sparse matrix
stored in the coordinate format.

Syntax
void mkl_scoomm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const float *alpha , const char *matdescra , const float *val , const
MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const float *b , const
MKL_INT *ldb , const float *beta , float *c , const MKL_INT *ldc );
void mkl_dcoomm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const double *alpha , const char *matdescra , const double *val , const
MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const double *b , const
MKL_INT *ldb , const double *beta , double *c , const MKL_INT *ldc );
void mkl_ccoomm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8
*val , const MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const
MKL_Complex8 *b , const MKL_INT *ldb , const MKL_Complex8 *beta , MKL_Complex8 *c ,
const MKL_INT *ldc );

void mkl_zcoomm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16
*val , const MKL_INT *rowind , const MKL_INT *colind , const MKL_INT *nnz , const
MKL_Complex16 *b , const MKL_INT *ldb , const MKL_Complex16 *beta , MKL_Complex16 *c ,
const MKL_INT *ldc );

Include Files
• mkl.h

Description

The mkl_?coomm routine performs a matrix-matrix operation defined as

C := alpha*A*B + beta*C

or

C := alpha*AT*B + beta*C,

or

C := alpha*AH*B + beta*C,

where:
alpha and beta are scalars,
B and C are dense matrices, A is an m-by-k sparse matrix in the coordinate format, AT is the transpose of A,
and AH is the conjugate transpose of A.

NOTE
This routine supports a coordinate format both with one-based indexing and zero-based indexing.

Input Parameters

transa Specifies the operation.


If transa = 'N' or 'n', then C := alpha*A*B + beta*C

If transa = 'T' or 't', then C := alpha*AT*B + beta*C,

If transa = 'C' or 'c', then C := alpha*AH*B + beta*C.

m Number of rows of the matrix A.

n Number of columns of the matrix C.

k Number of columns of the matrix A.

alpha Specifies the scalar alpha.

matdescra Array of six elements, specifies properties of the matrix used for operation.
Only first four array elements are used, their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.

val Array of length nnz, contains the non-zero elements of the matrix A in arbitrary order.


Refer to values array description in Coordinate Format for more details.

rowind Array of length nnz.

For one-based indexing, contains the row indices plus one for each non-zero
element of the matrix A.
For zero-based indexing, contains the row indices for each non-zero
element of the matrix A.
Refer to rows array description in Coordinate Format for more details.

colind Array of length nnz.

For one-based indexing, contains the column indices plus one for each non-
zero element of the matrix A.
For zero-based indexing, contains the column indices for each non-zero
element of the matrix A.
Refer to columns array description in Coordinate Format for more details.

nnz Specifies the number of non-zero elements of the matrix A.

Refer to nnz description in Coordinate Format for more details.

b Array, size ldb by at least n for non-transposed matrix A and at least m for
transposed for one-based indexing, and (at least k for non-transposed
matrix A and at least m for transposed, ldb) for zero-based indexing.

On entry with transa = 'N' or 'n', the leading k-by-n part of the array b
must contain the matrix B, otherwise the leading m-by-n part of the array b
must contain the matrix B.

ldb Specifies the leading dimension of b for one-based indexing, and the second
dimension of b for zero-based indexing, as declared in the calling
(sub)program.

beta Specifies the scalar beta.

c Array, size ldc by n for one-based indexing, and (m, ldc) for zero-based
indexing.
On entry, the leading m-by-n part of the array c must contain the matrix C,
otherwise the leading k-by-n part of the array c must contain the matrix C.

ldc Specifies the leading dimension of c for one-based indexing, and the second
dimension of c for zero-based indexing, as declared in the calling
(sub)program.

Output Parameters

c Overwritten by the matrix (alpha*A*B + beta*C), (alpha*AT*B + beta*C), or (alpha*AH*B + beta*C).
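
As an illustrative sketch (not part of the original reference), the same product computed with mkl_dcoomm and zero-based coordinate storage; values and descriptor characters are invented for the example.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* General 3-by-4 matrix in zero-based coordinate format:
           | 1 0 2 0 |
       A = | 0 3 0 4 |
           | 5 0 6 0 |                                                    */
    MKL_INT m = 3, n = 2, k = 4, nnz = 6;
    double  val[]    = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0};
    MKL_INT rowind[] = {0, 0, 1, 1, 2, 2};
    MKL_INT colind[] = {0, 2, 1, 3, 0, 2};
    MKL_INT ldb = 2, ldc = 2;
    double  b[8] = {1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0}; /* k-by-n, row-major */
    double  c[6] = {0.0};                                    /* m-by-n, row-major */
    double  alpha = 1.0, beta = 0.0;
    char    matdescra[6] = {'G', ' ', ' ', 'C', ' ', ' '};

    mkl_dcoomm("N", &m, &n, &k, &alpha, matdescra, val, rowind, colind, &nnz,
               b, &ldb, &beta, c, &ldc);

    for (int i = 0; i < m; i++)
        printf("c[%d][*] = %f %f\n", i, c[i*ldc], c[i*ldc + 1]);
    return 0;
}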

mkl_?csrsm
Solves a system of linear matrix equations for a
sparse matrix in the CSR format.

Syntax
void mkl_scsrsm (const char *transa , const MKL_INT *m , const MKL_INT *n , const float
*alpha , const char *matdescra , const float *val , const MKL_INT *indx , const MKL_INT
*pntrb , const MKL_INT *pntre , const float *b , const MKL_INT *ldb , float *c , const
MKL_INT *ldc );
void mkl_dcsrsm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
double *alpha , const char *matdescra , const double *val , const MKL_INT *indx , const
MKL_INT *pntrb , const MKL_INT *pntre , const double *b , const MKL_INT *ldb , double
*c , const MKL_INT *ldc );
void mkl_ccsrsm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8 *val , const MKL_INT
*indx , const MKL_INT *pntrb , const MKL_INT *pntre , const MKL_Complex8 *b , const
MKL_INT *ldb , MKL_Complex8 *c , const MKL_INT *ldc );

void mkl_zcsrsm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16 *val , const MKL_INT
*indx , const MKL_INT *pntrb , const MKL_INT *pntre , const MKL_Complex16 *b , const
MKL_INT *ldb , MKL_Complex16 *c , const MKL_INT *ldc );

Include Files
• mkl.h

Description

The mkl_?csrsm routine solves a system of linear equations with matrix-matrix operations for a sparse
matrix in the CSR format:

C := alpha*inv(A)*B

or

C := alpha*inv(AT)*B,

where:
alpha is scalar, B and C are dense matrices, A is a sparse upper or lower triangular matrix with unit or non-
unit main diagonal, AT is the transpose of A.

NOTE
This routine supports a CSR format both with one-based indexing and zero-based indexing.

Input Parameters

transa Specifies the system of linear equations.


If transa = 'N' or 'n', then C := alpha*inv(A)*B

If transa = 'T' or 't' or 'C' or 'c', then C := alpha*inv(AT)*B,

m Number of columns of the matrix A.

n Number of columns of the matrix C.

alpha Specifies the scalar alpha.


matdescra Array of six elements, specifies properties of the matrix used for operation.
Only first four array elements are used, their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.

val Array containing non-zero elements of the matrix A.


For zero-based indexing its length is pntre[m-1] - pntrb[0].

Refer to values array description in CSR Format for more details.

NOTE
The non-zero elements of the given row of the matrix must be stored
in the same order as they appear in the row (from left to right).
No diagonal element can be omitted from a sparse storage if the solver
is called with the non-unit indicator.

indx For one-based indexing, array containing the column indices plus one for
each non-zero element of the matrix A.
For zero-based indexing, array containing the column indices for each non-
zero element of the matrix A.
Its length is equal to length of the val array.

Refer to columns array description in CSR Format for more details.

NOTE
Column indices must be sorted in increasing order for each row.

pntrb Array of length m.

This array contains row indices, such that pntrb[i] - pntrb[0] is the
first index of row i in the arrays val and indx.

Refer to pointerb array description in CSR Format for more details.

pntre Array of length m.

For zero-based indexing this array contains row indices, such that
pntre[i] - pntrb[0] - 1 is the last index of row i in the arrays val and
indx.
Refer to pointerE array description in CSR Format for more details.

b Array, size ldb* n for one-based indexing, and (m, ldb) for zero-based
indexing.
On entry the leading m-by-n part of the array b must contain the matrix B.

ldb Specifies the leading dimension of b for one-based indexing, and the second
dimension of b for zero-based indexing, as declared in the calling
(sub)program.

ldc Specifies the leading dimension of c for one-based indexing, and the second
dimension of c for zero-based indexing, as declared in the calling
(sub)program.

Output Parameters

c Array, size ldc by n for one-based indexing, and (m, ldc) for zero-based
indexing.
The leading m-by-n part of the array c contains the output matrix C.
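
For illustration (not from the original reference), a minimal sketch that computes C := alpha*inv(A)*B, a lower triangular solve with two right-hand sides, using mkl_dcsrsm and zero-based CSR storage. With zero-based indexing the dense arrays b and c are taken as row-major (m, ldb) and (m, ldc) arrays; all values and descriptor characters are invented for the example.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* Lower triangular 3-by-3 matrix in zero-based CSR format:
             | 2 0 0 |
         A = | 1 3 0 |
             | 0 2 4 |                                                    */
    MKL_INT m = 3, n = 2;
    double  val[]      = {2.0, 1.0, 3.0, 2.0, 4.0};
    MKL_INT indx[]     = {0, 0, 1, 1, 2};
    MKL_INT rowIndex[] = {0, 1, 3, 5};
    MKL_INT ldb = 2, ldc = 2;
    double  b[6] = {2.0, 4.0, 5.0, 6.0, 10.0, 8.0};  /* m-by-n right-hand sides */
    double  c[6];                                    /* receives alpha*inv(A)*B */
    double  alpha = 1.0;
    char    matdescra[6] = {'T', 'L', 'N', 'C', ' ', ' '};

    mkl_dcsrsm("N", &m, &n, &alpha, matdescra, val, indx,
               rowIndex, rowIndex + 1, b, &ldb, c, &ldc);

    for (int i = 0; i < m; i++)
        printf("c[%d][*] = %f %f\n", i, c[i*ldc], c[i*ldc + 1]);
    return 0;
}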

mkl_?cscsm
Solves a system of linear matrix equations for a
sparse matrix in the CSC format.

Syntax
void mkl_scscsm (const char *transa , const MKL_INT *m , const MKL_INT *n , const float
*alpha , const char *matdescra , const float *val , const MKL_INT *indx , const MKL_INT
*pntrb , const MKL_INT *pntre , const float *b , const MKL_INT *ldb , float *c , const
MKL_INT *ldc );
void mkl_dcscsm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
double *alpha , const char *matdescra , const double *val , const MKL_INT *indx , const
MKL_INT *pntrb , const MKL_INT *pntre , const double *b , const MKL_INT *ldb , double
*c , const MKL_INT *ldc );
void mkl_ccscsm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8 *val , const MKL_INT
*indx , const MKL_INT *pntrb , const MKL_INT *pntre , const MKL_Complex8 *b , const
MKL_INT *ldb , MKL_Complex8 *c , const MKL_INT *ldc );

void mkl_zcscsm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16 *val , const MKL_INT
*indx , const MKL_INT *pntrb , const MKL_INT *pntre , const MKL_Complex16 *b , const
MKL_INT *ldb , MKL_Complex16 *c , const MKL_INT *ldc );

Include Files
• mkl.h

Description

The mkl_?cscsm routine solves a system of linear equations with matrix-matrix operations for a sparse
matrix in the CSC format:

C := alpha*inv(A)*B

or

C := alpha*inv(AT)*B,

where:
alpha is scalar, B and C are dense matrices, A is a sparse upper or lower triangular matrix with unit or non-
unit main diagonal, AT is the transpose of A.


NOTE
This routine supports a CSC format both with one-based indexing and zero-based indexing.

Input Parameters

transa Specifies the system of equations.


If transa = 'N' or 'n', then C := alpha*inv(A)*B

If transa = 'T' or 't' or 'C' or 'c', then C := alpha*inv(AT)*B,

m Number of columns of the matrix A.

n Number of columns of the matrix C.

alpha Specifies the scalar alpha.

matdescra Array of six elements, specifies properties of the matrix used for operation.
Only first four array elements are used, their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.

val Array containing non-zero elements of the matrix A.


For zero-based indexing its length is pntre[m-1] - pntrb[0].

Refer to values array description in CSC Format for more details.

NOTE
The non-zero elements of the given row of the matrix must be stored
in the same order as they appear in the row (from left to right).
No diagonal element can be omitted from a sparse storage if the solver
is called with the non-unit indicator.

indx For one-based indexing, array containing the row indices plus one for each
non-zero element of the matrix A. For zero-based indexing, array containing
the row indices for each non-zero element of the matrix A.
Refer to rows array description in CSC Format for more details.

NOTE
Row indices must be sorted in increasing order for each column.

pntrb Array of length m.

This array contains column indices, such that pntrb[I] - pntrb[0] is the
first index of column I in the arrays val and indx.

Refer to pointerb array description in CSC Format for more details.

pntre Array of length m.

This array contains column indices, such that pntre[I] - pntrb[1] - 1 is the last index of column I in the arrays val and indx.

Refer to pointerE array description in CSC Format for more details.

b Array, size ldb by n for one-based indexing, and (m, ldb) for zero-based
indexing.
On entry the leading m-by-n part of the array b must contain the matrix B.

ldb Specifies the leading dimension of b for one-based indexing, and the second
dimension of b for zero-based indexing, as declared in the calling
(sub)program.

ldc Specifies the leading dimension of c for one-based indexing, and the second
dimension of c for zero-based indexing, as declared in the calling
(sub)program.

Output Parameters

c Array, size ldc by n for one-based indexing, and (m, ldc) for zero-based
indexing.
The leading m-by-n part of the array c contains the output matrix C.
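
For illustration, a minimal sketch of a call to mkl_dcscsm follows. It solves A*C = B for the 2-by-2 lower triangular matrix A = [2 0; 1 3] stored in one-based CSC format; the matrix data and the expected result are illustrative values only, and B and C are held column-major because one-based indexing is used.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* Lower triangular A = [2 0; 1 3] in one-based CSC format */
    double  val[]   = {2.0, 1.0, 3.0};
    MKL_INT indx[]  = {1, 2, 2};          /* row indices (one-based) */
    MKL_INT pntrb[] = {1, 3};             /* index of the first element of each column */
    MKL_INT pntre[] = {3, 4};             /* one past the last element of each column */
    MKL_INT m = 2, n = 2, ldb = 2, ldc = 2;
    double  alpha = 1.0;
    double  b[] = {2.0, 5.0, 4.0, 7.0};   /* B = [2 4; 5 7], column-major */
    double  c[4];
    char    transa = 'N';
    char    matdescra[6] = {'T', 'L', 'N', 'F'}; /* triangular, lower, non-unit, one-based */

    mkl_dcscsm(&transa, &m, &n, &alpha, matdescra, val, indx, pntrb, pntre, b, &ldb, c, &ldc);
    /* expected C = [1 2; 4/3 5/3] */
    printf("C = [%g %g; %g %g]\n", c[0], c[2], c[1], c[3]);
    return 0;
}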

mkl_?coosm
Solves a system of linear matrix equations for a
sparse matrix in the coordinate format.

Syntax
void mkl_scoosm (const char *transa , const MKL_INT *m , const MKL_INT *n , const float
*alpha , const char *matdescra , const float *val , const MKL_INT *rowind , const
MKL_INT *colind , const MKL_INT *nnz , const float *b , const MKL_INT *ldb , float
*c , const MKL_INT *ldc );
void mkl_dcoosm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
double *alpha , const char *matdescra , const double *val , const MKL_INT *rowind ,
const MKL_INT *colind , const MKL_INT *nnz , const double *b , const MKL_INT *ldb ,
double *c , const MKL_INT *ldc );
void mkl_ccoosm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8 *val , const MKL_INT
*rowind , const MKL_INT *colind , const MKL_INT *nnz , const MKL_Complex8 *b , const
MKL_INT *ldb , MKL_Complex8 *c , const MKL_INT *ldc );
void mkl_zcoosm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16 *val , const MKL_INT
*rowind , const MKL_INT *colind , const MKL_INT *nnz , const MKL_Complex16 *b , const
MKL_INT *ldb , MKL_Complex16 *c , const MKL_INT *ldc );

Include Files
• mkl.h

Description

The mkl_?coosm routine solves a system of linear equations with matrix-matrix operations for a sparse
matrix in the coordinate format:

C := alpha*inv(A)*B

or

C := alpha*inv(AT)*B,

where:
alpha is scalar, B and C are dense matrices, A is a sparse upper or lower triangular matrix with unit or non-
unit main diagonal, AT is the transpose of A.

NOTE
This routine supports a coordinate format both with one-based indexing and zero-based indexing.

Input Parameters

transa Specifies the system of linear equations.


If transa = 'N' or 'n', then the matrix-matrix product is computed as
C := alpha*inv(A)*B
If transa = 'T' or 't' or 'C' or 'c', then the matrix-matrix product is
computed as C := alpha*inv(AT)*B,

m Number of rows of the matrix A.

n Number of columns of the matrix C.

alpha Specifies the scalar alpha.

matdescra Array of six elements, specifies properties of the matrix used for operation.
Only first four array elements are used, their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.

val Array of length nnz, contains non-zero elements of the matrix A in the
arbitrary order.
Refer to values array description in Coordinate Format for more details.

rowind Array of length nnz.

For one-based indexing, contains the row indices plus one for each non-zero
element of the matrix A.
For zero-based indexing, contains the row indices for each non-zero
element of the matrix A.
Refer to rows array description in Coordinate Format for more details.

colind Array of length nnz.

For one-based indexing, contains the column indices plus one for each
non-zero element of the matrix A.
For zero-based indexing, contains the column indices for each non-zero
element of the matrix A.
Refer to columns array description in Coordinate Format for more details.

nnz Specifies the number of non-zero elements of the matrix A.

Refer to nnz description in Coordinate Format for more details.

b Array, size ldb by n for one-based indexing, and (m, ldb) for zero-based
indexing.
Before entry the leading m-by-n part of the array b must contain the matrix
B.

ldb Specifies the leading dimension of b for one-based indexing, and the second
dimension of b for zero-based indexing, as declared in the calling
(sub)program.

ldc Specifies the leading dimension of c for one-based indexing, and the second
dimension of c for zero-based indexing, as declared in the calling
(sub)program.

Output Parameters

c Array, size ldc by n for one-based indexing, and (m, ldc) for zero-based
indexing.
The leading m-by-n part of the array c contains the output matrix C.
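
As an illustration, the sketch below solves A*c = b with mkl_dcoosm for the same kind of 2-by-2 lower triangular matrix, this time stored in one-based coordinate format; the data values are illustrative and chosen so the result is easy to check by hand.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* Lower triangular A = [2 0; 1 3] in one-based coordinate (COO) format */
    double  val[]    = {2.0, 1.0, 3.0};
    MKL_INT rowind[] = {1, 2, 2};
    MKL_INT colind[] = {1, 1, 2};
    MKL_INT nnz = 3, m = 2, n = 1, ldb = 2, ldc = 2;
    double  alpha = 1.0;
    double  b[] = {2.0, 5.0};             /* single right-hand side */
    double  c[2];
    char    transa = 'N';
    char    matdescra[6] = {'T', 'L', 'N', 'F'};

    mkl_dcoosm(&transa, &m, &n, &alpha, matdescra, val, rowind, colind, &nnz, b, &ldb, c, &ldc);
    printf("c = [%g %g]\n", c[0], c[1]); /* expected [1 1.3333] */
    return 0;
}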

mkl_?bsrsm
Solves a system of linear matrix equations for a
sparse matrix in the BSR format.

Syntax
void mkl_sbsrsm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *lb , const float *alpha , const char *matdescra , const float *val , const
MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const float *b , const
MKL_INT *ldb , float *c , const MKL_INT *ldc );
void mkl_dbsrsm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *lb , const double *alpha , const char *matdescra , const double *val , const
MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const double *b , const
MKL_INT *ldb , double *c , const MKL_INT *ldc );
void mkl_cbsrsm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *lb , const MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8
*val , const MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const
MKL_Complex8 *b , const MKL_INT *ldb , MKL_Complex8 *c , const MKL_INT *ldc );
void mkl_zbsrsm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *lb , const MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16
*val , const MKL_INT *indx , const MKL_INT *pntrb , const MKL_INT *pntre , const
MKL_Complex16 *b , const MKL_INT *ldb , MKL_Complex16 *c , const MKL_INT *ldc );

Include Files
• mkl.h

Description

The mkl_?bsrsm routine solves a system of linear equations with matrix-matrix operations for a sparse
matrix in the BSR format:

C := alpha*inv(A)*B

or

C := alpha*inv(AT)*B,

where:
alpha is scalar, B and C are dense matrices, A is a sparse upper or lower triangular matrix with unit or non-
unit main diagonal, AT is the transpose of A.

NOTE
This routine supports a BSR format both with one-based indexing and zero-based indexing.

Input Parameters

transa Specifies the operation.


If transa = 'N' or 'n', then the matrix-matrix product is computed as
C := alpha*inv(A)*B.
If transa = 'T' or 't' or 'C' or 'c', then the matrix-matrix product is
computed as C := alpha*inv(AT)*B.

m Number of block columns of the matrix A.

n Number of columns of the matrix C.

lb Size of the block in the matrix A.

alpha Specifies the scalar alpha.

matdescra Array of six elements, specifies properties of the matrix used for operation.
Only first four array elements are used, their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.

val Array containing elements of non-zero blocks of the matrix A. Its length is
equal to the number of non-zero blocks in the matrix A multiplied by lb*lb.
Refer to the values array description in BSR Format for more details.

NOTE
The non-zero elements of the given row of the matrix must be stored
in the same order as they appear in the row (from left to right).
No diagonal element can be omitted from a sparse storage if the solver
is called with the non-unit indicator.

indx For one-based indexing, array containing the column indices plus one for
each non-zero block of the matrix A. For zero-based indexing, array
containing the column indices for each non-zero block of the matrix A.
Its length is equal to the number of non-zero blocks in the matrix A.

Refer to the columns array description in BSR Format for more details.

pntrb Array of length m.

This array contains row indices, such that pntrb[i] - pntrb[0] is the
first index of block row i in the array indx.

Refer to pointerB array description in BSR Format for more details.

pntre Array of length m.

This array contains row indices, such that pntre[i] - pntrb[0] - 1 is
the last index of block row i in the arrays val and indx.

Refer to pointerE array description in BSR Format for more details.

b Array, size ldb* n for one-based indexing, size m* ldb for zero-based
indexing.
On entry the leading m-by-n part of the array b must contain the matrix B.

ldb Specifies the leading dimension (in blocks) of b as declared in the calling
(sub)program.

ldc Specifies the leading dimension (in blocks) of c as declared in the calling
(sub)program.

Output Parameters

c Array, size ldc* n for one-based indexing, size m* ldc for zero-based
indexing.
The leading m-by-n part of the array c contains the output matrix C.

mkl_?diamv
Computes matrix - vector product for a sparse matrix
in the diagonal format with one-based indexing.

Syntax
void mkl_sdiamv (const char *transa , const MKL_INT *m , const MKL_INT *k , const float
*alpha , const char *matdescra , const float *val , const MKL_INT *lval , const MKL_INT
*idiag , const MKL_INT *ndiag , const float *x , const float *beta , float *y );
void mkl_ddiamv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
double *alpha , const char *matdescra , const double *val , const MKL_INT *lval , const
MKL_INT *idiag , const MKL_INT *ndiag , const double *x , const double *beta , double
*y );
void mkl_cdiamv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8 *val , const MKL_INT
*lval , const MKL_INT *idiag , const MKL_INT *ndiag , const MKL_Complex8 *x , const
MKL_Complex8 *beta , MKL_Complex8 *y );
void mkl_zdiamv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16 *val , const MKL_INT
*lval , const MKL_INT *idiag , const MKL_INT *ndiag , const MKL_Complex16 *x , const
MKL_Complex16 *beta , MKL_Complex16 *y );

Include Files
• mkl.h

Description

The mkl_?diamv routine performs a matrix-vector operation defined as

y := alpha*A*x + beta*y

or

y := alpha*AT*x + beta*y,

where:
alpha and beta are scalars,
x and y are vectors,
A is an m-by-k sparse matrix stored in the diagonal format, AT is the transpose of A.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

transa Specifies the operation.


If transa = 'N' or 'n', then y := alpha*A*x + beta*y,

If transa = 'T' or 't' or 'C' or 'c', then y := alpha*AT*x + beta*y.

m Number of rows of the matrix A.

k Number of columns of the matrix A.

alpha Specifies the scalar alpha.

matdescra Array of six elements, specifies properties of the matrix used for operation.
Only first four array elements are used, their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.

val Two-dimensional array of size lval by ndiag, contains the non-zero diagonals
of the matrix A. Refer to values array description in Diagonal Storage
Scheme for more details.

lval Leading dimension of val, lval≥m. Refer to lval description in Diagonal
Storage Scheme for more details.

idiag Array of length ndiag, contains the distances between the main diagonal and
each non-zero diagonal in the matrix A.
Refer to distance array description in Diagonal Storage Scheme for more
details.

ndiag Specifies the number of non-zero diagonals of the matrix A.

x Array, size at least k if transa = 'N' or 'n', and at least m otherwise. On
entry, the array x must contain the vector x.

beta Specifies the scalar beta.

y Array, size at least m if transa = 'N' or 'n', and at least k otherwise. On
entry, the array y must contain the vector y.

Output Parameters

y Overwritten by the updated vector y.
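
The sketch below is an illustrative call that multiplies the 2-by-2 upper bidiagonal matrix A = [1 2; 0 3] by a vector with mkl_ddiamv; the val array assumes the padding convention of the Diagonal Storage Scheme, in which each stored element keeps the row number of the original matrix.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* A = [1 2; 0 3] in the diagonal format: one column of val per stored diagonal */
    double  val[]   = {1.0, 3.0,    /* distance  0 (main diagonal)               */
                       2.0, 0.0};   /* distance +1 (superdiagonal, padded below) */
    MKL_INT idiag[] = {0, 1};
    MKL_INT m = 2, k = 2, lval = 2, ndiag = 2;
    double  alpha = 1.0, beta = 0.0;
    double  x[] = {1.0, 1.0};
    double  y[2];
    char    transa = 'N';
    char    matdescra[6] = {'G', 'L', 'N', 'F'}; /* general matrix, one-based indexing */

    mkl_ddiamv(&transa, &m, &k, &alpha, matdescra, val, &lval, idiag, &ndiag, x, &beta, y);
    printf("y = [%g %g]\n", y[0], y[1]); /* expected [3 3] */
    return 0;
}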

mkl_?skymv
Computes matrix - vector product for a sparse matrix
in the skyline storage format with one-based indexing.

Syntax
void mkl_sskymv (const char *transa , const MKL_INT *m , const MKL_INT *k , const float
*alpha , const char *matdescra , const float *val , const MKL_INT *pntr , const float
*x , const float *beta , float *y );
void mkl_dskymv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
double *alpha , const char *matdescra , const double *val , const MKL_INT *pntr , const
double *x , const double *beta , double *y );
void mkl_cskymv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8 *val , const MKL_INT
*pntr , const MKL_Complex8 *x , const MKL_Complex8 *beta , MKL_Complex8 *y );
void mkl_zskymv (const char *transa , const MKL_INT *m , const MKL_INT *k , const
MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16 *val , const MKL_INT
*pntr , const MKL_Complex16 *x , const MKL_Complex16 *beta , MKL_Complex16 *y );

Include Files
• mkl.h

Description

The mkl_?skymv routine performs a matrix-vector operation defined as

y := alpha*A*x + beta*y

or

y := alpha*AT*x + beta*y,

where:
alpha and beta are scalars,
x and y are vectors,
A is an m-by-k sparse matrix stored using the skyline storage scheme, AT is the transpose of A.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

transa Specifies the operation.


If transa = 'N' or 'n', then y := alpha*A*x + beta*y

If transa = 'T' or 't' or 'C' or 'c', then y := alpha*AT*x + beta*y,

m Number of rows of the matrix A.

k Number of columns of the matrix A.

alpha Specifies the scalar alpha.

matdescra Array of six elements, specifies properties of the matrix used for operation.
Only first four array elements are used, their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.

NOTE
General matrices (matdescra[0]='G') are not supported.

val Array containing the set of elements of the matrix A in the skyline profile
form.
If matdescra[1]= 'L', then val contains elements from the lower triangle
of the matrix A.
If matdescra[1]= 'U', then val contains elements from the upper
triangle of the matrix A.
Refer to values array description in Skyline Storage Scheme for more
details.

pntr Array of length (m + 1) for lower triangle, and (k + 1) for upper triangle.

It contains the indices specifying the position in val of the first
element in each row (column) of the matrix A. Refer to pointers array
description in Skyline Storage Scheme for more details.

x Array, size at least k if transa = 'N' or 'n' and at least m otherwise. On
entry, the array x must contain the vector x.

beta Specifies the scalar beta.

y Array, size at least m if transa = 'N' or 'n' and at least k otherwise. On
entry, the array y must contain the vector y.

Output Parameters

y Overwritten by the updated vector y.
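
For illustration, the sketch below applies mkl_dskymv to the lower triangular matrix A = [2 0; 1 3] stored in the skyline profile form; each row holds the entries from its first non-zero up to the diagonal, and pntr records where each row starts in val. The data values are illustrative only.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* Lower triangular A = [2 0; 1 3] in skyline (profile) storage */
    double  val[]  = {2.0,          /* row 1: diagonal only   */
                      1.0, 3.0};    /* row 2: columns 1 and 2 */
    MKL_INT pntr[] = {1, 2, 4};     /* start of each row in val, plus the final bound */
    MKL_INT m = 2, k = 2;
    double  alpha = 1.0, beta = 0.0;
    double  x[] = {1.0, 1.0};
    double  y[2];
    char    transa = 'N';
    char    matdescra[6] = {'T', 'L', 'N', 'F'}; /* triangular, lower, non-unit, one-based */

    mkl_dskymv(&transa, &m, &k, &alpha, matdescra, val, pntr, x, &beta, y);
    printf("y = [%g %g]\n", y[0], y[1]); /* expected [2 4] */
    return 0;
}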

mkl_?diasv
Solves a system of linear equations for a sparse
matrix in the diagonal format with one-based
indexing.

Syntax
void mkl_sdiasv (const char *transa , const MKL_INT *m , const float *alpha , const
char *matdescra , const float *val , const MKL_INT *lval , const MKL_INT *idiag , const
MKL_INT *ndiag , const float *x , float *y );
void mkl_ddiasv (const char *transa , const MKL_INT *m , const double *alpha , const
char *matdescra , const double *val , const MKL_INT *lval , const MKL_INT *idiag ,
const MKL_INT *ndiag , const double *x , double *y );
void mkl_cdiasv (const char *transa , const MKL_INT *m , const MKL_Complex8 *alpha ,
const char *matdescra , const MKL_Complex8 *val , const MKL_INT *lval , const MKL_INT
*idiag , const MKL_INT *ndiag , const MKL_Complex8 *x , MKL_Complex8 *y );
void mkl_zdiasv (const char *transa , const MKL_INT *m , const MKL_Complex16 *alpha ,
const char *matdescra , const MKL_Complex16 *val , const MKL_INT *lval , const MKL_INT
*idiag , const MKL_INT *ndiag , const MKL_Complex16 *x , MKL_Complex16 *y );

Include Files
• mkl.h

Description

The mkl_?diasv routine solves a system of linear equations with matrix-vector operations for a sparse
matrix stored in the diagonal format:

y := alpha*inv(A)*x

or

y := alpha*inv(AT)* x,

where:
alpha is scalar, x and y are vectors, A is a sparse upper or lower triangular matrix with unit or non-unit main
diagonal, AT is the transpose of A.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

transa Specifies the system of linear equations.


If transa = 'N' or 'n', then y := alpha*inv(A)*x

If transa = 'T' or 't' or 'C' or 'c', then y := alpha*inv(AT)*x,

m Number of rows of the matrix A.

alpha Specifies the scalar alpha.

matdescra Array of six elements, specifies properties of the matrix used for operation.
Only first four array elements are used, their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.

val Two-dimensional array of size lval by ndiag, contains the non-zero diagonals
of the matrix A. Refer to values array description in Diagonal Storage
Scheme for more details.

lval Leading dimension of val, lval≥m. Refer to lval description in Diagonal
Storage Scheme for more details.

idiag Array of length ndiag, contains the distances between the main diagonal and
each non-zero diagonal in the matrix A.

NOTE
All elements of this array must be sorted in increasing order.

Refer to distance array description in Diagonal Storage Scheme for more
details.

ndiag Specifies the number of non-zero diagonals of the matrix A.

x Array, size at least m.

On entry, the array x must contain the vector x. The elements are accessed
with unit increment.

y Array, size at least m.

On entry, the array y must contain the vector y. The elements are accessed
with unit increment.

Output Parameters

y Contains the solution vector.
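
For illustration, the sketch below solves A*y = x with mkl_ddiasv for the lower bidiagonal matrix A = [2 0; 1 3]; note that idiag is sorted in increasing order and the subdiagonal column of val is padded from the top, as the Diagonal Storage Scheme prescribes. The matrix data are illustrative only.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* Lower bidiagonal A = [2 0; 1 3] in the diagonal format */
    double  val[]   = {0.0, 1.0,    /* distance -1 (subdiagonal, padded above) */
                       2.0, 3.0};   /* distance  0 (main diagonal)             */
    MKL_INT idiag[] = {-1, 0};      /* sorted in increasing order              */
    MKL_INT m = 2, lval = 2, ndiag = 2;
    double  alpha = 1.0;
    double  x[] = {2.0, 5.0};
    double  y[2];
    char    transa = 'N';
    char    matdescra[6] = {'T', 'L', 'N', 'F'};

    mkl_ddiasv(&transa, &m, &alpha, matdescra, val, &lval, idiag, &ndiag, x, y);
    printf("y = [%g %g]\n", y[0], y[1]); /* expected [1 1.3333] */
    return 0;
}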

mkl_?skysv
Solves a system of linear equations for a sparse
matrix in the skyline format with one-based indexing.

Syntax
void mkl_sskysv (const char *transa , const MKL_INT *m , const float *alpha , const
char *matdescra , const float *val , const MKL_INT *pntr , const float *x , float *y );
void mkl_dskysv (const char *transa , const MKL_INT *m , const double *alpha , const
char *matdescra , const double *val , const MKL_INT *pntr , const double *x , double
*y );
void mkl_cskysv (const char *transa , const MKL_INT *m , const MKL_Complex8 *alpha ,
const char *matdescra , const MKL_Complex8 *val , const MKL_INT *pntr , const
MKL_Complex8 *x , MKL_Complex8 *y );

void mkl_zskysv (const char *transa , const MKL_INT *m , const MKL_Complex16 *alpha ,
const char *matdescra , const MKL_Complex16 *val , const MKL_INT *pntr , const
MKL_Complex16 *x , MKL_Complex16 *y );

Include Files
• mkl.h

Description

The mkl_?skysv routine solves a system of linear equations with matrix-vector operations for a sparse
matrix in the skyline storage format:

y := alpha*inv(A)*x

or

y := alpha*inv(AT)*x,

where:
alpha is scalar, x and y are vectors, A is a sparse upper or lower triangular matrix with unit or non-unit main
diagonal, AT is the transpose of A.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

transa Specifies the system of linear equations.


If transa = 'N' or 'n', then y := alpha*inv(A)*x

If transa = 'T' or 't' or 'C' or 'c', then y := alpha*inv(AT)* x,

m Number of rows of the matrix A.

alpha Specifies the scalar alpha.

matdescra Array of six elements, specifies properties of the matrix used for operation.
Only first four array elements are used, their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.

NOTE
General matrices (matdescra[0]='G') are not supported.

val Array containing the set of elements of the matrix A in the skyline profile
form.
If matdescra[1]= 'L', then val contains elements from the lower triangle
of the matrix A.
If matdescra[1]= 'U', then val contains elements from the upper
triangle of the matrix A.

Refer to values array description in Skyline Storage Scheme for more
details.

pntr Array of length (m + 1).

It contains the indices specifying the position in val of the first
element in each row (for the lower triangle) or column (for the upper
triangle) of the matrix A. Refer to pointers array description in Skyline
Storage Scheme for more details.

x Array, size at least m.

On entry, the array x must contain the vector x. The elements are accessed
with unit increment.

y Array, size at least m.

On entry, the array y must contain the vector y. The elements are accessed
with unit increment.

Output Parameters

y Contains the solution vector.
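
A minimal sketch follows; it solves the same 2-by-2 lower triangular system in skyline storage with mkl_dskysv, reusing the profile layout shown for mkl_?skymv. The data values are illustrative only.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* Lower triangular A = [2 0; 1 3] in skyline (profile) storage */
    double  val[]  = {2.0, 1.0, 3.0};
    MKL_INT pntr[] = {1, 2, 4};
    MKL_INT m = 2;
    double  alpha = 1.0;
    double  x[] = {2.0, 5.0};
    double  y[2];
    char    transa = 'N';
    char    matdescra[6] = {'T', 'L', 'N', 'F'};

    mkl_dskysv(&transa, &m, &alpha, matdescra, val, pntr, x, y);
    printf("y = [%g %g]\n", y[0], y[1]); /* expected [1 1.3333] */
    return 0;
}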

mkl_?diamm
Computes matrix-matrix product of a sparse matrix
stored in the diagonal format with one-based
indexing.

Syntax
void mkl_sdiamm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const float *alpha , const char *matdescra , const float *val , const
MKL_INT *lval , const MKL_INT *idiag , const MKL_INT *ndiag , const float *b , const
MKL_INT *ldb , const float *beta , float *c , const MKL_INT *ldc );
void mkl_ddiamm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const double *alpha , const char *matdescra , const double *val , const
MKL_INT *lval , const MKL_INT *idiag , const MKL_INT *ndiag , const double *b , const
MKL_INT *ldb , const double *beta , double *c , const MKL_INT *ldc );
void mkl_cdiamm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8
*val , const MKL_INT *lval , const MKL_INT *idiag , const MKL_INT *ndiag , const
MKL_Complex8 *b , const MKL_INT *ldb , const MKL_Complex8 *beta , MKL_Complex8 *c ,
const MKL_INT *ldc );
void mkl_zdiamm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16
*val , const MKL_INT *lval , const MKL_INT *idiag , const MKL_INT *ndiag , const
MKL_Complex16 *b , const MKL_INT *ldb , const MKL_Complex16 *beta , MKL_Complex16 *c ,
const MKL_INT *ldc );

Include Files
• mkl.h

Description

The mkl_?diamm routine performs a matrix-matrix operation defined as

C := alpha*A*B + beta*C

or

C := alpha*AT*B + beta*C,

or

C := alpha*AH*B + beta*C,

where:
alpha and beta are scalars,
B and C are dense matrices, A is an m-by-k sparse matrix in the diagonal format, AT is the transpose of A,
and AH is the conjugate transpose of A.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

transa Specifies the operation.


If transa = 'N' or 'n', then C := alpha*A*B + beta*C,

If transa = 'T' or 't', then C := alpha*AT*B + beta*C,

If transa = 'C' or 'c', then C := alpha*AH*B + beta*C.

m Number of rows of the matrix A.

n Number of columns of the matrix C.

k Number of columns of the matrix A.

alpha Specifies the scalar alpha.

matdescra Array of six elements, specifies properties of the matrix used for operation.
Only first four array elements are used, their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.

val Two-dimensional array of size lval by ndiag, contains the non-zero diagonals
of the matrix A. Refer to values array description in Diagonal Storage
Scheme for more details.

lval Leading dimension of val, lval≥m. Refer to lval description in Diagonal
Storage Scheme for more details.

idiag Array of length ndiag, contains the distances between the main diagonal and
each non-zero diagonal in the matrix A.
Refer to distance array description in Diagonal Storage Scheme for more
details.

ndiag Specifies the number of non-zero diagonals of the matrix A.

b Array, size ldb* n.

On entry with transa = 'N' or 'n', the leading k-by-n part of the array b
must contain the matrix B, otherwise the leading m-by-n part of the array b
must contain the matrix B.

ldb Specifies the leading dimension of b as declared in the calling
(sub)program.

beta Specifies the scalar beta.

c Array, size ldc by n.

On entry with transa = 'N' or 'n', the leading m-by-n part of the array c
must contain the matrix C, otherwise the leading k-by-n part of the array c
must contain the matrix C.

ldc Specifies the leading dimension of c as declared in the calling
(sub)program.

Output Parameters

c Overwritten by the matrix (alpha*A*B + beta*C), (alpha*AT*B + beta*C), or
(alpha*AH*B + beta*C).
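
For illustration, the sketch below forms C := A*B with mkl_ddiamm for the 2-by-2 matrix A = [1 2; 0 3] used in the mkl_?diamv example; because only one-based indexing is supported, B and C are stored column-major. The data values are illustrative only.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* A = [1 2; 0 3] in the diagonal format */
    double  val[]   = {1.0, 3.0,    /* distance  0 */
                       2.0, 0.0};   /* distance +1 */
    MKL_INT idiag[] = {0, 1};
    MKL_INT m = 2, n = 2, k = 2, lval = 2, ndiag = 2, ldb = 2, ldc = 2;
    double  alpha = 1.0, beta = 0.0;
    double  b[] = {1.0, 1.0, 0.0, 1.0};  /* B = [1 0; 1 1], column-major */
    double  c[4] = {0.0};
    char    transa = 'N';
    char    matdescra[6] = {'G', 'L', 'N', 'F'};

    mkl_ddiamm(&transa, &m, &n, &k, &alpha, matdescra, val, &lval, idiag, &ndiag,
               b, &ldb, &beta, c, &ldc);
    printf("C = [%g %g; %g %g]\n", c[0], c[2], c[1], c[3]); /* expected [3 2; 3 3] */
    return 0;
}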

mkl_?skymm
Computes matrix-matrix product of a sparse matrix
stored using the skyline storage scheme with one-
based indexing.

Syntax
void mkl_sskymm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const float *alpha , const char *matdescra , const float *val , const
MKL_INT *pntr , const float *b , const MKL_INT *ldb , const float *beta , float *c ,
const MKL_INT *ldc );
void mkl_dskymm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const double *alpha , const char *matdescra , const double *val , const
MKL_INT *pntr , const double *b , const MKL_INT *ldb , const double *beta , double
*c , const MKL_INT *ldc );
void mkl_cskymm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8
*val , const MKL_INT *pntr , const MKL_Complex8 *b , const MKL_INT *ldb , const
MKL_Complex8 *beta , MKL_Complex8 *c , const MKL_INT *ldc );
void mkl_zskymm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , const MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16
*val , const MKL_INT *pntr , const MKL_Complex16 *b , const MKL_INT *ldb , const
MKL_Complex16 *beta , MKL_Complex16 *c , const MKL_INT *ldc );

Include Files
• mkl.h

Description

The mkl_?skymm routine performs a matrix-matrix operation defined as

C := alpha*A*B + beta*C

or

C := alpha*AT*B + beta*C,

or

C := alpha*AH*B + beta*C,

where:
alpha and beta are scalars,
B and C are dense matrices, A is an m-by-k sparse matrix in the skyline storage format, AT is the transpose
of A, and AH is the conjugate transpose of A.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

transa Specifies the operation.


If transa = 'N' or 'n', then C := alpha*A*B + beta*C,

If transa = 'T' or 't', then C := alpha*AT*B + beta*C,

If transa = 'C' or 'c', then C := alpha*AH*B + beta*C.

m Number of rows of the matrix A.

n Number of columns of the matrix C.

k Number of columns of the matrix A.

alpha Specifies the scalar alpha.

matdescra Array of six elements, specifies properties of the matrix used for operation.
Only first four array elements are used, their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.

NOTE
General matrices (matdescra[0]='G') are not supported.

val Array containing the set of elements of the matrix A in the skyline profile
form.
If matdescra[1]= 'L', then val contains elements from the lower triangle
of the matrix A.
If matdescra[1]= 'U', then val contains elements from the upper
triangle of the matrix A.

Refer to values array description in Skyline Storage Scheme for more
details.

pntr Array of length (m + 1) for lower triangle, and (k + 1) for upper triangle.

It contains the indices specifying the positions of the first element of the
matrix A in each row (for the lower triangle) or column (for upper triangle)
in the val array such that val[pntr[i] - 1] is the first element in row or
column i + 1. Refer to pointers array description in Skyline Storage
Scheme for more details.

b Array, size ldb* n.

On entry with transa = 'N' or 'n', the leading k-by-n part of the array b
must contain the matrix B, otherwise the leading m-by-n part of the array b
must contain the matrix B.

ldb Specifies the leading dimension of b as declared in the calling
(sub)program.

beta Specifies the scalar beta.

c Array, size ldc by n.

On entry with transa = 'N' or 'n', the leading m-by-n part of the array c
must contain the matrix C, otherwise the leading k-by-n part of the array c
must contain the matrix C.

ldc Specifies the leading dimension of c as declared in the calling
(sub)program.

Output Parameters

c Overwritten by the matrix (alpha*A*B + beta*C), (alpha*AT*B + beta*C), or
(alpha*AH*B + beta*C).
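
The following minimal sketch computes C := A*B with mkl_dskymm for a lower triangular matrix in skyline storage; matdescra marks A as triangular because general matrices are not supported by this routine. The matrix values are illustrative only.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* Lower triangular A = [2 0; 1 3] in skyline (profile) storage */
    double  val[]  = {2.0, 1.0, 3.0};
    MKL_INT pntr[] = {1, 2, 4};
    MKL_INT m = 2, n = 2, k = 2, ldb = 2, ldc = 2;
    double  alpha = 1.0, beta = 0.0;
    double  b[] = {1.0, 1.0, 0.0, 1.0};  /* B = [1 0; 1 1], column-major */
    double  c[4] = {0.0};
    char    transa = 'N';
    char    matdescra[6] = {'T', 'L', 'N', 'F'};

    mkl_dskymm(&transa, &m, &n, &k, &alpha, matdescra, val, pntr, b, &ldb, &beta, c, &ldc);
    printf("C = [%g %g; %g %g]\n", c[0], c[2], c[1], c[3]); /* expected [2 0; 4 3] */
    return 0;
}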

mkl_?diasm
Solves a system of linear matrix equations for a
sparse matrix in the diagonal format with one-based
indexing.

Syntax
void mkl_sdiasm (const char *transa , const MKL_INT *m , const MKL_INT *n , const float
*alpha , const char *matdescra , const float *val , const MKL_INT *lval , const MKL_INT
*idiag , const MKL_INT *ndiag , const float *b , const MKL_INT *ldb , float *c , const
MKL_INT *ldc );
void mkl_ddiasm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
double *alpha , const char *matdescra , const double *val , const MKL_INT *lval , const
MKL_INT *idiag , const MKL_INT *ndiag , const double *b , const MKL_INT *ldb , double
*c , const MKL_INT *ldc );
void mkl_cdiasm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8 *val , const MKL_INT
*lval , const MKL_INT *idiag , const MKL_INT *ndiag , const MKL_Complex8 *b , const
MKL_INT *ldb , MKL_Complex8 *c , const MKL_INT *ldc );

void mkl_zdiasm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16 *val , const MKL_INT
*lval , const MKL_INT *idiag , const MKL_INT *ndiag , const MKL_Complex16 *b , const
MKL_INT *ldb , MKL_Complex16 *c , const MKL_INT *ldc );

Include Files
• mkl.h

Description

The mkl_?diasm routine solves a system of linear equations with matrix-matrix operations for a sparse
matrix in the diagonal format:

C := alpha*inv(A)*B

or

C := alpha*inv(AT)*B,

where:
alpha is scalar, B and C are dense matrices, A is a sparse upper or lower triangular matrix with unit or non-
unit main diagonal, AT is the transpose of A.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

transa Specifies the system of linear equations.


If transa = 'N' or 'n', then C := alpha*inv(A)*B,

If transa = 'T' or 't' or 'C' or 'c', then C := alpha*inv(AT)*B.

m Number of rows of the matrix A.

n Number of columns of the matrix C.

alpha Specifies the scalar alpha.

matdescra Array of six elements, specifies properties of the matrix used for operation.
Only first four array elements are used, their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.

val Two-dimensional array of size lval by ndiag, contains the non-zero diagonals
of the matrix A. Refer to values array description in Diagonal Storage
Scheme for more details.

lval Leading dimension of val, lval≥m. Refer to lval description in Diagonal
Storage Scheme for more details.

idiag Array of length ndiag, contains the distances between the main diagonal and
each non-zero diagonal in the matrix A.

NOTE
All elements of this array must be sorted in increasing order.

Refer to distance array description in Diagonal Storage Scheme for more
details.

ndiag Specifies the number of non-zero diagonals of the matrix A.

b Array, size ldb* n.

On entry the leading m-by-n part of the array b must contain the matrix B.

ldb Specifies the leading dimension of b as declared in the calling
(sub)program.

ldc Specifies the leading dimension of c as declared in the calling
(sub)program.

Output Parameters

c Array, size ldc by n.

The leading m-by-n part of the array c contains the matrix C.
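
For illustration, the sketch below solves A*C = B with mkl_ddiasm for the lower bidiagonal matrix used in the mkl_?diasv example, with two right-hand sides stored column-major. The data values are illustrative only.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* Lower bidiagonal A = [2 0; 1 3] in the diagonal format */
    double  val[]   = {0.0, 1.0,    /* distance -1 (padded above) */
                       2.0, 3.0};   /* distance  0                */
    MKL_INT idiag[] = {-1, 0};
    MKL_INT m = 2, n = 2, lval = 2, ndiag = 2, ldb = 2, ldc = 2;
    double  alpha = 1.0;
    double  b[] = {2.0, 5.0, 4.0, 7.0};  /* B = [2 4; 5 7], column-major */
    double  c[4];
    char    transa = 'N';
    char    matdescra[6] = {'T', 'L', 'N', 'F'};

    mkl_ddiasm(&transa, &m, &n, &alpha, matdescra, val, &lval, idiag, &ndiag, b, &ldb, c, &ldc);
    printf("C = [%g %g; %g %g]\n", c[0], c[2], c[1], c[3]); /* expected [1 2; 1.3333 1.6667] */
    return 0;
}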

mkl_?skysm
Solves a system of linear matrix equations for a
sparse matrix stored using the skyline storage scheme
with one-based indexing.

Syntax
void mkl_sskysm (const char *transa , const MKL_INT *m , const MKL_INT *n , const float
*alpha , const char *matdescra , const float *val , const MKL_INT *pntr , const float
*b , const MKL_INT *ldb , float *c , const MKL_INT *ldc );
void mkl_dskysm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
double *alpha , const char *matdescra , const double *val , const MKL_INT *pntr , const
double *b , const MKL_INT *ldb , double *c , const MKL_INT *ldc );
void mkl_cskysm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_Complex8 *alpha , const char *matdescra , const MKL_Complex8 *val , const MKL_INT
*pntr , const MKL_Complex8 *b , const MKL_INT *ldb , MKL_Complex8 *c , const MKL_INT
*ldc );
void mkl_zskysm (const char *transa , const MKL_INT *m , const MKL_INT *n , const
MKL_Complex16 *alpha , const char *matdescra , const MKL_Complex16 *val , const MKL_INT
*pntr , const MKL_Complex16 *b , const MKL_INT *ldb , MKL_Complex16 *c , const MKL_INT
*ldc );

Include Files
• mkl.h

Description

The mkl_?skysm routine solves a system of linear equations with matrix-matrix operations for a sparse
matrix in the skyline storage format:

C := alpha*inv(A)*B

or

C := alpha*inv(AT)*B,

where:
alpha is scalar, B and C are dense matrices, A is a sparse upper or lower triangular matrix with unit or non-
unit main diagonal, AT is the transpose of A.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

transa Specifies the system of linear equations.


If transa = 'N' or 'n', then C := alpha*inv(A)*B,

If transa = 'T' or 't' or 'C' or 'c', then C := alpha*inv(AT)*B,

m Number of rows of the matrix A.

n Number of columns of the matrix C.

alpha Specifies the scalar alpha.

matdescra Array of six elements, specifies properties of the matrix used for operation.
Only first four array elements are used, their possible values are given in
Table “Possible Values of the Parameter matdescra (descra)”. Possible
combinations of element values of this parameter are given in Table
“Possible Combinations of Element Values of the Parameter matdescra”.

NOTE
General matrices (matdescra[0]='G') are not supported.

val Array containing the set of elements of the matrix A in the skyline profile
form.
If matdescra[1]= 'L', then val contains elements from the lower triangle
of the matrix A.
If matdescra[1]= 'U', then val contains elements from the upper
triangle of the matrix A.
Refer to values array description in Skyline Storage Scheme for more
details.

pntr Array of length (m + 1).

It contains the indices specifying the positions of the first element of the
matrix A in each row (for the lower triangle) or column (for upper triangle)
in the val array such that val[pntr[i] - 1] is the first element in row or
column i + 1. Refer to pointers array description in Skyline Storage
Scheme for more details.

b Array, size ldb* n.

On entry the leading m-by-n part of the array b must contain the matrix B.

ldb Specifies the leading dimension of b as declared in the calling


(sub)program.

ldc Specifies the leading dimension of c as declared in the calling


(sub)program.

Output Parameters

c Array, size ldc by n.

The leading m-by-n part of the array c contains the matrix C.
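
A minimal sketch follows, solving A*C = B with mkl_dskysm for the same lower triangular skyline matrix with two right-hand sides; the data values are illustrative only.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* Lower triangular A = [2 0; 1 3] in skyline (profile) storage */
    double  val[]  = {2.0, 1.0, 3.0};
    MKL_INT pntr[] = {1, 2, 4};
    MKL_INT m = 2, n = 2, ldb = 2, ldc = 2;
    double  alpha = 1.0;
    double  b[] = {2.0, 5.0, 4.0, 7.0};  /* B = [2 4; 5 7], column-major */
    double  c[4];
    char    transa = 'N';
    char    matdescra[6] = {'T', 'L', 'N', 'F'};

    mkl_dskysm(&transa, &m, &n, &alpha, matdescra, val, pntr, b, &ldb, c, &ldc);
    printf("C = [%g %g; %g %g]\n", c[0], c[2], c[1], c[3]); /* expected [1 2; 1.3333 1.6667] */
    return 0;
}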

mkl_?dnscsr
Convert a sparse matrix in uncompressed
representation to the CSR format and vice versa.

Syntax
void mkl_ddnscsr (const MKL_INT *job , const MKL_INT *m , const MKL_INT *n , double
*adns , const MKL_INT *lda , double *acsr , MKL_INT *ja , MKL_INT *ia , MKL_INT
*info );
void mkl_sdnscsr (const MKL_INT *job , const MKL_INT *m , const MKL_INT *n , float
*adns , const MKL_INT *lda , float *acsr , MKL_INT *ja , MKL_INT *ia , MKL_INT *info );
void mkl_cdnscsr (const MKL_INT *job , const MKL_INT *m , const MKL_INT *n ,
MKL_Complex8 *adns , const MKL_INT *lda , MKL_Complex8 *acsr , MKL_INT *ja , MKL_INT
*ia , MKL_INT *info );
void mkl_zdnscsr (const MKL_INT *job , const MKL_INT *m , const MKL_INT *n ,
MKL_Complex16 *adns , const MKL_INT *lda , MKL_Complex16 *acsr , MKL_INT *ja , MKL_INT
*ia , MKL_INT *info );

Include Files
• mkl.h

Description

This routine converts a sparse matrix A between formats: stored as a rectangular array (dense
representation) and stored using compressed sparse row (CSR) format (3-array variation).

Input Parameters

job Array, contains the following conversion parameters:

• job[0]: Conversion type.

•If job[0]=0, the rectangular matrix A is converted to the CSR
format;
• if job[0]=1, the rectangular matrix A is restored from the CSR
format.
• job[1]: index base for the rectangular matrix A.
• If job[1]=0, zero-based indexing for the rectangular matrix A is
used;
• if job[1]=1, one-based indexing for the rectangular matrix A is used.
• job[2]: Index base for the matrix in CSR format.
•If job[2]=0, zero-based indexing for the matrix in CSR format is
used;
• if job[2]=1, one-based indexing for the matrix in CSR format is
used.
• job[3]: Portion of matrix.
• If job[3]=0, adns is a lower triangular part of matrix A;
• If job[3]=1, adns is an upper triangular part of matrix A;
• If job[3]=2, adns is a whole matrix A.
• job[4]=nzmax: maximum number of the non-zero elements allowed if
job[0]=0.
• job[5]: job indicator for conversion to CSR format.
• If job[5]=0, only array ia is generated for the output storage.
• If job[5]>0, arrays acsr, ia, ja are generated for the output
storage.

m Number of rows of the matrix A.

n Number of columns of the matrix A.

adns (input/output)
If the conversion type is from uncompressed to CSR, on input adns
contains an uncompressed (dense) representation of matrix A.

lda Specifies the leading dimension of adns as declared in the calling
(sub)program.
For zero-based indexing of A, lda must be at least max(1, n).

For one-based indexing of A, lda must be at least max(1, m).

acsr (input/output)
If conversion type is from CSR to uncompressed, on input acsr contains
the non-zero elements of the matrix A. Its length is equal to the number of
non-zero elements in the matrix A. Refer to values array description in
Sparse Matrix Storage Formats for more details.

ja (input/output). If the conversion type is from CSR to uncompressed, on input
for zero-based indexing of the matrix in CSR format ja contains the column
indices for each non-zero element of the matrix A. For one-based indexing
ja contains the column indices plus one for each non-zero element of the
matrix A.

Its length is equal to the length of the array acsr. Refer to columns array
description in Sparse Matrix Storage Formats for more details.

ia (input/output). Array of length m + 1.

If the conversion type is from CSR to uncompressed, on input for zero-based
indexing of the matrix in CSR format ia contains indices of elements in the
array acsr, such that ia[i] is the index in the array acsr of the first
non-zero element from the row i. For one-based indexing ia contains indices
of elements in the array acsr, such that ia[i] - 1 is the index in the array
acsr of the first non-zero element from the row i.
The value of ia[m] - ia[0] is equal to the number of non-zeros. Refer to
rowIndex array description in Sparse Matrix Storage Formats for more
details.

Output Parameters

adns If conversion type is from CSR to uncompressed, on output adns contains
the uncompressed (dense) representation of matrix A.

acsr, ja, ia If conversion type is from uncompressed to CSR, on output acsr, ja, and
ia contain the compressed sparse row (CSR) format (3-array variation) of
matrix A (see Sparse Matrix Storage Formats for a description of the
storage format).

info Integer info indicator only for restoring the matrix A from the CSR format.
If info=0, the execution is successful.

If info=i, the routine is interrupted processing the i-th row because there
is no space in the arrays acsr and ja according to the value nzmax.
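
For illustration, the sketch below converts a 3-by-3 dense matrix to the 3-array CSR format with mkl_ddnscsr, using zero-based indexing for both representations (so adns is interpreted row-major with lda = n); the matrix values and the nzmax bound in job[4] are illustrative assumptions.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* Dense A = [1 0 2; 0 3 0; 4 0 5], row-major because zero-based indexing is used */
    double  adns[] = {1.0, 0.0, 2.0,
                      0.0, 3.0, 0.0,
                      4.0, 0.0, 5.0};
    MKL_INT m = 3, n = 3, lda = 3, info;
    MKL_INT job[6] = {0,   /* dense -> CSR                          */
                      0,   /* zero-based indexing of adns           */
                      0,   /* zero-based indexing of the CSR arrays */
                      2,   /* adns is a whole matrix                */
                      9,   /* nzmax: at most 9 non-zeros allowed    */
                      1};  /* generate acsr, ja, and ia             */
    double  acsr[9];
    MKL_INT ja[9], ia[4];
    MKL_INT i;

    mkl_ddnscsr(job, &m, &n, adns, &lda, acsr, ja, ia, &info);
    /* expected: acsr = {1,2,3,4,5}, ja = {0,2,1,0,2}, ia = {0,2,3,5} */
    for (i = 0; i < ia[m]; i++)
        printf("a[%lld] = %g, column %lld\n", (long long)i, acsr[i], (long long)ja[i]);
    return 0;
}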

mkl_?csrcoo
Converts a sparse matrix in the CSR format to the
coordinate format and vice versa.

Syntax
void mkl_scsrcoo (const MKL_INT *job , const MKL_INT *n , float *acsr , MKL_INT *ja ,
MKL_INT *ia , MKL_INT *nnz , float *acoo , MKL_INT *rowind , MKL_INT *colind , MKL_INT
*info );
void mkl_dcsrcoo (const MKL_INT *job , const MKL_INT *n , double *acsr , MKL_INT *ja ,
MKL_INT *ia , MKL_INT *nnz , double *acoo , MKL_INT *rowind , MKL_INT *colind ,
MKL_INT *info );
void mkl_ccsrcoo (const MKL_INT *job , const MKL_INT *n , MKL_Complex8 *acsr , MKL_INT
*ja , MKL_INT *ia , MKL_INT *nnz , MKL_Complex8 *acoo , MKL_INT *rowind , MKL_INT
*colind , MKL_INT *info );
void mkl_zcsrcoo (const MKL_INT *job , const MKL_INT *n , MKL_Complex16 *acsr , MKL_INT
*ja , MKL_INT *ia , MKL_INT *nnz , MKL_Complex16 *acoo , MKL_INT *rowind , MKL_INT
*colind , MKL_INT *info );

Include Files
• mkl.h

Description

This routine converts a sparse matrix A stored in the compressed sparse row (CSR) format (3-array
variation) to coordinate format and vice versa.

Input Parameters

job Array, contains the following conversion parameters:


job[0]
If job[0]=0, the matrix in the CSR format is converted to the coordinate
format;
if job[0]=1, the matrix in the coordinate format is converted to the CSR
format.
if job[0]=2, the matrix in the coordinate format is converted to the CSR
format, and the column indices in CSR representation are sorted in the
increasing order within each row.
job[1]
If job[1]=0, zero-based indexing for the matrix in CSR format is used;

if job[1]=1, one-based indexing for the matrix in CSR format is used.

job[2]
If job[2]=0, zero-based indexing for the matrix in coordinate format is
used;
if job[2]=1, one-based indexing for the matrix in coordinate format is
used.
job[4]
job[4]=nzmax - maximum number of the non-zero elements allowed if
job[0]=0.
job[5] - job indicator.
For conversion to the coordinate format:
If job[5]=1, only array rowind is filled in for the output storage.

If job[5]=2, arrays rowind, colind are filled in for the output storage.

If job[5]=3, all arrays rowind, colind, acoo are filled in for the output
storage.
For conversion to the CSR format:
If job[5]=0, all arrays acsr, ja, ia are filled in for the output storage.

If job[5]=1, only array ia is filled in for the output storage.

If job[5]=2, then it is assumed that the routine already has been called
with the job[5]=1, and the user allocated the required space for storing
the output arrays acsr and ja.

n Dimension of the matrix A.

nnz Specifies the number of non-zero elements of the matrix A for job[0]≠0.

Refer to nnz description in Coordinate Format for more details.

acsr (input/output)
Array containing non-zero elements of the matrix A. Its length is equal to
the number of non-zero elements in the matrix A. Refer to values array
description in Sparse Matrix Storage Formats for more details.

ja (input/output). For job[1] = 1 (one-based indexing for the matrix in CSR
format), array containing the column indices plus one for each non-zero
element of the matrix A.
For job[1] = 0 (zero-based indexing for the matrix in CSR format), array
containing the column indices for each non-zero element of the matrix A.
Its length is equal to the length of the array acsr. Refer to columns array
description in Sparse Matrix Storage Formats for more details.

ia (input/output). Array of length n + 1, containing indices of elements in the
array acsr, such that ia[i] - ia[0] is the index in the array acsr of the
first non-zero element from the row i. The value ia[n] - ia[0] is equal to
the number of non-zeros. Refer to rowIndex array description in Sparse
Matrix Storage Formats for more details.

acoo (input/output)
Array containing non-zero elements of the matrix A. Its length is equal to
the number of non-zero elements in the matrix A. Refer to values array
description in Sparse Matrix Storage Formats for more details.

rowind (input/output). Array of length nnz, contains the row indices for each non-
zero element of the matrix A.
Refer to rows array description in Coordinate Format for more details.

colind (input/output). Array of length nnz, contains the column indices for each
non-zero element of the matrix A. Refer to columns array description in
Coordinate Format for more details.

Output Parameters

nnz Returns the number of converted elements of the matrix A for job[0]=0.

info Integer info indicator only for converting the matrix A from the CSR format.
If info=0, the execution is successful.

If info=1, the routine is interrupted because there is no space in the arrays
acoo, rowind, colind according to the value nzmax.
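
As an illustration, the sketch below converts a matrix from the 3-array CSR format to the coordinate format with mkl_dcsrcoo, with zero-based indexing on both sides; job[5]=3 requests that acoo, rowind, and colind all be filled. The matrix data are illustrative only.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* A = [1 0 2; 0 3 0; 4 0 5] in zero-based CSR format */
    double  acsr[] = {1.0, 2.0, 3.0, 4.0, 5.0};
    MKL_INT ja[]   = {0, 2, 1, 0, 2};
    MKL_INT ia[]   = {0, 2, 3, 5};
    MKL_INT n = 3, nnz = 5, info;
    MKL_INT job[6] = {0,   /* CSR -> coordinate format              */
                      0,   /* zero-based indexing of the CSR arrays */
                      0,   /* zero-based indexing of the COO arrays */
                      0,
                      5,   /* nzmax: at most 5 non-zeros allowed    */
                      3};  /* fill acoo, rowind, and colind         */
    double  acoo[5];
    MKL_INT rowind[5], colind[5];
    MKL_INT i;

    mkl_dcsrcoo(job, &n, acsr, ja, ia, &nnz, acoo, rowind, colind, &info);
    /* expected (in CSR order): rowind = {0,0,1,2,2}, colind = {0,2,1,0,2}, acoo = {1,2,3,4,5} */
    for (i = 0; i < nnz; i++)
        printf("(%lld, %lld) = %g\n", (long long)rowind[i], (long long)colind[i], acoo[i]);
    return 0;
}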

mkl_?csrbsr
Converts a square sparse matrix in the CSR format to
the BSR format and vice versa.

Syntax
void mkl_scsrbsr (const MKL_INT *job , const MKL_INT *m , const MKL_INT *mblk , const
MKL_INT *ldabsr , float *acsr , MKL_INT *ja , MKL_INT *ia , float *absr , MKL_INT
*jab , MKL_INT *iab , MKL_INT *info );

void mkl_dcsrbsr (const MKL_INT *job , const MKL_INT *m , const MKL_INT *mblk , const
MKL_INT *ldabsr , double *acsr , MKL_INT *ja , MKL_INT *ia , double *absr , MKL_INT
*jab , MKL_INT *iab , MKL_INT *info );
void mkl_ccsrbsr (const MKL_INT *job , const MKL_INT *m , const MKL_INT *mblk , const
MKL_INT *ldabsr , MKL_Complex8 *acsr , MKL_INT *ja , MKL_INT *ia , MKL_Complex8
*absr , MKL_INT *jab , MKL_INT *iab , MKL_INT *info );
void mkl_zcsrbsr (const MKL_INT *job , const MKL_INT *m , const MKL_INT *mblk , const
MKL_INT *ldabsr , MKL_Complex16 *acsr , MKL_INT *ja , MKL_INT *ia , MKL_Complex16
*absr , MKL_INT *jab , MKL_INT *iab , MKL_INT *info );

Include Files
• mkl.h

Description

This routine converts a square sparse matrix A stored in the compressed sparse row (CSR) format (3-array
variation) to the block sparse row (BSR) format and vice versa.

Input Parameters

job Array, contains the following conversion parameters:


job[0]
If job[0]=0, the matrix in the CSR format is converted to the BSR format;

if job[0]=1, the matrix in the BSR format is converted to the CSR format.

job[1]
If job[1]=0, zero-based indexing for the matrix in CSR format is used;

if job[1]=1, one-based indexing for the matrix in CSR format is used.

job[2]
If job[2]=0, zero-based indexing for the matrix in the BSR format is used;

if job[2]=1, one-based indexing for the matrix in the BSR format is used.

job[3] is only used for conversion to CSR format. By default, the converter
saves the blocks without checking whether an element is zero or not. If
job[3]=1, then the converter only saves non-zero elements in blocks.
job[5] - job indicator.
For conversion to the BSR format:
If job[5]=0, only arrays jab, iab are generated for the output storage.

If job[5]>0, all output arrays absr, jab, and iab are filled in for the
output storage.
If job[5]=-1, iab[m] - iab[0] returns the number of non-zero blocks.

For conversion to the CSR format:
If job[5]=0, only arrays ja, ia are generated for the output storage.

m Actual row dimension of the matrix A for conversion to the BSR format; block
row dimension of the matrix A for conversion to the CSR format.

mblk Size of the block in the matrix A.

ldabsr Leading dimension of the array absr as declared in the calling program.
ldabsr must be greater than or equal to mblk*mblk.

acsr (input/output)
Array containing non-zero elements of the matrix A. Its length is equal to
the number of non-zero elements in the matrix A. Refer to values array
description in Sparse Matrix Storage Formats for more details.

ja (input/output). Array containing the column indices for each non-zero
element of the matrix A.
Its length is equal to the length of the array acsr. Refer to columns array
description in Sparse Matrix Storage Formats for more details.

ia (input/output). Array of length m + 1, containing indices of elements in the
array acsr, such that ia[i] - ia[0] is the index in the array acsr of
the first non-zero element from the row i. The value of ia[m] - ia[0]
is equal to the number of non-zeros. Refer to rowIndex array description in
Sparse Matrix Storage Formats for more details.

absr (input/output)
Array containing elements of non-zero blocks of the matrix A. Its length is
equal to the number of non-zero blocks in the matrix A multiplied by
mblk*mblk. Refer to values array description in BSR Format for more
details.

jab (input/output). Array containing the column indices for each non-zero block
of the matrix A.
Its length is equal to the number of non-zero blocks of the matrix A. Refer
to columns array description in BSR Format for more details.

iab (input/output). Array of length (m + 1), containing indices of blocks in the
array absr, such that iab[i] - iab[0] is the index in the array absr of
the first non-zero block in the i-th block row. The value of iab[m] - iab[0]
is equal to the number of non-zero blocks. Refer to rowIndex array
description in BSR Format for more details.

Output Parameters

info Integer info indicator only for converting the matrix A from the CSR format.
If info=0, the execution is successful.

If info=1, it means that mblk is equal to 0.

If info=2, it means that ldabsr is less than mblk*mblk and there is no
space for all blocks.

mkl_?csrcsc
Converts a square sparse matrix in the CSR format to
the CSC format and vice versa.

Syntax
void mkl_dcsrcsc (const MKL_INT *job , const MKL_INT *n , double *acsr , MKL_INT *ja ,
MKL_INT *ia , double *acsc , MKL_INT *ja1 , MKL_INT *ia1 , MKL_INT *info );
void mkl_scsrcsc (const MKL_INT *job , const MKL_INT *n , float *acsr , MKL_INT *ja ,
MKL_INT *ia , float *acsc , MKL_INT *ja1 , MKL_INT *ia1 , MKL_INT *info );
void mkl_ccsrcsc (const MKL_INT *job , const MKL_INT *n , MKL_Complex8 *acsr , MKL_INT
*ja , MKL_INT *ia , MKL_Complex8 *acsc , MKL_INT *ja1 , MKL_INT *ia1 , MKL_INT *info );
void mkl_zcsrcsc (const MKL_INT *job , const MKL_INT *n , MKL_Complex16 *acsr , MKL_INT
*ja , MKL_INT *ia , MKL_Complex16 *acsc , MKL_INT *ja1 , MKL_INT *ia1 , MKL_INT
*info );

Include Files
• mkl.h

Description

This routine converts a square sparse matrix A stored in the compressed sparse row (CSR) format (3-array
variation) to the compressed sparse column (CSC) format and vice versa.

Input Parameters

job Array, contains the following conversion parameters:


job[0]
If job[0]=0, the matrix in the CSR format is converted to the CSC format;

if job[0]=1, the matrix in the CSC format is converted to the CSR format.

job[1]
If job[1]=0, zero-based indexing for the matrix in CSR format is used;

if job[1]=1, one-based indexing for the matrix in CSR format is used.

job[2]
If job[2]=0, zero-based indexing for the matrix in the CSC format is used;

if job[2]=1, one-based indexing for the matrix in the CSC format is used.

job[5] - job indicator.


For conversion to the CSC format:
If job[5]=0, only arrays ja1, ia1 are filled in for the output storage.

If job[5]≠0, all output arrays acsc, ja1, and ia1 are filled in for the
output storage.
For conversion to the CSR format:
If job[5]=0, only arrays ja, ia are filled in for the output storage.

If job[5]≠0, all output arrays acsr, ja, and ia are filled in for the output
storage.

n Dimension of the square matrix A.

acsr (input/output)

Array containing non-zero elements of the square matrix A. Its length is
equal to the number of non-zero elements in the matrix A. Refer to values
array description in Sparse Matrix Storage Formats for more details.

ja (input/output). Array containing the column indices for each non-zero
element of the matrix A.
Its length is equal to the length of the array acsr. Refer to columns array
description in Sparse Matrix Storage Formats for more details.

ia (input/output). Array of length n + 1, containing indices of elements in the
array acsr, such that ia[i] - ia[0] is the index in the array acsr of the
first non-zero element from the row i. The value of ia[n] - ia[0] is equal
to the number of non-zeros. Refer to rowIndex array description in Sparse
Matrix Storage Formats for more details.

acsc (input/output)
Array containing non-zero elements of the square matrix A. Its length is
equal to the number of non-zero elements in the matrix A. Refer to values
array description in Sparse Matrix Storage Formats for more details.

ja1 (input/output). Array containing the row indices for each non-zero element
of the matrix A.
Its length is equal to the length of the array acsc. Refer to columns array
description in Sparse Matrix Storage Formats for more details.

ia1 (input/output). Array of length n + 1, containing indices of elements in the
array acsc, such that ia1[i] - ia1[0] is the index in the array acsc of
the first non-zero element from the column i. The value of ia1[n] - ia1[0]
is equal to the number of non-zeros. Refer to rowIndex array description in
Sparse Matrix Storage Formats for more details.

Output Parameters

info This parameter is not used now.
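
For illustration, the sketch below converts the storage of a small matrix from CSR to CSC with mkl_dcsrcsc; zero-based indexing is used for both formats and job[5] is non-zero so that acsc, ja1, and ia1 are all filled in. The matrix data are illustrative only.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* A = [1 0 2; 0 3 0; 4 0 5] in zero-based CSR format */
    double  acsr[] = {1.0, 2.0, 3.0, 4.0, 5.0};
    MKL_INT ja[]   = {0, 2, 1, 0, 2};
    MKL_INT ia[]   = {0, 2, 3, 5};
    MKL_INT n = 3, info;
    MKL_INT job[6] = {0,   /* CSR -> CSC                            */
                      0,   /* zero-based indexing of the CSR arrays */
                      0,   /* zero-based indexing of the CSC arrays */
                      0, 0,
                      1};  /* fill acsc, ja1, and ia1               */
    double  acsc[5];
    MKL_INT ja1[5], ia1[4];
    MKL_INT i;

    mkl_dcsrcsc(job, &n, acsr, ja, ia, acsc, ja1, ia1, &info);
    /* expected: acsc = {1,4,3,2,5}, ja1 = {0,2,1,0,2}, ia1 = {0,2,3,5} */
    for (i = 0; i < ia1[n]; i++)
        printf("acsc[%lld] = %g, row %lld\n", (long long)i, acsc[i], (long long)ja1[i]);
    return 0;
}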

mkl_?csrdia
Converts a sparse matrix in the CSR format to the
diagonal format and vice versa.

Syntax
void mkl_dcsrdia (const MKL_INT *job , const MKL_INT *n , double *acsr , MKL_INT *ja ,
MKL_INT *ia , double *adia , const MKL_INT *ndiag , MKL_INT *distance , MKL_INT
*idiag , double *acsr_rem , MKL_INT *ja_rem , MKL_INT *ia_rem , MKL_INT *info );
void mkl_scsrdia (const MKL_INT *job , const MKL_INT *n , float *acsr , MKL_INT *ja ,
MKL_INT *ia , float *adia , const MKL_INT *ndiag , MKL_INT *distance , MKL_INT
*idiag , float *acsr_rem , MKL_INT *ja_rem , MKL_INT *ia_rem , MKL_INT *info );
void mkl_ccsrdia (const MKL_INT *job , const MKL_INT *n , MKL_Complex8 *acsr , MKL_INT
*ja , MKL_INT *ia , MKL_Complex8 *adia , const MKL_INT *ndiag , MKL_INT *distance ,
MKL_INT *idiag , MKL_Complex8 *acsr_rem , MKL_INT *ja_rem , MKL_INT *ia_rem , MKL_INT
*info );

void mkl_zcsrdia (const MKL_INT *job , const MKL_INT *n , MKL_Complex16 *acsr , MKL_INT
*ja , MKL_INT *ia , MKL_Complex16 *adia , const MKL_INT *ndiag , MKL_INT *distance ,
MKL_INT *idiag , MKL_Complex16 *acsr_rem , MKL_INT *ja_rem , MKL_INT *ia_rem , MKL_INT
*info );

Include Files
• mkl.h

Description

This routine converts a sparse matrix A stored in the compressed sparse row (CSR) format (3-array
variation) to the diagonal format and vice versa.

Input Parameters

job Array, contains the following conversion parameters:


job[0]
If job[0]=0, the matrix in the CSR format is converted to the diagonal
format;
if job[0]=1, the matrix in the diagonal format is converted to the CSR
format.
job[1]
If job[1]=0, zero-based indexing for the matrix in CSR format is used;

if job[1]=1, one-based indexing for the matrix in CSR format is used.

job[2]
If job[2]=0, zero-based indexing for the matrix in the diagonal format is
used;
if job[2]=1, one-based indexing for the matrix in the diagonal format is
used.
job[5] - job indicator.
For conversion to the diagonal format:
If job[5]=0, diagonals are not selected internally, and acsr_rem, ja_rem,
ia_rem are not filled in for the output storage.
If job[5]=1, diagonals are not selected internally, and acsr_rem, ja_rem,
ia_rem are filled in for the output storage.
If job[5]=10, diagonals are selected internally, and acsr_rem, ja_rem,
ia_rem are not filled in for the output storage.
If job[5]=11, diagonals are selected internally, and acsr_rem, ja_rem,
ia_rem are filled in for the output storage.
For conversion to the CSR format:
If job[5]=0, each entry in the array adia is checked for being zero; zero
entries are not included in the array acsr.

If job[5]≠0, the entries in the array adia are not checked for zeros.

m Dimension of the matrix A.


acsr (input/output)
Array containing non-zero elements of the matrix A. Its length is equal to
the number of non-zero elements in the matrix A. Refer to values array
description in Sparse Matrix Storage Formats for more details.

ja (input/output). Array containing the column indices for each non-zero
element of the matrix A.
Its length is equal to the length of the array acsr. Refer to columns array
description in Sparse Matrix Storage Formats for more details.

ia (input/output). Array of length m + 1, containing indices of elements in the
array acsr, such that ia[i] - ia[0] is the index in the array acsr of the
first non-zero element from the row i. The value of ia[m] - ia[0] is equal
to the number of non-zeros. Refer to rowIndex array description in Sparse
Matrix Storage Formats for more details.

adia (input/output)
Array of size (ndiag*idiag) containing diagonals of the matrix A.

The key point of the storage is that each element in the array adia retains
the row number of the original matrix. To achieve this, diagonals in the
lower triangular part of the matrix are padded from the top, and those in
the upper triangular part are padded from the bottom.

ndiag Specifies the leading dimension of the array adia as declared in the calling
(sub)program; it must be at least max(1, m).

distance Array of length idiag, containing the distances between the main diagonal
and each non-zero diagonal to be extracted. The distance is positive if the
diagonal is above the main diagonal, and negative if the diagonal is below
the main diagonal. The main diagonal has a distance equal to zero.

idiag Number of diagonals to be extracted. For conversion to the diagonal format,
this parameter may be modified on return.

acsr_rem, ja_rem, ia_rem Remainder of the matrix in the CSR format if it is needed for conversion to
the diagonal format.

Output Parameters

info This parameter is not currently used.
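
The following minimal sketch illustrates a CSR-to-diagonal conversion with mkl_dcsrdia under
stated assumptions: indexing is one-based on both sides, the caller selects the two diagonals to
extract (job[5]=0), each extracted diagonal is taken to occupy ndiag consecutive entries of adia,
and small scratch buffers are passed for the remainder arrays even though no remainder is produced
for this matrix.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* 4x4 upper-bidiagonal matrix in one-based CSR (3-array variation):
       [ 1 5 0 0 ]
       [ 0 2 6 0 ]
       [ 0 0 3 7 ]
       [ 0 0 0 4 ]                                                        */
    MKL_INT m = 4;
    double  acsr[] = { 1.0, 5.0, 2.0, 6.0, 3.0, 7.0, 4.0 };
    MKL_INT ja[]   = { 1, 2, 2, 3, 3, 4, 4 };
    MKL_INT ia[]   = { 1, 3, 5, 7, 8 };

    /* job: CSR -> diagonal, one-based CSR, one-based diagonal storage,
       job[5]=0 -> the caller selects the diagonals, remainder not filled */
    MKL_INT job[6] = { 0, 1, 1, 0, 0, 0 };

    MKL_INT ndiag = 4;                    /* leading dimension >= max(1,m) */
    MKL_INT idiag = 2;                    /* two diagonals are extracted   */
    MKL_INT distance[] = { 0, 1 };        /* main diagonal, superdiagonal  */
    double  adia[4 * 2] = { 0.0 };        /* ndiag*idiag                   */

    /* scratch buffers for the (empty) remainder of the matrix             */
    double  acsr_rem[8];
    MKL_INT ja_rem[8], ia_rem[5], info;

    mkl_dcsrdia(job, &m, acsr, ja, ia, adia, &ndiag, distance, &idiag,
                acsr_rem, ja_rem, ia_rem, &info);

    /* print each extracted diagonal (ndiag consecutive entries of adia)   */
    for (MKL_INT d = 0; d < idiag; d++) {
        printf("distance %+d:", (int)distance[d]);
        for (MKL_INT i = 0; i < ndiag; i++)
            printf(" %4.1f", adia[d * ndiag + i]);
        printf("\n");
    }
    return 0;
}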

mkl_?csrsky
Converts a sparse matrix in CSR format to the skyline
format and vice versa.

Syntax
void mkl_dcsrsky (const MKL_INT *job , const MKL_INT *m , double *acsr , MKL_INT *ja ,
MKL_INT *ia , double *asky , MKL_INT *pointers , MKL_INT *info );
void mkl_scsrsky (const MKL_INT *job , const MKL_INT *m , float *acsr , MKL_INT *ja ,
MKL_INT *ia , float *asky , MKL_INT *pointers , MKL_INT *info );

void mkl_ccsrsky (const MKL_INT *job , const MKL_INT *m , MKL_Complex8 *acsr , MKL_INT
*ja , MKL_INT *ia , MKL_Complex8 *asky , MKL_INT *pointers , MKL_INT *info );
void mkl_zcsrsky (const MKL_INT *job , const MKL_INT *m , MKL_Complex16 *acsr , MKL_INT
*ja , MKL_INT *ia , MKL_Complex16 *asky , MKL_INT *pointers , MKL_INT *info );

Include Files
• mkl.h

Description

This routine converts a sparse matrix A stored in the compressed sparse row (CSR) format (3-array
variation) to the skyline format and vice versa.

Input Parameters

job Array, contains the following conversion parameters:


job[0]
If job[0]=0, the matrix in the CSR format is converted to the skyline
format;
if job[0]=1, the matrix in the skyline format is converted to the CSR
format.
job[1]
If job[1]=0, zero-based indexing for the matrix in CSR format is used;

if job[1]=1, one-based indexing for the matrix in CSR format is used.

job[2]
If job[2]=0, zero-based indexing for the matrix in the skyline format is
used;
if job[2]=1, one-based indexing for the matrix in the skyline format is
used.
job[3]
For conversion to the skyline format:
If job[3]=0, the upper part of the matrix A in the CSR format is converted.

If job[3]=1, the lower part of the matrix A in the CSR format is converted.

For conversion to the CSR format:


If job[3]=0, the matrix is converted to the upper part of the matrix A in
the CSR format.
If job[3]=1, the matrix is converted to the lower part of the matrix A in
the CSR format.
job[4]
job[4]=nzmax - maximum number of the non-zero elements of the matrix
A if job[0]=0.

job[5] - job indicator.
Only for conversion to the skyline format:
If job[5]=0, only the array pointers is filled in for the output storage.

If job[5]=1, both output arrays asky and pointers are filled in for the
output storage.

m Dimension of the matrix A.

acsr (input/output)
Array containing non-zero elements of the matrix A. Its length is equal to
the number of non-zero elements in the matrix A. Refer to values array
description in Sparse Matrix Storage Formats for more details.

ja (input/output). Array containing the column indices for each non-zero


element of the matrix A.
Its length is equal to the length of the array acsr. Refer to columns array
description in Sparse Matrix Storage Formats for more details.

ia (input/output). Array of length m + 1, containing indices of elements in the


array acsr, such that ia[i] - ia[0] is the index in the array acsr of the
first non-zero element from the row i. The value ofia[m] - ia[0] is equal
to the number of non-zeros. Refer to rowIndex array description in Sparse
Matrix Storage Formats for more details.

asky (input/output)
Array, for a lower triangular part of A it contains the set of elements from
each row starting from the first non-zero element to and including the
diagonal element. For an upper triangular matrix it contains the set of
elements from each column of the matrix starting with the first non-zero
element down to and including the diagonal element. Encountered zero
elements are included in the sets. Refer to values array description in
Skyline Storage Format for more details.

pointers (input/output).
Array with dimension (m+1), where m is the number of rows for the lower
triangle (columns for the upper triangle). pointers[i-1] - pointers[0] gives
the index of the element in the array asky that is the first non-zero element
in row (column) i. The value of pointers[m] is set to nnz + pointers[0],
where nnz is the number of elements in the array asky. Refer to pointers
array description in Skyline Storage Format for more details.

Output Parameters

info Integer info indicator, used only when converting the matrix A from the
CSR format.
If info=0, the execution is successful.

If info=1, the routine is interrupted because there is no space in the array
asky according to the value of nzmax.
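
As an illustration, the sketch below converts the lower triangle of a small one-based CSR matrix
to skyline storage. The matrix and the job values (in particular nzmax in job[4] and job[5]=1 to
fill both output arrays) are example choices, not requirements of the routine.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* Lower triangular 3x3 matrix in one-based CSR:
       [ 1 0 0 ]
       [ 2 3 0 ]
       [ 0 4 5 ]                                                         */
    MKL_INT m = 3;
    double  acsr[] = { 1.0, 2.0, 3.0, 4.0, 5.0 };
    MKL_INT ja[]   = { 1, 1, 2, 2, 3 };
    MKL_INT ia[]   = { 1, 2, 4, 6 };

    /* job: CSR -> skyline, one-based CSR, one-based skyline,
       convert the lower part, nzmax = 5, fill both asky and pointers    */
    MKL_INT job[6] = { 0, 1, 1, 1, 5, 1 };

    double  asky[5];
    MKL_INT pointers[4];            /* m + 1 entries                     */
    MKL_INT info;

    mkl_dcsrsky(job, &m, acsr, ja, ia, asky, pointers, &info);

    printf("info = %d\n", (int)info);
    for (MKL_INT i = 0; i < m; i++) {
        printf("row %d:", (int)(i + 1));
        for (MKL_INT p = pointers[i]; p < pointers[i + 1]; p++)
            printf(" %4.1f", asky[p - 1]);   /* pointers are one-based   */
        printf("\n");
    }
    return 0;
}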

mkl_?csradd
Computes the sum of two matrices stored in the CSR
format (3-array variation) with one-based indexing.

Syntax
void mkl_dcsradd (const char *trans , const MKL_INT *request , const MKL_INT *sort ,
const MKL_INT *m , const MKL_INT *n , double *a , MKL_INT *ja , MKL_INT *ia , const
double *beta , double *b , MKL_INT *jb , MKL_INT *ib , double *c , MKL_INT *jc ,
MKL_INT *ic , const MKL_INT *nzmax , MKL_INT *info );
void mkl_scsradd (const char *trans , const MKL_INT *request , const MKL_INT *sort ,
const MKL_INT *m , const MKL_INT *n , float *a , MKL_INT *ja , MKL_INT *ia , const
float *beta , float *b , MKL_INT *jb , MKL_INT *ib , float *c , MKL_INT *jc , MKL_INT
*ic , const MKL_INT *nzmax , MKL_INT *info );
void mkl_ccsradd (const char *trans , const MKL_INT *request , const MKL_INT *sort ,
const MKL_INT *m , const MKL_INT *n , MKL_Complex8 *a , MKL_INT *ja , MKL_INT *ia ,
const MKL_Complex8 *beta , MKL_Complex8 *b , MKL_INT *jb , MKL_INT *ib , MKL_Complex8
*c , MKL_INT *jc , MKL_INT *ic , const MKL_INT *nzmax , MKL_INT *info );
void mkl_zcsradd (const char *trans , const MKL_INT *request , const MKL_INT *sort ,
const MKL_INT *m , const MKL_INT *n , MKL_Complex16 *a , MKL_INT *ja , MKL_INT *ia ,
const MKL_Complex16 *beta , MKL_Complex16 *b , MKL_INT *jb , MKL_INT *ib ,
MKL_Complex16 *c , MKL_INT *jc , MKL_INT *ic , const MKL_INT *nzmax , MKL_INT *info );

Include Files
• mkl.h

Description

The mkl_?csradd routine performs a matrix-matrix operation defined as

C := A+beta*op(B)

where:
A, B, C are the sparse matrices in the CSR format (3-array variation).
op(B) is one of op(B) = B, op(B) = B^T, or op(B) = B^H
beta is a scalar.
The routine works correctly if and only if the column indices in sparse matrix representations of matrices A
and B are arranged in the increasing order for each row. If not, use the parameter sort (see below) to
reorder column indices and the corresponding elements of the input matrices.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

trans Specifies the operation.


If trans = 'N' or 'n', then C := A+beta*B

If trans = 'T' or 't', then C := A+beta*B^T

If trans = 'C' or 'c', then C := A+beta*B^H.

request If request=0, the routine performs addition. The memory for the output
arrays ic, jc, c must be allocated beforehand.


If request=1, the routine only computes the values of the array ic of
length m + 1. The memory for the ic array must be allocated beforehand.
On exit the value ic[m] - 1 is the actual number of elements in the
arrays c and jc.

If request=2, the routine performs addition, assuming that it has been called
previously with request=1 and that the output arrays jc and c have been
allocated in the calling program with length at least ic[m] - 1.

sort Specifies the type of reordering. If this parameter is not set to one of the
values described below, the routine does not perform reordering.
If sort=1, the routine arranges the column indices ja for each row in the
increasing order and reorders the corresponding values of the matrix A in
the array a.

If sort=2, the routine arranges the column indices jb for each row in the
increasing order and reorders the corresponding values of the matrix B in
the array b.

If sort=3, the routine performs reordering for both input matrices A and B.

m Number of rows of the matrix A.

n Number of columns of the matrix A.

a Array containing non-zero elements of the matrix A. Its length is equal to


the number of non-zero elements in the matrix A. Refer to values array
description in Sparse Matrix Storage Formats for more details.

ja Array containing the column indices plus one for each non-zero element of
the matrix A. For each row the column indices must be arranged in the
increasing order.
The length of this array is equal to the length of the array a. Refer to
columns array description in Sparse Matrix Storage Formats for more
details.

ia Array of length m + 1, containing indices of elements in the array a, such


that ia[i] - ia[0] is the index in the array a of the first non-zero
element from the row i. The value of the last element ia[m] is equal to the
number of non-zero elements of the matrix A plus one. Refer to rowIndex
array description in Sparse Matrix Storage Formats for more details.

beta Specifies the scalar beta.

b Array containing non-zero elements of the matrix B. Its length is equal to


the number of non-zero elements in the matrix B. Refer to values array
description in Sparse Matrix Storage Formats for more details.

jb Array containing the column indices plus one for each non-zero element of
the matrix B. For each row the column indices must be arranged in the
increasing order.
The length of this array is equal to the length of the array b. Refer to
columns array description in Sparse Matrix Storage Formats for more
details.

ib Array of length m + 1 when trans = 'N' or 'n', or n + 1 otherwise.

This array contains indices of elements in the array b, such that ib[i] -
ib[0] is the index in the array b of the first non-zero element from the row
i. The value of the last element ib[m] or ib[n] is equal to the number of
non-zero elements of the matrix B plus one. Refer to rowIndex array
description in Sparse Matrix Storage Formats for more details.

nzmax The length of the arrays c and jc.

This parameter is used only if request=0. The routine stops calculation if


the number of elements in the result matrix C exceeds the specified value
of nzmax.

Output Parameters

c Array containing non-zero elements of the result matrix C. Its length is


equal to the number of non-zero elements in the matrix C. Refer to values
array description in Sparse Matrix Storage Formats for more details.

jc Array containing the column indices plus one for each non-zero element of
the matrix C.
The length of this array is equal to the length of the array c. Refer to
columns array description in Sparse Matrix Storage Formats for more
details.

ic Array of length m + 1, containing indices of elements in the array c, such


that ic[i] - ic[0] is the index in the array c of the first non-zero
element from the row i. The value of the last element ic[m] is equal to the
number of non-zero elements of the matrix C plus one. Refer to rowIndex
array description in Sparse Matrix Storage Formats for more details.

info If info=0, the execution is successful.

If info=I>0, the routine stops calculation in the I-th row of the matrix C
because the number of elements in C exceeds nzmax.

If info=-1, the routine calculates only the size of the arrays c and jc and
returns this value plus 1 as the last element of the array ic.
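
A minimal sketch of the two-stage calling sequence (request=1 followed by request=2) is shown
below. The small matrices are arbitrary; passing one-element placeholder arrays for c and jc while
request=1 (assuming they are not referenced at that stage) and passing a zero nzmax (which is only
read when request=0) are assumptions of this sketch, not statements of the reference text.

#include <stdio.h>
#include <stdlib.h>
#include "mkl.h"

int main(void)
{
    /* A and B are 3x3 matrices in one-based CSR (3-array variation)     */
    MKL_INT m = 3, n = 3;

    double  a[]  = { 1.0, 2.0, 3.0, 4.0, 5.0 };   /* [1 0 2; 0 3 0; 4 0 5] */
    MKL_INT ja[] = { 1, 3, 2, 1, 3 };
    MKL_INT ia[] = { 1, 3, 4, 6 };

    double  b[]  = { 1.0, 1.0, 1.0, 1.0 };        /* [0 1 0; 1 0 1; 0 1 0] */
    MKL_INT jb[] = { 2, 1, 3, 2 };
    MKL_INT ib[] = { 1, 2, 4, 5 };

    double  beta = 1.0;
    char    trans = 'N';
    MKL_INT sort = 0, info, nzmax = 0;
    MKL_INT ic[4];                      /* m + 1 entries                  */
    double  c_dummy[1];
    MKL_INT jc_dummy[1];

    /* Stage 1: request=1 computes only ic; ic[m] - 1 is nnz of C = A + B */
    MKL_INT request = 1;
    mkl_dcsradd(&trans, &request, &sort, &m, &n, a, ja, ia,
                &beta, b, jb, ib, c_dummy, jc_dummy, ic, &nzmax, &info);

    MKL_INT nnz_c = ic[m] - 1;
    double  *c  = (double  *)malloc(nnz_c * sizeof(double));
    MKL_INT *jc = (MKL_INT *)malloc(nnz_c * sizeof(MKL_INT));

    /* Stage 2: request=2 fills c and jc using the ic computed above      */
    request = 2;
    mkl_dcsradd(&trans, &request, &sort, &m, &n, a, ja, ia,
                &beta, b, jb, ib, c, jc, ic, &nzmax, &info);

    printf("info = %d, nnz(C) = %d\n", (int)info, (int)nnz_c);
    free(c); free(jc);
    return 0;
}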

mkl_?csrmultcsr
Computes product of two sparse matrices stored in
the CSR format (3-array variation) with one-based
indexing.

Syntax
void mkl_dcsrmultcsr (const char *trans , const MKL_INT *request , const MKL_INT
*sort , const MKL_INT *m , const MKL_INT *n , const MKL_INT *k , double *a , MKL_INT
*ja , MKL_INT *ia , double *b , MKL_INT *jb , MKL_INT *ib , double *c , MKL_INT *jc ,
MKL_INT *ic , const MKL_INT *nzmax , MKL_INT *info );
void mkl_scsrmultcsr (const char *trans , const MKL_INT *request , const MKL_INT
*sort , const MKL_INT *m , const MKL_INT *n , const MKL_INT *k , float *a , MKL_INT
*ja , MKL_INT *ia , float *b , MKL_INT *jb , MKL_INT *ib , float *c , MKL_INT *jc ,
MKL_INT *ic , const MKL_INT *nzmax , MKL_INT *info );

void mkl_ccsrmultcsr (const char *trans , const MKL_INT *request , const MKL_INT
*sort , const MKL_INT *m , const MKL_INT *n , const MKL_INT *k , MKL_Complex8 *a ,
MKL_INT *ja , MKL_INT *ia , MKL_Complex8 *b , MKL_INT *jb , MKL_INT *ib , MKL_Complex8
*c , MKL_INT *jc , MKL_INT *ic , const MKL_INT *nzmax , MKL_INT *info );
void mkl_zcsrmultcsr (const char *trans , const MKL_INT *request , const MKL_INT
*sort , const MKL_INT *m , const MKL_INT *n , const MKL_INT *k , MKL_Complex16 *a ,
MKL_INT *ja , MKL_INT *ia , MKL_Complex16 *b , MKL_INT *jb , MKL_INT *ib ,
MKL_Complex16 *c , MKL_INT *jc , MKL_INT *ic , const MKL_INT *nzmax , MKL_INT *info );

Include Files
• mkl.h

Description

The mkl_?csrmultcsr routine performs a matrix-matrix operation defined as

C := op(A)*B

where:
A, B, C are the sparse matrices in the CSR format (3-array variation);
op(A) is one of op(A) = A, op(A) = A^T, or op(A) = A^H.
You can use the parameter sort to perform or not perform reordering of non-zero entries in the input and
output sparse matrices. The purpose of reordering is to rearrange the non-zero entries of a compressed sparse
row matrix so that the column indices in the compressed sparse representation are sorted in increasing order
for each row.
The following table shows the correspondence between the value of the parameter sort and the type of
reordering performed by this routine for each sparse matrix involved:

Value of the parameter sort     Reordering of A      Reordering of B      Reordering of C
                                (arrays a, ja, ia)   (arrays b, jb, ib)   (arrays c, jc, ic)
1                               yes                  no                   yes
2                               no                   yes                  yes
3                               yes                  yes                  yes
4                               yes                  no                   no
5                               no                   yes                  no
6                               yes                  yes                  no
7                               no                   no                   no
arbitrary value not equal
to 1, 2, ..., 7                 no                   no                   yes

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

trans Specifies the operation.


If trans = 'N' or 'n', then C := A*B

If trans = 'T' or 't' or 'C' or 'c', then C := A^T*B.

request If request=0, the routine performs multiplication; the memory for the
output arrays ic, jc, c must be allocated beforehand.

If request=1, the routine computes only the values of the array ic of length
m + 1; the memory for this array must be allocated beforehand. On exit the
value ic[m] - 1 is the actual number of elements in the arrays c and jc.

If request=2, the routine performs multiplication, assuming that it has been
called previously with request=1 and that the output arrays jc and c have
been allocated in the calling program with length at least ic[m] - 1.

sort Specifies whether the routine performs reordering of non-zero entries in
the input and/or output sparse matrices (see the table above).

m Number of rows of the matrix A.

n Number of columns of the matrix A.

k Number of columns of the matrix B.

a Array containing non-zero elements of the matrix A. Its length is equal to


the number of non-zero elements in the matrix A. Refer to values array
description in Sparse Matrix Storage Formats for more details.

ja Array containing the column indices plus one for each non-zero element of
the matrix A. For each row the column indices must be arranged in the
increasing order.
The length of this array is equal to the length of the array a. Refer to
columns array description in Sparse Matrix Storage Formats for more
details.

ia Array of length m + 1.

This array contains indices of elements in the array a, such that ia[i] -
ia[0] is the index in the array a of the first non-zero element from the row
i. The value of the last element ia[m] is equal to the number of non-zero
elements of the matrix A plus one. Refer to rowIndex array description in
Sparse Matrix Storage Formats for more details.

b Array containing non-zero elements of the matrix B. Its length is equal to


the number of non-zero elements in the matrix B. Refer to values array
description in Sparse Matrix Storage Formats for more details.

jb Array containing the column indices plus one for each non-zero element of
the matrix B. For each row the column indices must be arranged in the
increasing order.
The length of this array is equal to the length of the array b. Refer to
columns array description in Sparse Matrix Storage Formats for more
details.

ib Array of length n + 1 when trans = 'N' or 'n', or m + 1 otherwise.

This array contains indices of elements in the array b, such that ib[i] -
ib[0] is the index in the array b of the first non-zero element from the row
i. The value of the last element ib[n] or ib[m] is equal to the number of
non-zero elements of the matrix B plus one. Refer to rowIndex array
description in Sparse Matrix Storage Formats for more details.

nzmax The length of the arrays c and jc.

This parameter is used only if request=0. The routine stops calculation if


the number of elements in the result matrix C exceeds the specified value
of nzmax.

Output Parameters

c Array containing non-zero elements of the result matrix C. Its length is


equal to the number of non-zero elements in the matrix C. Refer to values
array description in Sparse Matrix Storage Formats for more details.

jc Array containing the column indices plus one for each non-zero element of
the matrix C.
The length of this array is equal to the length of the array c. Refer to
columns array description in Sparse Matrix Storage Formats for more
details.

ic Array of length m + 1 when trans = 'N' or 'n', or n + 1 otherwise.

This array contains indices of elements in the array c, such that ic[i] -
ic[0] is the index in the array c of the first non-zero element from the row
i. The value of the last element ic[m] or ic[n] is equal to the number of
non-zero elements of the matrix C plus one. Refer to rowIndex array
description in Sparse Matrix Storage Formats for more details.

info If info=0, the execution is successful.

If info=I>0, the routine stops calculation in the I-th row of the matrix C
because the number of elements in C exceeds nzmax.

If info=-1, the routine calculates only the size of the arrays c and jc and
returns this value plus 1 as the last element of the array ic.
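
The two-stage calling sequence carries over to mkl_?csrmultcsr. The sketch below multiplies two
small one-based CSR matrices; the one-element placeholder arrays used while request=1, the zero
nzmax, and the sort value of 7 (no reordering, since the inputs are already sorted) are assumptions
of this sketch.

#include <stdio.h>
#include <stdlib.h>
#include "mkl.h"

int main(void)
{
    /* A is 2x3, B is 3x2, both in one-based CSR (3-array variation)      */
    MKL_INT m = 2, n = 3, k = 2;

    double  a[]  = { 1.0, 2.0, 3.0 };             /* [1 0 2; 0 3 0]       */
    MKL_INT ja[] = { 1, 3, 2 };
    MKL_INT ia[] = { 1, 3, 4 };

    double  b[]  = { 1.0, 1.0, 1.0, 1.0 };        /* [1 0; 0 1; 1 1]      */
    MKL_INT jb[] = { 1, 2, 1, 2 };
    MKL_INT ib[] = { 1, 2, 3, 5 };

    char    trans = 'N';
    MKL_INT sort = 7;               /* no reordering of A, B, or C        */
    MKL_INT info, nzmax = 0;
    MKL_INT ic[3];                  /* m + 1 entries                      */
    double  c_dummy[1];
    MKL_INT jc_dummy[1];

    /* Stage 1: compute the row-pointer array ic only                     */
    MKL_INT request = 1;
    mkl_dcsrmultcsr(&trans, &request, &sort, &m, &n, &k, a, ja, ia,
                    b, jb, ib, c_dummy, jc_dummy, ic, &nzmax, &info);

    MKL_INT nnz_c = ic[m] - 1;
    double  *c  = (double  *)malloc(nnz_c * sizeof(double));
    MKL_INT *jc = (MKL_INT *)malloc(nnz_c * sizeof(MKL_INT));

    /* Stage 2: fill jc and c with the product C = A*B                    */
    request = 2;
    mkl_dcsrmultcsr(&trans, &request, &sort, &m, &n, &k, a, ja, ia,
                    b, jb, ib, c, jc, ic, &nzmax, &info);

    printf("info = %d, nnz(C) = %d\n", (int)info, (int)nnz_c);
    free(c); free(jc);
    return 0;
}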

mkl_?csrmultd
Computes product of two sparse matrices stored in
the CSR format (3-array variation) with one-based
indexing. The result is stored in the dense matrix.

Syntax
void mkl_dcsrmultd (const char *trans , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , double *a , MKL_INT *ja , MKL_INT *ia , double *b , MKL_INT *jb , MKL_INT
*ib , double *c , MKL_INT *ldc );
void mkl_scsrmultd (const char *trans , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , float *a , MKL_INT *ja , MKL_INT *ia , float *b , MKL_INT *jb , MKL_INT
*ib , float *c , MKL_INT *ldc );

void mkl_ccsrmultd (const char *trans , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , MKL_Complex8 *a , MKL_INT *ja , MKL_INT *ia , MKL_Complex8 *b , MKL_INT
*jb , MKL_INT *ib , MKL_Complex8 *c , MKL_INT *ldc );
void mkl_zcsrmultd (const char *trans , const MKL_INT *m , const MKL_INT *n , const
MKL_INT *k , MKL_Complex16 *a , MKL_INT *ja , MKL_INT *ia , MKL_Complex16 *b , MKL_INT
*jb , MKL_INT *ib , MKL_Complex16 *c , MKL_INT *ldc );

Include Files
• mkl.h

Description

The mkl_?csrmultd routine performs a matrix-matrix operation defined as

C := op(A)*B

where:
A, B are the sparse matrices in the CSR format (3-array variation), C is dense matrix;
op(A) is one of op(A) = A, op(A) = A^T, or op(A) = A^H.
The routine works correctly if and only if the column indices in sparse matrix representations of matrices A
and B are arranged in the increasing order for each row.

NOTE
This routine supports only one-based indexing of the input arrays.

Input Parameters

trans Specifies the operation.


If trans = 'N' or 'n', then C := A*B

If trans = 'T' or 't' or 'C' or 'c', then C := A^T*B.

m Number of rows of the matrix A.

n Number of columns of the matrix A.

k Number of columns of the matrix B.

a Array containing non-zero elements of the matrix A. Its length is equal to


the number of non-zero elements in the matrix A. Refer to values array
description in Sparse Matrix Storage Formats for more details.

ja Array containing the column indices plus one for each non-zero element of
the matrix A. For each row the column indices must be arranged in the
increasing order.
The length of this array is equal to the length of the array a. Refer to
columns array description in Sparse Matrix Storage Formats for more
details.

ia Array of length m + 1 when trans = 'N' or 'n', or n + 1 otherwise.

This array contains indices of elements in the array a, such that ia[i] -
ia[0] is the index in the array a of the first non-zero element from the row
i. The value of the last element ia[m] or ia[n] is equal to the number of
non-zero elements of the matrix A plus one. Refer to rowIndex array
description in Sparse Matrix Storage Formats for more details.

b Array containing non-zero elements of the matrix B. Its length is equal to


the number of non-zero elements in the matrix B. Refer to values array
description in Sparse Matrix Storage Formats for more details.

jb Array containing the column indices plus one for each non-zero element of
the matrix B. For each row the column indices must be arranged in the
increasing order.
The length of this array is equal to the length of the array b. Refer to
columns array description in Sparse Matrix Storage Formats for more
details.

ib Array of length m + 1.

This array contains indices of elements in the array b, such that ib[i] -
ib[0] is the index in the array b of the first non-zero element from the row
i. The value of the last element ib[m] is equal to the number of non-zero
elements of the matrix B plus one. Refer to rowIndex array description in
Sparse Matrix Storage Formats for more details.

Output Parameters

c Array containing the elements of the dense result matrix C.

ldc Specifies the leading dimension of the dense matrix C as declared in the
calling (sub)program. Must be at least max(m, 1) when trans = 'N' or
'n', or max(1, n) otherwise.
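
For illustration, a small sketch follows. It assumes that the dense result C is stored column by
column with leading dimension ldc (consistent with the requirement that ldc be at least max(m, 1)
for trans = 'N'); the matrices themselves are arbitrary.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* A is 2x3, B is 3x2, both in one-based CSR; C = A*B is dense 2x2    */
    MKL_INT m = 2, n = 3, k = 2;

    double  a[]  = { 1.0, 2.0, 3.0 };             /* [1 0 2; 0 3 0]       */
    MKL_INT ja[] = { 1, 3, 2 };
    MKL_INT ia[] = { 1, 3, 4 };

    double  b[]  = { 1.0, 1.0, 1.0, 1.0 };        /* [1 0; 0 1; 1 1]      */
    MKL_INT jb[] = { 1, 2, 1, 2 };
    MKL_INT ib[] = { 1, 2, 3, 5 };

    char    trans = 'N';
    MKL_INT ldc = 2;                     /* at least max(m, 1) for 'N'    */
    double  c[4] = { 0.0 };              /* ldc * k, assumed column-major */

    mkl_dcsrmultd(&trans, &m, &n, &k, a, ja, ia, b, jb, ib, c, &ldc);

    for (MKL_INT i = 0; i < m; i++) {
        for (MKL_INT j = 0; j < k; j++)
            printf(" %5.1f", c[i + j * ldc]);     /* column-major access  */
        printf("\n");
    }
    return 0;
}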

Inspector-executor Sparse BLAS Routines


The inspector-executor API for Sparse BLAS divides operations into two stages: analysis and execution.
During the initial analysis stage, the API inspects the matrix sparsity pattern and applies matrix structure
changes. In the execution stage, subsequent routine calls reuse this information in order to improve
performance.
The inspector-executor API supports key Sparse BLAS operations for iterative sparse solvers:

• Sparse matrix-vector multiplication


• Sparse matrix-matrix multiplication with a sparse or dense result
• Solution of triangular systems
• Sparse matrix addition

Naming conventions in Inspector-executor Sparse BLAS Routines


The Inspector-executor Sparse BLAS API routine names use the following convention:
mkl_sparse_[<character>_]<operation>[_<format>]

The <character> field indicates the data type:

s real, single precision

c complex, single precision

d real, double precision

z complex, double precision

The data type is included in the name only if the function accepts dense matrix or scalar floating point
parameters.
The <operation> field indicates the type of operation:

create create matrix handle

copy create a copy of matrix handle

convert convert matrix between sparse formats

export export matrix from internal representation to CSR or BSR format

destroy frees memory allocated for matrix handle

set_<op>_hint provide information about number of upcoming compute operations and
operation type for optimization purposes, where <op> is mv, sv, mm, sm, or
memory

optimize analyze the matrix using hints and store optimization information in matrix
handle

mv compute sparse matrix-vector product

mm compute sparse matrix by dense matrix product (batch mv)

spmm/spmmd compute sparse matrix by sparse matrix product and store the result as a
sparse/dense matrix

trsv solve a triangular system

trsm solve a triangular system with multiple right-hand sides

add compute sum of two sparse matrices

The <format> field indicates the sparse matrix storage format:

coo coordinate format

csr compressed sparse row format plus variations

csc compressed sparse column format plus variations

bsr block sparse row format plus variations

The format is included in the function name only if the function parameters include an explicit sparse matrix
in one of the conventional sparse matrix formats.

Sparse Matrix Storage Formats for Inspector-executor Sparse BLAS Routines


Inspector-executor Sparse BLAS routines support four conventional sparse matrix storage formats:

• compressed sparse row format (CSR) plus variations


• compressed sparse column format (CSC) plus variations


• coordinate format (COO)
• block sparse row format (BSR) plus variations
Computational routines operate on a matrix handle that stores a matrix in CSR or BSR formats. Other
formats should be converted to CSR or BSR format before calling any computational routines. For more
information see Sparse Matrix Storage Formats.

Supported Inspector-executor Sparse BLAS Operations


The Inspector-executor Sparse BLAS API can perform several operations involving sparse matrices. These
notations are used in the description of the operations:

• A, G, V are sparse matrices


• B and C are dense matrices
• x and y are dense vectors
• alpha and beta are scalars

op(A) represents a possible transposition of matrix A:
op(A) = A
op(A) = A^T - transpose of A
op(A) = A^H - conjugate transpose of A

inv(op(A)) denotes the inverse of op(A).


The Inspector-executor Sparse BLAS routines support the following operations (a brief usage sketch follows the list):

• computing the vector product between a sparse matrix and a dense vector:
y := alpha*op(A)*x + beta*y
• solving a single triangular system:
y := alpha*inv(op(A))*x

• computing a product between a sparse matrix and a dense matrix:


C := alpha*op(A)*B + beta*C
• computing a product between sparse matrices with a sparse result:
V := alpha*op(A)*G
• computing a product between sparse matrices with a dense result:
C := alpha*op(A)*G
• computing a sum of sparse matrices with a sparse result:
V := alpha*op(A) + G
• solving a sparse triangular system with multiple right-hand sides:
C := alpha*inv(op(A))*B

Matrix manipulation routines


The Matrix Manipulation routines table lists the Matrix Manipulation routines and the data types associated
with them.

Matrix Manipulation Routines and Their Data Types
Routine or Function Group    Data Types    Description

mkl_sparse_?_create_csr      s, d, c, z    Creates a handle for a CSR format matrix.

mkl_sparse_?_create_csc      s, d, c, z    Creates a handle for a CSC format matrix.

mkl_sparse_?_create_coo      s, d, c, z    Creates a handle for a matrix in COO format.

mkl_sparse_?_create_bsr      s, d, c, z    Creates a handle for a matrix in BSR format.

mkl_sparse_copy              NA            Creates a copy of a matrix handle.

mkl_sparse_destroy           NA            Frees memory allocated for matrix handle.

mkl_sparse_convert_csr       NA            Converts internal matrix representation to CSR format.

mkl_sparse_convert_bsr       NA            Converts internal matrix representation to BSR format or changes BSR block size.

mkl_sparse_?_export_csr      s, d, c, z    Exports CSR matrix from internal representation.

mkl_sparse_?_export_bsr      s, d, c, z    Exports BSR matrix from internal representation.

mkl_sparse_?_set_value       s, d, c, z    Changes a single value of matrix in internal representation.

mkl_sparse_?_create_csr
Creates a handle for a CSR format matrix.

Syntax
sparse_status_t mkl_sparse_s_create_csr (sparse_matrix_t *A, sparse_index_base_t
indexing, MKL_INT rows, MKL_INT cols, MKL_INT *rows_start, MKL_INT *rows_end, MKL_INT
*col_indx, float *values);
sparse_status_t mkl_sparse_d_create_csr (sparse_matrix_t *A, sparse_index_base_t
indexing, MKL_INT rows, MKL_INT cols, MKL_INT *rows_start, MKL_INT *rows_end, MKL_INT
*col_indx, double *values);
sparse_status_t mkl_sparse_c_create_csr (sparse_matrix_t *A, sparse_index_base_t
indexing, MKL_INT rows, MKL_INT cols, MKL_INT *rows_start, MKL_INT *rows_end, MKL_INT
*col_indx, MKL_Complex8 *values);
sparse_status_t mkl_sparse_z_create_csr (sparse_matrix_t *A, sparse_index_base_t
indexing, MKL_INT rows, MKL_INT cols, MKL_INT *rows_start, MKL_INT *rows_end, MKL_INT
*col_indx, MKL_Complex16 *values);

Include Files
• mkl_spblas.h
Description
The mkl_sparse_?_create_csr routine creates a handle for an m-by-k matrix A in CSR format.


Input Parameters

indexing Indicates how input arrays are indexed.

SPARSE_INDEX_BASE_ZERO Zero-based (C-style) indexing: indices start at 0.

SPARSE_INDEX_BASE_ONE One-based (Fortran-style) indexing: indices start at 1.

rows Number of rows of matrix A.

cols Number of columns of matrix A.

rows_start Array of length at least m. This array contains row indices, such that
rows_start[i] - rows_start[0] is the first index of row i in the arrays
values and col_indx.
Refer to pointerb array description in CSR Format for more details.

rows_end Array of at least length m. This array contains row indices, such that
rows_end[i] - rows_start[0] - 1 is the last index of row i in the arrays
values and col_indx.
Refer to pointerE array description in CSR Format for more details.

col_indx For one-based indexing, array containing the column indices plus one for
each non-zero element of the matrix A. For zero-based indexing, array
containing the column indices for each non-zero element of the matrix A.
Its length is at least rows_end[rows - 1] - rows_start[0].

values Array containing non-zero elements of the matrix A. Its length is equal to
length of the col_indx array.

Refer to values array description in CSR Format for more details.

Output Parameters

A Handle containing internal data for subsequent Inspector-executor Sparse


BLAS operations.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.
SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.
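
As an illustration, a matrix already stored in the one-based 3-array CSR layout used by the
routines earlier in this chapter can be wrapped in a handle without copying: rows_start points at
the first m entries of the row-pointer array, and rows_end is the same array shifted by one
element. The helper name below is hypothetical, and error handling is reduced to the returned
status.

#include "mkl_spblas.h"

/* Hypothetical helper: wrap a one-based 3-array CSR matrix (ia, ja, acsr)
   of size m-by-k in an Inspector-executor handle.                        */
sparse_status_t wrap_csr_3array(MKL_INT m, MKL_INT k,
                                MKL_INT *ia, MKL_INT *ja, double *acsr,
                                sparse_matrix_t *A)
{
    return mkl_sparse_d_create_csr(A, SPARSE_INDEX_BASE_ONE, m, k,
                                   ia,      /* rows_start: ia[0..m-1] */
                                   ia + 1,  /* rows_end:   ia[1..m]   */
                                   ja, acsr);
}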

mkl_sparse_?_create_csc
Creates a handle for a CSC format matrix.

Syntax
sparse_status_t mkl_sparse_s_create_csc (sparse_matrix_t *A, sparse_index_base_t
indexing, MKL_INT rows, MKL_INT cols, MKL_INT *rows_start, MKL_INT *rows_end, MKL_INT
*col_indx, float *values);
sparse_status_t mkl_sparse_d_create_csc (sparse_matrix_t *A, sparse_index_base_t
indexing, MKL_INT rows, MKL_INT cols, MKL_INT *rows_start, MKL_INT *rows_end, MKL_INT
*col_indx, double *values);
sparse_status_t mkl_sparse_c_create_csc (sparse_matrix_t *A, sparse_index_base_t
indexing, MKL_INT rows, MKL_INT cols, MKL_INT *rows_start, MKL_INT *rows_end, MKL_INT
*col_indx, MKL_Complex8 *values);
sparse_status_t mkl_sparse_z_create_csc (sparse_matrix_t *A, sparse_index_base_t
indexing, MKL_INT rows, MKL_INT cols, MKL_INT *rows_start, MKL_INT *rows_end, MKL_INT
*col_indx, MKL_Complex16 *values);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_?_create_csc routine creates a handle for an m-by-k matrix A in CSC format.

Input Parameters

indexing Indicates how input arrays are indexed.

SPARSE_INDEX_BASE_ZERO Zero-based (C-style) indexing: indices start at 0.

SPARSE_INDEX_BASE_ONE One-based (Fortran-style) indexing: indices start at 1.

rows Number of rows of the matrix A.

cols Number of columns of the matrix A.

rows_start Array of length at least m. This array contains row indices, such that
rows_start[i] - rows_start[0] is the first index of row i in the arrays
values and col_indx.
Refer to pointerb array description in CSC Format for more details.

rows_end Array of at least length m. This array contains row indices, such that
rows_end[i] - rows_start[0] - 1 is the last index of row i in the arrays
values and col_indx.
Refer to pointerE array description in CSC Format for more details.


col_indx For one-based indexing, array containing the column indices plus one for
each non-zero element of the matrix A. For zero-based indexing, array
containing the column indices for each non-zero element of the matrix A.
Its length is at least rows_end[rows - 1] - rows_start[0].

values Array containing non-zero elements of the matrix A. Its length is equal to
length of the col_indx array.

Refer to values array description in CSC Format for more details.

Output Parameters

A Handle containing internal data.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.
SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_?_create_coo
Creates a handle for a matrix in COO format.

Syntax
sparse_status_t mkl_sparse_s_create_coo (sparse_matrix_t *A, sparse_index_base_t
indexing, MKL_INT rows, MKL_INT cols, MKL_INT nnz, MKL_INT *row_indx, MKL_INT *
col_indx, float *values);
sparse_status_t mkl_sparse_d_create_coo (sparse_matrix_t *A, sparse_index_base_t
indexing, MKL_INT rows, MKL_INT cols, MKL_INT nnz, MKL_INT *row_indx, MKL_INT *
col_indx, double *values);
sparse_status_t mkl_sparse_c_create_coo (sparse_matrix_t *A, sparse_index_base_t
indexing, MKL_INT rows, MKL_INT cols, MKL_INT nnz, MKL_INT *row_indx, MKL_INT *
col_indx, MKL_Complex8 *values);
sparse_status_t mkl_sparse_z_create_coo (sparse_matrix_t *A, sparse_index_base_t
indexing, MKL_INT rows, MKL_INT cols, MKL_INT nnz, MKL_INT *row_indx, MKL_INT *
col_indx, MKL_Complex16 *values);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_?_create_coo routine creates a handle for an m-by-k matrix A in COO format.

Input Parameters

indexing Indicates how input arrays are indexed.

SPARSE_INDEX_BASE_ZERO Zero-based (C-style) indexing: indices start at 0.

SPARSE_INDEX_BASE_ONE One-based (Fortran-style) indexing: indices start at 1.

rows Number of rows of matrix A.

cols Number of columns of matrix A.

nnz Specifies the number of non-zero elements of the matrix A.


Refer to nnz description in Coordinate Format for more details.

row_indx Array of length nnz, containing the row indices for each non-zero element
of matrix A.
Refer to rows array description in Coordinate Format for more details.

col_indx Array of length nnz, containing the column indices for each non-zero
element of matrix A.
Refer to columns array description in Coordinate Format for more details.

values Array of length nnz, containing the non-zero elements of matrix A in


arbitrary order.
Refer to values array description in Coordinate Format for more details.

Output Parameters

A Handle containing internal data.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.
SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.
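
A minimal sketch of creating a COO handle follows; the 3-by-3 matrix, the zero-based indexing, and
the entry order are arbitrary choices for illustration.

#include <stdio.h>
#include "mkl_spblas.h"

int main(void)
{
    /* 3x3 matrix in zero-based COO format; entries may appear in any order:
       [ 1 0 2 ]
       [ 0 3 0 ]
       [ 4 0 5 ]                                                            */
    MKL_INT nnz = 5;
    MKL_INT row_indx[] = { 2, 0, 1, 0, 2 };
    MKL_INT col_indx[] = { 0, 0, 1, 2, 2 };
    double  values[]   = { 4.0, 1.0, 3.0, 2.0, 5.0 };

    sparse_matrix_t A;
    sparse_status_t st = mkl_sparse_d_create_coo(&A, SPARSE_INDEX_BASE_ZERO,
                                                 3, 3, nnz,
                                                 row_indx, col_indx, values);
    if (st != SPARSE_STATUS_SUCCESS) {
        printf("mkl_sparse_d_create_coo failed: %d\n", (int)st);
        return 1;
    }

    /* ... typically convert to CSR or BSR before calling computational
       routines (see mkl_sparse_convert_csr below) ...                      */

    mkl_sparse_destroy(A);
    return 0;
}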


mkl_sparse_?_create_bsr
Creates a handle for a matrix in BSR format.

Syntax
sparse_status_t mkl_sparse_s_create_bsr (sparse_matrix_t *A, sparse_index_base_t
indexing, sparse_layout_t block_layout, MKL_INT rows, MKL_INT cols, MKL_INT
block_size, MKL_INT *rows_start, MKL_INT *rows_end, MKL_INT *col_indx, float *values);
sparse_status_t mkl_sparse_d_create_bsr (sparse_matrix_t *A, sparse_index_base_t
indexing, sparse_layout_t block_layout, MKL_INT rows, MKL_INT cols, MKL_INT
block_size, MKL_INT *rows_start, MKL_INT *rows_end, MKL_INT *col_indx, double *values);
sparse_status_t mkl_sparse_c_create_bsr (sparse_matrix_t *A, sparse_index_base_t
indexing, sparse_layout_t block_layout, MKL_INT rows, MKL_INT cols, MKL_INT
block_size, MKL_INT *rows_start, MKL_INT *rows_end, MKL_INT *col_indx, MKL_Complex8
*values);
sparse_status_t mkl_sparse_z_create_bsr (sparse_matrix_t *A, sparse_index_base_t
indexing, sparse_layout_t block_layout, MKL_INT rows, MKL_INT cols, MKL_INT
block_size, MKL_INT *rows_start, MKL_INT *rows_end, MKL_INT *col_indx, MKL_Complex16
*values);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_?_create_bsr routine creates a handle for an m-by-k matrix A in BSR format.

Input Parameters

indexing Indicates how input arrays are indexed.

SPARSE_INDEX_BASE_ZERO Zero-based (C-style) indexing: indices start at 0.

SPARSE_INDEX_BASE_ONE One-based (Fortran-style) indexing: indices start at 1.

block_layout Specifies layout of blocks:

SPARSE_LAYOUT_ROW_MAJOR Storage of elements of blocks uses row major layout.

SPARSE_LAYOUT_COLUMN_MAJOR Storage of elements of blocks uses column major layout.

NOTE
If you specify SPARSE_INDEX_BASE_ZERO for indexing, you must use
SPARSE_LAYOUT_ROW_MAJOR for block_layout. Similarly, if you
specify SPARSE_INDEX_BASE_ONE for indexing, you must use
SPARSE_LAYOUT_COLUMN_MAJOR for block_layout. Otherwise
mkl_sparse_?_create_bsr returns
SPARSE_STATUS_NOT_SUPPORTED.

rows Number of block rows of matrix A.

cols Number of block columns of matrix A.

block_size Size of blocks in matrix A.

rows_start Array of length m. This array contains row indices, such that
rows_start[i] - rows_start[0] is the first index of block row i in the
arrays values and col_indx.

Refer to pointerb array description in CSR Format for more details.

rows_end Array of length m. This array contains row indices, such that rows_end[i]
- rows_start[0] - 1 is the last index of block row i in the arrays values
and col_indx.

Refer to pointerE array description in CSR Format for more details.

col_indx For one-based indexing, array containing the column indices plus one for
each non-zero block of the matrix A. For zero-based indexing, array
containing the column indices for each non-zero block of the matrix A. Its
length is rows_end[rows - 1] - rows_start[0].

values Array containing non-zero elements of the matrix A. Its length is equal to
length of the col_indx array multiplied by block_size*block_size.

Refer to the values array description in BSR Format for more details.

Output Parameters

A Handle containing internal data.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.


SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_copy
Creates a copy of a matrix handle.

Syntax
sparse_status_t mkl_sparse_copy (const sparse_matrix_t source, struct matrix_descr
descr, sparse_matrix_t *dest);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_copy routine creates a copy of a matrix handle.

Input Parameters

source Specifies handle containing internal data.

descr Structure specifying sparse matrix properties.

sparse_matrix_type_t type - Specifies the type of a sparse matrix:

SPARSE_MATRIX_TYPE_GENERAL The matrix is processed as is.

SPARSE_MATRIX_TYPE_SYMMETRIC The matrix is symmetric (only the requested
triangle is processed).

SPARSE_MATRIX_TYPE_HERMITIAN The matrix is Hermitian (only the requested
triangle is processed).

SPARSE_MATRIX_TYPE_TRIANGULAR The matrix is triangular (only the requested
triangle is processed).

SPARSE_MATRIX_TYPE_DIAGONAL The matrix is diagonal (only diagonal elements
are processed).

SPARSE_MATRIX_TYPE_BLOCK_TRIANGULAR The matrix is block-triangular (only the
requested triangle is processed). (Applies to BSR format only.)

SPARSE_MATRIX_TYPE_BLOCK_DIAGONAL The matrix is block-diagonal (only diagonal
blocks are processed). (Applies to BSR format only.)

sparse_fill_mode_t mode - Specifies the triangular matrix part for
symmetric, Hermitian, triangular, and block-triangular matrices:

SPARSE_FILL_MODE_LOWER The lower triangular matrix part is processed.

SPARSE_FILL_MODE_UPPER The upper triangular matrix part is processed.

sparse_diag_type_t diag - Specifies diagonal type for non-general
matrices:

SPARSE_DIAG_NON_UNIT Diagonal elements might not be equal to one.

SPARSE_DIAG_UNIT Diagonal elements are equal to one.

Output Parameters

dest Handle containing internal data.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.
SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.
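
A small sketch of copying a handle is shown below. The matrix is treated as general, so the mode
and diag fields of the descriptor are not relevant and are left unset; the wrapper name is
illustrative.

#include "mkl_spblas.h"

/* Illustrative helper: duplicate a matrix handle, treating the matrix as
   general (no triangle selection; descr.mode and descr.diag not used).   */
sparse_status_t duplicate_handle(const sparse_matrix_t src, sparse_matrix_t *dst)
{
    struct matrix_descr descr;
    descr.type = SPARSE_MATRIX_TYPE_GENERAL;   /* the matrix is processed as is */
    return mkl_sparse_copy(src, descr, dst);
}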

mkl_sparse_destroy
Frees memory allocated for matrix handle.

Syntax
sparse_status_t mkl_sparse_destroy (sparse_matrix_t A);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_destroy routine frees memory allocated for matrix handle.

NOTE
You must free memory allocated for matrices after completing use of them. The mkl_sparse_destroy
provides a utility to do so.


Input Parameters

A Handle containing internal data.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.
SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_convert_csr
Converts internal matrix representation to CSR
format.

Syntax
sparse_status_t mkl_sparse_convert_csr (const sparse_matrix_t source,
sparse_operation_t operation, sparse_matrix_t *dest);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_convert_csr routine converts internal matrix representation to CSR format.

Input Parameters

source Handle containing internal data.

operation Specifies operation op() on input matrix.

SPARSE_OPERATION_NON_TRANSPOSE Non-transpose, op(A) = A.

SPARSE_OPERATION_TRANSPOSE Transpose, op(A) = A^T.

SPARSE_OPERATION_CONJUGATE_TRANSPOSE Conjugate transpose, op(A) = A^H.

Output Parameters

dest Handle containing internal data.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.
SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.
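
As a sketch, the helper below (its name is illustrative) builds a handle from COO data and then
obtains an equivalent handle in CSR format so that computational routines can operate on it.

#include "mkl_spblas.h"

/* Illustrative helper: wrap zero-based COO data and convert it to CSR.   */
sparse_status_t coo_to_csr(MKL_INT rows, MKL_INT cols, MKL_INT nnz,
                           MKL_INT *row_indx, MKL_INT *col_indx,
                           double *values, sparse_matrix_t *A_csr)
{
    sparse_matrix_t A_coo;
    sparse_status_t st;

    st = mkl_sparse_d_create_coo(&A_coo, SPARSE_INDEX_BASE_ZERO,
                                 rows, cols, nnz,
                                 row_indx, col_indx, values);
    if (st != SPARSE_STATUS_SUCCESS) return st;

    /* no transposition is applied during the conversion                  */
    st = mkl_sparse_convert_csr(A_coo, SPARSE_OPERATION_NON_TRANSPOSE, A_csr);

    /* the COO handle is no longer needed once the CSR copy exists        */
    mkl_sparse_destroy(A_coo);
    return st;
}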

mkl_sparse_convert_bsr
Converts internal matrix representation to BSR format
or changes BSR block size.

Syntax
sparse_status_t mkl_sparse_convert_bsr (const sparse_matrix_t source, MKL_INT
block_size, sparse_layout_t block_layout, sparse_operation_t operation, sparse_matrix_t
*dest);

Include Files
• mkl_spblas.h

Description
Themkl_sparse_convert_bsr routine converts internal matrix representation to BSR format or changes
BSR block size.

Input Parameters

source Handle containing internal data.

block_size Size of the block in the output structure.

block_layout Specifies layout of blocks:

SPARSE_LAYOUT_ROW_MAJOR Storage of elements of blocks uses row major layout.

SPARSE_LAYOUT_COLUMN_MAJOR Storage of elements of blocks uses column major layout.

operation Specifies operation op() on input matrix.


SPARSE_OPERATION_NON_TRANSPOSE Non-transpose, op(A) = A.

SPARSE_OPERATION_TRANSPOSE Transpose, op(A) = A^T.

SPARSE_OPERATION_CONJUGATE_TRANSPOSE Conjugate transpose, op(A) = A^H.

Output Parameters

dest Handle containing internal data.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.
SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_?_export_csr
Exports CSR matrix from internal representation.

Syntax
sparse_status_t mkl_sparse_s_export_csr (const sparse_matrix_t *source,
sparse_index_base_t *indexing, MKL_INT *rows, MKL_INT *cols, MKL_INT **rows_start,
MKL_INT **rows_end, MKL_INT **col_indx, float **values);
sparse_status_t mkl_sparse_d_export_csr (const sparse_matrix_t *source,
sparse_index_base_t *indexing, MKL_INT *rows, MKL_INT *cols, MKL_INT **rows_start,
MKL_INT **rows_end, MKL_INT **col_indx, double **values);
sparse_status_t mkl_sparse_c_export_csr (const sparse_matrix_t *source,
sparse_index_base_t *indexing, MKL_INT *rows, MKL_INT *cols, MKL_INT **rows_start,
MKL_INT **rows_end, MKL_INT **col_indx, MKL_Complex8 **values);
sparse_status_t mkl_sparse_z_export_csr (const sparse_matrix_t *source,
sparse_index_base_t *indexing, MKL_INT *rows, MKL_INT *cols, MKL_INT **rows_start,
MKL_INT **rows_end, MKL_INT **col_indx, MKL_Complex16 **values);

Include Files
• mkl_spblas.h

Description
If the matrix specified by the source handle is in CSR format, the mkl_sparse_?_export_csr routine
exports an m-by-k matrix A in CSR format from the internal representation. The routine returns
pointers to the internal representation and does not allocate additional memory.

NOTE
Since the exported data is a copy of an internally stored structure, any changes made to it have no
effect on subsequent Inspector-executor Sparse BLAS operations.

If the matrix is not already in CSR format, the routine returns SPARSE_STATUS_INVALID_VALUE.

Input Parameters

source Handle containing internal data.

Output Parameters

indexing Indicates how input arrays are indexed.

SPARSE_INDEX_BASE_ZERO Zero-based (C-style) indexing: indices start at 0.

SPARSE_INDEX_BASE_ONE One-based (Fortran-style) indexing: indices start at 1.

rows Number of rows of the matrix source.

cols Number of columns of the matrix source.

rows_start Pointer to array of length m. This array contains row indices, such that
rows_start[i] - rows_start[0] is the first index of row i in the arrays
values and col_indx.
Refer to pointerb array description in CSR Format for more details.

rows_end Pointer to array of length m. This array contains row indices, such that
rows_end[i] - rows_start[0] - 1 is the last index of row i in the arrays
values and col_indx.
Refer to pointerE array description in CSR Format for more details.

col_indx For one-based indexing, pointer to array containing the column indices plus
one for each non-zero element of the matrix source. For zero-based
indexing, pointer to array containing the column indices for each non-zero
element of the matrix source. Its length is rows_end[rows - 1] -
rows_start[0].

values Pointer to array containing non-zero elements of the matrix A. Its length is
equal to length of the col_indx array.

Refer to values array description in CSR Format for more details.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.


SPARSE_STATUS_SUCCESS The operation was successful.

SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.
SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_?_export_bsr
Exports BSR matrix from internal representation.

Syntax
sparse_status_t mkl_sparse_s_export_bsr (const sparse_matrix_t source,
sparse_index_base_t *indexing, sparse_layout_t *block_layout, MKL_INT *rows, MKL_INT
*cols, MKL_INT *block_size, MKL_INT **rows_start, MKL_INT **rows_end, MKL_INT
**col_indx, float **values);
sparse_status_t mkl_sparse_d_export_bsr (const sparse_matrix_t source,
sparse_index_base_t *indexing, sparse_layout_t *block_layout, MKL_INT *rows, MKL_INT
*cols, MKL_INT *block_size, MKL_INT **rows_start, MKL_INT **rows_end, MKL_INT
**col_indx, double **values);
sparse_status_t mkl_sparse_c_export_bsr (const sparse_matrix_t source,
sparse_index_base_t *indexing, sparse_layout_t *block_layout, MKL_INT *rows, MKL_INT
*cols, MKL_INT *block_size, MKL_INT **rows_start, MKL_INT **rows_end, MKL_INT
**col_indx, MKL_Complex8 **values);
sparse_status_t mkl_sparse_z_export_bsr (const sparse_matrix_t source,
sparse_index_base_t *indexing, sparse_layout_t *block_layout, MKL_INT *rows, MKL_INT
*cols, MKL_INT *block_size, MKL_INT **rows_start, MKL_INT **rows_end, MKL_INT
**col_indx, MKL_Complex16 **values);

Include Files
• mkl_spblas.h

Description
If the matrix specified by the source handle is in BSR format, the mkl_sparse_?_export_bsr routine
exports an m-by-k matrix A in BSR format from the internal representation. The routine returns pointers to
the internal representation and does not allocate additional memory.

NOTE
Since the exported data is a copy of an internally stored structure, any changes made to it have no
effect on subsequent Inspector-executor Sparse BLAS operations.

If the matrix is not already in BSR format, the routine returns SPARSE_STATUS_INVALID_VALUE.

Input Parameters

source Handle containing internal data.

Output Parameters

indexing Indicates how input arrays are indexed.

SPARSE_INDEX_BASE_ZERO Zero-based (C-style) indexing: indices start at 0.

SPARSE_INDEX_BASE_ONE One-based (Fortran-style) indexing: indices start at 1.

block_layout Specifies layout of blocks:

SPARSE_LAYOUT_ROW_MAJOR Storage of elements of blocks uses row major layout.

SPARSE_LAYOUT_COLUMN_MAJOR Storage of elements of blocks uses column major layout.

rows Number of block rows of the matrix source.

cols Number of block columns of the matrix source.

block_size Size of the block in matrix source.

rows_start Pointer to array of length m. This array contains row indices, such that
rows_start[i] - rows_start[0] is the first index of block row i in the
arrays values and col_indx.

Refer to pointerb array description in CSR Format for more details.

rows_end Pointer to array of length m. This array contains row indices, such that
rows_end[i] - rows_start[0] - 1 is the last index of block row i in the
arrays values and col_indx.

Refer to pointerE array description in CSR Format for more details.

col_indx For one-based indexing, pointer to array containing the column indices plus
one for each non-zero block of the matrix source. For zero-based indexing,
pointer to array containing the column indices for each non-zero block of
the matrix source. Its length is rows_end[rows - 1] - rows_start[0].

values Pointer to array containing non-zero elements of matrix source. Its length is
equal to length of the col_indx array multiplied by
block_size*block_size.
Refer to the values array description in BSR Format for more details.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS The operation was successful.


SPARSE_STATUS_NOT_INITIALIZED The routine encountered an empty handle or matrix array.

SPARSE_STATUS_ALLOC_FAILED Internal memory allocation failed.

SPARSE_STATUS_INVALID_VALUE The input parameters contain an invalid value.

SPARSE_STATUS_EXECUTION_FAILED Execution failed.
SPARSE_STATUS_INTERNAL_ERROR An error in algorithm implementation occurred.

SPARSE_STATUS_NOT_SUPPORTED The requested operation is not supported.

mkl_sparse_?_set_value
Changes a single value of matrix in internal
representation.

Syntax
sparse_status_t mkl_sparse_s_set_value (sparse_matrix_t A, MKL_INT row, MKL_INT col,
float value);
sparse_status_t mkl_sparse_d_set_value (sparse_matrix_t A, MKL_INT row, MKL_INT col,
double value);
sparse_status_t mkl_sparse_c_set_value (sparse_matrix_t A, MKL_INT row, MKL_INT col,
MKL_Complex8 value);
sparse_status_t mkl_sparse_z_set_value (sparse_matrix_t A, MKL_INT row, MKL_INT col,
MKL_Complex16 value);

Include Files
• mkl_spblas.h

Description
Use the mkl_sparse_?_set_value routine to change a single value of a matrix in internal Inspector-
executor Sparse BLAS format.

Input Parameters

A Specifies handle containing internal data.

row Indicates row of matrix in which to set value.

col Indicates column of matrix in which to set value.

value Indicates the value to set.

Output Parameters

A Handle containing modified internal data.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS           The operation was successful.
SPARSE_STATUS_NOT_INITIALIZED   The routine encountered an empty handle or matrix array.
SPARSE_STATUS_INVALID_VALUE     The input parameters contain an invalid value.
SPARSE_STATUS_INTERNAL_ERROR    An error in algorithm implementation occurred.
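As a hedged illustration (the handle A and the position are placeholders, not taken from the manual), a single entry of a double-precision matrix could be updated as shown below; the position is assumed to already be present in the sparsity pattern:

#include "mkl_spblas.h"

/* Illustrative sketch: overwrite the value stored at row 2, column 5 of handle A. */
void update_entry(sparse_matrix_t A)
{
    sparse_status_t status = mkl_sparse_d_set_value(A, 2, 5, 3.14);
    if (status != SPARSE_STATUS_SUCCESS) {
        /* For example, SPARSE_STATUS_INVALID_VALUE is assumed to indicate that the
           (row, col) position is not part of the matrix structure; see Return Values. */
    }
}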

Inspector-executor Sparse BLAS Analysis Routines


Analysis Routines and Their Data Types
Routine or Function Group    Description

mkl_sparse_set_mv_hint       Provides estimate of number and type of upcoming matrix-vector operations.

mkl_sparse_set_sv_hint       Provides estimate of number and type of upcoming triangular system solver operations.

mkl_sparse_set_mm_hint       Provides estimate of number and type of upcoming matrix-matrix multiplication operations.

mkl_sparse_set_sm_hint       Provides estimate of number and type of upcoming triangular matrix solve with multiple right hand sides operations.

mkl_sparse_set_memory_hint   Provides memory requirements for performance optimization purposes.

mkl_sparse_optimize          Analyzes matrix structure and performs optimizations using the hints provided in the handle.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

mkl_sparse_set_mv_hint
Provides estimate of number and type of upcoming
matrix-vector operations.

Syntax
sparse_status_t mkl_sparse_set_mv_hint (sparse_matrix_t A, sparse_operation_t
operation, struct matrix_descr descr, MKL_INT expected_calls);

Include Files
• mkl_spblas.h


Description
Use the mkl_sparse_set_mv_hint routine to provide the Inspector-executor Sparse BLAS API an estimate
of the number of upcoming matrix-vector multiplication operations for performance optimization, and specify
whether or not to perform an operation on the matrix.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

operation           Specifies operation op() on input matrix.

                    SPARSE_OPERATION_NON_TRANSPOSE        Non-transpose, op(A) = A.
                    SPARSE_OPERATION_TRANSPOSE            Transpose, op(A) = A^T.
                    SPARSE_OPERATION_CONJUGATE_TRANSPOSE  Conjugate transpose, op(A) = A^H.

descr               Structure specifying sparse matrix properties.

                    sparse_matrix_type_t type - Specifies the type of a sparse matrix:

                    SPARSE_MATRIX_TYPE_GENERAL           The matrix is processed as is.
                    SPARSE_MATRIX_TYPE_SYMMETRIC         The matrix is symmetric (only the requested triangle is processed).
                    SPARSE_MATRIX_TYPE_HERMITIAN         The matrix is Hermitian (only the requested triangle is processed).
                    SPARSE_MATRIX_TYPE_TRIANGULAR        The matrix is triangular (only the requested triangle is processed).
                    SPARSE_MATRIX_TYPE_DIAGONAL          The matrix is diagonal (only diagonal elements are processed).
                    SPARSE_MATRIX_TYPE_BLOCK_TRIANGULAR  The matrix is block-triangular (only the requested triangle is processed; applies to BSR format only).
                    SPARSE_MATRIX_TYPE_BLOCK_DIAGONAL    The matrix is block-diagonal (only diagonal blocks are processed; applies to BSR format only).

                    sparse_fill_mode_t mode - Specifies the triangular matrix part for symmetric, Hermitian, triangular, and block-triangular matrices:

                    SPARSE_FILL_MODE_LOWER  The lower triangular matrix part is processed.
                    SPARSE_FILL_MODE_UPPER  The upper triangular matrix part is processed.

                    sparse_diag_type_t diag - Specifies diagonal type for non-general matrices:

                    SPARSE_DIAG_NON_UNIT    Diagonal elements might not be equal to one.
                    SPARSE_DIAG_UNIT        Diagonal elements are equal to one.

expected_calls Number of expected calls to execution routine.

Output Parameters

A Handle containing internal data.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS           The operation was successful.
SPARSE_STATUS_NOT_INITIALIZED   The routine encountered an empty handle or matrix array.
SPARSE_STATUS_ALLOC_FAILED      Internal memory allocation failed.
SPARSE_STATUS_INVALID_VALUE     The input parameters contain an invalid value.
SPARSE_STATUS_EXECUTION_FAILED  Execution failed.
SPARSE_STATUS_INTERNAL_ERROR    An error in algorithm implementation occurred.
SPARSE_STATUS_NOT_SUPPORTED     The requested operation is not supported.

mkl_sparse_set_sv_hint
Provides estimate of number and type of upcoming
triangular system solver operations.

Syntax
sparse_status_t mkl_sparse_set_sv_hint (sparse_matrix_t A, sparse_operation_t
operation, struct matrix_descr descr, MKL_INT expected_calls);

Include Files
• mkl_spblas.h


Description
The mkl_sparse_set_sv_hint routine provides an estimate of the number of upcoming triangular system solver
operations and type of these operations for performance optimization.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

operation           Specifies operation op() on input matrix.

                    SPARSE_OPERATION_NON_TRANSPOSE        Non-transpose, op(A) = A.
                    SPARSE_OPERATION_TRANSPOSE            Transpose, op(A) = A^T.
                    SPARSE_OPERATION_CONJUGATE_TRANSPOSE  Conjugate transpose, op(A) = A^H.

descr               Structure specifying sparse matrix properties.

                    sparse_matrix_type_t type - Specifies the type of a sparse matrix:

                    SPARSE_MATRIX_TYPE_GENERAL           The matrix is processed as is.
                    SPARSE_MATRIX_TYPE_SYMMETRIC         The matrix is symmetric (only the requested triangle is processed).
                    SPARSE_MATRIX_TYPE_HERMITIAN         The matrix is Hermitian (only the requested triangle is processed).
                    SPARSE_MATRIX_TYPE_TRIANGULAR        The matrix is triangular (only the requested triangle is processed).
                    SPARSE_MATRIX_TYPE_DIAGONAL          The matrix is diagonal (only diagonal elements are processed).
                    SPARSE_MATRIX_TYPE_BLOCK_TRIANGULAR  The matrix is block-triangular (only the requested triangle is processed; applies to BSR format only).
                    SPARSE_MATRIX_TYPE_BLOCK_DIAGONAL    The matrix is block-diagonal (only diagonal blocks are processed; applies to BSR format only).

                    sparse_fill_mode_t mode - Specifies the triangular matrix part for symmetric, Hermitian, triangular, and block-triangular matrices:

                    SPARSE_FILL_MODE_LOWER  The lower triangular matrix part is processed.
                    SPARSE_FILL_MODE_UPPER  The upper triangular matrix part is processed.

                    sparse_diag_type_t diag - Specifies diagonal type for non-general matrices:

                    SPARSE_DIAG_NON_UNIT    Diagonal elements might not be equal to one.
                    SPARSE_DIAG_UNIT        Diagonal elements are equal to one.

expected_calls Number of expected calls to execution routine.

Output Parameters

A Handle containing internal data.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS           The operation was successful.
SPARSE_STATUS_NOT_INITIALIZED   The routine encountered an empty handle or matrix array.
SPARSE_STATUS_ALLOC_FAILED      Internal memory allocation failed.
SPARSE_STATUS_INVALID_VALUE     The input parameters contain an invalid value.
SPARSE_STATUS_EXECUTION_FAILED  Execution failed.
SPARSE_STATUS_INTERNAL_ERROR    An error in algorithm implementation occurred.
SPARSE_STATUS_NOT_SUPPORTED     The requested operation is not supported.

mkl_sparse_set_mm_hint
Provides estimate of number and type of upcoming
matrix-matrix multiplication operations.

Syntax
sparse_status_t mkl_sparse_set_mm_hint (sparse_matrix_t A, sparse_operation_t
operation, struct matrix_descr descr, sparse_layout_t layout, MKL_INT
dense_matrix_size, MKL_INT expected_calls);

Include Files
• mkl_spblas.h


Description
The mkl_sparse_set_mm_hint routine provides an estimate of the number of upcoming matrix-matrix
multiplication operations and type of these operations for performance optimization purposes.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

operation           Specifies operation op() on input matrix.

                    SPARSE_OPERATION_NON_TRANSPOSE        Non-transpose, op(A) = A.
                    SPARSE_OPERATION_TRANSPOSE            Transpose, op(A) = A^T.
                    SPARSE_OPERATION_CONJUGATE_TRANSPOSE  Conjugate transpose, op(A) = A^H.

descr               Structure specifying sparse matrix properties.

                    sparse_matrix_type_t type - Specifies the type of a sparse matrix:

                    SPARSE_MATRIX_TYPE_GENERAL           The matrix is processed as is.
                    SPARSE_MATRIX_TYPE_SYMMETRIC         The matrix is symmetric (only the requested triangle is processed).
                    SPARSE_MATRIX_TYPE_HERMITIAN         The matrix is Hermitian (only the requested triangle is processed).
                    SPARSE_MATRIX_TYPE_TRIANGULAR        The matrix is triangular (only the requested triangle is processed).
                    SPARSE_MATRIX_TYPE_DIAGONAL          The matrix is diagonal (only diagonal elements are processed).
                    SPARSE_MATRIX_TYPE_BLOCK_TRIANGULAR  The matrix is block-triangular (only the requested triangle is processed; applies to BSR format only).
                    SPARSE_MATRIX_TYPE_BLOCK_DIAGONAL    The matrix is block-diagonal (only diagonal blocks are processed; applies to BSR format only).

                    sparse_fill_mode_t mode - Specifies the triangular matrix part for symmetric, Hermitian, triangular, and block-triangular matrices:

                    SPARSE_FILL_MODE_LOWER  The lower triangular matrix part is processed.
                    SPARSE_FILL_MODE_UPPER  The upper triangular matrix part is processed.

                    sparse_diag_type_t diag - Specifies diagonal type for non-general matrices:

                    SPARSE_DIAG_NON_UNIT    Diagonal elements might not be equal to one.
                    SPARSE_DIAG_UNIT        Diagonal elements are equal to one.

layout              Specifies layout of elements:

                    SPARSE_LAYOUT_COLUMN_MAJOR  Storage of elements uses column major layout.
                    SPARSE_LAYOUT_ROW_MAJOR     Storage of elements uses row major layout.

dense_matrix_size Number of columns in dense matrix.

expected_calls Number of expected calls to execution routine.

Output Parameters

A Handle containing internal data.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS           The operation was successful.
SPARSE_STATUS_NOT_INITIALIZED   The routine encountered an empty handle or matrix array.
SPARSE_STATUS_ALLOC_FAILED      Internal memory allocation failed.
SPARSE_STATUS_INVALID_VALUE     The input parameters contain an invalid value.
SPARSE_STATUS_EXECUTION_FAILED  Execution failed.
SPARSE_STATUS_INTERNAL_ERROR    An error in algorithm implementation occurred.
SPARSE_STATUS_NOT_SUPPORTED     The requested operation is not supported.

mkl_sparse_set_sm_hint
Provides estimate of number and type of upcoming
triangular matrix solve with multiple right hand sides
operations.


Syntax
sparse_status_t mkl_sparse_set_sm_hint (sparse_matrix_t A, sparse_operation_t
operation, struct matrix_descr descr, sparse_layout_t layout, MKL_INT
dense_matrix_size, MKL_INT expected_calls);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_set_sm_hint routine provides an estimate of the number of upcoming triangular matrix
solve with multiple right hand sides operations and type of these operations for performance optimization
purposes.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

operation           Specifies operation op() on input matrix.

                    SPARSE_OPERATION_NON_TRANSPOSE        Non-transpose, op(A) = A.
                    SPARSE_OPERATION_TRANSPOSE            Transpose, op(A) = A^T.
                    SPARSE_OPERATION_CONJUGATE_TRANSPOSE  Conjugate transpose, op(A) = A^H.

descr               Structure specifying sparse matrix properties.

                    sparse_matrix_type_t type - Specifies the type of a sparse matrix:

                    SPARSE_MATRIX_TYPE_GENERAL           The matrix is processed as is.
                    SPARSE_MATRIX_TYPE_SYMMETRIC         The matrix is symmetric (only the requested triangle is processed).
                    SPARSE_MATRIX_TYPE_HERMITIAN         The matrix is Hermitian (only the requested triangle is processed).
                    SPARSE_MATRIX_TYPE_TRIANGULAR        The matrix is triangular (only the requested triangle is processed).
                    SPARSE_MATRIX_TYPE_DIAGONAL          The matrix is diagonal (only diagonal elements are processed).
                    SPARSE_MATRIX_TYPE_BLOCK_TRIANGULAR  The matrix is block-triangular (only the requested triangle is processed; applies to BSR format only).
                    SPARSE_MATRIX_TYPE_BLOCK_DIAGONAL    The matrix is block-diagonal (only diagonal blocks are processed; applies to BSR format only).

                    sparse_fill_mode_t mode - Specifies the triangular matrix part for symmetric, Hermitian, triangular, and block-triangular matrices:

                    SPARSE_FILL_MODE_LOWER  The lower triangular matrix part is processed.
                    SPARSE_FILL_MODE_UPPER  The upper triangular matrix part is processed.

                    sparse_diag_type_t diag - Specifies diagonal type for non-general matrices:

                    SPARSE_DIAG_NON_UNIT    Diagonal elements might not be equal to one.
                    SPARSE_DIAG_UNIT        Diagonal elements are equal to one.

layout              Specifies layout of elements:

                    SPARSE_LAYOUT_COLUMN_MAJOR  Storage of elements uses column major layout.
                    SPARSE_LAYOUT_ROW_MAJOR     Storage of elements uses row major layout.

dense_matrix_size   Number of right-hand sides.

expected_calls Number of expected calls to execution routine.

Output Parameters

A Handle containing internal data.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS           The operation was successful.
SPARSE_STATUS_NOT_INITIALIZED   The routine encountered an empty handle or matrix array.
SPARSE_STATUS_ALLOC_FAILED      Internal memory allocation failed.
SPARSE_STATUS_INVALID_VALUE     The input parameters contain an invalid value.
SPARSE_STATUS_EXECUTION_FAILED  Execution failed.
SPARSE_STATUS_INTERNAL_ERROR    An error in algorithm implementation occurred.
SPARSE_STATUS_NOT_SUPPORTED     The requested operation is not supported.

mkl_sparse_set_memory_hint
Provides memory requirements for performance
optimization purposes.

Syntax
sparse_status_t mkl_sparse_set_memory_hint (sparse_matrix_t A, sparse_memory_usage_t
policy);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_set_memory_hint routine allocates additional memory for further performance
optimization purposes.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

policy              Specifies the memory utilization policy for the optimization routine:

                    SPARSE_MEMORY_NONE        The routine can allocate memory only for auxiliary structures
                                              (such as for workload balancing); the amount of memory is
                                              proportional to the vector size.

                    SPARSE_MEMORY_AGGRESSIVE  (Default.) The routine can allocate memory up to the size of
                                              matrix A for converting into the appropriate sparse format.

Output Parameters

A Handle containing internal data.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS           The operation was successful.
SPARSE_STATUS_NOT_INITIALIZED   The routine encountered an empty handle or matrix array.
SPARSE_STATUS_ALLOC_FAILED      Internal memory allocation failed.
SPARSE_STATUS_INVALID_VALUE     The input parameters contain an invalid value.
SPARSE_STATUS_EXECUTION_FAILED  Execution failed.
SPARSE_STATUS_INTERNAL_ERROR    An error in algorithm implementation occurred.
SPARSE_STATUS_NOT_SUPPORTED     The requested operation is not supported.

mkl_sparse_optimize
Analyzes matrix structure and performs optimizations
using the hints provided in the handle.

Syntax
sparse_status_t mkl_sparse_optimize (sparse_matrix_t A);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_optimize routine analyzes matrix structure and performs optimizations using the hints
provided in the handle. Generally, specifying a higher number of expected operations allows for more
aggressive and time consuming optimizations.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

A Handle containing internal data.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.


SPARSE_STATUS_SUCCESS           The operation was successful.
SPARSE_STATUS_NOT_INITIALIZED   The routine encountered an empty handle or matrix array.
SPARSE_STATUS_ALLOC_FAILED      Internal memory allocation failed.
SPARSE_STATUS_INVALID_VALUE     The input parameters contain an invalid value.
SPARSE_STATUS_EXECUTION_FAILED  Execution failed.
SPARSE_STATUS_INTERNAL_ERROR    An error in algorithm implementation occurred.
SPARSE_STATUS_NOT_SUPPORTED     The requested operation is not supported.
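To make the inspect-then-execute flow concrete, here is a hedged sketch that is not part of the original manual; the 3-by-3 matrix, the expected call count, and the variable names are illustrative. It creates a CSR handle, supplies hints, and optimizes the handle before the execution routines described below are called:

#include "mkl_spblas.h"

void optimize_example(void)
{
    /* Illustrative 3x3 CSR matrix (zero-based indexing):
       | 1 0 2 |
       | 0 3 0 |
       | 4 0 5 |                                            */
    MKL_INT rows_start[] = {0, 2, 3};
    MKL_INT rows_end[]   = {2, 3, 5};
    MKL_INT col_indx[]   = {0, 2, 1, 0, 2};
    double  values[]     = {1.0, 2.0, 3.0, 4.0, 5.0};

    sparse_matrix_t A;
    struct matrix_descr descr;
    descr.type = SPARSE_MATRIX_TYPE_GENERAL;   /* mode/diag are ignored for general matrices */

    mkl_sparse_d_create_csr(&A, SPARSE_INDEX_BASE_ZERO, 3, 3,
                            rows_start, rows_end, col_indx, values);

    /* Hint: roughly 500 matrix-vector calls are expected; allow aggressive memory use. */
    mkl_sparse_set_mv_hint(A, SPARSE_OPERATION_NON_TRANSPOSE, descr, 500);
    mkl_sparse_set_memory_hint(A, SPARSE_MEMORY_AGGRESSIVE);
    mkl_sparse_optimize(A);

    /* ... call mkl_sparse_d_mv and other execution routines here ... */

    mkl_sparse_destroy(A);
}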

Inspector-executor Sparse BLAS Execution Routines


Execution Routines and Their Data Types
Routine or Function Group   Data Types   Description

mkl_sparse_?_mv             s, d, c, z   Computes a sparse matrix-vector product.

mkl_sparse_?_trsv           s, d, c, z   Solves a system of linear equations for a square sparse matrix.

mkl_sparse_?_mm             s, d, c, z   Computes the product of a sparse matrix and a dense matrix.

mkl_sparse_?_trsm           s, d, c, z   Solves a system of linear equations with multiple right hand sides for a square sparse matrix.

mkl_sparse_?_add            s, d, c, z   Computes sum of two sparse matrices.

mkl_sparse_spmm             s, d, c, z   Computes the product of two sparse matrices and stores the result as a sparse matrix.

mkl_sparse_?_spmmd          s, d, c, z   Computes the product of two sparse matrices and stores the result as a dense matrix.

mkl_sparse_?_mv
Computes a sparse matrix-vector product.

Syntax
sparse_status_t mkl_sparse_s_mv (sparse_operation_t operation, float alpha, const
sparse_matrix_t A, struct matrix_descr descr, const float *x, float beta, float *y);
sparse_status_t mkl_sparse_d_mv (sparse_operation_t operation, double alpha, const
sparse_matrix_t A, struct matrix_descr descr, const double *x, double beta, double *y);
sparse_status_t mkl_sparse_c_mv (sparse_operation_t operation, MKL_Complex8 alpha,
const sparse_matrix_t A, struct matrix_descr descr, const MKL_Complex8 *x, MKL_Complex8
beta, MKL_Complex8 *y);
sparse_status_t mkl_sparse_z_mv (sparse_operation_t operation, MKL_Complex16 alpha,
const sparse_matrix_t A, struct matrix_descr descr, const MKL_Complex16 *x,
MKL_Complex16 beta, MKL_Complex16 *y);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_?_mv routine computes a sparse matrix-vector product defined as

y := alpha*op(A)*x + beta*y

where:
alpha and beta are scalars, x and y are vectors, and A is a matrix handle of a matrix with m rows and k
columns.
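For illustration only (the handle and the arrays are assumed to be prepared elsewhere, for example as in the optimization sketch above), a double-precision call computing y := A*x might look like this:

#include "mkl_spblas.h"

/* Illustrative sketch: y := 1.0*A*x + 0.0*y for a general matrix handle A. */
void spmv_example(sparse_matrix_t A, const double *x, double *y)
{
    struct matrix_descr descr;
    descr.type = SPARSE_MATRIX_TYPE_GENERAL;   /* mode/diag are ignored for general matrices */
    mkl_sparse_d_mv(SPARSE_OPERATION_NON_TRANSPOSE, 1.0, A, descr, x, 0.0, y);
}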

Input Parameters

operation           Specifies operation op() on input matrix.

                    SPARSE_OPERATION_NON_TRANSPOSE        Non-transpose, op(A) = A.
                    SPARSE_OPERATION_TRANSPOSE            Transpose, op(A) = A^T.
                    SPARSE_OPERATION_CONJUGATE_TRANSPOSE  Conjugate transpose, op(A) = A^H.

alpha Specifies the scalar alpha.

A Handle containing sparse matrix in internal data structure.

descr               Structure specifying sparse matrix properties.

                    sparse_matrix_type_t type - Specifies the type of a sparse matrix:

                    SPARSE_MATRIX_TYPE_GENERAL           The matrix is processed as is.
                    SPARSE_MATRIX_TYPE_SYMMETRIC         The matrix is symmetric (only the requested triangle is processed).
                    SPARSE_MATRIX_TYPE_HERMITIAN         The matrix is Hermitian (only the requested triangle is processed).
                    SPARSE_MATRIX_TYPE_TRIANGULAR        The matrix is triangular (only the requested triangle is processed).
                    SPARSE_MATRIX_TYPE_DIAGONAL          The matrix is diagonal (only diagonal elements are processed).
                    SPARSE_MATRIX_TYPE_BLOCK_TRIANGULAR  The matrix is block-triangular (only the requested triangle is processed; applies to BSR format only).
                    SPARSE_MATRIX_TYPE_BLOCK_DIAGONAL    The matrix is block-diagonal (only diagonal blocks are processed; applies to BSR format only).

                    sparse_fill_mode_t mode - Specifies the triangular matrix part for symmetric, Hermitian, triangular, and block-triangular matrices:

                    SPARSE_FILL_MODE_LOWER  The lower triangular matrix part is processed.
                    SPARSE_FILL_MODE_UPPER  The upper triangular matrix part is processed.

                    sparse_diag_type_t diag - Specifies diagonal type for non-general matrices:

                    SPARSE_DIAG_NON_UNIT    Diagonal elements might not be equal to one.
                    SPARSE_DIAG_UNIT        Diagonal elements are equal to one.

x                   Array of size equal to the number of columns of A if operation =
                    SPARSE_OPERATION_NON_TRANSPOSE, and at least the number of rows of A
                    otherwise. On entry, the array x must contain the vector x.

beta                Specifies the scalar beta.

y                   Array of size at least m if operation = SPARSE_OPERATION_NON_TRANSPOSE, and
                    at least k otherwise. On entry, the array y must contain the vector y.

Output Parameters

y Overwritten by the updated vector y.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS           The operation was successful.
SPARSE_STATUS_NOT_INITIALIZED   The routine encountered an empty handle or matrix array.
SPARSE_STATUS_ALLOC_FAILED      Internal memory allocation failed.
SPARSE_STATUS_INVALID_VALUE     The input parameters contain an invalid value.
SPARSE_STATUS_EXECUTION_FAILED  Execution failed.
SPARSE_STATUS_INTERNAL_ERROR    An error in algorithm implementation occurred.
SPARSE_STATUS_NOT_SUPPORTED     The requested operation is not supported.

mkl_sparse_?_trsv
Solves a system of linear equations for a triangular
sparse matrix.

Syntax
sparse_status_t mkl_sparse_s_trsv (sparse_operation_t operation, float alpha, const
sparse_matrix_t A, struct matrix_descr descr, const float *x, float *y);

sparse_status_t mkl_sparse_d_trsv (sparse_operation_t operation, double alpha, const
sparse_matrix_t A, struct matrix_descr descr, const double *x, double *y);
sparse_status_t mkl_sparse_c_trsv (sparse_operation_t operation, MKL_Complex8 alpha,
const sparse_matrix_t A, struct matrix_descr descr, const MKL_Complex8 *x, MKL_Complex8
*y);
sparse_status_t mkl_sparse_z_trsv (sparse_operation_t operation, MKL_Complex16 alpha,
const sparse_matrix_t A, struct matrix_descr descr, const MKL_Complex16 *x,
MKL_Complex16 *y);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_?_trsv routine solves a system of linear equations for a matrix:

op(A)*y = alpha * x

where A is a triangular sparse matrix, alpha is a scalar, and x and y are vectors.
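As a hedged sketch (the handle and the arrays are placeholders, not from the manual), solving a lower-triangular system L*y = x in double precision could look like this:

#include "mkl_spblas.h"

/* Illustrative sketch: forward solve with a lower-triangular, non-unit-diagonal handle L. */
void trsv_example(sparse_matrix_t L, const double *x, double *y)
{
    struct matrix_descr descr;
    descr.type = SPARSE_MATRIX_TYPE_TRIANGULAR;
    descr.mode = SPARSE_FILL_MODE_LOWER;
    descr.diag = SPARSE_DIAG_NON_UNIT;
    mkl_sparse_d_trsv(SPARSE_OPERATION_NON_TRANSPOSE, 1.0, L, descr, x, y);
}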

Input Parameters

operation           Specifies operation op() on input matrix.

                    SPARSE_OPERATION_NON_TRANSPOSE        Non-transpose, op(A) = A.
                    SPARSE_OPERATION_TRANSPOSE            Transpose, op(A) = A^T.
                    SPARSE_OPERATION_CONJUGATE_TRANSPOSE  Conjugate transpose, op(A) = A^H.

alpha Specifies the scalar alpha.

A Handle containing sparse matrix in internal data structure.

descr               Structure specifying sparse matrix properties.

                    sparse_matrix_type_t type - Specifies the type of a sparse matrix:

                    SPARSE_MATRIX_TYPE_GENERAL           The matrix is processed as is.
                    SPARSE_MATRIX_TYPE_SYMMETRIC         The matrix is symmetric (only the requested triangle is processed).
                    SPARSE_MATRIX_TYPE_HERMITIAN         The matrix is Hermitian (only the requested triangle is processed).
                    SPARSE_MATRIX_TYPE_TRIANGULAR        The matrix is triangular (only the requested triangle is processed).
                    SPARSE_MATRIX_TYPE_DIAGONAL          The matrix is diagonal (only diagonal elements are processed).
                    SPARSE_MATRIX_TYPE_BLOCK_TRIANGULAR  The matrix is block-triangular (only the requested triangle is processed; applies to BSR format only).
                    SPARSE_MATRIX_TYPE_BLOCK_DIAGONAL    The matrix is block-diagonal (only diagonal blocks are processed; applies to BSR format only).

                    sparse_fill_mode_t mode - Specifies the triangular matrix part for symmetric, Hermitian, triangular, and block-triangular matrices:

                    SPARSE_FILL_MODE_LOWER  The lower triangular matrix part is processed.
                    SPARSE_FILL_MODE_UPPER  The upper triangular matrix part is processed.

                    sparse_diag_type_t diag - Specifies diagonal type for non-general matrices:

                    SPARSE_DIAG_NON_UNIT    Diagonal elements might not be equal to one.
                    SPARSE_DIAG_UNIT        Diagonal elements are equal to one.

x                   Array of size at least m, where m is the number of rows of matrix A. On entry,
                    the array x must contain the vector x.

Output Parameters

y                   Array of size at least m containing the solution to the system of linear equations.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS           The operation was successful.
SPARSE_STATUS_NOT_INITIALIZED   The routine encountered an empty handle or matrix array.
SPARSE_STATUS_ALLOC_FAILED      Internal memory allocation failed.
SPARSE_STATUS_INVALID_VALUE     The input parameters contain an invalid value.
SPARSE_STATUS_EXECUTION_FAILED  Execution failed.
SPARSE_STATUS_INTERNAL_ERROR    An error in algorithm implementation occurred.
SPARSE_STATUS_NOT_SUPPORTED     The requested operation is not supported.

mkl_sparse_?_mm
Computes the product of a sparse matrix and a dense
matrix.

Syntax
sparse_status_t mkl_sparse_s_mm (sparse_operation_t operation, float alpha, const
sparse_matrix_t A, struct matrix_descr descr, sparse_layout_t layout, const float *x,
MKL_INT columns, MKL_INT ldx, float beta, float *y, MKL_INT ldy);
sparse_status_t mkl_sparse_d_mm (sparse_operation_t operation, double alpha, const
sparse_matrix_t A, struct matrix_descr descr, sparse_layout_t layout, const double *x,
MKL_INT columns, MKL_INT ldx, double beta, double *y, MKL_INT ldy);
sparse_status_t mkl_sparse_c_mm (sparse_operation_t operation, MKL_Complex8 alpha,
const sparse_matrix_t A, struct matrix_descr descr, sparse_layout_t layout, const
MKL_Complex8 *x, MKL_INT columns, MKL_INT ldx, MKL_Complex8 beta, MKL_Complex8 *y,
MKL_INT ldy);
sparse_status_t mkl_sparse_z_mm (sparse_operation_t operation, MKL_Complex16 alpha,
const sparse_matrix_t A, struct matrix_descr descr, sparse_layout_t layout, const
MKL_Complex16 *x, MKL_INT columns, MKL_INT ldx, MKL_Complex16 beta, MKL_Complex16 *y,
MKL_INT ldy);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_?_mm routine performs a matrix-matrix operation:

y := alpha*op(A)*x + beta*y

where alpha and beta are scalars, A is a sparse matrix, and x and y are dense matrices.
The mkl_sparse_?_mm and mkl_sparse_?_trsm routines support these configurations:

                                   Column-major dense matrix:               Row-major dense matrix:
                                   layout =                                 layout =
                                   SPARSE_LAYOUT_COLUMN_MAJOR               SPARSE_LAYOUT_ROW_MAJOR

0-based sparse matrix:             CSR;                                     All formats
SPARSE_INDEX_BASE_ZERO             BSR: general non-transposed
                                   matrix multiplication only

1-based sparse matrix:             All formats                              CSR;
SPARSE_INDEX_BASE_ONE                                                       BSR: general non-transposed
                                                                            matrix multiplication only
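A hedged sketch of one supported configuration follows (row-major dense matrices; the handle, the arrays, and the dimension names are illustrative and assumed to be set up elsewhere):

#include "mkl_spblas.h"

/* Illustrative sketch: C := 1.0*A*B + 0.0*C, row-major storage.
   A is an m-by-k handle, B is k-by-n with ldx = n, C is m-by-n with ldy = n. */
void sparse_dense_mm_example(sparse_matrix_t A, const double *B, double *C, MKL_INT n)
{
    struct matrix_descr descr;
    descr.type = SPARSE_MATRIX_TYPE_GENERAL;   /* mode/diag are ignored for general matrices */
    mkl_sparse_d_mm(SPARSE_OPERATION_NON_TRANSPOSE, 1.0, A, descr,
                    SPARSE_LAYOUT_ROW_MAJOR, B, n, n, 0.0, C, n);
}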

Input Parameters

operation           Specifies operation op() on input matrix.

                    SPARSE_OPERATION_NON_TRANSPOSE        Non-transpose, op(A) = A.
                    SPARSE_OPERATION_TRANSPOSE            Transpose, op(A) = A^T.
                    SPARSE_OPERATION_CONJUGATE_TRANSPOSE  Conjugate transpose, op(A) = A^H.

285
2 Intel® Math Kernel Library Developer Reference

alpha Specifies the scalar alpha.

A Handle containing sparse matrix in internal data structure.

descr               Structure specifying sparse matrix properties.

                    sparse_matrix_type_t type - Specifies the type of a sparse matrix:

                    SPARSE_MATRIX_TYPE_GENERAL           The matrix is processed as is.
                    SPARSE_MATRIX_TYPE_SYMMETRIC         The matrix is symmetric (only the requested triangle is processed).
                    SPARSE_MATRIX_TYPE_HERMITIAN         The matrix is Hermitian (only the requested triangle is processed).
                    SPARSE_MATRIX_TYPE_TRIANGULAR        The matrix is triangular (only the requested triangle is processed).
                    SPARSE_MATRIX_TYPE_DIAGONAL          The matrix is diagonal (only diagonal elements are processed).
                    SPARSE_MATRIX_TYPE_BLOCK_TRIANGULAR  The matrix is block-triangular (only the requested triangle is processed; applies to BSR format only).
                    SPARSE_MATRIX_TYPE_BLOCK_DIAGONAL    The matrix is block-diagonal (only diagonal blocks are processed; applies to BSR format only).

                    sparse_fill_mode_t mode - Specifies the triangular matrix part for symmetric, Hermitian, triangular, and block-triangular matrices:

                    SPARSE_FILL_MODE_LOWER  The lower triangular matrix part is processed.
                    SPARSE_FILL_MODE_UPPER  The upper triangular matrix part is processed.

                    sparse_diag_type_t diag - Specifies diagonal type for non-general matrices:

                    SPARSE_DIAG_NON_UNIT    Diagonal elements might not be equal to one.
                    SPARSE_DIAG_UNIT        Diagonal elements are equal to one.

layout              Describes the storage scheme for the dense matrix:

                    SPARSE_LAYOUT_COLUMN_MAJOR  Storage of elements uses column major layout.
                    SPARSE_LAYOUT_ROW_MAJOR     Storage of elements uses row major layout.

x                   Array of size at least rows*cols, where:

                                               layout =                     layout =
                                               SPARSE_LAYOUT_COLUMN_MAJOR   SPARSE_LAYOUT_ROW_MAJOR

                    rows (number of rows       ldx                          If op(A) = A, number of columns in A;
                    in x)                                                   if op(A) = A^T, number of rows in A

                    cols (number of columns    columns                      ldx
                    in x)

columns             Number of columns of matrix y.

ldx                 Specifies the leading dimension of matrix x.

beta                Specifies the scalar beta.

y                   Array of size at least rows*cols, where:

                                               layout =                     layout =
                                               SPARSE_LAYOUT_COLUMN_MAJOR   SPARSE_LAYOUT_ROW_MAJOR

                    rows (number of rows       ldy                          If op(A) = A, number of columns in A;
                    in y)                                                   if op(A) = A^T, number of rows in A

                    cols (number of columns    columns                      ldy
                    in y)

Output Parameters

y Overwritten by the updated matrix y.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS           The operation was successful.
SPARSE_STATUS_NOT_INITIALIZED   The routine encountered an empty handle or matrix array.
SPARSE_STATUS_ALLOC_FAILED      Internal memory allocation failed.
SPARSE_STATUS_INVALID_VALUE     The input parameters contain an invalid value.
SPARSE_STATUS_EXECUTION_FAILED  Execution failed.
SPARSE_STATUS_INTERNAL_ERROR    An error in algorithm implementation occurred.
SPARSE_STATUS_NOT_SUPPORTED     The requested operation is not supported.


mkl_sparse_?_trsm
Solves a system of linear equations with multiple right
hand sides for a triangular sparse matrix.

Syntax
sparse_status_t mkl_sparse_s_trsm (sparse_operation_t operation, float alpha, const
sparse_matrix_t A, struct matrix_descr descr, sparse_layout_t layout, const float *x,
MKL_INT columns, MKL_INT ldx, float *y, MKL_INT ldy);
sparse_status_t mkl_sparse_d_trsm (sparse_operation_t operation, double alpha, const
sparse_matrix_t A, struct matrix_descr descr, sparse_layout_t layout, const double *x,
MKL_INT columns, MKL_INT ldx, double *y, MKL_INT ldy);
sparse_status_t mkl_sparse_c_trsm (sparse_operation_t operation, MKL_Complex8 alpha,
const sparse_matrix_t A, struct matrix_descr descr, sparse_layout_t layout, const
MKL_Complex8 *x, MKL_INT columns, MKL_INT ldx, MKL_Complex8 *y, MKL_INT ldy);
sparse_status_t mkl_sparse_z_trsm (sparse_operation_t operation, MKL_Complex16 alpha,
const sparse_matrix_t A, struct matrix_descr descr, sparse_layout_t layout, const
MKL_Complex16 *x, MKL_INT columns, MKL_INT ldx, MKL_Complex16 *y, MKL_INT ldy);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_?_trsm routine solves a system of linear equations with multiple right hand sides for a
triangular sparse matrix:

y := alpha*inv(op(A))*x

where:
alpha is a scalar, x and y are dense matrices, and A is a sparse matrix.
The mkl_sparse_?_mm and mkl_sparse_?_trsm routines support these configurations:

                                   Column-major dense matrix:               Row-major dense matrix:
                                   layout =                                 layout =
                                   SPARSE_LAYOUT_COLUMN_MAJOR               SPARSE_LAYOUT_ROW_MAJOR

0-based sparse matrix:             CSR;                                     All formats
SPARSE_INDEX_BASE_ZERO             BSR: general non-transposed
                                   matrix multiplication only

1-based sparse matrix:             All formats                              CSR;
SPARSE_INDEX_BASE_ONE                                                       BSR: general non-transposed
                                                                            matrix multiplication only
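For illustration (all names and sizes are placeholders, not from the manual), a row-major triangular solve with multiple right-hand sides might be written as:

#include "mkl_spblas.h"

/* Illustrative sketch: solve L*Y = 1.0*X for nrhs right-hand sides, row-major.
   L is lower triangular; X and Y are m-by-nrhs with leading dimension nrhs. */
void trsm_example(sparse_matrix_t L, const double *X, double *Y, MKL_INT nrhs)
{
    struct matrix_descr descr;
    descr.type = SPARSE_MATRIX_TYPE_TRIANGULAR;
    descr.mode = SPARSE_FILL_MODE_LOWER;
    descr.diag = SPARSE_DIAG_NON_UNIT;
    mkl_sparse_d_trsm(SPARSE_OPERATION_NON_TRANSPOSE, 1.0, L, descr,
                      SPARSE_LAYOUT_ROW_MAJOR, X, nrhs, nrhs, Y, nrhs);
}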

Input Parameters

operation           Specifies operation op() on input matrix.

                    SPARSE_OPERATION_NON_TRANSPOSE        Non-transpose, op(A) = A.
                    SPARSE_OPERATION_TRANSPOSE            Transpose, op(A) = A^T.
                    SPARSE_OPERATION_CONJUGATE_TRANSPOSE  Conjugate transpose, op(A) = A^H.

alpha Specifies the scalar alpha.

A Handle containing sparse matrix in internal data structure.

descr               Structure specifying sparse matrix properties.

                    sparse_matrix_type_t type - Specifies the type of a sparse matrix:

                    SPARSE_MATRIX_TYPE_GENERAL           The matrix is processed as is.
                    SPARSE_MATRIX_TYPE_SYMMETRIC         The matrix is symmetric (only the requested triangle is processed).
                    SPARSE_MATRIX_TYPE_HERMITIAN         The matrix is Hermitian (only the requested triangle is processed).
                    SPARSE_MATRIX_TYPE_TRIANGULAR        The matrix is triangular (only the requested triangle is processed).
                    SPARSE_MATRIX_TYPE_DIAGONAL          The matrix is diagonal (only diagonal elements are processed).
                    SPARSE_MATRIX_TYPE_BLOCK_TRIANGULAR  The matrix is block-triangular (only the requested triangle is processed; applies to BSR format only).
                    SPARSE_MATRIX_TYPE_BLOCK_DIAGONAL    The matrix is block-diagonal (only diagonal blocks are processed; applies to BSR format only).

                    sparse_fill_mode_t mode - Specifies the triangular matrix part for symmetric, Hermitian, triangular, and block-triangular matrices:

                    SPARSE_FILL_MODE_LOWER  The lower triangular matrix part is processed.
                    SPARSE_FILL_MODE_UPPER  The upper triangular matrix part is processed.

                    sparse_diag_type_t diag - Specifies diagonal type for non-general matrices:

                    SPARSE_DIAG_NON_UNIT    Diagonal elements might not be equal to one.
                    SPARSE_DIAG_UNIT        Diagonal elements are equal to one.

layout              Describes the storage scheme for the dense matrix:

                    SPARSE_LAYOUT_COLUMN_MAJOR  Storage of elements uses column major layout.
                    SPARSE_LAYOUT_ROW_MAJOR     Storage of elements uses row major layout.

x                   Array of size at least rows*cols, where:

                                               layout =                     layout =
                                               SPARSE_LAYOUT_COLUMN_MAJOR   SPARSE_LAYOUT_ROW_MAJOR

                    rows (number of rows       ldx                          number of rows in A
                    in x)

                    cols (number of columns    columns                      ldx
                    in x)

                    On entry, the array x must contain the matrix x.

columns             Number of columns in matrix y.

ldx                 Specifies the leading dimension of matrix x.

y                   Array of size at least rows*cols, where:

                                               layout =                     layout =
                                               SPARSE_LAYOUT_COLUMN_MAJOR   SPARSE_LAYOUT_ROW_MAJOR

                    rows (number of rows       ldy                          number of rows in A
                    in y)

                    cols (number of columns    columns                      ldy
                    in y)

Output Parameters

y Overwritten by the updated matrix y.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS           The operation was successful.
SPARSE_STATUS_NOT_INITIALIZED   The routine encountered an empty handle or matrix array.
SPARSE_STATUS_ALLOC_FAILED      Internal memory allocation failed.
SPARSE_STATUS_INVALID_VALUE     The input parameters contain an invalid value.
SPARSE_STATUS_EXECUTION_FAILED  Execution failed.
SPARSE_STATUS_INTERNAL_ERROR    An error in algorithm implementation occurred.
SPARSE_STATUS_NOT_SUPPORTED     The requested operation is not supported.

mkl_sparse_?_add
Computes sum of two sparse matrices.

Syntax
sparse_status_t mkl_sparse_s_add (sparse_operation_t operation, const sparse_matrix_t
A, float alpha, const sparse_matrix_t B, sparse_matrix_t *C);
sparse_status_t mkl_sparse_d_add (sparse_operation_t operation, const sparse_matrix_t
A, double alpha, const sparse_matrix_t B, sparse_matrix_t *C);
sparse_status_t mkl_sparse_c_add (sparse_operation_t operation, const sparse_matrix_t
A, MKL_Complex8 alpha, const sparse_matrix_t B, sparse_matrix_t *C);
sparse_status_t mkl_sparse_z_add (sparse_operation_t operation, const sparse_matrix_t
A, MKL_Complex16 alpha, const sparse_matrix_t B, sparse_matrix_t *C);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_?_add routine performs a matrix-matrix operation:

C := alpha*op(A) + B

where alpha is a scalar and A, B, and C are sparse matrices.
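The sketch below is illustrative only (handles A and B are assumed to have matching dimensions and precision); it forms C := 2.0*A + B and releases the resulting handle:

#include "mkl_spblas.h"

/* Illustrative sketch: C := 2.0*A + B for two compatible double-precision handles. */
void sparse_add_example(sparse_matrix_t A, sparse_matrix_t B)
{
    sparse_matrix_t C;
    if (mkl_sparse_d_add(SPARSE_OPERATION_NON_TRANSPOSE, A, 2.0, B, &C)
            == SPARSE_STATUS_SUCCESS) {
        /* ... use the new handle C ... */
        mkl_sparse_destroy(C);
    }
}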

Input Parameters

A Handle containing a sparse matrix in internal data structure.

alpha Specifies the scalar alpha.

operation           Specifies operation op() on input matrix.

                    SPARSE_OPERATION_NON_TRANSPOSE        Non-transpose, op(A) = A.
                    SPARSE_OPERATION_TRANSPOSE            Transpose, op(A) = A^T.
                    SPARSE_OPERATION_CONJUGATE_TRANSPOSE  Conjugate transpose, op(A) = A^H.

B Handle containing a sparse matrix in internal data structure.

Output Parameters

C Handle containing the resulting sparse matrix in internal data structure.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS           The operation was successful.
SPARSE_STATUS_NOT_INITIALIZED   The routine encountered an empty handle or matrix array.
SPARSE_STATUS_ALLOC_FAILED      Internal memory allocation failed.
SPARSE_STATUS_INVALID_VALUE     The input parameters contain an invalid value.
SPARSE_STATUS_EXECUTION_FAILED  Execution failed.
SPARSE_STATUS_INTERNAL_ERROR    An error in algorithm implementation occurred.
SPARSE_STATUS_NOT_SUPPORTED     The requested operation is not supported.

mkl_sparse_spmm
Computes the product of two sparse matrices and
stores the result as a sparse matrix.

Syntax
sparse_status_t mkl_sparse_spmm (sparse_operation_t operation, const sparse_matrix_t A,
const sparse_matrix_t B, sparse_matrix_t *C);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_spmm routine performs a matrix-matrix operation:

C := op(A)*B

where A, B, and C are sparse matrices.
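As an illustrative sketch (the handles are placeholders), computing a sparse product C := A*B and releasing the result might look like this:

#include "mkl_spblas.h"

/* Illustrative sketch: sparse C := A*B. */
void spmm_example(sparse_matrix_t A, sparse_matrix_t B)
{
    sparse_matrix_t C;
    if (mkl_sparse_spmm(SPARSE_OPERATION_NON_TRANSPOSE, A, B, &C)
            == SPARSE_STATUS_SUCCESS) {
        /* ... export or reuse C in further calls ... */
        mkl_sparse_destroy(C);
    }
}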

Input Parameters

operation           Specifies operation op() on input matrix.

                    SPARSE_OPERATION_NON_TRANSPOSE        Non-transpose, op(A) = A.
                    SPARSE_OPERATION_TRANSPOSE            Transpose, op(A) = A^T.
                    SPARSE_OPERATION_CONJUGATE_TRANSPOSE  Conjugate transpose, op(A) = A^H.

A Handle containing a sparse matrix in internal data structure.

B Handle containing a sparse matrix in internal data structure.

Output Parameters

C Handle containing the resulting sparse matrix in internal data structure.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS           The operation was successful.
SPARSE_STATUS_NOT_INITIALIZED   The routine encountered an empty handle or matrix array.
SPARSE_STATUS_ALLOC_FAILED      Internal memory allocation failed.
SPARSE_STATUS_INVALID_VALUE     The input parameters contain an invalid value.
SPARSE_STATUS_EXECUTION_FAILED  Execution failed.
SPARSE_STATUS_INTERNAL_ERROR    An error in algorithm implementation occurred.
SPARSE_STATUS_NOT_SUPPORTED     The requested operation is not supported.

mkl_sparse_?_spmmd
Computes the product of two sparse matrices and
stores the result as a dense matrix.

Syntax
sparse_status_t mkl_sparse_s_spmmd (sparse_operation_t operation, const sparse_matrix_t
A, const sparse_matrix_t B, sparse_layout_t layout, float *C, MKL_INT ldc);
sparse_status_t mkl_sparse_d_spmmd (sparse_operation_t operation, const sparse_matrix_t
A, const sparse_matrix_t B, sparse_layout_t layout, double *C, MKL_INT ldc);
sparse_status_t mkl_sparse_c_spmmd (sparse_operation_t operation, const sparse_matrix_t
A, const sparse_matrix_t B, sparse_layout_t layout, MKL_Complex8 *C, MKL_INT ldc);
sparse_status_t mkl_sparse_z_spmmd (sparse_operation_t operation, const sparse_matrix_t
A, const sparse_matrix_t B, sparse_layout_t layout, MKL_Complex16 *C, MKL_INT ldc);

Include Files
• mkl_spblas.h

Description
The mkl_sparse_?_spmmd routine performs a matrix-matrix operation:

C := op(A)*B

where A and B are sparse matrices and C is a dense matrix.
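For illustration (names and dimensions are assumptions), a row-major double-precision product stored into a preallocated dense array could be computed as:

#include "mkl_spblas.h"

/* Illustrative sketch: dense C := A*B, row-major, with C an m-by-n array (ldc = n). */
void spmmd_example(sparse_matrix_t A, sparse_matrix_t B, double *C, MKL_INT n)
{
    mkl_sparse_d_spmmd(SPARSE_OPERATION_NON_TRANSPOSE, A, B,
                       SPARSE_LAYOUT_ROW_MAJOR, C, n);
}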

Input Parameters

operation           Specifies operation op() on input matrix.

                    SPARSE_OPERATION_NON_TRANSPOSE        Non-transpose, op(A) = A.
                    SPARSE_OPERATION_TRANSPOSE            Transpose, op(A) = A^T.
                    SPARSE_OPERATION_CONJUGATE_TRANSPOSE  Conjugate transpose, op(A) = A^H.

A Handle containing a sparse matrix in internal data structure.

B Handle containing a sparse matrix in internal data structure.


layout              Describes the storage scheme for the dense matrix:

                    SPARSE_LAYOUT_COLUMN_MAJOR  Storage of elements uses column major layout.
                    SPARSE_LAYOUT_ROW_MAJOR     Storage of elements uses row major layout.

ldc                 Specifies the leading dimension of matrix C.

Output Parameters

C                   Array containing the resulting dense matrix.

Return Values
The function returns a value indicating whether the operation was successful or not, and why.

SPARSE_STATUS_SUCCESS           The operation was successful.
SPARSE_STATUS_NOT_INITIALIZED   The routine encountered an empty handle or matrix array.
SPARSE_STATUS_ALLOC_FAILED      Internal memory allocation failed.
SPARSE_STATUS_INVALID_VALUE     The input parameters contain an invalid value.
SPARSE_STATUS_EXECUTION_FAILED  Execution failed.
SPARSE_STATUS_INTERNAL_ERROR    An error in algorithm implementation occurred.
SPARSE_STATUS_NOT_SUPPORTED     The requested operation is not supported.

BLAS-like Extensions
Intel MKL provides C and Fortran routines to extend the functionality of the BLAS routines. These include
routines to compute vector products, matrix-vector products, and matrix-matrix products.
Intel MKL also provides routines to perform certain data manipulation, including matrix in-place and
out-of-place transposition operations combined with simple matrix arithmetic operations. Transposition
operations are Copy As Is, Conjugate transpose, Transpose, and Conjugate. Each routine also allows scaling
during the transposition operation through alpha and/or beta parameters. Each routine supports both
row-major and column-major orderings.
Table “BLAS-like Extensions” lists these routines.
The <?> symbol in the routine short names is a precision prefix that indicates the data type:

s float

d double

c MKL_Complex8

z MKL_Complex16

BLAS-like Extensions
Routine                 Data Types   Description

cblas_?axpby            s, d, c, z   Scales two vectors, adds them to one another and stores result in the vector (routines).

cblas_?gemmt            s, d, c, z   Computes a matrix-matrix product with general matrices but updates only the upper or lower triangular part of the result matrix.

cblas_?gemm3m           c, z         Computes a scalar-matrix-matrix product using matrix multiplications and adds the result to a scalar-matrix product.

cblas_?gemm_batch       s, d, c, z   Computes scalar-matrix-matrix products and adds the results to scalar matrix products for groups of general matrices.

cblas_?gemm3m_batch     c, z         Computes a scalar-matrix-matrix product using matrix multiplications and adds the result to a scalar-matrix product.

mkl_?imatcopy           s, d, c, z   Performs scaling and in-place transposition/copying of matrices.

mkl_?omatcopy           s, d, c, z   Performs scaling and out-of-place transposition/copying of matrices.

mkl_?omatcopy2          s, d, c, z   Performs two-strided scaling and out-of-place transposition/copying of matrices.

mkl_?omatadd            s, d, c, z   Performs scaling and sum of two matrices including their out-of-place transposition/copying.

cblas_?gemm_alloc       s, d         Allocates storage for a packed matrix.

cblas_?gemm_pack        s, d         Performs scaling and packing of the matrix into the previously allocated buffer.

cblas_?gemm_compute     s, d         Computes a matrix-matrix product with general matrices where one or both input matrices are stored in a packed data structure and adds the result to a scalar-matrix product.

cblas_?gemm_free        s, d         Frees the storage previously allocated for the packed matrix.

cblas_?axpby
Scales two vectors, adds them to one another and
stores result in the vector.

Syntax
void cblas_saxpby (const MKL_INT n, const float a, const float *x, const MKL_INT incx,
const float b, float *y, const MKL_INT incy);
void cblas_daxpby (const MKL_INT n, const double a, const double *x, const MKL_INT
incx, const double b, double *y, const MKL_INT incy);
void cblas_caxpby (const MKL_INT n, const void *a, const void *x, const MKL_INT incx,
const void *b, void *y, const MKL_INT incy);


void cblas_zaxpby (const MKL_INT n, const void *a, const void *x, const MKL_INT incx,
const void *b, void *y, const MKL_INT incy);

Include Files
• mkl.h

Description

The ?axpby routines perform a vector-vector operation defined as

y := a*x + b*y

where:
a and b are scalars
x and y are vectors each with n elements.
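A minimal illustrative call (the values are chosen only to show the arithmetic and are not from the manual) is sketched below:

#include "mkl.h"

void axpby_example(void)
{
    /* y := 3.0*x + 2.0*y on 5-element vectors with unit strides (illustrative values). */
    double x[5] = {1.0, 2.0, 3.0, 4.0, 5.0};
    double y[5] = {10.0, 20.0, 30.0, 40.0, 50.0};
    cblas_daxpby(5, 3.0, x, 1, 2.0, y, 1);
    /* y is now {23, 46, 69, 92, 115}. */
}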

Input Parameters

n Specifies the number of elements in vectors x and y.

a Specifies the scalar a.

x Array, size at least (1 + (n-1)*abs(incx)).

incx Specifies the increment for the elements of x.

b Specifies the scalar b.

y Array, size at least (1 + (n-1)*abs(incy)).

incy Specifies the increment for the elements of y.

Output Parameters

y Contains the updated vector y.

Example
For examples of routine usage, see the code in the Intel MKL installation directory:
• cblas_saxpby: examples\cblas\source\cblas_saxpbyx.c
• cblas_daxpby: examples\cblas\source\cblas_daxpbyx.c
• cblas_caxpby: examples\cblas\source\cblas_caxpbyx.c
• cblas_zaxpby: examples\cblas\source\cblas_zaxpbyx.c

cblas_?gemmt
Computes a matrix-matrix product with general
matrices but updates only the upper or lower
triangular part of the result matrix.

Syntax
void cblas_sgemmt (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE transa, const CBLAS_TRANSPOSE transb, const MKL_INT n, const MKL_INT k,
const float alpha, const float *a, const MKL_INT lda, const float *b, const MKL_INT
ldb, const float beta, float *c, const MKL_INT ldc);

void cblas_dgemmt (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE transa, const CBLAS_TRANSPOSE transb, const MKL_INT n, const MKL_INT k,
const double alpha, const double *a, const MKL_INT lda, const double *b, const MKL_INT
ldb, const double beta, double *c, const MKL_INT ldc);
void cblas_cgemmt (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE transa, const CBLAS_TRANSPOSE transb, const MKL_INT n, const MKL_INT k,
const void *alpha, const void *a, const MKL_INT lda, const void *b, const MKL_INT ldb,
const void *beta, void *c, const MKL_INT ldc);
void cblas_zgemmt (const CBLAS_LAYOUT Layout, const CBLAS_UPLO uplo, const
CBLAS_TRANSPOSE transa, const CBLAS_TRANSPOSE transb, const MKL_INT n, const MKL_INT k,
const void *alpha, const void *a, const MKL_INT lda, const void *b, const MKL_INT ldb,
const void *beta, void *c, const MKL_INT ldc);

Include Files
• mkl.h

Description
The ?gemmt routines compute a scalar-matrix-matrix product with general matrices and add the result to the
upper or lower part of a scalar-matrix product. These routines are similar to the ?gemm routines, but they
only access and update a triangular part of the square result matrix (see Application Notes below).
The operation is defined as
C := alpha*op(A)*op(B) + beta*C,
where:
op(X) is one of op(X) = X, op(X) = X^T, or op(X) = X^H,
alpha and beta are scalars,
A, B and C are matrices:
op(A) is an n-by-k matrix,
op(B) is a k-by-n matrix,
C is an n-by-n upper or lower triangular matrix.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

uplo Specifies whether the upper or lower triangular part of the array c is used.
If uplo = 'U' or 'u', then the upper triangular part of the array c is used.
If uplo = 'L' or 'l', then the lower triangular part of the array c is used.

transa              Specifies the form of op(A) used in the matrix multiplication:

                    if transa = 'N' or 'n', then op(A) = A;

                    if transa = 'T' or 't', then op(A) = A^T;

                    if transa = 'C' or 'c', then op(A) = A^H.

transb              Specifies the form of op(B) used in the matrix multiplication:

                    if transb = 'N' or 'n', then op(B) = B;

                    if transb = 'T' or 't', then op(B) = B^T;

                    if transb = 'C' or 'c', then op(B) = B^H.

n Specifies the order of the matrix C. The value of n must be at least zero.

k Specifies the number of columns of the matrix op(A) and the number of
rows of the matrix op(B). The value of k must be at least zero.

alpha Specifies the scalar alpha.

a Array, size lda by ka, where ka is k when transa = 'N' or 'n', and is n
otherwise. Before entry with transa = 'N' or 'n', the leading n-by-k part
of the array a must contain the matrix A, otherwise the leading k-by-n part
of the array a must contain the matrix A.

lda Specifies the leading dimension of a as declared in the calling


(sub)program. When transa = 'N' or 'n', then lda must be at least
max(1, n), otherwise lda must be at least max(1, k).

b Array, size ldb by kb, where kb is n when transb = 'N' or 'n', and is k
otherwise. Before entry with transb = 'N' or 'n', the leading k-by-n part
of the array b must contain the matrix B, otherwise the leading n-by-k part
of the array b must contain the matrix B.

ldb Specifies the leading dimension of b as declared in the calling


(sub)program. When transb = 'N' or 'n', then ldb must be at least
max(1, k), otherwise ldb must be at least max(1, n).

beta Specifies the scalar beta. When beta is equal to zero, then c need not be
set on input.

c Array, size ldc by n.

Before entry with uplo = 'U' or 'u', the leading n-by-n upper triangular
part of the array c must contain the upper triangular part of the matrix C
and the strictly lower triangular part of c is not referenced.

Before entry with uplo = 'L' or 'l', the leading n-by-n lower triangular
part of the array c must contain the lower triangular part of the matrix C
and the strictly upper triangular part of c is not referenced.

When beta is equal to zero, then c need not be set on input.

ldc Specifies the leading dimension of c as declared in the calling


(sub)program. The value of ldc must be at least max(1, n).

Output Parameters

c When uplo = 'U' or 'u', the upper triangular part of the array c is
overwritten by the upper triangular part of the updated matrix.
When uplo = 'L' or 'l', the lower triangular part of the array c is
overwritten by the lower triangular part of the updated matrix.

Application Notes
These routines only access and update the upper or lower triangular part of the result matrix. This can be
useful when the result is known to be symmetric; for example, when computing a product of the form C :=
alpha*B*S*B^T + beta*C, where S and C are symmetric matrices and B is a general matrix. In this case,
first compute A := B*S (which can be done using the corresponding ?symm routine), then compute C :=
alpha*A*B^T + beta*C using the ?gemmt routine.
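To illustrate the calling sequence (the sizes and the zero-filled data are placeholders, not from the manual), the following sketch updates only the lower triangle of C := A*B with n = 4 and k = 8 in column-major storage:

#include "mkl.h"

void gemmt_example(void)
{
    const MKL_INT n = 4, k = 8;
    double a[4 * 8] = {0};   /* n-by-k, column-major, lda = n */
    double b[8 * 4] = {0};   /* k-by-n, column-major, ldb = k */
    double c[4 * 4] = {0};   /* n-by-n result, ldc = n        */
    /* ... fill a and b ... */
    cblas_dgemmt(CblasColMajor, CblasLower, CblasNoTrans, CblasNoTrans,
                 n, k, 1.0, a, n, b, k, 0.0, c, n);
    /* Only the lower triangle of c is computed and stored. */
}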

cblas_?gemm3m
Computes a scalar-matrix-matrix product using matrix
multiplications and adds the result to a scalar-matrix
product.

Syntax
void cblas_cgemm3m (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa, const
CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const void
*alpha, const void *a, const MKL_INT lda, const void *b, const MKL_INT ldb, const void
*beta, void *c, const MKL_INT ldc);
void cblas_zgemm3m (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa, const
CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const void
*alpha, const void *a, const MKL_INT lda, const void *b, const MKL_INT ldb, const void
*beta, void *c, const MKL_INT ldc);

Include Files
• mkl.h

Description

The ?gemm3m routines perform a matrix-matrix operation with general complex matrices. These routines are
similar to the ?gemm routines, but they use fewer matrix multiplication operations (see Application Notes
below).
The operation is defined as

C := alpha*op(A)*op(B) + beta*C,

where:
op(x) is one of op(x) = x, or op(x) = x', or op(x) = conjg(x'),
alpha and beta are scalars,
A, B and C are matrices:
op(A) is an m-by-k matrix,
op(B) is a k-by-n matrix,
C is an m-by-n matrix.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

transa Specifies the form of op(A) used in the matrix multiplication:

if transa=CblasNoTrans, then op(A) = A;


if transa=CblasTrans, then op(A) = A';

if transa=CblasConjTrans, then op(A) = conjg(A').

transb Specifies the form of op(B) used in the matrix multiplication:

if transb=CblasNoTrans, then op(B) = B;

if transb=CblasTrans, then op(B) = B';

if transb=CblasConjTrans, then op(B) = conjg(B').

m Specifies the number of rows of the matrix op(A) and of the matrix C. The
value of m must be at least zero.

n Specifies the number of columns of the matrix op(B) and the number of
columns of the matrix C.
The value of n must be at least zero.

k Specifies the number of columns of the matrix op(A) and the number of
rows of the matrix op(B).

The value of k must be at least zero.

alpha Specifies the scalar alpha.

a Array.

If transa = CblasNoTrans:

• Layout = CblasColMajor: size lda*k. Before entry, the leading m-by-k part of the array a must contain the matrix A.
• Layout = CblasRowMajor: size lda*m. Before entry, the leading k-by-m part of the array a must contain the matrix A.

If transa = CblasTrans or transa = CblasConjTrans:

• Layout = CblasColMajor: size lda*m. Before entry, the leading k-by-m part of the array a must contain the matrix A.
• Layout = CblasRowMajor: size lda*k. Before entry, the leading m-by-k part of the array a must contain the matrix A.

lda Specifies the leading dimension of a as declared in the calling (sub)program.

• Layout = CblasColMajor: lda must be at least max(1, m) when transa = CblasNoTrans, and at least max(1, k) otherwise.
• Layout = CblasRowMajor: lda must be at least max(1, k) when transa = CblasNoTrans, and at least max(1, m) otherwise.

b Array.

If transb = CblasNoTrans:

• Layout = CblasColMajor: size ldb by n. Before entry, the leading k-by-n part of the array b must contain the matrix B.
• Layout = CblasRowMajor: size ldb by k. Before entry, the leading n-by-k part of the array b must contain the matrix B.

If transb = CblasTrans or transb = CblasConjTrans:

• Layout = CblasColMajor: size ldb by k. Before entry, the leading n-by-k part of the array b must contain the matrix B.
• Layout = CblasRowMajor: size ldb by n. Before entry, the leading k-by-n part of the array b must contain the matrix B.

ldb Specifies the leading dimension of b as declared in the calling (sub)program.

• Layout = CblasColMajor: ldb must be at least max(1, k) when transb = CblasNoTrans, and at least max(1, n) otherwise.
• Layout = CblasRowMajor: ldb must be at least max(1, n) when transb = CblasNoTrans, and at least max(1, k) otherwise.

beta Specifies the scalar beta.


When beta is equal to zero, then c need not be set on input.

c Array.

• Layout = CblasColMajor: size ldc by n. Before entry, the leading m-by-n part of the array c must contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.
• Layout = CblasRowMajor: size ldc by m. Before entry, the leading n-by-m part of the array c must contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.

ldc Specifies the leading dimension of c as declared in the calling


(sub)program.

Layout = CblasColMajor ldc must be at least max(1, m).

Layout = CblasRowMajor ldc must be at least max(1, n).

Output Parameters

c Overwritten by the m-by-n matrix (alpha*op(A)*op(B) + beta*C).


Application Notes
These routines perform a complex matrix multiplication by forming the real and imaginary parts of the input
matrices. This uses three real matrix multiplications and five real matrix additions instead of the conventional
four real matrix multiplications and two real matrix additions. The use of three real matrix multiplications
reduces the time spent in matrix operations by 25%, resulting in significant savings in compute time for
large matrices.
If the errors in the floating point calculations satisfy the following conditions:
fl(x op y)=(x op y)(1+δ),|δ|≤u, op=×,/, fl(x±y)=x(1+α)±y(1+β), |α|,|β|≤u
then for an n-by-n matrix Ĉ=fl(C1+iC2)= fl((A1+iA2)(B1+iB2))=Ĉ1+iĈ2, the following bounds are
satisfied:
║Ĉ1-C1║≤ 2(n+1)u║A║∞║B║∞+O(u2),
║Ĉ2-C2║≤ 4(n+4)u║A║∞║B║∞+O(u2),
where ║A║∞=max(║A1║∞,║A2║∞), and ║B║∞=max(║B1║∞,║B2║∞).

Thus the corresponding matrix multiplications are stable.
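
For illustration only, a minimal sketch of a cblas_zgemm3m call on 2-by-2 row-major matrices follows; the values are arbitrary, and accessing the real and imag members assumes the default MKL_Complex16 structure definition from mkl_types.h:

#include <stdio.h>
#include "mkl.h"

int main(void) {
    MKL_INT m = 2, n = 2, k = 2;
    /* Row-major 2x2 complex matrices. */
    MKL_Complex16 a[4] = {{1, 0}, {2, 1}, {0, -1}, {3, 0}};
    MKL_Complex16 b[4] = {{1, 1}, {0, 0}, {2, 0}, {1, -1}};
    MKL_Complex16 c[4] = {{0, 0}, {0, 0}, {0, 0}, {0, 0}};
    MKL_Complex16 alpha = {1.0, 0.0}, beta = {0.0, 0.0};

    /* C := alpha*A*B + beta*C using the 3M algorithm. */
    cblas_zgemm3m(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                  m, n, k, &alpha, a, k, b, n, &beta, c, n);

    printf("c(1,1) = %f + %f*i\n", c[0].real, c[0].imag);  /* expected 5 + 3i */
    return 0;
}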

cblas_?gemm_batch
Computes scalar-matrix-matrix products and adds the
results to scalar matrix products for groups of general
matrices.

Syntax
void cblas_sgemm_batch (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE* transa_array,
const CBLAS_TRANSPOSE* transb_array, const MKL_INT* m_array, const MKL_INT* n_array,
const MKL_INT* k_array, const float* alpha_array, const float **a_array, const MKL_INT*
lda_array, const float **b_array, const MKL_INT* ldb_array, const float* beta_array,
float **c_array, const MKL_INT* ldc_array, const MKL_INT group_count, const MKL_INT*
group_size);
void cblas_dgemm_batch (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE* transa_array,
const CBLAS_TRANSPOSE* transb_array, const MKL_INT* m_array, const MKL_INT* n_array,
const MKL_INT* k_array, const double* alpha_array, const double **a_array, const
MKL_INT* lda_array, const double **b_array, const MKL_INT* ldb_array, const double*
beta_array, double **c_array, const MKL_INT* ldc_array, const MKL_INT group_count,
const MKL_INT* group_size);
void cblas_cgemm_batch (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE* transa_array,
const CBLAS_TRANSPOSE* transb_array, const MKL_INT* m_array, const MKL_INT* n_array,
const MKL_INT* k_array, const void *alpha_array, const void **a_array, const MKL_INT*
lda_array, const void **b_array, const MKL_INT* ldb_array, const void *beta_array, void
**c_array, const MKL_INT* ldc_array, const MKL_INT group_count, const MKL_INT*
group_size);
void cblas_zgemm_batch (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE* transa_array,
const CBLAS_TRANSPOSE* transb_array, const MKL_INT* m_array, const MKL_INT* n_array,
const MKL_INT* k_array, const void *alpha_array, const void **a_array, const MKL_INT*
lda_array, const void **b_array, const MKL_INT* ldb_array, const void *beta_array, void
**c_array, const MKL_INT* ldc_array, const MKL_INT group_count, const MKL_INT*
group_size);

Include Files
• mkl.h

Description

The ?gemm_batch routines perform a series of matrix-matrix operations with general matrices. They are
similar to the ?gemm routine counterparts, but the ?gemm_batch routines perform matrix-matrix operations
with groups of matrices, processing a number of groups at once. The groups contain matrices with the same
parameters.
The operation is defined as

idx = 0
for i = 0..group_count - 1
alpha and beta in alpha_array[i] and beta_array[i]
for j = 0..group_size[i] - 1
A, B, and C matrix in a_array[idx], b_array[idx], and c_array[idx]
C := alpha*op(A)*op(B) + beta*C,
idx = idx + 1
end for
end for

where:
op(X) is one of op(X) = X, or op(X) = XT, or op(X) = XH,
alpha and beta are scalar elements of alpha_array and beta_array,

A, B and C are matrices such that for m, n, and k which are elements of m_array, n_array, and k_array:

op(A) is an m-by-k matrix,


op(B) is a k-by-n matrix,
C is an m-by-n matrix.
A, B, and C represent matrices stored at addresses pointed to by a_array, b_array, and c_array,
respectively. The number of entries in a_array, b_array, and c_array is total_batch_count = the sum of all
of the group_size entries.

See also gemm for a detailed description of multiplication for general matrices and ?gemm3m_batch, BLAS-
like extension routines for similar matrix-matrix operations.

NOTE
Error checking is not performed for Intel MKL Windows* single dynamic libraries for the ?gemm_batch
routines.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

transa_array Array of size group_count. For the group i, transai = transa_array[i]


specifies the form of op(A) used in the matrix multiplication:

if transai = CblasNoTrans, then op(A) = A;

if transai = CblasTrans, then op(A) = AT;

if transai = CblasConjTrans, then op(A) = AH.


transb_array Array of size group_count. For the group i, transbi = transb_array[i]


specifies the form of op(Bi) used in the matrix multiplication:

if transbi = CblasNoTrans, then op(B) = B;

if transbi = CblasTrans, then op(B) = BT;

if transbi = CblasConjTrans, then op(B) = BH.

m_array Array of size group_count. For the group i, mi = m_array[i] specifies the
number of rows of the matrix op(A) and of the matrix C.

The value of each element of m_array must be at least zero.

n_array Array of size group_count. For the group i, ni = n_array[i] specifies the
number of columns of the matrix op(B) and the number of columns of the
matrix C.
The value of each element of n_array must be at least zero.

k_array Array of size group_count. For the group i, ki = k_array[i] specifies the
number of columns of the matrix op(A) and the number of rows of the
matrix op(B).

The value of each element of k_array must be at least zero.

alpha_array Array of size group_count. For the group i, alpha_array[i] specifies the
scalar alphai.

a_array Array, size total_batch_count, of pointers to arrays used to store A


matrices.

lda_array Array of size group_count. For the group i, ldai = lda_array[i]


specifies the leading dimension of the array storing matrix A as declared in
the calling (sub)program.

• Layout = CblasColMajor: ldai must be at least max(1, mi) when transai = CblasNoTrans, and at least max(1, ki) otherwise.
• Layout = CblasRowMajor: ldai must be at least max(1, ki) when transai = CblasNoTrans, and at least max(1, mi) otherwise.

b_array Array, size total_batch_count, of pointers to arrays used to store B


matrices.

ldb_array Array of size group_count. For the group i, ldbi = ldb_array[i]


specifies the leading dimension of the array storing matrix B as declared in
the calling (sub)program.

• Layout = CblasColMajor: ldbi must be at least max(1, ki) when transbi = CblasNoTrans, and at least max(1, ni) otherwise.
• Layout = CblasRowMajor: ldbi must be at least max(1, ni) when transbi = CblasNoTrans, and at least max(1, ki) otherwise.

beta_array Array of size group_count. For the group i, beta_array[i] specifies the
scalar betai.
When betai is equal to zero, then C matrices in group i need not be set on
input.

c_array Array, size total_batch_count, of pointers to arrays used to store C


matrices.

ldc_array Array of size group_count. For the group i, ldci = ldc_array[i]


specifies the leading dimension of all arrays storing matrix C in group i as
declared in the calling (sub)program.
When Layout = CblasColMajor, ldci must be at least max(1, mi).

When Layout = CblasRowMajor, ldci must be at least max(1, ni).

group_count Specifies the number of groups. Must be at least 0.

group_size Array of size group_count. The element group_size[i] specifies the


number of matrices in group i. Each element in group_size must be at
least 0.

Output Parameters

c_array Overwritten by the mi-by-ni matrix (alphai*op(A)*op(B) + betai*C) for


group i.
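
For illustration only, the following minimal sketch calls cblas_dgemm_batch with a single group holding two independent 2-by-2 products; the matrices and values are chosen purely for demonstration:

#include <stdio.h>
#include "mkl.h"

int main(void) {
    /* One group, two independent C := A*B products with identical parameters. */
    CBLAS_TRANSPOSE transa[1] = {CblasNoTrans}, transb[1] = {CblasNoTrans};
    MKL_INT m[1] = {2}, n[1] = {2}, k[1] = {2};
    MKL_INT lda[1] = {2}, ldb[1] = {2}, ldc[1] = {2};
    double alpha[1] = {1.0}, beta[1] = {0.0};
    MKL_INT group_count = 1, group_size[1] = {2};

    double a0[4] = {1, 2, 3, 4}, b0[4] = {5, 6, 7, 8}, c0[4] = {0};
    double a1[4] = {1, 0, 0, 1}, b1[4] = {9, 8, 7, 6}, c1[4] = {0};
    const double *a_array[2] = {a0, a1};
    const double *b_array[2] = {b0, b1};
    double *c_array[2] = {c0, c1};

    cblas_dgemm_batch(CblasRowMajor, transa, transb, m, n, k,
                      alpha, a_array, lda, b_array, ldb,
                      beta, c_array, ldc, group_count, group_size);

    printf("c0(1,1) = %f, c1(1,1) = %f\n", c0[0], c1[0]);  /* expected 19 and 9 */
    return 0;
}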

cblas_?gemm3m_batch
Computes scalar-matrix-matrix products and adds the
results to scalar matrix products for groups of general
matrices.

Syntax
void cblas_cgemm3m_batch (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE*
transa_array, const CBLAS_TRANSPOSE* transb_array, const MKL_INT* m_array, const
MKL_INT* n_array, const MKL_INT* k_array, const void *alpha_array, const void
**a_array, const MKL_INT* lda_array, const void **b_array, const MKL_INT* ldb_array,
const void *beta_array, void **c_array, const MKL_INT* ldc_array, const MKL_INT
group_count, const MKL_INT* group_size);
void cblas_zgemm3m_batch (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE*
transa_array, const CBLAS_TRANSPOSE* transb_array, const MKL_INT* m_array, const
MKL_INT* n_array, const MKL_INT* k_array, const void *alpha_array, const void
**a_array, const MKL_INT* lda_array, const void **b_array, const MKL_INT* ldb_array,
const void *beta_array, void **c_array, const MKL_INT* ldc_array, const MKL_INT
group_count, const MKL_INT* group_size);

Include Files
• mkl.h


Description

The ?gemm3m_batch routines perform a series of matrix-matrix operations with general matrices. They are
similar to the ?gemm3m routine counterparts, but the ?gemm3m_batch routines perform matrix-matrix
operations with groups of matrices, processing a number of groups at once. The groups contain matrices with
the same parameters. The ?gemm3m_batch routines use fewer matrix multiplications than the ?gemm_batch
routines, as described in the Application Notes.
The operation is defined as

idx = 0
for i = 0..group_count - 1
alpha and beta in alpha_array[i] and beta_array[i]
for j = 0..group_size[i] - 1
A, B, and C matrix in a_array[idx], b_array[idx], and c_array[idx]
C := alpha*op(A)*op(B) + beta*C,
idx = idx + 1
end for
end for

where:
op(X) is one of op(X) = X, or op(X) = XT, or op(X) = XH,
alpha and beta are scalar elements of alpha_array and beta_array,

A, B and C are matrices such that for m, n, and k which are elements of m_array, n_array, and k_array:

op(A) is an m-by-k matrix,


op(B) is a k-by-n matrix,
C is an m-by-n matrix.
A, B, and C represent matrices stored at addresses pointed to by a_array, b_array, and c_array,
respectively. The number of entries in a_array, b_array, and c_array is total_batch_count = the sum of all
the group_size entries.

See also gemm for a detailed description of multiplication for general matrices and gemm_batch, BLAS-like
extension routines for similar matrix-matrix operations.

NOTE
Error checking is not performed for Intel MKL Windows* single dynamic libraries for the ?
gemm3m_batch routines.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

transa_array Array of size group_count. For the group i, transai = transa_array[i]


specifies the form of op(A) used in the matrix multiplication:

if transai = CblasNoTrans, then op(A) = A;

if transai = CblasTrans, then op(A) = AT;

if transai = CblasConjTrans, then op(A) = AH.

transb_array Array of size group_count. For the group i, transbi = transb_array[i]


specifies the form of op(Bi) used in the matrix multiplication:

if transbi = CblasNoTrans, then op(B) = B;

if transbi = CblasTrans, then op(B) = BT;

if transbi = CblasConjTrans, then op(B) = BH.

m_array Array of size group_count. For the group i, mi = m_array[i] specifies the
number of rows of the matrix op(A) and of the matrix C.

The value of each element of m_array must be at least zero.

n_array Array of size group_count. For the group i, ni = n_array[i] specifies the
number of columns of the matrix op(B) and the number of columns of the
matrix C.
The value of each element of n_array must be at least zero.

k_array Array of size group_count. For the group i, ki = k_array[i] specifies the
number of columns of the matrix op(A) and the number of rows of the
matrix op(B).

The value of each element of k_array must be at least zero.

alpha_array Array of size group_count. For the group i, alpha_array[i] specifies the
scalar alphai.

a_array Array, size total_batch_count, of pointers to arrays used to store A


matrices.

lda_array Array of size group_count. For the group i, ldai = lda_array[i]


specifies the leading dimension of the array storing matrix A as declared in
the calling (sub)program.

• Layout = CblasColMajor: ldai must be at least max(1, mi) when transai = CblasNoTrans, and at least max(1, ki) otherwise.
• Layout = CblasRowMajor: ldai must be at least max(1, ki) when transai = CblasNoTrans, and at least max(1, mi) otherwise.

b_array Array, size total_batch_count, of pointers to arrays used to store B


matrices.

ldb_array Array of size group_count. For the group i, ldbi = ldb_array[i]


specifies the leading dimension of the array storing matrix B as declared in
the calling (sub)program.

• Layout = CblasColMajor: ldbi must be at least max(1, ki) when transbi = CblasNoTrans, and at least max(1, ni) otherwise.
• Layout = CblasRowMajor: ldbi must be at least max(1, ni) when transbi = CblasNoTrans, and at least max(1, ki) otherwise.

beta_array Array of size group_count. For the group i, beta_array[i] specifies the scalar betai.

When betai is equal to zero, then C matrices in group i need not be set on
input.

c_array Array, size total_batch_count, of pointers to arrays used to store C


matrices.

ldc_array Array of size group_count. For the group i, ldci = ldc_array[i]


specifies the leading dimension of all arrays storing matrix C in group i as
declared in the calling (sub)program.
When Layout = CblasColMajor, ldci must be at least max(1, mi).

When Layout = CblasRowMajor, ldci must be at least max(1, ni).

group_count Specifies the number of groups. Must be at least 0.

group_size Array of size group_count. The element group_size[i] specifies the


number of matrices in group i. Each element in group_size must be at
least 0.

Output Parameters

c_array Overwritten by the mi-by-ni matrix (alphai*op(A)*op(B) + betai*C) for


group i.

Application Notes
These routines perform a complex matrix multiplication by forming the real and imaginary parts of the input
matrices. This uses three real matrix multiplications and five real matrix additions instead of the conventional
four real matrix multiplications and two real matrix additions. The use of three real matrix multiplications
reduces the time spent in matrix operations by 25%, resulting in significant savings in compute time for
large matrices.
If the errors in the floating point calculations satisfy the following conditions:
fl(x op y)=(x op y)(1+δ),|δ|≤u, op=×,/, fl(x±y)=x(1+α)±y(1+β), |α|,|β|≤u
then for an n-by-n matrix Ĉ=fl(C1+iC2)= fl((A1+iA2)(B1+iB2))=Ĉ1+iĈ2, the following bounds are
satisfied:
║Ĉ1-C1║≤ 2(n+1)u║A║∞║B║∞+O(u2),
║Ĉ2-C2║≤ 4(n+4)u║A║∞║B║∞+O(u2),
where ║A║∞=max(║A1║∞,║A2║∞), and ║B║∞=max(║B1║∞,║B2║∞).

Thus the corresponding matrix multiplications are stable.

mkl_?imatcopy
Performs scaling and in-place transposition/copying of
matrices.

Syntax
void mkl_simatcopy (const char ordering, const char trans, size_t rows, size_t cols,
const float alpha, float * AB, size_t lda, size_t ldb);
void mkl_dimatcopy (const char ordering, const char trans, size_t rows, size_t cols,
const double alpha, double * AB, size_t lda, size_t ldb);
void mkl_cimatcopy (const char ordering, const char trans, size_t rows, size_t cols,
const MKL_Complex8 alpha, MKL_Complex8 * AB, size_t lda, size_t ldb);
void mkl_zimatcopy (const char ordering, const char trans, size_t rows, size_t cols,
const MKL_Complex16 alpha, MKL_Complex16 * AB, size_t lda, size_t ldb);

Include Files
• mkl.h

Description

The mkl_?imatcopy routine performs scaling and in-place transposition/copying of matrices. A transposition
operation can be a normal matrix copy, a transposition, a conjugate transposition, or just a conjugation. The
operation is defined as follows:
AB := alpha*op(AB).

NOTE
Different arrays must not overlap.
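
For illustration only, a minimal sketch of an in-place transposition of a square 3-by-3 row-major matrix with mkl_simatcopy; a square matrix is used so that lda and ldb coincide:

#include <stdio.h>
#include "mkl.h"

int main(void) {
    /* In-place transposition of a 3x3 row-major matrix, scaled by 1.0. */
    float ab[9] = {1, 2, 3,
                   4, 5, 6,
                   7, 8, 9};

    mkl_simatcopy('R', 'T', 3, 3, 1.0f, ab, 3, 3);

    printf("ab[1] = %f\n", ab[1]);   /* was 2, is 4 after the transposition */
    return 0;
}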

Input Parameters

ordering Ordering of the matrix storage.


If ordering = 'R' or 'r', the ordering is row-major.

If ordering = 'C' or 'c', the ordering is column-major.

trans Parameter that specifies the operation type.


If trans = 'N' or 'n', op(AB)=AB and the matrix AB is assumed
unchanged on input.
If trans = 'T' or 't', it is assumed that AB should be transposed.

If trans = 'C' or 'c', it is assumed that AB should be conjugate


transposed.
If trans = 'R' or 'r', it is assumed that AB should be only conjugated.

If the data is real, then trans = 'R' is the same as trans = 'N', and
trans = 'C' is the same as trans = 'T'.

rows The number of rows in matrix AB before the transpose operation.

cols The number of columns in matrix AB before the transpose operation.

ab Array.

alpha This parameter scales the input matrix by alpha.


lda Distance between the first elements in adjacent columns (in the case of the
column-major order) or rows (in the case of the row-major order) in the
source matrix; measured in the number of elements.
This parameter must be at least rows if ordering = 'C' or 'c', and
max(1,cols) otherwise.

ldb Distance between the first elements in adjacent columns (in the case of the
column-major order) or rows (in the case of the row-major order) in the
destination matrix; measured in the number of elements.
To determine the minimum value of ldb on output, consider the following
guideline:
If ordering = 'C' or 'c', then

• If trans = 'T' or 't' or 'C' or 'c', this parameter must be at least


max(1,cols)
• If trans = 'N' or 'n' or 'R' or 'r', this parameter must be at least
max(1,rows)

If ordering = 'R' or 'r', then

• If trans = 'T' or 't' or 'C' or 'c', this parameter must be at least


max(1,rows)
• If trans = 'N' or 'n' or 'R' or 'r', this parameter must be at least
max(1,cols)

Output Parameters

ab Array.
Contains the matrix AB.

Interfaces

mkl_?omatcopy
Performs scaling and out-of-place transposition/copying of matrices.

Syntax
void mkl_somatcopy (char ordering, char trans, size_t rows, size_t cols, const float
alpha, const float * A, size_t lda, float * B, size_t ldb);
void mkl_domatcopy (char ordering, char trans, size_t rows, size_t cols, const double
alpha, const double * A, size_t lda, double * B, size_t ldb);
void mkl_comatcopy (char ordering, char trans, size_t rows, size_t cols, const
MKL_Complex8 alpha, const MKL_Complex8 * A, size_t lda, MKL_Complex8 * B, size_t ldb);
void mkl_zomatcopy (char ordering, char trans, size_t rows, size_t cols, const
MKL_Complex16 alpha, const MKL_Complex16 * A, size_t lda, MKL_Complex16 * B, size_t
ldb);

Include Files
• mkl.h

Description

The mkl_?omatcopy routine performs scaling and out-of-place transposition/copying of matrices. A


transposition operation can be a normal matrix copy, a transposition, a conjugate transposition, or just a
conjugation. The operation is defined as follows:
B := alpha*op(A)

NOTE
Different arrays must not overlap.
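
For illustration only, a minimal sketch of an out-of-place scaled transposition with mkl_domatcopy; a square 3-by-3 matrix is used so that the row and column counts of the source and destination coincide:

#include <stdio.h>
#include "mkl.h"

int main(void) {
    /* B := 2.0 * A^T for a 3x3 row-major matrix. */
    double a[9] = {1, 2, 3,
                   4, 5, 6,
                   7, 8, 9};
    double b[9];

    mkl_domatcopy('R', 'T', 3, 3, 2.0, a, 3, b, 3);

    printf("b[1] = %f\n", b[1]);   /* 2.0 * a(2,1) = 8.0 */
    return 0;
}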

Input Parameters

ordering Ordering of the matrix storage.


If ordering = 'R' or 'r', the ordering is row-major.

If ordering = 'C' or 'c', the ordering is column-major.

trans Parameter that specifies the operation type.


If trans = 'N' or 'n', op(A)=A and the matrix A is assumed unchanged
on input.
If trans = 'T' or 't', it is assumed that A should be transposed.

If trans = 'C' or 'c', it is assumed that A should be conjugate


transposed.
If trans = 'R' or 'r', it is assumed that A should be only conjugated.

If the data is real, then trans = 'R' is the same as trans = 'N', and
trans = 'C' is the same as trans = 'T'.

rows The number of rows in matrix B (the destination matrix).

cols The number of columns in matrix B (the destination matrix).

alpha This parameter scales the input matrix by alpha.

a Array.

lda
If ordering = 'R' or 'r', lda represents the number of elements in array
a between adjacent rows of matrix A; lda must be at least equal to the
number of columns of matrix A.
If ordering = 'C' or 'c', lda represents the number of elements in array
a between adjacent columns of matrix A; lda must be at least equal to the
number of rows in matrix A.

b Array.

ldb
If ordering = 'R' or 'r', ldb represents the number of elements in array
b between adjacent rows of matrix B.

• If trans = 'T' or 't' or 'C' or 'c', ldb must be at least equal to


rows.


• If trans = 'N' or 'n' or 'R' or 'r', ldb must be at least equal to


cols.

If ordering = 'C' or 'c', ldb represents the number of elements in array


b between adjacent columns of matrix B.

• If trans = 'T' or 't' or 'C' or 'c', ldb must be at least equal to


cols.
• If trans = 'N' or 'n' or 'R' or 'r', ldb must be at least equal to
rows.

Output Parameters

b Array, size at least m.

Contains the destination matrix.

Interfaces

mkl_?omatcopy2
Performs two-strided scaling and out-of-place
transposition/copying of matrices.

Syntax
void mkl_somatcopy2 (char ordering, char trans, size_t rows, size_t cols, const float
alpha, const float * A, size_t lda, size_t stridea, float * B, size_t ldb, size_t
strideb);
void mkl_domatcopy2 (char ordering, char trans, size_t rows, size_t cols, const double
alpha, const double * A, size_t lda, size_t stridea, double * B, size_t ldb, size_t
strideb);
void mkl_comatcopy2 (char ordering, char trans, size_t rows, size_t cols, const
MKL_Complex8 alpha, const MKL_Complex8 * A, size_t lda, size_t stridea, MKL_Complex8 *
B, size_t ldb, size_t strideb);
void mkl_zomatcopy2 (char ordering, char trans, size_t rows, size_t cols, const
MKL_Complex16 alpha, const MKL_Complex16 * A, size_t lda, size_t stridea, MKL_Complex16
* B, size_t ldb, size_t strideb);

Include Files
• mkl.h

Description

The mkl_?omatcopy2 routine performs two-strided scaling and out-of-place transposition/copying of


matrices. A transposition operation can be a normal matrix copy, a transposition, a conjugate transposition,
or just a conjugation. The operation is defined as follows:
B := alpha*op(A)
Normally, matrices in the BLAS or LAPACK are specified by a single stride index. For instance, in the column-
major order, A(2,1) is stored in memory one element away from A(1,1), but A(1,2) is a leading dimension
away. The leading dimension in this case is at least the number of rows of the source matrix. If a matrix has
two strides, then both A(2,1) and A(1,2) may be an arbitrary distance from A(1,1).

NOTE
Different arrays must not overlap.

Input Parameters

ordering Ordering of the matrix storage.


If ordering = 'R' or 'r', the ordering is row-major.

If ordering = 'C' or 'c', the ordering is column-major.

trans Parameter that specifies the operation type.


If trans = 'N' or 'n', op(A)=A and the matrix A is assumed unchanged
on input.
If trans = 'T' or 't', it is assumed that A should be transposed.

If trans = 'C' or 'c', it is assumed that A should be conjugate


transposed.
If trans = 'R' or 'r', it is assumed that A should be only conjugated.

If the data is real, then trans = 'R' is the same as trans = 'N', and
trans = 'C' is the same as trans = 'T'.

rows The number of rows in matrix B (the destination matrix).

cols The number of columns in matrix B (the destination matrix).

alpha This parameter scales the input matrix by alpha.

a Array.

lda
If ordering = 'R' or 'r', lda represents the number of elements in array
a between adjacent rows of matrix A; lda must be at least equal to the
number of columns of matrix A.
If ordering = 'C' or 'c', lda represents the number of elements in array
a between adjacent columns of matrix A; lda must be at least 1 and not
more than the number of columns in matrix A.

stridea
If ordering = 'R' or 'r', stridea represents the number of elements in
array a between adjacent columns of matrix A. stridea must be at least 1
and not more than the number of columns in matrix A.
If ordering = 'C' or 'c', stridea represents the number of elements in
array a between adjacent rows of matrix A. stridea must be at least equal
to the number of columns in matrix A.

b Array.

ldb
If ordering = 'R' or 'r', ldb represents the number of elements in array
b between adjacent rows of matrix B.

• If trans = 'T' or 't' or 'C' or 'c', ldb must be at least equal to


rows/strideb.


• If trans = 'N' or 'n' or 'R' or 'r', ldb must be at least equal to


cols/strideb.

If ordering = 'C' or 'c', ldb represents the number of elements in array


b between adjacent columns of matrix B.

• If trans = 'T' or 't' or 'C' or 'c', ldb must be at least 1 and not
more than rows/strideb.
• If trans = 'N' or 'n' or 'R' or 'r', ldb must be at least 1 and not
more than cols/strideb.

strideb
If ordering = 'R' or 'r', strideb represents the number of elements in
array b between adjacent columns of matrix B.

• If trans = 'T' or 't' or 'C' or 'c', strideb must be at least 1 and


not more than rows (the number of rows in matrix B).
• If trans = 'N' or 'n' or 'R' or 'r', strideb must be at least 1 and
not more than cols (the number of columns in matrix B).

If ordering = 'C' or 'c', strideb represents the number of elements in


array b between adjacent rows of matrix B.

• If trans = 'T' or 't' or 'C' or 'c', strideb must be at least equal to


rows (the number of rows in matrix B).
• If trans = 'N' or 'n' or 'R' or 'r', strideb must be at least equal to
cols (the number of columns in matrix B).

Output Parameters

b Array, size at least m.

Contains the destination matrix.

Interfaces

mkl_?omatadd
Scales and adds two matrices, in addition to performing
out-of-place transposition operations.

Syntax
void mkl_somatadd (char ordering, char transa, char transb, size_t m, size_t n, const
float alpha, const float * A, size_t lda, const float beta, const float * B, size_t
ldb, float * C, size_t ldc);
void mkl_domatadd (char ordering, char transa, char transb, size_t m, size_t n, const
double alpha, const double * A, size_t lda, const double beta, const double * B,
size_t ldb, double * C, size_t ldc);
void mkl_comatadd (char ordering, char transa, char transb, size_t m, size_t n, const
MKL_Complex8 alpha, const MKL_Complex8 * A, size_t lda, const MKL_Complex8 beta, const
MKL_Complex8 * B, size_t ldb, MKL_Complex8 * C, size_t ldc);

void mkl_zomatadd (char ordering, char transa, char transb, size_t m, size_t n, const
MKL_Complex16 alpha, const MKL_Complex16 * A, size_t lda, const MKL_Complex16 beta,
const MKL_Complex16 * B, size_t ldb, MKL_Complex16 * C, size_t ldc);

Include Files
• mkl.h

Description

The mkl_?omatadd routine scales and adds two matrices, as well as performing out-of-place transposition
operations. A transposition operation can be no operation, a transposition, a conjugate transposition, or a
conjugation (without transposition). The following out-of-place memory movement is done:
C := alpha*op(A) + beta*op(B)
where the op(A) and op(B) operations are transpose, conjugate-transpose, conjugate (no transpose), or no
transpose, depending on the values of transa and transb. If no transposition of the source matrices is
required, m is the number of rows and n is the number of columns in the source matrices A and B. In this
case, the output matrix C is m-by-n.

NOTE
Note that different arrays must not overlap.
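
For illustration only, a minimal sketch computing C := A + B^T for 3-by-3 row-major matrices with mkl_domatadd; the values are arbitrary:

#include <stdio.h>
#include "mkl.h"

int main(void) {
    /* C := 1.0*A + 1.0*B^T for 3x3 row-major matrices. */
    double a[9] = {1, 2, 3,
                   4, 5, 6,
                   7, 8, 9};
    double b[9] = {1, 0, 0,
                   0, 1, 0,
                   0, 0, 1};
    double c[9];

    mkl_domatadd('R', 'N', 'T', 3, 3, 1.0, a, 3, 1.0, b, 3, c, 3);

    printf("c(1,1) = %f\n", c[0]);   /* 1 + 1 = 2 */
    return 0;
}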

Input Parameters

ordering Ordering of the matrix storage.


If ordering = 'R' or 'r', the ordering is row-major.

If ordering = 'C' or 'c', the ordering is column-major.

transa Parameter that specifies the operation type on matrix A.


If transa = 'N' or 'n', op(A)=A and the matrix A is assumed unchanged
on input.
If transa = 'T' or 't', it is assumed that A should be transposed.

If transa = 'C' or 'c', it is assumed that A should be conjugate


transposed.
If transa = 'R' or 'r', it is assumed that A should be conjugated (and not
transposed).
If the data is real, then transa = 'R' is the same as transa = 'N', and
transa = 'C' is the same as transa = 'T'.

transb Parameter that specifies the operation type on matrix B.


If transb = 'N' or 'n', op(B)=B and the matrix B is assumed unchanged
on input.
If transb = 'T' or 't', it is assumed that B should be transposed.

If transb = 'C' or 'c', it is assumed that B should be conjugate


transposed.
If transb = 'R' or 'r', it is assumed that B should be conjugated (and not
transposed).


If the data is real, then transb = 'R' is the same as transb = 'N', and
transb = 'C' is the same as transb = 'T'.

m The number of matrix rows in op(A), op(B), and C.

n The number of matrix columns in op(A), op(B), and C.

alpha This parameter scales the input matrix by alpha.

a Array.

lda Distance between the first elements in adjacent columns (in the case of the
column-major order) or rows (in the case of the row-major order) in the
source matrix A; measured in the number of elements.
For ordering = 'C' or 'c': when transa = 'N', 'n', 'R', or 'r', lda
must be at least max(1,m); otherwise lda must be max(1,n).

For ordering = 'R' or 'r': when transa = 'N', 'n', 'R', or 'r', lda
must be at least max(1,n); otherwise lda must be max(1,m).

beta This parameter scales the input matrix by beta.

b Array.

ldb Distance between the first elements in adjacent columns (in the case of the
column-major order) or rows (in the case of the row-major order) in the
source matrix B; measured in the number of elements.
For ordering = 'C' or 'c': when transb = 'N', 'n', 'R', or 'r', ldb
must be at least max(1,m); otherwise ldb must be max(1,n).

For ordering = 'R' or 'r': when transb = 'N', 'n', 'R', or 'r', ldb
must be at least max(1,n); otherwise ldb must be max(1,m).

ldc Distance between the first elements in adjacent columns (in the case of the
column-major order) or rows (in the case of the row-major order) in the
destination matrix C; measured in the number of elements.
If ordering = 'C' or 'c', then ldc must be at least max(1, m),
otherwise ldc must be at least max(1, n).

Output Parameters

c Array.

Interfaces

cblas_?gemm_alloc
Allocates storage for a packed matrix.

Syntax
float* cblas_sgemm_alloc (const CBLAS_IDENTIFIER identifier, const MKL_INT m, const
MKL_INT n, const MKL_INT k);
double* cblas_dgemm_alloc (const CBLAS_IDENTIFIER identifier, const MKL_INT m, const
MKL_INT n, const MKL_INT k);

Include Files
• mkl.h

Description
The cblas_?gemm_alloc routine is one of a set of related routines that enable use of an internal packed
storage. Call the cblas_?gemm_alloc routine first to allocate storage for a packed matrix structure to be
used in subsequent calls, ultimately to compute
C := alpha*op(A)*op(B) + beta*C,
where:

op(X) is one of the operations op(X) = X, op(X) = XT, or op(X) = XH,


alpha and beta are scalars,
A , B, and C are matrices:
op(A) is an m-by-k matrix,
op(B) is a k-by-n matrix,
C is an m-by-n matrix.

The cblas_?gemm_alloc routine is not supported on Windows* OS for the IA-32 architecture with
single dynamic library linking.

Input Parameters

identifier Specifies which matrix is to be packed:


If identifier = CblasAMatrix, the routine allocates storage to pack
matrix A.
If identifier = CblasBMatrix, the routine allocates storage to pack
matrix B.

m Specifies the number of rows of matrix op(A) and of the matrix C. The value
of m must be at least zero.

n Specifies the number of columns of matrix op(B) and the number of


columns of matrix C. The value of n must be at least zero.

k Specifies the number of columns of matrix op(A) and the number of rows of
matrix op(B). The value of k must be at least zero.

Return Values
The function returns the allocated storage.

See Also
cblas_?gemm_pack: Performs scaling and packing of the matrix into the previously allocated buffer.
cblas_?gemm_compute: Computes a matrix-matrix product with general matrices where one or both input matrices are stored in a packed data structure and adds the result to a scalar-matrix product.
cblas_?gemm_free: Frees the storage previously allocated for the packed matrix.
cblas_?gemm for a detailed description of general matrix multiplication.


cblas_?gemm_pack
Performs scaling and packing of the matrix into the
previously allocated buffer.

Syntax
void cblas_sgemm_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier,
const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const
float alpha, const float *src, const MKL_INT ld, float *dest);
void cblas_dgemm_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier,
const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const
double alpha, const double *src, const MKL_INT ld, double *dest);

Include Files
• mkl.h

Description
The cblas_?gemm_pack routine is one of a set of related routines that enable use of an internal packed
storage. Call cblas_?gemm_pack after successfully calling cblas_?gemm_alloc. The cblas_?gemm_pack
scales the identified matrix by alpha and packs it into the buffer allocated previously with cblas_?
gemm_alloc.
The cblas_?gemm_pack routine performs this operation:

dest := alpha*op(src) as part of the computation C := alpha*op(A)*op(B) + beta*C


where:

op(X) is one of the operations op(X) = X, op(X) = XT, or op(X) = XH,


alpha and beta are scalars,
src is a matrix,
A , B, and C are matrices
op(src) is an m-by-k matrix if identifier = CblasAMatrix,
op(src) is a k-by-n matrix if identifier = CblasBMatrix,
dest is an internal packed storage buffer.

NOTE
You must use the same value of the Layout parameter for the entire sequence of related cblas_?
gemm_pack and cblas_?gemm_compute calls.
For best performance, use the same number of threads for packing and for computing.
If packing for both A and B matrices, you must use the same number of threads for packing A as for
packing B.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

identifier Specifies which matrix is to be packed:


If identifier = CblasAMatrix, the routine allocates storage to pack
matrix A.

If identifier = CblasBMatrix, the routine allocates storage to pack
matrix B.

trans Specifies the form of op(src) used in the packing:

If trans = CblasNoTrans op(src) = src.

If trans = CblasTrans op(src) = srcT.

If trans = CblasConjTrans op(src) = srcH.

m Specifies the number of rows of the matrix op(A) and of the matrix C. The
value of m must be at least zero.

n Specifies the number of columns of the matrix op(B) and the number of
columns of the matrix C. The value of n must be at least zero.

k Specifies the number of columns of the matrix op(A) and the number of
rows of the matrix op(B). The value of k must be at least zero.

alpha Specifies the scalar alpha.

src Array.

If identifier = CblasAMatrix and trans = CblasNoTrans:

• Layout = CblasColMajor: size ld*k. Before entry, the leading m-by-k part of the array src must contain the matrix A.
• Layout = CblasRowMajor: size ld*m. Before entry, the leading k-by-m part of the array src must contain the matrix A.

If identifier = CblasAMatrix and trans = CblasTrans or trans = CblasConjTrans:

• Layout = CblasColMajor: size ld*m. Before entry, the leading k-by-m part of the array src must contain the matrix A.
• Layout = CblasRowMajor: size ld*k. Before entry, the leading m-by-k part of the array src must contain the matrix A.

If identifier = CblasBMatrix and trans = CblasNoTrans:

• Layout = CblasColMajor: size ld*n. Before entry, the leading k-by-n part of the array src must contain the matrix B.
• Layout = CblasRowMajor: size ld*k. Before entry, the leading n-by-k part of the array src must contain the matrix B.

If identifier = CblasBMatrix and trans = CblasTrans or trans = CblasConjTrans:

• Layout = CblasColMajor: size ld*k. Before entry, the leading n-by-k part of the array src must contain the matrix B.
• Layout = CblasRowMajor: size ld*n. Before entry, the leading k-by-n part of the array src must contain the matrix B.

ld Specifies the leading dimension of src as declared in the calling (sub)program.

If identifier = CblasAMatrix:

• Layout = CblasColMajor: ld must be at least max(1, m) when trans = CblasNoTrans, and at least max(1, k) otherwise.
• Layout = CblasRowMajor: ld must be at least max(1, k) when trans = CblasNoTrans, and at least max(1, m) otherwise.

If identifier = CblasBMatrix:

• Layout = CblasColMajor: ld must be at least max(1, k) when trans = CblasNoTrans, and at least max(1, n) otherwise.
• Layout = CblasRowMajor: ld must be at least max(1, n) when trans = CblasNoTrans, and at least max(1, k) otherwise.

dest Scaled and packed internal storage buffer.

Output Parameters

dest Overwritten by the matrix alpha*op(src).

See Also
cblas_?gemm_alloc: Allocates storage for a packed matrix.
cblas_?gemm_compute: Computes a matrix-matrix product with general matrices where one or both input matrices are stored in a packed data structure and adds the result to a scalar-matrix product.
cblas_?gemm_free: Frees the storage previously allocated for the packed matrix.
cblas_?gemm for a detailed description of general matrix multiplication.

cblas_?gemm_compute
Computes a matrix-matrix product with general
matrices where one or both input matrices are stored
in a packed data structure and adds the result to a
scalar-matrix product.

Syntax
void cblas_sgemm_compute (const CBLAS_LAYOUT Layout, const MKL_INT transa, const
MKL_INT transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const float *a,
const MKL_INT lda, const float *b, const MKL_INT ldb, const float beta, float *c,
const MKL_INT ldc);
void cblas_dgemm_compute (const CBLAS_LAYOUT Layout, const MKL_INT transa, const
MKL_INT transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const double *a,
const MKL_INT lda, const double *b, const MKL_INT ldb, const double beta, double *c,
const MKL_INT ldc);

Include Files
• mkl.h

Description
The cblas_?gemm_compute routine is one of a set of related routines that enable use of an internal packed
storage. After calling cblas_?gemm_pack call cblas_?gemm_compute to compute

C := op(A)*op(B) + beta*C,
where:

op(X) is one of the operations op(X) = X, op(X) = XT, or op(X) = XH,


beta is a scalar,
A , B, and C are matrices:
op(A) is an m-by-k matrix,
op(B) is a k-by-n matrix,
C is an m-by-n matrix.

NOTE
You must use the same value of the Layout parameter for the entire sequence of related cblas_?
gemm_pack and cblas_?gemm_compute calls.
For best performance, use the same number of threads for packing and for computing.
If packing for both A and B matrices, you must use the same number of threads for packing A as for
packing B.

Input Parameters

Layout Specifies whether two-dimensional array storage is row-major


(CblasRowMajor) or column-major (CblasColMajor).

transa Specifies the form of op(A) used in the matrix multiplication, one of the
CBLAS_TRANSPOSE or CBLAS_STORAGE enumerated types:
If transa = CblasNoTrans op(A) = A.


If transa = CblasTrans op(A) = AT.

If transa = CblasConjTrans op(A) = AH.

If transa = CblasPacked the matrix in array a is packed and lda is


ignored.

transb Specifies the form of op(B) used in the matrix multiplication, one of the
CBLAS_TRANSPOSE or CBLAS_STORAGE enumerated types:
If transb = CblasNoTrans op(B) = B.

If transb = CblasTrans op(B) = BT.

If transb = CblasConjTrans op(B) = BH.

If transb = CblasPacked the matrix in array b is packed and ldb is


ignored.

m Specifies the number of rows of the matrix op(A) and of the matrix C. The
value of m must be at least zero.

n Specifies the number of columns of the matrix op(B) and the number of
columns of the matrix C. The value of n must be at least zero.

k Specifies the number of columns of the matrix op(A) and the number of
rows of the matrix op(B). The value of k must be at least zero.

a Array.

If transa = CblasNoTrans:

• Layout = CblasColMajor: size lda*k. Before entry, the leading m-by-k part of the array a must contain the matrix A.
• Layout = CblasRowMajor: size lda*m. Before entry, the leading k-by-m part of the array a must contain the matrix A.

If transa = CblasTrans or transa = CblasConjTrans:

• Layout = CblasColMajor: size lda*m. Before entry, the leading k-by-m part of the array a must contain the matrix A.
• Layout = CblasRowMajor: size lda*k. Before entry, the leading m-by-k part of the array a must contain the matrix A.

If transa = CblasPacked: the matrix A is stored in the internal packed format.

lda Specifies the leading dimension of a as declared in the calling (sub)program.

• Layout = CblasColMajor: lda must be at least max(1, m) when transa = CblasNoTrans, and at least max(1, k) when transa = CblasTrans or CblasConjTrans.
• Layout = CblasRowMajor: lda must be at least max(1, k) when transa = CblasNoTrans, and at least max(1, m) when transa = CblasTrans or CblasConjTrans.
• When transa = CblasPacked, lda is ignored.

b Array.

If transb = CblasNoTrans:

• Layout = CblasColMajor: size ldb*n. Before entry, the leading k-by-n part of the array b must contain the matrix B.
• Layout = CblasRowMajor: size ldb*k. Before entry, the leading n-by-k part of the array b must contain the matrix B.

If transb = CblasTrans or transb = CblasConjTrans:

• Layout = CblasColMajor: size ldb*k. Before entry, the leading n-by-k part of the array b must contain the matrix B.
• Layout = CblasRowMajor: size ldb*n. Before entry, the leading k-by-n part of the array b must contain the matrix B.

If transb = CblasPacked: the matrix B is stored in the internal packed format.

ldb Specifies the leading dimension of b as declared in the calling (sub)program.

• Layout = CblasColMajor: ldb must be at least max(1, k) when transb = CblasNoTrans, and at least max(1, n) when transb = CblasTrans or CblasConjTrans.
• Layout = CblasRowMajor: ldb must be at least max(1, n) when transb = CblasNoTrans, and at least max(1, k) when transb = CblasTrans or CblasConjTrans.
• When transb = CblasPacked, ldb is ignored.

beta Specifies the scalar beta. When beta is equal to zero, then c need not be
set on input.

c Array.

• Layout = CblasColMajor: size ldc*n. Before entry, the leading m-by-n part of the array c must contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.
• Layout = CblasRowMajor: size ldc*m. Before entry, the leading n-by-m part of the array c must contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.

ldc Specifies the leading dimension of c as declared in the calling


(sub)program.

Layout = CblasColMajor ldc must be at least max(1, m).

Layout = CblasRowMajor ldc must be at least max(1, n).

Output Parameters

c Overwritten by the m-by-n matrix op(A)*op(B) + beta*C.

See Also
cblas_?gemm_alloc: Allocates storage for a packed matrix.
cblas_?gemm_pack: Performs scaling and packing of the matrix into the previously allocated buffer.
cblas_?gemm_free: Frees the storage previously allocated for the packed matrix.
cblas_?gemm for a detailed description of general matrix multiplication.

cblas_?gemm_free
Frees the storage previously allocated for the packed
matrix.

Syntax
void cblas_sgemm_free (float *dest);
void cblas_dgemm_free (double *dest);

Include Files
• mkl.h

Description
The cblas_?gemm_free routine is one of a set of related routines that enable use of an internal packed
storage. Call the cblas_?gemm_free routine last to release storage for the packed matrix structure allocated
with cblas_?gemm_alloc.

Input Parameters

dest Previously allocated storage.

Output Parameters

dest The freed buffer.

See Also
cblas_?gemm_alloc: Allocates storage for a packed matrix.
cblas_?gemm_pack: Performs scaling and packing of the matrix into the previously allocated buffer.
cblas_?gemm_compute: Computes a matrix-matrix product with general matrices where one or both input matrices are stored in a packed data structure and adds the result to a scalar-matrix product.
cblas_?gemm for a detailed description of general matrix multiplication.
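
For illustration only, a minimal sketch of the complete allocate/pack/compute/free sequence for a 2-by-2 double-precision product in which the A matrix is packed once; the sizes and values are chosen purely for demonstration:

#include <stdio.h>
#include "mkl.h"

int main(void) {
    /* Compute C := alpha*A*B + beta*C with A held in the internal packed format. */
    MKL_INT m = 2, n = 2, k = 2;
    double a[4] = {1, 2, 3, 4};        /* row-major 2x2 */
    double b[4] = {5, 6, 7, 8};
    double c[4] = {0, 0, 0, 0};
    double alpha = 1.0, beta = 0.0;

    /* Allocate the internal buffer for A and pack alpha*A into it. */
    double *ap = cblas_dgemm_alloc(CblasAMatrix, m, n, k);
    cblas_dgemm_pack(CblasRowMajor, CblasAMatrix, CblasNoTrans,
                     m, n, k, alpha, a, k, ap);

    /* The packed buffer already carries alpha, so compute applies op(A)*op(B) + beta*C. */
    cblas_dgemm_compute(CblasRowMajor, CblasPacked, CblasNoTrans,
                        m, n, k, ap, k, b, n, beta, c, n);

    cblas_dgemm_free(ap);
    printf("c(1,1) = %f\n", c[0]);     /* 1*5 + 2*7 = 19 */
    return 0;
}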

LAPACK Routines
This chapter describes the Intel® Math Kernel Library implementation of routines from the LAPACK package
that are used for solving systems of linear equations, linear least squares problems, eigenvalue and singular
value problems, and performing a number of related computational tasks. The library includes LAPACK
routines for both real and complex data. Routines are supported for systems of equations with the following
types of matrices:
• general
• banded
• symmetric or Hermitian positive-definite (full, packed, and rectangular full packed (RFP) storage)
• symmetric or Hermitian positive-definite banded
• symmetric or Hermitian indefinite (both full and packed storage)
• symmetric or Hermitian indefinite banded
• triangular (full, packed, and RFP storage)
• triangular banded
• tridiagonal
• diagonally dominant tridiagonal.

NOTE
Different arrays used as parameters to Intel MKL LAPACK routines must not overlap.

WARNING
LAPACK routines assume that input matrices do not contain IEEE 754 special values such as INF or
NaN values. Using these special values may cause LAPACK to return unexpected results or become
unstable.

C Interface Conventions for LAPACK Routines


The C interfaces are implemented for most of the Intel MKL LAPACK driver and computational routines.

Function Prototypes
Intel MKL supports four distinct floating-point precisions. Each corresponding prototype looks similar, usually
differing only in the data type. C interface LAPACK function names follow the form <?><name>, where <?> is:

• LAPACKE_s for float


• LAPACKE_d for double
• LAPACKE_c for lapack_complex_float
• LAPACKE_z for lapack_complex_double

A specific example follows. To solve a system of linear equations with a packed Cholesky-factored Hermitian
positive-definite matrix with complex precision, use the following:
lapack_int LAPACKE_cpptrs(int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_float * ap, lapack_complex_float * b, lapack_int ldb);


Workspace Arrays
In contrast to the Fortran interface, the LAPACK C interface omits workspace parameters because workspace
is allocated during runtime and released upon completion of the function operation.
If you prefer to allocate workspace arrays yourself, the LAPACK C interface provides alternate interfaces with
work parameters. The name of the alternate interface is the same as the LAPACK C interface with _work
appended. For example, the syntax for the singular value decomposition of a real bidiagonal matrix is:

Fortran:
call sbdsdc ( uplo, compq, n, d, e, u, ldu, vt, ldvt, q, iq, work, iwork, info )

C LAPACK interface:
lapack_int LAPACKE_sbdsdc( int matrix_layout, char uplo, char compq, lapack_int n, float* d, float* e, float* u, lapack_int ldu, float* vt, lapack_int ldvt, float* q, lapack_int* iq );

Alternate C LAPACK interface with work parameters:
lapack_int LAPACKE_sbdsdc_work( int matrix_layout, char uplo, char compq, lapack_int n, float* d, float* e, float* u, lapack_int ldu, float* vt, lapack_int ldvt, float* q, lapack_int* iq, float* work, lapack_int* iwork );

See the install_dir/include/mkl_lapacke.h file for the full list of alternative C LAPACK interfaces.

The Intel MKL Fortran-specific documentation contains details about workspace arrays.

Mapping Fortran Data Types against C Data Types


Fortran Data Types vs. C Data Types
FORTRAN C

INTEGER lapack_int

LOGICAL lapack_logical

REAL float

DOUBLE PRECISION double

COMPLEX lapack_complex_float

COMPLEX*16/DOUBLE COMPLEX lapack_complex_double

CHARACTER char

C Type Definitions
You can find type definitions specific to Intel MKL such as MKL_INT, MKL_Complex8, and MKL_Complex16 in
install_dir/mkl_types.h.

C types:
#ifndef lapack_int
#define lapack_int MKL_INT
#endif

#ifndef lapack_logical
#define lapack_logical lapack_int
#endif

Complex Type Definitions:

Complex type for single precision:

#ifndef lapack_complex_float
#define lapack_complex_float MKL_Complex8
#endif

Complex type for double precision:

#ifndef lapack_complex_double
#define lapack_complex_double MKL_Complex16
#endif

Matrix Layout Definitions:
#define LAPACK_ROW_MAJOR 101
#define LAPACK_COL_MAJOR 102

See Matrix Layout for LAPACK Routines for an explanation of row-major order
and column-major order storage.

Error Code Definitions:
#define LAPACK_WORK_MEMORY_ERROR      -1010 /* Failed to allocate memory for a working array */
#define LAPACK_TRANSPOSE_MEMORY_ERROR -1011 /* Failed to allocate memory for transposed matrix */

Matrix Layout for LAPACK Routines


There are two general methods of storing a two dimensional matrix in linear (one dimensional) memory:
column-wise (column major order) or row-wise (row major order). Consider an M-by-N matrix A:

Column Major Layout


In column major layout the first index, i, of matrix elements ai,j changes faster than the second index when
accessing sequential memory locations. In other words, for 1 ≤i < M, if the element ai,j is stored in a specific
location in memory, the element ai+1,j is stored in the next location, and, for 1 ≤j < N, the element aM,j is
stored in the location previous to element a1,j+1. So the matrix elements are located in memory according to
this sequence:
{a1,1, a2,1, ..., aM,1, a1,2, a2,2, ..., aM,2, ..., a1,N, a2,N, ..., aM,N}


Row Major Layout


In row major layout the second index, j, of matrix elements ai,j changes faster than the first index when
accessing sequential memory locations. In other words, for 1 ≤j < N, if the element ai,j is stored in a specific
location in memory, the element ai,j+1 is stored in the next location, and, for 1 ≤i < M, the element ai,N is
stored in the location previous to element ai+1,1. So the matrix elements are located in memory according to
this sequence:
{a1,1, a1,2, ..., a1,N, a2,1, a2,2, ..., a2,N, ..., aM,1, aM,2, ..., aM,N}
Leading Dimension Parameter
A leading dimension parameter allows use of LAPACK routines on a submatrix of a larger matrix. For
example, the submatrix B can be extracted from the original matrix A defined previously:

B is formed from rows with indices i0 + 1 to i0 + K and columns j0 + 1 to j0 + L of matrix A. To specify matrix
B, LAPACK routines require four parameters:
• the number of rows K;
• the number of columns L;
• a pointer to the start of the array containing elements of B;
• the leading dimension of the array containing elements of B.
The leading dimension depends on the layout of the matrix:
• Column major layout
Leading dimension ldb=M, the number of rows of matrix A.

Starting address: offset by i0 + j0*ldb from a1,1.

• Row major layout


Leading dimension ldb=N, the number of columns of matrix A.

Starting address: offset by i0*ldb + j0 from a1,1.
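
For illustration only, a small sketch of computing the starting address of a submatrix from the offsets given above; the helper function names are hypothetical:

#include <stdio.h>

/* Return a pointer to the submatrix whose top left element is at (zero-based)
   row i0 and column j0 of the matrix stored in a with leading dimension ld. */
double *submatrix_col_major(double *a, int ld, int i0, int j0) {
    return a + i0 + (long)j0 * ld;      /* column major: offset i0 + j0*ld */
}

double *submatrix_row_major(double *a, int ld, int i0, int j0) {
    return a + (long)i0 * ld + j0;      /* row major: offset i0*ld + j0 */
}

int main(void) {
    double a[4 * 5];                    /* a 4-by-5 matrix, column major, ld = 4 */
    for (int p = 0; p < 20; ++p) a[p] = p;

    double *b = submatrix_col_major(a, 4, 1, 2);   /* block starting at a(2,3) */
    printf("b[0] = %f\n", b[0]);        /* a(2,3) is a[1 + 2*4] = a[9] = 9 */

    double *r = submatrix_row_major(a, 5, 1, 2);   /* same offsets if a were stored row major, ld = 5 */
    printf("r[0] = %f\n", r[0]);
    return 0;
}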

Matrix Storage Schemes for LAPACK Routines
LAPACK routines use the following matrix storage schemes:

• Full Storage
• Packed Storage
• Band Storage
• Rectangular Full Packed (RFP) Storage

Full Storage
Consider an m-by-n matrix A :
    | a1,1  a1,2  a1,3  ...  a1,n |
    | a2,1  a2,2  a2,3  ...  a2,n |
A = | a3,1  a3,2  a3,3  ...  a3,n |
    |  ...   ...   ...  ...   ... |
    | am,1  am,2  am,3  ...  am,n |

It is stored in a one-dimensional array a of length at least lda*n for column major layout or m*lda for row
major layout. Element ai,j is stored as array element a[k] where the mapping of k(i, j) is defined as

• column major layout: k(i, j) = i - 1 + (j - 1)*lda


• row major layout: k(i, j) = (i - 1)*lda + j - 1

NOTE
Although LAPACK accepts parameter values of zero for matrix size, in general the size of the array
used to store an m-by-n matrix A with leading dimension lda should be greater than or equal to
max(1, n*lda) for column major layout and max (1, m*lda) for row major layout.

NOTE
Even though the array used to store a matrix is one-dimensional, for simplicity the documentation
sometimes refers parts of the array such as rows, columns, upper and lower triangular part, and
diagonals. These refer to the parts of the matrix stored within the array. For example, the lower
triangle of array a is defined as the subset of elements a[k(i,j)] with i≥j.
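
For illustration only, a small sketch of the full-storage index mapping k(i, j) described above; the helper function names are hypothetical:

#include <stdio.h>

/* Full-storage index of element a(i, j), with 1-based i and j, as described above.
   The returned value indexes the one-dimensional array a. */
long full_index_col_major(long i, long j, long lda) { return (i - 1) + (j - 1) * lda; }
long full_index_row_major(long i, long j, long lda) { return (i - 1) * lda + (j - 1); }

int main(void) {
    /* A 3-by-2 matrix stored column major with lda = 3: columns (1,2,3) and (4,5,6). */
    double a[6] = {1, 2, 3, 4, 5, 6};
    printf("a(2,2) = %f\n", a[full_index_col_major(2, 2, 3)]);  /* prints 5 */
    return 0;
}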

Packed Storage
The packed storage format compactly stores matrix elements when only one part of the matrix, the upper or
lower triangle, is necessary to determine all of the elements of the matrix. This is the case when the matrix
is upper triangular, lower triangular, symmetric, or Hermitian. For an n-by-n matrix of one of these types, a
linear array ap of length n*(n + 1)/2 is adequate. Two parameters define the storage scheme:
matrix_layout, which specifies column major (with the value LAPACK_COL_MAJOR) or row major (with the
value LAPACK_ROW_MAJOR) matrix layout, and uplo, which specifies that the upper triangle (with the value
'U') or the lower triangle (with the value 'L') is stored.
Element ai,j is stored as array element ap[k] where the mapping of k(i, j) is defined as

uplo         Indices           matrix_layout = LAPACK_COL_MAJOR         matrix_layout = LAPACK_ROW_MAJOR

uplo = 'U'   1 ≤ i ≤ j ≤ n     k(i, j) = i - 1 + j*(j - 1)/2            k(i, j) = j - 1 + (i - 1)*(2*n - i)/2

uplo = 'L'   1 ≤ j ≤ i ≤ n     k(i, j) = i - 1 + (j - 1)*(2*n - j)/2    k(i, j) = j - 1 + i*(i - 1)/2

NOTE
Although LAPACK accepts parameter values of zero for matrix size, in general the size of the array
should be greater than or equal to max(1, n*(n + 1)/2).
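
The packed mappings can be expressed directly in C; the sketch below (matrix order chosen arbitrarily) follows the formulas in the table above:

#include <assert.h>

/* One-based (i, j) to zero-based packed index, following the table above. */
static long kp_col_upper(long i, long j)         { return (i - 1) + j * (j - 1) / 2; }            /* i <= j */
static long kp_col_lower(long i, long j, long n) { return (i - 1) + (j - 1) * (2 * n - j) / 2; }  /* j <= i */
static long kp_row_upper(long i, long j, long n) { return (j - 1) + (i - 1) * (2 * n - i) / 2; }  /* i <= j */
static long kp_row_lower(long i, long j)         { return (j - 1) + i * (i - 1) / 2; }            /* j <= i */

int main(void) {
    const long n = 4;
    /* Column major, uplo = 'U': ap = {a11, a12, a22, a13, a23, a33, ...} */
    assert(kp_col_upper(2, 3) == 4);
    /* Column major, uplo = 'L': ap = {a11, a21, a31, a41, a22, a32, ...} */
    assert(kp_col_lower(3, 2, n) == 5);
    /* Row major, uplo = 'U': ap = {a11, a12, a13, a14, a22, a23, ...} */
    assert(kp_row_upper(2, 3, n) == 5);
    /* Row major, uplo = 'L': ap = {a11, a21, a22, a31, a32, a33, ...} */
    assert(kp_row_lower(3, 2) == 4);
    return 0;
}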

Band Storage
When the non-zero elements of a matrix are confined to diagonal bands, it is possible to store the elements
more efficiently using band storage. For example, consider an m-by-n band matrix A with kl subdiagonals
and ku superdiagonals:
    A = | a1,1      a1,2      ...   a1,ku+1                                         |
        |  ...       ...      ...     ...         ...                               |
        | akl+1,1   akl+1,2   ...   akl+1,kl+1    ...    akl+1,kl+ku+1              |
        |           akl+2,2   ...     ...         ...      ...             ...      |
        |                     ...     ...         ...      ...             ...      |
        |                           akl+j,j     akl+j,j+1  ...    akl+j,kl+ku+j     |
        |                                         ...      ...      ...        ...  |
This matrix can be stored compactly in a one dimensional array ab. There are two operations involved in
storing the matrix: packing the band matrix into matrix AB, and converting the packed matrix to a one-
dimensional array.

• Packing the Band Matrix: How the band matrix is packed depends on the matrix layout.
• Column major layout: matrix A is packed in an ldab-by-n matrix AB column-wise so that the diagonals
of A become rows of array AB.
    AB = |                                    a1,ku+1         a2,ku+2         a3,ku+3         ... |
         |              ...                     ...             ...             ...           ... |
         | a1,3   ...   aku-1,ku+1            aku,ku+2        aku+1,ku+3         ...               |
         | a1,2   a2,3  ...    aku,ku+1       aku+1,ku+2      aku+2,ku+3         ...               |
         | a1,1   a2,2  a3,3   ...            aku+1,ku+1      aku+2,ku+2      aku+3,ku+3       ... |
         | a2,1   a3,2  a4,3   ...            aku+2,ku+1      aku+3,ku+2      aku+4,ku+3       ... |
         |  ...    ...   ...   ...              ...             ...             ...           ... |
         | akl+1,1  akl+2,2  akl+3,3  ...     akl+ku+1,ku+1   akl+ku+2,ku+2   akl+ku+3,ku+3    ... |

The number of rows of AB is ldab, where ldab ≥ kl + ku + 1, and the number of columns of AB is n.


• Row major layout: matrix A is packed in an m-by-ldab matrix AB row-wise so that the diagonals of A
become columns of AB.

    AB = | a1,1      a1,2      ...        a1,ku+1                                           |
         | a2,1      a2,2      a2,3       ...          a2,ku+2                              |
         | a3,1      a3,2      a3,3       a3,4         ...          a3,ku+3                 |
         |  ...       ...       ...        ...          ...          ...         ...        |
         | akl+1,1   akl+1,2   ...        akl+1,kl+1   akl+1,kl+2   ...   akl+1,kl+ku+1     |
         | akl+2,2   akl+2,3   ...        akl+2,kl+2   akl+2,kl+3   ...   akl+2,kl+ku+2     |
         | akl+3,3   akl+3,4   ...        akl+3,kl+3   akl+3,kl+4   ...   akl+3,kl+ku+3     |
         |  ...       ...       ...        ...          ...          ...         ...        |

The number of columns of AB is ldab, where ldab ≥ kl + ku + 1, and the number of rows of AB is m.

NOTE
For both column major and row major layout, elements of the upper left triangle of AB are not used.
Depending on the relationship of the dimensions m, n, kl, and ku, the lower right triangle might not be
used.

• Converting the Packed Matrix to a One-Dimensional Array: The packed matrix AB is stored in a linear
array ab as described in Full Storage . The size of ab should be greater than or equal to the total number
of elements of matrix AB: ldab*n for column major layout or ldab*m for row major layout. The leading
dimension of ab, ldab, must be greater than or equal to kl + ku + 1 (and some routines require it to be
even larger).
Element ai,j is stored as array element ab[k(i, j)] where the mapping of k(i, j) is defined as
• column major layout: k(i, j) = i + ku - j + (j - 1)*ldab; 1 ≤ j ≤ n, max(1, j - ku) ≤ i ≤ min(m, j + kl)
• row major layout: k(i, j) = (i - 1)*ldab + kl + j - i; 1 ≤ i ≤ m, max(1, i - kl) ≤ j ≤ min(n, i + ku)

NOTE
Although LAPACK accepts parameter values of zero for matrix size, in general the size of the array
should be greater than or equal to max(1, n*ldab) for column major layout and max (1, m*ldab) for
row major layout.
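
A minimal C sketch of the band mapping (band widths and indices chosen arbitrarily) follows the two formulas above:

#include <assert.h>

/* One-based (i, j) to zero-based index into the linear array ab holding
   the packed band matrix AB, following the mappings above. */
static long kb_col(long i, long j, long ku, long ldab) { return (i + ku - j) + (j - 1) * ldab; }
static long kb_row(long i, long j, long kl, long ldab) { return (i - 1) * ldab + (kl + j - i); }

int main(void) {
    /* Example band widths: kl = 2, ku = 1, so ldab = kl + ku + 1 = 4. */
    const long kl = 2, ku = 1, ldab = 4;
    assert(kb_col(1, 1, ku, ldab) == 1);  /* diagonal of column 1 is in row ku + 1 of AB */
    assert(kb_col(3, 2, ku, ldab) == 6);  /* first subdiagonal element of column 2       */
    assert(kb_row(1, 1, kl, ldab) == 2);  /* diagonal of row 1 is in column kl + 1 of AB */
    assert(kb_row(3, 2, kl, ldab) == 9);  /* first subdiagonal element of row 3          */
    return 0;
}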

Rectangular Full Packed Storage


A combination of full and packed storage, rectangular full packed storage can be used to store the upper or
lower triangle of a matrix which is upper triangular, lower triangular, symmetric, or Hermitian. It offers the
storage savings of packed storage plus the efficiency of using full storage Level 3 BLAS and LAPACK routines.
Three parameters define the storage scheme: matrix_layout, which specifies column major (with the value
LAPACK_COL_MAJOR) or row major (with the value LAPACK_ROW_MAJOR) matrix layout; uplo, which specifies
that the upper triangle (with the value 'U') or the lower triangle (with the value 'L') is stored; and transr,
which specifies normal (with the value 'N'), transpose (with the value 'T'), or conjugate transpose (with the
value 'C') operation on the matrix.
Consider an N-by-N matrix A:
    A = | a0,0     a0,1     a0,2     ...   a0,N-1   |
        | a1,0     a1,1     a1,2     ...   a1,N-1   |
        | a2,0     a2,1     a2,2     ...   a2,N-1   |
        |  ...      ...      ...     ...    ...     |
        | aN-1,0   aN-1,1   aN-1,2   ...   aN-1,N-1 |


The upper or lower triangle of A can be stored in the array ap of length N*(N + 1)/2.

Additionally, define k as the integer part of N/2, such that N=2*k if N is even, and N=2*k + 1 if N is odd.
Storing the matrix involves packing the matrix into a rectangular matrix, and then storing the matrix in a
one-dimensional array. The size of rectangular matrix AP required for the N-by-N matrix A is N + 1 by N/2 for
even N, and N by (N + 1)/2 for odd N.
These examples illustrate the rectangular full packed storage method.

• Upper triangular - uplo = 'U'


Consider a matrix A with N = 6:
    A = | a0,0  a0,1  a0,2  a0,3  a0,4  a0,5 |
        | a1,0  a1,1  a1,2  a1,3  a1,4  a1,5 |
        | a2,0  a2,1  a2,2  a2,3  a2,4  a2,5 |
        | a3,0  a3,1  a3,2  a3,3  a3,4  a3,5 |
        | a4,0  a4,1  a4,2  a4,3  a4,4  a4,5 |
        | a5,0  a5,1  a5,2  a5,3  a5,4  a5,5 |

• Not transposed - transr = 'N'

The elements of the upper triangle of A can be packed in a matrix with the dimensions (N + 1)-by-
(N/2) = 7 by 3:
         | a0,3  a0,4  a0,5 |
         | a1,3  a1,4  a1,5 |
         | a2,3  a2,4  a2,5 |
    AP = | a3,3  a3,4  a3,5 |
         | a0,0  a4,4  a4,5 |
         | a0,1  a1,1  a5,5 |
         | a0,2  a1,2  a2,2 |

• Transposed or conjugate transposed - transr = 'T' or transr = 'C'

The elements of the upper triangle of A can be packed in a matrix with the dimensions (N/2) by (N
+ 1) = 3 by 7:
         | a0,3  a1,3  a2,3  a3,3  a0,0  a0,1  a0,2 |
    AP = | a0,4  a1,4  a2,4  a3,4  a4,4  a1,1  a1,2 |
         | a0,5  a1,5  a2,5  a3,5  a4,5  a5,5  a2,2 |

Consider a matrix A with N = 5:


        | a0,0  a0,1  a0,2  a0,3  a0,4 |
        | a1,0  a1,1  a1,2  a1,3  a1,4 |
    A = | a2,0  a2,1  a2,2  a2,3  a2,4 |
        | a3,0  a3,1  a3,2  a3,3  a3,4 |
        | a4,0  a4,1  a4,2  a4,3  a4,4 |

• Not transposed - transr = 'N'

The elements of the upper triangle of A can be packed in a matrix with the dimensions (N)-by-((N
+1)/2) = 5 by 3:

         | a0,2  a0,3  a0,4 |
         | a1,2  a1,3  a1,4 |
    AP = | a2,2  a2,3  a2,4 |
         | a0,0  a3,3  a3,4 |
         | a0,1  a1,1  a4,4 |

• Transposed or conjugate transposed - transr = 'T' or transr = 'C'

The elements of the upper triangle of A can be packed in a matrix with the dimensions ((N + 1)/2)-by-N
= 3 by 5:

         | a0,2  a1,2  a2,2  a0,0  a0,1 |
    AP = | a0,3  a1,3  a2,3  a3,3  a1,1 |
         | a0,4  a1,4  a2,4  a3,4  a4,4 |

• Lower triangular - uplo = 'L'


Consider a matrix A with N = 6:
    A = | a0,0  a0,1  a0,2  a0,3  a0,4  a0,5 |
        | a1,0  a1,1  a1,2  a1,3  a1,4  a1,5 |
        | a2,0  a2,1  a2,2  a2,3  a2,4  a2,5 |
        | a3,0  a3,1  a3,2  a3,3  a3,4  a3,5 |
        | a4,0  a4,1  a4,2  a4,3  a4,4  a4,5 |
        | a5,0  a5,1  a5,2  a5,3  a5,4  a5,5 |

• Not transposed - transr = 'N'

The elements of the lower triangle of A can be packed in a matrix with the dimensions (N + 1)-by-
(N/2) = 7 by 3:
         | a3,3  a4,3  a5,3 |
         | a0,0  a4,4  a5,4 |
         | a1,0  a1,1  a5,5 |
    AP = | a2,0  a2,1  a2,2 |
         | a3,0  a3,1  a3,2 |
         | a4,0  a4,1  a4,2 |
         | a5,0  a5,1  a5,2 |

• Transposed or conjugate transposed - transr = 'T' or transr = 'C'

The elements of the lower triangle of A can be packed in a matrix with the dimensions (N/2) by (N
+ 1) = 3 by 7:
         | a3,3  a0,0  a1,0  a2,0  a3,0  a4,0  a5,0 |
    AP = | a4,3  a4,4  a1,1  a2,1  a3,1  a4,1  a5,1 |
         | a5,3  a5,4  a5,5  a2,2  a3,2  a4,2  a5,2 |

Consider a matrix A with N = 5:


        | a0,0  a0,1  a0,2  a0,3  a0,4 |
        | a1,0  a1,1  a1,2  a1,3  a1,4 |
    A = | a2,0  a2,1  a2,2  a2,3  a2,4 |
        | a3,0  a3,1  a3,2  a3,3  a3,4 |
        | a4,0  a4,1  a4,2  a4,3  a4,4 |

• Not transposed - transr = 'N'

The elements of the lower triangle of A can be packed in a matrix with the dimensions (N)-by-((N
+1)/2) = 5 by 3:
         | a0,0  a3,3  a4,3 |
         | a1,0  a1,1  a4,4 |
    AP = | a2,0  a2,1  a2,2 |
         | a3,0  a3,1  a3,2 |
         | a4,0  a4,1  a4,2 |

• Transposed or conjugate transposed - transr = 'T' or transr = 'C'

The elements of the lower triangle of A can be packed in a matrix with the dimensions ((N + 1)/2)-by-N
= 3 by 5:

         | a0,0  a1,0  a2,0  a3,0  a4,0 |
    AP = | a3,3  a1,1  a2,1  a3,1  a4,1 |
         | a4,3  a4,4  a2,2  a3,2  a4,2 |

The packed matrix AP can be stored using column major layout or row major layout.

NOTE
The matrix_layout and transr parameters can specify the same storage scheme: for example, the
storage scheme for matrix_layout = LAPACK_COL_MAJOR and transr = 'N' is the same as that for
matrix_layout = LAPACK_ROW_MAJOR and transr = 'T'.

Element ai,j is stored as array element ap[l] where the mapping of l(i, j) is defined in the following tables.

• Column major layout: matrix_layout = LAPACK_COL_MAJOR

  transr      uplo   N          l(i, j) =                      i                 j

  'N'         'U'    2*k        (j - k)*(N + 1) + i            0 ≤ i < N         max(i, k) ≤ j < N
                                i*(N + 1) + j + k + 1          0 ≤ i < k         i ≤ j < k

                     2*k + 1    (j - k)*N + i                  0 ≤ i < N         max(i, k) ≤ j < N
                                i*N + j + k + 1                0 ≤ i < k         i ≤ j < k

              'L'    2*k        j*(N + 1) + i + 1              0 ≤ i < N         0 ≤ j ≤ min(i, k)
                                (i - k)*(N + 1) + j - k        k ≤ i < N         k ≤ j ≤ i

                     2*k + 1    j*N + i                        0 ≤ i < N         0 ≤ j ≤ min(i, k)
                                (i - k)*N + j - k - 1          k + 1 ≤ i < N     k + 1 ≤ j ≤ i

  'T' or 'C'  'U'    2*k        i*k + j - k                    0 ≤ i < N         max(i, k) ≤ j < N
                                (j + k + 1)*k + i              0 ≤ i < k         i ≤ j < k

                     2*k + 1    i*(k + 1) + j - k              0 ≤ i < N         max(i, k) ≤ j < N
                                (j + k + 1)*(k + 1) + i        0 ≤ i < k         i ≤ j < k

              'L'    2*k        (i + 1)*k + j                  0 ≤ i < N         0 ≤ j ≤ min(i, k)
                                (j - k)*k + i - k              k ≤ i < N         k ≤ j ≤ i

                     2*k + 1    i*(k + 1) + j                  0 ≤ i < N         0 ≤ j ≤ min(i, k)
                                (j - k - 1)*(k + 1) + i - k    k + 1 ≤ i < N     k + 1 ≤ j ≤ i


• Row major layout: matrix_layout = LAPACK_ROW_MAJOR

  transr      uplo   N          l(i, j) =                      i                 j

  'N'         'U'    2*k        i*k + j - k                    0 ≤ i < N         max(i, k) ≤ j < N
                                (k + j + 1)*k + i              0 ≤ i < k         i ≤ j < k

                     2*k + 1    i*(k + 1) + j - k              0 ≤ i < N         max(i, k) ≤ j < N
                                (k + j + 1)*(k + 1) + i        0 ≤ i < k         i ≤ j < k

              'L'    2*k        (i + 1)*k + j                  0 ≤ i < N         0 ≤ j ≤ min(i, k)
                                (j - k)*k + i - k              k ≤ i < N         k ≤ j ≤ i

                     2*k + 1    i*(k + 1) + j                  0 ≤ i < N         0 ≤ j ≤ min(i, k)
                                (j - k - 1)*(k + 1) + i - k    k + 1 ≤ i < N     k + 1 ≤ j ≤ i

  'T' or 'C'  'U'    2*k        (j - k)*(N + 1) + i            0 ≤ i < N         max(i, k) ≤ j < N
                                i*(N + 1) + k + j + 1          0 ≤ i < k         i ≤ j < k

                     2*k + 1    (j - k)*N + i                  0 ≤ i < N         max(i, k) ≤ j < N
                                i*N + k + j + 1                0 ≤ i < k         i ≤ j < k

              'L'    2*k        j*(N + 1) + i + 1              0 ≤ i < N         0 ≤ j ≤ min(i, k)
                                (i - k)*(N + 1) + j - k        k ≤ i < N         k ≤ j ≤ i

                     2*k + 1    j*N + i                        0 ≤ i < N         0 ≤ j ≤ min(i, k)
                                (i - k)*N + j - k - 1          k + 1 ≤ i < N     k + 1 ≤ j ≤ i


NOTE
Although LAPACK accepts parameter values of zero for matrix size, in general the size of the array
should be greater than or equal to max(1, N*(N + 1)/2).
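
As an illustration of the RFP layout, the sketch below packs the upper triangle of a small matrix into RFP format using the ?trttf conversion routine; the matrix order and values are arbitrary, and the LAPACKE prototype of ?trttf is assumed here rather than defined in this section.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Pack the upper triangle of a 4-by-4 matrix (column major) into RFP
       format with transr = 'N' and uplo = 'U'. */
    const lapack_int n = 4;
    double a[16] = {          /* column major; only the upper triangle is referenced */
        1, 0, 0, 0,
        2, 5, 0, 0,
        3, 6, 8, 0,
        4, 7, 9, 10 };
    double arf[10];           /* n*(n + 1)/2 elements */

    lapack_int info = LAPACKE_dtrttf(LAPACK_COL_MAJOR, 'N', 'U', n, a, n, arf);
    if (info != 0) { printf("dtrttf returned info = %d\n", (int)info); return 1; }

    for (int p = 0; p < 10; ++p) printf("%g ", arf[p]);
    printf("\n");
    return 0;
}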

Mathematical Notation for LAPACK Routines


Descriptions of LAPACK routines use the following notation:
AH                       For an M-by-N matrix A, denotes the conjugate transposed N-by-M matrix with
                         elements (AH)i,j = conj(aj,i). For a real-valued matrix, AH = AT.

x·y                      The dot product of two vectors of length n, defined as x·y = x1y1 + x2y2 + ... + xnyn.

Ax = b                   A system of linear equations with an n-by-n matrix A = {aij}, a right-hand side
                         vector b = {bi}, and an unknown vector x = {xi}.

AX = B                   A set of systems with a common matrix A and multiple right-hand sides. The
                         columns of B are individual right-hand sides, and the columns of X are the
                         corresponding solutions.

|x|                      The vector with elements |xi| (absolute values of xi).

|A|                      The matrix with elements |aij| (absolute values of aij).

||x||∞ = maxi|xi|        The infinity-norm of the vector x.

||A||∞ = maxiΣj|aij|     The infinity-norm of the matrix A.

||A||1 = maxjΣi|aij|     The one-norm of the matrix A. ||A||1 = ||AT||∞ = ||AH||∞.

||x||2                   The 2-norm of the vector x: ||x||2 = (Σi|xi|2)1/2 = ||x||E (see the definition of the
                         Euclidean norm in this section).

||A||2                   The 2-norm (or spectral norm) of the matrix A.

||A||E                   The Euclidean norm of the matrix A: ||A||E2 = ΣiΣj|aij|2.

κ(A) = ||A||·||A-1||     The condition number of the matrix A.

λi                       Eigenvalues of the matrix A (for the definition of eigenvalues, see Eigenvalue
                         Problems).

σi                       Singular values of the matrix A. They are equal to the square roots of the
                         eigenvalues of AHA. (For more information, see Singular Value Decomposition.)

Error Analysis
In practice, most computations are performed with rounding errors. Besides, you often need to solve a
system Ax = b, where the data (the elements of A and b) are not known exactly. Therefore, it is important
to understand how the data errors and rounding errors can affect the solution x.
Data perturbations. If x is the exact solution of Ax = b, and x + δx is the exact solution of a perturbed
problem (A + δA)(x + δx) = (b + δb), then this estimate, given up to linear terms of perturbations,
holds:

||δx||/||x|| ≤ κ(A)·(||δA||/||A|| + ||δb||/||b||),

where A + δA is nonsingular and κ(A) = ||A||·||A-1||.

In other words, relative errors in A or b may be amplified in the solution vector x by a factor κ(A) = ||A||·
||A-1|| called the condition number of A.
Rounding errors have the same effect as relative perturbations c(n)ε in the original data. Here ε is the
machine precision, defined as the smallest positive number x such that 1 + x > 1; and c(n) is a modest
function of the matrix order n. The corresponding solution error is
||δx||/||x||≤c(n)κ(A)ε. (The value of c(n) is seldom greater than 10n.)

NOTE
Machine precision depends on the data type used. For example, it is usually defined in the float.h
file as FLT_EPSILON for the float datatype and DBL_EPSILON for the double datatype.


Thus, if your matrix A is ill-conditioned (that is, its condition number κ(A) is very large), then the error in
the solution x can also be large; you might even encounter a complete loss of precision. LAPACK provides
routines that allow you to estimate κ(A) (see Routines for Estimating the Condition Number) and also give
you a more precise estimate for the actual solution error (see Refining the Solution and Estimating Its Error).

LAPACK Linear Equation Routines


This section describes routines for performing the following computations:

– factoring the matrix (except for triangular matrices)


– equilibrating the matrix (except for RFP matrices)
– solving a system of linear equations
– estimating the condition number of a matrix (except for RFP matrices)
– refining the solution of linear equations and computing its error bounds (except for RFP matrices)
– inverting the matrix.

To solve a particular problem, you can call two or more computational routines or call a corresponding driver
routine that combines several tasks in one call. For example, to solve a system of linear equations with a
general matrix, call ?getrf (LU factorization) and then ?getrs (computing the solution). Then, call ?gerfs
to refine the solution and get the error bounds. Alternatively, use the driver routine ?gesvx that performs all
these tasks in one call.
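
A minimal C sketch of this sequence is shown below (the matrix and right-hand side are arbitrary); it calls ?getrf and then ?getrs, the same two computational routines named above, and skips the optional refinement step:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Solve A*x = b for a 3-by-3 general matrix with ?getrf + ?getrs.
       A driver routine such as ?gesv or ?gesvx would combine these steps. */
    lapack_int n = 3, nrhs = 1, ipiv[3], info;
    double a[9] = { 4, 1, 2,
                    1, 5, 3,
                    2, 3, 6 };      /* row major layout */
    double b[3] = { 7, 9, 11 };

    info = LAPACKE_dgetrf(LAPACK_ROW_MAJOR, n, n, a, n, ipiv);        /* A = P*L*U   */
    if (info == 0)
        info = LAPACKE_dgetrs(LAPACK_ROW_MAJOR, 'N', n, nrhs,
                              a, n, ipiv, b, nrhs);                   /* solve for x */
    printf("info = %d, x = %g %g %g\n", (int)info, b[0], b[1], b[2]);
    return 0;
}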

LAPACK Linear Equation Computational Routines


Table "Computational Routines for Systems of Equations with Real Matrices" lists the LAPACK computational
routines for factorizing, equilibrating, and inverting real matrices, estimating their condition numbers, solving
systems of equations with real matrices, refining the solution, and estimating its error. Table "Computational
Routines for Systems of Equations with Complex Matrices" lists similar routines for complex matrices.
Computational Routines for Systems of Equations with Real Matrices
Matrix type,                      Factorize       Equilibrate   Solve      Condition   Estimate    Invert
storage scheme                    matrix          matrix        system     number      error       matrix

general                           ?getrf          ?geequ,       ?getrs     ?gecon      ?gerfs,     ?getri
                                                  ?geequb                              ?gerfsx

general band                      ?gbtrf          ?gbequ,       ?gbtrs     ?gbcon      ?gbrfs,
                                                  ?gbequb                              ?gbrfsx

general tridiagonal               ?gttrf                        ?gttrs     ?gtcon      ?gtrfs

diagonally dominant               ?dttrfb                       ?dttrsb
tridiagonal

symmetric positive-definite       ?potrf          ?poequ,       ?potrs     ?pocon      ?porfs,     ?potri
                                                  ?poequb                              ?porfsx

symmetric positive-definite,      ?pptrf          ?ppequ        ?pptrs     ?ppcon      ?pprfs      ?pptri
packed storage

symmetric positive-definite,      ?pftrf                        ?pftrs                             ?pftri
RFP storage

symmetric positive-definite,      ?pbtrf          ?pbequ        ?pbtrs     ?pbcon      ?pbrfs
band

symmetric positive-definite,      ?pttrf                        ?pttrs     ?ptcon      ?ptrfs
tridiagonal

symmetric indefinite              ?sytrf          ?syequb       ?sytrs,    ?sycon      ?syrfs,     ?sytri,
                                                                ?sytrs2                ?syrfsx     ?sytri2,
                                                                                                   ?sytri2x

symmetric indefinite,             ?sptrf,                       ?sptrs     ?spcon      ?sprfs      ?sptri
packed storage                    mkl_?spffrt2,
                                  mkl_?spffrtx

triangular                                                      ?trtrs     ?trcon      ?trrfs      ?trtri

triangular, packed storage                                      ?tptrs     ?tpcon      ?tprfs      ?tptri

triangular, RFP storage                                                                            ?tftri

triangular band                                                 ?tbtrs     ?tbcon      ?tbrfs

Computational Routines for Systems of Equations with Complex Matrices


Matrix type,                      Factorize       Equilibrate   Solve      Condition   Estimate    Invert
storage scheme                    matrix          matrix        system     number      error       matrix

general                           ?getrf          ?geequ,       ?getrs     ?gecon      ?gerfs,     ?getri
                                                  ?geequb                              ?gerfsx

general band                      ?gbtrf          ?gbequ,       ?gbtrs     ?gbcon      ?gbrfs,
                                                  ?gbequb                              ?gbrfsx

general tridiagonal               ?gttrf                        ?gttrs     ?gtcon      ?gtrfs

Hermitian positive-definite       ?potrf          ?poequ,       ?potrs     ?pocon      ?porfs,     ?potri
                                                  ?poequb                              ?porfsx

Hermitian positive-definite,      ?pptrf          ?ppequ        ?pptrs     ?ppcon      ?pprfs      ?pptri
packed storage

Hermitian positive-definite,      ?pftrf                        ?pftrs                             ?pftri
RFP storage

Hermitian positive-definite,      ?pbtrf          ?pbequ        ?pbtrs     ?pbcon      ?pbrfs
band

Hermitian positive-definite,      ?pttrf                        ?pttrs     ?ptcon      ?ptrfs
tridiagonal

Hermitian indefinite              ?hetrf          ?heequb       ?hetrs,    ?hecon      ?herfs,     ?hetri,
                                                                ?hetrs2                ?herfsx     ?hetri2,
                                                                                                   ?hetri2x

symmetric indefinite              ?sytrf          ?syequb       ?sytrs,    ?sycon      ?syrfs,     ?sytri,
                                                                ?sytrs2                ?syrfsx     ?sytri2,
                                                                                                   ?sytri2x

Hermitian indefinite,             ?hptrf                        ?hptrs     ?hpcon      ?hprfs      ?hptri
packed storage

symmetric indefinite,             ?sptrf,                       ?sptrs     ?spcon      ?sprfs      ?sptri
packed storage                    mkl_?spffrt2,
                                  mkl_?spffrtx

triangular                                                      ?trtrs     ?trcon      ?trrfs      ?trtri

triangular, packed storage                                      ?tptrs     ?tpcon      ?tprfs      ?tptri

triangular, RFP storage                                                                            ?tftri

triangular band                                                 ?tbtrs     ?tbcon      ?tbrfs

Matrix Factorization: LAPACK Computational Routines


This section describes the LAPACK routines for matrix factorization. The following factorizations are
supported:

• LU factorization
• Cholesky factorization of real symmetric positive-definite matrices
• Cholesky factorization of real symmetric positive-definite matrices with pivoting
• Cholesky factorization of Hermitian positive-definite matrices
• Cholesky factorization of Hermitian positive-definite matrices with pivoting
• Bunch-Kaufman factorization of real and complex symmetric matrices
• Bunch-Kaufman factorization of Hermitian matrices.

You can compute:

• the LU factorization using full and band storage of matrices


• the Cholesky factorization using full, packed, RFP, and band storage
• the Bunch-Kaufman factorization using full and packed storage.

?getrf
Computes the LU factorization of a general m-by-n
matrix.

Syntax
lapack_int LAPACKE_sgetrf (int matrix_layout , lapack_int m , lapack_int n , float *
a , lapack_int lda , lapack_int * ipiv );
lapack_int LAPACKE_dgetrf (int matrix_layout , lapack_int m , lapack_int n , double *
a , lapack_int lda , lapack_int * ipiv );
lapack_int LAPACKE_cgetrf (int matrix_layout , lapack_int m , lapack_int n ,
lapack_complex_float * a , lapack_int lda , lapack_int * ipiv );
lapack_int LAPACKE_zgetrf (int matrix_layout , lapack_int m , lapack_int n ,
lapack_complex_double * a , lapack_int lda , lapack_int * ipiv );

Include Files
• mkl.h
Description

The routine computes the LU factorization of a general m-by-n matrix A as

A = P*L*U,

where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m >
n) and U is upper triangular (upper trapezoidal if m < n). The routine uses partial pivoting, with row
interchanges.

NOTE
This routine supports the Progress Routine feature. See the Progress Function section for details.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

m The number of rows in the matrix A (m≥ 0).

n The number of columns in A; n≥ 0.

a Array, size at least max(1, lda*n) for column-major layout or max(1,


lda*m) for row-major layout. Contains the matrix A.

lda The leading dimension of array a, which must be at least max(1, m)


for column-major layout or max(1, n) for row-major layout.

Output Parameters

a Overwritten by L and U. The unit diagonal elements of L are not


stored.

ipiv Array, size at least max(1,min(m, n)). The pivot indices; for 1 ≤i≤
min(m, n), row i was interchanged with row ipiv(i).

Return Values
This function returns a value info.

If info=0, the execution is successful.


If info = -i, parameter i had an illegal value.

If info = i, uii is 0. The factorization has been completed, but U is exactly singular. Division by 0 will
occur if you use the factor U for solving a system of linear equations.

Application Notes
The computed L and U are the exact factors of a perturbed matrix A + E, where

|E| ≤c(min(m,n))εP|L||U|

c(n) is a modest linear function of n, and ε is the machine precision.


The approximate number of floating-point operations for real flavors is

(2/3)n3 If m = n,

(1/3)n2(3m-n) If m>n,

(1/3)m2(3n-m) If m<n.

The number of operations for complex flavors is four times greater.


After calling this routine with m = n, you can call the following:

?getrs to solve A*X = B or ATX = B or AHX = B

?gecon to estimate the condition number of A

?getri to compute the inverse of A.

See Also
mkl_progress
Matrix Storage Schemes for LAPACK Routines
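
The call sequence suggested in the application notes can be sketched as follows (matrix values are arbitrary); the example factors a square matrix and, if U is nonsingular, inverts it with ?getri:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 3, ipiv[3], info;
    double a[9] = { 2, 1, 0,       /* column 1 */
                    1, 3, 1,       /* column 2 */
                    0, 1, 2 };     /* column 3 (column major layout) */

    info = LAPACKE_dgetrf(LAPACK_COL_MAJOR, n, n, a, n, ipiv);
    if (info > 0) {                /* u(info, info) is exactly zero */
        printf("A is exactly singular: info = %d\n", (int)info);
        return 1;
    }
    info = LAPACKE_dgetri(LAPACK_COL_MAJOR, n, a, n, ipiv);
    printf("getri info = %d, first column of inv(A): %g %g %g\n",
           (int)info, a[0], a[1], a[2]);
    return 0;
}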

mkl_?getrfnpi
Performs LU factorization (complete or incomplete) of
a general matrix without pivoting.

Syntax
lapack_int LAPACKE_mkl_sgetrfnpi (int matrix_layout, lapack_int m, lapack_int n,
lapack_int nfact, float* a, lapack_int lda);
lapack_int LAPACKE_mkl_dgetrfnpi (int matrix_layout, lapack_int m, lapack_int n,
lapack_int nfact, double* a, lapack_int lda);
lapack_int LAPACKE_mkl_cgetrfnpi (int matrix_layout, lapack_int m, lapack_int n,
lapack_int nfact, lapack_complex_float* a, lapack_int lda);
lapack_int LAPACKE_mkl_zgetrfnpi (int matrix_layout, lapack_int m, lapack_int n,
lapack_int nfact, lapack_complex_double* a, lapack_int lda);

Include Files
• mkl.h

Description
The routine computes the LU factorization of a general m-by-n matrix A without using pivoting. It supports
incomplete factorization. The factorization has the form:
A = L*U,

where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n) and U is upper triangular
(upper trapezoidal if m < n).
Incomplete factorization has the form:

A = L*U + Ã,

where L is lower trapezoidal with unit diagonal elements, U is upper trapezoidal, and Ã is the unfactored
part of matrix A (nonzero only in the trailing (m - nfact)-by-(n - nfact) block). See the application notes
section for further details.

NOTE
Use ?getrf if it is possible that the matrix is not diagonally dominant.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

m The number of rows in matrix A; m≥ 0.

n The number of columns in matrix A; n≥ 0.

nfact The number of rows and columns to factor; 0 ≤nfact≤ min(m, n). Note that
if nfact < min(m, n), incomplete factorization is performed.

a Array of size at least lda*n for column major layout and at least lda*m for
row major layout. Contains the matrix A.

lda The leading dimension of array a. lda≥ max(1, m) for column major layout
and lda≥ max(1, n) for row major layout.

Output Parameters

a Overwritten by L and U. The unit diagonal elements of L are not stored.


When incomplete factorization is specified by setting nfact < min(m, n), a also contains the unfactored
submatrix Ã. See the application notes section for further details.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, uii is 0. The requested factorization has been completed, but U is exactly singular. Division by 0
will occur if factorization is completed and factor U is used for solving a system of linear equations.

Application Notes
The computed L and U are the exact factors of a perturbed matrix A + E, with

|E| ≤c(min(m, n))ε|L||U|


where c(n) is a modest linear function of n, and ε is the machine precision.

The approximate number of floating-point operations for real flavors is

(2/3)n3                                              If m = n = nfact

(1/3)n2(3m - n)                                      If m > n = nfact

(1/3)m2(3n - m)                                      If m = nfact < n

(2/3)(n3 - (n - nfact)3)                             If m = n, nfact < min(m, n)

(1/3)(n2(3m - n) - (n - nfact)2(3m - 2nfact - n))    If m > n > nfact

(1/3)(m2(3n - m) - (m - nfact)2(3n - 2nfact - m))    If nfact < m < n

The number of operations for complex flavors is four times greater.


When incomplete factorization is specified, the first nfact rows and columns are factored, with the update of
the remaining rows and columns of A as follows:

If matrix A is represented as a block 2-by-2 matrix

    A = | A11  A12 |
        | A21  A22 |

where

• A11 is a square matrix of order nfact,


• A21 is an (m - nfact)-by-nfact matrix,
• A12 is an nfact-by-(n - nfact) matrix, and
• A22 is an (m - nfact)-by-(n - nfact) matrix.

The result is

    A = | L1  0 | * | U1  U2 |  +  | 0  0 |
        | L2  I |   | 0   0  |     | 0  Ã |

L1 is a lower triangular square matrix of order nfact with unit diagonal and U1 is an upper triangular square
matrix of order nfact. L1 and U1 result from LU factorization of matrix A11: A11 = L1U1.

L2 is an (m - nfact)-by-nfact matrix and L2 = A21U1-1. U2 is an nfact-by-(n - nfact) matrix and U2 =
L1-1A12.

Ã is an (m - nfact)-by-(n - nfact) matrix and Ã = A22 - L2U2.

On exit, elements of the upper triangle U1 are stored in place of the upper triangle of block A11 in array a;
elements of the lower triangle L1 are stored in the lower triangle of block A11 in array a (unit diagonal
elements are not stored). Elements of L2 replace elements of A21; U2 replaces elements of A12; and Ã
replaces elements of A22.

?getrf2
Computes LU factorization using partial pivoting with
row interchanges.

Syntax
lapack_int LAPACKE_sgetrf2 (int matrix_layout, lapack_int m, lapack_int n, float * a,
lapack_int lda, lapack_int * ipiv);
lapack_int LAPACKE_dgetrf2 (int matrix_layout, lapack_int m, lapack_int n, double * a,
lapack_int lda, lapack_int * ipiv);
lapack_int LAPACKE_cgetrf2 (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float * a, lapack_int lda, lapack_int * ipiv);
lapack_int LAPACKE_zgetrf2 (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double * a, lapack_int lda, lapack_int * ipiv);

Include Files
• mkl.h

Description
?getrf2 computes an LU factorization of a general m-by-n matrix A using partial pivoting with row
interchanges.
The factorization has the form
A=P*L*U
where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m >
n), and U is upper triangular (upper trapezoidal if m < n).


This is the recursive version of the algorithm. It divides the matrix into four submatrices:

    A = | A11  A12 |
        | A21  A22 |

where A11 is n1-by-n1 and A22 is n2-by-n2, with n1 = min(m, n)/2 and n2 = n - n1.

The subroutine calls itself to factor

    | A11 |
    | A21 |

does the swaps on

    | A12 |
    | A22 |

solves A12, updates A22, then calls itself to factor A22 and does the swaps on A21.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

m The number of rows of the matrix A. m >= 0.

n The number of columns of the matrix A. n >= 0.

a Array, size lda*n.

On entry, the m-by-n matrix to be factored.

lda The leading dimension of the array a. lda >= max(1,m).

Output Parameters

a On exit, the factors L and U from the factorization A = P * L * U; the unit


diagonal elements of L are not stored.

ipiv Array, size (min(m,n)).

The pivot indices; for 1 <= i <= min(m,n), row i of the matrix was
interchanged with row ipiv[i - 1].

Return Values
This function returns a value info.

= 0: successful exit
< 0: if info = -i, the i-th argument had an illegal value.
> 0: if info = i, Ui, i is exactly zero. The factorization has been completed, but the factor U is exactly singular,
and division by zero will occur if it is used to solve a system of equations.

?gbtrf
Computes the LU factorization of a general m-by-n
band matrix.

Syntax
lapack_int LAPACKE_sgbtrf (int matrix_layout , lapack_int m , lapack_int n , lapack_int
kl , lapack_int ku , float * ab , lapack_int ldab , lapack_int * ipiv );
lapack_int LAPACKE_dgbtrf (int matrix_layout , lapack_int m , lapack_int n , lapack_int
kl , lapack_int ku , double * ab , lapack_int ldab , lapack_int * ipiv );

lapack_int LAPACKE_cgbtrf (int matrix_layout , lapack_int m , lapack_int n , lapack_int
kl , lapack_int ku , lapack_complex_float * ab , lapack_int ldab , lapack_int * ipiv );
lapack_int LAPACKE_zgbtrf (int matrix_layout , lapack_int m , lapack_int n , lapack_int
kl , lapack_int ku , lapack_complex_double * ab , lapack_int ldab , lapack_int *
ipiv );

Include Files
• mkl.h

Description

The routine forms the LU factorization of a general m-by-n band matrix A with kl non-zero subdiagonals and
ku non-zero superdiagonals, that is,

A = P*L*U,

where P is a permutation matrix; L is lower triangular with unit diagonal elements and at most kl non-zero
elements in each column; U is an upper triangular band matrix with kl + ku superdiagonals. The routine uses
partial pivoting, with row interchanges (which creates the additional kl superdiagonals in U).

NOTE
This routine supports the Progress Routine feature. See Progress Function section for details.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

m The number of rows in matrix A; m≥ 0.

n The number of columns in matrix A; n≥ 0.

kl The number of subdiagonals within the band of A; kl≥ 0.

ku The number of superdiagonals within the band of A; ku≥ 0.

ab Array, size at least max(1, ldab*n) for column-major layout or


max(1, ldab*m) for row-major layout.

The array ab contains the matrix A in band storage as described in


Band Storage.

ldab The leading dimension of the array ab. (ldab≥ 2*kl + ku + 1)

Output Parameters

ab Overwritten with elements of L and U. U is stored as an upper


triangular band matrix with kl + ku superdiagonals, and L is stored
as a lower triangular band matrix with kl subdiagonals (diagonal unit
values are not stored). Since the output array has more nonzero
elements than the initial matrix A, there are limitations on the value of
ldab and the placement of elements of A in array ab.
See Application Notes below for further details.


ipiv Array, size at least max(1,min(m, n)). The pivot indices; for 1 ≤i≤
min(m, n) , row i was interchanged with row ipiv(i).

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, uiiis 0. The factorization has been completed, but U is exactly singular. Division by 0 will occur
if you use the factor U for solving a system of linear equations.

Application Notes
The computed L and U are the exact factors of a perturbed matrix A + E, where

|E| ≤c(kl+ku+1) εP|L||U|

c(k) is a modest linear function of k, and ε is the machine precision.


The total number of floating-point operations for real flavors varies between approximately 2n(ku+1)kl and
2n(kl+ku+1)kl. The number of operations for complex flavors is four times greater. All these estimates
assume that kl and ku are much less than min(m,n).

As described in Band Storage, storage of a band matrix can be considered in two steps: packing band matrix
elements into a matrix AB, then storing the elements in a linear array ab using a full storage scheme. The
effect of the ?gbtrf routine on matrix AB can be illustrated with m = n = 6, kl = 2, ku = 1, for both
matrix_layout = LAPACK_COL_MAJOR and matrix_layout = LAPACK_ROW_MAJOR. In the packed matrix AB, some
elements are never used, and some need not be set on entry but are required by the routine to store
elements of U because of fill-in resulting from the row interchanges.
After calling this routine with m = n, you can call the following routines:

gbtrs to solve A*X = B or AT*X = B or AH*X = B

gbcon to estimate the condition number of A.

See Also
mkl_progress
Matrix Storage Schemes for LAPACK Routines
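
A minimal C sketch of a ?gbtrf call (tridiagonal data chosen arbitrarily) packs a band matrix with kl = ku = 1 as described in Band Storage, factors it, and solves a system with ?gbtrs:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* 4-by-4 tridiagonal matrix: diagonal 4, off-diagonals 1.
       Column major band storage with ldab = 2*kl + ku + 1 = 4;
       row 1 of each column is left free for fill-in. */
    lapack_int n = 4, kl = 1, ku = 1, ldab = 4, nrhs = 1, ipiv[4], info;
    double ab[16] = {
        0, 0, 4, 1,    /* column 1: fill, superdiagonal (unused), diagonal, subdiagonal */
        0, 1, 4, 1,    /* column 2 */
        0, 1, 4, 1,    /* column 3 */
        0, 1, 4, 0 };  /* column 4: subdiagonal unused */
    double b[4] = { 5, 6, 6, 5 };  /* so that the solution is x = (1, 1, 1, 1) */

    info = LAPACKE_dgbtrf(LAPACK_COL_MAJOR, n, n, kl, ku, ab, ldab, ipiv);
    if (info == 0)
        info = LAPACKE_dgbtrs(LAPACK_COL_MAJOR, 'N', n, kl, ku, nrhs,
                              ab, ldab, ipiv, b, n);
    printf("info = %d, x = %g %g %g %g\n", (int)info, b[0], b[1], b[2], b[3]);
    return 0;
}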

?gttrf
Computes the LU factorization of a tridiagonal matrix.

Syntax
lapack_int LAPACKE_sgttrf (lapack_int n , float * dl , float * d , float * du , float *
du2 , lapack_int * ipiv );
lapack_int LAPACKE_dgttrf (lapack_int n , double * dl , double * d , double * du ,
double * du2 , lapack_int * ipiv );
lapack_int LAPACKE_cgttrf (lapack_int n , lapack_complex_float * dl ,
lapack_complex_float * d , lapack_complex_float * du , lapack_complex_float * du2 ,
lapack_int * ipiv );
lapack_int LAPACKE_zgttrf (lapack_int n , lapack_complex_double * dl ,
lapack_complex_double * d , lapack_complex_double * du , lapack_complex_double * du2 ,
lapack_int * ipiv );

Include Files
• mkl.h

Description

The routine computes the LU factorization of a real or complex tridiagonal matrix A using elimination with
partial pivoting and row interchanges.
The factorization has the form

A = L*U,

where L is a product of permutation and unit lower bidiagonal matrices and U is upper triangular with
nonzeroes in only the main diagonal and first two superdiagonals.

Input Parameters

n The order of the matrix A; n≥ 0.

dl, d, du Arrays containing elements of A.


The array dl of dimension (n - 1) contains the subdiagonal elements
of A.
The array d of dimension n contains the diagonal elements of A.
The array du of dimension (n - 1) contains the superdiagonal
elements of A.


Output Parameters

dl Overwritten by the (n-1) multipliers that define the matrix L from the
LU factorization of A.

d Overwritten by the n diagonal elements of the upper triangular matrix


U from the LU factorization of A.

du Overwritten by the (n-1) elements of the first superdiagonal of U.

du2 Array, dimension (n -2). On exit, du2 contains (n-2) elements of


the second superdiagonal of U.

ipiv Array, dimension (n). The pivot indices: for 1 ≤i≤n, row i was
interchanged with row ipiv[i-1]. ipiv[i-1] is always i or i+1;
ipiv[i-1] = i indicates a row interchange was not required.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, uiiis 0. The factorization has been completed, but U is exactly singular. Division by zero will
occur if you use the factor U for solving a system of linear equations.

Application Notes
After calling this routine, you can call the following routines:

?gttrs to solve A*X = B or AT*X = B or AH*X = B

?gtcon to estimate the condition number of A.

?dttrfb
Computes the factorization of a diagonally dominant
tridiagonal matrix.

Syntax
void sdttrfb (const MKL_INT * n , float * dl , float * d , const float * du , MKL_INT *
info );
void ddttrfb (const MKL_INT * n , double * dl , double * d , const double * du ,
MKL_INT * info );
void cdttrfb (const MKL_INT * n , MKL_Complex8 * dl , MKL_Complex8 * d , const
MKL_Complex8 * du , MKL_INT * info );
void zdttrfb_ (const MKL_INT * n , MKL_Complex16 * dl , MKL_Complex16 * d , const
MKL_Complex16 * du , MKL_INT * info );

Include Files
• mkl.h

Description

The ?dttrfb routine computes the factorization of a real or complex tridiagonal matrix A with the BABE
(Burning At Both Ends) algorithm without pivoting. The factorization has the form

A = L1*U*L2

where
• L1 and L2 are unit lower bidiagonal with k and n - k - 1 subdiagonal elements, respectively, where k =
n/2, and
• U is an upper bidiagonal matrix with nonzeroes in only the main diagonal and first superdiagonal.
Input Parameters

n The order of the matrix A; n≥ 0.

dl, d, du Arrays containing elements of A.


The array dl of dimension (n - 1) contains the subdiagonal
elements of A.
The array d of dimension n contains the diagonal elements of A.

The array du of dimension (n - 1) contains the superdiagonal


elements of A.

Output Parameters

dl Overwritten by the (n -1) multipliers that define the matrix L from


the LU factorization of A.

d Overwritten by the n diagonal element reciprocals of the upper


triangular matrix U from the factorization of A.

du Overwritten by the (n-1) elements of the superdiagonal of U.

info If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, uii is 0. The factorization has been completed, but U is


exactly singular. Division by zero will occur if you use the factor U for
solving a system of linear equations.

Application Notes
A diagonally dominant tridiagonal system is defined such that |di| > |dli-1| + |dui| for any i:

1 < i < n, and |d1| > |du1|, |dn| > |dln-1|

The underlying BABE algorithm is designed for diagonally dominant systems. Such systems are free from the
numerical stability issue unlike the canonical systems that use elimination with partial pivoting (see ?gttrf).
The diagonally dominant systems are much faster than the canonical systems.

NOTE

• The current implementation of BABE has a potential accuracy issue on very small or large data
close to the underflow or overflow threshold respectively. Scale the matrix before applying the
solver in the case of such input data.
• Applying the ?dttrfb factorization to non-diagonally dominant systems may lead to an accuracy
loss, or false singularity detected due to no pivoting.
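
The diagonal dominance condition can be checked directly before calling ?dttrfb; a small C sketch (tridiagonal data chosen arbitrarily) is shown below:

#include <math.h>
#include <stdio.h>

/* Returns 1 if the tridiagonal matrix given by dl (n-1 subdiagonal elements),
   d (n diagonal elements), and du (n-1 superdiagonal elements) satisfies the
   diagonal dominance conditions above, and 0 otherwise. */
static int is_diag_dominant(int n, const double *dl, const double *d, const double *du) {
    if (n == 1) return 1;
    if (fabs(d[0]) <= fabs(du[0]) || fabs(d[n - 1]) <= fabs(dl[n - 2])) return 0;
    for (int i = 1; i < n - 1; ++i)       /* rows 2 .. n-1 in one-based numbering */
        if (fabs(d[i]) <= fabs(dl[i - 1]) + fabs(du[i])) return 0;
    return 1;
}

int main(void) {
    double dl[3] = { 1, 1, 1 }, d[4] = { 4, 4, 4, 4 }, du[3] = { 1, 1, 1 };
    printf("diagonally dominant: %d\n", is_diag_dominant(4, dl, d, du));
    return 0;
}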


?potrf
Computes the Cholesky factorization of a symmetric
(Hermitian) positive-definite matrix.

Syntax
lapack_int LAPACKE_spotrf (int matrix_layout , char uplo , lapack_int n , float * a ,
lapack_int lda );
lapack_int LAPACKE_dpotrf (int matrix_layout , char uplo , lapack_int n , double * a ,
lapack_int lda );
lapack_int LAPACKE_cpotrf (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * a , lapack_int lda );
lapack_int LAPACKE_zpotrf (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * a , lapack_int lda );

Include Files
• mkl.h

Description

The routine forms the Cholesky factorization of a symmetric positive-definite or, for complex data, Hermitian
positive-definite matrix A:

A = UT*U for real data, A = UH*U for complex data if uplo='U'


A = L*LT for real data, A = L*LH for complex data if uplo='L'

where L is a lower triangular matrix and U is upper triangular.

NOTE
This routine supports the Progress Routine feature. See the Progress Function section for details.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored and


how A is factored:
If uplo = 'U', the array a stores the upper triangular part of the
matrix A, and the strictly lower triangular part of the matrix is not
referenced.
If uplo = 'L', the array a stores the lower triangular part of the
matrix A, and the strictly upper triangular part of the matrix is not
referenced.

n The order of matrix A; n≥ 0.

a Array, size max(1, lda*n). The array a contains either the upper or the
lower triangular part of the matrix A (see uplo).

lda The leading dimension of a. Must be at least max(1, n).

Output Parameters

a The upper or lower triangular part of a is overwritten by the Cholesky


factor U or L, as specified by uplo.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the leading minor of order i (and therefore the matrix A itself) is not positive-definite, and the
factorization could not be completed. This may indicate an error in forming the matrix A.

Application Notes
If uplo = 'U', the computed factor U is the exact factor of a perturbed matrix A + E, where

c(n) is a modest linear function of n, and ε is the machine precision.


A similar estimate holds for uplo = 'L'.

The total number of floating-point operations is approximately (1/3)n3 for real flavors or (4/3)n3 for
complex flavors.
After calling this routine, you can call the following routines:

?potrs to solve A*X = B

?pocon to estimate the condition number of A

?potri to compute the inverse of A.

See Also
mkl_progress
Matrix Storage Schemes for LAPACK Routines
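
A minimal C sketch of the ?potrf / ?potrs sequence mentioned in the application notes (matrix values are arbitrary) is shown below:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Cholesky-factor a 3-by-3 symmetric positive-definite matrix and
       solve A*x = b.  Row major layout, upper triangle stored. */
    lapack_int n = 3, nrhs = 1, info;
    double a[9] = { 4, 1, 1,
                    1, 5, 2,       /* entries below the diagonal are not referenced */
                    1, 2, 6 };
    double b[3] = { 6, 8, 9 };     /* so that the solution is x = (1, 1, 1) */

    info = LAPACKE_dpotrf(LAPACK_ROW_MAJOR, 'U', n, a, n);
    if (info > 0) {
        printf("leading minor of order %d is not positive-definite\n", (int)info);
        return 1;
    }
    info = LAPACKE_dpotrs(LAPACK_ROW_MAJOR, 'U', n, nrhs, a, n, b, nrhs);
    printf("info = %d, x = %g %g %g\n", (int)info, b[0], b[1], b[2]);
    return 0;
}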

?potrf2
Computes Cholesky factorization using a recursive
algorithm.

Syntax
lapack_int LAPACKE_spotrf2 (int matrix_layout, char uplo, lapack_int n, float * a,
lapack_int lda);
lapack_int LAPACKE_dpotrf2 (int matrix_layout, char uplo, lapack_int n, double * a,
lapack_int lda);
lapack_int LAPACKE_cpotrf2 (int matrix_layout, char uplo, lapack_int n,
lapack_complex_float * a, lapack_int lda);


lapack_int LAPACKE_zpotrf2 (int matrix_layout, char uplo, lapack_int n,


lapack_complex_double * a, lapack_int lda);

Include Files
• mkl.h

Description
?potrf2 computes the Cholesky factorization of a real symmetric or, for complex data, Hermitian positive-
definite matrix A using the recursive algorithm.
The factorization has the form
for real flavors:
A = UT * U, if uplo = 'U', or

A = L * LT, if uplo = 'L',

for complex flavors:


A = UH * U, if uplo = 'U',

or A = L * LH, if uplo = 'L',

where U is an upper triangular matrix and L is lower triangular.


This is the recursive version of the algorithm. It divides the matrix into four submatrices:

    A = | A11  A12 |
        | A21  A22 |

where A11 is n1-by-n1 and A22 is n2-by-n2, with n1 = n/2 and n2 = n - n1.

The subroutine calls itself to factor A11. Update and scale A21 or A12, update A22 then call itself to factor
A22.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

uplo = 'U': Upper triangle of A is stored;


= 'L': Lower triangle of A is stored.

n The order of the matrix A.


n≥ 0.

a Array, size (lda*n).

On entry, the symmetric matrix A.


If uplo = 'U', the leading n-by-n upper triangular part of a contains the
upper triangular part of the matrix A, and the strictly lower triangular part
of a is not referenced.

If uplo = 'L', the leading n-by-n lower triangular part of a contains the
lower triangular part of the matrix A, and the strictly upper triangular part
of a is not referenced.

lda The leading dimension of the array a.

lda≥ max(1,n).

Output Parameters

a On exit, if info = 0, the factor U or L from the Cholesky factorization.

For real flavors:


A = UT*U or A = L*LT;
For complex flavors:
A = UH*U or A = L*LH.

Return Values
This function returns a value info.

= 0: successful exit
< 0: if info = -i, the i-th argument had an illegal value

> 0: if info = i, the leading minor of order i is not positive definite, and the factorization could not be
completed.

?pstrf
Computes the Cholesky factorization with complete
pivoting of a real symmetric (complex Hermitian)
positive semidefinite matrix.

Syntax
lapack_int LAPACKE_spstrf( int matrix_layout, char uplo, lapack_int n, float* a,
lapack_int lda, lapack_int* piv, lapack_int* rank, float tol );
lapack_int LAPACKE_dpstrf( int matrix_layout, char uplo, lapack_int n, double* a,
lapack_int lda, lapack_int* piv, lapack_int* rank, double tol );
lapack_int LAPACKE_cpstrf( int matrix_layout, char uplo, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_int* piv, lapack_int* rank, float
tol );
lapack_int LAPACKE_zpstrf( int matrix_layout, char uplo, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_int* piv, lapack_int* rank, double
tol );

Include Files
• mkl.h

Description

The routine computes the Cholesky factorization with complete pivoting of a real symmetric (complex
Hermitian) positive semidefinite matrix. The form of the factorization is:

PT * A * P = UT * U, if uplo ='U' for real flavors,


PT * A * P = UH * U, if uplo ='U' for complex flavors,
PT * A * P = L * LT, if uplo ='L' for real flavors,
PT * A * P = L * LH, if uplo ='L' for complex flavors,

where P is a permutation matrix stored as vector piv, and U and L are upper and lower triangular matrices,
respectively.
This algorithm does not attempt to check that A is positive semidefinite. This version of the algorithm calls
level 3 BLAS.


Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:


If uplo = 'U', the array a stores the upper triangular part of the
matrix A, and the strictly lower triangular part of the matrix is not
referenced.
If uplo = 'L', the array a stores the lower triangular part of the
matrix A, and the strictly upper triangular part of the matrix is not
referenced.

n The order of matrix A; n≥ 0.

a Array a, size max(1,lda*n). The array a contains either the upper or


the lower triangular part of the matrix A (see uplo).

tol User defined tolerance. If tol < 0, then n*ε*max(Ak,k), where ε is the
machine precision, will be used (see Error Analysis for the definition of
machine precision). The algorithm terminates at the (k-1)-st step, if
the pivot ≤tol.

lda The leading dimension of a; at least max(1, n).

Output Parameters

a If info = 0, the factor U or L from the Cholesky factorization is as


described in Description.

piv Array, size at least max(1, n). The array piv is such that the nonzero
entries are Ppiv[k-1],k (1 ≤k≤n).

rank The rank of a given by the number of steps the algorithm completed.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -k, the k-th argument had an illegal value.

If info > 0, the matrix A is either rank deficient with a computed rank as returned in rank, or is not
positive semidefinite.

See Also
Matrix Storage Schemes for LAPACK Routines

?pftrf
Computes the Cholesky factorization of a symmetric
(Hermitian) positive-definite matrix using the
Rectangular Full Packed (RFP) format .

Syntax
lapack_int LAPACKE_spftrf (int matrix_layout , char transr , char uplo , lapack_int n ,
float * a );
lapack_int LAPACKE_dpftrf (int matrix_layout , char transr , char uplo , lapack_int n ,
double * a );
lapack_int LAPACKE_cpftrf (int matrix_layout , char transr , char uplo , lapack_int n ,
lapack_complex_float * a );
lapack_int LAPACKE_zpftrf (int matrix_layout , char transr , char uplo , lapack_int n ,
lapack_complex_double * a );

Include Files
• mkl.h

Description

The routine forms the Cholesky factorization of a symmetric positive-definite or, for complex data, a
Hermitian positive-definite matrix A:

A = UT*U for real data, A = UH*U for complex data if uplo='U'


A = L*LT for real data, A = L*LH for complex data if uplo='L'

where L is a lower triangular matrix and U is upper triangular.


The matrix A is in the Rectangular Full Packed (RFP) format. For the description of the RFP format, see Matrix
Storage Schemes.
This is the block version of the algorithm, calling Level 3 BLAS.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

transr Must be 'N', 'T' (for real data) or 'C' (for complex data).

If transr = 'N', the Normal transr of RFP A is stored.

If transr = 'T', the Transpose transr of RFP A is stored.

If transr = 'C', the Conjugate-Transpose transr of RFP A is stored.

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:


If uplo = 'U', the array a stores the upper triangular part of the
matrix A.
If uplo = 'L', the array a stores the lower triangular part of the
matrix A.

n The order of the matrix A; n≥ 0.

a Array, size (n*(n+1)/2). The array a contains the matrix A in the RFP
format.


Output Parameters

a a is overwritten by the Cholesky factor U or L, as specified by uplo


and transr.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the leading minor of order i (and therefore the matrix A itself) is not positive-definite, and the
factorization could not be completed. This may indicate an error in forming the matrix A.

See Also
Matrix Storage Schemes for LAPACK Routines

?pptrf
Computes the Cholesky factorization of a symmetric
(Hermitian) positive-definite matrix using packed
storage.

Syntax
lapack_int LAPACKE_spptrf (int matrix_layout , char uplo , lapack_int n , float * ap );
lapack_int LAPACKE_dpptrf (int matrix_layout , char uplo , lapack_int n , double *
ap );
lapack_int LAPACKE_cpptrf (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * ap );
lapack_int LAPACKE_zpptrf (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * ap );

Include Files
• mkl.h

Description

The routine forms the Cholesky factorization of a symmetric positive-definite or, for complex data, Hermitian
positive-definite packed matrix A:

A = UT*U for real data, A = UH*U for complex data if uplo='U'


A = L*LT for real data, A = L*LH for complex data if uplo='L'

where L is a lower triangular matrix and U is upper triangular.

NOTE
This routine supports the Progress Routine feature. See the Progress Function section for details.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is packed in


the array ap, and how A is factored:
If uplo = 'U', the array ap stores the upper triangular part of the
matrix A, and A is factored as UH*U.
If uplo = 'L', the array ap stores the lower triangular part of the
matrix A; A is factored as L*LH.

n The order of matrix A; n≥ 0.

ap Array, size at least max(1, n(n+1)/2). The array ap contains either the
upper or the lower triangular part of the matrix A (as specified by
uplo) in packed storage (see Matrix Storage Schemes).

Output Parameters

ap Overwritten by the Cholesky factor U or L, as specified by uplo.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the leading minor of order i (and therefore the matrix A itself) is not positive-definite, and the
factorization could not be completed. This may indicate an error in forming the matrix A.

Application Notes
If uplo = 'U', the computed factor U is the exact factor of a perturbed matrix A + E, where

c(n) is a modest linear function of n, and ε is the machine precision.


A similar estimate holds for uplo = 'L'.

The total number of floating-point operations is approximately (1/3)n3 for real flavors and (4/3)n3 for
complex flavors.
After calling this routine, you can call the following routines:

?pptrs to solve A*X = B

?ppcon to estimate the condition number of A

?pptri to compute the inverse of A.

See Also
mkl_progress
Matrix Storage Schemes for LAPACK Routines


?pbtrf
Computes the Cholesky factorization of a symmetric
(Hermitian) positive-definite band matrix.

Syntax
lapack_int LAPACKE_spbtrf (int matrix_layout , char uplo , lapack_int n , lapack_int
kd , float * ab , lapack_int ldab );
lapack_int LAPACKE_dpbtrf (int matrix_layout , char uplo , lapack_int n , lapack_int
kd , double * ab , lapack_int ldab );
lapack_int LAPACKE_cpbtrf (int matrix_layout , char uplo , lapack_int n , lapack_int
kd , lapack_complex_float * ab , lapack_int ldab );
lapack_int LAPACKE_zpbtrf (int matrix_layout , char uplo , lapack_int n , lapack_int
kd , lapack_complex_double * ab , lapack_int ldab );

Include Files
• mkl.h

Description

The routine forms the Cholesky factorization of a symmetric positive-definite or, for complex data, Hermitian
positive-definite band matrix A:

A = UT*U for real data, A = UH*U for complex data if uplo='U'


A = L*LT for real data, A = L*LH for complex data if uplo='L'

where L is a lower triangular matrix and U is upper triangular.

NOTE
This routine supports the Progress Routine feature. See the Progress Function section for details.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored in


the array ab, and how A is factored:
If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The order of matrix A; n≥ 0.

kd The number of superdiagonals or subdiagonals in the matrix A; kd≥ 0.

ab Array, size max(1, ldab*n). The array ab contains either the upper or
the lower triangular part of the matrix A (as specified by uplo) in band
storage (see Matrix Storage Schemes).

ldab The leading dimension of the array ab. (ldab≥kd + 1)

Output Parameters

ab The upper or lower triangular part of A (in band storage) is


overwritten by the Cholesky factor U or L, as specified by uplo.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the leading minor of order i (and therefore the matrix A itself) is not positive-definite, and the
factorization could not be completed. This may indicate an error in forming the matrix A.

Application Notes
If uplo = 'U', the computed factor U is the exact factor of a perturbed matrix A + E, where

c(n) is a modest linear function of n, and ε is the machine precision.


A similar estimate holds for uplo = 'L'.

The total number of floating-point operations for real flavors is approximately n(kd+1)2. The number of
operations for complex flavors is 4 times greater. All these estimates assume that kd is much less than n.
After calling this routine, you can call the following routines:

?pbtrs to solve A*X = B

?pbcon to estimate the condition number of A.

See Also
mkl_progress
Matrix Storage Schemes for LAPACK Routines
?pttrf
Computes the factorization of a symmetric (Hermitian)
positive-definite tridiagonal matrix.

Syntax
lapack_int LAPACKE_spttrf( lapack_int n, float* d, float* e );
lapack_int LAPACKE_dpttrf( lapack_int n, double* d, double* e );
lapack_int LAPACKE_cpttrf( lapack_int n, float* d, lapack_complex_float* e );
lapack_int LAPACKE_zpttrf( lapack_int n, double* d, lapack_complex_double* e );

Include Files
• mkl.h

Description

The routine forms the factorization of a symmetric positive-definite or, for complex data, Hermitian positive-
definite tridiagonal matrix A:


A = L*D*LT for real flavors, or


A = L*D*LH for complex flavors,
where D is diagonal and L is unit lower bidiagonal. The factorization may also be regarded as having the form
A = UT*D*U for real flavors, or A = UH*D*U for complex flavors, where U is unit upper bidiagonal.

Input Parameters

n The order of the matrix A; n≥ 0.

d Array, dimension (n). Contains the diagonal elements of A.

e Array, dimension (n -1). Contains the subdiagonal elements of A.

Output Parameters

d Overwritten by the n diagonal elements of the diagonal matrix D from the L*D*LT (for real flavors) or L*D*LH (for complex flavors)
factorization of A.

e Overwritten by the (n - 1) sub-diagonal elements of the unit bidiagonal factor L or U from the factorization of A.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the leading minor of order i (and therefore the matrix A itself) is not positive-definite; if i < n,
the factorization could not be completed, while if i = n, the factorization was completed, but d[n - 1] ≤
0.
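
Example
A minimal illustrative sketch (not from the original reference): it factors a 4-by-4 symmetric positive-definite tridiagonal matrix with LAPACKE_dpttrf, whose prototype is listed above. The diagonal and subdiagonal values are assumptions chosen for the example; on exit d holds D and e holds the subdiagonal of the unit bidiagonal factor L.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 4;
    double d[4] = { 2.0, 2.0, 2.0, 2.0 };   /* diagonal of A    */
    double e[3] = { -1.0, -1.0, -1.0 };     /* subdiagonal of A */

    /* Factor A = L*D*L^T; d and e are overwritten with the factors. */
    lapack_int info = LAPACKE_dpttrf(n, d, e);
    printf("info = %d, diag(D) = %f %f %f %f\n", (int)info, d[0], d[1], d[2], d[3]);
    return 0;
}
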

?sytrf
Computes the Bunch-Kaufman factorization of a
symmetric matrix.

Syntax
lapack_int LAPACKE_ssytrf (int matrix_layout , char uplo , lapack_int n , float * a ,
lapack_int lda , lapack_int * ipiv );
lapack_int LAPACKE_dsytrf (int matrix_layout , char uplo , lapack_int n , double * a ,
lapack_int lda , lapack_int * ipiv );
lapack_int LAPACKE_csytrf (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * a , lapack_int lda , lapack_int * ipiv );
lapack_int LAPACKE_zsytrf (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * a , lapack_int lda , lapack_int * ipiv );

Include Files
• mkl.h

Description

The routine computes the factorization of a real/complex symmetric matrix A using the Bunch-Kaufman
diagonal pivoting method. The form of the factorization is:

if uplo='U', A = U*D*UT
if uplo='L', A = L*D*LT,

where A is the input matrix, U and L are products of permutation and triangular matrices with unit diagonal
(upper triangular for U and lower triangular for L), and D is a symmetric block-diagonal matrix with 1-by-1
and 2-by-2 diagonal blocks. U and L have 2-by-2 unit diagonal blocks corresponding to the 2-by-2 blocks of
D.

NOTE
This routine supports the Progress Routine feature. See the Progress Routine section for details.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored and how A is factored:
If uplo = 'U', the array a stores the upper triangular part of the
matrix A, and A is factored as U*D*UT.

If uplo = 'L', the array a stores the lower triangular part of the
matrix A, and A is factored as L*D*LT.

n The order of matrix A; n≥ 0.

a Array, size max(1, lda*n). The array a contains either the upper or
the lower triangular part of the matrix A (see uplo).

lda The leading dimension of a; at least max(1, n).

Output Parameters

a The upper or lower triangular part of a is overwritten by details of the block-diagonal matrix D and the multipliers used to obtain the factor U
(or L).

ipiv Array, size at least max(1, n). Contains details of the interchanges
and the block structure of D. If ipiv[i-1] = k >0, then dii is a 1-
by-1 block, and the i-th row and column of A was interchanged with
the k-th row and column.
If uplo = 'U' and ipiv[i] =ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and i-th row and column of A was
interchanged with the m-th row and column.
If uplo = 'L' and ipiv[i] =ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and (i+1)-th row and column of A
was interchanged with the m-th row and column.


Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, Dii is 0. The factorization has been completed, but D is exactly singular. Division by 0 will occur
if you use D for solving a system of linear equations.

Application Notes
The 2-by-2 unit diagonal blocks and the unit diagonal elements of U and L are not stored. The remaining
elements of U and L are stored in the corresponding columns of the array a, but additional row interchanges
are required to recover U or L explicitly (which is seldom necessary).
If ipiv[i-1] = i for all i =1...n, then all off-diagonal elements of U (L) are stored explicitly in the
corresponding elements of the array a.
If uplo = 'U', the computed factors U and D are the exact factors of a perturbed matrix A + E, where

|E| ≤c(n)εP|U||D||UT|PT

c(n) is a modest linear function of n, and ε is the machine precision. A similar estimate holds for the
computed L and D when uplo = 'L'.

The total number of floating-point operations is approximately (1/3)n^3 for real flavors or (4/3)n^3 for
complex flavors.
After calling this routine, you can call the following routines:

?sytrs to solve A*X = B

?sycon to estimate the condition number of A

?sytri to compute the inverse of A.

If uplo = 'U', then A = U*D*U', where

U = P(n)*U(n)* ... *P(k)*U(k)*...,

that is, U is a product of terms P(k)*U(k), where

• k decreases from n to 1 in steps of 1 and 2.


• D is a block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks D(k).
• P(k) is a permutation matrix as defined by ipiv[k-1].
• U(k) is a unit upper triangular matrix, such that if the diagonal block D(k) is of order s (s = 1 or 2), then

If s = 1, D(k) overwrites A(k,k), and v overwrites A(1:k-1,k).

If s = 2, the upper triangle of D(k) overwrites A(k-1,k-1), A(k-1,k) and A(k,k), and v overwrites A(1:k-2,k
-1:k).

If uplo = 'L', then A = L*D*L', where

L = P(1)*L(1)* ... *P(k)*L(k)*...,

that is, L is a product of terms P(k)*L(k), where

• k increases from 1 to n in steps of 1 and 2.


• D is a block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks D(k).
• P(k) is a permutation matrix as defined by ipiv(k).
• L(k) is a unit lower triangular matrix, such that if the diagonal block D(k) is of order s (s = 1 or 2), then

If s = 1, D(k) overwrites A(k,k), and v overwrites A(k+1:n,k).

If s = 2, the lower triangle of D(k) overwrites A(k,k), A(k+1,k), and A(k+1,k+1), and v overwrites A(k
+2:n,k:k+1).

See Also
mkl_progress
Matrix Storage Schemes for LAPACK Routines
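
Example
A minimal illustrative sketch (not part of the original reference): it factors a symmetric indefinite 3-by-3 matrix with LAPACKE_dsytrf and then solves a system with ?sytrs. The matrix values are assumptions, and the LAPACKE_dsytrs prototype is assumed to follow the standard LAPACKE pattern (the routine itself is documented with the solver routines later in this chapter).

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Symmetric indefinite 3-by-3 matrix; only the upper triangle is referenced (uplo = 'U'). */
    lapack_int n = 3, lda = 3, nrhs = 1, ipiv[3];
    double a[9] = { 2.0,  0.0, 0.0,    /* column 0                                */
                    1.0, -3.0, 0.0,    /* column 1                                */
                    4.0,  5.0, 1.0 };  /* column 2 (column-major, upper triangle) */
    double b[3] = { 1.0, 2.0, 3.0 };

    lapack_int info = LAPACKE_dsytrf(LAPACK_COL_MAJOR, 'U', n, a, lda, ipiv);
    if (info != 0) { printf("dsytrf: info = %d\n", (int)info); return 1; }

    /* LAPACKE_dsytrs solves A*X = B using the factorization and pivot indices from dsytrf. */
    info = LAPACKE_dsytrs(LAPACK_COL_MAJOR, 'U', n, nrhs, a, lda, ipiv, b, n);
    printf("info = %d, x = %f %f %f\n", (int)info, b[0], b[1], b[2]);
    return 0;
}
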

?sytrf_rook
Computes the bounded Bunch-Kaufman factorization
of a symmetric matrix.

Syntax
lapack_int LAPACKE_ssytrf_rook (int matrix_layout, char uplo, lapack_int n, float * a,
lapack_int lda, lapack_int * ipiv);
lapack_int LAPACKE_dsytrf_rook (int matrix_layout, char uplo, lapack_int n, double * a,
lapack_int lda, lapack_int * ipiv);
lapack_int LAPACKE_csytrf_rook (int matrix_layout, char uplo, lapack_int n,
lapack_complex_float * a, lapack_int lda, lapack_int * ipiv);
lapack_int LAPACKE_zsytrf_rook (int matrix_layout, char uplo, lapack_int n,
lapack_complex_double * a, lapack_int lda, lapack_int * ipiv);

Include Files
• mkl.h

Description

The routine computes the factorization of a real/complex symmetric matrix A using the bounded Bunch-
Kaufman ("rook") diagonal pivoting method. The form of the factorization is:


if uplo='U', A = U*D*UT
if uplo='L', A = L*D*LT,

where A is the input matrix, U and L are products of permutation and triangular matrices with unit diagonal
(upper triangular for U and lower triangular for L), and D is a symmetric block-diagonal matrix with 1-by-1
and 2-by-2 diagonal blocks. U and L have 2-by-2 unit diagonal blocks corresponding to the 2-by-2 blocks of
D.

Input Parameters

matrix_layout Specifies whether matrix storage layout for array b is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored and how A is factored:
If uplo = 'U', the array a stores the upper triangular part of the
matrix A, and A is factored as U*D*UT.

If uplo = 'L', the array a stores the lower triangular part of the
matrix A, and A is factored as L*D*LT.

n The order of matrix A; n≥ 0.

a Array, size lda*n. The array a contains either the upper or the lower
triangular part of the matrix A (see uplo).

lda The leading dimension of a; at least max(1, n).

Output Parameters

a The upper or lower triangular part of a is overwritten by details of the block-diagonal matrix D and the multipliers used to obtain the factor U
(or L).

ipiv If ipiv(k) > 0, then rows and columns k and ipiv(k) were
interchanged and Dk, k is a 1-by-1 diagonal block.
If uplo = 'U' and ipiv(k) < 0 and ipiv(k - 1) < 0, then rows
and columns k and -ipiv(k) were interchanged, rows and columns k -
1 and -ipiv(k - 1) were interchanged, and Dk-1:k, k-1:k is a 2-by-2
diagonal block.
If uplo = 'L' and ipiv(k) < 0 and ipiv(k + 1) < 0, then rows
and columns k and -ipiv(k) were interchanged, rows and columns k
+ 1 and -ipiv(k + 1) were interchanged, and Dk:k+1, k:k+1 is a 2-by-2
diagonal block.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, Dii is 0. The factorization has been completed, but D is exactly singular. Division by 0 will occur
if you use D for solving a system of linear equations.

Application Notes
The total number of floating-point operations is approximately (1/3)n^3 for real flavors or (4/3)n^3 for
complex flavors.
After calling this routine, you can call the following routines:

?sytrs_rook to solve A*X = B

?sycon_rook (Fortran only) to estimate the condition number of A

?sytri_rook (Fortran only) to compute the inverse of A.

If uplo = 'U', then A = U*D*U', where

U = P(n)*U(n)* ... *P(k)*U(k)*...,

that is, U is a product of terms P(k)*U(k), where


• k decreases from n to 1 in steps of 1 and 2.
• D is a block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks D(k).
• P(k) is a permutation matrix as defined by ipiv[k-1].
• U(k) is a unit upper triangular matrix, such that if the diagonal block D(k) is of order s (s = 1 or 2), then

If s = 1, D(k) overwrites A(k,k), and v overwrites A(1:k-1,k).

If s = 2, the upper triangle of D(k) overwrites A(k-1,k-1), A(k-1,k) and A(k,k), and v overwrites A(1:k-2,k
-1:k).

If uplo = 'L', then A = L*D*L', where

L = P(1)*L(1)* ... *P(k)*L(k)*...,

that is, L is a product of terms P(k)*L(k), where


• k increases from 1 to n in steps of 1 and 2.
• D is a block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks D(k).
• P(k) is a permutation matrix as defined by ipiv(k).
• L(k) is a unit lower triangular matrix, such that if the diagonal block D(k) is of order s (s = 1 or 2), then


If s = 1, D(k) overwrites A(k,k), and v overwrites A(k+1:n,k).

If s = 2, the lower triangle of D(k) overwrites A(k,k), A(k+1,k), and A(k+1,k+1), and v overwrites A(k
+2:n,k:k+1).

See Also
Matrix Storage Schemes for LAPACK Routines

?hetrf
Computes the Bunch-Kaufman factorization of a
complex Hermitian matrix.

Syntax
lapack_int LAPACKE_chetrf (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * a , lapack_int lda , lapack_int * ipiv );
lapack_int LAPACKE_zhetrf (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * a , lapack_int lda , lapack_int * ipiv );

Include Files
• mkl.h

Description

The routine computes the factorization of a complex Hermitian matrix A using the Bunch-Kaufman diagonal
pivoting method:

if uplo='U', A = U*D*UH
if uplo='L', A = L*D*LH,

where A is the input matrix, U and L are products of permutation and triangular matrices with unit diagonal
(upper triangular for U and lower triangular for L), and D is a Hermitian block-diagonal matrix with 1-by-1
and 2-by-2 diagonal blocks. U and L have 2-by-2 unit diagonal blocks corresponding to the 2-by-2 blocks of
D.

NOTE
This routine supports the Progress Routine feature. See the Progress Routine section for details.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored and how A is factored:
If uplo = 'U', the array a stores the upper triangular part of the
matrix A, and A is factored as U*D*UH.

If uplo = 'L', the array a stores the lower triangular part of the
matrix A, and A is factored as L*D*LH.

n The order of matrix A; n≥ 0.

a Array, size max(1, lda*n).

The array a contains the upper or the lower triangular part of the
matrix A (see uplo).

lda The leading dimension of a; at least max(1, n).

Output Parameters

a The upper or lower triangular part of a is overwritten by details of the block-diagonal matrix D and the multipliers used to obtain the factor U
(or L).

ipiv Array, size at least max(1, n). Contains details of the interchanges
and the block structure of D. If ipiv[i-1] = k >0, then dii is a 1-
by-1 block, and the i-th row and column of A was interchanged with
the k-th row and column.
If uplo = 'U' and ipiv[i] =ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and i-th row and column of A was
interchanged with the m-th row and column.
If uplo = 'L' and ipiv[i] =ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and (i+1)-th row and column of A
was interchanged with the m-th row and column.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, dii is 0. The factorization has been completed, but D is exactly singular. Division by 0 will occur
if you use D for solving a system of linear equations.

Application Notes
This routine is suitable for Hermitian matrices that are not known to be positive-definite. If A is in fact
positive-definite, the routine does not perform interchanges, and no 2-by-2 diagonal blocks occur in D.
The 2-by-2 unit diagonal blocks and the unit diagonal elements of U and L are not stored. The remaining
elements of U and L are stored in the corresponding columns of the array a, but additional row interchanges
are required to recover U or L explicitly (which is seldom necessary).
If ipiv[i-1] = i for all i = 1...n, then all off-diagonal elements of U (L) are stored explicitly in the
corresponding elements of the array a.
If uplo = 'U', the computed factors U and D are the exact factors of a perturbed matrix A + E, where

|E| ≤c(n)εP|U||D||UT|PT

c(n) is a modest linear function of n, and ε is the machine precision.


A similar estimate holds for the computed L and D when uplo = 'L'.

The total number of floating-point operations is approximately (4/3)n^3.

After calling this routine, you can call the following routines:

?hetrs to solve A*X = B


?hecon to estimate the condition number of A

?hetri to compute the inverse of A.

See Also
mkl_progress
Matrix Storage Schemes for LAPACK Routines
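
Example
A minimal illustrative sketch (not part of the original reference): it factors a 2-by-2 Hermitian matrix with LAPACKE_zhetrf. The matrix values are assumptions, and the sketch assumes the default mkl.h mapping of lapack_complex_double to the MKL_Complex16 structure with .real and .imag members.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* 2-by-2 Hermitian matrix, upper triangle, column-major (uplo = 'U'). */
    lapack_int n = 2, lda = 2, ipiv[2], info;
    lapack_complex_double a[4];
    a[0].real = 4.0; a[0].imag =  0.0;   /* A(0,0)                 */
    a[1].real = 0.0; a[1].imag =  0.0;   /* strictly lower: unused */
    a[2].real = 1.0; a[2].imag = -2.0;   /* A(0,1)                 */
    a[3].real = 3.0; a[3].imag =  0.0;   /* A(1,1)                 */

    /* Bunch-Kaufman factorization A = U*D*U^H. */
    info = LAPACKE_zhetrf(LAPACK_COL_MAJOR, 'U', n, a, lda, ipiv);
    printf("zhetrf: info = %d, ipiv = %d %d\n", (int)info, (int)ipiv[0], (int)ipiv[1]);
    return 0;
}
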

?hetrf_rook
Computes the bounded Bunch-Kaufman factorization
of a complex Hermitian matrix.

Syntax
lapack_int LAPACKE_chetrf_rook (int matrix_layout, char uplo, lapack_int n,
lapack_complex_float * a, lapack_int lda, lapack_int * ipiv);
lapack_int LAPACKE_zhetrf_rook (int matrix_layout, char uplo, lapack_int n,
lapack_complex_double * a, lapack_int lda, lapack_int * ipiv);

Include Files
• mkl.h

Description

The routine computes the factorization of a complex Hermitian matrix A using the bounded Bunch-Kaufman
diagonal pivoting method:

if uplo='U', A = U*D*UH
if uplo='L', A = L*D*LH,

where A is the input matrix, U (or L ) is a product of permutation and unit upper ( or lower) triangular
matrices, and D is a Hermitian block-diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks.
This is the blocked version of the algorithm, calling Level 3 BLAS.

Input Parameters

matrix_layout Specifies whether matrix storage layout for array b is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:


If uplo = 'U', the array a stores the upper triangular part of the
matrix A.
If uplo = 'L', the array a stores the lower triangular part of the
matrix A.

n The order of matrix A; n≥ 0.

a Array a, size (lda*n)

The array a contains the upper or the lower triangular part of the
matrix A (see uplo).

If uplo = 'U', the leading n-by-n upper triangular part of a contains
the upper triangular part of the matrix A, and the strictly lower
triangular part of a is not referenced. If uplo = 'L', the leading n-by-n
lower triangular part of a contains the lower triangular part of the
matrix A, and the strictly upper triangular part of a is not referenced.

lda The leading dimension of a; at least max(1, n).

Output Parameters

a
The block diagonal matrix D and the multipliers used to obtain the
factor U or L (see Application Notes for further details).

ipiv • If uplo = 'U':


If ipiv(k) > 0, then rows and columns k and ipiv(k) were
interchanged and Dk, k is a 1-by-1 diagonal block.
If ipiv(k) < 0 and ipiv(k - 1) < 0, then rows and columns k and
-ipiv(k) were interchanged and rows and columns k - 1 and -
ipiv(k - 1) were interchanged, Dk - 1:k,k - 1:k is a 2-by-2 diagonal
block.
• If uplo = 'L':
If ipiv(k) > 0, then rows and columns k and ipiv(k) were
interchanged and Dk,k is a 1-by-1 diagonal block.
If ipiv(k) < 0 and ipiv(k + 1) < 0, then rows and columns k and
-ipiv(k) were interchanged and rows and columns k + 1 and -
ipiv(k + 1) were interchanged, Dk:k + 1,k:k + 1 is a 2-by-2 diagonal
block.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, Dii is exactly 0. The factorization has been completed, but the block diagonal matrix D is
exactly singular, and division by 0 will occur if you use D for solving a system of linear equations.

Application Notes

If uplo = 'U', then A = U*D*UH, where

U = P(n)*U(n)* ... *P(k)U(k)* ...,


i.e., U is a product of terms P(k)*U(k), where k decreases from n to 1 in steps of 1 or 2, and D is a block
diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks D(k). P(k) is a permutation matrix as defined by
ipiv(k), and U(k) is a unit upper triangular matrix, such that if the diagonal block D(k) is of order s (s = 1
or 2), then

U(k) =  ( I   v   0 )   k-s
        ( 0   I   0 )   s
        ( 0   0   I )   n-k
         k-s  s  n-k


If s = 1, D(k) overwrites A(k,k), and v overwrites A(1:k-1,k).


If s = 2, the upper triangle of D(k) overwrites A(k-1,k-1), A(k-1,k), and A(k,k), and v overwrites
A(1:k-2,k-1:k).
If uplo = 'L', then A = L*D*LH, where

L = P(1)*L(1)* ... *P(k)*L(k)* ...,


i.e., L is a product of terms P(k)*L(k), where k increases from 1 to n in steps of 1 or 2, and D is a block
diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks D(k). P(k) is a permutation matrix as defined by
ipiv(k), and L(k) is a unit lower triangular matrix, such that if the diagonal block D(k) is of order s (s = 1 or
2), then

L(k) =  ( I   0   0 )   k-1
        ( 0   I   0 )   s
        ( 0   v   I )   n-k-s+1
         k-1  s  n-k-s+1

If s = 1, D(k) overwrites A(k,k), and v overwrites A(k+1:n,k).


If s = 2, the lower triangle of D(k) overwrites A(k,k), A(k+1,k), and A(k+1,k+1), and v overwrites A(k
+2:n,k:k+1).

See Also
mkl_progress
Matrix Storage Schemes for LAPACK Routines

?sptrf
Computes the Bunch-Kaufman factorization of a
symmetric matrix using packed storage.

Syntax
lapack_int LAPACKE_ssptrf (int matrix_layout , char uplo , lapack_int n , float * ap ,
lapack_int * ipiv );
lapack_int LAPACKE_dsptrf (int matrix_layout , char uplo , lapack_int n , double * ap ,
lapack_int * ipiv );
lapack_int LAPACKE_csptrf (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * ap , lapack_int * ipiv );
lapack_int LAPACKE_zsptrf (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * ap , lapack_int * ipiv );

Include Files
• mkl.h

Description

The routine computes the factorization of a real/complex symmetric matrix A stored in the packed format
using the Bunch-Kaufman diagonal pivoting method. The form of the factorization is:

if uplo='U', A = U*D*UT
if uplo='L', A = L*D*LT,

where U and L are products of permutation and triangular matrices with unit diagonal (upper triangular for U
and lower triangular for L), and D is a symmetric block-diagonal matrix with 1-by-1 and 2-by-2 diagonal
blocks. U and L have 2-by-2 unit diagonal blocks corresponding to the 2-by-2 blocks of D.

NOTE
This routine supports the Progress Routine feature. See the Progress Function section for details.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is packed in the array ap and how A is factored:
If uplo = 'U', the array ap stores the upper triangular part of the
matrix A, and A is factored as U*D*UT.

If uplo = 'L', the array ap stores the lower triangular part of the
matrix A, and A is factored as L*D*LT.

n The order of matrix A; n≥ 0.

ap Array, size at least max(1, n(n+1)/2). The array ap contains the upper
or the lower triangular part of the matrix A (as specified by uplo) in
packed storage (see Matrix Storage Schemes).

Output Parameters

ap The upper or lower triangle of A (as specified by uplo) is overwritten by details of the block-diagonal matrix D and the multipliers used to
obtain the factor U (or L).

ipiv Array, size at least max(1, n). Contains details of the interchanges
and the block structure of D. If ipiv[i-1] = k >0, then dii is a 1-
by-1 block, and the i-th row and column of A was interchanged with
the k-th row and column.
If uplo = 'U' and ipiv[i] =ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and i-th row and column of A was
interchanged with the m-th row and column.
If uplo = 'L' and ipiv[i] =ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and (i+1)-th row and column of A
was interchanged with the m-th row and column.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, dii is 0. The factorization has been completed, but D is exactly singular. Division by 0 will occur
if you use D for solving a system of linear equations.


Application Notes
The 2-by-2 unit diagonal blocks and the unit diagonal elements of U and L are not stored. The remaining
elements of U and L overwrite elements of the corresponding columns of the array ap, but additional row
interchanges are required to recover U or L explicitly (which is seldom necessary).
If ipiv(i) = i for all i = 1...n, then all off-diagonal elements of U (L) are stored explicitly in packed form.

If uplo = 'U', the computed factors U and D are the exact factors of a perturbed matrix A + E, where

|E| ≤c(n)εP|U||D||UT|PT

c(n) is a modest linear function of n, and ε is the machine precision. A similar estimate holds for the
computed L and D when uplo = 'L'.

The total number of floating-point operations is approximately (1/3)n^3 for real flavors or (4/3)n^3 for
complex flavors.
After calling this routine, you can call the following routines:

?sptrs to solve A*X = B

?spcon to estimate the condition number of A

?sptri to compute the inverse of A.

See Also
mkl_progress
Matrix Storage Schemes for LAPACK Routines
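
Example
A minimal illustrative sketch (not part of the original reference): it factors a symmetric 3-by-3 matrix held in packed upper storage with LAPACKE_dsptrf and solves a system with ?sptrs. The matrix values are assumptions, and the LAPACKE_dsptrs prototype is assumed to follow the standard LAPACKE pattern (the solver routine is documented later in this chapter).

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Symmetric 3-by-3 matrix in packed upper storage (uplo = 'U', column-major):
       ap = { A(0,0), A(0,1), A(1,1), A(0,2), A(1,2), A(2,2) }. */
    lapack_int n = 3, nrhs = 1, ipiv[3];
    double ap[6] = { 2.0, 1.0, -3.0, 4.0, 5.0, 1.0 };
    double b[3]  = { 1.0, 2.0, 3.0 };

    lapack_int info = LAPACKE_dsptrf(LAPACK_COL_MAJOR, 'U', n, ap, ipiv);
    if (info != 0) { printf("dsptrf: info = %d\n", (int)info); return 1; }

    /* LAPACKE_dsptrs solves A*X = B using the packed factorization and pivot indices. */
    info = LAPACKE_dsptrs(LAPACK_COL_MAJOR, 'U', n, nrhs, ap, ipiv, b, n);
    printf("info = %d, x = %f %f %f\n", (int)info, b[0], b[1], b[2]);
    return 0;
}
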

?hptrf
Computes the Bunch-Kaufman factorization of a
complex Hermitian matrix using packed storage.

Syntax
lapack_int LAPACKE_chptrf (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * ap , lapack_int * ipiv );
lapack_int LAPACKE_zhptrf (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * ap , lapack_int * ipiv );

Include Files
• mkl.h

Description

The routine computes the factorization of a complex Hermitian packed matrix A using the Bunch-Kaufman
diagonal pivoting method:

if uplo='U', A = U*D*UH
if uplo='L', A = L*D*LH,

where A is the input matrix, U and L are products of permutation and triangular matrices with unit diagonal
(upper triangular for U and lower triangular for L), and D is a Hermitian block-diagonal matrix with 1-by-1
and 2-by-2 diagonal blocks. U and L have 2-by-2 unit diagonal blocks corresponding to the 2-by-2 blocks of
D.

NOTE
This routine supports the Progress Routine feature. See the Progress Function section for details.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is packed and how A is factored:
If uplo = 'U', the array ap stores the upper triangular part of the
matrix A, and A is factored as U*D*UH.

If uplo = 'L', the array ap stores the lower triangular part of the
matrix A, and A is factored as L*D*LH.

n The order of matrix A; n≥ 0.

ap Array, size at least max(1, n(n+1)/2). The array ap contains the upper
or the lower triangular part of the matrix A (as specified by uplo) in
packed storage (see Matrix Storage Schemes).

Output Parameters

ap The upper or lower triangle of A (as specified by uplo) is overwritten by details of the block-diagonal matrix D and the multipliers used to
obtain the factor U (or L).

ipiv Array, size at least max(1, n). Contains details of the interchanges
and the block structure of D. If ipiv[i-1] = k >0, then dii is a 1-
by-1 block, and the i-th row and column of A was interchanged with
the k-th row and column.
If uplo = 'U' and ipiv[i] =ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and i-th row and column of A was
interchanged with the m-th row and column.
If uplo = 'L' and ipiv[i] =ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and (i+1)-th row and column of A
was interchanged with the m-th row and column.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, dii is 0. The factorization has been completed, but D is exactly singular. Division by 0 will occur
if you use D for solving a system of linear equations.


Application Notes
The 2-by-2 unit diagonal blocks and the unit diagonal elements of U and L are not stored. The remaining
elements of U and L are stored in the array ap, but additional row interchanges are required to recover U or L
explicitly (which is seldom necessary).
If ipiv[i-1] = i for all i = 1...n, then all off-diagonal elements of U (L) are stored explicitly in the
corresponding elements of the array a.
If uplo = 'U', the computed factors U and D are the exact factors of a perturbed matrix A + E, where

|E| ≤c(n)εP|U||D||UT|PT

c(n) is a modest linear function of n, and ε is the machine precision.


A similar estimate holds for the computed L and D when uplo = 'L'.

The total number of floating-point operations is approximately (4/3)n^3.


After calling this routine, you can call the following routines:

?hptrs to solve A*X = B

?hpcon to estimate the condition number of A

?hptri to compute the inverse of A.

See Also
mkl_progress
Matrix Storage Schemes for LAPACK Routines

mkl_?spffrt2, mkl_?spffrtx
Computes the partial LDLT factorization of a
symmetric matrix using packed storage.

Syntax
void mkl_sspffrt2 (float *ap , const MKL_INT *n , const MKL_INT *ncolm , float *work ,
float *work2 );
void mkl_dspffrt2 (double *ap , const MKL_INT *n , const MKL_INT *ncolm , double
*work , double *work2 );
void mkl_cspffrt2 (MKL_Complex8 *ap , const MKL_INT *n , const MKL_INT *ncolm ,
MKL_Complex8 *work , MKL_Complex8 *work2 );
void mkl_zspffrt2 (MKL_Complex16 *ap , const MKL_INT *n , const MKL_INT *ncolm ,
MKL_Complex16 *work , MKL_Complex16 *work2 );
void mkl_sspffrtx (float *ap , const MKL_INT *n , const MKL_INT *ncolm , float *work ,
float *work2 );
void mkl_dspffrtx (double *ap , const MKL_INT *n , const MKL_INT *ncolm , double
*work , double *work2 );
void mkl_cspffrtx (MKL_Complex8 *ap , const MKL_INT *n , const MKL_INT *ncolm ,
MKL_Complex8 *work , MKL_Complex8 *work2 );
void mkl_zspffrtx (MKL_Complex16 *ap , const MKL_INT *n , const MKL_INT *ncolm ,
MKL_Complex16 *work , MKL_Complex16 *work2 );

Include Files
• mkl.h

Description

The routine computes the partial factorization A = LDLT , where L is a lower triangular matrix and D is a
diagonal matrix.

CAUTION
The routine assumes that the matrix A is factorizable. The routine does not perform pivoting and does
not handle diagonal elements which are zero, which cause the routine to produce incorrect results
without any indication.

Consider the matrix

    A = ( a   bT )
        ( b   C  )

where a is the element in the first row and first column of A, b is a column vector of size n - 1 containing the remaining elements of the first column of A (rows 2 through n), C is the lower-right square submatrix of A, and I is the identity matrix.

The mkl_?spffrt2 routine performs ncolm successive factorizations of the form

    A = ( a   bT ) = ( a   0 ) ( a^(-1)   0                ) ( a   bT )
        ( b   C  )   ( b   I ) ( 0        C - b*a^(-1)*bT  ) ( 0   I  )

The mkl_?spffrtx routine performs ncolm successive factorizations of the form

    A = ( a   bT ) = ( 1          0 ) ( a   0               ) ( 1   a^(-1)*bT )
        ( b   C  )   ( b*a^(-1)   I ) ( 0   C - b*a^(-1)*bT ) ( 0   I         )

The approximate number of floating point operations performed by real flavors of these routines is
(1/6)*ncolm*(2*ncolm^2 - 6*ncolm*n + 3*ncolm + 6*n^2 - 6*n + 7).

The approximate number of floating point operations performed by complex flavors of these routines is
(1/3)*ncolm*(4*ncolm^2 - 12*ncolm*n + 9*ncolm + 12*n^2 - 18*n + 8).

Input Parameters

ap Array, size at least max(1, n(n+1)/2). The array ap contains the lower
triangular part of the matrix A in packed storage (see Matrix Storage
Schemes for uplo = 'L').

n The order of matrix A; n≥ 0.

ncolm The number of columns to factor, ncolm≤n.

work, work2 Workspace arrays, size of each at least n.

Output Parameters

ap Overwritten by the factor L. The first ncolm diagonal elements of the input matrix A are replaced with the diagonal elements of D. The
subdiagonal elements of the first ncolm columns are replaced with the
corresponding elements of L. The rest of the input array is updated as
indicated in the Description section.


NOTE
Specifying ncolm = n results in complete factorization A = LDLT.

See Also
mkl_progress
Matrix Storage Schemes for LAPACK Routines
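
Example
A minimal illustrative sketch (not part of the original reference): it applies mkl_dspffrt2 to a 3-by-3 symmetric matrix in packed lower storage, factoring only the first ncolm = 2 columns. The matrix values are assumptions; because the routine performs no pivoting, the example matrix is chosen so that no zero pivots arise.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* 3-by-3 symmetric matrix in packed lower storage (uplo = 'L', column order):
       ap = { A(0,0), A(1,0), A(2,0), A(1,1), A(2,1), A(2,2) }. */
    MKL_INT n = 3, ncolm = 2;                /* factor only the first two columns */
    double ap[6] = { 4.0, 1.0, 2.0, 3.0, 1.0, 5.0 };
    double work[3], work2[3];                /* workspace arrays, each of size n  */

    mkl_dspffrt2(ap, &n, &ncolm, work, work2);

    /* The first ncolm diagonal entries of ap now hold D; the first ncolm columns hold L. */
    printf("d1 = %f, d2 = %f\n", ap[0], ap[3]);
    return 0;
}
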

Solving Systems of Linear Equations: LAPACK Computational Routines


This section describes the LAPACK routines for solving systems of linear equations. Before calling most of
these routines, you need to factorize the matrix of your system of equations (see Routines for Matrix
Factorization in this chapter). However, the factorization is not necessary if your system of equations has a
triangular matrix.

?getrs
Solves a system of linear equations with an LU-
factored square coefficient matrix, with multiple right-
hand sides.

Syntax
lapack_int LAPACKE_sgetrs (int matrix_layout , char trans , lapack_int n , lapack_int
nrhs , const float * a , lapack_int lda , const lapack_int * ipiv , float * b ,
lapack_int ldb );
lapack_int LAPACKE_dgetrs (int matrix_layout , char trans , lapack_int n , lapack_int
nrhs , const double * a , lapack_int lda , const lapack_int * ipiv , double * b ,
lapack_int ldb );
lapack_int LAPACKE_cgetrs (int matrix_layout , char trans , lapack_int n , lapack_int
nrhs , const lapack_complex_float * a , lapack_int lda , const lapack_int * ipiv ,
lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zgetrs (int matrix_layout , char trans , lapack_int n , lapack_int
nrhs , const lapack_complex_double * a , lapack_int lda , const lapack_int * ipiv ,
lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description

The routine solves for X the following systems of linear equations:

A*X = B if trans='N',

AT*X = B if trans='T',

AH*X = B if trans='C' (for complex matrices only).

Before calling this routine, you must call ?getrf to compute the LU factorization of A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

trans Must be 'N' or 'T' or 'C'.

Indicates the form of the equations:


If trans = 'N', then A*X = B is solved for X.

If trans = 'T', then AT*X = B is solved for X.

If trans = 'C', then AH*X = B is solved for X.

n The order of A; the number of rows in B(n≥ 0).

nrhs The number of right-hand sides; nrhs≥ 0.

a Array of size max(1, lda*n).

The array a contains the LU factorization of matrix A resulting from the call of ?getrf.

b Array of size max(1, ldb*nrhs) for column major layout, and max(1, ldb*n) for row major layout.

The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

lda The leading dimension of a; lda≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ipiv Array, size at least max(1, n). The ipiv array, as returned by ?getrf.

Output Parameters

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
For each right-hand side b, the computed solution is the exact solution of a perturbed system of equations (A
+ E)x = b, where

|E| ≤c(n)εP|L||U|

c(n) is a modest linear function of n, and ε is the machine precision.


If x0 is the true solution, the computed solution x satisfies this error bound:


where cond(A,x)= || |A-1||A| |x| ||∞ / ||x||∞≤ ||A-1||∞ ||A||∞ = κ∞(A).

Note that cond(A,x) can be much smaller than κ∞(A); the condition number of AT and AH might or might
not be equal to κ∞(A).

The approximate number of floating-point operations for one right-hand side vector b is 2n^2 for real flavors
and 8n^2 for complex flavors.

To estimate the condition number κ∞(A), call ?gecon.

To refine the solution and estimate the error, call ?gerfs.

See Also
Matrix Storage Schemes for LAPACK Routines
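
Example
A minimal illustrative sketch (not part of the original reference): it factors a general 3-by-3 matrix with ?getrf and solves A*x = b with LAPACKE_dgetrs. The matrix values are assumptions, and the LAPACKE_dgetrf prototype is assumed to follow the standard LAPACKE pattern.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* General 3-by-3 system A*x = b, column-major storage. */
    lapack_int n = 3, lda = 3, ldb = 3, nrhs = 1, ipiv[3];
    double a[9] = { 2.0,  4.0, -2.0,   /* column 0 */
                    1.0, -6.0,  7.0,   /* column 1 */
                    1.0,  0.0,  2.0 }; /* column 2 */
    double b[3] = { 5.0, -2.0, 9.0 };

    /* Factor A = P*L*U, then solve using the factors and the pivot indices. */
    lapack_int info = LAPACKE_dgetrf(LAPACK_COL_MAJOR, n, n, a, lda, ipiv);
    if (info != 0) { printf("dgetrf: info = %d\n", (int)info); return 1; }

    info = LAPACKE_dgetrs(LAPACK_COL_MAJOR, 'N', n, nrhs, a, lda, ipiv, b, ldb);
    printf("info = %d, x = %f %f %f\n", (int)info, b[0], b[1], b[2]);
    return 0;
}
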

?gbtrs
Solves a system of linear equations with an LU-
factored band coefficient matrix, with multiple right-
hand sides.

Syntax
lapack_int LAPACKE_sgbtrs (int matrix_layout , char trans , lapack_int n , lapack_int
kl , lapack_int ku , lapack_int nrhs , const float * ab , lapack_int ldab , const
lapack_int * ipiv , float * b , lapack_int ldb );
lapack_int LAPACKE_dgbtrs (int matrix_layout , char trans , lapack_int n , lapack_int
kl , lapack_int ku , lapack_int nrhs , const double * ab , lapack_int ldab , const
lapack_int * ipiv , double * b , lapack_int ldb );
lapack_int LAPACKE_cgbtrs (int matrix_layout , char trans , lapack_int n , lapack_int
kl , lapack_int ku , lapack_int nrhs , const lapack_complex_float * ab , lapack_int
ldab , const lapack_int * ipiv , lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zgbtrs (int matrix_layout , char trans , lapack_int n , lapack_int
kl , lapack_int ku , lapack_int nrhs , const lapack_complex_double * ab , lapack_int
ldab , const lapack_int * ipiv , lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description
The routine solves for X the following systems of linear equations:

A*X = B if trans='N',

AT*X = B if trans='T',

AH*X = B if trans='C' (for complex matrices only).

Here A is an LU-factored general band matrix of order n with kl non-zero subdiagonals and ku nonzero
superdiagonals. Before calling this routine, call ?gbtrf to compute the LU factorization of A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

trans Must be 'N' or 'T' or 'C'.

n The order of A; the number of rows in B; n≥ 0.

kl The number of subdiagonals within the band of A; kl≥ 0.

ku The number of superdiagonals within the band of A; ku≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

ab Array ab size max(1, ldab*n)

The array ab contains elements of the LU factors of the matrix A as returned by gbtrf.

b Array b size max(1, ldb*nrhs) for column major layout and max(1,
ldb*n) for row major layout.
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

ldab The leading dimension of the array ab; ldab≥ 2*kl + ku +1.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ipiv Array, size at least max(1, n). The ipiv array, as returned by ?gbtrf.

Output Parameters

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
For each right-hand side b, the computed solution is the exact solution of a perturbed system of equations (A
+ E)x = b, where

|E| ≤c(kl + ku + 1)εP|L||U|

c(k) is a modest linear function of k, and ε is the machine precision.


If x0 is the true solution, the computed solution x satisfies this error bound:


where cond(A,x)= || |A-1||A| |x| ||∞ / ||x||∞≤ ||A-1||∞ ||A||∞ = κ∞(A).

Note that cond(A,x) can be much smaller than κ∞(A); the condition number of AT and AH might or might
not be equal to κ∞(A).

The approximate number of floating-point operations for one right-hand side vector is 2n(ku + 2kl) for real
flavors. The number of operations for complex flavors is 4 times greater. All these estimates assume that kl
and ku are much less than min(m,n).
To estimate the condition number κ∞(A), call ?gbcon.

To refine the solution and estimate the error, call ?gbrfs.

See Also
Matrix Storage Schemes for LAPACK Routines
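
Example
A minimal illustrative sketch (not part of the original reference): it stores a tridiagonal matrix in the general band layout required by ?gbtrf (ldab >= 2*kl + ku + 1, with the top kl rows reserved for fill-in), factors it, and solves a system with LAPACKE_dgbtrs. The matrix values are assumptions, and the LAPACKE_dgbtrf prototype is assumed to follow the standard LAPACKE pattern.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Tridiagonal 4-by-4 matrix A = tridiag(-1, 2, -1): kl = ku = 1, ldab = 2*kl + ku + 1 = 4.
       Column-major layout: A(i,j) is stored at ab[kl + ku + i - j + j*ldab]. */
    lapack_int n = 4, kl = 1, ku = 1, ldab = 4, nrhs = 1, ipiv[4];
    double ab[16] = { 0.0,  0.0, 2.0, -1.0,   /* column 0 (first row is fill-in space) */
                      0.0, -1.0, 2.0, -1.0,   /* column 1 */
                      0.0, -1.0, 2.0, -1.0,   /* column 2 */
                      0.0, -1.0, 2.0,  0.0 }; /* column 3 */
    double b[4] = { 1.0, 0.0, 0.0, 1.0 };

    lapack_int info = LAPACKE_dgbtrf(LAPACK_COL_MAJOR, n, n, kl, ku, ab, ldab, ipiv);
    if (info != 0) { printf("dgbtrf: info = %d\n", (int)info); return 1; }

    info = LAPACKE_dgbtrs(LAPACK_COL_MAJOR, 'N', n, kl, ku, nrhs, ab, ldab, ipiv, b, n);
    printf("info = %d, x = %f %f %f %f\n", (int)info, b[0], b[1], b[2], b[3]);
    return 0;
}
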

?gttrs
Solves a system of linear equations with a tridiagonal
coefficient matrix using the LU factorization computed
by ?gttrf.

Syntax
lapack_int LAPACKE_sgttrs (int matrix_layout , char trans , lapack_int n , lapack_int
nrhs , const float * dl , const float * d , const float * du , const float * du2 ,
const lapack_int * ipiv , float * b , lapack_int ldb );
lapack_int LAPACKE_dgttrs (int matrix_layout , char trans , lapack_int n , lapack_int
nrhs , const double * dl , const double * d , const double * du , const double * du2 ,
const lapack_int * ipiv , double * b , lapack_int ldb );
lapack_int LAPACKE_cgttrs (int matrix_layout , char trans , lapack_int n , lapack_int
nrhs , const lapack_complex_float * dl , const lapack_complex_float * d , const
lapack_complex_float * du , const lapack_complex_float * du2 , const lapack_int *
ipiv , lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zgttrs (int matrix_layout , char trans , lapack_int n , lapack_int
nrhs , const lapack_complex_double * dl , const lapack_complex_double * d , const
lapack_complex_double * du , const lapack_complex_double * du2 , const lapack_int *
ipiv , lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description

The routine solves for X the following systems of linear equations with multiple right hand sides:

A*X = B if trans='N',

AT*X = B if trans='T',

AH*X = B if trans='C' (for complex matrices only).

Before calling this routine, you must call ?gttrf to compute the LU factorization of A.

Input Parameters

matrix_layout Specifies whether matrix storage layout for array b is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

trans Must be 'N' or 'T' or 'C'.

Indicates the form of the equations:


If trans = 'N', then A*X = B is solved for X.

If trans = 'T', then AT*X = B is solved for X.

If trans = 'C', then AH*X = B is solved for X.

n The order of A; n≥ 0.

nrhs The number of right-hand sides, that is, the number of columns in B;
nrhs≥ 0.

dl,d,du,du2 Arrays: dl(n -1), d(n), du(n -1), du2(n -2).

The array dl contains the (n - 1) multipliers that define the matrix L from the LU factorization of A.
The array d contains the n diagonal elements of the upper triangular
matrix U from the LU factorization of A.
The array du contains the (n - 1) elements of the first superdiagonal
of U.
The array du2 contains the (n - 2) elements of the second
superdiagonal of U.

b Array of size max(1, ldb*nrhs) for column major layout and max(1,
n*ldb) for row major layout. Contains the matrix B whose columns
are the right-hand sides for the systems of equations.

ldb The leading dimension of b; ldb≥ max(1, n) for column major


layout and ldb≥nrhs for row major layout.

ipiv Array, size (n). The ipiv array, as returned by ?gttrf.

Output Parameters

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.


Application Notes
For each right-hand side b, the computed solution is the exact solution of a perturbed system of equations (A
+ E)x = b, where

|E| ≤c(n)εP|L||U|

c(n) is a modest linear function of n, and ε is the machine precision.


If x0 is the true solution, the computed solution x satisfies this error bound:

where cond(A,x)= || |A-1||A| |x| ||∞ / ||x||∞≤ ||A-1||∞ ||A||∞ = κ∞(A).

Note that cond(A,x) can be much smaller than κ∞(A); the condition number of AT and AH might or might
not be equal to κ∞(A).

The approximate number of floating-point operations for one right-hand side vector b is 7n (including n
divisions) for real flavors and 34n (including 2n divisions) for complex flavors.

To estimate the condition number κ∞(A), call ?gtcon.

To refine the solution and estimate the error, call ?gtrfs.

See Also
Matrix Storage Schemes for LAPACK Routines
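
Example
A minimal illustrative sketch (not part of the original reference): it factors a tridiagonal matrix with ?gttrf and solves a system with LAPACKE_dgttrs. The diagonal values are assumptions, and the LAPACKE_dgttrf prototype is assumed to follow the standard LAPACKE pattern (it takes no matrix_layout argument because only one-dimensional arrays are involved).

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Tridiagonal 4-by-4 system: dl = subdiagonal, d = diagonal, du = superdiagonal. */
    lapack_int n = 4, nrhs = 1, ipiv[4];
    double dl[3] = { -1.0, -1.0, -1.0 };
    double d[4]  = {  2.0,  2.0,  2.0, 2.0 };
    double du[3] = { -1.0, -1.0, -1.0 };
    double du2[2];                       /* second superdiagonal of U, filled by ?gttrf */
    double b[4]  = { 1.0, 0.0, 0.0, 1.0 };

    lapack_int info = LAPACKE_dgttrf(n, dl, d, du, du2, ipiv);
    if (info != 0) { printf("dgttrf: info = %d\n", (int)info); return 1; }

    info = LAPACKE_dgttrs(LAPACK_COL_MAJOR, 'N', n, nrhs, dl, d, du, du2, ipiv, b, n);
    printf("info = %d, x = %f %f %f %f\n", (int)info, b[0], b[1], b[2], b[3]);
    return 0;
}
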
?dttrsb
Solves a system of linear equations with a diagonally
dominant tridiagonal coefficient matrix using the LU
factorization computed by ?dttrfb.

Syntax
void sdttrsb (const char * trans, const MKL_INT * n, const MKL_INT * nrhs, const float
* dl, const float * d, const float * du, float * b, const MKL_INT * ldb, MKL_INT *
info );
void ddttrsb (const char * trans, const MKL_INT * n, const MKL_INT * nrhs, const double
* dl, const double * d, const double * du, double * b, const MKL_INT * ldb, MKL_INT *
info );
void cdttrsb (const char * trans, const MKL_INT * n, const MKL_INT * nrhs, const
MKL_Complex8 * dl, const MKL_Complex8 * d, const MKL_Complex8 * du, MKL_Complex8 * b,
const MKL_INT * ldb, MKL_INT * info );
void zdttrsb (const char * trans, const MKL_INT * n, const MKL_INT * nrhs, const
MKL_Complex16 * dl, const MKL_Complex16 * d, const MKL_Complex16 * du, MKL_Complex16 *
b, const MKL_INT * ldb, MKL_INT * info );

Include Files
• mkl.h

Description

The ?dttrsb routine solves the following systems of linear equations with multiple right hand sides for X:

A*X = B if trans='N',

AT*X = B if trans='T',

AH*X = B if trans='C' (for complex matrices only).

Before calling this routine, call ?dttrfb to compute the factorization of A.

Input Parameters

trans Must be 'N' or 'T' or 'C'.

Indicates the form of the equations solved for X:


If trans = 'N', then A*X = B.

If trans = 'T', then AT*X = B.

If trans = 'C', then AH*X = B.

n The order of A; n≥ 0.

nrhs The number of right-hand sides, that is, the number of columns in B;
nrhs≥ 0.

dl, d, du Arrays: dl(n -1), d(n), du(n -1).

The array dl contains the (n - 1) multipliers that define the matrices L1, L2 from the factorization of A.
The array d contains the n diagonal elements of the upper triangular
matrix U from the factorization of A.
The array du contains the (n - 1) elements of the superdiagonal of U.

b Array of size max(1, ldb*nrhs). Contains the matrix B whose columns are the right-hand sides for the systems of equations.

ldb The leading dimension of b; ldb≥ max(1, n).

Output Parameters

b Overwritten by the solution matrix X.

info If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

?potrs
Solves a system of linear equations with a Cholesky-
factored symmetric (Hermitian) positive-definite
coefficient matrix.

Syntax
lapack_int LAPACKE_spotrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const float * a , lapack_int lda , float * b , lapack_int ldb );
lapack_int LAPACKE_dpotrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const double * a , lapack_int lda , double * b , lapack_int ldb );


lapack_int LAPACKE_cpotrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_float * a , lapack_int lda , lapack_complex_float * b ,
lapack_int ldb );
lapack_int LAPACKE_zpotrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_double * a , lapack_int lda , lapack_complex_double * b ,
lapack_int ldb );

Include Files
• mkl.h

Description

The routine solves for X the system of linear equations A*X = B with a symmetric positive-definite or, for
complex data, Hermitian positive-definite matrix A, given the Cholesky factorization of A:

A = UT*U for real data, A = UH*U for complex data if uplo='U'


A = L*LT for real data, A = L*LH for complex data if uplo='L'

where L is a lower triangular matrix and U is upper triangular. The system is solved with multiple right-hand
sides stored in the columns of the matrix B.
Before calling this routine, you must call potrf to compute the Cholesky factorization of A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:


If uplo = 'U', U is stored, where A = UT*U for real data, A = UH*U
for complex data.
If uplo = 'L', L is stored, where A = L*LT for real data, A = L*LH for
complex data

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides (nrhs≥ 0).

a Array A of size at least max(1, lda*n)

The array a contains the factor U or L (see uplo) as returned by ?potrf.

b The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations. The size of b must be at least
max(1, ldb*nrhs) for column major layout and max(1, ldb*n) for
row major layout.

lda The leading dimension of a; lda≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

Output Parameters

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
If uplo = 'U', the computed solution for each right-hand side b is the exact solution of a perturbed system
of equations (A + E)x = b, where

|E| ≤c(n)ε |UH||U|

c(n) is a modest linear function of n, and ε is the machine precision.


A similar estimate holds for uplo = 'L'. If x0 is the true solution, the computed solution x satisfies this
error bound:

where cond(A,x)= || |A-1||A| |x| ||∞ / ||x||∞≤ ||A-1||∞ ||A||∞ = κ∞(A).

Note that cond(A,x) can be much smaller than κ∞ (A). The approximate number of floating-point operations
for one right-hand side vector b is 2n^2 for real flavors and 8n^2 for complex flavors.

To estimate the condition number κ∞(A), call ?pocon.

To refine the solution and estimate the error, call ?porfs.

See Also
Matrix Storage Schemes for LAPACK Routines
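
Example
A minimal illustrative sketch (not part of the original reference): it computes the Cholesky factorization of a small symmetric positive-definite matrix with ?potrf and solves A*x = b with LAPACKE_dpotrs. The matrix values are assumptions, and the LAPACKE_dpotrf prototype is assumed to follow the standard LAPACKE pattern.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Symmetric positive-definite 3-by-3 matrix, upper triangle, column-major. */
    lapack_int n = 3, lda = 3, ldb = 3, nrhs = 1;
    double a[9] = { 4.0, 0.0, 0.0,    /* column 0                              */
                    1.0, 3.0, 0.0,    /* column 1                              */
                    1.0, 1.0, 2.0 };  /* column 2 (upper triangle referenced)  */
    double b[3] = { 6.0, 5.0, 4.0 };

    /* Cholesky factorization A = U^T*U, then triangular solves for x. */
    lapack_int info = LAPACKE_dpotrf(LAPACK_COL_MAJOR, 'U', n, a, lda);
    if (info != 0) { printf("dpotrf: info = %d\n", (int)info); return 1; }

    info = LAPACKE_dpotrs(LAPACK_COL_MAJOR, 'U', n, nrhs, a, lda, b, ldb);
    printf("info = %d, x = %f %f %f\n", (int)info, b[0], b[1], b[2]);
    return 0;
}
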
?pftrs
Solves a system of linear equations with a Cholesky-
factored symmetric (Hermitian) positive-definite
coefficient matrix using the Rectangular Full Packed
(RFP) format.

Syntax
lapack_int LAPACKE_spftrs (int matrix_layout , char transr , char uplo , lapack_int n ,
lapack_int nrhs , const float * a , float * b , lapack_int ldb );
lapack_int LAPACKE_dpftrs (int matrix_layout , char transr , char uplo , lapack_int n ,
lapack_int nrhs , const double * a , double * b , lapack_int ldb );
lapack_int LAPACKE_cpftrs (int matrix_layout , char transr , char uplo , lapack_int n ,
lapack_int nrhs , const lapack_complex_float * a , lapack_complex_float * b ,
lapack_int ldb );
lapack_int LAPACKE_zpftrs (int matrix_layout , char transr , char uplo , lapack_int n ,
lapack_int nrhs , const lapack_complex_double * a , lapack_complex_double * b ,
lapack_int ldb );


Include Files
• mkl.h

Description

The routine solves a system of linear equations A*X = B with a symmetric positive-definite or, for complex
data, Hermitian positive-definite matrix A using the Cholesky factorization of A:

A = UT*U for real data, A = UH*U for complex data if uplo='U'


A = L*LT for real data, A = L*LH for complex data if uplo='L'

Before calling ?pftrs, you must call ?pftrf to compute the Cholesky factorization of A. L stands for a lower
triangular matrix and U for an upper triangular matrix.
The matrix A is in the Rectangular Full Packed (RFP) format. For the description of the RFP format, see Matrix
Storage Schemes.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

transr Must be 'N', 'T' (for real data) or 'C' (for complex data).

If transr = 'N', the untransposed factor of A is stored in RFP format.

If transr = 'T', the transposed factor of A is stored in RFP format.

If transr = 'C', the conjugate-transposed factor of A is stored in RFP format.

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:


If uplo = 'U', U is stored, where A = UT*U for real data, A = UH*U
for complex data.
If uplo = 'L', L is stored, where A = L*LT for real data, A = L*LH for
complex data

n The order of the matrix A; n≥ 0.

nrhs The number of right-hand sides, that is, the number of columns of the
matrix B; nrhs≥ 0.

a Array a of size max(1,n*(n + 1)/2).

The array a contains, in the RFP format, the factor U or L obtained by factorization of matrix A.

b The array b of size max(1, ldb*nrhs) for column major layout and
max(1,ldb*n) for row major layout contains the matrix B whose
columns are the right-hand sides for the systems of equations.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

Output Parameters

b The solution matrix X.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

See Also
Matrix Storage Schemes for LAPACK Routines

?pptrs
Solves a system of linear equations with a packed
Cholesky-factored symmetric (Hermitian) positive-
definite coefficient matrix.

Syntax
lapack_int LAPACKE_spptrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const float * ap , float * b , lapack_int ldb );
lapack_int LAPACKE_dpptrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const double * ap , double * b , lapack_int ldb );
lapack_int LAPACKE_cpptrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_float * ap , lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zpptrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_double * ap , lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description
The routine solves for X the system of linear equations A*X = B with a packed symmetric positive-definite or,
for complex data, Hermitian positive-definite matrix A, given the Cholesky factorization of A:

A = UT*U for real data, A = UH*U for complex data if uplo='U'


A = L*LT for real data, A = L*LH for complex data if uplo='L'

where L is a lower triangular matrix and U is upper triangular. The system is solved with multiple right-hand
sides stored in the columns of the matrix B.
Before calling this routine, you must call ?pptrf to compute the Cholesky factorization of A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:


If uplo = 'U', U is stored, where A = UT*U for real data, A = UH*U
for complex data.


If uplo = 'L', L is stored, where A = L*LT for real data, A = L*LH for
complex data

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides (nrhs≥ 0).

ap The size of ap must be at least max(1, n(n+1)/2).
The array ap contains the factor U or L, as specified by uplo, in packed storage (see Matrix Storage Schemes).

b The array b of size max(1, ldb*nrhs) for column major layout and
max(1, ldb*n) for row major layout contains the matrix B whose
columns are the right-hand sides for the systems of equations.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

Output Parameters

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
If uplo = 'U', the computed solution for each right-hand side b is the exact solution of a perturbed system
of equations (A + E)x = b, where

|E| ≤c(n)ε |UH||U|

c(n) is a modest linear function of n, and ε is the machine precision.


A similar estimate holds for uplo = 'L'.

If x0 is the true solution, the computed solution x satisfies this error bound:

where cond(A,x)= || |A-1||A| |x| ||∞ / ||x||∞≤ ||A-1||∞ ||A||∞ = κ∞(A).

Note that cond(A,x) can be much smaller than κ∞(A).

The approximate number of floating-point operations for one right-hand side vector b is 2n^2 for real flavors
and 8n^2 for complex flavors.

To estimate the condition number κ∞(A), call ?ppcon.

To refine the solution and estimate the error, call ?pprfs.

See Also
Matrix Storage Schemes for LAPACK Routines

?pbtrs
Solves a system of linear equations with a Cholesky-
factored symmetric (Hermitian) positive-definite band
coefficient matrix.

Syntax
lapack_int LAPACKE_spbtrs (int matrix_layout , char uplo , lapack_int n , lapack_int
kd , lapack_int nrhs , const float * ab , lapack_int ldab , float * b , lapack_int
ldb );
lapack_int LAPACKE_dpbtrs (int matrix_layout , char uplo , lapack_int n , lapack_int
kd , lapack_int nrhs , const double * ab , lapack_int ldab , double * b , lapack_int
ldb );
lapack_int LAPACKE_cpbtrs (int matrix_layout , char uplo , lapack_int n , lapack_int
kd , lapack_int nrhs , const lapack_complex_float * ab , lapack_int ldab ,
lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zpbtrs (int matrix_layout , char uplo , lapack_int n , lapack_int
kd , lapack_int nrhs , const lapack_complex_double * ab , lapack_int ldab ,
lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description

The routine solves for real data a system of linear equations A*X = B with a symmetric positive-definite or,
for complex data, Hermitian positive-definite band matrix A, given the Cholesky factorization of A:

A = UT*U for real data, A = UH*U for complex data if uplo='U'


A = L*LT for real data, A = L*LH for complex data if uplo='L'

where L is a lower triangular matrix and U is upper triangular. The system is solved with multiple right-hand
sides stored in the columns of the matrix B.
Before calling this routine, you must call ?pbtrf to compute the Cholesky factorization of A in the band
storage form.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:


If uplo = 'U', U is stored in ab, where A = UT*U for real matrices
and A = UH*U for complex matrices.
If uplo = 'L', L is stored in ab, where A = L*LT for real matrices and
A = L*LH for complex matrices.

n The order of matrix A; n≥ 0.


kd The number of superdiagonals or subdiagonals in the matrix A; kd≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

ab Array ab is of size max (1, ldab*n).


The array ab contains the Cholesky factor, as returned by the
factorization routine, in band storage form.

b The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.
The size of b is at least max(1, ldb*nrhs) for column major layout
and max(1, ldb*n) for row major layout.

ldab The leading dimension of the array ab; ldab≥kd +1.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

Output Parameters

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
For each right-hand side b, the computed solution is the exact solution of a perturbed system of equations (A
+ E)x = b, where

|E| ≤ c(kd + 1)ε|U^H||U| or |E| ≤ c(kd + 1)ε|L||L^H|

c(k) is a modest linear function of k, and ε is the machine precision.


If x0 is the true solution, the computed solution x satisfies this error bound:

||x - x0||∞ / ||x||∞ ≤ c(kd + 1) cond(A,x) ε,

where cond(A,x) = || |A^-1| |A| |x| ||∞ / ||x||∞ ≤ ||A^-1||∞ ||A||∞ = κ∞(A).

Note that cond(A,x) can be much smaller than κ∞(A).

The approximate number of floating-point operations for one right-hand side vector is 4n*kd for real flavors
and 16n*kd for complex flavors.
To estimate the condition number κ∞(A), call ?pbcon.

To refine the solution and estimate the error, call ?pbrfs.

See Also
Matrix Storage Schemes for LAPACK Routines
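
The factor-then-solve workflow can be illustrated with a short C program. The following sketch is not part of the original reference; it factors a small tridiagonal positive-definite matrix with LAPACKE_dpbtrf and then solves with LAPACKE_dpbtrs, and the matrix values, band width, and layout choices are illustrative assumptions only.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* 4x4 symmetric positive-definite matrix with 2 on the diagonal and -1 on
       the sub-diagonal, stored in lower band form (uplo = 'L'), column major. */
    lapack_int n = 4, kd = 1, nrhs = 1, ldab = kd + 1, ldb = n;
    double ab[] = { 2.0, -1.0,     /* column 0: A(0,0), A(1,0) */
                    2.0, -1.0,     /* column 1: A(1,1), A(2,1) */
                    2.0, -1.0,     /* column 2: A(2,2), A(3,2) */
                    2.0,  0.0 };   /* column 3: A(3,3), padding */
    double b[] = { 1.0, 0.0, 0.0, 1.0 };   /* right-hand side; exact solution is all ones */

    lapack_int info = LAPACKE_dpbtrf(LAPACK_COL_MAJOR, 'L', n, kd, ab, ldab);
    if (info == 0)
        info = LAPACKE_dpbtrs(LAPACK_COL_MAJOR, 'L', n, kd, nrhs, ab, ldab, b, ldb);
    if (info != 0) {
        printf("failed, info = %d\n", (int)info);
        return 1;
    }
    for (lapack_int i = 0; i < n; ++i)
        printf("x[%d] = %f\n", (int)i, b[i]);
    return 0;
}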

?pttrs
Solves a system of linear equations with a symmetric
(Hermitian) positive-definite tridiagonal coefficient
matrix using the factorization computed by ?pttrf.

Syntax
lapack_int LAPACKE_spttrs( int matrix_layout, lapack_int n, lapack_int nrhs, const
float* d, const float* e, float* b, lapack_int ldb );
lapack_int LAPACKE_dpttrs( int matrix_layout, lapack_int n, lapack_int nrhs, const
double* d, const double* e, double* b, lapack_int ldb );
lapack_int LAPACKE_cpttrs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const float* d, const lapack_complex_float* e, lapack_complex_float* b, lapack_int
ldb );
lapack_int LAPACKE_zpttrs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const double* d, const lapack_complex_double* e, lapack_complex_double* b, lapack_int
ldb );

Include Files
• mkl.h

Description

The routine solves for X a system of linear equations A*X = B with a symmetric (Hermitian) positive-definite
tridiagonal matrix A. Before calling this routine, call ?pttrf to compute the L*D*L^T or U^T*D*U factorization
of A for real data, or the L*D*L^H or U^H*D*U factorization for complex data.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Used for cpttrs/zpttrs only. Must be 'U' or 'L'.

Specifies whether the superdiagonal or the subdiagonal of the
tridiagonal matrix A is stored and how A is factored:

If uplo = 'U', the array e stores the conjugated values of the
superdiagonal of U, and A is factored as U^H*D*U.

If uplo = 'L', the array e stores the subdiagonal of L, and A is
factored as L*D*L^H.

n The order of A; n≥ 0.

nrhs The number of right-hand sides, that is, the number of columns of the
matrix B; nrhs≥ 0.

d Array, dimension (n). Contains the diagonal elements of the diagonal


matrix D from the factorization computed by ?pttrf.

e Array e is of size (n -1).


The array e contains the (n - 1) sub-diagonal elements of the unit


bidiagonal factor L or the conjugated values of the superdiagonal of U
from the factorization computed by ?pttrf (see uplo).

b The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.
The size of b is at least max(1, ldb*nrhs) for column major layout
and max(1, ldb*n) for row major layout.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

Output Parameters

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

See Also
Matrix Storage Schemes for LAPACK Routines
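
As a hedged illustration, not taken from this manual, the following C sketch applies the required ?pttrf/?pttrs sequence to a tridiagonal positive-definite matrix; the data are made up for the example.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Tridiagonal SPD matrix: diagonal d = 2, off-diagonal e = -1, order 4. */
    lapack_int n = 4, nrhs = 1, ldb = n;
    double d[] = { 2.0, 2.0, 2.0, 2.0 };
    double e[] = { -1.0, -1.0, -1.0 };
    double b[] = { 1.0, 0.0, 0.0, 1.0 };   /* exact solution is all ones */

    /* Factor A = L*D*L^T; d and e are overwritten by the factorization. */
    lapack_int info = LAPACKE_dpttrf(n, d, e);
    if (info == 0)
        info = LAPACKE_dpttrs(LAPACK_COL_MAJOR, n, nrhs, d, e, b, ldb);
    printf("info = %d, x = %f %f %f %f\n", (int)info, b[0], b[1], b[2], b[3]);
    return 0;
}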

?sytrs
Solves a system of linear equations with a UDU^T- or
LDL^T-factored symmetric coefficient matrix.

Syntax
lapack_int LAPACKE_ssytrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const float * a , lapack_int lda , const lapack_int * ipiv , float * b ,
lapack_int ldb );
lapack_int LAPACKE_dsytrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const double * a , lapack_int lda , const lapack_int * ipiv , double * b ,
lapack_int ldb );
lapack_int LAPACKE_csytrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_float * a , lapack_int lda , const lapack_int * ipiv ,
lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zsytrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_double * a , lapack_int lda , const lapack_int * ipiv ,
lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description

The routine solves for X the system of linear equations A*X = B with a symmetric matrix A, given the Bunch-
Kaufman factorization of A:

if uplo='U', A = U*D*U^T

if uplo='L', A = L*D*L^T,

where U and L are upper and lower triangular matrices with unit diagonal and D is a symmetric block-
diagonal matrix. The system is solved with multiple right-hand sides stored in the columns of the matrix B.
You must supply to this routine the factor U (or L) and the array ipiv returned by the factorization routine
?sytrf.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:


If uplo = 'U', the array a stores the upper triangular factor U of the
factorization A = U*D*U^T.

If uplo = 'L', the array a stores the lower triangular factor L of the
factorization A = L*D*L^T.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

ipiv Array, size at least max(1, n). The ipiv array, as returned by ?sytrf.

a The array a of size max(1, lda*n) contains the factor U or L (see
uplo).

b The array b contains the matrix B whose columns are the right-hand
sides for the system of equations.
The size of b is at least max(1, ldb*nrhs) for column major layout
and max(1, ldb*n) for row major layout.

lda The leading dimension of a; lda≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

Output Parameters

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.


Application Notes
For each right-hand side b, the computed solution is the exact solution of a perturbed system of equations (A
+ E)x = b, where

|E| ≤ c(n)εP|U||D||U^T|P^T or |E| ≤ c(n)εP|L||D||L^T|P^T

c(n) is a modest linear function of n, and ε is the machine precision.


If x0 is the true solution, the computed solution x satisfies this error bound:

||x - x0||∞ / ||x||∞ ≤ c(n) cond(A,x) ε,

where cond(A,x) = || |A^-1| |A| |x| ||∞ / ||x||∞ ≤ ||A^-1||∞ ||A||∞ = κ∞(A).

Note that cond(A,x) can be much smaller than κ∞(A).

The total number of floating-point operations for one right-hand side vector is approximately 2n^2 for real
flavors or 8n^2 for complex flavors.

To estimate the condition number κ∞(A), call ?sycon.

To refine the solution and estimate the error, call ?syrfs.

See Also
Matrix Storage Schemes for LAPACK Routines
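
A minimal C sketch of the ?sytrf/?sytrs pair is shown below; it is not from this reference, and the 3-by-3 symmetric matrix and right-hand side are arbitrary illustrative values.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Symmetric 3x3 matrix stored full, column major; only the upper
       triangle is referenced because uplo = 'U'. */
    lapack_int n = 3, nrhs = 1, lda = n, ldb = n;
    double a[] = { 4.0, 1.0, 2.0,
                   1.0, 3.0, 0.0,
                   2.0, 0.0, 5.0 };
    double b[] = { 7.0, 4.0, 7.0 };   /* A * (1,1,1)^T */
    lapack_int ipiv[3];

    /* Bunch-Kaufman factorization A = U*D*U^T, followed by the solve step. */
    lapack_int info = LAPACKE_dsytrf(LAPACK_COL_MAJOR, 'U', n, a, lda, ipiv);
    if (info == 0)
        info = LAPACKE_dsytrs(LAPACK_COL_MAJOR, 'U', n, nrhs, a, lda, ipiv, b, ldb);
    printf("info = %d, x = %f %f %f\n", (int)info, b[0], b[1], b[2]);
    return 0;
}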

?sytrs_rook
Solves a system of linear equations with a UDU- or
LDL-factored symmetric coefficient matrix.

Syntax
lapack_int LAPACKE_ssytrs_rook (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, const float * a, lapack_int lda, const lapack_int * ipiv, float * b, lapack_int
ldb);
lapack_int LAPACKE_dsytrs_rook (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, const double * a, lapack_int lda, const lapack_int * ipiv, double * b,
lapack_int ldb);
lapack_int LAPACKE_csytrs_rook (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, const lapack_complex_float * a, lapack_int lda, const lapack_int * ipiv,
lapack_complex_float * b, lapack_int ldb);
lapack_int LAPACKE_zsytrs_rook (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, const lapack_complex_double * a, lapack_int lda, const lapack_int * ipiv,
lapack_complex_double * b, lapack_int ldb);

Include Files
• mkl.h

Description

The routine solves a system of linear equations A*X = B with a symmetric matrix A, using the factorization
A = U*D*U^T or A = L*D*L^T computed by ?sytrf_rook.

Input Parameters

matrix_layout Specifies whether matrix storage layout for array b is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:


If uplo = 'U', the factorization is of the form A = U*D*U^T.

If uplo = 'L', the factorization is of the form A = L*D*L^T.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

ipiv Array, size at least max(1, n). The ipiv array, as returned by ?
sytrf_rook.

a, b Arrays: a of size (lda*n); b of size (ldb*nrhs).

The array a contains the block diagonal matrix D and the multipliers
used to obtain U or L as computed by ?sytrf_rook (see uplo).

The array b contains the matrix B whose columns are the right-hand
sides for the system of equations.

lda The leading dimension of a; lda≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

Output Parameters

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The total number of floating-point operations for one right-hand side vector is approximately 2n^2 for real
flavors or 8n^2 for complex flavors.

See Also
Matrix Storage Schemes for LAPACK Routines

?hetrs
Solves a system of linear equations with a UDU^H- or
LDL^H-factored Hermitian coefficient matrix.


Syntax
lapack_int LAPACKE_chetrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_float * a , lapack_int lda , const lapack_int * ipiv ,
lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zhetrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_double * a , lapack_int lda , const lapack_int * ipiv ,
lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description

The routine solves for X the system of linear equations A*X = B with a Hermitian matrix A, given the Bunch-
Kaufman factorization of A:

if uplo='U', A = U*D*U^H

if uplo='L', A = L*D*L^H,

where U and L are upper and lower triangular matrices with unit diagonal and D is a Hermitian block-
diagonal matrix. The system is solved with multiple right-hand sides stored in the columns of the matrix B.
You must supply to this routine the factor U (or L) and the array ipiv returned by the factorization routine
?hetrf.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:


If uplo = 'U', the array a stores the upper triangular factor U of the
factorization A = U*D*U^H.

If uplo = 'L', the array a stores the lower triangular factor L of the
factorization A = L*D*L^H.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

ipiv Array, size at least max(1, n).

The ipiv array, as returned by ?hetrf.

a The array a of size max(1, lda*n) contains the factor U or L (see
uplo).

b The array b contains the matrix B whose columns are the right-hand
sides for the system of equations.
The size of b is at least max(1, ldb*nrhs) for column major layout
and max(1, ldb*n) for row major layout.

lda The leading dimension of a; lda≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

Output Parameters

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
For each right-hand side b, the computed solution is the exact solution of a perturbed system of equations (A
+ E)x = b, where

|E| ≤ c(n)εP|U||D||U^H|P^T or |E| ≤ c(n)εP|L||D||L^H|P^T

c(n) is a modest linear function of n, and ε is the machine precision.


If x0 is the true solution, the computed solution x satisfies this error bound:

||x - x0||∞ / ||x||∞ ≤ c(n) cond(A,x) ε,

where cond(A,x) = || |A^-1| |A| |x| ||∞ / ||x||∞ ≤ ||A^-1||∞ ||A||∞ = κ∞(A).

Note that cond(A,x) can be much smaller than κ∞(A).

The total number of floating-point operations for one right-hand side vector is approximately 8n^2.

To estimate the condition number κ∞(A), call ?hecon.

To refine the solution and estimate the error, call ?herfs.

See Also
Matrix Storage Schemes for LAPACK Routines

?hetrs_rook
Solves a system of linear equations with a UDU- or
LDL-factored Hermitian coefficient matrix.

Syntax
lapack_int LAPACKE_chetrs_rook (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, const lapack_complex_float * a, lapack_int lda, const lapack_int * ipiv,
lapack_complex_float * b, lapack_int ldb);
lapack_int LAPACKE_zhetrs_rook (int matrix_layout, char uplo, lapack_int n, lapack_int
nrhs, const lapack_complex_double * a, lapack_int lda, const lapack_int * ipiv,
lapack_complex_double * b, lapack_int ldb);


Include Files
• mkl.h

Description

The routine solves a system of linear equations A*X = B with a complex Hermitian matrix A using the
factorization A = U*D*U^H or A = L*D*L^H computed by ?hetrf_rook.

Input Parameters

matrix_layout Specifies whether matrix storage layout for array b is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:


If uplo = 'U', the factorization is of the form A = U*D*U^H.

If uplo = 'L', the factorization is of the form A = L*D*L^H.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

ipiv Array, size at least max(1, n).

The ipiv array, as returned by ?hetrf_rook.

a, b Arrays: a of size (lda*n); b of size (ldb*nrhs).

The array a contains the block diagonal matrix D and the multipliers
used to obtain the factor U or L as computed by ?hetrf_rook (see
uplo).
The array b contains the matrix B whose columns are the right-hand
sides for the system of equations.

lda The leading dimension of a; lda≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

Output Parameters

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

?sytrs2
Solves a system of linear equations with a UDU- or
LDL-factored symmetric coefficient matrix.

Syntax
lapack_int LAPACKE_ssytrs2 (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const float * a , lapack_int lda , const lapack_int * ipiv , float * b ,
lapack_int ldb );
lapack_int LAPACKE_dsytrs2 (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const double * a , lapack_int lda , const lapack_int * ipiv , double * b ,
lapack_int ldb );
lapack_int LAPACKE_csytrs2 (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_float * a , lapack_int lda , const lapack_int * ipiv ,
lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zsytrs2 (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_double * a , lapack_int lda , const lapack_int * ipiv ,
lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description
The routine solves a system of linear equations A*X = B with a symmetric matrix A using the factorization of
A:

if uplo='U', A = U*D*U^T

if uplo='L', A = L*D*L^T

where

• U and L are upper and lower triangular matrices with unit diagonal
• D is a symmetric block-diagonal matrix.
The factorization is computed by ?sytrf.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:


If uplo = 'U', the array a stores the upper triangular factor U of the
factorization A = U*D*U^T.

If uplo = 'L', the array a stores the lower triangular factor L of the
factorization A = L*D*L^T.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

a The array a of size max(1, lda*n) contains the block diagonal matrix D
and the multipliers used to obtain the factor U or L as computed by
?sytrf.


b The array b contains the right-hand side matrix B.


The size of b is at least max(1, ldb*nrhs) for column major layout
and max(1, ldb*n) for row major layout.

lda The leading dimension of a; lda≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ipiv Array of size n. The ipiv array contains details of the interchanges
and the block structure of D as determined by ?sytrf.

Output Parameters

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

See Also
?sytrf
Matrix Storage Schemes for LAPACK Routines

?hetrs2
Solves a system of linear equations with a UDU- or
LDL-factored Hermitian coefficient matrix.

Syntax
lapack_int LAPACKE_chetrs2 (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_float * a , lapack_int lda , const lapack_int * ipiv ,
lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zhetrs2 (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_double * a , lapack_int lda , const lapack_int * ipiv ,
lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description
The routine solves a system of linear equations A*X = B with a complex Hermitian matrix A using the
factorization of A:

if uplo='U', A = U*D*U^H

if uplo='L', A = L*D*L^H

where

• U and L are upper and lower triangular matrices with unit diagonal

• D is a Hermitian block-diagonal matrix.
The factorization is computed by ?hetrf.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:


If uplo = 'U', the array a stores the upper triangular factor U of the
factorization A = U*D*U^H.

If uplo = 'L', the array a stores the lower triangular factor L of the
factorization A = L*D*L^H.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

a The array a of size max(1, lda*n) contains the block diagonal matrix
D and the multipliers used to obtain the factor U or L as computed
by ?hetrf.

b The array b of size max(1, ldb*nrhs) for column major layout and
max(1, ldb*n) for row major layout contains the right-hand side
matrix B.

lda The leading dimension of a; lda≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ipiv Array of size n. The ipiv array contains details of the interchanges
and the block structure of D as determined by ?hetrf.

Output Parameters

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

See Also
?hetrf
Matrix Storage Schemes for LAPACK Routines

?sptrs
Solves a system of linear equations with a UDU- or
LDL-factored symmetric coefficient matrix using
packed storage.


Syntax
lapack_int LAPACKE_ssptrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const float * ap , const lapack_int * ipiv , float * b , lapack_int ldb );
lapack_int LAPACKE_dsptrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const double * ap , const lapack_int * ipiv , double * b , lapack_int ldb );
lapack_int LAPACKE_csptrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_float * ap , const lapack_int * ipiv , lapack_complex_float
* b , lapack_int ldb );
lapack_int LAPACKE_zsptrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_double * ap , const lapack_int * ipiv ,
lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description

The routine solves for X the system of linear equations A*X = B with a symmetric matrix A, given the Bunch-
Kaufman factorization of A:

if uplo='U', A = U*D*U^T

if uplo='L', A = L*D*L^T,

where U and L are upper and lower packed triangular matrices with unit diagonal and D is a symmetric
block-diagonal matrix. The system is solved with multiple right-hand sides stored in the columns of the
matrix B. You must supply the factor U (or L) and the array ipiv returned by the factorization routine ?sptrf.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:


If uplo = 'U', the array ap stores the packed factor U of the
factorization A = U*D*U^T. If uplo = 'L', the array ap stores the
packed factor L of the factorization A = L*D*L^T.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

ipiv Array, size at least max(1, n). The ipiv array, as returned by ?sptrf.

ap The dimension of array ap must be at least max(1, n(n+1)/2). The


array ap contains the factor U or L, as specified by uplo, in packed
storage (see Matrix Storage Schemes).

b The array b contains the matrix B whose columns are the right-hand
sides for the system of equations. The size of b is max(1, ldb*nrhs)
for column major layout and max(1, ldb*n) for row major layout.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

Output Parameters

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
For each right-hand side b, the computed solution is the exact solution of a perturbed system of equations (A
+ E)x = b, where

|E| ≤ c(n)εP|U||D||U^T|P^T or |E| ≤ c(n)εP|L||D||L^T|P^T

c(n) is a modest linear function of n, and ε is the machine precision.


If x0 is the true solution, the computed solution x satisfies this error bound:

||x - x0||∞ / ||x||∞ ≤ c(n) cond(A,x) ε,

where cond(A,x) = || |A^-1| |A| |x| ||∞ / ||x||∞ ≤ ||A^-1||∞ ||A||∞ = κ∞(A).

Note that cond(A,x) can be much smaller than κ∞(A).

The total number of floating-point operations for one right-hand side vector is approximately 2n^2 for real
flavors or 8n^2 for complex flavors.

To estimate the condition number κ∞(A), call ?spcon.

To refine the solution and estimate the error, call ?sprfs.

See Also
Matrix Storage Schemes for LAPACK Routines
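
The packed-storage variant follows the same pattern. The C sketch below is illustrative only, with made-up data: it packs the upper triangle column by column, factors it with ?sptrf, and solves with ?sptrs.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Upper triangle of a symmetric 3x3 matrix in packed column-major order:
       A(0,0), A(0,1), A(1,1), A(0,2), A(1,2), A(2,2). */
    lapack_int n = 3, nrhs = 1, ldb = n;
    double ap[] = { 4.0,
                    1.0, 3.0,
                    2.0, 0.0, 5.0 };
    double b[] = { 7.0, 4.0, 7.0 };   /* A * (1,1,1)^T */
    lapack_int ipiv[3];

    lapack_int info = LAPACKE_dsptrf(LAPACK_COL_MAJOR, 'U', n, ap, ipiv);
    if (info == 0)
        info = LAPACKE_dsptrs(LAPACK_COL_MAJOR, 'U', n, nrhs, ap, ipiv, b, ldb);
    printf("info = %d, x = %f %f %f\n", (int)info, b[0], b[1], b[2]);
    return 0;
}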

?hptrs
Solves a system of linear equations with a UDU- or
LDL-factored Hermitian coefficient matrix using
packed storage.

Syntax
lapack_int LAPACKE_chptrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_float * ap , const lapack_int * ipiv , lapack_complex_float
* b , lapack_int ldb );
lapack_int LAPACKE_zhptrs (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , const lapack_complex_double * ap , const lapack_int * ipiv ,
lapack_complex_double * b , lapack_int ldb );


Include Files
• mkl.h

Description

The routine solves for X the system of linear equations A*X = B with a Hermitian matrix A, given the Bunch-
Kaufman factorization of A:

if uplo='U', A = U*D*U^H

if uplo='L', A = L*D*L^H,

where U and L are upper and lower packed triangular matrices with unit diagonal and D is a Hermitian
block-diagonal matrix. The system is solved with multiple right-hand sides stored in the columns of the
matrix B.
You must supply to this routine the arrays ap (containing U or L) and ipiv in the form returned by the
factorization routine ?hptrf.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:


If uplo = 'U', the array ap stores the packed factor U of the
factorization A = U*D*U^H. If uplo = 'L', the array ap stores the
packed factor L of the factorization A = L*D*L^H.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

ipiv Array, size at least max(1, n). The ipiv array, as returned by ?hptrf.

ap The dimension of array ap must be at least max(1,n(n+1)/2). The


array ap contains the factor U or L, as specified by uplo, in packed
storage (see Matrix Storage Schemes).

b The array b contains the matrix B whose columns are the right-hand
sides for the system of equations. The size of b is max(1, ldb*nrhs)
for column major layout and max(1, ldb*n) for row major layout.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

Output Parameters

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
For each right-hand side b, the computed solution is the exact solution of a perturbed system of equations (A
+ E)x = b, where

|E| ≤ c(n)εP|U||D||U^H|P^T or |E| ≤ c(n)εP|L||D||L^H|P^T

c(n) is a modest linear function of n, and ε is the machine precision.


If x0 is the true solution, the computed solution x satisfies this error bound:

||x - x0||∞ / ||x||∞ ≤ c(n) cond(A,x) ε,

where cond(A,x) = || |A^-1| |A| |x| ||∞ / ||x||∞ ≤ ||A^-1||∞ ||A||∞ = κ∞(A).

Note that cond(A,x) can be much smaller than κ∞(A).

The total number of floating-point operations for one right-hand side vector is approximately 8n^2 for complex
flavors.
To estimate the condition number κ∞(A), call ?hpcon.

To refine the solution and estimate the error, call ?hprfs.

See Also
Matrix Storage Schemes for LAPACK Routines
?trtrs
Solves a system of linear equations with a triangular
coefficient matrix, with multiple right-hand sides.

Syntax
lapack_int LAPACKE_strtrs (int matrix_layout , char uplo , char trans , char diag ,
lapack_int n , lapack_int nrhs , const float * a , lapack_int lda , float * b ,
lapack_int ldb );
lapack_int LAPACKE_dtrtrs (int matrix_layout , char uplo , char trans , char diag ,
lapack_int n , lapack_int nrhs , const double * a , lapack_int lda , double * b ,
lapack_int ldb );
lapack_int LAPACKE_ctrtrs (int matrix_layout , char uplo , char trans , char diag ,
lapack_int n , lapack_int nrhs , const lapack_complex_float * a , lapack_int lda ,
lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_ztrtrs (int matrix_layout , char uplo , char trans , char diag ,
lapack_int n , lapack_int nrhs , const lapack_complex_double * a , lapack_int lda ,
lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description

The routine solves for X the following systems of linear equations with a triangular matrix A, with multiple
right-hand sides stored in B:


A*X = B if trans='N',

A^T*X = B if trans='T',

A^H*X = B if trans='C' (for complex matrices only).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether A is upper or lower triangular:


If uplo = 'U', then A is upper triangular.

If uplo = 'L', then A is lower triangular.

trans Must be 'N' or 'T' or 'C'.

If trans = 'N', then A*X = B is solved for X.

If trans = 'T', then A^T*X = B is solved for X.

If trans = 'C', then A^H*X = B is solved for X.

diag Must be 'N' or 'U'.

If diag = 'N', then A is not a unit triangular matrix.

If diag = 'U', then A is unit triangular: diagonal elements of A are


assumed to be 1 and not referenced in the array a.

n The order of A; the number of rows in B; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

a The array a contains the matrix A.

The size of a is max(1, lda*n).

b The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.
The size of b is max(1, ldb*nrhs) for column major layout and
max(1, ldb*n) for row major layout.

lda The leading dimension of a; lda≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

Output Parameters

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
For each right-hand side b, the computed solution is the exact solution of a perturbed system of equations (A
+ E)x = b, where

|E| ≤c(n)ε |A|

c(n) is a modest linear function of n, and ε is the machine precision. If x0 is the true solution, the computed
solution x satisfies this error bound:

||x - x0||∞ / ||x||∞ ≤ c(n) cond(A,x) ε,

where cond(A,x) = || |A^-1| |A| |x| ||∞ / ||x||∞ ≤ ||A^-1||∞ ||A||∞ = κ∞(A).

Note that cond(A,x) can be much smaller than κ∞(A); the condition number of A^T and A^H might or might
not be equal to κ∞(A).

The approximate number of floating-point operations for one right-hand side vector b is n^2 for real flavors
and 4n^2 for complex flavors.

To estimate the condition number κ∞(A), call ?trcon.

To estimate the error in the solution, call ?trrfs.

See Also
Matrix Storage Schemes for LAPACK Routines
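
No prior factorization is required here, since A itself is triangular. The following C sketch, with values invented for the example, solves U*x = b for an upper triangular U with LAPACKE_dtrtrs.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Upper triangular 3x3 matrix, column major; the strictly lower part is zero. */
    lapack_int n = 3, nrhs = 1, lda = n, ldb = n;
    double a[] = { 2.0, 0.0, 0.0,
                   1.0, 3.0, 0.0,
                   1.0, 2.0, 4.0 };
    double b[] = { 4.0, 5.0, 4.0 };   /* U * (1,1,1)^T */

    lapack_int info = LAPACKE_dtrtrs(LAPACK_COL_MAJOR, 'U', 'N', 'N',
                                     n, nrhs, a, lda, b, ldb);
    printf("info = %d, x = %f %f %f\n", (int)info, b[0], b[1], b[2]);
    return 0;
}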

?tptrs
Solves a system of linear equations with a packed
triangular coefficient matrix, with multiple right-hand
sides.

Syntax
lapack_int LAPACKE_stptrs (int matrix_layout , char uplo , char trans , char diag ,
lapack_int n , lapack_int nrhs , const float * ap , float * b , lapack_int ldb );
lapack_int LAPACKE_dtptrs (int matrix_layout , char uplo , char trans , char diag ,
lapack_int n , lapack_int nrhs , const double * ap , double * b , lapack_int ldb );
lapack_int LAPACKE_ctptrs (int matrix_layout , char uplo , char trans , char diag ,
lapack_int n , lapack_int nrhs , const lapack_complex_float * ap , lapack_complex_float
* b , lapack_int ldb );
lapack_int LAPACKE_ztptrs (int matrix_layout , char uplo , char trans , char diag ,
lapack_int n , lapack_int nrhs , const lapack_complex_double * ap ,
lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description


The routine solves for X the following systems of linear equations with a packed triangular matrix A, with
multiple right-hand sides stored in B:

A*X = B if trans='N',

A^T*X = B if trans='T',

A^H*X = B if trans='C' (for complex matrices only).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether A is upper or lower triangular:


If uplo = 'U', then A is upper triangular.

If uplo = 'L', then A is lower triangular.

trans Must be 'N' or 'T' or 'C'.

If trans = 'N', then A*X = B is solved for X.

If trans = 'T', then A^T*X = B is solved for X.

If trans = 'C', then A^H*X = B is solved for X.

diag Must be 'N' or 'U'.

If diag = 'N', then A is not a unit triangular matrix.

If diag = 'U', then A is unit triangular: diagonal elements are


assumed to be 1 and not referenced in the array ap.

n The order of A; the number of rows in B; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

ap The dimension of array ap must be at least max(1, n(n+1)/2). The


array ap contains the matrix A in packed storage (see Matrix Storage
Schemes).

b The array b contains the matrix B whose columns are the right-hand
sides for the system of equations.
The size of b is max(1, ldb*nrhs) for column major layout and
max(1, ldb*n) for row major layout.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

Output Parameters

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
For each right-hand side b, the computed solution is the exact solution of a perturbed system of equations (A
+ E)x = b, where

|E| ≤c(n)ε |A|

c(n) is a modest linear function of n, and ε is the machine precision.


If x0 is the true solution, the computed solution x satisfies this error bound:

||x - x0||∞ / ||x||∞ ≤ c(n) cond(A,x) ε,

where cond(A,x) = || |A^-1| |A| |x| ||∞ / ||x||∞ ≤ ||A^-1||∞ ||A||∞ = κ∞(A).

Note that cond(A,x) can be much smaller than κ∞(A); the condition number of A^T and A^H might or might
not be equal to κ∞(A).

The approximate number of floating-point operations for one right-hand side vector b is n^2 for real flavors
and 4n^2 for complex flavors.

To estimate the condition number κ∞(A), call ?tpcon.

To estimate the error in the solution, call ?tprfs.

See Also
Matrix Storage Schemes for LAPACK Routines
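
The packed form differs only in how A is laid out. In this illustrative C sketch, with data invented for the example, the same kind of upper triangular matrix is stored column by column in the packed array ap.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Upper triangular 3x3 matrix in packed column-major order:
       A(0,0), A(0,1), A(1,1), A(0,2), A(1,2), A(2,2). */
    lapack_int n = 3, nrhs = 1, ldb = n;
    double ap[] = { 2.0,
                    1.0, 3.0,
                    1.0, 2.0, 4.0 };
    double b[] = { 4.0, 5.0, 4.0 };   /* A * (1,1,1)^T */

    lapack_int info = LAPACKE_dtptrs(LAPACK_COL_MAJOR, 'U', 'N', 'N',
                                     n, nrhs, ap, b, ldb);
    printf("info = %d, x = %f %f %f\n", (int)info, b[0], b[1], b[2]);
    return 0;
}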

?tbtrs
Solves a system of linear equations with a band
triangular coefficient matrix, with multiple right-hand
sides.

Syntax
lapack_int LAPACKE_stbtrs (int matrix_layout , char uplo , char trans , char diag ,
lapack_int n , lapack_int kd , lapack_int nrhs , const float * ab , lapack_int ldab ,
float * b , lapack_int ldb );
lapack_int LAPACKE_dtbtrs (int matrix_layout , char uplo , char trans , char diag ,
lapack_int n , lapack_int kd , lapack_int nrhs , const double * ab , lapack_int ldab ,
double * b , lapack_int ldb );
lapack_int LAPACKE_ctbtrs (int matrix_layout , char uplo , char trans , char diag ,
lapack_int n , lapack_int kd , lapack_int nrhs , const lapack_complex_float * ab ,
lapack_int ldab , lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_ztbtrs (int matrix_layout , char uplo , char trans , char diag ,
lapack_int n , lapack_int kd , lapack_int nrhs , const lapack_complex_double * ab ,
lapack_int ldab , lapack_complex_double * b , lapack_int ldb );


Include Files
• mkl.h

Description

The routine solves for X the following systems of linear equations with a band triangular matrix A, with
multiple right-hand sides stored in B:

A*X = B if trans='N',

A^T*X = B if trans='T',

A^H*X = B if trans='C' (for complex matrices only).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether A is upper or lower triangular:


If uplo = 'U', then A is upper triangular.

If uplo = 'L', then A is lower triangular.

trans Must be 'N' or 'T' or 'C'.

If trans = 'N', then A*X = B is solved for X.

If trans = 'T', then A^T*X = B is solved for X.

If trans = 'C', then A^H*X = B is solved for X.

diag Must be 'N' or 'U'.

If diag = 'N', then A is not a unit triangular matrix.

If diag = 'U', then A is unit triangular: diagonal elements are


assumed to be 1 and not referenced in the array ab.

n The order of A; the number of rows in B; n≥ 0.

kd The number of superdiagonals or subdiagonals in the matrix A; kd≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

ab The array ab contains the matrix A in band storage form.

The size of ab must be max(1, ldab*n)

b The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.
The size of b is max(1, ldb*nrhs) for column major layout and
max(1, ldb*n) for row major layout.

ldab The leading dimension of ab; ldab≥kd + 1.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

Output Parameters

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
For each right-hand side b, the computed solution is the exact solution of a perturbed system of equations (A
+ E)x = b, where

|E|≤ c(n)ε|A|

c(n) is a modest linear function of n, and ε is the machine precision. If x0 is the true solution, the computed
solution x satisfies this error bound:

||x - x0||∞ / ||x||∞ ≤ c(n) cond(A,x) ε,

where cond(A,x) = || |A^-1| |A| |x| ||∞ / ||x||∞ ≤ ||A^-1||∞ ||A||∞ = κ∞(A).

Note that cond(A,x) can be much smaller than κ∞(A); the condition number of A^T and A^H might or might
not be equal to κ∞(A).

The approximate number of floating-point operations for one right-hand side vector b is 2n*kd for real
flavors and 8n*kd for complex flavors.

To estimate the condition number κ∞(A), call ?tbcon.

To estimate the error in the solution, call ?tbrfs.

See Also
Matrix Storage Schemes for LAPACK Routines
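
For the band variant the triangle is stored in band form. The C sketch below is an illustration with assumed data, not part of this manual; it solves an upper bidiagonal system with LAPACKE_dtbtrs.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Upper triangular band matrix with kd = 1: diagonal {2,3,4} and
       superdiagonal {1,2}, stored in upper band form, column major.
       Element A(i,j) sits at ab[kd + i - j + j*ldab]; ab[0] is padding. */
    lapack_int n = 3, kd = 1, nrhs = 1, ldab = kd + 1, ldb = n;
    double ab[] = { 0.0, 2.0,    /* column 0: pad, A(0,0)    */
                    1.0, 3.0,    /* column 1: A(0,1), A(1,1) */
                    2.0, 4.0 };  /* column 2: A(1,2), A(2,2) */
    double b[] = { 3.0, 5.0, 4.0 };   /* A * (1,1,1)^T */

    lapack_int info = LAPACKE_dtbtrs(LAPACK_COL_MAJOR, 'U', 'N', 'N',
                                     n, kd, nrhs, ab, ldab, b, ldb);
    printf("info = %d, x = %f %f %f\n", (int)info, b[0], b[1], b[2]);
    return 0;
}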

Estimating the Condition Number: LAPACK Computational Routines


This section describes the LAPACK routines for estimating the condition number of a matrix. The condition
number is used for analyzing the errors in the solution of a system of linear equations (see Error Analysis).
Since the condition number may be arbitrarily large when the matrix is nearly singular, the routines actually
compute the reciprocal condition number.

?gecon
Estimates the reciprocal of the condition number of a
general matrix in the 1-norm or the infinity-norm.

Syntax
lapack_int LAPACKE_sgecon( int matrix_layout, char norm, lapack_int n, const float* a,
lapack_int lda, float anorm, float* rcond );


lapack_int LAPACKE_dgecon( int matrix_layout, char norm, lapack_int n, const double* a,
lapack_int lda, double anorm, double* rcond );
lapack_int LAPACKE_cgecon( int matrix_layout, char norm, lapack_int n, const
lapack_complex_float* a, lapack_int lda, float anorm, float* rcond );
lapack_int LAPACKE_zgecon( int matrix_layout, char norm, lapack_int n, const
lapack_complex_double* a, lapack_int lda, double anorm, double* rcond );

Include Files
• mkl.h

Description
The routine estimates the reciprocal of the condition number of a general matrix A in the 1-norm or infinity-
norm:
κ1(A) = ||A||1 ||A^-1||1 = κ∞(A^T) = κ∞(A^H)
κ∞(A) = ||A||∞ ||A^-1||∞ = κ1(A^T) = κ1(A^H).
An estimate is obtained for ||A^-1||, and the reciprocal of the condition number is computed as rcond =
1 / (||A|| ||A^-1||).
Before calling this routine:

• compute anorm (either ||A||1 = max_j Σ_i |a_ij| or ||A||∞ = max_i Σ_j |a_ij|)
• call ?getrf to compute the LU factorization of A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

norm Must be '1' or 'O' or 'I'.

If norm = '1' or 'O', then the routine estimates the condition


number of matrix A in 1-norm.
If norm = 'I', then the routine estimates the condition number of
matrix A in infinity-norm.

n The order of the matrix A; n≥ 0.

a The array a contains the LU-factored matrix A, as returned by ?getrf.

anorm The norm of the original matrix A (see Description).

lda The leading dimension of a; lda≥ max(1, n).

Output Parameters

rcond An estimate of the reciprocal of the condition number. The routine sets
rcond = 0 if the estimate underflows; in this case the matrix is
singular (to working precision). However, anytime rcond is small
compared to 1.0, for the working precision, the matrix may be poorly
conditioned or even singular.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
A*x = b or A^H*x = b; the number is usually 4 or 5 and never more than 11. Each solution requires
approximately 2*n^2 floating-point operations for real flavors and 8*n^2 for complex flavors.

See Also
Matrix Storage Schemes for LAPACK Routines
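
The norm-then-factor-then-estimate order matters because ?getrf overwrites A. The C sketch below is illustrative only; its data are made up, and it assumes the auxiliary routine LAPACKE_dlange is available for computing the 1-norm.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* General 3x3 matrix, column major. */
    lapack_int n = 3, lda = n;
    double a[] = { 4.0, 2.0, 0.0,
                   1.0, 5.0, 1.0,
                   0.0, 1.0, 3.0 };
    lapack_int ipiv[3];
    double rcond = 0.0;

    /* 1-norm of the original matrix, taken before ?getrf overwrites a. */
    double anorm = LAPACKE_dlange(LAPACK_COL_MAJOR, '1', n, n, a, lda);

    lapack_int info = LAPACKE_dgetrf(LAPACK_COL_MAJOR, n, n, a, lda, ipiv);
    if (info == 0)
        info = LAPACKE_dgecon(LAPACK_COL_MAJOR, '1', n, a, lda, anorm, &rcond);
    printf("info = %d, estimated condition number = %g\n",
           (int)info, rcond > 0.0 ? 1.0 / rcond : 0.0);
    return 0;
}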

?gbcon
Estimates the reciprocal of the condition number of a
band matrix in the 1-norm or the infinity-norm.

Syntax
lapack_int LAPACKE_sgbcon( int matrix_layout, char norm, lapack_int n, lapack_int kl,
lapack_int ku, const float* ab, lapack_int ldab, const lapack_int* ipiv, float anorm,
float* rcond );
lapack_int LAPACKE_dgbcon( int matrix_layout, char norm, lapack_int n, lapack_int kl,
lapack_int ku, const double* ab, lapack_int ldab, const lapack_int* ipiv, double
anorm, double* rcond );
lapack_int LAPACKE_cgbcon( int matrix_layout, char norm, lapack_int n, lapack_int kl,
lapack_int ku, const lapack_complex_float* ab, lapack_int ldab, const lapack_int* ipiv,
float anorm, float* rcond );
lapack_int LAPACKE_zgbcon( int matrix_layout, char norm, lapack_int n, lapack_int kl,
lapack_int ku, const lapack_complex_double* ab, lapack_int ldab, const lapack_int*
ipiv, double anorm, double* rcond );

Include Files
• mkl.h

Description
The routine estimates the reciprocal of the condition number of a general band matrix A in the 1-norm or
infinity-norm:
κ1(A) = ||A||1 ||A^-1||1 = κ∞(A^T) = κ∞(A^H)
κ∞(A) = ||A||∞ ||A^-1||∞ = κ1(A^T) = κ1(A^H).
An estimate is obtained for ||A^-1||, and the reciprocal of the condition number is computed as rcond =
1 / (||A|| ||A^-1||).
Before calling this routine:

• compute anorm (either ||A||1 = max_j Σ_i |a_ij| or ||A||∞ = max_i Σ_j |a_ij|)
• call ?gbtrf to compute the LU factorization of A.


Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

norm Must be '1' or 'O' or 'I'.

If norm = '1' or 'O', then the routine estimates the condition


number of matrix A in 1-norm.
If norm = 'I', then the routine estimates the condition number of
matrix A in infinity-norm.

n The order of the matrix A; n≥ 0.

kl The number of subdiagonals within the band of A; kl≥ 0.

ku The number of superdiagonals within the band of A; ku≥ 0.

ldab The leading dimension of the array ab. (ldab≥ 2*kl + ku +1).

ipiv Array, size at least max(1, n). The ipiv array, as returned by ?gbtrf.

ab The array ab of size max(1, ldab*n) contains the factored band matrix
A, as returned by ?gbtrf.

anorm The norm of the original matrix A (see Description).

Output Parameters

rcond An estimate of the reciprocal of the condition number. The routine sets
rcond =0 if the estimate underflows; in this case the matrix is singular
(to working precision). However, anytime rcond is small compared to
1.0, for the working precision, the matrix may be poorly conditioned
or even singular.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
A*x = b or A^H*x = b; the number is usually 4 or 5 and never more than 11. Each solution requires
approximately 2n(ku + 2kl) floating-point operations for real flavors and 8n(ku + 2kl) for complex
flavors.

See Also
Matrix Storage Schemes for LAPACK Routines

?gtcon
Estimates the reciprocal of the condition number of a
tridiagonal matrix.

Syntax
lapack_int LAPACKE_sgtcon( char norm, lapack_int n, const float* dl, const float* d,
const float* du, const float* du2, const lapack_int* ipiv, float anorm, float* rcond );
lapack_int LAPACKE_dgtcon( char norm, lapack_int n, const double* dl, const double* d,
const double* du, const double* du2, const lapack_int* ipiv, double anorm, double*
rcond );
lapack_int LAPACKE_cgtcon( char norm, lapack_int n, const lapack_complex_float* dl,
const lapack_complex_float* d, const lapack_complex_float* du, const
lapack_complex_float* du2, const lapack_int* ipiv, float anorm, float* rcond );
lapack_int LAPACKE_zgtcon( char norm, lapack_int n, const lapack_complex_double* dl,
const lapack_complex_double* d, const lapack_complex_double* du, const
lapack_complex_double* du2, const lapack_int* ipiv, double anorm, double* rcond );

Include Files
• mkl.h

Description

The routine estimates the reciprocal of the condition number of a real or complex tridiagonal matrix A in the
1-norm or infinity-norm:
κ1(A) = ||A||1 ||A^-1||1
κ∞(A) = ||A||∞ ||A^-1||∞
An estimate is obtained for ||A^-1||, and the reciprocal of the condition number is computed as rcond =
1 / (||A|| ||A^-1||).
Before calling this routine:

• compute anorm (either ||A||1 = max_j Σ_i |a_ij| or ||A||∞ = max_i Σ_j |a_ij|)
• call ?gttrf to compute the LU factorization of A.

Input Parameters

norm Must be '1' or 'O' or 'I'.

If norm = '1' or 'O', then the routine estimates the condition


number of matrix A in 1-norm.
If norm = 'I', then the routine estimates the condition number of
matrix A in infinity-norm.

n The order of the matrix A; n≥ 0.

dl,d,du,du2 Arrays: dl(n -1), d(n), du(n -1), du2(n -2).

The array dl contains the (n - 1) multipliers that define the matrix L


from the LU factorization of A as computed by ?gttrf.

The array d contains the n diagonal elements of the upper triangular


matrix U from the LU factorization of A.
The array du contains the (n - 1) elements of the first superdiagonal
of U.
The array du2 contains the (n - 2) elements of the second
superdiagonal of U.


ipiv Array, size (n). The array of pivot indices, as returned by ?gttrf.

anorm The norm of the original matrix A (see Description).

Output Parameters

rcond An estimate of the reciprocal of the condition number. The routine sets
rcond=0 if the estimate underflows; in this case the matrix is singular
(to working precision). However, anytime rcond is small compared to
1.0, for the working precision, the matrix may be poorly conditioned
or even singular.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
A*x = b; the number is usually 4 or 5 and never more than 11. Each solution requires approximately 2n^2
floating-point operations for real flavors and 8n^2 for complex flavors.
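
As a sketch only, with invented data, the following C code computes the 1-norm of a tridiagonal matrix by hand from its three diagonals, factors it with LAPACKE_dgttrf, and then calls LAPACKE_dgtcon.

#include <stdio.h>
#include <math.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 4;
    double dl[] = { -1.0, -1.0, -1.0 };
    double d[]  = {  2.0,  2.0,  2.0, 2.0 };
    double du[] = { -1.0, -1.0, -1.0 };
    double du2[2];
    lapack_int ipiv[4];
    double rcond = 0.0;

    /* 1-norm of A, accumulated column by column before the diagonals are
       overwritten by the factorization: column j holds du[j-1], d[j], dl[j]. */
    double anorm = 0.0;
    for (lapack_int j = 0; j < n; ++j) {
        double colsum = fabs(d[j]);
        if (j > 0)     colsum += fabs(du[j - 1]);
        if (j < n - 1) colsum += fabs(dl[j]);
        if (colsum > anorm) anorm = colsum;
    }

    lapack_int info = LAPACKE_dgttrf(n, dl, d, du, du2, ipiv);
    if (info == 0)
        info = LAPACKE_dgtcon('1', n, dl, d, du, du2, ipiv, anorm, &rcond);
    printf("info = %d, rcond = %g\n", (int)info, rcond);
    return 0;
}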

?pocon
Estimates the reciprocal of the condition number of a
symmetric (Hermitian) positive-definite matrix.

Syntax
lapack_int LAPACKE_spocon( int matrix_layout, char uplo, lapack_int n, const float* a,
lapack_int lda, float anorm, float* rcond );
lapack_int LAPACKE_dpocon( int matrix_layout, char uplo, lapack_int n, const double* a,
lapack_int lda, double anorm, double* rcond );
lapack_int LAPACKE_cpocon( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_float* a, lapack_int lda, float anorm, float* rcond );
lapack_int LAPACKE_zpocon( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_double* a, lapack_int lda, double anorm, double* rcond );

Include Files
• mkl.h

Description

The routine estimates the reciprocal of the condition number of a symmetric (Hermitian) positive-definite
matrix A:
κ1(A) = ||A||1 ||A^-1||1 (since A is symmetric or Hermitian, κ∞(A) = κ1(A)).
An estimate is obtained for ||A^-1||, and the reciprocal of the condition number is computed as rcond =
1 / (||A|| ||A^-1||).
Before calling this routine:

• compute anorm (either ||A||1 = max_j Σ_i |a_ij| or ||A||∞ = max_i Σ_j |a_ij|)
• call ?potrf to compute the Cholesky factorization of A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:


If uplo = 'U', A is factored as A = U^T*U for real flavors or A = U^H*U
for complex flavors, and U is stored.
If uplo = 'L', A is factored as A = L*L^T for real flavors or A = L*L^H
for complex flavors, and L is stored.

n The order of the matrix A; n≥ 0.

a The array a of size max(1, lda*n) contains the factored matrix A, as


returned by ?potrf.

lda The leading dimension of a; lda≥ max(1, n).

anorm The norm of the original matrix A (see Description).

Output Parameters

rcond An estimate of the reciprocal of the condition number. The routine sets
rcond =0 if the estimate underflows; in this case the matrix is singular
(to working precision). However, anytime rcond is small compared to
1.0, for the working precision, the matrix may be poorly conditioned
or even singular.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
A*x = b; the number is usually 4 or 5 and never more than 11. Each solution requires approximately 2n^2
floating-point operations for real flavors and 8n^2 for complex flavors.

See Also
Matrix Storage Schemes for LAPACK Routines
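
A hedged C sketch of the ?potrf/?pocon sequence follows; the matrix is illustrative, and the 1-norm is taken with the LAPACKE_dlansy auxiliary routine, which is assumed to be available in the LAPACKE interface.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Symmetric positive-definite 3x3 matrix, column major; uplo = 'U' is used. */
    lapack_int n = 3, lda = n;
    double a[] = { 4.0, 1.0, 1.0,
                   1.0, 3.0, 0.0,
                   1.0, 0.0, 2.0 };
    double rcond = 0.0;

    /* 1-norm of A before the Cholesky factorization overwrites it. */
    double anorm = LAPACKE_dlansy(LAPACK_COL_MAJOR, '1', 'U', n, a, lda);

    lapack_int info = LAPACKE_dpotrf(LAPACK_COL_MAJOR, 'U', n, a, lda);
    if (info == 0)
        info = LAPACKE_dpocon(LAPACK_COL_MAJOR, 'U', n, a, lda, anorm, &rcond);
    printf("info = %d, rcond = %g\n", (int)info, rcond);
    return 0;
}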

?ppcon
Estimates the reciprocal of the condition number of a
packed symmetric (Hermitian) positive-definite
matrix.


Syntax
lapack_int LAPACKE_sppcon( int matrix_layout, char uplo, lapack_int n, const float* ap,
float anorm, float* rcond );
lapack_int LAPACKE_dppcon( int matrix_layout, char uplo, lapack_int n, const double*
ap, double anorm, double* rcond );
lapack_int LAPACKE_cppcon( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_float* ap, float anorm, float* rcond );
lapack_int LAPACKE_zppcon( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_double* ap, double anorm, double* rcond );

Include Files
• mkl.h

Description

The routine estimates the reciprocal of the condition number of a packed symmetric (Hermitian) positive-
definite matrix A:
κ1(A) = ||A||1 ||A^-1||1 (since A is symmetric or Hermitian, κ∞(A) = κ1(A)).
An estimate is obtained for ||A^-1||, and the reciprocal of the condition number is computed as rcond =
1 / (||A|| ||A^-1||).
Before calling this routine:

• compute anorm (either ||A||1 = max_j Σ_i |a_ij| or ||A||∞ = max_i Σ_j |a_ij|)
• call ?pptrf to compute the Cholesky factorization of A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:


If uplo = 'U', A is factored as A = U^T*U for real flavors or A = U^H*U
for complex flavors, and U is stored.
If uplo = 'L', A is factored as A = L*L^T for real flavors or A = L*L^H
for complex flavors, and L is stored.

n The order of the matrix A; n≥ 0.

ap The array ap contains the packed factored matrix A, as returned by
?pptrf. The dimension of ap must be at least max(1, n(n+1)/2).

anorm The norm of the original matrix A (see Description).

Output Parameters

rcond An estimate of the reciprocal of the condition number. The routine sets
rcond =0 if the estimate underflows; in this case the matrix is singular
(to working precision). However, anytime rcond is small compared to
1.0, for the working precision, the matrix may be poorly conditioned
or even singular.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
A*x = b; the number is usually 4 or 5 and never more than 11. Each solution requires approximately 2n^2
floating-point operations for real flavors and 8n^2 for complex flavors.

See Also
Matrix Storage Schemes for LAPACK Routines

?pbcon
Estimates the reciprocal of the condition number of a
symmetric (Hermitian) positive-definite band matrix.

Syntax
lapack_int LAPACKE_spbcon( int matrix_layout, char uplo, lapack_int n, lapack_int kd,
const float* ab, lapack_int ldab, float anorm, float* rcond );
lapack_int LAPACKE_dpbcon( int matrix_layout, char uplo, lapack_int n, lapack_int kd,
const double* ab, lapack_int ldab, double anorm, double* rcond );
lapack_int LAPACKE_cpbcon( int matrix_layout, char uplo, lapack_int n, lapack_int kd,
const lapack_complex_float* ab, lapack_int ldab, float anorm, float* rcond );
lapack_int LAPACKE_zpbcon( int matrix_layout, char uplo, lapack_int n, lapack_int kd,
const lapack_complex_double* ab, lapack_int ldab, double anorm, double* rcond );

Include Files
• mkl.h

Description

The routine estimates the reciprocal of the condition number of a symmetric (Hermitian) positive-definite
band matrix A:
κ1(A) = ||A||1 ||A^-1||1 (since A is symmetric or Hermitian, κ∞(A) = κ1(A)).
An estimate is obtained for ||A^-1||, and the reciprocal of the condition number is computed as rcond =
1 / (||A|| ||A^-1||).
Before calling this routine:

• compute anorm (either ||A||1 = max_j Σ_i |a_ij| or ||A||∞ = max_i Σ_j |a_ij|)
• call ?pbtrf to compute the Cholesky factorization of A.


Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:


If uplo = 'U', A is factored as A = U^T*U for real flavors or A = U^H*U
for complex flavors, and U is stored.
If uplo = 'L', A is factored as A = L*L^T for real flavors or A = L*L^H
for complex flavors, and L is stored.

n The order of the matrix A; n≥ 0.

kd The number of superdiagonals or subdiagonals in the matrix A; kd≥ 0.

ldab The leading dimension of the array ab. (ldab≥kd +1).

ab The array ab of size max(1, ldab*n) contains the factored matrix A in


band form, as returned by ?pbtrf.

anorm The norm of the original matrix A (see Description).

Output Parameters

rcond An estimate of the reciprocal of the condition number. The routine sets
rcond =0 if the estimate underflows; in this case the matrix is singular
(to working precision). However, anytime rcond is small compared to
1.0, for the working precision, the matrix may be poorly conditioned
or even singular.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
A*x = b; the number is usually 4 or 5 and never more than 11. Each solution requires approximately
4*n(kd + 1) floating-point operations for real flavors and 16*n(kd + 1) for complex flavors.

See Also
Matrix Storage Schemes for LAPACK Routines
?ptcon
Estimates the reciprocal of the condition number of a
symmetric (Hermitian) positive-definite tridiagonal
matrix.

Syntax
lapack_int LAPACKE_sptcon( lapack_int n, const float* d, const float* e, float anorm,
float* rcond );

lapack_int LAPACKE_dptcon( lapack_int n, const double* d, const double* e, double
anorm, double* rcond );
lapack_int LAPACKE_cptcon( lapack_int n, const float* d, const lapack_complex_float* e,
float anorm, float* rcond );
lapack_int LAPACKE_zptcon( lapack_int n, const double* d, const lapack_complex_double*
e, double anorm, double* rcond );

Include Files
• mkl.h

Description

The routine computes the reciprocal of the condition number (in the 1-norm) of a real symmetric or complex
Hermitian positive-definite tridiagonal matrix using the factorization A = L*D*L^T for real flavors and A =
L*D*L^H for complex flavors, or A = U^T*D*U for real flavors and A = U^H*D*U for complex flavors, computed
by ?pttrf:

κ1(A) = ||A||1 ||A^-1||1 (since A is symmetric or Hermitian, κ∞(A) = κ1(A)).

The norm ||A^-1|| is computed by a direct method, and the reciprocal of the condition number is computed
as rcond = 1 / (||A|| ||A^-1||).

Before calling this routine:

• compute anorm as ||A||_1 = max_j Σ_i |a_ij|


• call ?pttrf to compute the factorization of A.

Input Parameters

n The order of the matrix A; n≥ 0.

d Arrays, dimension (n).


The array d contains the n diagonal elements of the diagonal matrix D
from the factorization of A, as computed by ?pttrf ;

e Array, size (n -1).

Contains off-diagonal elements of the unit bidiagonal factor U or L


from the factorization computed by ?pttrf .

anorm The 1- norm of the original matrix A (see Description).

Output Parameters

rcond An estimate of the reciprocal of the condition number. The routine sets
rcond =0 if the estimate underflows; in this case the matrix is singular
(to working precision). However, anytime rcond is small compared to
1.0, for the working precision, the matrix may be poorly conditioned
or even singular.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.


Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
A*x = b; the number is usually 4 or 5 and never more than 11. Each solution requires approximately
4*n(kd + 1) floating-point operations for real flavors and 16*n(kd + 1) for complex flavors.
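
A minimal sketch for the double-precision flavor follows; the tridiagonal data and its hand-computed 1-norm are assumptions chosen only for this illustration.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Symmetric positive-definite tridiagonal matrix: diagonal d, off-diagonal e. */
    lapack_int n = 4;
    double d[] = { 2.0, 2.0, 2.0, 2.0 };
    double e[] = { -1.0, -1.0, -1.0 };
    /* 1-norm of this particular matrix, computed by hand: max column sum = 1 + 2 + 1 = 4. */
    double anorm = 4.0, rcond;

    /* Factor A = L*D*L^T; d and e are overwritten by the factorization. */
    lapack_int info = LAPACKE_dpttrf(n, d, e);
    if (info != 0) { printf("dpttrf failed, info = %d\n", (int)info); return 1; }

    info = LAPACKE_dptcon(n, d, e, anorm, &rcond);
    if (info != 0) { printf("dptcon failed, info = %d\n", (int)info); return 1; }

    printf("estimated rcond = %g\n", rcond);
    return 0;
}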

?sycon
Estimates the reciprocal of the condition number of a
symmetric matrix.

Syntax
lapack_int LAPACKE_ssycon( int matrix_layout, char uplo, lapack_int n, const float* a,
lapack_int lda, const lapack_int* ipiv, float anorm, float* rcond );
lapack_int LAPACKE_dsycon( int matrix_layout, char uplo, lapack_int n, const double* a,
lapack_int lda, const lapack_int* ipiv, double anorm, double* rcond );
lapack_int LAPACKE_csycon( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_float* a, lapack_int lda, const lapack_int* ipiv, float anorm, float*
rcond );
lapack_int LAPACKE_zsycon( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_double* a, lapack_int lda, const lapack_int* ipiv, double anorm, double*
rcond );

Include Files
• mkl.h

Description

The routine estimates the reciprocal of the condition number of a symmetric matrix A:
κ_1(A) = ||A||_1 * ||A^(-1)||_1 (since A is symmetric, κ_∞(A) = κ_1(A)).
An estimate is obtained for ||A^(-1)||, and the reciprocal of the condition number is computed as rcond = 1 / (||A|| * ||A^(-1)||).
Before calling this routine:

• compute anorm (either ||A||_1 = max_j Σ_i |a_ij| or ||A||_∞ = max_i Σ_j |a_ij|)


• call ?sytrf to compute the factorization of A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:


If uplo = 'U', the array a stores the upper triangular factor U of the factorization A = U*D*U^T.

If uplo = 'L', the array a stores the lower triangular factor L of the factorization A = L*D*L^T.

n The order of matrix A; n≥ 0.

a The array a of size max(1,lda*n) contains the factored matrix A, as


returned by ?sytrf.

lda The leading dimension of a; lda≥ max(1, n).

ipiv Array, size at least max(1, n).

The array ipiv, as returned by ?sytrf.

anorm The norm of the original matrix A (see Description).

Output Parameters

rcond An estimate of the reciprocal of the condition number. The routine sets
rcond =0 if the estimate underflows; in this case the matrix is singular
(to working precision). However, anytime rcond is small compared to
1.0, for the working precision, the matrix may be poorly conditioned
or even singular.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
A*x = b; the number is usually 4 or 5 and never more than 11. Each solution requires approximately 2n^2 floating-point operations for real flavors and 8n^2 for complex flavors.

See Also
Matrix Storage Schemes for LAPACK Routines
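
A minimal sketch of the complete sequence (compute the norm of the original matrix, factor it with ?sytrf, then estimate rcond) follows; the matrix data and variable names are assumptions for this illustration only.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Small symmetric indefinite matrix, full storage, upper triangle referenced. */
    lapack_int n = 3, lda = 3;
    double a[] = { 4.0, 1.0, 2.0,
                   1.0, 0.5, 0.0,
                   2.0, 0.0, 3.0 };   /* row-major 3-by-3 */
    lapack_int ipiv[3];
    double rcond;

    /* Compute ||A||_1 of the original matrix before it is overwritten by the
       factorization (dlansy reads only the triangle selected by uplo). */
    double anorm = LAPACKE_dlansy(LAPACK_ROW_MAJOR, '1', 'U', n, a, lda);

    lapack_int info = LAPACKE_dsytrf(LAPACK_ROW_MAJOR, 'U', n, a, lda, ipiv);
    if (info != 0) { printf("dsytrf failed, info = %d\n", (int)info); return 1; }

    info = LAPACKE_dsycon(LAPACK_ROW_MAJOR, 'U', n, a, lda, ipiv, anorm, &rcond);
    if (info != 0) { printf("dsycon failed, info = %d\n", (int)info); return 1; }

    printf("||A||_1 = %g, estimated rcond = %g\n", anorm, rcond);
    return 0;
}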

?hecon
Estimates the reciprocal of the condition number of a
Hermitian matrix.

Syntax
lapack_int LAPACKE_checon( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_float* a, lapack_int lda, const lapack_int* ipiv, float anorm, float*
rcond );
lapack_int LAPACKE_zhecon( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_double* a, lapack_int lda, const lapack_int* ipiv, double anorm, double*
rcond );

Include Files
• mkl.h

Description

The routine estimates the reciprocal of the condition number of a Hermitian matrix A:


κ_1(A) = ||A||_1 * ||A^(-1)||_1 (since A is Hermitian, κ_∞(A) = κ_1(A)).


Before calling this routine:

• compute anorm (either ||A||_1 = max_j Σ_i |a_ij| or ||A||_∞ = max_i Σ_j |a_ij|)


• call ?hetrf to compute the factorization of A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:


If uplo = 'U', the array a stores the upper triangular factor U of the factorization A = U*D*U^H.

If uplo = 'L', the array a stores the lower triangular factor L of the factorization A = L*D*L^H.

n The order of matrix A; n≥ 0.

a The array a of size max(1, lda*n) contains the factored matrix A, as


returned by ?hetrf.

lda The leading dimension of a; lda≥ max(1, n).

ipiv Array, size at least max(1, n).

The array ipiv, as returned by ?hetrf.

anorm The norm of the original matrix A (see Description).

Output Parameters

rcond An estimate of the reciprocal of the condition number. The routine sets
rcond =0 if the estimate underflows; in this case the matrix is singular
(to working precision). However, anytime rcond is small compared to
1.0, for the working precision, the matrix may be poorly conditioned
or even singular.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
A*x = b; the number is usually 5 and never more than 11. Each solution requires approximately 8n^2 floating-point operations.

See Also
Matrix Storage Schemes for LAPACK Routines
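
A minimal sketch for the complex double-precision flavor follows. The matrix data and variable names are assumptions, and the brace initializers assume MKL's default definition of lapack_complex_double as a real/imaginary pair.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* 2-by-2 Hermitian matrix in full storage, upper triangle referenced.
       Initializers are (real, imaginary) pairs, assuming the default MKL complex type. */
    lapack_int n = 2, lda = 2;
    lapack_complex_double a[] = { {4.0, 0.0}, {1.0, -1.0},
                                  {0.0, 0.0}, {3.0,  0.0} };  /* row-major */
    lapack_int ipiv[2];
    double rcond;

    /* ||A||_1 of the original Hermitian matrix, taken before A is overwritten. */
    double anorm = LAPACKE_zlanhe(LAPACK_ROW_MAJOR, '1', 'U', n, a, lda);

    lapack_int info = LAPACKE_zhetrf(LAPACK_ROW_MAJOR, 'U', n, a, lda, ipiv);
    if (info != 0) { printf("zhetrf failed, info = %d\n", (int)info); return 1; }

    info = LAPACKE_zhecon(LAPACK_ROW_MAJOR, 'U', n, a, lda, ipiv, anorm, &rcond);
    if (info != 0) { printf("zhecon failed, info = %d\n", (int)info); return 1; }

    printf("estimated rcond = %g\n", rcond);
    return 0;
}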

?spcon
Estimates the reciprocal of the condition number of a
packed symmetric matrix.

Syntax
lapack_int LAPACKE_sspcon( int matrix_layout, char uplo, lapack_int n, const float* ap,
const lapack_int* ipiv, float anorm, float* rcond );
lapack_int LAPACKE_dspcon( int matrix_layout, char uplo, lapack_int n, const double*
ap, const lapack_int* ipiv, double anorm, double* rcond );
lapack_int LAPACKE_cspcon( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_float* ap, const lapack_int* ipiv, float anorm, float* rcond );
lapack_int LAPACKE_zspcon( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_double* ap, const lapack_int* ipiv, double anorm, double* rcond );

Include Files
• mkl.h

Description

The routine estimates the reciprocal of the condition number of a packed symmetric matrix A:
κ_1(A) = ||A||_1 * ||A^(-1)||_1 (since A is symmetric, κ_∞(A) = κ_1(A)).
An estimate is obtained for ||A^(-1)||, and the reciprocal of the condition number is computed as rcond = 1 / (||A|| * ||A^(-1)||).
Before calling this routine:

• compute anorm (either ||A||_1 = max_j Σ_i |a_ij| or ||A||_∞ = max_i Σ_j |a_ij|)


• call ?sptrf to compute the factorization of A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:


If uplo = 'U', the array ap stores the packed upper triangular factor U of the factorization A = U*D*U^T.

If uplo = 'L', the array ap stores the packed lower triangular factor L of the factorization A = L*D*L^T.

n The order of matrix A; n≥ 0.

ap The array ap contains the packed factored matrix A, as returned by ?sptrf. The dimension of ap must be at least max(1, n(n+1)/2).

ipiv Array, size at least max(1, n).

The array ipiv, as returned by ?sptrf.

anorm The norm of the original matrix A (see Description).


Output Parameters

rcond An estimate of the reciprocal of the condition number. The routine sets
rcond = 0 if the estimate underflows; in this case the matrix is
singular (to working precision). However, anytime rcond is small
compared to 1.0, for the working precision, the matrix may be poorly
conditioned or even singular.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
A*x = b; the number is usually 4 or 5 and never more than 11. Each solution requires approximately 2n^2 floating-point operations for real flavors and 8n^2 for complex flavors.

See Also
Matrix Storage Schemes for LAPACK Routines
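
A minimal sketch for the double-precision packed flavor follows; the matrix, its packing, and the hand-computed 1-norm are assumptions for this illustration only.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* 3-by-3 symmetric matrix in packed upper storage (column-major packing):
       A = [ 4 1 2
             1 5 0
             2 0 3 ]   ->   ap = { a00, a01, a11, a02, a12, a22 }. */
    lapack_int n = 3;
    double ap[] = { 4.0, 1.0, 5.0, 2.0, 0.0, 3.0 };
    lapack_int ipiv[3];
    /* 1-norm of this particular matrix, computed by hand: max column sum = 4 + 1 + 2 = 7. */
    double anorm = 7.0, rcond;

    lapack_int info = LAPACKE_dsptrf(LAPACK_COL_MAJOR, 'U', n, ap, ipiv);
    if (info != 0) { printf("dsptrf failed, info = %d\n", (int)info); return 1; }

    info = LAPACKE_dspcon(LAPACK_COL_MAJOR, 'U', n, ap, ipiv, anorm, &rcond);
    if (info != 0) { printf("dspcon failed, info = %d\n", (int)info); return 1; }

    printf("estimated rcond = %g\n", rcond);
    return 0;
}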

?hpcon
Estimates the reciprocal of the condition number of a
packed Hermitian matrix.

Syntax
lapack_int LAPACKE_chpcon( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_float* ap, const lapack_int* ipiv, float anorm, float* rcond );
lapack_int LAPACKE_zhpcon( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_double* ap, const lapack_int* ipiv, double anorm, double* rcond );

Include Files
• mkl.h

Description

The routine estimates the reciprocal of the condition number of a packed Hermitian matrix A:
κ_1(A) = ||A||_1 * ||A^(-1)||_1 (since A is Hermitian, κ_∞(A) = κ_1(A)).
An estimate is obtained for ||A^(-1)||, and the reciprocal of the condition number is computed as rcond = 1 / (||A|| * ||A^(-1)||).
Before calling this routine:

• compute anorm (either ||A||_1 = max_j Σ_i |a_ij| or ||A||_∞ = max_i Σ_j |a_ij|)


• call ?hptrf to compute the factorization of A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:


If uplo = 'U', the array ap stores the packed upper triangular factor U of the factorization A = U*D*U^H.

If uplo = 'L', the array ap stores the packed lower triangular factor L of the factorization A = L*D*L^H.

n The order of matrix A; n≥ 0.

ap The array ap contains the packed factored matrix A, as returned by ?hptrf. The dimension of ap must be at least max(1, n(n+1)/2).

ipiv Array, size at least max(1, n). The array ipiv, as returned by ?hptrf.

anorm The norm of the original matrix A (see Description).

Output Parameters

rcond An estimate of the reciprocal of the condition number. The routine sets
rcond =0 if the estimate underflows; in this case the matrix is singular
(to working precision). However, anytime rcond is small compared to
1.0, for the working precision, the matrix may be poorly conditioned
or even singular.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
A*x = b; the number is usually 5 and never more than 11. Each solution requires approximately 8n^2 floating-point operations.

See Also
Matrix Storage Schemes for LAPACK Routines

?trcon
Estimates the reciprocal of the condition number of a
triangular matrix.

Syntax
lapack_int LAPACKE_strcon( int matrix_layout, char norm, char uplo, char diag,
lapack_int n, const float* a, lapack_int lda, float* rcond );
lapack_int LAPACKE_dtrcon( int matrix_layout, char norm, char uplo, char diag,
lapack_int n, const double* a, lapack_int lda, double* rcond );
lapack_int LAPACKE_ctrcon( int matrix_layout, char norm, char uplo, char diag,
lapack_int n, const lapack_complex_float* a, lapack_int lda, float* rcond );


lapack_int LAPACKE_ztrcon( int matrix_layout, char norm, char uplo, char diag,
lapack_int n, const lapack_complex_double* a, lapack_int lda, double* rcond );

Include Files
• mkl.h

Description

The routine estimates the reciprocal of the condition number of a triangular matrix A in either the 1-norm or
infinity-norm:
κ_1(A) = ||A||_1 * ||A^(-1)||_1 = κ_∞(A^T) = κ_∞(A^H)
κ_∞(A) = ||A||_∞ * ||A^(-1)||_∞ = κ_1(A^T) = κ_1(A^H).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

norm Must be '1' or 'O' or 'I'.

If norm = '1' or 'O', then the routine estimates the condition


number of matrix A in 1-norm.
If norm = 'I', then the routine estimates the condition number of
matrix A in infinity-norm.

uplo Must be 'U' or 'L'.

Indicates whether A is upper or lower triangular:


If uplo = 'U', the array a stores the upper triangle of A, other array
elements are not referenced.
If uplo = 'L', the array a stores the lower triangle of A, other array
elements are not referenced.

diag Must be 'N' or 'U'.

If diag = 'N', then A is not a unit triangular matrix.

If diag = 'U', then A is unit triangular: diagonal elements are


assumed to be 1 and not referenced in the array a.

n The order of the matrix A; n≥ 0.

a The array a of size max(1, lda*n) contains the matrix A.

lda The leading dimension of a; lda≥ max(1, n).

Output Parameters

rcond An estimate of the reciprocal of the condition number. The routine sets
rcond =0 if the estimate underflows; in this case the matrix is singular
(to working precision). However, anytime rcond is small compared to
1.0, for the working precision, the matrix may be poorly conditioned
or even singular.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
A*x = b; the number is usually 4 or 5 and never more than 11. Each solution requires approximately n^2 floating-point operations for real flavors and 4n^2 operations for complex flavors.

See Also
Matrix Storage Schemes for LAPACK Routines
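
A minimal sketch for the double-precision flavor follows; because the matrix is triangular, no prior factorization is needed. The matrix data and variable names are assumptions for this illustration only.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Upper triangular 3-by-3 matrix in full (row-major) storage; the strictly
       lower part is not referenced by dtrcon. */
    lapack_int n = 3, lda = 3;
    double a[] = { 2.0, 1.0, 3.0,
                   0.0, 4.0, 1.0,
                   0.0, 0.0, 0.5 };
    double rcond;

    /* Estimate the reciprocal condition number in the 1-norm. */
    lapack_int info = LAPACKE_dtrcon(LAPACK_ROW_MAJOR, '1', 'U', 'N', n, a, lda, &rcond);
    if (info != 0) { printf("dtrcon failed, info = %d\n", (int)info); return 1; }

    printf("estimated rcond = %g\n", rcond);
    return 0;
}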

?tpcon
Estimates the reciprocal of the condition number of a
packed triangular matrix.

Syntax
lapack_int LAPACKE_stpcon( int matrix_layout, char norm, char uplo, char diag,
lapack_int n, const float* ap, float* rcond );
lapack_int LAPACKE_dtpcon( int matrix_layout, char norm, char uplo, char diag,
lapack_int n, const double* ap, double* rcond );
lapack_int LAPACKE_ctpcon( int matrix_layout, char norm, char uplo, char diag,
lapack_int n, const lapack_complex_float* ap, float* rcond );
lapack_int LAPACKE_ztpcon( int matrix_layout, char norm, char uplo, char diag,
lapack_int n, const lapack_complex_double* ap, double* rcond );

Include Files
• mkl.h

Description

The routine estimates the reciprocal of the condition number of a packed triangular matrix A in either the 1-
norm or infinity-norm:
κ_1(A) = ||A||_1 * ||A^(-1)||_1 = κ_∞(A^T) = κ_∞(A^H)
κ_∞(A) = ||A||_∞ * ||A^(-1)||_∞ = κ_1(A^T) = κ_1(A^H).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

norm Must be '1' or 'O' or 'I'.

If norm = '1' or 'O', then the routine estimates the condition


number of matrix A in 1-norm.
If norm = 'I', then the routine estimates the condition number of
matrix A in infinity-norm.


uplo Must be 'U' or 'L'. Indicates whether A is upper or lower triangular:

If uplo = 'U', the array ap stores the upper triangle of A in packed


form.
If uplo = 'L', the array ap stores the lower triangle of A in packed
form.

diag Must be 'N' or 'U'.

If diag = 'N', then A is not a unit triangular matrix.

If diag = 'U', then A is unit triangular: diagonal elements are


assumed to be 1 and not referenced in the array ap.

n The order of the matrix A; n≥ 0.

ap The array ap contains the packed matrix A. The dimension of ap must


be at least max(1,n(n+1)/2).

Output Parameters

rcond An estimate of the reciprocal of the condition number. The routine sets
rcond =0 if the estimate underflows; in this case the matrix is singular
(to working precision). However, anytime rcond is small compared to
1.0, for the working precision, the matrix may be poorly conditioned
or even singular.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
A*x = b; the number is usually 4 or 5 and never more than 11. Each solution requires approximately n^2 floating-point operations for real flavors and 4n^2 operations for complex flavors.

See Also
Matrix Storage Schemes for LAPACK Routines
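
A minimal sketch for the double-precision packed flavor follows; the packed data and variable names are assumptions for this illustration only, and the infinity-norm option is requested to show the alternative to norm = '1'.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Upper triangular 3-by-3 matrix in packed storage (column-major packing):
       ap = { a00, a01, a11, a02, a12, a22 }. */
    lapack_int n = 3;
    double ap[] = { 2.0, 1.0, 4.0, 3.0, 1.0, 0.5 };
    double rcond;

    /* Estimate the reciprocal condition number in the infinity-norm. */
    lapack_int info = LAPACKE_dtpcon(LAPACK_COL_MAJOR, 'I', 'U', 'N', n, ap, &rcond);
    if (info != 0) { printf("dtpcon failed, info = %d\n", (int)info); return 1; }

    printf("estimated rcond (infinity-norm) = %g\n", rcond);
    return 0;
}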

?tbcon
Estimates the reciprocal of the condition number of a
triangular band matrix.

Syntax
lapack_int LAPACKE_stbcon( int matrix_layout, char norm, char uplo, char diag,
lapack_int n, lapack_int kd, const float* ab, lapack_int ldab, float* rcond );
lapack_int LAPACKE_dtbcon( int matrix_layout, char norm, char uplo, char diag,
lapack_int n, lapack_int kd, const double* ab, lapack_int ldab, double* rcond );
lapack_int LAPACKE_ctbcon( int matrix_layout, char norm, char uplo, char diag,
lapack_int n, lapack_int kd, const lapack_complex_float* ab, lapack_int ldab, float*
rcond );

lapack_int LAPACKE_ztbcon( int matrix_layout, char norm, char uplo, char diag,
lapack_int n, lapack_int kd, const lapack_complex_double* ab, lapack_int ldab, double*
rcond );

Include Files
• mkl.h

Description

The routine estimates the reciprocal of the condition number of a triangular band matrix A in either the 1-
norm or infinity-norm:
κ_1(A) = ||A||_1 * ||A^(-1)||_1 = κ_∞(A^T) = κ_∞(A^H)
κ_∞(A) = ||A||_∞ * ||A^(-1)||_∞ = κ_1(A^T) = κ_1(A^H).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

norm Must be '1' or 'O' or 'I'.

If norm = '1' or 'O', then the routine estimates the condition


number of matrix A in 1-norm.
If norm = 'I', then the routine estimates the condition number of
matrix A in infinity-norm.

uplo Must be 'U' or 'L'. Indicates whether A is upper or lower triangular:

If uplo = 'U', the array ab stores the upper triangular band matrix A.
If uplo = 'L', the array ab stores the lower triangular band matrix A.

diag Must be 'N' or 'U'.

If diag = 'N', then A is not a unit triangular matrix.

If diag = 'U', then A is unit triangular: diagonal elements are


assumed to be 1 and not referenced in the array ab.

n The order of the matrix A; n≥ 0.

kd The number of superdiagonals or subdiagonals in the matrix A; kd≥ 0.

ab The array ab of size max(1, ldab*n) contains the band matrix A.

ldab The leading dimension of the array ab. (ldab≥kd +1).

Output Parameters

rcond An estimate of the reciprocal of the condition number. The routine sets
rcond =0 if the estimate underflows; in this case the matrix is singular
(to working precision). However, anytime rcond is small compared to
1.0, for the working precision, the matrix may be poorly conditioned
or even singular.


Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The computed rcond is never less than r (the reciprocal of the true condition number) and in practice is
nearly always less than 10r. A call to this routine involves solving a number of systems of linear equations
A*x = b; the number is usually 4 or 5 and never more than 11. Each solution requires approximately
2*n(kd + 1) floating-point operations for real flavors and 8*n(kd + 1) operations for complex flavors.

See Also
Matrix Storage Schemes for LAPACK Routines
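
A minimal sketch for the double-precision flavor follows; the band data and variable names are assumptions for this illustration only.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Upper triangular band matrix with bandwidth kd = 1 (column-major band
       storage, uplo = 'U'): diagonal 2, 4, 3, superdiagonal 1, -1. */
    lapack_int n = 3, kd = 1, ldab = kd + 1;
    double ab[] = { 0.0, 2.0,    /* column 0: (unused), a00 */
                    1.0, 4.0,    /* column 1: a01, a11      */
                   -1.0, 3.0 };  /* column 2: a12, a22      */
    double rcond;

    lapack_int info = LAPACKE_dtbcon(LAPACK_COL_MAJOR, '1', 'U', 'N', n, kd, ab, ldab, &rcond);
    if (info != 0) { printf("dtbcon failed, info = %d\n", (int)info); return 1; }

    printf("estimated rcond = %g\n", rcond);
    return 0;
}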

Refining the Solution and Estimating Its Error: LAPACK Computational Routines
This section describes the LAPACK routines for refining the computed solution of a system of linear equations
and estimating the solution error. You can call these routines after factorizing the matrix of the system of
equations and computing the solution (see Routines for Matrix Factorization and Routines for Solving
Systems of Linear Equations).

?gerfs
Refines the solution of a system of linear equations
with a general coefficient matrix and estimates its
error.

Syntax
lapack_int LAPACKE_sgerfs( int matrix_layout, char trans, lapack_int n, lapack_int
nrhs, const float* a, lapack_int lda, const float* af, lapack_int ldaf, const
lapack_int* ipiv, const float* b, lapack_int ldb, float* x, lapack_int ldx, float*
ferr, float* berr );
lapack_int LAPACKE_dgerfs( int matrix_layout, char trans, lapack_int n, lapack_int
nrhs, const double* a, lapack_int lda, const double* af, lapack_int ldaf, const
lapack_int* ipiv, const double* b, lapack_int ldb, double* x, lapack_int ldx, double*
ferr, double* berr );
lapack_int LAPACKE_cgerfs( int matrix_layout, char trans, lapack_int n, lapack_int
nrhs, const lapack_complex_float* a, lapack_int lda, const lapack_complex_float* af,
lapack_int ldaf, const lapack_int* ipiv, const lapack_complex_float* b, lapack_int ldb,
lapack_complex_float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_zgerfs( int matrix_layout, char trans, lapack_int n, lapack_int
nrhs, const lapack_complex_double* a, lapack_int lda, const lapack_complex_double* af,
lapack_int ldaf, const lapack_int* ipiv, const lapack_complex_double* b, lapack_int
ldb, lapack_complex_double* x, lapack_int ldx, double* ferr, double* berr );

Include Files
• mkl.h

Description

The routine performs an iterative refinement of the solution to a system of linear equations A*X = B, A^T*X = B, or A^H*X = B with a general matrix A, with multiple right-hand sides. For each computed solution vector x, the routine computes the component-wise backward error β. This error is the smallest relative perturbation in elements of A and b such that x is the exact solution of the perturbed system:
|δa_ij| ≤ β*|a_ij|, |δb_i| ≤ β*|b_i| such that (A + δA)x = (b + δb).
Finally, the routine estimates the component-wise forward error in the computed solution ||x - x_e||_∞ / ||x||_∞ (here x_e is the exact solution).
Before calling this routine:

• call the factorization routine ?getrf


• call the solver routine ?getrs.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

trans Must be 'N' or 'T' or 'C'.

Indicates the form of the equations:


If trans = 'N', the system has the form A*X = B.

If trans = 'T', the system has the form A^T*X = B.

If trans = 'C', the system has the form A^H*X = B.

n The order of the matrix A; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

a,af,b,x Arrays:
a (size max(1, lda*n)) contains the original matrix A, as supplied to ?getrf.
af (size max(1, ldaf*n)) contains the factored matrix A, as returned by ?getrf.

b, of size max(1, ldb*nrhs) for column major layout and max(1, ldb*n) for row major layout, contains the right-hand side matrix B.
x, of size max(1, ldx*nrhs) for column major layout and max(1, ldx*n) for row major layout, contains the solution matrix X.

lda The leading dimension of a; lda≥ max(1, n).

ldaf The leading dimension of af; ldaf≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ldx The leading dimension of x; ldx≥ max(1, n) for column major layout
and ldx≥nrhs for row major layout.

ipiv Array, size at least max(1, n).

The ipiv array, as returned by ?getrf.


Output Parameters

x The refined solution matrix X.

ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and backward errors, respectively, for each solution vector.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The bounds returned in ferr are not rigorous, but in practice they almost always overestimate the actual
error.
For each right-hand side, computation of the backward error involves a minimum of 4n^2 floating-point operations (for real flavors) or 16n^2 operations (for complex flavors). In addition, each step of iterative refinement involves 6n^2 operations (for real flavors) or 24n^2 operations (for complex flavors); the number of iterations may range from 1 to 5. Estimating the forward error involves solving a number of systems of linear equations A*x = b with the same coefficient matrix A and different right-hand sides b; the number is usually 4 or 5 and never more than 11. Each solution requires approximately 2n^2 floating-point operations for real flavors or 8n^2 for complex flavors.
See Also
Matrix Storage Schemes for LAPACK Routines
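
A minimal sketch of the complete factor / solve / refine sequence in double precision follows; the matrix, right-hand side, and variable names are illustrative assumptions only.

#include <stdio.h>
#include <string.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 3, nrhs = 1, lda = 3, ldaf = 3, ldb = 1, ldx = 1;
    /* Original matrix A and right-hand side b (row-major storage). */
    double a[]  = { 4.0, 1.0, 2.0,
                    1.0, 5.0, 0.0,
                    2.0, 0.0, 3.0 };
    double b[]  = { 7.0, 6.0, 5.0 };
    double af[9], x[3], ferr[1], berr[1];
    lapack_int ipiv[3];

    /* Keep A and b intact; factor and solve on copies. */
    memcpy(af, a, sizeof(a));
    memcpy(x,  b, sizeof(b));

    lapack_int info = LAPACKE_dgetrf(LAPACK_ROW_MAJOR, n, n, af, ldaf, ipiv);
    if (info != 0) { printf("dgetrf failed, info = %d\n", (int)info); return 1; }

    info = LAPACKE_dgetrs(LAPACK_ROW_MAJOR, 'N', n, nrhs, af, ldaf, ipiv, x, ldx);
    if (info != 0) { printf("dgetrs failed, info = %d\n", (int)info); return 1; }

    /* Refine the solution in place and obtain forward/backward error estimates. */
    info = LAPACKE_dgerfs(LAPACK_ROW_MAJOR, 'N', n, nrhs, a, lda, af, ldaf, ipiv,
                          b, ldb, x, ldx, ferr, berr);
    if (info != 0) { printf("dgerfs failed, info = %d\n", (int)info); return 1; }

    printf("x = [%g %g %g], ferr = %g, berr = %g\n", x[0], x[1], x[2], ferr[0], berr[0]);
    return 0;
}
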
?gerfsx
Uses extra precise iterative refinement to improve the
solution to the system of linear equations with a
general coefficient matrix A and provides error bounds
and backward error estimates.

Syntax
lapack_int LAPACKE_sgerfsx( int matrix_layout, char trans, char equed, lapack_int n,
lapack_int nrhs, const float* a, lapack_int lda, const float* af, lapack_int ldaf,
const lapack_int* ipiv, const float* r, const float* c, const float* b, lapack_int
ldb, float* x, lapack_int ldx, float* rcond, float* berr, lapack_int n_err_bnds,
float* err_bnds_norm, float* err_bnds_comp, lapack_int nparams, float* params );
lapack_int LAPACKE_dgerfsx( int matrix_layout, char trans, char equed, lapack_int n,
lapack_int nrhs, const double* a, lapack_int lda, const double* af, lapack_int ldaf,
const lapack_int* ipiv, const double* r, const double* c, const double* b, lapack_int
ldb, double* x, lapack_int ldx, double* rcond, double* berr, lapack_int n_err_bnds,
double* err_bnds_norm, double* err_bnds_comp, lapack_int nparams, double* params );
lapack_int LAPACKE_cgerfsx( int matrix_layout, char trans, char equed, lapack_int n,
lapack_int nrhs, const lapack_complex_float* a, lapack_int lda, const
lapack_complex_float* af, lapack_int ldaf, const lapack_int* ipiv, const float* r,
const float* c, const lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x,
lapack_int ldx, float* rcond, float* berr, lapack_int n_err_bnds, float*
err_bnds_norm, float* err_bnds_comp, lapack_int nparams, float* params );
lapack_int LAPACKE_zgerfsx( int matrix_layout, char trans, char equed, lapack_int n,
lapack_int nrhs, const lapack_complex_double* a, lapack_int lda, const
lapack_complex_double* af, lapack_int ldaf, const lapack_int* ipiv, const double* r,

const double* c, const lapack_complex_double* b, lapack_int ldb, lapack_complex_double*
x, lapack_int ldx, double* rcond, double* berr, lapack_int n_err_bnds, double*
err_bnds_norm, double* err_bnds_comp, lapack_int nparams, double* params );

Include Files
• mkl.h

Description

The routine improves the computed solution to a system of linear equations and provides error bounds and
backward error estimates for the solution. In addition to a normwise error bound, the code provides a
maximum componentwise error bound, if possible. See comments for err_bnds_norm and err_bnds_comp
for details of the error bounds.
The original system of linear equations may have been equilibrated before calling this routine, as described
by the parameters equed, r, and c below. In this case, the solution and error bounds returned are for the
original unequilibrated system.

Input Parameters

matrix_layout Specifies whether two-dimensional array storage is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

trans Must be 'N', 'T', or 'C'.

Specifies the form of the system of equations:


If trans = 'N', the system has the form A*X = B (No transpose).

If trans = 'T', the system has the form A^T*X = B (Transpose).

If trans = 'C', the system has the form A^H*X = B (Conjugate transpose for complex flavors, Transpose for real flavors).

equed Must be 'N', 'R', 'C', or 'B'.

Specifies the form of equilibration that was done to A before calling this
routine.
If equed = 'N', no equilibration was done.

If equed = 'R', row equilibration was done, that is, A has been
premultiplied by diag(r).
If equed = 'C', column equilibration was done, that is, A has been
postmultiplied by diag(c).
If equed = 'B', both row and column equilibration was done, that is, A has
been replaced by diag(r)*A*diag(c). The right-hand side B has been
changed accordingly.

n The number of linear equations; the order of the matrix A; n≥ 0.

nrhs The number of right-hand sides; the number of columns of the matrices B
and X; nrhs≥ 0.

a, af, b Arrays: a (size max(1, lda*n)), af (size max(1, ldaf*n)), b (size max(1,
ldb*nrhs) for column major layout and max(1, ldb*n) for row major
layout).
The array a contains the original n-by-n matrix A.


The array af contains the factored form of the matrix A, that is, the factors
L and U from the factorization A = P*L*U as computed by ?getrf.

The array b contains the matrix B whose columns are the right-hand sides
for the systems of equations.

lda The leading dimension of a; lda≥ max(1, n).

ldaf The leading dimension of af; ldaf≥ max(1, n).

ipiv Array, size at least max(1, n). Contains the pivot indices as computed by ?
getrf; for row 1 ≤i≤n, row i of the matrix was interchanged with row
ipiv(i).

r, c Arrays: r (size n), c (size n). The array r contains the row scale factors for
A, and the array c contains the column scale factors for A.

If equed = 'R' or 'B', A is multiplied on the left by diag(r); if equed = 'N' or 'C', r is not accessed.

If equed = 'R' or 'B', each element of r must be positive.

If equed = 'C' or 'B', A is multiplied on the right by diag(c); if equed =


'N' or 'R', c is not accessed.
If equed = 'C' or 'B', each element of c must be positive.

Each element of r or c should be a power of the radix to ensure a reliable


solution and error estimates. Scaling by powers of the radix does not cause
rounding errors unless the result underflows or overflows. Rounding errors
during scaling lead to refining with a matrix that is not equivalent to the
input matrix, producing error estimates that may not be reliable.

ldb The leading dimension of the array b; ldb≥ max(1, n) for column major
layout and ldb≥nrhs for row major layout.

x Array, of size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.
The solution matrix X as computed by ?getrs

ldx The leading dimension of the output array x; ldx≥ max(1, n) for column
major layout and ldx≥nrhs for row major layout.

n_err_bnds Number of error bounds to return for each right hand side and each type
(normwise or componentwise). See err_bnds_norm and err_bnds_comp
descriptions in Output Arguments section below.

nparams Specifies the number of parameters set in params. If ≤ 0, the params array
is never referenced and default values are used.

params Array, size nparams. Specifies algorithm parameters. If an entry is less than
0.0, that entry is filled with the default value used for that parameter. Only
positions up to nparams are accessed; defaults are used for higher-
numbered parameters. If defaults are acceptable, you can pass nparams =
0, which prevents the source code from accessing the params argument.

params[0] : Whether to perform iterative refinement or not. Default: 1.0

=0.0 No refinement is performed and no error bounds
are computed.

=1.0 Use the double-precision refinement algorithm,


possibly with doubled-single computations if the
compilation environment does not support
double precision.

(Other values are reserved for future use.)


params[1] : Maximum number of residual computations allowed for
refinement.

Default 10.0

Aggressive Set to 100.0 to permit convergence using


approximate factorizations or factorizations
other than LU. If the factorization uses a
technique other than Gaussian elimination, the
guarantees in err_bnds_norm and
err_bnds_comp may no longer be trustworthy.

params[2] : Flag determining if the code will attempt to find a solution


with a small componentwise relative error in the double-precision algorithm.
Positive is true, 0.0 is false. Default: 1.0 (attempt componentwise
convergence).

Output Parameters

x The improved solution matrix X.

rcond Reciprocal scaled condition number. An estimate of the reciprocal Skeel


condition number of the matrix A after equilibration (if done). If rcond is
less than the machine precision, in particular, if rcond = 0, the matrix is
singular to working precision. Note that the error may still be small even if
this number is very small and the matrix appears ill-conditioned.

berr Array, size at least max(1, nrhs). Contains the componentwise relative
backward error for each solution vector xj, that is, the smallest relative
change in any element of A or B that makes xj an exact solution.

err_bnds_norm Array of size nrhs*n_err_bnds. For each right-hand side, contains


information about various error bounds and condition numbers
corresponding to the normwise relative error, which is defined as follows:
Normwise relative error in the i-th solution vector: max_j |xtrue(j,i) - x(j,i)| / max_j |x(j,i)|

The array is indexed by the type of error information as described below.


There are currently up to three pieces of information returned.


err=1 "Trust/don't trust" boolean. Trust the answer if


the reciprocal condition number is less than the
threshold sqrt(n)*slamch(ε) for single
precision flavors and sqrt(n)*dlamch(ε) for
double precision flavors.

err=2 "Guaranteed" error bound. The estimated


forward error, almost certainly within a factor of
10 of the true error so long as the next entry is
greater than the threshold sqrt(n)*slamch(ε)
for single precision flavors and
sqrt(n)*dlamch(ε) for double precision
flavors. This error bound should only be trusted
if the previous boolean is true.

err=3 Reciprocal condition number. Estimated


normwise reciprocal condition number.
Compared with the threshold
sqrt(n)*slamch(ε) for single precision flavors
and sqrt(n)*dlamch(ε) for double precision
flavors to determine if the error estimate is
"guaranteed". These reciprocal condition
numbers for some appropriately scaled matrix Z
are 1 / (||Z^(-1)||_∞ * ||Z||_∞).

Let z = s*a, where s scales each row by a power of the radix so all absolute row sums of z are approximately 1.

The information for right-hand side i, where 1 ≤i≤nrhs, and type of error
err is stored in:

• Column major layout: err_bnds_norm[(err - 1)*nrhs + i - 1].


• Row major layout: err_bnds_norm[err - 1 + (i - 1)*n_err_bnds]

err_bnds_comp Array of size nrhs*n_err_bnds. For each right-hand side, contains


information about various error bounds and condition numbers
corresponding to the componentwise relative error, which is defined as
follows:
Componentwise relative error in the i-th solution vector: max_j ( |xtrue(j,i) - x(j,i)| / |x(j,i)| )

The array is indexed by the type of error information as described below.


There are currently up to three pieces of information returned for each
right-hand side. If componentwise accuracy is not requested (params[2] =
0.0), then err_bnds_comp is not accessed. If n_err_bnds < 3, then at
most the first n_err_bnds columns of the err_bnds_comp array are
returned.

err=1 "Trust/don't trust" boolean. Trust the answer if
the reciprocal condition number is less than the
threshold sqrt(n)*slamch(ε) for single
precision flavors and sqrt(n)*dlamch(ε) for
double precision flavors.

err=2 "Guaranteed" error bound. The estimated


forward error, almost certainly within a factor of
10 of the true error so long as the next entry is
greater than the threshold sqrt(n)*slamch(ε)
for single precision flavors and
sqrt(n)*dlamch(ε) for double precision
flavors. This error bound should only be trusted
if the previous boolean is true.

err=3 Reciprocal condition number. Estimated


componentwise reciprocal condition number.
Compared with the threshold
sqrt(n)*slamch(ε) for single precision flavors
and sqrt(n)*dlamch(ε) for double precision
flavors to determine if the error estimate is
"guaranteed". These reciprocal condition
numbers for some appropriately scaled matrix Z
are 1 / (||Z^(-1)||_∞ * ||Z||_∞).

Let z = s*(a*diag(x)), where x is the solution for the current right-hand side and s scales each row of a*diag(x) by a power of the radix so all absolute row sums of z are approximately 1.

The information for right-hand side i, where 1 ≤i≤nrhs, and type of error
err is stored in:

• Column major layout: err_bnds_comp[(err - 1)*nrhs + i - 1].


• Row major layout: err_bnds_comp[err - 1 + (i - 1)*n_err_bnds]

params Output parameter only if the input contains erroneous values, namely, in
params[0], params[1], params[2]. In such a case, the corresponding
elements of params are filled with default values on output.

Return Values
This function returns a value info.

If info = 0, the execution is successful. The solution to every right-hand side is guaranteed.

If info = -i, parameter i had an illegal value.

If 0 < info ≤ n: U(info, info) is exactly zero. The factorization has been completed, but the factor U is exactly singular, so the solution and error bounds could not be computed; rcond = 0 is returned.

If info = n+j: The solution corresponding to the j-th right-hand side is not guaranteed. The solutions
corresponding to other right-hand sides k with k > j may not be guaranteed as well, but only the first such
right-hand side is reported. If a small componentwise error is not requested params[2] = 0.0, then the j-th


right-hand side is the first with a normwise error bound that is not guaranteed (the smallest j such that for
column major layout err_bnds_norm[j - 1] = 0.0 or err_bnds_comp[j - 1] = 0.0; or for row major
layout err_bnds_norm[(j - 1)*n_err_bnds] = 0.0 or err_bnds_comp[(j - 1)*n_err_bnds] = 0.0).
See the definition of err_bnds_norm and err_bnds_comp for err = 1. To get information about all of the
right-hand sides, check err_bnds_norm or err_bnds_comp.

See Also
Matrix Storage Schemes for LAPACK Routines
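
A minimal sketch of a call to the double-precision flavor follows, assuming the system has not been equilibrated (equed = 'N', so the r and c arrays are placeholders that are not accessed) and that default algorithm parameters are acceptable (nparams = 0, dummy params array). The matrix, right-hand side, and variable names are assumptions for this illustration only.

#include <stdio.h>
#include <string.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 3, nrhs = 1, lda = 3, ldaf = 3, ldb = 1, ldx = 1;
    lapack_int n_err_bnds = 3, nparams = 0;
    double a[]  = { 4.0, 1.0, 2.0,
                    1.0, 5.0, 0.0,
                    2.0, 0.0, 3.0 };
    double b[]  = { 7.0, 6.0, 5.0 };
    double af[9], x[3];
    double r[3] = { 1.0, 1.0, 1.0 }, c[3] = { 1.0, 1.0, 1.0 };  /* not accessed: equed = 'N' */
    double rcond, berr[1], err_bnds_norm[3], err_bnds_comp[3], params[1] = { -1.0 };
    lapack_int ipiv[3];

    memcpy(af, a, sizeof(a));
    memcpy(x,  b, sizeof(b));

    lapack_int info = LAPACKE_dgetrf(LAPACK_ROW_MAJOR, n, n, af, ldaf, ipiv);
    if (info != 0) return 1;
    info = LAPACKE_dgetrs(LAPACK_ROW_MAJOR, 'N', n, nrhs, af, ldaf, ipiv, x, ldx);
    if (info != 0) return 1;

    /* Extra-precise refinement with default algorithm parameters (nparams = 0). */
    info = LAPACKE_dgerfsx(LAPACK_ROW_MAJOR, 'N', 'N', n, nrhs, a, lda, af, ldaf,
                           ipiv, r, c, b, ldb, x, ldx, &rcond, berr,
                           n_err_bnds, err_bnds_norm, err_bnds_comp, nparams, params);
    printf("info = %d, rcond = %g, berr = %g\n", (int)info, rcond, berr[0]);
    /* Row-major indexing of the bounds for right-hand side 1: err = 1, 2, 3. */
    printf("normwise: trust = %g, bound = %g, rcond = %g\n",
           err_bnds_norm[0], err_bnds_norm[1], err_bnds_norm[2]);
    return 0;
}
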

?gbrfs
Refines the solution of a system of linear equations
with a general band coefficient matrix and estimates
its error.

Syntax
lapack_int LAPACKE_sgbrfs( int matrix_layout, char trans, lapack_int n, lapack_int kl,
lapack_int ku, lapack_int nrhs, const float* ab, lapack_int ldab, const float* afb,
lapack_int ldafb, const lapack_int* ipiv, const float* b, lapack_int ldb, float* x,
lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_dgbrfs( int matrix_layout, char trans, lapack_int n, lapack_int kl,
lapack_int ku, lapack_int nrhs, const double* ab, lapack_int ldab, const double* afb,
lapack_int ldafb, const lapack_int* ipiv, const double* b, lapack_int ldb, double* x,
lapack_int ldx, double* ferr, double* berr );
lapack_int LAPACKE_cgbrfs( int matrix_layout, char trans, lapack_int n, lapack_int kl,
lapack_int ku, lapack_int nrhs, const lapack_complex_float* ab, lapack_int ldab, const
lapack_complex_float* afb, lapack_int ldafb, const lapack_int* ipiv, const
lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x, lapack_int ldx,
float* ferr, float* berr );
lapack_int LAPACKE_zgbrfs( int matrix_layout, char trans, lapack_int n, lapack_int kl,
lapack_int ku, lapack_int nrhs, const lapack_complex_double* ab, lapack_int ldab, const
lapack_complex_double* afb, lapack_int ldafb, const lapack_int* ipiv, const
lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x, lapack_int ldx,
double* ferr, double* berr );

Include Files
• mkl.h

Description

The routine performs an iterative refinement of the solution to a system of linear equations A*X = B, A^T*X = B, or A^H*X = B with a band matrix A, with multiple right-hand sides. For each computed solution vector x, the routine computes the component-wise backward error β. This error is the smallest relative perturbation in elements of A and b such that x is the exact solution of the perturbed system:
|δa_ij| ≤ β*|a_ij|, |δb_i| ≤ β*|b_i| such that (A + δA)x = (b + δb).
Finally, the routine estimates the component-wise forward error in the computed solution ||x - x_e||_∞ / ||x||_∞ (here x_e is the exact solution).
Before calling this routine:

• call the factorization routine ?gbtrf


• call the solver routine ?gbtrs.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

trans Must be 'N' or 'T' or 'C'.

Indicates the form of the equations:


If trans = 'N', the system has the form A*X = B.

If trans = 'T', the system has the form A^T*X = B.

If trans = 'C', the system has the form A^H*X = B.

n The order of the matrix A; n≥ 0.

kl The number of sub-diagonals within the band of A; kl≥ 0.

ku The number of super-diagonals within the band of A; ku≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

ab,afb,b,x Arrays:
ab(size max(1, ldab*n)) contains the original band matrix A, as
supplied to ?gbtrf, but stored in rows from 1 to kl + ku + 1 for
column major layout, and columns from 1 to kl + ku + 1 for row
major layout.
afb(size max(1, ldafb*n)) contains the factored band matrix A, as
returned by ?gbtrf.

b, of size max(1, ldb*nrhs) for column major layout and max(1, ldb*n) for row major layout, contains the right-hand side matrix B.
x, of size max(1, ldx*nrhs) for column major layout and max(1, ldx*n) for row major layout, contains the solution matrix X.

ldab The leading dimension of ab, ldab≥kl + ku + 1.

ldafb The leading dimension of afb, ldafb≥ 2*kl + ku + 1.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ldx The leading dimension of x; ldx≥ max(1, n).

ipiv Array, size at least max(1, n). The ipiv array, as returned by ?gbtrf.

Output Parameters

x The refined solution matrix X.

ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and backward errors, respectively, for each solution vector.

Return Values
This function returns a value info.


If info =0, the execution is successful.


If info = -i, parameter i had an illegal value.

Application Notes
The bounds returned in ferr are not rigorous, but in practice they almost always overestimate the actual
error.
For each right-hand side, computation of the backward error involves a minimum of 4n(kl + ku) floating-point operations (for real flavors) or 16n(kl + ku) operations (for complex flavors). In addition, each step of iterative refinement involves 2n(4kl + 3ku) operations (for real flavors) or 8n(4kl + 3ku) operations (for complex flavors); the number of iterations may range from 1 to 5. Estimating the forward error involves solving a number of systems of linear equations A*x = b; the number is usually 4 or 5 and never more than 11. Each solution requires approximately 2n^2 floating-point operations for real flavors or 8n^2 for complex flavors.

See Also
Matrix Storage Schemes for LAPACK Routines
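
A minimal sketch of the band factor / solve / refine sequence in double precision follows. The tridiagonal test matrix, the right-hand side, and the loop that copies the original band into the larger ?gbtrf workspace layout are assumptions made for this illustration only.

#include <stdio.h>
#include <string.h>
#include <mkl.h>

int main(void) {
    /* Tridiagonal 4-by-4 matrix treated as a general band matrix:
       kl = ku = 1, diagonal 4, subdiagonal 1, superdiagonal 2. */
    lapack_int n = 4, kl = 1, ku = 1, nrhs = 1;
    lapack_int ldab = kl + ku + 1, ldafb = 2 * kl + ku + 1, ldb = 4, ldx = 4;
    /* Original matrix in band storage (column-major): row ku + i - j of column j. */
    double ab[] = { 0.0, 4.0, 1.0,
                    2.0, 4.0, 1.0,
                    2.0, 4.0, 1.0,
                    2.0, 4.0, 0.0 };
    double b[]  = { 6.0, 7.0, 7.0, 5.0 };
    double afb[16] = { 0.0 }, x[4], ferr[1], berr[1];
    lapack_int ipiv[4];

    /* ?gbtrf needs kl extra rows for fill-in: copy the band of A into rows
       kl .. 2*kl+ku of afb, leaving the first kl rows as workspace. */
    for (lapack_int j = 0; j < n; ++j)
        for (lapack_int r = 0; r < kl + ku + 1; ++r)
            afb[(kl + r) + j * ldafb] = ab[r + j * ldab];
    memcpy(x, b, sizeof(b));

    lapack_int info = LAPACKE_dgbtrf(LAPACK_COL_MAJOR, n, n, kl, ku, afb, ldafb, ipiv);
    if (info != 0) return 1;
    info = LAPACKE_dgbtrs(LAPACK_COL_MAJOR, 'N', n, kl, ku, nrhs, afb, ldafb, ipiv, x, ldx);
    if (info != 0) return 1;

    /* Refine the computed solution and obtain error estimates. */
    info = LAPACKE_dgbrfs(LAPACK_COL_MAJOR, 'N', n, kl, ku, nrhs, ab, ldab, afb, ldafb,
                          ipiv, b, ldb, x, ldx, ferr, berr);
    printf("info = %d, ferr = %g, berr = %g\n", (int)info, ferr[0], berr[0]);
    return 0;
}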

?gbrfsx
Uses extra precise iterative refinement to improve the
solution to the system of linear equations with a
banded coefficient matrix A and provides error bounds
and backward error estimates.

Syntax
lapack_int LAPACKE_sgbrfsx( int matrix_layout, char trans, char equed, lapack_int n,
lapack_int kl, lapack_int ku, lapack_int nrhs, const float* ab, lapack_int ldab, const
float* afb, lapack_int ldafb, const lapack_int* ipiv, const float* r, const float* c,
const float* b, lapack_int ldb, float* x, lapack_int ldx, float* rcond, float* berr,
lapack_int n_err_bnds, float* err_bnds_norm, float* err_bnds_comp, lapack_int nparams,
float* params );
lapack_int LAPACKE_dgbrfsx( int matrix_layout, char trans, char equed, lapack_int n,
lapack_int kl, lapack_int ku, lapack_int nrhs, const double* ab, lapack_int ldab,
const double* afb, lapack_int ldafb, const lapack_int* ipiv, const double* r, const
double* c, const double* b, lapack_int ldb, double* x, lapack_int ldx, double* rcond,
double* berr, lapack_int n_err_bnds, double* err_bnds_norm, double* err_bnds_comp,
lapack_int nparams, double* params );
lapack_int LAPACKE_cgbrfsx( int matrix_layout, char trans, char equed, lapack_int n,
lapack_int kl, lapack_int ku, lapack_int nrhs, const lapack_complex_float* ab,
lapack_int ldab, const lapack_complex_float* afb, lapack_int ldafb, const lapack_int*
ipiv, const float* r, const float* c, const lapack_complex_float* b, lapack_int ldb,
lapack_complex_float* x, lapack_int ldx, float* rcond, float* berr, lapack_int
n_err_bnds, float* err_bnds_norm, float* err_bnds_comp, lapack_int nparams, float*
params );
lapack_int LAPACKE_zgbrfsx( int matrix_layout, char trans, char equed, lapack_int n,
lapack_int kl, lapack_int ku, lapack_int nrhs, const lapack_complex_double* ab,
lapack_int ldab, const lapack_complex_double* afb, lapack_int ldafb, const lapack_int*
ipiv, const double* r, const double* c, const lapack_complex_double* b, lapack_int
ldb, lapack_complex_double* x, lapack_int ldx, double* rcond, double* berr, lapack_int
n_err_bnds, double* err_bnds_norm, double* err_bnds_comp, lapack_int nparams, double*
params );

Include Files
• mkl.h

Description
The routine improves the computed solution to a system of linear equations and provides error bounds and
backward error estimates for the solution. In addition to a normwise error bound, the code provides a
maximum componentwise error bound, if possible. See comments for err_bnds_norm and err_bnds_comp
for details of the error bounds.
The original system of linear equations may have been equilibrated before calling this routine, as described
by the parameters equed, r, and c below. In this case, the solution and error bounds returned are for the
original unequilibrated system.

Input Parameters

matrix_layout Specifies whether two-dimensional array storage is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

trans Must be 'N', 'T', or 'C'.

Specifies the form of the system of equations:


If trans = 'N', the system has the form A*X = B (No transpose).

If trans = 'T', the system has the form A^T*X = B (Transpose).

If trans = 'C', the system has the form A^H*X = B (Conjugate transpose for complex flavors, Transpose for real flavors).

equed Must be 'N', 'R', 'C', or 'B'.

Specifies the form of equilibration that was done to A before calling this
routine.
If equed = 'N', no equilibration was done.

If equed = 'R', row equilibration was done, that is, A has been
premultiplied by diag(r).
If equed = 'C', column equilibration was done, that is, A has been
postmultiplied by diag(c).
If equed = 'B', both row and column equilibration was done, that is, A has
been replaced by diag(r)*A*diag(c). The right-hand side B has been
changed accordingly.

n The number of linear equations; the order of the matrix A; n≥ 0.

kl The number of subdiagonals within the band of A; kl≥ 0.

ku The number of superdiagonals within the band of A; ku≥ 0.

nrhs The number of right-hand sides; the number of columns of the matrices B
and X; nrhs≥ 0.

ab, afb, b The array ab of size max(1, ldab*n) contains the original matrix A in band storage, in rows from 1 to kl + ku + 1 for column major layout, and in columns from 1 to kl + ku + 1 for row major layout.


The array afb of size max(1, ldafb*n) contains details of the LU factorization of the banded matrix A as computed by ?gbtrf.

The array b of size max(1, ldb*nrhs) for column major layout and max(1, ldb*n) for row major layout contains the matrix B whose columns are the right-hand sides for the systems of equations.

ldab The leading dimension of the array ab; ldab≥kl+ku+1.

ldafb The leading dimension of the array afb; ldafb≥ 2*kl+ku+1.

ipiv Array, size at least max(1, n). Contains the pivot indices as computed by ?
gbtrf; for row 1 ≤i≤n, row i of the matrix was interchanged with row
ipiv[i-1].

r, c Arrays: r(n), c(n). The array r contains the row scale factors for A, and
the array c contains the column scale factors for A.

If equed = 'R' or 'B', A is multiplied on the left by diag(r); if equed =


'N' or 'C', r is not accessed.
If equed = 'R' or 'B', each element of r must be positive.

If equed = 'C' or 'B', A is multiplied on the right by diag(c); if equed =


'N' or 'R', c is not accessed.
If equed = 'C' or 'B', each element of c must be positive.

Each element of r or c should be a power of the radix to ensure a reliable


solution and error estimates. Scaling by powers of the radix does not cause
rounding errors unless the result underflows or overflows. Rounding errors
during scaling lead to refining with a matrix that is not equivalent to the
input matrix, producing error estimates that may not be reliable.

ldb The leading dimension of the array b; ldb≥ max(1, n) for column major
layout and ldb≥nrhs for row major layout.

x Array, size max(1, ldx*nrhs) for column major layout and max(1, ldx*n)
for row major layout.
The solution matrix X as computed by sgbtrs/dgbtrs for real flavors or
cgbtrs/zgbtrs for complex flavors.

ldx The leading dimension of the output array x; ldx≥ max(1, n) for column
major layout and ldx≥nrhs for row major layout.

n_err_bnds Number of error bounds to return for each right-hand side and each type
(normwise or componentwise). See err_bnds_norm and err_bnds_comp
descriptions in Output Arguments section below.

nparams Specifies the number of parameters set in params. If ≤ 0, the params array
is never referenced and default values are used.

params Array, size nparams. Specifies algorithm parameters. If an entry is less than
0.0, that entry will be filled with the default value used for that parameter.
Only positions up to nparams are accessed; defaults are used for higher-
numbered parameters. If defaults are acceptable, you can pass nparams =
0, which prevents the source code from accessing the params argument.

params[0] : Whether to perform iterative refinement or not. Default: 1.0
(for single precision flavors), 1.0D+0 (for double precision flavors).

=0.0 No refinement is performed and no error bounds


are computed.

=1.0 Use the double-precision refinement algorithm,


possibly with doubled-single computations if the
compilation environment does not support
double precision.

(Other values are reserved for future use.)


params[1] : Maximum number of residual computations allowed for
refinement.

Default 10.0

Aggressive Set to 100.0 to permit convergence using


approximate factorizations or factorizations
other than LU. If the factorization uses a
technique other than Gaussian elimination, the
guarantees in err_bnds_norm and
err_bnds_comp may no longer be trustworthy.

params[2] : Flag determining if the code will attempt to find a solution


with a small componentwise relative error in the double-precision algorithm.
Positive is true, 0.0 is false. Default: 1.0 (attempt componentwise
convergence).

Output Parameters

x The improved solution matrix X.

rcond Reciprocal scaled condition number. An estimate of the reciprocal Skeel


condition number of the matrix A after equilibration (if done). If rcond is
less than the machine precision, in particular, if rcond = 0, the matrix is
singular to working precision. Note that the error may still be small even if
this number is very small and the matrix appears ill-conditioned.

berr Array, size at least max(1, nrhs). Contains the componentwise relative
backward error for each solution vector xj, that is, the smallest relative
change in any element of A or B that makes xj an exact solution.

err_bnds_norm Array of size nrhs*n_err_bnds. For each right-hand side, contains


information about various error bounds and condition numbers
corresponding to the normwise relative error, which is defined as follows:
Normwise relative error in the i-th solution vector: max_j |xtrue(j,i) - x(j,i)| / max_j |x(j,i)|

The array is indexed by the type of error information as described below.


There are currently up to three pieces of information returned.


err=1 "Trust/don't trust" boolean. Trust the answer if


the reciprocal condition number is less than the
threshold sqrt(n)*slamch(ε) for single
precision flavors and sqrt(n)*dlamch(ε) for
double precision flavors.

err=2 "Guaranteed" error bound. The estimated


forward error, almost certainly within a factor of
10 of the true error so long as the next entry is
greater than the threshold sqrt(n)*slamch(ε)
for single precision flavors and
sqrt(n)*dlamch(ε) for double precision
flavors. This error bound should only be trusted
if the previous boolean is true.

err=3 Reciprocal condition number. Estimated


normwise reciprocal condition number.
Compared with the threshold
sqrt(n)*slamch(ε) for single precision flavors
and sqrt(n)*dlamch(ε) for double precision
flavors to determine if the error estimate is
"guaranteed". These reciprocal condition
numbers for some appropriately scaled matrix Z
are 1 / (||Z^(-1)||_∞ * ||Z||_∞).

Let z = s*a, where s scales each row by a power of the radix so all absolute row sums of z are approximately 1.

The information for right-hand side i, where 1 ≤i≤nrhs, and type of error
err is stored in:

• Column major layout: err_bnds_norm[(err - 1)*nrhs + i - 1].


• Row major layout: err_bnds_norm[err - 1 + (i - 1)*n_err_bnds]

err_bnds_comp Array of size nrhs*n_err_bnds. For each right-hand side, contains


information about various error bounds and condition numbers
corresponding to the componentwise relative error, which is defined as
follows:
Componentwise relative error in the i-th solution vector: max_j ( |xtrue(j,i) - x(j,i)| / |x(j,i)| )

The array is indexed by the type of error information as described below.


There are currently up to three pieces of information returned for each
right-hand side. If componentwise accuracy is not requested (params[2] =
0.0), then err_bnds_comp is not accessed.

err=1 "Trust/don't trust" boolean. Trust the answer if
the reciprocal condition number is less than the
threshold sqrt(n)*slamch(ε) for single
precision flavors and sqrt(n)*dlamch(ε) for
double precision flavors.

err=2 "Guaranteed" error bound. The estimated


forward error, almost certainly within a factor of
10 of the true error so long as the next entry is
greater than the threshold sqrt(n)*slamch(ε)
for single precision flavors and
sqrt(n)*dlamch(ε) for double precision
flavors. This error bound should only be trusted
if the previous boolean is true.

err=3 Reciprocal condition number. Estimated


componentwise reciprocal condition number.
Compared with the threshold
sqrt(n)*slamch(ε) for single precision flavors
and sqrt(n)*dlamch(ε) for double precision
flavors to determine if the error estimate is
"guaranteed". These reciprocal condition
numbers for some appropriately scaled matrix Z
are 1 / (||Z^(-1)||_∞ * ||Z||_∞).

Let z = s*(a*diag(x)), where x is the solution for the current right-hand side and s scales each row of a*diag(x) by a power of the radix so all absolute row sums of z are approximately 1.

The information for right-hand side i, where 1 ≤i≤nrhs, and type of error
err is stored in:

• Column major layout: err_bnds_comp[(err - 1)*nrhs + i - 1].


• Row major layout: err_bnds_comp[err - 1 + (i - 1)*n_err_bnds]

params Output parameter only if the input contains erroneous values, namely, in
params[0], params[1], and params[2]. In such a case, the corresponding
elements of params are filled with default values on output.

Return Values
This function returns a value info.

If info = 0, the execution is successful. The solution to every right-hand side is guaranteed.

If info = -i, parameter i had an illegal value.

If 0 < info ≤ n: U(info, info) is exactly zero. The factorization has been completed, but the factor U is exactly
singular, so the solution and error bounds could not be computed; rcond = 0 is returned.

If info = n+j: The solution corresponding to the j-th right-hand side is not guaranteed. The solutions
corresponding to other right-hand sides k with k > j may not be guaranteed as well, but only the first such
right-hand side is reported. If a small componentwise error is not requested (params[2] = 0.0), then the j-th
right-hand side is the first with a normwise error bound that is not guaranteed (the smallest j such that for
column major layout err_bnds_norm[j - 1] = 0.0 or err_bnds_comp[j - 1] = 0.0; or for row major
layout err_bnds_norm[(j - 1)*n_err_bnds] = 0.0 or err_bnds_comp[(j - 1)*n_err_bnds] = 0.0).
See the definition of err_bnds_norm and err_bnds_comp for err = 1. To get information about all of the
right-hand sides, check err_bnds_norm or err_bnds_comp.
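The indexing rules above can be wrapped in a small helper. The sketch below is illustrative only (the helper name and the column major assumption are this example's own, and it assumes n_err_bnds ≥ 3); it reads the trust flag, the guaranteed error bound, and the reciprocal condition number for one right-hand side:

#include <stdio.h>
#include <mkl.h>

/* Illustrative helper (not part of Intel MKL): read the three err_bnds entries
   for right-hand side i (1-based), assuming column major layout, where the
   entry for error type err is err_bnds[(err - 1)*nrhs + i - 1]. */
static void print_error_bounds(const double* err_bnds, lapack_int nrhs, lapack_int i)
{
    double trust = err_bnds[(1 - 1)*nrhs + i - 1];  /* err = 1: trust boolean        */
    double bound = err_bnds[(2 - 1)*nrhs + i - 1];  /* err = 2: guaranteed bound     */
    double rcond = err_bnds[(3 - 1)*nrhs + i - 1];  /* err = 3: reciprocal condition */
    printf("rhs %d: trust = %.1f, bound = %.3e, rcond = %.3e\n",
           (int)i, trust, bound, rcond);
}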

See Also
Matrix Storage Schemes for LAPACK Routines

?gtrfs
Refines the solution of a system of linear equations
with a tridiagonal coefficient matrix and estimates its
error.

Syntax
lapack_int LAPACKE_sgtrfs( int matrix_layout, char trans, lapack_int n, lapack_int
nrhs, const float* dl, const float* d, const float* du, const float* dlf, const float*
df, const float* duf, const float* du2, const lapack_int* ipiv, const float* b,
lapack_int ldb, float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_dgtrfs( int matrix_layout, char trans, lapack_int n, lapack_int
nrhs, const double* dl, const double* d, const double* du, const double* dlf, const
double* df, const double* duf, const double* du2, const lapack_int* ipiv, const double*
b, lapack_int ldb, double* x, lapack_int ldx, double* ferr, double* berr );
lapack_int LAPACKE_cgtrfs( int matrix_layout, char trans, lapack_int n, lapack_int
nrhs, const lapack_complex_float* dl, const lapack_complex_float* d, const
lapack_complex_float* du, const lapack_complex_float* dlf, const lapack_complex_float*
df, const lapack_complex_float* duf, const lapack_complex_float* du2, const lapack_int*
ipiv, const lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x,
lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_zgtrfs( int matrix_layout, char trans, lapack_int n, lapack_int
nrhs, const lapack_complex_double* dl, const lapack_complex_double* d, const
lapack_complex_double* du, const lapack_complex_double* dlf, const
lapack_complex_double* df, const lapack_complex_double* duf, const
lapack_complex_double* du2, const lapack_int* ipiv, const lapack_complex_double* b,
lapack_int ldb, lapack_complex_double* x, lapack_int ldx, double* ferr, double* berr );

Include Files
• mkl.h

Description
The routine performs an iterative refinement of the solution to a system of linear equations A*X = B, A^T*X = B,
or A^H*X = B with a tridiagonal matrix A, with multiple right-hand sides. For each computed solution
vector x, the routine computes the component-wise backward error β. This error is the smallest relative
perturbation in elements of A and b such that x is the exact solution of the perturbed system:
|δaij| ≤β|aij|, |δbi| ≤β|bi| such that (A + δA)x = (b + δb).
Finally, the routine estimates the component-wise forward error in the computed solution ||x - xe||∞/||
x||∞ (here xe is the exact solution).
Before calling this routine:

• call the factorization routine ?gttrf

• call the solver routine ?gttrs.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

trans Must be 'N' or 'T' or 'C'.

Indicates the form of the equations:


If trans = 'N', the system has the form A*X = B.

If trans = 'T', the system has the form A^T*X = B.

If trans = 'C', the system has the form A^H*X = B.

n The order of the matrix A; n≥ 0.

nrhs The number of right-hand sides, that is, the number of columns of the
matrix B; nrhs≥ 0.

dl Array dl of size n -1 contains the subdiagonal elements of A.

d Array d of size n contains the diagonal elements of A.

du Array du of size n -1 contains the superdiagonal elements of A.

dlf Array dlf of size n -1 contains the (n - 1) multipliers that define the
matrix L from the LU factorization of A as computed by ?gttrf.

df Array df of size n contains the n diagonal elements of the upper


triangular matrix U from the LU factorization of A.

duf Array duf of size n -1 contains the (n - 1) elements of the first


superdiagonal of U.

du2 Array du2 of size n -2 contains the (n - 2) elements of the second


superdiagonal of U.

b Array b (size max(1,ldb*nrhs) for column major layout and max(1,


ldb*n) for row major layout) contains the right-hand side matrix B.

x Array x (size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout) contains the solution matrix X, as computed
by ?gttrs.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ldx The leading dimension of x; ldx≥ max(1, n) for column major layout
and ldx≥nrhs for row major layout.

ipiv Array, size at least max(1, n). The ipiv array, as returned by ?gttrf.

Output Parameters

x The refined solution matrix X.


ferr, berr Arrays, size at least max(1,nrhs). Contain the component-wise


forward and backward errors, respectively, for each solution vector.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.
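A minimal usage sketch follows; it is not part of the reference, the data are arbitrary, and error handling is reduced to a single info check. It factors a tridiagonal matrix with ?gttrf, solves with ?gttrs, and then refines the solution with LAPACKE_dgtrfs:

#include <stdio.h>
#include <string.h>
#include <mkl.h>

int main(void)
{
    const lapack_int n = 4, nrhs = 1;
    /* Tridiagonal matrix A and right-hand side b (column major). */
    double dl[3] = {-1.0, -1.0, -1.0};          /* subdiagonal   */
    double d[4]  = { 4.0,  4.0,  4.0,  4.0};    /* diagonal      */
    double du[3] = {-1.0, -1.0, -1.0};          /* superdiagonal */
    double b[4]  = { 3.0,  2.0,  2.0,  3.0};
    /* ?gttrf overwrites its inputs, so keep copies for the refinement step. */
    double dlf[3], df[4], duf[3], du2[2], x[4], ferr[1], berr[1];
    lapack_int ipiv[4], info;

    memcpy(dlf, dl, sizeof dl);  memcpy(df, d, sizeof d);  memcpy(duf, du, sizeof du);
    memcpy(x, b, sizeof b);

    info = LAPACKE_dgttrf(n, dlf, df, duf, du2, ipiv);             /* LU factorization */
    if (info == 0)
        info = LAPACKE_dgttrs(LAPACK_COL_MAJOR, 'N', n, nrhs,
                              dlf, df, duf, du2, ipiv, x, n);      /* solve A*x = b    */
    if (info == 0)
        info = LAPACKE_dgtrfs(LAPACK_COL_MAJOR, 'N', n, nrhs,
                              dl, d, du, dlf, df, duf, du2, ipiv,
                              b, n, x, n, ferr, berr);             /* refine x         */
    printf("info = %d, ferr = %.3e, berr = %.3e\n", (int)info, ferr[0], berr[0]);
    return (int)info;
}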

See Also
Matrix Storage Schemes for LAPACK Routines

?porfs
Refines the solution of a system of linear equations
with a symmetric (Hermitian) positive-definite
coefficient matrix and estimates its error.

Syntax
lapack_int LAPACKE_sporfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const float* a, lapack_int lda, const float* af, lapack_int ldaf, const float* b,
lapack_int ldb, float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_dporfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const double* a, lapack_int lda, const double* af, lapack_int ldaf, const double* b,
lapack_int ldb, double* x, lapack_int ldx, double* ferr, double* berr );
lapack_int LAPACKE_cporfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_float* a, lapack_int lda, const lapack_complex_float* af,
lapack_int ldaf, const lapack_complex_float* b, lapack_int ldb, lapack_complex_float*
x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_zporfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_double* a, lapack_int lda, const lapack_complex_double* af,
lapack_int ldaf, const lapack_complex_double* b, lapack_int ldb, lapack_complex_double*
x, lapack_int ldx, double* ferr, double* berr );

Include Files
• mkl.h

Description

The routine performs an iterative refinement of the solution to a system of linear equations A*X = B with a
symmetric (Hermitian) positive definite matrix A, with multiple right-hand sides. For each computed solution
vector x, the routine computes the component-wise backward error β. This error is the smallest relative
perturbation in elements of A and b such that x is the exact solution of the perturbed system:
|δaij| ≤β|aij|, |δbi| ≤β|bi| such that (A + δA)x = (b + δb).
Finally, the routine estimates the component-wise forward error in the computed solution ||x - xe||∞/||
x||∞ (here xe is the exact solution).
Before calling this routine:

• call the factorization routine ?potrf


• call the solver routine ?potrs.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The order of the matrix A; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

a Array a (size max(1, lda*n)) contains the original matrix A, as


supplied to ?potrf.

af Array af (size max(1, ldaf*n)) contains the factored matrix A, as


returned by ?potrf.

b Array b of size max(1, ldb*nrhs) for column major layout and max(1,
ldb*n) for row major layout contains the right-hand side matrix B.
The second dimension of b must be at least max(1, nrhs).

x Array x of size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout contains the solution matrix X.

lda The leading dimension of a; lda≥ max(1, n).

ldaf The leading dimension of af; ldaf≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ldx The leading dimension of x; ldx≥ max(1, n) for column major layout
and ldx≥nrhs for row major layout.

Output Parameters

x The refined solution matrix X.

ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and backward errors, respectively, for each solution vector.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The bounds returned in ferr are not rigorous, but in practice they almost always overestimate the actual
error.


For each right-hand side, computation of the backward error involves a minimum of 4n² floating-point
operations (for real flavors) or 16n² operations (for complex flavors). In addition, each step of iterative
refinement involves 6n² operations (for real flavors) or 24n² operations (for complex flavors); the number of
iterations may range from 1 to 5. Estimating the forward error involves solving a number of systems of linear
equations A*x = b; the number is usually 4 or 5 and never more than 11. Each solution requires
approximately 2n² floating-point operations for real flavors or 8n² for complex flavors.
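The following minimal sketch (illustrative only; the matrix and right-hand side are arbitrary) shows the documented call sequence ?potrf, ?potrs, then LAPACKE_dporfs:

#include <stdio.h>
#include <string.h>
#include <mkl.h>

int main(void)
{
    const lapack_int n = 3, nrhs = 1;
    /* Symmetric positive-definite A (column major, upper triangle referenced). */
    double a[9] = { 4.0, 1.0, 1.0,
                    1.0, 3.0, 0.0,
                    1.0, 0.0, 2.0 };
    double b[3] = { 6.0, 4.0, 3.0 };
    double af[9], x[3], ferr[1], berr[1];
    lapack_int info;

    memcpy(af, a, sizeof a);   /* ?potrf overwrites its input with the factor */
    memcpy(x, b, sizeof b);

    info = LAPACKE_dpotrf(LAPACK_COL_MAJOR, 'U', n, af, n);
    if (info == 0)
        info = LAPACKE_dpotrs(LAPACK_COL_MAJOR, 'U', n, nrhs, af, n, x, n);
    if (info == 0)
        info = LAPACKE_dporfs(LAPACK_COL_MAJOR, 'U', n, nrhs,
                              a, n, af, n, b, n, x, n, ferr, berr);
    printf("info = %d, ferr = %.3e, berr = %.3e\n", (int)info, ferr[0], berr[0]);
    return (int)info;
}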

See Also
Matrix Storage Schemes for LAPACK Routines

?porfsx
Uses extra precise iterative refinement to improve the
solution to the system of linear equations with a
symmetric/Hermitian positive-definite coefficient
matrix A and provides error bounds and backward
error estimates.

Syntax
lapack_int LAPACKE_sporfsx( int matrix_layout, char uplo, char equed, lapack_int n,
lapack_int nrhs, const float* a, lapack_int lda, const float* af, lapack_int ldaf,
const float* s, const float* b, lapack_int ldb, float* x, lapack_int ldx, float*
rcond, float* berr, lapack_int n_err_bnds, float* err_bnds_norm, float* err_bnds_comp,
lapack_int nparams, float* params );
lapack_int LAPACKE_dporfsx( int matrix_layout, char uplo, char equed, lapack_int n,
lapack_int nrhs, const double* a, lapack_int lda, const double* af, lapack_int ldaf,
const double* s, const double* b, lapack_int ldb, double* x, lapack_int ldx, double*
rcond, double* berr, lapack_int n_err_bnds, double* err_bnds_norm, double*
err_bnds_comp, lapack_int nparams, double* params );
lapack_int LAPACKE_cporfsx( int matrix_layout, char uplo, char equed, lapack_int n,
lapack_int nrhs, const lapack_complex_float* a, lapack_int lda, const
lapack_complex_float* af, lapack_int ldaf, const float* s, const lapack_complex_float*
b, lapack_int ldb, lapack_complex_float* x, lapack_int ldx, float* rcond, float* berr,
lapack_int n_err_bnds, float* err_bnds_norm, float* err_bnds_comp, lapack_int nparams,
float* params );
lapack_int LAPACKE_zporfsx( int matrix_layout, char uplo, char equed, lapack_int n,
lapack_int nrhs, const lapack_complex_double* a, lapack_int lda, const
lapack_complex_double* af, lapack_int ldaf, const double* s, const
lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x, lapack_int ldx,
double* rcond, double* berr, lapack_int n_err_bnds, double* err_bnds_norm, double*
err_bnds_comp, lapack_int nparams, double* params );

Include Files
• mkl.h

Description
The routine improves the computed solution to a system of linear equations and provides error bounds and
backward error estimates for the solution. In addition to a normwise error bound, the code provides a
maximum componentwise error bound, if possible. See comments for err_bnds_norm and err_bnds_comp
for details of the error bounds.
The original system of linear equations may have been equilibrated before calling this routine, as described
by the parameters equed and s below. In this case, the solution and error bounds returned are for the
original unequilibrated system.

Input Parameters

matrix_layout Specifies whether two-dimensional array storage is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:


If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

equed Must be 'N' or 'Y'.

Specifies the form of equilibration that was done to A before calling this
routine.
If equed = 'N', no equilibration was done.

If equed = 'Y', both row and column equilibration was done, that is, A has
been replaced by diag(s)*A*diag(s). The right-hand side B has been
changed accordingly.

n The number of linear equations; the order of the matrix A; n≥ 0.

nrhs The number of right-hand sides; the number of columns of the matrices B
and X; nrhs≥ 0.

a The array a (size max(1, lda*n)) contains the symmetric/Hermitian matrix


A as specified by uplo. If uplo = 'U', the leading n-by-n upper triangular
part of a contains the upper triangular part of the matrix A and the strictly
lower triangular part of a is not referenced. If uplo = 'L', the leading n-
by-n lower triangular part of a contains the lower triangular part of the
matrix A and the strictly upper triangular part of a is not referenced.

af The array af (size max(1, ldaf*n)) contains the triangular factor U or L
from the Cholesky factorization A = U^T*U or A = L*L^T (A = U^H*U or
A = L*L^H for complex flavors), as computed by ?potrf.

b The array b (size max(1, ldb*nrhs) for column major layout and max(1,
ldb*n) for row major layout) contains the matrix B whose columns are the
right-hand sides for the systems of equations.

lda The leading dimension of a; lda≥ max(1, n).

ldaf The leading dimension of af; ldaf≥ max(1, n).

s Array of size n. The array s contains the scale factors for A.

If equed = 'N', s is not accessed.

If equed = 'Y', each element of s must be positive.

Each element of s should be a power of the radix to ensure a reliable


solution and error estimates. Scaling by powers of the radix does not cause
rounding errors unless the result underflows or overflows. Rounding errors
during scaling lead to refining with a matrix that is not equivalent to the
input matrix, producing error estimates that may not be reliable.


ldb The leading dimension of the array b; ldb≥ max(1, n) for column major
layout and ldb≥nrhs for row major layout.

x Array, size max(1, ldx*nrhs) for column major layout and max(1, ldx*n)
for row major layout.
The solution matrix X as computed by ?potrs

ldx The leading dimension of the output array x; ldx≥ max(1, n) for column
major layout and ldx≥nrhs for row major layout.

n_err_bnds Number of error bounds to return for each right hand side and each type
(normwise or componentwise). See err_bnds_norm and err_bnds_comp
descriptions in Output Arguments section below.

nparams Specifies the number of parameters set in params. If ≤ 0, the params array
is never referenced and default values are used.

params Array, size nparams. Specifies algorithm parameters. If an entry is less than
0.0, that entry will be filled with the default value used for that parameter.
Only positions up to nparams are accessed; defaults are used for higher-
numbered parameters. If defaults are acceptable, you can pass nparams =
0, which prevents the source code from accessing the params argument.

params[0] : Whether to perform iterative refinement or not. Default: 1.0
(for single precision flavors), 1.0D+0 (for double precision flavors).

=0.0 No refinement is performed and no error bounds are computed.

=1.0 Use the double-precision refinement algorithm, possibly with
doubled-single computations if the compilation environment does not
support double precision.

(Other values are reserved for future use.)

params[1] : Maximum number of residual computations allowed for
refinement. Default: 10.0.

Aggressive: Set to 100.0 to permit convergence using approximate
factorizations or factorizations other than LU. If the factorization uses a
technique other than Gaussian elimination, the guarantees in
err_bnds_norm and err_bnds_comp may no longer be trustworthy.

params[2] : Flag determining if the code will attempt to find a solution
with a small componentwise relative error in the double-precision
algorithm. Positive is true, 0.0 is false. Default: 1.0 (attempt
componentwise convergence).

Output Parameters

x The improved solution matrix X.

rcond Reciprocal scaled condition number. An estimate of the reciprocal Skeel


condition number of the matrix A after equilibration (if done). If rcond is
less than the machine precision, in particular, if rcond = 0, the matrix is
singular to working precision. Note that the error may still be small even if
this number is very small and the matrix appears ill-conditioned.

berr Array, size at least max(1, nrhs). Contains the componentwise relative
backward error for each solution vector xj, that is, the smallest relative
change in any element of A or B that makes xj an exact solution.

err_bnds_norm Array of size nrhs*n_err_bnds. For each right-hand side, contains


information about various error bounds and condition numbers
corresponding to the normwise relative error, which is defined as follows:
Normwise relative error in the i-th solution vector

The array is indexed by the type of error information as described below.


There are currently up to three pieces of information returned.

err=1 "Trust/don't trust" boolean. Trust the answer if the reciprocal
condition number is less than the threshold sqrt(n)*slamch(ε) for single
precision flavors and sqrt(n)*dlamch(ε) for double precision flavors.

err=2 "Guaranteed" error bound. The estimated forward error, almost
certainly within a factor of 10 of the true error so long as the next entry
is greater than the threshold sqrt(n)*slamch(ε) for single precision
flavors and sqrt(n)*dlamch(ε) for double precision flavors. This error
bound should only be trusted if the previous boolean is true.

err=3 Reciprocal condition number. Estimated normwise reciprocal
condition number. Compared with the threshold sqrt(n)*slamch(ε) for
single precision flavors and sqrt(n)*dlamch(ε) for double precision
flavors to determine if the error estimate is "guaranteed". These
reciprocal condition numbers are for some appropriately scaled matrix Z.
Let z = s*a, where s scales each row by a power of the radix so all
absolute row sums of z are approximately 1.

The information for right-hand side i, where 1 ≤i≤nrhs, and type of error
err is stored in:

• Column major layout: err_bnds_norm[(err - 1)*nrhs + i - 1].


• Row major layout: err_bnds_norm[err - 1 + (i - 1)*n_err_bnds]

err_bnds_comp Array of size nrhs*n_err_bnds. For each right-hand side, contains


information about various error bounds and condition numbers
corresponding to the componentwise relative error, which is defined as
follows:
Componentwise relative error in the i-th solution vector:

The array is indexed by the type of error information as described below.


There are currently up to three pieces of information returned for each
right-hand side. If componentwise accuracy is not requested (params[2] =
0.0), then err_bnds_comp is not accessed.

err=1 "Trust/don't trust" boolean. Trust the answer if the reciprocal
condition number is less than the threshold sqrt(n)*slamch(ε) for single
precision flavors and sqrt(n)*dlamch(ε) for double precision flavors.

err=2 "Guaranteed" error bound. The estimated forward error, almost
certainly within a factor of 10 of the true error so long as the next entry
is greater than the threshold sqrt(n)*slamch(ε) for single precision
flavors and sqrt(n)*dlamch(ε) for double precision flavors. This error
bound should only be trusted if the previous boolean is true.

err=3 Reciprocal condition number. Estimated componentwise reciprocal
condition number. Compared with the threshold sqrt(n)*slamch(ε) for
single precision flavors and sqrt(n)*dlamch(ε) for double precision
flavors to determine if the error estimate is "guaranteed". These
reciprocal condition numbers are for some appropriately scaled matrix Z.
Let z = s*(a*diag(x)), where x is the solution for the current right-hand
side and s scales each row of a*diag(x) by a power of the radix so all
absolute row sums of z are approximately 1.

The information for right-hand side i, where 1 ≤i≤nrhs, and type of error
err is stored in:

• Column major layout: err_bnds_comp[(err - 1)*nrhs + i - 1].


• Row major layout: err_bnds_comp[err - 1 + (i - 1)*n_err_bnds]

params Output parameter only if the input contains erroneous values, namely in
params[0], params[1], or params[2]. In such a case, the corresponding
elements of params are filled with default values on output.

Return Values
This function returns a value info.

If info = 0, the execution is successful. The solution to every right-hand side is guaranteed.

If info = -i, parameter i had an illegal value.

If 0 < info ≤ n: U(info, info) is exactly zero. The factorization has been completed, but the factor U is exactly
singular, so the solution and error bounds could not be computed; rcond = 0 is returned.

If info = n+j: The solution corresponding to the j-th right-hand side is not guaranteed. The solutions
corresponding to other right-hand sides k with k > j may not be guaranteed as well, but only the first such
right-hand side is reported. If a small componentwise error is not requested params[2] = 0.0, then the j-th
right-hand side is the first with a normwise error bound that is not guaranteed (the smallest j such that for
column major layout err_bnds_norm[j - 1] = 0.0 or err_bnds_comp[j - 1] = 0.0; or for row major
layout err_bnds_norm[(j - 1)*n_err_bnds] = 0.0 or err_bnds_comp[(j - 1)*n_err_bnds] = 0.0).
See the definition of err_bnds_norm and err_bnds_comp for err = 1. To get information about all of the
right-hand sides, check err_bnds_norm or err_bnds_comp.
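A minimal usage sketch follows; it is illustrative only, uses an unequilibrated system (equed = 'N'), and requests the default algorithm parameters by passing nparams = 0. The matrix values are arbitrary:

#include <stdio.h>
#include <string.h>
#include <mkl.h>

int main(void)
{
    const lapack_int n = 3, nrhs = 1, n_err_bnds = 3;
    double a[9] = { 4.0, 1.0, 1.0,
                    1.0, 3.0, 0.0,
                    1.0, 0.0, 2.0 };            /* SPD, column major, upper part used */
    double b[3] = { 6.0, 4.0, 3.0 };
    double af[9], x[3], s[3] = {1.0, 1.0, 1.0}; /* s not accessed because equed = 'N' */
    double rcond, berr[1];
    double err_bnds_norm[3 /* nrhs*n_err_bnds */], err_bnds_comp[3], params[1];
    lapack_int info;

    memcpy(af, a, sizeof a);
    memcpy(x, b, sizeof b);

    info = LAPACKE_dpotrf(LAPACK_COL_MAJOR, 'U', n, af, n);
    if (info == 0)
        info = LAPACKE_dpotrs(LAPACK_COL_MAJOR, 'U', n, nrhs, af, n, x, n);
    if (info == 0)
        info = LAPACKE_dporfsx(LAPACK_COL_MAJOR, 'U', 'N', n, nrhs,
                               a, n, af, n, s, b, n, x, n,
                               &rcond, berr, n_err_bnds,
                               err_bnds_norm, err_bnds_comp,
                               0 /* nparams: use defaults */, params);
    printf("info = %d, rcond = %.3e, normwise bound = %.3e\n",
           (int)info, rcond, err_bnds_norm[1 * nrhs + 0]);  /* err = 2 entry, rhs 1 */
    return (int)info;
}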

See Also
Matrix Storage Schemes for LAPACK Routines

?pprfs
Refines the solution of a system of linear equations
with a symmetric (Hermitian) positive-definite
coefficient matrix stored in a packed format and
estimates its error.

Syntax
lapack_int LAPACKE_spprfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const float* ap, const float* afp, const float* b, lapack_int ldb, float* x,
lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_dpprfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const double* ap, const double* afp, const double* b, lapack_int ldb, double* x,
lapack_int ldx, double* ferr, double* berr );
lapack_int LAPACKE_cpprfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_float* ap, const lapack_complex_float* afp, const
lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x, lapack_int ldx,
float* ferr, float* berr );

lapack_int LAPACKE_zpprfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_double* ap, const lapack_complex_double* afp, const
lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x, lapack_int ldx,
double* ferr, double* berr );

Include Files
• mkl.h

Description
The routine performs an iterative refinement of the solution to a system of linear equations A*X = B with a
symmetric (Hermitian) positive definite matrix A, with multiple right-hand sides. For each computed solution
vector x, the routine computes the component-wise backward error β. This error is the smallest relative
perturbation in elements of A and b such that x is the exact solution of the perturbed system:
|δaij| ≤β|aij|, |δbi| ≤β|bi| such that (A + δA)x = (b + δb).
Finally, the routine estimates the component-wise forward error in the computed solution
||x - xe||∞/||x||∞
where xe is the exact solution.

Before calling this routine:

• call the factorization routine ?pptrf


• call the solver routine ?pptrs.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:


If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The order of the matrix A; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

ap ap contains the original matrix A in a packed format, as supplied
to ?pptrf. The dimension of ap must be at least max(1, n(n+1)/2).

afp afp contains the factored matrix A in a packed format, as returned
by ?pptrf. The dimension of afp must be at least max(1, n(n+1)/2).

b Array b of size max(1, ldb*nrhs) for column major layout and


max(1, ldb*n) for row major layout contains the right-hand side
matrix B.

x Array x of size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout contains the solution matrix X.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ldx The leading dimension of x; ldx≥ max(1, n) for column major layout
and ldx≥nrhs for row major layout.

Output Parameters

x The refined solution matrix X.

ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and backward errors, respectively, for each solution vector.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The bounds returned in ferr are not rigorous, but in practice they almost always overestimate the actual
error.
For each right-hand side, computation of the backward error involves a minimum of 4n² floating-point
operations (for real flavors) or 16n² operations (for complex flavors). In addition, each step of iterative
refinement involves 6n² operations (for real flavors) or 24n² operations (for complex flavors); the number of
iterations may range from 1 to 5.
Estimating the forward error involves solving a number of systems of linear equations A*x = b; the number
of systems is usually 4 or 5 and never more than 11. Each solution requires approximately 2n² floating-point
operations for real flavors or 8n² for complex flavors.
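A minimal sketch of the documented sequence ?pptrf, ?pptrs, LAPACKE_dpprfs is shown below (illustrative only; the packed matrix values are arbitrary):

#include <stdio.h>
#include <string.h>
#include <mkl.h>

int main(void)
{
    const lapack_int n = 3, nrhs = 1;
    /* Upper triangle of an SPD matrix in packed column-major order:
       {a11, a12, a22, a13, a23, a33}. */
    double ap[6] = { 4.0, 1.0, 3.0, 1.0, 0.0, 2.0 };
    double b[3]  = { 6.0, 4.0, 3.0 };
    double afp[6], x[3], ferr[1], berr[1];
    lapack_int info;

    memcpy(afp, ap, sizeof ap);  /* ?pptrf overwrites its input with the factor */
    memcpy(x, b, sizeof b);

    info = LAPACKE_dpptrf(LAPACK_COL_MAJOR, 'U', n, afp);
    if (info == 0)
        info = LAPACKE_dpptrs(LAPACK_COL_MAJOR, 'U', n, nrhs, afp, x, n);
    if (info == 0)
        info = LAPACKE_dpprfs(LAPACK_COL_MAJOR, 'U', n, nrhs,
                              ap, afp, b, n, x, n, ferr, berr);
    printf("info = %d, ferr = %.3e, berr = %.3e\n", (int)info, ferr[0], berr[0]);
    return (int)info;
}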

See Also
Matrix Storage Schemes for LAPACK Routines
?pbrfs
Refines the solution of a system of linear equations
with a band symmetric (Hermitian) positive-definite
coefficient matrix and estimates its error.

Syntax
lapack_int LAPACKE_spbrfs( int matrix_layout, char uplo, lapack_int n, lapack_int kd,
lapack_int nrhs, const float* ab, lapack_int ldab, const float* afb, lapack_int ldafb,
const float* b, lapack_int ldb, float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_dpbrfs( int matrix_layout, char uplo, lapack_int n, lapack_int kd,
lapack_int nrhs, const double* ab, lapack_int ldab, const double* afb, lapack_int
ldafb, const double* b, lapack_int ldb, double* x, lapack_int ldx, double* ferr,
double* berr );
lapack_int LAPACKE_cpbrfs( int matrix_layout, char uplo, lapack_int n, lapack_int kd,
lapack_int nrhs, const lapack_complex_float* ab, lapack_int ldab, const
lapack_complex_float* afb, lapack_int ldafb, const lapack_complex_float* b, lapack_int
ldb, lapack_complex_float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_zpbrfs( int matrix_layout, char uplo, lapack_int n, lapack_int kd,
lapack_int nrhs, const lapack_complex_double* ab, lapack_int ldab, const
lapack_complex_double* afb, lapack_int ldafb, const lapack_complex_double* b,
lapack_int ldb, lapack_complex_double* x, lapack_int ldx, double* ferr, double* berr );


Include Files
• mkl.h

Description

The routine performs an iterative refinement of the solution to a system of linear equations A*X = B with a
symmetric (Hermitian) positive definite band matrix A, with multiple right-hand sides. For each computed
solution vector x, the routine computes the component-wise backward error β. This error is the smallest
relative perturbation in elements of A and b such that x is the exact solution of the perturbed system:
|δaij| ≤β|aij|, |δbi| ≤β|bi| such that (A + δA)x = (b + δb).
Finally, the routine estimates the component-wise forward error in the computed solution ||x - xe||∞/||
x||∞ (here xe is the exact solution).
Before calling this routine:

• call the factorization routine ?pbtrf


• call the solver routine ?pbtrs.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:


If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The order of the matrix A; n≥ 0.

kd The number of superdiagonals or subdiagonals in the matrix A; kd≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

ab Array ab (size max(1, ldab*n)) contains the original band matrix A, as
supplied to ?pbtrf.

afb Array afb (size max(1, ldafb*n)) contains the factored band matrix A,
as returned by ?pbtrf.

b Array b of size max(1, ldb*nrhs) for column major layout and


max(1, ldb*n) for row major layout contains the right-hand side
matrix B.

x Array x of size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout contains the solution matrix X.

ldab The leading dimension of ab; ldab≥kd + 1.

ldafb The leading dimension of afb; ldafb≥kd + 1.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ldx The leading dimension of x; ldx≥ max(1, n) for column major layout
and ldx≥nrhs for row major layout.

Output Parameters

x The refined solution matrix X.

ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and backward errors, respectively, for each solution vector.

Return Values
This function returns a value info.

If info = 0, the execution is successful.


If info = -i, parameter i had an illegal value.

Application Notes
The bounds returned in ferr are not rigorous, but in practice they almost always overestimate the actual
error.
For each right-hand side, computation of the backward error involves a minimum of 8n*kd floating-point
operations (for real flavors) or 32n*kd operations (for complex flavors). In addition, each step of iterative
refinement involves 12n*kd operations (for real flavors) or 48n*kd operations (for complex flavors); the
number of iterations may range from 1 to 5.
Estimating the forward error involves solving a number of systems of linear equations A*x = b; the number
is usually 4 or 5 and never more than 11. Each solution requires approximately 4n*kd floating-point
operations for real flavors or 16n*kd for complex flavors.
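A minimal sketch of the documented sequence ?pbtrf, ?pbtrs, LAPACKE_dpbrfs follows (illustrative only; the band matrix values are arbitrary):

#include <stdio.h>
#include <string.h>
#include <mkl.h>

int main(void)
{
    const lapack_int n = 4, kd = 1, nrhs = 1, ldab = kd + 1;
    /* SPD tridiagonal matrix (diagonal 4, off-diagonal -1) in upper band storage,
       column major: element ab[kd + i - j + j*ldab] holds A(i,j) for i <= j. */
    double ab[8] = { 0.0, 4.0,  -1.0, 4.0,  -1.0, 4.0,  -1.0, 4.0 };
    double b[4]  = { 3.0, 2.0, 2.0, 3.0 };
    double afb[8], x[4], ferr[1], berr[1];
    lapack_int info;

    memcpy(afb, ab, sizeof ab);  /* ?pbtrf overwrites its input with the factor */
    memcpy(x, b, sizeof b);

    info = LAPACKE_dpbtrf(LAPACK_COL_MAJOR, 'U', n, kd, afb, ldab);
    if (info == 0)
        info = LAPACKE_dpbtrs(LAPACK_COL_MAJOR, 'U', n, kd, nrhs, afb, ldab, x, n);
    if (info == 0)
        info = LAPACKE_dpbrfs(LAPACK_COL_MAJOR, 'U', n, kd, nrhs,
                              ab, ldab, afb, ldab, b, n, x, n, ferr, berr);
    printf("info = %d, ferr = %.3e, berr = %.3e\n", (int)info, ferr[0], berr[0]);
    return (int)info;
}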

See Also
Matrix Storage Schemes for LAPACK Routines

?ptrfs
Refines the solution of a system of linear equations
with a symmetric (Hermitian) positive-definite
tridiagonal coefficient matrix and estimates its error.

Syntax
lapack_int LAPACKE_sptrfs( int matrix_layout, lapack_int n, lapack_int nrhs, const
float* d, const float* e, const float* df, const float* ef, const float* b, lapack_int
ldb, float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_dptrfs( int matrix_layout, lapack_int n, lapack_int nrhs, const
double* d, const double* e, const double* df, const double* ef, const double* b,
lapack_int ldb, double* x, lapack_int ldx, double* ferr, double* berr );
lapack_int LAPACKE_cptrfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const float* d, const lapack_complex_float* e, const float* df, const
lapack_complex_float* ef, const lapack_complex_float* b, lapack_int ldb,
lapack_complex_float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_zptrfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const double* d, const lapack_complex_double* e, const double* df, const
lapack_complex_double* ef, const lapack_complex_double* b, lapack_int ldb,
lapack_complex_double* x, lapack_int ldx, double* ferr, double* berr );


Include Files
• mkl.h

Description

The routine performs an iterative refinement of the solution to a system of linear equations A*X = B with a
symmetric (Hermitian) positive definite tridiagonal matrix A, with multiple right-hand sides. For each
computed solution vector x, the routine computes the component-wise backward error β. This error is the
smallest relative perturbation in elements of A and b such that x is the exact solution of the perturbed
system:
|δaij| ≤β|aij|, |δbi| ≤β|bi| such that (A + δA)x = (b + δb).
Finally, the routine estimates the component-wise forward error in the computed solution ||x - xe||∞/||
x||∞ (here xe is the exact solution).
Before calling this routine:

• call the factorization routine ?pttrf


• call the solver routine ?pttrs.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Used for complex flavors only. Must be 'U' or 'L'.

Specifies whether the superdiagonal or the subdiagonal of the


tridiagonal matrix A is stored and how A is factored:
If uplo = 'U', the array e stores the superdiagonal of A, and A is
factored as U^H*D*U.

If uplo = 'L', the array e stores the subdiagonal of A, and A is
factored as L*D*L^H.

n The order of the matrix A; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

d The array d (size n) contains the n diagonal elements of the


tridiagonal matrix A.

df The array df (size n) contains the n diagonal elements of the diagonal


matrix D from the factorization of A as computed by ?pttrf.

e,ef,b,x The array e (size n - 1) contains the (n - 1) off-diagonal elements of
the tridiagonal matrix A (see uplo).
The array ef (size n - 1) contains the (n - 1) off-diagonal elements of
the unit bidiagonal factor U or L from the factorization computed
by ?pttrf (see uplo).
The array b of size max(1, ldb*nrhs) for column major layout and
max(1, ldb*n) for row major layout contains the matrix B whose
columns are the right-hand sides for the systems of equations.
The array x of size max(1, ldx*nrhs) for column major layout and
max(1, ldx*n) for row major layout contains the solution matrix X as
computed by ?pttrs.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ldx The leading dimension of x; ldx≥ max(1, n) for column major layout
and ldx≥nrhs for row major layout.

Output Parameters

x The refined solution matrix X.

ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and backward errors, respectively, for each solution vector.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.
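A minimal sketch of the documented sequence ?pttrf, ?pttrs, LAPACKE_dptrfs follows (illustrative only; the values are arbitrary, and the real flavor shown takes no uplo argument):

#include <stdio.h>
#include <string.h>
#include <mkl.h>

int main(void)
{
    const lapack_int n = 4, nrhs = 1;
    double d[4] = { 4.0, 4.0, 4.0, 4.0 };   /* diagonal     */
    double e[3] = {-1.0, -1.0, -1.0 };      /* off-diagonal */
    double b[4] = { 3.0, 2.0, 2.0, 3.0 };
    double df[4], ef[3], x[4], ferr[1], berr[1];
    lapack_int info;

    memcpy(df, d, sizeof d);   /* ?pttrf overwrites its inputs with the factorization */
    memcpy(ef, e, sizeof e);
    memcpy(x, b, sizeof b);

    info = LAPACKE_dpttrf(n, df, ef);                                /* A = L*D*L^T */
    if (info == 0)
        info = LAPACKE_dpttrs(LAPACK_COL_MAJOR, n, nrhs, df, ef, x, n);
    if (info == 0)
        info = LAPACKE_dptrfs(LAPACK_COL_MAJOR, n, nrhs,
                              d, e, df, ef, b, n, x, n, ferr, berr);
    printf("info = %d, ferr = %.3e, berr = %.3e\n", (int)info, ferr[0], berr[0]);
    return (int)info;
}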

See Also
Matrix Storage Schemes for LAPACK Routines

?syrfs
Refines the solution of a system of linear equations
with a symmetric coefficient matrix and estimates its
error.

Syntax
lapack_int LAPACKE_ssyrfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const float* a, lapack_int lda, const float* af, lapack_int ldaf, const lapack_int*
ipiv, const float* b, lapack_int ldb, float* x, lapack_int ldx, float* ferr, float*
berr );
lapack_int LAPACKE_dsyrfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const double* a, lapack_int lda, const double* af, lapack_int ldaf, const lapack_int*
ipiv, const double* b, lapack_int ldb, double* x, lapack_int ldx, double* ferr,
double* berr );
lapack_int LAPACKE_csyrfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_float* a, lapack_int lda, const lapack_complex_float* af,
lapack_int ldaf, const lapack_int* ipiv, const lapack_complex_float* b, lapack_int ldb,
lapack_complex_float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_zsyrfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_double* a, lapack_int lda, const lapack_complex_double* af,
lapack_int ldaf, const lapack_int* ipiv, const lapack_complex_double* b, lapack_int
ldb, lapack_complex_double* x, lapack_int ldx, double* ferr, double* berr );

Include Files
• mkl.h


Description

The routine performs an iterative refinement of the solution to a system of linear equations A*X = B with a
symmetric full-storage matrix A, with multiple right-hand sides. For each computed solution vector x, the
routine computes the component-wise backward error β. This error is the smallest relative perturbation in
elements of A and b such that x is the exact solution of the perturbed system:
|δaij| ≤β|aij|, |δbi| ≤β|bi| such that (A + δA)x = (b + δb).
Finally, the routine estimates the component-wise forward error in the computed solution ||x - xe||∞/||
x||∞ (here xe is the exact solution).
Before calling this routine:

• call the factorization routine ?sytrf


• call the solver routine ?sytrs.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The order of the matrix A; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

a Array a (size max(1, lda*n)) contains the original matrix A, as
supplied to ?sytrf.

af Array af (size max(1, ldaf*n)) contains the factored matrix A, as


returned by ?sytrf.

b Array b of size max(1, ldb*nrhs) for column major layout and


max(1, ldb*n) for row major layout contains the right-hand side
matrix B.

x Array x of size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout contains the solution matrix X.

lda The leading dimension of a; lda≥ max(1, n).

ldaf The leading dimension of af; ldaf≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ldx The leading dimension of x; ldx≥ max(1, n) for column major layout
and ldx≥nrhs for row major layout.

ipiv Array, size at least max(1, n). The ipiv array, as returned by ?sytrf.

Output Parameters

x The refined solution matrix X.

ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and backward errors, respectively, for each solution vector.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The bounds returned in ferr are not rigorous, but in practice they almost always overestimate the actual
error.
For each right-hand side, computation of the backward error involves a minimum of 4n² floating-point
operations (for real flavors) or 16n² operations (for complex flavors). In addition, each step of iterative
refinement involves 6n² operations (for real flavors) or 24n² operations (for complex flavors); the number of
iterations may range from 1 to 5. Estimating the forward error involves solving a number of systems of linear
equations A*x = b; the number is usually 4 or 5 and never more than 11. Each solution requires
approximately 2n² floating-point operations for real flavors or 8n² for complex flavors.
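A minimal sketch of the documented sequence ?sytrf, ?sytrs, LAPACKE_dsyrfs follows (illustrative only; the symmetric matrix values are arbitrary):

#include <stdio.h>
#include <string.h>
#include <mkl.h>

int main(void)
{
    const lapack_int n = 3, nrhs = 1;
    /* Symmetric (indefinite) matrix A, column major, upper triangle referenced. */
    double a[9] = { 2.0,  1.0, 1.0,
                    1.0, -3.0, 0.0,
                    1.0,  0.0, 4.0 };
    double b[3] = { 4.0, -2.0, 5.0 };
    double af[9], x[3], ferr[1], berr[1];
    lapack_int ipiv[3], info;

    memcpy(af, a, sizeof a);   /* ?sytrf overwrites its input with the factorization */
    memcpy(x, b, sizeof b);

    info = LAPACKE_dsytrf(LAPACK_COL_MAJOR, 'U', n, af, n, ipiv);
    if (info == 0)
        info = LAPACKE_dsytrs(LAPACK_COL_MAJOR, 'U', n, nrhs, af, n, ipiv, x, n);
    if (info == 0)
        info = LAPACKE_dsyrfs(LAPACK_COL_MAJOR, 'U', n, nrhs,
                              a, n, af, n, ipiv, b, n, x, n, ferr, berr);
    printf("info = %d, ferr = %.3e, berr = %.3e\n", (int)info, ferr[0], berr[0]);
    return (int)info;
}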

See Also
Matrix Storage Schemes for LAPACK Routines
?syrfsx
Uses extra precise iterative refinement to improve the
solution to the system of linear equations with a
symmetric indefinite coefficient matrix A and provides
error bounds and backward error estimates.

Syntax
lapack_int LAPACKE_ssyrfsx( int matrix_layout, char uplo, char equed, lapack_int n,
lapack_int nrhs, const float* a, lapack_int lda, const float* af, lapack_int ldaf,
const lapack_int* ipiv, const float* s, const float* b, lapack_int ldb, float* x,
lapack_int ldx, float* rcond, float* berr, lapack_int n_err_bnds, float*
err_bnds_norm, float* err_bnds_comp, lapack_int nparams, float* params );
lapack_int LAPACKE_dsyrfsx( int matrix_layout, char uplo, char equed, lapack_int n,
lapack_int nrhs, const double* a, lapack_int lda, const double* af, lapack_int ldaf,
const lapack_int* ipiv, const double* s, const double* b, lapack_int ldb, double* x,
lapack_int ldx, double* rcond, double* berr, lapack_int n_err_bnds, double*
err_bnds_norm, double* err_bnds_comp, lapack_int nparams, double* params );
lapack_int LAPACKE_csyrfsx( int matrix_layout, char uplo, char equed, lapack_int n,
lapack_int nrhs, const lapack_complex_float* a, lapack_int lda, const
lapack_complex_float* af, lapack_int ldaf, const lapack_int* ipiv, const float* s,
const lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x, lapack_int ldx,
float* rcond, float* berr, lapack_int n_err_bnds, float* err_bnds_norm, float*
err_bnds_comp, lapack_int nparams, float* params );
lapack_int LAPACKE_zsyrfsx( int matrix_layout, char uplo, char equed, lapack_int n,
lapack_int nrhs, const lapack_complex_double* a, lapack_int lda, const
lapack_complex_double* af, lapack_int ldaf, const lapack_int* ipiv, const double* s,

const lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x, lapack_int
ldx, double* rcond, double* berr, lapack_int n_err_bnds, double* err_bnds_norm,
double* err_bnds_comp, lapack_int nparams, double* params );

Include Files
• mkl.h

Description

The routine improves the computed solution to a system of linear equations when the coefficient matrix is
symmetric indefinite, and provides error bounds and backward error estimates for the solution. In addition to
a normwise error bound, the code provides a maximum componentwise error bound, if possible. See
comments for err_bnds_norm and err_bnds_comp for details of the error bounds.

The original system of linear equations may have been equilibrated before calling this routine, as described
by the parameters equed and s below. In this case, the solution and error bounds returned are for the
original unequilibrated system.

Input Parameters

matrix_layout Specifies whether two-dimensional array storage is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:


If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

equed Must be 'N' or 'Y'.

Specifies the form of equilibration that was done to A before calling this
routine.
If equed = 'N', no equilibration was done.

If equed = 'Y', both row and column equilibration was done, that is, A has
been replaced by diag(s)*A*diag(s). The right-hand side B has been
changed accordingly.

n The number of linear equations; the order of the matrix A; n≥ 0.

nrhs The number of right-hand sides; the number of columns of the matrices B
and X; nrhs≥ 0.

a, af, b The array a (size max(1, lda*n)) contains the symmetric/Hermitian matrix
A as specified by uplo. If uplo = 'U', the leading n-by-n upper triangular
part of a contains the upper triangular part of the matrix A and the strictly
lower triangular part of a is not referenced. If uplo = 'L', the leading n-
by-n lower triangular part of a contains the lower triangular part of the
matrix A and the strictly upper triangular part of a is not referenced.

The array af (size max(1, ldaf*n)) contains the block diagonal matrix D
and the multipliers used to obtain the factor U or L from the factorization
A = U*D*U^T or A = L*D*L^T, as computed by ?sytrf.

The array b (size max(1, ldb*nrhs) for column major layout and max(1,
ldb*n) for row major layout) contains the matrix B whose columns are the
right-hand sides for the systems of equations.

lda The leading dimension of a; lda≥ max(1, n).

ldaf The leading dimension of af; ldaf≥ max(1, n).

ipiv Array, size at least max(1, n). Contains details of the interchanges and the
block structure of D as determined by ssytrf for real flavors or dsytrf for
complex flavors.

s Array, size (n). The array s contains the scale factors for A.

If equed = 'N', s is not accessed.

If equed = 'Y', each element of s must be positive.

Each element of s should be a power of the radix to ensure a reliable


solution and error estimates. Scaling by powers of the radix does not cause
rounding errors unless the result underflows or overflows. Rounding errors
during scaling lead to refining with a matrix that is not equivalent to the
input matrix, producing error estimates that may not be reliable.

ldb The leading dimension of the array b; ldb≥ max(1, n) for column major
layout and ldb≥nrhs for row major layout.

x Array, of size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.
The solution matrix X as computed by ?sytrs

ldx The leading dimension of the output array x; ldx≥ max(1, n) for column
major layout and ldx≥nrhs for row major layout.

n_err_bnds Number of error bounds to return for each right hand side and each type
(normwise or componentwise). See err_bnds_norm and err_bnds_comp
descriptions in Output Arguments section below.

nparams Specifies the number of parameters set in params. If ≤ 0, the params array
is never referenced and default values are used.

params Array, size nparams. Specifies algorithm parameters. If an entry is less than
0.0, that entry will be filled with the default value used for that parameter.
Only positions up to nparams are accessed; defaults are used for higher-
numbered parameters. If defaults are acceptable, you can pass nparams =
0, which prevents the source code from accessing the params argument.

params[0] : Whether to perform iterative refinement or not. Default: 1.0
(for single precision flavors), 1.0D+0 (for double precision flavors).

=0.0 No refinement is performed and no error bounds are computed.

=1.0 Use the double-precision refinement algorithm, possibly with
doubled-single computations if the compilation environment does not
support double precision.

(Other values are reserved for future use.)

params[1] : Maximum number of residual computations allowed for
refinement. Default: 10.0.

Aggressive: Set to 100.0 to permit convergence using approximate
factorizations or factorizations other than LU. If the factorization uses a
technique other than Gaussian elimination, the guarantees in
err_bnds_norm and err_bnds_comp may no longer be trustworthy.

params[2] : Flag determining if the code will attempt to find a solution
with a small componentwise relative error in the double-precision
algorithm. Positive is true, 0.0 is false. Default: 1.0 (attempt
componentwise convergence).

Output Parameters

x The improved solution matrix X.

rcond Reciprocal scaled condition number. An estimate of the reciprocal Skeel


condition number of the matrix A after equilibration (if done). If rcond is
less than the machine precision, in particular, if rcond = 0, the matrix is
singular to working precision. Note that the error may still be small even if
this number is very small and the matrix appears ill-conditioned.

berr Array, size at least max(1, nrhs). Contains the componentwise relative
backward error for each solution vector xj, that is, the smallest relative
change in any element of A or B that makes xj an exact solution.

err_bnds_norm Array of size nrhs*n_err_bnds. For each right-hand side, contains


information about various error bounds and condition numbers
corresponding to the normwise relative error, which is defined as follows:
Normwise relative error in the i-th solution vector

The array is indexed by the type of error information as described below.


There are currently up to three pieces of information returned.

err=1 "Trust/don't trust" boolean. Trust the answer if the reciprocal
condition number is less than the threshold sqrt(n)*slamch(ε) for single
precision flavors and sqrt(n)*dlamch(ε) for double precision flavors.

err=2 "Guaranteed" error bound. The estimated forward error, almost
certainly within a factor of 10 of the true error so long as the next entry
is greater than the threshold sqrt(n)*slamch(ε) for single precision
flavors and sqrt(n)*dlamch(ε) for double precision flavors. This error
bound should only be trusted if the previous boolean is true.

err=3 Reciprocal condition number. Estimated normwise reciprocal
condition number. Compared with the threshold sqrt(n)*slamch(ε) for
single precision flavors and sqrt(n)*dlamch(ε) for double precision
flavors to determine if the error estimate is "guaranteed". These
reciprocal condition numbers are for some appropriately scaled matrix Z.
Let z = s*a, where s scales each row by a power of the radix so all
absolute row sums of z are approximately 1.

The information for right-hand side i, where 1 ≤i≤nrhs, and type of error
err is stored in:

• Column major layout: err_bnds_norm[(err - 1)*nrhs + i - 1].


• Row major layout: err_bnds_norm[err - 1 + (i - 1)*n_err_bnds]

err_bnds_comp Array of size nrhs*n_err_bnds. For each right-hand side, contains


information about various error bounds and condition numbers
corresponding to the componentwise relative error, which is defined as
follows:
Componentwise relative error in the i-th solution vector:

The array is indexed by the type of error information as described below.


There are currently up to three pieces of information returned for each
right-hand side. If componentwise accuracy is not requested (params[2] =
0.0), then err_bnds_comp is not accessed.

err=1 "Trust/don't trust" boolean. Trust the answer if the reciprocal
condition number is less than the threshold sqrt(n)*slamch(ε) for single
precision flavors and sqrt(n)*dlamch(ε) for double precision flavors.

err=2 "Guaranteed" error bound. The estimated forward error, almost
certainly within a factor of 10 of the true error so long as the next entry
is greater than the threshold sqrt(n)*slamch(ε) for single precision
flavors and sqrt(n)*dlamch(ε) for double precision flavors. This error
bound should only be trusted if the previous boolean is true.

err=3 Reciprocal condition number. Estimated componentwise reciprocal
condition number. Compared with the threshold sqrt(n)*slamch(ε) for
single precision flavors and sqrt(n)*dlamch(ε) for double precision
flavors to determine if the error estimate is "guaranteed". These
reciprocal condition numbers are for some appropriately scaled matrix Z.
Let z = s*(a*diag(x)), where x is the solution for the current right-hand
side and s scales each row of a*diag(x) by a power of the radix so all
absolute row sums of z are approximately 1.

The information for right-hand side i, where 1 ≤i≤nrhs, and type of error
err is stored in:

• Column major layout: err_bnds_comp[(err - 1)*nrhs + i - 1].


• Row major layout: err_bnds_comp[err - 1 + (i - 1)*n_err_bnds]

params Output parameter only if the input contains erroneous values, namely, in
params[0], params[1], params[2]. In such a case, the corresponding
elements of params are filled with default values on output.

Return Values
This function returns a value info.

If info = 0, the execution is successful. The solution to every right-hand side is guaranteed.

If info = -i, parameter i had an illegal value.

If 0 < info ≤ n: U(info, info) is exactly zero. The factorization has been completed, but the factor U is exactly
singular, so the solution and error bounds could not be computed; rcond = 0 is returned.

If info = n+j: The solution corresponding to the j-th right-hand side is not guaranteed. The solutions
corresponding to other right-hand sides k with k > j may not be guaranteed as well, but only the first such
right-hand side is reported. If a small componentwise error is not requested params[2] = 0.0, then the j-th
right-hand side is the first with a normwise error bound that is not guaranteed (the smallest j such that for
column major layout err_bnds_norm[j - 1] = 0.0 or err_bnds_comp[j - 1] = 0.0; or for row major
layout err_bnds_norm[(j - 1)*n_err_bnds] = 0.0 or err_bnds_comp[(j - 1)*n_err_bnds] = 0.0).
See the definition of err_bnds_norm and err_bnds_comp for err = 1. To get information about all of the
right-hand sides, check err_bnds_norm or err_bnds_comp.
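A minimal usage sketch follows (illustrative only; the system is not equilibrated, so equed = 'N' and s is not accessed, and nparams = 0 selects the default parameters):

#include <stdio.h>
#include <string.h>
#include <mkl.h>

int main(void)
{
    const lapack_int n = 3, nrhs = 1, n_err_bnds = 3;
    double a[9] = { 2.0,  1.0, 1.0,
                    1.0, -3.0, 0.0,
                    1.0,  0.0, 4.0 };           /* symmetric indefinite, upper part used */
    double b[3] = { 4.0, -2.0, 5.0 };
    double af[9], x[3], s[3] = {1.0, 1.0, 1.0};
    double rcond, berr[1], err_bnds_norm[3], err_bnds_comp[3], params[1];
    lapack_int ipiv[3], info;

    memcpy(af, a, sizeof a);
    memcpy(x, b, sizeof b);

    info = LAPACKE_dsytrf(LAPACK_COL_MAJOR, 'U', n, af, n, ipiv);
    if (info == 0)
        info = LAPACKE_dsytrs(LAPACK_COL_MAJOR, 'U', n, nrhs, af, n, ipiv, x, n);
    if (info == 0)
        info = LAPACKE_dsyrfsx(LAPACK_COL_MAJOR, 'U', 'N', n, nrhs,
                               a, n, af, n, ipiv, s, b, n, x, n,
                               &rcond, berr, n_err_bnds,
                               err_bnds_norm, err_bnds_comp,
                               0 /* nparams: use defaults */, params);
    printf("info = %d, rcond = %.3e, trusted = %s\n",
           (int)info, rcond, err_bnds_norm[0] != 0.0 ? "yes" : "no");
    return (int)info;
}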

See Also
Matrix Storage Schemes for LAPACK Routines

?herfs
Refines the solution of a system of linear equations
with a complex Hermitian coefficient matrix and
estimates its error.

Syntax
lapack_int LAPACKE_cherfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_float* a, lapack_int lda, const lapack_complex_float* af,
lapack_int ldaf, const lapack_int* ipiv, const lapack_complex_float* b, lapack_int ldb,
lapack_complex_float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_zherfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_double* a, lapack_int lda, const lapack_complex_double* af,
lapack_int ldaf, const lapack_int* ipiv, const lapack_complex_double* b, lapack_int
ldb, lapack_complex_double* x, lapack_int ldx, double* ferr, double* berr );

Include Files
• mkl.h

Description

The routine performs an iterative refinement of the solution to a system of linear equations A*X = B with a
complex Hermitian full-storage matrix A, with multiple right-hand sides. For each computed solution vector x,
the routine computes the component-wise backward error β. This error is the smallest relative perturbation
in elements of A and b such that x is the exact solution of the perturbed system:
|δaij| ≤β|aij|, |δbi| ≤β|bi| such that (A + δA)x = (b + δb).
Finally, the routine estimates the component-wise forward error in the computed solution ||x - xe||∞/||
x||∞ (here xe is the exact solution).
Before calling this routine:

• call the factorization routine ?hetrf


• call the solver routine ?hetrs.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The order of the matrix A; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

a,af,b,x Arrays:
a (size max(1, lda*n)) contains the original matrix A, as supplied
to ?hetrf.
af (size max(1, ldaf*n)) contains the factored matrix A, as returned
by ?hetrf.
b of size max(1, ldb*nrhs) for column major layout and max(1,
ldb*n) for row major layout contains the right-hand side matrix B.
x of size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout contains the solution matrix X.

lda The leading dimension of a; lda≥ max(1, n).

ldaf The leading dimension of af; ldaf≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ldx The leading dimension of x; ldx≥ max(1, n) for column major layout
and ldx≥nrhs for row major layout.

ipiv Array, size at least max(1, n). The ipiv array, as returned by ?hetrf.

Output Parameters

x The refined solution matrix X.

ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and backward errors, respectively, for each solution vector.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The bounds returned in ferr are not rigorous, but in practice they almost always overestimate the actual
error.
For each right-hand side, computation of the backward error involves a minimum of 16n² operations. In
addition, each step of iterative refinement involves 24n² operations; the number of iterations may range
from 1 to 5.
Estimating the forward error involves solving a number of systems of linear equations A*x = b; the number
is usually 4 or 5 and never more than 11. Each solution requires approximately 8n² floating-point operations.

The real counterpart of this routine is ?ssyrfs/?dsyrfs

See Also
Matrix Storage Schemes for LAPACK Routines
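
A minimal sketch of the call sequence for cherfs, assuming an illustrative 2-by-2 Hermitian system in
column-major layout and the default MKL_Complex8-style struct fields of lapack_complex_float: factor with
LAPACKE_chetrf, solve with LAPACKE_chetrs, then refine and obtain error bounds with LAPACKE_cherfs.

#include <stdio.h>
#include <string.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 2, nrhs = 1, lda = 2, ldaf = 2, ldb = 2, ldx = 2, info;
    lapack_complex_float a[4], af[4], b[2], x[2];
    lapack_int ipiv[2];
    float ferr[1], berr[1];

    /* Upper triangle of the Hermitian matrix A = [ 4     1-2i ]
                                                  [ 1+2i  5    ], column major, uplo = 'U'.
       Element access assumes the default MKL_Complex8-style struct. */
    memset(a, 0, sizeof(a));
    a[0].real = 4.0f;                     /* A(1,1) */
    a[2].real = 1.0f; a[2].imag = -2.0f;  /* A(1,2) */
    a[3].real = 5.0f;                     /* A(2,2) */

    memset(b, 0, sizeof(b));
    b[0].real = 1.0f; b[1].real = 2.0f;   /* right-hand side B */

    memcpy(af, a, sizeof(a));             /* ?hetrf overwrites its input */
    info = LAPACKE_chetrf(LAPACK_COL_MAJOR, 'U', n, af, ldaf, ipiv);
    if (info != 0) { printf("chetrf failed, info = %d\n", (int)info); return 1; }

    memcpy(x, b, sizeof(b));              /* ?hetrs overwrites the right-hand side */
    info = LAPACKE_chetrs(LAPACK_COL_MAJOR, 'U', n, nrhs, af, ldaf, ipiv, x, ldb);
    if (info != 0) { printf("chetrs failed, info = %d\n", (int)info); return 1; }

    /* Iterative refinement plus component-wise forward/backward error estimates */
    info = LAPACKE_cherfs(LAPACK_COL_MAJOR, 'U', n, nrhs, a, lda, af, ldaf,
                          ipiv, b, ldb, x, ldx, ferr, berr);
    printf("cherfs info = %d, ferr = %g, berr = %g\n", (int)info, ferr[0], berr[0]);
    return 0;
}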
?herfsx
Uses extra precise iterative refinement to improve the
solution to the system of linear equations with a
Hermitian indefinite coefficient matrix A and provides
error bounds and backward error estimates.

Syntax
lapack_int LAPACKE_cherfsx( int matrix_layout, char uplo, char equed, lapack_int n,
lapack_int nrhs, const lapack_complex_float* a, lapack_int lda, const
lapack_complex_float* af, lapack_int ldaf, const lapack_int* ipiv, const float* s,

const lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x, lapack_int ldx,
float* rcond, float* berr, lapack_int n_err_bnds, float* err_bnds_norm, float*
err_bnds_comp, lapack_int nparams, float* params );
lapack_int LAPACKE_zherfsx( int matrix_layout, char uplo, char equed, lapack_int n,
lapack_int nrhs, const lapack_complex_double* a, lapack_int lda, const
lapack_complex_double* af, lapack_int ldaf, const lapack_int* ipiv, const double* s,
const lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x, lapack_int
ldx, double* rcond, double* berr, lapack_int n_err_bnds, double* err_bnds_norm,
double* err_bnds_comp, lapack_int nparams, double* params );

Include Files
• mkl.h

Description

The routine improves the computed solution to a system of linear equations when the coefficient matrix is
Hermitian indefinite, and provides error bounds and backward error estimates for the solution. In addition to
a normwise error bound, the code provides a maximum componentwise error bound, if possible. See
comments for err_bnds_norm and err_bnds_comp for details of the error bounds.

The original system of linear equations may have been equilibrated before calling this routine, as described
by the parameters equed and s below. In this case, the solution and error bounds returned are for the
original unequilibrated system.

Input Parameters

matrix_layout Specifies whether two-dimensional array storage is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:


If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

equed Must be 'N' or 'Y'.

Specifies the form of equilibration that was done to A before calling this
routine.
If equed = 'N', no equilibration was done.

If equed = 'Y', both row and column equilibration was done, that is, A has
been replaced by diag(s)*A*diag(s). The right-hand side B has been
changed accordingly.

n The number of linear equations; the order of the matrix A; n≥ 0.

nrhs The number of right-hand sides; the number of columns of the matrices B
and X; nrhs≥ 0.

a, af, b The array a of size max(1, lda*n) contains the Hermitian matrix A as
specified by uplo. If uplo = 'U', the leading n-by-n upper triangular part
of a contains the upper triangular part of the matrix A and the strictly lower


triangular part of a is not referenced. If uplo = 'L', the leading n-by-n


lower triangular part of a contains the lower triangular part of the matrix A
and the strictly upper triangular part of a is not referenced.

The array af of size max(1, ldaf*n) contains the block diagonal matrix D
and the multipliers used to obtain the factor U or L from the factorization A = U*D*U^H or
A = L*D*L^H as computed by chetrf for cherfsx or zhetrf for zherfsx.

The array b of size max(1, ldb*nrhs) for column major layout and max(1,
ldb*n) for row major layout contains the matrix B whose columns are
the right-hand sides for the systems of equations.

lda The leading dimension of a; lda≥ max(1, n).

ldaf The leading dimension of af; ldaf≥ max(1, n).

ipiv Array, size at least max(1, n). Contains details of the interchanges and the
block structure of D as determined by chetrf for cherfsx or zhetrf for
zherfsx.

s Array, size (n). The array s contains the scale factors for A.

If equed = 'N', s is not accessed.

If equed = 'Y', each element of s must be positive.

Each element of s should be a power of the radix to ensure a reliable


solution and error estimates. Scaling by powers of the radix does not cause
rounding errors unless the result underflows or overflows. Rounding errors
during scaling lead to refining with a matrix that is not equivalent to the
input matrix, producing error estimates that may not be reliable.

ldb The leading dimension of the array b; ldb≥ max(1, n) for column major
layout and ldb≥nrhs for row major layout.

x Array, size max(1, ldx*nrhs) for column major layout and max(1, ldx*n)
for row major layout.
The solution matrix X as computed by ?hetrs

ldx The leading dimension of the output array x; ldx≥ max(1, n) for column
major layout and ldx≥nrhs for row major layout.

n_err_bnds Number of error bounds to return for each right hand side and each type
(normwise or componentwise). See err_bnds_norm and err_bnds_comp
descriptions in Output Arguments section below.

nparams Specifies the number of parameters set in params. If ≤ 0, the params array
is never referenced and default values are used.

params Array, size nparams. Specifies algorithm parameters. If an entry is less than
0.0, that entry will be filled with the default value used for that parameter.
Only positions up to nparams are accessed; defaults are used for higher-
numbered parameters. If defaults are acceptable, you can pass nparams =
0, which prevents the source code from accessing the params argument.

params[0] : Whether to perform iterative refinement or not. Default: 1.0
(for cherfsx), 1.0D+0 (for zherfsx).

=0.0 No refinement is performed and no error bounds


are computed.

=1.0 Use the double-precision refinement algorithm,


possibly with doubled-single computations if the
compilation environment does not support
double precision.

(Other values are reserved for future use.)


params[1] : Maximum number of residual computations allowed for
refinement.

Default 10

Aggressive Set to 100 to permit convergence using


approximate factorizations or factorizations
other than LU. If the factorization uses a
technique other than Gaussian elimination, the
guarantees in err_bnds_norm and
err_bnds_comp may no longer be trustworthy.

params[2] : Flag determining if the code will attempt to find a solution


with a small componentwise relative error in the double-precision algorithm.
Positive is true, 0.0 is false. Default: 1.0 (attempt componentwise
convergence).

Output Parameters

x The improved solution matrix X.

rcond Reciprocal scaled condition number. An estimate of the reciprocal Skeel


condition number of the matrix A after equilibration (if done). If rcond is
less than the machine precision, in particular, if rcond = 0, the matrix is
singular to working precision. Note that the error may still be small even if
this number is very small and the matrix appears ill-conditioned.

berr Array, size at least max(1, nrhs). Contains the componentwise relative
backward error for each solution vector xj, that is, the smallest relative
change in any element of A or B that makes xj an exact solution.

err_bnds_norm Array of size nrhs*n_err_bnds. For each right-hand side, contains


information about various error bounds and condition numbers
corresponding to the normwise relative error, which is defined as follows:
Normwise relative error in the i-th solution vector:
max_j |x_true(j,i) - x(j,i)| / max_j |x(j,i)|

The array is indexed by the type of error information as described below.


There are currently up to three pieces of information returned.


err=1 "Trust/don't trust" boolean. Trust the answer if


the reciprocal condition number is less than the
threshold sqrt(n)*slamch(ε) for cherfsx and
sqrt(n)*dlamch(ε) for zherfsx.

err=2 "Guaranteed" error bound. The estimated


forward error, almost certainly within a factor of
10 of the true error so long as the next entry is
greater than the threshold sqrt(n)*slamch(ε)
for cherfsx and sqrt(n)*dlamch(ε) for
zherfsx. This error bound should only be
trusted if the previous boolean is true.

err=3 Reciprocal condition number. Estimated


normwise reciprocal condition number.
Compared with the threshold
sqrt(n)*slamch(ε) for single precision flavors
and sqrt(n)*dlamch(ε) for double precision
flavors to determine if the error estimate is
"guaranteed". These reciprocal condition
numbers for some appropriately scaled matrix Z
are:

Let z=s*a, where s scales each row by a power


of the radix so all absolute row sums of z are
approximately 1.

The information for right-hand side i, where 1 ≤i≤nrhs, and type of error
err is stored in:

• Column major layout: err_bnds_norm[(err - 1)*nrhs + i - 1].


• Row major layout: err_bnds_norm[err - 1 + (i - 1)*n_err_bnds]

err_bnds_comp Array of size nrhs*n_err_bnds. For each right-hand side, contains


information about various error bounds and condition numbers
corresponding to the componentwise relative error, which is defined as
follows:
Componentwise relative error in the i-th solution vector:
max_j |x_true(j,i) - x(j,i)| / |x(j,i)|

The array is indexed by the type of error information as described below.


There are currently up to three pieces of information returned for each
right-hand side. If componentwise accuracy is not requested (params[2] =
0.0), then err_bnds_comp is not accessed.

err=1 "Trust/don't trust" boolean. Trust the answer if


the reciprocal condition number is less than the
threshold sqrt(n)*slamch(ε) for cherfsx and
sqrt(n)*dlamch(ε) for zherfsx.

err=2 "Guaranteed" error bpound. The estimated
forward error, almost certainly within a factor of
10 of the true error so long as the next entry is
greater than the threshold sqrt(n)*slamch(ε)
for cherfsx and sqrt(n)*dlamch(ε) for
zherfsx. This error bound should only be
trusted if the previous boolean is true.

err=3 Reciprocal condition number. Estimated


componentwise reciprocal condition number.
Compared with the threshold
sqrt(n)*slamch(ε) for single precision flavors
and sqrt(n)*dlamch(ε) for double precision
flavors to determine if the error estimate is
"guaranteed". These reciprocal condition
numbers for some appropriately scaled matrix Z
are:

Let z=s*(a*diag(x)), where x is the solution


for the current right-hand side and s scales each
row of a*diag(x) by a power of the radix so all
absolute row sums of z are approximately 1.

The information for right-hand side i, where 1 ≤i≤nrhs, and type of error
err is stored in:

• Column major layout: err_bnds_comp[(err - 1)*nrhs + i - 1].


• Row major layout: err_bnds_comp[err - 1 + (i - 1)*n_err_bnds]

params Output parameter only if the input contains erroneous values, namely, in
params[0], params[1], params[2]. In such a case, the corresponding
elements of params are filled with default values on output.

Return Values
This function returns a value info.

If info = 0, the execution is successful. The solution to every right-hand side is guaranteed.

If info = -i, parameter i had an illegal value.

If 0 < info ≤ n: U(info, info) is exactly zero. The factorization has been completed, but the factor D is exactly
singular, so the solution and error bounds could not be computed; rcond = 0 is returned.

If info = n+j: The solution corresponding to the j-th right-hand side is not guaranteed. The solutions
corresponding to other right-hand sides k with k > j may not be guaranteed as well, but only the first such
right-hand side is reported. If a small componentwise error is not requested params[2] = 0.0, then the j-th
right-hand side is the first with a normwise error bound that is not guaranteed (the smallest j such that for
column major layout err_bnds_norm[j - 1] = 0.0 or err_bnds_comp[j - 1] = 0.0; or for row major
layout err_bnds_norm[(j - 1)*n_err_bnds] = 0.0 or err_bnds_comp[(j - 1)*n_err_bnds] = 0.0).
See the definition of err_bnds_norm and err_bnds_comp for err = 1. To get information about all of the
right-hand sides, check err_bnds_norm or err_bnds_comp.

See Also
Matrix Storage Schemes for LAPACK Routines


?sprfs
Refines the solution of a system of linear equations
with a packed symmetric coefficient matrix and
estimates the solution error.

Syntax
lapack_int LAPACKE_ssprfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const float* ap, const float* afp, const lapack_int* ipiv, const float* b, lapack_int
ldb, float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_dsprfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const double* ap, const double* afp, const lapack_int* ipiv, const double* b,
lapack_int ldb, double* x, lapack_int ldx, double* ferr, double* berr );
lapack_int LAPACKE_csprfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_float* ap, const lapack_complex_float* afp, const lapack_int*
ipiv, const lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x,
lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_zsprfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_double* ap, const lapack_complex_double* afp, const lapack_int*
ipiv, const lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x,
lapack_int ldx, double* ferr, double* berr );

Include Files
• mkl.h

Description

The routine performs an iterative refinement of the solution to a system of linear equations A*X = B with a
packed symmetric matrix A, with multiple right-hand sides. For each computed solution vector x, the routine
computes the component-wise backward error β. This error is the smallest relative perturbation in elements
of A and b such that x is the exact solution of the perturbed system:
|δa_ij| ≤ β|a_ij|, |δb_i| ≤ β|b_i| such that (A + δA)x = (b + δb).
Finally, the routine estimates the component-wise forward error in the computed solution ||x - x_e||_∞/||x||_∞
(here x_e is the exact solution).
Before calling this routine:

• call the factorization routine ?sptrf


• call the solver routine ?sptrs.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The order of the matrix A; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

ap,afp,b,x Arrays:
ap of size max(1, n(n+1)/2) contains the original packed matrix A, as
supplied to ?sptrf.

afp of size max(1, n(n+1)/2) contains the factored packed matrix A,


as returned by ?sptrf.

b of size max(1, ldb*nrhs) for column major layout and max(1,
ldb*n) for row major layout contains the right-hand side matrix B.
x of size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout contains the solution matrix X.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ldx The leading dimension of x; ldx≥ max(1, n) for column major layout
and ldx≥ max(1,nrhs) for row major layout.

ipiv Array, size at least max(1, n). The ipiv array, as returned by ?sptrf.

Output Parameters

x The refined solution matrix X.

ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and backward errors, respectively, for each solution vector.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The bounds returned in ferr are not rigorous, but in practice they almost always overestimate the actual
error.
For each right-hand side, computation of the backward error involves a minimum of 4n^2 floating-point
operations (for real flavors) or 16n^2 operations (for complex flavors). In addition, each step of iterative
refinement involves 6n^2 operations (for real flavors) or 24n^2 operations (for complex flavors); the number of
iterations may range from 1 to 5.
Estimating the forward error involves solving a number of systems of linear equations A*x = b; the number
of systems is usually 4 or 5 and never more than 11. Each solution requires approximately 2n^2 floating-point
operations for real flavors or 8n^2 for complex flavors.

See Also
Matrix Storage Schemes for LAPACK Routines
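
A minimal sketch of the call sequence for dsprfs, assuming an illustrative 2-by-2 symmetric system in packed
upper, column-major storage: factor with LAPACKE_dsptrf, solve with LAPACKE_dsptrs, then refine with
LAPACKE_dsprfs.

#include <stdio.h>
#include <string.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 2, nrhs = 1, ldb = 2, ldx = 2, info;
    /* Upper triangle of A = [ 4 1 ]
                             [ 1 3 ]  packed column major: A(1,1), A(1,2), A(2,2) */
    double ap[3] = { 4.0, 1.0, 3.0 };
    double afp[3];
    double b[2] = { 1.0, 2.0 }, x[2];
    lapack_int ipiv[2];
    double ferr[1], berr[1];

    memcpy(afp, ap, sizeof(ap));           /* ?sptrf overwrites its input */
    info = LAPACKE_dsptrf(LAPACK_COL_MAJOR, 'U', n, afp, ipiv);
    if (info != 0) { printf("dsptrf failed, info = %d\n", (int)info); return 1; }

    memcpy(x, b, sizeof(b));               /* ?sptrs overwrites the right-hand side */
    info = LAPACKE_dsptrs(LAPACK_COL_MAJOR, 'U', n, nrhs, afp, ipiv, x, ldb);
    if (info != 0) { printf("dsptrs failed, info = %d\n", (int)info); return 1; }

    info = LAPACKE_dsprfs(LAPACK_COL_MAJOR, 'U', n, nrhs, ap, afp, ipiv,
                          b, ldb, x, ldx, ferr, berr);
    printf("dsprfs info = %d, ferr = %g, berr = %g\n", (int)info, ferr[0], berr[0]);
    return 0;
}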

?hprfs
Refines the solution of a system of linear equations
with a packed complex Hermitian coefficient matrix
and estimates the solution error.


Syntax
lapack_int LAPACKE_chprfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_float* ap, const lapack_complex_float* afp, const lapack_int*
ipiv, const lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x,
lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_zhprfs( int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
const lapack_complex_double* ap, const lapack_complex_double* afp, const lapack_int*
ipiv, const lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x,
lapack_int ldx, double* ferr, double* berr );

Include Files
• mkl.h

Description

The routine performs an iterative refinement of the solution to a system of linear equations A*X = B with a
packed complex Hermitian matrix A, with multiple right-hand sides. For each computed solution vector x, the
routine computes the component-wise backward error β. This error is the smallest relative perturbation in
elements of A and b such that x is the exact solution of the perturbed system:
|δa_ij| ≤ β|a_ij|, |δb_i| ≤ β|b_i| such that (A + δA)x = (b + δb).
Finally, the routine estimates the component-wise forward error in the computed solution ||x - x_e||_∞/||x||_∞
(here x_e is the exact solution).
Before calling this routine:

• call the factorization routine ?hptrf


• call the solver routine ?hptrs.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The order of the matrix A; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

ap,afp,b,x Arrays:
ap of size max(1, n(n + 1)/2) contains the original packed matrix A, as
supplied to ?hptrf.

afp of size max(1, n(n + 1)/2) contains the factored packed matrix A, as
returned by ?hptrf.

b of size max(1, ldb*nrhs) for column major layout and max(1,
ldb*n) for row major layout contains the right-hand side matrix B.
x of size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout contains the solution matrix X.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ldx The leading dimension of x; ldx≥ max(1, n) for column major layout
and ldx≥nrhs for row major layout.

ipiv Array, size at least max(1, n). The ipiv array, as returned by ?hptrf.

Output Parameters

x The refined solution matrix X.

ferr, berr Arrays, size at least max(1,nrhs). Contain the component-wise


forward and backward errors, respectively, for each solution vector.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The bounds returned in ferr are not rigorous, but in practice they almost always overestimate the actual
error.
For each right-hand side, computation of the backward error involves a minimum of 16n^2 operations. In
addition, each step of iterative refinement involves 24n^2 operations; the number of iterations may range
from 1 to 5.
Estimating the forward error involves solving a number of systems of linear equations A*x = b; the number
is usually 4 or 5 and never more than 11. Each solution requires approximately 8n^2 floating-point operations.

The real counterpart of this routine is ?ssprfs/?dsprfs.

See Also
Matrix Storage Schemes for LAPACK Routines

?trrfs
Estimates the error in the solution of a system of
linear equations with a triangular coefficient matrix.

Syntax
lapack_int LAPACKE_strrfs( int matrix_layout, char uplo, char trans, char diag,
lapack_int n, lapack_int nrhs, const float* a, lapack_int lda, const float* b,
lapack_int ldb, const float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_dtrrfs( int matrix_layout, char uplo, char trans, char diag,
lapack_int n, lapack_int nrhs, const double* a, lapack_int lda, const double* b,
lapack_int ldb, const double* x, lapack_int ldx, double* ferr, double* berr );
lapack_int LAPACKE_ctrrfs( int matrix_layout, char uplo, char trans, char diag,
lapack_int n, lapack_int nrhs, const lapack_complex_float* a, lapack_int lda, const
lapack_complex_float* b, lapack_int ldb, const lapack_complex_float* x, lapack_int ldx,
float* ferr, float* berr );


lapack_int LAPACKE_ztrrfs( int matrix_layout, char uplo, char trans, char diag,
lapack_int n, lapack_int nrhs, const lapack_complex_double* a, lapack_int lda, const
lapack_complex_double* b, lapack_int ldb, const lapack_complex_double* x, lapack_int
ldx, double* ferr, double* berr );

Include Files
• mkl.h

Description

The routine estimates the errors in the solution to a system of linear equations A*X = B or A^T*X = B or
A^H*X = B with a triangular matrix A, with multiple right-hand sides. For each computed solution vector x, the
routine computes the component-wise backward error β. This error is the smallest relative perturbation in
elements of A and b such that x is the exact solution of the perturbed system:
|δa_ij| ≤ β|a_ij|, |δb_i| ≤ β|b_i| such that (A + δA)x = (b + δb).
The routine also estimates the component-wise forward error in the computed solution ||x - x_e||_∞/||x||_∞
(here x_e is the exact solution).
Before calling this routine, call the solver routine ?trtrs.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether A is upper or lower triangular:


If uplo = 'U', then A is upper triangular.

If uplo = 'L', then A is lower triangular.

trans Must be 'N' or 'T' or 'C'.

Indicates the form of the equations:


If trans = 'N', the system has the form A*X = B.

If trans = 'T', the system has the form A^T*X = B.

If trans = 'C', the system has the form A^H*X = B.

diag Must be 'N' or 'U'.

If diag = 'N', then A is not a unit triangular matrix.

If diag = 'U', then A is unit triangular: diagonal elements of A are


assumed to be 1 and not referenced in the array a.

n The order of the matrix A; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

a, b, x Arrays:
a(size max(1, lda*n)) contains the upper or lower triangular matrix
A, as specified by uplo.

b of size max(1, ldb*nrhs) for column major layout and max(1,
ldb*n) for row major layout contains the right-hand side matrix B.
x of size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout contains the solution matrix X.

lda The leading dimension of a; lda≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ldx The leading dimension of x; ldx≥ max(1, n) for column major layout
and ldx≥nrhs for row major layout.

Output Parameters

ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and backward errors, respectively, for each solution vector.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The bounds returned in ferr are not rigorous, but in practice they almost always overestimate the actual
error.
A call to this routine involves, for each right-hand side, solving a number of systems of linear equations A*x
= b; the number of systems is usually 4 or 5 and never more than 11. Each solution requires approximately
n^2 floating-point operations for real flavors or 4n^2 for complex flavors.

See Also
Matrix Storage Schemes for LAPACK Routines
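
A minimal sketch for dtrrfs, assuming an illustrative 2-by-2 upper triangular system in column-major layout:
solve with LAPACKE_dtrtrs, then estimate the forward and backward errors with LAPACKE_dtrrfs.

#include <stdio.h>
#include <string.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 2, nrhs = 1, lda = 2, ldb = 2, ldx = 2, info;
    /* Upper triangular A = [ 2 1 ]
                            [ 0 3 ], column major */
    double a[4] = { 2.0, 0.0, 1.0, 3.0 };
    double b[2] = { 3.0, 3.0 };        /* exact solution is x = (1, 1) */
    double x[2];
    double ferr[1], berr[1];

    memcpy(x, b, sizeof(b));           /* ?trtrs overwrites the right-hand side */
    info = LAPACKE_dtrtrs(LAPACK_COL_MAJOR, 'U', 'N', 'N', n, nrhs, a, lda, x, ldb);
    if (info != 0) { printf("dtrtrs failed, info = %d\n", (int)info); return 1; }

    info = LAPACKE_dtrrfs(LAPACK_COL_MAJOR, 'U', 'N', 'N', n, nrhs, a, lda,
                          b, ldb, x, ldx, ferr, berr);
    printf("dtrrfs info = %d, ferr = %g, berr = %g\n", (int)info, ferr[0], berr[0]);
    return 0;
}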

?tprfs
Estimates the error in the solution of a system of
linear equations with a packed triangular coefficient
matrix.

Syntax
lapack_int LAPACKE_stprfs( int matrix_layout, char uplo, char trans, char diag,
lapack_int n, lapack_int nrhs, const float* ap, const float* b, lapack_int ldb, const
float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_dtprfs( int matrix_layout, char uplo, char trans, char diag,
lapack_int n, lapack_int nrhs, const double* ap, const double* b, lapack_int ldb,
const double* x, lapack_int ldx, double* ferr, double* berr );
lapack_int LAPACKE_ctprfs( int matrix_layout, char uplo, char trans, char diag,
lapack_int n, lapack_int nrhs, const lapack_complex_float* ap, const
lapack_complex_float* b, lapack_int ldb, const lapack_complex_float* x, lapack_int ldx,
float* ferr, float* berr );


lapack_int LAPACKE_ztprfs( int matrix_layout, char uplo, char trans, char diag,
lapack_int n, lapack_int nrhs, const lapack_complex_double* ap, const
lapack_complex_double* b, lapack_int ldb, const lapack_complex_double* x, lapack_int
ldx, double* ferr, double* berr );

Include Files
• mkl.h

Description

The routine estimates the errors in the solution to a system of linear equations A*X = B or A^T*X = B or
A^H*X = B with a packed triangular matrix A, with multiple right-hand sides. For each computed solution
vector x, the routine computes the component-wise backward error β. This error is the smallest relative
perturbation in elements of A and b such that x is the exact solution of the perturbed system:
|δa_ij| ≤ β|a_ij|, |δb_i| ≤ β|b_i| such that (A + δA)x = (b + δb).
The routine also estimates the component-wise forward error in the computed solution ||x - x_e||_∞/||x||_∞
(here x_e is the exact solution).
Before calling this routine, call the solver routine ?tptrs.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether A is upper or lower triangular:


If uplo = 'U', then A is upper triangular.

If uplo = 'L', then A is lower triangular.

trans Must be 'N' or 'T' or 'C'.

Indicates the form of the equations:


If trans = 'N', the system has the form A*X = B.

If trans = 'T', the system has the form A^T*X = B.

If trans = 'C', the system has the form A^H*X = B.

diag Must be 'N' or 'U'.

If diag = 'N', A is not a unit triangular matrix.

If diag = 'U', A is unit triangular: diagonal elements of A are


assumed to be 1 and not referenced in the array ap.

n The order of the matrix A; n≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

ap, b, x Arrays:
ap of size max(1, n(n + 1)/2) contains the upper or lower triangular matrix A,
as specified by uplo.

b of size max(1, ldb*nrhs) for column major layout and max(1,
ldb*n) for row major layout contains the right-hand side matrix B.
x of size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout contains the solution matrix X.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ldx The leading dimension of x; ldx ≥ max(1, n) for column major layout
and ldx ≥ nrhs for row major layout.

Output Parameters

ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and backward errors, respectively, for each solution vector.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The bounds returned in ferr are not rigorous, but in practice they almost always overestimate the actual
error.
A call to this routine involves, for each right-hand side, solving a number of systems of linear equations A*x
= b; the number of systems is usually 4 or 5 and never more than 11. Each solution requires approximately
n^2 floating-point operations for real flavors or 4n^2 for complex flavors.

See Also
Matrix Storage Schemes for LAPACK Routines
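
A minimal sketch for dtprfs, assuming an illustrative 2-by-2 upper triangular system in packed, column-major
storage: solve with LAPACKE_dtptrs, then estimate the errors with LAPACKE_dtprfs.

#include <stdio.h>
#include <string.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 2, nrhs = 1, ldb = 2, ldx = 2, info;
    /* Upper triangular A = [ 2 1 ]
                            [ 0 3 ]  packed column major: A(1,1), A(1,2), A(2,2) */
    double ap[3] = { 2.0, 1.0, 3.0 };
    double b[2] = { 3.0, 3.0 };        /* exact solution is x = (1, 1) */
    double x[2];
    double ferr[1], berr[1];

    memcpy(x, b, sizeof(b));           /* ?tptrs overwrites the right-hand side */
    info = LAPACKE_dtptrs(LAPACK_COL_MAJOR, 'U', 'N', 'N', n, nrhs, ap, x, ldb);
    if (info != 0) { printf("dtptrs failed, info = %d\n", (int)info); return 1; }

    info = LAPACKE_dtprfs(LAPACK_COL_MAJOR, 'U', 'N', 'N', n, nrhs, ap,
                          b, ldb, x, ldx, ferr, berr);
    printf("dtprfs info = %d, ferr = %g, berr = %g\n", (int)info, ferr[0], berr[0]);
    return 0;
}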

?tbrfs
Estimates the error in the solution of a system of
linear equations with a triangular band coefficient
matrix.

Syntax
lapack_int LAPACKE_stbrfs( int matrix_layout, char uplo, char trans, char diag,
lapack_int n, lapack_int kd, lapack_int nrhs, const float* ab, lapack_int ldab, const
float* b, lapack_int ldb, const float* x, lapack_int ldx, float* ferr, float* berr );
lapack_int LAPACKE_dtbrfs( int matrix_layout, char uplo, char trans, char diag,
lapack_int n, lapack_int kd, lapack_int nrhs, const double* ab, lapack_int ldab, const
double* b, lapack_int ldb, const double* x, lapack_int ldx, double* ferr, double*
berr );
lapack_int LAPACKE_ctbrfs( int matrix_layout, char uplo, char trans, char diag,
lapack_int n, lapack_int kd, lapack_int nrhs, const lapack_complex_float* ab,
lapack_int ldab, const lapack_complex_float* b, lapack_int ldb, const
lapack_complex_float* x, lapack_int ldx, float* ferr, float* berr );


lapack_int LAPACKE_ztbrfs( int matrix_layout, char uplo, char trans, char diag,
lapack_int n, lapack_int kd, lapack_int nrhs, const lapack_complex_double* ab,
lapack_int ldab, const lapack_complex_double* b, lapack_int ldb, const
lapack_complex_double* x, lapack_int ldx, double* ferr, double* berr );

Include Files
• mkl.h

Description

The routine estimates the errors in the solution to a system of linear equations A*X = B or A^T*X = B or
A^H*X = B with a triangular band matrix A, with multiple right-hand sides. For each computed solution vector
x, the routine computes the component-wise backward error β. This error is the smallest relative
perturbation in elements of A and b such that x is the exact solution of the perturbed system:
|δa_ij| ≤ β|a_ij|, |δb_i| ≤ β|b_i| such that (A + δA)x = (b + δb).
The routine also estimates the component-wise forward error in the computed solution ||x - x_e||_∞/||x||_∞
(here x_e is the exact solution).
Before calling this routine, call the solver routine ?tbtrs.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether A is upper or lower triangular:


If uplo = 'U', then A is upper triangular.

If uplo = 'L', then A is lower triangular.

trans Must be 'N' or 'T' or 'C'.

Indicates the form of the equations:


If trans = 'N', the system has the form A*X = B.

If trans = 'T', the system has the form A^T*X = B.

If trans = 'C', the system has the form A^H*X = B.

diag Must be 'N' or 'U'.

If diag = 'N', A is not a unit triangular matrix.

If diag = 'U', A is unit triangular: diagonal elements of A are


assumed to be 1 and not referenced in the array ab.

n The order of the matrix A; n≥ 0.

kd The number of super-diagonals or sub-diagonals in the matrix A; kd ≥ 0.

nrhs The number of right-hand sides; nrhs≥ 0.

ab, b, x Arrays:

ab(size max(1, ldab*n)) contains the upper or lower triangular matrix
A, as specified by uplo, in band storage format.
b of size max(1, ldb*nrhs) for column major layout and max(1,
ldb*n) for row major layout contains the right-hand side matrix B.
x of size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout contains the solution matrix X.

ldab The leading dimension of the array ab; ldab≥kd +1.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ldx The leading dimension of x; ldx ≥ max(1, n) for column major layout
and ldx ≥ nrhs for row major layout.

Output Parameters

ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and backward errors, respectively, for each solution vector.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

Application Notes
The bounds returned in ferr are not rigorous, but in practice they almost always overestimate the actual
error.
A call to this routine involves, for each right-hand side, solving a number of systems of linear equations A*x
= b; the number of systems is usually 4 or 5 and never more than 11. Each solution requires approximately
2n*kd floating-point operations for real flavors or 8n*kd operations for complex flavors.

See Also
Matrix Storage Schemes for LAPACK Routines
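
A minimal sketch for dtbrfs, assuming an illustrative 3-by-3 upper triangular band system (kd = 1) in
column-major band storage: solve with LAPACKE_dtbtrs, then estimate the errors with LAPACKE_dtbrfs.

#include <stdio.h>
#include <string.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 3, kd = 1, nrhs = 1, ldab = 2, ldb = 3, ldx = 3, info;
    /* Upper triangular band matrix A = [ 2 1 0 ]
                                        [ 0 3 1 ]
                                        [ 0 0 4 ]  in band storage:
       ab(kd+1+i-j, j) = A(i,j), column major, ldab = kd + 1 */
    double ab[6] = { 0.0, 2.0,     /* column 1: unused, A(1,1) */
                     1.0, 3.0,     /* column 2: A(1,2), A(2,2) */
                     1.0, 4.0 };   /* column 3: A(2,3), A(3,3) */
    double b[3] = { 3.0, 4.0, 4.0 };   /* exact solution is x = (1, 1, 1) */
    double x[3];
    double ferr[1], berr[1];

    memcpy(x, b, sizeof(b));           /* ?tbtrs overwrites the right-hand side */
    info = LAPACKE_dtbtrs(LAPACK_COL_MAJOR, 'U', 'N', 'N', n, kd, nrhs, ab, ldab, x, ldb);
    if (info != 0) { printf("dtbtrs failed, info = %d\n", (int)info); return 1; }

    info = LAPACKE_dtbrfs(LAPACK_COL_MAJOR, 'U', 'N', 'N', n, kd, nrhs, ab, ldab,
                          b, ldb, x, ldx, ferr, berr);
    printf("dtbrfs info = %d, ferr = %g, berr = %g\n", (int)info, ferr[0], berr[0]);
    return 0;
}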

Matrix Inversion: LAPACK Computational Routines


It is seldom necessary to compute an explicit inverse of a matrix. In particular, do not attempt to solve a
system of equations Ax = b by first computing A^-1 and then forming the matrix-vector product x = A^-1*b.
Call a solver routine instead (see Routines for Solving Systems of Linear Equations); this is more efficient
and more accurate.
However, matrix inversion routines are provided for the rare occasions when an explicit inverse matrix is
needed.

?getri
Computes the inverse of an LU-factored general
matrix.

Syntax
lapack_int LAPACKE_sgetri (int matrix_layout , lapack_int n , float * a , lapack_int
lda , const lapack_int * ipiv );


lapack_int LAPACKE_dgetri (int matrix_layout , lapack_int n , double * a , lapack_int


lda , const lapack_int * ipiv );
lapack_int LAPACKE_cgetri (int matrix_layout , lapack_int n , lapack_complex_float *
a , lapack_int lda , const lapack_int * ipiv );
lapack_int LAPACKE_zgetri (int matrix_layout , lapack_int n , lapack_complex_double *
a , lapack_int lda , const lapack_int * ipiv );

Include Files
• mkl.h

Description

The routine computes the inverse inv(A) of a general matrix A. Before calling this routine, call ?getrf to
factorize A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

n The order of the matrix A; n≥ 0.

a Array a(size max(1, lda*n)) contains the factorization of the matrix


A, as returned by ?getrf: A = P*L*U. The second dimension of a
must be at least max(1,n).

lda The leading dimension of a; lda≥ max(1, n).

ipiv Array, size at least max(1, n).

The ipiv array, as returned by ?getrf.

Output Parameters

a Overwritten by the n-by-n matrix inv(A).

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.


If info = i, the i-th diagonal element of the factor U is zero, U is singular, and the inversion could not be
completed.

Application Notes
The computed inverse X satisfies the following error bound:

|XA - I| ≤c(n)ε|X|P|L||U|,

where c(n) is a modest linear function of n; ε is the machine precision; I denotes the identity matrix; P, L,
and U are the factors of the matrix factorization A = P*L*U.

The total number of floating-point operations is approximately (4/3)n^3 for real flavors and (16/3)n^3 for
complex flavors.

See Also
Matrix Storage Schemes for LAPACK Routines
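
A minimal sketch for dgetri, assuming an illustrative 2-by-2 matrix in column-major layout: factor with
LAPACKE_dgetrf, then overwrite the factorization with inv(A) using LAPACKE_dgetri.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 2, lda = 2, info;
    /* A = [ 1 2 ]
           [ 3 4 ], column major */
    double a[4] = { 1.0, 3.0, 2.0, 4.0 };
    lapack_int ipiv[2];

    info = LAPACKE_dgetrf(LAPACK_COL_MAJOR, n, n, a, lda, ipiv);   /* A = P*L*U */
    if (info != 0) { printf("dgetrf failed, info = %d\n", (int)info); return 1; }

    info = LAPACKE_dgetri(LAPACK_COL_MAJOR, n, a, lda, ipiv);      /* a is overwritten by inv(A) */
    printf("dgetri info = %d, inv(A) = [ %g %g ; %g %g ]\n",
           (int)info, a[0], a[2], a[1], a[3]);
    return 0;
}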
?potri
Computes the inverse of a symmetric (Hermitian)
positive-definite matrix using the Cholesky
factorization.

Syntax
lapack_int LAPACKE_spotri (int matrix_layout , char uplo , lapack_int n , float * a ,
lapack_int lda );
lapack_int LAPACKE_dpotri (int matrix_layout , char uplo , lapack_int n , double * a ,
lapack_int lda );
lapack_int LAPACKE_cpotri (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * a , lapack_int lda );
lapack_int LAPACKE_zpotri (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * a , lapack_int lda );

Include Files
• mkl.h

Description

The routine computes the inverse inv(A) of a symmetric positive definite or, for complex flavors, Hermitian
positive-definite matrix A. Before calling this routine, call ?potrf to factorize A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:


If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The order of the matrix A; n≥ 0.

a Array a(size max(1, lda*n)). Contains the factorization of the matrix


A, as returned by ?potrf.

lda The leading dimension of a; lda≥ max(1, n).

Output Parameters

a Overwritten by the upper or lower triangle of the inverse of A.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.


If info = i, the i-th diagonal element of the Cholesky factor (and therefore the factor itself) is zero, and the
inversion could not be completed.

Application Notes
The computed inverse X satisfies the following error bounds:

||XA - I||_2 ≤ c(n)εκ_2(A), ||AX - I||_2 ≤ c(n)εκ_2(A),

where c(n) is a modest linear function of n, and ε is the machine precision; I denotes the identity matrix.

The 2-norm ||A||_2 of a matrix A is defined by ||A||_2 = max_{x·x=1}(Ax·Ax)^(1/2), and the condition number
κ_2(A) is defined by κ_2(A) = ||A||_2 ||A^-1||_2.
The total number of floating-point operations is approximately (2/3)n^3 for real flavors and (8/3)n^3 for
complex flavors.

See Also
Matrix Storage Schemes for LAPACK Routines
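
A minimal sketch for dpotri, assuming an illustrative 2-by-2 symmetric positive-definite matrix in
column-major layout: factor with LAPACKE_dpotrf, then invert in place with LAPACKE_dpotri; only the
triangle selected by uplo is updated.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 2, lda = 2, info;
    /* Lower triangle of the positive-definite matrix A = [ 4 1 ]
                                                          [ 1 3 ], column major, uplo = 'L' */
    double a[4] = { 4.0, 1.0, 0.0, 3.0 };

    info = LAPACKE_dpotrf(LAPACK_COL_MAJOR, 'L', n, a, lda);   /* A = L*L^T */
    if (info != 0) { printf("dpotrf failed, info = %d\n", (int)info); return 1; }

    info = LAPACKE_dpotri(LAPACK_COL_MAJOR, 'L', n, a, lda);   /* lower triangle of inv(A) */
    printf("dpotri info = %d, inv(A) lower triangle: %g %g %g\n",
           (int)info, a[0], a[1], a[3]);
    return 0;
}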

?pftri
Computes the inverse of a symmetric (Hermitian)
positive-definite matrix in RFP format using the
Cholesky factorization.

Syntax
lapack_int LAPACKE_spftri (int matrix_layout , char transr , char uplo , lapack_int n ,
float * a );
lapack_int LAPACKE_dpftri (int matrix_layout , char transr , char uplo , lapack_int n ,
double * a );
lapack_int LAPACKE_cpftri (int matrix_layout , char transr , char uplo , lapack_int n ,
lapack_complex_float * a );
lapack_int LAPACKE_zpftri (int matrix_layout , char transr , char uplo , lapack_int n ,
lapack_complex_double * a );

Include Files
• mkl.h

Description

The routine computes the inverse inv(A) of a symmetric positive definite or, for complex data, Hermitian
positive-definite matrix A using the Cholesky factorization:

A = U^T*U for real data, A = U^H*U for complex data if uplo='U'

A = L*L^T for real data, A = L*L^H for complex data if uplo='L'

Before calling this routine, call ?pftrf to factorize A.

The matrix A is in the Rectangular Full Packed (RFP) format. For the description of the RFP format, see Matrix
Storage Schemes.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

transr Must be 'N', 'T' (for real data) or 'C' (for complex data).

If transr = 'N', the Normal transr of RFP U (if uplo = 'U') or L (if
uplo = 'L') is stored.
If transr = 'T', the Transpose transr of RFP U (if uplo = 'U') or L
(if uplo = 'L') is stored.

If transr = 'C', the Conjugate-Transpose transr of RFP U (if uplo
= 'U') or L (if uplo = 'L') is stored.

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:


If uplo = 'U', A = U^T*U for real data or A = U^H*U for complex data,
and U is stored.
If uplo = 'L', A = L*L^T for real data or A = L*L^H for complex data,
and L is stored.

n The order of the matrix A; n≥ 0.

a Array, size (n*(n+1)/2). The array a contains the factor U or L


matrix A in the RFP format.

Output Parameters

a The symmetric/Hermitian inverse of the original matrix in the same


storage format.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the (i,i) element of the factor U or L is zero, and the inverse could not be computed.

See Also
Matrix Storage Schemes for LAPACK Routines

?pptri
Computes the inverse of a packed symmetric
(Hermitian) positive-definite matrix using Cholesky
factorization.

Syntax
lapack_int LAPACKE_spptri (int matrix_layout , char uplo , lapack_int n , float * ap );
lapack_int LAPACKE_dpptri (int matrix_layout , char uplo , lapack_int n , double *
ap );
lapack_int LAPACKE_cpptri (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * ap );
lapack_int LAPACKE_zpptri (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * ap );


Include Files
• mkl.h

Description

The routine computes the inverse inv(A) of a symmetric positive definite or, for complex flavors, Hermitian
positive-definite matrix A in packed form. Before calling this routine, call ?pptrf to factorize A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular factor is stored in ap:

If uplo = 'U', then the upper triangular factor is stored.

If uplo = 'L', then the lower triangular factor is stored.

n The order of the matrix A; n≥ 0.

ap Array, size at least max(1, n(n+1)/2).


Contains the factorization of the packed matrix A, as returned by ?
pptrf.
The dimension ap must be at least max(1,n(n+1)/2).

Output Parameters

ap Overwritten by the packed n-by-n matrix inv(A).

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.


If info = i, the i-th diagonal element of the Cholesky factor (and therefore the factor itself) is zero, and the
inversion could not be completed.

Application Notes
The computed inverse X satisfies the following error bounds:

||XA - I||_2 ≤ c(n)εκ_2(A), ||AX - I||_2 ≤ c(n)εκ_2(A),

where c(n) is a modest linear function of n, and ε is the machine precision; I denotes the identity matrix.

The 2-norm ||A||_2 of a matrix A is defined by ||A||_2 = max_{x·x=1}(Ax·Ax)^(1/2), and the condition number
κ_2(A) is defined by κ_2(A) = ||A||_2 ||A^-1||_2.
The total number of floating-point operations is approximately (2/3)n^3 for real flavors and (8/3)n^3 for
complex flavors.

See Also
Matrix Storage Schemes for LAPACK Routines
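
A minimal sketch for dpptri, assuming an illustrative 2-by-2 symmetric positive-definite matrix in packed
upper, column-major storage: factor with LAPACKE_dpptrf, then invert in place with LAPACKE_dpptri.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 2, info;
    /* Upper triangle of the positive-definite matrix A = [ 4 1 ]
                                                          [ 1 3 ]  packed column major */
    double ap[3] = { 4.0, 1.0, 3.0 };

    info = LAPACKE_dpptrf(LAPACK_COL_MAJOR, 'U', n, ap);   /* A = U^T*U */
    if (info != 0) { printf("dpptrf failed, info = %d\n", (int)info); return 1; }

    info = LAPACKE_dpptri(LAPACK_COL_MAJOR, 'U', n, ap);   /* packed inv(A) */
    printf("dpptri info = %d, inv(A) packed: %g %g %g\n", (int)info, ap[0], ap[1], ap[2]);
    return 0;
}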

?sytri
Computes the inverse of a symmetric matrix using
U*D*U^T or L*D*L^T Bunch-Kaufman factorization.

Syntax
lapack_int LAPACKE_ssytri (int matrix_layout , char uplo , lapack_int n , float * a ,
lapack_int lda , const lapack_int * ipiv );
lapack_int LAPACKE_dsytri (int matrix_layout , char uplo , lapack_int n , double * a ,
lapack_int lda , const lapack_int * ipiv );
lapack_int LAPACKE_csytri (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * a , lapack_int lda , const lapack_int * ipiv );
lapack_int LAPACKE_zsytri (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * a , lapack_int lda , const lapack_int * ipiv );

Include Files
• mkl.h
Description

The routine computes the inverse inv(A) of a symmetric matrix A. Before calling this routine, call ?sytrf to
factorize A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:


If uplo = 'U', the array a stores the Bunch-Kaufman factorization A = U*D*U^T.
If uplo = 'L', the array a stores the Bunch-Kaufman factorization A = L*D*L^T.

n The order of the matrix A; n≥ 0.

a a(size max(1, lda*n)) contains the factorization of the matrix A, as


returned by ?sytrf.

lda The leading dimension of a; lda≥ max(1, n).

ipiv Array, size at least max(1, n).

The ipiv array, as returned by ?sytrf.

Output Parameters

a Overwritten by the n-by-n matrix inv(A).

Return Values
This function returns a value info.

If info = 0, the execution is successful.


If info =-i, parameter i had an illegal value.


If info = i, the i-th diagonal element of D is zero, D is singular, and the inversion could not be completed.

Application Notes
The computed inverse X satisfies the following error bounds:

|D*U^T*P^T*X*P*U - I| ≤ c(n)ε(|D||U^T|P^T|X|P|U| + |D||D^-1|)

for uplo = 'U', and

|D*L^T*P^T*X*P*L - I| ≤ c(n)ε(|D||L^T|P^T|X|P|L| + |D||D^-1|)

for uplo = 'L'. Here c(n) is a modest linear function of n, and ε is the machine precision; I denotes the
identity matrix.
The total number of floating-point operations is approximately (2/3)n^3 for real flavors and (8/3)n^3 for
complex flavors.

See Also
Matrix Storage Schemes for LAPACK Routines
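
A minimal sketch for dsytri, assuming an illustrative 2-by-2 symmetric matrix in column-major layout:
factor with LAPACKE_dsytrf, then invert in place with LAPACKE_dsytri.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 2, lda = 2, info;
    /* Upper triangle of A = [ 4 1 ]
                             [ 1 3 ], column major, uplo = 'U' */
    double a[4] = { 4.0, 0.0, 1.0, 3.0 };
    lapack_int ipiv[2];

    info = LAPACKE_dsytrf(LAPACK_COL_MAJOR, 'U', n, a, lda, ipiv);  /* A = U*D*U^T */
    if (info != 0) { printf("dsytrf failed, info = %d\n", (int)info); return 1; }

    info = LAPACKE_dsytri(LAPACK_COL_MAJOR, 'U', n, a, lda, ipiv);  /* upper triangle of inv(A) */
    printf("dsytri info = %d, inv(A) upper triangle: %g %g %g\n",
           (int)info, a[0], a[2], a[3]);
    return 0;
}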

?hetri
Computes the inverse of a complex Hermitian matrix
using U*D*U^H or L*D*L^H Bunch-Kaufman
factorization.

Syntax
lapack_int LAPACKE_chetri (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * a , lapack_int lda , const lapack_int * ipiv );
lapack_int LAPACKE_zhetri (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * a , lapack_int lda , const lapack_int * ipiv );

Include Files
• mkl.h

Description

The routine computes the inverse inv(A) of a complex Hermitian matrix A. Before calling this routine, call ?
hetrf to factorize A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:


If uplo = 'U', the array a stores the Bunch-Kaufman factorization A = U*D*U^H.
If uplo = 'L', the array a stores the Bunch-Kaufman factorization A = L*D*L^H.

n The order of the matrix A; n≥ 0.

a Array a(size max(1, lda*n)) contains the factorization of the matrix
A, as returned by ?hetrf. The second dimension of a must be at least
max(1,n).

lda The leading dimension of a; lda≥ max(1, n).

ipiv Array, size at least max(1, n). The ipiv array, as returned by ?hetrf.

Output Parameters

a Overwritten by the n-by-n matrix inv(A).

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.


If info = i, the i-th diagonal element of D is zero, D is singular, and the inversion could not be completed.

Application Notes
The computed inverse X satisfies the following error bounds:

|D*U^H*P^T*X*P*U - I| ≤ c(n)ε(|D||U^H|P^T|X|P|U| + |D||D^-1|)

for uplo = 'U', and

|D*L^H*P^T*X*P*L - I| ≤ c(n)ε(|D||L^H|P^T|X|P|L| + |D||D^-1|)

for uplo = 'L'. Here c(n) is a modest linear function of n, and ε is the machine precision; I denotes the
identity matrix.
The total number of floating-point operations is approximately (8/3)n^3 for complex flavors.

The real counterpart of this routine is ?sytri.

See Also
Matrix Storage Schemes for LAPACK Routines

?sytri2
Computes the inverse of a symmetric indefinite matrix
through allocating memory and calling ?sytri2x.

Syntax
lapack_int LAPACKE_ssytri2 (int matrix_layout , char uplo , lapack_int n , float * a ,
lapack_int lda , const lapack_int * ipiv );
lapack_int LAPACKE_dsytri2 (int matrix_layout , char uplo , lapack_int n , double * a ,
lapack_int lda , const lapack_int * ipiv );
lapack_int LAPACKE_csytri2 (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * a , lapack_int lda , const lapack_int * ipiv );
lapack_int LAPACKE_zsytri2 (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * a , lapack_int lda , const lapack_int * ipiv );


Include Files
• mkl.h

Description
The routine computes the inverse inv(A) of a symmetric indefinite matrix A using the factorization A =
U*D*U^T or A = L*D*L^T computed by ?sytrf.
The ?sytri2 routine allocates a temporary buffer before calling ?sytri2x that actually computes the
inverse.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:


If uplo = 'U', the array a stores the factorization A = U*D*U^T.

If uplo = 'L', the array a stores the factorization A = L*D*L^T.

n The order of the matrix A; n≥ 0.

a Array a(size max(1, lda*n)) contains the block diagonal matrix D and
the multipliers used to obtain the factor U or L as returned by ?sytrf.

lda The leading dimension of a; lda≥ max(1, n).

ipiv Array, size at least max(1, n).

Details of the interchanges and the block structure of D as returned


by ?sytrf.

Output Parameters

a If info = 0, the symmetric inverse of the original matrix.

If uplo = 'U', the upper triangular part of the inverse is formed and
the part of A below the diagonal is not referenced.
If uplo = 'L', the lower triangular part of the inverse is formed and
the part of A above the diagonal is not referenced.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info =-i, the i-th parameter had an illegal value.

If info = i, D(i,i) = 0; D is singular and its inversion could not be computed.

See Also
?sytrf
?sytri2x
Matrix Storage Schemes for LAPACK Routines

?hetri2
Computes the inverse of a Hermitian indefinite matrix
through allocating memory and calling ?hetri2x.

Syntax
lapack_int LAPACKE_chetri2 (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * a , lapack_int lda , const lapack_int * ipiv );
lapack_int LAPACKE_zhetri2 (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * a , lapack_int lda , const lapack_int * ipiv );

Include Files
• mkl.h

Description
The routine computes the inverse inv(A) of a Hermitian indefinite matrix A using the factorization A =
U*D*U^H or A = L*D*L^H computed by ?hetrf.
The ?hetri2 routine allocates a temporary buffer before calling ?hetri2x that actually computes the
inverse.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:


If uplo = 'U', the array a stores the factorization A = U*D*U^H.

If uplo = 'L', the array a stores the factorization A = L*D*L^H.

n The order of the matrix A; n≥ 0.

a Array a(size max(1, lda*n)) contains the block diagonal matrix D and
the multipliers used to obtain the factor U or L as returned by ?hetrf.

lda The leading dimension of a; lda≥ max(1, n).

ipiv Array, size at least max(1, n).

Details of the interchanges and the block structure of D as returned


by ?hetrf.

Output Parameters

a If info = 0, the inverse of the original matrix.

If uplo = 'U', the upper triangular part of the inverse is formed and
the part of A below the diagonal is not referenced.
If uplo = 'L', the lower triangular part of the inverse is formed and
the part of A above the diagonal is not referenced.

Return Values
This function returns a value info.


If info = 0, the execution is successful.

If info =-i, parameter i had an illegal value.

If info = i, D(i,i) = 0; D is singular and its inversion could not be computed.

See Also
?hetrf
?hetri2x
Matrix Storage Schemes for LAPACK Routines

?sytri2x
Computes the inverse of a symmetric indefinite matrix
after ?sytri2 allocates memory.

Syntax
lapack_int LAPACKE_ssytri2x (int matrix_layout , char uplo , lapack_int n , float * a ,
lapack_int lda , const lapack_int * ipiv , lapack_int nb );
lapack_int LAPACKE_dsytri2x (int matrix_layout , char uplo , lapack_int n , double *
a , lapack_int lda , const lapack_int * ipiv , lapack_int nb );
lapack_int LAPACKE_csytri2x (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * a , lapack_int lda , const lapack_int * ipiv , lapack_int nb );
lapack_int LAPACKE_zsytri2x (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * a , lapack_int lda , const lapack_int * ipiv , lapack_int nb );

Include Files
• mkl.h

Description
The routine computes the inverse inv(A) of a symmetric indefinite matrix A using the factorization A =
U*D*U^T or A = L*D*L^T computed by ?sytrf.
The ?sytri2x routine actually computes the inverse; it is called by ?sytri2 after that routine allocates a
temporary buffer.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:


If uplo = 'U', the array a stores the factorization A = U*D*U^T.

If uplo = 'L', the array a stores the factorization A = L*D*L^T.

n The order of the matrix A; n≥ 0.

a Array a (size max(1, lda*n)) contains the nb (block size) diagonal


matrix D and the multipliers used to obtain the factor U or L as
returned by ?sytrf. The second dimension of a must be at least
max(1,n).

lda The leading dimension of a; lda≥ max(1, n).

ipiv Array, size at least max(1, n).

Details of the interchanges and the nb structure of D as returned by ?


sytrf.

nb Block size.

Output Parameters

a If info = 0, the symmetric inverse of the original matrix.

If uplo = 'U', the upper triangular part of the inverse is formed and
the part of A below the diagonal is not referenced.
If uplo = 'L', the lower triangular part of the inverse is formed and
the part of A above the diagonal is not referenced.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info =-i, parameter i had an illegal value.

If info = i, Dii= 0; D is singular and its inversion could not be computed.

See Also
?sytrf
?sytri2
Matrix Storage Schemes for LAPACK Routines

?hetri2x
Computes the inverse of a Hermitian indefinite matrix
after ?hetri2 allocates memory.

Syntax
lapack_int LAPACKE_chetri2x (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * a , lapack_int lda , const lapack_int * ipiv , lapack_int nb );
lapack_int LAPACKE_zhetri2x (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * a , lapack_int lda , const lapack_int * ipiv , lapack_int nb );

Include Files
• mkl.h

Description
The routine computes the inverse inv(A) of a Hermitian indefinite matrix A using the factorization A =
U*D*U^H or A = L*D*L^H computed by ?hetrf.
The ?hetri2x routine actually computes the inverse; it is called by ?hetri2 after that routine allocates a
temporary buffer.


Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:


If uplo = 'U', the array a stores the factorization A = U*D*U^H.

If uplo = 'L', the array a stores the factorization A = L*D*L^H.

n The order of the matrix A; n≥ 0.

a Array a(size max(1, lda*n)) contains the nb (block size) diagonal


matrix D and the multipliers used to obtain the factor U or L as
returned by ?hetrf.

lda The leading dimension of a; lda≥ max(1, n).

ipiv Array, size at least max(1, n).

Details of the interchanges and the nb structure of D as returned by ?


hetrf.

nb Block size.

Output Parameters

a If info = 0, the inverse of the original matrix.

If uplo = 'U', the upper triangular part of the inverse is formed and
the part of A below the diagonal is not referenced.
If uplo = 'L', the lower triangular part of the inverse is formed and
the part of A above the diagonal is not referenced.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info =-i, parameter i had an illegal value.

If info = i, Dii= 0; D is singular and its inversion could not be computed.

See Also
?hetrf
?hetri2
Matrix Storage Schemes for LAPACK Routines

?sptri
Computes the inverse of a symmetric matrix using
U*D*U^T or L*D*L^T Bunch-Kaufman factorization of
matrix in packed storage.

Syntax
lapack_int LAPACKE_ssptri (int matrix_layout , char uplo , lapack_int n , float * ap ,
const lapack_int * ipiv );
lapack_int LAPACKE_dsptri (int matrix_layout , char uplo , lapack_int n , double * ap ,
const lapack_int * ipiv );
lapack_int LAPACKE_csptri (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * ap , const lapack_int * ipiv );
lapack_int LAPACKE_zsptri (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * ap , const lapack_int * ipiv );

Include Files
• mkl.h

Description

The routine computes the inverse inv(A) of a packed symmetric matrix A. Before calling this routine, call ?
sptrf to factorize A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:


If uplo = 'U', the array ap stores the Bunch-Kaufman factorization A = U*D*U^T.
If uplo = 'L', the array ap stores the Bunch-Kaufman factorization A = L*D*L^T.

n The order of the matrix A; n≥ 0.

ap Array ap (size max(1, n(n+1)/2)) contains the factorization of the matrix A, as returned by ?sptrf.

ipiv Array, size at least max(1, n). The ipiv array, as returned by ?sptrf.

Output Parameters

ap Overwritten by the matrix inv(A) in packed form.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.


If info = i, the i-th diagonal element of D is zero, D is singular, and the inversion could not be completed.


Application Notes
The computed inverse X satisfies the following error bounds:

|D*U^T*P^T*X*P*U - I| ≤ c(n)ε(|D| |U^T| P^T |X| P |U| + |D| |D^(-1)|)

for uplo = 'U', and

|D*L^T*P^T*X*P*L - I| ≤ c(n)ε(|D| |L^T| P^T |X| P |L| + |D| |D^(-1)|)

for uplo = 'L'. Here c(n) is a modest linear function of n, and ε is the machine precision; I denotes the
identity matrix.
The total number of floating-point operations is approximately (2/3)n^3 for real flavors and (8/3)n^3 for complex flavors.

See Also
Matrix Storage Schemes for LAPACK Routines
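
For illustration only, the following minimal sketch (not part of the reference) factorizes a small symmetric matrix in packed storage with LAPACKE_dsptrf (documented elsewhere in this chapter) and then inverts it with LAPACKE_dsptri; the matrix values and the column-major layout are arbitrary choices.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 3, info;
    /* Upper triangle of A packed column by column (column-major packing):
       A = [ 4 1 2; 1 5 3; 2 3 6 ] */
    double ap[6] = { 4.0,            /* A(1,1)                 */
                     1.0, 5.0,       /* A(1,2), A(2,2)         */
                     2.0, 3.0, 6.0   /* A(1,3), A(2,3), A(3,3) */ };
    lapack_int ipiv[3];

    info = LAPACKE_dsptrf(LAPACK_COL_MAJOR, 'U', n, ap, ipiv);      /* A = U*D*U^T */
    if (info == 0)
        info = LAPACKE_dsptri(LAPACK_COL_MAJOR, 'U', n, ap, ipiv);  /* ap <- inv(A), packed */
    printf("info = %d, inv(A)(1,1) = %f\n", (int)info, ap[0]);
    return (int)info;
}
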

?hptri
Computes the inverse of a complex Hermitian matrix
using U*D*U^H or L*D*L^H Bunch-Kaufman factorization
of matrix in packed storage.

Syntax
lapack_int LAPACKE_chptri (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * ap , const lapack_int * ipiv );
lapack_int LAPACKE_zhptri (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * ap , const lapack_int * ipiv );

Include Files
• mkl.h

Description

The routine computes the inverse inv(A) of a complex Hermitian matrix A using packed storage. Before
calling this routine, call ?hptrf to factorize A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:


If uplo = 'U', the array ap stores the packed Bunch-Kaufman factorization A = U*D*U^H.

If uplo = 'L', the array ap stores the packed Bunch-Kaufman factorization A = L*D*L^H.

n The order of the matrix A; n≥ 0.

ap Array ap (size max(1,n(n+1)/2)) contains the factorization of the


matrix A, as returned by ?hptrf.

ipiv Array, size at least max(1, n).

The ipiv array, as returned by ?hptrf.

Output Parameters

ap Overwritten by the matrix inv(A).

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the i-th diagonal element of D is zero, D is singular, and the inversion could not be completed.

Application Notes
The computed inverse X satisfies the following error bounds:

|D*U^H*P^T*X*P*U - I| ≤ c(n)ε(|D| |U^H| P^T |X| P |U| + |D| |D^(-1)|)

for uplo = 'U', and

|D*L^H*P^T*X*P*L - I| ≤ c(n)ε(|D| |L^H| P^T |X| P |L| + |D| |D^(-1)|)

for uplo = 'L'. Here c(n) is a modest linear function of n, and ε is the machine precision; I denotes the
identity matrix.
The total number of floating-point operations is approximately (8/3)n^3.

The real counterpart of this routine is ?sptri.

See Also
Matrix Storage Schemes for LAPACK Routines

?trtri
Computes the inverse of a triangular matrix.

Syntax
lapack_int LAPACKE_strtri (int matrix_layout , char uplo , char diag , lapack_int n ,
float * a , lapack_int lda );
lapack_int LAPACKE_dtrtri (int matrix_layout , char uplo , char diag , lapack_int n ,
double * a , lapack_int lda );
lapack_int LAPACKE_ctrtri (int matrix_layout , char uplo , char diag , lapack_int n ,
lapack_complex_float * a , lapack_int lda );
lapack_int LAPACKE_ztrtri (int matrix_layout , char uplo , char diag , lapack_int n ,
lapack_complex_double * a , lapack_int lda );

Include Files
• mkl.h

Description

The routine computes the inverse inv(A) of a triangular matrix A.


Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether A is upper or lower triangular:


If uplo = 'U', then A is upper triangular.

If uplo = 'L', then A is lower triangular.

diag Must be 'N' or 'U'.

If diag = 'N', then A is not a unit triangular matrix.

If diag = 'U', A is unit triangular: diagonal elements of A are


assumed to be 1 and not referenced in the array a.

n The order of the matrix A; n≥ 0.

a Array, size max(1, lda*n). Contains the matrix A.

lda The leading dimension of a; lda≥ max(1, n).

Output Parameters

a Overwritten by the matrix inv(A).

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.


If info = i, the i-th diagonal element of A is zero, A is singular, and the inversion could not be completed.

Application Notes
The computed inverse X satisfies the following error bounds:

|XA - I| ≤ c(n)ε |X||A|

|X - A^(-1)| ≤ c(n)ε |A^(-1)||A||X|,

where c(n) is a modest linear function of n; ε is the machine precision; I denotes the identity matrix.

The total number of floating-point operations is approximately (1/3)n^3 for real flavors and (4/3)n^3 for complex flavors.

See Also
Matrix Storage Schemes for LAPACK Routines
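
As an illustration (not part of the reference), a minimal call to LAPACKE_dtrtri that inverts a 3-by-3 upper triangular matrix stored in column-major order might look like this; the entries below the diagonal are never referenced, and the values are arbitrary.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 3, lda = 3, info;
    /* Upper triangular A, column-major: A(1,1)=2, A(1,2)=1, A(2,2)=3,
       A(1,3)=4, A(2,3)=5, A(3,3)=6; zeros below the diagonal are ignored. */
    double a[9] = { 2.0, 0.0, 0.0,
                    1.0, 3.0, 0.0,
                    4.0, 5.0, 6.0 };

    info = LAPACKE_dtrtri(LAPACK_COL_MAJOR, 'U', 'N', n, a, lda);  /* a <- inv(A) */
    printf("info = %d, inv(A)(1,1) = %f\n", (int)info, a[0]);
    return (int)info;
}
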
?tftri
Computes the inverse of a triangular matrix stored in
the Rectangular Full Packed (RFP) format.

Syntax
lapack_int LAPACKE_stftri (int matrix_layout , char transr , char uplo , char diag ,
lapack_int n , float * a );

lapack_int LAPACKE_dtftri (int matrix_layout , char transr , char uplo , char diag ,
lapack_int n , double * a );
lapack_int LAPACKE_ctftri (int matrix_layout , char transr , char uplo , char diag ,
lapack_int n , lapack_complex_float * a );
lapack_int LAPACKE_ztftri (int matrix_layout , char transr , char uplo , char diag ,
lapack_int n , lapack_complex_double * a );

Include Files
• mkl.h

Description

Computes the inverse of a triangular matrix A stored in the Rectangular Full Packed (RFP) format. For the
description of the RFP format, see Matrix Storage Schemes.
This is the block version of the algorithm, calling Level 3 BLAS.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

transr Must be 'N', 'T' (for real data) or 'C' (for complex data).

If transr = 'N', RFP A is stored in normal form.

If transr = 'T', RFP A is stored in transposed form.

If transr = 'C', RFP A is stored in conjugate-transposed form.

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of RFP A is


stored:
If uplo = 'U', the array a stores the upper triangular part of the
matrix A.
If uplo = 'L', the array a stores the lower triangular part of the
matrix A.

diag Must be 'N' or 'U'.

If diag = 'N', then A is not a unit triangular matrix.

If diag = 'U', A is unit triangular: diagonal elements of A are


assumed to be 1 and not referenced in the array a.

n The order of the matrix A; n≥ 0.

a Array, size max(1, n*(n + 1)/2). The array a contains the matrix A in
the RFP format.

Output Parameters

a The (triangular) inverse of the original matrix in the same storage


format.


Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, A(i, i) is exactly zero. The triangular matrix is singular and its inverse cannot be computed.

See Also
Matrix Storage Schemes for LAPACK Routines

?tptri
Computes the inverse of a triangular matrix using
packed storage.

Syntax
lapack_int LAPACKE_stptri (int matrix_layout , char uplo , char diag , lapack_int n ,
float * ap );
lapack_int LAPACKE_dtptri (int matrix_layout , char uplo , char diag , lapack_int n ,
double * ap );
lapack_int LAPACKE_ctptri (int matrix_layout , char uplo , char diag , lapack_int n ,
lapack_complex_float * ap );
lapack_int LAPACKE_ztptri (int matrix_layout , char uplo , char diag , lapack_int n ,
lapack_complex_double * ap );

Include Files
• mkl.h

Description

The routine computes the inverse inv(A) of a packed triangular matrix A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether A is upper or lower triangular:


If uplo = 'U', then A is upper triangular.

If uplo = 'L', then A is lower triangular.

diag Must be 'N' or 'U'.

If diag = 'N', then A is not a unit triangular matrix.

If diag = 'U', A is unit triangular: diagonal elements of A are


assumed to be 1 and not referenced in the array ap.

n The order of the matrix A; n≥ 0.

ap Array, size at least max(1,n(n+1)/2).

Contains the packed triangular matrix A.

Output Parameters

ap Overwritten by the packed n-by-n matrix inv(A).

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.


If info = i, the i-th diagonal element of A is zero, A is singular, and the inversion could not be completed.

Application Notes
The computed inverse X satisfies the following error bounds:

|XA - I| ≤ c(n)ε |X||A|

|X - A^(-1)| ≤ c(n)ε |A^(-1)||A||X|,

where c(n) is a modest linear function of n; ε is the machine precision; I denotes the identity matrix.

The total number of floating-point operations is approximately (1/3)n^3 for real flavors and (4/3)n^3 for complex flavors.

See Also
Matrix Storage Schemes for LAPACK Routines
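
A minimal sketch (illustrative, with arbitrary values) that inverts an upper triangular matrix held in packed storage with LAPACKE_dtptri:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 3, info;
    /* Upper triangular A packed column by column:
       A(1,1)=2, A(1,2)=1, A(2,2)=3, A(1,3)=4, A(2,3)=5, A(3,3)=6 */
    double ap[6] = { 2.0, 1.0, 3.0, 4.0, 5.0, 6.0 };

    info = LAPACKE_dtptri(LAPACK_COL_MAJOR, 'U', 'N', n, ap);  /* ap <- inv(A), packed */
    printf("info = %d, inv(A)(1,1) = %f\n", (int)info, ap[0]);
    return (int)info;
}
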

Matrix Equilibration: LAPACK Computational Routines


Routines described in this section are used to compute scaling factors needed to equilibrate a matrix. Note
that these routines do not actually scale the matrices.

?geequ
Computes row and column scaling factors intended to
equilibrate a general matrix and reduce its condition
number.

Syntax
lapack_int LAPACKE_sgeequ( int matrix_layout, lapack_int m, lapack_int n, const float*
a, lapack_int lda, float* r, float* c, float* rowcnd, float* colcnd, float* amax );
lapack_int LAPACKE_dgeequ( int matrix_layout, lapack_int m, lapack_int n, const double*
a, lapack_int lda, double* r, double* c, double* rowcnd, double* colcnd, double*
amax );
lapack_int LAPACKE_cgeequ( int matrix_layout, lapack_int m, lapack_int n, const
lapack_complex_float* a, lapack_int lda, float* r, float* c, float* rowcnd, float*
colcnd, float* amax );
lapack_int LAPACKE_zgeequ( int matrix_layout, lapack_int m, lapack_int n, const
lapack_complex_double* a, lapack_int lda, double* r, double* c, double* rowcnd,
double* colcnd, double* amax );


Include Files
• mkl.h

Description

The routine computes row and column scalings intended to equilibrate an m-by-n matrix A and reduce its
condition number. The output array r returns the row scale factors and the array c the column scale factors.
These factors are chosen to try to make the largest element in each row and column of the matrix B with elements b(i,j) = r[i-1]*a(i,j)*c[j-1] have absolute value 1.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

m The number of rows of the matrix A; m≥ 0.

n The number of columns of the matrix A; n≥ 0.

a Array: size max(1, lda*n) for column major layout and max(1,
lda*m) for row major layout.
Contains the m-by-n matrix A whose equilibration factors are to be
computed.

lda The leading dimension of a; lda≥ max(1, m).

Output Parameters

r, c Arrays: r (size m), c (size n).


If info = 0, or info>m, the array r contains the row scale factors of
the matrix A.
If info = 0, the array c contains the column scale factors of the
matrix A.

rowcnd If info = 0 or info>m, rowcnd contains the ratio of the smallest


r[i] to the largest r[i].

colcnd If info = 0, colcnd contains the ratio of the smallest c[i] to the
largest c[i].

amax Absolute value of the largest element of the matrix A.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, i > 0, and

i≤m, the i-th row of A is exactly zero;


i>m, the (i-m)th column of A is exactly zero.

Application Notes
All the components of r and c are restricted to be between SMLNUM = smallest safe number and BIGNUM=
largest safe number. Use of these scaling factors is not guaranteed to reduce the condition number of A but
works well in practice.
SMLNUM and BIGNUM are parameters representing machine precision. You can use the ?lamch routines to
compute them. For example, compute single precision values of SMLNUM and BIGNUM as follows:

SMLNUM = slamch ('s')


BIGNUM = 1 / SMLNUM

If rowcnd≥ 0.1 and amax is neither too large nor too small, it is not worth scaling by r.

If colcnd≥ 0.1, it is not worth scaling by c.

If amax is very close to SMLNUM or very close to BIGNUM, the matrix A should be scaled.

See Also
Error Analysis
Matrix Storage Schemes for LAPACK Routines
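
For illustration only, the following sketch calls LAPACKE_dgeequ on a badly row-scaled matrix and, when rowcnd indicates that scaling is worthwhile, forms the equilibrated matrix diag(r)*A*diag(c) explicitly; the matrix values and the 0.1 threshold follow the notes above but are otherwise arbitrary.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int m = 2, n = 2, lda = 2, info;
    /* Badly row-scaled 2-by-2 matrix, column-major. */
    double a[4] = { 1.0e+8, 2.0,      /* column 1: A(1,1), A(2,1) */
                    3.0e+8, 4.0 };    /* column 2: A(1,2), A(2,2) */
    double r[2], c[2], rowcnd, colcnd, amax;

    info = LAPACKE_dgeequ(LAPACK_COL_MAJOR, m, n, a, lda, r, c,
                          &rowcnd, &colcnd, &amax);
    if (info == 0 && rowcnd < 0.1) {
        /* Scaling by r is worthwhile: overwrite A with diag(r)*A*diag(c). */
        for (lapack_int j = 0; j < n; ++j)
            for (lapack_int i = 0; i < m; ++i)
                a[i + j*lda] *= r[i] * c[j];
    }
    printf("info=%d rowcnd=%g colcnd=%g amax=%g\n", (int)info, rowcnd, colcnd, amax);
    return (int)info;
}
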

?geequb
Computes row and column scaling factors restricted to
a power of radix to equilibrate a general matrix and
reduce its condition number.

Syntax
lapack_int LAPACKE_sgeequb( int matrix_layout, lapack_int m, lapack_int n, const float*
a, lapack_int lda, float* r, float* c, float* rowcnd, float* colcnd, float* amax );
lapack_int LAPACKE_dgeequb( int matrix_layout, lapack_int m, lapack_int n, const
double* a, lapack_int lda, double* r, double* c, double* rowcnd, double* colcnd,
double* amax );
lapack_int LAPACKE_cgeequb( int matrix_layout, lapack_int m, lapack_int n, const
lapack_complex_float* a, lapack_int lda, float* r, float* c, float* rowcnd, float*
colcnd, float* amax );
lapack_int LAPACKE_zgeequb( int matrix_layout, lapack_int m, lapack_int n, const
lapack_complex_double* a, lapack_int lda, double* r, double* c, double* rowcnd,
double* colcnd, double* amax );

Include Files
• mkl.h

Description

The routine computes row and column scalings intended to equilibrate an m-by-n general matrix A and reduce its condition number. The output array r returns the row scale factors and the array c the column scale factors. These factors are chosen to try to make the largest element in each row and column of the matrix B with elements b(i,j) = r[i-1]*a(i,j)*c[j-1] have an absolute value of at most the radix.

r[i-1] and c[j-1] are restricted to be a power of the radix between SMLNUM = smallest safe number and BIGNUM = largest safe number. Use of these scaling factors is not guaranteed to reduce the condition number of A, but it works well in practice.


SMLNUM and BIGNUM are parameters representing machine precision. You can use the ?lamch routines to
compute them. For example, compute single precision values of SMLNUM and BIGNUM as follows:

SMLNUM = slamch ('s')


BIGNUM = 1 / SMLNUM

This routine differs from ?geequ by restricting the scaling factors to a power of the radix. Except for over-
and underflow, scaling by these factors introduces no additional rounding errors. However, the scaled entries'
magnitudes are no longer equal to approximately 1 but lie between sqrt(radix) and 1/sqrt(radix).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

m The number of rows of the matrix A; m≥ 0.

n The number of columns of the matrix A; n≥ 0.

a Array: size max(1, lda*n) for column major layout and max(1,
lda*m) for row major layout.
Contains the m-by-n matrix A whose equilibration factors are to be
computed.

lda The leading dimension of a; lda≥ max(1, m).

Output Parameters

r, c Arrays: r(m), c(n).

If info = 0, or info>m, the array r contains the row scale factors for
the matrix A.
If info = 0, the array c contains the column scale factors for the
matrix A.

rowcnd If info = 0 or info>m, rowcnd contains the ratio of the smallest


r[i] to the largest r[i]. If rowcnd≥ 0.1, and amax is neither too
large nor too small, it is not worth scaling by r.

colcnd If info = 0, colcnd contains the ratio of the smallest c[i] to the
largest c[i]. If colcnd≥ 0.1, it is not worth scaling by c.

amax Absolute value of the largest element of the matrix A. If amax is very
close to SMLNUM or very close to BIGNUM, the matrix should be scaled.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, i > 0, and

i≤m, the i-th row of A is exactly zero;


i>m, the (i-m)-th column of A is exactly zero.

See Also
Error Analysis
Matrix Storage Schemes for LAPACK Routines
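
As an illustrative check (not part of the reference) that the factors returned by LAPACKE_dgeequb are powers of the radix, the sketch below inspects one factor with frexp(), which reports a mantissa of exactly 0.5 for any power of two on IEEE hardware; the matrix values are arbitrary.

#include <stdio.h>
#include <math.h>
#include <mkl.h>

int main(void) {
    lapack_int m = 2, n = 2, lda = 2, info;
    double a[4] = { 5.0e+7, 3.0,      /* column 1 */
                    2.0e+7, 7.0 };    /* column 2 */
    double r[2], c[2], rowcnd, colcnd, amax;
    int e;

    info = LAPACKE_dgeequb(LAPACK_COL_MAJOR, m, n, a, lda, r, c,
                           &rowcnd, &colcnd, &amax);
    /* r[0] is a power of the radix, so frexp() returns 0.5 as the mantissa. */
    printf("info=%d r[0]=%g (mantissa %g) rowcnd=%g amax=%g\n",
           (int)info, r[0], frexp(r[0], &e), rowcnd, amax);
    return (int)info;
}
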

?gbequ
Computes row and column scaling factors intended to
equilibrate a banded matrix and reduce its condition
number.

Syntax
lapack_int LAPACKE_sgbequ( int matrix_layout, lapack_int m, lapack_int n, lapack_int
kl, lapack_int ku, const float* ab, lapack_int ldab, float* r, float* c, float*
rowcnd, float* colcnd, float* amax );
lapack_int LAPACKE_dgbequ( int matrix_layout, lapack_int m, lapack_int n, lapack_int
kl, lapack_int ku, const double* ab, lapack_int ldab, double* r, double* c, double*
rowcnd, double* colcnd, double* amax );
lapack_int LAPACKE_cgbequ( int matrix_layout, lapack_int m, lapack_int n, lapack_int
kl, lapack_int ku, const lapack_complex_float* ab, lapack_int ldab, float* r, float*
c, float* rowcnd, float* colcnd, float* amax );
lapack_int LAPACKE_zgbequ( int matrix_layout, lapack_int m, lapack_int n, lapack_int
kl, lapack_int ku, const lapack_complex_double* ab, lapack_int ldab, double* r,
double* c, double* rowcnd, double* colcnd, double* amax );

Include Files
• mkl.h

Description

The routine computes row and column scalings intended to equilibrate an m-by-n band matrix A and reduce
its condition number. The output array r returns the row scale factors and the array c the column scale
factors. These factors are chosen to try to make the largest element in each row and column of the matrix B with elements b(i,j) = r[i-1]*a(i,j)*c[j-1] have absolute value 1.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

m The number of rows of the matrix A; m≥ 0.

n The number of columns of the matrix A; n≥ 0.

kl The number of subdiagonals within the band of A; kl≥ 0.

ku The number of superdiagonals within the band of A; ku≥ 0.

ab Array, size max(1, ldab*n) for column major layout and max(1,
ldab*m) for row major layout. Contains the original band matrix A.

ldab The leading dimension of ab; ldab≥kl+ku+1.


Output Parameters

r, c Arrays: r (size m), c (size n).


If info = 0, or info>m, the array r contains the row scale factors of
the matrix A.
If info = 0, the array c contains the column scale factors of the
matrix A.

rowcnd If info = 0 or info>m, rowcnd contains the ratio of the smallest


r[i] to the largest r[i].

colcnd If info = 0, colcnd contains the ratio of the smallest c[i] to the
largest c[i].

amax Absolute value of the largest element of the matrix A.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i and

i≤m, the i-th row of A is exactly zero;


i>m, the (i-m)th column of A is exactly zero.

Application Notes
All the components of r and c are restricted to be between SMLNUM = smallest safe number and BIGNUM=
largest safe number. Use of these scaling factors is not guaranteed to reduce the condition number of A but
works well in practice.
SMLNUM and BIGNUM are parameters representing machine precision. You can use the ?lamch routines to
compute them. For example, compute single precision values of SMLNUM and BIGNUM as follows:

SMLNUM = slamch ('s')


BIGNUM = 1 / SMLNUM

If rowcnd≥ 0.1 and amax is neither too large nor too small, it is not worth scaling by r.

If colcnd≥ 0.1, it is not worth scaling by c.

If amax is very close to SMLNUM or very close to BIGNUM, the matrix A should be scaled.

See Also
Error Analysis
Matrix Storage Schemes for LAPACK Routines
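
For illustration only, the following sketch builds a small tridiagonal matrix in LAPACK band storage (kl = ku = 1, so ldab = 3) and asks LAPACKE_dgbequ for its equilibration factors; with column-major layout, element A(i,j) is stored at ab[(ku + i - j) + j*ldab] for 0-based i and j. The values are arbitrary.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int m = 4, n = 4, kl = 1, ku = 1, info;
    lapack_int ldab = kl + ku + 1;                       /* = 3 */
    double ab[3*4] = { 0.0 };                            /* band storage of A */
    double r[4], c[4], rowcnd, colcnd, amax;

    /* Fill a tridiagonal test matrix: diagonal 4.0e+6, off-diagonals 1.0. */
    for (lapack_int j = 0; j < n; ++j) {
        lapack_int ilo = (j - ku > 0) ? j - ku : 0;
        lapack_int ihi = (j + kl < m - 1) ? j + kl : m - 1;
        for (lapack_int i = ilo; i <= ihi; ++i)
            ab[(ku + i - j) + j*ldab] = (i == j) ? 4.0e+6 : 1.0;
    }

    info = LAPACKE_dgbequ(LAPACK_COL_MAJOR, m, n, kl, ku, ab, ldab,
                          r, c, &rowcnd, &colcnd, &amax);
    printf("info=%d rowcnd=%g colcnd=%g amax=%g\n", (int)info, rowcnd, colcnd, amax);
    return (int)info;
}
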
?gbequb
Computes row and column scaling factors restricted to
a power of radix to equilibrate a banded matrix and
reduce its condition number.

Syntax
lapack_int LAPACKE_sgbequb( int matrix_layout, lapack_int m, lapack_int n, lapack_int
kl, lapack_int ku, const float* ab, lapack_int ldab, float* r, float* c, float*
rowcnd, float* colcnd, float* amax );

lapack_int LAPACKE_dgbequb( int matrix_layout, lapack_int m, lapack_int n, lapack_int
kl, lapack_int ku, const double* ab, lapack_int ldab, double* r, double* c, double*
rowcnd, double* colcnd, double* amax );
lapack_int LAPACKE_cgbequb( int matrix_layout, lapack_int m, lapack_int n, lapack_int
kl, lapack_int ku, const lapack_complex_float* ab, lapack_int ldab, float* r, float*
c, float* rowcnd, float* colcnd, float* amax );
lapack_int LAPACKE_zgbequb( int matrix_layout, lapack_int m, lapack_int n, lapack_int
kl, lapack_int ku, const lapack_complex_double* ab, lapack_int ldab, double* r,
double* c, double* rowcnd, double* colcnd, double* amax );

Include Files
• mkl.h

Description

The routine computes row and column scalings intended to equilibrate an m-by-n banded matrix A and reduce its condition number. The output array r returns the row scale factors and the array c the column scale factors. These factors are chosen to try to make the largest element in each row and column of the matrix B with elements b(i,j) = r[i-1]*a(i,j)*c[j-1] have an absolute value of at most the radix.

r[i-1] and c[j-1] are restricted to be a power of the radix between SMLNUM = smallest safe number and BIGNUM = largest safe number. Use of these scaling factors is not guaranteed to reduce the condition number of A, but it works well in practice.
SMLNUM and BIGNUM are parameters representing machine precision. You can use the ?lamch routines to
compute them. For example, compute single precision values of SMLNUM and BIGNUM as follows:

SMLNUM = slamch ('s')


BIGNUM = 1 / SMLNUM

This routine differs from ?gbequ by restricting the scaling factors to a power of the radix. Except for over-
and underflow, scaling by these factors introduces no additional rounding errors. However, the scaled entries'
magnitudes are no longer equal to approximately 1 but lie between sqrt(radix) and 1/sqrt(radix).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

m The number of rows of the matrix A; m≥ 0.

n The number of columns of the matrix A; n≥ 0.

kl The number of subdiagonals within the band of A; kl≥ 0.

ku The number of superdiagonals within the band of A; ku≥ 0.

ab Array, size max(1, ldab*n) for column major layout and max(1, ldab*m) for row major layout. Contains the original band matrix A.

ldab The leading dimension of ab; ldab≥ max(1, m).

Output Parameters

r, c Arrays: r (size m), c (size n).


If info = 0, or info>m, the array r contains the row scale factors for
the matrix A.
If info = 0, the array c contains the column scale factors for the
matrix A.

rowcnd If info = 0 or info>m, rowcnd contains the ratio of the smallest


r[i] to the largest r[i]. If rowcnd≥ 0.1, and amax is neither too
large nor too small, it is not worth scaling by r.

colcnd If info = 0, colcnd contains the ratio of the smallest c[i] to the
largest c[i]. If colcnd≥ 0.1, it is not worth scaling by c.

amax Absolute value of the largest element of the matrix A. If amax is very
close to SMLNUM or BIGNUM, the matrix should be scaled.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, i > 0, and

i≤m, the i-th row of A is exactly zero;

i>m, the (i-m)-th column of A is exactly zero.

See Also
Error Analysis
Matrix Storage Schemes for LAPACK Routines
?poequ
Computes row and column scaling factors intended to
equilibrate a symmetric (Hermitian) positive definite
matrix and reduce its condition number.

Syntax
lapack_int LAPACKE_spoequ( int matrix_layout, lapack_int n, const float* a, lapack_int
lda, float* s, float* scond, float* amax );
lapack_int LAPACKE_dpoequ( int matrix_layout, lapack_int n, const double* a, lapack_int
lda, double* s, double* scond, double* amax );
lapack_int LAPACKE_cpoequ( int matrix_layout, lapack_int n, const lapack_complex_float*
a, lapack_int lda, float* s, float* scond, float* amax );
lapack_int LAPACKE_zpoequ( int matrix_layout, lapack_int n, const
lapack_complex_double* a, lapack_int lda, double* s, double* scond, double* amax );

Include Files
• mkl.h

Description

The routine computes row and column scalings intended to equilibrate a symmetric (Hermitian) positive-definite matrix A and reduce its condition number (with respect to the two-norm). The output array s returns scale factors such that s[i-1] = 1/sqrt(A(i,i)).

These factors are chosen so that the scaled matrix B with elements B(i,j) = s[i-1]*A(i,j)*s[j-1] has diagonal elements equal to 1.
This choice of s puts the condition number of B within a factor n of the smallest possible condition number
over all possible diagonal scalings.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

n The order of the matrix A; n≥ 0.

a Array: size max(1, lda*n) .

Contains the n-by-n symmetric or Hermitian positive definite matrix A


whose scaling factors are to be computed. Only the diagonal elements
of A are referenced.

lda The leading dimension of a; lda≥ max(1,n).

Output Parameters

s Array, size n.
If info = 0, the array s contains the scale factors for A.

scond If info = 0, scond contains the ratio of the smallest s[i] to the
largest s[i].

amax Absolute value of the largest element of the matrix A.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the i-th diagonal element of A is nonpositive.

Application Notes
If scond≥ 0.1 and amax is neither too large nor too small, it is not worth scaling by s.

If amax is very close to SMLNUM or very close to BIGNUM, the matrix A should be scaled.

See Also
Error Analysis
Matrix Storage Schemes for LAPACK Routines
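
For illustration (not part of the reference), the sketch below calls LAPACKE_dpoequ on a positive definite matrix whose diagonal entries differ by several orders of magnitude; only the diagonal is read, and the returned s satisfies s[i-1] = 1/sqrt(A(i,i)). The values are arbitrary.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 2, lda = 2, info;
    /* Symmetric positive definite matrix, column-major;
       ?poequ reads only the diagonal entries. */
    double a[4] = { 4.0e+6, 2.0,
                    2.0,    1.0e-2 };
    double s[2], scond, amax;

    info = LAPACKE_dpoequ(LAPACK_COL_MAJOR, n, a, lda, s, &scond, &amax);
    /* diag(s)*A*diag(s) has unit diagonal when info = 0. */
    printf("info=%d s=(%g, %g) scond=%g amax=%g\n", (int)info, s[0], s[1], scond, amax);
    return (int)info;
}
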


?poequb
Computes row and column scaling factors intended to
equilibrate a symmetric (Hermitian) positive definite
matrix and reduce its condition number.

Syntax
lapack_int LAPACKE_spoequb( int matrix_layout, lapack_int n, const float* a, lapack_int
lda, float* s, float* scond, float* amax );
lapack_int LAPACKE_dpoequb( int matrix_layout, lapack_int n, const double* a,
lapack_int lda, double* s, double* scond, double* amax );
lapack_int LAPACKE_cpoequb( int matrix_layout, lapack_int n, const
lapack_complex_float* a, lapack_int lda, float* s, float* scond, float* amax );
lapack_int LAPACKE_zpoequb( int matrix_layout, lapack_int n, const
lapack_complex_double* a, lapack_int lda, double* s, double* scond, double* amax );

Include Files
• mkl.h

Description

The routine computes row and column scalings intended to equilibrate a symmetric (Hermitian) positive-
definite matrix A and reduce its condition number (with respect to the two-norm).
These factors are chosen so that the scaled matrix B with elements B(i,j) = s[i-1]*A(i,j)*s[j-1] has diagonal elements equal to 1. s[i-1] is a power of two nearest to, but not exceeding, 1/sqrt(A(i,i)).

This choice of s puts the condition number of B within a factor n of the smallest possible condition number
over all possible diagonal scalings.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

n The order of the matrix A; n≥ 0.

a Array, size max(1, lda*n).

Contains the n-by-n symmetric or Hermitian positive definite matrix A whose scaling factors are to be computed. Only the diagonal elements of A are referenced.

lda The leading dimension of a; lda≥ max(1, n).

Output Parameters

s Array, size (n).

If info = 0, the array s contains the scale factors for A.

scond If info = 0, scond contains the ratio of the smallest s[i] to the
largest s[i]. If scond≥ 0.1, and amax is neither too large nor too
small, it is not worth scaling by s.

amax Absolute value of the largest element of the matrix A. If amax is very
close to SMLNUM or BIGNUM, the matrix should be scaled.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the i-th diagonal element of A is nonpositive.

See Also
Error Analysis
Matrix Storage Schemes for LAPACK Routines

?ppequ
Computes row and column scaling factors intended to
equilibrate a symmetric (Hermitian) positive definite
matrix in packed storage and reduce its condition
number.

Syntax
lapack_int LAPACKE_sppequ( int matrix_layout, char uplo, lapack_int n, const float* ap,
float* s, float* scond, float* amax );
lapack_int LAPACKE_dppequ( int matrix_layout, char uplo, lapack_int n, const double*
ap, double* s, double* scond, double* amax );
lapack_int LAPACKE_cppequ( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_float* ap, float* s, float* scond, float* amax );
lapack_int LAPACKE_zppequ( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_double* ap, double* s, double* scond, double* amax );

Include Files
• mkl.h

Description

The routine computes row and column scalings intended to equilibrate a symmetric (Hermitian) positive definite matrix A in packed storage and reduce its condition number (with respect to the two-norm). The output array s returns scale factors such that s[i-1] = 1/sqrt(A(i,i)).

These factors are chosen so that the scaled matrix B with elements b(i,j) = s[i-1]*a(i,j)*s[j-1] has diagonal elements equal to 1.
This choice of s puts the condition number of B within a factor n of the smallest possible condition number
over all possible diagonal scalings.


Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is packed in


the array ap:
If uplo = 'U', the array ap stores the upper triangular part of the
matrix A.
If uplo = 'L', the array ap stores the lower triangular part of the
matrix A.

n The order of matrix A; n≥ 0.

ap Array, size at least max(1,n(n+1)/2). The array ap contains the


upper or the lower triangular part of the matrix A (as specified by
uplo) in packed storage (see Matrix Storage Schemes).

Output Parameters

s Array, size (n).


If info = 0, the array s contains the scale factors for A.

scond If info = 0, scond contains the ratio of the smallest s[i] to the
largest s[i].

amax Absolute value of the largest element of the matrix A.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the i-th diagonal element of A is nonpositive.

Application Notes
If scond≥ 0.1 and amax is neither too large nor too small, it is not worth scaling by s.

If amax is very close to SMLNUM or very close to BIGNUM, the matrix A should be scaled.

See Also
Error Analysis
Matrix Storage Schemes for LAPACK Routines
?pbequ
Computes row and column scaling factors intended to
equilibrate a symmetric (Hermitian) positive-definite
band matrix and reduce its condition number.

Syntax
lapack_int LAPACKE_spbequ( int matrix_layout, char uplo, lapack_int n, lapack_int kd,
const float* ab, lapack_int ldab, float* s, float* scond, float* amax );

lapack_int LAPACKE_dpbequ( int matrix_layout, char uplo, lapack_int n, lapack_int kd,
const double* ab, lapack_int ldab, double* s, double* scond, double* amax );
lapack_int LAPACKE_cpbequ( int matrix_layout, char uplo, lapack_int n, lapack_int kd,
const lapack_complex_float* ab, lapack_int ldab, float* s, float* scond, float* amax );
lapack_int LAPACKE_zpbequ( int matrix_layout, char uplo, lapack_int n, lapack_int kd,
const lapack_complex_double* ab, lapack_int ldab, double* s, double* scond, double*
amax );

Include Files
• mkl.h

Description

The routine computes row and column scalings intended to equilibrate a symmetric (Hermitian) positive definite band matrix A and reduce its condition number (with respect to the two-norm). The output array s returns scale factors such that s[i-1] = 1/sqrt(A(i,i)).

These factors are chosen so that the scaled matrix B with elements b(i,j) = s[i-1]*a(i,j)*s[j-1] has diagonal elements equal to 1. This choice of s puts the condition number of B within a factor n of the smallest possible condition number over all possible diagonal scalings.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored in


the array ab:
If uplo = 'U', the array ab stores the upper triangular part of the
matrix A.
If uplo = 'L', the array ab stores the lower triangular part of the
matrix A.

n The order of matrix A; n≥ 0.

kd The number of superdiagonals or subdiagonals in the matrix A; kd≥ 0.

ab Array, size max(1, ldab*n) .

The array ab contains either the upper or the lower triangular part of
the matrix A (as specified by uplo) in band storage (see Matrix
Storage Schemes).

ldab The leading dimension of the array ab; ldab≥kd +1.


Output Parameters

s Array, size (n).


If info = 0, the array s contains the scale factors for A.

scond If info = 0, scond contains the ratio of the smallest s[i] to the
largest s[i].

amax Absolute value of the largest element of the matrix A.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the i-th diagonal element of A is nonpositive.

Application Notes
If scond≥ 0.1 and amax is neither too large nor too small, it is not worth scaling by s.

If amax is very close to SMLNUM or very close to BIGNUM, the matrix A should be scaled.

See Also
Error Analysis
Matrix Storage Schemes for LAPACK Routines

?syequb
Computes row and column scaling factors intended to
equilibrate a symmetric indefinite matrix and reduce
its condition number.

Syntax
lapack_int LAPACKE_ssyequb( int matrix_layout, char uplo, lapack_int n, const float* a,
lapack_int lda, float* s, float* scond, float* amax );
lapack_int LAPACKE_dsyequb( int matrix_layout, char uplo, lapack_int n, const double*
a, lapack_int lda, double* s, double* scond, double* amax );
lapack_int LAPACKE_csyequb( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_float* a, lapack_int lda, float* s, float* scond, float* amax );
lapack_int LAPACKE_zsyequb( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_double* a, lapack_int lda, double* s, double* scond, double* amax );

Include Files
• mkl.h

Description

The routine computes row and column scalings intended to equilibrate a symmetric indefinite matrix A and
reduce its condition number (with respect to the two-norm).
The array s contains the scale factors, s[i-1] = 1/sqrt(A(i,i)). These factors are chosen so that the scaled matrix B with elements b(i,j) = s[i-1]*a(i,j)*s[j-1] has ones on the diagonal.

This choice of s puts the condition number of B within a factor n of the smallest possible condition number
over all possible diagonal scalings.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:


If uplo = 'U', the array a stores the upper triangular part of the
matrix A.
If uplo = 'L', the array a stores the lower triangular part of the
matrix A.

n The order of the matrix A; n≥ 0.

a Array a, size max(1, lda*n).

Contains the n-by-n symmetric indefinite matrix A whose scaling factors are to be computed. Only the diagonal elements of A are referenced.

lda The leading dimension of a; lda≥ max(1, n).

Output Parameters

s Array, size (n).

If info = 0, the array s contains the scale factors for A.

scond If info = 0, scond contains the ratio of the smallest s[i] to the
largest s[i]. If scond≥ 0.1, and amax is neither too large nor too
small, it is not worth scaling by s.

amax Absolute value of the largest element of the matrix A. If amax is very
close to SMLNUM or BIGNUM, the matrix should be scaled.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the i-th diagonal element of A is nonpositive.

See Also
Error Analysis
Matrix Storage Schemes for LAPACK Routines
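
A minimal illustrative call (arbitrary values) to LAPACKE_dsyequb on a poorly scaled symmetric matrix, checking scond against the 0.1 threshold mentioned above before deciding whether scaling is worthwhile:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 2, lda = 2, info;
    /* Symmetric matrix, column-major; the upper triangle is supplied. */
    double a[4] = { 1.0e+6, 0.0,
                    3.0,    2.0e-6 };
    double s[2], scond, amax;

    info = LAPACKE_dsyequb(LAPACK_COL_MAJOR, 'U', n, a, lda, s, &scond, &amax);
    if (info == 0 && scond < 0.1)
        printf("scaling is worthwhile: s = (%g, %g)\n", s[0], s[1]);
    printf("info=%d scond=%g amax=%g\n", (int)info, scond, amax);
    return (int)info;
}
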

?heequb
Computes row and column scaling factors intended to
equilibrate a Hermitian indefinite matrix and reduce its
condition number.


Syntax
lapack_int LAPACKE_cheequb( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_float* a, lapack_int lda, float* s, float* scond, float* amax );
lapack_int LAPACKE_zheequb( int matrix_layout, char uplo, lapack_int n, const
lapack_complex_double* a, lapack_int lda, double* s, double* scond, double* amax );

Include Files
• mkl.h

Description

The routine computes row and column scalings intended to equilibrate a Hermitian indefinite matrix A and
reduce its condition number (with respect to the two-norm).
The array s contains the scale factors, s[i-1] = 1/sqrt(A(i,i)). These factors are chosen so that the scaled matrix B with elements b(i,j) = s[i-1]*a(i,j)*s[j-1] has ones on the diagonal.

This choice of s puts the condition number of B within a factor n of the smallest possible condition number
over all possible diagonal scalings.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:


If uplo = 'U', the array a stores the upper triangular part of the
matrix A.
If uplo = 'L', the array a stores the lower triangular part of the
matrix A.

n The order of the matrix A; n≥ 0.

a Array a, size max(1, lda*n).

Contains the n-by-n Hermitian indefinite matrix A whose scaling factors are to be computed. Only the diagonal elements of A are referenced.

lda The leading dimension of a; lda≥ max(1, n).

Output Parameters

s Array, size (n).

If info = 0, the array s contains the scale factors for A.

scond If info = 0, scond contains the ratio of the smallest s[i] to the
largest s[i]. If scond≥ 0.1, and amax is neither too large nor too
small, it is not worth scaling by s.

amax Absolute value of the largest element of the matrix A. If amax is very
close to SMLNUM or BIGNUM, the matrix should be scaled.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the i-th diagonal element of A is nonpositive.

See Also
Error Analysis
Matrix Storage Schemes for LAPACK Routines

LAPACK Linear Equation Driver Routines


Table "Driver Routines for Solving Systems of Linear Equations" lists the LAPACK driver routines for solving
systems of linear equations with real or complex matrices.
Driver Routines for Solving Systems of Linear Equations

Matrix type, storage scheme                  Simple Driver      Expert Driver     Expert Driver using Extra-Precise Iterative Refinement

general                                      ?gesv              ?gesvx            ?gesvxx
general band                                 ?gbsv              ?gbsvx            ?gbsvxx
general tridiagonal                          ?gtsv              ?gtsvx
diagonally dominant tridiagonal              ?dtsvb
symmetric/Hermitian positive-definite        ?posv              ?posvx            ?posvxx
symmetric/Hermitian positive-definite,       ?ppsv              ?ppsvx
  packed storage
symmetric/Hermitian positive-definite, band  ?pbsv              ?pbsvx
symmetric/Hermitian positive-definite,       ?ptsv              ?ptsvx
  tridiagonal
symmetric/Hermitian indefinite               ?sysv/?hesv,       ?sysvx/?hesvx     ?sysvxx/?hesvxx
                                             ?sysv_rook
symmetric/Hermitian indefinite,              ?spsv/?hpsv        ?spsvx/?hpsvx
  packed storage
complex symmetric                            ?sysv,             ?sysvx
                                             ?sysv_rook
complex symmetric, packed storage            ?spsv              ?spsvx

In this table ? stands for s (single precision real), d (double precision real), c (single precision complex), or z (double precision complex). In the description of the ?gesv and ?posv routines, the ? sign stands for the combined character codes ds and zc for the mixed precision subroutines.


?gesv
Computes the solution to the system of linear
equations with a square coefficient matrix A and
multiple right-hand sides.

Syntax
lapack_int LAPACKE_sgesv (int matrix_layout , lapack_int n , lapack_int nrhs , float *
a , lapack_int lda , lapack_int * ipiv , float * b , lapack_int ldb );
lapack_int LAPACKE_dgesv (int matrix_layout , lapack_int n , lapack_int nrhs , double *
a , lapack_int lda , lapack_int * ipiv , double * b , lapack_int ldb );
lapack_int LAPACKE_cgesv (int matrix_layout , lapack_int n , lapack_int nrhs ,
lapack_complex_float * a , lapack_int lda , lapack_int * ipiv , lapack_complex_float *
b , lapack_int ldb );
lapack_int LAPACKE_zgesv (int matrix_layout , lapack_int n , lapack_int nrhs ,
lapack_complex_double * a , lapack_int lda , lapack_int * ipiv , lapack_complex_double
* b , lapack_int ldb );
lapack_int LAPACKE_dsgesv (int matrix_layout, lapack_int n, lapack_int nrhs, double *
a, lapack_int lda, lapack_int * ipiv, double * b, lapack_int ldb, double * x,
lapack_int ldx, lapack_int * iter);
lapack_int LAPACKE_zcgesv (int matrix_layout, lapack_int n, lapack_int nrhs,
lapack_complex_double * a, lapack_int lda, lapack_int * ipiv, lapack_complex_double *
b, lapack_int ldb, lapack_complex_double * x, lapack_int ldx, lapack_int * iter);

Include Files
• mkl.h

Description

The routine solves for X the system of linear equations A*X = B, where A is an n-by-n matrix, the columns
of matrix B are individual right-hand sides, and the columns of X are the corresponding solutions.
The LU decomposition with partial pivoting and row interchanges is used to factor A as A = P*L*U, where P
is a permutation matrix, L is unit lower triangular, and U is upper triangular. The factored form of A is then
used to solve the system of equations A*X = B.

The dsgesv and zcgesv are mixed precision iterative refinement subroutines for exploiting fast single
precision hardware. They first attempt to factorize the matrix in single precision (dsgesv) or single complex
precision (zcgesv) and use this factorization within an iterative refinement procedure to produce a solution
with double precision (dsgesv) / double complex precision (zcgesv) normwise backward error quality (see
below). If the approach fails, the method switches to a double precision or double complex precision
factorization respectively and computes the solution.
The iterative refinement is not going to be a winning strategy if the ratio of single precision performance to double precision performance is too small. A reasonable strategy should take the number of right-hand sides and the size of the matrix into account. This might be done with a call to ilaenv in the future. At present, iterative refinement is always attempted.
The iterative refinement process is stopped if

iter > itermax

or for all the right-hand sides:

rnrm < sqrt(n)*xnrm*anrm*eps*bwdmax

where

• iter is the number of the current iteration in the iterative refinement process
• rnrm is the infinity-norm of the residual
• xnrm is the infinity-norm of the solution
• anrm is the infinity-operator-norm of the matrix A
• eps is the machine epsilon returned by dlamch('Epsilon')

The values itermax and bwdmax are fixed to 30 and 1.0d+00 respectively.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

n The number of linear equations, that is, the order of the matrix A; n≥
0.

nrhs The number of right-hand sides, that is, the number of columns of the
matrix B; nrhs≥ 0.

a The array a(size max(1, lda*n)) contains the n-by-n coefficient


matrix A.

b The array b of size max(1, ldb*nrhs) for column major layout and max(1, ldb*n) for row major layout contains the n-by-nrhs right-hand side matrix B.

lda The leading dimension of the array a; lda≥ max(1, n).

ldb The leading dimension of the array b; ldb≥ max(1, n) for column
major layout and ldb≥nrhs for row major layout.

ldx The leading dimension of the array x; ldx≥ max(1, n) for column
major layout and ldx≥nrhs for row major layout.

Output Parameters

a Overwritten by the factors L and U from the factorization of A =


P*L*U; the unit diagonal elements of L are not stored.
If iterative refinement has been successfully used (info= 0 and
iter≥ 0), then A is unchanged.
If double precision factorization has been used (info= 0 and iter <
0), then the array A contains the factors L and U from the
factorization A = P*L*U; the unit diagonal elements of L are not
stored.

b Overwritten by the solution matrix X for sgesv, dgesv, cgesv, and zgesv.

Unchanged for dsgesv and zcgesv.


ipiv Array, size at least max(1, n). The pivot indices that define the
permutation matrix P; row i of the matrix was interchanged with row
ipiv[i-1]. Corresponds to the single precision factorization (if
info= 0 and iter≥ 0) or the double precision factorization (if info=
0 and iter < 0).

x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout. If info = 0, contains the n-by-nrhs
solution matrix X.

iter
If iter < 0: iterative refinement has failed, double precision
factorization has been performed

• If iter = -1: the routine fell back to full precision for an implementation- or machine-specific reason
• If iter = -2: narrowing the precision induced an overflow, and the routine fell back to full precision
• If iter = -3: failure of sgetrf for dsgesv, or cgetrf for zcgesv
• If iter = -31: the iterative refinement was stopped after the 30th iteration.

If iter > 0: iterative refinement has been successfully used. Returns


the number of iterations.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, U(i, i) (computed in double precision for mixed precision subroutines) is exactly zero. The
factorization has been completed, but the factor U is exactly singular, so the solution could not be computed.

See Also
?lamch
?getrf
Matrix Storage Schemes for LAPACK Routines
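
For illustration only, a minimal program solving a 3-by-3 system with LAPACKE_dgesv; the coefficient matrix and right-hand side are arbitrary, and b is overwritten with the solution on exit.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 3, nrhs = 1, lda = 3, ldb = 3, info;
    /* Column-major coefficient matrix A and right-hand side b; the exact
       solution of this system is x = (1, 1, 1). */
    double a[9] = { 3.0, 1.0, 2.0,     /* column 1 */
                    1.0, 5.0, 1.0,     /* column 2 */
                    2.0, 1.0, 4.0 };   /* column 3 */
    double b[3] = { 6.0, 7.0, 7.0 };
    lapack_int ipiv[3];

    info = LAPACKE_dgesv(LAPACK_COL_MAJOR, n, nrhs, a, lda, ipiv, b, ldb);
    if (info == 0)
        printf("x = (%f, %f, %f)\n", b[0], b[1], b[2]);   /* b now holds X */
    else
        printf("dgesv failed, info = %d\n", (int)info);
    return (int)info;
}
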

?gesvx
Computes the solution to the system of linear
equations with a square coefficient matrix A and
multiple right-hand sides, and provides error bounds
on the solution.

Syntax
lapack_int LAPACKE_sgesvx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int nrhs, float* a, lapack_int lda, float* af, lapack_int ldaf, lapack_int*
ipiv, char* equed, float* r, float* c, float* b, lapack_int ldb, float* x, lapack_int
ldx, float* rcond, float* ferr, float* berr, float* rpivot );
lapack_int LAPACKE_dgesvx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int nrhs, double* a, lapack_int lda, double* af, lapack_int ldaf, lapack_int*
ipiv, char* equed, double* r, double* c, double* b, lapack_int ldb, double* x,
lapack_int ldx, double* rcond, double* ferr, double* berr, double* rpivot );

lapack_int LAPACKE_cgesvx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int nrhs, lapack_complex_float* a, lapack_int lda, lapack_complex_float* af,
lapack_int ldaf, lapack_int* ipiv, char* equed, float* r, float* c,
lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x, lapack_int ldx,
float* rcond, float* ferr, float* berr, float* rpivot );
lapack_int LAPACKE_zgesvx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int nrhs, lapack_complex_double* a, lapack_int lda, lapack_complex_double* af,
lapack_int ldaf, lapack_int* ipiv, char* equed, double* r, double* c,
lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x, lapack_int ldx,
double* rcond, double* ferr, double* berr, double* rpivot );

Include Files
• mkl.h

Description

The routine uses the LU factorization to compute the solution to a real or complex system of linear equations
A*X = B, where A is an n-by-n matrix, the columns of matrix B are individual right-hand sides, and the
columns of X are the corresponding solutions.
Error bounds on the solution and a condition estimate are also provided.
The routine ?gesvx performs the following steps:

1. If fact = 'E', real scaling factors r and c are computed to equilibrate the system:

trans = 'N': diag(r)*A*diag(c)*inv(diag(c))*X = diag(r)*B


trans = 'T': (diag(r)*A*diag(c))^T*inv(diag(r))*X = diag(c)*B
trans = 'C': (diag(r)*A*diag(c))^H*inv(diag(r))*X = diag(c)*B
Whether or not the system will be equilibrated depends on the scaling of the matrix A, but if
equilibration is used, A is overwritten by diag(r)*A*diag(c) and B by diag(r)*B (if trans='N') or
diag(c)*B (if trans = 'T' or 'C').
2. If fact = 'N' or 'E', the LU decomposition is used to factor the matrix A (after equilibration if fact
= 'E') as A = P*L*U, where P is a permutation matrix, L is a unit lower triangular matrix, and U is
upper triangular.
3. If some U(i, i) = 0, so that U is exactly singular, then the routine returns with info = i. Otherwise, the
factored form of A is used to estimate the condition number of the matrix A. If the reciprocal of the
condition number is less than machine precision, info = n + 1 is returned as a warning, but the
routine still goes on to solve for X and compute error bounds as described below.
4. The system of equations is solved for X using the factored form of A.
5. Iterative refinement is applied to improve the computed solution matrix and calculate error bounds and
backward error estimates for it.
6. If equilibration was used, the matrix X is premultiplied by diag(c) (if trans = 'N') or diag(r) (if
trans = 'T' or 'C') so that it solves the original system before equilibration.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

fact Must be 'F', 'N', or 'E'.


Specifies whether or not the factored form of the matrix A is supplied


on entry, and if not, whether the matrix A should be equilibrated
before it is factored.
If fact = 'F': on entry, af and ipiv contain the factored form of A. If
equed is not 'N', the matrix A has been equilibrated with scaling
factors given by r and c.
a, af, and ipiv are not modified.
If fact = 'N', the matrix A will be copied to af and factored.

If fact = 'E', the matrix A will be equilibrated if necessary, then


copied to af and factored.

trans Must be 'N', 'T', or 'C'.

Specifies the form of the system of equations:


If trans = 'N', the system has the form A*X = B (No transpose).

If trans = 'T', the system has the form A^T*X = B (Transpose).

If trans = 'C', the system has the form A^H*X = B (Transpose for real flavors, conjugate transpose for complex flavors).

n The number of linear equations; the order of the matrix A; n≥ 0.

nrhs The number of right hand sides; the number of columns of the
matrices B and X; nrhs≥ 0.

a The array a(size max(1, lda*n)) contains the matrix A. If fact =


'F' and equed is not 'N', then A must have been equilibrated by the
scaling factors in r and/or c.

af The array af (size max(1, ldaf*n)) is an input argument if fact =


'F'. It contains the factored form of the matrix A, that is, the factors
L and U from the factorization A = P*L*U as computed by ?getrf. If
equed is not 'N', then af is the factored form of the equilibrated
matrix A.

b The array b of size max(1, ldb*nrhs) for column major layout and
max(1, ldb*n) for row major layout contains the matrix B whose
columns are the right-hand sides for the systems of equations.

lda The leading dimension of a; lda≥ max(1, n).

ldaf The leading dimension of af; ldaf≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ipiv Array, size at least max(1, n). The array ipiv is an input argument if
fact = 'F'. It contains the pivot indices from the factorization A =
P*L*U as computed by ?getrf; row i of the matrix was interchanged
with row ipiv[i-1].

equed Must be 'N', 'R', 'C', or 'B'.

equed is an input argument if fact = 'F'. It specifies the form of
equilibration that was done:
If equed = 'N', no equilibration was done (always true if fact =
'N').
If equed = 'R', row equilibration was done, that is, A has been
premultiplied by diag(r).
If equed = 'C', column equilibration was done, that is, A has been
postmultiplied by diag(c).
If equed = 'B', both row and column equilibration was done, that is,
A has been replaced by diag(r)*A*diag(c).

r, c Arrays: r (size n), c (size n). The array r contains the row scale
factors for A, and the array c contains the column scale factors for A.
These arrays are input arguments if fact = 'F' only; otherwise they
are output arguments.
If equed = 'R' or 'B', A is multiplied on the left by diag(r); if equed
= 'N' or 'C', r is not accessed.
If fact = 'F' and equed = 'R' or 'B', each element of r must be
positive.
If equed = 'C' or 'B', A is multiplied on the right by diag(c); if
equed = 'N' or 'R', c is not accessed.
If fact = 'F' and equed = 'C' or 'B', each element of c must be
positive.

ldx The leading dimension of the output array x; ldx≥ max(1, n) for
column major layout and ldx≥nrhs for row major layout.

Output Parameters

x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.
If info = 0 or info = n+1, the array x contains the solution matrix
X to the original system of equations. Note that A and B are modified
on exit if equed≠'N', and the solution to the equilibrated system is:

inv(diag(c))*X, if trans = 'N' and equed = 'C' or 'B';

inv(diag(r))*X, if trans = 'T' or 'C' and equed = 'R' or 'B'. The second dimension of x must be at least max(1, nrhs).

a Array a is not modified on exit if fact = 'F' or 'N', or if fact =


'E' and equed = 'N'. If equed≠'N', A is scaled on exit as follows:
equed = 'R': A = diag(R)*A

equed = 'C': A = A*diag(c)

equed = 'B': A = diag(R)*A*diag(c).


af If fact = 'N' or 'E', then af is an output argument and on exit


returns the factors L and U from the factorization A = P*L*U of the
original matrix A (if fact = 'N') or of the equilibrated matrix A (if
fact = 'E'). See the description of a for the form of the equilibrated
matrix.

b Overwritten by diag(r)*B if trans = 'N' and equed = 'R' or 'B';

overwritten by diag(c)*B if trans = 'T' or 'C' and equed = 'C'


or 'B';

not changed if equed = 'N'.

r, c These arrays are output arguments if fact≠'F'. See the description


of r, c in Input Arguments section.

rcond An estimate of the reciprocal condition number of the matrix A after


equilibration (if done). If rcond is less than the machine precision, in
particular, if rcond = 0, the matrix is singular to working precision.
This condition is indicated by a return code of info > 0.

ferr Array, size at least max(1, nrhs). Contains the estimated forward
error bound for each solution vector xj (the j-th column of the
solution matrix X). If xtrue is the true solution corresponding to xj,
ferr[j-1] is an estimated upper bound for the magnitude of the
largest element in (xj - xtrue) divided by the magnitude of the
largest element in xj. The estimate is as reliable as the estimate for
rcond, and is almost always a slight overestimate of the true error.

berr Array, size at least max(1, nrhs). Contains the component-wise


relative backward error for each solution vector xj, that is, the
smallest relative change in any element of A or B that makes xj an
exact solution.

ipiv If fact = 'N'or 'E', then ipiv is an output argument and on exit
contains the pivot indices from the factorization A = P*L*U of the
original matrix A (if fact = 'N') or of the equilibrated matrix A (if
fact = 'E').

equed If fact≠'F', then equed is an output argument. It specifies the form
of equilibration that was done (see the description of equed in Input
Arguments section).

rpivot On exit, rpivot contains the reciprocal pivot growth factor:

If rpivot is much less than 1, then the stability of the LU
factorization of the (equilibrated) matrix A could be poor. This also
means that the solution x, condition estimator rcond, and forward
error bound ferr could be unreliable. If factorization fails with 0 <
info≤n, then rpivot contains the reciprocal pivot growth factor for
the leading info columns of A.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, and i≤n, then U(i, i) is exactly zero. The factorization has been completed, but the factor U is
exactly singular, so the solution and error bounds could not be computed; rcond = 0 is returned.
If info = n + 1, then U is nonsingular, but rcond is less than machine precision, meaning that the matrix is
singular to working precision. Nevertheless, the solution and error bounds are computed because there are a
number of situations where the computed solution can be more accurate than the value of rcond would
suggest.

See Also
Matrix Storage Schemes for LAPACK Routines

?gesvxx
Uses extra precise iterative refinement to compute the
solution to the system of linear equations with a
square coefficient matrix A and multiple right-hand
sides

Syntax
lapack_int LAPACKE_sgesvxx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int nrhs, float* a, lapack_int lda, float* af, lapack_int ldaf, lapack_int*
ipiv, char* equed, float* r, float* c, float* b, lapack_int ldb, float* x, lapack_int
ldx, float* rcond, float* rpvgrw, float* berr, lapack_int n_err_bnds, float*
err_bnds_norm, float* err_bnds_comp, lapack_int nparams, const float* params );
lapack_int LAPACKE_dgesvxx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int nrhs, double* a, lapack_int lda, double* af, lapack_int ldaf, lapack_int*
ipiv, char* equed, double* r, double* c, double* b, lapack_int ldb, double* x,
lapack_int ldx, double* rcond, double* rpvgrw, double* berr, lapack_int n_err_bnds,
double* err_bnds_norm, double* err_bnds_comp, lapack_int nparams, const double*
params );
lapack_int LAPACKE_cgesvxx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int nrhs, lapack_complex_float* a, lapack_int lda, lapack_complex_float* af,
lapack_int ldaf, lapack_int* ipiv, char* equed, float* r, float* c,
lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x, lapack_int ldx,
float* rcond, float* rpvgrw, float* berr, lapack_int n_err_bnds, float* err_bnds_norm,
float* err_bnds_comp, lapack_int nparams, const float* params );
lapack_int LAPACKE_zgesvxx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int nrhs, lapack_complex_double* a, lapack_int lda, lapack_complex_double* af,
lapack_int ldaf, lapack_int* ipiv, char* equed, double* r, double* c,
lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x, lapack_int ldx,
double* rcond, double* rpvgrw, double* berr, lapack_int n_err_bnds, double*
err_bnds_norm, double* err_bnds_comp, lapack_int nparams, const double* params );

Include Files
• mkl.h

Description


The routine uses the LU factorization to compute the solution to a real or complex system of linear equations
A*X = B, where A is an n-by-n matrix, the columns of the matrix B are individual right-hand sides, and the
columns of X are the corresponding solutions.
Both normwise and maximum componentwise error bounds are also provided on request. The routine returns
a solution with a small guaranteed error (O(eps), where eps is the working machine precision) unless the
matrix is very ill-conditioned, in which case a warning is returned. Relevant condition numbers are also
calculated and returned.
The routine accepts user-provided factorizations and equilibration factors; see definitions of the fact and
equed options. Solving with refinement and using a factorization from a previous call of the routine also
produces a solution with O(eps) errors or warnings but that may not be true for general user-provided
factorizations and equilibration factors if they differ from what the routine would itself produce.
The routine ?gesvxx performs the following steps:

1. If fact = 'E', scaling factors r and c are computed to equilibrate the system:

trans = 'N': diag(r)*A*diag(c)*inv(diag(c))*X = diag(r)*B

trans = 'T': (diag(r)*A*diag(c))^T*inv(diag(r))*X = diag(c)*B

trans = 'C': (diag(r)*A*diag(c))^H*inv(diag(r))*X = diag(c)*B

Whether or not the system will be equilibrated depends on the scaling of the matrix A, but if
equilibration is used, A is overwritten by diag(r)*A*diag(c) and B by diag(r)*B (if trans='N') or
diag(c)*B (if trans = 'T' or 'C').
2. If fact = 'N' or 'E', the LU decomposition is used to factor the matrix A (after equilibration if fact
= 'E') as A = P*L*U, where P is a permutation matrix, L is a unit lower triangular matrix, and U is
upper triangular.
3. If some Ui,i = 0, so that U is exactly singular, then the routine returns with info = i. Otherwise, the
factored form of A is used to estimate the condition number of the matrix A (see the rcond parameter).
If the reciprocal of the condition number is less than machine precision, the routine still goes on to
solve for X and compute error bounds.
4. The system of equations is solved for X using the factored form of A.
5. By default, unless params[0] is set to zero, the routine applies iterative refinement to improve the computed
solution matrix and calculate error bounds. Refinement calculates the residual to at least twice the
working precision.
6. If equilibration was used, the matrix X is premultiplied by diag(c) (if trans = 'N') or diag(r) (if
trans = 'T' or 'C') so that it solves the original system before equilibration.
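
The following is a minimal illustrative sketch, not taken from the library documentation: it calls LAPACKE_dgesvxx on a hypothetical 3-by-3 system with fact = 'E' and trans = 'N', column major layout, and nparams = 0 so that the default algorithm parameters are used. The matrix values and variable names are assumptions made only for this example.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 3, nrhs = 1, n_err_bnds = 3;
    double a[9]  = { 4.0, 1.0, 2.0,      /* column 1 of A */
                     1.0, 5.0, 1.0,      /* column 2      */
                     2.0, 1.0, 6.0 };    /* column 3      */
    double b[3]  = { 7.0, 7.0, 9.0 };    /* right-hand side; exact solution is all ones */
    double af[9], x[3], r[3], c[3];
    double rcond, rpvgrw, berr[1];
    double err_bnds_norm[3], err_bnds_comp[3]; /* nrhs * n_err_bnds entries each */
    double params[1] = { -1.0 };               /* not referenced because nparams = 0 */
    lapack_int ipiv[3];
    char equed = 'N';

    /* fact = 'E': equilibrate if necessary, copy to af, factor, then refine. */
    lapack_int info = LAPACKE_dgesvxx(LAPACK_COL_MAJOR, 'E', 'N', n, nrhs,
                                      a, n, af, n, ipiv, &equed, r, c,
                                      b, n, x, n, &rcond, &rpvgrw, berr,
                                      n_err_bnds, err_bnds_norm, err_bnds_comp,
                                      0, params);
    if (info == 0)
        printf("x = %g %g %g  rcond = %g  rpvgrw = %g\n",
               x[0], x[1], x[2], rcond, rpvgrw);
    else
        printf("LAPACKE_dgesvxx returned info = %d\n", (int)info);
    return 0;
}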

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major (LAPACK_COL_MAJOR).

fact Must be 'F', 'N', or 'E'.

Specifies whether or not the factored form of the matrix A is supplied on
entry, and if not, whether the matrix A should be equilibrated before it is
factored.

If fact = 'F', on entry, af and ipiv contain the factored form of A. If
equed is not 'N', the matrix A has been equilibrated with scaling factors
given by r and c. Parameters a, af, and ipiv are not modified.

If fact = 'N', the matrix A will be copied to af and factored.

If fact = 'E', the matrix A will be equilibrated, if necessary, copied to af
and factored.

trans Must be 'N', 'T', or 'C'.

Specifies the form of the system of equations:


If trans = 'N', the system has the form A*X = B (No transpose).

If trans = 'T', the system has the form AT*X = B (Transpose).

If trans = 'C', the system has the form AH*X = B (Conjugate Transpose
= Transpose for real flavors, Conjugate Transpose for complex flavors).

n The number of linear equations; the order of the matrix A; n≥ 0.

nrhs The number of right hand sides; the number of columns of the matrices B
and X; nrhs≥ 0.

a, af, b Arrays: a(size max(lda*n)), af(size max(ldaf*n)), b(size max(1,
ldb*nrhs) for column major layout and max(1, ldb*n) for row major
layout).

The array a contains the matrix A. If fact = 'F' and equed is not 'N',
then A must have been equilibrated by the scaling factors in r and/or c.

The array af is an input argument if fact = 'F'. It contains the factored
form of the matrix A, that is, the factors L and U from the factorization A =
P*L*U as computed by ?getrf. If equed is not 'N', then af is the factored
form of the equilibrated matrix A.

The array b contains the matrix B whose columns are the right-hand sides
for the systems of equations.

lda The leading dimension of a; lda≥ max(1, n).

ldaf The leading dimension of af; ldaf≥ max(1, n).

ipiv Array, size at least max(1, n). The array ipiv is an input argument if fact
= 'F'. It contains the pivot indices from the factorization A = P*L*U as
computed by ?getrf; row i of the matrix was interchanged with row
ipiv[i-1].

equed Must be 'N', 'R', 'C', or 'B'.

equed is an input argument if fact = 'F'. It specifies the form of
equilibration that was done:

If equed = 'N', no equilibration was done (always true if fact = 'N').

If equed = 'R', row equilibration was done, that is, A has been
premultiplied by diag(r).
If equed = 'C', column equilibration was done, that is, A has been
postmultiplied by diag(c).
If equed = 'B', both row and column equilibration was done, that is, A has
been replaced by diag(r)*A*diag(c).

r, c Arrays: r (size n), c (size n). The array r contains the row scale factors
for A, and the array c contains the column scale factors for A. These arrays
are input arguments if fact = 'F' only; otherwise they are output
arguments.


If equed = 'R' or 'B', A is multiplied on the left by diag(r); if equed =
'N' or 'C', r is not accessed.
If fact = 'F' and equed = 'R' or 'B', each element of r must be
positive.
If equed = 'C' or 'B', A is multiplied on the right by diag(c); if equed =
'N' or 'R', c is not accessed.
If fact = 'F' and equed = 'C' or 'B', each element of c must be
positive.
Each element of r or c should be a power of the radix to ensure a reliable
solution and error estimates. Scaling by powers of the radix does not cause
rounding errors unless the result underflows or overflows. Rounding errors
during scaling lead to refining with a matrix that is not equivalent to the
input matrix, producing error estimates that may not be reliable.

ldb The leading dimension of the array b; ldb≥ max(1, n) for column major
layout and ldb≥nrhs for row major layout.

ldx The leading dimension of the output array x; ldx≥ max(1, n) for column
major layout and ldx≥nrhs for row major layout.

n_err_bnds Number of error bounds to return for each right hand side and each type
(normwise or componentwise). See err_bnds_norm and err_bnds_comp
descriptions in Output Arguments section below.

nparams Specifies the number of parameters set in params. If ≤ 0, the params array
is never referenced and default values are used.

params Array, size max(1, nparams). Specifies algorithm parameters. If an entry is
less than 0.0, that entry is filled with the default value used for that
parameter. Only positions up to nparams are accessed; defaults are used
for higher-numbered parameters. If defaults are acceptable, you can pass
nparams = 0, which prevents the source code from accessing the params
argument.
params[0] : Whether to perform iterative refinement or not. Default: 1.0

=0.0 No refinement is performed and no error bounds
are computed.

=1.0 Use the double-precision refinement algorithm,
possibly with doubled-single computations if the
compilation environment does not support
double precision.

(Other values are reserved for future use.)


params[1] : Maximum number of residual computations allowed for
refinement.

Default 10.0

Aggressive Set to 100.0 to permit convergence using
approximate factorizations or factorizations
other than LU. If the factorization uses a
technique other than Gaussian elimination, the
guarantees in err_bnds_norm and
err_bnds_comp may no longer be trustworthy.

params[2] : Flag determining if the code will attempt to find a solution
with a small componentwise relative error in the double-precision algorithm.
Positive is true, 0.0 is false. Default: 1.0 (attempt componentwise
convergence).
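
As an illustrative sketch only (not part of the reference), a caller that wants the documented defaults for params[0] and params[2] but a non-default iteration cap in params[1] could fill the array as follows; the helper name is hypothetical.

/* Hypothetical helper: entries below 0.0 are replaced by the routine's defaults. */
void fill_refinement_params(double params[3], double max_refine_iters) {
    params[0] = -1.0;               /* use the default: refinement enabled                 */
    params[1] = max_refine_iters;   /* e.g. 100.0 for the "aggressive" setting above       */
    params[2] = -1.0;               /* use the default: attempt componentwise convergence  */
}

Such an array would then be passed together with nparams = 3.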

Output Parameters

x Array, size max(1, ldx*nrhs) for column major layout and max(1, ldx*n)
for row major layout.
If info = 0, the array x contains the solution n-by-nrhs matrix X to the
original system of equations. Note that A and B are modified on exit if
equed≠'N', and the solution to the equilibrated system is:
inv(diag(c))*X, if trans = 'N' and equed = 'C' or 'B'; or
inv(diag(r))*X, if trans = 'T' or 'C' and equed = 'R' or 'B'.

a Array a is not modified on exit if fact = 'F' or 'N', or if fact = 'E' and
equed = 'N'.
If equed≠'N', A is scaled on exit as follows:

equed = 'R': A = diag(r)*A


equed = 'C': A = A*diag(c)
equed = 'B': A = diag(r)*A*diag(c).

af If fact = 'N' or 'E', then af is an output argument and on exit returns
the factors L and U from the factorization A = P*L*U of the original matrix A
(if fact = 'N') or of the equilibrated matrix A (if fact = 'E'). See the
description of a for the form of the equilibrated matrix.

b Overwritten by diag(r)*B if trans = 'N' and equed = 'R' or 'B';

overwritten by diag(c)*B if trans = 'T' or 'C' and equed = 'C' or 'B';

not changed if equed = 'N'.

r, c These arrays are output arguments if fact≠'F'. Each element of these
arrays is a power of the radix. See the description of r, c in Input
Arguments section.

rcond Reciprocal scaled condition number. An estimate of the reciprocal Skeel
condition number of the matrix A after equilibration (if done). If rcond is
less than the machine precision, in particular, if rcond = 0, the matrix is
singular to working precision. Note that the error may still be small even if
this number is very small and the matrix appears ill-conditioned.

rpvgrw Contains the reciprocal pivot growth factor:

If this is much less than 1, the stability of the LU factorization of the
(equilibrated) matrix A could be poor. This also means that the solution X,
estimated condition numbers, and error bounds could be unreliable. If
factorization fails with 0 < info≤n, this parameter contains the reciprocal
pivot growth factor for the leading info columns of A. In ?gesvx, this
quantity is returned in rpivot.

berr Array, size at least max(1, nrhs). Contains the componentwise relative
backward error for each solution vector xj, that is, the smallest relative
change in any element of A or B that makes xj an exact solution.

err_bnds_norm Array of size nrhs*n_err_bnds. For each right-hand side, contains
information about various error bounds and condition numbers
corresponding to the normwise relative error, which is defined as follows:

Normwise relative error in the i-th solution vector

The array is indexed by the type of error information as described below.
There are currently up to three pieces of information returned.

err=1 "Trust/don't trust" boolean. Trust the answer if


the reciprocal condition number is less than the
threshold sqrt(n)*slamch(ε) for single
precision flavors and sqrt(n)*dlamch(ε) for
double precision flavors.

err=2 "Guaranteed" error bound. The estimated


forward error, almost certainly within a factor of
10 of the true error so long as the next entry is
greater than the threshold sqrt(n)*slamch(ε)
for single precision flavors and
sqrt(n)*dlamch(ε) for double precision
flavors. This error bound should only be trusted
if the previous boolean is true.

err=3 Reciprocal condition number. Estimated
normwise reciprocal condition number.
Compared with the threshold
sqrt(n)*slamch(ε) for single precision flavors
and sqrt(n)*dlamch(ε) for double precision
flavors to determine if the error estimate is
"guaranteed". These reciprocal condition
numbers for some appropriately scaled matrix Z
are:

Let z=s*a, where s scales each row by a power
of the radix so all absolute row sums of z are
approximately 1.

The information for right-hand side i, where 1 ≤i≤nrhs, and type of error
err is stored in err_bnds_norm[(err-1)*nrhs + i - 1].
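
A brief sketch, not part of the reference, of the indexing rule above; the accessor names are hypothetical and i is the 1-based index of the right-hand side.

#include <mkl.h>

/* err = 1 entry: the "trust/don't trust" flag for right-hand side i. */
double norm_trust_flag(const double *err_bnds_norm, lapack_int nrhs, lapack_int i) {
    return err_bnds_norm[(1 - 1) * nrhs + i - 1];
}

/* err = 2 entry: the estimated normwise forward error bound for right-hand side i. */
double norm_error_bound(const double *err_bnds_norm, lapack_int nrhs, lapack_int i) {
    return err_bnds_norm[(2 - 1) * nrhs + i - 1];
}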

err_bnds_comp Array of size nrhs*n_err_bnds. For each right-hand side, contains
information about various error bounds and condition numbers
corresponding to the componentwise relative error, which is defined as
follows:

Componentwise relative error in the i-th solution vector:

The array is indexed by the type of error information as described below.
There are currently up to three pieces of information returned for each
right-hand side. If componentwise accuracy is not requested (params[2] =
0.0), then err_bnds_comp is not accessed.

err=1 "Trust/don't trust" boolean. Trust the answer if


the reciprocal condition number is less than the
threshold sqrt(n)*slamch(ε) for single
precision flavors and sqrt(n)*dlamch(ε) for
double precision flavors.

err=2 "Guaranteed" error bpound. The estimated


forward error, almost certainly within a factor of
10 of the true error so long as the next entry is
greater than the threshold sqrt(n)*slamch(ε)
for single precision flavors and
sqrt(n)*dlamch(ε) for double precision
flavors. This error bound should only be trusted
if the previous boolean is true.

err=3 Reciprocal condition number. Estimated
componentwise reciprocal condition number.
Compared with the threshold
sqrt(n)*slamch(ε) for single precision flavors
and sqrt(n)*dlamch(ε) for double precision
flavors to determine if the error estimate is
"guaranteed". These reciprocal condition
numbers for some appropriately scaled matrix Z
are:

Let z=s*(a*diag(x)), where x is the solution
for the current right-hand side and s scales each
row of a*diag(x) by a power of the radix so all
absolute row sums of z are approximately 1.

The information for right-hand side i, where 1 ≤i≤nrhs, and type of error
err is stored in err_bnds_comp[(err-1)*nrhs + i - 1].


ipiv If fact = 'N' or 'E', then ipiv is an output argument and on exit
contains the pivot indices from the factorization A = P*L*U of the original
matrix A (if fact = 'N') or of the equilibrated matrix A (if fact = 'E').

equed If fact≠'F', then equed is an output argument. It specifies the form of
equilibration that was done (see the description of equed in Input
Arguments section).

params If an entry is less than 0.0, that entry is filled with the default value used
for that parameter, otherwise the entry is not modified.

Return Values
This function returns a value info.

If info = 0, the execution is successful. The solution to every right-hand side is guaranteed.

If info = -i, parameter i had an illegal value.

If 0 < info≤n: Uinfo,info is exactly zero. The factorization has been completed, but the factor U is exactly
singular, so the solution and error bounds could not be computed; rcond = 0 is returned.

If info = n+j: The solution corresponding to the j-th right-hand side is not guaranteed. The solutions
corresponding to other right-hand sides k with k > j may not be guaranteed as well, but only the first such
right-hand side is reported. If a small componentwise error is not requested (params[2] = 0.0), then the j-th
right-hand side is the first with a normwise error bound that is not guaranteed (the smallest j such that for
column major layout err_bnds_norm[j - 1] = 0.0 or err_bnds_comp[j - 1] = 0.0; or for row major
layout err_bnds_norm[(j - 1)*n_err_bnds] = 0.0 or err_bnds_comp[(j - 1)*n_err_bnds] = 0.0).
See the definition of err_bnds_norm and err_bnds_comp for err = 1. To get information about all of the
right-hand sides, check err_bnds_norm or err_bnds_comp.

See Also
Matrix Storage Schemes for LAPACK Routines

?gbsv
Computes the solution to the system of linear
equations with a band coefficient matrix A and
multiple right-hand sides.

Syntax
lapack_int LAPACKE_sgbsv (int matrix_layout , lapack_int n , lapack_int kl , lapack_int
ku , lapack_int nrhs , float * ab , lapack_int ldab , lapack_int * ipiv , float * b ,
lapack_int ldb );
lapack_int LAPACKE_dgbsv (int matrix_layout , lapack_int n , lapack_int kl , lapack_int
ku , lapack_int nrhs , double * ab , lapack_int ldab , lapack_int * ipiv , double *
b , lapack_int ldb );
lapack_int LAPACKE_cgbsv (int matrix_layout , lapack_int n , lapack_int kl , lapack_int
ku , lapack_int nrhs , lapack_complex_float * ab , lapack_int ldab , lapack_int *
ipiv , lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zgbsv (int matrix_layout , lapack_int n , lapack_int kl , lapack_int
ku , lapack_int nrhs , lapack_complex_double * ab , lapack_int ldab , lapack_int *
ipiv , lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description

The routine solves for X the real or complex system of linear equations A*X = B, where A is an n-by-n band
matrix with kl subdiagonals and ku superdiagonals, the columns of matrix B are individual right-hand sides,
and the columns of X are the corresponding solutions.
The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L*U, where L is a
product of permutation and unit lower triangular matrices with kl subdiagonals, and U is upper triangular
with kl+ku superdiagonals. The factored form of A is then used to solve the system of equations A*X = B.
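
The following is a minimal illustrative sketch, not taken from the library documentation: it solves a hypothetical 4-by-4 tridiagonal system (kl = ku = 1) with LAPACKE_dgbsv using column major band storage; the matrix values are assumptions made only for this example.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 4, kl = 1, ku = 1, nrhs = 1;
    lapack_int ldab = 2*kl + ku + 1;   /* = 4; the extra kl rows hold fill-in from pivoting */
    /* Column major band storage: A(i,j) is stored at ab[(j-1)*ldab + kl+ku+i-j] (0-based). */
    double ab[16] = {
        0, 0, 4, 1,   /* column 1: diagonal 4, subdiagonal 1            */
        0, 1, 4, 1,   /* column 2: superdiagonal 1, diagonal 4, sub 1   */
        0, 1, 4, 1,   /* column 3                                       */
        0, 1, 4, 0    /* column 4: superdiagonal 1, diagonal 4          */
    };
    double b[4] = { 5.0, 6.0, 6.0, 5.0 };   /* right-hand side; exact solution is all ones */
    lapack_int ipiv[4];

    lapack_int info = LAPACKE_dgbsv(LAPACK_COL_MAJOR, n, kl, ku, nrhs,
                                    ab, ldab, ipiv, b, n);
    if (info == 0)
        printf("x = %g %g %g %g\n", b[0], b[1], b[2], b[3]);   /* b now holds X */
    else
        printf("LAPACKE_dgbsv returned info = %d\n", (int)info);
    return 0;
}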

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

n The order of A. The number of rows in B; n≥ 0.

kl The number of subdiagonals within the band of A; kl≥ 0.

ku The number of superdiagonals within the band of A; ku≥ 0.

nrhs The number of right-hand sides. The number of columns in B; nrhs≥ 0.

ab, b Arrays: ab(size max(1, ldab*n)), b of size max(1, ldb*nrhs) for
column major layout and max(1, ldb*n) for row major layout.

The array ab contains the matrix A in band storage (see Matrix
Storage Schemes).

The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

ldab The leading dimension of the array ab. (ldab≥ 2*kl + ku + 1)

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

Output Parameters

ab Overwritten by L and U. U is stored as an upper triangular band
matrix with kl + ku superdiagonals and L is stored as a lower
triangular band matrix with kl subdiagonals. See Matrix Storage
Schemes.

b Overwritten by the solution matrix X.

ipiv Array, size at least max(1, n). The pivot indices: row i was
interchanged with row ipiv[i-1].

Return Values
This function returns a value info.

If info = 0, the execution is successful.


If info = -i, parameter i had an illegal value.

If info = i, Ui, i is exactly zero. The factorization has been completed, but the factor U is exactly singular,
so the solution could not be computed.

See Also
Matrix Storage Schemes for LAPACK Routines

?gbsvx
Computes the solution to the real or complex system
of linear equations with a band coefficient matrix A
and multiple right-hand sides, and provides error
bounds on the solution.

Syntax
lapack_int LAPACKE_sgbsvx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int kl, lapack_int ku, lapack_int nrhs, float* ab, lapack_int ldab, float* afb,
lapack_int ldafb, lapack_int* ipiv, char* equed, float* r, float* c, float* b,
lapack_int ldb, float* x, lapack_int ldx, float* rcond, float* ferr, float* berr,
float* rpivot );
lapack_int LAPACKE_dgbsvx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int kl, lapack_int ku, lapack_int nrhs, double* ab, lapack_int ldab, double*
afb, lapack_int ldafb, lapack_int* ipiv, char* equed, double* r, double* c, double* b,
lapack_int ldb, double* x, lapack_int ldx, double* rcond, double* ferr, double* berr,
double* rpivot );
lapack_int LAPACKE_cgbsvx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int kl, lapack_int ku, lapack_int nrhs, lapack_complex_float* ab, lapack_int
ldab, lapack_complex_float* afb, lapack_int ldafb, lapack_int* ipiv, char* equed,
float* r, float* c, lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x,
lapack_int ldx, float* rcond, float* ferr, float* berr, float* rpivot );
lapack_int LAPACKE_zgbsvx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int kl, lapack_int ku, lapack_int nrhs, lapack_complex_double* ab, lapack_int
ldab, lapack_complex_double* afb, lapack_int ldafb, lapack_int* ipiv, char* equed,
double* r, double* c, lapack_complex_double* b, lapack_int ldb, lapack_complex_double*
x, lapack_int ldx, double* rcond, double* ferr, double* berr, double* rpivot );

Include Files
• mkl.h

Description
The routine uses the LU factorization to compute the solution to a real or complex system of linear equations
A*X = B, AT*X = B, or AH*X = B, where A is a band matrix of order n with kl subdiagonals and ku
superdiagonals, the columns of matrix B are individual right-hand sides, and the columns of X are the
corresponding solutions.
Error bounds on the solution and a condition estimate are also provided.
The routine ?gbsvx performs the following steps:

1. If fact = 'E', real scaling factors r and c are computed to equilibrate the system:

trans = 'N': diag(r)*A*diag(c)*inv(diag(c))*X = diag(r)*B

trans = 'T': (diag(r)*A*diag(c))^T*inv(diag(r))*X = diag(c)*B

trans = 'C': (diag(r)*A*diag(c))^H*inv(diag(r))*X = diag(c)*B

Whether the system will be equilibrated depends on the scaling of the matrix A, but if equilibration is
used, A is overwritten by diag(r)*A*diag(c) and B by diag(r)*B (if trans='N') or diag(c)*B (if
trans = 'T' or 'C').
2. If fact = 'N' or 'E', the LU decomposition is used to factor the matrix A (after equilibration if fact =
'E') as A = L*U, where L is a product of permutation and unit lower triangular matrices with kl
subdiagonals, and U is upper triangular with kl+ku superdiagonals.
3. If some Ui,i = 0, so that U is exactly singular, then the routine returns with info = i. Otherwise, the
factored form of A is used to estimate the condition number of the matrix A. If the reciprocal of the
condition number is less than machine precision, info = n + 1 is returned as a warning, but the
routine still goes on to solve for X and compute error bounds as described below.
4. The system of equations is solved for X using the factored form of A.
5. Iterative refinement is applied to improve the computed solution matrix and calculate error bounds and
backward error estimates for it.
6. If equilibration was used, the matrix X is premultiplied by diag(c) (if trans = 'N') or diag(r) (if
trans = 'T' or 'C') so that it solves the original system before equilibration.
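
A minimal illustrative sketch, not taken from the library documentation: it solves a hypothetical 4-by-4 tridiagonal band system with LAPACKE_dgbsvx, letting the routine decide on equilibration (fact = 'E'); the matrix values are assumptions made only for this example.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 4, kl = 1, ku = 1, nrhs = 1;
    lapack_int ldab = kl + ku + 1;        /* = 3: ab holds only the band of A   */
    lapack_int ldafb = 2*kl + ku + 1;     /* = 4: afb needs room for fill-in    */
    /* Column major band storage: A(i,j) is stored at ab[(j-1)*ldab + ku+i-j] (0-based). */
    double ab[12] = {
        0, 4, 1,    /* column 1: diagonal, subdiagonal           */
        1, 4, 1,    /* column 2: superdiagonal, diagonal, sub    */
        1, 4, 1,    /* column 3                                  */
        1, 4, 0     /* column 4: superdiagonal, diagonal         */
    };
    double afb[16], b[4] = { 5.0, 6.0, 6.0, 5.0 }, x[4], r[4], c[4];
    double rcond, ferr[1], berr[1], rpivot;
    lapack_int ipiv[4];
    char equed = 'N';

    lapack_int info = LAPACKE_dgbsvx(LAPACK_COL_MAJOR, 'E', 'N', n, kl, ku, nrhs,
                                     ab, ldab, afb, ldafb, ipiv, &equed, r, c,
                                     b, n, x, n, &rcond, ferr, berr, &rpivot);
    if (info == 0)
        printf("x = %g %g %g %g  rcond = %g  ferr = %g\n",
               x[0], x[1], x[2], x[3], rcond, ferr[0]);
    else
        printf("LAPACKE_dgbsvx returned info = %d\n", (int)info);
    return 0;
}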

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

fact Must be 'F', 'N', or 'E'.

Specifies whether the factored form of the matrix A is supplied on
entry, and if not, whether the matrix A should be equilibrated before it
is factored.

If fact = 'F': on entry, afb and ipiv contain the factored form of A.
If equed is not 'N', the matrix A is equilibrated with scaling factors
given by r and c. ab, afb, and ipiv are not modified.

If fact = 'N', the matrix A will be copied to afb and factored.

If fact = 'E', the matrix A will be equilibrated if necessary, then
copied to afb and factored.

trans Must be 'N', 'T', or 'C'.

Specifies the form of the system of equations:


If trans = 'N', the system has the form A*X = B (No transpose).

If trans = 'T', the system has the form AT*X = B (Transpose).

If trans = 'C', the system has the form AH*X = B (Transpose for
real flavors, conjugate transpose for complex flavors).

n The number of linear equations, the order of the matrix A; n≥ 0.

kl The number of subdiagonals within the band of A; kl≥ 0.

ku The number of superdiagonals within the band of A; ku≥ 0.

nrhs The number of right hand sides, the number of columns of the
matrices B and X; nrhs≥ 0.


ab, afb, b Arrays: ab (max(ldab*n)), afb (max(ldafb*n)), b(max(1,
ldb*nrhs) for column major layout and max(1, ldb*n) for row major
layout).

The array ab contains the matrix A in band storage (see Matrix
Storage Schemes). If fact = 'F' and equed is not 'N', then A must
have been equilibrated by the scaling factors in r and/or c.

The array afb is an input argument if fact = 'F'. It contains the
factored form of the matrix A, that is, the factors L and U from the
factorization A = P*L*U as computed by ?gbtrf. U is stored as an
upper triangular band matrix with kl + ku superdiagonals. L is stored
as a lower triangular band matrix with kl subdiagonals. If equed is not
'N', then afb is the factored form of the equilibrated matrix A.

The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

ldab The leading dimension of ab; ldab≥kl+ku+1.

ldafb The leading dimension of afb; ldafb≥ 2*kl+ku+1.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ipiv Array, size at least max(1, n). The array ipiv is an input argument if
fact = 'F'. It contains the pivot indices from the factorization A =
P*L*U as computed by ?gbtrf; row i of the matrix was interchanged
with row ipiv[i-1].

equed Must be 'N', 'R', 'C', or 'B'.

equed is an input argument if fact = 'F'. It specifies the form of
equilibration that was done:

If equed = 'N', no equilibration was done (always true if fact =
'N').

If equed = 'R', row equilibration was done, that is, A has been
premultiplied by diag(r).

If equed = 'C', column equilibration was done, that is, A has been
postmultiplied by diag(c).

If equed = 'B', both row and column equilibration was done, that is,
A has been replaced by diag(r)*A*diag(c).

r, c Arrays: r (size n), c (size n).

The array r contains the row scale factors for A, and the array c
contains the column scale factors for A. These arrays are input
arguments if fact = 'F' only; otherwise they are output arguments.

If equed = 'R' or 'B', A is multiplied on the left by diag(r); if
equed = 'N' or 'C', r is not accessed.
If fact = 'F' and equed = 'R' or 'B', each element of r must be
positive.
If equed = 'C' or 'B', A is multiplied on the right by diag(c); if
equed = 'N' or 'R', c is not accessed.

If fact = 'F' and equed = 'C' or 'B', each element of c must be
positive.

ldx The leading dimension of the output array x; ldx≥ max(1, n) for
column major layout and ldx≥nrhs for row major layout.

Output Parameters

x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.
If info = 0 or info = n+1, the array x contains the solution matrix
X to the original system of equations. Note that A and B are modified
on exit if equed≠'N', and the solution to the equilibrated system is:
inv(diag(c))*X, if trans = 'N' and equed = 'C' or 'B';
inv(diag(r))*X, if trans = 'T' or 'C' and equed = 'R' or 'B'.

ab Array ab is not modified on exit if fact = 'F' or 'N', or if fact =
'E' and equed = 'N'.
If equed≠'N', A is scaled on exit as follows:

equed = 'R': A = diag(r)*A

equed = 'C': A = A*diag(c)

equed = 'B': A = diag(r)*A*diag(c).

afb If fact = 'N' or 'E', then afb is an output argument and on exit
returns details of the LU factorization of the original matrix A (if fact
= 'N') or of the equilibrated matrix A (if fact = 'E'). See the
description of ab for the form of the equilibrated matrix.

b Overwritten by diag(r)*B if trans = 'N' and equed = 'R' or 'B';

overwritten by diag(c)*B if trans = 'T' or 'C' and equed = 'C' or 'B';

not changed if equed = 'N'.

r, c These arrays are output arguments if fact≠'F'. See the description
of r, c in Input Arguments section.

rcond An estimate of the reciprocal condition number of the matrix A after
equilibration (if done).

If rcond is less than the machine precision (in particular, if rcond = 0),
the matrix is singular to working precision. This condition is indicated
by a return code of info > 0.

ferr Array, size at least max(1, nrhs). Contains the estimated forward
error bound for each solution vector xj (the j-th column of the solution
matrix X). If xtrue is the true solution corresponding to xj, ferr[j-1]
is an estimated upper bound for the magnitude of the largest element
in (xj - xtrue) divided by the magnitude of the largest element in xj.
The estimate is as reliable as the estimate for rcond, and is almost
always a slight overestimate of the true error.


berr Array, size at least max(1, nrhs). Contains the component-wise
relative backward error for each solution vector xj, that is, the
smallest relative change in any element of A or B that makes xj an
exact solution.

ipiv If fact = 'N' or 'E', then ipiv is an output argument and on exit
contains the pivot indices from the factorization A = L*U of the
original matrix A (if fact = 'N') or of the equilibrated matrix A (if
fact = 'E').

equed If fact≠'F', then equed is an output argument. It specifies the form
of equilibration that was done (see the description of equed in Input
Arguments section).

rpivot On exit, rpivot contains the reciprocal pivot growth factor:

If rpivot is much less than 1, then the stability of the LU
factorization of the (equilibrated) matrix A could be poor. This also
means that the solution x, condition estimator rcond, and forward
error bound ferr could be unreliable. If factorization fails with 0 <
info≤n, then rpivot contains the reciprocal pivot growth factor for
the leading info columns of A.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, and i≤n, then Ui, i is exactly zero. The factorization has been completed, but the factor U is
exactly singular, so the solution and error bounds could not be computed; rcond = 0 is returned. If info =
i, and i = n+1, then U is nonsingular, but rcond is less than machine precision, meaning that the matrix is
singular to working precision. Nevertheless, the solution and error bounds are computed because there are a
number of situations where the computed solution can be more accurate than the value of rcond would
suggest.

See Also
Matrix Storage Schemes for LAPACK Routines

?gbsvxx
Uses extra precise iterative refinement to compute the
solution to the system of linear equations with a
banded coefficient matrix A and multiple right-hand
sides

Syntax
lapack_int LAPACKE_sgbsvxx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int kl, lapack_int ku, lapack_int nrhs, float* ab, lapack_int ldab, float* afb,
lapack_int ldafb, lapack_int* ipiv, char* equed, float* r, float* c, float* b,
lapack_int ldb, float* x, lapack_int ldx, float* rcond, float* rpvgrw, float* berr,
lapack_int n_err_bnds, float* err_bnds_norm, float* err_bnds_comp, lapack_int nparams,
const float* params );

lapack_int LAPACKE_dgbsvxx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int kl, lapack_int ku, lapack_int nrhs, double* ab, lapack_int ldab, double*
afb, lapack_int ldafb, lapack_int* ipiv, char* equed, double* r, double* c, double* b,
lapack_int ldb, double* x, lapack_int ldx, double* rcond, double* rpvgrw, double*
berr, lapack_int n_err_bnds, double* err_bnds_norm, double* err_bnds_comp, lapack_int
nparams, const double* params );
lapack_int LAPACKE_cgbsvxx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int kl, lapack_int ku, lapack_int nrhs, lapack_complex_float* ab, lapack_int
ldab, lapack_complex_float* afb, lapack_int ldafb, lapack_int* ipiv, char* equed,
float* r, float* c, lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x,
lapack_int ldx, float* rcond, float* rpvgrw, float* berr, lapack_int n_err_bnds,
float* err_bnds_norm, float* err_bnds_comp, lapack_int nparams, const float* params );
lapack_int LAPACKE_zgbsvxx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int kl, lapack_int ku, lapack_int nrhs, lapack_complex_double* ab, lapack_int
ldab, lapack_complex_double* afb, lapack_int ldafb, lapack_int* ipiv, char* equed,
double* r, double* c, lapack_complex_double* b, lapack_int ldb, lapack_complex_double*
x, lapack_int ldx, double* rcond, double* rpvgrw, double* berr, lapack_int n_err_bnds,
double* err_bnds_norm, double* err_bnds_comp, lapack_int nparams, const double*
params );

Include Files
• mkl.h

Description

The routine uses the LU factorization to compute the solution to a real or complex system of linear equations
A*X = B, AT*X = B, or AH*X = B, where A is an n-by-n banded matrix, the columns of the matrix B are
individual right-hand sides, and the columns of X are the corresponding solutions.
Both normwise and maximum componentwise error bounds are also provided on request. The routine returns
a solution with a small guaranteed error (O(eps), where eps is the working machine precision) unless the
matrix is very ill-conditioned, in which case a warning is returned. Relevant condition numbers are also
calculated and returned.
The routine accepts user-provided factorizations and equilibration factors; see definitions of the fact and
equed options. Solving with refinement and using a factorization from a previous call of the routine also
produces a solution with O(eps) errors or warnings but that may not be true for general user-provided
factorizations and equilibration factors if they differ from what the routine would itself produce.
The routine ?gbsvxx performs the following steps:

1. If fact = 'E', scaling factors r and c are computed to equilibrate the system:

trans = 'N': diag(r)*A*diag(c)*inv(diag(c))*X = diag(r)*B

trans = 'T': (diag(r)*A*diag(c))^T*inv(diag(r))*X = diag(c)*B

trans = 'C': (diag(r)*A*diag(c))^H*inv(diag(r))*X = diag(c)*B
Whether or not the system will be equilibrated depends on the scaling of the matrix A, but if
equilibration is used, A is overwritten by diag(r)*A*diag(c) and B by diag(r)*B (if trans='N') or
diag(c)*B (if trans = 'T' or 'C').
2. If fact = 'N' or 'E', the LU decomposition is used to factor the matrix A (after equilibration if fact
= 'E') as A = P*L*U, where P is a permutation matrix, L is a unit lower triangular matrix, and U is
upper triangular.


3. If some Ui,i = 0, so that U is exactly singular, then the routine returns with info = i. Otherwise, the
factored form of A is used to estimate the condition number of the matrix A (see the rcond parameter).
If the reciprocal of the condition number is less than machine precision, the routine still goes on to
solve for X and compute error bounds.
4. The system of equations is solved for X using the factored form of A.
5. By default, unless params[0] is set to zero, the routine applies iterative refinement to improve the
computed solution matrix and calculate error bounds. Refinement calculates the residual to at least
twice the working precision.
6. If equilibration was used, the matrix X is premultiplied by diag(c) (if trans = 'N') or diag(r) (if
trans = 'T' or 'C') so that it solves the original system before equilibration.
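
A minimal illustrative sketch, not taken from the library documentation: the same hypothetical band system as in the ?gbsvx sketch above, solved with LAPACKE_dgbsvxx and nparams = 0 so that the default algorithm parameters are used; all values are assumptions made only for this example.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 4, kl = 1, ku = 1, nrhs = 1, n_err_bnds = 3;
    lapack_int ldab = kl + ku + 1, ldafb = 2*kl + ku + 1;
    double ab[12] = {                 /* column major band storage, as in the ?gbsvx sketch */
        0, 4, 1,   1, 4, 1,   1, 4, 1,   1, 4, 0
    };
    double afb[16], b[4] = { 5.0, 6.0, 6.0, 5.0 }, x[4], r[4], c[4];
    double rcond, rpvgrw, berr[1];
    double err_bnds_norm[3], err_bnds_comp[3];  /* nrhs * n_err_bnds entries each */
    double params[1] = { -1.0 };                /* not referenced because nparams = 0 */
    lapack_int ipiv[4];
    char equed = 'N';

    lapack_int info = LAPACKE_dgbsvxx(LAPACK_COL_MAJOR, 'E', 'N', n, kl, ku, nrhs,
                                      ab, ldab, afb, ldafb, ipiv, &equed, r, c,
                                      b, n, x, n, &rcond, &rpvgrw, berr,
                                      n_err_bnds, err_bnds_norm, err_bnds_comp,
                                      0, params);
    if (info == 0)
        printf("x = %g %g %g %g  rcond = %g\n", x[0], x[1], x[2], x[3], rcond);
    else
        printf("LAPACKE_dgbsvxx returned info = %d\n", (int)info);
    return 0;
}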

Input Parameters

matrix_layout Specifies whether two-dimensional array storage is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

fact Must be 'F', 'N', or 'E'.

Specifies whether or not the factored form of the matrix A is supplied on
entry, and if not, whether the matrix A should be equilibrated before it is
factored.

If fact = 'F', on entry, afb and ipiv contain the factored form of A. If
equed is not 'N', the matrix A has been equilibrated with scaling factors
given by r and c. Parameters ab, afb, and ipiv are not modified.

If fact = 'N', the matrix A will be copied to afb and factored.

If fact = 'E', the matrix A will be equilibrated, if necessary, copied to afb
and factored.

trans Must be 'N', 'T', or 'C'.

Specifies the form of the system of equations:


If trans = 'N', the system has the form A*X = B (No transpose).

If trans = 'T', the system has the form AT*X = B (Transpose).

If trans = 'C', the system has the form AH*X = B (Conjugate Transpose
= Transpose for real flavors, Conjugate Transpose for complex flavors).

n The number of linear equations; the order of the matrix A; n≥ 0.

kl The number of subdiagonals within the band of A; kl≥ 0.

ku The number of superdiagonals within the band of A; ku≥ 0.

nrhs The number of right-hand sides; the number of columns of the matrices B
and X; nrhs≥ 0.

ab, afb, b Arrays: ab (max(ldab*n)), afb (max(ldafb*n)), b(max(1, ldb*nrhs) for
column major layout and max(1, ldb*n) for row major layout).

The array ab contains the matrix A in band storage.

If fact = 'F' and equed is not 'N', then AB must have been equilibrated
by the scaling factors in r and/or c.

The array afb is an input argument if fact = 'F'. It contains the factored
form of the banded matrix A, that is, the factors L and U from the
factorization A = P*L*U as computed by ?gbtrf. U is stored as an upper
triangular banded matrix with kl + ku superdiagonals. L is stored as a lower
triangular band matrix with kl subdiagonals. If equed is not 'N', then afb
is the factored form of the equilibrated matrix A.
The array b contains the matrix B whose columns are the right-hand sides
for the systems of equations.

ldab The leading dimension of the array ab; ldab≥kl+ku+1.

ldafb The leading dimension of the array afb; ldafb≥ 2*kl+ku+1.

ipiv Array, size at least max(1, n). The array ipiv is an input argument if fact
= 'F'. It contains the pivot indices from the factorization A = P*L*U as
computed by ?gbtrf; row i of the matrix was interchanged with row
ipiv[i-1].

equed Must be 'N', 'R', 'C', or 'B'.

equed is an input argument if fact = 'F'. It specifies the form of
equilibration that was done:

If equed = 'N', no equilibration was done (always true if fact = 'N').

If equed = 'R', row equilibration was done, that is, A has been
premultiplied by diag(r).

If equed = 'C', column equilibration was done, that is, A has been
postmultiplied by diag(c).

If equed = 'B', both row and column equilibration was done, that is, A has
been replaced by diag(r)*A*diag(c).

r, c Arrays: r (size n), c (size n). The array r contains the row scale factors for
A, and the array c contains the column scale factors for A. These arrays are
input arguments if fact = 'F' only; otherwise they are output arguments.

If equed = 'R' or 'B', A is multiplied on the left by diag(r); if equed =
'N' or 'C', r is not accessed.
If fact = 'F' and equed = 'R' or 'B', each element of r must be
positive.
If equed = 'C' or 'B', A is multiplied on the right by diag(c); if equed =
'N' or 'R', c is not accessed.
If fact = 'F' and equed = 'C' or 'B', each element of c must be
positive.
Each element of r or c should be a power of the radix to ensure a reliable
solution and error estimates. Scaling by powers of the radix does not cause
rounding errors unless the result underflows or overflows. Rounding errors
during scaling lead to refining with a matrix that is not equivalent to the
input matrix, producing error estimates that may not be reliable.

ldb The leading dimension of the array b; ldb≥ max(1, n) for column major
layout and ldb≥nrhs for row major layout.


ldx The leading dimension of the output array x; ldx≥ max(1, n) for column
major layout and ldx≥nrhs for row major layout.

n_err_bnds Number of error bounds to return for each right hand side and each type
(normwise or componentwise). See err_bnds_norm and err_bnds_comp
descriptions in Output Arguments section below.

nparams Specifies the number of parameters set in params. If ≤ 0, the params array
is never referenced and default values are used.

params Array, size max(1, nparams). Specifies algorithm parameters. If an entry is
less than 0.0, that entry is filled with the default value used for that
parameter. Only positions up to nparams are accessed; defaults are used
for higher-numbered parameters. If defaults are acceptable, you can pass
nparams = 0, which prevents the source code from accessing the params
argument.
params[0] : Whether to perform iterative refinement or not. Default: 1.0
(for single precision flavors), 1.0D+0 (for double precision flavors).

=0.0 No refinement is performed and no error bounds
are computed.

=1.0 Use the extra-precise refinement algorithm.

(Other values are reserved for future use.)


params[1] : Maximum number of residual computations allowed for
refinement.

Default 10.0

Aggressive Set to 100.0 to permit convergence using
approximate factorizations or factorizations
other than LU. If the factorization uses a
technique other than Gaussian elimination, the
guarantees in err_bnds_norm and
err_bnds_comp may no longer be trustworthy.

params[2] : Flag determining if the code will attempt to find a solution
with a small componentwise relative error in the double-precision algorithm.
Positive is true, 0.0 is false. Default: 1.0 (attempt componentwise
convergence).

Output Parameters

x Array, size max(1, ldx*nrhs) for column major layout and max(1, ldx*n)
for row major layout.
If info = 0, the array x contains the solution n-by-nrhs matrix X to the
original system of equations. Note that A and B are modified on exit if
equed≠'N', and the solution to the equilibrated system is:
inv(diag(c))*X, if trans = 'N' and equed = 'C' or 'B'; or
inv(diag(r))*X, if trans = 'T' or 'C' and equed = 'R' or 'B'.

ab Array ab is not modified on exit if fact = 'F' or 'N', or if fact = 'E'
and equed = 'N'.

If equed≠'N', A is scaled on exit as follows:

equed = 'R': A = diag(r)*A


equed = 'C': A = A*diag(c)
equed = 'B': A = diag(r)*A*diag(c).

afb If fact = 'N' or 'E', then afb is an output argument and on exit returns
the factors L and U from the factorization A = P*L*U of the original matrix A
(if fact = 'N') or of the equilibrated matrix A (if fact = 'E').

b Overwritten by diag(r)*B if trans = 'N' and equed = 'R' or 'B';

overwritten by diag(c)*B if trans = 'T' or 'C' and equed = 'C' or 'B';

not changed if equed = 'N'.

r, c These arrays are output arguments if fact≠'F'. Each element of these
arrays is a power of the radix. See the description of r, c in Input
Arguments section.

rcond Reciprocal scaled condition number. An estimate of the reciprocal Skeel
condition number of the matrix A after equilibration (if done). If rcond is
less than the machine precision, in particular, if rcond = 0, the matrix is
singular to working precision. Note that the error may still be small even if
this number is very small and the matrix appears ill-conditioned.

rpvgrw Contains the reciprocal pivot growth factor:

If this is much less than 1, the stability of the LU factorization of the
(equilibrated) matrix A could be poor. This also means that the solution X,
estimated condition numbers, and error bounds could be unreliable. If
factorization fails with 0 < info≤n, this parameter contains the reciprocal
pivot growth factor for the leading info columns of A. In ?gbsvx, this
quantity is returned in rpivot.

berr Array, size at least max(1, nrhs). Contains the componentwise relative
backward error for each solution vector xj, that is, the smallest relative
change in any element of A or B that makes xj an exact solution.

err_bnds_norm Array of size nrhs*n_err_bnds. For each right-hand side, contains
information about various error bounds and condition numbers
corresponding to the normwise relative error, which is defined as follows:

Normwise relative error in the i-th solution vector

The array is indexed by the type of error information as described below.
There are currently up to three pieces of information returned.


err=1 "Trust/don't trust" boolean. Trust the answer if


the reciprocal condition number is less than the
threshold sqrt(n)*slamch(ε) for single
precision flavors and sqrt(n)*dlamch(ε) for
double precision flavors.

err=2 "Guaranteed" error bound. The estimated


forward error, almost certainly within a factor of
10 of the true error so long as the next entry is
greater than the threshold sqrt(n)*slamch(ε)
for single precision flavors and
sqrt(n)*dlamch(ε) for double precision
flavors. This error bound should only be trusted
if the previous boolean is true.

err=3 Reciprocal condition number. Estimated
normwise reciprocal condition number.
Compared with the threshold
sqrt(n)*slamch(ε) for single precision flavors
and sqrt(n)*dlamch(ε) for double precision
flavors to determine if the error estimate is
"guaranteed". These reciprocal condition
numbers for some appropriately scaled matrix Z
are:

Let z=s*a, where s scales each row by a power
of the radix so all absolute row sums of z are
approximately 1.

The information for right-hand side i, where 1 ≤i≤nrhs, and type of error
err is stored in err_bnds_norm[(err-1)*nrhs + i - 1].

err_bnds_comp Array of size nrhs*n_err_bnds. For each right-hand side, contains
information about various error bounds and condition numbers
corresponding to the componentwise relative error, which is defined as
follows:

Componentwise relative error in the i-th solution vector:

The array is indexed by the type of error information as described below.
There are currently up to three pieces of information returned for each
right-hand side. If componentwise accuracy is not requested (params[2] =
0.0), then err_bnds_comp is not accessed.

err=1 "Trust/don't trust" boolean. Trust the answer if


the reciprocal condition number is less than the
threshold sqrt(n)*slamch(ε) for single
precision flavors and sqrt(n)*dlamch(ε) for
double precision flavors.

err=2 "Guaranteed" error bpound. The estimated
forward error, almost certainly within a factor of
10 of the true error so long as the next entry is
greater than the threshold sqrt(n)*slamch(ε)
for single precision flavors and
sqrt(n)*dlamch(ε) for double precision
flavors. This error bound should only be trusted
if the previous boolean is true.

err=3 Reciprocal condition number. Estimated
componentwise reciprocal condition number.
Compared with the threshold
sqrt(n)*slamch(ε) for single precision flavors
and sqrt(n)*dlamch(ε) for double precision
flavors to determine if the error estimate is
"guaranteed". These reciprocal condition
numbers for some appropriately scaled matrix Z
are:

Let z=s*(a*diag(x)), where x is the solution
for the current right-hand side and s scales each
row of a*diag(x) by a power of the radix so all
absolute row sums of z are approximately 1.

The information for right-hand side i, where 1 ≤i≤nrhs, and type of error
err is stored in err_bnds_comp[(err-1)*nrhs + i - 1].

ipiv If fact = 'N' or 'E', then ipiv is an output argument and on exit
contains the pivot indices from the factorization A = P*L*U of the original
matrix A (if fact = 'N') or of the equilibrated matrix A (if fact = 'E').

equed If fact≠'F', then equed is an output argument. It specifies the form of
equilibration that was done (see the description of equed in Input
Arguments section).

params If an entry is less than 0.0, that entry is filled with the default value used
for that parameter, otherwise the entry is not modified.

info If info = 0, the execution is successful. The solution to every right-hand
side is guaranteed.

If info = -i, the i-th parameter had an illegal value.

If 0 < info≤n: Uinfo,info is exactly zero. The factorization has been
completed, but the factor U is exactly singular, so the solution and error
bounds could not be computed; rcond = 0 is returned.

If info = n+j: The solution corresponding to the j-th right-hand side is not
guaranteed. The solutions corresponding to other right-hand sides k with k
> j may not be guaranteed as well, but only the first such right-hand side is
reported. If a small componentwise error is not requested (params[2] =
0.0), then the j-th right-hand side is the first with a normwise error bound
that is not guaranteed (the smallest j such that err_bnds_norm[j - 1] =
0.0 or err_bnds_comp[j - 1] = 0.0). See the definition of
err_bnds_norm and err_bnds_comp for err = 1. To get information about
all of the right-hand sides, check err_bnds_norm or err_bnds_comp.

Return Values
This function returns a value info.

If info = 0, the execution is successful. The solution to every right-hand side is guaranteed.

If info = -i, parameter i had an illegal value.

If 0 < info≤n: Uinfo,info is exactly zero. The factorization has been completed, but the factor U is exactly
singular, so the solution and error bounds could not be computed; rcond = 0 is returned.

If info = n+j: The solution corresponding to the j-th right-hand side is not guaranteed. The solutions
corresponding to other right-hand sides k with k > j may not be guaranteed as well, but only the first such
right-hand side is reported. If a small componentwise error is not requested (params[2] = 0.0), then the j-th
right-hand side is the first with a normwise error bound that is not guaranteed (the smallest j such that for
column major layout err_bnds_norm[j - 1] = 0.0 or err_bnds_comp[j - 1] = 0.0; or for row major
layout err_bnds_norm[(j - 1)*n_err_bnds] = 0.0 or err_bnds_comp[(j - 1)*n_err_bnds] = 0.0).
See the definition of err_bnds_norm and err_bnds_comp for err = 1. To get information about all of the
right-hand sides, check err_bnds_norm or err_bnds_comp.

See Also
Matrix Storage Schemes for LAPACK Routines

?gtsv
Computes the solution to the system of linear
equations with a tridiagonal coefficient matrix A and
multiple right-hand sides.

Syntax
lapack_int LAPACKE_sgtsv (int matrix_layout , lapack_int n , lapack_int nrhs , float *
dl , float * d , float * du , float * b , lapack_int ldb );
lapack_int LAPACKE_dgtsv (int matrix_layout , lapack_int n , lapack_int nrhs , double *
dl , double * d , double * du , double * b , lapack_int ldb );
lapack_int LAPACKE_cgtsv (int matrix_layout , lapack_int n , lapack_int nrhs ,
lapack_complex_float * dl , lapack_complex_float * d , lapack_complex_float * du ,
lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zgtsv (int matrix_layout , lapack_int n , lapack_int nrhs ,
lapack_complex_double * dl , lapack_complex_double * d , lapack_complex_double * du ,
lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description

The routine solves for X the system of linear equations A*X = B, where A is an n-by-n tridiagonal matrix, the
columns of matrix B are individual right-hand sides, and the columns of X are the corresponding solutions.
The routine uses Gaussian elimination with partial pivoting.
Note that the equation AT*X = B may be solved by interchanging the order of the arguments du and dl.
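
A minimal illustrative sketch, not taken from the library documentation: it solves a hypothetical 4-by-4 tridiagonal system with LAPACKE_dgtsv; the diagonal values are assumptions made only for this example.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 4, nrhs = 1;
    double dl[3] = { 1.0, 1.0, 1.0 };        /* subdiagonal     */
    double d[4]  = { 4.0, 4.0, 4.0, 4.0 };   /* main diagonal   */
    double du[3] = { 1.0, 1.0, 1.0 };        /* superdiagonal   */
    double b[4]  = { 5.0, 6.0, 6.0, 5.0 };   /* right-hand side; exact solution is all ones */

    /* On exit dl, d, du hold the LU factors and b holds the solution X. */
    lapack_int info = LAPACKE_dgtsv(LAPACK_COL_MAJOR, n, nrhs, dl, d, du, b, n);
    if (info == 0)
        printf("x = %g %g %g %g\n", b[0], b[1], b[2], b[3]);
    else
        printf("LAPACKE_dgtsv returned info = %d\n", (int)info);
    return 0;
}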

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major
(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

n The order of A, the number of rows in B; n≥ 0.

nrhs The number of right-hand sides, the number of columns in B; nrhs≥ 0.

dl The array dl (size n - 1) contains the (n - 1) subdiagonal elements
of A.

d The array d (size n) contains the diagonal elements of A.

du The array du (size n - 1) contains the (n - 1) superdiagonal
elements of A.

b The array b of size max(1, ldb*nrhs) for column major layout and
max(1, ldb*n) for row major layout contains the matrix B whose
columns are the right-hand sides for the systems of equations.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

Output Parameters

dl Overwritten by the (n-2) elements of the second superdiagonal of the
upper triangular matrix U from the LU factorization of A. These
elements are stored in dl[0], ..., dl[n - 3].

d Overwritten by the n diagonal elements of U.

du Overwritten by the (n-1) elements of the first superdiagonal of U.

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, Ui, i is exactly zero, and the solution has not been computed. The factorization has not been
completed unless i = n.

See Also
Matrix Storage Schemes for LAPACK Routines

?gtsvx
Computes the solution to the real or complex system
of linear equations with a tridiagonal coefficient matrix
A and multiple right-hand sides, and provides error
bounds on the solution.


Syntax
lapack_int LAPACKE_sgtsvx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int nrhs, const float* dl, const float* d, const float* du, float* dlf, float*
df, float* duf, float* du2, lapack_int* ipiv, const float* b, lapack_int ldb, float*
x, lapack_int ldx, float* rcond, float* ferr, float* berr );
lapack_int LAPACKE_dgtsvx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int nrhs, const double* dl, const double* d, const double* du, double* dlf,
double* df, double* duf, double* du2, lapack_int* ipiv, const double* b, lapack_int
ldb, double* x, lapack_int ldx, double* rcond, double* ferr, double* berr );
lapack_int LAPACKE_cgtsvx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int nrhs, const lapack_complex_float* dl, const lapack_complex_float* d, const
lapack_complex_float* du, lapack_complex_float* dlf, lapack_complex_float* df,
lapack_complex_float* duf, lapack_complex_float* du2, lapack_int* ipiv, const
lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x, lapack_int ldx,
float* rcond, float* ferr, float* berr );
lapack_int LAPACKE_zgtsvx( int matrix_layout, char fact, char trans, lapack_int n,
lapack_int nrhs, const lapack_complex_double* dl, const lapack_complex_double* d, const
lapack_complex_double* du, lapack_complex_double* dlf, lapack_complex_double* df,
lapack_complex_double* duf, lapack_complex_double* du2, lapack_int* ipiv, const
lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x, lapack_int ldx,
double* rcond, double* ferr, double* berr );

Include Files
• mkl.h

Description

The routine uses the LU factorization to compute the solution to a real or complex system of linear equations
A*X = B, A^T*X = B, or A^H*X = B, where A is a tridiagonal matrix of order n, the columns of matrix B are
individual right-hand sides, and the columns of X are the corresponding solutions.
Error bounds on the solution and a condition estimate are also provided.
The routine ?gtsvx performs the following steps:

1. If fact = 'N', the LU decomposition is used to factor the matrix A as A = L*U, where L is a product
of permutation and unit lower bidiagonal matrices and U is an upper triangular matrix with nonzeroes in
only the main diagonal and first two superdiagonals.
2. If some U(i, i) = 0, so that U is exactly singular, then the routine returns with info = i. Otherwise, the
factored form of A is used to estimate the condition number of the matrix A. If the reciprocal of the
condition number is less than machine precision, info = n + 1 is returned as a warning, but the
routine still goes on to solve for X and compute error bounds as described below.
3. The system of equations is solved for X using the factored form of A.
4. Iterative refinement is applied to improve the computed solution matrix and calculate error bounds and
backward error estimates for it.
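
For illustration, the following minimal sketch calls LAPACKE_dgtsvx with fact = 'N' on a small made-up
system; the matrix values and variable names are illustrative assumptions, not part of this reference.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Expert driver: factor, solve, and return error bounds for a 4x4 tridiagonal system. */
    lapack_int n = 4, nrhs = 1, ldb = 1, ldx = 1;   /* row major: ldb, ldx >= nrhs */
    double dl[3] = {-1.0, -1.0, -1.0};
    double d [4] = { 4.0,  4.0,  4.0,  4.0};
    double du[3] = {-1.0, -1.0, -1.0};
    double b [4] = { 3.0,  2.0,  2.0,  3.0};

    double dlf[3], df[4], duf[3], du2[2];           /* factorization output */
    lapack_int ipiv[4];
    double x[4], rcond, ferr[1], berr[1];

    lapack_int info = LAPACKE_dgtsvx(LAPACK_ROW_MAJOR, 'N', 'N', n, nrhs,
                                     dl, d, du, dlf, df, duf, du2, ipiv,
                                     b, ldb, x, ldx, &rcond, ferr, berr);
    printf("info = %d, rcond = %g, ferr = %g, berr = %g\n",
           (int)info, rcond, ferr[0], berr[0]);
    return 0;
}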

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

fact Must be 'F' or 'N'.

Specifies whether or not the factored form of the matrix A has been
supplied on entry.
If fact = 'F': on entry, dlf, df, duf, du2, and ipiv contain the
factored form of A; arrays dl, d, du, dlf, df, duf, du2, and ipiv will not
be modified.
If fact = 'N', the matrix A will be copied to dlf, df, and duf and
factored.

trans Must be 'N', 'T', or 'C'.

Specifies the form of the system of equations:

If trans = 'N', the system has the form A*X = B (No transpose).

If trans = 'T', the system has the form A^T*X = B (Transpose).

If trans = 'C', the system has the form A^H*X = B (Conjugate transpose).

n The number of linear equations, the order of the matrix A; n≥ 0.

nrhs The number of right hand sides, the number of columns of the
matrices B and X; nrhs≥ 0.

dl, d, du, dlf, df, duf, du2, b Arrays:

dl, size (n - 1), contains the subdiagonal elements of A.
d, size (n), contains the diagonal elements of A.
du, size (n -1), contains the superdiagonal elements of A.
dlf, size (n -1). If fact = 'F', then dlf is an input argument and on
entry contains the (n -1) multipliers that define the matrix L from the
LU factorization of A as computed by ?gttrf.

df, size (n). If fact = 'F', then df is an input argument and on
entry contains the n diagonal elements of the upper triangular matrix
U from the LU factorization of A.
duf, size (n -1). If fact = 'F', then duf is an input argument and on
entry contains the (n -1) elements of the first superdiagonal of U.
du2, size (n -2). If fact = 'F', then du2 is an input argument and
on entry contains the (n-2) elements of the second superdiagonal of
U.
b, size max(ldb*nrhs) for column major layout and max(ldb*n) for
row major layout, contains the right-hand side matrix B.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ldx The leading dimension of x; ldx≥ max(1, n) for column major layout
and ldx≥nrhs for row major layout.

ipiv Array, size at least max(1, n). If fact = 'F', then ipiv is an input
argument and on entry contains the pivot indices, as returned by
?gttrf.


Output Parameters

x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.
If info = 0 or info = n+1, the array x contains the solution matrix
X.

dlf If fact = 'N', then dlf is an output argument and on exit contains
the (n-1) multipliers that define the matrix L from the LU
factorization of A.

df If fact = 'N', then df is an output argument and on exit contains
the n diagonal elements of the upper triangular matrix U from the LU
factorization of A.

duf If fact = 'N', then duf is an output argument and on exit contains
the (n-1) elements of the first superdiagonal of U.

du2 If fact = 'N', then du2 is an output argument and on exit contains
the (n-2) elements of the second superdiagonal of U.

ipiv The array ipiv is an output argument if fact = 'N' and, on exit,
contains the pivot indices from the factorization A = L*U; row i of
the matrix was interchanged with row ipiv[i-1]. The value of ipiv[i-1]
will always be i or i+1; ipiv[i-1] = i indicates that a row interchange
was not required.

rcond An estimate of the reciprocal condition number of the matrix A. If
rcond is less than the machine precision (in particular, if rcond = 0),
the matrix is singular to working precision. This condition is indicated
by a return code of info > 0.

ferr Array, size at least max(1, nrhs). Contains the estimated forward
error bound for each solution vector xj (the j-th column of the solution
matrix X). If xtrue is the true solution corresponding to xj, ferr[j-1]
is an estimated upper bound for the magnitude of the largest element
in xj - xtrue divided by the magnitude of the largest element in xj. The
estimate is as reliable as the estimate for rcond, and is almost always
a slight overestimate of the true error.

berr Array, size at least max(1, nrhs). Contains the component-wise
relative backward error for each solution vector xj, that is, the
smallest relative change in any element of A or B that makes xj an
exact solution.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, and i≤n, then U(i, i) is exactly zero. The factorization has not been completed unless i = n, but
the factor U is exactly singular, so the solution and error bounds could not be computed; rcond = 0 is
returned. If info = i, and i = n + 1, then U is nonsingular, but rcond is less than machine precision,
meaning that the matrix is singular to working precision. Nevertheless, the solution and error bounds are
computed because there are a number of situations where the computed solution can be more accurate than
the value of rcond would suggest.

See Also
Matrix Storage Schemes for LAPACK Routines

?dtsvb
Computes the solution to the system of linear
equations with a diagonally dominant tridiagonal
coefficient matrix A and multiple right-hand sides.

Syntax
void sdtsvb (const MKL_INT * n, const MKL_INT * nrhs, float * dl, float * d, const
float * du, float * b, const MKL_INT * ldb, MKL_INT * info );
void ddtsvb (const MKL_INT * n, const MKL_INT * nrhs, double * dl, double * d, const
double * du, double * b, const MKL_INT * ldb, MKL_INT * info );
void cdtsvb (const MKL_INT * n, const MKL_INT * nrhs, MKL_Complex8 * dl, MKL_Complex8 *
d, const MKL_Complex8 * du, MKL_Complex8 * b, const MKL_INT * ldb, MKL_INT * info );
void zdtsvb (const MKL_INT * n, const MKL_INT * nrhs, MKL_Complex16 * dl, MKL_Complex16
* d, const MKL_Complex16 * du, MKL_Complex16 * b, const MKL_INT * ldb, MKL_INT *
info );

Include Files
• mkl.h

Description

The ?dtsvb routine solves a system of linear equations A*X = B for X, where A is an n-by-n diagonally
dominant tridiagonal matrix, the columns of matrix B are individual right-hand sides, and the columns of X
are the corresponding solutions. The routine uses the BABE (Burning At Both Ends) algorithm.
Note that the equation A^T*X = B may be solved by interchanging the order of the arguments du and dl.
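
For illustration, the following minimal sketch calls ddtsvb on a small made-up diagonally dominant system;
the matrix values and variable names are illustrative assumptions, not part of this reference. Note that this
routine uses the Fortran-style (column major) calling convention shown in the Syntax section above.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Diagonally dominant tridiagonal system (|4| > |-1| + |-1|), one right-hand side. */
    MKL_INT n = 4, nrhs = 1, ldb = 4, info;
    double dl[3] = {-1.0, -1.0, -1.0};
    double d [4] = { 4.0,  4.0,  4.0,  4.0};
    double du[3] = {-1.0, -1.0, -1.0};
    double b [4] = { 3.0,  2.0,  2.0,  3.0};   /* overwritten by the solution */

    ddtsvb(&n, &nrhs, dl, d, du, b, &ldb, &info);
    printf("info = %d, x = [%g %g %g %g]\n", (int)info, b[0], b[1], b[2], b[3]);
    return 0;
}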

Input Parameters

n The order of A, the number of rows in B; n≥ 0.

nrhs The number of right-hand sides, the number of columns in B; nrhs≥ 0.

dl, d, du, b Arrays: dl (size n - 1), d (size n), du (size n - 1), b (size max(ldb*nrhs)
for column major layout and max(ldb*n) for row major layout).

The array dl contains the (n - 1) subdiagonal elements of A.

The array d contains the diagonal elements of A.

The array du contains the (n - 1) superdiagonal elements of A.

The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

ldb The leading dimension of b; ldb≥ max(1, n).


Output Parameters

dl Overwritten by the (n-1) elements of the subdiagonal of the lower
triangular matrices L1, L2 from the factorization of A (see dttrfb).

d Overwritten by the n diagonal element reciprocals of U.

b Overwritten by the solution matrix X.

info If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, u(i, i) is exactly zero, and the solution has not been
computed. The factorization has not been completed unless i = n.

Application Notes
A diagonally dominant tridiagonal system is defined such that |d_i| > |dl_(i-1)| + |du_i| for any i with
1 < i < n, and |d_1| > |du_1|, |d_n| > |dl_(n-1)|.

The underlying BABE algorithm is designed for diagonally dominant systems. Such systems have no
numerical stability issues, unlike the canonical systems that use elimination with partial pivoting (see ?gtsv),
and solving them with this routine is much faster than with the canonical solver.

NOTE

• The current implementation of BABE has a potential accuracy issue with very small or very large input
  data close to the underflow or overflow threshold, respectively. Scale the matrix before applying the
  solver for such input data.
• Applying the ?dtsvb factorization to non-diagonally dominant systems may lead to a loss of accuracy
  or to a false singularity being detected, because no pivoting is performed.

?posv
Computes the solution to the system of linear
equations with a symmetric or Hermitian positive-
definite coefficient matrix A and multiple right-hand
sides.

Syntax
lapack_int LAPACKE_sposv (int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
float * a, lapack_int lda, float * b, lapack_int ldb);
lapack_int LAPACKE_dposv (int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
double * a, lapack_int lda, double * b, lapack_int ldb);
lapack_int LAPACKE_cposv (int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
lapack_complex_float * a, lapack_int lda, lapack_complex_float * b, lapack_int ldb);
lapack_int LAPACKE_zposv (int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
lapack_complex_double * a, lapack_int lda, lapack_complex_double * b, lapack_int ldb);
lapack_int LAPACKE_dsposv (int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
double * a, lapack_int lda, double * b, lapack_int ldb, double * x, lapack_int ldx,
lapack_int * iter);

lapack_int LAPACKE_zcposv (int matrix_layout, char uplo, lapack_int n, lapack_int nrhs,
lapack_complex_double * a, lapack_int lda, lapack_complex_double * b, lapack_int ldb,
lapack_complex_double * x, lapack_int ldx, lapack_int * iter);

Include Files
• mkl.h

Description

The routine solves for X the real or complex system of linear equations A*X = B, where A is an n-by-n
symmetric/Hermitian positive-definite matrix, the columns of matrix B are individual right-hand sides, and
the columns of X are the corresponding solutions.
The Cholesky decomposition is used to factor A as
A = U^T*U (real flavors) and A = U^H*U (complex flavors), if uplo = 'U'
or A = L*L^T (real flavors) and A = L*L^H (complex flavors), if uplo = 'L',

where U is an upper triangular matrix and L is a lower triangular matrix. The factored form of A is then used
to solve the system of equations A*X = B.

The dsposv and zcposv routines are mixed-precision iterative refinement subroutines for exploiting fast single
precision hardware. They first attempt to factorize the matrix in single precision (dsposv) or single complex
precision (zcposv) and use this factorization within an iterative refinement procedure to produce a solution
with double precision (dsposv) / double complex precision (zcposv) normwise backward error quality (see
below). If the approach fails, the method switches to a double precision or double complex precision
factorization respectively and computes the solution.
Iterative refinement is not a winning strategy if the ratio of single precision (or single complex)
performance to double precision (or double complex) performance is too small. A reasonable strategy should
take the number of right-hand sides and the size of the matrix into account. This might be done with a call to
ilaenv in the future. At present, the routine always attempts iterative refinement first.
The iterative refinement process is stopped if
iter > itermax
or for all the right-hand sides:
rnmr < sqrt(n)*xnrm*anrm*eps*bwdmax,
where

• iter is the number of the current iteration in the iterative refinement process
• rnmr is the infinity-norm of the residual
• xnrm is the infinity-norm of the solution
• anrm is the infinity-operator-norm of the matrix A
• eps is the machine epsilon returned by dlamch (‘Epsilon’).

The values itermax and bwdmax are fixed to 30 and 1.0d+00 respectively.
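
For illustration, the following minimal sketch calls LAPACKE_dposv on a small made-up symmetric
positive-definite system; the matrix values, variable names, and layout choice are illustrative assumptions,
not part of this reference.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Solve A*x = b for a 3x3 symmetric positive-definite A (upper triangle supplied). */
    lapack_int n = 3, nrhs = 1, lda = 3, ldb = 1;   /* row major: ldb >= nrhs */
    double a[9] = { 4.0, 1.0, 1.0,                  /* only the upper triangle is read */
                    0.0, 3.0, 0.0,
                    0.0, 0.0, 2.0 };
    double b[3] = { 6.0, 4.0, 3.0 };                /* overwritten by the solution */

    lapack_int info = LAPACKE_dposv(LAPACK_ROW_MAJOR, 'U', n, nrhs, a, lda, b, ldb);
    printf("info = %d, x = [%g %g %g]\n", (int)info, b[0], b[1], b[2]);
    return 0;
}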

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:


If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.


n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides, the number of columns in B; nrhs≥ 0.

a, b Arrays: a (size max(1, lda*n)), b (size max(ldb*nrhs) for column major
layout and max(ldb*n) for row major layout). The array a contains
the upper or the lower triangular part of the matrix A (see uplo).

Note that in the case of zcposv the imaginary parts of the diagonal
elements need not be set and are assumed to be zero.
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

lda The leading dimension of a; lda≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ldx The leading dimension of the array x; ldx≥ max(1, n) for column
major layout and ldx≥nrhs for row major layout.

Output Parameters

a If info = 0, the upper or lower triangular part of a is overwritten by
the Cholesky factor U or L, as specified by uplo.
If iterative refinement has been successfully used (info = 0 and
iter≥ 0), then A is unchanged.
If double precision factorization has been used (info = 0 and iter <
0), then the array A contains the factors L or U from the Cholesky
factorization.

b Overwritten by the solution matrix X.


x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout. If info = 0, contains the n-by-nrhs
solution matrix X.

iter If iter < 0: iterative refinement has failed; double precision
factorization has been performed.

• If iter = -1: the routine fell back to full precision for
  implementation- or machine-specific reasons.
• If iter = -2: narrowing the precision induced an overflow; the
  routine fell back to full precision.
• If iter = -3: failure of spotrf for dsposv, or cpotrf for zcposv.
• If iter = -31: the iterative refinement was stopped after the 30th
  iteration.

If iter > 0: iterative refinement has been successfully used. Returns
the number of iterations.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the leading minor of order i (and therefore the matrix A itself) is not positive definite, so the
factorization could not be completed, and the solution has not been computed.

See Also
Matrix Storage Schemes for LAPACK Routines

?posvx
Uses the Cholesky factorization to compute the
solution to the system of linear equations with a
symmetric or Hermitian positive-definite coefficient
matrix A, and provides error bounds on the solution.

Syntax
lapack_int LAPACKE_sposvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, float* a, lapack_int lda, float* af, lapack_int ldaf, char* equed,
float* s, float* b, lapack_int ldb, float* x, lapack_int ldx, float* rcond, float*
ferr, float* berr );
lapack_int LAPACKE_dposvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, double* a, lapack_int lda, double* af, lapack_int ldaf, char* equed,
double* s, double* b, lapack_int ldb, double* x, lapack_int ldx, double* rcond,
double* ferr, double* berr );
lapack_int LAPACKE_cposvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, lapack_complex_float* a, lapack_int lda, lapack_complex_float* af,
lapack_int ldaf, char* equed, float* s, lapack_complex_float* b, lapack_int ldb,
lapack_complex_float* x, lapack_int ldx, float* rcond, float* ferr, float* berr );
lapack_int LAPACKE_zposvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, lapack_complex_double* a, lapack_int lda, lapack_complex_double* af,
lapack_int ldaf, char* equed, double* s, lapack_complex_double* b, lapack_int ldb,
lapack_complex_double* x, lapack_int ldx, double* rcond, double* ferr, double* berr );

Include Files
• mkl.h

Description

The routine uses the Cholesky factorization A = U^T*U (real flavors) / A = U^H*U (complex flavors) or A = L*L^T (real
flavors) / A = L*L^H (complex flavors) to compute the solution to a real or complex system of linear equations
A*X = B, where A is an n-by-n real symmetric/Hermitian positive definite matrix, the columns of matrix B are
individual right-hand sides, and the columns of X are the corresponding solutions.
Error bounds on the solution and a condition estimate are also provided.


The routine ?posvx performs the following steps:

1. If fact = 'E', real scaling factors s are computed to equilibrate the system:

diag(s)*A*diag(s)*inv(diag(s))*X = diag(s)*B.

Whether or not the system will be equilibrated depends on the scaling of the matrix A, but if
equilibration is used, A is overwritten by diag(s)*A*diag(s) and B by diag(s)*B.
2. If fact = 'N' or 'E', the Cholesky decomposition is used to factor the matrix A (after equilibration if
fact = 'E') as
A = U^T*U (real), A = U^H*U (complex), if uplo = 'U',
or A = L*L^T (real), A = L*L^H (complex), if uplo = 'L',

where U is an upper triangular matrix and L is a lower triangular matrix.


3. If the leading i-by-i principal minor is not positive-definite, then the routine returns with info = i.
Otherwise, the factored form of A is used to estimate the condition number of the matrix A. If the
reciprocal of the condition number is less than machine precision, info = n + 1 is returned as a
warning, but the routine still goes on to solve for X and compute error bounds as described below.
4. The system of equations is solved for X using the factored form of A.
5. Iterative refinement is applied to improve the computed solution matrix and calculate error bounds and
backward error estimates for it.
6. If equilibration was used, the matrix X is premultiplied by diag(s) so that it solves the original system
before equilibration.
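
For illustration, the following minimal sketch calls LAPACKE_dposvx with fact = 'E' on a small made-up
symmetric positive-definite system; the matrix values and variable names are illustrative assumptions, not
part of this reference.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Expert driver: equilibrate if needed, factor, solve, and return error bounds. */
    lapack_int n = 3, nrhs = 1, lda = 3, ldaf = 3, ldb = 1, ldx = 1;
    double a[9]  = { 4.0, 1.0, 1.0,
                     0.0, 3.0, 0.0,
                     0.0, 0.0, 2.0 };             /* upper triangle of A */
    double af[9], s[3], x[3];
    double b[3]  = { 6.0, 4.0, 3.0 };
    double rcond, ferr[1], berr[1];
    char equed;                                    /* set on exit when fact = 'E' */

    lapack_int info = LAPACKE_dposvx(LAPACK_ROW_MAJOR, 'E', 'U', n, nrhs,
                                     a, lda, af, ldaf, &equed, s, b, ldb,
                                     x, ldx, &rcond, ferr, berr);
    printf("info = %d, equed = %c, rcond = %g, ferr = %g\n",
           (int)info, equed, rcond, ferr[0]);
    return 0;
}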

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

fact Must be 'F', 'N', or 'E'.

Specifies whether or not the factored form of the matrix A is supplied
on entry, and if not, whether the matrix A should be equilibrated
before it is factored.
If fact = 'F': on entry, af contains the factored form of A. If equed
= 'Y', the matrix A has been equilibrated with scaling factors given
by s.
a and af will not be modified.
If fact = 'N', the matrix A will be copied to af and factored.

If fact = 'E', the matrix A will be equilibrated if necessary, then
copied to af and factored.

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:


If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides, the number of columns in B; nrhs≥ 0.

a, af, b Arrays: a (size max(1, lda*n)), af (size max(1, ldaf*n)), b (size
max(ldb*nrhs) for column major layout and max(ldb*n) for row
major layout).
The array a contains the matrix A as specified by uplo. If fact = 'F'
and equed = 'Y', then A must have been equilibrated by the scaling
factors in s, and a must contain the equilibrated matrix
diag(s)*A*diag(s).
The array af is an input argument if fact = 'F'. It contains the
triangular factor U or L from the Cholesky factorization of A in the
same storage format as A. If equed is not 'N', then af is the factored
form of the equilibrated matrix diag(s)*A*diag(s).

The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

lda The leading dimension of a; lda≥ max(1, n).

ldaf The leading dimension of af; ldaf≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

equed Must be 'N' or 'Y'.

equed is an input argument if fact = 'F'. It specifies the form of
equilibration that was done:
if equed = 'N', no equilibration was done (always true if fact =
'N');
if equed = 'Y', equilibration was done, that is, A has been replaced
by diag(s)*A*diag(s).

s Array, size (n). The array s contains the scale factors for A. This array
is an input argument if fact = 'F' only; otherwise it is an output
argument.
If equed = 'N', s is not accessed.

If fact = 'F' and equed = 'Y', each element of s must be positive.

ldx The leading dimension of the output array x; ldx≥ max(1, n) for
column major layout and ldx≥nrhs for row major layout.

Output Parameters

x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.
If info = 0 or info = n+1, the array x contains the solution matrix
X to the original system of equations. Note that if equed = 'Y', A
and B are modified on exit, and the solution to the equilibrated system
is inv(diag(s))*X.

a Array a is not modified on exit if fact = 'F' or 'N', or if fact =
'E' and equed = 'N'.
If fact = 'E' and equed = 'Y', A is overwritten by
diag(s)*A*diag(s).

af If fact = 'N' or 'E', then af is an output argument and on exit
returns the triangular factor U or L from the Cholesky factorization
A = U^T*U or A = L*L^T (real routines), A = U^H*U or A = L*L^H (complex
routines) of the original matrix A (if fact = 'N'), or of the
equilibrated matrix A (if fact = 'E'). See the description of a for the
form of the equilibrated matrix.

b Overwritten by diag(s)*B, if equed = 'Y'; not changed if equed =
'N'.

s This array is an output argument if fact≠'F'. See the description of s
in Input Arguments section.

rcond An estimate of the reciprocal condition number of the matrix A after
equilibration (if done). If rcond is less than the machine precision (in
particular, if rcond = 0), the matrix is singular to working precision.
This condition is indicated by a return code of info > 0.

ferr Array, size at least max(1, nrhs). Contains the estimated forward
error bound for each solution vector xj (the j-th column of the solution
matrix X). If xtrue is the true solution corresponding to xj, ferr[j-1]
is an estimated upper bound for the magnitude of the largest element
in (xj - xtrue) divided by the magnitude of the largest element in xj.
The estimate is as reliable as the estimate for rcond, and is almost
always a slight overestimate of the true error.

berr Array, size at least max(1, nrhs). Contains the component-wise
relative backward error for each solution vector xj, that is, the
smallest relative change in any element of A or B that makes xj an
exact solution.

equed If fact≠'F', then equed is an output argument. It specifies the form
of equilibration that was done (see the description of equed in Input
Arguments section).

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, and i≤n, the leading minor of order i (and therefore the matrix A itself) is not positive-definite,
so the factorization could not be completed, and the solution and error bounds could not be computed; rcond
=0 is returned.
If info = i, and i = n + 1, then U is nonsingular, but rcond is less than machine precision, meaning that the
matrix is singular to working precision. Nevertheless, the solution and error bounds are computed because
there are a number of situations where the computed solution can be more accurate than the value of rcond
would suggest.

See Also
Matrix Storage Schemes for LAPACK Routines

?posvxx
Uses extra precise iterative refinement to compute the
solution to the system of linear equations with a
symmetric or Hermitian positive-definite coefficient
matrix A applying the Cholesky factorization.

Syntax
lapack_int LAPACKE_sposvxx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, float* a, lapack_int lda, float* af, lapack_int ldaf, char* equed,
float* s, float* b, lapack_int ldb, float* x, lapack_int ldx, float* rcond, float*
rpvgrw, float* berr, lapack_int n_err_bnds, float* err_bnds_norm, float*
err_bnds_comp, lapack_int nparams, const float* params );
lapack_int LAPACKE_dposvxx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, double* a, lapack_int lda, double* af, lapack_int ldaf, char* equed,
double* s, double* b, lapack_int ldb, double* x, lapack_int ldx, double* rcond,
double* rpvgrw, double* berr, lapack_int n_err_bnds, double* err_bnds_norm, double*
err_bnds_comp, lapack_int nparams, const double* params );
lapack_int LAPACKE_cposvxx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, lapack_complex_float* a, lapack_int lda, lapack_complex_float* af,
lapack_int ldaf, char* equed, float* s, lapack_complex_float* b, lapack_int ldb,
lapack_complex_float* x, lapack_int ldx, float* rcond, float* rpvgrw, float* berr,
lapack_int n_err_bnds, float* err_bnds_norm, float* err_bnds_comp, lapack_int nparams,
const float* params );
lapack_int LAPACKE_zposvxx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, lapack_complex_double* a, lapack_int lda, lapack_complex_double* af,
lapack_int ldaf, char* equed, double* s, lapack_complex_double* b, lapack_int ldb,
lapack_complex_double* x, lapack_int ldx, double* rcond, double* rpvgrw, double* berr,
lapack_int n_err_bnds, double* err_bnds_norm, double* err_bnds_comp, lapack_int
nparams, const double* params );

Include Files
• mkl.h
Description

The routine uses the Cholesky factorization A = U^T*U (real flavors) / A = U^H*U (complex flavors) or A = L*L^T (real
flavors) / A = L*L^H (complex flavors) to compute the solution to a real or complex system of linear equations
A*X = B, where A is an n-by-n real symmetric/Hermitian positive definite matrix, the columns of matrix B
are individual right-hand sides, and the columns of X are the corresponding solutions.
Both normwise and maximum componentwise error bounds are also provided on request. The routine returns
a solution with a small guaranteed error (O(eps), where eps is the working machine precision) unless the
matrix is very ill-conditioned, in which case a warning is returned. Relevant condition numbers are also
calculated and returned.
The routine accepts user-provided factorizations and equilibration factors; see definitions of the fact and
equed options. Solving with refinement and using a factorization from a previous call of the routine also
produces a solution with O(eps) errors or warnings but that may not be true for general user-provided
factorizations and equilibration factors if they differ from what the routine would itself produce.
The routine ?posvxx performs the following steps:
1. If fact = 'E', scaling factors are computed to equilibrate the system:

diag(s)*A*diag(s) *inv(diag(s))*X = diag(s)*B


Whether or not the system will be equilibrated depends on the scaling of the matrix A, but if
equilibration is used, A is overwritten by diag(s)*A*diag(s) and B by diag(s)*B.
2. If fact = 'N' or 'E', the Cholesky decomposition is used to factor the matrix A (after equilibration if
fact = 'E') as
A = U^T*U (real), A = U^H*U (complex), if uplo = 'U',
or A = L*L^T (real), A = L*L^H (complex), if uplo = 'L',

where U is an upper triangular matrix and L is a lower triangular matrix.


3. If the leading i-by-i principal minor is not positive-definite, the routine returns with info = i.
Otherwise, the factored form of A is used to estimate the condition number of the matrix A (see the
rcond parameter). If the reciprocal of the condition number is less than machine precision, the routine
still goes on to solve for X and compute error bounds.
4. The system of equations is solved for X using the factored form of A.
5. By default, unless params[0] is set to zero, the routine applies iterative refinement to get a small error
and error bounds. Refinement calculates the residual to at least twice the working precision.
6. If equilibration was used, the matrix X is premultiplied by diag(s) so that it solves the original system
before equilibration.
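
For illustration, the following minimal sketch calls LAPACKE_dposvxx on a small made-up symmetric
positive-definite system with the default refinement parameters (nparams = 0, so params is never
referenced); the matrix values and variable names are illustrative assumptions, not part of this reference.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Extra-precise refinement driver with normwise and componentwise error bounds. */
    lapack_int n = 3, nrhs = 1, lda = 3, ldaf = 3, ldb = 1, ldx = 1;
    lapack_int n_err_bnds = 3, nparams = 0;        /* nparams = 0: use default params */
    double a[9]  = { 4.0, 1.0, 1.0,
                     0.0, 3.0, 0.0,
                     0.0, 0.0, 2.0 };
    double af[9], s[3], x[3];
    double b[3]  = { 6.0, 4.0, 3.0 };
    double rcond, rpvgrw, berr[1];
    double err_bnds_norm[3], err_bnds_comp[3];     /* nrhs * n_err_bnds entries */
    char equed;

    lapack_int info = LAPACKE_dposvxx(LAPACK_ROW_MAJOR, 'E', 'U', n, nrhs,
                                      a, lda, af, ldaf, &equed, s, b, ldb,
                                      x, ldx, &rcond, &rpvgrw, berr,
                                      n_err_bnds, err_bnds_norm, err_bnds_comp,
                                      nparams, NULL);
    printf("info = %d, rcond = %g, normwise error bound = %g\n",
           (int)info, rcond, err_bnds_norm[1]);    /* err = 2: "guaranteed" bound */
    return 0;
}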

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

fact Must be 'F', 'N', or 'E'.

Specifies whether or not the factored form of the matrix A is supplied on
entry, and if not, whether the matrix A should be equilibrated before it is
factored.
If fact = 'F', on entry, af contains the factored form of A. If equed is not
'N', the matrix A has been equilibrated with scaling factors given by s.
Parameters a and af are not modified.

If fact = 'N', the matrix A will be copied to af and factored.

If fact = 'E', the matrix A will be equilibrated, if necessary, copied to af
and factored.

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:


If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The number of linear equations; the order of the matrix A; n≥ 0.

nrhs The number of right-hand sides; the number of columns of the matrices B
and X; nrhs≥ 0.

a, af, b Arrays: a (size max(1, lda*n)), af (size max(1, ldaf*n)), b (size max(1,
ldb*nrhs) for column major layout and max(1, ldb*n) for row major
layout).
The array a contains the matrix A as specified by uplo. If fact = 'F' and
equed = 'Y', then A must have been equilibrated by the scaling factors in
s, and a must contain the equilibrated matrix diag(s)*A*diag(s).

The array af is an input argument if fact = 'F'. It contains the triangular
factor U or L from the Cholesky factorization of A in the same storage
format as A. If equed is not 'N', then af is the factored form of the
equilibrated matrix diag(s)*A*diag(s).

The array b contains the matrix B whose columns are the right-hand sides
for the systems of equations.

lda The leading dimension of the array a; lda≥ max(1,n).

ldaf The leading dimension of the array af; ldaf≥ max(1,n).

equed Must be 'N' or 'Y'.

equed is an input argument if fact = 'F'. It specifies the form of
equilibration that was done:
If equed = 'N', no equilibration was done (always true if fact = 'N').

if equed = 'Y', both row and column equilibration was done, that is, A has
been replaced by diag(s)*A*diag(s).

s Array, size (n). The array s contains the scale factors for A. This array is an
input argument if fact = 'F' only; otherwise it is an output argument.

If equed = 'N', s is not accessed.

If fact = 'F' and equed = 'Y', each element of s must be positive.

Each element of s should be a power of the radix to ensure a reliable
solution and error estimates. Scaling by powers of the radix does not cause
rounding errors unless the result underflows or overflows. Rounding errors
during scaling lead to refining with a matrix that is not equivalent to the
input matrix, producing error estimates that may not be reliable.

ldb The leading dimension of the array b; ldb≥ max(1, n) for column major
layout and ldb≥nrhs for row major layout.

ldx The leading dimension of the output array x; ldx≥ max(1, n) for column
major layout and ldx≥nrhs for row major layout.

n_err_bnds Number of error bounds to return for each right hand side and each type
(normwise or componentwise). See err_bnds_norm and err_bnds_comp
descriptions in the Output Arguments section below.

nparams Specifies the number of parameters set in params. If ≤ 0, the params array
is never referenced and default values are used.

params Array, size max(1,nparams). Specifies algorithm parameters. If an entry is
less than 0.0, that entry is filled with the default value used for that
parameter. Only positions up to nparams are accessed; defaults are used
for higher-numbered parameters. If defaults are acceptable, you can pass
nparams = 0, which prevents the source code from accessing the params
argument.
params[0] : Whether to perform iterative refinement or not. Default: 1.0
(for single precision flavors), 1.0D+0 (for double precision flavors).


=0.0        No refinement is performed and no error bounds are computed.

=1.0        Use the extra-precise refinement algorithm.

(Other values are reserved for future use.)

params[1] : Maximum number of residual computations allowed for refinement.

Default     10.0

Aggressive  Set to 100.0 to permit convergence using approximate
            factorizations or factorizations other than LU. If the
            factorization uses a technique other than Gaussian elimination,
            the guarantees in err_bnds_norm and err_bnds_comp may no longer
            be trustworthy.

params[2] : Flag determining if the code will attempt to find a solution
with a small componentwise relative error in the double-precision algorithm.
Positive is true, 0.0 is false. Default: 1.0 (attempt componentwise
convergence).

Output Parameters

x Array, size max(1, ldx*nrhs) for column major layout and max(1, ldx*n)
for row major layout.
If info = 0, the array x contains the solution n-by-nrhs matrix X to the
original system of equations. Note that A and B are modified on exit if
equed≠'N', and the solution to the equilibrated system is:
inv(diag(s))*X.

a Array a is not modified on exit if fact = 'F' or 'N', or if fact = 'E' and
equed = 'N'.
If fact = 'E' and equed = 'Y', A is overwritten by diag(s)*A*diag(s).

af If fact = 'N' or 'E', then af is an output argument and on exit returns
the triangular factor U or L from the Cholesky factorization A = U^T*U or
A = L*L^T (real routines), A = U^H*U or A = L*L^H (complex routines) of the original
matrix A (if fact = 'N'), or of the equilibrated matrix A (if fact = 'E').
See the description of a for the form of the equilibrated matrix.

b If equed = 'N', B is not modified.

If equed = 'Y', B is overwritten by diag(s)*B.

s This array is an output argument if fact≠'F'. Each element of this array is
a power of the radix. See the description of s in Input Arguments section.

rcond Reciprocal scaled condition number. An estimate of the reciprocal Skeel
condition number of the matrix A after equilibration (if done). If rcond is
less than the machine precision, in particular, if rcond = 0, the matrix is
singular to working precision. Note that the error may still be small even if
this number is very small and the matrix appears ill-conditioned.

rpvgrw Contains the reciprocal pivot growth factor.
If this is much less than 1, the stability of the LU factorization of the
(equilibrated) matrix A could be poor. This also means that the solution X,
estimated condition numbers, and error bounds could be unreliable. If
factorization fails with 0 < info≤n, this parameter contains the reciprocal
pivot growth factor for the leading info columns of A.

berr Array, size at least max(1, nrhs). Contains the componentwise relative
backward error for each solution vector xj, that is, the smallest relative
change in any element of A or B that makes xj an exact solution.

err_bnds_norm Array of size nrhs*n_err_bnds. For each right-hand side, contains
information about various error bounds and condition numbers
corresponding to the normwise relative error, which is defined as the
norm of the error in the i-th solution vector divided by the norm of that
solution vector.

The array is indexed by the type of error information as described below.
There are currently up to three pieces of information returned.

err=1 "Trust/don't trust" boolean. Trust the answer if the reciprocal
      condition number is less than the threshold sqrt(n)*slamch(ε) for
      single precision flavors and sqrt(n)*dlamch(ε) for double precision
      flavors.

err=2 "Guaranteed" error bound. The estimated forward error, almost
      certainly within a factor of 10 of the true error so long as the next
      entry is greater than the threshold sqrt(n)*slamch(ε) for single
      precision flavors and sqrt(n)*dlamch(ε) for double precision flavors.
      This error bound should only be trusted if the previous boolean is
      true.

err=3 Reciprocal condition number. Estimated normwise reciprocal condition
      number. Compared with the threshold sqrt(n)*slamch(ε) for single
      precision flavors and sqrt(n)*dlamch(ε) for double precision flavors
      to determine if the error estimate is "guaranteed". These reciprocal
      condition numbers for some appropriately scaled matrix Z are:
      Let z = s*a, where s scales each row by a power of the radix so all
      absolute row sums of z are approximately 1.

The information for right-hand side i, where 1 ≤i≤nrhs, and type of error
err is stored in err_bnds_norm[(err-1)*nrhs + i - 1].

err_bnds_comp Array of size nrhs*n_err_bnds. For each right-hand side, contains
information about various error bounds and condition numbers
corresponding to the componentwise relative error, which is defined as the
maximum error in any component of the i-th solution vector relative to the
corresponding component of that solution vector.

The array is indexed by the type of error information as described below.
There are currently up to three pieces of information returned for each
right-hand side. If componentwise accuracy is not requested (params[2] =
0.0), then err_bnds_comp is not accessed.

err=1 "Trust/don't trust" boolean. Trust the answer if the reciprocal
      condition number is less than the threshold sqrt(n)*slamch(ε) for
      single precision flavors and sqrt(n)*dlamch(ε) for double precision
      flavors.

err=2 "Guaranteed" error bound. The estimated forward error, almost
      certainly within a factor of 10 of the true error so long as the next
      entry is greater than the threshold sqrt(n)*slamch(ε) for single
      precision flavors and sqrt(n)*dlamch(ε) for double precision flavors.
      This error bound should only be trusted if the previous boolean is
      true.

err=3 Reciprocal condition number. Estimated componentwise reciprocal
      condition number. Compared with the threshold sqrt(n)*slamch(ε) for
      single precision flavors and sqrt(n)*dlamch(ε) for double precision
      flavors to determine if the error estimate is "guaranteed". These
      reciprocal condition numbers for some appropriately scaled matrix Z
      are:
      Let z = s*(a*diag(x)), where x is the solution for the current
      right-hand side and s scales each row of a*diag(x) by a power of the
      radix so all absolute row sums of z are approximately 1.

The information for right-hand side i, where 1 ≤i≤nrhs, and type of error
err is stored in err_bnds_comp[(err-1)*nrhs + i - 1].

equed If fact≠'F', then equed is an output argument. It specifies the form of
equilibration that was done (see the description of equed in Input
Arguments section).

params If an entry is less than 0.0, that entry is filled with the default value used
for that parameter, otherwise the entry is not modified.

Return Values
This function returns a value info.

If info = 0, the execution is successful. The solution to every right-hand side is guaranteed.

If info = -i, parameter i had an illegal value.

If 0 < info≤n: U(info, info) is exactly zero. The factorization has been completed, but the factor U is exactly
singular, so the solution and error bounds could not be computed; rcond = 0 is returned.

If info = n+j: The solution corresponding to the j-th right-hand side is not guaranteed. The solutions
corresponding to other right-hand sides k with k > j may not be guaranteed as well, but only the first such
right-hand side is reported. If a small componentwise error is not requested params[2] = 0.0, then the j-th
right-hand side is the first with a normwise error bound that is not guaranteed (the smallest j such that for
column major layout err_bnds_norm[j - 1] = 0.0 or err_bnds_comp[j - 1] = 0.0; or for row major
layout err_bnds_norm[(j - 1)*n_err_bnds] = 0.0 or err_bnds_comp[(j - 1)*n_err_bnds] = 0.0).
See the definition of err_bnds_norm and err_bnds_comp for err = 1. To get information about all of the
right-hand sides, check err_bnds_norm or err_bnds_comp.

See Also
Matrix Storage Schemes for LAPACK Routines

?ppsv
Computes the solution to the system of linear
equations with a symmetric (Hermitian) positive
definite packed coefficient matrix A and multiple right-
hand sides.

Syntax
lapack_int LAPACKE_sppsv (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , float * ap , float * b , lapack_int ldb );
lapack_int LAPACKE_dppsv (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , double * ap , double * b , lapack_int ldb );
lapack_int LAPACKE_cppsv (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , lapack_complex_float * ap , lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zppsv (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , lapack_complex_double * ap , lapack_complex_double * b , lapack_int ldb );


Include Files
• mkl.h

Description

The routine solves for X the real or complex system of linear equations A*X = B, where A is an n-by-n real
symmetric/Hermitian positive-definite matrix stored in packed format, the columns of matrix B are individual
right-hand sides, and the columns of X are the corresponding solutions.
The Cholesky decomposition is used to factor A as
A = U^T*U (real flavors) and A = U^H*U (complex flavors), if uplo = 'U'
or A = L*L^T (real flavors) and A = L*L^H (complex flavors), if uplo = 'L',

where U is an upper triangular matrix and L is a lower triangular matrix. The factored form of A is then used
to solve the system of equations A*X = B.
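
For illustration, the following minimal sketch calls LAPACKE_dppsv on a small made-up symmetric
positive-definite system; the matrix values and variable names are illustrative assumptions, not part of this
reference. Column major layout is used here so that the upper triangle is packed column by column.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Packed symmetric positive-definite system, upper triangle packed by columns. */
    lapack_int n = 3, nrhs = 1, ldb = 3;            /* column major: ldb >= max(1, n) */
    double ap[6] = { 4.0,                           /* A(1,1)                 */
                     1.0, 3.0,                      /* A(1,2), A(2,2)         */
                     1.0, 0.0, 2.0 };               /* A(1,3), A(2,3), A(3,3) */
    double b[3]  = { 6.0, 4.0, 3.0 };               /* overwritten by the solution */

    lapack_int info = LAPACKE_dppsv(LAPACK_COL_MAJOR, 'U', n, nrhs, ap, b, ldb);
    printf("info = %d, x = [%g %g %g]\n", (int)info, b[0], b[1], b[2]);
    return 0;
}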

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:


If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides, the number of columns in B; nrhs≥ 0.

ap, b Arrays: ap (size max(1, n*(n+1)/2)), b (size max(ldb*nrhs) for
column major layout and max(ldb*n) for row major layout). The
array ap contains the upper or the lower triangular part of the matrix
A (as specified by uplo) in packed storage (see Matrix Storage
Schemes).
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

Output Parameters

ap If info = 0, the upper or lower triangular part of A in packed storage is
overwritten by the Cholesky factor U or L, as specified by uplo.

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the leading minor of order i (and therefore the matrix A itself) is not positive-definite, so the
factorization could not be completed, and the solution has not been computed.

See Also
Matrix Storage Schemes for LAPACK Routines

?ppsvx
Uses the Cholesky factorization to compute the
solution to the system of linear equations with a
symmetric (Hermitian) positive definite packed
coefficient matrix A, and provides error bounds on the
solution.

Syntax
lapack_int LAPACKE_sppsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, float* ap, float* afp, char* equed, float* s, float* b, lapack_int
ldb, float* x, lapack_int ldx, float* rcond, float* ferr, float* berr );
lapack_int LAPACKE_dppsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, double* ap, double* afp, char* equed, double* s, double* b,
lapack_int ldb, double* x, lapack_int ldx, double* rcond, double* ferr, double* berr );
lapack_int LAPACKE_cppsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, lapack_complex_float* ap, lapack_complex_float* afp, char* equed,
float* s, lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x, lapack_int
ldx, float* rcond, float* ferr, float* berr );
lapack_int LAPACKE_zppsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, lapack_complex_double* ap, lapack_complex_double* afp, char* equed,
double* s, lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x,
lapack_int ldx, double* rcond, double* ferr, double* berr );

Include Files
• mkl.h

Description

The routine uses the Cholesky factorization A = U^T*U (real flavors) / A = U^H*U (complex flavors) or A = L*L^T (real
flavors) / A = L*L^H (complex flavors) to compute the solution to a real or complex system of linear equations
A*X = B, where A is an n-by-n symmetric or Hermitian positive-definite matrix stored in packed format, the
columns of matrix B are individual right-hand sides, and the columns of X are the corresponding solutions.
Error bounds on the solution and a condition estimate are also provided.
The routine ?ppsvx performs the following steps:

1. If fact = 'E', real scaling factors s are computed to equilibrate the system:

diag(s)*A*diag(s)*inv(diag(s))*X = diag(s)*B.

Whether or not the system will be equilibrated depends on the scaling of the matrix A, but if
equilibration is used, A is overwritten by diag(s)*A*diag(s) and B by diag(s)*B.
2. If fact = 'N' or 'E', the Cholesky decomposition is used to factor the matrix A (after equilibration if
fact = 'E') as
A = U^T*U (real), A = U^H*U (complex), if uplo = 'U',
or A = L*L^T (real), A = L*L^H (complex), if uplo = 'L',


where U is an upper triangular matrix and L is a lower triangular matrix.


3. If the leading i-by-i principal minor is not positive-definite, then the routine returns with info = i.
Otherwise, the factored form of A is used to estimate the condition number of the matrix A. If the
reciprocal of the condition number is less than machine precision, info = n+1 is returned as a
warning, but the routine still goes on to solve for X and compute error bounds as described below.
4. The system of equations is solved for X using the factored form of A.
5. Iterative refinement is applied to improve the computed solution matrix and calculate error bounds and
backward error estimates for it.
6. If equilibration was used, the matrix X is premultiplied by diag(s) so that it solves the original system
before equilibration.
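
For illustration, the following minimal sketch calls LAPACKE_dppsvx with fact = 'E' on a small made-up
packed symmetric positive-definite system; the matrix values and variable names are illustrative
assumptions, not part of this reference.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Expert packed driver: equilibrate if needed, factor, solve, return error bounds. */
    lapack_int n = 3, nrhs = 1, ldb = 3, ldx = 3;   /* column major layout */
    double ap[6]  = { 4.0, 1.0, 3.0, 1.0, 0.0, 2.0 };   /* upper triangle, packed by columns */
    double afp[6], s[3], x[3];
    double b[3]   = { 6.0, 4.0, 3.0 };
    double rcond, ferr[1], berr[1];
    char equed;

    lapack_int info = LAPACKE_dppsvx(LAPACK_COL_MAJOR, 'E', 'U', n, nrhs,
                                     ap, afp, &equed, s, b, ldb,
                                     x, ldx, &rcond, ferr, berr);
    printf("info = %d, equed = %c, rcond = %g\n", (int)info, equed, rcond);
    return 0;
}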

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

fact Must be 'F', 'N', or 'E'.

Specifies whether or not the factored form of the matrix A is supplied
on entry, and if not, whether the matrix A should be equilibrated
before it is factored.
If fact = 'F': on entry, afp contains the factored form of A. If
equed = 'Y', the matrix A has been equilibrated with scaling factors
given by s.
ap and afp will not be modified.
If fact = 'N', the matrix A will be copied to afp and factored.

If fact = 'E', the matrix A will be equilibrated if necessary, then
copied to afp and factored.

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:


If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides; the number of columns in B; nrhs≥ 0.

ap, afp, b Arrays: ap (size max(1, n*(n+1)/2)), afp (size max(1, n*(n+1)/2)), b of size
max(1, ldb*nrhs) for column major layout and max(1, ldb*n) for
row major layout.
The array ap contains the upper or lower triangle of the original
symmetric/Hermitian matrix A in packed storage (see Matrix Storage
Schemes). In case when fact = 'F' and equed = 'Y', ap must
contain the equilibrated matrix diag(s)*A*diag(s).

The array afp is an input argument if fact = 'F' and contains the
triangular factor U or L from the Cholesky factorization of A in the
same storage format as A. If equed is not 'N', then afp is the
factored form of the equilibrated matrix A.

The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

equed Must be 'N' or 'Y'.

equed is an input argument if fact = 'F'. It specifies the form of
equilibration that was done:
if equed = 'N', no equilibration was done (always true if fact =
'N');
if equed = 'Y', equilibration was done, that is, A has been replaced
by diag(s)*A*diag(s).

s Array, size (n). The array s contains the scale factors for A. This array
is an input argument if fact = 'F' only; otherwise it is an output
argument.
If equed = 'N', s is not accessed.

If fact = 'F' and equed = 'Y', each element of s must be positive.

ldx The leading dimension of the output array x; ldx≥ max(1, n) for
column major layout and ldx≥nrhs for row major layout.

Output Parameters

x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.
If info = 0 or info = n+1, the array x contains the solution matrix
X to the original system of equations. Note that if equed = 'Y', A
and B are modified on exit, and the solution to the equilibrated system
is inv(diag(s))*X.

ap Array ap is not modified on exit if fact = 'F' or 'N', or if fact =
'E' and equed = 'N'.
If fact = 'E' and equed = 'Y', ap is overwritten by
diag(s)*A*diag(s).

afp If fact = 'N' or 'E', then afp is an output argument and on exit
returns the triangular factor U or L from the Cholesky factorization
A = U^T*U or A = L*L^T (real routines), A = U^H*U or A = L*L^H (complex
routines) of the original matrix A (if fact = 'N'), or of the
equilibrated matrix A (if fact = 'E'). See the description of ap for
the form of the equilibrated matrix.

b Overwritten by diag(s)*B, if equed = 'Y'; not changed if equed =
'N'.

s This array is an output argument if fact≠'F'. See the description of s
in Input Arguments section.

rcond An estimate of the reciprocal condition number of the matrix A after
equilibration (if done). If rcond is less than the machine precision (in
particular, if rcond = 0), the matrix is singular to working precision.
This condition is indicated by a return code of info > 0.

ferr Array, size at least max(1, nrhs). Contains the estimated forward
error bound for each solution vector xj (the j-th column of the solution
matrix X). If xtrue is the true solution corresponding to xj,
ferr[j-1] is an estimated upper bound for the magnitude of the
largest element in (xj - xtrue) divided by the magnitude of the
largest element in xj. The estimate is as reliable as the estimate for
rcond, and is almost always a slight overestimate of the true error.

berr Array, size at least max(1, nrhs). Contains the component-wise
relative backward error for each solution vector xj, that is, the
smallest relative change in any element of A or B that makes xj an
exact solution.

equed If fact≠'F', then equed is an output argument. It specifies the form
of equilibration that was done (see the description of equed in Input
Arguments section).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, and i≤n, the leading minor of order i (and therefore the matrix A itself) is not positive-definite,
so the factorization could not be completed, and the solution and error bounds could not be computed; rcond
= 0 is returned.
If info = i, and i = n + 1, then U is nonsingular, but rcond is less than machine precision, meaning that the
matrix is singular to working precision. Nevertheless, the solution and error bounds are computed because
there are a number of situations where the computed solution can be more accurate than the value of rcond
would suggest.

See Also
Matrix Storage Schemes for LAPACK Routines

?pbsv
Computes the solution to the system of linear
equations with a symmetric or Hermitian positive-
definite band coefficient matrix A and multiple right-
hand sides.

Syntax
lapack_int LAPACKE_spbsv (int matrix_layout , char uplo , lapack_int n , lapack_int
kd , lapack_int nrhs , float * ab , lapack_int ldab , float * b , lapack_int ldb );
lapack_int LAPACKE_dpbsv (int matrix_layout , char uplo , lapack_int n , lapack_int
kd , lapack_int nrhs , double * ab , lapack_int ldab , double * b , lapack_int ldb );
lapack_int LAPACKE_cpbsv (int matrix_layout , char uplo , lapack_int n , lapack_int
kd , lapack_int nrhs , lapack_complex_float * ab , lapack_int ldab ,
lapack_complex_float * b , lapack_int ldb );

lapack_int LAPACKE_zpbsv (int matrix_layout , char uplo , lapack_int n , lapack_int
kd , lapack_int nrhs , lapack_complex_double * ab , lapack_int ldab ,
lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description

The routine solves for X the real or complex system of linear equations A*X = B, where A is an n-by-n
symmetric/Hermitian positive definite band matrix, the columns of matrix B are individual right-hand sides,
and the columns of X are the corresponding solutions.
The Cholesky decomposition is used to factor A as
A = U^T*U (real flavors) and A = U^H*U (complex flavors), if uplo = 'U'
or A = L*L^T (real flavors) and A = L*L^H (complex flavors), if uplo = 'L',

where U is an upper triangular band matrix and L is a lower triangular band matrix, with the same number of
superdiagonals or subdiagonals as A. The factored form of A is then used to solve the system of equations
A*X = B.
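
For illustration, the following minimal sketch calls LAPACKE_dpbsv on a small made-up symmetric
positive-definite band system with one superdiagonal; the matrix values, variable names, and the band
storage layout shown in the comments are illustrative assumptions, not part of this reference.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Symmetric positive-definite band system (kd = 1 superdiagonal), column major. */
    lapack_int n = 4, kd = 1, nrhs = 1, ldab = 2, ldb = 4;
    /* Band storage of the upper triangle: A(i,j) goes to ab[(kd + i - j) + j*ldab], 0-based. */
    double ab[8] = { 0.0, 4.0,      /* column 1: (unused), A(1,1) */
                    -1.0, 4.0,      /* column 2: A(1,2),   A(2,2) */
                    -1.0, 4.0,      /* column 3: A(2,3),   A(3,3) */
                    -1.0, 4.0 };    /* column 4: A(3,4),   A(4,4) */
    double b[4]  = { 3.0, 2.0, 2.0, 3.0 };   /* overwritten by the solution */

    lapack_int info = LAPACKE_dpbsv(LAPACK_COL_MAJOR, 'U', n, kd, nrhs, ab, ldab, b, ldb);
    printf("info = %d, x = [%g %g %g %g]\n", (int)info, b[0], b[1], b[2], b[3]);
    return 0;
}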

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:


If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The order of matrix A; n≥ 0.

kd The number of superdiagonals of the matrix A if uplo = 'U', or the
number of subdiagonals if uplo = 'L'; kd≥ 0.

nrhs The number of right-hand sides, the number of columns in B; nrhs≥ 0.

ab, b Arrays: ab (size max(1, ldab*n)), b of size max(1, ldb*nrhs) for
column major layout and max(1, ldb*n) for row major layout. The
array ab contains the upper or the lower triangular part of the matrix
A (as specified by uplo) in band storage (see Matrix Storage
Schemes).
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

ldab The leading dimension of the array ab; ldab≥kd +1.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.


Output Parameters

ab The upper or lower triangular part of A (in band storage) is


overwritten by the Cholesky factor U or L, as specified by uplo, in the
same storage format as A.

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the leading minor of order i (and therefore the matrix A itself) is not positive-definite, so the
factorization could not be completed, and the solution has not been computed.
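
The following minimal sketch is illustrative and not part of the reference: the 4-by-4 matrix data, the
variable names, and the column-major layout are assumptions made for the example. It shows one possible
call of LAPACKE_dpbsv for a symmetric positive-definite tridiagonal matrix stored in band format.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Illustrative sizes: order 4, one superdiagonal, one right-hand side. */
    lapack_int n = 4, kd = 1, nrhs = 1, ldab = kd + 1, ldb = n;
    /* Upper band of an SPD tridiagonal matrix in column-major band storage:
       in each column, the first entry is the superdiagonal (unused in
       column 1) and the second entry is the diagonal. */
    double ab[] = { 0.0, 4.0,   /* column 1: (unused), A(1,1) */
                    1.0, 4.0,   /* column 2: A(1,2), A(2,2)   */
                    1.0, 4.0,   /* column 3: A(2,3), A(3,3)   */
                    1.0, 4.0 }; /* column 4: A(3,4), A(4,4)   */
    double b[] = { 5.0, 6.0, 6.0, 5.0 };   /* right-hand side A*[1 1 1 1]^T */

    lapack_int info = LAPACKE_dpbsv(LAPACK_COL_MAJOR, 'U', n, kd, nrhs,
                                    ab, ldab, b, ldb);
    if (info != 0) {
        printf("LAPACKE_dpbsv failed, info = %d\n", (int)info);
        return 1;
    }
    /* On success b has been overwritten by the solution (here all ones). */
    for (lapack_int i = 0; i < n; ++i)
        printf("x[%d] = %f\n", (int)i, b[i]);
    return 0;
}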

See Also
Matrix Storage Schemes for LAPACK Routines

?pbsvx
Uses the Cholesky factorization to compute the
solution to the system of linear equations with a
symmetric (Hermitian) positive-definite band
coefficient matrix A, and provides error bounds on the
solution.

Syntax
lapack_int LAPACKE_spbsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int kd, lapack_int nrhs, float* ab, lapack_int ldab, float* afb, lapack_int
ldafb, char* equed, float* s, float* b, lapack_int ldb, float* x, lapack_int ldx,
float* rcond, float* ferr, float* berr );
lapack_int LAPACKE_dpbsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int kd, lapack_int nrhs, double* ab, lapack_int ldab, double* afb, lapack_int
ldafb, char* equed, double* s, double* b, lapack_int ldb, double* x, lapack_int ldx,
double* rcond, double* ferr, double* berr );
lapack_int LAPACKE_cpbsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int kd, lapack_int nrhs, lapack_complex_float* ab, lapack_int ldab,
lapack_complex_float* afb, lapack_int ldafb, char* equed, float* s,
lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x, lapack_int ldx,
float* rcond, float* ferr, float* berr );
lapack_int LAPACKE_zpbsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int kd, lapack_int nrhs, lapack_complex_double* ab, lapack_int ldab,
lapack_complex_double* afb, lapack_int ldafb, char* equed, double* s,
lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x, lapack_int ldx,
double* rcond, double* ferr, double* berr );

Include Files
• mkl.h

Description

The routine uses the Cholesky factorization A = U^T*U (real flavors) / A = U^H*U (complex flavors) or A = L*L^T (real
flavors) / A = L*L^H (complex flavors) to compute the solution to a real or complex system of linear equations
A*X = B, where A is an n-by-n symmetric or Hermitian positive definite band matrix, the columns of matrix B
are individual right-hand sides, and the columns of X are the corresponding solutions.
Error bounds on the solution and a condition estimate are also provided.
The routine ?pbsvx performs the following steps:

1. If fact = 'E', real scaling factors s are computed to equilibrate the system:

diag(s)*A*diag(s)*inv(diag(s))*X = diag(s)*B.

Whether or not the system will be equilibrated depends on the scaling of the matrix A, but if
equilibration is used, A is overwritten by diag(s)*A*diag(s) and B by diag(s)*B.
2. If fact = 'N' or 'E', the Cholesky decomposition is used to factor the matrix A (after equilibration if
fact = 'E') as
A = U^T*U (real), A = U^H*U (complex), if uplo = 'U',
or A = L*L^T (real), A = L*L^H (complex), if uplo = 'L',

where U is an upper triangular band matrix and L is a lower triangular band matrix.
3. If the leading i-by-i principal minor is not positive definite, then the routine returns with info = i.
Otherwise, the factored form of A is used to estimate the condition number of the matrix A. If the
reciprocal of the condition number is less than machine precision, info = n+1 is returned as a
warning, but the routine still goes on to solve for X and compute error bounds as described below.
4. The system of equations is solved for X using the factored form of A.
5. Iterative refinement is applied to improve the computed solution matrix and calculate error bounds and
backward error estimates for it.
6. If equilibration was used, the matrix X is premultiplied by diag(s) so that it solves the original system
before equilibration.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

fact Must be 'F', 'N', or 'E'.

Specifies whether or not the factored form of the matrix A is supplied


on entry, and if not, whether the matrix A should be equilibrated
before it is factored.
If fact = 'F': on entry, afb contains the factored form of A. If
equed = 'Y', the matrix A has been equilibrated with scaling factors
given by s.
ab and afb will not be modified.
If fact = 'N', the matrix A will be copied to afb and factored.

If fact = 'E', the matrix A will be equilibrated if necessary, then


copied to afb and factored.

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:


If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.


n The order of matrix A; n≥ 0.

kd The number of superdiagonals or subdiagonals in the matrix A; kd≥ 0.

nrhs The number of right-hand sides, the number of columns in B; nrhs≥ 0.

ab, afb, b Arrays: ab(size max(1, ldab*n)), afb(size max(1, ldafb*n)), b of size
max(1, ldb*nrhs) for column major layout and max(1, ldb*n) for
row major layout.
The array ab contains the upper or lower triangle of the matrix A in
band storage (see Matrix Storage Schemes).
If fact = 'F' and equed = 'Y', then ab must contain the
equilibrated matrix diag(s)*A*diag(s).

The array afb is an input argument if fact = 'F'. It contains the


triangular factor U or L from the Cholesky factorization of the band
matrix A in the same storage format as A. If equed = 'Y', then afb
is the factored form of the equilibrated matrix A.
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

ldab The leading dimension of ab; ldab≥kd+1.

ldafb The leading dimension of afb; ldafb≥kd+1.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

equed Must be 'N' or 'Y'.

equed is an input argument if fact = 'F'. It specifies the form of


equilibration that was done:
if equed = 'N', no equilibration was done (always true if fact =
'N')
if equed = 'Y', equilibration was done, that is, A has been replaced
by diag(s)*A*diag(s).

s Array, size (n). The array s contains the scale factors for A. This array
is an input argument if fact = 'F' only; otherwise it is an output
argument.
If equed = 'N', s is not accessed.

If fact = 'F' and equed = 'Y', each element of s must be positive.

ldx The leading dimension of the output array x; ldx≥ max(1, n) for
column major layout and ldx≥nrhs for row major layout.

Output Parameters

x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.

If info = 0 or info = n+1, the array x contains the solution matrix X
to the original system of equations. Note that if equed = 'Y', A and
B are modified on exit, and the solution to the equilibrated system is
inv(diag(s))*X.

ab On exit, if fact = 'E' and equed = 'Y', A is overwritten by
diag(s)*A*diag(s).

afb If fact = 'N' or 'E', then afb is an output argument and on exit
returns the triangular factor U or L from the Cholesky factorization
A = U^T*U or A = L*L^T (real routines), A = U^H*U or A = L*L^H (complex
routines) of the original matrix A (if fact = 'N'), or of the
equilibrated matrix A (if fact = 'E'). See the description of ab for
the form of the equilibrated matrix.

b Overwritten by diag(s)*B, if equed = 'Y'; not changed if equed =


'N'.

s This array is an output argument if fact≠'F'. See the description of s


in Input Arguments section.

rcond An estimate of the reciprocal condition number of the matrix A after


equilibration (if done). If rcond is less than the machine precision (in
particular, if rcond = 0), the matrix is singular to working precision.
This condition is indicated by a return code of info > 0.

ferr Array, size at least max(1, nrhs). Contains the estimated forward
error bound for each solution vector xj (the j-th column of the
solution matrix X). If xtrue is the true solution corresponding to xj,
ferr[j-1] is an estimated upper bound for the magnitude of the
largest element in (xj - xtrue) divided by the magnitude of the
largest element in xj. The estimate is as reliable as the estimate for
rcond, and is almost always a slight overestimate of the true error.

berr Array, size at least max(1, nrhs). Contains the component-wise


relative backward error for each solution vector xj, that is, the
smallest relative change in any element of A or B that makes xj an
exact solution.

equed If fact≠'F', then equed is an output argument. It specifies the form


of equilibration that was done (see the description of equed in Input
Arguments section).

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, and i≤n, the leading minor of order i (and therefore the matrix A itself) is not positive definite,
so the factorization could not be completed, and the solution and error bounds could not be computed; rcond
= 0 is returned.
If info = i, and i = n + 1, then U is nonsingular, but rcond is less than machine precision, meaning that the
matrix is singular to working precision. Nevertheless, the solution and error bounds are computed because
there are a number of situations where the computed solution can be more accurate than the value of rcond
would suggest.

See Also
Matrix Storage Schemes for LAPACK Routines

?ptsv
Computes the solution to the system of linear
equations with a symmetric or Hermitian positive
definite tridiagonal coefficient matrix A and multiple
right-hand sides.

Syntax
lapack_int LAPACKE_sptsv( int matrix_layout, lapack_int n, lapack_int nrhs, float* d,
float* e, float* b, lapack_int ldb );
lapack_int LAPACKE_dptsv( int matrix_layout, lapack_int n, lapack_int nrhs, double* d,
double* e, double* b, lapack_int ldb );
lapack_int LAPACKE_cptsv( int matrix_layout, lapack_int n, lapack_int nrhs, float* d,
lapack_complex_float* e, lapack_complex_float* b, lapack_int ldb );
lapack_int LAPACKE_zptsv( int matrix_layout, lapack_int n, lapack_int nrhs, double* d,
lapack_complex_double* e, lapack_complex_double* b, lapack_int ldb );

Include Files
• mkl.h

Description

The routine solves for X the real or complex system of linear equations A*X = B, where A is an n-by-n
symmetric/Hermitian positive-definite tridiagonal matrix, the columns of matrix B are individual right-hand
sides, and the columns of X are the corresponding solutions.
A is factored as A = L*D*L^T (real flavors) or A = L*D*L^H (complex flavors), and the factored form of A is
then used to solve the system of equations A*X = B.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides, the number of columns in B; nrhs≥ 0.

d Array, dimension at least max(1, n). Contains the diagonal elements


of the tridiagonal matrix A.

e, b Arrays: e (size n - 1), b of size max(1, ldb*nrhs) for column major
layout and max(1, ldb*n) for row major layout. The array e contains
the (n - 1) subdiagonal elements of A.

The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

Output Parameters

d Overwritten by the n diagonal elements of the diagonal matrix D from
the L*D*L^T (real) / L*D*L^H (complex) factorization of A.

e Overwritten by the (n - 1) subdiagonal elements of the unit


bidiagonal factor L from the factorization of A.

b Overwritten by the solution matrix X.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, the leading minor of order i (and therefore the matrix A itself) is not positive-definite, and the
solution has not been computed. The factorization has not been completed unless i = n.
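
A minimal illustrative sketch (not part of the reference; the tridiagonal data and names are assumptions)
showing one possible call of LAPACKE_dptsv:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 4, nrhs = 1, ldb = n;
    /* Illustrative positive-definite tridiagonal system. */
    double d[] = { 4.0, 4.0, 4.0, 4.0 };   /* diagonal            */
    double e[] = { 1.0, 1.0, 1.0 };        /* subdiagonal (n - 1) */
    double b[] = { 5.0, 6.0, 6.0, 5.0 };   /* right-hand side     */

    lapack_int info = LAPACKE_dptsv(LAPACK_COL_MAJOR, n, nrhs, d, e, b, ldb);
    if (info != 0) {
        printf("LAPACKE_dptsv failed, info = %d\n", (int)info);
        return 1;
    }
    /* b now holds the solution; d and e hold the L*D*L^T factorization. */
    printf("x = [%f %f %f %f]\n", b[0], b[1], b[2], b[3]);
    return 0;
}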

See Also
Matrix Storage Schemes for LAPACK Routines

?ptsvx
Uses factorization to compute the solution to the
system of linear equations with a symmetric
(Hermitian) positive definite tridiagonal coefficient
matrix A, and provides error bounds on the solution.

Syntax
lapack_int LAPACKE_sptsvx( int matrix_layout, char fact, lapack_int n, lapack_int nrhs,
const float* d, const float* e, float* df, float* ef, const float* b, lapack_int ldb,
float* x, lapack_int ldx, float* rcond, float* ferr, float* berr );
lapack_int LAPACKE_dptsvx( int matrix_layout, char fact, lapack_int n, lapack_int nrhs,
const double* d, const double* e, double* df, double* ef, const double* b, lapack_int
ldb, double* x, lapack_int ldx, double* rcond, double* ferr, double* berr );
lapack_int LAPACKE_cptsvx( int matrix_layout, char fact, lapack_int n, lapack_int nrhs,
const float* d, const lapack_complex_float* e, float* df, lapack_complex_float* ef,
const lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x, lapack_int ldx,
float* rcond, float* ferr, float* berr );
lapack_int LAPACKE_zptsvx( int matrix_layout, char fact, lapack_int n, lapack_int nrhs,
const double* d, const lapack_complex_double* e, double* df, lapack_complex_double* ef,
const lapack_complex_double* b, lapack_int ldb, lapack_complex_double* x, lapack_int
ldx, double* rcond, double* ferr, double* berr );

Include Files
• mkl.h

Description


The routine uses the Cholesky factorization A = L*D*L^T (real) / A = L*D*L^H (complex) to compute the
solution to a real or complex system of linear equations A*X = B, where A is an n-by-n symmetric or
Hermitian positive definite tridiagonal matrix, the columns of matrix B are individual right-hand sides, and
the columns of X are the corresponding solutions.
Error bounds on the solution and a condition estimate are also provided.
The routine ?ptsvx performs the following steps:

1. If fact = 'N', the matrix A is factored as A = L*D*L^T (real flavors) / A = L*D*L^H (complex flavors),
where L is a unit lower bidiagonal matrix and D is diagonal. The factorization can also be regarded as
having the form A = U^T*D*U (real flavors) / A = U^H*D*U (complex flavors).
2. If the leading i-by-i principal minor is not positive-definite, then the routine returns with info = i.
Otherwise, the factored form of A is used to estimate the condition number of the matrix A. If the
reciprocal of the condition number is less than machine precision, info = n+1 is returned as a
warning, but the routine still goes on to solve for X and compute error bounds as described below.
3. The system of equations is solved for X using the factored form of A.
4. Iterative refinement is applied to improve the computed solution matrix and calculate error bounds and
backward error estimates for it.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

fact Must be 'F' or 'N'.

Specifies whether or not the factored form of the matrix A is supplied


on entry.
If fact = 'F': on entry, df and ef contain the factored form of A.
Arrays d, e, df, and ef will not be modified.
If fact = 'N', the matrix A will be copied to df and ef, and factored.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides, the number of columns in B; nrhs≥ 0.

d, df Arrays: d (size n), df (size n).


The array d contains the n diagonal elements of the tridiagonal matrix
A.
The array df is an input argument if fact = 'F' and on entry
contains the n diagonal elements of the diagonal matrix D from the
L*D*L^T (real) / L*D*L^H (complex) factorization of A.

e, ef, b Arrays: e (size n - 1), ef (size n - 1), b, size max(ldb*nrhs) for column
major layout and max(ldb*n) for row major layout. The array e
contains the (n - 1) subdiagonal elements of the tridiagonal matrix
A.
The array ef is an input argument if fact = 'F' and on entry
contains the (n - 1) subdiagonal elements of the unit bidiagonal
factor L from the L*D*L^T (real) / L*D*L^H (complex) factorization of A.

The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ldx The leading dimension of x; ldx≥ max(1, n) for column major layout
and ldx≥nrhs for row major layout.

Output Parameters

x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.
If info = 0 or info = n+1, the array x contains the solution matrix
X to the system of equations.

df, ef These arrays are output arguments if fact = 'N'. See the
description of df, ef in Input Arguments section.

rcond An estimate of the reciprocal condition number of the matrix A after


equilibration (if done). If rcond is less than the machine precision (in
particular, if rcond = 0), the matrix is singular to working precision.
This condition is indicated by a return code of info > 0.

ferr Array, size at least max(1, nrhs). Contains the estimated forward
error bound for each solution vector xj (the j-th column of the
solution matrix X). If xtrue is the true solution corresponding to xj,
ferr[j-1] is an estimated upper bound for the magnitude of the largest
element in (xj - xtrue) divided by the magnitude of the largest
element in xj. The estimate is as reliable as the estimate for rcond,
and is almost always a slight overestimate of the true error.

berr Array, size at least max(1, nrhs). Contains the component-wise


relative backward error for each solution vector xj, that is, the
smallest relative change in any element of A or B that makes xj an
exact solution.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, and i≤n, the leading minor of order i (and therefore the matrix A itself) is not positive-definite,
so the factorization could not be completed, and the solution and error bounds could not be computed; rcond
=0 is returned.
If info = i, and i = n + 1, then U is nonsingular, but rcond is less than machine precision, meaning that the
matrix is singular to working precision. Nevertheless, the solution and error bounds are computed because
there are a number of situations where the computed solution can be more accurate than the value of rcond
would suggest.
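
The following sketch is illustrative only (data and names are assumptions, reusing the hypothetical
tridiagonal system from the ?ptsv sketch); it requests a fresh factorization with fact = 'N':

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 4, nrhs = 1, ldb = n, ldx = n;
    /* Illustrative positive-definite tridiagonal system. */
    const double d[] = { 4.0, 4.0, 4.0, 4.0 };
    const double e[] = { 1.0, 1.0, 1.0 };
    const double b[] = { 5.0, 6.0, 6.0, 5.0 };
    double df[4], ef[3];                 /* receive the L*D*L^T factors */
    double x[4], rcond, ferr[1], berr[1];

    lapack_int info = LAPACKE_dptsvx(LAPACK_COL_MAJOR, 'N', n, nrhs, d, e,
                                     df, ef, b, ldb, x, ldx,
                                     &rcond, ferr, berr);
    if (info == 0 || info == n + 1)
        printf("x[0] = %f, rcond = %e\n", x[0], rcond);
    else
        printf("LAPACKE_dptsvx failed, info = %d\n", (int)info);
    return 0;
}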

See Also
Matrix Storage Schemes for LAPACK Routines


?sysv
Computes the solution to the system of linear
equations with a real or complex symmetric coefficient
matrix A and multiple right-hand sides.

Syntax
lapack_int LAPACKE_ssysv (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , float * a , lapack_int lda , lapack_int * ipiv , float * b , lapack_int ldb );
lapack_int LAPACKE_dsysv (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , double * a , lapack_int lda , lapack_int * ipiv , double * b , lapack_int ldb );
lapack_int LAPACKE_csysv (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , lapack_complex_float * a , lapack_int lda , lapack_int * ipiv ,
lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zsysv (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , lapack_complex_double * a , lapack_int lda , lapack_int * ipiv ,
lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description

The routine solves for X the real or complex system of linear equations A*X = B, where A is an n-by-n
symmetric matrix, the columns of matrix B are individual right-hand sides, and the columns of X are the
corresponding solutions.
The diagonal pivoting method is used to factor A as A = U*D*U^T or A = L*D*L^T, where U (or L) is a product
of permutation and unit upper (lower) triangular matrices, and D is symmetric and block diagonal with
1-by-1 and 2-by-2 diagonal blocks.
The factored form of A is then used to solve the system of equations A*X = B.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:


If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides; the number of columns in B; nrhs≥ 0.

a, b Arrays: a(size max(1, lda*n)), b of size max(1, ldb*nrhs) for column
major layout and max(1, ldb*n) for row major layout.

The array a contains the upper or the lower triangular part of the
symmetric matrix A (see uplo).

The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

lda The leading dimension of a; lda≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

Output Parameters

a If info = 0, a is overwritten by the block-diagonal matrix D and the


multipliers used to obtain the factor U (or L) from the factorization of
A as computed by ?sytrf.

b If info = 0, b is overwritten by the solution matrix X.

ipiv Array, size at least max(1, n). Contains details of the interchanges
and the block structure of D, as determined by ?sytrf.

If ipiv[i-1] = k > 0, then dii is a 1-by-1 diagonal block, and the
i-th row and column of A was interchanged with the k-th row and
column.
If uplo = 'U' and ipiv[i] = ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and (i)-th row and column of A was
interchanged with the m-th row and column.
If uplo = 'L' and ipiv[i] = ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and (i+1)-th row and column of A
was interchanged with the m-th row and column.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, dii is 0. The factorization has been completed, but D is exactly singular, so the solution could
not be computed.
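
A minimal illustrative sketch (not part of the reference; the 3-by-3 symmetric matrix, its full column-major
storage, and the variable names are assumptions) showing one possible call of LAPACKE_dsysv:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 3, nrhs = 1, lda = n, ldb = n;
    lapack_int ipiv[3];
    /* Illustrative symmetric matrix in full column-major storage; only the
       upper triangle is referenced because uplo = 'U'. */
    double a[] = { 4.0, 1.0, 2.0,    /* column 1 */
                   1.0, 3.0, 0.0,    /* column 2 */
                   2.0, 0.0, 5.0 };  /* column 3 */
    double b[] = { 7.0, 4.0, 7.0 };  /* right-hand side A*[1 1 1]^T */

    lapack_int info = LAPACKE_dsysv(LAPACK_COL_MAJOR, 'U', n, nrhs,
                                    a, lda, ipiv, b, ldb);
    if (info != 0) {
        printf("LAPACKE_dsysv failed, info = %d\n", (int)info);
        return 1;
    }
    printf("x = [%f %f %f]\n", b[0], b[1], b[2]);  /* b holds the solution */
    return 0;
}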

See Also
Matrix Storage Schemes for LAPACK Routines

?sysv_rook
Computes the solution to the system of linear
equations with a real or complex symmetric coefficient
matrix A and multiple right-hand sides.

Syntax
lapack_int LAPACKE_ssysv_rook (int matrix_layout , char uplo , lapack_int n ,
lapack_int nrhs , float * a , lapack_int lda , lapack_int * ipiv , float * b ,
lapack_int ldb );
lapack_int LAPACKE_dsysv_rook (int matrix_layout , char uplo , lapack_int n ,
lapack_int nrhs , double * a , lapack_int lda , lapack_int * ipiv , double * b ,
lapack_int ldb );

lapack_int LAPACKE_csysv_rook (int matrix_layout , char uplo , lapack_int n ,
lapack_int nrhs , lapack_complex_float * a , lapack_int lda , lapack_int * ipiv ,
lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zsysv_rook (int matrix_layout , char uplo , lapack_int n ,
lapack_int nrhs , lapack_complex_double * a , lapack_int lda , lapack_int * ipiv ,
lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description

The routine solves for X the real or complex system of linear equations A*X = B, where A is an n-by-n
symmetric matrix, the columns of matrix B are individual right-hand sides, and the columns of X are the
corresponding solutions.
The diagonal pivoting method is used to factor A as A = U*D*U^T or A = L*D*L^T, where U (or L) is a product
of permutation and unit upper (lower) triangular matrices, and D is symmetric and block diagonal with
1-by-1 and 2-by-2 diagonal blocks.
The ?sysv_rook routine is called to compute the factorization of a complex symmetric matrix A using the
bounded Bunch-Kaufman ("rook") diagonal pivoting method.
The factored form of A is then used to solve the system of equations A*X = B.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:


If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides; the number of columns in B; nrhs≥ 0.

a, b Arrays: a(size max(1, lda*n)), b of size max(1, ldb*nrhs) for column
major layout and max(1, ldb*n) for row major layout.

The array a contains the upper or the lower triangular part of the
symmetric matrix A (see uplo). The second dimension of a must be at
least max(1, n).

The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations. The second dimension of b must
be at least max(1,nrhs).

lda The leading dimension of a; lda≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

Output Parameters

a If info = 0, a is overwritten by the block-diagonal matrix D and the


multipliers used to obtain the factor U (or L) from the factorization of
A.

b If info = 0, b is overwritten by the solution matrix X.

ipiv Array, size at least max(1, n). Contains details of the interchanges
and the block structure of D.
If ipiv[k - 1] > 0, then rows and columns k and ipiv[k - 1] were
interchanged and D(k, k) is a 1-by-1 diagonal block.
If uplo = 'U' and ipiv[k - 1] < 0 and ipiv[k - 2] < 0, then
rows and columns k and -ipiv[k - 1] were interchanged, rows and
columns k - 1 and -ipiv[k - 2] were interchanged, and D(k-1:k, k-1:k) is
a 2-by-2 diagonal block.
If uplo = 'L' and ipiv[k - 1] < 0 and ipiv[k] < 0, then rows
and columns k and -ipiv[k - 1] were interchanged, rows and columns
k + 1 and -ipiv[k] were interchanged, and D(k:k+1, k:k+1) is a 2-by-2
diagonal block.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, dii is 0. The factorization has been completed, but D is exactly singular, so the solution could
not be computed.
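
Because ?sysv_rook takes exactly the same argument list as ?sysv, the illustrative sketch shown for ?sysv
applies here unchanged apart from the function name; for example (with the same hypothetical n, nrhs, a,
lda, ipiv, b, and ldb as in that sketch):

lapack_int info = LAPACKE_dsysv_rook(LAPACK_COL_MAJOR, 'U', n, nrhs,
                                     a, lda, ipiv, b, ldb);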

See Also
Matrix Storage Schemes for LAPACK Routines

?sysvx
Uses the diagonal pivoting factorization to compute
the solution to the system of linear equations with a
real or complex symmetric coefficient matrix A, and
provides error bounds on the solution.

Syntax
lapack_int LAPACKE_ssysvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, const float* a, lapack_int lda, float* af, lapack_int ldaf,
lapack_int* ipiv, const float* b, lapack_int ldb, float* x, lapack_int ldx, float*
rcond, float* ferr, float* berr );
lapack_int LAPACKE_dsysvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, const double* a, lapack_int lda, double* af, lapack_int ldaf,
lapack_int* ipiv, const double* b, lapack_int ldb, double* x, lapack_int ldx, double*
rcond, double* ferr, double* berr );
lapack_int LAPACKE_csysvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, const lapack_complex_float* a, lapack_int lda, lapack_complex_float*
af, lapack_int ldaf, lapack_int* ipiv, const lapack_complex_float* b, lapack_int ldb,
lapack_complex_float* x, lapack_int ldx, float* rcond, float* ferr, float* berr );

lapack_int LAPACKE_zsysvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, const lapack_complex_double* a, lapack_int lda, lapack_complex_double*
af, lapack_int ldaf, lapack_int* ipiv, const lapack_complex_double* b, lapack_int ldb,
lapack_complex_double* x, lapack_int ldx, double* rcond, double* ferr, double* berr );

Include Files
• mkl.h

Description

The routine uses the diagonal pivoting factorization to compute the solution to a real or complex system of
linear equations A*X = B, where A is an n-by-n symmetric matrix, the columns of matrix B are individual
right-hand sides, and the columns of X are the corresponding solutions.
Error bounds on the solution and a condition estimate are also provided.
The routine ?sysvx performs the following steps:

1. If fact = 'N', the diagonal pivoting method is used to factor the matrix A. The form of the
factorization is A = U*D*U^T or A = L*D*L^T, where U (or L) is a product of permutation and unit upper
(lower) triangular matrices, and D is symmetric and block diagonal with 1-by-1 and 2-by-2 diagonal
blocks.
2. If some dii = 0, so that D is exactly singular, then the routine returns with info = i. Otherwise, the
factored form of A is used to estimate the condition number of the matrix A. If the reciprocal of the
condition number is less than machine precision, info = n+1 is returned as a warning, but the routine
still goes on to solve for X and compute error bounds as described below.
3. The system of equations is solved for X using the factored form of A.
4. Iterative refinement is applied to improve the computed solution matrix and calculate error bounds and
backward error estimates for it.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

fact Must be 'F' or 'N'.

Specifies whether or not the factored form of the matrix A has been
supplied on entry.
If fact = 'F': on entry, af and ipiv contain the factored form of A.
Arrays a, af, and ipiv will not be modified.
If fact = 'N', the matrix A will be copied to af and factored.

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:


If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides, the number of columns in B; nrhs≥ 0.

a, af, b Arrays: a(size max(1, lda*n)), af(size max(1, ldaf*n)), b of size
max(1, ldb*nrhs) for column major layout and max(1, ldb*n) for
row major layout.
The array a contains the upper or the lower triangular part of the
symmetric matrix A (see uplo).
The array af is an input argument if fact = 'F'. It contains the block
diagonal matrix D and the multipliers used to obtain the factor U or L
from the factorization A = U*D*U^T or A = L*D*L^T as computed
by ?sytrf.
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

lda The leading dimension of a; lda≥ max(1, n).

ldaf The leading dimension of af; ldaf≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ipiv Array, size at least max(1, n). The array ipiv is an input argument if
fact = 'F'. It contains details of the interchanges and the block
structure of D, as determined by ?sytrf.

If ipiv[i-1] = k > 0, then dii is a 1-by-1 diagonal block, and the


i-th row and column of A was interchanged with the k-th row and
column.
If uplo = 'U' and ipiv[i] = ipiv[i-1] = -m < 0, then D has a
2-by-2 block in rows/columns i and i+1, and (i)-th row and column of
A was interchanged with the m-th row and column.
If uplo = 'L' and ipiv[i] = ipiv[i-1] = -m < 0, then D has a
2-by-2 block in rows/columns i and i+1, and (i+1)-th row and column
of A was interchanged with the m-th row and column.

ldx The leading dimension of the output array x; ldx≥ max(1, n) for
column major layout and ldx≥nrhs for row major layout.

Output Parameters

x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.
If info = 0 or info = n+1, the array x contains the solution matrix
X to the system of equations.

af, ipiv These arrays are output arguments if fact = 'N'.

See the description of af, ipiv in Input Arguments section.

rcond An estimate of the reciprocal condition number of the matrix A. If


rcond is less than the machine precision (in particular, if rcond = 0),
the matrix is singular to working precision. This condition is indicated
by a return code of info > 0.


ferr Array, size at least max(1, nrhs). Contains the estimated forward
error bound for each solution vector xj (the j-th column of the solution
matrix X). If xtrue is the true solution corresponding to xj, ferr[j-1]
is an estimated upper bound for the magnitude of the largest element
in (xj - xtrue) divided by the magnitude of the largest element in xj.
The estimate is as reliable as the estimate for rcond, and is almost
always a slight overestimate of the true error.

berr Array, size at least max(1, nrhs). Contains the component-wise


relative backward error for each solution vector xj, that is, the
smallest relative change in any element of A or B that makes xj an
exact solution.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, and i≤n, then dii is exactly zero. The factorization has been completed, but the block diagonal
matrix D is exactly singular, so the solution and error bounds could not be computed; rcond = 0 is returned.
If info = i, and i = n + 1, then D is nonsingular, but rcond is less than machine precision, meaning that the
matrix is singular to working precision. Nevertheless, the solution and error bounds are computed because
there are a number of situations where the computed solution can be more accurate than the value of rcond
would suggest.
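
The sketch below is illustrative and not part of the reference; it reuses the hypothetical symmetric system
from the ?sysv sketch and requests a fresh factorization with fact = 'N':

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 3, nrhs = 1, lda = n, ldaf = n, ldb = n, ldx = n;
    lapack_int ipiv[3];
    /* Illustrative symmetric system (upper triangle referenced). */
    const double a[] = { 4.0, 1.0, 2.0,
                         1.0, 3.0, 0.0,
                         2.0, 0.0, 5.0 };
    const double b[] = { 7.0, 4.0, 7.0 };
    double af[9], x[3], rcond, ferr[1], berr[1];

    lapack_int info = LAPACKE_dsysvx(LAPACK_COL_MAJOR, 'N', 'U', n, nrhs,
                                     a, lda, af, ldaf, ipiv, b, ldb,
                                     x, ldx, &rcond, ferr, berr);
    if (info == 0 || info == n + 1)
        printf("x[0] = %f, rcond = %e, ferr = %e\n", x[0], rcond, ferr[0]);
    else
        printf("LAPACKE_dsysvx failed, info = %d\n", (int)info);
    return 0;
}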

See Also
Matrix Storage Schemes for LAPACK Routines

?sysvxx
Uses extra precise iterative refinement to compute the
solution to the system of linear equations with a
symmetric indefinite coefficient matrix A applying the
diagonal pivoting factorization.

Syntax
lapack_int LAPACKE_ssysvxx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, float* a, lapack_int lda, float* af, lapack_int ldaf, lapack_int*
ipiv, char* equed, float* s, float* b, lapack_int ldb, float* x, lapack_int ldx,
float* rcond, float* rpvgrw, float* berr, lapack_int n_err_bnds, float* err_bnds_norm,
float* err_bnds_comp, lapack_int nparams, const float* params );
lapack_int LAPACKE_dsysvxx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, double* a, lapack_int lda, double* af, lapack_int ldaf, lapack_int*
ipiv, char* equed, double* s, double* b, lapack_int ldb, double* x, lapack_int ldx,
double* rcond, double* rpvgrw, double* berr, lapack_int n_err_bnds, double*
err_bnds_norm, double* err_bnds_comp, lapack_int nparams, const double* params );
lapack_int LAPACKE_csysvxx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, lapack_complex_float* a, lapack_int lda, lapack_complex_float* af,
lapack_int ldaf, lapack_int* ipiv, char* equed, float* s, lapack_complex_float* b,
lapack_int ldb, lapack_complex_float* x, lapack_int ldx, float* rcond, float* rpvgrw,
float* berr, lapack_int n_err_bnds, float* err_bnds_norm, float* err_bnds_comp,
lapack_int nparams, const float* params );

lapack_int LAPACKE_zsysvxx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, lapack_complex_double* a, lapack_int lda, lapack_complex_double* af,
lapack_int ldaf, lapack_int* ipiv, char* equed, double* s, lapack_complex_double* b,
lapack_int ldb, lapack_complex_double* x, lapack_int ldx, double* rcond, double*
rpvgrw, double* berr, lapack_int n_err_bnds, double* err_bnds_norm, double*
err_bnds_comp, lapack_int nparams, const double* params );

Include Files
• mkl.h

Description

The routine uses the diagonal pivoting factorization to compute the solution to a real or complex system of
linear equations A*X = B, where A is an n-by-n symmetric matrix, the columns of matrix B
are individual right-hand sides, and the columns of X are the corresponding solutions.
Both normwise and maximum componentwise error bounds are also provided on request. The routine returns
a solution with a small guaranteed error (O(eps), where eps is the working machine precision) unless the
matrix is very ill-conditioned, in which case a warning is returned. Relevant condition numbers are also
calculated and returned.
The routine accepts user-provided factorizations and equilibration factors; see definitions of the fact and
equed options. Solving with refinement and using a factorization from a previous call of the routine also
produces a solution with O(eps) errors or warnings but that may not be true for general user-provided
factorizations and equilibration factors if they differ from what the routine would itself produce.
The routine ?sysvxx performs the following steps:

1. If fact = 'E', scaling factors are computed to equilibrate the system:

diag(s)*A*diag(s) *inv(diag(s))*X = diag(s)*B


Whether or not the system will be equilibrated depends on the scaling of the matrix A, but if
equilibration is used, A is overwritten by diag(s)*A*diag(s) and B by diag(s)*B.
2. If fact = 'N' or 'E', the diagonal pivoting method is used to factor the matrix A (after equilibration if fact
= 'E') as
A = U*D*U^T, if uplo = 'U',
or A = L*D*L^T, if uplo = 'L',

where U or L is a product of permutation and unit upper (lower) triangular matrices, and D is a
symmetric and block diagonal with 1-by-1 and 2-by-2 diagonal blocks.
3. If some D(i,i)=0, so that D is exactly singular, the routine returns with info = i. Otherwise, the
factored form of A is used to estimate the condition number of the matrix A (see the rcond parameter).
If the reciprocal of the condition number is less than machine precision, the routine still goes on to
solve for X and compute error bounds.
4. The system of equations is solved for X using the factored form of A.
5. By default, unless params[0] is set to zero, the routine applies iterative refinement to get a small error
and error bounds. Refinement calculates the residual to at least twice the working precision.
6. If equilibration was used, the matrix X is premultiplied by diag(s) so that it solves the original system
before equilibration.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).


fact Must be 'F', 'N', or 'E'.

Specifies whether or not the factored form of the matrix A is supplied on


entry, and if not, whether the matrix A should be equilibrated before it is
factored.
If fact = 'F', on entry, af and ipiv contain the factored form of A. If
equed is not 'N', the matrix A has been equilibrated with scaling factors
given by s. Parameters a, af, and ipiv are not modified.

If fact = 'N', the matrix A will be copied to af and factored.

If fact = 'E', the matrix A will be equilibrated, if necessary, copied to af


and factored.

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:


If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The number of linear equations; the order of the matrix A; n≥ 0.

nrhs The number of right-hand sides; the number of columns of the matrices B
and X; nrhs≥ 0.

a, af, b Arrays: a(size max(1, lda*n)), af(size max(1, ldaf*n)), b, size
max(ldb*nrhs) for column major layout and max(ldb*n) for row major
layout.
The array a contains the symmetric matrix A as specified by uplo. If uplo
= 'U', the leading n-by-n upper triangular part of a contains the upper
triangular part of the matrix A and the strictly lower triangular part of a is
not referenced. If uplo = 'L', the leading n-by-n lower triangular part of a
contains the lower triangular part of the matrix A and the strictly upper
triangular part of a is not referenced.

The array af is an input argument if fact = 'F'. It contains the block
diagonal matrix D and the multipliers used to obtain the factor U or L from
the factorization A = U*D*U^T or A = L*D*L^T as computed by ?sytrf.

The array b contains the matrix B whose columns are the right-hand sides
for the systems of equations.

lda The leading dimension of the array a; lda≥ max(1,n).

ldaf The leading dimension of the array af; ldaf≥ max(1,n).

ipiv Array, size at least max(1, n). The array ipiv is an input argument if fact
= 'F'. It contains details of the interchanges and the block structure of D
as determined by ?sytrf. If ipiv[k-1] > 0, rows and columns k and
ipiv[k-1] were interchanged and D(k,k) is a 1-by-1 diagonal block.
If uplo = 'U' and ipiv[i] = ipiv[i - 1] = -m < 0, D has a 2-by-2
diagonal block in rows and columns i and i + 1, and the i-th row and
column of A were interchanged with the m-th row and column.

If uplo = 'L' and ipiv[i] = ipiv[i - 1] = -m < 0, D has a 2-by-2
diagonal block in rows and columns i and i + 1, and the (i + 1)-st row and
column of A were interchanged with the m-th row and column.

equed Must be 'N' or 'Y'.

equed is an input argument if fact = 'F'. It specifies the form of


equilibration that was done:
If equed = 'N', no equilibration was done (always true if fact = 'N').

if equed = 'Y', both row and column equilibration was done, that is, A has
been replaced by diag(s)*A*diag(s).

s Array, size (n). The array s contains the scale factors for A. If equed =
'Y', A is multiplied on the left and right by diag(s).
This array is an input argument if fact = 'F' only; otherwise it is an
output argument.
If fact = 'F' and equed = 'Y', each element of s must be positive.

Each element of s should be a power of the radix to ensure a reliable


solution and error estimates. Scaling by powers of the radix does not cause
rounding errors unless the result underflows or overflows. Rounding errors
during scaling lead to refining with a matrix that is not equivalent to the
input matrix, producing error estimates that may not be reliable.

ldb The leading dimension of the array b; ldb≥ max(1, n) for column major
layout and ldb≥nrhs for row major layout.

ldx The leading dimension of the output array x; ldx≥ max(1, n) for column
major layout and ldx≥nrhs for row major layout.

n_err_bnds Number of error bounds to return for each right hand side and each type
(normwise or componentwise). See err_bnds_norm and err_bnds_comp
descriptions in the Output Arguments section below.

nparams Specifies the number of parameters set in params. If ≤ 0, the params array
is never referenced and default values are used.

params Array, size max(1,nparams). Specifies algorithm parameters. If an entry is


less than 0.0, that entry is filled with the default value used for that
parameter. Only positions up to nparams are accessed; defaults are used
for higher-numbered parameters. If defaults are acceptable, you can pass
nparams = 0, which prevents the source code from accessing the params
argument.
params[0] : Whether to perform iterative refinement or not. Default: 1.0
(for single precision flavors), 1.0D+0 (for double precision flavors).

=0.0 No refinement is performed and no error bounds


are computed.

=1.0 Use the extra-precise refinement algorithm.

(Other values are reserved for future use.)


params[1] : Maximum number of residual computations allowed for


refinement.

Default 10.0

Aggressive Set to 100.0 to permit convergence using


approximate factorizations or factorizations
other than LU. If the factorization uses a
technique other than Gaussian elimination, the
guarantees in err_bnds_norm and
err_bnds_comp may no longer be trustworthy.

params[2] : Flag determining if the code will attempt to find a solution


with a small componentwise relative error in the double-precision algorithm.
Positive is true, 0.0 is false. Default: 1.0 (attempt componentwise
convergence).

Output Parameters

x Array, size max(1, ldx*nrhs) for column major layout and max(1, ldx*n)
for row major layout).
If info = 0, the array x contains the solution n-by-nrhs matrix X to the
original system of equations. Note that A and B are modified on exit if
equed≠'N', and the solution to the equilibrated system is:
inv(diag(s))*X.

a If fact = 'E' and equed = 'Y', overwritten by diag(s)*A*diag(s).

af If fact = 'N', af is an output argument and on exit returns the block
diagonal matrix D and the multipliers used to obtain the factor U or L from
the factorization A = U*D*U^T or A = L*D*L^T.

b If equed = 'N', B is not modified.

If equed = 'Y', B is overwritten by diag(s)*B.

s This array is an output argument if fact≠'F'. Each element of this array is


a power of the radix. See the description of s in Input Arguments section.

rcond Reciprocal scaled condition number. An estimate of the reciprocal Skeel


condition number of the matrix A after equilibration (if done). If rcond is
less than the machine precision, in particular, if rcond = 0, the matrix is
singular to working precision. Note that the error may still be small even if
this number is very small and the matrix appears ill-conditioned.

rpvgrw Contains the reciprocal pivot growth factor:

If this is much less than 1, the stability of the LU factorization of the


(equilibrated) matrix A could be poor. This also means that the solution X,
estimated condition numbers, and error bounds could be unreliable. If
factorization fails with 0 < info≤n, this parameter contains the reciprocal
pivot growth factor for the leading info columns of A.

berr Array, size at least max(1, nrhs). Contains the componentwise relative
backward error for each solution vector xj, that is, the smallest relative
change in any element of A or B that makes xj an exact solution.

err_bnds_norm Array of size nrhs*n_err_bnds. For each right-hand side, contains


information about various error bounds and condition numbers
corresponding to the normwise relative error, which is defined as follows:
Normwise relative error in the i-th solution vector

The array is indexed by the type of error information as described below. Up
to three pieces of information are returned.

err=1 "Trust/don't trust" boolean. Trust the answer if


the reciprocal condition number is less than the
threshold sqrt(n)*slamch(ε) for single
precision flavors and sqrt(n)*dlamch(ε) for
double precision flavors.

err=2 "Guaranteed" error bound. The estimated


forward error, almost certainly within a factor of
10 of the true error so long as the next entry is
greater than the threshold sqrt(n)*slamch(ε)
for single precision flavors and
sqrt(n)*dlamch(ε) for double precision
flavors. This error bound should only be trusted
if the previous boolean is true.

err=3 Reciprocal condition number. Estimated


normwise reciprocal condition number.
Compared with the threshold
sqrt(n)*slamch(ε) for single precision flavors
and sqrt(n)*dlamch(ε) for double precision
flavors to determine if the error estimate is
"guaranteed". These reciprocal condition
numbers for some appropriately scaled matrix Z
are:

Let z=s*a, where s scales each row by a power


of the radix so all absolute row sums of z are
approximately 1.

The information for right-hand side i, where 1 ≤i≤nrhs, and type of error
err is stored in err_bnds_norm[(err-1)*nrhs + i - 1].

err_bnds_comp Array of size nrhs*n_err_bnds. For each right-hand side, contains


information about various error bounds and condition numbers
corresponding to the componentwise relative error, which is defined as
follows:


Componentwise relative error in the i-th solution vector:

The array is indexed by the type of error information as described below.


There are currently up to three pieces of information returned for each
right-hand side. If componentwise accuracy is not requested (params[2] =
0.0), then err_bnds_comp is not accessed.

err=1 "Trust/don't trust" boolean. Trust the answer if


the reciprocal condition number is less than the
threshold sqrt(n)*slamch(ε) for single
precision flavors and sqrt(n)*dlamch(ε) for
double precision flavors.

err=2 "Guaranteed" error bpound. The estimated


forward error, almost certainly within a factor of
10 of the true error so long as the next entry is
greater than the threshold sqrt(n)*slamch(ε)
for single precision flavors and
sqrt(n)*dlamch(ε) for double precision
flavors. This error bound should only be trusted
if the previous boolean is true.

err=3 Reciprocal condition number. Estimated


componentwise reciprocal condition number.
Compared with the threshold
sqrt(n)*slamch(ε) for single precision flavors
and sqrt(n)*dlamch(ε) for double precision
flavors to determine if the error estimate is
"guaranteed". These reciprocal condition
numbers for some appropriately scaled matrix Z
are:

Let z=s*(a*diag(x)), where x is the solution


for the current right-hand side and s scales each
row of a*diag(x) by a power of the radix so all
absolute row sums of z are approximately 1.

The information for right-hand side i, where 1 ≤i≤nrhs, and type of error
err is stored in err_bnds_comp[(err-1)*nrhs + i - 1].

ipiv If fact = 'N', ipiv is an output argument and on exit contains details of
the interchanges and the block structure D, as determined by ssytrf for
single precision flavors and dsytrf for double precision flavors.

equed If fact≠'F', then equed is an output argument. It specifies the form of


equilibration that was done (see the description of equed in Input
Arguments section).

params If an entry is less than 0.0, that entry is filled with the default value used
for that parameter, otherwise the entry is not modified.

Return Values
This function returns a value info.

If info = 0, the execution is successful. The solution to every right-hand side is guaranteed.

If info = -i, parameter i had an illegal value.

If 0 < info≤n: U(info, info) is exactly zero. The factorization has been completed, but the factor U is exactly
singular, so the solution and error bounds could not be computed; rcond = 0 is returned.

If info = n+j: The solution corresponding to the j-th right-hand side is not guaranteed. The solutions
corresponding to other right-hand sides k with k > j may not be guaranteed as well, but only the first such
right-hand side is reported. If a small componentwise error is not requested params[2] = 0.0, then the j-th
right-hand side is the first with a normwise error bound that is not guaranteed (the smallest j such that for
column major layout err_bnds_norm[j - 1] = 0.0 or err_bnds_comp[j - 1] = 0.0; or for row major
layout err_bnds_norm[(j - 1)*n_err_bnds] = 0.0 or err_bnds_comp[(j - 1)*n_err_bnds] = 0.0).
See the definition of err_bnds_norm and err_bnds_comp for err = 1. To get information about all of the
right-hand sides, check err_bnds_norm or err_bnds_comp.
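
The sketch below is illustrative only (the data reuses the hypothetical system from the ?sysv sketch); it
passes nparams = 0 so that params is never accessed, which is why NULL is assumed to be acceptable for
that argument, and asks for equilibration with fact = 'E':

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 3, nrhs = 1, lda = n, ldaf = n, ldb = n, ldx = n;
    lapack_int n_err_bnds = 3, nparams = 0;    /* defaults; params not accessed */
    lapack_int ipiv[3];
    char equed = 'N';
    /* Illustrative symmetric system (upper triangle referenced). */
    double a[] = { 4.0, 1.0, 2.0,
                   1.0, 3.0, 0.0,
                   2.0, 0.0, 5.0 };
    double b[] = { 7.0, 4.0, 7.0 };
    double af[9], s[3], x[3], rcond, rpvgrw, berr[1];
    double err_bnds_norm[3], err_bnds_comp[3]; /* nrhs * n_err_bnds entries */

    lapack_int info = LAPACKE_dsysvxx(LAPACK_COL_MAJOR, 'E', 'U', n, nrhs,
                                      a, lda, af, ldaf, ipiv, &equed, s,
                                      b, ldb, x, ldx, &rcond, &rpvgrw, berr,
                                      n_err_bnds, err_bnds_norm, err_bnds_comp,
                                      nparams, NULL);
    if (info == 0)
        printf("x[0] = %f, rcond = %e, bound = %e\n",
               x[0], rcond, err_bnds_norm[1]);  /* err = 2 entry, first RHS */
    else
        printf("LAPACKE_dsysvxx returned info = %d\n", (int)info);
    return 0;
}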

See Also
Matrix Storage Schemes for LAPACK Routines

?hesv
Computes the solution to the system of linear
equations with a Hermitian matrix A and multiple
right-hand sides.

Syntax
lapack_int LAPACKE_chesv (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , lapack_complex_float * a , lapack_int lda , lapack_int * ipiv ,
lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zhesv (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , lapack_complex_double * a , lapack_int lda , lapack_int * ipiv ,
lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description

The routine solves for X the complex system of linear equations A*X = B, where A is an n-by-n Hermitian
matrix, the columns of matrix B are individual right-hand sides, and the columns of X are the corresponding
solutions.
The diagonal pivoting method is used to factor A as A = U*D*U^H or A = L*D*L^H, where U (or L) is a product
of permutation and unit upper (lower) triangular matrices, and D is Hermitian and block diagonal with 1-by-1
and 2-by-2 diagonal blocks.
The factored form of A is then used to solve the system of equations A*X = B.


Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored and


how A is factored:
If uplo = 'U', the array a stores the upper triangular part of the
matrix A, and A is factored as U*D*U^H.

If uplo = 'L', the array a stores the lower triangular part of the
matrix A, and A is factored as L*D*L^H.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides, the number of columns in B; nrhs≥ 0.

a, b Arrays: a(size max(1, lda*n)), b of size max(1, ldb*nrhs) for
column major layout and max(1, ldb*n) for row major layout. The
array a contains the upper or the lower triangular part of the
Hermitian matrix A (see uplo).
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

lda The leading dimension of a; lda≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

Output Parameters

a If info = 0, a is overwritten by the block-diagonal matrix D and the


multipliers used to obtain the factor U (or L) from the factorization of
A as computed by ?hetrf.

b If info = 0, b is overwritten by the solution matrix X.

ipiv Array, size at least max(1, n). Contains details of the interchanges
and the block structure of D, as determined by ?hetrf.

If ipiv[i-1] = k > 0, then dii is a 1-by-1 diagonal block, and the


i-th row and column of A was interchanged with the k-th row and
column.
If uplo = 'U' and ipiv[i] = ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and (i)-th row and column of A was
interchanged with the m-th row and column.
If uplo = 'L' and ipiv[i] = ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and (i+1)-th row and column of A
was interchanged with the m-th row and column.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, dii is 0. The factorization has been completed, but D is exactly singular, so the solution could
not be computed.
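
A minimal illustrative sketch (not part of the reference; the 2-by-2 Hermitian matrix and names are
assumptions, and the brace initialization assumes the default MKL_Complex16 definition of
lapack_complex_double from mkl.h) showing one possible call of LAPACKE_zhesv:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 2, nrhs = 1, lda = n, ldb = n;
    lapack_int ipiv[2];
    /* Illustrative Hermitian matrix in full column-major storage; only the
       upper triangle is referenced because uplo = 'U'. */
    lapack_complex_double a[] = {
        { 2.0,  0.0 }, { 0.0, 0.0 },   /* column 1: A(1,1), (not referenced) */
        { 1.0, -1.0 }, { 3.0, 0.0 }    /* column 2: A(1,2) = 1-i, A(2,2)     */
    };
    lapack_complex_double b[] = { { 3.0, -1.0 }, { 4.0, 1.0 } };  /* A*[1 1]^T */

    lapack_int info = LAPACKE_zhesv(LAPACK_COL_MAJOR, 'U', n, nrhs,
                                    a, lda, ipiv, b, ldb);
    if (info != 0) {
        printf("LAPACKE_zhesv failed, info = %d\n", (int)info);
        return 1;
    }
    printf("x[0] = %f%+fi, x[1] = %f%+fi\n",
           b[0].real, b[0].imag, b[1].real, b[1].imag);
    return 0;
}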

See Also
Matrix Storage Schemes for LAPACK Routines

?hesvx
Uses the diagonal pivoting factorization to compute
the solution to the complex system of linear equations
with a Hermitian coefficient matrix A, and provides
error bounds on the solution.

Syntax
lapack_int LAPACKE_chesvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, const lapack_complex_float* a, lapack_int lda, lapack_complex_float*
af, lapack_int ldaf, lapack_int* ipiv, const lapack_complex_float* b, lapack_int ldb,
lapack_complex_float* x, lapack_int ldx, float* rcond, float* ferr, float* berr );
lapack_int LAPACKE_zhesvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, const lapack_complex_double* a, lapack_int lda, lapack_complex_double*
af, lapack_int ldaf, lapack_int* ipiv, const lapack_complex_double* b, lapack_int ldb,
lapack_complex_double* x, lapack_int ldx, double* rcond, double* ferr, double* berr );

Include Files
• mkl.h

Description

The routine uses the diagonal pivoting factorization to compute the solution to a complex system of linear
equations A*X = B, where A is an n-by-n Hermitian matrix, the columns of matrix B are individual right-hand
sides, and the columns of X are the corresponding solutions.
Error bounds on the solution and a condition estimate are also provided.
The routine ?hesvx performs the following steps:

1. If fact = 'N', the diagonal pivoting method is used to factor the matrix A. The form of the
factorization is A = U*D*U^H or A = L*D*L^H, where U (or L) is a product of permutation and unit upper
(lower) triangular matrices, and D is Hermitian and block diagonal with 1-by-1 and 2-by-2 diagonal
blocks.
2. If some dii = 0, so that D is exactly singular, then the routine returns with info = i. Otherwise, the
factored form of A is used to estimate the condition number of the matrix A. If the reciprocal of the
condition number is less than machine precision, info = n+1 is returned as a warning, but the routine
still goes on to solve for X and compute error bounds as described below.
3. The system of equations is solved for X using the factored form of A.
4. Iterative refinement is applied to improve the computed solution matrix and calculate error bounds and
backward error estimates for it.
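
A minimal sketch of calling this routine (zhesvx flavor, fact = 'N', column major layout; the data values are
arbitrary and the default structure definition of lapack_complex_double is assumed):

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Illustrative 2-by-2 Hermitian system, upper triangle stored (column major). */
    lapack_complex_double a[4] = { {4.0, 0.0}, {0.0, 0.0}, {1.0, 2.0}, {3.0, 0.0} };
    lapack_complex_double b[2] = { {1.0, 0.0}, {2.0, -1.0} };
    lapack_complex_double af[4], x[2];
    lapack_int ipiv[2], n = 2, nrhs = 1;
    double rcond, ferr[1], berr[1];

    lapack_int info = LAPACKE_zhesvx(LAPACK_COL_MAJOR, 'N', 'U', n, nrhs,
                                     a, n, af, n, ipiv, b, n, x, n,
                                     &rcond, ferr, berr);
    /* info = n+1 means the solution was still computed, but A is
       ill-conditioned (rcond is below machine precision).          */
    if (info == 0 || info == n + 1)
        printf("rcond = %g  ferr = %g  berr = %g\n", rcond, ferr[0], berr[0]);
    else
        printf("LAPACKE_zhesvx failed, info = %d\n", (int)info);
    return 0;
}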


Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

fact Must be 'F' or 'N'.

Specifies whether or not the factored form of the matrix A has been
supplied on entry.
If fact = 'F': on entry, af and ipiv contain the factored form of A.
Arrays a, af, and ipiv are not modified.
If fact = 'N', the matrix A is copied to af and factored.

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored and


how A is factored:
If uplo = 'U', the array a stores the upper triangular part of the
Hermitian matrix A, and A is factored as U*D*UH.

If uplo = 'L', the array a stores the lower triangular part of the
Hermitian matrix A; A is factored as L*D*LH.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides, the number of columns in B; nrhs≥


0.

a, af, b Arrays: a (size max(1, lda*n)), af (size max(1, ldaf*n)), b of size
max(1, ldb*nrhs) for column major layout and max(1, ldb*n) for
row major layout.
The array a contains the upper or the lower triangular part of the
Hermitian matrix A (see uplo).
The array af is an input argument if fact = 'F'. It contains the block
diagonal matrix D and the multipliers used to obtain the factor U or L
from the factorization A = U*D*UH or A = L*D*LH as computed by
?hetrf.
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

lda The leading dimension of a; lda≥ max(1, n).

ldaf The leading dimension of af; ldaf≥ max(1, n).

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ipiv Array, size at least max(1, n). The array ipiv is an input argument if
fact = 'F'. It contains details of the interchanges and the block
structure of D, as determined by ?hetrf.

If ipiv[i-1] = k > 0, then dii is a 1-by-1 diagonal block, and the


i-th row and column of A was interchanged with the k-th row and
column.

If uplo = 'U' and ipiv[i] = ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and the i-th row and column of A was
interchanged with the m-th row and column.
If uplo = 'L' and ipiv[i] = ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and the (i+1)-th row and column of A
was interchanged with the m-th row and column.

ldx The leading dimension of the output array x; ldx≥ max(1, n) for
column major layout and ldx≥nrhs for row major layout.

Output Parameters

x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.
If info = 0 or info = n+1, the array x contains the solution matrix
X to the system of equations.

af, ipiv These arrays are output arguments if fact = 'N'. See the
description of af, ipiv in Input Arguments section.

rcond An estimate of the reciprocal condition number of the matrix A. If


rcond is less than the machine precision (in particular, if rcond = 0),
the matrix is singular to working precision. This condition is indicated
by a return code of info > 0.

ferr Array, size at least max(1, nrhs). Contains the estimated forward
error bound for each solution vector xj (the j-th column of the solution
matrix X). If xtrue is the true solution corresponding to xj, ferr[j-1]
is an estimated upper bound for the magnitude of the largest element
in (xj - xtrue) divided by the magnitude of the largest element in xj.
The estimate is as reliable as the estimate for rcond, and is almost
always a slight overestimate of the true error.

berr Array, size at least max(1, nrhs). Contains the component-wise


relative backward error for each solution vector xj, that is, the
smallest relative change in any element of A or B that makes xj an
exact solution.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, and i≤n, then dii is exactly zero. The factorization has been completed, but the block diagonal
matrix D is exactly singular, so the solution and error bounds could not be computed; rcond = 0 is returned.
If info = i, and i = n + 1, then D is nonsingular, but rcond is less than machine precision, meaning that the
matrix is singular to working precision. Nevertheless, the solution and error bounds are computed because
there are a number of situations where the computed solution can be more accurate than the value of rcond
would suggest.

See Also
Matrix Storage Schemes for LAPACK Routines


?hesvxx
Uses extra precise iterative refinement to compute the
solution to the system of linear equations with a
Hermitian indefinite coefficient matrix A applying the
diagonal pivoting factorization.

Syntax
lapack_int LAPACKE_chesvxx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, lapack_complex_float* a, lapack_int lda, lapack_complex_float* af,
lapack_int ldaf, lapack_int* ipiv, char* equed, float* s, lapack_complex_float* b,
lapack_int ldb, lapack_complex_float* x, lapack_int ldx, float* rcond, float* rpvgrw,
float* berr, lapack_int n_err_bnds, float* err_bnds_norm, float* err_bnds_comp,
lapack_int nparams, const float* params );
lapack_int LAPACKE_zhesvxx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, lapack_complex_double* a, lapack_int lda, lapack_complex_double* af,
lapack_int ldaf, lapack_int* ipiv, char* equed, double* s, lapack_complex_double* b,
lapack_int ldb, lapack_complex_double* x, lapack_int ldx, double* rcond, double*
rpvgrw, double* berr, lapack_int n_err_bnds, double* err_bnds_norm, double*
err_bnds_comp, lapack_int nparams, const double* params );

Include Files
• mkl.h

Description

The routine uses the diagonal pivoting factorization to compute the solution to a complex/double complex
system of linear equations A*X = B, where A is an n-by-n Hermitian matrix, the columns of matrix B are
individual right-hand sides, and the columns of X are the corresponding solutions.
Both normwise and maximum componentwise error bounds are also provided on request. The routine returns
a solution with a small guaranteed error (O(eps), where eps is the working machine precision) unless the
matrix is very ill-conditioned, in which case a warning is returned. Relevant condition numbers are also
calculated and returned.
The routine accepts user-provided factorizations and equilibration factors; see definitions of the fact and
equed options. Solving with refinement and using a factorization from a previous call of the routine also
produces a solution with O(eps) errors or warnings but that may not be true for general user-provided
factorizations and equilibration factors if they differ from what the routine would itself produce.
The routine ?hesvxx performs the following steps:

1. If fact = 'E', scaling factors are computed to equilibrate the system:

diag(s)*A*diag(s) *inv(diag(s))*X = diag(s)*B


Whether or not the system will be equilibrated depends on the scaling of the matrix A, but if
equilibration is used, A is overwritten by diag(s)*A*diag(s) and B by diag(s)*B.
2. If fact = 'N' or 'E', the diagonal pivoting method is used to factor the matrix A (after equilibration
if fact = 'E') as
A = U*D*UH, if uplo = 'U',
or A = L*D*LH, if uplo = 'L',

where U (or L) is a product of permutation and unit upper (lower) triangular matrices, and D is
Hermitian and block diagonal with 1-by-1 and 2-by-2 diagonal blocks.

3. If some D(i,i)=0, so that D is exactly singular, the routine returns with info = i. Otherwise, the
factored form of A is used to estimate the condition number of the matrix A (see the rcond parameter).
If the reciprocal of the condition number is less than machine precision, the routine still goes on to
solve for X and compute error bounds.
4. The system of equations is solved for X using the factored form of A.
5. By default, unless params[0] is set to zero, the routine applies iterative refinement to get a small error
and error bounds. Refinement calculates the residual to at least twice the working precision.
6. If equilibration was used, the matrix X is premultiplied by diag(s) so that it solves the original system
before equilibration.
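
A minimal sketch of calling this routine (zhesvxx flavor, fact = 'E', column major layout; the data values are
arbitrary, and nparams = 0 so the default refinement parameters are used):

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Illustrative 2-by-2 Hermitian system with equilibration requested (fact = 'E'). */
    lapack_int n = 2, nrhs = 1, n_err_bnds = 3, ipiv[2];
    lapack_complex_double a[4]  = { {4.0, 0.0}, {0.0, 0.0}, {1.0, 2.0}, {3.0, 0.0} };
    lapack_complex_double b[2]  = { {1.0, 0.0}, {2.0, -1.0} };
    lapack_complex_double af[4], x[2];
    double s[2], rcond, rpvgrw, berr[1];
    double err_bnds_norm[3], err_bnds_comp[3];   /* nrhs * n_err_bnds entries each */
    double params[1] = { -1.0 };                 /* not referenced when nparams = 0 */
    char equed = 'N';

    lapack_int info = LAPACKE_zhesvxx(LAPACK_COL_MAJOR, 'E', 'U', n, nrhs,
                                      a, n, af, n, ipiv, &equed, s, b, n, x, n,
                                      &rcond, &rpvgrw, berr, n_err_bnds,
                                      err_bnds_norm, err_bnds_comp, 0, params);
    printf("info = %d  equed = %c  rcond = %g  rpvgrw = %g\n",
           (int)info, equed, rcond, rpvgrw);
    return 0;
}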

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

fact Must be 'F', 'N', or 'E'.

Specifies whether or not the factored form of the matrix A is supplied on


entry, and if not, whether the matrix A should be equilibrated before it is
factored.
If fact = 'F', on entry, af and ipiv contain the factored form of A. If
equed is not 'N', the matrix A has been equilibrated with scaling factors
given by s. Parameters a, af, and ipiv are not modified.

If fact = 'N', the matrix A will be copied to af and factored.

If fact = 'E', the matrix A will be equilibrated, if necessary, copied to af


and factored.

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:


If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The number of linear equations; the order of the matrix A; n≥ 0.

nrhs The number of right-hand sides; the number of columns of the matrices B
and X; nrhs≥ 0.

a, af, b Arrays: a (size max(1, lda*n)), af (size max(1, ldaf*n)), b (size
max(1, ldb*nrhs) for column major layout and max(1, ldb*n) for row major
layout).
The array a contains the Hermitian matrix A as specified by uplo. If uplo =
'U', the leading n-by-n upper triangular part of a contains the upper
triangular part of the matrix A and the strictly lower triangular part of a is
not referenced. If uplo = 'L', the leading n-by-n lower triangular part of a
contains the lower triangular part of the matrix A and the strictly upper
triangular part of a is not referenced.

The array af is an input argument if fact = 'F'. It contains the block
diagonal matrix D and the multipliers used to obtain the factor U or L from
the factorization A = U*D*UH or A = L*D*LH as computed by ?hetrf.

The array b contains the matrix B whose columns are the right-hand sides
for the systems of equations.


lda The leading dimension of the array a; lda≥ max(1,n).

ldaf The leading dimension of the array af; ldaf≥ max(1,n).

ipiv Array, size at least max(1, n). The array ipiv is an input argument if fact
= 'F'. It contains details of the interchanges and the block structure of D
as determined by ?hetrf.

If ipiv[k-1] > 0, rows and columns k and ipiv[k-1] were interchanged
and D(k,k) is a 1-by-1 diagonal block.
If uplo = 'U' and ipiv[i] = ipiv[i - 1] = -m < 0, D has a 2-by-2
diagonal block in rows and columns i and i + 1, and the i-th row and
column of A were interchanged with the m-th row and column.
If uplo = 'L' and ipiv[i] = ipiv[i - 1] = -m < 0, D has a 2-by-2
diagonal block in rows and columns i and i + 1, and the (i + 1)-st row and
column of A were interchanged with the m-th row and column.

equed Must be 'N' or 'Y'.

equed is an input argument if fact = 'F'. It specifies the form of


equilibration that was done:
If equed = 'N', no equilibration was done (always true if fact = 'N').

if equed = 'Y', both row and column equilibration was done, that is, A has
been replaced by diag(s)*A*diag(s).

s Array, size (n). The array s contains the scale factors for A. If equed =
'Y', A is multiplied on the left and right by diag(s).
This array is an input argument if fact = 'F' only; otherwise it is an
output argument.
If fact = 'F' and equed = 'Y', each element of s must be positive.

Each element of s should be a power of the radix to ensure a reliable


solution and error estimates. Scaling by powers of the radix does not cause
rounding errors unless the result underflows or overflows. Rounding errors
during scaling lead to refining with a matrix that is not equivalent to the
input matrix, producing error estimates that may not be reliable.

ldb The leading dimension of the array b; ldb≥ max(1, n) for column major
layout and ldb≥nrhs for row major layout.

ldx The leading dimension of the output array x; ldx≥ max(1, n) for column
major layout and ldx≥nrhs for row major layout.

n_err_bnds Number of error bounds to return for each right hand side and each type
(normwise or componentwise). See err_bnds_norm and err_bnds_comp
descriptions in the Output Arguments section below.

nparams Specifies the number of parameters set in params. If ≤ 0, the params array
is never referenced and default values are used.

params Array, size max(1,nparams). Specifies algorithm parameters. If an entry is


less than 0.0, that entry is filled with the default value used for that
parameter. Only positions up to nparams are accessed; defaults are used

for higher-numbered parameters. If defaults are acceptable, you can pass
nparams = 0, which prevents the source code from accessing the params
argument.
params[0] : Whether to perform iterative refinement or not. Default: 1.0
(for single precision flavors), 1.0D+0 (for double precision flavors).

=0.0 No refinement is performed and no error bounds


are computed.

=1.0 Use the extra-precise refinement algorithm.

(Other values are reserved for future use.)


params[1] : Maximum number of residual computations allowed for
refinement.

Default 10

Aggressive Set to 100 to permit convergence using


approximate factorizations or factorizations
other than LU. If the factorization uses a
technique other than Gaussian elimination, the
guarantees in err_bnds_norm and
err_bnds_comp may no longer be trustworthy.

params[2] : Flag determining if the code will attempt to find a solution


with a small componentwise relative error in the double-precision algorithm.
Positive is true, 0.0 is false. Default: 1.0 (attempt componentwise
convergence).

Output Parameters

x Array, size max(1, ldx*nrhs) for column major layout and max(1, ldx*n)
for row major layout.
If info = 0, the array x contains the solution n-by-nrhs matrix X to the
original system of equations. Note that A and B are modified on exit if
equed≠'N', and the solution to the equilibrated system is:
inv(diag(s))*X.

a If fact = 'E' and equed = 'Y', overwritten by diag(s)*A*diag(s).

af If fact = 'N', af is an output argument and on exit returns the block


diagonal matrix D and the multipliers used to obtain the factor U or L from
the factorization A = U*D*UH or A = L*D*LH.

b If equed = 'N', B is not modified.

If equed = 'Y', B is overwritten by diag(s)*B.

s This array is an output argument if fact≠'F'. Each element of this array is


a power of the radix. See the description of s in Input Arguments section.


rcond Reciprocal scaled condition number. An estimate of the reciprocal Skeel


condition number of the matrix A after equilibration (if done). If rcond is
less than the machine precision, in particular, if rcond = 0, the matrix is
singular to working precision. Note that the error may still be small even if
this number is very small and the matrix appears ill-conditioned.

rpvgrw Contains the reciprocal pivot growth factor:

If this is much less than 1, the stability of the LU factorization of the


(equilibrated) matrix A could be poor. This also means that the solution X,
estimated condition numbers, and error bounds could be unreliable. If
factorization fails with 0 < info≤n, this parameter contains the reciprocal
pivot growth factor for the leading info columns of A.

berr Array, size at least max(1, nrhs). Contains the component-wise relative
backward error for each solution vector xj, that is, the smallest relative
change in any element of A or B that makes xj an exact solution.

err_bnds_norm Array of size nrhs*n_err_bnds. For each right-hand side, contains


information about various error bounds and condition numbers
corresponding to the normwise relative error, which is defined as follows:
Normwise relative error in the i-th solution vector:
max_j abs(xtrue(j,i) - x(j,i)) / max_j abs(x(j,i))

The array is indexed by the type of error information as described below.


There are currently up to three pieces of information returned.

err=1 "Trust/don't trust" boolean. Trust the answer if


the reciprocal condition number is less than the
threshold sqrt(n)*slamch(ε) for chesvxx and
sqrt(n)*dlamch(ε) for zhesvxx.

err=2 "Guaranteed" error bound. The estimated


forward error, almost certainly within a factor of
10 of the true error so long as the next entry is
greater than the threshold sqrt(n)*slamch(ε)
for chesvxx and sqrt(n)*dlamch(ε) for
zhesvxx. This error bound should only be
trusted if the previous boolean is true.

err=3 Reciprocal condition number. Estimated


normwise reciprocal condition number.
Compared with the threshold
sqrt(n)*slamch(ε) for chesvxx and
sqrt(n)*dlamch(ε)for zhesvxx to determine if
the error estimate is "guaranteed". These
reciprocal condition numbers for some
appropriately scaled matrix Z are:


Let z = s*a, where s scales each row by a power of the radix so all
absolute row sums of z are approximately 1.

The information for right-hand side i, where 1 ≤i≤nrhs, and type of error
err is stored in err_bnds_norm[(err-1)*nrhs + i - 1].

err_bnds_comp Array of size nrhs*n_err_bnds. For each right-hand side, contains


information about various error bounds and condition numbers
corresponding to the componentwise relative error, which is defined as
follows:
Componentwise relative error in the i-th solution vector:
max_j ( abs(xtrue(j,i) - x(j,i)) / abs(x(j,i)) )

The array is indexed by the type of error information as described below.


There are currently up to three pieces of information returned for each
right-hand side. If componentwise accuracy is not requested (params[2] =
0.0), then err_bnds_comp is not accessed.

err=1 "Trust/don't trust" boolean. Trust the answer if


the reciprocal condition number is less than the
threshold sqrt(n)*slamch(ε) for chesvxx and
sqrt(n)*dlamch(ε) for zhesvxx.

err=2 "Guaranteed" error bound. The estimated


forward error, almost certainly within a factor of
10 of the true error so long as the next entry is
greater than the threshold sqrt(n)*slamch(ε)
for chesvxx and sqrt(n)*dlamch(ε) for
zhesvxx. This error bound should only be
trusted if the previous boolean is true.

err=3 Reciprocal condition number. Estimated


componentwise reciprocal condition number.
Compared with the threshold
sqrt(n)*slamch(ε) for chesvxx and
sqrt(n)*dlamch(ε) for zhesvxx to determine
if the error estimate is "guaranteed". These
reciprocal condition numbers for some
appropriately scaled matrix Z are:

Let z = s*(a*diag(x)), where x is the solution for the current right-hand
side and s scales each row of a*diag(x) by a power of the radix so all
absolute row sums of z are approximately 1.


The information for right-hand side i, where 1 ≤i≤nrhs, and type of error
err is stored in err_bnds_comp[(err-1)*nrhs + i - 1].

ipiv If fact = 'N', ipiv is an output argument and on exit contains details of
the interchanges and the block structure of D, as determined by chetrf for
single precision flavors and zhetrf for double precision flavors.

equed If fact≠'F', then equed is an output argument. It specifies the form of


equilibration that was done (see the description of equed in Input
Arguments section).

params If an entry is less than 0.0, that entry is filled with the default value used
for that parameter, otherwise the entry is not modified.

Return Values
This function returns a value info.

If info = 0, the execution is successful. The solution to every right-hand side is guaranteed.

If info = -i, the i-th parameter had an illegal value.

If 0 < info≤n: U(info, info) is exactly zero. The factorization has been completed, but the factor U is exactly
singular, so the solution and error bounds could not be computed; rcond = 0 is returned.

If info = n+j: The solution corresponding to the j-th right-hand side is not guaranteed. The solutions
corresponding to other right-hand sides k with k > j may not be guaranteed as well, but only the first such
right-hand side is reported. If a small componentwise error is not requested (params[2] = 0.0), then the j-th
right-hand side is the first with a normwise error bound that is not guaranteed (the smallest j such that for
column major layout err_bnds_norm[j - 1] = 0.0 or err_bnds_comp[j - 1] = 0.0; or for row major
layout err_bnds_norm[(j - 1)*n_err_bnds] = 0.0 or err_bnds_comp[(j - 1)*n_err_bnds] = 0.0).
See the definition of err_bnds_norm and err_bnds_comp for err = 1. To get information about all of the
right-hand sides, check err_bnds_norm or err_bnds_comp.
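
Continuing from a call such as the sketch shown after the routine description, the indexing above can be
applied directly (column major layout assumed):

/* err = 1 is the "trust" flag; for right-hand side j (1-based) and error
   type err the value lives at err_bnds_...[(err-1)*nrhs + j - 1].        */
for (lapack_int j = 1; j <= nrhs; ++j) {
    double trust_norm = err_bnds_norm[(1 - 1) * nrhs + j - 1];
    double trust_comp = err_bnds_comp[(1 - 1) * nrhs + j - 1];
    if (trust_norm == 0.0 || trust_comp == 0.0)
        printf("bounds for right-hand side %d are not guaranteed\n", (int)j);
}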

See Also
Matrix Storage Schemes for LAPACK Routines

?spsv
Computes the solution to the system of linear
equations with a real or complex symmetric coefficient
matrix A stored in packed format, and multiple right-
hand sides.

Syntax
lapack_int LAPACKE_sspsv (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , float * ap , lapack_int * ipiv , float * b , lapack_int ldb );
lapack_int LAPACKE_dspsv (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , double * ap , lapack_int * ipiv , double * b , lapack_int ldb );
lapack_int LAPACKE_cspsv (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , lapack_complex_float * ap , lapack_int * ipiv , lapack_complex_float * b ,
lapack_int ldb );
lapack_int LAPACKE_zspsv (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , lapack_complex_double * ap , lapack_int * ipiv , lapack_complex_double * b ,
lapack_int ldb );

Include Files
• mkl.h

Description

The routine solves for X the real or complex system of linear equations A*X = B, where A is an n-by-n
symmetric matrix stored in packed format, the columns of matrix B are individual right-hand sides, and the
columns of X are the corresponding solutions.
The diagonal pivoting method is used to factor A as A = U*D*UT or A = L*D*LT, where U (or L) is a product
of permutation and unit upper (lower) triangular matrices, and D is symmetric and block diagonal with 1-
by-1 and 2-by-2 diagonal blocks.
The factored form of A is then used to solve the system of equations A*X = B.
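
A minimal sketch of calling this routine (dspsv flavor, uplo = 'U', column major layout); the loop shows how
the upper triangle of a full symmetric matrix with arbitrary values is packed column by column before the
call:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* A(i,j) with i <= j is stored in ap[i + j*(j+1)/2] (0-based indices). */
    lapack_int n = 3, nrhs = 1, ipiv[3];
    double full[3][3] = { { 4.0, 1.0, 2.0 },
                          { 1.0, 5.0, 3.0 },
                          { 2.0, 3.0, 6.0 } };
    double ap[6], b[3] = { 1.0, 2.0, 3.0 };

    for (lapack_int j = 0; j < n; ++j)
        for (lapack_int i = 0; i <= j; ++i)
            ap[i + j * (j + 1) / 2] = full[i][j];

    lapack_int info = LAPACKE_dspsv(LAPACK_COL_MAJOR, 'U', n, nrhs, ap, ipiv, b, n);
    if (info == 0)
        printf("x = %g %g %g\n", b[0], b[1], b[2]);
    return 0;
}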

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:


If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides, the number of columns in B; nrhs≥


0.

ap, b Arrays: ap (size max(1, n*(n+1)/2)), b of size max(1, ldb*nrhs) for
column major layout and max(1, ldb*n) for row major layout.

The array ap contains the upper or the lower triangle of the symmetric
matrix A, as specified by uplo, in packed storage (see Matrix Storage
Schemes).
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

Output Parameters

ap The block-diagonal matrix D and the multipliers used to obtain the


factor U (or L) from the factorization of A as computed by ?sptrf,
stored as a packed triangular matrix in the same storage format as A.

b If info = 0, b is overwritten by the solution matrix X.

ipiv Array, size at least max(1, n). Contains details of the interchanges
and the block structure of D, as determined by ?sptrf.

If ipiv[i-1] = k > 0, then dii is a 1-by-1 block, and the i-th row
and column of A was interchanged with the k-th row and column.


If uplo = 'U' and ipiv[i] = ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and the i-th row and column of A was
interchanged with the m-th row and column.
If uplo = 'L' and ipiv[i-1] = ipiv[i] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and the (i+1)-th row and column of A
was interchanged with the m-th row and column.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, dii is 0. The factorization has been completed, but D is exactly singular, so the solution could
not be computed.

See Also
Matrix Storage Schemes for LAPACK Routines

?spsvx
Uses the diagonal pivoting factorization to compute
the solution to the system of linear equations with a
real or complex symmetric coefficient matrix A stored
in packed format, and provides error bounds on the
solution.

Syntax
lapack_int LAPACKE_sspsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, const float* ap, float* afp, lapack_int* ipiv, const float* b,
lapack_int ldb, float* x, lapack_int ldx, float* rcond, float* ferr, float* berr );
lapack_int LAPACKE_dspsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, const double* ap, double* afp, lapack_int* ipiv, const double* b,
lapack_int ldb, double* x, lapack_int ldx, double* rcond, double* ferr, double* berr );
lapack_int LAPACKE_cspsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, const lapack_complex_float* ap, lapack_complex_float* afp, lapack_int*
ipiv, const lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x,
lapack_int ldx, float* rcond, float* ferr, float* berr );
lapack_int LAPACKE_zspsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, const lapack_complex_double* ap, lapack_complex_double* afp,
lapack_int* ipiv, const lapack_complex_double* b, lapack_int ldb,
lapack_complex_double* x, lapack_int ldx, double* rcond, double* ferr, double* berr );

Include Files
• mkl.h

Description

The routine uses the diagonal pivoting factorization to compute the solution to a real or complex system of
linear equations A*X = B, where A is an n-by-n symmetric matrix stored in packed format, the columns of
matrix B are individual right-hand sides, and the columns of X are the corresponding solutions.
Error bounds on the solution and a condition estimate are also provided.
The routine ?spsvx performs the following steps:

1. If fact = 'N', the diagonal pivoting method is used to factor the matrix A. The form of the
factorization is A = U*D*UT orA = L*D*LT, where U (or L) is a product of permutation and unit upper
(lower) triangular matrices, and D is symmetric and block diagonal with 1-by-1 and 2-by-2 diagonal
blocks.
2. If some di,i= 0, so that D is exactly singular, then the routine returns with info = i. Otherwise, the
factored form of A is used to estimate the condition number of the matrix A. If the reciprocal of the
condition number is less than machine precision, info = n+1 is returned as a warning, but the routine
still goes on to solve for X and compute error bounds as described below.
3. The system of equations is solved for X using the factored form of A.
4. Iterative refinement is applied to improve the computed solution matrix and calculate error bounds and
backward error estimates for it.
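
A minimal sketch of calling this routine (dspsvx flavor, fact = 'N', packed upper storage, column major
layout; the data values are arbitrary):

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Packed upper triangle of an illustrative 3-by-3 symmetric matrix. */
    lapack_int n = 3, nrhs = 1, ipiv[3];
    double ap[6]  = { 4.0,              /* A(1,1)                 */
                      1.0, 5.0,         /* A(1,2), A(2,2)         */
                      2.0, 3.0, 6.0 };  /* A(1,3), A(2,3), A(3,3) */
    double afp[6], b[3] = { 1.0, 2.0, 3.0 }, x[3];
    double rcond, ferr[1], berr[1];

    lapack_int info = LAPACKE_dspsvx(LAPACK_COL_MAJOR, 'N', 'U', n, nrhs,
                                     ap, afp, ipiv, b, n, x, n,
                                     &rcond, ferr, berr);
    if (info == 0 || info == n + 1)
        printf("x = %g %g %g  (rcond = %g)\n", x[0], x[1], x[2], rcond);
    return 0;
}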

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

fact Must be 'F' or 'N'.

Specifies whether or not the factored form of the matrix A has been
supplied on entry.
If fact = 'F': on entry, afp and ipiv contain the factored form of A.
Arrays ap, afp, and ipiv are not modified.
If fact = 'N', the matrix A is copied to afp and factored.

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored and


how A is factored:
If uplo = 'U', the array ap stores the upper triangular part of the
symmetric matrix A, and A is factored as U*D*UT.

If uplo = 'L', the array ap stores the lower triangular part of the
symmetric matrix A; A is factored as L*D*LT.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides, the number of columns in B; nrhs≥


0.

ap, afp, b Arrays: ap (size max(1, n*(n+1)/2)), afp (size max(1, n*(n+1)/2)),
b of size max(1, ldb*nrhs) for column major layout and max(1, ldb*n)
for row major layout.
The array ap contains the upper or lower triangle of the symmetric
matrix A in packed storage (see Matrix Storage Schemes).
The array afp is an input argument if fact = 'F'. It contains the
block diagonal matrix D and the multipliers used to obtain the factor U
or L from the factorization A = U*D*UT or A = L*D*LT as computed
by ?sptrf, in the same storage format as A.

The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.


ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ipiv Array, size at least max(1, n). The array ipiv is an input argument if
fact = 'F'. It contains details of the interchanges and the block
structure of D, as determined by ?sptrf.

If ipiv[i-1] = k > 0, then dii is a 1-by-1 block, and the i-th row
and column of A was interchanged with the k-th row and column.
If uplo = 'U' and ipiv[i] = ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and the i-th row and column of A was
interchanged with the m-th row and column.
If uplo = 'L' and ipiv[i-1] = ipiv[i] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and the (i+1)-th row and column of A
was interchanged with the m-th row and column.

ldx The leading dimension of the output array x; ldx≥ max(1, n) for
column major layout and ldx≥nrhs for row major layout.

Output Parameters

x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.
If info = 0 or info = n+1, the array x contains the solution matrix
X to the system of equations.

afp, ipiv These arrays are output arguments if fact = 'N'. See the
description of afp, ipiv in Input Arguments section.

rcond An estimate of the reciprocal condition number of the matrix A. If


rcond is less than the machine precision (in particular, if rcond = 0),
the matrix is singular to working precision. This condition is indicated
by a return code of info > 0.

ferr, berr Arrays, size at least max(1, nrhs). Contain the component-wise
forward and relative backward errors, respectively, for each solution
vector.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, and i≤n, then dii is exactly zero. The factorization has been completed, but the block diagonal
matrix D is exactly singular, so the solution and error bounds could not be computed; rcond = 0 is returned.
If info = i, and i = n + 1, then D is nonsingular, but rcond is less than machine precision, meaning that the
matrix is singular to working precision. Nevertheless, the solution and error bounds are computed because
there are a number of situations where the computed solution can be more accurate than the value of rcond
would suggest.

See Also
Matrix Storage Schemes for LAPACK Routines

?hpsv
Computes the solution to the system of linear
equations with a Hermitian coefficient matrix A stored
in packed format, and multiple right-hand sides.

Syntax
lapack_int LAPACKE_chpsv (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , lapack_complex_float * ap , lapack_int * ipiv , lapack_complex_float * b ,
lapack_int ldb );
lapack_int LAPACKE_zhpsv (int matrix_layout , char uplo , lapack_int n , lapack_int
nrhs , lapack_complex_double * ap , lapack_int * ipiv , lapack_complex_double * b ,
lapack_int ldb );

Include Files
• mkl.h

Description

The routine solves for X the system of linear equations A*X = B, where A is an n-by-n Hermitian matrix
stored in packed format, the columns of matrix B are individual right-hand sides, and the columns of X are
the corresponding solutions.
The diagonal pivoting method is used to factor A as A = U*D*UH or A = L*D*LH, where U (or L) is a product
of permutation and unit upper (lower) triangular matrices, and D is Hermitian and block diagonal with 1-by-1
and 2-by-2 diagonal blocks.
The factored form of A is then used to solve the system of equations A*X = B.
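
A minimal sketch of calling this routine (zhpsv flavor, packed upper storage, column major layout; the data
values are arbitrary and the default structure definition of lapack_complex_double is assumed):

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Packed upper triangle of an illustrative 2-by-2 Hermitian matrix:
       ap = { A(1,1), A(1,2), A(2,2) }.                                  */
    lapack_int n = 2, nrhs = 1, ipiv[2];
    lapack_complex_double ap[3] = { {4.0, 0.0}, {1.0, 2.0}, {3.0, 0.0} };
    lapack_complex_double b[2]  = { {1.0, 0.0}, {2.0, -1.0} };

    lapack_int info = LAPACKE_zhpsv(LAPACK_COL_MAJOR, 'U', n, nrhs, ap, ipiv, b, n);
    if (info == 0)
        printf("x = (%g,%g) (%g,%g)\n", b[0].real, b[0].imag, b[1].real, b[1].imag);
    return 0;
}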

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored:


If uplo = 'U', the upper triangle of A is stored.

If uplo = 'L', the lower triangle of A is stored.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides; the number of columns in B; nrhs≥


0.

ap, b Arrays: ap (size max(1, n*(n+1)/2)), b of size max(1, ldb*nrhs) for
column major layout and max(1, ldb*n) for row major layout.

The array ap contains the upper or the lower triangle of the Hermitian
matrix A, as specified by uplo, in packed storage (see Matrix Storage
Schemes).
The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.


Output Parameters

ap The block-diagonal matrix D and the multipliers used to obtain the


factor U (or L) from the factorization of A as computed by ?hptrf,
stored as a packed triangular matrix in the same storage format as A.

b If info = 0, b is overwritten by the solution matrix X.

ipiv Array, size at least max(1, n). Contains details of the interchanges
and the block structure of D, as determined by ?hptrf.

If ipiv[i-1] = k > 0, then dii is a 1-by-1 block, and the i-th row
and column of A was interchanged with the k-th row and column.
If uplo = 'U' and ipiv[i] = ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and the i-th row and column of A was
interchanged with the m-th row and column.
If uplo = 'L' and ipiv[i-1] = ipiv[i] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and the (i+1)-th row and column of A
was interchanged with the m-th row and column.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, dii is 0. The factorization has been completed, but D is exactly singular, so the solution could
not be computed.

See Also
Matrix Storage Schemes for LAPACK Routines

?hpsvx
Uses the diagonal pivoting factorization to compute
the solution to the system of linear equations with a
Hermitian coefficient matrix A stored in packed
format, and provides error bounds on the solution.

Syntax
lapack_int LAPACKE_chpsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, const lapack_complex_float* ap, lapack_complex_float* afp, lapack_int*
ipiv, const lapack_complex_float* b, lapack_int ldb, lapack_complex_float* x,
lapack_int ldx, float* rcond, float* ferr, float* berr );
lapack_int LAPACKE_zhpsvx( int matrix_layout, char fact, char uplo, lapack_int n,
lapack_int nrhs, const lapack_complex_double* ap, lapack_complex_double* afp,
lapack_int* ipiv, const lapack_complex_double* b, lapack_int ldb,
lapack_complex_double* x, lapack_int ldx, double* rcond, double* ferr, double* berr );

Include Files
• mkl.h

Description

The routine uses the diagonal pivoting factorization to compute the solution to a complex system of linear
equations A*X = B, where A is an n-by-n Hermitian matrix stored in packed format, the columns of matrix B
are individual right-hand sides, and the columns of X are the corresponding solutions.
Error bounds on the solution and a condition estimate are also provided.
The routine ?hpsvx performs the following steps:

1. If fact = 'N', the diagonal pivoting method is used to factor the matrix A. The form of the
factorization is A = U*D*UH or A = L*D*LH, where U (or L) is a product of permutation and unit upper
(lower) triangular matrices, and D is Hermitian and block diagonal with 1-by-1 and 2-by-2 diagonal
blocks.
2. If some di,i = 0, so that D is exactly singular, then the routine returns with info = i. Otherwise, the
factored form of A is used to estimate the condition number of the matrix A. If the reciprocal of the
condition number is less than machine precision, info = n+1 is returned as a warning, but the routine
still goes on to solve for X and compute error bounds as described below.
3. The system of equations is solved for X using the factored form of A.
4. Iterative refinement is applied to improve the computed solution matrix and calculate error bounds and
backward error estimates for it.
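
A minimal sketch of calling this routine (zhpsvx flavor, fact = 'N', packed upper storage, column major
layout; the data values are arbitrary):

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Packed upper triangle of an illustrative 2-by-2 Hermitian matrix. */
    lapack_int n = 2, nrhs = 1, ipiv[2];
    lapack_complex_double ap[3]  = { {4.0, 0.0}, {1.0, 2.0}, {3.0, 0.0} };
    lapack_complex_double afp[3], x[2];
    lapack_complex_double b[2]   = { {1.0, 0.0}, {2.0, -1.0} };
    double rcond, ferr[1], berr[1];

    lapack_int info = LAPACKE_zhpsvx(LAPACK_COL_MAJOR, 'N', 'U', n, nrhs,
                                     ap, afp, ipiv, b, n, x, n,
                                     &rcond, ferr, berr);
    if (info == 0 || info == n + 1)
        printf("rcond = %g  ferr = %g  berr = %g\n", rcond, ferr[0], berr[0]);
    return 0;
}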

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major


(LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

fact Must be 'F' or 'N'.

Specifies whether or not the factored form of the matrix A has been
supplied on entry.
If fact = 'F': on entry, afp and ipiv contain the factored form of A.
Arrays ap, afp, and ipiv are not modified.
If fact = 'N', the matrix A is copied to afp and factored.

uplo Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored and


how A is factored:
If uplo = 'U', the array ap stores the upper triangular part of the
Hermitian matrix A, and A is factored as U*D*UH.

If uplo = 'L', the array ap stores the lower triangular part of the
Hermitian matrix A, and A is factored as L*D*LH.

n The order of matrix A; n≥ 0.

nrhs The number of right-hand sides, the number of columns in B; nrhs≥


0.

ap, afp, b Arrays: ap (size max(1, n*(n+1)/2)), afp (size max(1, n*(n+1)/2)),
b of size max(1, ldb*nrhs) for column major layout and max(1, ldb*n)
for row major layout.
The array ap contains the upper or lower triangle of the Hermitian
matrix A in packed storage (see Matrix Storage Schemes).


The array afp is an input argument if fact = 'F'. It contains the


block diagonal matrix D and the multipliers used to obtain the factor U
or L from the factorization A = U*D*UH or A = L*D*LH as computed
by ?hptrf, in the same storage format as A.

The array b contains the matrix B whose columns are the right-hand
sides for the systems of equations.

ldb The leading dimension of b; ldb≥ max(1, n) for column major layout
and ldb≥nrhs for row major layout.

ipiv Array, size at least max(1, n). The array ipiv is an input argument if
fact = 'F'. It contains details of the interchanges and the block
structure of D, as determined by ?hptrf.

If ipiv[i-1] = k > 0, then dii is a 1-by-1 block, and the i-th row
and column of A was interchanged with the k-th row and column.
If uplo = 'U' and ipiv[i] = ipiv[i-1] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and the i-th row and column of A was
interchanged with the m-th row and column.
If uplo = 'L' and ipiv[i-1] = ipiv[i] = -m < 0, then D has a 2-by-2
block in rows/columns i and i+1, and the (i+1)-th row and column of A
was interchanged with the m-th row and column.

ldx The leading dimension of the output array x; ldx≥ max(1, n) for
column major layout and ldx≥nrhs for row major layout.

Output Parameters

x Array, size max(1, ldx*nrhs) for column major layout and max(1,
ldx*n) for row major layout.
If info = 0 or info = n+1, the array x contains the solution matrix
X to the system of equations.

afp, ipiv These arrays are output arguments if fact = 'N'. See the
description of afp, ipiv in Input Arguments section.

rcond An estimate of the reciprocal condition number of the matrix A. If


rcond is less than the machine precision (in particular, if rcond = 0),
the matrix is singular to working precision. This condition is indicated
by a return code of info > 0.

ferr Array, size at least max(1, nrhs). Contains the estimated forward
error bound for each solution vector xj (the j-th column of the solution
matrix X). If xtrue is the true solution corresponding to xj, ferr[j-1]
is an estimated upper bound for the magnitude of the largest element
in (xj - xtrue) divided by the magnitude of the largest element in xj.
The estimate is as reliable as the estimate for rcond, and is almost
always a slight overestimate of the true error.

berr Array, size at least max(1, nrhs). Contains the component-wise


relative backward error for each solution vector xj, that is, the
smallest relative change in any element of A or B that makes xj an
exact solution.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, parameter i had an illegal value.

If info = i, and i≤n, then dii is exactly zero. The factorization has been completed, but the block diagonal
matrix D is exactly singular, so the solution and error bounds could not be computed; rcond = 0 is returned.
If info = i, and i = n + 1, then D is nonsingular, but rcond is less than machine precision, meaning that the
matrix is singular to working precision. Nevertheless, the solution and error bounds are computed because
there are a number of situations where the computed solution can be more accurate than the value of rcond
would suggest.

See Also
Matrix Storage Schemes for LAPACK Routines

LAPACK Least Squares and Eigenvalue Problem Routines


This section includes descriptions of LAPACK computational routines and driver routines for solving linear
least squares problems, eigenvalue and singular value problems, and performing a number of related
computational tasks. For a full reference on LAPACK routines and related information see [LUG].
Least Squares Problems. A typical least squares problem is as follows: given a matrix A and a vector b,
find the vector x that minimizes the sum of squares Σi((Ax)i - bi)2 or, equivalently, find the vector x that
minimizes the 2-norm ||Ax - b||2.
In the most usual case, A is an m-by-n matrix with m≥n and rank(A) = n. This problem is also referred to
as finding the least squares solution to an overdetermined system of linear equations (here we have more
equations than unknowns). To solve this problem, you can use the QR factorization of the matrix A (see QR
Factorization).
If m < n and rank(A) = m, there exists an infinite number of solutions x which exactly satisfy Ax = b, and
thus minimize the norm ||Ax - b||2. In this case it is often useful to find the unique solution that
minimizes ||x||2. This problem is referred to as finding the minimum-norm solution to an
underdetermined system of linear equations (here we have more unknowns than equations). To solve this
problem, you can use the LQ factorization of the matrix A (see LQ Factorization).
In the general case you may have a rank-deficient least squares problem, with rank(A)< min(m, n): find
the minimum-norm least squares solution that minimizes both ||x||2 and ||Ax - b||2. In this case (or
when the rank of A is in doubt) you can use the QR factorization with pivoting or singular value
decomposition (see Singular Value Decomposition).
Eigenvalue Problems. The eigenvalue problems (from German eigen "own") are stated as follows: given a
matrix A, find the eigenvalues λ and the corresponding eigenvectors z that satisfy the equation
Az = λz (right eigenvectors z)
or the equation
zHA = λzH (left eigenvectors z).
If A is a real symmetric or complex Hermitian matrix, the above two equations are equivalent, and the
problem is called a symmetric eigenvalue problem. Routines for solving this type of problems are described
in the section Symmetric Eigenvalue Problems.
Routines for solving eigenvalue problems with nonsymmetric or non-Hermitian matrices are described in the
section Nonsymmetric Eigenvalue Problems.
The library also includes routines that handle generalized symmetric-definite eigenvalue problems: find
the eigenvalues λ and the corresponding eigenvectors x that satisfy one of the following equations:
Az = λBz, ABz = λz, or BAz = λz,


where A is symmetric or Hermitian, and B is symmetric positive-definite or Hermitian positive-definite.


Routines for reducing these problems to standard symmetric eigenvalue problems are described in the
section Generalized Symmetric-Definite Eigenvalue Problems.
To solve a particular problem, you usually call several computational routines. Sometimes you need to
combine the routines of this chapter with other LAPACK routines described in "LAPACK Routines: Linear
Equations" as well as with BLAS routines described in "BLAS and Sparse BLAS Routines".
For example, to solve a set of least squares problems minimizing ||Ax - b||2 for all columns b of a given
matrix B (where A and B are real matrices), you can call ?geqrf to form the factorization A = QR, then call
?ormqr to compute C = QT*B, and finally call the BLAS routine ?trsm to solve for X the system of equations
R*X = C.
Another way is to call an appropriate driver routine that performs several tasks in one call. For example, to
solve the least squares problem the driver routine ?gels can be used.
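
For example, a minimal sketch of the driver approach, assuming the standard LAPACKE_dgels prototype, column
major layout, and arbitrary data values:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Overdetermined least squares min ||A*x - b||2 with a 4-by-2 matrix A. */
    lapack_int m = 4, n = 2, nrhs = 1;
    double a[8] = { 1.0, 1.0, 1.0, 1.0,      /* column 1 (column major) */
                    1.0, 2.0, 3.0, 4.0 };    /* column 2                */
    double b[4] = { 6.0, 5.0, 7.0, 10.0 };

    lapack_int info = LAPACKE_dgels(LAPACK_COL_MAJOR, 'N', m, n, nrhs, a, m, b, m);
    if (info == 0)
        printf("least squares solution: x = (%g, %g)\n", b[0], b[1]);
    return 0;
}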

LAPACK Least Squares and Eigenvalue Problem Computational Routines


In the sections that follow, the descriptions of LAPACK computational routines are given. These routines
perform distinct computational tasks that can be used for:
Orthogonal Factorizations
Singular Value Decomposition
Symmetric Eigenvalue Problems
Generalized Symmetric-Definite Eigenvalue Problems
Nonsymmetric Eigenvalue Problems
Generalized Nonsymmetric Eigenvalue Problems
Generalized Singular Value Decomposition
See also the respective driver routines.

Orthogonal Factorizations: LAPACK Computational Routines


This section describes the LAPACK routines for the QR (RQ) and LQ (QL) factorization of matrices. Routines
for the RZ factorization as well as for generalized QR and RQ factorizations are also included.
QR Factorization. Assume that A is an m-by-n matrix to be factored.
If m≥n, the QR factorization is given by
A = Q*(R; 0) (the n-by-n matrix R stacked over an (m-n)-by-n zero block),
where R is an n-by-n upper triangular matrix with real diagonal elements, and Q is an m-by-m orthogonal (or
unitary) matrix.
You can use the QR factorization for solving the following least squares problem: minimize ||Ax - b||2
where A is a full-rank m-by-n matrix (m≥n). After factoring the matrix, compute the solution x by solving
R*x = (Q1)T*b, where Q1 consists of the first n columns of Q.
If m < n, the QR factorization is given by
A = Q*R = Q*(R1 R2)
where R is trapezoidal, R1 is upper triangular and R2 is rectangular.
Q is represented as a product of min(m, n) elementary reflectors. Routines are provided to work with Q in
this representation.
LQ Factorization. The LQ factorization of an m-by-n matrix A is as follows. If m≤n,
A = (L 0)*Q,

where L is an m-by-m lower triangular matrix with real diagonal elements, and Q is an n-by-n orthogonal (or
unitary) matrix.
If m > n, the LQ factorization is
A = (L1; L2)*Q (the n-by-n matrix L1 stacked over the (m-n)-by-n block L2),
where L1 is an n-by-n lower triangular matrix, L2 is rectangular, and Q is an n-by-n orthogonal (or unitary)
matrix.
You can use the LQ factorization to find the minimum-norm solution of an underdetermined system of linear
equations Ax = b where A is an m-by-n matrix of rank m (m < n). After factoring the matrix, compute the
solution vector x as follows: solve L*y = b for y, and then compute x = (Q1)H*y, where Q1 consists of the
first m rows of Q.
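
A minimal sketch of this procedure for the double precision real flavor, assuming the standard LAPACKE
prototypes for dgelqf and dormlq and the BLAS routine cblas_dtrsm; the data values are arbitrary:

#include <string.h>
#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Minimum-norm solution of an underdetermined system (m < n):
       A = L*Q (dgelqf), solve L*y = b (dtrsm), then x = (Q1)^H*y,
       formed here as x = Q^T*[y; 0] (dormlq).                       */
    lapack_int m = 2, n = 3, nrhs = 1;
    double a[6]  = { 1.0, 2.0,     /* 2-by-3, column major */
                     1.0, 1.0,
                     1.0, 3.0 };
    double b[2]  = { 3.0, 6.0 };
    double tau[2], x[3] = { 0.0, 0.0, 0.0 };

    lapack_int info = LAPACKE_dgelqf(LAPACK_COL_MAJOR, m, n, a, m, tau);
    if (info == 0) {
        /* Solve L*y = b with the m-by-m lower triangle of a. */
        cblas_dtrsm(CblasColMajor, CblasLeft, CblasLower, CblasNoTrans,
                    CblasNonUnit, m, nrhs, 1.0, a, m, b, m);
        memcpy(x, b, m * sizeof(double));        /* x = [y; 0]   */
        info = LAPACKE_dormlq(LAPACK_COL_MAJOR, 'L', 'T', n, nrhs, m,
                              a, m, tau, x, n);  /* x := Q^T * x */
    }
    if (info == 0)
        printf("minimum-norm solution: x = (%g, %g, %g)\n", x[0], x[1], x[2]);
    return 0;
}
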
Table "Computational Routines for Orthogonal Factorization" lists LAPACK routines that perform orthogonal
factorization of matrices.
Computational Routines for Orthogonal Factorization

Matrix type, factorization            Factorize           Factorize        Generate        Apply
                                      without pivoting    with pivoting    matrix Q        matrix Q

general matrices,                     geqrf, geqrfp       geqpf, geqp3     orgqr, ungqr    ormqr, unmqr
QR factorization

general matrices,                     geqrt                                                gemqrt
blocked QR factorization

general matrices,                     gerqf                                orgrq, ungrq    ormrq, unmrq
RQ factorization

general matrices,                     gelqf                                orglq, unglq    ormlq, unmlq
LQ factorization

general matrices,                     geqlf                                orgql, ungql    ormql, unmql
QL factorization

trapezoidal matrices,                 tzrzf                                                ormrz, unmrz
RZ factorization

pair of matrices,                     ggqrf
generalized QR factorization

pair of matrices,                     ggrqf
generalized RQ factorization

triangular-pentagonal matrices,       tpqrt                                                tpmqrt
blocked QR factorization


?geqrf
Computes the QR factorization of a general m-by-n
matrix.

Syntax
lapack_int LAPACKE_sgeqrf (int matrix_layout, lapack_int m, lapack_int n, float* a,
lapack_int lda, float* tau);
lapack_int LAPACKE_dgeqrf (int matrix_layout, lapack_int m, lapack_int n, double* a,
lapack_int lda, double* tau);
lapack_int LAPACKE_cgeqrf (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* tau);
lapack_int LAPACKE_zgeqrf (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* tau);

Include Files
• mkl.h

Description

The routine forms the QR factorization of a general m-by-n matrix A (see Orthogonal Factorizations). No
pivoting is performed.
The routine does not form the matrix Q explicitly. Instead, Q is represented as a product of min(m, n)
elementary reflectors. Routines are provided to work with Q in this representation.

NOTE
This routine supports the Progress Routine feature. See the Progress Function section for details.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

m The number of rows in the matrix A (m≥ 0).

n The number of columns in A (n≥ 0).

a Array a of size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout contains the matrix A.

lda The leading dimension of a; at least max(1, m) for column major layout and
at least max(1, n) for row major layout.

Output Parameters

a Overwritten by the factorization data as follows:


The elements on and above the diagonal of the array contain the min(m,n)-
by-n upper trapezoidal matrix R (R is upper triangular if m≥n); the elements
below the diagonal, with the array tau, present the orthogonal matrix Q as
a product of min(m,n) elementary reflectors (see Orthogonal Factorizations).

tau Array, size at least max(1, min(m, n)). Contains scalars that define
elementary reflectors for the matrix Q in its decomposition in a product of
elementary reflectors (see Orthogonal Factorizations).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed factorization is the exact factorization of a matrix A + E, where
||E||2 = O(ε)||A||2.
The approximate number of floating-point operations for real flavors is

(4/3)*n^3 if m = n,

(2/3)*n^2*(3m - n) if m > n,

(2/3)*m^2*(3n - m) if m < n.

The number of operations for complex flavors is 4 times greater.


To solve a set of least squares problems minimizing ||A*x - b||2 for all columns b of a given matrix B, you
can call the following:

?geqrf (this routine) to factorize A = QR;

ormqr to compute C = QT*B (for real matrices);

unmqr to compute C = QH*B (for complex matrices);

trsm (a BLAS routine) to solve R*X = C.

(The columns of the computed X are the least squares solution vectors x.)
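
A minimal sketch of this sequence for the double precision real flavor (column major layout; the data values
are arbitrary, and the standard LAPACKE_dormqr and cblas_dtrsm prototypes are assumed):

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* A = Q*R (dgeqrf), C = Q^T*B (dormqr), then solve R*X = C (dtrsm). */
    lapack_int m = 4, n = 2, nrhs = 1;
    double a[8] = { 1.0, 1.0, 1.0, 1.0,      /* column major */
                    1.0, 2.0, 3.0, 4.0 };
    double b[4] = { 6.0, 5.0, 7.0, 10.0 };
    double tau[2];

    lapack_int info = LAPACKE_dgeqrf(LAPACK_COL_MAJOR, m, n, a, m, tau);
    if (info == 0)
        info = LAPACKE_dormqr(LAPACK_COL_MAJOR, 'L', 'T', m, nrhs, n,
                              a, m, tau, b, m);          /* b := Q^T * b */
    if (info == 0) {
        /* Solve R*x = (Q^T*b)(1:n) with the n-by-n upper triangle of a. */
        cblas_dtrsm(CblasColMajor, CblasLeft, CblasUpper, CblasNoTrans,
                    CblasNonUnit, n, nrhs, 1.0, a, m, b, m);
        printf("least squares solution: x = (%g, %g)\n", b[0], b[1]);
    }
    return 0;
}
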
To compute the elements of Q explicitly, call

orgqr (for real matrices)

ungqr (for complex matrices).

See Also
mkl_progress
Matrix Storage Schemes for LAPACK Routines
?geqrfp
Computes the QR factorization of a general m-by-n
matrix with non-negative diagonal elements.

Syntax
lapack_int LAPACKE_sgeqrfp (int matrix_layout, lapack_int m, lapack_int n, float* a,
lapack_int lda, float* tau);
lapack_int LAPACKE_dgeqrfp (int matrix_layout, lapack_int m, lapack_int n, double* a,
lapack_int lda, double* tau);


lapack_int LAPACKE_cgeqrfp (int matrix_layout, lapack_int m, lapack_int n,


lapack_complex_float* a, lapack_int lda, lapack_complex_float* tau);
lapack_int LAPACKE_zgeqrfp (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* tau);

Include Files
• mkl.h

Description

The routine forms the QR factorization of a general m-by-n matrix A (see Orthogonal Factorizations). No
pivoting is performed. The diagonal entries of R are real and nonnegative.
The routine does not form the matrix Q explicitly. Instead, Q is represented as a product of min(m, n)
elementary reflectors. Routines are provided to work with Q in this representation.

NOTE
This routine supports the Progress Routine feature. See the Progress Function section for details.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

m The number of rows in the matrix A (m≥ 0).

n The number of columns in A (n≥ 0).

a Array, size max(1,lda*n) for column major layout and max(1,lda*m) for
row major layout, containing the matrix A.

lda The leading dimension of a; at least max(1, m) for column major layout and
at least max(1, n) for row major layout.

Output Parameters

a Overwritten by the factorization data as follows:


The elements on and above the diagonal of the array contain the min(m,n)-
by-n upper trapezoidal matrix R (R is upper triangular if m≥n); the elements
below the diagonal, with the array tau, present the orthogonal matrix Q as
a product of min(m,n) elementary reflectors (see Orthogonal Factorizations).

The diagonal elements of the matrix R are real and non-negative.

tau Array, size at least max(1, min(m, n)). Contains scalars that define
elementary reflectors for the matrix Q in its decomposition in a product of
elementary reflectors (see Orthogonal Factorizations).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed factorization is the exact factorization of a matrix A + E, where
||E||2 = O(ε)||A||2.
The approximate number of floating-point operations for real flavors is

(4/3)*n^3 if m = n,

(2/3)*n^2*(3m - n) if m > n,

(2/3)*m^2*(3n - m) if m < n.

The number of operations for complex flavors is 4 times greater.


To solve a set of least squares problems minimizing ||A*x - b||2 for all columns b of a given matrix B, you
can call the following:

?geqrfp (this routine) to factorize A = QR;

ormqr to compute C = QT*B (for real matrices);

unmqr to compute C = QH*B (for complex matrices);

trsm (a BLAS routine) to solve R*X = C.

(The columns of the computed X are the least squares solution vectors x.)
To compute the elements of Q explicitly, call

orgqr (for real matrices)

ungqr (for complex matrices).

See Also
mkl_progress
Matrix Storage Schemes for LAPACK Routines

?geqrt
Computes a blocked QR factorization of a general real
or complex matrix using the compact WY
representation of Q.

Syntax
lapack_int LAPACKE_sgeqrt (int matrix_layout, lapack_int m, lapack_int n, lapack_int
nb, float* a, lapack_int lda, float* t, lapack_int ldt);
lapack_int LAPACKE_dgeqrt (int matrix_layout, lapack_int m, lapack_int n, lapack_int
nb, double* a, lapack_int lda, double* t, lapack_int ldt);
lapack_int LAPACKE_cgeqrt (int matrix_layout, lapack_int m, lapack_int n, lapack_int
nb, lapack_complex_float* a, lapack_int lda, lapack_complex_float* t, lapack_int ldt);
lapack_int LAPACKE_zgeqrt (int matrix_layout, lapack_int m, lapack_int n, lapack_int
nb, lapack_complex_double* a, lapack_int lda, lapack_complex_double* t, lapack_int
ldt);


Include Files
• mkl.h

Description

The routine computes a blocked QR factorization of a general m-by-n matrix A, A = Q*R, using the compact
WY representation of Q.
The strictly lower triangular matrix V contains the elementary reflectors H(i) in the i-th column below the
diagonal. For example, if m=5 and n=3, the matrix V is

    (  1          )
    ( v1   1      )
    ( v1  v2   1  )
    ( v1  v2  v3  )
    ( v1  v2  v3  )

where vi represents one of the vectors that define H(i). The vectors are returned in the lower triangular part
of array a.

NOTE
The 1s along the diagonal of V are not stored in a.

Let k = min(m,n). The number of blocks is b = ceiling(k/nb), where each block is of order nb except for
the last block, which is of order ib = k - (b-1)*nb. For each of the b blocks, an upper triangular block
reflector factor is computed: t1, t2, ..., tb. The nb-by-nb (and ib-by-ib for the last block) factors ts are
stored in the nb-by-n array t as

t = (t1 t2 ... tb).
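
A minimal sketch, assuming the dgeqrt flavor with nb = 2 followed by dgemqrt to apply QT (column major
layout; the data values are arbitrary):

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Blocked QR of a 4-by-2 matrix (nb = 2), then b := Q^T*b with ?gemqrt. */
    lapack_int m = 4, n = 2, nrhs = 1, nb = 2;
    double a[8] = { 1.0, 1.0, 1.0, 1.0,       /* column major */
                    1.0, 2.0, 3.0, 4.0 };
    double b[4] = { 6.0, 5.0, 7.0, 10.0 };
    double t[4];                               /* ldt*min(m,n) = nb*n elements */

    lapack_int info = LAPACKE_dgeqrt(LAPACK_COL_MAJOR, m, n, nb, a, m, t, nb);
    if (info == 0)
        info = LAPACKE_dgemqrt(LAPACK_COL_MAJOR, 'L', 'T', m, nrhs, n, nb,
                               a, m, t, nb, b, m);   /* b := Q^T * b */
    printf("info = %d, (QT*b)(1:2) = %g %g\n", (int)info, b[0], b[1]);
    return 0;
}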

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

m The number of rows in the matrix A (m≥ 0).

n The number of columns in A (n≥ 0).

nb The block size to be used in the blocked QR (min(m, n) ≥nb≥ 1).

a Array a of size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout contains the m-by-n matrix A.

lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.

ldt The leading dimension of t; at least nb for column major layout and max(1,
min(m, n)) for row major layout.

Output Parameters

a Overwritten by the factorization data as follows:


The elements on and above the diagonal of the array contain the min(m,n)-
by-n upper trapezoidal matrix R (R is upper triangular if m≥n); the elements
below the diagonal, with the array t, present the orthogonal matrix Q as a
product of min(m,n) elementary reflectors (see Orthogonal Factorizations).

t Array, size max(1, ldt*min(m, n)) for column major layout and max(1,
ldt*nb) for row major layout.
The upper triangular block reflector's factors stored as a sequence of upper
triangular blocks.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info < 0 and info = -i, the i-th parameter had an illegal value.

?gemqrt
Multiplies a general matrix by the orthogonal/unitary
matrix Q of the QR factorization formed by ?geqrt.

Syntax
lapack_int LAPACKE_sgemqrt (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, lapack_int nb, const float* v, lapack_int ldv, const
float* t, lapack_int ldt, float* c, lapack_int ldc);
lapack_int LAPACKE_dgemqrt (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, lapack_int nb, const double* v, lapack_int ldv, const
double* t, lapack_int ldt, double* c, lapack_int ldc);
lapack_int LAPACKE_cgemqrt (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, lapack_int nb, const lapack_complex_float* v, lapack_int
ldv, const lapack_complex_float* t, lapack_int ldt, lapack_complex_float* c, lapack_int
ldc);
lapack_int LAPACKE_zgemqrt (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, lapack_int nb, const lapack_complex_double* v, lapack_int
ldv, const lapack_complex_double* t, lapack_int ldt, lapack_complex_double* c,
lapack_int ldc);

Include Files
• mkl.h

Description
The ?gemqrt routine overwrites the general real or complex m-by-n matrix C with

                 side = 'L'    side = 'R'
trans = 'N':     Q*C           C*Q
trans = 'T':     QT*C          C*QT
trans = 'C':     QH*C          C*QH

where Q is a real orthogonal (complex unitary) matrix defined as the product of k elementary reflectors


Q = H(1)*H(2)*...*H(k) = I - V*T*Vᵀ for real flavors, and

Q = H(1)*H(2)*...*H(k) = I - V*T*Vᴴ for complex flavors,

generated using the compact WY representation as returned by geqrt. Q is of order m if side = 'L' and of
order n if side = 'R'.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

side ='L': apply Q, QT, or QH from the left.


='R': apply Q, QT, or QH from the right.

trans ='N', no transpose, apply Q.


='T', transpose, apply QT.
='C', transpose, apply QH.

m The number of rows in the matrix C, (m≥ 0).

n The number of columns in the matrix C, (n≥ 0).

k The number of elementary reflectors whose product defines the matrix Q.


Constraints:
If side = 'L', m≥k≥0

If side = 'R', n≥k≥0.

nb
The block size used for the storage of t, k≥nb≥ 1. This must be the same
value of nb used to generate t in geqrt.

v Array of size max(1, ldv*k) for column major layout, max(1, ldv*m) for
row major layout and side = 'L', and max(1, ldv*n) for row major layout
and side = 'R'.

The ith column must contain the vector which defines the elementary
reflector H(i), for i = 1,2,...,k, as returned by geqrt in the first k columns of
its array argument a.

ldv The leading dimension of the array v.

if side = 'L', ldv must be at least max(1,m) for column major layout and
max(1, k) for row major layout;

if side = 'R', ldv must be at least max(1,n) for column major layout and
max(1, k) for row major layout.

t Array, size max(1, ldt*min(m, n)) for column major layout and max(1,
ldt*nb) for row major layout.
The upper triangular factors of the block reflectors as returned by geqrt.

ldt The leading dimension of the array t. ldt must be at least nb for column
major layout and max(1, k) for row major layout.

c The m-by-n matrix C.

ldc The leading dimension of the array c. ldc must be at least max(1, m) for
column major layout and max(1, n) for row major layout.

Output Parameters

c Overwritten by the product Q*C, C*Q, QT*C, C*QT, QH*C, or C*QH as


specified by side and trans.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.
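As an illustration of the ?geqrt/?gemqrt pair, the following minimal C sketch (double precision,
column-major layout, hypothetical sizes and data) factorizes a matrix with LAPACKE_dgeqrt and then forms
QT*C with LAPACKE_dgemqrt:

#include <mkl.h>

int main(void) {
    /* Hypothetical sizes: A is M-by-N, block size NB, C holds NRHS columns. */
    enum { M = 5, N = 3, NB = 2, NRHS = 2 };
    double a[M * N], t[NB * N], c[M * NRHS];

    for (int j = 0; j < N; ++j)                 /* example data, column major */
        for (int i = 0; i < M; ++i)
            a[i + j * M] = (i + 1.0) * (j + 1.0) + (i == j);
    for (int j = 0; j < NRHS; ++j)
        for (int i = 0; i < M; ++i)
            c[i + j * M] = 1.0;

    /* Blocked QR: R and the reflectors overwrite a, the block T factors go to t. */
    lapack_int info = LAPACKE_dgeqrt(LAPACK_COL_MAJOR, M, N, NB, a, M, t, NB);

    /* Apply Q^T from the left: c := Q^T * c, reusing a (as v) and t. */
    if (info == 0)
        info = LAPACKE_dgemqrt(LAPACK_COL_MAJOR, 'L', 'T', M, NRHS, N, NB,
                               a, M, t, NB, c, M);

    return (int)info;
}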

?geqpf
Computes the QR factorization of a general m-by-n
matrix with pivoting.

Syntax
lapack_int LAPACKE_sgeqpf (int matrix_layout, lapack_int m, lapack_int n, float* a,
lapack_int lda, lapack_int* jpvt, float* tau);
lapack_int LAPACKE_dgeqpf (int matrix_layout, lapack_int m, lapack_int n, double* a,
lapack_int lda, lapack_int* jpvt, double* tau);
lapack_int LAPACKE_cgeqpf (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_int* jpvt, lapack_complex_float* tau);
lapack_int LAPACKE_zgeqpf (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_int* jpvt, lapack_complex_double*
tau);

Include Files
• mkl.h

Description
The routine is deprecated and has been replaced by routine geqp3.
The routine ?geqpf forms the QR factorization of a general m-by-n matrix A with column pivoting: A*P =
Q*R (see Orthogonal Factorizations). Here P denotes an n-by-n permutation matrix.
The routine does not form the matrix Q explicitly. Instead, Q is represented as a product of min(m, n)
elementary reflectors. Routines are provided to work with Q in this representation.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

m The number of rows in the matrix A (m≥ 0).

n The number of columns in A (n≥ 0).


a Array a of size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout contains the matrix A.

lda The leading dimension of a; at least max(1, m)for column major layout and
max(1, n) for row major layout.

jpvt Array, size at least max(1, n).


On entry, if jpvt[i - 1] > 0, the i-th column of A is moved to the
beginning of A*P before the computation, and fixed in place during the
computation.
If jpvt[i - 1] = 0, the ith column of A is a free column (that is, it may
be interchanged during the computation with any other free column).

Output Parameters

a Overwritten by the factorization data as follows:


The elements on and above the diagonal of the array contain the min(m,n)-
by-n upper trapezoidal matrix R (R is upper triangular if m≥n); the elements
below the diagonal, with the array tau, present the orthogonal matrix Q as
a product of min(m,n) elementary reflectors (see Orthogonal Factorizations).

tau Array, size at least max (1, min(m, n)). Contains additional information on
the matrix Q.

jpvt Overwritten by details of the permutation matrix P in the factorization A*P


= Q*R. More precisely, the columns of A*P are the columns of A in the
following order:
jpvt[0], jpvt[1], ..., jpvt[n - 1].

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed factorization is the exact factorization of a matrix A + E, where
||E||₂ = O(ε)||A||₂.
The approximate number of floating-point operations for real flavors is

(4/3)n³ if m = n,

(2/3)n²(3m-n) if m > n,

(2/3)m²(3n-m) if m < n.

The number of operations for complex flavors is 4 times greater.


To solve a set of least squares problems minimizing ||A*x - b||₂ for all columns b of a given matrix B, you
can call the following:

?geqpf (this routine) to factorize A*P = Q*R;

ormqr to compute C = QT*B (for real matrices);

unmqr to compute C = QH*B (for complex matrices);

trsm (a BLAS routine) to solve R*X = C.

(The columns of the computed X are the permuted least squares solution vectors x; the output array jpvt
specifies the permutation order.)
To compute the elements of Q explicitly, call

orgqr (for real matrices)

ungqr (for complex matrices).

?geqp3
Computes the QR factorization of a general m-by-n
matrix with column pivoting using level 3 BLAS.

Syntax
lapack_int LAPACKE_sgeqp3 (int matrix_layout, lapack_int m, lapack_int n, float* a,
lapack_int lda, lapack_int* jpvt, float* tau);
lapack_int LAPACKE_dgeqp3 (int matrix_layout, lapack_int m, lapack_int n, double* a,
lapack_int lda, lapack_int* jpvt, double* tau);
lapack_int LAPACKE_cgeqp3 (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_int* jpvt, lapack_complex_float* tau);
lapack_int LAPACKE_zgeqp3 (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_int* jpvt, lapack_complex_double*
tau);

Include Files
• mkl.h

Description

The routine forms the QR factorization of a general m-by-n matrix A with column pivoting: A*P = Q*R (see
Orthogonal Factorizations) using Level 3 BLAS. Here P denotes an n-by-n permutation matrix. Use this
routine instead of geqpf for better performance.
The routine does not form the matrix Q explicitly. Instead, Q is represented as a product of min(m, n)
elementary reflectors. Routines are provided to work with Q in this representation.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

m The number of rows in the matrix A (m≥ 0).

n The number of columns in A (n≥ 0).

a Array a of size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout contains the matrix A.


lda The leading dimension of a; at least max(1, m)for column major layout and
max(1, n) for row major layout.

jpvt
Array, size at least max(1, n).

On entry, if jpvt[i - 1]≠ 0, the i-th column of A is moved to the
beginning of A*P before the computation, and fixed in place during the
computation.
If jpvt[i - 1] = 0, the i-th column of A is a free column (that is, it may
be interchanged during the computation with any other free column).

Output Parameters

a Overwritten by the factorization data as follows:


The elements on and above the diagonal of the array contain the min(m,n)-
by-n upper trapezoidal matrix R (R is upper triangular if m≥n); the elements
below the diagonal, with the array tau, present the orthogonal matrix Q as
a product of min(m,n) elementary reflectors (see Orthogonal Factorizations).

tau Array, size at least max (1, min(m, n)). Contains scalar factors of the
elementary reflectors for the matrix Q.

jpvt Overwritten by details of the permutation matrix P in the factorization A*P


= Q*R. More precisely, the columns of A*P are the columns of A in the
following order:
jpvt[0], jpvt[1], ..., jpvt[n - 1].

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
To solve a set of least squares problems minimizing ||A*x - b||₂ for all columns b of a given matrix B, you
can call the following (a worked sketch follows this list):

?geqp3 (this routine) to factorize A*P = Q*R;

ormqr to compute C = QT*B (for real matrices);

unmqr to compute C = QH*B (for complex matrices);

trsm (a BLAS routine) to solve R*X = C.

(The columns of the computed X are the permuted least squares solution vectors x; the output array jpvt
specifies the permutation order.)
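A minimal C sketch of this sequence (double precision, column-major layout, hypothetical sizes and data),
including the final re-ordering of the solution according to jpvt:

#include <mkl.h>

int main(void) {
    /* Hypothetical sizes: A is M-by-N with M >= N, B holds NRHS right-hand sides. */
    enum { M = 6, N = 4, NRHS = 1 };
    double a[M * N], b[M * NRHS], tau[N], x[N * NRHS];
    lapack_int jpvt[N] = {0};               /* all columns are free columns */

    for (int j = 0; j < N; ++j)             /* example data, column major */
        for (int i = 0; i < M; ++i)
            a[i + j * M] = 1.0 / (i + j + 1.0);
    for (int i = 0; i < M; ++i)
        b[i] = 1.0;

    /* 1. Factorize A*P = Q*R with column pivoting. */
    lapack_int info = LAPACKE_dgeqp3(LAPACK_COL_MAJOR, M, N, a, M, jpvt, tau);

    /* 2. Form C = Q^T*B, overwriting b. */
    if (info == 0)
        info = LAPACKE_dormqr(LAPACK_COL_MAJOR, 'L', 'T', M, NRHS, N, a, M, tau, b, M);

    /* 3. Solve R*Y = C; the permuted solution Y overwrites the first N rows of b. */
    if (info == 0)
        cblas_dtrsm(CblasColMajor, CblasLeft, CblasUpper, CblasNoTrans, CblasNonUnit,
                    N, NRHS, 1.0, a, M, b, M);

    /* 4. Undo the permutation: x[jpvt[i]-1] = y[i] (jpvt holds 1-based indices). */
    for (int j = 0; j < NRHS; ++j)
        for (int i = 0; i < N; ++i)
            x[(jpvt[i] - 1) + j * N] = b[i + j * M];

    /* x now holds the least squares solutions in the original column order. */
    return (int)info;
}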
To compute the elements of Q explicitly, call

orgqr (for real matrices)

ungqr (for complex matrices).

?orgqr
Generates the real orthogonal matrix Q of the QR
factorization formed by ?geqrf.

Syntax
lapack_int LAPACKE_sorgqr (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
float* a, lapack_int lda, const float* tau);
lapack_int LAPACKE_dorgqr (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
double* a, lapack_int lda, const double* tau);

Include Files
• mkl.h

Description

The routine generates the whole or part of m-by-m orthogonal matrix Q of the QR factorization formed by
the routines geqrf or geqpf. Use this routine after a call to sgeqrf/dgeqrf or sgeqpf/dgeqpf.

Usually Q is determined from the QR factorization of an m by p matrix A with m≥p. To compute the whole
matrix Q, use:

LAPACKE_?orgqr(matrix_layout, m, m, p, a, lda, tau)

To compute the leading p columns of Q (which form an orthonormal basis in the space spanned by the
columns of A):

LAPACKE_?orgqr(matrix_layout, m, p, p, a, lda, tau)

To compute the matrix Qk of the QR factorization of leading k columns of the matrix A:

LAPACKE_?orgqr(matrix_layout, m, m, k, a, lda, tau)

To compute the leading k columns of Qk (which form an orthonormal basis in the space spanned by leading k
columns of the matrix A):

LAPACKE_?orgqr(matrix_layout, m, k, k, a, lda, tau)

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

m The order of the orthogonal matrix Q (m≥ 0).

n The number of columns of Q to be computed


(0 ≤n≤m).

k The number of elementary reflectors whose product defines the matrix Q (0


≤k≤n).

a, tau Arrays:
a and tau are the arrays returned by sgeqrf / dgeqrf or sgeqpf / dgeqpf.

The size of a is max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout .
The size of tau must be at least max(1, k).


lda The leading dimension of a; at least max(1, m)for column major layout and
max(1, n) for row major layout.

Output Parameters

a Overwritten by n leading columns of the m-by-m orthogonal matrix Q.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed Q differs from an exactly orthogonal matrix by a matrix E such that
||E||₂ = O(ε)*||A||₂, where ε is the machine precision.
The total number of floating-point operations is approximately 4*m*n*k - 2*(m + n)*k² + (4/3)*k³.

If n = k, the number is approximately (2/3)*n²*(3m - n).

The complex counterpart of this routine is ungqr.
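For example, the following minimal C sketch (double precision, column-major layout, hypothetical sizes and
data) factorizes a tall matrix with LAPACKE_dgeqrf and then generates the leading p columns of Q in place:

#include <mkl.h>

int main(void) {
    /* Hypothetical sizes: A is M-by-P with M >= P; we want the leading P columns of Q. */
    enum { M = 6, P = 3 };
    double a[M * P], tau[P];

    for (int j = 0; j < P; ++j)             /* example data, column major */
        for (int i = 0; i < M; ++i)
            a[i + j * M] = 1.0 / (i + j + 1.0);

    /* Factorize A = Q*R; the reflectors and R overwrite a. */
    lapack_int info = LAPACKE_dgeqrf(LAPACK_COL_MAJOR, M, P, a, M, tau);

    /* Generate the leading P columns of Q (an orthonormal basis of range(A));
       they overwrite a, which keeps its M-by-P shape and leading dimension.   */
    if (info == 0)
        info = LAPACKE_dorgqr(LAPACK_COL_MAJOR, M, P, P, a, M, tau);

    return (int)info;
}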

?ormqr
Multiplies a real matrix by the orthogonal matrix Q of
the QR factorization formed by ?geqrf or ?geqpf.

Syntax
lapack_int LAPACKE_sormqr (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const float* a, lapack_int lda, const float* tau, float*
c, lapack_int ldc);
lapack_int LAPACKE_dormqr (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const double* a, lapack_int lda, const double* tau,
double* c, lapack_int ldc);

Include Files
• mkl.h

Description

The routine multiplies a real matrix C by Q or QT, where Q is the orthogonal matrix Q of the QR factorization
formed by the routines geqrf or geqpf.
Depending on the parameters side and trans, the routine can form one of the matrix products Q*C, QT*C,
C*Q, or C*QT (overwriting the result on C).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

side Must be either 'L' or 'R'.

If side='L', Q or QT is applied to C from the left.

If side='R', Q or QT is applied to C from the right.

trans Must be either 'N' or 'T'.

If trans='N', the routine multiplies C by Q.

If trans='T', the routine multiplies C by QT.

m The number of rows in the matrix C (m≥ 0).

n The number of columns in C (n≥ 0).

k The number of elementary reflectors whose product defines the matrix Q.


Constraints:
0 ≤k≤m if side='L';
0 ≤k≤n if side='R'.

a, tau, c Arrays:
a and tau are the arrays returned by sgeqrf / dgeqrf or sgeqpf / dgeqpf.

The size of a is max(1, lda*k) for column major layout, max(1, lda*m) for
row major layout and side = 'L', and max(1, lda*n) for row major layout
and side = 'R'.

The size of tau must be at least max(1, k).


Array c of size max(1, ldc*n) for column major layout and max(1, ldc*m)
for row major layout contains the m-by-n matrix C.

lda The leading dimension of a. Constraints:


if side = 'L', lda≥ max(1, m)for column major layout and max(1, k) for
row major layout ;
if side = 'R', lda≥ max(1, n)for column major layout and max(1, k) for
row major layout.

ldc The leading dimension of c. Constraint:


ldc≥ max(1, m)for column major layout and max(1, n) for row major layout.

Output Parameters

c Overwritten by the product Q*C, QT*C, C*Q, or C*QT (as specified by side
and trans).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The complex counterpart of this routine is unmqr.


?ungqr
Generates the complex unitary matrix Q of the QR
factorization formed by ?geqrf.

Syntax
lapack_int LAPACKE_cungqr (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
lapack_complex_float* a, lapack_int lda, const lapack_complex_float* tau);
lapack_int LAPACKE_zungqr (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
lapack_complex_double* a, lapack_int lda, const lapack_complex_double* tau);

Include Files
• mkl.h

Description

The routine generates the whole or part of m-by-m unitary matrix Q of the QR factorization formed by the
routines geqrf or geqpf. Use this routine after a call to cgeqrf/zgeqrf or cgeqpf/zgeqpf.

Usually Q is determined from the QR factorization of an m by p matrix A with m≥p. To compute the whole
matrix Q, use:

LAPACKE_?ungqr(matrix_layout, m, m, p, a, lda, tau)

To compute the leading p columns of Q (which form an orthonormal basis in the space spanned by the
columns of A):

LAPACKE_?ungqr(matrix_layout, m, p, p, a, lda, tau)

To compute the matrix Qk of the QR factorization of the leading k columns of the matrix A:

LAPACKE_?ungqr(matrix_layout, m, m, k, a, lda, tau)

To compute the leading k columns of Qk (which form an orthonormal basis in the space spanned by the
leading k columns of the matrix A):

LAPACKE_?ungqr(matrix_layout, m, k, k, a, lda, tau)

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

m The order of the unitary matrix Q (m≥ 0).

n The number of columns of Q to be computed


(0 ≤n≤m).

k The number of elementary reflectors whose product defines the matrix Q (0


≤k≤n).

a, tau Arrays: a and tau are the arrays returned by cgeqrf/zgeqrf or cgeqpf/
zgeqpf.
The size of a is max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout .
The size of tau must be at least max(1, k).

lda The leading dimension of a; at least max(1, m)for column major layout and
max(1, n) for row major layout.

Output Parameters

a Overwritten by n leading columns of the m-by-m unitary matrix Q.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed Q differs from an exactly unitary matrix by a matrix E such that ||E||₂ = O(ε)*||A||₂,
where ε is the machine precision.
The total number of floating-point operations is approximately 16*m*n*k - 8*(m + n)*k² + (16/3)*k³.

If n = k, the number is approximately (8/3)*n²*(3m - n).

The real counterpart of this routine is orgqr.

?unmqr
Multiplies a complex matrix by the unitary matrix Q of
the QR factorization formed by ?geqrf.

Syntax
lapack_int LAPACKE_cunmqr (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const lapack_complex_float* a, lapack_int lda, const
lapack_complex_float* tau, lapack_complex_float* c, lapack_int ldc);
lapack_int LAPACKE_zunmqr (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const lapack_complex_double* a, lapack_int lda, const
lapack_complex_double* tau, lapack_complex_double* c, lapack_int ldc);

Include Files
• mkl.h

Description

The routine multiplies a rectangular complex matrix C by Q or QH, where Q is the unitary matrix Q of the QR
factorization formed by the routines geqrf or geqpf.
Depending on the parameters side and trans, the routine can form one of the matrix products Q*C, QH*C,
C*Q, or C*QH (overwriting the result on C).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

side Must be either 'L' or 'R'.

If side = 'L', Q or QH is applied to C from the left.


If side = 'R', Q or QH is applied to C from the right.

trans Must be either 'N' or 'C'.

If trans = 'N', the routine multiplies C by Q.

If trans = 'C', the routine multiplies C by QH.

m The number of rows in the matrix C (m≥ 0).

n The number of columns in C (n≥ 0).

k The number of elementary reflectors whose product defines the matrix Q.


Constraints:
0 ≤k≤m if side = 'L';
0 ≤k≤n if side = 'R'.

a, c, tau Arrays:
a and tau are the arrays returned by cgeqrf/zgeqrf or cgeqpf/zgeqpf.
The size of a is max(1, lda*k) for column major layout, max(1, lda*m) for
row major layout and side = 'L', and max(1, lda*n) for row major layout
and side = 'R'.
The size of tau must be at least max(1, k).
Array c of size max(1, ldc*n) for column major layout and max(1, ldc*m)
for row major layout contains the m-by-n matrix C.

lda The leading dimension of a. Constraints:


lda≥ max(1, m) for column major layout and lda≥ max(1, k) for row
major layout if side = 'L';

lda≥ max(1, n) for column major layout and lda≥ max(1, k) for row
major layout if side = 'R'.

ldc The leading dimension of c. Constraint:


ldc≥ max(1, m) for column major layout and max(1, n) for row major
layout.

Output Parameters

c Overwritten by the product Q*C, QH*C, C*Q, or C*QH (as specified by side
and trans).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The real counterpart of this routine is ormqr.
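A minimal C sketch of the complex case (column-major layout, hypothetical sizes and data; it assumes the
default MKL definition of lapack_complex_double as MKL_Complex16 with real and imag members), forming
QH*C after a call to LAPACKE_zgeqrf:

#include <mkl.h>

int main(void) {
    /* Hypothetical sizes: A is M-by-N with M >= N, C holds NRHS columns. */
    enum { M = 4, N = 3, NRHS = 2 };
    lapack_complex_double a[M * N], tau[N], c[M * NRHS];

    for (int j = 0; j < N; ++j)                 /* example data, column major */
        for (int i = 0; i < M; ++i) {
            a[i + j * M].real = 1.0 / (i + j + 1.0);
            a[i + j * M].imag = (double)(i - j);
        }
    for (int j = 0; j < NRHS; ++j)
        for (int i = 0; i < M; ++i) {
            c[i + j * M].real = (i == j) ? 1.0 : 0.0;
            c[i + j * M].imag = 0.0;
        }

    /* Factorize A = Q*R. */
    lapack_int info = LAPACKE_zgeqrf(LAPACK_COL_MAJOR, M, N, a, M, tau);

    /* Overwrite c with Q^H * C. */
    if (info == 0)
        info = LAPACKE_zunmqr(LAPACK_COL_MAJOR, 'L', 'C', M, NRHS, N, a, M, tau, c, M);

    return (int)info;
}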

?gelqf
Computes the LQ factorization of a general m-by-n
matrix.

Syntax
lapack_int LAPACKE_sgelqf (int matrix_layout, lapack_int m, lapack_int n, float* a,
lapack_int lda, float* tau);
lapack_int LAPACKE_dgelqf (int matrix_layout, lapack_int m, lapack_int n, double* a,
lapack_int lda, double* tau);
lapack_int LAPACKE_cgelqf (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* tau);
lapack_int LAPACKE_zgelqf (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* tau);

Include Files
• mkl.h

Description

The routine forms the LQ factorization of a general m-by-n matrix A (see Orthogonal Factorizations). No
pivoting is performed.
The routine does not form the matrix Q explicitly. Instead, Q is represented as a product of min(m, n)
elementary reflectors. Routines are provided to work with Q in this representation.

NOTE
This routine supports the Progress Routine feature. See Progress Function section for details.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

m The number of rows in the matrix A (m≥ 0).

n The number of columns in A (n≥ 0).

a Array a of size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout contains the matrix A.

lda The leading dimension of a; at least max(1, m) for column major layout and
max(1, n) for row major layout.

Output Parameters

a Overwritten by the factorization data as follows:


The elements on and below the diagonal of the array contain the m-by-
min(m,n) lower trapezoidal matrix L (L is lower triangular if m≤n); the
elements above the diagonal, with the array tau, represent the orthogonal
matrix Q as a product of elementary reflectors.

tau Array, size at least max(1, min(m, n)).


Contains scalars that define elementary reflectors for the matrix Q (see
Orthogonal Factorizations).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed factorization is the exact factorization of a matrix A + E, where
||E||₂ = O(ε)||A||₂.
The approximate number of floating-point operations for real flavors is

(4/3)n³ if m = n,

(2/3)n²(3m-n) if m > n,

(2/3)m²(3n-m) if m < n.

The number of operations for complex flavors is 4 times greater.


To find the minimum-norm solution of an underdetermined least squares problem minimizing ||A*x - b||₂
for all columns b of a given matrix B, you can call the following (a sketch of this sequence is given below):

?gelqf (this routine) to factorize A = L*Q;

trsm (a BLAS routine) to solve L*Y = B for Y;

ormlq to compute X = (Q1)ᵀ*Y (for real matrices);

unmlq to compute X = (Q1)ᴴ*Y (for complex matrices).

(The columns of the computed X are the minimum-norm solution vectors x. Here A is an m-by-n matrix with
m < n; Q1 denotes the first m columns of Q.)
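A minimal C sketch of this recipe (double precision, column-major layout, hypothetical sizes and data):

#include <string.h>
#include <mkl.h>

int main(void) {
    /* Hypothetical sizes: A is M-by-N with M < N, B holds NRHS right-hand sides. */
    enum { M = 3, N = 5, NRHS = 2 };
    double a[M * N], tau[M], b[M * NRHS], x[N * NRHS];

    for (int j = 0; j < N; ++j)                 /* example data, column major */
        for (int i = 0; i < M; ++i)
            a[i + j * M] = 1.0 / (i + j + 1.0);
    for (int j = 0; j < NRHS; ++j)
        for (int i = 0; i < M; ++i)
            b[i + j * M] = (i == j) ? 1.0 : 0.0;

    /* 1. Factorize A = L*Q. */
    lapack_int info = LAPACKE_dgelqf(LAPACK_COL_MAJOR, M, N, a, M, tau);

    /* 2. Solve L*Y = B; Y overwrites b. L is the lower triangle of the factored a. */
    if (info == 0)
        cblas_dtrsm(CblasColMajor, CblasLeft, CblasLower, CblasNoTrans, CblasNonUnit,
                    M, NRHS, 1.0, a, M, b, M);

    /* 3. Form X = Q^T * [Y; 0]: copy Y into the top of an N-by-NRHS array padded
          with zeros, then apply Q^T from the left with ormlq.                    */
    memset(x, 0, sizeof(x));
    if (info == 0) {
        for (int j = 0; j < NRHS; ++j)
            for (int i = 0; i < M; ++i)
                x[i + j * N] = b[i + j * M];
        info = LAPACKE_dormlq(LAPACK_COL_MAJOR, 'L', 'T', N, NRHS, M, a, M, tau, x, N);
    }

    /* x now holds the minimum-norm solution vectors. */
    return (int)info;
}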
To compute the elements of Q explicitly, call

orglq (for real matrices)

unglq (for complex matrices).

See Also
mkl_progress
Matrix Storage Schemes for LAPACK Routines
?orglq
Generates the real orthogonal matrix Q of the LQ
factorization formed by ?gelqf.

Syntax
lapack_int LAPACKE_sorglq (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
float* a, lapack_int lda, const float* tau);
lapack_int LAPACKE_dorglq (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
double* a, lapack_int lda, const double* tau);

Include Files
• mkl.h

Description

The routine generates the whole or part of n-by-n orthogonal matrix Q of the LQ factorization formed by the
routines gelqf. Use this routine after a call to sgelqf/dgelqf.

Usually Q is determined from the LQ factorization of a p-by-n matrix A with n≥p. To compute the whole
matrix Q, use:

info = LAPACKE_?orglq(matrix_layout, n, n, p, a, lda, tau)

To compute the leading p rows of Q, which form an orthonormal basis in the space spanned by the rows of A,
use:

info = LAPACKE_?orglq(matrix_layout, p, n, p, a, lda, tau)

To compute the matrix Qk of the LQ factorization of the leading k rows of A, use:

info = LAPACKE_?orglq(matrix_layout, n, n, k, a, lda, tau)

To compute the leading k rows of Qk, which form an orthonormal basis in the space spanned by the leading k
rows of A, use:

info = LAPACKE_?orglq(matrix_layout, k, n, k, a, lda, tau)

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

m The number of rows of Q to be computed


(0 ≤m≤n).

n The order of the orthogonal matrix Q (n≥m).

k The number of elementary reflectors whose product defines the matrix Q (0


≤k≤m).

a, tau Arrays: a (size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout) and tau are the arrays returned by sgelqf/dgelqf.

The size of tau must be at least max(1, k).

lda The leading dimension of a; at least max(1, m)for column major layout and
max(1, n) for row major layout.

Output Parameters

a Overwritten by m leading rows of the n-by-n orthogonal matrix Q.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.


Application Notes
The computed Q differs from an exactly orthogonal matrix by a matrix E such that ||E||₂ = O(ε)*||A||₂,
where ε is the machine precision.
The total number of floating-point operations is approximately 4*m*n*k - 2*(m + n)*k² + (4/3)*k³.

If m = k, the number is approximately (2/3)*m²*(3n - m).

The complex counterpart of this routine is unglq.
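For example, the following minimal C sketch (double precision, column-major layout, hypothetical sizes and
data) computes an orthonormal basis of the row space of a wide matrix:

#include <mkl.h>

int main(void) {
    /* Hypothetical sizes: A is P-by-N with P <= N; we want the leading P rows of Q. */
    enum { P = 3, N = 6 };
    double a[P * N], tau[P];

    for (int j = 0; j < N; ++j)                 /* example data, column major */
        for (int i = 0; i < P; ++i)
            a[i + j * P] = 1.0 / (i + j + 1.0);

    /* Factorize A = L*Q; the reflectors and L overwrite a. */
    lapack_int info = LAPACKE_dgelqf(LAPACK_COL_MAJOR, P, N, a, P, tau);

    /* Generate the leading P rows of Q (an orthonormal basis of the row space of A). */
    if (info == 0)
        info = LAPACKE_dorglq(LAPACK_COL_MAJOR, P, N, P, a, P, tau);

    return (int)info;
}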

?ormlq
Multiplies a real matrix by the orthogonal matrix Q of
the LQ factorization formed by ?gelqf.

Syntax
lapack_int LAPACKE_sormlq (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const float* a, lapack_int lda, const float* tau, float*
c, lapack_int ldc);
lapack_int LAPACKE_dormlq (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const double* a, lapack_int lda, const double* tau,
double* c, lapack_int ldc);

Include Files
• mkl.h

Description

The routine multiplies a real m-by-n matrix C by Q or QT, where Q is the orthogonal matrix Q of the LQ
factorization formed by the routine gelqf.
Depending on the parameters side and trans, the routine can form one of the matrix products Q*C, QT*C,
C*Q, or C*QT (overwriting the result on C).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

side Must be either 'L' or 'R'.

If side = 'L', Q or QT is applied to C from the left.

If side = 'R', Q or QT is applied to C from the right.

trans Must be either 'N' or 'T'.

If trans = 'N', the routine multiplies C by Q.

If trans = 'T', the routine multiplies C by QT.

m The number of rows in the matrix C (m≥ 0).

n The number of columns in C (n≥ 0).

k The number of elementary reflectors whose product defines the matrix Q.


Constraints:
0 ≤k≤m if side = 'L';

0 ≤k≤n if side = 'R'.

a, c, tau Arrays:
a and tau are arrays returned by ?gelqf.

The size of a must be:

For side = 'L' and column major layout, max(1, lda*m).

For side = 'R' and column major layout, max(1, lda*n).

For row major layout regardless of side, max(1, lda*k).

The dimension of tau must be at least max(1, k).


c (size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout) contains the m-by-n matrix C.

lda The leading dimension of a. For column major layout, lda≥ max(1, k). For
row major layout, if side = 'L', lda≥ max(1, m), or, if side = 'R', lda≥
max(1, n).

ldc The leading dimension of c; ldc≥ max(1, m) for column major layout and
max(1, n) for row major layout.

Output Parameters

c Overwritten by the product Q*C, QT*C, C*Q, or C*QT (as specified by side
and trans).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The complex counterpart of this routine is unmlq.

?unglq
Generates the complex unitary matrix Q of the LQ
factorization formed by ?gelqf.

Syntax
lapack_int LAPACKE_cunglq (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
lapack_complex_float* a, lapack_int lda, const lapack_complex_float* tau);
lapack_int LAPACKE_zunglq (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
lapack_complex_double* a, lapack_int lda, const lapack_complex_double* tau);

Include Files
• mkl.h

Description

The routine generates the whole or part of n-by-n unitary matrix Q of the LQ factorization formed by the
routines gelqf. Use this routine after a call to cgelqf/zgelqf.


Usually Q is determined from the LQ factorization of a p-by-n matrix A with n≥p. To compute the whole
matrix Q, use:

info = LAPACKE_?unglq(matrix_layout, n, n, p, a, lda, tau)

To compute the leading p rows of Q, which form an orthonormal basis in the space spanned by the rows of A,
use:

info = LAPACKE_?unglq(matrix_layout, p, n, p, a, lda, tau)

To compute the matrix Qk of the LQ factorization of the leading k rows of A, use:

info = LAPACKE_?unglq(matrix_layout, n, n, k, a, lda, tau)

To compute the leading k rows of Qk, which form an orthonormal basis in the space spanned by the leading k
rows of A, use:

info = LAPACKE_?unglq(matrix_layout, k, n, k, a, lda, tau)

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

m The number of rows of Q to be computed (0 ≤m≤n).

n The order of the unitary matrix Q (n≥m).

k The number of elementary reflectors whose product defines the matrix Q (0


≤k≤m).

a, tau Arrays: a (size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout) and tau are the arrays returned by cgelqf/zgelqf.

The dimension of tau must be at least max(1, k).

lda The leading dimension of a; at least max(1, m)for column major layout and
max(1, n) for row major layout.

Output Parameters

a Overwritten by m leading rows of the n-by-n unitary matrix Q.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed Q differs from an exactly unitary matrix by a matrix E such that ||E||₂ = O(ε)*||A||₂,
where ε is the machine precision.
The total number of floating-point operations is approximately 16*m*n*k - 8*(m + n)*k² + (16/3)*k³.

If m = k, the number is approximately (8/3)*m²*(3n - m).

The real counterpart of this routine is orglq.

?unmlq
Multiplies a complex matrix by the unitary matrix Q of
the LQ factorization formed by ?gelqf.

Syntax
lapack_int LAPACKE_cunmlq (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const lapack_complex_float* a, lapack_int lda, const
lapack_complex_float* tau, lapack_complex_float* c, lapack_int ldc);
lapack_int LAPACKE_zunmlq (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const lapack_complex_double* a, lapack_int lda, const
lapack_complex_double* tau, lapack_complex_double* c, lapack_int ldc);

Include Files
• mkl.h

Description

The routine multiplies a complex m-by-n matrix C by Q or QH, where Q is the unitary matrix Q of the LQ
factorization formed by the routine gelqf.
Depending on the parameters side and trans, the routine can form one of the matrix products Q*C, QH*C,
C*Q, or C*QH (overwriting the result on C).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

side Must be either 'L' or 'R'.

If side = 'L', Q or QH is applied to C from the left.

If side = 'R', Q or QH is applied to C from the right.

trans Must be either 'N' or 'C'.

If trans = 'N', the routine multiplies C by Q.

If trans = 'C', the routine multiplies C by QH.

m The number of rows in the matrix C (m≥ 0).

n The number of columns in C (n≥ 0).

k The number of elementary reflectors whose product defines the matrix Q.


Constraints:
0 ≤k≤m if side = 'L';
0 ≤k≤n if side = 'R'.

a, c, tau Arrays:
a and tau are arrays returned by ?gelqf.

The size of a must be:

For side = 'L' and column major layout, max(1, lda*m).

For side = 'R' and column major layout, max(1, lda*n).


For row major layout regardless of side, max(1, lda*k).

The size of tau must be at least max(1, k).


c (size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout) contains the m-by-n matrix C.

lda The leading dimension of a. For column major layout, lda≥ max(1, k). For
row major layout, if side = 'L', lda≥ max(1, m), or, if side = 'R', lda≥
max(1, n).

ldc The leading dimension of c; ldc≥ max(1, m) for column major layout and
max(1, n) for row major layout.

Output Parameters

c Overwritten by the product Q*C, QH*C, C*Q, or C*QH (as specified by side
and trans).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The real counterpart of this routine is ormlq.

?geqlf
Computes the QL factorization of a general m-by-n
matrix.

Syntax
lapack_int LAPACKE_sgeqlf (int matrix_layout, lapack_int m, lapack_int n, float* a,
lapack_int lda, float* tau);
lapack_int LAPACKE_dgeqlf (int matrix_layout, lapack_int m, lapack_int n, double* a,
lapack_int lda, double* tau);
lapack_int LAPACKE_cgeqlf (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* tau);
lapack_int LAPACKE_zgeqlf (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* tau);

Include Files
• mkl.h

Description

The routine forms the QL factorization of a general m-by-n matrix A (see Orthogonal Factorizations). No
pivoting is performed.
The routine does not form the matrix Q explicitly. Instead, Q is represented as a product of min(m, n)
elementary reflectors. Routines are provided to work with Q in this representation.

NOTE
This routine supports the Progress Routine feature. See Progress Function section for details.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

m The number of rows in the matrix A (m≥ 0).

n The number of columns in A (n≥ 0).

a Array a of size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout contains the matrix A.

lda The leading dimension of a; at least max(1, m)for column major layout and
max(1, n) for row major layout.

Output Parameters

a Overwritten on exit by the factorization data as follows:


if m≥n, the lower triangle of the subarray a(m-n+1:m, 1:n) contains the n-
by-n lower triangular matrix L; if m≤n, the elements on and below the (n-
m)-th superdiagonal contain the m-by-n lower trapezoidal matrix L; in both
cases, the remaining elements, with the array tau, represent the
orthogonal/unitary matrix Q as a product of elementary reflectors.

tau Array, size at least max(1, min(m, n)). Contains scalar factors of the
elementary reflectors for the matrix Q (see Orthogonal Factorizations).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
Related routines include the following (a brief usage sketch follows this list):

orgql to generate matrix Q (for real matrices);

ungql to generate matrix Q (for complex matrices);

ormql to apply matrix Q (for real matrices);

unmql to apply matrix Q (for complex matrices).
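A minimal C sketch of a typical sequence (double precision, column-major layout, hypothetical sizes and
data) that factorizes with LAPACKE_dgeqlf and then generates Q with LAPACKE_dorgql:

#include <mkl.h>

int main(void) {
    /* Hypothetical sizes: A is M-by-N with M >= N. */
    enum { M = 5, N = 3 };
    double a[M * N], tau[N];

    for (int j = 0; j < N; ++j)                 /* example data, column major */
        for (int i = 0; i < M; ++i)
            a[i + j * M] = 1.0 / (i + j + 1.0);

    /* QL factorization A = Q*L; L and the reflectors overwrite a. */
    lapack_int info = LAPACKE_dgeqlf(LAPACK_COL_MAJOR, M, N, a, M, tau);

    /* Generate the last N columns of Q explicitly (they overwrite a). */
    if (info == 0)
        info = LAPACKE_dorgql(LAPACK_COL_MAJOR, M, N, N, a, M, tau);

    return (int)info;
}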

See Also
mkl_progress
Matrix Storage Schemes for LAPACK Routines


?orgql
Generates the real matrix Q of the QL factorization
formed by ?geqlf.

Syntax
lapack_int LAPACKE_sorgql (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
float* a, lapack_int lda, const float* tau);
lapack_int LAPACKE_dorgql (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
double* a, lapack_int lda, const double* tau);

Include Files
• mkl.h

Description

The routine generates an m-by-n real matrix Q with orthonormal columns, which is defined as the last n
columns of a product of k elementary reflectors H(i) of order m: Q = H(k) *...* H(2)*H(1) as returned
by the routines geqlf. Use this routine after a call to sgeqlf/dgeqlf.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

m The number of rows of the matrix Q (m≥ 0).

n The number of columns of the matrix Q (m≥ n≥ 0).

k The number of elementary reflectors whose product defines the matrix Q


(n≥ k≥ 0).

a, tau Arrays: a (size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout), tau.
On entry, the (n - k + i)th column of a must contain the vector which
defines the elementary reflector H(i), for i = 1,2,...,k, as returned by
sgeqlf/dgeqlf in the last k columns of its array argument a; tau[i - 1]
must contain the scalar factor of the elementary reflector H(i), as returned
by sgeqlf/dgeqlf;

The size of tau must be at least max(1, k).

lda The leading dimension of a; at least max(1, m)for column major layout and
max(1, n) for row major layout.

Output Parameters

a Overwritten by the last n columns of the m-by-m orthogonal matrix Q.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The complex counterpart of this routine is ungql.

?ungql
Generates the complex matrix Q of the QL
factorization formed by ?geqlf.

Syntax
lapack_int LAPACKE_cungql (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
lapack_complex_float* a, lapack_int lda, const lapack_complex_float* tau);
lapack_int LAPACKE_zungql (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
lapack_complex_double* a, lapack_int lda, const lapack_complex_double* tau);

Include Files
• mkl.h

Description

The routine generates an m-by-n complex matrix Q with orthonormal columns, which is defined as the last n
columns of a product of k elementary reflectors H(i) of order m: Q = H(k)*...*H(2)*H(1), as returned
by the routine geqlf. Use this routine after a call to cgeqlf/zgeqlf.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

m The number of rows of the matrix Q (m≥0).

n The number of columns of the matrix Q (m≥n≥0).

k The number of elementary reflectors whose product defines the matrix Q


(n≥k≥0).

a, tau Arrays: a (size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout), tau.
On entry, the (n - k + i)th column of a must contain the vector which
defines the elementary reflector H(i), for i = 1,2,...,k, as returned by
cgeqlf/zgeqlf in the last k columns of its array argument a;
tau[i - 1] must contain the scalar factor of the elementary reflector H(i), as
returned by cgeqlf/zgeqlf;

The size of tau must be at least max(1, k).

lda The leading dimension of a; at least max(1, m)for column major layout and
max(1, n) for row major layout.

Output Parameters

a Overwritten by the last n columns of the m-by-m unitary matrix Q.

Return Values
This function returns a value info.


If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The real counterpart of this routine is orgql.

?ormql
Multiplies a real matrix by the orthogonal matrix Q of
the QL factorization formed by ?geqlf.

Syntax
lapack_int LAPACKE_sormql (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const float* a, lapack_int lda, const float* tau, float*
c, lapack_int ldc);
lapack_int LAPACKE_dormql (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const double* a, lapack_int lda, const double* tau,
double* c, lapack_int ldc);

Include Files
• mkl.h

Description

The routine multiplies a real m-by-n matrix C by Q or QT, where Q is the orthogonal matrix Q of the QL
factorization formed by the routine geqlf.
Depending on the parameters side and trans, the routine ormql can form one of the matrix products Q*C,
QT*C, C*Q, or C*QT (overwriting the result over C).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

side Must be either 'L' or 'R'.

If side = 'L', Q or QT is applied to C from the left.

If side = 'R', Q or QT is applied to C from the right.

trans Must be either 'N' or 'T'.

If trans = 'N', the routine multiplies C by Q.

If trans = 'T', the routine multiplies C by QT.

m The number of rows in the matrix C (m≥ 0).

n The number of columns in C (n≥ 0).

k The number of elementary reflectors whose product defines the matrix Q.


Constraints:
0 ≤k≤m if side = 'L';
0 ≤k≤n if side = 'R'.

a, tau, c Arrays: a, tau, c.
The size of a must be:

For column major layout regardless of side, max(1, lda*k).

For side = 'L' and row major layout, max(1, lda*m).

For side = 'R' and row major layout, max(1, lda*n).

On entry, the i-th column of a must contain the vector which defines the
elementary reflector H(i), for i = 1,2,...,k, as returned by sgeqlf/dgeqlf in
the last k columns of its array argument a.
tau[i - 1] must contain the scalar factor of the elementary reflector H(i), as
returned by sgeqlf/dgeqlf.

The size of tau must be at least max(1, k).


c(size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout) contains the m-by-n matrix C.

lda The leading dimension of a;


if side = 'L', lda≥ max(1, m)for column major layout and max(1, k) for
row major layout ;
if side = 'R', lda≥ max(1, n)for column major layout and max(1, k) for
row major layout.

ldc The leading dimension of c; ldc≥ max(1, m)for column major layout and
max(1, n) for row major layout.

Output Parameters

c Overwritten by the product Q*C, QT*C, C*Q, or C*QT (as specified by side
and trans).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The complex counterpart of this routine is unmql.

?unmql
Multiplies a complex matrix by the unitary matrix Q of
the QL factorization formed by ?geqlf.

Syntax
lapack_int LAPACKE_cunmql (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const lapack_complex_float* a, lapack_int lda, const
lapack_complex_float* tau, lapack_complex_float* c, lapack_int ldc);
lapack_int LAPACKE_zunmql (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const lapack_complex_double* a, lapack_int lda, const
lapack_complex_double* tau, lapack_complex_double* c, lapack_int ldc);


Include Files
• mkl.h

Description

The routine multiplies a complex m-by-n matrix C by Q or QH, where Q is the unitary matrix Q of the QL
factorization formed by the routine geqlf.
Depending on the parameters side and trans, the routine unmql can form one of the matrix products Q*C,
QH*C, C*Q, or C*QH (overwriting the result over C).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

side Must be either 'L' or 'R'.

If side = 'L', Q or QH is applied to C from the left.

If side = 'R', Q or QH is applied to C from the right.

trans Must be either 'N' or 'C'.

If trans = 'N', the routine multiplies C by Q.

If trans = 'C', the routine multiplies C by QH.

m The number of rows in the matrix C (m≥ 0).

n The number of columns in C (n≥ 0).

k The number of elementary reflectors whose product defines the matrix Q.


Constraints:
0 ≤k≤m if side = 'L';
0 ≤k≤n if side = 'R'.

a, tau, c Arrays: a, tau, c.


The size of a must be:

For column major layout regardless of side, max(1, lda*k).

For side = 'L' and row major layout, max(1, lda*m).

For side = 'R' and row major layout, max(1, lda*n).

On entry, the i-th column of a must contain the vector which defines the
elementary reflector H(i), for i = 1,2,...,k, as returned by cgeqlf/zgeqlf
in the last k columns of its array argument a.
tau[i - 1] must contain the scalar factor of the elementary reflector H(i),
as returned by cgeqlf/zgeqlf.

The size of tau must be at least max(1, k).


c (size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout) contains the m-by-n matrix C.

lda The leading dimension of a.

If side = 'L', lda≥ max(1, m)for column major layout and max(1, k) for
row major layout.
If side = 'R', lda≥ max(1, n)for column major layout and max(1, k) for
row major layout.

ldc The leading dimension of c; ldc≥ max(1, m)for column major layout and
max(1, n) for row major layout.

Output Parameters

c Overwritten by the product Q*C, QH*C, C*Q, or C*QH (as specified by side
and trans).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The real counterpart of this routine is ormql.

?gerqf
Computes the RQ factorization of a general m-by-n
matrix.

Syntax
lapack_int LAPACKE_sgerqf (int matrix_layout, lapack_int m, lapack_int n, float* a,
lapack_int lda, float* tau);
lapack_int LAPACKE_dgerqf (int matrix_layout, lapack_int m, lapack_int n, double* a,
lapack_int lda, double* tau);
lapack_int LAPACKE_cgerqf (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* tau);
lapack_int LAPACKE_zgerqf (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* tau);

Include Files
• mkl.h

Description

The routine forms the RQ factorization of a general m-by-n matrix A (see Orthogonal Factorizations). No
pivoting is performed.
The routine does not form the matrix Q explicitly. Instead, Q is represented as a product of min(m, n)
elementary reflectors. Routines are provided to work with Q in this representation.

NOTE
This routine supports the Progress Routine feature. See Progress Function section for details.


Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

m The number of rows in the matrix A (m≥ 0).

n The number of columns in A (n≥ 0).

a Array a of size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout contains the m-by-n matrix A.

lda The leading dimension of a; at least max(1, m)for column major layout and
max(1, n) for row major layout.

Output Parameters

a Overwritten on exit by the factorization data as follows:


if m≤n, the upper triangle of the subarray

a(1:m, n-m+1:n ) contains the m-by-m upper triangular matrix R;


if m≥n, the elements on and above the (m-n)th subdiagonal contain the m-
by-n upper trapezoidal matrix R;
in both cases, the remaining elements, with the array tau, represent the
orthogonal/unitary matrix Q as a product of min(m,n) elementary
reflectors.

tau Array, size at least max (1, min(m, n)). (See Orthogonal Factorizations.)
Contains scalar factors of the elementary reflectors for the matrix Q.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
Related routines include the following (a brief usage sketch follows this list):

orgrq to generate matrix Q (for real matrices);

ungrq to generate matrix Q (for complex matrices);

ormrq to apply matrix Q (for real matrices);

unmrq to apply matrix Q (for complex matrices).
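A minimal C sketch of a typical sequence (double precision, column-major layout, hypothetical sizes and
data) that factorizes with LAPACKE_dgerqf and then generates the last m rows of Q with LAPACKE_dorgrq:

#include <mkl.h>

int main(void) {
    /* Hypothetical sizes: A is M-by-N with M <= N. */
    enum { M = 3, N = 5 };
    double a[M * N], tau[M];

    for (int j = 0; j < N; ++j)                 /* example data, column major */
        for (int i = 0; i < M; ++i)
            a[i + j * M] = 1.0 / (i + j + 1.0);

    /* RQ factorization A = R*Q; R and the reflectors overwrite a. */
    lapack_int info = LAPACKE_dgerqf(LAPACK_COL_MAJOR, M, N, a, M, tau);

    /* Generate the last M rows of Q explicitly (they overwrite a). */
    if (info == 0)
        info = LAPACKE_dorgrq(LAPACK_COL_MAJOR, M, N, M, a, M, tau);

    return (int)info;
}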

See Also
mkl_progress
Matrix Storage Schemes for LAPACK Routines

?orgrq
Generates the real matrix Q of the RQ factorization
formed by ?gerqf.

Syntax
lapack_int LAPACKE_sorgrq (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
float* a, lapack_int lda, const float* tau);
lapack_int LAPACKE_dorgrq (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
double* a, lapack_int lda, const double* tau);

Include Files
• mkl.h

Description

The routine generates an m-by-n real matrix with orthonormal rows, which is defined as the last m rows of a
product of k elementary reflectors H(i) of order n: Q = H(1)*H(2)*...*H(k), as returned by the routine
gerqf. Use this routine after a call to sgerqf/dgerqf.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

m The number of rows of the matrix Q (m≥ 0).

n The number of columns of the matrix Q (n≥ m).

k The number of elementary reflectors whose product defines the matrix Q


(m≥ k≥ 0).

a, tau Arrays: a(size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout), tau.
On entry, the (m - k + i)-th row of a must contain the vector which defines
the elementary reflector H(i), for i = 1,2,...,k, as returned by sgerqf/
dgerqf in the last k rows of its array argument a;
tau[i - 1] must contain the scalar factor of the elementary reflector H(i),
as returned by sgerqf/dgerqf;

The size of tau must be at least max(1, k).

lda The leading dimension of a; at least max(1, m)for column major layout and
max(1, n) for row major layout.

Output Parameters

a Overwritten by the last m rows of the n-by-n orthogonal matrix Q.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.


Application Notes
The complex counterpart of this routine is ungrq.

?ungrq
Generates the complex matrix Q of the RQ
factorization formed by ?gerqf.

Syntax
lapack_int LAPACKE_cungrq (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
lapack_complex_float* a, lapack_int lda, const lapack_complex_float* tau);
lapack_int LAPACKE_zungrq (int matrix_layout, lapack_int m, lapack_int n, lapack_int k,
lapack_complex_double* a, lapack_int lda, const lapack_complex_double* tau);

Include Files
• mkl.h

Description

The routine generates an m-by-n complex matrix with orthonormal rows, which is defined as the last m rows
of a product of k elementary reflectors H(i) of order n: Q = H(1)ᴴ*H(2)ᴴ*...*H(k)ᴴ, as returned by the
routine gerqf. Use this routine after a call to cgerqf/zgerqf.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

m The number of rows of the matrix Q (m≥0).

n The number of columns of the matrix Q (n≥m ).

k The number of elementary reflectors whose product defines the matrix Q


(m≥k≥0).

a, tau Arrays: a(size max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout), tau.
On entry, the (m - k + i)th row of a must contain the vector which defines
the elementary reflector H(i), for i = 1,2,...,k, as returned by cgerqf/
zgerqf in the last k rows of its array argument a;
tau[i - 1] must contain the scalar factor of the elementary reflector H(i), as
returned by cgerqf/zgerqf;

The size of tau must be at least max(1, k).

lda The leading dimension of a; at least max(1, m)for column major layout and
max(1, n) for row major layout.

Output Parameters

a Overwritten by the m last rows of the n-by-n unitary matrix Q.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The real counterpart of this routine is orgrq.

?ormrq
Multiplies a real matrix by the orthogonal matrix Q of
the RQ factorization formed by ?gerqf.

Syntax
lapack_int LAPACKE_sormrq (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const float* a, lapack_int lda, const float* tau, float*
c, lapack_int ldc);
lapack_int LAPACKE_dormrq (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const double* a, lapack_int lda, const double* tau,
double* c, lapack_int ldc);

Include Files
• mkl.h

Description

The routine multiplies a real m-by-n matrix C by Q or QT, where Q is the real orthogonal matrix defined as a
product of k elementary reflectors H(i): Q = H(1)*H(2)*...*H(k), as returned by the RQ factorization routine gerqf.

Depending on the parameters side and trans, the routine can form one of the matrix products Q*C, QT*C,
C*Q, or C*QT (overwriting the result over C).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

side Must be either 'L' or 'R'.

If side = 'L', Q or QT is applied to C from the left.

If side = 'R', Q or QT is applied to C from the right.

trans Must be either 'N' or 'T'.

If trans = 'N', the routine multiplies C by Q.

If trans = 'T', the routine multiplies C by QT.

m The number of rows in the matrix C (m≥ 0).

n The number of columns in C (n≥ 0).

k The number of elementary reflectors whose product defines the matrix Q.


Constraints:
0 ≤k≤m, if side = 'L';
0 ≤k≤n, if side = 'R'.


a, tau, c Arrays: a(size for side = 'L': max(1, lda*m) for column major layout and
max(1, lda*k) for row major layout; for side = 'R': max(1, lda*n) for
column major layout and max(1, lda*k) for row major layout), tau, c (size
max(1, ldc*n) for column major layout and max(1, ldc*m) for row major
layout).
On entry, the i-th row of a must contain the vector which defines the
elementary reflector H(i), for i = 1,2,...,k, as returned by sgerqf/dgerqf in
the last k rows of its array argument a.
tau[i - 1] must contain the scalar factor of the elementary reflector H(i), as
returned by sgerqf/dgerqf.

The size of tau must be at least max(1, k).


c contains the m-by-n matrix C.

lda The leading dimension of a; lda≥ max(1, k)for column major layout. For
row major layout, lda≥ max(1, m) if side = 'L', and lda≥ max(1, n) if
side = 'R'.

ldc The leading dimension of c; ldc≥ max(1, m)for column major layout and
max(1, n) for row major layout.

Output Parameters

c Overwritten by the product Q*C, QT*C, C*Q, or C*QT (as specified by side
and trans).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The complex counterpart of this routine is unmrq.
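
For illustration only, the following minimal sketch (the matrix dimensions and values are arbitrary assumptions, and row-major storage is used) computes an RQ factorization with LAPACKE_dgerqf and then applies QT to a second matrix with LAPACKE_dormrq:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Arbitrary 3-by-5 matrix A (row-major) and 5-by-2 matrix C. */
    double a[3*5] = { 1, 2, 3, 4, 5,
                      6, 7, 8, 9, 1,
                      2, 3, 4, 5, 6 };
    double tau[3];
    double c[5*2] = { 1, 0,
                      0, 1,
                      1, 1,
                      2, 0,
                      0, 2 };

    /* RQ factorization: reflectors are stored in the rows of a, scalar factors in tau. */
    lapack_int info = LAPACKE_dgerqf(LAPACK_ROW_MAJOR, 3, 5, a, 5, tau);
    if (info != 0) return (int)info;

    /* Form Q**T*C, where Q is the 5-by-5 orthogonal factor (k = 3 reflectors). */
    info = LAPACKE_dormrq(LAPACK_ROW_MAJOR, 'L', 'T', 5, 2, 3, a, 5, tau, c, 2);
    if (info != 0) return (int)info;

    printf("First row of Q**T*C: %f %f\n", c[0], c[1]);
    return 0;
}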

?unmrq
Multiplies a complex matrix by the unitary matrix Q of
the RQ factorization formed by ?gerqf.

Syntax
lapack_int LAPACKE_cunmrq (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const lapack_complex_float* a, lapack_int lda, const
lapack_complex_float* tau, lapack_complex_float* c, lapack_int ldc);
lapack_int LAPACKE_zunmrq (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, const lapack_complex_double* a, lapack_int lda, const
lapack_complex_double* tau, lapack_complex_double* c, lapack_int ldc);

Include Files
• mkl.h

Description

The routine multiplies a complex m-by-n matrix C by Q or QH, where Q is the complex unitary matrix defined
as a product of k elementary reflectors H(i) of order n: Q = H(1)^H*H(2)^H*...*H(k)^H as returned by the
RQ factorization routine gerqf.
Depending on the parameters side and trans, the routine can form one of the matrix products Q*C, QH*C,
C*Q, or C*QH (overwriting the result over C).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

side Must be either 'L' or 'R'.

If side = 'L', Q or QH is applied to C from the left.

If side = 'R', Q or QH is applied to C from the right.

trans Must be either 'N' or 'C'.

If trans = 'N', the routine multiplies C by Q.

If trans = 'C', the routine multiplies C by QH.

m The number of rows in the matrix C (m≥ 0).

n The number of columns in C (n≥ 0).

k The number of elementary reflectors whose product defines the matrix Q.


Constraints:
0 ≤k≤m, if side = 'L';
0 ≤k≤n, if side = 'R'.

a, tau, c Arrays: a(size for side = 'L': max(1, lda*m) for column major layout and
max(1, lda*k) for row major layout; for side = 'R': max(1, lda*n) for
column major layout and max(1, lda*k) for row major layout), tau, c (size
max(1, ldc*n) for column major layout and max(1, ldc*m) for row major
layout).
On entry, the ith row of a must contain the vector which defines the
elementary reflector H(i), for i = 1,2,...,k, as returned by cgerqf/zgerqf in
the last k rows of its array argument a.
tau[i - 1] must contain the scalar factor of the elementary reflector H(i), as
returned by cgerqf/zgerqf.

The size of tau must be at least max(1, k).


c(size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout) contains the m-by-n matrix C.

lda The leading dimension of a; lda≥ max(1, k)for column major layout. For row
major layout, lda≥ max(1, m) if side = 'L', and lda≥ max(1, n) if side
= 'R' .

ldc The leading dimension of c; ldc≥ max(1, m)for column major layout and
max(1, n) for row major layout.


Output Parameters

c Overwritten by the product Q*C, QH*C, C*Q, or C*QH (as specified by side
and trans).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The real counterpart of this routine is ormrq.

?tzrzf
Reduces the upper trapezoidal matrix A to upper
triangular form.

Syntax
lapack_int LAPACKE_stzrzf (int matrix_layout, lapack_int m, lapack_int n, float* a,
lapack_int lda, float* tau);
lapack_int LAPACKE_dtzrzf (int matrix_layout, lapack_int m, lapack_int n, double* a,
lapack_int lda, double* tau);
lapack_int LAPACKE_ctzrzf (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* tau);
lapack_int LAPACKE_ztzrzf (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* tau);

Include Files
• mkl.h

Description

The routine reduces the m-by-n (m ≤ n) real/complex upper trapezoidal matrix A to upper triangular form by
means of orthogonal/unitary transformations. The upper trapezoidal matrix A = [A1 A2] = [A(1:m, 1:m) A(1:m, m+1:n)] is factored as

A = [R 0]*Z,

where Z is an n-by-n orthogonal/unitary matrix, R is an m-by-m upper triangular matrix, and 0 is the m-by-(n-m) zero matrix.
The ?tzrzf routine replaces the deprecated ?tzrqf routine.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

m The number of rows in the matrix A (m≥ 0).

n The number of columns in A (n≥m).

a Array a is of size max(1, lda*n) for column major layout and max(1,
lda*m) for row major layout.
The leading m-by-n upper trapezoidal part of the array a contains the
matrix A to be factorized.

lda The leading dimension of a; at least max(1, m)for column major layout and
max(1, n) for row major layout.

Output Parameters

a Overwritten on exit by the factorization data as follows:


the leading m-by-m upper triangular part of a contains the upper triangular
matrix R, and elements m +1 to n of the first m rows of a, with the array
tau, represent the orthogonal matrix Z as a product of m elementary
reflectors.

tau Array, size at least max (1, m). Contains scalar factors of the elementary
reflectors for the matrix Z.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The factorization is obtained by Householder's method. The k-th transformation matrix, Z(k), which is used
to introduce zeros into the (m - k + 1)-th row of A, is given in the form

Z(k) = [ I 0 ; 0 T(k) ] (a 2-by-2 block matrix with T(k) in the lower right corner),

where for real flavors

T(k) = I - tau*u(k)*u(k)^T, with u(k) = [ 1 ; 0 ; z(k) ],

and for complex flavors

T(k) = I - tau*u(k)*u(k)^H, with u(k) = [ 1 ; 0 ; z(k) ].

tau is a scalar and z(k) is an (n - m)-element vector. tau and z(k) are chosen to annihilate the elements of the k-th row of A2.
The scalar tau is returned in the k-th element of tau and the vector u(k) in the k-th row of A, such that the elements of z(k) are stored in the last n - m elements of the k-th row of array a.

The elements of R are returned in the upper triangular part of A.


The matrix Z is given by
Z = Z(1)*Z(2)*...*Z(m).
Related routines include:

ormrz to apply matrix Q (for real matrices)

unmrz to apply matrix Q (for complex matrices).
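
A minimal usage sketch follows (the 3-by-5 upper trapezoidal matrix is an arbitrary assumption; row-major storage is used):

#include <mkl.h>

int main(void) {
    /* Arbitrary 3-by-5 upper trapezoidal matrix (row-major, lda = 5). */
    double a[3*5] = { 2, 1, 1, 3, 1,
                      0, 3, 1, 2, 2,
                      0, 0, 4, 1, 5 };
    double tau[3];   /* scalar factors of the elementary reflectors for Z */

    /* On exit, the leading 3-by-3 part of a holds R; the rest, with tau, encodes Z. */
    lapack_int info = LAPACKE_dtzrzf(LAPACK_ROW_MAJOR, 3, 5, a, 5, tau);
    return (int)info;
}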

?ormrz
Multiplies a real matrix by the orthogonal matrix
defined from the factorization formed by ?tzrzf.

Syntax
lapack_int LAPACKE_sormrz (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, lapack_int l, const float* a, lapack_int lda, const float*
tau, float* c, lapack_int ldc);
lapack_int LAPACKE_dormrz (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, lapack_int l, const double* a, lapack_int lda, const
double* tau, double* c, lapack_int ldc);

Include Files
• mkl.h

Description

The ?ormrz routine multiplies a real m-by-n matrix C by Q or QT, where Q is the real orthogonal matrix
defined as a product of k elementary reflectors H(i) of order n: Q = H(1)* H(2)*...*H(k) as returned by
the factorization routine tzrzf .
Depending on the parameters side and trans, the routine can form one of the matrix products Q*C, QT*C,
C*Q, or C*QT (overwriting the result over C).
The matrix Q is of order m if side = 'L' and of order n if side = 'R'.

The ?ormrz routine replaces the deprecated ?latzm routine.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

side Must be either 'L' or 'R'.

If side = 'L', Q or QT is applied to C from the left.

If side = 'R', Q or QT is applied to C from the right.

trans Must be either 'N' or 'T'.

If trans = 'N', the routine multiplies C by Q.

If trans = 'T', the routine multiplies C by QT.

m The number of rows in the matrix C (m≥ 0).

n The number of columns in C (n≥ 0).

k The number of elementary reflectors whose product defines the matrix Q.


Constraints:
0 ≤k≤m, if side = 'L';
0 ≤k≤n, if side = 'R'.

l The number of columns of the matrix A containing the meaningful part of


the Householder reflectors. Constraints:
0 ≤l≤m, if side = 'L';
0 ≤l≤n, if side = 'R'.

a, tau, c Arrays: a(size for side = 'L': max(1, lda*m) for column major layout and
max(1, lda*k) for row major layout; for side = 'R': max(1, lda*n) for
column major layout and max(1, lda*k) for row major layout), tau, c (size
max(1, ldc*n) for column major layout and max(1, ldc*m) for row major
layout).
On entry, the ith row of a must contain the vector which defines the
elementary reflector H(i), for i = 1,2,...,k, as returned by stzrzf/dtzrzf
in the last k rows of its array argument a.
tau[i - 1] must contain the scalar factor of the elementary reflector H(i),
as returned by stzrzf/dtzrzf.

The size of tau must be at least max(1, k).


c contains the m-by-n matrix C.

lda The leading dimension of a; lda≥ max(1, k)for column major layout. For row
major layout, lda≥ max(1, m) if side = 'L', and lda≥ max(1, n) if side
= 'R' .

ldc The leading dimension of c; ldc≥ max(1, m)for column major layout and
max(1, n) for row major layout.

Output Parameters

c Overwritten by the product Q*C, QT*C, C*Q, or C*QT (as specified by side
and trans).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.


Application Notes
The complex counterpart of this routine is unmrz.
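
For illustration, a minimal sketch (arbitrary data, row-major storage; it assumes l = n - m meaningful columns for reflectors produced by ?tzrzf on an m-by-n matrix):

#include <mkl.h>

int main(void) {
    /* Arbitrary 3-by-5 upper trapezoidal matrix A and 5-by-2 matrix C (row-major). */
    double a[3*5] = { 2, 1, 1, 3, 1,
                      0, 3, 1, 2, 2,
                      0, 0, 4, 1, 5 };
    double tau[3];
    double c[5*2] = { 1, 0,
                      0, 1,
                      1, 1,
                      0, 2,
                      2, 0 };

    lapack_int info = LAPACKE_dtzrzf(LAPACK_ROW_MAJOR, 3, 5, a, 5, tau);
    if (info != 0) return (int)info;

    /* Apply Z**T from the left: k = 3 reflectors, l = n - m = 2 meaningful columns. */
    info = LAPACKE_dormrz(LAPACK_ROW_MAJOR, 'L', 'T', 5, 2, 3, 2, a, 5, tau, c, 2);
    return (int)info;
}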

?unmrz
Multiplies a complex matrix by the unitary matrix
defined from the factorization formed by ?tzrzf.

Syntax
lapack_int LAPACKE_cunmrz (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, lapack_int l, const lapack_complex_float* a, lapack_int
lda, const lapack_complex_float* tau, lapack_complex_float* c, lapack_int ldc);
lapack_int LAPACKE_zunmrz (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, lapack_int l, const lapack_complex_double* a, lapack_int
lda, const lapack_complex_double* tau, lapack_complex_double* c, lapack_int ldc);

Include Files
• mkl.h

Description

The routine multiplies a complex m-by-n matrix C by Q or QH, where Q is the unitary matrix defined as a
product of k elementary reflectors H(i):

Q = H(1)^H*H(2)^H*...*H(k)^H, as returned by the factorization routine tzrzf.


Depending on the parameters side and trans, the routine can form one of the matrix products Q*C, QH*C,
C*Q, or C*QH (overwriting the result over C).
The matrix Q is of order m if side = 'L' and of order n if side = 'R'.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

side Must be either 'L' or 'R'.

If side = 'L', Q or QH is applied to C from the left.

If side = 'R', Q or QH is applied to C from the right.

trans Must be either 'N' or 'C'.

If trans = 'N', the routine multiplies C by Q.

If trans = 'C', the routine multiplies C by QH.

m The number of rows in the matrix C (m≥ 0).

n The number of columns in C (n≥ 0).

k The number of elementary reflectors whose product defines the matrix Q.


Constraints:
0 ≤k≤m, if side = 'L';
0 ≤k≤n, if side = 'R'.

l The number of columns of the matrix A containing the meaningful part of
the Householder reflectors. Constraints:
0 ≤l≤m, if side = 'L';
0 ≤l≤n, if side = 'R'.

a, tau, c Arrays: a(size for side = 'L': max(1, lda*m) for column major layout and
max(1, lda*k) for row major layout; for side = 'R': max(1, lda*n) for
column major layout and max(1, lda*k) for row major layout), tau, c (size
max(1, ldc*n) for column major layout and max(1, ldc*m) for row major
layout).
On entry, the ith row of a must contain the vector which defines the
elementary reflector H(i), for i = 1,2,...,k, as returned by ctzrzf/ztzrzf
in the last k rows of its array argument a.
tau[i - 1] must contain the scalar factor of the elementary reflector H(i),
as returned by ctzrzf/ztzrzf.

The size of tau must be at least max(1, k).


c contains the m-by-n matrix C.

lda The leading dimension of a; lda≥ max(1, k)for column major layout. For
row major layout, lda≥ max(1, m) if side = 'L', and lda≥ max(1, n) if
side = 'R'.

ldc The leading dimension of c; ldc≥ max(1, m)for column major layout and
max(1, n) for row major layout.

Output Parameters

c Overwritten by the product Q*C, QH*C, C*Q, or C*QH (as specified by side
and trans).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The real counterpart of this routine is ormrz.

?ggqrf
Computes the generalized QR factorization of two
matrices.

Syntax
lapack_int LAPACKE_sggqrf (int matrix_layout, lapack_int n, lapack_int m, lapack_int p,
float* a, lapack_int lda, float* taua, float* b, lapack_int ldb, float* taub);
lapack_int LAPACKE_dggqrf (int matrix_layout, lapack_int n, lapack_int m, lapack_int p,
double* a, lapack_int lda, double* taua, double* b, lapack_int ldb, double* taub);


lapack_int LAPACKE_cggqrf (int matrix_layout, lapack_int n, lapack_int m, lapack_int p,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* taua,
lapack_complex_float* b, lapack_int ldb, lapack_complex_float* taub);
lapack_int LAPACKE_zggqrf (int matrix_layout, lapack_int n, lapack_int m, lapack_int p,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* taua,
lapack_complex_double* b, lapack_int ldb, lapack_complex_double* taub);

Include Files
• mkl.h

Description

The routine forms the generalized QR factorization of an n-by-m matrix A and an n-by-p matrix B as A =
Q*R, B = Q*T*Z, where Q is an n-by-n orthogonal/unitary matrix, Z is a p-by-p orthogonal/unitary matrix,
and R and T assume one of the forms:

if n ≥ m, R consists of an m-by-m upper triangular matrix R11 on top of an (n - m)-by-m zero block; if n < m, R = [ R11 R12 ], where R11 is an n-by-n upper triangular matrix;

if n ≤ p, T = [ 0 T12 ], where T12 is an n-by-n upper triangular matrix; if n > p, T consists of an (n - p)-by-p block T11 on top of a p-by-p upper triangular matrix T21.

In particular, if B is square and nonsingular, the GQR factorization of A and B implicitly gives the QR
factorization of B^-1*A as:
B^-1*A = Z^T*(T^-1*R) (for real flavors) or B^-1*A = Z^H*(T^-1*R) (for complex flavors).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

n The number of rows of the matrices A and B (n≥ 0).

m The number of columns in A (m≥ 0).

p The number of columns in B (p≥ 0).

a, b Array a of size max(1, lda*m) for column major layout and max(1, lda*n)
for row major layout contains the matrix A.
Array b of size max(1, ldb*p) for column major layout and max(1, ldb*n)
for row major layout contains the matrix B.

lda The leading dimension of a; at least max(1, n) for column major layout and
at least max(1, m) for row major layout.

ldb The leading dimension of b; at least max(1, n) for column major layout and
at least max(1, p) for row major layout.

Output Parameters

a, b Overwritten by the factorization data as follows:


on exit, the elements on and above the diagonal of the array a contain the
min(n,m)-by-m upper trapezoidal matrix R (R is upper triangular if n≥m);the
elements below the diagonal, with the array taua, represent the orthogonal/
unitary matrix Q as a product of min(n,m) elementary reflectors ;
if n≤p, the upper triangle of the subarray b(1:n, p-n+1:p ) contains the n-
by-n upper triangular matrix T;
if n > p, the elements on and above the (n-p)th subdiagonal contain the n-
by-p upper trapezoidal matrix T; the remaining elements, with the array
taub, represent the orthogonal/unitary matrix Z as a product of elementary
reflectors.

taua, taub Arrays, size at least max (1, min(n, m)) for taua and at least max (1,
min(n, p)) for taub. The array taua contains the scalar factors of the
elementary reflectors which represent the orthogonal/unitary matrix Q.
The array taub contains the scalar factors of the elementary reflectors
which represent the orthogonal/unitary matrix Z.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(1)H(2)...H(k), where k = min(n,m).


Each H(i) has the form


H(i) = I - τa*v*v^T for real flavors, or
H(i) = I - τa*v*v^H for complex flavors,
where τa is a real/complex scalar, and v is a real/complex vector with v(j) = 0 for 1 ≤ j ≤ i - 1 and v(i) = 1.

On exit, for i + 1 ≤ j ≤ n, v(j) is stored in a[(j - 1) + (i - 1)*lda] for column major layout and in
a[(j - 1)*lda + (i - 1)] for row major layout, and τa is stored in taua[i - 1].
The matrix Z is represented as a product of elementary reflectors
Z = H(1)H(2)...H(k), where k = min(n,p).
Each H(i) has the form
H(i) = I - τb*v*v^T for real flavors, or
H(i) = I - τb*v*v^H for complex flavors,
where τb is a real/complex scalar, and v is a real/complex vector with v(p - k + i) = 1 and v(j) = 0 for p - k + i + 1 ≤ j ≤ p.
On exit, for 1 ≤ j ≤ p - k + i - 1, v(j) is stored in b[(n - k + i - 1) + (j - 1)*ldb] for column major layout
and in b[(n - k + i - 1)*ldb + (j - 1)] for row major layout, and τb is stored in taub[i - 1].
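
A minimal usage sketch (the sizes n = 4, m = 3, p = 2 and the data are arbitrary assumptions; row-major storage):

#include <mkl.h>

int main(void) {
    /* Generalized QR of an n-by-m matrix A and an n-by-p matrix B,
       with n = 4, m = 3, p = 2 (row-major; values are arbitrary). */
    double a[4*3] = { 1, 2, 0,
                      0, 1, 1,
                      2, 0, 1,
                      1, 1, 1 };
    double b[4*2] = { 1, 0,
                      0, 1,
                      1, 1,
                      0, 2 };
    double taua[3], taub[2];

    lapack_int info = LAPACKE_dggqrf(LAPACK_ROW_MAJOR, 4, 3, 2, a, 3, taua, b, 2, taub);
    /* a now holds R and the reflectors of Q; b holds T and the reflectors of Z. */
    return (int)info;
}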

?ggrqf
Computes the generalized RQ factorization of two
matrices.

Syntax
lapack_int LAPACKE_sggrqf (int matrix_layout, lapack_int m, lapack_int p, lapack_int n,
float* a, lapack_int lda, float* taua, float* b, lapack_int ldb, float* taub);
lapack_int LAPACKE_dggrqf (int matrix_layout, lapack_int m, lapack_int p, lapack_int n,
double* a, lapack_int lda, double* taua, double* b, lapack_int ldb, double* taub);
lapack_int LAPACKE_cggrqf (int matrix_layout, lapack_int m, lapack_int p, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* taua,
lapack_complex_float* b, lapack_int ldb, lapack_complex_float* taub);
lapack_int LAPACKE_zggrqf (int matrix_layout, lapack_int m, lapack_int p, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* taua,
lapack_complex_double* b, lapack_int ldb, lapack_complex_double* taub);

Include Files
• mkl.h

Description

The routine forms the generalized RQ factorization of an m-by-n matrix A and a p-by-n matrix B as A =
R*Q, B = Z*T*Q, where Q is an n-by-n orthogonal/unitary matrix, Z is a p-by-p orthogonal/unitary matrix,
and R and T assume one of the forms:

if m ≤ n, R = [ 0 R12 ], where R12 is an m-by-m upper triangular matrix; if m > n, R consists of an (m - n)-by-n block R11 on top of an n-by-n upper triangular matrix R21;

if p ≥ n, T consists of an n-by-n upper triangular matrix T11 on top of a (p - n)-by-n zero block; if p < n, T = [ T11 T12 ], where T11 is a p-by-p upper triangular matrix.


In particular, if B is square and nonsingular, the GRQ factorization of A and B implicitly gives the RQ
factorization of A*B^-1 as:
A*B^-1 = (R*T^-1)*Z^T (for real flavors) or A*B^-1 = (R*T^-1)*Z^H (for complex flavors).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

m The number of rows of the matrix A (m≥ 0).

p The number of rows in B (p≥ 0).

n The number of columns of the matrices A and B (n≥ 0).

a, b Arrays:
a(size max(1, lda*n) for column major layout and max(1, lda*m) for row
major layout) contains the m-by-n matrix A.
b(size max(1, ldb*n) for column major layout and max(1, ldb*p) for row
major layout) contains the p-by-n matrix B.

lda The leading dimension of a; at least max(1, m)for column major layout and
max(1, n) for row major layout.

ldb The leading dimension of b; at least max(1, p)for column major layout and
max(1, n) for row major layout.


Output Parameters

a, b Overwritten by the factorization data as follows:


on exit, if m ≤ n, element R(i, j) (1 ≤ i ≤ j ≤ m) of the upper triangular matrix R is
stored in a[(i - 1) + (n - m + j - 1)*lda] for column major layout
and in a[(i - 1)*lda + (n - m + j - 1)] for row major layout.

if m > n, the elements on and above the (m-n)th subdiagonal contain the
m-by-n upper trapezoidal matrix R;
the remaining elements, with the array taua, represent the orthogonal/
unitary matrix Q as a product of elementary reflectors.
The elements on and above the diagonal of the array b contain the
min(p,n)-by-n upper trapezoidal matrix T (T is upper triangular if p≥n); the
elements below the diagonal, with the array taub, represent the orthogonal/
unitary matrix Z as a product of elementary reflectors.

taua, taub Arrays, size at least max (1, min(m, n)) for taua and at least max (1,
min(p, n)) for taub.
The array taua contains the scalar factors of the elementary reflectors
which represent the orthogonal/unitary matrix Q.
The array taub contains the scalar factors of the elementary reflectors
which represent the orthogonal/unitary matrix Z.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(1)H(2)...H(k), where k = min(m,n).
Each H(i) has the form
H(i) = I - taua*v*v^T for real flavors, or
H(i) = I - taua*v*v^H for complex flavors,
where taua is a real/complex scalar, and v is a real/complex vector with v(n - k + i) = 1 and v(n - k + i + 1 : n) = 0.
On exit, v(1 : n - k + i - 1) is stored in a(m - k + i, 1 : n - k + i - 1) and taua is stored in taua[i - 1].

The matrix Z is represented as a product of elementary reflectors


Z = H(1)H(2)...H(k), where k = min(p,n).
Each H(i) has the form
H(i) = I - taub*v*v^T for real flavors, or
H(i) = I - taub*v*v^H for complex flavors,
where taub is a real/complex scalar, and v is a real/complex vector with v(1 : i - 1) = 0 and v(i) = 1.
On exit, v(i + 1 : p) is stored in b(i+1:p, i) and taub is stored in taub[i - 1].
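
A minimal usage sketch (the sizes m = 2, p = 3, n = 4 and the data are arbitrary assumptions; row-major storage):

#include <mkl.h>

int main(void) {
    /* Generalized RQ of an m-by-n matrix A and a p-by-n matrix B,
       with m = 2, p = 3, n = 4 (row-major; values are arbitrary). */
    double a[2*4] = { 1, 2, 0, 1,
                      0, 1, 1, 2 };
    double b[3*4] = { 1, 0, 1, 0,
                      0, 1, 0, 1,
                      1, 1, 2, 0 };
    double taua[2], taub[3];

    lapack_int info = LAPACKE_dggrqf(LAPACK_ROW_MAJOR, 2, 3, 4, a, 4, taua, b, 4, taub);
    /* a now holds R and the reflectors of Q; b holds T and the reflectors of Z. */
    return (int)info;
}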

?tpqrt
Computes a blocked QR factorization of a real or
complex "triangular-pentagonal" matrix, which is
composed of a triangular block and a pentagonal
block, using the compact WY representation for Q.

Syntax
lapack_int LAPACKE_stpqrt (int matrix_layout, lapack_int m, lapack_int n, lapack_int l,
lapack_int nb, float* a, lapack_int lda, float* b, lapack_int ldb, float* t,
lapack_int ldt);
lapack_int LAPACKE_dtpqrt (int matrix_layout, lapack_int m, lapack_int n, lapack_int l,
lapack_int nb, double* a, lapack_int lda, double* b, lapack_int ldb, double* t,
lapack_int ldt);
lapack_int LAPACKE_ctpqrt (int matrix_layout, lapack_int m, lapack_int n, lapack_int l,
lapack_int nb, lapack_complex_float* a, lapack_int lda, lapack_complex_float* b,
lapack_int ldb, lapack_complex_float* t, lapack_int ldt);
lapack_int LAPACKE_ztpqrt (int matrix_layout, lapack_int m, lapack_int n, lapack_int l,
lapack_int nb, lapack_complex_double* a, lapack_int lda, lapack_complex_double* b,
lapack_int ldb, lapack_complex_double* t, lapack_int ldt);

Include Files
• mkl.h
Description

The input matrix C is the (n+m)-by-n stacked matrix

C = [ A ]
    [ B ],

where A is an n-by-n upper triangular matrix, and B is an m-by-n pentagonal matrix consisting of an (m-l)-by-n rectangular matrix B1 on top of an l-by-n upper trapezoidal matrix B2:

B = [ B1 ]
    [ B2 ].

The upper trapezoidal matrix B2 consists of the first l rows of an n-by-n upper triangular matrix, where 0 ≤ l ≤ min(m,n). If l = 0, B is an m-by-n rectangular matrix. If m = l = n, B is upper triangular. The elementary
reflectors H(i) are stored in the i-th column below the diagonal in the (n+m)-by-n input matrix C. The vectors defining the elementary reflectors form a matrix V with the same pentagonal structure as B; the elements of the unit matrix I are not stored. Thus, V contains all of the necessary information and is returned in array b.


NOTE
V has the same form as B; the columns of V represent the vectors which define the reflectors H(i).


The number of blocks is k = ceiling(n/nb), where each block is of order nb except for the last block, which is
of order ib = n - (k-1)*nb. For each of the k blocks, an upper triangular block reflector factor is computed:
T1, T2, ..., Tk. The nb-by-nb (ib-by-ib for the last block) blocks Ti are stored in the nb-by-n array t as

t = [ T1 T2 ... Tk ].

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

m The total number of rows in the matrix B (m≥ 0).

n The number of columns in B and the order of the triangular matrix A (n≥ 0).

l The number of rows of the upper trapezoidal part of B (min(m, n) ≥l≥ 0).

nb The block size to use in the blocked QR factorization (n≥nb≥ 1).

a, b Arrays: a (size lda*n) contains the n-by-n upper triangular matrix A.

b (size max(1, ldb*n) for column major layout and max(1, ldb*m) for row
major layout) contains the pentagonal m-by-n matrix B. The first (m-l) rows contain
the rectangular B1 matrix, and the next l rows contain the upper
trapezoidal B2 matrix.

lda The leading dimension of a; at least max(1, n).

ldb The leading dimension of b; at least max(1, m) for column major layout and
at least max(1, n) for row major layout.

ldt The leading dimension of t; at least nb for column major layout and at least
max(1, n) for row major layout.

Output Parameters

a The elements on and above the diagonal of the array contain the upper
triangular matrix R.

b The pentagonal matrix V.

t Array, size ldt*n for column major layout and ldt*nb for row major
layout.

The upper triangular block reflectors stored in compact form as a sequence
of upper triangular blocks.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.
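
A minimal usage sketch (the sizes and data are arbitrary assumptions; row-major storage):

#include <mkl.h>

int main(void) {
    /* Blocked QR of the stacked matrix [A; B] with n = 3, m = 4, l = 2, nb = 3
       (row-major; values are arbitrary). */
    double a[3*3] = { 2, 1, 1,     /* n-by-n upper triangular block A */
                      0, 3, 1,
                      0, 0, 4 };
    double b[4*3] = { 1, 2, 3,     /* (m-l) = 2 rectangular rows of B1 */
                      4, 5, 6,
                      7, 8, 9,     /* l = 2 upper trapezoidal rows of B2 */
                      0, 1, 2 };
    double t[3*3];                 /* block reflector factors T1, ..., Tk */

    lapack_int info = LAPACKE_dtpqrt(LAPACK_ROW_MAJOR, 4, 3, 2, 3, a, 3, b, 3, t, 3);
    /* On exit a holds R, b holds the pentagonal matrix V, and t holds the T factors. */
    return (int)info;
}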

?tpmqrt
Applies a real orthogonal or complex unitary matrix obtained
from a "triangular-pentagonal" block reflector
to a general real or complex matrix, which consists of
two blocks.

Syntax
lapack_int LAPACKE_stpmqrt (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, lapack_int l, lapack_int nb, const float* v, lapack_int
ldv, const float* t, lapack_int ldt, float* a, lapack_int lda, float* b, lapack_int
ldb);
lapack_int LAPACKE_dtpmqrt (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, lapack_int l, lapack_int nb, const double* v, lapack_int
ldv, const double* t, lapack_int ldt, double* a, lapack_int lda, double* b, lapack_int
ldb);
lapack_int LAPACKE_ctpmqrt (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, lapack_int l, lapack_int nb, const lapack_complex_float*
v, lapack_int ldv, const lapack_complex_float* t, lapack_int ldt, lapack_complex_float*
a, lapack_int lda, lapack_complex_float* b, lapack_int ldb);
lapack_int LAPACKE_ztpmqrt (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int k, lapack_int l, lapack_int nb, const lapack_complex_double*
v, lapack_int ldv, const lapack_complex_double* t, lapack_int ldt,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* b, lapack_int ldb);

Include Files
• mkl.h

Description

The columns of the pentagonal matrix V contain the elementary reflectors H(1), H(2), ..., H(k); V is
composed of a rectangular block V1 on top of a trapezoidal block V2:

V = [ V1 ]
    [ V2 ].

The size of the trapezoidal block V2 is determined by the parameter l, where 0 ≤l≤k. V2 is upper trapezoidal,
consisting of the first l rows of a k-by-k upper triangular matrix.

If l=k, V2 is upper triangular;

If l=0, there is no trapezoidal block, so V = V1 is rectangular.


If side = 'L', the reflectors are applied to the stacked matrix

C = [ A ]
    [ B ],

where A is k-by-n, B is m-by-n, and V is m-by-k.

If side = 'R', the reflectors are applied to the matrix

C = [ A B ],

where A is m-by-k, B is m-by-n, and V is n-by-k.

The real/complex orthogonal matrix Q is formed from V and T.


If trans='N' and side='L', C contains Q * C on exit.

If trans='T' and side='L', C contains QT * C on exit.

If trans='C' and side='L', C contains QH * C on exit.

If trans='N' and side='R', C contains C * Q on exit.

If trans='T' and side='R', C contains C * QT on exit.

If trans='C' and side='R', C contains C * QH on exit.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

side ='L': apply Q, QT, or QH from the left.


='R': apply Q, QT, or QH from the right.

trans ='N', no transpose, apply Q.


='T', transpose, apply QT.
='C', transpose, apply QH.

m The number of rows in the matrix B, (m≥ 0).

n The number of columns in the matrix B, (n≥ 0).

k The number of elementary reflectors whose product defines the matrix Q,


(k≥ 0).

l The order of the trapezoidal part of V (k≥l≥ 0).

nb
The block size used for the storage of t, k≥nb≥ 1. This must be the same
value of nb used to generate t in tpqrt.

v Size ldv*k for column major layout; ldv*m for row major layout and side
= 'L', ldv*n for row major layout and side = 'R'.

The ith column must contain the vector which defines the elementary
reflector H(i), for i = 1,2,...,k, as returned by tpqrt in array argument b.

ldv The leading dimension of the array v.

If side = 'L', ldv must be at least max(1, m) for column major layout and
max(1, k) for row major layout;

If side = 'R', ldv must be at least max(1, n) for column major layout and
max(1, k) for row major layout.

t Array, size ldt*k for column major layout and ldt*nb for row major
layout.
The upper triangular factors of the block reflectors as returned by tpqrt

ldt The leading dimension of the array t. ldt must be at least nb for column
major layout and max(1, k) for row major layout.

a If side = 'L', size lda*n for column major layout and lda*k for row major layout.
If side = 'R', size lda*k for column major layout and lda*m for row major layout.
The k-by-n or m-by-k matrix A.

lda The leading dimension of the array a.

If side = 'L', lda must be at least max(1, k) for column major layout and
max(1, n) for row major layout.

If side = 'R', lda must be at least max(1, m) for column major layout and
max(1, k) for row major layout.

b Size ldb*n for column major layout and ldb*m for row major layout.

The m-by-n matrix B.

ldb The leading dimension of the array b. ldb must be at least max(1, m) for
column major layout and max(1, n) for row major layout.

Output Parameters

a Overwritten by the corresponding block of the product Q*C, C*Q, QT*C,


C*QT, QH*C, or C*QH.

b Overwritten by the corresponding block of the product Q*C, C*Q, QT*C,


C*QT, QH*C, or C*QH.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.
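
A minimal sketch combining ?tpqrt and ?tpmqrt (sizes and data are arbitrary assumptions; row-major storage):

#include <mkl.h>

int main(void) {
    /* Factor [A; B] with ?tpqrt (n = k = 3, m = 4, l = 2, nb = 3), then apply Q**T
       to another stacked matrix [CA; CB] with 2 columns (row-major; arbitrary data). */
    double a[3*3]  = { 2, 1, 1,  0, 3, 1,  0, 0, 4 };
    double b[4*3]  = { 1, 2, 3,  4, 5, 6,  7, 8, 9,  0, 1, 2 };
    double t[3*3];
    double ca[3*2] = { 1, 0,  0, 1,  1, 1 };          /* top block, k-by-2    */
    double cb[4*2] = { 1, 1,  2, 0,  0, 2,  1, 0 };   /* bottom block, m-by-2 */

    lapack_int info = LAPACKE_dtpqrt(LAPACK_ROW_MAJOR, 4, 3, 2, 3, a, 3, b, 3, t, 3);
    if (info != 0) return (int)info;

    /* v and t come from the factorization: v is the array b, t the block reflectors. */
    info = LAPACKE_dtpmqrt(LAPACK_ROW_MAJOR, 'L', 'T', 4, 2, 3, 2, 3,
                           b, 3, t, 3, ca, 2, cb, 2);
    /* ca and cb are overwritten by the two blocks of Q**T*[CA; CB]. */
    return (int)info;
}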


Singular Value Decomposition: LAPACK Computational Routines


This section describes LAPACK routines for computing the singular value decomposition (SVD) of a general
m-by-n matrix A:
A = U*Σ*V^H.
In this decomposition, U and V are unitary (for complex A) or orthogonal (for real A); Σ is an m-by-n
diagonal matrix with real diagonal elements σ_i:
σ_1 ≥ σ_2 ≥ ... ≥ σ_min(m, n) ≥ 0.
The diagonal elements σ_i are singular values of A. The first min(m, n) columns of the matrices U and V are,
respectively, left and right singular vectors of A. The singular values and singular vectors satisfy
A*v_i = σ_i*u_i and A^H*u_i = σ_i*v_i,
where u_i and v_i are the i-th columns of U and V, respectively.
To find the SVD of a general matrix A, call the LAPACK routine ?gebrd or ?gbbrd to reduce A to a
bidiagonal matrix B by a unitary (orthogonal) transformation: A = Q*B*P^H. Then call ?bdsqr, which forms the
SVD of the bidiagonal matrix: B = U1*Σ*V1^H.
Thus, the sought-for SVD of A is given by A = U*Σ*V^H = (Q*U1)*Σ*(V1^H*P^H).
Table "Computational Routines for Singular Value Decomposition (SVD)" lists LAPACK routines that perform
singular value decomposition of matrices.
Computational Routines for Singular Value Decomposition (SVD)

Operation                                                              Real matrices    Complex matrices
Reduce A to a bidiagonal matrix B: A = Q*B*P^H (full storage)          ?gebrd           ?gebrd
Reduce A to a bidiagonal matrix B: A = Q*B*P^H (band storage)          ?gbbrd           ?gbbrd
Generate the orthogonal (unitary) matrix Q or P                        ?orgbr           ?ungbr
Apply the orthogonal (unitary) matrix Q or P                           ?ormbr           ?unmbr
Form singular value decomposition of the bidiagonal matrix B: B = U*Σ*V^H    ?bdsqr, ?bdsdc    ?bdsqr

Decision Tree: Singular Value Decomposition

Figure "Decision Tree: Singular Value Decomposition" presents a decision tree that helps you choose the right
sequence of routines for SVD, depending on whether you need singular values only or singular vectors as
well, whether A is real or complex, and so on.
You can use the SVD to find a minimum-norm solution to a (possibly) rank-deficient least squares problem of
minimizing ||A*x - b||_2. The effective rank k of the matrix A can be determined as the number of singular
values which exceed a suitable threshold. The minimum-norm solution is
x = V_k*(Σ_k)^-1*c,
where Σ_k is the leading k-by-k submatrix of Σ, the matrix V_k consists of the first k columns of V = P*V1, and
the vector c consists of the first k elements of U^H*b = U1^H*Q^H*b.
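
As a hedged end-to-end sketch (the 4-by-3 matrix and all sizes below are assumptions made for illustration, not part of the reference), the following program follows the sequence just described: ?gebrd reduces A to bidiagonal form, ?orgbr expands Q and PT, and ?bdsqr then computes the singular values and vectors:

#include <stdio.h>
#include <string.h>
#include <mkl.h>

int main(void) {
    /* Arbitrary 4-by-3 matrix A in row-major storage (values are illustrative only). */
    double a[4*3] = { 1, 2, 3,
                      4, 5, 6,
                      7, 8, 10,
                      2, 0, 1 };
    double d[3], e[2], tauq[3], taup[3];
    double u[4*4] = { 0 };   /* will hold Q (4-by-4), then the left singular vectors   */
    double vt[3*3] = { 0 };  /* will hold PT (3-by-3), then the right singular vectors */
    double dummy[1] = { 0 };
    lapack_int info;

    /* 1. Reduce A to upper bidiagonal form: A = Q*B*PT. */
    info = LAPACKE_dgebrd(LAPACK_ROW_MAJOR, 4, 3, a, 3, d, e, tauq, taup);
    if (info != 0) return (int)info;

    /* 2. Expand Q and PT from the stored reflectors; ?orgbr overwrites its input,
          so work on copies of the reflector data held in a. */
    for (int i = 0; i < 4; ++i)
        memcpy(&u[i*4], &a[i*3], 3 * sizeof(double));   /* first 3 columns of u */
    info = LAPACKE_dorgbr(LAPACK_ROW_MAJOR, 'Q', 4, 4, 3, u, 4, tauq);
    if (info != 0) return (int)info;

    memcpy(vt, a, 3*3 * sizeof(double));                /* first 3 rows of a */
    info = LAPACKE_dorgbr(LAPACK_ROW_MAJOR, 'P', 3, 3, 4, vt, 3, taup);
    if (info != 0) return (int)info;

    /* 3. SVD of the bidiagonal matrix B; on exit d holds the singular values in
          decreasing order, the leading 4-by-3 part of u holds the left singular
          vectors, and vt holds the right singular vectors (as rows). */
    info = LAPACKE_dbdsqr(LAPACK_ROW_MAJOR, 'U', 3, 3, 4, 0, d, e,
                          vt, 3, u, 4, dummy, 1);
    if (info != 0) return (int)info;

    printf("singular values: %f %f %f\n", d[0], d[1], d[2]);
    return 0;
}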

?gebrd
Reduces a general matrix to bidiagonal form.

Syntax
lapack_int LAPACKE_sgebrd( int matrix_layout, lapack_int m, lapack_int n, float* a,
lapack_int lda, float* d, float* e, float* tauq, float* taup );
lapack_int LAPACKE_dgebrd( int matrix_layout, lapack_int m, lapack_int n, double* a,
lapack_int lda, double* d, double* e, double* tauq, double* taup );
lapack_int LAPACKE_cgebrd( int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float* a, lapack_int lda, float* d, float* e, lapack_complex_float*
tauq, lapack_complex_float* taup );
lapack_int LAPACKE_zgebrd( int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double* a, lapack_int lda, double* d, double* e, lapack_complex_double*
tauq, lapack_complex_double* taup );


Include Files
• mkl.h

Description

The routine reduces a general m-by-n matrix A to a bidiagonal matrix B by an orthogonal (unitary)
transformation.

If m ≥ n, the reduction is given by

A = Q*B*P^H = Q1*B1*P1^H, with B = [ B1 ; 0 ] (B1 stacked on an (m - n)-by-n zero block),

where B1 is an n-by-n upper bidiagonal matrix, Q and P are orthogonal or, for a complex A, unitary matrices;
Q1 consists of the first n columns of Q.

If m < n, the reduction is given by

A = Q*B*P^H = Q1*B1*P1^H, with B = [ B1 0 ] (B1 followed by an m-by-(n - m) zero block),

where B1 is an m-by-m lower bidiagonal matrix, Q and P are orthogonal or, for a complex A, unitary matrices;
P1 consists of the first m columns of P.
The routine does not form the matrices Q and P explicitly, but represents them as products of elementary
reflectors. Routines are provided to work with the matrices Q and P in this representation:
If the matrix A is real,

• to compute Q and P explicitly, call orgbr.


• to multiply a general matrix by Q or P, call ormbr.
If the matrix A is complex,

• to compute Q and P explicitly, call ungbr.


• to multiply a general matrix by Q or P, call unmbr.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

m The number of rows in the matrix A (m≥ 0).

n The number of columns in A (n≥ 0).

a Arrays:
a(size max(1, lda*n) for column major layout and max(1, lda*m) for row
major layout) contains the matrix A.

lda The leading dimension of a; at least max(1, m) for column major layout
and at least max(1, n) for row major layout.

Output Parameters

a If m≥n, the diagonal and first super-diagonal of a are overwritten by the


upper bidiagonal matrix B. The elements below the diagonal, with the array
tauq, represent the orthogonal matrix Q as a product of elementary
reflectors, and the elements above the first superdiagonal, with the array
taup, represent the orthogonal matrix P as a product of elementary
reflectors.
If m < n, the diagonal and first sub-diagonal of a are overwritten by the
lower bidiagonal matrix B. The elements below the first subdiagonal, with
the array tauq, represent the orthogonal matrix Q as a product of
elementary reflectors, and the elements above the diagonal, with the array
taup, represent the orthogonal matrix P as a product of elementary
reflectors.

d Array, size at least max(1, min(m, n)).

Contains the diagonal elements of B.

e Array, size at least max(1, min(m, n) - 1). Contains the off-diagonal


elements of B.

tauq, taup Arrays, size at least max (1, min(m, n)). The scalar factors of the
elementary reflectors which represent the orthogonal or unitary matrices P
and Q.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed matrices Q, B, and P satisfy Q*B*P^H = A + E, where ||E||_2 = c(n)*ε*||A||_2, c(n) is a
modestly increasing function of n, and ε is the machine precision.
The approximate number of floating-point operations for real flavors is
(4/3)*n^2*(3*m - n) for m ≥ n,
(4/3)*m^2*(3*n - m) for m < n.
The number of operations for complex flavors is four times greater.
If n is much less than m, it can be more efficient to first form the QR factorization of A by calling geqrf and
then reduce the factor R to bidiagonal form. This requires approximately 2*n^2*(m + n) floating-point
operations.
If m is much less than n, it can be more efficient to first form the LQ factorization of A by calling gelqf and
then reduce the factor L to bidiagonal form. This requires approximately 2*m^2*(m + n) floating-point
operations.

?gbbrd
Reduces a general band matrix to bidiagonal form.

Syntax
lapack_int LAPACKE_sgbbrd( int matrix_layout, char vect, lapack_int m, lapack_int n,
lapack_int ncc, lapack_int kl, lapack_int ku, float* ab, lapack_int ldab, float* d,
float* e, float* q, lapack_int ldq, float* pt, lapack_int ldpt, float* c, lapack_int
ldc );


lapack_int LAPACKE_dgbbrd( int matrix_layout, char vect, lapack_int m, lapack_int n,
lapack_int ncc, lapack_int kl, lapack_int ku, double* ab, lapack_int ldab, double* d,
double* e, double* q, lapack_int ldq, double* pt, lapack_int ldpt, double* c,
lapack_int ldc );
lapack_int LAPACKE_cgbbrd( int matrix_layout, char vect, lapack_int m, lapack_int n,
lapack_int ncc, lapack_int kl, lapack_int ku, lapack_complex_float* ab, lapack_int
ldab, float* d, float* e, lapack_complex_float* q, lapack_int ldq,
lapack_complex_float* pt, lapack_int ldpt, lapack_complex_float* c, lapack_int ldc );
lapack_int LAPACKE_zgbbrd( int matrix_layout, char vect, lapack_int m, lapack_int n,
lapack_int ncc, lapack_int kl, lapack_int ku, lapack_complex_double* ab, lapack_int
ldab, double* d, double* e, lapack_complex_double* q, lapack_int ldq,
lapack_complex_double* pt, lapack_int ldpt, lapack_complex_double* c, lapack_int ldc );

Include Files
• mkl.h

Description
The routine reduces an m-by-n band matrix A to upper bidiagonal matrix B: A = Q*B*PH. Here the matrices
Q and P are orthogonal (for real A) or unitary (for complex A). They are determined as products of Givens
rotation matrices, and may be formed explicitly by the routine if required. The routine can also update a
matrix C as follows: C = QH*C.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

vect Must be 'N' or 'Q' or 'P' or 'B'.

If vect = 'N', neither Q nor PH is generated.

If vect = 'Q', the routine generates the matrix Q.

If vect = 'P', the routine generates the matrix PH.

If vect = 'B', the routine generates both Q and PH.

m The number of rows in the matrix A (m≥ 0).

n The number of columns in A (n≥ 0).

ncc The number of columns in C (ncc≥ 0).

kl The number of sub-diagonals within the band of A (kl≥ 0).

ku The number of super-diagonals within the band of A (ku≥ 0).

ab, c Arrays:
ab(size max(1, ldab*n) for column major layout and max(1, ldab*m) for
row major layout) contains the matrix A in band storage (see Matrix
Storage Schemes).
c(size max(1, ldc*ncc) for column major layout and max(1, ldc*m) for
row major layout) contains an m-by-ncc matrix C.

If ncc = 0, the array c is not referenced.

ldab The leading dimension of the array ab (ldab≥kl + ku + 1).

ldq The leading dimension of the output array q.


ldq≥ max(1, m) if vect = 'Q' or 'B', ldq≥ 1 otherwise.

ldpt The leading dimension of the output array pt.


ldpt≥ max(1, n) if vect = 'P' or 'B', ldpt≥ 1 otherwise.

ldc The leading dimension of the array c.


ldc≥ max(1, m) if ncc > 0; ldc≥ 1 if ncc = 0.

Output Parameters

ab Overwritten by values generated during the reduction.

d Array, size at least max(1, min(m, n)). Contains the diagonal elements of
the matrix B.

e Array, size at least max(1, min(m, n) - 1).

Contains the off-diagonal elements of B.

q, pt Arrays:
q (size max(1, ldq*m)) contains the output m-by-m matrix Q.

pt (size max(1, ldpt*n)) contains the output n-by-n matrix PT.

c Overwritten by the product QH*C.

c is not referenced if ncc = 0.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed matrices Q, B, and P satisfy Q*B*P^H = A + E, where ||E||_2 = c(n)*ε*||A||_2, c(n) is a
modestly increasing function of n, and ε is the machine precision.
If m = n, the total number of floating-point operations for real flavors is approximately the sum of:

6*n^2*(kl + ku) if vect = 'N' and ncc = 0,
3*n^2*ncc*(kl + ku - 1)/(kl + ku) if C is updated, and
3*n^3*(kl + ku - 1)/(kl + ku) if either Q or PH is generated (double this if both).
To estimate the number of operations for complex flavors, use the same formulas with the coefficients 20
and 10 (instead of 6 and 3).
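
A minimal usage sketch (the band sizes are arbitrary assumptions; the band of A is left as zeros here and would normally be packed as described in Matrix Storage Schemes):

#include <mkl.h>

int main(void) {
    /* 4-by-4 band matrix with kl = 1 subdiagonal and ku = 1 superdiagonal,
       packed in column-major band storage with ldab = kl + ku + 1.
       A real application fills ab with the band of A before the call. */
    double ab[3*4] = { 0 };
    double d[4], e[3];
    double q[1] = { 0 }, pt[1] = { 0 }, c[1] = { 0 };  /* not referenced: vect = 'N', ncc = 0 */

    lapack_int info = LAPACKE_dgbbrd(LAPACK_COL_MAJOR, 'N', 4, 4, 0, 1, 1,
                                     ab, 3, d, e, q, 1, pt, 1, c, 1);
    /* d and e now hold the bidiagonal matrix B; pass them to ?bdsqr for the SVD. */
    return (int)info;
}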


?orgbr
Generates the real orthogonal matrix Q or PT
determined by ?gebrd.

Syntax
lapack_int LAPACKE_sorgbr (int matrix_layout, char vect, lapack_int m, lapack_int n,
lapack_int k, float* a, lapack_int lda, const float* tau);
lapack_int LAPACKE_dorgbr (int matrix_layout, char vect, lapack_int m, lapack_int n,
lapack_int k, double* a, lapack_int lda, const double* tau);

Include Files
• mkl.h

Description

The routine generates the whole or part of the orthogonal matrices Q and PT formed by the routine gebrd.
Use this routine after a call to sgebrd/dgebrd. All valid combinations of arguments are described in
Input parameters. In most cases you need the following:
To compute the whole m-by-m matrix Q:

LAPACKE_?orgbr(matrix_layout, 'Q', m, m, n, a, lda, tau )

(note that the array a must have at least m columns).


To form the n leading columns of Q if m > n:

LAPACKE_?orgbr(matrix_layout, 'Q', m, n, n, a, lda, tau )

To compute the whole n-by-n matrix PT:

LAPACKE_?orgbr(matrix_layout, 'P', n, n, m, a, lda, tau )

(note that the array a must have at least n rows).


To form the m leading rows of PT if m < n:

LAPACKE_?orgbr(matrix_layout, 'P', m, n, m, a, lda, tau )

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

vect Must be 'Q' or 'P'.

If vect = 'Q', the routine generates the matrix Q.

If vect = 'P', the routine generates the matrix PT.

m, n The number of rows (m) and columns (n) in the matrix Q or PT to be


returned (m≥ 0, n≥ 0).

If vect = 'Q', m≥n≥ min(m, k).

If vect = 'P', n≥m≥ min(n, k).

k If vect = 'Q', the number of columns in the original m-by-k matrix


reduced by gebrd.

If vect = 'P', the number of rows in the original k-by-n matrix reduced
by gebrd.

a Array, size at least lda*n for column major layout and lda*m for row major
layout. The vectors which define the elementary reflectors, as returned by
gebrd.

lda The leading dimension of the array a. lda≥ max(1, m) for column major
layout and at least max(1, n) for row major layout .

tau Array, size min (m,k) if vect = 'Q', min (n,k) if vect = 'P'.
Scalar factor of the elementary reflector H(i) or G(i), which determines Q
and PT as returned by gebrd in the array tauq or taup.

Output Parameters

a Overwritten by the orthogonal matrix Q or PT (or the leading rows or


columns thereof) as specified by vect, m, and n.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed matrix Q differs from an exactly orthogonal matrix by a matrix E such that ||E||2 = O(ε).

The approximate numbers of floating-point operations for the cases listed in Description are as follows:
To form the whole of Q:
(4/3)*n*(3*m^2 - 3*m*n + n^2) if m > n;
(4/3)*m^3 if m ≤ n.
To form the n leading columns of Q when m > n:
(2/3)*n^2*(3*m - n).
To form the whole of PT:
(4/3)*n^3 if m ≥ n;
(4/3)*m*(3*n^2 - 3*m*n + m^2) if m < n.
To form the m leading rows of PT when m < n:
(2/3)*m^2*(3*n - m).


The complex counterpart of this routine is ungbr.

?ormbr
Multiplies an arbitrary real matrix by the real
orthogonal matrix Q or PT determined by ?gebrd.

Syntax
lapack_int LAPACKE_sormbr (int matrix_layout, char vect, char side, char trans,
lapack_int m, lapack_int n, lapack_int k, const float* a, lapack_int lda, const float*
tau, float* c, lapack_int ldc);


lapack_int LAPACKE_dormbr (int matrix_layout, char vect, char side, char trans,
lapack_int m, lapack_int n, lapack_int k, const double* a, lapack_int lda, const
double* tau, double* c, lapack_int ldc);

Include Files
• mkl.h

Description

Given an arbitrary real matrix C, this routine forms one of the matrix products Q*C, QT*C, C*Q, C*QT, P*C,
PT*C, C*P, C*PT, where Q and P are orthogonal matrices computed by a call to gebrd. The routine overwrites
the product on C.

Input Parameters
In the descriptions below, r denotes the order of Q or PT:
If side = 'L', r = m; if side = 'R', r = n.

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

vect Must be 'Q' or 'P'.

If vect = 'Q', then Q or QT is applied to C.

If vect = 'P', then P or PT is applied to C.

side Must be 'L' or 'R'.

If side = 'L', multipliers are applied to C from the left.

If side = 'R', they are applied to C from the right.

trans Must be 'N' or 'T'.

If trans = 'N', then Q or P is applied to C.

If trans = 'T', then QT or PT is applied to C.

m The number of rows in C.

n The number of columns in C.

k One of the dimensions of A in ?gebrd:

If vect = 'Q', the number of columns in A;

If vect = 'P', the number of rows in A.

Constraints: m≥ 0, n≥ 0, k≥ 0.

a, c Arrays:
a is the array a as returned by ?gebrd.

The size of a depends on the value of the matrix_layout, vect, and side
parameters:

matrix_layout vect side size

column major 'Q' - max(1, lda*k)


column major 'P' 'L' max(1, lda*m)

column major 'P' 'R' max(1, lda*n)

row major 'Q' 'L' max(1, lda*m)

row major 'Q' 'R' max(1, lda*n)

row major 'P' - max(1, lda*k)

c(size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout) holds the matrix C.

lda The leading dimension of a. Constraints:


lda≥ max(1, r) for column major layout and at least max(1, k) for row
major layout if vect = 'Q';

lda≥ max(1, min(r,k)) for column major layout and at least max(1, r) for
row major layout if vect = 'P'.

ldc The leading dimension of c; ldc≥ max(1, m) for column major layout and
ldc≥ max(1, n) for row major layout .

tau Array, size at least max (1, min(r, k)).


For vect = 'Q', the array tauq as returned by ?gebrd. For vect = 'P',
the array taup as returned by ?gebrd.

Output Parameters

c Overwritten by the product Q*C, QT*C, C*Q, C*QT, P*C, PT*C, C*P, or C*PT,
as specified by vect, side, and trans.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed product differs from the exact product by a matrix E such that ||E||2 = O(ε)*||C||2.

The total number of floating-point operations is approximately


2*n*k*(2*m - k) if side = 'L' and m ≥ k;
2*m*k*(2*n - k) if side = 'R' and n ≥ k;
2*m^2*n if side = 'L' and m < k;
2*n^2*m if side = 'R' and n < k.
The complex counterpart of this routine is unmbr.
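
A minimal sketch of applying Q**T obtained from ?gebrd (the 4-by-3 and 4-by-2 matrices are arbitrary assumptions; row-major storage):

#include <mkl.h>

int main(void) {
    /* Bidiagonalize an arbitrary 4-by-3 matrix A, then overwrite the 4-by-2
       matrix C with Q**T*C (row-major; values are arbitrary). */
    double a[4*3] = { 1, 2, 3,
                      4, 5, 6,
                      7, 8, 10,
                      2, 0, 1 };
    double d[3], e[2], tauq[3], taup[3];
    double c[4*2] = { 1, 0,
                      0, 1,
                      1, 1,
                      0, 2 };

    lapack_int info = LAPACKE_dgebrd(LAPACK_ROW_MAJOR, 4, 3, a, 3, d, e, tauq, taup);
    if (info != 0) return (int)info;

    /* vect = 'Q', side = 'L', trans = 'T': apply Q**T from the left (k = 3). */
    info = LAPACKE_dormbr(LAPACK_ROW_MAJOR, 'Q', 'L', 'T', 4, 2, 3, a, 3, tauq, c, 2);
    return (int)info;
}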


?ungbr
Generates the complex unitary matrix Q or PH
determined by ?gebrd.

Syntax
lapack_int LAPACKE_cungbr (int matrix_layout, char vect, lapack_int m, lapack_int n,
lapack_int k, lapack_complex_float* a, lapack_int lda, const lapack_complex_float*
tau);
lapack_int LAPACKE_zungbr (int matrix_layout, char vect, lapack_int m, lapack_int n,
lapack_int k, lapack_complex_double* a, lapack_int lda, const lapack_complex_double*
tau);

Include Files
• mkl.h

Description
The routine generates the whole or part of the unitary matrices Q and PH formed by the routine gebrd.
Use this routine after a call to cgebrd/zgebrd. All valid combinations of arguments are described in
Input Parameters; in most cases you need the following:
To compute the whole m-by-m matrix Q, use:

LAPACKE_?ungbr(matrix_layout, 'Q', m, m, n, a, lda, tau)

(note that the array a must have at least m columns).


To form the n leading columns of Q if m > n, use:

LAPACKE_?ungbr(matrix_layout, 'Q', m, n, n, a, lda, tau)

To compute the whole n-by-n matrix PH, use:

LAPACKE_?ungbr(matrix_layout, 'P', n, n, m, a, lda, tau)

(note that the array a must have at least n rows).


To form the m leading rows of PH if m < n, use:

LAPACKE_?ungbr(matrix_layout, 'P', m, n, m, a, lda, tau)

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

vect Must be 'Q' or 'P'.

If vect = 'Q', the routine generates the matrix Q.

If vect = 'P', the routine generates the matrix PH.

m The number of required rows of Q or PH.

n The number of required columns of Q or PH.

k One of the dimensions of A in ?gebrd:

If vect = 'Q', the number of columns in A;

If vect = 'P', the number of rows in A.

Constraints: m≥ 0, n≥ 0, k≥ 0.

For vect = 'Q': k≤n≤m if m > k, or m = n if m≤k.

For vect = 'P': k≤m≤n if n > k, or m = n if n≤k.

a Arrays:
a, size at least lda*n for column major layout and lda*m for row major
layout, is the array a as returned by ?gebrd.

lda The leading dimension of a; at least max(1, m)for column major layout and
max(1, n) for row major layout.

tau For vect = 'Q', the array tauq as returned by ?gebrd. For vect = 'P',
the array taup as returned by ?gebrd.

The dimension of tau must be at least max(1, min(m, k)) for vect = 'Q',
or max(1, min(n, k)) for vect = 'P'.

Output Parameters

a Overwritten by the unitary matrix Q or PH (or the leading rows or


columns thereof) as specified by vect, m, and n.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed matrix Q differs from an exactly unitary matrix by a matrix E such that ||E||_2 = O(ε).

The approximate numbers of floating-point operations are listed below:

To compute the whole matrix Q:
(16/3)*n*(3*m^2 - 3*m*n + n^2) if m > n;
(16/3)*m^3 if m ≤ n.
To form the n leading columns of Q when m > n:
(8/3)*n^2*(3*m - n).
To compute the whole matrix PH:
(16/3)*n^3 if m ≥ n;
(16/3)*m*(3*n^2 - 3*m*n + m^2) if m < n.
To form the m leading rows of PH when m < n:
(8/3)*m^2*(3*n - m).


The real counterpart of this routine is orgbr.

?unmbr
Multiplies an arbitrary complex matrix by the unitary
matrix Q or P determined by ?gebrd.


Syntax
lapack_int LAPACKE_cunmbr (int matrix_layout, char vect, char side, char trans,
lapack_int m, lapack_int n, lapack_int k, const lapack_complex_float* a, lapack_int
lda, const lapack_complex_float* tau, lapack_complex_float* c, lapack_int ldc);
lapack_int LAPACKE_zunmbr (int matrix_layout, char vect, char side, char trans,
lapack_int m, lapack_int n, lapack_int k, const lapack_complex_double* a, lapack_int
lda, const lapack_complex_double* tau, lapack_complex_double* c, lapack_int ldc);

Include Files
• mkl.h

Description

Given an arbitrary complex matrix C, this routine forms one of the matrix products Q*C, QH*C, C*Q, C*QH,
P*C, PH*C, C*P, or C*PH, where Q and P are unitary matrices computed by a call to gebrd. The routine
overwrites the product on C.

Input Parameters
In the descriptions below, r denotes the order of Q or PH:
If side = 'L', r = m; if side = 'R', r = n.

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

vect Must be 'Q' or 'P'.

If vect = 'Q', then Q or QH is applied to C.

If vect = 'P', then P or PH is applied to C.

side Must be 'L' or 'R'.

If side = 'L', multipliers are applied to C from the left.

If side = 'R', they are applied to C from the right.

trans Must be 'N' or 'C'.

If trans = 'N', then Q or P is applied to C.

If trans = 'C', then QH or PH is applied to C.

m The number of rows in C.

n The number of columns in C.

k One of the dimensions of A in ?gebrd:

If vect = 'Q', the number of columns in A;

If vect = 'P', the number of rows in A.

Constraints: m≥ 0, n≥ 0, k≥ 0.

a, c Arrays:
a is the array a as returned by ?gebrd.

The size of a depends on the value of the matrix_layout, vect, and side
parameters:

matrix_layout vect side size

column major 'Q' - max(1, lda*k)

column major 'P' 'L' max(1, lda*m)

column major 'P' 'R' max(1, lda*n)

row major 'Q' 'L' max(1, lda*m)

row major 'Q' 'R' max(1, lda*n)

row major 'P' - max(1, lda*k)

c(size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout) holds the matrix C.

lda The leading dimension of a. Constraints:


lda≥ max(1, r) for column major layout and at least max(1, k) for row
major layout if vect = 'Q';

lda≥ max(1, min(r,k)) for column major layout and at least max(1, r) for
row major layout if vect = 'P'.

ldc The leading dimension of c; ldc ≥ max(1, m) for column major layout and ldc ≥ max(1, n) for row major layout.

tau Array, size at least max (1, min(r, k)).


For vect = 'Q', the array tauq as returned by ?gebrd. For vect = 'P',
the array taup as returned by ?gebrd.

Output Parameters

c Overwritten by the product Q*C, QH*C, C*Q, C*QH, P*C, PH*C, C*P, or
C*PH, as specified by vect, side, and trans.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed product differs from the exact product by a matrix E such that ||E||2 = O(ε)*||C||2.

The total number of floating-point operations is approximately


8*n*k*(2*m - k) if side = 'L' and m ≥ k;
8*m*k*(2*n - k) if side = 'R' and n ≥ k;
8*m^2*n if side = 'L' and m < k;


8*n^2*m if side = 'R' and n < k.


The real counterpart of this routine is ormbr.

?bdsqr
Computes the singular value decomposition of a
general matrix that has been reduced to bidiagonal
form.

Syntax
lapack_int LAPACKE_sbdsqr( int matrix_layout, char uplo, lapack_int n, lapack_int ncvt,
lapack_int nru, lapack_int ncc, float* d, float* e, float* vt, lapack_int ldvt, float*
u, lapack_int ldu, float* c, lapack_int ldc );
lapack_int LAPACKE_dbdsqr( int matrix_layout, char uplo, lapack_int n, lapack_int ncvt,
lapack_int nru, lapack_int ncc, double* d, double* e, double* vt, lapack_int ldvt,
double* u, lapack_int ldu, double* c, lapack_int ldc );
lapack_int LAPACKE_cbdsqr( int matrix_layout, char uplo, lapack_int n, lapack_int ncvt,
lapack_int nru, lapack_int ncc, float* d, float* e, lapack_complex_float* vt,
lapack_int ldvt, lapack_complex_float* u, lapack_int ldu, lapack_complex_float* c,
lapack_int ldc );
lapack_int LAPACKE_zbdsqr( int matrix_layout, char uplo, lapack_int n, lapack_int ncvt,
lapack_int nru, lapack_int ncc, double* d, double* e, lapack_complex_double* vt,
lapack_int ldvt, lapack_complex_double* u, lapack_int ldu, lapack_complex_double* c,
lapack_int ldc );

Include Files
• mkl.h

Description

The routine computes the singular values and, optionally, the right and/or left singular vectors from the
Singular Value Decomposition (SVD) of a real n-by-n (upper or lower) bidiagonal matrix B using the implicit
zero-shift QR algorithm. The SVD of B has the form B = Q*S*P^H where S is the diagonal matrix of singular
values, Q is an orthogonal matrix of left singular vectors, and P is an orthogonal matrix of right singular
vectors. If left singular vectors are requested, this subroutine actually returns U*Q instead of Q, and, if right
singular vectors are requested, this subroutine returns P^H*VT instead of P^H, for given real/complex input
matrices U and VT. When U and VT are the orthogonal/unitary matrices that reduce a general matrix A to
bidiagonal form: A = U*B*VT, as computed by ?gebrd, then

A = (U*Q)*S*(P^H*VT)
is the SVD of A. Optionally, the subroutine may also compute Q^H*C for a given real/complex input matrix C.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

If uplo = 'U', B is an upper bidiagonal matrix.

If uplo = 'L', B is a lower bidiagonal matrix.

n The order of the matrix B (n≥ 0).

ncvt The number of columns of the matrix VT, that is, the number of right
singular vectors (ncvt≥ 0).

Set ncvt = 0 if no right singular vectors are required.

nru The number of rows in U, that is, the number of left singular vectors (nru≥
0).
Set nru = 0 if no left singular vectors are required.

ncc The number of columns in the matrix C used for computing the product
Q^H*C (ncc≥ 0). Set ncc = 0 if no matrix C is supplied.

d, e Arrays:
d contains the diagonal elements of B.
The size of d must be at least max(1, n).

e contains the (n-1) off-diagonal elements of B.

The size of e must be at least max(1, n - 1).

vt, u, c Arrays:
vt, size max(1, ldvt*ncvt) for column major layout and max(1, ldvt*n)
for row major layout, contains an n-by-ncvt matrix VT.
vt is not referenced if ncvt = 0.

u, size max(1, ldu*n) for column major layout and max(1, ldu*nru) for
row major layout, contains an nru by n matrix U.
u is not referenced if nru = 0.

c, size max(1, ldc*ncc) for column major layout and max(1, ldc*n) for
row major layout, contains the n-by-ncc matrix C for computing the
product Q^H*C.

ldvt The leading dimension of vt. Constraints:


ldvt≥ max(1, n) if ncvt > 0 for column major layout and ldvt≥ max(1,
ncvt) for row major layout;
ldvt≥ 1 if ncvt = 0.

ldu The leading dimension of u. Constraint:


ldu≥ max(1, nru) for column major layout and ldu≥ max(1, n) for row
major layout .

ldc The leading dimension of c. Constraints:


ldc≥ max(1, n) if ncc > 0 for column major layout and ldc≥ max(1, ncc)
for row major layout; ldc≥ 1 otherwise.

Output Parameters

d On exit, if info = 0, overwritten by the singular values in decreasing order


(see info).

e On exit, if info = 0, e is destroyed. See also info below.


c Overwritten by the product Q^H*C.

vt On exit, this array is overwritten by P^H*VT. Not referenced if ncvt = 0.

u On exit, this array is overwritten by U*Q. Not referenced if nru = 0.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0,

If ncvt = nru = ncc = 0,


• info = 1, a split was marked by a positive value in e
• info = 2, the current block of z was not diagonalized after 100*n iterations (in the inner while loop)
• info = 3, termination criterion of the outer while loop is not met (the program created more than n
unreduced blocks).
In all other cases when ncvt, nru, or ncc > 0, the algorithm did not converge; d and e contain the elements
of a bidiagonal matrix that is orthogonally similar to the input matrix B; if info = i, i elements of e have
not converged to zero.

Application Notes
Each singular value and singular vector is computed to high relative accuracy. However, the reduction to
bidiagonal form (prior to calling the routine) may decrease the relative accuracy in the small singular values
of the original matrix if its singular values vary widely in magnitude.
If σ_i is an exact singular value of B, and s_i is the corresponding computed value, then
|s_i - σ_i| ≤ p(m,n)*ε*σ_i
where p(m,n) is a modestly increasing function of m and n, and ε is the machine precision.
If only singular values are computed, they are computed more accurately than when some singular vectors
are also computed (that is, the function p(m,n) is smaller).
If u_i is the corresponding exact left singular vector of B, and w_i is the corresponding computed left singular
vector, then the angle θ(u_i, w_i) between them is bounded as follows:
θ(u_i, w_i) ≤ p(m,n)*ε / min_{i≠j}(|σ_i - σ_j|/|σ_i + σ_j|).
Here min_{i≠j}(|σ_i - σ_j|/|σ_i + σ_j|) is the relative gap between σ_i and the other singular values. A similar
error bound holds for the right singular vectors.
The total number of real floating-point operations is roughly proportional to n^2 if only the singular values are
computed. About 6*n^2*nru additional operations (12*n^2*nru for complex flavors) are required to compute the
left singular vectors and about 6*n^2*ncvt operations (12*n^2*ncvt for complex flavors) to compute the right
singular vectors.
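
For illustration, the following minimal sketch (not part of the original manual text; the matrix values are arbitrary) computes the full SVD of a small upper bidiagonal matrix with LAPACKE_dbdsqr, accumulating the singular vectors from identity matrices:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* 3-by-3 upper bidiagonal matrix B: diagonal d, superdiagonal e */
    lapack_int n = 3;
    double d[3] = {3.0, 2.0, 1.0};
    double e[2] = {0.5, 0.25};
    /* Start U and VT as identity matrices so that, on exit,
       u holds the left and vt the right singular vectors of B */
    double u[9]  = {1.0, 0.0, 0.0,  0.0, 1.0, 0.0,  0.0, 0.0, 1.0};
    double vt[9] = {1.0, 0.0, 0.0,  0.0, 1.0, 0.0,  0.0, 0.0, 1.0};
    double c_dummy[1];                       /* not referenced because ncc = 0 */

    lapack_int info = LAPACKE_dbdsqr(LAPACK_COL_MAJOR, 'U', n,
                                     n /* ncvt */, n /* nru */, 0 /* ncc */,
                                     d, e, vt, n, u, n, c_dummy, 1);
    if (info == 0)
        for (lapack_int i = 0; i < n; ++i)
            printf("sigma[%d] = %g\n", (int)i, d[i]);  /* singular values, decreasing */
    return (int)info;
}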

?bdsdc
Computes the singular value decomposition of a real
bidiagonal matrix using a divide and conquer method.

Syntax
lapack_int LAPACKE_sbdsdc (int matrix_layout, char uplo, char compq, lapack_int n,
float* d, float* e, float* u, lapack_int ldu, float* vt, lapack_int ldvt, float* q,
lapack_int* iq);

lapack_int LAPACKE_dbdsdc (int matrix_layout, char uplo, char compq, lapack_int n,
double* d, double* e, double* u, lapack_int ldu, double* vt, lapack_int ldvt, double*
q, lapack_int* iq);

Include Files
• mkl.h

Description
The routine computes the Singular Value Decomposition (SVD) of a real n-by-n (upper or lower) bidiagonal
matrix B: B = U*Σ*V^T, using a divide and conquer method, where Σ is a diagonal matrix with non-negative
diagonal elements (the singular values of B), and U and V are orthogonal matrices of left and right singular
vectors, respectively. ?bdsdc can be used to compute all singular values, and optionally, singular vectors or
singular vectors in compact form.
This routine uses ?lasd0, ?lasd1, ?lasd2, ?lasd3, ?lasd4, ?lasd5, ?lasd6, ?lasd7, ?lasd8, ?lasd9,
?lasda, ?lasdq, ?lasdt.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

If uplo = 'U', B is an upper bidiagonal matrix.

If uplo = 'L', B is a lower bidiagonal matrix.

compq Must be 'N', 'P', or 'I'.

If compq = 'N', compute singular values only.

If compq = 'P', compute singular values and compute singular vectors in


compact form.
If compq = 'I', compute singular values and singular vectors.

n The order of the matrix B (n ≥ 0).

d, e Arrays:
d contains the n diagonal elements of the bidiagonal matrix B. The size of d
must be at least max(1, n).
e contains the off-diagonal elements of the bidiagonal matrix B. The size of
e must be at least max(1, n).

ldu The leading dimension of the output array u; ldu≥ 1.


If singular vectors are desired, then ldu≥ max(1, n), regardless of the value
of matrix_layout.

ldvt The leading dimension of the output array vt; ldvt≥ 1.


If singular vectors are desired, then ldvt≥ max(1, n), regardless of the value
of matrix_layout.


Output Parameters

d If info = 0, overwritten by the singular values of B.

e On exit, e is overwritten.

u, vt, q Arrays: u (size ldu*n), vt (size ldvt*n), q (size at least n*(11 + 2*smlsiz
+ 8*int(log_2(n/(smlsiz+1)))), where smlsiz is returned by ilaenv and is
equal to the maximum size of the subproblems at the bottom of the
computation tree).
If compq = 'I', then on exit u contains the left singular vectors of the
bidiagonal matrix B, unless info≠ 0 (see info). For other values of compq, u
is not referenced.
If compq = 'I', then on exit vt^T contains the right singular vectors of the
bidiagonal matrix B, unless info≠ 0 (see info). For other values of compq,
vt is not referenced.
If compq = 'P', then on exit, if info = 0, q and iq contain the left and
right singular vectors in a compact form. Specifically, q contains all the
float (for sbdsdc) or double (for dbdsdc) data for singular vectors. For
other values of compq, q is not referenced.

iq
Array, size at least n*(3 + 3*int(log_2(n/(smlsiz+1)))), where smlsiz is
returned by ilaenv and is equal to the maximum size of the subproblems at
the bottom of the computation tree.
If compq = 'P', then on exit, if info = 0, q and iq contain the left and
right singular vectors in a compact form. Specifically, iq contains all the
lapack_int data for singular vectors. For other values of compq, iq is not
referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, the algorithm failed to compute a singular value. The update process of divide and conquer
failed.
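
As a usage sketch (not from the manual; the values are arbitrary), the call below computes the singular values and full singular vectors of a small upper bidiagonal matrix with LAPACKE_dbdsdc. The q and iq arguments are only referenced when compq = 'P', so dummy arrays are passed here:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 4;
    double d[4] = {4.0, 3.0, 2.0, 1.0};      /* diagonal of B      */
    double e[4] = {0.5, 0.5, 0.5, 0.0};      /* superdiagonal of B */
    double u[16], vt[16];                    /* left/right singular vectors */
    double q_dummy[1];                       /* only used when compq = 'P'  */
    lapack_int iq_dummy[1];

    lapack_int info = LAPACKE_dbdsdc(LAPACK_COL_MAJOR, 'U', 'I', n,
                                     d, e, u, n, vt, n, q_dummy, iq_dummy);
    if (info == 0)
        for (lapack_int i = 0; i < n; ++i)
            printf("sigma[%d] = %g\n", (int)i, d[i]);
    return (int)info;
}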

Symmetric Eigenvalue Problems: LAPACK Computational Routines


Symmetric eigenvalue problems are posed as follows: given an n-by-n real symmetric or complex
Hermitian matrix A, find the eigenvalues λ and the corresponding eigenvectors z that satisfy the equation
A*z = λ*z (or, equivalently, z^H*A = λ*z^H).
In such eigenvalue problems, all n eigenvalues are real not only for real symmetric but also for complex
Hermitian matrices A, and there exists an orthonormal system of n eigenvectors. If A is a symmetric or
Hermitian positive-definite matrix, all eigenvalues are positive.
To solve a symmetric eigenvalue problem with LAPACK, you usually need to reduce the matrix to tridiagonal
form and then solve the eigenvalue problem with the tridiagonal matrix obtained. LAPACK includes routines
for reducing the matrix to a tridiagonal form by an orthogonal (or unitary) similarity transformation A =
Q*T*Q^H as well as for solving tridiagonal symmetric eigenvalue problems. These routines are listed in Table
"Computational Routines for Solving Symmetric Eigenvalue Problems".

There are different routines for symmetric eigenvalue problems, depending on whether you need all
eigenvectors or only some of them or eigenvalues only, whether the matrix A is positive-definite or not, and
so on.
These routines are based on three primary algorithms for computing eigenvalues and eigenvectors of
symmetric problems: the divide and conquer algorithm, the QR algorithm, and bisection followed by inverse
iteration. The divide and conquer algorithm is generally more efficient and is recommended for computing all
eigenvalues and eigenvectors. Furthermore, to solve an eigenvalue problem using the divide and conquer
algorithm, you need to call only one routine. In general, more than one routine has to be called if the QR
algorithm or bisection followed by inverse iteration is used.

Computational Routines for Solving Symmetric Eigenvalue Problems


Operation                                                     Real symmetric    Complex Hermitian
                                                              matrices          matrices

Reduce to tridiagonal form A = Q*T*Q^H (full storage)         sytrd             hetrd

Reduce to tridiagonal form A = Q*T*Q^H (packed storage)       sptrd             hptrd

Reduce to tridiagonal form A = Q*T*Q^H (band storage)         sbtrd             hbtrd

Generate matrix Q (full storage)                              orgtr             ungtr

Generate matrix Q (packed storage)                            opgtr             upgtr

Apply matrix Q (full storage)                                 ormtr             unmtr

Apply matrix Q (packed storage)                               opmtr             upmtr

Find all eigenvalues of a tridiagonal matrix T                sterf             sterf

Find all eigenvalues and eigenvectors of a tridiagonal        steqr, stedc      steqr, stedc
matrix T

Find all eigenvalues and eigenvectors of a tridiagonal        pteqr             pteqr
positive-definite matrix T

Find selected eigenvalues of a tridiagonal matrix T           stebz, stegr      stegr

Find selected eigenvectors of a tridiagonal matrix T          stein, stegr      stein, stegr

Find selected eigenvalues and eigenvectors of a real          stemr             stemr
symmetric tridiagonal matrix T

Compute the reciprocal condition numbers for the              disna             disna
eigenvectors

?sytrd
Reduces a real symmetric matrix to tridiagonal form.

Syntax
lapack_int LAPACKE_ssytrd (int matrix_layout, char uplo, lapack_int n, float* a,
lapack_int lda, float* d, float* e, float* tau);
lapack_int LAPACKE_dsytrd (int matrix_layout, char uplo, lapack_int n, double* a,
lapack_int lda, double* d, double* e, double* tau);


Include Files
• mkl.h

Description

The routine reduces a real symmetric matrix A to symmetric tridiagonal form T by an orthogonal similarity
transformation: A = Q*T*Q^T. The orthogonal matrix Q is not formed explicitly but is represented as a
product of n-1 elementary reflectors. Routines are provided for working with Q in this representation (see
Application Notes below).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

If uplo = 'U', a stores the upper triangular part of A.

If uplo = 'L', a stores the lower triangular part of A.

n The order of the matrix A (n≥ 0).

a a (size max(1, lda*n)) is an array containing either upper or lower


triangular part of the matrix A, as specified by uplo. If uplo = 'U', the
leading n-by-n upper triangular part of a contains the upper triangular part
of the matrix A, and the strictly lower triangular part of A is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of a contains the
lower triangular part of the matrix A, and the strictly upper triangular part
of A is not referenced.

lda The leading dimension of a; at least max(1, n).

Output Parameters

a On exit,
if uplo = 'U', the diagonal and first superdiagonal of A are overwritten by
the corresponding elements of the tridiagonal matrix T, and the elements
above the first superdiagonal, with the array tau, represent the orthogonal
matrix Q as a product of elementary reflectors;
if uplo = 'L', the diagonal and first subdiagonal of A are overwritten by
the corresponding elements of the tridiagonal matrix T, and the elements
below the first subdiagonal, with the array tau, represent the orthogonal
matrix Q as a product of elementary reflectors.

d, e, tau Arrays:
d contains the diagonal elements of the matrix T.
The size of d must be at least max(1, n).
e contains the off-diagonal elements of T.
The size of e must be at least max(1, n-1).
tau stores (n-1) scalars that define elementary reflectors in decomposition
of the orthogonal matrix Q in a product of n-1 elementary reflectors. tau(n)
is used as workspace.

The size of tau must be at least max(1, n).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed matrix T is exactly similar to a matrix A+E, where ||E||_2 = c(n)*ε*||A||_2, c(n) is a
modestly increasing function of n, and ε is the machine precision.
The approximate number of floating-point operations is (4/3)*n^3.
After calling this routine, you can call the following:

orgtr to form the computed matrix Q explicitly

ormtr to multiply a real matrix by Q.

The complex counterpart of this routine is ?hetrd.
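
A minimal calling sketch (illustrative values only, not from the manual; column-major layout) reducing a small symmetric matrix to tridiagonal form with LAPACKE_dsytrd:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 3;
    /* Upper triangle of a symmetric matrix A, column-major storage */
    double a[9] = {4.0, 0.0, 0.0,
                   1.0, 3.0, 0.0,
                   2.0, 1.0, 5.0};
    double d[3], e[2], tau[3];

    lapack_int info = LAPACKE_dsytrd(LAPACK_COL_MAJOR, 'U', n, a, n, d, e, tau);
    if (info == 0) {
        printf("diagonal of T    : %g %g %g\n", d[0], d[1], d[2]);
        printf("off-diagonal of T: %g %g\n", e[0], e[1]);
    }
    return (int)info;
}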

?orgtr
Generates the real orthogonal matrix Q determined
by ?sytrd.

Syntax
lapack_int LAPACKE_sorgtr (int matrix_layout, char uplo, lapack_int n, float* a,
lapack_int lda, const float* tau);
lapack_int LAPACKE_dorgtr (int matrix_layout, char uplo, lapack_int n, double* a,
lapack_int lda, const double* tau);

Include Files
• mkl.h

Description

The routine explicitly generates the n-by-n orthogonal matrix Q formed by sytrd when reducing a real
symmetric matrix A to tridiagonal form: A = Q*T*Q^T. Use this routine after a call to ?sytrd.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

Use the same uplo as supplied to ?sytrd.

n The order of the matrix Q (n≥ 0).

a, tau Arrays:
a (size max(1, lda*n)) is the array a as returned by ?sytrd.


tau is the array tau as returned by ?sytrd.

The size of tau must be at least max(1, n-1).

lda The leading dimension of a; at least max(1, n).

Output Parameters

a Overwritten by the orthogonal matrix Q.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed matrix Q differs from an exactly orthogonal matrix by a matrix E such that ||E||_2 = O(ε),
where ε is the machine precision.
The approximate number of floating-point operations is (4/3)*n^3.
The complex counterpart of this routine is ungtr.
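
The typical ?sytrd/?orgtr sequence can be sketched as follows (illustrative values, column-major layout; not from the manual); after the second call the array a holds the explicit orthogonal matrix Q:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 3;
    double a[9] = {4.0, 0.0, 0.0,
                   1.0, 3.0, 0.0,
                   2.0, 1.0, 5.0};          /* upper triangle of A, column major */
    double d[3], e[2], tau[3];

    /* Reduce A to tridiagonal form T, then generate Q explicitly in a */
    lapack_int info = LAPACKE_dsytrd(LAPACK_COL_MAJOR, 'U', n, a, n, d, e, tau);
    if (info == 0)
        info = LAPACKE_dorgtr(LAPACK_COL_MAJOR, 'U', n, a, n, tau);
    if (info == 0)
        printf("Q(0,0) = %g\n", a[0]);      /* a now holds the orthogonal matrix Q */
    return (int)info;
}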

?ormtr
Multiplies a real matrix by the real orthogonal matrix
Q determined by ?sytrd.

Syntax
lapack_int LAPACKE_sormtr (int matrix_layout, char side, char uplo, char trans,
lapack_int m, lapack_int n, const float* a, lapack_int lda, const float* tau, float*
c, lapack_int ldc);
lapack_int LAPACKE_dormtr (int matrix_layout, char side, char uplo, char trans,
lapack_int m, lapack_int n, const double* a, lapack_int lda, const double* tau,
double* c, lapack_int ldc);

Include Files
• mkl.h

Description

The routine multiplies a real matrix C by Q or Q^T, where Q is the orthogonal matrix formed by sytrd when
reducing a real symmetric matrix A to tridiagonal form: A = Q*T*Q^T. Use this routine after a call to ?sytrd.

Depending on the parameters side and trans, the routine can form one of the matrix products Q*C, Q^T*C,
C*Q, or C*Q^T (overwriting the result on C).

Input Parameters
In the descriptions below, r denotes the order of Q:
If side = 'L', r = m; if side = 'R', r = n.

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

side Must be either 'L' or 'R'.

If side = 'L', Q or Q^T is applied to C from the left.

If side = 'R', Q or Q^T is applied to C from the right.

uplo Must be 'U' or 'L'.

Use the same uplo as supplied to ?sytrd.

trans Must be either 'N' or 'T'.

If trans = 'N', the routine multiplies C by Q.

If trans = 'T', the routine multiplies C by Q^T.

m The number of rows in the matrix C (m≥ 0).

n The number of columns in C (n≥ 0).

a, c, tau a (size max(1, lda*r)) and tau are the arrays returned by ?sytrd.

The size of tau must be at least max(1, r-1).


c(size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout) contains the matrix C.

lda The leading dimension of a; lda≥ max(1, r).

ldc The leading dimension of c; ldc≥ max(1, m) for column major layout and
at least max(1, n) for row major layout .

Output Parameters

c Overwritten by the product Q*C, Q^T*C, C*Q, or C*Q^T (as specified by side
and trans).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed product differs from the exact product by a matrix E such that ||E||_2 = O(ε)*||C||_2.

The total number of floating-point operations is approximately 2*m^2*n, if side = 'L', or 2*n^2*m, if side =
'R'.
The complex counterpart of this routine is unmtr.

?hetrd
Reduces a complex Hermitian matrix to tridiagonal
form.


Syntax
lapack_int LAPACKE_chetrd( int matrix_layout, char uplo, lapack_int n,
lapack_complex_float* a, lapack_int lda, float* d, float* e, lapack_complex_float*
tau );
lapack_int LAPACKE_zhetrd( int matrix_layout, char uplo, lapack_int n,
lapack_complex_double* a, lapack_int lda, double* d, double* e, lapack_complex_double*
tau );

Include Files
• mkl.h

Description

The routine reduces a complex Hermitian matrix A to symmetric tridiagonal form T by a unitary similarity
transformation: A = Q*T*Q^H. The unitary matrix Q is not formed explicitly but is represented as a product of
n-1 elementary reflectors. Routines are provided to work with Q in this representation. (They are described
later in this section.)

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

If uplo = 'U', a stores the upper triangular part of A.

If uplo = 'L', a stores the lower triangular part of A.

n The order of the matrix A (n≥ 0).

a a (size max(1, lda*n)) is an array containing either upper or lower


triangular part of the matrix A, as specified by uplo. If uplo = 'U', the
leading n-by-n upper triangular part of a contains the upper triangular part
of the matrix A, and the strictly lower triangular part of A is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of a contains the
lower triangular part of the matrix A, and the strictly upper triangular part
of A is not referenced.

lda The leading dimension of a; at least max(1, n).

Output Parameters

a On exit,
if uplo = 'U', the diagonal and first superdiagonal of A are overwritten by
the corresponding elements of the tridiagonal matrix T, and the elements
above the first superdiagonal, with the array tau, represent the unitary
matrix Q as a product of elementary reflectors;
if uplo = 'L', the diagonal and first subdiagonal of A are overwritten by
the corresponding elements of the tridiagonal matrix T, and the elements
below the first subdiagonal, with the array tau, represent the unitary
matrix Q as a product of elementary reflectors.

d, e Arrays:

d contains the diagonal elements of the matrix T.
The dimension of d must be at least max(1, n).
e contains the off-diagonal elements of T.
The dimension of e must be at least max(1, n-1).

tau Array, size at least max(1, n-1). Stores (n-1) scalars that define elementary
reflectors in decomposition of the unitary matrix Q in a product of n-1
elementary reflectors.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed matrix T is exactly similar to a matrix A + E, where ||E||_2 = c(n)*ε*||A||_2, c(n) is a
modestly increasing function of n, and ε is the machine precision.
The approximate number of floating-point operations is (16/3)*n^3.

After calling this routine, you can call the following:

ungtr to form the computed matrix Q explicitly

unmtr to multiply a complex matrix by Q.

The real counterpart of this routine is ?sytrd.
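
A minimal sketch of the complex case (illustrative values, not from the manual; the brace initializers assume the default mapping of lapack_complex_double to the MKL_Complex16 structure):

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 2;
    /* Upper triangle of a Hermitian matrix A in column-major order */
    lapack_complex_double a[4] = {
        {4.0, 0.0}, {0.0, 0.0},     /* column 0: A(0,0); strictly lower part not referenced */
        {1.0, 1.0}, {3.0, 0.0}      /* column 1: A(0,1) = 1+i, A(1,1) = 3 */
    };
    double d[2], e[1];
    lapack_complex_double tau[1];

    lapack_int info = LAPACKE_zhetrd(LAPACK_COL_MAJOR, 'U', n, a, n, d, e, tau);
    if (info == 0)
        printf("diag(T) = %g %g, offdiag(T) = %g\n", d[0], d[1], e[0]);
    return (int)info;
}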

?ungtr
Generates the complex unitary matrix Q determined
by ?hetrd.

Syntax
lapack_int LAPACKE_cungtr (int matrix_layout, char uplo, lapack_int n,
lapack_complex_float* a, lapack_int lda, const lapack_complex_float* tau);
lapack_int LAPACKE_zungtr (int matrix_layout, char uplo, lapack_int n,
lapack_complex_double* a, lapack_int lda, const lapack_complex_double* tau);

Include Files
• mkl.h

Description

The routine explicitly generates the n-by-n unitary matrix Q formed by hetrd when reducing a complex
Hermitian matrix A to tridiagonal form: A = Q*T*Q^H. Use this routine after a call to ?hetrd.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).


uplo Must be 'U' or 'L'.

Use the same uplo as supplied to ?hetrd.

n The order of the matrix Q (n≥ 0).

a, tau Arrays:
a (size max(1, lda*n)) is the array a as returned by ?hetrd.

tau is the array tau as returned by ?hetrd.

The dimension of tau must be at least max(1, n-1).

lda The leading dimension of a; at least max(1, n).

Output Parameters

a Overwritten by the unitary matrix Q.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed matrix Q differs from an exactly unitary matrix by a matrix E such that ||E||_2 = O(ε), where
ε is the machine precision.
The approximate number of floating-point operations is (16/3)*n^3.

The real counterpart of this routine is orgtr.

?unmtr
Multiplies a complex matrix by the complex unitary
matrix Q determined by ?hetrd.

Syntax
lapack_int LAPACKE_cunmtr (int matrix_layout, char side, char uplo, char trans,
lapack_int m, lapack_int n, const lapack_complex_float* a, lapack_int lda, const
lapack_complex_float* tau, lapack_complex_float* c, lapack_int ldc);
lapack_int LAPACKE_zunmtr (int matrix_layout, char side, char uplo, char trans,
lapack_int m, lapack_int n, const lapack_complex_double* a, lapack_int lda, const
lapack_complex_double* tau, lapack_complex_double* c, lapack_int ldc);

Include Files
• mkl.h

Description

The routine multiplies a complex matrix C by Q or Q^H, where Q is the unitary matrix formed by hetrd when
reducing a complex Hermitian matrix A to tridiagonal form: A = Q*T*Q^H. Use this routine after a call to
?hetrd.
Depending on the parameters side and trans, the routine can form one of the matrix products Q*C, Q^H*C,
C*Q, or C*Q^H (overwriting the result on C).

Input Parameters
In the descriptions below, r denotes the order of Q:
If side = 'L', r = m; if side = 'R', r = n.

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

side Must be either 'L' or 'R'.

If side = 'L', Q or Q^H is applied to C from the left.

If side = 'R', Q or Q^H is applied to C from the right.

uplo Must be 'U' or 'L'.

Use the same uplo as supplied to ?hetrd.

trans Must be either 'N' or 'C'.

If trans = 'N', the routine multiplies C by Q.

If trans = 'C', the routine multiplies C by Q^H.

m The number of rows in the matrix C (m≥ 0).

n The number of columns in C (n≥ 0).

a, c, tau a (size max(1, lda*r)) and tau are the arrays returned by ?hetrd.

The dimension of tau must be at least max(1, r-1).


c(size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout) contains the matrix C.

lda The leading dimension of a; lda≥ max(1, r).

ldc The leading dimension of c; ldc≥ max(1, m) for column major layout and
ldc≥ max(1, n) for row major layout.

Output Parameters

c Overwritten by the product Q*C, Q^H*C, C*Q, or C*Q^H (as specified by side
and trans).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed product differs from the exact product by a matrix E such that ||E||_2 = O(ε)*||C||_2, where
ε is the machine precision.
The total number of floating-point operations is approximately 8*m^2*n if side = 'L' or 8*n^2*m if side =
'R'.
The real counterpart of this routine is ormtr.


?sptrd
Reduces a real symmetric matrix to tridiagonal form
using packed storage.

Syntax
lapack_int LAPACKE_ssptrd (int matrix_layout, char uplo, lapack_int n, float* ap,
float* d, float* e, float* tau);
lapack_int LAPACKE_dsptrd (int matrix_layout, char uplo, lapack_int n, double* ap,
double* d, double* e, double* tau);

Include Files
• mkl.h

Description

The routine reduces a packed real symmetric matrix A to symmetric tridiagonal form T by an orthogonal
similarity transformation: A = Q*T*Q^T. The orthogonal matrix Q is not formed explicitly but is represented as
a product of n-1 elementary reflectors. Routines are provided for working with Q in this representation. See
Application Notes below for details.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

If uplo = 'U', ap stores the packed upper triangle of A.

If uplo = 'L', ap stores the packed lower triangle of A.

n The order of the matrix A (n≥ 0).

ap Array, size at least max(1, n(n+1)/2). Contains either upper or lower


triangle of A (as specified by uplo) in the packed form described in Matrix
Storage Schemes.

Output Parameters

ap Overwritten by the tridiagonal matrix T and details of the orthogonal matrix


Q, as specified by uplo.

d, e, tau Arrays:
d contains the diagonal elements of the matrix T.
The dimension of d must be at least max(1, n).
e contains the off-diagonal elements of T.
The dimension of e must be at least max(1, n-1).
tau Stores (n-1) scalars that define elementary reflectors in decomposition
of the matrix Q in a product of n-1 reflectors.

The dimension of tau must be at least max(1, n-1).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The matrix Q is represented as a product of n-1 elementary reflectors, as follows :

• If uplo = 'U', Q = H(n-1) ... H(2)H(1)

Each H(i) has the form


H(i) = I - tau*v*v^T
where tau is a real scalar and v is a real vector with v(i+1:n) = 0 and v(i) = 1.

On exit, tau is stored in tau[i - 1], and v(1:i-1) is stored in AP, overwriting A(1:i-1, i+1).
• If uplo = 'L', Q = H(1)H(2) ... H(n-1)

Each H(i) has the form


H(i) = I - tau*v*v^T
where tau is a real scalar and v is a real vector with v(1:i) = 0 and v(i+1) = 1.

On exit, tau is stored in tau[i - 1], and v(i+2:n) is stored in AP, overwriting A(i+2:n, i).

The computed matrix T is exactly similar to a matrix A+E, where ||E||_2 = c(n)*ε*||A||_2, c(n) is a
modestly increasing function of n, and ε is the machine precision. The approximate number of floating-point
operations is (4/3)*n^3.

After calling this routine, you can call the following:

opgtr to form the computed matrix Q explicitly

opmtr to multiply a real matrix by Q.

The complex counterpart of this routine is hptrd.
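
For packed storage, a minimal sketch (illustrative values, not from the manual; upper triangle packed column by column, column-major layout) looks like this:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 3;
    /* Packed upper triangle of a symmetric 3-by-3 matrix:
       A(0,0), A(0,1), A(1,1), A(0,2), A(1,2), A(2,2) */
    double ap[6] = {4.0, 1.0, 3.0, 2.0, 1.0, 5.0};
    double d[3], e[2], tau[2];

    lapack_int info = LAPACKE_dsptrd(LAPACK_COL_MAJOR, 'U', n, ap, d, e, tau);
    if (info == 0)
        printf("diag(T) = %g %g %g\n", d[0], d[1], d[2]);
    return (int)info;
}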

?opgtr
Generates the real orthogonal matrix Q determined
by ?sptrd.

Syntax
lapack_int LAPACKE_sopgtr (int matrix_layout, char uplo, lapack_int n, const float* ap,
const float* tau, float* q, lapack_int ldq);
lapack_int LAPACKE_dopgtr (int matrix_layout, char uplo, lapack_int n, const double*
ap, const double* tau, double* q, lapack_int ldq);

Include Files
• mkl.h

Description

The routine explicitly generates the n-by-n orthogonal matrix Q formed by sptrd when reducing a packed real
symmetric matrix A to tridiagonal form: A = Q*T*Q^T. Use this routine after a call to ?sptrd.


Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'. Use the same uplo as supplied to ?sptrd.

n The order of the matrix Q (n≥ 0).

ap, tau Arrays ap and tau, as returned by ?sptrd.

The size of ap must be at least max(1, n(n+1)/2).


The size of tau must be at least max(1, n-1).

ldq The leading dimension of the output array q; at least max(1, n).

Output Parameters

q Array, size max(1, ldq*n).

Contains the computed matrix Q.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed matrix Q differs from an exactly orthogonal matrix by a matrix E such that ||E||_2 = O(ε),
where ε is the machine precision.
The approximate number of floating-point operations is (4/3)*n^3.

The complex counterpart of this routine is upgtr.

?opmtr
Multiplies a real matrix by the real orthogonal matrix
Q determined by ?sptrd.

Syntax
lapack_int LAPACKE_sopmtr (int matrix_layout, char side, char uplo, char trans,
lapack_int m, lapack_int n, const float* ap, const float* tau, float* c, lapack_int
ldc);
lapack_int LAPACKE_dopmtr (int matrix_layout, char side, char uplo, char trans,
lapack_int m, lapack_int n, const double* ap, const double* tau, double* c, lapack_int
ldc);

Include Files
• mkl.h

Description

The routine multiplies a real matrix C by Q or Q^T, where Q is the orthogonal matrix formed by sptrd when
reducing a packed real symmetric matrix A to tridiagonal form: A = Q*T*Q^T. Use this routine after a call to
?sptrd.
Depending on the parameters side and trans, the routine can form one of the matrix products Q*C, Q^T*C,
C*Q, or C*Q^T (overwriting the result on C).

Input Parameters
In the descriptions below, r denotes the order of Q:
If side = 'L', r = m; if side = 'R', r = n.

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

side Must be either 'L' or 'R'.

If side = 'L', Q or Q^T is applied to C from the left.

If side = 'R', Q or Q^T is applied to C from the right.

uplo Must be 'U' or 'L'.

Use the same uplo as supplied to ?sptrd.

trans Must be either 'N' or 'T'.

If trans = 'N', the routine multiplies C by Q.

If trans = 'T', the routine multiplies C by Q^T.

m The number of rows in the matrix C (m≥ 0).

n The number of columns in C (n≥ 0).

ap, tau, c ap and tau are the arrays returned by ?sptrd.

The dimension of ap must be at least max(1, r(r+1)/2).


The dimension of tau must be at least max(1, r-1).
c(size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout) contains the matrix C.

ldc The leading dimension of c; ldc≥ max(1, m) for column major layout and
ldc≥ max(1, n) for row major layout.

Output Parameters

c Overwritten by the product Q*C, Q^T*C, C*Q, or C*Q^T (as specified by side
and trans).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.


Application Notes
The computed product differs from the exact product by a matrix E such that ||E||_2 = O(ε)*||C||_2, where
ε is the machine precision.
The total number of floating-point operations is approximately 2*m^2*n if side = 'L', or 2*n^2*m if side =
'R'.
The complex counterpart of this routine is upmtr.

?hptrd
Reduces a complex Hermitian matrix to tridiagonal
form using packed storage.

Syntax
lapack_int LAPACKE_chptrd( int matrix_layout, char uplo, lapack_int n,
lapack_complex_float* ap, float* d, float* e, lapack_complex_float* tau );
lapack_int LAPACKE_zhptrd( int matrix_layout, char uplo, lapack_int n,
lapack_complex_double* ap, double* d, double* e, lapack_complex_double* tau );

Include Files
• mkl.h

Description

The routine reduces a packed complex Hermitian matrix A to symmetric tridiagonal form T by a unitary
similarity transformation: A = Q*T*Q^H. The unitary matrix Q is not formed explicitly but is represented as a
product of n-1 elementary reflectors. Routines are provided for working with Q in this representation (see
Application Notes below).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

If uplo = 'U', ap stores the packed upper triangle of A.

If uplo = 'L', ap stores the packed lower triangle of A.

n The order of the matrix A (n≥ 0).

ap Array, size at least max(1, n(n+1)/2). Contains either upper or lower


triangle of A (as specified by uplo) in the packed form described in Matrix
Storage Schemes.

Output Parameters

ap Overwritten by the tridiagonal matrix T and details of the unitary matrix Q,


as specified by uplo.

d, e Arrays:
d contains the diagonal elements of the matrix T.
The size of d must be at least max(1, n).

e contains the off-diagonal elements of T.
The size of e must be at least max(1, n-1).

tau Array, size at least max(1, n-1). Stores (n-1) scalars that define elementary
reflectors in decomposition of the unitary matrix Q in a product of
reflectors.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed matrix T is exactly similar to a matrix A + E, where ||E||_2 = c(n)*ε*||A||_2, c(n) is a
modestly increasing function of n, and ε is the machine precision.
The approximate number of floating-point operations is (16/3)*n^3.

After calling this routine, you can call the following:

upgtr to form the computed matrix Q explicitly

upmtr to multiply a complex matrix by Q.

The real counterpart of this routine is sptrd.

?upgtr
Generates the complex unitary matrix Q determined
by ?hptrd.

Syntax
lapack_int LAPACKE_cupgtr (int matrix_layout, char uplo, lapack_int n, const
lapack_complex_float* ap, const lapack_complex_float* tau, lapack_complex_float* q,
lapack_int ldq);
lapack_int LAPACKE_zupgtr (int matrix_layout, char uplo, lapack_int n, const
lapack_complex_double* ap, const lapack_complex_double* tau, lapack_complex_double* q,
lapack_int ldq);

Include Files
• mkl.h

Description

The routine explicitly generates the n-by-n unitary matrix Q formed by hptrd when reducing a packed
complex Hermitian matrix A to tridiagonal form: A = Q*T*Q^H. Use this routine after a call to ?hptrd.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'. Use the same uplo as supplied to ?hptrd.


n The order of the matrix Q (n≥ 0).

ap, tau Arrays ap and tau, as returned by ?hptrd.

The dimension of ap must be at least max(1, n(n+1)/2).


The dimension of tau must be at least max(1, n-1).

ldq The leading dimension of the output array q;


at least max(1, n).

Output Parameters

q Array, size max(1, ldq*n).

Contains the computed matrix Q.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed matrix Q differs from an exactly unitary matrix by a matrix E such that ||E||_2 = O(ε),
where ε is the machine precision.
The approximate number of floating-point operations is (16/3)*n^3.

The real counterpart of this routine is opgtr.

?upmtr
Multiplies a complex matrix by the unitary matrix Q
determined by ?hptrd.

Syntax
lapack_int LAPACKE_cupmtr (int matrix_layout, char side, char uplo, char trans,
lapack_int m, lapack_int n, const lapack_complex_float* ap, const lapack_complex_float*
tau, lapack_complex_float* c, lapack_int ldc);
lapack_int LAPACKE_zupmtr (int matrix_layout, char side, char uplo, char trans,
lapack_int m, lapack_int n, const lapack_complex_double* ap, const
lapack_complex_double* tau, lapack_complex_double* c, lapack_int ldc);

Include Files
• mkl.h

Description

The routine multiplies a complex matrix C by Q or Q^H, where Q is the unitary matrix formed by hptrd when
reducing a packed complex Hermitian matrix A to tridiagonal form: A = Q*T*Q^H. Use this routine after a call
to ?hptrd.

Depending on the parameters side and trans, the routine can form one of the matrix products Q*C, Q^H*C,
C*Q, or C*Q^H (overwriting the result on C).

Input Parameters
In the descriptions below, r denotes the order of Q:
If side = 'L', r = m; if side = 'R', r = n.

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

side Must be either 'L' or 'R'.

If side = 'L', Q or Q^H is applied to C from the left.

If side = 'R', Q or Q^H is applied to C from the right.

uplo Must be 'U' or 'L'.

Use the same uplo as supplied to ?hptrd.

trans Must be either 'N' or 'C'.

If trans = 'N', the routine multiplies C by Q.

If trans = 'C', the routine multiplies C by Q^H.

m The number of rows in the matrix C (m≥ 0).

n The number of columns in C (n≥ 0).

ap, tau, c ap and tau are the arrays returned by ?hptrd.

The size of ap must be at least max(1, r(r+1)/2).


The size of tau must be at least max(1, r-1).
c(size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout) contains the matrix C.

ldc The leading dimension of c; ldc≥ max(1, m) for column major layout and
ldc≥ max(1, n) for row major layout .

Output Parameters

c Overwritten by the product Q*C, Q^H*C, C*Q, or C*Q^H (as specified by side
and trans).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed product differs from the exact product by a matrix E such that ||E||_2 = O(ε)*||C||_2, where
ε is the machine precision.
The total number of floating-point operations is approximately 8*m^2*n if side = 'L' or 8*n^2*m if side =
'R'.
The real counterpart of this routine is opmtr.


?sbtrd
Reduces a real symmetric band matrix to tridiagonal
form.

Syntax
lapack_int LAPACKE_ssbtrd (int matrix_layout, char vect, char uplo, lapack_int n,
lapack_int kd, float* ab, lapack_int ldab, float* d, float* e, float* q, lapack_int
ldq);
lapack_int LAPACKE_dsbtrd (int matrix_layout, char vect, char uplo, lapack_int n,
lapack_int kd, double* ab, lapack_int ldab, double* d, double* e, double* q,
lapack_int ldq);

Include Files
• mkl.h

Description

The routine reduces a real symmetric band matrix A to symmetric tridiagonal form T by an orthogonal
similarity transformation: A = Q*T*Q^T. The orthogonal matrix Q is determined as a product of Givens
rotations.
If required, the routine can also form the matrix Q explicitly.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

vect Must be 'V', 'N', or 'U'.

If vect = 'V', the routine returns the explicit matrix Q.

If vect = 'N', the routine does not return Q.

If vect = 'U', the routine updates matrix X by forming X*Q.

uplo Must be 'U' or 'L'.

If uplo = 'U', ab stores the upper triangular part of A.

If uplo = 'L', ab stores the lower triangular part of A.

n The order of the matrix A (n≥0).

kd The number of super- or sub-diagonals in A


(kd≥0).

ab, q ab(size at least max(1, ldab*n) for column major layout and at least
max(1, ldab*(kd+ 1)) for row major layout) is an array containing either
upper or lower triangular part of the matrix A (as specified by uplo) in band
storage format.
q (size max(1, ldq*n)) is an array.
If vect = 'U', the q array must contain an n-by-n matrix X.

If vect = 'N' or 'V', the q parameter need not be set.

ldab The leading dimension of ab; at least kd+1 for column major layout and n
for row major layout .

ldq The leading dimension of q. Constraints:


ldq≥ max(1, n) if vect = 'V' or 'U';
ldq≥ 1 if vect = 'N'.

Output Parameters

ab On exit, the diagonal elements of the array ab are overwritten by the


diagonal elements of the tridiagonal matrix T. If kd > 0, the elements on
the first superdiagonal (if uplo = 'U') or the first subdiagonal (if uplo =
'L') are overwritten by the off-diagonal elements of T. The rest of ab is
overwritten by values generated during the reduction.

d, e, q Arrays:
d contains the diagonal elements of the matrix T.
The size of d must be at least max(1, n).
e contains the off-diagonal elements of T.
The size of e must be at least max(1, n-1).
q is not referenced if vect = 'N'.

If vect = 'V', q contains the n-by-n matrix Q.

If vect = 'U', q contains the product X*Q.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed matrix T is exactly similar to a matrix A+E, where ||E||_2 = c(n)*ε*||A||_2, c(n) is a
modestly increasing function of n, and ε is the machine precision. The computed matrix Q differs from an
exactly orthogonal matrix by a matrix E such that ||E||_2 = O(ε).

The total number of floating-point operations is approximately 6*n^2*kd if vect = 'N', with 3*n^3*(kd-1)/kd
additional operations if vect = 'V'.

The complex counterpart of this routine is hbtrd.
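
A minimal band-storage sketch (illustrative values, not from the manual; kd = 1, upper triangle, column-major band storage with ldab = kd+1) computing only the tridiagonal form (vect = 'N'):

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 4, kd = 1, ldab = 2;
    /* Upper band storage: row 0 of each column holds the superdiagonal
       (first entry unused), row 1 holds the diagonal */
    double ab[8] = {0.0, 2.0,   1.0, 2.0,   1.0, 2.0,   1.0, 2.0};
    double d[4], e[3], q_dummy[1];        /* q not referenced when vect = 'N' */

    lapack_int info = LAPACKE_dsbtrd(LAPACK_COL_MAJOR, 'N', 'U', n, kd,
                                     ab, ldab, d, e, q_dummy, 1);
    if (info == 0)
        printf("diag(T) = %g %g %g %g\n", d[0], d[1], d[2], d[3]);
    return (int)info;
}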

?hbtrd
Reduces a complex Hermitian band matrix to
tridiagonal form.

Syntax
lapack_int LAPACKE_chbtrd( int matrix_layout, char vect, char uplo, lapack_int n,
lapack_int kd, lapack_complex_float* ab, lapack_int ldab, float* d, float* e,
lapack_complex_float* q, lapack_int ldq );


lapack_int LAPACKE_zhbtrd( int matrix_layout, char vect, char uplo, lapack_int n,
lapack_int kd, lapack_complex_double* ab, lapack_int ldab, double* d, double* e,
lapack_complex_double* q, lapack_int ldq );

Include Files
• mkl.h

Description

The routine reduces a complex Hermitian band matrix A to symmetric tridiagonal form T by a unitary
similarity transformation: A = Q*T*Q^H. The unitary matrix Q is determined as a product of Givens rotations.

If required, the routine can also form the matrix Q explicitly.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

vect Must be 'V', 'N', or 'U'.

If vect = 'V', the routine returns the explicit matrix Q.

If vect = 'N', the routine does not return Q.

If vect = 'U', the routine updates matrix X by forming X*Q.

uplo Must be 'U' or 'L'.

If uplo = 'U', ab stores the upper triangular part of A.

If uplo = 'L', ab stores the lower triangular part of A.

n The order of the matrix A (n≥ 0).

kd The number of super- or sub-diagonals in A


(kd≥ 0).

ab ab(size at least max(1, ldab*n) for column major layout and at least
max(1, ldab*(kd+ 1)) for row major layout) is an array containing either
upper or lower triangular part of the matrix A (as specified by uplo) in band
storage format.

q q (size max(1, ldq*n)) is an array.


If vect = 'U', the q array must contain an n-by-n matrix X.

If vect = 'N' or 'V', the q parameter need not be set.

ldab The leading dimension of ab; at least kd+1 for column major layout and n
for row major layout.

ldq The leading dimension of q. Constraints:


ldq≥ max(1, n) if vect = 'V' or 'U';
ldq≥ 1 if vect = 'N'.

Output Parameters

ab On exit, the diagonal elements of the array ab are overwritten by the


diagonal elements of the tridiagonal matrix T. If kd > 0, the elements on
the first superdiagonal (if uplo = 'U') or the first subdiagonal (if uplo =
'L') are overwritten by the off-diagonal elements of T. The rest of ab is
overwritten by values generated during the reduction.

d, e Arrays:
d contains the diagonal elements of the matrix T.
The dimension of d must be at least max(1, n).
e contains the off-diagonal elements of T.
The dimension of e must be at least max(1, n-1).

q If vect = 'N', q is not referenced.

If vect = 'V', q contains the n-by-n matrix Q.

If vect = 'U', q contains the product X*Q.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed matrix T is exactly similar to a matrix A + E, where ||E||_2 = c(n)*ε*||A||_2, c(n) is a
modestly increasing function of n, and ε is the machine precision. The computed matrix Q differs from an
exactly unitary matrix by a matrix E such that ||E||_2 = O(ε).

The total number of floating-point operations is approximately 20*n^2*kd if vect = 'N', with
10*n^3*(kd-1)/kd additional operations if vect = 'V'.
The real counterpart of this routine is sbtrd.

?sterf
Computes all eigenvalues of a real symmetric
tridiagonal matrix using QR algorithm.

Syntax
lapack_int LAPACKE_ssterf (lapack_int n, float* d, float* e);
lapack_int LAPACKE_dsterf (lapack_int n, double* d, double* e);

Include Files
• mkl.h

Description

The routine computes all the eigenvalues of a real symmetric tridiagonal matrix T (which can be obtained by
reducing a symmetric or Hermitian matrix to tridiagonal form). The routine uses a square-root-free variant of
the QR algorithm.
If you need not only the eigenvalues but also the eigenvectors, call steqr.


Input Parameters

n The order of the matrix T (n≥ 0).

d, e Arrays:
d contains the diagonal elements of T.
The dimension of d must be at least max(1, n).
e contains the off-diagonal elements of T.
The dimension of e must be at least max(1, n-1).

Output Parameters

d The n eigenvalues in ascending order, unless info > 0.


See also info.

e On exit, the array is overwritten; see info.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = i, the algorithm failed to find all the eigenvalues after 30n iterations:

i off-diagonal elements have not converged to zero. On exit, d and e contain, respectively, the diagonal and
off-diagonal elements of a tridiagonal matrix orthogonally similar to T.
If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed eigenvalues and eigenvectors are exact for a matrix T+E such that ||E||_2 = O(ε)*||T||_2,
where ε is the machine precision.
If λ_i is an exact eigenvalue, and μ_i is the corresponding computed value, then
|μ_i - λ_i| ≤ c(n)*ε*||T||_2
where c(n) is a modestly increasing function of n.

The total number of floating-point operations depends on how rapidly the algorithm converges. Typically, it is
about 14*n^2.
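
A minimal sketch (illustrative values, not from the manual) computing the eigenvalues of a small symmetric tridiagonal matrix with LAPACKE_dsterf:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 4;
    double d[4] = {2.0, 2.0, 2.0, 2.0};   /* diagonal of T      */
    double e[3] = {1.0, 1.0, 1.0};        /* off-diagonal of T  */

    lapack_int info = LAPACKE_dsterf(n, d, e);
    if (info == 0)
        for (lapack_int i = 0; i < n; ++i)
            printf("lambda[%d] = %g\n", (int)i, d[i]);   /* ascending order */
    return (int)info;
}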

?steqr
Computes all eigenvalues and eigenvectors of a
symmetric or Hermitian matrix reduced to tridiagonal
form (QR algorithm).

Syntax
lapack_int LAPACKE_ssteqr( int matrix_layout, char compz, lapack_int n, float* d,
float* e, float* z, lapack_int ldz );
lapack_int LAPACKE_dsteqr( int matrix_layout, char compz, lapack_int n, double* d,
double* e, double* z, lapack_int ldz );
lapack_int LAPACKE_csteqr( int matrix_layout, char compz, lapack_int n, float* d,
float* e, lapack_complex_float* z, lapack_int ldz );

lapack_int LAPACKE_zsteqr( int matrix_layout, char compz, lapack_int n, double* d,
double* e, lapack_complex_double* z, lapack_int ldz );

Include Files
• mkl.h

Description

The routine computes all the eigenvalues and (optionally) all the eigenvectors of a real symmetric tridiagonal
matrix T. In other words, the routine can compute the spectral factorization: T = Z*Λ*Z^T. Here Λ is a
diagonal matrix whose diagonal elements are the eigenvalues λ_i; Z is an orthogonal matrix whose columns
are eigenvectors. Thus,
T*z_i = λ_i*z_i for i = 1, 2, ..., n.
The routine normalizes the eigenvectors so that ||z_i||_2 = 1.

You can also use the routine for computing the eigenvalues and eigenvectors of an arbitrary real symmetric
(or complex Hermitian) matrix A reduced to tridiagonal form T: A = Q*T*Q^H. In this case, the spectral
factorization is as follows: A = Q*T*Q^H = (Q*Z)*Λ*(Q*Z)^H. Before calling ?steqr, you must reduce A to
tridiagonal form and generate the explicit matrix Q by calling the following routines:

for real matrices: for complex matrices:

full storage ?sytrd, ?orgtr ?hetrd, ?ungtr

packed storage ?sptrd, ?opgtr ?hptrd, ?upgtr

band storage ?sbtrd(vect='V') ?hbtrd(vect='V')

If you need eigenvalues only, it's more efficient to call sterf. If T is positive-definite, pteqr can compute small
eigenvalues more accurately than ?steqr.

To solve the problem by a single call, use one of the divide and conquer routines stevd, syevd, spevd, or
sbevd for real symmetric matrices or heevd, hpevd, or hbevd for complex Hermitian matrices.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

compz Must be 'N' or 'I' or 'V'.

If compz = 'N', the routine computes eigenvalues only.

If compz = 'I', the routine computes the eigenvalues and eigenvectors of


the tridiagonal matrix T.
If compz = 'V', the routine computes the eigenvalues and eigenvectors of
the original symmetric matrix. On entry, z must contain the orthogonal
matrix used to reduce the original matrix to tridiagonal form.

n The order of the matrix T (n≥ 0).

d, e Arrays:
d contains the diagonal elements of T.
The size of d must be at least max(1, n).
e contains the off-diagonal elements of T.


The size of e must be at least max(1, n-1).

z Array, size max(1, ldz*n).

If compz = 'N' or 'I', z need not be set.

If compz = 'V', z must contain the orthogonal matrix used in the reduction
to tridiagonal form.

ldz The leading dimension of z. Constraints:


ldz≥ 1 if compz = 'N';
ldz≥ max(1, n) if compz = 'V' or 'I'.

Output Parameters

d The n eigenvalues in ascending order, unless info > 0.


See also info.

e On exit, the array is overwritten; see info.

z If info = 0, contains the n-by-n matrix the columns of which are


orthonormal eigenvectors (the i-th column corresponds to the i-th
eigenvalue).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = i, the algorithm failed to find all the eigenvalues after 30n iterations: i off-diagonal elements have
not converged to zero. On exit, d and e contain, respectively, the diagonal and off-diagonal elements of a
tridiagonal matrix orthogonally similar to T.
If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed eigenvalues and eigenvectors are exact for a matrix T+E such that ||E||_2 = O(ε)*||T||_2,
where ε is the machine precision.
If λ_i is an exact eigenvalue, and μ_i is the corresponding computed value, then
|μ_i - λ_i| ≤ c(n)*ε*||T||_2

where c(n) is a modestly increasing function of n.

If z_i is the corresponding exact eigenvector, and w_i is the corresponding computed vector, then the angle
θ(z_i, w_i) between them is bounded as follows:
θ(z_i, w_i) ≤ c(n)*ε*||T||_2 / min_{i≠j}|λ_i - λ_j|.
The total number of floating-point operations depends on how rapidly the algorithm converges. Typically, it is
about
24*n^2 if compz = 'N';
7*n^3 (for complex flavors, 14*n^3) if compz = 'V' or 'I'.
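
A minimal sketch (illustrative values, not from the manual) computing all eigenvalues and eigenvectors of a tridiagonal matrix with LAPACKE_dsteqr and compz = 'I':

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 3;
    double d[3] = {2.0, 2.0, 2.0};     /* diagonal of T     */
    double e[2] = {1.0, 1.0};          /* off-diagonal of T */
    double z[9] = {0};                 /* filled with eigenvectors when compz = 'I' */

    lapack_int info = LAPACKE_dsteqr(LAPACK_COL_MAJOR, 'I', n, d, e, z, n);
    if (info == 0)
        for (lapack_int i = 0; i < n; ++i)
            printf("lambda[%d] = %g, first eigenvector component = %g\n",
                   (int)i, d[i], z[i * n]);   /* column i, row 0 (column major) */
    return (int)info;
}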

?stemr
Computes selected eigenvalues and eigenvectors of a
real symmetric tridiagonal matrix.

Syntax
lapack_int LAPACKE_sstemr( int matrix_layout, char jobz, char range, lapack_int n,
const float* d, float* e, float vl, float vu, lapack_int il, lapack_int iu,
lapack_int* m, float* w, float* z, lapack_int ldz, lapack_int nzc, lapack_int* isuppz,
lapack_logical* tryrac );
lapack_int LAPACKE_dstemr( int matrix_layout, char jobz, char range, lapack_int n,
const double* d, double* e, double vl, double vu, lapack_int il, lapack_int iu,
lapack_int* m, double* w, double* z, lapack_int ldz, lapack_int nzc, lapack_int*
isuppz, lapack_logical* tryrac );
lapack_int LAPACKE_cstemr( int matrix_layout, char jobz, char range, lapack_int n,
const float* d, float* e, float vl, float vu, lapack_int il, lapack_int iu,
lapack_int* m, float* w, lapack_complex_float* z, lapack_int ldz, lapack_int nzc,
lapack_int* isuppz, lapack_logical* tryrac );
lapack_int LAPACKE_zstemr( int matrix_layout, char jobz, char range, lapack_int n,
const double* d, double* e, double vl, double vu, lapack_int il, lapack_int iu,
lapack_int* m, double* w, lapack_complex_double* z, lapack_int ldz, lapack_int nzc,
lapack_int* isuppz, lapack_logical* tryrac );

Include Files
• mkl.h

Description

The routine computes selected eigenvalues and, optionally, eigenvectors of a real symmetric tridiagonal
matrix T. Any such unreduced matrix has a well-defined set of pairwise different real eigenvalues, and the
corresponding real eigenvectors are pairwise orthogonal.
The spectrum may be computed either completely or partially by specifying either an interval (vl,vu] or a
range of indices il:iu for the desired eigenvalues.

Depending on the number of desired eigenvalues, these are computed either by bisection or the dqds
algorithm. Numerically orthogonal eigenvectors are computed by the use of various suitable L*D*L^T
factorizations near clusters of close eigenvalues (referred to as RRRs, Relatively Robust Representations). An
informal sketch of the algorithm follows.
For each unreduced block (submatrix) of T,

a. Compute T - σ*I = L*D*L^T, so that L and D define all the wanted eigenvalues to high relative
accuracy. This means that small relative changes in the entries of L and D cause only small relative
changes in the eigenvalues and eigenvectors. The standard (unfactored) representation of the
tridiagonal matrix T does not have this property in general.
b. Compute the eigenvalues to suitable accuracy. If the eigenvectors are desired, the algorithm attains full
accuracy of the computed eigenvalues only right before the corresponding vectors have to be
computed, see steps c and d.
c. For each cluster of close eigenvalues, select a new shift close to the cluster, find a new factorization,
and refine the shifted eigenvalues to suitable accuracy.
d. For each eigenvalue with a large enough relative separation compute the corresponding eigenvector by
forming a rank revealing twisted factorization. Go back to step c for any clusters that remain.

For more details, see: [Dhillon04], [Dhillon04-02], [Dhillon97]
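
As a calling sketch (not from the manual; the values are arbitrary), the fragment below computes all eigenvalues and eigenvectors of a small tridiagonal matrix with LAPACKE_dstemr:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 4, m = 0;
    double d[4] = {1.0, 2.0, 3.0, 4.0};      /* diagonal of T                      */
    double e[4] = {0.1, 0.1, 0.1, 0.0};      /* off-diagonal; e[n-1] is workspace  */
    double w[4], z[16];
    lapack_int isuppz[8];
    lapack_logical tryrac = 1;

    /* range = 'A': all eigenvalues; jobz = 'V': eigenvectors as well */
    lapack_int info = LAPACKE_dstemr(LAPACK_COL_MAJOR, 'V', 'A', n, d, e,
                                     0.0, 0.0, 0, 0, &m, w, z, n,
                                     n /* nzc */, isuppz, &tryrac);
    if (info == 0)
        for (lapack_int i = 0; i < m; ++i)
            printf("lambda[%d] = %g\n", (int)i, w[i]);
    return (int)info;
}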


Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

range Must be 'A' or 'V' or 'I'.

If range = 'A', the routine computes all eigenvalues.

If range = 'V', the routine computes all eigenvalues in the half-open


interval: (vl, vu].

If range = 'I', the routine computes eigenvalues with indices il to iu.

n The order of the matrix T (n≥0).

d Array, size n.
Contains n diagonal elements of the tridiagonal matrix T.

e Array, size n.
Contains (n-1) off-diagonal elements of the tridiagonal matrix T in
elements 0 to n-2 of e. e[n - 1] need not be set on input, but is used
internally as workspace.

vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues. Constraint: vl<vu.

If range = 'A' or 'I', vl and vu are not referenced.

il, iu
If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1≤il≤iu≤n, if n>0.

If range = 'A' or 'V', il and iu are not referenced.

ldz The leading dimension of the output array z.


if jobz = 'V', then ldz≥ max(1, n) for column major layout and ldz≥
max(1, m) for row major layout ;

ldz≥ 1 otherwise.

nzc The number of eigenvectors to be held in the array z.


If range = 'A', then nzc≥max(1, n);

If range = 'V', then nzc is greater than or equal to the number of


eigenvalues in the half-open interval: (vl, vu].

If range = 'I', then nzc≥iu-il+1.

If nzc = -1, then a workspace query is assumed; the routine calculates the
number of columns of the array z that are needed to hold the eigenvectors.

This value is returned as the first entry of the array z, and no error
message related to nzc is issued by the routine xerbla.

tryrac
If tryrac is true, it indicates that the code should check whether the
tridiagonal matrix defines its eigenvalues to high relative accuracy. If so,
the code uses relative-accuracy preserving algorithms that might be (a bit)
slower depending on the matrix. If the matrix does not define its
eigenvalues to high relative accuracy, the code can use possibly faster
algorithms.
If tryrac is not true, the code is not required to guarantee relatively
accurate eigenvalues and can use the fastest possible techniques.

Output Parameters

d On exit, the array d is overwritten.

e On exit, the array e is overwritten.

m
The total number of eigenvalues found, 0≤m≤n.

If range = 'A', then m=n, and if range = 'I', then m=iu-il+1.

w Array, size n.
The first m elements contain the selected eigenvalues in ascending order.

z Array z(size max(1, ldz*m) for column major layout and max(1, ldz*n) for
row major layout) .
If jobz = 'V', and info = 0, then the first m columns of z contain the
orthonormal eigenvectors of the matrix T corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated
with w(i).

If jobz = 'N', then z is not referenced.

Note: the exact value of m is not known in advance and can be computed
with a workspace query by setting nzc=-1, see description of the
parameter nzc.

isuppz
Array, size (2*max(1, m)).

The support of the eigenvectors in z, that is the indices indicating the


nonzero elements in z. The i-th computed eigenvector is nonzero only in
elements isuppz[2*i - 2] through isuppz[2*i - 1]. This is relevant in
the case when the matrix is split. isuppz is only accessed when jobz =
'V' and n>0.

tryrac On exit, set to true. tryrac is set to false if the matrix does not define its
eigenvalues to high relative accuracy.

Return Values
This function returns a value info.

If info = 0, the execution is successful.


If info = -i, the i-th parameter had an illegal value.

If info > 0, an internal error occurred.
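A minimal calling sketch for LAPACKE_dstemr (illustrative only: the 4-by-4 tridiagonal matrix, the index range il:iu, and all variable names are hypothetical and not taken from this reference):

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 4, m = 0, info;
    double d[4] = { 2.0, 2.0, 2.0, 2.0 };     /* diagonal of T */
    double e[4] = { -1.0, -1.0, -1.0, 0.0 };  /* off-diagonal; e[n-1] is workspace */
    double w[4];                              /* selected eigenvalues */
    double z[4 * 4];                          /* eigenvectors, ldz = n */
    lapack_int isuppz[2 * 4];
    lapack_logical tryrac = 1;                /* request high relative accuracy if possible */

    /* Eigenvalues with indices il = 1 .. iu = 2 and their eigenvectors. */
    info = LAPACKE_dstemr(LAPACK_COL_MAJOR, 'V', 'I', n, d, e,
                          0.0, 0.0,   /* vl, vu: not referenced for range = 'I' */
                          1, 2,       /* il, iu */
                          &m, w, z, n,
                          2,          /* nzc >= iu - il + 1 */
                          isuppz, &tryrac);
    if (info != 0) { printf("LAPACKE_dstemr failed: %d\n", (int)info); return 1; }
    for (lapack_int i = 0; i < m; ++i)
        printf("lambda[%d] = %f\n", (int)i, w[i]);
    return 0;
}

If the number of eigenvectors is not known in advance (range = 'V'), nzc = -1 can be passed first as a workspace query, as described above, and the required column count read back before the computing call.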

?stedc
Computes all eigenvalues and eigenvectors of a
symmetric tridiagonal matrix using the divide and
conquer method.

Syntax
lapack_int LAPACKE_sstedc( int matrix_layout, char compz, lapack_int n, float* d,
float* e, float* z, lapack_int ldz );
lapack_int LAPACKE_dstedc( int matrix_layout, char compz, lapack_int n, double* d,
double* e, double* z, lapack_int ldz );
lapack_int LAPACKE_cstedc( int matrix_layout, char compz, lapack_int n, float* d,
float* e, lapack_complex_float* z, lapack_int ldz );
lapack_int LAPACKE_zstedc( int matrix_layout, char compz, lapack_int n, double* d,
double* e, lapack_complex_double* z, lapack_int ldz );

Include Files
• mkl.h

Description

The routine computes all the eigenvalues and (optionally) all the eigenvectors of a symmetric tridiagonal
matrix using the divide and conquer method. The eigenvectors of a full or band real symmetric or complex
Hermitian matrix can also be found if sytrd/hetrd or sptrd/hptrd or sbtrd/hbtrd has been used to reduce this
matrix to tridiagonal form.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

compz Must be 'N' or 'I' or 'V'.

If compz = 'N', the routine computes eigenvalues only.

If compz = 'I', the routine computes the eigenvalues and eigenvectors of


the tridiagonal matrix.
If compz = 'V', the routine computes the eigenvalues and eigenvectors of
original symmetric/Hermitian matrix. On entry, the array z must contain the
orthogonal/unitary matrix used to reduce the original matrix to tridiagonal
form.

n The order of the symmetric tridiagonal matrix (n≥ 0).

d, e Arrays:
d contains the diagonal elements of the tridiagonal matrix.
The dimension of d must be at least max(1, n).
e contains the subdiagonal elements of the tridiagonal matrix.
The dimension of e must be at least max(1, n-1).

z Array z is of size max(1, ldz*n).

If compz = 'V', then, on entry, z must contain the orthogonal/unitary


matrix used to reduce the original matrix to tridiagonal form.

ldz The leading dimension of z. Constraints:


ldz≥ 1 if compz = 'N';

ldz≥ max(1, n) if compz = 'V' or 'I'.

Output Parameters

d The n eigenvalues in ascending order, unless info≠ 0.

See also info.

e On exit, the array is overwritten; see info.

z If info = 0, then if compz = 'V', z contains the orthonormal eigenvectors


of the original symmetric/Hermitian matrix, and if compz = 'I', z contains
the orthonormal eigenvectors of the symmetric tridiagonal matrix. If compz
= 'N', z is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, the algorithm failed to compute an eigenvalue while working on the submatrix lying in rows and
columns i/(n+1) through mod(i, n+1).
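A minimal calling sketch for LAPACKE_dstedc with compz = 'I' (the 4-by-4 tridiagonal matrix below is hypothetical and for illustration only):

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 4, info;
    double d[4] = { 2.0, 2.0, 2.0, 2.0 };   /* diagonal; overwritten by eigenvalues */
    double e[3] = { -1.0, -1.0, -1.0 };     /* subdiagonal; overwritten */
    double z[4 * 4];                        /* receives eigenvectors of T, ldz = n */

    /* compz = 'I': eigenvalues and eigenvectors of the tridiagonal matrix itself. */
    info = LAPACKE_dstedc(LAPACK_COL_MAJOR, 'I', n, d, e, z, n);
    if (info != 0) { printf("LAPACKE_dstedc failed: %d\n", (int)info); return 1; }
    for (int i = 0; i < n; ++i)
        printf("lambda[%d] = %f\n", i, d[i]);
    return 0;
}

If the tridiagonal matrix came from a reduction by sytrd/hetrd, pass compz = 'V' and supply the orthogonal/unitary reduction matrix in z instead.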

?stegr
Computes selected eigenvalues and eigenvectors of a
real symmetric tridiagonal matrix.

Syntax
lapack_int LAPACKE_sstegr( int matrix_layout, char jobz, char range, lapack_int n,
float* d, float* e, float vl, float vu, lapack_int il, lapack_int iu, float abstol,
lapack_int* m, float* w, float* z, lapack_int ldz, lapack_int* isuppz );
lapack_int LAPACKE_dstegr( int matrix_layout, char jobz, char range, lapack_int n,
double* d, double* e, double vl, double vu, lapack_int il, lapack_int iu, double
abstol, lapack_int* m, double* w, double* z, lapack_int ldz, lapack_int* isuppz );
lapack_int LAPACKE_cstegr( int matrix_layout, char jobz, char range, lapack_int n,
float* d, float* e, float vl, float vu, lapack_int il, lapack_int iu, float abstol,
lapack_int* m, float* w, lapack_complex_float* z, lapack_int ldz, lapack_int* isuppz );
lapack_int LAPACKE_zstegr( int matrix_layout, char jobz, char range, lapack_int n,
double* d, double* e, double vl, double vu, lapack_int il, lapack_int iu, double
abstol, lapack_int* m, double* w, lapack_complex_double* z, lapack_int ldz,
lapack_int* isuppz );


Include Files
• mkl.h

Description

The routine computes selected eigenvalues and, optionally, eigenvectors of a real symmetric tridiagonal
matrix T.
The spectrum may be computed either completely or partially by specifying either an interval (vl,vu] or a
range of indices il:iu for the desired eigenvalues.

?stegr is a compatibility wrapper around the improved stemr routine. See its description for further details.
Note that the abstol parameter no longer provides any benefit and hence is no longer used.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If job = 'N', then only eigenvalues are computed.

If job = 'V', then eigenvalues and eigenvectors are computed.

range Must be 'A' or 'V' or 'I'.

If range = 'A', the routine computes all eigenvalues.

If range = 'V', the routine computes eigenvalues w[i] in the half-open


interval:
vl<w[i]≤vu.
If range = 'I', the routine computes eigenvalues with indices il to iu.

n The order of the matrix T (n≥ 0).

d, e Arrays:
d contains the diagonal elements of T.
The dimension of d must be at least max(1, n).
e contains the subdiagonal elements of T in elements 0 to n-2; e[n-1] need
not be set on input, but it is used as a workspace.
The dimension of e must be at least max(1, n).

vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl< vu.

If range = 'A' or 'I', vl and vu are not referenced.

il, iu
If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤il≤iu≤n, if n > 0.

If range = 'A' or 'V', il and iu are not referenced.

abstol Unused. Was the absolute error tolerance for the eigenvalues/eigenvectors
in previous versions.

ldz The leading dimension of the output array z. Constraints:


ldz≥ 1 if jobz = 'N';
ldz≥ max(1, n) if jobz = 'V'.

Output Parameters

d, e On exit, d and e are overwritten.

m The total number of eigenvalues found,


0 ≤m≤n.
If range = 'A', m = n, and if range = 'I', m = iu-il+1.

w Array, size at least max(1, n).


The selected eigenvalues in ascending order, stored in w[0] to w[m - 1].

z Array z(size max(1,ldz*m)).

If jobz = 'V', and if info = 0, the first m columns of z contain the


orthonormal eigenvectors of the matrix T corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated
with w[i - 1].

If jobz = 'N', then z is not referenced.

Note: if range = 'V', the exact value of m is not known in advance and an
upper bound must be used. Using m = n is always safe.

isuppz
Array, size at least (2*max(1, m)).

The support of the eigenvectors in z, that is the indices indicating the


nonzero elements in z. The i-th computed eigenvector is nonzero only in
elements isuppz[2*i - 2] through isuppz[2*i - 1]. This is relevant in
the case when the matrix is split. isuppz is only accessed when jobz =
'V', and n > 0.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, an internal error occurred.

?pteqr
Computes all eigenvalues and (optionally) all
eigenvectors of a real symmetric positive-definite
tridiagonal matrix.

Syntax
lapack_int LAPACKE_spteqr( int matrix_layout, char compz, lapack_int n, float* d,
float* e, float* z, lapack_int ldz );


lapack_int LAPACKE_dpteqr( int matrix_layout, char compz, lapack_int n, double* d,


double* e, double* z, lapack_int ldz );
lapack_int LAPACKE_cpteqr( int matrix_layout, char compz, lapack_int n, float* d,
float* e, lapack_complex_float* z, lapack_int ldz );
lapack_int LAPACKE_zpteqr( int matrix_layout, char compz, lapack_int n, double* d,
double* e, lapack_complex_double* z, lapack_int ldz );

Include Files
• mkl.h

Description

The routine computes all the eigenvalues and (optionally) all the eigenvectors of a real symmetric positive-definite
tridiagonal matrix T. In other words, the routine can compute the spectral factorization: T = Z*Λ*Z^T.
Here Λ is a diagonal matrix whose diagonal elements are the eigenvalues λ_i; Z is an orthogonal matrix whose
columns are eigenvectors. Thus,
T*z_i = λ_i*z_i for i = 1, 2, ..., n.
(The routine normalizes the eigenvectors so that ||z_i||_2 = 1.)

You can also use the routine for computing the eigenvalues and eigenvectors of real symmetric (or complex
Hermitian) positive-definite matrices A reduced to tridiagonal form T: A = Q*T*Q^H. In this case, the spectral
factorization is as follows: A = Q*T*Q^H = (Q*Z)*Λ*(Q*Z)^H. Before calling ?pteqr, you must reduce A to
tridiagonal form and generate the explicit matrix Q by calling the following routines:

for real matrices: for complex matrices:

full storage ?sytrd, ?orgtr ?hetrd, ?ungtr

packed storage ?sptrd, ?opgtr ?hptrd, ?upgtr

band storage ?sbtrd(vect='V') ?hbtrd(vect='V')

The routine first factorizes T as L*D*L^H, where L is a unit lower bidiagonal matrix and D is a diagonal matrix.
Then it forms the bidiagonal matrix B = L*D^(1/2) and calls ?bdsqr to compute the singular values of B, which
are the square roots of the eigenvalues of T.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

compz Must be 'N' or 'I' or 'V'.

If compz = 'N', the routine computes eigenvalues only.

If compz = 'I', the routine computes the eigenvalues and eigenvectors of


the tridiagonal matrix T.
If compz = 'V', the routine computes the eigenvalues and eigenvectors of
A (and the array z must contain the matrix Q on entry).

n The order of the matrix T (n≥ 0).

d, e Arrays:

d contains the diagonal elements of T.
The size of d must be at least max(1, n).
e contains the off-diagonal elements of T.
The size of e must be at least max(1, n-1).

z Array, size max(1, ldz*n)

If compz = 'N' or 'I', z need not be set.

If compz = 'V', z must contain the orthogonal matrix used in the


reduction to tridiagonal form..

ldz The leading dimension of z. Constraints:


ldz≥ 1 if compz = 'N';

ldz≥ max(1, n) if compz = 'V' or 'I'.

Output Parameters

d The n eigenvalues in descending order, unless info > 0.

See also info.

e On exit, the array is overwritten.

z If info = 0, contains an n-by-n matrix the columns of which are


orthonormal eigenvectors. (The i-th column corresponds to the i-th
eigenvalue.)

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = i, the leading minor of order i (and hence T itself) is not positive-definite.

If info = n + i, the algorithm for computing singular values failed to converge; i off-diagonal elements
have not converged to zero.
If info = -i, the i-th parameter had an illegal value.

Application Notes
If λ_i is an exact eigenvalue and μ_i is the corresponding computed value, then
|μ_i - λ_i| ≤ c(n)*ε*K*λ_i,
where c(n) is a modestly increasing function of n, ε is the machine precision, and
K = ||D*T*D||_2 * ||(D*T*D)^(-1)||_2, with D diagonal and d_ii = t_ii^(-1/2).
If z_i is the corresponding exact eigenvector and w_i is the corresponding computed vector, then the angle
θ(z_i, w_i) between them is bounded as follows:
θ(z_i, w_i) ≤ c(n)*ε*K / min_{i≠j}(|λ_i - λ_j|/|λ_i + λ_j|).
Here min_{i≠j}(|λ_i - λ_j|/|λ_i + λ_j|) is the relative gap between λ_i and the other eigenvalues.

The total number of floating-point operations depends on how rapidly the algorithm converges.
Typically, it is about
30n² if compz = 'N';
6n³ (for complex flavors, 12n³) if compz = 'V' or 'I'.
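A minimal calling sketch for LAPACKE_dpteqr with compz = 'I' (the positive-definite tridiagonal matrix below is hypothetical):

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 4, info;
    double d[4] = { 2.0, 2.0, 2.0, 2.0 };   /* diagonal of T */
    double e[3] = { -1.0, -1.0, -1.0 };     /* off-diagonal of T */
    double z[4 * 4];                        /* receives eigenvectors, ldz = n */

    /* compz = 'I': factor the positive-definite tridiagonal matrix T itself. */
    info = LAPACKE_dpteqr(LAPACK_COL_MAJOR, 'I', n, d, e, z, n);
    if (info != 0) { printf("LAPACKE_dpteqr failed: %d\n", (int)info); return 1; }
    /* Eigenvalues are returned in d in descending order. */
    for (int i = 0; i < n; ++i)
        printf("lambda[%d] = %f\n", i, d[i]);
    return 0;
}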

?stebz
Computes selected eigenvalues of a real symmetric
tridiagonal matrix by bisection.

Syntax
lapack_int LAPACKE_sstebz (char range, char order, lapack_int n, float vl, float vu,
lapack_int il, lapack_int iu, float abstol, const float* d, const float* e,
lapack_int* m, lapack_int* nsplit, float* w, lapack_int* iblock, lapack_int* isplit);
lapack_int LAPACKE_dstebz (char range, char order, lapack_int n, double vl, double vu,
lapack_int il, lapack_int iu, double abstol, const double* d, const double* e,
lapack_int* m, lapack_int* nsplit, double* w, lapack_int* iblock, lapack_int* isplit);

Include Files
• mkl.h

Description

The routine computes some (or all) of the eigenvalues of a real symmetric tridiagonal matrix T by bisection.
The routine searches for zero or negligible off-diagonal elements to see if T splits into block-diagonal form T
= diag(T1, T2, ...). Then it performs bisection on each of the blocks Ti and returns the block index of
each computed eigenvalue, so that a subsequent call to stein can also take advantage of the block structure.

Input Parameters

range Must be 'A' or 'V' or 'I'.

If range = 'A', the routine computes all eigenvalues.

If range = 'V', the routine computes eigenvalues w[i] in the half-open


interval: vl < w[i]≤vu.

If range = 'I', the routine computes eigenvalues with indices il to iu.

order Must be 'B' or 'E'.

If order = 'B', the eigenvalues are to be ordered from smallest to largest


within each split-off block.
If order = 'E', the eigenvalues for the entire matrix are to be ordered
from smallest to largest.

n The order of the matrix T (n≥ 0).

vl, vu If range = 'V', the routine computes eigenvalues w[i] in the half-open
interval:
vl < w[i] ≤ vu.
If range = 'A' or 'I', vl and vu are not referenced.

il, iu Constraint: 1 ≤il≤iu≤n.

If range = 'I', the routine computes eigenvalues w[i] such that il≤i≤iu
(assuming that the eigenvalues w[i] are in ascending order).

If range = 'A' or 'V', il and iu are not referenced.

abstol The absolute tolerance to which each eigenvalue is required. An eigenvalue


(or cluster) is considered to have converged if it lies in an interval of width
abstol.
If abstol≤ 0.0, then the tolerance is taken as eps*|T|, where eps is the
machine precision, and |T| is the 1-norm of the matrix T.

d, e Arrays:
d contains the diagonal elements of T.
The size of d must be at least max(1, n).
e contains the off-diagonal elements of T.
The size of e must be at least max(1, n-1).

Output Parameters

m The actual number of eigenvalues found.

nsplit The number of diagonal blocks detected in T.

w Array, size at least max(1, n). The computed eigenvalues, stored in w[0] to
w[m - 1].

iblock, isplit
Arrays, size at least max(1, n).
A positive value iblock[i] is the block number of the eigenvalue stored in
w[i] (see also info).
The leading nsplit elements of isplit contain points at which T splits into
blocks Ti as follows: the block T1 contains rows/columns 1 to isplit[0];
the block T2 contains rows/columns isplit[0]+1 to isplit[1], and so
on.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = 1, for range = 'A' or 'V', the algorithm failed to compute some of the required eigenvalues to
the desired accuracy; iblock[i] < 0 indicates that the eigenvalue stored in w[i] failed to converge.
If info = 2, for range = 'I', the algorithm failed to compute some of the required eigenvalues. Try calling
the routine again with range = 'A'.

If info = 3:

for range = 'A' or 'V', same as info = 1;

for range = 'I', same as info = 2.

If info = 4, no eigenvalues have been computed. The floating-point arithmetic on the computer is not
behaving as expected.
If info = -i, the i-th parameter had an illegal value.


Application Notes
The eigenvalues of T are computed to high relative accuracy which means that if they vary widely in
magnitude, then any small eigenvalues will be computed more accurately than, for example, with the
standard QR method. However, the reduction to tridiagonal form (prior to calling the routine) may exclude
the possibility of obtaining high relative accuracy in the small eigenvalues of the original matrix if its
eigenvalues vary widely in magnitude.
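A minimal calling sketch for LAPACKE_dstebz computing the eigenvalues in a half-open interval (the matrix and the interval (0, 2] are hypothetical, for illustration only):

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 4, m = 0, nsplit = 0, info;
    double d[4] = { 2.0, 2.0, 2.0, 2.0 };   /* diagonal of T */
    double e[3] = { -1.0, -1.0, -1.0 };     /* off-diagonal of T */
    double w[4];
    lapack_int iblock[4], isplit[4];

    /* range = 'V': eigenvalues in (vl, vu]; order = 'B': ordered within each block;
       abstol = 0.0 selects the default tolerance eps*|T|.                           */
    info = LAPACKE_dstebz('V', 'B', n, 0.0, 2.0, 0, 0, 0.0,
                          d, e, &m, &nsplit, w, iblock, isplit);
    if (info != 0) { printf("LAPACKE_dstebz failed: %d\n", (int)info); return 1; }
    printf("%d eigenvalue(s) found in (0, 2], %d block(s)\n", (int)m, (int)nsplit);
    return 0;
}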

?stein
Computes the eigenvectors corresponding to specified
eigenvalues of a real symmetric tridiagonal matrix.

Syntax
lapack_int LAPACKE_sstein( int matrix_layout, lapack_int n, const float* d, const
float* e, lapack_int m, const float* w, const lapack_int* iblock, const lapack_int*
isplit, float* z, lapack_int ldz, lapack_int* ifailv );
lapack_int LAPACKE_dstein( int matrix_layout, lapack_int n, const double* d, const
double* e, lapack_int m, const double* w, const lapack_int* iblock, const lapack_int*
isplit, double* z, lapack_int ldz, lapack_int* ifailv );
lapack_int LAPACKE_cstein( int matrix_layout, lapack_int n, const float* d, const
float* e, lapack_int m, const float* w, const lapack_int* iblock, const lapack_int*
isplit, lapack_complex_float* z, lapack_int ldz, lapack_int* ifailv );
lapack_int LAPACKE_zstein( int matrix_layout, lapack_int n, const double* d, const
double* e, lapack_int m, const double* w, const lapack_int* iblock, const lapack_int*
isplit, lapack_complex_double* z, lapack_int ldz, lapack_int* ifailv );

Include Files
• mkl.h

Description

The routine computes the eigenvectors of a real symmetric tridiagonal matrix T corresponding to specified
eigenvalues, by inverse iteration. It is designed to be used in particular after the specified eigenvalues have
been computed by ?stebz with order = 'B', but may also be used when the eigenvalues have been
computed by other routines.
If you use this routine after ?stebz, it can take advantage of the block structure by performing inverse
iteration on each block Ti separately, which is more efficient than using the whole matrix T.
If T has been formed by reduction of a full symmetric or Hermitian matrix A to tridiagonal form, you can
transform eigenvectors of T to eigenvectors of A by calling ?ormtr or ?opmtr (for real flavors) or by calling
?unmtr or ?upmtr (for complex flavors).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

n The order of the matrix T (n≥ 0).

m The number of eigenvectors to be returned.

d, e, w Arrays:

d contains the diagonal elements of T.
The size of d must be at least max(1, n).
e contains the sub-diagonal elements of T stored in elements 1 to n-1
The size of e must be at least max(1, n-1).
w contains the eigenvalues of T, stored in w[0] to w[m - 1] (as returned
by stebz). Eigenvalues of T1 must be supplied first, in non-decreasing
order; then those of T2, again in non-decreasing order, and so on.
Constraint:
if iblock[i] = iblock[i+1], w[i] ≤w[i+1].

The size of w must be at least max(1, n).

iblock, isplit
Arrays, size at least max(1, n). The arrays iblock and isplit, as returned
by ?stebz with order = 'B'.

If you did not call ?stebz with order = 'B', set all elements of iblock to
1, and isplit[0] to n.

ldz The leading dimension of the output array z; ldz≥ max(1, n) for column
major layout and ldz>=max(1,m) for row major layout.

Output Parameters

z Array, size at least max(1,ldz*m) for column major layout and


max(1,ldz*n) for row major layout.

If info = 0, z contains an n-by-m matrix whose columns are the orthonormal
computed eigenvectors. (The i-th column corresponds to the i-th supplied
eigenvalue.)

ifailv
Array, size at least max(1, m).
If info = i > 0, the first i elements of ifailv contain the indices of any
eigenvectors that failed to converge.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = i, then i eigenvectors (as indicated by the parameter ifailv) each failed to converge in 5 iterations.
The current iterates are stored in the corresponding columns/rows of the array z.
If info = -i, the i-th parameter had an illegal value.

Application Notes
Each computed eigenvector zi is an exact eigenvector of a matrix T+Ei, where ||Ei||2 = O(ε)*||T||2.
However, a set of eigenvectors computed by this routine may not be orthogonal to so high a degree of
accuracy as those computed by ?steqr.
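A minimal sketch of the ?stebz/?stein sequence described above (the matrix and the requested index range are hypothetical):

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 4, m = 0, nsplit = 0, info;
    double d[4] = { 2.0, 2.0, 2.0, 2.0 };
    double e[3] = { -1.0, -1.0, -1.0 };
    double w[4], z[4 * 4];
    lapack_int iblock[4], isplit[4], ifailv[4];

    /* Step 1: the two smallest eigenvalues by bisection, ordered per block (order = 'B'). */
    info = LAPACKE_dstebz('I', 'B', n, 0.0, 0.0, 1, 2, 0.0,
                          d, e, &m, &nsplit, w, iblock, isplit);
    if (info != 0) return 1;

    /* Step 2: the corresponding eigenvectors by inverse iteration,
       reusing the block structure reported by ?stebz.              */
    info = LAPACKE_dstein(LAPACK_COL_MAJOR, n, d, e, m, w, iblock, isplit,
                          z, n, ifailv);
    if (info != 0) return 1;

    printf("computed %d eigenpair(s)\n", (int)m);
    return 0;
}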

?disna
Computes the reciprocal condition numbers for the
eigenvectors of a symmetric/ Hermitian matrix or for
the left or right singular vectors of a general matrix.


Syntax
lapack_int LAPACKE_sdisna (char job, lapack_int m, lapack_int n, const float* d, float*
sep);
lapack_int LAPACKE_ddisna (char job, lapack_int m, lapack_int n, const double* d,
double* sep);

Include Files
• mkl.h

Description

The routine computes the reciprocal condition numbers for the eigenvectors of a real symmetric or complex
Hermitian matrix or for the left or right singular vectors of a general m-by-n matrix.
The reciprocal condition number is the 'gap' between the corresponding eigenvalue or singular value and the
nearest other one.
The bound on the error, measured by angle in radians, in the i-th computed vector is given by
?lamch('E')*(anorm/sep[i])
where anorm = ||A||_2 = max(|d[j]|). sep[i] is not allowed to be smaller than ?lamch('E')*anorm in
order to limit the size of the error bound.
?disna may also be used to compute error bounds for eigenvectors of the generalized symmetric definite
eigenproblem.

Input Parameters

job Must be 'E','L', or 'R'. Specifies for which problem the reciprocal
condition numbers should be computed:
job = 'E': for the eigenvectors of a symmetric/Hermitian matrix;

job = 'L': for the left singular vectors of a general matrix;

job = 'R': for the right singular vectors of a general matrix.

m The number of rows of the matrix (m≥ 0).

n
If job = 'L', or 'R', the number of columns of the matrix (n≥ 0). Ignored
if job = 'E'.

d Array, dimension at least max(1,m) if job = 'E', and at least max(1,


min(m,n)) if job = 'L' or 'R'.

This array must contain the eigenvalues (if job = 'E') or singular values
(if job = 'L' or 'R') of the matrix, in either increasing or decreasing
order.
If singular values, they must be non-negative.

Output Parameters

sep Array, dimension at least max(1,m) if job = 'E', and at least max(1,
min(m,n)) if job = 'L' or 'R'. The reciprocal condition numbers of the
vectors.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.
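A minimal calling sketch for LAPACKE_ddisna (the eigenvalue array below is hypothetical; in practice it would come from a symmetric eigensolver, and LAPACKE_dlamch('E') is used for the machine precision in the bound quoted above):

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int m = 4, info;
    double d[4]   = { 0.38, 1.38, 2.62, 3.62 };   /* eigenvalues, ascending */
    double sep[4];                                /* reciprocal condition numbers */

    /* job = 'E': condition numbers for eigenvectors of a symmetric matrix; n is ignored. */
    info = LAPACKE_ddisna('E', m, 0, d, sep);
    if (info != 0) { printf("LAPACKE_ddisna failed: %d\n", (int)info); return 1; }

    double anorm = 3.62;                          /* ||A||_2 = max |d[j]| */
    for (int i = 0; i < m; ++i)
        printf("angular error bound %d ~ %e\n", i, LAPACKE_dlamch('E') * anorm / sep[i]);
    return 0;
}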

Generalized Symmetric-Definite Eigenvalue Problems: LAPACK Computational Routines


Generalized symmetric-definite eigenvalue problems are as follows: find the eigenvalues λ and the
corresponding eigenvectors z that satisfy one of these equations:
Az = λBz, ABz = λz, or BAz = λz,
where A is an n-by-n symmetric or Hermitian matrix, and B is an n-by-n symmetric positive-definite or
Hermitian positive-definite matrix.
In these problems, there exist n real eigenvalues with corresponding eigenvectors (the eigenvalues are real
even for complex Hermitian matrices A and B).
Routines described in this section allow you to reduce the above generalized problems to standard symmetric
eigenvalue problem Cy = λy, which you can solve by calling LAPACK routines described earlier in this
chapter (see Symmetric Eigenvalue Problems).
Different routines allow the matrices to be stored either conventionally or in packed storage. Prior to
reduction, the positive-definite matrix B must first be factorized using either potrf or pptrf.
The reduction routine for the banded matrices A and B uses a split Cholesky factorization for which a specific
routine pbstf is provided. This refinement halves the amount of work required to form matrix C.
Table "Computational Routines for Reducing Generalized Eigenproblems to Standard Problems" lists LAPACK
routines that can be used to solve generalized symmetric-definite eigenvalue problems.
Computational Routines for Reducing Generalized Eigenproblems to Standard Problems
Matrix type                   Reduce to standard        Reduce to standard        Reduce to standard        Factorize
                              problems (full storage)   problems (packed storage) problems (band matrices)  band matrix

real symmetric matrices       sygst                     spgst                     sbgst                     pbstf

complex Hermitian matrices    hegst                     hpgst                     hbgst                     pbstf
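A minimal full-storage sketch of this reduction for the real case, using a hypothetical 2-by-2 pencil (A, B); the driver LAPACKE_dsyev, documented with the symmetric eigenproblem drivers, is used here only to finish the computation:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 2, info;
    double a[4] = { 2.0, 0.5, 0.5, 3.0 };   /* symmetric A (column major) */
    double b[4] = { 2.0, 0.0, 0.0, 1.0 };   /* symmetric positive-definite B */
    double w[2];

    /* 1. Cholesky factorization B = U^T*U. */
    info = LAPACKE_dpotrf(LAPACK_COL_MAJOR, 'U', n, b, n);
    if (info != 0) return 1;

    /* 2. Reduce A*z = lambda*B*z (itype = 1) to the standard form C*y = lambda*y. */
    info = LAPACKE_dsygst(LAPACK_COL_MAJOR, 1, 'U', n, a, n, b, n);
    if (info != 0) return 1;

    /* 3. Eigenvalues of C are the eigenvalues of the original pencil. */
    info = LAPACKE_dsyev(LAPACK_COL_MAJOR, 'N', 'U', n, a, n, w);
    if (info != 0) return 1;

    printf("lambda = %f, %f\n", w[0], w[1]);
    return 0;
}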

?sygst
Reduces a real symmetric-definite generalized
eigenvalue problem to the standard form.

Syntax
lapack_int LAPACKE_ssygst (int matrix_layout, lapack_int itype, char uplo, lapack_int
n, float* a, lapack_int lda, const float* b, lapack_int ldb);
lapack_int LAPACKE_dsygst (int matrix_layout, lapack_int itype, char uplo, lapack_int
n, double* a, lapack_int lda, const double* b, lapack_int ldb);

Include Files
• mkl.h

Description

The routine reduces real symmetric-definite generalized eigenproblems


A*z = λ*B*z, A*B*z = λ*z, or B*A*z = λ*z


to the standard form C*y = λ*y. Here A is a real symmetric matrix, and B is a real symmetric positive-
definite matrix. Before calling this routine, call ?potrf to compute the Cholesky factorization: B = U^T*U or
B = L*L^T.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

itype Must be 1 or 2 or 3.
If itype = 1, the generalized eigenproblem is A*z = lambda*B*z

for uplo = 'U': C = inv(U^T)*A*inv(U), z = inv(U)*y;

for uplo = 'L': C = inv(L)*A*inv(L^T), z = inv(L^T)*y.

If itype = 2, the generalized eigenproblem is A*B*z = lambda*z

for uplo = 'U': C = U*A*U^T, z = inv(U)*y;

for uplo = 'L': C = L^T*A*L, z = inv(L^T)*y.

If itype = 3, the generalized eigenproblem is B*A*z = lambda*z

for uplo = 'U': C = U*A*U^T, z = U^T*y;

for uplo = 'L': C = L^T*A*L, z = L*y.

uplo Must be 'U' or 'L'.

If uplo = 'U', the array a stores the upper triangle of A; you must supply
B in the factored form B = U^T*U.

If uplo = 'L', the array a stores the lower triangle of A; you must supply
B in the factored form B = L*L^T.

n The order of the matrices A and B (n≥ 0).

a, b Arrays:
a (size max(1, lda*n)) contains the upper or lower triangle of A.

b (size max(1, ldb*n)) contains the Cholesky-factored matrix B:

B = U^T*U or B = L*L^T (as returned by ?potrf).

lda The leading dimension of a; at least max(1, n).

ldb The leading dimension of b; at least max(1, n).

Output Parameters

a The upper or lower triangle of A is overwritten by the upper or lower


triangle of C, as specified by the arguments itype and uplo.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
Forming the reduced matrix C is a stable procedure. However, it involves implicit multiplication by inv(B) (if
itype = 1) or B (if itype = 2 or 3). When the routine is used as a step in the computation of eigenvalues
and eigenvectors of the original problem, there may be a significant loss of accuracy if B is ill-conditioned
with respect to inversion.
The approximate number of floating-point operations is n³.

?hegst
Reduces a complex Hermitian positive-definite
generalized eigenvalue problem to the standard form.

Syntax
lapack_int LAPACKE_chegst (int matrix_layout, lapack_int itype, char uplo, lapack_int
n, lapack_complex_float* a, lapack_int lda, const lapack_complex_float* b, lapack_int
ldb);
lapack_int LAPACKE_zhegst (int matrix_layout, lapack_int itype, char uplo, lapack_int
n, lapack_complex_double* a, lapack_int lda, const lapack_complex_double* b, lapack_int
ldb);

Include Files
• mkl.h

Description

The routine reduces a complex Hermitian positive-definite generalized eigenvalue problem to standard form.

itype   Problem        Result

1       A*x = λ*B*x    A overwritten by inv(U^H)*A*inv(U) or inv(L)*A*inv(L^H)

2       A*B*x = λ*x    A overwritten by U*A*U^H or L^H*A*L

3       B*A*x = λ*x    A overwritten by U*A*U^H or L^H*A*L

Before calling this routine, you must call ?potrf to compute the Cholesky factorization: B = U^H*U or
B = L*L^H.

Input Parameters

itype Must be 1 or 2 or 3.
If itype = 1, the generalized eigenproblem is A*z = lambda*B*z

for uplo = 'U': C = inv(U^H)*A*inv(U);

for uplo = 'L': C = inv(L)*A*inv(L^H).

If itype = 2, the generalized eigenproblem is A*B*z = lambda*z

for uplo = 'U': C = U*A*U^H;

for uplo = 'L': C = L^H*A*L.

If itype = 3, the generalized eigenproblem is B*A*z = lambda*z

for uplo = 'U': C = U*A*U^H;

for uplo = 'L': C = L^H*A*L.

uplo Must be 'U' or 'L'.

If uplo = 'U', the array a stores the upper triangle of A; you must supply
B in the factored form B = U^H*U.

If uplo = 'L', the array a stores the lower triangle of A; you must supply
B in the factored form B = L*L^H.

n The order of the matrices A and B (n≥ 0).

a, b Arrays:
a (size max(1, lda*n)) contains the upper or lower triangle of A.

b (size max(1, ldb*n)) contains the Cholesky-factored matrix B:

B = U^H*U or B = L*L^H (as returned by ?potrf).

lda The leading dimension of a; at least max(1, n).

ldb The leading dimension of b; at least max(1, n).

Output Parameters

a The upper or lower triangle of A is overwritten by the upper or lower


triangle of C, as specified by the arguments itype and uplo.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
Forming the reduced matrix C is a stable procedure. However, it involves implicit multiplication by inv(B) (if
itype = 1) or B (if itype = 2 or 3). When the routine is used as a step in the computation of eigenvalues
and eigenvectors of the original problem, there may be a significant loss of accuracy if B is ill-conditioned
with respect to inversion.
The approximate number of floating-point operations is n³.

?spgst
Reduces a real symmetric-definite generalized
eigenvalue problem to the standard form using packed
storage.

Syntax
lapack_int LAPACKE_sspgst (int matrix_layout, lapack_int itype, char uplo, lapack_int
n, float* ap, const float* bp);
lapack_int LAPACKE_dspgst (int matrix_layout, lapack_int itype, char uplo, lapack_int
n, double* ap, const double* bp);

Include Files
• mkl.h

Description

The routine reduces real symmetric-definite generalized eigenproblems


A*x = λ*B*x, A*B*x = λ*x, or B*A*x = λ*x
to the standard form C*y = λ*y, using packed matrix storage. Here A is a real symmetric matrix, and B is a
real symmetric positive-definite matrix. Before calling this routine, call ?pptrf to compute the Cholesky
factorization: B = U^T*U or B = L*L^T.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

itype Must be 1 or 2 or 3.
If itype = 1, the generalized eigenproblem is A*z = lambda*B*z

for uplo = 'U': C = inv(U^T)*A*inv(U), z = inv(U)*y;

for uplo = 'L': C = inv(L)*A*inv(L^T), z = inv(L^T)*y.

If itype = 2, the generalized eigenproblem is A*B*z = lambda*z

for uplo = 'U': C = U*A*U^T, z = inv(U)*y;

for uplo = 'L': C = L^T*A*L, z = inv(L^T)*y.

If itype = 3, the generalized eigenproblem is B*A*z = lambda*z

for uplo = 'U': C = U*A*U^T, z = U^T*y;

for uplo = 'L': C = L^T*A*L, z = L*y.

uplo Must be 'U' or 'L'.

If uplo = 'U', ap stores the packed upper triangle of A;

you must supply B in the factored form B = U^T*U.

If uplo = 'L', ap stores the packed lower triangle of A;

you must supply B in the factored form B = L*L^T.

n The order of the matrices A and B (n≥ 0).

ap, bp Arrays:
ap contains the packed upper or lower triangle of A.
The dimension of ap must be at least max(1, n*(n+1)/2).
bp contains the packed Cholesky factor of B (as returned by ?pptrf with
the same uplo value).
The dimension of bp must be at least max(1, n*(n+1)/2).


Output Parameters

ap The upper or lower triangle of A is overwritten by the upper or lower


triangle of C, as specified by the arguments itype and uplo.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
Forming the reduced matrix C is a stable procedure. However, it involves implicit multiplication by inv(B) (if
itype = 1) or B (if itype = 2 or 3). When the routine is used as a step in the computation of eigenvalues
and eigenvectors of the original problem, there may be a significant loss of accuracy if B is ill-conditioned
with respect to inversion.
The approximate number of floating-point operations is n³.

?hpgst
Reduces a generalized eigenvalue problem with a
Hermitian matrix to a standard eigenvalue problem
using packed storage.

Syntax
lapack_int LAPACKE_chpgst (int matrix_layout, lapack_int itype, char uplo, lapack_int
n, lapack_complex_float* ap, const lapack_complex_float* bp);
lapack_int LAPACKE_zhpgst (int matrix_layout, lapack_int itype, char uplo, lapack_int
n, lapack_complex_double* ap, const lapack_complex_double* bp);

Include Files
• mkl.h

Description

The routine reduces generalized eigenproblems with Hermitian matrices


A*z = λ*B*z, A*B*z = λ*z, or B*A*z = λ*z
to standard eigenproblems C*y = λ*y, using packed matrix storage. Here A is a complex Hermitian matrix,
and B is a complex Hermitian positive-definite matrix. Before calling this routine, you must call ?pptrf to
compute the Cholesky factorization: B = U^H*U or B = L*L^H.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

itype Must be 1 or 2 or 3.
If itype = 1, the generalized eigenproblem is A*z = lambda*B*z

for uplo = 'U': C = inv(U^H)*A*inv(U), z = inv(U)*y;

for uplo = 'L': C = inv(L)*A*inv(L^H), z = inv(L^H)*y.

If itype = 2, the generalized eigenproblem is A*B*z = lambda*z

for uplo = 'U': C = U*A*U^H, z = inv(U)*y;

for uplo = 'L': C = L^H*A*L, z = inv(L^H)*y.

If itype = 3, the generalized eigenproblem is B*A*z = lambda*z

for uplo = 'U': C = U*A*U^H, z = U^H*y;

for uplo = 'L': C = L^H*A*L, z = L*y.

uplo Must be 'U' or 'L'.

If uplo = 'U', ap stores the packed upper triangle of A; you must supply
B in the factored form B = U^H*U.

If uplo = 'L', ap stores the packed lower triangle of A; you must supply B
in the factored form B = L*L^H.

n The order of the matrices A and B (n≥ 0).

ap, bp Arrays:
ap contains the packed upper or lower triangle of A.
The dimension of ap must be at least max(1, n*(n+1)/2).
bp contains the packed Cholesky factor of B (as returned by ?pptrf with
the same uplo value).
The dimension of bp must be at least max(1, n*(n+1)/2).

Output Parameters

ap The upper or lower triangle of A is overwritten by the upper or lower


triangle of C, as specified by the arguments itype and uplo.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
Forming the reduced matrix C is a stable procedure. However, it involves implicit multiplication by inv(B) (if
itype = 1) or B (if itype = 2 or 3). When the routine is used as a step in the computation of eigenvalues
and eigenvectors of the original problem, there may be a significant loss of accuracy if B is ill-conditioned
with respect to inversion.
The approximate number of floating-point operations is n³.

?sbgst
Reduces a real symmetric-definite generalized
eigenproblem for banded matrices to the standard
form using the factorization performed by ?pbstf.


Syntax
lapack_int LAPACKE_ssbgst (int matrix_layout, char vect, char uplo, lapack_int n,
lapack_int ka, lapack_int kb, float* ab, lapack_int ldab, const float* bb, lapack_int
ldbb, float* x, lapack_int ldx);
lapack_int LAPACKE_dsbgst (int matrix_layout, char vect, char uplo, lapack_int n,
lapack_int ka, lapack_int kb, double* ab, lapack_int ldab, const double* bb,
lapack_int ldbb, double* x, lapack_int ldx);

Include Files
• mkl.h

Description

To reduce the real symmetric-definite generalized eigenproblem A*z = λ*B*z to the standard form C*y = λ*y,
where A, B and C are banded, this routine must be preceded by a call to ?pbstf, which computes the split
Cholesky factorization of the positive-definite matrix B: B = S^T*S. The split Cholesky factorization, compared
with the ordinary Cholesky factorization, allows the work to be approximately halved.
This routine overwrites A with C = X^T*A*X, where X = inv(S)*Q and Q is an orthogonal matrix chosen
(implicitly) to preserve the bandwidth of A. The routine also has an option to allow the accumulation of X,
and then, if z is an eigenvector of C, X*z is an eigenvector of the original system.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

vect Must be 'N' or 'V'.

If vect = 'N', then matrix X is not returned;

If vect = 'V', then matrix X is returned.

uplo Must be 'U' or 'L'.

If uplo = 'U', ab stores the upper triangular part of A.

If uplo = 'L', ab stores the lower triangular part of A.

n The order of the matrices A and B (n≥ 0).

ka The number of super- or sub-diagonals in A


(ka≥ 0).

kb The number of super- or sub-diagonals in B


(ka≥kb≥ 0).

ab, bb ab(size at least max(1, ldab*n) for column major layout and at least
max(1, ldab*(ka + 1)) for row major layout) is an array containing either
upper or lower triangular part of the symmetric matrix A (as specified by
uplo) in band storage format.

bb(size at least max(1, ldbb*n) for column major layout and at least
max(1, ldbb*(kb + 1)) for row major layout) is an array containing the
banded split Cholesky factor of B as specified by uplo, n and kb and
returned by ?pbstf.

ldab The leading dimension of the array ab; must be at least ka+1 for column
major layout and max(1, n) for row major layout.

ldbb The leading dimension of the array bb; must be at least kb+1 for column
major layout and max(1, n) for row major layout.

ldx The leading dimension of the output array x. Constraints: if vect = 'N',
then ldx≥ 1;

if vect = 'V', then ldx≥ max(1, n).

Output Parameters

ab On exit, this array is overwritten by the upper or lower triangle of C as


specified by uplo.

x Array.
If vect = 'V', then x (size at least max(1, ldx*n)) contains the n-by-n
matrix X = inv(S)*Q.

If vect = 'N', then x is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
Forming the reduced matrix C involves implicit multiplication by inv(B). When the routine is used as a step
in the computation of eigenvalues and eigenvectors of the original problem, there may be a significant loss of
accuracy if B is ill-conditioned with respect to inversion.
If ka and kb are much less than n then the total number of floating-point operations is approximately
6n²*kb, when vect = 'N'. Additional (3/2)n³*(kb/ka) operations are required when vect = 'V'.
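A minimal sketch of the ?pbstf/?sbgst sequence for a small hypothetical banded pencil (A tridiagonal with ka = 1, B diagonal with kb = 0, upper band storage, column major; all values are illustrative):

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 3, ka = 1, kb = 0, info;
    double ab[2 * 3] = { 0.0, 2.0,     /* column 0: (unused), a(1,1) */
                        -1.0, 2.0,     /* column 1: a(1,2), a(2,2)   */
                        -1.0, 2.0 };   /* column 2: a(2,3), a(3,3)   */
    double bb[1 * 3] = { 4.0, 1.0, 1.0 };   /* diagonal of B (kb = 0) */
    double x[3 * 3];                        /* receives X = inv(S)*Q  */

    /* 1. Split Cholesky factorization B = S^T*S; bb is overwritten by S. */
    info = LAPACKE_dpbstf(LAPACK_COL_MAJOR, 'U', n, kb, bb, kb + 1);
    if (info != 0) return 1;

    /* 2. Reduce A*z = lambda*B*z to the standard banded form C*y = lambda*y;
          ab is overwritten by C.                                            */
    info = LAPACKE_dsbgst(LAPACK_COL_MAJOR, 'V', 'U', n, ka, kb,
                          ab, ka + 1, bb, kb + 1, x, n);
    if (info != 0) return 1;

    printf("reduction to standard banded form succeeded\n");
    return 0;
}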

?hbgst
Reduces a complex Hermitian positive-definite
generalized eigenproblem for banded matrices to the
standard form using the factorization performed by
?pbstf.

Syntax
lapack_int LAPACKE_chbgst (int matrix_layout, char vect, char uplo, lapack_int n,
lapack_int ka, lapack_int kb, lapack_complex_float* ab, lapack_int ldab, const
lapack_complex_float* bb, lapack_int ldbb, lapack_complex_float* x, lapack_int ldx);
lapack_int LAPACKE_zhbgst (int matrix_layout, char vect, char uplo, lapack_int n,
lapack_int ka, lapack_int kb, lapack_complex_double* ab, lapack_int ldab, const
lapack_complex_double* bb, lapack_int ldbb, lapack_complex_double* x, lapack_int ldx);


Include Files
• mkl.h

Description

To reduce the complex Hermitian positive-definite generalized eigenproblem A*z = λ*B*z to the standard
form C*y = λ*y, where A, B and C are banded, this routine must be preceded by a call to ?pbstf, which
computes the split Cholesky factorization of the positive-definite matrix B: B = S^H*S. The split Cholesky
factorization, compared with the ordinary Cholesky factorization, allows the work to be approximately halved.
This routine overwrites A with C = X^H*A*X, where X = inv(S)*Q, and Q is a unitary matrix chosen
(implicitly) to preserve the bandwidth of A. The routine also has an option to allow the accumulation of X,
and then, if z is an eigenvector of C, X*z is an eigenvector of the original system.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

vect Must be 'N' or 'V'.

If vect = 'N', then matrix X is not returned;

If vect = 'V', then matrix X is returned.

uplo Must be 'U' or 'L'.

If uplo = 'U', ab stores the upper triangular part of A.

If uplo = 'L', ab stores the lower triangular part of A.

n The order of the matrices A and B (n≥ 0).

ka The number of super- or sub-diagonals in A


(ka≥ 0).

kb The number of super- or sub-diagonals in B


(ka≥kb≥ 0).

ab, bb ab(size at least max(1, ldab*n) for column major layout and at least
max(1, ldab*(ka + 1)) for row major layout) is an array containing either
upper or lower triangular part of the Hermitian matrix A (as specified by
uplo) in band storage format.
bb(size at least max(1, ldbb*n) for column major layout and at least
max(1, ldbb*(kb + 1)) for row major layout) is an array containing the
banded split Cholesky factor of B as specified by uplo, n and kb and
returned by ?pbstf.

ldab The leading dimension of the array ab; must be at least ka+1 for column
major layout and max(1, n) for row major layout.

ldbb The leading dimension of the array bb; must be at least kb+1 for column
major layout and max(1, n) for row major layout.

ldx The leading dimension of the output array x. Constraints:


if vect = 'N', then ldx≥ 1;

if vect = 'V', then ldx≥ max(1, n).

Output Parameters

ab On exit, this array is overwritten by the upper or lower triangle of C as


specified by uplo.

x Array.
If vect = 'V', then x (size at least max(1, ldx*n)) contains the n-by-n
matrix X = inv(S)*Q.

If vect = 'N', then x is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
Forming the reduced matrix C involves implicit multiplication by inv(B). When the routine is used as a step
in the computation of eigenvalues and eigenvectors of the original problem, there may be a significant loss of
accuracy if B is ill-conditioned with respect to inversion. The total number of floating-point operations is
approximately 20n²*kb, when vect = 'N'. Additional 5n³*(kb/ka) operations are required when vect =
'V'. All these estimates assume that both ka and kb are much less than n.

?pbstf
Computes a split Cholesky factorization of a real
symmetric or complex Hermitian positive-definite
banded matrix used in ?sbgst/?hbgst .

Syntax
lapack_int LAPACKE_spbstf (int matrix_layout, char uplo, lapack_int n, lapack_int kb,
float* bb, lapack_int ldbb);
lapack_int LAPACKE_dpbstf (int matrix_layout, char uplo, lapack_int n, lapack_int kb,
double* bb, lapack_int ldbb);
lapack_int LAPACKE_cpbstf (int matrix_layout, char uplo, lapack_int n, lapack_int kb,
lapack_complex_float* bb, lapack_int ldbb);
lapack_int LAPACKE_zpbstf (int matrix_layout, char uplo, lapack_int n, lapack_int kb,
lapack_complex_double* bb, lapack_int ldbb);

Include Files
• mkl.h

Description

The routine computes a split Cholesky factorization of a real symmetric or complex Hermitian positive-
definite band matrix B. It is to be used in conjunction with sbgst/hbgst.
The factorization has the form B = S^T*S (or B = S^H*S for complex flavors), where S is a band matrix of the
same bandwidth as B and the following structure: S is upper triangular in the first (n+kb)/2 rows and lower
triangular in the remaining rows.


Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

uplo Must be 'U' or 'L'.

If uplo = 'U', bb stores the upper triangular part of B.

If uplo = 'L', bb stores the lower triangular part of B.

n The order of the matrix B (n≥ 0).

kb The number of super- or sub-diagonals in B


(kb≥ 0).

bb bb(size at least max(1, ldbb*n) for column major layout and at least
max(1, ldbb*(kb + 1)) for row major layout) is an array containing either
upper or lower triangular part of the matrix B (as specified by uplo) in band
storage format.

ldbb The leading dimension of bb; must be at least kb+1 for column major and at
least max(1, n) for row major.

Output Parameters

bb On exit, this array is overwritten by the elements of the split Cholesky


factor S.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = i, then the factorization could not be completed, because the updated element bii would be the
square root of a negative number; hence the matrix B is not positive-definite.
If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed factor S is the exact factor of a perturbed matrix B + E, where the perturbation E is bounded
in terms of c(n)*ε; here c(n) is a modest linear function of n, and ε is the machine precision.


The total number of floating-point operations for real flavors is approximately n(kb+1)². The number of
operations for complex flavors is 4 times greater. All these estimates assume that kb is much less than n.
After calling this routine, you can call sbgst/hbgst to solve the generalized eigenproblem Az = λBz, where A
and B are banded and B is positive-definite.

Nonsymmetric Eigenvalue Problems: LAPACK Computational Routines


This section describes LAPACK routines for solving nonsymmetric eigenvalue problems, computing the Schur
factorization of general matrices, as well as performing a number of related computational tasks.

A nonsymmetric eigenvalue problem is as follows: given a nonsymmetric (or non-Hermitian) matrix A, find
the eigenvalues λ and the corresponding eigenvectors z that satisfy the equation
Az = λz (right eigenvectors z)
or the equation
z^H*A = λ*z^H (left eigenvectors z).
Nonsymmetric eigenvalue problems have the following properties:

• The number of eigenvectors may be less than the matrix order (but is not less than the number of
distinct eigenvalues of A).
• Eigenvalues may be complex even for a real matrix A.
• If a real nonsymmetric matrix has a complex eigenvalue a+bi corresponding to an eigenvector z, then
a-bi is also an eigenvalue. The eigenvalue a-bi corresponds to the eigenvector whose elements are
complex conjugate to the elements of z.
To solve a nonsymmetric eigenvalue problem with LAPACK, you usually need to reduce the matrix to the
upper Hessenberg form and then solve the eigenvalue problem with the Hessenberg matrix obtained. Table
"Computational Routines for Solving Nonsymmetric Eigenvalue Problems" lists LAPACK routines to reduce the
matrix to the upper Hessenberg form by an orthogonal (or unitary) similarity transformation A = Q*H*Q^H as
well as routines to solve eigenvalue problems with Hessenberg matrices, forming the Schur factorization of
such matrices and computing the corresponding condition numbers.
The decision tree in Figure "Decision Tree: Real Nonsymmetric Eigenvalue Problems" helps you choose the
right routine or sequence of routines for an eigenvalue problem with a real nonsymmetric matrix. If you need
to solve an eigenvalue problem with a complex non-Hermitian matrix, use the decision tree shown in Figure
"Decision Tree: Complex Non-Hermitian Eigenvalue Problems".
Computational Routines for Solving Nonsymmetric Eigenvalue Problems
Operation performed                                          Routines for real matrices   Routines for complex matrices

Reduce to Hessenberg form A = Q*H*Q^H                        ?gehrd                       ?gehrd

Generate the matrix Q                                        ?orghr                       ?unghr

Apply the matrix Q                                           ?ormhr                       ?unmhr

Balance matrix                                               ?gebal                       ?gebal

Transform eigenvectors of balanced matrix to those of
the original matrix                                          ?gebak                       ?gebak

Find eigenvalues and Schur factorization (QR algorithm)      ?hseqr                       ?hseqr

Find eigenvectors from Hessenberg form (inverse iteration)   ?hsein                       ?hsein

Find eigenvectors from Schur factorization                   ?trevc                       ?trevc

Estimate sensitivities of eigenvalues and eigenvectors       ?trsna                       ?trsna

Reorder Schur factorization                                  ?trexc                       ?trexc

Reorder Schur factorization, find the invariant subspace
and estimate sensitivities                                   ?trsen                       ?trsen

Solve Sylvester's equation                                   ?trsyl                       ?trsyl

Decision Tree: Real Nonsymmetric Eigenvalue Problems

Decision Tree: Complex Non-Hermitian Eigenvalue Problems
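A minimal sketch of this workflow for a small hypothetical real matrix (no prior balancing, so ilo = 1 and ihi = n); ?hseqr is one of the routines listed in the table above:

#include <stdio.h>
#include <string.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 3, ilo = 1, ihi = 3, info;
    double a[9] = { 4.0, 1.0, 0.5,      /* column-major general matrix A */
                    2.0, 3.0, 1.0,
                    0.0, 1.0, 2.0 };
    double tau[2], q[9], wr[3], wi[3];

    /* 1. Reduce A to upper Hessenberg form; reflectors are stored below the subdiagonal. */
    info = LAPACKE_dgehrd(LAPACK_COL_MAJOR, n, ilo, ihi, a, n, tau);
    if (info != 0) return 1;

    /* 2. Form the orthogonal matrix Q explicitly from the stored reflectors. */
    memcpy(q, a, sizeof(a));
    info = LAPACKE_dorghr(LAPACK_COL_MAJOR, n, ilo, ihi, q, n, tau);
    if (info != 0) return 1;

    /* 3. QR algorithm: Schur form in a, Schur vectors accumulated onto q,
          eigenvalues in (wr, wi).                                          */
    info = LAPACKE_dhseqr(LAPACK_COL_MAJOR, 'S', 'V', n, ilo, ihi,
                          a, n, wr, wi, q, n);
    if (info != 0) return 1;

    for (int i = 0; i < n; ++i)
        printf("lambda[%d] = %f + %f*i\n", i, wr[i], wi[i]);
    return 0;
}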

?gehrd
Reduces a general matrix to upper Hessenberg form.

Syntax
lapack_int LAPACKE_sgehrd (int matrix_layout, lapack_int n, lapack_int ilo, lapack_int
ihi, float* a, lapack_int lda, float* tau);
lapack_int LAPACKE_dgehrd (int matrix_layout, lapack_int n, lapack_int ilo, lapack_int
ihi, double* a, lapack_int lda, double* tau);
lapack_int LAPACKE_cgehrd (int matrix_layout, lapack_int n, lapack_int ilo, lapack_int
ihi, lapack_complex_float* a, lapack_int lda, lapack_complex_float* tau);
lapack_int LAPACKE_zgehrd (int matrix_layout, lapack_int n, lapack_int ilo, lapack_int
ihi, lapack_complex_double* a, lapack_int lda, lapack_complex_double* tau);


Include Files
• mkl.h

Description

The routine reduces a general matrix A to upper Hessenberg form H by an orthogonal or unitary similarity
transformation A = Q*H*Q^H. Here H has real subdiagonal elements.

The routine does not form the matrix Q explicitly. Instead, Q is represented as a product of elementary
reflectors. Routines are provided to work with Q in this representation.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

n The order of the matrix A (n≥ 0).

ilo, ihi If A has been balanced by ?gebal, then ilo and ihi must contain the values
returned by that routine. Otherwise ilo = 1 and ihi = n. (If n > 0, then
1 ≤ilo≤ihi≤n; if n = 0, ilo = 1 and ihi = 0.)

a Arrays:
a (size max(1, lda*n)) contains the matrix A.

lda The leading dimension of a; at least max(1, n).

Output Parameters

a The elements on and above the subdiagonal contain the upper Hessenberg
matrix H. The subdiagonal elements of H are real. The elements below the
subdiagonal, with the array tau, represent the orthogonal matrix Q as a
product of n elementary reflectors.

tau Array, size at least max (1, n-1).


Contains scalars that define elementary reflectors for the matrix Q.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed Hessenberg matrix H is exactly similar to a nearby matrix A + E, where ||E||_2 < c(n)*ε*||A||_2,
c(n) is a modestly increasing function of n, and ε is the machine precision.
The approximate number of floating-point operations for real flavors is (2/3)*(ihi - ilo)²*(2*ihi + 2*ilo + 3n);
for complex flavors it is 4 times greater.

?orghr
Generates the real orthogonal matrix Q determined
by ?gehrd.

Syntax
lapack_int LAPACKE_sorghr (int matrix_layout, lapack_int n, lapack_int ilo, lapack_int
ihi, float* a, lapack_int lda, const float* tau);
lapack_int LAPACKE_dorghr (int matrix_layout, lapack_int n, lapack_int ilo, lapack_int
ihi, double* a, lapack_int lda, const double* tau);

Include Files
• mkl.h

Description

The routine explicitly generates the orthogonal matrix Q that has been determined by a preceding call to
sgehrd/dgehrd. (The routine ?gehrd reduces a real general matrix A to upper Hessenberg form H by an
orthogonal similarity transformation, A = Q*H*Q^T, and represents the matrix Q as a product of ihi - ilo
elementary reflectors. Here ilo and ihi are values determined by sgebal/dgebal when balancing the
matrix; if the matrix has not been balanced, ilo = 1 and ihi = n.)

The matrix Q generated by ?orghr has the structure:

    ( I   0    0 )
Q = ( 0   Q22  0 )
    ( 0   0    I )

where Q22 occupies rows and columns ilo to ihi.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

n The order of the matrix Q (n≥ 0).

ilo, ihi These must be the same parameters ilo and ihi, respectively, as supplied
to ?gehrd. (If n > 0, then 1 ≤ilo≤ihi≤n; if n = 0, ilo = 1 and ihi =
0.)

a, tau Arrays: a (size max(1, lda*n)) contains details of the vectors which define
the elementary reflectors, as returned by ?gehrd.

tau contains further details of the elementary reflectors, as returned by
?gehrd.
The dimension of tau must be at least max (1, n-1).

lda The leading dimension of a; at least max(1, n).

Output Parameters

a Overwritten by the n-by-n orthogonal matrix Q.


Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed matrix Q differs from the exact result by a matrix E such that ||E||2 = O(ε), where ε is the
machine precision.
The approximate number of floating-point operations is (4/3)*(ihi-ilo)^3.

The complex counterpart of this routine is unghr.
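
For illustration, the following minimal sketch (with hypothetical sample data; not part of the reference text) reduces a 4-by-4 matrix to Hessenberg form with LAPACKE_dgehrd and then forms Q explicitly with LAPACKE_dorghr. The matrix is not balanced first, so ilo = 1 and ihi = n.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Hypothetical 4-by-4 example matrix, column-major storage. */
    lapack_int n = 4, lda = 4, ilo = 1, ihi = 4;
    double a[16] = { 4.0, 1.0, 2.0, 0.5,
                     3.0, 5.0, 1.0, 2.0,
                     0.0, 2.0, 6.0, 1.0,
                     1.0, 0.0, 3.0, 7.0 };
    double tau[3];                      /* size n-1 */

    /* Reduce A to upper Hessenberg form H; Q is stored in factored form. */
    lapack_int info = LAPACKE_dgehrd(LAPACK_COL_MAJOR, n, ilo, ihi, a, lda, tau);
    if (info != 0) return (int)info;

    double q[16];
    for (int i = 0; i < 16; ++i) q[i] = a[i];   /* copy the factored form */

    /* Overwrite the copy with the explicit n-by-n orthogonal matrix Q. */
    info = LAPACKE_dorghr(LAPACK_COL_MAJOR, n, ilo, ihi, q, lda, tau);
    if (info != 0) return (int)info;

    printf("Q(0,0) = %f\n", q[0]);
    return 0;
}

Generating Q explicitly is only needed when the full matrix is required (for example, to pass it to ?hseqr with compz = 'V'); otherwise ?ormhr can apply Q directly from the factored form.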

?ormhr
Multiplies an arbitrary real matrix C by the real
orthogonal matrix Q determined by ?gehrd.

Syntax
lapack_int LAPACKE_sormhr (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int ilo, lapack_int ihi, const float* a, lapack_int lda, const
float* tau, float* c, lapack_int ldc);
lapack_int LAPACKE_dormhr (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int ilo, lapack_int ihi, const double* a, lapack_int lda, const
double* tau, double* c, lapack_int ldc);

Include Files
• mkl.h

Description

The routine multiplies a matrix C by the orthogonal matrix Q that has been determined by a preceding call to
sgehrd/dgehrd. (The routine ?gehrd reduces a real general matrix A to upper Hessenberg form H by an
orthogonal similarity transformation, A = Q*H*Q^T, and represents the matrix Q as a product of ihi-ilo
elementary reflectors. Here ilo and ihi are values determined by sgebal/dgebal when balancing the
matrix; if the matrix has not been balanced, ilo = 1 and ihi = n.)

With ?ormhr, you can form one of the matrix products Q*C, Q^T*C, C*Q, or C*Q^T, overwriting the result on C
(which may be any real rectangular matrix).
A common application of ?ormhr is to transform a matrix V of eigenvectors of H to the matrix QV of
eigenvectors of A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

side Must be 'L' or 'R'.

If side = 'L', then the routine forms Q*C or Q^T*C.

If side = 'R', then the routine forms C*Q or C*Q^T.

trans Must be 'N' or 'T'.

If trans = 'N', then Q is applied to C.

If trans = 'T', then Q^T is applied to C.

m The number of rows in C (m≥ 0).

n The number of columns in C (n≥ 0).

ilo, ihi These must be the same parameters ilo and ihi, respectively, as supplied
to ?gehrd.

If m > 0 and side = 'L', then 1 ≤ilo≤ihi≤m.

If m = 0 and side = 'L', then ilo = 1 and ihi = 0.

If n > 0 and side = 'R', then 1 ≤ilo≤ihi≤n.

If n = 0 and side = 'R', then ilo = 1 and ihi = 0.

a, tau, c Arrays:
a(size max(1,lda*n) for side='R' and size max(1,lda*m) for side='L')
contains details of the vectors which define the elementary reflectors, as
returned by ?gehrd.

tau contains further details of the elementary reflectors, as returned by
?gehrd.
The dimension of tau must be at least max (1, m-1) if side = 'L' and at
least max (1, n-1) if side = 'R'.

c (size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout) contains the m-by-n matrix C.

lda The leading dimension of a; at least max(1, m) if side = 'L' and at least
max (1, n) if side = 'R'.

ldc The leading dimension of c; at least max(1, m) for column major layout and
at least max(1, n) for row major layout .

Output Parameters

c C is overwritten by the product Q*C, Q^T*C, C*Q, or C*Q^T as specified by side and


trans.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed matrix Q differs from the exact result by a matrix E such that ||E||2 = O(ε)*||C||2, where
ε is the machine precision.
The approximate number of floating-point operations is
2n(ihi-ilo)^2 if side = 'L';
2m(ihi-ilo)^2 if side = 'R'.


The complex counterpart of this routine is unmhr.
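
As a sketch (hypothetical helper function and data), the product Q^T*C can be formed directly from the factored representation returned by LAPACKE_dgehrd, without ever generating Q explicitly:

#include <mkl.h>

/* Sketch: apply Q^T (from a previous LAPACKE_dgehrd call on an n-by-n matrix
   stored in a/tau) to an n-by-nrhs matrix c, column-major layout.
   Returns the LAPACKE info code. */
lapack_int apply_qt(lapack_int n, lapack_int nrhs,
                    const double *a, lapack_int lda,
                    const double *tau, double *c, lapack_int ldc)
{
    /* side = 'L', trans = 'T': C := Q^T * C.  ilo = 1 and ihi = n assume the
       matrix was not balanced before the reduction. */
    return LAPACKE_dormhr(LAPACK_COL_MAJOR, 'L', 'T', n, nrhs, 1, n,
                          a, lda, tau, c, ldc);
}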

?unghr
Generates the complex unitary matrix Q determined
by ?gehrd.

Syntax
lapack_int LAPACKE_cunghr (int matrix_layout, lapack_int n, lapack_int ilo, lapack_int
ihi, lapack_complex_float* a, lapack_int lda, const lapack_complex_float* tau);
lapack_int LAPACKE_zunghr (int matrix_layout, lapack_int n, lapack_int ilo, lapack_int
ihi, lapack_complex_double* a, lapack_int lda, const lapack_complex_double* tau);

Include Files
• mkl.h

Description

The routine is intended to be used following a call to cgehrd/zgehrd, which reduces a complex matrix A to
upper Hessenberg form H by a unitary similarity transformation: A = Q*H*Q^H. ?gehrd represents the matrix
Q as a product of ihi-ilo elementary reflectors. Here ilo and ihi are values determined by cgebal/zgebal
when balancing the matrix; if the matrix has not been balanced, ilo = 1 and ihi = n.

Use the routine unghr to generate Q explicitly as a square matrix. The matrix Q has the structure:

    Q = ( I    0    0  )
        ( 0   Q22   0  )
        ( 0    0    I  )

where Q22 occupies rows and columns ilo to ihi.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

n The order of the matrix Q (n≥ 0).

ilo, ihi These must be the same parameters ilo and ihi, respectively, as supplied
to ?gehrd . (If n > 0, then 1 ≤ilo≤ihi≤n. If n = 0, then ilo = 1 and
ihi = 0.)

a, tau Arrays:
a (size max(1, lda*n)) contains details of the vectors which define the
elementary reflectors, as returned by ?gehrd.
tau contains further details of the elementary reflectors, as returned by
?gehrd.
The dimension of tau must be at least max (1, n-1).

lda The leading dimension of a; at least max(1, n).

Output Parameters

a Overwritten by the n-by-n unitary matrix Q.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed matrix Q differs from the exact result by a matrix E such that ||E||2 = O(ε), where ε is the
machine precision.
The approximate number of real floating-point operations is (16/3)*(ihi-ilo)^3.

The real counterpart of this routine is orghr.

?unmhr
Multiplies an arbitrary complex matrix C by the
complex unitary matrix Q determined by ?gehrd.

Syntax
lapack_int LAPACKE_cunmhr (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int ilo, lapack_int ihi, const lapack_complex_float* a, lapack_int
lda, const lapack_complex_float* tau, lapack_complex_float* c, lapack_int ldc);
lapack_int LAPACKE_zunmhr (int matrix_layout, char side, char trans, lapack_int m,
lapack_int n, lapack_int ilo, lapack_int ihi, const lapack_complex_double* a,
lapack_int lda, const lapack_complex_double* tau, lapack_complex_double* c, lapack_int
ldc);

Include Files
• mkl.h

Description

The routine multiplies a matrix C by the unitary matrix Q that has been determined by a preceding call to
cgehrd/zgehrd. (The routine ?gehrd reduces a complex general matrix A to upper Hessenberg form H by a
unitary similarity transformation, A = Q*H*Q^H, and represents the matrix Q as a product of ihi-ilo
elementary reflectors. Here ilo and ihi are values determined by cgebal/zgebal when balancing the
matrix; if the matrix has not been balanced, ilo = 1 and ihi = n.)

With ?unmhr, you can form one of the matrix products Q*C, Q^H*C, C*Q, or C*Q^H, overwriting the result on C
(which may be any complex rectangular matrix). A common application of this routine is to transform a
matrix V of eigenvectors of H to the matrix QV of eigenvectors of A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).


side Must be 'L' or 'R'.

If side = 'L', then the routine forms Q*C or Q^H*C.

If side = 'R', then the routine forms C*Q or C*Q^H.

trans Must be 'N' or 'C'.

If trans = 'N', then Q is applied to C.

If trans = 'C', then Q^H is applied to C.

m The number of rows in C (m≥ 0).

n The number of columns in C (n≥ 0).

ilo, ihi These must be the same parameters ilo and ihi, respectively, as supplied
to ?gehrd .

If m > 0 and side = 'L', then 1 ≤ilo≤ihi≤m.

If m = 0 and side = 'L', then ilo = 1 and ihi = 0.

If n > 0 and side = 'R', then 1 ≤ilo≤ihi≤n.

If n = 0 and side = 'R', then ilo =1 and ihi = 0.

a, tau, c Arrays:
a(size max(1,lda*n) for side='R' and size max(1,lda*m) for side='L')
contains details of the vectors which define the elementary reflectors, as
returned by ?gehrd.

tau contains further details of the elementary reflectors, as returned by
?gehrd.
The dimension of tau must be at least max (1, m-1)
if side = 'L' and at least max (1, n-1) if side = 'R'.

c (size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout) contains the m-by-n matrix C.

lda The leading dimension of a; at least max(1, m) if side = 'L' and at least
max (1, n) if side = 'R'.

ldc The leading dimension of c; at least max(1, m) for column major layout and
at least max(1, n) for row major layout.

Output Parameters

c C is overwritten by the product Q*C, Q^H*C, C*Q, or C*Q^H as specified by side and


trans.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed matrix Q differs from the exact result by a matrix E such that ||E||2 = O(ε)*||C||2, where
ε is the machine precision.
The approximate number of floating-point operations is
8n(ihi-ilo)^2 if side = 'L';
8m(ihi-ilo)^2 if side = 'R'.
The real counterpart of this routine is ormhr.

?gebal
Balances a general matrix to improve the accuracy of
computed eigenvalues and eigenvectors.

Syntax
lapack_int LAPACKE_sgebal( int matrix_layout, char job, lapack_int n, float* a,
lapack_int lda, lapack_int* ilo, lapack_int* ihi, float* scale );
lapack_int LAPACKE_dgebal( int matrix_layout, char job, lapack_int n, double* a,
lapack_int lda, lapack_int* ilo, lapack_int* ihi, double* scale );
lapack_int LAPACKE_cgebal( int matrix_layout, char job, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_int* ilo, lapack_int* ihi, float*
scale );
lapack_int LAPACKE_zgebal( int matrix_layout, char job, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_int* ilo, lapack_int* ihi, double*
scale );

Include Files
• mkl.h

Description

The routine balances a matrix A by performing either or both of the following two similarity transformations:
(1) The routine first attempts to permute A to block upper triangular form:

    P*A*P^T = A' = ( A'11  A'12  A'13 )
                   (  0    A'22  A'23 )
                   (  0     0    A'33 )

where P is a permutation matrix, and A'11 and A'33 are upper triangular. The diagonal elements of A'11 and
A'33 are eigenvalues of A. The rest of the eigenvalues of A are the eigenvalues of the central diagonal block
A'22, in rows and columns ilo to ihi. Subsequent operations to compute the eigenvalues of A (or its Schur
factorization) need only be applied to these rows and columns; this can save a significant amount of work if
ilo > 1 and ihi < n.
If no suitable permutation exists (as is often the case), the routine sets ilo = 1 and ihi = n, and A'22 is
the whole of A.
(2) The routine applies a diagonal similarity transformation to A', to make the rows and columns of A'22 as
close in norm as possible: A'' = D*A'*inv(D), where D is a diagonal matrix that differs from the identity
only in rows and columns ilo to ihi.


This scaling can reduce the norm of the matrix (that is, ||A''22|| < ||A'22||), and hence reduce the
effect of rounding errors on the accuracy of computed eigenvalues and eigenvectors.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

job Must be 'N' or 'P' or 'S' or 'B'.

If job = 'N', then A is neither permuted nor scaled (but ilo, ihi, and scale
get their values).
If job = 'P', then A is permuted but not scaled.

If job = 'S', then A is scaled but not permuted.

If job = 'B', then A is both scaled and permuted.

n The order of the matrix A (n≥ 0).

a Array a (size max(1, lda*n)) contains the matrix A.

lda The leading dimension of a; at least max(1, n).

Output Parameters

a Overwritten by the balanced matrix (a is not referenced if job = 'N').

ilo, ihi The values ilo and ihi such that on exit a(i,j) is zero if i > j and either
1 ≤ j < ilo or ihi < i ≤ n.
If job = 'N' or 'S', then ilo = 1 and ihi = n.

scale Array, size at least max(1, n).


Contains details of the permutations and scaling factors.
More precisely, if pj is the index of the row and column interchanged with
row and column j, and dj is the scaling factor used to balance row and
column j, then
scale[j - 1] = pj for j = 1, 2,..., ilo-1, ihi+1,..., n;
scale[j - 1] = dj for j = ilo, ilo + 1,..., ihi.
The order in which the interchanges are made is n to ihi+1, then 1 to ilo-1.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The errors are negligible, compared with those in subsequent computations.
If the matrix A is balanced by this routine, then any eigenvectors computed subsequently are eigenvectors of
the matrix A'' and hence you must call gebak to transform them back to eigenvectors of A.
If the Schur vectors of A are required, do not call this routine with job = 'S' or 'B', because then the
balancing transformation is not orthogonal (not unitary for complex flavors).
If you call this routine with job = 'P', then any Schur vectors computed subsequently are Schur vectors of
the matrix A'', and you need to call gebak (with side = 'R') to transform them back to Schur vectors of A.

The total number of floating-point operations is proportional to n^2.
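
A minimal sketch (hypothetical, badly scaled sample matrix) that balances with both permutation and scaling before an eigenvalue computation:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 3, lda = 3, ilo, ihi;
    /* Hypothetical badly scaled matrix, column-major storage. */
    double a[9] = { 1.0e-3, 2.0,    0.0,
                    1.0e+4, 5.0,    3.0e+3,
                    0.0,    1.0e-4, 8.0 };
    double scale[3];

    /* job = 'B': permute and scale.  On exit a holds the balanced matrix,
       and scale records the permutations and scaling factors for ?gebak. */
    lapack_int info = LAPACKE_dgebal(LAPACK_COL_MAJOR, 'B', n, a, lda,
                                     &ilo, &ihi, scale);
    if (info != 0) return (int)info;

    printf("ilo = %d, ihi = %d\n", (int)ilo, (int)ihi);
    return 0;
}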

?gebak
Transforms eigenvectors of a balanced matrix to those
of the original nonsymmetric matrix.

Syntax
lapack_int LAPACKE_sgebak( int matrix_layout, char job, char side, lapack_int n,
lapack_int ilo, lapack_int ihi, const float* scale, lapack_int m, float* v, lapack_int
ldv );
lapack_int LAPACKE_dgebak( int matrix_layout, char job, char side, lapack_int n,
lapack_int ilo, lapack_int ihi, const double* scale, lapack_int m, double* v,
lapack_int ldv );
lapack_int LAPACKE_cgebak( int matrix_layout, char job, char side, lapack_int n,
lapack_int ilo, lapack_int ihi, const float* scale, lapack_int m, lapack_complex_float*
v, lapack_int ldv );
lapack_int LAPACKE_zgebak( int matrix_layout, char job, char side, lapack_int n,
lapack_int ilo, lapack_int ihi, const double* scale, lapack_int m,
lapack_complex_double* v, lapack_int ldv );

Include Files
• mkl.h

Description

The routine is intended to be used after a matrix A has been balanced by a call to ?gebal, and eigenvectors
of the balanced matrix A''22 have subsequently been computed. For a description of balancing, see gebal. The
balanced matrix A'' is obtained as A'' = D*P*A*P^T*inv(D), where P is a permutation matrix and D is a
diagonal scaling matrix. This routine transforms the eigenvectors as follows:
if x is a right eigenvector of A'', then P^T*inv(D)*x is a right eigenvector of A; if y is a left eigenvector of A'',
then P^T*D*y is a left eigenvector of A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

job Must be 'N' or 'P' or 'S' or 'B'. The same parameter job as supplied to ?
gebal.


side Must be 'L' or 'R'.

If side = 'L', then left eigenvectors are transformed.

If side = 'R', then right eigenvectors are transformed.

n The number of rows of the matrix of eigenvectors (n≥ 0).

ilo, ihi The values ilo and ihi, as returned by ?gebal. (If n > 0, then 1
≤ilo≤ihi≤n;
if n = 0, then ilo = 1 and ihi = 0.)

scale Array, size at least max(1, n).


Contains details of the permutations and/or the scaling factors used to
balance the original general matrix, as returned by ?gebal.

m The number of columns of the matrix of eigenvectors (m≥ 0).

v Arrays:
v(size max(1, ldv*n) for column major layout and max(1, ldv*m) for row
major layout) contains the matrix of left or right eigenvectors to be
transformed.

ldv The leading dimension of v; at least max(1, n) for column major layout and
at least max(1, m) for row major layout .

Output Parameters

v Overwritten by the transformed eigenvectors.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The errors in this routine are negligible.
The number of floating-point operations is approximately proportional to m*n.
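
For example, after computing m right eigenvectors of the balanced matrix (columns of an n-by-m array v), a single call maps them back to eigenvectors of the original matrix; the sketch below assumes job, ilo, ihi, and scale are exactly as returned by ?gebal:

#include <mkl.h>

/* Sketch: back-transform m right eigenvectors of the balanced matrix
   (columns of v, column-major, leading dimension ldv >= n) to eigenvectors
   of the original matrix.  job, ilo, ihi, scale come from LAPACKE_dgebal. */
lapack_int backtransform_right(char job, lapack_int n, lapack_int ilo,
                               lapack_int ihi, const double *scale,
                               lapack_int m, double *v, lapack_int ldv)
{
    return LAPACKE_dgebak(LAPACK_COL_MAJOR, job, 'R', n, ilo, ihi,
                          scale, m, v, ldv);
}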

?hseqr
Computes all eigenvalues and (optionally) the Schur
factorization of a matrix reduced to Hessenberg form.

Syntax
lapack_int LAPACKE_shseqr( int matrix_layout, char job, char compz, lapack_int n,
lapack_int ilo, lapack_int ihi, float* h, lapack_int ldh, float* wr, float* wi, float*
z, lapack_int ldz );
lapack_int LAPACKE_dhseqr( int matrix_layout, char job, char compz, lapack_int n,
lapack_int ilo, lapack_int ihi, double* h, lapack_int ldh, double* wr, double* wi,
double* z, lapack_int ldz );

lapack_int LAPACKE_chseqr( int matrix_layout, char job, char compz, lapack_int n,
lapack_int ilo, lapack_int ihi, lapack_complex_float* h, lapack_int ldh,
lapack_complex_float* w, lapack_complex_float* z, lapack_int ldz );
lapack_int LAPACKE_zhseqr( int matrix_layout, char job, char compz, lapack_int n,
lapack_int ilo, lapack_int ihi, lapack_complex_double* h, lapack_int ldh,
lapack_complex_double* w, lapack_complex_double* z, lapack_int ldz );

Include Files
• mkl.h

Description
The routine computes all the eigenvalues, and optionally the Schur factorization, of an upper Hessenberg
matrix H: H = Z*T*Z^H, where T is an upper triangular (or, for real flavors, quasi-triangular) matrix (the
Schur form of H), and Z is the unitary or orthogonal matrix whose columns are the Schur vectors z_i.
You can also use this routine to compute the Schur factorization of a general matrix A which has been
reduced to upper Hessenberg form H:
A = Q*H*Q^H, where Q is unitary (orthogonal for real flavors);
A = (Q*Z)*T*(Q*Z)^H.
In this case, after reducing A to Hessenberg form by gehrd, call orghr to form Q explicitly and then pass Q
to ?hseqr with compz = 'V'.

You can also call gebal to balance the original matrix before reducing it to Hessenberg form by ?gehrd, so
that the Hessenberg matrix H will have the structure:

    H = ( H11  H12  H13 )
        (  0   H22  H23 )
        (  0    0   H33 )

where H11 and H33 are upper triangular.


If so, only the central diagonal block H22 (in rows and columns ilo to ihi) needs to be further reduced to
Schur form (the blocks H12 and H23 are also affected). Therefore the values of ilo and ihi can be supplied to ?
hseqr directly. Also, after calling this routine you must call gebak to permute the Schur vectors of the
balanced matrix to those of the original matrix.
If ?gebal has not been called, however, then ilo must be set to 1 and ihi to n. Note that if the Schur
factorization of A is required, ?gebal must not be called with job = 'S' or 'B', because the balancing
transformation is not unitary (for real flavors, it is not orthogonal).
?hseqr uses a multishift form of the upper Hessenberg QR algorithm. The Schur vectors are normalized so
that ||z_i||2 = 1, but are determined only to within a complex factor of absolute value 1 (for the real
flavors, to within a factor ±1).

Input Parameters

job Must be 'E' or 'S'.

If job = 'E', then eigenvalues only are required.

If job = 'S', then the Schur form T is required.


compz Must be 'N' or 'I' or 'V'.

If compz = 'N', then no Schur vectors are computed (and the array z is not
referenced).
If compz = 'I', then the Schur vectors of H are computed (and the array z
is initialized by the routine).
If compz = 'V', then the Schur vectors of A are computed (and the array z
must contain the matrix Q on entry).

n The order of the matrix H (n≥ 0).

ilo, ihi If A has been balanced by ?gebal, then ilo and ihi must contain the values
returned by ?gebal. Otherwise, ilo must be set to 1 and ihi to n.

h, z Arrays:
h (size max(1, ldh*n)) The n-by-n upper Hessenberg matrix H.

z (size max(1, ldz*n))

If compz = 'V', then z must contain the matrix Q from the reduction to
Hessenberg form.
If compz = 'I', then z need not be set.

If compz = 'N', then z is not referenced.

ldh The leading dimension of h; at least max(1, n).

ldz The leading dimension of z;


If compz = 'N', then ldz≥ 1.

If compz = 'V' or 'I', then ldz≥ max(1, n).

Output Parameters

w Array, size at least max (1, n). Contains the computed eigenvalues, unless
info>0. The eigenvalues are stored in the same order as on the diagonal of
the Schur form T (if computed).

wr, wi Arrays, size at least max (1, n) each.


Contain the real and imaginary parts, respectively, of the computed
eigenvalues, unless info > 0. Complex conjugate pairs of eigenvalues
appear consecutively with the eigenvalue having positive imaginary part
first. The eigenvalues are stored in the same order as on the diagonal of the
Schur form T (if computed).

h If info = 0 and job = 'S', h contains the upper quasi-triangular matrix T


from the Schur decomposition (the Schur form).
If info = 0 and job = 'E', the contents of h are unspecified on exit. (The
output value of h when info > 0 is given under the description of info
below.)

z If compz = 'V' and info = 0, then z contains Q*Z.

If compz = 'I' and info = 0, then z contains the unitary or orthogonal


matrix Z of the Schur vectors of H.

If compz = 'N', then z is not referenced.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, ?hseqr failed to compute all of the eigenvalues. Elements 1, 2, ..., ilo-1 and i+1, i+2, ..., n of
the eigenvalue arrays (wr and wi for real flavors and w for complex flavors) contain those eigenvalues that
have been successfully computed.
If info > 0, and job = 'E', then on exit, the remaining unconverged eigenvalues are the eigenvalues of the
upper Hessenberg matrix in rows and columns ilo through info of the final output value of H.
If info > 0, and job = 'S', then on exit (initial value of H)*U = U*(final value of H), where U is a unitary
matrix. The final value of H is upper Hessenberg and triangular in rows and columns info+1 through ihi.

If info > 0, and compz = 'V', then on exit (final value of Z) = (initial value of Z)*U, where U is the
unitary matrix (regardless of the value of job).
If info > 0, and compz = 'I', then on exit (final value of Z) = U, where U is the unitary matrix (regardless
of the value of job).
If info > 0, and compz = 'N', then Z is not accessed.

Application Notes
The computed Schur factorization is the exact factorization of a nearby matrix H + E, where ||E||2 < O(ε)*||H||2,
and ε is the machine precision.
If λ_i is an exact eigenvalue, and μ_i is the corresponding computed value, then |λ_i - μ_i| ≤ c(n)*ε*||H||2/s_i,
where c(n) is a modestly increasing function of n, and s_i is the reciprocal condition number of λ_i. The
condition numbers s_i may be computed by calling trsna.
The total number of floating-point operations depends on how rapidly the algorithm converges; typical
numbers are as follows.

If only eigenvalues are computed: 7n^3 for real flavors; 25n^3 for complex flavors.

If the Schur form is computed: 10n^3 for real flavors; 35n^3 for complex flavors.

If the full Schur factorization is computed: 20n^3 for real flavors; 70n^3 for complex flavors.
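
As an illustration of the workflow described above (balance, reduce to Hessenberg form, form Q, then compute the Schur factorization with compz = 'V'), here is a minimal sketch with hypothetical helper and array names; all arrays are column-major with leading dimension n:

#include <stdlib.h>
#include <string.h>
#include <mkl.h>

/* Sketch: real Schur factorization of an n-by-n matrix a.
   On return t holds the quasi-triangular factor T, z holds the accumulated
   Schur vectors, and wr/wi hold the eigenvalues.  All output arrays are
   caller-allocated (n*n or n doubles). */
lapack_int real_schur(lapack_int n, const double *a,
                      double *t, double *z, double *wr, double *wi)
{
    lapack_int info, ilo, ihi;
    double *scale = malloc((size_t)n * sizeof *scale);
    double *tau   = malloc((size_t)(n > 1 ? n - 1 : 1) * sizeof *tau);
    if (!scale || !tau) { free(scale); free(tau); return -1; }

    memcpy(t, a, (size_t)n * (size_t)n * sizeof *t);

    /* Permute only (job = 'P'), so the balancing transformation stays
       orthogonal and the Schur factorization remains valid. */
    info = LAPACKE_dgebal(LAPACK_COL_MAJOR, 'P', n, t, n, &ilo, &ihi, scale);

    /* Reduce to upper Hessenberg form, then form Q explicitly in z. */
    if (info == 0)
        info = LAPACKE_dgehrd(LAPACK_COL_MAJOR, n, ilo, ihi, t, n, tau);
    if (info == 0) {
        memcpy(z, t, (size_t)n * (size_t)n * sizeof *z);
        info = LAPACKE_dorghr(LAPACK_COL_MAJOR, n, ilo, ihi, z, n, tau);
    }

    /* Schur form of H; compz = 'V' accumulates Q*Z in z. */
    if (info == 0)
        info = LAPACKE_dhseqr(LAPACK_COL_MAJOR, 'S', 'V', n, ilo, ihi,
                              t, n, wr, wi, z, n);

    free(scale);
    free(tau);
    return info;
}

Because balancing is used here, a final call to LAPACKE_dgebak with side = 'R' is still needed to obtain Schur vectors of the original matrix, as noted in the ?gebal Application Notes.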

?hsein
Computes selected eigenvectors of an upper
Hessenberg matrix that correspond to specified
eigenvalues.

Syntax
lapack_int LAPACKE_shsein( int matrix_layout, char side, char eigsrc, char initv,
lapack_logical* select, lapack_int n, const float* h, lapack_int ldh, float* wr, const
float* wi, float* vl, lapack_int ldvl, float* vr, lapack_int ldvr, lapack_int mm,
lapack_int* m, lapack_int* ifaill, lapack_int* ifailr );


lapack_int LAPACKE_dhsein( int matrix_layout, char side, char eigsrc, char initv,
lapack_logical* select, lapack_int n, const double* h, lapack_int ldh, double* wr,
const double* wi, double* vl, lapack_int ldvl, double* vr, lapack_int ldvr, lapack_int
mm, lapack_int* m, lapack_int* ifaill, lapack_int* ifailr );
lapack_int LAPACKE_chsein( int matrix_layout, char side, char eigsrc, char initv, const
lapack_logical* select, lapack_int n, const lapack_complex_float* h, lapack_int ldh,
lapack_complex_float* w, lapack_complex_float* vl, lapack_int ldvl,
lapack_complex_float* vr, lapack_int ldvr, lapack_int mm, lapack_int* m, lapack_int*
ifaill, lapack_int* ifailr );
lapack_int LAPACKE_zhsein( int matrix_layout, char side, char eigsrc, char initv, const
lapack_logical* select, lapack_int n, const lapack_complex_double* h, lapack_int ldh,
lapack_complex_double* w, lapack_complex_double* vl, lapack_int ldvl,
lapack_complex_double* vr, lapack_int ldvr, lapack_int mm, lapack_int* m, lapack_int*
ifaill, lapack_int* ifailr );

Include Files
• mkl.h

Description

The routine computes left and/or right eigenvectors of an upper Hessenberg matrix H, corresponding to
selected eigenvalues.
The right eigenvector x and the left eigenvector y, corresponding to an eigenvalue λ, are defined by: H*x =
λ*x and y^H*H = λ*y^H (or, equivalently, H^H*y = λ^**y, where λ^* denotes the conjugate of λ).
The eigenvectors are computed by inverse iteration. They are scaled so that, for a real eigenvector x,
max|x_i| = 1, and for a complex eigenvector, max(|Re x_i| + |Im x_i|) = 1.
If H has been formed by reduction of a general matrix A to upper Hessenberg form, then eigenvectors of H
may be transformed to eigenvectors of A by ormhr or unmhr.

Input Parameters

side Must be 'R' or 'L' or 'B'.

If side = 'R', then only right eigenvectors are computed.

If side = 'L', then only left eigenvectors are computed.

If side = 'B', then all eigenvectors are computed.

eigsrc Must be 'Q' or 'N'.

If eigsrc = 'Q', then the eigenvalues of H were found using hseqr; thus if
H has any zero sub-diagonal elements (and so is block triangular), then the
j-th eigenvalue can be assumed to be an eigenvalue of the block containing
the j-th row/column. This property allows the routine to perform inverse
iteration on just one diagonal block. If eigsrc = 'N', then no such
assumption is made and the routine performs inverse iteration using the
whole matrix.

initv Must be 'N' or 'U'.

If initv = 'N', then no initial estimates for the selected eigenvectors are
supplied.

If initv = 'U', then initial estimates for the selected eigenvectors are
supplied in vl and/or vr.

select
Array, size at least max (1, n). Specifies which eigenvectors are to be
computed.
For real flavors:
To obtain the real eigenvector corresponding to the real eigenvalue wr[j],
set select[j] to 1.

To select the complex eigenvector corresponding to the complex eigenvalue
(wr[j - 1], wi[j - 1]) with complex conjugate (wr[j], wi[j]), set select[j - 1]
and/or select[j] to 1; the eigenvector corresponding to the first eigenvalue
in the pair is computed.
For complex flavors:
To select the eigenvector corresponding to the eigenvalue w[j], set select[j]
to 1.

n The order of the matrix H (n≥ 0).

h, vl, vr Arrays:
h (size max(1, ldh*n)) The n-by-n upper Hessenberg matrix H. If a NaN
value is detected in h, the routine returns with info = -6.

vl(size max(1, ldvl*mm) for column major layout and max(1, ldvl*n) for
row major layout)
If initv = 'U' and side = 'L' or 'B', then vl must contain starting
vectors for inverse iteration for the left eigenvectors. Each starting vector
must be stored in the same column or columns as will be used to store the
corresponding eigenvector.
If initv = 'N', then vl need not be set.

The array vl is not referenced if side = 'R'.

vr(size max(1, ldvr*mm) for column major layout and max(1, ldvr*n) for
row major layout)
If initv = 'U' and side = 'R' or 'B', then vr must contain starting
vectors for inverse iteration for the right eigenvectors. Each starting vector
must be stored in the same column or columns as will be used to store the
corresponding eigenvector.
If initv = 'N', then vr need not be set.

The array vr is not referenced if side = 'L'.

ldh The leading dimension of h; at least max(1, n).

w Array, size at least max (1, n).


Contains the eigenvalues of the matrix H.
If eigsrc = 'Q', the array must be exactly as returned by ?hseqr.

wr, wi Arrays, size at least max (1, n) each.


Contain the real and imaginary parts, respectively, of the eigenvalues of the
matrix H. Complex conjugate pairs of values must be stored in consecutive
elements of the arrays. If eigsrc = 'Q', the arrays must be exactly as
returned by ?hseqr.

ldvl The leading dimension of vl.


If side = 'L' or 'B', ldvl≥ max(1,n) for column major layout and ldvl≥
max(1, mm) for row major layout .

If side = 'R', ldvl≥ 1.

ldvr The leading dimension of vr.


If side = 'R' or 'B', ldvr≥ max(1,n) for column major layout and ldvr≥
max(1, mm) for row major layout .

If side = 'L', ldvr≥1.

mm The number of columns in vl and/or vr.


Must be at least m, the actual number of columns required (see Output
Parameters below).
For real flavors, m is obtained by counting 1 for each selected real
eigenvector and 2 for each selected complex eigenvector (see select).
For complex flavors, m is the number of selected eigenvectors (see select).
Constraint:
0 ≤mm≤n.

Output Parameters

select Overwritten for real flavors only.


If a complex eigenvector was selected as specified above, then select[j - 1]
is set to 1 and select[j] to 0.

w The real parts of some elements of w may be modified, as close eigenvalues


are perturbed slightly in searching for independent eigenvectors.

wr Some elements of wr may be modified, as close eigenvalues are perturbed


slightly in searching for independent eigenvectors.

vl, vr If side = 'L' or 'B', vl contains the computed left eigenvectors (as
specified by select).
If side = 'R' or 'B', vr contains the computed right eigenvectors (as
specified by select).
The eigenvectors treated column-wise form a rectangular n-by-mm matrix.

For real flavors: a real eigenvector corresponding to a real eigenvalue


occupies one column of the matrix; a complex eigenvector corresponding to
a complex eigenvalue occupies two columns: the first column holds the real
part of the eigenvector and the second column holds the imaginary part of
the eigenvector. The matrix is stored in a one-dimensional array as
described by matrix_layout (using either column major or row major
layout).

m For real flavors: the number of columns of vl and/or vr required to store the
selected eigenvectors.
For complex flavors: the number of selected eigenvectors.

ifaill, ifailr
Arrays, size at least max(1, mm) each.
ifaill[i - 1] = 0 if the i-th column of vl converged;
ifaill[i - 1] = j > 0 if the eigenvector stored in the i-th column of vl
(corresponding to the j-th eigenvalue) failed to converge.
ifailr[i - 1] = 0 if the i-th column of vr converged;
ifailr[i - 1] = j > 0 if the eigenvector stored in the i-th column of vr
(corresponding to the j-th eigenvalue) failed to converge.
For real flavors: if the i-th and (i+1)-th columns of vl contain a selected
complex eigenvector, then ifaill[i - 1] and ifaill[i] are set to the same value.
A similar rule holds for vr and ifailr.
The array ifaill is not referenced if side = 'R'. The array ifailr is not
referenced if side = 'L'.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, then i eigenvectors (as indicated by the parameters ifaill and/or ifailr above) failed to converge.
The corresponding columns of vl and/or vr contain no useful information.

Application Notes
Each computed right eigenvector x_i is the exact eigenvector of a nearby matrix A + E_i, such that ||E_i|| <
O(ε)*||A||. Hence the residual is small:
||A*x_i - λ_i*x_i|| = O(ε)*||A||.
However, eigenvectors corresponding to close or coincident eigenvalues may not accurately span the relevant
subspaces.
Similar remarks apply to computed left eigenvectors.
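
As a sketch (hypothetical helper, assuming a separate copy of the unreduced Hessenberg matrix has been kept, since ?hseqr overwrites its h argument), the right eigenvector of A belonging to the first eigenvalue returned by ?hseqr can be computed with ?hsein and then transformed back with ?ormhr:

#include <stdlib.h>
#include <mkl.h>

/* Sketch: right eigenvector of A for the first eigenvalue in wr/wi.
   h      - unreduced upper Hessenberg matrix from ?gehrd (separate copy),
   wr, wi - eigenvalues exactly as returned by ?hseqr,
   a, tau - factored form of Q as returned by ?gehrd,
   vr     - output, n-by-2 (two columns in case the eigenvalue is complex).
   All arrays are column-major with leading dimension n. */
lapack_int first_eigenvector(lapack_int n, const double *h,
                             double *wr, const double *wi,
                             const double *a, const double *tau,
                             double *vr, lapack_int *m)
{
    double vl_dummy[1] = { 0.0 };       /* not referenced when side = 'R' */
    lapack_int ifaill_dummy[2], ifailr[2], info;
    lapack_logical *select = calloc((size_t)n, sizeof *select);

    if (!select) return -1;
    select[0] = 1;                      /* first eigenvalue (or pair) only */

    /* Inverse iteration on H; eigsrc = 'Q' because wr/wi came from ?hseqr. */
    info = LAPACKE_dhsein(LAPACK_COL_MAJOR, 'R', 'Q', 'N', select, n,
                          h, n, wr, wi, vl_dummy, 1, vr, n, 2, m,
                          ifaill_dummy, ifailr);

    /* Back-transform: an eigenvector of A is Q times an eigenvector of H. */
    if (info == 0)
        info = LAPACKE_dormhr(LAPACK_COL_MAJOR, 'L', 'N', n, *m, 1, n,
                              a, n, tau, vr, n);

    free(select);
    return info;
}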

?trevc
Computes selected eigenvectors of an upper (quasi-)
triangular matrix computed by ?hseqr.

Syntax
lapack_int LAPACKE_strevc( int matrix_layout, char side, char howmny, lapack_logical*
select, lapack_int n, const float* t, lapack_int ldt, float* vl, lapack_int ldvl,
float* vr, lapack_int ldvr, lapack_int mm, lapack_int* m );
lapack_int LAPACKE_dtrevc( int matrix_layout, char side, char howmny, lapack_logical*
select, lapack_int n, const double* t, lapack_int ldt, double* vl, lapack_int ldvl,
double* vr, lapack_int ldvr, lapack_int mm, lapack_int* m );


lapack_int LAPACKE_ctrevc( int matrix_layout, char side, char howmny, const


lapack_logical* select, lapack_int n, lapack_complex_float* t, lapack_int ldt,
lapack_complex_float* vl, lapack_int ldvl, lapack_complex_float* vr, lapack_int ldvr,
lapack_int mm, lapack_int* m );
lapack_int LAPACKE_ztrevc( int matrix_layout, char side, char howmny, const
lapack_logical* select, lapack_int n, lapack_complex_double* t, lapack_int ldt,
lapack_complex_double* vl, lapack_int ldvl, lapack_complex_double* vr, lapack_int ldvr,
lapack_int mm, lapack_int* m );

Include Files
• mkl.h

Description

The routine computes some or all of the right and/or left eigenvectors of an upper triangular matrix T (or, for
real flavors, an upper quasi-triangular matrix T). Matrices of this type are produced by the Schur
factorization of a general matrix: A = Q*T*Q^H, as computed by hseqr.

The right eigenvector x and the left eigenvector y of T corresponding to an eigenvalue w are defined by:
T*x = w*x, y^H*T = w*y^H, where y^H denotes the conjugate transpose of y.
The eigenvalues are not input to this routine, but are read directly from the diagonal blocks of T.
This routine returns the matrices X and/or Y of right and left eigenvectors of T, or the products Q*X and/or
Q*Y, where Q is an input matrix.
If Q is the orthogonal/unitary factor that reduces a matrix A to Schur form T, then Q*X and Q*Y are the
matrices of right and left eigenvectors of A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

side Must be 'R' or 'L' or 'B'.

If side = 'R', then only right eigenvectors are computed.

If side = 'L', then only left eigenvectors are computed.

If side = 'B', then all eigenvectors are computed.

howmny Must be 'A' or 'B' or 'S'.

If howmny = 'A', then all eigenvectors (as specified by side) are


computed.
If howmny = 'B', then all eigenvectors (as specified by side) are computed
and backtransformed by the matrices supplied in vl and vr.
If howmny = 'S', then selected eigenvectors (as specified by side and
select) are computed.

select
Array, size at least max (1, n).
If howmny = 'S', select specifies which eigenvectors are to be computed.

If howmny = 'A' or 'B', select is not referenced.

For real flavors:

If omega[j] is a real eigenvalue, the corresponding real eigenvector is
computed if select[j] is 1.

If omega[j - 1] and omega[j] are the real and imaginary parts of a complex
eigenvalue, the corresponding complex eigenvector is computed if either
select[j - 1] or select[j] is 1, and on exit select[j - 1] is set to 1 and select[j]
is set to 0.

For complex flavors:


The eigenvector corresponding to the j-th eigenvalue is computed if select[j
- 1] is 1.

n The order of the matrix T (n≥ 0).

t, vl, vr Arrays:
t (size max(1, ldt*n)) contains the n-by-n matrix T in Schur canonical
form. For complex flavors ctrevc and ztrevc, contains the upper
triangular matrix T.
vl(size max(1, ldvl*mm) for column major layout and max(1, ldvl*n) for
row major layout)
If howmny = 'B' and side = 'L' or 'B', then vl must contain an n-by-n
matrix Q (usually the matrix of Schur vectors returned by ?hseqr).

If howmny = 'A' or 'S', then vl need not be set.

The array vl is not referenced if side = 'R'.

vr(size max(1, ldvr*mm) for column major layout and max(1, ldvr*n) for
row major layout)
If howmny = 'B' and side = 'R' or 'B', then vr must contain an n-by-n
matrix Q (usually the matrix of Schur vectors returned by ?hseqr).

If howmny = 'A' or 'S', then vr need not be set.

The array vr is not referenced if side = 'L'.

ldt The leading dimension of t; at least max(1, n).

ldvl The leading dimension of vl.


If side = 'L' or 'B', ldvl≥n.

If side = 'R', ldvl≥ 1.

ldvr The leading dimension of vr.


If side = 'R' or 'B', ldvr≥n.

If side = 'L', ldvr≥ 1.

mm The number of columns in the arrays vl and/or vr. Must be at least m (the
precise number of columns required).
If howmny = 'A' or 'B', mm = n.

If howmny = 'S': for real flavors, mm is obtained by counting 1 for each


selected real eigenvector and 2 for each selected complex eigenvector;
for complex flavors, mm is the number of selected eigenvectors (see
select).


Constraint: 0 ≤mm≤n.

Output Parameters

select If a complex eigenvector of a real matrix was selected as specified above,


then select[j] is set to 1 and select[j + 1] to 0.

t ctrevc/ztrevc modify the t array, which is restored on exit.

vl, vr If side = 'L' or 'B', vl contains the computed left eigenvectors (as
specified by howmny and select).
If side = 'R' or 'B', vr contains the computed right eigenvectors (as
specified by howmny and select).
The eigenvectors treated column-wise form a rectangular n-by-mm matrix.

For real flavors: a real eigenvector corresponding to a real eigenvalue


occupies one column of the matrix; a complex eigenvector corresponding to
a complex eigenvalue occupies two columns: the first column holds the real
part of the eigenvector and the second column holds the imaginary part of
the eigenvector. The matrix is stored in a one-dimensional array as
described by matrix_layout (using either column major or row major
layout).

m For complex flavors: the number of selected eigenvectors.

For real flavors: the number of columns of vl and/or vr actually used to
store the selected eigenvectors.

If howmny = 'A' or 'B', m is set to n.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
If x_i is an exact right eigenvector and y_i is the corresponding computed eigenvector, then the angle
θ(y_i, x_i) between them is bounded as follows: θ(y_i, x_i) ≤ c(n)*ε*||T||2/sep_i, where sep_i is the reciprocal
condition number of x_i. The condition number sep_i may be computed by calling ?trsna.
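
For instance, after ?hseqr with job = 'S' and compz = 'V' (so that t holds T and q holds the Schur vectors Q), all right eigenvectors of the original matrix A can be obtained in one call by back-transforming inside ?trevc. A minimal sketch with hypothetical array names:

#include <string.h>
#include <mkl.h>

/* Sketch: right eigenvectors of A from its real Schur factorization.
   t holds the quasi-triangular factor T and q the Schur vectors Q, both
   n-by-n from ?hseqr; on return vr holds Q*X, the right eigenvectors of A.
   Column-major storage, leading dimension n. */
lapack_int eigenvectors_from_schur(lapack_int n, const double *t,
                                   const double *q, double *vr)
{
    lapack_int m;
    lapack_logical select_dummy = 0;    /* not referenced when howmny = 'B' */
    double vl_dummy[1] = { 0.0 };       /* not referenced when side = 'R' */

    /* howmny = 'B': compute all right eigenvectors of T and back-transform
       them by the matrix supplied in vr, so vr must contain Q on entry. */
    memcpy(vr, q, (size_t)n * (size_t)n * sizeof *vr);
    return LAPACKE_dtrevc(LAPACK_COL_MAJOR, 'R', 'B', &select_dummy, n,
                          t, n, vl_dummy, 1, vr, n, n, &m);
}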

?trsna
Estimates condition numbers for specified eigenvalues
and right eigenvectors of an upper (quasi-) triangular
matrix.

Syntax
lapack_int LAPACKE_strsna( int matrix_layout, char job, char howmny, const
lapack_logical* select, lapack_int n, const float* t, lapack_int ldt, const float* vl,
lapack_int ldvl, const float* vr, lapack_int ldvr, float* s, float* sep, lapack_int
mm, lapack_int* m );

lapack_int LAPACKE_dtrsna( int matrix_layout, char job, char howmny, const
lapack_logical* select, lapack_int n, const double* t, lapack_int ldt, const double*
vl, lapack_int ldvl, const double* vr, lapack_int ldvr, double* s, double* sep,
lapack_int mm, lapack_int* m );
lapack_int LAPACKE_ctrsna( int matrix_layout, char job, char howmny, const
lapack_logical* select, lapack_int n, const lapack_complex_float* t, lapack_int ldt,
const lapack_complex_float* vl, lapack_int ldvl, const lapack_complex_float* vr,
lapack_int ldvr, float* s, float* sep, lapack_int mm, lapack_int* m );
lapack_int LAPACKE_ztrsna( int matrix_layout, char job, char howmny, const
lapack_logical* select, lapack_int n, const lapack_complex_double* t, lapack_int ldt,
const lapack_complex_double* vl, lapack_int ldvl, const lapack_complex_double* vr,
lapack_int ldvr, double* s, double* sep, lapack_int mm, lapack_int* m );

Include Files
• mkl.h

Description

The routine estimates condition numbers for specified eigenvalues and/or right eigenvectors of an upper
triangular matrix T (or, for real flavors, upper quasi-triangular matrix T in canonical Schur form). These are
the same as the condition numbers of the eigenvalues and right eigenvectors of an original matrix A =
Z*T*Z^H (with unitary or, for real flavors, orthogonal Z), from which T may have been derived.
The routine computes the reciprocal of the condition number of an eigenvalue λ_i as s_i = |v^T*u|/(||u||E*||v||E)
for real flavors and s_i = |v^H*u|/(||u||E*||v||E) for complex flavors,
where:

• u and v are the right and left eigenvectors of T, respectively, corresponding to λ_i.
• v^T/v^H denote the transpose/conjugate transpose of v, respectively.

This reciprocal condition number always lies between zero (ill-conditioned) and one (well-conditioned).
An approximate error estimate for a computed eigenvalue λ_i is then given by ε*||T||/s_i, where ε is the
machine precision.
To estimate the reciprocal of the condition number of the right eigenvector corresponding to λ_i, the routine
first calls trexc to reorder the diagonal elements of matrix T so that λ_i is in the leading position; let T22
denote the trailing (n-1)-by-(n-1) block of the reordered matrix.
The reciprocal condition number of the eigenvector is then estimated as sep_i, the smallest singular value of
the matrix T22 - λ_i*I.
An approximate error estimate for a computed right eigenvector u corresponding to λ_i is then given by
ε*||T||/sep_i.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

job Must be 'E' or 'V' or 'B'.


If job = 'E', then condition numbers for eigenvalues only are computed.

If job = 'V', then condition numbers for eigenvectors only are computed.

If job = 'B', then condition numbers for both eigenvalues and


eigenvectors are computed.

howmny Must be 'A' or 'S'.

If howmny = 'A', then the condition numbers for all eigenpairs are
computed.
If howmny = 'S', then condition numbers for selected eigenpairs (as
specified by select) are computed.

select
Array, size at least max (1, n) if howmny = 'S' and at least 1 otherwise.

Specifies the eigenpairs for which condition numbers are to be computed if


howmny= 'S'.

For real flavors:


To select condition numbers for the eigenpair corresponding to the real
eigenvalue λ_j, select[j] must be set to 1;

to select condition numbers for the eigenpair corresponding to a complex
conjugate pair of eigenvalues λ_j and λ_j+1, select[j - 1] and/or select[j]
must be set to 1.

For complex flavors:

To select condition numbers for the eigenpair corresponding to the
eigenvalue λ_j, select[j] must be set to 1.

select is not referenced if howmny = 'A'.

n The order of the matrix T (n≥ 0).

t, vl, vr Arrays:
t (size max(1, ldt*n)) contains the n-by-n matrix T.

vl(size max(1, ldvl*mm) for column major layout and max(1, ldvl*n) for
row major layout)
If job = 'E' or 'B', then vl must contain the left eigenvectors of T (or of
any matrix Q*T*QH with Q unitary or orthogonal) corresponding to the
eigenpairs specified by howmny and select. The eigenvectors must be
stored in consecutive columns of vl, as returned by trevc or hsein.
The array vl is not referenced if job = 'V'.

vr(size max(1, ldvr*mm) for column major layout and max(1, ldvr*n) for
row major layout)
If job = 'E' or 'B', then vr must contain the right eigenvectors of T (or of
any matrix Q*T*QH with Q unitary or orthogonal) corresponding to the
eigenpairs specified by howmny and select. The eigenvectors must be
stored in consecutive columns of vr, as returned by trevc or hsein.
The array vr is not referenced if job = 'V'.

ldt The leading dimension of t; at least max(1, n).

ldvl The leading dimension of vl.

If job = 'E' or 'B', ldvl≥ max(1,n) for column major layout and ldvl≥
max(1, mm) for row major layout .

If job = 'V', ldvl≥ 1.

ldvr The leading dimension of vr.


If job = 'E' or 'B', ldvr≥ max(1,n) for column major layout and ldvr≥
max(1, mm) for row major layout .

If job = 'V', ldvr≥ 1.

mm The number of elements in the arrays s and sep, and the number of
columns in vl and vr (if used). Must be at least m (the precise number
required).
If howmny = 'A', mm = n;

if howmny = 'S': for real flavors, mm is obtained by counting 1 for each
selected real eigenvalue and 2 for each selected complex conjugate pair of
eigenvalues;
for complex flavors, mm is the number of selected eigenpairs (see select).
Constraint:
0 ≤mm≤n.

Output Parameters

s Array, size at least max(1, mm) if job = 'E' or 'B' and at least 1 if job =
'V'.
Contains the reciprocal condition numbers of the selected eigenvalues if job
= 'E' or 'B', stored in consecutive elements of the array. Thus s[j - 1],
sep[j - 1] and the j-th columns of vl and vr all correspond to the same
eigenpair (but not in general the j-th eigenpair unless all eigenpairs have
been selected).
For real flavors: for a complex conjugate pair of eigenvalues, two
consecutive elements of s are set to the same value. The array s is not
referenced if job = 'V'.

sep Array, size at least max(1, mm) if job = 'V' or 'B' and at least 1 if job =
'E'. Contains the estimated reciprocal condition numbers of the selected
right eigenvectors if job = 'V' or 'B', stored in consecutive elements of
the array.
For real flavors: for a complex eigenvector, two consecutive elements of sep
are set to the same value; if the eigenvalues cannot be reordered to
compute sep[j - 1], then sep[j - 1] is set to zero; this can only occur when
the true value would be very small anyway. The array sep is not referenced
if job = 'E'.

m
For complex flavors: the number of selected eigenpairs.
If howmny = 'A', m is set to n.

For real flavors: the number of elements of s and/or sep actually used to
store the estimated condition numbers.


If howmny = 'A', m is set to n.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed values sep_i may overestimate the true value, but seldom by a factor of more than 3.
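
A sketch (hypothetical helper and array names) that estimates reciprocal condition numbers for all eigenvalues and right eigenvectors of a real Schur form T, using left and right eigenvectors previously computed by ?trevc with side = 'B':

#include <mkl.h>

/* Sketch: condition numbers of all eigenpairs of a real Schur form T.
   vl and vr hold the left and right eigenvectors of T (n columns each, as
   returned by ?trevc); s and sep must each have room for n values. */
lapack_int eigen_condition(lapack_int n, const double *t,
                           const double *vl, const double *vr,
                           double *s, double *sep)
{
    lapack_int m;
    lapack_logical select_dummy = 0;    /* not referenced when howmny = 'A' */

    /* job = 'B': condition numbers for both eigenvalues (s) and right
       eigenvectors (sep); howmny = 'A': all eigenpairs. */
    return LAPACKE_dtrsna(LAPACK_COL_MAJOR, 'B', 'A', &select_dummy, n,
                          t, n, vl, n, vr, n, s, sep, n, &m);
}

An approximate error bound for each computed eigenvalue is then ε*||T|| divided by the corresponding entry of s, as described above.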

?trexc
Reorders the Schur factorization of a general matrix.

Syntax
lapack_int LAPACKE_strexc( int matrix_layout, char compq, lapack_int n, float* t,
lapack_int ldt, float* q, lapack_int ldq, lapack_int* ifst, lapack_int* ilst );
lapack_int LAPACKE_dtrexc( int matrix_layout, char compq, lapack_int n, double* t,
lapack_int ldt, double* q, lapack_int ldq, lapack_int* ifst, lapack_int* ilst );
lapack_int LAPACKE_ctrexc( int matrix_layout, char compq, lapack_int n,
lapack_complex_float* t, lapack_int ldt, lapack_complex_float* q, lapack_int ldq,
lapack_int ifst, lapack_int ilst );
lapack_int LAPACKE_ztrexc( int matrix_layout, char compq, lapack_int n,
lapack_complex_double* t, lapack_int ldt, lapack_complex_double* q, lapack_int ldq,
lapack_int ifst, lapack_int ilst );

Include Files
• mkl.h

Description
The routine reorders the Schur factorization of a general matrix A = Q*T*Q^H, so that the diagonal element or
block of T with row index ifst is moved to row ilst.
The reordered Schur form S is computed by a unitary (or, for real flavors, orthogonal) similarity
transformation: S = Z^H*T*Z. Optionally the updated matrix P of Schur vectors is computed as P = Q*Z,
giving A = P*S*P^H.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

compq Must be 'V' or 'N'.

If compq = 'V', then the Schur vectors (Q) are updated.

If compq = 'N', then no Schur vectors are updated.

n The order of the matrix T (n≥ 0).

t, q Arrays:

t (size max(1, ldt*n)) contains the n-by-n matrix T.

q (size max(1, ldq*n))

If compq = 'V', then q must contain Q (Schur vectors).

If compq = 'N', then q is not referenced.

ldt The leading dimension of t; at least max(1, n).

ldq The leading dimension of q;


If compq = 'N', then ldq≥ 1.

If compq = 'V', then ldq≥ max(1, n).

ifst, ilst 1 ≤ifst≤n; 1 ≤ilst≤n.


Must specify the reordering of the diagonal elements (or blocks, which is
possible for real flavors) of the matrix T. The element (or block) with row
index ifst is moved to row ilst by a sequence of exchanges between
adjacent elements (or blocks).

Output Parameters

t Overwritten by the updated matrix S.

q If compq = 'V', q contains the updated matrix of Schur vectors.

ifst, ilst Overwritten for real flavors only.


If ifst pointed to the second row of a 2 by 2 block on entry, it is changed to
point to the first row; ilst always points to the first row of the block in its
final position (which may differ from its input value by ±1).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed matrix S is exactly similar to a matrix T+E, where ||E||2 = O(ε)*||T||2, and ε is the
machine precision.
Note that if a 2 by 2 diagonal block is involved in the re-ordering, its off-diagonal elements are in general
changed; the diagonal elements and the eigenvalues of the block are unchanged unless the block is
sufficiently ill-conditioned, in which case they may be noticeably altered. It is possible for a 2 by 2 block to
break into two 1 by 1 blocks, that is, for a pair of complex eigenvalues to become purely real.
The approximate number of floating-point operations is

for real flavors: 6n(ifst-ilst) if compq = 'N'; 12n(ifst-ilst) if compq = 'V';

for complex flavors: 20n(ifst-ilst) if compq = 'N'; 40n(ifst-ilst) if compq = 'V'.
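
A minimal sketch (hypothetical helper, with 1-based block indices) that moves the diagonal block starting at row ifst of a real Schur form to row ilst while updating the Schur vectors:

#include <mkl.h>

/* Sketch: reorder a real Schur factorization in place.  t holds T and q
   holds the Schur vectors Q (both n-by-n, column-major); the block with
   row index ifst is moved to row ilst. */
lapack_int move_block(lapack_int n, double *t, double *q,
                      lapack_int ifst, lapack_int ilst)
{
    /* For real flavors ifst and ilst are passed by address because the
       routine may adjust them when 2-by-2 blocks are involved. */
    return LAPACKE_dtrexc(LAPACK_COL_MAJOR, 'V', n, t, n, q, n, &ifst, &ilst);
}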


?trsen
Reorders the Schur factorization of a matrix and
(optionally) computes the reciprocal condition
numbers for the selected cluster of eigenvalues and
respective invariant subspace.

Syntax
lapack_int LAPACKE_strsen( int matrix_layout, char job, char compq, const
lapack_logical* select, lapack_int n, float* t, lapack_int ldt, float* q, lapack_int
ldq, float* wr, float* wi, lapack_int* m, float* s, float* sep );
lapack_int LAPACKE_dtrsen( int matrix_layout, char job, char compq, const
lapack_logical* select, lapack_int n, double* t, lapack_int ldt, double* q, lapack_int
ldq, double* wr, double* wi, lapack_int* m, double* s, double* sep );
lapack_int LAPACKE_ctrsen( int matrix_layout, char job, char compq, const
lapack_logical* select, lapack_int n, lapack_complex_float* t, lapack_int ldt,
lapack_complex_float* q, lapack_int ldq, lapack_complex_float* w, lapack_int* m, float*
s, float* sep );
lapack_int LAPACKE_ztrsen( int matrix_layout, char job, char compq, const
lapack_logical* select, lapack_int n, lapack_complex_double* t, lapack_int ldt,
lapack_complex_double* q, lapack_int ldq, lapack_complex_double* w, lapack_int* m,
double* s, double* sep );

Include Files
• mkl.h

Description

The routine reorders the Schur factorization of a general matrix A = Q*T*Q^T (for real flavors) or A = Q*T*Q^H
(for complex flavors) so that a selected cluster of eigenvalues appears in the leading diagonal elements (or,
for real flavors, diagonal blocks) of the Schur form. The reordered Schur form R is computed by a unitary
(orthogonal) similarity transformation: R = Z^H*T*Z. Optionally the updated matrix P of Schur vectors is
computed as P = Q*Z, giving A = P*R*P^H.

Let

    R = ( T11  T12 )
        (  0   T22 )

where the selected eigenvalues are precisely the eigenvalues of the leading m-by-m submatrix T11. Let P be
correspondingly partitioned as (Q1 Q2) where Q1 consists of the first m columns of Q. Then A*Q1 = Q1*T11,
and so the m columns of Q1 form an orthonormal basis for the invariant subspace corresponding to the
selected cluster of eigenvalues.
Optionally the routine also computes estimates of the reciprocal condition numbers of the average of the
cluster of eigenvalues and of the invariant subspace.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

job Must be 'N' or 'E' or 'V' or 'B'.

If job = 'N', then no condition numbers are required.

If job = 'E', then only the condition number for the cluster of eigenvalues
is computed.
If job = 'V', then only the condition number for the invariant subspace is
computed.
If job = 'B', then condition numbers for both the cluster and the invariant
subspace are computed.

compq Must be 'V' or 'N'.

If compq = 'V', then Q of the Schur vectors is updated.

If compq = 'N', then no Schur vectors are updated.

select
Array, size at least max (1, n).
Specifies the eigenvalues in the selected cluster. To select an eigenvalue λj,
select[j] must be 1

For real flavors: to select a complex conjugate pair of eigenvalues λj and λj


+1 (corresponding to a 2-by-2 diagonal block), select[j - 1] and/or select[j] must
be 1; the complex conjugates λj and λj + 1 must be either both included in the
cluster or both excluded.

n The order of the matrix T (n≥ 0).

t, q Arrays:
t (size max(1, ldt*n)) The upper quasi-triangular n-by-n matrix T, in Schur
canonical form.
q (size max(1, ldq*n))

If compq = 'V', then q must contain the matrix Q of Schur vectors.

If compq = 'N', then q is not referenced.

ldt The leading dimension of t; at least max(1, n).

ldq The leading dimension of q;


If compq = 'N', then ldq≥ 1.

If compq = 'V', then ldq≥ max(1, n).

Output Parameters

t Overwritten by the reordered matrix R in Schur canonical form with the


selected eigenvalues in the leading diagonal blocks.

q If compq = 'V', q contains the updated matrix of Schur vectors; the first
m columns of the Q form an orthogonal basis for the specified invariant
subspace.

w Array, size at least max(1, n). The reordered eigenvalues of R. The


eigenvalues are stored in the same order as on the diagonal of R.


wr, wi Arrays, size at least max(1, n). Contain the real and imaginary parts,
respectively, of the reordered eigenvalues of R. The eigenvalues are stored
in the same order as on the diagonal of R. Note that if a complex
eigenvalue is sufficiently ill-conditioned, then its value may differ
significantly from its value before reordering.

m
For complex flavors: the dimension of the specified invariant subspaces,
which is the same as the number of selected eigenvalues (see select).
For real flavors: the dimension of the specified invariant subspace. The
value of m is obtained by counting 1 for each selected real eigenvalue and 2
for each selected complex conjugate pair of eigenvalues (see select).
Constraint: 0 ≤m≤n.

s If job = 'E' or 'B', s is a lower bound on the reciprocal condition number


of the average of the selected cluster of eigenvalues.
If m = 0 or n, then s = 1.

For real flavors: if info = 1, then s is set to zero. s is not referenced if job
= 'N' or 'V'.

sep If job = 'V' or 'B', sep is the estimated reciprocal condition number of
the specified invariant subspace.
If m = 0 or n, then sep = ||T||.

For real flavors: if info = 1, then sep is set to zero.

sep is not referenced if job = 'N' or 'E'.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = 1, the reordering of T failed because some eigenvalues are too close to separate (the problem is
very ill-conditioned); T may have been partially reordered, and wr and wi contain the eigenvalues in the
same order as in T; s and sep (if requested) are set to zero.

Application Notes
The computed matrix R is exactly similar to a matrix T+E, where ||E||2 = O(ε)*||T||2, and ε is the
machine precision. The computed s cannot underestimate the true reciprocal condition number by more than
a factor of (min(m, n-m))^(1/2); sep may differ from the true value by (m*n-m^2)^(1/2). The angle between the
computed invariant subspace and the true subspace is O(ε)*||A||2/sep. Note that if a 2-by-2 diagonal
block is involved in the re-ordering, its off-diagonal elements are in general changed; the diagonal elements
and the eigenvalues of the block are unchanged unless the block is sufficiently ill-conditioned, in which case
they may be noticeably altered. It is possible for a 2-by-2 block to break into two 1-by-1 blocks, that is, for a
pair of complex eigenvalues to become purely real.
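
For example, the following sketch (hypothetical helper; select is a caller-supplied logical array of size n) reorders a real Schur factorization so that the flagged eigenvalues appear first and returns condition-number estimates for the cluster and its invariant subspace:

#include <mkl.h>

/* Sketch: reorder a real Schur factorization so that the eigenvalues flagged
   in select (size n) move to the leading block.  t holds T, q the Schur
   vectors (both n-by-n, column-major).  On return m is the dimension of the
   invariant subspace, s and sep the reciprocal condition-number estimates,
   and wr/wi (size n each) the reordered eigenvalues. */
lapack_int reorder_cluster(lapack_int n, double *t, double *q,
                           const lapack_logical *select,
                           double *wr, double *wi,
                           lapack_int *m, double *s, double *sep)
{
    /* job = 'B': condition numbers for both the cluster and the invariant
       subspace; compq = 'V': update the Schur vectors in q. */
    return LAPACKE_dtrsen(LAPACK_COL_MAJOR, 'B', 'V', select, n,
                          t, n, q, n, wr, wi, m, s, sep);
}

The first m columns of q then span the invariant subspace for the selected cluster, as described above.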

?trsyl
Solves Sylvester equation for real quasi-triangular or
complex triangular matrices.

Syntax
lapack_int LAPACKE_strsyl( int matrix_layout, char trana, char tranb, lapack_int isgn,
lapack_int m, lapack_int n, const float* a, lapack_int lda, const float* b, lapack_int
ldb, float* c, lapack_int ldc, float* scale );
lapack_int LAPACKE_dtrsyl( int matrix_layout, char trana, char tranb, lapack_int isgn,
lapack_int m, lapack_int n, const double* a, lapack_int lda, const double* b,
lapack_int ldb, double* c, lapack_int ldc, double* scale );
lapack_int LAPACKE_ctrsyl( int matrix_layout, char trana, char tranb, lapack_int isgn,
lapack_int m, lapack_int n, const lapack_complex_float* a, lapack_int lda, const
lapack_complex_float* b, lapack_int ldb, lapack_complex_float* c, lapack_int ldc,
float* scale );
lapack_int LAPACKE_ztrsyl( int matrix_layout, char trana, char tranb, lapack_int isgn,
lapack_int m, lapack_int n, const lapack_complex_double* a, lapack_int lda, const
lapack_complex_double* b, lapack_int ldb, lapack_complex_double* c, lapack_int ldc,
double* scale );

Include Files
• mkl.h

Description

The routine solves the Sylvester matrix equation op(A)*X±X*op(B) = α*C, where op(A) = A or AH, and the
matrices A and B are upper triangular (or, for real flavors, upper quasi-triangular in canonical Schur form); α≤
1 is a scale factor determined by the routine to avoid overflow in X; A is m-by-m, B is n-by-n, and C and X
are both m-by-n. The matrix X is obtained by a straightforward process of back substitution.
The equation has a unique solution if and only if αi±βi≠ 0, where {αi} and {βi} are the eigenvalues of A and
B, respectively, and the sign (+ or -) is the same as that used in the equation to be solved.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

trana Must be 'N' or 'T' or 'C'.

If trana = 'N', then op(A) = A.

If trana = 'T', then op(A) = AT (real flavors only).

If trana = 'C' then op(A) = AH.

tranb Must be 'N' or 'T' or 'C'.

If tranb = 'N', then op(B) = B.

If tranb = 'T', then op(B) = BT (real flavors only).

If tranb = 'C', then op(B) = BH.

isgn Indicates the form of the Sylvester equation.


If isgn = +1, op(A)*X + X*op(B) = alpha*C.

If isgn = -1, op(A)*X - X*op(B) = alpha*C.

m The order of A, and the number of rows in X and C (m≥ 0).


n The order of B, and the number of columns in X and C (n≥ 0).

a, b, c Arrays:
a (size max(1, lda*m)) contains the matrix A.

b (size max(1, ldb*n)) contains the matrix B.

c (size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout) contains the matrix C.

lda The leading dimension of a; at least max(1, m).

ldb The leading dimension of b; at least max(1, n).

ldc The leading dimension of c; at least max(1, m) for column major layout and
at least max(1, n) for row major layout .

Output Parameters

c Overwritten by the solution matrix X.

scale The value of the scale factor α.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = 1, A and B have common or close eigenvalues; perturbed values were used to solve the equation.

Application Notes
Let X be the exact, Y the corresponding computed solution, and R the residual matrix: R = C - (AY±YB).
Then the residual is always small:
||R||F = O(ε)*(||A||F +||B||F)*||Y||F.
However, Y is not necessarily the exact solution of a slightly perturbed equation; in other words, the solution
is not backwards stable.
For the forward error, the following bound holds:
||Y - X||F≤||R||F/sep(A,B)
but this may be a considerable overestimate. See [Golub96] for a definition of sep(A, B).
The approximate number of floating-point operations for real flavors is m*n*(m + n). For complex flavors it
is 4 times greater.
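
A minimal call sketch for LAPACKE_dtrsyl with row-major storage; the 2-by-2 data values below are
hypothetical and are chosen only so that A and B are upper triangular with no common eigenvalues.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int m = 2, n = 2;
    double a[4] = { 2.0, 0.5,          /* A, upper triangular, row major */
                    0.0, 3.0 };
    double b[4] = { 1.0, 0.2,          /* B, upper triangular, row major */
                    0.0, 4.0 };
    double c[4] = { 1.0, 2.0,          /* right-hand side C, overwritten by X */
                    3.0, 4.0 };
    double scale;

    /* Solve A*X + X*B = scale*C (trana = tranb = 'N', isgn = +1). */
    lapack_int info = LAPACKE_dtrsyl(LAPACK_ROW_MAJOR, 'N', 'N', 1,
                                     m, n, a, m, b, n, c, n, &scale);
    if (info == 0)
        printf("scale = %g, X(0,0) = %g\n", scale, c[0]);
    return (int)info;
}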

Generalized Nonsymmetric Eigenvalue Problems: LAPACK Computational Routines


This section describes LAPACK routines for solving generalized nonsymmetric eigenvalue problems,
reordering the generalized Schur factorization of a pair of matrices, as well as performing a number of
related computational tasks.
A generalized nonsymmetric eigenvalue problem is as follows: given a pair of nonsymmetric (or non-
Hermitian) n-by-n matrices A and B, find the generalized eigenvaluesλ and the corresponding generalized
eigenvectorsx and y that satisfy the equations
Ax = λBx (right generalized eigenvectors x)

and
yHA = λyHB (left generalized eigenvectors y).
Table "Computational Routines for Solving Generalized Nonsymmetric Eigenvalue Problems" lists LAPACK
routines used to solve the generalized nonsymmetric eigenvalue problems and the generalized Sylvester
equation.
Computational Routines for Solving Generalized Nonsymmetric Eigenvalue Problems
Routine Operation performed
name

gghrd Reduces a pair of matrices to generalized upper Hessenberg form using orthogonal/
unitary transformations.

ggbal Balances a pair of general real or complex matrices.

ggbak Forms the right or left eigenvectors of a generalized eigenvalue problem.

gghd3 Reduces a pair of matrices to generalized upper Hessenberg form.

hgeqz Implements the QZ method for finding the generalized eigenvalues of the matrix pair
(H,T).

tgevc Computes some or all of the right and/or left generalized eigenvectors of a pair of upper
triangular matrices

tgexc Reorders the generalized Schur decomposition of a pair of matrices (A,B) so that one
diagonal block of (A,B) moves to another row index.

tgsen Reorders the generalized Schur decomposition of a pair of matrices (A,B) so that a
selected cluster of eigenvalues appears in the leading diagonal blocks of (A,B).

tgsyl Solves the generalized Sylvester equation.

tgsna Estimates reciprocal condition numbers for specified eigenvalues and/or eigenvectors of a
pair of matrices in generalized real Schur canonical form.

?gghrd
Reduces a pair of matrices to generalized upper
Hessenberg form using orthogonal/unitary
transformations.

Syntax
lapack_int LAPACKE_sgghrd (int matrix_layout, char compq, char compz, lapack_int n,
lapack_int ilo, lapack_int ihi, float* a, lapack_int lda, float* b, lapack_int ldb,
float* q, lapack_int ldq, float* z, lapack_int ldz);
lapack_int LAPACKE_dgghrd (int matrix_layout, char compq, char compz, lapack_int n,
lapack_int ilo, lapack_int ihi, double* a, lapack_int lda, double* b, lapack_int ldb,
double* q, lapack_int ldq, double* z, lapack_int ldz);
lapack_int LAPACKE_cgghrd (int matrix_layout, char compq, char compz, lapack_int n,
lapack_int ilo, lapack_int ihi, lapack_complex_float* a, lapack_int lda,
lapack_complex_float* b, lapack_int ldb, lapack_complex_float* q, lapack_int ldq,
lapack_complex_float* z, lapack_int ldz);
lapack_int LAPACKE_zgghrd (int matrix_layout, char compq, char compz, lapack_int n,
lapack_int ilo, lapack_int ihi, lapack_complex_double* a, lapack_int lda,
lapack_complex_double* b, lapack_int ldb, lapack_complex_double* q, lapack_int ldq,
lapack_complex_double* z, lapack_int ldz);


Include Files
• mkl.h

Description

The routine reduces a pair of real/complex matrices (A,B) to generalized upper Hessenberg form using
orthogonal/unitary transformations, where A is a general matrix and B is upper triangular. The form of the
generalized eigenvalue problem is A*x = λ*B*x, and B is typically made upper triangular by computing its
QR factorization and moving the orthogonal matrix Q to the left side of the equation.
This routine simultaneously reduces A to a Hessenberg matrix H:
QH*A*Z = H
and transforms B to another upper triangular matrix T:
QH*B*Z = T
in order to reduce the problem to its standard form H*y = λ*T*y, where y = ZH*x.

The orthogonal/unitary matrices Q and Z are determined as products of Givens rotations. They may either be
formed explicitly, or they may be postmultiplied into input matrices Q1 and Z1, so that
Q1*A*Z1H = (Q1*Q)*H*(Z1*Z)H
Q1*B*Z1H = (Q1*Q)*T*(Z1*Z)H
If Q1 is the orthogonal/unitary matrix from the QR factorization of B in the original equation A*x = λ*B*x,
then the routine ?gghrd reduces the original problem to generalized Hessenberg form.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

compq Must be 'N', 'I', or 'V'.

If compq = 'N', matrix Q is not computed.

If compq = 'I', Q is initialized to the unit matrix, and the orthogonal/


unitary matrix Q is returned;
If compq = 'V', Q must contain an orthogonal/unitary matrix Q1 on entry,
and the product Q1*Q is returned.

compz Must be 'N', 'I', or 'V'.

If compz = 'N', matrix Z is not computed.

If compz = 'I', Z is initialized to the unit matrix, and the orthogonal/


unitary matrix Z is returned;
If compz = 'V', Z must contain an orthogonal/unitary matrix Z1 on entry,
and the product Z1*Z is returned.

n The order of the matrices A and B (n≥ 0).

ilo, ihi ilo and ihi mark the rows and columns of A which are to be reduced. It is
assumed that A is already upper triangular in rows and columns 1:ilo-1 and
ihi+1:n. Values of ilo and ihi are normally set by a previous call to ggbal;
otherwise they should be set to 1 and n respectively.
Constraint:

If n > 0, then 1 ≤ilo≤ihi≤n;

if n = 0, then ilo = 1 and ihi = 0.

a, b, q, z Arrays:
a (size max(1, lda*n)) contains the n-by-n general matrix A.

b (size max(1, ldb*n)) contains the n-by-n upper triangular matrix B.

q (size max(1, ldq*n))

If compq = 'N', then q is not referenced.

If compq = 'V', then q must contain the orthogonal/unitary matrix Q1,


typically from the QR factorization of B.
z (size max(1, ldz*n))

If compz = 'N', then z is not referenced.

If compz = 'V', then z must contain the orthogonal/unitary matrix Z1.

lda The leading dimension of a; at least max(1, n).

ldb The leading dimension of b; at least max(1, n).

ldq The leading dimension of q;


If compq = 'N', then ldq≥ 1.

If compq = 'I' or 'V', then ldq≥ max(1, n).

ldz The leading dimension of z;


If compz = 'N', then ldz≥ 1.

If compz = 'I' or 'V', then ldz≥ max(1, n).

Output Parameters

a On exit, the upper triangle and the first subdiagonal of A are overwritten
with the upper Hessenberg matrix H, and the rest is set to zero.

b On exit, overwritten by the upper triangular matrix T = QH*B*Z. The


elements below the diagonal are set to zero.

q If compq = 'I', then q contains the orthogonal/unitary matrix Q;

If compq = 'V', then q is overwritten by the product Q1*Q.

z If compz = 'I', then z contains the orthogonal/unitary matrix Z;

If compz = 'V', then z is overwritten by the product Z1*Z.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.
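
A minimal call sketch for LAPACKE_dgghrd with row-major storage; the 3-by-3 data are hypothetical,
with B already upper triangular as the routine requires, and compq = compz = 'I' so that Q and Z are
accumulated from the identity.

#include <mkl.h>

int main(void) {
    lapack_int n = 3, lda = 3, ldb = 3, ldq = 3, ldz = 3;
    double a[9] = { 1.0, 2.0, 3.0,     /* general matrix A, row major */
                    4.0, 5.0, 6.0,
                    7.0, 8.0, 1.0 };
    double b[9] = { 2.0, 1.0, 0.5,     /* upper triangular B, row major */
                    0.0, 3.0, 1.0,
                    0.0, 0.0, 4.0 };
    double q[9], z[9];

    /* ilo = 1, ihi = n: no previous balancing step. */
    lapack_int info = LAPACKE_dgghrd(LAPACK_ROW_MAJOR, 'I', 'I', n,
                                     1, n, a, lda, b, ldb, q, ldq, z, ldz);
    /* On success, a holds H (upper Hessenberg) and b holds T (upper triangular). */
    return (int)info;
}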


?ggbal
Balances a pair of general real or complex matrices.

Syntax
lapack_int LAPACKE_sggbal( int matrix_layout, char job, lapack_int n, float* a,
lapack_int lda, float* b, lapack_int ldb, lapack_int* ilo, lapack_int* ihi, float*
lscale, float* rscale );
lapack_int LAPACKE_dggbal( int matrix_layout, char job, lapack_int n, double* a,
lapack_int lda, double* b, lapack_int ldb, lapack_int* ilo, lapack_int* ihi, double*
lscale, double* rscale );
lapack_int LAPACKE_cggbal( int matrix_layout, char job, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* b, lapack_int ldb,
lapack_int* ilo, lapack_int* ihi, float* lscale, float* rscale );
lapack_int LAPACKE_zggbal( int matrix_layout, char job, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* b, lapack_int ldb,
lapack_int* ilo, lapack_int* ihi, double* lscale, double* rscale );

Include Files
• mkl.h

Description

The routine balances a pair of general real/complex matrices (A,B). This involves, first, permuting A and B by
similarity transformations to isolate eigenvalues in the first 1 to ilo-1 and last ihi+1 to n elements on the
diagonal; and second, applying a diagonal similarity transformation to rows and columns ilo to ihi to make the
rows and columns as close in norm as possible. Both steps are optional. Balancing may reduce the 1-norm of
the matrices, and improve the accuracy of the computed eigenvalues and/or eigenvectors in the generalized
eigenvalue problem A*x = λ*B*x.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

job Specifies the operations to be performed on A and B. Must be 'N' or 'P' or


'S' or 'B'.
If job = 'N', then no operations are done; simply set ilo = 1, ihi = n,
lscale[i] = 1.0 and rscale[i] = 1.0 for
i = 0,..., n - 1.
If job = 'P', then permute only.

If job = 'S', then scale only.

If job = 'B', then both permute and scale.

n The order of the matrices A and B (n≥ 0).

a, b Arrays:
a (size max(1, lda*n)) contains the matrix A.

b (size max(1, ldb*n)) contains the matrix B.

If job = 'N', a and b are not referenced.

lda The leading dimension of a; at least max(1, n).

ldb The leading dimension of b; at least max(1, n).

Output Parameters

a, b Overwritten by the balanced matrices A and B, respectively.

ilo, ihi ilo and ihi are set to integers such that on exit A(i, j) = 0 and B(i, j) = 0 if i > j
and j = 1,..., ilo-1 or i = ihi+1,..., n.

If job = 'N' or 'S', then ilo = 1 and ihi = n.

lscale, rscale Arrays, size at least max(1, n).


lscale contains details of the permutations and scaling factors applied to the
left side of A and B.
If Pj is the index of the row interchanged with row j, and Dj is the scaling
factor applied to row j, then
lscale[j - 1] = Pj, for j = 1,..., ilo-1
= Dj, for j = ilo,...,ihi
= Pj, for j = ihi+1,..., n.
rscale contains details of the permutations and scaling factors applied to the
right side of A and B.
If Pj is the index of the column interchanged with column j, and Dj is the
scaling factor applied to column j, then
rscale[j - 1] = Pj, for j = 1,..., ilo-1
= Dj, for j = ilo,...,ihi
= Pj, for j = ihi+1,..., n
The order in which the interchanges are made is n to ihi+1, then 1 to ilo-1.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.
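
A minimal call sketch for LAPACKE_dggbal with row-major storage and hypothetical 3-by-3 data; the
returned ilo, ihi, lscale, and rscale would normally be passed on to ?gghrd/?hgeqz and later to ?ggbak.

#include <mkl.h>

int main(void) {
    lapack_int n = 3, lda = 3, ldb = 3, ilo, ihi;
    double a[9] = { 1.0e4, 2.0,   3.0,
                    0.0,   5.0,   6.0e3,
                    7.0,   0.0,   1.0 };
    double b[9] = { 2.0, 1.0, 0.0,
                    0.0, 3.0, 1.0,
                    1.0, 0.0, 4.0 };
    double lscale[3], rscale[3];

    /* job = 'B': both permute and scale. */
    lapack_int info = LAPACKE_dggbal(LAPACK_ROW_MAJOR, 'B', n, a, lda,
                                     b, ldb, &ilo, &ihi, lscale, rscale);
    return (int)info;
}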

?ggbak
Forms the right or left eigenvectors of a generalized
eigenvalue problem.

Syntax
lapack_int LAPACKE_sggbak( int matrix_layout, char job, char side, lapack_int n,
lapack_int ilo, lapack_int ihi, const float* lscale, const float* rscale, lapack_int
m, float* v, lapack_int ldv );
lapack_int LAPACKE_dggbak( int matrix_layout, char job, char side, lapack_int n,
lapack_int ilo, lapack_int ihi, const double* lscale, const double* rscale, lapack_int
m, double* v, lapack_int ldv );


lapack_int LAPACKE_cggbak( int matrix_layout, char job, char side, lapack_int n,


lapack_int ilo, lapack_int ihi, const float* lscale, const float* rscale, lapack_int
m, lapack_complex_float* v, lapack_int ldv );
lapack_int LAPACKE_zggbak( int matrix_layout, char job, char side, lapack_int n,
lapack_int ilo, lapack_int ihi, const double* lscale, const double* rscale, lapack_int
m, lapack_complex_double* v, lapack_int ldv );

Include Files
• mkl.h

Description

The routine forms the right or left eigenvectors of a real/complex generalized eigenvalue problem
A*x = λ*B*x
by backward transformation on the computed eigenvectors of the balanced pair of matrices output by ggbal.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

job Specifies the type of backward transformation required. Must be 'N', 'P',
'S', or 'B'.
If job = 'N', then no operations are done; return.

If job = 'P', then do backward transformation for permutation only.

If job = 'S', then do backward transformation for scaling only.

If job = 'B', then do backward transformation for both permutation and


scaling. This argument must be the same as the argument job supplied
to ?ggbal.

side Must be 'L' or 'R'.

If side = 'L', then v contains left eigenvectors.

If side = 'R', then v contains right eigenvectors.

n The number of rows of the matrix V (n≥ 0).

ilo, ihi The integers ilo and ihi determined by ?ggbal. Constraint:

If n > 0, then 1 ≤ilo≤ihi≤n;

if n = 0, then ilo = 1 and ihi = 0.

lscale, rscale Arrays, size at least max(1, n).


The array lscale contains details of the permutations and/or scaling factors
applied to the left side of A and B, as returned by ?ggbal.

The array rscale contains details of the permutations and/or scaling factors
applied to the right side of A and B, as returned by ?ggbal.

m The number of columns of the matrix V


(m≥ 0).

v Array v (size max(1, ldv*m) for column major layout and max(1, ldv*n) for
row major layout). Contains the matrix of right or left eigenvectors to be
transformed, as returned by tgevc.

ldv The leading dimension of v; at least max(1, n) for column major layout and
at least max(1, m) for row major layout .

Output Parameters

v Overwritten by the transformed eigenvectors

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.
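
A minimal sketch of a helper that applies LAPACKE_dggbak to right eigenvectors; the helper name is
hypothetical, and lscale, rscale, ilo, ihi, and v are assumed to come from previous ?ggbal and ?tgevc
calls with the same n.

#include <mkl.h>

lapack_int backtransform_right(lapack_int n, lapack_int ilo, lapack_int ihi,
                               const double *lscale, const double *rscale,
                               lapack_int m, double *v, lapack_int ldv)
{
    /* job = 'B' must match the job used in the ?ggbal call; side = 'R'
       because v holds right eigenvectors. */
    return LAPACKE_dggbak(LAPACK_ROW_MAJOR, 'B', 'R', n, ilo, ihi,
                          lscale, rscale, m, v, ldv);
}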

?gghd3
Reduces a pair of matrices to generalized upper
Hessenberg form.

Syntax
lapack_int LAPACKE_sgghd3 (int matrix_layout, char compq, char compz, lapack_int n,
lapack_int ilo, lapack_int ihi, float * a, lapack_int lda, float * b, lapack_int ldb,
float * q, lapack_int ldq, float * z, lapack_int ldz);
lapack_int LAPACKE_dgghd3 (int matrix_layout, char compq, char compz, lapack_int n,
lapack_int ilo, lapack_int ihi, double * a, lapack_int lda, double * b, lapack_int
ldb, double * q, lapack_int ldq, double * z, lapack_int ldz);
lapack_int LAPACKE_cgghd3 (int matrix_layout, char compq, char compz, lapack_int n,
lapack_int ilo, lapack_int ihi, lapack_complex_float * a, lapack_int lda,
lapack_complex_float * b, lapack_int ldb, lapack_complex_float * q, lapack_int ldq,
lapack_complex_float * z, lapack_int ldz);
lapack_int LAPACKE_zgghd3 (int matrix_layout, char compq, char compz, lapack_int n,
lapack_int ilo, lapack_int ihi, lapack_complex_double * a, lapack_int lda,
lapack_complex_double * b, lapack_int ldb, lapack_complex_double * q, lapack_int ldq,
lapack_complex_double * z, lapack_int ldz);

Include Files
• mkl.h

Description
?gghd3 reduces a pair of real or complex matrices (A, B) to generalized upper Hessenberg form using
orthogonal/unitary transformations, where A is a general matrix and B is upper triangular. The form of the
generalized eigenvalue problem is
A*x = λ*B*x,
and B is typically made upper triangular by computing its QR factorization and moving the orthogonal/unitary
matrix Q to the left side of the equation.
This subroutine simultaneously reduces A to a Hessenberg matrix H:
QT*A*Z = H for real flavors
or
QH*A*Z = H for complex flavors
and transforms B to another upper triangular matrix T:
QT*B*Z = T for real flavors
or
QH*B*Z = T for complex flavors
in order to reduce the problem to its standard form
H*y = λ*T*y
where y = ZT*x for real flavors
or
y = ZH*x for complex flavors.
The orthogonal/unitary matrices Q and Z are determined as products of Givens rotations. They may either be
formed explicitly, or they may be postmultiplied into input matrices Q1 and Z1, so that
for real flavors:
Q1 * A * Z1T = (Q1*Q) * H * (Z1*Z)T
Q1 * B * Z1T = (Q1*Q) * T * (Z1*Z)T
for complex flavors:
Q1 * A * Z1H = (Q1*Q) * H * (Z1*Z)H
Q1 * B * Z1H = (Q1*Q) * T * (Z1*Z)H
If Q1 is the orthogonal/unitary matrix from the QR factorization of B in the original equation A*x = λ*B*x,
then ?gghd3 reduces the original problem to generalized Hessenberg form.

This is a blocked variant of ?gghrd, using matrix-matrix multiplications for parts of the computation to
enhance performance.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

compq = 'N': do not compute q;

= 'I': q is initialized to the unit matrix, and the orthogonal/unitary matrix Q


is returned;
= 'V': q must contain an orthogonal/unitary matrix Q1 on entry, and the
product Q1*q is returned.

compz = 'N': do not compute z;

= 'I': z is initialized to the unit matrix, and the orthogonal/unitary matrix Z


is returned;
= 'V': z must contain an orthogonal/unitary matrix Z1 on entry, and the
product Z1*z is returned.

n The order of the matrices A and B.


n≥ 0.

ilo, ihi ilo and ihi mark the rows and columns of a which are to be reduced. It is
assumed that a is already upper triangular in rows and columns 1:ilo - 1
and ihi + 1:n. ilo and ihi are normally set by a previous call to ?ggbal;
otherwise they should be set to 1 and n, respectively.

1 ≤ilo≤ihi≤n, if n > 0; ilo=1 and ihi=0, if n=0.

a Array, size (lda*n).

On entry, the n-by-n general matrix to be reduced.

lda The leading dimension of the array a.

lda≥ max(1,n).

b Array, (ldb*n).

On entry, the n-by-n upper triangular matrix B.

ldb The leading dimension of the array b.

ldb≥ max(1,n).

q Array, size (ldq*n).

On entry, if compq = 'V', the orthogonal/unitary matrix Q1, typically from


the QR factorization of b.

ldq The leading dimension of the array q.

ldq≥n if compq='V' or 'I'; ldq≥ 1 otherwise.

z Array, size (ldz*n).

On entry, if compz = 'V', the orthogonal/unitary matrix Z1.

Not referenced if compz='N'.

ldz The leading dimension of the array z. ldz≥n if compz='V' or 'I'; ldz≥ 1
otherwise.

Output Parameters

a On exit, the upper triangle and the first subdiagonal of a are overwritten
with the upper Hessenberg matrix H, and the rest is set to zero.

b On exit, the upper triangular matrix T = QTBZ for real flavors or T = QHBZ
for complex flavors. The elements below the diagonal are set to zero.

q On exit, if compq='I', the orthogonal/unitary matrix Q, and if compq = 'V',


the product Q1*Q.
Not referenced if compq='N'.

z On exit, if compz='I', the orthogonal/unitary matrix Z, and if compz = 'V',


the product Z1*Z.
Not referenced if compz='N'.


Return Values
This function returns a value info.

= 0: successful exit.
< 0: if info = -i, the i-th argument had an illegal value.

Application Notes
This routine reduces A to Hessenberg form and maintains B in triangular form using a blocked variant of
Moler and Stewart's original algorithm, as described by Kagstrom, Kressner, Quintana-Orti, and
Quintana-Orti (BIT 2008).
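
A minimal sketch (hypothetical helper name): because ?gghd3 takes the same argument list as ?gghrd,
the blocked reduction is obtained by calling this entry point on arrays prepared exactly as for
LAPACKE_dgghrd.

#include <mkl.h>

lapack_int reduce_blocked(lapack_int n, lapack_int ilo, lapack_int ihi,
                          double *a, lapack_int lda, double *b, lapack_int ldb,
                          double *q, lapack_int ldq, double *z, lapack_int ldz)
{
    /* compq = compz = 'I': accumulate Q and Z from the identity. */
    return LAPACKE_dgghd3(LAPACK_ROW_MAJOR, 'I', 'I', n, ilo, ihi,
                          a, lda, b, ldb, q, ldq, z, ldz);
}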

?hgeqz
Implements the QZ method for finding the generalized
eigenvalues of the matrix pair (H,T).

Syntax
lapack_int LAPACKE_shgeqz( int matrix_layout, char job, char compq, char compz,
lapack_int n, lapack_int ilo, lapack_int ihi, float* h, lapack_int ldh, float* t,
lapack_int ldt, float* alphar, float* alphai, float* beta, float* q, lapack_int ldq,
float* z, lapack_int ldz );
lapack_int LAPACKE_dhgeqz( int matrix_layout, char job, char compq, char compz,
lapack_int n, lapack_int ilo, lapack_int ihi, double* h, lapack_int ldh, double* t,
lapack_int ldt, double* alphar, double* alphai, double* beta, double* q, lapack_int
ldq, double* z, lapack_int ldz );
lapack_int LAPACKE_chgeqz( int matrix_layout, char job, char compq, char compz,
lapack_int n, lapack_int ilo, lapack_int ihi, lapack_complex_float* h, lapack_int ldh,
lapack_complex_float* t, lapack_int ldt, lapack_complex_float* alpha,
lapack_complex_float* beta, lapack_complex_float* q, lapack_int ldq,
lapack_complex_float* z, lapack_int ldz );
lapack_int LAPACKE_zhgeqz( int matrix_layout, char job, char compq, char compz,
lapack_int n, lapack_int ilo, lapack_int ihi, lapack_complex_double* h, lapack_int
ldh, lapack_complex_double* t, lapack_int ldt, lapack_complex_double* alpha,
lapack_complex_double* beta, lapack_complex_double* q, lapack_int ldq,
lapack_complex_double* z, lapack_int ldz );

Include Files
• mkl.h

Description
The routine computes the eigenvalues of a real/complex matrix pair (H,T), where H is an upper Hessenberg
matrix and T is upper triangular, using the double-shift version (for real flavors) or single-shift version (for
complex flavors) of the QZ method. Matrix pairs of this type are produced by the reduction to generalized
upper Hessenberg form of a real/complex matrix pair (A,B):
A = Q1*H*Z1H, B = Q1*T*Z1H,
as computed by ?gghrd.

For real flavors:


If job = 'S', then the Hessenberg-triangular pair (H,T) is reduced to generalized Schur form,

H = Q*S*ZT, T = Q*P*ZT,

where Q and Z are orthogonal matrices, P is an upper triangular matrix, and S is a quasi-triangular matrix
with 1-by-1 and 2-by-2 diagonal blocks. The 1-by-1 blocks correspond to real eigenvalues of the matrix pair
(H,T) and the 2-by-2 blocks correspond to complex conjugate pairs of eigenvalues.
Additionally, the 2-by-2 upper triangular diagonal blocks of P corresponding to 2-by-2 blocks of S are reduced
to positive diagonal form, that is, if S(j+1, j) is non-zero, then P(j+1, j) = P(j, j+1) = 0, P(j, j) > 0, and
P(j+1, j+1) > 0.

For complex flavors:


If job = 'S', then the Hessenberg-triangular pair (H,T) is reduced to generalized Schur form,

H = Q*S*ZH, T = Q*P*ZH,
where Q and Z are unitary matrices, and S and P are upper triangular.
For all function flavors:
Optionally, the orthogonal/unitary matrix Q from the generalized Schur factorization may be post-multiplied
by an input matrix Q1, and the orthogonal/unitary matrix Z may be post-multiplied by an input matrix Z1.
If Q1 and Z1 are the orthogonal/unitary matrices from ?gghrd that reduced the matrix pair (A,B) to
generalized upper Hessenberg form, then the output matrices Q1Q and Z1Z are the orthogonal/unitary
factors from the generalized Schur factorization of (A,B):
A = (Q1Q)*S*(Z1Z)H, B = (Q1Q)*P*(Z1Z)H.
To avoid overflow, eigenvalues of the matrix pair (H,T) (equivalently, of (A,B)) are computed as a pair of
values (alpha,beta). For chgeqz/zhgeqz, alpha and beta are complex, and for shgeqz/dhgeqz, alpha is
complex and beta real. If beta is nonzero, λ = alpha/beta is an eigenvalue of the generalized
nonsymmetric eigenvalue problem (GNEP)
A*x = λ*B*x
and if alpha is nonzero, μ = beta/alpha is an eigenvalue of the alternate form of the GNEP

μ*A*y = B*y .
Real eigenvalues (for real flavors) or the values of alpha and beta for the i-th eigenvalue (for complex
flavors) can be read directly from the generalized Schur form:
alpha = S(i, i), beta = P(i, i).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

job Specifies the operations to be performed. Must be 'E' or 'S'.

If job = 'E', then compute eigenvalues only;

If job = 'S', then compute eigenvalues and the Schur form.

compq Must be 'N', 'I', or 'V'.

If compq = 'N', left Schur vectors (q) are not computed;

If compq = 'I', q is initialized to the unit matrix and the matrix of left
Schur vectors of (H,T) is returned;
If compq = 'V', q must contain an orthogonal/unitary matrix Q1 on entry
and the product Q1*Q is returned.

compz Must be 'N', 'I', or 'V'.


If compz = 'N', right Schur vectors (z) are not computed;

If compz = 'I', z is initialized to the unit matrix and the matrix of right
Schur vectors of (H,T) is returned;
If compz = 'V', z must contain an orthogonal/unitary matrix Z1 on entry
and the product Z1*Z is returned.

n The order of the matrices H, T, Q, and Z


(n≥ 0).

ilo, ihi ilo and ihi mark the rows and columns of H which are in Hessenberg form.
It is assumed that H is already upper triangular in rows and columns 1:ilo-1
and ihi+1:n.
Constraint:
If n > 0, then 1 ≤ilo≤ihi≤n;

if n = 0, then ilo = 1 and ihi = 0.

h, t, q, z Arrays:
On entry, h (size max(1, ldh*n)) contains the n-by-n upper Hessenberg
matrix H.
On entry, t (size max(1, ldt*n)) contains the n-by-n upper triangular
matrix T.
q (size max(1, ldq*n)) :

On entry, if compq = 'V', this array contains the orthogonal/unitary matrix


Q1 used in the reduction of (A,B) to generalized Hessenberg form.
If compq = 'N', then q is not referenced.

z (size max(1, ldz*n)) :

On entry, if compz = 'V', this array contains the orthogonal/unitary matrix


Z1 used in the reduction of (A,B) to generalized Hessenberg form.
If compz = 'N', then z is not referenced.

ldh The leading dimension of h; at least max(1, n).

ldt The leading dimension of t; at least max(1, n).

ldq The leading dimension of q;


If compq = 'N', then ldq≥ 1.

If compq = 'I' or 'V', then ldq≥ max(1, n).

ldz The leading dimension of z;


If compz = 'N', then ldz≥ 1.

If compz = 'I' or 'V', then ldz≥ max(1, n).

Output Parameters

h For real flavors:

If job = 'S', then on exit h contains the upper quasi-triangular matrix S
from the generalized Schur factorization.
If job = 'E', then on exit the diagonal blocks of h match those of S, but
the rest of h is unspecified.
For complex flavors:
If job = 'S', then, on exit, h contains the upper triangular matrix S from
the generalized Schur factorization.
If job = 'E', then on exit the diagonal of h matches that of S, but the rest
of h is unspecified.

t If job = 'S', then, on exit, t contains the upper triangular matrix P from
the generalized Schur factorization.
For real flavors:
2-by-2 diagonal blocks of P corresponding to 2-by-2 blocks of S are reduced
to positive diagonal form, that is, if h(j+1, j) is non-zero, then
t(j+1, j) = t(j, j+1) = 0 and t(j, j) and t(j+1, j+1) will be positive.
If job = 'E', then on exit the diagonal blocks of t match those of P, but
the rest of t is unspecified.
For complex flavors:
if job = 'E', then on exit the diagonal of t matches that of P, but the rest
of t is unspecified.

alphar, alphai Arrays, size at least max(1, n). The real and imaginary parts, respectively,
of each scalar alpha defining an eigenvalue of GNEP.
If alphai[j - 1] is zero, then the j-th eigenvalue is real; if positive, then the
j-th and (j+1)-th eigenvalues are a complex conjugate pair, with
alphai[j] = -alphai[j - 1].

alpha Array, size at least max(1, n).


The complex scalars alpha that define the eigenvalues of GNEP.
alpha[i - 1] = S(i, i) in the generalized Schur factorization.

beta Array, size at least max(1, n).


For real flavors:
The scalars beta that define the eigenvalues of GNEP.
Together, the quantities alpha = (alphar[j - 1], alphai[j - 1]) and
beta = beta[j - 1] represent the j-th eigenvalue of the matrix pair
(A,B), in one of the forms lambda = alpha/beta or mu = beta/alpha.
Since either lambda or mu may overflow, they should not, in general, be
computed.
For complex flavors:
The real non-negative scalars beta that define the eigenvalues of GNEP.


beta[i - 1] = P(i, i) in the generalized Schur factorization. Together, the


quantities alpha = alpha[j - 1] and beta = beta[j - 1] represent
the j-th eigenvalue of the matrix pair (A,B), in one of the forms lambda =
alpha/beta or mu = beta/alpha. Since either lambda or mu may
overflow, they should not, in general, be computed.

q On exit, if compq = 'I', q is overwritten by the orthogonal/unitary matrix


of left Schur vectors of the pair (H,T), and if compq = 'V', q is overwritten
by the orthogonal/unitary matrix of left Schur vectors of (A,B).

z On exit, if compz = 'I', z is overwritten by the orthogonal/unitary matrix


of right Schur vectors of the pair (H,T), and if compz = 'V', z is
overwritten by the orthogonal/unitary matrix of right Schur vectors of
(A,B).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = 1,..., n, the QZ iteration did not converge.

(H,T) is not in Schur form, but alphar[i - 1], alphai[i - 1] (for real flavors), alpha[i - 1] (for complex flavors),
and beta[i - 1], i=info+1,..., n should be correct.

If info = n+1,...,2n, the shift calculation failed.

(H,T) is not in Schur form, but alphar[i - 1], alphai[i - 1] (for real flavors), alpha[i - 1] (for complex flavors),
and beta[i - 1], i =info-n+1,..., n should be correct.
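
A minimal call sketch for LAPACKE_dhgeqz with row-major storage; the Hessenberg-triangular pair (H,T)
below is hypothetical (it would normally come from ?gghrd or ?gghd3), and job = 'S' with
compq = compz = 'I' requests the Schur form together with the Schur vector matrices.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 3, ldh = 3, ldt = 3, ldq = 3, ldz = 3;
    double h[9] = { 1.0, 2.0, 3.0,     /* upper Hessenberg H, row major */
                    4.0, 5.0, 6.0,
                    0.0, 8.0, 1.0 };
    double t[9] = { 2.0, 1.0, 0.5,     /* upper triangular T, row major */
                    0.0, 3.0, 1.0,
                    0.0, 0.0, 4.0 };
    double alphar[3], alphai[3], beta[3], q[9], z[9];

    lapack_int info = LAPACKE_dhgeqz(LAPACK_ROW_MAJOR, 'S', 'I', 'I', n,
                                     1, n, h, ldh, t, ldt,
                                     alphar, alphai, beta, q, ldq, z, ldz);
    if (info == 0)
        for (lapack_int j = 0; j < n; ++j)  /* lambda_j = (alphar[j] + i*alphai[j])/beta[j] */
            printf("eigenvalue %d: (%g, %g) / %g\n",
                   (int)j, alphar[j], alphai[j], beta[j]);
    return (int)info;
}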

?tgevc
Computes some or all of the right and/or left
generalized eigenvectors of a pair of upper triangular
matrices.

Syntax
lapack_int LAPACKE_stgevc (int matrix_layout, char side, char howmny, const
lapack_logical* select, lapack_int n, const float* s, lapack_int lds, const float* p,
lapack_int ldp, float* vl, lapack_int ldvl, float* vr, lapack_int ldvr, lapack_int mm,
lapack_int* m);
lapack_int LAPACKE_dtgevc (int matrix_layout, char side, char howmny, const
lapack_logical* select, lapack_int n, const double* s, lapack_int lds, const double*
p, lapack_int ldp, double* vl, lapack_int ldvl, double* vr, lapack_int ldvr,
lapack_int mm, lapack_int* m);
lapack_int LAPACKE_ctgevc (int matrix_layout, char side, char howmny, const
lapack_logical* select, lapack_int n, const lapack_complex_float* s, lapack_int lds,
const lapack_complex_float* p, lapack_int ldp, lapack_complex_float* vl, lapack_int
ldvl, lapack_complex_float* vr, lapack_int ldvr, lapack_int mm, lapack_int* m);
lapack_int LAPACKE_ztgevc (int matrix_layout, char side, char howmny, const
lapack_logical* select, lapack_int n, const lapack_complex_double* s, lapack_int lds,
const lapack_complex_double* p, lapack_int ldp, lapack_complex_double* vl, lapack_int
ldvl, lapack_complex_double* vr, lapack_int ldvr, lapack_int mm, lapack_int* m);

Include Files
• mkl.h

Description

The routine computes some or all of the right and/or left eigenvectors of a pair of real/complex matrices
(S,P), where S is quasi-triangular (for real flavors) or upper triangular (for complex flavors) and P is upper
triangular.
Matrix pairs of this type are produced by the generalized Schur factorization of a real/complex matrix pair
(A,B):
A = Q*S*ZH, B = Q*P*ZH
as computed by ?gghrd plus ?hgeqz.

The right eigenvector x and the left eigenvector y of (S,P) corresponding to an eigenvalue w are defined by:
S*x = w*P*x, yH*S = w*yH*P
The eigenvalues are not input to this routine, but are computed directly from the diagonal blocks or diagonal
elements of S and P.
This routine returns the matrices X and/or Y of right and left eigenvectors of (S,P), or the products Z*X
and/or Q*Y, where Z and Q are input matrices.
If Q and Z are the orthogonal/unitary factors from the generalized Schur factorization of a matrix pair (A,B),
then Z*X and Q*Y are the matrices of right and left eigenvectors of (A,B).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

side Must be 'R', 'L', or 'B'.

If side = 'R', compute right eigenvectors only.

If side = 'L', compute left eigenvectors only.

If side = 'B', compute both right and left eigenvectors.

howmny Must be 'A', 'B', or 'S'.

If howmny = 'A', compute all right and/or left eigenvectors.

If howmny = 'B', compute all right and/or left eigenvectors,


backtransformed by the matrices in vr and/or vl.
If howmny = 'S', compute selected right and/or left eigenvectors, specified
by the logical array select.

select
Array, size at least max (1, n).
If howmny = 'S', select specifies the eigenvectors to be computed.

If howmny = 'A'or 'B', select is not referenced.

For real flavors:


If w[j] is a real eigenvalue, the corresponding real eigenvector is computed
if select[j] is 1.


If w[j] and w[j + 1] are the real and imaginary parts of a complex
eigenvalue, the corresponding complex eigenvector is computed if either
select[j] or select[j + 1] is 1, and on exit select[j] is set to 1 and
select[j + 1] is set to 0.

For complex flavors:


The eigenvector corresponding to the j-th eigenvalue is computed if
select[j] is 1.

n The order of the matrices S and P (n≥ 0).

s, p, vl, vr Arrays:
s (size max(1, lds*n)) contains the matrix S from a generalized Schur
factorization as computed by ?hgeqz. This matrix is upper quasi-triangular
for real flavors, and upper triangular for complex flavors.
p (size max(1, ldp*n)) contains the upper triangular matrix P from a
generalized Schur factorization as computed by ?hgeqz.

For real flavors, 2-by-2 diagonal blocks of P corresponding to 2-by-2 blocks


of S must be in positive diagonal form.
For complex flavors, P must have real diagonal elements.
If side = 'L' or 'B' and howmny = 'B', vl (size max(1, ldvl*mm) for
column major layout and max(1, ldvl*n) for row major layout) must
contain an n-by-n matrix Q (usually the orthogonal/unitary matrix Q of left
Schur vectors returned by ?hgeqz).

If side = 'R', vl is not referenced.

If side = 'R' or 'B' and howmny = 'B', vr (size max(1, ldvr*mm) for
column major layout and max(1, ldvr*n) for row major layout) must
contain an n-by-n matrix Z (usually the orthogonal/unitary matrix Z of right
Schur vectors returned by ?hgeqz).

If side = 'L', vr is not referenced.

lds The leading dimension of s; at least max(1, n).

ldp The leading dimension of p; at least max(1, n).

ldvl The leading dimension of vl;


If side = 'L' or 'B', then ldvl≥n for column major layout and ldvl≥
max(1, mm) for row major layout.

If side = 'R', then ldvl≥ 1 .

ldvr The leading dimension of vr;


If side = 'R' or 'B', then ldvr≥n for column major layout and ldvr≥
max(1, mm) for row major layout.

If side = 'L', then ldvr≥ 1.

mm The number of columns in the arrays vl and/or vr (mm≥m).

Output Parameters

vl On exit, if side = 'L' or 'B', vl contains:

if howmny = 'A', the matrix Y of left eigenvectors of (S,P);

if howmny = 'B', the matrix Q*Y;

if howmny = 'S', the left eigenvectors of (S,P) specified by select, stored


consecutively in the columns of vl, in the same order as their eigenvalues.
For real flavors:
A complex eigenvector corresponding to a complex eigenvalue is stored in
two consecutive columns, the first holding the real part, and the second the
imaginary part.

vr On exit, if side = 'R' or 'B', vr contains:

if howmny = 'A', the matrix X of right eigenvectors of (S,P);

if howmny = 'B', the matrix Z*X;

if howmny = 'S', the right eigenvectors of (S,P) specified by select, stored


consecutively in the columns of vr, in the same order as their eigenvalues.
For real flavors:
A complex eigenvector corresponding to a complex eigenvalue is stored in
two consecutive columns, the first holding the real part, and the second the
imaginary part.

m The number of columns in the arrays vl and/or vr actually used to store the
eigenvectors.
If howmny = 'A' or 'B', m is set to n.

For real flavors:


Each selected real eigenvector occupies one column and each selected
complex eigenvector occupies two columns.
For complex flavors:
Each selected eigenvector occupies one column.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

For real flavors:


if info = i>0, the 2-by-2 block (i:i+1) does not have a complex eigenvalue.
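
A minimal sketch of a helper (hypothetical name) that computes all back-transformed generalized
eigenvectors with LAPACKE_dtgevc; s, p, q, and z are assumed to be the n-by-n outputs of a previous
?hgeqz call with job = 'S' and compq = compz = 'I', so that the results Q*Y and Z*X are the left and
right eigenvectors of the original pair (A,B).

#include <mkl.h>

lapack_int eigenvectors_of_pair(lapack_int n,
                                const double *s, lapack_int lds,
                                const double *p, lapack_int ldp,
                                double *q, lapack_int ldq,   /* in: Q, out: Q*Y */
                                double *z, lapack_int ldz,   /* in: Z, out: Z*X */
                                lapack_int *m)
{
    /* howmny = 'B': back-transform by the matrices already in q and z;
       select is not referenced in this mode, so NULL is passed here. */
    return LAPACKE_dtgevc(LAPACK_ROW_MAJOR, 'B', 'B', NULL, n,
                          s, lds, p, ldp, q, ldq, z, ldz, n, m);
}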

?tgexc
Reorders the generalized Schur decomposition of a
pair of matrices (A,B) so that one diagonal block of
(A,B) moves to another row index.


Syntax
lapack_int LAPACKE_stgexc (int matrix_layout, lapack_logical wantq, lapack_logical
wantz, lapack_int n, float* a, lapack_int lda, float* b, lapack_int ldb, float* q,
lapack_int ldq, float* z, lapack_int ldz, lapack_int* ifst, lapack_int* ilst);
lapack_int LAPACKE_dtgexc (int matrix_layout, lapack_logical wantq, lapack_logical
wantz, lapack_int n, double* a, lapack_int lda, double* b, lapack_int ldb, double* q,
lapack_int ldq, double* z, lapack_int ldz, lapack_int* ifst, lapack_int* ilst);
lapack_int LAPACKE_ctgexc (int matrix_layout, lapack_logical wantq, lapack_logical
wantz, lapack_int n, lapack_complex_float* a, lapack_int lda, lapack_complex_float* b,
lapack_int ldb, lapack_complex_float* q, lapack_int ldq, lapack_complex_float* z,
lapack_int ldz, lapack_int ifst, lapack_int ilst);
lapack_int LAPACKE_ztgexc (int matrix_layout, lapack_logical wantq, lapack_logical
wantz, lapack_int n, lapack_complex_double* a, lapack_int lda, lapack_complex_double*
b, lapack_int ldb, lapack_complex_double* q, lapack_int ldq, lapack_complex_double* z,
lapack_int ldz, lapack_int ifst, lapack_int ilst);

Include Files
• mkl.h

Description

The routine reorders the generalized real-Schur/Schur decomposition of a real/complex matrix pair (A,B)
using an orthogonal/unitary equivalence transformation
(A,B) = Q*(A,B)*ZH,
so that the diagonal block of (A, B) with row index ifst is moved to row ilst. Matrix pair (A, B) must be in a
generalized real-Schur/Schur canonical form (as returned by gges), that is, A is block upper triangular with
1-by-1 and 2-by-2 diagonal blocks and B is upper triangular. Optionally, the matrices Q and Z of generalized
Schur vectors are updated.
Qin*Ain*ZinT = Qout*Aout*ZoutT
Qin*Bin*ZinT = Qout*Bout*ZoutT.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

wantq, wantz
If wantq = 1, update the left transformation matrix Q;

If wantq = 0, do not update Q;

If wantz = 1, update the right transformation matrix Z;

If wantz = 0, do not update Z.

n The order of the matrices A and B (n≥ 0).

a, b, q, z Arrays:
a (size max(1, lda*n)) contains the matrix A.

b (size max(1, ldb*n)) contains the matrix B.

q (size at least 1 if wantq = 0 and at least max(1, ldq*n) if wantq = 1)

If wantq = 0, then q is not referenced.

If wantq = 1, then q must contain the orthogonal/unitary matrix Q.

z (size at least 1 if wantz = 0 and at least max(1, ldz*n) if wantz = 1)

If wantz = 0, then z is not referenced.

If wantz = 1, then z must contain the orthogonal/unitary matrix Z.

lda The leading dimension of a; at least max(1, n).

ldb The leading dimension of b; at least max(1, n).

ldq The leading dimension of q;


If wantq = 0, then ldq≥ 1.

If wantq = 1, then ldq≥ max(1, n).

ldz The leading dimension of z;


If wantz = 0, then ldz≥ 1.

If wantz = 1, then ldz≥ max(1, n).

ifst, ilst Specify the reordering of the diagonal blocks of (A, B). The block with row
index ifst is moved to row ilst, by a sequence of swapping between adjacent
blocks. Constraint: 1 ≤ifst, ilst≤n.

Output Parameters

a, b, q, z Overwritten by the updated matrices A,B, Q, and Z respectively.

ifst, ilst Overwritten for real flavors only.


If ifst pointed to the second row of a 2 by 2 block on entry, it is changed to
point to the first row; ilst always points to the first row of the block in its
final position (which may differ from its input value by ±1).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = 1, the transformed matrix pair (A, B) would be too far from generalized Schur form; the problem
is ill-conditioned. (A, B) may have been partially reordered, and ilst points to the first row of the current
position of the block being moved.
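
A minimal sketch of a helper (hypothetical name) around LAPACKE_dtgexc; a, b, q, and z are assumed to
hold an n-by-n generalized real Schur decomposition whose Schur vectors should be updated
(wantq = wantz = 1).

#include <mkl.h>

lapack_int move_block(lapack_int n,
                      double *a, lapack_int lda, double *b, lapack_int ldb,
                      double *q, lapack_int ldq, double *z, lapack_int ldz,
                      lapack_int ifst, lapack_int ilst)
{
    /* For s/d flavors ifst and ilst are passed by address and may be
       adjusted on exit; this sketch does not propagate the adjusted values. */
    return LAPACKE_dtgexc(LAPACK_ROW_MAJOR, 1, 1, n, a, lda, b, ldb,
                          q, ldq, z, ldz, &ifst, &ilst);
}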

?tgsen
Reorders the generalized Schur decomposition of a
pair of matrices (A,B) so that a selected cluster of
eigenvalues appears in the leading diagonal blocks of
(A,B).


Syntax
lapack_int LAPACKE_stgsen( int matrix_layout, lapack_int ijob, lapack_logical wantq,
lapack_logical wantz, const lapack_logical* select, lapack_int n, float* a, lapack_int
lda, float* b, lapack_int ldb, float* alphar, float* alphai, float* beta, float* q,
lapack_int ldq, float* z, lapack_int ldz, lapack_int* m, float* pl, float* pr, float*
dif );
lapack_int LAPACKE_dtgsen( int matrix_layout, lapack_int ijob, lapack_logical wantq,
lapack_logical wantz, const lapack_logical* select, lapack_int n, double* a, lapack_int
lda, double* b, lapack_int ldb, double* alphar, double* alphai, double* beta, double*
q, lapack_int ldq, double* z, lapack_int ldz, lapack_int* m, double* pl, double* pr,
double* dif );
lapack_int LAPACKE_ctgsen( int matrix_layout, lapack_int ijob, lapack_logical wantq,
lapack_logical wantz, const lapack_logical* select, lapack_int n, lapack_complex_float*
a, lapack_int lda, lapack_complex_float* b, lapack_int ldb, lapack_complex_float*
alpha, lapack_complex_float* beta, lapack_complex_float* q, lapack_int ldq,
lapack_complex_float* z, lapack_int ldz, lapack_int* m, float* pl, float* pr, float*
dif );
lapack_int LAPACKE_ztgsen( int matrix_layout, lapack_int ijob, lapack_logical wantq,
lapack_logical wantz, const lapack_logical* select, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* b, lapack_int ldb,
lapack_complex_double* alpha, lapack_complex_double* beta, lapack_complex_double* q,
lapack_int ldq, lapack_complex_double* z, lapack_int ldz, lapack_int* m, double* pl,
double* pr, double* dif );

Include Files
• mkl.h

Description

The routine reorders the generalized real-Schur/Schur decomposition of a real/complex matrix pair (A, B) (in
terms of an orthogonal/unitary equivalence transformation QT*(A,B)*Z for real flavors or QH*(A,B)*Z for
complex flavors), so that a selected cluster of eigenvalues appears in the leading diagonal blocks of the pair
(A, B). The leading columns of Q and Z form orthonormal/unitary bases of the corresponding left and right
eigenspaces (deflating subspaces).
(A, B) must be in generalized real-Schur/Schur canonical form (as returned by gges), that is, A is upper
quasi-triangular (for real flavors) or upper triangular (for complex flavors) and B is upper triangular.
?tgsen also computes the generalized eigenvalues
ωj = (alphar(j) + alphai(j)*i)/beta(j) (for real flavors)
ωj = alpha(j)/beta(j) (for complex flavors)
of the reordered matrix pair (A, B).
Optionally, the routine computes the estimates of reciprocal condition numbers for eigenvalues and
eigenspaces. These are Difu[(A11, B11), (A22, B22)] and Difl[(A11, B11), (A22, B22)], that is, the
separation(s) between the matrix pairs (A11, B11) and (A22, B22) that correspond to the selected cluster and
the eigenvalues outside the cluster, respectively, and norms of "projections" onto left and right eigenspaces
with respect to the selected cluster in the (1,1)-block.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

ijob Specifies whether condition numbers are required for the cluster of
eigenvalues (pl and pr) or the deflating subspaces Difu and Difl.

If ijob =0, only reorder with respect to select;

If ijob =1, reciprocal of norms of "projections" onto left and right


eigenspaces with respect to the selected cluster (pl and pr);
If ijob =2, compute upper bounds on Difu and Difl, using F-norm-based
estimate (dif (1:2));
If ijob =3, compute estimate of Difu and Difl, using 1-norm-based
estimate (dif (1:2)). This option is about 5 times as expensive as ijob =2;

If ijob =4, compute pl, pr and dif (i.e., options 0, 1 and 2 above). This is
an economical version to get it all;
If ijob =5, compute pl, pr and dif (i.e., options 0, 1 and 3 above).

wantq, wantz
If wantq = 1, update the left transformation matrix Q;

If wantq = 0, do not update Q;

If wantz = 1, update the right transformation matrix Z;

If wantz = 0, do not update Z.

select
Array, size at least max (1, n). Specifies the eigenvalues in the selected
cluster.
To select an eigenvalue ωj, select[j - 1] must be 1.
For real flavors: to select a complex conjugate pair of eigenvalues ωj and ωj + 1 (corresponding to a
2-by-2 diagonal block), select[j - 1] and/or select[j] must be set to 1; the complex conjugate pair
ωj and ωj + 1 must be either both included in the cluster or both excluded.

n The order of the matrices A and B (n≥ 0).

a, b, q, z Arrays:
a (size max(1, lda*n)) contains the matrix A.

For real flavors: A is upper quasi-triangular, with (A, B) in generalized real


Schur canonical form.
For complex flavors: A is upper triangular, in generalized Schur canonical
form.
b (size max(1, ldb*n)) contains the matrix B.

For real flavors: B is upper triangular, with (A, B) in generalized real Schur
canonical form.
For complex flavors: B is upper triangular, in generalized Schur canonical
form.
q (size at least 1 if wantq = 0 and at least max(1, ldq*n) if wantq = 1)

If wantq = 1, then q is an n-by-n matrix;


If wantq = 0, then q is not referenced.

z (size at least 1 if wantz = 0 and at least max(1, ldz*n) if wantz = 1)

If wantz = 1, then z is an n-by-n matrix;

If wantz = 0, then z is not referenced.

lda The leading dimension of a; at least max(1, n).

ldb The leading dimension of b; at least max(1, n).

ldq The leading dimension of q; ldq≥ 1.


If wantq = 1, then ldq≥ max(1, n).

ldz The leading dimension of z; ldz≥ 1.


If wantz = 1, then ldz≥ max(1, n).

Output Parameters

a, b Overwritten by the reordered matrices A and B, respectively.

alphar, alphai Arrays, size at least max(1, n). Contain values that form generalized
eigenvalues in real flavors.
See beta.

alpha Array, size at least max(1, n). Contain values that form generalized
eigenvalues in complex flavors.
See beta.

beta Array, size at least max(1, n).


For real flavors:
On exit, (alphar[j] + alphai[j]*i)/beta[j], j=0,..., n - 1, will be the
generalized eigenvalues.
alphar[j] + alphai[j]*i and beta[j], j=0,..., n - 1 are the diagonals
of the complex Schur form (S,T) that would result if the 2-by-2 diagonal
blocks of the real generalized Schur form of (A,B) were further reduced to
triangular form using complex unitary transformations.
If alphai[j - 1] is zero, then the j-th eigenvalue is real; if positive, then
the j-th and (j + 1)-st eigenvalues are a complex conjugate pair, with
alphai[j] negative.
For complex flavors:
The diagonal elements of A and B, respectively, when the pair (A,B) has
been reduced to generalized Schur form. alpha[i]/beta[i],i=0,..., n - 1
are the generalized eigenvalues.

q If wantq = 1, then, on exit, Q has been postmultiplied by the left


orthogonal transformation matrix which reorders (A, B). The leading m
columns of Q form orthonormal bases for the specified pair of left
eigenspaces (deflating subspaces).

z If wantz = 1, then, on exit, Z has been postmultiplied by the left orthogonal
transformation matrix which reorders (A, B). The leading m columns of Z
form orthonormal bases for the specified pair of left eigenspaces (deflating
subspaces).

m
The dimension of the specified pair of left and right eigen-spaces (deflating
subspaces); 0 ≤m≤n.

pl, pr If ijob = 1, 4, or 5, pl and pr are lower bounds on the reciprocal of the


norm of "projections" onto left and right eigenspaces with respect to the
selected cluster.
0 < pl, pr≤ 1. If m = 0 or m = n, pl = pr = 1.
If ijob = 0, 2 or 3, pl and pr are not referenced.

dif Array, size (2).


If ijob≥ 2, dif(1:2) store the estimates of Difu and Difl.

If ijob = 2 or 4, dif(1:2) are F-norm-based upper bounds on Difu and


Difl.
If ijob = 3 or 5, dif(1:2) are 1-norm-based estimates of Difu and Difl.

If m = 0 or m = n, dif(1:2) = F-norm([A, B]).

If ijob = 0 or 1, dif is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = 1, reordering of (A, B) failed because the transformed matrix pair (A, B) would be too far from
generalized Schur form; the problem is very ill-conditioned. (A, B) may have been partially reordered.
If ijob > 0, 0 is returned in dif, pl and pr.
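
A minimal sketch of a helper (hypothetical name) around LAPACKE_dtgsen with ijob = 4, so that pl, pr,
and dif are estimated along with the reordering; a, b, q, and z are assumed to hold an n-by-n
generalized real Schur decomposition, and select flags the eigenvalue cluster to move to the leading
block.

#include <stdlib.h>
#include <mkl.h>

lapack_int reorder_cluster(lapack_int n, const lapack_logical *select,
                           double *a, lapack_int lda, double *b, lapack_int ldb,
                           double *q, lapack_int ldq, double *z, lapack_int ldz,
                           lapack_int *m)
{
    double *alphar = malloc((size_t)n * sizeof(double));
    double *alphai = malloc((size_t)n * sizeof(double));
    double *beta   = malloc((size_t)n * sizeof(double));
    double pl, pr, dif[2];
    lapack_int info;

    if (!alphar || !alphai || !beta) {
        free(alphar); free(alphai); free(beta);
        return -1;                     /* allocation failure in this sketch */
    }
    info = LAPACKE_dtgsen(LAPACK_ROW_MAJOR, 4, 1, 1, select, n,
                          a, lda, b, ldb, alphar, alphai, beta,
                          q, ldq, z, ldz, m, &pl, &pr, dif);
    free(alphar); free(alphai); free(beta);
    return info;
}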

?tgsyl
Solves the generalized Sylvester equation.

Syntax
lapack_int LAPACKE_stgsyl( int matrix_layout, char trans, lapack_int ijob, lapack_int
m, lapack_int n, const float* a, lapack_int lda, const float* b, lapack_int ldb,
float* c, lapack_int ldc, const float* d, lapack_int ldd, const float* e, lapack_int
lde, float* f, lapack_int ldf, float* scale, float* dif );
lapack_int LAPACKE_dtgsyl( int matrix_layout, char trans, lapack_int ijob, lapack_int
m, lapack_int n, const double* a, lapack_int lda, const double* b, lapack_int ldb,
double* c, lapack_int ldc, const double* d, lapack_int ldd, const double* e,
lapack_int lde, double* f, lapack_int ldf, double* scale, double* dif );
lapack_int LAPACKE_ctgsyl( int matrix_layout, char trans, lapack_int ijob, lapack_int
m, lapack_int n, const lapack_complex_float* a, lapack_int lda, const
lapack_complex_float* b, lapack_int ldb, lapack_complex_float* c, lapack_int ldc, const
lapack_complex_float* d, lapack_int ldd, const lapack_complex_float* e, lapack_int lde,
lapack_complex_float* f, lapack_int ldf, float* scale, float* dif );


lapack_int LAPACKE_ztgsyl( int matrix_layout, char trans, lapack_int ijob, lapack_int


m, lapack_int n, const lapack_complex_double* a, lapack_int lda, const
lapack_complex_double* b, lapack_int ldb, lapack_complex_double* c, lapack_int ldc,
const lapack_complex_double* d, lapack_int ldd, const lapack_complex_double* e,
lapack_int lde, lapack_complex_double* f, lapack_int ldf, double* scale, double* dif );

Include Files
• mkl.h

Description

The routine solves the generalized Sylvester equation:


A*R-L*B = scale*C
D*R-L*E = scale*F
where R and L are unknown m-by-n matrices, (A, D), (B, E) and (C, F) are given matrix pairs of size m-by-
m, n-by-n and m-by-n, respectively, with real/complex entries. (A, D) and (B, E) must be in generalized real-
Schur/Schur canonical form, that is, A, B are upper quasi-triangular/triangular and D, E are upper triangular.
The solution (R, L) overwrites (C, F). The factor scale, 0≤scale≤1, is an output scaling factor chosen to avoid
overflow.
In matrix notation the above equation is equivalent to the following: solve Z*x = scale*b, where Z is
defined as

Z = [ kron(In, A)   -kron(BT, Im) ]
    [ kron(In, D)   -kron(ET, Im) ]

Here Ik is the identity matrix of size k and XT is the transpose/conjugate-transpose of X. kron(X, Y) is the
Kronecker product between the matrices X and Y.
If trans = 'T' (for real flavors), or trans = 'C' (for complex flavors), the routine ?tgsyl solves the
transposed/conjugate-transposed system ZT*y = scale*b, which is equivalent to solving for R and L in

AT*R+DT*L = scale*C
R*BT+L*ET = scale*(-F)
This case (trans = 'T' for stgsyl/dtgsyl or trans = 'C' for ctgsyl/ztgsyl) is used to compute a
one-norm-based estimate of Dif[(A, D), (B, E)], the separation between the matrix pairs (A,D) and
(B,E).
If ijob≥ 1, ?tgsyl computes a Frobenius norm-based estimate of Dif[(A, D), (B,E)]. That is, the
reciprocal of a lower bound on the reciprocal of the smallest singular value of Z. This is a level 3 BLAS
algorithm.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

trans Must be 'N', 'T', or 'C'.

If trans = 'N', solve the generalized Sylvester equation.

If trans = 'T', solve the 'transposed' system (for real flavors only).

If trans = 'C', solve the 'conjugate transposed' system (for complex
flavors only).

ijob Specifies what kind of functionality to be performed:


If ijob =0, solve the generalized Sylvester equation only;

If ijob =1, perform the functionality of ijob =0 and ijob =3;

If ijob =2, perform the functionality of ijob =0 and ijob =4;

If ijob =3, only an estimate of Dif[(A, D), (B, E)] is computed (look ahead
strategy is used);
If ijob =4, only an estimate of Dif[(A, D), (B,E)] is computed (?gecon on
sub-systems is used). If trans = 'T' or 'C', ijob is not referenced.

m The order of the matrices A and D, and the row dimension of the matrices
C, F, R and L.

n The order of the matrices B and E, and the column dimension of the
matrices C, F, R and L.

a, b, c, d, e, f Arrays:
a (size max(1, lda*m)) contains the upper quasi-triangular (for real flavors)
or upper triangular (for complex flavors) matrix A.
b (size max(1, ldb*n)) contains the upper quasi-triangular (for real flavors)
or upper triangular (for complex flavors) matrix B.
c (size max(1, ldc*n) for column major layout and max(1, ldc*m) for row
major layout) contains the right-hand-side of the first matrix equation in
the generalized Sylvester equation (as defined by trans)
d (size max(1, ldd*m)) contains the upper triangular matrix D.

e (size max(1, lde*n)) contains the upper triangular matrix E.

f (size max(1, ldf*n) for column major layout and max(1, ldf*m) for row
major layout) contains the right-hand-side of the second matrix equation in
the generalized Sylvester equation (as defined by trans)

lda The leading dimension of a; at least max(1, m).

ldb The leading dimension of b; at least max(1, n).

ldc The leading dimension of c; at least max(1, m) for column major layout and
at least max(1, n) for row major layout .

ldd The leading dimension of d; at least max(1, m).

lde The leading dimension of e; at least max(1, n).

ldf The leading dimension of f; at least max(1, m) for column major layout and
at least max(1, n) for row major layout .

Output Parameters

c If ijob=0, 1, or 2, overwritten by the solution R.


If ijob=3 or 4 and trans = 'N', c holds R, the solution achieved during the
computation of the Dif-estimate.

f If ijob=0, 1, or 2, overwritten by the solution L.

If ijob=3 or 4 and trans = 'N', f holds L, the solution achieved during


the computation of the Dif-estimate.

dif On exit, dif is the reciprocal of a lower bound of the reciprocal of the Dif-function; that is, dif is an
upper bound of Dif[(A, D), (B, E)] = sigma_min(Z), where Z is as defined in the description.
If ijob = 0, or trans = 'T' (for real flavors), or trans = 'C' (for complex flavors), dif is not referenced.

scale On exit, scale is the scaling factor in the generalized Sylvester equation.
If 0 < scale < 1, c and f hold the solutions R and L, respectively, to a
slightly perturbed system but the input matrices A, B, D and E have not
been changed.
If scale = 0, c and f hold the solutions R and L, respectively, to the
homogeneous system with C = F = 0. Normally, scale = 1.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, (A, D) and (B, E) have common or close eigenvalues.
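
The following is a minimal sketch of calling the double-precision flavor, assuming the LAPACKE_dtgsyl prototype given with the Syntax of this routine; the 2-by-2 pairs (A, D) and (B, E) and the right-hand sides C, F are arbitrary illustrative values, chosen so that the pairs are already upper triangular and therefore in generalized Schur form, and ijob = 0 requests the solution only.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    const lapack_int m = 2, n = 2;
    /* (A, D) and (B, E) in generalized real Schur form (column-major 2x2 arrays). */
    double a[] = {1.0, 0.0,  0.0, 2.0};   /* A = [1 0; 0 2] */
    double d[] = {1.0, 0.0,  0.0, 1.0};   /* D = I          */
    double b[] = {3.0, 0.0,  0.0, 4.0};   /* B = [3 0; 0 4] */
    double e[] = {1.0, 0.0,  0.0, 1.0};   /* E = I          */
    double c[] = {1.0, 1.0,  1.0, 1.0};   /* right-hand side C; overwritten by R */
    double f[] = {1.0, 1.0,  1.0, 1.0};   /* right-hand side F; overwritten by L */
    double scale, dif;

    /* Solve A*R - L*B = scale*C, D*R - L*E = scale*F (ijob = 0: solve only). */
    lapack_int info = LAPACKE_dtgsyl(LAPACK_COL_MAJOR, 'N', 0, m, n,
                                     a, m, b, n, c, m, d, m, e, n, f, m,
                                     &scale, &dif);
    if (info != 0) {
        printf("LAPACKE_dtgsyl failed, info = %d\n", (int)info);
        return 1;
    }
    printf("scale = %g, R(1,1) = %g, L(1,1) = %g\n", scale, c[0], f[0]);
    return 0;
}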

?tgsna
Estimates reciprocal condition numbers for specified
eigenvalues and/or eigenvectors of a pair of matrices
in generalized real Schur canonical form.

Syntax
lapack_int LAPACKE_stgsna( int matrix_layout, char job, char howmny, const
lapack_logical* select, lapack_int n, const float* a, lapack_int lda, const float* b,
lapack_int ldb, const float* vl, lapack_int ldvl, const float* vr, lapack_int ldvr,
float* s, float* dif, lapack_int mm, lapack_int* m );
lapack_int LAPACKE_dtgsna( int matrix_layout, char job, char howmny, const
lapack_logical* select, lapack_int n, const double* a, lapack_int lda, const double*
b, lapack_int ldb, const double* vl, lapack_int ldvl, const double* vr, lapack_int
ldvr, double* s, double* dif, lapack_int mm, lapack_int* m );
lapack_int LAPACKE_ctgsna( int matrix_layout, char job, char howmny, const
lapack_logical* select, lapack_int n, const lapack_complex_float* a, lapack_int lda,
const lapack_complex_float* b, lapack_int ldb, const lapack_complex_float* vl,
lapack_int ldvl, const lapack_complex_float* vr, lapack_int ldvr, float* s, float*
dif, lapack_int mm, lapack_int* m );

lapack_int LAPACKE_ztgsna( int matrix_layout, char job, char howmny, const
lapack_logical* select, lapack_int n, const lapack_complex_double* a, lapack_int lda,
const lapack_complex_double* b, lapack_int ldb, const lapack_complex_double* vl,
lapack_int ldvl, const lapack_complex_double* vr, lapack_int ldvr, double* s, double*
dif, lapack_int mm, lapack_int* m );

Include Files
• mkl.h

Description

The real flavors stgsna/dtgsna of this routine estimate reciprocal condition numbers for specified
eigenvalues and/or eigenvectors of a matrix pair (A, B) in generalized real Schur canonical form (or of any
matrix pair (Q*A*Z^T, Q*B*Z^T) with orthogonal matrices Q and Z).
(A, B) must be in generalized real Schur form (as returned by ?gges), that is, A is block upper triangular
with 1-by-1 and 2-by-2 diagonal blocks and B is upper triangular.
The complex flavors ctgsna/ztgsna estimate reciprocal condition numbers for specified eigenvalues and/or
eigenvectors of a matrix pair (A, B). (A, B) must be in generalized Schur canonical form, that is, A and B are
both upper triangular.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

job Specifies whether condition numbers are required for eigenvalues or


eigenvectors. Must be 'E' or 'V' or 'B'.

If job = 'E', for eigenvalues only (compute s ).

If job = 'V', for eigenvectors only (compute dif ).

If job = 'B', for both eigenvalues and eigenvectors (compute both s and
dif).

howmny Must be 'A' or 'S'.

If howmny = 'A', compute condition numbers for all eigenpairs.

If howmny = 'S', compute condition numbers for selected eigenpairs


specified by the logical array select.

select
Array, size at least max (1, n).
If howmny = 'S', select specifies the eigenpairs for which condition
numbers are required.
If howmny = 'A', select is not referenced.

For real flavors:


To select condition numbers for the eigenpair corresponding to a real
eigenvalue ω_j, select[j - 1] must be set to 1; to select condition numbers
corresponding to a complex conjugate pair of eigenvalues ω_j and ω_(j+1),
either select[j - 1] or select[j] must be set to 1.

For complex flavors:


To select condition numbers for the corresponding j-th eigenvalue and/or


eigenvector, select[j - 1] must be set to 1.

n The order of the square matrix pair (A, B)


(n≥ 0).

a, b, vl, vr Arrays:
a (size max(1, lda*n)) contains the upper quasi-triangular (for real flavors)
or upper triangular (for complex flavors) matrix A in the pair (A, B).
b (size max(1, ldb*n)) contains the upper triangular matrix B in the pair
(A, B).
If job = 'E' or 'B', vl(size max(1, ldvl*m) for column major layout and
max(1, ldvl*n) for row major layout) must contain left eigenvectors of (A,
B), corresponding to the eigenpairs specified by howmny and select. The
eigenvectors must be stored in consecutive columns of vl, as returned by ?tgevc.
If job = 'V', vl is not referenced.

If job = 'E' or 'B', vr(size max(1, ldvr*m) for column major layout and
max(1, ldvr*n) for row major layout) must contain right eigenvectors of
(A, B), corresponding to the eigenpairs specified by howmny and select.
The eigenvectors must be stored in consecutive columns of vr, as returned
by ?tgevc.

If job = 'V', vr is not referenced.

lda The leading dimension of a; at least max(1, n).

ldb The leading dimension of b; at least max(1, n).

ldvl The leading dimension of vl; ldvl≥ 1.

If job = 'E' or 'B', then ldvl≥ max(1, n) for column major layout and
ldvl≥ max(1, m) for row major layout .

ldvr The leading dimension of vr; ldvr≥ 1.

If job = 'E' or 'B', then ldvr≥ max(1, n) for column major layout and
ldvr≥ max(1, m) for row major layout.

mm The number of elements in the arrays s and dif (mm≥m).

Output Parameters

s Array, size mm.


If job = 'E' or 'B', contains the reciprocal condition numbers of the
selected eigenvalues, stored in consecutive elements of the array.
If job = 'V', s is not referenced.

For real flavors:

For a complex conjugate pair of eigenvalues two consecutive elements of s
are set to the same value. Thus, s[j - 1], dif[j - 1], and the j-th columns of
vl and vr all correspond to the same eigenpair (but not in general the j-th
eigenpair, unless all eigenpairs are selected).

dif Array, size mm.


If job = 'V' or 'B', contains the estimated reciprocal condition numbers
of the selected eigenvectors, stored in consecutive elements of the array.
If the eigenvalues cannot be reordered to compute dif[j], dif[j] is set to 0;
this can only occur when the true value would be very small anyway.
If job = 'E', dif is not referenced.

For real flavors:


For a complex eigenvector, two consecutive elements of dif are set to the
same value.
For complex flavors:
For each eigenvalue/vector specified by select, dif stores a Frobenius norm-
based estimate of Difl.

m The number of elements in the arrays s and dif used to store the specified
condition numbers; for each selected eigenvalue one element is used.
If howmny = 'A', m is set to n.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.
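
The following is a minimal sketch of estimating eigenvector condition numbers with the double-precision flavor, using the LAPACKE_dtgsna prototype shown in the Syntax above. job = 'V' requests dif only, so the eigenvector arrays vl and vr are not referenced, and howmny = 'A' selects all eigenpairs (select is then ignored but is still passed). The 2-by-2 pair (A, B) is an arbitrary illustrative pair already in generalized real Schur form.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    const lapack_int n = 2, mm = 2;
    /* (A, B) in generalized real Schur form: A upper (quasi-)triangular, B upper triangular. */
    double a[] = {1.0, 0.0,  0.5, 2.0};   /* column-major: A = [1 0.5; 0 2] */
    double b[] = {1.0, 0.0,  0.0, 1.0};   /* B = I */
    lapack_logical select[] = {1, 1};      /* ignored because howmny = 'A' */
    double vl[1] = {0.0}, vr[1] = {0.0};   /* not referenced when job = 'V' */
    double s[2], dif[2];
    lapack_int m_used;

    lapack_int info = LAPACKE_dtgsna(LAPACK_COL_MAJOR, 'V', 'A', select, n,
                                     a, n, b, n, vl, 1, vr, 1,
                                     s, dif, mm, &m_used);
    if (info != 0) {
        printf("LAPACKE_dtgsna failed, info = %d\n", (int)info);
        return 1;
    }
    printf("reciprocal eigenvector condition numbers: %g %g (m = %d)\n",
           dif[0], dif[1], (int)m_used);
    return 0;
}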

Generalized Singular Value Decomposition: LAPACK Computational Routines


This section describes LAPACK computational routines used for finding the generalized singular value
decomposition (GSVD) of two matrices A and B as
U^H*A*Q = D1*(0 R),
V^H*B*Q = D2*(0 R),
where U, V, and Q are orthogonal/unitary matrices, R is a nonsingular upper triangular matrix, and D1, D2
are "diagonal" matrices with the structure detailed in the routine description sections.
Table “Computational Routines for Generalized Singular Value Decomposition” lists LAPACK routines that
perform generalized singular value decomposition of matrices.
Computational Routines for Generalized Singular Value Decomposition
Routine name Operation performed

ggsvp Computes the preprocessing decomposition for the generalized SVD

ggsvp3 Performs preprocessing for a generalized SVD.

ggsvd3 Computes generalized SVD.

tgsja Computes the generalized SVD of two upper triangular or trapezoidal matrices

You can use routines listed in the above table as well as the driver routine ggsvd to find the GSVD of a pair of
general rectangular matrices.


?ggsvp
Computes the preprocessing decomposition for the
generalized SVD (deprecated).

Syntax
lapack_int LAPACKE_sggsvp( int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int p, lapack_int n, float* a, lapack_int lda, float* b,
lapack_int ldb, float tola, float tolb, lapack_int* k, lapack_int* l, float* u,
lapack_int ldu, float* v, lapack_int ldv, float* q, lapack_int ldq );
lapack_int LAPACKE_dggsvp( int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int p, lapack_int n, double* a, lapack_int lda, double* b,
lapack_int ldb, double tola, double tolb, lapack_int* k, lapack_int* l, double* u,
lapack_int ldu, double* v, lapack_int ldv, double* q, lapack_int ldq );
lapack_int LAPACKE_cggsvp( int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int p, lapack_int n, lapack_complex_float* a, lapack_int lda,
lapack_complex_float* b, lapack_int ldb, float tola, float tolb, lapack_int* k,
lapack_int* l, lapack_complex_float* u, lapack_int ldu, lapack_complex_float* v,
lapack_int ldv, lapack_complex_float* q, lapack_int ldq );
lapack_int LAPACKE_zggsvp( int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int p, lapack_int n, lapack_complex_double* a, lapack_int lda,
lapack_complex_double* b, lapack_int ldb, double tola, double tolb, lapack_int* k,
lapack_int* l, lapack_complex_double* u, lapack_int ldu, lapack_complex_double* v,
lapack_int ldv, lapack_complex_double* q, lapack_int ldq );

Include Files
• mkl.h

Description
This routine is deprecated; use ggsvp3.

The routine computes orthogonal/unitary matrices U, V and Q such that

                     n-k-l  k    l
    U^H*A*Q =  k   [   0   A12  A13 ]
               l   [   0    0   A23 ]   if m-k-l ≥ 0;
             m-k-l [   0    0    0  ]

                     n-k-l  k    l
    U^H*A*Q =  k   [   0   A12  A13 ]   if m-k-l < 0;
              m-k  [   0    0   A23 ]

                     n-k-l  k    l
    V^H*B*Q =  l   [   0    0   B13 ]
              p-l  [   0    0    0  ]

(with ^T in place of ^H for real flavors)
where the k-by-k matrix A12 and l-by-l matrix B13 are nonsingular upper triangular; A23 is l-by-l upper
triangular if m-k-l≥0, otherwise A23 is (m-k)-by-l upper trapezoidal. The sum k+l is equal to the effective
numerical rank of the (m+p)-by-n matrix (A^H, B^H)^H.
This decomposition is the preprocessing step for computing the Generalized Singular Value Decomposition
(GSVD), see subroutine ?tgsja.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

jobu Must be 'U' or 'N'.

If jobu = 'U', orthogonal/unitary matrix U is computed.

If jobu = 'N', U is not computed.

jobv Must be 'V' or 'N'.

If jobv = 'V', orthogonal/unitary matrix V is computed.

If jobv = 'N', V is not computed.

jobq Must be 'Q' or 'N'.

If jobq = 'Q', orthogonal/unitary matrix Q is computed.

If jobq = 'N', Q is not computed.

m The number of rows of the matrix A (m≥ 0).

p The number of rows of the matrix B (p≥ 0).

n The number of columns of the matrices A and B (n≥ 0).

a, b Arrays:
a(size at least max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout) contains the m-by-n matrix A.
b(size at least max(1, ldb*n) for column major layout and max(1, ldb*p)
for row major layout) contains the p-by-n matrix B.

lda The leading dimension of a; at least max(1, m)for column major layout and
max(1, n) for row major layout.

ldb The leading dimension of b; at least max(1, p)for column major layout and
max(1, n) for row major layout.

tola, tolb tola and tolb are the thresholds to determine the effective numerical rank of
matrix B and a subblock of A. Generally, they are set to
tola = max(m, n)*||A||*MACHEPS,


tolb = max(p, n)*||B||*MACHEPS.


The size of tola and tolb may affect the size of backward errors of the
decomposition.

ldu The leading dimension of the output array u . ldu≥ max(1, m) if jobu =
'U'; ldu≥ 1 otherwise.

ldv The leading dimension of the output array v . ldv≥ max(1, p) if jobv =
'V'; ldv≥ 1 otherwise.

ldq The leading dimension of the output array q . ldq≥ max(1, n) if jobq =
'Q'; ldq≥ 1 otherwise.

Output Parameters

a Overwritten by the triangular (or trapezoidal) matrix described in the


Description section.

b Overwritten by the triangular matrix described in the Description section.

k, l On exit, k and l specify the dimension of subblocks. The sum k + l is equal to the effective numerical
rank of (A^H, B^H)^H.

u, v, q Arrays:
If jobu = 'U', u (size max(1, ldu*m)) contains the orthogonal/unitary
matrix U.
If jobu = 'N', u is not referenced.

If jobv = 'V', v (size max(1, ldv*p)) contains the orthogonal/unitary


matrix V.
If jobv = 'N', v is not referenced.

If jobq = 'Q', q (size max(1, ldq*n)) contains the orthogonal/unitary


matrix Q.
If jobq = 'N', q is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

?ggsvp3
Performs preprocessing for a generalized SVD.

Syntax
lapack_int LAPACKE_sggsvp3 (int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int p, lapack_int n, float * a, lapack_int lda, float * b,
lapack_int ldb, float tola, float tolb, lapack_int * k, lapack_int * l, float * u,
lapack_int ldu, float * v, lapack_int ldv, float * q, lapack_int ldq);

lapack_int LAPACKE_dggsvp3 (int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int p, lapack_int n, double * a, lapack_int lda, double * b,
lapack_int ldb, double tola, double tolb, lapack_int * k, lapack_int * l, double * u,
lapack_int ldu, double * v, lapack_int ldv, double * q, lapack_int ldq);
lapack_int LAPACKE_cggsvp3 (int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int p, lapack_int n, lapack_complex_float * a, lapack_int lda,
lapack_complex_float * b, lapack_int ldb, float tola, float tolb, lapack_int * k,
lapack_int * l, lapack_complex_float * u, lapack_int ldu, lapack_complex_float * v,
lapack_int ldv, lapack_complex_float * q, lapack_int ldq);
lapack_int LAPACKE_zggsvp3 (int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int p, lapack_int n, lapack_complex_double * a, lapack_int lda,
lapack_complex_double * b, lapack_int ldb, double tola, double tolb, lapack_int * k,
lapack_int * l, lapack_complex_double * u, lapack_int ldu, lapack_complex_double * v,
lapack_int ldv, lapack_complex_double * q, lapack_int ldq);

Include Files
• mkl.h

Description
?ggsvp3 computes orthogonal or unitary matrices U, V, and Q such that
for real flavors:

                     n-k-l  k    l
    U^T*A*Q =  k   [   0   A12  A13 ]
               l   [   0    0   A23 ]   if m-k-l ≥ 0;
             m-k-l [   0    0    0  ]

                     n-k-l  k    l
    U^T*A*Q =  k   [   0   A12  A13 ]   if m-k-l < 0;
              m-k  [   0    0   A23 ]

                     n-k-l  k    l
    V^T*B*Q =  l   [   0    0   B13 ]
              p-l  [   0    0    0  ]

for complex flavors:

                     n-k-l  k    l
    U^H*A*Q =  k   [   0   A12  A13 ]
               l   [   0    0   A23 ]   if m-k-l ≥ 0;
             m-k-l [   0    0    0  ]

                     n-k-l  k    l
    U^H*A*Q =  k   [   0   A12  A13 ]   if m-k-l < 0;
              m-k  [   0    0   A23 ]

                     n-k-l  k    l
    V^H*B*Q =  l   [   0    0   B13 ]
              p-l  [   0    0    0  ]


where the k-by-k matrix A12 and l-by-l matrix B13 are nonsingular upper triangular; A23 is l-by-l upper
triangular if m-k-l ≥ 0, otherwise A23 is (m-k)-by-l upper trapezoidal. k + l is the effective numerical rank of
the (m + p)-by-n matrix (A^T, B^T)^T for real flavors or (A^H, B^H)^H for complex flavors.

This decomposition is the preprocessing step for computing the Generalized Singular Value Decomposition
(GSVD), see ?ggsvd3.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

jobu = 'U': Orthogonal/unitary matrix U is computed;


= 'N': U is not computed.

jobv = 'V': Orthogonal/unitary matrix V is computed;


= 'N': V is not computed.

jobq = 'Q': Orthogonal/unitary matrix Q is computed;


= 'N': Q is not computed.

m The number of rows of the matrix A.


m≥ 0.

p The number of rows of the matrix B.


p≥ 0.

n The number of columns of the matrices A and B.


n≥ 0.

a Array, size (lda*n).

On entry, the m-by-n matrix A.

lda The leading dimension of the array a.

lda≥ max(1,m).

b Array, size (ldb*n).

On entry, the p-by-n matrix B.

ldb The leading dimension of the array b.

ldb≥ max(1,p).

tola, tolb tola and tolb are the thresholds to determine the effective numerical rank
of matrix B and a subblock of A. Generally, they are set to
tola = max(m,n)*norm(a)*MACHEPS,
tolb = max(p,n)*norm(b)*MACHEPS.
The size of tola and tolb may affect the size of backward errors of the
decomposition.

ldu The leading dimension of the array u.

ldu≥ max(1,m) if jobu = 'U'; ldu≥ 1 otherwise.

ldv The leading dimension of the array v.

ldv≥ max(1,p) if jobv = 'V'; ldv≥ 1 otherwise.

ldq The leading dimension of the array q.

ldq≥ max(1,n) if jobq = 'Q'; ldq≥ 1 otherwise.

Output Parameters

a On exit, a contains the triangular (or trapezoidal) matrix described in the


Description section.

b On exit, b contains the triangular matrix described in the Description


section.

k, l On exit, k and l specify the dimension of the subblocks described in


Description section.
k + l = effective numerical rank of (A^T, B^T)^T for real flavors or (A^H, B^H)^H for
complex flavors.

u Array, size (ldu*m).

If jobu = 'U', u contains the orthogonal/unitary matrix U.

If jobu = 'N', u is not referenced.

v Array, size (ldv*p).

If jobv = 'V', v contains the orthogonal/unitary matrix V.

If jobv = 'N', v is not referenced.

q Array, size (ldq*n).

If jobq = 'Q', q contains the orthogonal/unitary matrix Q.

If jobq = 'N', q is not referenced.

Return Values
This function returns a value info.

= 0: successful exit.
< 0: if info = -i, the i-th argument had an illegal value.

Application Notes
The subroutine uses LAPACK subroutine ?geqp3 for the QR factorization with column pivoting to detect the
effective numerical rank of the A matrix. It may be replaced by a better rank determination strategy.
?ggsvp3 replaces the deprecated subroutine ?ggsvp.
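
The following is a minimal sketch of the preprocessing step in double precision, using the LAPACKE_dggsvp3 prototype shown in the Syntax above. The matrices are arbitrary illustrative 2-by-2 values; the tola/tolb thresholds follow the formula quoted above, with the 1-norm from LAPACKE_dlange standing in for ||A|| and ||B|| and DBL_EPSILON standing in for MACHEPS (reasonable but not the only possible choices).

#include <stdio.h>
#include <float.h>
#include <mkl.h>

int main(void) {
    const lapack_int m = 2, p = 2, n = 2;
    double a[] = {1.0, 3.0,  2.0, 4.0};   /* column-major m-by-n A */
    double b[] = {0.0, 1.0,  1.0, 0.0};   /* column-major p-by-n B */
    double u[4], v[4], q[4];
    lapack_int k, l;

    /* tola = max(m,n)*||A||*MACHEPS, tolb = max(p,n)*||B||*MACHEPS */
    double tola = (m > n ? m : n)
                  * LAPACKE_dlange(LAPACK_COL_MAJOR, '1', m, n, a, m) * DBL_EPSILON;
    double tolb = (p > n ? p : n)
                  * LAPACKE_dlange(LAPACK_COL_MAJOR, '1', p, n, b, p) * DBL_EPSILON;

    lapack_int info = LAPACKE_dggsvp3(LAPACK_COL_MAJOR, 'U', 'V', 'Q', m, p, n,
                                      a, m, b, p, tola, tolb, &k, &l,
                                      u, m, v, p, q, n);
    if (info != 0) {
        printf("LAPACKE_dggsvp3 failed, info = %d\n", (int)info);
        return 1;
    }
    printf("k = %d, l = %d  (k + l = effective numerical rank of [A; B])\n", (int)k, (int)l);
    return 0;
}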

?ggsvd3
Computes generalized SVD.


Syntax
lapack_int LAPACKE_sggsvd3 (int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int n, lapack_int p, lapack_int * k, lapack_int * l, float * a,
lapack_int lda, float * b, lapack_int ldb, float * alpha, float * beta, float * u,
lapack_int ldu, float * v, lapack_int ldv, float * q, lapack_int ldq, lapack_int *
iwork);
lapack_int LAPACKE_dggsvd3 (int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int n, lapack_int p, lapack_int * k, lapack_int * l, double * a,
lapack_int lda, double * b, lapack_int ldb, double * alpha, double * beta, double * u,
lapack_int ldu, double * v, lapack_int ldv, double * q, lapack_int ldq, lapack_int *
iwork);
lapack_int LAPACKE_cggsvd3 (int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int n, lapack_int p, lapack_int * k, lapack_int * l,
lapack_complex_float * a, lapack_int lda, lapack_complex_float * b, lapack_int ldb,
float * alpha, float * beta, lapack_complex_float * u, lapack_int ldu,
lapack_complex_float * v, lapack_int ldv, lapack_complex_float * q, lapack_int ldq,
lapack_int * iwork);
lapack_int LAPACKE_zggsvd3 (int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int n, lapack_int p, lapack_int * k, lapack_int * l,
lapack_complex_double * a, lapack_int lda, lapack_complex_double * b, lapack_int ldb,
double * alpha, double * beta, lapack_complex_double * u, lapack_int ldu,
lapack_complex_double * v, lapack_int ldv, lapack_complex_double * q, lapack_int ldq,
lapack_int * iwork);

Include Files
• mkl.h

Description
?ggsvd3 computes the generalized singular value decomposition (GSVD) of an m-by-n real or complex matrix
A and p-by-n real or complex matrix B:

U^T*A*Q = D1*( 0 R ), V^T*B*Q = D2*( 0 R ) for real flavors

or

U^H*A*Q = D1*( 0 R ), V^H*B*Q = D2*( 0 R ) for complex flavors

where U, V and Q are orthogonal/unitary matrices. Let k+l be the effective numerical rank of the matrix
(A^T, B^T)^T for real flavors or the matrix (A^H, B^H)^H for complex flavors; then R is a (k+l)-by-(k+l)
nonsingular upper triangular matrix, and D1 and D2 are m-by-(k+l) and p-by-(k+l) "diagonal" matrices of the
following structures, respectively:
If m-k-l ≥ 0,

                 k   l
    D1 =   k   [ I   0 ]
           l   [ 0   C ]
         m-k-l [ 0   0 ]

                 k   l
    D2 =   l   [ 0   S ]
          p-l  [ 0   0 ]

                   n-k-l  k    l
    (0 R) =  k   [   0   R11  R12 ]
             l   [   0    0   R22 ]

where
C = diag( alpha(k+1), ... , alpha(k+l) ),
S = diag( beta(k+1), ... , beta(k+l) ),
C^2 + S^2 = I.
If m - k - l < 0,

                  k  m-k  k+l-m
    D1 =    k   [ I   0     0  ]
           m-k  [ 0   C     0  ]

                  k  m-k  k+l-m
    D2 =   m-k  [ 0   S     0  ]
          k+l-m [ 0   0     I  ]
           p-l  [ 0   0     0  ]

                   n-k-l  k   m-k  k+l-m
    (0 R) =  k   [   0   R11  R12   R13 ]
            m-k  [   0    0   R22   R23 ]
           k+l-m [   0    0    0    R33 ]

where
C = diag(alpha(k + 1), ... , alpha(m)),
S = diag(beta(k + 1), ... , beta(m)),
C^2 + S^2 = I.
The routine computes C, S, R, and optionally the orthogonal/unitary transformation matrices U, V and Q.
In particular, if B is an n-by-n nonsingular matrix, then the GSVD of A and B implicitly gives the SVD of
A*inv(B):
A*inv(B) = U*(D1*inv(D2))*V^T for real flavors
or
A*inv(B) = U*(D1*inv(D2))*V^H for complex flavors.
If (A^T, B^T)^T for real flavors or (A^H, B^H)^H for complex flavors has orthonormal columns, then the GSVD of A and
B is also equal to the CS decomposition of A and B. Furthermore, the GSVD can be used to derive the
solution of the eigenvalue problem:
A^T*A*x = λ*B^T*B*x for real flavors
or
A^H*A*x = λ*B^H*B*x for complex flavors.
In some literature, the GSVD of A and B is presented in the form
U^T*A*X = ( 0 D1 ), V^T*B*X = ( 0 D2 ) for real (A, B)
or
U^H*A*X = ( 0 D1 ), V^H*B*X = ( 0 D2 ) for complex (A, B)
where U and V are orthogonal and X is nonsingular, and D1 and D2 are "diagonal". The former GSVD form can be
converted to the latter form by taking the nonsingular matrix X as

    X = Q * [ I     0    ]
            [ 0  inv(R)  ]

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

jobu = 'U': Orthogonal/unitary matrix U is computed;


= 'N': U is not computed.

jobv = 'V': Orthogonal/unitary matrix V is computed;


= 'N': V is not computed.

jobq = 'Q': Orthogonal/unitary matrix Q is computed;


= 'N': Q is not computed.

m The number of rows of the matrix A.


m≥ 0.

n The number of columns of the matrices A and B.


n≥ 0.

p The number of rows of the matrix B.


p≥ 0.

a Array, size (lda*n).

On entry, the m-by-n matrix A.

lda The leading dimension of the array a.

lda≥ max(1,m).

b Array, size (ldb*n).

On entry, the p-by-n matrix B.

ldb The leading dimension of the array b.

ldb≥ max(1,p).

ldu The leading dimension of the array u.

ldu≥ max(1,m) if jobu = 'U'; ldu≥ 1 otherwise.

ldv The leading dimension of the array v.

ldv≥ max(1,p) if jobv = 'V'; ldv≥ 1 otherwise.

ldq The leading dimension of the array q.

ldq≥ max(1,n) if jobq = 'Q'; ldq≥ 1 otherwise.

iwork Array, size (n).

Output Parameters

k, l On exit, k and l specify the dimension of the subblocks described in the


Description section.
k + l = effective numerical rank of (AT,BT)T for real flavors or (AH,BH)H for
complex flavors.

a On exit, a contains the triangular matrix R, or part of R.

If m-k-l ≥ 0, R is stored in the elements of array a corresponding to A(1:k+l, n-k-l+1:n).

If m-k-l < 0, the submatrix
    [ R11  R12  R13 ]
    [  0   R22  R23 ]
is stored in the elements of array a corresponding to A(1:m, n-k-l+1:n), and R33 is stored in the elements of
array b corresponding to B(m-k+1:l, n+m-k-l+1:n) on exit.

b On exit, b contains part of the triangular matrix R if m - k - l < 0.

See Description for details.

alpha Array, size (n)

beta Array, size (n)

On exit, alpha and beta contain the generalized singular value pairs of a
and b;

alpha[0: k - 1] = 1,
beta[0: k - 1] = 0,
and if m - k - l≥ 0,

alpha[k:k + l - 1] = C,
beta[k:k + l - 1] = S,
or if m - k - l < 0,

alpha[k:m - 1] = C, alpha[m: k + l - 1] = 0
beta[k: m - 1] =S, beta[m: k + l - 1] = 1
and
alpha[k + l: n - 1] = 0
beta[k + l : n - 1] = 0

u Array, size (ldu*m).

If jobu = 'U', u contains the m-by-m orthogonal/unitary matrix U.

If jobu = 'N', u is not referenced.

v Array, size (ldv*p).

If jobv = 'V', v contains the p-by-p orthogonal/unitary matrix V.

If jobv = 'N', v is not referenced.

q Array, size (ldq*n).

If jobq = 'Q', q contains the n-by-n orthogonal/unitary matrix Q.


If jobq = 'N', q is not referenced.

iwork On exit, iwork stores the sorting information. More precisely, the following loop uses iwork to sort alpha:

for (i = k; i < min(m, k+l); i++) {
    swap(alpha[i], alpha[iwork[i] - 1]);
}

such that alpha[0] ≥ alpha[1] ≥ ... ≥ alpha[n - 1].

Return Values
This function returns a value info.

= 0: successful exit.
< 0: if info = -i, the i-th argument had an illegal value.

> 0: if info = 1, the Jacobi-type procedure failed to converge.

For further details, see subroutine ?tgsja.

Application Notes
?ggsvd3 replaces the deprecated subroutine ?ggsvd.
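
The following is a minimal sketch in double precision, using the LAPACKE_dggsvd3 prototype shown in the Syntax above; note that the argument order is m, n, p, unlike ?ggsvp3. The matrices are arbitrary illustrative 2-by-2 values, and since B is nonsingular (the identity), alpha[i]/beta[i] for i = k, ..., k+l-1 are the singular values of A*inv(B) = A.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    const lapack_int m = 2, n = 2, p = 2;
    double a[] = {1.0, 3.0,  2.0, 4.0};   /* column-major m-by-n A */
    double b[] = {1.0, 0.0,  0.0, 1.0};   /* column-major p-by-n B = I */
    double alpha[2], beta[2], u[4], v[4], q[4];
    lapack_int k, l, iwork[2];

    lapack_int info = LAPACKE_dggsvd3(LAPACK_COL_MAJOR, 'U', 'V', 'Q',
                                      m, n, p, &k, &l,
                                      a, m, b, p, alpha, beta,
                                      u, m, v, p, q, n, iwork);
    if (info != 0) {
        printf("LAPACKE_dggsvd3 failed, info = %d\n", (int)info);
        return 1;
    }
    for (lapack_int i = k; i < k + l && i < n; ++i)
        printf("generalized singular value %d: alpha/beta = %g\n",
               (int)i, alpha[i] / beta[i]);
    return 0;
}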

?tgsja
Computes the generalized SVD of two upper triangular
or trapezoidal matrices.

Syntax
lapack_int LAPACKE_stgsja( int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int p, lapack_int n, lapack_int k, lapack_int l, float* a,
lapack_int lda, float* b, lapack_int ldb, float tola, float tolb, float* alpha, float*
beta, float* u, lapack_int ldu, float* v, lapack_int ldv, float* q, lapack_int ldq,
lapack_int* ncycle );
lapack_int LAPACKE_dtgsja( int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int p, lapack_int n, lapack_int k, lapack_int l, double* a,
lapack_int lda, double* b, lapack_int ldb, double tola, double tolb, double* alpha,
double* beta, double* u, lapack_int ldu, double* v, lapack_int ldv, double* q,
lapack_int ldq, lapack_int* ncycle );
lapack_int LAPACKE_ctgsja( int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int p, lapack_int n, lapack_int k, lapack_int l,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* b, lapack_int ldb, float
tola, float tolb, float* alpha, float* beta, lapack_complex_float* u, lapack_int ldu,
lapack_complex_float* v, lapack_int ldv, lapack_complex_float* q, lapack_int ldq,
lapack_int* ncycle );
lapack_int LAPACKE_ztgsja( int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int p, lapack_int n, lapack_int k, lapack_int l,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* b, lapack_int ldb,
double tola, double tolb, double* alpha, double* beta, lapack_complex_double* u,
lapack_int ldu, lapack_complex_double* v, lapack_int ldv, lapack_complex_double* q,
lapack_int ldq, lapack_int* ncycle );

Include Files
• mkl.h

Description

The routine computes the generalized singular value decomposition (GSVD) of two real/complex upper
triangular (or trapezoidal) matrices A and B. On entry, it is assumed that matrices A and B have the following
forms, which may be obtained by the preprocessing subroutine ggsvp from a general m-by-n matrix A and p-
by-n matrix B:

                 n-k-l  k    l
    A =    k   [   0   A12  A13 ]
           l   [   0    0   A23 ]   if m-k-l ≥ 0;
         m-k-l [   0    0    0  ]

                 n-k-l  k    l
    A =    k   [   0   A12  A13 ]   if m-k-l < 0;
          m-k  [   0    0   A23 ]

                 n-k-l  k    l
    B =    l   [   0    0   B13 ]
          p-l  [   0    0    0  ]
where the k-by-k matrix A12 and l-by-l matrix B13 are nonsingular upper triangular; A23 is l-by-l upper
triangular if m-k-l≥0, otherwise A23 is (m-k)-by-l upper trapezoidal.

On exit,
U^H*A*Q = D1*(0 R), V^H*B*Q = D2*(0 R),
where U, V and Q are orthogonal/unitary matrices, R is a nonsingular upper triangular matrix, and D1 and D2
are "diagonal" matrices, which are of the following structures:
If m-k-l ≥ 0,

                 k   l
    D1 =   k   [ I   0 ]
           l   [ 0   C ]
         m-k-l [ 0   0 ]

                 k   l
    D2 =   l   [ 0   S ]
          p-l  [ 0   0 ]

                   n-k-l  k    l
    (0 R) =  k   [   0   R11  R12 ]
             l   [   0    0   R22 ]

where
C = diag(alpha[k], ..., alpha[k+l-1]),
S = diag(beta[k], ..., beta[k+l-1]),
C^2 + S^2 = I.
R is stored in a(1:k+l, n-k-l+1:n) on exit.
If m-k-l < 0,

                  k  m-k  k+l-m
    D1 =    k   [ I   0     0  ]
           m-k  [ 0   C     0  ]

                  k  m-k  k+l-m
    D2 =   m-k  [ 0   S     0  ]
          k+l-m [ 0   0     I  ]
           p-l  [ 0   0     0  ]

                   n-k-l  k   m-k  k+l-m
    (0 R) =  k   [   0   R11  R12   R13 ]
            m-k  [   0    0   R22   R23 ]
           k+l-m [   0    0    0    R33 ]

where
C = diag(alpha[k], ..., alpha[m-1]),
S = diag(beta[k], ..., beta[m-1]),
C^2 + S^2 = I.


On exit, the submatrix
    [ R11  R12  R13 ]
    [  0   R22  R23 ]
is stored in a(1:m, n-k-l+1:n) and R33 is stored in b(m-k+1:l, n+m-k-l+1:n).
The computation of the orthogonal/unitary transformation matrices U, V or Q is optional. These matrices may
either be formed explicitly, or they may be postmultiplied into input matrices U1, V1, or Q1.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

jobu Must be 'U', 'I', or 'N'.

If jobu = 'U', u must contain an orthogonal/unitary matrix U1 on entry.

If jobu = 'I', u is initialized to the unit matrix.

If jobu = 'N', u is not computed.

jobv Must be 'V', 'I', or 'N'.

If jobv = 'V', v must contain an orthogonal/unitary matrix V1 on entry.

If jobv = 'I', v is initialized to the unit matrix.

If jobv = 'N', v is not computed.

jobq Must be 'Q', 'I', or 'N'.

If jobq = 'Q', q must contain an orthogonal/unitary matrix Q1 on entry.

If jobq = 'I', q is initialized to the unit matrix.

If jobq = 'N', q is not computed.

m The number of rows of the matrix A (m≥ 0).

p The number of rows of the matrix B (p≥ 0).

n The number of columns of the matrices A and B (n≥ 0).

k, l Specify the subblocks in the input matrices A and B, whose GSVD is


computed.

a, b, u, v, q Arrays:
a(size at least max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout) contains the m-by-n matrix A.
b(size at least max(1, ldb*n) for column major layout and max(1, ldb*p)
for row major layout) contains the p-by-n matrix B.
If jobu = 'U', u (size max(1, ldu*m)) must contain a matrix U1 (usually
the orthogonal/unitary matrix returned by ?ggsvp).

If jobv = 'V', v (size at least max(1, ldv*p)) must contain a matrix V1


(usually the orthogonal/unitary matrix returned by ?ggsvp).

If jobq = 'Q', q (size at least max(1, ldq*n)) must contain a matrix Q1


(usually the orthogonal/unitary matrix returned by ?ggsvp).


lda The leading dimension of a; at least max(1, m)for column major layout and
max(1, n) for row major layout.

ldb The leading dimension of b; at least max(1, p) for column major layout and
max(1, n) for row major layout.

ldu The leading dimension of the array u .


ldu≥ max(1, m) if jobu = 'U'; ldu≥ 1 otherwise.

ldv The leading dimension of the array v .


ldv≥ max(1, p) if jobv = 'V'; ldv≥ 1 otherwise.

ldq The leading dimension of the array q .


ldq≥ max(1, n) if jobq = 'Q'; ldq≥ 1 otherwise.

tola, tolb tola and tolb are the convergence criteria for the Jacobi-Kogbetliantz
iteration procedure. Generally, they are the same as used in ?ggsvp:

tola = max(m, n)*||A||*MACHEPS,
tolb = max(p, n)*||B||*MACHEPS.

Output Parameters

a On exit, a(n-k+1:n, 1:min(k+l, m)) contains the triangular matrix R or part


of R.

b On exit, if necessary, b(m-k+1: l, n+m-k-l+1: n)) contains a part of R.

alpha, beta Arrays, size at least max(1, n). Contain the generalized singular value pairs
of A and B:
alpha(1:k) = 1,
beta(1:k) = 0,
and if m-k-l≥ 0,

alpha(k+1:k+l) = diag(C),
beta(k+1:k+l) = diag(S),
or if m-k-l < 0,

alpha(k+1:m)= diag(C), alpha(m+1:k+l)=0


beta(k+1:m) = diag(S),
beta(m+1:k+l) = 1.
Furthermore, if k+l < n,

alpha(k+l+1:n)= 0 and
beta(k+l+1:n) = 0.

u If jobu = 'I', u contains the orthogonal/unitary matrix U.

If jobu = 'U', u contains the product U1*U.

If jobu = 'N', u is not referenced.

v If jobv = 'I', v contains the orthogonal/unitary matrix V.

If jobv = 'V', v contains the product V1*V.

If jobv = 'N', v is not referenced.

q If jobq = 'I', q contains the orthogonal/unitary matrix Q.

If jobq = 'Q', q contains the product Q1*Q.

If jobq = 'N', q is not referenced.

ncycle The number of cycles required for convergence.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = 1, the procedure does not converge after MAXIT cycles.
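
The following is a minimal sketch of the full GSVD pipeline in double precision, using the prototypes shown in the Syntax sections above: ?ggsvp3 preprocesses the general matrices A and B, then ?tgsja finishes the decomposition with jobu/jobv/jobq = 'U'/'V'/'Q' so that the U, V, Q produced in the first step are accumulated rather than re-initialized. The matrices are arbitrary illustrative values, and the tolerances follow the formula quoted above, with LAPACKE_dlange and DBL_EPSILON standing in for the norm and MACHEPS.

#include <stdio.h>
#include <float.h>
#include <mkl.h>

int main(void) {
    const lapack_int m = 2, p = 2, n = 2;
    double a[] = {1.0, 3.0,  2.0, 4.0};   /* general m-by-n A (column-major) */
    double b[] = {1.0, 0.0,  1.0, 1.0};   /* general p-by-n B (column-major) */
    double u[4], v[4], q[4], alpha[2], beta[2];
    lapack_int k, l, ncycle;

    double tola = (m > n ? m : n)
                  * LAPACKE_dlange(LAPACK_COL_MAJOR, '1', m, n, a, m) * DBL_EPSILON;
    double tolb = (p > n ? p : n)
                  * LAPACKE_dlange(LAPACK_COL_MAJOR, '1', p, n, b, p) * DBL_EPSILON;

    /* Step 1: reduce (A, B) to the triangular/trapezoidal form expected by ?tgsja. */
    lapack_int info = LAPACKE_dggsvp3(LAPACK_COL_MAJOR, 'U', 'V', 'Q', m, p, n,
                                      a, m, b, p, tola, tolb, &k, &l,
                                      u, m, v, p, q, n);
    if (info != 0) return 1;

    /* Step 2: GSVD of the reduced pair; U, V, Q from step 1 are accumulated. */
    info = LAPACKE_dtgsja(LAPACK_COL_MAJOR, 'U', 'V', 'Q', m, p, n, k, l,
                          a, m, b, p, tola, tolb, alpha, beta,
                          u, m, v, p, q, n, &ncycle);
    if (info != 0) return 1;

    for (lapack_int i = k; i < k + l && i < n; ++i)
        printf("alpha = %g, beta = %g\n", alpha[i], beta[i]);
    printf("ncycle = %d\n", (int)ncycle);
    return 0;
}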

Cosine-Sine Decomposition: LAPACK Computational Routines


This section describes LAPACK computational routines for computing the cosine-sine decomposition (CS
decomposition) of a partitioned unitary/orthogonal matrix. The algorithm computes a complete 2-by-2 CS
decomposition, which requires simultaneous diagonalization of all the four blocks of a unitary/orthogonal
matrix partitioned into a 2-by-2 block structure.
The computation has the following phases:

1. The matrix is reduced to a bidiagonal block form.


2. The blocks are simultaneously diagonalized using techniques from the bidiagonal SVD algorithms.

Table "Computational Routines for Cosine-Sine Decomposition (CSD)" lists LAPACK routines that perform CS
decomposition of matrices.
Computational Routines for Cosine-Sine Decomposition (CSD)
Operation                                                             Real matrices   Complex matrices
Compute the CS decomposition of an orthogonal/unitary matrix         bbcsd/bbcsd     bbcsd/bbcsd
in bidiagonal-block form
Simultaneously bidiagonalize the blocks of a partitioned             orbdb           unbdb
orthogonal matrix
Simultaneously bidiagonalize the blocks of a partitioned             orbdb           unbdb
unitary matrix

See Also
Cosine-Sine Decomposition: LAPACK Driver Routines

?bbcsd
Computes the CS decomposition of an orthogonal/
unitary matrix in bidiagonal-block form.


Syntax
lapack_int LAPACKE_sbbcsd( int matrix_layout, char jobu1, char jobu2, char jobv1t, char
jobv2t, char trans, lapack_int m, lapack_int p, lapack_int q, float* theta, float*
phi, float* u1, lapack_int ldu1, float* u2, lapack_int ldu2, float* v1t, lapack_int
ldv1t, float* v2t, lapack_int ldv2t, float* b11d, float* b11e, float* b12d, float*
b12e, float* b21d, float* b21e, float* b22d, float* b22e );
lapack_int LAPACKE_dbbcsd( int matrix_layout, char jobu1, char jobu2, char jobv1t, char
jobv2t, char trans, lapack_int m, lapack_int p, lapack_int q, double* theta, double*
phi, double* u1, lapack_int ldu1, double* u2, lapack_int ldu2, double* v1t, lapack_int
ldv1t, double* v2t, lapack_int ldv2t, double* b11d, double* b11e, double* b12d,
double* b12e, double* b21d, double* b21e, double* b22d, double* b22e );
lapack_int LAPACKE_cbbcsd( int matrix_layout, char jobu1, char jobu2, char jobv1t, char
jobv2t, char trans, lapack_int m, lapack_int p, lapack_int q, float* theta, float*
phi, lapack_complex_float* u1, lapack_int ldu1, lapack_complex_float* u2, lapack_int
ldu2, lapack_complex_float* v1t, lapack_int ldv1t, lapack_complex_float* v2t,
lapack_int ldv2t, float* b11d, float* b11e, float* b12d, float* b12e, float* b21d,
float* b21e, float* b22d, float* b22e );
lapack_int LAPACKE_zbbcsd( int matrix_layout, char jobu1, char jobu2, char jobv1t, char
jobv2t, char trans, lapack_int m, lapack_int p, lapack_int q, double* theta, double*
phi, lapack_complex_double* u1, lapack_int ldu1, lapack_complex_double* u2, lapack_int
ldu2, lapack_complex_double* v1t, lapack_int ldv1t, lapack_complex_double* v2t,
lapack_int ldv2t, double* b11d, double* b11e, double* b12d, double* b12e, double*
b21d, double* b21e, double* b22d, double* b22e );

Include Files
• mkl.h

Description
The routine ?bbcsd computes the CS decomposition of an orthogonal matrix in bidiagonal-block form:

        [ B11 | B12  0  0 ]       [ U1 |    ]   [ C | -S  0  0 ]   [ V1 |    ]^T
    X = [  0  |  0  -I  0 ]   =   [---------] * [ 0 |  0 -I  0 ] * [---------]
        [ B21 | B22  0  0 ]       [    | U2 ]   [ S |  C  0  0 ]   [    | V2 ]
        [  0  |  0   0  I ]                     [ 0 |  0  0  I ]

or of a unitary matrix in the same form, with the last factor conjugate-transposed (^H in place of ^T),
respectively.

x is m-by-m with the top-left block p-by-q. Note that q must not be larger than p, m-p, or m-q. If q is not
the smallest index, x must be transposed and/or permuted in constant time using the trans option. See
?orcsd/?uncsd for details.
The bidiagonal matrices b11, b12, b21, and b22 are represented implicitly by angles theta(1:q) and
phi(1:q-1).
The orthogonal/unitary matrices u1, u2, v1t, and v2t are input/output. The input matrices are pre- or post-
multiplied by the appropriate singular vector matrices.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

jobu1 If jobu1 = 'Y', then u1 is updated. Otherwise, u1 is not updated.

jobu2 If jobu2 = 'Y', then u2 is updated. Otherwise, u2 is not updated.

jobv1t If jobv1t = 'Y', then v1t is updated. Otherwise, v1t is not updated.

jobv2t If jobv2t = 'Y', then v2t is updated. Otherwise, v2t is not updated.

trans = 'T': x, u1, u2, v1t, v2t are stored in row-major order.

otherwise x, u1, u2, v1t, v2t are stored in column-major


order.

m The number of rows and columns of the orthogonal/unitary matrix X in


bidiagonal-block form.

p The number of rows in the top-left block of x. 0 ≤p≤m.


q The number of columns in the top-left block of x. 0 ≤ q ≤ min(p, m-p, m-q).

theta Array, size q.


On entry, the angles theta[0], ..., theta[q - 1] that, along with
phi[0], ..., phi[q - 2], define the matrix in bidiagonal-block form as
returned by orbdb/unbdb.

phi Array, size q-1.


The angles phi[0], ..., phi[q - 2] that, along with theta[0], ...,
theta[q - 1], define the matrix in bidiagonal-block form as returned by
orbdb/unbdb.

u1 Array, size at least max(1, ldu1*p).

On entry, a p-by-p matrix.

ldu1 The leading dimension of the array u1, ldu1 ≥ max(1, p).

u2 Array, size max(1, ldu2*(m-p)).

On entry, an (m-p)-by-(m-p) matrix.

ldu2 The leading dimension of the array u2, ldu2 ≥ max(1, m-p).


v1t Array, size max(1, ldv1t*q).

On entry, a q-by-q matrix.

ldv1t The leading dimension of the array v1t, ldv1t ≥ max(1, q).

v2t Array, size max(1, ldv2t*(m-q)).


On entry, an (m-q)-by-(m-q) matrix.

ldv2t The leading dimension of the array v2t, ldv2t ≥ max(1, m-q).

Output Parameters

theta On exit, the angles whose cosines and sines define the diagonal blocks in
the CS decomposition.

u1 On exit, u1 is postmultiplied by the left singular vector matrix common to


[ b11 ; 0 ] and [ b12 0 0 ; 0 -I 0 ].

u2 On exit, u2 is postmultiplied by the left singular vector matrix common to


[ b21 ; 0 ] and [ b22 0 0 ; 0 0 I ].

v1t On exit, v1t is premultiplied by the transpose of the right singular vector
matrix common to [ b11 ; 0 ] and [ b21 ; 0 ].

v2t On exit, v2t is premultiplied by the transpose of the right singular vector
matrix common to [ b12 0 0 ; 0 -I 0 ] and [ b22 0 0 ; 0 0 I ].

b11d Array, size q.


When ?bbcsd converges, b11d contains the cosines of theta[0], ...,
theta[q - 1]. If ?bbcsd fails to converge, b11d contains the diagonal of
the partially reduced top left block.

b11e Array, size q-1.


When ?bbcsd converges, b11e contains zeros. If ?bbcsd fails to converge,
b11e contains the superdiagonal of the partially reduced top left block.

b12d Array, size q.


When ?bbcsd converges, b12d contains the negative sines of
theta[0], ..., theta[q - 1]. If ?bbcsd fails to converge, b12d contains
the diagonal of the partially reduced top right block.

b12e Array, size q-1.


When ?bbcsd converges, b12e contains zeros. If ?bbcsd fails to converge,
b12e contains the superdiagonal of the partially reduced top right block.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0 and if ?bbcsd did not converge, info specifies the number of nonzero entries in phi, and b11d,
b11e, etc. contain the partially reduced matrix.

See Also
?orcsd/?uncsd
xerbla

?orbdb/?unbdb
Simultaneously bidiagonalizes the blocks of a
partitioned orthogonal/unitary matrix.

Syntax
lapack_int LAPACKE_sorbdb( int matrix_layout, char trans, char signs, lapack_int m,
lapack_int p, lapack_int q, float* x11, lapack_int ldx11, float* x12, lapack_int
ldx12, float* x21, lapack_int ldx21, float* x22, lapack_int ldx22, float* theta,
float* phi, float* taup1, float* taup2, float* tauq1, float* tauq2 );
lapack_int LAPACKE_dorbdb( int matrix_layout, char trans, char signs, lapack_int m,
lapack_int p, lapack_int q, double* x11, lapack_int ldx11, double* x12, lapack_int
ldx12, double* x21, lapack_int ldx21, double* x22, lapack_int ldx22, double* theta,
double* phi, double* taup1, double* taup2, double* tauq1, double* tauq2 );
lapack_int LAPACKE_cunbdb( int matrix_layout, char trans, char signs, lapack_int m,
lapack_int p, lapack_int q, lapack_complex_float* x11, lapack_int ldx11,
lapack_complex_float* x12, lapack_int ldx12, lapack_complex_float* x21, lapack_int
ldx21, lapack_complex_float* x22, lapack_int ldx22, float* theta, float* phi,
lapack_complex_float* taup1, lapack_complex_float* taup2, lapack_complex_float* tauq1,
lapack_complex_float* tauq2 );
lapack_int LAPACKE_zunbdb( int matrix_layout, char trans, char signs, lapack_int m,
lapack_int p, lapack_int q, lapack_complex_double* x11, lapack_int ldx11,
lapack_complex_double* x12, lapack_int ldx12, lapack_complex_double* x21, lapack_int
ldx21, lapack_complex_double* x22, lapack_int ldx22, double* theta, double* phi,
lapack_complex_double* taup1, lapack_complex_double* taup2, lapack_complex_double*
tauq1, lapack_complex_double* tauq2 );

Include Files
• mkl.h

Description
The routines ?orbdb/?unbdb simultaneously bidiagonalize the blocks of an m-by-m partitioned orthogonal
matrix X:

                                   [ B11 | B12  0  0 ]
     [ X11 | X12 ]   [ P1 |    ]   [  0  |  0  -I  0 ]   [ Q1 |    ]^T
 X = [-----------] = [---------] * [-----------------] * [---------]
     [ X21 | X22 ]   [    | P2 ]   [ B21 | B22  0  0 ]   [    | Q2 ]
                                   [  0  |  0   0  I ]

or of a partitioned unitary matrix, with the last factor conjugate-transposed (^H in place of ^T).

x11 is p-by-q. q must not be larger than p, m-p, or m-q. Otherwise, x must be transposed and/or permuted
in constant time using the trans and signs options.

The orthogonal/unitary matrices p1, p2, q1, and q2 are p-by-p, (m-p)-by-(m-p), q-by-q, (m-q)-by-(m-q),
respectively. They are represented implicitly by Householder vectors.
The bidiagonal matrices b11, b12, b21, and b22 are q-by-q bidiagonal matrices represented implicitly by angles
theta[0], ..., theta[q - 1] and phi[0], ..., phi[q - 2]. b11 and b12 are upper bidiagonal, while b21 and
b22 are lower bidiagonal. Every entry in each bidiagonal band is a product of a sine or cosine of theta with a
sine or cosine of phi. See [Sutton09] for details.

p1, p2, q1, and q2 are represented as products of elementary reflectors.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

trans = 'T': x, u1, u2, v1t, v2t are stored in row-major order.

otherwise x, u1, u2, v1t, v2t are stored in column-major


order.

signs = 'O': The lower-left block is made nonpositive (the


"other" convention).
otherwise The upper-right block is made nonpositive (the
"default" convention).

m The number of rows and columns of the matrix X.

p The number of rows in x11 and x12. 0 ≤p≤m.

q The number of columns in x11 and x21. 0 ≤q≤ min(p,m-p,m-q).

x11 Array, size (size max(1, ldx11*q) for column major layout and max(1,
ldx11*p) for row major layout) .
On entry, the top-left block of the orthogonal/unitary matrix to be reduced.

ldx11 The leading dimension of the array X11. If trans = 'T', ldx11≥p for column
major layout and ldx11≥q for row major layout. Otherwise, ldx11≥q.

x12 Array, size (size max(1, ldx12*(m-q)) for column major layout and max(1,
ldx12*p) for row major layout).
On entry, the top-right block of the orthogonal/unitary matrix to be
reduced.

ldx12 The leading dimension of the array X12. If trans = 'N', ldx12 ≥ p for column
major layout and ldx12 ≥ m-q for row major layout. Otherwise, ldx12 ≥ m-q.

x21 Array, size (size max(1, ldx21*q) for column major layout and max(1,
ldx21*(m-p)) for row major layout).
On entry, the bottom-left block of the orthogonal/unitary matrix to be
reduced.

ldx21 The leading dimension of the array X21. If trans = 'N', ldx21 ≥ m-p for
column major layout and ldx21 ≥ q for row major layout. Otherwise, ldx21 ≥ q.

x22 Array, size ((size max(1, ldx22*(m-q)) for column major layout and max(1,
ldx22*(m - p)) for row major layout).
On entry, the bottom-right block of the orthogonal/unitary matrix to be
reduced.

ldx22 The leading dimension of the array X22. If trans = 'N', ldx22 ≥ m-p for
column major layout and ldx22 ≥ m-q for row major layout. Otherwise, ldx22 ≥ m-q.

Output Parameters

x11 On exit, the form depends on trans:
If trans = 'N', the columns of the lower triangle of x11 specify reflectors for p1, and the rows of the
upper triangle of x11(1:q-1, q:q-1) specify reflectors for q1.
Otherwise (trans = 'T'), the rows of the upper triangle of x11 specify reflectors for p1, and the columns
of the lower triangle of x11(1:q-1, q:q-1) specify reflectors for q1.

x12 On exit, the form depends on trans:
If trans = 'N', the columns of the upper triangle of x12 specify the first p reflectors for q2.
Otherwise (trans = 'T'), the columns of the lower triangle of x12 specify the first p reflectors for q2.

x21 On exit, the form depends on trans:
If trans = 'N', the columns of the lower triangle of x21 specify the reflectors for p2.
Otherwise (trans = 'T'), the columns of the upper triangle of x21 specify the reflectors for p2.

x22 On exit, the form depends on trans:
If trans = 'N', the rows of the upper triangle of x22(q+1:m-p, p+1:m-q) specify the last m-p-q
reflectors for q2.
Otherwise (trans = 'T'), the columns of the lower triangle of x22(p+1:m-q, q+1:m-p) specify the last
m-p-q reflectors for p2.


theta Array, size q. The entries of bidiagonal blocks b11, b12, b21, and b22 can be
computed from the angles theta and phi. See the Description section for
details.

phi Array, size q-1. The entries of bidiagonal blocks b11, b12, b21, and b22 can
be computed from the angles theta and phi. See the Description section
for details.

taup1 Array, size p.


Scalar factors of the elementary reflectors that define p1.
taup2 Array, size m-p.
Scalar factors of the elementary reflectors that define p2.
tauq1 Array, size q.
Scalar factors of the elementary reflectors that define q1.
tauq2 Array, size m-q.
Scalar factors of the elementary reflectors that define q2.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

See Also
?orcsd/?uncsd
?orgqr
?ungqr
?orglq
?unglq
xerbla
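
The following is a minimal sketch of calling the real double-precision flavor on a trivially orthogonal X (the 4-by-4 identity), partitioned with p = q = 2 so that every block is 2-by-2; in this trivial case the output angles theta and phi come out as zeros. The block contents, block sizes, and leading dimensions are assumptions chosen only to keep the call well formed.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    const lapack_int m = 4, p = 2, q = 2;
    /* X = I (4-by-4), partitioned into 2x2 blocks (column-major):
       X11 = I, X12 = 0, X21 = 0, X22 = I. */
    double x11[] = {1.0, 0.0,  0.0, 1.0};
    double x12[] = {0.0, 0.0,  0.0, 0.0};
    double x21[] = {0.0, 0.0,  0.0, 0.0};
    double x22[] = {1.0, 0.0,  0.0, 1.0};
    double theta[2], phi[1];
    double taup1[2], taup2[2], tauq1[2], tauq2[2];

    lapack_int info = LAPACKE_dorbdb(LAPACK_COL_MAJOR, 'N', 'D', m, p, q,
                                     x11, p, x12, p, x21, m - p, x22, m - p,
                                     theta, phi, taup1, taup2, tauq1, tauq2);
    if (info != 0) {
        printf("LAPACKE_dorbdb failed, info = %d\n", (int)info);
        return 1;
    }
    printf("theta = %g %g, phi = %g\n", theta[0], theta[1], phi[0]);
    return 0;
}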

LAPACK Least Squares and Eigenvalue Problem Driver Routines


Each of the LAPACK driver routines solves a complete problem. To arrive at the solution, driver routines
typically call a sequence of appropriate computational routines.
Driver routines are described in the following sections :
Linear Least Squares (LLS) Problems
Generalized LLS Problems
Symmetric Eigenproblems
Nonsymmetric Eigenproblems
Singular Value Decomposition
Cosine-Sine Decomposition
Generalized Symmetric Definite Eigenproblems
Generalized Nonsymmetric Eigenproblems

Linear Least Squares (LLS) Problems: LAPACK Driver Routines
This section describes LAPACK driver routines used for solving linear least squares problems. Table "Driver
Routines for Solving LLS Problems" lists all such routines.
Driver Routines for Solving LLS Problems
Routine Name Operation performed

gels Uses QR or LQ factorization to solve an overdetermined or underdetermined linear system with a
full-rank matrix.

gelsy Computes the minimum-norm solution to a linear least squares problem using a
complete orthogonal factorization of A.

gelss Computes the minimum-norm solution to a linear least squares problem using the
singular value decomposition of A.

gelsd Computes the minimum-norm solution to a linear least squares problem using the
singular value decomposition of A and a divide and conquer method.

?gels
Uses QR or LQ factorization to solve an overdetermined
or underdetermined linear system with a full-rank
matrix.

Syntax
lapack_int LAPACKE_sgels (int matrix_layout, char trans, lapack_int m, lapack_int n,
lapack_int nrhs, float* a, lapack_int lda, float* b, lapack_int ldb);
lapack_int LAPACKE_dgels (int matrix_layout, char trans, lapack_int m, lapack_int n,
lapack_int nrhs, double* a, lapack_int lda, double* b, lapack_int ldb);
lapack_int LAPACKE_cgels (int matrix_layout, char trans, lapack_int m, lapack_int n,
lapack_int nrhs, lapack_complex_float* a, lapack_int lda, lapack_complex_float* b,
lapack_int ldb);
lapack_int LAPACKE_zgels (int matrix_layout, char trans, lapack_int m, lapack_int n,
lapack_int nrhs, lapack_complex_double* a, lapack_int lda, lapack_complex_double* b,
lapack_int ldb);

Include Files
• mkl.h

Description

The routine solves overdetermined or underdetermined real/complex linear systems involving an m-by-n
matrix A, or its transpose/conjugate-transpose, using a QR or LQ factorization of A. It is assumed that A has
full rank.
The following options are provided:
1. If trans = 'N' and m ≥ n: find the least squares solution of an overdetermined system, that is, solve the
least squares problem
   minimize ||b - A*x||_2
2. If trans = 'N' and m < n: find the minimum norm solution of an underdetermined system A*X = B.

3. If trans = 'T' or 'C' and m ≥ n: find the minimum norm solution of an underdetermined system A^H*X = B.

4. If trans = 'T' or 'C' and m < n: find the least squares solution of an overdetermined system, that is,
solve the least squares problem
   minimize ||b - A^H*x||_2


Several right hand side vectors b and solution vectors x can be handled in a single call; they are formed by
the columns of the right hand side matrix B and the solution matrix X (when the coefficient matrix is A, B is
m-by-nrhs and X is n-by-nrhs; if the coefficient matrix is A^T or A^H, B is n-by-nrhs and X is m-by-nrhs).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

trans Must be 'N', 'T', or 'C'.

If trans = 'N', the linear system involves matrix A;

If trans = 'T', the linear system involves the transposed matrix A^T (for
real flavors only);
If trans = 'C', the linear system involves the conjugate-transposed
matrix A^H (for complex flavors only).

m The number of rows of the matrix A (m≥ 0).

n The number of columns of the matrix A


(n≥ 0).

nrhs The number of right-hand sides; the number of columns in B (nrhs≥ 0).

a, b Arrays:
a(size max(1, lda*n) for column major layout and max(1, lda*m) for row
major layout) contains the m-by-n matrix A.
b(size max(1, ldb*nrhs) for column major layout and max(1, ldb*max(m,
n)) for row major layout) contains the matrix B of right hand side vectors.

lda The leading dimension of a; at least max(1, m) for column major layout and
at least max(1, n) for row major layout.

ldb The leading dimension of b; must be at least max(1, m, n) for column


major layout if trans='N' and at least max(1, n) if trans='T' and at
least max(1, nrhs) for row major layout regardless of the value of trans.

Output Parameters

a On exit, overwritten by the factorization data as follows:


if m≥n, array a contains the details of the QR factorization of the matrix A as
returned by ?geqrf;

if m < n, array a contains the details of the LQ factorization of the matrix A


as returned by ?gelqf.

b If info = 0, b overwritten by the solution vectors, stored columnwise:

if trans = 'N' and m≥n, rows 1 to n of b contain the least squares solution
vectors; the residual sum of squares for the solution in each column is
given by the sum of squares of modulus of elements n+1 to m in that
column;

if trans = 'N' and m < n, rows 1 to n of b contain the minimum norm
solution vectors;
if trans = 'T' or 'C' and m≥n, rows 1 to m of b contain the minimum
norm solution vectors;
if trans = 'T' or 'C' and m < n, rows 1 to m of b contain the least
squares solution vectors; the residual sum of squares for the solution in
each column is given by the sum of squares of modulus of elements m+1 to
n in that column.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, the i-th diagonal element of the triangular factor of A is zero, so that A does not have full rank;
the least squares solution could not be computed.
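
The following is a minimal sketch of a straight-line least squares fit in double precision, using the LAPACKE_dgels prototype shown in the Syntax above. The 4-by-2 matrix A (an assumed illustrative design matrix) has a column of ones and a column of abscissas, so the first n rows of b return the intercept and slope, and the remaining entries of b give the residual.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    const lapack_int m = 4, n = 2, nrhs = 1;
    /* Fit y = c0 + c1*x through (0,1), (1,2), (2,3), (3,4). Column-major A. */
    double a[] = {1.0, 1.0, 1.0, 1.0,    /* column of ones     */
                  0.0, 1.0, 2.0, 3.0};   /* column of x values */
    double b[] = {1.0, 2.0, 3.0, 4.0};   /* right-hand side; ldb = max(m, n) */

    lapack_int info = LAPACKE_dgels(LAPACK_COL_MAJOR, 'N', m, n, nrhs, a, m, b, m);
    if (info != 0) {
        printf("LAPACKE_dgels failed, info = %d\n", (int)info);
        return 1;
    }
    /* Residual sum of squares = sum of squares of b[n..m-1]. */
    printf("intercept = %g, slope = %g\n", b[0], b[1]);
    return 0;
}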

?gelsy
Computes the minimum-norm solution to a linear least
squares problem using a complete orthogonal
factorization of A.

Syntax
lapack_int LAPACKE_sgelsy( int matrix_layout, lapack_int m, lapack_int n, lapack_int
nrhs, float* a, lapack_int lda, float* b, lapack_int ldb, lapack_int* jpvt, float
rcond, lapack_int* rank );
lapack_int LAPACKE_dgelsy( int matrix_layout, lapack_int m, lapack_int n, lapack_int
nrhs, double* a, lapack_int lda, double* b, lapack_int ldb, lapack_int* jpvt, double
rcond, lapack_int* rank );
lapack_int LAPACKE_cgelsy( int matrix_layout, lapack_int m, lapack_int n, lapack_int
nrhs, lapack_complex_float* a, lapack_int lda, lapack_complex_float* b, lapack_int
ldb, lapack_int* jpvt, float rcond, lapack_int* rank );
lapack_int LAPACKE_zgelsy( int matrix_layout, lapack_int m, lapack_int n, lapack_int
nrhs, lapack_complex_double* a, lapack_int lda, lapack_complex_double* b, lapack_int
ldb, lapack_int* jpvt, double rcond, lapack_int* rank );

Include Files
• mkl.h

Description

The ?gelsy routine computes the minimum-norm solution to a real/complex linear least squares problem:

minimize ||b - A*x||_2


using a complete orthogonal factorization of A. A is an m-by-n matrix which may be rank-deficient. Several
right hand side vectors b and solution vectors x can be handled in a single call; they are stored as the
columns of the m-by-nrhs right hand side matrix B and the n-by-nrhs solution matrix X.
The routine first computes a QR factorization with column pivoting:

    A*P = Q * [ R11  R12 ]
              [  0   R22 ]

with R11 defined as the largest leading submatrix whose estimated condition number is less than 1/rcond.
The order of R11, rank, is the effective rank of A. Then, R22 is considered to be negligible, and R12 is
annihilated by orthogonal/unitary transformations from the right, arriving at the complete orthogonal
factorization:

    A*P = Q * [ T11  0 ] * Z
              [  0   0 ]

The minimum-norm solution is then

    x = P * Z^T * [ inv(T11)*Q1^T*b ]
                  [        0        ]

for real flavors and

    x = P * Z^H * [ inv(T11)*Q1^H*b ]
                  [        0        ]

for complex flavors, where Q1 consists of the first rank columns of Q.
The ?gelsy routine is identical to the original deprecated ?gelsx routine except for the following
differences:

• The call to the subroutine ?geqpf has been substituted by the call to the subroutine ?geqp3, which is a
BLAS-3 version of the QR factorization with column pivoting.
• The matrix B (the right hand side) is updated with BLAS-3.
• The permutation of the matrix B (the right hand side) is faster and simpler.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

m The number of rows of the matrix A (m≥ 0).

n The number of columns of the matrix A


(n≥ 0).

nrhs The number of right-hand sides; the number of columns in B (nrhs≥ 0).

a, b Arrays:
a(size max(1, lda*n) for column major layout and max(1, lda*m) for row
major layout) contains the m-by-n matrix A.
b(size max(1, ldb*nrhs) for column major layout and max(1, ldb*max(m,
n)) for row major layout) contains the m-by-nrhs right hand side matrix B.

lda The leading dimension of a; at least max(1, m)for column major layout and
max(1, n) for row major layout.

ldb The leading dimension of b; must be at least max(1, m, n) for column


major layout and at least max(1, nrhs) for row major layout.

jpvt
Array, size at least max(1, n).
On entry, if jpvt[i - 1]≠ 0, the i-th column of A is permuted to the front of
AP, otherwise the i-th column of A is a free column.

rcond rcond is used to determine the effective rank of A, which is defined as the
order of the largest leading triangular submatrix R11 in the QR factorization
with pivoting of A, whose estimated condition number < 1/rcond.

Output Parameters

a On exit, overwritten by the details of the complete orthogonal factorization


of A.

b Overwritten by the n-by-nrhs solution matrix X.

jpvt On exit, if jpvt[i - 1]= k, then the i-th column of AP was the k-th column of
A.

rank The effective rank of A, that is, the order of the submatrix R11. This is the
same as the order of the submatrix T11 in the complete orthogonal
factorization of A.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.
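
The following is a minimal sketch in double precision, using the LAPACKE_dgelsy prototype shown in the Syntax above, with a deliberately rank-deficient A (its third column is the sum of the first two, an assumed illustrative matrix); jpvt is zero-initialized so all columns are free, and the reported rank comes back as 2 for any reasonable rcond.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    const lapack_int m = 4, n = 3, nrhs = 1;
    /* Column-major A: third column = first column + second column (rank 2). */
    double a[] = {1.0, 1.0, 1.0, 1.0,
                  0.0, 1.0, 2.0, 3.0,
                  1.0, 2.0, 3.0, 4.0};
    double b[] = {1.0, 2.0, 3.0, 4.0};   /* ldb = max(m, n) = 4 */
    lapack_int jpvt[3] = {0, 0, 0};       /* all columns are free columns */
    lapack_int rank;

    lapack_int info = LAPACKE_dgelsy(LAPACK_COL_MAJOR, m, n, nrhs,
                                     a, m, b, m, jpvt, 1.0e-10, &rank);
    if (info != 0) {
        printf("LAPACKE_dgelsy failed, info = %d\n", (int)info);
        return 1;
    }
    printf("effective rank = %d, x = (%g, %g, %g)\n",
           (int)rank, b[0], b[1], b[2]);
    return 0;
}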

?gelss
Computes the minimum-norm solution to a linear least
squares problem using the singular value
decomposition of A.

Syntax
lapack_int LAPACKE_sgelss( int matrix_layout, lapack_int m, lapack_int n, lapack_int
nrhs, float* a, lapack_int lda, float* b, lapack_int ldb, float* s, float rcond,
lapack_int* rank );
lapack_int LAPACKE_dgelss( int matrix_layout, lapack_int m, lapack_int n, lapack_int
nrhs, double* a, lapack_int lda, double* b, lapack_int ldb, double* s, double rcond,
lapack_int* rank );
lapack_int LAPACKE_cgelss( int matrix_layout, lapack_int m, lapack_int n, lapack_int
nrhs, lapack_complex_float* a, lapack_int lda, lapack_complex_float* b, lapack_int
ldb, float* s, float rcond, lapack_int* rank );
lapack_int LAPACKE_zgelss( int matrix_layout, lapack_int m, lapack_int n, lapack_int
nrhs, lapack_complex_double* a, lapack_int lda, lapack_complex_double* b, lapack_int
ldb, double* s, double rcond, lapack_int* rank );


Include Files
• mkl.h

Description

The routine computes the minimum norm solution to a real linear least squares problem:
minimize ||b - A*x||2
using the singular value decomposition (SVD) of A. A is an m-by-n matrix which may be rank-deficient.
Several right hand side vectors b and solution vectors x can be handled in a single call; they are stored as
the columns of the m-by-nrhs right hand side matrix B and the n-by-nrhs solution matrix X. The effective
rank of A is determined by treating as zero those singular values which are less than rcond times the largest
singular value.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

m The number of rows of the matrix A (m≥ 0).

n The number of columns of the matrix A


(n≥ 0).

nrhs The number of right-hand sides; the number of columns in B


(nrhs≥ 0).

a, b Arrays:
a(size max(1, lda*n) for column major layout and max(1, lda*m) for row
major layout) contains the m-by-n matrix A.
b(size max(1, ldb*nrhs) for column major layout and max(1, ldb*max(m,
n)) for row major layout) contains the m-by-nrhs right hand side matrix B.

lda The leading dimension of a; at least max(1, m)for column major layout and
max(1, n) for row major layout.

ldb The leading dimension of b; must be at least max(1, m, n) for column


major layout and at least max(1, nrhs) for row major layout.

rcond rcond is used to determine the effective rank of A. Singular values s(i)
≤rcond *s(1) are treated as zero.
If rcond <0, machine precision is used instead.

Output Parameters

a On exit, the first min(m, n) rows of a are overwritten with the matrix of
right singular vectors of A, stored row-wise.

b Overwritten by the n-by-nrhs solution matrix X.


If m≥n and rank = n, the residual sum-of-squares for the solution in the i-
th column is given by the sum of the squares of the moduli of elements n+1:m
in that column.

s Array, size at least max(1, min(m, n)). The singular values of A in
decreasing order. The condition number of A in the 2-norm is
k2(A) = s(1)/ s(min(m, n)) .

rank The effective rank of A, that is, the number of singular values which are
greater than rcond *s(1).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, then the algorithm for computing the SVD failed to converge; i indicates the number of off-
diagonal elements of an intermediate bidiagonal form which did not converge to zero.
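
A minimal usage sketch for the double-precision flavor (illustration only; the data are arbitrary and rcond =
-1.0 requests the machine-precision threshold) might look as follows:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* 4-by-2 least squares problem solved through the SVD, column-major storage. */
    double a[4*2] = { 1.0, 1.0, 1.0, 1.0,     /* column 1 of A */
                      1.0, 2.0, 3.0, 4.0 };   /* column 2 of A */
    double b[4]   = { 2.0, 3.0, 4.0, 5.0 };   /* right-hand side, ldb >= max(m, n) */
    double s[2];                              /* singular values, size min(m, n) */
    lapack_int rank;
    lapack_int info = LAPACKE_dgelss( LAPACK_COL_MAJOR, 4, 2, 1,
                                      a, 4, b, 4, s, -1.0, &rank );
    if (info == 0)
        printf( "rank = %lld, x = %f %f, s = %f %f\n",
                (long long)rank, b[0], b[1], s[0], s[1] );
    return (int)info;
}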

?gelsd
Computes the minimum-norm solution to a linear least
squares problem using the singular value
decomposition of A and a divide and conquer method.

Syntax
lapack_int LAPACKE_sgelsd( int matrix_layout, lapack_int m, lapack_int n, lapack_int
nrhs, float* a, lapack_int lda, float* b, lapack_int ldb, float* s, float rcond,
lapack_int* rank );
lapack_int LAPACKE_dgelsd( int matrix_layout, lapack_int m, lapack_int n, lapack_int
nrhs, double* a, lapack_int lda, double* b, lapack_int ldb, double* s, double rcond,
lapack_int* rank );
lapack_int LAPACKE_cgelsd( int matrix_layout, lapack_int m, lapack_int n, lapack_int
nrhs, lapack_complex_float* a, lapack_int lda, lapack_complex_float* b, lapack_int
ldb, float* s, float rcond, lapack_int* rank );
lapack_int LAPACKE_zgelsd( int matrix_layout, lapack_int m, lapack_int n, lapack_int
nrhs, lapack_complex_double* a, lapack_int lda, lapack_complex_double* b, lapack_int
ldb, double* s, double rcond, lapack_int* rank );

Include Files
• mkl.h

Description

The routine computes the minimum-norm solution to a real linear least squares problem:
minimize ||b - A*x||2
using the singular value decomposition (SVD) of A. A is an m-by-n matrix which may be rank-deficient.
Several right hand side vectors b and solution vectors x can be handled in a single call; they are stored as
the columns of the m-by-nrhs right hand side matrix B and the n-by-nrhs solution matrix X.
The problem is solved in three steps:

1. Reduce the coefficient matrix A to bidiagonal form with Householder transformations, reducing the
original problem into a "bidiagonal least squares problem" (BLS).
2. Solve the BLS using a divide and conquer approach.


3. Apply back all the Householder transformations to solve the original least squares problem.
The effective rank of A is determined by treating as zero those singular values which are less than rcond
times the largest singular value.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

m The number of rows of the matrix A (m≥ 0).

n The number of columns of the matrix A


(n≥ 0).

nrhs The number of right-hand sides; the number of columns in B (nrhs≥ 0).

a, b Arrays:
a(size max(1, lda*n) for column major layout and max(1, lda*m) for row
major layout) contains the m-by-n matrix A.
b(size max(1, ldb*nrhs) for column major layout and max(1, ldb*max(m,
n)) for row major layout) contains the m-by-nrhs right hand side matrix B.

lda The leading dimension of a; at least max(1, m)for column major layout and
max(1, n) for row major layout.

ldb The leading dimension of b; must be at least max(1, m, n) for column


major layout and at least max(1, nrhs) for row major layout.

rcond rcond is used to determine the effective rank of A. Singular values s(i)
≤rcond *s(1) are treated as zero. If rcond≤ 0, machine precision is used
instead.

Output Parameters

a On exit, A has been overwritten.

b Overwritten by the n-by-nrhs solution matrix X.


If m≥n and rank = n, the residual sum-of-squares for the solution in the i-
th column is given by the sum of the squares of the moduli of elements n+1:m
in that column.

s Array, size at least max(1, min(m, n)). The singular values of A in


decreasing order. The condition number of A in the 2-norm is
k2(A) = s(1)/ s(min(m, n)).

rank The effective rank of A, that is, the number of singular values which are
greater than rcond *s(1).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, then the algorithm for computing the SVD failed to converge; i indicates the number of off-
diagonal elements of an intermediate bidiagonal form that did not converge to zero.
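
The calling sequence is the same as for ?gelss; a minimal sketch for the divide and conquer driver
(illustration only, arbitrary data) is:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    double a[4*2] = { 1.0, 1.0, 1.0, 1.0,     /* column 1 of A, column-major */
                      1.0, 2.0, 3.0, 4.0 };   /* column 2 of A */
    double b[4]   = { 2.0, 3.0, 4.0, 5.0 };   /* right-hand side, ldb >= max(m, n) */
    double s[2];                              /* singular values of A */
    lapack_int rank;
    lapack_int info = LAPACKE_dgelsd( LAPACK_COL_MAJOR, 4, 2, 1,
                                      a, 4, b, 4, s, -1.0, &rank );
    if (info == 0)
        printf( "rank = %lld, x = %f %f\n", (long long)rank, b[0], b[1] );
    return (int)info;
}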

Generalized Linear Least Squares (LLS) Problems: LAPACK Driver Routines


This section describes LAPACK driver routines used for solving generalized linear least squares problems.
Table "Driver Routines for Solving Generalized LLS Problems" lists all such routines.
Driver Routines for Solving Generalized LLS Problems
Routine Name Operation performed

gglse Solves the linear equality-constrained least squares problem using a generalized RQ
factorization.

ggglm Solves a general Gauss-Markov linear model problem using a generalized QR


factorization.

?gglse
Solves the linear equality-constrained least squares
problem using a generalized RQ factorization.

Syntax
lapack_int LAPACKE_sgglse (int matrix_layout, lapack_int m, lapack_int n, lapack_int p,
float* a, lapack_int lda, float* b, lapack_int ldb, float* c, float* d, float* x);
lapack_int LAPACKE_dgglse (int matrix_layout, lapack_int m, lapack_int n, lapack_int p,
double* a, lapack_int lda, double* b, lapack_int ldb, double* c, double* d, double*
x);
lapack_int LAPACKE_cgglse (int matrix_layout, lapack_int m, lapack_int n, lapack_int p,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* b, lapack_int ldb,
lapack_complex_float* c, lapack_complex_float* d, lapack_complex_float* x);
lapack_int LAPACKE_zgglse (int matrix_layout, lapack_int m, lapack_int n, lapack_int p,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* b, lapack_int ldb,
lapack_complex_double* c, lapack_complex_double* d, lapack_complex_double* x);

Include Files
• mkl.h

Description

The routine solves the linear equality-constrained least squares (LSE) problem:
minimize ||c - A*x||2 subject to B*x = d
where A is an m-by-n matrix, B is a p-by-n matrix, c is a given m-vector, and d is a given p-vector. It is
assumed that p≤n≤m+p, and

rank(B) = p and rank([A; B]) = n,

where [A; B] is the (m+p)-by-n matrix formed by stacking A on top of B. These conditions ensure that the
LSE problem has a unique solution, which is obtained using a generalized RQ factorization of the matrices
(B, A) given by

B=(0 R)*Q, A=Z*T*Q


Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

m The number of rows of the matrix A (m≥ 0).

n The number of columns of the matrices A and B (n≥ 0).

p The number of rows of the matrix B


(0 ≤p≤n≤m+p).

a, b, c, d Arrays:
a(size max(1, lda*n) for column major layout and max(1, lda*m) for row
major layout) contains the m-by-n matrix A.
b(size max(1, ldb*n) for column major layout and max(1, ldb*p) for row
major layout) contains the p-by-n matrix B.
c size at least max(1, m), contains the right hand side vector for the least
squares part of the LSE problem.
d, size at least max(1, p), contains the right hand side vector for the
constrained equation.

lda The leading dimension of a; at least max(1, m)for column major layout and
max(1, n) for row major layout.

ldb The leading dimension of b; at least max(1, p)for column major layout and
max(1, n) for row major layout.

Output Parameters

a The elements on and above the diagonal contain the min(m, n)-by-n upper
trapezoidal matrix T as returned by ?ggrqf.

x The solution of the LSE problem.

b On exit, the upper right triangle contains the p-by-p upper triangular matrix
R as returned by ?ggrqf.

d On exit, d is destroyed.

c On exit, the residual sum-of-squares for the solution is given by the sum of
squares of elements n-p+1 to m of vector c.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = 1, the upper triangular factor R associated with B in the generalized RQ factorization of the pair
(B, A) is singular, so that rank(B) < p; the least squares solution could not be computed.

If info = 2, the (n-p)-by-(n-p) part of the upper trapezoidal factor T associated with A in the generalized
RQ factorization of the pair (B, A) is singular, so that rank([A; B]) < n; the least squares solution could
not be computed.
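
A minimal sketch of an equality-constrained problem (illustration only; the data and dimensions m = 3, n =
2, p = 1 are arbitrary) is shown below:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Minimize ||c - A*x||2 subject to B*x = d, column-major storage. */
    double a[3*2] = { 1.0, 1.0, 1.0,      /* column 1 of A */
                      1.0, 2.0, 3.0 };    /* column 2 of A */
    double bmat[1*2] = { 1.0, 1.0 };      /* B = [1 1]: constraint x1 + x2 = d */
    double c[3] = { 2.0, 3.0, 4.0 };
    double d[1] = { 1.0 };
    double x[2];
    lapack_int info = LAPACKE_dgglse( LAPACK_COL_MAJOR, 3, 2, 1,
                                      a, 3, bmat, 1, c, d, x );
    if (info == 0)
        printf( "x = %f %f\n", x[0], x[1] );
    return (int)info;
}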

?ggglm
Solves a general Gauss-Markov linear model problem
using a generalized QR factorization.

Syntax
lapack_int LAPACKE_sggglm (int matrix_layout, lapack_int n, lapack_int m, lapack_int p,
float* a, lapack_int lda, float* b, lapack_int ldb, float* d, float* x, float* y);
lapack_int LAPACKE_dggglm (int matrix_layout, lapack_int n, lapack_int m, lapack_int p,
double* a, lapack_int lda, double* b, lapack_int ldb, double* d, double* x, double*
y);
lapack_int LAPACKE_cggglm (int matrix_layout, lapack_int n, lapack_int m, lapack_int p,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* b, lapack_int ldb,
lapack_complex_float* d, lapack_complex_float* x, lapack_complex_float* y);
lapack_int LAPACKE_zggglm (int matrix_layout, lapack_int n, lapack_int m, lapack_int p,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* b, lapack_int ldb,
lapack_complex_double* d, lapack_complex_double* x, lapack_complex_double* y);

Include Files
• mkl.h

Description

The routine solves a general Gauss-Markov linear model (GLM) problem:


minimize ||y||2 subject to d = A*x + B*y
where A is an n-by-m matrix, B is an n-by-p matrix, and d is a given n-vector. It is assumed that m≤n≤m+p,
and rank(A) = m and rank([A, B]) = n, where [A, B] is the n-by-(m+p) matrix formed by placing A and B side
by side.

Under these assumptions, the constrained equation is always consistent, and there is a unique solution x and
a minimal 2-norm solution y, which is obtained using a generalized QR factorization of the matrices (A, B)
given by

A = Q*[R], B = Q*T*Z
      [0]

In particular, if the matrix B is square and nonsingular, the GLM problem is equivalent to the following
weighted linear least squares problem:
minimize ||inv(B)*(d - A*x)||2.


Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

n The number of rows of the matrices A and B (n≥ 0).

m The number of columns in A (m≥ 0).

p The number of columns in B (p≥n - m).

a, b, d Arrays:
a(size max(1, lda*m) for column major layout and max(1, lda*n) for row
major layout) contains the n-by-m matrix A.
b(size max(1, ldb*p) for column major layout and max(1, ldb*n) for row
major layout) contains the n-by-p matrix B.
d, size at least max(1, n), contains the left hand side of the GLM equation.

lda The leading dimension of a; at least max(1, n)for column major layout and
max(1, m) for row major layout.

ldb The leading dimension of b; at least max(1, n)for column major layout and
max(1, p) for row major layout.

Output Parameters

x, y Arrays, size at least max(1, m) for x and at least max(1, p) for y.


On exit, x and y are the solutions of the GLM problem.

a On exit, the upper triangular part of the array a contains the m-by-m upper
triangular matrix R.

b On exit, if n≤p, the upper right triangle contains the n-by-n upper triangular
matrix T as returned by ?ggqrf; if n > p, the elements on and above the
(n-p)-th subdiagonal contain the n-by-p upper trapezoidal matrix T.

d On exit, d is destroyed.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = 1, the upper triangular factor R associated with A in the generalized QR factorization of the pair
(A, B) is singular, so that rank(A) < m; the least squares solution could not be computed.
If info = 2, the bottom (n-m)-by-(n-m) part of the upper trapezoidal factor T associated with B in the
generalized QR factorization of the pair (A, B) is singular, so that rank([A, B]) < n; the least squares
solution could not be computed.
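
A minimal sketch (illustration only; here B is the identity, so the GLM problem reduces to an ordinary
least squares problem with n = 3, m = 2, p = 3) is shown below:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    double a[3*2] = { 1.0, 1.0, 1.0,       /* column 1 of A, column-major */
                      1.0, 2.0, 3.0 };     /* column 2 of A */
    double bmat[3*3] = { 1.0, 0.0, 0.0,
                         0.0, 1.0, 0.0,
                         0.0, 0.0, 1.0 };  /* B = I(3) */
    double d[3] = { 2.0, 3.0, 5.0 };
    double x[2], y[3];
    lapack_int info = LAPACKE_dggglm( LAPACK_COL_MAJOR, 3, 2, 3,
                                      a, 3, bmat, 3, d, x, y );
    if (info == 0)
        printf( "x = %f %f, y = %f %f %f\n", x[0], x[1], y[0], y[1], y[2] );
    return (int)info;
}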

Symmetric Eigenvalue Problems: LAPACK Driver Routines
This section describes LAPACK driver routines used for solving symmetric eigenvalue problems. See also
computational routines that can be called to solve these problems. Table "Driver Routines for Solving
Symmetric Eigenproblems" lists all such driver routines.
Driver Routines for Solving Symmetric Eigenproblems
Routine Name Operation performed

syev/heev Computes all eigenvalues and, optionally, eigenvectors of a real symmetric /


Hermitian matrix.

syevd/heevd Computes all eigenvalues and (optionally) all eigenvectors of a real symmetric /
Hermitian matrix using divide and conquer algorithm.

syevx/heevx Computes selected eigenvalues and, optionally, eigenvectors of a symmetric /


Hermitian matrix.

syevr/heevr Computes selected eigenvalues and, optionally, eigenvectors of a real symmetric /


Hermitian matrix using the Relatively Robust Representations.

spev/hpev Computes all eigenvalues and, optionally, eigenvectors of a real symmetric /


Hermitian matrix in packed storage.

spevd/hpevd Uses divide and conquer algorithm to compute all eigenvalues and (optionally) all
eigenvectors of a real symmetric / Hermitian matrix held in packed storage.

spevx/hpevx Computes selected eigenvalues and, optionally, eigenvectors of a real symmetric /


Hermitian matrix in packed storage.

sbev /hbev Computes all eigenvalues and, optionally, eigenvectors of a real symmetric /
Hermitian band matrix.

sbevd/hbevd Computes all eigenvalues and (optionally) all eigenvectors of a real symmetric /
Hermitian band matrix using divide and conquer algorithm.

sbevx/hbevx Computes selected eigenvalues and, optionally, eigenvectors of a real symmetric /


Hermitian band matrix.

stev Computes all eigenvalues and, optionally, eigenvectors of a real symmetric


tridiagonal matrix.

stevd Computes all eigenvalues and (optionally) all eigenvectors of a real symmetric
tridiagonal matrix using divide and conquer algorithm.

stevx Computes selected eigenvalues and eigenvectors of a real symmetric tridiagonal


matrix.

stevr Computes selected eigenvalues and, optionally, eigenvectors of a real symmetric


tridiagonal matrix using the Relatively Robust Representations.

?syev
Computes all eigenvalues and, optionally,
eigenvectors of a real symmetric matrix.

Syntax
lapack_int LAPACKE_ssyev (int matrix_layout, char jobz, char uplo, lapack_int n, float*
a, lapack_int lda, float* w);
lapack_int LAPACKE_dsyev (int matrix_layout, char jobz, char uplo, lapack_int n,
double* a, lapack_int lda, double* w);


Include Files
• mkl.h

Description

The routine computes all eigenvalues and, optionally, eigenvectors of a real symmetric matrix A.
Note that for most cases of real symmetric eigenvalue problems the default choice should be the syevr
function, as its underlying algorithm is faster and uses less workspace.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

uplo Must be 'U' or 'L'.

If uplo = 'U', a stores the upper triangular part of A.

If uplo = 'L', a stores the lower triangular part of A.

n The order of the matrix A (n≥ 0).

a a (size max(1, lda*n)) is an array containing either upper or lower


triangular part of the symmetric matrix A, as specified by uplo.

lda The leading dimension of the array a.


Must be at least max(1, n).

Output Parameters

a On exit, if jobz = 'V', then if info = 0, array a contains the orthonormal


eigenvectors of the matrix A.
If jobz = 'N', then on exit the lower triangle

(if uplo = 'L') or the upper triangle (if uplo = 'U') of A, including the
diagonal, is overwritten.

w Array, size at least max(1, n).


If info = 0, contains the eigenvalues of the matrix A in ascending order.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, then the algorithm failed to converge; i indicates the number of elements of an intermediate
tridiagonal form which did not converge to zero.
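
A minimal sketch for LAPACKE_dsyev (illustration only; the matrix below is an arbitrary symmetric
tridiagonal example with only its upper triangle filled in) is:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Upper triangle of the 3-by-3 matrix [[2,-1,0],[-1,2,-1],[0,-1,2]], column-major.
       Entries below the diagonal are not referenced for uplo = 'U'. */
    double a[3*3] = {  2.0, 0.0, 0.0,
                      -1.0, 2.0, 0.0,
                       0.0,-1.0, 2.0 };
    double w[3];
    lapack_int info = LAPACKE_dsyev( LAPACK_COL_MAJOR, 'V', 'U', 3, a, 3, w );
    if (info == 0)
        printf( "eigenvalues: %f %f %f\n", w[0], w[1], w[2] );  /* a now holds the eigenvectors */
    return (int)info;
}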

?heev
Computes all eigenvalues and, optionally,
eigenvectors of a Hermitian matrix.

Syntax
lapack_int LAPACKE_cheev( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_complex_float* a, lapack_int lda, float* w );
lapack_int LAPACKE_zheev( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_complex_double* a, lapack_int lda, double* w );

Include Files
• mkl.h

Description

The routine computes all eigenvalues and, optionally, eigenvectors of a complex Hermitian matrix A.
Note that for most cases of complex Hermitian eigenvalue problems the default choice should be the heevr
function, as its underlying algorithm is faster and uses less workspace.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

uplo Must be 'U' or 'L'.

If uplo = 'U', a stores the upper triangular part of A.

If uplo = 'L', a stores the lower triangular part of A.

n The order of the matrix A (n≥ 0).

a a (size max(1, lda*n)) is an array containing either upper or lower


triangular part of the Hermitian matrix A, as specified by uplo.

lda The leading dimension of the array a. Must be at least max(1, n).

Output Parameters

a On exit, if jobz = 'V', then if info = 0, array a contains the orthonormal


eigenvectors of the matrix A.
If jobz = 'N', then on exit the lower triangle

(if uplo = 'L') or the upper triangle (if uplo = 'U') of A, including the
diagonal, is overwritten.

w Array, size at least max(1, n).


If info = 0, contains the eigenvalues of the matrix A in ascending order.


Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, then the algorithm failed to converge; i indicates the number of elements of an intermediate
tridiagonal form which did not converge to zero.
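
A minimal sketch for the complex flavor (illustration only; the element-wise initialization assumes the
default MKL complex type, a structure with real and imag members) is:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* 2-by-2 Hermitian matrix [[2, -i], [i, 3]], upper triangle, column-major. */
    lapack_complex_double a[2*2];
    double w[2];
    lapack_int info;
    a[0].real = 2.0; a[0].imag =  0.0;   /* A(1,1) */
    a[1].real = 0.0; a[1].imag =  0.0;   /* A(2,1), not referenced for uplo = 'U' */
    a[2].real = 0.0; a[2].imag = -1.0;   /* A(1,2) = -i */
    a[3].real = 3.0; a[3].imag =  0.0;   /* A(2,2) */
    info = LAPACKE_zheev( LAPACK_COL_MAJOR, 'V', 'U', 2, a, 2, w );
    if (info == 0)
        printf( "eigenvalues: %f %f\n", w[0], w[1] );
    return (int)info;
}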

?syevd
Computes all eigenvalues and, optionally, all
eigenvectors of a real symmetric matrix using divide
and conquer algorithm.

Syntax
lapack_int LAPACKE_ssyevd (int matrix_layout, char jobz, char uplo, lapack_int n,
float* a, lapack_int lda, float* w);
lapack_int LAPACKE_dsyevd (int matrix_layout, char jobz, char uplo, lapack_int n,
double* a, lapack_int lda, double* w);

Include Files
• mkl.h

Description

The routine computes all the eigenvalues, and optionally all the eigenvectors, of a real symmetric matrix A.
In other words, it can compute the spectral factorization of A as: A = Z*Λ*Z^T.

Here Λ is a diagonal matrix whose diagonal elements are the eigenvalues λi, and Z is the orthogonal matrix
whose columns are the eigenvectors zi. Thus,
A*zi = λi*zi for i = 1, 2, ..., n.
If the eigenvectors are requested, then this routine uses a divide and conquer algorithm to compute
eigenvalues and eigenvectors. However, if only eigenvalues are required, then it uses the Pal-Walker-Kahan
variant of the QL or QR algorithm.
Note that for most cases of real symmetric eigenvalue problems the default choice should be the syevr
function, as its underlying algorithm is faster and uses less workspace. ?syevd requires more workspace but
is faster in some cases, especially for large matrices.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

uplo Must be 'U' or 'L'.

If uplo = 'U', a stores the upper triangular part of A.

If uplo = 'L', a stores the lower triangular part of A.

n The order of the matrix A (n≥ 0).

a Array, size (lda, *).


a (size max(1, lda*n)) is an array containing either upper or lower
triangular part of the symmetric matrix A, as specified by uplo.

lda The leading dimension of the array a.


Must be at least max(1, n).

Output Parameters

w Array, size at least max(1, n).


If info = 0, contains the eigenvalues of the matrix A in ascending order.
See also info.

a If jobz = 'V', then on exit this array is overwritten by the orthogonal


matrix Z which contains the eigenvectors of A.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = i, and jobz = 'N', then the algorithm failed to converge; i indicates the number of off-diagonal
elements of an intermediate tridiagonal form which did not converge to zero.
If info = i, and jobz = 'V', then the algorithm failed to compute an eigenvalue while working on the
submatrix lying in rows and columns info/(n+1) through mod(info,n+1).

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed eigenvalues and eigenvectors are exact for a matrix A+E such that ||E||2 = O(ε)*||A||2,
where ε is the machine precision.
The complex analogue of this routine is heevd.
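
The calling sequence is the same as for ?syev; a minimal sketch (illustration only, arbitrary data) is:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Upper triangle of a 3-by-3 symmetric tridiagonal matrix, column-major. */
    double a[3*3] = { 4.0, 0.0, 0.0,
                      1.0, 4.0, 0.0,
                      0.0, 1.0, 4.0 };
    double w[3];
    lapack_int info = LAPACKE_dsyevd( LAPACK_COL_MAJOR, 'V', 'U', 3, a, 3, w );
    if (info == 0)
        printf( "eigenvalues: %f %f %f\n", w[0], w[1], w[2] );
    return (int)info;
}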

?heevd
Computes all eigenvalues and, optionally, all
eigenvectors of a complex Hermitian matrix using
divide and conquer algorithm.

Syntax
lapack_int LAPACKE_cheevd( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_complex_float* a, lapack_int lda, float* w );
lapack_int LAPACKE_zheevd( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_complex_double* a, lapack_int lda, double* w );

Include Files
• mkl.h

Description


The routine computes all the eigenvalues, and optionally all the eigenvectors, of a complex Hermitian matrix
A. In other words, it can compute the spectral factorization of A as: A = Z*Λ*Z^H.

Here Λ is a real diagonal matrix whose diagonal elements are the eigenvalues λi, and Z is the (complex)
unitary matrix whose columns are the eigenvectors zi. Thus,
A*zi = λi*zi for i = 1, 2, ..., n.
If the eigenvectors are requested, then this routine uses a divide and conquer algorithm to compute
eigenvalues and eigenvectors. However, if only eigenvalues are required, then it uses the Pal-Walker-Kahan
variant of the QL or QR algorithm.
Note that for most cases of complex Hermitian eigenvalue problems the default choice should be the heevr
function, as its underlying algorithm is faster and uses less workspace. ?heevd requires more workspace but
is faster in some cases, especially for large matrices.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

uplo Must be 'U' or 'L'.

If uplo = 'U', a stores the upper triangular part of A.

If uplo = 'L', a stores the lower triangular part of A.

n The order of the matrix A (n≥ 0).

a a (size max(1, lda*n)) is an array containing either upper or lower


triangular part of the Hermitian matrix A, as specified by uplo.

lda The leading dimension of the array a. Must be at least max(1, n).

Output Parameters

w Array, size at least max(1, n).


If info = 0, contains the eigenvalues of the matrix A in ascending order.
See also info.

a If jobz = 'V', then on exit this array is overwritten by the unitary matrix
Z which contains the eigenvectors of A.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = i, and jobz = 'N', then the algorithm failed to converge; i off-diagonal elements of an
intermediate tridiagonal form did not converge to zero;
if info = i, and jobz = 'V', then the algorithm failed to compute an eigenvalue while working on the
submatrix lying in rows and columns info/(n+1) through mod(info, n+1).

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed eigenvalues and eigenvectors are exact for a matrix A + E such that ||E||2 = O(ε)*||A||2,
where ε is the machine precision.
The real analogue of this routine is syevd. See also hpevd for matrices held in packed storage, and hbevd for
banded matrices.

?syevx
Computes selected eigenvalues and, optionally,
eigenvectors of a symmetric matrix.

Syntax
lapack_int LAPACKE_ssyevx (int matrix_layout, char jobz, char range, char uplo,
lapack_int n, float* a, lapack_int lda, float vl, float vu, lapack_int il, lapack_int
iu, float abstol, lapack_int* m, float* w, float* z, lapack_int ldz, lapack_int*
ifail);
lapack_int LAPACKE_dsyevx (int matrix_layout, char jobz, char range, char uplo,
lapack_int n, double* a, lapack_int lda, double vl, double vu, lapack_int il,
lapack_int iu, double abstol, lapack_int* m, double* w, double* z, lapack_int ldz,
lapack_int* ifail);

Include Files
• mkl.h

Description

The routine computes selected eigenvalues and, optionally, eigenvectors of a real symmetric matrix A.
Eigenvalues and eigenvectors can be selected by specifying either a range of values or a range of indices for
the desired eigenvalues.
Note that for most cases of real symmetric eigenvalue problems the default choice should be the syevr
function, as its underlying algorithm is faster and uses less workspace. ?syevx is faster when only a few
selected eigenvalues are requested.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

range Must be 'A', 'V', or 'I'.

If range = 'A', all eigenvalues will be found.

If range = 'V', all eigenvalues in the half-open interval (vl, vu] will be
found.
If range = 'I', the eigenvalues with indices il through iu will be found.

uplo Must be 'U' or 'L'.

If uplo = 'U', a stores the upper triangular part of A.


If uplo = 'L', a stores the lower triangular part of A.

n The order of the matrix A (n≥ 0).

a a (size max(1, lda*n)) is an array containing either upper or lower


triangular part of the symmetric matrix A, as specified by uplo.

lda The leading dimension of the array a. Must be at least max(1, n) .

vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues; vl≤vu. Not referenced if range = 'A'or 'I'.

il, iu
If range = 'I', the indices of the smallest and largest eigenvalues to be
returned.
Constraints: 1 ≤il≤iu≤n, if n > 0;

il = 1 and iu = 0, if n = 0.
Not referenced if range = 'A'or 'V'.

abstol The absolute error tolerance for the eigenvalues. See Application Notes for
more information.

ldz The leading dimension of the output array z; ldz≥ 1.

If jobz = 'V', then ldz≥ max(1, n) for column major layout and ldz≥
max(1, m) for row major layout.

Output Parameters

a On exit, the lower triangle (if uplo = 'L') or the upper triangle (if uplo =
'U') of A, including the diagonal, is overwritten.

m The total number of eigenvalues found;


0 ≤m≤n.
If range = 'A', m = n, and if range = 'I', m = iu-il+1.

w Array, size at least max(1, n). The first m elements contain the selected
eigenvalues of the matrix A in ascending order.

z Array z(size max(1, ldz*m) for column major layout and max(1, ldz*n) for
row major layout) contains eigenvectors.
If jobz = 'V', then if info = 0, the first m columns of z contain the
orthonormal eigenvectors of the matrix A corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated
with w(i).
If an eigenvector fails to converge, then that column of z contains the latest
approximation to the eigenvector, and the index of the eigenvector is
returned in ifail.
If jobz = 'N', then z is not referenced.

Note: you must ensure that at least max(1,m) columns are supplied in the
array z; if range = 'V', the exact value of m is not known in advance and
an upper bound must be used.

ifail
Array, size at least max(1, n).
If jobz = 'V', then if info = 0, the first m elements of ifail are zero; if
info > 0, then ifail contains the indices of the eigenvectors that failed to
converge.
If jobz = 'N', then ifail is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, then i eigenvectors failed to converge; their indices are stored in the array ifail.

Application Notes
An approximate eigenvalue is accepted as converged when it is determined to lie in an interval [a,b] of width
less than or equal to abstol+ε*max(|a|,|b|), where ε is the machine precision.

If abstol is less than or equal to zero, then ε*||T|| is used as the tolerance, where ||T|| is the 1-norm of the
tridiagonal matrix obtained by reducing A to tridiagonal form. Eigenvalues are computed most accurately
when abstol is set to twice the underflow threshold 2*?lamch('S'), not zero.

If this routine returns with info > 0, indicating that some eigenvectors did not converge, try setting abstol
to 2*?lamch('S').
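
A minimal sketch (illustration only; the matrix is arbitrary, and the recommended tolerance is obtained here
through the LAPACKE_dlamch binding of ?lamch) that requests the two smallest eigenpairs is:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Upper triangle of a 4-by-4 symmetric tridiagonal matrix, column-major. */
    double a[4*4] = { 4.0, 0.0, 0.0, 0.0,
                      1.0, 4.0, 0.0, 0.0,
                      0.0, 1.0, 4.0, 0.0,
                      0.0, 0.0, 1.0, 4.0 };
    double w[4], z[4*2];                          /* at most iu-il+1 = 2 eigenvectors */
    double abstol = 2.0 * LAPACKE_dlamch( 'S' );  /* twice the underflow threshold */
    lapack_int m, ifail[4];
    lapack_int info = LAPACKE_dsyevx( LAPACK_COL_MAJOR, 'V', 'I', 'U', 4, a, 4,
                                      0.0, 0.0, 1, 2, abstol, &m, w, z, 4, ifail );
    if (info == 0)
        printf( "found %lld eigenvalues: %f %f\n", (long long)m, w[0], w[1] );
    return (int)info;
}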

?heevx
Computes selected eigenvalues and, optionally,
eigenvectors of a Hermitian matrix.

Syntax
lapack_int LAPACKE_cheevx( int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_complex_float* a, lapack_int lda, float vl, float vu, lapack_int
il, lapack_int iu, float abstol, lapack_int* m, float* w, lapack_complex_float* z,
lapack_int ldz, lapack_int* ifail );
lapack_int LAPACKE_zheevx( int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_complex_double* a, lapack_int lda, double vl, double vu,
lapack_int il, lapack_int iu, double abstol, lapack_int* m, double* w,
lapack_complex_double* z, lapack_int ldz, lapack_int* ifail );

Include Files
• mkl.h

Description

The routine computes selected eigenvalues and, optionally, eigenvectors of a complex Hermitian matrix A.
Eigenvalues and eigenvectors can be selected by specifying either a range of values or a range of indices for
the desired eigenvalues.


Note that for most cases of complex Hermitian eigenvalue problems the default choice should be the heevr
function, as its underlying algorithm is faster and uses less workspace. ?heevx is faster when only a few
selected eigenvalues are requested.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

range Must be 'A', 'V', or 'I'.

If range = 'A', all eigenvalues will be found.

If range = 'V', all eigenvalues in the half-open interval (vl, vu] will be
found.
If range = 'I', the eigenvalues with indices il through iu will be found.

uplo Must be 'U' or 'L'.

If uplo = 'U', a stores the upper triangular part of A.

If uplo = 'L', a stores the lower triangular part of A.

n The order of the matrix A (n≥ 0).

a a (size max(1, lda*n)) is an array containing either upper or lower


triangular part of the Hermitian matrix A, as specified by uplo.

lda The leading dimension of the array a. Must be at least max(1, n).

vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues; vl≤vu. Not referenced if range = 'A'or 'I'.

il, iu
If range = 'I', the indices of the smallest and largest eigenvalues to be
returned. Constraints:
1 ≤il≤iu≤n, if n > 0;il = 1 and iu = 0, if n = 0. Not referenced if range =
'A'or 'V'.

abstol The absolute error tolerance for the eigenvalues. See Application Notes for
more information.

ldz The leading dimension of the output array z; ldz≥ 1.

If jobz = 'V', then ldz≥ max(1, n) for column major layout and ldz≥
max(1, m) for row major layout.

Output Parameters

a On exit, the lower triangle (if uplo = 'L') or the upper triangle (if uplo =
'U') of A, including the diagonal, is overwritten.

m The total number of eigenvalues found; 0 ≤m≤n.

If range = 'A', m = n, and if range = 'I', m = iu-il+1.

w Array, size max(1, n). The first m elements contain the selected eigenvalues
of the matrix A in ascending order.

z Array z(size max(1, ldz*m) for column major layout and max(1, ldz*n) for
row major layout) contains eigenvectors.
If jobz = 'V', then if info = 0, the first m columns of z contain the
orthonormal eigenvectors of the matrix A corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated
with w(i).
If an eigenvector fails to converge, then that column of z contains the latest
approximation to the eigenvector, and the index of the eigenvector is
returned in ifail.
If jobz = 'N', then z is not referenced.

ifail
Array, size at least max(1, n).
If jobz = 'V', then if info = 0, the first m elements of ifail are zero; if
info > 0, then ifail contains the indices of the eigenvectors that failed to
converge.
If jobz = 'N', then ifail is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, then i eigenvectors failed to converge; their indices are stored in the array ifail.

Application Notes
An approximate eigenvalue is accepted as converged when it is determined to lie in an interval [a,b] of width
less than or equal to abstol+ε*max(|a|,|b|), where ε is the machine precision.

If abstol is less than or equal to zero, then ε*||T|| will be used in its place, where ||T|| is the 1-norm of
the tridiagonal matrix obtained by reducing A to tridiagonal form. Eigenvalues will be computed most
accurately when abstol is set to twice the underflow threshold 2*?lamch('S'), not zero.

If this routine returns with info > 0, indicating that some eigenvectors did not converge, try setting abstol
to 2*?lamch('S').

?syevr
Computes selected eigenvalues and, optionally,
eigenvectors of a real symmetric matrix using the
Relatively Robust Representations.

Syntax
lapack_int LAPACKE_ssyevr (int matrix_layout, char jobz, char range, char uplo,
lapack_int n, float* a, lapack_int lda, float vl, float vu, lapack_int il, lapack_int
iu, float abstol, lapack_int* m, float* w, float* z, lapack_int ldz, lapack_int*
isuppz);


lapack_int LAPACKE_dsyevr (int matrix_layout, char jobz, char range, char uplo,
lapack_int n, double* a, lapack_int lda, double vl, double vu, lapack_int il,
lapack_int iu, double abstol, lapack_int* m, double* w, double* z, lapack_int ldz,
lapack_int* isuppz);

Include Files
• mkl.h

Description

The routine computes selected eigenvalues and, optionally, eigenvectors of a real symmetric matrix A.
Eigenvalues and eigenvectors can be selected by specifying either a range of values or a range of indices for
the desired eigenvalues.
The routine first reduces the matrix A to tridiagonal form T. Then, whenever possible, ?syevr calls stemr to
compute the eigenspectrum using Relatively Robust Representations. stemr computes eigenvalues by the
dqds algorithm, while orthogonal eigenvectors are computed from various "good" L*D*L^T representations
(also known as Relatively Robust Representations). Gram-Schmidt orthogonalization is avoided as far as
possible. More specifically, the various steps of the algorithm are as follows. For each unreduced block of
T:

a. Compute T - σ*I = L*D*L^T, so that L and D define all the wanted eigenvalues to high relative
accuracy. This means that small relative changes in the entries of D and L cause only small relative
changes in the eigenvalues and eigenvectors. The standard (unfactored) representation of the
tridiagonal matrix T does not have this property in general.
b. Compute the eigenvalues to suitable accuracy. If the eigenvectors are desired, the algorithm attains full
accuracy of the computed eigenvalues only right before the corresponding vectors have to be
computed, see Steps c) and d).
c. For each cluster of close eigenvalues, select a new shift close to the cluster, find a new factorization,
and refine the shifted eigenvalues to suitable accuracy.
d. For each eigenvalue with a large enough relative separation, compute the corresponding eigenvector by
forming a rank revealing twisted factorization. Go back to Step c) for any clusters that remain.

The desired accuracy of the output can be specified by the input parameter abstol.
The routine ?syevr calls stemr when the full spectrum is requested on machines that conform to the
IEEE-754 floating point standard. ?syevr calls stebz and stein on non-IEEE machines and when partial
spectrum requests are made.
Note that ?syevr is preferable for most cases of real symmetric eigenvalue problems as its underlying
algorithm is fast and uses less workspace.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

range Must be 'A' or 'V' or 'I'.

If range = 'A', the routine computes all eigenvalues.

If range = 'V', the routine computes eigenvalues w[i] in the half-open


interval:

vl < w[i]≤vu.
If range = 'I', the routine computes eigenvalues with indices il to iu.

For range = 'V'or 'I' and iu-il < n-1, sstebz/dstebz and sstein/
dstein are called.

uplo Must be 'U' or 'L'.

If uplo = 'U', a stores the upper triangular part of A.

If uplo = 'L', a stores the lower triangular part of A.

n The order of the matrix A (n≥ 0).

a a (size max(1, lda*n)) is an array containing either upper or lower


triangular part of the symmetric matrix A, as specified by uplo.

lda The leading dimension of the array a. Must be at least max(1, n).

vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl< vu.

If range = 'A' or 'I', vl and vu are not referenced.

il, iu
If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint:
1 ≤il≤iu≤n, if n > 0;
il=1 and iu=0, if n = 0.
If range = 'A' or 'V', il and iu are not referenced.

abstol If jobz = 'V', the eigenvalues and eigenvectors output have residual
norms bounded by abstol, and the dot products between different
eigenvectors are bounded by abstol.
If abstol < n *eps*||T||, then n *eps*||T|| is used instead, where
eps is the machine precision, and ||T|| is the 1-norm of the matrix T. The
eigenvalues are computed to an accuracy of eps*||T|| irrespective of
abstol.
If high relative accuracy is important, set abstol to ?lamch('S').

ldz The leading dimension of the output array z.


Constraints:
ldz≥ 1 if jobz = 'N' and
ldz≥ max(1, n) for column major layout and ldz≥ max(1, m) for row major
layout if jobz = 'V'.

Output Parameters

a On exit, the lower triangle (if uplo = 'L') or the upper triangle (if uplo =
'U') of A, including the diagonal, is overwritten.


m The total number of eigenvalues found, 0 ≤m≤n.

If range = 'A', m = n, if range = 'I', m = iu-il+1, and if range =


'V' the exact value of m is not known in advance.

w, z Arrays:
w, size at least max(1, n), contains the selected eigenvalues in ascending
order, stored in w[0] to w[m - 1];

z(size max(1, ldz*m) for column major layout and max(1, ldz*n) for row
major layout) .
If jobz = 'V', then if info = 0, the first m columns of z contain the
orthonormal eigenvectors of the matrix A corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated
with w[i - 1].

If jobz = 'N', then z is not referenced.

isuppz
Array, size at least 2 *max(1, m).
The support of the eigenvectors in z, i.e., the indices indicating the nonzero
elements in z. The i-th eigenvector is nonzero only in elements isuppz[2i
- 2] through isuppz[2i - 1]. Referenced only if eigenvectors are needed
(jobz = 'V') and all eigenvalues are needed, that is, range = 'A' or
range = 'I' and il = 1 and iu = n.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, an internal error has occurred.
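
A minimal sketch requesting the full spectrum with range = 'A' (illustration only; vl, vu, il, iu, and abstol
are passed as zeros here because they are either not referenced or replaced by the default tolerance in this
mode) is:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Upper triangle of the 3-by-3 matrix [[2,-1,0],[-1,2,-1],[0,-1,2]], column-major. */
    double a[3*3] = {  2.0, 0.0, 0.0,
                      -1.0, 2.0, 0.0,
                       0.0,-1.0, 2.0 };
    double w[3], z[3*3];
    lapack_int m, isuppz[2*3];
    lapack_int info = LAPACKE_dsyevr( LAPACK_COL_MAJOR, 'V', 'A', 'U', 3, a, 3,
                                      0.0, 0.0, 0, 0, 0.0, &m, w, z, 3, isuppz );
    if (info == 0)
        printf( "found %lld eigenvalues: %f %f %f\n", (long long)m, w[0], w[1], w[2] );
    return (int)info;
}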

Application Notes

?heevr
Computes selected eigenvalues and, optionally,
eigenvectors of a Hermitian matrix using the
Relatively Robust Representations.

Syntax
lapack_int LAPACKE_cheevr( int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_complex_float* a, lapack_int lda, float vl, float vu, lapack_int
il, lapack_int iu, float abstol, lapack_int* m, float* w, lapack_complex_float* z,
lapack_int ldz, lapack_int* isuppz );
lapack_int LAPACKE_zheevr( int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_complex_double* a, lapack_int lda, double vl, double vu,
lapack_int il, lapack_int iu, double abstol, lapack_int* m, double* w,
lapack_complex_double* z, lapack_int ldz, lapack_int* isuppz );

Include Files
• mkl.h

Description

The routine computes selected eigenvalues and, optionally, eigenvectors of a complex Hermitian matrix A.
Eigenvalues and eigenvectors can be selected by specifying either a range of values or a range of indices for
the desired eigenvalues.
The routine first reduces the matrix A to tridiagonal form T with a call to hetrd. Then, whenever possible, ?
heevr calls stegr to compute the eigenspectrum using Relatively Robust Representations. ?stegr computes
eigenvalues by the dqds algorithm, while orthogonal eigenvectors are computed from various "good" L*D*L^T
representations (also known as Relatively Robust Representations). Gram-Schmidt orthogonalization is
avoided as far as possible. More specifically, the various steps of the algorithm are as follows. For each
unreduced block (submatrix) of T:

a. Compute T - σ*I = L*D*L^T, so that L and D define all the wanted eigenvalues to high relative
accuracy. This means that small relative changes in the entries of D and L cause only small relative
changes in the eigenvalues and eigenvectors. The standard (unfactored) representation of the
tridiagonal matrix T does not have this property in general.
b. Compute the eigenvalues to suitable accuracy. If the eigenvectors are desired, the algorithm attains full
accuracy of the computed eigenvalues only right before the corresponding vectors have to be
computed, see Steps c) and d).
c. For each cluster of close eigenvalues, select a new shift close to the cluster, find a new factorization,
and refine the shifted eigenvalues to suitable accuracy.
d. For each eigenvalue with a large enough relative separation, compute the corresponding eigenvector by
forming a rank revealing twisted factorization. Go back to Step c) for any clusters that remain.

The desired accuracy of the output can be specified by the input parameter abstol.
The routine ?heevr calls stemr when the full spectrum is requested on machines which conform to the
IEEE-754 floating point standard, or stebz and stein on non-IEEE machines and when partial spectrum
requests are made.
Note that the routine ?heevr is preferable for most cases of complex Hermitian eigenvalue problems as its
underlying algorithm is fast and uses less workspace.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If job = 'N', then only eigenvalues are computed.

If job = 'V', then eigenvalues and eigenvectors are computed.

range Must be 'A' or 'V' or 'I'.

If range = 'A', the routine computes all eigenvalues.

If range = 'V', the routine computes eigenvalues lambda(i) in the half-


open interval: vl< lambda(i)≤vu.

If range = 'I', the routine computes eigenvalues with indices il to iu.

For range = 'V'or 'I', sstebz/dstebz and cstein/zstein are called.

uplo Must be 'U' or 'L'.

If uplo = 'U', a stores the upper triangular part of A.

If uplo = 'L', a stores the lower triangular part of A.


n The order of the matrix A (n≥ 0).

a a (size max(1, lda*n)) is an array containing either upper or lower


triangular part of the Hermitian matrix A, as specified by uplo.

lda The leading dimension of the array a.


Must be at least max(1, n).

vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl< vu.

If range = 'A' or 'I', vl and vu are not referenced.

il, iu
If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤il≤iu≤n, if n > 0; il=1 and iu=0 if n = 0.

If range = 'A' or 'V', il and iu are not referenced.

abstol The absolute error tolerance to which each eigenvalue/eigenvector is


required.
If jobz = 'V', the eigenvalues and eigenvectors output have residual
norms bounded by abstol, and the dot products between different
eigenvectors are bounded by abstol.
If abstol < n *eps*||T||, then n *eps*||T|| is used instead, where
eps is the machine precision, and ||T|| is the 1-norm of the matrix T. The
eigenvalues are computed to an accuracy of eps*||T|| irrespective of
abstol.
If high relative accuracy is important, set abstol to ?lamch('S').

ldz The leading dimension of the output array z. Constraints:


ldz≥ 1 if jobz = 'N';
ldz≥ max(1, n) for column major layout and ldz≥ max(1, m) for row major
layout if jobz = 'V'.

Output Parameters

a On exit, the lower triangle (if uplo = 'L') or the upper triangle (if uplo =
'U') of A, including the diagonal, is overwritten.

m The total number of eigenvalues found,


0 ≤m≤n.
If range = 'A', m = n, if range = 'I', m = iu-il+1, and if range =
'V' the exact value of m is not known in advance.

w Array, size at least max(1, n), contains the selected eigenvalues in


ascending order, stored in w[0] to w[m - 1].

z Array z(size max(1, ldz*m) for column major layout and max(1, ldz*n) for
row major layout) .

If jobz = 'V', then if info = 0, the first m columns of z contain the
orthonormal eigenvectors of the matrix A corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated
with w[i - 1].

If jobz = 'N', then z is not referenced.

isuppz
Array, size at least 2 *max(1, m).
The support of the eigenvectors in z, i.e., the indices indicating the nonzero
elements in z. The i-th eigenvector is nonzero only in elements isuppz[2i
- 2] through isuppz[2i - 1]. Referenced only if eigenvectors are needed
(jobz = 'V') and all eigenvalues are needed, that is, range = 'A' or
range = 'I' and il = 1 and iu = n.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, an internal error has occurred.

Application Notes
Normal execution of ?stemr may create NaNs and infinities and hence may abort due to a floating point
exception in environments which do not handle NaNs and infinities in the IEEE standard default manner.

For more details, see ?stemr and these references:

• Inderjit S. Dhillon and Beresford N. Parlett: "Multiple representations to compute orthogonal eigenvectors
of symmetric tridiagonal matrices," Linear Algebra and its Applications, 387(1), pp. 1-28, August 2004.
• Inderjit Dhillon and Beresford Parlett: "Orthogonal Eigenvectors and Relative Gaps," SIAM Journal on
Matrix Analysis and Applications, Vol. 25, 2004. Also LAPACK Working Note 154.
• Inderjit Dhillon: "A new O(n^2) algorithm for the symmetric tridiagonal eigenvalue/eigenvector problem",
Computer Science Division Technical Report No. UCB/CSD-97-971, UC Berkeley, May 1997.

?spev
Computes all eigenvalues and, optionally,
eigenvectors of a real symmetric matrix in packed
storage.

Syntax
lapack_int LAPACKE_sspev (int matrix_layout, char jobz, char uplo, lapack_int n, float*
ap, float* w, float* z, lapack_int ldz);
lapack_int LAPACKE_dspev (int matrix_layout, char jobz, char uplo, lapack_int n,
double* ap, double* w, double* z, lapack_int ldz);

Include Files
• mkl.h

Description

The routine computes all the eigenvalues and, optionally, eigenvectors of a real symmetric matrix A in
packed storage.


Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If job = 'N', then only eigenvalues are computed.

If job = 'V', then eigenvalues and eigenvectors are computed.

uplo Must be 'U' or 'L'.

If uplo = 'U', ap stores the packed upper triangular part of A.

If uplo = 'L', ap stores the packed lower triangular part of A.

n The order of the matrix A (n≥ 0).

ap Array ap contains the packed upper or lower triangle of symmetric matrix A,


as specified by uplo.
The size of ap must be at least max(1, n*(n+1)/2).

ldz The leading dimension of the output array z. Constraints:


if jobz = 'N', then ldz≥ 1;

if jobz = 'V', then ldz≥ max(1, n).

Output Parameters

w, z Arrays:
w, size at least max(1, n).
If info = 0, w contains the eigenvalues of the matrix A in ascending order.

z (size max(1, ldz*n)).


If jobz = 'V', then if info = 0, z contains the orthonormal eigenvectors
of the matrix A, with the i-th column of z holding the eigenvector associated
with w[i - 1].

If jobz = 'N', then z is not referenced.

ap On exit, this array is overwritten by the values generated during the


reduction to tridiagonal form. The elements of the diagonal and the off-
diagonal of the tridiagonal matrix overwrite the corresponding elements of
A.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, then the algorithm failed to converge; i indicates the number of elements of an intermediate
tridiagonal form which did not converge to zero.
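
A minimal sketch (illustration only) showing the packed storage layout expected by LAPACKE_dspev for
uplo = 'U' and column major layout is:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Packed upper triangle of [[2,-1,0],[-1,2,-1],[0,-1,2]], stored column by column:
       A(1,1), A(1,2), A(2,2), A(1,3), A(2,3), A(3,3). */
    double ap[6] = { 2.0, -1.0, 2.0, 0.0, -1.0, 2.0 };
    double w[3], z[3*3];
    lapack_int info = LAPACKE_dspev( LAPACK_COL_MAJOR, 'V', 'U', 3, ap, w, z, 3 );
    if (info == 0)
        printf( "eigenvalues: %f %f %f\n", w[0], w[1], w[2] );
    return (int)info;
}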

?hpev
Computes all eigenvalues and, optionally,
eigenvectors of a Hermitian matrix in packed storage.

Syntax
lapack_int LAPACKE_chpev( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_complex_float* ap, float* w, lapack_complex_float* z, lapack_int ldz );
lapack_int LAPACKE_zhpev( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_complex_double* ap, double* w, lapack_complex_double* z, lapack_int ldz );

Include Files
• mkl.h

Description

The routine computes all the eigenvalues and, optionally, eigenvectors of a complex Hermitian matrix A in
packed storage.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If job = 'N', then only eigenvalues are computed.

If job = 'V', then eigenvalues and eigenvectors are computed.

uplo Must be 'U' or 'L'.

If uplo = 'U', ap stores the packed upper triangular part of A.

If uplo = 'L', ap stores the packed lower triangular part of A.

n The order of the matrix A (n≥ 0).

ap Array ap contains the packed upper or lower triangle of Hermitian matrix A,


as specified by uplo.
The size of ap must be at least max(1, n*(n+1)/2).

ldz The leading dimension of the output array z.


Constraints:
if jobz = 'N', then ldz≥ 1;

if jobz = 'V', then ldz≥ max(1, n) .

Output Parameters

w Array, size at least max(1, n).


If info = 0, w contains the eigenvalues of the matrix A in ascending order.

z Array z (size at least max(1, ldz*n)).


If jobz = 'V', then if info = 0, z contains the orthonormal eigenvectors


of the matrix A, with the i-th column of z holding the eigenvector associated
with w[i - 1].

If jobz = 'N', then z is not referenced.

ap On exit, this array is overwritten by the values generated during the


reduction to tridiagonal form. The elements of the diagonal and the off-
diagonal of the tridiagonal matrix overwrite the corresponding elements of
A.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, then the algorithm failed to converge; i indicates the number of elements of an intermediate
tridiagonal form which did not converge to zero.

?spevd
Uses divide and conquer algorithm to compute all
eigenvalues and (optionally) all eigenvectors of a real
symmetric matrix held in packed storage.

Syntax
lapack_int LAPACKE_sspevd (int matrix_layout, char jobz, char uplo, lapack_int n,
float* ap, float* w, float* z, lapack_int ldz);
lapack_int LAPACKE_dspevd (int matrix_layout, char jobz, char uplo, lapack_int n,
double* ap, double* w, double* z, lapack_int ldz);

Include Files
• mkl.h

Description

The routine computes all the eigenvalues, and optionally all the eigenvectors, of a real symmetric matrix A
(held in packed storage). In other words, it can compute the spectral factorization of A as:
A = Z*Λ*Z^T.
Here Λ is a diagonal matrix whose diagonal elements are the eigenvalues λi, and Z is the orthogonal matrix
whose columns are the eigenvectors zi. Thus,
A*zi = λi*zi for i = 1, 2, ..., n.
If the eigenvectors are requested, then this routine uses a divide and conquer algorithm to compute
eigenvalues and eigenvectors. However, if only eigenvalues are required, then it uses the Pal-Walker-Kahan
variant of the QL or QR algorithm.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

uplo Must be 'U' or 'L'.

If uplo = 'U', ap stores the packed upper triangular part of A.

If uplo = 'L', ap stores the packed lower triangular part of A.

n The order of the matrix A (n≥ 0).

ap ap contains the packed upper or lower triangle of symmetric matrix A, as specified by uplo.
The dimension of ap must be max(1, n*(n+1)/2).

ldz The leading dimension of the output array z.


Constraints:
if jobz = 'N', then ldz≥ 1;

if jobz = 'V', then ldz≥ max(1, n).

Output Parameters

w, z Arrays:
w, size at least max(1, n).
If info = 0, contains the eigenvalues of the matrix A in ascending order.
See also info.
z (size max(1, ldz*n)).
If jobz = 'V', then this array is overwritten by the orthogonal matrix Z
which contains the eigenvectors of A. If jobz = 'N', then z is not
referenced.

ap On exit, this array is overwritten by the values generated during the reduction to tridiagonal form. The elements of the diagonal and the off-diagonal of the tridiagonal matrix overwrite the corresponding elements of A.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = i, then the algorithm failed to converge; i indicates the number of elements of an intermediate
tridiagonal form which did not converge to zero.
If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed eigenvalues and eigenvectors are exact for a matrix A+E such that ||E||2 = O(ε)*||A||2,
where ε is the machine precision.
The complex analogue of this routine is hpevd.
See also syevd for matrices held in full storage, and sbevd for banded matrices.
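
As a hedged illustration (not part of the reference), the sketch below computes all eigenvalues and eigenvectors of a 3-by-3 symmetric matrix in packed storage with LAPACKE_dspevd; the matrix values and variable names are illustrative only.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 3, ldz = 3;
    /* Packed upper triangle (column major, uplo = 'U'):
       ap = { a11, a12, a22, a13, a23, a33 } for
       A = [ 4 1 0 ; 1 3 1 ; 0 1 2 ]. */
    double ap[6] = { 4.0, 1.0, 3.0, 0.0, 1.0, 2.0 };
    double w[3], z[9];
    lapack_int info = LAPACKE_dspevd(LAPACK_COL_MAJOR, 'V', 'U', n, ap, w, z, ldz);
    if (info != 0) return 1;
    for (lapack_int i = 0; i < n; ++i)
        printf("lambda[%d] = %f\n", (int)i, w[i]);
    return 0;
}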

?hpevd
Uses divide and conquer algorithm to compute all
eigenvalues and, optionally, all eigenvectors of a
complex Hermitian matrix held in packed storage.

Syntax
lapack_int LAPACKE_chpevd( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_complex_float* ap, float* w, lapack_complex_float* z, lapack_int ldz );
lapack_int LAPACKE_zhpevd( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_complex_double* ap, double* w, lapack_complex_double* z, lapack_int ldz );

Include Files
• mkl.h

Description

The routine computes all the eigenvalues, and optionally all the eigenvectors, of a complex Hermitian matrix
A (held in packed storage). In other words, it can compute the spectral factorization of A as: A = Z*Λ*ZH.

Here Λ is a real diagonal matrix whose diagonal elements are the eigenvalues λi, and Z is the (complex)
unitary matrix whose columns are the eigenvectors zi. Thus,
A*zi = λi*zi for i = 1, 2, ..., n.
If the eigenvectors are requested, then this routine uses a divide and conquer algorithm to compute
eigenvalues and eigenvectors. However, if only eigenvalues are required, then it uses the Pal-Walker-Kahan
variant of the QL or QR algorithm.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

uplo Must be 'U' or 'L'.

If uplo = 'U', ap stores the packed upper triangular part of A.

If uplo = 'L', ap stores the packed lower triangular part of A.

n The order of the matrix A (n≥ 0).

ap ap contains the packed upper or lower triangle of Hermitian matrix A, as specified by uplo.
The dimension of ap must be at least max(1, n*(n+1)/2).

ldz The leading dimension of the output array z.


Constraints:
if jobz = 'N', then ldz≥ 1;

if jobz = 'V', then ldz≥ max(1, n).

Output Parameters

w Array, size at least max(1, n).


If info = 0, contains the eigenvalues of the matrix A in ascending order.
See also info.

z Array, size 1 if jobz = 'N' and max(1, ldz*n) if jobz = 'V'.

If jobz = 'V', then this array is overwritten by the unitary matrix Z which
contains the eigenvectors of A.
If jobz = 'N', then z is not referenced.

ap On exit, this array is overwritten by the values generated during the reduction to tridiagonal form. The elements of the diagonal and the off-diagonal of the tridiagonal matrix overwrite the corresponding elements of A.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, then the algorithm failed to converge; i indicates the number of elements of an intermediate
tridiagonal form which did not converge to zero.

Application Notes
The computed eigenvalues and eigenvectors are exact for a matrix A + E such that ||E||2 = O(ε)*||A||2,
where ε is the machine precision.
The real analogue of this routine is spevd.
See also heevd for matrices held in full storage, and hbevd for banded matrices.

?spevx
Computes selected eigenvalues and, optionally,
eigenvectors of a real symmetric matrix in packed
storage.

Syntax
lapack_int LAPACKE_sspevx (int matrix_layout, char jobz, char range, char uplo,
lapack_int n, float* ap, float vl, float vu, lapack_int il, lapack_int iu, float
abstol, lapack_int* m, float* w, float* z, lapack_int ldz, lapack_int* ifail);
lapack_int LAPACKE_dspevx (int matrix_layout, char jobz, char range, char uplo,
lapack_int n, double* ap, double vl, double vu, lapack_int il, lapack_int iu, double
abstol, lapack_int* m, double* w, double* z, lapack_int ldz, lapack_int* ifail);

Include Files
• mkl.h

Description

The routine computes selected eigenvalues and, optionally, eigenvectors of a real symmetric matrix A in
packed storage. Eigenvalues and eigenvectors can be selected by specifying either a range of values or a
range of indices for the desired eigenvalues.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

range Must be 'A' or 'V' or 'I'.

If range = 'A', the routine computes all eigenvalues.

If range = 'V', the routine computes eigenvalues w[i] in the half-open interval: vl< w[i]≤vu.

If range = 'I', the routine computes eigenvalues with indices il to iu.

uplo Must be 'U' or 'L'.

If uplo = 'U', ap stores the packed upper triangular part of A.

If uplo = 'L', ap stores the packed lower triangular part of A.

n The order of the matrix A (n≥ 0).

ap Array ap contains the packed upper or lower triangle of the symmetric matrix A, as specified by uplo.
The size of ap must be at least max(1, n*(n+1)/2).

vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl< vu.

If range = 'A' or 'I', vl and vu are not referenced.

il, iu
If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤il≤iu≤n, if n > 0; il=1 and iu=0 if n = 0.

If range = 'A' or 'V', il and iu are not referenced.

abstol The absolute error tolerance to which each eigenvalue is required. See
Application notes for details on error tolerance.

ldz The leading dimension of the output array z.


Constraints:
if jobz = 'N', then ldz≥ 1;

if jobz = 'V', then ldz≥ max(1, n) for column major layout and ldz≥
max(1, m) for row major layout.

Output Parameters

ap On exit, this array is overwritten by the values generated during the reduction to tridiagonal form. The elements of the diagonal and the off-diagonal of the tridiagonal matrix overwrite the corresponding elements of A.

m The total number of eigenvalues found, 0 ≤m≤n. If range = 'A', m = n, if range = 'I', m = iu-il+1, and if range = 'V' the exact value of m is not known in advance.

w, z Arrays:
w, size at least max(1, n).
If info = 0, contains the selected eigenvalues of the matrix A in ascending
order.
z(size max(1, ldz*m) for column major layout and max(1, ldz*n) for row
major layout).
If jobz = 'V', then if info = 0, the first m columns of z contain the
orthonormal eigenvectors of the matrix A corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated
with w[i - 1].

If an eigenvector fails to converge, then that column of z contains the latest approximation to the eigenvector, and the index of the eigenvector is returned in ifail.
If jobz = 'N', then z is not referenced.

ifail
Array, size at least max(1, n).
If jobz = 'V', then if info = 0, the first m elements of ifail are zero; if info > 0, ifail contains the indices of the eigenvectors that failed to converge.
If jobz = 'N', then ifail is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, then i eigenvectors failed to converge; their indices are stored in the array ifail.

Application Notes
An approximate eigenvalue is accepted as converged when it is determined to lie in an interval [a,b] of width
less than or equal to abstol+ε*max(|a|,|b|), where ε is the machine precision.

If abstol is less than or equal to zero, then ε*||T||1 will be used in its place, where T is the tridiagonal
matrix obtained by reducing A to tridiagonal form. Eigenvalues will be computed most accurately when abstol
is set to twice the underflow threshold 2*?lamch('S'), not zero.

If this routine returns with info > 0, indicating that some eigenvectors did not converge, try setting abstol
to 2*?lamch('S').
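
A minimal, illustrative sketch (not part of the reference) of selecting the two smallest eigenvalues with LAPACKE_dspevx via range = 'I'; the matrix values are arbitrary, and abstol = 0 requests the default tolerance discussed in the Application Notes.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 3, ldz = 3, m;
    lapack_int ifail[3];
    double ap[6] = { 4.0, 1.0, 3.0, 0.0, 1.0, 2.0 };  /* packed upper triangle */
    double w[3], z[9];
    lapack_int info = LAPACKE_dspevx(LAPACK_COL_MAJOR, 'V', 'I', 'U', n, ap,
                                     0.0, 0.0,   /* vl, vu: not referenced  */
                                     1, 2,       /* il, iu: indices 1 and 2 */
                                     0.0,        /* abstol: default         */
                                     &m, w, z, ldz, ifail);
    if (info == 0)
        printf("found %d eigenvalues, smallest = %f\n", (int)m, w[0]);
    return 0;
}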

?hpevx
Computes selected eigenvalues and, optionally,
eigenvectors of a Hermitian matrix in packed storage.

Syntax
lapack_int LAPACKE_chpevx( int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_complex_float* ap, float vl, float vu, lapack_int il, lapack_int
iu, float abstol, lapack_int* m, float* w, lapack_complex_float* z, lapack_int ldz,
lapack_int* ifail );
lapack_int LAPACKE_zhpevx( int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_complex_double* ap, double vl, double vu, lapack_int il,
lapack_int iu, double abstol, lapack_int* m, double* w, lapack_complex_double* z,
lapack_int ldz, lapack_int* ifail );

Include Files
• mkl.h

Description

The routine computes selected eigenvalues and, optionally, eigenvectors of a complex Hermitian matrix A in
packed storage. Eigenvalues and eigenvectors can be selected by specifying either a range of values or a
range of indices for the desired eigenvalues.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

range Must be 'A' or 'V' or 'I'.

If range = 'A', the routine computes all eigenvalues.

If range = 'V', the routine computes eigenvalues w[i] in the half-open interval: vl< w[i]≤vu.

If range = 'I', the routine computes eigenvalues with indices il to iu.

uplo Must be 'U' or 'L'.

If uplo = 'U', ap stores the packed upper triangular part of A.

If uplo = 'L', ap stores the packed lower triangular part of A.

n The order of the matrix A (n≥ 0).

ap Array ap contains the packed upper or lower triangle of the Hermitian matrix A, as specified by uplo.
The size of ap must be at least max(1, n*(n+1)/2).

vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.

Constraint: vl< vu.

If range = 'A' or 'I', vl and vu are not referenced.

il, iu
If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤il≤iu≤n, if n > 0; il=1 and iu=0 if n = 0.

If range = 'A' or 'V', il and iu are not referenced.

abstol The absolute error tolerance to which each eigenvalue is required. See
Application notes for details on error tolerance.

ldz The leading dimension of the output array z.


Constraints:
if jobz = 'N', then ldz≥ 1;

if jobz = 'V', then ldz≥ max(1, n) for column major layout and ldz≥
max(1, m) for row major layout.

Output Parameters

ap On exit, this array is overwritten by the values generated during the reduction to tridiagonal form. The elements of the diagonal and the off-diagonal of the tridiagonal matrix overwrite the corresponding elements of A.

m The total number of eigenvalues found, 0 ≤m≤n. If range = 'A', m = n, if range = 'I', m = iu-il+1, and if range = 'V' the exact value of m is not known in advance.

w Array, size at least max(1, n).


If info = 0, contains the selected eigenvalues of the matrix A in ascending
order.

z Array z(size max(1, ldz*m) for column major layout and max(1, ldz*n) for
row major layout).
If jobz = 'V', then if info = 0, the first m columns of z contain the
orthonormal eigenvectors of the matrix A corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated with w[i - 1].
If an eigenvector fails to converge, then that column of z contains the latest
approximation to the eigenvector, and the index of the eigenvector is
returned in ifail.
If jobz = 'N', then z is not referenced.

ifail
Array, size at least max(1, n).
If jobz = 'V', then if info = 0, the first m elements of ifail are zero; if info > 0, ifail contains the indices of the eigenvectors that failed to converge.

If jobz = 'N', then ifail is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, then i eigenvectors failed to converge; their indices are stored in the array ifail.

Application Notes
An approximate eigenvalue is accepted as converged when it is determined to lie in an interval [a,b] of width
less than or equal to abstol+ε*max(|a|,|b|), where ε is the machine precision.

If abstol is less than or equal to zero, then ε*||T||1 will be used in its place, where T is the tridiagonal
matrix obtained by reducing A to tridiagonal form. Eigenvalues will be computed most accurately when abstol
is set to twice the underflow threshold 2*?lamch('S'), not zero.

If this routine returns with info > 0, indicating that some eigenvectors did not converge, try setting abstol
to 2*?lamch('S').

?sbev
Computes all eigenvalues and, optionally,
eigenvectors of a real symmetric band matrix.

Syntax
lapack_int LAPACKE_ssbev (int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int kd, float* ab, lapack_int ldab, float* w, float* z, lapack_int ldz);
lapack_int LAPACKE_dsbev (int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int kd, double* ab, lapack_int ldab, double* w, double* z, lapack_int ldz);

Include Files
• mkl.h

Description

The routine computes all eigenvalues and, optionally, eigenvectors of a real symmetric band matrix A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

uplo Must be 'U' or 'L'.

If uplo = 'U', ab stores the upper triangular part of A.

If uplo = 'L', ab stores the lower triangular part of A.

n The order of the matrix A (n≥ 0).

kd The number of super- or sub-diagonals in A (kd≥ 0).

ab ab (size at least max(1, ldab*n) for column major layout and at least
max(1, ldab*(kd + 1)) for row major layout) is an array containing either
upper or lower triangular part of the symmetric matrix A (as specified by
uplo) in band storage format.

ldab The leading dimension of ab; must be at least kd +1 for column major
layout and n for row major layout.

ldz The leading dimension of the output array z.


Constraints:
if jobz = 'N', then ldz≥ 1;

if jobz = 'V', then ldz≥ max(1, n) .

Output Parameters

w, z Arrays:
w, size at least max(1, n).
If info = 0, contains the eigenvalues of the matrix A in ascending order.

z (size max(1, ldz*n)).


If jobz = 'V', then if info = 0, z contains the orthonormal eigenvectors
of the matrix A, with the i-th column of z holding the eigenvector associated
with w[i - 1].

If jobz = 'N', then z is not referenced.

ab On exit, this array is overwritten by the values generated during the reduction to tridiagonal form (see the description of ?sbtrd).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.
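
For illustration (not part of the reference), the sketch below applies LAPACKE_dsbev to a symmetric tridiagonal matrix (kd = 1) in column-major band storage with uplo = 'U': the superdiagonal occupies row kd of ab and the diagonal occupies row kd+1. The matrix values are illustrative only.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 4, kd = 1, ldab = 2, ldz = 4;
    /* A = tridiag(-1, 2, -1); column j of ab holds { A(j-1,j), A(j,j) };
       the first entry of column 1 is unused. */
    double ab[8] = { 0.0, 2.0,  -1.0, 2.0,  -1.0, 2.0,  -1.0, 2.0 };
    double w[4], z[16];
    lapack_int info = LAPACKE_dsbev(LAPACK_COL_MAJOR, 'V', 'U', n, kd,
                                    ab, ldab, w, z, ldz);
    if (info == 0)
        printf("smallest eigenvalue = %f\n", w[0]);
    return 0;
}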

?hbev
Computes all eigenvalues and, optionally,
eigenvectors of a Hermitian band matrix.

Syntax
lapack_int LAPACKE_chbev( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int kd, lapack_complex_float* ab, lapack_int ldab, float* w,
lapack_complex_float* z, lapack_int ldz );
lapack_int LAPACKE_zhbev( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int kd, lapack_complex_double* ab, lapack_int ldab, double* w,
lapack_complex_double* z, lapack_int ldz );

Include Files
• mkl.h

Description

The routine computes all eigenvalues and, optionally, eigenvectors of a complex Hermitian band matrix A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

uplo Must be 'U' or 'L'.

If uplo = 'U', ab stores the upper triangular part of A.

If uplo = 'L', ab stores the lower triangular part of A.

n The order of the matrix A (n≥ 0).

kd The number of super- or sub-diagonals in A (kd≥ 0).

ab ab (size at least max(1, ldab*n) for column major layout and at least
max(1, ldab*(kd + 1)) for row major layout) is an array containing either
upper or lower triangular part of the Hermitian matrix A (as specified by
uplo) in band storage format.

ldab The leading dimension of ab; must be at least kd +1 for column major
layout and n for row major layout.

ldz The leading dimension of the output array z.


Constraints:
if jobz = 'N', then ldz≥ 1;

if jobz = 'V', then ldz≥ max(1, n) .

Output Parameters

w Array, size at least max(1, n).


If info = 0, contains the eigenvalues in ascending order.

z Array z (size max(1, ldz*n)).

If jobz = 'V', then if info = 0, z contains the orthonormal eigenvectors of the matrix A, with the i-th column of z holding the eigenvector associated with w[i - 1].

If jobz = 'N', then z is not referenced.

ab On exit, this array is overwritten by the values generated during the reduction to tridiagonal form (see the description of hbtrd).

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, then the algorithm failed to converge;

i indicates the number of elements of an intermediate tridiagonal form which did not converge to zero.

?sbevd
Computes all eigenvalues and, optionally, all
eigenvectors of a real symmetric band matrix using
divide and conquer algorithm.

Syntax
lapack_int LAPACKE_ssbevd (int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int kd, float* ab, lapack_int ldab, float* w, float* z, lapack_int ldz);
lapack_int LAPACKE_dsbevd (int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int kd, double* ab, lapack_int ldab, double* w, double* z, lapack_int ldz);

Include Files
• mkl.h

Description

The routine computes all the eigenvalues, and optionally all the eigenvectors, of a real symmetric band
matrix A. In other words, it can compute the spectral factorization of A as:
A = Z*Λ*ZT
Here Λ is a diagonal matrix whose diagonal elements are the eigenvalues λi, and Z is the orthogonal matrix
whose columns are the eigenvectors zi. Thus,
A*zi = λi*zi for i = 1, 2, ..., n.
If the eigenvectors are requested, then this routine uses a divide and conquer algorithm to compute
eigenvalues and eigenvectors. However, if only eigenvalues are required, then it uses the Pal-Walker-Kahan
variant of the QL or QR algorithm.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

uplo Must be 'U' or 'L'.

If uplo = 'U', ab stores the upper triangular part of A.

If uplo = 'L', ab stores the lower triangular part of A.

n The order of the matrix A (n≥ 0).

kd The number of super- or sub-diagonals in A (kd≥ 0).

ab ab (size at least max(1, ldab*n) for column major layout and at least
max(1, ldab*(kd + 1)) for row major layout) is an array containing either
upper or lower triangular part of the symmetric matrix A (as specified by
uplo) in band storage format.

ldab The leading dimension of ab; must be at least kd+1 for column major
layout and n for row major layout.

ldz The leading dimension of the output array z.


Constraints:
if jobz = 'N', then ldz≥ 1;

if jobz = 'V', then ldz≥ max(1, n) .

Output Parameters

w, z Arrays:
w, size at least max(1, n).
If info = 0, contains the eigenvalues of the matrix A in ascending order.
See also info.
z (size max(1, ldz*n) if jobz = 'V' and at least 1 if jobz = 'N').
If jobz = 'V', then this array is overwritten by the orthogonal matrix Z which contains the eigenvectors of A. The i-th column of Z contains the eigenvector which corresponds to the eigenvalue w[i - 1].

If jobz = 'N', then z is not referenced.

ab On exit, this array is overwritten by the values generated during the reduction to tridiagonal form.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = i, then the algorithm failed to converge; i indicates the number of elements of an intermediate
tridiagonal form which did not converge to zero.
If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed eigenvalues and eigenvectors are exact for a matrix A+E such that ||E||2=O(ε)*||A||2,
where ε is the machine precision.
The complex analogue of this routine is hbevd.
See also syevd for matrices held in full storage, and spevd for matrices held in packed storage.

?hbevd
Computes all eigenvalues and, optionally, all
eigenvectors of a complex Hermitian band matrix
using divide and conquer algorithm.

Syntax
lapack_int LAPACKE_chbevd( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int kd, lapack_complex_float* ab, lapack_int ldab, float* w,
lapack_complex_float* z, lapack_int ldz );
lapack_int LAPACKE_zhbevd( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int kd, lapack_complex_double* ab, lapack_int ldab, double* w,
lapack_complex_double* z, lapack_int ldz );

Include Files
• mkl.h

Description

The routine computes all the eigenvalues, and optionally all the eigenvectors, of a complex Hermitian band
matrix A. In other words, it can compute the spectral factorization of A as: A = Z*Λ*ZH.

Here Λ is a real diagonal matrix whose diagonal elements are the eigenvalues λi, and Z is the (complex)
unitary matrix whose columns are the eigenvectors zi. Thus,
A*zi = λi*zi for i = 1, 2, ..., n.
If the eigenvectors are requested, then this routine uses a divide and conquer algorithm to compute
eigenvalues and eigenvectors. However, if only eigenvalues are required, then it uses the Pal-Walker-Kahan
variant of the QL or QR algorithm.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

uplo Must be 'U' or 'L'.

If uplo = 'U', ab stores the upper triangular part of A.

If uplo = 'L', ab stores the lower triangular part of A.

n The order of the matrix A (n≥ 0).

kd The number of super- or sub-diagonals in A (kd≥ 0).

ab ab (size at least max(1, ldab*n) for column major layout and at least
max(1, ldab*(kd + 1)) for row major layout) is an array containing either
upper or lower triangular part of the Hermitian matrix A (as specified by
uplo) in band storage format.

ldab The leading dimension of ab; must be at least kd+1 for column major
layout and n for row major layout.

ldz The leading dimension of the output array z.


Constraints:
if jobz = 'N', then ldz≥ 1;

if jobz = 'V', then ldz≥ max(1, n) .

Output Parameters

w Array, size at least max(1, n).


If info = 0, contains the eigenvalues of the matrix A in ascending order.
See also info.

z Array, size max(1, ldz*n) if jobz = 'V' and at least 1 if jobz = 'N'.

If jobz = 'V', then this array is overwritten by the unitary matrix Z which
contains the eigenvectors of A. The i-th column of Z contains the
eigenvector which corresponds to the eigenvalue w[i - 1].

If jobz = 'N', then z is not referenced.

ab On exit, this array is overwritten by the values generated during the reduction to tridiagonal form.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed eigenvalues and eigenvectors are exact for a matrix A + E such that ||E||2 = O(ε)||A||2,
where ε is the machine precision.
The real analogue of this routine is sbevd.
See also heevd for matrices held in full storage, and hpevd for matrices held in packed storage.
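
As an illustrative sketch only (not part of the reference), the code below factors a small Hermitian band matrix with LAPACKE_zhbevd using lower-triangular band storage (uplo = 'L'). As in the LAPACKE_zhpev sketch earlier, the matrix values are arbitrary and the complex initializers assume the default MKL struct layout of lapack_complex_double.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 3, kd = 1, ldab = 2, ldz = 3;
    /* uplo = 'L', column-major band storage: column j holds { A(j,j), A(j+1,j) }.
       A has diagonal {1, 2, 3} and subdiagonal {i, i}; the last subdiagonal
       slot is unused. */
    lapack_complex_double ab[6] = { {1.0, 0.0}, {0.0, 1.0},
                                    {2.0, 0.0}, {0.0, 1.0},
                                    {3.0, 0.0}, {0.0, 0.0} };
    double w[3];
    lapack_complex_double z[9];
    lapack_int info = LAPACKE_zhbevd(LAPACK_COL_MAJOR, 'V', 'L', n, kd,
                                     ab, ldab, w, z, ldz);
    if (info == 0)
        printf("eigenvalues: %f %f %f\n", w[0], w[1], w[2]);
    return 0;
}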

?sbevx
Computes selected eigenvalues and, optionally,
eigenvectors of a real symmetric band matrix.

Syntax
lapack_int LAPACKE_ssbevx (int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_int kd, float* ab, lapack_int ldab, float* q, lapack_int ldq,
float vl, float vu, lapack_int il, lapack_int iu, float abstol, lapack_int* m, float*
w, float* z, lapack_int ldz, lapack_int* ifail);
lapack_int LAPACKE_dsbevx (int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_int kd, double* ab, lapack_int ldab, double* q, lapack_int ldq,
double vl, double vu, lapack_int il, lapack_int iu, double abstol, lapack_int* m,
double* w, double* z, lapack_int ldz, lapack_int* ifail);

Include Files
• mkl.h

Description

The routine computes selected eigenvalues and, optionally, eigenvectors of a real symmetric band matrix A.
Eigenvalues and eigenvectors can be selected by specifying either a range of values or a range of indices for
the desired eigenvalues.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

range Must be 'A' or 'V' or 'I'.

If range = 'A', the routine computes all eigenvalues.

If range = 'V', the routine computes eigenvalues w[i] in the half-open interval: vl<w[i]≤vu.

If range = 'I', the routine computes eigenvalues with indices in range il to iu.

uplo Must be 'U' or 'L'.

If uplo = 'U', ab stores the upper triangular part of A.

If uplo = 'L', ab stores the lower triangular part of A.

n The order of the matrix A (n≥ 0).

kd The number of super- or sub-diagonals in A (kd≥ 0).

ab Arrays:
Array ab (size at least max(1, ldab*n) for column major layout and at least
max(1, ldab*(kd + 1)) for row major layout) contains either upper or
lower triangular part of the symmetric matrix A (as specified by uplo) in
band storage format.

ldab The leading dimension of ab; must be at least kd +1 for column major
layout and n for row major layout.

vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl< vu.

If range = 'A' or 'I', vl and vu are not referenced.

il, iu

If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤il≤iu≤n, if n > 0; il=1 and iu=0 if n = 0.

If range = 'A' or 'V', il and iu are not referenced.

abstol The absolute error tolerance to which each eigenvalue is required. See
Application notes for details on error tolerance.

ldq, ldz The leading dimensions of the output arrays q and z, respectively.
Constraints:
ldq≥ 1, ldz≥ 1;
If jobz = 'V', then ldq≥ max(1, n) and ldz≥ max(1, n) for column
major layout and ldz≥ max(1, m) for row major layout .

Output Parameters

q Array, size max(1, ldq*n).

If jobz = 'V', q contains the n-by-n orthogonal matrix used in the reduction to tridiagonal form.
If jobz = 'N', the array q is not referenced.

m The total number of eigenvalues found, 0 ≤m≤n.
If range = 'A', m = n, if range = 'I', m = iu-il+1, and if range = 'V', the exact value of m is not known in advance.

w, z Arrays:
w, size at least max(1, n). The first m elements of w contain the selected
eigenvalues of the matrix A in ascending order.
z(size at least max(1, ldz*m) for column major layout and max(1, ldz*n)
for row major layout).
If jobz = 'V', then if info = 0, the first m columns of z contain the
orthonormal eigenvectors of the matrix A corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated
with w[i - 1].

If an eigenvector fails to converge, then that column of z contains the latest approximation to the eigenvector, and the index of the eigenvector is returned in ifail.
If jobz = 'N', then z is not referenced.

ab On exit, this array is overwritten by the values generated during the reduction to tridiagonal form.

ifail
Array, size at least max(1, n).
If jobz = 'V', then if info = 0, the first m elements of ifail are zero; if info > 0, ifail contains the indices of the eigenvectors that failed to converge.

If jobz = 'N', then ifail is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, then i eigenvectors failed to converge; their indices are stored in the array ifail.

Application Notes
An approximate eigenvalue is accepted as converged when it is determined to lie in an interval [a,b] of width
less than or equal to abstol+ε*max(|a|,|b|), where ε is the machine precision.

If abstol is less than or equal to zero, then ε*||T||1 is used as tolerance, where T is the tridiagonal matrix
obtained by reducing A to tridiagonal form. Eigenvalues will be computed most accurately when abstol is set
to twice the underflow threshold 2*?lamch('S'), not zero.

If this routine returns with info > 0, indicating that some eigenvectors did not converge, try setting abstol
to 2*?lamch('S').
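
A hedged sketch (not part of the reference): selecting the eigenvalues of a symmetric band matrix that lie in (0, 2] with LAPACKE_dsbevx, range = 'V'. When jobz = 'V', q receives the orthogonal matrix used in the reduction to tridiagonal form. Matrix values are illustrative only.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 4, kd = 1, ldab = 2, ldq = 4, ldz = 4, m;
    lapack_int ifail[4];
    double ab[8] = { 0.0, 2.0,  -1.0, 2.0,  -1.0, 2.0,  -1.0, 2.0 };  /* uplo = 'U' */
    double q[16], w[4], z[16];
    lapack_int info = LAPACKE_dsbevx(LAPACK_COL_MAJOR, 'V', 'V', 'U', n, kd,
                                     ab, ldab, q, ldq,
                                     0.0, 2.0,   /* vl, vu               */
                                     0, 0,       /* il, iu: not referenced */
                                     0.0,        /* abstol: default      */
                                     &m, w, z, ldz, ifail);
    if (info == 0)
        printf("%d eigenvalue(s) found in (0, 2]\n", (int)m);
    return 0;
}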

?hbevx
Computes selected eigenvalues and, optionally,
eigenvectors of a Hermitian band matrix.

Syntax
lapack_int LAPACKE_chbevx( int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_int kd, lapack_complex_float* ab, lapack_int ldab,
lapack_complex_float* q, lapack_int ldq, float vl, float vu, lapack_int il, lapack_int
iu, float abstol, lapack_int* m, float* w, lapack_complex_float* z, lapack_int ldz,
lapack_int* ifail );
lapack_int LAPACKE_zhbevx( int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_int kd, lapack_complex_double* ab, lapack_int ldab,
lapack_complex_double* q, lapack_int ldq, double vl, double vu, lapack_int il,
lapack_int iu, double abstol, lapack_int* m, double* w, lapack_complex_double* z,
lapack_int ldz, lapack_int* ifail );

Include Files
• mkl.h

Description

The routine computes selected eigenvalues and, optionally, eigenvectors of a complex Hermitian band matrix
A. Eigenvalues and eigenvectors can be selected by specifying either a range of values or a range of indices
for the desired eigenvalues.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

range Must be 'A' or 'V' or 'I'.

If range = 'A', the routine computes all eigenvalues.

If range = 'V', the routine computes eigenvalues w[i] in the half-open interval: vl< w[i]≤vu.

If range = 'I', the routine computes eigenvalues with indices il to iu.

uplo Must be 'U' or 'L'.

If uplo = 'U', ab stores the upper triangular part of A.

If uplo = 'L', ab stores the lower triangular part of A.

n The order of the matrix A (n≥ 0).

kd The number of super- or sub-diagonals in A (kd≥ 0).

ab ab (size at least max(1, ldab*n) for column major layout and at least
max(1, ldab*(kd + 1)) for row major layout) is an array containing either
upper or lower triangular part of the Hermitian matrix A (as specified by
uplo) in band storage format.

ldab The leading dimension of ab; must be at least kd +1 for column major
layout and n for row major layout.

vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl< vu.

If range = 'A' or 'I', vl and vu are not referenced.

il, iu
If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤il≤iu≤n, if n > 0; il=1 and iu=0 if n = 0.

If range = 'A' or 'V', il and iu are not referenced.

abstol The absolute error tolerance to which each eigenvalue is required. See
Application notes for details on error tolerance.

ldq, ldz The leading dimensions of the output arrays q and z, respectively.
Constraints:
ldq≥ 1, ldz≥ 1;
If jobz = 'V', then ldq≥ max(1, n) and ldz≥ max(1, n) for column major
layout and ldz≥ max(1, m) for row major layout.

Output Parameters

q Array, size max(1, ldq*n).

If jobz = 'V', q contains the n-by-n unitary matrix used in the reduction to tridiagonal form.
If jobz = 'N', the array q is not referenced.

m The total number of eigenvalues found, 0 ≤m≤n.
If range = 'A', m = n, if range = 'I', m = iu-il+1, and if range = 'V', the exact value of m is not known in advance.

w Array, size at least max(1, n). The first m elements contain the selected
eigenvalues of the matrix A in ascending order.

z Array z(size at least max(1, ldz*m) for column major layout and max(1,
ldz*n) for row major layout).
If jobz = 'V', then if info = 0, the first m columns of z contain the
orthonormal eigenvectors of the matrix A corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated
with w[i - 1].

If an eigenvector fails to converge, then that column of z contains the latest approximation to the eigenvector, and the index of the eigenvector is returned in ifail.
If jobz = 'N', then z is not referenced.

ab On exit, this array is overwritten by the values generated during the reduction to tridiagonal form.
If uplo = 'U', the first superdiagonal and the diagonal of the tridiagonal
matrix T are returned in rows kd and kd+1 of ab, and if uplo = 'L', the
diagonal and first subdiagonal of T are returned in the first two rows of ab.

ifail
Array, size at least max(1, n).
If jobz = 'V', then if info = 0, the first m elements of ifail are zero; if
info > 0, the ifail contains the indices of the eigenvectors that failed to
converge.
If jobz = 'N', then ifail is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, then i eigenvectors failed to converge; their indices are stored in the array ifail.

Application Notes
An approximate eigenvalue is accepted as converged when it is determined to lie in an interval [a,b] of width
less than or equal to abstol + ε * max( |a|,|b| ), where ε is the machine precision.

If abstol is less than or equal to zero, then ε*||T||1 will be used in its place, where T is the tridiagonal
matrix obtained by reducing A to tridiagonal form. Eigenvalues will be computed most accurately when abstol
is set to twice the underflow threshold 2*?lamch('S'), not zero.

If this routine returns with info > 0, indicating that some eigenvectors did not converge, try setting abstol
to 2*?lamch('S').

?stev
Computes all eigenvalues and, optionally,
eigenvectors of a real symmetric tridiagonal matrix.

Syntax
lapack_int LAPACKE_sstev (int matrix_layout, char jobz, lapack_int n, float* d, float*
e, float* z, lapack_int ldz);
lapack_int LAPACKE_dstev (int matrix_layout, char jobz, lapack_int n, double* d,
double* e, double* z, lapack_int ldz);

Include Files
• mkl.h

Description

The routine computes all eigenvalues and, optionally, eigenvectors of a real symmetric tridiagonal matrix A.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

n The order of the matrix A (n≥ 0).

d, e Arrays:
Array d contains the n diagonal elements of the tridiagonal matrix A.

The size of d must be at least max(1, n).


Array e contains the n-1 subdiagonal elements of the tridiagonal matrix A.

The size of e must be at least max(1, n). The n-th element of this array is
used as workspace.

ldz The leading dimension of the output array z; ldz≥ 1. If jobz = 'V' then
ldz≥ max(1, n).

Output Parameters

d On exit, if info = 0, contains the eigenvalues of the matrix A in ascending order.

z Array, size max(1, ldz*n).

If jobz = 'V', then if info = 0, z contains the orthonormal eigenvectors of the matrix A, with the i-th column of z holding the eigenvector associated with the eigenvalue returned in d[i - 1].

If jobz = 'N', then z is not referenced.

e On exit, this array is overwritten with intermediate results.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, then the algorithm failed to converge;

i elements of e did not converge to zero.
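
An illustrative sketch (not part of the reference) of LAPACKE_dstev on a small symmetric tridiagonal matrix; on exit d holds the eigenvalues and z the eigenvectors. The matrix values are arbitrary.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 4, ldz = 4;
    double d[4] = { 2.0, 2.0, 2.0, 2.0 };     /* diagonal                        */
    double e[4] = { -1.0, -1.0, -1.0, 0.0 };  /* subdiagonal; last entry is work */
    double z[16];
    lapack_int info = LAPACKE_dstev(LAPACK_COL_MAJOR, 'V', n, d, e, z, ldz);
    if (info == 0)
        printf("eigenvalues: %f %f %f %f\n", d[0], d[1], d[2], d[3]);
    return 0;
}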

?stevd
Computes all eigenvalues and, optionally, all
eigenvectors of a real symmetric tridiagonal matrix
using divide and conquer algorithm.

Syntax
lapack_int LAPACKE_sstevd (int matrix_layout, char jobz, lapack_int n, float* d, float*
e, float* z, lapack_int ldz);
lapack_int LAPACKE_dstevd (int matrix_layout, char jobz, lapack_int n, double* d,
double* e, double* z, lapack_int ldz);

Include Files
• mkl.h

Description

The routine computes all the eigenvalues, and optionally all the eigenvectors, of a real symmetric tridiagonal
matrix T. In other words, the routine can compute the spectral factorization of T as: T = Z*Λ*ZT.

Here Λ is a diagonal matrix whose diagonal elements are the eigenvalues λi, and Z is the orthogonal matrix
whose columns are the eigenvectors zi. Thus,
T*zi = λi*zi for i = 1, 2, ..., n.
If the eigenvectors are requested, then this routine uses a divide and conquer algorithm to compute
eigenvalues and eigenvectors. However, if only eigenvalues are required, then it uses the Pal-Walker-Kahan
variant of the QL or QR algorithm.
There is no complex analogue of this routine.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

n The order of the matrix T (n ≥ 0).

d, e Arrays:
d contains the n diagonal elements of the tridiagonal matrix T.
The dimension of d must be at least max(1, n).
e contains the n-1 off-diagonal elements of T.
The dimension of e must be at least max(1, n). The n-th element of this
array is used as workspace.

ldz The leading dimension of the output array z. Constraints:
ldz≥ 1 if jobz = 'N';
ldz≥ max(1, n) if jobz = 'V'.

Output Parameters

d On exit, if info = 0, contains the eigenvalues of the matrix T in ascending order.
See also info.

z Array, size max(1, ldz*n) if jobz = 'V' and 1 if jobz = 'N'.

If jobz = 'V', then this array is overwritten by the orthogonal matrix Z which contains the eigenvectors of T.
If jobz = 'N', then z is not referenced.

e On exit, this array is overwritten with intermediate results.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = i, then the algorithm failed to converge; i indicates the number of elements of an intermediate
tridiagonal form which did not converge to zero.
If info = -i, the i-th parameter had an illegal value.

Application Notes
The computed eigenvalues and eigenvectors are exact for a matrix T+E such that ||E||2 = O(ε)*||T||2,
where ε is the machine precision.
If λi is an exact eigenvalue, and μi is the corresponding computed value, then
|μi - λi| ≤ c(n)*ε*||T||2
where c(n) is a modestly increasing function of n.

If zi is the corresponding exact eigenvector, and wi is the corresponding computed vector, then the angle
θ(zi, wi) between them is bounded as follows:
θ(zi, wi) ≤ c(n)*ε*||T||2 / min i≠j|λi - λj|.
Thus the accuracy of a computed eigenvector depends on the gap between its eigenvalue and all the other
eigenvalues.

?stevx
Computes selected eigenvalues and eigenvectors of a
real symmetric tridiagonal matrix.

Syntax
lapack_int LAPACKE_sstevx (int matrix_layout, char jobz, char range, lapack_int n,
float* d, float* e, float vl, float vu, lapack_int il, lapack_int iu, float abstol,
lapack_int* m, float* w, float* z, lapack_int ldz, lapack_int* ifail);
lapack_int LAPACKE_dstevx (int matrix_layout, char jobz, char range, lapack_int n,
double* d, double* e, double vl, double vu, lapack_int il, lapack_int iu, double
abstol, lapack_int* m, double* w, double* z, lapack_int ldz, lapack_int* ifail);

Include Files
• mkl.h

Description

The routine computes selected eigenvalues and, optionally, eigenvectors of a real symmetric tridiagonal
matrix A. Eigenvalues and eigenvectors can be selected by specifying either a range of values or a range of
indices for the desired eigenvalues.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

range Must be 'A' or 'V' or 'I'.

If range = 'A', the routine computes all eigenvalues.

If range = 'V', the routine computes eigenvalues w[i] in the half-open interval: vl<w[i]≤vu.

If range = 'I', the routine computes eigenvalues with indices il to iu.

n The order of the matrix A (n≥ 0).

d, e Arrays:
d contains the n diagonal elements of the tridiagonal matrix A.
The dimension of d must be at least max(1, n).
e contains the n-1 subdiagonal elements of A.
The dimension of e must be at least max(1, n-1). The n-th element of this
array is used as workspace.

vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl< vu.

If range = 'A' or 'I', vl and vu are not referenced.

il, iu
If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤il≤iu≤n, if n > 0; il=1 and iu=0 if n = 0.

If range = 'A' or 'V', il and iu are not referenced.

abstol The absolute error tolerance to which each eigenvalue is required. See Application notes for details on error tolerance.
ldz The leading dimensions of the output array z; ldz≥ 1. If jobz = 'V', then
ldz≥ max(1, n) for column major layout and ldz≥ max(1, m) for row major
layout.

Output Parameters

m The total number of eigenvalues found, 0 ≤m≤n.
If range = 'A', m = n, if range = 'I', m = iu-il+1, and if range = 'V' the exact value of m is unknown.

w, z Arrays:
w, size at least max(1, n).
The first m elements of w contain the selected eigenvalues of the matrix A
in ascending order.

z(size at least max(1, ldz*m) for column major layout and max(1, ldz*n)
for row major layout) .
If jobz = 'V', then if info = 0, the first m columns of z contain the
orthonormal eigenvectors of the matrix A corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated
with w[i - 1].

If an eigenvector fails to converge, then that column of z contains the latest approximation to the eigenvector, and the index of the eigenvector is returned in ifail.
If jobz = 'N', then z is not referenced.

d, e On exit, these arrays may be multiplied by a constant factor chosen to avoid overflow or underflow in computing the eigenvalues.

ifail
Array, size at least max(1, n).
If jobz = 'V', then if info = 0, the first m elements of ifail are zero; if
info > 0, the ifail contains the indices of the eigenvectors that failed to
converge.
If jobz = 'N', then ifail is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, then i eigenvectors failed to converge; their indices are stored in the array ifail.

Application Notes
An approximate eigenvalue is accepted as converged when it is determined to lie in an interval [a,b] of width
less than or equal to abstol+ε*max(|a|,|b|), where ε is the machine precision.

If abstol is less than or equal to zero, then ε*|A|1 is used instead. Eigenvalues are computed most accurately
when abstol is set to twice the underflow threshold 2*?lamch('S'), not zero.

If this routine returns with info > 0, indicating that some eigenvectors did not converge, set abstol to 2*?lamch('S').

?stevr
Computes selected eigenvalues and, optionally,
eigenvectors of a real symmetric tridiagonal matrix
using the Relatively Robust Representations.

Syntax
lapack_int LAPACKE_sstevr (int matrix_layout, char jobz, char range, lapack_int n,
float* d, float* e, float vl, float vu, lapack_int il, lapack_int iu, float abstol,
lapack_int* m, float* w, float* z, lapack_int ldz, lapack_int* isuppz);
lapack_int LAPACKE_dstevr (int matrix_layout, char jobz, char range, lapack_int n,
double* d, double* e, double vl, double vu, lapack_int il, lapack_int iu, double
abstol, lapack_int* m, double* w, double* z, lapack_int ldz, lapack_int* isuppz);

Include Files
• mkl.h

Description

The routine computes selected eigenvalues and, optionally, eigenvectors of a real symmetric tridiagonal
matrix T. Eigenvalues and eigenvectors can be selected by specifying either a range of values or a range of
indices for the desired eigenvalues.
Whenever possible, the routine calls stemr to compute the eigenspectrum using Relatively Robust
Representations. stemr computes eigenvalues by the dqds algorithm, while orthogonal eigenvectors are
computed from various "good" L*D*LT representations (also known as Relatively Robust Representations).
Gram-Schmidt orthogonalization is avoided as far as possible. More specifically, the various steps of the
algorithm are as follows. For the i-th unreduced block of T:

a. Compute T - σi = Li*Di*LiT, such that Li*Di*LiT is a relatively robust representation.


b. Compute the eigenvalues, λj, of Li*Di*LiT to high relative accuracy by the dqds algorithm.
c. If there is a cluster of close eigenvalues, "choose" σi close to the cluster, and go to Step (a).
d. Given the approximate eigenvalue λj of Li*Di*LiT, compute the corresponding eigenvector by forming a
rank-revealing twisted factorization.

The desired accuracy of the output can be specified by the input parameter abstol.
The routine ?stevr calls stemr when the full spectrum is requested on machines which conform to the
IEEE-754 floating point standard. ?stevr calls stebz and stein on non-IEEE machines and when partial
spectrum requests are made.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

range Must be 'A' or 'V' or 'I'.

If range = 'A', the routine computes all eigenvalues.

If range = 'V', the routine computes eigenvalues w[i] in the half-open interval: vl<w[i]≤vu.
If range = 'I', the routine computes eigenvalues with indices il to iu.

For range = 'V' or 'I' and iu-il < n-1, sstebz/dstebz and sstein/
dstein are called.

n The order of the matrix T (n≥ 0).

d, e Arrays:
d contains the n diagonal elements of the tridiagonal matrix T.
The dimension of d must be at least max(1, n).
e contains the n-1 subdiagonal elements of T.
The dimension of e must be at least max(1, n-1). The n-th element of this
array is used as workspace.

vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl< vu.

If range = 'A' or 'I', vl and vu are not referenced.

il, iu
If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤il≤iu≤n, if n > 0; il=1 and iu=0 if n = 0.

If range = 'A' or 'V', il and iu are not referenced.

abstol The absolute error tolerance to which each eigenvalue/eigenvector is required.
If jobz = 'V', the eigenvalues and eigenvectors output have residual
norms bounded by abstol, and the dot products between different
eigenvectors are bounded by abstol. If abstol < n *eps*||T||, then n
*eps*||T|| will be used in its place, where eps is the machine precision,
and ||T|| is the 1-norm of the matrix T. The eigenvalues are computed to
an accuracy of eps*||T|| irrespective of abstol.

If high relative accuracy is important, set abstol to ?lamch('S').

ldz The leading dimension of the output array z.
Constraints:
ldz≥ 1 if jobz = 'N';
ldz≥ max(1, n) for column major layout and ldz≥ max(1, m) for row major
layout if jobz = 'V'.

Output Parameters

m The total number of eigenvalues found, 0 ≤m≤n. If range = 'A', m = n, if range = 'I', m = iu-il+1, and if range = 'V' the exact value of m is unknown.

w, z Arrays:
w, size at least max(1, n).
The first m elements of w contain the selected eigenvalues of the matrix T
in ascending order.
z(size at least max(1, ldz*m) for column major layout and max(1, ldz*n)
for row major layout).
If jobz = 'V', then if info = 0, the first m columns of z contain the
orthonormal eigenvectors of the matrix T corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated
with w[i - 1].

If jobz = 'N', then z is not referenced.

d, e On exit, these arrays may be multiplied by a constant factor chosen to avoid overflow or underflow in computing the eigenvalues.

isuppz
Array, size at least 2*max(1, m).
The support of the eigenvectors in z, i.e., the indices indicating the nonzero elements in z. The i-th eigenvector is nonzero only in elements isuppz[2i - 2] through isuppz[2i - 1].
Implemented only for range = 'A' or 'I' and iu-il = n-1.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, an internal error has occurred.

Application Notes
Normal execution of the routine ?stegr may create NaNs and infinities and hence may abort due to a floating
point exception in environments which do not handle NaNs and infinities in the IEEE standard default manner.
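
A minimal sketch for illustration only (not part of the reference): computing the full spectrum of a symmetric tridiagonal matrix with LAPACKE_dstevr (range = 'A'), including the eigenvector support array isuppz. The matrix values are arbitrary, and abstol = 0 requests the default tolerance.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 4, ldz = 4, m;
    lapack_int isuppz[8];                      /* 2*max(1, m), with m = n here */
    double d[4] = { 2.0, 2.0, 2.0, 2.0 };      /* diagonal                     */
    double e[4] = { -1.0, -1.0, -1.0, 0.0 };   /* subdiagonal + workspace slot */
    double w[4], z[16];
    lapack_int info = LAPACKE_dstevr(LAPACK_COL_MAJOR, 'V', 'A', n, d, e,
                                     0.0, 0.0, 0, 0,  /* vl, vu, il, iu unused */
                                     0.0, &m, w, z, ldz, isuppz);
    if (info == 0)
        printf("m = %d, smallest eigenvalue = %f\n", (int)m, w[0]);
    return 0;
}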

Nonsymmetric Eigenvalue Problems: LAPACK Driver Routines


This section describes LAPACK driver routines used for solving nonsymmetric eigenproblems. See also
computational routines that can be called to solve these problems.

Table "Driver Routines for Solving Nonsymmetric Eigenproblems" lists all such driver routines.
Driver Routines for Solving Nonsymmetric Eigenproblems
Routine Name Operation performed

gees Computes the eigenvalues and Schur factorization of a general matrix, and orders
the factorization so that selected eigenvalues are at the top left of the Schur form.

geesx Computes the eigenvalues and Schur factorization of a general matrix, orders the
factorization and computes reciprocal condition numbers.

geev Computes the eigenvalues and left and right eigenvectors of a general matrix.

geevx Computes the eigenvalues and left and right eigenvectors of a general matrix, with
preliminary matrix balancing, and computes reciprocal condition numbers for the
eigenvalues and right eigenvectors.

?gees
Computes the eigenvalues and Schur factorization of a
general matrix, and orders the factorization so that
selected eigenvalues are at the top left of the Schur
form.

Syntax
lapack_int LAPACKE_sgees( int matrix_layout, char jobvs, char sort, LAPACK_S_SELECT2
select, lapack_int n, float* a, lapack_int lda, lapack_int* sdim, float* wr, float*
wi, float* vs, lapack_int ldvs );
lapack_int LAPACKE_dgees( int matrix_layout, char jobvs, char sort, LAPACK_D_SELECT2
select, lapack_int n, double* a, lapack_int lda, lapack_int* sdim, double* wr, double*
wi, double* vs, lapack_int ldvs );
lapack_int LAPACKE_cgees( int matrix_layout, char jobvs, char sort, LAPACK_C_SELECT1
select, lapack_int n, lapack_complex_float* a, lapack_int lda, lapack_int* sdim,
lapack_complex_float* w, lapack_complex_float* vs, lapack_int ldvs );
lapack_int LAPACKE_zgees( int matrix_layout, char jobvs, char sort, LAPACK_Z_SELECT1
select, lapack_int n, lapack_complex_double* a, lapack_int lda, lapack_int* sdim,
lapack_complex_double* w, lapack_complex_double* vs, lapack_int ldvs );

Include Files
• mkl.h

Description

The routine computes for an n-by-n real/complex nonsymmetric matrix A, the eigenvalues, the real Schur
form T, and, optionally, the matrix of Schur vectors Z. This gives the Schur factorization A = Z*T*ZH.

Optionally, it also orders the eigenvalues on the diagonal of the real-Schur/Schur form so that selected
eigenvalues are at the top left. The leading columns of Z then form an orthonormal basis for the invariant
subspace corresponding to the selected eigenvalues.
A real matrix is in real-Schur form if it is upper quasi-triangular with 1-by-1 and 2-by-2 blocks. 2-by-2 blocks
will be standardized in the form

[ a  b ]
[ c  a ]

where b*c < 0. The eigenvalues of such a block are a ± sqrt(b*c), a complex conjugate pair.

A complex matrix is in Schur form if it is upper triangular.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

jobvs Must be 'N' or 'V'.

If jobvs = 'N', then Schur vectors are not computed.

If jobvs = 'V', then Schur vectors are computed.

sort Must be 'N' or 'S'. Specifies whether or not to order the eigenvalues on
the diagonal of the Schur form.
If sort = 'N', then eigenvalues are not ordered.

If sort = 'S', eigenvalues are ordered (see select).

select If sort = 'S', select is used to select eigenvalues to sort to the top left of
the Schur form.
If sort = 'N', select is not referenced.

For real flavors:


An eigenvalue wr[j]+sqrt(-1)*wi[j] is selected if select(wr[j], wi[j]) is
true; that is, if either one of a complex conjugate pair of eigenvalues is
selected, then both complex eigenvalues are selected.
For complex flavors:
An eigenvalue w[j] is selected if select(w[j]) is true.
Note that a selected complex eigenvalue may no longer satisfy select(wr[j],
wi[j])= 1 after ordering, since ordering may change the value of complex
eigenvalues (especially if the eigenvalue is ill-conditioned); in this case info
may be set to n+2 (see info below).

n The order of the matrix A (n≥ 0).

a Arrays:
a (size at least max(1, lda*n)) is an array containing the n-by-n matrix A.

lda The leading dimension of the array a. Must be at least max(1, n).

ldvs The leading dimension of the output array vs. Constraints:


ldvs≥ 1;
ldvs≥ max(1, n) if jobvs = 'V'.

Output Parameters

a On exit, this array is overwritten by the real-Schur/Schur form T.

sdim

If sort = 'N', sdim= 0.

If sort = 'S', sdim is equal to the number of eigenvalues (after sorting) for which select is true.
Note that for real flavors complex conjugate pairs for which select is true for either eigenvalue count as 2.

wr, wi Arrays, size at least max (1, n) each. Contain the real and imaginary parts,
respectively, of the computed eigenvalues, in the same order that they
appear on the diagonal of the output real-Schur form T. Complex conjugate
pairs of eigenvalues appear consecutively with the eigenvalue having
positive imaginary part first.

w Array, size at least max(1, n). Contains the computed eigenvalues. The
eigenvalues are stored in the same order as they appear on the diagonal of
the output Schur form T.

vs Array vs (size at least max(1, ldvs*n)).

If jobvs = 'V', vs contains the orthogonal/unitary matrix Z of Schur


vectors.
If jobvs = 'N', vs is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, and

i≤n:
the QR algorithm failed to compute all the eigenvalues; elements 1:ilo-1 and i+1:n of wr and wi (for real
flavors) or w (for complex flavors) contain those eigenvalues which have converged; if jobvs = 'V', vs
contains the matrix which reduces A to its partially converged Schur form;
i = n+1:
the eigenvalues could not be reordered because some eigenvalues were too close to separate (the problem is
very ill-conditioned);
i = n+2:
after reordering, round-off changed values of some complex eigenvalues so that leading eigenvalues in the
Schur form no longer satisfy select = 1. This could also be caused by underflow due to scaling.
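
For orientation, the following minimal sketch calls LAPACKE_dgees on a small column major matrix and sorts eigenvalues with negative real part to the top left of the Schur form. The matrix entries and the select_neg_re callback are arbitrary illustrations, not part of the routine specification.

#include <stdio.h>
#include <mkl.h>

/* Illustrative callback for sort = 'S': select eigenvalues with negative real part. */
static lapack_logical select_neg_re(const double* wr, const double* wi)
{
    (void)wi;                              /* imaginary part not used by this criterion */
    return *wr < 0.0;
}

int main(void)
{
    /* Arbitrary 3-by-3 example matrix, column major; eigenvalues are +-i and -2. */
    double a[9] = {  0.0, -1.0,  0.0,
                     1.0,  0.0,  0.0,
                     0.0,  0.0, -2.0 };
    double wr[3], wi[3], vs[9];
    lapack_int sdim = 0;

    lapack_int info = LAPACKE_dgees(LAPACK_COL_MAJOR, 'V', 'S', select_neg_re,
                                    3, a, 3, &sdim, wr, wi, vs, 3);
    if (info == 0)
        printf("sdim = %d, lambda_0 = %g + %g*i\n", (int)sdim, wr[0], wi[0]);
    return (int)info;
}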

?geesx
Computes the eigenvalues and Schur factorization of a
general matrix, orders the factorization and computes
reciprocal condition numbers.

Syntax
lapack_int LAPACKE_sgeesx( int matrix_layout, char jobvs, char sort, LAPACK_S_SELECT2
select, char sense, lapack_int n, float* a, lapack_int lda, lapack_int* sdim, float*
wr, float* wi, float* vs, lapack_int ldvs, float* rconde, float* rcondv );

lapack_int LAPACKE_dgeesx( int matrix_layout, char jobvs, char sort, LAPACK_D_SELECT2
select, char sense, lapack_int n, double* a, lapack_int lda, lapack_int* sdim, double*
wr, double* wi, double* vs, lapack_int ldvs, double* rconde, double* rcondv );
lapack_int LAPACKE_cgeesx( int matrix_layout, char jobvs, char sort, LAPACK_C_SELECT1
select, char sense, lapack_int n, lapack_complex_float* a, lapack_int lda, lapack_int*
sdim, lapack_complex_float* w, lapack_complex_float* vs, lapack_int ldvs, float*
rconde, float* rcondv );
lapack_int LAPACKE_zgeesx( int matrix_layout, char jobvs, char sort, LAPACK_Z_SELECT1
select, char sense, lapack_int n, lapack_complex_double* a, lapack_int lda,
lapack_int* sdim, lapack_complex_double* w, lapack_complex_double* vs, lapack_int ldvs,
double* rconde, double* rcondv );

Include Files
• mkl.h

Description

The routine computes for an n-by-n real/complex nonsymmetric matrix A, the eigenvalues, the real-Schur/
Schur form T, and, optionally, the matrix of Schur vectors Z. This gives the Schur factorization A = Z*T*Z^H.

Optionally, it also orders the eigenvalues on the diagonal of the real-Schur/Schur form so that selected
eigenvalues are at the top left; computes a reciprocal condition number for the average of the selected
eigenvalues (rconde); and computes a reciprocal condition number for the right invariant subspace
corresponding to the selected eigenvalues (rcondv). The leading columns of Z form an orthonormal basis for
this invariant subspace.
For further explanation of the reciprocal condition numbers rconde and rcondv, see [LUG], Section 4.10
(where these quantities are called s and sep respectively).
A real matrix is in real-Schur form if it is upper quasi-triangular with 1-by-1 and 2-by-2 blocks. 2-by-2 blocks
are standardized in the form

[ a  b ]
[ c  a ]

where b*c < 0. The eigenvalues of such a block are a ± sqrt(b*c).

A complex matrix is in Schur form if it is upper triangular.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

jobvs Must be 'N' or 'V'.

If jobvs = 'N', then Schur vectors are not computed.

If jobvs = 'V', then Schur vectors are computed.

sort Must be 'N' or 'S'. Specifies whether or not to order the eigenvalues on
the diagonal of the Schur form.
If sort = 'N', then eigenvalues are not ordered.


If sort = 'S', eigenvalues are ordered (see select).

select If sort = 'S', select is used to select eigenvalues to sort to the top left of
the Schur form.
If sort = 'N', select is not referenced.

For real flavors:


An eigenvalue wr[j]+sqrt(-1)*wi[j] is selected if select(wr[j], wi[j]) is
true; that is, if either one of a complex conjugate pair of eigenvalues is
selected, then both complex eigenvalues are selected.
For complex flavors:
An eigenvalue w[j] is selected if select(w[j]) is true.
Note that a selected complex eigenvalue may no longer satisfy select(wr[j],
wi[j])= 1 after ordering, since ordering may change the value of complex
eigenvalues (especially if the eigenvalue is ill-conditioned); in this case info
may be set to n+2 (see info below).

sense Must be 'N', 'E', 'V', or 'B'. Determines which reciprocal condition
number are computed.
If sense = 'N', none are computed;

If sense = 'E', computed for average of selected eigenvalues only;

If sense = 'V', computed for selected right invariant subspace only;

If sense = 'B', computed for both.

If sense is 'E', 'V', or 'B', then sort must equal 'S'.

n The order of the matrix A (n≥ 0).

a Arrays:
a (size at least max(1, lda*n)) is an array containing the n-by-n matrix A.

lda The leading dimension of the array a. Must be at least max(1, n).

ldvs The leading dimension of the output array vs. Constraints:


ldvs≥ 1;
ldvs≥ max(1, n) if jobvs = 'V'.

Output Parameters

a On exit, this array is overwritten by the real-Schur/Schur form T.

sdim
If sort = 'N', sdim= 0.

If sort = 'S', sdim is equal to the number of eigenvalues (after sorting)


for which select is true.
Note that for real flavors complex conjugate pairs for which select is true for
either eigenvalue count as 2.

wr, wi Arrays, size at least max (1, n) each. Contain the real and imaginary parts,
respectively, of the computed eigenvalues, in the same order that they
appear on the diagonal of the output real-Schur form T. Complex conjugate
pairs of eigenvalues appear consecutively with the eigenvalue having
positive imaginary part first.

w Array, size at least max(1, n). Contains the computed eigenvalues. The
eigenvalues are stored in the same order as they appear on the diagonal of
the output Schur form T.

vs Array vs (size at least max(1, ldvs*n))

If jobvs = 'V', vs contains the orthogonal/unitary matrix Z of Schur


vectors.
If jobvs = 'N', vs is not referenced.

rconde, rcondv If sense = 'E' or 'B', rconde contains the reciprocal condition number for
the average of the selected eigenvalues.
If sense = 'N' or 'V', rconde is not referenced.

If sense = 'V' or 'B', rcondv contains the reciprocal condition number for
the selected right invariant subspace.
If sense = 'N' or 'E', rcondv is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, and

i≤n:
the QR algorithm failed to compute all the eigenvalues; elements 1:ilo-1 and i+1:n of wr and wi (for real
flavors) or w (for complex flavors) contain those eigenvalues which have converged; if jobvs = 'V', vs
contains the transformation which reduces A to its partially converged Schur form;
i = n+1:
the eigenvalues could not be reordered because some eigenvalues were too close to separate (the problem is
very ill-conditioned);
i = n+2:
after reordering, roundoff changed values of some complex eigenvalues so that leading eigenvalues in the
Schur form no longer satisfy select = 1. This could also be caused by underflow due to scaling.
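
A minimal sketch of a LAPACKE_dgeesx call with sense = 'B' (which requires sort = 'S') might look as follows; the matrix values and the selection criterion are arbitrary illustrations.

#include <stdio.h>
#include <mkl.h>

/* Illustrative selection criterion: eigenvalues with negative real part. */
static lapack_logical select_neg_re(const double* wr, const double* wi)
{
    (void)wi;
    return *wr < 0.0;
}

int main(void)
{
    /* Arbitrary 3-by-3 lower triangular example matrix, column major;
       eigenvalues are -4, 3, 2. */
    double a[9] = { -4.0, 1.0, 0.0,
                     0.0, 3.0, 1.0,
                     0.0, 0.0, 2.0 };
    double wr[3], wi[3], vs[9], rconde, rcondv;
    lapack_int sdim = 0;

    /* sense = 'B' requests both condition numbers; this requires sort = 'S'. */
    lapack_int info = LAPACKE_dgeesx(LAPACK_COL_MAJOR, 'V', 'S', select_neg_re, 'B',
                                     3, a, 3, &sdim, wr, wi, vs, 3,
                                     &rconde, &rcondv);
    if (info == 0)
        printf("sdim = %d, rconde = %g, rcondv = %g\n", (int)sdim, rconde, rcondv);
    return (int)info;
}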

?geev
Computes the eigenvalues and left and right
eigenvectors of a general matrix.

Syntax
lapack_int LAPACKE_sgeev( int matrix_layout, char jobvl, char jobvr, lapack_int n,
float* a, lapack_int lda, float* wr, float* wi, float* vl, lapack_int ldvl, float* vr,
lapack_int ldvr );


lapack_int LAPACKE_dgeev( int matrix_layout, char jobvl, char jobvr, lapack_int n,


double* a, lapack_int lda, double* wr, double* wi, double* vl, lapack_int ldvl,
double* vr, lapack_int ldvr );
lapack_int LAPACKE_cgeev( int matrix_layout, char jobvl, char jobvr, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* w, lapack_complex_float*
vl, lapack_int ldvl, lapack_complex_float* vr, lapack_int ldvr );
lapack_int LAPACKE_zgeev( int matrix_layout, char jobvl, char jobvr, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* w,
lapack_complex_double* vl, lapack_int ldvl, lapack_complex_double* vr, lapack_int
ldvr );

Include Files
• mkl.h

Description

The routine computes for an n-by-n real/complex nonsymmetric matrix A, the eigenvalues and, optionally,
the left and/or right eigenvectors. The right eigenvector v of A satisfies
A*v = λ*v
where λ is its eigenvalue.
The left eigenvector u of A satisfies
u^H*A = λ*u^H
where u^H denotes the conjugate transpose of u. The computed eigenvectors are normalized to have
Euclidean norm equal to 1 and largest component real.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

jobvl Must be 'N' or 'V'.

If jobvl = 'N', then left eigenvectors of A are not computed.

If jobvl = 'V', then left eigenvectors of A are computed.

jobvr Must be 'N' or 'V'.

If jobvr = 'N', then right eigenvectors of A are not computed.

If jobvr = 'V', then right eigenvectors of A are computed.

n The order of the matrix A (n≥ 0).

a a (size at least max(1, lda*n)) is an array containing the n-by-n matrix A.

lda The leading dimension of the array a. Must be at least max(1, n).

ldvl, ldvr The leading dimensions of the output arrays vl and vr, respectively.

Constraints:
ldvl≥ 1; ldvr≥ 1.
If jobvl = 'V', ldvl≥ max(1, n);

If jobvr = 'V', ldvr≥ max(1, n).

Output Parameters

a On exit, this array is overwritten.

wr, wi Arrays, size at least max (1, n) each.

Contain the real and imaginary parts, respectively, of the computed


eigenvalues. Complex conjugate pairs of eigenvalues appear consecutively
with the eigenvalue having positive imaginary part first.

w Array, size at least max(1, n).

Contains the computed eigenvalues.

vl, vr Arrays:
vl (size at least max(1, ldvl*n)) .
If jobvl = 'N', vl is not referenced.

For real flavors:


If the j-th eigenvalue is real, the i-th component of the j-th eigenvector uj is
stored in vl[(i - 1) + (j - 1)*ldvl] for column major layout and in
vl[(i - 1)*ldvl + (j - 1)] for row major layout.
If the j-th and (j+1)-st eigenvalues form a complex conjugate pair, then for
i = sqrt(-1), the k-th component of the j-th eigenvector uj is vl[(k - 1)
+ (j - 1)*ldvl] + i*vl[(k - 1) + j*ldvl] for column major layout and
vl[(k - 1)*ldvl + (j - 1)] + i*vl[(k - 1)*ldvl + j] for row major layout.
Similarly, the k-th component of the (j+1)-th eigenvector uj+1 is vl[(k - 1) + (j -
1)*ldvl] - i*vl[(k - 1) + j*ldvl] for column major layout and vl[(k -
1)*ldvl + (j - 1)] - i*vl[(k - 1)*ldvl + j] for row major layout.

For complex flavors:


The i-th component of the j-th eigenvector uj is stored in vl[(i - 1) +
(j - 1)*ldvl] for column major layout and in vl[(i - 1)*ldvl+(j -
1)] for row major layout.
vr (size at least max(1, ldvr*n)).
If jobvr = 'N', vr is not referenced.

For real flavors:


If the j-th eigenvalue is real, then the i-th component of the j-th eigenvector vj
is stored in vr[(i - 1) + (j - 1)*ldvr] for column major layout and in
vr[(i - 1)*ldvr + (j - 1)] for row major layout.
If the j-th and (j+1)-st eigenvalues form a complex conjugate pair, then for
i = sqrt(-1), the k-th component of the j-th eigenvector vj is vr[(k - 1)
+ (j - 1)*ldvr] + i*vr[(k - 1) + j*ldvr] for column major layout and
vr[(k - 1)*ldvr + (j - 1)] + i*vr[(k - 1)*ldvr + j] for row major layout.
Similarly, the k-th component of the (j+1)-th eigenvector vj+1 is vr[(k - 1) + (j -
1)*ldvr] - i*vr[(k - 1) + j*ldvr] for column major layout and vr[(k -
1)*ldvr + (j - 1)] - i*vr[(k - 1)*ldvr + j] for row major layout.

For complex flavors:


The i-th component of the j-th eigenvector vj is stored in vr[(i - 1) + (j


- 1)*ldvr] for column major layout and in vr[(i - 1)*ldvr + (j -
1)] for row major layout.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, the QR algorithm failed to compute all the eigenvalues, and no eigenvectors have been
computed; elements i+1:n of wr and wi (for real flavors) or w (for complex flavors) contain those
eigenvalues which have converged.
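
A minimal sketch of a LAPACKE_dgeev call requesting both left and right eigenvectors of a small column major matrix; the matrix values are illustrative only.

#include <stdio.h>
#include <mkl.h>

int main(void)
{
    /* Arbitrary 2-by-2 example matrix, column major; eigenvalues are +-i. */
    double a[4] = { 0.0, -1.0,
                    1.0,  0.0 };
    double wr[2], wi[2], vl[4], vr[4];

    lapack_int info = LAPACKE_dgeev(LAPACK_COL_MAJOR, 'V', 'V', 2,
                                    a, 2, wr, wi, vl, 2, vr, 2);
    if (info == 0)
        printf("lambda_0 = %g + %g*i\n", wr[0], wi[0]);
    return (int)info;
}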

?geevx
Computes the eigenvalues and left and right
eigenvectors of a general matrix, with preliminary
matrix balancing, and computes reciprocal condition
numbers for the eigenvalues and right eigenvectors.

Syntax
lapack_int LAPACKE_sgeevx( int matrix_layout, char balanc, char jobvl, char jobvr, char
sense, lapack_int n, float* a, lapack_int lda, float* wr, float* wi, float* vl,
lapack_int ldvl, float* vr, lapack_int ldvr, lapack_int* ilo, lapack_int* ihi, float*
scale, float* abnrm, float* rconde, float* rcondv );
lapack_int LAPACKE_dgeevx( int matrix_layout, char balanc, char jobvl, char jobvr, char
sense, lapack_int n, double* a, lapack_int lda, double* wr, double* wi, double* vl,
lapack_int ldvl, double* vr, lapack_int ldvr, lapack_int* ilo, lapack_int* ihi,
double* scale, double* abnrm, double* rconde, double* rcondv );
lapack_int LAPACKE_cgeevx( int matrix_layout, char balanc, char jobvl, char jobvr, char
sense, lapack_int n, lapack_complex_float* a, lapack_int lda, lapack_complex_float* w,
lapack_complex_float* vl, lapack_int ldvl, lapack_complex_float* vr, lapack_int ldvr,
lapack_int* ilo, lapack_int* ihi, float* scale, float* abnrm, float* rconde, float*
rcondv );
lapack_int LAPACKE_zgeevx( int matrix_layout, char balanc, char jobvl, char jobvr, char
sense, lapack_int n, lapack_complex_double* a, lapack_int lda, lapack_complex_double*
w, lapack_complex_double* vl, lapack_int ldvl, lapack_complex_double* vr, lapack_int
ldvr, lapack_int* ilo, lapack_int* ihi, double* scale, double* abnrm, double* rconde,
double* rcondv );

Include Files
• mkl.h

Description

The routine computes for an n-by-n real/complex nonsymmetric matrix A, the eigenvalues and, optionally,
the left and/or right eigenvectors.
Optionally also, it computes a balancing transformation to improve the conditioning of the eigenvalues and
eigenvectors (ilo, ihi, scale, and abnrm), reciprocal condition numbers for the eigenvalues (rconde), and
reciprocal condition numbers for the right eigenvectors (rcondv).
The right eigenvector v of A satisfies

A·v = λ·v
where λ is its eigenvalue.
The left eigenvector u of A satisfies

u^H*A = λ*u^H
where u^H denotes the conjugate transpose of u. The computed eigenvectors are normalized to have Euclidean
norm equal to 1 and largest component real.
Balancing a matrix means permuting the rows and columns to make it more nearly upper triangular, and
applying a diagonal similarity transformation D*A*inv(D), where D is a diagonal matrix, to make its rows and
columns closer in norm and the condition numbers of its eigenvalues and eigenvectors smaller. The computed
reciprocal condition numbers correspond to the balanced matrix. Permuting rows and columns will not
change the condition numbers (in exact arithmetic), but diagonal scaling will. For further explanation of
balancing, see [LUG], Section 4.10.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

balanc Must be 'N', 'P', 'S', or 'B'. Indicates how the input matrix should be
diagonally scaled and/or permuted to improve the conditioning of its
eigenvalues.
If balanc = 'N', do not diagonally scale or permute;

If balanc = 'P', perform permutations to make the matrix more nearly


upper triangular. Do not diagonally scale;
If balanc = 'S', diagonally scale the matrix, i.e. replace A by
D*A*inv(D), where D is a diagonal matrix chosen to make the rows and
columns of A more equal in norm. Do not permute;
If balanc = 'B', both diagonally scale and permute A.

Computed reciprocal condition numbers will be for the matrix after


balancing and/or permuting. Permuting does not change condition numbers
(in exact arithmetic), but balancing does.

jobvl Must be 'N' or 'V'.

If jobvl = 'N', left eigenvectors of A are not computed;

If jobvl = 'V', left eigenvectors of A are computed.

If sense = 'E' or 'B', then jobvl must be 'V'.

jobvr Must be 'N' or 'V'.

If jobvr = 'N', right eigenvectors of A are not computed;

If jobvr = 'V', right eigenvectors of A are computed.

If sense = 'E' or 'B', then jobvr must be 'V'.

sense Must be 'N', 'E', 'V', or 'B'. Determines which reciprocal condition
number are computed.
If sense = 'N', none are computed;

If sense = 'E', computed for eigenvalues only;


If sense = 'V', computed for right eigenvectors only;

If sense = 'B', computed for eigenvalues and right eigenvectors.

If sense is 'E' or 'B', both left and right eigenvectors must also be
computed (jobvl = 'V' and jobvr = 'V').

n The order of the matrix A (n≥ 0).

a Arrays:
a (size at least max(1, lda*n)) is an array containing the n-by-n matrix A.

lda The leading dimension of the array a. Must be at least max(1, n).

ldvl, ldvr The leading dimensions of the output arrays vl and vr, respectively.
Constraints:
ldvl≥ 1; ldvr≥ 1.
If jobvl = 'V', ldvl≥ max(1, n);

If jobvr = 'V', ldvr≥ max(1, n).

Output Parameters

a On exit, this array is overwritten.


If jobvl = 'V' or jobvr = 'V', it contains the real-Schur/Schur form of
the balanced version of the input matrix A.

wr, wi Arrays, size at least max (1, n) each. Contain the real and imaginary parts,
respectively, of the computed eigenvalues. Complex conjugate pairs of
eigenvalues appear consecutively with the eigenvalue having positive
imaginary part first.

w Array, size at least max(1, n). Contains the computed eigenvalues.

vl, vr Arrays:
vl (size at least max(1, ldvl*n)) .
If jobvl = 'N', vl is not referenced.

For real flavors:


If the j-th eigenvalue is real, the i-th component of the j-th eigenvector uj is
stored in vl[(i - 1) + (j - 1)*ldvl] for column major layout and in
vl[(i - 1)*ldvl + (j - 1)] for row major layout.
If the j-th and (j+1)-st eigenvalues form a complex conjugate pair, then for
i = sqrt(-1), the k-th component of the j-th eigenvector uj is vl[(k - 1)
+ (j - 1)*ldvl] + i*vl[(k - 1) + j*ldvl] for column major layout and
vl[(k - 1)*ldvl + (j - 1)] + i*vl[(k - 1)*ldvl + j] for row major layout.
Similarly, the k-th component of the (j+1)-th eigenvector uj+1 is vl[(k - 1) + (j -
1)*ldvl] - i*vl[(k - 1) + j*ldvl] for column major layout and vl[(k -
1)*ldvl + (j - 1)] - i*vl[(k - 1)*ldvl + j] for row major layout.

For complex flavors:

The i-th component of the j-th eigenvector uj is stored in vl[(i - 1) +
(j - 1)*ldvl] for column major layout and in vl[(i - 1)*ldvl+(j -
1)] for row major layout.
vr (size at least max(1, ldvr*n)).
If jobvr = 'N', vr is not referenced.

For real flavors:


If the j-th eigenvalue is real, then the i-th component of the j-th eigenvector vj
is stored in vr[(i - 1) + (j - 1)*ldvr] for column major layout and in
vr[(i - 1)*ldvr + (j - 1)] for row major layout.
If the j-th and (j+1)-st eigenvalues form a complex conjugate pair, then for
i = sqrt(-1), the k-th component of the j-th eigenvector vj is vr[(k - 1)
+ (j - 1)*ldvr] + i*vr[(k - 1) + j*ldvr] for column major layout and
vr[(k - 1)*ldvr + (j - 1)] + i*vr[(k - 1)*ldvr + j] for row major layout.
Similarly, the k-th component of the (j+1)-th eigenvector vj+1 is vr[(k - 1) + (j -
1)*ldvr] - i*vr[(k - 1) + j*ldvr] for column major layout and vr[(k -
1)*ldvr + (j - 1)] - i*vr[(k - 1)*ldvr + j] for row major layout.

For complex flavors:


The i-th component of the j-th eigenvector vj is stored in vr[(i - 1) + (j
- 1)*ldvr] for column major layout and in vr[(i - 1)*ldvr + (j -
1)] for row major layout.

ilo, ihi ilo and ihi are integer values determined when A was balanced.
The balanced A(i,j) = 0 if i > j and j = 1,..., ilo-1 or i = ihi
+1,..., n.
If balanc = 'N' or 'S', ilo = 1 and ihi = n.

scale Array, size at least max(1, n). Details of the permutations and scaling
factors applied when balancing A.
If P[j - 1] is the index of the row and column interchanged with row and
column j, and D[j - 1] is the scaling factor applied to row and column j,
then
scale[j - 1] = P[j - 1], for j = 1,...,ilo-1
= D[j - 1], for j = ilo,...,ihi
= P[j - 1] for j = ihi+1,..., n.
The order in which the interchanges are made is n to ihi+1, then 1 to ilo-1.

abnrm The one-norm of the balanced matrix (the maximum of the sum of absolute
values of elements of any column).

rconde, rcondv Arrays, size at least max(1, n) each.


rconde[j - 1] is the reciprocal condition number of the j-th eigenvalue.
rcondv[j - 1] is the reciprocal condition number of the j-th right eigenvector.

Return Values
This function returns a value info.


If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, the QR algorithm failed to compute all the eigenvalues, and no eigenvectors or condition
numbers have been computed; elements 1:ilo-1 and i+1:n of wr and wi (for real flavors) or w (for complex
flavors) contain eigenvalues which have converged.
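
A minimal sketch of a LAPACKE_dgeevx call with full balancing (balanc = 'B') and both families of condition numbers (sense = 'B', which requires jobvl = jobvr = 'V'); the matrix values are illustrative only and were chosen to be badly scaled.

#include <stdio.h>
#include <mkl.h>

int main(void)
{
    /* Arbitrary, badly scaled 3-by-3 example matrix, column major. */
    double a[9] = { 1.0e-3, 2.0,   0.0,
                    1.0,    5.0e2, 4.0,
                    0.0,    1.0,   3.0 };
    double wr[3], wi[3], vl[9], vr[9];
    double scale[3], abnrm, rconde[3], rcondv[3];
    lapack_int ilo, ihi;

    /* balanc = 'B': permute and scale; sense = 'B' requires jobvl = jobvr = 'V'. */
    lapack_int info = LAPACKE_dgeevx(LAPACK_COL_MAJOR, 'B', 'V', 'V', 'B', 3,
                                     a, 3, wr, wi, vl, 3, vr, 3,
                                     &ilo, &ihi, scale, &abnrm, rconde, rcondv);
    if (info == 0)
        printf("abnrm = %g, rconde[0] = %g, rcondv[0] = %g\n",
               abnrm, rconde[0], rcondv[0]);
    return (int)info;
}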

Singular Value Decomposition: LAPACK Driver Routines


Table "Driver Routines for Singular Value Decomposition" lists the LAPACK driver routines that perform
singular value decomposition .
Driver Routines for Singular Value Decomposition
Routine Name Operation performed

?gesvd Computes the singular value decomposition of a general rectangular matrix.

?gesdd Computes the singular value decomposition of a general rectangular matrix using a
divide and conquer method.

?gejsv Computes the singular value decomposition of a real matrix using a preconditioned
Jacobi SVD method.

?gesvj Computes the singular value decomposition of a real matrix using Jacobi plane
rotations.

?ggsvd Computes the generalized singular value decomposition of a pair of general


rectangular matrices.

?gesvdx Computes the SVD and left and right singular vectors for a matrix.

?bdsvdx Computes the SVD of a bidiagonal matrix.

Singular Value Decomposition: LAPACK Computational Routines

?gesvd
Computes the singular value decomposition of a
general rectangular matrix.

Syntax
lapack_int LAPACKE_sgesvd( int matrix_layout, char jobu, char jobvt, lapack_int m,
lapack_int n, float* a, lapack_int lda, float* s, float* u, lapack_int ldu, float* vt,
lapack_int ldvt, float* superb );
lapack_int LAPACKE_dgesvd( int matrix_layout, char jobu, char jobvt, lapack_int m,
lapack_int n, double* a, lapack_int lda, double* s, double* u, lapack_int ldu, double*
vt, lapack_int ldvt, double* superb );
lapack_int LAPACKE_cgesvd( int matrix_layout, char jobu, char jobvt, lapack_int m,
lapack_int n, lapack_complex_float* a, lapack_int lda, float* s, lapack_complex_float*
u, lapack_int ldu, lapack_complex_float* vt, lapack_int ldvt, float* superb );
lapack_int LAPACKE_zgesvd( int matrix_layout, char jobu, char jobvt, lapack_int m,
lapack_int n, lapack_complex_double* a, lapack_int lda, double* s,
lapack_complex_double* u, lapack_int ldu, lapack_complex_double* vt, lapack_int ldvt,
double* superb );

Include Files
• mkl.h

Description

The routine computes the singular value decomposition (SVD) of a real/complex m-by-n matrix A, optionally
computing the left and/or right singular vectors. The SVD is written as
A = U*Σ*V^T for real routines
A = U*Σ*V^H for complex routines
where Σ is an m-by-n matrix which is zero except for its min(m,n) diagonal elements, U is an m-by-m
orthogonal/unitary matrix, and V is an n-by-n orthogonal/unitary matrix. The diagonal elements of Σ are the
singular values of A; they are real and non-negative, and are returned in descending order. The first min(m,
n) columns of U and V are the left and right singular vectors of A.
Note that the routine returns V^T (for real flavors) or V^H (for complex flavors), not V.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

jobu Must be 'A', 'S', 'O', or 'N'. Specifies options for computing all or part
of the matrix U.
If jobu = 'A', all m columns of U are returned in the array u;

if jobu = 'S', the first min(m, n) columns of U (the left singular vectors)
are returned in the array u;
if jobu = 'O', the first min(m, n) columns of U (the left singular vectors)
are overwritten on the array a;
if jobu = 'N', no columns of U (no left singular vectors) are computed.

jobvt Must be 'A', 'S', 'O', or 'N'. Specifies options for computing all or part
of the matrix VT/VH.
If jobvt = 'A', all n rows of VT/VH are returned in the array vt;

if jobvt = 'S', the first min(m,n) rows of VT/VH (the right singular
vectors) are returned in the array vt;
if jobvt = 'O', the first min(m,n) rows of VT/VH) (the right singular
vectors) are overwritten on the array a;
if jobvt = 'N', no rows of VT/VH (no right singular vectors) are computed.

jobvt and jobu cannot both be 'O'.

m The number of rows of the matrix A (m≥ 0).

n The number of columns in A (n≥ 0).

a Arrays:
a(size at least max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout) is an array containing the m-by-n matrix A.

lda The leading dimension of the array a.


Must be at least max(1, m) for column major layout and at least max(1, n)
for row major layout .


ldu, ldvt The leading dimensions of the output arrays u and vt, respectively.
Constraints:
ldu≥ 1; ldvt≥ 1.
If jobu = 'A', ldu≥m;

If jobu = 'S', ldu≥m for column major layout and ldu≥ min(m, n) for row
major layout;
If jobvt = 'A', ldvt≥n;

If jobvt = 'S', ldvt≥ min(m, n) for column major layout and ldvt≥n for
row major layout .

Output Parameters

a On exit,
If jobu = 'O', a is overwritten with the first min(m,n) columns of U (the
left singular vectors stored columnwise);
If jobvt = 'O', a is overwritten with the first min(m, n) rows of VT/VH (the
right singular vectors stored rowwise);
If jobu≠'O' and jobvt≠'O', the contents of a are destroyed.

s Array, size at least max(1, min(m,n)). Contains the singular values of A


sorted so that s[i] ≥ s[i + 1].

u, vt Arrays:
Array u minimum size:

                    Column major layout           Row major layout
jobu = 'A'          max(1, ldu*m)                 max(1, ldu*m)
jobu = 'S'          max(1, ldu*min(m, n))         max(1, ldu*m)

If jobu = 'A', u contains the m-by-m orthogonal/unitary matrix U.

If jobu = 'S', u contains the first min(m, n) columns of U (the left


singular vectors stored column-wise).
If jobu = 'N' or 'O', u is not referenced.

Array vt minimum size:

                    Column major layout           Row major layout
jobvt = 'A'         max(1, ldvt*n)                max(1, ldvt*n)
jobvt = 'S'         max(1, ldvt*min(m, n))        max(1, ldvt*n)

If jobvt = 'A', vt contains the n-by-n orthogonal/unitary matrix VT/VH.

If jobvt = 'S', vt contains the first min(m, n) rows of VT/VH (the right
singular vectors stored row-wise).

If jobvt = 'N' or 'O', vt is not referenced.

superb If ?bdsqr does not converge (indicated by the return value info > 0), on
exit superb(0:min(m,n)-2) contains the unconverged superdiagonal
elements of an upper bidiagonal matrix B whose diagonal is in s (not
necessarily sorted). B satisfies A = U*B*V^T (real flavors) or A = U*B*V^H
(complex flavors), so it has the same singular values as A, and singular
vectors related by u and vt.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, then if ?bdsqr did not converge, i specifies how many superdiagonals of the intermediate
bidiagonal form B did not converge to zero (see the description of the superb parameter for details).
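
A minimal sketch of a LAPACKE_dgesvd call computing a full SVD of a 4-by-3 column major matrix; the matrix values are illustrative only, and superb must hold at least min(m, n) - 1 elements.

#include <stdio.h>
#include <mkl.h>

int main(void)
{
    /* Arbitrary 4-by-3 example matrix, column major (lda = m = 4). */
    double a[12] = { 1.0, 2.0, 3.0, 4.0,     /* column 1 */
                     2.0, 1.0, 0.0, 1.0,     /* column 2 */
                     0.0, 1.0, 1.0, 2.0 };   /* column 3 */
    double s[3], u[16], vt[9], superb[2];    /* superb holds min(m,n)-1 elements */

    lapack_int info = LAPACKE_dgesvd(LAPACK_COL_MAJOR, 'A', 'A', 4, 3,
                                     a, 4, s, u, 4, vt, 3, superb);
    if (info == 0)
        printf("largest singular value = %g\n", s[0]);
    return (int)info;
}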

?gesdd
Computes the singular value decomposition of a
general rectangular matrix using a divide and conquer
method.

Syntax
lapack_int LAPACKE_sgesdd( int matrix_layout, char jobz, lapack_int m, lapack_int n,
float* a, lapack_int lda, float* s, float* u, lapack_int ldu, float* vt, lapack_int
ldvt );
lapack_int LAPACKE_dgesdd( int matrix_layout, char jobz, lapack_int m, lapack_int n,
double* a, lapack_int lda, double* s, double* u, lapack_int ldu, double* vt,
lapack_int ldvt );
lapack_int LAPACKE_cgesdd( int matrix_layout, char jobz, lapack_int m, lapack_int n,
lapack_complex_float* a, lapack_int lda, float* s, lapack_complex_float* u, lapack_int
ldu, lapack_complex_float* vt, lapack_int ldvt );
lapack_int LAPACKE_zgesdd( int matrix_layout, char jobz, lapack_int m, lapack_int n,
lapack_complex_double* a, lapack_int lda, double* s, lapack_complex_double* u,
lapack_int ldu, lapack_complex_double* vt, lapack_int ldvt );

Include Files
• mkl.h

Description

The routine computes the singular value decomposition (SVD) of a real/complex m-by-n matrix A, optionally
computing the left and/or right singular vectors.
If singular vectors are desired, it uses a divide-and-conquer algorithm. The SVD is written
A = U*Σ*V^T for real routines,
A = U*Σ*V^H for complex routines,
where Σ is an m-by-n matrix which is zero except for its min(m,n) diagonal elements, U is an m-by-m
orthogonal/unitary matrix, and V is an n-by-n orthogonal/unitary matrix. The diagonal elements of Σ are the
singular values of A; they are real and non-negative, and are returned in descending order. The first min(m,
n) columns of U and V are the left and right singular vectors of A.


Note that the routine returns vt = V^T (for real flavors) or vt = V^H (for complex flavors), not V.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

jobz Must be 'A', 'S', 'O', or 'N'.

Specifies options for computing all or part of the matrices U and V.


If jobz = 'A', all m columns of U and all n rows of VT or VH are returned
in the arrays u and vt;
if jobz = 'S', the first min(m, n) columns of U and the first min(m, n)
rows of VT or VH are returned in the arrays u and vt;
if jobz = 'O', then

if m≥ n, the first n columns of U are overwritten in the array a and all rows
of VT or VH are returned in the array vt;
if m<n, all columns of U are returned in the array u and the first m rows of
VT or VH are overwritten in the array a;
if jobz = 'N', no columns of U or rows of VT or VH are computed.

m The number of rows of the matrix A (m≥ 0).

n The number of columns in A (n≥ 0).

a a(size max(1, lda*n) for column major layout and max(1, lda*m) for row
major layout) is an array containing the m-by-n matrix A.

lda The leading dimension of the array a. Must be at least max(1, m) for
column major layout and at least max(1, n) for row major layout.

ldu, ldvt The leading dimensions of the output arrays u and vt, respectively.
The minimum size of ldu is:

jobz        m≥n                                       m<n
'N'         1                                         1
'A'         m                                         m
'S'         m for column major layout;                m
            n for row major layout
'O'         1                                         m

The minimum size of ldvt is:

jobz        m≥n         m<n
'N'         1           1
'A'         n           n
'S'         n           m for column major layout;
                        n for row major layout
'O'         n           1

Output Parameters

a On exit:
If jobz = 'O', then if m≥ n, a is overwritten with the first n columns of U
(the left singular vectors, stored columnwise). If m < n, a is overwritten
with the first m rows of VT (the right singular vectors, stored rowwise);
If jobz≠'O', the contents of a are destroyed.

s Array, size at least max(1, min(m,n)). Contains the singular values of A


sorted so that s(i) ≥ s(i+1).

u, vt Arrays:
Array u is of size:

jobz        m≥n                                       m<n
'N'         1                                         1
'A'         max(1, ldu*m)                             max(1, ldu*m)
'S'         max(1, ldu*n) for column major layout;    max(1, ldu*m)
            max(1, ldu*m) for row major layout
'O'         1                                         max(1, ldu*m)

If jobz = 'A' or jobz = 'O' and m < n, u contains the m-by-m


orthogonal/unitary matrix U.
If jobz = 'S', u contains the first min(m, n) columns of U (the left
singular vectors, stored columnwise).
If jobz = 'O' and m≥n, or jobz = 'N', u is not referenced.

Array vt is of size:

jobz        m≥n                     m<n
'N'         1                       1
'A'         max(1, ldvt*n)          max(1, ldvt*n)
'S'         max(1, ldvt*n)          max(1, ldvt*n) for column major layout;
                                    max(1, ldvt*m) for row major layout
'O'         max(1, ldvt*n)          1

If jobz = 'A' or jobz = 'O' and m≥n, vt contains the n-by-n orthogonal/
unitary matrix VT.
If jobz = 'S', vt contains the first min(m, n) rows of VT (the right singular
vectors, stored rowwise).
If jobz = 'O' and m < n, or jobz = 'N', vt is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, then ?bdsdc did not converge, updating process failed.
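
A minimal sketch of a LAPACKE_dgesdd call with jobz = 'S' (economy-size U and V^T) on a 4-by-3 column major matrix; the matrix values are illustrative only.

#include <stdio.h>
#include <mkl.h>

int main(void)
{
    /* Arbitrary 4-by-3 example matrix, column major (lda = m = 4). */
    double a[12] = { 1.0, 0.0, 2.0, 1.0,
                     0.0, 3.0, 1.0, 1.0,
                     2.0, 1.0, 0.0, 1.0 };
    double s[3], u[12], vt[9];               /* jobz = 'S': economy-size factors */

    lapack_int info = LAPACKE_dgesdd(LAPACK_COL_MAJOR, 'S', 4, 3,
                                     a, 4, s, u, 4, vt, 3);
    if (info == 0)
        printf("singular values: %g %g %g\n", s[0], s[1], s[2]);
    return (int)info;
}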

?gejsv
Computes the singular value decomposition using a
preconditioned Jacobi SVD method.

Syntax
lapack_int LAPACKE_sgejsv (int matrix_layout, char joba, char jobu, char jobv, char
jobr, char jobt, char jobp, lapack_int m, lapack_int n, float * a, lapack_int lda,
float * sva, float * u, lapack_int ldu, float * v, lapack_int ldv, float * stat,
lapack_int * istat);
lapack_int LAPACKE_dgejsv (int matrix_layout, char joba, char jobu, char jobv, char
jobr, char jobt, char jobp, lapack_int m, lapack_int n, double * a, lapack_int lda,
double * sva, double * u, lapack_int ldu, double * v, lapack_int ldv, double * stat,
lapack_int * istat);
lapack_int LAPACKE_cgejsv (int matrix_layout, char joba, char jobu, char jobv, char
jobr, char jobt, char jobp, lapack_int m, lapack_int n, lapack_complex_float * a,
lapack_int lda, float * sva, lapack_complex_float * u, lapack_int ldu,
lapack_complex_float * v, lapack_int ldv, float * stat, lapack_int * istat);
lapack_int LAPACKE_zgejsv (int matrix_layout, char joba, char jobu, char jobv, char
jobr, char jobt, char jobp, lapack_int m, lapack_int n, lapack_complex_double * a,
lapack_int lda, double * sva, lapack_complex_double * u, lapack_int ldu,
lapack_complex_double * v, lapack_int ldv, double * stat, lapack_int * istat);

Include Files
• mkl.h

Description
The routine computes the singular value decomposition (SVD) of a real/complex m-by-n matrix A, where m≥n.

The SVD is written as


A = U*Σ*V^T, for real routines
A = U*Σ*V^H, for complex routines

where Σ is an m-by-n matrix which is zero except for its n diagonal elements, U is an m-by-n (or m-by-m)
orthonormal matrix, and V is an n-by-n orthogonal matrix. The diagonal elements of Σ are the singular values
of A; the columns of U and V are the left and right singular vectors of A, respectively. The matrices U and V
are computed and stored in the arrays u and v, respectively. The diagonal of Σ is computed and stored in the
array sva.

The ?gejsv routine can sometimes compute tiny singular values and their singular vectors much more
accurately than other SVD routines.
The routine implements a preconditioned Jacobi SVD algorithm. It uses ?geqp3, ?geqrf, and ?gelqf as
preprocessors and preconditioners. Optionally, an additional row pivoting can be used as a preprocessor,
which in some cases results in much higher accuracy. An example is matrix A with the structure A = D1 * C
* D2, where D1, D2 are arbitrarily ill-conditioned diagonal matrices and C is a well-conditioned matrix. In that
case, complete pivoting in the first QR factorizations provides accuracy dependent on the condition number
of C, and independent of D1, D2. Such higher accuracy is not completely understood theoretically, but it
works well in practice.
If A can be written as A = B*D, with well-conditioned B and some diagonal D, then the high accuracy is
guaranteed, both theoretically and in software, independent of D. For more details see [Drmac08-1],
[Drmac08-2].
The computational range for the singular values can be the full range ( UNDERFLOW,OVERFLOW ), provided
that the machine arithmetic and the BLAS and LAPACK routines called by ?gejsv are implemented to work in
that range. If that is not the case, the restriction for safe computation with the singular values in the range
of normalized IEEE numbers is that the spectral condition number kappa(A)=sigma_max(A)/sigma_min(A)
does not overflow. This code (?gejsv) is best used in this restricted range, meaning that singular values of
magnitude below ||A||_2 / slamch('O') (for single precision) or ||A||_2 / dlamch('O') (for double
precision) are returned as zeros. See jobr for details on this.

This implementation is slower than the one described in [Drmac08-1], [Drmac08-2] due to replacement of
some non-LAPACK components, and because the choice of some tuning parameters in the iterative part (?
gesvj) is left to the implementer on a particular machine.
The rank revealing QR factorization (in this code: ?geqp3) should be implemented as in [Drmac08-3].

If m is much larger than n, it is obvious that the initial QRF with column pivoting can be preprocessed by the
QRF without pivoting. That well known trick is not used in ?gejsv because in some cases heavy row
weighting can be treated with complete pivoting. The overhead in cases m much larger than n is then only
due to pivoting, but the benefits in accuracy have prevailed. You can incorporate this extra QRF step easily
and also improve data movement (matrix transpose, matrix copy, matrix transposed copy) - this
implementation of ?gejsv uses only the simplest, naive data movement.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804


Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

joba Must be 'C', 'E', 'F', 'G', 'A', or 'R'.

Specifies the level of accuracy:


If joba = 'C', high relative accuracy is achieved if A = B*D with well-
conditioned B and arbitrary diagonal matrix D. The accuracy cannot be
spoiled by column scaling. The accuracy of the computed output depends
on the condition of B, and the procedure aims at the best theoretical
accuracy. The relative error max_{i=1:N}|d sigma_i| / sigma_i is
bounded by f(M,N)*epsilon* cond(B), independent of D. The input
matrix is preprocessed with the QRF with column pivoting. This initial
preprocessing and preconditioning by a rank revealing QR factorization is
common for all values of joba. Additional actions are specified as follows:

If joba = 'E', computation as with 'C' with an additional estimate of the


condition number of B. It provides a realistic error bound.
If joba = 'F', accuracy higher than in the 'C' option is achieved, if A =
D1*C*D2 with ill-conditioned diagonal scalings D1, D2, and a well-
conditioned matrix C. This option is advisable, if the structure of the input
matrix is not known and relative accuracy is desirable. The input matrix A is
preprocessed with QR factorization with full (row and column) pivoting.
If joba = 'G', computation as with 'F' with an additional estimate of the
condition number of B, where A = B*D. If A has heavily weighted rows,
using this condition number gives too pessimistic error bound.
If joba = 'A', small singular values are the noise and the matrix is treated
as numerically rank deficient. The error in the computed singular values is
bounded by f(m,n)*epsilon*||A||. The computed SVD A = U*S*V**t
(for real flavors) or A = U*S*V**H (for complex flavors) restores A up to
f(m,n)*epsilon*||A||. This enables the procedure to set all singular
values below n*epsilon*||A|| to zero.

If joba = 'R', the procedure is similar to the 'A' option. Rank revealing
property of the initial QR factorization is used to reveal (using triangular
factor) a gap sigma_{r+1} < epsilon * sigma_r, in which case the
numerical rank is declared to be r. The SVD is computed with absolute error
bounds, but more accurately than with 'A'.

jobu Must be 'U', 'F', 'W', or 'N'.

Specifies whether to compute the columns of the matrix U:


If jobu = 'U', n columns of U are returned in the array u

If jobu = 'F', a full set of m left singular vectors is returned in the array u.

If jobu = 'W', u may be used as workspace of length m*n. See the


description of u.

If jobu = 'N', u is not computed.

jobv Must be 'V', 'J', 'W', or 'N'.

Specifies whether to compute the matrix V:

If jobv = 'V', n columns of V are returned in the array v; Jacobi rotations
are not explicitly accumulated.
If jobv = 'J', n columns of V are returned in the array v but they are
computed as the product of Jacobi rotations. This option is allowed only if
jobu≠'N'
If jobv = 'W', v may be used as workspace of length n*n. See the
description of v.

If jobv = 'N', v is not computed.

jobr Must be 'N' or 'R'.

Specifies the range for the singular values. If small positive singular values
are outside the specified range, they may be set to zero. If A is scaled so
that the largest singular value of the scaled matrix is around sqrt(big),
big = ?lamch('O'), the function can remove columns of A whose norm in
the scaled matrix is less than sqrt(?lamch('S')) (for jobr = 'R'), or
less than small = ?lamch('S')/?lamch('E').

If jobr = 'N', the function does not remove small columns of the scaled
matrix. This option assumes that BLAS and QR factorizations and triangular
solvers are implemented to work in that range. If the condition number of A is
greater than big, use ?gesvj.

If jobr = 'R', restricted range for singular values of the scaled matrix A is
[sqrt(?lamch('S'), sqrt(big)], roughly as described above. This
option is recommended.
For computing the singular values in the full range [?lamch('S'),big],
use ?gesvj.

jobt Must be 'T' or 'N'.

If the matrix is square, the procedure may determine to use a transposed A


if AT (for real flavors) or AH (for complex flavors) seems to be better with
respect to convergence. If the matrix is not square, jobt is ignored. This is
subject to changes in the future.
The decision is based on two values of entropy over the adjoint orbit of AT *
A (for real flavors) or AH * A (for complex flavors). See the descriptions of
stat[5] and stat[6].
If jobt = 'T', the function performs transposition if the entropy test
indicates possibly faster convergence of the Jacobi process, if A is taken as
input. If A is replaced with AT or AH, the row pivoting is included
automatically.
If jobt = 'N', the function attempts no speculations. This option can be
used to compute only the singular values, or the full SVD (u, sigma, and v).
For only one set of singular vectors (u or v), the caller should provide both
u and v, as one of the arrays is used as workspace if the matrix A is
transposed. The implementer can easily remove this constraint and make
the code more complicated. See the descriptions of u and v.

jobp Must be 'P' or 'N'.


Enables structured perturbations of denormalized numbers. This option


should be active if the denormals are poorly implemented, causing slow
computation, especially in cases of fast convergence. For details, see
[Drmac08-1], [Drmac08-2] . For simplicity, such perturbations are included
only when the full SVD or only the singular values are requested. You can
add the perturbation for the cases of computing one set of singular vectors.
If jobp = 'P', the function introduces perturbation.

If jobp = 'N', the function introduces no perturbation.

m The number of rows of the input matrix A; m≥ 0.

n The number of columns in the input matrix A; m≥n≥ 0.

a, u, v Array a(size lda*n for column major layout and lda*m for row major
layout) is an array containing the m-by-n matrix A.

u is a workspace array, its size for column major layout is ldu*n for
jobu='U' or 'W' and ldu*m for jobu='F'; for row major layout its size is at
least ldu*m. When jobt = 'T' and m = n, u must be provided even though
jobu = 'N'.
v is a workspace array, its size is ldv*n. When jobt = 'T' and m = n, v
must be provided even though jobv = 'N'.

lda The leading dimension of the array a. Must be at least max(1, m) for
column major layout and at least max(1, n) for row major layout .

sva sva is a workspace array, its size is n.

ldu The leading dimension of the array u; ldu≥ 1.

If jobu = 'U', 'F', or 'W', ldu≥ m for column major layout; for row major
layout, ldu≥ n if jobu = 'U' or jobu = 'W', and ldu≥ m if jobu = 'F'.

ldv The leading dimension of the array v; ldv≥ 1.

jobv = 'V' or 'J' or 'W', ldv≥n.

cwork cwork is a workspace array, dimension is at least lwork.

rwork rwork is an array, dimension is at least max(7, lrwork).

Output Parameters

sva On exit:
For stat[0]/stat[1] = 1: the singular values of A. During the
computation sva contains Euclidean column norms of the iterated matrices
in the array a.

For stat[0]≠stat[1]: the singular values of A are (stat[0]/stat[1]) *


sva[0:n - 1]. This factored form is used if sigma_max(A) overflows or if
small singular values have been saved from underflow by scaling the input
matrix A.

If jobr = 'R', some of the singular values may be returned as exact zeros
obtained by 'setting to zero' because they are below the numerical rank
threshold or are denormalized numbers.

u On exit:
If jobu = 'U', contains the m-by-n matrix of the left singular vectors.

If jobu = 'F', contains the m-by-m matrix of the left singular vectors,
including an orthonormal basis of the orthogonal complement of the range
of A.
If jobu = 'W' and jobv = 'V', jobt = 'T', and m = n, then u is used
as workspace if the procedure replaces A with AT (for real flavors) or AH (for
complex flavors). In that case, v is computed in u as left singular vectors of
AT or AH and copied back to the v array. This 'W' option is just a reminder
to the caller that in this case u is reserved as workspace of length n*n.

If jobu = 'N', u is not referenced.

v On exit:
If jobv = 'V' or 'J', contains the n-by-n matrix of the right singular
vectors.
If jobv = 'W' and jobu = 'U', jobt = 'T', and m = n, then v is used
as workspace if the procedure replaces A with AT (for real flavors) or AH (for
complex flavors). In that case, u is computed in v as right singular vectors
of AT or AH and copied back to the u array. This 'W' option is just a
reminder to the caller that in this case v is reserved as workspace of length
n*n.
If jobv = 'N', v is not referenced.

stat On exit,
stat[0] = scale = stat[1]/stat[0] is the scaling factor such that
scale*sva(1:n) are the computed singular values of A. See the
description of sva.

stat[1] = see the description of stat[0].


stat[2] = sconda is an estimate for the condition number of column
equilibrated A. If joba = 'E' or 'G', sconda is an estimate of
sqrt(||(R^T * R)^(-1)||_1). It is computed using ?pocon. It holds
n^(-1/4) * sconda ≤ ||R^(-1)||_2 ≤ n^(1/4) * sconda, where R is the triangular factor from the QRF of
A. However, if R is truncated and the numerical rank is determined to be
strictly smaller than n, sconda is returned as -1, indicating that the smallest
singular values might be lost.
If full SVD is needed, the following two condition numbers are useful for the
analysis of the algorithm. They are provided for a user who is familiar with
the details of the method.
stat[3] = an estimate of the scaled condition number of the triangular
factor in the first QR factorization.
stat[4] = an estimate of the scaled condition number of the triangular
factor in the second QR factorization.


The following two parameters are computed if jobt = 'T'. They are
provided for a user who is familiar with the details of the method.
stat[5] = the entropy of A^T*A: this is the Shannon entropy of
diag(A^T*A) / trace(A^T*A) taken as a point in the probability simplex.
stat[6] = the entropy of A*A^T.

istat On exit,
istat[0] = the numerical rank determined after the initial QR factorization
with pivoting. See the descriptions of joba and jobr.

istat[1] = the number of the computed nonzero singular values.


istat[2] = if nonzero, a warning message. If istat[2]=1, some of the
column norms of A were denormalized floats. The requested high accuracy
is not warranted by the data.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, the function did not converge in the maximal number of sweeps. The computed values may be
inaccurate.

See Also
?geqp3
?geqrf
?gelqf
?gesvj
?lamch
?pocon
?ormlq
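
A minimal sketch of a LAPACKE_dgejsv call with commonly used job settings (joba = 'C', jobr = 'R', no transposition test, no perturbation); the matrix values are illustrative only, and stat/istat are sized for the outputs described above.

#include <stdio.h>
#include <mkl.h>

int main(void)
{
    /* Arbitrary 4-by-3 example matrix (m >= n), column major (lda = m = 4). */
    double a[12] = { 1.0, 2.0, 0.0, 1.0,
                     0.0, 1.0, 3.0, 1.0,
                     2.0, 0.0, 1.0, 1.0 };
    double sva[3], u[12], v[9], stat[7];
    lapack_int istat[3];

    /* joba = 'C': high relative accuracy for A = B*D; jobr = 'R': recommended range;
       jobt = 'N' and jobp = 'N' disable the transposition test and perturbations. */
    lapack_int info = LAPACKE_dgejsv(LAPACK_COL_MAJOR, 'C', 'U', 'V', 'R', 'N', 'N',
                                     4, 3, a, 4, sva, u, 4, v, 3, stat, istat);
    if (info == 0)
        printf("sva[0] = %g (see stat for scaling), numerical rank = %d\n",
               sva[0], (int)istat[0]);
    return (int)info;
}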

?gesvj
Computes the singular value decomposition of a real
matrix using Jacobi plane rotations.

Syntax
lapack_int LAPACKE_sgesvj (int matrix_layout, char joba, char jobu, char jobv,
lapack_int m, lapack_int n, float * a, lapack_int lda, float * sva, lapack_int mv,
float * v, lapack_int ldv, float * stat);
lapack_int LAPACKE_dgesvj (int matrix_layout, char joba, char jobu, char jobv,
lapack_int m, lapack_int n, double * a, lapack_int lda, double * sva, lapack_int mv,
double * v, lapack_int ldv, double * stat);
lapack_int LAPACKE_cgesvj (int matrix_layout, char joba, char jobu, char jobv,
lapack_int m, lapack_int n, lapack_complex_float * a, lapack_int lda, float * sva,
lapack_int mv, lapack_complex_float * v, lapack_int ldv, float * stat);
lapack_int LAPACKE_zgesvj (int matrix_layout, char joba, char jobu, char jobv,
lapack_int m, lapack_int n, lapack_complex_double * a, lapack_int lda, double * sva,
lapack_int mv, lapack_complex_double * v, lapack_int ldv, double * stat);

Include Files
• mkl.h

Description
The routine computes the singular value decomposition (SVD) of a real or complex m-by-n matrix A, where
m≥n.
The SVD of A is written as
A = U*Σ*V^T for real flavors, or
A = U*Σ*V^H for complex flavors,
where Σ is an m-by-n diagonal matrix, U is an m-by-n orthonormal matrix, and V is an n-by-n orthogonal/
unitary matrix. The diagonal elements of Σ are the singular values of A; the columns of U and V are the left
and right singular vectors of A, respectively. The matrices U and V are computed and stored in the arrays u
and v, respectively. The diagonal of Σ is computed and stored in the array sva.

The ?gesvj routine can sometimes compute tiny singular values and their singular vectors much more
accurately than other SVD routines.
The n-by-n orthogonal matrix V is obtained as a product of Jacobi plane rotations. The rotations are
implemented as fast scaled rotations of Anda and Park [AndaPark94]. In the case of underflow of the Jacobi
angle, a modified Jacobi transformation of Drmac ([Drmac08-4]) is used. Pivot strategy uses column
interchanges of de Rijk ([deRijk98]). The relative accuracy of the computed singular values and the accuracy
of the computed singular vectors (in angle metric) is as guaranteed by the theory of Demmel and Veselic
[Demmel92]. The condition number that determines the accuracy in the full rank case is essentially

where κ(.) is the spectral condition number. The best performance of this Jacobi SVD procedure is achieved if
used in an accelerated version of Drmac and Veselic [Drmac08-1], [Drmac08-2].
The computational range for the nonzero singular values is the machine number interval
( UNDERFLOW,OVERFLOW ). In extreme cases, even denormalized singular values can be computed with the
corresponding gradual loss of accurate digits.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

joba Must be 'L', 'U' or 'G'.

Specifies the structure of A:


If joba = 'L', the input matrix A is lower triangular.

If joba = 'U', the input matrix A is upper triangular.

If joba = 'G', the input matrix A is a general m-by-n, m≥n.

jobu Must be 'U', 'C' or 'N'.

Specifies whether to compute the left singular vectors (columns of U):


If jobu = 'U', the left singular vectors corresponding to the nonzero


singular values are computed and returned in the leading columns of A. See
more details in the description of a. The default numerical orthogonality
threshold is set to approximately TOL=CTOL*EPS, CTOL=sqrt(m), EPS
= ?lamch('E')
If jobu = 'C', analogous to jobu = 'U', except that you can control the
level of numerical orthogonality of the computed left singular vectors. TOL
can be set to TOL=CTOL*EPS, where CTOL is given on input in the array
stat. No CTOL smaller than ONE is allowed. CTOL greater than 1 / EPS is
meaningless. The option 'C' can be used if m*EPS is satisfactory
orthogonality of the computed left singular vectors, so CTOL=m could save a
few sweeps of Jacobi rotations. See the descriptions of a and stat[0].

If jobu = 'N', u is not computed. However, see the description of a.

jobv Must be 'V', 'A' or 'N'.

Specifies whether to compute the right singular vectors, that is, the matrix
V:
If jobv = 'V', the matrix V is computed and returned in the array v.

If jobv = 'A', the Jacobi rotations are applied to the mv-by-n array v. In
other words, the right singular vector matrix V is not computed explicitly;
instead, it is applied to an mv-by-n matrix initially stored in the first mv rows
of V.
If jobv = 'N', the matrix V is not computed and the array v is not
referenced.

m The number of rows of the input matrix A.


1/slamch('E')> m≥ 0 for sgesvj.
1/dlamch('E')> m≥ 0 for dgesvj.

n The number of columns in the input matrix A; m≥n≥ 0.

a, v Array a (size at least lda*n for column major layout and lda*m for row
major layout) is an array containing the m-by-n matrix A.

Array v (size at least max(1, ldv*n)) contains, if jobv = 'A', the mv-by-n
matrix to be post-multiplied by Jacobi rotations.

lda The leading dimension of the array a. Must be at least max(1, m) for
column major layout and at least max(1, n) for row major layout .

mv If jobv = 'A', the product of Jacobi rotations in ?gesvj is applied to the


first mv rows of v. See the description of jobv. 0 ≤mv≤ldv.

ldv The leading dimension of the array v; ldv≥ 1.

jobv = 'V', ldv≥ max(1, n).


jobv = 'A', ldv≥ max(1, mv) for column major layout and ldv≥ max(1,
n) for row major layout.

stat Array size 6. If jobu = 'C', stat[0] = CTOL, where CTOL defines the
threshold for convergence. The process stops if all columns of A are
mutually orthogonal up to CTOL*EPS, where EPS = ?lamch('E'). It is
required that CTOL≥ 1 - that is, it is not allowed to force the routine to
obtain orthogonality below ε.

Output Parameters

a On exit:
If jobu = 'U' or jobu = 'C':

• if info = 0, the leading columns of A contain left singular vectors


corresponding to the computed singular values of a that are above the
underflow threshold ?lamch('S'), that is, non-zero singular values. The
number of the computed non-zero singular values is returned in
stat[1]. Also see the descriptions of sva and stat. The computed
columns of u are mutually numerically orthogonal up to approximately
TOL=sqrt(m)*EPS (default); or TOL=CTOL*EPS if jobu = 'C', see the
description of jobu.
• if info > 0, the procedure ?gesvj did not converge in the given
number of iterations (sweeps). In that case, the computed columns of u
may not be orthogonal up to TOL. The output u (stored in a), sigma
(given by the computed singular values in sva(1:n)) and v is still a
decomposition of the input matrix A in the sense that the residual
||A - scale*U*sigma*V^T||_2 / ||A||_2 for real flavors or
||A - scale*U*sigma*V^H||_2 / ||A||_2 for complex flavors (where scale =
stat[0]) is small.

If jobu = 'N':

• if info = 0, note that the left singular vectors are 'for free' in the one-
sided Jacobi SVD algorithm. However, if only the singular values are
needed, the level of numerical orthogonality of u is not an issue and
iterations are stopped when the columns of the iterated matrix are
numerically orthogonal up to approximately m*EPS. Thus, on exit, a
contains the columns of u scaled with the corresponding singular values.
• if info > 0, the procedure ?gesvj did not converge in the given
number of iterations (sweeps).

sva Array size n.

If info = 0, depending on the value scale =stat[0], where scale is the


scaling factor:

• if scale = 1, sva[0:n - 1] contains the computed singular values of


a.
• if scale≠ 1, the singular values of a are scale*sva(1:n), and this
factored representation is due to the fact that some of the singular
values of a might underflow or overflow.

If info > 0, the procedure ?gesvj did not converge in the given number
of iterations (sweeps) and scale*sva(1:n) may not be accurate.

v On exit:


If jobv = 'V', contains the n-by-n matrix of the right singular vectors.

If jobv = 'A', then v contains the product of the computed right singular
vector matrix and the initial matrix in the array v.

If jobv = 'N', v is not referenced.

stat On exit,
stat[0] = scale is the scaling factor such that scale*sva(1:n) are the
computed singular values of A. See the description of sva.

stat[1] is the number of the computed nonzero singular values.


stat[2] is the number of the computed singular values that are larger than
the underflow threshold.
stat[3] is the number of sweeps of Jacobi rotations needed for numerical
convergence.
stat[4] = max_{i≠j} |COS(A(:,i),A(:,j))| in the last sweep. This is
useful information in cases when ?gesvj did not converge, as it can be
used to estimate whether the output is still useful and for post festum
analysis.
stat[5] is the largest absolute value over all sines of the Jacobi rotation
angles in the last sweep. It can be useful in a post festum analysis.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, the function did not converge in the maximal number (30) of sweeps. The output may still be
useful. See the description of stat.
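
The following minimal sketch is not part of the original reference; it illustrates a typical call to LAPACKE_dgesvj (using the prototype shown in the Syntax section above, with joba = 'G' for a general matrix). The matrix entries and dimensions are purely illustrative, and error handling is reduced to checking info.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Illustrative 4-by-3 matrix stored in column major order (m >= n is required). */
    lapack_int m = 4, n = 3, lda = 4, ldv = 3, mv = 0;
    double a[] = { 2.0, 0.0, 0.0, 1.0,     /* column 1 */
                   0.0, 3.0, 0.0, 1.0,     /* column 2 */
                   0.0, 0.0, 4.0, 1.0 };   /* column 3 */
    double sva[3], v[3 * 3], stat[6];

    lapack_int info = LAPACKE_dgesvj(LAPACK_COL_MAJOR, 'G', 'U', 'V',
                                     m, n, a, lda, sva, mv, v, ldv, stat);
    if (info != 0) {
        printf("dgesvj failed, info = %d\n", (int)info);
        return 1;
    }

    /* stat[0] is the scaling factor: the singular values are stat[0]*sva[i]. */
    for (lapack_int i = 0; i < n; ++i)
        printf("sigma[%d] = %g\n", (int)i, stat[0] * sva[i]);
    return 0;
}

On exit, a is overwritten by the computed left singular vectors (since jobu = 'U') and v holds the right singular vectors.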

?ggsvd
Computes the generalized singular value
decomposition of a pair of general rectangular
matrices (deprecated).

Syntax
lapack_int LAPACKE_sggsvd( int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int n, lapack_int p, lapack_int* k, lapack_int* l, float* a,
lapack_int lda, float* b, lapack_int ldb, float* alpha, float* beta, float* u,
lapack_int ldu, float* v, lapack_int ldv, float* q, lapack_int ldq, lapack_int*
iwork );
lapack_int LAPACKE_dggsvd( int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int n, lapack_int p, lapack_int* k, lapack_int* l, double* a,
lapack_int lda, double* b, lapack_int ldb, double* alpha, double* beta, double* u,
lapack_int ldu, double* v, lapack_int ldv, double* q, lapack_int ldq, lapack_int*
iwork );
lapack_int LAPACKE_cggsvd( int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int n, lapack_int p, lapack_int* k, lapack_int* l,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* b, lapack_int ldb,

float* alpha, float* beta, lapack_complex_float* u, lapack_int ldu,
lapack_complex_float* v, lapack_int ldv, lapack_complex_float* q, lapack_int ldq,
lapack_int* iwork );
lapack_int LAPACKE_zggsvd( int matrix_layout, char jobu, char jobv, char jobq,
lapack_int m, lapack_int n, lapack_int p, lapack_int* k, lapack_int* l,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* b, lapack_int ldb,
double* alpha, double* beta, lapack_complex_double* u, lapack_int ldu,
lapack_complex_double* v, lapack_int ldv, lapack_complex_double* q, lapack_int ldq,
lapack_int* iwork );

Include Files
• mkl.h

Description
This routine is deprecated; use ggsvd3.

The routine computes the generalized singular value decomposition (GSVD) of an m-by-n real/complex
matrix A and p-by-n real/complex matrix B:
U'*A*Q = D1*(0 R), V'*B*Q = D2*(0 R),
where U, V and Q are orthogonal/unitary matrices and U', V' mean transpose/conjugate transpose of U and V
respectively.
Let k+l = the effective numerical rank of the matrix (A', B')', then R is a (k+l)-by-(k+l) nonsingular upper
triangular matrix, D1 and D2 are m-by-(k+l) and p-by-(k+l) "diagonal" matrices and of the following
structures, respectively:
If m-k-l≥0,

where


C = diag(alpha[k], ..., alpha[k + l - 1]),
S = diag(beta[k], ..., beta[k + l - 1]),
C^2 + S^2 = I.
The nonzero element r(i,j) (1 ≤ i ≤ j ≤ k + l) of R is stored in a[(i - 1) + (n - k - l + j - 1)*lda] for column
major layout and in a[(i - 1)*lda + (n - k - l + j - 1)] for row major layout.

If m-k-l < 0,

where
C = diag(alpha[k], ..., alpha[m - 1]),
S = diag(beta[k], ..., beta[m - 1]),
C^2 + S^2 = I.
On exit, the location of the nonzero element r(i,j) (1 ≤ i ≤ j ≤ k + l) of R depends on the value of i. For i ≤ m this element
is stored in a[(i - 1) + (n - k - l + j - 1)*lda] for column major layout and in a[(i - 1)*lda +
(n - k - l + j - 1)] for row major layout. For m < i ≤ k + l it is stored in b[(i - k - 1) + (n - k -
l + j - 1)*ldb] for column major layout and in b[(i - k - 1)*ldb + (n - k - l + j - 1)] for row
major layout.
The routine computes C, S, R, and optionally the orthogonal/unitary transformation matrices U, V and Q.
In particular, if B is an n-by-n nonsingular matrix, then the GSVD of A and B implicitly gives the SVD of
A*B^(-1):
A*B^(-1) = U*(D1*D2^(-1))*V'.
If (A', B')' has orthonormal columns, then the GSVD of A and B is also equal to the CS decomposition of A
and B. Furthermore, the GSVD can be used to derive the solution of the eigenvalue problem:
A'*A*x = λ*B'*B*x.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

jobu Must be 'U' or 'N'.

If jobu = 'U', orthogonal/unitary matrix U is computed.

If jobu = 'N', U is not computed.

jobv Must be 'V' or 'N'.

If jobv = 'V', orthogonal/unitary matrix V is computed.

If jobv = 'N', V is not computed.

jobq Must be 'Q' or 'N'.

If jobq = 'Q', orthogonal/unitary matrix Q is computed.

If jobq = 'N', Q is not computed.

m The number of rows of the matrix A (m≥ 0).

n The number of columns of the matrices A and B (n≥ 0).

p The number of rows of the matrix B (p≥ 0).

a, b Arrays:
a(size at least max(1, lda*n) for column major layout and max(1, lda*m)
for row major layout) contains the m-by-n matrix A.
b(size at least max(1, ldb*n) for column major layout and max(1, ldb*p)
for row major layout) contains the p-by-n matrix B.

lda The leading dimension of a; at least max(1, m)for column major layout and
max(1, n) for row major layout.

ldb The leading dimension of b; at least max(1, p)for column major layout and
max(1, n) for row major layout.

ldu The leading dimension of the array u .


ldu≥ max(1, m) if jobu = 'U'; ldu≥ 1 otherwise.

ldv The leading dimension of the array v .


ldv≥ max(1, p) if jobv = 'V'; ldv≥ 1 otherwise.

ldq The leading dimension of the array q .


ldq≥ max(1, n) if jobq = 'Q'; ldq≥ 1 otherwise.

Output Parameters

k, l On exit, k and l specify the dimension of the subblocks. The sum k+l is
equal to the effective numerical rank of (A', B')'.

a On exit, a contains the triangular matrix R or part of R.


b On exit, b contains part of the triangular matrix R if m-k-l < 0.

alpha, beta Arrays, size at least max(1, n) each.


Contain the generalized singular value pairs of A and B:
alpha(1:k) = 1,
beta(1:k) = 0,
and if m-k-l≥ 0,

alpha(k+1:k+l) = C,
beta(k+1:k+l) = S,
or if m-k-l < 0,

alpha(k+1:m)= C, alpha(m+1:k+l)=0
beta(k+1:m) = S, beta(m+1:k+l) = 1
and
alpha(k+l+1:n) = 0
beta(k+l+1:n) = 0.

u, v, q Arrays:
u, size at least max(1, ldu*m).

If jobu = 'U', u contains the m-by-m orthogonal/unitary matrix U.

If jobu = 'N', u is not referenced.

v, size at least max(1, ldv*p).

If jobv = 'V', v contains the p-by-p orthogonal/unitary matrix V.

If jobv = 'N', v is not referenced.

q, size at least max(1, ldq*n).

If jobq = 'Q', q contains the n-by-n orthogonal/unitary matrix Q.

If jobq = 'N', q is not referenced.

iwork On exit, iwork stores the sorting information.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = 1, the Jacobi-type procedure failed to converge. For further details, see subroutine tgsja.
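
The sketch below is not taken from the reference text; it shows one way to call the (deprecated) LAPACKE_dggsvd interface declared above. B is chosen as the identity so that the generalized singular values alpha[i]/beta[i] reduce to the ordinary singular values of A; the data are illustrative only.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Illustrative 2-by-2 pair (A, B) in column major order; B = I. */
    lapack_int m = 2, n = 2, p = 2, lda = 2, ldb = 2, ldu = 2, ldv = 2, ldq = 2;
    lapack_int k, l, iwork[2];
    double a[] = { 1.0, 3.0,  2.0, 4.0 };   /* columns of A */
    double b[] = { 1.0, 0.0,  0.0, 1.0 };   /* columns of B (identity) */
    double alpha[2], beta[2], u[4], v[4], q[4];

    lapack_int info = LAPACKE_dggsvd(LAPACK_COL_MAJOR, 'U', 'V', 'Q',
                                     m, n, p, &k, &l, a, lda, b, ldb,
                                     alpha, beta, u, ldu, v, ldv, q, ldq, iwork);
    if (info != 0) {
        printf("dggsvd failed, info = %d\n", (int)info);
        return 1;
    }

    /* Generalized singular values are alpha[i]/beta[i] for i = k, ..., k+l-1. */
    for (lapack_int i = k; i < k + l; ++i)
        printf("alpha/beta = %g\n", alpha[i] / beta[i]);
    return 0;
}

Since this routine is deprecated, the same call pattern carries over to ggsvd3, which is the recommended replacement.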

?gesvdx
Computes the SVD and left and right singular vectors
for a matrix.

Syntax
lapack_int LAPACKE_sgesvdx (int matrix_layout, char jobu, char jobvt, char range,
lapack_int m, lapack_int n, float * a, lapack_int lda, float vl, float vu, lapack_int
il, lapack_int iu, lapack_int * ns, float * s, float * u, lapack_int ldu, float * vt,
lapack_int ldvt, lapack_int * superb);
lapack_int LAPACKE_dgesvdx (int matrix_layout, char jobu, char jobvt, char range,
lapack_int m, lapack_int n, double * a, lapack_int lda, double vl, double vu,
lapack_int il, lapack_int iu, lapack_int *ns, double * s, double * u, lapack_int ldu,
double * vt, lapack_int ldvt, lapack_int * superb);
lapack_int LAPACKE_cgesvdx (int matrix_layout, char jobu, char jobvt, char range,
lapack_int m, lapack_int n, lapack_complex_float * a, lapack_int lda, float vl, float
vu, lapack_int il, lapack_int iu, lapack_int * ns, float * s, lapack_complex_float *
u, lapack_int ldu, lapack_complex_float * vt, lapack_int ldvt, lapack_int * superb);
lapack_int LAPACKE_zgesvdx (int matrix_layout, char jobu, char jobvt, char range,
lapack_int m, lapack_int n, lapack_complex_double * a, lapack_int lda, double vl,
double vu, lapack_int il, lapack_int iu, lapack_int * ns, double * s,
lapack_complex_double * u, lapack_int ldu, lapack_complex_double * vt, lapack_int ldvt,
lapack_int * superb);

Include Files
• mkl.h

Description
?gesvdx computes the singular value decomposition (SVD) of a real or complex m-by-n matrix A, optionally
computing the left and right singular vectors. The SVD is written
A = U * Σ * transpose(V)
where Σ is an m-by-n matrix which is zero except for its min(m,n) diagonal elements, U is an m-by-m matrix,
and V is an n-by-n matrix. The matrices U and V are orthogonal for real A, and unitary for complex A. The
diagonal elements of Σ are the singular values of A; they are real and non-negative, and are returned in
descending order. The first min(m,n) columns of U and V are the left and right singular vectors of A.

?gesvdx uses an eigenvalue problem for obtaining the SVD, which allows for the computation of a subset of
singular values and vectors. See ?bdsvdx for details.

Note that the routine returns VT, not V.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

jobu Specifies options for computing all or part of the matrix U:


= 'V': the first min(m,n) columns of U (the left singular vectors) or as
specified by range are returned in the array u;

= 'N': no columns of U (no left singular vectors) are computed.

jobvt Specifies options for computing all or part of the matrix VT:
= 'V': the first min(m,n) rows of VT (the right singular vectors) or as
specified by range are returned in the array vt;


= 'N': no rows of VT (no right singular vectors) are computed.

range = 'A': find all singular values.


= 'V': all singular values in the half-open interval (vl,vu] are found.

= 'I': the il-th through iu-th singular values are found.

m The number of rows of the input matrix A. m≥ 0.

n The number of columns of the input matrix A. n≥ 0.

a Array, size lda*n

On entry, the m-by-n matrix A.

lda The leading dimension of the array a.

lda≥ max(1,m).

vl, vu If range='V', the lower and upper bounds of the interval to be searched for
singular values. vl≥0, vu > vl. Not referenced if range = 'A' or 'I'.

il, iu If range='I', the indices (in ascending order) of the smallest and largest
singular values to be returned. 1 ≤il≤iu≤ min(m,n), if min(m,n) > 0. Not
referenced if range = 'A' or 'V'.

ldu The leading dimension of the array u. ldu≥ 1; if jobu = 'V', ldu≥m.

ldvt The leading dimension of the array vt. ldvt≥ 1; if jobvt = 'V', ldvt≥ns
(see above).

Output Parameters

a On exit, the contents of a are destroyed.

ns The total number of singular values found,


0 ≤ns≤ min(m, n).

If range = 'A', ns = min(m, n); if range = 'I', ns = iu - il + 1.

s Array, size (min(m,n))

The singular values of A, sorted so that s[i]≥s[i + 1].

u Array, size ldu*ucol

If jobu = 'V', u contains columns of U (the left singular vectors, stored


columnwise) as specified by range; if jobu = 'N', u is not referenced.

NOTE
Make sure that ucol≥ns; if range = 'V', the exact value of ns is not
known in advance and an upper bound must be used.

vt Array, size ldvt*n

If jobvt = 'V', vt contains the rows of VT (the right singular vectors, stored
rowwise) as specified by range; if jobvt = 'N', vt is not referenced.

NOTE
Make sure that ldvt≥ns; if range = 'V', the exact value of ns is not
known in advance and an upper bound must be used.

superb Array, size (12*min(m, n)).

If info = 0, the first ns elements of superb are zero. If info > 0, then
superb contains the indices of the eigenvectors that failed to converge in
?bdsvdx/?stevx.

Return Values
This function returns a value info.

= 0: successful exit.
< 0: if info = -i, the i-th argument had an illegal value.
> 0: if info = i, then i eigenvectors failed to converge in ?bdsvdx/?stevx. if info = n*2 + 1, an internal error
occurred in ?bdsvdx.
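
As a hedged illustration (not part of the reference), the sketch below calls LAPACKE_dgesvdx with range = 'I' to compute only the two largest singular values, and the corresponding vectors, of a small matrix; the data and index bounds are illustrative.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Illustrative 4-by-3 matrix, column major; request singular values il..iu. */
    lapack_int m = 4, n = 3, lda = 4, il = 1, iu = 2, ns;
    lapack_int ldu = 4, ldvt = 2;           /* ldvt >= number of values requested */
    double a[] = { 1.0, 2.0, 0.0, 1.0,
                   0.0, 3.0, 1.0, 0.0,
                   2.0, 0.0, 0.0, 1.0 };
    double s[3], u[4 * 2], vt[2 * 3];
    lapack_int superb[12 * 3];              /* size 12*min(m,n) */

    lapack_int info = LAPACKE_dgesvdx(LAPACK_COL_MAJOR, 'V', 'V', 'I',
                                      m, n, a, lda, 0.0, 0.0, il, iu,
                                      &ns, s, u, ldu, vt, ldvt, superb);
    if (info != 0) {
        printf("dgesvdx failed, info = %d\n", (int)info);
        return 1;
    }

    printf("%d singular values found\n", (int)ns);
    for (lapack_int i = 0; i < ns; ++i)
        printf("s[%d] = %g\n", (int)i, s[i]);
    return 0;
}

With range = 'I' the number of values ns = iu - il + 1 is known in advance, so the u and vt arrays can be sized exactly.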

?bdsvdx
Computes the SVD of a bidiagonal matrix.

Syntax
lapack_int LAPACKE_sbdsvdx (int matrix_layout, char uplo, char jobz, char range,
lapack_int n, float * d, float * e, float vl, float vu, lapack_int il, lapack_int iu,
lapack_int * ns, float * s, float * z, lapack_int ldz, lapack_int * superb);
lapack_int LAPACKE_dbdsvdx (int matrix_layout, char uplo, char jobz, char range,
lapack_int n, double * d, double * e, double vl, double vu, lapack_int il, lapack_int
iu, lapack_int * ns, double * s, double * z, lapack_int ldz, lapack_int * superb);

Include Files
• mkl.h

Description
?bdsvdx computes the singular value decomposition (SVD) of a real n-by-n (upper or lower) bidiagonal
matrix B, B = U * S * VT, where S is a diagonal matrix with non-negative diagonal elements (the singular
values of B), and U and VT are orthogonal matrices of left and right singular vectors, respectively.
Given an upper bidiagonal B with diagonal d = [ d1 d2 ... dn ] and superdiagonal e = [ e1 e2 ... en-1 ], ?bdsvdx
computes the singular value decomposition of B through the eigenvalues and eigenvectors of the n*2-by-n*2
tridiagonal matrix

        |  0   d1                  |
        |  d1  0   e1              |
TGK  =  |      e1  0   d2          |
        |          d2  .    .      |
        |               .    .   . |


If (s,u,v) is a singular triplet of B with ||u|| = ||v|| = 1, then (±s, q), ||q|| = 1, are eigenpairs of TGK, with
q = P*( u', ±v' )/sqrt(2) = ( v1, u1, v2, u2, ..., vn, un )/sqrt(2), and P = [ e_{n+1}, e_1, e_{n+2}, e_2, ... ].

Given a TGK matrix, one can either

1. compute -s, -v and change signs so that the singular values (and corresponding vectors) are already in
descending order (as in ?gesvd/?gesdd) or
2. compute s, v and reorder the values (and corresponding vectors).

?bdsvdx implements (1) by calling ?stevx (bisection plus inverse iteration), to be replaced with a version of
the Multiple Relatively Robust Representations algorithm. (See P. Willems and B. Lang, A framework for the
MR^3 algorithm: theory and implementation, SIAM J. Sci. Comput., 35:740-766, 2013.)

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

uplo = 'U': B is upper bidiagonal;


= 'L': B is lower bidiagonal.

jobz = 'N': Compute singular values only;


= 'V': Compute singular values and singular vectors.

range = 'A': Find all singular values.


= 'V': all singular values in the half-open interval [vl,vu) are found.

= 'I': the il-th through iu-th singular values are found.

n The order of the bidiagonal matrix.


n >= 0.

d Array, size n.

The n diagonal elements of the bidiagonal matrix B.

e Array, size (max(1,n - 1))

The (n - 1) superdiagonal elements of the bidiagonal matrix B in elements 1


to n - 1.

vl, vu If range='V', the lower and upper bounds of the interval to be searched for
singular values. vl≥ 0, vu > vl.

Not referenced if range = 'A' or 'I'.

il, iu If range='I', the indices (in ascending order) of the smallest and largest
singular values to be returned.
1 ≤il≤iu≤ min(m,n), if min(m,n) > 0.

Not referenced if range = 'A' or 'V'.

ldz The leading dimension of the array z.

ldz≥ 1, and if jobz = 'V', ldz≥ max(2,n*2).

Output Parameters

ns The total number of singular values found. 0 ≤ns≤n.

If range = 'A', ns = n, and if range = 'I', ns = iu - il + 1.

s Array, size (n)

The first ns elements contain the selected singular values in ascending


order.

z Array, size 2*n*k

If jobz = 'V', then if info = 0 the first ns columns of z contain the singular
vectors of the matrix B corresponding to the selected singular values, with
U in rows 1 to n and V in rows n+1 to n*2, that is,

z = [ U ]
    [ V ]

If jobz = 'N', then z is not referenced.

NOTE
Make sure that at least k = ns+1 columns are supplied in the array z;
if range = 'V', the exact value of ns is not known in advance and an
upper bound must be used.

superb Array, size (12*n).

If jobz = 'V', then if info = 0, the first ns elements of superb are zero. If
info > 0, then superb contains the indices of the eigenvectors that failed to
converge in ?stevx.

Return Values
This function returns a value info.

= 0: successful exit.
< 0: if info = -i, the i-th argument had an illegal value.

> 0:
if info = i, then i eigenvectors failed to converge in ?stevx. The indices of the eigenvectors (as returned
by ?stevx) are stored in the array superb.

if info = n*2 + 1, an internal error occurred.
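
A minimal sketch (not from the reference) of a call to LAPACKE_dbdsvdx for a small upper bidiagonal matrix, computing all singular values and vectors; the diagonal and superdiagonal values are illustrative.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Illustrative 3-by-3 upper bidiagonal matrix B: diagonal d, superdiagonal e. */
    lapack_int n = 3, ns, ldz = 6;          /* ldz >= 2*n is required when jobz = 'V' */
    double d[] = { 3.0, 2.0, 1.0 };
    double e[] = { 0.5, 0.5 };
    double s[3];
    double z[6 * 4];                        /* at least ns+1 = n+1 columns of length 2*n */
    lapack_int superb[12 * 3];              /* size 12*n */

    lapack_int info = LAPACKE_dbdsvdx(LAPACK_COL_MAJOR, 'U', 'V', 'A',
                                      n, d, e, 0.0, 0.0, 0, 0,
                                      &ns, s, z, ldz, superb);
    if (info != 0) {
        printf("dbdsvdx failed, info = %d\n", (int)info);
        return 1;
    }

    /* Column j of z holds [ u_j ; v_j ]: U in rows 0..n-1, V in rows n..2n-1. */
    for (lapack_int j = 0; j < ns; ++j)
        printf("s[%d] = %g\n", (int)j, s[j]);
    return 0;
}

With range = 'A' the arguments vl, vu, il, and iu are not referenced, so dummy values can be passed as above.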

Cosine-Sine Decomposition: LAPACK Driver Routines


This section describes LAPACK driver routines for computing the cosine-sine decomposition (CS
decomposition). You can also call the corresponding computational routines to perform the same task.
The computation has the following phases:

1. The matrix is reduced to a bidiagonal block form.


2. The blocks are simultaneously diagonalized using techniques from the bidiagonal SVD algorithms.


Table "Driver Routines for Cosine-Sine Decomposition (CSD)" lists LAPACK routines that perform CS
decomposition of matrices.
Driver Routines for Cosine-Sine Decomposition (CSD)
Operation                                                      Real matrices      Complex matrices

Compute the CS decomposition of a block-partitioned            orcsd              uncsd
orthogonal matrix

Compute the CS decomposition of a block-partitioned            orcsd              uncsd
unitary matrix

See Also
Cosine-Sine Decomposition: LAPACK Computational Routines

?orcsd/?uncsd
Computes the CS decomposition of a block-partitioned
orthogonal/unitary matrix.

Syntax
lapack_int LAPACKE_sorcsd( int matrix_layout, char jobu1, char jobu2, char jobv1t, char
jobv2t, char trans, char signs, lapack_int m, lapack_int p, lapack_int q, float* x11,
lapack_int ldx11, float* x12, lapack_int ldx12, float* x21, lapack_int ldx21, float*
x22, lapack_int ldx22, float* theta, float* u1, lapack_int ldu1, float* u2, lapack_int
ldu2, float* v1t, lapack_int ldv1t, float* v2t, lapack_int ldv2t );
lapack_int LAPACKE_dorcsd( int matrix_layout, char jobu1, char jobu2, char jobv1t, char
jobv2t, char trans, char signs, lapack_int m, lapack_int p, lapack_int q, double* x11,
lapack_int ldx11, double* x12, lapack_int ldx12, double* x21, lapack_int ldx21,
double* x22, lapack_int ldx22, double* theta, double* u1, lapack_int ldu1, double* u2,
lapack_int ldu2, double* v1t, lapack_int ldv1t, double* v2t, lapack_int ldv2t );
lapack_int LAPACKE_cuncsd( int matrix_layout, char jobu1, char jobu2, char jobv1t, char
jobv2t, char trans, char signs, lapack_int m, lapack_int p, lapack_int q,
lapack_complex_float* x11, lapack_int ldx11, lapack_complex_float* x12, lapack_int
ldx12, lapack_complex_float* x21, lapack_int ldx21, lapack_complex_float* x22,
lapack_int ldx22, float* theta, lapack_complex_float* u1, lapack_int ldu1,
lapack_complex_float* u2, lapack_int ldu2, lapack_complex_float* v1t, lapack_int ldv1t,
lapack_complex_float* v2t, lapack_int ldv2t );
lapack_int LAPACKE_zuncsd( int matrix_layout, char jobu1, char jobu2, char jobv1t, char
jobv2t, char trans, char signs, lapack_int m, lapack_int p, lapack_int q,
lapack_complex_double* x11, lapack_int ldx11, lapack_complex_double* x12, lapack_int
ldx12, lapack_complex_double* x21, lapack_int ldx21, lapack_complex_double* x22,
lapack_int ldx22, double* theta, lapack_complex_double* u1, lapack_int ldu1,
lapack_complex_double* u2, lapack_int ldu2, lapack_complex_double* v1t, lapack_int
ldv1t, lapack_complex_double* v2t, lapack_int ldv2t );

Include Files
• mkl.h

Description
The routines ?orcsd/?uncsd compute the CS decomposition of an m-by-m partitioned orthogonal matrix X:


or unitary matrix:

x11 is p-by-q. The orthogonal/unitary matrices u1, u2, v1, and v2 are p-by-p, (m-p)-by-(m-p), q-by-q, (m-q)-
by-(m-q), respectively. C and S are r-by-r nonnegative diagonal matrices satisfying C^2 + S^2 = I, in which r
= min(p,m-p,q,m-q).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

jobu1 If equal to 'Y', then u1 is computed. Otherwise, u1 is not computed.

jobu2 If equal to 'Y', then u2 is computed. Otherwise, u2 is not computed.

jobv1t If equal to 'Y', then v1t is computed. Otherwise, v1t is not computed.

jobv2t If equal to 'Y', then v2t is computed. Otherwise, v2t is not computed.

trans = 'T': x, u1, u2, v1t, v2t are stored in row-major order;
otherwise, x, u1, u2, v1t, v2t are stored in column-major order.

signs = 'O': the lower-left block is made nonpositive (the "other" convention);
otherwise, the upper-right block is made nonpositive (the "default" convention).

m The number of rows and columns of the matrix X.

p The number of rows in x11 and x12. 0 ≤p≤m.


q The number of columns in x11 and x21. 0 ≤q≤m.

x11, x12, x21, x22 Arrays of size x11 (ldx11,q), x12 (ldx12,m - q), x21 (ldx21,q), and x22
(ldx22,m - q).

Contain the parts of the orthogonal/unitary matrix whose CSD is desired.

ldx11, ldx12, ldx21, ldx22 The leading dimensions of the parts of array X. ldx11≥ max(1, p), ldx12≥
max(1, p), ldx21≥ max(1, m - p), ldx22≥ max(1, m - p).

ldu1 The leading dimension of the array u1. If jobu1 = 'Y', ldu1≥ max(1,p).

ldu2 The leading dimension of the array u2. If jobu2 = 'Y', ldu2≥ max(1,m-p).

ldv1t The leading dimension of the array v1t. If jobv1t = 'Y', ldv1t≥
max(1,q).

ldv2t The leading dimension of the array v2t. If jobv2t = 'Y', ldv2t≥ max(1,m-
q).

Output Parameters

theta Array, size r, in which r = min(p,m-p,q,m-q).

C = diag( cos(theta[0]), ..., cos(theta[r - 1]) ), and


S = diag( sin(theta[0]), ..., sin(theta[r - 1]) ).

u1 Array, size at least max(1, ldu1*p).

If jobu1 = 'Y', u1 contains the p-by-p orthogonal/unitary matrix u1.

u2 Array, size at least max(1, ldu2*(m - p)).

If jobu2 = 'Y', u2 contains the (m-p)-by-(m-p) orthogonal/unitary matrix


u2.

v1t Array, size at least max(1, ldv1t*q) .

If jobv1t = 'Y', v1t contains the q-by-q orthogonal matrix v1^T or unitary
matrix v1^H.

v2t Array, size at least max(1, ldv2t*(m - q)).

If jobv2t = 'Y', v2t contains the (m-q)-by-(m-q) orthogonal matrix v2^T or
unitary matrix v2^H.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

> 0: ?orcsd/?uncsd did not converge.

See Also
?bbcsd
xerbla

?orcsd2by1/?uncsd2by1
Computes the CS decomposition of a block-partitioned
orthogonal/unitary matrix.

Syntax
lapack_int LAPACKE_sorcsd2by1 (int matrix_layout, char jobu1, char jobu2, char jobv1t,
lapack_int m, lapack_int p, lapack_int q, float * x11, lapack_int ldx11, float * x21,
lapack_int ldx21, float * theta, float * u1, lapack_int ldu1, float * u2, lapack_int
ldu2, float * v1t, lapack_int ldv1t);
lapack_int LAPACKE_dorcsd2by1 (int matrix_layout, char jobu1, char jobu2, char jobv1t,
lapack_int m, lapack_int p, lapack_int q, double * x11, lapack_int ldx11, double *
x21, lapack_int ldx21, double * theta, double * u1, lapack_int ldu1, double * u2,
lapack_int ldu2, double * v1t, lapack_int ldv1t);
lapack_int LAPACKE_cuncsd2by1 (int matrix_layout, char jobu1, char jobu2, char jobv1t,
lapack_int m, lapack_int p, lapack_int q, lapack_complex_float * x11, lapack_int
ldx11, lapack_complex_float * x21, lapack_int ldx21, float * theta,
lapack_complex_float * u1, lapack_int ldu1, lapack_complex_float * u2, lapack_int ldu2,
lapack_complex_float * v1t, lapack_int ldv1t);
lapack_int LAPACKE_zuncsd2by1 (int matrix_layout, char jobu1, char jobu2, char jobv1t,
lapack_int m, lapack_int p, lapack_int q, lapack_complex_double * x11, lapack_int
ldx11, lapack_complex_double * x21, lapack_int ldx21, double * theta,
lapack_complex_double * u1, lapack_int ldu1, lapack_complex_double * u2, lapack_int
ldu2, lapack_complex_double * v1t, lapack_int ldv1t);

Include Files
• mkl.h

Description
The routines ?orcsd2by1/?uncsd2by1 compute the CS decomposition of an m-by-q matrix X with
orthonormal columns that has been partitioned into a 2-by-1 block structure:

x11 is p-by-q. The orthogonal/unitary matrices u1, u2, v1, and v2 are p-by-p, (m-p)-by-(m-p), q-by-q, (m-q)-
by-(m-q), respectively. C and S are r-by-r nonnegative diagonal matrices satisfying C^2 + S^2 = I, in which r
= min(p,m-p,q,m-q).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

jobu1 If equal to 'Y', then u1 is computed. Otherwise, u1 is not computed.


jobu2 If equal to 'Y', then u2 is computed. Otherwise, u2 is not computed.

jobv1t If equal to 'Y', then v1t is computed. Otherwise, v1t is not computed.

m The number of rows and columns of the matrix X.

p The number of rows in x11. 0 ≤p≤m.

q The number of columns in x11 . 0 ≤q≤m.

x11 Array, size (ldx11*q).


On entry, the part of the orthogonal matrix whose CSD is desired.

ldx11 The leading dimension of the array x11. ldx11≥ max(1,p).

x21 Array, size (ldx21*q).


On entry, the part of the orthogonal matrix whose CSD is desired.

ldx21 The leading dimension of the array X. ldx21≥ max(1,m - p).

ldu1 The leading dimension of the array u1. If jobu1 = 'Y', ldu1≥ max(1,p).

ldu2 The leading dimension of the array u2. If jobu2 = 'Y', ldu2≥ max(1,m-p).

ldv1t The leading dimension of the array v1t. If jobv1t = 'Y', ldv1t≥
max(1,q).

Output Parameters

theta Array, size r, in which r = min(p,m-p,q,m-q).

C = diag( cos(theta(1)), ..., cos(theta(r)) ), and


S = diag( sin(theta(1)), ..., sin(theta(r)) ).

u1 Array, size (ldu1*p) .


If jobu1 = 'Y', u1 contains the p-by-p orthogonal/unitary matrix u1.

u2 Array, size (ldu2*(m - p)) .


If jobu2 = 'Y', u2 contains the (m-p)-by-(m-p) orthogonal/unitary matrix
u2.

v1t Array, size (ldv1t*q) .


If jobv1t = 'Y', v1t contains the q-by-q orthogonal matrix v1^T or unitary
matrix v1^H.

Return Values
This function returns a value info.

= 0: successful exit
< 0: if info = -i, the i-th argument has an illegal value

> 0: ?orcsd2by1/?uncsd2by1 did not converge.

See Also
?bbcsd
xerbla
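
The sketch below, which is not part of the reference, applies LAPACKE_dorcsd2by1 to the simplest possible case: a single 2-by-1 unit column [cos t; sin t] split with p = q = 1, for which theta[0] should recover t. The value of t is arbitrary.

#include <stdio.h>
#include <math.h>
#include <mkl.h>

int main(void) {
    /* Illustrative 2-by-1 matrix X with orthonormal columns (a unit vector),
       partitioned into x11 (p-by-q) and x21 ((m-p)-by-q). */
    lapack_int m = 2, p = 1, q = 1;
    lapack_int ldx11 = 1, ldx21 = 1, ldu1 = 1, ldu2 = 1, ldv1t = 1;
    double t = 0.3;
    double x11[] = { cos(t) };
    double x21[] = { sin(t) };
    double theta[1], u1[1], u2[1], v1t[1];  /* r = min(p, m-p, q, m-q) = 1 */

    lapack_int info = LAPACKE_dorcsd2by1(LAPACK_COL_MAJOR, 'Y', 'Y', 'Y',
                                         m, p, q, x11, ldx11, x21, ldx21,
                                         theta, u1, ldu1, u2, ldu2, v1t, ldv1t);
    if (info != 0) {
        printf("dorcsd2by1 failed, info = %d\n", (int)info);
        return 1;
    }

    /* cos(theta[0]) and sin(theta[0]) recover the CS values of the split. */
    printf("theta = %g (expected about %g)\n", theta[0], t);
    return 0;
}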

Generalized Symmetric Definite Eigenvalue Problems: LAPACK Driver Routines


This section describes LAPACK driver routines used for solving generalized symmetric definite eigenproblems.
See also computational routines that can be called to solve these problems. Table "Driver Routines for
Solving Generalized Symmetric Definite Eigenproblems" lists all such driver routines.
Driver Routines for Solving Generalized Symmetric Definite Eigenproblems
Routine Name Operation performed

sygv/hegv Computes all eigenvalues and, optionally, eigenvectors of a real / complex


generalized symmetric /Hermitian positive-definite eigenproblem.

sygvd/hegvd Computes all eigenvalues and, optionally, eigenvectors of a real / complex


generalized symmetric /Hermitian positive-definite eigenproblem. If eigenvectors
are desired, it uses a divide and conquer method.

sygvx/hegvx Computes selected eigenvalues and, optionally, eigenvectors of a real / complex


generalized symmetric /Hermitian positive-definite eigenproblem.

spgv/hpgv Computes all eigenvalues and, optionally, eigenvectors of a real / complex


generalized symmetric /Hermitian positive-definite eigenproblem with matrices in
packed storage.

spgvd/hpgvd Computes all eigenvalues and, optionally, eigenvectors of a real / complex


generalized symmetric /Hermitian positive-definite eigenproblem with matrices in
packed storage. If eigenvectors are desired, it uses a divide and conquer method.

spgvx/hpgvx Computes selected eigenvalues and, optionally, eigenvectors of a real / complex


generalized symmetric /Hermitian positive-definite eigenproblem with matrices in
packed storage.

sbgv/hbgv Computes all eigenvalues and, optionally, eigenvectors of a real / complex


generalized symmetric /Hermitian positive-definite eigenproblem with banded
matrices.

sbgvd/hbgvd Computes all eigenvalues and, optionally, eigenvectors of a real / complex


generalized symmetric /Hermitian positive-definite eigenproblem with banded
matrices. If eigenvectors are desired, it uses a divide and conquer method.

sbgvx/hbgvx Computes selected eigenvalues and, optionally, eigenvectors of a real / complex


generalized symmetric /Hermitian positive-definite eigenproblem with banded
matrices.

?sygv
Computes all eigenvalues and, optionally,
eigenvectors of a real generalized symmetric definite
eigenproblem.

Syntax
lapack_int LAPACKE_ssygv (int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, float* a, lapack_int lda, float* b, lapack_int ldb, float* w);
lapack_int LAPACKE_dsygv (int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, double* a, lapack_int lda, double* b, lapack_int ldb, double* w);

Include Files
• mkl.h


Description

The routine computes all the eigenvalues, and optionally, the eigenvectors of a real generalized symmetric-
definite eigenproblem, of the form
A*x = λ*B*x, A*B*x = λ*x, or B*A*x = λ*x.
Here A and B are assumed to be symmetric and B is also positive definite.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

itype Must be 1 or 2 or 3.
Specifies the problem type to be solved:
if itype = 1, the problem type is A*x = lambda*B*x;

if itype = 2, the problem type is A*B*x = lambda*x;

if itype = 3, the problem type is B*A*x = lambda*x.

jobz Must be 'N' or 'V'.

If jobz = 'N', then compute eigenvalues only.

If jobz = 'V', then compute eigenvalues and eigenvectors.

uplo Must be 'U' or 'L'.

If uplo = 'U', arrays a and b store the upper triangles of A and B;

If uplo = 'L', arrays a and b store the lower triangles of A and B.

n The order of the matrices A and B (n≥ 0).

a, b Arrays:
a (size at least max(1, lda*n)) contains the upper or lower triangle of the
symmetric matrix A, as specified by uplo.
b (size at least max(1, ldb*n)) contains the upper or lower triangle of the
symmetric positive definite matrix B, as specified by uplo.

lda The leading dimension of a; at least max(1, n).

ldb The leading dimension of b; at least max(1, n).

Output Parameters

a On exit, if jobz = 'V', then if info = 0, a contains the matrix Z of


eigenvectors. The eigenvectors are normalized as follows:
if itype = 1 or 2, Z^T*B*Z = I;

if itype = 3, Z^T*inv(B)*Z = I;

If jobz = 'N', then on exit the upper triangle (if uplo = 'U') or the
lower triangle (if uplo = 'L') of A, including the diagonal, is destroyed.

b On exit, if info≤n, the part of b containing the matrix is overwritten by the
triangular factor U or L from the Cholesky factorization B = U^T*U or B =
L*L^T.

w Array, size at least max(1, n).


If info = 0, contains the eigenvalues in ascending order.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, spotrf/dpotrf or ssyev/dsyev returned an error code:

If info = i≤n, ssyev/dsyev failed to converge, and i off-diagonal elements of an intermediate tridiagonal
did not converge to zero;
If info = n + i, for 1 ≤i≤n, then the leading minor of order i of B is not positive-definite. The factorization
of B could not be completed and no eigenvalues or eigenvectors were computed.
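
A minimal sketch (not from the reference) of a call to LAPACKE_dsygv with itype = 1 and B equal to the identity, so the computed eigenvalues are simply those of the symmetric matrix A; the matrices are illustrative.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Illustrative 2-by-2 problem A*x = lambda*B*x with B = I. */
    lapack_int n = 2, lda = 2, ldb = 2;
    double a[] = { 2.0, 1.0,  1.0, 2.0 };   /* column major; upper triangle is used */
    double b[] = { 1.0, 0.0,  0.0, 1.0 };
    double w[2];

    lapack_int info = LAPACKE_dsygv(LAPACK_COL_MAJOR, 1, 'V', 'U',
                                    n, a, lda, b, ldb, w);
    if (info != 0) {
        printf("dsygv failed, info = %d\n", (int)info);
        return 1;
    }

    /* Expected eigenvalues 1 and 3; a now holds the B-orthonormal eigenvectors. */
    printf("lambda = %g, %g\n", w[0], w[1]);
    return 0;
}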

?hegv
Computes all eigenvalues and, optionally,
eigenvectors of a complex generalized Hermitian
positive-definite eigenproblem.

Syntax
lapack_int LAPACKE_chegv( int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, lapack_complex_float* a, lapack_int lda, lapack_complex_float* b,
lapack_int ldb, float* w );
lapack_int LAPACKE_zhegv( int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, lapack_complex_double* a, lapack_int lda, lapack_complex_double* b,
lapack_int ldb, double* w );

Include Files
• mkl.h

Description

The routine computes all the eigenvalues, and optionally, the eigenvectors of a complex generalized
Hermitian positive-definite eigenproblem, of the form
A*x = λ*B*x, A*B*x = λ*x, or B*A*x = λ*x.
Here A and B are assumed to be Hermitian and B is also positive definite.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

itype Must be 1 or 2 or 3. Specifies the problem type to be solved:


if itype = 1, the problem type is A*x = lambda*B*x;

if itype = 2, the problem type is A*B*x = lambda*x;


if itype = 3, the problem type is B*A*x = lambda*x.

jobz Must be 'N' or 'V'.

If jobz = 'N', then compute eigenvalues only.

If jobz = 'V', then compute eigenvalues and eigenvectors.

uplo Must be 'U' or 'L'.

If uplo = 'U', arrays a and b store the upper triangles of A and B;

If uplo = 'L', arrays a and b store the lower triangles of A and B.

n The order of the matrices A and B (n≥ 0).

a, b Arrays:
a (size at least max(1, lda*n)) contains the upper or lower triangle of the
Hermitian matrix A, as specified by uplo.
b (size at least max(1, ldb*n)) contains the upper or lower triangle of the
Hermitian positive definite matrix B, as specified by uplo.

lda The leading dimension of a; at least max(1, n).

ldb The leading dimension of b; at least max(1, n).

Output Parameters

a On exit, if jobz = 'V', then if info = 0, a contains the matrix Z of


eigenvectors. The eigenvectors are normalized as follows:
if itype = 1 or 2, Z^H*B*Z = I;

if itype = 3, Z^H*inv(B)*Z = I;

If jobz = 'N', then on exit the upper triangle (if uplo = 'U') or the
lower triangle (if uplo = 'L') of A, including the diagonal, is destroyed.

b On exit, if info≤n, the part of b containing the matrix is overwritten by the


triangular factor U or L from the Cholesky factorization B = U^H*U or B =
L*L^H.

w Array, size at least max(1, n).


If info = 0, contains the eigenvalues in ascending order.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, cpotrf/zpotrf or cheev/zheev return an error code:

If info = i≤n, cheev/zheev fails to converge, and i off-diagonal elements of an intermediate tridiagonal do
not converge to zero;
If info = n + i, for 1 ≤i≤n, then the leading minor of order i of B is not positive-definite. The factorization
of B can not be completed and no eigenvalues or eigenvectors are computed.
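
The following illustrative sketch (not part of the reference) calls LAPACKE_zhegv for a 2-by-2 Hermitian pair with B = I; it assumes lapack_complex_double maps to the MKL_Complex16 structure (the default when including mkl.h), so the complex entries can be brace-initialized.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Illustrative Hermitian A = [2 i; -i 2] and B = I, column major, uplo = 'U'. */
    lapack_int n = 2, lda = 2, ldb = 2;
    lapack_complex_double a[] = { {2.0, 0.0}, {0.0, 0.0},    /* column 1 (a21 unused) */
                                  {0.0, 1.0}, {2.0, 0.0} };  /* column 2: a12 = i, a22 = 2 */
    lapack_complex_double b[] = { {1.0, 0.0}, {0.0, 0.0},
                                  {0.0, 0.0}, {1.0, 0.0} };
    double w[2];

    lapack_int info = LAPACKE_zhegv(LAPACK_COL_MAJOR, 1, 'V', 'U',
                                    n, a, lda, b, ldb, w);
    if (info != 0) {
        printf("zhegv failed, info = %d\n", (int)info);
        return 1;
    }

    printf("lambda = %g, %g\n", w[0], w[1]);   /* expected 1 and 3 */
    return 0;
}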

?sygvd
Computes all eigenvalues and, optionally,
eigenvectors of a real generalized symmetric definite
eigenproblem using a divide and conquer method.

Syntax
lapack_int LAPACKE_ssygvd (int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, float* a, lapack_int lda, float* b, lapack_int ldb, float* w);
lapack_int LAPACKE_dsygvd (int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, double* a, lapack_int lda, double* b, lapack_int ldb, double* w);

Include Files
• mkl.h

Description

The routine computes all the eigenvalues, and optionally, the eigenvectors of a real generalized symmetric-
definite eigenproblem, of the form
A*x = λ*B*x, A*B*x = λ*x, or B*A*x = λ*x .
Here A and B are assumed to be symmetric and B is also positive definite.
It uses a divide and conquer algorithm.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

itype Must be 1 or 2 or 3. Specifies the problem type to be solved:


if itype = 1, the problem type is A*x = lambda*B*x;

if itype = 2, the problem type is A*B*x = lambda*x;

if itype = 3, the problem type is B*A*x = lambda*x.

jobz Must be 'N' or 'V'.

If jobz = 'N', then compute eigenvalues only.

If jobz = 'V', then compute eigenvalues and eigenvectors.

uplo Must be 'U' or 'L'.

If uplo = 'U', arrays a and b store the upper triangles of A and B;

If uplo = 'L', arrays a and b store the lower triangles of A and B.

n The order of the matrices A and B (n≥ 0).

a, b Arrays:
a (size at least lda*n) contains the upper or lower triangle of the
symmetric matrix A, as specified by uplo.
b (size at least ldb*n) contains the upper or lower triangle of the
symmetric positive definite matrix B, as specified by uplo.

lda The leading dimension of a; at least max(1, n).


ldb The leading dimension of b; at least max(1, n).

Output Parameters

a On exit, if jobz = 'V', then if info = 0, a contains the matrix Z of


eigenvectors. The eigenvectors are normalized as follows:
if itype = 1 or 2, Z^T*B*Z = I;

if itype = 3, Z^T*inv(B)*Z = I;

If jobz = 'N', then on exit the upper triangle (if uplo = 'U') or the
lower triangle (if uplo = 'L') of A, including the diagonal, is destroyed.

b On exit, if info≤n, the part of b containing the matrix is overwritten by the


triangular factor U or L from the Cholesky factorization B = U^T*U or B =
L*L^T.

w Array, size at least max(1, n).


If info = 0, contains the eigenvalues in ascending order.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, an error code is returned as specified below.

• For info≤n:

• If info = i and jobz = 'N', then the algorithm failed to converge; i off-diagonal elements of an
intermediate tridiagonal form did not converge to zero.
• If jobz = 'V', then the algorithm failed to compute an eigenvalue while working on the submatrix
lying in rows and columns info/(n+1) through mod(info,n+1).
• For info > n:

• If info = n + i, for 1 ≤i≤n, then the leading minor of order i of B is not positive-definite. The
factorization of B could not be completed and no eigenvalues or eigenvectors were computed.

?hegvd
Computes all eigenvalues and, optionally,
eigenvectors of a complex generalized Hermitian
positive-definite eigenproblem using a divide and
conquer method.

Syntax
lapack_int LAPACKE_chegvd( int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, lapack_complex_float* a, lapack_int lda, lapack_complex_float* b,
lapack_int ldb, float* w );
lapack_int LAPACKE_zhegvd( int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, lapack_complex_double* a, lapack_int lda, lapack_complex_double* b,
lapack_int ldb, double* w );

Include Files
• mkl.h

Description

The routine computes all the eigenvalues, and optionally, the eigenvectors of a complex generalized
Hermitian positive-definite eigenproblem, of the form
A*x = λ*B*x, A*B*x = λ*x, or B*A*x = λ*x.
Here A and B are assumed to be Hermitian and B is also positive definite.
It uses a divide and conquer algorithm.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

itype Must be 1 or 2 or 3. Specifies the problem type to be solved:


if itype = 1, the problem type is A*x = lambda*B*x;

if itype = 2, the problem type is A*B*x = lambda*x;

if itype = 3, the problem type is B*A*x = lambda*x.

jobz Must be 'N' or 'V'.

If jobz = 'N', then compute eigenvalues only.

If jobz = 'V', then compute eigenvalues and eigenvectors.

uplo Must be 'U' or 'L'.

If uplo = 'U', arrays a and b store the upper triangles of A and B;

If uplo = 'L', arrays a and b store the lower triangles of A and B.

n The order of the matrices A and B (n≥ 0).

a, b Arrays:
a (size at least max(1, lda*n)) contains the upper or lower triangle of the
Hermitian matrix A, as specified by uplo.
b (size at least max(1, ldb*n)) contains the upper or lower triangle of the
Hermitian positive definite matrix B, as specified by uplo.

lda The leading dimension of a; at least max(1, n).

ldb The leading dimension of b; at least max(1, n).

Output Parameters

a On exit, if jobz = 'V', then if info = 0, a contains the matrix Z of


eigenvectors. The eigenvectors are normalized as follows:
if itype = 1 or 2, Z^H*B*Z = I;

if itype = 3, Z^H*inv(B)*Z = I;


If jobz = 'N', then on exit the upper triangle (if uplo = 'U') or the
lower triangle (if uplo = 'L') of A, including the diagonal, is destroyed.

b On exit, if info≤n, the part of b containing the matrix is overwritten by the


triangular factor U or L from the Cholesky factorization B = U^H*U or B =
L*L^H.

w Array, size at least max(1, n).


If info = 0, contains the eigenvalues in ascending order.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, and jobz = 'N', then the algorithm failed to converge; i off-diagonal elements of an
intermediate tridiagonal form did not converge to zero;
if info = i, and jobz = 'V', then the algorithm failed to compute an eigenvalue while working on the
submatrix lying in rows and columns info/(n+1) through mod(info, n+1).

If info = n + i, for 1 ≤i≤n, then the leading minor of order i of B is not positive-definite. The factorization
of B could not be completed and no eigenvalues or eigenvectors were computed.

?sygvx
Computes selected eigenvalues and, optionally,
eigenvectors of a real generalized symmetric definite
eigenproblem.

Syntax
lapack_int LAPACKE_ssygvx (int matrix_layout, lapack_int itype, char jobz, char range,
char uplo, lapack_int n, float* a, lapack_int lda, float* b, lapack_int ldb, float vl,
float vu, lapack_int il, lapack_int iu, float abstol, lapack_int* m, float* w, float*
z, lapack_int ldz, lapack_int* ifail);
lapack_int LAPACKE_dsygvx (int matrix_layout, lapack_int itype, char jobz, char range,
char uplo, lapack_int n, double* a, lapack_int lda, double* b, lapack_int ldb, double
vl, double vu, lapack_int il, lapack_int iu, double abstol, lapack_int* m, double* w,
double* z, lapack_int ldz, lapack_int* ifail);

Include Files
• mkl.h

Description

The routine computes selected eigenvalues, and optionally, the eigenvectors of a real generalized symmetric-
definite eigenproblem, of the form
A*x = λ*B*x, A*B*x = λ*x, or B*A*x = λ*x.
Here A and B are assumed to be symmetric and B is also positive definite. Eigenvalues and eigenvectors can
be selected by specifying either a range of values or a range of indices for the desired eigenvalues.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

itype Must be 1 or 2 or 3. Specifies the problem type to be solved:


if itype = 1, the problem type is A*x = λ*B*x;

if itype = 2, the problem type is A*B*x = λ*x;

if itype = 3, the problem type is B*A*x = λ*x.

jobz Must be 'N' or 'V'.

If jobz = 'N', then compute eigenvalues only.

If jobz = 'V', then compute eigenvalues and eigenvectors.

range Must be 'A' or 'V' or 'I'.

If range = 'A', the routine computes all eigenvalues.

If range = 'V', the routine computes eigenvalues w[i] in the half-open


interval:
vl<w[i]≤vu.
If range = 'I', the routine computes eigenvalues with indices il to iu.

uplo Must be 'U' or 'L'.

If uplo = 'U', arrays a and b store the upper triangles of A and B;

If uplo = 'L', arrays a and b store the lower triangles of A and B.

n The order of the matrices A and B (n≥ 0).

a, b Arrays:
a (size at least max(1, lda*n)) contains the upper or lower triangle of the
symmetric matrix A, as specified by uplo.
b (size at least max(1, ldb*n)) contains the upper or lower triangle of the
symmetric positive definite matrix B, as specified by uplo.

lda The leading dimension of a; at least max(1, n).

ldb The leading dimension of b; at least max(1, n).

vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl< vu.

If range = 'A' or 'I', vl and vu are not referenced.

il, iu
If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤il≤iu≤n, if n > 0; il=1 and iu=0

if n = 0.


If range = 'A' or 'V', il and iu are not referenced.

abstol The absolute error tolerance for the eigenvalues. See Application Notes for
more information.

ldz The leading dimension of the output array z. Constraints:
ldz≥ 1; if jobz = 'V', ldz≥ max(1, n) for column major layout and ldz≥
max(1, m) for row major layout .

Output Parameters

a On exit, the upper triangle (if uplo = 'U') or the lower triangle (if uplo =
'L') of A, including the diagonal, is overwritten.

b On exit, if info≤n, the part of b containing the matrix is overwritten by the


triangular factor U or L from the Cholesky factorization B = U^T*U or B =
L*L^T.

m The total number of eigenvalues found,


0 ≤m≤n. If range = 'A', m = n, and if range = 'I',
m = iu-il+1.

w, z Arrays:
w, size at least max(1, n).
The first m elements of w contain the selected eigenvalues in ascending
order.
z(size at least max(1, ldz*m) for column major layout and max(1, ldz*n)
for row major layout) .
If jobz = 'V', then if info = 0, the first m columns of z contain the
orthonormal eigenvectors of the matrix A corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated
with w[i - 1]. The eigenvectors are normalized as follows:
if itype = 1 or 2, Z^T*B*Z = I;

if itype = 3, Z^T*inv(B)*Z = I;

If jobz = 'N', then z is not referenced.

If an eigenvector fails to converge, then that column of z contains the latest


approximation to the eigenvector, and the index of the eigenvector is
returned in ifail.
Note: you must ensure that at least max(1,m) columns are supplied in the
array z; if range = 'V', the exact value of m is not known in advance and
an upper bound must be used.

ifail
Array, size at least max(1, n).
If jobz = 'V', then if info = 0, the first m elements of ifail are zero; if
info > 0, the ifail contains the indices of the eigenvectors that failed to
converge.
If jobz = 'N', then ifail is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, spotrf/dpotrf and ssyevx/dsyevx returned an error code:

If info = i≤n, ssyevx/dsyevx failed to converge, and i eigenvectors failed to converge. Their indices are
stored in the array ifail;
If info = n + i, for 1 ≤i≤n, then the leading minor of order i of B is not positive-definite. The factorization
of B could not be completed and no eigenvalues or eigenvectors were computed.

Application Notes
An approximate eigenvalue is accepted as converged when it is determined to lie in an interval [a,b] of
width less than or equal to abstol+ε*max(|a|,|b|), where ε is the machine precision.

If abstol is less than or equal to zero, then ε*||T||_1 is used as tolerance, where T is the tridiagonal matrix
obtained by reducing C to tridiagonal form, where C is the symmetric matrix of the standard symmetric
problem to which the generalized problem is transformed. Eigenvalues will be computed most accurately
when abstol is set to twice the underflow threshold 2*?lamch('S'), not zero.

If this routine returns with info > 0, indicating that some eigenvectors did not converge, set abstol to
2*?lamch('S').
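
Below is a minimal sketch (not from the reference) of LAPACKE_dsygvx requesting the two smallest eigenvalues by index, with abstol set to twice the underflow threshold via LAPACKE_dlamch as recommended above; the matrices and indices are illustrative.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Illustrative 3-by-3 problem A*x = lambda*B*x with B = I; request il..iu. */
    lapack_int n = 3, lda = 3, ldb = 3, ldz = 3, il = 1, iu = 2, m, ifail[3];
    double a[] = { 2.0, 0.0, 0.0,
                   1.0, 2.0, 0.0,
                   0.0, 1.0, 2.0 };         /* column major; upper triangle is used */
    double b[] = { 1.0, 0.0, 0.0,
                   0.0, 1.0, 0.0,
                   0.0, 0.0, 1.0 };
    double w[3], z[3 * 2];                  /* z needs at least m = iu-il+1 columns */
    double abstol = 2.0 * LAPACKE_dlamch('S');

    lapack_int info = LAPACKE_dsygvx(LAPACK_COL_MAJOR, 1, 'V', 'I', 'U',
                                     n, a, lda, b, ldb, 0.0, 0.0, il, iu,
                                     abstol, &m, w, z, ldz, ifail);
    if (info != 0) {
        printf("dsygvx failed, info = %d\n", (int)info);
        return 1;
    }

    for (lapack_int i = 0; i < m; ++i)
        printf("lambda[%d] = %g\n", (int)i, w[i]);
    return 0;
}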

?hegvx
Computes selected eigenvalues and, optionally,
eigenvectors of a complex generalized Hermitian
positive-definite eigenproblem.

Syntax
lapack_int LAPACKE_chegvx( int matrix_layout, lapack_int itype, char jobz, char range,
char uplo, lapack_int n, lapack_complex_float* a, lapack_int lda, lapack_complex_float*
b, lapack_int ldb, float vl, float vu, lapack_int il, lapack_int iu, float abstol,
lapack_int* m, float* w, lapack_complex_float* z, lapack_int ldz, lapack_int* ifail );
lapack_int LAPACKE_zhegvx( int matrix_layout, lapack_int itype, char jobz, char range,
char uplo, lapack_int n, lapack_complex_double* a, lapack_int lda,
lapack_complex_double* b, lapack_int ldb, double vl, double vu, lapack_int il,
lapack_int iu, double abstol, lapack_int* m, double* w, lapack_complex_double* z,
lapack_int ldz, lapack_int* ifail );

Include Files
• mkl.h

Description

The routine computes selected eigenvalues, and optionally, the eigenvectors of a complex generalized
Hermitian positive-definite eigenproblem, of the form
A*x = λ*B*x, A*B*x = λ*x, or B*A*x = λ*x.
Here A and B are assumed to be Hermitian and B is also positive definite. Eigenvalues and eigenvectors can
be selected by specifying either a range of values or a range of indices for the desired eigenvalues.


Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

itype Must be 1 or 2 or 3. Specifies the problem type to be solved:


if itype = 1, the problem type is A*x = λ*B*x;

if itype = 2, the problem type is A*B*x = λ*x;

if itype = 3, the problem type is B*A*x = λ*x.

jobz Must be 'N' or 'V'.

If jobz = 'N', then compute eigenvalues only.

If jobz = 'V', then compute eigenvalues and eigenvectors.

range Must be 'A' or 'V' or 'I'.

If range = 'A', the routine computes all eigenvalues.

If range = 'V', the routine computes eigenvalues w[i] in the half-open


interval:
vl<w[i]≤vu.
If range = 'I', the routine computes eigenvalues with indices il to iu.

uplo Must be 'U' or 'L'.

If uplo = 'U', arrays a and b store the upper triangles of A and B;

If uplo = 'L', arrays a and b store the lower triangles of A and B.

n The order of the matrices A and B (n≥ 0).

a, b Arrays:
a (size at least max(1, lda*n)) contains the upper or lower triangle of the
Hermitian matrix A, as specified by uplo.
b (size at least max(1, ldb*n)) contains the upper or lower triangle of the
Hermitian positive definite matrix B, as specified by uplo.

lda The leading dimension of a; at least max(1, n).

ldb The leading dimension of b; at least max(1, n).

vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl< vu.

If range = 'A' or 'I', vl and vu are not referenced.

il, iu
If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤il≤iu≤n, if n > 0; il=1 and iu=0

if n = 0.

If range = 'A' or 'V', il and iu are not referenced.

abstol The absolute error tolerance for the eigenvalues. See Application Notes for
more information.

ldz The leading dimension of the output array z. Constraints:


ldz≥ 1; if jobz = 'V', ldz≥ max(1, n) for column major layout and ldz≥
max(1, m) for row major layout.

Output Parameters

a On exit, the upper triangle (if uplo = 'U') or the lower triangle (if uplo =
'L') of A, including the diagonal, is overwritten.

b On exit, if info≤n, the part of b containing the matrix is overwritten by the


triangular factor U or L from the Cholesky factorization B = U^H*U or B =
L*L^H.

m The total number of eigenvalues found,


0 ≤m≤n. If range = 'A', m = n, and if range = 'I',
m = iu-il+1.

w Array, size at least max(1, n).


The first m elements of w contain the selected eigenvalues in ascending
order.

z Array z(size at least max(1, ldz*m) for column major layout and max(1,
ldz*n) for row major layout).
If jobz = 'V', then if info = 0, the first m columns of z contain the
orthonormal eigenvectors of the matrix A corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated
with w[i - 1]. The eigenvectors are normalized as follows:
if itype = 1 or 2, Z^H*B*Z = I;

if itype = 3, Z^H*inv(B)*Z = I;

If jobz = 'N', then z is not referenced.

If an eigenvector fails to converge, then that column of z contains the latest


approximation to the eigenvector, and the index of the eigenvector is
returned in ifail.
Note: you must ensure that at least max(1,m) columns are supplied in the
array z; if range = 'V', the exact value of m is not known in advance and
an upper bound must be used.

ifail
Array, size at least max(1, n).
If jobz = 'V', then if info = 0, the first m elements of ifail are zero; if
info > 0, the ifail contains the indices of the eigenvectors that failed to
converge.
If jobz = 'N', then ifail is not referenced.


Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, cpotrf/zpotrf and cheevx/zheevx returned an error code:

If info = i≤n, cheevx/zheevx failed to converge, and i eigenvectors failed to converge. Their indices are
stored in the array ifail;
If info = n + i, for 1 ≤i≤n, then the leading minor of order i of B is not positive-definite. The factorization
of B could not be completed and no eigenvalues or eigenvectors were computed.

Application Notes
An approximate eigenvalue is accepted as converged when it is determined to lie in an interval [a,b] of width
less than or equal to abstol+ε*max(|a|,|b|), where ε is the machine precision.

If abstol is less than or equal to zero, then ε*||T||_1 will be used in its place, where T is the tridiagonal
matrix obtained by reducing C to tridiagonal form, where C is the symmetric matrix of the standard
symmetric problem to which the generalized problem is transformed. Eigenvalues will be computed most
accurately when abstol is set to twice the underflow threshold 2*?lamch('S'), not zero.

If this routine returns with info > 0, indicating that some eigenvectors did not converge, try setting abstol
to 2*?lamch('S').

?spgv
Computes all eigenvalues and, optionally,
eigenvectors of a real generalized symmetric definite
eigenproblem with matrices in packed storage.

Syntax
lapack_int LAPACKE_sspgv (int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, float* ap, float* bp, float* w, float* z, lapack_int ldz);
lapack_int LAPACKE_dspgv (int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, double* ap, double* bp, double* w, double* z, lapack_int ldz);

Include Files
• mkl.h

Description

The routine computes all the eigenvalues, and optionally, the eigenvectors of a real generalized symmetric-
definite eigenproblem, of the form
A*x = λ*B*x, A*B*x = λ*x, or B*A*x = λ*x.
Here A and B are assumed to be symmetric, stored in packed format, and B is also positive definite.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

itype Must be 1 or 2 or 3. Specifies the problem type to be solved:


if itype = 1, the problem type is A*x = lambda*B*x;

if itype = 2, the problem type is A*B*x = lambda*x;

if itype = 3, the problem type is B*A*x = lambda*x.

jobz Must be 'N' or 'V'.

If jobz = 'N', then compute eigenvalues only.

If jobz = 'V', then compute eigenvalues and eigenvectors.

uplo Must be 'U' or 'L'.

If uplo = 'U', arrays ap and bp store the upper triangles of A and B;

If uplo = 'L', arrays ap and bp store the lower triangles of A and B.

n The order of the matrices A and B (n≥ 0).

ap, bp Arrays:
ap contains the packed upper or lower triangle of the symmetric matrix A,
as specified by uplo.
The dimension of ap must be at least max(1, n*(n+1)/2).
bp contains the packed upper or lower triangle of the symmetric matrix B,
as specified by uplo.
The dimension of bp must be at least max(1, n*(n+1)/2).

ldz The leading dimension of the output array z; ldz≥ 1. If jobz = 'V', ldz≥
max(1, n).

Output Parameters

ap On exit, the contents of ap are overwritten.

bp On exit, contains the triangular factor U or L from the Cholesky factorization


B = U^T*U or B = L*L^T, in the same storage format as B.

w, z Arrays:
w, size at least max(1, n).
If info = 0, contains the eigenvalues in ascending order.

z (size max(1, ldz*n)) .


If jobz = 'V', then if info = 0, z contains the matrix Z of eigenvectors.
The eigenvectors are normalized as follows:
if itype = 1 or 2, Z^T*B*Z = I;

if itype = 3, Z^T*inv(B)*Z = I;

If jobz = 'N', then z is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.


If info > 0, spptrf/dpptrf and sspev/dspev returned an error code:

If info = i≤n, sspev/dspev failed to converge, and i off-diagonal elements of an intermediate tridiagonal
did not converge to zero;
If info = n + i, for 1 ≤i≤n, then the leading minor of order i of B is not positive-definite. The factorization
of B could not be completed and no eigenvalues or eigenvectors were computed.
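
A minimal calling sketch follows; it is illustrative only, and the 2-by-2 matrices, values, and variable
names below are arbitrary rather than part of the routine specification. It computes both eigenvalues and
eigenvectors with LAPACKE_dspgv, supplying A and B in packed upper-triangular, column-major storage:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Illustrative data: A = [2 1; 1 3], B = diag(2, 1), both packed upper. */
    lapack_int n = 2, ldz = 2;
    double ap[3] = { 2.0, 1.0, 3.0 };
    double bp[3] = { 2.0, 0.0, 1.0 };
    double w[2], z[4];

    lapack_int info = LAPACKE_dspgv(LAPACK_COL_MAJOR, 1, 'V', 'U',
                                    n, ap, bp, w, z, ldz);
    if (info == 0)
        printf("eigenvalues: %f %f\n", w[0], w[1]);
    else
        printf("LAPACKE_dspgv failed, info = %d\n", (int)info);
    return 0;
}

On return ap is overwritten and bp holds the Cholesky factor of B, so keep copies if the original matrices
are still needed.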

?hpgv
Computes all eigenvalues and, optionally,
eigenvectors of a complex generalized Hermitian
positive-definite eigenproblem with matrices in packed
storage.

Syntax
lapack_int LAPACKE_chpgv( int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, lapack_complex_float* ap, lapack_complex_float* bp, float* w,
lapack_complex_float* z, lapack_int ldz );
lapack_int LAPACKE_zhpgv( int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, lapack_complex_double* ap, lapack_complex_double* bp, double* w,
lapack_complex_double* z, lapack_int ldz );

Include Files
• mkl.h

Description

The routine computes all the eigenvalues, and optionally, the eigenvectors of a complex generalized
Hermitian positive-definite eigenproblem, of the form
A*x = λ*B*x, A*B*x = λ*x, or B*A*x = λ*x.
Here A and B are assumed to be Hermitian, stored in packed format, and B is also positive definite.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

itype Must be 1 or 2 or 3. Specifies the problem type to be solved:


if itype = 1, the problem type is A*x = lambda*B*x;

if itype = 2, the problem type is A*B*x = lambda*x;

if itype = 3, the problem type is B*A*x = lambda*x.

jobz Must be 'N' or 'V'.

If jobz = 'N', then compute eigenvalues only.

If jobz = 'V', then compute eigenvalues and eigenvectors.

uplo Must be 'U' or 'L'.

If uplo = 'U', arrays ap and bp store the upper triangles of A and B;

If uplo = 'L', arrays ap and bp store the lower triangles of A and B.

n The order of the matrices A and B (n≥ 0).

ap, bp Arrays:
ap contains the packed upper or lower triangle of the Hermitian matrix A, as
specified by uplo.
The dimension of ap must be at least max(1, n*(n+1)/2).
bp contains the packed upper or lower triangle of the Hermitian matrix B,
as specified by uplo.
The dimension of bp must be at least max(1, n*(n+1)/2).

ldz The leading dimension of the output array z; ldz≥ 1. If jobz = 'V', ldz≥
max(1, n).

Output Parameters

ap On exit, the contents of ap are overwritten.

bp On exit, contains the triangular factor U or L from the Cholesky factorization


B = U^H*U or B = L*L^H, in the same storage format as B.

w Array, size at least max(1, n).


If info = 0, contains the eigenvalues in ascending order.

z Array z (size max(1, ldz*n)).

If jobz = 'V', then if info = 0, z contains the matrix Z of eigenvectors.


The eigenvectors are normalized as follows:
if itype = 1 or 2, Z^H*B*Z = I;

if itype = 3, Z^H*inv(B)*Z = I;

If jobz = 'N', then z is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, cpptrf/zpptrf and chpev/zhpev returned an error code:

If info = i≤n, chpev/zhpev failed to converge, and i off-diagonal elements of an intermediate tridiagonal
did not converge to zero;
If info = n + i, for 1 ≤i≤n, then the leading minor of order i of B is not positive-definite. The factorization
of B could not be completed and no eigenvalues or eigenvectors were computed.

?spgvd
Computes all eigenvalues and, optionally,
eigenvectors of a real generalized symmetric definite
eigenproblem with matrices in packed storage using a
divide and conquer method.


Syntax
lapack_int LAPACKE_sspgvd (int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, float* ap, float* bp, float* w, float* z, lapack_int ldz);
lapack_int LAPACKE_dspgvd (int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, double* ap, double* bp, double* w, double* z, lapack_int ldz);

Include Files
• mkl.h

Description

The routine computes all the eigenvalues, and optionally, the eigenvectors of a real generalized symmetric-
definite eigenproblem, of the form
A*x = λ*B*x, A*B*x = λ*x, or B*A*x = λ*x.
Here A and B are assumed to be symmetric, stored in packed format, and B is also positive definite.
If eigenvectors are desired, it uses a divide and conquer algorithm.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

itype Must be 1 or 2 or 3. Specifies the problem type to be solved:


if itype = 1, the problem type is A*x = lambda*B*x;

if itype = 2, the problem type is A*B*x = lambda*x;

if itype = 3, the problem type is B*A*x = lambda*x.

jobz Must be 'N' or 'V'.

If jobz = 'N', then compute eigenvalues only.

If jobz = 'V', then compute eigenvalues and eigenvectors.

uplo Must be 'U' or 'L'.

If uplo = 'U', arrays ap and bp store the upper triangles of A and B;

If uplo = 'L', arrays ap and bp store the lower triangles of A and B.

n The order of the matrices A and B (n≥ 0).

ap, bp Arrays:
ap contains the packed upper or lower triangle of the symmetric matrix A,
as specified by uplo.
The dimension of ap must be at least max(1, n*(n+1)/2).
bp contains the packed upper or lower triangle of the symmetric matrix B,
as specified by uplo.
The dimension of bp must be at least max(1, n*(n+1)/2).

ldz The leading dimension of the output array z; ldz≥ 1. If jobz = 'V', ldz≥
max(1, n).

Output Parameters

ap On exit, the contents of ap are overwritten.

bp On exit, contains the triangular factor U or L from the Cholesky factorization


B = U^T*U or B = L*L^T, in the same storage format as B.

w, z Arrays:
w, size at least max(1, n).
If info = 0, contains the eigenvalues in ascending order.

z (size at least max(1, ldz*n)).


If jobz = 'V', then if info = 0, z contains the matrix Z of eigenvectors.
The eigenvectors are normalized as follows:
if itype = 1 or 2, Z^T*B*Z = I;

if itype = 3, Z^T*inv(B)*Z = I;

If jobz = 'N', then z is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, spptrf/dpptrf and sspevd/dspevd returned an error code:

If info = i≤n, sspevd/dspevd failed to converge, and i off-diagonal elements of an intermediate tridiagonal
did not converge to zero;
If info = n + i, for 1 ≤i≤n, then the leading minor of order i of B is not positive-definite. The factorization
of B could not be completed and no eigenvalues or eigenvectors were computed.

?hpgvd
Computes all eigenvalues and, optionally,
eigenvectors of a complex generalized Hermitian
positive-definite eigenproblem with matrices in packed
storage using a divide and conquer method.

Syntax
lapack_int LAPACKE_chpgvd( int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, lapack_complex_float* ap, lapack_complex_float* bp, float* w,
lapack_complex_float* z, lapack_int ldz );
lapack_int LAPACKE_zhpgvd( int matrix_layout, lapack_int itype, char jobz, char uplo,
lapack_int n, lapack_complex_double* ap, lapack_complex_double* bp, double* w,
lapack_complex_double* z, lapack_int ldz );

Include Files
• mkl.h

Description

The routine computes all the eigenvalues, and optionally, the eigenvectors of a complex generalized
Hermitian positive-definite eigenproblem, of the form


A*x = λ*B*x, A*B*x = λ*x, or B*A*x = λ*x.


Here A and B are assumed to be Hermitian, stored in packed format, and B is also positive definite.
If eigenvectors are desired, it uses a divide and conquer algorithm.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

itype Must be 1 or 2 or 3. Specifies the problem type to be solved:


if itype = 1, the problem type is A*x = lambda*B*x;

if itype = 2, the problem type is A*B*x = lambda*x;

if itype = 3, the problem type is B*A*x = lambda*x.

jobz Must be 'N' or 'V'.

If jobz = 'N', then compute eigenvalues only.

If jobz = 'V', then compute eigenvalues and eigenvectors.

uplo Must be 'U' or 'L'.

If uplo = 'U', arrays ap and bp store the upper triangles of A and B;

If uplo = 'L', arrays ap and bp store the lower triangles of A and B.

n The order of the matrices A and B (n≥ 0).

ap, bp Arrays:
ap contains the packed upper or lower triangle of the Hermitian matrix A, as
specified by uplo.
The dimension of ap must be at least max(1, n*(n+1)/2).
bp contains the packed upper or lower triangle of the Hermitian matrix B,
as specified by uplo.
The dimension of bp must be at least max(1, n*(n+1)/2).

ldz The leading dimension of the output array z; ldz≥ 1. If jobz = 'V', ldz≥
max(1, n).

Output Parameters

ap On exit, the contents of ap are overwritten.

bp On exit, contains the triangular factor U or L from the Cholesky factorization


B = U^H*U or B = L*L^H, in the same storage format as B.

w Array, size at least max(1, n).


If info = 0, contains the eigenvalues in ascending order.

z Array z (size at least max(1, ldz*n)).

If jobz = 'V', then if info = 0, z contains the matrix Z of eigenvectors.


The eigenvectors are normalized as follows:

if itype = 1 or 2, Z^H*B*Z = I;

if itype = 3, Z^H*inv(B)*Z = I;

If jobz = 'N', then z is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, cpptrf/zpptrf and chpevd/zhpevd returned an error code:

If info = i≤n, chpevd/zhpevd failed to converge, and i off-diagonal elements of an intermediate tridiagonal
did not converge to zero;
If info = n + i, for 1 ≤i≤n, then the leading minor of order i of B is not positive-definite. The factorization
of B could not be completed and no eigenvalues or eigenvectors were computed.
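
As an illustrative sketch only (the 2-by-2 Hermitian pencil below is arbitrary, and the brace
initialization assumes the default MKL_Complex16 structure definition of lapack_complex_double), a call to
LAPACKE_zhpgvd might look as follows:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Illustrative data: A = [2 i; -i 3], B = I, both in packed upper storage.
       The {re, im} initializers assume the default struct complex type.       */
    lapack_int n = 2, ldz = 2;
    lapack_complex_double ap[3] = { {2.0, 0.0}, {0.0, 1.0}, {3.0, 0.0} };
    lapack_complex_double bp[3] = { {1.0, 0.0}, {0.0, 0.0}, {1.0, 0.0} };
    lapack_complex_double z[4];
    double w[2];

    lapack_int info = LAPACKE_zhpgvd(LAPACK_COL_MAJOR, 1, 'V', 'U',
                                     n, ap, bp, w, z, ldz);
    printf("info = %d, eigenvalues: %f %f\n", (int)info, w[0], w[1]);
    return 0;
}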

?spgvx
Computes selected eigenvalues and, optionally,
eigenvectors of a real generalized symmetric definite
eigenproblem with matrices in packed storage.

Syntax
lapack_int LAPACKE_sspgvx (int matrix_layout, lapack_int itype, char jobz, char range,
char uplo, lapack_int n, float* ap, float* bp, float vl, float vu, lapack_int il,
lapack_int iu, float abstol, lapack_int* m, float* w, float* z, lapack_int ldz,
lapack_int* ifail);
lapack_int LAPACKE_dspgvx (int matrix_layout, lapack_int itype, char jobz, char range,
char uplo, lapack_int n, double* ap, double* bp, double vl, double vu, lapack_int il,
lapack_int iu, double abstol, lapack_int* m, double* w, double* z, lapack_int ldz,
lapack_int* ifail);

Include Files
• mkl.h

Description

The routine computes selected eigenvalues, and optionally, the eigenvectors of a real generalized symmetric-
definite eigenproblem, of the form
A*x = λ*B*x, A*B*x = λ*x, or B*A*x = λ*x.
Here A and B are assumed to be symmetric, stored in packed format, and B is also positive definite.
Eigenvalues and eigenvectors can be selected by specifying either a range of values or a range of indices for
the desired eigenvalues.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

itype Must be 1 or 2 or 3. Specifies the problem type to be solved:


if itype = 1, the problem type is A*x = lambda*B*x;


if itype = 2, the problem type is A*B*x = lambda*x;

if itype = 3, the problem type is B*A*x = lambda*x.

jobz Must be 'N' or 'V'.

If jobz = 'N', then compute eigenvalues only.

If jobz = 'V', then compute eigenvalues and eigenvectors.

range Must be 'A' or 'V' or 'I'.

If range = 'A', the routine computes all eigenvalues.

If range = 'V', the routine computes eigenvalues w[i] in the half-open


interval:
vl<w[i]≤vu.
If range = 'I', the routine computes eigenvalues with indices il to iu.

uplo Must be 'U' or 'L'.

If uplo = 'U', arrays ap and bp store the upper triangles of A and B;

If uplo = 'L', arrays ap and bp store the lower triangles of A and B.

n The order of the matrices A and B (n≥ 0).

ap, bp Arrays:
ap contains the packed upper or lower triangle of the symmetric matrix A,
as specified by uplo.
The size of ap must be at least max(1, n*(n+1)/2).
bp contains the packed upper or lower triangle of the symmetric matrix B,
as specified by uplo.
The size of bp must be at least max(1, n*(n+1)/2).

vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl< vu.

If range = 'A' or 'I', vl and vu are not referenced.

il, iu
If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤il≤iu≤n, if n > 0; il=1 and iu=0

if n = 0.

If range = 'A' or 'V', il and iu are not referenced.

abstol The absolute error tolerance for the eigenvalues. See Application Notes for
more information.

ldz The leading dimension of the output array z. Constraints:


ldz≥ 1; if jobz = 'V', ldz≥ max(1, n) for column major layout and ldz≥
max(1, m) for row major layout .

Output Parameters

ap On exit, the contents of ap are overwritten.

bp On exit, contains the triangular factor U or L from the Cholesky factorization


B = U^T*U or B = L*L^T, in the same storage format as B.

m The total number of eigenvalues found,


0 ≤m≤n. If range = 'A', m = n, and if range = 'I',
m = iu-il+1.

w, z Arrays:
w, size at least max(1, n).
If info = 0, contains the eigenvalues in ascending order.

z(size at least max(1, ldz*m) for column major layout and max(1, ldz*n)
for row major layout) .
If jobz = 'V', then if info = 0, the first m columns of z contain the
orthonormal eigenvectors of the matrix A corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated
with w(i). The eigenvectors are normalized as follows:
if itype = 1 or 2, Z^T*B*Z = I;

if itype = 3, Z^T*inv(B)*Z = I;

If jobz = 'N', then z is not referenced.

If an eigenvector fails to converge, then that column of z contains the latest


approximation to the eigenvector, and the index of the eigenvector is
returned in ifail.
Note: you must ensure that at least max(1,m) columns are supplied in the
array z; if range = 'V', the exact value of m is not known in advance and
an upper bound must be used.

ifail
Array, size at least max(1, n).
If jobz = 'V', then if info = 0, the first m elements of ifail are zero; if
info > 0, ifail contains the indices of the eigenvectors that failed to
converge.
If jobz = 'N', then ifail is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, spptrf/dpptrf and sspevx/dspevx returned an error code:

If info = i≤n, sspevx/dspevx failed to converge, and i eigenvectors failed to converge. Their indices are
stored in the array ifail;
If info = n + i, for 1 ≤i≤n, then the leading minor of order i of B is not positive-definite. The factorization
of B could not be completed and no eigenvalues or eigenvectors were computed.


Application Notes
An approximate eigenvalue is accepted as converged when it is determined to lie in an interval [a,b] of width
less than or equal to abstol+ε*max(|a|,|b|), where ε is the machine precision.

If abstol is less than or equal to zero, then ε*||T||_1 is used instead, where T is the tridiagonal matrix
obtained by reducing A to tridiagonal form. Eigenvalues are computed most accurately when abstol is set to
twice the underflow threshold 2*?lamch('S'), not zero.

If this routine returns with info > 0, indicating that some eigenvectors did not converge, set abstol to 2*?
lamch('S').
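
The following hypothetical fragment (the 3-by-3 matrices and all values are purely illustrative) requests
only the smallest eigenpair with range = 'I', il = iu = 1, and sets abstol to the tolerance recommended
above using LAPACKE_dlamch:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Illustrative data: A and B = I, both in packed upper, column-major storage. */
    lapack_int n = 3, ldz = 3, m, ifail[3];
    double ap[6] = { 4.0, 1.0, 3.0, 0.0, 1.0, 2.0 };
    double bp[6] = { 1.0, 0.0, 1.0, 0.0, 0.0, 1.0 };
    double w[3], z[3];                          /* one column suffices: m = 1  */
    double abstol = 2.0 * LAPACKE_dlamch('S');  /* recommended tolerance       */

    lapack_int info = LAPACKE_dspgvx(LAPACK_COL_MAJOR, 1, 'V', 'I', 'U', n,
                                     ap, bp, 0.0, 0.0, 1, 1, abstol,
                                     &m, w, z, ldz, ifail);
    if (info == 0)
        printf("found %d eigenvalue: %f\n", (int)m, w[0]);
    return 0;
}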

?hpgvx
Computes selected eigenvalues and, optionally,
eigenvectors of a generalized Hermitian positive-
definite eigenproblem with matrices in packed
storage.

Syntax
lapack_int LAPACKE_chpgvx( int matrix_layout, lapack_int itype, char jobz, char range,
char uplo, lapack_int n, lapack_complex_float* ap, lapack_complex_float* bp, float vl,
float vu, lapack_int il, lapack_int iu, float abstol, lapack_int* m, float* w,
lapack_complex_float* z, lapack_int ldz, lapack_int* ifail );
lapack_int LAPACKE_zhpgvx( int matrix_layout, lapack_int itype, char jobz, char range,
char uplo, lapack_int n, lapack_complex_double* ap, lapack_complex_double* bp, double
vl, double vu, lapack_int il, lapack_int iu, double abstol, lapack_int* m, double* w,
lapack_complex_double* z, lapack_int ldz, lapack_int* ifail );

Include Files
• mkl.h

Description

The routine computes selected eigenvalues, and optionally, the eigenvectors of a complex generalized
Hermitian positive-definite eigenproblem, of the form
A*x = λ*B*x, A*B*x = λ*x, or B*A*x = λ*x.
Here A and B are assumed to be Hermitian, stored in packed format, and B is also positive definite.
Eigenvalues and eigenvectors can be selected by specifying either a range of values or a range of indices for
the desired eigenvalues.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

itype Must be 1 or 2 or 3. Specifies the problem type to be solved:


if itype = 1, the problem type is A*x = lambda*B*x;

if itype = 2, the problem type is A*B*x = lambda*x;

if itype = 3, the problem type is B*A*x = lambda*x.

jobz Must be 'N' or 'V'.

If jobz = 'N', then compute eigenvalues only.

If jobz = 'V', then compute eigenvalues and eigenvectors.

range Must be 'A' or 'V' or 'I'.

If range = 'A', the routine computes all eigenvalues.

If range = 'V', the routine computes eigenvalues w[i] in the half-open


interval:
vl<w[i]≤vu.
If range = 'I', the routine computes eigenvalues with indices il to iu.

uplo Must be 'U' or 'L'.

If uplo = 'U', arrays ap and bp store the upper triangles of A and B;

If uplo = 'L', arrays ap and bp store the lower triangles of A and B.

n The order of the matrices A and B (n≥ 0).

ap, bp Arrays:
ap contains the packed upper or lower triangle of the Hermitian matrix A, as
specified by uplo.
The dimension of ap must be at least max(1, n*(n+1)/2).
bp contains the packed upper or lower triangle of the Hermitian matrix B,
as specified by uplo.
The dimension of bp must be at least max(1, n*(n+1)/2).

vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl< vu.

If range = 'A' or 'I', vl and vu are not referenced.

il, iu
If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤il≤iu≤n, if n > 0; il=1 and iu=0

if n = 0.

If range = 'A' or 'V', il and iu are not referenced.

abstol The absolute error tolerance for the eigenvalues.


See Application Notes for more information.

ldz The leading dimension of the output array z; ldz≥ 1. If jobz = 'V', ldz≥
max(1, n) for column major layout and ldz≥ max(1, m) for row major
layout.

Output Parameters

ap On exit, the contents of ap are overwritten.

bp On exit, contains the triangular factor U or L from the Cholesky factorization


B = U^H*U or B = L*L^H, in the same storage format as B.


m The total number of eigenvalues found,


0 ≤m≤n. If range = 'A', m = n, and if range = 'I',
m = iu-il+1.

w Array, size at least max(1, n).


If info = 0, contains the eigenvalues in ascending order.

z Array z(size at least max(1, ldz*m) for column major layout and max(1,
ldz*n) for row major layout).
If jobz = 'V', then if info = 0, the first m columns of z contain the
orthonormal eigenvectors of the matrix A corresponding to the selected
eigenvalues, with the i-th column of z holding the eigenvector associated
with w(i). The eigenvectors are normalized as follows:
if itype = 1 or 2, Z^H*B*Z = I;

if itype = 3, Z^H*inv(B)*Z = I;

If jobz = 'N', then z is not referenced.

If an eigenvector fails to converge, then that column of z contains the latest


approximation to the eigenvector, and the index of the eigenvector is
returned in ifail.
Note: you must ensure that at least max(1,m) columns are supplied in the
array z; if range = 'V', the exact value of m is not known in advance and
an upper bound must be used.

ifail
Array, size at least max(1, n).
If jobz = 'V', then if info = 0, the first m elements of ifail are zero; if
info > 0, ifail contains the indices of the eigenvectors that failed to
converge.
If jobz = 'N', then ifail is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, cpptrf/zpptrf and chpevx/zhpevx returned an error code:

If info = i≤n, chpevx/zhpevx failed to converge, and i eigenvectors failed to converge. Their indices are
stored in the array ifail;
If info = n + i, for 1 ≤i≤n, then the leading minor of order i of B is not positive-definite. The factorization
of B could not be completed and no eigenvalues or eigenvectors were computed.

Application Notes
An approximate eigenvalue is accepted as converged when it is determined to lie in an interval [a,b] of width
less than or equal to abstol+ε*max(|a|,|b|), where ε is the machine precision.

If abstol is less than or equal to zero, then ε*||T||_1 is used as tolerance, where T is the tridiagonal matrix
obtained by reducing A to tridiagonal form. Eigenvalues will be computed most accurately when abstol is set
to twice the underflow threshold 2*?lamch('S'), not zero.

If this routine returns with info > 0, indicating that some eigenvectors did not converge, try setting abstol
to 2*?lamch('S').

?sbgv
Computes all eigenvalues and, optionally,
eigenvectors of a real generalized symmetric definite
eigenproblem with banded matrices.

Syntax
lapack_int LAPACKE_ssbgv (int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int ka, lapack_int kb, float* ab, lapack_int ldab, float* bb, lapack_int ldbb,
float* w, float* z, lapack_int ldz);
lapack_int LAPACKE_dsbgv (int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int ka, lapack_int kb, double* ab, lapack_int ldab, double* bb, lapack_int
ldbb, double* w, double* z, lapack_int ldz);

Include Files
• mkl.h

Description

The routine computes all the eigenvalues, and optionally, the eigenvectors of a real generalized symmetric-
definite banded eigenproblem, of the form A*x = λ*B*x. Here A and B are assumed to be symmetric and
banded, and B is also positive definite.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then compute eigenvalues only.

If jobz = 'V', then compute eigenvalues and eigenvectors.

uplo Must be 'U' or 'L'.

If uplo = 'U', arrays ab and bb store the upper triangles of A and B;

If uplo = 'L', arrays ab and bb store the lower triangles of A and B.

n The order of the matrices A and B (n≥ 0).

ka The number of super- or sub-diagonals in A


(ka≥ 0).

kb The number of super- or sub-diagonals in B (kb≥ 0).

ab, bb Arrays:
ab(size at least max(1, ldab*n) for column major layout and max(1,
ldab*(ka + 1)) for row major layout) is an array containing either upper or
lower triangular part of the symmetric matrix A (as specified by uplo) in
band storage format.


bb(size at least max(1, ldbb*n) for column major layout and max(1,
ldbb*(kb + 1)) for row major layout) is an array containing either upper or
lower triangular part of the symmetric matrix B (as specified by uplo) in
band storage format.

ldab The leading dimension of the array ab; must be at least ka+1 for column
major layout and at least max(1, n) for row major layout .

ldbb The leading dimension of the array bb; must be at least kb+1 for column
major layout and at least max(1, n) for row major layout.

ldz The leading dimension of the output array z; ldz≥ 1. If jobz = 'V', ldz≥
max(1, n).

Output Parameters

ab On exit, the contents of ab are overwritten.

bb On exit, contains the factor S from the split Cholesky factorization B =


S^T*S, as returned by pbstf/pbstf.

w, z Arrays:
w, size at least max(1, n).
If info = 0, contains the eigenvalues in ascending order.

z (size at least max(1, ldz*n)) .


If jobz = 'V', then if info = 0, z contains the matrix Z of eigenvectors,
with the i-th column of z holding the eigenvector associated with w(i). The
eigenvectors are normalized so that Z^T*B*Z = I.

If jobz = 'N', then z is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, and

if i≤n, the algorithm failed to converge, and i off-diagonal elements of an intermediate tridiagonal did not
converge to zero;
if info = n + i, for 1 ≤i≤n, then pbstf/pbstf returned info = i and B is not positive-definite. The
factorization of B could not be completed and no eigenvalues or eigenvectors were computed.
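
As a sketch of the band storage layout only (the tridiagonal matrix and all values are illustrative),
LAPACKE_dsbgv can be called as follows for a matrix A with one superdiagonal (ka = 1) and a diagonal
matrix B (kb = 0), both in upper band storage with column-major layout:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 3, ka = 1, kb = 0, ldab = 2, ldbb = 1, ldz = 3;
    /* A = tridiag(1, 2, 1): column j of ab holds A(j-1,j) and A(j,j);
       the first element of column 1 is not referenced.               */
    double ab[6] = { 0.0, 2.0,  1.0, 2.0,  1.0, 2.0 };
    double bb[3] = { 1.0, 1.0, 1.0 };   /* B = I, stored as its diagonal */
    double w[3], z[9];

    lapack_int info = LAPACKE_dsbgv(LAPACK_COL_MAJOR, 'V', 'U', n, ka, kb,
                                    ab, ldab, bb, ldbb, w, z, ldz);
    printf("info = %d, eigenvalues: %f %f %f\n", (int)info, w[0], w[1], w[2]);
    return 0;
}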

?hbgv
Computes all eigenvalues and, optionally,
eigenvectors of a complex generalized Hermitian
positive-definite eigenproblem with banded matrices.

Syntax
lapack_int LAPACKE_chbgv( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int ka, lapack_int kb, lapack_complex_float* ab, lapack_int ldab,
lapack_complex_float* bb, lapack_int ldbb, float* w, lapack_complex_float* z,
lapack_int ldz );
lapack_int LAPACKE_zhbgv( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int ka, lapack_int kb, lapack_complex_double* ab, lapack_int ldab,
lapack_complex_double* bb, lapack_int ldbb, double* w, lapack_complex_double* z,
lapack_int ldz );

Include Files
• mkl.h

Description

The routine computes all the eigenvalues, and optionally, the eigenvectors of a complex generalized
Hermitian positive-definite banded eigenproblem, of the form A*x = λ*B*x. Here A and B are Hermitian and
banded matrices, and matrix B is also positive definite.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then compute eigenvalues only.

If jobz = 'V', then compute eigenvalues and eigenvectors.

uplo Must be 'U' or 'L'.

If uplo = 'U', arrays ab and bb store the upper triangles of A and B;

If uplo = 'L', arrays ab and bb store the lower triangles of A and B.

n The order of the matrices A and B (n≥ 0).

ka The number of super- or sub-diagonals in A


(ka≥ 0).

kb The number of super- or sub-diagonals in B (kb≥ 0).

ab, bb Arrays:
ab(size at least max(1, ldab*n) for column major layout and max(1,
ldab*(ka + 1)) for row major layout) is an array containing either upper or
lower triangular part of the Hermitian matrix A (as specified by uplo) in
band storage format.
bb(size at least max(1, ldbb*n) for column major layout and max(1,
ldbb*(kb + 1)) for row major layout) is an array containing either upper or
lower triangular part of the Hermitian matrix B (as specified by uplo) in
band storage format.


ldab The leading dimension of the array ab; must be at least ka+1 for column
major layout and at least max(1, n) for row major layout.

ldbb The leading dimension of the array bb; must be at least kb+1 for column
major layout and at least max(1, n) for row major layout.

ldz The leading dimension of the output array z; ldz≥ 1. If jobz = 'V', ldz≥
max(1, n).

Output Parameters

ab On exit, the contents of ab are overwritten.

bb On exit, contains the factor S from the split Cholesky factorization B =


S^H*S, as returned by pbstf/pbstf.

w Array, size at least max(1, n).


If info = 0, contains the eigenvalues in ascending order.

z Array z (size at least max(1, ldz*n)).

If jobz = 'V', then if info = 0, z contains the matrix Z of eigenvectors,


with the i-th column of z holding the eigenvector associated with w(i). The
eigenvectors are normalized so that Z^H*B*Z = I.

If jobz = 'N', then z is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, and

if i≤n, the algorithm failed to converge, and i off-diagonal elements of an intermediate tridiagonal did not
converge to zero;
if info = n + i, for 1 ≤i≤n, then pbstf/pbstf returned info = i and B is not positive-definite. The
factorization of B could not be completed and no eigenvalues or eigenvectors were computed.

?sbgvd
Computes all eigenvalues and, optionally,
eigenvectors of a real generalized symmetric definite
eigenproblem with banded matrices. If eigenvectors
are desired, it uses a divide and conquer method.

Syntax
lapack_int LAPACKE_ssbgvd (int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int ka, lapack_int kb, float* ab, lapack_int ldab, float* bb, lapack_int ldbb,
float* w, float* z, lapack_int ldz);
lapack_int LAPACKE_dsbgvd (int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int ka, lapack_int kb, double* ab, lapack_int ldab, double* bb, lapack_int
ldbb, double* w, double* z, lapack_int ldz);

Include Files
• mkl.h

Description

The routine computes all the eigenvalues, and optionally, the eigenvectors of a real generalized symmetric-
definite banded eigenproblem, of the form A*x = λ*B*x. Here A and B are assumed to be symmetric and
banded, and B is also positive definite.
If eigenvectors are desired, it uses a divide and conquer algorithm.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then compute eigenvalues only.

If jobz = 'V', then compute eigenvalues and eigenvectors.

uplo Must be 'U' or 'L'.

If uplo = 'U', arrays ab and bb store the upper triangles of A and B;

If uplo = 'L', arrays ab and bb store the lower triangles of A and B.

n The order of the matrices A and B (n≥ 0).

ka The number of super- or sub-diagonals in A


(ka≥ 0).

kb The number of super- or sub-diagonals in B (kb≥ 0).

ab, bb Arrays:
ab(size at least max(1, ldab*n) for column major layout and max(1,
ldab*(ka + 1)) for row major layout) is an array containing either upper or
lower triangular part of the symmetric matrix A (as specified by uplo) in
band storage format.
bb(size at least max(1, ldbb*n) for column major layout and max(1,
ldbb*(kb + 1)) for row major layout) is an array containing either upper or
lower triangular part of the symmetric matrix B (as specified by uplo) in
band storage format.

ldab The leading dimension of the array ab; must be at least ka+1 for column
major layout and at least max(1, n) for row major layout.

ldbb The leading dimension of the array bb; must be at least kb+1 for column
major layout and at least max(1, n) for row major layout.

ldz The leading dimension of the output array z; ldz≥ 1. If jobz = 'V', ldz≥
max(1, n).


Output Parameters

ab On exit, the contents of ab are overwritten.

bb On exit, contains the factor S from the split Cholesky factorization B =


S^T*S, as returned by pbstf/pbstf.

w, z Arrays:
w, size at least max(1, n).
If info = 0, contains the eigenvalues in ascending order.

z (size at least max(1, ldz*n)).


If jobz = 'V', then if info = 0, z contains the matrix Z of eigenvectors,
with the i-th column of z holding the eigenvector associated with w[i -
1]. The eigenvectors are normalized so that Z^T*B*Z = I.
If jobz = 'N', then z is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, and

if i≤n, the algorithm failed to converge, and i off-diagonal elements of an intermediate tridiagonal did not
converge to zero;
if info = n + i, for 1 ≤i≤n, then pbstf/pbstf returned info = i and B is not positive-definite. The
factorization of B could not be completed and no eigenvalues or eigenvectors were computed.

?hbgvd
Computes all eigenvalues and, optionally,
eigenvectors of a complex generalized Hermitian
positive-definite eigenproblem with banded matrices.
If eigenvectors are desired, it uses a divide and
conquer method.

Syntax
lapack_int LAPACKE_chbgvd( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int ka, lapack_int kb, lapack_complex_float* ab, lapack_int ldab,
lapack_complex_float* bb, lapack_int ldbb, float* w, lapack_complex_float* z,
lapack_int ldz );
lapack_int LAPACKE_zhbgvd( int matrix_layout, char jobz, char uplo, lapack_int n,
lapack_int ka, lapack_int kb, lapack_complex_double* ab, lapack_int ldab,
lapack_complex_double* bb, lapack_int ldbb, double* w, lapack_complex_double* z,
lapack_int ldz );

Include Files
• mkl.h

Description

The routine computes all the eigenvalues, and optionally, the eigenvectors of a complex generalized
Hermitian positive-definite banded eigenproblem, of the form A*x = λ*B*x. Here A and B are assumed to be
Hermitian and banded, and B is also positive definite.
If eigenvectors are desired, it uses a divide and conquer algorithm.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then compute eigenvalues only.

If jobz = 'V', then compute eigenvalues and eigenvectors.

uplo Must be 'U' or 'L'.

If uplo = 'U', arrays ab and bb store the upper triangles of A and B;

If uplo = 'L', arrays ab and bb store the lower triangles of A and B.

n The order of the matrices A and B (n≥ 0).

ka The number of super- or sub-diagonals in A


(ka≥0).

kb The number of super- or sub-diagonals in B (kb≥ 0).

ab, bb Arrays:
ab(size at least max(1, ldab*n) for column major layout and max(1,
ldab*(ka + 1)) for row major layout) is an array containing either upper or
lower triangular part of the Hermitian matrix A (as specified by uplo) in
band storage format.
bb(size at least max(1, ldbb*n) for column major layout and max(1,
ldbb*(kb + 1)) for row major layout) is an array containing either upper or
lower triangular part of the Hermitian matrix B (as specified by uplo) in
band storage format.

ldab The leading dimension of the array ab; must be at least ka+1.

ldbb The leading dimension of the array bb; must be at least kb+1.

ldz The leading dimension of the output array z; ldz≥ 1. If jobz = 'V', ldz≥
max(1, n).

Output Parameters

ab On exit, the contents of ab are overwritten.

bb On exit, contains the factor S from the split Cholesky factorization B =


S^H*S, as returned by pbstf/pbstf.

w Array, size at least max(1, n) .


If info = 0, contains the eigenvalues in ascending order.


z Array z (size at least max(1, ldz*n)).

If jobz = 'V', then if info = 0, z contains the matrix Z of eigenvectors,


with the i-th column of z holding the eigenvector associated with w(i). The
eigenvectors are normalized so that Z^H*B*Z = I.

If jobz = 'N', then z is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, and

if i≤n, the algorithm failed to converge, and i off-diagonal elements of an intermediate tridiagonal did not
converge to zero;
if info = n + i, for 1 ≤i≤n, then pbstf/pbstf returned info = i and B is not positive-definite. The
factorization of B could not be completed and no eigenvalues or eigenvectors were computed.

?sbgvx
Computes selected eigenvalues and, optionally,
eigenvectors of a real generalized symmetric definite
eigenproblem with banded matrices.

Syntax
lapack_int LAPACKE_ssbgvx (int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_int ka, lapack_int kb, float* ab, lapack_int ldab, float* bb,
lapack_int ldbb, float* q, lapack_int ldq, float vl, float vu, lapack_int il,
lapack_int iu, float abstol, lapack_int* m, float* w, float* z, lapack_int ldz,
lapack_int* ifail);
lapack_int LAPACKE_dsbgvx (int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_int ka, lapack_int kb, double* ab, lapack_int ldab, double* bb,
lapack_int ldbb, double* q, lapack_int ldq, double vl, double vu, lapack_int il,
lapack_int iu, double abstol, lapack_int* m, double* w, double* z, lapack_int ldz,
lapack_int* ifail);

Include Files
• mkl.h

Description

The routine computes selected eigenvalues, and optionally, the eigenvectors of a real generalized symmetric-
definite banded eigenproblem, of the form A*x = λ*B*x. Here A and B are assumed to be symmetric and
banded, and B is also positive definite. Eigenvalues and eigenvectors can be selected by specifying either all
eigenvalues, a range of values or a range of indices for the desired eigenvalues.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then compute eigenvalues only.

If jobz = 'V', then compute eigenvalues and eigenvectors.

range Must be 'A' or 'V' or 'I'.

If range = 'A', the routine computes all eigenvalues.

If range = 'V', the routine computes eigenvalues w[i] in the half-open


interval:
vl<w[i]≤vu.
If range = 'I', the routine computes eigenvalues with indices il to iu.

uplo Must be 'U' or 'L'.

If uplo = 'U', arrays ab and bb store the upper triangles of A and B;

If uplo = 'L', arrays ab and bb store the lower triangles of A and B.

n The order of the matrices A and B (n≥ 0).

ka The number of super- or sub-diagonals in A


(ka≥ 0).

kb The number of super- or sub-diagonals in B (kb≥ 0).

ab, bb Arrays:
ab(size at least max(1, ldab*n) for column major layout and max(1,
ldab*(ka + 1)) for row major layout) is an array containing either upper or
lower triangular part of the symmetric matrix A (as specified by uplo) in
band storage format.
bb(size at least max(1, ldbb*n) for column major layout and max(1,
ldbb*(kb + 1)) for row major layout) is an array containing either upper or
lower triangular part of the symmetric matrix B (as specified by uplo) in
band storage format.

ldab The leading dimension of the array ab; must be at least ka+1 for column
major layout and at least max(1, n) for row major layout.

ldbb The leading dimension of the array bb; must be at least kb+1 for column
major layout and at least max(1, n) for row major layout.

vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl< vu.

If range = 'A' or 'I', vl and vu are not referenced.

il, iu
If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤il≤iu≤n, if n > 0; il=1 and iu=0

if n = 0.

If range = 'A' or 'V', il and iu are not referenced.


abstol The absolute error tolerance for the eigenvalues. See Application Notes for
more information.

ldz The leading dimension of the output array z; ldz≥ 1. If jobz = 'V', ldz≥
max(1, n).

ldq The leading dimension of the output array q; ldq≥ 1.

If jobz = 'V', ldq≥ max(1, n).

Output Parameters

ab On exit, the contents of ab are overwritten.

bb On exit, contains the factor S from the split Cholesky factorization B =


S^T*S, as returned by pbstf/pbstf.

m The total number of eigenvalues found,


0 ≤m≤n. If range = 'A', m = n, and if range = 'I',
m = iu-il+1.

w, z, q Arrays:
w, size at least max(1, n) .
If info = 0, contains the eigenvalues in ascending order.

z(size max(1, ldz*m) for column major layout and max(1, ldz*n) for row
major layout) .
If jobz = 'V', then if info = 0, z contains the matrix Z of eigenvectors,
with the i-th column of z holding the eigenvector associated with w(i). The
eigenvectors are normalized so that Z^T*B*Z = I.

If jobz = 'N', then z is not referenced.

q (size max(1, ldq*n)) .


If jobz = 'V', then q contains the n-by-n matrix used in the reduction of
A*x = lambda*B*x to standard form, that is, C*x= lambda*x and
consequently C to tridiagonal form.
If jobz = 'N', then q is not referenced.

ifail
Array, size m.
If jobz = 'V', then if info = 0, the first m elements of ifail are zero; if
info > 0, ifail contains the indices of the eigenvectors that failed to
converge.
If jobz = 'N', then ifail is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, and

if i≤n, the algorithm failed to converge, and i off-diagonal elements of an intermediate tridiagonal did not
converge to zero;
if info = n + i, for 1 ≤i≤n, then pbstf/pbstf returned info = i and B is not positive-definite. The
factorization of B could not be completed and no eigenvalues or eigenvectors were computed.

Application Notes
An approximate eigenvalue is accepted as converged when it is determined to lie in an interval [a,b] of width
less than or equal to abstol+ε*max(|a|,|b|), where ε is the machine precision.

If abstol is less than or equal to zero, then ε*||T||_1 is used as tolerance, where T is the tridiagonal matrix
obtained by reducing A to tridiagonal form. Eigenvalues will be computed most accurately when abstol is set
to twice the underflow threshold 2*?lamch('S'), not zero.

If this routine returns with info > 0, indicating that some eigenvectors did not converge, try setting abstol
to 2*?lamch('S').
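
A hypothetical fragment (values purely illustrative; the band storage is the same as in the ?sbgv sketch
above) selects the eigenvalues lying in the half-open interval (0, 2.5] with range = 'V'. Because m is not
known in advance for this range option, z and ifail are sized here for the worst case of n eigenvalues:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 3, ka = 1, kb = 0, ldab = 2, ldbb = 1, ldq = 3, ldz = 3;
    lapack_int m, ifail[3];
    double ab[6] = { 0.0, 2.0,  1.0, 2.0,  1.0, 2.0 };   /* A = tridiag(1, 2, 1) */
    double bb[3] = { 1.0, 1.0, 1.0 };                    /* B = I                */
    double q[9], w[3], z[9];
    double abstol = 2.0 * LAPACKE_dlamch('S');

    lapack_int info = LAPACKE_dsbgvx(LAPACK_COL_MAJOR, 'V', 'V', 'U', n, ka, kb,
                                     ab, ldab, bb, ldbb, q, ldq,
                                     0.0, 2.5, 0, 0, abstol,
                                     &m, w, z, ldz, ifail);
    printf("info = %d, %d eigenvalue(s) in (0, 2.5]\n", (int)info, (int)m);
    return 0;
}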

?hbgvx
Computes selected eigenvalues and, optionally,
eigenvectors of a complex generalized Hermitian
positive-definite eigenproblem with banded matrices.

Syntax
lapack_int LAPACKE_chbgvx( int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_int ka, lapack_int kb, lapack_complex_float* ab, lapack_int ldab,
lapack_complex_float* bb, lapack_int ldbb, lapack_complex_float* q, lapack_int ldq,
float vl, float vu, lapack_int il, lapack_int iu, float abstol, lapack_int* m, float*
w, lapack_complex_float* z, lapack_int ldz, lapack_int* ifail );
lapack_int LAPACKE_zhbgvx( int matrix_layout, char jobz, char range, char uplo,
lapack_int n, lapack_int ka, lapack_int kb, lapack_complex_double* ab, lapack_int
ldab, lapack_complex_double* bb, lapack_int ldbb, lapack_complex_double* q, lapack_int
ldq, double vl, double vu, lapack_int il, lapack_int iu, double abstol, lapack_int* m,
double* w, lapack_complex_double* z, lapack_int ldz, lapack_int* ifail );

Include Files
• mkl.h

Description

The routine computes selected eigenvalues, and optionally, the eigenvectors of a complex generalized
Hermitian positive-definite banded eigenproblem, of the form A*x = λ*B*x. Here A and B are assumed to be
Hermitian and banded, and B is also positive definite. Eigenvalues and eigenvectors can be selected by
specifying either all eigenvalues, a range of values or a range of indices for the desired eigenvalues.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

jobz Must be 'N' or 'V'.

If jobz = 'N', then compute eigenvalues only.

If jobz = 'V', then compute eigenvalues and eigenvectors.


range Must be 'A' or 'V' or 'I'.

If range = 'A', the routine computes all eigenvalues.

If range = 'V', the routine computes eigenvalues w[i] in the half-open


interval:
vl< w[i]≤vu.
If range = 'I', the routine computes eigenvalues with indices il to iu.

uplo Must be 'U' or 'L'.

If uplo = 'U', arrays ab and bb store the upper triangles of A and B;

If uplo = 'L', arrays ab and bb store the lower triangles of A and B.

n The order of the matrices A and B (n≥ 0).

ka The number of super- or sub-diagonals in A


(ka≥ 0).

kb The number of super- or sub-diagonals in B (kb≥ 0).

ab, bb Arrays:
ab(size at least max(1, ldab*n) for column major layout and max(1,
ldab*(ka + 1)) for row major layout) is an array containing either upper or
lower triangular part of the Hermitian matrix A (as specified by uplo) in
band storage format.
bb(size at least max(1, ldbb*n) for column major layout and max(1,
ldbb*(kb + 1)) for row major layout) is an array containing either upper or
lower triangular part of the Hermitian matrix B (as specified by uplo) in
band storage format.

ldab The leading dimension of the array ab; must be at least ka+1 for column
major layout and at least max(1, n) for row major layout.

ldbb The leading dimension of the array bb; must be at least kb+1 for column
major layout and at least max(1, n) for row major layout.

vl, vu If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
Constraint: vl< vu.

If range = 'A' or 'I', vl and vu are not referenced.

il, iu
If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned.
Constraint: 1 ≤il≤iu≤n, if n > 0; il=1 and iu=0

if n = 0.

If range = 'A' or 'V', il and iu are not referenced.

abstol The absolute error tolerance for the eigenvalues. See Application Notes for
more information.

ldz The leading dimension of the output array z; ldz≥ 1. If jobz = 'V', ldz≥
max(1, n) for column major layout and at least max(1, m) for row major
layout.

ldq The leading dimension of the output array q; ldq≥ 1. If jobz = 'V', ldq≥
max(1, n).

Output Parameters

ab On exit, the contents of ab are overwritten.

bb On exit, contains the factor S from the split Cholesky factorization B =


S^H*S, as returned by pbstf/pbstf.

m The total number of eigenvalues found,


0 ≤m≤n. If range = 'A', m = n, and if range = 'I',
m = iu-il+1.

w Array w, size at least max(1, n).


If info = 0, contains the eigenvalues in ascending order.

z, q Arrays:
z(size max(1, ldz*m) for column major layout and max(1, ldz*n) for row
major layout).
If jobz = 'V', then if info = 0, z contains the matrix Z of eigenvectors,
with the i-th column of z holding the eigenvector associated with w[i -
1]. The eigenvectors are normalized so that Z^H*B*Z = I.
If jobz = 'N', then z is not referenced.

q (size max(1, ldq*n)).


If jobz = 'V', then q contains the n-by-n matrix used in the reduction of
Ax = λBx to standard form, that is, Cx = λx and consequently C to
tridiagonal form.
If jobz = 'N', then q is not referenced.

ifail
Array, size at least max(1, n).
If jobz = 'V', then if info = 0, the first m elements of ifail are zero; if
info > 0, ifail contains the indices of the eigenvectors that failed to
converge.
If jobz = 'N', then ifail is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, and


if i≤n, the algorithm failed to converge, and i off-diagonal elements of an intermediate tridiagonal did not
converge to zero;
if info = n + i, for 1 ≤i≤n, then pbstf/pbstf returned info = i and B is not positive-definite. The
factorization of B could not be completed and no eigenvalues or eigenvectors were computed.

Application Notes
An approximate eigenvalue is accepted as converged when it is determined to lie in an interval [a,b] of width
less than or equal to abstol+ε*max(|a|,|b|), where ε is the machine precision.

If abstol is less than or equal to zero, then ε*||T||_1 will be used in its place, where T is the tridiagonal
matrix obtained by reducing A to tridiagonal form. Eigenvalues will be computed most accurately when abstol
is set to twice the underflow threshold 2*?lamch('S'), not zero.

If this routine returns with info > 0, indicating that some eigenvectors did not converge, try setting abstol
to 2*?lamch('S').

Generalized Nonsymmetric Eigenvalue Problems: LAPACK Driver Routines


This section describes LAPACK driver routines used for solving generalized nonsymmetric eigenproblems. See
also computational routines that can be called to solve these problems. Table "Driver Routines for Solving
Generalized Nonsymmetric Eigenproblems" lists all such driver routines.
Driver Routines for Solving Generalized Nonsymmetric Eigenproblems
Routine Name Operation performed

gges Computes the generalized eigenvalues, Schur form, and the left and/or right Schur
vectors for a pair of nonsymmetric matrices.

ggesx Computes the generalized eigenvalues, Schur form, and, optionally, the left and/or
right matrices of Schur vectors.

gges3 Computes generalized Schur factorization for a pair of matrices.

ggev Computes the generalized eigenvalues, and the left and/or right generalized
eigenvectors for a pair of nonsymmetric matrices.

ggevx Computes the generalized eigenvalues, and, optionally, the left and/or right
generalized eigenvectors.

ggev3 Computes generalized Schur factorization for a pair of matrices.

?gges
Computes the generalized eigenvalues, Schur form,
and the left and/or right Schur vectors for a pair of
nonsymmetric matrices.

Syntax
lapack_int LAPACKE_sgges( int matrix_layout, char jobvsl, char jobvsr, char sort,
LAPACK_S_SELECT3 select, lapack_int n, float* a, lapack_int lda, float* b, lapack_int
ldb, lapack_int* sdim, float* alphar, float* alphai, float* beta, float* vsl,
lapack_int ldvsl, float* vsr, lapack_int ldvsr );
lapack_int LAPACKE_dgges( int matrix_layout, char jobvsl, char jobvsr, char sort,
LAPACK_D_SELECT3 select, lapack_int n, double* a, lapack_int lda, double* b,
lapack_int ldb, lapack_int* sdim, double* alphar, double* alphai, double* beta,
double* vsl, lapack_int ldvsl, double* vsr, lapack_int ldvsr );

lapack_int LAPACKE_cgges( int matrix_layout, char jobvsl, char jobvsr, char sort,
LAPACK_C_SELECT2 select, lapack_int n, lapack_complex_float* a, lapack_int lda,
lapack_complex_float* b, lapack_int ldb, lapack_int* sdim, lapack_complex_float* alpha,
lapack_complex_float* beta, lapack_complex_float* vsl, lapack_int ldvsl,
lapack_complex_float* vsr, lapack_int ldvsr );
lapack_int LAPACKE_zgges( int matrix_layout, char jobvsl, char jobvsr, char sort,
LAPACK_Z_SELECT2 select, lapack_int n, lapack_complex_double* a, lapack_int lda,
lapack_complex_double* b, lapack_int ldb, lapack_int* sdim, lapack_complex_double*
alpha, lapack_complex_double* beta, lapack_complex_double* vsl, lapack_int ldvsl,
lapack_complex_double* vsr, lapack_int ldvsr );

Include Files
• mkl.h

Description

The ?gges routine computes the generalized eigenvalues, the generalized real/complex Schur form (S,T),
optionally, the left and/or right matrices of Schur vectors (vsl and vsr) for a pair of n-by-n real/complex
nonsymmetric matrices (A,B). This gives the generalized Schur factorization
(A,B) = ( vsl*S*vsr^H, vsl*T*vsr^H )
Optionally, it also orders the eigenvalues so that a selected cluster of eigenvalues appears in the leading
diagonal blocks of the upper quasi-triangular matrix S and the upper triangular matrix T. The leading
columns of vsl and vsr then form an orthonormal/unitary basis for the corresponding left and right
eigenspaces (deflating subspaces).
If only the generalized eigenvalues are needed, use the driver ggev instead, which is faster.
A generalized eigenvalue for a pair of matrices (A,B) is a scalar w or a ratio alpha / beta = w, such that A -
w*B is singular. It is usually represented as the pair (alpha, beta), as there is a reasonable interpretation
for beta=0 or for both being zero. A pair of matrices (S,T) is in the generalized real Schur form if T is upper
triangular with non-negative diagonal and S is block upper triangular with 1-by-1 and 2-by-2 blocks. 1-by-1
blocks correspond to real generalized eigenvalues, while 2-by-2 blocks of S are "standardized" by making the
corresponding elements of T have the form:

[ a  0 ]
[ 0  b ]

and the pair of corresponding 2-by-2 blocks in S and T will have a complex conjugate pair of generalized
eigenvalues. A pair of matrices (S,T) is in generalized complex Schur form if S and T are upper triangular
and, in addition, the diagonal elements of T are non-negative real numbers.
The ?gges routine replaces the deprecated ?gegs routine.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

jobvsl Must be 'N' or 'V'.

If jobvsl = 'N', then the left Schur vectors are not computed.


If jobvsl = 'V', then the left Schur vectors are computed.

jobvsr Must be 'N' or 'V'.

If jobvsr = 'N', then the right Schur vectors are not computed.

If jobvsr = 'V', then the right Schur vectors are computed.

sort Must be 'N' or 'S'. Specifies whether or not to order the eigenvalues on
the diagonal of the generalized Schur form.
If sort = 'N', then eigenvalues are not ordered.

If sort = 'S', eigenvalues are ordered (see select).

select The select parameter is a pointer to a function returning a value of


lapack_logical type. For different flavors the function has different
arguments:
LAPACKE_sgges: lapack_logical (*LAPACK_S_SELECT3) ( const
float*, const float*, const float* );
LAPACKE_dgges: lapack_logical (*LAPACK_D_SELECT3) ( const
double*, const double*, const double* );
LAPACKE_cgges: lapack_logical (*LAPACK_C_SELECT2) ( const
lapack_complex_float*, const lapack_complex_float* );
LAPACKE_zgges: lapack_logical (*LAPACK_Z_SELECT2) ( const
lapack_complex_double*, const lapack_complex_double* );
If sort = 'S', select is used to select eigenvalues to sort to the top left
of the Schur form.
If sort = 'N', select is not referenced.

For real flavors:


An eigenvalue (alphar[j] + alphai[j])/beta[j] is selected if select(alphar[j],
alphai[j], beta[j]) is true; that is, if either one of a complex conjugate pair
of eigenvalues is selected, then both complex eigenvalues are selected.
Note that in the ill-conditioned case, a selected complex eigenvalue may no
longer satisfy select(alphar[j], alphai[j], beta[j]) = 1 after
ordering. In this case info is set to n+2.
For complex flavors:
An eigenvalue alpha[j] / beta[j] is selected if select(alpha[j], beta[j])
is true.
Note that a selected complex eigenvalue may no longer satisfy
select(alpha[j], beta[j]) = 1 after ordering, since ordering may
change the value of complex eigenvalues (especially if the eigenvalue is ill-
conditioned); in this case info is set to n+2 (see info below).

n The order of the matrices A, B, vsl, and vsr (n≥ 0).

a, b Arrays:
a (size at least max(1, lda*n)) is an array containing the n-by-n matrix A
(first of the pair of matrices).

b (size at least max(1, ldb*n)) is an array containing the n-by-n matrix B
(second of the pair of matrices).

lda The leading dimension of the array a. Must be at least max(1, n).

ldb The leading dimension of the array b. Must be at least max(1, n).

ldvsl, ldvsr The leading dimensions of the output matrices vsl and vsr, respectively.
Constraints:
ldvsl≥ 1. If jobvsl = 'V', ldvsl≥ max(1, n).
ldvsr≥ 1. If jobvsr = 'V', ldvsr≥ max(1, n).

Output Parameters

a On exit, this array has been overwritten by its generalized Schur form S.

b On exit, this array has been overwritten by its generalized Schur form T.

sdim
If sort = 'N', sdim= 0.

If sort = 'S', sdim is equal to the number of eigenvalues (after sorting)


for which select is true.

Note that for real flavors complex conjugate pairs for which select is true
for either eigenvalue count as 2.

alphar, alphai Arrays, size at least max(1, n) each. Contain values that form generalized
eigenvalues in real flavors.
See beta.

alpha Array, size at least max(1, n). Contain values that form generalized
eigenvalues in complex flavors. See beta.

beta Array, size at least max(1, n).


For real flavors:
On exit, (alphar[j] + alphai[j]*i)/beta[j], j=0,..., n - 1, will be the
generalized eigenvalues.
alphar[j] + alphai[j]*i and beta[j], j=0,..., n - 1 are the diagonals of the
complex Schur form (S,T) that would result if the 2-by-2 diagonal blocks of
the real generalized Schur form of (A,B) were further reduced to triangular
form using complex unitary transformations. If alphai[j] is zero, then the j-
th eigenvalue is real; if positive, then the j-th and (j+1)-st eigenvalues are
a complex conjugate pair, with alphai[j+1] negative.
For complex flavors:
On exit, alpha[j]/beta[j], j=0,..., n - 1, will be the generalized eigenvalues.
alpha[j] and beta[j], j=0,..., n - 1 are the diagonals of the complex Schur
form (S,T) output by cgges/zgges. The beta[j] will be non-negative real.

See also Application Notes below.


vsl, vsr Arrays:
vsl (size at least max(1, ldvsl*n)).

If jobvsl = 'V', this array will contain the left Schur vectors.


If jobvsl = 'N', vsl is not referenced.

vsr (size at least max(1, ldvsr*n)).

If jobvsr = 'V', this array will contain the right Schur vectors.

If jobvsr = 'N', vsr is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, and

i≤n:
the QZ iteration failed. (A, B) is not in Schur form, but alphar[j], alphai[j] (for real flavors), or alpha[j] (for
complex flavors), and beta[j], j = info,..., n - 1 should be correct.

i > n: errors that usually indicate LAPACK problems:


i = n+1: other than QZ iteration failed in hgeqz;
i = n+2: after reordering, roundoff changed values of some complex eigenvalues so that leading
eigenvalues in the generalized Schur form no longer satisfy select = 1. This could also be caused by
scaling;
i = n+3: reordering failed in tgsen.

Application Notes
The quotients alphar[j]/beta[j] and alphai[j]/beta[j] may easily over- or underflow, and beta[j] may even be
zero. Thus, you should avoid simply computing the ratio. However, alphar and alphai will always be less than
and usually comparable with norm(A) in magnitude, and beta always less than and usually comparable with
norm(B).
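
The following sketch shows one way to call LAPACKE_dgges with eigenvalue ordering; it is illustrative only. The 2-by-2 matrices, the column-major storage, and the left_half_plane selector (which moves eigenvalues with negative real part to the leading block of (S,T)) are assumptions, not part of the reference.

#include <stdio.h>
#include <mkl.h>

/* Illustrative selector matching the LAPACK_D_SELECT3 prototype: select the
   generalized eigenvalue (alphar + alphai*i)/beta when its real part alphar/beta
   is negative. An infinite eigenvalue (beta == 0) is not selected. */
static lapack_logical left_half_plane(const double* alphar, const double* alphai,
                                      const double* beta)
{
    (void)alphai;                              /* the real part alphar/beta does not involve alphai */
    if (*beta == 0.0 || *alphar == 0.0) return 0;
    return (*alphar < 0.0) != (*beta < 0.0);   /* opposite signs give a negative ratio */
}

int main(void)
{
    const lapack_int n = 2, ld = 2;
    double a[4] = { 4.0, 1.0,    /* column 1 of A (column-major) */
                    3.0, 2.0 };  /* column 2 of A */
    double b[4] = { 2.0, 0.0,
                    1.0, 1.0 };
    double alphar[2], alphai[2], beta[2], vsl[4], vsr[4];
    lapack_int sdim = 0;

    lapack_int info = LAPACKE_dgges(LAPACK_COL_MAJOR, 'V', 'V', 'S',
                                    left_half_plane, n, a, ld, b, ld, &sdim,
                                    alphar, alphai, beta, vsl, ld, vsr, ld);
    if (info != 0) {
        printf("LAPACKE_dgges failed, info = %d\n", (int)info);
        return 1;
    }
    printf("%d selected eigenvalue(s) moved to the leading block of (S,T)\n", (int)sdim);
    for (lapack_int j = 0; j < n; ++j)
        printf("eigenvalue %d: (%g + %g*i) / %g\n", (int)j, alphar[j], alphai[j], beta[j]);
    return 0;
}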

?ggesx
Computes the generalized eigenvalues, Schur form,
and, optionally, the left and/or right matrices of Schur
vectors.

Syntax
lapack_int LAPACKE_sggesx( int matrix_layout, char jobvsl, char jobvsr, char sort,
LAPACK_S_SELECT3 select, char sense, lapack_int n, float* a, lapack_int lda, float* b,
lapack_int ldb, lapack_int* sdim, float* alphar, float* alphai, float* beta, float*
vsl, lapack_int ldvsl, float* vsr, lapack_int ldvsr, float* rconde, float* rcondv );
lapack_int LAPACKE_dggesx( int matrix_layout, char jobvsl, char jobvsr, char sort,
LAPACK_D_SELECT3 select, char sense, lapack_int n, double* a, lapack_int lda, double*
b, lapack_int ldb, lapack_int* sdim, double* alphar, double* alphai, double* beta,
double* vsl, lapack_int ldvsl, double* vsr, lapack_int ldvsr, double* rconde, double*
rcondv );
lapack_int LAPACKE_cggesx( int matrix_layout, char jobvsl, char jobvsr, char sort,
LAPACK_C_SELECT2 select, char sense, lapack_int n, lapack_complex_float* a, lapack_int
lda, lapack_complex_float* b, lapack_int ldb, lapack_int* sdim, lapack_complex_float*
alpha, lapack_complex_float* beta, lapack_complex_float* vsl, lapack_int ldvsl,
lapack_complex_float* vsr, lapack_int ldvsr, float* rconde, float* rcondv );

lapack_int LAPACKE_zggesx( int matrix_layout, char jobvsl, char jobvsr, char sort,
LAPACK_Z_SELECT2 select, char sense, lapack_int n, lapack_complex_double* a, lapack_int
lda, lapack_complex_double* b, lapack_int ldb, lapack_int* sdim, lapack_complex_double*
alpha, lapack_complex_double* beta, lapack_complex_double* vsl, lapack_int ldvsl,
lapack_complex_double* vsr, lapack_int ldvsr, double* rconde, double* rcondv );

Include Files
• mkl.h

Description

The routine computes for a pair of n-by-n real/complex nonsymmetric matrices (A,B), the generalized
eigenvalues, the generalized real/complex Schur form (S,T), optionally, the left and/or right matrices of
Schur vectors (vsl and vsr). This gives the generalized Schur factorization
(A,B) = ( vsl*S*vsr^H, vsl*T*vsr^H )
Optionally, it also orders the eigenvalues so that a selected cluster of eigenvalues appears in the leading
diagonal blocks of the upper quasi-triangular matrix S and the upper triangular matrix T; computes a
reciprocal condition number for the average of the selected eigenvalues (rconde); and computes a reciprocal
condition number for the right and left deflating subspaces corresponding to the selected eigenvalues
(rcondv). The leading columns of vsl and vsr then form an orthonormal/unitary basis for the corresponding
left and right eigenspaces (deflating subspaces).
A generalized eigenvalue for a pair of matrices (A,B) is a scalar w or a ratio alpha / beta = w, such that A
- w*B is singular. It is usually represented as the pair (alpha, beta), as there is a reasonable interpretation
for beta=0 or for both being zero. A pair of matrices (S,T) is in generalized real Schur form if T is upper
triangular with non-negative diagonal and S is block upper triangular with 1-by-1 and 2-by-2 blocks. 1-by-1
blocks correspond to real generalized eigenvalues, while 2-by-2 blocks of S will be "standardized" by making
the corresponding elements of T have the form:

[ a  0 ]
[ 0  b ]

and the pair of corresponding 2-by-2 blocks in S and T will have a complex conjugate pair of generalized
eigenvalues. A pair of matrices (S,T) is in generalized complex Schur form if S and T are upper triangular
and, in addition, the diagonal elements of T are non-negative real numbers.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

jobvsl Must be 'N' or 'V'.

If jobvsl = 'N', then the left Schur vectors are not computed.

If jobvsl = 'V', then the left Schur vectors are computed.

jobvsr Must be 'N' or 'V'.

If jobvsr = 'N', then the right Schur vectors are not computed.

If jobvsr = 'V', then the right Schur vectors are computed.


sort Must be 'N' or 'S'. Specifies whether or not to order the eigenvalues on
the diagonal of the generalized Schur form.
If sort = 'N', then eigenvalues are not ordered.

If sort = 'S', eigenvalues are ordered (see select).

select The select parameter is a pointer to a function returning a value of


lapack_logical type. For different flavors the function has different
arguments:
LAPACKE_sggesx: lapack_logical (*LAPACK_S_SELECT3) ( const
float*, const float*, const float* );
LAPACKE_dggesx: lapack_logical (*LAPACK_D_SELECT3) ( const
double*, const double*, const double* );
LAPACKE_cggesx: lapack_logical (*LAPACK_C_SELECT2) ( const
lapack_complex_float*, const lapack_complex_float* );
LAPACKE_zggesx: lapack_logical (*LAPACK_Z_SELECT2) ( const
lapack_complex_double*, const lapack_complex_double* );
If sort = 'S', select is used to select eigenvalues to sort to the top left
of the Schur form.
If sort = 'N', select is not referenced.

For real flavors:


An eigenvalue (alphar[j] + alphai[j]*i)/beta[j] is selected if select(alphar[j],
alphai[j], beta[j]) is true; that is, if either one of a complex conjugate pair
of eigenvalues is selected, then both complex eigenvalues are selected.
Note that in the ill-conditioned case, a selected complex eigenvalue may no
longer satisfy select(alphar[j], alphai[j], beta[j]) = 1 after
ordering. In this case info is set to n+2.
For complex flavors:
An eigenvalue alpha[j] / beta[j] is selected if select(alpha[j], beta[j])
is true.
Note that a selected complex eigenvalue may no longer satisfy
select(alpha[j], beta[j]) = 1 after ordering, since ordering may
change the value of complex eigenvalues (especially if the eigenvalue is ill-
conditioned); in this case info is set to n+2 (see info below).

sense Must be 'N', 'E', 'V', or 'B'. Determines which reciprocal condition
numbers are computed.
If sense = 'N', none are computed;

If sense = 'E', computed for average of selected eigenvalues only;

If sense = 'V', computed for selected deflating subspaces only;

If sense = 'B', computed for both.

If sense is 'E', 'V', or 'B', then sort must equal 'S'.

n The order of the matrices A, B, vsl, and vsr (n≥ 0).

a, b Arrays:

a (size at least max(1, lda*n)) is an array containing the n-by-n matrix A
(first of the pair of matrices).
b (size at least max(1, ldb*n)) is an array containing the n-by-n matrix B
(second of the pair of matrices).

lda The leading dimension of the array a.


Must be at least max(1, n).

ldb The leading dimension of the array b.


Must be at least max(1, n).

ldvsl, ldvsr The leading dimensions of the output matrices vsl and vsr, respectively.
Constraints:
ldvsl≥ 1. If jobvsl = 'V', ldvsl≥ max(1, n).
ldvsr≥ 1. If jobvsr = 'V', ldvsr≥ max(1, n).

Output Parameters

a On exit, this array has been overwritten by its generalized Schur form S.

b On exit, this array has been overwritten by its generalized Schur form T.

sdim
If sort = 'N', sdim= 0.

If sort = 'S', sdim is equal to the number of eigenvalues (after sorting)


for which select is true.

Note that for real flavors complex conjugate pairs for which select is true
for either eigenvalue count as 2.

alphar, alphai Arrays, size at least max(1, n) each. Contain values that form generalized
eigenvalues in real flavors.
See beta.

alpha Array, size at least max(1, n). Contain values that form generalized
eigenvalues in complex flavors. See beta.

beta Array, size at least max(1, n).


For real flavors:
On exit, (alphar[j] + alphai[j]*i)/beta[j], j=0,..., n - 1 will be the
generalized eigenvalues.
alphar[j] + alphai[j]*i and beta[j], j=0,..., n - 1 are the diagonals of the
complex Schur form (S,T) that would result if the 2-by-2 diagonal blocks of
the real generalized Schur form of (A,B) were further reduced to triangular
form using complex unitary transformations. If alphai[j] is zero, then the j-
th eigenvalue is real; if positive, then the j-th and (j+1)-st eigenvalues are
a complex conjugate pair, with alphai[j+1] negative.
For complex flavors:
On exit, alpha[j]/beta[j], j=0,..., n - 1 will be the generalized eigenvalues.
alpha[j] and beta[j], j=0,..., n - 1 are the diagonals of the complex Schur
form (S,T) output by cggesx/zggesx. The beta[j] will be non-negative real.


See also Application Notes below.


vsl, vsr Arrays:
vsl (size at least max(1, ldvsl*n)).

If jobvsl = 'V', this array will contain the left Schur vectors.

If jobvsl = 'N', vsl is not referenced.

vsr (size at least max(1, ldvsr*n)).

If jobvsr = 'V', this array will contain the right Schur vectors.

If jobvsr = 'N', vsr is not referenced.

rconde, rcondv Arrays, size 2 each


If sense = 'E' or 'B', rconde[0] and rconde[1] contain the reciprocal
condition numbers for the average of the selected eigenvalues.
Not referenced if sense = 'N' or 'V'.

If sense = 'V' or 'B', rcondv[0] and rcondv[1] contain the reciprocal


condition numbers for the selected deflating subspaces.
Not referenced if sense = 'N' or 'E'.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, and

i≤n:
the QZ iteration failed. (A, B) is not in Schur form, but alphar[j], alphai[j] (for real flavors), or alpha[j] (for
complex flavors), and beta[j], j = info,..., n - 1 should be correct.

i > n: errors that usually indicate LAPACK problems:


i = n+1: other than QZ iteration failed in hgeqz;
i = n+2: after reordering, roundoff changed values of some complex eigenvalues so that leading
eigenvalues in the generalized Schur form no longer satisfy select = 1. This could also be caused by
scaling;
i = n+3: reordering failed in tgsen.

Application Notes
The quotients alphar[j]/beta[j] and alphai[j]/beta[j] may easily over- or underflow, and beta[j] may even be
zero. Thus, you should avoid simply computing the ratio. However, alphar and alphai will always be less than
and usually comparable with norm(A) in magnitude, and beta always less than and usually comparable with
norm(B).
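
As a hedged usage sketch (the 2-by-2 data and the select_finite callback are assumptions made for illustration), the fragment below requests both condition-number estimates from LAPACKE_dggesx by passing sense = 'B' together with sort = 'S':

#include <stdio.h>
#include <mkl.h>

/* Illustrative selector: keep every finite eigenvalue, so that sorting is legal
   and rconde/rcondv refer to the whole spectrum. */
static lapack_logical select_finite(const double* alphar, const double* alphai,
                                    const double* beta)
{
    (void)alphar; (void)alphai;
    return *beta != 0.0;
}

int main(void)
{
    const lapack_int n = 2, ld = 2;
    double a[4] = { 1.0, 2.0,    /* column-major A = [1 0; 2 3] */
                    0.0, 3.0 };
    double b[4] = { 1.0, 0.0,    /* B = identity */
                    0.0, 1.0 };
    double alphar[2], alphai[2], beta[2], vsl[4], vsr[4];
    double rconde[2], rcondv[2];
    lapack_int sdim = 0;

    lapack_int info = LAPACKE_dggesx(LAPACK_COL_MAJOR, 'V', 'V', 'S', select_finite,
                                     'B', n, a, ld, b, ld, &sdim,
                                     alphar, alphai, beta, vsl, ld, vsr, ld,
                                     rconde, rcondv);
    if (info == 0)
        printf("sdim = %d, rconde = {%g, %g}, rcondv = {%g, %g}\n",
               (int)sdim, rconde[0], rconde[1], rcondv[0], rcondv[1]);
    return (int)info;
}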

?gges3
Computes generalized Schur factorization for a pair of
matrices.

Syntax
lapack_int LAPACKE_sgges3 (int matrix_layout, char jobvsl, char jobvsr, char sort,
LAPACK_S_SELECT3 selctg, lapack_int n, float * a, lapack_int lda, float * b,
lapack_int ldb, lapack_int * sdim, float * alphar, float * alphai, float * beta, float
* vsl, lapack_int ldvsl, float * vsr, lapack_int ldvsr);
lapack_int LAPACKE_dgges3 (int matrix_layout, char jobvsl, char jobvsr, char sort,
LAPACK_D_SELECT3 selctg, lapack_int n, double * a, lapack_int lda, double * b,
lapack_int ldb, lapack_int * sdim, double * alphar, double * alphai, double * beta,
double * vsl, lapack_int ldvsl, double * vsr, lapack_int ldvsr);
lapack_int LAPACKE_cgges3 (int matrix_layout, char jobvsl, char jobvsr, char sort,
LAPACK_C_SELECT2 selctg, lapack_int n, lapack_complex_float * a, lapack_int lda,
lapack_complex_float * b, lapack_int ldb, lapack_int * sdim, lapack_complex_float *
alpha, lapack_complex_float * beta, lapack_complex_float * vsl, lapack_int ldvsl,
lapack_complex_float * vsr, lapack_int ldvsr);
lapack_int LAPACKE_zgges3 (int matrix_layout, char jobvsl, char jobvsr, char sort,
LAPACK_Z_SELECT2 selctg, lapack_int n, lapack_complex_double * a, lapack_int lda,
lapack_complex_double * b, lapack_int ldb, lapack_int * sdim, lapack_complex_double *
alpha, lapack_complex_double * beta, lapack_complex_double * vsl, lapack_int ldvsl,
lapack_complex_double * vsr, lapack_int ldvsr);

Include Files
• mkl.h

Description
For a pair of n-by-n real or complex nonsymmetric matrices (A,B), ?gges3 computes the generalized
eigenvalues, the generalized real or complex Schur form (S,T), and optionally the left or right matrices of
Schur vectors (VSL and VSR). This gives the generalized Schur factorization
(A,B) = ( (VSL)*S*(VSR)^T, (VSL)*T*(VSR)^T ) for real (A,B)
or
(A,B) = ( (VSL)*S*(VSR)^H, (VSL)*T*(VSR)^H ) for complex (A,B)
where (VSR)^H is the conjugate-transpose of VSR.
Optionally, it also orders the eigenvalues so that a selected cluster of eigenvalues appears in the leading
diagonal blocks of the upper quasi-triangular matrix S and the upper triangular matrix T. The leading
columns of VSL and VSR then form an orthonormal basis for the corresponding left and right eigenspaces
(deflating subspaces).

NOTE
If only the generalized eigenvalues are needed, use the driver ?ggev instead, which is faster.

A generalized eigenvalue for a pair of matrices (A,B) is a scalar w or a ratio alpha/beta = w, such that A -
w*B is singular. It is usually represented as the pair (alpha,beta), as there is a reasonable interpretation for
beta=0 or both being zero.
For real flavors:
A pair of matrices (S,T) is in generalized real Schur form if T is upper triangular with non-negative diagonal
and S is block upper triangular with 1-by-1 and 2-by-2 blocks. 1-by-1 blocks correspond to real generalized
eigenvalues, while 2-by-2 blocks of S will be "standardized" by making the corresponding elements of T have
the form:


[ a  0 ]
[ 0  b ]

and the pair of corresponding 2-by-2 blocks in S and T have a complex conjugate pair of generalized
eigenvalues.
For complex flavors:
A pair of matrices (S,T) is in generalized complex Schur form if S and T are upper triangular and, in addition,
the diagonal elements of T are non-negative real numbers.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

jobvsl = 'N': do not compute the left Schur vectors;

= 'V': compute the left Schur vectors.

jobvsr = 'N': do not compute the right Schur vectors;

= 'V': compute the right Schur vectors.

sort Specifies whether or not to order the eigenvalues on the diagonal of the
generalized Schur form.
= 'N': Eigenvalues are not ordered;
= 'S': Eigenvalues are ordered (see selctg).

selctg selctg is a function of three arguments for real flavors or two arguments
for complex flavors. selctg must be declared EXTERNAL in the calling
subroutine. If sort = 'N', selctg is not referenced. If sort = 'S', selctg
is used to select eigenvalues to sort to the top left of the Schur form.
For real flavors:
An eigenvalue (alphar[j - 1] + alphai[j - 1]*i)/beta[j - 1] is
selected if selctg(alphar[j - 1],alphai[j - 1],beta[j - 1]) is true.
In other words, if either one of a complex conjugate pair of eigenvalues is
selected, then both complex eigenvalues are selected.
Note that in the ill-conditioned case, a selected complex eigenvalue may no
longer satisfy selctg(alphar[j - 1],alphai[j - 1], beta[j - 1]) ≠ 0
after ordering. In this case info is set to n+2.

For complex flavors:


An eigenvalue alpha[j - 1]/beta[j - 1] is selected if selctg(alpha[j
- 1],beta[j - 1]) is true.
Note that a selected complex eigenvalue may no longer satisfy
selctg(alpha[j - 1],beta[j - 1])≠ 0 after ordering, since ordering
may change the value of complex eigenvalues (especially if the eigenvalue
is ill-conditioned), in this case ?gges3 returns n + 2.

n The order of the matrices A, B, VSL, and VSR. n≥ 0.

a Array, size (lda*n). On entry, the first of the pair of matrices.

lda The leading dimension of a. lda≥ max(1,n).

b Array, size (ldb*n). On entry, the second of the pair of matrices.

ldb The leading dimension of b. ldb≥ max(1,n).

ldvsl The leading dimension of the matrix VSL. ldvsl≥ 1, and if jobvsl = 'V',
ldvsl≥ n.

ldvsr The leading dimension of the matrix VSR. ldvsr≥ 1, and if jobvsr = 'V',
ldvsr≥ n.

Output Parameters

a On exit, a is overwritten by its generalized Schur form S.

b On exit, b is overwritten by its generalized Schur form T.

sdim If sort = 'N', sdim = 0. If sort = 'S', sdim = number of eigenvalues (after
sorting) for which selctg is true.

alpha Array, size (n).

alphar Array, size (n).

alphai Array, size (n).

beta Array, size (n).

For real flavors:


On exit, (alphar[j - 1] + alphai[j - 1]*i)/beta[j - 1], j=1,...,n,
are the generalized eigenvalues. alphar[j - 1] + alphai[j - 1]*i, and
beta[j - 1],j=1,...,n are the diagonals of the complex Schur form (S,T)
that would result if the 2-by-2 diagonal blocks of the real Schur form of
(a,b) were further reduced to triangular form using 2-by-2 complex unitary
transformations. If alphai[j - 1] is zero, then the j-th eigenvalue is real;
if positive, then the j-th and (j+1)-st eigenvalues are a complex conjugate
pair, with alphai[j] negative.

Note: the quotients alphar[j - 1]/beta[j - 1] and alphai[j - 1]/


beta[j - 1] can easily over- or underflow, and beta[j - 1] might even
be zero. Thus, you should avoid computing the ratio alpha/beta by simply
dividing alpha by beta. However, alphar and alphai are always less than
and usually comparable with norm(a) in magnitude, and beta is always less
than and usually comparable with norm(b).

For complex flavors:


On exit, alpha[j - 1]/beta[j - 1], j=1,...,n, are the
generalized eigenvalues. alpha[j - 1], j=1,...,n and beta[j - 1],
j=1,...,n are the diagonals of the complex Schur form (a,b) output by
?gges3. The beta[j - 1] is non-negative real.

Note: the quotient alpha[j - 1]/beta[j - 1] can easily over- or


underflow, and beta[j - 1] might even be zero. Thus, you should avoid
computing the ratio alpha/beta by simply dividing alpha by beta.
However, alpha is always less than and usually comparable with norm(a) in
magnitude, and beta is always less than and usually comparable with
norm(b).


vsl Array, size (ldvsl*n).

If jobvsl = 'V', vsl contains the left Schur vectors. Not referenced if
jobvsl = 'N'.

vsr Array, size (ldvsr*n).

If jobvsr = 'V', vsr contains the right Schur vectors. Not referenced if
jobvsr = 'N'.

Return Values
This function returns a value info.

= 0: successful exit
< 0: if info = -i, the i-th argument had an illegal value.

=1,...,n:

for real flavors:

The QZ iteration failed. (a,b) are not in Schur form, but alphar[j], alphai[j], and beta[j] should be
correct for j=info,...,n - 1.

for complex flavors:

The QZ iteration failed. (a,b) are not in Schur form, but alpha[j] and beta[j] should be correct for
j=info,...,n - 1.

> n:

=n+1: other than QZ iteration failed in ?hgeqz.

=n+2: after reordering, roundoff changed values of some complex eigenvalues so that leading eigenvalues in
the generalized Schur form no longer satisfy selctg ≠ 0. This could also be caused by scaling.

=n+3: reordering failed in ?tgsen.

?ggev
Computes the generalized eigenvalues, and the left
and/or right generalized eigenvectors for a pair of
nonsymmetric matrices.

Syntax
lapack_int LAPACKE_sggev( int matrix_layout, char jobvl, char jobvr, lapack_int n,
float* a, lapack_int lda, float* b, lapack_int ldb, float* alphar, float* alphai,
float* beta, float* vl, lapack_int ldvl, float* vr, lapack_int ldvr );
lapack_int LAPACKE_dggev( int matrix_layout, char jobvl, char jobvr, lapack_int n,
double* a, lapack_int lda, double* b, lapack_int ldb, double* alphar, double* alphai,
double* beta, double* vl, lapack_int ldvl, double* vr, lapack_int ldvr );
lapack_int LAPACKE_cggev( int matrix_layout, char jobvl, char jobvr, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* b, lapack_int ldb,
lapack_complex_float* alpha, lapack_complex_float* beta, lapack_complex_float* vl,
lapack_int ldvl, lapack_complex_float* vr, lapack_int ldvr );
lapack_int LAPACKE_zggev( int matrix_layout, char jobvl, char jobvr, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* b, lapack_int ldb,
lapack_complex_double* alpha, lapack_complex_double* beta, lapack_complex_double* vl,
lapack_int ldvl, lapack_complex_double* vr, lapack_int ldvr );

Include Files
• mkl.h

Description

The ?ggev routine computes the generalized eigenvalues, and optionally, the left and/or right generalized
eigenvectors for a pair of n-by-n real/complex nonsymmetric matrices (A,B).
A generalized eigenvalue for a pair of matrices (A,B) is a scalar λ or a ratio alpha / beta = λ, such that A -
λ*B is singular. It is usually represented as the pair (alpha, beta), as there is a reasonable interpretation for
beta =0 and even for both being zero.
The right generalized eigenvector v(j) corresponding to the generalized eigenvalue λ(j) of (A,B) satisfies

A*v(j) = λ(j)*B*v(j).
The left generalized eigenvector u(j) corresponding to the generalized eigenvalue λ(j) of (A,B) satisfies

u(j)^H*A = λ(j)*u(j)^H*B
where u(j)^H denotes the conjugate transpose of u(j).

The ?ggev routine replaces the deprecated ?gegv routine.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

jobvl Must be 'N' or 'V'.

If jobvl = 'N', the left generalized eigenvectors are not computed;

If jobvl = 'V', the left generalized eigenvectors are computed.

jobvr Must be 'N' or 'V'.

If jobvr = 'N', the right generalized eigenvectors are not computed;

If jobvr = 'V', the right generalized eigenvectors are computed.

n The order of the matrices A, B, vl, and vr (n≥ 0).

a, b Arrays:
a (size at least max(1, lda*n)) is an array containing the n-by-n matrix A
(first of the pair of matrices).
b (size at least max(1, ldb*n)) is an array containing the n-by-n matrix B
(second of the pair of matrices).

lda The leading dimension of the array a. Must be at least max(1, n).

ldb The leading dimension of the array b. Must be at least max(1, n).

ldvl, ldvr The leading dimensions of the output matrices vl and vr, respectively.
Constraints:
ldvl≥ 1. If jobvl = 'V', ldvl≥ max(1, n).
ldvr≥ 1. If jobvr = 'V', ldvr≥ max(1, n).


Output Parameters

a, b On exit, these arrays have been overwritten.

alphar, alphai Arrays, size at least max(1, n) each. Contain values that form generalized
eigenvalues in real flavors.
See beta.

alpha Array, size at least max(1, n). Contain values that form generalized
eigenvalues in complex flavors. See beta.

beta Array, size at least max(1, n).


For real flavors:
On exit, (alphar[j] + alphai[j]*i)/beta[j], j=0,..., n - 1, are the generalized
eigenvalues.
If alphai[j] is zero, then the j-th eigenvalue is real; if positive, then the j-th
and (j+1)-st eigenvalues are a complex conjugate pair, with alphai[j+1]
negative.
For complex flavors:
On exit, alpha[j]/beta[j], j=0,..., n - 1, are the generalized eigenvalues.
See also Application Notes below.

vl, vr Arrays:
vl (size at least max(1, ldvl*n)). Contains the matrix of left generalized
eigenvectors VL.
If jobvl = 'V', the left generalized eigenvectors uj are stored one after
another in the columns of VL, in the same order as their eigenvalues. Each
eigenvector is scaled so the largest component has abs(Re) + abs(Im) =
1.
If jobvl = 'N', vl is not referenced.

For real flavors:


If the j-th eigenvalue is real, then the k-th component of the j-th left
eigenvector uj is stored in vl[(k - 1) + (j - 1)*ldvl] for column
major layout and in vl[(k - 1)*ldvl + (j - 1)] for row major layout.

If the j-th and (j+1)-st eigenvalues form a complex conjugate pair, then for
i = sqrt(-1), the k-th components of the j-th left eigenvector uj are
vl[(k - 1) + (j - 1)*ldvl] + i*vl[(k - 1) + j*ldvl] for column
major layout and vl[(k - 1)*ldvl + (j - 1)] + i*vl[(k - 1)*ldvl
+ j] for row major layout. Similarly, the k-th components of the left
eigenvector uj+1 are vl[(k - 1) + (j - 1)*ldvl] - i*vl[(k - 1)
+ j*ldvl] for column major layout and vl[(k - 1)*ldvl + (j - 1)] -
i*vl[(k - 1)*ldvl + j] for row major layout.

For complex flavors:


The k-th component of the j-th left eigenvector uj is stored in vl[(k - 1)
+ (j - 1)*ldvl] for column major layout and in vl[(k - 1)*ldvl + (j
- 1)] for row major layout.
vr (size at least max(1, ldvr*n)). Contains the matrix of right generalized
eigenvectors VR.

If jobvr = 'V', the right generalized eigenvectors vj are stored one after
another in the columns of VR, in the same order as their eigenvalues. Each
eigenvector is scaled so the largest component has abs(Re) + abs(Im) = 1.
If jobvr = 'N', vr is not referenced.

For real flavors:


If the j-th eigenvalue is real, then the k-th component of the j-th right
eigenvector vj is stored in vr[(k - 1) + (j - 1)*ldvr] for column
major layout and in vr[(k - 1)*ldvr + (j - 1)] for row major layout.

If the j-th and (j+1)-st eigenvalues form a complex conjugate pair, then the
k-th components of the j-th right eigenvector vj can be computed as vr[(k
- 1) + (j - 1)*ldvr] + i*vr[(k - 1) + j*ldvr] for column major
layout and vr[(k - 1)*ldvr + (j - 1)] + i*vr[(k - 1)*ldvr + j]
for row major layout. Similarly, the k-th components of the right
eigenvector vj+1 can be computed as vr[(k - 1) + (j - 1)*ldvr]
- i*vr[(k - 1) + j*ldvr] for column major layout and vr[(k -
1)*ldvr + (j - 1)] - i*vr[(k - 1)*ldvr + j] for row major layout.
For complex flavors:
The k-th component of the j-th right eigenvector vj is stored in vr[(k - 1)
+ (j - 1)*ldvr] for column major layout and in vr[(k - 1)*ldvr + (j
- 1)] for row major layout.
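
The real-flavor packing described above can be unwound with a small helper. The sketch below assumes column-major storage; unpack_right_eigenvectors is a hypothetical illustration, not an MKL routine.

#include <mkl.h>

/* Rebuild explicit (real, imaginary) right eigenvectors from the vr array
   produced by a real-flavor ?ggev call: a real eigenvalue occupies one column
   of vr, a complex conjugate pair occupies two columns (real part, imaginary
   part), and the second member of the pair is the conjugate of the first. */
void unpack_right_eigenvectors(lapack_int n, const double* alphai,
                               const double* vr, lapack_int ldvr,
                               double* ev_re, double* ev_im)  /* each n*n, column-major */
{
    for (lapack_int j = 0; j < n; ++j) {
        if (alphai[j] == 0.0) {                          /* real eigenvalue */
            for (lapack_int k = 0; k < n; ++k) {
                ev_re[k + j * n] = vr[k + j * ldvr];
                ev_im[k + j * n] = 0.0;
            }
        } else {                                         /* conjugate pair in columns j and j+1 */
            for (lapack_int k = 0; k < n; ++k) {
                ev_re[k + j * n]       =  vr[k + j * ldvr];
                ev_im[k + j * n]       =  vr[k + (j + 1) * ldvr];
                ev_re[k + (j + 1) * n] =  vr[k + j * ldvr];
                ev_im[k + (j + 1) * n] = -vr[k + (j + 1) * ldvr];
            }
            ++j;                                         /* skip the second column of the pair */
        }
    }
}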

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, and

i≤n: the QZ iteration failed. No eigenvectors have been calculated, but alphar[j], alphai[j] (for real flavors),
or alpha[j] (for complex flavors), and beta[j], j=info,..., n - 1 should be correct.

i > n: errors that usually indicate LAPACK problems:


i = n+1: other than QZ iteration failed in hgeqz;
i = n+2: error return from tgevc.

Application Notes
The quotients alphar[j]/beta[j] and alphai[j]/beta[j] may easily over- or underflow, and beta[j] may even be
zero. Thus, you should avoid simply computing the ratio. However, alphar and alphai (for real flavors) or
alpha (for complex flavors) will always be less than and usually comparable with norm(A) in magnitude, and
beta always less than and usually comparable with norm(B).
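
A minimal sketch of a complete call follows; the 3-by-3 pencil and the column-major layout are assumptions chosen for illustration. The ratio alphar[j]/beta[j] is only formed when beta[j] is nonzero, as the Application Notes recommend.

#include <stdio.h>
#include <mkl.h>

int main(void)
{
    const lapack_int n = 3, ld = 3;
    /* Column-major A with one real eigenvalue (2) and a complex pair (3 +/- i); B is the identity. */
    double a[9] = { 2.0, 0.0, 0.0,   0.0, 3.0, 1.0,   0.0, -1.0, 3.0 };
    double b[9] = { 1.0, 0.0, 0.0,   0.0, 1.0, 0.0,   0.0,  0.0, 1.0 };
    double alphar[3], alphai[3], beta[3], vl[9], vr[9];

    lapack_int info = LAPACKE_dggev(LAPACK_COL_MAJOR, 'V', 'V', n, a, ld, b, ld,
                                    alphar, alphai, beta, vl, ld, vr, ld);
    if (info != 0) return (int)info;

    for (lapack_int j = 0; j < n; ++j) {
        if (beta[j] != 0.0)
            printf("lambda[%d] = %g + %g*i\n", (int)j,
                   alphar[j] / beta[j], alphai[j] / beta[j]);
        else
            printf("lambda[%d] is infinite (beta = 0)\n", (int)j);
    }
    return 0;
}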

?ggevx
Computes the generalized eigenvalues, and,
optionally, the left and/or right generalized
eigenvectors.


Syntax
lapack_int LAPACKE_sggevx( int matrix_layout, char balanc, char jobvl, char jobvr, char
sense, lapack_int n, float* a, lapack_int lda, float* b, lapack_int ldb, float*
alphar, float* alphai, float* beta, float* vl, lapack_int ldvl, float* vr, lapack_int
ldvr, lapack_int* ilo, lapack_int* ihi, float* lscale, float* rscale, float* abnrm,
float* bbnrm, float* rconde, float* rcondv );
lapack_int LAPACKE_dggevx( int matrix_layout, char balanc, char jobvl, char jobvr, char
sense, lapack_int n, double* a, lapack_int lda, double* b, lapack_int ldb, double*
alphar, double* alphai, double* beta, double* vl, lapack_int ldvl, double* vr,
lapack_int ldvr, lapack_int* ilo, lapack_int* ihi, double* lscale, double* rscale,
double* abnrm, double* bbnrm, double* rconde, double* rcondv );
lapack_int LAPACKE_cggevx( int matrix_layout, char balanc, char jobvl, char jobvr, char
sense, lapack_int n, lapack_complex_float* a, lapack_int lda, lapack_complex_float* b,
lapack_int ldb, lapack_complex_float* alpha, lapack_complex_float* beta,
lapack_complex_float* vl, lapack_int ldvl, lapack_complex_float* vr, lapack_int ldvr,
lapack_int* ilo, lapack_int* ihi, float* lscale, float* rscale, float* abnrm, float*
bbnrm, float* rconde, float* rcondv );
lapack_int LAPACKE_zggevx( int matrix_layout, char balanc, char jobvl, char jobvr, char
sense, lapack_int n, lapack_complex_double* a, lapack_int lda, lapack_complex_double*
b, lapack_int ldb, lapack_complex_double* alpha, lapack_complex_double* beta,
lapack_complex_double* vl, lapack_int ldvl, lapack_complex_double* vr, lapack_int ldvr,
lapack_int* ilo, lapack_int* ihi, double* lscale, double* rscale, double* abnrm,
double* bbnrm, double* rconde, double* rcondv );

Include Files
• mkl.h

Description

The routine computes for a pair of n-by-n real/complex nonsymmetric matrices (A,B), the generalized
eigenvalues, and optionally, the left and/or right generalized eigenvectors.
Optionally also, it computes a balancing transformation to improve the conditioning of the eigenvalues and
eigenvectors (ilo, ihi, lscale, rscale, abnrm, and bbnrm), reciprocal condition numbers for the eigenvalues
(rconde), and reciprocal condition numbers for the right eigenvectors (rcondv).
A generalized eigenvalue for a pair of matrices (A,B) is a scalar λ or a ratio alpha / beta = λ, such that A -
λ*B is singular. It is usually represented as the pair (alpha, beta), as there is a reasonable interpretation for
beta=0 and even for both being zero. The right generalized eigenvector v(j) corresponding to the
generalized eigenvalue λ(j) of (A,B) satisfies

A*v(j) = λ(j)*B*v(j).
The left generalized eigenvector u(j) corresponding to the generalized eigenvalue λ(j) of (A,B) satisfies

u(j)^H*A = λ(j)*u(j)^H*B
where u(j)^H denotes the conjugate transpose of u(j).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

balanc Must be 'N', 'P', 'S', or 'B'. Specifies the balance option to be
performed.
If balanc = 'N', do not diagonally scale or permute;

If balanc = 'P', permute only;

If balanc = 'S', scale only;

If balanc = 'B', both permute and scale.

Computed reciprocal condition numbers will be for the matrices after


balancing and/or permuting. Permuting does not change condition numbers
(in exact arithmetic), but balancing does.

jobvl Must be 'N' or 'V'.

If jobvl = 'N', the left generalized eigenvectors are not computed;

If jobvl = 'V', the left generalized eigenvectors are computed.

jobvr Must be 'N' or 'V'.

If jobvr = 'N', the right generalized eigenvectors are not computed;

If jobvr = 'V', the right generalized eigenvectors are computed.

sense Must be 'N', 'E', 'V', or 'B'. Determines which reciprocal condition
numbers are computed.
If sense = 'N', none are computed;

If sense = 'E', computed for eigenvalues only;

If sense = 'V', computed for eigenvectors only;

If sense = 'B', computed for eigenvalues and eigenvectors.

n The order of the matrices A, B, vl, and vr (n≥ 0).

a, b Arrays:
a (size at least max(1, lda*n)) is an array containing the n-by-n matrix A
(first of the pair of matrices).
b (size at least max(1, ldb*n)) is an array containing the n-by-n matrix B
(second of the pair of matrices).

lda The leading dimension of the array a.


Must be at least max(1, n).

ldb The leading dimension of the array b.


Must be at least max(1, n).

ldvl, ldvr The leading dimensions of the output matrices vl and vr, respectively.
Constraints:
ldvl≥ 1. If jobvl = 'V', ldvl≥ max(1, n).
ldvr≥ 1. If jobvr = 'V', ldvr≥ max(1, n).


Output Parameters

a, b On exit, these arrays have been overwritten.


If jobvl = 'V' or jobvr = 'V' or both, then a contains the first part of
the real Schur form of the "balanced" versions of the input A and B, and b
contains its second part.

alphar, alphai Arrays, size at least max(1, n) each. Contain values that form generalized
eigenvalues in real flavors.
See beta.

alpha Array, size at least max(1, n). Contain values that form generalized
eigenvalues in complex flavors. See beta.

beta Array, size at least max(1, n).


For real flavors:
On exit, (alphar[j] + alphai[j]*i)/beta[j], j=0,..., n - 1, will be the
generalized eigenvalues.
If alphai[j] is zero, then the j-th eigenvalue is real; if positive, then the j-th
and (j+1)-st eigenvalues are a complex conjugate pair, with alphai[j+1]
negative.
For complex flavors:
On exit, alpha[j]/beta[j], j=0,..., n - 1, will be the generalized eigenvalues.
See also Application Notes below.
vl, vr Arrays:
vl (size at least max(1, ldvl*n)).

If jobvl = 'V', the left generalized eigenvectors u(j) are stored one after
another in the columns of vl, in the same order as their eigenvalues. Each
eigenvector will be scaled so the largest component has abs(Re) +
abs(Im) = 1.
If jobvl = 'N', vl is not referenced.

For real flavors:


If the j-th eigenvalue is real, then the k-th component of the j-th left eigenvector uj
is stored in vl[(k - 1) + (j - 1)*ldvl] for column major layout and in
vl[(k - 1)*ldvl + (j - 1)] for row major layout.
If the j-th and (j+1)-st eigenvalues form a complex conjugate pair, then for
i = sqrt(-1), the k-th components of the j-th left eigenvector uj can be
computed as vl[(k - 1) + (j - 1)*ldvl] + i*vl[(k - 1) + j*ldvl]
for column major layout and vl[(k - 1)*ldvl + (j - 1)] + i*vl[(k -
1)*ldvl + j] for row major layout. Similarly, the k-th components of the
left eigenvector uj+1 can be computed as vl[(k - 1) + (j -
1)*ldvl] - i*vl[(k - 1) + j*ldvl] for column major layout and vl[(k
- 1)*ldvl + (j - 1)] - i*vl[(k - 1)*ldvl + j] for row major
layout.

For complex flavors:

The k-th component of the j-th left eigenvector uj is stored in vl[(k - 1)
+ (j - 1)*ldvl] for column major layout and in vl[(k - 1)*ldvl + (j
- 1)] for row major layout.
vr (size at least max(1, ldvr*n)).

If jobvr = 'V', the right generalized eigenvectors v(j) are stored one after
another in the columns of vr, in the same order as their eigenvalues. Each
eigenvector will be scaled so the largest component has abs(Re) +
abs(Im) = 1.
If jobvr = 'N', vr is not referenced.

For real flavors:


If the j-th eigenvalue is real, then the k-th component of the j-th right
eigenvector vj is stored in vr[(k - 1) + (j - 1)*ldvr] for column
major layout and in vr[(k - 1)*ldvr + (j - 1)] for row major layout.

If the j-th and (j+1)-st eigenvalues form a complex conjugate pair, then
the k-th components of the j-th right eigenvector vj can be computed as
vr[(k - 1) + (j - 1)*ldvr] + i*vr[(k - 1) + j*ldvr] for column
major layout and vr[(k - 1)*ldvr + (j - 1)] + i*vr[(k - 1)*ldvr
+ j] for row major layout. Similarly, the k-th components of the right
eigenvector vj+1 can be computed as vr[(k - 1) + (j - 1)*ldvr]
- i*vr[(k - 1) + j*ldvr] for column major layout and vr[(k -
1)*ldvr + (j - 1)] - i*vr[(k - 1)*ldvr + j] for row major layout.
For complex flavors:
The k-th component of the j-th right eigenvector vj is stored in vr[(k - 1)
+ (j - 1)*ldvr] for column major layout and in vr[(k - 1)*ldvr + (j
- 1)] for row major layout.

ilo, ihi ilo and ihi are integer values such that on exit A(i,j) = 0 and B(i,j) = 0 if i >
j and j = 1,..., ilo-1 or i = ihi+1,..., n.
If balanc = 'N' or 'S', ilo = 1 and ihi = n.

lscale, rscale Arrays, size at least max(1, n) each.


lscale contains details of the permutations and scaling factors applied to the
left side of A and B.
If PL(j) is the index of the row interchanged with row j, and DL(j) is the
scaling factor applied to row j, then
lscale[j - 1] = PL(j), for j = 1,..., ilo-1
= DL(j), for j = ilo,...,ihi
= PL(j) for j = ihi+1,..., n.
The order in which the interchanges are made is n to ihi+1, then 1 to ilo-1.
rscale contains details of the permutations and scaling factors applied to the
right side of A and B.
If PR(j) is the index of the column interchanged with column j, and DR(j)
is the scaling factor applied to column j, then
rscale[j - 1] = PR(j), for j = 1,..., ilo-1
= DR(j), for j = ilo,...,ihi


= PR(j) for j = ihi+1,..., n.


The order in which the interchanges are made is n to ihi+1, then 1 to ilo-1.

abnrm, bbnrm The one-norms of the balanced matrices A and B, respectively.

rconde, rcondv Arrays, size at least max(1, n) each.


If sense = 'E', or 'B', rconde contains the reciprocal condition numbers
of the eigenvalues, stored in consecutive elements of the array. For a
complex conjugate pair of eigenvalues two consecutive elements of rconde
are set to the same value. Thus rconde[j], rcondv[j], and the j-th columns
of vl and vr all correspond to the same eigenpair (but not in general the j-th
eigenpair, unless all eigenpairs are selected).
If sense = 'N', or 'V', rconde is not referenced.

If sense = 'V', or 'B', rcondv contains the estimated reciprocal condition


numbers of the eigenvectors, stored in consecutive elements of the array.
For a complex eigenvector two consecutive elements of rcondv are set to
the same value.
If the eigenvalues cannot be reordered to compute rcondv[j], rcondv[j] is set to 0;
this can only occur when the true value would be very small anyway.
If sense = 'N', or 'E', rcondv is not referenced.

Return Values
This function returns a value info.

If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = i, and

i≤n: the QZ iteration failed. No eigenvectors have been calculated, but alphar[j], alphai[j] (for real flavors),
or alpha[j] (for complex flavors), and beta[j], j=info,..., n - 1 should be correct.

i > n: errors that usually indicate LAPACK problems:


i = n+1: other than QZ iteration failed in hgeqz;
i = n+2: error return from tgevc.

Application Notes
The quotients alphar[j]/beta[j] and alphai[j]/beta[j] may easily over- or underflow, and beta[j] may even be
zero. Thus, you should avoid simply computing the ratio. However, alphar and alphai (for real flavors) or
alpha (for complex flavors) will always be less than and usually comparable with norm(A) in magnitude, and
beta always less than and usually comparable with norm(B).
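
The sketch below (illustrative data, column-major layout assumed) asks LAPACKE_dggevx for both balancing and both kinds of reciprocal condition numbers:

#include <stdio.h>
#include <mkl.h>

int main(void)
{
    const lapack_int n = 2, ld = 2;
    /* A deliberately badly scaled pencil; balanc = 'B' permutes and scales it. */
    double a[4] = { 1.0e-6, 0.0, 0.0, 1.0e+6 };
    double b[4] = { 1.0,    0.0, 0.0, 1.0    };
    double alphar[2], alphai[2], beta[2], vl[4], vr[4];
    double lscale[2], rscale[2], abnrm, bbnrm, rconde[2], rcondv[2];
    lapack_int ilo, ihi;

    lapack_int info = LAPACKE_dggevx(LAPACK_COL_MAJOR, 'B', 'V', 'V', 'B', n,
                                     a, ld, b, ld, alphar, alphai, beta,
                                     vl, ld, vr, ld, &ilo, &ihi,
                                     lscale, rscale, &abnrm, &bbnrm,
                                     rconde, rcondv);
    if (info == 0)
        printf("abnrm = %g, bbnrm = %g, rconde[0] = %g, rcondv[0] = %g\n",
               abnrm, bbnrm, rconde[0], rcondv[0]);
    return (int)info;
}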

?ggev3
Computes the generalized eigenvalues and the left
and right generalized eigenvectors for a pair of
matrices.

Syntax
lapack_int LAPACKE_sggev3 (int matrix_layout, char jobvl, char jobvr, lapack_int n,
float * a, lapack_int lda, float * b, lapack_int ldb, float * alphar, float * alphai,
float * beta, float * vl, lapack_int ldvl, float * vr, lapack_int ldvr);

lapack_int LAPACKE_dggev3 (int matrix_layout, char jobvl, char jobvr, lapack_int n,
double * a, lapack_int lda, double * b, lapack_int ldb, double * alphar, double *
alphai, double * beta, double * vl, lapack_int ldvl, double * vr, lapack_int ldvr);
lapack_int LAPACKE_cggev3 (int matrix_layout, char jobvl, char jobvr, lapack_int n,
lapack_complex_float * a, lapack_int lda, lapack_complex_float * b, lapack_int ldb,
lapack_complex_float * alpha, lapack_complex_float * beta, lapack_complex_float * vl,
lapack_int ldvl, lapack_complex_float * vr, lapack_int ldvr);
lapack_int LAPACKE_zggev3 (int matrix_layout, char jobvl, char jobvr, lapack_int n,
lapack_complex_double * a, lapack_int lda, lapack_complex_double * b, lapack_int ldb,
lapack_complex_double * alpha, lapack_complex_double * beta, lapack_complex_double *
vl, lapack_int ldvl, lapack_complex_double * vr, lapack_int ldvr);

Include Files
• mkl.h

Description
For a pair of n-by-n real or complex nonsymmetric matrices (A, B), ?ggev3 computes the generalized
eigenvalues, and optionally, the left and right generalized eigenvectors.
A generalized eigenvalue for a pair of matrices (A, B) is a scalar λ or a ratio alpha/beta = λ, such that A - λ*B
is singular. It is usually represented as the pair (alpha,beta), as there is a reasonable interpretation for
beta=0, and even for both being zero.
For real flavors:
The right eigenvector vj corresponding to the eigenvalue λj of (A, B) satisfies
A * vj = λj * B * vj.
The left eigenvector uj corresponding to the eigenvalue λj of (A, B) satisfies
uj^H * A = λj * uj^H * B
where uj^H is the conjugate-transpose of uj.
For complex flavors:
The right generalized eigenvector vj corresponding to the generalized eigenvalue λj of (A, B) satisfies
A * vj = λj * B * vj.
The left generalized eigenvector uj corresponding to the generalized eigenvalues λj of (A, B) satisfies
uj^H * A = λj * uj^H * B
where uj^H is the conjugate-transpose of uj.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

jobvl = 'N': do not compute the left generalized eigenvectors;


= 'V': compute the left generalized eigenvectors.

jobvr = 'N': do not compute the right generalized eigenvectors;


= 'V': compute the right generalized eigenvectors.

n The order of the matrices A, B, VL, and VR.


n≥ 0.

a Array, size (lda*n).

On entry, the matrix A in the pair (A, B).

lda The leading dimension of a.

lda≥ max(1,n).

b Array, size (ldb*n).

On entry, the matrix B in the pair (A, B).

ldb The leading dimension of b.

ldb≥ max(1,n).

ldvl The leading dimension of the matrix VL.


ldvl≥ 1, and if jobvl = 'V', ldvl≥n.

ldvr The leading dimension of the matrix VR.


ldvr≥ 1, and if jobvr = 'V', ldvr≥n.

Output Parameters

a On exit, a is overwritten.

b On exit, b is overwritten.

alphar Array, size (n).

alphai Array, size (n).

alpha Array, size (n).

beta Array, size (n).

For real flavors:


On exit, (alphar[j] + alphai[j]*i)/beta[j], j=0,...,n - 1, are the
generalized eigenvalues. If alphai[j - 1] is zero, then the j-th
eigenvalue is real; if positive, then the j-th and (j+1)-st eigenvalues are a
complex conjugate pair, with alphai[j] negative.

Note: the quotients alphar[j - 1]/beta[j - 1] and alphai[j - 1]/


beta[j - 1] can easily over- or underflow, and beta[j - 1] might even be
zero. Thus, you should avoid computing the ratio alpha/beta by simply
dividing alpha by beta. However, alphar and alphai are always less than
and usually comparable with norm(A) in magnitude, and beta is always less
than and usually comparable with norm(B).
For complex flavors:
On exit, alpha[j]/beta[j], j=0,...,n - 1, are the generalized eigenvalues.

Note: the quotients alpha[j - 1]/beta[j - 1] may easily over- or
underflow, and beta[j - 1] can even be zero. Thus, you should avoid computing
the ratio alpha/beta by simply dividing alpha by beta. However, alpha is
always less than and usually comparable with norm(A) in magnitude, and
beta is always less than and usually comparable with norm(B).

vl Array, size (ldvl*n).

For real flavors:


If jobvl = 'V', the left eigenvectors uj are stored one after another in the
columns of vl, in the same order as their eigenvalues. If the j-th eigenvalue
is real, then uj = the j-th column of vl. If the j-th and (j+1)-st eigenvalues
form a complex conjugate pair, then the real part of uj = the j-th column of
vl and the imaginary part of uj = the (j + 1)-st column of vl.
Each eigenvector is scaled so the largest component has abs(real part)
+abs(imag. part)=1.
Not referenced if jobvl = 'N'.

For complex flavors:


If jobvl = 'V', the left generalized eigenvectors uj are stored one after
another in the columns of vl, in the same order as their eigenvalues.

Each eigenvector is scaled so the largest component has abs(real part) +


abs(imag. part) = 1.
Not referenced if jobvl = 'N'.

vr Array, size (ldvr*n).

For real flavors:


If jobvr = 'V', the right eigenvectors vj are stored one after another in the
columns of vr, in the same order as their eigenvalues. If the j-th eigenvalue
is real, then vj = the j-th column of vr. If the j-th and (j + 1)-st
eigenvalues form a complex conjugate pair, then the real part of vj = the j-
th column of vr and the imaginary part of vj = the (j + 1)-st column of vr.

Each eigenvector is scaled so the largest component has abs(real part)


+abs(imag. part)=1.
Not referenced if jobvr = 'N'.

For complex flavors:


If jobvr = 'V', the right generalized eigenvectors vj are stored one after
another in the columns of vr, in the same order as their eigenvalues. Each
eigenvector is scaled so the largest component has abs(real part) +
abs(imag. part) = 1.
Not referenced if jobvr = 'N'.

Return Values
This function returns a value info.

= 0: successful exit
< 0: if info = -i, the i-th argument had an illegal value.

=1,...,n:


for real flavors:


The QZ iteration failed. No eigenvectors have been calculated, but alphar[j], alphai[j], and beta[j] should
be correct for j=info,...,n - 1.

for complex flavors:


The QZ iteration failed. No eigenvectors have been calculated, but alpha[j] and beta[j] should be correct for
j=info,...,n - 1.

> n:

=n + 1: other than QZ iteration failed in ?hgeqz,

=n + 2: error return from ?tgevc.

LAPACK Auxiliary Routines


Routine naming conventions, mathematical notation, and matrix storage schemes used for LAPACK auxiliary
routines are the same as for the driver and computational routines described in previous chapters.

?lacgv
Conjugates a complex vector.

Syntax
lapack_int LAPACKE_clacgv (lapack_int n, lapack_complex_float* x, lapack_int incx);
lapack_int LAPACKE_zlacgv (lapack_int n, lapack_complex_double* x, lapack_int incx);

Include Files
• mkl.h

Description

The routine conjugates a complex vector x of length n and increment incx (see "Vector Arguments in BLAS"
in Appendix B).

Input Parameters

n The length of the vector x (n≥ 0).

x Array, dimension (1+(n-1)* |incx|).

Contains the vector of length n to be conjugated.

incx The spacing between successive elements of x.

Output Parameters

x On exit, overwritten with conjg(x).
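
A short usage sketch follows; it assumes the default mapping of lapack_complex_double to MKL_Complex16 (a structure with real and imag members), which holds unless you override the complex type before including mkl.h.

#include <stdio.h>
#include <mkl.h>

int main(void)
{
    /* Conjugate a 3-element complex vector in place with unit increment. */
    lapack_complex_double x[3] = { {1.0, 2.0}, {3.0, -4.0}, {0.0, 5.0} };
    lapack_int info = LAPACKE_zlacgv(3, x, 1);
    if (info == 0)
        for (int j = 0; j < 3; ++j)
            printf("x[%d] = %g %+g*i\n", j, x[j].real, x[j].imag);
    return (int)info;
}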

?syconv
Converts a symmetric matrix given by a triangular
matrix factorization into two matrices and vice versa.

Syntax
lapack_int LAPACKE_ssyconv (int matrix_layout, char uplo, char way, lapack_int n, float
* a, lapack_int lda, const lapack_int * ipiv, float * e);
lapack_int LAPACKE_dsyconv (int matrix_layout, char uplo, char way, lapack_int n,
double* a, lapack_int lda, const lapack_int * ipiv, double * e);
lapack_int LAPACKE_csyconv (int matrix_layout, char uplo, char way, lapack_int n,
lapack_complex_float * a, lapack_int lda, const lapack_int * ipiv, lapack_complex_float
* e);
lapack_int LAPACKE_zsyconv (int matrix_layout, char uplo, char way, lapack_int n,
lapack_complex_double* a, lapack_int lda, const lapack_int * ipiv,
lapack_complex_double * e);

Include Files
• mkl.h

Description
The routine converts matrix A, which results from a triangular matrix factorization, into matrices L and D and
vice versa. The routine returns non-diagonalized elements of D and applies or reverses permutation done
with the triangular matrix factorization.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major ( LAPACK_COL_MAJOR ).

uplo Must be 'U' or 'L'.

Indicates whether the details of the factorization are stored as an upper or


lower triangular matrix:
If uplo = 'U': the upper triangular, A = U*D*U^T.

If uplo = 'L': the lower triangular, A = L*D*L^T.

way Must be 'C' or 'R'.

n The order of matrix A; n≥ 0.

a Array of size max(1,lda *n).


The block diagonal matrix D and the multipliers used to obtain the factor U
or L as computed by ?sytrf.

lda The leading dimension of a; lda≥ max(1, n).

ipiv Array, size at least max(1, n).

Details of the interchanges and the block structure of D, as returned by ?sytrf.

Output Parameters

e Array of size max(1, n) containing the superdiagonal/subdiagonal of the


symmetric 1-by-1 or 2-by-2 block diagonal matrix D in L*D*L^T.


Return Values

info If info = 0, the execution is successful.

If info < 0, the i-th parameter had an illegal value.


If info = -1011, memory allocation error occurred.

See Also
?sytrf

?syr
Performs the symmetric rank-1 update of a complex
symmetric matrix.

Syntax
lapack_int LAPACKE_csyr (int matrix_layout, char uplo, lapack_int n,
lapack_complex_float alpha, const lapack_complex_float * x, lapack_int incx,
lapack_complex_float * a, lapack_int lda);
lapack_int LAPACKE_zsyr (int matrix_layout, char uplo, lapack_int n,
lapack_complex_double alpha, const lapack_complex_double * x, lapack_int incx,
lapack_complex_double * a, lapack_int lda);

Include Files
• mkl.h

Description

The routine performs the symmetric rank 1 operation defined as


a := alpha*x*x^T + a,
where:

• alpha is a complex scalar.


• x is an n-element complex vector.
• a is an n-by-n complex symmetric matrix.

These routines have their real equivalents in BLAS (see ?syr in Chapter 2).

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

uplo Specifies whether the upper or lower triangular part of the array a is used:
If uplo = 'U' or 'u', then the upper triangular part of the array a is used.

If uplo = 'L' or 'l', then the lower triangular part of the array a is used.

n Specifies the order of the matrix a. The value of n must be at least zero.

alpha Specifies the scalar alpha.

x Array, size at least (1 + (n - 1)*abs(incx)). Before entry, the
incremented array x must contain the n-element vector x.

incx Specifies the increment for the elements of x. The value of incx must not be
zero.

a Array, size max(1, lda*n). Before entry with uplo = 'U' or 'u', the
leading n-by-n upper triangular part of the array a must contain the upper
triangular part of the symmetric matrix and the strictly lower triangular part
of a is not referenced.
Before entry with uplo = 'L' or 'l', the leading n-by-n lower triangular
part of the array a must contain the lower triangular part of the symmetric
matrix and the strictly upper triangular part of a is not referenced.

lda Specifies the leading dimension of a as declared in the calling


(sub)program. The value of lda must be at least max(1,n).

Output Parameters

a With uplo = 'U' or 'u', the upper triangular part of the array a is
overwritten by the upper triangular part of the updated matrix.
With uplo = 'L' or 'l', the lower triangular part of the array a is
overwritten by the lower triangular part of the updated matrix.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info < 0, the i-th parameter had an illegal value.


If info = -1011, memory allocation error occurred.
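
The sketch below updates the upper triangle of a small complex symmetric matrix; the data are illustrative, and the default struct-based mapping of lapack_complex_float to MKL_Complex8 is assumed.

#include <stdio.h>
#include <mkl.h>

int main(void)
{
    lapack_complex_float alpha = { 1.0f, 0.0f };
    lapack_complex_float x[2]  = { {1.0f, 1.0f}, {0.0f, 2.0f} };
    /* Column-major 2-by-2 complex symmetric matrix; only the upper triangle is used. */
    lapack_complex_float a[4]  = { {2.0f, 0.0f}, {0.0f, 0.0f},
                                   {1.0f, 0.0f}, {3.0f, 0.0f} };

    lapack_int info = LAPACKE_csyr(LAPACK_COL_MAJOR, 'U', 2, alpha, x, 1, a, 2);
    printf("info = %d, updated a(1,1) = %g %+g*i\n", (int)info, a[0].real, a[0].imag);
    return (int)info;
}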

i?max1
Finds the index of the vector element whose real part
has maximum absolute value.

Syntax
MKL_INT icmax1(const MKL_INT*n, const MKL_Complex8*cx, const MKL_INT*incx)
MKL_INT izmax1(const MKL_INT*n, const MKL_Complex16*cx, const MKL_INT*incx)

Include Files
• mkl.h

Description

Given a complex vector cx, the i?max1 functions return the index of the first vector element of maximum
absolute value. These functions are based on the BLAS functions icamax/izamax, but using the absolute
value of components. They are designed for use with clacon/zlacon.


Input Parameters

n Specifies the number of elements in the vector cx.

cx Array, size at least (1+(n-1)*abs(incx)).

Contains the input vector.

incx Specifies the spacing between successive elements of cx.

Return Values
Index of the vector element of maximum absolute value.
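
A usage sketch for the double-precision flavor follows (the data are illustrative; note that the arguments are passed by pointer and, by analogy with izamax, the returned index is assumed to follow the 1-based Fortran convention):

#include <stdio.h>
#include <mkl.h>

int main(void)
{
    /* The second element has the largest magnitude under either measure of size. */
    MKL_Complex16 cx[3] = { {1.0, 0.0}, {-6.0, 1.0}, {2.0, 2.0} };
    MKL_INT n = 3, incx = 1;

    MKL_INT idx = izmax1(&n, cx, &incx);
    printf("element of maximum absolute value at index %lld\n", (long long)idx);
    return 0;
}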

?sum1
Forms the 1-norm of the complex vector using the
true absolute value.

Syntax
float scsum1(const MKL_INT*n, const MKL_Complex8*cx, const MKL_INT*incx)
double dzsum1(const MKL_INT*n, const MKL_Complex16*cx, const MKL_INT*incx)

Include Files
• mkl.h

Description
Given a complex vector cx, scsum1/dzsum1 functions take the sum of the absolute values of vector elements
and return a single/double precision result, respectively. These functions are based on scasum/dzasum from
Level 1 BLAS, but use the true absolute value and were designed for use with clacon/zlacon.

Input Parameters

n Specifies the number of elements in the vector cx.

cx Array, size at least (1+(n-1)*abs(incx)).

Contains the input vector whose elements will be summed.

incx Specifies the spacing between successive elements of cx (incx > 0).

Return Values
Sum of absolute values.
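
A short sketch of the double-precision flavor (illustrative data; each entry contributes its true modulus sqrt(re*re + im*im) to the sum):

#include <stdio.h>
#include <mkl.h>

int main(void)
{
    MKL_Complex16 cx[3] = { {3.0, 4.0}, {0.0, -1.0}, {1.0, 0.0} };
    MKL_INT n = 3, incx = 1;

    double s = dzsum1(&n, cx, &incx);   /* |3+4i| + |-i| + |1| = 5 + 1 + 1 = 7 */
    printf("dzsum1 = %g\n", s);
    return 0;
}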

?gelq2
Computes the LQ factorization of a general
rectangular matrix using an unblocked algorithm.

Syntax
lapack_int LAPACKE_sgelq2 (int matrix_layout, lapack_int m, lapack_int n, float* a,
lapack_int lda, float* tau);
lapack_int LAPACKE_dgelq2 (int matrix_layout, lapack_int m, lapack_int n, double* a,
lapack_int lda, double * tau);

lapack_int LAPACKE_cgelq2 (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* tau);
lapack_int LAPACKE_zgelq2 (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* tau);

Include Files
• mkl.h

Description

The routine computes an LQ factorization of a real/complex m-by-n matrix A as A = L*Q.

The routine does not form the matrix Q explicitly. Instead, Q is represented as a product of min(m, n)
elementary reflectors:
Q = H(k) ... H(2) H(1) (or Q = H(k)^H ... H(2)^H H(1)^H for complex flavors), where k = min(m, n)
Each H(i) has the form
H(i) = I - tau*v*v^T for real flavors, or
H(i) = I - tau*v*v^H for complex flavors,
where tau is a real/complex scalar stored in tau(i), and v is a real/complex vector with v(1:i-1) = 0 and
v(i) = 1.
On exit, the j-th (i+1 ≤ j ≤ n) component of vector v (for real functions) or its conjugate (for complex
functions) is stored in a[i - 1 + lda*(j - 1)] for column major layout or in a[j - 1 + lda*(i - 1)] for
row major layout.

Input Parameters

m The number of rows in the matrix A (m≥ 0).

n The number of columns in A (n≥ 0).

a Array, size at least max(1, lda*n) for column major and max(1, lda*m)
for row major layout. Array a contains the m-by-n matrix A.

lda The leading dimension of a; at least max(1, m) for column major layout and
max(1,n) for row major layout.

Output Parameters

a Overwritten by the factorization data as follows:


on exit, the elements on and below the diagonal of the array a contain the
m-by-min(n,m) lower trapezoidal matrix L (L is lower triangular if n≥m); the
elements above the diagonal, with the array tau, represent the orthogonal/
unitary matrix Q as a product of min(n,m) elementary reflectors.

tau Array, size at least max(1, min(m, n)).

Contains scalar factors of the elementary reflectors.

Return Values
This function returns a value info.
If info = 0, the execution is successful.


If info = -i, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.
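
A minimal sketch (illustrative 2-by-3 data, row-major layout assumed):

#include <stdio.h>
#include <mkl.h>

int main(void)
{
    const lapack_int m = 2, n = 3, lda = 3;   /* row-major, so lda >= n */
    double a[6] = { 1.0, 2.0, 3.0,
                    4.0, 5.0, 6.0 };
    double tau[2];

    lapack_int info = LAPACKE_dgelq2(LAPACK_ROW_MAJOR, m, n, a, lda, tau);
    /* L occupies the lower trapezoid of a; the reflectors and tau encode Q. */
    printf("info = %d, L(1,1) = %g, L(2,1) = %g, L(2,2) = %g\n",
           (int)info, a[0], a[3], a[4]);
    return (int)info;
}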

?geqr2
Computes the QR factorization of a general
rectangular matrix using an unblocked algorithm.

Syntax
lapack_int LAPACKE_sgeqr2 (int matrix_layout, lapack_int m, lapack_int n, float* a,
lapack_int lda, float* tau);
lapack_int LAPACKE_dgeqr2 (int matrix_layout, lapack_int m, lapack_int n, double* a,
lapack_int lda, double* tau);
lapack_int LAPACKE_cgeqr2 (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_complex_float* tau);
lapack_int LAPACKE_zgeqr2 (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_complex_double* tau);

Include Files
• mkl.h

Description

The routine computes a QR factorization of a real/complex m-by-n matrix A as A = Q*R.

The routine does not form the matrix Q explicitly. Instead, Q is represented as a product of min(m, n)
elementary reflectors :
Q = H(1)*H(2)* ... *H(k), where k = min(m, n)
Each H(i) has the form
H(i) = I - tau*v*v^T for real flavors, or
H(i) = I - tau*v*v^H for complex flavors,
where tau is a real/complex scalar stored in tau[i], and v is a real/complex vector with v(1:i-1) = 0 and
v(i) = 1.
On exit, v(i+1:m) is stored in a(i+1:m, i).

Input Parameters

m The number of rows in the matrix A (m≥ 0).

n The number of columns in A (n≥ 0).

a Array, size at least max(1, lda*n) for column major and max(1, lda*m)
for row major layout. Array a contains the m-by-n matrix A.

lda The leading dimension of a; at least max(1, m) for column major layout and
max(1,n) for row major layout.

Output Parameters

a Overwritten by the factorization data as follows:


on exit, the elements on and above the diagonal of the array a contain the
min(n,m)-by-n upper trapezoidal matrix R (R is upper triangular if m≥n); the
elements below the diagonal, with the array tau, represent the orthogonal/
unitary matrix Q as a product of elementary reflectors.

tau Array, size at least max(1, min(m, n)).

Contains scalar factors of the elementary reflectors.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.
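
A minimal usage sketch for LAPACKE_dgeqr2 on a small column-major matrix (values chosen arbitrarily for
illustration):

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int m = 4, n = 3, lda = 4;
    /* 4-by-3 matrix A stored column by column */
    double a[12] = { 1, 2, 3, 4,   5, 6, 7, 8,   9, 10, 11, 13 };
    double tau[3];
    lapack_int info = LAPACKE_dgeqr2(LAPACK_COL_MAJOR, m, n, a, lda, tau);
    if (info == 0) {
        /* R occupies the upper triangle of a; the reflector vectors v are stored below it */
        printf("R(1,1) = %f, tau[0] = %f\n", a[0], tau[0]);
    }
    return 0;
}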

?geqrt2
Computes a QR factorization of a general real or
complex matrix using the compact WY representation
of Q.

Syntax
lapack_int LAPACKE_sgeqrt2 (int matrix_layout, lapack_int m, lapack_int n, float * a,
lapack_int lda, float * t, lapack_int ldt );
lapack_int LAPACKE_dgeqrt2 (int matrix_layout, lapack_int m, lapack_int n, double * a,
lapack_int lda, double * t, lapack_int ldt );
lapack_int LAPACKE_cgeqrt2 (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float * a, lapack_int lda, lapack_complex_float * t, lapack_int ldt );
lapack_int LAPACKE_zgeqrt2 (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double * a, lapack_int lda, lapack_complex_double * t, lapack_int ldt );

Include Files
• mkl.h
Description

The routine computes a QR factorization of a general real or complex m-by-n matrix A (with m ≥ n) as
A = Q*R, using the compact WY representation of Q.
The strictly lower triangular matrix V contains the elementary reflectors H(i) in the i-th column below the
diagonal. For example, if m = 5 and n = 3, the matrix V is

   V = (  1       )
       ( v1  1    )
       ( v1 v2  1 )
       ( v1 v2 v3 )
       ( v1 v2 v3 )

where vi represents the vector that defines H(i). The vectors are returned in the lower triangular part of
array a.

NOTE
The 1s along the diagonal of V are not stored in a.

The block reflector H is then given by


H = I - V*T*V^T for real flavors, and
H = I - V*T*V^H for complex flavors,
where V^T is the transpose and V^H is the conjugate transpose of V.

Input Parameters

m The number of rows in the matrix A (m≥n).

n The number of columns in A (n≥ 0).

a Array, size at least max(1, lda*n) for column major and max(1, lda*m)
for row major layout. Array a contains the m-by-n matrix A.

lda The leading dimension of a; at least max(1, m) for column major layout and
max(1,n) for row major layout.

ldt The leading dimension of t; at least max(1, n).

Output Parameters

a Overwritten by the factorization data as follows:


The elements on and above the diagonal of the array contain the n-by-n
upper triangular matrix R. The elements below the diagonal are the
columns of V.

t Array, size at least max(1, ldt*n).

The n-by-n upper triangular factor of the block reflector. The elements on
and above the diagonal contain the block reflector T. The elements below
the diagonal are not used.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info < 0 and info = -i, the ith argument had an illegal value.

If info = -1011, memory allocation error occurred.

?geqrt3
Recursively computes a QR factorization of a general
real or complex matrix using the compact WY
representation of Q.

Syntax
lapack_int LAPACKE_sgeqrt3 (int matrix_layout , lapack_int m , lapack_int n , float *
a , lapack_int lda , float * t , lapack_int ldt );
lapack_int LAPACKE_dgeqrt3 (int matrix_layout , lapack_int m , lapack_int n , double *
a , lapack_int lda , double * t , lapack_int ldt );
lapack_int LAPACKE_cgeqrt3 (int matrix_layout , lapack_int m , lapack_int n ,
lapack_complex_float * a , lapack_int lda , lapack_complex_float * t , lapack_int
ldt );
lapack_int LAPACKE_zgeqrt3 (int matrix_layout , lapack_int m , lapack_int n ,
lapack_complex_double * a , lapack_int lda , lapack_complex_double * t , lapack_int
ldt );

Include Files
• mkl.h

Description

The routine recursively computes a QR factorization of a general real or complex m-by-n matrix A (with
m ≥ n) as A = Q*R, using the compact WY representation of Q.
The strictly lower triangular matrix V contains the elementary reflectors H(i) in the i-th column below the
diagonal. For example, if m = 5 and n = 3, the matrix V is

   V = (  1       )
       ( v1  1    )
       ( v1 v2  1 )
       ( v1 v2 v3 )
       ( v1 v2 v3 )

where vi represents one of the vectors that define H(i). The vectors are returned in the lower triangular
part of array a.

NOTE
The 1s along the diagonal of V are not stored in a.

The block reflector H is then given by


H = I - V*T*V^T for real flavors, and
H = I - V*T*V^H for complex flavors,
where V^T is the transpose and V^H is the conjugate transpose of V.

Input Parameters

m The number of rows in the matrix A (m≥n).

n The number of columns in A (n≥ 0).


a Array, size at least max(1, lda*n) for column major and max(1, lda*m)
for row major layout. Array a contains the m-by-n matrix A.

lda The leading dimension of a; at least max(1, m) for column major layout and
max(1,n) for row major layout.

ldt The leading dimension of t; at least max(1, n).

Output Parameters

a The elements on and above the diagonal of the array contain the n-by-n
upper triangular matrix R. The elements below the diagonal are the
columns of V.

t Array, size ldt by n.

The n-by-n upper triangular factor of the block reflector. The elements on
and above the diagonal contain the block reflector T. The elements below
the diagonal are not used.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info < 0 and info = -i, the ith argument had an illegal value.

If info = -1011, memory allocation error occurred.

?getf2
Computes the LU factorization of a general m-by-n
matrix using partial pivoting with row interchanges
(unblocked algorithm).

Syntax
lapack_int LAPACKE_sgetf2 (int matrix_layout, lapack_int m, lapack_int n, float* a,
lapack_int lda, lapack_int * ipiv);
lapack_int LAPACKE_dgetf2 (int matrix_layout, lapack_int m, lapack_int n, double* a,
lapack_int lda, lapack_int * ipiv);
lapack_int LAPACKE_cgetf2 (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_float* a, lapack_int lda, lapack_int * ipiv);
lapack_int LAPACKE_zgetf2 (int matrix_layout, lapack_int m, lapack_int n,
lapack_complex_double* a, lapack_int lda, lapack_int * ipiv);

Include Files
• mkl.h

Description

The routine computes the LU factorization of a general m-by-n matrix A using partial pivoting with row
interchanges. The factorization has the form
A = P*L*U

where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m >
n) and U is upper triangular (upper trapezoidal if m < n).

Input Parameters

m The number of rows in the matrix A (m≥ 0).

n The number of columns in A (n≥ 0).

a Array, size at least max(1, lda*n) for column major and max(1, lda*m)
for row major layout. Array a contains the m-by-n matrix A.

lda The leading dimension of a; at least max(1, m) for column major layout and
max(1,n) for row major layout.

Output Parameters

a Overwritten by L and U. The unit diagonal elements of L are not stored.

ipiv
Array, size at least max(1,min(m,n)).

The pivot indices: for 1 ≤ i ≤n, row i was interchanged with row ipiv(i).

Return Values
This function returns a value info.
If info = -i, the i-th parameter had an illegal value.

If info = i > 0, u(i, i) is 0. The factorization has been completed, but U is exactly singular. Division by 0 will
occur if you use the factor U for solving a system of linear equations.
If info = -1011, memory allocation error occurred.
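
A minimal usage sketch for LAPACKE_dgetf2 on a small row-major matrix (values chosen arbitrarily for
illustration):

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 3;
    double a[9] = {  4,  3,  0,
                     3,  4, -1,
                     0, -1,  4 };      /* row major, lda = n */
    lapack_int ipiv[3];
    lapack_int info = LAPACKE_dgetf2(LAPACK_ROW_MAJOR, n, n, a, n, ipiv);
    if (info > 0)
        printf("U(%d,%d) is exactly zero\n", (int)info, (int)info);
    else if (info == 0)
        printf("first pivot row = %d, u(1,1) = %f\n", (int)ipiv[0], a[0]);
    return 0;
}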

?lacn2
Estimates the 1-norm of a square matrix, using
reverse communication for evaluating matrix-vector
products.

Syntax
lapack_int LAPACKE_slacn2 (lapack_int n, float * v, float * x, lapack_int * isgn, float
* est, lapack_int * kase, lapack_int * isave);
lapack_int LAPACKE_clacn2 (lapack_int n, lapack_complex_float * v, lapack_complex_float
* x, float * est, lapack_int * kase, lapack_int * isave);
lapack_int LAPACKE_dlacn2 (lapack_int n, double * v, double * x, lapack_int * isgn,
double * est, lapack_int * kase, lapack_int * isave);
lapack_int LAPACKE_zlacn2 (lapack_int n, lapack_complex_double * v,
lapack_complex_double * x, double * est, lapack_int * kase, lapack_int * isave);

Include Files
• mkl.h


Description

The routine estimates the 1-norm of a square, real or complex matrix A. Reverse communication is used for
evaluating matrix-vector products.

Input Parameters

n The order of the matrix A (n≥ 1).

v, x Arrays, size (n) each.


v is a workspace array.
x is used as input after an intermediate return.

isgn Workspace array, size (n), used with real flavors only.

est On entry with kase set to 1 or 2, and isave(1) = 1, est must be unchanged from the previous call to
the routine.

kase On the initial call to the routine, kase must be set to 0.

isave Array, size (3).


Contains variables from the previous call to the routine.

Output Parameters

est An estimate (a lower bound) for norm(A).

kase On an intermediate return, kase is set to 1 or 2, indicating whether x is overwritten by A*x or A^T*x
for real flavors and A*x or A^H*x for complex flavors.
On the final return, kase is set to 0.

v On the final return, v = A*w, where est = norm(v)/norm(w) (w is not returned).

x On an intermediate return, x is overwritten by
A*x, if kase = 1,
A^T*x, if kase = 2 (for real flavors),
A^H*x, if kase = 2 (for complex flavors),
and the routine must be re-called with all the other parameters unchanged.

isave This parameter is used to save variables between calls to the routine.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.
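
The reverse-communication protocol is easiest to see in code. The sketch below (assuming a small dense
row-major matrix and using cblas_dgemv/cblas_dcopy for the matrix-vector products) estimates the 1-norm
of A:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 3;
    double a[9] = {  2, -1,  0,
                    -1,  2, -1,
                     0, -1,  2 };      /* row major */
    double v[3], x[3], y[3], est = 0.0;
    lapack_int isgn[3], isave[3], kase = 0;

    do {
        LAPACKE_dlacn2(n, v, x, isgn, &est, &kase, isave);
        if (kase == 1 || kase == 2) {
            /* kase = 1: overwrite x with A*x;  kase = 2: overwrite x with A^T*x */
            cblas_dgemv(CblasRowMajor, kase == 1 ? CblasNoTrans : CblasTrans,
                        n, n, 1.0, a, n, x, 1, 0.0, y, 1);
            cblas_dcopy(n, y, 1, x, 1);
        }
    } while (kase != 0);

    printf("estimated 1-norm = %f\n", est);
    return 0;
}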

?lacpy
Copies all or part of one two-dimensional array to
another.

Syntax
lapack_int LAPACKE_slacpy (int matrix_layout, char uplo, lapack_int m, lapack_int n,
const float* a, lapack_int lda, float* b, lapack_int ldb);
lapack_int LAPACKE_dlacpy (int matrix_layout, char uplo, lapack_int m, lapack_int n,
const double* a, lapack_int lda, double* b, lapack_int ldb);
lapack_int LAPACKE_clacpy (int matrix_layout, char uplo, lapack_int m, lapack_int n,
const lapack_complex_float* a, lapack_int lda, lapack_complex_float* b, lapack_int
ldb);
lapack_int LAPACKE_zlacpy (int matrix_layout, char uplo, lapack_int m, lapack_int n,
const lapack_complex_double* a, lapack_int lda, lapack_complex_double* b, lapack_int
ldb);

Include Files
• mkl.h

Description

The routine copies all or part of a two-dimensional matrix A to another matrix B.

Input Parameters

uplo
Specifies the part of the matrix A to be copied to B.
If uplo = 'U', the upper triangular part of A;

if uplo = 'L', the lower triangular part of A.

Otherwise, all of the matrix A is copied.

m The number of rows in the matrix A (m≥ 0).

n The number of columns in A (n≥ 0).

a Array, size at least max(1, lda*n) for column major and max(1, lda*m)
for row major layout. A contains the m-by-n matrix A.

If uplo = 'U', only the upper triangle or trapezoid is accessed; if uplo = 'L', only the lower triangle or
trapezoid is accessed.

lda The leading dimension of a; lda ≥ max(1, m) for column major layout and max(1, n) for row major
layout.

ldb The leading dimension of the output array b; ldb ≥ max(1, m) for column major layout and max(1, n)
for row major layout.

Output Parameters

b Array, size at least max(1, ldb*n) for column major and max(1, ldb*m)
for row major layout. Array b contains the m-by-n matrix B.


On exit, B = A in the locations specified by uplo.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.
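
A minimal usage sketch for LAPACKE_dlacpy copying a whole matrix (values chosen arbitrarily for illustration):

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int m = 2, n = 3;
    double a[6] = { 1, 2, 3,
                    4, 5, 6 };         /* row major, lda = n */
    double b[6] = { 0 };
    /* uplo is neither 'U' nor 'L', so all of A is copied to B */
    lapack_int info = LAPACKE_dlacpy(LAPACK_ROW_MAJOR, 'A', m, n, a, n, b, n);
    if (info == 0)
        printf("b[5] = %f\n", b[5]);   /* prints 6.000000 */
    return 0;
}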

?lakf2
Forms a matrix containing Kronecker products
between the given matrices.

Syntax
void slakf2 (lapack_int *m, lapack_int *n, float *a, lapack_int *lda, float *b, float
*d, float *e, float *z, lapack_int *ldz);
void dlakf2 (lapack_int *m, lapack_int *n, double *a, lapack_int *lda, double *b,
double *d, double *e, double *z, lapack_int *ldz);
void clakf2 (lapack_int *m, lapack_int *n, lapack_complex_float *a, lapack_int *lda,
lapack_complex_float *b, lapack_complex_float *d, lapack_complex_float *e, lapack_complex_float *z,
lapack_int *ldz);
void zlakf2 (lapack_int *m, lapack_int *n, lapack_complex_double *a, lapack_int *lda,
lapack_complex_double *b, lapack_complex_double *d, lapack_complex_double *e,
lapack_complex_double *z, lapack_int *ldz);

Include Files
• mkl.h

Description

The routine ?lakf2 forms the 2*m*n-by-2*m*n matrix Z:

   Z = ( kron(In, A)   -kron(B^T, Im) )
       ( kron(In, D)   -kron(E^T, Im) )

where In and Im are the identity matrices of sizes n and m, and X^T is the transpose of X. kron(X, Y) is the
Kronecker product between the matrices X and Y.

Input Parameters

m Size of matrix, m≥ 1

n Size of matrix, n≥ 1

a Array, size lda-by-n. The matrix A in the output matrix Z.

lda The leading dimension of a, b, d, and e. lda≥m+n.

b Array, size lda by n. Matrix used in forming the output matrix Z.

d Array, size lda by m. Matrix used in forming the output matrix Z.

e Array, size lda by n. Matrix used in forming the output matrix Z.

ldz The leading dimension of Z. ldz≥ 2* m*n.

Output Parameters

z Array, size ldz-by-2*m*n. The resultant 2*m*n-by-2*m*n Kronecker matrix.

?lange
Returns the value of the 1-norm, Frobenius norm,
infinity-norm, or the largest absolute value of any
element of a general rectangular matrix.

Syntax
float LAPACKE_slange (int matrix_layout, char norm, lapack_int m, lapack_int n, const
float * a, lapack_int lda);
double LAPACKE_dlange (int matrix_layout, char norm, lapack_int m, lapack_int n, const
double * a, lapack_int lda);
float LAPACKE_clange (int matrix_layout, char norm, lapack_int m, lapack_int n, const
lapack_complex_float * a, lapack_int lda);
double LAPACKE_zlange (int matrix_layout, char norm, lapack_int m, lapack_int n, const
lapack_complex_double * a, lapack_int lda);

Include Files
• mkl.h

Description

The function ?lange returns the value of the 1-norm, or the Frobenius norm, or the infinity norm, or the
element of largest absolute value of a real/complex matrix A.

Input Parameters

norm Specifies the value to be returned by the routine:


= 'M' or 'm': val = max(abs(Aij)), largest absolute value of the matrix
A.
= '1' or 'O' or 'o': val = norm1(A), 1-norm of the matrix A
(maximum column sum),
= 'I' or 'i': val = normI(A), infinity norm of the matrix A (maximum
row sum),
= 'F', 'f', 'E' or 'e': val = normF(A), Frobenius norm of the matrix
A (square root of sum of squares).

m The number of rows of the matrix A.


m≥ 0. When m = 0, ?lange is set to zero.

n The number of columns of the matrix A.


n≥ 0. When n = 0, ?lange is set to zero.

a Array, size at least max(1, lda*n) for column major and max(1, lda*m)
for row major layout. Array a contains the m-by-n matrix A.

lda The leading dimension of the array a.
lda ≥ max(m, 1) for column major layout and lda ≥ max(1, n) for row major layout.
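
A minimal usage sketch computing several norms of the same matrix with LAPACKE_dlange (values chosen
arbitrarily for illustration):

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int m = 2, n = 3;
    double a[6] = {  1, -2,  3,
                    -4,  5, -6 };      /* row major, lda = n */
    double nrm1 = LAPACKE_dlange(LAPACK_ROW_MAJOR, '1', m, n, a, n);  /* max column sum */
    double nrmi = LAPACKE_dlange(LAPACK_ROW_MAJOR, 'I', m, n, a, n);  /* max row sum */
    double nrmf = LAPACKE_dlange(LAPACK_ROW_MAJOR, 'F', m, n, a, n);  /* Frobenius norm */
    printf("1-norm = %f, infinity-norm = %f, Frobenius norm = %f\n", nrm1, nrmi, nrmf);
    return 0;
}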

?lansy
Returns the value of the 1-norm, or the Frobenius
norm, or the infinity norm, or the element of largest
absolute value of a real/complex symmetric matrix.

Syntax
float LAPACKE_slansy (int matrix_layout, char norm, char uplo, lapack_int n, const
float * a, lapack_int lda);
double LAPACKE_dlansy (int matrix_layout, char norm, char uplo, lapack_int n, const
double * a, lapack_int lda);
float LAPACKE_clansy (int matrix_layout, char norm, char uplo, lapack_int n, const
lapack_complex_float * a, lapack_int lda);
double LAPACKE_zlansy (int matrix_layout, char norm, char uplo, lapack_int n, const
lapack_complex_double * a, lapack_int lda);

Include Files
• mkl.h

Description

The function ?lansy returns the value of the 1-norm, or the Frobenius norm, or the infinity norm, or the
element of largest absolute value of a real/complex symmetric matrix A.

Input Parameters

norm Specifies the value to be returned by the routine:


= 'M' or 'm': val = max(abs(Aij)), largest absolute value of the matrix
A.
= '1' or 'O' or 'o': val = norm1(A), 1-norm of the matrix A
(maximum column sum),
= 'I' or 'i': val = normI(A), infinity norm of the matrix A (maximum
row sum),
= 'F', 'f', 'E' or 'e': val = normF(A), Frobenius norm of the matrix
A (square root of sum of squares).

uplo Specifies whether the upper or lower triangular part of the symmetric
matrix A is to be referenced.
= 'U': Upper triangular part of A is referenced.

= 'L': Lower triangular part of A is referenced

n The order of the matrix A. n≥ 0. When n = 0, ?lansy is set to zero.

a Array, size at least max(1,lda*n). The symmetric matrix A.


If uplo = 'U', the leading n-by-n upper triangular part of a contains the
upper triangular part of the matrix A, and the strictly lower triangular part
of a is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of a contains the
lower triangular part of the matrix A, and the strictly upper triangular part
of a is not referenced.

lda The leading dimension of the array a.


lda≥ max(n,1).

?lanhe
Returns the value of the 1-norm, or the Frobenius
norm, or the infinity norm, or the element of largest
absolute value of a complex Hermitian matrix.

Syntax
float LAPACKE_clanhe (int matrix_layout, char norm, char uplo, lapack_int n, const
lapack_complex_float * a, lapack_int lda);
double LAPACKE_zlanhe (int matrix_layout, char norm, char uplo, lapack_int n, const
lapack_complex_double * a, lapack_int lda);

Include Files
• mkl.h

Description

The function ?lanhe returns the value of the 1-norm, or the Frobenius norm, or the infinity norm, or the
element of largest absolute value of a complex Hermitian matrix A.

Input Parameters

norm Specifies the value to be returned by the routine:


= 'M' or 'm': val = max(abs(Aij)), largest absolute value of the matrix A.

= '1' or 'O' or 'o': val = norm1(A), 1-norm of the matrix A (maximum


column sum),
= 'I' or 'i': val = normI(A), infinity norm of the matrix A (maximum
row sum),
= 'F', 'f', 'E' or 'e': val = normF(A), Frobenius norm of the matrix A
(square root of sum of squares).

uplo Specifies whether the upper or lower triangular part of the Hermitian matrix
A is to be referenced.
= 'U': Upper triangular part of A is referenced.


= 'L': Lower triangular part of A is referenced

n The order of the matrix A. n≥ 0. When n = 0, ?lanhe is set to zero.

a Array, size at least max(1, lda*n). The Hermitian matrix A.


If uplo = 'U', the leading n-by-n upper triangular part of a contains the
upper triangular part of the matrix A, and the strictly lower triangular part
of a is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of a contains the
lower triangular part of the matrix A, and the strictly upper triangular part
of a is not referenced.

lda The leading dimension of the array a.


lda≥ max(n,1).

?lantr
Returns the value of the 1-norm, or the Frobenius
norm, or the infinity norm, or the element of largest
absolute value of a trapezoidal or triangular matrix.

Syntax
float LAPACKE_slantr (int matrix_layout, char norm, char uplo, char diag, lapack_int m, lapack_int n,
const float * a, lapack_int lda);
double LAPACKE_dlantr (int matrix_layout, char norm, char uplo, char diag, lapack_int m, lapack_int n,
const double * a, lapack_int lda);
float LAPACKE_clantr (int matrix_layout, char norm, char uplo, char diag, lapack_int m, lapack_int n,
const lapack_complex_float * a, lapack_int lda);
double LAPACKE_zlantr (int matrix_layout, char norm, char uplo, char diag, lapack_int m, lapack_int n,
const lapack_complex_double * a, lapack_int lda);

Include Files
• mkl.h

Description

The function ?lantr returns the value of the 1-norm, or the Frobenius norm, or the infinity norm, or the
element of largest absolute value of a trapezoidal or triangular matrix A.

Input Parameters

norm Specifies the value to be returned by the routine:


= 'M' or 'm': val = max(abs(Aij)), largest absolute value of the matrix
A.
= '1' or 'O' or 'o': val = norm1(A), 1-norm of the matrix A
(maximum column sum),
= 'I' or 'i': val = normI(A), infinity norm of the matrix A (maximum
row sum),

= 'F', 'f', 'E' or 'e': val = normF(A), Frobenius norm of the matrix
A (square root of sum of squares).

uplo Specifies whether the matrix A is upper or lower trapezoidal.


= 'U': Upper trapezoidal

= 'L': Lower trapezoidal.

Note that A is triangular instead of trapezoidal if m = n.

diag
Specifies whether or not the matrix A has unit diagonal.
= 'N': Non-unit diagonal

= 'U': Unit diagonal.

m The number of rows of the matrix A. m≥ 0, and if uplo = 'U', m≤n.

When m = 0, ?lantr is set to zero.

n The number of columns of the matrix A. n≥ 0, and if uplo = 'L', n≤m.

When n = 0, ?lantr is set to zero.

a Array, size at least max(1, lda*n) for column major and max(1, lda*m)
for row major layout.
The trapezoidal matrix A (A is triangular if m = n).

If uplo = 'U', the leading m-by-n upper trapezoidal part of the array a
contains the upper trapezoidal matrix, and the strictly lower triangular part
of A is not referenced.
If uplo = 'L', the leading m-by-n lower trapezoidal part of the array a
contains the lower trapezoidal matrix, and the strictly upper triangular part
of A is not referenced. Note that when diag = 'U', the diagonal elements
of A are not referenced and are assumed to be one.

lda The leading dimension of the array a.


lda≥ max(m,1)for column major layout and ≥max(1,n) for row major
layout.

?lapmr
Rearranges rows of a matrix as specified by a
permutation vector.

Syntax
lapack_int LAPACKE_slapmr (int matrix_layout, lapack_logical forwrd, lapack_int m,
lapack_int n, float* x, lapack_int ldx, lapack_int * k);
lapack_int LAPACKE_dlapmr (int matrix_layout, lapack_logical forwrd, lapack_int m,
lapack_int n, double* x, lapack_int ldx, lapack_int * k);
lapack_int LAPACKE_clapmr (int matrix_layout, lapack_logical forwrd, lapack_int m,
lapack_int n, lapack_complex_float* x, lapack_int ldx, lapack_int * k);
lapack_int LAPACKE_zlapmr (int matrix_layout, lapack_logical forwrd, lapack_int m,
lapack_int n, lapack_complex_double* x, lapack_int ldx, lapack_int * k);


Include Files
• mkl.h

Description
The ?lapmr routine rearranges the rows of the m-by-n matrix X as specified by the permutation k[0],
k[1], ... , k[m-1] of the integers 1,...,m.
If forwrd is true, forward permutation:
X(k[i-1], :) is moved to X(i, :) for i = 1, 2, ..., m.
If forwrd is false, backward permutation:
X(i, :) is moved to X(k[i-1], :) for i = 1, 2, ..., m.

Input Parameters

forwrd
If forwrd is true, forward permutation.

If forwrd is false, backward permutation.

m The number of rows of the matrix X. m≥ 0.

n The number of columns of the matrix X. n≥ 0.

x Array, size at least max(1, ldx*n) for column major and max(1, ldx*m)
for row major layout. On entry, the m-by-n matrix X.

ldx The leading dimension of the array X, ldx≥ max(1,m)for column major
layout and ldx≥ max(1,n) for row major layout.

k Array, size (m). On entry, k contains the permutation vector and is used as
internal workspace.

Output Parameters

x On exit, x contains the permuted matrix X.

k On exit, k is reset to its original value.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.

See Also
?lapmt

?lapmt
Performs a forward or backward permutation of the
columns of a matrix.

Syntax
lapack_int LAPACKE_slapmt (int matrix_layout, lapack_logical forwrd, lapack_int m,
lapack_int n, float * x, lapack_int ldx, lapack_int * k);
lapack_int LAPACKE_dlapmt (int matrix_layout, lapack_logical forwrd, lapack_int m,
lapack_int n, double * x, lapack_int ldx, lapack_int * k);
lapack_int LAPACKE_clapmt (int matrix_layout, lapack_logical forwrd, lapack_int m,
lapack_int n, lapack_complex_float * x, lapack_int ldx, lapack_int * k);
lapack_int LAPACKE_zlapmt (int matrix_layout, lapack_logical forwrd, lapack_int m,
lapack_int n, lapack_complex_double * x, lapack_int ldx, lapack_int * k);

Include Files
• mkl.h

Description
The routine ?lapmt rearranges the columns of the m-by-n matrix X as specified by the permutation k[i - 1]
for i = 1,...,n.
If forwrd ≠ 0, forward permutation:
X(*, k(j)) is moved to X(*, j) for j = 1, 2, ..., n.
If forwrd = 0, backward permutation:
X(*, j) is moved to X(*, k(j)) for j = 1, 2, ..., n.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

forwrd If forwrd≠ 0, forward permutation

If forwrd = 0, backward permutation

m The number of rows of the matrix X. m≥ 0.

n The number of columns of the matrix X. n≥ 0.

x Array, size ldx*n. On entry, the m-by-n matrix X.

ldx The leading dimension of the array x, ldx≥ max(1,m).

k Array, size (n). On entry, k contains the permutation vector and is used as
internal workspace.

Output Parameters

x On exit, x contains the permuted matrix X.

k On exit, k is reset to its original value.

See Also
?lapmr
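
A minimal usage sketch for LAPACKE_dlapmt applying a forward column permutation (values chosen
arbitrarily for illustration):

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int m = 2, n = 3;
    double x[6] = { 1, 2, 3,
                    4, 5, 6 };          /* row major, ldx = n */
    lapack_int k[3] = { 3, 1, 2 };      /* a permutation of 1..n */
    /* forward permutation: column k(j) of X is moved to column j */
    lapack_int info = LAPACKE_dlapmt(LAPACK_ROW_MAJOR, 1, m, n, x, n, k);
    if (info == 0)
        printf("first row after permutation: %f %f %f\n", x[0], x[1], x[2]);
    return 0;
}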


?lapy2
Returns sqrt(x^2+y^2).

Syntax
float LAPACKE_slapy2 (float x, float y);
double LAPACKE_dlapy2 (double x, double y);

Include Files
• mkl.h

Description

The function ?lapy2 returns sqrt(x^2+y^2), avoiding unnecessary overflow or harmful underflow.

Input Parameters

x, y Specify the input values x and y.

Return Values
The function returns a value val.

If val=-1D0, the first argument was NaN.

If val=-2D0, the second argument was NaN.

?lapy3
Returns sqrt(x^2+y^2+z^2).

Syntax
float LAPACKE_slapy3 (float x, float y, float z);
double LAPACKE_dlapy3 (double x, double y, double z);

Include Files
• mkl.h

Description

The function ?lapy3 returns sqrt(x^2+y^2+z^2), avoiding unnecessary overflow or harmful underflow.

Input Parameters

x, y, z Specify the input values x, y and z.

Return Values
This function returns a value val.

If val = -1D0, the first argument was NaN.

If val = -2D0, the second argument was NaN.

If val = -3D0, the third argument was NaN.

?laran
Returns a random real number from a uniform
distribution.

Syntax
float slaran (lapack_int *iseed);
double dlaran (lapack_int *iseed);

Description

The ?laran routine returns a random real number from a uniform (0,1) distribution. This routine uses a
multiplicative congruential method with modulus 2^48 and multiplier 33952834046453. 48-bit integers are
stored in four integer array elements with 12 bits per element. Hence the routine is portable across machines
with integers of 32 bits or more.

Input Parameters

iseed Array, size 4. On entry, the seed of the random number generator. The
array elements must be between 0 and 4095, and iseed[3] must be odd.

Output Parameters

iseed On exit, the seed is updated.

Return Values
The function returns a random number.

?larfb
Applies a block reflector or its transpose/conjugate-
transpose to a general rectangular matrix.

Syntax
lapack_int LAPACKE_slarfb (int matrix_layout , char side , char trans , char direct ,
char storev , lapack_int m , lapack_int n , lapack_int k , const float * v ,
lapack_int ldv , const float * t , lapack_int ldt , float * c , lapack_int ldc );
lapack_int LAPACKE_dlarfb (int matrix_layout , char side , char trans , char direct ,
char storev , lapack_int m , lapack_int n , lapack_int k , const double * v ,
lapack_int ldv , const double * t , lapack_int ldt , double * c , lapack_int ldc );
lapack_int LAPACKE_clarfb (int matrix_layout , char side , char trans , char direct ,
char storev , lapack_int m , lapack_int n , lapack_int k , const lapack_complex_float * v ,
lapack_int ldv , const lapack_complex_float * t , lapack_int ldt , lapack_complex_float * c ,
lapack_int ldc );
lapack_int LAPACKE_zlarfb (int matrix_layout , char side , char trans , char direct ,
char storev , lapack_int m , lapack_int n , lapack_int k , const lapack_complex_double
* v , lapack_int ldv , const lapack_complex_double * t , lapack_int ldt ,
lapack_complex_double * c , lapack_int ldc );

Include Files
• mkl.h


Description
The real flavors of the routine ?larfb apply a real block reflector H or its transpose H^T to a real m-by-n
matrix C from either left or right.
The complex flavors of the routine ?larfb apply a complex block reflector H or its conjugate transpose H^H
to a complex m-by-n matrix C from either left or right.

Input Parameters

side
If side = 'L': apply H or H^T for real flavors and H or H^H for complex
flavors from the left.
If side = 'R': apply H or H^T for real flavors and H or H^H for complex
flavors from the right.

trans
If trans = 'N': apply H (No transpose).

If trans = 'C': apply H^H (Conjugate transpose).

If trans = 'T': apply H^T (Transpose).

direct
Indicates how H is formed from a product of elementary reflectors
If direct = 'F': H = H(1)*H(2)*. . . *H(k) (forward)

If direct = 'B': H = H(k)* . . . H(2)*H(1) (backward)

storev
Indicates how the vectors which define the elementary reflectors are
stored:
If storev = 'C': Column-wise

If storev = 'R': Row-wise

m The number of rows of the matrix C.

n The number of columns of the matrix C.

k The order of the matrix T (equal to the number of elementary reflectors


whose product defines the block reflector).

v The size limitations depend on values of parameters storev and side as described in the following table:

               storev = 'C'                    storev = 'R'
               side = 'L'     side = 'R'       side = 'L'     side = 'R'
Column major   max(1,ldv*k)   max(1,ldv*k)     max(1,ldv*m)   max(1,ldv*n)
Row major      max(1,ldv*m)   max(1,ldv*n)     max(1,ldv*k)   max(1,ldv*k)

The matrix v. See Application Notes below.

ldv The leading dimension of the array v. It should satisfy the following conditions:

               storev = 'C'                    storev = 'R'
               side = 'L'     side = 'R'       side = 'L'     side = 'R'
Column major   max(1,m)       max(1,n)         max(1,k)       max(1,k)
Row major      max(1,k)       max(1,k)         max(1,m)       max(1,n)

t Array, size at least max(1,ldt * k).


Contains the triangular k-by-k matrix T in the representation of the block
reflector.

ldt The leading dimension of the array t.


ldt≥k.

c Array, size at least max(1, ldc * n) for column major layout and max(1, ldc
* m) for row major layout.
On entry, the m-by-n matrix C.

ldc The leading dimension of the array c.


ldc≥ max(1,m) for column major layout and ldc≥ max(1,n) for row
major layout.

Output Parameters

c On exit, c is overwritten by the product of the following:

• H*C, or H^T*C, or C*H, or C*H^T for real flavors
• H*C, or H^H*C, or C*H, or C*H^H for complex flavors

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.

Application Notes
The shape of the matrix V and the storage of the vectors which define the H(i) is best illustrated by the
following example with n = 5 and k = 3. The elements equal to 1 are not stored; the corresponding array
elements are modified but restored on exit. The rest of the array is not used.

direct = 'F' and storev = 'C':          direct = 'F' and storev = 'R':

     V = (  1       )                        V = (  1 v1 v1 v1 v1 )
         ( v1  1    )                            (     1 v2 v2 v2 )
         ( v1 v2  1 )                            (        1 v3 v3 )
         ( v1 v2 v3 )
         ( v1 v2 v3 )

direct = 'B' and storev = 'C':          direct = 'B' and storev = 'R':

     V = ( v1 v2 v3 )                        V = ( v1 v1  1       )
         ( v1 v2 v3 )                            ( v2 v2 v2  1    )
         (  1 v2 v3 )                            ( v3 v3 v3 v3  1 )
         (     1 v3 )
         (        1 )

?larfg
Generates an elementary reflector (Householder
matrix).

Syntax
lapack_int LAPACKE_slarfg (lapack_int n , float * alpha , float * x , lapack_int incx ,
float * tau );
lapack_int LAPACKE_dlarfg (lapack_int n , double * alpha , double * x , lapack_int
incx , double * tau );
lapack_int LAPACKE_clarfg (lapack_int n , lapack_complex_float * alpha ,
lapack_complex_float * x , lapack_int incx , lapack_complex_float * tau );
lapack_int LAPACKE_zlarfg (lapack_int n , lapack_complex_double * alpha ,
lapack_complex_double * x , lapack_int incx , lapack_complex_double * tau );

Include Files
• mkl.h

Description

The routine ?larfg generates a real/complex elementary reflector H of order n, such that

   H^T * ( alpha ) = ( beta ),   H^T * H = I   for real flavors, and
         (   x   )   (  0   )

   H^H * ( alpha ) = ( beta ),   H^H * H = I   for complex flavors,
         (   x   )   (  0   )

where alpha and beta are scalars (with beta real for all flavors), and x is an (n-1)-element real/complex
vector. H is represented in the form

   H = I - tau * ( 1 ) * ( 1  v^T )   for real flavors, and
                 ( v )

   H = I - tau * ( 1 ) * ( 1  v^H )   for complex flavors,
                 ( v )

where tau is a real/complex scalar and v is a real/complex (n-1)-element vector, respectively. Note that for
clarfg/zlarfg, H is not Hermitian.
If the elements of x are all zero (and, for complex flavors, alpha is real), then tau = 0 and H is taken to be
the unit matrix.
Otherwise, 1 ≤ tau ≤ 2 (for real flavors), or
1 ≤ Re(tau) ≤ 2 and abs(tau-1) ≤ 1 (for complex flavors).

Input Parameters

n The order of the elementary reflector.

alpha On entry, the value alpha.
x Array, size (1+(n-2)*abs(incx)).
On entry, the vector x.

incx
The increment between elements of x. incx > 0.

Output Parameters

alpha On exit, it is overwritten with the value beta.

x On exit, it is overwritten with the vector v.

tau The value tau.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -2, alpha is NaN


If info = -3, array x contains NaN components.
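
A minimal usage sketch for LAPACKE_dlarfg generating a reflector of order 2 (values chosen arbitrarily for
illustration):

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 2, incx = 1;
    double alpha = 3.0;        /* first component; overwritten with beta */
    double x[1]  = { 4.0 };    /* remaining n-1 components; overwritten with v */
    double tau;
    /* H maps (3, 4)^T onto (beta, 0)^T */
    lapack_int info = LAPACKE_dlarfg(n, &alpha, x, incx, &tau);
    if (info == 0)
        printf("beta = %f, tau = %f, v(2) = %f\n", alpha, tau, x[0]);
    return 0;
}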

?larft
Forms the triangular factor T of a block reflector H = I
- V*T*V**H.


Syntax
lapack_int LAPACKE_slarft (int matrix_layout , char direct , char storev , lapack_int
n , lapack_int k , const float * v , lapack_int ldv , const float * tau , float * t ,
lapack_int ldt );
lapack_int LAPACKE_dlarft (int matrix_layout , char direct , char storev , lapack_int
n , lapack_int k , const double * v , lapack_int ldv , const double * tau , double *
t , lapack_int ldt );
lapack_int LAPACKE_clarft (int matrix_layout , char direct , char storev , lapack_int
n , lapack_int k , const lapack_complex_float * v , lapack_int ldv , const
lapack_complex_float * tau , lapack_complex_float * t , lapack_int ldt );
lapack_int LAPACKE_zlarft (int matrix_layout , char direct , char storev , lapack_int
n , lapack_int k , const lapack_complex_double * v , lapack_int ldv , const
lapack_complex_double * tau , lapack_complex_double * t , lapack_int ldt );

Include Files
• mkl.h

Description
The routine ?larft forms the triangular factor T of a real/complex block reflector H of order n, which is
defined as a product of k elementary reflectors.
If direct = 'F', H = H(1)*H(2)* . . .*H(k) and T is upper triangular;

If direct = 'B', H = H(k)*. . .*H(2)*H(1) and T is lower triangular.

If storev = 'C', the vector which defines the elementary reflector H(i) is stored in the i-th column of the
array v, and H = I - V*T*V^T (for real flavors) or H = I - V*T*V^H (for complex flavors).

If storev = 'R', the vector which defines the elementary reflector H(i) is stored in the i-th row of the array
v, and H = I - V^T*T*V (for real flavors) or H = I - V^H*T*V (for complex flavors).

Input Parameters

direct
Specifies the order in which the elementary reflectors are multiplied to form
the block reflector:
= 'F': H = H(1)*H(2)*. . . *H(k) (forward)

= 'B': H = H(k)*. . .*H(2)*H(1) (backward)

storev
Specifies how the vectors which define the elementary reflectors are stored
(see also Application Notes below):
= 'C': column-wise

= 'R': row-wise.

n The order of the block reflector H. n≥ 0.

k The order of the triangular factor T (equal to the number of elementary


reflectors). k≥ 1.

v The size limitations depend on the value of parameter storev as described in the following table:

               storev = 'C'     storev = 'R'
Column major   max(1,ldv*k)     max(1,ldv*n)
Row major      max(1,ldv*n)     max(1,ldv*k)

The matrix v. See Application Notes below.

ldv The leading dimension of the array v.


If storev = 'C', ldv≥ max(1,n) for column major and ldv≥max(1,k) for
row major;
if storev = 'R', ldv≥k for column major and ldv≥max(1,n) for row
major.

tau Array, size (k). tau[i-1] must contain the scalar factor of the elementary
reflector H(i).

ldt The leading dimension of the output array t. ldt≥k.

Output Parameters

t Array, size ldt * k. The k-by-k triangular factor T of the block reflector. If
direct = 'F', T is upper triangular; if direct = 'B', T is lower
triangular. The rest of the array is not used.

v The matrix V.

Application Notes
The shape of the matrix V and the storage of the vectors which define the H(i) is best illustrated by the
following example with n = 5 and k = 3. The elements equal to 1 are not stored; the corresponding array
elements are modified but restored on exit. The rest of the array is not used.

direct = 'F' and storev = 'C':          direct = 'F' and storev = 'R':

     V = (  1       )                        V = (  1 v1 v1 v1 v1 )
         ( v1  1    )                            (     1 v2 v2 v2 )
         ( v1 v2  1 )                            (        1 v3 v3 )
         ( v1 v2 v3 )
         ( v1 v2 v3 )

direct = 'B' and storev = 'C':          direct = 'B' and storev = 'R':

     V = ( v1 v2 v3 )                        V = ( v1 v1  1       )
         ( v1 v2 v3 )                            ( v2 v2 v2  1    )
         (  1 v2 v3 )                            ( v3 v3 v3 v3  1 )
         (     1 v3 )
         (        1 )

?larfx
Applies an elementary reflector to a general
rectangular matrix, with loop unrolling when the
reflector has order less than or equal to 10.

Syntax
lapack_int LAPACKE_slarfx (int matrix_layout , char side , lapack_int m , lapack_int
n , const float * v , float tau , float * c , lapack_int ldc , float * work );
lapack_int LAPACKE_dlarfx (int matrix_layout , char side , lapack_int m , lapack_int
n , const double * v , double tau , double * c , lapack_int ldc , double * work );
lapack_int LAPACKE_clarfx (int matrix_layout , char side , lapack_int m , lapack_int
n , const lapack_complex_float * v , lapack_complex_float tau , lapack_complex_float *
c , lapack_int ldc , lapack_complex_float * work );
lapack_int LAPACKE_zlarfx (int matrix_layout , char side , lapack_int m , lapack_int
n , const lapack_complex_double * v , lapack_complex_double tau , lapack_complex_double
* c , lapack_int ldc , lapack_complex_double * work );

Include Files
• mkl.h

Description

The routine ?larfx applies a real/complex elementary reflector H to a real/complex m-by-n matrix C, from
either the left or the right.
H is represented in the following forms:

• H = I - tau*v*v^T, where tau is a real scalar and v is a real vector.
• H = I - tau*v*v^H, where tau is a complex scalar and v is a complex vector.

If tau = 0, then H is taken to be the unit matrix.

Input Parameters

side
If side = 'L': form H*C

If side = 'R': form C*H.

m The number of rows of the matrix C.

n The number of columns of the matrix C.

v Array, size
(m) if side = 'L' or

(n) if side = 'R'.

The vector v in the representation of H.

tau The value tau in the representation of H.

c Array, size at least max(1, ldc*n) for column major layout and max (1,
ldc*m) for row major layout. On entry, the m-by-n matrix C.

ldc The leading dimension of the array c. ldc ≥ max(1, m).

work Workspace array, size


(n) if side = 'L' or

(m) if side = 'R'.

work is not referenced if H has order < 11.

Output Parameters

c On exit, C is overwritten by the matrix H*C if side = 'L', or C*H if side =


'R'.

?large
Pre- and post-multiplies a real general matrix with a
random orthogonal matrix.

Syntax
void slarge (lapack_int *n, float *a, lapack_int *lda, lapack_int *iseed, float * work,
lapack_int *info);
void dlarge (lapack_int *n, double *a, lapack_int *lda, lapack_int *iseed, double *
work, lapack_int *info);
void clarge (lapack_int *n, lapack_complex_float *a, lapack_int *lda, lapack_int *iseed,
lapack_complex_float * work, lapack_int *info);
void zlarge (lapack_int *n, lapack_complex_double *a, lapack_int *lda, lapack_int
*iseed, lapack_complex_double * work, lapack_int *info);

Include Files
• mkl.h

Description

The routine ?large pre- and post-multiplies a general n-by-n matrix A with a random orthogonal or unitary
matrix: A = U*D*U^T.


Input Parameters

n The order of the matrix A. n≥0

a Array, size lda by n.


On entry, the original n-by-n matrix A.

lda The leading dimension of the array a. lda≥n.

iseed Array, size 4.


On entry, the seed of the random number generator. The array elements
must be between 0 and 4095, and iseed[3] must be odd.

work Workspace array, size 2*n.

Output Parameters

a On exit, A is overwritten by U*A*U' for some random orthogonal matrix U.

iseed On exit, the seed is updated.

info If info = 0, the execution is successful.

If info < 0, the i-th parameter had an illegal value.

?larnd
Returns a random real number from a uniform or
normal distribution.

Syntax
float slarnd (lapack_int *idist, lapack_int *iseed);
double dlarnd (lapack_int *idist, lapack_int *iseed);
The data types for complex variations depend on whether or not the application links with Gnu Fortran
(gfortran) libraries.
For non-gfortran (libmkl_intel_*) interface libraries:

void clarnd (lapack_complex_float *res, lapack_int *idist, lapack_int *iseed);


void zlarnd (lapack_complex_double *res, lapack_int *idist, lapack_int *iseed);
For gfortran (libmkl_gf_*) interface libraries:

lapack_complex_float clarnd (lapack_int *idist, lapack_int *iseed);


lapack_complex_double zlarnd (lapack_int *idist, lapack_int *iseed);
To understand the difference between the non-gfortran and gfortran interfaces and when to use each of
them, see Dynamic Libraries in the lib/intel64 Directory in the Intel MKL Developer Guide.

Include Files
• mkl.h

Description

The routine ?larnd returns a random number from a uniform or normal distribution.

Input Parameters

idist Specifies the distribution of the random numbers. For slarnd and dlarnd:

= 1: uniform (0,1)
= 2: uniform (-1,1)
= 3: normal (0,1).
For clarnd and zlarnd:

= 1: real and imaginary parts each uniform (0,1)


= 2: real and imaginary parts each uniform (-1,1)
= 3: real and imaginary parts each normal (0,1)
= 4: uniformly distributed on the disc abs(z) ≤ 1
= 5: uniformly distributed on the circle abs(z) = 1

iseed Array, size 4.


On entry, the seed of the random number generator. The array elements
must be between 0 and 4095, and iseed[3] must be odd.

Output Parameters

iseed On exit, the seed is updated.

Return Values
The function returns a random number (for complex variations, the non-gfortran libmkl_intel_* interface
libraries return the result as the parameter res).

?larnv
Returns a vector of random numbers from a uniform
or normal distribution.

Syntax
lapack_int LAPACKE_slarnv (lapack_int idist , lapack_int * iseed , lapack_int n , float
* x );
lapack_int LAPACKE_dlarnv (lapack_int idist , lapack_int * iseed , lapack_int n ,
double * x );
lapack_int LAPACKE_clarnv (lapack_int idist , lapack_int * iseed , lapack_int n ,
lapack_complex_float * x );
lapack_int LAPACKE_zlarnv (lapack_int idist , lapack_int * iseed , lapack_int n ,
lapack_complex_double * x );

Include Files
• mkl.h

Description

The routine ?larnv returns a vector of n random real/complex numbers from a uniform or normal
distribution.


This routine calls the auxiliary routine ?laruv to generate random real numbers from a uniform (0,1)
distribution, in batches of up to 128 using vectorisable code. The Box-Muller method is used to transform
numbers from a uniform to a normal distribution.

Input Parameters

idist Specifies the distribution of the random numbers: for slarnv and dlarnv:

= 1: uniform (0,1)
= 2: uniform (-1,1)
= 3: normal (0,1).
for clarnv and zlarnv:

= 1: real and imaginary parts each uniform (0,1)


= 2: real and imaginary parts each uniform (-1,1)
= 3: real and imaginary parts each normal (0,1)
= 4: uniformly distributed on the disc abs(z) < 1
= 5: uniformly distributed on the circle abs(z) = 1

iseed Array, size (4).


On entry, the seed of the random number generator; the array elements
must be between 0 and 4095, and iseed(4) must be odd.

n The number of random numbers to be generated.

Output Parameters

x Array, size (n). The generated random numbers.

iseed On exit, the seed is updated.

Return Values
This function returns a value info.
If info = 0, the execution is successful.
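
A minimal usage sketch for LAPACKE_dlarnv drawing five numbers from a normal (0,1) distribution (the seed
is arbitrary but must satisfy the stated constraints):

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 5;
    lapack_int iseed[4] = { 0, 0, 0, 1 };   /* elements in [0,4095]; last element odd */
    double x[5];
    lapack_int info = LAPACKE_dlarnv(3, iseed, n, x);   /* idist = 3: normal (0,1) */
    if (info == 0)
        for (lapack_int i = 0; i < n; ++i)
            printf("x[%d] = %f\n", (int)i, x[i]);
    return 0;
}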

?laror
Pre- or post-multiplies an m-by-n matrix by a random
orthogonal/unitary matrix.

Syntax
void slaror (char *side, char *init, lapack_int *m, lapack_int *n, float *a, lapack_int
*lda, lapack_int *iseed, float *x, lapack_int *info);
void dlaror (char *side, char *init, lapack_int *m, lapack_int *n, double *a,
lapack_int *lda, lapack_int *iseed, double *x, lapack_int *info);
void claror (char *side, char *init, lapack_int *m, lapack_int *n, lapack_complex_float *a,
lapack_int *lda, lapack_int *iseed, lapack_complex_float *x, lapack_int *info);
void zlaror (char *side, char *init, lapack_int *m, lapack_int *n,
lapack_complex_double *a, lapack_int *lda, lapack_int *iseed, lapack_complex_double *x,
lapack_int *info);

Include Files
• mkl.h

Description

The routine ?laror pre- or post-multiplies an m-by-n matrix A by a random orthogonal or unitary matrix U,
overwriting A. A may optionally be initialized to the identity matrix before multiplying by U. U is generated
using the method of G.W. Stewart (SIAM J. Numer. Anal. 17, 1980, 403-409).

Input Parameters

side Specifies whether A is multiplied by U on the left or right.


for slaror and dlaror:

If side = 'L', multiply A on the left (premultiply) by U.

If side = 'R', multiply A on the right (postmultiply) by U^T.

If side = 'C' or 'T', multiply A on the left by U and the right by U^T.

for claror and zlaror:

If side = 'L', multiply A on the left (premultiply) by U.

If side = 'R', multiply A on the right (postmultiply) by U^H.

If side = 'C', multiply A on the left by U and the right by U^H.

If side = 'T', multiply A on the left by U and the right by U^T.

init Specifies whether or not a should be initialized to the identity matrix.

If init = 'I', initialize a to (a section of) the identity matrix before


applying U.
If init = 'N', no initialization. Apply U to the input matrix A.

init = 'I' generates square or rectangular orthogonal matrices:


For m = n and side = 'L' or 'R', the rows and the columns are
orthogonal to each other.
For rectangular matrices where m < n:

• If side = 'R', ?laror produces a dense matrix in which rows are


orthogonal and columns are not.
• If side= 'L', ?laror produces a matrix in which rows are orthogonal,
first m columns are orthogonal, and remaining columns are zero.

For rectangular matrices where m > n:

• If side = 'L', ?laror produces a dense matrix in which columns are


orthogonal and rows are not.
• If side = 'R', ?laror produces a matrix in which columns are
orthogonal, first m rows are orthogonal, and remaining rows are zero.

m The number of rows of A.

n The number of columns of A.

a Array, size lda by n.


lda The leading dimension of the array a.


lda≥ max(1, m).

iseed Array, size (4).


On entry, specifies the seed of the random number generator. The array
elements must be between 0 and 4095; if not they are reduced mod 4096.
Also, iseed[3] must be odd.

x Workspace array, size (3*max( m, n )) .

Value of side Length of workspace

'L' 2*m + n
'R' 2*n + m
'C' or 'T' 3*n

Output Parameters

a On exit, overwritten
by U*A (if side = 'L'),
by A*U (if side = 'R'),
by U*A*U^T (if side = 'C' or 'T').

iseed The values of iseed are changed on exit, and can be used in the next call
to continue the same random number sequence.

info


For slaror and dlaror:

If info = 0, the execution is successful.

If info < 0, the i -th parameter had an illegal value.

If info = 1, the random numbers generated by ?laror are bad.

For claror and zlaror:

If info = 0, the execution is successful.

If info = -1, side is not 'L', 'R', 'C', or 'T'.

If info = -3, if m is negative.

If info = -4, if n is negative or if side is 'C' or 'T' and n is not equal to


m.
If info = -6, if lda is less than m .

?larot
Applies a Givens rotation to two adjacent rows or
columns.

Syntax
void slarot (lapack_logical *lrows, lapack_logical *ileft, lapack_logical *iright,
lapack_int *nl, float *c, float *s, float *a, lapack_int *lda, float *xleft, float
*xright);
void dlarot (lapack_logical *lrows, lapack_logical *ileft, lapack_logical *iright,
lapack_int *nl, double *c, double *s, double *a, lapack_int *lda, double *xleft,
double *xright);
void clarot (lapack_logical *lrows, lapack_logical *ileft, lapack_logical *iright,
lapack_int *nl, lapack_complex_float *c, lapack_complex_float *s, lapack_complex_float *a, lapack_int
*lda, lapack_complex_float *xleft, lapack_complex_float *xright);
void zlarot (lapack_logical *lrows, lapack_logical *ileft, lapack_logical *iright,
lapack_int *nl, lapack_complex_double *c, lapack_complex_double *s,
lapack_complex_double *a, lapack_int *lda, lapack_complex_double *xleft,
lapack_complex_double *xright);

Include Files
• mkl.h

Description

The routine ?larot applies a Givens rotation to two adjacent rows or columns, where one element of the
first or last column or row is stored in some format other than GE so that elements of the matrix may be
used or modified for which no array element is provided.
One example is a symmetric matrix in SB format (bandwidth = 4), for which uplo = 'L'. Two adjacent rows
will have the format:

row j :        *  *  *  *  *  .  .  .  .

row j + 1 :       *  *  *  *  *  .  .  .  .

'*' indicates elements for which storage is provided.


'.' indicates elements for which no storage is provided, but are not necessarily zero; their values are
determined by symmetry.
' ' indicates elements which are required to be zero, and have no storage provided.
Those columns which have two '*' entries can be handled by srot (for slarot and clarot), or by
drot (for dlarot and zlarot).
Those columns which have no '*' entries can be ignored, since as long as the Givens rotations are carefully
applied to preserve symmetry, their values are determined.
Those columns which have one '*' have to be handled separately, by using separate variables p and q :

row j :        *  *  *  *  *  p  .  .  .  .

row j + 1 :    q  *  *  *  *  *  .  .  .  .

If element p is set correctly, ?larot rotates the column and sets p to its new value. The next call to ?larot
rotates columns j and j +1, and restore symmetry. The element q is zero at the beginning, and non-zero
after the rotation. Later, rotations would presumably be chosen to zero q out.
Typical Calling Sequences: rotating the i -th and (i +1)-st rows.

Input Parameters

lrows
If lrows = 1, ?larot rotates two rows.


If lrows = 0, ?larot rotates two columns.

lleft
If lleft = 1, xleft is used instead of the corresponding element of a for
the first element in the second row (if lrows = 0) or column (if lrows=1).

If lleft = 0, the corresponding element of a is used.

lright
If lright = 1, xright is used instead of the corresponding element of a for
the first element in the second row (if lrows = 0) or column (if lrows=1).

If lright = 0, the corresponding element of a is used.

nl The length of the rows (if lrows=1) or columns (if lrows=0) to be rotated.

If xleft or xright are used, the columns or rows they are in should be
included in nl, e.g., if lleft = lright = 1, then nl must be at least 2.

The number of rows or columns to be rotated exclusive of those involving


xleft and/or xright may not be negative, i.e., nl minus how many of
lleft and lright are 1 must be at least zero; if not, xerbla is called.

c, s Specify the Givens rotation to be applied.


If lrows = 1, then the matrix

   (  c  s )
   ( -s  c )

is applied from the left.
If lrows = 0, then the transpose thereof is applied from the right.

a The array containing the rows or columns to be rotated. The first element of
a should be the upper left element to be rotated.

lda The "effective" leading dimension of a.

If a contains a matrix stored in GE or SY format, then this is just the


leading dimension of A.
If a contains a matrix stored in band (GB or SB) format, then this should be
one less than the leading dimension used in the calling routine. Thus, if a
in ?larot is of size lda*n, then a[(j - 1)*lda] would be the j -th
element in the first of the two rows to be rotated, and a[(j - 1)*lda
+ 1] would be the j -th in the second, regardless of how the array may be
stored in the calling routine. a cannot be dimensioned, because for band
format the row number may exceed lda, which is not legal FORTRAN.

If lrows = 1, then lda must be at least 1, otherwise it must be at least nl


minus the number of 1 values in xleft and xright.

xleft If lleft = 1, xleft is used and modified instead of a[1] (if lrows = 1)
or a[lda + 1] (if lrows = 0).

xright If lright = 1, xright is used and modified instead of a[(nl - 1)*lda]


(if lrows = 1) or a[nl - 1] (if lrows = 0).

Output Parameters

a On exit, modified array A.

?lartgp
Generates a plane rotation.

Syntax
lapack_int LAPACKE_slartgp (float f, float g, float* cs, float* sn, float* r);
lapack_int LAPACKE_dlartgp (double f, double g, double* cs, double* sn, double* r);

Include Files
• mkl.h

Description
The routine generates a plane rotation so that

   (  cs  sn )   ( f )   ( r )
   ( -sn  cs ) * ( g ) = ( 0 )

where cs^2 + sn^2 = 1.

This is a slower, more accurate version of the BLAS Level 1 routine ?rotg, except for the following
differences:

• f and g are unchanged on return.


• If g=0, then cs=(+/-)1 and sn=0.
• If f=0 and g≠ 0, then cs=0 and sn=(+/-)1.
The sign is chosen so that r≥ 0.

Input Parameters

f, g The first and second component of the vector to be rotated.

Output Parameters

cs The cosine of the rotation.

sn The sine of the rotation.

r The nonzero component of the rotated vector.

Return Values
If info = 0, the execution is successful.

If info =-1,f is NaN.

If info = -2, g is NaN.


See Also
cblas_?rotg
?lartgs
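
A minimal usage sketch for LAPACKE_dlartgp (values chosen arbitrarily for illustration):

#include <stdio.h>
#include <mkl.h>

int main(void) {
    double f = 3.0, g = 4.0, cs, sn, r;
    lapack_int info = LAPACKE_dlartgp(f, g, &cs, &sn, &r);
    if (info == 0)
        /* cs*f + sn*g = r and -sn*f + cs*g = 0, with r >= 0 */
        printf("cs = %f, sn = %f, r = %f\n", cs, sn, r);
    return 0;
}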

?lartgs
Generates a plane rotation designed to introduce a
bulge in implicit QR iteration for the bidiagonal SVD
problem.

Syntax
lapack_int LAPACKE_slartgs (float x, float y, float sigma, float* cs, float* sn);
lapack_int LAPACKE_dlartgs (double x, double y, double sigma, double* cs, double* sn);

Include Files
• mkl.h

Description
The routine generates a plane rotation designed to introduce a bulge in Golub-Reinsch-style implicit QR
iteration for the bidiagonal SVD problem. x and y are the top-row entries, and sigma is the shift. The
computed cs and sn define a plane rotation that satisfies the following:

   (  cs  sn )   ( x^2 - sigma )   ( r )
   ( -sn  cs ) * (    x * y    ) = ( 0 )

with r nonnegative.
If x^2 - sigma and x * y are 0, the rotation is by π/2.

Input Parameters

x, y The (1,1) and (1,2) entries of an upper bidiagonal matrix, respectively.

sigma Shift

Output Parameters

cs The cosine of the rotation.

sn The sine of the rotation.

Return Values
If info = 0, the execution is successful.

If info = - 1, x is NaN.
If info = - 2, y is NaN.
If info = - 3, sigma is NaN.

See Also
?lartgp

?lascl
Multiplies a general rectangular matrix by a real scalar
defined as cto/cfrom.

Syntax
lapack_int LAPACKE_slascl (int matrix_layout, char type, lapack_int kl, lapack_int ku,
float cfrom, float cto, lapack_int m, lapack_int n, float * a, lapack_int lda);
lapack_int LAPACKE_dlascl (int matrix_layout, char type, lapack_int kl, lapack_int ku,
double cfrom, double cto, lapack_int m, lapack_int n, double * a, lapack_int lda);
lapack_int LAPACKE_clascl (int matrix_layout, char type, lapack_int kl, lapack_int ku,
float cfrom, float cto, lapack_int m, lapack_int n, lapack_complex_float * a,
lapack_int lda);
lapack_int LAPACKE_zlascl (int matrix_layout, char type, lapack_int kl, lapack_int ku,
double cfrom, double cto, lapack_int m, lapack_int n, lapack_complex_double * a,
lapack_int lda);

Include Files
• mkl.h

Description

The routine ?lascl multiplies the m-by-n real/complex matrix A by the real scalar cto/cfrom. The operation
is performed without over/underflow as long as the final result cto*A(i,j)/cfrom does not over/underflow.

type specifies that A may be full, upper triangular, lower triangular, upper Hessenberg, or banded.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

type This parameter specifies the storage type of the input matrix.
= 'G': A is a full matrix.

= 'L': A is a lower triangular matrix.

= 'U': A is an upper triangular matrix.

= 'H': A is an upper Hessenberg matrix.

= 'B': A is a symmetric band matrix with lower bandwidth kl and upper
bandwidth ku and with only the lower half stored.
= 'Q': A is a symmetric band matrix with lower bandwidth kl and upper
bandwidth ku and with only the upper half stored.
= 'Z': A is a band matrix with lower bandwidth kl and upper bandwidth ku.
See the description of the ?gbtrf function for storage details.

kl The lower bandwidth of A. Referenced only if type = 'B', 'Q' or 'Z'.

ku The upper bandwidth of A. Referenced only if type = 'B', 'Q' or 'Z'.


cfrom, cto The matrix A is multiplied by cto/cfrom. A(i,j) is computed without over/
underflow if the final result cto*A(i,j)/cfrom can be represented without
over/underflow. cfrom must be nonzero.

m The number of rows of the matrix A. m≥ 0.

n The number of columns of the matrix A. n≥ 0.

a Array, size (lda*n). The matrix to be multiplied by cto/cfrom. See type for
the storage type.

lda The leading dimension of the array a.


lda≥ max(1,m).

Output Parameters

a The multiplied matrix A.

info If info = 0 - successful exit

If info = -i < 0, the i-th argument had an illegal value.

See Also
?gbtrf

?laset
Initializes the off-diagonal elements and the diagonal
elements of a matrix to given values.

Syntax
lapack_int LAPACKE_slaset (int matrix_layout , char uplo , lapack_int m , lapack_int
n , float alpha , float beta , float * a , lapack_int lda );
lapack_int LAPACKE_dlaset (int matrix_layout , char uplo , lapack_int m , lapack_int
n , double alpha , double beta , double * a , lapack_int lda );
lapack_int LAPACKE_claset (int matrix_layout , char uplo , lapack_int m , lapack_int
n , lapack_complex_float alpha , lapack_complex_float beta , lapack_complex_float * a ,
lapack_int lda );
lapack_int LAPACKE_zlaset (int matrix_layout , char uplo , lapack_int m , lapack_int
n , lapack_complex_double alpha , lapack_complex_double beta , lapack_complex_double *
a , lapack_int lda );

Include Files
• mkl.h

Description

The routine initializes an m-by-n matrix A to beta on the diagonal and alpha on the off-diagonals.
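For example, a 4-by-4 identity matrix can be built by setting the off-diagonal constant alpha to 0 and the diagonal constant beta to 1; the sketch below uses column-major layout and a uplo value other than 'U' or 'L' so that the whole matrix is set.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 4, lda = 4;
    double a[4*4];   /* entries on entry are irrelevant when the whole matrix is set */
    lapack_int info = LAPACKE_dlaset(LAPACK_COL_MAJOR, 'A', n, n, 0.0, 1.0, a, lda);
    if (info == 0)
        printf("a[0] = %.1f, a[1] = %.1f\n", a[0], a[1]);   /* 1.0 (diagonal) and 0.0 */
    return 0;
}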

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)
or column major ( LAPACK_COL_MAJOR ).

uplo Specifies the part of the matrix A to be set.


If uplo = 'U', upper triangular part is set; the strictly lower triangular
part of A is not changed.
If uplo = 'L': lower triangular part is set; the strictly upper triangular
part of A is not changed.
Otherwise: All of the matrix A is set.

m The number of rows of the matrix A. m≥ 0.

n The number of columns of the matrix A.


n≥ 0.

alpha, beta The constants to which the off-diagonal and diagonal elements are to be
set, respectively.

a Array, size at least max(1, lda*n) for column major and max(1, lda*m)
for row major layout.
The array a contains the m-by-n matrix A.

lda The leading dimension of the array a.


lda≥ max(1,m) for column major layout and lda ≥ max(1,n) for row major
layout.

Output Parameters

a On exit, the leading m-by-n submatrix of A is set as follows:


if uplo = 'U', Aij = alpha, 1≤i≤j-1, 1≤j≤n,

if uplo = 'L', Aij = alpha, j+1≤i≤m, 1≤j≤n,

otherwise, Aij = alpha, 1≤i≤m, 1≤j≤n, i≠j,

and, for all uplo, Aii = beta, 1≤i≤min(m, n).

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.

?lasrt
Sorts numbers in increasing or decreasing order.

Syntax
lapack_int LAPACKE_slasrt (char id , lapack_int n , float * d );
lapack_int LAPACKE_dlasrt (char id , lapack_int n , double * d );


Include Files
• mkl.h

Description

The routine ?lasrt sorts the numbers in d in increasing order (if id = 'I') or in decreasing order (if id =
'D'). It uses Quick Sort, reverting to Insertion Sort on arrays of size ≤ 20. The dimension of the internal
stack limits n to about 2^32.
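A minimal sketch of sorting a small array in increasing order with LAPACKE_dlasrt; the data are arbitrary.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    double d[5] = { 3.0, 1.0, -2.0, 5.0, 0.0 };
    lapack_int n = 5;
    lapack_int info = LAPACKE_dlasrt('I', n, d);   /* 'I': increasing order */
    if (info == 0)
        for (int i = 0; i < n; ++i)
            printf("%g ", d[i]);                   /* -2 0 1 3 5 */
    printf("\n");
    return 0;
}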

Input Parameters

id = 'I': sort d in increasing order;

= 'D': sort d in decreasing order.

n The length of the array d.

d On entry, the array to be sorted.

Output Parameters

d On exit, d has been sorted into increasing order


(d[0]≤d[1]≤ ... ≤d[n-1]) or into decreasing order
(d[0] ≥ d[1] ≥ ... ≥d[n-1]), depending on id.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

?laswp
Performs a series of row interchanges on a general
rectangular matrix.

Syntax
lapack_int LAPACKE_slaswp (int matrix_layout , lapack_int n , float * a , lapack_int
lda , lapack_int k1 , lapack_int k2 , const lapack_int * ipiv , lapack_int incx );
lapack_int LAPACKE_dlaswp (int matrix_layout , lapack_int n , double * a , lapack_int
lda , lapack_int k1 , lapack_int k2 , const lapack_int * ipiv , lapack_int incx );
lapack_int LAPACKE_claswp (int matrix_layout , lapack_int n , lapack_complex_float *
a , lapack_int lda , lapack_int k1 , lapack_int k2 , const lapack_int * ipiv ,
lapack_int incx );
lapack_int LAPACKE_zlaswp (int matrix_layout , lapack_int n , lapack_complex_double *
a , lapack_int lda , lapack_int k1 , lapack_int k2 , const lapack_int * ipiv ,
lapack_int incx );

Include Files
• mkl.h

Description

The routine performs a series of row interchanges on the matrix A. One row interchange is initiated for each
of rows k1 through k2 of A.
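As an illustration, the sketch below applies the interchanges encoded in an arbitrary ipiv vector to a 4-by-3 column-major matrix; the pivot values are 1-based row indices.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    double a[4*3] = { 11, 21, 31, 41,     /* column 1 */
                      12, 22, 32, 42,     /* column 2 */
                      13, 23, 33, 43 };   /* column 3 */
    lapack_int n = 3, lda = 4, k1 = 1, k2 = 4, incx = 1;
    lapack_int ipiv[4] = { 2, 3, 4, 4 };  /* row1<->2, row2<->3, row3<->4, row4<->4 */
    lapack_int info = LAPACKE_dlaswp(LAPACK_COL_MAJOR, n, a, lda, k1, k2, ipiv, incx);
    if (info == 0)
        printf("first column: %g %g %g %g\n", a[0], a[1], a[2], a[3]);  /* 21 31 41 11 */
    return 0;
}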

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major ( LAPACK_COL_MAJOR ).

n The number of columns of the matrix A.

a Array, size max(1, lda*n) for column major and max(1, lda*mm) for row
major layout. Here mm is not less than maximum of values
ipiv[k1-1+j*|incx|], 0≤j<k2-k1.
Array a contains the m-by-n matrix A.

lda The leading dimension of the array a.

k1 The first element of ipiv for which a row interchange will be done.

k2 The last element of ipiv for which a row interchange will be done.

ipiv Array, size (k1 + (k2 - k1)*|incx|).

The vector of pivot indices. Only the elements in positions k1 through k2 of ipiv are accessed.

ipiv(k) = l implies rows k and l are to be interchanged.

incx The increment between successive values of ipiv. If incx is negative, the
pivots are applied in reverse order.

Output Parameters

a On exit, the permuted matrix.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.

?latm1
Computes the entries of a matrix as specified.

Syntax
void slatm1 (lapack_int *mode, float *cond, lapack_int *irsign, lapack_int *idist, lapack_int
*iseed, float *d, lapack_int *n, lapack_int *info);
void dlatm1 (lapack_int *mode, double *cond, lapack_int *irsign, lapack_int *idist, lapack_int
*iseed, double *d, lapack_int *n, lapack_int *info);


void clatm1 (lapack_int *mode, float *cond, lapack_int *irsign, lapack_int *idist, lapack_int
*iseed, lapack_complex_float *d, lapack_int *n, lapack_int *info);
void zlatm1 (lapack_int *mode, double *cond, lapack_int *irsign, lapack_int *idist, lapack_int
*iseed, lapack_complex_double *d, lapack_int *n, lapack_int *info);

Include Files
• mkl.h

Description

The ?latm1 routine computes the entries of D(1..n) as specified by mode, cond, and irsign. idist and
iseed determine the generation of random numbers.
?latm1 is called by ?latmr (slatmr for slatm1, dlatmr for dlatm1, clatmr for clatm1, and zlatmr for
zlatm1) to generate random test matrices for LAPACK programs.
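A minimal sketch of calling the double-precision flavor, assuming the dlatm1 prototype shown in the Syntax section above; mode = 3 fills d with a geometric sequence from 1 down to 1/cond. All numeric values are arbitrary illustrative choices.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int mode = 3, irsign = 0, idist = 1, n = 5, info = 0;
    lapack_int iseed[4] = { 0, 1, 2, 3 };   /* seed for the generator; iseed[3] odd */
    double cond = 100.0, d[5];
    dlatm1(&mode, &cond, &irsign, &idist, iseed, d, &n, &info);
    if (info == 0)
        for (int i = 0; i < n; ++i)
            printf("%g ", d[i]);             /* entries range from 1 to 1/cond */
    printf("\n");
    return 0;
}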

Input Parameters

mode On entry describes how d is to be computed:

mode = 0 means do not change d.


mode = 1 sets d[0] = 1 and d[1:n - 1] = 1.0/cond
mode = 2 sets d[0:n - 2] = 1 and d[n - 1]=1.0/cond
mode = 3 sets d[i - 1]=cond**(-(i-1)/(n-1))
mode = 4 sets d[i - 1]= 1 - (i-1)/(n-1)*(1 - 1/cond)
mode = 5 sets d to random numbers in the range ( 1/cond , 1 ) such
that their logarithms are uniformly distributed.
mode = 6 sets d to random numbers from same distribution as the rest of
the matrix.
mode < 0 has the same meaning as abs(mode), except that the order of
the elements of d is reversed.

Thus if mode is positive, d has entries ranging from 1 to 1/cond, if


negative, from 1/cond to 1.

cond On entry, used as described under mode above. If used, it must be ≥ 1.

irsign
On entry, if mode is not -6, 0, or 6, determines the sign of the entries of d.

If irsign = 0, the entries of d are unchanged.

If irsign = 1, each entry of d is multiplied by 1 or -1 with probability 0.5 (for slatm1 and dlatm1),
or by a random complex number uniformly distributed with absolute value 1 (for clatm1 and zlatm1).

idist Specifies the distribution of the random numbers.


For slatm1 and dlatm1:

= 1: uniform (0,1)
= 2: uniform (-1,1)
= 3: normal (0,1)
For clatm1 and zlatm1:

= 1: real and imaginary parts each uniform (0,1)
= 2: real and imaginary parts each uniform (-1,1)
= 3: real and imaginary parts each normal (0,1)
= 4: complex number uniform in disk(0, 1)

iseed Array, size (4).


Specifies the seed of the random number generator. The random number
generator uses a linear congruential sequence limited to small integers, and
so should produce machine independent random numbers. The values of
iseed are changed on exit, and can be used in the next call to ?latm1
to continue the same random number sequence.

d Array, size n.

n Number of entries of d.

Output Parameters

iseed On exit, the seed is updated.

d On exit, d is updated, unless mode = 0.

info If info = 0, the execution is successful.

If info = -1, mode is not in range -6 to 6.

If info = -2, mode is neither -6, 0 nor 6, and irsign is neither 0 nor 1.

If info = -3, mode is neither -6, 0 nor 6 and cond is less than 1.

If info = -4, mode equals 6 or -6 and idist is not in range 1 to 4.

If info = -7, n is negative.

?latm2
Returns an entry of a random matrix.

Syntax
float slatm2 (lapack_int *m, lapack_int *n, lapack_int *i, lapack_int *j, lapack_int
*kl, lapack_int *ku, lapack_int *idist, lapack_int *iseed, float *d, lapack_int
*igrade, float *dl, float *dr, lapack_int *ipvtng, lapack_int *iwork, float *sparse);
double dlatm2 (lapack_int *m, lapack_int *n, lapack_int *i, lapack_int *j, lapack_int
*kl, lapack_int *ku, lapack_int *idist, lapack_int *iseed, double *d, lapack_int
*igrade, double *dl, double *dr, lapack_int *ipvtng, lapack_int *iwork, double
*sparse);
The data types for complex variations depend on whether or not the application links with Gnu Fortran
(gfortran) libraries.
For non-gfortran (libmkl_intel_*) interface libraries:

void clatm2 (lapack_complex_float *res, lapack_int *m, lapack_int *n, lapack_int *i,
lapack_int *j, lapack_int *kl, lapack_int *ku, lapack_int *idist, lapack_int *iseed,
lapack_complex_float *d, lapack_int *igrade, lapack_complex_float *dl,
lapack_complex_float *dr, lapack_int *ipvtng, lapack_int *iwork, float *sparse);


void zlatm2 (lapack_complex_double *res, lapack_int *m, lapack_int *n, lapack_int *i,
lapack_int *j, lapack_int *kl, lapack_int *ku, lapack_int *idist, lapack_int *iseed,
lapack_complex_double *d, lapack_int *igrade, lapack_complex_double *dl,
lapack_complex_double *dr, lapack_int *ipvtng, lapack_int *iwork, double *sparse);
For gfortran (libmkl_gf_*) interface libraries:

lapack_complex_float clatm2 (lapack_int *m, lapack_int *n, lapack_int *i, lapack_int


*j, lapack_int *kl, lapack_int *ku, lapack_int *idist, lapack_int *iseed,
lapack_complex_float *d, lapack_int *igrade, lapack_complex_float *dl,
lapack_complex_float *dr, lapack_int *ipvtng, lapack_int *iwork, float *sparse);
lapack_complex_double zlatm2 (lapack_int *m, lapack_int *n, lapack_int *i, lapack_int
*j, lapack_int *kl, lapack_int *ku, lapack_int *idist, lapack_int *iseed,
lapack_complex_double *d, lapack_int *igrade, lapack_complex_double *dl,
lapack_complex_double *dr, lapack_int *ipvtng, lapack_int *iwork, double *sparse);
To understand the difference between the non-gfortran and gfortran interfaces and when to use each of
them, see Dynamic Libraries in the lib/intel64 Directory in the Intel MKL Developer Guide.

Include Files
• mkl.h

Description

The ?latm2 routine returns entry (i , j ) of a random matrix of dimension (m, n). It is called by the ?latmr
routine in order to build random test matrices. No error checking on parameters is done, because this routine
is called in a tight loop by ?latmr which has already checked the parameters.

Use of ?latm2 differs from ?latm3 in the order in which the random number generator is called to fill in
random matrix entries. With ?latm2, the generator is called to fill in the pivoted matrix columnwise. With
?latm3, the generator is called to fill in the matrix columnwise, after which it is pivoted. Thus, ?latm3 can be
used to construct random matrices which differ only in their order of rows and/or columns. ?latm2 is used to
construct band matrices while avoiding calling the random number generator for entries outside the band
(and therefore generating random numbers).
The matrix whose (i , j ) entry is returned is constructed as follows (this routine only computes one entry):

• If i is outside (1..m) or j is outside (1..n), returns zero (this is convenient for generating matrices in
band format).
• Generate a matrix A with random entries of distribution idist.
• Set the diagonal to D.
• Grade the matrix, if desired, from the left (by dl) and/or from the right (by dr or dl) as specified by
igrade.
• Permute, if desired, the rows and/or columns as specified by ipvtng and iwork.
• Band the matrix to have lower bandwidth kl and upper bandwidth ku.
• Set random entries to zero as specified by sparse.

Input Parameters

m Number of rows of the matrix.

n Number of columns of the matrix.

i Row of the entry to be returned.

j Column of the entry to be returned.

kl Lower bandwidth.

ku Upper bandwidth.

idist On entry, idist specifies the type of distribution to be used to generate a


random matrix .
for slatm2 and dlatm2:

= 1: uniform (0,1)
= 2: uniform (-1,1)
= 3: normal (0,1)
for clatm2 and zlatm2:

= 1: real and imaginary parts each uniform (0,1)


= 2: real and imaginary parts each uniform (-1,1)
= 3: real and imaginary parts each normal (0,1)
= 4: complex number uniform in disk (0, 1)

iseed Array, size 4.


Seed for the random number generator.

d Array, size (min(i, j)). Diagonal entries of matrix.

igrade Specifies grading of matrix as follows:


= 0: no grading
= 1: matrix premultiplied by diag( dl )

= 2: matrix postmultiplied by diag( dr )

= 3: matrix premultiplied by diag( dl ) and postmultiplied by diag( dr)

= 4: matrix premultiplied by diag( dl ) and postmultiplied by


inv( diag( dl ) )

For slatm2 and dlatm2:

= 5: matrix premultiplied by diag( dl ) and postmultiplied by diag( dl)

For clatm2 and zlatm2:

= 5: matrix premultiplied by diag( dl ) and postmultiplied by


diag( conjg( dl ) )

= 6: matrix premultiplied by diag( dl ) and postmultiplied by diag( dl)

dl Array, size (i or j), as appropriate.

Left scale factors for grading matrix.

dr Array, size (i or j), as appropriate.

Right scale factors for grading matrix.

ipvtng On entry specifies pivoting permutations as follows:


= 0: none
= 1: row pivoting


= 2: column pivoting
= 3: full pivoting, i.e., on both sides

iwork Array, size (i or j), as appropriate. This array specifies the permutation
used. The row (or column) in position k was originally in position iwork[k
- 1]. This differs from iwork for ?latm3.

sparse Specifies the sparsity of the matrix. If sparse matrix is to be generated,


sparse should lie between 0 and 1. A uniform ( 0, 1 ) random number x is
generated and compared to sparse. If x is larger the matrix entry is
unchanged and if x is smaller the entry is set to zero. Thus on the average
a fraction sparse of the entries will be set to zero.

Output Parameters

iseed On exit, the seed is updated.

Return Values
The function returns an entry of a random matrix (for complex variations, the non-gfortran libmkl_intel_*
interface layer/libraries return the result as the parameter res).

?latm3
Returns set entry of a random matrix.

Syntax
float slatm3 (lapack_int *m, lapack_int *n, lapack_int *i, lapack_int *j, lapack_int
*isub, lapack_int *jsub, lapack_int *kl, lapack_int *ku, lapack_int *idist, lapack_int
*iseed, float *d, lapack_int *igrade, float *dl, float *dr, lapack_int *ipvtng,
lapack_int *iwork, float *sparse);
double dlatm3 (lapack_int *m, lapack_int *n, lapack_int *i, lapack_int *j, lapack_int
*isub, lapack_int *jsub, lapack_int *kl, lapack_int *ku, lapack_int *idist, lapack_int
*iseed, double *d, lapack_int *igrade, double *dl, double *dr, lapack_int *ipvtng,
lapack_int *iwork, double *sparse);
The data types for complex variations depend on whether or not the application links with Gnu Fortran
(gfortran) libraries.
For non-gfortran (libmkl_intel_*) interface libraries:

void clatm3 (lapack_complex_float *res, lapack_int *m, lapack_int *n, lapack_int *i,
lapack_int *j, lapack_int *isub, lapack_int *jsub, lapack_int *kl, lapack_int *ku,
lapack_int *idist, lapack_int *iseed, lapack_complex_float *d, lapack_int *igrade,
lapack_complex_float *dl, lapack_complex_float *dr, lapack_int *ipvtng, lapack_int
*iwork, float *sparse);
void zlatm3 (lapack_complex_double *res, lapack_int *m, lapack_int *n, lapack_int *i,
lapack_int *j, lapack_int *isub, lapack_int *jsub, lapack_int *kl, lapack_int *ku,
lapack_int *idist, lapack_int *iseed, lapack_complex_double *d, lapack_int *igrade,
lapack_complex_double *dl, lapack_complex_double *dr, lapack_int *ipvtng, lapack_int
*iwork, double *sparse);
For gfortran (libmkl_gf_*) interface libraries:

lapack_complex_float clatm3 (lapack_int *m, lapack_int *n, lapack_int *i, lapack_int
*j, lapack_int *isub, lapack_int *jsub, lapack_int *kl, lapack_int *ku, lapack_int
*idist, lapack_int *iseed, lapack_complex_float *d, lapack_int *igrade,
lapack_complex_float *dl, lapack_complex_float *dr, lapack_int *ipvtng, lapack_int
*iwork, float *sparse);
lapack_complex_double zlatm3 (lapack_int *m, lapack_int *n, lapack_int *i, lapack_int
*j, lapack_int *isub, lapack_int *jsub, lapack_int *kl, lapack_int *ku, lapack_int
*idist, lapack_int *iseed, lapack_complex_double *d, lapack_int *igrade,
lapack_complex_double *dl, lapack_complex_double *dr, lapack_int *ipvtng, lapack_int
*iwork, double *sparse);
To understand the difference between the non-gfortran and gfortran interfaces and when to use each of
them, see Dynamic Libraries in the lib/intel64 Directory in the Intel MKL Developer Guide.

Include Files
• mkl.h

Description

The ?latm3 routine returns the (isub, jsub) entry of a random matrix of dimension (m, n) described by the
other parameters. (isub, jsub) is the final position of the (i ,j ) entry after pivoting according to ipvtng and
iwork. ?latm3 is called by the ?latmr routine in order to build random test matrices. No error checking on
parameters is done, because this routine is called in a tight loop by ?latmr which has already checked the
parameters.
Use of ?latm3 differs from ?latm2 in the order in which the random number generator is called to fill in
random matrix entries. With ?latm2, the generator is called to fill in the pivoted matrix columnwise. With ?
latm3, the generator is called to fill in the matrix columnwise, after which it is pivoted. Thus, ?latm3 can be
used to construct random matrices which differ only in their order of rows and/or columns. ?latm2 is used to
construct band matrices while avoiding calling the random number generator for entries outside the band
(and therefore generating random numbers in different orders for different pivot orders).
The matrix whose (isub, jsub ) entry is returned is constructed as follows (this routine only computes one
entry):

• If isub is outside (1..m) or jsub is outside (1..n), returns zero (this is convenient for generating
matrices in band format).
• Generate a matrix A with random entries of distribution idist.
• Set the diagonal to D.
• Grade the matrix, if desired, from the left (by dl) and/or from the right (by dr or dl) as specified by
igrade.
• Permute, if desired, the rows and/or columns as specified by ipvtng and iwork.
• Band the matrix to have lower bandwidth kl and upper bandwidth ku.
• Set random entries to zero as specified by sparse.

Input Parameters

m Number of rows of matrix.

n Number of columns of matrix.

i Row of unpivoted entry to be returned.

j Column of unpivoted entry to be returned.


isub Row of pivoted entry to be returned.

jsub Column of pivoted entry to be returned.

kl Lower bandwidth.

ku Upper bandwidth.

idist On entry, idist specifies the type of distribution to be used to generate a


random matrix.
for slatm3 and dlatm3:

= 1: uniform (0,1)
= 2: uniform (-1,1)
= 3: normal (0,1)
for clatm3 and zlatm3:

= 1: real and imaginary parts each uniform (0,1)


= 2: real and imaginary parts each uniform (-1,1)
= 3: real and imaginary parts each normal (0,1)
= 4: complex number uniform in disk(0, 1)

iseed Array, size 4.


Seed for random number generator.

d Array, size (min(i, j)). Diagonal entries of matrix.

igrade Specifies grading of matrix as follows:


= 0: no grading
= 1: matrix premultiplied by diag( dl )

= 2: matrix postmultiplied by diag( dr )

= 3: matrix premultiplied by diag( dl ) and postmultiplied by diag( dr)

= 4: matrix premultiplied by diag( dl ) and postmultiplied by


inv( diag( dl ) )

For slatm3 and dlatm3:

= 5: matrix premultiplied by diag( dl ) and postmultiplied by diag( dl)

For clatm3 and zlatm3:

= 5: matrix premultiplied by diag( dl ) and postmultiplied by


diag( conjg( dl ) )

= 6: matrix premultiplied by diag( dl ) and postmultiplied by diag( dl)

dl Array, size (i or j, as appropriate).

Left scale factors for grading matrix.

dr Array, size (i or j, as appropriate).

Right scale factors for grading matrix.

ipvtng On entry specifies pivoting permutations as follows:
If ipvtng = 0: none.

If ipvtng = 1: row pivoting.

If ipvtng = 2: column pivoting.

If ipvtng = 3: full pivoting, i.e., on both sides.

sparse On entry, specifies the sparsity of the matrix if sparse matrix is to be


generated. sparse should lie between 0 and 1. A uniform( 0, 1 ) random
number x is generated and compared to sparse; if x is larger the matrix
entry is unchanged and if x is smaller the entry is set to zero. Thus on the
average a fraction sparse of the entries will be set to zero.

iwork Array, size (i or j, as appropriate). This array specifies the permutation


used. The row (or column) originally in position k is in position iwork[k -
1] after pivoting. This differs from iwork for ?latm2.

Output Parameters

isub On exit, row of pivoted entry is updated.

jsub On exit, column of pivoted entry is updated.

iseed On exit, the seed is updated.

Return Values
The function returns an entry of a random matrix (for complex variations, the non-gfortran libmkl_intel_*
interface layer/libraries return the result as the parameter res).

?latm5
Generates matrices involved in the Generalized
Sylvester equation.

Syntax
void slatm5 (lapack_int *prtype, lapack_int *m, lapack_int *n, float *a, lapack_int *lda, float
*b, lapack_int *ldb, float *c, lapack_int *ldc, float *d, lapack_int *ldd, float *e,
lapack_int *lde, float *f, lapack_int *ldf, float *r, lapack_int *ldr, float *l,
lapack_int *ldl, float *alpha, lapack_int *qblcka, lapack_int *qblckb);
void dlatm5 (lapack_int *prtype, lapack_int *m, lapack_int *n, double *a, lapack_int *lda, double
*b, lapack_int *ldb, double *c, lapack_int *ldc, double *d, lapack_int *ldd, double
*e, lapack_int *lde, double *f, lapack_int *ldf, double *r, lapack_int *ldr, double
*l, lapack_int *ldl, double *alpha, lapack_int *qblcka, lapack_int *qblckb);
void clatm5 (lapack_int *prtype, lapack_int *m, lapack_int *n, lapack_complex_float *a, lapack_int
*lda, lapack_complex_float *b, lapack_int *ldb, lapack_complex_float *c, lapack_int
*ldc, lapack_complex_float *d, lapack_int *ldd, lapack_complex_float *e, lapack_int
*lde, lapack_complex_float *f, lapack_int *ldf, lapack_complex_float *r, lapack_int
*ldr, lapack_complex_float *l, lapack_int *ldl, float *alpha, lapack_int *qblcka,
lapack_int *qblckb);


void zlatm5 (lapack_int *prtype, lapack_int *m, lapack_int *n, lapack_complex_double *a,
lapack_int *lda, lapack_complex_double *b, lapack_int *ldb, lapack_complex_double *c,
lapack_int *ldc, lapack_complex_double *d, lapack_int *ldd, lapack_complex_double *e,
lapack_int *lde, lapack_complex_double *f, lapack_int *ldf, lapack_complex_double *r,
lapack_int *ldr, lapack_complex_double *l, lapack_int *ldl, double *alpha, lapack_int
*qblcka, lapack_int *qblckb);

Include Files
• mkl.h

Description

The ?latm5 routine generates matrices involved in the Generalized Sylvester equation:

A * R - L * B = C
D * R - L * E = F
They also satisfy the diagonalization condition:

Input Parameters

prtype Specifies the type of matrices to generate.

• If prtype = 1, A and B are Jordan blocks, D and E are identity


matrices.
A:
If (i == j) then Ai, j = 1.0.

If (j == i + 1) then Ai, j = -1.0.

Otherwise Ai, j = 0.0, i, j = 1...m

B:
If (i == j) then Bi, j = 1.0 - alpha.

If (j == i + 1) then Bi, j = 1.0 .

Otherwise Bi, j = 0.0, i, j = 1...n.

D:
If (i == j) then Di, j = 1.0.

Otherwise Di, j = 0.0, i, j = 1...m.

E:
If (i == j) then Ei, j = 1.0

Otherwise Ei, j = 0.0, i, j = 1...n.

L = R are chosen from [-10...10], which specifies the right hand sides
(C, F).
• If prtype = 2 or 3: Triangular and/or quasi-triangular.

A:
If (i≤j) then Ai, j = [-1...1].

Otherwise Ai, j = 0.0, i, j = 1...M.

If prtype = 3, then Ak+1, k+1 = Ak, k, Ak+1, k = [-1...1], and
sign(Ak, k+1) = -sign(Ak+1, k), for k = 1 to m - 1 in steps of qblcka.
B:
If (i≤j) then Bi, j = [-1...1].

Otherwise Bi, j = 0.0, i, j = 1...n.

If prtype = 3, then Bk+1, k+1 = Bk, k, Bk+1, k = [-1...1], and
sign(Bk, k+1) = -sign(Bk+1, k), for k = 1 to n - 1 in steps of qblckb.

D:
If (i≤j) then Di, j = [-1...1].

Otherwise Di, j = 0.0, i, j = 1...m.

E:
If (i <= j) then Ei, j = [-1...1].

Otherwise Ei, j = 0.0, i, j = 1...N.

L, R are chosen from [-10...10], which specifies the right hand sides (C,
F).
• If prtype = 4: Full.
Ai, j = [-10...10]

Di, j = [-1...1], i, j = 1...m

Bi, j = [-10...10]

Ei, j = [-1...1], i, j = 1...n

Ri, j = [-10...10]

Li, j = [-1...1], i = 1...m, j = 1...n

L and R specify the right hand sides (C, F).


• If prtype = 5: special case with common and/or close eigenvalues.

m Specifies the order of A and D and the number of rows in C, F, R and L.

n Specifies the order of B and E and the number of columns in C, F, R and L.

lda The leading dimension of a.

ldb The leading dimension of b.

ldc The leading dimension of c.


ldd The leading dimension of d.

lde The leading dimension of e.

ldf The leading dimension of f.

ldr The leading dimension of r.

ldl The leading dimension of l.

alpha Parameter used in generating prtype = 1 and 5 matrices.

qblcka When prtype = 3, specifies the distance between 2-by-2 blocks on the
diagonal in A. Otherwise, qblcka is not referenced. qblcka > 1.

qblckb When prtype = 3, specifies the distance between 2-by-2 blocks on the
diagonal in B. Otherwise, qblckb is not referenced. qblckb > 1.

Output Parameters

a Array, size lda*m. On exit a contains the m-by-m array A initialized
according to prtype.

b Array, size ldb*n. On exit b contains the n-by-n array B initialized


according to prtype.

c Array, size ldc*n. On exit c contains the m-by-n array C initialized


according to prtype.

d Array, size ldd*m. On exit d contains the m-by-m array D initialized


according to prtype.

e Array, size lde*n. On exit e contains the n-by-n array E initialized according
to prtype.

f Array, size ldf*n. On exit f contains the m-by-n array F initialized


according to prtype.

r Array, size ldr*n. On exit r contains the m-by-n array R initialized
according to prtype.

l Array, size ldl*n. On exit l contains the m-by-n array L initialized according
to prtype.

?latm6
Generates test matrices for the generalized eigenvalue
problem, their corresponding right and left
eigenvector matrices, and also reciprocal condition
numbers for all eigenvalues and the reciprocal
condition numbers of eigenvectors corresponding to
the 1st and 5th eigenvalues.

Syntax
void slatm6 (lapack_int *type, lapack_int *n, float *a, lapack_int *lda, float *b,
float *x, lapack_int *ldx, float *y, lapack_int *ldy, float *alpha, float *beta, float
*wx, float *wy, float *s, float *dif);
void dlatm6 (lapack_int *type, lapack_int *n, double *a, lapack_int *lda, double *b,
double *x, lapack_int *ldx, double *y, lapack_int *ldy, double *alpha, double *beta,
double *wx, double *wy, double *s, double *dif);
void clatm6 (lapack_int *type, lapack_int *n, lapack_complex_float *a, lapack_int *lda,
lapack_complex_float *b, lapack_complex_float *x, lapack_int *ldx, lapack_complex_float
*y, lapack_int *ldy, lapack_complex_float *alpha, lapack_complex_float *beta,
lapack_complex_float *wx, lapack_complex_float *wy, float *s, float *dif);
void zlatm6 (lapack_int *type, lapack_int *n, lapack_complex_double *a, lapack_int
*lda, lapack_complex_double *b, lapack_complex_double *x, lapack_int *ldx,
lapack_complex_double *y, lapack_int *ldy, lapack_complex_double *alpha,
lapack_complex_double *beta, lapack_complex_double *wx, lapack_complex_double *wy,
double *s, double *dif);

Include Files
• mkl.h

Description

The ?latm6 routine generates test matrices for the generalized eigenvalue problem, their corresponding right
and left eigenvector matrices, and also reciprocal condition numbers for all eigenvalues and the reciprocal
condition numbers of eigenvectors corresponding to the 1st and 5th eigenvalues.
There are two kinds of test matrix pairs:
(A, B) = inverse(Y^H) * (Da, Db) * inverse(X)
Type 1:

Type 2:

In both cases the same inverse(Y^H) and inverse(X) are used to compute (A, B), giving the exact eigenvectors
to (A, B) as (Y^H, X):


where a, b, x, and y can take any values independently of each other.

Input Parameters

type Specifies the problem type.

n Size of the matrices A and B.

lda The leading dimension of a and of b.

ldx The leading dimension of x.

ldy The leading dimension of y.

alpha, beta Weighting constants for matrix A.

wx Constant for right eigenvector matrix.

wy Constant for left eigenvector matrix.

Output Parameters

a Array, size lda*n. On exit, a contains the n-by-n matrix initialized


according to type.

b Array, size lda*n. On exit, b contains the n-by-n matrix initialized


according to type.

x Array, size ldx*n. On exit, x contains the n-by-n matrix of right


eigenvectors.

y Array, size ldy*n. On exit, y is the n-by-n matrix of left eigenvectors.

s Array, size (n). s[i - 1] is the reciprocal condition number for eigenvalue
i.

dif Array, size(n). dif[i - 1] is the reciprocal condition number for


eigenvector i .

?latme
Generates random non-symmetric square matrices
with specified eigenvalues.

Syntax
void slatme (lapack_int *n, char *dist, lapack_int *iseed, float *d, lapack_int *mode,
float *cond, float *dmax, char *ei, char *rsign, char *upper, char *sim, float *ds,
lapack_int *modes, float *conds, lapack_int *kl, lapack_int *ku, float *anorm, float
*a, lapack_int *lda, float *work, lapack_int *info);
void dlatme (lapack_int *n, char *dist, lapack_int *iseed, double *d, lapack_int *mode,
double *cond, double *dmax, char *ei, char *rsign, char *upper, char *sim, double *ds,
lapack_int *modes, double *conds, lapack_int *kl, lapack_int *ku, double *anorm,
double *a, lapack_int *lda, double *work, lapack_int *info);
void clatme (lapack_int *n, char *dist, lapack_int *iseed, lapack_complex_float *d,
lapack_int *mode, float *cond, lapack_complex_float *dmax, char *ei, char *rsign, char
*upper, char *sim, float *ds, lapack_int *modes, float *conds, lapack_int *kl,
lapack_int *ku, float *anorm, lapack_complex_float *a, lapack_int *lda,
lapack_complex_float *work, lapack_int *info);
void zlatme (lapack_int *n, char *dist, lapack_int *iseed, lapack_complex_double *d,
lapack_int *mode, double *cond, lapack_complex_double *dmax, char *ei, char *rsign,
char *upper, char *sim, double *ds, lapack_int *modes, double *conds, lapack_int *kl,
lapack_int *ku, double *anorm, lapack_complex_double *a, lapack_int *lda,
lapack_complex_double *work, lapack_int *info);

Include Files
• mkl.h

Description

The ?latme routine generates random non-symmetric square matrices with specified eigenvalues. ?latme
operates by applying the following sequence of operations:

1. Set the diagonal to d, where d may be input or computed according to mode, cond, dmax, and rsign as
described below.
2. If upper = 'T', the upper triangle of a is set to random values out of distribution dist.
3. If sim='T', a is multiplied on the left by a random matrix X, whose singular values are specified by ds,
modes, and conds, and on the right by X inverse.
4. If kl < n-1, the lower bandwidth is reduced to kl using Householder transformations. If ku < n-1,
the upper bandwidth is reduced to ku.
5. If anorm is not negative, the matrix is scaled to have maximum-element-norm anorm.

NOTE
Since the matrix cannot be reduced beyond Hessenberg form, no packing options are available.

Input Parameters

n The number of columns (or rows) of A.

dist On entry, dist specifies the type of distribution to be used to generate the
random eigen-/singular values, and on the upper triangle (see upper).

If dist = 'U': uniform( 0, 1 )

If dist = 'S': uniform( -1, 1 )

If dist = 'N': normal( 0, 1 )

If dist = 'D': uniform on the complex disc |z| < 1.


iseed Array, size 4.


On entry iseed specifies the seed of the random number generator. The
elements should lie between 0 and 4095 inclusive, and iseed[3] should be
odd. The random number generator uses a linear congruential sequence
limited to small integers, and so should produce machine independent
random numbers.

d Array, size (n). This array is used to specify the eigenvalues of A.


If mode = 0, then d is assumed to contain the eigenvalues. Otherwise they
are computed according to mode, cond, dmax, and rsign and placed in d.

mode On entry mode describes how the eigenvalues are to be specified:

mode = 0 means use d (with ei for slatme and dlatme) as input.


mode = 1 sets d[0] = 1 and d[1:n - 1] = 1.0/cond.
mode = 2 sets d[0:n - 2] = 1 and d[n - 1]=1.0/cond.
mode = 3 sets d[i - 1] = cond**(-(i-1)/(n-1)).
mode = 4 sets d[i - 1] = 1 - (i-1)/(n-1)*(1 - 1/cond).
mode = 5 sets d to random numbers in the range ( 1/cond , 1 ) such
that their logarithms are uniformly distributed.
mode = 6 sets d to random numbers from same distribution as the rest of
the matrix.
mode < 0 has the same meaning as abs(mode), except that the order of
the elements of d is reversed.

Thus if mode is between 1 and 4, d has entries ranging from 1 to 1/cond, if


between -1 and -4, d has entries ranging from 1/cond to 1.

cond On entry, this is used as described under mode above. If used, it must be ≥
1.

dmax If mode is not -6, 0 or 6, the contents of d as computed according to mode


and cond are scaled by dmax / max(abs(d[i - 1])). Note that dmax
needs not be positive or real: if dmax is negative or complex (or zero), d will
be scaled by a negative or complex number (or zero). If rsign='F' then
the largest (absolute) eigenvalue will be equal to dmax.

ei Used by slatme and dlatme only.

Array, size (n).

If mode = 0, and ei[0]is not ' ' (space character), this array specifies
which elements of d (on input) are real eigenvalues and which are the real
and imaginary parts of a complex conjugate pair of eigenvalues. The
elements of ei may then only have the values 'R' and 'I'.

If ei[j - 1] = 'R' and ei[j] = 'I', then the j -th eigenvalue is


cmplx( d[j - 1] , d[j] ), and the (j +1)-th is the complex conjugate
thereof.
If ei[j - 1] = ei[j]='R', then the j-th eigenvalue is d[j - 1] (i.e.,
real). ei[0] may not be 'I', nor may two adjacent elements of ei both
have the value 'I'.

If mode is not 0, then ei is ignored. If mode is 0 and ei[0] = ' ', then
the eigenvalues will all be real.

rsign If mode is not 0, 6, or -6, and rsign = 'T', then the elements of d, as
computed according to mode and cond, are multiplied by a random sign (+1
or -1) for slatme and dlatme or by a complex number from the unit circle
|z| = 1 for clatme and zlatme.

If rsign = 'F', the elements of d are not multiplied. rsign may only have
the values 'T' or 'F'.

upper If upper = 'T', then the elements of a above the diagonal will be set to
random numbers out of dist.

If upper = 'F', they will not. upper may only have the values 'T' or 'F'.

sim If sim = 'T', then a will be operated on by a "similarity transform", i.e.,


multiplied on the left by a matrix X and on the right by X inverse. X = USV,
where U and V are random unitary matrices and S is a (diagonal) matrix of
singular values specified by ds, modes, and conds.

If sim = 'F', then a will not be transformed.

ds This array is used to specify the singular values of X, in the same way that
d specifies the eigenvalues of a. If mode = 0, the ds contains the singular
values, which may not be zero.

modes
Similar to mode, but for specifying the diagonal of S. modes = -6 and +6
are not allowed (since they would result in randomly ill-conditioned
eigenvalues.)

conds Similar to cond, but for specifying the diagonal of S.

kl This specifies the lower bandwidth of the matrix. kl = 1 specifies upper


Hessenberg form. If kl is at least n-1, then A will have full lower
bandwidth.

ku This specifies the upper bandwidth of the matrix. ku = 1 specifies lower


Hessenberg form.
If ku is at least n-1, then a will have full upper bandwidth.

If kl and ku are both at least n-1, then a will be dense. Only one of kl and
ku may be less than n-1.

anorm If anorm is not negative, then a is scaled by a non-negative real number to


make the maximum-element-norm of a to be anorm.

lda Number of rows of matrix A.

work Array, size (3*n). Workspace.

Output Parameters

iseed On exit, the seed is updated.

d Modified if mode is nonzero.


ds Modified if mode is nonzero.

a Array, size lda*n. On exit, a is the desired test matrix.

info If info = 0, execution is successful.

If info = -1, n is negative.

If info = -2, dist is an illegal string.

If info = -5, mode is not in the range -6 to 6.

If info = -6, cond is less than 1.0, and mode is not -6, 0, or 6.

If info = -9, rsign is not 'T' or 'F'.

If info = -10, upper is not 'T' or 'F'.

If info = -11, sim is not 'T' or 'F'.

If info = -12, modes = 0 and ds has a zero singular value.

If info = -13, modes is not in the range -5 to 5.

If info = -14, modes is nonzero and conds is less than 1.

If info = -15, kl is less than 1.

If info = -16, ku is less than 1, or kl and ku are both less than n-1.

If info = -19, lda is less than m.

If info = 1, error return from ?latm1 (computing d).

If info = 2, cannot scale to dmax (maximum eigenvalue is 0).

If info = 3, error return from slatm1 (for slatme and clatme) or dlatm1 (for dlatme and zlatme).

If info = 4, error return from ?large.

If info = 5, zero singular value from slatm1 (for slatme and clatme) or dlatm1 (for dlatme and zlatme).

?latmr
Generates random matrices of various types.

Syntax
void slatmr (lapack_int *m, lapack_int *n, char *dist, lapack_int *iseed, char *sym,
float *d, lapack_int *mode, float *cond, float *dmax, char *rsign, char *grade, float
*dl, lapack_int *model, float *condl, float *dr, lapack_int *moder, float *condr, char
*pivtng, lapack_int *ipivot, lapack_int *kl, lapack_int *ku, float *sparse, float
*anorm, char *pack, float *a, lapack_int *lda, lapack_int *iwork, lapack_int *info);
void dlatmr (lapack_int *m, lapack_int *n, char *dist, lapack_int *iseed, char *sym,
double *d, lapack_int *mode, double *cond, double *dmax, char *rsign, char *grade,
double *dl, lapack_int *model, double *condl, double *dr, lapack_int *moder, double
*condr, char *pivtng, lapack_int *ipivot, lapack_int *kl, lapack_int *ku, double
*sparse, double *anorm, char *pack, double *a, lapack_int *lda, lapack_int *iwork,
lapack_int *info);

void clatmr (lapack_int *m, lapack_int *n, char *dist, lapack_int *iseed, char *sym,
lapack_complex_float *d, lapack_int *mode, float *cond, lapack_complex_float *dmax,
char *rsign, char *grade, lapack_complex_float *dl, lapack_int *model, float *condl,
lapack_complex_float *dr, lapack_int *moder, float *condr, char *pivtng, lapack_int
*ipivot, lapack_int *kl, lapack_int *ku, float *sparse, float *anorm, char *pack,
lapack_complex_float *a, lapack_int *lda, lapack_int *iwork, lapack_int *info);
void zlatmr (lapack_int *m, lapack_int *n, char *dist, lapack_int *iseed, char *sym,
lapack_complex_double *d, lapack_int *mode, double *cond, lapack_complex_double *dmax,
char *rsign, char *grade, lapack_complex_double *dl, lapack_int *model, double *condl,
lapack_complex_double *dr, lapack_int *moder, double *condr, char *pivtng, lapack_int
*ipivot, lapack_int *kl, lapack_int *ku, double *sparse, double *anorm, char *pack,
lapack_complex_double *a, lapack_int *lda, lapack_int *iwork, lapack_int *info);

Description

The ?latmr routine operates by applying the following sequence of operations:

1. Generate a matrix A with random entries of distribution dist:

If sym = 'S', the matrix is symmetric,

If sym = 'H', the matrix is Hermitian,

If sym = 'N', the matrix is nonsymmetric.


2. Set the diagonal to D, where D may be input or computed according to mode, cond, dmax and rsign as
described below.
3. Grade the matrix, if desired, from the left or right as specified by grade. The inputs dl, model, condl,
dr, moder and condr also determine the grading as described below.
4. Permute, if desired, the rows and/or columns as specified by pivtng and ipivot.
5. Set random entries to zero, if desired, to get a random sparse matrix as specified by sparse.
6. Make A a band matrix, if desired, by zeroing out the matrix outside a band of lower bandwidth kl and
upper bandwidth ku.
7. Scale A, if desired, to have maximum entry anorm.
8. Pack the matrix if desired. See options specified by the pack parameter.

NOTE
If two calls to ?latmr differ only in the pack parameter, they generate mathematically equivalent
matrices. If two calls to ?latmr both have full bandwidth (kl = m-1 and ku = n-1), and differ only in
the pivtng and pack parameters, then the matrices generated differ only in the order of the rows and
columns, and otherwise contain the same data. This consistency cannot be and is not maintained with
less than full bandwidth.
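As a minimal sketch (following the dlatmr prototype in the Syntax section, and assuming the routine is declared in mkl.h like the other test-matrix generators), the call below builds a 4-by-4 dense symmetric random matrix with a geometric eigenvalue distribution, no grading, no pivoting, no sparsity, no scaling, and no packing. All numeric values are arbitrary.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int m = 4, n = 4, mode = 3, model = 0, moder = 0;
    lapack_int kl = 3, ku = 3, lda = 4, info = 0;            /* kl = ku = m-1: full bandwidth */
    lapack_int iseed[4] = { 4, 3, 2, 1 };                    /* iseed[3] must be odd */
    lapack_int ipivot[4], iwork[4];                          /* not referenced: pivtng = 'N' */
    char dist = 'S', sym = 'S', rsign = 'F', grade = 'N', pivtng = 'N', pack = 'N';
    double d[4], dl[4], dr[4];                               /* dl, dr not referenced: grade = 'N' */
    double cond = 1.0e3, condl = 1.0, condr = 1.0, dmax = 1.0;
    double sparse = 0.0, anorm = -1.0, a[16];
    dlatmr(&m, &n, &dist, iseed, &sym, d, &mode, &cond, &dmax, &rsign, &grade,
           dl, &model, &condl, dr, &moder, &condr, &pivtng, ipivot, &kl, &ku,
           &sparse, &anorm, &pack, a, &lda, iwork, &info);
    printf("info = %d, a[0] = %g\n", (int)info, a[0]);
    return 0;
}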

Input Parameters

m Number of rows of A.

n Number of columns of A.

dist On entry, dist specifies the type of distribution to be used to generate a


random matrix .
If dist = 'U', real and imaginary parts are independent uniform( 0, 1 ).

If dist = 'S', real and imaginary parts are independent uniform( -1, 1 ).


If dist = 'N', real and imaginary parts are independent normal( 0, 1 ).

If dist = 'D', distribution is uniform on interior of unit disk.

iseed Array, size 4.


On entry, iseed specifies the seed of the random number generator. They
should lie between 0 and 4095 inclusive, and iseed[3] should be odd. The
random number generator uses a linear congruential sequence limited to
small integers, and so should produce machine independent random
numbers.

sym If sym = 'S', generated matrix is symmetric.

If sym = 'H', generated matrix is Hermitian.

If sym = 'N', generated matrix is nonsymmetric.

d On entry this array specifies the diagonal entries of the diagonal of A. d


may either be specified on entry, or set according to mode and cond as
described below. If the matrix is Hermitian, the real part of d is taken. May
be changed on exit if mode is nonzero.

mode On entry describes how d is to be used:

mode = 0 means use d as input.


mode = 1 sets d[0]=1 and d[1:n - 1]=1.0/cond.
mode = 2 sets d[0:n - 2]=1 and d[n - 1]=1.0/cond.
mode = 3 sets d[i - 1]=cond**(-(i-1)/(n-1)).
mode = 4 sets d[i - 1]=1 - (i-1)/(n-1)*(1 - 1/cond).
mode = 5 sets d to random numbers in the range ( 1/cond , 1 ) such
that their logarithms are uniformly distributed.
mode = 6 sets d to random numbers from same distribution as the rest of
the matrix.
mode < 0 has the same meaning as abs(mode), except that the order of
the elements of d is reversed.

Thus if mode is between 1 and 4, d has entries ranging from 1 to 1/cond, if


between -1 and -4, D has entries ranging from 1/cond to 1.

cond On entry, used as described under mode above. If used, cond must be ≥ 1.

dmax If mode is not -6, 0, or 6, the diagonal is scaled by dmax /


max(abs(d[i])), so that maximum absolute entry of diagonal is
abs(dmax). If dmax is complex (or zero), the diagonal is scaled by a
complex number (or zero).

rsign If mode is not -6, 0, or 6, specifies the sign of the diagonal as follows:

For slatmr and dlatmr, if rsign = 'T', diagonal entries are multiplied 1
or -1 with a probability of 0.5.
For clatmr and zlatmr, if rsign = 'T', diagonal entries are multiplied by
a random complex number uniformly distributed with absolute value 1.

If rsign = 'F', diagonal entries are unchanged.

grade Specifies grading of matrix as follows:


If grade = 'N', there is no grading

If grade = 'L', matrix is premultiplied by diag( dl) (only if matrix is


nonsymmetric)
If grade = 'R', matrix is postmultiplied by diag( dr ) (only if matrix is
nonsymmetric)
If grade = 'B', matrix is premultiplied by diag( dl ) and postmultiplied by
diag( dr ) (only if matrix is nonsymmetric)

If grade = 'H', matrix is premultiplied by diag( dl ) and postmultiplied by


diag( conjg(dl) ) (only if matrix is Hermitian or nonsymmetric)

If grade = 'S', matrix is premultiplied by diag(dl ) and postmultiplied by


diag( dl ) (only if matrix is symmetric or nonsymmetric)

If grade = 'E', matrix is premultiplied by diag( dl ) and postmultiplied by


inv( diag( dl ) ) (only if matrix is nonsymmetric)

NOTE
if grade = 'E', then m must equal n.

dl Array, size (m).

If model = 0, then on entry this array specifies the diagonal entries of a


diagonal matrix used as described under grade above.
If model is not zero, then dl is set according to model and condl,
analogous to the way D is set according to mode and cond (except there is
no dmax parameter for dl).

If grade = 'E', then dl cannot have zero entries.

Not referenced if grade = 'N' or 'R'. Changed on exit.

model This specifies how the diagonal array dl is computed, just as mode specifies
how D is computed.

condl When model is not zero, this specifies the condition number of the
computed dl.

dr If moder = 0, then on entry this array specifies the diagonal entries of a


diagonal matrix used as described under grade above.

If moder is not zero, then dr is set according to moder and condr,


analogous to the way d is set according to mode and cond (except there is
no dmax parameter for dr).

Not referenced if grade = 'N', 'L', 'H', 'S', or 'E'.

moder This specifies how the diagonal array dr is to be computed, just as mode
specifies how d is to be computed.


condr When moder is not zero, this specifies the condition number of the
computed dr.

pivtng On entry specifies pivoting permutations as follows:


If pivtng = 'N' or ' ': no pivoting permutation.

If pivtng = 'L': left or row pivoting (matrix must be nonsymmetric).

If pivtng = 'R': right or column pivoting (matrix must be nonsymmetric).

If pivtng = 'B' or 'F': both or full pivoting, i.e., on both sides. In this
case, m must equal n.

If two calls to ?latmr both have full bandwidth (kl = m - 1 and ku =


n-1), and differ only in the pivtng and pack parameters, then the matrices
generated differs only in the order of the rows and columns, and otherwise
contain the same data. This consistency cannot be maintained with less
than full bandwidth.

ipivot Array, size (n or m). This array specifies the permutation used. After the
basic matrix is generated, the rows, columns, or both are permuted.
If row pivoting is selected, ?latmr starts with the last row and interchanges
row m and row ipivot[m - 1], then moves to the next-to-last row,
interchanging row m - 1 and row ipivot[m - 2], and so on. In terms
of "2-cycles", the permutation is (1 ipivot[0]) (2 ipivot[1]) ...
(m ipivot[m - 1]), where the rightmost cycle is applied first. This is the
inverse of the effect of pivoting in LINPACK. The idea is that factoring (with
pivoting) an identity matrix which has been inverse-pivoted in this way
should result in a pivot vector identical to ipivot. Not referenced if pivtng
= 'N'.

sparse On entry, specifies the sparsity of the matrix if a sparse matrix is to be


generated. sparse should lie between 0 and 1. To generate a sparse
matrix, for each matrix entry a uniform ( 0, 1 ) random number x is
generated and compared to sparse; if x is larger the matrix entry is
unchanged and if x is smaller the entry is set to zero. Thus on the average
a fraction sparse of the entries is set to zero.

kl On entry, specifies the lower bandwidth of the matrix. For example, kl = 0


implies upper triangular, kl = 1 implies upper Hessenberg, and kl at least
m-1 implies the matrix is not banded. Must equal ku if matrix is symmetric
or Hermitian.

ku On entry, specifies the upper bandwidth of the matrix. For example, ku = 0
implies lower triangular, ku = 1 implies lower Hessenberg, and ku at least
n-1 implies the matrix is not banded. Must equal kl if the matrix is symmetric
or Hermitian.

anorm On entry, specifies maximum entry of output matrix (output matrix is


multiplied by a constant so that its largest absolute entry equal anorm) if
anorm is nonnegative. If anorm is negative no scaling is done.

pack On entry, specifies packing of matrix as follows:


If pack = 'N': no packing

If pack = 'U': zero out all subdiagonal entries (if symmetric or Hermitian)

If pack = 'L': zero out all superdiagonal entries (if symmetric or
Hermitian)
If pack = 'C': store the upper triangle columnwise (only if matrix
symmetric or Hermitian or square upper triangular)
If pack = 'R': store the lower triangle columnwise (only if matrix
symmetric or Hermitian or square lower triangular) (same as upper half
rowwise if symmetric) (same as conjugate upper half rowwise if Hermitian)
If pack = 'B': store the lower triangle in band storage scheme (only if
matrix symmetric or Hermitian)
If pack = 'Q': store the upper triangle in band storage scheme (only if
matrix symmetric or Hermitian)
If pack = 'Z': store the entire matrix in band storage scheme (pivoting
can be provided for by using this option to store A in the trailing rows of the
allocated storage)
Using these options, the various LAPACK packed and banded storage
schemes can be obtained:

LAPACK storage scheme Value of pack

GB 'Z'
PB, HB or TB 'B' or 'Q'
PP, HP or TP 'C' or 'R'

If two calls to ?latmr differ only in the pack parameter, they generate
mathematically equivalent matrices.

lda On entry, lda specifies the first dimension of a as declared in the calling
program.
If pack = 'N', 'U' or 'L', lda must be at least max( 1, m ).

If pack = 'C' or 'R', lda must be at least 1.

If pack = 'B', or 'Q', lda must be min( ku + 1, n ).

If pack = 'Z', lda must be at least kuu + kll + 1, where kuu =
min( ku, n-1 ) and kll = min( kl, m-1 ).

iwork Array, size (n or m). Workspace. Not referenced if pivtng = 'N'. Changed
on exit.

Output Parameters

iseed On exit, the seed is changed.

d May be changed on exit if mode is nonzero.

dl On exit, array is changed.

dr On exit, array is changed.

a On exit, a is the desired test matrix. Only those entries of a which are
significant on output are referenced (even if a is in packed or band storage
format). The unoccupied corners of a in band format are zeroed out.


info
If info = 0, the execution is successful.

If info = -1, m is negative or unequal to n and sym = 'S' or 'H'.

If info = -2, n is negative.

If info = -3, dist is an illegal string.

If info = -5, sym is an illegal string.

If info = -7, mode is not in the range -6 to 6.

If info = -8, cond is less than 1.0, and mode is neither -6, 0 nor 6.

If info = -10, mode is neither -6, 0 nor 6 and rsign is an illegal string.

If info = -11, grade is an illegal string, or grade = 'E' and m is not equal to n, or
grade = 'L', 'R', 'B', 'S' or 'E' and sym = 'H', or grade = 'L', 'R', 'B', 'H' or 'E' and sym = 'S'.

If info = -12, grade = 'E' and dl contains zero.

If info = -13, model is not in the range -6 to 6 and grade = 'L', 'B', 'H', 'S' or 'E'.

If info = -14, condl is less than 1.0, grade = 'L', 'B', 'H', 'S' or 'E', and model is neither -6, 0 nor 6.

If info = -16, moder is not in the range -6 to 6 and grade = 'R' or 'B'.

If info = -17, condr is less than 1.0, grade = 'R' or 'B', and moder is neither -6, 0 nor 6.

If info = -18, pivtng is an illegal string, or pivtng = 'B' or 'F' and m is not equal to n, or
pivtng = 'L' or 'R' and sym = 'S' or 'H'.

If info = -19, ipivot contains an out-of-range number and pivtng is not equal to 'N'.

If info = -20, kl is negative.

If info = -21, ku is negative, or sym = 'S' or 'H' and ku is not equal to kl.

If info = -22, sparse is not in the range 0 to 1.

If info = -24, pack is an illegal string, or pack = 'U', 'L', 'B' or 'Q' and sym = 'N', or
pack = 'C' and sym = 'N' and either kl is not equal to 0 or n is not equal to m, or
pack = 'R' and sym = 'N' and either ku is not equal to 0 or n is not equal to m.

If info = -26, lda is too small.

If info = 1, error return from ?latm1 (computing d).

If info = 2, cannot scale to dmax (maximum entry is 0).

If info = 3, error return from ?latm1 (computing dl).

If info = 4, error return from ?latm1 (computing dr).

If info = 5, anorm is positive, but the matrix constructed prior to attempting to scale it to have norm anorm is zero.

?lauum
Computes the product U*U^T (U*U^H) or L^T*L (L^H*L),
where U and L are upper or lower triangular matrices
(blocked algorithm).

Syntax
lapack_int LAPACKE_slauum (int matrix_layout , char uplo , lapack_int n , float * a ,
lapack_int lda );
lapack_int LAPACKE_dlauum (int matrix_layout , char uplo , lapack_int n , double * a ,
lapack_int lda );
lapack_int LAPACKE_clauum (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * a , lapack_int lda );
lapack_int LAPACKE_zlauum (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * a , lapack_int lda );

Include Files
• mkl.h

Description
The routine ?lauum computes the product U*U^T or L^T*L for real flavors, and U*U^H or L^H*L for complex
flavors. Here the triangular factor U or L is stored in the upper or lower triangular part of the array a.
If uplo = 'U' or 'u', then the upper triangle of the result is stored, overwriting the factor U in A.

If uplo = 'L' or 'l', then the lower triangle of the result is stored, overwriting the factor L in A.

This is the blocked form of the algorithm, calling BLAS Level 3 Routines.
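A minimal sketch: the upper triangular factor U is stored in the upper triangle of a 3-by-3 column-major array, and on exit the upper triangle holds U*U^T. The matrix entries are arbitrary example data.

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 3, lda = 3;
    double a[3*3] = { 1.0, 0.0, 0.0,     /* column 1 */
                      2.0, 3.0, 0.0,     /* column 2 */
                      4.0, 5.0, 6.0 };   /* column 3: U = [1 2 4; 0 3 5; 0 0 6] */
    lapack_int info = LAPACKE_dlauum(LAPACK_COL_MAJOR, 'U', n, a, lda);
    if (info == 0)
        printf("(U*U^T)(1,1) = %g\n", a[0]);   /* 1 + 4 + 16 = 21 */
    return 0;
}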

Input Parameters

uplo Specifies whether the triangular factor stored in the array a is upper or
lower triangular:
= 'U': Upper triangular

= 'L': Lower triangular

n The order of the triangular factor U or L. n≥ 0.

a Array of size max(1,lda *n).


On entry, the triangular factor U or L.

lda The leading dimension of the array a. lda≥ max(1,n).

Output Parameters

a On exit,
if uplo = 'U', then the upper triangle of a is overwritten with the upper
triangle of the product U*U^T (U*U^H);

if uplo = 'L', then the lower triangle of a is overwritten with the lower
triangle of the product L^T*L (L^H*L).


Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -k, the k-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.

?syswapr
Applies an elementary permutation on the rows and
columns of a symmetric matrix.

Syntax
lapack_int LAPACKE_ssyswapr (int matrix_layout , char uplo , lapack_int n , float * a ,
lapack_int i1 , lapack_int i2 );
lapack_int LAPACKE_dsyswapr (int matrix_layout , char uplo , lapack_int n , double *
a , lapack_int i1 , lapack_int i2 );
lapack_int LAPACKE_csyswapr (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_float * a , lapack_int i1 , lapack_int i2 );
lapack_int LAPACKE_zsyswapr (int matrix_layout , char uplo , lapack_int n ,
lapack_complex_double * a , lapack_int i1 , lapack_int i2 );

Include Files
• mkl.h

Description
The routine applies an elementary permutation on the rows and columns of a symmetric matrix.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or


column major ( LAPACK_COL_MAJOR ).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:


If uplo = 'U', the array a stores the upper triangular factor U of the
factorization A = U*D*UT.

If uplo = 'L', the array a stores the lower triangular factor L of the
factorization A = L*D*LT.

n The order of matrix A; n≥ 0.

a Array of size at least max(1,lda*n).

The array a contains the block diagonal matrix D and the multipliers used to
obtain the factor U or L as computed by ?sytrf.

i1 Index of the first row to swap.

i2 Index of the second row to swap.

Output Parameters

a On exit, the matrix A with rows and columns i1 and i2 interchanged.

If uplo = 'U', only the upper triangular part of A is referenced and updated.
If uplo = 'L', only the lower triangular part of A is referenced and updated.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.

See Also
?sytrf
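
A minimal usage sketch (illustrative data; the indices i1 and i2 are taken here as 1-based, following the underlying LAPACK convention, and the matrix is stored as a contiguous n-by-n column-major array):

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 3;
    /* Symmetric matrix: only the upper triangle (uplo = 'U') is referenced. */
    double a[9] = { 1.0, 0.0, 0.0,
                    2.0, 4.0, 0.0,
                    3.0, 5.0, 6.0 };

    /* Interchange rows and columns 1 and 2 of the upper triangle. */
    lapack_int info = LAPACKE_dsyswapr(LAPACK_COL_MAJOR, 'U', n, a, 1, 2);
    printf("info = %d, a(1,1) = %g\n", (int)info, a[0]);
    return 0;
}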

?heswapr
Applies an elementary permutation on the rows and
columns of a Hermitian matrix.

Syntax
lapack_int LAPACKE_cheswapr (int matrix_layout, char uplo, lapack_int n,
lapack_complex_float* a, lapack_int i1, lapack_int i2);
lapack_int LAPACKE_zheswapr (int matrix_layout, char uplo, lapack_int n,
lapack_complex_double* a, lapack_int i1, lapack_int i2);

Include Files
• mkl.h

Description
The routine applies an elementary permutation on the rows and columns of a Hermitian matrix.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or


column major ( LAPACK_COL_MAJOR ).

uplo Must be 'U' or 'L'.

Indicates how the input matrix A has been factored:


If uplo = 'U', the array a stores the upper triangular factor U of the
factorization A = U*D*UH.

If uplo = 'L', the array a stores the lower triangular factor L of the
factorization A = L*D*LH.


n The order of matrix A; n≥ 0.

a Array of size at least max(1,lda*n).

The array a contains the block diagonal matrix D and the multipliers used to
obtain the factor U or L as computed by ?hetrf.

i1 Index of the first row to swap.

i2 Index of the second row to swap.

Output Parameters

a On exit, the matrix A with rows and columns i1 and i2 interchanged.

If uplo = 'U', only the upper triangular part of A is referenced and updated.
If uplo = 'L', only the lower triangular part of A is referenced and updated.

Return Values
This function returns a value info.

If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.

See Also
?hetrf

?sfrk
Performs a symmetric rank-k operation for matrix in
RFP format.

Syntax
lapack_int LAPACKE_ssfrk (int matrix_layout , char transr , char uplo , char trans ,
lapack_int n , lapack_int k , float alpha , const float * a , lapack_int lda , float
beta , float * c );
lapack_int LAPACKE_dsfrk (int matrix_layout , char transr , char uplo , char trans ,
lapack_int n , lapack_int k , double alpha , const double * a , lapack_int lda ,
double beta , double * c );

Include Files
• mkl.h

Description

The ?sfrk routines perform a matrix-matrix operation using symmetric matrices. The operation is defined as

C := alpha*A*AT + beta*C,

or

C := alpha*AT*A + beta*C,

where:
alpha and beta are scalars,
C is an n-by-n symmetric matrix in rectangular full packed (RFP) format,
A is an n-by-k matrix in the first case and a k-by-n matrix in the second case.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major ( LAPACK_COL_MAJOR ).

transr
if transr = 'N' or 'n', the normal form of RFP C is stored;

if transr= 'T' or 't', the transpose form of RFP C is stored.

uplo Specifies whether the upper or lower triangular part of the array c is used.

If uplo = 'U' or 'u', then the upper triangular part of the array c is used.

If uplo = 'L' or 'l', then the lower triangular part of the array c is used.

trans Specifies the operation:


if trans = 'N' or 'n', then C := alpha*A*AT + beta*C;

if trans = 'T' or 't', then C := alpha*AT*A + beta*C;

n Specifies the order of the matrix C. The value of n must be at least zero.

k On entry with trans = 'N' or 'n', k specifies the number of columns of


the matrix A, and on entry with trans = 'T' or 't', k specifies the
number of rows of the matrix A.
The value of k must be at least zero.

alpha Specifies the scalar alpha.

a Array, size max(1,lda*ka), where ka is in the following table:

              Col_major    Row_major
trans = 'N'   k            n
trans = 'T'   n            k

Before entry with trans = 'N' or 'n', the leading n-by-k part of the array
a must contain the matrix A, otherwise the leading k-by-n part of the array
a must contain the matrix A.

lda Specifies the leading dimension of a as declared in the calling


(sub)program. lda is defined by the following table:
Col_major Row_major

trans = 'N' max(1,n) max(1,k)

trans = 'T' max(1,k) max(1,n)


beta Specifies the scalar beta.

c Array, size (n*(n+1)/2 ). Before entry contains the symmetric matrix C in


RFP format.

Output Parameters

c If trans = 'N' or 'n', then c contains C := alpha*A*AT + beta*C;

if trans = 'T' or 't', then c contains C := alpha*AT*A + beta*C.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info < 0, the i-th parameter had an illegal value.


If info = -1011, memory allocation error occurred.
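
A minimal sketch (illustrative data) that accumulates C := alpha*A*AT + beta*C with C held in RFP format, then unpacks the result to full storage with LAPACKE_dtfttr for inspection:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 3, k = 2, lda = 3, info;
    /* A is n-by-k (trans = 'N'), column-major. */
    double a[6] = { 1.0, 2.0, 3.0,    /* column 1 */
                    4.0, 5.0, 6.0 };  /* column 2 */
    /* Upper triangle of C in RFP format: n*(n+1)/2 elements, starting at zero. */
    double c_rfp[6] = { 0.0 };
    double c_full[9] = { 0.0 };

    /* C := 1.0*A*A^T + 0.0*C, upper triangle, normal RFP form. */
    info = LAPACKE_dsfrk(LAPACK_COL_MAJOR, 'N', 'U', 'N',
                         n, k, 1.0, a, lda, 0.0, c_rfp);
    printf("dsfrk info = %d\n", (int)info);

    /* Unpack the RFP result to a full n-by-n array. */
    info = LAPACKE_dtfttr(LAPACK_COL_MAJOR, 'N', 'U', n, c_rfp, c_full, n);
    printf("dtfttr info = %d, C(1,1) = %g\n", (int)info, c_full[0]);
    return 0;
}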

?hfrk
Performs a Hermitian rank-k operation for matrix in
RFP format.

Syntax
lapack_int LAPACKE_chfrk( int matrix_layout, char transr, char uplo, char trans,
lapack_int n, lapack_int k, float alpha, const lapack_complex_float* a, lapack_int
lda, float beta, lapack_complex_float* c );
lapack_int LAPACKE_zhfrk( int matrix_layout, char transr, char uplo, char trans,
lapack_int n, lapack_int k, double alpha, const lapack_complex_double* a, lapack_int
lda, double beta, lapack_complex_double* c );

Include Files
• mkl.h

Description

The ?hfrk routines perform a matrix-matrix operation using Hermitian matrices. The operation is defined as

C := alpha*A*AH + beta*C,

or

C := alpha*AH*A + beta*C,

where:
alpha and beta are real scalars,
C is an n-by-n Hermitian matrix in RFP format,
A is an n-by-k matrix in the first case and a k-by-n matrix in the second case.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major ( LAPACK_COL_MAJOR ).

transr
if transr = 'N' or 'n', the normal form of RFP C is stored;

if transr = 'C' or 'c', the conjugate-transpose form of RFP C is stored.

uplo Specifies whether the upper or lower triangular part of the array c is used.

If uplo = 'U' or 'u', then the upper triangular part of the array c is used.

If uplo = 'L' or 'l', then the lower triangular part of the array c is used.

trans Specifies the operation:


if trans = 'N' or 'n', then C := alpha*A*AH + beta*C;

if trans = 'C' or 'c', then C := alpha*AH*A + beta*C.

n Specifies the order of the matrix C. The value of n must be at least zero.

k On entry with trans = 'N' or 'n', k specifies the number of columns of the matrix A, and on entry with trans = 'C' or 'c', k specifies the number of rows of the matrix A.
The value of k must be at least zero.

alpha Specifies the scalar alpha.

a Array, size max(1,lda*ka), where ka is in the following table:

              Col_major    Row_major
trans = 'N'   k            n
trans = 'C'   n            k

Before entry with trans = 'N' or 'n', the leading n-by-k part of the array
a must contain the matrix A, otherwise the leading k-by-n part of the array
a must contain the matrix A.

lda Specifies the leading dimension of a as declared in the calling


(sub)program. lda is defined by the following table:
              Col_major    Row_major
trans = 'N'   max(1,n)     max(1,k)
trans = 'C'   max(1,k)     max(1,n)

beta Specifies the scalar beta.

c Array, size (n*(n+1)/2). Before entry contains the Hermitian matrix C in RFP format.

Output Parameters

c If trans = 'N' or 'n', then c contains C := alpha*A*AH + beta*C;


if trans = 'C' or 'c', then c contains C := alpha*AH*A + beta*C ;

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info < 0, the i-th parameter had an illegal value.


If info = -1011, memory allocation error occurred.

?tfsm
Solves a matrix equation (one operand is a triangular
matrix in RFP format).

Syntax
lapack_int LAPACKE_stfsm (int matrix_layout , char transr , char side , char uplo ,
char trans , char diag , lapack_int m , lapack_int n , float alpha , const float * a ,
float * b , lapack_int ldb );
lapack_int LAPACKE_dtfsm (int matrix_layout , char transr , char side , char uplo ,
char trans , char diag , lapack_int m , lapack_int n , double alpha , const double *
a , double * b , lapack_int ldb );
lapack_int LAPACKE_ctfsm (int matrix_layout , char transr , char side , char uplo ,
char trans , char diag , lapack_int m , lapack_int n , lapack_complex_float alpha ,
const lapack_complex_float * a , lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_ztfsm (int matrix_layout , char transr , char side , char uplo ,
char trans , char diag , lapack_int m , lapack_int n , lapack_complex_double alpha ,
const lapack_complex_double * a , lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description

The ?tfsm routines solve one of the following matrix equations:

op(A)*X = alpha*B,

or

X*op(A) = alpha*B,

where:
alpha is a scalar,
X and B are m-by-n matrices,
A is a unit, or non-unit, upper or lower triangular matrix in rectangular full packed (RFP) format.
op(A) can be one of the following:

• op(A) = A or op(A) = AT for real flavors


• op(A) = A or op(A) = AH for complex flavors

The matrix B is overwritten by the solution matrix X.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major ( LAPACK_COL_MAJOR ).

transr
if transr = 'N' or 'n', the normal form of RFP A is stored;

if transr = 'T' or 't', the transpose form of RFP A is stored;

if transr = 'C' or 'c', the conjugate-transpose form of RFP A is stored.

side Specifies whether op(A) appears on the left or right of X in the equation:

if side = 'L' or 'l', then op(A)*X = alpha*B;

if side = 'R' or 'r', then X*op(A) = alpha*B.

uplo Specifies whether the RFP matrix A is upper or lower triangular:


if uplo = 'U' or 'u', then the matrix is upper triangular;

if uplo = 'L' or 'l', then the matrix is lower triangular.

trans Specifies the form of op(A) used in the matrix multiplication:

if trans = 'N' or 'n', then op(A) = A;

if trans = 'T' or 't', then op(A) = A';

if trans = 'C' or 'c', then op(A) = conjg(A').

diag Specifies whether the RFP matrix A is unit triangular:


if diag = 'U' or 'u' then the matrix is unit triangular;

if diag = 'N' or 'n', then the matrix is not unit triangular.

m Specifies the number of rows of B. The value of m must be at least zero.

n Specifies the number of columns of B. The value of n must be at least zero.

alpha Specifies the scalar alpha.


When alpha is zero, then a is not referenced and b need not be set before
entry.

a Array, size (n*(n+1)/2). Contains the matrix A in RFP format.

b Array, size max(1, ldb*n) for column major and max(1, ldb*m) for row
major.
Before entry, the leading m-by-n part of the array b must contain the right-
hand side matrix B.

ldb Specifies the leading dimension of b as declared in the calling


(sub)program. The value of ldb must be at least max(1, m) for column
major and max(1,n) for row major.

Output Parameters

b Overwritten by the solution matrix X.


Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info < 0, the i-th parameter had an illegal value.


If info = -1011, memory allocation error occurred.
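
A minimal sketch (illustrative data) that converts an upper triangular matrix to RFP format with LAPACKE_dtrttf and then solves A*X = alpha*B with LAPACKE_dtfsm:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int m = 3, nrhs = 1, info;
    /* Upper triangular 3-by-3 matrix A in full column-major storage. */
    double a_full[9] = { 2.0, 0.0, 0.0,
                         1.0, 3.0, 0.0,
                         1.0, 1.0, 4.0 };
    double a_rfp[6];                  /* m*(m+1)/2 elements in RFP format */
    double b[3] = { 4.0, 9.0, 8.0 };  /* right-hand side; overwritten by X */

    /* Convert A from full (TR) to rectangular full packed (TF) format. */
    info = LAPACKE_dtrttf(LAPACK_COL_MAJOR, 'N', 'U', m, a_full, m, a_rfp);

    /* Solve A*X = 1.0*B with the RFP-format triangular matrix A. */
    info = LAPACKE_dtfsm(LAPACK_COL_MAJOR, 'N', 'L', 'U', 'N', 'N',
                         m, nrhs, 1.0, a_rfp, b, m);
    printf("info = %d, x = (%g, %g, %g)\n", (int)info, b[0], b[1], b[2]);
    return 0;
}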

?tfttp
Copies a triangular matrix from the rectangular full
packed format (TF) to the standard packed format
(TP) .

Syntax
lapack_int LAPACKE_stfttp (int matrix_layout , char transr , char uplo , lapack_int n ,
const float * arf , float * ap );
lapack_int LAPACKE_dtfttp (int matrix_layout , char transr , char uplo , lapack_int n ,
const double * arf , double * ap );
lapack_int LAPACKE_ctfttp (int matrix_layout , char transr , char uplo , lapack_int n ,
const lapack_complex_float * arf , lapack_complex_float * ap );
lapack_int LAPACKE_ztfttp (int matrix_layout , char transr , char uplo , lapack_int n ,
const lapack_complex_double * arf , lapack_complex_double * ap );

Include Files
• mkl.h

Description

The routine copies a triangular matrix A from the Rectangular Full Packed (RFP) format to the standard
packed format. For the description of the RFP format, see Matrix Storage Schemes.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major ( LAPACK_COL_MAJOR ).

transr
= 'N': arf is in the Normal format,

= 'T': arf is in the Transpose format (for stfttp and dtfttp),

= 'C': arf is in the Conjugate-transpose format (for ctfttp and ztfttp).

uplo Specifies whether A is upper or lower triangular:


= 'U': A is upper triangular,
= 'L': A is lower triangular.

n The order of the matrix A. n≥ 0.

arf Array, size at least max (1, n*(n+1)/2).

On entry, the upper or lower triangular matrix A stored in the RFP format.

Output Parameters

ap Array, size at least max (1, n*(n+1)/2).

On exit, the upper or lower triangular matrix A, packed columnwise in a


linear array.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info < 0, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.

?tfttr
Copies a triangular matrix from the rectangular full
packed format (TF) to the standard full format (TR) .

Syntax
lapack_int LAPACKE_stfttr (int matrix_layout , char transr , char uplo , lapack_int n ,
const float * arf , float * a , lapack_int lda );
lapack_int LAPACKE_dtfttr (int matrix_layout , char transr , char uplo , lapack_int n ,
const double * arf , double * a , lapack_int lda );
lapack_int LAPACKE_ctfttr (int matrix_layout , char transr , char uplo , lapack_int n ,
const lapack_complex_float * arf , lapack_complex_float * a , lapack_int lda );
lapack_int LAPACKE_ztfttr (int matrix_layout , char transr , char uplo , lapack_int n ,
const lapack_complex_double * arf , lapack_complex_double * a , lapack_int lda );

Include Files
• mkl.h

Description

The routine copies a triangular matrix A from the Rectangular Full Packed (RFP) format to the standard full
format. For the description of the RFP format, see Matrix Storage Schemes.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major ( LAPACK_COL_MAJOR ).

transr
= 'N': arf is in the Normal format,

= 'T': arf is in the Transpose format (for stfttr and dtfttr),

= 'C': arf is in the Conjugate-transpose format (for ctfttr and ztfttr).

uplo
Specifies whether A is upper or lower triangular:
= 'U': A is upper triangular,
= 'L': A is lower triangular.


n The order of the matrices arf and a. n≥ 0.

arf Array, size at least max (1, n*(n+1)/2).

On entry, the upper or lower triangular matrix A stored in the RFP


format.
lda The leading dimension of the array a. lda ≥ max(1,n).

Output Parameters

a Array, size max(1,lda *n).

On exit, the triangular matrix A. If uplo = 'U', the leading n-by-n upper
triangular part of the array a contains the upper triangular matrix, and the
strictly lower triangular part of a is not referenced. If uplo = 'L', the leading
n-by-n lower triangular part of the array a contains the lower triangular
matrix, and the strictly upper triangular part of a is not referenced.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info < 0, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.

?tpqrt2
Computes a QR factorization of a real or complex
"triangular-pentagonal" matrix, which is composed of
a triangular block and a pentagonal block, using the
compact WY representation for Q.

Syntax
lapack_int LAPACKE_stpqrt2 (int matrix_layout, lapack_int m, lapack_int n, lapack_int
l, float * a, lapack_int lda, float * b, lapack_int ldb, float * t, lapack_int ldt);
lapack_int LAPACKE_dtpqrt2 (int matrix_layout, lapack_int m, lapack_int n, lapack_int
l, double * a, lapack_int lda, double * b, lapack_int ldb, double * t, lapack_int
ldt);
lapack_int LAPACKE_ctpqrt2 (int matrix_layout, lapack_int m, lapack_int n, lapack_int
l, lapack_complex_float * a, lapack_int lda, lapack_complex_float * b, lapack_int ldb,
lapack_complex_float * t, lapack_int ldt );
lapack_int LAPACKE_ztpqrt2 (int matrix_layout, lapack_int m, lapack_int n, lapack_int
l, lapack_complex_double * a, lapack_int lda, lapack_complex_double * b, lapack_int
ldb, lapack_complex_double * t, lapack_int ldt );

Include Files
• mkl.h
Description

The input matrix C is an (n+m)-by-n matrix

C = [ A ]
    [ B ]

where A is an n-by-n upper triangular matrix, and B is an m-by-n pentagonal matrix consisting of an (m-l)-by-n rectangular matrix B1 on top of an l-by-n upper trapezoidal matrix B2:

B = [ B1 ]
    [ B2 ]

The upper trapezoidal matrix B2 consists of the first l rows of an n-by-n upper triangular matrix, where 0 ≤l≤ min(m,n). If l=0, B is an m-by-n rectangular matrix. If m=l=n, B is upper triangular. The matrix W contains the elementary reflectors H(i) in the i-th column below the diagonal (of A) in the (n+m)-by-n input matrix C so that W can be represented as

W = [ I ]   (n-by-n identity)
    [ V ]   (m-by-n, same form as B)

Thus, V contains all of the information needed for W, and is returned in array b.

NOTE
V has the same form as B:

V = [ V1 ]   ((m-l)-by-n rectangular)
    [ V2 ]   (l-by-n upper trapezoidal)

The columns of V represent the vectors which define the H(i)s.


The (m+n)-by-(m+n) block reflector H is then given by

H = I - W*T*WT for real flavors, and


H = I - W*T*WH for complex flavors
where WT is the transpose of W, WH is the conjugate transpose of W, and T is the upper triangular factor of
the block reflector.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major ( LAPACK_COL_MAJOR ).

m The total number of rows in the matrix B (m≥ 0).

n The number of columns in B and the order of the triangular matrix A (n≥ 0).


l The number of rows of the upper trapezoidal part of B (min(m, n) ≥l≥ 0).

a, b Arrays: a, size max(1, lda *n) contains the n-by-n upper triangular matrix
A.
b, size max(1,ldb* n) for column major and max(1,ldb*m) for row major,
the pentagonal m-by-n matrix B. The first (m-l) rows contain the
rectangular B1 matrix, and the next l rows contain the upper trapezoidal
B2 matrix.

lda The leading dimension of a; at least max(1, n).

ldb The leading dimension of b; at least max(1, m) for column major and
max(1,n) for row major.

ldt The leading dimension of t; at least max(1, n).

Output Parameters

a The elements on and above the diagonal of the array contain the upper
triangular matrix R.

b The pentagonal matrix V.

t Array, size max(1, ldt *n).

The upper n-by-n upper triangular factor T of the block reflector.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info < 0 and info = -i, the ith argument had an illegal value.

If info = -1011, memory allocation error occurred.
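
A minimal sketch (illustrative data, l = 0 so B is purely rectangular) of a triangular-pentagonal QR factorization with LAPACKE_dtpqrt2:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int m = 2, n = 2, l = 0, info;
    /* A: n-by-n upper triangular; B: m-by-n rectangular. Column-major. */
    double a[4] = { 3.0, 0.0,
                    1.0, 2.0 };
    double b[4] = { 1.0, 2.0,
                    1.0, 1.0 };
    double t[4];   /* n-by-n upper triangular factor T of the block reflector */

    info = LAPACKE_dtpqrt2(LAPACK_COL_MAJOR, m, n, l, a, n, b, m, t, n);
    printf("info = %d, R(1,1) = %g\n", (int)info, a[0]);
    return 0;
}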

?tprfb
Applies a real or complex "triangular-pentagonal"
blocked reflector to a real or complex matrix, which is
composed of two blocks.

Syntax
lapack_int LAPACKE_stprfb (int matrix_layout, char side, char trans, char direct, char
storev, lapack_int m, lapack_int n, lapack_int k, lapack_int l, const float * v,
lapack_int ldv, const float * t, lapack_int ldt, float * a, lapack_int lda, float * b,
lapack_int ldb);
lapack_int LAPACKE_dtprfb (int matrix_layout, char side, char trans, char direct, char
storev, lapack_int m, lapack_int n, lapack_int k, lapack_int l, const double * v,
lapack_int ldv, const double * t, lapack_int ldt, double * a, lapack_int lda, double *
b, lapack_int ldb);

lapack_int LAPACKE_ctprfb (int matrix_layout, char side, char trans, char direct, char
storev, lapack_int m, lapack_int n, lapack_int k, lapack_int l, const
lapack_complex_float * v, lapack_int ldv, const lapack_complex_float * t, lapack_int
ldt, lapack_complex_float * a, lapack_int lda, lapack_complex_float * b, lapack_int
ldb);
lapack_int LAPACKE_ztprfb (int matrix_layout, char side, char trans, char direct, char
storev, lapack_int m, lapack_int n, lapack_int k, lapack_int l, const
lapack_complex_double * v, lapack_int ldv, const lapack_complex_double * t, lapack_int
ldt, lapack_complex_double * a, lapack_int lda, lapack_complex_double * b, lapack_int
ldb);

Include Files
• mkl.h

Description

The ?tprfb routine applies a real or complex "triangular-pentagonal" block reflector H, HT, or HH from either
the left or the right to a real or complex matrix C, which is composed of two blocks A and B.
The block B is m-by-n. If side = 'R', A is m-by-k, and if side = 'L', A is of size k-by-n.

The pentagonal matrix V is composed of a rectangular block V1 and a trapezoidal block V2. The size of the
trapezoidal block is determined by the parameter l, where 0≤l≤k. if l=k, the V2 block of V is triangular; if
l=0, there is no trapezoidal block, thus V = V1 is rectangular.

If storev = 'C':

• direct = 'F': V2 is upper trapezoidal (first l rows of a k-by-k upper triangular matrix)
• direct = 'B': V2 is lower trapezoidal (last l rows of a k-by-k lower triangular matrix)

If storev = 'R':

• direct = 'F': V2 is lower trapezoidal (first l columns of a k-by-k lower triangular matrix)
• direct = 'B': V2 is upper trapezoidal (last l columns of a k-by-k upper triangular matrix)

The dimensions of V and V2 depend on side and storev:

• storev = 'C', side = 'L': V is m-by-k, V2 is l-by-k
• storev = 'C', side = 'R': V is n-by-k, V2 is l-by-k
• storev = 'R', side = 'L': V is k-by-m, V2 is k-by-l
• storev = 'R', side = 'R': V is k-by-n, V2 is k-by-l

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major ( LAPACK_COL_MAJOR ).

side = 'L': apply H, HT, or HH from the left,


= 'R': apply H, HT, or HH from the right.

trans = 'N': apply H (no transpose),


= 'T': apply HT (transpose),
= 'C': apply HH (conjugate transpose).

direct Indicates how H is formed from a product of elementary reflectors:


= 'F': H = H(1) H(2) . . . H(k) (Forward),

= 'B': H = H(k) . . . H(2) H(1) (Backward).

storev Indicates how the vectors that define the elementary reflectors are stored:
= 'C': Columns,
= 'R': Rows.

m The total number of rows in the matrix B (m≥ 0).

n The number of columns in B (n≥ 0).

k The order of the matrix T, which is the number of elementary reflectors


whose product defines the block reflector. (k≥ 0)

l The order of the trapezoidal part of V. (k≥l≥ 0).

v An array containing the pentagonal matrix V (the elementary reflectors H(1), H(2), …, H(k)). The size limitations depend on the values of parameters storev and side as described in the following table:

                 storev = 'C'                    storev = 'R'
                 side = 'L'     side = 'R'       side = 'L'     side = 'R'
Column major     max(1,ldv*k)   max(1,ldv*k)     max(1,ldv*m)   max(1,ldv*n)
Row major        max(1,ldv*m)   max(1,ldv*n)     max(1,ldv*k)   max(1,ldv*k)

ldv The leading dimension of the array v. It should satisfy the following conditions:

                 storev = 'C'              storev = 'R'
                 side = 'L'   side = 'R'   side = 'L'   side = 'R'
Column major     max(1,m)     max(1,n)     max(1,k)     max(1,k)
Row major        max(1,k)     max(1,k)     max(1,m)     max(1,n)

t Array size max(1,ldt * k). The triangular k-by-k matrix T in the


representation of the block reflector.

ldt The leading dimension of the array t (ldt≥k).

a Array containing the k-by-n matrix A if side = 'L', or the m-by-k matrix A if side = 'R'. Its size should satisfy the following conditions:

                 side = 'L'      side = 'R'
Column major     max(1,lda*n)    max(1,lda*k)
Row major        max(1,lda*k)    max(1,lda*m)

lda The leading dimension of the array a. It should satisfy the following conditions:

                 side = 'L'   side = 'R'
Column major     max(1,k)     max(1,m)
Row major        max(1,n)     max(1,k)

b Array size at least max(1, ldb *n) for column major layout and max(1, ldb
*m) for row major layout, the m-by-n matrix B.

ldb The leading dimension of the array b (ldb≥ max(1, m) for column major
layout and ldb≥ max(1, n) for row major layout).

Output Parameters

a Contains the corresponding block of H*C, HT*C, HH*C, C*H, C*HT, or C*HH.

b Contains the corresponding block of H*C, HT*C, HH*C, C*H, C*HT, or C*HH.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info < 0, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.

?tpttf
Copies a triangular matrix from the standard packed
format (TP) to the rectangular full packed format (TF).


Syntax
lapack_int LAPACKE_stpttf (int matrix_layout , char transr , char uplo , lapack_int n ,
const float * ap , float * arf );
lapack_int LAPACKE_dtpttf (int matrix_layout , char transr , char uplo , lapack_int n ,
const double * ap , double * arf );
lapack_int LAPACKE_ctpttf (int matrix_layout , char transr , char uplo , lapack_int n ,
const lapack_complex_float * ap , lapack_complex_float * arf );
lapack_int LAPACKE_ztpttf (int matrix_layout , char transr , char uplo , lapack_int n ,
const lapack_complex_double * ap , lapack_complex_double * arf );

Include Files
• mkl.h

Description

The routine copies a triangular matrix A from the standard packed format to the Rectangular Full Packed
(RFP) format. For the description of the RFP format, see Matrix Storage Schemes.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major ( LAPACK_COL_MAJOR).

transr
= 'N': arf must be in the Normal format,

= 'T': arf must be in the Transpose format (for stpttf and dtpttf),

= 'C': arf must be in the Conjugate-transpose format (for ctpttf and


ztpttf).

uplo
Specifies whether A is upper or lower triangular:
= 'U': A is upper triangular,
= 'L': A is lower triangular.

n The order of the matrix A. n≥ 0.

ap Array, size at least max (1, n*(n+1)/2).

On entry, the upper or lower triangular matrix A, packed in a linear array.


See Matrix Storage Schemes for more information.

Output Parameters

arf Array, size at least max (1, n*(n+1)/2).

On exit, the upper or lower triangular matrix A stored in the RFP


format.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

< 0: if info = -i, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.

?tpttr
Copies a triangular matrix from the standard packed
format (TP) to the standard full format (TR) .

Syntax
lapack_int LAPACKE_stpttr (int matrix_layout , char uplo , lapack_int n , const float *
ap , float * a , lapack_int lda );
lapack_int LAPACKE_dtpttr (int matrix_layout , char uplo , lapack_int n , const double
* ap , double * a , lapack_int lda );
lapack_int LAPACKE_ctpttr (int matrix_layout , char uplo , lapack_int n , const
lapack_complex_float * ap , lapack_complex_float * a , lapack_int lda );
lapack_int LAPACKE_ztpttr (int matrix_layout , char uplo , lapack_int n , const
lapack_complex_double * ap , lapack_complex_double * a , lapack_int lda );

Include Files
• mkl.h

Description

The routine copies a triangular matrix A from the standard packed format to the standard full format.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major ( LAPACK_COL_MAJOR ).

uplo
Specifies whether A is upper or lower triangular:
= 'U': A is upper triangular,
= 'L': A is lower triangular.

n The order of the matrices ap and a. n≥ 0.

ap Array, size at least max (1, n*(n+1)/2). (see Matrix Storage Schemes).

lda The leading dimension of the array a. lda ≥ max(1,n).

Output Parameters

a Array, size max(1,lda*n).


On exit, the triangular matrix A. If uplo = 'U', the leading n-by-n upper
triangular part of the array a contains the upper triangular part of the
matrix A, and the strictly lower triangular part of a is not referenced. If
uplo = 'L', the leading n-by-n lower triangular part of the array a contains
the lower triangular part of the matrix A, and the strictly upper triangular
part of a is not referenced.

Return Values
This function returns a value info.


If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.

?trttf
Copies a triangular matrix from the standard full
format (TR) to the rectangular full packed format (TF).

Syntax
lapack_int LAPACKE_strttf (int matrix_layout , char transr , char uplo , lapack_int n ,
const float * a , lapack_int lda , float * arf );
lapack_int LAPACKE_dtrttf (int matrix_layout , char transr , char uplo , lapack_int n ,
const double * a , lapack_int lda , double * arf );
lapack_int LAPACKE_ctrttf (int matrix_layout , char transr , char uplo , lapack_int n ,
const lapack_complex_float * a , lapack_int lda , lapack_complex_float * arf );
lapack_int LAPACKE_ztrttf (int matrix_layout , char transr , char uplo , lapack_int n ,
const lapack_complex_double * a , lapack_int lda , lapack_complex_double * arf );

Include Files
• mkl.h

Description

The routine copies a triangular matrix A from the standard full format to the Rectangular Full Packed (RFP)
format. For the description of the RFP format, see Matrix Storage Schemes.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major ( LAPACK_COL_MAJOR ).

transr
= 'N': arf must be in the Normal format,

= 'T': arf must be in the Transpose format (for strttf and dtrttf),

= 'C': arf must be in the Conjugate-transpose format (for ctrttf and


ztrttf).

uplo
Specifies whether A is upper or lower triangular:
= 'U': A is upper triangular,
= 'L': A is lower triangular.

n The order of the matrix A. n≥ 0.

a Array, size max(1,(lda*n)).

On entry, the triangular matrix A. If uplo = 'U', the leading n-by-n upper
triangular part of the array a contains the upper triangular matrix, and the
strictly lower triangular part of a is not referenced. If uplo = 'L', the leading
n-by-n lower triangular part of the array a contains the lower triangular
matrix, and the strictly upper triangular part of a is not referenced.

lda The leading dimension of the array a. lda ≥ max(1,n).

Output Parameters

arf Array, size at least max (1, n*(n+1)/2).

On exit, the upper or lower triangular matrix A stored in the RFP


format.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info < 0, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.

?trttp
Copies a triangular matrix from the standard full
format (TR) to the standard packed format (TP) .

Syntax
lapack_int LAPACKE_strttp (int matrix_layout , char uplo , lapack_int n , const float *
a , lapack_int lda , float * ap );
lapack_int LAPACKE_dtrttp (int matrix_layout , char uplo , lapack_int n , const double
* a , lapack_int lda , double * ap );
lapack_int LAPACKE_ctrttp (int matrix_layout , char uplo , lapack_int n , const
lapack_complex_float * a , lapack_int lda , lapack_complex_float * ap );
lapack_int LAPACKE_ztrttp (int matrix_layout , char uplo , lapack_int n , const
lapack_complex_double * a , lapack_int lda , lapack_complex_double * ap );

Include Files
• mkl.h

Description

The routine copies a triangular matrix A from the standard full format to the standard packed format.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

uplo
Specifies whether A is upper or lower triangular:
= 'U': A is upper triangular,
= 'L': A is lower triangular.


n The order of the matrix A, n≥ 0.

a Array, size max(1, lda *n).


On entry, the triangular matrix A. If uplo = 'U', the leading n-by-n upper
triangular part of the array a contains the upper triangular matrix, and the
strictly lower triangular part of a is not referenced. If uplo = 'L', the leading
n-by-n lower triangular part of the array a contains the lower triangular
matrix, and the strictly upper triangular part of a is not referenced.

lda The leading dimension of the array a. lda ≥ max(1,n).

Output Parameters

ap Array, size at least max (1, n*(n+1)/2).

On exit, the upper or lower triangular matrix A, packed columnwise in a


linear array. (see Matrix Storage Schemes)

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info < 0, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.
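
The conversion routines can be combined for round trips between storage formats. A minimal sketch (illustrative data) copying an upper triangle from full to packed storage with LAPACKE_dtrttp and back with LAPACKE_dtpttr:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 3, lda = 3, info;
    double a[9]  = { 1.0, 0.0, 0.0,
                     2.0, 4.0, 0.0,
                     3.0, 5.0, 6.0 };  /* upper triangle, column-major */
    double ap[6];                      /* packed storage, n*(n+1)/2 */
    double a2[9] = { 0.0 };

    /* Full (TR) -> packed (TP). */
    info = LAPACKE_dtrttp(LAPACK_COL_MAJOR, 'U', n, a, lda, ap);
    printf("dtrttp info = %d\n", (int)info);

    /* Packed (TP) -> full (TR); a2 receives the upper triangle back. */
    info = LAPACKE_dtpttr(LAPACK_COL_MAJOR, 'U', n, ap, a2, lda);
    printf("dtpttr info = %d, a2(1,2) = %g\n", (int)info, a2[0 + 1*lda]);
    return 0;
}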

?lacp2
Copies all or part of a real two-dimensional array to a
complex array.

Syntax
lapack_int LAPACKE_clacp2 (int matrix_layout , char uplo , lapack_int m , lapack_int
n , const float * a , lapack_int lda , lapack_complex_float * b , lapack_int ldb );
lapack_int LAPACKE_zlacp2 (int matrix_layout , char uplo , lapack_int m , lapack_int
n , const double * a , lapack_int lda , lapack_complex_double * b , lapack_int ldb );

Include Files
• mkl.h

Description

The routine copies all or part of a real matrix A to another matrix B.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major (LAPACK_COL_MAJOR).

uplo
Specifies the part of the matrix A to be copied to B.
If uplo = 'U', the upper triangular part of A;

if uplo = 'L', the lower triangular part of A.

Otherwise, all of the matrix A is copied.

m The number of rows in the matrix A (m≥ 0).

n The number of columns in A (n≥ 0).

a Array, size at least max(1,lda*n) for column major and max(1,lda*m)


for row major, contains the m-by-n matrix A.
If uplo = 'U', only the upper triangle or trapezoid is accessed; if uplo =
'L', only the lower triangle or trapezoid is accessed.

lda The leading dimension of a; lda≥ max(1, m) for column major and lda≥
max(1, n) for row major.

ldb The leading dimension of the output array b; ldb≥ max(1, m) for column
major and ldb≥ max(1, n) for row major.

Output Parameters

b Array, size at least max(1,ldb*n) for column major layout and


max(1,ldb*m) for row major layout, contains the m-by-n matrix B.
On exit, B = A in the locations specified by uplo.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info < 0, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.
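
A minimal sketch (illustrative data; 'A' is used for uplo here because any value other than 'U' or 'L' copies the whole matrix) that widens a real matrix into a complex array:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int m = 2, n = 3, lda = 2, ldb = 2, info;
    double a[6] = { 1.0, 2.0,
                    3.0, 4.0,
                    5.0, 6.0 };       /* 2-by-3 real matrix, column-major */
    lapack_complex_double b[6];       /* receives A with zero imaginary parts */

    info = LAPACKE_zlacp2(LAPACK_COL_MAJOR, 'A', m, n, a, lda, b, ldb);
    printf("info = %d\n", (int)info);
    return 0;
}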

mkl_?tppack
Copies a triangular/symmetric matrix or submatrix
from standard full format to standard packed format.

Syntax
lapack_int LAPACKE_mkl_stppack (int matrix_layout, char uplo, char trans, lapack_int n,
float* ap, lapack_int i, lapack_int j, lapack_int rows, lapack_int cols, const float*
a, lapack_int lda);
lapack_int LAPACKE_mkl_dtppack (int matrix_layout, char uplo, char trans, lapack_int n,
double* ap, lapack_int i, lapack_int j, lapack_int rows, lapack_int cols, const
double* a, lapack_int lda);
lapack_int LAPACKE_mkl_ctppack (int matrix_layout, char uplo, char trans, lapack_int n,
MKL_Complex8* ap, lapack_int i, lapack_int j, lapack_int rows, lapack_int cols, const
MKL_Complex8* a, lapack_int lda);
lapack_int LAPACKE_mkl_ztppack (int matrix_layout, char uplo, char trans, lapack_int n,
MKL_Complex16* ap, lapack_int i, lapack_int j, lapack_int rows, lapack_int cols, const
MKL_Complex16* a, lapack_int lda);


Include Files
• mkl.h

Description
The routine copies a triangular or symmetric matrix or its submatrix from standard full format to packed
format

APi:i+rows-1, j:j+cols-1 := op(A)

Standard packed formats include:

• TP: triangular packed storage


• SP: symmetric indefinite packed storage
• HP: Hermitian indefinite packed storage
• PP: symmetric or Hermitian positive definite packed storage

Full formats include:

• GE: general
• TR: triangular
• SY: symmetric indefinite
• HE: Hermitian indefinite
• PO: symmetric or Hermitian positive definite

NOTE
Any elements of the copied rectangular submatrix that fall outside the triangular part of the matrix AP are skipped.

Input Parameters

uplo Specifies whether the matrix AP is upper or lower triangular.


If uplo = 'U', AP is upper triangular.

If uplo = 'L': AP is lower triangular.

trans Specifies whether or not the copied block of A is transposed or not.


If trans = 'N', no transpose: op(A) = A.

If trans = 'T',transpose: op(A) = AT.

If trans = 'C',conjugate transpose: op(A) = AH. For real data this is the
same as trans = 'T'.

n The order of the matrix AP; n≥ 0

i, j Coordinates of the left upper corner of the destination submatrix in AP.


If uplo=’U’, 1 ≤i≤j≤n.

If uplo=’L’, 1 ≤j≤i≤n.

rows Number of rows in the destination submatrix. 0 ≤rows≤n - i + 1.

cols Number of columns in the destination submatrix. 0 ≤cols≤n - j + 1.

a Pointer to the source submatrix.

Array a contains the rows-by-cols submatrix stored as unpacked rows-by-columns if trans = 'N', or unpacked columns-by-rows if trans = 'T' or trans = 'C'.

The size of a is:

                                    trans = 'N'   trans = 'T' or 'C'
matrix_layout = LAPACK_COL_MAJOR    lda*cols      lda*rows
matrix_layout = LAPACK_ROW_MAJOR    lda*rows      lda*cols

NOTE
If there are elements outside of the triangular part of AP, they are
skipped and are not copied from a.

lda The leading dimension of the array a.

                                    trans = 'N'          trans = 'T' or 'C'
matrix_layout = LAPACK_COL_MAJOR    lda ≥ max(1, rows)   lda ≥ max(1, cols)
matrix_layout = LAPACK_ROW_MAJOR    lda ≥ max(1, cols)   lda ≥ max(1, rows)

Output Parameters

ap Array of size at least max(1, n(n+1)/2). The array ap contains either the
upper or the lower triangular part of the matrix AP (as specified by uplo) in
packed storage (see Matrix Storage Schemes). The submatrix of ap from
row i to row i + rows - 1 and column j to column j + cols - 1 is
overwritten with a copy of the source matrix.

Return Values
This function returns a value info. If info=0, the execution is successful. If info = -i, the i-th parameter
had an illegal value.
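
A minimal sketch (illustrative data) that copies a 2-by-2 full-storage block into the packed upper triangle of a 4-by-4 matrix, starting at row 1, column 3 of AP:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int n = 4, info;
    double ap[10] = { 0.0 };    /* packed upper triangle: n*(n+1)/2 elements */
    /* 2-by-2 source block in full column-major storage, lda = 2. */
    double a[4] = { 1.0, 2.0,
                    3.0, 4.0 };

    info = LAPACKE_mkl_dtppack(LAPACK_COL_MAJOR, 'U', 'N', n, ap,
                               1, 3, 2, 2, a, 2);
    printf("info = %d\n", (int)info);
    return 0;
}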

mkl_?tpunpack
Copies a triangular/symmetric matrix or submatrix
from standard packed format to full format.

Syntax
lapack_int LAPACKE_mkl_stpunpack ( int matrix_layout, char uplo, char trans,
lapack_int n, const float* ap, lapack_int i, lapack_int j, lapack_int rows,
lapack_int cols, float* a, lapack_int lda );


lapack_int LAPACKE_mkl_dtpunpack ( int matrix_layout, char uplo, char trans,


lapack_int n, const double* ap, lapack_int i, lapack_int j, lapack_int rows,
lapack_int cols, double* a, lapack_int lda );
lapack_int LAPACKE_mkl_ctpunpack ( int matrix_layout, char uplo, char trans,
lapack_int n, const MKL_Complex8* ap, lapack_int i, lapack_int j, lapack_int rows,
lapack_int cols, MKL_Complex8* a, lapack_int lda );
lapack_int LAPACKE_mkl_ztpunpack ( int matrix_layout, char uplo, char trans,
lapack_int n, const MKL_Complex16* ap, lapack_int i, lapack_int j, lapack_int rows,
lapack_int cols, MKL_Complex16* a, lapack_int lda );

Include Files
• mkl.h

Description
The routine copies a triangular or symmetric matrix or its submatrix from standard packed format to full
format.

A := op(APi:i+rows-1, j:j+cols-1)

Standard packed formats include:

• TP: triangular packed storage


• SP: symmetric indefinite packed storage
• HP: Hermitian indefinite packed storage
• PP: symmetric or Hermitian positive definite packed storage

Full formats include:

• GE: general
• TR: triangular
• SY: symmetric indefinite
• HE: Hermitian indefinite
• PO: symmetric or Hermitian positive definite

NOTE
Any elements of the copied rectangular submatrix that fall outside the triangular part of AP are skipped.

Input Parameters

uplo Specifies whether matrix AP is upper or lower triangular.


If uplo = 'U', AP is upper triangular.

If uplo = 'L': AP is lower triangular.

trans Specifies whether or not the copied block of AP is transposed.


If trans = 'N', no transpose: op(AP) = AP.

If trans = 'T',transpose: op(AP) = APT.

If trans = 'C',conjugate transpose: op(AP) = APH. For real data this is the
same as trans = 'T'.

n The order of the matrix AP; n ≥ 0.

ap Array, size at least max(1, n(n+1)/2). The array ap contains either the
upper or the lower triangular part of the matrix AP (as specified by uplo) in
packed storage (see Matrix Storage Schemes). It is the source for the
submatrix of AP from row i to row i + rows - 1 and column j to column j
+ cols - 1 to be copied.

i, j Coordinates of left upper corner of the submatrix in AP to copy.


If uplo=’U’, 1 ≤i≤j≤n.

If uplo=’L’, 1 ≤j≤i≤n.

rows Number of rows to copy. 0 ≤rows≤n - i + 1.

cols Number of columns to copy. 0 ≤cols≤n - j + 1.

lda The leading dimension of array a.

                                    trans = 'N'          trans = 'T' or 'C'
matrix_layout = LAPACK_COL_MAJOR    lda ≥ max(1, rows)   lda ≥ max(1, cols)
matrix_layout = LAPACK_ROW_MAJOR    lda ≥ max(1, cols)   lda ≥ max(1, rows)

Output Parameters

a Pointer to the destination matrix. On exit, array a is overwritten with a copy of the rows-by-cols submatrix of ap, stored as unpacked rows-by-columns if trans = 'N', or unpacked columns-by-rows if trans = 'T' or trans = 'C'.

The size of a is:

                                    trans = 'N'   trans = 'T' or 'C'
matrix_layout = LAPACK_COL_MAJOR    lda*cols      lda*rows
matrix_layout = LAPACK_ROW_MAJOR    lda*rows      lda*cols

NOTE
If there are elements outside of the triangular part of ap indicated by
uplo, they are skipped and are not copied to a.

Return Values
This function returns a value info. If info=0, the execution is successful. If info = -i, the i-th parameter
had an illegal value.


LAPACK Utility Functions and Routines


This section describes LAPACK utility functions and routines.

See Also
lsame Tests two characters for equality regardless of the case.
lsamen Tests two character strings for equality regardless of the case.
second/dsecnd Returns elapsed time in seconds. Use to estimate real time between two calls to
this function.
xerbla Error handling function called by BLAS, LAPACK, Vector Math, and Vector Statistics functions.

ilaver
Returns the version of the LAPACK library.

Syntax
void LAPACKE_ilaver (lapack_int * vers_major, lapack_int * vers_minor, lapack_int *
vers_patch);

Include Files
• mkl.h

Description
This routine returns the version of the LAPACK library.

Output Parameters

vers_major Returns the major version of the LAPACK library.

vers_minor Returns the minor version of the LAPACK library.

vers_patch Returns the patch level of the LAPACK library.
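
A minimal usage sketch:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int major, minor, patch;
    LAPACKE_ilaver(&major, &minor, &patch);
    printf("LAPACK version %d.%d.%d\n", (int)major, (int)minor, (int)patch);
    return 0;
}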

?lamch
Determines machine parameters for floating-point
arithmetic.

Syntax
float LAPACKE_slamch (char cmach );
double LAPACKE_dlamch (char cmach );

Include Files
• mkl.h

Description

The function ?lamch determines single precision and double precision machine parameters.

Input Parameters

cmach Specifies the value to be returned by ?lamch:

= 'E' or 'e', val = eps

= 'S' or 's', val = sfmin

= 'B' or 'b', val = base

= 'P' or 'p', val = eps*base

= 'N' or 'n', val = t

= 'R' or 'r', val = rnd

= 'M' or 'm', val = emin

= 'U' or 'u', val = rmin

= 'L' or 'l', val = emax

= 'O' or 'o', val = rmax

where
eps = relative machine precision;
sfmin = safe minimum, such that 1/sfmin does not overflow;
base = base of the machine;
prec = eps*base;
t = number of (base) digits in the mantissa;
rnd = 1.0 when rounding occurs in addition, 0.0 otherwise;
emin = minimum exponent before (gradual) underflow;
rmin = underflow threshold = base**(emin-1);
emax = largest exponent before overflow;
rmax = overflow threshold = (base**emax)*(1-eps).

NOTE
You can use a character string for cmach instead of a single character
in order to make your code more readable. The first character of the
string determines the value to be returned. For example, 'Precision'
is interpreted as 'p'.

Output Parameters

val Value returned by the function.
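
A minimal sketch querying a few double precision machine parameters:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    double eps   = LAPACKE_dlamch('E');   /* relative machine precision */
    double sfmin = LAPACKE_dlamch('S');   /* safe minimum               */
    double rmax  = LAPACKE_dlamch('O');   /* overflow threshold         */
    printf("eps = %g, sfmin = %g, rmax = %g\n", eps, sfmin, rmax);
    return 0;
}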

LAPACK Test Functions and Routines


This section describes LAPACK test functions and routines.


?lagge
Generates a general m-by-n matrix .

Syntax
lapack_int LAPACKE_slagge (int matrix_layout , lapack_int m , lapack_int n , lapack_int
kl , lapack_int ku , const float * d , float * a , lapack_int lda , lapack_int *
iseed );
lapack_int LAPACKE_dlagge (int matrix_layout , lapack_int m , lapack_int n , lapack_int
kl , lapack_int ku , const double * d , double * a , lapack_int lda , lapack_int *
iseed );
lapack_int LAPACKE_clagge (int matrix_layout , lapack_int m , lapack_int n , lapack_int
kl , lapack_int ku , const float * d , lapack_complex_float * a , lapack_int lda ,
lapack_int * iseed );
lapack_int LAPACKE_zlagge (int matrix_layout , lapack_int m , lapack_int n , lapack_int
kl , lapack_int ku , const double * d , lapack_complex_double * a , lapack_int lda ,
lapack_int * iseed );

Include Files
• mkl.h

Description

The routine generates a general m-by-n matrix A, by pre- and post- multiplying a real diagonal matrix D with
random matrices U and V:
A := U*D*V,
where U and V are orthogonal for real flavors and unitary for complex flavors. The lower and upper
bandwidths may then be reduced to kl and ku by additional orthogonal transformations.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

m The number of rows of the matrix A (m≥ 0).

n The number of columns of the matrix A (n≥ 0).

kl The number of nonzero subdiagonals within the band of A (0 ≤kl≤m-1).

ku The number of nonzero superdiagonals within the band of A (0 ≤ku≤n-1).

d The array d with the dimension of (min(m, n)) contains the diagonal
elements of the diagonal matrix D.

lda The leading dimension of the array a (lda≥m) for column major layout and
(lda≥n) for row major layout.

iseed The array iseed with the dimension of 4 contains the seed of the random number generator. The elements must be between 0 and 4095, and iseed[3] must be odd.

Output Parameters

a The array a with size at least max(1,lda*n) for column major layout and
max(1,lda*m) for row major layout contains the generated m-by-n matrix
A.

iseed The array iseed contains the updated seed on exit.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info < 0, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.
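
A minimal sketch (illustrative values) generating a 4-by-4 random matrix with prescribed singular values and bandwidths kl = 1, ku = 2:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    lapack_int m = 4, n = 4, kl = 1, ku = 2, lda = 4, info;
    lapack_int iseed[4] = { 0, 1, 2, 3 };   /* iseed[3] must be odd */
    double d[4] = { 4.0, 3.0, 2.0, 1.0 };   /* diagonal of D */
    double a[16];

    info = LAPACKE_dlagge(LAPACK_COL_MAJOR, m, n, kl, ku, d, a, lda, iseed);
    printf("info = %d, a(1,1) = %g\n", (int)info, a[0]);
    return 0;
}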

?laghe
Generates a complex Hermitian matrix .

Syntax
lapack_int LAPACKE_claghe (int matrix_layout , lapack_int n , lapack_int k , const
float * d , lapack_complex_float * a , lapack_int lda , lapack_int * iseed );
lapack_int LAPACKE_zlaghe (int matrix_layout , lapack_int n , lapack_int k , const
double * d , lapack_complex_double * a , lapack_int lda , lapack_int * iseed );

Include Files
• mkl.h

Description

The routine generates a complex Hermitian matrix A, by pre- and post- multiplying a real diagonal matrix D
with random unitary matrix:
A := U*D*UH
The semi-bandwidth may then be reduced to k by additional unitary transformations.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major ( LAPACK_COL_MAJOR ).

n The order of the matrix A (n≥ 0).

k The number of nonzero subdiagonals within the band of A (0 ≤k≤n-1).

d The array d with the dimension of (n) contains the diagonal elements of the
diagonal matrix D.

lda The leading dimension of the array a (lda≥n).


iseed The array iseed with the dimension of 4 contains the seed of the random
number generator. The elements must be between 0 and 4095 and
iseed[3] must be odd.

Output Parameters

a The array a of size at least max(1,lda*n) contains the generated n-by-n Hermitian matrix A.

iseed The array iseed contains the updated seed on exit.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info < 0, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.

?lagsy
Generates a symmetric matrix by pre- and post-
multiplying a real diagonal matrix with a random
unitary matrix .

Syntax
lapack_int LAPACKE_slagsy (int matrix_layout , lapack_int n , lapack_int k , const
float * d , float * a , lapack_int lda , lapack_int * iseed );
lapack_int LAPACKE_dlagsy (int matrix_layout , lapack_int n , lapack_int k , const
double * d , double * a , lapack_int lda , lapack_int * iseed );
lapack_int LAPACKE_clagsy (int matrix_layout , lapack_int n , lapack_int k , const
float * d , lapack_complex_float * a , lapack_int lda , lapack_int * iseed );
lapack_int LAPACKE_zlagsy (int matrix_layout , lapack_int n , lapack_int k , const
double * d , lapack_complex_double * a , lapack_int lda , lapack_int * iseed );

Include Files
• mkl.h

Description

The ?lagsy routine generates a symmetric matrix A by pre- and post- multiplying a real diagonal matrix D
with a random matrix U:
A := U*D*UT,
where U is orthogonal for real flavors and unitary for complex flavors. The semi-bandwidth may then be
reduced to k by additional unitary transformations.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR) or column major (LAPACK_COL_MAJOR).

n The order of the matrix A (n≥ 0).

k The number of nonzero subdiagonals within the band of A (0 ≤k≤n-1).

d The array d with the dimension of (n) contains the diagonal elements of the
diagonal matrix D.

lda The leading dimension of the array a (lda≥n).

iseed The array iseed with the dimension of 4 contains the seed of the random
number generator. The elements must be between 0 and 4095 and
iseed[3] must be odd.

Output Parameters

a The array a of size max(1,lda*n) contains the generated symmetric n-by-n matrix A.

iseed The array iseed contains the updated seed on exit.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info < 0, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.

?latms
Generates a general m-by-n matrix with specific
singular values.

Syntax
lapack_int LAPACKE_slatms (int matrix_layout, lapack_int m, lapack_int n, char dist,
lapack_int * iseed, char sym, float * d, lapack_int mode, float cond, float dmax,
lapack_int kl, lapack_int ku, char pack, float * a, lapack_int lda);
lapack_int LAPACKE_dlatms (int matrix_layout, lapack_int m, lapack_int n, char dist,
lapack_int * iseed, char sym, double * d, lapack_int mode, double cond, double dmax,
lapack_int kl, lapack_int ku, char pack, double * a, lapack_int lda);
lapack_int LAPACKE_clatms (int matrix_layout, lapack_int m, lapack_int n, char dist,
lapack_int * iseed, char sym, float * d, lapack_int mode, float cond, float dmax,
lapack_int kl, lapack_int ku, char pack, lapack_complex_float * a, lapack_int lda);
lapack_int LAPACKE_zlatms (int matrix_layout, lapack_int m, lapack_int n, char dist,
lapack_int * iseed, char sym, double * d, lapack_int mode, double cond, double dmax,
lapack_int kl, lapack_int ku, char pack, lapack_complex_double * a, lapack_int lda);

Include Files
• mkl.h

Description

The ?latms routine generates random matrices with specified singular values, or symmetric/Hermitian
matrices with specified eigenvalues for testing LAPACK programs.
It applies this sequence of operations:


1. Set the diagonal to d, where d is input or computed according to mode, cond, dmax, and sym as
described in Input Parameters.
2. Generate a matrix with the appropriate band structure, by one of two methods:

Method A: 1. Generate a dense m-by-n matrix by multiplying d on the left and the right by random unitary matrices; then
          2. Reduce the bandwidth according to kl and ku, using Householder transformations.

Method B: Convert the bandwidth-0 (i.e., diagonal) matrix to a bandwidth-1 matrix using Givens rotations, "chasing" out-of-band elements back, much as in QR; then convert the bandwidth-1 matrix to a bandwidth-2 matrix, and so on.
Note that for reasonably small bandwidths (relative to m and n) this requires less storage, as a dense matrix is not generated. Also, for symmetric or Hermitian matrices, only one triangle is generated.

Method A is chosen if the bandwidth is a large fraction of the order of the matrix, and lda is at least m (so a
dense matrix can be stored.) Method B is chosen if the bandwidth is small (less than (1/2)*n for symmetric
or Hermitian or less than .3*n+m for nonsymmetric), or lda is less than m and not less than the bandwidth.

Pack the matrix if desired, using one of the methods specified by the pack parameter.

If Method B is chosen and band format is specified, then the matrix is generated in the band format and no
repacking is necessary.

Input Parameters

matrix_layout Specifies whether matrix storage layout is row major (LAPACK_ROW_MAJOR)


or column major ( LAPACK_COL_MAJOR ).

m The number of rows of the matrix A (m≥ 0).

n The number of columns of the matrix A (n≥ 0).

dist Specifies the type of distribution to be used to generate the random


singular values or eigenvalues:

• 'U': uniform distribution (0, 1)


• 'S': symmetric uniform distribution (-1, 1)
• 'N': normal distribution (0, 1)

iseed Array with size 4.


Specifies the seed of the random number generator. Values should lie
between 0 and 4095 inclusive, and iseed[3] should be odd. The random
number generator uses a linear congruential sequence limited to small
integers, and so should produce machine independent random numbers.
The values of the array are modified, and can be used in the next call to ?
latms to continue the same random number sequence.

sym If sym='S' or 'H', the generated matrix is symmetric or Hermitian, with


eigenvalues specified by d, cond, mode, and dmax; they can be positive,
negative, or zero.

If sym='P', the generated matrix is symmetric or Hermitian, with eigenvalues (which are also the singular values, and are non-negative) specified by d, cond, mode, and dmax.
If sym='N', the generated matrix is nonsymmetric, with non-negative singular values specified by d, cond, mode, and dmax.

d Array, size (MIN(m , n))

This array is used to specify the singular values or eigenvalues of A (see the
description of sym). If mode=0, then d is assumed to contain the
eigenvalues or singular values, otherwise elements of d are computed
according to mode, cond, and dmax.

mode Describes how the singular/eigenvalues are specified.

• mode = 0: use d as input
• mode = 1: set d[0] = 1 and d[1:n - 1] = 1.0/cond
• mode = 2: set d[0:n - 2] = 1 and d[n - 1] = 1.0/cond
• mode = 3: set d[i] = cond^(-i/(n - 1))
• mode = 4: set d[i] = 1 - (i/(n - 1))*(1 - 1/cond)
• mode = 5: set elements of d to random numbers in the range (1/cond, 1) such that their logarithms are uniformly distributed.
• mode = 6: set elements of d to random numbers from the same distribution as the rest of the matrix.

mode < 0 has the same meaning as ABS(mode), except that the order of the
elements of d is reversed. Thus, if mode is positive, d has entries ranging
from 1 to 1/cond, if negative, from 1/cond to 1.

If sym='S' or 'H', and mode is not 0, 6, nor -6, then the elements of d are
also given a random sign (multiplied by +1 or -1).

cond Used in setting d as described for the mode parameter. If used, cond≥ 1.

dmax If mode is not -6, 0 nor 6, the contents of d, as computed according to mode
and cond, are scaled by dmax / max(abs(d[i-1])); thus, the maximum
absolute eigenvalue or singular value (the norm) is abs(dmax).

NOTE
dmax need not be positive: if dmax is negative (or zero), d will be
scaled by a negative number (or zero).

kl Specifies the lower bandwidth of the matrix. For example, kl=0 implies
upper triangular, kl=1 implies upper Hessenberg, and kl being at least m -
1 means that the matrix has full lower bandwidth. kl must equal ku if the
matrix is symmetric or Hermitian.

ku Specifies the upper bandwidth of the matrix. For example, ku=0 implies
lower triangular, ku=1 implies lower Hessenberg, and ku being at least n -
1 means that the matrix has full upper bandwidth. kl must equal ku if the
matrix is symmetric or Hermitian.

pack Specifies packing of matrix:


• 'N': no packing
• 'U': zero out all subdiagonal entries (if symmetric or Hermitian)
• 'L': zero out all superdiagonal entries (if symmetric or Hermitian)
• 'B': store the lower triangle in band storage scheme (only if matrix
symmetric, Hermitian, or lower triangular)
• 'Q': store the upper triangle in band storage scheme (only if matrix
symmetric, Hermitian, or upper triangular)
• 'Z': store the entire matrix in band storage scheme (pivoting can be
provided for by using this option to store A in the trailing rows of the
allocated storage)

Using these options, the various LAPACK packed and banded storage
schemes can be obtained:

GB (general band): use 'Z'
PB (symmetric positive definite band): use 'B' or 'Q'
SB (symmetric band): use 'B' or 'Q'
HB (Hermitian band): use 'B' or 'Q'
TB (triangular band): use 'B' or 'Q'
PP (symmetric positive definite packed): use 'C' or 'R'
SP (symmetric packed): use 'C' or 'R'
HP (Hermitian packed): use 'C' or 'R'
TP (triangular packed): use 'C' or 'R'

If two calls to ?latms differ only in the pack parameter, they generate
mathematically equivalent matrices.

lda lda specifies the first dimension of a as declared in the calling program.

If pack='N', 'U', 'L', 'C', or 'R', then lda must be at least m for column major or at least n for row major.

If pack='B' or 'Q', then lda must be at least MIN(kl, m - 1) (which is equal to MIN(ku, n - 1)).

If pack='Z', lda must be large enough to hold the packed array: MIN(ku, n - 1) + MIN(kl, m - 1) + 1.

Output Parameters

iseed The array iseed contains the updated seed.

d On exit, the array d contains the singular values or eigenvalues used to generate the matrix A.

NOTE
The array d is not modified if mode = 0.

a Array of size lda by n.

The array a contains the generated m-by-n matrix A.
a is first generated in full (unpacked) form, and then packed, if so specified
by pack. Thus, the first m elements of the first n columns are always
modified. If pack specifies a packed or banded storage scheme, all lda
elements of the first n columns are modified; the elements of the array
which do not correspond to elements of the generated matrix are set to
zero.

Return Values
This function returns a value info.
If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info = -1011, memory allocation error occurred.

If info = 2, the matrix cannot be scaled to dmax (the maximum singular value is 0).

If info = 3, error return from ?lagge, ?laghe, or ?lagsy.
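
For illustration, the following sketch generates a 100-by-100 symmetric test matrix with geometrically decaying eigenvalues (mode = 3) in full (unpacked) storage. It assumes the LAPACKE-style C prototype LAPACKE_dlatms declared through mkl.h; check the header for the exact signature used by your MKL version.

#include <stdlib.h>
#include <mkl.h>    /* assumed to declare the LAPACKE_dlatms prototype */

int main(void) {
    lapack_int m = 100, n = 100, kl = 99, ku = 99, lda = 100, mode = 3, info;
    lapack_int iseed[4] = {0, 1, 2, 3};            /* values in 0..4095, iseed[3] odd */
    double cond = 1.0e6, dmax = 1.0;
    double *d = (double*)malloc((size_t)n * sizeof(double));
    double *a = (double*)malloc((size_t)lda * n * sizeof(double));
    if (d == NULL || a == NULL) { free(a); free(d); return 1; }

    /* dist='U' (uniform), sym='S' (symmetric), pack='N' (no packing);
       kl = ku = n-1 gives a full-bandwidth (dense) matrix.             */
    info = LAPACKE_dlatms(LAPACK_COL_MAJOR, m, n, 'U', iseed, 'S', d,
                          mode, cond, dmax, kl, ku, 'N', a, lda);

    free(a);
    free(d);
    return (int)info;
}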

ScaLAPACK Routines 4
Intel® Math Kernel Library implements routines from the ScaLAPACK package for distributed-memory
architectures. Routines are supported for both real and complex dense and band matrices to perform the
tasks of solving systems of linear equations, solving linear least-squares problems, eigenvalue and singular
value problems, as well as performing a number of related computational tasks.
Intel MKL ScaLAPACK routines are written in FORTRAN 77 with the exception of a few utility routines written in C to exploit the IEEE arithmetic. All routines are available in all precision types: single precision, double precision, complex, and double complex precision. See the mkl_scalapack.h header file for C declarations of ScaLAPACK routines.

NOTE
ScaLAPACK routines are provided only for Intel® 64 or Intel® Many Integrated Core architectures.

See descriptions of ScaLAPACK computational routines that perform distinct computational tasks, as well as
driver routines for solving standard types of problems in one call. Additionally, Intel® Math Kernel Library
implements ScaLAPACK Auxiliary Routines, Utility Functions and Routines, and Matrix Redistribution/Copy
Routines. The library includes routines for both real and complex data.
The <install_directory>/examples/scalapackf directory contains sample code demonstrating the use
of ScaLAPACK routines.
Generally, ScaLAPACK runs on a network of computers using MPI as a message-passing layer and a set of
prebuilt communication subprograms (BLACS), as well as a set of BLAS optimized for the target architecture.
Intel MKL version of ScaLAPACK is optimized for Intel® processors. For the detailed system and environment
requirements, see Intel® MKL Release Notes and Intel® MKL Developer Guide.
For full reference on ScaLAPACK routines and related information, see [SLUG].

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Overview of ScaLAPACK Routines


The model of the computing environment for ScaLAPACK is represented as a one-dimensional array of processes (for operations on band or tridiagonal matrices) or as a two-dimensional process grid (for operations on dense matrices). To use ScaLAPACK, all global matrices or vectors should be distributed on this array or grid prior to calling the ScaLAPACK routines.
ScaLAPACK is closely tied to other components, including BLAS, BLACS, LAPACK, and PBLAS.


ScaLAPACK Array Descriptors


ScaLAPACK uses two-dimensional block-cyclic data distribution as a layout for dense matrix computations.
This distribution provides good work balance between available processors, and also allows use of BLAS Level
3 routines for optimal local computations. Information about the data distribution that is required to establish
the mapping between each global matrix and its corresponding process and memory location is contained in
the array called the array descriptor associated with each global matrix. The size of the array descriptor is
denoted as dlen_.
Let A be a two-dimensional block-cyclically distributed matrix with the array descriptor desca. The
meaning of each array descriptor element depends on the type of the matrix A. The tables "Array descriptor
for dense matrices" and "Array descriptor for narrow-band and tridiagonal matrices" describe the meaning of
each element for the different types of matrices.
Array descriptor for dense matrices (dlen_=9)
Element Name   Stored in       Description                                        Element Index Number
dtype_a        desca[dtype_]   Descriptor type (=1 for dense matrices).           0
ctxt_a         desca[ctxt_]    BLACS context handle for the process grid.         1
m_a            desca[m_]       Number of rows in the global matrix A.             2
n_a            desca[n_]       Number of columns in the global matrix A.          3
mb_a           desca[mb_]      Row blocking factor.                               4
nb_a           desca[nb_]      Column blocking factor.                            5
rsrc_a         desca[rsrc_]    Process row over which the first row of the        6
                               global matrix A is distributed.
csrc_a         desca[csrc_]    Process column over which the first column of      7
                               the global matrix A is distributed.
lld_a          desca[lld_]     Leading dimension of the local matrix A.           8
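
As a minimal illustration of the table above, the following sketch fills the nine entries of a dense-matrix descriptor by hand; the BLACS context handle ictxt and the local leading dimension lld are assumed to have been obtained during grid setup. In practice the ScaLAPACK tool routine descinit performs the same job and also checks the arguments.

#include <mkl_scalapack.h>

/* Fill a dlen_ = 9 descriptor for a dense m-by-n global matrix distributed
   with blocking factors mb x nb, whose first block lives on process row rsrc
   and process column csrc of the BLACS grid identified by ictxt.            */
void fill_dense_desc(MKL_INT desca[9], MKL_INT ictxt, MKL_INT m, MKL_INT n,
                     MKL_INT mb, MKL_INT nb, MKL_INT rsrc, MKL_INT csrc,
                     MKL_INT lld) {
    desca[0] = 1;       /* dtype_a: 1 = dense matrix                   */
    desca[1] = ictxt;   /* ctxt_a : BLACS context handle               */
    desca[2] = m;       /* m_a    : rows of the global matrix          */
    desca[3] = n;       /* n_a    : columns of the global matrix       */
    desca[4] = mb;      /* mb_a   : row blocking factor                */
    desca[5] = nb;      /* nb_a   : column blocking factor             */
    desca[6] = rsrc;    /* rsrc_a : process row of the first row       */
    desca[7] = csrc;    /* csrc_a : process column of the first column */
    desca[8] = lld;     /* lld_a  : leading dimension of local matrix  */
}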

Array descriptor for narrow-band and tridiagonal matrices (dlen_=7)


Element Name     Stored in       Description                                        Element Index Number
dtype_a          desca[dtype_]   Descriptor type:                                   0
                                 • dtype_a=501: 1-by-P grid,
                                 • dtype_a=502: P-by-1 grid.
ctxt_a           desca[ctxt_]    BLACS context handle indicating the BLACS          1
                                 process grid over which the global matrix A is
                                 distributed. The context itself is global, but
                                 the handle (the integer value) can vary.
n_a              desca[n_]       The size of the matrix dimension being             2
                                 distributed.
nb_a             desca[nb_]      The blocking factor used to distribute the         3
                                 distributed dimension of the matrix A.
src_a            desca[src_]     The process row or column over which the first     4
                                 row or column of the matrix A is distributed.
lld_a            desca[lld_]     The leading dimension of the local matrix          5
                                 storing the local blocks of the distributed
                                 matrix A. The minimum value of lld_a depends
                                 on dtype_a:
                                 • dtype_a=501: lld_a ≥ max(size of undistributed
                                   dimension, 1),
                                 • dtype_a=502: lld_a ≥ max(nb_a, 1).
Not applicable                   Reserved for future use.                           6

Similar notations are used for different matrices. For example: lld_b is the leading dimension of the local
matrix storing the local blocks of the distributed matrix B and dtype_z is the type of the global matrix Z.
The number of rows and columns of a global dense matrix that a particular process in a grid receives after data distribution are denoted by LOCr() and LOCc(), respectively. To compute these numbers, you can use the ScaLAPACK tool routine numroc.
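
A minimal sketch of this computation, assuming that the numroc declaration in mkl_scalapack.h follows the same pointer-argument convention as the other ScaLAPACK prototypes shown in this chapter:

#include <mkl_scalapack.h>   /* assumed: MKL_INT numroc(MKL_INT*, MKL_INT*,
                                MKL_INT*, MKL_INT*, MKL_INT*);              */

/* LOCr(m_a): number of local rows owned by the process in grid row myrow
   (nprow process rows, first row of A held by grid row rsrc_a).            */
MKL_INT locr(MKL_INT m_a, MKL_INT mb_a, MKL_INT myrow,
             MKL_INT rsrc_a, MKL_INT nprow) {
    return numroc(&m_a, &mb_a, &myrow, &rsrc_a, &nprow);
}

/* LOCc(n_a): number of local columns owned by the process in grid column
   mycol (npcol process columns, first column held by grid column csrc_a).  */
MKL_INT locc(MKL_INT n_a, MKL_INT nb_a, MKL_INT mycol,
             MKL_INT csrc_a, MKL_INT npcol) {
    return numroc(&n_a, &nb_a, &mycol, &csrc_a, &npcol);
}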

After the block-cyclic distribution of global data is done, you may choose to perform an operation on a
submatrix sub(A) of the global matrix A defined by the following 6 values (for dense matrices):

m The number of rows of sub(A)

n The number of columns of sub(A)

a A pointer to the local matrix containing the entire global matrix A

ia The row index of sub(A) in the global matrix A

ja The column index of sub(A) in the global matrix A


desca The array descriptor for the global matrix A

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Naming Conventions for ScaLAPACK Routines


For each routine introduced in this chapter, you can use the ScaLAPACK name. The naming convention for
ScaLAPACK routines is similar to that used for LAPACK routines. A general rule is that each routine name in
ScaLAPACK, which has an LAPACK equivalent, is simply the LAPACK name prefixed by the initial letter p.
ScaLAPACK names have the structure p?yyzzz or p?yyzz, which is described below.
The initial letter p is a distinctive prefix of ScaLAPACK routines and is present in each such routine.
The second symbol ? indicates the data type:

s real, single precision

d real, double precision

c complex, single precision

z complex, double precision

The next two letters yy indicate the matrix type as:

ge general

gb general band

gg a pair of general matrices (for a generalized problem)

dt general tridiagonal (diagonally dominant-like)

db general band (diagonally dominant-like)

po symmetric or Hermitian positive-definite

pb symmetric or Hermitian positive-definite band

pt symmetric or Hermitian positive-definite tridiagonal

sy symmetric

st symmetric tridiagonal (real)

he Hermitian

or orthogonal

tr triangular (or quasi-triangular)

tz trapezoidal

un unitary

For computational routines, the last three letters zzz indicate the computation performed and have the same
meaning as for LAPACK routines.
For driver routines, the last two letters zz or three letters zzz have the following meaning:

sv a simple driver for solving a linear system

svx an expert driver for solving a linear system

ls a driver for solving a linear least squares problem

ev a simple driver for solving a symmetric eigenvalue problem

evd a simple driver for solving an eigenvalue problem using a divide and conquer
algorithm

evx an expert driver for solving a symmetric eigenvalue problem

svd a driver for computing a singular value decomposition

gvx an expert driver for solving a generalized symmetric definite eigenvalue problem

Simple driver here means that the driver just solves the general problem, whereas an expert driver is more versatile and can also optionally perform some related computations (for example, refining the solution and computing error bounds after the linear system is solved).

ScaLAPACK Computational Routines


In the sections that follow, the descriptions of ScaLAPACK computational routines are given. These routines
perform distinct computational tasks that can be used for:
• Solving Systems of Linear Equations
• Orthogonal Factorizations and LLS Problems
• Symmetric Eigenproblems
• Nonsymmetric Eigenproblems
• Singular Value Decomposition
• Generalized Symmetric-Definite Eigenproblems

See also the respective driver routines.

Systems of Linear Equations: ScaLAPACK Computational Routines


ScaLAPACK supports routines for solving systems of equations with the following types of matrices:
• general
• general banded
• general diagonally dominant-like banded (including general tridiagonal)
• symmetric or Hermitian positive-definite
• symmetric or Hermitian positive-definite banded
• symmetric or Hermitian positive-definite tridiagonal

A diagonally dominant-like matrix is defined as a matrix for which it is known in advance that pivoting is not
required in the LU factorization of this matrix.


For the above matrix types, the library includes routines for performing the following computations: factoring
the matrix; equilibrating the matrix; solving a system of linear equations; estimating the condition number of
a matrix; refining the solution of linear equations and computing its error bounds; inverting the matrix. Note
that for some of the listed matrix types only a subset of the computational routines is provided (for example, routines that refine the solution are not provided for band or tridiagonal matrices). See Table “Computational Routines for Systems of Linear Equations” for a full list of available routines.
To solve a particular problem, you can either call two or more computational routines or call a corresponding
driver routine that combines several tasks in one call. Thus, to solve a system of linear equations with a
general matrix, you can first call p?getrf (LU factorization) and then p?getrs (computing the solution).
Then, you might wish to call p?gerfs to refine the solution and get the error bounds. Alternatively, you can
just use the driver routine p?gesvx which performs all these tasks in one call.
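
For the double precision case, the two-step sequence looks like the following sketch. The BLACS grid, the local arrays, and the array descriptors desca and descb are assumed to be set up already (see Overview of ScaLAPACK Routines), and ipiv must hold at least LOCr(m_a) + mb_a elements; the prototypes are those documented for p?getrf and p?getrs later in this chapter.

#include <mkl_scalapack.h>

/* Factor sub(A) = P*L*U with pdgetrf, then solve sub(A)*X = sub(B) with
   pdgetrs. On return, b holds the solution X and info the ScaLAPACK status. */
void factor_and_solve(MKL_INT n, MKL_INT nrhs,
                      double *a, MKL_INT *desca, MKL_INT *ipiv,
                      double *b, MKL_INT *descb, MKL_INT *info) {
    MKL_INT ia = 1, ja = 1, ib = 1, jb = 1;   /* operate on the full matrices */
    char trans = 'N';

    pdgetrf(&n, &n, a, &ia, &ja, desca, ipiv, info);
    if (*info == 0)
        pdgetrs(&trans, &n, &nrhs, a, &ia, &ja, desca, ipiv,
                b, &ib, &jb, descb, info);
}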
Table “Computational Routines for Systems of Linear Equations” lists the ScaLAPACK computational routines
for factorizing, equilibrating, and inverting matrices, estimating their condition numbers, solving systems of
equations with real matrices, refining the solution, and estimating its error.
Computational Routines for Systems of Linear Equations
• general (partial pivoting): factorize with p?getrf, equilibrate with p?geequ, solve with p?getrs, estimate the condition number with p?gecon, refine the solution and estimate its error with p?gerfs, invert with p?getri
• general band (partial pivoting): factorize with p?gbtrf, solve with p?gbtrs
• general band (no pivoting): factorize with p?dbtrf, solve with p?dbtrs
• general tridiagonal (no pivoting): factorize with p?dttrf, solve with p?dttrs
• symmetric/Hermitian positive-definite: factorize with p?potrf, equilibrate with p?poequ, solve with p?potrs, estimate the condition number with p?pocon, refine the solution and estimate its error with p?porfs, invert with p?potri
• symmetric/Hermitian positive-definite, band: factorize with p?pbtrf, solve with p?pbtrs
• symmetric/Hermitian positive-definite, tridiagonal: factorize with p?pttrf, solve with p?pttrs
• triangular: solve with p?trtrs, estimate the condition number with p?trcon, refine the solution and estimate its error with p?trrfs, invert with p?trtri
In this table ? stands for s (single precision real), d (double precision real), c (single precision complex), or z
(double precision complex).

Matrix Factorization: ScaLAPACK Computational Routines


This section describes the ScaLAPACK routines for matrix factorization. The following factorizations are
supported:

• LU factorization of general matrices
• LU factorization of diagonally dominant-like matrices
• Cholesky factorization of real symmetric or complex Hermitian positive-definite matrices
You can compute the factorizations using full and band storage of matrices.

p?getrf
Computes the LU factorization of a general m-by-n
distributed matrix.

Syntax
void psgetrf (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , MKL_INT *ipiv , MKL_INT *info );
void pdgetrf (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *ipiv , MKL_INT *info );

void pcgetrf (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *ipiv , MKL_INT *info );
void pzgetrf (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *ipiv , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description

The p?getrf function forms the LU factorization of a general m-by-n distributed matrix sub(A) = A(ia:ia+m-1, ja:ja+n-1) as

A = P*L*U
where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m>n)
and U is upper triangular (upper trapezoidal if m < n). L and U are stored in sub(A).

The function uses partial pivoting, with row interchanges.

NOTE
This function supports the Progress Routine feature. See mkl_progress for details.

Input Parameters

m (global) The number of rows in the distributed matrix sub(A); m≥0.

n (global) The number of columns in the distributed matrix sub(A); n≥0.

a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).

Contains the local pieces of the distributed matrix sub(A) to be factored.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

Output Parameters

a Overwritten by local pieces of the factors L and U from the factorization A =


P*L*U. The unit diagonal elements of L are not stored.

ipiv (local) Array of size LOCr(m_a)+ mb_a.

Contains the pivoting information: local row i was interchanged with global
row ipiv[i-1]. This array is tied to the distributed matrix A.

info (global)
If info=0, the execution is successful.


info < 0: if the i-th argument is an array and the j-th entry, indexed j - 1,
had an illegal value, then info = -(i*100+j); if the i-th argument is a
scalar and had an illegal value, then info = -i.

If info = i > 0, the element u(ia+i-1, ja+i-1) of the factor U is exactly 0. The factorization has been completed, but the factor U is exactly singular. Division by zero will occur if you use the factor U for solving a system of linear equations.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?gbtrf
Computes the LU factorization of a general n-by-n
banded distributed matrix.

Syntax
void psgbtrf (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , float *a , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *ipiv , float *af , MKL_INT *laf , float *work , MKL_INT
*lwork , MKL_INT *info );
void pdgbtrf (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , double *a , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *ipiv , double *af , MKL_INT *laf , double *work , MKL_INT
*lwork , MKL_INT *info );
void pcgbtrf (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_Complex8 *a , MKL_INT
*ja , MKL_INT *desca , MKL_INT *ipiv , MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8
*work , MKL_INT *lwork , MKL_INT *info );
void pzgbtrf (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_Complex16 *a , MKL_INT
*ja , MKL_INT *desca , MKL_INT *ipiv , MKL_Complex16 *af , MKL_INT *laf ,
MKL_Complex16 *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?gbtrf function computes the LU factorization of a general n-by-n real/complex banded distributed
matrix A(1:n, ja:ja+n-1) using partial pivoting with row interchanges.

The resulting factorization is not the same factorization as returned from the LAPACK function ?gbtrf.
Additional permutations are performed on the matrix for the sake of parallelism.
The factorization has the form
A(1:n, ja:ja+n-1) = P*L*U*Q

where P and Q are permutation matrices, and L and U are banded lower and upper triangular matrices,
respectively. The matrix Q represents reordering of columns for the sake of parallelism, while P represents
reordering of rows for numerical stability using classic partial pivoting.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

n (global) The number of rows and columns in the distributed submatrix


A(1:n, ja:ja+n-1); n≥ 0.

bwl (global) The number of sub-diagonals within the band of A


( 0 ≤bwl≤n-1 ).

bwu (global) The number of super-diagonals within the band of A


( 0 ≤bwu≤n-1 ).

a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1)
where
lld_a≥ 2*bwl + 2*bwu +1.
Contains the local pieces of the n-by-n distributed banded matrix A(1:n,
ja:ja+n-1) to be factored.

ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
If dtype_a = 501, then dlen_≥ 7;

else if dtype_a = 1, then dlen_≥ 9.

laf (local) The size of the array af.


Must be laf≥ (nb_a+bwu)*(bwl+bwu)+6*(bwl+bwu)*(bwl+2*bwu).

If laf is not large enough, an error code will be returned and the minimum
acceptable size will be returned in af[0].

work (local) Same type as a. Workspace array of size lwork.

lwork (local or global) The size of the work array (lwork≥ 1). If lwork is too
small, the minimal acceptable size will be returned in work[0] and an error
code is returned.

Output Parameters

a On exit, this array contains details of the factorization. Note that additional
permutations are performed on the matrix, so that the factors returned are
different from those returned by LAPACK.


ipiv (local) array.


The size of ipiv must be ≥nb_a.

Contains pivot indices for local factorizations. Note that you should not alter
the contents of this array between factorization and solve.

af (local)
Array of size laf.

Auxiliary fill-in space. The fill-in space is created in a call to the factorization
function p?gbtrf and is stored in af.

Note that if a linear system is to be solved using p?gbtrs after the


factorization function, af must not be altered after the factorization.

work[0] On exit, work[0] contains the minimum value of lwork required.

info (global)
If info=0, the execution is successful.

info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

info> 0:
If info = k≤NPROCS, the submatrix stored on processor info and factored
locally was not nonsingular, and the factorization was not completed.
If info = k > NPROCS, the submatrix stored on processor info-NPROCS
representing interactions with other processors was not nonsingular, and
the factorization was not completed.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?dbtrf
Computes the LU factorization of a n-by-n diagonally
dominant-like banded distributed matrix.

Syntax
void psdbtrf (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , float *a , MKL_INT *ja ,
MKL_INT *desca , float *af , MKL_INT *laf , float *work , MKL_INT *lwork , MKL_INT
*info );
void pddbtrf (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , double *a , MKL_INT *ja ,
MKL_INT *desca , double *af , MKL_INT *laf , double *work , MKL_INT *lwork , MKL_INT
*info );
void pcdbtrf (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_Complex8 *a , MKL_INT
*ja , MKL_INT *desca , MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzdbtrf (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_Complex16 *a , MKL_INT
*ja , MKL_INT *desca , MKL_Complex16 *af , MKL_INT *laf , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?dbtrf function computes the LU factorization of an n-by-n real/complex diagonally dominant-like banded distributed matrix A(1:n, ja:ja+n-1) without pivoting.

NOTE
A matrix is called diagonally dominant-like if pivoting is not required for LU to be numerically stable.

Note that the resulting factorization is not the same factorization as returned from LAPACK. Additional
permutations are performed on the matrix for the sake of parallelism.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

n (global) The number of rows and columns in the distributed submatrix


A(1:n, ja:ja+n-1); n≥ 0.

bwl (global) The number of sub-diagonals within the band of A


(0 ≤bwl≤n-1).

bwu (global) The number of super-diagonals within the band of A


(0 ≤bwu≤n-1).

a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).

Contains the local pieces of the n-by-n distributed banded matrix A(1:n,
ja:ja+n-1) to be factored.

ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
If dtype_a = 501, then dlen_≥ 7;

else if dtype_a = 1, then dlen_≥ 9.

laf (local) The size of the array af.


Must be laf ≥ NB*(bwl+bwu) + 6*(max(bwl,bwu))^2.

If laf is not large enough, an error code will be returned and the minimum
acceptable size will be returned in af[0].

work (local) Workspace array of size lwork.

lwork (local or global) The size of the work array, must be lwork ≥ (max(bwl,bwu))^2. If lwork is too small, the minimal acceptable size will be returned in work[0] and an error code is returned.

Output Parameters

a On exit, this array contains details of the factorization. Note that additional
permutations are performed on the matrix, so that the factors returned are
different from those returned by LAPACK.

af (local)
Array of size laf.

Auxiliary fill-in space. The fill-in space is created in a call to the factorization
function p?dbtrf and is stored in af.

Note that if a linear system is to be solved using p?dbtrs after the


factorization function, af must not be altered after the factorization.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
If info=0, the execution is successful.

info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

info> 0:
If info = k≤NPROCS, the submatrix stored on processor info and factored
locally was not diagonally dominant-like, and the factorization was not
completed.
If info = k > NPROCS, the submatrix stored on processor info-NPROCS
representing interactions with other processors was not nonsingular, and
the factorization was not completed.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?dttrf
Computes the LU factorization of a diagonally
dominant-like tridiagonal distributed matrix.

Syntax
void psdttrf (MKL_INT *n , float *dl , float *d , float *du , MKL_INT *ja , MKL_INT
*desca , float *af , MKL_INT *laf , float *work , MKL_INT *lwork , MKL_INT *info );
void pddttrf (MKL_INT *n , double *dl , double *d , double *du , MKL_INT *ja , MKL_INT
*desca , double *af , MKL_INT *laf , double *work , MKL_INT *lwork , MKL_INT *info );
void pcdttrf (MKL_INT *n , MKL_Complex8 *dl , MKL_Complex8 *d , MKL_Complex8 *du ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzdttrf (MKL_INT *n , MKL_Complex16 *dl , MKL_Complex16 *d , MKL_Complex16 *du ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *af , MKL_INT *laf , MKL_Complex16
*work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?dttrf function computes the LU factorization of an n-by-n real/complex diagonally dominant-like tridiagonal distributed matrix A(1:n, ja:ja+n-1) without pivoting for stability.

The resulting factorization is not the same factorization as returned from LAPACK. Additional permutations
are performed on the matrix for the sake of parallelism.
The factorization has the form:
A(1:n, ja:ja+n-1) = P*L*U*P^T,

where P is a permutation matrix, and L and U are banded lower and upper triangular matrices, respectively.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804
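
A minimal double precision factorization sketch, using the laf and lwork lower bounds given below. The arrays dl, d, and du hold the local diagonal pieces, npcol is the number of process columns in the BLACS grid, and the caller-supplied fill-in array af (at least 2*(nb_a + 2) elements) must be preserved for a subsequent pddttrs solve.

#include <stdlib.h>
#include <mkl_scalapack.h>

/* Factor a diagonally dominant-like tridiagonal distributed matrix with
   pddttrf. Returns the ScaLAPACK info value, or -1 on allocation failure.  */
MKL_INT tridiag_factor(MKL_INT n, double *dl, double *d, double *du,
                       MKL_INT ja, MKL_INT *desca,
                       double *af, MKL_INT laf, MKL_INT npcol) {
    MKL_INT lwork = 8 * npcol;                 /* lwork >= 8*NPCOL           */
    MKL_INT info = 0;
    double *work = (double*)malloc((size_t)lwork * sizeof(double));
    if (work == NULL) return -1;

    pddttrf(&n, dl, d, du, &ja, desca, af, &laf, work, &lwork, &info);

    free(work);
    return info;                               /* af now holds the fill-in   */
}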

Input Parameters

n (global) The number of rows and columns to be operated on, that is, the
order of the distributed submatrix A(1:n, ja:ja+n-1) (n≥ 0).

dl, d, du (local)
Pointers to the local arrays of size nb_a each.

On entry, the array dl contains the local part of the global vector storing
the subdiagonal elements of the matrix. Globally, dl[0] is not referenced,
and dl must be aligned with d.

On entry, the array d contains the local part of the global vector storing the
diagonal elements of the matrix.


On entry, the array du contains the local part of the global vector storing
the super-diagonal elements of the matrix. du[n-1] is not referenced, and
du must be aligned with d.

ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
If dtype_a = 501, then dlen_≥ 7;

else if dtype_a = 1, then dlen_≥ 9.

laf (local) The size of the array af.

Must be laf≥ 2*(NB+2) .

If laf is not large enough, an error code will be returned and the minimum
acceptable size will be returned in af[0].

work (local) Same type as d. Workspace array of size lwork.

lwork (local or global) The size of the work array, must be at least lwork≥
8*NPCOL.

Output Parameters

dl, d, du On exit, overwritten by the information containing the factors of the matrix.

af (local)
Array of size laf.

Auxiliary fill-in space. The fill-in space is created in a call to the factorization
function p?dttrf and is stored in af.

Note that if a linear system is to be solved using p?dttrs after the


factorization function, af must not be altered.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
If info=0, the execution is successful.

info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

info> 0:
If info = k≤NPROCS, the submatrix stored on processor info and factored
locally was not diagonally dominant-like, and the factorization was not
completed.

If info = k > NPROCS, the submatrix stored on processor info-NPROCS
representing interactions with other processors was not nonsingular, and
the factorization was not completed.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?potrf
Computes the Cholesky factorization of a symmetric
(Hermitian) positive-definite distributed matrix.

Syntax
void pspotrf (char *uplo , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , MKL_INT *info );
void pdpotrf (char *uplo , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *info );
void pcpotrf (char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *info );
void pzpotrf (char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?potrf function computes the Cholesky factorization of a real symmetric or complex Hermitian positive-definite distributed n-by-n matrix A(ia:ia+n-1, ja:ja+n-1), denoted below as sub(A).

The factorization has the form


sub(A) = U^H*U if uplo='U', or

sub(A) = L*L^H if uplo='L'

where L is a lower triangular matrix and U is upper triangular.
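
A minimal call sketch for the double precision case; the distributed matrix, its descriptor desca, and the BLACS grid are assumed to be set up already, and only the triangle selected by uplo needs to be stored in a.

#include <mkl_scalapack.h>

/* Cholesky-factor the n-by-n distributed matrix sub(A) = A(ia:ia+n-1,
   ja:ja+n-1) in place; on success the selected triangle of a holds L or U.  */
MKL_INT cholesky_factor(MKL_INT n, double *a, MKL_INT ia, MKL_INT ja,
                        MKL_INT *desca) {
    char uplo = 'L';      /* factor as sub(A) = L*L^H (L*L^T in the real case) */
    MKL_INT info = 0;

    pdpotrf(&uplo, &n, a, &ia, &ja, desca, &info);
    return info;          /* info = k > 0: leading minor of order k not positive-definite */
}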

Input Parameters

uplo (global)
Indicates whether the upper or lower triangular part of sub(A) is stored.
Must be 'U' or 'L'.

If uplo = 'U', the array a stores the upper triangular part of the matrix sub(A) that is factored as U^H*U.
If uplo = 'L', the array a stores the lower triangular part of the matrix sub(A) that is factored as L*L^H.
n (global) The order of the distributed matrix sub(A) (n≥0).

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).


On entry, this array contains the local pieces of the n-by-n symmetric/
Hermitian distributed matrix sub(A) to be factored.
Depending on uplo, the array a contains either the upper or the lower
triangular part of the matrix sub(A) (see uplo).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

Output Parameters

a The upper or lower triangular part of a is overwritten by the Cholesky factor


U or L, as specified by uplo.

info (global) .
If info=0, the execution is successful;

info < 0: if the i-th argument is an array, and the j-th entry, indexed j -
1, had an illegal value, then info = -(i*100+j); if the i-th argument is a
scalar and had an illegal value, then info = -i.

If info = k >0, the leading minor of order k, A(ia:ia+k-1, ja:ja+k-1), is


not positive-definite, and the factorization could not be completed.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?pbtrf
Computes the Cholesky factorization of a symmetric
(Hermitian) positive-definite banded distributed
matrix.

Syntax
void pspbtrf (char *uplo , MKL_INT *n , MKL_INT *bw , float *a , MKL_INT *ja , MKL_INT
*desca , float *af , MKL_INT *laf , float *work , MKL_INT *lwork , MKL_INT *info );
void pdpbtrf (char *uplo , MKL_INT *n , MKL_INT *bw , double *a , MKL_INT *ja ,
MKL_INT *desca , double *af , MKL_INT *laf , double *work , MKL_INT *lwork , MKL_INT
*info );
void pcpbtrf (char *uplo , MKL_INT *n , MKL_INT *bw , MKL_Complex8 *a , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzpbtrf (char *uplo , MKL_INT *n , MKL_INT *bw , MKL_Complex16 *a , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex16 *af , MKL_INT *laf , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?pbtrf function computes the Cholesky factorization of an n-by-n real symmetric or complex Hermitian positive-definite banded distributed matrix A(1:n, ja:ja+n-1).

The resulting factorization is not the same factorization as returned from LAPACK. Additional permutations
are performed on the matrix for the sake of parallelism.
The factorization has the form:
A(1:n, ja:ja+n-1) = P*U^H*U*P^T, if uplo='U', or

A(1:n, ja:ja+n-1) = P*L*L^H*P^T, if uplo='L',

where P is a permutation matrix and U and L are banded upper and lower triangular matrices, respectively.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804
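
A minimal double precision sketch of the factorization call, using the laf and lwork lower bounds given below; the caller-supplied fill-in array af (at least (nb_a + 2*bw)*bw elements, where nb_a is the blocking factor from the type-501 descriptor) must stay unmodified if p?pbtrs is called afterwards.

#include <stdlib.h>
#include <mkl_scalapack.h>

/* Cholesky factorization of a distributed symmetric positive-definite band
   matrix with pdpbtrf. Returns the ScaLAPACK info value, or -1 if the
   workspace could not be allocated.                                         */
MKL_INT band_cholesky(MKL_INT n, MKL_INT bw, MKL_INT ja,
                      double *a, MKL_INT *desca,
                      double *af, MKL_INT laf) {
    char uplo = 'L';
    MKL_INT lwork = (bw > 0) ? bw * bw : 1;    /* lwork >= bw^2               */
    MKL_INT info = 0;
    double *work = (double*)malloc((size_t)lwork * sizeof(double));
    if (work == NULL) return -1;

    pdpbtrf(&uplo, &n, &bw, a, &ja, desca, af, &laf, work, &lwork, &info);

    free(work);
    return info;
}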

Input Parameters

uplo (global) Must be 'U' or 'L'.

If uplo = 'U', upper triangle of A(1:n, ja:ja+n-1) is stored;

If uplo = 'L', lower triangle of A(1:n, ja:ja+n-1) is stored.

n (global) The order of the distributed submatrix A(1:n, ja:ja+n-1).

(n≥0).

bw (global)
The number of superdiagonals of the distributed matrix if uplo = 'U', or
the number of subdiagonals if uplo = 'L' (bw≥0).

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).

On entry, this array contains the local pieces of the upper or lower triangle
of the symmetric/Hermitian band distributed matrix A(1:n, ja:ja+n-1) to
be factored.

ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
If dtype_a = 501, then dlen_≥ 7;

else if dtype_a = 1, then dlen_≥ 9.


laf (local) The size of the array af.

Must be laf≥ (NB+2*bw)*bw.

If laf is not large enough, an error code will be returned and the minimum
acceptable size will be returned in af[0].

work (local) Workspace array of size lwork.

lwork (local or global) The size of the work array, must be lwork ≥ bw^2.

Output Parameters

a On exit, if info=0, contains the permuted triangular factor U or L from the


Cholesky factorization of the band matrix A(1:n, ja:ja+n-1), as specified
by uplo.

af (local)
Array of size laf. Auxiliary fill-in space. The fill-in space is created in a call
to the factorization function p?pbtrf and stored in af. Note that if a linear
system is to be solved using p?pbtrs after the factorization function, af
must not be altered.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
If info=0, the execution is successful.

info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

info>0:
If info = k≤NPROCS, the submatrix stored on processor info and factored
locally was not positive definite, and the factorization was not completed.
If info = k > NPROCS, the submatrix stored on processor info-NPROCS
representing interactions with other processors was not nonsingular, and
the factorization was not completed.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?pttrf
Computes the Cholesky factorization of a symmetric
(Hermitian) positive-definite tridiagonal distributed
matrix.

Syntax
void pspttrf (MKL_INT *n , float *d , float *e , MKL_INT *ja , MKL_INT *desca , float
*af , MKL_INT *laf , float *work , MKL_INT *lwork , MKL_INT *info );

void pdpttrf (MKL_INT *n , double *d , double *e , MKL_INT *ja , MKL_INT *desca ,
double *af , MKL_INT *laf , double *work , MKL_INT *lwork , MKL_INT *info );
void pcpttrf (MKL_INT *n , float *d , MKL_Complex8 *e , MKL_INT *ja , MKL_INT *desca ,
MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT
*info );
void pzpttrf (MKL_INT *n , double *d , MKL_Complex16 *e , MKL_INT *ja , MKL_INT
*desca , MKL_Complex16 *af , MKL_INT *laf , MKL_Complex16 *work , MKL_INT *lwork ,
MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?pttrf function computes the Cholesky factorization of an n-by-n real symmetric or complex Hermitian positive-definite tridiagonal distributed matrix A(1:n, ja:ja+n-1).

The resulting factorization is not the same factorization as returned from LAPACK. Additional permutations
are performed on the matrix for the sake of parallelism.
The factorization has the form:
A(1:n, ja:ja+n-1) = P*L*D*L^H*P^T, or

A(1:n, ja:ja+n-1) = P*U^H*D*U*P^T,

where P is a permutation matrix, and U and L are tridiagonal upper and lower triangular matrices,
respectively.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804
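
A minimal double precision sketch, using the laf and lwork lower bounds given below; d and e hold the local pieces of the main and upper diagonals, npcol is the number of process columns, and the caller-supplied fill-in array af (at least nb_a + 2 elements) must be preserved for a subsequent pdpttrs solve.

#include <stdlib.h>
#include <mkl_scalapack.h>

/* Factor a distributed symmetric positive-definite tridiagonal matrix with
   pdpttrf. Returns the ScaLAPACK info value, or -1 on allocation failure.   */
MKL_INT tridiag_cholesky(MKL_INT n, double *d, double *e,
                         MKL_INT ja, MKL_INT *desca,
                         double *af, MKL_INT laf, MKL_INT npcol) {
    MKL_INT lwork = 8 * npcol;                 /* lwork >= 8*NPCOL            */
    MKL_INT info = 0;
    double *work = (double*)malloc((size_t)lwork * sizeof(double));
    if (work == NULL) return -1;

    pdpttrf(&n, d, e, &ja, desca, af, &laf, work, &lwork, &info);

    free(work);
    return info;
}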

Input Parameters

n (global) The order of the distributed submatrix A(1:n, ja:ja+n-1)

(n≥ 0).

d, e (local)
Pointers into the local memory to arrays of size nb_a each.

On entry, the array d contains the local part of the global vector storing the
main diagonal of the distributed matrix A.
On entry, the array e contains the local part of the global vector storing the
upper diagonal of the distributed matrix A.

ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).


desca (global and local ) array of size dlen_. The array descriptor for the
distributed matrix A.
If dtype_a = 501, then dlen_≥ 7;

else if dtype_a = 1, then dlen_≥ 9.

laf (local) The size of the array af.

Must be laf≥nb_a+2.

If laf is not large enough, an error code will be returned and the minimum
acceptable size will be returned in af[0].

work (local) Workspace array of size lwork .

lwork (local or global) The size of the work array, must be at least

lwork≥ 8*NPCOL.

Output Parameters

d, e On exit, overwritten by the details of the factorization.

af (local)
Array of size laf.

Auxiliary fill-in space. The fill-in space is created in a call to the factorization
function p?pttrf and stored in af.

Note that if a linear system is to be solved using p?pttrs after the


factorization function, af must not be altered.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
If info=0, the execution is successful.

info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

info> 0:
If info = k≤NPROCS, the submatrix stored on processor info and factored
locally was not positive definite, and the factorization was not completed.
If info = k > NPROCS, the submatrix stored on processor info-NPROCS
representing interactions with other processors was not nonsingular, and
the factorization was not completed.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

Solving Systems of Linear Equations: ScaLAPACK Computational Routines
This section describes the ScaLAPACK routines for solving systems of linear equations. Before calling most of
these routines, you need to factorize the matrix of your system of equations (see Routines for Matrix
Factorization in this chapter). However, the factorization is not necessary if your system of equations has a
triangular matrix.

p?getrs
Solves a system of distributed linear equations with a
general square matrix, using the LU factorization
computed by p?getrf.

Syntax
void psgetrs (char *trans , MKL_INT *n , MKL_INT *nrhs , float *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , float *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , MKL_INT *info );
void pdgetrs (char *trans , MKL_INT *n , MKL_INT *nrhs , double *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , double *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , MKL_INT *info );
void pcgetrs (char *trans , MKL_INT *n , MKL_INT *nrhs , MKL_Complex8 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , MKL_Complex8 *b , MKL_INT *ib ,
MKL_INT *jb , MKL_INT *descb , MKL_INT *info );
void pzgetrs (char *trans , MKL_INT *n , MKL_INT *nrhs , MKL_Complex16 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , MKL_Complex16 *b , MKL_INT *ib ,
MKL_INT *jb , MKL_INT *descb , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?getrs function solves a system of distributed linear equations with a general n-by-n distributed matrix sub(A) = A(ia:ia+n-1, ja:ja+n-1) using the LU factorization computed by p?getrf.

The system has one of the following forms specified by trans:


sub(A)*X = sub(B) (no transpose),
sub(A)^T*X = sub(B) (transpose),
sub(A)^H*X = sub(B) (conjugate transpose),
where sub(B) = B(ib:ib+n-1, jb:jb+nrhs-1).

Before calling this function, you must call p?getrf to compute the LU factorization of sub(A).

Input Parameters

trans (global) Must be 'N' or 'T' or 'C'.

Indicates the form of the equations:


If trans = 'N', then sub(A)*X = sub(B) is solved for X.

If trans = 'T', then sub(A)^T*X = sub(B) is solved for X.

If trans = 'C', then sub(A)^H*X = sub(B) is solved for X.


n (global) The number of linear equations; the order of the matrix sub(A)
(n≥0).

nrhs (global) The number of right hand sides; the number of columns of the
distributed matrix sub(B) (nrhs≥0).

a, b (local)
Pointers into the local memory to arrays of local sizes lld_a*LOCc(ja+n-1)
and lld_b*LOCc(jb+nrhs-1), respectively.

On entry, the array a contains the local pieces of the factors L and U from
the factorization sub(A) = P*L*U; the unit diagonal elements of L are not
stored. On entry, the array b contains the right hand sides sub(B).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

ipiv (local) Array of size of LOCr(m_a) + mb_a. Contains the pivoting


information: local row i of the matrix was interchanged with the global row
ipiv[i-1].
This array is tied to the distributed matrix A.

ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the matrix sub(B), respectively.

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.

Output Parameters

b On exit, overwritten by the solution distributed matrix X.

info If info=0, the execution is successful. info < 0:

If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?gbtrs
Solves a system of distributed linear equations with a
general band matrix, using the LU factorization
computed by p?gbtrf.

Syntax
void psgbtrs (char *trans , MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs ,
float *a , MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , float *b , MKL_INT *ib ,
MKL_INT *descb , float *af , MKL_INT *laf , float *work , MKL_INT *lwork , MKL_INT
*info );

void pdgbtrs (char *trans , MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs ,
double *a , MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , double *b , MKL_INT *ib ,
MKL_INT *descb , double *af , MKL_INT *laf , double *work , MKL_INT *lwork , MKL_INT
*info );
void pcgbtrs (char *trans , MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs ,
MKL_Complex8 *a , MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , MKL_Complex8 *b ,
MKL_INT *ib , MKL_INT *descb , MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzgbtrs (char *trans , MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs ,
MKL_Complex16 *a , MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , MKL_Complex16 *b ,
MKL_INT *ib , MKL_INT *descb , MKL_Complex16 *af , MKL_INT *laf , MKL_Complex16
*work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?gbtrs function solves a system of distributed linear equations with a general band distributed matrix
sub(A) = A(1:n, ja:ja+n-1) using the LU factorization computed by p?gbtrf.

The system has one of the following forms specified by trans:

sub(A)*X = sub(B) (no transpose),


sub(A)^T*X = sub(B) (transpose),
sub(A)^H*X = sub(B) (conjugate transpose),
where sub(B) = B(ib:ib+n-1, 1:nrhs).

Before calling this function, you must call p?gbtrf to compute the LU factorization of sub(A).

Input Parameters

trans (global) Must be 'N' or 'T' or 'C'.

Indicates the form of the equations:


If trans = 'N', then sub(A)*X = sub(B) is solved for X.

If trans = 'T', then sub(A)^T*X = sub(B) is solved for X.

If trans = 'C', then sub(A)^H*X = sub(B) is solved for X.

n (global) The number of linear equations; the order of the distributed matrix
sub(A) (n≥ 0).

bwl (global) The number of sub-diagonals within the band of A( 0


≤bwl≤n-1 ).

bwu (global) The number of super-diagonals within the band of A( 0


≤bwu≤n-1 ).

nrhs (global) The number of right hand sides; the number of columns of the
distributed matrix sub(B) (nrhs≥ 0).

a, b (local)


Pointers into the local memory to arrays of local sizes lld_a*LOCc(ja+n-1)


and lld_b*LOCc(nrhs), respectively.

The array a contains details of the LU factorization of the distributed band


matrix A.
On entry, the array b contains the local pieces of the right hand sides
B(ib:ib+n-1, 1:nrhs).

ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on ( which may be either all of A or a submatrix of A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
If dtype_a = 501, then dlen_≥ 7;

else if dtype_a = 1, then dlen_≥ 9.

ib (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
If dtype_b = 502, then dlen_≥ 7;

else if dtype_b = 1, then dlen_≥ 9.

laf (local) The size of the array af.

Must be laf≥nb_a*(bwl+bwu)+6*(bwl+bwu)*(bwl+2*bwu).

If laf is not large enough, an error code will be returned and the minimum
acceptable size will be returned in af[0].

work (local) Same type as a. Workspace array of size lwork.

lwork (local or global) The size of the work array, must be at least
lwork≥nrhs*(nb_a+2*bwl+4*bwu).

Output Parameters

ipiv (local) array.


The size of ipiv must be ≥nb_a.

Contains pivot indices for local factorizations. Note that you should not alter
the contents of this array between factorization and solve.

b On exit, overwritten by the local pieces of the solution distributed matrix X.

af (local)
Array of size laf.

Auxiliary Fill-in space. The fill-in space is created in a call to the


factorization function p?gbtrf and is stored in af.

Note that if a linear system is to be solved using p?gbtrs after the


factorization function, af must not be altered after the factorization.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

info If info=0, the execution is successful.

info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?dbtrs
Solves a system of linear equations with a diagonally
dominant-like banded distributed matrix using the
factorization computed by p?dbtrf.

Syntax
void psdbtrs (char *trans , MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs ,
float *a , MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *descb ,
float *af , MKL_INT *laf , float *work , MKL_INT *lwork , MKL_INT *info );
void pddbtrs (char *trans , MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs ,
double *a , MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *descb ,
double *af , MKL_INT *laf , double *work , MKL_INT *lwork , MKL_INT *info );
void pcdbtrs (char *trans , MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs ,
MKL_Complex8 *a , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib ,
MKL_INT *descb , MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzdbtrs (char *trans , MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs ,
MKL_Complex16 *a , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib ,
MKL_INT *descb , MKL_Complex16 *af , MKL_INT *laf , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?dbtrs function solves for X one of the systems of equations:

sub(A)*X = sub(B),
(sub(A))^T*X = sub(B), or
(sub(A))^H*X = sub(B),
where sub(A) = A(1:n, ja:ja+n-1) is a diagonally dominant-like banded distributed matrix, and sub(B)
denotes the distributed matrix B(ib:ib+n-1, 1:nrhs).

This function uses the LU factorization computed by p?dbtrf.
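
For illustration only, the following minimal sketch shows the double-precision solve step, assuming the band matrix has already been distributed and factored with pddbtrf (so a, af, and the descriptors come from that call); the helper name solve_banded_system and its argument list are not part of the library.

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Sketch: solve sub(A)*X = sub(B) using a factorization produced by pddbtrf.
   a, af, desca, b, descb are assumed to describe already-distributed data. */
void solve_banded_system(MKL_INT n, MKL_INT bwl, MKL_INT bwu, MKL_INT nrhs,
                         double *a, MKL_INT ja, MKL_INT *desca,
                         double *b, MKL_INT ib, MKL_INT *descb,
                         double *af, MKL_INT laf, MKL_INT *info)
{
    char trans = 'N';
    MKL_INT bwmax = (bwl > bwu) ? bwl : bwu;
    MKL_INT lwork = bwmax * bwmax;           /* documented minimum: (max(bwl,bwu))^2 */
    if (lwork < 1) lwork = 1;
    double *work = (double *)malloc(lwork * sizeof(double));
    pddbtrs(&trans, &n, &bwl, &bwu, &nrhs, a, &ja, desca,
            b, &ib, descb, af, &laf, work, &lwork, info);
    free(work);                              /* on exit, b holds the solution X */
}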


Input Parameters

trans (global) Must be 'N' or 'T' or 'C'.

Indicates the form of the equations:


If trans = 'N', then sub(A)*X = sub(B) is solved for X.

If trans = 'T', then (sub(A))^T*X = sub(B) is solved for X.

If trans = 'C', then (sub(A))^H*X = sub(B) is solved for X.

n (global) The order of the distributed matrix sub(A) (n≥ 0).

bwl (global) The number of subdiagonals within the band of A (0 ≤ bwl ≤ n-1).

bwu (global) The number of superdiagonals within the band of A (0 ≤ bwu ≤ n-1).

nrhs (global) The number of right hand sides; the number of columns of the
distributed matrix sub(B) (nrhs≥ 0).

a, b (local)
Pointers into the local memory to arrays of local sizes lld_a*LOCc(ja+n-1)
and lld_b*LOCc(nrhs), respectively.

On entry, the array a contains details of the LU factorization of the band
matrix A, as computed by p?dbtrf.

On entry, the array b contains the local pieces of the right hand side
distributed matrix sub(B).

ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
If dtype_a = 501, then dlen_≥ 7;

else if dtype_a = 1, then dlen_≥ 9.

ib (global) The row index in the global matrix B indicating the first row of the
matrix to be operated on (which may be either all of B or a submatrix of B).

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
If dtype_b = 502, then dlen_≥ 7;

else if dtype_b = 1, then dlen_≥ 9.

af, work (local)
Arrays of size laf and lwork, respectively. The array af contains auxiliary
fill-in space. The fill-in space is created in a call to the factorization
function p?dbtrf and is stored in af.
The array work is a workspace array.

laf (local) The size of the array af.

Must be laf≥NB*(bwl+bwu)+6*(max(bwl,bwu))^2.

If laf is not large enough, an error code will be returned and the minimum
acceptable size will be returned in af[0].

lwork (local or global) The size of the array work, must be at least

lwork≥ (max(bwl,bwu))^2.

Output Parameters

b On exit, this array contains the local pieces of the solution distributed
matrix X.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

info If info=0, the execution is successful. info < 0:

If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?dttrs
Solves a system of linear equations with a diagonally
dominant-like tridiagonal distributed matrix using the
factorization computed by p?dttrf.

Syntax
void psdttrs (char *trans , MKL_INT *n , MKL_INT *nrhs , float *dl , float *d , float
*du , MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *descb , float
*af , MKL_INT *laf , float *work , MKL_INT *lwork , MKL_INT *info );
void pddttrs (char *trans , MKL_INT *n , MKL_INT *nrhs , double *dl , double *d ,
double *du , MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *descb ,
double *af , MKL_INT *laf , double *work , MKL_INT *lwork , MKL_INT *info );
void pcdttrs (char *trans , MKL_INT *n , MKL_INT *nrhs , MKL_Complex8 *dl ,
MKL_Complex8 *d , MKL_Complex8 *du , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b ,
MKL_INT *ib , MKL_INT *descb , MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzdttrs (char *trans , MKL_INT *n , MKL_INT *nrhs , MKL_Complex16 *dl ,
MKL_Complex16 *d , MKL_Complex16 *du , MKL_INT *ja , MKL_INT *desca , MKL_Complex16
*b , MKL_INT *ib , MKL_INT *descb , MKL_Complex16 *af , MKL_INT *laf , MKL_Complex16
*work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h


Description
The p?dttrs function solves for X one of the systems of equations:

sub(A)*X = sub(B),
(sub(A))^T*X = sub(B), or
(sub(A))^H*X = sub(B),
where sub(A) = A(1:n, ja:ja+n-1) is a diagonally dominant-like tridiagonal distributed matrix, and sub(B)
denotes the distributed matrix B(ib:ib+n-1, 1:nrhs).

This function uses the LU factorization computed by p?dttrf.
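
As an illustration, a minimal double-precision sketch follows. It assumes dl, d, du, and af already hold the factorization produced by pddttrf, and that npcol (the number of process columns, obtainable from blacs_gridinfo) is known; the helper name solve_tridiag_system is illustrative only.

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Sketch: tridiagonal solve after pddttrf. The workspace size follows the
   documented minimum lwork >= 10*NPCOL + 4*nrhs. */
void solve_tridiag_system(MKL_INT n, MKL_INT nrhs,
                          double *dl, double *d, double *du,
                          MKL_INT ja, MKL_INT *desca,
                          double *b, MKL_INT ib, MKL_INT *descb,
                          double *af, MKL_INT laf,
                          MKL_INT npcol, MKL_INT *info)
{
    char trans = 'N';
    MKL_INT lwork = 10 * npcol + 4 * nrhs;
    double *work = (double *)malloc(lwork * sizeof(double));
    pddttrs(&trans, &n, &nrhs, dl, d, du, &ja, desca,
            b, &ib, descb, af, &laf, work, &lwork, info);
    free(work);                              /* b now holds the solution X */
}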

Input Parameters

trans (global) Must be 'N' or 'T' or 'C'.

Indicates the form of the equations:


If trans = 'N', then sub(A)*X = sub(B) is solved for X.

If trans = 'T', then (sub(A))^T*X = sub(B) is solved for X.

If trans = 'C', then (sub(A))^H*X = sub(B) is solved for X.

n (global) The order of the distributed matrix sub(A) (n≥ 0).

nrhs (global) The number of right hand sides; the number of columns of the
distributed matrix sub(B) (nrhs≥ 0).

dl, d, du (local)
Pointers to the local arrays of size nb_a each.

On entry, these arrays contain details of the factorization. Globally, dl[0]
and du[n-1] are not referenced; dl and du must be aligned with d.

ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
If dtype_a = 501 or dtype_a = 502, then dlen_≥ 7;

else if dtype_a = 1, then dlen_≥ 9.

b (local) Same type as d.

Pointer into the local memory to an array of local size lld_b*LOCc(nrhs).

On entry, the array b contains the local pieces of the n-by-nrhs right hand
side distributed matrix sub(B).

ib (global) The row index in the global matrix B indicating the first row of the
matrix to be operated on (which may be either all of B or a submatrix of B).

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
If dtype_b = 502, then dlen_≥ 7;

else if dtype_b = 1, then dlen_≥ 9.

af, work (local)
Arrays of size laf and lwork, respectively.

The array af contains auxiliary fill-in space. The fill-in space is created in a
call to the factorization function p?dttrf and is stored in af. If a linear
system is to be solved using p?dttrs after the factorization function, af
must not be altered.
The array work is a workspace array.

laf (local) The size of the array af.

Must be laf≥NB*(bwl+bwu)+6*(bwl+bwu)*(bwl+2*bwu).

If laf is not large enough, an error code will be returned and the minimum
acceptable size will be returned in af[0].

lwork (local or global) The size of the array work, must be at least lwork≥
10*NPCOL+4*nrhs.

Output Parameters

b On exit, this array contains the local pieces of the solution distributed
matrix X.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

info If info=0, the execution is successful. info < 0:

If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?potrs
Solves a system of linear equations with a Cholesky-
factored symmetric/Hermitian distributed positive-
definite matrix.

Syntax
void pspotrs (char *uplo , MKL_INT *n , MKL_INT *nrhs , float *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb ,
MKL_INT *info );
void pdpotrs (char *uplo , MKL_INT *n , MKL_INT *nrhs , double *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , MKL_INT *info );
void pcpotrs (char *uplo , MKL_INT *n , MKL_INT *nrhs , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , MKL_INT *info );


void pzpotrs (char *uplo , MKL_INT *n , MKL_INT *nrhs , MKL_Complex16 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?potrs function solves for X a system of distributed linear equations in the form:

sub(A)*X = sub(B) ,
where sub(A) = A(ia:ia+n-1, ja:ja+n-1) is an n-by-n real symmetric or complex Hermitian positive
definite distributed matrix, and sub(B) denotes the distributed matrix B(ib:ib+n-1, jb:jb+nrhs-1).

This function uses the Cholesky factorization

sub(A) = U^H*U, or sub(A) = L*L^H

computed by p?potrf.
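
A minimal double-precision call sketch is shown below; it assumes the distributed matrix has already been factored with pdpotrf using the same uplo, and that the descriptors describe valid distributed arrays. The helper name cholesky_solve is illustrative only.

#include "mkl_scalapack.h"

/* Sketch: solve sub(A)*X = sub(B) with the Cholesky factor produced by pdpotrf.
   pdpotrs needs no workspace; b is overwritten with the solution X. */
void cholesky_solve(MKL_INT n, MKL_INT nrhs,
                    double *a, MKL_INT ia, MKL_INT ja, MKL_INT *desca,
                    double *b, MKL_INT ib, MKL_INT jb, MKL_INT *descb,
                    MKL_INT *info)
{
    char uplo = 'L';   /* must match the uplo passed to pdpotrf */
    pdpotrs(&uplo, &n, &nrhs, a, &ia, &ja, desca, b, &ib, &jb, descb, info);
}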

Input Parameters

uplo (global) Must be 'U' or 'L'.

If uplo = 'U', upper triangle of sub(A) is stored;

If uplo = 'L', lower triangle of sub(A) is stored.

n (global) The order of the distributed matrix sub(A) (n≥0).

nrhs (global) The number of right hand sides; the number of columns of the
distributed matrix sub(B) (nrhs≥0).

a, b (local)
Pointers into the local memory to arrays of local sizes
lld_a*LOCc(ja+n-1) and lld_b*LOCc(jb+nrhs-1), respectively.

The array a contains the factors L or U from the Cholesky factorization


sub(A) = L*LH or sub(A) = UH*U, as computed by p?potrf.

On entry, the array b contains the local pieces of the right hand sides
sub(B).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the matrix sub(B), respectively.

descb (local) array of size dlen_. The array descriptor for the distributed matrix B.

Output Parameters

b Overwritten by the local pieces of the solution matrix X.

info If info=0, the execution is successful.

info < 0: if the i-th argument is an array and the j-th entry, indexed j - 1,
had an illegal value, then info = -(i*100+j); if the i-th argument is a
scalar and had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?pbtrs
Solves a system of linear equations with a Cholesky-
factored symmetric/Hermitian positive-definite band
matrix.

Syntax
void pspbtrs (char *uplo , MKL_INT *n , MKL_INT *bw , MKL_INT *nrhs , float *a ,
MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *descb , float *af ,
MKL_INT *laf , float *work , MKL_INT *lwork , MKL_INT *info );
void pdpbtrs (char *uplo , MKL_INT *n , MKL_INT *bw , MKL_INT *nrhs , double *a ,
MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *descb , double *af ,
MKL_INT *laf , double *work , MKL_INT *lwork , MKL_INT *info );
void pcpbtrs (char *uplo , MKL_INT *n , MKL_INT *bw , MKL_INT *nrhs , MKL_Complex8 *a ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *descb ,
MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT
*info );
void pzpbtrs (char *uplo , MKL_INT *n , MKL_INT *bw , MKL_INT *nrhs , MKL_Complex16
*a , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *descb ,
MKL_Complex16 *af , MKL_INT *laf , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT
*info );

Include Files
• mkl_scalapack.h

Description
The p?pbtrs function solves for X a system of distributed linear equations in the form:

sub(A)*X = sub(B) ,
where sub(A) = A(1:n, ja:ja+n-1) is an n-by-n real symmetric or complex Hermitian positive definite
distributed band matrix, and sub(B) denotes the distributed matrix B(ib:ib+n-1, 1:nrhs).

This function uses the Cholesky factorization

sub(A) = P*U^H*U*P^T, or sub(A) = P*L*L^H*P^T

computed by p?pbtrf.
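
The following minimal double-precision sketch assumes the band matrix was already factored with pdpbtrf (producing a and af) and that laf satisfies the requirement given below; the helper name banded_cholesky_solve is illustrative only.

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Sketch: banded Cholesky solve after pdpbtrf.
   The workspace size follows the documented minimum lwork >= bw^2. */
void banded_cholesky_solve(MKL_INT n, MKL_INT bw, MKL_INT nrhs,
                           double *a, MKL_INT ja, MKL_INT *desca,
                           double *b, MKL_INT ib, MKL_INT *descb,
                           double *af, MKL_INT laf, MKL_INT *info)
{
    char uplo = 'U';                      /* must match the pdpbtrf call */
    MKL_INT lwork = bw * bw;
    if (lwork < 1) lwork = 1;
    double *work = (double *)malloc(lwork * sizeof(double));
    pdpbtrs(&uplo, &n, &bw, &nrhs, a, &ja, desca, b, &ib, descb,
            af, &laf, work, &lwork, info);
    free(work);                           /* b now holds the solution X */
}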

Input Parameters

uplo (global) Must be 'U' or 'L'.

If uplo = 'U', upper triangle of sub(A) is stored;

If uplo = 'L', lower triangle of sub(A) is stored.


n (global) The order of the distributed matrix sub(A) (n≥0).

bw (global) The number of superdiagonals of the distributed matrix if uplo =
'U', or the number of subdiagonals if uplo = 'L' (bw≥0).

nrhs (global) The number of right hand sides; the number of columns of the
distributed matrix sub(B) (nrhs≥0).

a, b (local)
Pointers into the local memory to arrays of local sizes lld_a*LOCc(ja+n-1)
and lld_b*LOCc(nrhs-1), respectively.

The array a contains the permuted triangular factor U or L from the
Cholesky factorization sub(A) = P*U^H*U*P^T, or sub(A) = P*L*L^H*P^T of the
band matrix A, as returned by p?pbtrf.

On entry, the array b contains the local pieces of the n-by-nrhs right hand
side distributed matrix sub(B).

ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
If dtype_a = 501, then dlen_≥ 7;

else if dtype_a = 1, then dlen_≥ 9.

ib (global) The row index in the global matrix B indicating the first row of the
matrix sub(B).

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
If dtype_b = 502, then dlen_≥ 7;

else if dtype_b = 1, then dlen_≥ 9.

af, work (local) Arrays, same type as a.

The array af is of size laf. It contains auxiliary fill-in space. The fill-in
space is created in a call to the factorization function p?pbtrf and is stored
in af.

The array work is a workspace array of size lwork.

laf (local) The size of the array af.

Must be laf≥nrhs*bw.

If laf is not large enough, an error code will be returned and the minimum
acceptable size will be returned in af[0].

lwork (local or global) The size of the array work, must be at least lwork≥bw^2.

Output Parameters

b On exit, if info=0, this array contains the local pieces of the n-by-nrhs
solution distributed matrix X.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

info If info=0, the execution is successful.

info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?pttrs
Solves a system of linear equations with a symmetric
(Hermitian) positive-definite tridiagonal distributed
matrix using the factorization computed by p?pttrf.

Syntax
void pspttrs (MKL_INT *n , MKL_INT *nrhs , float *d , float *e , MKL_INT *ja , MKL_INT
*desca , float *b , MKL_INT *ib , MKL_INT *descb , float *af , MKL_INT *laf , float
*work , MKL_INT *lwork , MKL_INT *info );
void pdpttrs (MKL_INT *n , MKL_INT *nrhs , double *d , double *e , MKL_INT *ja ,
MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *descb , double *af , MKL_INT
*laf , double *work , MKL_INT *lwork , MKL_INT *info );
void pcpttrs (char *uplo , MKL_INT *n , MKL_INT *nrhs , float *d , MKL_Complex8 *e ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *descb ,
MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT
*info );
void pzpttrs (char *uplo , MKL_INT *n , MKL_INT *nrhs , double *d , MKL_Complex16 *e ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *descb ,
MKL_Complex16 *af , MKL_INT *laf , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT
*info );

Include Files
• mkl_scalapack.h

Description
The p?pttrs function solves for X a system of distributed linear equations in the form:

sub(A)*X = sub(B) ,
where sub(A) = A(1:n, ja:ja+n-1) is an n-by-n real symmetric or complex Hermitian positive definite
tridiagonal distributed matrix, and sub(B) denotes the distributed matrix B(ib:ib+n-1, 1:nrhs).

This function uses the factorization

sub(A) = P*L*D*L^H*P^T, or sub(A) = P*U^H*D*U*P^T


computed by p?pttrf.

Input Parameters

uplo (global, used in complex flavors only)


Must be 'U' or 'L'.

If uplo = 'U', upper triangle of sub(A) is stored;

If uplo = 'L', lower triangle of sub(A) is stored.

n (global) The order of the distributed matrix sub(A) (n≥0).

nrhs (global) The number of right hand sides; the number of columns of the
distributed matrix sub(B) (nrhs≥0).

d, e (local)
Pointers into the local memory to arrays of size nb_a each.

These arrays contain details of the factorization as returned by p?pttrf.

ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
If dtype_a = 501 or dtype_a = 502, then dlen_≥ 7;

else if dtype_a = 1, then dlen_≥ 9.

b (local) Same type as d, e.

Pointer into the local memory to an array of local size lld_b*LOCc(nrhs).

On entry, the array b contains the local pieces of the n-by-nrhs right hand
side distributed matrix sub(B).

ib (global) The row index in the global matrix B indicating the first row of the
matrix to be operated on (which may be either all of B or a submatrix of B).

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
If dtype_b = 502, then dlen_≥ 7;

else if dtype_b = 1, then dlen_≥ 9.

af, work (local)
Arrays of size laf and lwork, respectively. The array af contains
auxiliary fill-in space. The fill-in space is created in a call to the factorization
function p?pttrf and is stored in af.

The array work is a workspace array.

laf (local) The size of the array af.

Must be laf≥nb_a+2.

If laf is not large enough, an error code is returned and the minimum
acceptable size will be returned in af[0].

lwork (local or global) The size of the array work, must be at least

lwork≥ (10+2*min(100,nrhs))*NPCOL+4*nrhs.

Output Parameters

b On exit, this array contains the local pieces of the solution distributed
matrix X.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

info If info=0, the execution is successful.

info < 0:
if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a
scalar and had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?trtrs
Solves a system of linear equations with a triangular
distributed matrix.

Syntax
void pstrtrs (char *uplo , char *trans , char *diag , MKL_INT *n , MKL_INT *nrhs ,
float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib ,
MKL_INT *jb , MKL_INT *descb , MKL_INT *info );
void pdtrtrs (char *uplo , char *trans , char *diag , MKL_INT *n , MKL_INT *nrhs ,
double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib ,
MKL_INT *jb , MKL_INT *descb , MKL_INT *info );
void pctrtrs (char *uplo , char *trans , char *diag , MKL_INT *n , MKL_INT *nrhs ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b ,
MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_INT *info );
void pztrtrs (char *uplo , char *trans , char *diag , MKL_INT *n , MKL_INT *nrhs ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b ,
MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?trtrs function solves for X one of the following systems of linear equations:

sub(A)*X = sub(B),
(sub(A))^T*X = sub(B), or
(sub(A))^H*X = sub(B),


where sub(A) = A(ia:ia+n-1, ja:ja+n-1) is a triangular distributed matrix of order n, and sub(B) denotes
the distributed matrix B(ib:ib+n-1, jb:jb+nrhs-1).

A check is made to verify that sub(A) is nonsingular.
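
A minimal double-precision sketch follows; it assumes a and b are valid distributed matrices on an existing process grid and simply reports the singularity diagnostic described under info below. The helper name triangular_solve is illustrative only.

#include <stdio.h>
#include "mkl_scalapack.h"

/* Sketch: triangular solve with a check of the info > 0 singularity report. */
void triangular_solve(MKL_INT n, MKL_INT nrhs,
                      double *a, MKL_INT ia, MKL_INT ja, MKL_INT *desca,
                      double *b, MKL_INT ib, MKL_INT jb, MKL_INT *descb)
{
    char uplo = 'U', trans = 'N', diag = 'N';
    MKL_INT info = 0;
    pdtrtrs(&uplo, &trans, &diag, &n, &nrhs, a, &ia, &ja, desca,
            b, &ib, &jb, descb, &info);
    if (info > 0)
        printf("diagonal element %lld of sub(A) is zero; solution not computed\n",
               (long long)info);
    /* if info = 0, b holds the solution X */
}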

Input Parameters

uplo (global) Must be 'U' or 'L'.

Indicates whether sub(A) is upper or lower triangular:


If uplo = 'U', then sub(A) is upper triangular.

If uplo = 'L', then sub(A) is lower triangular.

trans (global) Must be 'N' or 'T' or 'C'.

Indicates the form of the equations:


If trans = 'N', then sub(A)*X = sub(B) is solved for X.

If trans = 'T', then sub(A)^T*X = sub(B) is solved for X.

If trans = 'C', then sub(A)^H*X = sub(B) is solved for X.

diag (global) Must be 'N' or 'U'.

If diag = 'N', then sub(A) is not a unit triangular matrix.

If diag = 'U', then sub(A) is unit triangular.

n (global) The order of the distributed matrix sub(A) (n≥0).

nrhs (global) The number of right-hand sides; i.e., the number of columns of the
distributed matrix sub(B) (nrhs≥0).

a, b (local)
Pointers into the local memory to arrays of local sizes lld_a*LOCc(ja+n-1)
and lld_b*LOCc(jb+nrhs-1), respectively.

The array a contains the local pieces of the distributed triangular matrix
sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular matrix, and the strictly lower triangular part of sub(A)
is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular matrix, and the strictly upper triangular part of sub(A)
is not referenced.
If diag = 'U', the diagonal elements of sub(A) are also not referenced
and are assumed to be 1.
On entry, the array b contains the local pieces of the right hand side
distributed matrix sub(B).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the matrix sub(B), respectively.

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.

Output Parameters

b On exit, if info=0, sub(B) is overwritten by the solution matrix X.

info If info=0, the execution is successful.

info < 0:
if the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

info > 0:
if info = i, the i-th diagonal element of sub(A) is zero, indicating that the
submatrix is singular and the solutions X have not been computed.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

Estimating the Condition Number: ScaLAPACK Computational Routines


This section describes the ScaLAPACK routines for estimating the condition number of a matrix. The condition
number is used for analyzing the errors in the solution of a system of linear equations. Since the condition
number may be arbitrarily large when the matrix is nearly singular, the routines actually compute the
reciprocal condition number.

p?gecon
Estimates the reciprocal of the condition number of a
general distributed matrix in either the 1-norm or the
infinity-norm.

Syntax
void psgecon (char *norm , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *anorm , float *rcond , float *work , MKL_INT *lwork , MKL_INT *iwork ,
MKL_INT *liwork , MKL_INT *info );
void pdgecon (char *norm , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *anorm , double *rcond , double *work , MKL_INT *lwork ,
MKL_INT *iwork , MKL_INT *liwork , MKL_INT *info );
void pcgecon (char *norm , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , float *anorm , float *rcond , MKL_Complex8 *work , MKL_INT *lwork ,
float *rwork , MKL_INT *lrwork , MKL_INT *info );
void pzgecon (char *norm , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *anorm , double *rcond , MKL_Complex16 *work , MKL_INT
*lwork , double *rwork , MKL_INT *lrwork , MKL_INT *info );


Include Files
• mkl_scalapack.h

Description
The p?gecon function estimates the reciprocal of the condition number of a general distributed real/complex
matrix sub(A) = A(ia:ia+n-1, ja:ja+n-1) in either the 1-norm or infinity-norm, using the LU factorization
computed by p?getrf.

An estimate is obtained for ||(sub(A))^-1||, and the reciprocal of the condition number is computed as
rcond = 1 / (||sub(A)|| * ||(sub(A))^-1||).
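
The following minimal double-precision sketch assumes the matrix has already been factored with pdgetrf, that anorm holds the 1-norm of the original (unfactored) matrix, and that lwork and liwork satisfy the minima listed below; the helper name estimate_rcond is illustrative only.

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Sketch: estimate the reciprocal 1-norm condition number after pdgetrf. */
double estimate_rcond(MKL_INT n, double *a, MKL_INT ia, MKL_INT ja,
                      MKL_INT *desca, double anorm,
                      MKL_INT lwork, MKL_INT liwork, MKL_INT *info)
{
    char norm = '1';
    double rcond = 0.0;
    double *work = (double *)malloc(lwork * sizeof(double));
    MKL_INT *iwork = (MKL_INT *)malloc(liwork * sizeof(MKL_INT));
    pdgecon(&norm, &n, a, &ia, &ja, desca, &anorm, &rcond,
            work, &lwork, iwork, &liwork, info);
    free(iwork);
    free(work);
    return rcond;   /* a value near zero indicates a nearly singular matrix */
}
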
Input Parameters

norm (global) Must be '1' or 'O' or 'I'.

Specifies whether the 1-norm condition number or the infinity-norm
condition number is required.
If norm = '1' or 'O', then the 1-norm is used;

If norm = 'I', then the infinity-norm is used.

n (global) The order of the distributed matrix sub(A) (n≥ 0).

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).

The array a contains the local pieces of the factors L and U from the
factorization sub(A) = P*L*U; the unit diagonal elements of L are not
stored.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

anorm (global)
If norm = '1' or 'O', the 1-norm of the original distributed matrix sub(A);

If norm = 'I', the infinity-norm of the original distributed matrix sub(A).

work (local)
The array work of size lwork is a workspace array.

lwork (local or global) The size of the array work.

For real flavors:


lwork must be at least

lwork≥ 2*LOCr(n+mod(ia-1,mb_a))+2*LOCc(n+mod(ja-1,nb_a))
+max(2, max(nb_a*max(1, iceil(NPROW-1, NPCOL)), LOCc(n
+mod(ja-1,nb_a)) + nb_a*max(1, iceil(NPCOL-1, NPROW)))).
For complex flavors:
lwork must be at least
lwork≥ 2*LOCr(n+mod(ia-1,mb_a))+max(2,
max(nb_a*iceil(NPROW-1, NPCOL), LOCc(n+mod(ja-1,nb_a))+
nb_a*iceil(NPCOL-1, NPROW))).
LOCr and LOCc values can be computed using the ScaLAPACK tool function
numroc; NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.

NOTE
iceil(x,y) is the ceiling of x/y, and mod(x,y) is the integer
remainder of x/y.

iwork (local) Workspace array of size liwork. Used in real flavors only.

liwork (local or global) The size of the array iwork; used in real flavors only. Must
be at least
liwork≥LOCr(n+mod(ia-1,mb_a)).

rwork (local)
Workspace array of size lrwork. Used in complex flavors only.

lrwork (local or global) The size of the array rwork; used in complex flavors only.
Must be at least
lrwork≥ max(1, 2*LOCc(n+mod(ja-1,nb_a))).

Output Parameters

rcond (global)
The reciprocal of the condition number of the distributed matrix sub(A). See
Description.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

iwork[0] On exit, iwork[0] contains the minimum value of liwork required for
optimum performance (for real flavors).

rwork[0] On exit, rwork[0] contains the minimum value of lrwork required for
optimum performance (for complex flavors).

info (global) If info=0, the execution is successful.

info < 0:


If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?pocon
Estimates the reciprocal of the condition number (in
the 1-norm) of a symmetric/Hermitian positive-
definite distributed matrix.

Syntax
void pspocon (char *uplo , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *anorm , float *rcond , float *work , MKL_INT *lwork , MKL_INT *iwork ,
MKL_INT *liwork , MKL_INT *info );
void pdpocon (char *uplo , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *anorm , double *rcond , double *work , MKL_INT *lwork ,
MKL_INT *iwork , MKL_INT *liwork , MKL_INT *info );
void pcpocon (char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , float *anorm , float *rcond , MKL_Complex8 *work , MKL_INT *lwork ,
float *rwork , MKL_INT *lrwork , MKL_INT *info );
void pzpocon (char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *anorm , double *rcond , MKL_Complex16 *work , MKL_INT
*lwork , double *rwork , MKL_INT *lrwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?pocon function estimates the reciprocal of the condition number (in the 1-norm) of a real symmetric
or complex Hermitian positive definite distributed matrix sub(A) = A(ia:ia+n-1, ja:ja+n-1), using the
Cholesky factorization sub(A) = U^H*U or sub(A) = L*L^H computed by p?potrf.

An estimate is obtained for ||(sub(A))^-1||, and the reciprocal of the condition number is computed as
rcond = 1 / (||sub(A)|| * ||(sub(A))^-1||).

Input Parameters

uplo (global) Must be 'U' or 'L'.

Specifies whether the factor stored in sub(A) is upper or lower triangular.


If uplo = 'U', sub(A) stores the upper triangular factor U of the Cholesky
factorization sub(A) = U^H*U.

If uplo = 'L', sub(A) stores the lower triangular factor L of the Cholesky
factorization sub(A) = L*L^H.

n (global) The order of the distributed matrix sub(A) (n≥0).

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).

The array a contains the local pieces of the factors L or U from the Cholesky
factorization sub(A) = U^H*U, or sub(A) = L*L^H, as computed by p?potrf.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

anorm (global)
The 1-norm of the symmetric/Hermitian distributed matrix sub(A).

work (local)
The array work of size lwork is a workspace array.

lwork (local or global) The size of the array work.

For real flavors:


lwork must be at least
lwork≥ 2*LOCr(n+mod(ia-1,mb_a))+2*LOCc(n+mod(ja-1,nb_a))
+max(2, max(nb_a*iceil(NPROW-1, NPCOL), LOCc(n
+mod(ja-1,nb_a))+nb_a*iceil(NPCOL-1, NPROW))).
For complex flavors:
lwork must be at least
lwork≥ 2*LOCr(n+mod(ia-1,mb_a))+max(2,
max(nb_a*max(1,iceil(NPROW-1, NPCOL)), LOCc(n+mod(ja-1,nb_a))
+nb_a*max(1,iceil(NPCOL-1, NPROW)))).

NOTE
iceil(x,y) is the ceiling of x/y, and mod(x,y) is the integer
remainder of x/y.

iwork (local) Workspace array of size liwork. Used in real flavors only.

liwork (local or global) The size of the array iwork; used in real flavors only. Must
be at least liwork≥LOCr(n+mod(ia-1,mb_a)).

rwork (local)
Workspace array of size lrwork. Used in complex flavors only.

lrwork (local or global) The size of the array rwork; used in complex flavors only.
Must be at least lrwork≥ 2*LOCc(n+mod(ja-1,nb_a)).


Output Parameters

rcond (global)
The reciprocal of the condition number of the distributed matrix sub(A).

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

iwork[0] On exit, iwork[0] contains the minimum value of liwork required for
optimum performance (for real flavors).

rwork[0] On exit, rwork[0] contains the minimum value of lrwork required for
optimum performance (for complex flavors).

info (global) If info=0, the execution is successful.

info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?trcon
Estimates the reciprocal of the condition number of a
triangular distributed matrix in either 1-norm or
infinity-norm.

Syntax
void pstrcon (char *norm , char *uplo , char *diag , MKL_INT *n , float *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , float *rcond , float *work , MKL_INT *lwork ,
MKL_INT *iwork , MKL_INT *liwork , MKL_INT *info );
void pdtrcon (char *norm , char *uplo , char *diag , MKL_INT *n , double *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , double *rcond , double *work , MKL_INT *lwork ,
MKL_INT *iwork , MKL_INT *liwork , MKL_INT *info );
void pctrcon (char *norm , char *uplo , char *diag , MKL_INT *n , MKL_Complex8 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *rcond , MKL_Complex8 *work ,
MKL_INT *lwork , float *rwork , MKL_INT *lrwork , MKL_INT *info );
void pztrcon (char *norm , char *uplo , char *diag , MKL_INT *n , MKL_Complex16 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *rcond , MKL_Complex16 *work ,
MKL_INT *lwork , double *rwork , MKL_INT *lrwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?trcon function estimates the reciprocal of the condition number of a triangular distributed matrix
sub(A) = A(ia:ia+n-1, ja:ja+n-1), in either the 1-norm or the infinity-norm.

The norm of sub(A) is computed and an estimate is obtained for ||(sub(A))^-1||, then the reciprocal of the
condition number is computed as rcond = 1 / (||sub(A)|| * ||(sub(A))^-1||).


Input Parameters

norm (global) Must be '1' or 'O' or 'I'.

Specifies whether the 1-norm condition number or the infinity-norm
condition number is required.
If norm = '1' or 'O', then the 1-norm is used;

If norm = 'I', then the infinity-norm is used.

uplo (global) Must be 'U' or 'L'.

If uplo = 'U', sub(A) is upper triangular. If uplo = 'L', sub(A) is lower
triangular.

diag (global) Must be 'N' or 'U'.

If diag = 'N', sub(A) is non-unit triangular. If diag = 'U', sub(A) is unit
triangular.

n (global) The order of the distributed matrix sub(A), (n≥0).

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).

The array a contains the local pieces of the triangular distributed matrix
sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of this distributed
matrix contains the upper triangular matrix, and its strictly lower triangular
part is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of this distributed
matrix contains the lower triangular matrix, and its strictly upper triangular
part is not referenced.
If diag = 'U', the diagonal elements of sub(A) are also not referenced
and are assumed to be 1.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local)
The array work of size lwork is a workspace array.

lwork (local or global) The size of the array work.

For real flavors:


lwork must be at least


lwork≥ 2*LOCr(n+mod(ia-1,mb_a))+LOCc(n+mod(ja-1,nb_a))
+max(2, max(nb_a*max(1,iceil(NPROW-1, NPCOL)),
LOCc(n+mod(ja-1,nb_a))+nb_a*max(1,iceil(NPCOL-1, NPROW)))).
For complex flavors:
lwork must be at least
lwork≥ 2*LOCr(n+mod(ia-1,mb_a))+max(2,
max(nb_a*iceil(NPROW-1, NPCOL),
LOCc(n+mod(ja-1,nb_a))+nb_a*iceil(NPCOL-1, NPROW))).

NOTE
iceil(x,y) is the ceiling of x/y, and mod(x,y) is the integer
remainder of x/y.

iwork (local) Workspace array of size liwork. Used in real flavors only.

liwork (local or global) The size of the array iwork; used in real flavors only. Must
be at least
liwork≥LOCr(n+mod(ia-1,mb_a)).

rwork (local)
Workspace array of size lrwork. Used in complex flavors only.

lrwork (local or global) The size of the array rwork; used in complex flavors only.
Must be at least

lrwork≥LOCc(n+mod(ja-1,nb_a)).

Output Parameters

rcond (global)
The reciprocal of the condition number of the distributed matrix sub(A).

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

iwork[0] On exit, iwork[0] contains the minimum value of liwork required for
optimum performance (for real flavors).

rwork[0] On exit, rwork[0] contains the minimum value of lrwork required for
optimum performance (for complex flavors).

info (global) If info=0, the execution is successful.

info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

Refining the Solution and Estimating Its Error: ScaLAPACK Computational Routines
This section describes the ScaLAPACK routines for refining the computed solution of a system of linear
equations and estimating the solution error. You can call these routines after factorizing the matrix of the
system of equations and computing the solution (see Routines for Matrix Factorization and Solving Systems
of Linear Equations).

p?gerfs
Improves the computed solution to a system of linear
equations and provides error bounds and backward
error estimates for the solution.

Syntax
void psgerfs (char *trans , MKL_INT *n , MKL_INT *nrhs , float *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *af , MKL_INT *iaf , MKL_INT *jaf , MKL_INT
*descaf , MKL_INT *ipiv , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb ,
float *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , float *ferr , float *berr ,
float *work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT *liwork , MKL_INT *info );
void pdgerfs (char *trans , MKL_INT *n , MKL_INT *nrhs , double *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *af , MKL_INT *iaf , MKL_INT *jaf , MKL_INT
*descaf , MKL_INT *ipiv , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb ,
double *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , double *ferr , double *berr ,
double *work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT *liwork , MKL_INT *info );
void pcgerfs (char *trans , MKL_INT *n , MKL_INT *nrhs , MKL_Complex8 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *af , MKL_INT *iaf , MKL_INT *jaf ,
MKL_INT *descaf , MKL_INT *ipiv , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , MKL_Complex8 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , float
*ferr , float *berr , MKL_Complex8 *work , MKL_INT *lwork , float *rwork , MKL_INT
*lrwork , MKL_INT *info );
void pzgerfs (char *trans , MKL_INT *n , MKL_INT *nrhs , MKL_Complex16 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *af , MKL_INT *iaf , MKL_INT *jaf ,
MKL_INT *descaf , MKL_INT *ipiv , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , MKL_Complex16 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx ,
double *ferr , double *berr , MKL_Complex16 *work , MKL_INT *lwork , double *rwork ,
MKL_INT *lrwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?gerfs function improves the computed solution to one of the systems of linear equations

sub(A)*sub(X) = sub(B),
sub(A)^T*sub(X) = sub(B), or
sub(A)^H*sub(X) = sub(B)

and provides error bounds and backward error estimates for the solution.
Here sub(A) = A(ia:ia+n-1, ja:ja+n-1), sub(B) = B(ib:ib+n-1, jb:jb+nrhs-1), and sub(X) =
X(ix:ix+n-1, jx:jx+nrhs-1).
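
For illustration, a minimal double-precision sketch is given below. It assumes a holds the original matrix, af and ipiv hold the pdgetrf factorization, x holds the solution already computed (for example with p?getrs), ferr and berr have LOCc(jb+nrhs-1) elements each, and lwork/liwork satisfy the minima listed below; the helper name refine_solution is illustrative only.

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Sketch: refine a computed solution and obtain error bounds with pdgerfs. */
void refine_solution(MKL_INT n, MKL_INT nrhs,
                     double *a, MKL_INT ia, MKL_INT ja, MKL_INT *desca,
                     double *af, MKL_INT iaf, MKL_INT jaf, MKL_INT *descaf,
                     MKL_INT *ipiv,
                     double *b, MKL_INT ib, MKL_INT jb, MKL_INT *descb,
                     double *x, MKL_INT ix, MKL_INT jx, MKL_INT *descx,
                     double *ferr, double *berr,
                     MKL_INT lwork, MKL_INT liwork, MKL_INT *info)
{
    char trans = 'N';
    double *work = (double *)malloc(lwork * sizeof(double));
    MKL_INT *iwork = (MKL_INT *)malloc(liwork * sizeof(MKL_INT));
    pdgerfs(&trans, &n, &nrhs, a, &ia, &ja, desca, af, &iaf, &jaf, descaf,
            ipiv, b, &ib, &jb, descb, x, &ix, &jx, descx,
            ferr, berr, work, &lwork, iwork, &liwork, info);
    free(iwork);
    free(work);   /* x is improved in place; ferr/berr hold the error bounds */
}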


Input Parameters

trans (global) Must be 'N' or 'T' or 'C'.

Specifies the form of the system of equations:


If trans = 'N', the system has the form sub(A)*sub(X) = sub(B) (No
transpose);
If trans = 'T', the system has the form sub(A)^T*sub(X) = sub(B)
(Transpose);
If trans = 'C', the system has the form sub(A)^H*sub(X) = sub(B)
(Conjugate transpose).

n (global) The order of the distributed matrix sub(A) (n≥ 0).

nrhs (global) The number of right-hand sides, i.e., the number of columns of the
matrices sub(B) and sub(X) (nrhs≥ 0).

a, af, b, x (local)
Pointers into the local memory to arrays of local sizes
a: lld_a * LOCc(ja+n-1),
af: lld_af * LOCc(jaf+n-1),
b: lld_b * LOCc(jb+nrhs-1),
x: lld_x * LOCc(jx+nrhs-1).
The array a contains the local pieces of the distributed matrix sub(A).

The array af contains the local pieces of the distributed factors of the
matrix sub(A) = P*L*U as computed by p?getrf.

The array b contains the local pieces of the distributed matrix of right hand
sides sub(B).
On entry, the array x contains the local pieces of the distributed solution
matrix sub(X).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

iaf, jaf (global) The row and column indices in the global matrix AF indicating the
first row and the first column of the matrix sub(AF), respectively.

descaf (global and local) array of size dlen_. The array descriptor for the
distributed matrix AF.

ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the matrix sub(B), respectively.

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.

ix, jx (global) The row and column indices in the global matrix X indicating the
first row and the first column of the matrix sub(X), respectively.

descx (global and local) array of size dlen_. The array descriptor for the
distributed matrix X.

ipiv (local)
Array of size LOCr(m_af) + mb_af.

This array contains pivoting information as computed by p?getrf. If
ipiv[i]=j, then the local row i+1 was swapped with the global row j, where
i=0, ..., LOCr(m_af) + mb_af - 1.

This array is tied to the distributed matrix A.

work (local)
The array work of size lwork is a workspace array.

lwork (local or global) The size of the array work.

For real flavors:


lwork must be at least
lwork≥ 3*LOCr(n+mod(ia-1,mb_a))
For complex flavors:
lwork must be at least
lwork≥ 2*LOCr(n+mod(ia-1,mb_a))

NOTE
mod(x,y) is the integer remainder of x/y.

iwork (local) Workspace array, size liwork. Used in real flavors only.

liwork (local or global) The size of the array iwork; used in real flavors only. Must
be at least
liwork≥LOCr(n+mod(ib-1,mb_b)).

rwork (local)
Workspace array, size lrwork. Used in complex flavors only.

lrwork (local or global) The size of the array rwork; used in complex flavors only.
Must be at least lrwork≥LOCr(n+mod(ib-1,mb_b)).

Output Parameters

x On exit, contains the improved solution vectors.

ferr, berr Arrays of size LOCc(jb+nrhs-1) each.

The array ferr contains the estimated forward error bound for each
solution vector of sub(X).


If XTRUE is the true solution corresponding to sub(X), ferr is an estimated
upper bound for the magnitude of the largest element in (sub(X) - XTRUE)
divided by the magnitude of the largest element in sub(X). The estimate is
as reliable as the estimate for rcond, and is almost always a slight
overestimate of the true error.
This array is tied to the distributed matrix X.
The array berr contains the component-wise relative backward error of
each solution vector (that is, the smallest relative change in any entry of
sub(A) or sub(B) that makes sub(X) an exact solution). This array is tied to
the distributed matrix X.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

iwork[0] On exit, iwork[0] contains the minimum value of liwork required for
optimum performance (for real flavors).

rwork[0] On exit, rwork[0] contains the minimum value of lrwork required for
optimum performance (for complex flavors).

info (global) If info=0, the execution is successful.

info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?porfs
Improves the computed solution to a system of linear
equations with symmetric/Hermitian positive definite
distributed matrix and provides error bounds and
backward error estimates for the solution.

Syntax
void psporfs (char *uplo , MKL_INT *n , MKL_INT *nrhs , float *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *af , MKL_INT *iaf , MKL_INT *jaf , MKL_INT
*descaf , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , float *x , MKL_INT
*ix , MKL_INT *jx , MKL_INT *descx , float *ferr , float *berr , float *work , MKL_INT
*lwork , MKL_INT *iwork , MKL_INT *liwork , MKL_INT *info );
void pdporfs (char *uplo , MKL_INT *n , MKL_INT *nrhs , double *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *af , MKL_INT *iaf , MKL_INT *jaf , MKL_INT
*descaf , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , double *x , MKL_INT
*ix , MKL_INT *jx , MKL_INT *descx , double *ferr , double *berr , double *work ,
MKL_INT *lwork , MKL_INT *iwork , MKL_INT *liwork , MKL_INT *info );
void pcporfs (char *uplo , MKL_INT *n , MKL_INT *nrhs , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *af , MKL_INT *iaf , MKL_INT *jaf ,
MKL_INT *descaf , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb ,

MKL_Complex8 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , float *ferr , float
*berr , MKL_Complex8 *work , MKL_INT *lwork , float *rwork , MKL_INT *lrwork , MKL_INT
*info );
void pzporfs (char *uplo , MKL_INT *n , MKL_INT *nrhs , MKL_Complex16 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *af , MKL_INT *iaf , MKL_INT *jaf ,
MKL_INT *descaf , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb ,
MKL_Complex16 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , double *ferr , double
*berr , MKL_Complex16 *work , MKL_INT *lwork , double *rwork , MKL_INT *lrwork ,
MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?porfs function improves the computed solution to the system of linear equations

sub(A)*sub(X) = sub(B),
where sub(A) = A(ia:ia+n-1, ja:ja+n-1) is a real symmetric or complex Hermitian positive definite
distributed matrix and
sub(B) = B(ib:ib+n-1, jb:jb+nrhs-1),

sub(X) = X(ix:ix+n-1, jx:jx+nrhs-1)

are right-hand side and solution submatrices, respectively. This function also provides error bounds and
backward error estimates for the solution.

Input Parameters

uplo (global) Must be 'U' or 'L'.

Specifies whether the upper or lower triangular part of the symmetric/
Hermitian matrix sub(A) is stored.
If uplo = 'U', sub(A) is upper triangular. If uplo = 'L', sub(A) is lower
triangular.

n (global) The order of the distributed matrix sub(A) (n≥0).

nrhs (global) The number of right-hand sides, i.e., the number of columns of the
matrices sub(B) and sub(X) (nrhs≥0).

a, af, b, x (local)
Pointers into the local memory to arrays of local sizes
a: lld_a * LOCc(ja+n-1),
af: lld_af * LOCc(jaf+n-1),
b: lld_b * LOCc(jb+nrhs-1),
x: lld_x * LOCc(jx+nrhs-1).
The array a contains the local pieces of the n-by-n symmetric/Hermitian
distributed matrix sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular part of the matrix, and its strictly lower triangular part
is not referenced.


If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular part of the distributed matrix, and its strictly upper
triangular part is not referenced.
The array af contains the factors L or U from the Cholesky factorization
sub(A) = L*L^H or sub(A) = U^H*U, as computed by p?potrf.

On entry, the array b contains the local pieces of the distributed matrix of
right hand sides sub(B).
On entry, the array x contains the local pieces of the solution vectors
sub(X).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

iaf, jaf (global) The row and column indices in the global matrix AF indicating the
first row and the first column of the matrix sub(AF), respectively.

descaf (global and local) array of size dlen_. The array descriptor for the
distributed matrix AF.

ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the matrix sub(B), respectively.

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.

ix, jx (global) The row and column indices in the global matrix X indicating the
first row and the first column of the matrix sub(X), respectively.

descx (global and local) array of size dlen_. The array descriptor for the
distributed matrix X.

work (local)
The array work of size lwork is a workspace array.

lwork (local) The size of the array work.

For real flavors:


lwork must be at least
lwork≥ 3*LOCr(n+mod(ia-1,mb_a))
For complex flavors:
lwork must be at least
lwork≥ 2*LOCr(n+mod(ia-1,mb_a))

NOTE
mod(x,y) is the integer remainder of x/y.

iwork (local) Workspace array of size liwork. Used in real flavors only.

liwork (local or global) The size of the array iwork; used in real flavors only. Must
be at least
liwork≥LOCr(n+mod(ib-1,mb_b)).

rwork (local)
Workspace array of size lrwork. Used in complex flavors only.

lrwork (local or global) The size of the array rwork; used in complex flavors only.
Must be at least lrwork≥LOCr(n+mod(ib-1,mb_b)).

Output Parameters

x On exit, contains the improved solution vectors.

ferr, berr Arrays of size LOCc(jb+nrhs-1) each.

The array ferr contains the estimated forward error bound for each
solution vector of sub(X).
If XTRUE is the true solution corresponding to sub(X), ferr is an estimated
upper bound for the magnitude of the largest element in (sub(X) - XTRUE)
divided by the magnitude of the largest element in sub(X). The estimate is
as reliable as the estimate for rcond, and is almost always a slight
overestimate of the true error.
This array is tied to the distributed matrix X.
The array berr contains the component-wise relative backward error of
each solution vector (that is, the smallest relative change in any entry of
sub(A) or sub(B) that makes sub(X) an exact solution). This array is tied to
the distributed matrix X.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

iwork[0] On exit, iwork[0] contains the minimum value of liwork required for
optimum performance (for real flavors).

rwork[0] On exit, rwork[0] contains the minimum value of lrwork required for
optimum performance (for complex flavors).

info (global) If info=0, the execution is successful.

info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?trrfs
Provides error bounds and backward error estimates
for the solution to a system of linear equations with a
distributed triangular coefficient matrix.


Syntax
void pstrrfs (char *uplo , char *trans , char *diag , MKL_INT *n , MKL_INT *nrhs ,
float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib ,
MKL_INT *jb , MKL_INT *descb , float *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx ,
float *ferr , float *berr , float *work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT
*liwork , MKL_INT *info );
void pdtrrfs (char *uplo , char *trans , char *diag , MKL_INT *n , MKL_INT *nrhs ,
double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib ,
MKL_INT *jb , MKL_INT *descb , double *x , MKL_INT *ix , MKL_INT *jx , MKL_INT
*descx , double *ferr , double *berr , double *work , MKL_INT *lwork , MKL_INT
*iwork , MKL_INT *liwork , MKL_INT *info );
void pctrrfs (char *uplo , char *trans , char *diag , MKL_INT *n , MKL_INT *nrhs ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b ,
MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_Complex8 *x , MKL_INT *ix , MKL_INT
*jx , MKL_INT *descx , float *ferr , float *berr , MKL_Complex8 *work , MKL_INT
*lwork , float *rwork , MKL_INT *lrwork , MKL_INT *info );
void pztrrfs (char *uplo , char *trans , char *diag , MKL_INT *n , MKL_INT *nrhs ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b ,
MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_Complex16 *x , MKL_INT *ix , MKL_INT
*jx , MKL_INT *descx , double *ferr , double *berr , MKL_Complex16 *work , MKL_INT
*lwork , double *rwork , MKL_INT *lrwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?trrfs function provides error bounds and backward error estimates for the solution to one of the
systems of linear equations
sub(A)*sub(X) = sub(B),
sub(A)^T*sub(X) = sub(B), or
sub(A)^H*sub(X) = sub(B),
where sub(A) = A(ia:ia+n-1, ja:ja+n-1) is a triangular matrix,

sub(B) = B(ib:ib+n-1, jb:jb+nrhs-1), and

sub(X) = X(ix:ix+n-1, jx:jx+nrhs-1).

The solution matrix X must be computed by p?trtrs or some other means before entering this function. The
function p?trrfs does not do iterative refinement because doing so cannot improve the backward error.

Input Parameters

uplo (global) Must be 'U' or 'L'.

If uplo = 'U', sub(A) is upper triangular. If uplo = 'L', sub(A) is lower
triangular.

trans (global) Must be 'N' or 'T' or 'C'.

Specifies the form of the system of equations:


If trans = 'N', the system has the form sub(A)*sub(X) = sub(B) (No
transpose);

If trans = 'T', the system has the form sub(A)^T*sub(X) = sub(B)
(Transpose);
If trans = 'C', the system has the form sub(A)^H*sub(X) = sub(B)
(Conjugate transpose).

diag Must be 'N' or 'U'.

If diag = 'N', then sub(A) is non-unit triangular.

If diag = 'U', then sub(A) is unit triangular.

n (global) The order of the distributed matrix sub(A) (n≥0).

nrhs (global) The number of right-hand sides, that is, the number of columns of
the matrices sub(B) and sub(X) (nrhs≥0).

a, b, x (local)
Pointers into the local memory to arrays of local sizes
a: lld_a * LOCc(ja+n-1),
b: lld_b * LOCc(jb+nrhs-1),
x: lld_x * LOCc(jx+nrhs-1).
The array a contains the local pieces of the original triangular distributed
matrix sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular part of the matrix, and its strictly lower triangular part
is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular part of the distributed matrix, and its strictly upper
triangular part is not referenced.
If diag = 'U', the diagonal elements of sub(A) are also not referenced
and are assumed to be 1.
On entry, the array b contains the local pieces of the distributed matrix of
right hand sides sub(B).
On entry, the array x contains the local pieces of the solution vectors
sub(X).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the matrix sub(B), respectively.

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.

ix, jx (global) The row and column indices in the global matrix X indicating the
first row and the first column of the matrix sub(X), respectively.


descx (global and local) array of size dlen_. The array descriptor for the
distributed matrix X.

work (local)
The array work of size lwork is a workspace array.

lwork (local) The size of the array work.

For real flavors:


lwork must be at least lwork≥ 3*LOCr(n+mod(ia-1,mb_a))
For complex flavors:
lwork must be at least
lwork≥ 2*LOCr(n+mod(ia-1,mb_a))

NOTE
mod(x,y) is the integer remainder of x/y.

iwork (local) Workspace array of size liwork. Used in real flavors only.

liwork (local or global) The size of the array iwork; used in real flavors only. Must
be at least
liwork≥LOCr(n+mod(ib-1,mb_b)).

rwork (local)
Workspace array of size lrwork. Used in complex flavors only.

lrwork (local or global) The size of the array rwork; used in complex flavors only.
Must be at least lrwork≥LOCr(n+mod(ib-1,mb_b)).

Output Parameters

ferr, berr Arrays of size LOCc(jb+nrhs-1) each.

The array ferr contains the estimated forward error bound for each
solution vector of sub(X).
If XTRUE is the true solution corresponding to sub(X), ferr is an estimated
upper bound for the magnitude of the largest element in (sub(X) - XTRUE)
divided by the magnitude of the largest element in sub(X). The estimate is
as reliable as the estimate for rcond, and is almost always a slight
overestimate of the true error.
This array is tied to the distributed matrix X.
The array berr contains the component-wise relative backward error of
each solution vector (that is, the smallest relative change in any entry of
sub(A) or sub(B) that makes sub(X) an exact solution). This array is tied to
the distributed matrix X.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

iwork[0] On exit, iwork[0] contains the minimum value of liwork required for
optimum performance (for real flavors).

rwork[0] On exit, rwork[0] contains the minimum value of lrwork required for
optimum performance (for complex flavors).

info (global) If info=0, the execution is successful.

info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

Matrix Inversion: ScaLAPACK Computational Routines


This section describes ScaLAPACK routines that compute the inverse of a matrix based on a previously
obtained factorization. Note that it is not recommended to solve a system of equations Ax = b by first
computing A^-1 and then forming the matrix-vector product x = A^-1*b. Call a solver routine instead (see
Solving Systems of Linear Equations); this is more efficient and more accurate.

p?getri
Computes the inverse of an LU-factored distributed
matrix.

Syntax
void psgetri (MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca ,
MKL_INT *ipiv , float *work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT *liwork ,
MKL_INT *info );
void pdgetri (MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca ,
MKL_INT *ipiv , double *work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT *liwork ,
MKL_INT *info );
void pcgetri (MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , MKL_INT *ipiv , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT *iwork ,
MKL_INT *liwork , MKL_INT *info );
void pzgetri (MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , MKL_INT *ipiv , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT *iwork ,
MKL_INT *liwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?getri function computes the inverse of a general distributed matrix sub(A) = A(ia:ia+n-1, ja:ja
+n-1) using the LU factorization computed by p?getrf. This method inverts U and then computes the
inverse of sub(A) by solving the system
inv(sub(A))*L = inv(U)
for inv(sub(A)).
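
For illustration, the following minimal C sketch shows the typical call sequence: factor with p?getrf, then invert in place with p?getri. It assumes that the BLACS grid, the descriptor desca, the local array a, and the pivot array ipiv (of size LOCr(m_a)+mb_a) have already been set up, and that the lwork = -1 / liwork = -1 workspace-query convention used elsewhere in this chapter also applies here; the wrapper name invert_distributed is hypothetical.

#include <stdlib.h>
#include <mkl_scalapack.h>

/* Hedged sketch: invert an n-by-n distributed matrix in place. */
void invert_distributed(MKL_INT n, double *a, MKL_INT *desca, MKL_INT *ipiv)
{
    MKL_INT ia = 1, ja = 1, info;
    pdgetrf(&n, &n, a, &ia, &ja, desca, ipiv, &info);   /* sub(A) = P*L*U */

    /* Workspace query: sizes are returned in work[0] and iwork[0]. */
    MKL_INT lwork = -1, liwork = -1, iwkopt;
    double wkopt;
    pdgetri(&n, a, &ia, &ja, desca, ipiv, &wkopt, &lwork, &iwkopt, &liwork, &info);

    lwork = (MKL_INT)wkopt;
    liwork = iwkopt;
    double *work = (double *)malloc((size_t)lwork * sizeof(double));
    MKL_INT *iwork = (MKL_INT *)malloc((size_t)liwork * sizeof(MKL_INT));
    pdgetri(&n, a, &ia, &ja, desca, ipiv, work, &lwork, iwork, &liwork, &info);
    free(work);
    free(iwork);
}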


Input Parameters

n (global) The number of rows and columns to be operated on, that is, the
order of the distributed matrix sub(A) (n≥0).

a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).

On entry, the array a contains the local pieces of the L and U obtained by
the factorization sub(A) = P*L*U computed by p?getrf.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local)
The array work of size lwork is a workspace array.

lwork (local) The size of the array work. lwork must be at least

lwork≥LOCr(n+mod(ia-1,mb_a))*nb_a.

NOTE
mod(x,y) is the integer remainder of x/y.

The array work is used to keep at most an entire column block of sub(A).

iwork (local) Workspace array used for physically transposing the pivots, size
liwork.

liwork (local or global) The size of the array iwork.

The minimal value of liwork is determined by the following code:

if NPROW == NPCOL then
    liwork = LOCc(n_a + mod(ja-1,nb_a)) + nb_a
else
    liwork = LOCc(n_a + mod(ja-1,nb_a)) +
             max(ceil(ceil(LOCr(m_a)/mb_a)/(lcm/NPROW)), nb_a)
end if

where lcm is the least common multiple of process rows and columns
(NPROW and NPCOL).

Output Parameters

ipiv (local)
Array of size LOCr(m_a)+ mb_a.

This array contains the pivoting information.

If ipiv[i] = j, then the local row i+1 was swapped with the global row j,
where i = 0, ..., LOCr(m_a) + mb_a - 1.

This array is tied to the distributed matrix A.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

iwork[0] On exit, iwork[0] contains the minimum value of liwork required for
optimum performance.

info (global) If info=0, the execution is successful.

info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

info> 0:
If info = i, the matrix element U(i,i) is exactly zero. The factorization has
been completed, but the factor U is exactly singular, and division by zero
will occur if it is used to solve a system of equations.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?potri
Computes the inverse of a symmetric/Hermitian
positive definite distributed matrix.

Syntax
void pspotri (char *uplo , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , MKL_INT *info );
void pdpotri (char *uplo , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *info );
void pcpotri (char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *info );
void pzpotri (char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?potri function computes the inverse of a real symmetric or complex Hermitian positive definite
distributed matrix sub(A) = A(ia:ia+n-1, ja:ja+n-1) using the Cholesky factorization sub(A) = U^H*U or
sub(A) = L*L^H computed by p?potrf.
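
As a brief hedged illustration, the following C fragment factors and then inverts a symmetric positive definite distributed matrix in place; the BLACS grid, the descriptor desca, and the local array a are assumed to be already initialized, and the wrapper name spd_inverse is hypothetical.

#include <mkl_scalapack.h>

/* Sketch: Cholesky-factor and invert sub(A) in place (upper triangle stored). */
void spd_inverse(MKL_INT n, double *a, MKL_INT *desca)
{
    char uplo = 'U';
    MKL_INT ia = 1, ja = 1, info;
    pdpotrf(&uplo, &n, a, &ia, &ja, desca, &info);       /* sub(A) = U^T*U */
    if (info == 0)
        pdpotri(&uplo, &n, a, &ia, &ja, desca, &info);   /* overwrite with inverse */
}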

Input Parameters

uplo (global) Must be 'U' or 'L'.


Specifies whether the upper or lower triangular part of the symmetric/Hermitian
matrix sub(A) is stored.
If uplo = 'U', upper triangle of sub(A) is stored. If uplo = 'L', lower
triangle of sub(A) is stored.

n (global) The number of rows and columns to be operated on, that is, the
order of the distributed matrix sub(A) (n≥0).

a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).

On entry, the array a contains the local pieces of the triangular factor U or L
from the Cholesky factorization sub(A) = U^H*U, or sub(A) = L*L^H, as
computed by p?potrf.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

Output Parameters

a On exit, overwritten by the local pieces of the upper or lower triangle of the
(symmetric/Hermitian) inverse of sub(A).

info (global) If info=0, the execution is successful.

info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

info> 0:
If info = i, the element (i, i) of the factor U or L is zero, and the inverse
could not be computed.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?trtri
Computes the inverse of a triangular distributed
matrix.

Syntax
void pstrtri (char *uplo , char *diag , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , MKL_INT *info );
void pdtrtri (char *uplo , char *diag , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , MKL_INT *info );
void pctrtri (char *uplo , char *diag , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_INT *info );

void pztrtri (char *uplo , char *diag , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?trtri function computes the inverse of a real or complex upper or lower triangular distributed matrix
sub(A) = A(ia:ia+n-1, ja:ja+n-1).
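
A minimal hedged C sketch follows; the grid, descriptor, and local data are assumed to be prepared by the caller, and the wrapper name tri_inverse is hypothetical.

#include <mkl_scalapack.h>

/* Sketch: invert an upper triangular, non-unit distributed matrix in place. */
void tri_inverse(MKL_INT n, double *a, MKL_INT *desca)
{
    char uplo = 'U', diag = 'N';
    MKL_INT ia = 1, ja = 1, info;
    pdtrtri(&uplo, &diag, &n, a, &ia, &ja, desca, &info);
    /* info = k > 0 means A(ia+k-1, ja+k-1) is exactly zero: sub(A) is singular. */
}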

Input Parameters

uplo (global) Must be 'U' or 'L'.

Specifies whether the distributed matrix sub(A) is upper or lower triangular.


If uplo = 'U', sub(A) is upper triangular.

If uplo = 'L', sub(A) is lower triangular.

diag Must be 'N' or 'U'.

Specifies whether or not the distributed matrix sub(A) is unit triangular.


If diag = 'N', then sub(A) is non-unit triangular.

If diag = 'U', then sub(A) is unit triangular.

n (global) The number of rows and columns to be operated on, that is, the
order of the distributed matrix sub(A) (n≥0).

a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).

The array a contains the local pieces of the triangular distributed matrix
sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular matrix to be inverted, and the strictly lower triangular
part of sub(A) is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular matrix, and the strictly upper triangular part of sub(A)
is not referenced.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

Output Parameters

a On exit, overwritten by the (triangular) inverse of the original matrix.

info (global) If info=0, the execution is successful.

info < 0:


If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

info> 0:
If info = k, the matrix element A(ia+k-1, ja+k-1) is exactly zero. The
triangular matrix sub(A) is singular and its inverse cannot be computed.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

Matrix Equilibration: ScaLAPACK Computational Routines


ScaLAPACK routines described in this section are used to compute scaling factors needed to equilibrate a
matrix. Note that these routines do not actually scale the matrices.

p?geequ
Computes row and column scaling factors intended to
equilibrate a general rectangular distributed matrix
and reduce its condition number.

Syntax
void psgeequ (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *r , float *c , float *rowcnd , float *colcnd , float *amax , MKL_INT
*info );
void pdgeequ (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *r , double *c , double *rowcnd , double *colcnd , double
*amax , MKL_INT *info );
void pcgeequ (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , float *r , float *c , float *rowcnd , float *colcnd , float *amax ,
MKL_INT *info );
void pzgeequ (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *r , double *c , double *rowcnd , double *colcnd , double
*amax , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?geequ function computes row and column scalings intended to equilibrate an m-by-n distributed matrix
sub(A) = A(ia:ia+m-1, ja:ja+n-1) and reduce its condition number. The output array r returns the row
scale factors r(i), and the array c returns the column scale factors c(j). These factors are chosen to try to make
the largest element in each row and column of the matrix B with elements b(i,j) = r(i)*a(i,j)*c(j) have absolute value 1.
r(i) and c(j) are restricted to be between SMLNUM = smallest safe number and BIGNUM = largest safe number.
Use of these scaling factors is not guaranteed to reduce the condition number of sub(A) but works well in
practice.

SMLNUM and BIGNUM are parameters representing machine precision. You can use the ?lamch routines to
compute them. For example, compute single precision values of SMLNUM and BIGNUM as follows:

SMLNUM = slamch ('s')


BIGNUM = 1 / SMLNUM

The auxiliary function p?laqge uses scaling factors computed by p?geequ to scale a general rectangular
matrix.
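
The following hedged C sketch shows a typical p?geequ call; the BLACS grid, the descriptor desca, the local array a, and the output arrays r and c (of sizes LOCr(m_a) and LOCc(n_a)) are assumed to be allocated by the caller, and the wrapper name equilibrate_factors is hypothetical.

#include <mkl_scalapack.h>

/* Sketch: compute row/column equilibration factors for an m-by-n distributed matrix. */
void equilibrate_factors(MKL_INT m, MKL_INT n, double *a, MKL_INT *desca,
                         double *r, double *c)
{
    MKL_INT ia = 1, ja = 1, info;
    double rowcnd, colcnd, amax;
    pdgeequ(&m, &n, a, &ia, &ja, desca, r, c, &rowcnd, &colcnd, &amax, &info);
    /* Scaling is typically worthwhile only if rowcnd or colcnd is below 0.1,
       or if amax is very close to underflow or overflow; p?laqge can then
       apply the factors to sub(A). */
}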

Input Parameters

m (global) The number of rows to be operated on, that is, the number of rows
of the distributed matrix sub(A) (m≥ 0).

n (global) The number of columns to be operated on, that is, the number of
columns of the distributed matrix sub(A) (n≥ 0).

a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).

The array a contains the local pieces of the m-by-n distributed matrix whose
equilibration factors are to be computed.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

Output Parameters

r, c (local)
Arrays of sizes LOCr(m_a) and LOCc(n_a), respectively.

If info = 0, or info > ia+m-1, r[i] contains the row scale factors for sub(A)
for ia-1≤i<ia+m-1. r is aligned with the distributed matrix A, and
replicated across every process column. r is tied to the distributed matrix A.
If info = 0, c[i] contains the column scale factors for sub(A) for
ja-1≤i<ja+n-1. c is aligned with the distributed matrix A, and replicated
down every process row. c is tied to the distributed matrix A.

rowcnd, colcnd (global)


If info = 0 or info>ia+m-1, rowcnd contains the ratio of the smallest ri
to the largest ri (ia≤i≤ia+m-1). If rowcnd≥ 0.1 and amax is neither too
large nor too small, it is not worth scaling by ri.

If info = 0, colcnd contains the ratio of the smallest cj to the largest cj


(ja ≤j≤ja+n-1).

If colcnd≥ 0.1, it is not worth scaling by cj.

amax (global)
Absolute value of the largest matrix element. If amax is very close to
overflow or very close to underflow, the matrix should be scaled.


info (global) If info=0, the execution is successful.

info < 0:
If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

info> 0:
If info = i and i≤m, the i-th row of the distributed matrix sub(A) is
exactly zero;
if info = i and i>m, the (i-m)-th column of the distributed matrix sub(A)
is exactly zero.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?poequ
Computes row and column scaling factors intended to
equilibrate a symmetric (Hermitian) positive definite
distributed matrix and reduce its condition number.

Syntax
void pspoequ (MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca ,
float *sr , float *sc , float *scond , float *amax , MKL_INT *info );
void pdpoequ (MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca ,
double *sr , double *sc , double *scond , double *amax , MKL_INT *info );
void pcpoequ (MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *sr , float *sc , float *scond , float *amax , MKL_INT *info );
void pzpoequ (MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , double *sr , double *sc , double *scond , double *amax , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?poequ function computes row and column scalings intended to equilibrate a real symmetric or
complex Hermitian positive definite distributed matrix sub(A) = A(ia:ia+n-1, ja:ja+n-1) and reduce its
condition number (with respect to the two-norm). The output arrays sr and sc return the row and column
scale factors s(i) = 1/sqrt(a(i,i)).
These factors are chosen so that the scaled distributed matrix B with elements b(i,j) = s(i)*a(i,j)*s(j) has ones on
the diagonal.

This choice of sr and sc puts the condition number of B within a factor n of the smallest possible condition
number over all possible diagonal scalings.
The auxiliary function p?laqsy uses scaling factors computed by p?poequ to scale a symmetric (Hermitian)
positive definite matrix.
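
A hedged C sketch of the call follows, assuming the grid, descriptor, local array, and output arrays sr and sc (of sizes LOCr(m_a) and LOCc(n_a)) are prepared by the caller; the wrapper name spd_equilibrate_factors is hypothetical.

#include <mkl_scalapack.h>

/* Sketch: scale factors for a symmetric positive definite distributed matrix;
   only the diagonal elements of sub(A) are referenced. */
void spd_equilibrate_factors(MKL_INT n, double *a, MKL_INT *desca,
                             double *sr, double *sc)
{
    MKL_INT ia = 1, ja = 1, info;
    double scond, amax;
    pdpoequ(&n, a, &ia, &ja, desca, sr, sc, &scond, &amax, &info);
    /* info = k > 0 flags a nonpositive k-th diagonal entry. */
}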

Input Parameters

n (global) The number of rows and columns to be operated on, that is, the
order of the distributed matrix sub(A) (n≥0).

a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).

The array a contains the n-by-n symmetric/Hermitian positive definite


distributed matrix sub(A) whose scaling factors are to be computed. Only
the diagonal elements of sub(A) are referenced.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

Output Parameters

sr, sc (local)

Arrays of sizes LOCr(m_a) and LOCc(n_a), respectively.

If info = 0, the array sr(ia:ia+n-1) contains the row scale factors for
sub(A). sr is aligned with the distributed matrix A, and replicated across
every process column. sr is tied to the distributed matrix A.

If info = 0, the array sc(ja:ja+n-1) contains the column scale factors


for sub(A). sc is aligned with the distributed matrix A, and replicated down
every process row. sc is tied to the distributed matrix A.

scond (global)

If info = 0, scond contains the ratio of the smallest sr[i] ( or sc[j]) to


the largest sr[i] ( or sc[j]), with

ia-1≤i<ia+n-1 and ja-1≤j<ja+n-1.


If scond≥ 0.1 and amax is neither too large nor too small, it is not worth
scaling by sr ( or sc ).

amax (global)
Absolute value of the largest matrix element. If amax is very close to
overflow or very close to underflow, the matrix should be scaled.

info (global)
If info=0, the execution is successful.

info < 0:


If the i-th argument is an array and the j-th entry, indexed j - 1, had an
illegal value, then info = -(i*100+j); if the i-th argument is a scalar and
had an illegal value, then info = -i.

info> 0:
If info = k, the k-th diagonal entry of sub(A) is nonpositive.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

Orthogonal Factorizations: ScaLAPACK Computational Routines


This section describes the ScaLAPACK routines for the QR(RQ) and LQ(QL) factorization of matrices. Routines
for the RZ factorization as well as for generalized QR and RQ factorizations are also included. For the
mathematical definition of the factorizations, see the respective LAPACK sections or refer to [SLUG].
Table "Computational Routines for Orthogonal Factorizations" lists ScaLAPACK routines that perform
orthogonal factorization of matrices.
Computational Routines for Orthogonal Factorizations

Matrix type, factorization                       Factorize without   Factorize with   Generate matrix Q   Apply matrix Q
                                                 pivoting            pivoting

general matrices, QR factorization               p?geqrf             p?geqpf          p?orgqr, p?ungqr    p?ormqr, p?unmqr
general matrices, RQ factorization               p?gerqf                              p?orgrq, p?ungrq    p?ormrq, p?unmrq
general matrices, LQ factorization               p?gelqf                              p?orglq, p?unglq    p?ormlq, p?unmlq
general matrices, QL factorization               p?geqlf                              p?orgql, p?ungql    p?ormql, p?unmql
trapezoidal matrices, RZ factorization           p?tzrzf                                                  p?ormrz, p?unmrz
pair of matrices, generalized QR factorization   p?ggqrf
pair of matrices, generalized RQ factorization   p?ggrqf

p?geqrf
Computes the QR factorization of a general m-by-n
matrix.

Syntax
void psgeqrf (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );

void pdgeqrf (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
void pcgeqrf (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT
*info );
void pzgeqrf (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT
*info );

Include Files
• mkl_scalapack.h

Description
The p?geqrf function forms the QR factorization of a general m-by-n distributed matrix sub(A)= A(ia:ia
+m-1, ja:ja+n-1) as

A=Q*R.
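
A hedged C sketch of the usual two-step call (workspace query with lwork = -1, then the factorization) follows; the grid, descriptor, local array a, and tau (LOCc(ja+min(m,n)-1) elements) are assumed to be prepared by the caller, and the wrapper name qr_factor is hypothetical.

#include <stdlib.h>
#include <mkl_scalapack.h>

/* Sketch: QR-factor an m-by-n distributed matrix using a workspace query. */
void qr_factor(MKL_INT m, MKL_INT n, double *a, MKL_INT *desca, double *tau)
{
    MKL_INT ia = 1, ja = 1, info, lwork = -1;
    double wkopt;
    pdgeqrf(&m, &n, a, &ia, &ja, desca, tau, &wkopt, &lwork, &info); /* query */

    lwork = (MKL_INT)wkopt;
    double *work = (double *)malloc((size_t)lwork * sizeof(double));
    pdgeqrf(&m, &n, a, &ia, &ja, desca, tau, work, &lwork, &info);   /* factor */
    free(work);
}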

Input Parameters

m (global) The number of rows in the distributed matrix sub(A); (m≥ 0).

n (global) The number of columns in the distributed matrix sub(A); (n≥ 0).

a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).

Contains the local pieces of the distributed matrix sub(A) to be factored.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A(ia:ia+m-1, ja:ja+n-1),
respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A

work (local).
Workspace array of size lwork.

lwork (local or global) size of work, must be at least lwork≥nb_a *


(mp0+nq0+nb_a), where
iroff = mod(ia-1, mb_a), icoff = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mp0 = numroc(m+iroff, mb_a, MYROW, iarow, NPROW),
nq0 = numroc(n+icoff, nb_a, MYCOL, iacol, NPCOL), and numroc,
indxg2p are ScaLAPACK tool functions; MYROW, MYCOL, NPROW and NPCOL
can be determined by calling the function blacs_gridinfo.


If lwork = -1, then lwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a The elements on and above the diagonal of sub(A) contain the min(m,n)-by-
n upper trapezoidal matrix R (R is upper triangular if m≥n); the elements
below the diagonal, with the array tau, represent the orthogonal/unitary
matrix Q as a product of elementary reflectors (see Application Notes
below).

tau (local)
Array of size LOCc(ja+min(m,n)-1).

Contains the scalar factor of elementary reflectors. tau is tied to the


distributed matrix A.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
= 0, the execution is successful.
< 0, if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(ja)*H(ja+1)*...*H(ja+k-1),

where k = min(m,n).

Each H(i) has the form


H(i) = I - tau*v*v'
where tau is a real/complex scalar, and v is a real/complex vector with v(1:i-1) = 0 and v(i) = 1; v(i+1:m) is
stored on exit in A(ia+i:ia+m-1, ja+i-1), and tau in tau[ja+i-2].
See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?geqpf
Computes the QR factorization of a general m-by-n
matrix with pivoting.

Syntax
void psgeqpf (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , MKL_INT *ipiv , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdgeqpf (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *ipiv , double *tau , double *work , MKL_INT *lwork , MKL_INT
*info );

void pcgeqpf (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *ipiv , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT
*lwork , float *rwork , MKL_INT *lrwork , MKL_INT *info );
void pzgeqpf (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *ipiv , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT
*lwork , double *rwork , MKL_INT *lrwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?geqpf function forms the QR factorization with column pivoting of a general m-by-n distributed matrix
sub(A)= A(ia:ia+m-1, ja:ja+n-1) as

sub(A)*P=Q*R.
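
A hedged C sketch of the pivoted factorization, structured like the p?geqrf example above, follows; ipiv needs LOCc(ja+n-1) elements and tau needs LOCc(ja+min(m,n)-1) elements, the grid and descriptor are assumed to be prepared by the caller, and the wrapper name qr_factor_pivoted is hypothetical.

#include <stdlib.h>
#include <mkl_scalapack.h>

/* Sketch: column-pivoted QR of an m-by-n distributed matrix (real flavor). */
void qr_factor_pivoted(MKL_INT m, MKL_INT n, double *a, MKL_INT *desca,
                       MKL_INT *ipiv, double *tau)
{
    MKL_INT ia = 1, ja = 1, info, lwork = -1;
    double wkopt;
    pdgeqpf(&m, &n, a, &ia, &ja, desca, ipiv, tau, &wkopt, &lwork, &info); /* query */

    lwork = (MKL_INT)wkopt;
    double *work = (double *)malloc((size_t)lwork * sizeof(double));
    pdgeqpf(&m, &n, a, &ia, &ja, desca, ipiv, tau, work, &lwork, &info);
    free(work);
}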

Input Parameters

m (global) The number of rows in the matrix sub(A) (m≥ 0).

n (global) The number of columns in the matrix sub(A) (n≥ 0).

a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).

Contains the local pieces of the distributed matrix sub(A) to be factored.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A(ia:ia+m-1, ja:ja+n-1),
respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local).
Workspace array of size lwork.

lwork (local or global) size of work, must be at least

For real flavors:


lwork≥max(3,mp0+nq0) + LOCc (ja+n-1) + nq0.
For complex flavors:
lwork≥max(3,mp0+nq0) .
Here
iroff = mod(ia-1, mb_a), icoff = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mp0 = numroc(m+iroff, mb_a, MYROW, iarow, NPROW ),
nq0 = numroc(n+icoff, nb_a, MYCOL, iacol, NPCOL),


LOCc (ja+n-1) = numroc(ja+n-1, nb_a, MYCOL,csrc_a, NPCOL),


and numroc, indxg2p are ScaLAPACK tool functions.

You can determine MYROW, MYCOL, NPROW and NPCOL by calling the
blacs_gridinfo function.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

rwork (local).
Workspace array of size lrwork (complex flavors only).

lrwork (local or global) size of rwork (complex flavors only). The value of lrwork
must be at least
lrwork≥LOCc(ja+n-1) + nq0.
Here
iroff = mod(ia-1, mb_a), icoff = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mp0 = numroc(m+iroff, mb_a, MYROW, iarow, NPROW ),
nq0 = numroc(n+icoff, nb_a, MYCOL, iacol, NPCOL),
LOCc (ja+n-1) = numroc(ja+n-1, nb_a, MYCOL,csrc_a, NPCOL),
and numroc, indxg2p are ScaLAPACK tool functions.

You can determine MYROW, MYCOL, NPROW and NPCOL by calling the
blacs_gridinfo function.
If lrwork = -1, then lrwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a The elements on and above the diagonal of sub(A) contain the min(m,n)-by-n
upper trapezoidal matrix R (R is upper triangular if m≥n); the elements
below the diagonal, with the array tau, represent the orthogonal/unitary
matrix Q as a product of elementary reflectors (see Application Notes
below).

ipiv (local) Array of size LOCc(ja+n-1).

ipiv[i] = k: the local (i+1)-th column of sub(A)*P was the global k-th
column of sub(A) (0 ≤ i < LOCc(ja+n-1)). ipiv is tied to the distributed
matrix A.

tau (local)
Array of size LOCc(ja+min(m, n)-1).

Contains the scalar factor tau of elementary reflectors. tau is tied to the
distributed matrix A.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

rwork[0] On exit, rwork[0] contains the minimum value of lrwork required for
optimum performance.

info (global)
= 0, the execution is successful.
< 0, if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(1)*H(2)*...*H(k)
where k = min(m,n).

Each H(i) has the form


H = I - tau*v*v'
where tau is a real/complex scalar, and v is a real/complex vector with v(1:i-1) = 0 and v(i) = 1; v(i+1:m) is
stored on exit in A(ia+i:ia+m-1, ja+i-1).

The matrix P is represented in ipiv as follows: if ipiv[j] = i, then the (j+1)-th column of P is the i-th
canonical unit vector (0 ≤ j < LOCc(ja+n-1)).

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?orgqr
Generates the orthogonal matrix Q of the QR
factorization formed by p?geqrf.

Syntax
void psorgqr (MKL_INT *m , MKL_INT *n , MKL_INT *k , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdorgqr (MKL_INT *m , MKL_INT *n , MKL_INT *k , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?orgqr function generates the whole or part of the m-by-n real distributed matrix Q denoting A(ia:ia+m-1,
ja:ja+n-1) with orthonormal columns, which is defined as the first n columns of a product of k elementary
reflectors of order m

Q= H(1)*H(2)*...*H(k)


as returned by p?geqrf.
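
As a hedged C sketch, after p?geqrf on an m-by-n matrix (m≥n) the explicit Q factor can be formed as follows; the grid, descriptor, a, and tau are assumed to be prepared by the caller, and the wrapper name form_q is hypothetical.

#include <stdlib.h>
#include <mkl_scalapack.h>

/* Sketch: overwrite a (holding the reflectors from pdgeqrf) with the
   explicit m-by-n orthogonal factor Q, using k = n reflectors. */
void form_q(MKL_INT m, MKL_INT n, double *a, MKL_INT *desca, double *tau)
{
    MKL_INT k = n, ia = 1, ja = 1, info, lwork = -1;
    double wkopt;
    pdorgqr(&m, &n, &k, a, &ia, &ja, desca, tau, &wkopt, &lwork, &info); /* query */

    lwork = (MKL_INT)wkopt;
    double *work = (double *)malloc((size_t)lwork * sizeof(double));
    pdorgqr(&m, &n, &k, a, &ia, &ja, desca, tau, work, &lwork, &info);
    free(work);
}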

Input Parameters

m (global) The number of rows in the matrix sub(Q) (m≥ 0).

n (global) The number of columns in the matrix sub(Q) (m≥n≥ 0).

k (global) The number of elementary reflectors whose product defines the


matrix Q(n≥k≥ 0).

a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).
The j-th column of the matrix stored in a must contain the vector that
defines the elementary reflector H(j), ja≤j≤ja+k-1, as returned by p?geqrf
in the k columns of its distributed matrix argument A(ia:*, ja:ja+k-1).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A(ia:ia+m-1, ja:ja+n-1),
respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ja+k-1).

Contains the scalar factor tau[j] of elementary reflectors H(j+1) as


returned by p?geqrf (0 ≤j < LOCc(ja+k-1)). tau is tied to the distributed
matrix A.

work (local)
Workspace array of size of lwork.

lwork (local or global) size of work.

Must be at least lwork≥nb_a*(nqa0 + mpa0 + nb_a), where

iroffa = mod(ia-1, mb_a), icoffa = mod(ja-1, nb_a),


iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mpa0 = numroc(m+iroffa, mb_a, MYROW, iarow, NPROW),
nqa0 = numroc(n+icoffa, nb_a, MYCOL, iacol, NPCOL);
indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a Contains the local pieces of the m-by-n distributed matrix Q.

work[0] On exit, work[0] contains the minimum value of lwork required for optimum
performance.

info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?ungqr
Generates the complex unitary matrix Q of the QR
factorization formed by p?geqrf.

Syntax
void pcungqr (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzungqr (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
This function generates the whole or part of m-by-n complex distributed matrix Q denoting A(ia:ia+m-1,
ja:ja+n-1) with orthonormal columns, which is defined as the first n columns of a product of k elementary
reflectors of order m

Q = H(1)*H(2)*...*H(k)

as returned by p?geqrf.

Input Parameters

m (global) The number of rows in the matrix sub(Q); (m≥0).

n (global) The number of columns in the matrix sub(Q) (m≥n≥0).

k (global) The number of elementary reflectors whose product defines the


matrix Q (n≥k≥0).

a (local)


Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1). The
j-th column of the matrix stored in a must contain the vector that defines
the elementary reflector H(j), ja≤j≤ja+k-1, as returned by p?geqrf in the
k columns of its distributed matrix argument A(ia:*, ja:ja+k-1).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ja+k-1).

Contains the scalar factor tau[j] of elementary reflectors H(j+1) as


returned by p?geqrf (0 ≤j < LOCc(ja+k-1)). tau is tied to the distributed
matrix A.

work (local)
Workspace array of size of lwork.

lwork (local or global) size of work, must be at least lwork≥nb_a*(nqa0 + mpa0


+ nb_a), where
iroffa = mod(ia-1, mb_a),

icoffa = mod(ja-1, nb_a),

iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),

iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),

mpa0 = numroc(m+iroffa, mb_a, MYROW, iarow, NPROW),

nqa0 = numroc(n+icoffa, nb_a, MYCOL, iacol, NPCOL)

indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a Contains the local pieces of the m-by-n distributed matrix Q.

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?ormqr
Multiplies a general matrix by the orthogonal matrix Q
of the QR factorization formed by p?geqrf.

Syntax
void psormqr (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *tau , float *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , float *work , MKL_INT *lwork , MKL_INT *info );
void pdormqr (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , double
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *tau , double *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , double *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?ormqr function overwrites the general real m-by-n distributed matrix sub(C) = C(ic:ic+m-1, jc:jc+n-1)
with

                 side = 'L'       side = 'R'
trans = 'N':     Q*sub(C)         sub(C)*Q
trans = 'T':     Q^T*sub(C)       sub(C)*Q^T

where Q is a real orthogonal distributed matrix defined as the product of k elementary reflectors
Q = H(1) H(2)... H(k)

as returned by p?geqrf. Q is of order m if side = 'L' and of order n if side = 'R'.
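
For illustration, a hedged C sketch that forms Q^T*sub(C) with the reflectors produced by pdgeqrf (for example, as the first step of a least-squares solve) follows; the descriptors desca and descc and the local arrays a, c, and tau are assumed to be prepared by the caller, and the wrapper name apply_qt is hypothetical.

#include <stdlib.h>
#include <mkl_scalapack.h>

/* Sketch: sub(C) := Q^T*sub(C), where Q is held as k reflectors in a and tau. */
void apply_qt(MKL_INT m, MKL_INT n, MKL_INT k, double *a, MKL_INT *desca,
              double *tau, double *c, MKL_INT *descc)
{
    char side = 'L', trans = 'T';
    MKL_INT ia = 1, ja = 1, ic = 1, jc = 1, info, lwork = -1;
    double wkopt;
    pdormqr(&side, &trans, &m, &n, &k, a, &ia, &ja, desca, tau,
            c, &ic, &jc, descc, &wkopt, &lwork, &info);          /* query */

    lwork = (MKL_INT)wkopt;
    double *work = (double *)malloc((size_t)lwork * sizeof(double));
    pdormqr(&side, &trans, &m, &n, &k, a, &ia, &ja, desca, tau,
            c, &ic, &jc, descc, work, &lwork, &info);            /* apply */
    free(work);
}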

Input Parameters

side (global)
= 'L': Q or Q^T is applied from the left.
= 'R': Q or Q^T is applied from the right.

trans (global)
= 'N', no transpose, Q is applied.
= 'T', transpose, Q^T is applied.

m (global) The number of rows in the distributed matrix sub(C) (m≥0).

n (global) The number of columns in the distributed matrix sub(C) (n≥0).

k (global) The number of elementary reflectors whose product defines the


matrix Q. Constraints:
If side = 'L', m≥k≥0

If side = 'R', n≥k≥0.


a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1). The
j-th column of the matrix stored in a must contain the vector that defines
the elementary reflector H(j), ja≤j≤ja+k-1, as returned by p?geqrf in the
k columns of its distributed matrix argument A(ia:*, ja:ja+k-1). A(ia:*,
ja:ja+k-1) is modified by the function but restored on exit.
If side = 'L', lld_a≥max(1, LOCr(ia+m-1))

If side = 'R', lld_a≥max(1, LOCr(ia+n-1))

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ja+k-1).

Contains the scalar factor tau[j] of elementary reflectors H(j+1) as


returned by p?geqrf (0 ≤j < LOCc(ja+k-1)). tau is tied to the distributed
matrix A.

c (local)
Pointer into the local memory to an array of local size lld_c*LOCc(jc+n-1).

Contains the local pieces of the distributed matrix sub(C) to be factored.

ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the matrix sub(C), respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local)
Workspace array of size of lwork.

lwork (local or global) size of work, must be at least:

if side = 'L',

lwork≥max((nb_a*(nb_a-1))/2, (nqc0+mpc0)*nb_a) + nb_a*nb_a


else if side = 'R',

lwork≥max((nb_a*(nb_a-1))/2, (nqc0+max(npa0+numroc(numroc(n
+icoffc, nb_a, 0, 0, NPCOL), nb_a, 0, 0, lcmq), mpc0))*nb_a)
+ nb_a*nb_a
end if
where
lcmq = lcm/NPCOL with lcm = ilcm(NPROW, NPCOL),
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),

iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
npa0= numroc(n+iroffa, mb_a, MYROW, iarow, NPROW),
iroffc = mod(ic-1, mb_c),
icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0= numroc(m+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0= numroc(n+icoffc, nb_c, MYCOL, iccol, NPCOL),
ilcm, indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL,
NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

c Overwritten by the product Q*sub(C), or Q^T*sub(C), or sub(C)*Q^T, or
sub(C)*Q.

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?unmqr
Multiplies a complex matrix by the unitary matrix Q of
the QR factorization formed by p?geqrf.

Syntax
void pcunmqr (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau ,
MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzunmqr (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau ,
MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );


Include Files
• mkl_scalapack.h

Description
This function overwrites the general complex m-by-n distributed matrix sub(C) = C(ic:ic+m-1, jc:jc+n-1)
with

                 side = 'L'       side = 'R'
trans = 'N':     Q*sub(C)         sub(C)*Q
trans = 'C':     Q^H*sub(C)       sub(C)*Q^H

where Q is a complex unitary distributed matrix defined as the product of k elementary reflectors
Q = H(1) H(2)... H(k) as returned by p?geqrf. Q is of order m if side = 'L' and of order n if side = 'R'.

Input Parameters

side (global)
= 'L': Q or Q^H is applied from the left.
= 'R': Q or Q^H is applied from the right.

trans (global)
= 'N', no transpose, Q is applied.
= 'C', conjugate transpose, Q^H is applied.

m (global) The number of rows in the distributed matrix sub(C) (m≥0).

n (global) The number of columns in the distributed matrix sub(C) (n≥0).

k (global) The number of elementary reflectors whose product defines the


matrix Q. Constraints:
If side = 'L', m≥k≥0

If side = 'R', n≥k≥0.

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+k-1). The
j-th column of the matrix stored in a must contain the vector that defines
the elementary reflector H(j), ja≤j≤ja+k-1, as returned by p?geqrf in the
k columns of its distributed matrix argument A(ia:*, ja:ja+k-1). A(ia:*,
ja:ja+k-1) is modified by the function but restored on exit.
If side = 'L', lld_a≥max(1, LOCr(ia+m-1))

If side = 'R', lld_a≥max(1, LOCr(ia+n-1))

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)

Array of size LOCc(ja+k-1).

Contains the scalar factor tau[j] of elementary reflectors H(j+1) as


returned by p?geqrf (0 ≤j < LOCc(ja+k-1)). tau is tied to the distributed
matrix A.

c (local)
Pointer into the local memory to an array of local size lld_c*LOCc(jc+n-1).

Contains the local pieces of the distributed matrix sub(C) to be factored.

ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local)
Workspace array of size of lwork.

lwork (local or global) size of work, must be at least:

If side = 'L',

lwork≥max((nb_a*(nb_a-1))/2, (nqc0 + mpc0)*nb_a) + nb_a*nb_a


else if side = 'R',

lwork≥max((nb_a*(nb_a-1))/2, (nqc0 + max(npa0 +


numroc(numroc(n+icoffc, nb_a, 0, 0, NPCOL), nb_a, 0, 0,
lcmq), mpc0))*nb_a) + nb_a*nb_a
end if
where
lcmq = lcm/NPCOL with lcm = ilcm (NPROW, NPCOL),
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
npa0 = numroc(n+iroffa, mb_a, MYROW, iarow, NPROW),
iroffc = mod(ic-1, mb_c),
icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0 = numroc(m+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0 = numroc(n+icoffc, nb_c, MYCOL, iccol, NPCOL),
ilcm, indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL,
NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.


If lwork = -1, then lwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

c Overwritten by the product Q*sub(C), or Q^H*sub(C), or sub(C)*Q^H, or
sub(C)*Q.

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?gelqf
Computes the LQ factorization of a general
rectangular matrix.

Syntax
void psgelqf (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdgelqf (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
void pcgelqf (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT
*info );
void pzgelqf (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT
*info );

Include Files
• mkl_scalapack.h

Description
The p?gelqf function computes the LQ factorization of a real/complex distributed m-by-n matrix sub(A)=
A(ia:ia+m-1,ja:ja+n-1) = L*Q.

Input Parameters

m (global) The number of rows in the distributed submatrix sub(A) (m≥ 0).

n (global) The number of columns in the distributed submatrix sub(A) (n≥
0).

a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).

Contains the local pieces of the distributed matrix sub(A) to be factored.

ia, ja (global) The row and column indices in the global array A indicating the first
row and the first column of the submatrix A(ia:ia+m-1,ja:ja+n-1),
respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local)
Workspace array of size of lwork.

lwork (local or global) size of work, must be at least lwork≥mb_a*(mp0 + nq0 +


mb_a), where
iroff = mod(ia-1, mb_a),
icoff = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mp0 = numroc(m+iroff, mb_a, MYROW, iarow, NPROW),
nq0 = numroc(n+icoff, nb_a, MYCOL, iacol, NPCOL)
indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

NOTE
mod(x,y) is the integer remainder of x/y.

If lwork = -1, then lwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a The elements on and below the diagonal of sub(A) contain the m-by-
min(m,n) lower trapezoidal matrix L (L is lower triangular if m≤n); the
elements above the diagonal, with the array tau, represent the orthogonal/
unitary matrix Q as a product of elementary reflectors (see Application
Notes below).

tau (local)
Array of size LOCr(ia+min(m, n)-1).


Contains the scalar factors of elementary reflectors. tau is tied to the


distributed matrix A.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(ia+k-1)*H(ia+k-2)*...*H(ia),

where k = min(m,n)

Each H(i) has the form


H(i) = I - tau*v*v'
where tau is a real/complex scalar, and v is a real/complex vector with v(1:i-1) = 0 and v(i) = 1; v(i+1:n) is
stored on exit in A(ia+i-1,ja+i:ja+n-1), and tau in tau[ia+i-2].

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?orglq
Generates the real orthogonal matrix Q of the LQ
factorization formed by p?gelqf.

Syntax
void psorglq (MKL_INT *m , MKL_INT *n , MKL_INT *k , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdorglq (MKL_INT *m , MKL_INT *n , MKL_INT *k , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?orglq function generates the whole or part of m-by-n real distributed matrix Q denoting A(ia:ia
+m-1,ja:ja+n-1) with orthonormal rows, which is defined as the first m rows of a product of k elementary
reflectors of order n

Q = H(k)*...* H(2)* H(1)

as returned by p?gelqf.

Input Parameters

m (global) The number of rows in the matrix sub(Q); (m≥0).

n (global) The number of columns in the matrix sub(Q) (n≥m≥0).

k (global) The number of elementary reflectors whose product defines the


matrix Q(m≥k≥0).

a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).
On entry, the i-th row of the matrix stored in a must contain the vector that
defines the elementary reflector H(i), ia≤i≤ia+k-1, as returned by p?gelqf
in the k rows of its distributed matrix argument A(ia:ia+k-1, ja:*).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A(ia:ia+m-1,ja:ja+n-1),
respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local)
Workspace array of size of lwork.

lwork (local or global) size of work, must be at least


lwork≥mb_a*(mpa0+nqa0+mb_a), where
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mpa0 = numroc(m+iroffa, mb_a, MYROW, iarow, NPROW),
nqa0 = numroc(n+icoffa, nb_a, MYCOL, iacol, NPCOL)

NOTE
mod(x,y) is the integer remainder of x/y.

indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a Contains the local pieces of the m-by-n distributed matrix Q to be factored.

tau (local)
Array of size LOCr(ia+k-1).

Contains the scalar factors tau[j] of elementary reflectors H(j+1), 0 ≤j <


LOCr(ia+k-1). tau is tied to the distributed matrix A.


work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?unglq
Generates the unitary matrix Q of the LQ factorization
formed by p?gelqf.

Syntax
void pcunglq (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzunglq (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
This function generates the whole or part of m-by-n complex distributed matrix Q denoting A(ia:ia
+m-1,ja:ja+n-1) with orthonormal rows, which is defined as the first m rows of a product of k elementary
reflectors of order n

Q = (H(k))^H*...*(H(2))^H*(H(1))^H as returned by p?gelqf.

Input Parameters

m (global) The number of rows in the matrix sub(Q) (m≥0).

n (global) The number of columns in the matrix sub(Q) (n≥m≥0).

k (global) The number of elementary reflectors whose product defines the


matrix Q(m≥k≥0).

a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).
On entry, the i-th row of the matrix stored in a must contain the vector that
defines the elementary reflector H(i), ia≤i≤ia+k-1, as returned by p?gelqf
in the k rows of its distributed matrix argument A(ia:ia+k-1, ja:*).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A(ia:ia+m-1,ja:ja+n-1),
respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCr(ia+k-1).

Contains the scalar factors tau[j] of elementary reflectors H(j+1), 0 ≤j <


LOCr(ia+k-1). tau is tied to the distributed matrix A.

work (local)
Workspace array of size of lwork.

lwork (local or global) size of work, must be at least


lwork≥mb_a*(mpa0+nqa0+mb_a), where
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mpa0 = numroc(m+iroffa, mb_a, MYROW, iarow, NPROW),
nqa0 = numroc(n+icoffa, nb_a, MYCOL, iacol, NPCOL)
indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

NOTE
mod(x,y) is the integer remainder of x/y.

If lwork = -1, then lwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a Contains the local pieces of the m-by-n distributed matrix Q to be factored.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.


See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?ormlq
Multiplies a general matrix by the orthogonal matrix Q
of the LQ factorization formed by p?gelqf.

Syntax
void psormlq (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *tau , float *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , float *work , MKL_INT *lwork , MKL_INT *info );
void pdormlq (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , double
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *tau , double *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , double *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?ormlq function overwrites the general real m-by-n distributed matrix sub(C) = C(ic:ic+m-1, jc:jc+n-1)
with

side ='L' side ='R'


trans = 'N': Q*sub(C) sub(C)*Q
trans = 'T': QT*sub(C) sub(C)*QT

where Q is a real orthogonal distributed matrix defined as the product of k elementary reflectors
Q = H(k)...H(2) H(1)

as returned by p?gelqf. Q is of order m if side = 'L' and of order n if side = 'R'.
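The typical two-step call pattern (workspace query with lwork = -1, then the actual call) can be sketched as follows. This fragment is illustrative rather than part of the reference: the helper name apply_qt_from_lq is arbitrary, and the process grid, the descriptors desca and descc, the local arrays a and c, and the tau array from a preceding p?gelqf call are assumed to have been set up by the caller.

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Sketch: sub(C) := Q^T * sub(C), with Q from a previous p?gelqf factorization.
   All distributed-matrix data (a, desca, tau, c, descc) and the index
   arguments are assumed to be prepared elsewhere. */
void apply_qt_from_lq(MKL_INT m, MKL_INT n, MKL_INT k,
                      double *a, MKL_INT ia, MKL_INT ja, MKL_INT *desca,
                      double *tau,
                      double *c, MKL_INT ic, MKL_INT jc, MKL_INT *descc)
{
    char side = 'L', trans = 'T';
    MKL_INT lwork = -1, info;
    double wkopt;

    /* Workspace query: lwork = -1 returns the optimal size in wkopt. */
    pdormlq(&side, &trans, &m, &n, &k, a, &ia, &ja, desca, tau,
            c, &ic, &jc, descc, &wkopt, &lwork, &info);

    lwork = (MKL_INT)wkopt;
    double *work = (double *)malloc((size_t)lwork * sizeof(double));

    /* Apply Q^T to sub(C); sub(C) is overwritten with the product. */
    pdormlq(&side, &trans, &m, &n, &k, a, &ia, &ja, desca, tau,
            c, &ic, &jc, descc, work, &lwork, &info);

    free(work);
}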

Input Parameters

side (global)
='L': Q or Q^T is applied from the left.
='R': Q or Q^T is applied from the right.

trans (global)
='N', no transpose, Q is applied.
='T', transpose, Q^T is applied.

m (global) The number of rows in the distributed matrix sub(C) (m≥0).

n (global) The number of columns in the distributed matrix sub(C) (n≥0).

k (global) The number of elementary reflectors whose product defines the


matrix Q. Constraints:
If side = 'L', m≥k≥0

If side = 'R', n≥k≥0.

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+m-1), if
side = 'L' and lld_a*LOCc(ja+n-1), if side = 'R'. The i-th row of the
matrix stored in a must contain the vector that defines the elementary
reflector H(i), ia≤i≤ia+k-1, as returned by p?gelqf in the k rows of its
distributed matrix argument A(ia:ia+k-1, ja:*).

A(ia:ia+k-1, ja:*) is modified by the function but restored on exit.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ja+k-1).

Contains the scalar factor tau[j] of elementary reflectors H(j+1) as


returned by p?gelqf (0 ≤j < LOCc(ja+k-1)). tau is tied to the distributed
matrix A.

c (local)
Pointer into the local memory to an array of local size lld_c*LOCc(jc+n-1).

Contains the local pieces of the distributed matrix sub(C) to be factored.

ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local)
Workspace array of size of lwork.

lwork (local or global) size of the array work; must be at least:

If side = 'L',

lwork≥max((mb_a*(mb_a-1))/2, (mpc0 + max(mqa0 + numroc(numroc(m+iroffc,
mb_a, 0, 0, NPROW), mb_a, 0, 0, lcmp), nqc0))*mb_a) + mb_a*mb_a

else if side = 'R',

lwork≥max((mb_a*(mb_a-1))/2, (mpc0 + nqc0)*mb_a) + mb_a*mb_a

end if
where
lcmp = lcm/NPROW with lcm = ilcm (NPROW, NPCOL),
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mqa0 = numroc(m+icoffa, nb_a, MYCOL, iacol, NPCOL),


iroffc = mod(ic-1, mb_c),


icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0 = numroc(m+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0 = numroc(n+icoffc, nb_c, MYCOL, iccol, NPCOL),

NOTE
mod(x,y) is the integer remainder of x/y.

ilcm, indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL,
NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

c Overwritten by the product Q*sub(C), or Q' *sub (C), or sub(C)*Q', or


sub(C)*Q

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?unmlq
Multiplies a general matrix by the unitary matrix Q of
the LQ factorization formed by p?gelqf.

Syntax
void pcunmlq (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau ,
MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );

void pzunmlq (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau ,
MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
This function overwrites the general complex m-by-n distributed matrix sub(C) = C(ic:ic+m-1,jc:jc+n-1) with

               side = 'L'       side = 'R'
trans = 'N':   Q*sub(C)         sub(C)*Q
trans = 'C':   Q^H*sub(C)       sub(C)*Q^H

where Q is a complex unitary distributed matrix defined as the product of k elementary reflectors
Q = H(k)' ... H(2)' H(1)'

as returned by p?gelqf. Q is of order m if side = 'L' and of order n if side = 'R'.
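As an illustrative sketch (not part of the reference), the workspace-query idiom for the double-complex variant looks like the fragment below; the helper name and the assumption that the grid, descriptors, and local arrays a, c, and tau (from a preceding pzgelqf call) already exist are hypothetical.

#include <stdlib.h>
#include "mkl_types.h"
#include "mkl_scalapack.h"

/* Sketch: sub(C) := Q^H * sub(C) with pzunmlq, Q from a previous p?gelqf call.
   Grid, descriptors, and distributed data are assumed to exist already. */
void apply_qh_from_lq(MKL_INT m, MKL_INT n, MKL_INT k,
                      MKL_Complex16 *a, MKL_INT ia, MKL_INT ja, MKL_INT *desca,
                      MKL_Complex16 *tau,
                      MKL_Complex16 *c, MKL_INT ic, MKL_INT jc, MKL_INT *descc)
{
    char side = 'L', trans = 'C';
    MKL_INT lwork = -1, info;
    MKL_Complex16 wkopt;

    /* Workspace query (lwork = -1); the size comes back in the real part of wkopt. */
    pzunmlq(&side, &trans, &m, &n, &k, a, &ia, &ja, desca, tau,
            c, &ic, &jc, descc, &wkopt, &lwork, &info);

    lwork = (MKL_INT)wkopt.real;
    MKL_Complex16 *work = (MKL_Complex16 *)malloc((size_t)lwork * sizeof(MKL_Complex16));

    /* Overwrite sub(C) with Q^H * sub(C). */
    pzunmlq(&side, &trans, &m, &n, &k, a, &ia, &ja, desca, tau,
            c, &ic, &jc, descc, work, &lwork, &info);

    free(work);
}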

Input Parameters

side (global)
='L': Q or Q^H is applied from the left.
='R': Q or Q^H is applied from the right.

trans (global)
='N', no transpose, Q is applied.
='C', conjugate transpose, Q^H is applied.

m (global) The number of rows in the distributed matrix sub(C) (m≥0).

n (global) The number of columns in the distributed matrix sub(C)(n≥0).

k (global) The number of elementary reflectors whose product defines the


matrix Q. Constraints:
If side = 'L', m≥k≥0

If side = 'R', n≥k≥0.

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+m-1), if
side = 'L', and lld_a*LOCc(ja+n-1), if side = 'R', where
lld_a≥max(1, LOCr(ia+k-1)). The i-th row of the matrix stored in a must
contain the vector that defines the elementary reflector H(i), ia≤i≤ia+k-1,
as returned by p?gelqf in the k rows of its distributed matrix argument
A(ia:ia+k-1, ja:*). A(ia:ia+k-1, ja:*) is modified by the function but
restored on exit.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.


desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ia+k-1).

Contains the scalar factor tau[j] of elementary reflectors H(j+1) as


returned by p?gelqf (0 ≤j < LOCc(ia+k-1)). tau is tied to the distributed
matrix A.

c (local)
Pointer into the local memory to an array of local size lld_c*LOCc(jc+n-1).

Contains the local pieces of the distributed matrix sub(C) to be factored.

ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local)
Workspace array of size of lwork.

lwork (local or global) size of the array work; must be at least:

If side = 'L',

lwork≥max((mb_a*(mb_a-1))/2, (mpc0 + max(mqa0 + numroc(numroc(m+iroffc,
mb_a, 0, 0, NPROW), mb_a, 0, 0, lcmp), nqc0))*mb_a) + mb_a*mb_a

else if side = 'R',

lwork≥max((mb_a*(mb_a-1))/2, (mpc0 + nqc0)*mb_a) + mb_a*mb_a

end if
where
lcmp = lcm/NPROW with lcm = ilcm (NPROW, NPCOL),
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mqa0 = numroc(m + icoffa, nb_a, MYCOL, iacol, NPCOL),
iroffc = mod(ic-1, mb_c),
icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0 = numroc(m+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0 = numroc(n+icoffc, nb_c, MYCOL, iccol, NPCOL),

NOTE
mod(x,y) is the integer remainder of x/y.

ilcm, indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL,
NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

c Overwritten by the product Q*sub(C), or Q'*sub (C), or sub(C)*Q', or


sub(C)*Q

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?geqlf
Computes the QL factorization of a general matrix.

Syntax
void psgeqlf (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdgeqlf (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
void pcgeqlf (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT
*info );
void pzgeqlf (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT
*info );

Include Files
• mkl_scalapack.h


Description
The p?geqlf function forms the QL factorization of a real/complex distributed m-by-n matrix sub(A)=
A(ia:ia+m-1, ja:ja+n-1) = Q*L.

Input Parameters

m (global) The number of rows in the matrix sub(Q); (m≥ 0).

n (global) The number of columns in the matrix sub(Q) (n≥ 0).

a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).
Contains the local pieces of the distributed matrix sub(A) to be factored.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A(ia:ia+m-1, ja:ja+n-1),
respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local)
Workspace array of size of lwork.

lwork (local or global) size of work, must be at least lwork≥nb_a*(mp0 + nq0 +


nb_a), where
iroff = mod(ia-1, mb_a),
icoff = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mp0 = numroc(m+iroff, mb_a, MYROW, iarow, NPROW),
nq0 = numroc(n+icoff, nb_a, MYCOL, iacol, NPCOL)

NOTE
mod(x,y) is the integer remainder of x/y.

numroc and indxg2p are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a On exit, if m≥n, the lower triangle of the distributed submatrix A(ia+m-n:ia


+m-1, ja:ja+n-1) contains the n-by-n lower triangular matrix L; if m≤n, the
elements on and below the (n - m)-th superdiagonal contain the m-by-n

lower trapezoidal matrix L; the remaining elements, with the array tau,
represent the orthogonal/unitary matrix Q as a product of elementary
reflectors (see Application Notes below).

tau (local)
Array of size LOCc(ja+n-1).

Contains the scalar factors of elementary reflectors. tau is tied to the


distributed matrix A.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(ja+k-1)*...*H(ja+1)*H(ja)

where k = min(m,n)

Each H(i) has the form


H(i) = I - tau*v*v'
where tau is a real/complex scalar, and v is a real/complex vector with v(m-k+i+1:m) = 0 and v(m-k+i) = 1;
v(1:m-k+i-1) is stored on exit in A(ia:ia+m-k+i-2, ja+n-k+i-1), and tau in tau[ja+n-k+i-2].

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?orgql
Generates the orthogonal matrix Q of the QL
factorization formed by p?geqlf.

Syntax
void psorgql (MKL_INT *m , MKL_INT *n , MKL_INT *k , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdorgql (MKL_INT *m , MKL_INT *n , MKL_INT *k , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?orgql function generates the whole or part of m-by-n real distributed matrix Q denoting A(ia:ia
+m-1,ja:ja+n-1) with orthonormal columns, which is defined as the last n columns of a product of k elementary
reflectors of order m


Q = H(k)*...*H(2)*H(1)

as returned by p?geqlf.
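For illustration (not part of the reference), the fragment below forms Q explicitly after a QL factorization; the helper name form_q_ql is arbitrary, and a, desca, and tau are assumed to be the outputs of an earlier pdgeqlf call on an already-initialized grid.

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Sketch: after pdgeqlf, form the last n columns of Q explicitly with pdorgql. */
void form_q_ql(MKL_INT m, MKL_INT n, MKL_INT k,
               double *a, MKL_INT ia, MKL_INT ja, MKL_INT *desca, double *tau)
{
    MKL_INT lwork = -1, info;
    double wkopt;

    pdorgql(&m, &n, &k, a, &ia, &ja, desca, tau, &wkopt, &lwork, &info); /* workspace query */

    lwork = (MKL_INT)wkopt;
    double *work = (double *)malloc((size_t)lwork * sizeof(double));

    pdorgql(&m, &n, &k, a, &ia, &ja, desca, tau, work, &lwork, &info);   /* form Q */

    free(work);
}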

Input Parameters

m (global) The number of rows in the matrix sub(Q), (m≥0).

n (global) The number of columns in the matrix sub(Q),(m≥n≥0).

k (global) The number of elementary reflectors whose product defines the


matrix Q(n≥k≥0).

a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).
On entry, the j-th column of the matrix stored in a must contain the vector
that defines the elementary reflector H(j), ja+n-k≤j≤ja+n-1, as returned by
p?geqlf in the k columns of its distributed matrix argument
A(ia:*, ja+n-k:ja+n-1).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A(ia:ia+m-1,ja:ja+n-1),
respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ja+n-1).

Contains the scalar factors tau[j] of elementary reflectors H(j+1), 0 ≤j <
LOCc(ja+n-1). tau is tied to the distributed matrix A.

work (local)
Workspace array of size of lwork.

lwork (local or global) size of work, must be at least


lwork≥nb_a*(nqa0+mpa0+nb_a), where
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mpa0 = numroc(m+iroffa, mb_a, MYROW, iarow, NPROW),
nqa0 = numroc(n+icoffa, nb_a, MYCOL, iacol, NPCOL)

NOTE
mod(x,y) is the integer remainder of x/y.

indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a Contains the local pieces of the m-by-n distributed matrix Q.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?ungql
Generates the unitary matrix Q of the QL factorization
formed by p?geqlf.

Syntax
void pcungql (const MKL_INT *m , const MKL_INT *n , const MKL_INT *k , MKL_Complex8
*a , const MKL_INT *ia , const MKL_INT *ja , const MKL_INT *desca , const MKL_Complex8
*tau , MKL_Complex8 *work , const MKL_INT *lwork , MKL_INT *info );
void pzungql (const MKL_INT *m , const MKL_INT *n , const MKL_INT *k , MKL_Complex16
*a , const MKL_INT *ia , const MKL_INT *ja , const MKL_INT *desca , const MKL_Complex16
*tau , MKL_Complex16 *work , const MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
This function generates the whole or part of m-by-n complex distributed matrix Q denoting A(ia:ia
+m-1,ja:ja+n-1) with orthonormal columns, which is defined as the last n columns of a product of k
elementary reflectors of order m

Q = H(k)*...*H(2)*H(1) as returned by p?geqlf.
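A hypothetical sketch of the call sequence follows (it is not part of the reference); the helper name is arbitrary, and a, desca, and tau are assumed to hold the output of a previous pzgeqlf call.

#include <stdlib.h>
#include "mkl_types.h"
#include "mkl_scalapack.h"

/* Sketch: form the complex matrix Q of a QL factorization with pzungql. */
void form_q_ql_z(MKL_INT m, MKL_INT n, MKL_INT k,
                 MKL_Complex16 *a, MKL_INT ia, MKL_INT ja, MKL_INT *desca,
                 MKL_Complex16 *tau)
{
    MKL_INT lwork = -1, info;
    MKL_Complex16 wkopt;

    pzungql(&m, &n, &k, a, &ia, &ja, desca, tau, &wkopt, &lwork, &info); /* workspace query */

    lwork = (MKL_INT)wkopt.real;
    MKL_Complex16 *work = (MKL_Complex16 *)malloc((size_t)lwork * sizeof(MKL_Complex16));

    pzungql(&m, &n, &k, a, &ia, &ja, desca, tau, work, &lwork, &info);   /* form Q */

    free(work);
}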

Input Parameters

m (global) The number of rows in the matrix sub(Q) (m≥0).

n (global) The number of columns in the matrix sub(Q) (m≥n≥0).


k (global) The number of elementary reflectors whose product defines the


matrix Q(n≥k≥0).

a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja
+n-1). On entry, the j-th column of the matrix stored in a must
contain the vector that defines the elementary reflector H(j), ja+n-
k≤j≤ja+n-1, as returned by p?geqlf in the k columns of its
distributed matrix argument A(ia:*, ja+n-k: ja+n-1).
ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A(ia:ia+m-1,ja:ja+n-1),
respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCr(ia+n-1).

Contains the scalar factors tau[j] of elementary reflectors H(j+1), 0 ≤j <


LOCr(ia+n-1). tau is tied to the distributed matrix A.

work (local)
Workspace array of size of lwork.

lwork (local or global) size of work, must be at least lwork≥nb_a*(nqa0 + mpa0


+ nb_a), where
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mpa0 = numroc(m+iroffa, mb_a, MYROW, iarow, NPROW),
nqa0 = numroc(n+icoffa, nb_a, MYCOL, iacol, NPCOL)

indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a Contains the local pieces of the m-by-n distributed matrix Q.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
= 0: the execution is successful.

< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?ormql
Multiplies a general matrix by the orthogonal matrix Q
of the QL factorization formed by p?geqlf.

Syntax
void psormql (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *tau , float *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , float *work , MKL_INT *lwork , MKL_INT *info );
void pdormql (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , double
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *tau , double *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , double *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?ormql function overwrites the general real m-by-n distributed matrix sub(C) = C(ic:ic+m-1,jc:jc+n-1) with

               side = 'L'       side = 'R'
trans = 'N':   Q*sub(C)         sub(C)*Q
trans = 'T':   Q^T*sub(C)       sub(C)*Q^T

where Q is a real orthogonal distributed matrix defined as the product of k elementary reflectors

Q = H(k)' ... H(2)' H(1)'

as returned by p?geqlf. Q is of order m if side = 'L' and of order n if side = 'R'.
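A minimal illustrative sketch (not part of the reference) of applying Q from the right with the single-precision variant is shown below; the helper name is arbitrary, and the distributed matrices, descriptors, and tau (from a preceding psgeqlf call) are assumed to be set up by the caller.

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Sketch: sub(C) := sub(C) * Q with psormql (side = 'R'), Q from p?geqlf. */
void apply_q_right_ql(MKL_INT m, MKL_INT n, MKL_INT k,
                      float *a, MKL_INT ia, MKL_INT ja, MKL_INT *desca, float *tau,
                      float *c, MKL_INT ic, MKL_INT jc, MKL_INT *descc)
{
    char side = 'R', trans = 'N';
    MKL_INT lwork = -1, info;
    float wkopt;

    psormql(&side, &trans, &m, &n, &k, a, &ia, &ja, desca, tau,
            c, &ic, &jc, descc, &wkopt, &lwork, &info);   /* workspace query */

    lwork = (MKL_INT)wkopt;
    float *work = (float *)malloc((size_t)lwork * sizeof(float));

    psormql(&side, &trans, &m, &n, &k, a, &ia, &ja, desca, tau,
            c, &ic, &jc, descc, work, &lwork, &info);     /* apply Q from the right */

    free(work);
}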

Input Parameters

side (global)
='L': Q or Q^T is applied from the left.
='R': Q or Q^T is applied from the right.

trans (global)
='N', no transpose, Q is applied.
='T', transpose, Q^T is applied.

m (global) The number of rows in the distributed matrix sub(C), (m≥0).

n (global) The number of columns in the distributed matrix sub(C), (n≥0).


k (global) The number of elementary reflectors whose product defines the


matrix Q. Constraints:
If side = 'L', m≥k≥0

If side = 'R', n≥k≥0.

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+k-1). The
j-th column of the matrix stored in a must contain the vector that defines
the elementary reflector H(j), ja≤j≤ja+k-1, as returned by p?geqlf in the
k columns of its distributed matrix argument A(ia:*, ja:ja+k-1). A(ia:*,
ja:ja+k-1) is modified by the function but restored on exit.
If side = 'L',lld_a≥max(1, LOCr(ia+m-1)),

If side = 'R', lld_a≥max(1, LOCr(ia+n-1)).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ja+n-1).

Contains the scalar factor tau[j] of elementary reflectors H(j+1) as


returned by p?geqlf (0 ≤j < LOCc(ja+k-1)). tau is tied to the distributed
matrix A.

c (local)
Pointer into the local memory to an array of local size lld_c*LOCc(jc+n-1).

Contains the local pieces of the distributed matrix sub(C) to be factored.

ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local)
Workspace array of size of lwork.

lwork (local or global) dimension of work, must be at least:

If side = 'L',

lwork≥max((nb_a*(nb_a-1))/2, (nqc0 + mpc0)*nb_a) + nb_a*nb_a

else if side = 'R',

lwork≥max((nb_a*(nb_a-1))/2, (nqc0 + max(npa0 + numroc(numroc(n+icoffc,
nb_a, 0, 0, NPCOL), nb_a, 0, 0, lcmq), mpc0))*nb_a) + nb_a*nb_a

end if
where

lcmq = lcm/NPCOL with lcm = ilcm (NPROW, NPCOL),
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
npa0= numroc(n + iroffa, mb_a, MYROW, iarow, NPROW),
iroffc = mod(ic-1, mb_c),
icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0 = numroc(m+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0 = numroc(n+icoffc, nb_c, MYCOL, iccol, NPCOL),

NOTE
mod(x,y) is the integer remainder of x/y.

ilcm, indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL,
NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

c Overwritten by the product Q* sub(C), or Q'*sub (C), or sub(C)* Q', or


sub(C)* Q

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?unmql
Multiplies a general matrix by the unitary matrix Q of
the QL factorization formed by p?geqlf.


Syntax
void pcunmql (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau ,
MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzunmql (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau ,
MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
This function overwrites the general complex m-by-n distributed matrix sub(C) = C(ic:ic+m-1,jc:jc+n-1) with

               side = 'L'       side = 'R'
trans = 'N':   Q*sub(C)         sub(C)*Q
trans = 'C':   Q^H*sub(C)       sub(C)*Q^H

where Q is a complex unitary distributed matrix defined as the product of k elementary reflectors
Q = H(k)' ... H(2)' H(1)'

as returned by p?geqlf. Q is of order m if side = 'L' and of order n if side = 'R'.
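For illustration only (not part of the reference), the single-precision complex variant follows the same workspace-query idiom; the helper name is arbitrary, and the grid, descriptors, and local arrays a, c, and tau (from a preceding pcgeqlf call) are assumed to be prepared already.

#include <stdlib.h>
#include "mkl_types.h"
#include "mkl_scalapack.h"

/* Sketch: sub(C) := Q^H * sub(C) with pcunmql, Q from a previous pcgeqlf call. */
void apply_qh_from_ql_c(MKL_INT m, MKL_INT n, MKL_INT k,
                        MKL_Complex8 *a, MKL_INT ia, MKL_INT ja, MKL_INT *desca,
                        MKL_Complex8 *tau,
                        MKL_Complex8 *c, MKL_INT ic, MKL_INT jc, MKL_INT *descc)
{
    char side = 'L', trans = 'C';
    MKL_INT lwork = -1, info;
    MKL_Complex8 wkopt;

    pcunmql(&side, &trans, &m, &n, &k, a, &ia, &ja, desca, tau,
            c, &ic, &jc, descc, &wkopt, &lwork, &info);   /* workspace query */

    lwork = (MKL_INT)wkopt.real;
    MKL_Complex8 *work = (MKL_Complex8 *)malloc((size_t)lwork * sizeof(MKL_Complex8));

    pcunmql(&side, &trans, &m, &n, &k, a, &ia, &ja, desca, tau,
            c, &ic, &jc, descc, work, &lwork, &info);     /* apply Q^H */

    free(work);
}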

Input Parameters

side (global)
='L': Q or Q^H is applied from the left.
='R': Q or Q^H is applied from the right.

trans (global)
='N', no transpose, Q is applied.
='C', conjugate transpose, Q^H is applied.

m (global) The number of rows in the distributed matrix sub(C) (m≥0).

n (global) The number of columns in the distributed matrix sub(C)(n≥0).

k (global) The number of elementary reflectors whose product defines the


matrix Q. Constraints:
If side = 'L', m≥k≥0

If side = 'R', n≥k≥0.

a (local)

Pointer into the local memory to an array of size lld_a*LOCc(ja+k-1). The
j-th column of the matrix stored in a must contain the vector that defines
the elementary reflector H(j), ja≤j≤ja+k-1, as returned by p?geqlf in the
k columns of its distributed matrix argument A(ia:*, ja:ja+k-1). A(ia:*,
ja:ja+k-1) is modified by the function but restored on exit.
If side = 'L',lld_a≥max(1, LOCr(ia+m-1)),

If side = 'R', lld_a≥max(1, LOCr(ia+n-1)).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ia+n-1).

Contains the scalar factor tau[j] of elementary reflectors H(j+1) as


returned by p?geqlf (0 ≤j < LOCc(ia+n-1)). tau is tied to the distributed
matrix A.

c (local)
Pointer into the local memory to an array of local size lld_c*LOCc(jc+n-1).

Contains the local pieces of the distributed matrix sub(C) to be factored.

ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local)
Workspace array of size of lwork.

lwork (local or global) size of work, must be at least:

If side = 'L',

lwork≥max((nb_a*(nb_a-1))/2, (nqc0 + mpc0)*nb_a) + nb_a*nb_a

else if side = 'R',

lwork≥max((nb_a*(nb_a-1))/2, (nqc0 + max(npa0 + numroc(numroc(n+icoffc,
nb_a, 0, 0, NPCOL), nb_a, 0, 0, lcmq), mpc0))*nb_a) + nb_a*nb_a

end if
where
lcmq = lcm/NPCOL with lcm = ilcm (NPROW, NPCOL),
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),


npa0 = numroc (n + iroffa, mb_a, MYROW, iarow, NPROW),


iroffc = mod(ic-1, mb_c),
icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0 = numroc(m+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0 = numroc(n+icoffc, nb_c, MYCOL, iccol, NPCOL),

NOTE
mod(x,y) is the integer remainder of x/y.

ilcm, indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL,
NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.


If lwork = -1, then lwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

c Overwritten by the product Q* sub(C), or Q' sub (C), or sub(C)* Q', or


sub(C)* Q

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?gerqf
Computes the RQ factorization of a general
rectangular matrix.

Syntax
void psgerqf (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );

void pdgerqf (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
void pcgerqf (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT
*info );
void pzgerqf (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT
*info );

Include Files
• mkl_scalapack.h

Description
The p?gerqf function forms the RQ factorization of a general m-by-n distributed matrix sub(A)= A(ia:ia
+m-1, ja:ja+n-1) as

A= R*Q
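A minimal sketch of the call sequence (workspace query, then factorization) is given below for illustration; it assumes the grid, the descriptor desca, the local array a, and tau have already been prepared, and the helper name rq_factor is arbitrary.

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Sketch: RQ factorization sub(A) = R*Q with pdgerqf. */
void rq_factor(MKL_INT m, MKL_INT n, double *a, MKL_INT ia, MKL_INT ja,
               MKL_INT *desca, double *tau)
{
    MKL_INT lwork = -1, info;
    double wkopt;

    pdgerqf(&m, &n, a, &ia, &ja, desca, tau, &wkopt, &lwork, &info);   /* workspace query */

    lwork = (MKL_INT)wkopt;
    double *work = (double *)malloc((size_t)lwork * sizeof(double));

    pdgerqf(&m, &n, a, &ia, &ja, desca, tau, work, &lwork, &info);     /* factorize */

    free(work);
}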

Input Parameters

m (global) The number of rows in the distributed matrix sub(A); (m≥0).

n (global) The number of columns in the distributed matrix sub(A); (n≥0).

a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).

Contains the local pieces of the distributed matrix sub(A) to be factored.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A(ia:ia+m-1, ja:ja+n-1),
respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A

work (local).
Workspace array of size lwork.

lwork (local or global) size of work, must be at least


lwork≥mb_a*(mp0+nq0+mb_a), where
iroff = mod(ia-1, mb_a),
icoff = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mp0 = numroc(m+iroff, mb_a, MYROW, iarow, NPROW),

NOTE
mod(x,y) is the integer remainder of x/y.


nq0 = numroc(n+icoff, nb_a, MYCOL, iacol, NPCOL) and numroc,


indxg2p are ScaLAPACK tool functions; MYROW, MYCOL, NPROW and NPCOL
can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a On exit, if m≤n, the upper triangle of A(ia:ia+m-1, ja+n-m:ja+n-1) contains the


m-by-m upper triangular matrix R; if m≥n, the elements on and above the (m
- n)-th subdiagonal contain the m-by-n upper trapezoidal matrix R; the
remaining elements, with the array tau, represent the orthogonal/unitary
matrix Q as a product of elementary reflectors (see Application Notes
below).

tau (local)
Array of size LOCr(ia+m-1).

Contains the scalar factor of elementary reflectors. tau is tied to the


distributed matrix A.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
= 0, the execution is successful.
< 0, if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(ia)*H(ia+1)*...*H(ia+k-1),

where k = min(m,n).

Each H(i) has the form


H(i) = I - tau*v*v'
where tau is a real/complex scalar, and v is a real/complex vector with v(n-k+i+1:n) = 0 and v(n-k+i) = 1;
v(1:n-k+i-1) is stored on exit in A(ia+m-k+i-1,ja:ja+n-k+i-2), and tau in tau[ia+m-k+i-2].

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?orgrq
Generates the orthogonal matrix Q of the RQ
factorization formed by p?gerqf.

Syntax
void psorgrq (MKL_INT *m , MKL_INT *n , MKL_INT *k , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdorgrq (MKL_INT *m , MKL_INT *n , MKL_INT *k , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?orgrq function generates the whole or part of m-by-n real distributed matrix Q denoting A(ia:ia
+m-1,ja:ja+n-1) with orthonormal rows that is defined as the last m rows of a product of k elementary
reflectors of order n

Q= H(1)*H(2)*...*H(k)

as returned by p?gerqf.
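The fragment below is an illustrative sketch (not part of the reference) of forming Q after an RQ factorization; the helper name is arbitrary, and a, desca, and tau are assumed to be the outputs of a previous pdgerqf call.

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Sketch: form the last m rows of Q from an RQ factorization with pdorgrq. */
void form_q_rq(MKL_INT m, MKL_INT n, MKL_INT k,
               double *a, MKL_INT ia, MKL_INT ja, MKL_INT *desca, double *tau)
{
    MKL_INT lwork = -1, info;
    double wkopt;

    pdorgrq(&m, &n, &k, a, &ia, &ja, desca, tau, &wkopt, &lwork, &info); /* workspace query */

    lwork = (MKL_INT)wkopt;
    double *work = (double *)malloc((size_t)lwork * sizeof(double));

    pdorgrq(&m, &n, &k, a, &ia, &ja, desca, tau, work, &lwork, &info);   /* form Q */

    free(work);
}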

Input Parameters

m (global) The number of rows in the matrix sub(Q), (m≥0).

n (global) The number of columns in the matrix sub(Q), (n≥m≥0).

k (global) The number of elementary reflectors whose product defines the


matrix Q(m≥k≥0).

a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja+n-1).
The i-th row of the matrix stored in a must contain the vector that defines
the elementary reflector H(i), ia+m-k≤i≤ia+m-1, as returned by p?gerqf in the
k rows of its distributed matrix argument A(ia+m-k:ia+m-1, ja:*).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ja+k-1).

Contains the scalar factor tau[i] of elementary reflectors H(i+1) as
returned by p?gerqf, 0 ≤i < LOCc(ja+k-1). tau is tied to the distributed
matrix A.

work (local)
Workspace array of size of lwork.

lwork (local or global) size of work, must be at least lwork≥mb_a*(mpa0 + nqa0


+ mb_a), where
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),


iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),


iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mpa0 = numroc(m+iroffa, mb_a, MYROW, iarow, NPROW),
nqa0 = numroc(n+icoffa, nb_a, MYCOL, iacol, NPCOL)
indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

NOTE
mod(x,y) is the integer remainder of x/y.

If lwork = -1, then lwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a Contains the local pieces of the m-by-n distributed matrix Q.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?ungrq
Generates the unitary matrix Q of the RQ factorization
formed by p?gerqf.

Syntax
void pcungrq (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzungrq (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
This function generates the m-by-n complex distributed matrix Q denoting A(ia:ia+m-1,ja:ja+n-1) with
orthonormal rows, which is defined as the last m rows of a product of k elementary reflectors of order n

Q = (H(1))^H*(H(2))^H*...*(H(k))^H as returned by p?gerqf.
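As a hypothetical sketch (not part of the reference), the complex case follows the same pattern as p?orgrq; a, desca, and tau are assumed to come from a previous pzgerqf call, and the helper name is arbitrary.

#include <stdlib.h>
#include "mkl_types.h"
#include "mkl_scalapack.h"

/* Sketch: form the complex matrix Q of an RQ factorization with pzungrq. */
void form_q_rq_z(MKL_INT m, MKL_INT n, MKL_INT k,
                 MKL_Complex16 *a, MKL_INT ia, MKL_INT ja, MKL_INT *desca,
                 MKL_Complex16 *tau)
{
    MKL_INT lwork = -1, info;
    MKL_Complex16 wkopt;

    pzungrq(&m, &n, &k, a, &ia, &ja, desca, tau, &wkopt, &lwork, &info); /* workspace query */

    lwork = (MKL_INT)wkopt.real;
    MKL_Complex16 *work = (MKL_Complex16 *)malloc((size_t)lwork * sizeof(MKL_Complex16));

    pzungrq(&m, &n, &k, a, &ia, &ja, desca, tau, work, &lwork, &info);   /* form Q */

    free(work);
}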

Input Parameters

m (global) The number of rows in the matrix sub(Q); (m≥0).

n (global) The number of columns in the matrix sub(Q) (n≥m≥0).

k (global) The number of elementary reflectors whose product defines the


matrix Q(m≥k≥0).

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1). The
i-th row of the matrix stored in a must contain the vector that defines the
elementary reflector H(i), ia+m-k≤i≤ia+m-1, as returned by p?gerqf in the
k rows of its distributed matrix argument A(ia+m-k:ia+m-1, ja:*).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCr(ia+m-1).

Contains the scalar factor tau[i] of elementary reflectors H(i+1) as


returned by p?gerqf, 0 ≤i < LOCr(ia+m-1). tau is tied to the distributed
matrix A.

work (local)
Workspace array of size of lwork.

lwork (local or global) size of work, must be at least lwork≥mb_a*(mpa0


+nqa0+mb_a), where
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mpa0 = numroc(m+iroffa, mb_a, MYROW, iarow, NPROW),
nqa0 = numroc(n+icoffa, nb_a, MYCOL, iacol, NPCOL)

NOTE
mod(x,y) is the integer remainder of x/y.


indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a Contains the local pieces of the m-by-n distributed matrix Q.

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?ormr3
Applies an orthogonal distributed matrix to a general
m-by-n distributed matrix.

Syntax
void psormr3 (const char* side, const char* trans, const MKL_INT* m, const MKL_INT* n,
const MKL_INT* k, const MKL_INT* l, const float* a, const MKL_INT* ia, const MKL_INT*
ja, const MKL_INT* desca, const float* tau, float* c, const MKL_INT* ic, const
MKL_INT* jc, const MKL_INT* descc, float* work, const MKL_INT* lwork, MKL_INT* info);
void pdormr3 (const char* side, const char* trans, const MKL_INT* m, const MKL_INT* n,
const MKL_INT* k, const MKL_INT* l, const double* a, const MKL_INT* ia, const MKL_INT*
ja, const MKL_INT* desca, const double* tau, double* c, const MKL_INT* ic, const
MKL_INT* jc, const MKL_INT* descc, double* work, const MKL_INT* lwork, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
p?ormr3 overwrites the general real m-by-n distributed matrix sub( C ) = C(ic:ic+m-1,jc:jc+n-1) with

               side = 'L'         side = 'R'
trans = 'N':   Q * sub( C )       sub( C ) * Q
trans = 'T':   Q^T * sub( C )     sub( C ) * Q^T

where Q is a real orthogonal distributed matrix defined as the product of k elementary reflectors

Q = H(1) H(2) . . . H(k)

as returned by p?tzrzf. Q is of order m if side = 'L' and of order n if side = 'R'.
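For illustration only (not part of the reference), the sketch below applies Q from a p?tzrzf reduction using the usual workspace query; the helper name is arbitrary, and k, l, the distributed data, and the descriptors are assumed to be those produced when the factorization was computed.

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Sketch: sub(C) := Q * sub(C) with pdormr3, Q from a previous p?tzrzf call. */
void apply_q_rz(MKL_INT m, MKL_INT n, MKL_INT k, MKL_INT l,
                const double *a, MKL_INT ia, MKL_INT ja, const MKL_INT *desca,
                const double *tau,
                double *c, MKL_INT ic, MKL_INT jc, const MKL_INT *descc)
{
    char side = 'L', trans = 'N';
    MKL_INT lwork = -1, info;
    double wkopt;

    pdormr3(&side, &trans, &m, &n, &k, &l, a, &ia, &ja, desca, tau,
            c, &ic, &jc, descc, &wkopt, &lwork, &info);   /* workspace query */

    lwork = (MKL_INT)wkopt;
    double *work = (double *)malloc((size_t)lwork * sizeof(double));

    pdormr3(&side, &trans, &m, &n, &k, &l, a, &ia, &ja, desca, tau,
            c, &ic, &jc, descc, work, &lwork, &info);     /* apply Q */

    free(work);
}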

Input Parameters

side (global)
= 'L': apply Q or Q^T from the Left;
= 'R': apply Q or Q^T from the Right.

trans (global)
= 'N': No transpose, apply Q;
= 'T': Transpose, apply Q^T.

m (global)
The number of rows to be operated on, that is, the number of rows of the
distributed submatrix sub( C ). m >= 0.

n (global)
The number of columns to be operated on, that is, the number of columns of the
distributed submatrix sub( C ). n >= 0.

k (global)
The number of elementary reflectors whose product defines the matrix Q.
If side = 'L', m >= k >= 0,

if side = 'R', n >= k >= 0.

l (global)
The columns of the distributed submatrix sub( A ) containing the
meaningful part of the Householder reflectors.
If side = 'L', m >= l >= 0,

if side = 'R', n >= l >= 0.

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+m-1) if
side='L', and lld_a*LOCc(ja+n-1) if side='R', where lld_a >=
MAX(1,LOCr(ia+k-1));

On entry, the i-th row must contain the vector which defines the elementary
reflector H(i), ia <= i <= ia+k-1, as returned by p?tzrzf in the k rows of
its distributed matrix argument A(ia:ia+k-1,ja:*).

A(ia:ia+k-1,ja:*) is modified by the routine but restored on exit.

ia (global)
The row index in the global array a indicating the first row of sub( A ).

ja (global)


The column index in the global array a indicating the first column of
sub( A ).

desca (global and local)


Array of size dlen_.
The array descriptor for the distributed matrix A.

tau (local)
Array, size LOCc(ia+k-1).

This array contains the scalar factors tau(i) of the elementary reflectors
H(i) as returned by p?tzrzf. tau is tied to the distributed matrix A.

c (local)
Pointer into the local memory to an array of size lld_c*LOCc(jc+n-1) .

On entry, the local pieces of the distributed matrix sub( C ).

ic (global)
The row index in the global array c indicating the first row of sub( C ).

jc (global)
The column index in the global array c indicating the first column of
sub( C ).

descc (global and local)


Array of size dlen_.
The array descriptor for the distributed matrix C.

work (local)
Array, size (lwork)

lwork (local)
The size of the array work.

lwork is local input and must be at least


If side = 'L', lwork >= MpC0 + MAX( MAX( 1, NqC0 ), numroc( numroc( m+IROFFC,
mb_a, 0, 0, NPROW ), mb_a, 0, 0, LCMP ) );
if side = 'R', lwork >= NqC0 + MAX( 1, MpC0 );

where LCMP = LCM / NPROW with LCM = ilcm( NPROW, NPCOL ),

IROFFC = MOD( ic-1, mb_c ),

ICOFFC = MOD( jc-1, nb_c),

ICROW = indxg2p( ic, mb_c, MYROW, rsrc_c, NPROW ),

ICCOL = indxg2p( jc, nb_c, MYCOL, csrc_c, NPCOL ),

MpC0 = numroc( m+IROFFC, mb_c, MYROW, ICROW, NPROW ),

NqC0 = numroc( n+ICOFFC, nb_c, MYCOL, ICCOL, NPCOL ),

ilcm, indxg2p, and numroc are ScaLAPACK tool functions;
MYROW, MYCOL, NPROW and NPCOL can be determined by calling the
subroutine blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is


assumed; the routine only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

c On exit, sub( C ) is overwritten by Q*sub( C ) or Q'*sub( C ) or sub( C )*Q'


or sub( C )*Q.

work On exit, work[0] returns the minimal and optimal lwork.

info (local)
= 0: successful exit
< 0: If the i-th argument is an array and the j-th entry had an illegal value,
then info = -(i*100+j), if the i-th argument is a scalar and had an illegal
value, then info = -i.

Application Notes
Alignment requirements
The distributed submatrices A(ia:*, ja:*) and C(ic:ic+m-1,jc:jc+n-1) must verify some alignment
properties, namely the following expressions should be true:
If side = 'L',

( nb_a = mb_c .AND. ICOFFA = IROFFC )


If side = 'R',

( nb_a = nb_c .AND. ICOFFA = ICOFFC .AND. IACOL = ICCOL )

p?unmr3
Applies an orthogonal distributed matrix to a general
m-by-n distributed matrix.

Syntax
void pcunmr3 (const char* side, const char* trans, const MKL_INT* m, const MKL_INT* n,
const MKL_INT* k, const MKL_INT* l, const MKL_Complex8* a, const MKL_INT* ia, const
MKL_INT* ja, const MKL_INT* desca, const MKL_Complex8* tau, MKL_Complex8* c, const
MKL_INT* ic, const MKL_INT* jc, const MKL_INT* descc, MKL_Complex8* work, const
MKL_INT* lwork, MKL_INT* info);
void pzunmr3 (const char* side, const char* trans, const MKL_INT* m, const MKL_INT* n,
const MKL_INT* k, const MKL_INT* l, const MKL_Complex16* a, const MKL_INT* ia, const
MKL_INT* ja, const MKL_INT* desca, const MKL_Complex16* tau, MKL_Complex16* c, const
MKL_INT* ic, const MKL_INT* jc, const MKL_INT* descc, MKL_Complex16* work, const
MKL_INT* lwork, MKL_INT* info);


Include Files
• mkl_scalapack.h

Description
p?unmr3 overwrites the general complex m-by-n distributed matrix sub( C ) = C(ic:ic+m-1,jc:jc+n-1) with
               side = 'L'         side = 'R'
trans = 'N':   Q * sub( C )       sub( C ) * Q
trans = 'C':   Q^H * sub( C )     sub( C ) * Q^H

where Q is a complex unitary distributed matrix defined as the product of k elementary reflectors

Q = H(1)' H(2)' . . . H(k)'

as returned by p?tzrzf. Q is of order m if side = 'L' and of order n if side = 'R'.

Input Parameters

side (global)
= 'L': apply Q or Q^H from the Left;
= 'R': apply Q or Q^H from the Right.

trans (global)
= 'N': No transpose, apply Q;
= 'C': Conjugate transpose, apply Q^H.

m (global)
The number of rows to be operated on, that is, the number of rows of the
distributed submatrix sub( C ). m >= 0.

n (global)
The number of columns to be operated on, that is, the number of columns of the
distributed submatrix sub( C ). n >= 0.

k (global)
The number of elementary reflectors whose product defines the matrix Q.
If side = 'L', m >= k >= 0, if side = 'R', n >= k >= 0.

l (global)
The columns of the distributed submatrix sub( A ) containing the
meaningful part of the Householder reflectors.
If side = 'L', m >= l >= 0, if side = 'R', n >= l >= 0.

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+m-1) if
side='L', and lld_a*LOCc(ja+n-1) if side='R', where lld_a >=
MAX(1,LOCr(ia+k-1));

On entry, the i-th row must contain the vector which defines the elementary
reflector H(i), ia <= i <= ia+k-1, as returned by p?tzrzf in the k rows of
its distributed matrix argument A(ia:ia+k-1,ja:*).

A(ia:ia+k-1,ja:*) is modified by the routine but restored on exit.

ia (global)
The row index in the global array a indicating the first row of sub( A ).

ja (global)
The column index in the global array a indicating the first column of
sub( A ).

desca (global and local)


Array of size dlen_.
The array descriptor for the distributed matrix A.

tau (local)
Array, size LOCc(ia+k-1).

This array contains the scalar factors tau(i) of the elementary reflectors
H(i) as returned by p?tzrzf. tau is tied to the distributed matrix A.

c (local)
Pointer into the local memory to an array of size lld_c*LOCc(jc+n-1) .

On entry, the local pieces of the distributed matrix sub( C ).

ic (global)
The row index in the global array c indicating the first row of sub( C ).

jc (global)
The column index in the global array c indicating the first column of
sub( C ).

descc (global and local)


Array of size dlen_.
The array descriptor for the distributed matrix C.

work (local)
Array, size (lwork)

On exit, work[0] returns the minimal and optimal lwork.

lwork (local or global)


The size of the array work.

lwork is local input and must be at least


If side = 'L', lwork >= MpC0 + MAX( MAX( 1, NqC0 ), numroc( numroc( m
+IROFFC,mb_a,0,0,NPROW ),mb_a,0,0,LCMP ) );
if side = 'R', lwork >= NqC0 + MAX( 1, MpC0 );

where LCMP = LCM / NPROW with LCM = ilcm( NPROW, NPCOL ),


IROFFC = MOD( ic-1, MB_C ), ICOFFC = MOD( jc-1, nb_c ),

ICROW = indxg2p( ic, MB_C, MYROW, rsrc_c, NPROW ),


ICCOL = indxg2p( jc, nb_c, MYCOL, csrc_c, NPCOL ),

MpC0 = numroc( m+IROFFC, MB_C, MYROW, ICROW, NPROW ),

NqC0 = numroc( n+ICOFFC, nb_c, MYCOL, ICCOL, NPCOL ),

ilcm, indxg2p, and numroc are ScaLAPACK tool functions;


MYROW, MYCOL, NPROW and NPCOL can be determined by calling the
subroutine blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is


assumed; the routine only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

c On exit, sub( C ) is overwritten by Q*sub( C ) or Q'*sub( C ) or sub( C )*Q'


or sub( C )*Q.

work (local)
Array, size (lwork)

On exit, work[0] returns the minimal and optimal lwork.

info (local)
= 0: successful exit
< 0: If the i-th argument is an array and the j-th entry had an illegal value,
then info = -(i*100+j), if the i-th argument is a scalar and had an illegal
value, then info = -i.

Application Notes
Alignment requirements
The distributed submatrices A(ia:*, ja:*) and C(ic:ic+m-1,jc:jc+n-1) must verify some alignment
properties, namely the following expressions should be true:
If side = 'L', ( nb_a = MB_C and ICOFFA = IROFFC )

If side = 'R', ( nb_a = nb_c and ICOFFA = ICOFFC and IACOL = ICCOL )

p?ormrq
Multiplies a general matrix by the orthogonal matrix Q
of the RQ factorization formed by p?gerqf.

Syntax
void psormrq (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *tau , float *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , float *work , MKL_INT *lwork , MKL_INT *info );
void pdormrq (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , double
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *tau , double *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , double *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?ormrq function overwrites the general real m-by-n distributed matrix sub(C) = C(ic:ic+m-1,jc:jc+n-1) with

               side = 'L'       side = 'R'
trans = 'N':   Q*sub(C)         sub(C)*Q
trans = 'T':   Q^T*sub(C)       sub(C)*Q^T

where Q is a real orthogonal distributed matrix defined as the product of k elementary reflectors

Q = H(1) H(2)... H(k)

as returned by p?gerqf. Q is of order m if side = 'L' and of order n if side = 'R'.
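A minimal illustrative sketch (not part of the reference) of applying Q^T from the left after an RQ factorization follows; the helper name is arbitrary, and the distributed matrices, descriptors, and tau (from a preceding pdgerqf call) are assumed to be ready.

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Sketch: sub(C) := Q^T * sub(C) with pdormrq, Q from a previous pdgerqf call. */
void apply_qt_from_rq(MKL_INT m, MKL_INT n, MKL_INT k,
                      double *a, MKL_INT ia, MKL_INT ja, MKL_INT *desca, double *tau,
                      double *c, MKL_INT ic, MKL_INT jc, MKL_INT *descc)
{
    char side = 'L', trans = 'T';
    MKL_INT lwork = -1, info;
    double wkopt;

    pdormrq(&side, &trans, &m, &n, &k, a, &ia, &ja, desca, tau,
            c, &ic, &jc, descc, &wkopt, &lwork, &info);   /* workspace query */

    lwork = (MKL_INT)wkopt;
    double *work = (double *)malloc((size_t)lwork * sizeof(double));

    pdormrq(&side, &trans, &m, &n, &k, a, &ia, &ja, desca, tau,
            c, &ic, &jc, descc, work, &lwork, &info);     /* apply Q^T */

    free(work);
}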

Input Parameters

side (global)
='L': Q or Q^T is applied from the left.
='R': Q or Q^T is applied from the right.

trans (global)
='N', no transpose, Q is applied.
='T', transpose, Q^T is applied.

m (global) The number of rows in the distributed matrix sub(C) (m≥0).

n (global) The number of columns in the distributed matrix sub(C) (n≥0).

k (global) The number of elementary reflectors whose product defines the


matrix Q. Constraints:
If side = 'L', m≥k≥0

If side = 'R', n≥k≥0.

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+m-1) if
side = 'L', and lld_a*LOCc(ja+n-1) if side = 'R'.
The i-th row of the matrix stored in a must contain the vector that defines
the elementary reflector H(i), ia≤i≤ia+k-1, as returned by p?gerqf in the
k rows of its distributed matrix argument A(ia:ia+k-1, ja:*). A(ia:ia
+k-1, ja:*) is modified by the function but restored on exit.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)


Array of size LOCc(ja+k-1).

Contains the scalar factor tau[i] of elementary reflectors H(i+1) as


returned by p?gerqf (0 ≤i < LOCc(ja+k-1)). tau is tied to the distributed
matrix A.

c (local)
Pointer into the local memory to an array of local size lld_c*LOCc(jc+n-1).

Contains the local pieces of the distributed matrix sub(C) to be factored.

ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the matrix sub(C), respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local)
Workspace array of size of lwork.

lwork (local or global) size of work, must be at least:

If side = 'L',

lwork≥max((mb_a*(mb_a-1))/2, (mpc0 + max(mqa0 +


numroc(numroc(n+iroffc, mb_a, 0, 0, NPROW), mb_a, 0, 0,
lcmp), nqc0))*mb_a) + mb_a*mb_a
else if side ='R',

lwork≥max((mb_a*(mb_a-1))/2, (mpc0 + nqc0)*mb_a) + mb_a*mb_a


end if
where
lcmp = lcm/NPROW with lcm = ilcm (NPROW, NPCOL),
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mqa0 = numroc(n+icoffa, nb_a, MYCOL, iacol, NPCOL),
iroffc = mod(ic-1, mb_c),
icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0 = numroc(m+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0 = numroc(n+icoffc, nb_c, MYCOL, iccol, NPCOL),

NOTE
mod(x,y) is the integer remainder of x/y.

ilcm, indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL,
NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

c Overwritten by the product Q* sub(C), or Q'*sub (C), or sub(C)* Q', or


sub(C)* Q

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?unmrq
Multiplies a general matrix by the unitary matrix Q of
the RQ factorization formed by p?gerqf.

Syntax
void pcunmrq (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau ,
MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzunmrq (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau ,
MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
This function overwrites the general complex m-by-n distributed matrix sub(C) = C(ic:ic+m-1,jc:jc+n-1) with

               side = 'L'       side = 'R'
trans = 'N':   Q*sub(C)         sub(C)*Q
trans = 'C':   Q^H*sub(C)       sub(C)*Q^H


where Q is a complex unitary distributed matrix defined as the product of k elementary reflectors

Q = H(1)' H(2)'... H(k)'

as returned by p?gerqf. Q is of order m if side = 'L' and of order n if side = 'R'.

Input Parameters

side (global)
='L': Q or Q^H is applied from the left.
='R': Q or Q^H is applied from the right.

trans (global)
='N', no transpose, Q is applied.
='C', conjugate transpose, Q^H is applied.

m (global) The number of rows in the distributed matrix sub(C) , (m≥0).

n (global) The number of columns in the distributed matrix sub(C), (n≥0).

k (global) The number of elementary reflectors whose product defines the


matrix Q. Constraints:
If side = 'L', m≥k≥0

If side = 'R', n≥k≥0.

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+m-1) if
side = 'L', and lld_a*LOCc(ja+n-1) if side = 'R'. The i-th row of the
matrix stored in a must contain the vector that defines the elementary
reflector H(i), ia≤i≤ia+k-1, as returned by p?gerqf in the k rows of its
distributed matrix argument A(ia:ia+k-1, ja:*). A(ia:ia+k-1, ja:*) is
modified by the function but restored on exit.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ja+k-1).

Contains the scalar factor tau[i] of elementary reflectors H(i+1) as


returned by p?gerqf (0 ≤i < LOCc(ja+k-1)). tau is tied to the distributed
matrix A.

c (local)
Pointer into the local memory to an array of local size lld_c*LOCc(jc+n-1).

Contains the local pieces of the distributed matrix sub(C) to be factored.

ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local)
Workspace array of size of lwork.

lwork (local or global) size of work, must be at least:

If side = 'L',

lwork≥max((mb_a*(mb_a-1))/2, (mpc0 +
max(mqa0+numroc(numroc(n+iroffc, mb_a, 0, 0, NPROW), mb_a,
0, 0, lcmp), nqc0))*mb_a) + mb_a*mb_a
else if side = 'R',

lwork≥max((mb_a*(mb_a-1))/2, (mpc0 + nqc0)*mb_a) + mb_a*mb_a


end if
where
lcmp = lcm/NPROW with lcm = ilcm(NPROW, NPCOL),
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mqa0 = numroc(m+icoffa, nb_a, MYCOL, iacol, NPCOL),
iroffc = mod(ic-1, mb_c),
icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0 = numroc(m+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0 = numroc(n+icoffc, nb_c, MYCOL, iccol, NPCOL),

NOTE
mod(x,y) is the integer remainder of x/y.

ilcm, indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL,
NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

c Overwritten by the product Q*sub(C), or Q'*sub(C), or sub(C)*Q', or sub(C)*Q.


work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?tzrzf
Reduces the upper trapezoidal matrix A to upper
triangular form.

Syntax
void pstzrzf (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdtzrzf (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
void pctzrzf (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT
*info );
void pztzrzf (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT
*info );

Include Files
• mkl_scalapack.h

Description
The p?tzrzf function reduces the m-by-n (m≤n) real/complex upper trapezoidal matrix sub(A) = A(ia:ia+m-1,
ja:ja+n-1) to upper triangular form by means of orthogonal/unitary transformations. The upper trapezoidal
matrix A is factored as
A = (R 0)*Z,
where Z is an n-by-n orthogonal/unitary matrix and R is an m-by-m upper triangular matrix.
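As an illustration only, a minimal sketch of calling pdtzrzf with the lwork = -1 workspace query; descriptor
and local array setup is assumed, and the function name is hypothetical.

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Sketch: reduce an m-by-n (m <= n) upper trapezoidal sub(A) to upper
   triangular form R with pdtzrzf, using the lwork = -1 query.
   Grid/descriptor setup is assumed; names are illustrative. */
void reduce_trapezoidal(MKL_INT m, MKL_INT n,
                        double *a, MKL_INT *desca, double *tau)
{
    MKL_INT ia = 1, ja = 1, info = 0, lwork = -1;
    double wkopt;

    pdtzrzf(&m, &n, a, &ia, &ja, desca, tau, &wkopt, &lwork, &info);

    lwork = (MKL_INT)wkopt;
    double *work = (double *)malloc((size_t)lwork * sizeof(*work));
    pdtzrzf(&m, &n, a, &ia, &ja, desca, tau, work, &lwork, &info);
    /* On exit: R is in the leading m-by-m triangle of sub(A); the remaining
       columns plus tau encode Z as a product of elementary reflectors. */
    free(work);
}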

Input Parameters

m (global) The number of rows in the matrix sub(A); (m≥0).

n (global) The number of columns in the matrix sub(A) (n≥0).

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).
Contains the local pieces of the m-by-n distributed matrix sub (A) to be
factored.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local)
Workspace array of size of lwork.

lwork (local or global) size of work, must be at least


lwork≥mb_a*(mp0+nq0+mb_a), where
iroff = mod(ia-1, mb_a),
icoff = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mp0 = numroc (m+iroff, mb_a, MYROW, iarow, NPROW),
nq0 = numroc (n+icoff, nb_a, MYCOL, iacol, NPCOL)

NOTE
mod(x,y) is the integer remainder of x/y.

indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a On exit, the leading m-by-m upper triangular part of sub(A) contains the
upper triangular matrix R, and elements m+1 to n of the first m rows of sub
(A), with the array tau, represent the orthogonal/unitary matrix Z as a
product of m elementary reflectors.

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

tau (local)
Array of size LOCr(ia+m-1).

Contains the scalar factor of elementary reflectors. tau is tied to the


distributed matrix A.

info (global)
= 0: the execution is successful.


< 0:if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

Application Notes
The factorization is obtained by Householder's method. The k-th transformation matrix, Z(k), which is (or
whose conjugate transpose is) used to introduce zeros into the (m - k + 1)-th row of sub(A), is given in the
form

Z(k) = ( I    0   )
       ( 0  T(k) )

where

T(k) = I - tau*u(k)*u(k)',

tau is a scalar and z(k) is an (n - m) element vector. tau and z(k) are chosen to annihilate the elements of
the k-th row of sub(A). The scalar tau is returned in the k-th element of tau, indexed k-1, and the vector
u(k) in the k-th row of sub(A), such that the elements of z(k) are in a(k, m + 1),..., a(k, n). The
elements of R are returned in the upper triangular part of sub(A). Z is given by
Z = Z(1) * Z(2) *... * Z(m).

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?ormrz
Multiplies a general matrix by the orthogonal matrix
from a reduction to upper triangular form formed by
p?tzrzf.

Syntax
void psormrz (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_INT *l , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *tau ,
float *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , float *work , MKL_INT *lwork ,
MKL_INT *info );
void pdormrz (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_INT *l , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *tau ,
double *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , double *work , MKL_INT
*lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
This function overwrites the general real m-by-n distributed matrix sub(C) = C(ic:ic+m-1, jc:jc+n-1) with

                 side = 'L'        side = 'R'
trans = 'N':     Q*sub(C)          sub(C)*Q
trans = 'T':     QT*sub(C)         sub(C)*QT

where Q is a real orthogonal distributed matrix defined as the product of k elementary reflectors

Q = H(1) H(2)... H(k)

as returned by p?tzrzf. Q is of order m if side = 'L' and of order n if side = 'R'.
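A minimal call sketch, assuming the trapezoidal reduction has already been computed by pdtzrzf and that
descriptors and local arrays are set up elsewhere (the helper name is illustrative):

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Sketch: form QT*sub(C) with the Q produced by a prior pdtzrzf.
   Here k is the number of reflectors and l the number of columns of
   sub(A) holding their meaningful part; names are illustrative. */
void apply_qt_from_tzrzf(MKL_INT m, MKL_INT n, MKL_INT k, MKL_INT l,
                         double *a, MKL_INT *desca, double *tau,
                         double *c, MKL_INT *descc)
{
    char side = 'L', trans = 'T';
    MKL_INT ia = 1, ja = 1, ic = 1, jc = 1, info = 0, lwork = -1;
    double wkopt;

    pdormrz(&side, &trans, &m, &n, &k, &l, a, &ia, &ja, desca, tau,
            c, &ic, &jc, descc, &wkopt, &lwork, &info);   /* workspace query */

    lwork = (MKL_INT)wkopt;
    double *work = (double *)malloc((size_t)lwork * sizeof(*work));
    pdormrz(&side, &trans, &m, &n, &k, &l, a, &ia, &ja, desca, tau,
            c, &ic, &jc, descc, work, &lwork, &info);     /* apply QT */
    free(work);
}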

Input Parameters

side (global)
='L': Q or QT is applied from the left.
='R': Q or QT is applied from the right.

trans (global)
='N', no transpose, Q is applied.
='T', transpose, QT is applied.

m (global) The number of rows in the distributed matrix sub(C)(m≥0).

n (global) The number of columns in the distributed matrix sub(C)(n≥0).

k (global) The number of elementary reflectors whose product defines the


matrix Q. Constraints:
If side = 'L', m≥k≥0

If side = 'R', n≥k≥0.

l (global)
The columns of the distributed matrix sub(A) containing the meaningful
part of the Householder reflectors.
If side = 'L', m≥l≥0

If side = 'R', n≥l≥0.

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+m-1) if
side = 'L', and lld_a*LOCc(ja+n-1) if side = 'R', where
lld_a≥max(1,LOCr(ia+k-1)).
The i-th row of the matrix stored in a must contain the vector that defines
the elementary reflector H(i), ia≤i≤ia+k-1, as returned by p?tzrzf in the
k rows of its distributed matrix argument A(ia:ia+k-1, ja:*). A(ia:ia
+k-1, ja:*) is modified by the function but restored on exit.


ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ia+k-1).

Contains the scalar factor tau[i] of elementary reflectors H(i+1) as


returned by p?tzrzf (0 ≤i < LOCc(ia+k-1)). tau is tied to the distributed
matrix A.

c (local)
Pointer into the local memory to an array of local size lld_c*LOCc(jc+n-1).

Contains the local pieces of the distributed matrix sub(C) to be factored.

ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local)
Workspace array of size of lwork.

lwork (local or global) size of work, must be at least:

If side = 'L',

lwork≥max((mb_a*(mb_a-1))/2, (mpc0 + max(mqa0 +


numroc(numroc(n+iroffc, mb_a, 0, 0, NPROW), mb_a, 0, 0,
lcmp), nqc0))*mb_a) + mb_a*mb_a
else if side ='R',

lwork≥max((mb_a*(mb_a-1))/2, (mpc0 + nqc0)*mb_a) + mb_a*mb_a


end if
where
lcmp = lcm/NPROW with lcm = ilcm (NPROW, NPCOL),
iroffa = mod(ia-1, mb_a), icoffa = mod(ja-1, nb_a),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mqa0 = numroc(n+icoffa, nb_a, MYCOL, iacol, NPCOL),
iroffc = mod(ic-1, mb_c),
icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0 = numroc(m+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0 = numroc(n+icoffc, nb_c, MYCOL, iccol, NPCOL),

NOTE
mod(x,y) is the integer remainder of x/y.

ilcm, indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL,
NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

c Overwritten by the product Q*sub(C), or Q'*sub(C), or sub(C)*Q', or sub(C)*Q.

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?unmrz
Multiplies a general matrix by the unitary
transformation matrix from a reduction to upper
triangular form determined by p?tzrzf.

Syntax
void pcunmrz (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_INT *l , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca ,
MKL_Complex8 *tau , MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc ,
MKL_Complex8 *work , MKL_INT *lwork , MKL_INT *info );
void pzunmrz (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_INT *l , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca ,
MKL_Complex16 *tau , MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc ,
MKL_Complex16 *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
This function overwrites the general complex m-by-n distributed matrix sub(C) = C(ic:ic+m-1, jc:jc+n-1)
with


                 side = 'L'        side = 'R'
trans = 'N':     Q*sub(C)          sub(C)*Q
trans = 'C':     QH*sub(C)         sub(C)*QH

where Q is a complex unitary distributed matrix defined as the product of k elementary reflectors

Q = H(1)' H(2)'... H(k)'

as returned by pctzrzf/pztzrzf. Q is of order m if side = 'L' and of order n if side = 'R'.

Input Parameters

side (global)
='L': Q or QH is applied from the left.
='R': Q or QH is applied from the right.

trans (global)
='N', no transpose, Q is applied.
='C', conjugate transpose, QH is applied.

m (global) The number of rows in the distributed matrix sub(C), (m≥0).

n (global) The number of columns in the distributed matrix sub(C), (n≥0).

k (global) The number of elementary reflectors whose product defines the


matrix Q. Constraints:
If side = 'L', m≥k≥0

If side = 'R', n≥k≥0.

l (global) The columns of the distributed matrix sub(A) containing the


meaningful part of the Householder reflectors.
If side = 'L', m≥l≥0

If side = 'R', n≥l≥0.

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+m-1) if
side = 'L', and lld_a*LOCc(ja+n-1) if side = 'R', where
lld_a≥max(1, LOCr(ia+k-1)). The i-th row of the matrix stored in a must
contain the vector that defines the elementary reflector H(i), ia≤i≤ia+k-1,
as returned by p?tzrzf in the k rows of its distributed matrix argument
A(ia:ia+k-1, ja:*). A(ia:ia+k-1, ja:*) is modified by the function but
restored on exit.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ia+k-1).

Contains the scalar factor tau[i] of elementary reflectors H(i+1) as
returned by p?tzrzf (0 ≤i < LOCc(ia+k-1)). tau is tied to the distributed
matrix A.

c (local)
Pointer into the local memory to an array of local size lld_c*LOCc(jc+n-1).

Contains the local pieces of the distributed matrix sub(C) to be factored.

ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local)
Workspace array of size lwork.

lwork (local or global) size of work, must be at least:

If side = 'L',

lwork≥max((mb_a*(mb_a-1))/2, (mpc0+max(mqa0+numroc(numroc(n
+iroffc, mb_a, 0, 0, NPROW), mb_a, 0, 0, lcmp), nqc0))*mb_a)
+ mb_a*mb_a
else if side ='R',

lwork≥max((mb_a*(mb_a-1))/2, (mpc0+nqc0)*mb_a) + mb_a*mb_a


end if
where
lcmp = lcm/NPROW with lcm = ilcm(NPROW, NPCOL),
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mqa0 = numroc(m+icoffa, nb_a, MYCOL, iacol, NPCOL),
iroffc = mod(ic-1, mb_c),
icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0 = numroc(m+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0 = numroc(n+icoffc, nb_c, MYCOL, iccol, NPCOL),

NOTE
mod(x,y) is the integer remainder of x/y.

ilcm, indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL,
NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.


If lwork = -1, then lwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

c Overwritten by the product Q*sub(C), or Q'*sub(C), or sub(C)*Q', or sub(C)*Q.

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?ggqrf
Computes the generalized QR factorization.

Syntax
void psggqrf (MKL_INT *n , MKL_INT *m , MKL_INT *p , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *taua , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , float *taub , float *work , MKL_INT *lwork , MKL_INT *info );
void pdggqrf (MKL_INT *n , MKL_INT *m , MKL_INT *p , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *taua , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , double *taub , double *work , MKL_INT *lwork , MKL_INT *info );
void pcggqrf (MKL_INT *n , MKL_INT *m , MKL_INT *p , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *taua , MKL_Complex8 *b , MKL_INT *ib ,
MKL_INT *jb , MKL_INT *descb , MKL_Complex8 *taub , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzggqrf (MKL_INT *n , MKL_INT *m , MKL_INT *p , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *taua , MKL_Complex16 *b , MKL_INT *ib ,
MKL_INT *jb , MKL_INT *descb , MKL_Complex16 *taub , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?ggqrf function forms the generalized QR factorization of an n-by-m matrix

sub(A) = A(ia:ia+n-1, ja:ja+m-1)

and an n-by-p matrix

sub(B) = B(ib:ib+n-1, jb:jb+p-1):

as
sub(A) = Q*R, sub(B) = Q*T*Z,
where Q is an n-by-n orthogonal/unitary matrix, Z is a p-by-p orthogonal/unitary matrix, and R and T
assume one of the forms:
If n≥m

or if n < m

where R11 is upper triangular, and

where T12 or T21 is an upper triangular matrix.


In particular, if sub(B) is square and nonsingular, the GQR factorization of sub(A) and sub(B) implicitly gives
the QR factorization of inv(sub(B))*sub(A):
inv(sub(B))*sub(A) = ZH*(inv(T)*R)
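For illustration, a sketch of the double-precision call with the usual workspace query; taua and taub are
assumed to be allocated with the sizes given under Output Parameters, and the helper name is hypothetical.

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Sketch: generalized QR factorization of an n-by-m sub(A) and an
   n-by-p sub(B) with pdggqrf, using the lwork = -1 query.
   Descriptors and local arrays are assumed to exist; names are illustrative. */
void generalized_qr(MKL_INT n, MKL_INT m, MKL_INT p,
                    double *a, MKL_INT *desca, double *taua,
                    double *b, MKL_INT *descb, double *taub)
{
    MKL_INT ia = 1, ja = 1, ib = 1, jb = 1, info = 0, lwork = -1;
    double wkopt;

    pdggqrf(&n, &m, &p, a, &ia, &ja, desca, taua,
            b, &ib, &jb, descb, taub, &wkopt, &lwork, &info);

    lwork = (MKL_INT)wkopt;
    double *work = (double *)malloc((size_t)lwork * sizeof(*work));
    pdggqrf(&n, &m, &p, a, &ia, &ja, desca, taua,
            b, &ib, &jb, descb, taub, work, &lwork, &info);
    /* sub(A) now holds R and the reflectors of Q; sub(B) holds T and
       the reflectors of Z (see Application Notes). */
    free(work);
}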

Input Parameters

n (global) The number of rows in the distributed matrices sub (A) and sub(B)
(n≥0).

m (global) The number of columns in the distributed matrix sub(A) (m≥0).

p The number of columns in the distributed matrix sub(B) (p≥0).


a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+m-1).
Contains the local pieces of the n-by-m matrix sub(A) to be factored.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

b (local)
Pointer into the local memory to an array of size lld_b*LOCc(jb+p-1).
Contains the local pieces of the n-by-p matrix sub(B) to be factored.

ib, jb (global) The row and column indices in the global matrix B
indicating the first row and the first column of the submatrix B,
respectively.
descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.

work (local)
Workspace array of size of lwork.

lwork (local or global) Size of work, must be at least

lwork≥max(nb_a*(npa0+mqa0+nb_a), max((nb_a*(nb_a-1))/2,
(pqb0+npb0)*nb_a)+nb_a*nb_a, mb_b*(npb0+pqb0+mb_b)),
where
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
npa0 = numroc (n+iroffa, mb_a, MYROW, iarow, NPROW),
mqa0 = numroc (m+icoffa, nb_a, MYCOL, iacol, NPCOL)
iroffb = mod(ib-1, mb_b),
icoffb = mod(jb-1, nb_b),
ibrow = indxg2p(ib, mb_b, MYROW, rsrc_b, NPROW),
ibcol = indxg2p(jb, nb_b, MYCOL, csrc_b, NPCOL),
npb0 = numroc (n+iroffb, mb_b, MYROW, ibrow, NPROW),
pqb0 = numroc (p+icoffb, nb_b, MYCOL, ibcol, NPCOL)

NOTE
mod(x,y) is the integer remainder of x/y.

and numroc, indxg2p are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a On exit, the elements on and above the diagonal of sub (A) contain the
min(n, m)-by-m upper trapezoidal matrix R (R is upper triangular if n≥m); the
elements below the diagonal, with the array taua, represent the
orthogonal/unitary matrix Q as a product of min(n, m) elementary
reflectors. (See Application Notes below).

taua, taub (local)


Arrays of size LOCc(ja+min(n,m)-1) for taua and LOCr(ib+n-1) for
taub.
The array taua contains the scalar factors of the elementary reflectors
which represent the orthogonal/unitary matrix Q. taua is tied to the
distributed matrix A. (See Application Notes below).
The array taub contains the scalar factors of the elementary reflectors
which represent the orthogonal/unitary matrix Z. taub is tied to the
distributed matrix B. (See Application Notes below).

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(ja)*H(ja+1)*...*H(ja+k-1),

where k= min(n,m).

Each H(i) has the form


H(i) = I - taua*v*v'
where taua is a real/complex scalar, and v is a real/complex vector with v(1:i-1) = 0 and v(i) = 1; v(i+1:n)
is stored on exit in A(ia+i:ia+n-1, ja+i-1), and taua in taua[ja+i-2]. To form Q explicitly, use ScaLAPACK
function p?orgqr/p?ungqr. To use Q to update another matrix, use ScaLAPACK function p?ormqr/p?unmqr.

The matrix Z is represented as a product of elementary reflectors


Z = H(ib)*H(ib+1)*...*H(ib+k-1), where k= min(n,p).

Each H(i) has the form


H(i) = I - taub*v*v'


where taub is a real/complex scalar, and v is a real/complex vector with v(p-k+i+1:p) = 0 and v(p-k+i) = 1;
v(1:p-k+i-1) is stored on exit in B(ib+n-k+i-1,jb:jb+p-k+i-2), and taub in taub[ib+n-k+i-2]. To form Z
explicitly, use ScaLAPACK function p?orgrq/p?ungrq. To use Z to update another matrix, use ScaLAPACK
function p?ormrq/p?unmrq.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?ggrqf
Computes the generalized RQ factorization.

Syntax
void psggrqf (MKL_INT *m , MKL_INT *p , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *taua , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , float *taub , float *work , MKL_INT *lwork , MKL_INT *info );
void pdggrqf (MKL_INT *m , MKL_INT *p , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *taua , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , double *taub , double *work , MKL_INT *lwork , MKL_INT *info );
void pcggrqf (MKL_INT *m , MKL_INT *p , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *taua , MKL_Complex8 *b , MKL_INT *ib ,
MKL_INT *jb , MKL_INT *descb , MKL_Complex8 *taub , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzggrqf (MKL_INT *m , MKL_INT *p , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *taua , MKL_Complex16 *b , MKL_INT *ib ,
MKL_INT *jb , MKL_INT *descb , MKL_Complex16 *taub , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h
Description
The p?ggrqf function forms the generalized RQ factorization of an m-by-n matrix sub(A) = A(ia:ia+m-1,
ja:ja+n-1) and a p-by-n matrix sub(B) = B(ib:ib+p-1, jb:jb+n-1):
sub(A) = R*Q, sub(B) = Z*T*Q,
where Q is an n-by-n orthogonal/unitary matrix, Z is a p-by-p orthogonal/unitary matrix, and R and T
assume one of the forms:

or

where R11 or R21 is upper triangular, and

or

where T11 is upper triangular.


In particular, if sub(B) is square and nonsingular, the GRQ factorization of sub(A) and sub(B) implicitly gives
the RQ factorization of sub(A)*inv(sub(B)):
sub(A)*inv(sub(B))= (R*inv(T))*Z'
where inv(sub(B)) denotes the inverse of the matrix sub(B), and Z' denotes the transpose (conjugate
transpose) of matrix Z.
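A corresponding sketch for the RQ case; note that the parameter order is (m, p, n) rather than (n, m, p).
Setup is assumed and the helper name is hypothetical.

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Sketch: generalized RQ factorization of an m-by-n sub(A) and a
   p-by-n sub(B) with pdggrqf; names are illustrative. */
void generalized_rq(MKL_INT m, MKL_INT p, MKL_INT n,
                    double *a, MKL_INT *desca, double *taua,
                    double *b, MKL_INT *descb, double *taub)
{
    MKL_INT ia = 1, ja = 1, ib = 1, jb = 1, info = 0, lwork = -1;
    double wkopt;

    pdggrqf(&m, &p, &n, a, &ia, &ja, desca, taua,
            b, &ib, &jb, descb, taub, &wkopt, &lwork, &info);   /* query */

    lwork = (MKL_INT)wkopt;
    double *work = (double *)malloc((size_t)lwork * sizeof(*work));
    pdggrqf(&m, &p, &n, a, &ia, &ja, desca, taua,
            b, &ib, &jb, descb, taub, work, &lwork, &info);     /* factorize */
    free(work);
}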

Input Parameters

m (global) The number of rows in the distributed matrix sub(A) (m≥0).

p The number of rows in the distributed matrix sub(B) (p≥0).

n (global) The number of columns in the distributed matrices sub(A) and


sub(B) (n≥0).

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).
Contains the local pieces of the m-by-n distributed matrix sub(A) to be
factored.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

b (local)
Pointer into the local memory to an array of size lld_b*LOCc(jb+n-1).

Contains the local pieces of the p-by-n matrix sub(B) to be factored.

ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the submatrix B, respectively.


descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.

work (local)
Workspace array of size of lwork.

lwork (local or global)


Size of work, must be at least lwork≥max(mb_a*(mpa0+nqa0+mb_a),
max((mb_a*(mb_a-1))/2, (ppb0+nqb0)*mb_a) + mb_a*mb_a,
nb_b*(ppb0+nqb0+nb_b)), where
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),
mpa0 = numroc (m+iroffa, mb_a, MYROW, iarow, NPROW),
nqa0 = numroc (n+icoffa, nb_a, MYCOL, iacol, NPCOL)
iroffb = mod(ib-1, mb_b),
icoffb = mod(jb-1, nb_b),
ibrow = indxg2p(ib, mb_b, MYROW, rsrc_b, NPROW ),
ibcol = indxg2p(jb, nb_b, MYCOL, csrc_b, NPCOL ),
ppb0 = numroc (p+iroffb, mb_b, MYROW, ibrow,NPROW),
nqb0 = numroc (n+icoffb, nb_b, MYCOL, ibcol,NPCOL)

NOTE
mod(x,y) is the integer remainder of x/y.

and numroc, indxg2p are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a On exit, if m≤n, the upper triangle of A(ia:ia+m-1, ja+n-m:ja+n-1)


contains the m-by-m upper triangular matrix R; if m≥n, the elements on and
above the (m-n)-th subdiagonal contain the m-by-n upper trapezoidal matrix
R; the remaining elements, with the array taua, represent the orthogonal/
unitary matrix Q as a product of min(n,m) elementary reflectors (see
Application Notes below).

taua, taub (local)


Arrays of size LOCr(ia+m-1)for taua and LOCc(jb+min(p,n)-1) for
taub.

The array taua contains the scalar factors of the elementary reflectors
which represent the orthogonal/unitary matrix Q. taua is tied to the
distributed matrix A.(See Application Notes below).
The array taub contains the scalar factors of the elementary reflectors
which represent the orthogonal/unitary matrix Z. taub is tied to the
distributed matrix B. (See Application Notes below).

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(ia)*H(ia+1)*...*H(ia+k-1),

where k= min(m,n).

Each H(i) has the form


H(i) = I - taua*v*v'
where taua is a real/complex scalar, and v is a real/complex vector with v(n-k+i+1:n) = 0 and v(n-k+i) = 1;
v(1:n-k+i-1) is stored on exit in A(ia+m-k+i-1, ja:ja+n-k+i-2), and taua in taua[ia+m-k+i-2]. To form Q
explicitly, use ScaLAPACK function p?orgrq/p?ungrq. To use Q to update another matrix, use ScaLAPACK
function p?ormrq/p?unmrq.

The matrix Z is represented as a product of elementary reflectors


Z = H(jb)*H(jb+1)*...*H(jb+k-1), where k= min(p,n).

Each H(i) has the form


H(i) = I - taub*v*v'
where taub is a real/complex scalar, and v is a real/complex vector with v(1:i-1) = 0 and v(i)= 1; v(i+1:p) is
stored on exit in B(ib+i:ib+p-1,jb+i-1), and taub in taub[jb+i-2]. To form Z explicitly, use ScaLAPACK
function p?orgqr/p?ungqr. To use Z to update another matrix, use ScaLAPACK function p?ormqr/p?unmqr.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

Symmetric Eigenvalue Problems: ScaLAPACK Computational Routines


To solve a symmetric eigenproblem with ScaLAPACK, you usually need to reduce the matrix to real
tridiagonal form T and then find the eigenvalues and eigenvectors of the tridiagonal matrix T. ScaLAPACK
includes routines for reducing the matrix to a tridiagonal form by an orthogonal (or unitary) similarity
transformation A = QTQH as well as for solving tridiagonal symmetric eigenvalue problems. These routines
are listed in Table "Computational Routines for Solving Symmetric Eigenproblems".


There are different routines for symmetric eigenproblems, depending on whether you need eigenvalues only
or eigenvectors as well, and on the algorithm used (either the QTQ algorithm, or bisection followed by
inverse iteration).
Computational Routines for Solving Symmetric Eigenproblems

Operation                                         Dense symmetric/     Orthogonal/        Symmetric
                                                  Hermitian matrix     unitary matrix     tridiagonal matrix

Reduce to tridiagonal form A = QTQH               p?sytrd/p?hetrd
Multiply matrix after reduction                                        p?ormtr/p?unmtr
Find all eigenvalues and eigenvectors of a                                                steqr2*
tridiagonal matrix T by a QTQ method
Find selected eigenvalues of a tridiagonal                                                p?stebz
matrix T via bisection
Find selected eigenvectors of a tridiagonal                                               p?stein
matrix T by inverse iteration

* This routine is described as part of auxiliary ScaLAPACK routines.
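As an illustrative sketch of the first step of this workflow (not part of the formal reference), the fragment
below reduces a real symmetric distributed matrix to tridiagonal form with pdsytrd; the tridiagonal solvers
(p?stebz, p?stein) and the back-transformation with p?ormtr would follow with the same calling conventions.
Grid and descriptor setup is assumed, and the helper name is hypothetical.

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Step 1 of the symmetric eigenproblem workflow: Q'*sub(A)*Q = T.
   d/e receive the diagonal and off-diagonal of T, tau the reflector
   scalars needed later by p?ormtr.  Setup is assumed; names are
   illustrative. */
void reduce_to_tridiagonal(MKL_INT n, double *a, MKL_INT *desca,
                           double *d, double *e, double *tau)
{
    char uplo = 'L';
    MKL_INT ia = 1, ja = 1, info = 0, lwork = -1;
    double wkopt;

    pdsytrd(&uplo, &n, a, &ia, &ja, desca, d, e, tau,
            &wkopt, &lwork, &info);          /* workspace query */

    lwork = (MKL_INT)wkopt;
    double *work = (double *)malloc((size_t)lwork * sizeof(*work));
    pdsytrd(&uplo, &n, a, &ia, &ja, desca, d, e, tau,
            work, &lwork, &info);            /* actual reduction */
    free(work);
    /* Next: p?stebz / p?stein on (d, e), then p?ormtr to back-transform
       the eigenvectors of T into eigenvectors of sub(A). */
}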

p?syngst
Reduces a real symmetric-definite generalized
eigenproblem to standard form.

Syntax
void pssyngst (const MKL_INT* ibtype, const char* uplo, const MKL_INT* n, float* a,
const MKL_INT* ia, const MKL_INT* ja, const MKL_INT* desca, const float* b, const
MKL_INT* ib, const MKL_INT* jb, const MKL_INT* descb, float* scale, float* work, const
MKL_INT* lwork, MKL_INT* info);
void pdsyngst (const MKL_INT* ibtype, const char* uplo, const MKL_INT* n, double* a,
const MKL_INT* ia, const MKL_INT* ja, const MKL_INT* desca, const double* b, const
MKL_INT* ib, const MKL_INT* jb, const MKL_INT* descb, double* scale, double* work,
const MKL_INT* lwork, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
p?syngst reduces a real symmetric-definite generalized eigenproblem to standard form.
p?syngst performs the same function as p?hegst, but is based on rank 2K updates, which are faster and
more scalable than triangular solves (the basis of p?hegst).

p?syngst calls p?hegst when uplo='U', hence p?syngst provides improved performance only when
uplo='L' and ibtype=1.
p?syngst also calls p?hegst when insufficient workspace is provided, hence p?syngst provides improved
performance only when lwork >= 2 * NP0 * NB + NQ0 * NB + NB * NB.

In the following sub( A ) denotes A( ia:ia+n-1, ja:ja+n-1 ) and sub( B ) denotes B( ib:ib+n-1, jb:jb
+n-1 ).

If ibtype = 1, the problem is sub( A )*x = lambda*sub( B )*x, and sub( A ) is overwritten by
inv(UH)*sub( A )*inv(U) or inv(L)*sub( A )*inv(LH)
If ibtype = 2 or 3, the problem is sub( A )*sub( B )*x = lambda*x or sub( B )*sub( A )*x = lambda*x, and
sub( A ) is overwritten by U*sub( A )*UH or LH*sub( A )*L.
sub( B ) must have been previously factorized as UH*U or L*LH by p?potrf.
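An illustrative sketch of the sequence described above for ibtype = 1 and uplo = 'L': factor sub(B) with
p?potrf, then standardize the problem with pdsyngst. Setup is assumed and the helper name is hypothetical.

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Sketch: standardize A*x = lambda*B*x (ibtype = 1, uplo = 'L').
   sub(B) must be Cholesky-factored by p?potrf first, as stated above.
   Grid/descriptor setup is assumed; names are illustrative. */
void standardize_generalized(MKL_INT n,
                             double *a, MKL_INT *desca,
                             double *b, MKL_INT *descb)
{
    char uplo = 'L';
    MKL_INT ibtype = 1, ia = 1, ja = 1, ib = 1, jb = 1, info = 0;
    double scale;

    pdpotrf(&uplo, &n, b, &ib, &jb, descb, &info);   /* sub(B) = L*L' */

    MKL_INT lwork = -1;
    double wkopt;
    pdsyngst(&ibtype, &uplo, &n, a, &ia, &ja, desca,
             b, &ib, &jb, descb, &scale, &wkopt, &lwork, &info);

    lwork = (MKL_INT)wkopt;
    double *work = (double *)malloc((size_t)lwork * sizeof(*work));
    pdsyngst(&ibtype, &uplo, &n, a, &ia, &ja, desca,
             b, &ib, &jb, descb, &scale, work, &lwork, &info);
    free(work);   /* sub(A) now holds inv(L)*sub(A)*inv(L') */
}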

Input Parameters

ibtype (global)
= 1: compute inv(UH)*sub( A )*inv(U) or inv(L)*sub( A )*inv(LH);
= 2 or 3: compute U*sub( A )*UH or LH*sub( A )*L.

uplo (global)
= 'U': Upper triangle of sub( A ) is stored and sub( B ) is factored as UH*U;
= 'L': Lower triangle of sub( A ) is stored and sub( B ) is factored as L*LH.

n (global)
The order of the matrices sub( A ) and sub( B ). n >= 0.

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).

On entry, this array contains the local pieces of the n-by-n symmetric
distributed matrix sub( A ). If uplo = 'U', the leading n-by-n upper
triangular part of sub( A ) contains the upper triangular part of the matrix,
and its strictly lower triangular part is not referenced. If uplo = 'L', the
leading n-by-n lower triangular part of sub( A ) contains the lower
triangular part of the matrix, and its strictly upper triangular part is not
referenced.

ia (global)
A's global row index, which points to the beginning of the submatrix which
is to be operated on.

ja (global)
A's global column index, which points to the beginning of the submatrix
which is to be operated on.

desca (global and local)


Array of size dlen_.
The array descriptor for the distributed matrix A.

b (local)
Pointer into the local memory to an array of size lld_b*LOCc(jb+n-1).

On entry, this array contains the local pieces of the triangular factor from
the Cholesky factorization of sub( B ), as returned by p?potrf.

ib (global)
B's global row index, which points to the beginning of the submatrix which
is to be operated on.

jb (global)
B's global column index, which points to the beginning of the submatrix
which is to be operated on.

descb (global and local)


Array of size dlen_.


The array descriptor for the distributed matrix B.

work (local)
Array, size (lwork)

lwork (local or global)


The size of the array work.

lwork is local input and must be at least lwork >= MAX( NB * ( NP0 +1 ),
3 * NB )
When ibtype = 1 and uplo = 'L', p?syngst provides improved
performance when lwork >= 2 * NP0 * NB + NQ0 * NB + NB * NB,

where NB = mb_a = nb_a,


NP0 = numroc( n, NB, 0, 0, NPROW ),

NQ0 = numroc( n, NB, 0, 0, NPROW ),

numroc is a ScaLAPACK tool function.


MYROW, MYCOL, NPROW and NPCOL can be determined by calling the
subroutine blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is


assumed; the routine only calculates the optimal size for all work arrays.
Each of these values is returned in the first entry of the corresponding work
array, and no error message is issued by pxerbla.

Output Parameters

a On exit, if info = 0, the transformed matrix, stored in the same format as


sub( A ).

scale (global)
Amount by which the eigenvalues should be scaled to compensate for the
scaling performed in this routine. At present, scale is always returned as
1.0; it is returned here to allow for future enhancement.

work (local)
Array, size (lwork)

On exit, work[0] returns the minimal and optimal lwork.

info (global)
= 0: successful exit
< 0: If the i-th argument is an array and the j-th entry had an illegal value,
then info = -(i*100+j), if the i-th argument is a scalar and had an illegal
value, then info = -i.

p?syntrd
Reduces a real symmetric matrix to symmetric
tridiagonal form.

Syntax
void pssyntrd (const char* uplo, const MKL_INT* n, float* a, const MKL_INT* ia, const
MKL_INT* ja, const MKL_INT* desca, float* d, float* e, float* tau, float* work, const
MKL_INT* lwork, MKL_INT* info);
void pdsyntrd (const char* uplo, const MKL_INT* n, double* a, const MKL_INT* ia, const
MKL_INT* ja, const MKL_INT* desca, double* d, double* e, double* tau, double* work,
const MKL_INT* lwork, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
p?syntrd is a prototype version of p?sytrd which uses tailored codes (either the serial code, ?sytrd, or the
parallel code, p?syttrd) when the workspace provided by the user is adequate.

p?syntrd reduces a real symmetric matrix sub( A ) to symmetric tridiagonal form T by an orthogonal
similarity transformation:
Q' * sub( A ) * Q = T, where sub( A ) = A(ia:ia+n-1,ja:ja+n-1).

Features
p?syntrd is faster than p?sytrd on almost all matrices, particularly small ones (i.e. n < 500 * sqrt(P) ),
provided that enough workspace is available to use the tailored codes.
The tailored codes provide performance that is essentially independent of the input data layout.
The tailored codes place no restrictions on ia, ja, MB or NB. At present, ia, ja, MB and NB are restricted to
those values allowed by p?hetrd to keep the interface simple (see the Application Notes section for more
information about the restrictions).

Input Parameters

uplo (global)
Specifies whether the upper or lower triangular part of the symmetric
matrix sub( A ) is stored:
= 'U': Upper triangular
= 'L': Lower triangular

n (global)
The number of rows and columns to be operated on, i.e. the order of the
distributed submatrix sub( A ). n >= 0.

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).

On entry, this array contains the local pieces of the symmetric distributed
matrix sub( A ). If uplo = 'U', the leading n-by-n upper triangular part of
sub( A ) contains the upper triangular part of the matrix, and its strictly
lower triangular part is not referenced. If uplo = 'L', the leading n-by-n
lower triangular part of sub( A ) contains the lower triangular part of the
matrix, and its strictly upper triangular part is not referenced.

ia (global)


The row index in the global array a indicating the first row of sub( A ).

ja (global)
The column index in the global array a indicating the first column of
sub( A ).

desca (global and local)


Array of size dlen_.
The array descriptor for the distributed matrix A.

work (local)
Array, size (lwork)

lwork (local or global)


The size of the array work.

lwork is local input and must be at least lwork >= MAX( NB * ( NP +1 ), 3 * NB )
For optimal performance, greater workspace is needed, i.e.
lwork >= 2*( ANB+1 )*( 4*NPS+2 ) + ( NPS + 4 ) * NPS
ANB = pjlaenv( ICTXT, 3, 'p?syttrd', 'L', 0, 0, 0, 0 )

ICTXT = desca( ctxt_ )

SQNPC = INT( sqrt( REAL( NPROW * NPCOL ) ) )

numroc is a ScaLAPACK tool function.


pjlaenv is a ScaLAPACK environmental inquiry function.
NPROW and NPCOL can be determined by calling the subroutine
blacs_gridinfo.

Output Parameters

a On exit, if uplo = 'U', the diagonal and first superdiagonal of sub( A ) are
overwritten by the corresponding elements of the tridiagonal matrix T, and
the elements above the first superdiagonal, with the array tau, represent
the orthogonal matrix Q as a product of elementary reflectors; if uplo = 'L',
the diagonal and first subdiagonal of sub( A ) are overwritten by the
corresponding elements of the tridiagonal matrix T, and the elements below
the first subdiagonal, with the array tau, represent the orthogonal matrix Q
as a product of elementary reflectors. See Further Details.

d (local)
Array, size LOCc(ja+n-1)

The diagonal elements of the tridiagonal matrix T: d(i) = A(i,i). d is tied to


the distributed matrix A.

e (local)
Array, size LOCc(ja+n-1) if uplo = 'U', LOCc(ja+n-2) otherwise.

The off-diagonal elements of the tridiagonal matrix T: e(i) = A(i,i+1) if


uplo = 'U', e(i) = A(i+1,i) if uplo = 'L'. e is tied to the distributed matrix A.

tau (local)
Array, size LOCc(ja+n-1).

This array contains the scalar factors tau of the elementary reflectors. tau
is tied to the distributed matrix A.

work (local)
Array, size (lwork)

On exit, work[0] returns the optimal lwork.

info (global)
= 0: successful exit
< 0: If the i-th argument is an array and the j-th entry had an illegal value,
then info = -(i*100+j), if the i-th argument is a scalar and had an illegal
value, then info = -i.

Application Notes
If uplo = 'U', the matrix Q is represented as a product of elementary reflectors

Q = H(n-1) . . . H(2) H(1).

Each H(i) has the form


H(i) = I - tau * v * v', where tau is a real scalar, and v is a real vector with v(i+1:n) = 0 and v(i) =
1; v(1:i-1) is stored on exit in A(ia:ia+i-2,ja+i), and tau in tau(ja+i-1).

If uplo = 'L', the matrix Q is represented as a product of elementary reflectors

Q = H(1) H(2) . . . H(n-1).

Each H(i) has the form


H(i) = I - tau * v * v', where tau is a real scalar, and v is a real vector with v(1:i) = 0 and v(i+1) =
1; v(i+2:n) is stored on exit in A(ia+i+1:ia+n-1,ja+i-1), and tau in tau(ja+i-1).

The contents of sub( A ) on exit are illustrated by the following examples with n = 5:

if uplo = 'U':

    d   e   v2  v3  v4
        d   e   v3  v4
            d   e   v4
                d   e
                    d

if uplo = 'L':

    d
    e   d
    v1  e   d
    v1  v2  e   d
    v1  v2  v3  e   d

where d and e denote diagonal and off-diagonal elements of T, and vi denotes an element of the vector
defining H(i).
Alignment requirements


The distributed submatrix sub( A ) must verify some alignment properties, namely the following expression
should be true:
( mb_a = nb_a and IROFFA = ICOFFA and IROFFA = 0 ) with IROFFA = mod( ia-1, mb_a), and ICOFFA =
mod( ja-1, nb_a ).
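As an illustration only (not part of the reference), this alignment condition can be checked in C as shown
below, assuming the standard ScaLAPACK descriptor layout (block sizes MB_ and NB_ at zero-based positions 4
and 5 of the descriptor); the helper name is hypothetical.

#include <assert.h>
#include "mkl_scalapack.h"

/* Hypothetical helper: verify mb_a = nb_a, IROFFA = ICOFFA and
   IROFFA = 0 for the submatrix starting at global indices (ia, ja). */
static void check_syntrd_alignment(const MKL_INT *desca, MKL_INT ia, MKL_INT ja)
{
    MKL_INT mb_a = desca[4];               /* row block size (MB_) */
    MKL_INT nb_a = desca[5];               /* column block size (NB_) */
    MKL_INT iroffa = (ia - 1) % mb_a;
    MKL_INT icoffa = (ja - 1) % nb_a;
    assert(mb_a == nb_a && iroffa == icoffa && iroffa == 0);
}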

p?sytrd
Reduces a symmetric matrix to real symmetric
tridiagonal form by an orthogonal similarity
transformation.

Syntax
void pssytrd (char *uplo , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *d , float *e , float *tau , float *work , MKL_INT *lwork , MKL_INT
*info );
void pdsytrd (char *uplo , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *d , double *e , double *tau , double *work , MKL_INT *lwork ,
MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?sytrd function reduces a real symmetric matrix sub(A) to symmetric tridiagonal form T by an
orthogonal similarity transformation:
Q'*sub(A)*Q = T,
where sub(A) = A(ia:ia+n-1,ja:ja+n-1).

Input Parameters

uplo (global)
Specifies whether the upper or lower triangular part of the symmetric
matrix sub(A) is stored:
If uplo = 'U', upper triangular

If uplo = 'L', lower triangular

n (global) The order of the distributed matrix sub(A) (n≥0).

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1). On
entry, this array contains the local pieces of the symmetric distributed
matrix sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular part of the matrix, and its strictly lower triangular part
is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular part of the matrix, and its strictly upper triangular part
is not referenced. See Application Notes below.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local)
Workspace array of size lwork.

lwork (local or global) size of work, must be at least:

lwork≥max(NB*(np +1), 3*NB),


where NB = mb_a = nb_a,

np = numroc(n, NB, MYROW, iarow, NPROW),


iarow = indxg2p(ia, NB, MYROW, rsrc_a, NPROW).
indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a On exit, if uplo = 'U', the diagonal and first superdiagonal of sub(A) are
overwritten by the corresponding elements of the tridiagonal matrix T, and
the elements above the first superdiagonal, with the array tau, represent
the orthogonal matrix Q as a product of elementary reflectors; if uplo =
'L', the diagonal and first subdiagonal of sub(A) are overwritten by the
corresponding elements of the tridiagonal matrix T, and the elements below
the first subdiagonal, with the array tau, represent the orthogonal matrix Q
as a product of elementary reflectors. See Application Notes below.

d (local)
Arrays of size LOCc(ja+n-1) .The diagonal elements of the tridiagonal
matrix T:
d[i]= A(i+1,i+1), 0 ≤i < LOCc(ja+n-1).
d is tied to the distributed matrix A.

e (local)
Arrays of size LOCc(ja+n-1) if uplo = 'U', LOCc(ja+n-2) otherwise.

The off-diagonal elements of the tridiagonal matrix T:


e[i]= A(i+1,i+2), 0 ≤i < LOCc(ja+n-1) if uplo = 'U',
e[i] = A(i+2,i+1) if uplo = 'L'.
e is tied to the distributed matrix A.

tau (local)


Arrays of size LOCc(ja+n-1). This array contains the scalar factors of the
elementary reflectors. tau is tied to the distributed matrix A.

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

Application Notes
If uplo = 'U', the matrix Q is represented as a product of elementary reflectors

Q = H(n-1)... H(2) H(1).

Each H(i) has the form


H(i) = I - tau * v * v',
where tau is a real scalar, and v is a real vector with v(i+1:n) = 0 and v(i) = 1; v(1:i-1) is stored on exit in
A(ia:ia+i-2, ja+i), and tau in tau[ja+i-2].

If uplo = 'L', the matrix Q is represented as a product of elementary reflectors

Q = H(1) H(2)... H(n-1).

Each H(i) has the form


H(i) = I - tau * v * v',
where tau is a real scalar, and v is a real vector with v(1:i) = 0 and v(i+1) = 1; v(i+2:n) is stored on exit in
A(ia+i+1:ia+n-1,ja+i-1), and tau in tau[ja+i-2].

The contents of sub(A) on exit are illustrated by the following examples with n = 5:

If uplo = 'U':

    d   e   v2  v3  v4
        d   e   v3  v4
            d   e   v4
                d   e
                    d

If uplo = 'L':

    d
    e   d
    v1  e   d
    v1  v2  e   d
    v1  v2  v3  e   d


where d and e denote diagonal and off-diagonal elements of T, and vi denotes an element of the vector
defining H(i).

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?ormtr
Multiplies a general matrix by the orthogonal
transformation matrix from a reduction to tridiagonal
form determined by p?sytrd.

Syntax
void psormtr (char *side , char *uplo , char *trans , MKL_INT *m , MKL_INT *n , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *tau , float *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , float *work , MKL_INT *lwork , MKL_INT *info );
void pdormtr (char *side , char *uplo , char *trans , MKL_INT *m , MKL_INT *n , double
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *tau , double *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , double *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
This function overwrites the general real distributed m-by-n matrix sub(C) = C(ic:ic+m-1, jc:jc+n-1) with

                 side = 'L'        side = 'R'
trans = 'N':     Q*sub(C)          sub(C)*Q
trans = 'T':     QT*sub(C)         sub(C)*QT

where Q is a real orthogonal distributed matrix of order nq, with nq = m if side = 'L' and nq = n if side =
'R'.
Q is defined as the product of nq elementary reflectors, as returned by p?sytrd.

If uplo = 'U', Q = H(nq-1)... H(2) H(1);

If uplo = 'L', Q = H(1) H(2)... H(nq-1).
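As a sketch only, applying Q from the left, using the reflectors and tau left in sub(A) by a prior pdsytrd
with uplo = 'L'; setup is assumed and the helper name is hypothetical.

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Sketch: overwrite sub(C) with Q*sub(C), e.g. to back-transform
   eigenvectors of the tridiagonal T; names are illustrative. */
void back_transform(MKL_INT m, MKL_INT n,
                    double *a, MKL_INT *desca, double *tau,
                    double *c, MKL_INT *descc)
{
    char side = 'L', uplo = 'L', trans = 'N';
    MKL_INT ia = 1, ja = 1, ic = 1, jc = 1, info = 0, lwork = -1;
    double wkopt;

    pdormtr(&side, &uplo, &trans, &m, &n, a, &ia, &ja, desca, tau,
            c, &ic, &jc, descc, &wkopt, &lwork, &info);   /* workspace query */

    lwork = (MKL_INT)wkopt;
    double *work = (double *)malloc((size_t)lwork * sizeof(*work));
    pdormtr(&side, &uplo, &trans, &m, &n, a, &ia, &ja, desca, tau,
            c, &ic, &jc, descc, work, &lwork, &info);     /* apply Q */
    free(work);
}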

Input Parameters

side (global)
='L': Q or QT is applied from the left.


='R': Q or QT is applied from the right.

trans (global)
='N', no transpose, Q is applied.
='T', transpose, QT is applied.

uplo (global)
= 'U': Upper triangle of A(ia:*, ja:*) contains elementary reflectors
from p?sytrd;

= 'L': Lower triangle of A(ia:*,ja:*) contains elementary reflectors


from p?sytrd

m (global) The number of rows in the distributed matrix sub(C) (m≥0).

n (global) The number of columns in the distributed matrix sub(C) (n≥0).

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+m-1) if
side = 'L', and lld_a*LOCc(ja+n-1) if side = 'R'.
Contains the vectors that define the elementary reflectors, as returned by
p?sytrd.
If side='L', lld_a≥max(1,LOCr(ia+m-1));

If side ='R', lld_a≥max(1, LOCr(ia+n-1)).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size of ltau where
if side = 'L' and uplo = 'U', ltau = LOCc(m_a),

if side = 'L' and uplo = 'L', ltau = LOCc(ja+m-2),

if side = 'R' and uplo = 'U', ltau = LOCc(n_a),

if side = 'R' and uplo = 'L', ltau = LOCc(ja+n-2).

tau[i] must contain the scalar factor of the elementary reflector H(i+1), as
returned by p?sytrd (0 ≤i < ltau). tau is tied to the distributed matrix A.

c (local)
Pointer into the local memory to an array of size lld_c*LOCc(jc+n-1).
Contains the local pieces of the distributed matrix sub (C).

ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local)
Workspace array of size lwork.

lwork (local or global) size of work, must be at least:

if uplo = 'U',
iaa= ia; jaa= ja+1, icc= ic; jcc= jc;
else uplo = 'L',

iaa= ia+1, jaa= ja;


If side = 'L',

icc= ic+1; jcc= jc;


else icc= ic; jcc= jc+1;

end if
end if
If side = 'L',

mi= m-1; ni= n


lwork≥max((nb_a*(nb_a-1))/2, (nqc0 + mpc0)*nb_a) + nb_a*nb_a
else
If side = 'R',

mi= m; ni= n-1;
lwork≥max((nb_a*(nb_a-1))/2, (nqc0 +
max(npa0+numroc(numroc(ni+icoffc, nb_a, 0, 0, NPCOL), nb_a,
0, 0, lcmq), mpc0))*nb_a)+ nb_a*nb_a
end if
where lcmq = lcm/NPCOL with lcm = ilcm(NPROW, NPCOL),

iroffa = mod(iaa-1, mb_a),


icoffa = mod(jaa-1, nb_a),
iarow = indxg2p(iaa, mb_a, MYROW, rsrc_a, NPROW),
npa0 = numroc(ni+iroffa, mb_a, MYROW, iarow, NPROW),
iroffc = mod(icc-1, mb_c),
icoffc = mod(jcc-1, nb_c),
icrow = indxg2p(icc, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jcc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0 = numroc(mi+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0 = numroc(ni+icoffc, nb_c, MYCOL, iccol, NPCOL),

NOTE
mod(x,y) is the integer remainder of x/y.


ilcm, indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL,
NPROW and NPCOL can be determined by calling the function
blacs_gridinfo. If lwork = -1, then lwork is global input and a
workspace query is assumed; the function only calculates the minimum and
optimal size for all work arrays. Each of these values is returned in the first
entry of the corresponding work array, and no error message is issued by
pxerbla.

Output Parameters

c Overwritten by the product Q*sub(C), or Q'*sub(C), or sub(C)*Q', or sub(C)*Q.

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?hengst
Reduces a complex Hermitian-definite generalized
eigenproblem to standard form.

Syntax
void pchengst (const MKL_INT* ibtype, const char* uplo, const MKL_INT* n, MKL_Complex8*
a, const MKL_INT* ia, const MKL_INT* ja, const MKL_INT* desca, const MKL_Complex8* b,
const MKL_INT* ib, const MKL_INT* jb, const MKL_INT* descb, float* scale, MKL_Complex8*
work, const MKL_INT* lwork, MKL_INT* info);
void pzhengst (const MKL_INT* ibtype, const char* uplo, const MKL_INT* n,
MKL_Complex16* a, const MKL_INT* ia, const MKL_INT* ja, const MKL_INT* desca, const
MKL_Complex16* b, const MKL_INT* ib, const MKL_INT* jb, const MKL_INT* descb, double*
scale, MKL_Complex16* work, const MKL_INT* lwork, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
p?hengst reduces a complex Hermitian-definite generalized eigenproblem to standard form.
p?hengst performs the same function as p?hegst, but is based on rank 2K updates, which are faster and
more scalable than triangular solves (the basis of p?hegst).

p?hengst calls p?hegst when uplo='U', hence p?hengst provides improved performance only when
uplo='L' and ibtype=1.

p?hengst also calls p?hegst when insufficient workspace is provided, hence p?hengst provides improved
performance only when lwork is sufficient (as described in the parameter descriptions).

In the following sub( A ) denotes the submatrix A( ia:ia+n-1, ja:ja+n-1 ) and sub( B ) denotes the
submatrix B( ib:ib+n-1, jb:jb+n-1 ).

If ibtype = 1, the problem is sub( A )*x = lambda*sub( B )*x, and sub( A ) is overwritten by
inv(UH)*sub( A )*inv(U) or inv(L)*sub( A )*inv(LH)
If ibtype = 2 or 3, the problem is sub( A )*sub( B )*x = lambda*x or sub( B )*sub( A )*x = lambda*x, and
sub( A ) is overwritten by U*sub( A )*UH or LH*sub( A )*L.
sub( B ) must have been previously factorized as UH*U or L*LH by p?potrf.

Input Parameters

ibtype (global)
= 1: compute inv(UH)*sub( A )*inv(U) or inv(L)*sub( A )*inv(LH);
= 2 or 3: compute U*sub( A )*UH or LH*sub( A )*L.

uplo (global)
= 'U': Upper triangle of sub( A ) is stored and sub( B ) is factored as UH*U;
= 'L': Lower triangle of sub( A ) is stored and sub( B ) is factored as L*LH.

n (global)
The order of the matrices sub( A ) and sub( B ). n >= 0.

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).

On entry, this array contains the local pieces of the n-by-n Hermitian
distributed matrix sub( A ). If uplo = 'U', the leading n-by-n upper
triangular part of sub( A ) contains the upper triangular part of the matrix,
and its strictly lower triangular part is not referenced. If uplo = 'L', the
leading n-by-n lower triangular part of sub( A ) contains the lower
triangular part of the matrix, and its strictly upper triangular part is not
referenced.

ia (global)
Global row index of matrix A, which points to the beginning of the
submatrix on which to operate.

ja (global)
Global column index of matrix A, which points to the beginning of the
submatrix on which to operate.

desca (global and local)


Array of size dlen_.
The array descriptor for the distributed matrix A.

b (local)
Pointer into the local memory to an array of size lld_b*LOCc(jb+n-1).

ib (global)


Global row index of matrix B, which points to the beginning of the


submatrix on which to operate.

jb (global)
Global column index of matrix B, which points to the beginning of the
submatrix on which to operate.

descb (global and local)


Array of size dlen_.
The array descriptor for the distributed matrix B.

work (local)
Array, size (lwork)

On exit, work[0] returns the minimal and optimal lwork.

lwork (local)
The size of the array work.

lwork is local input and must be at least lwork >= MAX( NB * ( NP0
+1 ), 3 * NB ).
When ibtype = 1 and uplo = 'L', p?hengst provides improved
performance when lwork >= 2 * NP0 * NB + NQ0 * NB + NB * NB, where
NB = mb_a = nb_a, NP0 = numroc( n, NB, 0, 0, NPROW ), NQ0 =
numroc( n, NB, 0, 0, NPROW ), and numroc is a ScaLAPACK tool function.
MYROW, MYCOL, NPROW and NPCOL can be determined by calling the
subroutine blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is


assumed; the routine only calculates the optimal size for all work arrays.
Each of these values is returned in the first entry of the corresponding work
array, and no error message is issued by pxerbla.

Output Parameters

a On exit, if info = 0, the transformed matrix, stored in the same format as


sub( A ).

scale (global)
Amount by which the eigenvalues should be scaled to compensate for the
scaling performed in this routine.
scale is always returned as 1.0.

work On exit, work[0] returns the minimal and optimal lwork.

info (global)

= 0: successful exit
< 0: If the i-th argument is an array and the j-entry had an illegal value,
then info = -(i*100+j), if the i-th argument is a scalar and had an illegal
value, then info = -i.
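
The following minimal sketch illustrates a typical pzhengst calling sequence, including the lwork = -1
workspace query described above. The helper name is hypothetical, and the setup of the distributed
matrices a and b, their descriptors, the BLACS grid, and the prior factorization of sub( B ) by p?potrf
are assumed and not shown.

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Minimal sketch (assumptions noted above): reduce the Hermitian-definite
   generalized problem to standard form with ibtype = 1 and uplo = 'L'. */
void reduce_hermitian_definite(MKL_Complex16 *a, const MKL_INT *desca,
                               MKL_Complex16 *b, const MKL_INT *descb,
                               MKL_INT n)
{
    MKL_INT ibtype = 1, ia = 1, ja = 1, ib = 1, jb = 1, info;
    char uplo = 'L';                       /* the case with improved performance */
    double scale;
    MKL_INT lwork = -1;
    MKL_Complex16 wquery;

    /* Workspace query: the optimal lwork is returned in wquery.real. */
    pzhengst(&ibtype, &uplo, &n, a, &ia, &ja, desca,
             b, &ib, &jb, descb, &scale, &wquery, &lwork, &info);

    lwork = (MKL_INT)wquery.real;
    MKL_Complex16 *work = (MKL_Complex16 *)malloc((size_t)lwork * sizeof(*work));

    /* Actual reduction; on exit sub( A ) holds the transformed matrix. */
    pzhengst(&ibtype, &uplo, &n, a, &ia, &ja, desca,
             b, &ib, &jb, descb, &scale, work, &lwork, &info);
    free(work);
}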

p?hentrd
Reduces a complex Hermitian matrix to Hermitian
tridiagonal form.

Syntax
void pchentrd (const char* uplo, const MKL_INT* n, MKL_Complex8* a, const MKL_INT* ia,
const MKL_INT* ja, const MKL_INT* desca, float* d, float* e, MKL_Complex8* tau,
MKL_Complex8* work, const MKL_INT* lwork, float* rwork, const MKL_INT* lrwork, MKL_INT*
info);
void pzhentrd (const char* uplo, const MKL_INT* n, MKL_Complex16* a, const MKL_INT* ia,
const MKL_INT* ja, const MKL_INT* desca, double* d, double* e, MKL_Complex16* tau,
MKL_Complex16* work, const MKL_INT* lwork, double* rwork, const MKL_INT* lrwork,
MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
p?hentrd is a prototype version of p?hetrd which uses tailored codes (either the serial, ?hetrd, or the
parallel code, p?hettrd) when adequate workspace is provided.

p?hentrd reduces a complex Hermitian matrix sub( A ) to Hermitian tridiagonal form T by a unitary
similarity transformation:
Q' * sub( A ) * Q = T, where sub( A ) = A(ia:ia+n-1,ja:ja+n-1).

p?hentrd is faster than p?hetrd on almost all matrices, particularly small ones (i.e. n < 500 * sqrt(P) ),
provided that enough workspace is available to use the tailored codes.
The tailored codes provide performance that is essentially independent of the input data layout.
The tailored codes place no restrictions on ia, ja, MB or NB. At present, ia, ja, MB and NB are restricted to
those values allowed by p?hetrd to keep the interface simple (see the Application Notes section for more
information about the restrictions).

Input Parameters

uplo (global)
Specifies whether the upper or lower triangular part of the Hermitian matrix
sub( A ) is stored:
= 'U': Upper triangular
= 'L': Lower triangular

n (global)
The number of rows and columns to be operated on, i.e. the order of the
distributed submatrix sub( A ). n >= 0.

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).

On entry, this array contains the local pieces of the Hermitian distributed
matrix sub( A ). If uplo = 'U', the leading n-by-n upper triangular part of
sub( A ) contains the upper triangular part of the matrix, and its strictly


lower triangular part is not referenced. If uplo = 'L', the leading n-by-n
lower triangular part of sub( A ) contains the lower triangular part of the
matrix, and its strictly upper triangular part is not referenced.

ia (global)
The row index in the global array a indicating the first row of sub( A ).

ja (global)

The column index in the global array a indicating the first column of
sub( A ).

desca (global and local)


Array of size dlen_.
The array descriptor for the distributed matrix A.

work (local)
Array, size (lwork)

lwork (local or global)


The size of the array work.

lwork is local input and must be at least lwork >= MAX( NB * ( NP +1 ), 3


* NB ).
For optimal performance, greater workspace is needed:
lwork >= 2*( ANB+1 )*( 4*NPS+2 ) + ( NPS + 4 ) * NPS
ANB = pjlaenv( ICTXT, 3, 'p?hettrd', 'L', 0, 0, 0, 0 )

ICTXT = desca( ctxt_ )

SQNPC = INT( sqrt( REAL( NPROW * NPCOL ) ) )

NPS = MAX( numroc( n, 1, 0, 0, SQNPC ), 2*ANB )

numroc is a ScaLAPACK tool function.


pjlaenv is a ScaLAPACK environmental inquiry function.
NPROW and NPCOL can be determined by calling the subroutine
blacs_gridinfo.

rwork (local)
Array, size (lrwork)

lrwork (local or global)


The size of the array rwork.

lrwork is local input and must be at least lrwork >= 1.


For optimal performance, greater workspace is needed, i.e. lrwork >= 2*n.

Output Parameters

a On exit, if uplo = 'U', the diagonal and first superdiagonal of sub( A ) are
overwritten by the corresponding elements of the tridiagonal matrix T, and
the elements above the first superdiagonal, with the array tau, represent
the unitary matrix Q as a product of elementary reflectors; if uplo = 'L', the
diagonal and first subdiagonal of sub( A ) are overwritten by the
corresponding elements of the tridiagonal matrix T, and the elements below
the first subdiagonal, with the array tau, represent the unitary matrix Q as
a product of elementary reflectors. See Application Notes.

d (local)
Array, size LOCc(ja+n-1)

The diagonal elements of the tridiagonal matrix T: d[i - 1] = A(i,i). d is


tied to the distributed matrix A.

e (local)
Array, size LOCc(ja+n-1) if uplo = 'U', LOCc(ja+n-2) otherwise.

The off-diagonal elements of the tridiagonal matrix T: e[i - 1] = A(i,i+1)


if uplo = 'U', e[i - 1] = A(i+1,i) if uplo = 'L'. e is tied to the distributed
matrix A.

tau (local)
Array, size LOCc(ja+n-1).

This array contains the scalar factors tau of the elementary reflectors. tau
is tied to the distributed matrix A.

work On exit, work[0] returns the optimal lwork.

rwork On exit, rwork[0] returns the optimal lrwork.

info (global)
= 0: successful exit
< 0: If the i-th argument is an array and the j-th entry had an illegal value,
then info = -(i*100+j), if the i-th argument is a scalar and had an illegal
value, then info = -i.

Application Notes
If uplo = 'U', the matrix Q is represented as a product of elementary reflectors

Q = H(n-1) . . . H(2) H(1).

Each H(i) has the form


H(i) = I - tau * v * v', where tau is a complex scalar, and v is a complex vector with v(i+1:n) = 0 and v(i) =
1; v(1:i-1) is stored on exit in A(ia:ia+i-2,ja+i), and tau in tau(ja+i-1).

If uplo = 'L', the matrix Q is represented as a product of elementary reflectors

Q = H(1) H(2) . . . H(n-1).

Each H(i) has the form


H(i) = I - tau * v * v', where tau is a complex scalar, and v is a complex vector with v(1:i) = 0 and v(i+1) =
1; v(i+2:n) is stored on exit in A(ia+i+1:ia+n-1,ja+i-1), and tau in tau(ja+i-1).

The contents of sub( A ) on exit are illustrated by the following examples with n = 5:

if uplo = 'U':

   (  d   e   v2  v3  v4  )
   (      d   e   v3  v4  )
   (          d   e   v4  )
   (              d   e   )
   (                  d   )

if uplo = 'L':

   (  d                   )
   (  e   d               )
   (  v1  e   d           )
   (  v1  v2  e   d       )
   (  v1  v2  v3  e   d   )

where d and e denote diagonal and off-diagonal elements of T, and vi denotes an element of the vector
defining H(i).
Alignment requirements
The distributed submatrix sub( A ) must verify some alignment properties, namely the following expression
should be true:
( mb_a = nb_a and IROFFA = ICOFFA and IROFFA = 0 ) with IROFFA = mod( ia-1, mb_a), and ICOFFA =
mod( ja-1, nb_a ).
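
For illustration, a minimal pzhentrd calling sketch is shown below. The helper name is hypothetical;
a, desca, d, e and tau are assumed to be set up with the sizes described above, rwork is sized from the
optimal lrwork given above, and the lwork = -1 workspace query is assumed to behave as documented for
p?hetrd.

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Minimal sketch (assumptions noted above). */
void tridiagonalize_hentrd(MKL_Complex16 *a, const MKL_INT *desca, MKL_INT n,
                           double *d, double *e, MKL_Complex16 *tau)
{
    MKL_INT ia = 1, ja = 1, info;
    char uplo = 'L';

    /* Optimal lrwork >= 2*n (see above); keep at least 1. */
    MKL_INT lrwork = (2 * n > 1) ? 2 * n : 1;
    double *rwork = (double *)malloc((size_t)lrwork * sizeof(*rwork));

    /* Workspace query for lwork (assumed to work as for p?hetrd). */
    MKL_INT lwork = -1;
    MKL_Complex16 wquery;
    pzhentrd(&uplo, &n, a, &ia, &ja, desca, d, e, tau,
             &wquery, &lwork, rwork, &lrwork, &info);

    lwork = (MKL_INT)wquery.real;
    MKL_Complex16 *work = (MKL_Complex16 *)malloc((size_t)lwork * sizeof(*work));

    pzhentrd(&uplo, &n, a, &ia, &ja, desca, d, e, tau,
             work, &lwork, rwork, &lrwork, &info);
    free(work);
    free(rwork);
}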

p?hetrd
Reduces a Hermitian matrix to Hermitian tridiagonal
form by a unitary similarity transformation.

Syntax
void pchetrd (char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , float *d , float *e , MKL_Complex8 *tau , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzhetrd (char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *d , double *e , MKL_Complex16 *tau , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?hetrd function reduces a complex Hermitian matrix sub(A) to Hermitian tridiagonal form T by a
unitary similarity transformation:
Q'*sub(A)*Q = T
where sub(A) = A(ia:ia+n-1,ja:ja+n-1).

Input Parameters

uplo (global)
Specifies whether the upper or lower triangular part of the Hermitian matrix
sub(A) is stored:
If uplo = 'U', upper triangular

If uplo = 'L', lower triangular

n (global) The order of the distributed matrix sub(A) (n≥0).

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1). On
entry, this array contains the local pieces of the Hermitian distributed
matrix sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular part of the matrix, and its strictly lower triangular part
is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular part of the matrix, and its strictly upper triangular part
is not referenced. (see Application Notes below).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local)
Workspace array of size lwork.

lwork (local or global) size of work, must be at least:

lwork≥max(NB*(np +1), 3*NB)


where NB = mb_a = nb_a,

np = numroc(n, NB, MYROW, iarow, NPROW),


iarow = indxg2p(ia, NB, MYROW, rsrc_a, NPROW).
indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a On exit,
If uplo = 'U', the diagonal and first superdiagonal of sub(A) are
overwritten by the corresponding elements of the tridiagonal matrix T, and
the elements above the first superdiagonal, with the array tau, represent


the unitary matrix Q as a product of elementary reflectors;if uplo = 'L',


the diagonal and first subdiagonal of sub(A) are overwritten by the
corresponding elements of the tridiagonal matrix T, and the elements below
the first subdiagonal, with the array tau, represent the unitary matrix Q as
a product of elementary reflectors (see Application Notes below).

d (local)
Arrays of size LOCc(ja+n-1). The diagonal elements of the tridiagonal
matrix T:
d[i]= A(i+1,i+1), 0 ≤i < LOCc(ja+n-1).
d is tied to the distributed matrix A.

e (local)
Arrays of size LOCc(ja+n-1) if uplo = 'U'; LOCc(ja+n-2) - otherwise.

The off-diagonal elements of the tridiagonal matrix T:


e[i]= A(i+1,i+2), 0 ≤i < LOCc(ja+n-1) if uplo = 'U',
e[i] = A(i+2,i+1) if uplo = 'L'.
e is tied to the distributed matrix A.

tau (local)
Array of size LOCc(ja+n-1). This array contains the scalar factors of the
elementary reflectors. tau is tied to the distributed matrix A.

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

Application Notes
If uplo = 'U', the matrix Q is represented as a product of elementary reflectors

Q = H(n-1)*...*H(2)*H(1).

Each H(i) has the form


H(i) = I - tau*v*v',
where tau is a complex scalar, and v is a complex vector with v(i+1:n) = 0 and v(i) = 1; v(1:i-1) is stored on
exit in A(ia:ia+i-2, ja+i), and tau in tau[ja+i-2].

If uplo = 'L', the matrix Q is represented as a product of elementary reflectors

Q = H(1)*H(2)*...*H(n-1).

Each H(i) has the form


H(i) = I - tau*v*v',
where tau is a complex scalar, and v is a complex vector with v(1:i) = 0 and v(i+1) = 1; v(i+2:n) is stored
on exit in A(ia+i+1:ia+n-1,ja+i-1), and tau in tau[ja+i-2].

The contents of sub(A) on exit are illustrated by the following examples with n = 5:

If uplo = 'U':

   (  d   e   v2  v3  v4  )
   (      d   e   v3  v4  )
   (          d   e   v4  )
   (              d   e   )
   (                  d   )

If uplo = 'L':

   (  d                   )
   (  e   d               )
   (  v1  e   d           )
   (  v1  v2  e   d       )
   (  v1  v2  v3  e   d   )

where d and e denote diagonal and off-diagonal elements of T, and vi denotes an element of the vector
defining H(i).

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
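
A minimal pzhetrd calling sketch follows; the helper name is hypothetical, and a, desca, d, e and tau
are assumed to be allocated as described above. The workspace is obtained with the documented lwork = -1
query.

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Minimal sketch: reduce the distributed Hermitian matrix sub(A) to
   Hermitian tridiagonal form. */
void tridiagonalize(MKL_Complex16 *a, MKL_INT *desca, MKL_INT n,
                    double *d, double *e, MKL_Complex16 *tau)
{
    MKL_INT ia = 1, ja = 1, info, lwork = -1;
    char uplo = 'U';
    MKL_Complex16 wquery;

    /* Workspace query: the minimum/optimal lwork is returned in wquery.real. */
    pzhetrd(&uplo, &n, a, &ia, &ja, desca, d, e, tau, &wquery, &lwork, &info);

    lwork = (MKL_INT)wquery.real;
    MKL_Complex16 *work = (MKL_Complex16 *)malloc((size_t)lwork * sizeof(*work));

    pzhetrd(&uplo, &n, a, &ia, &ja, desca, d, e, tau, work, &lwork, &info);
    free(work);
}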

p?unmtr
Multiplies a general matrix by the unitary
transformation matrix from a reduction to tridiagonal
form determined by p?hetrd.

Syntax
void pcunmtr (char *side , char *uplo , char *trans , MKL_INT *m , MKL_INT *n ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau ,
MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzunmtr (char *side , char *uplo , char *trans , MKL_INT *m , MKL_INT *n ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau ,
MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h


Description
This function overwrites the general complex distributed m-by-n matrix sub(C) = C(ic:ic+m-1, jc:jc+n-1)
with

                side = 'L'       side = 'R'
trans = 'N':    Q*sub(C)         sub(C)*Q
trans = 'C':    Q^H*sub(C)       sub(C)*Q^H

where Q is a complex unitary distributed matrix of order nq, with nq =m if side = 'L' and nq =n if side =
'R'.
Q is defined as the product of nq-1 elementary reflectors, as returned by p?hetrd.

If uplo = 'U', Q = H(nq-1)... H(2) H(1);

If uplo = 'L', Q = H(1) H(2)... H(nq-1).

Input Parameters

side (global)
='L': Q or Q^H is applied from the left.
='R': Q or Q^H is applied from the right.

trans (global)
='N', no transpose, Q is applied.
='C', conjugate transpose, Q^H is applied.

uplo (global)
= 'U': Upper triangle of A(ia:*, ja:*) contains elementary reflectors
from p?hetrd;

= 'L': Lower triangle of A(ia:*,ja:*) contains elementary reflectors


from p?hetrd

m (global) The number of rows in the distributed matrix sub(C) (m≥0).

n (global) The number of columns in the distributed matrix sub(C) (n≥0).

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+m-1) if
side = 'L', and lld_a*LOCc(ja+n-1) if side = 'R'.
Contains the vectors which define the elementary reflectors, as returned by
p?hetrd.
If side='L', lld_a≥max(1,LOCr(ia+m-1));

If side ='R', lld_a≥max(1,LOCr(ia+n-1)).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)

Array of size of ltau where
If side = 'L' and uplo = 'U', ltau = LOCc(m_a),

if side = 'L' and uplo = 'L', ltau = LOCc(ja+m-2),

if side = 'R' and uplo = 'U', ltau = LOCc(n_a),

if side = 'R' and uplo = 'L', ltau = LOCc(ja+n-2).

tau[i] must contain the scalar factor of the elementary reflector H(i+1), as
returned by p?hetrd (0 ≤i < ltau). tau is tied to the distributed matrix A.

c (local)
Pointer into the local memory to an array of size lld_c*LOCc(jc+n-1).
Contains the local pieces of the distributed matrix sub (C).

ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local)
Workspace array of size lwork.

lwork (local or global) size of work, must be at least:

If uplo = 'U',
  iaa = ia; jaa = ja+1; icc = ic; jcc = jc;
else (uplo = 'L'),
  iaa = ia+1; jaa = ja;
  If side = 'L',
    icc = ic+1; jcc = jc;
  else
    icc = ic; jcc = jc+1;
  end if
end if

If side = 'L',
  mi = m-1; ni = n;
  lwork≥max((nb_a*(nb_a-1))/2, (nqc0 + mpc0)*nb_a) + nb_a*nb_a
else (side = 'R'),
  mi = m; ni = n-1;
  lwork≥max((nb_a*(nb_a-1))/2, (nqc0 + max(npa0 + numroc(numroc(ni+icoffc, nb_a, 0, 0, NPCOL),
  nb_a, 0, 0, lcmq), mpc0))*nb_a) + nb_a*nb_a
end if
where lcmq = lcm/NPCOL with lcm = ilcm(NPROW, NPCOL),


iroffa = mod(iaa-1, mb_a),


icoffa = mod(jaa-1, nb_a),
iarow = indxg2p(iaa, mb_a, MYROW, rsrc_a, NPROW),
npa0 = numroc(ni+iroffa, mb_a, MYROW, iarow, NPROW),

iroffc = mod(icc-1, mb_c),


icoffc = mod(jcc-1, nb_c),
icrow = indxg2p(icc, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jcc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0 = numroc(mi+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0 = numroc(ni+icoffc, nb_c, MYCOL, iccol, NPCOL),

NOTE
mod(x,y) is the integer remainder of x/y.

ilcm, indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL,
NPROW and NPCOL can be determined by calling the function
blacs_gridinfo. If lwork = -1, then lwork is global input and a
workspace query is assumed; the function only calculates the minimum and
optimal size for all work arrays. Each of these values is returned in the first
entry of the corresponding work array, and no error message is issued by
pxerbla.

Output Parameters

c Overwritten by the product Q*sub(C), or Q'*sub(C), or sub(C)*Q', or


sub(C)*Q.

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
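
A typical use of p?unmtr is to apply the unitary matrix Q produced by p?hetrd to another distributed
matrix. The sketch below is hypothetical in its helper name and assumes that a, desca and tau hold the
output of p?hetrd (called with uplo = 'U') and that c/descc describe an m-by-n distributed matrix.

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Minimal sketch: overwrite sub(C) with Q*sub(C), where Q is defined by the
   reflectors stored in a and tau by p?hetrd. */
void apply_q_from_hetrd(MKL_Complex16 *a, MKL_INT *desca, MKL_Complex16 *tau,
                        MKL_Complex16 *c, MKL_INT *descc, MKL_INT m, MKL_INT n)
{
    MKL_INT ia = 1, ja = 1, ic = 1, jc = 1, info, lwork = -1;
    char side = 'L', uplo = 'U', trans = 'N';
    MKL_Complex16 wquery;

    /* Workspace query. */
    pzunmtr(&side, &uplo, &trans, &m, &n, a, &ia, &ja, desca, tau,
            c, &ic, &jc, descc, &wquery, &lwork, &info);

    lwork = (MKL_INT)wquery.real;
    MKL_Complex16 *work = (MKL_Complex16 *)malloc((size_t)lwork * sizeof(*work));

    pzunmtr(&side, &uplo, &trans, &m, &n, a, &ia, &ja, desca, tau,
            c, &ic, &jc, descc, work, &lwork, &info);
    free(work);
}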

p?stebz
Computes the eigenvalues of a symmetric tridiagonal
matrix by bisection.

Syntax
void psstebz (MKL_INT *ictxt , char *range , char *order , MKL_INT *n , float *vl ,
float *vu , MKL_INT *il , MKL_INT *iu , float *abstol , float *d , float *e , MKL_INT
*m , MKL_INT *nsplit , float *w , MKL_INT *iblock , MKL_INT *isplit , float *work ,
MKL_INT *lwork , MKL_INT *iwork , MKL_INT *liwork , MKL_INT *info );
void pdstebz (MKL_INT *ictxt , char *range , char *order , MKL_INT *n , double *vl ,
double *vu , MKL_INT *il , MKL_INT *iu , double *abstol , double *d , double *e ,
MKL_INT *m , MKL_INT *nsplit , double *w , MKL_INT *iblock , MKL_INT *isplit , double
*work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT *liwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?stebz function computes the eigenvalues of a symmetric tridiagonal matrix in parallel. These may be
all eigenvalues, all eigenvalues in the interval [vl, vu], or the eigenvalues il through iu. A static partitioning
of work is done at the beginning of p?stebz which results in all processes finding an (almost) equal number
of eigenvalues.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

ictxt (global) The BLACS context handle.

range (global) Must be 'A' or 'V' or 'I'.

If range = 'A', the function computes all eigenvalues.

If range = 'V', the function computes eigenvalues in the interval [vl,


vu].
If range ='I', the function computes eigenvalues il through iu.

order (global) Must be 'B' or 'E'.

If order = 'B', the eigenvalues are to be ordered from smallest to largest


within each split-off block.
If order = 'E', the eigenvalues for the entire matrix are to be ordered
from smallest to largest.

n (global) The order of the tridiagonal matrix T(n≥0).

vl, vu (global)


If range = 'V', vl and vu specify the lower and upper bounds of the interval
to be searched for eigenvalues: [vl, vu].

If range = 'A' or 'I', vl and vu are not referenced.

il, iu (global)
Constraint: 1≤il≤iu≤n.

If range = 'I', il and iu specify the indices of the smallest and largest
eigenvalues to be computed (with the eigenvalues taken in ascending
order).
If range = 'A' or 'V', il and iu are not referenced.

abstol (global)
The absolute tolerance to which each eigenvalue is required. An eigenvalue
(or cluster) is considered to have converged if it lies in an interval of width
abstol. If abstol ≤ 0, then the tolerance is taken as ulp*||T||, where ulp is
the machine precision and ||T|| means the 1-norm of T.
Eigenvalues are computed most accurately when abstol is set to the
underflow threshold slamch('U'), not 0. Note that if eigenvectors are
desired later by inverse iteration (p?stein), abstol should be set to
2*p?lamch('S').

d (global)

Array of size n.

Contains n diagonal elements of the tridiagonal matrix T. To avoid overflow,


the matrix must be scaled so that its largest entry is no greater than the
overflow(1/2) * underflow(1/4) in absolute value, and for greatest
accuracy, it should not be much smaller than that.

e (global)
Array of size n - 1.

Contains (n-1) off-diagonal elements of the tridiagonal matrix T. To avoid


overflow, the matrix must be scaled so that its largest entry is no greater
than overflow(1/2) * underflow(1/4) in absolute value, and for greatest
accuracy, it should not be much smaller than that.

work (local)
Array of size max(5n, 7). This is a workspace array.

lwork (local) The size of the work array must be ≥max(5n, 7).

If lwork = -1, then lwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

iwork (local) Array of size max(4n, 14). This is a workspace array.

liwork (local) The size of the iwork array must be at least max(4n, 14, NPROCS).

If liwork = -1, then liwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

m (global) The actual number of eigenvalues found. 0≤m≤n

nsplit (global) The number of diagonal blocks detected in T. 1≤nsplit≤n

w (global)
Array of size n. On exit, the first m elements of w contain the eigenvalues on
all processes.

iblock (global)
Array of size n. At each row/column j where e[j-1] is zero or small, the
matrix T is considered to split into a block diagonal matrix. On exit
iblock[i] specifies which block (from 1 to the number of blocks) the
eigenvalue w[i] belongs to.

NOTE
In the (theoretically impossible) event that bisection does not converge
for some or all eigenvalues, info is set to 1 and the ones for which it
did not are identified by a negative block number.

isplit (global)
Array of size n.

Contains the splitting points, at which T breaks up into submatrices. The


first submatrix consists of rows/columns 1 to isplit[0], the second of
rows/columns isplit[0]+1 through isplit[1], and so on, and the
nsplit-th submatrix consists of rows/columns isplit[nsplit-2]+1
through isplit[nsplit-1]=n. (Only the first nsplit elements are used,
but since the nsplit values are not known, n words must be reserved for
isplit.)

info (global)
If info = 0, the execution is successful.

If info < 0, if info = -i, the i-th argument has an illegal value.

If info> 0, some or all of the eigenvalues fail to converge or are not


computed.
If info = 1, bisection fails to converge for some eigenvalues; these
eigenvalues are flagged by a negative block number. The effect is that the
eigenvalues may not be as accurate as the absolute and relative tolerances.
If info = 2, mismatch between the number of eigenvalues output and the
number desired.


If info = 3: range='I', and the Gershgorin interval initially used is


incorrect. No eigenvalues are computed. Probable cause: the machine has a
sloppy floating-point arithmetic. Increase the fudge parameter, recompile,
and try again.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
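
A minimal pdstebz sketch follows. The helper name is hypothetical; ictxt is an existing BLACS context,
d and e are replicated on all processes, and w, iblock and isplit are caller-allocated arrays of size n.
order = 'B' is chosen so that the outputs can be passed directly to p?stein; both workspace sizes are
obtained with the lwork = liwork = -1 query described above (assumed to be usable in a single call).

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Minimal sketch: compute all eigenvalues of the tridiagonal matrix (d, e);
   m receives the number of eigenvalues found. */
void tridiag_eigenvalues(MKL_INT ictxt, MKL_INT n, double *d, double *e,
                         double *w, MKL_INT *iblock, MKL_INT *isplit, MKL_INT *m)
{
    char range = 'A', order = 'B';
    double vl = 0.0, vu = 0.0, abstol = 0.0;  /* vl, vu, il, iu not referenced for range = 'A' */
    MKL_INT il = 1, iu = n, nsplit, info;

    MKL_INT lwork = -1, liwork = -1, iwquery;
    double wquery;
    pdstebz(&ictxt, &range, &order, &n, &vl, &vu, &il, &iu, &abstol, d, e,
            m, &nsplit, w, iblock, isplit, &wquery, &lwork, &iwquery, &liwork, &info);

    lwork  = (MKL_INT)wquery;
    liwork = iwquery;
    double  *work  = (double *)malloc((size_t)lwork * sizeof(*work));
    MKL_INT *iwork = (MKL_INT *)malloc((size_t)liwork * sizeof(*iwork));

    pdstebz(&ictxt, &range, &order, &n, &vl, &vu, &il, &iu, &abstol, d, e,
            m, &nsplit, w, iblock, isplit, work, &lwork, iwork, &liwork, &info);
    free(work);
    free(iwork);
}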

p?stedc
Computes all eigenvalues and eigenvectors of a
symmetric tridiagonal matrix in parallel.

Syntax
void psstedc (const char* compz, const MKL_INT* n, float* d, float* e, float* q, const
MKL_INT* iq, const MKL_INT* jq, const MKL_INT* descq, float* work, MKL_INT* lwork,
MKL_INT* iwork, const MKL_INT* liwork, MKL_INT* info);
void pdstedc (const char* compz, const MKL_INT* n, double* d, double* e, double* q,
const MKL_INT* iq, const MKL_INT* jq, const MKL_INT* descq, double* work, MKL_INT*
lwork, MKL_INT* iwork, const MKL_INT* liwork, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
p?stedc computes all eigenvalues and eigenvectors of a symmetric tridiagonal matrix in parallel, using the
divide and conquer algorithm.

Input Parameters

compz = 'N': Compute eigenvalues only. (NOT IMPLEMENTED YET)


= 'I': Compute eigenvectors of tridiagonal matrix also.
= 'V': Compute eigenvectors of original dense symmetric matrix also. On
entry, Z contains the orthogonal matrix used to reduce the original matrix
to tridiagonal form. (NOT IMPLEMENTED YET)

n (global)
The order of the tridiagonal matrix T. n >= 0.

d (global)
Array, size (n)

On entry, the diagonal elements of the tridiagonal matrix.

e (global)
Array, size (n-1).

On entry, the subdiagonal elements of the tridiagonal matrix.

iq (global)

Q's global row index, which points to the beginning of the submatrix which
is to be operated on.

jq (global)
Q's global column index, which points to the beginning of the submatrix
which is to be operated on.

descq (global and local)


Array of size dlen_.
The array descriptor for the distributed matrix Q.

work (local)
Array, size (lwork)

lwork (local)
The size of the array work.

lwork = 6*n + 2*NP*NQ


NP = numroc( n, NB, MYROW, DESCQ( rsrc_ ), NPROW )

NQ = numroc( n, NB, MYCOL, DESCQ( csrc_ ), NPCOL )

numroc is a ScaLAPACK tool function.


If lwork = -1, the lwork is global input and a workspace query is
assumed; the routine only calculates the minimum size for the work array.
The required workspace is returned as the first element of work and no
error message is issued by pxerbla.

iwork (local)
Array, size (liwork)

liwork The size of the array iwork.

liwork = 2 + 7*n + 8*NPCOL

Output Parameters

d On exit, if info = 0, the eigenvalues in descending order.

q (local)
Array, local size ( lld_q, LOCc(jq+n-1))

q contains the orthonormal eigenvectors of the symmetric tridiagonal


matrix.
On output, q is distributed across the P processes in block cyclic format.

work On output, work[0] returns the workspace needed.

iwork On exit, if liwork > 0, iwork[0] returns the optimal liwork.

info (global)
= 0: successful exit.


< 0: If the i-th argument is an array and the j-th entry had an illegal value,
then info = -(i*100+j), if the i-th argument is a scalar and had an illegal
value, then info = -i.

> 0: The algorithm failed to compute the info/(n+1)-th eigenvalue while


working on the submatrix lying in global rows and columns mod(info,n+1).
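
The sketch below shows a typical pdstedc call with compz = 'I'. The helper name is hypothetical; d, e and
the distributed matrix q with descriptor descq are assumed to be set up already. liwork is computed from
the formula above using NPCOL obtained from blacs_gridinfo (the C BLACS interface declared in mkl_blacs.h,
together with the assumption that descq[1] holds the ctxt_ entry of the descriptor); lwork is obtained with
the documented lwork = -1 query.

#include <stdlib.h>
#include "mkl_scalapack.h"
#include "mkl_blacs.h"

/* Minimal sketch (assumptions noted above). */
void tridiag_eig_dc(MKL_INT n, double *d, double *e, double *q, const MKL_INT *descq)
{
    char compz = 'I';
    MKL_INT iq = 1, jq = 1, info;

    /* liwork = 2 + 7*n + 8*NPCOL */
    MKL_INT ictxt = descq[1];   /* ctxt_ entry of the descriptor (assumption) */
    MKL_INT nprow, npcol, myrow, mycol;
    blacs_gridinfo(&ictxt, &nprow, &npcol, &myrow, &mycol);
    MKL_INT liwork = 2 + 7 * n + 8 * npcol;
    MKL_INT *iwork = (MKL_INT *)malloc((size_t)liwork * sizeof(*iwork));

    /* Workspace query for lwork. */
    MKL_INT lwork = -1;
    double wquery;
    pdstedc(&compz, &n, d, e, q, &iq, &jq, descq, &wquery, &lwork, iwork, &liwork, &info);

    lwork = (MKL_INT)wquery;
    double *work = (double *)malloc((size_t)lwork * sizeof(*work));

    pdstedc(&compz, &n, d, e, q, &iq, &jq, descq, work, &lwork, iwork, &liwork, &info);
    free(work);
    free(iwork);
}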

p?stein
Computes the eigenvectors of a tridiagonal matrix
using inverse iteration.

Syntax
void psstein (MKL_INT *n , float *d , float *e , MKL_INT *m , float *w , MKL_INT
*iblock , MKL_INT *isplit , float *orfac , float *z , MKL_INT *iz , MKL_INT *jz ,
MKL_INT *descz , float *work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT *liwork ,
MKL_INT *ifail , MKL_INT *iclustr , float *gap , MKL_INT *info );
void pdstein (MKL_INT *n , double *d , double *e , MKL_INT *m , double *w , MKL_INT
*iblock , MKL_INT *isplit , double *orfac , double *z , MKL_INT *iz , MKL_INT *jz ,
MKL_INT *descz , double *work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT *liwork ,
MKL_INT *ifail , MKL_INT *iclustr , double *gap , MKL_INT *info );

void pcstein (MKL_INT *n , float *d , float *e , MKL_INT *m , float *w , MKL_INT


*iblock , MKL_INT *isplit , float *orfac , MKL_Complex8 *z , MKL_INT *iz , MKL_INT
*jz , MKL_INT *descz , float *work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT
*liwork , MKL_INT *ifail , MKL_INT *iclustr , float *gap , MKL_INT *info );
void pzstein (MKL_INT *n , double *d , double *e , MKL_INT *m , double *w , MKL_INT
*iblock , MKL_INT *isplit , double *orfac , MKL_Complex16 *z , MKL_INT *iz , MKL_INT
*jz , MKL_INT *descz , double *work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT
*liwork , MKL_INT *ifail , MKL_INT *iclustr , double *gap , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?stein function computes the eigenvectors of a symmetric tridiagonal matrix T corresponding to
specified eigenvalues, by inverse iteration. p?stein does not orthogonalize vectors that are on different
processes. The extent of orthogonalization is controlled by the input parameter lwork. Eigenvectors that are
to be orthogonalized are computed by the same process. p?stein decides on the allocation of work among
the processes and then calls ?stein2 (modified LAPACK function) on each individual process. If insufficient
workspace is allocated, the expected orthogonalization may not be done.

NOTE
If the eigenvectors obtained are not orthogonal, increase lwork and run the code again.

p = NPROW*NPCOL is the total number of processes.

Input Parameters

n (global) The order of the matrix T(n≥ 0).

m (global) The number of eigenvectors to be returned.

d, e, w (global)
Arrays:

d of size n contains the diagonal elements of T.


e of size n-1 contains the off-diagonal elements of T.
w of size m contains all the eigenvalues grouped by split-off block. The
eigenvalues are supplied from smallest to largest within the block. (Here
the output array w from p?stebz with order = 'B' is expected. The array
should be replicated in all processes.)

iblock (global)
Array of size n. The submatrix indices associated with the corresponding
eigenvalues in w: 1 for eigenvalues belonging to the first submatrix from
the top, 2 for those belonging to the second submatrix, etc. (The output
array iblock from p?stebz is expected here).

isplit (global)
Array of size n. The splitting points at which T breaks up into submatrices.
The first submatrix consists of rows/columns 1 to isplit[0], the second of
rows/columns isplit[0]+1 through isplit[1], and so on, and the
nsplit-th submatrix consists of rows/columns isplit[nsplit-2]+1
through isplit[nsplit-1]=n. (The output array isplit from p?stebz is
expected here.)

orfac (global)
orfac specifies which eigenvectors should be orthogonalized. Eigenvectors
that correspond to eigenvalues within orfac*||T|| of each other are to be
orthogonalized. However, if the workspace is insufficient (see lwork), this
tolerance may be decreased until all eigenvectors can be stored in one
process. No orthogonalization is done if orfac is equal to zero. A default
value of 1000 is used if orfac is negative. orfac should be identical on all
processes

iz, jz (global) The row and column indices in the global matrix Z indicating the
first row and the first column of the submatrix Z, respectively.

descz (global and local) array of size dlen_. The array descriptor for the
distributed matrix Z.

work (local).
Workspace array of size lwork.

lwork (local)
lwork controls the extent of orthogonalization which can be done. The
number of eigenvectors for which storage is allocated on each process is
nvec = floor((lwork-max(5*n,np00*mq00))/n). Eigenvectors
corresponding to eigenvalue clusters of size (nvec - ceil(m/p) + 1) are
guaranteed to be orthogonal (the orthogonality is similar to that obtained
from ?stein2).


NOTE
lwork must be no smaller than max(5*n,np00*mq00) + ceil(m/
p)*n and should have the same input value on all processes.

It is the minimum value of lwork input on different processes that is


significant.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

iwork (local)
Workspace array of size 3n+p+1.

liwork (local) The size of the array iwork. It must be greater than 3*n+p+1.

If liwork = -1, then liwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

z (local)
Array of size (descz(lld_), n/NPCOL + NB). z contains the computed
eigenvectors associated with the specified eigenvalues. Any vector which
fails to converge is set to its current iterate after MAXIT iterations (see
?stein2). On output, z is distributed across the p processes in block cyclic
format.

work On exit, work[0] gives a lower bound on the workspace (lwork) that
guarantees the user desired orthogonalization (see orfac). Note that this
may overestimate the minimum workspace needed.

iwork On exit, iwork[0] contains the amount of integer workspace required.

On exit, iwork[1] through iwork[p+1] indicate the eigenvectors


computed by each process. Process i computes eigenvectors indexed
iwork[i+1]+1 through iwork[i+2].

ifail (global) Array of size m. On normal exit, all elements of ifail are zero. If
one or more eigenvectors fail to converge after MAXIT iterations (as in ?
stein), then info > 0 is returned. If mod(info, m+1)>0, then for i=1 to
mod(info,m+1), the eigenvector corresponding to the eigenvalue
w[ifail[i-1]-1] failed to converge (w refers to the array of eigenvalues
on output).

NOTE
mod(x,y) is the integer remainder of x/y.

iclustr (global) Array of size 2*p.

This output array contains indices of eigenvectors corresponding to a cluster
of eigenvalues that could not be orthogonalized due to insufficient
workspace (see lwork, orfac and info). Eigenvectors corresponding to
clusters of eigenvalues indexed iclustr[2*i-2] to iclustr[2*i-1], i = 1
to info/(m+1), could not be orthogonalized due to lack of workspace.
Hence the eigenvectors corresponding to these clusters may not be
orthogonal. iclustr is a zero terminated array: iclustr[2*k-1]≠ 0 and
iclustr[2*k] = 0 if and only if k is the number of clusters.

gap (global)
This output array contains the gap between eigenvalues whose
eigenvectors could not be orthogonalized. The info/m output values
in this array correspond to the info/(m+1) clusters indicated by the
array iclustr. As a result, the dot product between eigenvectors
corresponding to the i-th cluster may be as high as
(O(n)*macheps)/gap[i-1].

info (global)
If info = 0, the execution is successful.

If info < 0: if the i-th argument is an array and the j-th entry, indexed j-1, had an illegal value,
then info = -(i*100+j); if the i-th argument is a scalar and had an illegal value, then info = -i.

If info > 0: if mod(info, m+1) = i, then i eigenvectors failed to converge


in MAXIT iterations. Their indices are stored in the array ifail. If info/(m
+1) = i, then eigenvectors corresponding to i clusters of eigenvalues could
not be orthogonalized due to insufficient workspace. The indices of the
clusters are stored in the array iclustr.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
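
The sketch below shows the typical pairing of p?stein with p?stebz: the m, w, iblock and isplit values
produced by p?stebz (with order = 'B') are passed unchanged. The helper name is hypothetical; z/descz,
ifail (size m), iclustr (size 2*NPROW*NPCOL) and gap (size NPROW*NPCOL) are assumed to be caller-allocated,
and the lwork/liwork = -1 query described above is used.

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Minimal sketch: compute eigenvectors for the m eigenvalues returned by p?stebz. */
void tridiag_eigenvectors(MKL_INT n, double *d, double *e, MKL_INT m, double *w,
                          MKL_INT *iblock, MKL_INT *isplit,
                          double *z, MKL_INT *descz,
                          MKL_INT *ifail, MKL_INT *iclustr, double *gap)
{
    MKL_INT iz = 1, jz = 1, info;
    double orfac = -1.0;   /* negative: use the default value described above */

    MKL_INT lwork = -1, liwork = -1, iwquery;
    double wquery;
    pdstein(&n, d, e, &m, w, iblock, isplit, &orfac, z, &iz, &jz, descz,
            &wquery, &lwork, &iwquery, &liwork, ifail, iclustr, gap, &info);

    lwork  = (MKL_INT)wquery;
    liwork = iwquery;
    double  *work  = (double *)malloc((size_t)lwork * sizeof(*work));
    MKL_INT *iwork = (MKL_INT *)malloc((size_t)liwork * sizeof(*iwork));

    pdstein(&n, d, e, &m, w, iblock, isplit, &orfac, z, &iz, &jz, descz,
            work, &lwork, iwork, &liwork, ifail, iclustr, gap, &info);
    free(work);
    free(iwork);
}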

Nonsymmetric Eigenvalue Problems: ScaLAPACK Computational Routines


This section describes ScaLAPACK routines for solving nonsymmetric eigenvalue problems, computing the
Schur factorization of general matrices, as well as performing a number of related computational tasks.
To solve a nonsymmetric eigenvalue problem with ScaLAPACK, you usually need to reduce the matrix to the
upper Hessenberg form and then solve the eigenvalue problem with the Hessenberg matrix obtained.
Table "Computational Routines for Solving Nonsymmetric Eigenproblems" lists ScaLAPACK routines for
reducing the matrix to the upper Hessenberg form by an orthogonal (or unitary) similarity transformation
A = Q*H*Q^H, as well as routines for solving eigenproblems with Hessenberg matrices, and multiplying the
matrix after reduction.
Computational Routines for Solving Nonsymmetric Eigenproblems

Operation performed                          General matrix    Orthogonal/Unitary matrix   Hessenberg matrix
Reduce to Hessenberg form A = Q*H*Q^H        p?gehrd
Multiply the matrix after reduction                            p?ormhr / p?unmhr
Find eigenvalues and Schur factorization                                                   p?lahqr


p?gehrd
Reduces a general matrix to upper Hessenberg form.

Syntax
void psgehrd (MKL_INT *n , MKL_INT *ilo , MKL_INT *ihi , float *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *tau , float *work , MKL_INT *lwork , MKL_INT
*info );
void pdgehrd (MKL_INT *n , MKL_INT *ilo , MKL_INT *ihi , double *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT
*info );
void pcgehrd (MKL_INT *n , MKL_INT *ilo , MKL_INT *ihi , MKL_Complex8 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzgehrd (MKL_INT *n , MKL_INT *ilo , MKL_INT *ihi , MKL_Complex16 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?gehrd function reduces a real/complex general distributed matrix sub(A) to upper Hessenberg form H
by an orthogonal or unitary similarity transformation
Q'*sub(A)*Q = H,
where sub(A) = A(ia:ia+n-1, ja:ja+n-1).

Input Parameters

n (global). The order of the distributed matrix sub(A) (n≥0).

ilo, ihi (global).


It is assumed that sub(A) is already upper triangular in rows ia:ia+ilo-2
and ia+ihi:ia+n-1 and columns ja:ja+ilo-2 and ja+ihi:ja+n-1. (See
Application Notes below).
If n > 0, 1≤ilo≤ihi≤n; otherwise set ilo = 1, ihi = n.

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1). On
entry, this array contains the local pieces of the n-by-n general distributed
matrix sub(A) to be reduced.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local)
Workspace array of size lwork.

lwork (local or global) size of the array work. lwork is local input and must be at
least
lwork≥NB*NB + NB*max(ihip+1, ihlp+inlq)
where NB = mb_a = nb_a,

iroffa = mod(ia-1, NB),


icoffa = mod(ja-1, NB),
ioff = mod(ia+ilo-2, NB), iarow = indxg2p(ia, NB, MYROW,
rsrc_a, NPROW), ihip = numroc(ihi+iroffa, NB, MYROW, iarow,
NPROW),
ilrow = indxg2p(ia+ilo-1, NB, MYROW, rsrc_a, NPROW),
ihlp = numroc(ihi-ilo+ioff+1, NB, MYROW, ilrow, NPROW),
ilcol = indxg2p(ja+ilo-1, NB, MYCOL, csrc_a, NPCOL),
inlq = numroc(n-ilo+ioff+1, NB, MYCOL, ilcol, NPCOL),

NOTE
mod(x,y) is the integer remainder of x/y.

indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a On exit, the upper triangle and the first subdiagonal of sub(A) are
overwritten with the upper Hessenberg matrix H, and the elements below
the first subdiagonal, with the array tau, represent the orthogonal/unitary
matrix Q as a product of elementary reflectors (see Application Notes
below).

tau (local).
Array of size at least max(ja+n-2).

The scalar factors of the elementary reflectors (see Application Notes


below). Elements ja:ja+ilo-2 and ja+ihi:ja+n-2 of the global vector
tau are set to zero. tau is tied to the distributed matrix A.

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.


Application Notes
The matrix Q is represented as a product of (ihi-ilo) elementary reflectors

Q = H(ilo)*H(ilo+1)*...*H(ihi-1).

Each H(i) has the form


H(i) = I - tau*v*v'
where tau is a real/complex scalar, and v is a real/complex vector with v(1:i) = 0, v(i+1) = 1 and
v(ihi+1:n) = 0; v(i+2:ihi) is stored on exit in A(ia+ilo+i:ia+ihi-1, ja+ilo+i-2), and tau in
tau[ja+ilo+i-3].

The contents of A(ia:ia+n-1, ja:ja+n-1) are illustrated by the following example, with n = 7,
ilo = 2 and ihi = 6:

on entry                              on exit

(  a   a   a   a   a   a   a  )       (  a   a   H   H   H   H   a  )
(      a   a   a   a   a   a  )       (      a   H   H   H   H   a  )
(      a   a   a   a   a   a  )       (      H   H   H   H   H   H  )
(      a   a   a   a   a   a  )       (      v2  H   H   H   H   H  )
(      a   a   a   a   a   a  )       (      v2  v3  H   H   H   H  )
(      a   a   a   a   a   a  )       (      v2  v3  v4  H   H   H  )
(                          a  )       (                          a  )

where a denotes an element of the original matrix sub(A), H denotes a modified element of the upper
Hessenberg matrix H, and vi denotes an element of the vector defining H(ja+ilo+i-2).

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
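
A minimal pdgehrd sketch follows; the helper name is hypothetical, a/desca is an n-by-n distributed
matrix set up elsewhere, tau is sized as described above, and the workspace comes from the documented
lwork = -1 query.

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Minimal sketch: reduce the full matrix (ilo = 1, ihi = n) to upper
   Hessenberg form. */
void reduce_to_hessenberg(double *a, MKL_INT *desca, MKL_INT n, double *tau)
{
    MKL_INT ia = 1, ja = 1, ilo = 1, ihi = n, info, lwork = -1;
    double wquery;

    /* Workspace query. */
    pdgehrd(&n, &ilo, &ihi, a, &ia, &ja, desca, tau, &wquery, &lwork, &info);

    lwork = (MKL_INT)wquery;
    double *work = (double *)malloc((size_t)lwork * sizeof(*work));

    pdgehrd(&n, &ilo, &ihi, a, &ia, &ja, desca, tau, work, &lwork, &info);
    free(work);
}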

p?ormhr
Multiplies a general matrix by the orthogonal
transformation matrix from a reduction to Hessenberg
form determined by p?gehrd.

Syntax
void psormhr (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *ilo ,
MKL_INT *ihi , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *tau ,
float *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , float *work , MKL_INT *lwork ,
MKL_INT *info );
void pdormhr (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *ilo ,
MKL_INT *ihi , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *tau ,
double *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , double *work , MKL_INT
*lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?ormhr function overwrites the general real distributed m-by-n matrix sub(C) = C(ic:ic+m-1,
jc:jc+n-1) with

                side = 'L'       side = 'R'
trans = 'N':    Q*sub(C)         sub(C)*Q
trans = 'T':    Q^T*sub(C)       sub(C)*Q^T

where Q is a real orthogonal distributed matrix of order nq, with nq = m if side = 'L' and nq = n if side =
'R'.
Q is defined as the product of ihi-ilo elementary reflectors, as returned by p?gehrd.

Q = H(ilo) H(ilo+1)... H(ihi-1).

Input Parameters

side (global)
='L': Q or Q^T is applied from the left.
='R': Q or Q^T is applied from the right.

trans (global)
='N', no transpose, Q is applied.
='T', transpose, Q^T is applied.

m (global) The number of rows in the distributed matrix sub (C) (m≥0).

n (global) The number of columns in the distributed matrix sub(C) (n≥0).

ilo, ihi (global)


ilo and ihi must have the same values as in the previous call of p?gehrd.
Q is equal to the unit matrix except for the distributed submatrix Q(ia
+ilo:ia+ihi-1,ja+ilo:ja+ihi-1).


If side = 'L', 1≤ilo≤ihi≤max(1,m);

If side = 'R', 1≤ilo≤ihi≤max(1,n);

ilo and ihi are relative indexes.

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+m-1) if
side = 'L', and lld_a*LOCc(ja+n-1) if side = 'R'.
Contains the vectors which define the elementary reflectors, as returned by
p?gehrd.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ja+m-2) if side = 'L', and LOCc(ja+n-2) if side =
'R'.
tau[j] contains the scalar factor of the elementary reflector H(j+1) as
returned by p?gehrd (0 ≤j < size(tau)). tau is tied to the distributed
matrix A.

c (local)
Pointer into the local memory to an array of size lld_c*LOCc(jc+n-1).

Contains the local pieces of the distributed matrix sub(C).

ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local)
Workspace array with size lwork.

lwork (local or global)


The size of the array work.

lwork must be at least the value given below, where iaa = ia + ilo; jaa = ja+ilo-1;


If side = 'L',

mi = ihi-ilo; ni = n; icc = ic + ilo; jcc = jc;


lwork≥max((nb_a*(nb_a-1))/2, (nqc0+mpc0)*nb_a) + nb_a*nb_a
else if side = 'R',

mi = m; ni = ihi-ilo; icc = ic; jcc = jc + ilo;


lwork≥max((nb_a*(nb_a-1))/2, (nqc0+max(npa0+numroc(numroc(ni
+icoffc, nb_a, 0, 0, NPCOL), nb_a, 0, 0, lcmq), mpc0))*nb_a)
+ nb_a*nb_a
end if

where lcmq = lcm/NPCOL with lcm = ilcm(NPROW, NPCOL),

iroffa = mod(iaa-1, mb_a),


icoffa = mod(jaa-1, nb_a),
iarow = indxg2p(iaa, mb_a, MYROW, rsrc_a, NPROW),
npa0 = numroc(ni+iroffa, mb_a, MYROW, iarow, NPROW),
iroffc = mod(icc-1, mb_c), icoffc = mod(jcc-1, nb_c),
icrow = indxg2p(icc, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jcc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0 = numroc(mi+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0 = numroc(ni+icoffc, nb_c, MYCOL, iccol, NPCOL),

NOTE
mod(x,y) is the integer remainder of x/y.

ilcm, indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL,
NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

c sub(C) is overwritten by Q*sub(C), or Q'*sub(C), or sub(C)*Q', or


sub(C)*Q.

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
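
A minimal pdormhr sketch follows; the helper name is hypothetical, a/desca/tau are assumed to hold the
reflectors returned by p?gehrd (with the same ilo and ihi), and c/descc describe an m-by-n distributed
matrix.

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Minimal sketch: overwrite sub(C) with Q*sub(C), where Q was determined by
   p?gehrd. */
void apply_q_from_gehrd(double *a, MKL_INT *desca, double *tau,
                        double *c, MKL_INT *descc,
                        MKL_INT m, MKL_INT n, MKL_INT ilo, MKL_INT ihi)
{
    MKL_INT ia = 1, ja = 1, ic = 1, jc = 1, info, lwork = -1;
    char side = 'L', trans = 'N';
    double wquery;

    /* Workspace query. */
    pdormhr(&side, &trans, &m, &n, &ilo, &ihi, a, &ia, &ja, desca, tau,
            c, &ic, &jc, descc, &wquery, &lwork, &info);

    lwork = (MKL_INT)wquery;
    double *work = (double *)malloc((size_t)lwork * sizeof(*work));

    pdormhr(&side, &trans, &m, &n, &ilo, &ihi, a, &ia, &ja, desca, tau,
            c, &ic, &jc, descc, work, &lwork, &info);
    free(work);
}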

p?unmhr
Multiplies a general matrix by the unitary
transformation matrix from a reduction to Hessenberg
form determined by p?gehrd.


Syntax
void pcunmhr (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *ilo ,
MKL_INT *ihi , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca ,
MKL_Complex8 *tau , MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc ,
MKL_Complex8 *work , MKL_INT *lwork , MKL_INT *info );
void pzunmhr (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *ilo ,
MKL_INT *ihi , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca ,
MKL_Complex16 *tau , MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc ,
MKL_Complex16 *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
This function overwrites the general complex distributed m-by-n matrix sub(C) = C(ic:ic+m-1, jc:jc+n-1)
with

                side = 'L'       side = 'R'
trans = 'N':    Q*sub(C)         sub(C)*Q
trans = 'C':    Q^H*sub(C)       sub(C)*Q^H

where Q is a complex unitary distributed matrix of order nq, with nq = m if side = 'L' and nq = n if side
= 'R'.
Q is defined as the product of ihi-ilo elementary reflectors, as returned by p?gehrd.

Q = H(ilo) H(ilo+1)... H(ihi-1).

Input Parameters

side (global)
='L': Q or Q^H is applied from the left.
='R': Q or Q^H is applied from the right.

trans (global)
='N', no transpose, Q is applied.
='C', conjugate transpose, Q^H is applied.

m (global) The number of rows in the distributed matrix sub (C) (m≥0).

n (global) The number of columns in the distributed matrix sub (C) (n≥0).

ilo, ihi (global)


These must be the same parameters ilo and ihi, respectively, as supplied
to p?gehrd. Q is equal to the unit matrix except in the distributed
submatrixQ(ia+ilo:ia+ihi-1,ja+ilo:ja+ihi-1).

If side ='L', then 1≤ilo≤ihi≤max(1,m).

If side = 'R', then 1≤ilo≤ihi≤max(1,n)

ilo and ihi are relative indexes.

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+m-1) if
side = 'L', and lld_a*LOCc(ja+n-1) if side = 'R'.
Contains the vectors which define the elementary reflectors, as returned by
p?gehrd.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ja+m-2), if side = 'L', and LOCc(ja+n-2) if side =
'R'.
tau[j] contains the scalar factor of the elementary reflector H(j+1) as
returned by p?gehrd (0 ≤j < size(tau)). tau is tied to the distributed
matrix A.

c (local)
Pointer into the local memory to an array of size lld_c*LOCc(jc+n-1).

Contains the local pieces of the distributed matrix sub(C).

ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local)
Workspace array with size lwork.

lwork (local or global)


The size of the array work.

lwork must be at least the value given below, where iaa = ia + ilo; jaa = ja+ilo-1;


If side = 'L', mi = ihi-ilo; ni = n; icc = ic + ilo; jcc = jc;
lwork≥max((nb_a*(nb_a-1))/2, (nqc0+mpc0)*nb_a) + nb_a*nb_a
else if side = 'R',

mi = m; ni = ihi-ilo; icc = ic; jcc = jc + ilo;


lwork≥max((nb_a*(nb_a-1))/2, (nqc0 +
max(npa0+numroc(numroc(ni+icoffc, nb_a, 0, 0, NPCOL), nb_a,
0, 0, lcmq ), mpc0))*nb_a) + nb_a*nb_a
end if
where lcmq = lcm/NPCOL with lcm = ilcm(NPROW, NPCOL),

iroffa = mod(iaa-1, mb_a),


icoffa = mod(jaa-1, nb_a),
iarow = indxg2p(iaa, mb_a, MYROW, rsrc_a, NPROW),


npa0 = numroc(ni+iroffa, mb_a, MYROW, iarow, NPROW),


iroffc = mod(icc-1, mb_c),
icoffc = mod(jcc-1, nb_c),
icrow = indxg2p(icc, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jcc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0 = numroc(mi+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0 = numroc(ni+icoffc, nb_c, MYCOL, iccol, NPCOL),

NOTE
mod(x,y) is the integer remainder of x/y.

ilcm, indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL,
NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

c C is overwritten by Q* sub(C) or Q'*sub(C) or sub(C)*Q' or sub(C)*Q.

work[0]) On exit work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
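
The same pattern applies to p?unmhr; the sketch below (single-precision complex, hypothetical helper
name) forms sub(C)*Q^H, with a/desca/tau taken from a previous p?gehrd call on an n-by-n matrix.

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Minimal sketch: apply Q^H from the right (side = 'R', trans = 'C'). */
void apply_qh_from_right(MKL_Complex8 *a, MKL_INT *desca, MKL_Complex8 *tau,
                         MKL_Complex8 *c, MKL_INT *descc,
                         MKL_INT m, MKL_INT n, MKL_INT ilo, MKL_INT ihi)
{
    MKL_INT ia = 1, ja = 1, ic = 1, jc = 1, info, lwork = -1;
    char side = 'R', trans = 'C';
    MKL_Complex8 wquery;

    /* Workspace query. */
    pcunmhr(&side, &trans, &m, &n, &ilo, &ihi, a, &ia, &ja, desca, tau,
            c, &ic, &jc, descc, &wquery, &lwork, &info);

    lwork = (MKL_INT)wquery.real;
    MKL_Complex8 *work = (MKL_Complex8 *)malloc((size_t)lwork * sizeof(*work));

    pcunmhr(&side, &trans, &m, &n, &ilo, &ihi, a, &ia, &ja, desca, tau,
            c, &ic, &jc, descc, work, &lwork, &info);
    free(work);
}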

p?lahqr
Computes the Schur decomposition and/or
eigenvalues of a matrix already in Hessenberg form.

Syntax
void pslahqr (MKL_INT *wantt , MKL_INT *wantz , MKL_INT *n , MKL_INT *ilo , MKL_INT
*ihi , float *a , MKL_INT *desca , float *wr , float *wi , MKL_INT *iloz , MKL_INT
*ihiz , float *z , MKL_INT *descz , float *work , MKL_INT *lwork , MKL_INT *iwork ,
MKL_INT *ilwork , MKL_INT *info );

void pdlahqr (MKL_INT *wantt , MKL_INT *wantz , MKL_INT *n , MKL_INT *ilo , MKL_INT
*ihi , double *a , MKL_INT *desca , double *wr , double *wi , MKL_INT *iloz , MKL_INT
*ihiz , double *z , MKL_INT *descz , double *work , MKL_INT *lwork , MKL_INT *iwork ,
MKL_INT *ilwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
This is an auxiliary function used to find the Schur decomposition and/or eigenvalues of a matrix already in
Hessenberg form from columns ilo and ihi.

Input Parameters

wantt (global)
If wantt≠ 0, the full Schur form T is required;

If wantt = 0, only eigenvalues are required.

wantz (global)
If wantz≠ 0, the matrix of Schur vectors Z is required;

If wantz = 0, Schur vectors are not required.

n (global) The order of the Hessenberg matrix A (and z if wantz is non-zero).


n≥0.

ilo, ihi (global)


It is assumed that A is already upper quasi-triangular in rows and columns
ihi+1:n, and that A(ilo, ilo-1) = 0 (unless ilo = 1). p?lahqr works
primarily with the Hessenberg submatrix in rows and columns ilo to ihi,
but applies transformations to all of H if wantt is non-zero.
1≤ilo≤max(1,ihi); ihi≤n.

a (global)
Array of size lld_a*LOCc(n). On entry, the upper Hessenberg matrix A.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

iloz, ihiz (global) Specify the rows of the matrix Z to which transformations must be
applied if wantz is non-zero. 1≤iloz≤ilo; ihi≤ihiz≤n.

z (global )
Array. If wantz is non-zero, on entry z must contain the current matrix Z of
transformations accumulated by pdhseqr. If wantz is zero, z is not
referenced.

descz (global and local) array of size dlen_. The array descriptor for the
distributed matrix Z.

work (local)
Workspace array with size lwork.


lwork (local) The size of work. lwork is assumed big enough so that
lwork ≥ 3*n + max(2*max(lld_z, lld_a) + 2*LOCq(n), 7*ceil(n/hbl)/lcm(NPROW, NPCOL)).
If lwork = -1, then work[0] gets set to the above number and the code
returns immediately.

iwork (global and local) array of size ilwork.

ilwork (local) This holds some of the iblk integer arrays.

Output Parameters

a On exit, if wantt is non-zero, A is upper quasi-triangular in rows and


columns ilo:ihi, with any 2-by-2 or larger diagonal blocks not yet in
standard form. If wantt is zero, the contents of A are unspecified on exit.

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

wr, wi (global replicated output)


Arrays of size n each. The real and imaginary parts, respectively, of the
computed eigenvalues ilo to ihi are stored in the corresponding elements
of wr and wi. If two eigenvalues are computed as a complex conjugate pair,
they are stored in consecutive elements of wr and wi, say the i-th and (i
+1)-th, with wi[i-1]> 0 and wi[i] < 0. If wantt is zero , the eigenvalues
are stored in the same order as on the diagonal of the Schur form returned
in A. A may be returned with larger diagonal blocks until the next release.

z On exit z has been updated; transformations are applied only to the


submatrix Z(iloz:ihiz, ilo:ihi).

info (global)
= 0: the execution is successful.
< 0: the parameter number - info is incorrect or inconsistent
> 0: p?lahqr failed to compute all the eigenvalues ilo to ihi in a total of
30*(ihi-ilo+1) iterations; if info = i, elements i+1: ihi of wr and wi
contain the eigenvalues that have been successfully computed.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?trevc
Computes right and/or left eigenvectors of a complex
upper triangular matrix in parallel.

Syntax
void pctrevc (const char* side, const char* howmny, const MKL_INT* select, const
MKL_INT* n, MKL_Complex8* t, const MKL_INT* desct, MKL_Complex8* vl, const MKL_INT*
descvl, MKL_Complex8* vr, const MKL_INT* descvr, const MKL_INT* mm, MKL_INT* m,
MKL_Complex8* work, float* rwork, MKL_INT* info);

void pztrevc (const char* side, const char* howmny, const MKL_INT* select, const
MKL_INT* n, MKL_Complex16* t, const MKL_INT* desct, MKL_Complex16* vl, const MKL_INT*
descvl, MKL_Complex16* vr, const MKL_INT* descvr, const MKL_INT* mm, MKL_INT* m,
MKL_Complex16* work, double* rwork, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
p?trevc computes some or all of the right and/or left eigenvectors of a complex upper triangular matrix T in
parallel.
The right eigenvector x and the left eigenvector y of T corresponding to an eigenvalue w are defined by:
T*x = w*x,
y'*T = w*y'
where y' denotes the conjugate transpose of the vector y.
If all eigenvectors are requested, the routine may either return the matrices X and/or Y of right or left
eigenvectors of T, or the products Q*X and/or Q*Y, where Q is an input unitary matrix. If T was obtained
from the Schur factorization of an original matrix A = Q*T*Q', then Q*X and Q*Y are the matrices of right or
left eigenvectors of A.
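
As a usage illustration, the hedged sketch below calls pztrevc to compute all right eigenvectors of T and back-transform them by the Schur vectors already stored in vr (howmny = 'B'); the process grid, the distributed arrays t and vr, their descriptors desct and descvr, and the local leading dimension lld_t recorded in desct are assumed to have been set up by the caller, and the wrapper itself is illustrative only. Because side = 'R', vl is not referenced, so a valid dummy (here vr/descvr again) is passed in its place, which is an assumption rather than a documented calling convention.

#include <stdlib.h>
#include <mkl_scalapack.h>

/* Hypothetical fragment: t/desct holds the distributed upper triangular T,
   vr/descvr holds the Schur vectors Q on entry; n is the global order and
   lld_t the local leading dimension recorded in desct. */
void all_right_eigenvectors_sketch(MKL_Complex16 *t, const MKL_INT *desct,
                                   MKL_Complex16 *vr, const MKL_INT *descvr,
                                   MKL_INT n, MKL_INT lld_t)
{
    MKL_INT info, m;
    MKL_INT mm = n;                                        /* columns available in vr */
    MKL_INT *select = calloc((size_t)n, sizeof(MKL_INT));  /* not referenced for 'B' */
    MKL_Complex16 *work = malloc(2 * (size_t)lld_t * sizeof(MKL_Complex16));
    double *rwork = malloc((size_t)lld_t * sizeof(double));

    /* vr is overwritten by Q*X, the right eigenvectors of the original matrix;
       vl/descvl are not referenced for side = 'R', so vr/descvr are passed as dummies. */
    pztrevc("R", "B", select, &n, t, desct, vr, descvr, vr, descvr,
            &mm, &m, work, rwork, &info);

    free(select); free(work); free(rwork);
}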

Input Parameters

side (global)
= 'R': compute right eigenvectors only;
= 'L': compute left eigenvectors only;
= 'B': compute both right and left eigenvectors.

howmny (global)
= 'A': compute all right and/or left eigenvectors;
= 'B': compute all right and/or left eigenvectors, and backtransform them
using the input matrices supplied in vr and/or vl;

= 'S': compute selected right and/or left eigenvectors, specified by the


logical array select.

select (global)

Array, size (n)

If howmny = 'S', select specifies the eigenvectors to be computed.

If howmny = 'A' or 'B', select is not referenced. To select the eigenvector


corresponding to the j-th eigenvalue, select[j - 1] must be set to non-
zero.

n (global)
The order of the matrix T. n >= 0.

t (local)
Array, size lld_t*LOCc(n).


The upper triangular matrix T. T is modified, but restored on exit.

desct (global and local)


Array of size dlen_.
The array descriptor for the distributed matrix T.

vl (local)
Array, size (descvl(lld_),mm)

On entry, if side = 'L' or 'B' and howmny = 'B', vl must contain an n-by-n
matrix Q (usually the unitary matrix Q of Schur vectors returned by ?hseqr).

descvl (global and local)


Array of size dlen_.
The array descriptor for the distributed matrix VL.

vr (local)
Array, size descvr(lld_)*mm.

On entry, if side = 'R' or 'B' and howmny = 'B', vr must contain an n-by-n
matrix Q (usually the unitary matrix Q of Schur vectors returned by ?hseqr).

descvr (global and local)


Array of size dlen_.
The array descriptor for the distributed matrix VR.

mm (global)
The number of columns in the arrays vl and/or vr. mm >= m.

work (local)
Array, size ( 2*desct(lld_) )

Additional workspace may be required if p?lattrs is updated to use work.

rwork Array, size ( desct(lld_) )

Output Parameters

t The upper triangular matrix T. T is modified, but restored on exit.

vl On exit, if side = 'L' or 'B', vl contains:

if howmny = 'A', the matrix Y of left eigenvectors of T;

if howmny = 'B', the matrix Q*Y;

if howmny = 'S', the left eigenvectors of T specified by select, stored


consecutively in the columns of vl, in the same order as their eigenvalues.
If side = 'R', vl is not referenced.

vr On exit, if side = 'R' or 'B', vr contains:

if howmny = 'A', the matrix X of right eigenvectors of T;

if howmny = 'B', the matrix Q*X;

if howmny = 'S', the right eigenvectors of T specified by select, stored


consecutively in the columns of vr, in the same order as their eigenvalues.
If side = 'L', vr is not referenced.

m (global)
The number of columns in the arrays vl and/or vr actually used to store
the eigenvectors. If howmny = 'A' or 'B', m is set to n. Each selected
eigenvector occupies one column.

info (global)
= 0: successful exit
< 0: if info = -i, the i-th argument had an illegal value

Application Notes
The algorithm used in this program is basically backward (forward) substitution. Scaling should be used to
make the code robust against possible overflow. But scaling has not yet been implemented in p?lattrs
which is called by this routine to solve the triangular systems. p?lattrs just calls p?trsv.

Each eigenvector is normalized so that the element of largest magnitude has magnitude 1; here the
magnitude of a complex number (x,y) is taken to be |x| + |y|.

Singular Value Decomposition: ScaLAPACK Driver Routines


This section describes ScaLAPACK routines for computing the singular value decomposition (SVD) of a
general m-by-n matrix A (see "Singular Value Decomposition" in the LAPACK chapter).
To find the SVD of a general matrix A, this matrix is first reduced to a bidiagonal matrix B by a unitary
(orthogonal) transformation, and then SVD of the bidiagonal matrix is computed. Note that the SVD of B is
computed using the LAPACK routine ?bdsqr .
Table "Computational Routines for Singular Value Decomposition (SVD)" lists ScaLAPACK computational
routines for performing this decomposition.
Computational Routines for Singular Value Decomposition (SVD)
Operation                          General matrix    Orthogonal/unitary matrix
Reduce A to a bidiagonal matrix    p?gebrd
Multiply matrix after reduction                      p?ormbr/p?unmbr

p?gebrd
Reduces a general matrix to bidiagonal form.

Syntax
void psgebrd (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *d , float *e , float *tauq , float *taup , float *work , MKL_INT
*lwork , MKL_INT *info );
void pdgebrd (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *d , double *e , double *tauq , double *taup , double *work ,
MKL_INT *lwork , MKL_INT *info );


void pcgebrd (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,


MKL_INT *desca , float *d , float *e , MKL_Complex8 *tauq , MKL_Complex8 *taup ,
MKL_Complex8 *work , MKL_INT *lwork , MKL_INT *info );
void pzgebrd (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *d , double *e , MKL_Complex16 *tauq , MKL_Complex16 *taup ,
MKL_Complex16 *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?gebrd function reduces a real/complex general m-by-n distributed matrix sub(A)= A(ia:ia+m-1,
ja:ja+n-1) to upper or lower bidiagonal form B by an orthogonal/unitary transformation:
Q'*sub(A)*P = B.
If m≥n, B is upper bidiagonal; if m < n, B is lower bidiagonal.

Input Parameters

m (global) The number of rows in the distributed matrix sub(A) (m≥0).

n (global) The number of columns in the distributed matrix sub(A) (n≥0).

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).
On entry, this array contains the distributed matrix sub(A).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local)
Workspace array of size lwork.

lwork (local or global) size of work, must be at least:
lwork≥nb*(mpa0 + nqa0 + 1) + nqa0
where nb = mb_a = nb_a,

iroffa = mod(ia-1, nb),


icoffa = mod(ja-1, nb),
iarow = indxg2p(ia, nb, MYROW, rsrc_a, NPROW),
iacol = indxg2p (ja, nb, MYCOL, csrc_a, NPCOL),
mpa0 = numroc(m +iroffa, nb, MYROW, iarow, NPROW),
nqa0 = numroc(n +icoffa, nb, MYCOL, iacol, NPCOL),

NOTE
mod(x,y) is the integer remainder of x/y.

indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.
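
For example, a double-precision caller typically issues the workspace query first and then performs the reduction; in this hedged sketch the distributed matrix a, its descriptor desca, and the output arrays d, e, tauq, taup are assumed to be allocated as described above, and the wrapper itself is illustrative only.

#include <stdlib.h>
#include <mkl_scalapack.h>

/* Hypothetical fragment: a/desca describe an m-by-n distributed matrix whose
   submatrix starts at global indices (ia, ja); d, e, tauq, taup are local
   arrays sized as described above. */
void bidiagonalize_sketch(MKL_INT m, MKL_INT n, double *a,
                          MKL_INT ia, MKL_INT ja, MKL_INT *desca,
                          double *d, double *e, double *tauq, double *taup)
{
    MKL_INT info, lwork = -1;
    double wkopt;

    /* Workspace query: lwork = -1 returns the optimal size in wkopt. */
    pdgebrd(&m, &n, a, &ia, &ja, desca, d, e, tauq, taup, &wkopt, &lwork, &info);

    lwork = (MKL_INT)wkopt;
    double *work = malloc((size_t)lwork * sizeof(double));

    /* Actual reduction: on exit a holds B plus the reflectors defining Q and P. */
    pdgebrd(&m, &n, a, &ia, &ja, desca, d, e, tauq, taup, work, &lwork, &info);
    free(work);
}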

Output Parameters

a On exit, if m≥n, the diagonal and the first superdiagonal of sub(A) are
overwritten with the upper bidiagonal matrix B; the elements below the
diagonal, with the array tauq, represent the orthogonal/unitary matrix Q as
a product of elementary reflectors, and the elements above the first
superdiagonal, with the array taup, represent the orthogonal matrix P as a
product of elementary reflectors. If m < n, the diagonal and the first
subdiagonal are overwritten with the lower bidiagonal matrix B; the
elements below the first subdiagonal, with the array tauq, represent the
orthogonal/unitary matrix Q as a product of elementary reflectors, and the
elements above the diagonal, with the array taup, represent the orthogonal
matrix P as a product of elementary reflectors. See Application Notes below.

d (local)
Array of size LOCc(ja+min(m,n)-1) if m≥n and LOCr(ia+min(m,n)-1)
otherwise. The distributed diagonal elements of the bidiagonal matrix B:
d[i] = A(i+1,i+1), 0 ≤i < size (d).
d is tied to the distributed matrix A.

e (local)
Array of size LOCr(ia+min(m,n)-1) if m≥n; LOCc(ja+min(m,n)-2)
otherwise. The distributed off-diagonal elements of the bidiagonal
distributed matrix B:
If m≥n, e[i] = A(i+1,i+2) for i = 0,1,..., n-2; if m < n, e[i] = A(i+2,i+1)
for i = 0,1,..., m-2. e is tied to the distributed matrix A.

tauq, taup (local)


Arrays of size LOCc(ja+min(m,n)-1) for tauq and LOCr(ia
+min(m,n)-1) for taup. Contain the scalar factors of the elementary
reflectors that represent the orthogonal/unitary matrices Q and P,
respectively. tauq and taup are tied to the distributed matrix A. See
Application Notes below.

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.


Application Notes
The matrices Q and P are represented as products of elementary reflectors:
If m≥n,

Q = H(1)*H(2)*...*H(n), and P = G(1)*G(2)*...*G(n-1).


Each H(i) and G(i) has the form:
H(i) = I - tauq*v*v' and G(i) = I - taup*u*u'
where tauq and taup are real/complex scalars, and v and u are real/complex vectors;
v(1:i-1) = 0, v(i) = 1, and v(i+1:m) is stored on exit in A(ia+i:ia+m-1, ja+i-1);
u(1:i) = 0, u(i+1) = 1, and u(i+2:n) is stored on exit in A(ia+i-1, ja+i+1:ja+n-1);
tauq is stored in tauq[ja+i-2] and taup in taup[ia+i-2].

If m < n,
Q = H(1)*H(2)*...*H(m-1), and P = G(1)* G(2)*...* G(m)
Each H(i) and G(i) has the form:
H(i) = I - tauq*v*v' and G(i) = I - taup*u*u'
where tauq and taup are real/complex scalars, and v and u are real/complex vectors;
v(1:i) = 0, v(i+1) = 1, and v(i+2:m) is stored on exit in A(ia+i:ia+m-1, ja+i-1);
u(1:i-1) = 0, u(i) = 1, and u(i+1:n) is stored on exit in A(ia+i-1, ja+i+1:ja+n-1);
tauq is stored in tauq[ja+i-2] and taup in taup[ia+i-2].

The contents of sub(A) on exit are illustrated by the following examples:

m = 6 and n = 5 (m > n):

  (  d   e   u1  u1  u1 )
  (  v1  d   e   u2  u2 )
  (  v1  v2  d   e   u3 )
  (  v1  v2  v3  d   e  )
  (  v1  v2  v3  v4  d  )
  (  v1  v2  v3  v4  v5 )

m = 5 and n = 6 (m < n):

  (  d   u1  u1  u1  u1  u1 )
  (  e   d   u2  u2  u2  u2 )
  (  v1  e   d   u3  u3  u3 )
  (  v1  v2  e   d   u4  u4 )
  (  v1  v2  v3  e   d   u5 )

where d and e denote diagonal and off-diagonal elements of B, vi denotes an element of the vector defining
H(i), and ui an element of the vector defining G(i).

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?ormbr
Multiplies a general matrix by one of the orthogonal
matrices from a reduction to bidiagonal form
determined by p?gebrd.

Syntax
void psormbr (char *vect , char *side , char *trans , MKL_INT *m , MKL_INT *n ,
MKL_INT *k , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *tau ,
float *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , float *work , MKL_INT *lwork ,
MKL_INT *info );
void pdormbr (char *vect , char *side , char *trans , MKL_INT *m , MKL_INT *n ,
MKL_INT *k , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *tau ,
double *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , double *work , MKL_INT
*lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
If vect = 'Q', the p?ormbr function overwrites the general real distributed m-by-n matrix sub(C) =
C(ic:ic+m-1, jc:jc+n-1) with

                 side = 'L'       side = 'R'
trans = 'N':     Q*sub(C)         sub(C)*Q
trans = 'T':     QT*sub(C)        sub(C)*QT

If vect = 'P', the function overwrites sub(C) with

                 side = 'L'       side = 'R'
trans = 'N':     P*sub(C)         sub(C)*P
trans = 'T':     PT*sub(C)        sub(C)*PT

Here Q and PT are the orthogonal distributed matrices determined by p?gebrd when reducing a real
distributed matrix A(ia:*, ja:*) to bidiagonal form: A(ia:*, ja:*) = Q*B*PT. Q and PT are defined as
products of elementary reflectors H(i) and G(i) respectively.
Let nq = m if side = 'L' and nq = n if side = 'R'. Therefore nq is the order of the orthogonal matrix Q or
PT that is applied.
If vect = 'Q', A(ia:*, ja:*) is assumed to have been an nq-by-k matrix:

If nq≥k, Q = H(1) H(2)...H(k);

If nq < k, Q = H(1) H(2)...H(nq-1).

If vect = 'P', A(ia:*, ja:*) is assumed to have been a k-by-nq matrix:

If k < nq, P = G(1) G(2)...G(k);


If k≥nq, P = G(1) G(2)...G(nq-1).
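
For instance, after pdgebrd has reduced an m-by-n matrix A with m ≥ n, the hedged sketch below applies QT from the left to an m-by-nrhs matrix C (vect = 'Q', side = 'L', trans = 'T'), using the workspace query; the distributed arrays, descriptors, and tauq are assumed to come from that earlier reduction, and the wrapper itself is illustrative only.

#include <stdlib.h>
#include <mkl_scalapack.h>

/* Hypothetical fragment: a/desca and tauq are outputs of a previous pdgebrd
   call on an m-by-n matrix with m >= n; c/descc holds an m-by-nrhs
   distributed matrix C. All descriptors are assumed to be initialized. */
void apply_qt_to_c_sketch(MKL_INT m, MKL_INT n, MKL_INT nrhs,
                          double *a, MKL_INT ia, MKL_INT ja, MKL_INT *desca,
                          double *tauq,
                          double *c, MKL_INT ic, MKL_INT jc, MKL_INT *descc)
{
    MKL_INT info, lwork = -1;
    MKL_INT k = n;          /* columns of the matrix reduced by pdgebrd */
    double wkopt;

    /* Workspace query (lwork = -1), then sub(C) := Q'*sub(C). */
    pdormbr("Q", "L", "T", &m, &nrhs, &k, a, &ia, &ja, desca, tauq,
            c, &ic, &jc, descc, &wkopt, &lwork, &info);
    lwork = (MKL_INT)wkopt;
    double *work = malloc((size_t)lwork * sizeof(double));

    pdormbr("Q", "L", "T", &m, &nrhs, &k, a, &ia, &ja, desca, tauq,
            c, &ic, &jc, descc, work, &lwork, &info);
    free(work);
}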

Input Parameters

vect (global)
If vect ='Q', then Q or QT is applied.

If vect ='P', then P or PT is applied.

side (global)
If side ='L', then Q or QT, P or PT is applied from the left.

If side ='R', then Q or QT, P or PT is applied from the right.

trans (global)
If trans = 'N', no transpose, Q or P is applied.

If trans = 'T', then QT or PT is applied.

m (global) The number of rows in the distributed matrix sub (C).

n (global) The number of columns in the distributed matrix sub (C).

k (global)
If vect = 'Q', the number of columns in the original distributed matrix
reduced by p?gebrd;

If vect = 'P', the number of rows in the original distributed matrix


reduced by p?gebrd.

Constraints: k≥ 0.

a (local)
Pointer into the local memory to an array of size lld_a * LOCc(ja
+min(nq,k)-1) if vect='Q', and lld_a * LOCc(ja+nq-1) if vect = 'P'.

nq = m if side = 'L', and nq = n otherwise.


The vectors that define the elementary reflectors H(i) and G(i), whose
products determine the matrices Q and P, as returned by p?gebrd.

If vect = 'Q', lld_a≥max(1, LOCr(ia+nq-1));

If vect = 'P', lld_a≥max(1, LOCr(ia+min(nq, k)-1)).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ja+min(nq, k)-1), if vect = 'Q', and LOCr(ia
+min(nq, k)-1), if vect = 'P'.
tau[i] must contain the scalar factor of the elementary reflector H(i+1) or
G(i+1), which determines Q or P, as returned by p?gebrd in its array argument
tauq or taup. tau is tied to the distributed matrix A.

c (local)
Pointer into the local memory to an array of size lld_c*LOCc(jc+n-1).

Contains the local pieces of the distributed matrix sub (C).

ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local)
Workspace array of size lwork.

lwork (local or global) size of work, must be at least:

If side = 'L'
  nq = m;
  if ((vect = 'Q' and nq≥k) or (vect is not equal to 'Q' and nq>k)),
    iaa=ia; jaa=ja; mi=m; ni=n; icc=ic; jcc=jc;
  else
    iaa=ia+1; jaa=ja; mi=m-1; ni=n; icc=ic+1; jcc=jc;
  end if
else if side = 'R'
  nq = n;
  if ((vect = 'Q' and nq≥k) or (vect is not equal to 'Q' and nq>k)),
    iaa=ia; jaa=ja; mi=m; ni=n; icc=ic; jcc=jc;
  else
    iaa=ia; jaa=ja+1; mi=m; ni=n-1; icc=ic; jcc=jc+1;
  end if
end if
If vect = 'Q',
  if side = 'L',
    lwork≥max((nb_a*(nb_a-1))/2, (nqc0 + mpc0)*nb_a) + nb_a*nb_a
  else if side = 'R',
    lwork≥max((nb_a*(nb_a-1))/2, (nqc0 + max(npa0 + numroc(numroc(ni+icoffc, nb_a, 0, 0, NPCOL), nb_a, 0, 0, lcmq), mpc0))*nb_a) + nb_a*nb_a
  end if
else if vect is not equal to 'Q',
  if side = 'L',
    lwork≥max((mb_a*(mb_a-1))/2, (mpc0 + max(mqa0 + numroc(numroc(mi+iroffc, mb_a, 0, 0, NPROW), mb_a, 0, 0, lcmp), nqc0))*mb_a) + mb_a*mb_a
  else if side = 'R',
    lwork≥max((mb_a*(mb_a-1))/2, (mpc0 + nqc0)*mb_a) + mb_a*mb_a
  end if
end if
where lcmp = lcm/NPROW, lcmq = lcm/NPCOL, with lcm =
ilcm(NPROW, NPCOL),
iroffa = mod(iaa-1, mb_a),
icoffa = mod(jaa-1, nb_a),
iarow = indxg2p(iaa, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(jaa, nb_a, MYCOL, csrc_a, NPCOL),
mqa0 = numroc(mi+icoffa, nb_a, MYCOL, iacol, NPCOL),
npa0 = numroc(ni+iroffa, mb_a, MYROW, iarow, NPROW),
iroffc = mod(icc-1, mb_c),
icoffc = mod(jcc-1, nb_c),
icrow = indxg2p(icc, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jcc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0 = numroc(mi+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0 = numroc(ni+icoffc, nb_c, MYCOL, iccol, NPCOL),

NOTE
mod(x,y) is the integer remainder of x/y.

indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

c On exit, if vect='Q', sub(C) is overwritten by Q*sub(C), or Q'*sub(C), or


sub(C)*Q', or sub(C)*Q; if vect='P', sub(C) is overwritten by P*sub(C),
or P'*sub(C), or sub(C)*P, or sub(C)*P'.

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
= 0: the execution is successful.

< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?unmbr
Multiplies a general matrix by one of the unitary
transformation matrices from a reduction to bidiagonal
form determined by p?gebrd.

Syntax
void pcunmbr (char *vect , char *side , char *trans , MKL_INT *m , MKL_INT *n ,
MKL_INT *k , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca ,
MKL_Complex8 *tau , MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc ,
MKL_Complex8 *work , MKL_INT *lwork , MKL_INT *info );
void pzunmbr (char *vect , char *side , char *trans , MKL_INT *m , MKL_INT *n ,
MKL_INT *k , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca ,
MKL_Complex16 *tau , MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc ,
MKL_Complex16 *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
If vect = 'Q', the p?unmbr function overwrites the general complex distributed m-by-n matrix sub(C) =
C(ic:ic+m-1, jc:jc+n-1) with

                 side = 'L'       side = 'R'
trans = 'N':     Q*sub(C)         sub(C)*Q
trans = 'C':     QH*sub(C)        sub(C)*QH

If vect = 'P', the function overwrites sub(C) with

                 side = 'L'       side = 'R'
trans = 'N':     P*sub(C)         sub(C)*P
trans = 'C':     PH*sub(C)        sub(C)*PH

Here Q and PH are the unitary distributed matrices determined by p?gebrd when reducing a complex
distributed matrix A(ia:*, ja:*) to bidiagonal form: A(ia:*, ja:*) = Q*B*PH.

Q and PH are defined as products of elementary reflectors H(i) and G(i) respectively.
Let nq = m if side = 'L' and nq = n if side = 'R'. Therefore nq is the order of the unitary matrix Q or PH
that is applied.
If vect = 'Q', A(ia:*, ja:*) is assumed to have been an nq-by-k matrix:

If nq≥k, Q = H(1) H(2)... H(k);

If nq < k, Q = H(1) H(2)... H(nq-1).

If vect = 'P', A(ia:*, ja:*) is assumed to have been a k-by-nq matrix:

If k < nq, P = G(1) G(2)... G(k);


If k≥nq, P = G(1) G(2)... G(nq-1).

Input Parameters

vect (global)
If vect ='Q', then Q or QH is applied.

If vect ='P', then P or PH is applied.

side (global)
If side ='L', then Q or QH, P or PH is applied from the left.

If side ='R', then Q or QH, P or PH is applied from the right.

trans (global)
If trans = 'N', no transpose, Q or P is applied.

If trans = 'C', conjugate transpose, QH or PH is applied.

m (global) The number of rows in the distributed matrix sub (C) m≥0.

n (global) The number of columns in the distributed matrix sub (C) n≥0.

k (global)
If vect = 'Q', the number of columns in the original distributed matrix
reduced by p?gebrd;

If vect = 'P', the number of rows in the original distributed matrix


reduced by p?gebrd.

Constraints: k≥ 0.

a (local)
Pointer into the local memory to an array of size lld_a * LOCc(ja
+min(nq,k)-1) if vect='Q', and lld_a * LOCc(ja+nq-1) if vect = 'P'.

nq = m if side = 'L', and nq = n otherwise.


The vectors that define the elementary reflectors H(i) and G(i), whose
products determine the matrices Q and P, as returned by p?gebrd.

If vect = 'Q', lld_a≥max(1, LOCr(ia+nq-1));

If vect = 'P', lld_a≥max(1, LOCr(ia+min(nq, k)-1)).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ja+min(nq, k)-1), if vect = 'Q', and LOCr(ia
+min(nq, k)-1), if vect = 'P'.
tau[i] must contain the scalar factor of the elementary reflector H(i+1) or
G (i+1), which determines Q or P, as returned by p?gebrd in its array
argument tauq or taup. tau is tied to the distributed matrix A.

c (local)
Pointer into the local memory to an array of size lld_c*LOCc(jc+n-1).

Contains the local pieces of the distributed matrix sub (C).

ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the submatrix C, respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local)
Workspace array of size lwork.

lwork (local or global) size of work, must be at least:

If side = 'L'
  nq = m;
  if ((vect = 'Q' and nq≥k) or (vect is not equal to 'Q' and nq>k)),
    iaa=ia; jaa=ja; mi=m; ni=n; icc=ic; jcc=jc;
  else
    iaa=ia+1; jaa=ja; mi=m-1; ni=n; icc=ic+1; jcc=jc;
  end if
else if side = 'R'
  nq = n;
  if ((vect = 'Q' and nq≥k) or (vect is not equal to 'Q' and nq>k)),
    iaa=ia; jaa=ja; mi=m; ni=n; icc=ic; jcc=jc;
  else
    iaa=ia; jaa=ja+1; mi=m; ni=n-1; icc=ic; jcc=jc+1;
  end if
end if
If vect = 'Q',
  if side = 'L',
    lwork≥max((nb_a*(nb_a-1))/2, (nqc0 + mpc0)*nb_a) + nb_a*nb_a
  else if side = 'R',
    lwork≥max((nb_a*(nb_a-1))/2, (nqc0 + max(npa0 + numroc(numroc(ni+icoffc, nb_a, 0, 0, NPCOL), nb_a, 0, 0, lcmq), mpc0))*nb_a) + nb_a*nb_a
  end if
else if vect is not equal to 'Q',
  if side = 'L',
    lwork≥max((mb_a*(mb_a-1))/2, (mpc0 + max(mqa0 + numroc(numroc(mi+iroffc, mb_a, 0, 0, NPROW), mb_a, 0, 0, lcmp), nqc0))*mb_a) + mb_a*mb_a
  else if side = 'R',
    lwork≥max((mb_a*(mb_a-1))/2, (mpc0 + nqc0)*mb_a) + mb_a*mb_a
  end if
end if
where lcmp = lcm/NPROW, lcmq = lcm/NPCOL, with lcm =
ilcm(NPROW, NPCOL),
iroffa = mod(iaa-1, mb_a),
icoffa = mod(jaa-1, nb_a),
iarow = indxg2p(iaa, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(jaa, nb_a, MYCOL, csrc_a, NPCOL),
mqa0 = numroc(mi+icoffa, nb_a, MYCOL, iacol, NPCOL),

npa0 = numroc(ni+iroffa, mb_a, MYROW, iarow, NPROW),

iroffc = mod(icc-1, mb_c),


icoffc = mod(jcc-1, nb_c),
icrow = indxg2p(icc, mb_c, MYROW, rsrc_c, NPROW),
iccol = indxg2p(jcc, nb_c, MYCOL, csrc_c, NPCOL),
mpc0 = numroc(mi+iroffc, mb_c, MYROW, icrow, NPROW),
nqc0 = numroc(ni+icoffc, nb_c, MYCOL, iccol, NPCOL),

NOTE
mod(x,y) is the integer remainder of x/y.

indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL, NPROW
and NPCOL can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

c On exit, if vect='Q', sub(C) is overwritten by Q*sub(C), or Q'*sub(C), or


sub(C)*Q', or sub(C)*Q; if vect='P', sub(C) is overwritten by P*sub(C), or
P'*sub(C), or sub(C)*P, or sub(C)*P'.

work[0] On exit work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry, indexed j - 1, had
an illegal value, then info = -(i*100+j); if the i-th argument is a scalar
and had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

Generalized Symmetric-Definite Eigenvalue Problems: ScaLAPACK Computational


Routines
This section describes ScaLAPACK routines that allow you to reduce a generalized symmetric-definite
eigenvalue problem (see Generalized Symmetric-Definite Eigenvalue Problems in the LAPACK chapters) to
the standard symmetric eigenvalue problem C*y = λ*y, which you can solve by calling ScaLAPACK routines
described earlier in this chapter (see Symmetric Eigenproblems).
Table "Computational Routines for Reducing Generalized Eigenproblems to Standard Problems" lists these
routines.
Computational Routines for Reducing Generalized Eigenproblems to Standard Problems
Operation Real symmetric matrices Complex Hermitian matrices
Reduce to standard problems p?sygst p?hegst

p?sygst
Reduces a real symmetric-definite generalized
eigenvalue problem to the standard form.

Syntax
void pssygst (MKL_INT *ibtype , char *uplo , MKL_INT *n , float *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb ,
float *scale , MKL_INT *info );
void pdsygst (MKL_INT *ibtype , char *uplo , MKL_INT *n , double *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , double *scale , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?sygst function reduces real symmetric-definite generalized eigenproblems to the standard form.

In the following sub(A) denotes A(ia:ia+n-1, ja:ja+n-1) and sub(B) denotes B(ib:ib+n-1, jb:jb+n-1).

If ibtype = 1, the problem is

sub(A)*x = λ*sub(B)*x,
and sub(A) is overwritten by inv(UT)*sub(A)*inv(U), or inv(L)*sub(A)*inv(LT).
If ibtype = 2 or 3, the problem is

sub(A)*sub(B)*x = λ*x, or sub(B)*sub(A)*x = λ*x,


and sub(A) is overwritten by U*sub(A)*UT, or LT*sub(A)*L.
sub(B) must have been previously factorized as UT*U or L*LT by p?potrf.
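
For example, for ibtype = 1 with the upper triangles stored, a double-precision caller first factors sub(B) with pdpotrf and then calls pdsygst; the sketch below is hedged: it assumes the distributed matrices and descriptors are already set up, and it assumes the usual ScaLAPACK argument order for pdpotrf (uplo, n, b, ib, jb, descb, info), which is not reproduced in this section.

#include <mkl_scalapack.h>

/* Hypothetical fragment: a/desca and b/descb describe n-by-n distributed
   matrices; the upper triangles hold A and the positive-definite B. */
void reduce_to_standard_sketch(MKL_INT n,
                               double *a, MKL_INT ia, MKL_INT ja, MKL_INT *desca,
                               double *b, MKL_INT ib, MKL_INT jb, MKL_INT *descb)
{
    MKL_INT ibtype = 1, info;
    double scale;

    /* Cholesky factorization sub(B) = U'*U (uplo = 'U'); argument order is
       assumed, see the lead-in above. */
    pdpotrf("U", &n, b, &ib, &jb, descb, &info);

    /* Overwrite sub(A) with inv(U')*sub(A)*inv(U). */
    pdsygst(&ibtype, "U", &n, a, &ia, &ja, desca, b, &ib, &jb, descb,
            &scale, &info);
}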

Input Parameters

ibtype (global) Must be 1 or 2 or 3.

If ibtype = 1, compute inv(UT)*sub(A)*inv(U), or inv(L)*sub(A)*inv(LT);

If ibtype = 2 or 3, compute U*sub(A)*UT, or LT*sub(A)*L.

uplo (global) Must be 'U' or 'L'.

If uplo = 'U', the upper triangle of sub(A) is stored and sub (B) is
factored as UT*U.
If uplo = 'L', the lower triangle of sub(A) is stored and sub (B) is
factored as L*LT.

n (global) The order of the matrices sub (A) and sub (B) (n≥ 0).

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1). On
entry, the array contains the local pieces of the n-by-n symmetric
distributed matrix sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular part of the matrix, and its strictly lower triangular part
is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular part of the matrix, and its strictly upper triangular part
is not referenced.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

b (local)
Pointer into the local memory to an array of size lld_b*LOCc(jb+n-1). On
entry, the array contains the local pieces of the triangular factor from the
Cholesky factorization of sub (B) as returned by p?potrf.

ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the submatrix B, respectively.

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.

Output Parameters

a On exit, if info = 0, the transformed matrix, stored in the same format as


sub(A).

scale (global)
Amount by which the eigenvalues should be scaled to compensate for the
scaling performed in this function. At present, scale is always returned as
1.0; it is returned here to allow for future enhancement.

info (global)

If info = 0, the execution is successful. If info < 0, if the i-th argument
is an array and the j-th entry, indexed j - 1, had an illegal value, then info
= -(i*100+j); if the i-th argument is a scalar and had an illegal value, then
info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?hegst
Reduces a Hermitian positive-definite generalized
eigenvalue problem to the standard form.

Syntax
void pchegst (MKL_INT *ibtype , char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , float *scale , MKL_INT *info );
void pzhegst (MKL_INT *ibtype , char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , double *scale , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?hegst function reduces complex Hermitian positive-definite generalized eigenproblems to the
standard form.
In the following sub(A) denotes A(ia:ia+n-1, ja:ja+n-1) and sub(B) denotes B(ib:ib+n-1, jb:jb+n-1).

If ibtype = 1, the problem is

sub(A)*x = λ*sub(B)*x,
and sub(A) is overwritten by inv(UH)*sub(A)*inv(U), or inv(L)*sub(A)*inv(LH).
If ibtype = 2 or 3, the problem is

sub(A)*sub(B)*x = λ*x, or sub(B)*sub(A)*x = λ*x,


and sub(A) is overwritten by U*sub(A)*UH, or LH*sub(A)*L.
sub(B) must have been previously factorized as UH*U or L*LH by p?potrf.

Input Parameters

ibtype (global) Must be 1 or 2 or 3.


If ibtype = 1, compute inv(UH)*sub(A)*inv(U), or inv(L)*sub(A)*inv(LH);

If ibtype = 2 or 3, compute U*sub(A)*UH, or LH*sub(A)*L.

uplo (global) Must be 'U' or 'L'.

If uplo = 'U', the upper triangle of sub(A) is stored and sub (B) is
factored as UH*U.


If uplo = 'L', the lower triangle of sub(A) is stored and sub (B) is
factored as L*LH.

n (global) The order of the matrices sub (A) and sub (B) (n≥0).

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1). On
entry, the array contains the local pieces of the n-by-n Hermitian distributed
matrix sub(A). If uplo = 'U', the leading n-by-n upper triangular part of
sub(A) contains the upper triangular part of the matrix, and its strictly
lower triangular part is not referenced. If uplo = 'L', the leading n-by-n
lower triangular part of sub(A) contains the lower triangular part of the
matrix, and its strictly upper triangular part is not referenced.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

b (local)
Pointer into the local memory to an array of size lld_b*LOCc(jb+n-1). On
entry, the array contains the local pieces of the triangular factor from the
Cholesky factorization of sub (B) as returned by p?potrf.

ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the submatrix B, respectively.

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.

Output Parameters

a On exit, if info = 0, the transformed matrix, stored in the same format as


sub(A).

scale (global)
Amount by which the eigenvalues should be scaled to compensate for the
scaling performed in this function. At present, scale is always returned as
1.0; it is returned here to allow for future enhancement.

info (global)
If info = 0, the execution is successful. If info <0, if the i-th argument is
an array and the j-th entry, indexed j - 1, had an illegal value, then info =
-(i*100+j); if the i-th argument is a scalar and had an illegal value, then
info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

ScaLAPACK Driver Routines
Table "ScaLAPACK Driver Routines" lists ScaLAPACK driver routines available for solving systems of linear
equations, linear least-squares problems, standard eigenvalue and singular value problems, and generalized
symmetric definite eigenproblems.
ScaLAPACK Driver Routines
Type of Problem                   Matrix type, storage scheme                          Driver
Linear equations                  general (partial pivoting)                           p?gesv (simple driver) / p?gesvx (expert driver)
                                  general band (partial pivoting)                      p?gbsv (simple driver)
                                  general band (no pivoting)                           p?dbsv (simple driver)
                                  general tridiagonal (no pivoting)                    p?dtsv (simple driver)
                                  symmetric/Hermitian positive-definite                p?posv (simple driver) / p?posvx (expert driver)
                                  symmetric/Hermitian positive-definite, band          p?pbsv (simple driver)
                                  symmetric/Hermitian positive-definite, tridiagonal   p?ptsv (simple driver)
Linear least squares problem      general m-by-n                                       p?gels
Symmetric eigenvalue problem      symmetric/Hermitian                                  p?syev / p?heev (simple driver);
                                                                                       p?syevd / p?heevd (simple driver with a divide and conquer algorithm);
                                                                                       p?syevx / p?heevx (expert driver);
                                                                                       p?syevr / p?heevr (simple driver with MRRR algorithm)
Singular value decomposition      general m-by-n                                       p?gesvd
Generalized symmetric definite    symmetric/Hermitian, one matrix also                 p?sygvx / p?hegvx (expert driver)
eigenvalue problem                positive-definite

p?gesv
Computes the solution to the system of linear
equations with a square distributed matrix and
multiple right-hand sides.

Syntax
void psgesv (MKL_INT *n , MKL_INT *nrhs , float *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *ipiv , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , MKL_INT *info );
void pdgesv (MKL_INT *n , MKL_INT *nrhs , double *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *ipiv , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , MKL_INT *info );
void pcgesv (MKL_INT *n , MKL_INT *nrhs , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *ipiv , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , MKL_INT *info );
void pzgesv (MKL_INT *n , MKL_INT *nrhs , MKL_Complex16 *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , MKL_INT *ipiv , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , MKL_INT *info );

Include Files
• mkl_scalapack.h


Description
The p?gesv function computes the solution to a real or complex system of linear equations sub(A)*X =
sub(B), where sub(A) = A(ia:ia+n-1, ja:ja+n-1) is an n-by-n distributed matrix and X and sub(B) =
B(ib:ib+n-1, jb:jb+nrhs-1) are n-by-nrhs distributed matrices.
The LU decomposition with partial pivoting and row interchanges is used to factor sub(A) as sub(A) =
P*L*U, where P is a permutation matrix, L is unit lower triangular, and U is upper triangular. L and U are
stored in sub(A). The factored form of sub(A) is then used to solve the system of equations sub(A)*X =
sub(B).
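
A minimal double-precision call then looks like the sketch below; it assumes the BLACS grid is initialized, a/desca and b/descb already hold the distributed matrices, and ipiv has been allocated with at least LOCr(m_a) + mb_a elements as described under Output Parameters (the wrapper itself is illustrative).

#include <stdio.h>
#include <mkl_scalapack.h>

/* Hypothetical fragment: a/desca holds the n-by-n coefficient matrix,
   b/descb the n-by-nrhs right-hand sides, ipiv a local pivot array of at
   least LOCr(m_a) + mb_a elements. */
void gesv_sketch(MKL_INT n, MKL_INT nrhs,
                 double *a, MKL_INT ia, MKL_INT ja, MKL_INT *desca, MKL_INT *ipiv,
                 double *b, MKL_INT ib, MKL_INT jb, MKL_INT *descb)
{
    MKL_INT info;

    /* On exit a holds L and U, b is overwritten by the solution X. */
    pdgesv(&n, &nrhs, a, &ia, &ja, desca, ipiv, b, &ib, &jb, descb, &info);

    if (info > 0)
        printf("U(%lld,%lld) is exactly zero: sub(A) is singular\n",
               (long long)(ia + info - 1), (long long)(ja + info - 1));
}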

Input Parameters

n (global) The number of rows and columns to be operated on, that is, the
order of the distributed submatrix sub(A) (n≥ 0).

nrhs (global) The number of right hand sides, that is, the number of columns of
the distributed submatrices B and X(nrhs≥ 0).

a, b (local)
Pointers into the local memory to arrays of local size a: lld_a*LOCc(ja
+n-1) and b: lld_b*LOCc(jb+nrhs-1), respectively.
On entry, the array a contains the local pieces of the n-by-n distributed
matrix sub(A) to be factored.
On entry, the array b contains the right hand side distributed matrix sub(B).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of sub(B), respectively.

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.

Output Parameters

a Overwritten by the factors L and U from the factorization sub(A) = P*L*U;
the unit diagonal elements of L are not stored.

b Overwritten by the solution distributed matrix X.

ipiv (local) Array of size LOCr(m_a)+mb_a. This array contains the pivoting
information. The (local) row i of the matrix was interchanged with the
(global) row ipiv[i - 1].

This array is tied to the distributed matrix A.

info (global) If info=0, the execution is successful.

info < 0:

If the i-th argument is an array and the j-th entry had an illegal value, then
info = -(i*100+j); if the i-th argument is a scalar and had an illegal
value, then info = -i.

info> 0:
If info = k, U(ia+k-1,ja+k-1) is exactly zero. The factorization has been
completed, but the factor U is exactly singular, so the solution could not be
computed.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?gesvx
Uses the LU factorization to compute the solution to
the system of linear equations with a square matrix A
and multiple right-hand sides, and provides error
bounds on the solution.

Syntax
void psgesvx (char *fact , char *trans , MKL_INT *n , MKL_INT *nrhs , float *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *af , MKL_INT *iaf , MKL_INT *jaf ,
MKL_INT *descaf , MKL_INT *ipiv , char *equed , float *r , float *c , float *b ,
MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , float *x , MKL_INT *ix , MKL_INT *jx ,
MKL_INT *descx , float *rcond , float *ferr , float *berr , float *work , MKL_INT
*lwork , MKL_INT *iwork , MKL_INT *liwork , MKL_INT *info );
void pdgesvx (char *fact , char *trans , MKL_INT *n , MKL_INT *nrhs , double *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *af , MKL_INT *iaf , MKL_INT
*jaf , MKL_INT *descaf , MKL_INT *ipiv , char *equed , double *r , double *c , double
*b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , double *x , MKL_INT *ix , MKL_INT
*jx , MKL_INT *descx , double *rcond , double *ferr , double *berr , double *work ,
MKL_INT *lwork , MKL_INT *iwork , MKL_INT *liwork , MKL_INT *info );
void pcgesvx (char *fact , char *trans , MKL_INT *n , MKL_INT *nrhs , MKL_Complex8 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *af , MKL_INT *iaf , MKL_INT
*jaf , MKL_INT *descaf , MKL_INT *ipiv , char *equed , float *r , float *c ,
MKL_Complex8 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_Complex8 *x ,
MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , float *rcond , float *ferr , float
*berr , MKL_Complex8 *work , MKL_INT *lwork , float *rwork , MKL_INT *lrwork , MKL_INT
*info );
void pzgesvx (char *fact , char *trans , MKL_INT *n , MKL_INT *nrhs , MKL_Complex16
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *af , MKL_INT *iaf ,
MKL_INT *jaf , MKL_INT *descaf , MKL_INT *ipiv , char *equed , double *r , double *c ,
MKL_Complex16 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_Complex16 *x ,
MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , double *rcond , double *ferr , double
*berr , MKL_Complex16 *work , MKL_INT *lwork , double *rwork , MKL_INT *lrwork ,
MKL_INT *info );

Include Files
• mkl_scalapack.h


Description
The p?gesvx function uses the LU factorization to compute the solution to a real or complex system of linear
equations AX = B, where A denotes the n-by-n submatrix A(ia:ia+n-1, ja:ja+n-1), B denotes the n-by-
nrhs submatrix B(ib:ib+n-1, jb:jb+nrhs-1) and X denotes the n-by-nrhs submatrix X(ix:ix+n-1,
jx:jx+nrhs-1).
Error bounds on the solution and a condition estimate are also provided.
In the following description, af stands for the subarray of af from row iaf and column jaf to row iaf+n-1 and
column jaf+n-1.
The function p?gesvx performs the following steps:

1. If fact = 'E', real scaling factors R and C are computed to equilibrate the system:

trans = 'N': diag(R)*A*diag(C)*inv(diag(C))*X = diag(R)*B
trans = 'T': (diag(R)*A*diag(C))T*inv(diag(R))*X = diag(C)*B
trans = 'C': (diag(R)*A*diag(C))H*inv(diag(R))*X = diag(C)*B
(An element-wise illustration of this scaling is given after this list.)
Whether or not the system will be equilibrated depends on the scaling of the matrix A, but if
equilibration is used, A is overwritten by diag(R)*A*diag(C) and B by diag(R)*B (if trans='N') or
diag(c)*B (if trans = 'T' or 'C').
2. If fact = 'N' or 'E', the LU decomposition is used to factor the matrix A (after equilibration if fact
= 'E') as A = PLU, where P is a permutation matrix, L is a unit lower triangular matrix, and U is
upper triangular.
3. The factored form of A is used to estimate the condition number of the matrix A. If the reciprocal of the
condition number is less than relative machine precision, steps 4 - 6 are skipped.
4. The system of equations is solved for X using the factored form of A.
5. Iterative refinement is applied to improve the computed solution matrix and calculate error bounds and
backward error estimates for it.
6. If equilibration was used, the matrix X is premultiplied by diag(C) (if trans = 'N') or diag(R) (if
trans = 'T' or 'C') so that it solves the original system before equilibration.
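
To make step 1 concrete, the scaling A := diag(R)*A*diag(C) applied when equed = 'B' amounts to the element-wise update A(i,j) := r[i]*A(i,j)*c[j]; the small sketch below shows this on an ordinary column-major local block and is purely an illustration of the formula, not a call into the library (inside p?gesvx the scaling is applied to the distributed matrix).

#include <mkl_types.h>   /* MKL_INT */

/* Illustration only: scale an mloc-by-nloc column-major block with leading
   dimension lda by row factors r[] and column factors c[], i.e.
   A(i,j) := r[i] * A(i,j) * c[j]. */
static void equilibrate_block(double *a, MKL_INT mloc, MKL_INT nloc, MKL_INT lda,
                              const double *r, const double *c)
{
    for (MKL_INT j = 0; j < nloc; ++j)
        for (MKL_INT i = 0; i < mloc; ++i)
            a[i + j * lda] *= r[i] * c[j];
}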

Input Parameters

fact (global) Must be 'F', 'N', or 'E'.

Specifies whether or not the factored form of the matrix A is supplied on


entry, and if not, whether the matrix A should be equilibrated before it is
factored.
If fact = 'F' then, on entry, af and ipiv contain the factored form of A. If
equed is not 'N', the matrix A has been equilibrated with scaling factors
given by r and c. Arrays a, af, and ipiv are not modified.
If fact = 'N', the matrix A is copied to af and factored.

If fact = 'E', the matrix A is equilibrated if necessary, then copied to af


and factored.

trans (global) Must be 'N', 'T', or 'C'.

Specifies the form of the system of equations:


If trans = 'N', the system has the form A*X = B (No transpose);

If trans = 'T', the system has the form AT*X = B (Transpose);

If trans = 'C', the system has the form AH*X = B (Conjugate transpose);

n (global) The number of linear equations; the order of the submatrix A(n≥
0).

nrhs (global) The number of right hand sides; the number of columns of the
distributed submatrices B and X(nrhs≥ 0).

a, af, b, work (local)


Pointers into the local memory to arrays of local size a: lld_a*LOCc(ja
+n-1), af: lld_af*LOCc(ja+n-1), b: lld_b*LOCc(jb+nrhs-1), work:
lwork.
The array a contains the matrix A. If fact = 'F' and equed is not 'N',
then A must have been equilibrated by the scaling factors in r and/or c.
The array af is an input argument if fact = 'F'. In this case it contains on
entry the factored form of the matrix A, that is, the factors L and U from
the factorization A = P*L*U as computed by p?getrf. If equed is not 'N',
then af is the factored form of the equilibrated matrix A.
The array b contains on entry the matrix B whose columns are the right-
hand sides for the systems of equations.
work is a workspace array. The size of work is (lwork).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A(ia:ia+n-1, ja:ja
+n-1), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

iaf, jaf (global) The row and column indices in the global matrix AF indicating the
first row and the first column of the subarray af, respectively.

descaf (global and local) array of size dlen_. The array descriptor for the
distributed matrix AF.

ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the submatrix B(ib:ib+n-1, jb:jb
+nrhs-1), respectively.

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.

ipiv (local) Array of size LOCr(m_a)+mb_a.

The array ipiv is an input argument if fact = 'F' .

On entry, it contains the pivot indices from the factorization A = P*L*U as


computed by p?getrf; (local) row i of the matrix was interchanged with
the (global) row ipiv[i - 1].

This array must be aligned with A(ia:ia+n-1, *).

equed (global) Must be 'N', 'R', 'C', or 'B'. equed is an input argument if fact
= 'F' . It specifies the form of equilibration that was done:

If equed = 'N', no equilibration was done (always true if fact = 'N');


If equed = 'R', row equilibration was done, that is, A has been
premultiplied by diag(r);
If equed = 'C', column equilibration was done, that is, A has been
postmultiplied by diag(c);
If equed = 'B', both row and column equilibration was done; A has been
replaced by diag(r)*A*diag(c).

r, c (local)
Arrays of size LOCr(m_a) and LOCc(n_a), respectively.

The array r contains the row scale factors for A, and the array c contains
the column scale factors for A. These arrays are input arguments if fact =
'F' only; otherwise they are output arguments. If equed = 'R' or 'B', A
is multiplied on the left by diag(r); if equed = 'N' or 'C', r is not
accessed.
If fact = 'F' and equed = 'R' or 'B', each element of r must be
positive.
If equed = 'C' or 'B', A is multiplied on the right by diag(c); if equed =
'N' or 'R', c is not accessed.
If fact = 'F' and equed = 'C' or 'B', each element of c must be
positive. Array r is replicated in every process column, and is aligned with
the distributed matrix A. Array c is replicated in every process row, and is
aligned with the distributed matrix A.

ix, jx (global) The row and column indices in the global matrix X indicating the
first row and the first column of the submatrix X(ix:ix+n-1, jx:jx
+nrhs-1), respectively.

descx (global and local) array of size dlen_. The array descriptor for the
distributed matrix X.

lwork (local or global) The size of the array work; must be at least
max(p?gecon(lwork), p?gerfs(lwork)) + LOCr(n_a).

iwork (local, psgesvx/pdgesvx only). Workspace array. The size of iwork is


(liwork).

liwork (local, psgesvx/pdgesvx only). The size of the array iwork , must be at
least LOCr(n_a).

rwork (local)
Workspace array, used in complex flavors only.
The size of rwork is (lrwork).

lrwork (local or global, pcgesvx/pzgesvx only). The size of the array rwork;must
be at least 2*LOCc(n_a) .

Output Parameters

x (local)

Pointer into the local memory to an array of local size lld_x*LOCc(jx
+nrhs-1).
If info = 0, the array x contains the solution matrix X to the original
system of equations. Note that A and B are modified on exit if equed≠'N',
and the solution to the equilibrated system is:
inv(diag(C))*X, if trans = 'N' and equed = 'C' or 'B'; and
inv(diag(R))*X, if trans = 'T' or 'C' and equed = 'R' or 'B'.

a Array a is not modified on exit if fact = 'F' or 'N', or if fact = 'E' and
equed = 'N'.
If equed≠'N', A is scaled on exit as follows:

equed = 'R': A = diag(R)*A


equed = 'C': A = A*diag(c)
equed = 'B': A = diag(R)*A*diag(c)

af If fact = 'N' or 'E', then af is an output argument and on exit returns


the factors L and U from the factorization A = P*L*U of the original matrix
A (if fact = 'N') or of the equilibrated matrix A (if fact = 'E'). See the
description of a for the form of the equilibrated matrix.

b Overwritten by diag(R)*B if trans = 'N' and equed = 'R' or 'B';

overwritten by diag(c)*B if trans = 'T' and equed = 'C' or 'B'; not


changed if equed = 'N'.

r, c These arrays are output arguments if fact≠'F'.

See the description of r, c in Input Arguments section.

rcond (global).
An estimate of the reciprocal condition number of the matrix A after
equilibration (if done). The function sets rcond =0 if the estimate
underflows; in this case the matrix is singular (to working precision).
However, anytime rcond is small compared to 1.0, for the working
precision, the matrix may be poorly conditioned or even singular.

ferr, berr (local)


Arrays of size LOCc(n_b) each. Contain the component-wise forward and
relative backward errors, respectively, for each solution vector.
Arrays ferr and berr are both replicated in every process row, and are
aligned with the matrices B and X.

ipiv If fact = 'N' or 'E', then ipiv is an output argument and on exit contains
the pivot indices from the factorization A = P*L*U of the original matrix A
(if fact = 'N') or of the equilibrated matrix A (if fact = 'E').

equed If fact≠'F' , then equed is an output argument. It specifies the form of


equilibration that was done (see the description of equed in Input
Arguments section).


work[0] If info=0, on exit work[0] returns the minimum value of lwork required
for optimum performance.

iwork[0] If info=0, on exit iwork[0] returns the minimum value of liwork required
for optimum performance.

rwork[0] If info=0, on exit rwork[0] returns the minimum value of lrwork required
for optimum performance.

info If info=0, the execution is successful.

info < 0: if the ith argument is an array and the jth entry had an illegal
value, then info = -(i*100+j); if the ith argument is a scalar and had an
illegal value, then info = -i. If info = i, and i≤n, then U(i,i) is exactly
zero. The factorization has been completed, but the factor U is exactly
singular, so the solution and error bounds could not be computed. If info =
i, and i = n +1, then U is nonsingular, but rcond is less than machine
precision. The factorization has been completed, but the matrix is singular
to working precision and the solution and error bounds have not been
computed.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?gbsv
Computes the solution to the system of linear
equations with a general banded distributed matrix
and multiple right-hand sides.

Syntax
void psgbsv (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs , float *a ,
MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , float *b , MKL_INT *ib , MKL_INT
*descb , float *work , MKL_INT *lwork , MKL_INT *info );
void pdgbsv (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs , double *a ,
MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , double *b , MKL_INT *ib , MKL_INT
*descb , double *work , MKL_INT *lwork , MKL_INT *info );
void pcgbsv (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs , MKL_Complex8
*a , MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , MKL_Complex8 *b , MKL_INT *ib ,
MKL_INT *descb , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT *info );
void pzgbsv (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs , MKL_Complex16
*a , MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , MKL_Complex16 *b , MKL_INT *ib ,
MKL_INT *descb , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?gbsv function computes the solution to a real or complex system of linear equations

sub(A)*X = sub(B),

where sub(A) = A(1:n, ja:ja+n-1) is an n-by-n real/complex general banded distributed matrix with bwl
subdiagonals and bwu superdiagonals, and X and sub(B) = B(ib:ib+n-1, 1:nrhs) are n-by-nrhs distributed
matrices.
The LU decomposition with partial pivoting and row interchanges is used to factor sub(A) as sub(A) =
P*L*U*Q, where P and Q are permutation matrices, and L and U are banded lower and upper triangular
matrices, respectively. The matrix Q represents reordering of columns for the sake of parallelism, while P
represents reordering of rows for numerical stability using classic partial pivoting.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

n (global) The number of rows and columns to be operated on, that is, the
order of the distributed matrix sub(A) (n≥ 0).

bwl (global) The number of subdiagonals within the band of A (0≤bwl≤n-1 ).

bwu (global) The number of superdiagonals within the band of A (0≤bwu≤n-1 ).

nrhs (global) The number of right hand sides; the number of columns of the
distributed matrix sub(B) (nrhs≥ 0).

a, b (local)
Pointers into the local memory to arrays of local size a: lld_a*LOCc(ja
+n-1) and b: lld_b*LOCc(nrhs).
On entry, the array a contains the local pieces of the global array A.
On entry, the array b contains the right hand side distributed matrix sub(B).

ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
If desca[dtype_ - 1] = 501, then dlen_≥ 7;

else if desca[dtype_ - 1] = 1, then dlen_≥ 9.

ib (global) The row index in the global matrix B indicating the first row of the
matrix to be operated on (which may be either all of B or a submatrix of B).

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
If descb[dtype_-1] = 502, then dlen_≥ 7;


else if descb[dtype_-1] = 1, then dlen_≥ 9.

work (local)
Workspace array of size lwork.

lwork (local or global) The size of the array work, must be at least
lwork≥ (NB+bwu)*(bwl+bwu) + 6*(bwl+bwu)*(bwl+2*bwu) + max(nrhs*(NB+2*bwl+4*bwu), 1).
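
Putting this bound to use, a double-precision caller can size the workspace and solve as in the hedged sketch below; the band matrix a (descriptor type 501), the right-hand sides b (type 502), the pivot array ipiv, and both descriptors are assumed to be set up as required, nb denotes the blocking factor recorded in the descriptors, and the wrapper itself is illustrative only.

#include <stdlib.h>
#include <mkl_scalapack.h>

/* Hypothetical fragment: a/desca holds the banded matrix, b/descb the
   right-hand sides, ipiv is sized as described under Output Parameters;
   nb is the blocking factor used in the descriptors. */
void gbsv_sketch(MKL_INT n, MKL_INT bwl, MKL_INT bwu, MKL_INT nrhs,
                 double *a, MKL_INT ja, MKL_INT *desca, MKL_INT *ipiv,
                 double *b, MKL_INT ib, MKL_INT *descb, MKL_INT nb)
{
    MKL_INT info;
    MKL_INT tmp = nrhs * (nb + 2 * bwl + 4 * bwu);
    MKL_INT lwork = (nb + bwu) * (bwl + bwu)
                  + 6 * (bwl + bwu) * (bwl + 2 * bwu)
                  + (tmp > 1 ? tmp : 1);
    double *work = malloc((size_t)lwork * sizeof(double));

    /* On exit a holds the factorization, b the solution X. */
    pdgbsv(&n, &bwl, &bwu, &nrhs, a, &ja, desca, ipiv, b, &ib, descb,
           work, &lwork, &info);
    free(work);
}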

Output Parameters

a On exit, contains details of the factorization. Note that the resulting


factorization is not the same factorization as returned from LAPACK.
Additional permutations are performed on the matrix for the sake of
parallelism.

b On exit, this array contains the local pieces of the solution distributed
matrix X.

ipiv (local) array.


The size of ipiv must be at least desca[NB - 1]. This array contains pivot
indices for local factorizations. You should not alter the contents between
factorization and solve.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

info If info=0, the execution is successful. info < 0:

If the ith argument is an array and the j-th entry had an illegal value, then
info = -(i*100+j); if the ith argument is a scalar and had an illegal
value, then info = -i.

info> 0:
If info = k≤NPROCS, the submatrix stored on processor info and factored
locally was not nonsingular, and the factorization was not completed. If
info = k > NPROCS, the submatrix stored on processor info-NPROCS
representing interactions with other processors was not nonsingular, and
the factorization was not completed.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?dbsv
Solves a general band system of linear equations.

Syntax
void psdbsv (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs , float *a ,
MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *descb , float *work ,
MKL_INT *lwork , MKL_INT *info );

void pddbsv (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs , double *a ,
MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *descb , double
*work , MKL_INT *lwork , MKL_INT *info );
void pcdbsv (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs , MKL_Complex8
*a , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *descb ,
MKL_Complex8 *work , MKL_INT *lwork , MKL_INT *info );
void pzdbsv (MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu , MKL_INT *nrhs , MKL_Complex16
*a , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *descb ,
MKL_Complex16 *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?dbsv function solves the following system of linear equations:

A(1:n, ja:ja+n-1)* X = B(ib:ib+n-1, 1:nrhs),


where A(1:n, ja:ja+n-1) is an n-by-n real/complex banded diagonally dominant-like distributed matrix
with bandwidth bwl, bwu.
Gaussian elimination without pivoting is used to factor a reordering of the matrix into LU.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

n (global) The order of the distributed submatrix A, (n≥ 0).

bwl (global) Number of subdiagonals. 0 ≤bwl≤n-1.

bwu (global) Number of superdiagonals. 0 ≤bwu≤n-1.

nrhs (global) The number of right-hand sides; the number of columns of the
distributed submatrix B, (nrhs≥ 0).

a (local).
Pointer into the local memory to an array with leading size lld_a≥ (bwl
+bwu+1) (stored in desca). On entry, this array contains the local pieces of
the distributed matrix.

ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).


desca (global and local) array of size dlen.


If 1d type (dtype_a=501 or 502), dlen≥ 7;

If 2d type (dtype_a=1), dlen≥ 9.

The array descriptor for the distributed matrix A.


Contains information of mapping of A to memory.

b (local)
Pointer into the local memory to an array of local lead size lld_b≥nb. On
entry, this array contains the local pieces of the right hand sides B(ib:ib
+n-1, 1:nrhs).

ib (global) The row index in the global matrix B indicating the first row of the
matrix to be operated on (which may be either all of b or a submatrix of B).

descb (global and local) array of size dlen.


If 1d type (dtype_b =502), dlen≥ 7;

If 2d type (dtype_b =1), dlen≥ 9.

The array descriptor for the distributed matrix B.


Contains information of mapping of B to memory.

work (local).
Temporary workspace. This space may be overwritten in between calls to
functions. work must be the size given in lwork.

lwork (local or global) Size of user-input workspace work. If lwork is too small,
the minimal acceptable size will be returned in work[0] and an error code
is returned.
lwork≥nb*(bwl+bwu) + 6*max(bwl,bwu)*max(bwl,bwu) +
max(max(bwl,bwu)*nrhs, max(bwl,bwu)*max(bwl,bwu))

Output Parameters

a On exit, this array contains details of the factorization.
Note that permutations are performed on the matrix, so that the factors
returned are different from those returned by LAPACK.

b On exit, this contains the local piece of the solutions distributed matrix X.

work On exit, work[0] contains the minimal lwork.

info (local) If info=0, the execution is successful.

< 0: If the i-th argument is an array and the j-th entry had an illegal value,
then info = -(i*100+j); if the i-th argument is a scalar and had an
illegal value, then info = -i.

> 0: If info = k < NPROCS, the submatrix stored on processor info and
factored locally was not positive definite, and the factorization was not
completed.

If info = k > NPROCS, the submatrix stored on processor info-NPROCS
representing interactions with other processors was not positive definite,
and the factorization was not completed.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
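
The following is a minimal calling sketch for the double precision flavor. It assumes the BLACS
process grid, the array descriptors desca and descb (types 501 and 502), and the local pieces of a
and b have already been prepared by the caller; the helper name band_solve and the nb argument
(the blocking factor stored in the descriptors) are illustrative only, and the workspace size follows
the lwork bound given above.

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Illustrative helper (not part of Intel MKL): solve A*X = B with pddbsv. */
MKL_INT band_solve(MKL_INT n, MKL_INT bwl, MKL_INT bwu, MKL_INT nrhs,
                   double *a, MKL_INT *desca, double *b, MKL_INT *descb,
                   MKL_INT nb)
{
    MKL_INT ja = 1, ib = 1, info = 0;
    MKL_INT mx = (bwl > bwu) ? bwl : bwu;                  /* max(bwl, bwu) */
    MKL_INT t1 = mx * nrhs, t2 = mx * mx;
    /* Workspace bound taken from the lwork description above. */
    MKL_INT lwork = nb * (bwl + bwu) + 6 * mx * mx + ((t1 > t2) ? t1 : t2);
    double *work = (double *)malloc((size_t)lwork * sizeof(double));
    if (!work) return -1;

    pddbsv(&n, &bwl, &bwu, &nrhs, a, &ja, desca, b, &ib, descb,
           work, &lwork, &info);       /* b now holds the local part of X */

    free(work);
    return info;                       /* 0 on success; see info above */
}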

p?dtsv
Solves a general tridiagonal system of linear
equations.

Syntax
void psdtsv (MKL_INT *n , MKL_INT *nrhs , float *dl , float *d , float *du , MKL_INT
*ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *descb , float *work , MKL_INT
*lwork , MKL_INT *info );
void pddtsv (MKL_INT *n , MKL_INT *nrhs , double *dl , double *d , double *du ,
MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *descb , double
*work , MKL_INT *lwork , MKL_INT *info );
void pcdtsv (MKL_INT *n , MKL_INT *nrhs , MKL_Complex8 *dl , MKL_Complex8 *d ,
MKL_Complex8 *du , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib ,
MKL_INT *descb , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT *info );
void pzdtsv (MKL_INT *n , MKL_INT *nrhs , MKL_Complex16 *dl , MKL_Complex16 *d ,
MKL_Complex16 *du , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib ,
MKL_INT *descb , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?dtsv function solves a system of linear equations
A(1:n, ja:ja+n-1) * X = B(ib:ib+n-1, 1:nrhs),
where A(1:n, ja:ja+n-1) is an n-by-n real/complex tridiagonal diagonally dominant-like distributed matrix.

Gaussian elimination without pivoting is used to factor a reordering of the matrix into L U.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804


Input Parameters

n (global) The order of the distributed submatrix A(n≥ 0).

nrhs The number of right hand sides; the number of columns of the distributed
matrix B(nrhs≥ 0).

dl (local).
Pointer to local part of global vector storing the lower diagonal of the
matrix. Globally, dl[0] is not referenced, and dl must be aligned with d.
Must be of size > desca[nb_ - 1].

d (local).
Pointer to local part of global vector storing the main diagonal of the matrix.

du (local).
Pointer to local part of global vector storing the upper diagonal of the
matrix. Globally, du[n - 1] is not referenced, and du must be aligned with
d.

ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).

desca (global and local) array of size dlen.


If 1d type (dtype_a=501 or 502), dlen≥ 7;

If 2d type (dtype_a=1), dlen≥ 9.

The array descriptor for the distributed matrix A.


Contains information of mapping of A to memory.

b (local)
Pointer into the local memory to an array of local lead size lld_b > nb. On
entry, this array contains the local pieces of the right hand sides B(ib:ib
+n-1, 1:nrhs).

ib (global) The row index in the global matrix B indicating the first row of the
matrix to be operated on (which may be either all of b or a submatrix of B).

descb (global and local) array of size dlen.


If 1d type (dtype_b =502), dlen≥ 7;

If 2d type (dtype_b =1), dlen≥ 9.

The array descriptor for the distributed matrix B.


Contains information of mapping of B to memory.

work (local).
Temporary workspace. This space may be overwritten in between calls to
functions. work must be the size given in lwork.
lwork (local or global) Size of user-input workspace work. If lwork is too small,
the minimal acceptable size will be returned in work[0] and an error code
is returned. lwork > (12*NPCOL+3*nb)+max((10+2*min(100,
nrhs))*NPCOL+4*nrhs, 8*NPCOL)

Output Parameters

dl On exit, this array contains information on the factors of the matrix.

d On exit, this array contains information on the factors of the matrix.
Must be of size > desca[nb_ - 1].

du On exit, this array contains information on the factors of the matrix.
Must be of size > desca[nb_ - 1].

b On exit, this contains the local piece of the solutions distributed matrix X.

work On exit, work[0] contains the minimal lwork.

info (local) If info=0, the execution is successful.

< 0: If the i-th argument is an array and the j-th entry had an illegal value,
then info = -(i*100+j), if the i-th argument is a scalar and had an
illegal value, then info = -i.

> 0: If info = k<NPROCS, the submatrix stored on processor info and
factored locally was not positive definite, and the factorization was not
completed.
If info = k > NPROCS, the submatrix stored on processor info-NPROCS
representing interactions with other processors was not positive definite,
and the factorization was not completed.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
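
A minimal calling sketch for pddtsv follows. It assumes the caller has already distributed dl, d,
du, and b and filled the descriptors desca (type 501) and descb (type 502); npcol (the number of
process columns) and nb (the blocking factor) are passed in only so that the lwork bound given
above can be evaluated, and the helper name tridiag_solve is illustrative.

#include <stdlib.h>
#include "mkl_scalapack.h"

MKL_INT tridiag_solve(MKL_INT n, MKL_INT nrhs,
                      double *dl, double *d, double *du, MKL_INT *desca,
                      double *b, MKL_INT *descb, MKL_INT npcol, MKL_INT nb)
{
    MKL_INT ja = 1, ib = 1, info = 0;
    MKL_INT m  = (nrhs < 100) ? nrhs : 100;                /* min(100, nrhs) */
    MKL_INT w1 = (10 + 2 * m) * npcol + 4 * nrhs;
    MKL_INT w2 = 8 * npcol;
    /* Workspace bound taken from the lwork description above. */
    MKL_INT lwork = 12 * npcol + 3 * nb + ((w1 > w2) ? w1 : w2);
    double *work = (double *)malloc((size_t)lwork * sizeof(double));
    if (!work) return -1;

    pddtsv(&n, &nrhs, dl, d, du, &ja, desca, b, &ib, descb,
           work, &lwork, &info);       /* b is overwritten with X */

    free(work);
    return info;
}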

p?posv
Solves a symmetric positive definite system of linear
equations.

Syntax
void psposv (char *uplo , MKL_INT *n , MKL_INT *nrhs , float *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb ,
MKL_INT *info );
void pdposv (char *uplo , MKL_INT *n , MKL_INT *nrhs , double *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , MKL_INT *info );
void pcposv (char *uplo , MKL_INT *n , MKL_INT *nrhs , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , MKL_INT *info );
void pzposv (char *uplo , MKL_INT *n , MKL_INT *nrhs , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , MKL_INT *info );

Include Files
• mkl_scalapack.h


Description
The p?posv function computes the solution to a real/complex system of linear equations

sub(A)*X = sub(B),
where sub(A) denotes A(ia:ia+n-1,ja:ja+n-1) and is an n-by-n symmetric/Hermitian positive definite
distributed matrix, and X and sub(B), denoting B(ib:ib+n-1,jb:jb+nrhs-1), are n-by-nrhs distributed
matrices. The Cholesky decomposition is used to factor sub(A) as
sub(A) = U^T*U, if uplo = 'U', or
sub(A) = L*L^T, if uplo = 'L',
where U is an upper triangular matrix and L is a lower triangular matrix. The factored form of sub(A) is then
used to solve the system of equations.

Input Parameters

uplo (global) Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of sub(A) is stored.

n (global) The order of the distributed matrix sub(A) (n≥ 0).

nrhs The number of right-hand sides; the number of columns of the distributed
matrix sub(B) (nrhs≥ 0).

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1). On
entry, this array contains the local pieces of the n-by-n symmetric
distributed matrix sub(A) to be factored.
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular part of the matrix, and its strictly lower triangular part
is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular part of the distributed matrix, and its strictly upper
triangular part is not referenced.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

b (local)
Pointer into the local memory to an array of size lld_b*LOCc(jb+nrhs-1).
On entry, the local pieces of the right hand sides distributed matrix sub(B).

ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the submatrix B, respectively.

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.

Output Parameters

a On exit, if info = 0, this array contains the local pieces of the factor U or L
from the Cholesky factorization sub(A) = U^H*U or L*L^H.

b On exit, if info = 0, sub(B) is overwritten by the solution distributed
matrix X.

info (global)
If info =0, the execution is successful.

If info < 0: If the i-th argument is an array and the j-th entry, indexed
j-1, had an illegal value, then info = -(i*100+j), if the i-th argument is
a scalar and had an illegal value, then info = -i.

If info > 0: If info = k, the leading minor of order k, A(ia:ia+k-1,
ja:ja+k-1) is not positive definite, and the factorization could not be
completed, and the solution has not been computed.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
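
The following sketch shows one way to call the double precision flavor for a block-cyclically
distributed system. The BLACS context ictxt, the calling process's row coordinate myrow, the grid
row count nprow, and the local arrays a_local and b_local are assumed to exist already; numroc and
descinit are the usual ScaLAPACK tool routines (their prototypes are assumed to be available
through mkl_scalapack.h), and the helper name spd_solve is illustrative.

#include "mkl_scalapack.h"

MKL_INT spd_solve(MKL_INT ictxt, MKL_INT n, MKL_INT nrhs, MKL_INT nb,
                  double *a_local, double *b_local,
                  MKL_INT myrow, MKL_INT nprow)
{
    MKL_INT desca[9], descb[9];
    MKL_INT ia = 1, ja = 1, ib = 1, jb = 1, izero = 0, info = 0;
    char uplo = 'L';

    /* Local row count and leading dimension on this process. */
    MKL_INT mloc = numroc(&n, &nb, &myrow, &izero, &nprow);
    MKL_INT lld  = (mloc > 1) ? mloc : 1;

    descinit(desca, &n, &n,    &nb, &nb, &izero, &izero, &ictxt, &lld, &info);
    descinit(descb, &n, &nrhs, &nb, &nb, &izero, &izero, &ictxt, &lld, &info);

    /* b_local is overwritten by the local pieces of the solution X. */
    pdposv(&uplo, &n, &nrhs, a_local, &ia, &ja, desca,
           b_local, &ib, &jb, descb, &info);

    return info;   /* info = k > 0: leading minor of order k not positive definite */
}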

p?posvx
Solves a symmetric or Hermitian positive definite
system of linear equations.

Syntax
void psposvx (char *fact , char *uplo , MKL_INT *n , MKL_INT *nrhs , float *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *af , MKL_INT *iaf , MKL_INT *jaf ,
MKL_INT *descaf , char *equed , float *sr , float *sc , float *b , MKL_INT *ib ,
MKL_INT *jb , MKL_INT *descb , float *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx ,
float *rcond , float *ferr , float *berr , float *work , MKL_INT *lwork , MKL_INT
*iwork , MKL_INT *liwork , MKL_INT *info );
void pdposvx (char *fact , char *uplo , MKL_INT *n , MKL_INT *nrhs , double *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *af , MKL_INT *iaf , MKL_INT
*jaf , MKL_INT *descaf , char *equed , double *sr , double *sc , double *b , MKL_INT
*ib , MKL_INT *jb , MKL_INT *descb , double *x , MKL_INT *ix , MKL_INT *jx , MKL_INT
*descx , double *rcond , double *ferr , double *berr , double *work , MKL_INT *lwork ,
MKL_INT *iwork , MKL_INT *liwork , MKL_INT *info );
void pcposvx (char *fact , char *uplo , MKL_INT *n , MKL_INT *nrhs , MKL_Complex8 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *af , MKL_INT *iaf , MKL_INT
*jaf , MKL_INT *descaf , char *equed , float *sr , float *sc , MKL_Complex8 *b ,
MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_Complex8 *x , MKL_INT *ix , MKL_INT
*jx , MKL_INT *descx , float *rcond , float *ferr , float *berr , MKL_Complex8 *work ,
MKL_INT *lwork , float *rwork , MKL_INT *lrwork , MKL_INT *info );
void pzposvx (char *fact , char *uplo , MKL_INT *n , MKL_INT *nrhs , MKL_Complex16 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *af , MKL_INT *iaf ,
MKL_INT *jaf , MKL_INT *descaf , char *equed , double *sr , double *sc , MKL_Complex16
*b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_Complex16 *x , MKL_INT *ix ,
MKL_INT *jx , MKL_INT *descx , double *rcond , double *ferr , double *berr ,
MKL_Complex16 *work , MKL_INT *lwork , double *rwork , MKL_INT *lrwork , MKL_INT
*info );

Include Files
• mkl_scalapack.h

Description
The p?posvx function uses the Cholesky factorization A=U^T*U or A=L*L^T to compute the solution to a real or
complex system of linear equations
A(ia:ia+n-1, ja:ja+n-1)*X = B(ib:ib+n-1, jb:jb+nrhs-1),
where A(ia:ia+n-1, ja:ja+n-1) is a n-by-n matrix and X and B(ib:ib+n-1,jb:jb+nrhs-1) are n-by-
nrhs matrices.
Error bounds on the solution and a condition estimate are also provided.
In the following comments y denotes Y(iy:iy+m-1, jy:jy+k-1), an m-by-k matrix where y can be a, af, b
and x.
The function p?posvx performs the following steps:

1. If fact = 'E', real scaling factors s are computed to equilibrate the system:

diag(sr)*A*diag(sc)*inv(diag(sc))*X = diag(sr)*B
Whether or not the system will be equilibrated depends on the scaling of the matrix A, but if
equilibration is used, A is overwritten by diag(sr)*A*diag(sc) and B by diag(sr)*B .
2. If fact = 'N' or 'E', the Cholesky decomposition is used to factor the matrix A (after equilibration if
fact = 'E') as
A = U^T*U, if uplo = 'U', or
A = L*L^T, if uplo = 'L',
where U is an upper triangular matrix and L is a lower triangular matrix.
3. The factored form of A is used to estimate the condition number of the matrix A. If the reciprocal of the
condition number is less than machine precision, steps 4-6 are skipped.
4. The system of equations is solved for X using the factored form of A.
5. Iterative refinement is applied to improve the computed solution matrix and calculate error bounds and
backward error estimates for it.
6. If equilibration was used, the matrix X is premultiplied by diag(sr) so that it solves the original system
before equilibration.

Input Parameters

fact (global) Must be 'F', 'N', or 'E'.

Specifies whether or not the factored form of the matrix A is supplied on
entry, and if not, whether the matrix A should be equilibrated before it is
factored.
If fact = 'F': on entry, af contains the factored form of A. If equed =
'Y', the matrix A has been equilibrated with scaling factors given by s. a
and af will not be modified.
If fact = 'N', the matrix A will be copied to af and factored.

If fact = 'E', the matrix A will be equilibrated if necessary, then copied to
af and factored.

uplo (global) Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored.

n (global) The order of the distributed matrix sub(A) (n≥ 0).

nrhs (global) The number of right-hand sides; the number of columns of the
distributed submatrices B and X. (nrhs≥ 0).

a (local)
Pointer into the local memory to an array of local size lld_a*LOCc(ja
+n-1). On entry, the symmetric/Hermitian matrix A, except if fact = 'F'
and equed = 'Y', then A must contain the equilibrated matrix
diag(sr)*A*diag(sc).
If uplo = 'U', the leading n-by-n upper triangular part of A contains the
upper triangular part of the matrix A, and the strictly lower triangular part
of A is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of A contains the
lower triangular part of the matrix A, and the strictly upper triangular part
of A is not referenced. A is not modified if fact = 'F' or 'N', or if fact =
'E' and equed = 'N' on exit.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

af (local)
Pointer into the local memory to an array of local size lld_af*LOCc(ja
+n-1).
If fact = 'F', then af is an input argument and on entry contains the
triangular factor U or L from the Cholesky factorization A = U^T*U or A =
L*L^T, in the same storage format as A. If equed≠'N', then af is the
factored form of the equilibrated matrix diag(sr)*A*diag(sc).

iaf, jaf (global) The row and column indices in the global matrix AF indicating the
first row and the first column of the submatrix AF, respectively.

descaf (global and local) array of size dlen_. The array descriptor for the
distributed matrix AF.

equed (global) Must be 'N' or 'Y'.

equed is an input argument if fact = 'F'. It specifies the form of


equilibration that was done:
If equed = 'N', no equilibration was done (always true if fact = 'N');

If equed = 'Y', equilibration was done and A has been replaced by


diag(sr)*A*diag(sc).

sr (local)
Array of size lld_a.


The array s contains the scale factors for A. This array is an input argument
if fact = 'F' only; otherwise it is an output argument.

If equed = 'N', s is not accessed.

If fact = 'F' and equed = 'Y', each element of s must be positive.

b (local)
Pointer into the local memory to an array of local size lld_b*LOCc(jb
+nrhs-1). On entry, the n-by-nrhs right-hand side matrix B.

ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the submatrix B, respectively.

descb (global and local) Array of size dlen_. The array descriptor for the
distributed matrix B.

x (local)
Pointer into the local memory to an array of local size lld_x*LOCc(jx
+nrhs-1).

ix, jx (global) The row and column indices in the global matrix X indicating the
first row and the first column of the submatrix X, respectively.

descx (global and local) array of size dlen_. The array descriptor for the
distributed matrix X.

work (local)
Workspace array of size lwork.

lwork (local or global)


The size of the array work. lwork is local input and must be at least lwork
= max(p?pocon(lwork), p?porfs(lwork)) + LOCr(n_a).
lwork = 3*desca[lld_ - 1].
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

iwork (local) Workspace array of size liwork.

liwork (local or global)


The size of the array iwork. liwork is local input and must be at least
liwork = desca[lld_ - 1]liwork = LOCr(n_a).
If liwork = -1, then liwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a On exit, if fact = 'E' and equed = 'Y', a is overwritten by
diag(sr)*a*diag(sc).

af If fact = 'N', then af is an output argument and on exit returns the
triangular factor U or L from the Cholesky factorization A = U^T*U or A =
L*L^T of the original matrix A.
If fact = 'E', then af is an output argument and on exit returns the
triangular factor U or L from the Cholesky factorization A = U^T*U or A =
L*L^T of the equilibrated matrix A (see the description of A for the form of
the equilibrated matrix).

equed If fact≠'F', then equed is an output argument. It specifies the form of
equilibration that was done (see the description of equed in Input
Arguments section).

sr This array is an output argument if fact≠'F'.

See the description of sr in Input Arguments section.

sc This array is an output argument if fact≠'F'.

See the description of sc in Input Arguments section.

b On exit, if equed = 'N', b is not modified; if trans = 'N' and equed =
'R' or 'B', b is overwritten by diag(r)*b; if trans = 'T' or 'C' and
equed = 'C' or 'B', b is overwritten by diag(c)*b.

x (local)
If info = 0 the n-by-nrhs solution matrix X to the original system of
equations.
Note that A and B are modified on exit if equed≠'N', and the solution to
the equilibrated system is
inv(diag(sc))*X if trans = 'N' and equed = 'C' or 'B', or
inv(diag(sr))*X if trans = 'T' or 'C' and equed = 'R' or 'B'.

rcond (global)
An estimate of the reciprocal condition number of the matrix A after
equilibration (if done). If rcond is less than the machine precision (in
particular, if rcond=0), the matrix is singular to working precision. This
condition is indicated by a return code of info > 0.

ferr Arrays of size at least max(LOC,n_b). The estimated forward error bounds
for each solution vector X(j) (the j-th column of the solution matrix X). If
xtrue is the true solution, ferr[j - 1] bounds the magnitude of the largest
entry in (X(j) - xtrue) divided by the magnitude of the largest entry in
X(j). The quality of the error bound depends on the quality of the estimate
of norm(inv(A)) computed in the code; if the estimate of norm(inv(A))
is accurate, the error bound is guaranteed.

berr (local)


Arrays of size at least max(LOC,n_b). The componentwise relative
backward error of each solution vector X(j) (the smallest relative change in
any entry of A or B that makes X(j) an exact solution).

work[0] (local) On exit, work[0] returns the minimal and optimal liwork.

info (global)
If info=0, the execution is successful.

< 0: if info = -i, the i-th argument had an illegal value.

> 0: if info = i, and i is ≤n: if info = i, the leading minor of order i of a
is not positive definite, so the factorization could not be completed, and the
solution and error bounds could not be computed.
= n+1: rcond is less than machine precision. The factorization has been
completed, but the matrix is singular to working precision, and the solution
and error bounds have not been computed.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
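
Because p?posvx supports the lwork = -1 and liwork = -1 workspace queries described above, a
typical call is made in two passes. The sketch below (double precision) assumes all distributed
arrays, descriptors, and the sr, sc, ferr, berr buffers are prepared by the caller; the helper name
expert_solve and the choice fact = 'N', uplo = 'L' are illustrative only.

#include <stdlib.h>
#include "mkl_scalapack.h"

MKL_INT expert_solve(MKL_INT n, MKL_INT nrhs,
                     double *a, MKL_INT *desca, double *af, MKL_INT *descaf,
                     double *b, MKL_INT *descb, double *x, MKL_INT *descx,
                     double *sr, double *sc, double *ferr, double *berr,
                     double *rcond)
{
    char fact = 'N', uplo = 'L', equed = 'N';
    MKL_INT one = 1, info = 0;
    MKL_INT lwork = -1, liwork = -1;        /* first pass: workspace query */
    double  wq  = 0.0;                      /* receives required lwork      */
    MKL_INT iwq = 0;                        /* receives required liwork     */

    pdposvx(&fact, &uplo, &n, &nrhs, a, &one, &one, desca,
            af, &one, &one, descaf, &equed, sr, sc,
            b, &one, &one, descb, x, &one, &one, descx,
            rcond, ferr, berr, &wq, &lwork, &iwq, &liwork, &info);
    if (info != 0) return info;

    lwork  = (MKL_INT)wq;
    liwork = iwq;
    double  *work  = (double  *)malloc((size_t)lwork  * sizeof(double));
    MKL_INT *iwork = (MKL_INT *)malloc((size_t)liwork * sizeof(MKL_INT));
    if (!work || !iwork) { free(work); free(iwork); return -1; }

    /* Second pass: equilibrate (if needed), factor, solve, refine. */
    pdposvx(&fact, &uplo, &n, &nrhs, a, &one, &one, desca,
            af, &one, &one, descaf, &equed, sr, sc,
            b, &one, &one, descb, x, &one, &one, descx,
            rcond, ferr, berr, work, &lwork, iwork, &liwork, &info);

    free(work); free(iwork);
    return info;
}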

p?pbsv
Solves a symmetric/Hermitian positive definite banded
system of linear equations.

Syntax
void pspbsv (char *uplo , MKL_INT *n , MKL_INT *bw , MKL_INT *nrhs , float *a ,
MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *descb , float *work ,
MKL_INT *lwork , MKL_INT *info );
void pdpbsv (char *uplo , MKL_INT *n , MKL_INT *bw , MKL_INT *nrhs , double *a ,
MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *descb , double
*work , MKL_INT *lwork , MKL_INT *info );
void pcpbsv (char *uplo , MKL_INT *n , MKL_INT *bw , MKL_INT *nrhs , MKL_Complex8 *a ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *descb ,
MKL_Complex8 *work , MKL_INT *lwork , MKL_INT *info );
void pzpbsv (char *uplo , MKL_INT *n , MKL_INT *bw , MKL_INT *nrhs , MKL_Complex16 *a ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *descb ,
MKL_Complex16 *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?pbsv function solves a system of linear equations

A(1:n, ja:ja+n-1)*X = B(ib:ib+n-1, 1:nrhs),


where A(1:n, ja:ja+n-1) is an n-by-n real/complex banded symmetric positive definite distributed matrix
with bandwidth bw.
Cholesky factorization is used to factor a reordering of the matrix into L*L'.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

uplo (global) Must be 'U' or 'L'.

Indicates whether the upper or lower triangular part of A is stored.

If uplo = 'U', the upper triangular part of A is stored;
If uplo = 'L', the lower triangular part of A is stored.

n (global) The order of the distributed matrix A(n≥ 0).

bw (global) The number of subdiagonals in L or U. 0 ≤bw≤n-1.

nrhs (global) The number of right-hand sides; the number of columns in
B(nrhs≥ 0).

a (local).
Pointer into the local memory to an array with leading size lld_a≥ (bw+1)
(stored in desca). On entry, this array contains the local pieces of the
distributed matrix sub(A) to be factored.

ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

b (local)
Pointer into the local memory to an array of local lead size lld_b≥nb. On
entry, this array contains the local pieces of the right hand sides B(ib:ib
+n-1, 1:nrhs).

ib (global) The row index in the global matrix B indicating the first row of the
matrix to be operated on (which may be either all of b or a submatrix of B).

descb (global and local) array of size dlen.


If 1D type (dtype_b =502), dlen≥ 7;

If 2D type (dtype_b =1), dlen≥ 9.

The array descriptor for the distributed matrix B.


Contains information of mapping of B to memory.


work (local).
Temporary workspace. This space may be overwritten in between calls to
functions. work must be the size given in lwork.

lwork (local or global) Size of user-input workspace work. If lwork is too small,
the minimal acceptable size will be returned in work[0] and an error code
is returned. lwork≥ (nb+2*bw)*bw +max((bw*nrhs), bw*bw)

Output Parameters

a On exit, this array contains details of the factorization. Note that
permutations are performed on the matrix, so that the factors returned are
different from those returned by LAPACK.

b On exit, contains the local piece of the solutions distributed matrix X.

work On exit, work[0] contains the minimal lwork.

info (global) If info=0, the execution is successful.


< 0: If the i-th argument is an array and the j-th entry had an illegal value,
then info = -(i*100+j), if the i-th argument is a scalar and had an
illegal value, then info = -i.

> 0: If info = k≤NPROCS, the submatrix stored on processor info and
factored locally was not positive definite, and the factorization was not
completed.
If info = k > NPROCS, the submatrix stored on processor info-NPROCS
representing interactions with other processors was not positive definite,
and the factorization was not completed.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
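
A minimal calling sketch for pdpbsv follows, with the workspace size taken from the lwork bound
above. The local band storage a, the right-hand sides b, and the descriptors desca/descb are
assumed to be prepared by the caller; the helper name pb_solve, the choice uplo = 'L', and the nb
argument are illustrative.

#include <stdlib.h>
#include "mkl_scalapack.h"

MKL_INT pb_solve(MKL_INT n, MKL_INT bw, MKL_INT nrhs,
                 double *a, MKL_INT *desca, double *b, MKL_INT *descb,
                 MKL_INT nb)
{
    char uplo = 'L';
    MKL_INT ja = 1, ib = 1, info = 0;
    MKL_INT t1 = bw * nrhs, t2 = bw * bw;
    /* Workspace bound taken from the lwork description above. */
    MKL_INT lwork = (nb + 2 * bw) * bw + ((t1 > t2) ? t1 : t2);
    double *work = (double *)malloc((size_t)lwork * sizeof(double));
    if (!work) return -1;

    pdpbsv(&uplo, &n, &bw, &nrhs, a, &ja, desca, b, &ib, descb,
           work, &lwork, &info);       /* b now holds the local part of X */

    free(work);
    return info;
}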

p?ptsv
Solves a symmetric or Hermitian positive definite
tridiagonal system of linear equations.

Syntax
void psptsv (MKL_INT *n , MKL_INT *nrhs , float *d , float *e , MKL_INT *ja , MKL_INT
*desca , float *b , MKL_INT *ib , MKL_INT *descb , float *work , MKL_INT *lwork ,
MKL_INT *info );
void pdptsv (MKL_INT *n , MKL_INT *nrhs , double *d , double *e , MKL_INT *ja ,
MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *descb , double *work , MKL_INT
*lwork , MKL_INT *info );
void pcptsv (char *uplo , MKL_INT *n , MKL_INT *nrhs , float *d , MKL_Complex8 *e ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *descb ,
MKL_Complex8 *work , MKL_INT *lwork , MKL_INT *info );
void pzptsv (char *uplo , MKL_INT *n , MKL_INT *nrhs , double *d , MKL_Complex16 *e ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *descb ,
MKL_Complex16 *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?ptsv function solves a system of linear equations

A(1:n, ja:ja+n-1)*X = B(ib:ib+n-1, 1:nrhs),


where A(1:n, ja:ja+n-1) is an n-by-n real/complex tridiagonal symmetric/Hermitian positive definite distributed matrix.

Cholesky factorization is used to factor a reordering of the matrix into L*L'.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

n (global) The order of matrix A(n≥ 0).

nrhs (global) The number of right-hand sides; the number of columns of the
distributed submatrix B(nrhs≥ 0).

d (local)
Pointer to local part of global vector storing the main diagonal of the matrix.

e (local)
Pointer to local part of global vector storing the upper diagonal of the
matrix. Globally, e[n - 1] is not referenced, and e must be aligned with d.

ja (global) The index in the global matrix A indicating the start of the matrix to
be operated on (which may be either all of A or a submatrix of A).

desca (global and local) array of size dlen.


If 1d type (dtype_a=501 or 502), dlen≥ 7;

If 2d type (dtype_a=1), dlen≥ 9.

The array descriptor for the distributed matrix A.


Contains information of mapping of A to memory.

b (local)
Pointer into the local memory to an array of local lead size lld_b≥nb.

On entry, this array contains the local pieces of the right hand sides
B(ib:ib+n-1, 1:nrhs).


ib (global) The row index in the global matrix B indicating the first row of the
matrix to be operated on (which may be either all of b or a submatrix of B).

descb (global and local) array of size dlen.


If 1d type (dtype_b = 502), dlen≥ 7;

If 2d type (dtype_b = 1), dlen≥ 9.

The array descriptor for the distributed matrix B.


Contains information of mapping of B to memory.

work (local).
Temporary workspace. This space may be overwritten in between calls to
functions. work must be the size given in lwork.

lwork (local or global) Size of user-input workspace work. If lwork is too small,
the minimal acceptable size will be returned in work[0] and an error code
is returned. lwork > (12*NPCOL+3*nb)+max((10+2*min(100,
nrhs))*NPCOL+4*nrhs, 8*NPCOL).

Output Parameters

d On exit, this array contains information on the factors of the matrix.
Must be of size greater than or equal to desca[nb_ - 1].

e On exit, this array contains information on the factors of the matrix.
Must be of size greater than or equal to desca[nb_ - 1].

b On exit, this contains the local piece of the solutions distributed matrix X.

work On exit, work[0] contains the minimal lwork.

info (local) If info=0, the execution is successful.

< 0: If the i-th argument is an array and the j-th entry had an illegal value,
then info = -(i*100+j), if the i-th argument is a scalar and had an
illegal value, then info = -i.

> 0: If info = k≤NPROCS, the submatrix stored on processor info and
factored locally was not positive definite, and the factorization was not
completed.
If info = k > NPROCS, the submatrix stored on processor info-NPROCS
representing interactions with other processors was not positive definite,
and the factorization was not completed.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
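
The call pattern mirrors p?dtsv; the sketch below uses the double precision flavor and the lwork
bound above. The local diagonals d and e, the right-hand sides b, and the descriptors are assumed
to be prepared by the caller; pt_solve, npcol, and nb are illustrative helper names and parameters.

#include <stdlib.h>
#include "mkl_scalapack.h"

MKL_INT pt_solve(MKL_INT n, MKL_INT nrhs,
                 double *d, double *e, MKL_INT *desca,
                 double *b, MKL_INT *descb, MKL_INT npcol, MKL_INT nb)
{
    MKL_INT ja = 1, ib = 1, info = 0;
    MKL_INT m  = (nrhs < 100) ? nrhs : 100;                /* min(100, nrhs) */
    MKL_INT w1 = (10 + 2 * m) * npcol + 4 * nrhs;
    MKL_INT w2 = 8 * npcol;
    MKL_INT lwork = 12 * npcol + 3 * nb + ((w1 > w2) ? w1 : w2);
    double *work = (double *)malloc((size_t)lwork * sizeof(double));
    if (!work) return -1;

    pdptsv(&n, &nrhs, d, e, &ja, desca, b, &ib, descb, work, &lwork, &info);

    free(work);
    return info;   /* b contains the local part of the solution X */
}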

p?gels
Solves overdetermined or underdetermined linear
systems involving a matrix of full rank.

Syntax
void psgels (char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *nrhs , float *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , float *work , MKL_INT *lwork , MKL_INT *info );
void pdgels (char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *nrhs , double *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , double *work , MKL_INT *lwork , MKL_INT *info );
void pcgels (char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *nrhs , MKL_Complex8 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT
*jb , MKL_INT *descb , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT *info );
void pzgels (char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *nrhs , MKL_Complex16 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT
*jb , MKL_INT *descb , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?gels function solves overdetermined or underdetermined real/complex linear systems involving an
m-by-n matrix sub(A) = A(ia:ia+m-1,ja:ja+n-1), or its transpose/conjugate-transpose, using a QR or
LQ factorization of sub(A). It is assumed that sub(A) has full rank.
The following options are provided:

1. If trans = 'N' and m≥n: find the least squares solution of an overdetermined system, that is, solve
the least squares problem
minimize ||sub(B) - sub(A)*X||
2. If trans = 'N' and m < n: find the minimum norm solution of an underdetermined system sub(A)*X
= sub(B).
3. If trans = 'T' and m≥n: find the minimum norm solution of an underdetermined system sub(A)^T*X =
sub(B).
4. If trans = 'T' and m < n: find the least squares solution of an overdetermined system, that is, solve
the least squares problem
minimize ||sub(B) - sub(A)^T*X||,
where sub(B) denotes B(ib:ib+m-1, jb:jb+nrhs-1) when trans = 'N' and B(ib:ib+n-1,
jb:jb+nrhs-1) otherwise. Several right hand side vectors b and solution vectors x can be handled in a
single call; when trans = 'N', the solution vectors are stored as the columns of the n-by-nrhs right
hand side matrix sub(B) and the m-by-nrhs right hand side matrix sub(B) otherwise.

Input Parameters

trans (global) Must be 'N', or 'T'.

If trans = 'N', the linear system involves matrix sub(A);

If trans = 'T', the linear system involves the transposed matrix A^T (for
real flavors only).

m (global) The number of rows in the distributed matrix sub (A) (m≥ 0).

n (global) The number of columns in the distributed matrix sub (A) (n≥ 0).


nrhs (global) The number of right-hand sides; the number of columns in the
distributed submatrices sub(B) and X. (nrhs≥ 0).

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1). On
entry, contains the m-by-n matrix A.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

b (local)
Pointer into the local memory to an array of local size lld_b*LOCc(jb
+nrhs-1). On entry, this array contains the local pieces of the distributed
matrix B of right-hand side vectors, stored columnwise; sub(B) is m-by-
nrhs if trans='N', and n-by-nrhs otherwise.

ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the submatrix B, respectively.

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.

work (local)
Workspace array with size lwork.

lwork (local or global)
The size of the array work. lwork is local input and must be at least
lwork≥ltau + max(lwf, lws), where if m > n, then
ltau = numroc(ja+min(m,n)-1, nb_a, MYCOL, csrc_a, NPCOL),
lwf = nb_a*(mpa0 + nqa0 + nb_a)
lws = max((nb_a*(nb_a-1))/2, (nrhsqb0 + mpb0)*nb_a) +
nb_a*nb_a
else
ltau = numroc(ia+min(m,n)-1, mb_a, MYROW, rsrc_a, NPROW),
lwf = mb_a * (mpa0 + nqa0 + mb_a)
lws = max((mb_a*(mb_a-1))/2, (npb0 + max(nqa0 +
numroc(numroc(n+iroffb, mb_a, 0, 0, NPROW), mb_a, 0, 0,
lcmp), nrhsqb0))*mb_a) + mb_a*mb_a
end if,
where lcmp = lcm/NPROW with lcm = ilcm(NPROW, NPCOL),

iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, MYROW, rsrc_a, NPROW),
iacol = indxg2p(ja, nb_a, MYCOL, csrc_a, NPCOL),

mpa0 = numroc(m+iroffa, mb_a, MYROW, iarow, NPROW),
nqa0 = numroc(n+icoffa, nb_a, MYCOL, iacol, NPCOL),
iroffb = mod(ib-1, mb_b),
icoffb = mod(jb-1, nb_b),
ibrow = indxg2p(ib, mb_b, MYROW, rsrc_b, NPROW),
ibcol = indxg2p(jb, nb_b, MYCOL, csrc_b, NPCOL),
mpb0 = numroc(m+iroffb, mb_b, MYROW, ibrow, NPROW),
nqb0 = numroc(n+icoffb, nb_b, MYCOL, ibcol, NPCOL),

NOTE
mod(x,y) is the integer remainder of x/y.

ilcm, indxg2p and numroc are ScaLAPACK tool functions; MYROW, MYCOL,
NPROW, and NPCOL can be determined by calling the function
blacs_gridinfo.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a On exit, if m≥n, sub(A) is overwritten by the details of its QR factorization as
returned by p?geqrf; if m < n, sub(A) is overwritten by details of its LQ
factorization as returned by p?gelqf.

b On exit, sub(B) is overwritten by the solution vectors, stored columnwise: if
trans = 'N' and m≥n, rows 1 to n of sub(B) contain the least squares
solution vectors; the residual sum of squares for the solution in each
column is given by the sum of squares of elements n+1 to m in that
column;
If trans = 'N' and m < n, rows 1 to n of sub(B) contain the minimum
norm solution vectors;
If trans = 'T' and m≥n, rows 1 to m of sub(B) contain the minimum norm
solution vectors; if trans = 'T' and m < n, rows 1 to m of sub(B) contain
the least squares solution vectors; the residual sum of squares for the
solution in each column is given by the sum of squares of elements m+1 to n
in that column.

work[0] On exit, work[0] contains the minimum value of lwork required for
optimum performance.

info (global)
= 0: the execution is successful.
< 0: if the i-th argument is an array and the j-th entry had an illegal value,
then info = -(i*100+j), if the i-th argument is a scalar and had an
illegal value, then info = -i.


See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
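
Since the lwork formula above is involved, the simplest approach is the lwork = -1 workspace query
it describes. The sketch below (double precision, trans = 'N') assumes the distributed matrices a
and b and their descriptors are set up by the caller; the helper name ls_solve is illustrative.

#include <stdlib.h>
#include "mkl_scalapack.h"

MKL_INT ls_solve(MKL_INT m, MKL_INT n, MKL_INT nrhs,
                 double *a, MKL_INT *desca, double *b, MKL_INT *descb)
{
    char trans = 'N';
    MKL_INT one = 1, info = 0, lwork = -1;
    double wq = 0.0;

    /* First call: query the required workspace size. */
    pdgels(&trans, &m, &n, &nrhs, a, &one, &one, desca,
           b, &one, &one, descb, &wq, &lwork, &info);
    if (info != 0) return info;

    lwork = (MKL_INT)wq;
    double *work = (double *)malloc((size_t)lwork * sizeof(double));
    if (!work) return -1;

    /* Second call: solve; for trans = 'N' and m >= n, rows 1..n of b
       hold the least squares solution. */
    pdgels(&trans, &m, &n, &nrhs, a, &one, &one, desca,
           b, &one, &one, descb, work, &lwork, &info);

    free(work);
    return info;
}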

p?syev
Computes all eigenvalues and, optionally, eigenvectors
of a real symmetric matrix.

Syntax
void pssyev (char *jobz , char *uplo , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *w , float *z , MKL_INT *iz , MKL_INT *jz , MKL_INT
*descz , float *work , MKL_INT *lwork , MKL_INT *info );
void pdsyev (char *jobz , char *uplo , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *w , double *z , MKL_INT *iz , MKL_INT *jz , MKL_INT
*descz , double *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?syev function computes all eigenvalues and, optionally, eigenvectors of a real symmetric matrix A by
calling the recommended sequence of ScaLAPACK functions.
In its present form, the function assumes a homogeneous system and makes no checks for consistency of
the eigenvalues or eigenvectors across the different processes. Because of this, it is possible that a
heterogeneous system may return incorrect results without any error messages.

Input Parameters
np = the number of rows local to a given process.
nq = the number of columns local to a given process.

jobz (global) Must be 'N' or 'V'. Specifies if it is necessary to compute the
eigenvectors:
If jobz ='N', then only eigenvalues are computed.

If jobz ='V', then eigenvalues and eigenvectors are computed.

uplo (global) Must be 'U' or 'L'. Specifies whether the upper or lower
triangular part of the symmetric matrix A is stored:
If uplo = 'U', a stores the upper triangular part of A.

If uplo = 'L', a stores the lower triangular part of A.

n (global) The number of rows and columns of the matrix A(n≥ 0).

a (local)
Block cyclic array of global size n*n and local size lld_a*LOCc(ja+n-1).
On entry, the symmetric matrix A.
If uplo = 'U', only the upper triangular part of A is used to define the
elements of the symmetric matrix.

If uplo = 'L', only the lower triangular part of A is used to define the
elements of the symmetric matrix.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

iz, jz (global) The row and column indices in the global matrix Z indicating the
first row and the first column of the submatrix Z, respectively.

descz (global and local) array of size dlen_. The array descriptor for the
distributed matrix Z.

work (local)
Array of size lwork.

lwork (local) See below for definitions of variables used to define lwork.
If no eigenvectors are requested (jobz = 'N'), then lwork≥ 5*n +
sizesytrd + 1,
where sizesytrd is the workspace for p?sytrd and is max(NB*(np +1),
3*NB).
If eigenvectors are requested (jobz = 'V') then the amount of workspace
required to guarantee that all eigenvectors are computed is:
qrmem = 2*n-2
lwmin = 5*n + n*ldc + max(sizemqrleft, qrmem) + 1
Variable definitions:
nb = desca[mb_ - 1] = desca[nb_ - 1] = descz[mb_ - 1] =
descz[nb_ - 1];
nn = max(n, nb, 2);
desca[rsrc_ - 1] = desca[csrc_ - 1] = descz[rsrc_ - 1] =
descz[csrc_ - 1] = 0
np = numroc(nn, nb, 0, 0, NPROW)
nq = numroc(max(n, nb, 2), nb, 0, 0, NPCOL)
nrc = numroc(n, nb, myprowc, 0, NPROCS)
ldc = max(1, nrc)
sizemqrleft is the workspace for p?ormtr when its side argument is 'L'.
myprowc is defined when a new context is created as follows:
call blacs_get(desca[ctxt_ - 1], 0, contextc)
call blacs_gridinit(contextc, 'R', NPROCS, 1)
call blacs_gridinfo(contextc, nprowc, npcolc, myprowc,
mypcolc)


If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a On exit, the lower triangle (if uplo='L') or the upper triangle (if
uplo='U') of A, including the diagonal, is destroyed.

w (global).
Array of size n.
On normal exit, the first m entries contain the selected eigenvalues in
ascending order.

z (local).
Array, global size n*n, local size lld_z*LOCc(jz+n-1). If jobz = 'V',
then on normal exit the first m columns of z contain the orthonormal
eigenvectors of the matrix corresponding to the selected eigenvalues.
If jobz = 'N', then z is not referenced.

work[0] On output, work[0] returns the workspace needed to guarantee
completion. If the input parameters are incorrect, work[0] may also be
incorrect.
If jobz = 'N', work[0] = minimal (optimal) amount of workspace.
If jobz = 'V', work[0] = minimal workspace required to generate all the
eigenvectors.

info (global)
If info = 0, the execution is successful.

If info < 0: If the i-th argument is an array and the j-th entry had an illegal
value, then info = -(i*100+j), if the i-th argument is a scalar and had
an illegal value, then info = -i.

If info > 0:

If info = 1 through n, the i-th eigenvalue did not converge in ?steqr2
after a total of 30n iterations.
If info= n+1, then p?syev has detected heterogeneity by finding that
eigenvalues were not identical across the process grid. In this case, the
accuracy of the results from p?syev cannot be guaranteed.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
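
The workspace query (lwork = -1) described above gives the simplest calling pattern. The sketch
below computes all eigenvalues and eigenvectors with the double precision flavor; the distributed
arrays a and z, the global array w (length n on every process), and the descriptors are assumed to
be allocated by the caller, and the helper name sym_eig is illustrative.

#include <stdlib.h>
#include "mkl_scalapack.h"

MKL_INT sym_eig(MKL_INT n, double *a, MKL_INT *desca,
                double *w, double *z, MKL_INT *descz)
{
    char jobz = 'V', uplo = 'L';
    MKL_INT one = 1, info = 0, lwork = -1;
    double wq = 0.0;

    /* Workspace query. */
    pdsyev(&jobz, &uplo, &n, a, &one, &one, desca,
           w, z, &one, &one, descz, &wq, &lwork, &info);
    if (info != 0) return info;

    lwork = (MKL_INT)wq;
    double *work = (double *)malloc((size_t)lwork * sizeof(double));
    if (!work) return -1;

    pdsyev(&jobz, &uplo, &n, a, &one, &one, desca,
           w, z, &one, &one, descz, work, &lwork, &info);

    free(work);
    return info;   /* w holds the eigenvalues in ascending order */
}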

p?syevd
Computes all eigenvalues and eigenvectors of a real
symmetric matrix by using a divide and conquer
algorithm.

Syntax
void pssyevd (char *jobz , char *uplo , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *w , float *z , MKL_INT *iz , MKL_INT *jz , MKL_INT
*descz , float *work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT *liwork , MKL_INT
*info );
void pdsyevd (char *jobz , char *uplo , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *w , double *z , MKL_INT *iz , MKL_INT *jz , MKL_INT
*descz , double *work , MKL_INT *lwork , MKL_INT *iwork , MKL_INT *liwork , MKL_INT
*info );

Include Files
• mkl_scalapack.h

Description
The p?syevd function computes all eigenvalues and eigenvectors of a real symmetric matrix A by using a
divide and conquer algorithm.

Input Parameters
np = the number of rows local to a given process.
nq = the number of columns local to a given process.

jobz (global) Must be 'N' or 'V'.

Specifies if it is necessary to compute the eigenvectors:


If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

uplo (global) Must be 'U' or 'L'.

Specifies whether the upper or lower triangular part of the symmetric matrix
A is stored:
If uplo = 'U', a stores the upper triangular part of A.

If uplo = 'L', a stores the lower triangular part of A.

n (global) The number of rows and columns of the matrix A(n≥ 0).

a (local).
Block cyclic array of global size n*n and local size lld_a*LOCc(ja+n-1).
On entry, the symmetric matrix A.
If uplo = 'U', only the upper triangular part of A is used to define the
elements of the symmetric matrix.
If uplo = 'L', only the lower triangular part of A is used to define the
elements of the symmetric matrix.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A. If desca[ctxt_ - 1] is incorrect, p?syevd cannot
guarantee correct error reporting.


iz, jz (global) The row and column indices in the global matrix Z indicating the
first row and the first column of the submatrix Z, respectively.

descz (global and local) array of size dlen_. The array descriptor for the
distributed matrix Z. descz[ctxt_ - 1] must equal desca[ctxt_ - 1].

work (local).
Array of size lwork.

lwork (local) The size of the array work.


If eigenvalues are requested:
lwork≥ max( 1+6*n + 2*np*nq, trilwmin) + 2*n
with trilwmin = 3*n + max( nb*( np + 1), 3*nb )

np = numroc( n, nb, myrow, iarow, NPROW)


nq = numroc( n, nb, mycol, iacol, NPCOL)
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the size required for optimal
performance for all work arrays. The required workspace is returned as the
first element of the corresponding work arrays, and no error message is
issued by pxerbla.

iwork (local) Workspace array of size liwork.

liwork (local) , size of iwork.

liwork = 7*n + 8*npcol + 2.

Output Parameters

a On exit, the lower triangle (if uplo = 'L'), or the upper triangle (if uplo =
'U') of A, including the diagonal, is overwritten.

w (global).
Array of size n. If info = 0, w contains the eigenvalues in the ascending
order.

z (local).
Array, global size (n, n), local size lld_z*LOCc(jz+n-1).

The z parameter contains the orthonormal eigenvectors of the matrix A.

work[0] On exit, returns adequate workspace to allow optimal performance.

iwork[0] (local).
On exit, if liwork > 0, iwork[0] returns the optimal liwork.

info (global)
If info = 0, the execution is successful.

If info < 0:

If the i-th argument is an array and the j-th entry had an illegal value, then
info = -(i*100+j). If the i-th argument is a scalar and had an illegal
value, then info = -i.

If info > 0:

The algorithm failed to compute the info/(n+1)-th eigenvalue while
working on the submatrix lying in global rows and columns mod(info, n+1).

NOTE
mod(x,y) is the integer remainder of x/y.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
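
A calling sketch for the double precision flavor is shown below. The integer workspace size is taken
from the liwork formula documented above (7*n + 8*npcol + 2) and the floating-point workspace is
obtained with the lwork = -1 query; distributed arrays and descriptors come from the caller, and the
helper name sym_eig_dc is illustrative.

#include <stdlib.h>
#include "mkl_scalapack.h"

MKL_INT sym_eig_dc(MKL_INT n, MKL_INT npcol,
                   double *a, MKL_INT *desca,
                   double *w, double *z, MKL_INT *descz)
{
    char jobz = 'V', uplo = 'L';
    MKL_INT one = 1, info = 0, lwork = -1;
    MKL_INT liwork = 7 * n + 8 * npcol + 2;     /* documented liwork bound */
    double wq = 0.0;

    MKL_INT *iwork = (MKL_INT *)malloc((size_t)liwork * sizeof(MKL_INT));
    if (!iwork) return -1;

    /* Workspace query for lwork. */
    pdsyevd(&jobz, &uplo, &n, a, &one, &one, desca,
            w, z, &one, &one, descz, &wq, &lwork, iwork, &liwork, &info);

    lwork = (MKL_INT)wq;
    double *work = (double *)malloc((size_t)lwork * sizeof(double));
    if (!work) { free(iwork); return -1; }

    pdsyevd(&jobz, &uplo, &n, a, &one, &one, desca,
            w, z, &one, &one, descz, work, &lwork, iwork, &liwork, &info);

    free(work); free(iwork);
    return info;
}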

p?syevr
Computes selected eigenvalues and, optionally,
eigenvectors of a real symmetric matrix using
Relatively Robust Representation.

Syntax
void pssyevr(char* jobz, char* range, char* uplo, MKL_INT* n, float* a, MKL_INT* ia,
MKL_INT* ja, MKL_INT* desca, float* vl, float* vu, MKL_INT* il, MKL_INT* iu, MKL_INT*
m, MKL_INT* nz, float* w, float* z, MKL_INT* iz, MKL_INT* jz, MKL_INT* descz, float*
work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT* liwork, MKL_INT* info);
void pdsyevr(char* jobz, char* range, char* uplo, MKL_INT* n, double* a, MKL_INT* ia,
MKL_INT* ja, MKL_INT* desca, double* vl, double* vu, MKL_INT* il, MKL_INT* iu,
MKL_INT* m, MKL_INT* nz, double* w, double* z, MKL_INT* iz, MKL_INT* jz, MKL_INT*
descz, double* work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT* liwork, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
p?syevr computes selected eigenvalues and, optionally, eigenvectors of a real symmetric matrix A
distributed in 2D block-cyclic format by calling the recommended sequence of ScaLAPACK functions.
First, the matrix A is reduced to real symmetric tridiagonal form. Then, the eigenproblem is solved using the
parallel MRRR algorithm. Last, if eigenvectors have been computed, a backtransformation is done.
Upon successful completion, each processor stores a copy of all computed eigenvalues in w. The eigenvector
matrix z is stored in 2D block-cyclic format distributed over all processors.

Note that subsets of eigenvalues/vectors can be selected by specifying a range of values or a range of indices
for the desired eigenvalues.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

jobz (global)
Specifies whether or not to compute the eigenvectors:
= 'N': Compute eigenvalues only.
= 'V': Compute eigenvalues and eigenvectors.

range (global)
= 'A': all eigenvalues will be found.
= 'V': all eigenvalues in the interval [vl,vu] will be found.

= 'I': the il-th through iu-th eigenvalues will be found.

uplo (global)
Specifies whether the upper or lower triangular part of the symmetric
matrix A is stored:
= 'U': Upper triangular
= 'L': Lower triangular

n (global )
The number of rows and columns of the matrix a. n≥ 0

a Block cyclic array of global size n*n, local size lld_a * LOCc(ja+n-1).

This array contains the local pieces of the symmetric distributed matrix A. If
uplo = 'U', only the upper triangular part of a is used to define the
elements of the symmetric matrix. If uplo = 'L', only the lower triangular
part of a is used to define the elements of the symmetric matrix.

On exit, the lower triangle (if uplo='L') or the upper triangle (if uplo='U')
of a, including the diagonal, is destroyed.

ia (global )
Global row index in the global matrix A that points to the beginning of the
submatrix which is to be operated on. It should be set to 1 when operating
on a full matrix.

ja (global )
Global column index in the global matrix A that points to the beginning of
the submatrix which is to be operated on. It should be set to 1 when
operating on a full matrix.

desca (global and local) array of size dlen_=9.

The array descriptor for the distributed matrix a.

vl (global )
If range='V', the lower bound of the interval to be searched for
eigenvalues. Not referenced if range = 'A' or 'I'.

vu (global )
If range='V', the upper bound of the interval to be searched for
eigenvalues. Not referenced if range = 'A' or 'I'.

il (global )
If range='I', the index (from smallest to largest) of the smallest eigenvalue
to be returned. il≥ 1.

Not referenced if range = 'A'.

iu (global )
If range='I', the index (from smallest to largest) of the largest eigenvalue
to be returned. min(il,n) ≤iu≤n.

Not referenced if range = 'A'.

iz (global )
Global row index in the global matrix Z that points to the beginning of the
submatrix which is to be operated on. It should be set to 1 when operating
on a full matrix.

jz (global )
Global column index in the global matrix Z that points to the beginning of
the submatrix which is to be operated on. It should be set to 1 when
operating on a full matrix.

descz array of size dlen_.


The array descriptor for the distributed matrix z.

The context descz[ctxt_ - 1] must equal desca[ctxt_ - 1]. Also note the
array alignment requirements specified below.

work (local workspace) array of size lwork

lwork (local )
Size of work, must be at least 3.

See below for definitions of variables used to define lwork.

If no eigenvectors are requested (jobz = 'N') then

lwork≥ 2 + 5*n + max( 12*nn, nb*( np0 + 1 ) )

If eigenvectors are requested (jobz = 'V' ) then the amount of workspace
required is:
lwork≥ 2 + 5*n + max( 18*nn, np0*mq0 + 2*nb*nb ) + (2 +
iceil( neig, nprow*npcol ))*nn
Variable definitions:

neig = number of eigenvectors requested

nb = desca[mb_ - 1] = desca[nb_ - 1] = descz[mb_ - 1] = descz[nb_ - 1]

nn = max( n, nb, 2 )

desca[rsrc_ - 1] = desca[csrc_ - 1] = descz[rsrc_ - 1] = descz[csrc_ - 1] = 0

np0 = numroc( nn, nb, 0, 0, nprow )

mq0 = numroc( max( neig, nb, 2 ), nb, 0, 0, npcol )

iceil( x, y ) is a ScaLAPACK function returning ceiling(x/y), and nprow and
npcol can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is


assumed; the function only calculates the size required for optimal
performance for all work arrays. Each of these values is returned in the first
entry of the corresponding work arrays, and no error message is issued by
pxerbla.

liwork (local )
size of iwork

Let nnp = max( n, nprow*npcol + 1, 4 ). Then:

liwork≥ 12*nnp + 2*n when the eigenvectors are desired


liwork≥ 10*nnp + 2*n when only the eigenvalues have to be computed

If liwork = -1, then liwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

m (global )
Total number of eigenvalues found. 0 ≤m≤n.

nz (global )
Total number of eigenvectors computed. 0 ≤nz≤m.

The number of columns of z that are filled.

If jobz≠ 'V', nz is not referenced.

If jobz = 'V', nz = m

w (global ) array of size n

Upon successful exit, the first m entries contain the selected eigenvalues in
ascending order.

z Block-cyclic array, global size n*n, local size lld_z*LOCc(jz+n-1).

On exit, contains local pieces of distributed matrix Z.

work On return, work[0] contains the optimal amount of workspace required for
efficient execution. If jobz='N' work[0] = optimal amount of workspace
required to compute the eigenvalues. If jobz='V' work[0] = optimal
amount of workspace required to compute eigenvalues and eigenvectors.

iwork (local workspace) array


On return, iwork[0] contains the amount of integer workspace required.

info (global )
= 0: successful exit
< 0: If the i-th argument is an array and the j-th entry had an illegal value,
then info = -(i*100+j); if the i-th argument is a scalar and had an illegal
value, then info = -i.

Application Notes
The distributed submatrices a(ia:*, ja:*) and z(iz:iz+m-1,jz:jz+n-1) must satisfy the following
alignment properties:

1. Identical (quadratic) dimension: desca[m_ - 1] = descz[m_ - 1] = desca[n_ - 1] = descz[n_ - 1]


2. Quadratic conformal blocking: desca[mb_ - 1] = desca[nb_ - 1] = descz[mb_ - 1] = descz[nb_ - 1],
desca[rsrc_ - 1] = descz[rsrc_ - 1]
3. mod( ia-1, mb_a ) = mod( iz-1, mb_z ) = 0

NOTE
mod(x,y) is the integer remainder of x/y.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
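
The Application Notes above impose three alignment properties on desca and descz. The short C sketch below is one way to verify them before calling the eigensolver documented above; it is illustrative only. The 0-based descriptor offsets follow the conventional ScaLAPACK descriptor layout (dtype_, ctxt_, m_, n_, mb_, nb_, rsrc_, csrc_, lld_ in positions 0 through 8), and the helper name check_eig_alignment is hypothetical.

/*
 * A minimal sketch of a pre-call check for the alignment properties listed
 * in the Application Notes above. The 0-based descriptor offsets below are
 * the conventional ScaLAPACK layout; they are an assumption of this sketch.
 */
#include <mkl_types.h>

/* Conventional 0-based positions inside a dlen_ = 9 array descriptor. */
enum { DTYPE_ = 0, CTXT_ = 1, M_ = 2, N_ = 3, MB_ = 4, NB_ = 5,
       RSRC_ = 6, CSRC_ = 7, LLD_ = 8 };

/* Returns 1 when desca/descz satisfy properties 1-3 above, 0 otherwise. */
static int check_eig_alignment(const MKL_INT *desca, const MKL_INT *descz,
                               MKL_INT ia, MKL_INT iz)
{
    /* Both descriptors must refer to the same BLACS context. */
    if (desca[CTXT_] != descz[CTXT_]) return 0;

    /* 1. Identical (quadratic) dimension. */
    if (desca[M_] != descz[M_] || desca[M_] != desca[N_] ||
        desca[N_] != descz[N_]) return 0;

    /* 2. Quadratic conformal blocking and matching first process row. */
    if (desca[MB_] != desca[NB_] || desca[MB_] != descz[MB_] ||
        descz[MB_] != descz[NB_] || desca[RSRC_] != descz[RSRC_]) return 0;

    /* 3. mod( ia-1, mb_a ) = mod( iz-1, mb_z ) = 0. */
    if ((ia - 1) % desca[MB_] != 0 || (iz - 1) % descz[MB_] != 0) return 0;

    return 1;
}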

p?syevx
Computes selected eigenvalues and, optionally,
eigenvectors of a symmetric matrix.

Syntax
void pssyevx (char *jobz , char *range , char *uplo , MKL_INT *n , float *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , float *vl , float *vu , MKL_INT *il , MKL_INT
*iu , float *abstol , MKL_INT *m , MKL_INT *nz , float *w , float *orfac , float *z ,
MKL_INT *iz , MKL_INT *jz , MKL_INT *descz , float *work , MKL_INT *lwork , MKL_INT
*iwork , MKL_INT *liwork , MKL_INT *ifail , MKL_INT *iclustr , float *gap , MKL_INT
*info );
void pdsyevx (char *jobz , char *range , char *uplo , MKL_INT *n , double *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , double *vl , double *vu , MKL_INT *il , MKL_INT
*iu , double *abstol , MKL_INT *m , MKL_INT *nz , double *w , double *orfac , double
*z , MKL_INT *iz , MKL_INT *jz , MKL_INT *descz , double *work , MKL_INT *lwork ,
MKL_INT *iwork , MKL_INT *liwork , MKL_INT *ifail , MKL_INT *iclustr , double *gap ,
MKL_INT *info );


Include Files
• mkl_scalapack.h

Description
The p?syevx function computes selected eigenvalues and, optionally, eigenvectors of a real symmetric matrix
A by calling the recommended sequence of ScaLAPACK functions. Eigenvalues and eigenvectors can be
selected by specifying either a range of values or a range of indices for the desired eigenvalues.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters
np = the number of rows local to a given process.
nq = the number of columns local to a given process.

jobz (global) Must be 'N' or 'V'. Specifies if it is necessary to compute the


eigenvectors:
If jobz ='N', then only eigenvalues are computed.

If jobz ='V', then eigenvalues and eigenvectors are computed.

range (global) Must be 'A', 'V', or 'I'.

If range = 'A', all eigenvalues will be found.

If range = 'V', all eigenvalues in the half-open interval (vl, vu] will be
found.
If range = 'I', the eigenvalues with indices il through iu will be found.

uplo (global) Must be 'U' or 'L'.

Specifies whether the upper or lower triangular part of the symmetric


matrix A is stored:
If uplo = 'U', a stores the upper triangular part of A.

If uplo = 'L', a stores the lower triangular part of A.

n (global) The number of rows and columns of the matrix A(n≥ 0).

a (local).
Block cyclic array of global size n*n and local size lld_a*LOCc(ja+n-1).
On entry, the symmetric matrix A.
If uplo = 'U', only the upper triangular part of A is used to define the
elements of the symmetric matrix.

If uplo = 'L', only the lower triangular part of A is used to define the
elements of the symmetric matrix.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

vl, vu (global)
If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues; vl≤vu. Not referenced if range = 'A' or 'I'.

il, iu (global)
If range ='I', the indices of the smallest and largest eigenvalues to be
returned.
Constraints: il≥ 1

min(il,n) ≤iu≤n
Not referenced if range = 'A' or 'V'.

abstol (global).
If jobz='V', setting abstol to p?lamch(context, 'U') yields the most
orthogonal eigenvectors.
The absolute error tolerance for the eigenvalues. An approximate
eigenvalue is accepted as converged when it is determined to lie in an
interval [a, b] of width less than or equal to

abstol + eps * max(|a|,|b|),


where eps is the machine precision. If abstol is less than or equal to zero,
then eps*norm(T) will be used in its place, where norm(T) is the 1-norm of
the tridiagonal matrix obtained by reducing A to tridiagonal form.
Eigenvalues will be computed most accurately when abstol is set to twice
the underflow threshold 2*p?lamch('S'), not zero. If this function returns
with (mod(info,2) ≠ 0) or (mod(info/8,2) ≠ 0), indicating that some
eigenvalues or eigenvectors did not converge, try setting abstol to
2*p?lamch('S').

orfac (global).
Specifies which eigenvectors should be reorthogonalized. Eigenvectors that
correspond to eigenvalues which are within tol=orfac*norm(A)of each
other are to be reorthogonalized. However, if the workspace is insufficient
(see lwork), tol may be decreased until all eigenvectors to be
reorthogonalized can be stored in one process. No reorthogonalization will
be done if orfac equals zero. A default value of 1.0e-3 is used if orfac is
negative. orfac should be identical on all processes.

iz, jz (global) The row and column indices in the global matrix Z indicating the
first row and the first column of the submatrix Z, respectively.

descz (global and local) array of size dlen_. The array descriptor for the
distributed matrix Z.descz[ctxt_ - 1] must equal desca[ctxt_ - 1].


work (local)
Array of size lwork.

lwork (local) The size of the array work.


See below for definitions of variables used to define lwork.
If no eigenvectors are requested (jobz = 'N'), then lwork≥ 5*n +
max(5*nn, NB*(np0 + 1)).
If eigenvectors are requested (jobz = 'V'), then the amount of workspace
required to guarantee that all eigenvectors are computed is:
lwork≥ 5*n + max(5*nn, np0*mq0 + 2*NB*NB) + iceil(neig,
NPROW*NPCOL)*nn
The computed eigenvectors may not be orthogonal if the minimal
workspace is supplied and orfac is too small. If you want to guarantee
orthogonality (at the cost of potentially poor performance) you should add
the following to lwork:
(clustersize-1)*n,
where clustersize is the number of eigenvalues in the largest cluster, where
a cluster is defined as a set of close eigenvalues:
{w[k - 1],..., w[k+clustersize-2] | w[j] ≤w[j-1] +
orfac*2*norm(A)},
where
neig = number of eigenvectors requested
nb = desca[mb_ - 1] = desca[nb_ - 1] = descz[mb_ - 1] =
descz[nb_ - 1];
nn = max(n, nb, 2);
desca[rsrc_ - 1] = desca[csrc_ - 1] = descz[rsrc_ - 1] =
descz[csrc_ - 1] = 0;
np0 = numroc(nn, nb, 0, 0, NPROW);
mq0 = numroc(max(neig, nb, 2), nb, 0, 0, NPCOL)
iceil(x, y) is a ScaLAPACK function returning ceiling(x/y)
If lwork is too small to guarantee orthogonality, p?syevx attempts to
maintain orthogonality in the clusters with the smallest spacing between the
eigenvalues.
If lwork is too small to compute all the eigenvectors requested, no
computation is performed and info= -23 is returned.
Note that when range='V', the number of requested eigenvectors is not
known until the eigenvalues are computed. In this case and if lwork is large
enough to compute the eigenvalues, p?syevx computes the eigenvalues
and as many eigenvectors as possible.
Relationship between workspace, orthogonality & performance:
Greater performance can be achieved if adequate workspace is provided. In
some situations, performance can decrease as the provided workspace
increases above the workspace amount shown below:
lwork≥max(lwork, 5*n + nsytrd_lwopt),

where lwork, as defined previously, depends upon the number of
eigenvectors requested, and
nsytrd_lwopt = n + 2*(anb+1)*(4*nps+2) + (nps + 3)*nps;
anb = pjlaenv(desca[ctxt_ - 1], 3, 'p?syttrd', 'L', 0, 0, 0,
0);
sqnpc = int(sqrt(dble(NPROW * NPCOL)));
nps = max(numroc(n, 1, 0, 0, sqnpc), 2*anb);
numroc is a ScaLAPACK tool function;
pjlaenv is a ScaLAPACK environmental inquiry function
MYROW, MYCOL, NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.
For large n, no extra workspace is needed, however the biggest boost in
performance comes for small n, so it is wise to provide the extra workspace
(typically less than a megabyte per process).
If clustersize > n/sqrt(NPROW*NPCOL), then providing enough space
to compute all the eigenvectors orthogonally will cause serious degradation
in performance. At the limit (that is, clustersize = n-1) p?stein will
perform no better than ?stein on a single processor.

For clustersize = n/sqrt(NPROW*NPCOL) reorthogonalizing all


eigenvectors will increase the total execution time by a factor of 2 or more.
For clustersize>n/sqrt(NPROW*NPCOL) execution time will grow as the
square of the cluster size, all other factors remaining equal and assuming
enough workspace. Less workspace means less reorthogonalization but
faster execution.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the size required for optimal
performance for all work arrays. Each of these values is returned in the first
entry of the corresponding work arrays, and no error message is issued by
pxerbla.

iwork (local) Workspace array.

liwork (local) , size of iwork. liwork≥ 6*nnp

Where: nnp = max(n, NPROW*NPCOL + 1, 4)

If liwork = -1, then liwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a On exit, the lower triangle (if uplo = 'L') or the upper triangle (if uplo =
'U')of A, including the diagonal, is overwritten.

m (global) The total number of eigenvalues found; 0 ≤m≤ n.

nz (global) Total number of eigenvectors computed. 0 ≤ nz≤ m.


The number of columns of z that are filled.


If jobz≠'V', nz is not referenced.

If jobz = 'V', nz = m unless the user supplies insufficient space and
p?syevx is not able to detect this before beginning computation. To get all the
eigenvectors requested, the user must supply both sufficient space to hold
the eigenvectors in z (m≤descz[n_ - 1]) and sufficient workspace to
compute them (see lwork). p?syevx is always able to detect insufficient
space without computation unless range = 'V'.

w (global).
Array of size n. The first m elements contain the selected eigenvalues in
ascending order.

z (local).
Array, global size n*n, local size lld_z*LOCc(jz+n-1).

If jobz = 'V', then on normal exit the first m columns of z contain the
orthonormal eigenvectors of the matrix corresponding to the selected
eigenvalues. If an eigenvector fails to converge, then that column of z
contains the latest approximation to the eigenvector, and the index of the
eigenvector is returned in ifail.
If jobz = 'N', then z is not referenced.

work[0] On exit, returns adequate workspace to allow optimal
performance.

iwork[0] On return, iwork[0] contains the amount of integer workspace required.

ifail (global).
Array of size n.
If jobz = 'V', then on normal exit, the first m elements of ifail are zero. If
(mod(info,2) ≠ 0) on exit, then ifail contains the indices of the
eigenvectors that failed to converge.
If jobz = 'N', then ifail is not referenced.

iclustr (global) Array of size (2*NPROW*NPCOL)

This array contains indices of eigenvectors corresponding to a cluster of


eigenvalues that could not be reorthogonalized due to insufficient
workspace (see lwork, orfac and info). Eigenvectors corresponding to
clusters of eigenvalues indexed iclustr[2*i - 2] to iclustr[2*i - 1] could
not be reorthogonalized due to lack of workspace. Hence the eigenvectors
corresponding to these clusters may not be orthogonal. iclustr is a zero
terminated array. iclustr[2*k - 1] ≠ 0 and iclustr[2*k] = 0 if and
only if k is the number of clusters.
iclustr is not referenced if jobz = 'N'.

gap (global)
Array of size NPROW*NPCOL

This array contains the gap between eigenvalues whose eigenvectors could
not be reorthogonalized. The output values in this array correspond to the
clusters indicated by the array iclustr. As a result, the dot product between
eigenvectors corresponding to the i-th cluster may be as high as (C*n)/
gap[i - 1] where C is a small constant.

info (global)
If info = 0, the execution is successful.

If info < 0:

If the i-th argument is an array and the j-th entry had an illegal value, then
info = -(i*100+j); if the i-th argument is a scalar and had an illegal
value, then info = -i.

If info> 0: if (mod(info,2)≠0), then one or more eigenvectors failed to


converge. Their indices are stored in ifail. Ensure abstol=2.0*p?
lamch('U').
If (mod(info/2,2)≠0), then eigenvectors corresponding to one or more
clusters of eigenvalues could not be reorthogonalized because of insufficient
workspace.The indices of the clusters are stored in the array iclustr.
If (mod(info/4,2)≠0), then space limit prevented p?syevx from
computing all of the eigenvectors between vl and vu. The number of
eigenvectors computed is returned in nz.
If (mod(info/8,2)≠0), then p?stebz failed to compute eigenvalues.
Ensure abstol=2.0*p?lamch('U').

NOTE
mod(x,y) is the integer remainder of x/y.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
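
The following C sketch illustrates the two-phase calling sequence described above for pdsyevx: a workspace query with lwork = liwork = -1 followed by the actual computation. It assumes the caller has already created the BLACS process grid, the block-cyclic descriptors desca and descz, and the local pieces of a, z, and w; the wrapper name solve_pdsyevx is hypothetical.

/*
 * Minimal sketch of the lwork/liwork = -1 workspace query for pdsyevx.
 * Grid and descriptor setup is assumed to be done elsewhere by the caller.
 */
#include <stdlib.h>
#include <mkl_types.h>
#include <mkl_scalapack.h>

MKL_INT solve_pdsyevx(MKL_INT n, double *a, MKL_INT *desca,
                      double *z, MKL_INT *descz, double *w,
                      double vl, double vu, double abstol, double orfac,
                      MKL_INT nprow, MKL_INT npcol)
{
    char jobz = 'V', range = 'V', uplo = 'U';
    MKL_INT ia = 1, ja = 1, iz = 1, jz = 1, il = 1, iu = n;
    MKL_INT m = 0, nz = 0, info = 0;
    MKL_INT lwork = -1, liwork = -1;
    double work_query;
    MKL_INT iwork_query;

    MKL_INT *ifail   = (MKL_INT *)malloc((size_t)n * sizeof(MKL_INT));
    MKL_INT *iclustr = (MKL_INT *)malloc((size_t)(2 * nprow * npcol) * sizeof(MKL_INT));
    double  *gap     = (double  *)malloc((size_t)(nprow * npcol) * sizeof(double));

    /* Workspace query: required sizes come back in work_query and iwork_query. */
    pdsyevx(&jobz, &range, &uplo, &n, a, &ia, &ja, desca, &vl, &vu, &il, &iu,
            &abstol, &m, &nz, w, &orfac, z, &iz, &jz, descz,
            &work_query, &lwork, &iwork_query, &liwork,
            ifail, iclustr, gap, &info);

    lwork  = (MKL_INT)work_query;
    liwork = iwork_query;
    double  *work  = (double  *)malloc((size_t)lwork  * sizeof(double));
    MKL_INT *iwork = (MKL_INT *)malloc((size_t)liwork * sizeof(MKL_INT));

    /* Actual computation of the selected eigenvalues and eigenvectors. */
    pdsyevx(&jobz, &range, &uplo, &n, a, &ia, &ja, desca, &vl, &vu, &il, &iu,
            &abstol, &m, &nz, w, &orfac, z, &iz, &jz, descz,
            work, &lwork, iwork, &liwork, ifail, iclustr, gap, &info);

    free(work); free(iwork); free(ifail); free(iclustr); free(gap);
    return info;
}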

p?heev
Computes all eigenvalues and, optionally,
eigenvectors of a complex Hermitian matrix.

Syntax
void pcheev (char *jobz , char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *w , MKL_Complex8 *z , MKL_INT *iz , MKL_INT
*jz , MKL_INT *descz , MKL_Complex8 *work , MKL_INT *lwork , float *rwork , MKL_INT
*lrwork , MKL_INT *info );
void pzheev (char *jobz , char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *w , MKL_Complex16 *z , MKL_INT *iz , MKL_INT
*jz , MKL_INT *descz , MKL_Complex16 *work , MKL_INT *lwork , double *rwork , MKL_INT
*lrwork , MKL_INT *info );

Include Files
• mkl_scalapack.h


Description
The p?heev function computes all eigenvalues and, optionally, eigenvectors of a complex Hermitian matrix A
by calling the recommended sequence of ScaLAPACK functions. The function assumes a homogeneous
system and makes spot checks of the consistency of the eigenvalues across the different processes. A
heterogeneous system may return incorrect results without any error messages.

Input Parameters
np = the number of rows local to a given process.
nq = the number of columns local to a given process.

jobz (global) Must be 'N' or 'V'.

Specifies if it is necessary to compute the eigenvectors:


If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

uplo (global) Must be 'U' or 'L'.

Specifies whether the upper or lower triangular part of the Hermitian matrix
A is stored:
If uplo = 'U', a stores the upper triangular part of A.

If uplo = 'L', a stores the lower triangular part of A.

n (global) The number of rows and columns of the matrix A(n≥ 0).

a (local).
Block cyclic array of global size n*n and local size lld_a*LOCc(ja+n-1).
On entry, the Hermitian matrix A.
If uplo = 'U', only the upper triangular part of A is used to define the
elements of the Hermitian matrix.
If uplo = 'L', only the lower triangular part of A is used to define the
elements of the Hermitian matrix.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A. If desca[ctxt_ - 1] is incorrect, p?heev cannot
guarantee correct error reporting.

iz, jz (global) The row and column indices in the global matrix Z indicating the
first row and the first column of the submatrix Z, respectively.

descz (global and local) array of size dlen_. The array descriptor for the
distributed matrix Z. descz[ctxt_ - 1] must equal desca[ctxt_ - 1].

work (local).
Array of size lwork.

lwork (local) The size of the array work.


If only eigenvalues are requested (jobz = 'N'):

lwork≥max(nb*(np0 + 1), 3) + 3*n
If eigenvectors are requested (jobz = 'V'), then the amount of workspace
required:
lwork≥ (np0+nq0+nb)*nb + 3*n + n^2
with nb = desca[mb_ - 1] = desca[nb_ - 1] = descz[mb_ - 1] = descz[nb_ - 1]
np0 = numroc(nn, nb, 0, 0, NPROW).
nq0 = numroc( max( n, nb, 2 ), nb, 0, 0, NPCOL).
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the size required for optimal
performance for all work arrays. The required workspace is returned as the
first element of the corresponding work arrays, and no error message is
issued by pxerbla.

rwork (local).
Workspace array of size lrwork.

lrwork (local) The size of the array rwork.

See below for definitions of variables used to define lrwork.

If no eigenvectors are requested (jobz = 'N'), then lrwork≥ 2*n.

If eigenvectors are requested (jobz = 'V'), then lrwork≥ 2*n + 2*n-2.

If lrwork = -1, then lrwork is global input and a workspace query is


assumed; the function only calculates the minimum size required for the
rwork array. The required workspace is returned as the first element of
rwork, and no error message is issued by pxerbla.

Output Parameters

a On exit, the lower triangle (if uplo = 'L'), or the upper triangle (if uplo =
'U') of A, including the diagonal, is overwritten.

w (global).
Array of size n. The first m elements contain the selected eigenvalues in
ascending order.

z (local).
Array, global size n*n, local size lld_z*LOCc(jz+n-1).

If jobz ='V', then on normal exit the first m columns of z contain the
orthonormal eigenvectors of the matrix corresponding to the selected
eigenvalues. If an eigenvector fails to converge, then that column of z
contains the latest approximation to the eigenvector, and the index of the
eigenvector is returned in ifail.
If jobz = 'N', then z is not referenced.

work[0] On exit, returns adequate workspace to allow optimal
performance.

If jobz ='N', then work[0] = minimal workspace only for eigenvalues.

If jobz ='V', then work[0] = minimal workspace required to generate all
the eigenvectors.

rwork[0] (local)
On output, rwork[0] returns workspace required to guarantee completion.

info (global)
If info = 0, the execution is successful.

If info < 0:

If the i-th argument is an array and the j-th entry had an illegal value, then
info = -(i*100+j). If the i-th argument is a scalar and had an illegal
value, then info = -i.

If info> 0:

If info = i, where 1 ≤ i ≤ n, the i-th eigenvalue did not converge in ?steqr2
after a total of 30*n iterations.
If info = n+1, then p?heev detected heterogeneity, and the accuracy of
the results cannot be guaranteed.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
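
As a rough illustration of the workspace query described above, the C sketch below calls pzheev twice: once with lwork = lrwork = -1 to obtain the required sizes, then again to compute the eigendecomposition. Grid and descriptor setup is assumed to be done by the caller; the wrapper name solve_pzheev is hypothetical.

/*
 * Minimal sketch of pzheev with the lwork/lrwork = -1 workspace query.
 * The caller is assumed to have built the BLACS grid, desca/descz, and the
 * local parts of a, z, and w.
 */
#include <stdlib.h>
#include <mkl_types.h>
#include <mkl_scalapack.h>

MKL_INT solve_pzheev(MKL_INT n, MKL_Complex16 *a, MKL_INT *desca,
                     double *w, MKL_Complex16 *z, MKL_INT *descz)
{
    char jobz = 'V', uplo = 'U';
    MKL_INT ia = 1, ja = 1, iz = 1, jz = 1, info = 0;
    MKL_INT lwork = -1, lrwork = -1;
    MKL_Complex16 work_query;
    double rwork_query;

    /* Query: required sizes are returned in work_query.real and rwork_query. */
    pzheev(&jobz, &uplo, &n, a, &ia, &ja, desca, w, z, &iz, &jz, descz,
           &work_query, &lwork, &rwork_query, &lrwork, &info);

    lwork  = (MKL_INT)work_query.real;
    lrwork = (MKL_INT)rwork_query;
    MKL_Complex16 *work  = (MKL_Complex16 *)malloc((size_t)lwork  * sizeof(MKL_Complex16));
    double        *rwork = (double        *)malloc((size_t)lrwork * sizeof(double));

    /* Actual eigenvalue/eigenvector computation. */
    pzheev(&jobz, &uplo, &n, a, &ia, &ja, desca, w, z, &iz, &jz, descz,
           work, &lwork, rwork, &lrwork, &info);

    free(work); free(rwork);
    return info;
}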

p?heevd
Computes all eigenvalues and eigenvectors of a
complex Hermitian matrix by using a divide and
conquer algorithm.

Syntax
void pcheevd (char *jobz , char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *w , MKL_Complex8 *z , MKL_INT *iz , MKL_INT
*jz , MKL_INT *descz , MKL_Complex8 *work , MKL_INT *lwork , float *rwork , MKL_INT
*lrwork , MKL_INT *iwork , MKL_INT *liwork , MKL_INT *info );
void pzheevd (char *jobz , char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *w , MKL_Complex16 *z , MKL_INT *iz , MKL_INT
*jz , MKL_INT *descz , MKL_Complex16 *work , MKL_INT *lwork , double *rwork , MKL_INT
*lrwork , MKL_INT *iwork , MKL_INT *liwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?heevd function computes all eigenvalues and eigenvectors of a complex Hermitian matrix A by using a
divide and conquer algorithm.

Input Parameters
np = the number of rows local to a given process.
nq = the number of columns local to a given process.

jobz (global) Must be 'N' or 'V'.

Specifies if it is necessary to compute the eigenvectors:


If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

uplo (global) Must be 'U' or 'L'.

Specifies whether the upper or lower triangular part of the Hermitian matrix
A is stored:
If uplo = 'U', a stores the upper triangular part of A.

If uplo = 'L', a stores the lower triangular part of A.

n (global) The number of rows and columns of the matrix A(n≥ 0).

a (local).
Block cyclic array of global size n*n and local size lld_a*LOCc(ja+n-1).
On entry, the Hermitian matrix A.
If uplo = 'U', only the upper triangular part of A is used to define the
elements of the Hermitian matrix.
If uplo = 'L', only the lower triangular part of A is used to define the
elements of the Hermitian matrix.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A. If desca[ctxt_ - 1] is incorrect, p?heevd cannot
guarantee correct error reporting.

iz, jz (global) The row and column indices in the global matrix Z indicating the
first row and the first column of the submatrix Z, respectively.

descz (global and local) array of size dlen_. The array descriptor for the
distributed matrix Z. descz[ctxt_ - 1] must equal desca[ctxt_ - 1].

work (local).
Array of size lwork.

lwork (local) The size of the array work.


If eigenvalues are requested:
lwork = n + (np0 + mq0 + nb)*nb
with np0 = numroc( max( n, nb, 2 ), nb, 0, 0, NPROW)

mq0 = numroc( max( n, nb, 2 ), nb, 0, 0, NPCOL)


If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the size required for optimal
performance for all work arrays. The required workspace is returned as the
first element of the corresponding work arrays, and no error message is
issued by pxerbla.

rwork (local).


Workspace array of size lrwork.

lrwork (local) The size of the array rwork.

lrwork≥ 1 + 9*n + 3*np*nq,


with np = numroc( n, nb, myrow, iarow, NPROW)

nq = numroc( n, nb, mycol, iacol, NPCOL)

iwork (local) Workspace array of size liwork.

liwork (local) , size of iwork.

liwork = 7*n + 8*npcol + 2.

Output Parameters

a On exit, the lower triangle (if uplo = 'L'), or the upper triangle (if uplo =
'U') of A, including the diagonal, is overwritten.

w (global).
Array of size n. If info = 0, w contains the eigenvalues in the ascending
order.

z (local).
Array, global size n*n, local size lld_z*LOCc(jz+n-1).

The z parameter contains the orthonormal eigenvectors of the matrix A.

work[0] On exit, returns adequate workspace to allow optimal performance.

rwork[0] (local)
On output, rwork[0] returns workspace required to guarantee completion.

iwork[0] (local).
On return, iwork[0] contains the amount of integer workspace required.

info (global)
If info = 0, the execution is successful.

If info < 0:

If the i-th argument is an array and the j-th entry had an illegal value, then
info = -(i*100+j). If the i-th argument is a scalar and had an illegal
value, then info = -i.

If info> 0:

If info = i, where 1 ≤ i ≤ n, the i-th eigenvalue did not converge.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
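
The sketch below shows one way to size the workspaces for pzheevd: lrwork and liwork are taken from the closed-form bounds given above (using caller-supplied local dimensions np and nq, e.g. obtained from numroc), while lwork is obtained with the lwork = -1 query. The wrapper name and the assumption that the grid and descriptors are already set up are illustrative only.

/*
 * Minimal sketch for pzheevd: lrwork/liwork sized from the documented
 * formulas, lwork obtained via the lwork = -1 query.
 */
#include <stdlib.h>
#include <mkl_types.h>
#include <mkl_scalapack.h>

MKL_INT solve_pzheevd(MKL_INT n, MKL_Complex16 *a, MKL_INT *desca,
                      double *w, MKL_Complex16 *z, MKL_INT *descz,
                      MKL_INT np, MKL_INT nq, MKL_INT npcol)
{
    char jobz = 'V', uplo = 'L';
    MKL_INT ia = 1, ja = 1, iz = 1, jz = 1, info = 0;

    /* Sizes from the formulas documented above. */
    MKL_INT lrwork = 1 + 9 * n + 3 * np * nq;
    MKL_INT liwork = 7 * n + 8 * npcol + 2;
    double  *rwork = (double  *)malloc((size_t)lrwork * sizeof(double));
    MKL_INT *iwork = (MKL_INT *)malloc((size_t)liwork * sizeof(MKL_INT));

    /* Workspace query for the complex work array only. */
    MKL_INT lwork = -1;
    MKL_Complex16 work_query;
    pzheevd(&jobz, &uplo, &n, a, &ia, &ja, desca, w, z, &iz, &jz, descz,
            &work_query, &lwork, rwork, &lrwork, iwork, &liwork, &info);

    lwork = (MKL_INT)work_query.real;
    MKL_Complex16 *work = (MKL_Complex16 *)malloc((size_t)lwork * sizeof(MKL_Complex16));

    /* Divide and conquer eigendecomposition. */
    pzheevd(&jobz, &uplo, &n, a, &ia, &ja, desca, w, z, &iz, &jz, descz,
            work, &lwork, rwork, &lrwork, iwork, &liwork, &info);

    free(work); free(rwork); free(iwork);
    return info;
}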

p?heevr
Computes selected eigenvalues and, optionally,
eigenvectors of a Hermitian matrix using Relatively
Robust Representation.

Syntax
void pcheevr(char* jobz, char* range, char* uplo, MKL_INT* n, MKL_Complex8* a,
MKL_INT* ia, MKL_INT* ja, MKL_INT* desca, float* vl, float* vu, MKL_INT* il, MKL_INT*
iu, MKL_INT* m, MKL_INT* nz, float* w, MKL_Complex8* z, MKL_INT* iz, MKL_INT* jz,
MKL_INT* descz, MKL_Complex8* work, MKL_INT* lwork, float* rwork, MKL_INT* lrwork,
MKL_INT* iwork, MKL_INT* liwork, MKL_INT* info);
void pzheevr(char* jobz, char* range, char* uplo, MKL_INT* n, MKL_Complex16* a,
MKL_INT* ia, MKL_INT* ja, MKL_INT* desca, double* vl, double* vu, MKL_INT* il,
MKL_INT* iu, MKL_INT* m, MKL_INT* nz, double* w, MKL_Complex16* z, MKL_INT* iz,
MKL_INT* jz, MKL_INT* descz, MKL_Complex16* work, MKL_INT* lwork, double* rwork,
MKL_INT* lrwork, MKL_INT* iwork, MKL_INT* liwork, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
p?heevr computes selected eigenvalues and, optionally, eigenvectors of a complex Hermitian matrix A
distributed in 2D block-cyclic format by calling the recommended sequence of ScaLAPACK functions.
First, the matrix A is reduced to complex Hermitian tridiagonal form. Then, the eigenproblem is solved using
the parallel MRRR algorithm. Last, if eigenvectors have been computed, a backtransformation is done.
Upon successful completion, each processor stores a copy of all computed eigenvalues in w. The eigenvector
matrix Z is stored in 2D block-cyclic format distributed over all processors.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

jobz (global)
Specifies whether or not to compute the eigenvectors:
= 'N': Compute eigenvalues only.
= 'V': Compute eigenvalues and eigenvectors.

range (global)
= 'A': all eigenvalues will be found.


= 'V': all eigenvalues in the interval [vl,vu] will be found.

= 'I': the il-th through iu-th eigenvalues will be found.

uplo (global)
Specifies whether the upper or lower triangular part of the Hermitian matrix
A is stored:
= 'U': Upper triangular
= 'L': Lower triangular

n (global )
The number of rows and columns of the matrix A. n≥ 0

a Block-cyclic array, global size n*n, local size lld_a*LOCc(ja+n-1)

Contains the local pieces of the Hermitian distributed matrix A. If uplo =


'U', only the upper triangular part of a is used to define the elements of the
Hermitian matrix. If uplo = 'L', only the lower triangular part of a is used to
define the elements of the Hermitian matrix.

ia (global )
Global row index in the global matrix A that points to the beginning of the
submatrix which is to be operated on. It should be set to 1 when operating
on a full matrix.

ja (global )
Global column index in the global matrix A that points to the beginning of
the submatrix which is to be operated on. It should be set to 1 when
operating on a full matrix.

desca (global and local) array of size dlen_. (The ScaLAPACK descriptor length is
dlen_ = 9.)
The array descriptor for the distributed matrix a. The descriptor stores
details about the 2D block-cyclic storage, see the notes below. If desca is
incorrect, p?heevr cannot work correctly.

Also note the array alignment requirements specified below

vl (global)
If range='V', the lower bound of the interval to be searched for
eigenvalues. Not referenced if range = 'A' or 'I'.

vu (global)
If range='V', the upper bound of the interval to be searched for
eigenvalues. Not referenced if range = 'A' or 'I'.

il (global )
If range='I', the index (from smallest to largest) of the smallest eigenvalue
to be returned. il≥ 1.

Not referenced if range = 'A'.

iu (global )

If range='I', the index (from smallest to largest) of the largest eigenvalue
to be returned. min(il,n) ≤iu≤n.

Not referenced if range = 'A'.

iz (global )
Global row index in the global matrix Z that points to the beginning of the
submatrix which is to be operated on. It should be set to 1 when operating
on a full matrix.

jz (global )
Global column index in the global matrix Z that points to the beginning of
the submatrix which is to be operated on. It should be set to 1 when
operating on a full matrix.

descz (global and local) array of size dlen_.


The array descriptor for the distributed matrix z. descz[ctxt_ - 1] must
equal desca[ctxt_ - 1]

work (local workspace) array of size lwork

lwork (local )
Size of work array, must be at least 3.

If only eigenvalues are requested:


lwork≥n + max( nb * ( np00 + 1 ), nb * 3 )
If eigenvectors are requested:
lwork≥n + ( np00 + mq00 + nb ) * nb
For definitions of np00 and mq00, see lrwork.

For optimal performance, greater workspace is needed, i.e.


lwork≥ max( lwork, nhetrd_lwork )
Where lwork is as defined above, and

nhetrd_lwork = n + 2*( anb+1 )*( 4*nps+2 ) + ( nps + 1 ) * nps

ictxt = desca[ctxt_ - 1]

anb = pjlaenv( ictxt, 3, 'PCHETTRD', 'L', 0, 0, 0, 0 )

sqnpc = sqrt( real( nprow * npcol ) )


nps = max( numroc( n, 1, 0, 0, sqnpc ), 2*anb )

If lwork = -1, then lwork is global input and a workspace query is


assumed; the function only calculates the optimal size for all work arrays.
Each of these values is returned in the first entry of the corresponding work
array, and no error message is issued by pxerbla.

rwork (local workspace) array of size lrwork

lrwork (local )
Size of rwork, must be at least 3.

See below for definitions of variables used to define lrwork.


If no eigenvectors are requested (jobz = 'N') then

lrwork≥ 2 + 5 * n + max( 12 * n, nb * ( np00 + 1 ) )


If eigenvectors are requested (jobz = 'V' ) then the amount of workspace
required is:
lrwork≥ 2 + 5 * n + max( 18*n, np00 * mq00 + 2 * nb * nb ) +
(2 + iceil( neig, nprow*npcol))*n

NOTE
iceil(x,y) is the ceiling of x/y.

Variable definitions:
neig = number of eigenvectors requested
nb = desca[ mb_ - 1] = desca[ nb_ - 1] = descz[ mb_ - 1] = descz[nb_
- 1]
nn = max( n, nb, 2 )

desca[ rsrc_ - 1] = desca[csrc_ - 1] = descz[ rsrc_ - 1] = descz[csrc_ -


1] = 0
np00 = numroc( nn, nb, 0, 0, nprow )

mq00 = numroc( max( neig, nb, 2 ), nb, 0, 0, npcol )

iceil( x, y ) is a ScaLAPACK function returning ceiling(x/y), and nprow and


npcol can be determined by calling the function blacs_gridinfo.

If lrwork = -1, then lrwork is global input and a workspace query is


assumed; the function only calculates the size required for optimal
performance for all work arrays. Each of these values is returned in the first
entry of the corresponding work arrays, and no error message is issued by
pxerbla

iwork (local workspace) array of size liwork

liwork (local )
size of iwork

Let nnp = max( n, nprow*npcol + 1, 4 ). Then:

liwork≥ 12*nnp + 2*n when the eigenvectors are desired


liwork≥ 10*nnp + 2*n when only the eigenvalues have to be computed

If liwork = -1, then liwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla

Output Parameters

a The lower triangle (if uplo='L') or the upper triangle (if uplo='U') of a,
including the diagonal, is destroyed.

m (global )

Total number of eigenvalues found. 0 ≤m≤n.

nz (global )
Total number of eigenvectors computed. 0 ≤nz≤m.

The number of columns of z that are filled.

If jobz≠ 'V', nz is not referenced.

If jobz = 'V', nz = m

w (global ) array of size n

On normal exit, the first m entries contain the selected eigenvalues in


ascending order.

z (local ) array, global size n*n, local size lld_z*LOCc(jz+n-1)

If jobz = 'V', then on normal exit the first m columns of z contain the
orthonormal eigenvectors of the matrix corresponding to the selected
eigenvalues.
If jobz = 'N', then z is not referenced.

work On exit, work[0] returns adequate workspace to allow optimal
performance.

rwork On return, rwork[0] contains the optimal amount of workspace required
for efficient execution. If jobz='N', rwork[0] = optimal amount of
workspace required to compute the eigenvalues. If jobz='V', rwork[0] =
optimal amount of workspace required to compute eigenvalues and
eigenvectors.

iwork On return, iwork[0] contains the amount of integer workspace required.

info (global )
= 0: successful exit
< 0: If the i-th argument is an array and the j-th entry had an illegal value,
then info = -(i*100+j); if the i-th argument is a scalar and had an illegal
value, then info = -i.

Application Notes
The distributed submatrices a(ia:*, ja:*) and z(iz:iz+m-1,jz:jz+n-1) must satisfy the following
alignment properties:

1. Identical (quadratic) dimension: desca[m_ - 1] = descz[m_ - 1] = desca[n_ - 1] = descz[n_ - 1]


2. Quadratic conformal blocking: desca[mb_ - 1] = desca[nb_ - 1] = descz[mb_ - 1] = descz[nb_ - 1],
desca[rsrc_ - 1] = descz[rsrc_ - 1]
3. mod( ia-1, mb_a ) = mod( iz-1, mb_z ) = 0

NOTE
mod(x,y) is the integer remainder of x/y.


See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
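
A minimal C sketch of the calling sequence described above for pzheevr, requesting eigenvalues il through iu (range = 'I') and using the lwork/lrwork/liwork = -1 workspace query. The BLACS grid and the descriptors desca/descz are assumed to exist already; the wrapper name solve_pzheevr is hypothetical.

/*
 * Minimal sketch for pzheevr with range = 'I': workspace query, then the
 * MRRR-based computation of eigenvalues il..iu and their eigenvectors.
 */
#include <stdlib.h>
#include <mkl_types.h>
#include <mkl_scalapack.h>

MKL_INT solve_pzheevr(MKL_INT n, MKL_Complex16 *a, MKL_INT *desca,
                      MKL_INT il, MKL_INT iu,
                      double *w, MKL_Complex16 *z, MKL_INT *descz)
{
    char jobz = 'V', range = 'I', uplo = 'U';
    MKL_INT ia = 1, ja = 1, iz = 1, jz = 1, m = 0, nz = 0, info = 0;
    double vl = 0.0, vu = 0.0;            /* not referenced when range = 'I' */
    MKL_INT lwork = -1, lrwork = -1, liwork = -1;
    MKL_Complex16 work_query;
    double rwork_query;
    MKL_INT iwork_query;

    /* Workspace query. */
    pzheevr(&jobz, &range, &uplo, &n, a, &ia, &ja, desca, &vl, &vu, &il, &iu,
            &m, &nz, w, z, &iz, &jz, descz,
            &work_query, &lwork, &rwork_query, &lrwork, &iwork_query, &liwork,
            &info);

    lwork  = (MKL_INT)work_query.real;
    lrwork = (MKL_INT)rwork_query;
    liwork = iwork_query;
    MKL_Complex16 *work  = (MKL_Complex16 *)malloc((size_t)lwork  * sizeof(MKL_Complex16));
    double        *rwork = (double        *)malloc((size_t)lrwork * sizeof(double));
    MKL_INT       *iwork = (MKL_INT       *)malloc((size_t)liwork * sizeof(MKL_INT));

    /* Actual computation. */
    pzheevr(&jobz, &range, &uplo, &n, a, &ia, &ja, desca, &vl, &vu, &il, &iu,
            &m, &nz, w, z, &iz, &jz, descz,
            work, &lwork, rwork, &lrwork, iwork, &liwork, &info);

    free(work); free(rwork); free(iwork);
    return info;
}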

p?heevx
Computes selected eigenvalues and, optionally,
eigenvectors of a Hermitian matrix.

Syntax
void pcheevx (char *jobz , char *range , char *uplo , MKL_INT *n , MKL_Complex8 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *vl , float *vu , MKL_INT *il ,
MKL_INT *iu , float *abstol , MKL_INT *m , MKL_INT *nz , float *w , float *orfac ,
MKL_Complex8 *z , MKL_INT *iz , MKL_INT *jz , MKL_INT *descz , MKL_Complex8 *work ,
MKL_INT *lwork , float *rwork , MKL_INT *lrwork , MKL_INT *iwork , MKL_INT *liwork ,
MKL_INT *ifail , MKL_INT *iclustr , float *gap , MKL_INT *info );
void pzheevx (char *jobz , char *range , char *uplo , MKL_INT *n , MKL_Complex16 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *vl , double *vu , MKL_INT *il ,
MKL_INT *iu , double *abstol , MKL_INT *m , MKL_INT *nz , double *w , double *orfac ,
MKL_Complex16 *z , MKL_INT *iz , MKL_INT *jz , MKL_INT *descz , MKL_Complex16 *work ,
MKL_INT *lwork , double *rwork , MKL_INT *lrwork , MKL_INT *iwork , MKL_INT *liwork ,
MKL_INT *ifail , MKL_INT *iclustr , double *gap , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?heevx function computes selected eigenvalues and, optionally, eigenvectors of a complex Hermitian
matrix A by calling the recommended sequence of ScaLAPACK functions. Eigenvalues and eigenvectors can
be selected by specifying either a range of values or a range of indices for the desired eigenvalues.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters
np = the number of rows local to a given process.
nq = the number of columns local to a given process.

jobz (global) Must be 'N' or 'V'.

Specifies if it is necessary to compute the eigenvectors:


If jobz = 'N', then only eigenvalues are computed.

If jobz = 'V', then eigenvalues and eigenvectors are computed.

range (global) Must be 'A', 'V', or 'I'.

If range = 'A', all eigenvalues will be found.

If range = 'V', all eigenvalues in the half-open interval (vl, vu] will be
found.
If range = 'I', the eigenvalues with indices il through iu will be found.

uplo (global) Must be 'U' or 'L'.

Specifies whether the upper or lower triangular part of the Hermitian matrix
A is stored:
If uplo = 'U', a stores the upper triangular part of A.

If uplo = 'L', a stores the lower triangular part of A.

n (global) The number of rows and columns of the matrix A(n≥ 0).

a (local).
Block cyclic array of global size n*n and local size lld_a*LOCc(ja+n-1).
On entry, the Hermitian matrix A.
If uplo = 'U', only the upper triangular part of A is used to define the
elements of the Hermitian matrix.
If uplo = 'L', only the lower triangular part of A is used to define the
elements of the Hermitian matrix.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A. If desca[ctxt_ - 1] is incorrect, p?heevx cannot
guarantee correct error reporting.

vl, vu (global)
If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues; not referenced if range = 'A' or 'I'.

il, iu (global)
If range ='I', the indices of the smallest and largest eigenvalues to be
returned.
Constraints:
il≥ 1; min(il,n) ≤iu≤n.
Not referenced if range = 'A' or 'V'.

abstol (global).
If jobz='V', setting abstol to p?lamch(context, 'U') yields the most
orthogonal eigenvectors.


The absolute error tolerance for the eigenvalues. An approximate


eigenvalue is accepted as converged when it is determined to lie in an
interval [a, b] of width less than or equal to abstol+eps*max(|a|,|b|),
where eps is the machine precision. If abstol is less than or equal to zero,
then eps*norm(T) will be used in its place, where norm(T) is the 1-norm
of the tridiagonal matrix obtained by reducing A to tridiagonal form.
Eigenvalues are computed most accurately when abstol is set to twice the
underflow threshold 2*p?lamch('S'), not zero. If this function returns
with ((mod(info,2)≠0).or.(mod(info/8,2)≠0)), indicating that some
eigenvalues or eigenvectors did not converge, try setting abstol to 2*p?
lamch('S').

NOTE
mod(x,y) is the integer remainder of x/y.

orfac (global).
Specifies which eigenvectors should be reorthogonalized. Eigenvectors that
correspond to eigenvalues which are within tol=orfac*norm(A) of each
other are to be reorthogonalized. However, if the workspace is insufficient
(see lwork), tol may be decreased until all eigenvectors to be
reorthogonalized can be stored in one process. No reorthogonalization will
be done if orfac equals zero. A default value of 1.0e-3 is used if orfac is
negative.
orfac should be identical on all processes.

iz, jz (global) The row and column indices in the global matrix Z indicating the
first row and the first column of the submatrix Z, respectively.

descz (global and local) array of size dlen_. The array descriptor for the
distributed matrix Z. descz[ctxt_ - 1] must equal desca[ctxt_ - 1].

work (local).
Array of size lwork.

lwork (local) The size of the array work.


If only eigenvalues are requested:
lwork≥n + max(nb*(np0 + 1), 3)
If eigenvectors are requested:
lwork≥n + (np0+mq0+nb)*nb
with nq0 = numroc(nn, nb, 0, 0, NPCOL).

lwork≥ 5*n + max(5*nn, np0*mq0+2*nb*nb) + iceil(neig,


NPROW*NPCOL)*nn
For optimal performance, greater workspace is needed, that is
lwork≥max(lwork, nhetrd_lwork)
where lwork is as defined above, and nhetrd_lwork = n + 2*(anb
+1)*(4*nps+2) + (nps+1)*nps
ictxt = desca[ctxt_ - 1]

anb = pjlaenv(ictxt, 3, 'pchettrd', 'L', 0, 0, 0, 0)
sqnpc = sqrt(dble(NPROW * NPCOL))
nps = max(numroc(n, 1, 0, 0, sqnpc), 2*anb)
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the size required for optimal
performance for all work arrays. Each of these values is returned in the first
entry of the corresponding work arrays, and no error message is issued by
pxerbla.

rwork (local)
Workspace array of size lrwork.

lrwork (local) The size of the array rwork.


See below for definitions of variables used to define lrwork.
If no eigenvectors are requested (jobz = 'N'), then lrwork≥ 5*nn+4*n.

If eigenvectors are requested (jobz = 'V'), then the amount of workspace


required to guarantee that all eigenvectors are computed is:
lrwork≥ 4*n + max(5*nn, np0*mq0+2*nb*nb) + iceil(neig,
NPROW*NPCOL)*nn
The computed eigenvectors may not be orthogonal if the minimal
workspace is supplied and orfac is too small. If you want to guarantee
orthogonality (at the cost of potentially poor performance) you should add
the following values to lrwork:
(clustersize-1)*n,
where clustersize is the number of eigenvalues in the largest cluster, where
a cluster is defined as a set of close eigenvalues:
{w[k - 1],..., w[k+clustersize-2]|w[j]
≤w[j-1]+orfac*2*norm(A)}.
Variable definitions:
neig = number of eigenvectors requested;
nb = desca[mb_ - 1] = desca[nb_ - 1] = descz[mb_ - 1] =
descz[nb_ - 1];
nn = max(n, NB, 2);
desca[rsrc_ - 1] = desca[csrc_ - 1] = descz[rsrc_ - 1] =
descz[csrc_ - 1] = 0;
np0 = numroc(nn, nb, 0, 0, NPROW);
mq0 = numroc(max(neig, nb, 2), nb, 0, 0, NPCOL);
iceil(x, y) is a ScaLAPACK function returning ceiling(x/y)
When lrwork is too small:
If lwork is too small to guarantee orthogonality, p?heevx attempts to
maintain orthogonality in the clusters with the smallest spacing between the
eigenvalues. If lwork is too small to compute all the eigenvectors requested,
no computation is performed and info= -23 is returned. Note that when
range='V', p?heevx does not know how many eigenvectors are requested


until the eigenvalues are computed. Therefore, when range='V' and as


long as lwork is large enough to allow p?heevx to compute the eigenvalues,
p?heevx will compute the eigenvalues and as many eigenvectors as it can.
Relationship between workspace, orthogonality and performance:
If clustersize≥n/sqrt(NPROW*NPCOL), then providing enough space to
compute all the eigenvectors orthogonally will cause serious degradation in
performance. In the limit (that is, clustersize = n-1)p?stein will
perform no better than ?stein on 1 processor.

For clustersize = n/sqrt(NPROW*NPCOL) reorthogonalizing all


eigenvectors will increase the total execution time by a factor of 2 or more.
For clustersize>n/sqrt(NPROW*NPCOL) execution time will grow as the
square of the cluster size, all other factors remaining equal and assuming
enough workspace. Less workspace means less reorthogonalization but
faster execution.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the size required for optimal
performance for all work arrays. Each of these values is returned in the first
entry of the corresponding work arrays, and no error message is issued by
pxerbla.

iwork (local) Workspace array.

liwork (local), size of iwork.

liwork≥ 6*nnp
Where: nnp = max(n, NPROW*NPCOL+1, 4)

If liwork = -1, then liwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a On exit, the lower triangle (if uplo = 'L'), or the upper triangle (if uplo =
'U') of A, including the diagonal, is overwritten.

m (global) The total number of eigenvalues found; 0 ≤m≤n.

nz (global) Total number of eigenvectors computed. 0 ≤nz≤m.

The number of columns of z that are filled.


If jobz≠'V', nz is not referenced.

If jobz = 'V', nz = m unless the user supplies insufficient space and
p?heevx is not able to detect this before beginning computation. To get all the
eigenvectors requested, the user must supply both sufficient space to hold
the eigenvectors in z (m≤descz[n_ - 1]) and sufficient workspace to
compute them (see lwork). p?heevx is always able to detect insufficient
space without computation unless range='V'.

w (global).

Array of size n. The first m elements contain the selected eigenvalues in
ascending order.

z (local).
Array, global size n*n, local size lld_z*LOCc(jz+n-1).

If jobz ='V', then on normal exit the first m columns of z contain the
orthonormal eigenvectors of the matrix corresponding to the selected
eigenvalues. If an eigenvector fails to converge, then that column of z
contains the latest approximation to the eigenvector, and the index of the
eigenvector is returned in ifail.
If jobz = 'N', then z is not referenced.

work[0] On exit, returns adequate workspace to allow optimal performance.

rwork (local).
Array of size lrwork. On return, rwork[0] contains the optimal amount of
workspace required for efficient execution.
If jobz='N', rwork[0] = optimal amount of workspace required to compute
eigenvalues efficiently.
If jobz='V', rwork[0] = optimal amount of workspace required to compute
eigenvalues and eigenvectors efficiently with no guarantee on orthogonality.
If range='V', it is assumed that all eigenvectors may be required.

iwork[0] (local)
On return, iwork[0] contains the amount of integer workspace required.

ifail (global)
Array of size n.
If jobz ='V', then on normal exit, the first m elements of ifail are zero. If
(mod(info,2)≠0) on exit, then ifail contains the indices of the eigenvectors
that failed to converge.
If jobz = 'N', then ifail is not referenced.

iclustr (global)
Array of size 2*NPROW*NPCOL.

This array contains indices of eigenvectors corresponding to a cluster of


eigenvalues that could not be reorthogonalized due to insufficient
workspace (see lwork, orfac and info). Eigenvectors corresponding to
clusters of eigenvalues indexed iclustr[2*i - 2] to iclustr[2*i -
1] could not be reorthogonalized due to lack of workspace. Hence the
eigenvectors corresponding to these clusters may not be orthogonal.
iclustr is a zero terminated array. (iclustr[2*k - 1]≠0 and
iclustr[2*k]=0) if and only if k is the number of clusters. iclustr is not
referenced if jobz = 'N'.

gap (global)
Array of size (NPROW*NPCOL)


This array contains the gap between eigenvalues whose eigenvectors could
not be reorthogonalized. The output values in this array correspond to the
clusters indicated by the array iclustr. As a result, the dot product between
eigenvectors corresponding to the i-th cluster may be as high as (C*n)/
gap[i - 1] where C is a small constant.

info (global)
If info = 0, the execution is successful.

If info < 0:

If the i-th argument is an array and the j-th entry had an illegal value, then
info = -(i*100+j). If the i-th argument is a scalar and had an illegal
value, then info = -i.

If info> 0:

If (mod(info,2)≠0), then one or more eigenvectors failed to converge.


Their indices are stored in ifail. Ensure abstol=2.0*p?lamch('U')

If (mod(info/2,2)≠0), then eigenvectors corresponding to one or more


clusters of eigenvalues could not be reorthogonalized because of insufficient
workspace.The indices of the clusters are stored in the array iclustr.
If (mod(info/4,2)≠0), then space limit prevented p?heevx from
computing all of the eigenvectors between vl and vu. The number of
eigenvectors computed is returned in nz.
If (mod(info/8,2)≠0), then p?stebz failed to compute eigenvalues.
Ensure abstol=2.0*p?lamch('U').

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
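
The C sketch below illustrates pzheevx with a value range (range = 'V'): a workspace query, the computation, and a decoding of the info bit fields documented above. abstol and orfac are supplied by the caller (the text above recommends abstol = 2*p?lamch('S')); descriptor and grid setup are assumed, and the wrapper name solve_pzheevx is hypothetical.

/*
 * Minimal sketch for pzheevx over the value interval (vl, vu], with a
 * decoding of the info bit fields described above.
 */
#include <stdio.h>
#include <stdlib.h>
#include <mkl_types.h>
#include <mkl_scalapack.h>

MKL_INT solve_pzheevx(MKL_INT n, MKL_Complex16 *a, MKL_INT *desca,
                      double vl, double vu, double abstol, double orfac,
                      double *w, MKL_Complex16 *z, MKL_INT *descz,
                      MKL_INT nprow, MKL_INT npcol)
{
    char jobz = 'V', range = 'V', uplo = 'U';
    MKL_INT ia = 1, ja = 1, iz = 1, jz = 1, il = 1, iu = n;
    MKL_INT m = 0, nz = 0, info = 0;
    MKL_INT lwork = -1, lrwork = -1, liwork = -1;
    MKL_Complex16 work_query;
    double rwork_query;
    MKL_INT iwork_query;

    MKL_INT *ifail   = (MKL_INT *)malloc((size_t)n * sizeof(MKL_INT));
    MKL_INT *iclustr = (MKL_INT *)malloc((size_t)(2 * nprow * npcol) * sizeof(MKL_INT));
    double  *gap     = (double  *)malloc((size_t)(nprow * npcol) * sizeof(double));

    /* Workspace query. */
    pzheevx(&jobz, &range, &uplo, &n, a, &ia, &ja, desca, &vl, &vu, &il, &iu,
            &abstol, &m, &nz, w, &orfac, z, &iz, &jz, descz,
            &work_query, &lwork, &rwork_query, &lrwork, &iwork_query, &liwork,
            ifail, iclustr, gap, &info);

    lwork  = (MKL_INT)work_query.real;
    lrwork = (MKL_INT)rwork_query;
    liwork = iwork_query;
    MKL_Complex16 *work  = (MKL_Complex16 *)malloc((size_t)lwork  * sizeof(MKL_Complex16));
    double        *rwork = (double        *)malloc((size_t)lrwork * sizeof(double));
    MKL_INT       *iwork = (MKL_INT       *)malloc((size_t)liwork * sizeof(MKL_INT));

    /* Computation over the value interval (vl, vu]. */
    pzheevx(&jobz, &range, &uplo, &n, a, &ia, &ja, desca, &vl, &vu, &il, &iu,
            &abstol, &m, &nz, w, &orfac, z, &iz, &jz, descz,
            work, &lwork, rwork, &lrwork, iwork, &liwork,
            ifail, iclustr, gap, &info);

    if (info > 0) {
        /* Decode the bit fields documented in the info description above. */
        if (info % 2)       printf("some eigenvectors failed to converge (see ifail)\n");
        if ((info / 2) % 2) printf("some clusters were not reorthogonalized (see iclustr)\n");
        if ((info / 4) % 2) printf("not all eigenvectors in (vl, vu] were computed; nz = %lld\n",
                                   (long long)nz);
        if ((info / 8) % 2) printf("p?stebz failed to compute eigenvalues\n");
    }

    free(work); free(rwork); free(iwork); free(ifail); free(iclustr); free(gap);
    return info;
}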

p?gesvd
Computes the singular value decomposition of a
general matrix, optionally computing the left and/or
right singular vectors.

Syntax
void psgesvd (char *jobu , char *jobvt , MKL_INT *m , MKL_INT *n , float *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , float *s , float *u , MKL_INT *iu , MKL_INT *ju ,
MKL_INT *descu , float *vt , MKL_INT *ivt , MKL_INT *jvt , MKL_INT *descvt , float
*work , MKL_INT *lwork , float *rwork , MKL_INT *info );
void pdgesvd (char *jobu , char *jobvt , MKL_INT *m , MKL_INT *n , double *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , double *s , double *u , MKL_INT *iu , MKL_INT
*ju , MKL_INT *descu , double *vt , MKL_INT *ivt , MKL_INT *jvt , MKL_INT *descvt ,
double *work , MKL_INT *lwork , double *rwork , MKL_INT *info );
void pcgesvd (char *jobu , char *jobvt , MKL_INT *m , MKL_INT *n , MKL_Complex8 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *s , MKL_Complex8 *u , MKL_INT
*iu , MKL_INT *ju , MKL_INT *descu , MKL_Complex8 *vt , MKL_INT *ivt , MKL_INT *jvt ,
MKL_INT *descvt , MKL_Complex8 *work , MKL_INT *lwork , float *rwork , MKL_INT *info );

void pzgesvd (char *jobu , char *jobvt , MKL_INT *m , MKL_INT *n , MKL_Complex16 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *s , MKL_Complex16 *u , MKL_INT
*iu , MKL_INT *ju , MKL_INT *descu , MKL_Complex16 *vt , MKL_INT *ivt , MKL_INT *jvt ,
MKL_INT *descvt , MKL_Complex16 *work , MKL_INT *lwork , double *rwork , MKL_INT
*info );

Include Files
• mkl_scalapack.h

Description
The p?gesvd function computes the singular value decomposition (SVD) of an m-by-n matrix A, optionally
computing the left and/or right singular vectors. The SVD is written
A = U*Σ*VT,
where Σ is an m-by-n matrix that is zero except for its min(m, n) diagonal elements, U is an m-by-m
orthogonal matrix, and V is an n-by-n orthogonal matrix. The diagonal elements of Σ are the singular values
of A, and the columns of U and V are the corresponding left and right singular vectors, respectively. The
singular values are returned in the array s in decreasing order, and only the first min(m,n) columns of U and rows
of vt = VT are computed.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters
mp = number of local rows in A and U
nq = number of local columns in A and VT
size = min(m, n)
sizeq = number of local columns in U
sizep = number of local rows in VT

jobu (global) Specifies options for computing all or part of the matrix U.
If jobu = 'V', the first size columns of U (the left singular vectors) are
returned in the array u;
If jobu ='N', no columns of U (no left singular vectors) are computed.

jobvt (global)
Specifies options for computing all or part of the matrix VT.
If jobvt = 'V', the first size rows of VT (the right singular vectors) are
returned in the array vt;
If jobvt = 'N', no rows of VT (no right singular vectors) are computed.


m (global) The number of rows of the matrix A(m≥ 0).

n (global) The number of columns in A(n≥ 0).

a (local).
Block cyclic array, global size (m, n), local size (mp, nq).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

iu, ju (global) The row and column indices in the global matrix U indicating the
first row and the first column of the submatrix U, respectively.

descu (global and local) array of size dlen_. The array descriptor for the
distributed matrix U.

ivt, jvt (global) The row and column indices in the global matrix VT indicating the
first row and the first column of the submatrix VT, respectively.

descvt (global and local) array of size dlen_. The array descriptor for the
distributed matrix VT.

work (local).
Workspace array of size lwork

lwork (local) The size of the array work;


lwork > 2 + 6*sizeb + max(watobd, wbdtosvd),
where sizeb = max(m, n), and watobd and wbdtosvd refer, respectively,
to the workspace required to bidiagonalize the matrix A and to go from the
bidiagonal matrix to the singular value decomposition USVT.
For watobd, the following holds:
watobd = max(max(wp?lange,wp?gebrd), max(wp?lared2d, wp?
lared1d)),
where wp?lange, wp?lared1d, wp?lared2d, wp?gebrd are the workspaces
required respectively for the subprograms p?lange, p?lared1d, p?
lared2d, p?gebrd. Using the standard notation
mp = numroc(m, mb, MYROW, desca[rsrc_ - 1], NPROW),
nq = numroc(n, nb, MYCOL, desca[csrc_ - 1], NPCOL),
the workspaces required for the above subprograms are
wp?lange = mp,
wp?lared1d = nq0,
wp?lared2d = mp0,
wp?gebrd = nb*(mp + nq + 1) + nq,
where nq0 and mp0 refer, respectively, to the values obtained at MYCOL =
0 and MYROW = 0. In general, the upper limit for the workspace is given by
a workspace required on processor (0,0):

watobd≤nb*(mp0 + nq0 + 1) + nq0.
In case of a homogeneous process grid this upper limit can be used as an
estimate of the minimum workspace for every processor.
For wbdtosvd, the following holds:
wbdtosvd = size*(wantu*nru + wantvt*ncvt) + max(w?bdsqr,
max(wantu*wp?ormbrqln, wantvt*wp?ormbrprt)),
where
wantu(wantvt) = 1, if left/right singular vectors are wanted, and
wantu(wantvt) = 0, otherwise. w?bdsqr, wp?ormbrqln, and wp?ormbrprt
refer respectively to the workspace required for the subprograms ?bdsqr,
p?ormbr(qln), and p?ormbr(prt), where qln and prt are the values of the
arguments vect, side, and trans in the call to p?ormbr. nru is equal to the
local number of rows of the matrix U when distributed across a 1-dimensional
"column" of processes. Analogously, ncvt is equal to the local number of
columns of the matrix VT when distributed across a 1-dimensional "row" of
processes. Calling the LAPACK procedure ?bdsqr requires

w?bdsqr = max(1, 2*size + (2*size - 4)* max(wantu, wantvt))


on every processor. Finally,
wp?ormbrqln = max((nb*(nb-1))/2, (sizeq+mp)*nb)+nb*nb,
wp?ormbrprt = max((mb*(mb-1))/2, (sizep+nq)*mb)+mb*mb,
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum size for the work array.
The required workspace is returned as the first element of work and no
error message is issued by pxerbla.

rwork Workspace array of size 1 + 4*sizeb. Not used for psgesvd and pdgesvd.

Output Parameters

a On exit, the contents of a are destroyed.

s (global).
Array of size size.
Contains the singular values of A sorted so that s(i) ≥s(i+1).

u (local).
local size (mp, sizeq), global size (m, size).
If jobu = 'V', u contains the first min(m, n) columns of U.

If jobu = 'N' or 'O', u is not referenced.

vt (local).
local size (sizep, nq), global size (size, n)
If jobvt = 'V', vt contains the first size rows of VT.
If jobvt = 'N', vt is not referenced.


work On exit, if info = 0, then work[0] returns the required minimal size of
lwork.

rwork On exit, if info = 0, then rwork[0] returns the required size of rwork.

info (global)
If info = 0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.

If info > 0, ?bdsqr did not converge.

If info = min(m,n) + 1, then p?gesvd has detected heterogeneity by
finding that eigenvalues were not identical across the process grid. In this
case, the accuracy of the results from p?gesvd cannot be guaranteed.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
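The lwork = -1 workspace query described above can be driven with two calls. The fragment below is a minimal sketch for pdgesvd, assuming the argument order given in the Syntax section for this routine (jobu, jobvt, m, n, a, ia, ja, desca, s, u, iu, ju, descu, vt, ivt, jvt, descvt, work, lwork, info), and assuming the BLACS grid, the array descriptors, and the distributed arrays are already set up; it is not a complete program.

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Sketch only: a, u, vt, s and the descriptors desca, descu, descvt are
   assumed to be allocated and initialized on an existing BLACS grid.     */
void svd_with_workspace_query(MKL_INT m, MKL_INT n,
                              double *a, MKL_INT *desca, double *s,
                              double *u, MKL_INT *descu,
                              double *vt, MKL_INT *descvt)
{
    char jobu = 'V', jobvt = 'V';
    MKL_INT ia = 1, ja = 1, iu = 1, ju = 1, ivt = 1, jvt = 1;
    MKL_INT info, lwork;
    double wkopt;

    /* First call: lwork = -1 requests a workspace query only.
       The required size is returned in work[0]; no computation is done.  */
    lwork = -1;
    pdgesvd(&jobu, &jobvt, &m, &n, a, &ia, &ja, desca, s,
            u, &iu, &ju, descu, vt, &ivt, &jvt, descvt,
            &wkopt, &lwork, &info);

    lwork = (MKL_INT)wkopt;
    double *work = (double *)malloc((size_t)lwork * sizeof(double));

    /* Second call: the actual singular value decomposition.               */
    pdgesvd(&jobu, &jobvt, &m, &n, a, &ia, &ja, desca, s,
            u, &iu, &ju, descu, vt, &ivt, &jvt, descvt,
            work, &lwork, &info);

    free(work);
}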

p?sygvx
Computes selected eigenvalues and, optionally,
eigenvectors of a real generalized symmetric definite
eigenproblem.

Syntax
void pssygvx (MKL_INT *ibtype , char *jobz , char *range , char *uplo , MKL_INT *n ,
float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib ,
MKL_INT *jb , MKL_INT *descb , float *vl , float *vu , MKL_INT *il , MKL_INT *iu ,
float *abstol , MKL_INT *m , MKL_INT *nz , float *w , float *orfac , float *z ,
MKL_INT *iz , MKL_INT *jz , MKL_INT *descz , float *work , MKL_INT *lwork , MKL_INT
*iwork , MKL_INT *liwork , MKL_INT *ifail , MKL_INT *iclustr , float *gap , MKL_INT
*info );
void pdsygvx (MKL_INT *ibtype , char *jobz , char *range , char *uplo , MKL_INT *n ,
double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib ,
MKL_INT *jb , MKL_INT *descb , double *vl , double *vu , MKL_INT *il , MKL_INT *iu ,
double *abstol , MKL_INT *m , MKL_INT *nz , double *w , double *orfac , double *z ,
MKL_INT *iz , MKL_INT *jz , MKL_INT *descz , double *work , MKL_INT *lwork , MKL_INT
*iwork , MKL_INT *liwork , MKL_INT *ifail , MKL_INT *iclustr , double *gap , MKL_INT
*info );

Include Files
• mkl_scalapack.h

Description
The p?sygvx function computes all the eigenvalues, and optionally, the eigenvectors of a real generalized
symmetric-definite eigenproblem of the form
sub(A)*x = λ*sub(B)*x, sub(A)*sub(B)*x = λ*x, or sub(B)*sub(A)*x = λ*x.
Here x denotes the eigenvectors and λ (lambda) the eigenvalues; sub(A), denoting A(ia:ia+n-1, ja:ja+n-1),
is assumed to be symmetric, and sub(B), denoting B(ib:ib+n-1, jb:jb+n-1), is assumed to be symmetric positive definite.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

ibtype (global) Must be 1 or 2 or 3.


Specifies the problem type to be solved:
If ibtype = 1, the problem type is sub(A)*x = lambda*sub(B)*x;

If ibtype = 2, the problem type is sub(A)*sub(B)*x = lambda*x;

If ibtype = 3, the problem type is sub(B)*sub(A)*x = lambda*x.

jobz (global) Must be 'N' or 'V'.

If jobz ='N', then compute eigenvalues only.

If jobz ='V', then compute eigenvalues and eigenvectors.

range (global) Must be 'A' or 'V' or 'I'.

If range = 'A', the function computes all eigenvalues.

If range = 'V', the function computes eigenvalues in the interval: [vl, vu].

If range = 'I', the function computes eigenvalues with indices il through
iu.

uplo (global) Must be 'U' or 'L'.

If uplo = 'U', arrays a and b store the upper triangles of sub(A) and sub
(B);
If uplo = 'L', arrays a and b store the lower triangles of sub(A) and sub
(B).

n (global) The order of the matrices sub(A) and sub (B), n≥ 0.

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1). On
entry, this array contains the local pieces of the n-by-n symmetric
distributed matrix sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular part of the matrix.
If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular part of the matrix.


ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A. If desca[ctxt_ - 1] is incorrect, p?sygvx cannot
guarantee correct error reporting.

b (local).
Pointer into the local memory to an array of size lld_b*LOCc(jb+n-1). On
entry, this array contains the local pieces of the n-by-n symmetric
distributed matrix sub(B).
If uplo = 'U', the leading n-by-n upper triangular part of sub(B) contains
the upper triangular part of the matrix.
If uplo = 'L', the leading n-by-n lower triangular part of sub(B) contains
the lower triangular part of the matrix.

ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of the submatrix B, respectively.

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B. descb[ctxt_ - 1] must be equal to desca[ctxt_ -
1].

vl, vu (global)
If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
If range = 'A' or 'I', vl and vu are not referenced.

il, iu (global)

If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned. Constraint: il≥ 1, min(il, n)≤iu≤n

If range = 'A' or 'V', il and iu are not referenced.

abstol (global)
If jobz='V', setting abstol to p?lamch(context, 'U') yields the most
orthogonal eigenvectors.
The absolute error tolerance for the eigenvalues. An approximate
eigenvalue is accepted as converged when it is determined to lie in an
interval [a,b] of width less than or equal to

abstol + eps*max(|a|,|b|),
where eps is the machine precision. If abstol is less than or equal to zero,
then eps*norm(T) will be used in its place, where norm(T) is the 1-norm
of the tridiagonal matrix obtained by reducing A to tridiagonal form.
Eigenvalues will be computed most accurately when abstol is set to twice
the underflow threshold 2*p?lamch('S') not zero. If this function returns
with ((mod(info,2)≠0) or (mod(info/8,2)≠0)), indicating that some
eigenvalues or eigenvectors did not converge, try setting abstol to 2*p?
lamch('S').

NOTE
mod(x,y) is the integer remainder of x/y.

orfac (global).
Specifies which eigenvectors should be reorthogonalized. Eigenvectors that
correspond to eigenvalues which are within tol=orfac*norm(A) of each
other are to be reorthogonalized. However, if the workspace is insufficient
(see lwork), tol may be decreased until all eigenvectors to be
reorthogonalized can be stored in one process. No reorthogonalization will
be done if orfac equals zero. A default value of 1.0e-3 is used if orfac is
negative. orfac should be identical on all processes.

iz, jz (global) The row and column indices in the global matrix Z indicating the
first row and the first column of the submatrix Z, respectively.

descz (global and local) array of size dlen_. The array descriptor for the
distributed matrix Z.descz[ctxt_ - 1] must equal desca[ctxt_ - 1].

work (local)
Workspace array of size lwork

lwork (local)
Size of the array work. See below for definitions of variables used to define
lwork.
If no eigenvectors are requested (jobz = 'N'), then lwork≥ 5*n +
max(5*nn, NB*(np0 + 1)).
If eigenvectors are requested (jobz = 'V'), then the amount of workspace
required to guarantee that all eigenvectors are computed is:
lwork≥ 5*n + max(5*nn, np0*mq0 + 2*nb*nb) + iceil(neig,
NPROW*NPCOL)*nn.
The computed eigenvectors may not be orthogonal if the minimal
workspace is supplied and orfac is too small. If you want to guarantee
orthogonality at the cost of potentially poor performance you should add
the following to lwork:
(clustersize-1)*n,
where clustersize is the number of eigenvalues in the largest cluster, where
a cluster is defined as a set of close eigenvalues:
{w[k - 1],..., w[k+clustersize - 2]|w[j] ≤w[j - 1] +
orfac*2*norm(A)}
Variable definitions:
neig = number of eigenvectors requested,
nb = desca[mb_ - 1] = desca[nb_ - 1] = descz[mb_ - 1] =
descz[nb_ - 1],
nn = max(n, nb, 2),
desca[rsrc_ - 1] = desca[csrc_ - 1] = descz[rsrc_ - 1] =
descz[csrc_ - 1] = 0,


np0 = numroc(nn, nb, 0, 0, NPROW),

mq0 = numroc(max(neig, nb, 2), nb, 0, 0, NPCOL)

iceil(x, y) is a ScaLAPACK function returning ceiling(x/y)


If lwork is too small to guarantee orthogonality, p?syevx attempts to
maintain orthogonality in the clusters with the smallest spacing between the
eigenvalues.
If lwork is too small to compute all the eigenvectors requested, no
computation is performed and info= -23 is returned.
Note that when range='V', the number of requested eigenvectors is not
known until the eigenvalues are computed. In this case and if lwork is large
enough to compute the eigenvalues, p?sygvx computes the eigenvalues
and as many eigenvectors as possible.
Greater performance can be achieved if adequate workspace is provided. In
some situations, performance can decrease as the provided workspace
increases above the workspace amount shown below:
lwork≥max(lwork, 5*n + nsytrd_lwopt, nsygst_lwopt), where
lwork, as defined previously, depends upon the number of eigenvectors
requested, and
nsytrd_lwopt = n + 2*(anb+1)*(4*nps+2) + (nps+3)*nps
nsygst_lwopt = 2*np0*nb + nq0*nb + nb*nb
anb = pjlaenv(desca[ctxt_ - 1], 3, 'p?syttrd', 'L', 0, 0, 0,
0)
sqnpc = int(sqrt(dble(NPROW * NPCOL)))
nps = max(numroc(n, 1, 0, 0, sqnpc), 2*anb)
NB = desca[mb_ - 1]
np0 = numroc(n, nb, 0, 0, NPROW)
nq0 = numroc(n, nb, 0, 0, NPCOL)
numroc is a ScaLAPACK tool function;
pjlaenv is a ScaLAPACK environmental inquiry function
MYROW, MYCOL, NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.
For large n, no extra workspace is needed; however, the biggest boost in
performance comes for small n, so it is wise to provide the extra workspace
(typically less than a megabyte per process).
If clustersize≥n/sqrt(NPROW*NPCOL), then providing enough space to
compute all the eigenvectors orthogonally will cause serious degradation in
performance. At the limit (that is, clustersize = n-1) p?stein will
perform no better than ?stein on a single processor.

For clustersize = n/sqrt(NPROW*NPCOL) reorthogonalizing all
eigenvectors will increase the total execution time by a factor of 2 or more.

For clustersize>n/sqrt(NPROW*NPCOL) execution time will grow as the
square of the cluster size, all other factors remaining equal and assuming
enough workspace. Less workspace means less reorthogonalization but
faster execution.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the size required for optimal
performance for all work arrays. Each of these values is returned in the first
entry of the corresponding work arrays, and no error message is issued by
pxerbla.

iwork (local) Workspace array.

liwork (local) The size of the array iwork.


liwork≥ 6*nnp
Where:
nnp = max(n, NPROW*NPCOL + 1, 4)
If liwork = -1, then liwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a On exit,
If jobz = 'V', and if info = 0, sub(A) contains the distributed matrix Z
of eigenvectors. The eigenvectors are normalized as follows:
for ibtype = 1 or 2, ZT*sub(B)*Z = I;

for ibtype = 3, ZT*inv(sub(B))*Z = I.

If jobz = 'N', then on exit the upper triangle (if uplo='U') or the lower
triangle (if uplo='L') of sub(A), including the diagonal, is destroyed.

b On exit, if info≤n, the part of sub(B) containing the matrix is overwritten


by the triangular factor U or L from the Cholesky factorization sub(B) =
UT*U or sub(B) = L*LT.

m (global) The total number of eigenvalues found, 0 ≤m≤n.

nz (global)
Total number of eigenvectors computed. 0 ≤nz≤m. The number of columns
of z that are filled.
If jobz≠'V', nz is not referenced.

If jobz = 'V', nz = m unless the user supplies insufficient space and p?sygvx
is not able to detect this before beginning computation. To get all the
eigenvectors requested, the user must supply both sufficient space to hold
the eigenvectors in z (m≤descz[n_ - 1]) and sufficient workspace to compute
them. (See lwork below.) p?sygvx is always able to detect insufficient
space without computation unless range='V'.

w (global)


Array of size n. On normal exit, the first m entries contain the selected
eigenvalues in ascending order.

z (local).

global size n*n, local size lld_z*LOCc(jz+n-1).

If jobz = 'V', then on normal exit the first m columns of z contain the
orthonormal eigenvectors of the matrix corresponding to the selected
eigenvalues. If an eigenvector fails to converge, then that column of z
contains the latest approximation to the eigenvector, and the index of the
eigenvector is returned in ifail.
If jobz = 'N', then z is not referenced.

work If jobz = 'N', work[0] returns the optimal amount of workspace required to
compute eigenvalues efficiently.
If jobz = 'V', work[0] returns the optimal amount of workspace required to
compute eigenvalues and eigenvectors efficiently with no guarantee on
orthogonality.
If range = 'V', it is assumed that all eigenvectors may be required.

ifail (global)
Array of size n.
ifail provides additional information when info≠0

If (mod(info/16,2)≠0), then ifail[0] indicates the order of the smallest
minor which is not positive definite. If (mod(info,2)≠0) on exit, then ifail
contains the indices of the eigenvectors that failed to converge.
If neither of the above error conditions hold and jobz = 'V', then the first
m elements of ifail are set to zero.

iclustr (global)
Array of size (2*NPROW*NPCOL). This array contains indices of eigenvectors
corresponding to a cluster of eigenvalues that could not be reorthogonalized
due to insufficient workspace (see lwork, orfac and info). Eigenvectors
corresponding to clusters of eigenvalues indexed iclustr[2*i - 2] to
iclustr[2*i - 1], could not be reorthogonalized due to lack of
workspace. Hence the eigenvectors corresponding to these clusters may not
be orthogonal. iclustr is a zero terminated array.

(iclustr[2*k - 1]≠0 and iclustr[2*k]=0) if and only if k is the
number of clusters. iclustr is not referenced if jobz = 'N'.

gap (global)
Array of size NPROW*NPCOL. This array contains the gap between
eigenvalues whose eigenvectors could not be reorthogonalized. The output
values in this array correspond to the clusters indicated by the array iclustr.
As a result, the dot product between eigenvectors corresponding to the i-th
cluster may be as high as (C*n)/gap[i - 1], where C is a small constant.

info (global)
If info = 0, the execution is successful.

If info < 0: if the i-th argument is an array and the j-th entry had an illegal
value, then info = -(i*100+j); if the i-th argument is a scalar and had an
illegal value, then info = -i.

If info> 0:

If (mod(info,2)≠0), then one or more eigenvectors failed to converge.
Their indices are stored in ifail.
If (mod(info/2,2)≠0), then eigenvectors corresponding to one or more
clusters of eigenvalues could not be reorthogonalized because of insufficient
workspace. The indices of the clusters are stored in the array iclustr.
If (mod(info/4,2)≠0), then space limit prevented p?sygvx from
computing all of the eigenvectors between vl and vu. The number of
eigenvectors computed is returned in nz.
If (mod(info/8,2)≠0), then p?stebz failed to compute eigenvalues.

If (mod(info/16,2)≠0), then B was not positive definite. ifail[0] indicates
the order of the smallest minor which is not positive definite.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
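As an illustration of the lwork/liwork = -1 query discussed above, the following sketch calls pdsygvx twice, first to obtain the workspace sizes and then to solve the problem. It assumes the BLACS grid, the descriptors desca, descb, descz, and the distributed arrays (plus ifail, iclustr, and gap with the sizes given above) are already set up; the chosen argument values (ibtype = 1, range = 'A', and so on) are only illustrative.

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Sketch only: the caller provides initialized distributed matrices and
   descriptors; ifail has size n, iclustr size 2*NPROW*NPCOL, gap size
   NPROW*NPCOL, as documented above.                                      */
void sygvx_query_then_solve(MKL_INT n,
                            double *a, MKL_INT *desca,
                            double *b, MKL_INT *descb,
                            double *w, double *z, MKL_INT *descz,
                            MKL_INT *ifail, MKL_INT *iclustr, double *gap)
{
    MKL_INT ibtype = 1, ia = 1, ja = 1, ib = 1, jb = 1, iz = 1, jz = 1;
    char jobz = 'V', range = 'A', uplo = 'U';
    double vl = 0.0, vu = 0.0, abstol = -1.0, orfac = 1.0e-3;
    MKL_INT il = 1, iu = n, m, nz, info;
    double wkopt;
    MKL_INT iwkopt;

    MKL_INT lwork = -1, liwork = -1;   /* workspace query for both arrays */
    pdsygvx(&ibtype, &jobz, &range, &uplo, &n, a, &ia, &ja, desca,
            b, &ib, &jb, descb, &vl, &vu, &il, &iu, &abstol, &m, &nz,
            w, &orfac, z, &iz, &jz, descz, &wkopt, &lwork,
            &iwkopt, &liwork, ifail, iclustr, gap, &info);

    lwork  = (MKL_INT)wkopt;
    liwork = iwkopt;
    double  *work  = (double  *)malloc((size_t)lwork  * sizeof(double));
    MKL_INT *iwork = (MKL_INT *)malloc((size_t)liwork * sizeof(MKL_INT));

    /* Second call: solve the generalized symmetric-definite eigenproblem. */
    pdsygvx(&ibtype, &jobz, &range, &uplo, &n, a, &ia, &ja, desca,
            b, &ib, &jb, descb, &vl, &vu, &il, &iu, &abstol, &m, &nz,
            w, &orfac, z, &iz, &jz, descz, work, &lwork,
            iwork, &liwork, ifail, iclustr, gap, &info);

    free(work);
    free(iwork);
}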

p?hegvx
Computes selected eigenvalues and, optionally,
eigenvectors of a complex generalized Hermitian
positive-definite eigenproblem.

Syntax
void pchegvx (MKL_INT *ibtype , char *jobz , char *range , char *uplo , MKL_INT *n ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b ,
MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , float *vl , float *vu , MKL_INT *il ,
MKL_INT *iu , float *abstol , MKL_INT *m , MKL_INT *nz , float *w , float *orfac ,
MKL_Complex8 *z , MKL_INT *iz , MKL_INT *jz , MKL_INT *descz , MKL_Complex8 *work ,
MKL_INT *lwork , float *rwork , MKL_INT *lrwork , MKL_INT *iwork , MKL_INT *liwork ,
MKL_INT *ifail , MKL_INT *iclustr , float *gap , MKL_INT *info );
void pzhegvx (MKL_INT *ibtype , char *jobz , char *range , char *uplo , MKL_INT *n ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b ,
MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , double *vl , double *vu , MKL_INT *il ,
MKL_INT *iu , double *abstol , MKL_INT *m , MKL_INT *nz , double *w , double *orfac ,
MKL_Complex16 *z , MKL_INT *iz , MKL_INT *jz , MKL_INT *descz , MKL_Complex16 *work ,
MKL_INT *lwork , double *rwork , MKL_INT *lrwork , MKL_INT *iwork , MKL_INT *liwork ,
MKL_INT *ifail , MKL_INT *iclustr , double *gap , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?hegvx function computes all the eigenvalues, and optionally, the eigenvectors of a complex
generalized Hermitian positive-definite eigenproblem, of the form
sub(A)*x = λ*sub(B)*x, sub(A)*sub(B)*x = λ*x, or sub(B)*sub(A)*x = λ*x.


Here sub(A), denoting A(ia:ia+n-1, ja:ja+n-1), and sub(B), denoting B(ib:ib+n-1, jb:jb+n-1), are
assumed to be Hermitian, and sub(B) is also positive definite.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

ibtype (global) Must be 1 or 2 or 3.


Specifies the problem type to be solved:
If ibtype = 1, the problem type is sub(A)*x = lambda*sub(B)*x;

If ibtype = 2, the problem type is sub(A)*sub(B)*x = lambda*x;

If ibtype = 3, the problem type is sub(B)*sub(A)*x = lambda*x.

jobz (global) Must be 'N' or 'V'.

If jobz ='N', then compute eigenvalues only.

If jobz ='V', then compute eigenvalues and eigenvectors.

range (global) Must be 'A' or 'V' or 'I'.

If range = 'A', the function computes all eigenvalues.

If range = 'V', the function computes eigenvalues in the interval: [vl, vu].

If range = 'I', the function computes eigenvalues with indices il through
iu.

uplo (global) Must be 'U' or 'L'.

If uplo = 'U', arrays a and b store the upper triangles of sub(A) and sub
(B);
If uplo = 'L', arrays a and b store the lower triangles of sub(A) and sub
(B).

n (global)
The order of the matrices sub(A) and sub (B) (n≥ 0).

a (local)

Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1). On
entry, this array contains the local pieces of the n-by-n Hermitian
distributed matrix sub(A). If uplo = 'U', the leading n-by-n upper
triangular part of sub(A) contains the upper triangular part of the matrix. If
uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular part of the matrix.

ia, ja (global)
The row and column indices in the global matrix A indicating the first row
and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_.


The array descriptor for the distributed matrix A. If desca[ctxt_ - 1] is
incorrect, p?hegvx cannot guarantee correct error reporting.

b (local).
Pointer into the local memory to an array of size lld_b*LOCc(jb+n-1). On
entry, this array contains the local pieces of the n-by-n Hermitian
distributed matrix sub(B).
If uplo = 'U', the leading n-by-n upper triangular part of sub(B) contains
the upper triangular part of the matrix.
If uplo = 'L', the leading n-by-n lower triangular part of sub(B) contains
the lower triangular part of the matrix.

ib, jb (global)
The row and column indices in the global matrix B indicating the first row
and the first column of the submatrix B, respectively.

descb (global and local) array of size dlen_.


The array descriptor for the distributed matrix B.descb[ctxt_ - 1] must
be equal to desca[ctxt_ - 1].

vl, vu (global)
If range = 'V', the lower and upper bounds of the interval to be searched
for eigenvalues.
If range = 'A' or 'I', vl and vu are not referenced.

il, iu (global)

If range = 'I', the indices in ascending order of the smallest and largest
eigenvalues to be returned. Constraint: il≥ 1, min(il, n) ≤iu≤n

If range = 'A' or 'V', il and iu are not referenced.

abstol (global)
If jobz='V', setting abstol to p?lamch(context, 'U') yields the most
orthogonal eigenvectors.
The absolute error tolerance for the eigenvalues. An approximate
eigenvalue is accepted as converged when it is determined to lie in an
interval [a,b] of width less than or equal to


abstol + eps*max(|a|,|b|),
where eps is the machine precision. If abstol is less than or equal to zero,
then eps*norm(T) will be used in its place, where norm(T) is the 1-norm of
the tridiagonal matrix obtained by reducing A to tridiagonal form.
Eigenvalues will be computed most accurately when abstol is set to twice
the underflow threshold 2*p?lamch('S') not zero. If this function returns
with ((mod(info,2)≠0) or (mod(info/8,2)≠0)), indicating that
some eigenvalues or eigenvectors did not converge, try setting abstol to
2*p?lamch('S').

NOTE
mod(x,y) is the integer remainder of x/y.

orfac (global).
Specifies which eigenvectors should be reorthogonalized. Eigenvectors that
correspond to eigenvalues which are within tol=orfac*norm(A) of each
other are to be reorthogonalized. However, if the workspace is insufficient
(see lwork), tol may be decreased until all eigenvectors to be
reorthogonalized can be stored in one process. No reorthogonalization will
be done if orfac equals zero. A default value of 1.0E-3 is used if orfac is
negative. orfac should be identical on all processes.

iz, jz (global) The row and column indices in the global matrix Z indicating the
first row and the first column of the submatrix Z, respectively.

descz (global and local) array of size dlen_. The array descriptor for the
distributed matrix Z.descz[ctxt_ - 1] must equal desca[ctxt_ - 1].

work (local)
Workspace array of size lwork

lwork (local).
The size of the array work.
If only eigenvalues are requested:
lwork≥ n+ max(NB*(np0 + 1), 3)
If eigenvectors are requested:
lwork≥n + (np0+ mq0 + NB)*NB
with nq0 = numroc(nn, NB, 0, 0, NPCOL).

For optimal performance, greater workspace is needed, that is


lwork≥max(lwork, n, nhetrd_lwopt, nhegst_lwopt)
where lwork is as defined above, and
nhetrd_lwopt = 2*(anb+1)*(4*nps+2) + (nps + 1)*nps;
nhegst_lwopt = 2*np0*nb + nq0*nb + nb*nb
nb = desca[mb_ - 1]
np0 = numroc(n, nb, 0, 0, NPROW)

nq0 = numroc(n, nb, 0, 0, NPCOL)
ictxt = desca[ctxt_ - 1]
anb = pjlaenv(ictxt, 3, 'p?hettrd', 'L', 0, 0, 0, 0)
sqnpc = sqrt(dble(NPROW * NPCOL))
nps = max(numroc(n, 1, 0, 0, sqnpc), 2*anb)
numroc is a ScaLAPACK tool function;
pjlaenv is a ScaLAPACK environmental inquiry function MYROW, MYCOL,
NPROW and NPCOL can be determined by calling the function
blacs_gridinfo.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the size required for optimal
performance for all work arrays. Each of these values is returned in the first
entry of the corresponding work arrays, and no error message is issued by
pxerbla.

rwork (local)
Workspace array of size lrwork.

lrwork (local) The size of the array rwork.


See below for definitions of variables used to define lrwork.
If no eigenvectors are requested (jobz = 'N'), then lrwork≥ 5*nn+4*n

If eigenvectors are requested (jobz = 'V'), then the amount of workspace
required to guarantee that all eigenvectors are computed is:
lrwork≥ 4*n + max(5*nn, np0*mq0)+iceil(neig, NPROW*NPCOL)*nn
The computed eigenvectors may not be orthogonal if the minimal
workspace is supplied and orfac is too small. If you want to guarantee
orthogonality (at the cost of potentially poor performance) you should add
the following value to lrwork:
(clustersize-1)*n,
where clustersize is the number of eigenvalues in the largest cluster, where
a cluster is defined as a set of close eigenvalues:
{w[k - 1],..., w[k+clustersize - 2]|w[j] ≤w[j -
1]+orfac*2*norm(A)}
Variable definitions:
neig = number of eigenvectors requested;
nb = desca[mb_ - 1] = desca[nb_ - 1] = descz[mb_ - 1] =
descz[nb_ - 1];
nn = max(n, nb, 2);
desca[rsrc_ - 1] = desca[csrc_ - 1] = descz[rsrc_ - 1] =
descz[csrc_ - 1] = 0;
np0 = numroc(nn, nb, 0, 0, NPROW);
mq0 = numroc(max(neig, nb, 2), nb, 0, 0, NPCOL);
iceil(x, y) is a ScaLAPACK function returning ceiling(x/y).


When lrwork is too small:


If lwork is too small to guarantee orthogonality, p?hegvx attempts to
maintain orthogonality in the clusters with the smallest spacing between the
eigenvalues.
If lwork is too small to compute all the eigenvectors requested, no
computation is performed and info= -25 is returned. Note that when
range='V', p?hegvx does not know how many eigenvectors are requested
until the eigenvalues are computed. Therefore, when range='V' and as
long as lwork is large enough to allow p?hegvx to compute the eigenvalues,
p?hegvx will compute the eigenvalues and as many eigenvectors as it can.
Relationship between workspace, orthogonality & performance:
If clustersize > n/sqrt(NPROW*NPCOL), then providing enough space
to compute all the eigenvectors orthogonally will cause serious degradation
in performance. In the limit (that is, clustersize = n-1) p?stein will
perform no better than ?stein on 1 processor.

For clustersize = n/sqrt(NPROW*NPCOL) reorthogonalizing all
eigenvectors will increase the total execution time by a factor of 2 or more.
For clustersize>n/sqrt(NPROW*NPCOL) execution time will grow as the
square of the cluster size, all other factors remaining equal and assuming
enough workspace. Less workspace means less reorthogonalization but
faster execution.
If lrwork = -1, then lrwork is global input and a workspace query is
assumed; the function only calculates the size required for optimal
performance for all work arrays. Each of these values is returned in the first
entry of the corresponding work arrays, and no error message is issued by
pxerbla.

iwork (local) Workspace array.

liwork (local) The size of the array iwork.


liwork≥ 6*nnp
Where: nnp = max(n, NPROW*NPCOL + 1, 4)

If liwork = -1, then liwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a On exit, if jobz = 'V', then if info = 0, sub(A) contains the distributed


matrix Z of eigenvectors.
The eigenvectors are normalized as follows:
If ibtype = 1 or 2, then ZH*sub(B)*Z = i;

If ibtype = 3, then ZH*inv(sub(B))*Z = i.

If jobz = 'N', then on exit the upper triangle (if uplo='U') or the lower
triangle (if uplo='L') of sub(A), including the diagonal, is destroyed.

b On exit, if info≤n, the part of sub(B) containing the matrix is overwritten
by the triangular factor U or L from the Cholesky factorization sub(B) =
UH*U, or sub(B) = L*LH.

m (global) The total number of eigenvalues found, 0 ≤m≤n.

nz (global) Total number of eigenvectors computed. 0 ≤nz≤m. The number
of columns of z that are filled.
If jobz≠'V', nz is not referenced.

If jobz = 'V', nz = m unless the user supplies insufficient space and p?hegvx
is not able to detect this before beginning computation. To get all the
eigenvectors requested, the user must supply both sufficient space to hold
the eigenvectors in z (m≤descz[n_ - 1]) and sufficient workspace to
compute them. (See lwork below.) The function p?hegvx is always able to
detect insufficient space without computation unless range = 'V'.

w (global)
Array of size n. On normal exit, the first m entries contain the selected
eigenvalues in ascending order.

z (local).
global size n*n, local size lld_z*LOCc(jz+n-1).

If jobz = 'V', then on normal exit the first m columns of z contain the
orthonormal eigenvectors of the matrix corresponding to the selected
eigenvalues. If an eigenvector fails to converge, then that column of z
contains the latest approximation to the eigenvector, and the index of the
eigenvector is returned in ifail.
If jobz = 'N', then z is not referenced.

work On exit, work[0] returns the optimal amount of workspace.

rwork On exit, rwork[0] contains the amount of workspace required for optimal
efficiency
If jobz='N'rwork[0] = optimal amount of workspace required to compute
eigenvalues efficiently
If jobz='V'rwork[0] = optimal amount of workspace required to compute
eigenvalues and eigenvectors efficiently with no guarantee on orthogonality.
If range='V', it is assumed that all eigenvectors may be required when
computing optimal workspace.

ifail (global)
Array of size n.
ifail provides additional information when info≠0

If (mod(info/16,2)≠0), then ifail[0] indicates the order of the


smallest minor which is not positive definite.
If (mod(info,2)≠0) on exit, then ifail[0] contains the indices of the
eigenvectors that failed to converge.


If neither of the above error conditions holds and jobz = 'V', then the
first m elements of ifail are set to zero.

iclustr (global)
Array of size (2*NPROW*NPCOL). This array contains indices of eigenvectors
corresponding to a cluster of eigenvalues that could not be reorthogonalized
due to insufficient workspace (see lwork, orfac and info). Eigenvectors
corresponding to clusters of eigenvalues indexed iclustr[2*i - 2] to
iclustr[2*i - 1] could not be reorthogonalized due to lack of workspace.
Hence the eigenvectors corresponding to these clusters may not be
orthogonal.
iclustr is a zero terminated array. (iclustr[2*k - 1]≠0 and
iclustr[2*k]=0) if and only if k is the number of clusters.
iclustr is not referenced if jobz = 'N'.

gap (global)
Array of size NPROW*NPCOL.

This array contains the gap between eigenvalues whose eigenvectors could
not be reorthogonalized. The output values in this array correspond to the
clusters indicated by the array iclustr. As a result, the dot product between
eigenvectors corresponding to the i-th cluster may be as high as (C*n)/
gap[i - 1], where C is a small constant.

info (global)
If info = 0, the execution is successful.

If info < 0: if the i-th argument is an array and the j-th entry had an illegal
value, then info = -(i*100+j); if the i-th argument is a scalar and had an
illegal value, then info = -i.

If info> 0:

If (mod(info,2)≠0), then one or more eigenvectors failed to converge.
Their indices are stored in ifail.
If (mod(info/2,2)≠0), then eigenvectors corresponding to one or more
clusters of eigenvalues could not be reorthogonalized because of insufficient
workspace. The indices of the clusters are stored in the array iclustr.
If (mod(info/4,2)≠0), then space limit prevented p?hegvx from
computing all of the eigenvectors between vl and vu. The number of
eigenvectors computed is returned in nz.
If (mod(info/8,2)≠0), then p?stebz failed to compute eigenvalues.

If (mod(info/16,2)≠0), then B was not positive definite. ifail[0] indicates
the order of the smallest minor which is not positive definite.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
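Because the positive info values of p?sygvx and p?hegvx are bit-encoded as described above, they can be decoded with integer division and remainder. The helper below is a sketch in plain C; the function name is illustrative and not part of Intel MKL.

#include <stdio.h>
#include "mkl_types.h"

/* Decode a p?sygvx / p?hegvx info value following the mod(info/2^k, 2)
   conventions documented above.                                          */
void report_generalized_eig_info(MKL_INT info)
{
    if (info == 0) { printf("successful exit\n"); return; }
    if (info < 0) {
        /* info = -i for a bad scalar argument i, or -(i*100 + j) when
           entry j of array argument i is illegal (see above).            */
        printf("illegal argument, info = %lld\n", (long long)info);
        return;
    }
    if (info % 2)        printf("some eigenvectors failed to converge (see ifail)\n");
    if ((info / 2) % 2)  printf("some clusters could not be reorthogonalized (see iclustr)\n");
    if ((info / 4) % 2)  printf("not all eigenvectors in [vl, vu] were computed (see nz)\n");
    if ((info / 8) % 2)  printf("p?stebz failed to compute eigenvalues\n");
    if ((info / 16) % 2) printf("B is not positive definite (see ifail[0])\n");
}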

ScaLAPACK Auxiliary Routines
Routine Name   Data Types   Description

b?laapp   s,d   Multiplies a matrix with an orthogonal matrix.
b?laexc   s,d   Swaps adjacent diagonal blocks of a real upper quasi-triangular matrix in Schur canonical form, by an orthogonal similarity transformation.
b?trexc   s,d   Reorders the Schur factorization of a general matrix.
p?lacgv   c,z   Conjugates a complex vector.
p?max1   c,z   Finds the index of the element whose real part has maximum absolute value (similar to the Level 1 PBLAS p?amax, but using the absolute value to the real part).
pmpcol   s,d   Finds the collaborators of a process.
pmpim2   s,d   Computes the eigenpair range assignments for all processes.
?combamax1   c,z   Finds the element with maximum real part absolute value and its corresponding global index.
p?sum1   sc,dz   Forms the 1-norm of a complex vector similar to Level 1 PBLAS p?asum, but using the true absolute value.
p?dbtrsv   s,d,c,z   Computes an LU factorization of a general tridiagonal matrix with no pivoting. The routine is called by p?dbtrs.
p?dttrsv   s,d,c,z   Computes an LU factorization of a general band matrix, using partial pivoting with row interchanges. The routine is called by p?dttrs.
p?gebal   s,d   Balances a general real matrix.
p?gebd2   s,d,c,z   Reduces a general rectangular matrix to real bidiagonal form by an orthogonal/unitary transformation (unblocked algorithm).
p?gehd2   s,d,c,z   Reduces a general matrix to upper Hessenberg form by an orthogonal/unitary similarity transformation (unblocked algorithm).
p?gelq2   s,d,c,z   Computes an LQ factorization of a general rectangular matrix (unblocked algorithm).
p?geql2   s,d,c,z   Computes a QL factorization of a general rectangular matrix (unblocked algorithm).
p?geqr2   s,d,c,z   Computes a QR factorization of a general rectangular matrix (unblocked algorithm).
p?gerq2   s,d,c,z   Computes an RQ factorization of a general rectangular matrix (unblocked algorithm).
p?getf2   s,d,c,z   Computes an LU factorization of a general matrix, using partial pivoting with row interchanges (local blocked algorithm).
p?labrd   s,d,c,z   Reduces the first nb rows and columns of a general rectangular matrix A to real bidiagonal form by an orthogonal/unitary transformation, and returns auxiliary matrices that are needed to apply the transformation to the unreduced part of A.


p?lacon   s,d,c,z   Estimates the 1-norm of a square matrix, using the reverse communication for evaluating matrix-vector products.
p?laconsb   s,d   Looks for two consecutive small subdiagonal elements.
p?lacp2   s,d,c,z   Copies all or part of a distributed matrix to another distributed matrix.
p?lacp3   s,d   Copies from a global parallel array into a local replicated array or vice versa.
p?lacpy   s,d,c,z   Copies all or part of one two-dimensional array to another.
p?laevswp   s,d,c,z   Moves the eigenvectors from where they are computed to ScaLAPACK standard block cyclic array.
p?lahrd   s,d,c,z   Reduces the first nb columns of a general rectangular matrix A so that elements below the kth subdiagonal are zero, by an orthogonal/unitary transformation, and returns auxiliary matrices that are needed to apply the transformation to the unreduced part of A.
p?laiect   s,d,c,z   Exploits IEEE arithmetic to accelerate the computations of eigenvalues.
p?lamve   s,d   Copies all or part of one two-dimensional distributed array to another.
p?lange   s,d,c,z   Returns the value of the 1-norm, Frobenius norm, infinity-norm, or the largest absolute value of any element, of a general rectangular matrix.
p?lanhs   s,d,c,z   Returns the value of the 1-norm, Frobenius norm, infinity-norm, or the largest absolute value of any element, of an upper Hessenberg matrix.
p?lansy, p?lanhe   s,d,c,z/c,z   Returns the value of the 1-norm, Frobenius norm, infinity-norm, or the largest absolute value of any element of a real symmetric or complex Hermitian matrix.
p?lantr   s,d,c,z   Returns the value of the 1-norm, Frobenius norm, infinity-norm, or the largest absolute value of any element, of a triangular matrix.
p?lapiv   s,d,c,z   Applies a permutation matrix to a general distributed matrix, resulting in row or column pivoting.
p?laqge   s,d,c,z   Scales a general rectangular matrix, using row and column scaling factors computed by p?geequ.
p?laqr0   s,d   Computes the eigenvalues of a Hessenberg matrix and optionally returns the matrices from the Schur decomposition.
p?laqr1   s,d   Sets a scalar multiple of the first column of the product of a 2-by-2 or 3-by-3 matrix and specified shifts.
p?laqr2   s,d   Performs the orthogonal/unitary similarity transformation of a Hessenberg matrix to detect and deflate fully converged eigenvalues from a trailing principal submatrix (aggressive early deflation).


p?laqr3   s,d   Performs the orthogonal/unitary similarity transformation of a Hessenberg matrix to detect and deflate fully converged eigenvalues from a trailing principal submatrix (aggressive early deflation).
p?laqr4   s,d   Computes the eigenvalues of a Hessenberg matrix, and optionally computes the matrices from the Schur decomposition.
p?laqr5   s,d   Performs a single small-bulge multi-shift QR sweep.
p?laqsy   s,d,c,z   Scales a symmetric/Hermitian matrix, using scaling factors computed by p?poequ.
p?lared1d   s,d   Redistributes an array assuming that the input array bycol is distributed across rows and that all process columns contain the same copy of bycol.
p?lared2d   s,d   Redistributes an array assuming that the input array byrow is distributed across columns and that all process rows contain the same copy of byrow.
p?larf   s,d,c,z   Applies an elementary reflector to a general rectangular matrix.
p?larfb   s,d,c,z   Applies a block reflector or its transpose/conjugate-transpose to a general rectangular matrix.
p?larfc   c,z   Applies the conjugate transpose of an elementary reflector to a general matrix.
p?larfg   s,d,c,z   Generates an elementary reflector (Householder matrix).
p?larft   s,d,c,z   Forms the triangular vector T of a block reflector H=I-VTVH.
p?larz   s,d,c,z   Applies an elementary reflector as returned by p?tzrzf to a general matrix.
p?larzb   s,d,c,z   Applies a block reflector or its transpose/conjugate-transpose as returned by p?tzrzf to a general matrix.
p?larzc   c,z   Applies (multiplies by) the conjugate transpose of an elementary reflector as returned by p?tzrzf to a general matrix.
p?larzt   s,d,c,z   Forms the triangular factor T of a block reflector H=I-VTVH as returned by p?tzrzf.
p?lascl   s,d,c,z   Multiplies a general rectangular matrix by a real scalar defined as Cto/Cfrom.
p?laset   s,d,c,z   Initializes the off-diagonal elements of a matrix to α and the diagonal elements to β.
p?lasmsub   s,d   Looks for a small subdiagonal element from the bottom of the matrix that it can safely set to zero.
p?lassq   s,d,c,z   Updates a sum of squares represented in scaled form.
p?laswp   s,d,c,z   Performs a series of row interchanges on a general rectangular matrix.
p?latra   s,d,c,z   Computes the trace of a general square distributed matrix.


p?latrd   s,d,c,z   Reduces the first nb rows and columns of a symmetric/Hermitian matrix A to real tridiagonal form by an orthogonal/unitary similarity transformation.
p?latrz   s,d,c,z   Reduces an upper trapezoidal matrix to upper triangular form by means of orthogonal/unitary transformations.
p?lauu2   s,d,c,z   Computes the product UUH or LHL, where U and L are upper or lower triangular matrices (local unblocked algorithm).
p?lauum   s,d,c,z   Computes the product UUH or LHL, where U and L are upper or lower triangular matrices.
p?lawil   s,d   Forms the Wilkinson transform.
p?org2l/p?ung2l   s,d,c,z   Generates all or part of the orthogonal/unitary matrix Q from a QL factorization determined by p?geqlf (unblocked algorithm).
p?org2r/p?ung2r   s,d,c,z   Generates all or part of the orthogonal/unitary matrix Q from a QR factorization determined by p?geqrf (unblocked algorithm).
p?orgl2/p?ungl2   s,d,c,z   Generates all or part of the orthogonal/unitary matrix Q from an LQ factorization determined by p?gelqf (unblocked algorithm).
p?orgr2/p?ungr2   s,d,c,z   Generates all or part of the orthogonal/unitary matrix Q from an RQ factorization determined by p?gerqf (unblocked algorithm).
p?orm2l/p?unm2l   s,d,c,z   Multiplies a general matrix by the orthogonal/unitary matrix from a QL factorization determined by p?geqlf (unblocked algorithm).
p?orm2r/p?unm2r   s,d,c,z   Multiplies a general matrix by the orthogonal/unitary matrix from a QR factorization determined by p?geqrf (unblocked algorithm).
p?orml2/p?unml2   s,d,c,z   Multiplies a general matrix by the orthogonal/unitary matrix from an LQ factorization determined by p?gelqf (unblocked algorithm).
p?ormr2/p?unmr2   s,d,c,z   Multiplies a general matrix by the orthogonal/unitary matrix from an RQ factorization determined by p?gerqf (unblocked algorithm).
p?pbtrsv   s,d,c,z   Solves a single triangular linear system via frontsolve or backsolve where the triangular matrix is a factor of a banded matrix computed by p?pbtrf.
p?pttrsv   s,d,c,z   Solves a single triangular linear system via frontsolve or backsolve where the triangular matrix is a factor of a tridiagonal matrix computed by p?pttrf.
p?potf2   s,d,c,z   Computes the Cholesky factorization of a symmetric/Hermitian positive definite matrix (local unblocked algorithm).
p?rot   s,d   Applies a planar rotation to two distributed vectors.
p?rscl   s,d,cs,zd   Multiplies a vector by the reciprocal of a real scalar.
p?sygs2/p?hegs2   s,d,c,z   Reduces a symmetric/Hermitian positive-definite generalized eigenproblem to standard form, using the factorization results obtained from p?potrf (local unblocked algorithm).
p?sytd2/p?hetd2   s,d,c,z   Reduces a symmetric/Hermitian matrix to real symmetric tridiagonal form by an orthogonal/unitary similarity transformation (local unblocked algorithm).


p?trord   s,d   Reorders the Schur factorization of a general matrix.
p?trsen   s,d   Reorders the Schur factorization of a matrix and (optionally) computes the reciprocal condition numbers and invariant subspace for the selected cluster of eigenvalues.
p?trti2   s,d,c,z   Computes the inverse of a triangular matrix (local unblocked algorithm).
?lamsh   s,d   Sends multiple shifts through a small (single node) matrix to maximize the number of bulges that can be sent through.
?laqr6   s,d   Performs a single small-bulge multi-shift QR sweep collecting the transformations.
?lar1va   s,d   Computes scaled eigenvector corresponding to given eigenvalue.
?laref   s,d   Applies Householder reflectors to matrices on either their rows or columns.
?larrb2   s,d   Provides limited bisection to locate eigenvalues for more accuracy.
?larrd2   s,d   Computes the eigenvalues of a symmetric tridiagonal matrix to suitable accuracy.
?larre2   s,d   Given a tridiagonal matrix, sets small off-diagonal elements to zero and for each unreduced block, finds base representations and eigenvalues.
?larre2a   s,d   Given a tridiagonal matrix, sets small off-diagonal elements to zero and for each unreduced block, finds base representations and eigenvalues.
?larrf2   s,d   Finds a new relatively robust representation such that at least one of the eigenvalues is relatively isolated.
?larrv2   s,d   Computes the eigenvectors of the tridiagonal matrix T = L*D*LT given L, D and the eigenvalues of L*D*LT.
?lasorte   s,d   Sorts eigenpairs by real and complex data types.
?lasrt2   s,d   Sorts numbers in increasing or decreasing order.
?stegr2   s,d   Computes selected eigenvalues and eigenvectors of a real symmetric tridiagonal matrix.
?stegr2a   s,d   Computes selected eigenvalues and initial representations needed for eigenvector computations.
?stegr2b   s,d   From eigenvalues and initial representations computes the selected eigenvalues and eigenvectors of the real symmetric tridiagonal matrix in parallel on multiple processors.
?stein2   s,d   Computes the eigenvectors corresponding to specified eigenvalues of a real symmetric tridiagonal matrix, using inverse iteration.
?dbtf2   s,d,c,z   Computes an LU factorization of a general band matrix with no pivoting (local unblocked algorithm).
?dbtrf   s,d,c,z   Computes an LU factorization of a general band matrix with no pivoting (local blocked algorithm).


?dttrf   s,d,c,z   Computes an LU factorization of a general tridiagonal matrix with no pivoting (local blocked algorithm).
?dttrsv   s,d,c,z   Solves a general tridiagonal system of linear equations using the LU factorization computed by ?dttrf.
?pttrsv   s,d,c,z   Solves a symmetric (Hermitian) positive-definite tridiagonal system of linear equations, using the LDLH factorization computed by ?pttrf.
?steqr2   s,d   Computes all eigenvalues and, optionally, eigenvectors of a symmetric tridiagonal matrix using the implicit QL or QR method.
?trmvt   s,d,c,z   Performs matrix-vector operations.
pilaenv   NA   Returns the positive integer value of the logical blocking size.
pilaenvx   NA   Called from the ScaLAPACK routines to choose problem-dependent parameters for the local environment.
pjlaenv   NA   Called from the ScaLAPACK symmetric and Hermitian tailored eigen-routines to choose problem-dependent parameters for the local environment.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

p?lacgv
Conjugates a complex vector.

Syntax
void pclacgv (MKL_INT *n , MKL_Complex8 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT
*descx , MKL_INT *incx );
void pzlacgv (MKL_INT *n , MKL_Complex16 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT
*descx , MKL_INT *incx );

Include Files
• mkl_scalapack.h

Description
The p?lacgv function conjugates a complex vector sub(X) of length n, where sub(X) denotes X(ix, jx:jx+n-1)
if incx = m_x, and X(ix:ix+n-1, jx) if incx = 1.

Input Parameters

n (global) The length of the distributed vector sub(X).

x (local).
Pointer into the local memory to an array of size lld_x * LOCc(n_x). On
entry the vector to be conjugated x[i] = X(ix+(jx-1)*m_x+i*incx), 0
≤i < n.

ix (global) The row index in the global matrix X indicating the first row of
sub(X).

jx (global) The column index in the global matrix X indicating the first column
of sub(X).

descx (global and local) Array of size dlen_=9. The array descriptor for the
distributed matrix X.

incx (global) The global increment for the elements of X. Only two values of
incx are supported in this version, namely 1 and m_x. incx must not be
zero.

Output Parameters

x (local).
On exit, the local pieces of conjugated distributed vector sub(X).

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
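A minimal usage sketch for pzlacgv follows; it conjugates one global column of a distributed matrix and assumes that the descriptor descx and the local array x have already been created (the wrapper function name is illustrative):

#include "mkl_scalapack.h"

/* Conjugate the global column 'col' of the distributed matrix X in place. */
void conjugate_column(MKL_INT n, MKL_Complex16 *x, MKL_INT *descx, MKL_INT col)
{
    MKL_INT ix = 1;        /* start at the first global row                */
    MKL_INT jx = col;      /* global column index of the vector sub(X)     */
    MKL_INT incx = 1;      /* incx = 1 selects X(ix:ix+n-1, jx), a column  */

    pzlacgv(&n, x, &ix, &jx, descx, &incx);
}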

p?max1
Finds the index of the element whose real part has
maximum absolute value (similar to the Level 1 PBLAS
p?amax, but using the absolute value to the real part).

Syntax
void pcmax1 (MKL_INT *n , MKL_Complex8 *amax , MKL_INT *indx , MKL_Complex8 *x ,
MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , MKL_INT *incx );
void pzmax1 (MKL_INT *n , MKL_Complex16 *amax , MKL_INT *indx , MKL_Complex16 *x ,
MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , MKL_INT *incx );

Include Files
• mkl_scalapack.h

Description
The p?max1 function computes the global index of the maximum element in absolute value of a distributed
vector sub(X). The global index is returned in indx and the value is returned in amax, where sub(X) denotes
X(ix:ix+n-1, jx) if incx = 1, X(ix, jx:jx+n-1) if incx = m_x.


Input Parameters

n (global). The number of components of the distributed vector sub(X). n ≥ 0.

x (local)
Pointer into the local memory to an array of size lld_x * LOCc(jx+n-1). On
entry this array contains the local pieces of the distributed vector sub(X).

ix (global) The row index in the global matrix X indicating the first row of
sub(X).

jx (global) The column index in the global matrix X indicating the first column
of sub(X).

descx (global and local) Array of size dlen_. The array descriptor for the
distributed matrix X.

incx (global).The global increment for the elements of X. Only two values of incx
are supported in this version, namely 1 and m_x. incx must not be zero.

Output Parameters

amax (global output).The absolute value of the largest entry of the distributed
vector sub(X) only in the scope of sub(X).

indx (global output).The global index of the element of the distributed vector
sub(X) whose real part has maximum absolute value.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
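The following sketch shows a typical pzmax1 call on a distributed column vector; descx and x are assumed to be set up already, and the wrapper name is illustrative:

#include "mkl_scalapack.h"

/* Find the entry of the distributed column X(1:n, 1) whose real part has
   the largest absolute value; the value is returned in *amax and its
   global index in *indx.                                                  */
void max_real_part(MKL_INT n, MKL_Complex16 *x, MKL_INT *descx,
                   MKL_Complex16 *amax, MKL_INT *indx)
{
    MKL_INT ix = 1, jx = 1;
    MKL_INT incx = 1;      /* column vector X(ix:ix+n-1, jx) */

    pzmax1(&n, amax, indx, x, &ix, &jx, descx, &incx);
}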

pilaver
Returns the ScaLAPACK version.

Syntax
void pilaver (MKL_INT* vers_major, MKL_INT* vers_minor, MKL_INT* vers_patch);

Include Files
• mkl_scalapack.h

Description
This function returns the ScaLAPACK version.

Output Parameters

vers_major Return the ScaLAPACK major version.

vers_minor Return the ScaLAPACK minor version from the major version.

vers_patch Return the ScaLAPACK patch version from the minor version.
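A minimal example of querying the version with pilaver:

#include <stdio.h>
#include "mkl_scalapack.h"

int main(void)
{
    MKL_INT major, minor, patch;
    pilaver(&major, &minor, &patch);           /* query the ScaLAPACK version */
    printf("ScaLAPACK %lld.%lld.%lld\n",
           (long long)major, (long long)minor, (long long)patch);
    return 0;
}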

pmpcol
Finds the collaborators of a process.

Syntax
void pmpcol(MKL_INT* myproc, MKL_INT* nprocs, MKL_INT* iil, MKL_INT* needil, MKL_INT*
neediu, MKL_INT* pmyils, MKL_INT* pmyius, MKL_INT* colbrt, MKL_INT* frstcl, MKL_INT*
lastcl);

Include Files
• mkl_scalapack.h

Description
Using the output from pmpim2 and given the information on eigenvalue clusters, pmpcol finds the
collaborators of myproc.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

myproc The processor number, 0 ≤myproc < nprocs.

nprocs The total number of processors available.

iil The index of the leftmost eigenvalue in the eigenvalue cluster.

needil The leftmost position in the eigenvalue cluster needed by myproc.

neediu The rightmost position in the eigenvalue cluster needed by myproc.

pmyils array
For each processor p, 0 < p≤nprocs, pmyils[p-1] is the index of the first
eigenvalue in the eigenvalue cluster to be computed.
pmyils[p-1] equals zero if p stays idle.

pmyius array
For each processor p, pmyius[p-1] is the index of the last eigenvalue in the
eigenvalue cluster to be computed.
pmyius[p-1] equals zero if p stays idle.


Output Parameters

colbrt Non-zero if myproc collaborates.

frstcl, lastcl First and last collaborator of myproc.

myproc collaborates with:
frstcl, ..., myproc-1, myproc+1, ..., lastcl.
If myproc = frstcl, there are no collaborators on the left. If myproc =
lastcl, there are no collaborators on the right.
If frstcl = 0 and lastcl = nprocs-1, then myproc collaborates with
everybody.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

pmpim2
Computes the eigenpair range assignments for all
processes.

Syntax
void pmpim2(MKL_INT* il, MKL_INT* iu, MKL_INT* nprocs, MKL_INT* pmyils, MKL_INT*
pmyius);

Include Files
• mkl_scalapack.h

Description
pmpim2 is the scheduling function. It computes for all processors the eigenpair range assignments.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

il, iu The range of eigenpairs to be computed.

nprocs The total number of processors available.

Output Parameters

pmyils array

For each processor p, pmyils[p-1] is the index of the first eigenvalue in a
cluster to be computed.
pmyils[p-1] equals zero if p stays idle.

pmyius array
For each processor p, pmyius[p-1] is the index of the last eigenvalue in a
cluster to be computed.
pmyius[p-1] equals zero if p stays idle.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
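
The following sketch is not part of the original reference; it shows one way the two scheduling helpers might be combined from C, using the prototypes documented above. The cluster bounds iil, needil, and neediu are assumed to come from the surrounding eigensolver, and schedule_and_find_collaborators is a hypothetical wrapper name.

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Hypothetical wrapper: assign eigenpairs il..iu to nprocs processes with
   pmpim2, then let process myproc find its collaborators with pmpcol.
   iil, needil, and neediu are assumed to be known from the eigensolver. */
void schedule_and_find_collaborators(MKL_INT il, MKL_INT iu, MKL_INT nprocs,
                                     MKL_INT myproc, MKL_INT iil,
                                     MKL_INT needil, MKL_INT neediu)
{
    MKL_INT *pmyils = (MKL_INT*)malloc(nprocs * sizeof(MKL_INT));
    MKL_INT *pmyius = (MKL_INT*)malloc(nprocs * sizeof(MKL_INT));
    MKL_INT colbrt, frstcl, lastcl;

    pmpim2(&il, &iu, &nprocs, pmyils, pmyius);           /* range assignments */
    pmpcol(&myproc, &nprocs, &iil, &needil, &neediu,     /* collaborators     */
           pmyils, pmyius, &colbrt, &frstcl, &lastcl);

    /* colbrt != 0 means myproc collaborates with processes frstcl, ...,
       lastcl (excluding itself). */
    free(pmyils);
    free(pmyius);
}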

?combamax1
Finds the element with maximum real part absolute
value and its corresponding global index.

Syntax
void ccombamax1 (MKL_Complex8 *v1 , MKL_Complex8 *v2 );
void zcombamax1 (MKL_Complex16 *v1 , MKL_Complex16 *v2 );

Include Files
• mkl_scalapack.h

Description
The ?combamax1 function finds the element having maximum real part absolute value as well as its
corresponding global index.

Input Parameters

v1 (local)
Array of size 2. The first maximum absolute value element and its global
index. v1[0]=amax, v1[1]=indx.

v2 (local)
Array of size 2. The second maximum absolute value element and its global
index. v2[0]=amax, v2[1]=indx.

Output Parameters

v1 (local).
The first maximum absolute value element and its global index.
v1[0]=amax, v1[1]=indx.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.


p?sum1
Forms the 1-norm of a complex vector similar to Level
1 PBLAS p?asum, but using the true absolute value.

Syntax
void pscsum1 (MKL_INT *n , float *asum , MKL_Complex8 *x , MKL_INT *ix , MKL_INT *jx ,
MKL_INT *descx , MKL_INT *incx );
void pdzsum1 (MKL_INT *n , double *asum , MKL_Complex16 *x , MKL_INT *ix , MKL_INT
*jx , MKL_INT *descx , MKL_INT *incx );

Include Files
• mkl_scalapack.h

Description
The p?sum1 function returns in asum the sum of absolute values of a complex distributed vector sub(x), where sub(x) denotes X(ix:ix+n-1, jx:jx) if incx = 1, or X(ix:ix, jx:jx+n-1) if incx = m_x.

The function is based on p?asum from the Level 1 PBLAS; the difference is that the true absolute value of each element is used.

Input Parameters

n (global). The number of components of the distributed vector sub(x). n ≥ 0.

x (local )
Pointer into the local memory to an array of size lld_x * LOCc(jx+n-1). This
array contains the local pieces of the distributed vector sub(X).

ix (global) The row index in the global matrix X indicating the first row of
sub(X).

jx (global) The column index in the global matrix X indicating the first column
of sub(X)

descx (local) Array of size dlen_=9. The array descriptor for the distributed matrix
X.

incx (global) The global increment for the elements of X. Only two values of
incx are supported in this version, namely 1 and m_x.

Output Parameters

asum (local)
The sum of absolute values of the distributed vector sub(X) only in its
scope.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
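
As a rough illustration, not taken from the reference, the double-precision flavor might be called as follows, assuming x and descx describe an n-element distributed column vector already set up elsewhere; column_abs_sum is a hypothetical helper name.

#include "mkl_scalapack.h"

/* Sketch: sum of true absolute values of the distributed column vector
   X(ix:ix+n-1, jx).  Only the processes in the scope of sub(X) receive a
   meaningful asum. */
double column_abs_sum(MKL_INT n, MKL_Complex16 *x, MKL_INT ix, MKL_INT jx,
                      MKL_INT *descx)
{
    double asum = 0.0;
    MKL_INT incx = 1;    /* 1 (column) and m_x (row) are the supported values */
    pdzsum1(&n, &asum, x, &ix, &jx, descx, &incx);
    return asum;
}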

p?dbtrsv
Solves a banded triangular system of linear equations
produced by the factorization function p?dbtrf. The
function is called by p?dbtrs.

Syntax
void psdbtrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu ,
MKL_INT *nrhs , float *a , MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib ,
MKL_INT *descb , float *af , MKL_INT *laf , float *work , MKL_INT *lwork , MKL_INT
*info );
void pddbtrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu ,
MKL_INT *nrhs , double *a , MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib ,
MKL_INT *descb , double *af , MKL_INT *laf , double *work , MKL_INT *lwork , MKL_INT
*info );
void pcdbtrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu ,
MKL_INT *nrhs , MKL_Complex8 *a , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b ,
MKL_INT *ib , MKL_INT *descb , MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzdbtrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *bwl , MKL_INT *bwu ,
MKL_INT *nrhs , MKL_Complex16 *a , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b ,
MKL_INT *ib , MKL_INT *descb , MKL_Complex16 *af , MKL_INT *laf , MKL_Complex16
*work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?dbtrsv function solves a banded triangular system of linear equations

A(1:n, ja:ja+n-1) * X = B(ib:ib+n-1, 1:nrhs) or

A(1:n, ja:ja+n-1)^T * X = B(ib:ib+n-1, 1:nrhs) (for real flavors); A(1:n, ja:ja+n-1)^H * X = B(ib:ib+n-1, 1:nrhs) (for complex flavors),

where A(1:n, ja:ja+n-1) is a banded triangular matrix factor produced by the Gaussian elimination code of p?dbtrf and is stored in A(1:n, ja:ja+n-1) and af. The matrix stored in A(1:n, ja:ja+n-1) is either upper or lower triangular according to uplo, and the choice of solving with A(1:n, ja:ja+n-1) or A(1:n, ja:ja+n-1)^T is dictated by the parameter trans.
The function p?dbtrf must be called first.

Input Parameters

uplo (global)
If uplo='U', the upper triangle of A(1:n, ja:ja+n-1) is stored,

if uplo = 'L', the lower triangle of A(1:n, ja:ja+n-1) is stored.

trans (global)
If trans = 'N', solve with A(1:n, ja:ja+n-1),

if trans = 'C', solve with conjugate transpose A(1:n, ja:ja+n-1).


n (global) The order of the distributed submatrix A;(n≥ 0).

bwl (global) Number of subdiagonals. 0 ≤bwl≤n-1.

bwu (global) Number of superdiagonals. 0 ≤ bwu ≤ n-1.

nrhs (global) The number of right-hand sides; the number of columns of the
distributed submatrix B (nrhs≥ 0).

a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1),
where lld_a ≥ (bwl+bwu+1). On entry, this array contains the local pieces of
the n-by-n unsymmetric banded distributed factor L or L^T,
represented in global A as A(1:n, ja:ja+n-1). This local portion is stored
in the packed banded format used in LAPACK. See the Application Notes
below and the ScaLAPACK manual for more detail on the format of
distributed matrices.

ja (global) The index in the global matrix A that points to the start of the
matrix to be operated on (which may be either all of A or a submatrix of A).

desca (global and local) array of size dlen_.


if 1d type (dtype_a = 501 or 502), dlen≥ 7;
if 2d type (dtype_a = 1), dlen≥ 9. The array descriptor for the distributed
matrix A. Contains information of mapping of A to memory.

b (local)
Pointer into the local memory to an array of local lead dimension lld_b≥nb.
On entry, this array contains the local pieces of the right-hand sides
B(ib:ib+n-1, 1:nrhs).

ib (global) The row index in the global matrix B that points to the first row of
the matrix to be operated on (which may be either all of B or a submatrix of
B).

descb (global and local) array of size dlen_.


if 1d type (dtype_b =502), dlen≥7;

if 2d type (dtype_b =1), dlen≥9. The array descriptor for the distributed
matrix B. Contains information of mapping B to memory.

laf (local)
Size of user-input auxiliary fill-in space af.

laf ≥ nb*(bwl+bwu) + 6*max(bwl, bwu)*max(bwl, bwu). If laf is not large enough, an error code is returned and the minimum acceptable size will be returned in af[0].

work (local).
Temporary workspace. This space may be overwritten in between function
calls.
work must be the size given in lwork.

lwork (local or global)
Size of user-input workspace work. If lwork is too small, the minimal
acceptable size will be returned in work[0] and an error code is returned.

lwork≥ max(bwl, bwu)*nrhs.

Output Parameters

a (local).
This local portion is stored in the packed banded format used in LAPACK.
Please see the ScaLAPACK manual for more detail on the format of
distributed matrices.

b On exit, this contains the local piece of the solutions distributed matrix X.

af (local).
auxiliary fill-in space. The fill-in space is created in a call to the factorization
function p?dbtrf and is stored in af. If a linear system is to be solved
using p?dbtrf after the factorization function, af must not be altered after
the factorization.

work On exit, work[0] contains the minimal lwork.

info (local).
If info = 0, the execution is successful.

< 0: If the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value, then info= - (i*100+j), if the i-th argument is a scalar and
had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
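
As a hedged illustration, not from the reference, the lower bounds for laf and lwork stated above can be evaluated in C before allocating the corresponding buffers; dbtrsv_workspace is a hypothetical helper and nb is the block size of the distributed matrix A.

#include "mkl_types.h"

/* Sketch: minimum sizes for the fill-in space af and the workspace work of
   p?dbtrsv, per the bounds laf >= nb*(bwl+bwu) + 6*max(bwl,bwu)*max(bwl,bwu)
   and lwork >= max(bwl,bwu)*nrhs given above. */
static MKL_INT imax(MKL_INT a, MKL_INT b) { return a > b ? a : b; }

void dbtrsv_workspace(MKL_INT nb, MKL_INT bwl, MKL_INT bwu, MKL_INT nrhs,
                      MKL_INT *laf, MKL_INT *lwork)
{
    MKL_INT bwmax = imax(bwl, bwu);
    *laf   = nb * (bwl + bwu) + 6 * bwmax * bwmax;
    *lwork = bwmax * nrhs;
}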

p?dttrsv
Solves a tridiagonal triangular system of linear
equations produced by the factorization function
p?dttrf. The function is called by p?dttrs.

Syntax
void psdttrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *nrhs , float *dl ,
float *d , float *du , MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT
*descb , float *af , MKL_INT *laf , float *work , MKL_INT *lwork , MKL_INT *info );
void pddttrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *nrhs , double *dl ,
double *d , double *du , MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib ,
MKL_INT *descb , double *af , MKL_INT *laf , double *work , MKL_INT *lwork , MKL_INT
*info );
void pcdttrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *nrhs , MKL_Complex8
*dl , MKL_Complex8 *d , MKL_Complex8 *du , MKL_INT *ja , MKL_INT *desca , MKL_Complex8
*b , MKL_INT *ib , MKL_INT *descb , MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8
*work , MKL_INT *lwork , MKL_INT *info );


void pzdttrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *nrhs , MKL_Complex16
*dl , MKL_Complex16 *d , MKL_Complex16 *du , MKL_INT *ja , MKL_INT *desca ,
MKL_Complex16 *b , MKL_INT *ib , MKL_INT *descb , MKL_Complex16 *af , MKL_INT *laf ,
MKL_Complex16 *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?dttrsv function solves a tridiagonal triangular system of linear equations

A(1:n, ja:ja+n-1) * X = B(ib:ib+n-1, 1:nrhs) or

A(1:n, ja:ja+n-1)^T * X = B(ib:ib+n-1, 1:nrhs) for real flavors; A(1:n, ja:ja+n-1)^H * X = B(ib:ib+n-1, 1:nrhs) for complex flavors,

where A(1:n, ja:ja+n-1) is a tridiagonal matrix factor produced by the Gaussian elimination code of p?dttrf and is stored in A(1:n, ja:ja+n-1) and af.
The matrix stored in A(1:n, ja:ja+n-1) is either upper or lower triangular according to uplo, and the choice of solving with A(1:n, ja:ja+n-1) or A(1:n, ja:ja+n-1)^T is dictated by the parameter trans.
The function p?dttrf must be called first.

Input Parameters

uplo (global)
If uplo='U', the upper triangle of A(1:n, ja:ja+n-1) is stored,

if uplo = 'L', the lower triangle of A(1:n, ja:ja+n-1) is stored.

trans (global)
If trans = 'N', solve with A(1:n, ja:ja+n-1),

if trans = 'C', solve with conjugate transpose A(1:n, ja:ja+n-1).

n (global) The order of the distributed submatrix A;(n≥ 0).

nrhs (global) The number of right-hand sides; the number of columns of the
distributed submatrix B(ib:ib+n-1, 1:nrhs). (nrhs≥ 0).

dl (local).
Pointer to local part of global vector storing the lower diagonal of the
matrix.
Globally, dl[0] is not referenced, and dl must be aligned with d.

Must be of size ≥nb_a.

d (local).
Pointer to local part of global vector storing the main diagonal of the matrix.

du (local).
Pointer to local part of global vector storing the upper diagonal of the
matrix.

Globally, du[n-1] is not referenced, and du must be aligned with d.

ja (global) The index in the global matrix A that points to the start of the
matrix to be operated on (which may be either all of A or a submatrix of A).

desca (global and local) array of size dlen_.


if 1d type (dtype_a = 501 or 502), dlen≥ 7;

if 2d type (dtype_a = 1), dlen≥ 9.

The array descriptor for the distributed matrix A. Contains information of mapping of A to memory.

b (local)
Pointer into the local memory to an array of local lead dimension lld_b≥nb.
On entry, this array contains the local pieces of the right-hand sides
B(ib:ib+n-1, 1 :nrhs).

ib (global) The row index in the global matrix B that points to the first row of
the matrix to be operated on (which may be either all of B or a submatrix of
B).

descb (global and local) array of size dlen_.


if 1d type (dtype_b = 502), dlen≥7;

if 2d type (dtype_b = 1), dlen≥ 9.

The array descriptor for the distributed matrix B. Contains information of mapping B to memory.

laf (local).
Size of user-input auxiliary fill-in space af.

laf≥ 2*(nb+2). If laf is not large enough, an error code is returned and
the minimum acceptable size will be returned in af[0].

work (local).
Temporary workspace. This space may be overwritten in between function
calls.
work must be the size given in lwork.

lwork (local or global)


Size of user-input workspace work. If lwork is too small, the minimal
acceptable size will be returned in work[0] and an error code is returned.

lwork≥ 10*npcol+4*nrhs.

Output Parameters

dl (local).
On exit, this array contains information on the factors of the matrix.

d On exit, this array contains information on the factors of the matrix.
Must be of size ≥ nb_a.


b On exit, this contains the local piece of the solutions distributed matrix X.

af (local).
Auxiliary fill-in space. The fill-in space is created in a call to the factorization
function p?dttrf and is stored in af. If a linear system is to be solved
using p?dttrs after the factorization function, af must not be altered after
the factorization.

work On exit, work[0] contains the minimal lwork.

info (local).
If info=0, the execution is successful.

If info < 0: if the i-th argument is an array and the j-th entry, indexed j-1, had an illegal value, then info = -(i*100+j); if the i-th argument is a scalar and had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
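
The info convention above (info = -(i*100+j) for an illegal entry of an array argument, info = -i for an illegal scalar argument) is shared by the auxiliary solvers in this section. The small helper below is not part of the reference; decode_info is a hypothetical name and only illustrates how a caller might report the value.

#include <stdio.h>
#include "mkl_types.h"

/* Sketch: report a negative info value using the convention described above. */
void decode_info(MKL_INT info)
{
    if (info >= 0)
        return;                                   /* success or singularity  */
    MKL_INT code = -info;
    if (code > 100)                               /* array argument          */
        printf("argument %lld: entry with 0-based index %lld is illegal\n",
               (long long)(code / 100), (long long)(code % 100 - 1));
    else                                          /* scalar argument         */
        printf("scalar argument %lld is illegal\n", (long long)code);
}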

p?gebal
Balances a general real matrix.

Syntax
void psgebal(char* job, MKL_INT* n, float* a, MKL_INT* desca, MKL_INT* ilo, MKL_INT*
ihi, float* scale, MKL_INT* info);
void pdgebal(char* job, MKL_INT* n, double* a, MKL_INT* desca, MKL_INT* ilo, MKL_INT*
ihi, double* scale, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
p?gebal balances a general real matrix A. This involves, first, permuting A by a similarity transformation to
isolate eigenvalues in the first 1 to ilo-1 and last ihi+1 to n elements on the diagonal; and second,
applying a diagonal similarity transformation to rows and columns ilo to ihi to make the rows and columns
as close in norm as possible. Both steps are optional.
Balancing may reduce the 1-norm of the matrix, and improve the accuracy of the computed eigenvalues
and/or eigenvectors.

Input Parameters

job (global )
Specifies the operations to be performed on a:

= 'N': none: simply set ilo = 1, ihi = n, scale[i] = 1.0 for i = 0,...,n-1;

= 'P': permute only;


= 'S': scale only;
= 'B': both permute and scale.

n (global )
The order of the matrix A (n≥ 0).

a (local ) Pointer into the local memory to an array of size lld_a * LOCc(n)

This array contains the local pieces of global input matrix A.

desca (global and local) array of size dlen_.


The array descriptor for the distributed matrix A.

Output Parameters

a On exit, a is overwritten by the balanced matrix A.

If job = 'N', a is not referenced.

See Notes for further details.

ilo, ihi (global )


ilo and ihi are set to integers such that on exit matrix elements A(i,j) are
zero if i > j and j = 1,...,ilo-1 or i = ihi+1,...,n.

If job = 'N' or 'S', ilo = 1 and ihi = n.

scale (global ) array of size n.

Details of the permutations and scaling factors applied to a. If p_j is the index of the row and column interchanged with row and column j, and d_j is the scaling factor applied to row and column j, then
scale[j-1] = p_j for j = 1,...,ilo-1
scale[j-1] = d_j for j = ilo,...,ihi
The order in which the interchanges are made is n to ihi+1, then 1 to ilo-1.

info (global )
= 0: successful exit.
< 0: if info = -i, the i-th argument had an illegal value.

Application Notes
The permutations consist of row and column interchanges which put the matrix in the form

          ( T1   X   Y  )
  P A P = (  0   B   Z  )
          (  0   0   T2 )

where T1 and T2 are upper triangular matrices whose eigenvalues lie along the diagonal. The column indices ilo and ihi mark the starting and ending columns of the submatrix B. Balancing consists of applying a diagonal similarity transformation D^-1*B*D to make the 1-norms of each row of B and its corresponding column nearly equal. The output matrix is

  ( T1     X*D        Y     )
  (  0   D^-1*B*D   D^-1*Z  )
  (  0      0         T2    )


Information about the permutations P and the diagonal matrix D is returned in the vector scale.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
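
A minimal usage sketch, not part of the reference, assuming a and desca already describe a valid n-by-n block-cyclic distributed matrix; balance_matrix is a hypothetical wrapper.

#include "mkl_scalapack.h"

/* Sketch: permute and scale a distributed general real matrix with pdgebal. */
void balance_matrix(MKL_INT n, double *a, MKL_INT *desca,
                    double *scale, MKL_INT *ilo, MKL_INT *ihi)
{
    char job = 'B';                 /* both permute and scale */
    MKL_INT info;
    pdgebal(&job, &n, a, desca, ilo, ihi, scale, &info);
    /* On success (info == 0): rows/columns ilo..ihi hold the balanced block;
       scale records the permutations and scaling factors. */
}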

p?gebd2
Reduces a general rectangular matrix to real
bidiagonal form by an orthogonal/unitary
transformation (unblocked algorithm).

Syntax
void psgebd2 (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *d , float *e , float *tauq , float *taup , float *work , MKL_INT
*lwork , MKL_INT *info );
void pdgebd2 (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *d , double *e , double *tauq , double *taup , double *work ,
MKL_INT *lwork , MKL_INT *info );
void pcgebd2 (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , float *d , float *e , MKL_Complex8 *tauq , MKL_Complex8 *taup ,
MKL_Complex8 *work , MKL_INT *lwork , MKL_INT *info );
void pzgebd2 (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *d , double *e , MKL_Complex16 *tauq , MKL_Complex16 *taup ,
MKL_Complex16 *work , MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?gebd2 function reduces a real/complex general m-by-n distributed matrix sub(A) = A(ia:ia+m-1,
ja:ja+n-1) to upper or lower bidiagonal form B by an orthogonal/unitary transformation:

Q'*sub(A)*P = B.

If m ≥ n, B is upper bidiagonal; if m < n, B is lower bidiagonal.

Input Parameters

m (global)
The number of rows of the distributed matrix sub(A). (m≥0).

n (global)

The number of columns in the distributed matrix sub(A). (n≥0).

a (local).
Pointer into the local memory to an array of sizelld_a * LOCc(ja+n-1).

On entry, this array contains the local pieces of the general distributed
matrix sub(A).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local).
This is a workspace array of size lwork.

lwork (local or global)


The size of the array work.
lwork is local input and must be at least lwork ≥ max(mpa0, nqa0),
where nb = mb_a = nb_a, iroffa = mod(ia-1, nb), icoffa = mod(ja-1, nb),
iarow = indxg2p(ia, nb, myrow, rsrc_a, nprow),
iacol = indxg2p(ja, nb, mycol, csrc_a, npcol),
mpa0 = numroc(m+iroffa, nb, myrow, iarow, nprow),
nqa0 = numroc(n+icoffa, nb, mycol, iacol, npcol).
indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol, nprow, and npcol can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a (local).
On exit, if m ≥ n, the diagonal and the first superdiagonal of sub(A) are
overwritten with the upper bidiagonal matrix B; the elements below the
diagonal, with the array tauq, represent the orthogonal/unitary matrix Q as
a product of elementary reflectors, and the elements above the first
superdiagonal, with the array taup, represent the orthogonal matrix P as a
product of elementary reflectors. If m < n, the diagonal and the first
subdiagonal are overwritten with the lower bidiagonal matrix B; the
elements below the first subdiagonal, with the array tauq, represent the
orthogonal/unitary matrix Q as a product of elementary reflectors, and the
elements above the diagonal, with the array taup, represent the orthogonal
matrix P as a product of elementary reflectors. See Applications Notes
below.

d (local)


Array of size LOCc(ja+min(m,n)-1) if m ≥ n; LOCr(ia+min(m,n)-1) otherwise. The distributed diagonal elements of the bidiagonal matrix B:
d[i] = A(i+1,i+1), i = 0, 1, ..., size(d) - 1.
d is tied to the distributed matrix A.

e (local)
Array of size LOCc(ja+min(m,n)-1) if m ≥ n; LOCr(ia+min(m,n)-2) otherwise. The distributed off-diagonal elements of the bidiagonal matrix B:
if m ≥ n, e[i] = A(i+1,i+2) for i = 0, 1, ..., n-2;
if m < n, e[i] = A(i+2,i+1) for i = 0, 1, ..., m-2.
e is tied to the distributed matrix A.

tauq (local).
Array of size LOCc(ja+min(m,n)-1). The scalar factors of the elementary
reflectors which represent the orthogonal/unitary matrix Q. tauq is tied to
the distributed matrix A.

taup (local).
Array of size LOCr(ia+min(m,n)-1). The scalar factors of the elementary
reflectors which represent the orthogonal/unitary matrix P. taup is tied to
the distributed matrix A.

work On exit, work[0] returns the minimal and optimal lwork.

info (local)

If info = 0, the execution is successful.

if info < 0: If the i-th argument is an array and the j-th entry, indexed
j-1, had an illegal value, then info = - (i*100+j), if the i-th argument is a
scalar and had an illegal value, then info = -i.

Application Notes
The matrices Q and P are represented as products of elementary reflectors:
If m≥n,

Q = H(1)*H(2)*...*H(n), and P = G(1)*G(2)*...*G(n-1)


Each H(i) and G(i) has the form:
H(i) = I - tauq*v*v', and G(i) = I - taup*u*u',
where tauq and taup are real/complex scalars, and v and u are real/complex vectors. v(1:i-1) = 0, v(i) = 1, and v(i+1:m) is stored on exit in A(ia+i:ia+m-1, ja+i-1);
u(1:i) = 0, u(i+1) = 1, and u(i+2:n) is stored on exit in A(ia+i-1, ja+i+1:ja+n-1);
tauq is stored in tauq[ja+i-2] and taup in taup[ia+i-2].

If m < n,

Q = H(1)*H(2)*...*H(m-1), and P = G(1)*G(2)*...*G(m)

v(1:i) = 0, v(i+1) = 1, and v(i+2:m) is stored on exit in A(ia+i+1:ia+m-1, ja+i-1);
u(1:i-1) = 0, u(i) = 1, and u(i+1:n) is stored on exit in A(ia+i-1, ja+i:ja+n-1);
tauq is stored in tauq[ja+i-2] and taup in taup[ia+i-2].

The contents of sub(A) on exit are illustrated by the following examples:

m = 6 and n = 5 (m > n):            m = 5 and n = 6 (m < n):

(  d   e   u1  u1  u1 )             (  d   u1  u1  u1  u1  u1 )
(  v1  d   e   u2  u2 )             (  e   d   u2  u2  u2  u2 )
(  v1  v2  d   e   u3 )             (  v1  e   d   u3  u3  u3 )
(  v1  v2  v3  d   e  )             (  v1  v2  e   d   u4  u4 )
(  v1  v2  v3  v4  d  )             (  v1  v2  v3  e   d   u5 )
(  v1  v2  v3  v4  v5 )

where d and e denote diagonal and off-diagonal elements of B, vi denotes an element of the vector defining H(i), and ui an element of the vector defining G(i).

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
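
The lwork = -1 workspace query described above can be used instead of evaluating the numroc-based bound by hand. A hedged sketch for the double-precision flavor follows; it is not part of the reference, bidiagonalize is a hypothetical wrapper, and a, desca, d, e, tauq, taup are assumed to be valid distributed data set up elsewhere.

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Sketch: query the workspace size, allocate it, then reduce sub(A) to
   bidiagonal form with pdgebd2. */
void bidiagonalize(MKL_INT m, MKL_INT n, double *a, MKL_INT ia, MKL_INT ja,
                   MKL_INT *desca, double *d, double *e,
                   double *tauq, double *taup)
{
    double wkopt;
    MKL_INT lwork = -1, info;

    /* Workspace query: the minimal/optimal lwork is returned in wkopt. */
    pdgebd2(&m, &n, a, &ia, &ja, desca, d, e, tauq, taup, &wkopt, &lwork, &info);

    lwork = (MKL_INT)wkopt;
    double *work = (double*)malloc(lwork * sizeof(double));
    pdgebd2(&m, &n, a, &ia, &ja, desca, d, e, tauq, taup, work, &lwork, &info);
    free(work);
}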

p?gehd2
Reduces a general matrix to upper Hessenberg form
by an orthogonal/unitary similarity transformation
(unblocked algorithm).

Syntax
void psgehd2 (MKL_INT *n , MKL_INT *ilo , MKL_INT *ihi , float *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *tau , float *work , MKL_INT *lwork , MKL_INT
*info );
void pdgehd2 (MKL_INT *n , MKL_INT *ilo , MKL_INT *ihi , double *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT
*info );
void pcgehd2 (MKL_INT *n , MKL_INT *ilo , MKL_INT *ihi , MKL_Complex8 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzgehd2 (MKL_INT *n , MKL_INT *ilo , MKL_INT *ihi , MKL_Complex16 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?gehd2 function reduces a real/complex general distributed matrix sub(A) to upper Hessenberg form H by an orthogonal/unitary similarity transformation: Q'*sub(A)*Q = H, where sub(A) = A(ia:ia+n-1, ja:ja+n-1).

Input Parameters

n (global) The order of the distributed submatrix A. (n≥ 0).

ilo, ihi (global) It is assumed that the matrix sub(A) is already upper triangular in rows ia:ia+ilo-2 and ia+ihi:ia+n-1 and columns ja:ja+ilo-2 and ja+ihi:ja+n-1. See Application Notes for further information.


If n≥ 0, 1 ≤ilo≤ihi≤n; otherwise set ilo = 1, ihi = n.

a (local).
Pointer into the local memory to an array of sizelld_a * LOCc(ja+n-1).

On entry, this array contains the local pieces of the n-by-n general
distributed matrix sub(A) to be reduced.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local).
This is a workspace array of size lwork.

lwork (local or global)


The size of the array work.
lwork is local input and must be at least lwork≥nb + max( npa0, nb ),
where nb = mb_a = nb_a, iroffa = mod( ia-1, nb ), iarow =
indxg2p ( ia, nb, myrow, rsrc_a, nprow ),npa0 = numroc(ihi
+iroffa, nb, myrow, iarow, nprow ).
indxg2p and numroc are ScaLAPACK tool functions;myrow, mycol, nprow,
and npcol can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a (local). On exit, the upper triangle and the first subdiagonal of sub(A) are
overwritten with the upper Hessenberg matrix H, and the elements below
the first subdiagonal, with the array tau, represent the orthogonal/unitary
matrix Q as a product of elementary reflectors. (see Application Notes
below).

tau (local).
Array of size LOCc(ja+n-2) The scalar factors of the elementary reflectors
(see Application Notes below). Elements ja:ja+ilo-2 and ja+ihi:ja+n-2
of the global vector tau are set to zero. tau is tied to the distributed matrix
A.

work On exit, work[0] returns the minimal and optimal lwork.

info (local)
If info = 0, the execution is successful.

if info < 0: If the i-th argument is an array and the j-th entry, indexed j-1,
had an illegal value, then info = - (i*100+j), if the i-th argument is a
scalar and had an illegal value, then info = -i.

Application Notes
The matrix Q is represented as a product of (ihi-ilo) elementary reflectors

Q = H(ilo)*H(ilo+1)*...*H(ihi-1).

Each H(i) has the form

H(i) = I - tau*v*v',

where tau is a real/complex scalar, and v is a real/complex vector with v(1:i) = 0, v(i+1) = 1 and v(ihi+1:n) = 0; v(i+2:ihi) is stored on exit in A(ia+ilo+i:ia+ihi-1, ja+ilo+i-2), and tau in tau[ja+ilo+i-3].
The contents of A(ia:ia+n-1, ja:ja+n-1) are illustrated by the following example, with n = 7, ilo = 2
and ihi = 6:

where a denotes an element of the original matrix sub(A), h denotes a modified element of the upper
Hessenberg matrix H, and vi denotes an element of the vector defining H(ja+ilo+i-2).

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?gelq2
Computes an LQ factorization of a general rectangular
matrix (unblocked algorithm).

Syntax
void psgelq2 (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdgelq2 (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
void pcgelq2 (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT
*info );
void pzgelq2 (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT
*info );

Include Files
• mkl_scalapack.h


Description
The p?gelq2 function computes an LQ factorization of a real/complex distributed m-by-n matrix sub(A) = A(ia:ia+m-1, ja:ja+n-1) = L*Q.

Input Parameters

m (global)
The number of rows of the distributed matrix sub(A). (m≥0).

n (global)
The number of columns of the distributed matrix sub(A). (n≥0).

a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).

On entry, this array contains the local pieces of the m-by-n distributed
matrix sub(A) which is to be factored.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local).
This is a workspace array of size lwork.

lwork (local or global)


The size of the array work.

lwork is local input and must be at least lwork≥nq0 + max( 1, mp0 ),


where iroff = mod(ia-1, mb_a), icoff = mod(ja-1, nb_a),

iarow = indxg2p(ia, mb_a, myrow, rsrc_a, nprow),


iacol = indxg2p(ja, nb_a, mycol, csrc_a, npcol),
mp0 = numroc(m+iroff, mb_a, myrow, iarow, nprow),
nq0 = numroc(n+icoff, nb_a, mycol, iacol, npcol),
indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol, nprow,
and npcol can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size
for all work arrays. Each of these values is returned in the first entry
of the corresponding work array, and no error message is issued by
pxerbla.

Output Parameters

a (local).

On exit, the elements on and below the diagonal of sub(A) contain the m by
min(m,n) lower trapezoidal matrix L (L is lower triangular if m≤n); the
elements above the diagonal, with the array tau, represent the orthogonal/
unitary matrix Q as a product of elementary reflectors (see Application
Notes below).

tau (local).
Array of size LOCr(ia+min(m, n)-1). This array contains the scalar
factors of the elementary reflectors. tau is tied to the distributed matrix A.

work On exit, work[0] returns the minimal and optimal lwork.

info (local) If info = 0, the execution is successful. if info < 0: If the i-th
argument is an array and the j-th entry, indexed j-1, had an illegal value,
then info = - (i*100+j), if the i-th argument is a scalar and had an illegal
value, then info = -i.

Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(ia+k-1)*H(ia+k-2)*...*H(ia) for real flavors, Q = (H(ia+k-1))^H*(H(ia+k-2))^H*...*(H(ia))^H for complex flavors,
where k = min(m,n).

Each H(i) has the form

H(i) = I - tau*v*v'
where tau is a real/complex scalar, and v is a real/complex vector with v(1: i-1) = 0 and v(i) = 1; v(i
+1: n) (for real flavors) or conjg(v(i+1: n)) (for complex flavors) is stored on exit in A(ia+i-1,ja+i:ja
+n-1), and tau in tau[ia+i-2].

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?geql2
Computes a QL factorization of a general rectangular
matrix (unblocked algorithm).

Syntax
void psgeql2 (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdgeql2 (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
void pcgeql2 (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT
*info );
void pzgeql2 (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT
*info );


Include Files
• mkl_scalapack.h

Description
The p?geql2 function computes a QL factorization of a real/complex distributed m-by-n matrix sub(A) = A(ia:ia+m-1, ja:ja+n-1) = Q*L.

Input Parameters

m (global)
The number of rows in the distributed matrix sub(A). (m≥ 0).

n (global)
The number of columns in the distributed matrix sub(A). (n≥ 0).

a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).

On entry, this array contains the local pieces of the m-by-n distributed
matrix sub(A) which is to be factored.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local).
This is a workspace array of size lwork.

lwork (local or global)


The size of the array work.

lwork is local input and must be at least lwork≥mp0 + max(1, nq0),


where iroff = mod(ia-1, mb_a), icoff = mod(ja-1, nb_a),

iarow = indxg2p(ia, mb_a, myrow, rsrc_a, nprow),


iacol = indxg2p(ja, nb_a, mycol, csrc_a, npcol),
mp0 = numroc(m+iroff, mb_a, myrow, iarow, nprow),
nq0 = numroc(n+icoff, nb_a, mycol, iacol, npcol),
indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol, nprow,
and npcol can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a (local).

On exit,
if m ≥n, the lower triangle of the distributed submatrix A(ia+m-n:ia+m-1,
ja:ja+n-1) contains the n-by-n lower triangular matrix L;
if m≤n, the elements on and below the (n-m)-th superdiagonal contain the
m-by-n lower trapezoidal matrix L; the remaining elements, with the array
tau, represent the orthogonal/ unitary matrix Q as a product of elementary
reflectors (see Application Notes below).

tau (local).
Array of size LOCc(ja+n-1). This array contains the scalar factors of the
elementary reflectors. tau is tied to the distributed matrix A.

work On exit, work[0] returns the minimal and optimal lwork.

info (local).
If info = 0, the execution is successful. if info < 0: If the i-th argument
is an array and the j-th entry, indexed j-1, had an illegal value, then info
= - (i*100+j), if the i-th argument is a scalar and had an illegal value,
then info = -i.

Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(ja+k-1)*...*H(ja+1)*H(ja), where k = min(m,n).
Each H(i) has the form

H(i) = I - tau*v*v'


where tau is a real/complex scalar, and v is a real/complex vector with v(m-k+i+1: m) = 0 and v(m-k+i) =
1; v(1: m-k+i-1) is stored on exit in A(ia:ia+m-k+i-2, ja+n-k+i-1), and tau in tau[ja+n-k+i-2].

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?geqr2
Computes a QR factorization of a general rectangular
matrix (unblocked algorithm).

Syntax
void psgeqr2 (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdgeqr2 (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
void pcgeqr2 (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT
*info );
void pzgeqr2 (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT
*info );


Include Files
• mkl_scalapack.h

Description
The p?geqr2 function computes a QR factorization of a real/complex distributed m-by-n matrix sub(A) = A(ia:ia+m-1, ja:ja+n-1) = Q*R.

Input Parameters

m (global)
The number of rows in the distributed matrix sub(A). (m≥0).

n (global) The number of columns in the distributed matrix sub(A). (n≥0).

a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).

On entry, this array contains the local pieces of the m-by-n distributed
matrix sub(A) which is to be factored.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local).
This is a workspace array of size lwork.

lwork (local or global)


The size of the array work.
lwork is local input and must be at least lwork≥mp0+max(1, nq0),

where iroff = mod(ia-1, mb_a), icoff = mod(ja-1, nb_a),

iarow = indxg2p(ia, mb_a, myrow, rsrc_a, nprow),


iacol = indxg2p(ja, nb_a, mycol, csrc_a, npcol),
mp0 = numroc(m+iroff, mb_a, myrow, iarow, nprow),
nq0 = numroc(n+icoff, nb_a, mycol, iacol, npcol).
indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol, nprow,
and npcol can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a (local).

On exit, the elements on and above the diagonal of sub(A) contain the
min(m,n) by n upper trapezoidal matrix R (R is upper triangular if m≥n); the
elements below the diagonal, with the array tau, represent the orthogonal/
unitary matrix Q as a product of elementary reflectors (see Application
Notes below).

tau (local).
Array of size LOCc(ja+min(m,n)-1). This array contains the scalar factors of
the elementary reflectors. tau is tied to the distributed matrix A.

work On exit, work[0] returns the minimal and optimal lwork.

info (local)
If info = 0, the execution is successful. if info < 0:

If the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value, then info = - (i*100+j),

if the i-th argument is a scalar and had an illegal value, then info = -i.

Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(ja)*H(ja+1)*...*H(ja+k-1), where k = min(m,n).
Each H(i) has the form

H(i) = I - tau*v*v',
where tau is a real/complex scalar, and v is a real/complex vector with v(1: i-1) = 0 and v(i) = 1; v(i+1: m)
is stored on exit in A(ia+i:ia+m-1, ja+i-1), and tau in tau[ja+i-2].

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
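
A hedged sketch, not from the reference, of evaluating the lwork lower bound mp0 + max(1, nq0) with the ScaLAPACK tool functions named above, then factoring sub(A). qr_unblocked is a hypothetical wrapper; the prototypes of numroc, indxg2p, and blacs_gridinfo are assumed to be those declared in mkl_scalapack.h and mkl_blacs.h, and desca is assumed to use the standard nine-element 2D block-cyclic descriptor layout.

#include <stdlib.h>
#include "mkl_scalapack.h"
#include "mkl_blacs.h"

/* Sketch: compute mp0 and nq0 for the calling process, allocate work, and
   call pdgeqr2 on sub(A) = A(ia:ia+m-1, ja:ja+n-1). */
void qr_unblocked(MKL_INT m, MKL_INT n, double *a, MKL_INT ia, MKL_INT ja,
                  MKL_INT *desca, double *tau)
{
    /* Standard descriptor fields: ctxt, mb, nb, rsrc, csrc. */
    MKL_INT ictxt = desca[1], mb_a = desca[4], nb_a = desca[5];
    MKL_INT rsrc_a = desca[6], csrc_a = desca[7];
    MKL_INT nprow, npcol, myrow, mycol;
    blacs_gridinfo(&ictxt, &nprow, &npcol, &myrow, &mycol);

    MKL_INT iroff = (ia - 1) % mb_a, icoff = (ja - 1) % nb_a;
    MKL_INT iarow = indxg2p(&ia, &mb_a, &myrow, &rsrc_a, &nprow);
    MKL_INT iacol = indxg2p(&ja, &nb_a, &mycol, &csrc_a, &npcol);
    MKL_INT mpn = m + iroff, nqn = n + icoff;
    MKL_INT mp0 = numroc(&mpn, &mb_a, &myrow, &iarow, &nprow);
    MKL_INT nq0 = numroc(&nqn, &nb_a, &mycol, &iacol, &npcol);

    MKL_INT lwork = mp0 + (nq0 > 1 ? nq0 : 1);   /* mp0 + max(1, nq0) */
    MKL_INT info;
    double *work = (double*)malloc(lwork * sizeof(double));
    pdgeqr2(&m, &n, a, &ia, &ja, desca, tau, work, &lwork, &info);
    free(work);
}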

p?gerq2
Computes an RQ factorization of a general rectangular
matrix (unblocked algorithm).

Syntax
void psgerq2 (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdgerq2 (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
void pcgerq2 (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT *lwork , MKL_INT
*info );
void pzgerq2 (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT *lwork , MKL_INT
*info );


Include Files
• mkl_scalapack.h

Description
The p?gerq2 function computes an RQ factorization of a real/complex distributed m-by-n matrix sub(A) = A(ia:ia+m-1, ja:ja+n-1) = R*Q.

Input Parameters

m (global) The number of rows in the distributed matrix sub(A). (m≥0).

n (global) The number of columns in the distributed matrix sub(A). (n≥0).

a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).

On entry, this array contains the local pieces of the m-by-n distributed
matrix sub(A) which is to be factored.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local).
This is a workspace array of size lwork.

lwork (local or global)


The size of the array work.

lwork is local input and must be at least lwork≥nq0 + max(1, mp0),


where
iroff = mod(ia-1, mb_a), icoff = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, myrow, rsrc_a, nprow),
iacol = indxg2p(ja, nb_a, mycol, csrc_a, npcol), mp0 =
numroc( m+iroff, mb_a, myrow, iarow, nprow),
nq0 = numroc(n+icoff, nb_a, mycol, iacol, npcol),
indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol, nprow,
and npcol can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a (local).
On exit,

if m ≤ n, the upper triangle of A(ia:ia+m-1, ja+n-m:ja+n-1) contains the m-by-m upper triangular matrix R;
if m≥ n, the elements on and above the (m-n)-th subdiagonal contain the
m-by-n upper trapezoidal matrix R; the remaining elements, with the array
tau, represent the orthogonal/ unitary matrix Q as a product of elementary
reflectors (see Application Notes below).

tau (local).
Array of size LOCr(ia+m -1). This array contains the scalar factors of the
elementary reflectors. tau is tied to the distributed matrix A.

work On exit, work[0] returns the minimal and optimal lwork.

info (local)
If info = 0, the execution is successful.

if info < 0: If the i-th argument is an array and the j-th entry, indexed j-1,
had an illegal value, then info = - (i*100+j), if the i-th argument is a
scalar and had an illegal value, then info = -i.

Application Notes
The matrix Q is represented as a product of elementary reflectors
Q = H(ia)*H(ia+1)*...*H(ia+k-1) for real flavors,
Q = (H(ia))^H*(H(ia+1))^H*...*(H(ia+k-1))^H for complex flavors,
where k = min(m, n).

Each H(i) has the form

H(i) = I - tau*v*v',
where tau is a real/complex scalar, and v is a real/complex vector with v(n-k+i+1:n) = 0 and v(n-k+i) =
1; v(1:n-k+i-1) for real flavors or conjg(v(1:n-k+i-1)) for complex flavors is stored on exit in A(ia+m-
k+i-1, ja:ja+n-k+i-2), and tau in tau[ia+m-k+i-2].

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?getf2
Computes an LU factorization of a general matrix,
using partial pivoting with row interchanges (local
blocked algorithm).

Syntax
void psgetf2 (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , MKL_INT *ipiv , MKL_INT *info );
void pdgetf2 (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *ipiv , MKL_INT *info );
void pcgetf2 (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *ipiv , MKL_INT *info );


void pzgetf2 (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?getf2 function computes an LU factorization of a general m-by-n distributed matrix sub(A) = A(ia:ia+m-1, ja:ja+n-1) using partial pivoting with row interchanges.
The factorization has the form sub(A) = P * L * U, where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the right-looking Parallel Level 2 BLAS version of the algorithm.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

m (global)
The number of rows in the distributed matrix sub(A). (m≥0).

n (global) The number of columns in the distributed matrix sub(A). (nb_a - mod(ja-1, nb_a) ≥ n ≥ 0).

a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).

On entry, this array contains the local pieces of the m-by-n distributed
matrix sub(A).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

Output Parameters

ipiv (local)
Array of size(LOCr(m_a) + mb_a). This array contains the pivoting
information. ipiv[i] -> The global row that local row (i +1) was swapped
with, i = 0, 1, ... , LOCr(m_a) + mb_a - 1. This array is tied to the
distributed matrix A.

info (local).
If info = 0: successful exit.

If info < 0:

• if the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value, then info = -(i*100+j),
• if the i-th argument is a scalar and had an illegal value, then info = -
i.

If info > 0: If info = k, the matrix element U(ia+k-1, ja+k-1) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
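
A brief sketch, not part of the reference, of checking the singularity indicator described above after a panel factorization; lu_panel is a hypothetical name, and a, desca, and ipiv are assumed to be allocated and distributed as required.

#include <stdio.h>
#include "mkl_scalapack.h"

/* Sketch: factor a distributed panel with pdgetf2 and report a zero pivot. */
int lu_panel(MKL_INT m, MKL_INT n, double *a, MKL_INT ia, MKL_INT ja,
             MKL_INT *desca, MKL_INT *ipiv)
{
    MKL_INT info;
    pdgetf2(&m, &n, a, &ia, &ja, desca, ipiv, &info);
    if (info > 0)   /* U(ia+info-1, ja+info-1) is exactly zero */
        printf("U(%lld,%lld) is exactly zero: the factor U is singular\n",
               (long long)(ia + info - 1), (long long)(ja + info - 1));
    return (int)info;
}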

p?labrd
Reduces the first nb rows and columns of a general
rectangular matrix A to real bidiagonal form by an
orthogonal/unitary transformation, and returns
auxiliary matrices that are needed to apply the
transformation to the unreduced part of A.

Syntax
void pslabrd (MKL_INT *m , MKL_INT *n , MKL_INT *nb , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *d , float *e , float *tauq , float *taup , float *x ,
MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , float *y , MKL_INT *iy , MKL_INT *jy ,
MKL_INT *descy , float *work );
void pdlabrd (MKL_INT *m , MKL_INT *n , MKL_INT *nb , double *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *d , double *e , double *tauq , double *taup ,
double *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , double *y , MKL_INT *iy ,
MKL_INT *jy , MKL_INT *descy , double *work );
void pclabrd (MKL_INT *m , MKL_INT *n , MKL_INT *nb , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *d , float *e , MKL_Complex8 *tauq , MKL_Complex8
*taup , MKL_Complex8 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , MKL_Complex8
*y , MKL_INT *iy , MKL_INT *jy , MKL_INT *descy , MKL_Complex8 *work );
void pzlabrd (MKL_INT *m , MKL_INT *n , MKL_INT *nb , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *d , double *e , MKL_Complex16 *tauq ,
MKL_Complex16 *taup , MKL_Complex16 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx ,
MKL_Complex16 *y , MKL_INT *iy , MKL_INT *jy , MKL_INT *descy , MKL_Complex16 *work );

Include Files
• mkl_scalapack.h

Description
The p?labrd function reduces the first nb rows and columns of a real/complex general m-by-n distributed
matrix sub(A) = A(ia:ia+m-1, ja:ja+n-1) to upper or lower bidiagonal form by an orthogonal/unitary
transformation Q'* A * P, and returns the matrices X and Y necessary to apply the transformation to the
unreduced part of sub(A).


If m ≥n, sub(A) is reduced to upper bidiagonal form; if m < n, sub(A) is reduced to lower bidiagonal form.

This is an auxiliary function called by p?gebrd.

Input Parameters

m (global) The number of rows in the distributed matrix sub(A). (m≥ 0).

n (global) The number of columns in the distributed matrix sub(A). (n ≥ 0).

nb (global)
The number of leading rows and columns of sub(A) to be reduced.

a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).

On entry, this array contains the local pieces of the general distributed
matrix sub(A).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

ix, jx (global) The row and column indices in the global matrix X indicating the
first row and the first column of the matrix sub(X), respectively.

descx (global and local) array of size dlen_. The array descriptor for the
distributed matrix X.

iy, jy (global) The row and column indices in the global matrix Y indicating the
first row and the first column of the matrix sub(Y), respectively.

descy (global and local) array of size dlen_. The array descriptor for the
distributed matrix Y.

work (local).
Workspace array of size lwork.

lwork ≥ nb_a + nq,


with nq = numroc(n+mod(ia-1, nb_y), nb_y, mycol, iacol,
npcol)
iacol = indxg2p (ja, nb_a, mycol, csrc_a, npcol)
indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol, nprow,
and npcol can be determined by calling the function blacs_gridinfo.

Output Parameters

a (local)
On exit, the first nb rows and columns of the matrix are overwritten; the
rest of the distributed matrix sub(A) is unchanged.

If m ≥ n, elements on and below the diagonal in the first nb columns, with
the array tauq, represent the orthogonal/unitary matrix Q as a product of
elementary reflectors; and elements above the diagonal in the first nb rows,
with the array taup, represent the orthogonal/unitary matrix P as a product
of elementary reflectors.
If m < n, elements below the diagonal in the first nb columns, with the
array tauq, represent the orthogonal/unitary matrix Q as a product of
elementary reflectors, and elements on and above the diagonal in the first
nb rows, with the array taup, represent the orthogonal/unitary matrix P as
a product of elementary reflectors. See Application Notes below.

d (local).
Array of size LOCr(ia+min(m,n)-1) if m ≥ n; LOCc(ja+min(m,n)-1)
otherwise. The distributed diagonal elements of the bidiagonal distributed
matrix B:
d[i] = A(ia+i, ja+i), i= 0, 1, ..., size (d)-1
d is tied to the distributed matrix A.

e (local).
Array of size LOCr(ia+min(m,n)-1) if m ≥n; LOCc(ja+min(m,n)-2)
otherwise. The distributed off-diagonal elements of the bidiagonal
distributed matrix B:
if m ≥ n, e[i] = A(ia+i, ja+i+1) for i = 0, 1, ..., n-2;

if m<n, e[i] = A(ia+i+1, ja+i) for i = 0, 1, ..., m-2.

e is tied to the distributed matrix A.

tauq, taup (local).


Array size LOCc(ja+min(m, n)-1) for tauq, size LOCr(ia+min(m, n)-1) for
taup. The scalar factors of the elementary reflectors which represent the
orthogonal/unitary matrix Q for tauq, P for taup. tauq and taup are tied to
the distributed matrix A. See Application Notes below.

x (local)
Pointer into the local memory to an array of size lld_x* nb. On exit, the
local pieces of the distributed m-by-nb matrix X(ix:ix+m-1, jx:jx+nb-1)
required to update the unreduced part of sub(A).

y (local).
Pointer into the local memory to an array of size lld_y* nb. On exit, the
local pieces of the distributed n-by-nb matrix Y(iy:iy+n-1, jy:jy+nb-1)
required to update the unreduced part of sub(A).

Application Notes
The matrices Q and P are represented as products of elementary reflectors:
Q = H(1)*H(2)*...*H(nb), and P = G(1)*G(2)*...*G(nb)
Each H(i) and G(i) has the form:

H(i) = I - tauq*v*v' , and G(i) = I - taup*u*u',


where tauq and taup are real/complex scalars, and v and u are real/complex vectors.
If m ≥ n, v(1:i-1) = 0, v(i) = 1, and v(i:m) is stored on exit in A(ia+i-1:ia+m-1, ja+i-1); u(1:i) = 0, u(i+1) = 1, and u(i+1:n) is stored on exit in A(ia+i-1, ja+i:ja+n-1); tauq is stored in tauq[ja+i-2] and taup in taup[ia+i-2].
If m < n, v(1:i) = 0, v(i+1) = 1, and v(i+1:m) is stored on exit in A(ia+i+1:ia+m-1, ja+i-1); u(1:i-1) = 0, u(i) = 1, and u(i:n) is stored on exit in A(ia+i-1, ja+i:ja+n-1); tauq is stored in tauq[ja+i-2] and taup in taup[ia+i-2].
The elements of the vectors v and u together form the m-by-nb matrix V and the nb-by-n matrix U' which are necessary, with X and Y, to apply the transformation to the unreduced part of the matrix, using a block update of the form: sub(A) := sub(A) - V*Y' - X*U'.
The contents of sub(A) on exit are illustrated by the following examples with nb = 2:

where a denotes an element of the original matrix which is unchanged, vi denotes an element of the vector
defining H(i), and ui an element of the vector defining G(i).

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?lacon
Estimates the 1-norm of a square matrix, using the
reverse communication for evaluating matrix-vector
products.

Syntax
void pslacon (MKL_INT *n , float *v , MKL_INT *iv , MKL_INT *jv , MKL_INT *descv ,
float *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , MKL_INT *isgn , float *est ,
MKL_INT *kase );
void pdlacon (MKL_INT *n , double *v , MKL_INT *iv , MKL_INT *jv , MKL_INT *descv ,
double *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , MKL_INT *isgn , double *est ,
MKL_INT *kase );
void pclacon (MKL_INT *n , MKL_Complex8 *v , MKL_INT *iv , MKL_INT *jv , MKL_INT
*descv , MKL_Complex8 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , float *est ,
MKL_INT *kase );
void pzlacon (MKL_INT *n , MKL_Complex16 *v , MKL_INT *iv , MKL_INT *jv , MKL_INT
*descv , MKL_Complex16 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , double *est ,
MKL_INT *kase );

Include Files
• mkl_scalapack.h

Description
The p?lacon function estimates the 1-norm of a square, real/complex distributed matrix A. Reverse communication is used for evaluating matrix-vector products. x and v are aligned with the distributed matrix A; this information is implicitly contained within iv, ix, descv, and descx.

Input Parameters

n (global) The length of the distributed vectors v and x. n ≥ 0.

v (local).
Pointer into the local memory to an array of size LOCr(n+mod(iv-1, mb_v)).
On the final return, v = a*w, where est = norm(v)/norm(w) (w is not
returned).

iv, jv (global) The row and column indices in the global matrix V indicating the
first row and the first column of the submatrix V, respectively.

descv (global and local) array of size dlen_. The array descriptor for the
distributed matrix V.

x (local).
Pointer into the local memory to an array of size LOCr(n+mod(ix-1, mb_x)).

ix, jx (global) The row and column indices in the global matrix X indicating the
first row and the first column of the submatrix X, respectively.

descx (global and local) array of size dlen_. The array descriptor for the
distributed matrix X.

isgn (local).
Array of size LOCr(n+mod(ix-1, mb_x)). isgn is aligned with x and v.

kase (local).
On the initial call to p?lacon, kase should be 0.

Output Parameters

x (local).
On an intermediate return, X should be overwritten by A*X if kase = 1, or by A'*X if kase = 2, and p?lacon must be re-called with all the other parameters unchanged.

est (global).
An estimate (a lower bound) for the 1-norm of A.

kase (local)
On an intermediate return, kase is 1 or 2, indicating whether X should be
overwritten by A*X, or A'*X. On the final return from p?lacon, kase is
again 0.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
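
The reverse-communication protocol above is driven by a loop like the following sketch, which is not part of the reference; apply_A and apply_At stand for user-supplied distributed matrix-vector products (hypothetical callbacks), and estimate_one_norm is a hypothetical wrapper name.

#include "mkl_scalapack.h"

/* Sketch: call pdlacon repeatedly, overwriting x with A*x (kase = 1) or
   A'*x (kase = 2) as requested, until kase returns to 0 and est is final. */
double estimate_one_norm(MKL_INT n, double *v, MKL_INT iv, MKL_INT jv,
                         MKL_INT *descv, double *x, MKL_INT ix, MKL_INT jx,
                         MKL_INT *descx, MKL_INT *isgn,
                         void (*apply_A)(double *x), void (*apply_At)(double *x))
{
    double est = 0.0;
    MKL_INT kase = 0;                 /* must be 0 on the first call */
    do {
        pdlacon(&n, v, &iv, &jv, descv, x, &ix, &jx, descx, isgn, &est, &kase);
        if (kase == 1)
            apply_A(x);               /* overwrite x with A*x  */
        else if (kase == 2)
            apply_At(x);              /* overwrite x with A'*x */
    } while (kase != 0);
    return est;
}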


p?laconsb
Looks for two consecutive small subdiagonal elements.

Syntax
void pslaconsb (const float *a, const MKL_INT *desca, const MKL_INT *i, const MKL_INT
*l, MKL_INT *m, const float *h44, const float *h33, const float *h43h34, float *buf,
const MKL_INT *lwork );
void pdlaconsb (const double *a, const MKL_INT *desca, const MKL_INT *i, const MKL_INT
*l, MKL_INT *m, const double *h44, const double *h33, const double *h43h34, double
*buf, const MKL_INT *lwork );
void pclaconsb (const MKL_Complex8 *a , const MKL_INT *desca , const MKL_INT *i , const
MKL_INT *l , MKL_INT *m , const MKL_Complex8 *h44 , const MKL_Complex8 *h33 , const
MKL_Complex8 *h43h34 , MKL_Complex8 *buf , const MKL_INT *lwork );
void pzlaconsb (const MKL_Complex16 *a , const MKL_INT *desca , const MKL_INT *i ,
const MKL_INT *l , MKL_INT *m , const MKL_Complex16 *h44 , const MKL_Complex16 *h33 ,
const MKL_Complex16 *h43h34 , MKL_Complex16 *buf , const MKL_INT *lwork );

Include Files
• mkl_scalapack.h

Description
The p?laconsb function looks for two consecutive small subdiagonal elements by analyzing the effect of
starting a double shift QR iteration given by h44, h33, and h43h34 to see if this process makes a subdiagonal
negligible.

Input Parameters

a (local)
Array of size lld_a*LOCc(n_a). On entry, the Hessenberg matrix whose
tridiagonal part is being scanned. Unchanged on exit.

desca (global and local)


Array of size dlen_. The array descriptor for the distributed matrix A.

i (global)
The global location of the bottom of the unreduced submatrix of A.
Unchanged on exit.

l (global)
The global location of the top of the unreduced submatrix of A. Unchanged
on exit.

h44, h33, h43h34 (global).


These three values are for the double shift QR iteration.

lwork (local)

This must be at least 7*ceil(ceil( (i-l)/mb_a )/lcm(nprow,
npcol)). Here lcm is the least common multiple and nprow*npcol is the
logical grid size.

Output Parameters

m (global). On exit, this yields the starting location of the QR double shift.
This will satisfy:

l≤m≤i-2.

buf (local).
Array of size lwork.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?lacp2
Copies all or part of a distributed matrix to another
distributed matrix.

Syntax
void pslacp2 (char *uplo , MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb );
void pdlacp2 (char *uplo , MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb );
void pclacp2 (char *uplo , MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb );
void pzlacp2 (char *uplo , MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb );

Include Files
• mkl_scalapack.h

Description
The p?lacp2 function copies all or part of a distributed matrix A to another distributed matrix B. No
communication is performed; p?lacp2 performs a local copy sub(B):= sub(A), where sub(A) denotes
A(ia:ia+m-1, ja:ja+n-1) and sub(B) denotes B(ib:ib+m-1, jb:jb+n-1).
p?lacp2 requires that only one dimension of the matrix operands is distributed.

Input Parameters

uplo (global) Specifies the part of the distributed matrix sub(A) to be copied:
= 'U': Upper triangular part is copied; the strictly lower triangular part of
sub(A) is not referenced;


= 'L': Lower triangular part is copied; the strictly upper triangular part of
sub(A) is not referenced.
Otherwise: all of the matrix sub(A) is copied.

m (global)
The number of rows in the distributed matrix sub(A). (m≥ 0).

n (global)
The number of columns in the distributed matrix sub(A). (n≥ 0).

a (local).
Pointer into the local memory to an array of sizelld_a * LOCc(ja+n-1).

On entry, this array contains the local pieces of the m-by-n distributed
matrix sub(A).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of sub(B), respectively.

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.

Output Parameters

b (local).
Pointer into the local memory to an array of size lld_b * LOCc(jb+n-1).
This array contains on exit the local pieces of the distributed matrix sub( B )
set as follows:
if uplo = 'U', B(ib+i-1, jb+j-1) = A(ia+i-1, ja+j-1), 1≤i≤j, 1≤j≤n;

if uplo = 'L', B(ib+i-1, jb+j-1) = A(ia+i-1, ja+j-1), j≤i≤m, 1≤j≤n;

otherwise, B(ib+i-1, jb+j-1) = A(ia+i-1, ja+j-1), 1≤i≤m, 1≤j≤n.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?lacp3
Copies from a global parallel array into a local
replicated array or vice versa.

Syntax
void pslacp3 (const MKL_INT *m, const MKL_INT *i, float *a, const MKL_INT *desca, float
*b, const MKL_INT *ldb, const MKL_INT *ii, const MKL_INT *jj, const MKL_INT *rev );

void pdlacp3 (const MKL_INT *m, const MKL_INT *i, double *a, const MKL_INT *desca,
double *b, const MKL_INT *ldb, const MKL_INT *ii, const MKL_INT *jj, const MKL_INT
*rev );
void pclacp3 (const MKL_INT *m, const MKL_INT *i, MKL_Complex8 *a, const MKL_INT
*desca, MKL_Complex8 *b, const MKL_INT *ldb, const MKL_INT *ii, const MKL_INT *jj,
const MKL_INT *rev);
void pzlacp3 (const MKL_INT *m, const MKL_INT *i, MKL_Complex16 *a, const MKL_INT
*desca, MKL_Complex16 *b, const MKL_INT *ldb, const MKL_INT *ii, const MKL_INT *jj,
const MKL_INT *rev);

Include Files
• mkl_scalapack.h

Description
This is an auxiliary function that copies from a global parallel array into a local replicated array or vice versa.
Note that the entire submatrix that is copied is placed on one or more nodes. The receiving node can be
specified precisely, or all nodes can receive, or just one row or column of nodes.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

m (global)
m is the order of the square submatrix that is copied.
m≥ 0. Unchanged on exit.

i (global) The matrix element A(i, i) is the global location that the copying
starts from. Unchanged on exit.

a (local)
Array of size lld_a*LOCc(n_a). On entry, the parallel matrix to be copied
into or from.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

b (local)
Array of size ldb*LOCc(m). If rev = 0, this is the global portion of the
matrix A(i:i+m-1, i:i+m-1). If rev = 1, this is unchanged on exit.

ldb (local)


The leading dimension of B.

ii, jj (global) By using rev 0 and 1, data can be sent out and returned again. If
rev = 0, then ii is destination row index and jj is destination column index
for the node(s) receiving the replicated matrix B. If ii ≥ 0, jj ≥ 0, then node
(ii, jj) receives the data. If ii = -1, jj ≥ 0, then all rows in column jj receive
the data. If ii ≥ 0, jj = -1, then all columns in row ii receive the data. If ii = -1, jj
= -1, then all nodes receive the data. If rev != 0, then ii is the source row
index for the node(s) sending the replicated B.

rev (global) Use rev = 0 to send global matrix A into locally replicated matrix B
(on node (ii, jj)). Use rev != 0 to send locally replicated B from node (ii, jj)
to its owner (which changes depending on its location in A) into the global
A.

Output Parameters

a On exit, if rev = 1, the copied data. Unchanged on exit if rev = 0.

b If rev = 1, this is unchanged on exit.
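
As an illustration only (the helper name and all variables are the caller's, not the library's), the following double-precision sketch replicates the m-by-m block A(i:i+m-1, i:i+m-1) of a distributed matrix onto every process, assuming the descriptor desca and the local pieces of a are already set up and that the local array b has leading dimension ldb ≥ m and room for m columns:

#include <mkl_scalapack.h>

void replicate_block(MKL_INT m, MKL_INT i,
                     double *a, MKL_INT *desca,
                     double *b, MKL_INT ldb)
{
    MKL_INT ii = -1, jj = -1;   /* ii = jj = -1: all nodes receive the copy */
    MKL_INT rev = 0;            /* rev = 0: copy from the global A into the replicated b */
    pdlacp3(&m, &i, a, desca, b, &ldb, &ii, &jj, &rev);
}

Calling the same routine with rev != 0 and ii, jj naming the sending node pushes the replicated copy back into the global matrix.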

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?lacpy
Copies all or part of one two-dimensional array to
another.

Syntax
void pslacpy (char *uplo , MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb );
void pdlacpy (char *uplo , MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb );
void pclacpy (char *uplo , MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb );
void pzlacpy (char *uplo , MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb );

Include Files
• mkl_scalapack.h

Description
The p?lacpy function copies all or part of a distributed matrix A to another distributed matrix B. No
communication is performed; p?lacpy performs a local copy sub(B):= sub(A), where sub(A) denotes
A(ia:ia+m-1,ja:ja+n-1) and sub(B) denotes B(ib:ib+m-1,jb:jb+n-1).

Input Parameters

uplo (global) Specifies the part of the distributed matrix sub(A) to be copied:
= 'U': Upper triangular part; the strictly lower triangular part of sub(A) is
not referenced;
= 'L': Lower triangular part; the strictly upper triangular part of sub(A) is
not referenced.
Otherwise: all of the matrix sub(A) is copied.

m (global)
The number of rows in the distributed matrix sub(A). (m≥0).

n (global)
The number of columns in the distributed matrix sub(A). (n≥0).

a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).

On entry, this array contains the local pieces of the distributed matrix
sub(A).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

ib, jb (global) The row and column indices in the global matrix B indicating the
first row and the first column of sub(B) respectively.

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.

Output Parameters

b (local).
Pointer into the local memory to an array of size lld_b * LOCc(jb+n-1).
This array contains on exit the local pieces of the distributed matrix sub(B)
set as follows:
if uplo = 'U', B(ib+i-1, jb+j-1) = A(ia+i-1, ja+j-1), 1≤i≤j, 1≤j≤n;

if uplo = 'L', B(ib+i-1, jb+j-1) = A(ia+i-1, ja+j-1), j≤i≤m, 1≤j≤n;

otherwise, B(ib+i-1, jb+j-1) = A(ia+i-1, ja+j-1), 1≤i≤m, 1≤j≤n.
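
For example, the call below (a minimal sketch; the grid, descriptors, and local arrays are assumed to exist already) copies the lower triangle of the m-by-n submatrix starting at A(ia, ja) into the corresponding positions of B:

#include <mkl_scalapack.h>

void copy_lower_triangle(MKL_INT m, MKL_INT n,
                         double *a, MKL_INT ia, MKL_INT ja, MKL_INT *desca,
                         double *b, MKL_INT ib, MKL_INT jb, MKL_INT *descb)
{
    char uplo = 'L';   /* copy only the lower triangular part of sub(A) */
    pdlacpy(&uplo, &m, &n, a, &ia, &ja, desca, b, &ib, &jb, descb);
}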

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?laevswp
Moves the eigenvectors from where they are
computed to ScaLAPACK standard block cyclic array.


Syntax
void pslaevswp (MKL_INT *n , float *zin , MKL_INT *ldzi , float *z , MKL_INT *iz ,
MKL_INT *jz , MKL_INT *descz , MKL_INT *nvs , MKL_INT *key , float *work , MKL_INT
*lwork );
void pdlaevswp (MKL_INT *n , double *zin , MKL_INT *ldzi , double *z , MKL_INT *iz ,
MKL_INT *jz , MKL_INT *descz , MKL_INT *nvs , MKL_INT *key , double *work , MKL_INT
*lwork );
void pclaevswp (MKL_INT *n , float *zin , MKL_INT *ldzi , MKL_Complex8 *z , MKL_INT
*iz , MKL_INT *jz , MKL_INT *descz , MKL_INT *nvs , MKL_INT *key , float *rwork ,
MKL_INT *lrwork );
void pzlaevswp (MKL_INT *n , double *zin , MKL_INT *ldzi , MKL_Complex16 *z , MKL_INT
*iz , MKL_INT *jz , MKL_INT *descz , MKL_INT *nvs , MKL_INT *key , double *rwork ,
MKL_INT *lrwork );

Include Files
• mkl_scalapack.h

Description
The p?laevswp function moves the eigenvectors (potentially unsorted) from where they are computed, to a
ScaLAPACK standard block cyclic array, sorted so that the corresponding eigenvalues are sorted.

Input Parameters
np = the number of rows local to a given process.
nq = the number of columns local to a given process.

n (global)
The order of the matrix A. n ≥ 0.

zin (local).
Array of size ldzi * nvs[iam+1]. The eigenvectors on input. iam is a process
rank from [0, nprocs) interval. Each eigenvector resides entirely in one
process. Each process holds a contiguous set of nvs[iam+1] eigenvectors.
The global number of the first eigenvector that the process holds is: ((sum
for i=[0, iam] of nvs[i])+1).

ldzi (local)
The leading dimension of the zin array.

iz, jz (global) The row and column indices in the global matrix Z indicating the
first row and the first column of the submatrix Z, respectively.

descz (global and local)


Array of size dlen_. The array descriptor for the distributed matrix Z.

nvs (global)
Array of size nprocs+1
nvs[i] = number of eigenvectors held by processes [0, i)
nvs[0] = number of eigenvectors held by processes [0, 0) = 0

nvs[nprocs]= number of eigenvectors held by [0, nprocs)= total number of
eigenvectors.

key (global)
Array of size n. Indicates the actual index (after sorting) for each of the
eigenvectors.

rwork (local).
Array of size lrwork.

lrwork (local)
The size of the array rwork (work for the real flavors).

Output Parameters

z (local).
Array of global size n* n and of local size lld_z * nq. The eigenvectors on
output. The eigenvectors are distributed in a block cyclic manner in both
dimensions, with a block size of nb.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?lahrd
Reduces the first nb columns of a general rectangular
matrix A so that elements below the k-th subdiagonal
are zero, by an orthogonal/unitary transformation,
and returns auxiliary matrices that are needed to
apply the transformation to the unreduced part of A.

Syntax
void pslahrd (MKL_INT *n , MKL_INT *k , MKL_INT *nb , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *tau , float *t , float *y , MKL_INT *iy , MKL_INT *jy ,
MKL_INT *descy , float *work );
void pdlahrd (MKL_INT *n , MKL_INT *k , MKL_INT *nb , double *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *tau , double *t , double *y , MKL_INT *iy ,
MKL_INT *jy , MKL_INT *descy , double *work );
void pclahrd (MKL_INT *n , MKL_INT *k , MKL_INT *nb , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *t , MKL_Complex8 *y ,
MKL_INT *iy , MKL_INT *jy , MKL_INT *descy , MKL_Complex8 *work );
void pzlahrd (MKL_INT *n , MKL_INT *k , MKL_INT *nb , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *t , MKL_Complex16
*y , MKL_INT *iy , MKL_INT *jy , MKL_INT *descy , MKL_Complex16 *work );

Include Files
• mkl_scalapack.h


Description
The p?lahrd function reduces the first nb columns of a real general n-by-(n-k+1) distributed matrix A(ia:ia
+n-1 , ja:ja+n-k) so that elements below the k-th subdiagonal are zero. The reduction is performed by
an orthogonal/unitary similarity transformation Q'*A*Q. The function returns the matrices V and T which
determine Q as a block reflector I-V*T*V', and also the matrix Y = A*V*T.

This is an auxiliary function called by p?gehrd. In the following comments sub(A) denotes A(ia:ia+n-1,
ja:ja+n-1).

Input Parameters

n (global)
The order of the distributed matrix sub(A). n ≥ 0.

k (global)
The offset for the reduction. Elements below the k-th subdiagonal in the
first nb columns are reduced to zero.

nb (global)
The number of columns to be reduced.

a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-k). On
entry, this array contains the local pieces of the n-by-(n-k+1) general
distributed matrix A(ia:ia+n-1, ja:ja+n-k).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

iy, jy (global) The row and column indices in the global matrix Y indicating the
first row and the first column of the matrix sub(Y), respectively.

descy (global and local) array of size dlen_. The array descriptor for the
distributed matrix Y.

work (local).
Array of size nb.

Output Parameters

a (local).
On exit, the elements on and above the k-th subdiagonal in the first nb
columns are overwritten with the corresponding elements of the reduced
distributed matrix; the elements below the k-th subdiagonal, with the array
tau, represent the matrix Q as a product of elementary reflectors. The other
columns of the matrix A(ia:ia+n-1, ja:ja+n-k) are unchanged. (See
Application Notes below.)

tau (local)

Array of size LOCc(ja+n-2). The scalar factors of the elementary reflectors
(see Application Notes below). tau is tied to the distributed matrix A.

t (local)
Array of size nb_a* nb_a. The upper triangular matrix T.

y (local).
Pointer into the local memory to an array of size lld_y* nb_a. On exit, this
array contains the local pieces of the n-by-nb distributed matrix Y. lld_y≥
LOCr(ia+n-1).

Application Notes
The matrix Q is represented as a product of nb elementary reflectors
Q = H(1)*H(2)*...*H(nb).
Each H(i) has the form

H(i) = I - tau*v*v',
where tau is a real/complex scalar, and v is a real/complex vector with v(1: i+k-1)= 0, v(i+k)= 1; v(i+k
+1:n) is stored on exit in A(ia+i+k:ia+n-1, ja+i-1), and tau in tau[ja+i-2].

The elements of the vectors v together form the (n-k+1)-by-nb matrix V which is needed, with T and Y, to
apply the transformation to the unreduced part of the matrix, using an update of the form: A(ia:ia+n-1,
ja:ja+n-k) := (I-V*T*V')*(A(ia:ia+n-1, ja:ja+n-k)-Y*V'). The contents of A(ia:ia+n-1, ja:ja+n-k) on exit
are illustrated by the following example with n = 7, k = 3, and nb = 2:

where a denotes an element of the original matrix A(ia:ia+n-1, ja:ja+n-k), h denotes a modified element
of the upper Hessenberg matrix H, and vi denotes an element of the vector defining H(i).

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?laiect
Exploits IEEE arithmetic to accelerate the
computations of eigenvalues.

Syntax
void pslaiect (float *sigma , MKL_INT *n , float *d , MKL_INT *count );
void pdlaiectb (float *sigma , MKL_INT *n , float *d , MKL_INT *count );
void pdlaiectl (float *sigma , MKL_INT *n , float *d , MKL_INT *count );


Include Files
• mkl_scalapack.h

Description
The p?laiect function computes the number of negative eigenvalues of (A- σI). This implementation of the
Sturm Sequence loop exploits IEEE arithmetic and has no conditionals in the innermost loop. The signbit for
real function pslaiect is assumed to be bit 32. Double-precision functions pdlaiectb and pdlaiectl differ
in the order of the double precision word storage and, consequently, in the signbit location. For pdlaiectb,
the double precision word is stored in the big-endian word order and the signbit is assumed to be bit 32. For
pdlaiectl, the double precision word is stored in the little-endian word order and the signbit is assumed to
be bit 64.
This is a ScaLAPACK internal function and arguments are not checked for unreasonable values.

Input Parameters

sigma The shift. p?laiect finds the number of eigenvalues less than or equal to
sigma.

n The order of the tridiagonal matrix T. n≥ 1.

d Array of size 2n-1.

On entry, this array contains the diagonals and the squares of the off-
diagonal elements of the tridiagonal matrix T. These elements are assumed
to be interleaved in memory for better cache performance. The diagonal
entries of T are in the entries d[0], d[2],..., d[2n-2], while the
squares of the off-diagonal entries are d[1], d[3], ..., d[2n-3]. To
avoid overflow, the matrix must be scaled so that its largest entry is no
greater than overflow(1/2) * underflow(1/4) in absolute value, and for
greatest accuracy, it should not be much smaller than that.
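
The interleaved layout of d can be built from ordinary diagonal and off-diagonal arrays as in the following sketch; diag and offd are hypothetical caller-side arrays holding the n diagonal and n-1 off-diagonal entries of T, and the overflow scaling described above is omitted:

#include <mkl_types.h>

void pack_tridiagonal(MKL_INT n, const float *diag, const float *offd, float *d)
{
    for (MKL_INT i = 0; i < n; ++i)
        d[2*i] = diag[i];                 /* d[0], d[2], ..., d[2n-2]: diagonal of T */
    for (MKL_INT i = 0; i + 1 < n; ++i)
        d[2*i + 1] = offd[i] * offd[i];   /* d[1], d[3], ..., d[2n-3]: squared off-diagonals */
}

After packing (and scaling), d can be passed directly as the third argument of pslaiect.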

Output Parameters

count The count of the number of eigenvalues of T less than or equal to sigma.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?lamve
Copies all or part of one two-dimensional distributed
array to another.

Syntax
void pslamve(char* uplo, MKL_INT* m, MKL_INT* n, float* a, MKL_INT* ia, MKL_INT* ja,
MKL_INT* desca, float* b, MKL_INT* ib, MKL_INT* jb, MKL_INT* descb, float* dwork);
void pdlamve(char* uplo, MKL_INT* m, MKL_INT* n, double* a, MKL_INT* ia, MKL_INT* ja,
MKL_INT* desca, double* b, MKL_INT* ib, MKL_INT* jb, MKL_INT* descb, double* dwork);

Include Files
• mkl_scalapack.h

Description
p?lamve copies all or part of a distributed matrix A to another distributed matrix B. There are no alignment
assumptions at all except that A and B are of the same size.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

uplo (global )
Specifies the part of the distributed matrix sub( A ) to be copied:
= 'U': Upper triangular part is copied; the strictly lower triangular part of
sub( A ) is not referenced;
= 'L': Lower triangular part is copied; the strictly upper triangular part of
sub( A ) is not referenced;
Otherwise: All of the matrix sub( A ) is copied.

m (global )
The number of rows to be operated on, which is the number of rows of the
distributed matrix sub( A ). m≥ 0.

n (global )
The number of columns to be operated on, which is the number of columns
of the distributed matrix sub( A ). n≥ 0.

a (local ) pointer into the local memory to an array of size lld_a * LOCc(ja
+n-1) . This array contains the local pieces of the distributed matrix
sub( A ) to be copied from.

ia (global )
The row index in the global matrix A indicating the first row of sub( A ).

ja (global )
The column index in the global matrix A indicating the first column of
sub( A ).

desca (global and local) array of size dlen_.


The array descriptor for the distributed matrix A.

ib (global )
The row index in the global matrix B indicating the first row of sub( B ).


jb (global )
The column index in the global matrix B indicating the first column of
sub( B ).

descb (global and local) array of size dlen_.


The array descriptor for the distributed matrix B.

dwork (local workspace) array


If uplo = 'U' or uplo = 'L' and number of processors > 1, the length of
dwork is at least as large as the length of b.
Otherwise, dwork is not referenced.

Output Parameters

b (local ) pointer into the local memory to an array of size lld_b * LOCc(jb
+n-1) . This array contains on exit the local pieces of the distributed matrix
sub( B ).

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?lange
Returns the value of the 1-norm, Frobenius norm,
infinity-norm, or the largest absolute value of any
element, of a general rectangular matrix.

Syntax
float pslange (char *norm , MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *work );
double pdlange (char *norm , MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *work );
float pclange (char *norm , MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *work );
double pzlange (char *norm , MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *work );

Include Files
• mkl_scalapack.h

Description
The p?langefunction returns the value of the 1-norm, or the Frobenius norm, or the infinity norm, or the
element of largest absolute value of a distributed matrix sub(A) = A(ia:ia+m-1, ja:ja+n-1).

Input Parameters

norm (global) Specifies what value is returned by the function:


= 'M' or 'm': val = max(abs(Aij)), largest absolute value of the matrix
A; note that this is not a matrix norm.

= '1' or 'O' or 'o': val = norm1(A), 1-norm of the matrix A
(maximum column sum),
= 'I' or 'i': val = normI(A), infinity norm of the matrix A (maximum
row sum),
= 'F', 'f', 'E' or 'e': val = normF(A), Frobenius norm of the matrix
A (square root of sum of squares).

m (global)
The number of rows in the distributed matrix sub(A). When m = 0, p?lange
is set to zero. m ≥ 0.

n (global)
The number of columns in the distributed matrix sub(A). When n = 0, p?lange
is set to zero. n ≥ 0.

a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1)
containing the local pieces of the distributed matrix sub(A).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local).
Array size lwork.
lwork ≥ 0 if norm = 'M' or 'm' (not referenced),
nq0 if norm = '1', 'O' or 'o',

mp0 if norm = 'I' or 'i',

0 if norm = 'F', 'f', 'E' or 'e' (not referenced),

where
iroffa = mod(ia-1, mb_a), icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, myrow, rsrc_a, nprow),
iacol = indxg2p(ja, nb_a, mycol, csrc_a, npcol),
mp0 = numroc(m+iroffa, mb_a, myrow, iarow, nprow),
nq0 = numroc(n+icoffa, nb_a, mycol, iacol, npcol),
indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol, nprow,
and npcol can be determined by calling the function blacs_gridinfo.

Output Parameters

val The value returned by the function.
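
For instance, because work is not referenced when norm = 'F', the Frobenius norm of an m-by-n distributed submatrix can be obtained without allocating workspace, as in this sketch (grid and descriptor set-up are assumed):

#include <mkl_scalapack.h>

double frobenius_norm(MKL_INT m, MKL_INT n,
                      double *a, MKL_INT ia, MKL_INT ja, MKL_INT *desca)
{
    char norm = 'F';
    double dummy = 0.0;   /* work is not referenced for the Frobenius norm */
    return pdlange(&norm, &m, &n, a, &ia, &ja, desca, &dummy);
}

For the 1-norm or infinity norm, work must instead be sized to nq0 or mp0 as described above.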

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.


p?lanhs
Returns the value of the 1-norm, Frobenius norm,
infinity-norm, or the largest absolute value of any
element, of an upper Hessenberg matrix.

Syntax
float pslanhs (char *norm , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , float *work );
double pdlanhs (char *norm , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *work );
float pclanhs (char *norm , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , float *work );
double pzlanhs (char *norm , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *work );

Include Files
• mkl_scalapack.h

Description
The p?lanhsfunction returns the value of the 1-norm, or the Frobenius norm, or the infinity norm, or the
element of largest absolute value of an upper Hessenberg distributed matrix sub(A) = A(ia:ia+m-1,
ja:ja+n-1).

Input Parameters

norm Specifies the value to be returned by the function:


= 'M' or 'm': val = max(abs(Aij)), largest absolute value of the matrix
A.
= '1' or 'O' or 'o': val = norm1(A), 1-norm of the matrix A
(maximum column sum),
= 'I' or 'i': val = normI(A), infinity norm of the matrix A (maximum
row sum),
= 'F', 'f', 'E' or 'e': val = normF(A), Frobenius norm of the matrix
A (square root of sum of squares).

n (global)
The number of columns in the distributed matrix sub(A). When n =
0, p?lanhs is set to zero. n≥ 0.
a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1)
containing the local pieces of the distributed matrix sub(A).

ia, ja (global)
The row and column indices in the global matrix A indicating the first row
and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local).
Array of size lwork.
lwork≥ 0 if norm = 'M' or 'm' (not referenced),

nq0 if norm = '1', 'O' or 'o',

mp0 if norm = 'I' or 'i',

0 if norm = 'F', 'f', 'E' or 'e' (not referenced),

where
iroffa = mod( ia-1, mb_a ), icoffa = mod( ja-1, nb_a ),

iarow = indxg2p( ia, mb_a, myrow, rsrc_a, nprow ),

iacol = indxg2p( ja, nb_a, mycol, csrc_a, npcol ),

mp0 = numroc( m+iroffa, mb_a, myrow, iarow, nprow ),

nq0 = numroc( n+icoffa, nb_a, mycol, iacol, npcol ),

indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol, nprow,
and npcol can be determined by calling the function blacs_gridinfo.

Output Parameters

val The value returned by the function.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?lansy, p?lanhe
Returns the value of the 1-norm, Frobenius norm,
infinity-norm, or the largest absolute value of any
element, of a real symmetric or a complex Hermitian
matrix.

Syntax
float pslansy (char *norm , char *uplo , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *work );
double pdlansy (char *norm , char *uplo , MKL_INT *n , double *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *work );
float pclansy (char *norm , char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *work );
double pzlansy (char *norm , char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *work );
float pclanhe (char *norm , char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *work );
double pzlanhe (char *norm , char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *work );


Include Files
• mkl_scalapack.h

Description
The p?lansy and p?lanhe functions return the value of the 1-norm, or the Frobenius norm, or the infinity
norm, or the element of largest absolute value of a distributed matrix sub(A) = A(ia:ia+m-1, ja:ja
+n-1).

Input Parameters

norm (global) Specifies what value is returned by the function:


= 'M' or 'm': val = max(abs(Aij)), largest absolute value of the matrix
A; note that this is not a matrix norm.
= '1' or 'O' or 'o': val = norm1(A), 1-norm of the matrix A
(maximum column sum),
= 'I' or 'i': val = normI(A), infinity norm of the matrix A (maximum
row sum),
= 'F', 'f', 'E' or 'e': val = normF(A), Frobenius norm of the matrix
A (square root of sum of squares).

uplo (global) Specifies whether the upper or lower triangular part of the
symmetric matrix sub(A) is to be referenced.
= 'U': Upper triangular part of sub(A) is referenced,

= 'L': Lower triangular part of sub(A) is referenced.

n (global)
The number of columns in the distributed matrix sub(A). When n = 0,
p?lansy is set to zero. n ≥ 0.

a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1)
containing the local pieces of the distributed matrix sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular matrix whose norm is to be computed, and the strictly
lower triangular part of this matrix is not referenced. If uplo = 'L', the
leading n-by-n lower triangular part of sub(A) contains the lower triangular
matrix whose norm is to be computed, and the strictly upper triangular part
of sub(A) is not referenced.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local).
Array of size lwork.
lwork≥ 0 if norm = 'M' or 'm' (not referenced),

2*nq0+mp0+ldw if norm = '1', 'O' or 'o', 'I' or 'i',

where ldw is given by:
if( nprow≠npcol ) then
ldw = mb_a*iceil(iceil(np0,mb_a),(lcm/nprow))
else
ldw = 0
end if
0 if norm = 'F', 'f', 'E' or 'e' (not referenced),

where lcm is the least common multiple of nprow and npcol, lcm =
ilcm( nprow, npcol ) and iceil(x,y) is a ScaLAPACK function that
returns ceiling (x/y).

iroffa = mod(ia-1, mb_a ), icoffa = mod( ja-1, nb_a),


iarow = indxg2p(ia, mb_a, myrow, rsrc_a, nprow),

iacol = indxg2p(ja, nb_a, mycol, csrc_a, npcol),

mp0 = numroc(m+iroffa, mb_a, myrow, iarow, nprow),

nq0 = numroc(n+icoffa, nb_a, mycol, iacol, npcol),

ilcm, iceil, indxg2p, and numroc are ScaLAPACK tool functions; myrow,
mycol, nprow, and npcol can be determined by calling the function
blacs_gridinfo.

Output Parameters

val The value returned by the function.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?lantr
Returns the value of the 1-norm, Frobenius norm,
infinity-norm, or the largest absolute value of any
element, of a triangular matrix.

Syntax
float pslantr (char *norm , char *uplo , char *diag , MKL_INT *m , MKL_INT *n , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *work );
double pdlantr (char *norm , char *uplo , char *diag , MKL_INT *m , MKL_INT *n ,
double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *work );
float pclantr (char *norm , char *uplo , char *diag , MKL_INT *m , MKL_INT *n ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *work );
double pzlantr (char *norm , char *uplo , char *diag , MKL_INT *m , MKL_INT *n ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *work );

Include Files
• mkl_scalapack.h


Description
The p?lantr function returns the value of the 1-norm, or the Frobenius norm, or the infinity norm, or the
element of largest absolute value of a trapezoidal or triangular distributed matrix sub(A) = A(ia:ia+m-1,
ja:ja+n-1).

Input Parameters

norm (global) Specifies what value is returned by the function:


= 'M' or 'm': val = max(abs(Aij)), largest absolute value of the matrix
A; note that this is not a matrix norm.
= '1' or 'O' or 'o': val = norm1(A), 1-norm of the matrix A
(maximum column sum),
= 'I' or 'i': val = normI(A), infinity norm of the matrix A (maximum
row sum),
= 'F', 'f', 'E' or 'e': val = normF(A), Frobenius norm of the matrix
A (square root of sum of squares).

uplo (global)
Specifies whether the matrix sub(A) is upper or lower trapezoidal.
= 'U': Upper trapezoidal,

= 'L': Lower trapezoidal.

Note that sub(A) is triangular instead of trapezoidal if m = n.

diag (global)
Specifies whether the distributed matrix sub(A) has unit diagonal.
= 'N': Non-unit diagonal.

= 'U': Unit diagonal.

m (global)
The number of rows in the distributed matrix sub(A). When m = 0, p?lantr
is set to zero. m ≥ 0.

n (global)
The number of columns in the distributed matrix sub(A). When n = 0, p?lantr
is set to zero. n ≥ 0.

a (local).
Pointer into the local memory to an array of sizelld_a * LOCc(ja+n-1)
containing the local pieces of the distributed matrix sub(A).

ia, ja (global)
The row and column indices in the global matrix A indicating the first row
and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local).
Array size lwork.
lwork≥ 0 if norm = 'M' or 'm' (not referenced),
nq0 if norm = '1', 'O' or 'o',

mp0 if norm = 'I' or 'i',

0 if norm = 'F', 'f', 'E' or 'e' (not referenced),

iroffa = mod(ia-1, mb_a ), icoffa = mod( ja-1, nb_a),


iarow = indxg2p(ia, mb_a, myrow, rsrc_a, nprow),
iacol = indxg2p(ja, nb_a, mycol, csrc_a, npcol),
mp0 = numroc(m+iroffa, mb_a, myrow, iarow, nprow),
nq0 = numroc(n+icoffa, nb_a, mycol, iacol, npcol),
indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol, nprow,
and npcol can be determined by calling the function blacs_gridinfo.

Output Parameters

val The value returned by the function.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?lapiv
Applies a permutation matrix to a general distributed
matrix, resulting in row or column pivoting.

Syntax
void pslapiv (char *direc , char *rowcol , char *pivroc , MKL_INT *m , MKL_INT *n ,
float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , MKL_INT *ip ,
MKL_INT *jp , MKL_INT *descip , MKL_INT *iwork );
void pdlapiv (char *direc , char *rowcol , char *pivroc , MKL_INT *m , MKL_INT *n ,
double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , MKL_INT *ip ,
MKL_INT *jp , MKL_INT *descip , MKL_INT *iwork );
void pclapiv (char *direc , char *rowcol , char *pivroc , MKL_INT *m , MKL_INT *n ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv , MKL_INT
*ip , MKL_INT *jp , MKL_INT *descip , MKL_INT *iwork );
void pzlapiv (char *direc , char *rowcol , char *pivroc , MKL_INT *m , MKL_INT *n ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_INT *ipiv ,
MKL_INT *ip , MKL_INT *jp , MKL_INT *descip , MKL_INT *iwork );

Include Files
• mkl_scalapack.h


Description
The p?lapiv function applies either P (permutation matrix indicated by ipiv) or inv(P) to a general m-by-n
distributed matrix sub(A) = A(ia:ia+m-1, ja:ja+n-1), resulting in row or column pivoting. The pivot
vector may be distributed across a process row or a column. The pivot vector should be aligned with the
distributed matrix A. This function will transpose the pivot vector, if necessary.
For example, if the row pivots should be applied to the columns of sub(A), pass rowcol='C' and
pivroc='C'.

Input Parameters

direc (global)
Specifies in which order the permutation is applied:
= 'F' (Forward): Applies pivots forward from top of matrix. Computes
P*sub(A).
= 'B' (Backward): Applies pivots backward from bottom of matrix.
Computes inv(P)*sub(A).

rowcol (global)
Specifies if the rows or columns are to be permuted:
= 'R': Rows will be permuted,

= 'C': Columns will be permuted.

pivroc (global)
Specifies whether ipiv is distributed over a process row or column:
= 'R': ipiv is distributed over a process row,

= 'C': ipiv is distributed over a process column.

m (global)
The number of rows in the distributed matrix sub(A). When m = 0, p?lapiv
is set to zero. m ≥ 0.

n (global)
The number of columns in the distributed matrix sub(A). When n = 0, p?lapiv
is set to zero. n ≥ 0.

a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1)
containing the local pieces of the distributed matrix sub(A).

ia, ja (global)
The row and column indices in the global matrix A indicating the first row
and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

ipiv (local)
Array of size lipiv ;

when rowcol='R' or 'r':

lipiv≥LOCr(ia+m-1) + mb_a if pivroc='C' or 'c',


lipiv≥LOCc(m + mod(jp-1, nb_p)) if pivroc='R' or 'r', and,
when rowcol='C' or 'c':

lipiv≥LOCr(n + mod(ip-1, mb_p)) if pivroc='C' or 'c',


lipiv≥LOCc(ja+n-1) + nb_a if pivroc='R' or 'r'.
This array contains the pivoting information. ipiv(i) is the global row
(column) that local row (column) i was swapped with. When rowcol='R' or
'r' and pivroc='C' or 'c', or rowcol='C' or 'c' and pivroc='R' or
'r', the last piece of this array of size mb_a (resp. nb_a) is used as
workspace. In those cases, this array is tied to the distributed matrix A.

ip, jp (global) The row and column indices in the global matrix P indicating the
first row and the first column of the matrix sub(P), respectively.

descip (global and local) array of size dlen_. The array descriptor for the
distributed vector ipiv.

iwork (local).
Array of size ldw, where ldw is equal to the workspace necessary for
transposition, and the storage of the transposed ipiv:
Let lcm be the least common multiple of nprow and npcol.

if( *rowcol == 'r' && *pivroc == 'r' ) {
    if( nprow == npcol ) {
        ldw = LOCr( n_p + (*jp-1)%nb_p ) + nb_p;
    } else {
        ldw = LOCr( n_p + (*jp-1)%nb_p ) +
              nb_p * ceil( ceil( LOCc(n_p)/nb_p ) / (lcm/npcol) );
    }
} else if( *rowcol == 'c' && *pivroc == 'c' ) {
    if( nprow == npcol ) {
        ldw = LOCc( m_p + (*ip-1)%mb_p ) + mb_p;
    } else {
        ldw = LOCc( m_p + (*ip-1)%mb_p ) +
              mb_p * ceil( ceil( LOCr(m_p)/mb_p ) / (lcm/nprow) );
    }
} else {
    // iwork is not referenced.
}

Output Parameters

a (local).
On exit, the local pieces of the permuted distributed submatrix.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?lapv2
Applies a permutation to an m-by-n distributed matrix.


Syntax
void pslapv2 (const char* direc, const char* rowcol, const MKL_INT* m, const MKL_INT*
n, float* a, const MKL_INT* ia, const MKL_INT* ja, const MKL_INT* desca, const
MKL_INT* ipiv, const MKL_INT* ip, const MKL_INT* jp, const MKL_INT* descip);
void pdlapv2 (const char* direc, const char* rowcol, const MKL_INT* m, const MKL_INT*
n, double* a, const MKL_INT* ia, const MKL_INT* ja, const MKL_INT* desca, const
MKL_INT* ipiv, const MKL_INT* ip, const MKL_INT* jp, const MKL_INT* descip);
void pclapv2 (const char* direc, const char* rowcol, const MKL_INT* m, const MKL_INT*
n, MKL_Complex8* a, const MKL_INT* ia, const MKL_INT* ja, const MKL_INT* desca, const
MKL_INT* ipiv, const MKL_INT* ip, const MKL_INT* jp, const MKL_INT* descip);
void pzlapv2 (const char* direc, const char* rowcol, const MKL_INT* m, const MKL_INT*
n, MKL_Complex16* a, const MKL_INT* ia, const MKL_INT* ja, const MKL_INT* desca, const
MKL_INT* ipiv, const MKL_INT* ip, const MKL_INT* jp, const MKL_INT* descip);

Include Files
• mkl_scalapack.h

Description
p?lapv2 applies either P (permutation matrix indicated by ipiv) or inv( P ) to an m-by-n distributed matrix
sub( A ) denoting A(ia:ia+m-1,ja:ja+n-1), resulting in row or column pivoting. The pivot vector should be
aligned with the distributed matrix A. For pivoting the rows of sub( A ), ipiv should be distributed along a
process column and replicated over all process rows. Similarly, ipiv should be distributed along a process
row and replicated over all process columns for column pivoting.

Input Parameters

direc (global)
Specifies in which order the permutation is applied:
= 'F' (Forward) Applies pivots Forward from top of matrix. Computes P *
sub( A );
= 'B' (Backward) Applies pivots Backward from bottom of matrix. Computes
inv( P ) * sub( A ).

rowcol (global)
Specifies if the rows or columns are to be permuted:
= 'R' Rows will be permuted,
= 'C' Columns will be permuted.

m (global)
The number of rows to be operated on, i.e. the number of rows of the
distributed submatrix sub( A ). m >= 0.

n (global)
The number of columns to be operated on, i.e. the number of columns of
the distributed submatrix sub( A ). n >= 0.

a Pointer into local memory to an array of size lld_a*LOCc(ja+n-1) .

On entry, this local array contains the local pieces of the distributed matrix
sub( A ) to which the row or columns interchanges will be applied.

ia (global)
The row index in the global array a indicating the first row of sub( A ).

ja (global)
The column index in the global array a indicating the first column of
sub( A ).

desca (global and local)


Array of size dlen_.
The array descriptor for the distributed matrix A.

ipiv Array, size >= LOCr(m_a)+mb_a if rowcol = 'R', LOCc(n_a)+nb_a


otherwise.
It contains the pivoting information. ipiv[i - 1] is the global row (column),
local row (column) i was swapped with. The last piece of the array of size
mb_a or nb_a is used as workspace. ipiv is tied to the distributed matrix
A.

ip (global)
The global row index of ipiv, which points to the beginning of the
submatrix on which to operate.

jp (global)
The global column index of ipiv, which points to the beginning of the
submatrix on which to operate.

descip (global and local)


Array of size 8.
The array descriptor for the distributed matrix ipiv.

Output Parameters

a On exit, this array contains the local pieces of the permuted distributed
matrix.
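
As a usage sketch (the matrix, its descriptor, and the pivot vector with its descriptor descip are assumed to have been produced earlier, for example by a factorization routine), forward row pivoting of the whole matrix can be applied as follows:

#include <mkl_scalapack.h>

void pivot_rows_forward(MKL_INT m, MKL_INT n,
                        double *a, MKL_INT *desca,
                        const MKL_INT *ipiv, const MKL_INT *descip)
{
    char direc = 'F';        /* apply the pivots from the top of the matrix down */
    char rowcol = 'R';       /* permute rows */
    MKL_INT ia = 1, ja = 1;  /* operate on the whole matrix */
    MKL_INT ip = 1, jp = 1;
    pdlapv2(&direc, &rowcol, &m, &n, a, &ia, &ja, desca, ipiv, &ip, &jp, descip);
}

Calling the same routine with direc = 'B' applies inv(P) and undoes the permutation.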

p?laqge
Scales a general rectangular matrix, using row and
column scaling factors computed by p?geequ .

Syntax
void pslaqge (MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *r , float *c , float *rowcnd , float *colcnd , float *amax , char
*equed );
void pdlaqge (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *r , double *c , double *rowcnd , double *colcnd , double
*amax , char *equed );


void pclaqge (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,


MKL_INT *desca , float *r , float *c , float *rowcnd , float *colcnd , float *amax ,
char *equed );
void pzlaqge (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *r , double *c , double *rowcnd , double *colcnd , double
*amax , char *equed );

Include Files
• mkl_scalapack.h

Description
The p?laqge function equilibrates a general m-by-n distributed matrix sub(A) = A(ia:ia+m-1, ja:ja+n-1)
using the row and column scaling factors in the vectors r and c computed by p?geequ.

Input Parameters

m (global)
The number of rows in the distributed matrix sub(A). (m ≥0).

n (global)
The number of columns in the distributed matrix sub(A). (n ≥0).

a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).

On entry, this array contains the distributed matrix sub(A).

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

r (local).
Array of size LOCr(m_a). The row scale factors for sub(A). r is aligned with
the distributed matrix A, and replicated across every process column. r is
tied to the distributed matrix A.

c (local).
Array of size LOCc(n_a). The column scale factors for sub(A). c is aligned with
the distributed matrix A, and replicated across every process row. c is
tied to the distributed matrix A.

rowcnd (local).
The global ratio of the smallest r[i] to the largest r[i] , ia-1 ≤i≤ia+m-2.

colcnd (local).
The global ratio of the smallest c[i] to the largest c[i], ia-1 ≤i≤ia+n-2.

amax (global).
Absolute value of largest distributed submatrix entry.

Output Parameters

a (local).
On exit, the equilibrated distributed matrix. See equed for the form of the
equilibrated distributed submatrix.

equed (global)
Specifies the form of equilibration that was done.
= 'N': No equilibration

= 'R': Row equilibration, that is, sub(A) has been pre-multiplied by


diag(r[ia-1:ia+m-2]),
= 'C': column equilibration, that is, sub(A) has been post-multiplied by
diag(c[ja-1:ja+n-2]),
= 'B': Both row and column equilibration, that is, sub(A) has been
replaced by diag(r[ia-1:ia+m-2])* sub(A) * diag(c[ja-1:ja
+n-2]).
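
A typical sequence (a sketch only; descriptor set-up, the choice of when to equilibrate, and error handling are the caller's responsibility) first computes the scale factors with p?geequ and then applies them with p?laqge, inspecting equed afterwards to see which scaling was actually performed:

#include <mkl_scalapack.h>

void equilibrate(MKL_INT m, MKL_INT n,
                 double *a, MKL_INT ia, MKL_INT ja, MKL_INT *desca,
                 double *r, double *c,   /* scale factors, sizes LOCr(m_a) and LOCc(n_a) */
                 char *equed)
{
    double rowcnd, colcnd, amax;
    MKL_INT info;

    /* Compute row and column scale factors for sub(A). */
    pdgeequ(&m, &n, a, &ia, &ja, desca, r, c, &rowcnd, &colcnd, &amax, &info);
    if (info != 0) {          /* some row or column of sub(A) is exactly zero */
        *equed = 'N';
        return;
    }

    /* Scale sub(A); on exit equed is 'N', 'R', 'C', or 'B'. */
    pdlaqge(&m, &n, a, &ia, &ja, desca, r, c, &rowcnd, &colcnd, &amax, equed);
}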

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?laqr0
Computes the eigenvalues of a Hessenberg matrix and
optionally returns the matrices from the Schur
decomposition.

Syntax
void pslaqr0(MKL_INT* wantt, MKL_INT* wantz, MKL_INT* n, MKL_INT* ilo, MKL_INT* ihi,
float* h, MKL_INT* desch, float* wr, float* wi, MKL_INT* iloz, MKL_INT* ihiz, float*
z, MKL_INT* descz, float* work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT* liwork,
MKL_INT* info, MKL_INT* reclevel);
void pdlaqr0(MKL_INT* wantt, MKL_INT* wantz, MKL_INT* n, MKL_INT* ilo, MKL_INT* ihi,
double* h, MKL_INT* desch, double* wr, double* wi, MKL_INT* iloz, MKL_INT* ihiz,
double* z, MKL_INT* descz, double* work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT*
liwork, MKL_INT* info, MKL_INT* reclevel);

Include Files
• mkl_scalapack.h

Description
p?laqr0 computes the eigenvalues of a Hessenberg matrix H and, optionally, the matrices T and Z from the
Schur decomposition H = Z*T*Z^T, where T is an upper quasi-triangular matrix (the Schur form), and Z is the
orthogonal matrix of Schur vectors.
Optionally Z may be postmultiplied into an input orthogonal matrix Q so that this function can give the Schur
factorization of a matrix A which has been reduced to the Hessenberg form H by the orthogonal matrix Q:
A = Q*H*Q^T = (QZ)*T*(QZ)^T.


Input Parameters

wantt (global )
Non-zero : the full Schur form T is required;
Zero : only eigenvalues are required.

wantz (global )
Non-zero : the matrix of Schur vectors Z is required;
Zero: Schur vectors are not required.

n (global )
The order of the Hessenberg matrix H (and Z if wantz is non-zero). n ≥ 0.

ilo, ihi (global )


It is assumed that the matrix H is already upper triangular in rows and
columns 1:ilo-1 and ihi+1:n. ilo and ihi are normally set by a previous
call to p?gebal, and then passed to p?gehrd when the matrix output by
p?gebal is reduced to Hessenberg form. Otherwise ilo and ihi should be set
to 1 and n, respectively. If n > 0, then 1 ≤ilo≤ihi≤n.

If n = 0, then ilo = 1 and ihi = 0.

h (global ) array of size lld_h * LOCc(n)

The upper Hessenberg matrix H.

desch (global and local )


Array of size dlen_.
The array descriptor for the distributed matrix H.

iloz, ihiz Specify the rows of the matrix Z to which transformations must be applied if
wantz is non-zero, 1 ≤iloz≤ilo; ihi≤ihiz≤n.

z Array of size lld_z * LOCc(n).

If wantz is non-zero, contains the matrix Z.

If wantz equals zero, z is not referenced.

descz (global and local ) array of size dlen_.


The array descriptor for the distributed matrix Z.

work (local workspace) array of size lwork

lwork (local )
The length of the workspace array work.

iwork (local workspace) array of size liwork

liwork (local )
The length of the workspace array iwork.

reclevel (local )

Level of recursion. reclevel = 0 must hold on entry.

Output Parameters

h On exit, if wantt is non-zero, the matrix H is upper quasi-triangular in rows


and columns ilo:ihi, with 1-by-1 and 2-by-2 blocks on the main diagonal.
The 2-by-2 diagonal blocks (corresponding to complex conjugate pairs of
eigenvalues) are returned in standard form, with H(i,i) = H(i+1,i+1) and
H(i+1,i)*H(i,i+1) < 0. If info = 0 and wantt equals zero, the contents of h
are unspecified on exit.

wr, wi The real and imaginary parts, respectively, of the computed eigenvalues
ilo to ihi are stored in the corresponding elements of wr and wi. If two
eigenvalues are computed as a complex conjugate pair, they are stored in
consecutive elements of wr and wi, say the i-th and (i+1)th, with wi[i-1] >
0 and wi[i] < 0. If wantt is non-zero, the eigenvalues are stored in the
same order as on the diagonal of the Schur form returned in h.

z Updated matrix with transformations applied only to the submatrix


Z(ilo:ihi,ilo:ihi).

If COMPZ = 'I', on exit, if info = 0, z contains the orthogonal matrix Z of


the Schur vectors of H.
If wantz is non-zero, then Z(ilo:ihi,iloz:ihiz) is replaced by
Z(ilo:ihi,iloz:ihiz)*U, when U is the orthogonal/unitary Schur factor of
H(ilo:ihi,ilo:ihi).

If wantz equals zero, then z is not defined.

work[0] On exit, if info = 0, work[0] returns the optimal lwork.

iwork[0] On exit, if info = 0, iwork[0] returns the optimal liwork.

info > 0: if info = i, then the function failed to compute all the eigenvalues.
Elements 0:ilo-2 and i:n-1 of wr and wi contain those eigenvalues which
have been successfully computed.
> 0: if wantt equals zero, then the remaining unconverged eigenvalues are
the eigenvalues of the upper Hessenberg matrix rows and columns ilo
through ihi of the final output value of H.

> 0: if wantt is non-zero, then (initial value of H)*U = U*(final value of H),
where U is an orthogonal/unitary matrix. The final value of H is upper
Hessenberg and quasi-triangular/triangular in rows and columns info+1
through ihi.

> 0: if wantz is non-zero, then (final value of


Z(ilo:ihi,iloz:ihiz))=(initial value of Z(ilo:ihi,iloz:ihiz))*U, where
U is the orthogonal/unitary matrix in the previous expression (regardless of
the value of wantt).

> 0: if wantz equals zero, then z is not accessed.
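
Because a complex conjugate pair occupies two consecutive entries of wr and wi, post-processing of the returned eigenvalues typically walks the two arrays together, as in this illustrative (non-library) sketch:

#include <stdio.h>
#include <mkl_types.h>

/* Print the eigenvalues in positions ilo..ihi (1-based) returned by p?laqr0. */
void print_eigenvalues(MKL_INT ilo, MKL_INT ihi, const double *wr, const double *wi)
{
    for (MKL_INT i = ilo - 1; i <= ihi - 1; ++i) {
        if (wi[i] == 0.0) {
            printf("real eigenvalue: %g\n", wr[i]);
        } else {
            /* wi[i] > 0 here; wr[i+1] = wr[i] and wi[i+1] = -wi[i] hold for the pair */
            printf("complex pair: %g +/- %gi\n", wr[i], wi[i]);
            ++i;   /* skip the conjugate partner */
        }
    }
}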

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.


p?laqr1
Sets a scalar multiple of the first column of the
product of a 2-by-2 or 3-by-3 matrix and specified
shifts.

Syntax
void pslaqr1(MKL_INT* wantt, MKL_INT* wantz, MKL_INT* n, MKL_INT* ilo, MKL_INT* ihi,
float* a, MKL_INT* desca, float* wr, float* wi, MKL_INT* iloz, MKL_INT* ihiz, float*
z, MKL_INT* descz, float* work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT* ilwork,
MKL_INT* info);
void pdlaqr1(MKL_INT* wantt, MKL_INT* wantz, MKL_INT* n, MKL_INT* ilo, MKL_INT* ihi,
double* a, MKL_INT* desca, double* wr, double* wi, MKL_INT* iloz, MKL_INT* ihiz,
double* z, MKL_INT* descz, double* work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT*
ilwork, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
p?laqr1 is an auxiliary function used to find the Schur decomposition and/or eigenvalues of a matrix already
in Hessenberg form from columns ilo to ihi.

This is a modified version of p?lahqr from ScaLAPACK version 1.7.3. The following modifications were
made:

• Workspace query functionality was added.


• Aggressive early deflation is implemented.
• Aggressive deflation (looking for two consecutive small subdiagonal elements by PSLACONSB) is
abandoned.
• The returned Schur form is now in canonical form, i.e., the returned 2-by-2 blocks really correspond to
complex conjugate pairs of eigenvalues.
• For some reason, the original version of p?lahqr sometimes did not read out the converged eigenvalues
correctly. This is now fixed.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

wantt (global )
Non-zero : the full Schur form T is required;
Zero: only eigenvalues are required.

wantz Non-zero : the matrix of Schur vectors Z is required;
Zero: Schur vectors are not required.

n (global )
The order of the Hessenberg matrix A (and Z if wantz is non-zero). n ≥ 0.

ilo, ihi (global )


It is assumed that the matrix A is already upper quasi-triangular in rows
and columns ihi+1:n, and that A(ilo,ilo-1) = 0 (unless ilo = 1). p?laqr1
works primarily with the Hessenberg submatrix in rows and columns
ilo to ihi, but applies transformations to all of H if wantt is non-zero.
1 ≤ilo≤ max(1,ihi); ihi≤n.

a (global ) array of size lld_a * LOCc(n)

On entry, the upper Hessenberg matrix A.

desca (global and local ) array of size dlen_.


The array descriptor for the distributed matrix A.

iloz, ihiz (global )


Specify the rows of the matrix Z to which transformations must be applied if
wantz is non-zero.
1 ≤iloz≤ilo; ihi≤ihiz≤n.

z (global ) array of size lld_z * LOCc(n).

If wantz is non-zero, on entry z must contain the current matrix Z of


transformations accumulated by p?hseqr.

If wantz is zero, z is not referenced.

descz (global and local ) array of size dlen_.


The array descriptor for the distributed matrix Z.

work (local output) array of size lwork

lwork (local )
The size of the work array (lwork>=1).

If lwork=-1, then a workspace query is assumed.

iwork (global and local ) array of size ilwork

This holds some of the IBLK integer arrays.

ilwork (local )
The size of the iwork array (ilwork≥ 3 ).


OUTPUT Parameters

a If wantt is non-zero, the matrix A is upper quasi-triangular in rows and
columns ilo:ihi, with any 2-by-2 or larger diagonal blocks not yet in
standard form. If wantt equals zero, the contents of a are unspecified on
exit.

wr, wi (global replicated ) array of size n

The real and imaginary parts, respectively, of the computed eigenvalues


ilo to ihi are stored in the corresponding elements of wr and wi. If two
eigenvalues are computed as a complex conjugate pair, they are stored in
consecutive elements of wr and wi, say the i-th and (i+1)th, with wi[i-1] >
0 and wi[i] < 0. If wantt is non-zero, the eigenvalues are stored in the
same order as on the diagonal of the Schur form returned in a. a may be
returned with larger diagonal blocks until the next release.

z On exit z is updated; transformations are applied only to the submatrix


Z(iloz:ihiz,ilo:ihi).

If wantz equals zero, z is not referenced.

work[0] On exit, if info = 0, work[0] returns the optimal lwork.

info (global )
< 0: parameter number -info incorrect or inconsistent

= 0: successful exit
> 0: p?laqr1 failed to compute all the eigenvalues ilo to ihi in a total of
30*(ihi-ilo+1) iterations; if info = i, elements i:ihi-1 of wr and wi
contain those eigenvalues which have been successfully computed.

Application Notes
This algorithm is very similar to p?lahqr. Unlike p?lahqr, instead of sending one double shift through the
largest unreduced submatrix, this algorithm sends multiple double shifts and spaces them apart so that there
can be parallelism across several processor rows/columns. Another critical difference is that this algorithm
aggregates multiple transforms together in order to apply them in a block fashion.
Current Notes and/or Restrictions:

• This code requires the distributed block size to be square and at least six (6); unlike simpler codes like
LU, this algorithm is extremely sensitive to block size. Unwise choices of too small a block size can lead to
bad performance.
• This code requires a and z to be distributed identically and to have identical contexts.
• This release does not yet provide a function for resolving the Schur blocks into regular 2x2 form after
this code has completed. Because of this, a significant performance penalty is incurred while the deflation is
performed, sometimes by a single column of processors.
• This code does not currently block the initial transforms, so none of the rows or columns for any bulge
are completed until all are started. To offset pipeline start-up, it is recommended that at least
2*LCM(NPROW,NPCOL) bulges be used (if possible).
• The maximum number of bulges currently supported is fixed at 32. In future versions this will be limited
only by the incoming work array.
• The matrix A must be in upper Hessenberg form. If elements below the subdiagonal are nonzero, the
resulting transforms may be nonsimilar. This is also true with the LAPACK function.
• For this release, it is assumed that rsrc_ = csrc_ = 0.

• Currently, all the eigenvalues are distributed to all the nodes. Future releases will probably distribute the
eigenvalues by the column partitioning.
• The internals of this function are subject to change.
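
The following sketch illustrates the two-call workspace-query idiom described for the lwork parameter above; it is
not part of the library documentation. It assumes that the BLACS process grid, the descriptors desca and descz, and
the distributed matrices a and z have already been set up, and that ilo, ihi, iloz, and ihiz are valid. All variable
names are placeholders, and ilwork = 3 is simply the minimum stated above.

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Workspace query for pdlaqr1 followed by the computational call (sketch). */
void run_pdlaqr1(MKL_INT n, MKL_INT ilo, MKL_INT ihi,
                 double *a, MKL_INT *desca, double *wr, double *wi,
                 MKL_INT iloz, MKL_INT ihiz, double *z, MKL_INT *descz)
{
    MKL_INT wantt = 1, wantz = 1, info = 0;
    MKL_INT ilwork = 3;                                 /* minimum stated above; may need to be larger */
    MKL_INT *iwork = (MKL_INT*)malloc(ilwork * sizeof(MKL_INT));
    double query;
    MKL_INT lwork = -1;                                 /* lwork = -1: workspace query */

    pdlaqr1(&wantt, &wantz, &n, &ilo, &ihi, a, desca, wr, wi,
            &iloz, &ihiz, z, descz, &query, &lwork, iwork, &ilwork, &info);

    lwork = (MKL_INT)query;                             /* optimal size returned in work[0] */
    double *work = (double*)malloc(lwork * sizeof(double));

    pdlaqr1(&wantt, &wantz, &n, &ilo, &ihi, a, desca, wr, wi,
            &iloz, &ihiz, z, descz, work, &lwork, iwork, &ilwork, &info);

    free(work);
    free(iwork);
}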

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?laqr2
Performs the orthogonal/unitary similarity
transformation of a Hessenberg matrix to detect and
deflate fully converged eigenvalues from a trailing
principal submatrix (aggressive early deflation).

Syntax
void pslaqr2(MKL_INT* wantt, MKL_INT* wantz, MKL_INT* n, MKL_INT* ktop, MKL_INT* kbot,
MKL_INT* nw, float* a, MKL_INT* desca, MKL_INT* iloz, MKL_INT* ihiz, float* z,
MKL_INT* descz, MKL_INT* ns, MKL_INT* nd, float* sr, float* si, float* t, MKL_INT*
ldt, float* v, MKL_INT* ldv, float* wr, float* wi, float* work, MKL_INT* lwork);
void pdlaqr2(MKL_INT* wantt, MKL_INT* wantz, MKL_INT* n, MKL_INT* ktop, MKL_INT* kbot,
MKL_INT* nw, double* a, MKL_INT* desca, MKL_INT* iloz, MKL_INT* ihiz, double* z,
MKL_INT* descz, MKL_INT* ns, MKL_INT* nd, double* sr, double* si, double* t, MKL_INT*
ldt, double* v, MKL_INT* ldv, double* wr, double* wi, double* work, MKL_INT* lwork);

Include Files
• mkl_scalapack.h

Description
p?laqr2 accepts as input an upper Hessenberg matrix A and performs an orthogonal similarity
transformation designed to detect and deflate fully converged eigenvalues from a trailing principal submatrix.
On output, A is overwritten by a new Hessenberg matrix that is a perturbation of an orthogonal similarity
transformation of A. It is to be hoped that the final version of A has many zero subdiagonal entries.
This function handles small deflation windows, which are affordable for a single processor. Normally, it is
called by p?laqr1. All inputs are assumed to be valid, without checking.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

wantt (global )


If wantt is non-zero, then the Hessenberg matrix A is fully updated so that


the quasi-triangular Schur factor may be computed (in cooperation with the
calling function).
If wantt equals zero, then only enough of A is updated to preserve the
eigenvalues.

wantz (global )
If wantz is non-zero, then the orthogonal matrix Z is updated so that the
orthogonal Schur factor may be computed (in cooperation with the calling
function).
If wantz equals zero, then z is not referenced.

n (global )
The order of the matrix A and (if wantz is non-zero) the order of the
orthogonal matrix Z.

ktop, kbot (global )


It is assumed without a check that either kbot = n or A(kbot+1,kbot) = 0.
kbot and ktop together determine an isolated block along the diagonal of
the Hessenberg matrix. However, A(ktop,ktop-1) = 0 is not strictly required
if wantt is non-zero.

nw (global )
Deflation window size. 1 ≤nw≤ (kbot-ktop+1). Normally nw≥ 3 if p?laqr2 is
called by p?laqr1.

a (local ) array of size lld_a * LOCc(n)

The initial n-by-n section of a stores the Hessenberg matrix undergoing


aggressive early deflation.

desca (global and local) array of size dlen_.


The array descriptor for the distributed matrix A.

iloz, ihiz (global )


Specify the rows of the matrix Z to which transformations must be applied if
wantz is non-zero. 1 ≤ iloz ≤ ihiz ≤ n.

z Array of size lld_z * LOCc(n)

If wantz is non-zero, then on output, the orthogonal similarity


transformation mentioned above has been accumulated into the matrix
Z(iloz:ihiz, kbot:ktop), stored in z, from the right.


If wantz is zero, then z is unreferenced.

descz (global and local) array of size dlen_.


The array descriptor for the distributed matrix Z.

t (local workspace) array of size ldt * nw.

ldt (local )
The leading dimension of the array t. ldt≥nw.

v (local workspace) array of size ldv * nw.

ldv (local )
The leading dimension of the array v. ldv≥nw.

wr, wi (local workspace) array of size kbot.

work (local workspace) array of size lwork.

lwork (local )
work is a local array of size lwork, and lwork is assumed to be big enough
so that lwork ≥ nw*nw.

OUTPUT Parameters

a On output a has been transformed by an orthogonal similarity


transformation, perturbed, and returned to Hessenberg form that (it is to be
hoped) has some zero subdiagonal entries.

z If wantz is non-zero, the orthogonal similarity transformation mentioned
above has been accumulated into Z(iloz:ihiz, kbot:ktop), stored in z, from
the right. If wantz is zero, z is not referenced.
ns (global )
The number of unconverged (that is, approximate) eigenvalues returned in
sr and si that may be used as shifts by the calling function.

nd (global )
The number of converged eigenvalues uncovered by this function.

sr, si (global ) array of size kbot

On output, the real and imaginary parts of approximate eigenvalues that


may be used for shifts are stored in sr[kbot-nd-ns] through sr[kbot-nd-1]
and si[kbot-nd-ns] through si[kbot-nd-1], respectively.
On processor #0, the real and imaginary parts of converged eigenvalues
are stored in sr[kbot-nd] through sr[kbot-1] and si[kbot-nd] through
si[kbot-1], respectively. On other processors, these entries are set to
zero.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?laqr3
Performs the orthogonal/unitary similarity
transformation of a Hessenberg matrix to detect and
deflate fully converged eigenvalues from a trailing
principal submatrix (aggressive early deflation).


Syntax
void pslaqr3(MKL_INT* wantt, MKL_INT* wantz, MKL_INT* n, MKL_INT* ktop, MKL_INT* kbot,
MKL_INT* nw, float* h, MKL_INT* desch, MKL_INT* iloz, MKL_INT* ihiz, float* z,
MKL_INT* descz, MKL_INT* ns, MKL_INT* nd, float* sr, float* si, float* v, MKL_INT*
descv, MKL_INT* nh, float* t, MKL_INT* desct, MKL_INT* nv, float* wv, MKL_INT* descw,
float* work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT* liwork, MKL_INT* reclevel);
void pdlaqr3(MKL_INT* wantt, MKL_INT* wantz, MKL_INT* n, MKL_INT* ktop, MKL_INT* kbot,
MKL_INT* nw, double* h, MKL_INT* desch, MKL_INT* iloz, MKL_INT* ihiz, double* z,
MKL_INT* descz, MKL_INT* ns, MKL_INT* nd, double* sr, double* si, double* v, MKL_INT*
descv, MKL_INT* nh, double* t, MKL_INT* desct, MKL_INT* nv, double* wv, MKL_INT*
descw, double* work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT* liwork, MKL_INT*
reclevel);

Include Files
• mkl_scalapack.h

Description
This function accepts as input an upper Hessenberg matrix H and performs an orthogonal similarity
transformation designed to detect and deflate fully converged eigenvalues from a trailing principal submatrix.
On output H is overwritten by a new Hessenberg matrix that is a perturbation of an orthogonal similarity
transformation of H. It is to be hoped that the final version of H has many zero subdiagonal entries.

Input Parameters

wantt (global )
If wantt is non-zero, then the Hessenberg matrix H is fully updated so that
the quasi-triangular Schur factor may be computed (in cooperation with the
calling function).
If wantt equals zero, then only enough of H is updated to preserve the
eigenvalues.

wantz (global )
If wantz is non-zero, then the orthogonal matrix Z is updated so that the
orthogonal Schur factor may be computed (in cooperation with the calling
function).
If wantz equals zero, then z is not referenced.

n (global )
The order of the matrix H and (if wantz is non-zero), the order of the
orthogonal matrix Z.

ktop (global )
It is assumed that either ktop = 1 or H(ktop,ktop-1) = 0. kbot and ktop
together determine an isolated block along the diagonal of the Hessenberg
matrix.

kbot (global )
It is assumed without a check that either kbot = n or H(kbot+1,kbot) = 0.
kbot and ktop together determine an isolated block along the diagonal of
the Hessenberg matrix.

nw (global )
Deflation window size. 1 ≤nw≤ (kbot-ktop+1).

h (local ) array of size lld_h * LOCc(n)

The initial n-by-n section of H stores the Hessenberg matrix undergoing


aggressive early deflation.

desch (global and local) array of size dlen_.


The array descriptor for the distributed matrix H.

iloz, ihiz (global )


Specify the rows of the matrix Z to which transformations must be applied if
wantz is non-zero. 1 ≤iloz≤ihiz≤n.

z Array of size lld_z * LOCc(n)

If wantz is non-zero, then on output, the orthogonal similarity


transformation mentioned above has been accumulated into the matrix
Z(iloz:ihiz,kbot:ktop) from the right.

If wantz is zero, then z is unreferenced.

descz (global and local) array of size dlen_.


The array descriptor for the distributed matrix Z.

v (global workspace) array of size lld_v * LOCc(nw)

An nw-by-nw distributed work array.

descv (global and local) array of size dlen_.


The array descriptor for the distributed matrix V.

nh The number of columns of t. nh≥nw.

t (global workspace) array of size lld_t * LOCc(nh)

desct (global and local) array of size dlen_.


The array descriptor for the distributed matrix T.

nv (global )
The number of rows of work array wv available for workspace. nv≥nw.

wv (global workspace) array of size lld_w * LOCc(nw)

descw (global and local) array of size dlen_.


The array descriptor for the distributed matrix wv.

work (local workspace) array of size lwork.

lwork (local )
The size of the work array work (lwork≥1). lwork = 2*nw suffices, but
greater efficiency may result from larger values of lwork.


If lwork = -1, then a workspace query is assumed; p?laqr3 only


estimates the optimal workspace size for the given values of n, nw, ktop
and kbot. The estimate is returned in work[0]. No error message related to
lwork is issued by xerbla. Neither h nor z are accessed.

iwork (local workspace) array of size liwork

liwork (local )
The length of the workspace array iwork (liwork≥1).

If liwork=-1, then a workspace query is assumed.

OUTPUT Parameters

h On output, h has been transformed by an orthogonal similarity
transformation, perturbed, and returned to Hessenberg form that (it is
to be hoped) has some zero subdiagonal entries.

z If wantz is non-zero, then on output, the orthogonal similarity


transformation mentioned above has been accumulated into the matrix
Z(iloz:ihiz,kbot:ktop) from the right.

If wantz is zero, then z is unreferenced.

ns (global )
The number of unconverged (that is, approximate) eigenvalues returned in
sr and si that may be used as shifts by the calling function.

nd (global )
The number of converged eigenvalues uncovered by this function.

sr, si (global ) array of size kbot. The real and imaginary parts of approximate
eigenvalues that may be used for shifts are stored in sr[kbot-nd-ns]
through sr[kbot-nd-1] and si[kbot-nd-ns] through si[kbot-nd-1],
respectively. The real and imaginary parts of converged eigenvalues are
stored in sr[kbot-nd] through sr[kbot-1] and si[kbot-nd] through
si[kbot-1], respectively.

work[0] On exit, if info = 0, work[0] returns the optimal lwork

iwork[0] On exit, if info = 0, iwork[0] returns the optimal liwork

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?laqr5
Performs a single small-bulge multi-shift QR sweep.

Syntax
void pslaqr5(MKL_INT* wantt, MKL_INT* wantz, MKL_INT* kacc22, MKL_INT* n, MKL_INT*
ktop, MKL_INT* kbot, MKL_INT* nshfts, float* sr, float* si, float* h, MKL_INT* desch,
MKL_INT* iloz, MKL_INT* ihiz, float* z, MKL_INT* descz, float* work, MKL_INT* lwork,
MKL_INT* iwork, MKL_INT* liwork);
void pdlaqr5(MKL_INT* wantt, MKL_INT* wantz, MKL_INT* kacc22, MKL_INT* n, MKL_INT*
ktop, MKL_INT* kbot, MKL_INT* nshfts, double* sr, double* si, double* h, MKL_INT*
desch, MKL_INT* iloz, MKL_INT* ihiz, double* z, MKL_INT* descz, double* work, MKL_INT*
lwork, MKL_INT* iwork, MKL_INT* liwork);

Include Files
• mkl_scalapack.h

Description
This auxiliary function called by p?laqr0 performs a single small-bulge multi-shift QR sweep by chasing
separated groups of bulges along the main block diagonal of a Hessenberg matrix H.

Input Parameters

wantt (global) scalar


wantt is non-zero if the quasi-triangular Schur factor is being computed.
wantt is set to zero otherwise.

wantz (global) scalar


wantz is non-zero if the orthogonal Schur factor is being computed. wantz
is set to zero otherwise.

kacc22 (global)
Value 0, 1, or 2. Specifies the computation mode of far-from-diagonal
orthogonal updates.
= 0: p?laqr5 does not accumulate reflections and does not use matrix-
matrix multiply to update far-from-diagonal matrix entries.
= 1: p?laqr5 accumulates reflections and uses matrix-matrix multiply to
update the far-from-diagonal matrix entries.
= 2: p?laqr5 accumulates reflections, uses matrix-matrix multiply to
update the far-from-diagonal matrix entries, and takes advantage of 2-by-2
block structure during matrix multiplies.

n (global) scalar
The order of the Hessenberg matrix H and, if wantz is non-zero, the order of
the orthogonal matrix Z.

ktop, kbot (global) scalar


These are the first and last rows and columns of an isolated diagonal block
upon which the QR sweep is to be applied. It is assumed without a check
that either ktop = 1 or H(ktop,ktop-1) = 0 and either kbot = n or H(kbot
+1,kbot) = 0.

nshfts (global) scalar


nshfts gives the number of simultaneous shifts. nshfts must be positive


and even.

sr, si (global) Array of size nshfts

sr contains the real parts and si contains the imaginary parts of the
nshfts shifts of origin that define the multi-shift QR sweep.

h (local) Array of size lld_h * LOCc(n)

On input h contains a Hessenberg matrix H.

desch (global and local)


array of size dlen_.
The array descriptor for the distributed matrix H .

iloz, ihiz (global)


Specify the rows of the matrix Z to which transformations must be applied if
wantz is non-zero. 1 ≤ iloz ≤ ihiz ≤ n.

z (local) array of size lld_z * LOCc(n)

If wantz is non-zero, then the QR sweep orthogonal similarity
transformation is accumulated into the matrix Z(iloz:ihiz, kbot:ktop)
from the right. If wantz equals zero, then z is unreferenced.

descz (global and local) array of size dlen_.


The array descriptor for the distributed matrix Z.

work (local workspace) array of size lwork

lwork (local)
The size of the work array (lwork≥1).

If lwork=-1, then a workspace query is assumed.

iwork (local workspace) array of size liwork

liwork (local)
The size of the iwork array (liwork≥1).

If liwork=-1, then a workspace query is assumed.

Output Parameters

h A multi-shift QR sweep with shifts sr(j)+i*si(j) is applied to the isolated


diagonal block in rows and columns ktop through kbot of the matrix H.

z If wantz is non-zero, z is updated with transformations applied only to the


submatrix Z(iloz:ihiz,kbot:ktop).

work[0] On exit, if info = 0, work[0] returns the optimal lwork.

iwork[0] On exit, if info = 0, iwork[0] returns the optimal liwork.
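
As an illustration only (not from the library documentation), the sketch below combines the lwork and liwork
queries described above in a single call with both sizes set to -1; whether a single combined query suffices is an
assumption, and the distributed data (h, z, the shift arrays, and their descriptors) are assumed to have been
prepared by the caller. All names are placeholders.

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Query optimal workspace sizes for pdlaqr5, then perform the QR sweep (sketch). */
void run_pdlaqr5(MKL_INT n, MKL_INT ktop, MKL_INT kbot, MKL_INT nshfts,
                 double *sr, double *si, double *h, MKL_INT *desch,
                 MKL_INT iloz, MKL_INT ihiz, double *z, MKL_INT *descz)
{
    MKL_INT wantt = 1, wantz = 1, kacc22 = 2;
    double wquery;
    MKL_INT iquery;
    MKL_INT lwork = -1, liwork = -1;                  /* workspace query for both arrays */

    pdlaqr5(&wantt, &wantz, &kacc22, &n, &ktop, &kbot, &nshfts, sr, si,
            h, desch, &iloz, &ihiz, z, descz, &wquery, &lwork, &iquery, &liwork);

    lwork = (MKL_INT)wquery;                          /* optimal sizes from work[0], iwork[0] */
    liwork = iquery;
    double  *work  = (double*)malloc(lwork * sizeof(double));
    MKL_INT *iwork = (MKL_INT*)malloc(liwork * sizeof(MKL_INT));

    pdlaqr5(&wantt, &wantz, &kacc22, &n, &ktop, &kbot, &nshfts, sr, si,
            h, desch, &iloz, &ihiz, z, descz, work, &lwork, iwork, &liwork);

    free(work);
    free(iwork);
}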

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?laqsy
Scales a symmetric/Hermitian matrix, using scaling
factors computed by p?poequ .

Syntax
void pslaqsy (char *uplo , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *sr , float *sc , float *scond , float *amax , char *equed );
void pdlaqsy (char *uplo , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *sr , double *sc , double *scond , double *amax , char
*equed );
void pclaqsy (char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , float *sr , float *sc , float *scond , float *amax , char *equed );
void pzlaqsy (char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *sr , double *sc , double *scond , double *amax , char
*equed );

Include Files
• mkl_scalapack.h

Description
The p?laqsy function equilibrates a symmetric distributed matrix sub(A) = A(ia:ia+n-1, ja:ja+n-1) using
the scaling factors in the vectors sr and sc. The scaling factors are computed by p?poequ.
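
As a usage illustration (not part of this manual), the following sketch computes the scale factors with pdpoequ and
then applies them with pdlaqsy. It assumes the p?poequ interface documented elsewhere in this manual, and a
distributed symmetric matrix a with descriptor desca that the caller has already created; sr and sc are
caller-allocated arrays of the sizes given below, and all names are placeholders.

#include "mkl_scalapack.h"

/* Equilibrate the symmetric distributed matrix A(ia:ia+n-1, ja:ja+n-1) (sketch). */
void equilibrate(MKL_INT n, double *a, MKL_INT ia, MKL_INT ja, MKL_INT *desca,
                 double *sr, double *sc)
{
    double scond, amax;
    MKL_INT info;
    char uplo = 'L', equed;

    /* Compute the row/column scale factors (p?poequ, assumed interface). */
    pdpoequ(&n, a, &ia, &ja, desca, sr, sc, &scond, &amax, &info);

    if (info == 0) {
        /* Apply the scaling; equed reports whether equilibration was done. */
        pdlaqsy(&uplo, &n, a, &ia, &ja, desca, sr, sc, &scond, &amax, &equed);
    }
}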

Input Parameters

uplo (global) Specifies whether the upper or lower triangular part of the
symmetric distributed matrix sub(A) is to be referenced:

= 'U': Upper triangular part;

= 'L': Lower triangular part.

n (global)
The order of the distributed matrix sub(A). n ≥ 0.

a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).
On entry, this array contains the local pieces of the distributed symmetric
matrix sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular part of the matrix, and the strictly lower triangular part
of sub(A) is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular part of the matrix, and the strictly upper triangular part
of sub(A) is not referenced.


ia, ja (global)
The row and column indices in the global matrix A indicating the first row
and the first column of the matrix sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

sr (local)
Array of size LOCr(m_a). The scale factors for the matrix A(ia:ia+m-1, ja:ja
+n-1). sr is aligned with the distributed matrix A, and replicated across
every process column. sr is tied to the distributed matrix A.

sc (local)
Array of size LOCc(n_a). The scale factors for the matrix A(ia:ia+m-1,
ja:ja+n-1). sc is aligned with the distributed matrix A, and replicated
down every process row. sc is tied to the distributed matrix A.

scond (global).
Ratio of the smallest sr[i] (respectively sc[j]) to the largest sr[i]
(respectively sc[j]), with ia-1 ≤ i < ia+n-1 and ja-1 ≤ j < ja+n-1.

amax (global).
Absolute value of largest distributed submatrix entry.

Output Parameters

a On exit,
if equed = 'Y', the equilibrated matrix:

diag(sr[ia-1], ..., sr[ia+n-2]) * sub(A) * diag(sc[ja-1], ..., sc[ja+n-2]).

equed (global).
Specifies whether or not equilibration was done.
= 'N': No equilibration.

= 'Y': Equilibration was done, that is, sub(A) has been replaced by:

diag(sr[ia-1], ..., sr[ia+n-2]) * sub(A) * diag(sc[ja-1], ..., sc[ja+n-2]).

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?lared1d
Redistributes an array assuming that the input array,
bycol, is distributed across rows and that all process
columns contain the same copy of bycol.

Syntax
void pslared1d (MKL_INT *n , MKL_INT *ia , MKL_INT *ja , MKL_INT *desc , float *bycol ,
float *byall , float *work , MKL_INT *lwork );

void pdlared1d (MKL_INT *n , MKL_INT *ia , MKL_INT *ja , MKL_INT *desc , double
*bycol , double *byall , double *work , MKL_INT *lwork );

Include Files
• mkl_scalapack.h
Description
The p?lared1d function redistributes a 1D array. It assumes that the input array bycol is distributed across
the process rows and that all process columns contain the same copy of bycol. The output array byall is
identical on all processes and contains the entire array.

Input Parameters
np = Number of local rows in bycol()

n (global)
The number of elements to be redistributed. n≥ 0.

ia, ja (global) ia, ja must be equal to 1.

desc (local) array of size 9. A 2D array descriptor, which describes bycol.

bycol (local).
Distributed block cyclic array of global size n and of local size np. bycol is
distributed across the process rows. All process columns are assumed to
contain the same value.

work (local).
size lwork. Used to hold the buffers sent from one process to another.

lwork (local)
The size of the work array. lwork ≥ numroc(n, desc[nb_], 0, 0,
npcol).

Output Parameters

byall (global).
Global size n, local size n. byall is exactly duplicated on all processes. It
contains the same values as bycol, but it is replicated across all processes
rather than being distributed.
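
A minimal sketch of a call, under stated assumptions (not taken from the manual): nb is the block size held in the
nb_ field of desc, and npcol is the number of process columns, both assumed to be supplied by the caller (for
example via blacs_gridinfo); the workspace is sized with numroc exactly as required for lwork above.

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Replicate the row-distributed vector bycol (global length n) into byall
   on every process (sketch). */
void replicate_bycol(MKL_INT n, MKL_INT *desc, MKL_INT nb, MKL_INT npcol,
                     double *bycol, double *byall)
{
    MKL_INT ia = 1, ja = 1;                     /* ia and ja must equal 1 */
    MKL_INT izero = 0;

    /* Workspace size as given above: numroc(n, desc[nb_], 0, 0, npcol). */
    MKL_INT lwork = numroc(&n, &nb, &izero, &izero, &npcol);
    double *work = (double*)malloc(lwork * sizeof(double));

    pdlared1d(&n, &ia, &ja, desc, bycol, byall, work, &lwork);

    free(work);
}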

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?lared2d
Redistributes an array assuming that the input array
byrow is distributed across columns and that all
process rows contain the same copy of byrow.

Syntax
void pslared2d (MKL_INT *n , MKL_INT *ia , MKL_INT *ja , MKL_INT *desc , float *byrow ,
float *byall , float *work , MKL_INT *lwork );


void pdlared2d (MKL_INT *n , MKL_INT *ia , MKL_INT *ja , MKL_INT *desc , double
*byrow , double *byall , double *work , MKL_INT *lwork );

Include Files
• mkl_scalapack.h

Description
The p?lared2d function redistributes a 1D array. It assumes that the input array byrow is distributed across
the process columns and that all process rows contain the same copy of byrow. The output array byall is
identical on all processes and contains the entire array.

Input Parameters
np = Number of local rows in byrow()

n (global)
The number of elements to be redistributed. n≥ 0.

ia, ja (global) ia, ja must be equal to 1.

desc (local) array of size dlen_. A 2D array descriptor, which describes byrow.

byrow (local).
Distributed block cyclic array of global size n and of local size np.
byrow is distributed across the process columns. All process rows
are assumed to contain the same value.

work (local).
size lwork. Used to hold the buffers sent from one process to another.

lwork (local)
The size of the work array. lwork ≥ numroc(n, desc[nb_], 0, 0, npcol).

Output Parameters

byall (global).
Global size n, local size n. byall is exactly duplicated on all processes. It
contains the same values as byrow, but it is replicated across all processes
rather than being distributed.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?larf
Applies an elementary reflector to a general
rectangular matrix.

Syntax
void pslarf (char *side , MKL_INT *m , MKL_INT *n , float *v , MKL_INT *iv , MKL_INT
*jv , MKL_INT *descv , MKL_INT *incv , float *tau , float *c , MKL_INT *ic , MKL_INT
*jc , MKL_INT *descc , float *work );

void pdlarf (char *side , MKL_INT *m , MKL_INT *n , double *v , MKL_INT *iv , MKL_INT
*jv , MKL_INT *descv , MKL_INT *incv , double *tau , double *c , MKL_INT *ic , MKL_INT
*jc , MKL_INT *descc , double *work );
void pclarf (char *side , MKL_INT *m , MKL_INT *n , MKL_Complex8 *v , MKL_INT *iv ,
MKL_INT *jv , MKL_INT *descv , MKL_INT *incv , MKL_Complex8 *tau , MKL_Complex8 *c ,
MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work );
void pzlarf (char *side , MKL_INT *m , MKL_INT *n , MKL_Complex16 *v , MKL_INT *iv ,
MKL_INT *jv , MKL_INT *descv , MKL_INT *incv , MKL_Complex16 *tau , MKL_Complex16 *c ,
MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work );

Include Files
• mkl_scalapack.h

Description
The p?larf function applies a real/complex elementary reflector Q (or QT) to a real/complex m-by-n
distributed matrix sub(C) = C(ic:ic+m-1, jc:jc+n-1), from either the left or the right. Q is represented
in the form

Q = I-tau*v*v',

where tau is a real/complex scalar and v is a real/complex vector.


If tau = 0, then Q is taken to be the unit matrix.

Input Parameters

side (global).
= 'L': form Q*sub(C),

= 'R': form sub(C)*Q, Q=QT.

m (global)
The number of rows in the distributed submatrix sub(C). (m ≥ 0).

n (global)
The number of columns in the distributed submatrix sub(C). (n ≥ 0).

v (local).
Pointer into the local memory to an array of size lld_v * LOCc(n_v),
containing the local pieces of the global distributed matrix V representing
the Householder transformation Q,
V(iv:iv+m-1, jv) if side = 'L' and incv = 1,

V(iv, jv:jv+m-1) if side = 'L' and incv = m_v,

V(iv:iv+n-1, jv) if side = 'R' and incv = 1,

V(iv, jv:jv+n-1) if side = 'R' and incv = m_v.

The array v is the representation of Q. v is not used if tau = 0.

iv, jv (global) The row and column indices in the global matrix V indicating the
first row and the first column of the matrix sub(V), respectively.


descv (global and local) array of size dlen_. The array descriptor for the
distributed matrix V.

incv (global)
The global increment for the elements of V. Only two values of incv are
supported in this version, namely 1 and m_v.
incv must not be zero.

tau (local).
Array of size LOCc(jv) if incv = 1, and LOCr(iv) otherwise. This array
contains the Householder scalars related to the Householder vectors.
tau is tied to the distributed matrix V.

c (local).
Pointer into the local memory to an array of size lld_c * LOCc(jc+n-1),
containing the local pieces of sub(C).

ic, jc (global)
The row and column indices in the global matrix C indicating the first row
and the first column of the matrix sub(C), respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local).
Array of size lwork.
If incv = 1,
if side = 'L',

if ivcol = iccol,
lwork≥nqc0
else
lwork≥mpc0 + max( 1, nqc0 )

end if
else if side = 'R' ,

lwork ≥ nqc0 + max( max( 1, mpc0 ), numroc( numroc( n+icoffc, nb_v, 0, 0, npcol ), nb_v, 0, 0, lcmq ) )
end if
else if incv = m_v,
if side = 'L',

lwork ≥ mpc0 + max( max( 1, nqc0 ), numroc( numroc( m+iroffc, mb_v, 0, 0, nprow ), mb_v, 0, 0, lcmp ) )


else if side = 'R',

if ivrow = icrow,
lwork≥mpc0

else
lwork≥nqc0 + max( 1, mpc0 )

end if
end if
end if,
where lcm is the least common multiple of nprow and npcol and lcm =
ilcm( nprow, npcol ), lcmp = lcm/nprow, lcmq = lcm/npcol,
iroffc = mod( ic-1, mb_c ), icoffc = mod( jc-1, nb_c ),
icrow = indxg2p( ic, mb_c, myrow, rsrc_c, nprow ),
iccol = indxg2p( jc, nb_c, mycol, csrc_c, npcol ),
mpc0 = numroc( m+iroffc, mb_c, myrow, icrow, nprow ),
nqc0 = numroc( n+icoffc, nb_c, mycol, iccol, npcol ),
ilcm, indxg2p, and numroc are ScaLAPACK tool functions; myrow, mycol,
nprow, and npcol can be determined by calling the function
blacs_gridinfo.

Output Parameters

c (local).
On exit, sub(C) is overwritten by the Q*sub(C) if side = 'L',

or sub(C) * Q if side = 'R'.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?larfb
Applies a block reflector or its transpose/conjugate-
transpose to a general rectangular matrix.

Syntax
void pslarfb (char *side , char *trans , char *direct , char *storev , MKL_INT *m ,
MKL_INT *n , MKL_INT *k , float *v , MKL_INT *iv , MKL_INT *jv , MKL_INT *descv ,
float *t , float *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , float *work );
void pdlarfb (char *side , char *trans , char *direct , char *storev , MKL_INT *m ,
MKL_INT *n , MKL_INT *k , double *v , MKL_INT *iv , MKL_INT *jv , MKL_INT *descv ,
double *t , double *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , double *work );
void pclarfb (char *side , char *trans , char *direct , char *storev , MKL_INT *m ,
MKL_INT *n , MKL_INT *k , MKL_Complex8 *v , MKL_INT *iv , MKL_INT *jv , MKL_INT
*descv , MKL_Complex8 *t , MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT
*descc , MKL_Complex8 *work );
void pzlarfb (char *side , char *trans , char *direct , char *storev , MKL_INT *m ,
MKL_INT *n , MKL_INT *k , MKL_Complex16 *v , MKL_INT *iv , MKL_INT *jv , MKL_INT
*descv , MKL_Complex16 *t , MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT
*descc , MKL_Complex16 *work );


Include Files
• mkl_scalapack.h

Description
The p?larfb function applies a real/complex block reflector Q or its transpose QT/conjugate transpose QH to a
real/complex distributed m-by-n matrix sub(C) = C(ic:ic+m-1, jc:jc+n-1) from the left or the right.

Input Parameters

side (global)
if side = 'L': apply Q or QT for real flavors (QH for complex flavors) from
the Left;
if side = 'R': apply Q or QT for real flavors (QH for complex flavors) from
the Right.

trans (global)
if trans = 'N': no transpose, apply Q;

for real flavors, if trans='T': transpose, apply QT

for complex flavors, if trans = 'C': conjugate transpose, apply QH;

direct (global) Indicates how Q is formed from a product of elementary reflectors.


if direct = 'F': Q = H(1)*H(2)*...*H(k) (Forward)

if direct = 'B': Q = H(k)*...*H(2)*H(1) (Backward)

storev (global)
Indicates how the vectors that define the elementary reflectors are stored:
if storev = 'C': Columnwise

if storev = 'R': Rowwise.

m (global)
The number of rows in the distributed matrix sub(C). (m≥ 0).

n (global)
The number of columns in the distributed matrix sub(C). (n≥ 0).

k (global)
The order of the matrix T.

v (local).
Pointer into the local memory to an array of size

lld_v * LOCc(jv+k-1) if storev = 'C',

lld_v * LOCc(jv+m-1) if storev = 'R' and side = 'L',

lld_v * LOCc(jv+n-1) if storev = 'R' and side = 'R'.

It contains the local pieces of the distributed vectors V representing the


Householder transformation.
if storev = 'C' and side = 'L', lld_v≥ max(1,LOCr(iv+m-1));

if storev = 'C' and side = 'R', lld_v≥ max(1,LOCr(iv+n-1));

if storev = 'R', lld_v≥LOCr(jv+k-1).

iv, jv (global)
The row and column indices in the global matrix V indicating the first row
and the first column of the matrix sub(V), respectively.

descv (global and local) array of size dlen_. The array descriptor for the
distributed matrix V.

c (local).
Pointer into the local memory to an array of size lld_c * LOCc(jc+n-1),
containing the local pieces of sub(C).

ic, jc (global) The row and column indices in the global matrix C indicating the
first row and the first column of the matrix sub(C), respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local).
Workspace array of size lwork.
If storev = 'C',

if side = 'L',

lwork≥ ( nqc0 + mpc0 ) * k


else if side = 'R',

lwork ≥ ( nqc0 + max( npv0 + numroc( numroc( n+icoffc, nb_v, 0, 0, npcol ), nb_v, 0, 0, lcmq ), mpc0 ) ) * k
end if
else if storev = 'R' ,

if side = 'L' ,

lwork ≥ ( mpc0 + max( mqv0 + numroc( numroc( m+iroffc, mb_v, 0, 0, nprow ), mb_v, 0, 0, lcmp ), nqc0 ) ) * k
else if side = 'R',

lwork ≥ ( mpc0 + nqc0 ) * k


end if
end if,
where
lcmq = lcm / npcol with lcm = iclm( nprow, npcol ),

iroffv = mod( iv-1, mb_v ), icoffv = mod( jv-1, nb_v ),

ivrow = indxg2p( iv, mb_v, myrow, rsrc_v, nprow ),


ivcol = indxg2p( jv, nb_v, mycol, csrc_v, npcol ),

MqV0 = numroc( m+icoffv, nb_v, mycol, ivcol, npcol ),

NpV0 = numroc( n+iroffv, mb_v, myrow, ivrow, nprow ),

iroffc = mod( ic-1, mb_c ), icoffc = mod( jc-1, nb_c ),

icrow = indxg2p( ic, mb_c, myrow, rsrc_c, nprow ),

iccol = indxg2p( jc, nb_c, mycol, csrc_c, npcol ),

MpC0 = numroc( m+iroffc, mb_c, myrow, icrow, nprow ),

NpC0 = numroc( n+icoffc, mb_c, myrow, icrow, nprow ),

NqC0 = numroc( n+icoffc, nb_c, mycol, iccol, npcol ),

ilcm, indxg2p, and numroc are ScaLAPACK tool functions; myrow, mycol,
nprow, and npcol can be determined by calling the function
blacs_gridinfo.

Output Parameters

t (local).
Array of size mb_v * mb_v if storev = 'R', and nb_v * nb_v if storev =
'C'. The triangular matrix t is the representation of the block reflector.

c (local).
On exit, sub(C) is overwritten by the Q*sub(C), or Q'*sub(C), or
sub(C)*Q, or sub(C)*Q'. Q' is transpose (conjugate transpose) of Q.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?larfc
Applies the conjugate transpose of an elementary
reflector to a general matrix.

Syntax
void pclarfc (char *side , MKL_INT *m , MKL_INT *n , MKL_Complex8 *v , MKL_INT *iv ,
MKL_INT *jv , MKL_INT *descv , MKL_INT *incv , MKL_Complex8 *tau , MKL_Complex8 *c ,
MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work );
void pzlarfc (char *side , MKL_INT *m , MKL_INT *n , MKL_Complex16 *v , MKL_INT *iv ,
MKL_INT *jv , MKL_INT *descv , MKL_INT *incv , MKL_Complex16 *tau , MKL_Complex16 *c ,
MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work );

Include Files
• mkl_scalapack.h

Description
The p?larfc function applies a complex elementary reflector QH to a complex m-by-n distributed matrix
sub(C) = C(ic:ic+m-1, jc:jc+n-1), from either the left or the right. Q is represented in the form

Q = I - tau*v*v',

where tau is a complex scalar and v is a complex vector.
If tau = 0, then Q is taken to be the unit matrix.

Input Parameters

side (global)
if side = 'L': form QH*sub(C) ;

if side = 'R': form sub (C)*QH.

m (global)
The number of rows in the distributed matrix sub(C). (m ≥ 0).

n (global)
The number of columns in the distributed matrix sub(C). (n ≥ 0).

v (local).
Pointer into the local memory to an array of size lld_v * LOCc(n_v),
containing the local pieces of the global distributed matrix V representing
the Householder transformation Q,
V(iv:iv+m-1, jv) if side = 'L' and incv = 1,

V(iv, jv:jv+m-1) if side = 'L' and incv = m_v,

V(iv:iv+n-1, jv) if side = 'R' and incv = 1,

V(iv, jv:jv+n-1) if side = 'R' and incv = m_v.

The array v is the representation of Q. v is not used if tau = 0.

iv, jv (global)
The row and column indices in the global matrix V indicating the first row
and the first column of the matrix sub(V), respectively.

descv (global and local) array of size dlen_. The array descriptor for the
distributed matrix V.

incv (global)
The global increment for the elements of v. Only two values of incv are
supported in this version, namely 1 and m_v.
incv must not be zero.

tau (local)
Array of size LOCc(jv) if incv = 1, and LOCr(iv) otherwise. This array
contains the Householder scalars related to the Householder vectors.
tau is tied to the distributed matrix V.

c (local).
Pointer into the local memory to an array of size lld_c * LOCc(jc+n-1),
containing the local pieces of sub(C).

ic, jc (global)


The row and column indices in the global matrix C indicating the first row
and the first column of the matrix sub(C), respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local).
Workspace array of size lwork.
If incv = 1,
if side = 'L' ,

if ivcol = iccol,
lwork≥ nqc0
else
lwork ≥mpc0 + max( 1, nqc0 )

end if
else if side = 'R',

lwork ≥ nqc0 + max( max( 1, mpc0 ), numroc( numroc( n+icoffc, nb_v, 0, 0, npcol ), nb_v, 0, 0, lcmq ) )
end if
else if incv = m_v,
if side = 'L',

lwork ≥ mpc0 + max( max( 1, nqc0 ), numroc( numroc( m+iroffc, mb_v, 0, 0, nprow ), mb_v, 0, 0, lcmp ) )
else if side = 'R' ,

if ivrow = icrow,
lwork≥ mpc0
else
lwork≥nqc0 + max( 1, mpc0 )

end if
end if
end if,
where lcm is the least common multiple of nprow and npcol and lcm =
ilcm(nprow, npcol),
lcmp = lcm/nprow, lcmq = lcm/npcol,
iroffc = mod(ic-1, mb_c), icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, myrow, rsrc_c, nprow),
iccol = indxg2p(jc, nb_c, mycol, csrc_c, npcol),
mpc0 = numroc(m+iroffc, mb_c, myrow, icrow, nprow),
nqc0 = numroc(n+icoffc, nb_c, mycol, iccol, npcol),

ilcm, indxg2p, and numroc are ScaLAPACK tool functions;myrow, mycol,
nprow, and npcol can be determined by calling the function
blacs_gridinfo.

Output Parameters

c (local).
On exit, sub(C) is overwritten by the QH*sub(C) if side = 'L', or sub(C)
* QH if side = 'R'.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?larfg
Generates an elementary reflector (Householder
matrix).

Syntax
void pslarfg (MKL_INT *n , float *alpha , MKL_INT *iax , MKL_INT *jax , float *x ,
MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , MKL_INT *incx , float *tau );
void pdlarfg (MKL_INT *n , double *alpha , MKL_INT *iax , MKL_INT *jax , double *x ,
MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , MKL_INT *incx , double *tau );
void pclarfg (MKL_INT *n , MKL_Complex8 *alpha , MKL_INT *iax , MKL_INT *jax ,
MKL_Complex8 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , MKL_INT *incx ,
MKL_Complex8 *tau );
void pzlarfg (MKL_INT *n , MKL_Complex16 *alpha , MKL_INT *iax , MKL_INT *jax ,
MKL_Complex16 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , MKL_INT *incx ,
MKL_Complex16 *tau );

Include Files
• mkl_scalapack.h

Description
The p?larfg function generates a real/complex elementary reflector H of order n, such that

H * ( alpha  ) = ( beta ),   H' * H = I,
    ( sub(X) )   (  0   )

where alpha is a scalar (a real scalar for complex flavors), and sub(X) is an (n-1)-element real/complex
distributed vector X(ix:ix+n-2, jx) if incx = 1 and X(ix, jx:jx+n-2) if incx = m_x. H is represented
in the form

H = I - tau * ( 1 ) * ( 1  v' ),
              ( v )

where tau is a real/complex scalar and v is a real/complex (n-1)-element vector. Note that H is not
Hermitian.
If the elements of sub(X) are all zero (and X(iax, jax) is real for complex flavors), then tau = 0 and H is
taken to be the unit matrix.
Otherwise 1 ≤ real(tau) ≤ 2 and abs(tau-1) ≤ 1.

Input Parameters

n (global)
The global order of the elementary reflector. n≥ 0.

iax, jax (global)


The global row and column indices of X(iax, jax) in the global matrix X.

x (local).
Pointer into the local memory to an array of size lld_x * LOCc(n_x). This
array contains the local pieces of the distributed vector sub(X). Before
entry, the incremented array sub(X) must contain vector x.

ix, jx (global)
The row and column indices in the global matrix X indicating the first row
and the first column of sub(X), respectively.

descx (global and local)


Array of size dlen_. The array descriptor for the distributed matrix X.

incx (global)
The global increment for the elements of x. Only two values of incx are
supported in this version, namely 1 and m_x. incx must not be zero.

Output Parameters

alpha (local)
On exit, alpha is computed in the process scope having the vector sub(X).

x (local).
On exit, it is overwritten with the vector v.

tau (local).
Array of size LOCc(jx) if incx = 1, and LOCr(ix) otherwise. This array
contains the Householder scalars related to the Householder vectors.
tau is tied to the distributed matrix X.
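
For orientation, here is a hedged sketch of a typical call, modeled on the way panel factorization codes use such
reflectors: the reflector is generated from the element A(i, j) and the column below it. The distributed matrix a,
its descriptor desca, and the tau array are assumed to have been set up by the caller; i, j, and m are placeholders
with i < m, and nothing here is prescribed by this entry.

#include "mkl_scalapack.h"

/* Generate an elementary reflector that annihilates A(i+1:m, j) (sketch). */
void make_column_reflector(MKL_INT m, MKL_INT i, MKL_INT j,
                           double *a, MKL_INT *desca, double *tau)
{
    double alpha;                       /* receives beta on the processes owning sub(X) */
    MKL_INT n = m - i + 1;              /* order of the reflector */
    MKL_INT iax = i, jax = j;           /* alpha = A(i, j) */
    MKL_INT ix = i + 1, jx = j;         /* sub(X) = A(i+1:m, j), overwritten by v */
    MKL_INT incx = 1;                   /* column vector */

    pdlarfg(&n, &alpha, &iax, &jax, a, &ix, &jx, desca, &incx, tau);
}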

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?larft
Forms the triangular vector T of a block reflector H=I-
V*T*VH.

Syntax
void pslarft (char *direct , char *storev , MKL_INT *n , MKL_INT *k , float *v ,
MKL_INT *iv , MKL_INT *jv , MKL_INT *descv , float *tau , float *t , float *work );
void pdlarft (char *direct , char *storev , MKL_INT *n , MKL_INT *k , double *v ,
MKL_INT *iv , MKL_INT *jv , MKL_INT *descv , double *tau , double *t , double *work );
void pclarft (char *direct , char *storev , MKL_INT *n , MKL_INT *k , MKL_Complex8 *v ,
MKL_INT *iv , MKL_INT *jv , MKL_INT *descv , MKL_Complex8 *tau , MKL_Complex8 *t ,
MKL_Complex8 *work );
void pzlarft (char *direct , char *storev , MKL_INT *n , MKL_INT *k , MKL_Complex16
*v , MKL_INT *iv , MKL_INT *jv , MKL_INT *descv , MKL_Complex16 *tau , MKL_Complex16
*t , MKL_Complex16 *work );

Include Files
• mkl_scalapack.h

Description
The p?larft function forms the triangular factor T of a real/complex block reflector H of order n, which is
defined as a product of k elementary reflectors.
If direct = 'F', H = H(1)*H(2)...*H(k), and T is upper triangular;

If direct = 'B', H = H(k)*...*H(2)*H(1), and T is lower triangular.

If storev = 'C', the vector which defines the elementary reflector H(i) is stored in the i-th column of the
distributed matrix V, and
H = I-V*T*V'
If storev = 'R', the vector which defines the elementary reflector H(i) is stored in the i-th row of the
distributed matrix V, and
H = I-V'*T*V.

Input Parameters

direct (global)
Specifies the order in which the elementary reflectors are multiplied to form
the block reflector:
if direct = 'F': H = H(1)*H(2)*...*H(k) (forward)

if direct = 'B': H = H(k)*...*H(2)*H(1) (backward).

storev (global)
Specifies how the vectors that define the elementary reflectors are stored
(See Application Notes below):
if storev = 'C': columnwise;

if storev = 'R': rowwise.


n (global)
The order of the block reflector H. n≥ 0.

k (global)
The order of the triangular factor T, is equal to the number of elementary
reflectors.

1 ≤k≤mb_v (= nb_v).

v Pointer into the local memory to an array of local size

LOCr(iv+n-1) * LOCc(jv+k-1) if storev = 'C', and


LOCr(iv+k-1) * LOCc(jv+n-1) if storev = 'R'.
The distributed matrix V contains the Householder vectors. (See Application
Notes below).

iv, jv (global)
The row and column indices in the global matrix V indicating the first row
and the first column of the matrix sub(V), respectively.

descv (local) array of size dlen_. The array descriptor for the distributed matrix V.

tau (local)
Array of size LOCr(iv+k-1) if incv = m_v, and LOCc(jv+k-1) otherwise.
This array contains the Householder scalars related to the Householder
vectors.
tau is tied to the distributed matrix V.

work (local).
Workspace array of size k*(k -1)/2.

Output Parameters

v
t (local)
Array of size nb_v * nb_v if storev = 'C', and mb_v * mb_v otherwise. It
contains the k-by-k triangular factor of the block reflector associated with v.
If direct = 'F', t is upper triangular;

if direct = 'B', t is lower triangular.

Application Notes
The shape of the matrix V and the storage of the vectors that define the H(i) are best illustrated by the
following example with n = 5 and k = 3. The elements equal to 1 are not stored; the corresponding array
elements are modified but restored on exit. The rest of the array is not used.

direct = 'F' and storev = 'C':          direct = 'F' and storev = 'R':

    V = (  1       )                        V = (  1 v1 v1 v1 v1 )
        ( v1  1    )                            (     1 v2 v2 v2 )
        ( v1 v2  1 )                            (        1 v3 v3 )
        ( v1 v2 v3 )
        ( v1 v2 v3 )

direct = 'B' and storev = 'C':          direct = 'B' and storev = 'R':

    V = ( v1 v2 v3 )                        V = ( v1 v1  1       )
        ( v1 v2 v3 )                            ( v2 v2 v2  1    )
        (  1 v2 v3 )                            ( v3 v3 v3 v3  1 )
        (     1 v3 )
        (  1       )


See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?larz
Applies an elementary reflector as returned by p?
tzrzf to a general matrix.

Syntax
void pslarz (char *side , MKL_INT *m , MKL_INT *n , MKL_INT *l , float *v , MKL_INT
*iv , MKL_INT *jv , MKL_INT *descv , MKL_INT *incv , float *tau , float *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , float *work );
void pdlarz (char *side , MKL_INT *m , MKL_INT *n , MKL_INT *l , double *v , MKL_INT
*iv , MKL_INT *jv , MKL_INT *descv , MKL_INT *incv , double *tau , double *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , double *work );
void pclarz (char *side , MKL_INT *m , MKL_INT *n , MKL_INT *l , MKL_Complex8 *v ,
MKL_INT *iv , MKL_INT *jv , MKL_INT *descv , MKL_INT *incv , MKL_Complex8 *tau ,
MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work );
void pzlarz (char *side , MKL_INT *m , MKL_INT *n , MKL_INT *l , MKL_Complex16 *v ,
MKL_INT *iv , MKL_INT *jv , MKL_INT *descv , MKL_INT *incv , MKL_Complex16 *tau ,
MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work );

Include Files
• mkl_scalapack.h

Description
The p?larz function applies a real/complex elementary reflector Q (or QT) to a real/complex m-by-n
distributed matrix sub(C) = C(ic:ic+m-1, jc:jc+n-1), from either the left or the right. Q is represented
in the form
Q = I-tau*v*v',
where tau is a real/complex scalar and v is a real/complex vector.


If tau = 0, then Q is taken to be the unit matrix.

Q is a product of k elementary reflectors as returned by p?tzrzf.

Input Parameters

side (global)
if side = 'L': form Q*sub(C),

if side = 'R': form sub(C)*Q, Q = QT (for real flavors).

m (global)
The number of rows in the distributed matrix sub(C). (m≥ 0).

n (global)
The number of columns in the distributed matrix sub(C). (n≥ 0).

l (global)
The columns of the distributed matrix sub(A) containing the meaningful
part of the Householder reflectors. If side = 'L', m≥ l≥ 0,

if side = 'R', n≥l ≥ 0.

v (local).
Pointer into the local memory to an array of size lld_v * LOCc(n_v)
containing the local pieces of the global distributed matrix V representing
the Householder transformation Q,
V(iv:iv+l-1, jv) if side = 'L' and incv = 1,

V(iv, jv:jv+l-1) if side = 'L' and incv = m_v,

V(iv:iv+l-1, jv) if side = 'R' and incv = 1,

V(iv, jv:jv+l-1) if side = 'R' and incv = m_v.

The vector v in the representation of Q. v is not used if tau = 0.

iv, jv (global) The row and column indices in the global distributed matrix V
indicating the first row and the first column of the matrix sub(V),
respectively.

descv (global and local) array of size dlen_. The array descriptor for the
distributed matrix V.

incv (global)
The global increment for the elements of V. Only two values of incv are
supported in this version, namely 1 and m_v.
incv must not be zero.

tau (local)
Array of size LOCc(jv) if incv = 1, and LOCr(iv) otherwise. This array
contains the Householder scalars related to the Householder vectors.
tau is tied to the distributed matrix V.

c (local).

Pointer into the local memory to an array of size lld_c * LOCc(jc+n-1),
containing the local pieces of sub(C).

ic, jc (global)
The row and column indices in the global matrix C indicating the first row
and the first column of the matrix sub(C), respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local).
Array of size lwork
If incv = 1,
if side = 'L' ,

if ivcol = iccol,
lwork≥ NqC0
else
lwork ≥ MpC0 + max(1, NqC0)

end if
else if side = 'R' ,

lwork ≥ NqC0 + max(max(1, MpC0), numroc(numroc(n+icoffc, nb_v, 0, 0, npcol), nb_v, 0, 0, lcmq))
end if
else if incv = m_v,
if side = 'L' ,

lwork ≥ MpC0 + max(max(1, NqC0), numroc(numroc(m+iroffc, mb_v, 0, 0, nprow), mb_v, 0, 0, lcmp))
else if side = 'R' ,

if ivrow = icrow,
lwork≥MpC0
else
lwork≥ NqC0 + max(1, MpC0)

end if
end if
end if.
Here lcm is the least common multiple of nprow and npcol and
lcm = ilcm( nprow, npcol ), lcmp = lcm / nprow,

lcmq = lcm / npcol,


iroffc = mod( ic-1, mb_c ), icoffc = mod( jc-1, nb_c ),

icrow = indxg2p( ic, mb_c, myrow, rsrc_c, nprow ),

iccol = indxg2p( jc, nb_c, mycol, csrc_c, npcol ),


mpc0 = numroc( m+iroffc, mb_c, myrow, icrow, nprow ),

nqc0 = numroc( n+icoffc, nb_c, mycol, iccol, npcol ),

ilcm, indxg2p, and numroc are ScaLAPACK tool functions; myrow, mycol,
nprow, and npcol can be determined by calling the function
blacs_gridinfo.

Output Parameters

c (local).
On exit, sub(C) is overwritten by the Q*sub(C) if side = 'L', or
sub(C)*Q if side = 'R'.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?larzb
Applies a block reflector or its transpose/conjugate-
transpose as returned by p?tzrzf to a general
matrix.

Syntax
void pslarzb (char *side , char *trans , char *direct , char *storev , MKL_INT *m ,
MKL_INT *n , MKL_INT *k , MKL_INT *l , float *v , MKL_INT *iv , MKL_INT *jv , MKL_INT
*descv , float *t , float *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , float
*work );
void pdlarzb (char *side , char *trans , char *direct , char *storev , MKL_INT *m ,
MKL_INT *n , MKL_INT *k , MKL_INT *l , double *v , MKL_INT *iv , MKL_INT *jv , MKL_INT
*descv , double *t , double *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , double
*work );
void pclarzb (char *side , char *trans , char *direct , char *storev , MKL_INT *m ,
MKL_INT *n , MKL_INT *k , MKL_INT *l , MKL_Complex8 *v , MKL_INT *iv , MKL_INT *jv ,
MKL_INT *descv , MKL_Complex8 *t , MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc ,
MKL_INT *descc , MKL_Complex8 *work );
void pzlarzb (char *side , char *trans , char *direct , char *storev , MKL_INT *m ,
MKL_INT *n , MKL_INT *k , MKL_INT *l , MKL_Complex16 *v , MKL_INT *iv , MKL_INT *jv ,
MKL_INT *descv , MKL_Complex16 *t , MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc ,
MKL_INT *descc , MKL_Complex16 *work );

Include Files
• mkl_scalapack.h

Description
The p?larzb function applies a real/complex block reflector Q or its transpose QT (conjugate transpose QH for
complex flavors) to a real/complex distributed m-by-n matrix sub(C) = C(ic:ic+m-1, jc:jc+n-1) from the
left or the right.
Q is a product of k elementary reflectors as returned by p?tzrzf.

Currently, only storev = 'R' and direct = 'B' are supported.

Input Parameters

side (global)
if side = 'L': apply Q or QT (QH for complex flavors) from the Left;

if side = 'R': apply Q or QT (QH for complex flavors) from the Right.

trans (global)
if trans = 'N': No transpose, apply Q;

If trans='T': Transpose, apply QT (real flavors);

If trans='C': Conjugate transpose, apply QH (complex flavors).

direct (global)
Indicates how H is formed from a product of elementary reflectors.
if direct = 'F': H = H(1)*H(2)*...*H(k) - forward (not supported) ;

if direct = 'B': H = H(k)*...*H(2)*H(1) - backward.

storev (global)
Indicates how the vectors that define the elementary reflectors are stored:
if storev = 'C': columnwise (not supported ).

if storev = 'R': rowwise.

m (global)
The number of rows in the distributed submatrix sub(C). (m≥ 0).

n (global)
The number of columns in the distributed submatrix sub(C). (n ≥ 0).

k (global)
The order of the matrix T. (= the number of elementary reflectors whose
product defines the block reflector).

l (global)
The columns of the distributed submatrix sub(A) containing the meaningful
part of the Householder reflectors.
If side = 'L', m ≥l≥ 0,

if side = 'R', n≥l ≥ 0.

v (local).
Pointer into the local memory to an array of size lld_v * LOCc(jv+m-1) if
side = 'L', lld_v * LOCc(jv+n-1) if side = 'R'.
It contains the local pieces of the distributed vectors V representing the
Householder transformation as returned by p?tzrzf.

lld_v≥ LOCr(iv+k-1).

iv, jv (global)


The row and column indices in the global matrix V indicating the first row
and the first column of the submatrix sub(V), respectively.

descv (global and local) array of size dlen_. The array descriptor for the
distributed matrix V.

t (local)
Array of size mb_v* mb_v.
The lower triangular matrix T in the representation of the block reflector.

c (local).
Pointer into the local memory to an array of size lld_c * LOCc(jc+n-1).

On entry, the m-by-n distributed matrix sub(C).

ic, jc (global)
The row and column indices in the global matrix C indicating the first row
and the first column of the submatrix sub(C), respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local).
Array of size lwork.

If storev = 'C' ,
if side = 'L' ,

lwork≥(nqc0 + mpc0)* k
else if side = 'R' ,

lwork ≥ (nqc0 + max(npv0 + numroc(numroc(n+icoffc, nb_v, 0, 0, npcol), nb_v, 0, 0, lcmq), mpc0)) * k
end if
else if storev = 'R' ,
if side = 'L' ,

lwork ≥ (mpc0 + max(mqv0 + numroc(numroc(m+iroffc, mb_v, 0, 0, nprow), mb_v, 0, 0, lcmp), nqc0)) * k
else if side = 'R' ,

lwork≥ (mpc0 + nqc0) * k


end if
end if.

Here lcmq = lcm/npcol with lcm = iclm(nprow, npcol),

iroffv = mod(iv-1, mb_v), icoffv = mod( jv-1, nb_v),


ivrow = indxg2p(iv, mb_v, myrow, rsrc_v, nprow),

ivcol = indxg2p(jv, nb_v, mycol, csrc_v, npcol),
mqv0 = numroc(m+icoffv, nb_v, mycol, ivcol, npcol),
npv0 = numroc(n+iroffv, mb_v, myrow, ivrow, nprow),
iroffc = mod(ic-1, mb_c ), icoffc= mod( jc-1, nb_c),
icrow= indxg2p(ic, mb_c, myrow, rsrc_c, nprow),
iccol= indxg2p(jc, nb_c, mycol, csrc_c, npcol),
mpc0 = numroc(m+iroffc, mb_c, myrow, icrow, nprow),
npc0 = numroc(n+icoffc, mb_c, myrow, icrow, nprow),
nqc0 = numroc(n+icoffc, nb_c, mycol, iccol, npcol),
ilcm, indxg2p, and numroc are ScaLAPACK tool functions; myrow, mycol,
nprow, and npcol can be determined by calling the function
blacs_gridinfo.

Output Parameters

c (local).
On exit, sub(C) is overwritten by the Q*sub(C), or Q'*sub(C), or
sub(C)*Q, or sub(C)*Q', where Q' is the transpose (conjugate transpose)
of Q.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?larzc
Applies (multiplies by) the conjugate transpose of an
elementary reflector as returned by p?tzrzf to a
general matrix.

Syntax
void pclarzc (char *side , MKL_INT *m , MKL_INT *n , MKL_INT *l , MKL_Complex8 *v ,
MKL_INT *iv , MKL_INT *jv , MKL_INT *descv , MKL_INT *incv , MKL_Complex8 *tau ,
MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work );
void pzlarzc (char *side , MKL_INT *m , MKL_INT *n , MKL_INT *l , MKL_Complex16 *v ,
MKL_INT *iv , MKL_INT *jv , MKL_INT *descv , MKL_INT *incv , MKL_Complex16 *tau ,
MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work );

Include Files
• mkl_scalapack.h

Description
The p?larzc function applies a complex elementary reflector QH to a complex m-by-n distributed matrix
sub(C) = C(ic:ic+m-1, jc:jc+n-1), from either the left or the right. Q is represented in the form
Q = I - tau*v*v',
where tau is a complex scalar and v is a complex vector.
If tau = 0, then Q is taken to be the unit matrix.


Q is a product of k elementary reflectors as returned by p?tzrzf.
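As a small worked illustration of this rank-one form (an addition to the reference text, using a real 2-by-2 analogue rather than actual p?tzrzf output): take tau = 1 and v = (1, 1)'. Then

Q = I - tau*v*v' = ( 1 0 ) - ( 1 1 ) = (  0 -1 )
                   ( 0 1 )   ( 1 1 )   ( -1  0 )

and Q*Q' = I, so Q is orthogonal; the choice tau = 2/(v'*v) = 1 is what makes the rank-one update an exact reflector.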

Input Parameters

side (global)
if side = 'L': form Q^H*sub(C);

if side = 'R': form sub(C)*Q^H.

m (global)
The number of rows in the distributed matrix sub(C). (m ≥ 0).

n (global)
The number of columns in the distributed matrix sub(C). (n ≥ 0).

l (global)
The columns of the distributed matrix sub(A) containing the meaningful
part of the Householder reflectors.
If side = 'L', m ≥ l ≥ 0,

if side = 'R', n ≥ l ≥ 0.

v (local).

Pointer into the local memory to an array of size lld_v * LOCc(n_v)


containing the local pieces of the global distributed matrix V representing
the Householder transformation Q,
V(iv:iv+l-1, jv) if side = 'L' and incv = 1,

V(iv, jv:jv+l-1) if side = 'L' and incv = m_v,

V(iv:iv+l-1, jv) if side = 'R' and incv = 1,

V(iv, jv:jv+l-1) if side = 'R' and incv = m_v.

The vector v in the representation of Q. v is not used if tau = 0.

iv, jv (global)
The row and column indices in the global matrix V indicating the first row
and the first column of the matrix sub(V), respectively.

descv (global and local) array of size dlen_. The array descriptor for the
distributed matrix V.

incv (global)
The global increment for the elements of V. Only two values of incv are
supported in this version, namely 1 and m_v.
incv must not be zero.

tau (local)
Array of size LOCc(jv) if incv = 1, and LOCr(iv) otherwise. This array
contains the Householder scalars related to the Householder vectors.
tau is tied to the distributed matrix V.

c (local).

Pointer into the local memory to an array of size lld_c * LOCc(jc+n-1),
containing the local pieces of sub(C).

ic, jc (global)
The row and column indices in the global matrix C indicating the first row
and the first column of the matrix sub(C), respectively.

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local).

If incv = 1,
if side = 'L',
if ivcol = iccol,
lwork ≥ nqc0
else
lwork ≥ mpc0 + max(1, nqc0)
end if
else if side = 'R',
lwork ≥ nqc0 + max(max(1, mpc0), numroc(numroc(n+icoffc, nb_v, 0, 0, npcol), nb_v, 0, 0, lcmq))
end if
else if incv = m_v,
if side = 'L',
lwork ≥ mpc0 + max(max(1, nqc0), numroc(numroc(m+iroffc, mb_v, 0, 0, nprow), mb_v, 0, 0, lcmp))
else if side = 'R',
if ivrow = icrow,
lwork ≥ mpc0
else
lwork ≥ nqc0 + max(1, mpc0)
end if
end if
end if

Here lcm is the least common multiple of nprow and npcol;


lcm = ilcm(nprow, npcol), lcmp = lcm/nprow, lcmq = lcm/npcol,
iroffc = mod(ic-1, mb_c), icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, myrow, rsrc_c, nprow),
iccol = indxg2p(jc, nb_c, mycol, csrc_c, npcol),
mpc0 = numroc(m+iroffc, mb_c, myrow, icrow, nprow),
nqc0 = numroc(n+icoffc, nb_c, mycol, iccol, npcol),
ilcm, indxg2p, and numroc are ScaLAPACK tool functions;
myrow, mycol, nprow, and npcol can be determined by calling the function
blacs_gridinfo.

Output Parameters

c (local).


On exit, sub(C) is overwritten by Q^H*sub(C) if side = 'L', or
sub(C)*Q^H if side = 'R'.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?larzt
Forms the triangular factor T of a block reflector H = I -
V*T*V^H as returned by p?tzrzf.

Syntax
void pslarzt (char *direct , char *storev , MKL_INT *n , MKL_INT *k , float *v ,
MKL_INT *iv , MKL_INT *jv , MKL_INT *descv , float *tau , float *t , float *work );
void pdlarzt (char *direct , char *storev , MKL_INT *n , MKL_INT *k , double *v ,
MKL_INT *iv , MKL_INT *jv , MKL_INT *descv , double *tau , double *t , double *work );
void pclarzt (char *direct , char *storev , MKL_INT *n , MKL_INT *k , MKL_Complex8 *v ,
MKL_INT *iv , MKL_INT *jv , MKL_INT *descv , MKL_Complex8 *tau , MKL_Complex8 *t ,
MKL_Complex8 *work );
void pzlarzt (char *direct , char *storev , MKL_INT *n , MKL_INT *k , MKL_Complex16
*v , MKL_INT *iv , MKL_INT *jv , MKL_INT *descv , MKL_Complex16 *tau , MKL_Complex16
*t , MKL_Complex16 *work );

Include Files
• mkl_scalapack.h

Description
The p?larzt function forms the triangular factor T of a real/complex block reflector H of order greater than
n, which is defined as a product of k elementary reflectors as returned by p?tzrzf.

If direct = 'F', H = H(1)*H(2)*...*H(k), and T is upper triangular;

If direct = 'B', H = H(k)*...*H(2)*H(1), and T is lower triangular.

If storev = 'C', the vector which defines the elementary reflector H(i) is stored in the i-th column of the
array v, and
H = I - V*T*V'.
If storev = 'R', the vector which defines the elementary reflector H(i) is stored in the i-th row of the
array v, and
H = I - V'*T*V.
Currently, only storev = 'R' and direct = 'B' are supported.

Input Parameters

direct (global)
Specifies the order in which the elementary reflectors are multiplied to form
the block reflector:
if direct = 'F': H = H(1)*H(2)*...*H(k) (Forward, not supported)

if direct = 'B': H = H(k)*...*H(2)*H(1) (Backward).

storev (global)
Specifies how the vectors which define the elementary reflectors are
stored:
if storev = 'C': columnwise (not supported);

if storev = 'R': rowwise.

n (global)
The order of the block reflector H. n≥ 0.

k (global)
The order of the triangular factor T (= the number of elementary
reflectors).
1 ≤ k ≤ mb_v (= nb_v).

v Pointer into the local memory to an array of local size LOCr(iv+k-1) * LOCc(jv+n-1).
The distributed matrix V contains the Householder vectors. See Application
Notes below.

iv, jv (global) The row and column indices in the global matrix V indicating the
first row and the first column of the matrix sub(V), respectively.

descv (local) array of size dlen_. The array descriptor for the distributed matrix V.

tau (local)
Array of size LOCr(iv+k-1) if incv = m_v, and LOCc(jv+k-1) otherwise.
This array contains the Householder scalars related to the Householder
vectors.
tau is tied to the distributed matrix V.

work (local).
Workspace array of size(k*(k-1)/2).

Output Parameters

v
t (local)
Array of size mb_v* mb_v. It contains the k-by-k triangular factor of the
block reflector associated with v. t is lower triangular.

Application Notes
The shape of the matrix V and the storage of the vectors which define the H(i) is best illustrated by the
following example with n = 5 and k = 3. The elements equal to 1 are not stored; the corresponding array
elements are modified but restored on exit. The rest of the array is not used.


See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?lascl
Multiplies a general rectangular matrix by a real scalar
defined as Cto/Cfrom.

Syntax
void pslascl (char *type , float *cfrom , float *cto , MKL_INT *m , MKL_INT *n , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_INT *info );

void pdlascl (char *type , double *cfrom , double *cto , MKL_INT *m , MKL_INT *n ,
double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_INT *info );
void pclascl (char *type , float *cfrom , float *cto , MKL_INT *m , MKL_INT *n ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_INT *info );
void pzlascl (char *type , double *cfrom , double *cto , MKL_INT *m , MKL_INT *n ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?lascl function multiplies the m-by-n real/complex distributed matrix sub(A) denoting A(ia:ia+m-1,
ja:ja+n-1) by the real scalar cto/cfrom. This is done without over/underflow as long as the final
result cto*A(i,j)/cfrom does not over/underflow. type specifies whether sub(A) is full, upper triangular,
lower triangular, or upper Hessenberg.

Input Parameters

type (global)
type indicates the storage type of the input distributed matrix.
if type = 'G': sub(A) is a full matrix,

if type = 'L': sub(A) is a lower triangular matrix,

if type = 'U': sub(A) is an upper triangular matrix,

if type = 'H': sub(A) is an upper Hessenberg matrix.

cfrom, cto (global)


The distributed matrix sub(A) is multiplied by cto/cfrom. A(i,j) is
computed without over/underflow if the final result cto*A(i,j)/cfrom can be
represented without over/underflow. cfrom must be nonzero.

m (global)
The number of rows in the distributed matrix sub(A). (m≥0).

n (global)
The number of columns in the distributed matrix sub(A). (n≥0).

a (local input/local output)


Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).

This array contains the local pieces of the distributed matrix sub(A).

ia, ja (global)
The row and column indices in the global matrix A indicating the first row
and column of the matrix sub(A), respectively.

desca (global and local)


Array of size dlen_. The array descriptor for the distributed matrix A.


Output Parameters

a (local).
On exit, this array contains the local pieces of the distributed matrix
multiplied by cto/cfrom.

info (local)
if info = 0: the execution is successful.

if info < 0: If the i-th argument is an array and the j-th entry, indexed
j-1, had an illegal value, then info = -(i*100+j),

if the i-th argument is a scalar and had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
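The following minimal calling sketch is not part of the original reference; it assumes the BLACS context ictxt has already been created (for example with blacs_gridinit), that descinit and pdlascl are declared in mkl_scalapack.h as documented in this reference, and that a_local holds the local block-cyclic pieces of a global n-by-n matrix with local leading dimension lld.

#include <mkl_scalapack.h>

/* Sketch: scale a distributed n-by-n matrix A by cto/cfrom using pdlascl.
   Assumes the BLACS grid (ictxt) and the local array a_local already exist. */
void scale_distributed(MKL_INT ictxt, MKL_INT n, MKL_INT nb, double *a_local, MKL_INT lld)
{
    MKL_INT izero = 0, ione = 1, info;
    MKL_INT desca[9];

    /* Describe the block-cyclic distribution of the global matrix A. */
    descinit(desca, &n, &n, &nb, &nb, &izero, &izero, &ictxt, &lld, &info);

    /* Multiply the full matrix (type = 'G') by cto/cfrom = 1/4,
       guarded against over/underflow as described above. */
    char type = 'G';
    double cfrom = 4.0, cto = 1.0;
    pdlascl(&type, &cfrom, &cto, &n, &n, a_local, &ione, &ione, desca, &info);
}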

p?lase2
Initializes an m-by-n distributed matrix.

Syntax
void pslase2 (const char* uplo, const MKL_INT* m, const MKL_INT* n, const float* alpha,
const float* beta, float* a, const MKL_INT* ia, const MKL_INT* ja, const MKL_INT*
desca);
void pdlase2 (const char* uplo, const MKL_INT* m, const MKL_INT* n, const double*
alpha, const double* beta, double* a, const MKL_INT* ia, const MKL_INT* ja, const
MKL_INT* desca);
void pclase2 (const char* uplo, const MKL_INT* m, const MKL_INT* n, const MKL_Complex8*
alpha, const MKL_Complex8* beta, MKL_Complex8* a, const MKL_INT* ia, const MKL_INT*
ja, const MKL_INT* desca);
void pzlase2 (const char* uplo, const MKL_INT* m, const MKL_INT* n, const
MKL_Complex16* alpha, const MKL_Complex16* beta, MKL_Complex16* a, const MKL_INT* ia,
const MKL_INT* ja, const MKL_INT* desca);

Include Files
• mkl_scalapack.h

Description
p?lase2 initializes an m-by-n distributed matrix sub( A ) denoting A(ia:ia+m-1,ja:ja+n-1) to beta on the
diagonal and alpha on the off-diagonals. p?lase2 requires that only the dimension of the matrix operand is
distributed.

Input Parameters

uplo (global)
Specifies the part of the distributed matrix sub( A ) to be set:
= 'U': Upper triangular part is set; the strictly lower triangular part of
sub( A ) is not changed;

= 'L': Lower triangular part is set; the strictly upper triangular part of
sub( A ) is not changed;
Otherwise: All of the matrix sub( A ) is set.

m (global)
The number of rows to be operated on, i.e., the number of rows of the
distributed submatrix sub( A ). m >= 0.

n (global)
The number of columns to be operated on, i.e., the number of columns of the
distributed submatrix sub( A ). n >= 0.

alpha (global)
The constant to which the off-diagonal elements are to be set.

beta (global)
The constant to which the diagonal elements are to be set.

ia (global)
The row index in the global array a indicating the first row of sub( A ).

ja (global)
The column index in the global array a indicating the first column of
sub( A ).

desca (global and local)


Array of size dlen_.
The array descriptor for the distributed matrix A.

Output Parameters

a (local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1).

This array contains the local pieces of the distributed matrix sub( A ) to be
set.
On exit, the leading m-by-n submatrix sub( A ) is set as follows:

if uplo = 'U', A(ia+i-1,ja+j-1) = alpha, 1<=i<=j-1, 1<=j<=n,

if uplo = 'L', A(ia+i-1,ja+j-1) = alpha, j+1<=i<=m, 1<=j<=n,

otherwise, A(ia+i-1,ja+j-1) = alpha, 1<=i<=m, 1<=j<=n, ia+i != ja+j,

and, for all uplo, A(ia+i-1,ja+i-1) = beta, 1<=i<=min(m,n).
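For instance (an illustration added to the reference text), with m = n = 3 and uplo set to neither 'U' nor 'L', the distributed submatrix sub( A ) is overwritten by

( beta  alpha alpha )
( alpha beta  alpha )
( alpha alpha beta  )

so choosing alpha = 0 and beta = 1 produces the identity matrix.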

p?laset
Initializes the offdiagonal elements of a matrix to alpha
and the diagonal elements to beta.


Syntax
void pslaset (char *uplo , MKL_INT *m , MKL_INT *n , float *alpha , float *beta ,
float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca );
void pdlaset (char *uplo , MKL_INT *m , MKL_INT *n , double *alpha , double *beta ,
double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca );
void pclaset (char *uplo , MKL_INT *m , MKL_INT *n , MKL_Complex8 *alpha , MKL_Complex8
*beta , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca );
void pzlaset (char *uplo , MKL_INT *m , MKL_INT *n , MKL_Complex16 *alpha ,
MKL_Complex16 *beta , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca );

Include Files
• mkl_scalapack.h

Description
The p?laset function initializes an m-by-n distributed matrix sub(A) denoting A(ia:ia+m-1, ja:ja+n-1) to
beta on the diagonal and alpha on the offdiagonals.

Input Parameters

uplo (global)
Specifies the part of the distributed matrix sub(A) to be set:
if uplo = 'U': upper triangular part; the strictly lower triangular part of
sub(A) is not changed;
if uplo = 'L': lower triangular part; the strictly upper triangular part of
sub(A) is not changed.
Otherwise: all of the matrix sub(A) is set.

m (global)
The number of rows in the distributed matrix sub(A). (m≥0).

n (global)
The number of columns in the distributed matrix sub(A). (n≥0).

alpha (global).
The constant to which the offdiagonal elements are to be set.

beta (global).
The constant to which the diagonal elements are to be set.

Output Parameters

a (local).
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).

This array contains the local pieces of the distributed matrix sub(A) to be
set. On exit, the leading m-by-n matrix sub(A) is set as follows:
if uplo = 'U', A(ia+i-1, ja+j-1) = alpha, 1≤i≤j-1, 1≤j≤n,

if uplo = 'L', A(ia+i-1, ja+j-1) = alpha, j+1≤i≤m, 1≤j≤n,

otherwise, A(ia+i-1, ja+j-1) = alpha, 1≤i≤m, 1≤j≤n, ia+i≠ja+j, and, for all
uplo, A(ia+i-1, ja+i-1) = beta, 1≤i≤min(m,n).

ia, ja (global)
The row and column indices in the distributed matrix A indicating the first
row and column of the matrix sub(A), respectively.

desca (global and local)


Array of size dlen_. The array descriptor for the distributed matrix A.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
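A minimal sketch (not part of the original reference) that uses pdlaset to set a distributed matrix to the identity; it assumes desca was already built with descinit (as in the p?lascl sketch above) and that a_local holds the local pieces.

#include <mkl_scalapack.h>

/* Sketch: overwrite the distributed n-by-n matrix described by desca with
   the identity: alpha = 0 off the diagonal, beta = 1 on the diagonal. */
void set_identity(MKL_INT n, double *a_local, MKL_INT *desca)
{
    char uplo = 'A';          /* any value other than 'U'/'L' sets the whole matrix */
    double alpha = 0.0, beta = 1.0;
    MKL_INT ione = 1;
    pdlaset(&uplo, &n, &n, &alpha, &beta, a_local, &ione, &ione, desca);
}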

p?lasmsub
Looks for a small subdiagonal element from the
bottom of the matrix that it can safely set to zero.

Syntax
void pslasmsub (const float *a, const MKL_INT *desca, const MKL_INT *i, const MKL_INT
*l, MKL_INT *k, const float *smlnum, float *buf, const MKL_INT *lwork );
void pdlasmsub (const double *a, const MKL_INT *desca, const MKL_INT *i, const MKL_INT
*l, MKL_INT *k, const double *smlnum, double *buf, const MKL_INT *lwork );
void pclasmsub (const MKL_Complex8 *a , const MKL_INT *desca , const MKL_INT *i , const
MKL_INT *l , MKL_INT *k , const float *smlnum , MKL_Complex8 *buf , const MKL_INT
*lwork );
void pzlasmsub (const MKL_Complex16 *a , const MKL_INT *desca , const MKL_INT *i ,
const MKL_INT *l , MKL_INT *k , const double *smlnum , MKL_Complex16 *buf , const
MKL_INT *lwork );

Include Files
• mkl_scalapack.h

Description
The p?lasmsub function looks for a small subdiagonal element from the bottom of the matrix that it can
safely set to zero. This function performs a global maximum and must be called by all processes.

Input Parameters

a (local)
Array of size lld_a*LOCc(n_a).
On entry, the Hessenberg matrix whose tridiagonal part is being scanned.
Unchanged on exit.

desca (global and local)


Array of size dlen_. The array descriptor for the distributed matrix A.

i (global)


The global location of the bottom of the unreduced submatrix of A.


Unchanged on exit.

l (global)
The global location of the top of the unreduced submatrix of A.
Unchanged on exit.

smlnum (global)
On entry, a "small number" for the given matrix. Unchanged on exit. The
machine-dependent constants for the stopping criterion.

lwork (local)
This must be at least 2*ceil(ceil((i-l)/mb_a)/lcm(nprow,npcol)).
Here lcm is the least common multiple and nprow x npcol is the logical grid size.

Output Parameters

k (global)
On exit, this yields the bottom portion of the unreduced submatrix. This will
satisfy: l≤k≤i-1.

buf (local).
Array of size lwork.

Application Notes
This routine parallelizes the code from ?lahqr that looks for a single small subdiagonal element.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?lasrt
Sorts the numbers in an array and the corresponding
vectors in increasing order.

Syntax
void pslasrt (const char* id, const MKL_INT* n, float* d, const float* q, const
MKL_INT* iq, const MKL_INT* jq, const MKL_INT* descq, float* work, const MKL_INT*
lwork, MKL_INT* iwork, const MKL_INT* liwork, MKL_INT* info);
void pdlasrt (const char* id, const MKL_INT* n, double* d, const double* q, const
MKL_INT* iq, const MKL_INT* jq, const MKL_INT* descq, double* work, const MKL_INT*
lwork, MKL_INT* iwork, const MKL_INT* liwork, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
p?lasrt sorts the numbers in d and the corresponding vectors in q in increasing order.

Input Parameters

id (global)
= 'I': sort d in increasing order;

= 'D': sort d in decreasing order. (NOT IMPLEMENTED YET)

n (global)
The number of columns to be operated on, i.e., the number of columns of the
distributed submatrix sub( Q ). n >= 0.

d (global)
Array, size (n)

q (local)
Pointer into the local memory to an array of size lld_q*LOCc(jq+n-1) .

This array contains the local pieces of the distributed matrix sub( Q ) to be
copied from.

iq (global)
The row index in the global matrix Q indicating the first row of sub( Q ).

jq (global)
The column index in the global matrix Q indicating the first column of
sub( Q ).

descq (global and local)


Array of size dlen_.
The array descriptor for the distributed matrix Q.

work (local)
Array, size (lwork)

lwork (local)
The size of the array work.

lwork = MAX( n, NP * ( NB + NQ ) ), where
NP = numroc( n, NB, MYROW, IAROW, NPROW ),
NQ = numroc( n, NB, MYCOL, DESCQ( csrc_ ), NPCOL ).

numroc is a ScaLAPACK tool function.

iwork (local)
Array, size (liwork)

liwork (local)
The size of the array iwork.

liwork = n + 2*NB + 2*NPCOL

Output Parameters

d On exit, the numbers in d are sorted in increasing order.


info (global)
= 0: successful exit
< 0: If the i-th argument is an array and the j-th entry had an illegal value,
then info = -(i*100+j), if the i-th argument is a scalar and had an illegal
value, then info = -i.
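As a worked instance of the lwork formula in the Input Parameters above (added here for illustration; the numbers are hypothetical and assume IAROW = 0 and DESCQ(csrc_) = 0): with n = 10, NB = 2, and a 2-by-3 process grid, the process in the first process row and first process column has NP = numroc(10, 2, 0, 0, 2) = 6 (it owns row blocks 1, 3, and 5) and NQ = numroc(10, 2, 0, 0, 3) = 4 (it owns column blocks 1 and 4), so on that process lwork = MAX(10, 6*(2 + 4)) = 36.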

p?lassq
Updates a sum of squares represented in scaled form.

Syntax
void pslassq (MKL_INT *n , float *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx ,
MKL_INT *incx , float *scale , float *sumsq );
void pdlassq (MKL_INT *n , double *x , MKL_INT *ix , MKL_INT *jx , MKL_INT *descx ,
MKL_INT *incx , double *scale , double *sumsq );
void pclassq (MKL_INT *n , MKL_Complex8 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT
*descx , MKL_INT *incx , float *scale , float *sumsq );
void pzlassq (MKL_INT *n , MKL_Complex16 *x , MKL_INT *ix , MKL_INT *jx , MKL_INT
*descx , MKL_INT *incx , double *scale , double *sumsq );

Include Files
• mkl_scalapack.h

Description
The p?lassq function returns the values scl and smsq such that

scl^2 * smsq = x1^2 + ... + xn^2 + scale^2 * sumsq,

where
xi = sub(X) = X(ix + (jx-1)*m_x + (i - 1)*incx) for pslassq/pdlassq,

xi = sub(X) = abs(X(ix + (jx-1)*m_x + (i - 1)*incx)) for pclassq/pzlassq.

For the real functions pslassq/pdlassq the value of sumsq is assumed to be non-negative and scl returns the
value
scl = max(scale, abs(xi)).
For the complex functions pclassq/pzlassq the value of sumsq is assumed to be at least unity and the value of
ssq will then satisfy
1.0 ≤ ssq ≤ sumsq + 2*n.

The value scale is assumed to be non-negative and scl returns the value
scl = max(scale, abs(real(x(i))), abs(aimag(x(i)))),
where real(x(i)) and aimag(x(i)) denote the real and imaginary parts of the elements of sub(X).

For all functions p?lassq the values scale and sumsq must be supplied in scale and sumsq respectively, and
scale and sumsq are overwritten by scl and ssq respectively.
All functions p?lassq make only one pass through the vector sub(X).
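A worked example (added for illustration): if sub(X) = (3, 4) and the routine is entered with scale = 1 and sumsq = 0, then one valid pair returned on exit is scl = 4 and smsq = 25/16, since 4^2 * (25/16) = 25 = 3^2 + 4^2 + 1^2 * 0; any pair satisfying the relation above is an equally valid scaled representation of the sum of squares.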

Input Parameters

n (global)
The length of the distributed vector sub(x ).

x The array that stores the vector for which a scaled sum of squares is
computed:
x[ix + (jx-1)*m_x + i*incx], 0 ≤ i < n.

ix (global)
The row index in the global matrix X indicating the first row of sub(X).

jx (global)
The column index in the global matrix X indicating the first column of
sub(X).

descx (global and local) array of size dlen_.


The array descriptor for the distributed matrix X.

incx (global)
The global increment for the elements of X. Only two values of incx are
supported in this version, namely 1 and m_x. The argument incx must not
equal zero.

scale (local).
On entry, the value scale in the equation above.

sumsq (local)
On entry, the value sumsq in the equation above.

Output Parameters

scale (local).
On exit, scale is overwritten with scl , the scaling factor for the sum of
squares.

sumsq (local).
On exit, sumsq is overwritten with the value smsq, the basic sum of squares
from which scl has been factored out.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?laswp
Performs a series of row interchanges on a general
rectangular matrix.

Syntax
void pslaswp (char *direc , char *rowcol , MKL_INT *n , float *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_INT *k1 , MKL_INT *k2 , MKL_INT *ipiv );


void pdlaswp (char *direc , char *rowcol , MKL_INT *n , double *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_INT *k1 , MKL_INT *k2 , MKL_INT *ipiv );
void pclaswp (char *direc , char *rowcol , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_INT *k1 , MKL_INT *k2 , MKL_INT *ipiv );
void pzlaswp (char *direc , char *rowcol , MKL_INT *n , MKL_Complex16 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_INT *k1 , MKL_INT *k2 , MKL_INT *ipiv );

Include Files
• mkl_scalapack.h

Description
The p?laswp function performs a series of row or column interchanges on the distributed matrix
sub(A) = A(ia:ia+n-1, ja:ja+n-1). One interchange is initiated for each of rows or columns k1 through k2
of sub(A). This function assumes that the pivoting information has already been broadcast along the process
row or column. Also note that this function will only work for k1-k2 being in the same mb (or nb) block. If
you want to pivot a full matrix, use p?lapiv.

Input Parameters

direc (global)
Specifies in which order the permutation is applied:
= 'F' - forward,

= 'B' - backward.

rowcol (global)
Specifies if the rows or columns are permuted:
= 'R' - rows,

= 'C' - columns.

n (global)
If rowcol='R', the length of the rows of the distributed matrix A(*,
ja:ja+n-1) to be permuted;
If rowcol='C', the length of the columns of the distributed matrix A(ia:ia
+n-1 , *) to be permuted;

a (local)
Pointer into the local memory to an array of size lld_a * LOCc(n_a). On
entry, this array contains the local pieces of the distributed matrix to which
the row/columns interchanges will be applied.

ia (global)
The row index in the global matrix A indicating the first row of sub(A).

ja (global)
The column index in the global matrix A indicating the first column of
sub(A).

desca (global and local) array of size dlen_.

The array descriptor for the distributed matrix A.

k1 (global)
The first element of ipiv for which a row or column interchange will be done.

k2 (global)
The last element of ipiv for which a row or column interchange will be done.

ipiv (local)
Array of size LOCr(m_a)+mb_a for row pivoting and LOCr(n_a)+nb_a for
column pivoting. This array is tied to the matrix A, ipiv[k]=l implies rows
(or columns) k+1 and l are to be interchanged, k = 0, 1, ..., size (ipiv) -1.

Output Parameters

a (local)
On exit, the permuted distributed matrix.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
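As an illustration of the ipiv convention (added here, with hypothetical values): with rowcol = 'R', direc = 'F', k1 = 1, and k2 = 3, an ipiv whose first three entries are 3, 3, 5 causes row 1 of sub(A) to be interchanged with row 3, then row 2 with row 3, and finally row 3 with row 5.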

p?latra
Computes the trace of a general square distributed
matrix.

Syntax
float pslatra (MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca );
double pdlatra (MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca );
void pclatra (MKL_Complex8 * , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca );
void pzlatra (MKL_Complex16 * , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca );

Include Files
• mkl_scalapack.h

Description
This function computes the trace of an n-by-n distributed matrix sub(A) denoting A(ia:ia+n-1, ja:ja
+n-1). The result is left on every process of the grid.

Input Parameters

n (global)
The number of rows and columns to be operated on, that is, the order of
the distributed matrix sub(A). n ≥0.

a (local).


Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1)


containing the local pieces of the distributed matrix, the trace of which is to
be computed.

ia, ja (global) The row and column indices respectively in the global matrix A
indicating the first row and the first column of the matrix sub(A),
respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

Output Parameters

val The value returned by the function.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
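A minimal sketch (not part of the original reference); it assumes a_local and desca describe a distributed n-by-n matrix set up as in the earlier sketches.

#include <stdio.h>
#include <mkl_scalapack.h>

/* Sketch: compute the trace of the distributed n-by-n matrix described by
   desca; pdlatra returns the same value on every process of the grid. */
void print_trace(MKL_INT n, double *a_local, MKL_INT *desca)
{
    MKL_INT ione = 1;
    double tr = pdlatra(&n, a_local, &ione, &ione, desca);
    printf("trace(A) = %f\n", tr);
}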

p?latrd
Reduces the first nb rows and columns of a
symmetric/Hermitian matrix A to real tridiagonal form
by an orthogonal/unitary similarity transformation.

Syntax
void pslatrd (char *uplo , MKL_INT *n , MKL_INT *nb , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *d , float *e , float *tau , float *w , MKL_INT *iw ,
MKL_INT *jw , MKL_INT *descw , float *work );
void pdlatrd (char *uplo , MKL_INT *n , MKL_INT *nb , double *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *d , double *e , double *tau , double *w ,
MKL_INT *iw , MKL_INT *jw , MKL_INT *descw , double *work );
void pclatrd (char *uplo , MKL_INT *n , MKL_INT *nb , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *d , float *e , MKL_Complex8 *tau , MKL_Complex8
*w , MKL_INT *iw , MKL_INT *jw , MKL_INT *descw , MKL_Complex8 *work );
void pzlatrd (char *uplo , MKL_INT *n , MKL_INT *nb , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *d , double *e , MKL_Complex16 *tau ,
MKL_Complex16 *w , MKL_INT *iw , MKL_INT *jw , MKL_INT *descw , MKL_Complex16 *work );

Include Files
• mkl_scalapack.h

Description
The p?latrd function reduces nb rows and columns of a real symmetric or complex Hermitian matrix
sub(A)= A(ia:ia+n-1, ja:ja+n-1) to symmetric/complex tridiagonal form by an orthogonal/unitary
similarity transformation Q'*sub(A)*Q, and returns the matrices V and W, which are needed to apply the
transformation to the unreduced part of sub(A).
If uplo = U, p?latrd reduces the last nb rows and columns of a matrix, of which the upper triangle is
supplied;
if uplo = L, p?latrd reduces the first nb rows and columns of a matrix, of which the lower triangle is
supplied.

This is an auxiliary function called by p?sytrd/p?hetrd.

Input Parameters

uplo (global)
Specifies whether the upper or lower triangular part of the symmetric/
Hermitian matrix sub(A) is stored:
= 'U': Upper triangular

= L: Lower triangular.

n (global)
The number of rows and columns to be operated on, that is, the order of
the distributed matrix sub(A). n≥ 0.

nb (global)
The number of rows and columns to be reduced.

a Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).

On entry, this array contains the local pieces of the symmetric/Hermitian


distributed matrix sub(A).
If uplo = U, the leading n-by-n upper triangular part of sub(A) contains
the upper triangular part of the matrix, and its strictly lower triangular part
is not referenced.
If uplo = L, the leading n-by-n lower triangular part of sub(A) contains the
lower triangular part of the matrix, and its strictly upper triangular part is
not referenced.

ia (global)
The row index in the global matrix A indicating the first row of sub(A).

ja (global)
The column index in the global matrix A indicating the first column of
sub(A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

iw (global)
The row index in the global matrix W indicating the first row of sub(W).

jw (global)
The column index in the global matrix W indicating the first column of
sub(W).

descw (global and local) array of size dlen_. The array descriptor for the
distributed matrix W.

work (local)
Workspace array of size nb_a.


Output Parameters

a (local)
On exit, if uplo = 'U', the last nb columns have been reduced to
tridiagonal form, with the diagonal elements overwriting the diagonal
elements of sub(A); the elements above the diagonal with the array tau
represent the orthogonal/unitary matrix Q as a product of elementary
reflectors;
if uplo = 'L', the first nb columns have been reduced to tridiagonal form,
with the diagonal elements overwriting the diagonal elements of sub(A);
the elements below the diagonal with the array tau represent the
orthogonal/unitary matrix Q as a product of elementary reflectors.

d (local)
Array of size LOCc(ja+n-1).

The diagonal elements of the tridiagonal matrix T: d[i] = A(i+1,i+1), i = 0,


1, ..., LOCc(ja+n-1)-1. d is tied to the distributed matrix A.

e (local)
Array of size LOCc(ja+n-1) if uplo = 'U', LOCc(ja+n-2) otherwise.

The off-diagonal elements of the tridiagonal matrix T:


e[i] = A(i + 1, i + 2) if uplo = 'U',
e[i] = A(i + 2, i + 1) if uplo = 'L',
i = 0, 1, ..., LOCc(ja+n-1)-1.

e is tied to the distributed matrix A.

tau (local)
Array of size LOCc(ja+n-1). This array contains the scalar factors of the
elementary reflectors. tau is tied to the distributed matrix A.

w (local)
Pointer into the local memory to an array of size lld_w* nb_w. This array
contains the local pieces of the n-by-nb_w matrix w required to update the
unreduced part of sub(A).

Application Notes
If uplo = 'U', the matrix Q is represented as a product of elementary reflectors

Q = H(n)*H(n-1)*...*H(n-nb+1)

Each H(i) has the form

H(i) = I - tau*v*v' ,

where tau is a real/complex scalar, and v is a real/complex vector with v(i:n) = 0 and v(i-1) = 1;
v(1:i-1) is stored on exit in A(ia:ia+i-1, ja+i), and tau in tau[ja+i-2].
If uplo = L, the matrix Q is represented as a product of elementary reflectors

Q = H(1)*H(2)*...*H(nb)

Each H(i) has the form

H(i) = I - tau*v*v' ,

where tau is a real/complex scalar, and v is a real/complex vector with v(1:i) = 0 and v(i+1) = 1; v(i
+2: n) is stored on exit in A(ia+i+1: ia+n-1, ja+i-1), and tau in tau[ja+i-2].
The elements of the vectors v together form the n-by-nb matrix V which is needed, with W, to apply the
transformation to the unreduced part of the matrix, using a symmetric/Hermitian rank-2k update of the
form:
sub(A) := sub(A)-vw'-wv'.
The contents of a on exit are illustrated by the following examples with
n = 5 and nb = 2:

if uplo = 'U':                          if uplo = 'L':

(  a   a   a   v4  v5 )                 (  d                  )
(      a   a   v4  v5 )                 (  1   d              )
(          a   1   v5 )                 (  v1  1   a          )
(              d   1  )                 (  v1  v2  a   a      )
(                  d  )                 (  v1  v2  a   a   a  )

where d denotes a diagonal element of the reduced matrix, a denotes an element of the original matrix that
is unchanged, and vi denotes an element of the vector defining H(i).

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?latrs
Solves a triangular system of equations with the scale
factor set to prevent overflow.

Syntax
void pslatrs (char *uplo , char *trans , char *diag , char *normin , MKL_INT *n ,
float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *x , MKL_INT *ix ,
MKL_INT *jx , MKL_INT *descx , float *scale , float *cnorm , float *work );
void pdlatrs (char *uplo , char *trans , char *diag , char *normin , MKL_INT *n ,
double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *x , MKL_INT *ix ,
MKL_INT *jx , MKL_INT *descx , double *scale , double *cnorm , double *work );
void pclatrs (char *uplo , char *trans , char *diag , char *normin , MKL_INT *n ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *x ,
MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , float *scale , float *cnorm ,
MKL_Complex8 *work );


void pzlatrs (char *uplo , char *trans , char *diag , char *normin , MKL_INT *n ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *x ,
MKL_INT *ix , MKL_INT *jx , MKL_INT *descx , double *scale , double *cnorm ,
MKL_Complex16 *work );

Include Files
• mkl_scalapack.h

Description
The p?latrs function solves a triangular system of equations A*x = s*b, A^T*x = s*b, or A^H*x = s*b, where s is a
scale factor set to prevent overflow. The description of the function will be extended in future releases.

Input Parameters

uplo
Specifies whether the matrix A is upper or lower triangular.
= 'U': Upper triangular

= 'L': Lower triangular

trans
Specifies the operation applied to Ax.
= 'N': Solve Ax = s*b (no transpose)

= 'T': Solve A^T*x = s*b (transpose)

= 'C': Solve A^H*x = s*b (conjugate transpose),

where s is a scale factor.

diag
Specifies whether or not the matrix A is unit triangular.
= 'N': Non-unit triangular

= 'U': Unit triangular

normin
Specifies whether cnorm has been set or not.
= 'Y': cnorm contains the column norms on entry;

= 'N': cnorm is not set on entry. On exit, the norms will be computed and
stored in cnorm.

n
The order of the matrix A. n≥ 0

a Array of size lda* n. Contains the triangular matrix A.


If uplo = U, the leading n-by-n upper triangular part of the array a
contains the upper triangular matrix, and the strictly lower triangular part
of a is not referenced.

If uplo = 'L', the leading n-by-n lower triangular part of the array a
contains the lower triangular matrix, and the strictly upper triangular part
of a is not referenced.

If diag = 'U', the diagonal elements of a are also not referenced and are
assumed to be 1.

ia, ja (global) The row and column indices in the global matrix A indicating the
first row and the first column of the submatrix A, respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

x Array of size n. On entry, the right hand side b of the triangular system.

ix (global).The row index in the global matrix X indicating the first row of
sub(x).

jx (global)
The column index in the global matrix X indicating the first column of
sub(X).

descx (global and local)


Array of size dlen_. The array descriptor for the distributed matrix X.

cnorm Array of size n. If normin = 'Y', cnorm is an input argument and cnorm[j]
contains the norm of the off-diagonal part of the (j+1)-th column of the
matrix A, j=0, 1, ..., n-1. If trans = 'N', cnorm[j] must be greater than or
equal to the infinity-norm, and if trans = 'T' or 'C', cnorm[j] must be
greater than or equal to the 1-norm.

work (local).
Temporary workspace.

Output Parameters

X On exit, x is overwritten by the solution vector x.

scale The scaling factor s for the triangular system as
described above.
If scale = 0, the matrix A is singular or badly scaled, and the vector x is
an exact or approximate solution to Ax = 0.

cnorm If normin = 'N', cnorm is an output argument and cnorm[j] returns the 1-
norm of the off-diagonal part of the (j+1)-th column of A, j=0, 1, ..., n-1.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?latrz
Reduces an upper trapezoidal matrix to upper
triangular form by means of orthogonal/unitary
transformations.

Syntax
void pslatrz (MKL_INT *m , MKL_INT *n , MKL_INT *l , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *tau , float *work );
void pdlatrz (MKL_INT *m , MKL_INT *n , MKL_INT *l , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *tau , double *work );


void pclatrz (MKL_INT *m , MKL_INT *n , MKL_INT *l , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work );
void pzlatrz (MKL_INT *m , MKL_INT *n , MKL_INT *l , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work );

Include Files
• mkl_scalapack.h

Description
The p?latrz function reduces the m-by-n (m ≤ n) real/complex upper trapezoidal matrix sub(A) =
[A(ia:ia+m-1, ja:ja+m-1) A(ia:ia+m-1, ja+n-l:ja+n-1)] to upper triangular form by means of
orthogonal/unitary transformations.
The upper trapezoidal matrix sub(A) is factored as
sub(A) = ( R 0 )*Z,
where Z is an n-by-n orthogonal/unitary matrix and R is an m-by-m upper triangular matrix.

Input Parameters

m (global)
The number of rows in the distributed matrix sub(A). m ≥ 0.

n (global)
The number of columns in the distributed matrix sub(A). n ≥ 0.

l (global)
The number of columns of the distributed matrix sub(A) containing the
meaningful part of the Householder reflectors. l > 0.

a (local)
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1). On
entry, the local pieces of the m-by-n distributed matrix sub(A), which is to
be factored.

ia (global)
The row index in the global matrix A indicating the first row of sub(A).

ja (global)
The column index in the global matrix A indicating the first column of
sub(A).

desca (global and local) array of size dlen_.


The array descriptor for the distributed matrix A.

work (local)
Workspace array of size lwork.
lwork ≥ nq0 + max(1, mp0), where
iroff = mod(ia-1, mb_a),
icoff = mod(ja-1, nb_a),

iarow = indxg2p(ia, mb_a, myrow, rsrc_a, nprow),
iacol = indxg2p(ja, nb_a, mycol, csrc_a, npcol),
mp0 = numroc(m+iroff, mb_a, myrow, iarow, nprow),
nq0 = numroc(n+icoff, nb_a, mycol, iacol, npcol),
indxg2p and numroc are ScaLAPACK tool functions; myrow,
mycol, nprow, and npcol can be determined by calling the function
blacs_gridinfo.

Output Parameters

a On exit, the leading m-by-m upper triangular part of sub(A) contains the
upper triangular matrix R, and elements n-l+1 to n of the first m rows of
sub(A), with the array tau, represent the orthogonal/unitary matrix Z as a
product of m elementary reflectors.

tau (local)
Array of size LOCr(ja+m-1). This array contains the scalar factors of the
elementary reflectors. tau is tied to the distributed matrix A.

Application Notes
The factorization is obtained by Householder's method. The k-th transformation matrix, Z(k), which is used
(or, in case of complex functions, whose conjugate transpose is used) to introduce zeros into the (m - k
+ 1)-th row of sub(A), is given in the form

Z( k ) = ( I     0    )
         ( 0   T( k ) )

where

T( k ) = I - tau*u( k )*u( k )',   u( k ) = (   1    )
                                            (   0    )
                                            ( z( k ) )

tau is a scalar and z( k ) is an (n-m)-element vector. tau and z( k ) are chosen to annihilate the elements
of the k-th row of sub(A). The scalar tau is returned in the k-th element of tau, indexed k-1, and the vector
u( k ) in the k-th row of sub(A), such that the elements of z(k ) are in A( k, m + 1 ), ..., A( k,
n ). The elements of R are returned in the upper triangular part of sub(A).
Z is given by

Z = Z(1)Z(2)...Z(m).

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.


p?lauu2
Computes the product U*U' or L'*L, where U and L
are upper or lower triangular matrices (local
unblocked algorithm).

Syntax
void pslauu2 (char *uplo , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca );
void pdlauu2 (char *uplo , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca );
void pclauu2 (char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca );
void pzlauu2 (char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca );

Include Files
• mkl_scalapack.h

Description
The p?lauu2 function computes the product U*U' or L'*L, where the triangular factor U or L is stored in the
upper or lower triangular part of the distributed matrix
sub(A)= A(ia:ia+n-1, ja:ja+n-1).
If uplo = 'U' or 'u', then the upper triangle of the result is stored, overwriting the factor U in sub(A).

If uplo = 'L' or 'l', then the lower triangle of the result is stored, overwriting the factor L in sub(A).

This is the unblocked form of the algorithm, calling BLAS Level 2 Routines. No communication is performed
by this function; the matrix to operate on should be strictly local to one process.

Input Parameters

uplo (global)
Specifies whether the triangular factor stored in the matrix sub(A) is upper
or lower triangular:
= U: upper triangular

= L: lower triangular.

n (global)
The number of rows and columns to be operated on, that is, the order of
the triangular factor U or L. n≥ 0.

a (local)
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1). On
entry, the local pieces of the triangular factor U or L.

ia (global)
The row index in the global matrix A indicating the first row of sub(A).

ja (global)

The column index in the global matrix A indicating the first column of
sub(A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

Output Parameters

a (local)
On exit, if uplo = 'U', the upper triangle of the distributed matrix sub(A)
is overwritten with the upper triangle of the product U*U'; if uplo = 'L',
the lower triangle of sub(A) is overwritten with the lower triangle of the
product L'*L.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
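As a small worked example (added to the text): for n = 2 and uplo = 'U' with

U = ( a  b )
    ( 0  c )

the product is

U*U' = ( a*a + b*b   b*c )
       ( b*c         c*c )

and only its upper triangle, that is a*a + b*b, b*c, and c*c, overwrites the factor U in sub(A).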

p?lauum
Computes the product U*U' or L'*L, where U and L
are upper or lower triangular matrices.

Syntax
void pslauum (char *uplo , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca );
void pdlauum (char *uplo , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca );
void pclauum (char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca );
void pzlauum (char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca );

Include Files
• mkl_scalapack.h

Description
The p?lauum function computes the product U*U' or L'*L, where the triangular factor U or L is stored in the
upper or lower triangular part of the matrix sub(A)= A(ia:ia+n-1, ja:ja+n-1).

If uplo = 'U' or 'u', then the upper triangle of the result is stored, overwriting the factor U in sub(A). If
uplo = 'L' or 'l', then the lower triangle of the result is stored, overwriting the factor L in sub(A).

This is the blocked form of the algorithm, calling Level 3 PBLAS.

Input Parameters

uplo (global)
Specifies whether the triangular factor stored in the matrix sub(A) is upper
or lower triangular:
= 'U': upper triangular


= 'L': lower triangular.

n (global)
The number of rows and columns to be operated on, that is, the order of
the triangular factor U or L. n≥ 0.

a (local)
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1). On
entry, the local pieces of the triangular factor U or L.

ia (global)
The row index in the global matrix A indicating the first row of sub(A).

ja (global)
The column index in the global matrix A indicating the first column of
sub(A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

Output Parameters

a (local)
On exit, if uplo = 'U', the upper triangle of the distributed matrix sub(A)
is overwritten with the upper triangle of the product U*U' ; if uplo = 'L',
the lower triangle of sub(A) is overwritten with the lower triangle of the
product L'*L.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?lawil
Forms the Wilkinson transform.

Syntax
void pslawil (const MKL_INT *ii, const MKL_INT *jj, const MKL_INT *m, const float *a,
const MKL_INT *desca, const float *h44, const float *h33, const float *h43h34, float
*v );
void pdlawil (const MKL_INT *ii, const MKL_INT *jj, const MKL_INT *m, const double *a,
const MKL_INT *desca, const double *h44, const double *h33, const double *h43h34,
double *v );
void pclawil (const MKL_INT *ii , const MKL_INT *jj , const MKL_INT *m , const
MKL_Complex8 *a , const MKL_INT *desca , const MKL_Complex8 *h44 , const MKL_Complex8
*h33 , const MKL_Complex8 *h43h34 , MKL_Complex8 *v );
void pzlawil (const MKL_INT *ii , const MKL_INT *jj , const MKL_INT *m , const
MKL_Complex16 *a , const MKL_INT *desca , const MKL_Complex16 *h44 , const
MKL_Complex16 *h33 , const MKL_Complex16 *h43h34 , MKL_Complex16 *v );

Include Files
• mkl_scalapack.h

Description
The p?lawil function gets the transform given by h44, h33, and h43h34 into v starting at row m.

Input Parameters

ii (global)
Number of the process row which owns the matrix element A(m+2, m+2).

jj (global)
Number of the process column which owns the matrix element A(m+2, m+2).

m (global)
On entry, the location from where the transform starts (row m). Unchanged
on exit.

a (local)
Array of size lld_a*LOCc(n_a).
On entry, the Hessenberg matrix. Unchanged on exit.

desca (global and local)


Array of size dlen_. The array descriptor for the distributed matrix A.
Unchanged on exit.

h44, h33, h43h34 (global)
These three values are for the double shift QR iteration. Unchanged on exit.

Output Parameters

v (global)
Array of size 3 that contains the transform on output.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?org2l/p?ung2l
Generates all or part of the orthogonal/unitary matrix
Q from a QL factorization determined by p?geqlf
(unblocked algorithm).

Syntax
void psorg2l (MKL_INT *m , MKL_INT *n , MKL_INT *k , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdorg2l (MKL_INT *m , MKL_INT *n , MKL_INT *k , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );


void pcung2l (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzung2l (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?org2l/p?ung2l function generates an m-by-n real/complex distributed matrix Q denoting A(ia:ia
+m-1, ja:ja+n-1) with orthonormal columns, which is defined as the last n columns of a product of k
elementary reflectors of order m:
Q = H(k)*...*H(2)*H(1) as returned by p?geqlf.

Input Parameters

m (global)
The number of rows in the distributed submatrix Q. m≥ 0.

n (global)
The number of columns in the distributed submatrix Q. m≥n≥ 0.

k (global)
The number of elementary reflectors whose product defines the matrix Q.
n≥k≥ 0.

a Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).

On entry, the j-th column of the matrix stored in a must contain the vector
that defines the elementary reflector H(j), ja+n-k ≤ j ≤ ja+n-1, as
returned by p?geqlf in the k columns of its distributed matrix argument
A(ia:*, ja+n-k:ja+n-1).

ia (global)
The row index in the global matrix A indicating the first row of sub(A).

ja (global)
The column index in the global matrix A indicating the first column of
sub(A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ja+n-1).

tau[j] contains the scalar factor of the elementary reflector H(j+1), j =


0, 1, ..., LOCc(ja+n-1)-1, as returned by p?geqlf.

work (local)

Workspace array of size lwork.

lwork (local or global)


The size of the array work.
lwork is local input and must be at least lwork≥mpa0 + max(1, nqa0),
where
iroffa = mod(ia-1, mb_a),
icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, myrow, rsrc_a, nprow),
iacol = indxg2p(ja, nb_a, mycol, csrc_a, npcol),
mpa0 = numroc(m+iroffa, mb_a, myrow, iarow, nprow),
nqa0 = numroc(n+icoffa, nb_a, mycol, iacol, npcol).
indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol, nprow,
and npcol can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a On exit, this array contains the local pieces of the m-by-n distributed matrix
Q.

work On exit, work[0] returns the minimal and optimal lwork.

info (local).
= 0: successful exit
< 0: if the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value,
then info = - (i*100 +j),

if the i-th argument is a scalar and had an illegal value,


then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?org2r/p?ung2r
Generates all or part of the orthogonal/unitary matrix
Q from a QR factorization determined by p?geqrf
(unblocked algorithm).

Syntax
void psorg2r (MKL_INT *m , MKL_INT *n , MKL_INT *k , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );


void pdorg2r (MKL_INT *m , MKL_INT *n , MKL_INT *k , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
void pcung2r (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzung2r (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?org2r/p?ung2r function generates an m-by-n real/complex matrix Q denoting A(ia:ia+m-1, ja:ja
+n-1) with orthonormal columns, which is defined as the first n columns of a product of k elementary
reflectors of order m:
Q = H(1)*H(2)*...*H(k)
as returned by p?geqrf.

Input Parameters

m (global)
The number of rows in the distributed submatrix Q.m≥ 0.

n (global)
The number of columns in the distributed submatrix Q. m≥n ≥ 0.

k (global)
The number of elementary reflectors whose product defines the matrix Q.
n≥ k≥ 0.

a Pointer into the local memory to an array of sizelld_a * LOCc(ja+n-1)

On entry, the j-th column of the matrix stored in a must contain the vector
that defines the elementary reflector H(j), ja≤j≤ja+k-1, as returned by
p?geqrf in the k columns of its distributed matrix argument A(ia:*,ja:ja
+k-1).

ia (global)
The row index in the global matrix A indicating the first row of sub(A).

ja (global)
The column index in the global matrix A indicating the first column of
sub(A).

desca (global and local) array of size dlen_.


The array descriptor for the distributed matrix A.

tau (local)
Array of size LOCc(ja+k-1).

tau[j] contains the scalar factor of the elementary reflector H(j+1), j =
0, 1, ..., LOCc(ja+k-1)-1, as returned by p?geqrf. This array is tied
to the distributed matrix A.

work (local)
Workspace array of size lwork.

lwork (local or global)


The size of the array work.
lwork is local input and must be at least lwork≥ mpa0 + max(1, nqa0),

where
iroffa = mod(ia-1, mb_a), icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, myrow, rsrc_a, nprow),
iacol = indxg2p(ja, nb_a, mycol, csrc_a, npcol),
mpa0 = numroc(m+iroffa, mb_a, myrow, iarow, nprow),
nqa0 = numroc(n+icoffa, nb_a, mycol, iacol, npcol).
indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol, nprow,
and npcol can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a On exit, this array contains the local pieces of the m-by-n distributed matrix
Q.

work On exit, work[0] returns the minimal and optimal lwork.

info (local).
= 0: successful exit
< 0: if the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value,
then info = - (i*100 +j),

if the i-th argument is a scalar and had an illegal value,


then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
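The two-call workspace query described above can be sketched as follows (not part of the original reference; it assumes a_local, tau, and desca hold a factorization produced by p?geqrf, and that mkl_malloc/mkl_free from mkl.h are used for allocation).

#include <mkl.h>
#include <mkl_scalapack.h>

/* Sketch: query the optimal lwork with lwork = -1, then form the explicit
   m-by-n matrix Q from the p?geqrf reflectors stored in a_local and tau. */
void form_q(MKL_INT m, MKL_INT n, MKL_INT k, double *a_local, MKL_INT *desca, double *tau)
{
    MKL_INT ione = 1, lwork = -1, info;
    double wquery;

    /* First call: only the minimal/optimal workspace size is computed. */
    pdorg2r(&m, &n, &k, a_local, &ione, &ione, desca, tau, &wquery, &lwork, &info);

    lwork = (MKL_INT)wquery;
    double *work = (double *)mkl_malloc((size_t)lwork * sizeof(double), 64);

    /* Second call: a_local is overwritten with the local pieces of Q. */
    pdorg2r(&m, &n, &k, a_local, &ione, &ione, desca, tau, work, &lwork, &info);

    mkl_free(work);
}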

p?orgl2/p?ungl2
Generates all or part of the orthogonal/unitary matrix
Q from an LQ factorization determined by p?gelqf
(unblocked algorithm).


Syntax
void psorgl2 (MKL_INT *m , MKL_INT *n , MKL_INT *k , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdorgl2 (MKL_INT *m , MKL_INT *n , MKL_INT *k , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
void pcungl2 (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzungl2 (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?orgl2/p?ungl2 function generates an m-by-n real/complex matrix Q denoting A(ia:ia+m-1, ja:ja
+n-1) with orthonormal rows, which is defined as the first m rows of a product of k elementary reflectors of
order n
Q = H(k)*...*H(2)*H(1) (for real flavors),
Q = (H(k))^H*...*(H(2))^H*(H(1))^H (for complex flavors) as returned by p?gelqf.

Input Parameters

m (global)
The number of rows in the distributed submatrix Q. m≥ 0.

n (global)
The number of columns in the distributed submatrix Q. n≥ m ≥ 0.

k (global)
The number of elementary reflectors whose product defines the matrix Q. m
≥ k≥ 0.

a Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).

On entry, the i-th row of the matrix stored in a must contain the vector that
defines the elementary reflector H(i), ia ≤ i ≤ ia+k-1, as returned by
p?gelqf in the k rows of its distributed matrix argument A(ia:ia+k-1,
ja:*).

ia (global)
The row index in the global matrix A indicating the first row of sub(A).

ja (global)
The column index in the global matrix A indicating the first column of
sub(A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCr(ja+k-1). tau[j] contains the scalar factor of the
elementary reflectors H(j+1), j = 0, 1, ..., LOCr(ja+k-1)-1, as returned by
p?gelqf. This array is tied to the distributed matrix A.

work (local)
Workspace array of size lwork.

lwork (local or global)


The size of the array work.
lwork is local input and must be at least lwork≥nqa0 + max(1, mpa0),
where

iroffa = mod(ia-1, mb_a),


icoffa = mod(ja-1, nb_a),
iarow = indxg2p(ia, mb_a, myrow, rsrc_a, nprow),
iacol = indxg2p(ja, nb_a, mycol, csrc_a, npcol),
mpa0 = numroc(m+iroffa, mb_a, myrow, iarow, nprow),
nqa0 = numroc(n+icoffa, nb_a, mycol, iacol, npcol).
indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol, nprow,
and npcol can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a On exit, this array contains the local pieces of the m-by-n distributed matrix
Q.

work On exit, work[0] returns the minimal and optimal lwork.

info (local)
= 0: successful exit
< 0: if the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value,
then info = - (i*100 +j),

if the i-th argument is a scalar and had an illegal value,


then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
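
As described under lwork, passing lwork = -1 performs a workspace query. The sketch below shows the usual two-call pattern for pdorgl2; it assumes (and does not show) that the BLACS process grid, the descriptor desca, the local array a holding the output of pdgelqf, and tau have already been prepared, and the wrapper name generate_q_from_lq is hypothetical.

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Sketch only: generate the first m rows of Q from a prior pdgelqf call.
   a, desca, and tau are assumed to be set up by the caller. */
void generate_q_from_lq(MKL_INT m, MKL_INT n, MKL_INT k,
                        double *a, MKL_INT ia, MKL_INT ja, MKL_INT *desca,
                        double *tau, MKL_INT *info)
{
    double query;
    MKL_INT lwork = -1;                          /* workspace query */
    pdorgl2(&m, &n, &k, a, &ia, &ja, desca, tau, &query, &lwork, info);

    lwork = (MKL_INT)query;                      /* minimal/optimal size from work[0] */
    double *work = (double *)malloc((size_t)lwork * sizeof(double));
    pdorgl2(&m, &n, &k, a, &ia, &ja, desca, tau, work, &lwork, info);
    free(work);
}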


p?orgr2/p?ungr2
Generates all or part of the orthogonal/unitary matrix
Q from an RQ factorization determined by p?gerqf
(unblocked algorithm).

Syntax
void psorgr2 (MKL_INT *m , MKL_INT *n , MKL_INT *k , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , float *tau , float *work , MKL_INT *lwork , MKL_INT *info );
void pdorgr2 (MKL_INT *m , MKL_INT *n , MKL_INT *k , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , double *tau , double *work , MKL_INT *lwork , MKL_INT *info );
void pcungr2 (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzungr2 (MKL_INT *m , MKL_INT *n , MKL_INT *k , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?orgr2/p?ungr2 function generates an m-by-n real/complex matrix Q denoting A(ia:ia+m-1, ja:ja
+n-1) with orthonormal rows, which is defined as the last m rows of a product of k elementary reflectors of
order n
Q = H(1)*H(2)*...*H(k) (for real flavors);
Q = (H(1))H*(H(2))H*...*(H(k))H (for complex flavors) as returned by p?gerqf.

Input Parameters

m (global)
The number of rows in the distributed submatrix Q. m≥ 0.

n (global)
The number of columns in the distributed submatrix Q. n≥ m ≥ 0.

k (global)
The number of elementary reflectors whose product defines the matrix Q.
m≥ k≥ 0.

a Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).

On entry, the i-th row of the matrix stored in a must contain the vector that
defines the elementary reflector H(i), ia+m-k≤ i≤ia+m-1, as returned by
p?gerqf in the k rows of its distributed matrix argument A(ia+m-k:ia+m-1,
ja:*).

ia (global)
The row index in the global matrix A indicating the first row of sub(A).

ja (global)
The column index in the global matrix A indicating the first column of
sub(A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCr(ja+m-1). tau[j] contains the scalar factor of the
elementary reflector H(j+1), j = 0, 1, ..., LOCr(ja+m-1)-1, as returned by
p?gerqf. This array is tied to the distributed matrix A.

work (local)
Workspace array of size lwork.

lwork (local or global)


The size of the array work.
lwork is local input and must be at least lwork≥ nqa0 + max(1, mpa0 ),
where iroffa = mod( ia-1, mb_a ), icoffa = mod( ja-1, nb_a ),

iarow = indxg2p( ia, mb_a, myrow, rsrc_a, nprow ),


iacol = indxg2p( ja, nb_a, mycol, csrc_a, npcol ),
mpa0 = numroc( m+iroffa, mb_a, myrow, iarow, nprow ),
nqa0 = numroc( n+icoffa, nb_a, mycol, iacol, npcol ).
indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol, nprow,
and npcol can be determined by calling the function blacs_gridinfo.

If lwork = -1, then lwork is global input and a workspace query is


assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

a On exit, this array contains the local pieces of the m-by-n distributed matrix
Q.

work On exit, work[0] returns the minimal and optimal lwork.

info (local)
= 0: successful exit
< 0: if the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value,
then info = - (i*100 +j),

if the i-th argument is a scalar and had an illegal value,


then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.


p?orm2l/p?unm2l
Multiplies a general matrix by the orthogonal/unitary
matrix from a QL factorization determined by p?geqlf
(unblocked algorithm).

Syntax
void psorm2l (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *tau , float *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , float *work , MKL_INT *lwork , MKL_INT *info );
void pdorm2l (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , double
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *tau , double *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , double *work , MKL_INT *lwork , MKL_INT *info );
void pcunm2l (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau ,
MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzunm2l (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau ,
MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?orm2l/p?unm2l function overwrites the general real/complex m-by-n distributed matrix sub
(C)=C(ic:ic+m-1,jc:jc+n-1) with

Q*sub(C) if side = 'L' and trans = 'N', or

QT*sub(C) / QH*sub(C) if side = 'L' and trans = 'T' (for real flavors) or trans = 'C' (for complex
flavors), or
sub(C)*Q if side = 'R' and trans = 'N', or

sub(C)*QT / sub(C)*QH if side = 'R' and trans = 'T' (for real flavors) or trans = 'C' (for complex
flavors).
where Q is a real orthogonal or complex unitary distributed matrix defined as the product of k elementary
reflectors
Q = H(k)*...*H(2)*H(1) as returned by p?geqlf . Q is of order m if side = 'L' and of order n if side =
'R'.

Input Parameters

side (global)
= 'L': apply Q or QT for real flavors (QH for complex flavors) from the left,

= 'R': apply Q or QT for real flavors (QH for complex flavors) from the
right.

trans (global)

= 'N': apply Q (no transpose)

= 'T': apply QT (transpose, for real flavors)

= 'C': apply QH (conjugate transpose, for complex flavors)

m (global)
The number of rows in the distributed matrix sub(C). m ≥ 0.

n (global)
The number of columns in the distributed matrix sub(C). n≥ 0.

k (global)
The number of elementary reflectors whose product defines the matrix Q.
If side = 'L', m ≥ k ≥ 0;

if side = 'R', n ≥k ≥ 0.

a (local)
Pointer into the local memory to an array of size lld_a * LOCc(ja+k-1).
On entry, the j-th column of the matrix stored in a must contain the vector
that defines the elementary reflector H(j), ja≤j≤ja+k-1, as returned by
p?geqlf in the k columns of its distributed matrix argument A(ia:*,ja:ja
+k-1). The argument A(ia:*,ja:ja+k-1) is modified by the function but
restored on exit.
If side = 'L', lld_a ≥ max(1, LOCr(ia+m-1)),

if side = 'R', lld_a ≥ max(1, LOCr(ia+n-1)).

ia (global)
The row index in the global matrix A indicating the first row of sub(A).

ja (global)
The column index in the global matrix A indicating the first column of
sub(A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ja+n-1). tau[j] contains the scalar factor of the
elementary reflector H(j+1), j = 0, 1, ..., LOCc(ja+n-1)-1, as returned by
p?geqlf. This array is tied to the distributed matrix A.

c (local)
Pointer into the local memory to an array of size lld_c * LOCc(jc+n-1).On
entry, the local pieces of the distributed matrix sub (C).

ic (global)
The row index in the global matrix C indicating the first row of sub(C).

jc (global)


The column index in the global matrix C indicating the first column of
sub(C).

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local)
Workspace array of size lwork.
On exit, work[0] returns the minimal and optimal lwork.

lwork (local or global)


The size of the array work.
lwork is local input and must be at least
if side = 'L', lwork≥mpc0 + max(1, nqc0),

if side = 'R', lwork≥nqc0 + max(max(1, mpc0), numroc(numroc(n


+icoffc, nb_a, 0, 0, npcol), nb_a, 0, 0, lcmq)),
where

lcmq = lcm/npcol,
lcm = ilcm(nprow, npcol),
iroffc = mod(ic-1, mb_c),
icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, myrow, rsrc_c, nprow),
iccol = indxg2p(jc, nb_c, mycol, csrc_c, npcol),
mpc0 = numroc(m+iroffc, mb_c, myrow, icrow, nprow),
nqc0 = numroc(n+icoffc, nb_c, mycol, iccol, npcol),
ilcm, indxg2p, and numroc are ScaLAPACK tool functions; myrow, mycol,
nprow, and npcol can be determined by calling the function
blacs_gridinfo.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

c On exit, c is overwritten by Q*sub(C), or QT*sub(C)/ QH*sub(C), or


sub(C)*Q, or sub(C)*QT / sub(C)*QH

work On exit, work[0] returns the minimal and optimal lwork.

info (local)
= 0: successful exit
< 0: if the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value,
then info = - (i*100 +j),

if the i-th argument is a scalar and had an illegal value,
then info = -i.

NOTE
The distributed submatrices A(ia:*, ja:*) and C(ic:ic+m-1,jc:jc+n-1) must verify some
alignment properties, namely the following expressions should be true:
If side = 'L', ( mb_a == mb_c && iroffa == iroffc && iarow == icrow )

If side = 'R', ( mb_a == nb_c && iroffa == iroffc ).

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?orm2r/p?unm2r
Multiplies a general matrix by the orthogonal/unitary
matrix from a QR factorization determined by p?
geqrf (unblocked algorithm).

Syntax
void psorm2r (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *tau , float *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , float *work , MKL_INT *lwork , MKL_INT *info );
void pdorm2r (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , double
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *tau , double *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , double *work , MKL_INT *lwork , MKL_INT *info );
void pcunm2r (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau ,
MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzunm2r (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau ,
MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?orm2r/p?unm2r function overwrites the general real/complex m-by-n distributed matrix sub
(C)=C(ic:ic+m-1, jc:jc+n-1) with
Q*sub(C) if side = 'L' and trans = 'N', or

QT*sub(C) / QH*sub(C) if side = 'L' and trans = 'T' (for real flavors) or trans = 'C' (for complex
flavors), or
sub(C)*Q if side = 'R' and trans = 'N', or

sub(C)*QT / sub(C)*QH if side = 'R' and trans = 'T' (for real flavors) or trans = 'C' (for complex
flavors).


where Q is a real orthogonal or complex unitary matrix defined as the product of k elementary reflectors

Q = H(k)*...*H(2)*H(1) as returned by p?geqrf . Q is of order m if side = 'L' and of order n if side =


'R'.

Input Parameters

side (global)
= 'L': apply Q or QT for real flavors (QH for complex flavors) from the left,

= 'R': apply Q or QT for real flavors (QH for complex flavors) from the
right.

trans (global)
= 'N': apply Q (no transpose)

= 'T': apply QT (transpose, for real flavors)

= 'C': apply QH (conjugate transpose, for complex flavors)

m (global)
The number of rows in the distributed matrix sub(C). m≥ 0.

n (global)
The number of columns in the distributed matrix sub(C). n ≥ 0.

k (global)
The number of elementary reflectors whose product defines the matrix Q.
If side = 'L', m≥ k ≥ 0;

if side = 'R', n ≥k ≥ 0.

a (local)
Pointer into the local memory to an array of size lld_a * LOCc(ja+k-1).

On entry, the j-th column of the matrix stored in a must contain the vector
that defines the elementary reflector H(j), ja≤j≤ja+k-1, as returned by
p?geqrf in the k columns of its distributed matrix argument A(ia:*,ja:ja
+k-1). The argument A(ia:*,ja:ja+k-1) is modified by the function but
restored on exit.
If side = 'L', lld_a ≥ max(1, LOCr(ia+m-1)),

if side = 'R', lld_a ≥ max(1, LOCr(ia+n-1)).

ia (global)
The row index in the global matrix A indicating the first row of sub(A).

ja (global)
The column index in the global matrix A indicating the first column of
sub(A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)

Array of size LOCc(ja+k-1). tau[j] contains the scalar factor of the
elementary reflector H(j+1), j = 0, 1, ..., LOCc(ja+k-1)-1, as returned by
p?geqrf. This array is tied to the distributed matrix A.

c (local)
Pointer into the local memory to an array of size lld_c * LOCc(jc+n-1).

On entry, the local pieces of the distributed matrix sub (C).

ic (global)
The row index in the global matrix C indicating the first row of sub(C).

jc (global)
The column index in the global matrix C indicating the first column of
sub(C).

descc (global and local) array of size dlen_.


The array descriptor for the distributed matrix C.

work (local)
Workspace array of size lwork.

lwork (local or global)


The size of the array work.
lwork is local input and must be at least
if side = 'L', lwork≥ mpc0 + max(1, nqc0),

if side = 'R', lwork≥nqc0 + max(max(1, mpc0), numroc(numroc(n


+icoffc, nb_a, 0, 0, npcol), nb_a, 0, 0, lcmq)),
where

lcmq = lcm/npcol,
lcm = ilcm(nprow, npcol),
iroffc = mod(ic-1, mb_c),
icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, myrow, rsrc_c, nprow),
iccol = indxg2p(jc, nb_c, mycol, csrc_c, npcol),
mpc0 = numroc(m+iroffc, mb_c, myrow, icrow, nprow),
nqc0 = numroc(n+icoffc, nb_c, mycol, iccol, npcol),
ilcm, indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol,
nprow, and npcol can be determined by calling the function
blacs_gridinfo.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.


Output Parameters

c On exit, c is overwritten by Q*sub(C), or QT*sub(C)/ QH*sub(C), or


sub(C)*Q, or sub(C)*QT / sub(C)*QH

work On exit, work[0] returns the minimal and optimal lwork.

info (local)
= 0: successful exit
< 0: if the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value,
then info = - (i*100 +j),

if the i-th argument is a scalar and had an illegal value,


then info = -i.

NOTE
The distributed submatrices A(ia:*, ja:*) and C(ic:ic+m-1, jc:jc+n-1) must verify some
alignment properties, namely the following expressions should be true:
If side = 'L', (mb_a == mb_c) && (iroffa == iroffc) && (iarow == icrow).

If side = 'R', (mb_a == nb_c) && (iroffa == iroffc).

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
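
For example, after a QR factorization computed by pdgeqrf, the reflectors held in a and tau can be used to apply QT to a distributed matrix sub(C) from the left. The sketch below assumes the grid, descriptors, and distributed data are already in place; the wrapper name apply_qt_from_qr is hypothetical and the workspace is obtained with the lwork = -1 query described above.

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Sketch only: form QT * sub(C) using reflectors produced by a prior p?geqrf.
   All distributed arrays and descriptors are assumed to be set up already. */
void apply_qt_from_qr(MKL_INT m, MKL_INT n, MKL_INT k,
                      double *a, MKL_INT ia, MKL_INT ja, MKL_INT *desca,
                      double *tau,
                      double *c, MKL_INT ic, MKL_INT jc, MKL_INT *descc,
                      MKL_INT *info)
{
    char side = 'L', trans = 'T';        /* apply QT from the left */
    double query;
    MKL_INT lwork = -1;                  /* workspace query first */
    pdorm2r(&side, &trans, &m, &n, &k, a, &ia, &ja, desca, tau,
            c, &ic, &jc, descc, &query, &lwork, info);

    lwork = (MKL_INT)query;
    double *work = (double *)malloc((size_t)lwork * sizeof(double));
    pdorm2r(&side, &trans, &m, &n, &k, a, &ia, &ja, desca, tau,
            c, &ic, &jc, descc, work, &lwork, info);
    free(work);
}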

p?orml2/p?unml2
Multiplies a general matrix by the orthogonal/unitary
matrix from an LQ factorization determined by p?
gelqf (unblocked algorithm).

Syntax
void psorml2 (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *tau , float *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , float *work , MKL_INT *lwork , MKL_INT *info );
void pdorml2 (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , double
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *tau , double *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , double *work , MKL_INT *lwork , MKL_INT *info );
void pcunml2 (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau ,
MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzunml2 (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau ,
MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?orml2/p?unml2 function overwrites the general real/complex m-by-n distributed matrix sub
(C)=C(ic:ic+m-1, jc:jc+n-1) with

Q*sub(C) if side = 'L' and trans = 'N', or

QT*sub(C) / QH*sub(C) if side = 'L' and trans = 'T' (for real flavors) or trans = 'C' (for complex
flavors), or
sub(C)*Q if side = 'R' and trans = 'N', or

sub(C)*QT / sub(C)*QH if side = 'R' and trans = 'T' (for real flavors) or trans = 'C' (for complex
flavors).
where Q is a real orthogonal or complex unitary distributed matrix defined as the product of k elementary
reflectors
Q = H(k)*...*H(2)*H(1) (for real flavors)
Q = (H(k))H*...*(H(2))H*(H(1))H (for complex flavors)
as returned by p?gelqf . Q is of order m if side = 'L' and of order n if side = 'R'.

Input Parameters

side (global)
= 'L': apply Q or QT for real flavors (QH for complex flavors) from the left,

= 'R': apply Q or QT for real flavors (QH for complex flavors) from the
right.

trans (global)
= 'N': apply Q (no transpose)

= 'T': apply QT (transpose, for real flavors)

= 'C': apply QH (conjugate transpose, for complex flavors)

m (global)
The number of rows in the distributed matrix sub(C). m≥ 0.

n (global)
The number of columns in the distributed matrix sub(C). n≥ 0.

k (global)
The number of elementary reflectors whose product defines the matrix Q.
If side = 'L', m ≥ k≥ 0;

if side = 'R', n ≥ k≥ 0.

a (local)
Pointer into the local memory to an array of size
lld_a * LOCc(ja+m-1) if side='L',

lld_a * LOCc(ja+n-1) if side='R',


where lld_a≥ max (1, LOCr(ia+k-1)).

On entry, the i-th row of the matrix stored in a must contain the vector that
defines the elementary reflector H(i), ia≤ i≤ ia+k-1, as returned by
p?gelqf in the k rows of its distributed matrix argument A(ia:ia+k-1,
ja:*). The argument A(ia:ia+k-1, ja:*) is modified by the function but
restored on exit.

ia (global)
The row index in the global matrix A indicating the first row of sub(A).

ja (global)
The column index in the global matrix A indicating the first column of
sub(A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ia+k-1). tau[i] contains the scalar factor of the
elementary reflector H(i+1), i = 0, 1, ..., LOCc(ja+k-1)-1, as returned by
p?gelqf. This array is tied to the distributed matrix A.

c (local)
Pointer into the local memory to an array of size lld_c * LOCc(jc+n-1). On
entry, the local pieces of the distributed matrix sub (C).

ic (global)
The row index in the global matrix C indicating the first row of sub(C).

jc (global)
The column index in the global matrix C indicating the first column of
sub(C).

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local)
Workspace array of size lwork.

lwork (local or global)


The size of the array work.
lwork is local input and must be at least
if side = 'L', lwork≥mpc0 + max(max(1, nqc0), numroc(numroc(m
+iroffc, mb_a, 0, 0, nprow), mb_a, 0, 0, lcmp)),
if side = 'R', lwork≥ nqc0 + max(1, mpc0),

where
lcmp = lcm / nprow,
lcm = ilcm(nprow, npcol),

iroffc = mod(ic-1, mb_c),
icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, myrow, rsrc_c, nprow),
iccol = indxg2p(jc, nb_c, mycol, csrc_c, npcol),
mpc0 = numroc(m+iroffc, mb_c, myrow, icrow, nprow),
nqc0 = numroc(n+icoffc, nb_c, mycol, iccol, npcol),
ilcm, indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol,
nprow, and npcol can be determined by calling the function
blacs_gridinfo.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

c On exit, c is overwritten by Q*sub(C), or QT*sub(C)/ QH*sub(C), or


sub(C)*Q, or sub(C)*QT / sub(C)*QH

work On exit, work[0] returns the minimal and optimal lwork.

info (local)
= 0: successful exit
< 0: if the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value,
then info = - (i*100 +j),

if the i-th argument is a scalar and had an illegal value,


then info = -i.

NOTE
The distributed submatrices A(ia:*, ja:*) and C(ic:ic+m-1, jc:jc+n-1) must verify some
alignment properties, namely the following expressions should be true:
If side = 'L', (nb_a == mb_c && icoffa == iroffc)

If side = 'R', (nb_a == nb_c && icoffa == icoffc && iacol == iccol).

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?ormr2/p?unmr2
Multiplies a general matrix by the orthogonal/unitary
matrix from an RQ factorization determined by p?
gerqf (unblocked algorithm).


Syntax
void psormr2 (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , float
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *tau , float *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , float *work , MKL_INT *lwork , MKL_INT *info );
void pdormr2 (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k , double
*a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *tau , double *c , MKL_INT
*ic , MKL_INT *jc , MKL_INT *descc , double *work , MKL_INT *lwork , MKL_INT *info );
void pcunmr2 (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *tau ,
MKL_Complex8 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex8 *work ,
MKL_INT *lwork , MKL_INT *info );
void pzunmr2 (char *side , char *trans , MKL_INT *m , MKL_INT *n , MKL_INT *k ,
MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *tau ,
MKL_Complex16 *c , MKL_INT *ic , MKL_INT *jc , MKL_INT *descc , MKL_Complex16 *work ,
MKL_INT *lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?ormr2/p?unmr2 function overwrites the general real/complex m-by-n distributed matrix sub
(C)=C(ic:ic+m-1, jc:jc+n-1) with

Q*sub(C) if side = 'L' and trans = 'N', or


QT*sub(C) / QH*sub(C) if side = 'L' and trans = 'T' (for real flavors) or trans = 'C' (for complex
flavors), or
sub(C)*Q if side = 'R' and trans = 'N', or

sub(C)*QT / sub(C)*QH if side = 'R' and trans = 'T' (for real flavors) or trans = 'C' (for complex
flavors).
where Q is a real orthogonal or complex unitary distributed matrix defined as the product of k elementary
reflectors
Q = H(1)*H(2)*...*H(k) (for real flavors)
Q = (H(1))H*(H(2))H*...*(H(k))H (for complex flavors)
as returned by p?gerqf . Q is of order m if side = 'L' and of order n if side = 'R'.

Input Parameters

side (global)
= 'L': apply Q or QT for real flavors (QH for complex flavors) from the left,

= 'R': apply Q or QT for real flavors (QH for complex flavors) from the
right.

trans (global)
= 'N': apply Q (no transpose)

= 'T': apply QT (transpose, for real flavors)

= 'C': apply QH(conjugate transpose, for complex flavors)

m (global)
The number of rows in the distributed matrix sub(C). m≥ 0.

n (global)
The number of columns in the distributed matrix sub(C). n≥ 0.

k (global)
The number of elementary reflectors whose product defines the matrix Q.
If side = 'L', m≥k ≥ 0;

if side = 'R', n≥ k≥ 0.

a (local)
Pointer into the local memory to an array of size
lld_a * LOCc(ja+m-1) if side='L',

lld_a * LOCc(ja+n-1) if side='R',

where lld_a≥ max (1, LOCr(ia+k-1)).

On entry, the i-th row of the matrix stored in a must contain the vector that
defines the elementary reflector H(i), ia ≤i≤ ia+k-1, as returned by
p?gerqf in the k rows of its distributed matrix argument A(ia:ia+k-1,
ja:*).
The argument A(ia:ia+k-1, ja:*) is modified by the function but
restored on exit.

ia (global)
The row index in the global matrix A indicating the first row of sub(A).

ja (global)
The column index in the global matrix A indicating the first column of
sub(A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

tau (local)
Array of size LOCc(ia+k-1). tau[j] contains the scalar factor of the
elementary reflector H(j+1), j = 0, 1, ..., LOCc(ja+k-1)-1, as returned by
p?gerqf. This array is tied to the distributed matrix A.

c (local)
Pointer into the local memory to an array of size lld_c * LOCc(jc+n-1). On
entry, the local pieces of the distributed matrix sub (C).

ic (global)
The row index in the global matrix C indicating the first row of sub(C).

jc (global)


The column index in the global matrix C indicating the first column of
sub(C).

descc (global and local) array of size dlen_. The array descriptor for the
distributed matrix C.

work (local)
Workspace array of size lwork.

lwork (local or global)


The size of the array work.
lwork is local input and must be at least
if side = 'L', lwork≥ mpc0 + max(max(1, nqc0), numroc(numroc(m
+iroffc, mb_a, 0, 0, nprow), mb_a, 0, 0, lcmp)),
if side = 'R', lwork ≥ nqc0 + max(1, mpc0),

where lcmp = lcm/nprow,

lcm = ilcm(nprow, npcol),


iroffc = mod(ic-1, mb_c),
icoffc = mod(jc-1, nb_c),
icrow = indxg2p(ic, mb_c, myrow, rsrc_c, nprow),
iccol = indxg2p(jc, nb_c, mycol, csrc_c, npcol),
mpc0 = numroc(m+iroffc, mb_c, myrow, icrow, nprow),
nqc0 = numroc(n+icoffc, nb_c, mycol, iccol, npcol),
ilcm, indxg2p and numroc are ScaLAPACK tool functions; myrow, mycol,
nprow, and npcol can be determined by calling the function
blacs_gridinfo.
If lwork = -1, then lwork is global input and a workspace query is
assumed; the function only calculates the minimum and optimal size for all
work arrays. Each of these values is returned in the first entry of the
corresponding work array, and no error message is issued by pxerbla.

Output Parameters

c On exit, c is overwritten by Q*sub(C), or QT*sub(C)/ QH*sub(C), or


sub(C)*Q, or sub(C)*QT / sub(C)*QH

work On exit, work[0] returns the minimal and optimal lwork.

info (local)
= 0: successful exit
< 0: if the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value,
then info = - (i*100 +j),

if the i-th argument is a scalar and had an illegal value,


then info = -i.

NOTE
The distributed submatrices A(ia:*, ja:*) and C(ic:ic+m-1,jc:jc+n-1) must verify some
alignment properties, namely the following expressions should be true:
If side = 'L', (nb_a == mb_c) && (icoffa == iroffc).

If side = 'R', (nb_a == nb_c) && (icoffa == icoffc) && (iacol == iccol).

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?pbtrsv
Solves a single triangular linear system via frontsolve
or backsolve where the triangular matrix is a factor of
a banded matrix computed by p?pbtrf.

Syntax
void pspbtrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *bw , MKL_INT *nrhs ,
float *a , MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *descb ,
float *af , MKL_INT *laf , float *work , MKL_INT *lwork , MKL_INT *info );
void pdpbtrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *bw , MKL_INT *nrhs ,
double *a , MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *descb ,
double *af , MKL_INT *laf , double *work , MKL_INT *lwork , MKL_INT *info );
void pcpbtrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *bw , MKL_INT *nrhs ,
MKL_Complex8 *a , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib ,
MKL_INT *descb , MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzpbtrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *bw , MKL_INT *nrhs ,
MKL_Complex16 *a , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib ,
MKL_INT *descb , MKL_Complex16 *af , MKL_INT *laf , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?pbtrsv function solves a banded triangular system of linear equations

A(1:n, ja:ja+n-1)*X = B(jb:jb+n-1, 1:nrhs)

or
A(1:n, ja:ja+n-1)T*X = B(jb:jb+n-1, 1:nrhs) for real flavors,

A(1:n, ja:ja+n-1)H*X = B(jb:jb+n-1, 1:nrhs) for complex flavors,

where A(1:n, ja:ja+n-1) is a banded triangular matrix factor produced by the Cholesky factorization code
p?pbtrf and is stored in A(1:n, ja:ja+n-1) and af. The matrix stored in A(1:n, ja:ja+n-1) is either
upper or lower triangular according to uplo.

The function p?pbtrf must be called first.


Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

uplo (global) Must be 'U' or 'L'.

If uplo = 'U', upper triangle of A(1:n, ja:ja+n-1) is stored;

If uplo = 'L', lower triangle of A(1:n, ja:ja+n-1) is stored.

trans (global) Must be 'N' or 'T' or 'C'.

If trans = 'N', solve with A(1:n, ja:ja+n-1);

If trans = 'T' or 'C' for real flavors, solve with A(1:n, ja:ja+n-1)T.

If trans = 'C' for complex flavors, solve with the conjugate transpose
(A(1:n, ja:ja+n-1))H.

n (global)
The number of rows and columns to be operated on, that is, the order of
the distributed submatrix A(1:n, ja:ja+n-1). n ≥ 0.

bw (global)
The number of subdiagonals in 'L' or 'U', 0 ≤bw≤n-1.

nrhs (global)
The number of right hand sides; the number of columns of the distributed
submatrix B(jb:jb+n-1, 1:nrhs); nrhs≥ 0.

a (local)
Pointer into the local memory to an array with the first size lld_a ≥ (bw
+1), stored in desca.
On entry, this array contains the local pieces of the n-by-n symmetric
banded distributed Cholesky factor L or LT of A(1:n, ja:ja+n-1).

This local portion is stored in the packed banded format used in LAPACK.
See the Application Notes below and the ScaLAPACK manual for more detail
on the format of distributed matrices.

ja (global) The index in the global matrix A that points to the
start of the matrix to be operated on (which may be either all of A or a
submatrix of A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

If 1D type (dtype_a = 501), then dlen≥ 7;

If 2D type (dtype_a = 1), then dlen≥ 9.

Contains information on mapping of A to memory. (See ScaLAPACK manual


for full description and options.)

b (local)
Pointer into the local memory to an array of local lead size lld_b≥nb.

On entry, this array contains the local pieces of the right hand sides
B(jb:jb+n-1, 1:nrhs).

ib (global) The row index in the global matrix B that points to the first row of
the matrix to be operated on (which may be either all of B or a submatrix of
B).

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
If 1D type (dtype_b = 502), then dlen≥ 7;

If 2D type (dtype_b = 1), then dlen≥ 9.

Contains information on mapping of B to memory. See the ScaLAPACK
manual for full description and options.

laf (local)
The size of user-input auxiliary fill-in space af. Must be laf ≥ (nb
+2*bw)*bw . If laf is not large enough, an error code will be returned and
the minimum acceptable size will be returned in af[0].

work (local)
The array work is a temporary workspace array of size lwork. This space
may be overwritten in between function calls.

lwork (local or global) The size of the user-input workspace work, must be at
least lwork≥bw*nrhs. If lwork is too small, the minimal acceptable size
will be returned in work[0] and an error code is returned.

Output Parameters

af (local)
The array af is of size laf. It contains auxiliary fill-in space. The fill-in
space is created in a call to the factorization function p?pbtrf and is stored
in af. If a linear system is to be solved using p?pbtrs after the
factorization function, af must not be altered after the factorization.

b On exit, this array contains the local piece of the solutions distributed
matrix X.

work[0] On exit, work[0] contains the minimum value of lwork.

info (local)
= 0: successful exit


< 0: if the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value,
then info = - (i*100 +j),

if the i-th argument is a scalar and had an illegal value,


then info = -i.

Application Notes
If the factorization function and the solve function are to be called separately to solve various sets of right-
hand sides using the same coefficient matrix, the auxiliary space af must not be altered between calls to the
factorization function and the solve function.
The best algorithm for solving banded and tridiagonal linear systems depends on a variety of parameters,
especially the bandwidth. Currently, only algorithms designed for the case N/P>>bw are implemented. These
algorithms go by many names, including Divide and Conquer, Partitioning, domain decomposition-type, etc.

The Divide and Conquer algorithm assumes the matrix is narrowly banded compared with the number of
equations. In this situation, it is best to distribute the input matrix A one-dimensionally, with columns atomic
and rows divided amongst the processes. The basic algorithm divides the banded matrix up into P pieces with
one stored on each processor, and then proceeds in 2 phases for the factorization or 3 for the solution of a
linear system.

1. Local Phase : The individual pieces are factored independently and in parallel. These factors are
applied to the matrix creating fill-in, which is stored in a non-inspectable way in auxiliary space af.
Mathematically, this is equivalent to reordering the matrix A as PAPT and then factoring the principal
leading submatrix of size equal to the sum of the sizes of the matrices factored on each processor. The
factors of these submatrices overwrite the corresponding parts of A in memory.
2. Reduced System Phase : A small (bw*(P-1)) system is formed representing interaction of the larger
blocks and is stored (as are its factors) in the space af. A parallel Block Cyclic Reduction algorithm is
used. For a linear system, a parallel front solve followed by an analogous backsolve, both using the
structure of the factored matrix, are performed.
3. Back Substitution Phase: For a linear system, a local backsubstitution is performed on each processor
in parallel.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
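
Because p?pbtrsv solves with a single triangular factor, solving the full system A*X = B with the Cholesky factorization A = L*LT computed by p?pbtrf takes a frontsolve followed by a backsolve. A minimal sketch for the real double-precision case is below; it assumes the factorization call has already filled a and af, that desca and descb are the descriptors used there, that lwork ≥ bw*nrhs, and the wrapper name banded_cholesky_solve is hypothetical.

#include "mkl_scalapack.h"

/* Sketch only: solve A*X = B using the banded Cholesky factor previously
   computed by p?pbtrf with uplo = 'L'. a, af, b, and the descriptors must be
   the ones used in, or produced by, the factorization call. */
void banded_cholesky_solve(MKL_INT n, MKL_INT bw, MKL_INT nrhs,
                           double *a, MKL_INT ja, MKL_INT *desca,
                           double *b, MKL_INT ib, MKL_INT *descb,
                           double *af, MKL_INT laf,
                           double *work, MKL_INT lwork, MKL_INT *info)
{
    char uplo = 'L', trans;

    trans = 'N';   /* frontsolve: L * Y = B */
    pdpbtrsv(&uplo, &trans, &n, &bw, &nrhs, a, &ja, desca,
             b, &ib, descb, af, &laf, work, &lwork, info);
    if (*info != 0) return;

    trans = 'T';   /* backsolve: LT * X = Y */
    pdpbtrsv(&uplo, &trans, &n, &bw, &nrhs, a, &ja, desca,
             b, &ib, descb, af, &laf, work, &lwork, info);
}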

p?pttrsv
Solves a single triangular linear system via frontsolve
or backsolve where the triangular matrix is a factor of
a tridiagonal matrix computed by p?pttrf .

Syntax
void pspttrsv (char *uplo , MKL_INT *n , MKL_INT *nrhs , float *d , float *e , MKL_INT
*ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *descb , float *af , MKL_INT
*laf , float *work , MKL_INT *lwork , MKL_INT *info );
void pdpttrsv (char *uplo , MKL_INT *n , MKL_INT *nrhs , double *d , double *e ,
MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *descb , double *af ,
MKL_INT *laf , double *work , MKL_INT *lwork , MKL_INT *info );

void pcpttrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *nrhs , float *d ,
MKL_Complex8 *e , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib ,
MKL_INT *descb , MKL_Complex8 *af , MKL_INT *laf , MKL_Complex8 *work , MKL_INT
*lwork , MKL_INT *info );
void pzpttrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *nrhs , double *d ,
MKL_Complex16 *e , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib ,
MKL_INT *descb , MKL_Complex16 *af , MKL_INT *laf , MKL_Complex16 *work , MKL_INT
*lwork , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?pttrsv function solves a tridiagonal triangular system of linear equations

A(1:n, ja:ja+n-1)*X = B(jb:jb+n-1, 1:nrhs)

or
A(1:n, ja:ja+n-1)T*X = B(jb:jb+n-1, 1:nrhs) for real flavors,

A(1:n, ja:ja+n-1)H*X = B(jb:jb+n-1, 1:nrhs) for complex flavors,

where A(1:n, ja:ja+n-1) is a tridiagonal triangular matrix factor produced by the Cholesky factorization
code p?pttrf and is stored in A(1:n, ja:ja+n-1) and af. The matrix stored in A(1:n, ja:ja+n-1) is
either upper or lower triangular according to uplo.

The function p?pttrf must be called first.

Input Parameters

uplo (global) Must be 'U' or 'L'.

If uplo = 'U', upper triangle of A(1:n, ja:ja+n-1) is stored;

If uplo = 'L', lower triangle of A(1:n, ja:ja+n-1) is stored.

trans (global) Must be 'N' or 'C'.

If trans = 'N', solve with A(1:n, ja:ja+n-1);

If trans = 'C' (for complex flavors), solve with conjugate transpose


(A(1:n, ja:ja+n-1))H.

n (global)
The number of rows and columns to be operated on, that is, the order of
the distributed submatrix A(1:n, ja:ja+n-1). n ≥ 0.

nrhs (global)
The number of right hand sides; the number of columns of the distributed
submatrix B(jb:jb+n-1, 1:nrhs); nrhs≥ 0.

d (local)
Pointer to the local part of the global vector storing the main diagonal of the
matrix; must be of size ≥nb_a.

e (local)


Pointer to the local part of the global vector du storing the upper diagonal of
the matrix; must be of size ≥nb_a. Globally, du(n) is not referenced, and du
must be aligned with d.

ja (global) The index in the global matrix A that points to the start of the
matrix to be operated on (which may be either all of A or a submatrix of A).

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
If 1D type (dtype_a = 501 or 502), then dlen ≥ 7;

If 2D type (dtype_a = 1), then dlen ≥ 9.

Contains information on mapping of A to memory. See ScaLAPACK manual


for full description and options.

b (local)
Pointer into the local memory to an array of local lead size lld_b ≥ nb.

On entry, this array contains the local pieces of the right hand sides
B(jb:jb+n-1, 1:nrhs).

ib (global) The row index in the global matrix B that points to the first row of
the matrix to be operated on (which may be either all of B or a submatrix of
B).

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
If 1D type (dtype_b = 502), then dlen ≥ 7;

If 2D type (dtype_b = 1), then dlen ≥ 9.

Contains information on mapping of B to memory. See ScaLAPACK manual


for full description and options.

laf (local)
The size of user-input auxiliary fill-in space af. Must be laf ≥ (nb
+2*bw)*bw.
If laf is not large enough, an error code will be returned and the minimum
acceptable size will be returned in af[0].

work (local)
The array work is a temporary workspace array of size lwork. This space
may be overwritten in between function calls.

lwork (local or global) The size of the user-input workspace work, must be at
least lwork≥(10+2*min(100, nrhs))*npcol+4*nrhs. If lwork is too
small, the minimal acceptable size will be returned in work[0] and an error
code is returned.

Output Parameters

d, e (local).
On exit, these arrays contain information on the factors of the matrix.

af (local)
The array af is of size laf. It contains auxiliary fill-in space. The fill-in
space is created in a call to the factorization function p?pttrf and is stored
in af. If a linear system is to be solved using p?pttrs after the
factorization function, af must not be altered after the factorization.

b On exit, this array contains the local piece of the solutions distributed
matrix X.

work[0] On exit, work[0] contains the minimum value of lwork.

info (local)
= 0: successful exit
< 0: if the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value,
then info = - (i*100 +j),

if the i-th argument is a scalar and had an illegal value,


then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?potf2
Computes the Cholesky factorization of a symmetric/
Hermitian positive definite matrix (local unblocked
algorithm).

Syntax
void pspotf2 (char *uplo , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , MKL_INT *info );
void pdpotf2 (char *uplo , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *info );
void pcpotf2 (char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *info );
void pzpotf2 (char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?potf2 function computes the Cholesky factorization of a real symmetric or complex Hermitian positive
definite distributed matrix sub (A)=A(ia:ia+n-1, ja:ja+n-1).

The factorization has the form


sub(A) = U'*U, if uplo = 'U', or sub(A) = L*L', if uplo = 'L',

where U is an upper triangular matrix, L is lower triangular. X' denotes transpose (conjugate transpose) of X.


Input Parameters

uplo (global)
Specifies whether the upper or lower triangular part of the symmetric/
Hermitian matrix A is stored.
= 'U': upper triangle of sub (A) is stored;

= 'L': lower triangle of sub (A) is stored.

n (global)
The number of rows and columns to be operated on, that is, the order of
the distributed matrix sub (A). n≥ 0.

a (local)
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1)
containing the local pieces of the n-by-n symmetric distributed matrix
sub(A) to be factored.
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular matrix and the strictly lower triangular part of this
matrix is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular matrix and the strictly upper triangular part of sub(A) is
not referenced.

ia, ja (global)
The row and column indices in the global matrix A indicating the first row
and the first column of the sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

Output Parameters

a (local)
On exit,
if uplo = 'U', the upper triangular part of the distributed matrix contains
the Cholesky factor U;

if uplo = 'L', the lower triangular part of the distributed matrix contains
the Cholesky factor L.

info (local)
= 0: successful exit
< 0: if the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value,
then info = - (i*100 +j),

if the i-th argument is a scalar and had an illegal value,


then info = -i.

> 0: if info = k, the leading minor of order k is not positive definite, and
the factorization could not be completed.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
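
The sketch below shows a pdpotf2 call on an already-distributed matrix and checks the failure case reported through info > 0. The wrapper name local_cholesky is hypothetical, and the BLACS grid and descriptor setup are assumed to have been done elsewhere.

#include <stdio.h>
#include "mkl_scalapack.h"

/* Sketch only: unblocked Cholesky factorization of sub(A) = A(ia:ia+n-1, ja:ja+n-1).
   The distributed matrix a and its descriptor desca are assumed to be set up. */
void local_cholesky(MKL_INT n, double *a, MKL_INT ia, MKL_INT ja, MKL_INT *desca)
{
    char uplo = 'L';
    MKL_INT info;
    pdpotf2(&uplo, &n, a, &ia, &ja, desca, &info);
    if (info > 0)
        printf("leading minor of order %lld is not positive definite\n",
               (long long)info);
}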

p?rot
Applies a planar rotation to two distributed vectors.

Syntax
void psrot(MKL_INT* n, float* x, MKL_INT* ix, MKL_INT* jx, MKL_INT* descx, MKL_INT*
incx, float* y, MKL_INT* iy, MKL_INT* jy, MKL_INT* descy, MKL_INT* incy, float* cs,
float* sn, float* work, MKL_INT* lwork, MKL_INT* info);
void pdrot(MKL_INT* n, double* x, MKL_INT* ix, MKL_INT* jx, MKL_INT* descx, MKL_INT*
incx, double* y, MKL_INT* iy, MKL_INT* jy, MKL_INT* descy, MKL_INT* incy, double* cs,
double* sn, double* work, MKL_INT* lwork, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
p?rot applies a planar rotation defined by cs and sn to the two distributed vectors sub(x) and sub(y).

Input Parameters

n (global )
The number of elements to operate on when applying the planar rotation to
x and y (n≥0).

x (local) array of size ( (jx-1)*m_x + ix + ( n - 1 )*abs( incx ) )

This array contains the entries of the distributed vector sub( x ).

ix (global )
The global row index of the submatrix of the distributed matrix x to operate
on. If incx = 1, then it is required that ix = iy. 1 ≤ix≤m_x.

jx (global )
The global column index of the submatrix of the distributed matrix x to
operate on. If incx = m_x, then it is required that jx = jy. 1 ≤jx≤n_x.

descx (global and local) array of size 9


The array descriptor of the distributed matrix x.

incx (global )
The global increment for the elements of x. Only two values of incx are
supported in this version, namely 1 and m_x. Moreover, it must hold that
incx = m_x if incy =m_y and that incx = 1 if incy = 1.


y (local) array of size ( (jy-1)*m_y + iy + ( n - 1 )*abs( incy ) )

This array contains the entries of the distributed vector sub( y ).

iy (global )
The global row index of the submatrix of the distributed matrix y to operate
on. If incy = 1, then it is required that iy = ix. 1 ≤iy≤m_y.

jy (global )
The global column index of the submatrix of the distributed matrix y to
operate on. If incy = m_y, then it is required that jy = jx. 1 ≤jy≤n_y.

descy (global and local) array of size 9


The array descriptor of the distributed matrix y.

incy (global )
The global increment for the elements of y. Only two values of incy are
supported in this version, namely 1 and m_y. Moreover, it must hold that
incy = m_y if incx = m_x and that incy = 1 if incx = 1.

cs, sn (global)
The parameters defining the properties of the planar rotation. It must hold
that 0 ≤cs,sn≤ 1 and that sn^2 + cs^2 = 1. The latter condition can only be
checked approximately in finite precision arithmetic.

work (local workspace) array of size lwork

lwork (local )
The length of the workspace array work.

If incx = 1 and incy = 1, then lwork = 2*m_x

If lwork = -1, then a workspace query is assumed; the function only


calculates the optimal size of the work array, returns this value as the first
entry of the work array, and no error message related to lwork is
issued by pxerbla.

Output Parameters

x On exit, the entries of the rotated distributed vector sub(x).
y On exit, the entries of the rotated distributed vector sub(y).
work[0] On exit, if info = 0, work[0] returns the optimal lwork

info (global )
= 0: successful exit
< 0: if info = -i, the i-th argument had an illegal value.

If the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value, then info = -(i*100+j), if the i-th argument is a scalar and
had an illegal value, then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
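
A minimal sketch of applying a plane rotation with pdrot to two distributed column vectors is shown below. It assumes the vectors, their descriptors, and a workspace of the size stated above (2*m_x for unit increments) already exist, that theta lies in [0, pi/2] so that cs and sn satisfy the constraint above, and the wrapper name rotate_columns is hypothetical.

#include <math.h>
#include "mkl_scalapack.h"

/* Sketch only: rotate two distributed column vectors (incx = incy = 1) through
   an angle theta. x, y, the descriptors, and the workspace are assumed set up. */
void rotate_columns(MKL_INT n, double theta,
                    double *x, MKL_INT ix, MKL_INT jx, MKL_INT *descx,
                    double *y, MKL_INT iy, MKL_INT jy, MKL_INT *descy,
                    double *work, MKL_INT lwork, MKL_INT *info)
{
    MKL_INT incx = 1, incy = 1;               /* unit increments require ix == iy */
    double cs = cos(theta), sn = sin(theta);  /* cs^2 + sn^2 = 1 */
    pdrot(&n, x, &ix, &jx, descx, &incx, y, &iy, &jy, descy, &incy,
          &cs, &sn, work, &lwork, info);
}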

p?rscl
Multiplies a vector by the reciprocal of a real scalar.

Syntax
void psrscl (MKL_INT *n , float *sa , float *sx , MKL_INT *ix , MKL_INT *jx , MKL_INT
*descx , MKL_INT *incx );
void pdrscl (MKL_INT *n , double *sa , double *sx , MKL_INT *ix , MKL_INT *jx ,
MKL_INT *descx , MKL_INT *incx );
void pcsrscl (MKL_INT *n , float *sa , MKL_Complex8 *sx , MKL_INT *ix , MKL_INT *jx ,
MKL_INT *descx , MKL_INT *incx );
void pzdrscl (MKL_INT *n , double *sa , MKL_Complex16 *sx , MKL_INT *ix , MKL_INT *jx ,
MKL_INT *descx , MKL_INT *incx );

Include Files
• mkl_scalapack.h

Description
The p?rscl function multiplies an n-element real/complex vector sub(X) by the real scalar 1/a. This is done
without overflow or underflow as long as the final result sub(X)/a does not overflow or underflow.
sub(X) denotes X(ix:ix+n-1, jx:jx), if incx = 1,

and X(ix:ix, jx:jx+n-1), if incx = m_x.

Input Parameters

n (global)
The number of components of the distributed vector sub(X). n≥ 0.

sa The scalar a that is used to divide each component of the vector sub(X).
This parameter must be ≥ 0.

sx Array containing the local pieces of a distributed matrix of size of at least


((jx-1)*m_x + ix + (n-1)*abs(incx)). This array contains the entries
of the distributed vector sub(X).

ix (global) The row index of the submatrix of the distributed matrix X to


operate on.

jx (global)
The column index of the submatrix of the distributed matrix X to operate
on.

descx (global and local)


Array of size 9. The array descriptor for the distributed matrix X.

incx (global)


The increment for the elements of X. This version supports only two values
of incx, namely 1 and m_x.

Output Parameters

sx On exit, the result x/a.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
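
For instance, normalizing a distributed column vector by a previously computed norm can be done with a single pdrscl call. A minimal sketch follows; the vector, its descriptor, and the process grid are assumed to be set up already, and the wrapper name scale_by_reciprocal is hypothetical.

#include "mkl_scalapack.h"

/* Sketch only: scale the distributed column vector sub(X) = X(ix:ix+n-1, jx:jx)
   by 1/a without overflow or underflow. sx and descx are assumed set up. */
void scale_by_reciprocal(MKL_INT n, double a,
                         double *sx, MKL_INT ix, MKL_INT jx, MKL_INT *descx)
{
    MKL_INT incx = 1;    /* column vector: operate on X(ix:ix+n-1, jx:jx) */
    pdrscl(&n, &a, sx, &ix, &jx, descx, &incx);
}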

p?sygs2/p?hegs2
Reduces a symmetric/Hermitian positive-definite
generalized eigenproblem to standard form, using the
factorization results obtained from p?potrf (local
unblocked algorithm).

Syntax
void pssygs2 (MKL_INT *ibtype , char *uplo , MKL_INT *n , float *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb ,
MKL_INT *info );
void pdsygs2 (MKL_INT *ibtype , char *uplo , MKL_INT *n , double *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , MKL_INT *info );
void pchegs2 (MKL_INT *ibtype , char *uplo , MKL_INT *n , MKL_Complex8 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , MKL_INT *info );
void pzhegs2 (MKL_INT *ibtype , char *uplo , MKL_INT *n , MKL_Complex16 *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?sygs2/p?hegs2 function reduces a real symmetric-definite or a complex Hermitian positive-definite
generalized eigenproblem to standard form.
Here sub(A) denotes A(ia:ia+n-1, ja:ja+n-1), and sub(B) denotes B(ib:ib+n-1, jb:jb+n-1).

If ibtype = 1, the problem is

sub(A)*x = λ*sub(B)*x

and sub(A) is overwritten by

inv(UT)*sub(A)*inv(U) or inv(L)*sub(A)*inv(LT) - for real flavors, and


inv(UH)*sub(A)*inv(U) or inv(L)*sub(A)*inv(LH) - for complex flavors.
If ibtype = 2 or 3, the problem is

sub(A)*sub(B)*x = λ*x or sub(B)*sub(A)*x = λ*x

and sub(A) is overwritten by

U*sub(A)*UT or LT*sub(A)*L - for real flavors, and
U*sub(A)*UH or LH*sub(A)*L - for complex flavors.
The matrix sub(B) must have been previously factorized as UT*U or L*LT (for real flavors), or as UH*U or
L*LH (for complex flavors) by p?potrf.

Input Parameters

ibtype (global)
= 1:
compute inv(UT)*sub(A)*inv(U), or inv(L)*sub(A)*inv(LT) for real
functions,
and inv(UH)*sub(A)*inv(U), or inv(L)*sub(A)*inv(LH) for complex
functions;
= 2 or 3:
compute U*sub(A)*UT, or LT*sub(A)*L for real functions,

and U*sub(A)*UH or LH*sub(A)*L for complex functions.

uplo (global)
Specifies whether the upper or lower triangular part of the symmetric/
Hermitian matrix sub(A) is stored, and how sub(B) is factorized.
= 'U': Upper triangular of sub(A) is stored and sub(B) is factorized as UT*U
(for real functions) or as UH*U (for complex functions).
= 'L': Lower triangular of sub(A) is stored and sub(B) is factorized as L*LT
(for real functions) or as L*LH (for complex functions)

n (global)
The order of the matrices sub(A) and sub(B). n≥ 0.

a (local)
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).

On entry, this array contains the local pieces of the n-by-n symmetric/
Hermitian distributed matrix sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular part of the matrix, and the strictly lower triangular part
of sub(A) is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular part of the matrix, and the strictly upper triangular part
of sub(A) is not referenced.

ia, ja (global)
The row and column indices in the global matrix A indicating the first row
and the first column of the sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

b (local)
Pointer into the local memory to an array of size lld_b * LOCc(jb+n-1).


On entry, this array contains the local pieces of the triangular factor from
the Cholesky factorization of sub(B) as returned by p?potrf.

ib, jb (global)
The row and column indices in the global matrix B indicating the first row
and the first column of the sub(B), respectively.

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.

Output Parameters

a (local)
On exit, if info = 0, the transformed matrix is stored in the same format
as sub(A).

info
= 0: successful exit.
< 0: if the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value,
then info = - (i*100+ j),

if the i-th argument is a scalar and had an illegal value,


then info = -i.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
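
A typical use is the first step of a generalized symmetric eigensolver built from lower-level pieces: factor sub(B) with p?potrf, then reduce sub(A) with p?sygs2. The sketch below shows only the reduction call for the double-precision case (ibtype = 1, uplo = 'L'), assuming sub(B) already holds its Cholesky factor; the wrapper name reduce_to_standard_form is hypothetical.

#include "mkl_scalapack.h"

/* Sketch only: reduce sub(A)*x = lambda*sub(B)*x to standard form, assuming
   sub(B) has already been factored as L*LT by p?potrf with uplo = 'L'. */
void reduce_to_standard_form(MKL_INT n,
                             double *a, MKL_INT ia, MKL_INT ja, MKL_INT *desca,
                             double *b, MKL_INT ib, MKL_INT jb, MKL_INT *descb,
                             MKL_INT *info)
{
    MKL_INT ibtype = 1;   /* problem type sub(A)*x = lambda*sub(B)*x */
    char uplo = 'L';
    pdsygs2(&ibtype, &uplo, &n, a, &ia, &ja, desca, b, &ib, &jb, descb, info);
    /* on successful exit (info = 0), a holds inv(L)*sub(A)*inv(LT) */
}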

p?sytd2/p?hetd2
Reduces a symmetric/Hermitian matrix to real
symmetric tridiagonal form by an orthogonal/unitary
similarity transformation (local unblocked algorithm).

Syntax
void pssytd2 (char *uplo, MKL_INT *n, float *a, MKL_INT *ia, MKL_INT *ja, MKL_INT
*desca, float *d, float *e, float *tau, float *work, MKL_INT *lwork, MKL_INT *info);
void pdsytd2 (char *uplo, MKL_INT *n, double *a, MKL_INT *ia, MKL_INT *ja, MKL_INT
*desca, double *d, double *e, double *tau, double *work, MKL_INT *lwork, MKL_INT
*info);
void pchetd2 (char *uplo, MKL_INT *n, MKL_Complex8 *a, MKL_INT *ia, MKL_INT *ja,
MKL_INT *desca, float *d, float *e, MKL_Complex8 *tau, MKL_Complex8 *work, MKL_INT
*lwork, MKL_INT *info);
void pzhetd2 (char *uplo, MKL_INT *n, MKL_Complex16 *a, MKL_INT *ia, MKL_INT *ja,
MKL_INT *desca, double *d, double *e, MKL_Complex16 *tau, MKL_Complex16 *work, MKL_INT
*lwork, MKL_INT *info);

Include Files
• mkl_scalapack.h

Description
The p?sytd2/p?hetd2 function reduces a real symmetric/complex Hermitian matrix sub(A) to symmetric/
Hermitian tridiagonal form T by an orthogonal/unitary similarity transformation:
Q'*sub(A)*Q = T, where sub(A) = A(ia:ia+n-1, ja:ja+n-1).

Input Parameters

uplo (global)
Specifies whether the upper or lower triangular part of the symmetric/
Hermitian matrix sub(A) is stored:
= 'U': upper triangular

= 'L': lower triangular

n (global)
The number of rows and columns to be operated on, that is, the order of
the distributed matrix sub(A). n≥ 0.

a (local)
Pointer into the local memory to an array of size lld_a * LOCc(ja+n-1).

On entry, this array contains the local pieces of the n-by-n symmetric/
Hermitian distributed matrix sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of sub(A) contains
the upper triangular part of the matrix, and the strictly lower triangular part
of sub(A) is not referenced.
If uplo = 'L', the leading n-by-n lower triangular part of sub(A) contains
the lower triangular part of the matrix, and the strictly upper triangular part
of sub(A) is not referenced.

ia, ja (global)
The row and column indices in the global matrix A indicating the first row
and the first column of the sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

work (local)
The array work is a temporary workspace array of size lwork.

Output Parameters

a On exit, if uplo = 'U', the diagonal and first superdiagonal of sub(A) are
overwritten by the corresponding elements of the tridiagonal matrix T, and
the elements above the first superdiagonal, with the array tau, represent
the orthogonal/unitary matrix Q as a product of elementary reflectors;
if uplo = 'L', the diagonal and first subdiagonal of A are overwritten by
the corresponding elements of the tridiagonal matrix T, and the elements
below the first subdiagonal, with the array tau, represent the orthogonal/
unitary matrix Q as a product of elementary reflectors. See the Application
Notes below.

d (local)
Array of size LOCc(ja+n-1). The diagonal elements of the tridiagonal matrix T:
d[i] = A(i+1,i+1), where i=0,1, ..., LOCc(ja+n-1)-1; d is tied to the distributed matrix A.

e (local)
Array of size LOCc(ja+n-1) if uplo = 'U', LOCc(ja+n-2) otherwise.
The off-diagonal elements of the tridiagonal matrix T:
e[i] = A(i+1,i+2) if uplo = 'U',
e[i] = A(i+2,i+1) if uplo = 'L',
where i=0,1, ..., LOCc(ja+n-1)-1.

e is tied to the distributed matrix A.

tau (local)
Array of size LOCc(ja+n-1).

The scalar factors of the elementary reflectors. tau is tied to the distributed
matrix A.

work[0] On exit, work[0] returns the minimal and optimal value of lwork.

lwork (local or global)
The size of the workspace array work. lwork is local input and must satisfy lwork ≥ 3*n.
If lwork = -1, then lwork is global input and a workspace query is assumed; the function only calculates the minimum and optimal size for all work arrays. Each of these values is returned in the first entry of the corresponding work array, and no error message is issued by pxerbla.

info (local)
= 0: successful exit
< 0: if the i-th argument is an array and the j-th entry, indexed j-1, had an illegal value, then info = -(i*100+j);
if the i-th argument is a scalar and had an illegal value, then info = -i.

Application Notes
If uplo = 'U', the matrix Q is represented as a product of elementary reflectors

Q = H(n-1)*...*H(2)*H(1)
Each H(i) has the form

H(i) = I - tau*v*v',

where tau is a real/complex scalar, and v is a real/complex vector with v(i+1:n) = 0 and v(i) = 1;
v(1:i-1) is stored on exit in A(ia:ia+i-2, ja+i), and tau in tau[ja+i-2].
If uplo = 'L', the matrix Q is represented as a product of elementary reflectors

Q = H(1)*H(2)*...*H(n-1).
Each H(i) has the form
H(i) = I - tau*v*v',
where tau is a real/complex scalar, and v is a real/complex vector with v(1:i) = 0 and v(i+1) = 1; v(i+2:n) is stored on exit in A(ia+i+1:ia+n-1, ja+i-1), and tau in tau[ja+i-2].
The contents of sub(A) on exit are illustrated by examples with n = 5, where d and e denote diagonal and off-diagonal elements of T, and vi denotes an element of the vector defining H(i).

NOTE
The distributed matrix sub(A) must satisfy certain alignment properties, namely the following expression should be true:
( mb_a==nb_a && iroffa==icoffa ), where iroffa = mod(ia - 1, mb_a) and icoffa = mod(ja - 1, nb_a).
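As a hedged illustration only (the helper function, its arguments, and the assumption that a and desca already describe a block-cyclically distributed n-by-n symmetric matrix on an initialized BLACS grid are not taken from this manual), the following C sketch shows the usual two-call pattern for pdsytd2: a workspace query with lwork = -1, followed by the actual reduction:

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Hypothetical helper: reduce the lower triangle of an already distributed
   symmetric matrix sub(A) = A(1:n, 1:n) to tridiagonal form with pdsytd2. */
void reduce_to_tridiagonal(double *a, MKL_INT *desca, MKL_INT n,
                           double *d, double *e, double *tau)
{
    char uplo = 'L';
    MKL_INT ia = 1, ja = 1, info = 0;
    MKL_INT lwork = -1;                  /* workspace query */
    double wsize = 0.0;

    /* First call: lwork = -1 returns the minimal/optimal lwork in wsize. */
    pdsytd2(&uplo, &n, a, &ia, &ja, desca, d, e, tau, &wsize, &lwork, &info);

    lwork = (MKL_INT)wsize;              /* at least 3*n per the rule above */
    double *work = (double *)malloc((size_t)lwork * sizeof(double));

    /* Second call: the actual reduction; check info afterwards. */
    pdsytd2(&uplo, &n, a, &ia, &ja, desca, d, e, tau, work, &lwork, &info);
    free(work);
}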

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?trord
Reorders the Schur factorization of a general matrix.

Syntax
void pstrord( char* compq, MKL_INT* select, MKL_INT* para, MKL_INT* n, float* t,
MKL_INT* it, MKL_INT* jt, MKL_INT* desct, float* q, MKL_INT* iq, MKL_INT* jq, MKL_INT*
descq, float* wr, float* wi, MKL_INT* m, float* work, MKL_INT* lwork, MKL_INT* iwork,
MKL_INT* liwork, MKL_INT* info);
void pdtrord(char* compq, MKL_INT* select, MKL_INT* para, MKL_INT* n, double* t,
MKL_INT* it, MKL_INT* jt, MKL_INT* desct, double* q, MKL_INT* iq, MKL_INT* jq,
MKL_INT* descq, double* wr, double* wi, MKL_INT* m, double* work, MKL_INT* lwork,
MKL_INT* iwork, MKL_INT* liwork, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
p?trord reorders the real Schur factorization of a real matrix A = Q*T*Q^T, so that a selected cluster of
eigenvalues appears in the leading diagonal blocks of the upper quasi-triangular matrix T, and the leading
columns of Q form an orthonormal basis of the corresponding right invariant subspace.
T must be in Schur form (as returned by p?lahqr), that is, block upper triangular with 1-by-1 and 2-by-2
diagonal blocks.
This function uses a delay and accumulate procedure for performing the off-diagonal updates.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

compq (global)
= 'V': update the matrix q of Schur vectors;

= 'N': do not update q.

select (global) array of size n
select specifies the eigenvalues in the selected cluster. To select a real eigenvalue w(j), select[j-1] must be set to 1. To select a complex conjugate pair of eigenvalues w(j) and w(j+1), corresponding to a 2-by-2 diagonal block, either select[j-1] or select[j] or both must be set to 1; a complex conjugate pair of eigenvalues must be either both included in the cluster or both excluded.

para (global)
Block parameters:
para[0]: maximum number of concurrent computational windows allowed in the algorithm; 0 < para[0] ≤ min(nprow, npcol) must hold;
para[1]: number of eigenvalues in each window; 0 < para[1] < para[2] must hold;
para[2]: window size; para[1] < para[2] < mb_t must hold;
para[3]: minimal percentage of FLOPS required for performing matrix-matrix multiplications instead of pipelined orthogonal transformations; 0 ≤ para[3] ≤ 100 must hold;
para[4]: width of block column slabs for row-wise application of pipelined orthogonal transformations in their factorized form; 0 < para[4] ≤ mb_t must hold;
para[5]: the maximum number of eigenvalues moved together over a process border; in practice, this will be approximately half of the cross border window size; 0 < para[5] ≤ para[1] must hold.

n (global)
The order of the globally distributed matrix t. n≥ 0.

t (local) array of size lld_t * LOCc(n).

The local pieces of the global distributed upper quasi-triangular matrix T, in


Schur form.

it, jt (global)
The row and column index in the global matrix T indicating the first column
of T. it = jt = 1 must hold (see Application Notes).

desct (global and local) array of size dlen_.


The array descriptor for the global distributed matrix T.
q (local) array of size lld_q * LOCc(n).

On entry, if compq = 'V', the local pieces of the global distributed matrix Q
of Schur vectors.
If compq = 'N', q is not referenced.

iq, jq (global)
The column index in the global matrix Q indicating the first column of Q. iq
= jq = 1 must hold (see Application Notes).

descq (global and local) array of size dlen_.


The array descriptor for the global distributed matrix Q.

work (local workspace) array of size lwork

lwork (local)
The size of the array work.

If lwork = -1, then a workspace query is assumed; the function only


calculates the optimal size of the work array, returns this value as the first
entry of the work array, and no error message related to lwork is issued by
pxerbla.

iwork (local workspace) array of size liwork

liwork (local)
The size of the array iwork.

If liwork = -1, then a workspace query is assumed; the function only


calculates the optimal size of the iwork array, returns this value as the first
entry of the iwork array, and no error message related to liwork is issued
by pxerbla

OUTPUT Parameters

select (global) array of size n

The (partial) reordering is displayed.

t On exit, t is overwritten by the local pieces of the reordered matrix T, again


in Schur form, with the selected eigenvalues in the globally leading diagonal
blocks.

q On exit, if compq = 'V', q has been postmultiplied by the global orthogonal


transformation matrix which reorders t; the leading m columns of q form an
orthonormal basis for the specified invariant subspace.
If compq = 'N', q is not referenced.

wr, wi (global ) array of size n

The real and imaginary parts, respectively, of the reordered eigenvalues of


the matrix T. The eigenvalues are in principle stored in the same order as
on the diagonal of T, with wr[i] = T(i+1,i+1) and, if T(i:i+1,i:i+1) is a 2-
by-2 diagonal block, wi[i-1] > 0 and wi[i] = -wi[i-1].

Note also that if a complex eigenvalue is sufficiently ill-conditioned, then its


value may differ significantly from its value before reordering.

m (global )
The size of the specified invariant subspace.
0 ≤m≤n.

work[0] On exit, if info = 0, work[0] returns the optimal lwork.

iwork[0] On exit, if info = 0, iwork[0] returns the optimal liwork.

info (global)
= 0: successful exit
< 0: if info = -i, the i-th argument had an illegal value. If the i-th
argument is an array and the j-th entry, indexed j-1, had an illegal value,
then info = -(i*1000+j), if the i-th argument is a scalar and had an illegal
value, then info = -i.

> 0: here we have several possibilities:
• Reordering of t failed because some eigenvalues are too close to separate (the problem is very ill-conditioned); t may have been partially reordered, and wr and wi contain the eigenvalues in the same order as in t. On exit, info = {the index of t where the swap failed (indexing starts at 1)}.
• A 2-by-2 block to be reordered split into two 1-by-1 blocks and the second block failed to swap with an adjacent block. On exit, info = {the index of t where the swap failed}.
• If info = n+1, there is no valid BLACS context (see the BLACS documentation for details).

Application Notes
The following alignment requirements must hold:

• mb_t = nb_t = mb_q = nb_q


• rsrc_t = rsrc_q
• csrc_t = csrc_q
All matrices must be blocked by a block factor larger than or equal to two (3). This is to simplify reordering
across processor borders in the presence of 2-by-2 blocks.
This algorithm cannot work on submatrices of t and q, i.e., it = jt = iq = jq = 1 must hold. This is
however no limitation since p?lahqr does not compute Schur forms of submatrices anyway.

Parallel execution recommendations:

• Use a square grid, if possible, for maximum performance. The block parameters in para should be kept
well below the data distribution block size.
• In general, the parallel algorithm strives to perform as much work as possible without crossing the block
borders on the main block diagonal.
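As a hedged illustration only (the helper name, the para values, and the assumption that t, q, desct, and descq describe an n-by-n real Schur form and its Schur vectors distributed with mb_t = nb_t = mb_q = nb_q = 32 on an initialized BLACS grid are not taken from this manual), the following C sketch shows a workspace query followed by the reordering call to pdtrord:

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Hypothetical helper: move the eigenvalues flagged in select (entries set
   to 1) to the leading diagonal blocks of T and update Q accordingly. */
void reorder_schur_form(double *t, MKL_INT *desct, double *q, MKL_INT *descq,
                        MKL_INT n, MKL_INT *select,
                        double *wr, double *wi, MKL_INT *m)
{
    char compq = 'V';
    MKL_INT it = 1, jt = 1, iq = 1, jq = 1, info = 0;
    /* Block parameters; these placeholder values assume mb_t = 32 and at
       least a 1 x 1 process grid (see the constraints listed above). */
    MKL_INT para[6] = { 1, 8, 16, 50, 16, 4 };
    MKL_INT lwork = -1, liwork = -1;     /* workspace query */
    double wsize = 0.0;
    MKL_INT iwsize = 0;

    pdtrord(&compq, select, para, &n, t, &it, &jt, desct, q, &iq, &jq, descq,
            wr, wi, m, &wsize, &lwork, &iwsize, &liwork, &info);

    lwork  = (MKL_INT)wsize;
    liwork = iwsize;
    double  *work  = (double *)malloc((size_t)lwork * sizeof(double));
    MKL_INT *iwork = (MKL_INT *)malloc((size_t)liwork * sizeof(MKL_INT));

    pdtrord(&compq, select, para, &n, t, &it, &jt, desct, q, &iq, &jq, descq,
            wr, wi, m, work, &lwork, iwork, &liwork, &info);

    free(iwork);
    free(work);
}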

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?trsen
Reorders the Schur factorization of a matrix and
(optionally) computes the reciprocal condition
numbers and invariant subspace for the selected
cluster of eigenvalues.

Syntax
void pstrsen(char* job, char* compq, MKL_INT* select, MKL_INT* para, MKL_INT* n,
float* t, MKL_INT* it, MKL_INT* jt, MKL_INT* desct, float* q, MKL_INT* iq, MKL_INT*
jq, MKL_INT* descq, float* wr, float* wi, MKL_INT* m, float* s, float* sep, float*
work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT* liwork, MKL_INT* info);
void pdtrsen(char* job, char* compq, MKL_INT* select, MKL_INT* para, MKL_INT* n,
double* t, MKL_INT* it, MKL_INT* jt, MKL_INT* desct, double* q, MKL_INT* iq, MKL_INT*
jq, MKL_INT* descq, double* wr, double* wi, MKL_INT* m, double* s, double* sep,
double* work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT* liwork, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
p?trsen reorders the real Schur factorization of a real matrix A = Q*T*Q^T, so that a selected cluster of
eigenvalues appears in the leading diagonal blocks of the upper quasi-triangular matrix T, and the leading
columns of Q form an orthonormal basis of the corresponding right invariant subspace. The reordering is
performed by p?trord.

Optionally the function computes the reciprocal condition numbers of the cluster of eigenvalues and/or the
invariant subspace.
T must be in Schur form (as returned by p?lahqr), that is, block upper triangular with 1-by-1 and 2-by-2
diagonal blocks.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

job (global )
Specifies whether condition numbers are required for the cluster of
eigenvalues (s) or the invariant subspace (sep):

= 'N': no condition numbers are required;


= 'E': only the condition number for the cluster of eigenvalues is computed
(s);

= 'V': only the condition number for the invariant subspace is computed
(sep);

= 'B': condition numbers for both the cluster and the invariant subspace are
computed (s and sep).

compq (global )
= 'V': update the matrix q of Schur vectors;

= 'N': do not update q.

select (global) array of size n
select specifies the eigenvalues in the selected cluster. To select a real eigenvalue w(j), select[j-1] must be set to a non-zero number. To select a complex conjugate pair of eigenvalues w(j) and w(j+1), corresponding to a 2-by-2 diagonal block, either select[j-1] or select[j] or both must be set to a non-zero number; a complex conjugate pair of eigenvalues must be either both included in the cluster or both excluded.

para (global)
Block parameters:
para[0]: maximum number of concurrent computational windows allowed in the algorithm; 0 < para[0] ≤ min(NPROW, NPCOL) must hold;
para[1]: number of eigenvalues in each window; 0 < para[1] < para[2] must hold;
para[2]: window size; para[1] < para[2] < mb_t must hold;
para[3]: minimal percentage of flops required for performing matrix-matrix multiplications instead of pipelined orthogonal transformations; 0 ≤ para[3] ≤ 100 must hold;
para[4]: width of block column slabs for row-wise application of pipelined orthogonal transformations in their factorized form; 0 < para[4] ≤ mb_t must hold;
para[5]: the maximum number of eigenvalues moved together over a process border; in practice, this will be approximately half of the cross border window size; 0 < para[5] ≤ para[1] must hold.

n (global )
The order of the globally distributed matrix t. n≥ 0.

t (local ) array of size lld_t * LOCc(n).

The local pieces of the global distributed upper quasi-triangular matrix T, in


Schur form.

it, jt (global )
The row and column index in the global matrix T indicating the first column
of T. it = jt = 1 must hold (see Application Notes).

desct (global and local) array of size dlen_.


The array descriptor for the global distributed matrix T.

q (local ) array of size lld_q * LOCc(n).

On entry, if compq = 'V', the local pieces of the global distributed matrix Q
of Schur vectors.
If compq = 'N', q is not referenced.

iq, jq (global )
The column index in the global matrix Q indicating the first column of Q. iq
= jq = 1 must hold (see Application Notes).

descq (global and local) array of size dlen_.


The array descriptor for the global distributed matrix Q.

work (local workspace) array of size lwork

lwork (local )
The size of the array work.

If lwork = -1, then a workspace query is assumed; the function only


calculates the optimal size of the work array, returns this value as the first
entry of the work array, and no error message related to lwork is issued by
pxerbla.

iwork (local workspace) array of size liwork

liwork (local )
The size of the array iwork.

If liwork = -1, then a workspace query is assumed; the function only


calculates the optimal size of the iwork array, returns this value as the first
entry of the iwork array, and no error message related to liwork is issued
by pxerbla.

OUTPUT Parameters

t t is overwritten by the local pieces of the reordered matrix T, again in


Schur form, with the selected eigenvalues in the globally leading diagonal
blocks.

q On exit, if compq = 'V', q has been postmultiplied by the global orthogonal


transformation matrix which reorders t; the leading m columns of q form an
orthonormal basis for the specified invariant subspace.
If compq = 'N', q is not referenced.

wr, wi (global ) array of size n

The real and imaginary parts, respectively, of the reordered eigenvalues of


the matrix T. The eigenvalues are in principle stored in the same order as
on the diagonal of T, with wr[i] = T(i+1,i+1) and, if T(i:i+1,i:i+1) is a 2-
by-2 diagonal block, wi[i-1] > 0 and wi[i] = -wi[i-1].

Note also that if a complex eigenvalue is sufficiently ill-conditioned, then its


value may differ significantly from its value before reordering.

m (global )
The size of the specified invariant subspace. 0 ≤m≤n.

s (global )
If job = 'E' or 'B', s is a lower bound on the reciprocal condition number for
the selected cluster of eigenvalues. s cannot underestimate the true
reciprocal condition number by more than a factor of sqrt(n). If m = 0 or n,
s = 1.
If job = 'N' or 'V', s is not referenced.

sep (global)
If job = 'V' or 'B', sep is the estimated reciprocal condition number of the specified invariant subspace. If m = 0 or n, sep = norm(t).
If job = 'N' or 'E', sep is not referenced.

work[0] On exit, if info = 0, work[0] returns the optimal lwork.

iwork[0] On exit, if info = 0, iwork[0] returns the optimal liwork.

info (global )
= 0: successful exit
< 0: if info = -i, the i-th argument had an illegal value.

If the i-th argument is an array and the j-th entry, indexed j-1, had an
illegal value, then info = -(i*1000+j), if the i-th argument is a scalar and
had an illegal value, then info = -i.

> 0: here we have several possibilities:
• Reordering of t failed because some eigenvalues are too close to separate (the problem is very ill-conditioned); t may have been partially reordered, and wr and wi contain the eigenvalues in the same order as in t. On exit, info = {the index of t where the swap failed (indexing starts at 1)}.
• A 2-by-2 block to be reordered split into two 1-by-1 blocks and the second block failed to swap with an adjacent block. On exit, info = {the index of t where the swap failed}.
• If info = n+1, there is no valid BLACS context (see the BLACS documentation for details).

Application Notes
The following alignment requirements must hold:

• mb_t = nb_t = mb_q = nb_q


• rsrc_t = rsrc_q
• csrc_t = csrc_q
All matrices must be blocked by a block factor larger than or equal to two (3). This is to simplify reordering
across processor borders in the presence of 2-by-2 blocks.
This algorithm cannot work on submatrices of t and q, i.e., it = jt = iq = jq = 1 must hold. This is
however no limitation since p?lahqr does not compute Schur forms of submatrices anyway.

For parallel execution, use a square grid, if possible, for maximum performance. The block parameters in
para should be kept well below the data distribution block size.
In general, the parallel algorithm strives to perform as much work as possible without crossing the block
borders on the main block diagonal.
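As a hedged illustration only (the helper name is hypothetical, the para values repeat the assumptions of the p?trord sketch above, and work/iwork are assumed to have been sized by a preceding lwork = liwork = -1 query), the following C sketch requests both condition numbers while reordering with pdtrsen:

#include "mkl_scalapack.h"

/* Hypothetical helper: reorder T and Q and return the reciprocal condition
   numbers s (eigenvalue cluster) and sep (invariant subspace). */
void reorder_with_condition_numbers(double *t, MKL_INT *desct,
                                    double *q, MKL_INT *descq,
                                    MKL_INT n, MKL_INT *select,
                                    double *wr, double *wi, MKL_INT *m,
                                    double *s, double *sep,
                                    double *work, MKL_INT lwork,
                                    MKL_INT *iwork, MKL_INT liwork)
{
    char job = 'B';                      /* compute both s and sep */
    char compq = 'V';
    MKL_INT it = 1, jt = 1, iq = 1, jq = 1, info = 0;
    MKL_INT para[6] = { 1, 8, 16, 50, 16, 4 };   /* same assumptions as above */

    pdtrsen(&job, &compq, select, para, &n, t, &it, &jt, desct,
            q, &iq, &jq, descq, wr, wi, m, s, sep,
            work, &lwork, iwork, &liwork, &info);
}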

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?trti2
Computes the inverse of a triangular matrix (local
unblocked algorithm).

Syntax
void pstrti2 (char *uplo , char *diag , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , MKL_INT *info );
void pdtrti2 (char *uplo , char *diag , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT
*ja , MKL_INT *desca , MKL_INT *info );
void pctrti2 (char *uplo , char *diag , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_INT *info );
void pztrti2 (char *uplo , char *diag , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia ,
MKL_INT *ja , MKL_INT *desca , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The p?trti2 function computes the inverse of a real/complex upper or lower triangular block matrix sub(A) = A(ia:ia+n-1, ja:ja+n-1).
This matrix should be contained in one and only one process memory space (local operation).

Input Parameters

uplo (global)
Specifies whether the matrix sub (A) is upper or lower triangular.
= 'U': sub (A) is upper triangular

= 'L': sub (A) is lower triangular.

diag (global)
Specifies whether or not the matrix A is unit triangular.
= 'N': sub (A) is non-unit triangular

= 'U': sub (A) is unit triangular.

n (global)
The number of rows and columns to be operated on, i.e., the order of the
distributed submatrix sub(A). n≥ 0.

a (local)
Pointer into the local memory to an array, size lld_a * LOCc(ja+n-1).

On entry, this array contains the local pieces of the triangular matrix
sub(A).
If uplo = 'U', the leading n-by-n upper triangular part of the matrix
sub(A) contains the upper triangular part of the matrix, and the strictly
lower triangular part of sub(A) is not referenced.

If uplo = 'L', the leading n-by-n lower triangular part of the matrix
sub(A) contains the lower triangular part of the matrix, and the strictly
upper triangular part of sub(A) is not referenced. If diag = 'U', the
diagonal elements of sub(A) are not referenced either and are assumed to
be 1.

ia, ja (global)
The row and column indices in the global matrix A indicating the first row
and the first column of the sub(A), respectively.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.

Output Parameters

a On exit, the (triangular) inverse of the original matrix, in the same storage
format.

info
= 0: successful exit
< 0: if the i-th argument is an array and the j-th entry, indexed j-1, had an illegal value, then info = -(i*100+j);
if the i-th argument is a scalar and had an illegal value, then info = -i.
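As a hedged illustration only (the helper name and the choice of an upper, non-unit triangular block are assumptions; the requirement that sub(A) be local to one process is stated in the description above), a minimal C sketch of a call to pdtrti2:

#include "mkl_scalapack.h"

/* Hypothetical helper: invert a non-unit upper triangular block of the
   distributed matrix A in place; the block must be local to one process. */
void invert_local_triangular_block(double *a, MKL_INT *desca,
                                   MKL_INT n, MKL_INT ia, MKL_INT ja)
{
    char uplo = 'U';                     /* sub(A) is upper triangular */
    char diag = 'N';                     /* non-unit diagonal */
    MKL_INT info = 0;

    pdtrti2(&uplo, &diag, &n, a, &ia, &ja, desca, &info);
    /* info < 0 reports an illegal argument via the -(i*100+j) / -i rule. */
}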

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

?lahqr2
Updates the eigenvalues and Schur decomposition.

Syntax
void clahqr2 (const MKL_INT* wantt, const MKL_INT* wantz, const MKL_INT* n, const
MKL_INT* ilo, const MKL_INT* ihi, MKL_Complex8* h, const MKL_INT* ldh, MKL_Complex8*
w, const MKL_INT* iloz, const MKL_INT* ihiz, MKL_Complex8* z, const MKL_INT* ldz,
MKL_INT* info);
void zlahqr2 (const MKL_INT* wantt, const MKL_INT* wantz, const MKL_INT* n, const
MKL_INT* ilo, const MKL_INT* ihi, MKL_Complex16* h, const MKL_INT* ldh, MKL_Complex16*
w, const MKL_INT* iloz, const MKL_INT* ihiz, MKL_Complex16* z, const MKL_INT* ldz,
MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
?lahqr2 is an auxiliary routine called by ?hseqr to update the eigenvalues and Schur decomposition already
computed by ?hseqr, by dealing with the Hessenberg submatrix in rows and columns ilo to ihi. This
version of ?lahqr (not the standard LAPACK version) uses a double-shift algorithm (like LAPACK's ?lahqr).
Unlike the standard LAPACK convention, this does not assume the subdiagonal is real, nor does it work to
preserve this quality if given.

Input Parameters

wantt
≠ 0: the full Schur form T is required;
= 0: only eigenvalues are required.

wantz
≠ 0: the matrix of Schur vectors Z is required;
= 0: Schur vectors are not required.

n
The order of the matrix H. n >= 0.

ilo, ihi
It is assumed that the matrix H is upper triangular in rows and columns ihi+1:n, and that matrix element H(ilo,ilo-1) = 0 (unless ilo = 1). ?lahqr2 works primarily with the Hessenberg submatrix in rows and columns ilo to ihi, but applies transformations to all of h if wantt is nonzero.
1 <= ilo <= max(1,ihi); ihi <= n.

h Array, size ldh*n.

On entry, the upper Hessenberg matrix H.

ldh
The leading dimension of the array h. ldh >= max(1,n).

iloz, ihiz
Specify the rows of Z to which transformations must be applied if wantz≠ 0.

1 <= iloz <= ilo; ihi <= ihiz <= n.

z Array, size ldz*n.

If wantz≠ 0, on entry z must contain the current matrix Z of


transformations. If wantz= 0, z is not referenced.

ldz
The leading dimension of the array z. ldz >= max(1,n).

Output Parameters

h On exit, if wantt≠ 0, h is upper triangular in rows and columns ilo:ihi. If


wantt= 0, the contents of h are unspecified on exit.

w Array, size (n)

The computed eigenvalues ilo to ihi are stored in the corresponding


elements of w. If wantt≠ 0, the eigenvalues are stored in the same order as
on the diagonal of the Schur form returned in h, with w[i] = H(i, i).

z If wantz≠ 0, on exit z has been updated; transformations are applied only
to the submatrix Z(iloz:ihiz,ilo:ihi). If wantz= 0, z is not referenced.

info
= 0: successful exit
> 0: if info = i, ?lahqr2 failed to compute all the eigenvalues ilo to ihi in a total of 30*(ihi-ilo+1) iterations; elements w[i:ihi-1] contain those eigenvalues which have been successfully computed.
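As a hedged illustration only (the helper name is hypothetical, and z is allocated defensively even though it is not referenced when wantz = 0), a minimal C sketch that computes only the eigenvalues of a local complex upper Hessenberg matrix with zlahqr2:

#include <stdlib.h>
#include "mkl_scalapack.h"

/* Hypothetical helper: eigenvalues of an n-by-n complex upper Hessenberg
   matrix h (column-major, leading dimension ldh), returned in w. */
MKL_INT hessenberg_eigenvalues(MKL_Complex16 *h, MKL_INT ldh,
                               MKL_INT n, MKL_Complex16 *w)
{
    MKL_INT wantt = 0, wantz = 0;        /* eigenvalues only */
    MKL_INT ilo = 1, ihi = n;            /* operate on the whole matrix */
    MKL_INT iloz = 1, ihiz = n, ldz = n, info = 0;

    /* z is not referenced when wantz = 0; allocated here as a conservative choice. */
    MKL_Complex16 *z = (MKL_Complex16 *)malloc((size_t)ldz * n * sizeof(MKL_Complex16));

    zlahqr2(&wantt, &wantz, &n, &ilo, &ihi, h, &ldh, w,
            &iloz, &ihiz, z, &ldz, &info);

    free(z);
    return info;                         /* info > 0: some eigenvalues failed */
}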

?lamsh
Sends multiple shifts through a small (single node)
matrix to maximize the number of bulges that can be
sent through.

Syntax
void slamsh (float *s, const MKL_INT *lds, MKL_INT *nbulge, const MKL_INT *jblk, float
*h, const MKL_INT *ldh, const MKL_INT *n, const float *ulp );
void dlamsh (double *s, const MKL_INT *lds, MKL_INT *nbulge, const MKL_INT *jblk,
double *h, const MKL_INT *ldh, const MKL_INT *n, const double *ulp );
void clamsh (MKL_Complex8 *s , const MKL_INT *lds , MKL_INT *nbulge , const MKL_INT
*jblk , MKL_Complex8 *h , const MKL_INT *ldh , const MKL_INT *n , const float *ulp );
void zlamsh (MKL_Complex16 *s , const MKL_INT *lds , MKL_INT *nbulge , const MKL_INT
*jblk , MKL_Complex16 *h , const MKL_INT *ldh , const MKL_INT *n , const double *ulp );

Include Files
• mkl_scalapack.h

Description
The ?lamsh function sends multiple shifts through a small (single node) matrix to see how small consecutive
subdiagonal elements are modified by subsequent shifts in an effort to maximize the number of bulges that
can be sent through. The function should only be called when there are multiple shifts/bulges (nbulge > 1)
and the first shift is starting in the middle of an unreduced Hessenberg matrix because of two or more small
consecutive subdiagonal elements.

Input Parameters

s (local)
Array of size lds*2*jblk.

On entry, the matrix of shifts. Only the 2x2 diagonal of s is referenced. It is


assumed that s has jblk double shifts (size 2).

lds (local)
On entry, the leading dimension of S; unchanged on exit. 1<nbulge≤ jblk
≤lds/2.

nbulge (local)
On entry, the number of bulges to send through h (>1). nbulge should be
less than the maximum determined (jblk). 1<nbulge ≤ jblk≤ lds/2.

jblk (local)
On entry, the number of double shifts determined for S; unchanged on exit.

h (local)
Array of size ldh*n.

On entry, the local matrix to apply the shifts on.


h should be aligned so that the starting row is 2.

ldh (local)

On entry, the leading dimension of H; unchanged on exit.

n (local)
On entry, the size of H. If all the bulges are expected to go through, n
should be at least 4nbulge+2. Otherwise, nbulge may be reduced by this
function.

ulp (local)
On entry, machine precision. Unchanged on exit.

Output Parameters

s On exit, the data is rearranged in the best order for applying.

nbulge On exit, the maximum number of bulges that can be sent through.

h On exit, the data is destroyed.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

?lapst
Sorts the numbers in increasing or decreasing order.

Syntax
void slapst (const char* id, const MKL_INT* n, const float* d, MKL_INT* indx, MKL_INT*
info);
void dlapst (const char* id, const MKL_INT* n, const double* d, MKL_INT* indx, MKL_INT*
info);

Include Files
• mkl_scalapack.h

Description
?lapst is a modified version of the LAPACK routine ?lasrt. It defines a permutation indx that sorts the numbers in d in increasing order (if id = 'I') or in decreasing order (if id = 'D').
The function uses Quick Sort, reverting to Insertion Sort on arrays of size <= 20. The dimension of an internal stack limits n to about 2^32.

Input Parameters

id
= 'I': sort d in increasing order;

= 'D': sort d in decreasing order.

n
The length of the array d.

d Array, size (n)

The array to be sorted.

Output Parameters

indx
Array, size (n).

The permutation which sorts the array d.

info
= 0: successful exit
< 0: if info = -i, the i-th argument had an illegal value
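As a hedged illustration only (the manual does not state whether the returned permutation indices are 0- or 1-based, so the example prints them without interpreting them), a minimal C sketch sorting four numbers with dlapst:

#include <stdio.h>
#include "mkl_scalapack.h"

int main(void)
{
    double d[4] = { 3.0, 1.0, 4.0, 1.5 };
    MKL_INT indx[4], n = 4, info = 0;
    char id = 'I';                       /* 'I' = increasing, 'D' = decreasing */

    dlapst(&id, &n, d, indx, &info);

    if (info == 0)
        for (MKL_INT i = 0; i < n; ++i)
            printf("indx[%lld] = %lld\n", (long long)i, (long long)indx[i]);
    return (int)info;
}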

?laqr6
Performs a single small-bulge multi-shift QR sweep
collecting the transformations.

Syntax
void slaqr6(char* job, MKL_INT* wantt, MKL_INT* wantz, MKL_INT* kacc22, MKL_INT* n,
MKL_INT* ktop, MKL_INT* kbot, MKL_INT* nshfts, float* sr, float* si, float* h,
MKL_INT* ldh, MKL_INT* iloz, MKL_INT* ihiz, float* z, MKL_INT* ldz, float* v, MKL_INT*
ldv, float* u, MKL_INT* ldu, MKL_INT* nv, float* wv, MKL_INT* ldwv, MKL_INT* nh,
float* wh, MKL_INT* ldwh);
void dlaqr6(char* job, MKL_INT* wantt, MKL_INT* wantz, MKL_INT* kacc22, MKL_INT* n,
MKL_INT* ktop, MKL_INT* kbot, MKL_INT* nshfts, double* sr, double* si, double* h,
MKL_INT* ldh, MKL_INT* iloz, MKL_INT* ihiz, double* z, MKL_INT* ldz, double* v,
MKL_INT* ldv, double* u, MKL_INT* ldu, MKL_INT* nv, double* wv, MKL_INT* ldwv,
MKL_INT* nh, double* wh, MKL_INT* ldwh);

Include Files
• mkl_scalapack.h

Description
This auxiliary function performs a single small-bulge multi-shift QR sweep, moving the chain of bulges from
top to bottom in the submatrix H(ktop:kbot,ktop:kbot), collecting the transformations in the matrix V or
accumulating the transformations in the matrix Z (see below).
This is a modified version of ?laqr5 from LAPACK 3.1.

Input Parameters

job Set the kind of job to do in ?laqr6, as follows:

job = 'I': Introduce and chase bulges in submatrix


job = 'C': Chase bulges from top to bottom of submatrix
job = 'O': Chase bulges off submatrix

wantt wantt is non-zero if the quasi-triangular Schur factor is being computed. wantt is set to zero otherwise.

wantz wantz is non-zero if the orthogonal Schur factor is being computed. wantz is set to zero otherwise.

kacc22 Specifies the computation mode of far-from-diagonal orthogonal updates.


= 0: ?laqr6 does not accumulate reflections and does not use matrix-
matrix multiply to update far-from-diagonal matrix entries.
= 1: ?laqr6 accumulates reflections and uses matrix-matrix multiply to
update the far-from-diagonal matrix entries.
= 2: ?laqr6 accumulates reflections, uses matrix-matrix multiply to update
the far-from-diagonal matrix entries, and takes advantage of 2-by-2 block
structure during matrix multiplies.

n n is the order of the Hessenberg matrix H upon which this function


operates.

ktop, kbot These are the first and last rows and columns of an isolated diagonal block
upon which the QR sweep is to be applied. It is assumed without a check
that either ktop = 1 or H(ktop,ktop-1) = 0 and either kbot = n or H(kbot
+1,kbot) = 0.

nshfts nshfts gives the number of simultaneous shifts. nshfts must be positive
and even.

sr, si Array of size nshfts

sr contains the real parts and si contains the imaginary parts of the
nshfts shifts of origin that define the multi-shift QR sweep.

h Array of size ldh * n

On input h contains a Hessenberg matrix H.

ldh ldh is the leading dimension of H just as declared in the calling function.
ldh≥ max(1,n).

iloz, ihiz Specify the rows of the matrix Z to which transformations must be applied if wantz is non-zero. 1 ≤ iloz ≤ ihiz ≤ n.

z Array of size ldz * ktop

If wantz is non-zero, then the QR sweep orthogonal similarity transformation is accumulated into the matrix Z(iloz:ihiz,kbot:ktop), stored in the array z, from the right.
If wantz equals zero, then z is unreferenced.

ldz ldz is the leading dimension of z just as declared in the calling function.
ldz≥n.

v (workspace) array of size ldv * nshfts/2

ldv ldv is the leading dimension of v as declared in the calling function. ldv≥3.

u (workspace) array of size ldu * (3*nshfts-3)

ldu ldu is the leading dimension of u just as declared in the calling function.
ldu≥3*nshfts-3.

nh nh is the number of columns in array wh available for workspace. nh ≥ 1 is required for usage of this workspace; otherwise the far-from-diagonal elements are updated without level 3 BLAS.

wh (workspace) array of size ldwh * nh

ldwh Leading dimension of wh just as declared in the calling function.


ldwh≥3*nshfts-3.

nv nv is the number of rows in wv available for workspace. nv ≥ 1 is required for usage of this workspace; otherwise the far-from-diagonal elements are updated without level 3 BLAS.

wv (workspace) array of size ldwv * 3*nshfts

ldwv (scalar)
ldwv is the leading dimension of wv as declared in the calling function. ldwv ≥ nv.

OUTPUT Parameters

h A multi-shift QR sweep with shifts sr[j]+i*si[j] is applied to the isolated


diagonal block in matrix rows and columns ktop through kbot.

z If wantz is non-zero, then the QR sweep orthogonal/unitary similarity transformation is accumulated into the matrix Z(iloz:ihiz,kbot:ktop) from the right.
If wantz equals zero, then z is unreferenced.

Application Notes
Based on contributions by Karen Braman and Ralph Byers, Department of Mathematics, University of Kansas, USA, and Robert Granat, Department of Computing Science and HPC2N, Umea University, Sweden.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

?lar1va
Computes scaled eigenvector corresponding to given
eigenvalue.

Syntax
void slar1va(MKL_INT* n, MKL_INT* b1, MKL_INT* bn, float* lambda, float* d, float* l,
float* ld, float* lld, float* pivmin, float* gaptol, float* z, MKL_INT* wantnc,
MKL_INT* negcnt, float* ztz, float* mingma, MKL_INT* r, MKL_INT* isuppz, float*
nrminv, float* resid, float* rqcorr, float* work);
void dlar1va(MKL_INT* n, MKL_INT* b1, MKL_INT* bn, double* lambda, double* d, double*
l, double* ld, double* lld, double* pivmin, double* gaptol, double* z, MKL_INT*
wantnc, MKL_INT* negcnt, double* ztz, double* mingma, MKL_INT* r, MKL_INT* isuppz,
double* nrminv, double* resid, double* rqcorr, double* work);

Include Files
• mkl_scalapack.h

Description
?lar1va computes the (scaled) r-th column of the inverse of the submatrix in rows b1 through bn of the
tridiagonal matrix LDLT - λI. When λ is close to an eigenvalue, the computed vector is an accurate
eigenvector. Usually, r corresponds to the index where the eigenvector is largest in magnitude. The following
steps accomplish this computation :

1. Stationary qd transform, LDLT - λI = L+D+L+^T,
2. Progressive qd transform, LDLT - λI = U-D-U-^T,
3. Computation of the diagonal elements of the inverse of LDLT - λI by combining the above transforms, and choosing r as the index where the diagonal of the inverse is (one of the) largest in magnitude.
4. Computation of the (scaled) r-th column of the inverse using the twisted factorization obtained by combining the top part of the stationary and the bottom part of the progressive transform.

Input Parameters

n The order of the matrix LDLT.

b1 First index of the submatrix of LDLT.

bn Last index of the submatrix of LDLT.

lambda The shift λ. In order to compute an accurate eigenvector, lambda should be


a good approximation to an eigenvalue of LDLT.

l Array of size n-1

The (n-1) subdiagonal elements of the unit bidiagonal matrix L, in elements


0 to n-2.

d Array of size n

The n diagonal elements of the diagonal matrix D.

ld Array of size n-1

The n-1 elements l[i]*d[i], i=0,...,n-2.

lld Array of size n-1

The n-1 elements l[i]*l[i]*d[i], i=0,...,n-2.

pivmin The minimum pivot in the Sturm sequence.

gaptol Tolerance that indicates when eigenvector entries are negligible with respect
to their contribution to the residual.

z Array of size n

On input, all entries of z must be set to 0.

wantnc Specifies whether negcnt has to be computed.

r The twist index for the twisted factorization used to compute z.

On input, 0 ≤ r ≤ n. If r is input as 0, r is set to the index where (LDLT - σI)^-1 is largest in magnitude. If 1 ≤ r ≤ n, r is unchanged.

Ideally, r designates the position of the maximum entry in the eigenvector.

work (Workspace) array of size 4*n

OUTPUT Parameters

z On output, z contains the (scaled) r-th column of the inverse. The scaling is
such that z[r-1] equals 1.

negcnt If wantnc is non-zero, then negcnt = the number of pivots < pivmin in the matrix factorization LDLT, and negcnt = -1 otherwise.

ztz The square of the 2-norm of z.

mingma The reciprocal of the largest (in magnitude) diagonal element of the inverse
of LDLT - σI.

r On output, r contains the twist index used to compute z.

isuppz array of size 2


The support of the vector in z, i.e., the vector z is non-zero only in elements isuppz[0] through isuppz[1].

nrminv nrminv = 1/SQRT( ztz )

resid The residual of the FP vector.


resid = ABS( mingma )/SQRT( ztz )

rqcorr The Rayleigh Quotient correction to lambda.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

?laref
Applies Householder reflectors to matrices on their
rows or columns.

Syntax
void slaref (const char* type, float* a, const MKL_INT* lda, const MKL_INT* wantz,
float* z, const MKL_INT* ldz, const MKL_INT* block, MKL_INT* irow1, MKL_INT* icol1,
const MKL_INT* istart, const MKL_INT* istop, const MKL_INT* itmp1, const MKL_INT*
itmp2, const MKL_INT* liloz, const MKL_INT* lihiz, const float* vecs, float* v2,
float* v3, float* t1, float* t2, float* t3);
void dlaref (const char* type, double* a, const MKL_INT* lda, const MKL_INT* wantz,
double* z, const MKL_INT* ldz, const MKL_INT* block, MKL_INT* irow1, MKL_INT* icol1,
const MKL_INT* istart, const MKL_INT* istop, const MKL_INT* itmp1, const MKL_INT*
itmp2, const MKL_INT* liloz, const MKL_INT* lihiz, const double* vecs, double* v2,
double* v3, double* t1, double* t2, double* t3);
void claref (const char* type, MKL_Complex8* a, const MKL_INT* lda, const MKL_INT*
wantz, MKL_Complex8* z, const MKL_INT* ldz, const MKL_INT* block, MKL_INT* irow1,
MKL_INT* icol1, const MKL_INT* istart, const MKL_INT* istop, const MKL_INT* itmp1,
const MKL_INT* itmp2, const MKL_INT* liloz, const MKL_INT* lihiz, const MKL_Complex8*
vecs, MKL_Complex8* v2, MKL_Complex8* v3, MKL_Complex8* t1, MKL_Complex8* t2,
MKL_Complex8* t3);
void zlaref (const char* type, MKL_Complex16* a, const MKL_INT* lda, const MKL_INT*
wantz, MKL_Complex16* z, const MKL_INT* ldz, const MKL_INT* block, MKL_INT* irow1,
MKL_INT* icol1, const MKL_INT* istart, const MKL_INT* istop, const MKL_INT* itmp1,
const MKL_INT* itmp2, const MKL_INT* liloz, const MKL_INT* lihiz, const MKL_Complex16*
vecs, MKL_Complex16* v2, MKL_Complex16* v3, MKL_Complex16* t1, MKL_Complex16* t2,
MKL_Complex16* t3);

Include Files
• mkl_scalapack.h

Description
?laref applies one or several Householder reflectors of size 3 to one or two matrices (if column is specified)
on either their rows or columns.

Input Parameters

type (local)

If 'R': Apply reflectors to the rows of the matrix (apply from left)
Otherwise: Apply reflectors to the columns of the matrix
Unchanged on exit.

a (local)
Array, lld_a*LOCc(ja+n-1)

On entry, the matrix to receive the reflections.

lda (local)

On entry, the leading dimension of a.

Unchanged on exit.

wantz (local)

If wantz≠ 0, then apply any column reflections to z as well.

If wantz = 0, then do no additional work on z.

z (local)
Array, ldz*ncols, where the value ncols depends on other arguments. If wantz ≠ 0 and type ≠ 'R', then ncols = icol1 + 3*(lihiz - liloz + 1). Otherwise, ncols is unused.
On entry, the second matrix to receive column reflections.
This is changed only if wantz is set.

ldz (local)

On entry, the leading dimension of z.

Unchanged on exit.

block (local)

If nonzero, then apply several reflectors at once and read their data from
the vecs array.

If zero, apply the single reflector given by v2, v3, t1, t2, and t3.

irow1 (local)

On entry, the local row element of a.

icol1 (local)

On entry, the local column element of a.

istart (local)

Specifies the "number" of the first reflector. This is used as an index into
vecs if block is set. istart is ignored if block is zero.

istop (local)

Specifies the "number" of the last reflector. This is used as an index into
vecs if block is set. istop is ignored if block is zero.

itmp1 (local)

Starting range into a. For rows, this is the local first column. For columns,
this is the local first row.

itmp2 (local)

Ending range into a. For rows, this is the local last column. For columns,
this is the local last row.

liloz, lihiz (local)

These serve the same purpose as itmp1, itmp2 but for z when wantz is
set.

vecs (local)
Array of size 3*N (matrix size)
This holds the size 3 reflectors one after another and this is only accessed
when block is nonzero

v2, v3, t1, t2, t3 (local)


This holds information on a single size 3 Householder reflector and is read
when block is zero, and overwritten when block is nonzero

Output Parameters

a The updated matrix on exit.

z This is changed only if wantz is set.

irow1 Undefined on output.

icol1 Undefined on output.

v2, v3, t1, t2, t3 Overwritten when block is nonzero.

?larrb2
Provides limited bisection to locate eigenvalues for
more accuracy.

Syntax
void slarrb2(MKL_INT* n, float* d, float* lld, MKL_INT* ifirst, MKL_INT* ilast, float*
rtol1, float* rtol2, MKL_INT* offset, float* w, float* wgap, float* werr, float* work,
MKL_INT* iwork, float* pivmin, float* lgpvmn, float* lgspdm, MKL_INT* twist, MKL_INT*
info);
void dlarrb2(MKL_INT* n, double* d, double* lld, MKL_INT* ifirst, MKL_INT* ilast,
double* rtol1, double* rtol2, MKL_INT* offset, double* w, double* wgap, double* werr,
double* work, MKL_INT* iwork, double* pivmin, double* lgpvmn, double* lgspdm, MKL_INT*
twist, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
Given the relatively robust representation (RRR) LDLT, ?larrb2 does "limited" bisection to refine the
eigenvalues of LDLT with indices in a given range to more accuracy. Initial guesses for these eigenvalues are
input in w, the corresponding estimate of the error in these guesses and their gaps are input in werr and
wgap, respectively. During bisection, intervals [left, right] are maintained by storing their mid-points and
semi-widths in the arrays w and werr respectively. The range of indices is specified by the ifirst, ilast,
and offset parameters, as explained in Input Parameters.

NOTE
There are very few minor differences between larrb from LAPACK and this current function ?larrb2.
The most important reason for creating this nearly identical copy is profiling: in the ScaLAPACK MRRR
algorithm, eigenvalue computation using ?larrb2 is used for refinement in the construction of the
representation tree, as opposed to the initial computation of the eigenvalues for the root RRR which
uses ?larrb. When profiling, this allows an easy quantification of refinement work vs. computing
eigenvalues of the root.

Input Parameters

n The order of the matrix.

d Array of size n.

The n diagonal elements of the diagonal matrix D.

lld Array of size n-1.

The (n-1) elements l[i]*l[i]*d[i], i=0, ..., n-2.

ifirst The index of the first eigenvalue to be computed.

ilast The index of the last eigenvalue to be computed.

rtol1, rtol2 Tolerance for the convergence of the bisection intervals.


An interval [left, right] has converged if right - left < max (rtol1 * gap,
rtol2 * max(|left|, |right|)) where gap is the (estimated) distance to the
nearest eigenvalue.

offset Offset for the arrays w, wgap and werr, i.e., the elements indexed ifirst -
offset - 1 through ilast - offset -1 of these arrays are to be used.

w Array of size n

On input, w[ifirst - offset - 1] through w[ilast - offset - 1] are


estimates of the eigenvalues of LDLT indexed ifirst through ilast.

wgap Array of size n-1.

On input, the (estimated) gaps between consecutive eigenvalues of LDLT,


i.e., wgap[I - offset - 1] is the gap between eigenvalues I and I + 1. Note
that if ifirst = ilast then wgap[ifirst - offset - 1] must be set to
zero.

werr Array of size n.

On input, werr[ifirst - offset - 1] through werr[ilast - offset - 1]


are the errors in the estimates of the corresponding elements in w.

work (workspace) array of size 4*n.

Workspace.

iwork (workspace) array of size 2*n.

Workspace.

pivmin The minimum pivot in the Sturm sequence.

lgpvmn Logarithm of pivmin, precomputed.

lgspdm Logarithm of the spectral diameter, precomputed.

twist The twist index for the twisted factorization that is used for the negcount.
twist = n: Compute negcount from LDLT - λI = L+D+L+^T
twist = 1: Compute negcount from LDLT - λI = U-D-U-^T
twist = r, 1 < r < n: Compute negcount from LDLT - λI = Nr*Δr*Nr^T

OUTPUT Parameters

w On output, the eigenvalue estimates in w are refined.

wgap On output, the eigenvalue gaps in wgap are refined.

werr On output, the errors in werr are refined.

info Error flag.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

?larrd2
Computes the eigenvalues of a symmetric tridiagonal
matrix to suitable accuracy.

Syntax
void slarrd2(char* range, char* order, MKL_INT* n, float* vl, float* vu, MKL_INT* il,
MKL_INT* iu, float* gers, float* reltol, float* d, float* e, float* e2, float* pivmin,
MKL_INT* nsplit, MKL_INT* isplit, MKL_INT* m, float* w, float* werr, float* wl, float*
wu, MKL_INT* iblock, MKL_INT* indexw, float* work, MKL_INT* iwork, MKL_INT* dol,
MKL_INT* dou, MKL_INT* info);
void dlarrd2(char* range, char* order, MKL_INT* n, double* vl, double* vu, MKL_INT*
il, MKL_INT* iu, double* gers, double* reltol, double* d, double* e, double* e2,
double* pivmin, MKL_INT* nsplit, MKL_INT* isplit, MKL_INT* m, double* w, double* werr,
double* wl, double* wu, MKL_INT* iblock, MKL_INT* indexw, double* work, MKL_INT*
iwork, MKL_INT* dol, MKL_INT* dou, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
?larrd2 computes the eigenvalues of a symmetric tridiagonal matrix T to limited initial accuracy. This is an
auxiliary code to be called from larre2a.

?larrd2 has been created using the LAPACK code ?larrd, which itself stems from ?stebz. The motivation for creating ?larrd2 is efficiency: when computing eigenvalues in parallel and the input tridiagonal matrix splits into blocks, ?larrd2 can skip over blocks which contain none of the eigenvalues from dol to dou for which the processor is responsible. In extreme cases (such as large matrices consisting of many blocks of small size like 2x2), the gain can be substantial.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

range = 'A': ("All") all eigenvalues will be found.


= 'V': ("Value") all eigenvalues in the half-open interval (vl, vu] will be
found.
= 'I': ("Index") eigenvalues of the entire matrix with the indices in a given
range will be found.

order = 'B': ("By Block") the eigenvalues will be grouped by split-off block (see
iblock, isplit) and ordered from smallest to largest within the block.
= 'E': ("Entire matrix") the eigenvalues for the entire matrix will be ordered
from smallest to largest.

n The order of the tridiagonal matrix T. n >= 0.

vl, vu If range='V', the lower and upper bounds of the interval to be searched for
eigenvalues. Eigenvalues less than or equal to vl, or greater than vu, will
not be returned. vl < vu.

Not referenced if range = 'A' or 'I'.

il, iu If range='I', the indices (in ascending order) of the smallest eigenvalue, to
be returned in w[il-1], and largest eigenvalue, to be returned in w[iu-1].

1 ≤ il ≤ iu ≤ n, if n > 0; il = 1 and iu = 0 if n = 0.

Not referenced if range = 'A' or 'V'.

gers Array of size 2*n

The n Gerschgorin intervals (the i-th Gerschgorin interval is (gers[2*i-2],


gers[2*i-1])).

reltol The minimum relative width of an interval. When an interval is narrower


than reltol times the larger (in magnitude) endpoint, then it is considered
to be sufficiently small, i.e., converged. Note: this should always be at least
radix*machine epsilon.

d Array of size n

The n diagonal elements of the tridiagonal matrix T.

e Array of size n-1

The (n-1) off-diagonal elements of the tridiagonal matrix T.

e2 Array of size n-1

The (n-1) squared off-diagonal elements of the tridiagonal matrix T.

pivmin The minimum pivot allowed in the sturm sequence for T.

nsplit The number of diagonal blocks in the matrix T.


1 ≤nsplit≤n.

isplit Array of size n

The splitting points, at which T breaks up into submatrices.


The first submatrix consists of rows/columns 1 to isplit[0], the second of
rows/columns isplit[0]+1 through isplit[1], etc., and the nsplit-th
submatrix consists of rows/columns isplit[nsplit-2]+1 through
isplit[nsplit-1]=n.
(Only the first nsplit elements will actually be used, but since the user
cannot know a priori what value nsplit will have, n words must be
reserved for isplit.)

work (workspace) Array of size 4*n

iwork (workspace) Array of size 3*n

dol, dou Specifying an index range dol:dou allows the user to work on only a
selected part of the representation tree.
Otherwise, the setting dol=1, dou=n should be applied.

Note that dol and dou refer to the order in which the eigenvalues are
stored in W.

OUTPUT Parameters

m The actual number of eigenvalues found. 0 ≤m≤n.

(See also the description of info=2,3.)

w Array of size n

On exit, the first m elements of w will contain the eigenvalue


approximations. ?larrd2 computes an interval Ij = (aj, bj] that includes
eigenvalue j. The eigenvalue approximation is given as the interval midpoint
w[j-1]= (aj + bj)/2. The corresponding error is bounded by werr[j-1] =
abs(aj - bj)/2.

werr Array of size n

The error bound on the corresponding eigenvalue approximation in w.

wl, wu The interval (wl, wu] contains all the wanted eigenvalues.

If range='V', then wl=vl and wu=vu.

If range='A', then wl and wu are the global Gerschgorin bounds

on the spectrum.
If range='I', then wl and wu are computed by SLAEBZ from the

index range specified.

iblock Array of size n

At each row/column j where e[j-1] is zero or small, the matrix T is


considered to split into a block diagonal matrix. On exit, if info = 0,
iblock[i] specifies to which block (from 0 to the number of blocks minus
one) the eigenvalue w[i] belongs. (?larrd2 may use the remaining n-m
elements as workspace.)

indexw Array of size n

The indices of the eigenvalues within each block (submatrix); for example,
indexw[i]= j and iblock[i]=k imply that the (i+1)-th eigenvalue w[i] is the
j-th eigenvalue in block k.

info = 0: successful exit


< 0: if info = -i, the i-th argument had an illegal value

> 0: some or all of the eigenvalues failed to converge or were not computed:
• = 1 or 3: Bisection failed to converge for some eigenvalues; these eigenvalues are flagged by a negative block number. The effect is that the eigenvalues may not be as accurate as the absolute and relative tolerances.
• = 2 or 3: range='I' only: Not all of the eigenvalues il:iu were found.
• = 4: range='I', and the Gershgorin interval initially used was too small. No eigenvalues were computed.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

?larre2
Given a tridiagonal matrix, sets small off-diagonal
elements to zero and for each unreduced block, finds
base representations and eigenvalues.

Syntax
void slarre2(char* range, MKL_INT* n, float* vl, float* vu, MKL_INT* il, MKL_INT* iu,
float* d, float* e, float* e2, float* rtol1, float* rtol2, float* spltol, MKL_INT*
nsplit, MKL_INT* isplit, MKL_INT* m, MKL_INT* dol, MKL_INT* dou, float* w, float*
werr, float* wgap, MKL_INT* iblock, MKL_INT* indexw, float* gers, float* pivmin,
float* work, MKL_INT* iwork, MKL_INT* info);

void dlarre2(char* range, MKL_INT* n, double* vl, double* vu, MKL_INT* il, MKL_INT*
iu, double* d, double* e, double* e2, double* rtol1, double* rtol2, double* spltol,
MKL_INT* nsplit, MKL_INT* isplit, MKL_INT* m, MKL_INT* dol, MKL_INT* dou, double* w,
double* werr, double* wgap, MKL_INT* iblock, MKL_INT* indexw, double* gers, double*
pivmin, double* work, MKL_INT* iwork, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
To find the desired eigenvalues of a given real symmetric tridiagonal matrix T, ?larre2 sets, via ?larra, "small" off-diagonal elements to zero. For each unreduced block Ti, it finds
• a suitable shift at one end of the block's spectrum,
• the root RRR, Ti - σi*I = Li*Di*Li^T, and
• the eigenvalues of each Li*Di*Li^T.
The representations and eigenvalues found are then returned to ?stegr2 to compute the eigenvectors of T.

?larre2 is more suitable for parallel computation than the original LAPACK code for computing the root RRR
and its eigenvalues. When computing eigenvalues in parallel and the input tridiagonal matrix splits into
blocks, ?larre2 can skip over blocks which contain none of the eigenvalues from dol to dou for which the
processor is responsible. In extreme cases (such as large matrices consisting of many blocks of small size,
e.g. 2x2), the gain can be substantial.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

range = 'A': ("All") all eigenvalues will be found.


= 'V': ("Value") all eigenvalues in the half-open interval (vl, vu] will be
found.
= 'I': ("Index") eigenvalues of the entire matrix with the indices in a given
range will be found.

n The order of the matrix. n > 0.

vl, vu If range='V', the lower and upper bounds for the eigenvalues.

Eigenvalues less than or equal to vl, or greater than vu, will not be
returned. vl < vu.

il, iu If range='I', the indices (in ascending order) of the smallest eigenvalue, to
be returned in w[il-1], and largest eigenvalue, to be returned in w[iu-1].

1 ≤il≤iu≤n.

d Array of size n

The n diagonal elements of the tridiagonal matrix T.

e Array of size n

The first (n-1) entries contain the subdiagonal elements of the tridiagonal
matrix T; e[n-1] need not be set.

e2 Array of size n

The first (n-1) entries contain the squares of the subdiagonal elements of
the tridiagonal matrix T; e2[n-1] need not be set.

rtol1, rtol2 Parameters for bisection.

An interval [left, right] has converged if right-left < max( rtol1*gap, rtol2*max(|left|,|right|) ); see the sketch at the end of this parameter list.

spltol The threshold for splitting.

dol, dou Specifying an index range dol:dou allows the user to work on only a
selected part of the representation tree. Otherwise, the setting dol=1,
dou=n should be applied.
Note that dol and dou refer to the order in which the eigenvalues are
stored in w.

work Workspace array of size 6*n

iwork
Workspace array of size 5*n
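
The convergence test quoted above for rtol1 and rtol2 can be expressed directly in C. The following is an illustrative sketch of that formula only; the helper name and its arguments are not part of the library:

#include <math.h>

/* Illustrative only: the bisection convergence test described for rtol1 and
   rtol2. "gap" is the gap to the neighboring eigenvalue approximation. */
static int interval_converged(float left, float right,
                              float gap, float rtol1, float rtol2)
{
    float tol = fmaxf(rtol1 * gap, rtol2 * fmaxf(fabsf(left), fabsf(right)));
    return (right - left) < tol;
}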

OUTPUT Parameters

vl, vu If range='I' or ='A', ?larre2 computes bounds on the desired part of the
spectrum.

d The n diagonal elements of the diagonal matrices Di.

e e contains the subdiagonal elements of the unit bidiagonal matrices Li. The
entries e[isplit[i]], 0 ≤i<nsplit, contain the base points σi+1 on output.

e2 The entries e2[isplit[i]], 0≤i<nsplit, are set to zero.

nsplit The number of blocks T splits into. 1 ≤nsplit≤n.

isplit Array of size n

The splitting points, at which T breaks up into blocks. The first block consists of rows/columns 1 to isplit[0], the second of rows/columns isplit[0]+1 through isplit[1], etc., and the nsplit-th block consists of rows/columns isplit[nsplit-2]+1 through isplit[nsplit-1]=n. (A traversal sketch is given at the end of this parameter list.)

m The total number of eigenvalues (of all LiDiLiT) found.


w Array of size n

The first m elements contain the eigenvalues. The eigenvalues of each of the
blocks, LiDiLiT, are sorted in ascending order (?larre2 may use the
remaining n-m elements as workspace).

Note that immediately after exiting this function, only the eigenvalues in
w with indices in the range dol-1:dou-1 are reliable on this processor when the
eigenvalue computation is done in parallel.

werr Array of size n

The error bound on the corresponding eigenvalue in w.

Note that immediately after exiting this function, only the uncertainties in
werr with indices in the range dol-1:dou-1 are reliable on this processor when
the eigenvalue computation is done in parallel.

wgap Array of size n

The separation from the right neighbor eigenvalue in w.

The gap is only with respect to the eigenvalues of the same block, as each
block has its own representation tree.
Exception: at the right end of a block we store the left gap.
Note that immediately after exiting this function, only the gaps in wgap with
indices in the range dol-1:dou-1 are reliable on this processor when the
eigenvalue computation is done in parallel.

iblock Array of size n

The indices of the blocks (submatrices) associated with the corresponding eigenvalues in w; iblock[i]=1 if eigenvalue w[i] belongs to the first block from the top, iblock[i]=2 if w[i] belongs to the second block, and so on.

indexw Array of size n

The indices of the eigenvalues within each block (submatrix); for example,
indexw[i]= 10 and iblock[i]=2 imply that the (i+1)-th eigenvalue w[i] is
the 10th eigenvalue in block 2.

gers Array of size 2*n

The n Gerschgorin intervals (the i-th Gerschgorin interval is (gers[2*i-2], gers[2*i-1])).

pivmin The minimum pivot in the Sturm sequence for T.

info = 0: successful exit

> 0: A problem occurred in ?larre2.

< 0: One of the called functions signaled an internal problem. Needs inspection of the corresponding parameter info for further information.

=-1: Problem in ?larrd.

=-2: Not enough internal iterations to find the base representation.

=-3: Problem in ?larrb when computing the refined root representation for ?lasq2.

=-4: Problem in ?larrb when performing bisection on the desired part of the spectrum.

=-5: Problem in ?lasq2.

=-6: Problem in ?lasq2.
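
For illustration, the block layout encoded by nsplit and isplit (see the isplit entry above) can be traversed as in the following sketch; the loop variables are not part of the library:

/* Illustrative only: visit the unreduced blocks of T using isplit.
   Block k+1 covers rows/columns first through last (1-based). */
MKL_INT first = 1;
for (MKL_INT k = 0; k < nsplit; ++k) {
    MKL_INT last = isplit[k];
    /* ... process rows/columns first through last of T ... */
    first = last + 1;
}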

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

?larre2a
Given a tridiagonal matrix, sets small off-diagonal
elements to zero and for each unreduced block, finds
base representations and eigenvalues.

Syntax
void slarre2a(char* range, MKL_INT* n, float* vl, float* vu, MKL_INT* il, MKL_INT* iu,
float* d, float* e, float* e2, float* rtol1, float* rtol2, float* spltol, MKL_INT*
nsplit, MKL_INT* isplit, MKL_INT* m, MKL_INT* dol, MKL_INT* dou, MKL_INT* needil,
MKL_INT* neediu, float* w, float* werr, float* wgap, MKL_INT* iblock, MKL_INT* indexw,
float* gers, float* sdiam, float* pivmin, float* work, MKL_INT* iwork, float* minrgp,
MKL_INT* info);
void dlarre2a(char* range, MKL_INT* n, double* vl, double* vu, MKL_INT* il, MKL_INT*
iu, double* d, double* e, double* e2, double* rtol1, double* rtol2, double* spltol,
MKL_INT* nsplit, MKL_INT* isplit, MKL_INT* m, MKL_INT* dol, MKL_INT* dou, MKL_INT*
needil, MKL_INT* neediu, double* w, double* werr, double* wgap, MKL_INT* iblock,
MKL_INT* indexw, double* gers, double* sdiam, double* pivmin, double* work, MKL_INT*
iwork, double* minrgp, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
To find the desired eigenvalues of a given real symmetric tridiagonal matrix T, ?larre2a sets any "small" off-
diagonal elements to zero, and for each unreduced block Ti, it finds

• a suitable shift at one end of the block's spectrum,


• the base representation, Ti - σiI = LiDiLiT, and
• eigenvalues of each LiDiLiT.

NOTE
The algorithm obtains a crude picture of all the wanted eigenvalues (as selected by range). However,
to reduce work and improve scalability, only the eigenvalues dol to dou are refined. Furthermore, if
the matrix splits into blocks, RRRs for blocks that do not contain eigenvalues from dol to dou are
skipped. The DQDS algorithm (function ?lasq2) is not used, unlike in the sequential case. Instead,
eigenvalues are computed in parallel to some figures using bisection.


Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

range = 'A': ("All") all eigenvalues will be found.


= 'V': ("Value") all eigenvalues in the half-open interval (vl, vu] will be
found.
= 'I': ("Index") eigenvalues of the entire matrix with the indices in a given
range will be found.

n The order of the matrix. n > 0.

vl, vu If range='V', the lower and upper bounds for the eigenvalues. Eigenvalues
less than or equal to vl, or greater than vu, will not be returned. vl < vu.

If range='I' or ='A', ?larre2a computes bounds on the desired part of the spectrum.

il, iu If range='I', the indices (in ascending order) of the smallest eigenvalue, to
be returned in w[il-1], and largest eigenvalue, to be returned in w[iu-1].

1 ≤il≤iu≤n.

d Array of size n

On entry, the n diagonal elements of the tridiagonal matrix T.

e Array of size n

The first (n-1) entries contain the subdiagonal elements of the tridiagonal
matrix T; e[n-1] need not be set.

e2 Array of size n

The first (n-1) entries contain the squares of the subdiagonal elements of
the tridiagonal matrix T; e2[n-1] need not be set.

rtol1, rtol2 Parameters for bisection.


An interval [left,right] has converged if right - left < max( rtol1*gap,
rtol2*max(|left|,|right|) )

spltol The threshold for splitting.

dol, dou If the user wants to work on only a selected part of the representation tree,
he can specify an index range dol:dou.

Otherwise, the setting dol=1, dou=n should be applied.

Note that dol and dou refer to the order in which the eigenvalues are
stored in w.

work Workspace array of size 6*n

iwork Workspace array of size 5*n

minrgp The minimum relative gap threshold to decide whether an eigenvalue or a cluster boundary is reached.

OUTPUT Parameters

vl, vu If range='V', the lower and upper bounds for the eigenvalues. Eigenvalues
less than or equal to vl, or greater than vu, are not returned. vl < vu.

If range='I' or range='A', ?larre2a computes bounds on the desired part of the spectrum.

d The n diagonal elements of the diagonal matrices Di.

e e contains the subdiagonal elements of the unit bidiagonal matrices Li. The
entries e[isplit[i]], 0 ≤i<nsplit, contain the base points σi+1 on output.

e2 The entries e2[isplit[i]], 0 ≤ i < nsplit, are set to zero.

nsplit The number of blocks T splits into. 1 ≤nsplit≤n.

isplit Array of size n

The splitting points, at which T breaks up into blocks. The first block consists of rows/columns 1 to isplit[0], the second of rows/columns isplit[0]+1 through isplit[1], etc., and the nsplit-th block consists of rows/columns isplit[nsplit-2]+1 through isplit[nsplit-1]=n.

m The total number of eigenvalues (of all LiDiLiT) found.

needil, neediu The indices of the leftmost and rightmost eigenvalues of the root node RRR
which are needed to accurately compute the relevant part of the
representation tree.

w Array of size n

The first m elements contain the eigenvalues. The eigenvalues of each of the
blocks, LiDiLiT, are sorted in ascending order ( ?larre2a may use the
remaining n-m elements as workspace).

Note that immediately after exiting this function, only the eigenvalues in
w with indices in the range dol-1:dou-1 are reliable on this processor because the
eigenvalue computation is done in parallel.

werr Array of size n

The error bound on the corresponding eigenvalue in w.


Note that immediately after exiting this function, only the uncertainties in
werr with indices in the range dol-1:dou-1 are reliable on this processor
because the eigenvalue computation is done in parallel.

wgap Array of size n

The separation from the right neighbor eigenvalue in w. The gap is only with
respect to the eigenvalues of the same block as each block has its own
representation tree.
Exception: at the right end of a block we store the left gap.
Note that immediately after exiting this function, only the gaps in wgap with
indices in the range dol-1:dou-1 are reliable on this processor because the
eigenvalue computation is done in parallel.

iblock Array of size n

The indices of the blocks (submatrices) associated with the corresponding eigenvalues in w; iblock[i]=1 if eigenvalue w[i] belongs to the first block from the top, iblock[i]=2 if w[i] belongs to the second block, and so on.

indexw Array of size n

The indices of the eigenvalues within each block (submatrix); for example,
indexw[i]= 10 and iblock[i]=2 imply that the (i+1)-th eigenvalue w[i] is
the 10th eigenvalue in block 2.

gers Array of size 2*n

The n Gerschgorin intervals (the i-th Gerschgorin interval is (gers[2*i-2], gers[2*i-1])).

pivmin The minimum pivot in the Sturm sequence for T.

info = 0: successful exit

> 0: A problem occurred in ?larre2a.

< 0: One of the called functions signaled an internal problem. Needs inspection of the corresponding parameter info for further information.

=-1: Problem in ?larrd2.

=-2: Not enough internal iterations to find the base representation.

=-3: Problem in ?larrb2 when computing the refined root representation.

=-4: Problem in ?larrb2 when performing bisection on the desired part of the spectrum.

=-9: Problem: m < dou-dol+1; that is, the code found fewer eigenvalues than it was supposed to.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

?larrf2
Finds a new relatively robust representation such that
at least one of the eigenvalues is relatively isolated.

Syntax
void slarrf2(MKL_INT* n, float* d, float* l, float* ld, MKL_INT* clstrt, MKL_INT*
clend, MKL_INT* clmid1, MKL_INT* clmid2, float* w, float* wgap, float* werr, MKL_INT*
trymid, float* spdiam, float* clgapl, float* clgapr, float* pivmin, float* sigma,
float* dplus, float* lplus, float* work, MKL_INT* info);
void dlarrf2(MKL_INT* n, double* d, double* l, double* ld, MKL_INT* clstrt, MKL_INT*
clend, MKL_INT* clmid1, MKL_INT* clmid2, double* w, double* wgap, double* werr,
MKL_INT* trymid, double* spdiam, double* clgapl, double* clgapr, double* pivmin,
double* sigma, double* dplus, double* lplus, double* work, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
Given the initial representation LDLT and its cluster of close eigenvalues (in a relative measure), defined by
the indices of the first and last eigenvalues in the cluster, ?larrf2 finds a new relatively robust
representation LDLT - σ I = L+D+L+T such that at least one of the eigenvalues of L+D+L+T is relatively
isolated.
This is an enhanced version of ?larrf that also tries shifts in the middle of the cluster, should there be a
large gap, in order to break large clusters into at least two pieces.
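
For orientation, the shifted factorization LDLT - σI = L+D+L+T that this function produces can be formed with the differential stationary qd recurrence. The sketch below only illustrates that relation for a given shift sigma; it is not the routine itself, which also chooses the shift and guards against element growth:

/* Illustrative only: given d (size n), l and ld = l[i]*d[i] (size n-1) and a
   shift sigma, form dplus and lplus with L D L^T - sigma*I = L+ D+ L+^T.
   No safeguards against element growth are included here. */
double s = -sigma;
for (MKL_INT i = 0; i < n - 1; ++i) {
    dplus[i] = d[i] + s;
    lplus[i] = ld[i] / dplus[i];
    s = s * lplus[i] * l[i] - sigma;
}
dplus[n - 1] = d[n - 1] + s;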

Input Parameters

n The order of the matrix (subblock, if the matrix was split).

d Array of size n

The n diagonal elements of the diagonal matrix D.

l Array of size n-1

The (n-1) subdiagonal elements of the unit bidiagonal matrix L.

ld Array of size n-1

The (n-1) elements l[i]*d[i].

clstrt The index of the first eigenvalue in the cluster.

clend The index of the last eigenvalue in the cluster.

clmid1, clmid2 The indices of a middle eigenvalue pair with a large gap.

w Array of size ≥ (clend-clstrt+1)

The eigenvalue approximations of L D LT in ascending order. w[clstrt - 1] through w[clend - 1] form the cluster of relatively close eigenvalues.

wgap Array of size ≥ (clend-clstrt+1)

The separation from the right neighbor eigenvalue in w.

werr Array of size ≥ (clend-clstrt+1)

werr contains the semiwidth of the uncertainty interval of the corresponding eigenvalue approximation in w.


spdiam Estimate of the spectral diameter obtained from the Gerschgorin intervals

clgapl, clgapr Absolute gap on each end of the cluster.


Set by the calling function to protect against shifts too close to eigenvalues
outside the cluster.

pivmin The minimum pivot allowed in the Sturm sequence.

work Workspace array of size 2*n

OUTPUT Parameters

wgap Contains refined values of its input approximations. Very small gaps are
unchanged.

sigma The shift (σ) used to form L+D+L+T.

dplus Array of size n

The n diagonal elements of the diagonal matrix D+.

lplus Array of size n-1

The first (n-1) elements of lplus contain the subdiagonal elements of the
unit bidiagonal matrix L+.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

?larrv2
Computes the eigenvectors of the tridiagonal matrix T
= L*D*LT given L, D and the eigenvalues of L*D*LT.

Syntax
void slarrv2(MKL_INT* n, float* vl, float* vu, float* d, float* l, float* pivmin,
MKL_INT* isplit, MKL_INT* m, MKL_INT* dol, MKL_INT* dou, MKL_INT* needil, MKL_INT*
neediu, float* minrgp, float* rtol1, float* rtol2, float* w, float* werr, float* wgap,
MKL_INT* iblock, MKL_INT* indexw, float* gers, float* sdiam, float* z, MKL_INT* ldz,
MKL_INT* isuppz, float* work, MKL_INT* iwork, MKL_INT* vstart, MKL_INT* finish,
MKL_INT* maxcls, MKL_INT* ndepth, MKL_INT* parity, MKL_INT* zoffset, MKL_INT* info);
void dlarrv2(MKL_INT* n, double* vl, double* vu, double* d, double* l, double* pivmin,
MKL_INT* isplit, MKL_INT* m, MKL_INT* dol, MKL_INT* dou, MKL_INT* needil, MKL_INT*
neediu, double* minrgp, double* rtol1, double* rtol2, double* w, double* werr, double*
wgap, MKL_INT* iblock, MKL_INT* indexw, double* gers, double* sdiam, double* z,
MKL_INT* ldz, MKL_INT* isuppz, double* work, MKL_INT* iwork, MKL_INT* vstart, MKL_INT*
finish, MKL_INT* maxcls, MKL_INT* ndepth, MKL_INT* parity, MKL_INT* zoffset, MKL_INT*
info);

Include Files
• mkl_scalapack.h

Description
?larrv2 computes the eigenvectors of the tridiagonal matrix T = LDLT given L, D and approximations to the
eigenvalues of LDLT. The input eigenvalues should have been computed by ?larre2a or by previous calls to ?larrv2.
The major difference between the parallel and the sequential construction of the representation tree is that in
the parallel case, not all eigenvalues of a given cluster might be computed locally. Other processors might
"own" and refine part of an eigenvalue cluster. This is crucial for scalability. Thus there might be
communication necessary before the current level of the representation tree can be parsed.
Please note:

• The calling sequence has two additional integer parameters, dol and dou, that should satisfy
m≥dou≥dol≥1. These parameters are only relevant when both eigenvalues and eigenvectors are computed
(stegr2b parameter jobz = 'V'). ?larrv2 only computes the eigenvectors corresponding to eigenvalues
dol through dou in w. (That is, instead of computing the eigenvectors belonging to w[0] through w[m-1],
only the eigenvectors belonging to eigenvalues w[dol - 1] through w[dou - 1] are computed. In this case,
only the eigenvalues dol:dou are guaranteed to be accurately refined to all figures by Rayleigh-Quotient
iteration.)
• The additional arguments vstart, finish, ndepth, parity, zoffset are included as a thread-safe
implementation equivalent to save variables. These variables store details about the local representation
tree which is computed layerwise. For scalability reasons, eigenvalues belonging to the locally relevant
representation tree might be computed on other processors. These need to be communicated before the
inspection of the RRRs can proceed on any given layer. Note that the computation has ended only when the
variable finish is non-zero; then all eigenpairs between dol and dou have been computed, and m is set to
dou - dol + 1.
• ?larrv2 needs more workspace in z than the sequential slarrv. It is used to store the conformal
embedding of the local representation tree.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

n The order of the matrix. n≥ 0.

vl, vu Lower and upper bounds of the interval that contains the desired
eigenvalues. vl < vu. Needed to compute gaps on the left or right end of
the extremal eigenvalues in the desired range. vu is currently not used but
kept as parameter in case needed.

d Array of size n

The n diagonal elements of the diagonal matrix d. On exit, d is overwritten.

l Array of size n


The (n-1) subdiagonal elements of the unit bidiagonal matrix L are in elements 0 to n-2 of l (if the matrix is not split). At the end of each block the corresponding shift, as given by ?larre, is stored. On exit, l is overwritten.

pivmin The minimum pivot allowed in the sturm sequence.

isplit Array of size n

The splitting points, at which the matrix T breaks up into blocks. The first
block consists of rows/columns 1 to isplit[ 0 ], the second of rows/
columns isplit[ 0 ] + 1 through isplit[ 1 ], etc.

m The total number of input eigenvalues. 0 ≤m≤n.

dol, dou If you want to compute only selected eigenvectors from all the eigenvalues
supplied, you can specify an index range dol:dou. Or else the setting
dol=1, dou=m should be applied. Note that dol and dou refer to the order
in which the eigenvalues are stored in w. If you want to compute only
selected eigenpairs, the columns dol-1 to dou+1 of the eigenvector space
Z contain the computed eigenvectors. All other columns of Z are set to
zero.
If dol > 1, then Z(:,dol-1-zoffset) is used.

If dou < m, then Z(:,dou+1-zoffset) is used.

needil, neediu Describe which are the left and right outermost eigenvalues that still need
to be included in the computation. These indices indicate whether
eigenvalues from other processors are needed to correctly compute the
conformally embedded representation tree.
When dol≤needil≤neediu≤dou, all required eigenvalues are local to the
processor and no communication is required to compute its part of the
representation tree.

minrgp The minimum relative gap threshold to decide whether an eigenvalue or a cluster boundary is reached.

rtol1, rtol2 Parameters for bisection. An interval [left,right] has converged if right-left <
max( rtol1*gap, rtol2*max(|left|,|right|) )

w Array of size n

The first m elements of w contain the approximate eigenvalues for which eigenvectors are to be computed. The eigenvalues should be grouped by
split-off block and ordered from smallest to largest within the block. (The
output array w from ?stegr2a is expected here.) Furthermore, they are
with respect to the shift of the corresponding root representation for their
block.

werr Array of size n

The first m elements contain the semiwidth of the uncertainty interval of the
corresponding eigenvalue in w.

wgap Array of size n

The separation from the right neighbor eigenvalue in w.

iblock Array of size n

The indices of the blocks (submatrices) associated with the corresponding eigenvalues in w; iblock[i]=1 if eigenvalue w[i] belongs to the first block from the top, iblock[i]=2 if w[i] belongs to the second block, and so on.

indexw Array of size n

The indices of the eigenvalues within each block (submatrix). For example:
indexw[i]= 10 and iblock[i]=2 imply that the (i+1)-th eigenvalue w[i] is
the 10th eigenvalue in block 2.

gers Array of size 2*n

The n Gerschgorin intervals (the i-th Gerschgorin interval is (gers[2*i-2], gers[2*i-1])). The Gerschgorin intervals should be computed from the original unshifted matrix.
Not used but kept as parameter for possible future use.

sdiam Array of size n

The spectral diameters for all unreduced blocks.

ldz The leading dimension of the array z. ldz≥ 1, and if stegr2b parameter
jobz = 'V', ldz≥ max(1,n).

work (workspace) array of size 12*n

iwork (workspace) Array of size 7*n

vstart Non-zero on initialization, set to zero afterwards.

finish A flag that indicates whether all eigenpairs have been computed.

maxcls The largest cluster worked on by this processor in the representation tree.

ndepth The current depth of the representation tree. Set to zero on initial pass,
changed when the deeper levels of the representation tree are generated.

parity An internal parameter needed for the storage of the clusters on the current
level of the representation tree.

zoffset Offset for storing the eigenpairs when z is distributed in 1D-cyclic fashion.

OUTPUT Parameters

needil, neediu
w Unshifted eigenvalues for which eigenvectors have already been computed.

werr Contains refined values of its input approximations.

wgap Contains refined values of its input approximations. Very small gaps are
changed.

z Array of size ldz * max(1,m)


If info = 0, the first m columns of the matrix Z, stored in the array z, contain the orthonormal eigenvectors of the matrix T corresponding to the input eigenvalues, with the i-th column of Z holding the eigenvector associated with w[i - 1].

In the distributed version, only a subset of columns is accessed; see dol, dou, and zoffset.

isuppz
Array of size 2*max(1,m)

The support of the eigenvectors in z, i.e., the indices indicating the non-zero elements in z. The i-th eigenvector is non-zero only in elements isuppz[ 2*i-2 ] through isuppz[ 2*i-1 ]; see the sketch at the end of this parameter list.

vstart Non-zero on initialization, set to zero afterwards.

finish A flag that indicates whether all eigenpairs have been computed.

maxcls The largest cluster worked on by this processor in the representation tree.

ndepth The current depth of the representation tree. Set to zero on initial pass,
changed when the deeper levels of the representation tree are generated.

parity An internal parameter needed for the storage of the clusters on the current
level of the representation tree.

info = 0: successful exit

> 0: A problem occurred in ?larrv2.

< 0: One of the called functions signaled an internal problem. Needs inspection of the corresponding parameter info for further information.

=-1: Problem in ?larrb2 when refining a child's eigenvalues.

=-2: Problem in ?larrf2 when computing the RRR of a child. When a child is inside a tight cluster, it can be difficult to find an RRR. A partial remedy from the user's point of view is to make the parameter minrgp smaller and recompile. However, as the orthogonality of the computed vectors is proportional to 1/minrgp, be aware that decreasing minrgp might reduce precision.

=-3: Problem in ?larrb2 when refining a single eigenvalue after the Rayleigh correction was rejected.

=5: The Rayleigh Quotient Iteration failed to converge to full accuracy.
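
As an illustration of the isuppz support format described above, a caller can restrict work to the non-zero part of a computed eigenvector, as in the following sketch (the surrounding declarations of i, z, ldz, and isuppz are assumed; the variable names are not part of the library):

/* Illustrative only: compute the squared 2-norm of eigenvector number i
   (1-based), touching only its non-zero entries as given by isuppz.
   z is column-major with leading dimension ldz. */
MKL_INT first = isuppz[2*i - 2];   /* first non-zero row index (1-based) */
MKL_INT last  = isuppz[2*i - 1];   /* last non-zero row index (1-based) */
double nrm2 = 0.0;
for (MKL_INT row = first; row <= last; ++row) {
    double zri = z[(row - 1) + (i - 1) * ldz];
    nrm2 += zri * zri;
}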

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

?lasorte
Sorts eigenpairs by real and complex data types.

Syntax
void slasorte (float *s , MKL_INT *lds , MKL_INT *j , float *out , MKL_INT *info );

void dlasorte (double *s , MKL_INT *lds , MKL_INT *j , double *out , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The ?lasorte function sorts eigenpairs so that real eigenpairs are together and complex eigenpairs are
together. This helps to employ 2x2 shifts easily since every second subdiagonal is guaranteed to be zero. This
function does no parallel work and makes no calls.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

s (local)
Array of size lds.
On entry, a matrix already in Schur form.

lds (local)
On entry, the leading dimension of the array s; unchanged on exit.

j (local)
On entry, the order of the matrix S; unchanged on exit.

out (local)
Array of size 2*j. The work buffer required by the function.

info (local)
Set, if the input matrix had an odd number of real eigenvalues and things
could not be paired or if the input matrix S was not originally in Schur form.
0 indicates successful completion.

Output Parameters

s On exit, the diagonal blocks of S have been rewritten to pair the eigenvalues. The resulting matrix is no longer similar to the input.

out Work buffer.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.


?lasrt2
Sorts numbers in increasing or decreasing order.

Syntax
void slasrt2 (char *id , MKL_INT *n , float *d , MKL_INT *key , MKL_INT *info );
void dlasrt2 (char *id , MKL_INT *n , double *d , MKL_INT *key , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The ?lasrt2 function is a modified version of the LAPACK function ?lasrt, which sorts the numbers in d in increasing order
(if id = 'I') or in decreasing order (if id = 'D'). It uses Quick Sort, reverting to Insertion Sort on arrays
of size ≤ 20. The size of STACK limits n to about 2^32.

Input Parameters

id = 'I': sort d in increasing order;

= 'D': sort d in decreasing order.

n The length of the array d.

d Array of size n.
On entry, the array to be sorted.

key Array of size n.


On entry, key contains a key to each of the entries in d.

Typically, key[i]= i+1 for all i = 0, ..., n-1.

Output Parameters

d On exit, d has been sorted into increasing order (d[0] ≤ ... ≤ d[n - 1]) or into decreasing order (d[0] ≥ ... ≥ d[n - 1]), depending on id.

info = 0: successful exit


< 0: if info = -i, the i-th argument had an illegal value.

key On exit, key is permuted in exactly the same manner as d was permuted
from input to output. Therefore, if key[i] = i+1 for all i = 0, ..., n-1 on input,
then the output value d[i] equals the input value d[key[i]-1].
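
A minimal usage sketch of slasrt2 follows; it demonstrates the key permutation property described above. The array contents are chosen for illustration only:

#include <stdio.h>
#include "mkl_scalapack.h"

int main(void)
{
    char id = 'I';                      /* sort into increasing order */
    MKL_INT n = 5, info;
    float d[5] = { 3.0f, 1.0f, 4.0f, 1.5f, 2.0f };
    MKL_INT key[5];
    for (MKL_INT i = 0; i < n; ++i) key[i] = i + 1;   /* key[i] = i+1 on input */

    slasrt2(&id, &n, d, key, &info);

    /* On exit (info = 0): d is sorted ascending and key records the permutation,
       so the output d[i] equals the input value d[key[i]-1]. */
    for (MKL_INT i = 0; i < n; ++i)
        printf("d[%d] = %.1f  key[%d] = %d\n", (int)i, d[i], (int)i, (int)key[i]);
    return 0;
}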

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

?stegr2
Computes selected eigenvalues and eigenvectors of a
real symmetric tridiagonal matrix.

Syntax
void sstegr2(char* jobz, char* range, MKL_INT* n, float* d, float* e, float* vl,
float* vu, MKL_INT* il, MKL_INT* iu, MKL_INT* m, float* w, float* z, MKL_INT* ldz,
MKL_INT* nzc, MKL_INT* isuppz, float* work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT*
liwork, MKL_INT* dol, MKL_INT* dou, MKL_INT* zoffset, MKL_INT* info);
void dstegr2(char* jobz, char* range, MKL_INT* n, double* d, double* e, double* vl,
double* vu, MKL_INT* il, MKL_INT* iu, MKL_INT* m, double* w, double* z, MKL_INT* ldz,
MKL_INT* nzc, MKL_INT* isuppz, double* work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT*
liwork, MKL_INT* dol, MKL_INT* dou, MKL_INT* zoffset, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
?stegr2 computes selected eigenvalues and, optionally, eigenvectors of a real symmetric tridiagonal matrix
T. It is invoked in the ScaLAPACK MRRR driver p?syevr and the corresponding Hermitian version either when
only eigenvalues are to be computed, or when only a single processor is used (the sequential-like case).
?stegr2 has been adapted from LAPACK's ?stegr. Please note the following crucial changes.

1. The calling sequence has two additional integer parameters, dol and dou, that should satisfy
m≥dou≥dol≥1. ?stegr2 only computes the eigenpairs corresponding to eigenvalues dol through dou in
w, indexed dol-1 through dou-1. (That is, instead of computing the eigenpairs belonging to w[0]
through w[m-1], only the eigenvectors belonging to eigenvalues w[dol-1] through w[dou-1] are
computed. In this case, only the eigenvalues dol through dou are guaranteed to be fully accurate.)
2. m is not the number of eigenvalues specified by range, but is m = dou - dol + 1. This concerns the
case where only eigenvalues are computed, but on more than one processor. Thus, in this case m refers
to the number of eigenvalues computed on this processor.
3. The arrays w and z might not contain all the wanted eigenpairs locally, instead this information is
distributed over other processors.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

jobz = 'N': Compute eigenvalues only;


= 'V': Compute eigenvalues and eigenvectors.


range = 'A': all eigenvalues will be found.


= 'V': all eigenvalues in the half-open interval (vl,vu] will be found.

= 'I': eigenvalues of the entire matrix with the indices in a given range will
be found.

n The order of the matrix. n≥ 0.

d Array of size n

On entry, the n diagonal elements of the tridiagonal matrix T. Overwritten on exit.

e Array of size n

On entry, the (n-1) subdiagonal elements of the tridiagonal matrix T in elements 0 to n-2 of e. e[n-1] need not be set on input, but is used internally as workspace. Overwritten on exit.

vl, vu If range='V', the lower and upper bounds of the interval to be searched for
eigenvalues. vl < vu.

Not referenced if range = 'A' or 'I'.

il, iu If range='I', the indices (in ascending order) of the smallest eigenvalue, to
be returned in w[il-1], and largest eigenvalue, to be returned in w[iu-1].

1 ≤il≤iu≤n, if n > 0.

Not referenced if range = 'A' or 'V'.

ldz The leading dimension of the array z. ldz≥ 1, and if jobz = 'V', then ldz≥
max(1,n).

nzc The number of eigenvectors to be held in the array z, storing the matrix Z.

If range = 'A', then nzc≥ max(1,n).

If range = 'V', then nzc≥ the number of eigenvalues in (vl,vu].

If range = 'I', then nzc≥iu-il+1.

If nzc = -1, then a workspace query is assumed; the function calculates the
number of columns of the matrix Z that are needed to hold the
eigenvectors. This value is returned as the first entry of the z array, and no
error message related to nzc is issued.

lwork The size of the array work. lwork≥ max(1,18*n) if jobz = 'V', and lwork≥ max(1,12*n) if jobz = 'N'. If lwork = -1, then a
workspace query is assumed; the function only calculates the optimal size
of the work array, returns this value as the first entry of the work array,
and no error message related to lwork is issued.

liwork The size of the array iwork. liwork≥ max(1,10*n) if the eigenvectors are
desired, and liwork≥ max(1,8*n) if only the eigenvalues are to be
computed.

If liwork = -1, then a workspace query is assumed; the function only
calculates the optimal size of the iwork array, returns this value as the first
entry of the iwork array, and no error message related to liwork is issued.

dol, dou From the eigenvalues w[0] through w[m-1], only eigenvectors Z(:,dol) to
Z(:,dou) are computed.

If dol > 1, then Z(:,dol-1-zoffset) is used and overwritten.

If dou < m, then Z(:,dou+1-zoffset) is used and overwritten.

zoffset Offset for storing the eigenpairs when z is distributed in 1D-cyclic fashion

OUTPUT Parameters

m Globally summed over all processors, m equals the total number of eigenvalues found. 0 ≤m≤n. If range = 'A', m = n, and if range = 'I', m = iu-il+1. The local output equals m = dou - dol + 1.

w Array of size n

The first m elements contain the selected eigenvalues in ascending order.


Note that immediately after exiting this function, only the eigenvalues
indexed dol-1 through dou-1 are reliable on this processor because the
eigenvalue computation is done in parallel. Other processors will hold
reliable information on other parts of the w array. This information is
communicated in the ScaLAPACK driver.

z Array of size ldz * max(1,m).

If jobz = 'V', and if info = 0, then the first m columns of the matrix Z
stored in z contain some of the orthonormal eigenvectors of the matrix T
corresponding to the selected eigenvalues, with the i-th column of Z holding
the eigenvector associated with w[i-1].

If jobz = 'N', then z is not referenced.

Note: the user must ensure that at least max(1,m) columns of the matrix
are supplied in the array z; if range = 'V', the exact value of m is not known
in advance and can be computed with a workspace query by setting nzc =
-1, see below.

isuppz array of size 2*max(1,m)

The support of the eigenvectors in z, i.e., the indices indicating the nonzero
elements in z. The i-th computed eigenvector is nonzero only in elements
isuppz[ 2*i-2 ] through isuppz[ 2*i -1]. This is relevant in the case when
the matrix is split. isuppz is only set if n>2.

work On exit, if info = 0, work[0] returns the optimal (and minimal) lwork.

iwork On exit, if info = 0, iwork[0] returns the optimal liwork.

info On exit, info

= 0: successful exit
other: if info = -i, the i-th argument had an illegal value


if info = 10X, internal error in ?larre2,

if info = 20X, internal error in ?larrv.

Here, the digit X = ABS( iinfo ) < 10, where iinfo is the nonzero error
code returned by ?larre2 or ?larrv, respectively.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

?stegr2a
Computes selected eigenvalues and initial
representations needed for eigenvector computations.

Syntax
void sstegr2a(char* jobz, char* range, MKL_INT* n, float* d, float* e, float* vl,
float* vu, MKL_INT* il, MKL_INT* iu, MKL_INT* m, float* w, float* z, MKL_INT* ldz,
MKL_INT* nzc, float* work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT* liwork, MKL_INT*
dol, MKL_INT* dou, MKL_INT* needil, MKL_INT* neediu, MKL_INT* inderr, MKL_INT* nsplit,
float* pivmin, float* scale, float* wl, float* wu, MKL_INT* info);
void dstegr2a(char* jobz, char* range, MKL_INT* n, double* d, double* e, double* vl,
double* vu, MKL_INT* il, MKL_INT* iu, MKL_INT* m, double* w, double* z, MKL_INT* ldz,
MKL_INT* nzc, double* work, MKL_INT* lwork, MKL_INT* iwork, MKL_INT* liwork, MKL_INT*
dol, MKL_INT* dou, MKL_INT* needil, MKL_INT* neediu, MKL_INT* inderr, MKL_INT* nsplit,
double* pivmin, double* scale, double* wl, double* wu, MKL_INT* info);

Include Files
• mkl_scalapack.h

Description
?stegr2a computes selected eigenvalues and initial representations needed for eigenvector computations
in ?stegr2b. It is invoked in the ScaLAPACK MRRR driver p?syevr and the corresponding Hermitian version
when both eigenvalues and eigenvectors are computed in parallel on multiple processors. For this case,
?stegr2a implements the first part of the MRRR algorithm, parallel eigenvalue computation and finding the
root RRR. At the end of ?stegr2a, other processors might have a part of the spectrum that is needed to
continue the computation locally. Once this eigenvalue information has been received by the processor, the
computation can then proceed by calling the second part of the parallel MRRR algorithm, ?stegr2b.

Please note:

• The calling sequence has two additional integer parameters (compared to LAPACK's ?stegr); these are
dol and dou and should satisfy m≥dou≥dol≥1. These parameters are only relevant for the case jobz = 'V'.
Globally invoked over all processors, ?stegr2a computes all the eigenvalues specified by range.

?stegr2a locally only computes the eigenvalues corresponding to eigenvalues dol through dou in w,
indexed dol-1 through dou-1. (That is, instead of computing the eigenvectors belonging to w[0] through
w[m-1], only the eigenvectors belonging to eigenvalues w[dol-1] through w[dou-1] are computed. In this
case, only the eigenvalues dol through dou are guaranteed to be fully accurate.)
• m is not the number of eigenvalues specified by range, but it is m = dou - dol + 1. Instead, m refers to
the number of eigenvalues computed on this processor.
• While no eigenvectors are computed in ?stegr2a itself (this is done later in ?stegr2b), note the following
about the interface: if jobz = 'V' then, depending on range and dol, dou, ?stegr2a might need more
workspace in z than the original ?stegr. In particular, the arrays w and z might not contain all the wanted
eigenpairs locally; instead this information is distributed over other processors.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

jobz = 'N': Compute eigenvalues only;


= 'V': Compute eigenvalues and eigenvectors.

range = 'A': all eigenvalues will be found.


= 'V': all eigenvalues in the half-open interval (vl,vu] will be found.

= 'I': eigenvalues of the entire matrix with the indices in a given range will
be found.

n The order of the matrix. n≥ 0.

d Array of size n

The n diagonal elements of the tridiagonal matrix T. Overwritten on exit.

e Array of size n

On entry, the (n-1) subdiagonal elements of the tridiagonal matrix T in elements 0 to n-2 of e. e[n-1] need not be set on input, but is used internally as workspace. Overwritten on exit.

vl, vu If range='V', the lower and upper bounds of the interval to be searched for
eigenvalues. vl < vu.

Not referenced if range = 'A' or 'I'.

il, iu If range='I', the indices (in ascending order) of the smallest eigenvalue, to
be returned in w[il-1], and largest eigenvalue, to be returned in w[iu-1]. 1
≤il≤iu≤n, if n > 0.

Not referenced if range = 'A' or 'V'.

ldz The leading dimension of the array z. ldz≥ 1, and if jobz = 'V', then ldz≥
max(1,n).

nzc The number of eigenvectors to be held in the array z.

If range = 'A', then nzc≥ max(1,n).

If range = 'V', then nzc≥ the number of eigenvalues in (vl,vu].


If range = 'I', then nzc≥iu-il+1.

If nzc = -1, then a workspace query is assumed; the function calculates the
number of columns of the matrix stored in array z that are needed to hold
the eigenvectors. This value is returned as the first entry of the z array, and
no error message related to nzc is issued.

lwork The size of the array work. lwork≥ max(1,18*n) if jobz = 'V', and lwork≥
max(1,12*n) if jobz = 'N'.

If lwork = -1, then a workspace query is assumed; the function only calculates the optimal size of the work array, returns this value as the first entry of the work array, and no error message related to lwork is issued.

liwork The size of the array iwork. liwork≥ max(1,10*n) if the eigenvectors are
desired, and liwork≥ max(1,8*n) if only the eigenvalues are to be
computed.
If liwork = -1, then a workspace query is assumed; the function only
calculates the optimal size of the iwork array, returns this value as the first
entry of the iwork array, and no error message related to liwork is issued.

dol, dou From all the eigenvalues w[0] through w[m-1], only eigenvalues w[dol-1]
through w[dou-1] are computed.

OUTPUT Parameters

m Globally summed over all processors, m equals the total number of eigenvalues found. 0 ≤m≤n.

If range = 'A', m = n, and if range = 'I', m = iu-il+1.

The local output equals m = dou - dol + 1.

w Array of size n

The first m elements contain approximations to the selected eigenvalues in ascending order. Note that immediately after exiting this function, only the
eigenvalues indexed dol-1 through dou-1 are reliable on this processor
because the eigenvalue computation is done in parallel. The other entries
are very crude preliminary approximations. Other processors hold reliable
information on these other parts of the w array.

This information is communicated in the ScaLAPACK driver.

z Array of size ldz * max(1,m).

?stegr2a does not compute eigenvectors; this is done in ?stegr2b. The argument z, as well as all other related arguments, only appears to keep the interface consistent and to signal to the user that this function is meant to be used when eigenvectors are computed.

work On exit, if info = 0, work[0] returns the optimal (and minimal) lwork.

iwork On exit, if info = 0, iwork[0] returns the optimal liwork.

needil, neediu The indices of the leftmost and rightmost eigenvalues needed to accurately
compute the relevant part of the representation tree. This information can
be used to find out which processors have the relevant eigenvalue
information needed so that it can be communicated.

inderr inderr points to the place in the work space where the eigenvalue
uncertainties (errors) are stored.

nsplit The number of blocks into which T splits. 1 ≤nsplit≤n.

pivmin The minimum pivot in the Sturm sequence for T.

scale The scaling factor for the tridiagonal T.

wl, wu The interval (wl, wu] contains all the wanted eigenvalues.

It is either given by the user or computed in ?larre2a.

info On exit, info = 0: successful exit

other: if info = -i, the i-th argument had an illegal value

if info = 10x, internal error in ?larre2a,

Here, the digit x = abs( iinfo ) < 10, where iinfo is the nonzero error
code returned by ?larre2a.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

?stegr2b
From eigenvalues and initial representations computes
the selected eigenvalues and eigenvectors of the real
symmetric tridiagonal matrix in parallel on multiple
processors.

Syntax
void sstegr2b(char* jobz, MKL_INT* n, float* d, float* e, MKL_INT* m, float* w, float*
z, MKL_INT* ldz, MKL_INT* nzc, MKL_INT* isuppz, float* work, MKL_INT* lwork, MKL_INT*
iwork, MKL_INT* liwork, MKL_INT* dol, MKL_INT* dou, MKL_INT* needil, MKL_INT* neediu,
MKL_INT* indwlc, float* pivmin, float* scale, float* wl, float* wu, MKL_INT* vstart,
MKL_INT* finish, MKL_INT* maxcls, MKL_INT* ndepth, MKL_INT* parity, MKL_INT* zoffset,
MKL_INT* info);
void dstegr2b(char* jobz, MKL_INT* n, double* d, double* e, MKL_INT* m, double* w,
double* z, MKL_INT* ldz, MKL_INT* nzc, MKL_INT* isuppz, double* work, MKL_INT* lwork,
MKL_INT* iwork, MKL_INT* liwork, MKL_INT* dol, MKL_INT* dou, MKL_INT* needil, MKL_INT*
neediu, MKL_INT* indwlc, double* pivmin, double* scale, double* wl, double* wu,
MKL_INT* vstart, MKL_INT* finish, MKL_INT* maxcls, MKL_INT* ndepth, MKL_INT* parity,
MKL_INT* zoffset, MKL_INT* info);

Include Files
• mkl_scalapack.h


Description
?stegr2b should only be called after a call to ?stegr2a. From eigenvalues and initial representations
computed by ?stegr2a, ?stegr2b computes the selected eigenvalues and eigenvectors of the real
symmetric tridiagonal matrix in parallel on multiple processors. It is potentially invoked multiple times on a
given processor because the locally relevant representation tree might depend on spectral information that is
"owned" by other processors and might need to be communicated.
Please note:

• The calling sequence has two additional integer parameters, dol and dou, that should satisfy
m≥dou≥dol≥1. These parameters are only relevant for the case jobz = 'V'. ?stegr2b only computes the
eigenvectors corresponding to eigenvalues dol through dou in w, indexed dol-1 through dou-1. (That is,
instead of computing the eigenvectors belonging to w[0] through w[m-1], only the eigenvectors belonging
to eigenvalues w[dol-1] through w[dou-1] are computed. In this case, only the eigenvalues dol through
dou are guaranteed to be accurately refined to all figures by Rayleigh-Quotient iteration.)
• The additional arguments vstart, finish, ndepth, parity, zoffset are included as a thread-safe
implementation equivalent to save variables. These variables store details about the local representation
tree which is computed layerwise. For scalability reasons, eigenvalues belonging to the locally relevant
representation tree might be computed on other processors. These need to be communicated before the
inspection of the RRRs can proceed on any given layer. Note that the computation has ended only when the
variable finish is non-zero; then all eigenpairs between dol and dou have been computed, and m is set to
dou - dol + 1.
• ?stegr2b needs more workspace in z than the sequential ?stegr. It is used to store the conformal
embedding of the local representation tree.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

jobz = 'N': Compute eigenvalues only;


= 'V': Compute eigenvalues and eigenvectors.

n The order of the matrix. n≥ 0.

d Array of size n

The n diagonal elements of the tridiagonal matrix T. Overwritten on exit.

e Array of size n

The (n-1) subdiagonal elements of the tridiagonal matrix T in elements 0 to


n-2 of e. e[n-1] need not be set on input, but is used internally as
workspace. Overwritten on exit.

m The total number of eigenvalues found in ?stegr2a. 0 ≤m≤n.

w Array of size n

The first m elements contain approximations to the selected eigenvalues in ascending order. Note that only the eigenvalues from the locally relevant
part of the representation tree, that is all the clusters that include
eigenvalues from dol through dou, are reliable on this processor. (It does
not need to know about any others anyway.)

ldz The leading dimension of the array z. ldz≥ 1, and if jobz = 'V', then ldz≥
max(1,n).

nzc The number of eigenvectors to be held in the array z, storing the matrix Z.

lwork The size of the array work. lwork≥ max(1,18*n) if jobz = 'V', and lwork≥ max(1,12*n) if jobz = 'N'.

If lwork = -1, then a workspace query is assumed; the function only calculates the optimal size of the work array, returns this value as the first entry of the work array, and no error message related to lwork is issued.

liwork The size of the array iwork. liwork≥ max(1,10*n) if the eigenvectors are
desired, and liwork≥ max(1,8*n) if only the eigenvalues are to be
computed.
If liwork = -1, then a workspace query is assumed; the function only
calculates the optimal size of the iwork array, returns this value as the first
entry of the iwork array, and no error message related to liwork is issued.

dol, dou From the eigenvalues w[0] through w[m-1], only eigenvectors Z(:,dol) to
Z(:,dou) are computed.

If dol > 1, then Z(:,dol-1-zoffset) is used and overwritten.

If dou < m, then Z(:,dou+1-zoffset) is used and overwritten.

needil, neediu Describe which are the left and right outermost eigenvalues still to be
computed. Initially computed by ?larre2a, modified in the course of the
algorithm.

pivmin The minimum pivot in the Sturm sequence for T.

scale The scaling factor for T. Used for unscaling the eigenvalues at the very end
of the algorithm.

wl, wu The interval (wl, wu] contains all the wanted eigenvalues.

vstart Non-zero on initialization, set to zero afterwards.

finish Indicates whether all eigenpairs have been computed.

maxcls The largest cluster worked on by this processor in the representation tree.

ndepth The current depth of the representation tree. Set to zero on initial pass,
changed when the deeper levels of the representation tree are generated.

parity An internal parameter needed for the storage of the clusters on the current
level of the representation tree.


zoffset Offset for storing the eigenpairs when z is distributed in 1D-cyclic fashion.

OUTPUT Parameters

z Array of size ldz * max(1,m)

If jobz = 'V', and if info = 0, then a subset of the first m columns of the
matrix Z, stored in z, contain the orthonormal eigenvectors of the matrix T
corresponding to the selected eigenvalues, with the i-th column of Z holding
the eigenvector associated with w[i-1].

See dol, dou for more information.

isuppz array of size 2*max(1,m).

The support of the eigenvectors in z, i.e., the indices indicating the nonzero
elements in z. The i-th computed eigenvector is nonzero only in elements
isuppz[ 2*i-2 ] through isuppz[ 2*i -1]. This is relevant in the case when
the matrix is split. isuppz is only set if n>2.

work On exit, if info = 0, work[0] returns the optimal (and minimal) lwork.

iwork On exit, if info = 0, iwork[0] returns the optimal liwork.

needil, neediu Modified in the course of the algorithm.

indwlc Pointer into the workspace location where the local eigenvalue
representations are stored. ("Local eigenvalues" are those relative to the
individual shifts of the RRRs.)

vstart Non-zero on initialization, set to zero afterwards.

finish Indicates whether all eigenpairs have been computed

maxcls The largest cluster worked on by this processor in the representation tree.

ndepth The current depth of the representation tree. Set to zero on initial pass,
changed when the deeper levels of the representation tree are generated.

parity An internal parameter needed for the storage of the clusters on the current
level of the representation tree.

info On exit, info

= 0: successful exit
other: if info = -i, the i-th argument had an illegal value

if info = 20x, internal error in ?larrv2.

Here, the digit x = abs( iinfo ) < 10, where iinfo is the nonzero error
code returned by ?larrv2

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

?stein2
Computes the eigenvectors corresponding to specified
eigenvalues of a real symmetric tridiagonal matrix,
using inverse iteration.

Syntax
void sstein2 (MKL_INT *n , float *d , float *e , MKL_INT *m , float *w , MKL_INT
*iblock , MKL_INT *isplit , float *orfac , float *z , MKL_INT *ldz , float *work ,
MKL_INT *iwork , MKL_INT *ifail , MKL_INT *info );
void dstein2 (MKL_INT *n , double *d , double *e , MKL_INT *m , double *w , MKL_INT
*iblock , MKL_INT *isplit , double *orfac , double *z , MKL_INT *ldz , double *work ,
MKL_INT *iwork , MKL_INT *ifail , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The ?stein2 function is a modified version of the LAPACK function ?stein. It computes the eigenvectors of a real
symmetric tridiagonal matrix T corresponding to specified eigenvalues, using inverse iteration.
The maximum number of iterations allowed for each eigenvector is specified by an internal parameter maxits
(currently set to 5).
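
The following is a minimal, single-block sketch of calling sstein2. It is illustrative only: instead of the ?stebz output that a real application would supply, it uses the matrix with 2 on the diagonal and -1 on the subdiagonal, whose eigenvalues are known in closed form, and all array sizes and values are chosen for the example:

#include <math.h>
#include "mkl_scalapack.h"

int main(void)
{
    MKL_INT n = 8, m = 8, ldz = 8, info;
    MKL_INT iblock[8], isplit[8], iwork[8], ifail[8];
    float d[8], e[8], w[8], z[8 * 8], work[5 * 8];
    float orfac = 1.0e-3f;              /* illustrative reorthogonalization threshold */
    const float pi = 3.14159265f;

    /* One unreduced block: T has 2 on the diagonal and -1 on the subdiagonal. */
    for (MKL_INT i = 0; i < n; ++i) {
        d[i] = 2.0f;
        e[i] = -1.0f;                   /* e[n-1] is not referenced */
        iblock[i] = 1;                  /* every eigenvalue belongs to block 1 */
    }
    isplit[0] = n;                      /* the single block ends at row/column n */

    /* Eigenvalues of this matrix in ascending order, known in closed form;
       they stand in for the ?stebz output w. */
    for (MKL_INT k = 1; k <= n; ++k)
        w[k - 1] = 2.0f - 2.0f * cosf((float)k * pi / (float)(n + 1));

    sstein2(&n, d, e, &m, w, iblock, isplit, &orfac, z, &ldz,
            work, iwork, ifail, &info);
    /* On success, info = 0 and the (i+1)-th column of z holds the eigenvector for w[i]. */
    return (int)info;
}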

Input Parameters

n The order of the matrix T (n ≥ 0).

m The number of eigenvectors to be found (0 ≤m ≤ n).

d, e, w Arrays:
d, of size n. The n diagonal elements of the tridiagonal matrix T.
e, of size n.
The (n-1) subdiagonal elements of the tridiagonal matrix T, in elements 0
to n-2. e[n-1] need not be set.

w, of size n.
The first m elements of w contain the eigenvalues for which eigenvectors are
to be computed. The eigenvalues should be grouped by split-off block and
ordered from smallest to largest within the block. (The output array w
from ?stebz with ORDER = 'B' is expected here).

The size of w must be at least max(1, n).

iblock
Array of size n.

The submatrix indices associated with the corresponding eigenvalues in w:

iblock[i] = 1, if eigenvalue w[i] belongs to the first submatrix from the top,
iblock[i] = 2, if eigenvalue w[i] belongs to the second submatrix, etc. (The
output array iblock from ?stebz is expected here).


isplit
Array of size n.

The splitting points, at which T breaks up into submatrices. The first
submatrix consists of rows/columns 1 to isplit[0], the second submatrix
consists of rows/columns isplit[0]+1 through isplit[1], etc. (The
output array isplit from ?stebz is expected here).

orfac orfac specifies which eigenvectors should be orthogonalized. Eigenvectors
that correspond to eigenvalues which are within orfac*||T|| of each other
are to be orthogonalized.

ldz The leading dimension of the output array z; ldz ≥ max(1, n).

work Workspace array of size 5n.

iwork Workspace array of size n.

Output Parameters

z Array of size ldz * m.

The computed eigenvectors. The eigenvector associated with the eigenvalue
w[i] is stored in the (i+1)-th column of the matrix Z represented by z,
i=0, ..., m-1. Any vector that fails to converge is set to its current iterate
after maxits iterations.

ifail
Array of size m.

On normal exit, all elements of ifail are zero. If one or more eigenvectors
fail to converge after maxits iterations, then their indices are stored in the
array ifail.

info
info = 0, the exit is successful.
info < 0: if info = -i, the i-th argument had an illegal value.
info > 0: if info = i, then i eigenvectors failed to converge in maxits
iterations. Their indices are stored in the array ifail.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
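
A minimal calling sketch for dstein2 (an illustration, not taken from the reference), assuming the matrix is a single unreduced block so that iblock and isplit are trivial; in practice w, iblock, and isplit come from ?stebz with ORDER = 'B', and the two eigenvalues shown are approximate values used only for illustration:

#include "mkl_scalapack.h"

int main(void)
{
    MKL_INT n = 4, m = 2, ldz = 4, info;
    double d[4] = { 2.0, 2.0, 2.0, 2.0 };            /* diagonal of T */
    double e[4] = { -1.0, -1.0, -1.0, 0.0 };         /* subdiagonal; e[n-1] is not used */
    /* Two smallest eigenvalues of this T (approximate); normally taken from ?stebz. */
    double w[4] = { 0.38196601, 1.38196601, 0.0, 0.0 };
    MKL_INT iblock[4] = { 1, 1, 1, 1 };              /* single unreduced block */
    MKL_INT isplit[4] = { 4, 0, 0, 0 };
    double orfac = 1.0e-3;
    double z[4 * 2], work[5 * 4];
    MKL_INT iwork[4], ifail[2];

    dstein2(&n, d, e, &m, w, iblock, isplit, &orfac, z, &ldz, work, iwork, ifail, &info);
    return (int)info;
}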

?dbtf2
Computes an LU factorization of a general band matrix
with no pivoting (local unblocked algorithm).

Syntax
void sdbtf2 (MKL_INT *m , MKL_INT *n , MKL_INT *kl , MKL_INT *ku , float *ab , MKL_INT
*ldab , MKL_INT *info );
void ddbtf2 (MKL_INT *m , MKL_INT *n , MKL_INT *kl , MKL_INT *ku , double *ab ,
MKL_INT *ldab , MKL_INT *info );
void cdbtf2 (MKL_INT *m , MKL_INT *n , MKL_INT *kl , MKL_INT *ku , MKL_Complex8 *ab ,
MKL_INT *ldab , MKL_INT *info );

void zdbtf2 (MKL_INT *m , MKL_INT *n , MKL_INT *kl , MKL_INT *ku , MKL_Complex16 *ab ,
MKL_INT *ldab , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The ?dbtf2 function computes an LU factorization of a general real/complex m-by-n band matrix A without
using partial pivoting or row interchanges.
This is the unblocked version of the algorithm, calling BLAS Routines and Functions.

Input Parameters

m The number of rows of the matrix A(m ≥ 0).

n The number of columns in A(n≥ 0).

kl The number of sub-diagonals within the band of A(kl ≥ 0).

ku The number of super-diagonals within the band of A(ku≥ 0).

ab Array of size ldab * n.

The matrix A in band storage, in rows kl+1 to 2kl+ku+1; rows 1 to kl
of the matrix need not be set. The j-th column of A is stored in the
array ab as follows: ab[kl+ku+i-j+(j-1)*ldab] = A(i,j) for
max(1,j-ku) ≤ i ≤ min(m,j+kl).

ldab The leading dimension of the array ab (ldab ≥ 2kl + ku + 1).

Output Parameters

ab On exit, details of the factorization: U is stored as an upper triangular band
matrix with kl+ku superdiagonals in rows 1 to kl+ku+1, and the multipliers
used during the factorization are stored in rows kl+ku+2 to 2*kl+ku+1.
See the Application Notes below for further details.

info
= 0: successful exit
< 0: if info = - i, the i-th argument had an illegal value,
> 0: if info = + i, the matrix elementU(i,i) is 0. The factorization has been
completed, but the factor U is exactly singular. Division by 0 will occur if
you use the factor U for solving a system of linear equations.

Application Notes
The band storage scheme is illustrated by the following example, when m = n = 6, kl = 2, ku = 1:


The function does not use array elements marked *; elements marked + need not be set on entry, but the
function requires them to store elements of U, because of fill-in resulting from the row interchanges.
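
A small helper such as the one below (a sketch; pack_band is a hypothetical name, not an Intel MKL routine) shows how the column-mapping formula given above can be used to pack a dense column-major matrix into the band layout expected by ?dbtf2; the matrix indices i and j are one-based, as in the formula, while the C arrays are zero-based:

#include "mkl_types.h"

/* Pack the band of a dense column-major m-by-n matrix a (leading dimension lda)
   into ab using ab[kl+ku+i-j+(j-1)*ldab] = A(i,j); entries outside the band
   and the first kl rows of ab are left untouched. */
void pack_band(MKL_INT m, MKL_INT n, MKL_INT kl, MKL_INT ku,
               const double *a, MKL_INT lda, double *ab, MKL_INT ldab)
{
    for (MKL_INT j = 1; j <= n; ++j) {
        MKL_INT ilo = (j - ku > 1) ? j - ku : 1;     /* max(1, j-ku) */
        MKL_INT ihi = (j + kl < m) ? j + kl : m;     /* min(m, j+kl) */
        for (MKL_INT i = ilo; i <= ihi; ++i)
            ab[kl + ku + i - j + (j - 1) * ldab] = a[(i - 1) + (j - 1) * lda];
    }
}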

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

?dbtrf
Computes an LU factorization of a general band matrix
with no pivoting (local blocked algorithm).

Syntax
void sdbtrf (MKL_INT *m , MKL_INT *n , MKL_INT *kl , MKL_INT *ku , float *ab , MKL_INT
*ldab , MKL_INT *info );
void ddbtrf (MKL_INT *m , MKL_INT *n , MKL_INT *kl , MKL_INT *ku , double *ab ,
MKL_INT *ldab , MKL_INT *info );
void cdbtrf (MKL_INT *m , MKL_INT *n , MKL_INT *kl , MKL_INT *ku , MKL_Complex8 *ab ,
MKL_INT *ldab , MKL_INT *info );
void zdbtrf (MKL_INT *m , MKL_INT *n , MKL_INT *kl , MKL_INT *ku , MKL_Complex16 *ab ,
MKL_INT *ldab , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
This function computes an LU factorization of a general real/complex m-by-n band matrix A without using partial pivoting or
row interchanges.
This is the blocked version of the algorithm, calling BLAS Routines and Functions.

Input Parameters

m The number of rows of the matrix A (m ≥ 0).

n The number of columns in A(n≥ 0).

kl The number of sub-diagonals within the band of A(kl≥ 0).

ku The number of super-diagonals within the band of A(ku≥ 0).

ab Array of size ldab * n.

The matrix A in band storage, in rows kl+1 to 2kl+ku+1; rows 1 to kl need
not be set. The j-th column of A is stored in the array ab as follows:
ab[kl+ku+i-j+(j-1)*ldab] = A(i,j) for max(1,j-ku) ≤ i ≤ min(m,j+kl).

ldab The leading dimension of the array ab (ldab ≥ 2kl + ku + 1).

Output Parameters

ab On exit, details of the factorization: U is stored as an upper triangular band
matrix with kl+ku superdiagonals in rows 1 to kl+ku+1, and the multipliers
used during the factorization are stored in rows kl+ku+2 to 2*kl+ku+1.
See the Application Notes below for further details.

info
= 0: successful exit
< 0: if info = - i, the i-th argument had an illegal value,
> 0: if info = + i, the matrix element U(i,i) is 0. The factorization
has been completed, but the factor U is exactly singular. Division by
0 will occur if you use the factor U for solving a system of linear
equations.

Application Notes
The band storage scheme is illustrated by the following example, when m = n = 6, kl = 2, ku = 1:

The function does not use array elements marked *.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
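
A minimal factorization sketch using ddbtrf (an illustration, with hypothetical sizes); the band array ab is assumed to have been filled in the storage scheme described above, for example with a packing helper such as the one sketched for ?dbtf2:

#include <stdlib.h>
#include "mkl_scalapack.h"

int main(void)
{
    MKL_INT m = 6, n = 6, kl = 2, ku = 1, info;
    MKL_INT ldab = 2 * kl + ku + 1;                 /* ldab >= 2*kl + ku + 1 */
    double *ab = (double *)calloc((size_t)(ldab * n), sizeof(double));

    /* ... fill rows kl+1 to 2*kl+ku+1 of ab with the band of A ... */

    ddbtrf(&m, &n, &kl, &ku, ab, &ldab, &info);
    /* info == 0: factors overwrite ab; info > 0: U(info,info) is exactly zero */

    free(ab);
    return (int)info;
}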

?dttrf
Computes an LU factorization of a general tridiagonal
matrix with no pivoting (local blocked algorithm).

Syntax
void sdttrf (MKL_INT *n , float *dl , float *d , float *du , MKL_INT *info );
void ddttrf (MKL_INT *n , double *dl , double *d , double *du , MKL_INT *info );
void cdttrf (MKL_INT *n , MKL_Complex8 *dl , MKL_Complex8 *d , MKL_Complex8 *du ,
MKL_INT *info );
void zdttrf (MKL_INT *n , MKL_Complex16 *dl , MKL_Complex16 *d , MKL_Complex16 *du ,
MKL_INT *info );


Include Files
• mkl_scalapack.h

Description
The ?dttrf function computes an LU factorization of a real or complex tridiagonal matrix A using elimination
without partial pivoting.
The factorization has the form A = L*U, where L is a product of unit lower bidiagonal matrices and U is upper
triangular with nonzeros only in the main diagonal and first superdiagonal.

Input Parameters

n The order of the matrix A(n ≥ 0).

dl, d, du Arrays containing elements of A.


The array dl of size (n-1) contains the sub-diagonal elements of A.

The array d of size n contains the diagonal elements of A.


The array du of size (n-1) contains the super-diagonal elements of A.

Output Parameters

dl Overwritten by the (n-1) multipliers that define the matrix L from the LU
factorization of A.

d Overwritten by the n diagonal elements of the upper triangular matrix U
from the LU factorization of A.

du Overwritten by the (n-1) elements of the first super-diagonal of U.

info
= 0: successful exit
< 0: if info = - i, the i-th argument had an illegal value,

> 0: if info = i, the matrix element U(i,i) is exactly 0. The factorization has
been completed, but the factor U is exactly singular. Division by 0 will occur
if you use the factor U for solving a system of linear equations.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

?dttrsv
Solves a general tridiagonal system of linear equations
using the LU factorization computed by ?dttrf.

Syntax
void sdttrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *nrhs , float *dl ,
float *d , float *du , float *b , MKL_INT *ldb , MKL_INT *info );
void ddttrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *nrhs , double *dl ,
double *d , double *du , double *b , MKL_INT *ldb , MKL_INT *info );

void cdttrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *nrhs , MKL_Complex8
*dl , MKL_Complex8 *d , MKL_Complex8 *du , MKL_Complex8 *b , MKL_INT *ldb , MKL_INT
*info );
void zdttrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *nrhs , MKL_Complex16
*dl , MKL_Complex16 *d , MKL_Complex16 *du , MKL_Complex16 *b , MKL_INT *ldb , MKL_INT
*info );

Include Files
• mkl_scalapack.h

Description
The ?dttrsv function solves one of the following systems of linear equations:

L*X = B, L^T*X = B, or L^H*X = B,

U*X = B, U^T*X = B, or U^H*X = B

with factors of the tridiagonal matrix A from the LU factorization computed by ?dttrf.

Input Parameters

uplo Specifies whether to solve with L or U.

trans Must be 'N' or 'T' or 'C'.

Indicates the form of the equations:


If trans = 'N', then A*X=B is solved for X (no transpose).

If trans = 'T', then A^T*X = B is solved for X (transpose).

If trans = 'C', then A^H*X = B is solved for X (conjugate transpose).

n The order of the matrix A(n≥ 0).

nrhs The number of right-hand sides, that is, the number of columns in the
matrix B(nrhs≥ 0).

dl,d,du,b The array dl of size (n - 1) contains the (n - 1) multipliers that define the
matrix L from the LU factorization of A.
The array d of size n contains n diagonal elements of the upper triangular
matrix U from the LU factorization of A.
The array du of size (n - 1) contains the (n - 1) elements of the first super-
diagonal of U.
On entry, the array b of size ldb * nrhs contains the right-hand side of
matrix B.

ldb The leading dimension of the array b; ldb ≥ max(1, n).

Output Parameters

b Overwritten by the solution matrix X.

info If info=0, the execution is successful.

If info = -i, the i-th parameter had an illegal value.


See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
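
The sketch below (an illustration, not taken from the reference) factors a small tridiagonal matrix with ddttrf and then applies ddttrsv twice to solve A*X = B, first with the L factor and then with the U factor; the solves overwrite b with the solution:

#include "mkl_scalapack.h"

int main(void)
{
    MKL_INT n = 4, nrhs = 1, ldb = 4, info;
    double dl[3] = { -1.0, -1.0, -1.0 };      /* subdiagonal of A   */
    double d[4]  = {  2.0,  2.0,  2.0, 2.0 }; /* diagonal of A      */
    double du[3] = { -1.0, -1.0, -1.0 };      /* superdiagonal of A */
    double b[4]  = {  1.0,  0.0,  0.0, 1.0 }; /* right-hand side    */
    char uplo, trans = 'N';

    ddttrf(&n, dl, d, du, &info);             /* A = L*U, no pivoting */
    if (info == 0) {
        uplo = 'L';                           /* solve L*Y = B        */
        ddttrsv(&uplo, &trans, &n, &nrhs, dl, d, du, b, &ldb, &info);
        uplo = 'U';                           /* solve U*X = Y        */
        ddttrsv(&uplo, &trans, &n, &nrhs, dl, d, du, b, &ldb, &info);
    }
    return (int)info;
}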

?pttrsv
Solves a symmetric (Hermitian) positive-definite
tridiagonal system of linear equations, using the
L*D*L^H factorization computed by ?pttrf.

Syntax
void spttrsv (char *trans , MKL_INT *n , MKL_INT *nrhs , float *d , float *e , float
*b , MKL_INT *ldb , MKL_INT *info );
void dpttrsv (char *trans , MKL_INT *n , MKL_INT *nrhs , double *d , double *e ,
double *b , MKL_INT *ldb , MKL_INT *info );
void cpttrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *nrhs , float *d ,
MKL_Complex8 *e , MKL_Complex8 *b , MKL_INT *ldb , MKL_INT *info );
void zpttrsv (char *uplo , char *trans , MKL_INT *n , MKL_INT *nrhs , double *d ,
MKL_Complex16 *e , MKL_Complex16 *b , MKL_INT *ldb , MKL_INT *info );

Include Files
• mkl_scalapack.h

Description
The ?pttrsv function solves one of the triangular systems:

L^T*X = B, or L*X = B for real flavors,

or

L*X = B, or L^H*X = B,
U*X = B, or U^H*X = B for complex flavors,
where L (or U for complex flavors) is the Cholesky factor of a Hermitian positive-definite tridiagonal matrix A
such that
A = L*D*L^H (computed by spttrf/dpttrf)
or
A = U^H*D*U or A = L*D*L^H (computed by cpttrf/zpttrf).

Input Parameters

uplo Must be 'U' or 'L'.

Specifies whether the superdiagonal or the subdiagonal of the tridiagonal
matrix A is stored and the form of the factorization:
If uplo = 'U', e is the superdiagonal of U, and A = U^H*D*U or A =
L*D*L^H;
if uplo = 'L', e is the subdiagonal of L, and A = L*D*L^H.

The two forms are equivalent, if A is real.

trans
Specifies the form of the system of equations:

for real flavors:
if trans = 'N': L*X = B (no transpose)

if trans = 'T': L^T*X = B (transpose)

for complex flavors:


if trans = 'N': U*X = B or L*X = B (no transpose)

if trans = 'C': U^H*X = B or L^H*X = B (conjugate transpose).

n The order of the tridiagonal matrix A. n ≥ 0.

nrhs The number of right hand sides, that is, the number of columns of the
matrix B. nrhs≥ 0.

d array of size n. The n diagonal elements of the diagonal matrix D from the
factorization computed by ?pttrf.

e array of size (n-1). The (n-1) off-diagonal elements of the unit bidiagonal
factor U or L from the factorization computed by ?pttrf. See uplo.

b array of size ldb* nrhs.


On entry, the right hand side matrix B.

ldb
The leading dimension of the array b.
ldb≥ max(1, n).

Output Parameters

b On exit, the solution matrix X.

info
= 0: successful exit
< 0: if info = -i, the i-th argument had an illegal value.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
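
A possible calling sequence for the real flavor, sketched under the assumption that d and e already hold the L*D*L^T factors produced by dpttrf and that the diagonal scaling by D is applied by the caller between the two triangular solves; the numeric values are for illustration only:

#include "mkl_scalapack.h"

int main(void)
{
    MKL_INT n = 4, nrhs = 1, ldb = 4, info, i;
    /* Factors of the tridiagonal matrix with 2 on the diagonal and -1 off it,
       as computed by dpttrf (values shown for illustration only). */
    double d[4] = { 2.0, 1.5, 4.0 / 3.0, 1.25 };    /* diagonal of D    */
    double e[3] = { -0.5, -2.0 / 3.0, -0.75 };      /* subdiagonal of L */
    double b[4] = { 1.0, 0.0, 0.0, 1.0 };           /* right-hand side  */
    char trans;

    trans = 'N';                                    /* solve L*Z = B    */
    dpttrsv(&trans, &n, &nrhs, d, e, b, &ldb, &info);
    for (i = 0; i < n; ++i) b[i] /= d[i];           /* apply inv(D)     */
    trans = 'T';                                    /* solve L^T*X = Z  */
    dpttrsv(&trans, &n, &nrhs, d, e, b, &ldb, &info);
    return (int)info;
}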

?steqr2
Computes all eigenvalues and, optionally,
eigenvectors of a symmetric tridiagonal matrix using
the implicit QL or QR method.

Syntax
void ssteqr2 (char *compz , MKL_INT *n , float *d , float *e , float *z , MKL_INT
*ldz , MKL_INT *nr , float *work , MKL_INT *info );
void dsteqr2 (char *compz , MKL_INT *n , double *d , double *e , double *z , MKL_INT
*ldz , MKL_INT *nr , double *work , MKL_INT *info );

Include Files
• mkl_scalapack.h


Description
The ?steqr2 function is a modified version of the LAPACK function ?steqr. The ?steqr2 function computes all
eigenvalues and, optionally, eigenvectors of a symmetric tridiagonal matrix using the implicit QL or QR
method. ?steqr2 is modified from ?steqr to allow each ScaLAPACK process running ?steqr2 to perform
updates on a distributed matrix Q. Proper usage of ?steqr2 can be gleaned from examination of ScaLAPACK
function p?syev.

Input Parameters

compz Must be 'N' or 'I'.

If compz = 'N', the function computes eigenvalues only. If compz = 'I',
the function computes the eigenvalues and eigenvectors of the tridiagonal
matrix T.
z must be initialized to the identity matrix by p?laset or ?laset prior to
entering this function.

n The order of the matrix T(n ≥ 0).

d, e, work Arrays:
d contains the diagonal elements of T. The size of d must be at least
max(1, n).
e contains the (n-1) subdiagonal elements of T. The size of e must be at
least max(1, n-1).

work is a workspace array. The size of work is max(1, 2*n-2). If compz =
'N', then work is not referenced.

z (local)
Array of global size n*n and of local size ldz*nr.

If compz = 'V', then z contains the orthogonal matrix used in the
reduction to tridiagonal form.

ldz The leading dimension of the array z. Constraints:

ldz ≥ 1,
ldz ≥ max(1, n), if eigenvectors are desired.

nr nr = max(1, numroc(n, nb, myprow, 0, nprocs)).

If compz = 'N', then nr is not referenced.

Output Parameters

d On exit, the eigenvalues in ascending order, if info = 0.

See also info.

e On exit, e has been destroyed.

z On exit, if info = 0, then,

if compz = 'V', z contains the orthonormal eigenvectors of the original
symmetric matrix, and if compz = 'I', z contains the orthonormal
eigenvectors of the symmetric tridiagonal matrix. If compz = 'N', then z is
not referenced.

info
info = 0, the exit is successful.
info < 0: if info = -i, the i-th argument had an illegal value.
info > 0: the algorithm has failed to find all the eigenvalues in a total of
30n iterations;

if info = i, then i elements of e have not converged to zero; on exit, d
and e contain the elements of a symmetric tridiagonal matrix, which is
orthogonally similar to the original matrix.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
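
A single-process sketch of dsteqr2 with compz = 'I' (so nr = n and z must be set to the identity matrix before the call, as noted above); matrix values are for illustration only:

#include <stdlib.h>
#include "mkl_scalapack.h"

int main(void)
{
    char compz = 'I';
    MKL_INT n = 4, ldz = 4, nr = 4, info, i;        /* one process: nr = n */
    double d[4] = { 2.0, 2.0, 2.0, 2.0 };           /* diagonal of T       */
    double e[3] = { -1.0, -1.0, -1.0 };             /* subdiagonal of T    */
    double *z = (double *)calloc(16, sizeof(double));
    double work[6];                                 /* max(1, 2*n-2)       */

    for (i = 0; i < n; ++i) z[i + i * ldz] = 1.0;   /* z = identity        */
    dsteqr2(&compz, &n, d, e, z, &ldz, &nr, work, &info);
    /* on success, d holds the eigenvalues in ascending order, z the eigenvectors */
    free(z);
    return (int)info;
}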

?trmvt
Performs matrix-vector operations.

Syntax
void strmvt (const char* uplo, const MKL_INT* n, const float* t, const MKL_INT* ldt,
float* x, const MKL_INT* incx, const float* y, const MKL_INT* incy, float* w, const
MKL_INT* incw, const float* z, const MKL_INT* incz);
void dtrmvt (const char* uplo, const MKL_INT* n, const double* t, const MKL_INT* ldt,
double* x, const MKL_INT* incx, const double* y, const MKL_INT* incy, double* w, const
MKL_INT* incw, const double* z, const MKL_INT* incz);
void ctrmvt (const char* uplo, const MKL_INT* n, const MKL_Complex8* t, const MKL_INT*
ldt, MKL_Complex8* x, const MKL_INT* incx, const MKL_Complex8* y, const MKL_INT* incy,
MKL_Complex8* w, const MKL_INT* incw, const MKL_Complex8* z, const MKL_INT* incz);
void ztrmvt (const char* uplo, const MKL_INT* n, const MKL_Complex16* t, const MKL_INT*
ldt, MKL_Complex16* x, const MKL_INT* incx, const MKL_Complex16* y, const MKL_INT*
incy, MKL_Complex16* w, const MKL_INT* incw, const MKL_Complex16* z, const MKL_INT*
incz);

Include Files
• mkl_scalapack.h

Description
?trmvt performs the matrix-vector operations as follows:
strmvt and dtrmvt: x := T' *y, and w := T *z

ctrmvt and ztrmvt: x := conjg( T' ) *y, and w := T *z,

where x is an n element vector and T is an n-by-n upper or lower triangular matrix.


Input Parameters

uplo On entry, uplo specifies whether the matrix is an upper or lower triangular
matrix as follows:
uplo = 'U' or 'u'
A is an upper triangular matrix.
uplo = 'L' or 'l'
A is a lower triangular matrix.
Unchanged on exit.

n On entry, n specifies the order of the matrix A. n must be at least zero.

Unchanged on exit.

t Array of size ( ldt, n ).

Before entry with uplo = 'U' or 'u', the leading n-by-n upper triangular part
of the array t must contain the upper triangular matrix and the strictly
lower triangular part of t is not referenced.

Before entry with uplo = 'L' or 'l', the leading n-by-n lower triangular part
of the array t must contain the lower triangular matrix and the strictly
upper triangular part of t is not referenced.

ldt On entry, ldt specifies the first dimension of t as declared in the calling
(sub)program. ldt must be at least max( 1, n ).

Unchanged on exit.

incx On entry, incx specifies the increment for the elements of x. incx must
not be zero.
Unchanged on exit.

y Array of size at least ( 1 + ( n - 1 )*abs( incy ) ).

Before entry, the incremented array y must contain the n element vector y.

Unchanged on exit.

incy On entry, incy specifies the increment for the elements of y. incy must
not be zero.
Unchanged on exit.

incw On entry, incw specifies the increment for the elements of w. incw must
not be zero.
Unchanged on exit.

z Array of size at least ( 1 + ( n - 1 )*abs( incz ) ).

Before entry, the incremented array z must contain the n element vector z.

Unchanged on exit.

incz On entry, incz specifies the increment for the elements of z. incz must
not be zero.

Unchanged on exit.

Output Parameters

t Unchanged on exit.

x Array of size at least ( 1 + ( n - 1 )*abs( incx ) ).

On exit, x = T' * y.

w Array of size at least ( 1 + ( n - 1 )*abs( incw ) ).

On exit, w = T * z.
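
A small sketch of dtrmvt with an upper triangular T, computing x = T'*y and w = T*z in one call (values are for illustration only):

#include "mkl_scalapack.h"

int main(void)
{
    char uplo = 'U';
    MKL_INT n = 3, ldt = 3, inc = 1;
    /* Column-major upper triangular T; the strictly lower part is not referenced. */
    double t[9] = { 1.0, 0.0, 0.0,
                    2.0, 4.0, 0.0,
                    3.0, 5.0, 6.0 };
    double y[3] = { 1.0, 1.0, 1.0 };
    double z[3] = { 1.0, 0.0, 0.0 };
    double x[3], w[3];

    dtrmvt(&uplo, &n, t, &ldt, x, &inc, y, &inc, w, &inc, z, &inc);
    return 0;
}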

pilaenv
Returns the positive integer value of the logical
blocking size.

Syntax
MKL_INT pilaenv (const MKL_INT *ictxt , const char *prec);

Include Files
• mkl_pblas.h

Description
pilaenv returns the positive integer value of the logical blocking size. This value is machine and precision
specific. This version provides a logical blocking size which should give good though not optimal performance
on many of the currently available distributed-memory concurrent computers. You are encouraged to modify
this subroutine to set this tuning parameter for your particular machine.

Input Parameters

ictxt On entry, ictxt specifies the BLACS context handle, indicating the global
context of the operation. The context itself is global, but the value of ictxt
is local.

prec On input, prec specifies the precision for which the logical block size should
be returned as follows:
prec = 'S' or 's' single precision real,
prec = 'D' or 'd' double precision real,
prec = 'C' or 'c' single precision complex,
prec = 'Z' or 'z' double precision complex,
prec = 'I' or 'i' integer.


Application Notes
Before modifying this routine to tune the library performance on your system, be aware of the following:

1. The value this function returns must be strictly larger than zero,
2. If you are planning to link your program with different instances of the library (for example, on a
heterogeneous machine), you must compile each instance of the library with exactly the same version
of this routine for obvious interoperability reasons.
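
A minimal usage sketch; ictxt is assumed to be a valid BLACS context handle obtained during grid setup elsewhere in the program:

#include "mkl_pblas.h"

/* Query the logical blocking size for double precision real computations. */
MKL_INT logical_block_size(MKL_INT ictxt)
{
    return pilaenv(&ictxt, "D");
}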

pilaenvx
Called from the ScaLAPACK routines to choose
problem-dependent parameters for the local
environment.

Syntax
MKL_INT pilaenvx (const MKL_INT* ictxt, const MKL_INT* ispec, const char* name, const
char* opts, const MKL_INT* n1, const MKL_INT* n2, const MKL_INT* n3, const MKL_INT*
n4);

Include Files
• mkl.h

Description
pilaenvx is called from the ScaLAPACK routines to choose problem-dependent parameters for the local
environment. See ispec for a description of the parameters. This version provides a set of parameters which
should give good, though not optimal, performance on many of the currently available computers. You are
encouraged to modify this subroutine to set the tuning parameters for your particular machine using the
option and problem size information in the arguments.

Input Parameters

ictxt (local input)On entry, ictxt specifies the BLACS context handle, indicating
the global context of the operation. The context itself is global, but the
value of ictxt is local.

ispec (global input)


Specifies the parameter to be returned as the value of pilaenvx.

= 1: the optimal blocksize; if this value is 1, an unblocked algorithm will
give the best performance (unlikely).
= 2: the minimum block size for which the block routine should be used; if
the usable block size is less than this value, an unblocked routine should be
used.
= 3: the crossover point (in a block routine, for N less than this value, an
unblocked routine should be used).
= 4: the number of shifts, used in the nonsymmetric eigenvalue routines
(DEPRECATED).
= 5: the minimum column dimension for blocking to be used; rectangular
blocks must have dimension at least k by m, where k is given by
pilaenvx(2,...) and m by pilaenvx(5,...).
= 6: the crossover point for the SVD (when reducing an m by n matrix to
bidiagonal form, if max(m,n)/min(m,n) exceeds this value, a QR
factorization is used first to reduce the matrix to a triangular form).

= 7: the number of processors.
= 8: the crossover point for the multishift QR method for nonsymmetric
eigenvalue problems (DEPRECATED).
= 9: maximum size of the subproblems at the bottom of the computation
tree in the divide-and-conquer algorithm (used by ?gelsd and ?gesdd).

=10: IEEE NaN arithmetic can be trusted not to trap.
=11: infinity arithmetic can be trusted not to trap.
12 <= ispec <= 16: parameters for p?hseqr or one of its subroutines; see piparmq for
detailed explanation.
17 <= ispec <= 22: parameters for pb?trord/p?hseqr (not all), as follows:
=17: maximum number of concurrent computational windows;
=18: number of eigenvalues/bulges in each window;
=19: computational window size;
=20: minimal percentage of FLOPS required for performing matrix-matrix
multiplications instead of pipelined orthogonal transformations;
=21: width of block column slabs for row-wise application of pipelined
orthogonal transformations in their factorized form;
=22: the maximum number of eigenvalues moved together over a process
border;
=23: the number of processors involved in Aggressive Early Deflation
(AED);
=99: Maximum iteration chunksize in OpenMP parallelization.

name (global input)


The name of the calling subroutine, in either upper case or lower case.

opts (global input) The character options to the subroutine name, concatenated
into a single character string. For example, uplo = 'U', trans = 'T',
and diag = 'N' for a triangular routine would be specified as opts =
'UTN'.

n1, n2, n3, and n4 (global input) Problem dimensions for the subroutine name; these may not
all be required.

Output Parameters

result (global output)


>= 0: the value of the parameter specified by ispec.

< 0: if pilaenvx = -k, the k-th argument had an illegal value.

Application Notes
The following conventions have been used when calling ilaenv from the LAPACK routines:


1. opts is a concatenation of all of the character options to subroutine name, in the same order that they
appear in the argument list for name, even if they are not used in determining the value of the
parameter specified by ispec.
2. The problem dimensions n1, n2, n3, and n4 are specified in the order that they appear in the argument
list for name. n1 is used first, n2 second, and so on, and unused problem dimensions are passed a value
of -1.
3. The parameter value returned by ilaenv is checked for validity in the calling subroutine. For example,
ilaenv is used to retrieve the optimal block size for strtri as follows:

NB = ilaenv( 1, 'STRTRI', UPLO // DIAG, N, -1, -1, -1 );


if( NB<=1 ) {
NB = MAX( 1, N );
}

The same conventions hold for this ScaLAPACK-style variant.

pjlaenv
Called from the ScaLAPACK symmetric and Hermitian
tailored eigen-routines to choose problem-dependent
parameters for the local environment.

Syntax
MKL_INT pjlaenv (const MKL_INT* ictxt, const MKL_INT* ispec, const char* name, const
char* opts, const MKL_INT* n1, const MKL_INT* n2, const MKL_INT* n3, const MKL_INT*
n4);

Include Files
• mkl.h

Description
pjlaenv is called from the ScaLAPACK symmetric and Hermitian tailored eigen-routines to choose problem-
dependent parameters for the local environment. See ispec for a description of the parameters. This version
provides a set of parameters which should give good, though not optimal, performance on many of the
currently available computers. You are encouraged to modify this subroutine to set the tuning parameters for
your particular machine using the option and problem size information in the arguments.

Input Parameters

ispec (global input) Specifies the parameter to be returned as the value of
pjlaenv.
= 1: the data layout blocksize;
= 2: the panel blocking factor;
= 3: the algorithmic blocking factor;
= 4: execution path control;
= 5: maximum size for direct call to the LAPACK routine.

name (global input) The name of the calling subroutine, in either upper case or
lower case.

opts (global input) The character options to the subroutine name, concatenated
into a single character string. For example, uplo = 'U', trans = 'T',
and diag = 'N' for a triangular routine would be specified as opts =
'UTN'.

n1, n2, n3, and n4 (global input) Problem dimensions for the subroutine name; these may not
all be required. At present, only n1 is used, and it (n1) is used only for
'TTRD'.

Output Parameters

result (global or local output)


>= 0: the value of the parameter specified by ispec.

< 0: if pjlaenv = -k, the k-th argument had an illegal value. Most
parameters set via a call to pjlaenv must be identical on all processors, and
hence pjlaenv will return the same value to all processors (i.e. global
output). However, some parameters, in particular the panel blocking factor, can be
different on each processor, and hence pjlaenv can return different values
on different processors (i.e. local output).

Application Notes
The following conventions have been used when calling pjlaenv from the ScaLAPACK routines:

1. opts is a concatenation of all of the character options to subroutine name, in the same order that they
appear in the argument list for name, even if they are not used in determining the value of the
parameter specified by ispec.
2. The problem dimensions n1, n2, n3, and n4 are specified in the order that they appear in the argument
list for name. n1 is used first, n2 second, and so on, and unused problem dimensions are passed a
value of -1.

a. The parameter value returned by pjlaenv is checked for validity in the calling subroutine. For
example, pjlaenv is used to retrieve the optimal blocksize for STRTRI as follows:

NB = pjlaenv( 1, 'STRTRI', UPLO // DIAG, N, -1, -1, -1 );

if( NB<=1 ) {
    NB = MAX( 1, N );
}
pjlaenv is patterned after ilaenv and keeps the same interface in anticipation of future needs, even though
pjlaenv is only sparsely used at present in ScaLAPACK. Most ScaLAPACK codes use the input data layout
blocking factor as the algorithmic blocking factor - hence there is no need or opportunity to set the
algorithmic or data decomposition blocking factor. pXYYtevx.f and pXYYtgvx.f and pXYYttrd.f are the
only codes which call pjlaenv. pXYYtevx.f and pXYYtgvx.f redistribute the data to the best data layout
for each transformation. pXYYttrd.f uses a data layout blocking factor of 1.

Additional ScaLAPACK Routines


void pchettrd (const char *uplo , const MKL_INT *n , MKL_Complex8 *a , const MKL_INT
*ia , const MKL_INT *ja , const MKL_INT *desca , float *d , float *e , MKL_Complex8
*tau , MKL_Complex8 *work , const MKL_INT *lwork , MKL_INT *info );
void pzhettrd (const char *uplo , const MKL_INT *n , MKL_Complex16 *a , const MKL_INT
*ia , const MKL_INT *ja , const MKL_INT *desca , double *d , double *e , MKL_Complex16
*tau , MKL_Complex16 *work , const MKL_INT *lwork , MKL_INT *info );


void pslaed0 (const MKL_INT *n , float *d , float *e , float *q , const MKL_INT *iq ,
const MKL_INT *jq , const MKL_INT *descq , float *work , MKL_INT *iwork , MKL_INT
*info );
void pdlaed0 (const MKL_INT *n , double *d , double *e , double *q , const MKL_INT
*iq , const MKL_INT *jq , const MKL_INT *descq , double *work , MKL_INT *iwork ,
MKL_INT *info );
void pslaed1 (const MKL_INT *n , const MKL_INT *n1 , float *d , const MKL_INT *id ,
float *q , const MKL_INT *iq , const MKL_INT *jq , const MKL_INT *descq , const float
*rho , float *work , MKL_INT *iwork , MKL_INT *info );
void pdlaed1 (const MKL_INT *n , const MKL_INT *n1 , double *d , const MKL_INT *id ,
double *q , const MKL_INT *iq , const MKL_INT *jq , const MKL_INT *descq , const double
*rho , double *work , MKL_INT *iwork , MKL_INT *info );
void pslaed2 (const MKL_INT *ictxt , MKL_INT *k , const MKL_INT *n , const MKL_INT
*n1 , const MKL_INT *nb , float *d , const MKL_INT *drow , const MKL_INT *dcol , float
*q , const MKL_INT *ldq , float *rho , const float *z , float *w , float *dlamda ,
float *q2 , const MKL_INT *ldq2 , float *qbuf , MKL_INT *ctot , MKL_INT *psm , const
MKL_INT *npcol , MKL_INT *indx , MKL_INT *indxc , MKL_INT *indxp , MKL_INT *indcol ,
MKL_INT *coltyp , MKL_INT *nn , MKL_INT *nn1 , MKL_INT *nn2 , MKL_INT *ib1 , MKL_INT
*ib2 );
void pdlaed2 (const MKL_INT *ictxt , MKL_INT *k , const MKL_INT *n , const MKL_INT
*n1 , const MKL_INT *nb , double *d , const MKL_INT *drow , const MKL_INT *dcol ,
double *q , const MKL_INT *ldq , double *rho , const double *z , double *w , double
*dlamda , double *q2 , const MKL_INT *ldq2 , double *qbuf , MKL_INT *ctot , MKL_INT
*psm , const MKL_INT *npcol , MKL_INT *indx , MKL_INT *indxc , MKL_INT *indxp ,
MKL_INT *indcol , MKL_INT *coltyp , MKL_INT *nn , MKL_INT *nn1 , MKL_INT *nn2 ,
MKL_INT *ib1 , MKL_INT *ib2 );
void pslaed3 (const MKL_INT *ictxt , MKL_INT *k , const MKL_INT *n , const MKL_INT
*nb , float *d , const MKL_INT *drow , const MKL_INT *dcol , float *rho , float
*dlamda , float *w , const float *z , float *u , const MKL_INT *ldu , float *buf ,
MKL_INT *indx , MKL_INT *indcol , MKL_INT *indrow , MKL_INT *indxr , MKL_INT *indxc ,
MKL_INT *ctot , const MKL_INT *npcol , MKL_INT *info );
void pdlaed3 (const MKL_INT *ictxt , MKL_INT *k , const MKL_INT *n , const MKL_INT
*nb , double *d , const MKL_INT *drow , const MKL_INT *dcol , double *rho , double
*dlamda , double *w , const double *z , double *u , const MKL_INT *ldu , double *buf ,
MKL_INT *indx , MKL_INT *indcol , MKL_INT *indrow , MKL_INT *indxr , MKL_INT *indxc ,
MKL_INT *ctot , const MKL_INT *npcol , MKL_INT *info );
void pslaedz (const MKL_INT *n , const MKL_INT *n1 , const MKL_INT *id , const float
*q , const MKL_INT *iq , const MKL_INT *jq , const MKL_INT *ldq , const MKL_INT
*descq , float *z , float *work );
void pdlaedz (const MKL_INT *n , const MKL_INT *n1 , const MKL_INT *id , const double
*q , const MKL_INT *iq , const MKL_INT *jq , const MKL_INT *ldq , const MKL_INT
*descq , double *z , double *work );
void pdlaiectb (const double *sigma , const MKL_INT *n , const double *d , MKL_INT
*count );
void pdlaiectl (const double *sigma , const MKL_INT *n , const double *d , MKL_INT
*count );
void slamov (const char *UPLO , const MKL_INT *M , const MKL_INT *N , const float *A ,
const MKL_INT *LDA , float *B , const MKL_INT *LDB );
void dlamov (const char *UPLO , const MKL_INT *M , const MKL_INT *N , const double *A ,
const MKL_INT *LDA , double *B , const MKL_INT *LDB );

void clamov (const char *UPLO , const MKL_INT *M , const MKL_INT *N , const
MKL_Complex8 *A , const MKL_INT *LDA , MKL_Complex8 *B , const MKL_INT *LDB );
void zlamov (const char *UPLO , const MKL_INT *M , const MKL_INT *N , const
MKL_Complex16 *A , const MKL_INT *LDA , MKL_Complex16 *B , const MKL_INT *LDB );
void pslamr1d (const MKL_INT *n , float *a , const MKL_INT *ia , const MKL_INT *ja ,
const MKL_INT *desca , float *b , const MKL_INT *ib , const MKL_INT *jb , const MKL_INT
*descb );
void pdlamr1d (const MKL_INT *n , double *a , const MKL_INT *ia , const MKL_INT *ja ,
const MKL_INT *desca , double *b , const MKL_INT *ib , const MKL_INT *jb , const
MKL_INT *descb );
void pclamr1d (const MKL_INT *n , MKL_Complex8 *a , const MKL_INT *ia , const MKL_INT
*ja , const MKL_INT *desca , MKL_Complex8 *b , const MKL_INT *ib , const MKL_INT *jb ,
const MKL_INT *descb );
void pzlamr1d (const MKL_INT *n , MKL_Complex16 *a , const MKL_INT *ia , const MKL_INT
*ja , const MKL_INT *desca , MKL_Complex16 *b , const MKL_INT *ib , const MKL_INT
*jb , const MKL_INT *descb );
void clanv2 (MKL_Complex8 *a , MKL_Complex8 *b , MKL_Complex8 *c , MKL_Complex8 *d ,
MKL_Complex8 *rt1 , MKL_Complex8 *rt2 , float *cs , MKL_Complex8 *sn );
void zlanv2 (MKL_Complex16 *a , MKL_Complex16 *b , MKL_Complex16 *c , MKL_Complex16
*d , MKL_Complex16 *rt1 , MKL_Complex16 *rt2 , double *cs , MKL_Complex16 *sn );
void pclattrs (const char *uplo , const char *trans , const char *diag , const char
*normin , const MKL_INT *n , const MKL_Complex8 *a , const MKL_INT *ia , const MKL_INT
*ja , const MKL_INT *desca , MKL_Complex8 *x , const MKL_INT *ix , const MKL_INT *jx ,
const MKL_INT *descx , float *scale , float *cnorm , MKL_INT *info );
void pzlattrs (const char *uplo , const char *trans , const char *diag , const char
*normin , const MKL_INT *n , const MKL_Complex16 *a , const MKL_INT *ia , const MKL_INT
*ja , const MKL_INT *desca , MKL_Complex16 *x , const MKL_INT *ix , const MKL_INT
*jx , const MKL_INT *descx , double *scale , double *cnorm , MKL_INT *info );
void pssyttrd (const char *uplo , const MKL_INT *n , float *a , const MKL_INT *ia ,
const MKL_INT *ja , const MKL_INT *desca , float *d , float *e , float *tau , float
*work , const MKL_INT *lwork , MKL_INT *info );
void pdsyttrd (const char *uplo , const MKL_INT *n , double *a , const MKL_INT *ia ,
const MKL_INT *ja , const MKL_INT *desca , double *d , double *e , double *tau ,
double *work , const MKL_INT *lwork , MKL_INT *info );
MKL_INT piparmq (const MKL_INT *ictxt , const MKL_INT *ispec , const char *name , const
char *opts , const MKL_INT *n , const MKL_INT *ilo , const MKL_INT *ihi , const MKL_INT
*lworknb );
For descriptions of these functions, please see http://www.netlib.org/scalapack/explore-html/files.html.

ScaLAPACK Utility Functions and Routines


This section describes ScaLAPACK utility functions and routines. Summary information about these routines is
given in the following table:
ScaLAPACK Utility Functions and Routines
Routine Name Data Types Description

p?labad s,d Returns the square root of the underflow and overflow thresholds if the
exponent-range is very large.

p?lachkieee s,d Performs a simple check for the features of the IEEE standard.


p?lamch s,d Determines machine parameters for floating-point arithmetic.

p?lasnbt s,d Computes the position of the sign bit of a floating-point number.

See Also
pxerbla Error handling routine called by ScaLAPACK routines.

p?labad
Returns the square root of the underflow and overflow
thresholds if the exponent-range is very large.

Syntax
void pslabad (MKL_INT *ictxt , float *small , float *large );
void pdlabad (MKL_INT *ictxt , double *small , double *large );

Include Files
• mkl_scalapack.h

Description
The p?labad function takes as input the values computed by p?lamch for underflow and overflow, and
returns the square root of each of these values if the log of large is sufficiently large. This function is
intended to identify machines with a large exponent range, such as the Crays, and redefine the underflow
and overflow limits to be the square roots of the values computed by p?lamch. This function is needed
because p?lamch does not compensate for poor arithmetic in the upper half of the exponent range, as is
found on a Cray.
In addition, this function performs a global minimization and maximization on these values, to support
heterogeneous computing networks.

Input Parameters

ictxt (global)
The BLACS context handle in which the computation takes place.

small (local).
On entry, the underflow threshold as computed by p?lamch.

large (local).
On entry, the overflow threshold as computed by p?lamch.

Output Parameters

small (local).
On exit, if log10(large) is sufficiently large, the square root of small,
otherwise unchanged.

large (local).
On exit, if log10(large) is sufficiently large, the square root of large,
otherwise unchanged.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?lachkieee
Performs a simple check for the features of the IEEE
standard.

Syntax
void pslachkieee (MKL_INT *isieee , float *rmax , float *rmin );
void pdlachkieee (MKL_INT *isieee , float *rmax , float *rmin );

Include Files
• mkl_scalapack.h

Description
The p?lachkieee function performs a simple check to make sure that the features of the IEEE standard are
implemented. In some implementations, p?lachkieee may not return.

This is a ScaLAPACK internal function and arguments are not checked for unreasonable values.

Input Parameters

rmax (local).
The overflow threshold(= ?lamch ('O')).

rmin (local).
The underflow threshold(= ?lamch ('U')).

Output Parameters

isieee (local).
On exit, isieee = 1 implies that all the features of the IEEE standard that
we rely on are implemented. On exit, isieee = 0 implies that some the
features of the IEEE standard that we rely on are missing.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

p?lamch
Determines machine parameters for floating-point
arithmetic.

Syntax
float pslamch (MKL_INT *ictxt , char *cmach );
double pdlamch (MKL_INT *ictxt , char *cmach );


Include Files
• mkl_scalapack.h

Description
The p?lamch function determines machine parameters for single precision (pslamch) or double precision (pdlamch) floating-point arithmetic.

Input Parameters

ictxt (global). The BLACS context handle in which the computation takes place.

cmach (global)
Specifies the value to be returned by p?lamch:

= 'E' or 'e', p?lamch := eps

= 'S' or 's' , p?lamch := sfmin

= 'B' or 'b', p?lamch := base

= 'P' or 'p', p?lamch := eps*base

= 'N' or 'n', p?lamch := t

= 'R' or 'r', p?lamch := rnd

= 'M' or 'm', p?lamch := emin

= 'U' or 'u', p?lamch := rmin

= 'L' or 'l', p?lamch := emax

= 'O' or 'o', p?lamch := rmax,

where
eps = relative machine precision
sfmin = safe minimum, such that 1/sfmin does not overflow
base = base of the machine
prec = eps*base
t = number of (base) digits in the mantissa
rnd = 1.0 when rounding occurs in addition, 0.0 otherwise
emin = minimum exponent before (gradual) underflow
rmin = underflow threshold = base^(emin-1)
emax = largest exponent before overflow
rmax = overflow threshold = (base^emax)*(1-eps)

Output Parameters

val Value returned by the function.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
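
A short sketch querying a few machine parameters through pdlamch; ictxt is assumed to be a valid BLACS context handle obtained elsewhere:

#include <stdio.h>
#include "mkl_scalapack.h"

void print_machine_params(MKL_INT ictxt)
{
    char e = 'E', u = 'U', o = 'O';
    double eps  = pdlamch(&ictxt, &e);    /* relative machine precision */
    double rmin = pdlamch(&ictxt, &u);    /* underflow threshold        */
    double rmax = pdlamch(&ictxt, &o);    /* overflow threshold         */
    printf("eps = %g  rmin = %g  rmax = %g\n", eps, rmin, rmax);
}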

p?lasnbt
Computes the position of the sign bit of a floating-
point number.

Syntax
void pslasnbt (MKL_INT *ieflag );
void pdlasnbt (MKL_INT *ieflag );

Include Files
• mkl_scalapack.h

Description
The p?lasnbt function finds the position of the sign bit of a single/double precision floating point number. This
function assumes IEEE arithmetic, and hence, tests only the 32-nd bit (for single precision) or 32-nd and 64-
th bits (for double precision) as a possibility for the signbit. sizeof(int) is assumed equal to 4 bytes.

If a compile time flag (NO_IEEE) indicates that the machine does not have IEEE arithmetic, ieflag = 0 is
returned.

Output Parameters

ieflag This flag indicates the position of the signbit of any single/double precision
floating point number.
ieflag = 0, if the compile time flag NO_IEEE indicates that the machine
does not have IEEE arithmetic, or if sizeof(int) is different from 4 bytes.

ieflag = 1 indicates that the signbit is the 32-nd bit for a single precision
function.
In the case of a double precision function:
ieflag = 1 indicates that the signbit is the 32-nd bit (Big Endian).
ieflag = 2 indicates that the signbit is the 64-th bit (Little Endian).

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

ScaLAPACK Redistribution/Copy Routines


This section describes ScaLAPACK redistribution/copy routines. Summary information about these routines is
given in the following table:
ScaLAPACK Redistribution/Copy Routines
Routine Name Data Types Description

p?gemr2d s,d,c,z,i Copies a submatrix from one general rectangular matrix to another.

p?trmr2d s,d,c,z,i Copies a submatrix from one trapezoidal matrix to another.

See Also
pxerbla Error handling routine called by ScaLAPACK routines.


p?gemr2d
Copies a submatrix from one general rectangular
matrix to another.

Syntax
void psgemr2d (MKL_INT *m, MKL_INT *n, float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT
*desca , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_INT *ictxt );
void pdgemr2d (MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_INT
*ictxt );
void pcgemr2d (MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb ,
MKL_INT *ictxt );
void pzgemr2d (MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb ,
MKL_INT *ictxt );
void pigemr2d (MKL_INT *m , MKL_INT *n , MKL_INT *a , MKL_INT *ia , MKL_INT *ja ,
MKL_INT *desca , MKL_INT *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_INT
*ictxt );

Include Files
• mkl_scalapack.h

Description
The p?gemr2d function copies the indicated matrix or submatrix of A to the indicated matrix or submatrix of
B. It provides a truly general copy from any block cyclicly-distributed matrix or submatrix to any other block
cyclicly-distributed matrix or submatrix. With p?trmr2d, these functions are the only ones in the ScaLAPACK
library which provide inter-context operations: they can take a matrix or submatrix A in context A
(distributed over process grid A) and copy it to a matrix or submatrix B in context B (distributed over process
grid B).
There does not need to be a relationship between the two operand matrices or submatrices other than their
global size and the fact that they are both legal block cyclicly-distributed matrices or submatrices. This
means that they can, for example, be distributed across different process grids, have varying block sizes and
differing matrix starting points, or be contained in different sized distributed matrices.
Take care when context A is disjoint from context B. The general rules for which parameters need to be set
are:

• All calling processes must have the correct m and n.


• Processes in context A must correctly define all parameters describing A.
• Processes in context B must correctly define all parameters describing B.
• Processes which are not members of context A must pass ctxt_a = -1 and need not set other parameters
describing A.
• Processes which are not members of context B must pass ctxt_b = -1 and need not set other parameters
describing B.

Because of its generality, p?gemr2d can be used for many operations not usually associated with copy
functions. For instance, it can be used to take a matrix on one process and distribute it across a process
grid, or the reverse. If a supercomputer is grouped into a virtual parallel machine with a workstation, for
instance, this function can be used to move the matrix from the workstation to the supercomputer and back.

In ScaLAPACK, it is called to copy matrices from a two-dimensional process grid to a one-dimensional
process grid. It can be used to redistribute matrices so that distributions providing maximal performance can
be used by various component libraries, as well.
Note that this function requires an array descriptor with dtype_ = 1.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

m (global) The number of rows of matrix A to be copied (m≥0).

n (global) The number of columns of matrix A to be copied (n≥0).

a (local)
Pointer into the local memory to array of size lld_a* LOCc(ja+n-1)
containing the source matrix A.

ia, ja (global) The row and column indices in the array A indicating the first row
and the first column, respectively, of the submatrix of A to copy. 1
≤ia≤total_rows_in_a - m +1, 1 ≤ja≤total_columns_in_a - n +1.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
Only dtype_a = 1 is supported, so dlen_ = 9.
If the calling process is not part of the context of A, ctxt_a must be equal to
-1.

ib, jb (global) The row and column indices in the array B indicating the first row
and the first column, respectively, of the submatrix B to which to copy the
matrix. 1 ≤ib≤total_rows_in_b - m +1, 1 ≤jb≤total_columns_in_b - n +1.

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
Only dtype_b = 1 is supported, so dlen_ = 9.
If the calling process is not part of the context of B, ctxt_b must be equal to
-1.

ictxt (global).
The context encompassing at least the union of all processes in context A
and context B. All processes in the context ictxt must call this function,
even if they do not own a piece of either matrix.


Output Parameters

b Pointer into the local memory to array of size lld_b*LOCc(jb+n-1).

Overwritten by the submatrix from A.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.
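
A redistribution sketch for the double precision flavor (redistribute is a hypothetical wrapper, not an Intel MKL routine); the descriptors desca and descb are assumed to have been created beforehand, for example with descinit, and ictxt_union is assumed to be a context containing all processes of both grids:

#include "mkl_scalapack.h"

void redistribute(MKL_INT m, MKL_INT n,
                  double *a, MKL_INT *desca,
                  double *b, MKL_INT *descb,
                  MKL_INT ictxt_union)
{
    MKL_INT ia = 1, ja = 1, ib = 1, jb = 1;
    /* Processes outside the context of A (or B) must carry ctxt_ = -1
       in their copy of the corresponding descriptor. */
    pdgemr2d(&m, &n, a, &ia, &ja, desca, b, &ib, &jb, descb, &ictxt_union);
}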

p?trmr2d
Copies a submatrix from one trapezoidal matrix to
another.

Syntax
void pstrmr2d (char *uplo , char *diag , MKL_INT *m , MKL_INT *n , float *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , MKL_INT *ictxt );
void pdtrmr2d (char *uplo , char *diag , MKL_INT *m , MKL_INT *n , double *a , MKL_INT
*ia , MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT
*descb , MKL_INT *ictxt );
void pctrmr2d (char *uplo , char *diag , MKL_INT *m , MKL_INT *n , MKL_Complex8 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT
*jb , MKL_INT *descb , MKL_INT *ictxt );
void pztrmr2d (char *uplo , char *diag , MKL_INT *m , MKL_INT *n , MKL_Complex16 *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT
*jb , MKL_INT *descb , MKL_INT *ictxt );
void pitrmr2d (char *uplo , char *diag , MKL_INT *m , MKL_INT *n , MKL_INT *a ,
MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_INT *b , MKL_INT *ib , MKL_INT *jb ,
MKL_INT *descb , MKL_INT *ictxt );

Include Files
• mkl_scalapack.h

Description
The p?trmr2dfunction copies the indicated matrix or submatrix of A to the indicated matrix or submatrix of
B. It provides a truly general copy from any block cyclicly-distributed matrix or submatrix to any other block
cyclicly-distributed matrix or submatrix. With p?gemr2d, these functions are the only ones in the ScaLAPACK
library which provide inter-context operations: they can take a matrix or submatrix A in context A
(distributed over process grid A) and copy it to a matrix or submatrix B in context B (distributed over process
grid B).
The p?trmr2d function assumes the matrix or submatrix to be trapezoidal. Only the upper or lower part is
copied, and the other part is unchanged.
There does not need to be a relationship between the two operand matrices or submatrices other than their
global size and the fact that they are both legal block cyclicly-distributed matrices or submatrices. This
means that they can, for example, be distributed across different process grids, have varying block sizes and
differing matrix starting points, or be contained in different sized distributed matrices.
Take care when context A is disjoint from context B. The general rules for which parameters need to be set
are:

• All calling processes must have the correct m and n.
• Processes in context A must correctly define all parameters describing A.
• Processes in context B must correctly define all parameters describing B.
• Processes which are not members of context A must pass ctxt_a = -1 and need not set other parameters
describing A.
• Processes which are not members of context B must pass ctxt_b = -1 and need not set other parameters
describing B.

Because of its generality, p?trmr2d can be used for many operations not usually associated with copy
functions. For instance, it can be used to take a matrix on one process and distribute it across a process
grid, or the reverse. If a supercomputer is grouped into a virtual parallel machine with a workstation, for
instance, this function can be used to move the matrix from the workstation to the supercomputer and back.
In ScaLAPACK, it is called to copy matrices from a two-dimensional process grid to a one-dimensional
process grid. It can be used to redistribute matrices so that distributions providing maximal performance can
be used by various component libraries, as well.
Note that this function requires an array descriptor with dtype_ = 1.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

uplo (global) Specifies whether to copy the upper or lower part of the matrix or
submatrix.

uplo = 'U' Copy the upper triangular part.

uplo = 'L' Copy the lower triangular part.

diag (global) Specifies whether to copy the diagonal of the matrix or submatrix.

diag = 'U' Do not copy the diagonal.

diag = 'N' Copy the diagonal.

m (global) The number of rows of matrix A to be copied (m≥0).

n (global) The number of columns of matrix A to be copied (n≥0).

a (local)
Pointer into the local memory to array of size lld_a* LOCc(ja+n-1)
containing the source matrix A.


ia, ja (global) The row and column indices in the array A indicating the first row
and the first column, respectively, of the submatrix of A to copy. 1
≤ia≤total_rows_in_a - m +1, 1 ≤ja≤total_columns_in_a - n +1.

desca (global and local) array of size dlen_. The array descriptor for the
distributed matrix A.
Only dtype_a = 1 is supported, so dlen_ = 9.
If the calling process is not part of the context of A, ctxt_a must be equal to
-1.

ib, jb (global) The row and column indices in the array B indicating the first row
and the first column, respectively, of the submatrix B to which to copy the
matrix. 1 ≤ib≤total_rows_in_b - m +1, 1 ≤jb≤total_columns_in_b - n +1.

descb (global and local) array of size dlen_. The array descriptor for the
distributed matrix B.
Only dtype_b = 1 is supported, so dlen_ = 9.
If the calling process is not part of the context of B, ctxt_b must be equal to
-1.

ictxt (global).
The context encompassing at least the union of all processes in context A
and context B. All processes in the context ictxt must call this function,
even if they do not own a piece of either matrix.

Output Parameters

b Pointer into the local memory to array of size lld_b*LOCc(jb+n-1).

Overwritten by the submatrix from A.

See Also
Overview of ScaLAPACK Routines for details of ScaLAPACK array descriptor structures and related
notations.

Sparse Solver Routines 5
Intel® Math Kernel Library (Intel® MKL) provides user-callable sparse solver software to solve real or complex,
symmetric, structurally symmetric or nonsymmetric, positive definite, indefinite or Hermitian square sparse
linear systems of algebraic equations.
The terms and concepts required to understand the use of the Intel MKL sparse solver routines are discussed
in the Appendix A "Linear Solvers Basics". If you are familiar with linear sparse solvers and sparse matrix
storage schemes, you can skip these sections and go directly to the interface descriptions.
This chapter describes
• the direct sparse solver based on PARDISO* which is referred to here as Intel MKL PARDISO;
• the alternative interface for the direct sparse solver which is referred to here as the DSS interface;
• iterative sparse solvers (ISS) based on the reverse communication interface (RCI);
• and two preconditioners based on the incomplete LU factorization technique.

Intel MKL PARDISO - Parallel Direct Sparse Solver Interface


This section describes the interface to the shared-memory multiprocessing parallel direct sparse solver
known as the Intel MKL PARDISO solver.
The Intel MKL PARDISO package is a high-performance, robust, memory efficient, and easy to use software
package for solving large sparse linear systems of equations on shared memory multiprocessors. The solver
uses a combination of left- and right-looking Level-3 BLAS supernode techniques [Schenk00-2]. To improve
sequential and parallel sparse numerical factorization performance, the algorithms are based on a Level-3
BLAS update and pipelining parallelism is used with a combination of left- and right-looking supernode
techniques [Schenk00, Schenk01, Schenk02, Schenk03]. The parallel pivoting methods allow complete
supernode pivoting to compromise numerical stability and scalability during the factorization process. For
sufficiently large problem sizes, numerical experiments demonstrate that the scalability of the parallel
algorithm is nearly independent of the shared-memory multiprocessing architecture.
The following table lists the names of the Intel MKL PARDISO routines and describes their general use.
Intel MKL PARDISO Routines
Routine Description
pardisoinit
Initializes Intel MKL PARDISO with default parameters
depending on the matrix type.
pardiso
Calculates the solution of a set of sparse linear equations
with single or multiple right-hand sides.
pardiso_64
Calculates the solution of a set of sparse linear equations
with single or multiple right-hand sides, 64-bit integer
version.
pardiso_getenv
Retrieves additional values from the Intel MKL PARDISO
handle.
pardiso_setenv
Sets additional values in the Intel MKL PARDISO handle.
mkl_pardiso_pivot
Replaces routine which handles Intel MKL PARDISO pivots
with user-defined routine.
pardiso_getdiag
Returns diagonal elements of initial and factorized matrix.

pardiso_handle_store
Store internal structures from pardiso to a file.
pardiso_handle_restore
Restore pardiso internal structures from a file.
pardiso_handle_delete
Delete files with pardiso internal structure data.
pardiso_handle_store_64
Store internal structures from pardiso_64 to a file.
pardiso_handle_restore_64
Restore pardiso_64 internal structures from a file.
pardiso_handle_delete_64
Delete files with pardiso_64 internal structure data.

The Intel MKL PARDISO solver supports a wide range of real and complex sparse matrix types (see the figure
below).

Sparse Matrices That Can Be Solved with the Intel MKL PARDISO Solver

The Intel MKL PARDISO solver performs four tasks:

• analysis and symbolic factorization


• numerical factorization
• forward and backward substitution including iterative refinement
• termination to release all internal solver memory.

You can find code examples that use Intel MKL PARDISO routines to solve systems of linear equations in the
examples folder of the Intel MKL installation directory:

• examples/solverc/source

Supported Matrix Types


The analysis steps performed by Intel MKL PARDISO depend on the structure of the input matrix A.

Symmetric Matrices: The solver first computes a symmetric fill-in reducing permutation P based on
either the minimum degree algorithm [Liu85] or the nested dissection algorithm from the METIS
package [Karypis98] (both included with Intel MKL), followed by the parallel left-right looking
numerical Cholesky factorization [Schenk00-2] of P*A*P^T = L*L^T for symmetric positive-definite
matrices, or P*A*P^T = L*D*L^T for symmetric indefinite matrices. The solver uses diagonal pivoting,
or 1x1 and 2x2 Bunch-Kaufman pivoting for symmetric indefinite matrices. An approximation of X is
found by forward and backward substitution and optional iterative refinement.

Whenever numerically acceptable 1x1 and 2x2 pivots cannot be found within the
diagonal supernode block, the coefficient matrix is perturbed. One or two passes
of iterative refinement may be required to correct the effect of the perturbations.
This restricted notion of pivoting with iterative refinement is effective for highly
indefinite symmetric systems. Furthermore, for a large set of matrices from
different application areas, this method is as accurate as a direct factorization
method that uses complete sparse pivoting techniques [Schenk04].
Another method of improving the pivoting accuracy is to use symmetric weighted
matching algorithms. These algorithms identify large entries in the coefficient
matrix A that, if permuted close to the diagonal, permit the factorization process
to identify more acceptable pivots and proceed with fewer pivot perturbations.
These algorithms are based on maximum weighted matchings and improve the
quality of the factor in a complementary way to the alternative of using more
complete pivoting techniques.
The inertia is also computed for real symmetric indefinite matrices.

Structurally Symmetric Matrices: The solver first computes a symmetric fill-in reducing permutation P
followed by the parallel numerical factorization of P*A*P^T = Q*L*U^T. The solver uses partial pivoting
in the supernodes, and an approximation of X is found by forward and backward substitution and
optional iterative refinement.

Nonsymmetric Matrices: The solver first computes a nonsymmetric permutation P_MPS and scaling
matrices D_r and D_c with the aim of placing large entries on the diagonal to enhance the reliability of
the numerical factorization process [Duff99]. In the next step the solver computes a fill-in reducing
permutation P based on the matrix P_MPS*A + (P_MPS*A)^T, followed by the parallel numerical
factorization
Q*L*U*R = P*P_MPS*D_r*A*D_c*P
with supernode pivoting matrices Q and R. When the factorization algorithm reaches a point where it
cannot factor the supernodes with this pivoting strategy, it uses a pivoting perturbation strategy
similar to [Li99]. The magnitude of the potential pivot is tested against a constant threshold of
alpha = eps*||A_2||_inf,
where eps is the machine precision, A_2 = P*P_MPS*D_r*A*D_c*P, and ||A_2||_inf is the infinity norm
of A_2. Any tiny pivots encountered during elimination are set to sign(l_ii)*eps*||A_2||_inf, which
trades off some numerical stability for the ability to keep pivots from getting too small. Although many
failures could render the factorization well-defined but essentially useless, in practice the diagonal
elements are rarely modified for a large class of matrices. The result of this pivoting approach is that
the factorization is, in general, not exact, and iterative refinement may be needed.

Sparse Data Storage


Intel MKL PARDISO stores sparse data in several formats:
• CSR3: The 3-array variation of the compressed sparse row format described in Three Array Variation of
CSR Format.
• BSR3: The three-array variation of the block compressed sparse row format described in Three Array
Variation of BSR Format. Use iparm[36] to specify the block size.
• VBSR: Variable BSR format. Intel MKL PARDISO analyzes the matrix provided in CSR3 format and
converts it into an internal structure which can improve performance for matrices with a block structure.
Use iparm[36] = -t (0 < t≤ 100) to specify use of internal VBSR format and to set the degree of
similarity required to combine elements of the matrix. For example, if you set iparm[36] = -80, two
rows of the input matrix are combined when their non-zero patterns are 80% or more similar.


NOTE
Intel MKL only supports VBSR format for real and symmetric positive definite or indefinite matrices
(mtype = 2 or mtype = -2).

For all storage formats, the Intel MKL PARDISO parameter ja is used for the columns array, ia for
rowIndex, and a for values. The algorithms in Intel MKL PARDISO require the column indices in ja to be
in increasing order per row and, for any structurally symmetric matrix, the diagonal element of each row
to be present. For symmetric or nonsymmetric matrices, zero diagonal elements are not required.
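
For illustration, a minimal CSR3 layout of a small symmetric matrix is sketched below (this is not one of the
shipped examples; the matrix and its values are made up). Only the upper triangle is stored, every diagonal
element is kept, indexing is one-based, and the column indices increase within each row, as required above.

/* Upper triangle of the symmetric matrix
 *     | 4 1 0 0 |
 * A = | 1 5 2 0 |
 *     | 0 2 6 3 |
 *     | 0 0 3 7 |
 * in one-based CSR3 form. */
#include "mkl.h"

#define N   4
#define NNZ 7   /* non-zeros of the upper triangle, diagonal included */

MKL_INT ia[N + 1] = { 1, 3, 5, 7, 8 };           /* row starts; ia[N] = NNZ + 1 */
MKL_INT ja[NNZ]   = { 1, 2,  2, 3,  3, 4,  4 };  /* increasing within each row  */
double  a[NNZ]    = { 4., 1., 5., 2., 6., 3., 7. };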

CAUTION
Intel MKL PARDISO column indices ja must be in increasing order per row. You can validate the sparse
matrix structure with the matrix checker (iparm[26]).

NOTE
While the presence of zero diagonal elements for symmetric matrices is not required, you should
explicitly set zero diagonal elements for symmetric matrices. Otherwise, Intel MKL PARDISO creates
internal copies of arrays ia, ja, and a full of diagonal elements, which require additional memory and
computational time. However, the memory and time required for the diagonal elements in the internal
arrays are usually not significant compared to the memory and time required to factor and solve the
matrix.

Storage of Matrices
By default, Intel MKL PARDISO stores matrices in RAM; this is the in-core (IC) mode. You can instead direct
Intel MKL PARDISO to store matrices on disk, the out-of-core (OOC) mode, by setting iparm[59].
OOC parameters can be set in a configuration file. You can set the path to this file and its name using
environmental variables MKL_PARDISO_OOC_CFG_PATH and MKL_PARDISO_OOC_CFG_FILE_NAME.

These variables specify the path and filename as follows:


<MKL_PARDISO_OOC_CFG_PATH>/<MKL_PARDISO_OOC_CFG_FILE_NAME> for Linux* OS and macOS*, and
<MKL_PARDISO_OOC_CFG_PATH>\<MKL_PARDISO_OOC_CFG_FILE_NAME> for Windows* OS.
By default, the name of the file is pardiso_ooc.cfg and it is placed in the current directory.

All temporary data files can be deleted or stored when the calculations are completed in accordance with the
value of the environmental variable MKL_PARDISO_OOC_KEEP_FILE. If it is not set or if it is set to 1, then all
files are deleted. If it is set to 0, then all files are stored.
By default, the OOC version of Intel MKL PARDISO uses the current directory for storing data, and all work
arrays associated with the matrix factors are stored in files named ooc_temp with different extensions. These
default values can be changed by using the environmental variable MKL_PARDISO_OOC_PATH.

To set the environmental variables MKL_PARDISO_OOC_MAX_CORE_SIZE,


MKL_PARDISO_OOC_MAX_SWAP_SIZE, MKL_PARDISO_OOC_KEEP_FILE, and MKL_PARDISO_OOC_PATH, create
the configuration file with the following lines:

MKL_PARDISO_OOC_PATH = <path>\ooc_file
MKL_PARDISO_OOC_MAX_CORE_SIZE = N
MKL_PARDISO_OOC_MAX_SWAP_SIZE = K
MKL_PARDISO_OOC_KEEP_FILE = 0 (or 1)

where <path> is the directory for storing data, ooc_file is the file name without any extension, N is the
maximum size of RAM in megabytes available for Intel MKL PARDISO (default value is 2000 MB), K is the
maximum swap size in megabytes available for Intel MKL PARDISO (default value is 0 MB). Do not set N+K
greater than the size of the RAM plus the size of the swap. Be sure to allow enough free memory for the
operating system and any other processes which are necessary.

CAUTION
The maximum length of the path lines in the configuration files is 1000 characters.

Alternatively the environment variables can be set via command line.


For Linux* OS and macOS*:

export MKL_PARDISO_OOC_PATH = <path>/ooc_file


export MKL_PARDISO_OOC_MAX_CORE_SIZE = N
export MKL_PARDISO_OOC_MAX_SWAP_SIZE = K
export MKL_PARDISO_OOC_KEEP_FILE = 0 (or 1)

For Windows* OS:

set MKL_PARDISO_OOC_PATH = <path>\ooc_file


set MKL_PARDISO_OOC_MAX_CORE_SIZE = N
set MKL_PARDISO_OOC_MAX_SWAP_SIZE = K
set MKL_PARDISO_OOC_KEEP_FILE = 0 (or 1)

NOTE
The values specified on the command line have higher priority: if a variable is set both in the
configuration file and on the command line, the OOC version of Intel MKL PARDISO uses only the value
defined on the command line.

Direct-Iterative Preconditioning for Nonsymmetric Linear Systems


The solver uses a combination of direct and iterative methods [Sonn89] to accelerate the linear solution
process for transient simulation. Most applications of sparse solvers require solutions of systems with
gradually changing values of the nonzero coefficient matrix, but with an identical sparsity pattern. In these
applications, the analysis phase of the solvers has to be performed only once and the numerical
factorizations are the important time-consuming steps during the simulation. Intel MKL PARDISO uses a
numerical factorization and applies the factors in a preconditioned Krylov subspace iteration. If the iteration
does not converge, the solver automatically switches back to the numerical factorization. This method can be
applied to nonsymmetric matrices in Intel MKL PARDISO. You can select the method using the iparm[3]
input parameter. The iparm[19] parameter returns the error status after running Intel MKL PARDISO.
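
A hedged sketch of how this mechanism could be selected is shown below; the helper function, its arguments,
and the concrete value iparm[3] = 61 (LU-preconditioned CGS with stopping criterion 1.0E-6, see the iparm[3]
description later in this chapter) are illustrative assumptions, not a prescribed usage.

/* Hedged sketch: re-solve a nonsymmetric system whose values changed but whose
   sparsity pattern did not, using a CGS iteration preconditioned by the previous
   factorization. All arrays are assumed to be set up as for a normal pardiso call. */
#include <stdio.h>
#include "mkl.h"

void refactor_and_solve_cgs(void *pt[64], MKL_INT iparm[64], MKL_INT maxfct,
                            MKL_INT mnum, MKL_INT n, const double *a,
                            const MKL_INT *ia, const MKL_INT *ja,
                            MKL_INT nrhs, double *b, double *x)
{
    MKL_INT mtype = 11, msglvl = 0, error = 0, idum = 0;
    MKL_INT phase = 23;               /* factorize + solve: falls back to a     */
                                      /* direct factorization if CGS fails      */
    iparm[3] = 61;                    /* LU-preconditioned CGS, eps = 1.0E-6    */

    pardiso(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja, &idum,
            &nrhs, iparm, &msglvl, b, x, &error);

    if (iparm[19] < 0)                /* iparm[19] reports CG/CGS diagnostics   */
        printf("CGS did not converge (diagnostic %lld)\n", (long long)iparm[19]);
}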

Single and Double Precision Computations


Intel MKL PARDISO solves tasks using single or double precision. Each precision has its benefits and
drawbacks. Double precision variables have more digits to store values, so the solver uses more memory to
keep data, but it solves systems with better accuracy, which is especially important for input matrices with
large condition numbers.
Single precision variables have fewer digits to store values, so the solver uses less memory than in the
double precision mode and usually takes less time. However, because the computations are less precise, only
some systems of equations can be solved accurately enough using single precision.

Separate Forward and Backward Substitution


The solver execution step (see parameter phase = 33 below) can be divided into two or three separate
substitutions: forward, backward, and possibly diagonal. This separation can be explained with examples of
solving systems with different matrix types.


A real symmetric positive definite matrix A (mtype = 2) is factored by Intel MKL PARDISO as A = L*L^T. In
this case the solution of the system A*x=b can be found as a sequence of substitutions: L*y=b (forward
substitution, phase = 331) and L^T*x=y (backward substitution, phase = 333).

A real nonsymmetric matrix A (mtype = 11) is factored by Intel MKL PARDISO as A = L*U. In this case the
solution of the system A*x=b can be found by the following sequence: L*y=b (forward substitution, phase
= 331) and U*x=y (backward substitution, phase = 333).
Solving a system with a real symmetric indefinite matrix A (mtype = -2) is slightly different from the cases
above. Intel MKL PARDISO factors this matrix as A = L*D*L^T, and the solution of the system A*x=b can be
calculated as the following sequence of substitutions: L*y=b (forward substitution, phase = 331), D*v=y
(diagonal substitution, phase = 332), and finally L^T*x=v (backward substitution, phase = 333). Diagonal
substitution makes sense only for symmetric indefinite matrices (mtype = -2, -4, 6). For matrices of
other types a solution can be found as described in the first two examples.

CAUTION
The number of refinement steps (iparm[7]) must be set to zero if a solution is calculated with
separate substitutions (phase = 331, 332, 333), otherwise Intel MKL PARDISO produces the wrong
result.

NOTE
Different pivoting (iparm[20]) produces different LDL^T factorizations. Therefore the results of forward,
diagonal, and backward substitutions with diagonal pivoting can differ from the results of the same steps
with Bunch-Kaufman pivoting. Of course, the final result of sequential execution of forward, diagonal,
and backward substitution is equal to the result of the full solving step (phase=33) regardless of the
pivoting used.
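
The split can be sketched as follows for a real symmetric positive definite system. This is an illustrative
fragment, not a shipped example: it assumes that phases 11 and 22 have already been run on the handle and
that the intermediate vector y produced by phase 331 is passed as the right-hand side of phase 333 (an
assumption based on the b and x parameter descriptions of pardiso below).

/* Hedged sketch: replace one phase-33 solve (mtype = 2, A = L*L^T) by the two
   separate substitutions described above. */
#include <stdlib.h>
#include "mkl.h"

void solve_in_two_steps(void *pt[64], MKL_INT iparm[64], MKL_INT maxfct,
                        MKL_INT mnum, MKL_INT n, const double *a,
                        const MKL_INT *ia, const MKL_INT *ja,
                        MKL_INT nrhs, double *b, double *x)
{
    MKL_INT mtype = 2, msglvl = 0, error = 0, phase, idum = 0;
    double *y = (double *)malloc((size_t)n * (size_t)nrhs * sizeof(double));

    iparm[7] = 0;      /* refinement must be off for separate substitutions */

    phase = 331;       /* forward substitution:  L*y = b                    */
    pardiso(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja, &idum,
            &nrhs, iparm, &msglvl, b, y, &error);

    phase = 333;       /* backward substitution: L^T*x = y                  */
    pardiso(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja, &idum,
            &nrhs, iparm, &msglvl, y, x, &error);

    free(y);
}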

Callback Function for Pivoting Control


In-core Intel MKL PARDISO allows you to control pivoting with a callback routine, mkl_pardiso_pivot. You can
then use the pardiso_getdiag routine to access the diagonal elements. Set iparm[55] to 1 in order to use
the callback functionality.

pardiso
Calculates the solution of a set of sparse linear
equations with single or multiple right-hand sides.

Syntax
void pardiso (_MKL_DSS_HANDLE_t pt, const MKL_INT *maxfct, const MKL_INT *mnum, const
MKL_INT *mtype, const MKL_INT *phase, const MKL_INT *n, const void *a, const MKL_INT
*ia, const MKL_INT *ja, MKL_INT *perm, const MKL_INT *nrhs, MKL_INT *iparm, const
MKL_INT *msglvl, void *b, void *x, MKL_INT *error);

Include Files
• mkl.h

Description
The routine pardiso calculates the solution of a set of sparse linear equations

A*X = B

with single or multiple right-hand sides, using a parallel LU, LDL^T, or LL^T factorization, where A is an
n-by-n matrix, and X and B are n-by-nrhs vectors or matrices.

NOTE
This routine supports the Progress Routine feature. See Progress Function section for details.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

pt Array with size of 64.


Handle to internal data structure. The entries must be set to zero prior to
the first call to pardiso. Unique for factorization.

CAUTION
After the first call to pardiso do not directly modify pt, as that could
cause a serious memory leak.

Use the pardiso_handle_store or pardiso_handle_store_64 routine to


store the content of pt to a file. Restore the contents of pt from the file
using pardiso_handle_restore or pardiso_handle_restore_64. Use
pardiso_handle_store and pardiso_handle_restore with pardiso,
and pardiso_handle_store_64 and pardiso_handle_restore_64 with
pardiso_64.

maxfct Maximum number of factors with identical sparsity structure that must be
kept in memory at the same time. In most applications this value is equal
to 1. It is possible to store several different factorizations with the same
nonzero structure at the same time in the internal data structure
management of the solver.
pardiso can process several matrices with an identical matrix sparsity
pattern and it can store the factors of these matrices at the same time.
Matrices with a different sparsity structure can be kept in memory with
different memory address pointers pt.

mnum Indicates the actual matrix for the solution phase. With this scalar you can
define which matrix to factorize. The value must be: 1 ≤mnum≤maxfct.

In most applications this value is 1.

mtype Defines the matrix type, which influences the pivoting method. The Intel
MKL PARDISO solver supports the following matrices:


1 real and structurally symmetric

2 real and symmetric positive definite

-2 real and symmetric indefinite

3 complex and structurally symmetric

4 complex and Hermitian positive definite

-4 complex and Hermitian indefinite

6 complex and symmetric

11 real and nonsymmetric

13 complex and nonsymmetric

phase Controls the execution of the solver. Usually it is a two- or three-digit


integer. The first digit indicates the starting phase of execution and the
second digit indicates the ending phase. Intel MKL PARDISO has the
following phases of execution:

• Phase 1: Fill-reduction analysis and symbolic factorization


• Phase 2: Numerical factorization
• Phase 3: Forward and Backward solve including optional iterative
refinement
This phase can be divided into two or three separate substitutions:
forward, backward, and diagonal (see Separate Forward and Backward
Substitution).
• Memory release phase (phase= 0 or phase= -1)

If a previous call to the routine has computed information from previous


phases, execution may start at any phase. The phase parameter can have
the following values:

phase Solver Execution Steps


11 Analysis

12 Analysis, numerical factorization

13 Analysis, numerical factorization, solve, iterative


refinement

22 Numerical factorization

23 Numerical factorization, solve, iterative refinement

33 Solve, iterative refinement

331 like phase=33, but only forward substitution

332 like phase=33, but only diagonal substitution (if


available)

333 like phase=33, but only backward substitution

0 Release internal memory for L and U matrix number
mnum

-1 Release all internal memory for all matrices

If iparm[35] = 0, phases 331, 332, and 333 perform this decomposition:

A = [ L11  0   ] [ D11  0   ] [ U11  U21 ]
    [ L12  L22 ] [ 0    D22 ] [ 0    U22 ]

If iparm[35] = 2, phases 331, 332, and 333 perform a different decomposition:

A = [ L11  0 ] [ I  0 ] [ U11  U21 ]
    [ L12  I ] [ 0  S ] [ 0    I   ]

You can supply a custom implementation for phase 332 instead of calling
pardiso. For example, it can be implemented with dense LAPACK
functionality. Custom implementation also allows you to substitute the
matrix S with your own.

NOTE
For very large Schur complement matrices use LAPACK functionality to
compute the Schur complement vector instead of the Intel MKL
PARDISO phase 332 implementation.

n Number of equations in the sparse linear systems of equations A*X = B.


Constraint: n > 0.

a Array. Contains the non-zero elements of the coefficient matrix A


corresponding to the indices in ja. The coefficient matrix can be either real
or complex. The matrix must be stored in the three-array variant of the
compressed sparse row (CSR3) or in the three-array variant of the block
compressed sparse row (BSR3) format, and the matrix must be stored with
increasing values of ja for each row.

For CSR3 format, the size of a is the same as that of ja. Refer to the
values array description in Three Array Variation of CSR Format for more
details.
For BSR3 format the size of a is the size of ja multiplied by the square of
the block size. Refer to the values array description in Three Array
Variation of BSR Format for more details.

NOTE
If you set iparm[36] to a negative value, Intel MKL PARDISO converts
the data from CSR3 format to an internal variable BSR (VBSR) format.
See Sparse Data Storage.

ia Array, size (n+1).


For CSR3 format, ia[i] (i<n) points to the first column index of row i in
the array ja. That is, ia[i] gives the index of the element in array a that
contains the first non-zero element from row i of A. The last element ia[n]
is taken to be equal to the number of non-zero elements in A, plus one.
Refer to rowIndex array description in Three Array Variation of CSR Format
for more details.
For BSR3 format, ia[i] (i<n) points to the first column index of row i in
the array ja. That is, ia[i] gives the index of the element in array a that
contains the first non-zero block from row i of A. The last element ia[n] is
taken to be equal to the number of non-zero blocks in A, plus one. Refer to
rowIndex array description in Three Array Variation of BSR Format for more
details.
The array ia is accessed in all phases of the solution process.

Indexing of ia is one-based by default, but it can be changed to zero-based


by setting the appropriate value to the parameter iparm[34].

ja For CSR3 format, array ja contains column indices of the sparse matrix A.
It is important that the indices are in increasing order per row. For
structurally symmetric matrices it is assumed that all diagonal elements are
stored (even if they are zeros) in the list of non-zero elements in a and ja.
For symmetric matrices, the solver needs only the upper triangular part of
the system as is shown for columns array in Three Array Variation of CSR
Format.
For BSR3 format, array ja contains column indices of the sparse matrix A.
It is important that the indices are in increasing order per row. For
structurally symmetric matrices it is assumed that all diagonal blocks are
stored (even if they are zeros) in the list of non-zero blocks in a and ja. For
symmetric matrices, the solver needs only the upper triangular part of the
system as is shown for columns array in Three Array Variation of BSR
Format.
The array ja is accessed in all phases of the solution process.

Indexing of ja is one-based by default, but it can be changed to zero-based


by setting the appropriate value to the parameter iparm[34].

perm Array, size (n). Depending on the value of iparm[4] and iparm[30], either
holds the permutation vector of size n or specifies elements used for
computing a partial solution.

• If iparm[4] = 1, iparm[30] = 0, and iparm[35] = 0, perm specifies


the fill-in reducing ordering to the solver. Let A be the original matrix
and C = P*A*PT be the permuted matrix. Row (column) i of C is the
perm[i] row (column) of A. The array perm is also used to return the
permutation vector calculated during fill-in reducing ordering stage.

NOTE
Be aware that setting iparm[4] = 1 prevents use of a parallel
algorithm for the solve step.

• If iparm[4] = 2, iparm[30] = 0, and iparm[35] = 0, the permutation


vector computed in phase 11 is returned in the perm array.

• If iparm[4] = 0, iparm[30] > 0, and iparm[35] = 0, perm specifies
elements of the right-hand side to use or of the solution to compute for
a partial solution.
• If iparm[4] = 0, iparm[30] = 0, and iparm[35] > 0, perm specifies
elements for a Schur complement.

See iparm[4] and iparm[30] for more details.

Indexing of perm is one-based by default, but it can be changed to zero-


based by setting the appropriate value to the parameter iparm[34].

nrhs Number of right-hand sides that need to be solved for.

iparm Array, size (64). This array is used to pass various parameters to Intel MKL
PARDISO and to return some useful information after execution of the
solver.
See pardiso iparm Parameter for more details about the iparm parameters.

msglvl Message level information. If msglvl = 0 then pardiso generates no


output, if msglvl = 1 the solver prints statistical information to the screen.

b Array, size (n*nrhs). On entry, contains the right-hand side vector/matrix


B, which is placed in memory contiguously. The b[i+k*n] element must
hold the i-th component of the k-th right-hand side vector. Note that b is only
accessed in the solution phase.

Output Parameters

(See also Intel MKL PARDISO Parameters in Tabular Form.)

pt Handle to internal data structure.

perm See the Input Parameter description of the perm array.

iparm On output, some iparm values report information such as the numbers of
non-zero elements in the factors.
See pardiso iparm Parameter for more details about the iparm parameters.

b On output, the array is replaced with the solution if iparm[5] = 1.

x Array, size (n*nrhs). If iparm[5]=0 it contains solution vector/matrix X,


which is placed contiguously in memory. The x[i + k*n] element must
hold the i-th component of the k-th solution vector. Note that x is only
accessed in the solution phase.

error The error indicator according to the below table:

error Information
0 no error

-1 input inconsistent

-2 not enough memory

-3 reordering problem

-4 zero pivot, numerical factorization or iterative


refinement problem

-5 unclassified (internal) error

-6 reordering failed (matrix types 11 and 13 only)

-7 diagonal matrix is singular

-8 32-bit integer overflow problem

-9 not enough memory for OOC

-10 error opening OOC files

-11 read/write error with OOC files

-12 (pardiso_64 only) pardiso_64 called from 32-bit


library
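
For orientation, the following minimal sketch (not the shipped example from examples/solverc/source; the
4x4 matrix and its values are illustrative) shows a typical call sequence: pardisoinit, one combined
analysis/factorization/solve call with phase = 13, and memory release with phase = -1.

/* Hedged, minimal sketch: solve A*x = b for a small real symmetric positive
   definite matrix stored in one-based CSR3 format (upper triangle only). */
#include <stdio.h>
#include "mkl.h"

int main(void)
{
    MKL_INT n = 4;
    MKL_INT ia[5] = { 1, 3, 5, 7, 8 };
    MKL_INT ja[7] = { 1, 2, 2, 3, 3, 4, 4 };
    double  a[7]  = { 4., 1., 5., 2., 6., 3., 7. };
    double  b[4]  = { 1., 1., 1., 1. };
    double  x[4];

    void   *pt[64] = { 0 };            /* internal handle: must start zeroed   */
    MKL_INT iparm[64];
    MKL_INT maxfct = 1, mnum = 1, mtype = 2 /* real SPD */, nrhs = 1;
    MKL_INT msglvl = 0, error = 0, phase, i;
    MKL_INT perm[4] = { 0 };           /* ignored with default iparm settings  */

    pardisoinit(pt, &mtype, iparm);    /* default iparm values for this mtype  */

    phase = 13;                        /* analysis, factorization, solve       */
    pardiso(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja, perm,
            &nrhs, iparm, &msglvl, b, x, &error);
    if (error != 0) {
        printf("pardiso failed, error = %lld\n", (long long)error);
        return 1;
    }
    for (i = 0; i < n; i++)
        printf("x[%lld] = %f\n", (long long)i, x[i]);

    phase = -1;                        /* release all internal memory          */
    pardiso(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja, perm,
            &nrhs, iparm, &msglvl, b, x, &error);
    return 0;
}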

pardisoinit
Initialize Intel MKL PARDISO with default parameters
in accordance with the matrix type.

Syntax
void pardisoinit (_MKL_DSS_HANDLE_t pt, const MKL_INT *mtype, MKL_INT *iparm );

Include Files
• mkl.h

Description

This function initializes Intel MKL PARDISO internal address pointer pt with zero values (as needed for the
very first call of pardiso) and sets default iparm values in accordance with the matrix type. Intel MKL
supplies the pardisoinit routine to be compatible with PARDISO 3.2 or lower.

NOTE
An alternative way to set default iparm values is to call pardiso in the analysis phase with
iparm[0]=0. In this case you must initialize the internal address pointer pt with zero values manually.

NOTE
The pardisoinit routine initializes only the in-core version of Intel MKL PARDISO. Switching on the
out-of-core version of Intel MKL PARDISO, as well as changing default iparm values, can be done after
the call to pardisoinit but before the first call to pardiso.

Input Parameters

mtype This scalar value defines the matrix type. Based on this value pardisoinit
sets default values for the iparm array. Refer to the section Intel MKL
PARDISO Parameters in Tabular Form for more details about the default
values of Intel MKL PARDISO

Output Parameters

pt Array with a dimension of 64. Solver internal data address pointer. These
addresses are passed to the solver, and all related internal memory
management is organized through this array. The pardisoinit routine
nullifies the array pt.

NOTE
It is very important that the pointer pt is initialized with zero before
the first call of Intel MKL PARDISO. After that first call you should never
modify the pointer, because it could cause a serious memory leak or a
crash.

iparm Array with a dimension of 64. This array is used to pass various parameters
to Intel MKL PARDISO and to return some useful information after execution
of the solver. The pardisoinit routine fills-in the iparm array with the
default values. Refer to the section Intel MKL PARDISO Parameters in
Tabular Form for more details about the default values of Intel MKL
PARDISO.
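
A small hedged sketch of this pattern follows: initialize the defaults for a given mtype, then override
selected iparm entries before the first pardiso call. The helper name and the particular overrides (the
matrix checker iparm[26] mentioned earlier in this chapter and the refinement limit iparm[7]) are
illustrative choices, not required settings.

#include "mkl.h"

void init_handle(void *pt[64], MKL_INT iparm[64])
{
    MKL_INT mtype = -2;              /* real and symmetric indefinite            */

    pardisoinit(pt, &mtype, iparm);  /* zeroes pt, fills iparm with defaults     */

    iparm[26] = 1;                   /* check ia/ja consistency during analysis  */
    iparm[7]  = 2;                   /* allow up to two iterative refinement steps */
}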

pardiso_64
Calculates the solution of a set of sparse linear
equations with single or multiple right-hand sides, 64-
bit integer version.

Syntax
void pardiso_64 (_MKL_DSS_HANDLE_t pt, const long long int *maxfct, const long long int
*mnum, const long long int *mtype, const long long int *phase, const long long int *n,
const void *a, const long long int *ia, const long long int *ja, long long int *perm,
const long long int *nrhs, long long int *iparm, const long long int *msglvl, void *b,
void *x, long long int *error);

Include Files
• mkl.h

Description
pardiso_64 is an alternative ILP64 (64-bit integer) version of the pardiso routine (see Description section
for more details). The interface of pardiso_64 is the same as the interface of pardiso, but it accepts and
returns all integer data as long long int.


Use pardiso_64 instead of pardiso when solving large matrices (with the number of non-zero elements on the
order of 500 million or more). You can use it together with the usual LP64 interfaces for the rest of Intel MKL
functionality. In other words, if you use 64-bit integer version (pardiso_64), you do not need to re-link your
applications with ILP64 libraries. Take into account that pardiso_64 may perform slower than regular
pardiso on the reordering and symbolic factorization phase.

NOTE
pardiso_64 is supported only in the 64-bit libraries. If pardiso_64 is called from the 32-bit libraries,
it returns error =-12.

NOTE
This routine supports the Progress Routine feature. See Progress Function section for details.

Input Parameters
The input parameters of pardiso_64 are the same as the input parameters of pardiso, but pardiso_64
accepts all integer data as long long int.

Output Parameters
The output parameters of pardiso_64 are the same as the output parameters of pardiso, but pardiso_64
returns all integer data as long long int.

pardiso_getenv, pardiso_setenv
Retrieves additional values from or sets additional
values in the Intel MKL PARDISO handle.

Syntax
MKL_INT pardiso_getenv (const _MKL_DSS_HANDLE_t handle, const enum PARDISO_ENV_PARAM
*param, char *value);
MKL_INT pardiso_setenv (_MKL_DSS_HANDLE_t handle , const enum PARDISO_ENV_PARAM *param,
const char *value);

Include Files
• mkl.h

Description
These functions operate with the Intel MKL PARDISO handle. The pardiso_getenv routine retrieves
additional values from the Intel MKL PARDISO handle, and pardiso_setenv sets specified values in the Intel
MKL PARDISO handle.
These functions enable retrieving and setting the name of the Intel MKL PARDISO OOC file.
To retrieve the Intel MKL PARDISO OOC file name, you can apply this function to any properly-created
handle.
To set the Intel MKL PARDISO OOC file name in the handle you must call the function before the reordering
stage. This is because the OOC file name is stored in the handle after the reordering stage and it is not
changed during further computations.

NOTE
A 1024-byte internal buffer is used inside Intel MKL PARDISO for storing the OOC file name. Allocate a
1024-byte buffer for passing to the pardiso_getenv function as the value parameter.

Input Parameters

handle Intel MKL PARDISO handle for which to set and from which to retrieve
information. (See DSS Interface Description for structure description)

param Specifies the required parameter. The only value is


PARDISO_OOC_FILE_NAME, defined in the corresponding include file.

value Input parameter for pardiso_setenv. Contains the name of the OOC file
that must be used in the handle.

Output Parameters

handle Output parameter for pardiso_setenv. Data object of the


MKL_DSS_HANDLE type (see DSS Interface Description).

value Output parameter for pardiso_getenv. Contains the name of the OOC file
which must be used in the handle.
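
A hedged sketch of retrieving the OOC file name is shown below. The helper name is illustrative, the buffer
size follows the note above, and the enum constant is assumed to be PARDISO_OOC_FILE_NAME as defined in the
include file; the meaning of the returned MKL_INT is not documented here, so the sketch only prints it.

#include <stdio.h>
#include "mkl.h"

void print_ooc_file_name(void *pt[64])
{
    enum PARDISO_ENV_PARAM param = PARDISO_OOC_FILE_NAME;
    char value[1024];   /* 1024-byte buffer required by the note above */

    MKL_INT status = pardiso_getenv(pt, &param, value);  /* assumed: 0 on success */
    printf("status = %lld, OOC file name: %s\n", (long long)status, value);
}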

mkl_pardiso_pivot
Replaces routine which handles Intel MKL PARDISO
pivots with user-defined routine.

Syntax
void mkl_pardiso_pivot (const void *ai, void *bi, const void *eps);

Include Files
• mkl.h

Description
The mkl_pardiso_pivot routine allows you to handle diagonal elements which arise during numerical
factorization that are zero or near zero. By default, Intel MKL PARDISO determines that a diagonal element
bi is a pivot if bi < eps, and if so, replaces it with eps. But you can provide your own routine to modify the
resulting factorized matrix in case there are small elements on the diagonal during the factorization step.

NOTE
To use this routine, you must set iparm[55] to 1 before the main pardiso loop.

Input Parameters

ai Diagonal element of initial matrix corresponding to pivot element.

bi Diagonal element of factorized matrix that could be chosen as a pivot


element.


eps Scalar to compare with diagonal of factorized matrix. On input equal to


parameter described by iparm[9].

Output Parameters

bi In case element is chosen as a pivot, value with which to replace the pivot.
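
A hedged sketch of a user-supplied replacement is shown below for a real matrix, so the void* arguments are
assumed to point to double. It follows the default behavior described above (replace a pivot that is smaller
than eps) and only adds a counter; remember to set iparm[55] to 1 before the pardiso loop.

#include <math.h>
#include "mkl.h"

static long perturbed_pivots = 0;   /* illustration only; not thread-safe */

void mkl_pardiso_pivot(const void *ai, void *bi, const void *eps)
{
    double       *pivot     = (double *)bi;
    const double *threshold = (const double *)eps;

    (void)ai;                            /* diagonal of the initial matrix, unused here */
    if (fabs(*pivot) < *threshold) {     /* small pivot: replace it, keeping its sign   */
        *pivot = (*pivot >= 0.0) ? *threshold : -*threshold;
        perturbed_pivots++;
    }
}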

pardiso_getdiag
Returns diagonal elements of initial and factorized
matrix.

Syntax
void pardiso_getdiag (const _MKL_DSS_HANDLE_t pt, void *df, void *da, const MKL_INT
*mnum, MKL_INT *error);

Include Files
• mkl.h

Description
This routine returns the diagonal elements of the initial and factorized matrix for a real or Hermitian matrix.

NOTE
In order to use this routine, you must set iparm[55] to 1 before the main pardiso loop.

Input Parameters

pt Array with a size of 64. Handle to internal data structure for the Intel MKL
PARDISO solver. The entries must be set to zero prior to the first call to
pardiso. Unique for factorization.

mnum Indicates the actual matrix for the solution phase of the Intel MKL PARDISO
solver. With this scalar you can define the diagonal elements of the
factorized matrix that you want to obtain. The value must be: 1
≤mnum≤maxfct. In most applications this value is 1.

Output Parameters

df Array with a dimension of n. Contains diagonal elements of the factorized


matrix after factorization.

NOTE
Elements of df correspond to diagonal elements of matrix L computed
during phase 22. Because during phase 22 Intel MKL PARDISO makes
additional permutations to improve stability, it is possible that array df
is not in line with the perm array computed during phase 11.

da Array with a dimension of n. Contains diagonal elements of the initial


matrix.

NOTE
Elements of da correspond to diagonal elements of matrix L computed
during phase 22. Because during phase 22 Intel MKL PARDISO makes
additional permutations to improve stability, it is possible that array da
is not in line with the perm array computed during phase 11.

error The error indicator.

error Information
0 no error

-1 Diagonal information not turned on before pardiso


main loop (iparm[55]=0).
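
A hedged usage sketch for a real matrix (so df and da are treated as double arrays) follows; the helper name
is illustrative, and iparm[55] = 1 is assumed to have been set before the pardiso loop.

#include <stdio.h>
#include <stdlib.h>
#include "mkl.h"

void report_diagonals(void *pt[64], MKL_INT n, MKL_INT mnum)
{
    double *df = (double *)malloc((size_t)n * sizeof(double));  /* factorized diagonal */
    double *da = (double *)malloc((size_t)n * sizeof(double));  /* initial diagonal    */
    MKL_INT error = 0, i;

    pardiso_getdiag(pt, df, da, &mnum, &error);
    if (error == 0)
        for (i = 0; i < n; i++)
            printf("row %lld: initial %g, factorized %g\n",
                   (long long)i, da[i], df[i]);

    free(df);
    free(da);
}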

pardiso_handle_store
Store internal structures from pardiso to a file.

Syntax
void pardiso_handle_store (_MKL_DSS_HANDLE_t pt, const char *dirname, MKL_INT *error);

Include Files
• mkl.h

Description
This function stores Intel MKL PARDISO structures to a file, allowing you to store Intel MKL PARDISO internal
structures between the stages of the pardiso routine. The pardiso_handle_restore routine can restore the
Intel MKL PARDISO internal structures from the file.

Input Parameters

pt Array with a size of 64. Handle to internal data structure.

dirname String containing the name of the directory to which to write the files with
the content of the internal structures. Use an empty string ("") to specify
the current directory. The routine creates a file named handle.pds in the
directory.

Output Parameters

pt Handle to internal data structure.

error The error indicator.

error Information
0 No error.

-2 Not enough memory.

-10 Cannot open file for writing.

-11 Error while writing to file.

-13 Wrong file format.

pardiso_handle_restore
Restore pardiso internal structures from a file.

Syntax
void pardiso_handle_restore (_MKL_DSS_HANDLE_t pt, const char *dirname, MKL_INT
*error);

Include Files
• mkl.h

Description
This function restores Intel MKL PARDISO structures from a file. This allows you to restore Intel MKL
PARDISO internal structures stored by pardiso_handle_store after a phase of the pardiso routine and
continue execution of the next phase.

Input Parameters

dirname String containing the name of the directory in which the file with the
content of the internal structures are located. Use an empty string ("") to
specify the current directory.

Output Parameters

pt Array with a dimension of 64. Handle to internal data structure.

error The error indicator.

error Information
0 No error.

-2 Not enough memory.

-10 Cannot open file for reading.

-11 Error while reading from file.

-13 Wrong file format.

pardiso_handle_delete
Delete files with pardiso internal structure data.

Syntax
void pardiso_handle_delete (const char *dirname, MKL_INT *error);

Include Files
• mkl.h

Description
This function deletes files generated with pardiso_handle_store that contain Intel MKL PARDISO internal
structures.

Input Parameters

dirname String containing the name of the directory in which the file with the
content of the internal structures are located. Use an empty string ("") to
specify the current directory.

Output Parameters

error The error indicator.

error Information
0 No error.

-10 Cannot delete files.
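
The three routines are typically used together; a hedged sketch is shown below. The function name is
illustrative, the empty dirname string selects the current directory as described above, and zeroing the
second handle before restoring into it is an assumption rather than a documented requirement.

#include <string.h>
#include "mkl.h"

void checkpoint_and_restore(void *pt[64])
{
    MKL_INT error = 0;

    pardiso_handle_store(pt, "", &error);      /* writes handle.pds to the current directory */

    /* ... later, possibly in another run of the application ... */
    void *pt2[64];
    memset(pt2, 0, sizeof(pt2));
    pardiso_handle_restore(pt2, "", &error);   /* continue pardiso phases with pt2           */

    pardiso_handle_delete("", &error);         /* remove the stored data files               */
}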

pardiso_handle_store_64
Store internal structures from pardiso_64 to a file.

Syntax
void pardiso_handle_store_64 (_MKL_DSS_HANDLE_t pt, const char *dirname, MKL_INT
*error);

Include Files
• mkl.h

Description
This function stores Intel MKL PARDISO structures to a file, allowing you to store Intel MKL PARDISO internal
structures between the stages of the pardiso_64 routine. The pardiso_handle_restore_64 routine can restore
the Intel MKL PARDISO internal structures from the file.

Input Parameters

pt Array with a dimension of 64. Handle to internal data structure.

dirname String containing the name of the directory to which to write the files with
the content of the internal structures. Use an empty string ("") to specify
the current directory. The routine creates a file named handle.pds in the
directory.

Output Parameters

pt Handle to internal data structure.

error The error indicator.


error Information
0 No error.

-2 Not enough memory.

-10 Cannot open file for writing.

-11 Error while writing to file.

-12 Not supported in 32-bit library - routine is only


supported in 64-bit libraries.

-13 Wrong file format.

pardiso_handle_restore_64
Restore pardiso_64 internal structures from a file.

Syntax
void pardiso_handle_restore_64 (_MKL_DSS_HANDLE_t pt, const char *dirname, MKL_INT
*error);

Include Files
• mkl.h

Description
This function restores Intel MKL PARDISO structures from a file. This allows you to restore Intel MKL
PARDISO internal structures stored by pardiso_handle_store_64 after a phase of the pardiso_64 routine
and continue execution of the next phase.

Input Parameters

dirname String containing the name of the directory in which the file with the
content of the internal structures are located. Use an empty string ("") to
specify the current directory.

Output Parameters

pt Array with a dimension of 64. Handle to internal data structure.

error The error indicator.

error Information
0 No error.

-2 Not enough memory.

-10 Cannot open file for reading.

-11 Error while reading from file.

-13 Wrong file format.

pardiso_handle_delete_64
Delete files with pardiso_64 internal structure data.

Syntax
void pardiso_handle_delete_64 (const char *dirname, MKL_INT *error);

Include Files
• mkl.h

Description
This function deletes files generated with pardiso_handle_store_64 that contain Intel MKL PARDISO
internal structures.

Input Parameters

dirname String containing the name of the directory in which the file with the
content of the internal structures are located. Use an empty string ("") to
specify the current directory.

Output Parameters

error The error indicator.

error Information
0 No error.

-10 Cannot delete files.

-12 Not supported in 32-bit library - routine is only


supported in 64-bit libraries.

Intel MKL PARDISO Parameters in Tabular Form


The following table lists all parameters of Intel MKL PARDISO and gives their brief descriptions.
pt (void*, in/out)
Solver internal data address pointer. Must be initialized with zeros and never be modified later.

maxfct (MKL_INT*, in)
Maximal number of factors in memory (> 0). Generally used value is 1.

mnum (MKL_INT*, in)
The number of the matrix (from 1 to maxfct) to solve. Generally used value is 1.

mtype (MKL_INT*, in)
Matrix type:
1    real and structurally symmetric
2    real and symmetric positive definite
-2   real and symmetric indefinite
3    complex and structurally symmetric
4    complex and Hermitian positive definite
-4   complex and Hermitian indefinite
6    complex and symmetric
11   real and nonsymmetric
13   complex and nonsymmetric

phase (MKL_INT*, in)
Controls the execution of the solver. For iparm[35] > 0, phases 331, 332, and 333 perform a different
decomposition; see the phase parameter of pardiso for details.
11   analysis
12   analysis, numerical factorization
13   analysis, numerical factorization, solve
22   numerical factorization
23   numerical factorization, solve
33   solve, iterative refinement
331  like phase=33, but only forward substitution
332  like phase=33, but only diagonal substitution
333  like phase=33, but only backward substitution
0    release internal memory for L and U of the matrix number mnum
-1   release all internal memory for all matrices

n (MKL_INT*, in)
Number of equations in the sparse linear system A*X = B (n > 0).

a (void*, in) 1)
Contains the non-zero elements of the coefficient matrix A. The size of a is the same as that of ja, and the
coefficient matrix can be either real or complex. The matrix must be stored in the 3-array variation of the
compressed sparse row (CSR3) format with increasing values of ja for each row.

ia[n+1] (MKL_INT*, in)
rowIndex array in CSR3 format (values >= 0). ia[i] gives the index of the element in array a that contains
the first non-zero element from row i of A. The last element ia[n] is taken to be equal to the number of
non-zero elements in A. Note: iparm[34] indicates whether row/column indexing starts from 1 or 0.

ja (MKL_INT*, in)
columns array in CSR3 format (values >= 0). The column indices for each row of A must be sorted in
increasing order. For structurally symmetric matrices zero diagonal elements must be stored in a and ja.
Zero diagonal elements should be stored for symmetric matrices, although they are not required. For
symmetric matrices, the solver needs only the upper triangular part of the system. Note: iparm[34]
indicates whether row/column indexing starts from 1 or 0.

perm[n] (MKL_INT*, in/out)
Holds the permutation vector of size n or specifies elements used for computing a partial solution
(values >= 0). You can apply your own fill-in reducing ordering (iparm[4] = 1) or return the permutation
from the solver (iparm[4] = 2). Let C = P*A*P^T be the permuted matrix. Row (column) i of C is the
perm[i] row (column) of A. The numbering of the array must describe a permutation. To specify elements
for a partial solution, set iparm[4] = 0, iparm[30] > 0, and iparm[35] = 0. To specify elements for a
Schur complement, set iparm[4] = 0, iparm[30] = 0, and iparm[35] > 0. Note: iparm[34] indicates whether
row/column indexing starts from 1 or 0.

nrhs (MKL_INT*, in)
Number of right-hand sides that need to be solved for (>= 0). Generally used value is 1. To obtain better
Intel MKL PARDISO performance, during the numerical factorization phase you can provide the maximum
number of right-hand sides, which can be used further during the solving phase.

iparm[64] (MKL_INT*, in/out)
This array is used to pass various parameters to Intel MKL PARDISO and to return some useful information
after execution of the solver (see pardiso iparm Parameter for more details). If iparm[0]=0, Intel MKL
PARDISO fills iparm[1] through iparm[63] with default values and uses them.

msglvl (MKL_INT*, in)
Message level information:
0    Intel MKL PARDISO generates no output
1    Intel MKL PARDISO prints statistical information

b[n*nrhs] (void*, in/out) 1)
Right-hand side vectors. On entry, contains the right-hand side vector/matrix B, which is placed
contiguously in memory. The b[i+k*n] element must hold the i-th component of the k-th right-hand side
vector. Note that b is only accessed in the solution phase. On output, the array is replaced with the
solution if iparm[5]=1.

x[n*nrhs] (void*, out) 1)
Solution vectors. On output, if iparm[5]=0, contains the solution vector/matrix X, which is placed
contiguously in memory. The x[i+k*n] element must hold the i-th component of the k-th solution vector.
Note that x is only accessed in the solution phase.

error (MKL_INT*, out)
Error indicator:
0    no error
-1   input inconsistent
-2   not enough memory
-3   reordering problem
-4   zero pivot, numerical factorization or iterative refinement problem
-5   unclassified (internal) error
-6   reordering failed (matrix types 11 and 13 only)
-7   diagonal matrix is singular
-8   32-bit integer overflow problem
-9   not enough memory for OOC
-10  problems with opening OOC temporary files
-11  read/write problems with the OOC data file

1) See the description of PARDISO_DATA_TYPE in PARDISO_DATA_TYPE.

pardiso iparm Parameter


The following table describes all individual components of the Intel MKL PARDISO iparm parameter.
Components which are not used must be initialized with 0. Default values are denoted with an asterisk (*).
Component Description

iparm[0] Use default values.


input 0 iparm[1] - iparm[63] are filled with default values.
≠0 You must supply all values in components iparm[1] - iparm[63].

iparm[1] Fill-in reducing ordering for the input matrix.


input
CAUTION
You can control the parallel execution of the solver by explicitly setting the
MKL_NUM_THREADS environment variable. If fewer OpenMP threads are available than
specified, the execution may slow down instead of speeding up. If MKL_NUM_THREADS
is not defined, then the solver uses all available processors.

0 The minimum degree algorithm [Liu85].
2* The nested dissection algorithm from the METIS package [Karypis98].
3 The parallel (OpenMP) version of the nested dissection algorithm. It can decrease the time of
computations on multi-core computers, especially when Intel MKL PARDISO Phase 1 takes
significant time.

NOTE
Setting iparm[1] = 3 prevents the use of CNR mode (iparm[33] > 0)
because Intel MKL PARDISO uses dynamic parallelism.

iparm[2]
Reserved. Set to zero.
iparm[3]
Preconditioned CGS/CG.
input This parameter controls preconditioned CGS [Sonn89] for nonsymmetric or structurally
symmetric matrices and Conjugate-Gradients for symmetric matrices. iparm[3] has the
form iparm[3]= 10*L+K.
K=0
The factorization is always computed as required by phase.
K=1
CGS iteration replaces the computation of LU. The preconditioner is LU that was
computed at a previous step (the first step or last step with a failure) in a sequence
of solutions needed for identical sparsity patterns.
K=2
CGS iteration for symmetric positive definite matrices replaces the computation of
LL^T. The preconditioner is LL^T that was computed at a previous step (the first step
or last step with a failure) in a sequence of solutions needed for identical sparsity
patterns.

The value L controls the stopping criterion of the Krylov subspace iteration:

eps_CGS = 10^(-L) is used in the stopping criterion
||dx_i|| / ||dx_0|| < eps_CGS,
where ||dx_i|| = ||inv(L*U)*r_i|| for K = 1 or ||dx_i|| = ||inv(L*L^T)*r_i|| for K = 2,
and r_i is the residue at iteration i of the preconditioned Krylov subspace iteration.
A maximum number of 150 iterations is fixed with the assumption that the iteration will
converge before consuming half the factorization time. Intermediate convergence rates and
residue excursions are checked and can terminate the iteration process. If phase = 23,
then the factorization for a given A is automatically recomputed in cases where the Krylov
subspace iteration failed, and the corresponding direct solution is returned. Otherwise the
solution from the preconditioned Krylov subspace iteration is returned. Using phase = 33
results in an error message (error=-4) if the stopping criterion for the Krylov subspace
iteration cannot be reached. More information on the failure can be obtained from
iparm[19].
The default is iparm[3]=0, and other values are only recommended for an advanced user.
iparm[3] must be greater than or equal to zero.
Examples:
iparm[3] Description

31 LU-preconditioned CGS iteration with a stopping criterion of 1.0E-3 for


nonsymmetric matrices

61 LU-preconditioned CGS iteration with a stopping criterion of 1.0E-6 for


nonsymmetric matrices

62 LL^T-preconditioned CGS iteration with a stopping criterion of 1.0E-6 for


symmetric positive definite matrices

iparm[4]
User permutation.


input This parameter controls whether user supplied fill-in reducing permutation is used instead
of the integrated multiple-minimum degree or nested dissection algorithms. Another use of
this parameter is to control obtaining the fill-in reducing permutation vector calculated
during the reordering stage of Intel MKL PARDISO.
This option is useful for testing reordering algorithms, adapting the code to special
application problems (for instance, to move zero diagonal elements to the end of P*A*P^T),
or for using the permutation vector more than once for matrices with identical sparsity
structures. For definition of the permutation, see the description of the perm parameter.

CAUTION
You can only set one of iparm[4], iparm[30], and iparm[35], so be sure that the
iparm[30] (partial solution) and the iparm[35] (Schur complement) parameters are
0 if you set iparm[4].

0*
User permutation in the perm array is ignored.
1
Intel MKL PARDISO uses the user supplied fill-in reducing permutation from the
perm array. iparm[1] is ignored.

NOTE
Setting iparm[4] = 1 prevents use of a parallel algorithm for the solve step.

2
Intel MKL PARDISO returns the permutation vector computed at phase 1 in the
perm array.
iparm[5]
Write solution on x.
input
NOTE
The array x is always used.

0*
The array x contains the solution; right-hand side vector b is kept unchanged.
1
The solver stores the solution on the right-hand side b.
iparm[6]
Number of iterative refinement steps performed.
output Reports the number of iterative refinement steps that were actually performed during the
solve step.
iparm[7]
Iterative refinement step.
input On entry to the solve and iterative refinement step, iparm[7] must be set to the
maximum number of iterative refinement steps that the solver performs.
0*
The solver automatically performs two steps of iterative refinement when perturbed
pivots are obtained during the numerical factorization.
>0
Maximum number of iterative refinement steps that the solver performs. The solver
performs not more than the absolute value of iparm[7] steps of iterative
refinement. The solver might stop the process before the maximum number of
steps if
• a satisfactory level of accuracy of the solution in terms of backward error is
achieved,


• or if it determines that the required accuracy cannot be reached. In this case


Intel MKL PARDISO returns -4 in the error parameter.

The number of executed iterations is reported in iparm[6].


<0
Same as above, but the accumulation of the residuum uses extended precision real
and complex data types.
Perturbed pivots result in iterative refinement (independent of iparm[7]=0) and
the number of executed iterations is reported in iparm[6].
iparm[8]
Reserved. Set to zero.
iparm[9]
Pivoting perturbation.
input This parameter instructs Intel MKL PARDISO how to handle small pivots or zero pivots for
nonsymmetric matrices (mtype =11 or mtype =13) and symmetric matrices (mtype =-2,
mtype =-4, or mtype =6). For these matrices the solver uses a complete supernode
pivoting approach. When the factorization algorithm reaches a point where it cannot factor
the supernodes with this pivoting strategy, it uses a pivoting perturbation strategy similar
to [Li99], [Schenk04].
Small pivots are perturbed with eps = 10^(-iparm[9]).
The magnitude of the potential pivot is tested against a constant threshold of
alpha = eps*||A_2||_inf,
where eps = 10^(-iparm[9]), A_2 = P*P_MPS*D_r*A*D_c*P, and ||A_2||_inf is the infinity norm
of the scaled and permuted matrix A. Any tiny pivots encountered during elimination are
set to sign(l_ii)*eps*||A_2||_inf, which trades off some numerical stability for the
ability to keep pivots from getting too small. Small pivots are therefore perturbed with
eps = 10^(-iparm[9]).
13*
The default value for nonsymmetric matrices (mtype = 11, mtype = 13): eps = 10^(-13).
8*
The default value for symmetric indefinite matrices (mtype = -2, mtype = -4, mtype = 6):
eps = 10^(-8).
iparm[10]
Scaling vectors.
input Intel MKL PARDISO uses a maximum weight matching algorithm to permute large elements
on the diagonal and to scale so that the diagonal elements are equal to 1 and the absolute
values of the off-diagonal entries are less than or equal to 1. This scaling method is applied
only to nonsymmetric matrices (mtype = 11 or mtype = 13). The scaling can also be used
for symmetric indefinite matrices (mtype = -2, mtype =-4, or mtype = 6) when the
symmetric weighted matchings are applied (iparm[12] = 1).
Use iparm[10] = 1 (scaling) and iparm[12] = 1 (matching) for highly indefinite
symmetric matrices, for example, from interior point optimizations or saddle point
problems. Note that in the analysis phase (phase=11) you must provide the numerical
values of the matrix A in array a in case of scaling and symmetric weighted matching.
0*
Disable scaling. Default for symmetric indefinite matrices.
1*
Enable scaling. Default for nonsymmetric matrices.
Scale the matrix so that the diagonal elements are equal to 1 and the absolute
values of the off-diagonal entries are less or equal to 1. This scaling method is
applied to nonsymmetric matrices (mtype = 11, mtype = 13). The scaling can also
be used for symmetric indefinite matrices (mtype = -2, mtype = -4, mtype = 6)
when the symmetric weighted matchings are applied (iparm[12] = 1).


Note that in the analysis phase (phase=11) you must provide the numerical values
of the matrix A in case of scaling.
iparm[11]
Solve with transposed or conjugate transposed matrix A.
input
NOTE
For real matrices the terms transposed and conjugate transposed are equivalent.

0*
Solve a linear system A*X = B.
1
Solve a conjugate transposed system A^H*X = B based on the factorization of the
matrix A.
2
Solve a transposed system A^T*X = B based on the factorization of the matrix A.
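For illustration, the following hedged sketch (a fragment, not a complete program) requests a transposed solve at the solution phase; it assumes pt, mtype, n, a, ia, ja, iparm, b, and x were already set up for earlier pardiso calls (phases 11 and 22 completed), as in the pardiso examples shipped with Intel MKL.

/* Hedged sketch: reuse an existing factorization of A to solve A^T*x = b.       */
MKL_INT maxfct = 1, mnum = 1, nrhs = 1, msglvl = 0, error = 0, perm0 = 0;
MKL_INT phase = 33;          /* solve and iterative refinement only              */
iparm[11] = 2;               /* 2 = transposed solve; 1 = conjugate transposed   */
pardiso(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja,
        &perm0, &nrhs, iparm, &msglvl, b, x, &error);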
iparm[12]
Improved accuracy using (non-) symmetric weighted matching.
input Intel MKL PARDISO can use a maximum weighted matching algorithm to permute large
elements close to the diagonal. This strategy adds an additional level of reliability to the
factorization methods and complements the alternative of using more complete pivoting
techniques during the numerical factorization.

0*
Disable matching. Default for symmetric indefinite matrices.
1*
Enable matching. Default for nonsymmetric matrices.
Maximum weighted matching algorithm to permute large elements close to the
diagonal.
It is recommended to use iparm[10] = 1 (scaling) and iparm[12]= 1 (matching)
for highly indefinite symmetric matrices, for example from interior point
optimizations or saddle point problems.
Note that in the analysis phase (phase=11) you must provide the numerical values
of the matrix A in case of symmetric weighted matching.
iparm[13]
Number of perturbed pivots.
output After factorization, contains the number of perturbed pivots for the matrix types: 11, 13,
-2, -4 and -6.
iparm[14]
Peak memory on symbolic factorization.
output The total peak memory in kilobytes that the solver needs during the analysis and symbolic
factorization phase.
This value is only computed in phase 1.
iparm[15]
Permanent memory on symbolic factorization.
output Permanent memory from the analysis and symbolic factorization phase in kilobytes that the
solver needs in the factorization and solve phases.
This value is only computed in phase 1.
iparm[16]
Size of factors/Peak memory on numerical factorization and solution.
output This parameter provides the size in kilobytes of the total memory consumed by in-core
Intel MKL PARDISO for internal floating point arrays. This parameter is computed in phase
1. See iparm[62] for the OOC mode.
The total peak memory consumed by Intel MKL PARDISO is max(iparm[14],
iparm[15]+iparm[16])


iparm[17]
Report the number of non-zero elements in the factors.
input/output <0
Enable reporting if iparm[17] < 0 on entry. The default value is -1.
>=0
Disable reporting.
iparm[18]
Report the number of floating point operations (in 10^6 floating point operations) that are
input/output necessary to factor the matrix A.
<0
Enable report if iparm[18] < 0 on entry. This increases the reordering time.
>=0*
Disable report.
iparm[19]
Report CG/CGS diagnostics.
output >0
CGS succeeded, reports the number of completed iterations.
<0
CG/CGS failed (error=-4 after the solution phase).
If phase= 23, then the factors L and U are recomputed for the matrix A and the
error flag error=0 in case of a successful factorization. If phase = 33, then error
= -4 signals failure.
iparm[19]= - it_cgs*10 - cgs_error.
Possible values of cgs_error:
1 - fluctuations of the residuum are too large
2 - ||dxmax_it_cgs/2|| is too large (slow convergence)
3 - stopping criterion is not reached at max_it_cgs
4 - perturbed pivots caused iterative refinement
5 - factorization is too fast for this matrix. It is better to use the factorization
method with iparm[3] = 0
iparm[20]
Pivoting for symmetric indefinite matrices.
input
NOTE
Use iparm[10] = 1 (scaling) and iparm[12] = 1 (matchings) for highly indefinite
symmetric matrices, for example from interior point optimizations or saddle point
problems.

0
Apply 1x1 diagonal pivoting during the factorization process.
1*
Apply 1x1 and 2x2 Bunch-Kaufman pivoting during the factorization process.
Bunch-Kaufman pivoting is available for matrices of mtype=-2, mtype=-4, or
mtype=6.
2
Apply 1x1 diagonal pivoting during the factorization process. Using this value is the
same as using iparm[20] = 0 except that the solve step does not automatically
make iterative refinements when perturbed pivots are obtained during numerical
factorization. The number of iterations is limited to the number of iterative
refinements specified by iparm[7] (0 by default).
3
Apply 1x1 and 2x2 Bunch-Kaufman pivoting during the factorization process.
Bunch-Kaufman pivoting is available for matrices of mtype=-2, mtype=-4, or
mtype=6. Using this value is the same as using iparm[20] = 1 except that the


solve step does not automatically make iterative refinements when perturbed pivots
are obtained during numerical factorization. The number of iterations is limited to
the number of iterative refinements specified by iparm[7] (0 by default).
iparm[21]
Inertia: number of positive eigenvalues.
output Intel MKL PARDISO reports the number of positive eigenvalues for symmetric indefinite
matrices.
iparm[22]
Inertia: number of negative eigenvalues.
output Intel MKL PARDISO reports the number of negative eigenvalues for symmetric indefinite
matrices.
iparm[23]
Parallel factorization control.
input
NOTE
The two-level factorization algorithm does not improve performance in OOC mode.

0*
Intel MKL PARDISO uses the classic algorithm for factorization.
1
Intel MKL PARDISO uses a two-level factorization algorithm. This algorithm
generally improves scalability in case of parallel factorization on many OpenMP
threads (more than eight).
iparm[24]
Parallel forward/backward solve control.
input 0*
Intel MKL PARDISO uses a parallel algorithm for the solve step.
1
Intel MKL PARDISO uses the sequential forward and backward solve.
This feature is available only for in-core Intel MKL PARDISO (see iparm[59]).
iparm[25]
Reserved. Set to zero.
iparm[26]
Matrix checker.
input 0*
Intel MKL PARDISO does not check the sparse matrix representation for errors.
1
Intel MKL PARDISO checks integer arrays ia and ja. In particular, Intel MKL
PARDISO checks whether column indices are sorted in increasing order within each
row.
iparm[27]
Single or double precision Intel MKL PARDISO.
input See iparm[7] for information on controlling the precision of the refinement steps.

Important
The iparm[27] value is stored in the Intel MKL PARDISO handle between Intel MKL
PARDISO calls, so the precision mode can be changed only during phase 1.

0*
Input arrays (a, x and b) and all internal arrays must be presented in double
precision.
1
Input arrays (a, x and b) must be presented in single precision.
In this case all internal computations are performed in single precision.
iparm[28]
Reserved. Set to zero.
iparm[29]
Number of zero or negative pivots.


output If Intel MKL PARDISO detects zero or negative pivot for mtype=2 or mtype=4 matrix types,
the factorization is stopped, Intel MKL PARDISO returns immediately with an error = -4,
and iparm[29] reports the number of the equation where the first zero or negative pivot is
detected.
iparm[30]
Partial solve and computing selected components of the solution vectors.
input This parameter controls the solve step of Intel MKL PARDISO. It can be used if only a few
components of the solution vectors are needed or if you want to reduce the computation
cost at the solve step by utilizing the sparsity of the right-hand sides. To use this option,
define the input permutation vector perm so that perm(i) = 1 means that either the i-th
component in the right-hand sides is nonzero, or the i-th component in the solution
vectors is computed, or both, depending on the value of iparm[30].
The permutation vector perm must be present in all phases of Intel MKL PARDISO software.
At the reordering step, the software overwrites the input vector perm by a permutation
vector used by the software at the factorization and solver step. If m is the number of
components such that perm(i) = 1, then the last m components of the output vector perm
are a set of the indices i satisfying the condition perm(i) = 1 on input.

NOTE
Turning on this option often increases the time used by Intel MKL PARDISO for
factorization and reordering steps, but it can reduce the time required for the solver
step.

Important
This feature is only available for in-core Intel MKL PARDISO, so to use it you must set
iparm[59] = 0. Set the parameters iparm[7] (iterative refinement steps),
iparm[3] (preconditioned CGS), iparm[4] (user permutation), and iparm[35]
(Schur complement) to 0 as well.

0*
Disables this option.
1
It is assumed that the right-hand sides have only a few non-zero components* and
the input permutation vector perm is defined so that perm(i) = 1 means that the
(i)-th component in the right-hand sides is nonzero. In this case Intel MKL PARDISO
only uses the non-zero components of the right-hand side vectors and computes
only corresponding components in the solution vectors. That means the i-th
component in the solution vectors is only computed if perm(i) = 1.
2
It is assumed that the right-hand sides have only a few non-zero components* and
the input permutation vector perm is defined so that perm(i) = 1 means that the i-
th component in the right-hand sides is nonzero.
Unlike for iparm[30]=1, all components of the solution vector are computed for
this setting and all components of the right-hand sides are used. Because all
components are used, for iparm[30]=2 you must set the i-th component of the
right-hand sides to zero explicitly if perm(i) is not equal to 1.
3
Selected components of the solution vectors are computed. The perm array is not
related to the right-hand sides and it only indicates which components of the
solution vectors should be computed. In this case perm(i) = 1 means that the i-th
component in the solution vectors is computed.
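As a hedged illustration (a fragment; the component indices and the use of mkl_malloc are hypothetical choices), the perm array for iparm[30] = 1 could be prepared as follows before the reordering phase, assuming n and iparm are already defined:

/* Hedged sketch: ask for only components 5 and 17 of the solution vectors        */
/* (one-based component numbering, iparm[34] = 0); n is the matrix order.         */
MKL_INT *perm = (MKL_INT *) mkl_malloc(n * sizeof(MKL_INT), 64);
for (MKL_INT i = 0; i < n; i++) perm[i] = 0;
perm[4]  = 1;     /* component 5 of the RHS is nonzero / wanted in the solution   */
perm[16] = 1;     /* component 17                                                 */
iparm[30] = 1;    /* sparse RHS: compute only the marked solution components      */
iparm[59] = 0;    /* the partial solve feature requires in-core Intel MKL PARDISO */
/* perm must now be passed to every pardiso call (phases 11, 22, and 33).         */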
iparm[31]
Reserved. Set to zero.
-
iparm[32]


iparm[33]
Optimal number of OpenMP threads for conditional numerical reproducibility (CNR) mode.
input Intel MKL PARDISO reads the value of iparm[33] during the analysis phase (phase 1), so
you cannot change it later.
Because Intel MKL PARDISO uses C random number generator facilities during the analysis
phase (phase 1) you must take these precautions to get numerically reproducible results:
• Do not alter the states of the random number generators.
• Do not run multiple instances of Intel MKL PARDISO in parallel in the analysis phase
(phase 1).

NOTE
CNR is only available for the in-core version of Intel MKL PARDISO and the non-
parallel version of the nested dissection algorithm. You must also:
• set iparm[59] to 0 in order to use the in-core version,
• not set iparm[1] to 3 in order to not use the parallel version of the nested
dissection algorithm.
Otherwise Intel MKL PARDISO does not produce numerically repeatable results even if
CNR is enabled for Intel MKL using the functionality described in Support Functions
for CNR.

0*
CNR mode for Intel MKL PARDISO is enabled only if it is enabled for Intel MKL using
the functionality described in Support Functions for CNR and the in-core version is
used. Intel MKL PARDISO determines the optimal number of OpenMP threads
automatically, and produces numerically reproducible results regardless of the
number of threads.
>0
CNR mode is enabled for Intel MKL PARDISO if in-core version is used and the
optimal number of OpenMP threads for Intel MKL PARDISO to rely on is defined by
the value of iparm[33]. You can use iparm[33] to enable CNR mode independent
from other Intel MKL domains. To get the best performance, set iparm[33] to the
actual number of hardware threads dedicated for Intel MKL PARDISO. Setting
iparm[33] to fewer OpenMP threads than the maximum number of them in use
reduces the scalability of the problem being solved. Setting iparm[33] to more
threads than are available can reduce the performance of Intel MKL PARDISO.
iparm[34]
One- or zero-based indexing of columns and rows.
input 0*
One-based indexing: columns and rows indexing in arrays ia, ja, and perm starts
from 1 (Fortran-style indexing).
1
Zero-based indexing: columns and rows indexing in arrays ia, ja, and perm starts
from 0 (C-style indexing).
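For example, a hypothetical 3-by-3 nonsymmetric matrix could be described with zero-based CSR3 arrays as in the following sketch (the values are arbitrary; iparm is assumed to be already declared):

/* Hedged sketch: zero-based (C-style) CSR3 description of a 3x3 matrix.          */
MKL_INT n = 3;
MKL_INT ia[4] = {0, 2, 4, 6};          /* row starts; ia[n] = number of non-zeros */
MKL_INT ja[6] = {0, 1, 0, 1, 1, 2};    /* column indices, increasing in each row  */
double  a[6]  = {2.0, -1.0, -1.0, 2.0, 1.0, 3.0};
iparm[34] = 1;                          /* zero-based indexing of ia, ja, and perm */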
iparm[35]
Schur complement matrix computation control. To calculate this matrix, you must set the
input input permutation vector perm to a set of indices such that when perm(i) = 1, the i-th
element of the initial matrix is an element of the Schur matrix.

CAUTION
You can only set one of iparm[4], iparm[30], and iparm[35], so be sure that the
iparm[4] (user permutation) and the iparm[30] (partial solution) parameters are 0
if you set iparm[35].

0*
Do not compute Schur complement.

1
Compute Schur complement matrix as part of Intel MKL PARDISO factorization step
and return it in the solution vector.

NOTE
This option only computes the Schur complement matrix, and does not
calculate factorization arrays.

2
Compute Schur complement matrix as part of Intel MKL PARDISO factorization step
and return it in the solution vector. Since this option calculates factorization arrays
you can use it to launch partial or full solution of the entire problem after the
factorization step.
iparm[36]
Format for matrix storage.
input 0*
Use CSR format (see Three Array Variation of CSR Format) for matrix storage.
>0
Use BSR format (see Three Array Variation of BSR Format) for matrix storage with
blocks of size iparm[36].

NOTE
Intel MKL does not support BSR format in these cases:
• iparm[10] > 0: Scaling vectors
• iparm[12] > 0: Weighted matching
• iparm[30] > 0: Partial solution
• iparm[35] > 0: Schur complement
• iparm[55] > 0: Pivoting control
• iparm[59] > 0: OOC Intel MKL PARDISO

<0
Convert supplied matrix to variable BSR (VBSR) format (see Sparse Data Storage)
for matrix storage. Intel MKL PARDISO analyzes the matrix provided in CSR3
format and converts it to an internal VBSR format. Set iparm[36] = -t, 0 < t ≤ 100.

NOTE
Intel MKL only supports VBSR format for real and symmetric positive definite
or indefinite matrices (mtype = 2 or mtype = -2). Intel MKL does not support
VBSR format in these cases:
• iparm[10] > 0: Scaling vectors
• iparm[12] > 0: Weighted matching
• iparm[30] > 0: Partial solution
• iparm[35] > 0: Schur complement
• iparm[55] > 0: Pivoting control
• iparm[59] > 0: OOC Intel MKL PARDISO

iparm[37]
Reserved. Set to zero.
-
iparm[54]
iparm[55]
Diagonal and pivoting control.
0*
The internal function used to work with pivots and to calculate diagonal arrays is
turned off.

1
You can use the mkl_pardiso_pivot callback routine to control pivot elements which
appear during numerical factorization. Additionally, you can obtain the diagonal
elements of the initial and factorized matrices after the pardiso factorization step
using the pardiso_getdiag routine. This parameter can be turned on only in the in-
core version of Intel MKL PARDISO.
iparm[56]
Reserved. Set to zero.
-
iparm[58]
iparm[59]
Intel MKL PARDISO mode.
input iparm[59] switches between in-core (IC) and out-of-core (OOC) Intel MKL PARDISO. OOC
can solve very large problems by holding the matrix factors in files on the disk, which
requires a reduced amount of main memory compared to IC.
Unless you are operating in sequential mode, you can switch between IC and OOC modes
after the reordering phase. However, you can get better Intel MKL PARDISO performance
by setting iparm[59] before the reordering phase.

NOTE
The amount of memory used in OOC mode depends on the number of OpenMP
threads.

WARNING
Do not increase the number of OpenMP threads used for Intel MKL PARDISO between
the first call to pardiso and the factorization or solution phase. Because the
minimum amount of memory required for out-of-core execution depends on the
number of OpenMP threads, increasing it after the initial call can cause incorrect
results.

0*
IC mode.
1
IC mode is used if the total amount of RAM (in megabytes) needed for storing the
matrix factors is less than sum of two values of the environment variables:
MKL_PARDISO_OOC_MAX_CORE_SIZE (default value 2000 MB) and
MKL_PARDISO_OOC_MAX_SWAP_SIZE (default value 0 MB); otherwise OOC mode is
used. In this case amount of RAM used by OOC mode cannot exceed the value of
MKL_PARDISO_OOC_MAX_CORE_SIZE.
If the total peak memory needed for storing the local arrays is more than
MKL_PARDISO_OOC_MAX_CORE_SIZE, increase MKL_PARDISO_OOC_MAX_CORE_SIZE
if possible.

NOTE
Conditional numerical reproducibility (CNR) is not supported for this mode.

2
OOC mode.
The OOC mode can solve very large problems by holding the matrix factors in files
on the disk. Hence the amount of RAM required by OOC mode is significantly
reduced compared to IC mode.
If the total peak memory needed for storing the local arrays is more than
MKL_PARDISO_OOC_MAX_CORE_SIZE, increase MKL_PARDISO_OOC_MAX_CORE_SIZE
if possible.


To obtain better Intel MKL PARDISO performance, during the numerical factorization
phase you can provide the maximum number of right-hand sides, which can be
used further during the solving phase.
iparm[60]
Reserved. Set to zero.
-
iparm[61]
iparm[62]
Size of the minimum OOC memory for numerical factorization and solution.
output This parameter provides the size in kilobytes of the minimum memory required by OOC
Intel MKL PARDISO for internal floating point arrays. This parameter is computed in phase
1.
Total peak memory consumption of OOC Intel MKL PARDISO can be estimated as
max(iparm[14], iparm[15] + iparm[62]).
iparm[63]
Reserved. Set to zero.

NOTE
Generally in sparse matrices, components which are equal to zero can be considered non-zero if
necessary. For example, in order to make a matrix structurally symmetric, elements which are zero
can be considered non-zero. See Sparse Matrix Storage Formats for an example.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

PARDISO_DATA_TYPE
The following table lists the values of PARDISO_DATA_TYPE depending on the matrix types and values of the
parameter iparm[27].
Data type value     Matrix type mtype     iparm[27]     Comments
double              1, 2, -2, 11          0             Real matrices, double precision
float               1, 2, -2, 11          1             Real matrices, single precision
MKL_Complex16       3, 6, 13, 4, -4       0             Complex matrices, double precision
MKL_Complex8        3, 6, 13, 4, -4       1             Complex matrices, single precision
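A hedged sketch of the single-precision case follows (a fragment: the array sizes are hypothetical and iparm is assumed to be already declared); the arrays a, b, and x must then all use the float data type.

/* Hedged sketch (fragment): a single-precision real run. iparm[27] is read at    */
/* phase 1 and must not change afterwards; a, b, and x are then float arrays.     */
float a_single[9];               /* matrix values in single precision              */
float b_single[5], x_single[5];  /* right-hand side and solution vectors           */
iparm[27] = 1;                   /* real mtype + iparm[27] = 1 -> float data type  */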

Parallel Direct Sparse Solver for Clusters Interface
The Parallel Direct Sparse Solver for Clusters Interface solves large linear systems of equations with sparse
matrices on clusters. It is

• high performing
• robust
• memory efficient
• easy to use

A hybrid implementation combines Message Passing Interface (MPI) technology for data exchange between
parallel tasks (processes) running on different nodes, and OpenMP* technology for parallelism inside each
node of the cluster. This approach effectively uses modern hardware resources such as clusters consisting of
nodes with multi-core processors. The solver code is optimized for the latest Intel processors, but also
performs well on clusters consisting of non-Intel processors.
Code examples are available in the Intel MKL installation examples directory.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Parallel Direct Sparse Solver for Clusters Interface Algorithm


Parallel Direct Sparse Solver for Clusters Interface solves a set of sparse linear equations
A*X = B
with multiple right-hand sides using a distributed LU, LL^T, LDL^T, or LDL^H factorization, where A is an
n-by-n matrix, and X and B are n-by-nrhs matrices.
The solution comprises four tasks:

• analysis and symbolic factorization;


• numerical factorization;
• forward and backward substitution including iterative refinement;
• termination to release all internal solver memory.

The solver first computes a symmetric fill-in reducing permutation P based on the nested dissection
algorithm from the METIS package [Karypis98] (included with Intel MKL), followed by the Cholesky or other
type of factorization (depending on matrix type) [Schenk00-2] of PAP^T. The solver uses either diagonal
pivoting, or 1x1 and 2x2 Bunch and Kaufman pivoting for symmetric indefinite or Hermitian matrices before
finding an approximation of X by forward and backward substitution and iterative refinement.
The initial matrix A is perturbed whenever numerically acceptable 1x1 and 2x2 pivots cannot be found within
the diagonal blocks. One or two passes of iterative refinement may be required to correct the effect of the
perturbations. This restricted notion of pivoting with iterative refinement is effective for highly indefinite
symmetric systems. For a large set of matrices from different application areas, the accuracy of this method
is comparable to a direct factorization method that uses complete sparse pivoting techniques [Schenk04].
Parallel Direct Sparse Solver for Clusters additionally improves the pivoting accuracy by applying symmetric
weighted matching algorithms. These methods identify large entries in the coefficient matrix A that, if
permuted close to the diagonal, enable the factorization process to identify more acceptable pivots and


proceed with fewer pivot perturbations. The methods are based on maximum weighted matching and
improve the quality of the factor in a complementary way to the alternative idea of using more complete
pivoting techniques.

Parallel Direct Sparse Solver for Clusters Interface Matrix Storage


The sparse data storage in the Parallel Direct Sparse Solver for Clusters Interface follows the scheme
described in the Sparse Matrix Storage Formats section using the variable ja for columns, ia for rowIndex,
and a for values. Column indices ja must be in increasing order per row.
When an input data structure is not accessed in a call, a NULL pointer or any valid address can be passed as
a placeholder for that argument.

Algorithm Parallelization and Data Distribution


Intel® MKL Parallel Direct Sparse Solver for Clusters enables parallel execution of the solution algorithm with
efficient data distribution.
The master MPI process performs the symbolic factorization phase to represent matrix A as a computational
tree. Then matrix A is divided among all MPI processes in a one-dimensional manner. The same distribution
is used for L-factor (the lower triangular matrix in Cholesky decomposition). Matrix A and all required internal
data are broadcast to slave MPI processes. Each MPI process fills in its own parts of L-factor with initial
values of the matrix A.
Parallel Direct Sparse Solver for Clusters Interface computes all independent parts of L-factor completely in
parallel. When a block of the factor must be updated by other blocks, these updates are independently
passed to a temporary array on each updating MPI process. It further gathers the result into an updated
block using the MPI_Reduce() routine. The computations within an MPI process are dynamically divided
among OpenMP threads using pipelining parallelism with a combination of left- and right-looking techniques
similar to those of the PARDISO* software. Level 3 BLAS operations from Intel MKL ensure highly efficient
performance of block-to-block update operations.
During forward/backward substitutions, respective Right Hand Side (RHS) parts are distributed among all
MPI processes. All these processes participate in the computation of the solution. Finally, the solution is
gathered on the master MPI process.
This approach demonstrates good scalability on clusters with Infiniband* technology. Another advantage of
the approach is the effective distribution of L-factor among cluster nodes. This enables the solution of tasks
with a much higher number of non-zero elements than it is possible with any Symmetric Multiprocessing
(SMP) in-core direct solver.
The algorithm ensures that the memory required to keep internal data on each MPI process is decreased
when the number of MPI processes in a run increases. However, the solver requires that matrix A and some
other internal arrays completely fit into the memory of each MPI process.
To get the best performance, run one MPI process per physical node and set the number of OpenMP* threads
per node equal to the number of physical cores on the node.

NOTE
Instead of calling MPI_Init(), initialize MPI with MPI_Init_thread() and set the MPI threading level
to MPI_THREAD_FUNNELED or higher. For details, see the code examples in <install_dir>/
examples.
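A minimal hedged sketch of that start-up sequence (error handling omitted; the MPI calls shown are standard MPI, not part of Intel MKL) might look like this:

#include <mpi.h>

int main(int argc, char **argv)
{
    int provided = 0;
    /* Request at least MPI_THREAD_FUNNELED before any cluster_sparse_solver call. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    /* ... build the matrix and call cluster_sparse_solver here ... */

    MPI_Finalize();
    return 0;
}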

cluster_sparse_solver
Calculates the solution of a set of sparse linear
equations with single or multiple right-hand sides.

Syntax
void cluster_sparse_solver (_MKL_DSS_HANDLE_t pt, const MKL_INT *maxfct, const MKL_INT
*mnum, const MKL_INT *mtype, const MKL_INT *phase, const MKL_INT *n, const void *a,
const MKL_INT *ia, const MKL_INT *ja, MKL_INT *perm, const MKL_INT *nrhs, MKL_INT
*iparm, const MKL_INT *msglvl, void *b, void *x, const int *comm, MKL_INT *error);

Include Files
• mkl_cluster_sparse_solver.h

Description
The routine cluster_sparse_solver calculates the solution of a set of sparse linear equations

A*X = B

with single or multiple right-hand sides, using a parallel LU, LDL, or LL^T factorization, where A is an n-by-n
matrix, and X and B are n-by-nrhs vectors or matrices.

NOTE
This routine supports the Progress Routine feature. See Progress Function section for details.

Input Parameters

NOTE
Most of the input parameters (except for the pt, phase, and comm parameters and, for the distributed
format, the a, ia, and ja arrays) must be set on the master MPI process only, and ignored on other
processes. Other MPI processes get all required data from the master MPI process using the MPI
communicator, comm.

pt Array of size 64.


Handle to internal data structure. The entries must be set to zero before the
first call to cluster_sparse_solver.

CAUTION
After the first call to cluster_sparse_solver do not modify pt, as
that could cause a serious memory leak.

maxfct Ignored; assumed equal to 1.

mnum Ignored; assumed equal to 1.

mtype Defines the matrix type, which influences the pivoting method. The Parallel
Direct Sparse Solver for Clusters solver supports the following matrices:

1 real and structurally symmetric

2 real and symmetric positive definite

-2 real and symmetric indefinite

3 complex and structurally symmetric


4 complex and Hermitian positive definite

-4 complex and Hermitian indefinite

6 complex and symmetric

11 real and nonsymmetric

13 complex and nonsymmetric

phase Controls the execution of the solver. Usually it is a two- or three-digit


integer. The first digit indicates the starting phase of execution and the
second digit indicates the ending phase. Parallel Direct Sparse Solver for
Clusters has the following phases of execution:

• Phase 1: Fill-reduction analysis and symbolic factorization


• Phase 2: Numerical factorization
• Phase 3: Forward and Backward solve including optional iterative
refinement
• Memory release (phase= -1)

If a previous call to the routine has computed information from previous


phases, execution may start at any phase. The phase parameter can have
the following values:

phase Solver Execution Steps


11 Analysis

12 Analysis, numerical factorization

13 Analysis, numerical factorization, solve, iterative


refinement

22 Numerical factorization

23 Numerical factorization, solve, iterative refinement

33 Solve, iterative refinement

-1 Release all internal memory for all matrices

n Number of equations in the sparse linear systems of equations A*X = B.


Constraint: n > 0.

a Array. Contains the non-zero elements of the coefficient matrix A


corresponding to the indices in ja. The coefficient matrix can be either real
or complex. The matrix must be stored in the three-array variant of the
compressed sparse row (CSR3) or in the three-array variant of the block
compressed sparse row (BSR3) format, and the matrix must be stored with
increasing values of ja for each row.

For CSR3 format, the size of a is the same as that of ja. Refer to the
values array description in Three Array Variation of CSR Format for more
details.

For BSR3 format the size of a is the size of ja multiplied by the square of
the block size. Refer to the values array description in Three Array
Variation of BSR Format for more details.

NOTE
For centralized input (iparm[39]=0), provide the a array for the
master MPI process only. For distributed assembled input
(iparm[39]=1 or iparm[39]=2), provide it for all MPI processes.

Important
The column indices of non-zero elements of each row of the matrix A
must be stored in increasing order.

ia For CSR3 format, ia[i] (i<n) points to the first column index of row i in
the array ja. That is, ia[i] gives the index of the element in array a that
contains the first non-zero element from row i of A. The last element ia[n]
is taken to be equal to the number of non-zero elements in A, plus one.
Refer to rowIndex array description in Three Array Variation of CSR Format
for more details.
For BSR3 format, ia[i] (i<n) points to the first column index of row i in
the array ja. That is, ia[i] gives the index of the element in array a that
contains the first non-zero block from row i of A. The last element ia[n] is
taken to be equal to the number of non-zero blocks in A, plus one. Refer to
rowIndex array description in Three Array Variation of BSR Format for more
details.
The array ia is accessed in all phases of the solution process.

Indexing of ia is one-based by default, but it can be changed to zero-based


by setting the appropriate value to the parameter iparm[34]. For zero-
based indexing, the last element ia[n] is assumed to be equal to the
number of non-zero elements in matrix A.

NOTE
For centralized input (iparm[39]=0), provide the ia array at the
master MPI process only. For distributed assembled input
(iparm[39]=1 or iparm[39]=2), provide it at all MPI processes.

ja For CSR3 format, array ja contains column indices of the sparse matrix A.
It is important that the indices are in increasing order per row. For
symmetric matrices, the solver needs only the upper triangular part of the
system as is shown for columns array in Three Array Variation of CSR
Format.
For BSR3 format, array ja contains column indices of the sparse matrix A.
It is important that the indices are in increasing order per row. For
symmetric matrices, the solver needs only the upper triangular part of the
system as is shown for columns array in Three Array Variation of BSR
Format.
The array ja is accessed in all phases of the solution process.


Indexing of ja is one-based by default, but it can be changed to zero-based
by setting the appropriate value to the parameter iparm[34].

NOTE
For centralized input (iparm[39]=0), provide the ja array at the
master MPI process only. For distributed assembled input
(iparm[39]=1 or iparm[39]=2), provide it at all MPI processes.

perm Ignored.

nrhs Number of right-hand sides that need to be solved for.

iparm Array, size 64. This array is used to pass various parameters to Parallel
Direct Sparse Solver for Clusters Interface and to return some useful
information after execution of the solver.
See cluster_sparse_solver iparm Parameter for more details about the
iparm parameters.

msglvl Message level information. If msglvl = 0 then cluster_sparse_solver


generates no output, if msglvl = 1 the solver prints statistical information
to the screen.
Statistics include information such as the number of non-zero elements in
L-factor and the timing for each phase.
Set msglvl = 1 if you report a problem with the solver, since the additional
information provided can facilitate a solution.

b Array, size n*nrhs. On entry, contains the right-hand side vector/matrix B,


which is placed in memory contiguously. The element b[i+k*n] must hold the i-th
component of the k-th right-hand side vector. Note that b is only accessed in
the solution phase.

comm MPI communicator. The solver uses the Fortran MPI communicator
internally. Convert the MPI communicator to Fortran using the
MPI_Comm_c2f() function. See the examples in the <install_dir>/
examples directory.

Output Parameters

pt Handle to internal data structure.

perm Ignored.

iparm On output, some iparm values report information such as the numbers of
non-zero elements in the factors.
See cluster_sparse_solver iparm Parameter for more details about the
iparm parameters.

b On output, the array is replaced with the solution if iparm[5] = 1.

x Array, size (n*nrhs). If iparm[5]=0 it contains solution vector/matrix X,
which is placed contiguously in memory. The x[i+k*n] element must hold
the i-th component of the k-th solution vector. Note that x is only accessed
in the solution phase.

error The error indicator according to the below table:

error Information
0 no error

-1 input inconsistent

-2 not enough memory

-3 reordering problem

-5 unclassified (internal) error
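The following hedged sketch shows one possible minimal call sequence with centralized input on the master process. The 5-by-5 symmetric positive definite system and all settings are hypothetical, and error handling is reduced to a single check; the cl_solver_*.c examples in the Intel MKL installation directory remain the authoritative reference.

#include <mpi.h>
#include <stdio.h>
#include "mkl_cluster_sparse_solver.h"

int main(int argc, char **argv)
{
    int provided, rank;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int comm = (int) MPI_Comm_c2f(MPI_COMM_WORLD);   /* Fortran communicator handle */

    /* Upper triangle of a tridiagonal SPD 5x5 matrix, one-based CSR3.              */
    MKL_INT n = 5;
    MKL_INT ia[6] = {1, 3, 5, 7, 9, 10};
    MKL_INT ja[9] = {1, 2, 2, 3, 3, 4, 4, 5, 5};
    double  a[9]  = {4.0, -1.0, 4.0, -1.0, 4.0, -1.0, 4.0, -1.0, 4.0};
    double  b[5]  = {1.0, 1.0, 1.0, 1.0, 1.0}, x[5];

    void   *pt[64];
    MKL_INT iparm[64], perm = 0;
    MKL_INT maxfct = 1, mnum = 1, mtype = 2, nrhs = 1, msglvl = 1, error = 0, phase;
    for (int i = 0; i < 64; i++) { pt[i] = 0; iparm[i] = 0; }
    /* iparm[0] = 0: fill iparm[1]..iparm[63] with the default values               */
    /* (one-based indexing, centralized input, double precision).                   */

    phase = 13;    /* analysis, numerical factorization, solve, iterative refinement */
    cluster_sparse_solver(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja,
                          &perm, &nrhs, iparm, &msglvl, b, x, &comm, &error);
    if (rank == 0 && error == 0) printf("x[0] = %f\n", x[0]);

    phase = -1;    /* release all internal solver memory */
    cluster_sparse_solver(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja,
                          &perm, &nrhs, iparm, &msglvl, b, x, &comm, &error);
    MPI_Finalize();
    return (int) error;
}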

cluster_sparse_solver_64
Calculates the solution of a set of sparse linear
equations with single or multiple right-hand sides.

Syntax
void cluster_sparse_solver_64 (void *pt, const long long int *maxfct, const long long
int *mnum, const long long int *mtype, const long long int *phase, const long long int
*n, const void *a, const long long int *ia, const long long int *ja, long long int
*perm, const long long int *nrhs, long long int *iparm, const long long int *msglvl,
void *b, void *x, const int *comm, long long int *error);

Include Files
• mkl_cluster_sparse_solver.h

Description
The routine cluster_sparse_solver_64 is an alternative ILP64 (64-bit integer) version of the
cluster_sparse_solver routine (see the Description section for more details). The interface of
cluster_sparse_solver_64 is the same as the interface of cluster_sparse_solver, but it accepts and
returns all integer data as long long int.

Use cluster_sparse_solver_64 instead of cluster_sparse_solver for solving large matrices (with the
number of non-zero elements on the order of 500 million or more). You can use it together with the usual
LP64 interfaces for the rest of Intel MKL functionality. In other words, if you use the 64-bit integer version
(cluster_sparse_solver_64), you do not need to re-link your applications with ILP64 libraries. Take into
account that cluster_sparse_solver_64 may perform slower than the regular cluster_sparse_solver on
the reordering and symbolic factorization phase.

NOTE
cluster_sparse_solver_64 is supported only in the 64-bit libraries. If
cluster_sparse_solver_64 is called from the 32-bit libraries, it returns error =-12.


NOTE
This routine supports the Progress Routine feature. See Progress Function section for details.

Input Parameters
The input parameters of cluster_sparse_solver_64 are the same as the input parameters of
cluster_sparse_solver, but cluster_sparse_solver_64 accepts all integer data as long long int.

Output Parameters
The output parameters of cluster_sparse_solver_64 are the same as the output parameters of
cluster_sparse_solver, but cluster_sparse_solver_64 returns all integer data as long long int.

cluster_sparse_solver iparm Parameter


The following table describes all individual components of the Parallel Direct Sparse Solver for Clusters
Interface iparm parameter. Components which are not used must be initialized with 0. Default values are
denoted with an asterisk (*).
Component Description

iparm[0] Use default values.


input 0 iparm[1] - iparm[63] are filled with default values.
!=0 You must supply all values in components iparm[1] - iparm[63].

iparm[1] Fill-in reducing ordering for the input matrix.


input 2* The nested dissection algorithm from the METIS package [Karypis98].
3 The parallel version of the nested dissection algorithm. It can decrease the time of
computations on multi-core computers, especially when Phase 1 takes significant time.
10 The MPI version of the nested dissection and symbolic factorization algorithms. The input
matrix for the reordering must be distributed among different MPI processes without any
intersection. Use iparm[40] and iparm[41]to set the bounds of the domain. During all of
Phase 1, the entire matrix is not gathered on any one process, which can decrease
computation time (especially when Phase 1 takes significant time) and decrease memory
usage for each MPI process on the cluster.

NOTE
If you set iparm[1] = 10, comm = -1 (MPI communicator), and if there is
one MPI process, optimization and full parallelization with the OpenMP version
of the nested dissection and symbolic factorization algorithms proceeds. This
can decrease computation time on multi-core computers. In this case, set
iparm[40] = 1 and iparm[41] = n for one-based indexing, or to 0 and n -
1, respectively, for zero-based indexing.

iparm[2]
Reserved. Set to zero.
iparm[5]
Write solution on x.
input
NOTE
The array x is always used.

0*
The array x contains the solution; right-hand side vector b is kept unchanged.
1
The solver stores the solution on the right-hand side b.


iparm[6]
Number of iterative refinement steps performed.
output Reports the number of iterative refinement steps that were actually performed during the
solve step.
iparm[7]
Iterative refinement step.
input On entry to the solve and iterative refinement step, iparm[7] must be set to the
maximum number of iterative refinement steps that the solver performs.
0*
The solver automatically performs two steps of iterative refinement when perturbed
pivots are obtained during the numerical factorization.
>0
Maximum number of iterative refinement steps that the solver performs. The solver
performs not more than the absolute value of iparm[7] steps of iterative
refinement. The solver might stop the process before the maximum number of
steps if
• a satisfactory level of accuracy of the solution in terms of backward error is
achieved,
• or if it determines that the required accuracy cannot be reached. In this case
Parallel Direct Sparse Solver for Clusters Interface returns -4 in the error
parameter.
The number of executed iterations is reported in iparm[6].
<0
Same as above, but the accumulation of the residuum uses extended precision real
and complex data types.
Perturbed pivots result in iterative refinement (independent of iparm[7]=0) and
the number of executed iterations is reported in iparm[6].
iparm[8]
Reserved. Set to zero.
iparm[9]
Pivoting perturbation.
input This parameter instructs Parallel Direct Sparse Solver for Clusters Interface how to handle
small pivots or zero pivots for nonsymmetric matrices (mtype =11 or mtype =13) and
symmetric matrices (mtype =-2, mtype =-4, or mtype =6). For these matrices the solver
uses a complete supernode pivoting approach. When the factorization algorithm reaches a
point where it cannot factor the supernodes with this pivoting strategy, it uses a pivoting
perturbation strategy similar to [Li99], [Schenk04].
Small pivots are perturbed with eps = 10^(-iparm[9]).
The magnitude of the potential pivot is tested against a constant threshold of
alpha = eps*||A2||_inf,
where eps = 10^(-iparm[9]), A2 = P*P_MPS*D_r*A*D_c*P, and ||A2||_inf is the infinity
norm of the scaled and permuted matrix A. Any tiny pivots encountered during elimination
are set to sign(l_II)*eps*||A2||_inf, which trades off some numerical stability for the
ability to keep pivots from getting too small.
13*
The default value for nonsymmetric matrices (mtype=11, mtype=13): eps = 10^(-13).
8*
The default value for symmetric indefinite matrices (mtype=-2, mtype=-4, mtype=6):
eps = 10^(-8).
iparm[10]
Scaling vectors.
input Parallel Direct Sparse Solver for Clusters Interface uses a maximum weight matching
algorithm to permute large elements on the diagonal and to scale.


Use iparm[10] = 1 (scaling) and iparm[12] = 1 (matching) for highly indefinite


symmetric matrices, for example, from interior point optimizations or saddle point
problems. Note that in the analysis phase (phase=11) you must provide the numerical
values of the matrix A in array a in case of scaling and symmetric weighted matching.
0*
Disable scaling. Default for symmetric indefinite matrices.
1*
Enable scaling. Default for nonsymmetric matrices.
Scale the matrix so that the diagonal elements are equal to 1 and the absolute
values of the off-diagonal entries are less than or equal to 1. This scaling method is
applied to nonsymmetric matrices (mtype = 11, mtype = 13). The scaling can also
be used for symmetric indefinite matrices (mtype = -2, mtype = -4, mtype = 6)
when the symmetric weighted matchings are applied (iparm[12] = 1).
Note that in the analysis phase (phase=11) you must provide the numerical values
of the matrix A in case of scaling.
iparm[11]
Reserved. Set to zero.
iparm[12]
Improved accuracy using (non-) symmetric weighted matching.
input Parallel Direct Sparse Solver for Clusters Interface can use a maximum weighted matching
algorithm to permute large elements close to the diagonal. This strategy adds an additional
level of reliability to the factorization methods and complements the alternative of using
more complete pivoting techniques during the numerical factorization.

0*
Disable matching. Default for symmetric indefinite matrices.
1*
Enable matching. Default for nonsymmetric matrices.
Maximum weighted matching algorithm to permute large elements close to the
diagonal.
It is recommended to use iparm[10] = 1 (scaling) and iparm[12]= 1 (matching)
for highly indefinite symmetric matrices, for example from interior point
optimizations or saddle point problems.
Note that in the analysis phase (phase=11) you must provide the numerical values
of the matrix A in case of symmetric weighted matching.
iparm[13]
Reserved. Set to zero.
-
iparm[19]
iparm[20]
Pivoting for symmetric indefinite matrices.
input 0
Apply 1x1 diagonal pivoting during the factorization process.
1*
Apply 1x1 and 2x2 Bunch-Kaufman pivoting during the factorization process.
Bunch-Kaufman pivoting is available for matrices of mtype=-2, mtype=-4, or
mtype=6.
iparm[21]
Reserved. Set to zero.
-
iparm[25]
iparm[26]
Matrix checker.
input 0*
Do not check the sparse matrix representation for errors.
1
Check integer arrays ia and ja. In particular, check whether the column indices are
sorted in increasing order within each row.


iparm[27]
Single or double precision Parallel Direct Sparse Solver for Clusters Interface.
input See iparm[7] for information on controlling the precision of the refinement steps.
0*
Input arrays (a, x and b) and all internal arrays must be presented in double
precision.
1
Input arrays (a, x and b) must be presented in single precision.
In this case all internal computations are performed in single precision.
iparm[28]
Reserved. Set to zero.
-
iparm[33]
iparm[34]
One- or zero-based indexing of columns and rows.
input 0*
One-based indexing: columns and rows indexing in arrays ia, ja, and perm starts
from 1 (Fortran-style indexing).
1
Zero-based indexing: columns and rows indexing in arrays ia, ja, and perm starts
from 0 (C-style indexing).
iparm[36]
Format for matrix storage.
input 0*
Use CSR format (see Three Array Variation of CSR Format) for matrix storage.
>0
Use BSR format (see Three Array Variation of BSR Format) for matrix storage with
blocks of size iparm[36].

NOTE
Intel MKL does not support BSR format in these cases:

iparm[10] > 0 Scaling vectors

iparm[12] > 0 Weighted matching

iparm[30] > 0 Partial solution

iparm[35] > 0 Schur complement

iparm[55] > 0 Pivoting control

iparm[59] > 0 OOC Intel MKL PARDISO

iparm[37]
Reserved. Set to zero.
-
iparm[38]
iparm[39]
Matrix input format.
input
NOTE
Performance of the reordering step of the Parallel Direct Sparse Solver for Clusters
Interface is slightly better for assembled format (CSR, iparm[39] = 0) than for
distributed format (DCSR, iparm[39] > 0) for the same matrices, so if the matrix is
assembled on one node do not distribute it before calling cluster_sparse_solver.

0*
Provide the matrix in usual centralized input format: the master MPI process stores
all data from matrix A, with rank=0.

1
Provide the matrix in distributed assembled matrix input format. In this case, each
MPI process stores only a part (or domain) of the matrix A data. Set the bounds of
the domain using iparm[40] and iparm[41]. The solution vector is placed on the
master process.
2
Provide the matrix in distributed assembled matrix input format. In this case, each
MPI process stores only a part (or domain) of the matrix A data. Set the bounds of
the domain using iparm[40] and iparm[41]. The solution vector, A, and RHS
elements are distributed between processes in the same manner.
3
Provide the matrix in distributed assembled matrix input format. In this case, each
MPI process stores only a part (or domain) of the matrix A data. Set the bounds of
the domain using iparm[40] and iparm[41]. The A and RHS elements are
distributed between processes in the same manner, and the solution vector is the same
on each process.
iparm[40]
Beginning of input domain.
input The number of the matrix A row, RHS element, and, for iparm[39]=2, solution vector that
begins the input domain belonging to this MPI process.
Only applicable to the distributed assembled matrix input format (iparm[39]> 0).
See Sparse Matrix Storage Formats for more details.
iparm[41]
End of input domain.
input The number of the matrix A row, RHS element, and, for iparm[39]=2, solution vector that
ends the input domain belonging to this MPI process.
Only applicable to the distributed assembled matrix input format (iparm[39]> 0).
See Sparse Matrix Storage Formats for more details.
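As a hedged sketch (a fragment with hypothetical variable names; one-based indexing is assumed, and n and iparm are assumed to be already defined), the domain bounds for an even row split across the MPI processes could be set like this:

/* Hedged sketch: give each MPI process a contiguous block of rows of A           */
/* for the distributed assembled format (iparm[39] = 1), one-based indexing.      */
int mpi_size, mpi_rank;
MPI_Comm_size(MPI_COMM_WORLD, &mpi_size);
MPI_Comm_rank(MPI_COMM_WORLD, &mpi_rank);
MKL_INT rows_per_rank = (n + mpi_size - 1) / mpi_size;
iparm[39] = 1;                                  /* distributed CSR input           */
iparm[40] = mpi_rank * rows_per_rank + 1;       /* first row of this domain        */
iparm[41] = (mpi_rank + 1) * rows_per_rank;     /* last row of this domain         */
if (iparm[41] > n) iparm[41] = n;
/* Each process then passes only rows iparm[40]..iparm[41] of a, ia, ja, and b.   */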
iparm[42]
Reserved. Set to zero.
-
iparm[63]
input

NOTE
Generally in sparse matrices, components which are equal to zero can be considered non-zero if
necessary. For example, in order to make a matrix structurally symmetric, elements which are zero
can be considered non-zero. See Sparse Matrix Storage Formats for an example.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Direct Sparse Solver (DSS) Interface Routines
Intel MKL supports the DSS interface, an alternative to the Intel MKL PARDISO interface for the direct sparse
solver. The DSS interface implements a group of user-callable routines that are used in the step-by-step
solving process and utilizes the general scheme described in Appendix A Linear Solvers Basics for solving
sparse systems of linear equations. This interface also includes one routine for gathering statistics related to
the solving process.
The DSS interface also supports the out-of-core (OOC) mode.
Table "DSS Interface Routines" lists the names of the routines and describes their general use.
DSS Interface Routines
Routine Description
dss_create
Initializes the solver and creates the basic data structures
necessary for the solver. This routine must be called
before any other DSS routine.
dss_define_structure
Informs the solver of the locations of the non-zero
elements of the matrix.
dss_reorder
Based on the non-zero structure of the matrix, computes
a permutation vector to reduce fill-in during the factoring
process.
dss_factor_real, dss_factor_complex
Computes the LU, LDL^T, or LL^T factorization of a real or
complex matrix.
dss_solve_real, dss_solve_complex
Computes the solution vector for a system of equations
based on the factorization computed in the previous
phase.
dss_delete
Deletes all data structures created during the solving
process.
dss_statistics
Returns statistics about various phases of the solving
process.

To find a single solution vector for a single system of equations with a single right-hand side, invoke the Intel
MKL DSS interface routines in this order:

1. dss_create
2. dss_define_structure
3. dss_reorder
4. dss_factor_real, dss_factor_complex
5. dss_solve_real, dss_solve_complex
6. dss_delete
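A hedged sketch of this six-step sequence follows. It loosely mirrors the dss_*.c examples in the Intel MKL examples directory and uses the value-style calls those examples use (the underlying C prototypes, with pointer arguments, are documented in the individual routine sections that follow). The 5-by-5 symmetric positive definite system is hypothetical.

#include <stdio.h>
#include "mkl_dss.h"

int main(void)
{
    /* Upper triangle of a hypothetical SPD 5x5 matrix, one-based CSR structure.   */
    MKL_INT nRows = 5, nCols = 5, nNonZeros = 9, nRhs = 1;
    MKL_INT rowIndex[6] = {1, 3, 5, 7, 9, 10};
    MKL_INT columns[9]  = {1, 2, 2, 3, 3, 4, 4, 5, 5};
    double  values[9]   = {4.0, -1.0, 4.0, -1.0, 4.0, -1.0, 4.0, -1.0, 4.0};
    double  rhs[5]      = {1.0, 1.0, 1.0, 1.0, 1.0}, solution[5];

    _MKL_DSS_HANDLE_t handle;
    MKL_INT opt  = MKL_DSS_DEFAULTS;
    MKL_INT sym  = MKL_DSS_SYMMETRIC;
    MKL_INT type = MKL_DSS_POSITIVE_DEFINITE;
    MKL_INT error;

    error = dss_create(handle, opt);                                       /* 1 */
    error = dss_define_structure(handle, sym, rowIndex, nRows, nCols,
                                 columns, nNonZeros);                      /* 2 */
    error = dss_reorder(handle, opt, 0);                                   /* 3 */
    error = dss_factor_real(handle, type, values);                         /* 4 */
    error = dss_solve_real(handle, opt, rhs, nRhs, solution);              /* 5 */
    error = dss_delete(handle, opt);                                       /* 6 */
    if (error == MKL_DSS_SUCCESS) printf("solution[0] = %f\n", solution[0]);
    return 0;
}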

However, in certain applications it is necessary to produce solution vectors for multiple right-hand sides for a
given factorization and/or factor several matrices with the same non-zero structure. Consequently, it is
sometimes necessary to invoke the Intel MKL sparse routines in an order other than that listed, which is
possible using the DSS interface. The solving process is conceptually divided into six phases. Figure "Typical
order for invoking DSS interface routines" indicates the typical order in which the DSS interface routines can
be invoked.

Typical order for invoking DSS interface routines


See the code examples that use the DSS interface routines to solve systems of linear equations in the Intel
MKL installation directory (dss_*.c).

• examples/solverc/source

DSS Interface Description


Each DSS routine reads from or writes to a data object called a handle. Refer to Memory Allocation and
Handles to determine the correct method for declaring a handle argument for each language. For simplicity,
the descriptions in DSS routines refer to the data type as MKL_DSS_HANDLE.

Routine Options
The DSS routines have an integer argument (referred below to as opt) for passing various options to the
routines. The permissible values for opt should be specified using only the symbol constants defined in the
language-specific header files (see Implementation Details). The routines accept options for setting the
message and termination levels as described in Table "Symbolic Names for the Message and Termination
Levels Options". Additionally, each routine accepts the option MKL_DSS_DEFAULTS that sets the default
values (as documented) for opt to the routine.
Symbolic Names for the Message and Termination Levels Options
Message Level Termination Level

MKL_DSS_MSG_LVL_SUCCESS MKL_DSS_TERM_LVL_SUCCESS
MKL_DSS_MSG_LVL_INFO MKL_DSS_TERM_LVL_INFO
MKL_DSS_MSG_LVL_WARNING MKL_DSS_TERM_LVL_WARNING
MKL_DSS_MSG_LVL_ERROR MKL_DSS_TERM_LVL_ERROR
MKL_DSS_MSG_LVL_FATAL MKL_DSS_TERM_LVL_FATAL

The settings for message and termination levels can be set on any call to a DSS routine. However, once set
to a particular level, they remain at that level until they are changed in another call to a DSS routine.

You can specify both message and termination level for a DSS routine by adding the options together. For
example, to set the message level to debug and the termination level to error for all the DSS routines, use
the following call:

dss_create( handle, MKL_DSS_MSG_LVL_INFO + MKL_DSS_TERM_LVL_ERROR)

User Data Arrays


Many of the DSS routines take arrays of user data as input. For example, pointers to user arrays are passed
to the routine dss_define_structure to describe the location of the non-zero entries in the matrix.

CAUTION
Do not modify the contents of these arrays after they are passed to one of the solver routines.

DSS Implementation Details


To promote portability across platforms and ease of use across different languages, use the appropriate Intel
MKL DSS header file:
• mkl_dss.h
The header file defines symbolic constants for returned error values, function options, certain defined data
types, and function prototypes.

NOTE
Constants for options, returned error values, and message severities must be referred only by the
symbolic names that are defined in these header files. Use of the Intel MKL DSS software without
including one of the above header files is not supported.

Memory Allocation and Handles


You do not need to allocate any temporary working storage in order to use the Intel MKL DSS routines,
because the solver itself allocates any required storage. To enable multiple users to access the solver
simultaneously, the solver keeps track of the storage allocated for a particular application by using a handle
data object.
Each of the Intel MKL DSS routines creates, uses, or deletes a handle. Consequently, any program calling an
Intel MKL DSS routine must be able to allocate storage for a handle. The exact syntax for allocating storage
for a handle varies from language to language. To standardize the handle declarations, the language-specific
header files declare constants and defined data types that must be used when declaring a handle object in
your code.

• #include "mkl_dss.h"
_MKL_DSS_HANDLE_t handle;

In addition to the definition for the correct declaration of a handle, the include file also defines the following:

• function prototypes for languages that support prototypes


• symbolic constants that are used for the returned error values
• user options for the solver routines
• constants indicating message severity.

DSS Routines


dss_create
Initializes the solver.

Syntax
MKL_INT dss_create(_MKL_DSS_HANDLE_t *handle, MKL_INT const *opt)

Include Files
• mkl.h

Description
The dss_create routine initializes the solver. After the call to dss_create, all subsequent invocations of the
Intel MKL DSS routines must use the value of the handle returned by dss_create.

WARNING
Do not write the value of handle directly.

The default value of the parameter opt is

MKL_DSS_MSG_LVL_WARNING + MKL_DSS_TERM_LVL_ERROR.
By default, the DSS routines use double precision for solving systems of linear equations. The precision used
by the DSS routines can be set to single mode by adding the following value to the opt parameter:

MKL_DSS_SINGLE_PRECISION.
Input data and internal arrays are required to have single precision.
By default, the DSS routines use Fortran style (one-based) indexing for input arrays of integer types (the
first value is referenced as array element 1). To set indexing to C style (the first value is referenced as array
element 0), add the following value to the opt parameter:

MKL_DSS_ZERO_BASED_INDEXING.
The opt parameter can also control the number of refinement steps used during the solution stage by
specifying one of the two following values:
MKL_DSS_REFINEMENT_OFF - the maximum number of refinement steps is set to zero;
MKL_DSS_REFINEMENT_ON (default value) - the maximum number of refinement steps is set to 2.
By default, DSS uses in-core computations. To launch the out-of-core version of DSS (OOC DSS), add one of
two possible values to this parameter: MKL_DSS_OOC_STRONG or MKL_DSS_OOC_VARIABLE.

MKL_DSS_OOC_STRONG - OOC DSS is used.


MKL_DSS_OOC_VARIABLE - if the memory needed for the matrix factors is less than the value of the
environment variable MKL_PARDISO_OOC_MAX_CORE_SIZE, then the OOC DSS uses the in-core kernels of
Intel MKL PARDISO, otherwise it uses the OOC computations.
The variable MKL_PARDISO_OOC_MAX_CORE_SIZE defines the maximum size of RAM allowed for storing work
arrays associated with the matrix factors. It is ignored if MKL_DSS_OOC_STRONG is set. The default value of
MKL_PARDISO_OOC_MAX_CORE_SIZE is 2000 MB. This value and default path and file name for storing
temporary data can be changed using the configuration file pardiso_ooc.cfg or command line (See more
details in the description of the pardiso routine).

WARNING
Other than message and termination level options, do not change the OOC DSS settings after they are
specified in the routine dss_create.
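For illustration, the following sketch (not part of the library; error handling abbreviated) initializes a handle with
several option flags combined by addition. It follows the argument-passing style of the C examples shipped in
examples/solverc/source, where including mkl_dss.h makes this calling form available.

#include "mkl_dss.h"

void init_solver_sketch(void)
{
    _MKL_DSS_HANDLE_t handle;
    /* Combine options by addition: default message and termination levels
       plus zero-based (C-style) indexing for the integer input arrays. */
    MKL_INT create_opt = MKL_DSS_MSG_LVL_WARNING + MKL_DSS_TERM_LVL_ERROR
                       + MKL_DSS_ZERO_BASED_INDEXING;
    MKL_INT delete_opt = MKL_DSS_DEFAULTS;
    MKL_INT error;

    error = dss_create(handle, create_opt);
    if (error != MKL_DSS_SUCCESS) {
        return;                      /* report the error and stop */
    }
    /* ... dss_define_structure, dss_reorder, dss_factor_real, and
       dss_solve_real calls would follow here, reusing the same handle ... */
    dss_delete(handle, delete_opt);
}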

Input Parameters

opt
Parameter to pass the DSS options. The default value is
MKL_DSS_MSG_LVL_WARNING + MKL_DSS_TERM_LVL_ERROR.

Output Parameters

handle
Pointer to the data structure storing internal DSS results
(MKL_DSS_HANDLE).

Return Values
MKL_DSS_SUCCESS
MKL_DSS_INVALID_OPTION
MKL_DSS_OUT_OF_MEMORY
MKL_DSS_MSG_LVL_ERR
MKL_DSS_TERM_LVL_ERR

dss_define_structure
Communicates locations of non-zero elements in the
matrix to the solver.

Syntax
MKL_INT dss_define_structure(_MKL_DSS_HANDLE_t *handle, MKL_INT const *opt, MKL_INT
const *rowIndex, MKL_INT const *nRows, MKL_INT const *nCols, MKL_INT const *columns,
MKL_INT const *nNonZeros);

Include Files
• mkl.h

Description
The routine dss_define_structure communicates the locations of the nNonZeros number of non-zero
elements in a matrix of nRows * nCols size to the solver.

NOTE
The Intel MKL DSS software operates only on square matrices, so nRows must be equal to nCols.

To communicate the locations of non-zero elements in the matrix, do the following:

1. Define the general non-zero structure of the matrix by specifying the value for the options argument
opt. You can set the following values for real matrices:

• MKL_DSS_SYMMETRIC_STRUCTURE
• MKL_DSS_SYMMETRIC
• MKL_DSS_NON_SYMMETRIC

and for complex matrices:

• MKL_DSS_SYMMETRIC_STRUCTURE_COMPLEX
• MKL_DSS_SYMMETRIC_COMPLEX


• MKL_DSS_NON_SYMMETRIC_COMPLEX
The information about the matrix type must be defined in dss_define_structure.
2. Provide the actual locations of the non-zeros by means of the arrays rowIndex and columns (see
Sparse Matrix Storage Format).
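As an illustration, the sketch below (hypothetical data, error handling omitted) defines the structure of a 5-by-5
symmetric matrix whose upper triangle contains 9 non-zero entries, using the default one-based indexing and the
same calling style as the shipped C examples.

#include "mkl_dss.h"

void define_structure_sketch(_MKL_DSS_HANDLE_t handle)
{
    /* Hypothetical 5-by-5 symmetric matrix, upper triangle only, one-based
       indexing: row i holds the diagonal entry and, for i < 5, entry (i,i+1). */
    MKL_INT nRows = 5, nCols = 5, nNonZeros = 9;
    MKL_INT rowIndex[6] = { 1, 3, 5, 7, 9, 10 };
    MKL_INT columns[9]  = { 1, 2, 2, 3, 3, 4, 4, 5, 5 };
    MKL_INT sym = MKL_DSS_SYMMETRIC;
    MKL_INT error;

    /* The handle is assumed to have been created by dss_create beforehand. */
    error = dss_define_structure(handle, sym, rowIndex, nRows, nCols,
                                 columns, nNonZeros);
    if (error != MKL_DSS_SUCCESS) {
        /* report the error */
    }
}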

Input Parameters

opt Parameter to pass the DSS options. The default value for the matrix
structure is MKL_DSS_SYMMETRIC.

rowIndex Array of size nRows+1. Defines the location of non-zero entries in the
matrix.

nRows Number of rows in the matrix.

nCols Number of columns in the matrix; must be equal to nRows.

columns Array of size nNonZeros. Defines the column location of non-zero entries in
the matrix.

nNonZeros Number of non-zero elements in the matrix.

Output Parameters

handle Pointer to the data structure storing internal DSS results


(MKL_DSS_HANDLE).

Return Values
MKL_DSS_SUCCESS
MKL_DSS_STATE_ERR
MKL_DSS_INVALID_OPTION
MKL_DSS_STRUCTURE_ERR
MKL_DSS_ROW_ERR
MKL_DSS_COL_ERR
MKL_DSS_NOT_SQUARE
MKL_DSS_TOO_FEW_VALUES
MKL_DSS_TOO_MANY_VALUES
MKL_DSS_OUT_OF_MEMORY
MKL_DSS_MSG_LVL_ERR
MKL_DSS_TERM_LVL_ERR

dss_reorder
Computes or sets a permutation vector that minimizes
the fill-in during the factorization phase.

Syntax
MKL_INT dss_reorder(_MKL_DSS_HANDLE_t *handle, MKL_INT const *opt, MKL_INT const *perm)

Include Files
• mkl.h

Description
If opt contains the option MKL_DSS_AUTO_ORDER, then the routine dss_reorder computes a permutation
vector that minimizes the fill-in during the factorization phase. For this option, the routine ignores the
contents of the perm array.

If opt contains the option MKL_DSS_METIS_OPENMP_ORDER, then the routine dss_reorder computes
permutation vector using the parallel nested dissections algorithm to minimize the fill-in during the
factorization phase. This option can be used to decrease the time of dss_reorder call on multi-core
computers. For this option, the routine ignores the contents of the perm array.

If opt contains the option MKL_DSS_MY_ORDER, then you must supply a permutation vector in the array
perm. In this case, the array perm is of length nRows, where nRows is the number of rows in the matrix as
defined by the previous call to dss_define_structure.

If opt contains the option MKL_DSS_GET_ORDER, then the permutation vector computed during the
dss_reorder call is copied to the array perm. In this case you must allocate the array perm beforehand.
The permutation vector is computed in the same way as if the option MKL_DSS_AUTO_ORDER is set.
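For example, the following sketch (continuing the hypothetical 5-by-5 example above; error handling omitted) lets
the solver compute the permutation and copies it back by specifying MKL_DSS_GET_ORDER.

#include "mkl_dss.h"

void reorder_sketch(_MKL_DSS_HANDLE_t handle)
{
    /* nRows = 5 as defined in the previous dss_define_structure call. */
    MKL_INT perm[5];
    MKL_INT opt = MKL_DSS_GET_ORDER;    /* compute the ordering, copy it to perm */
    MKL_INT error = dss_reorder(handle, opt, perm);
    if (error != MKL_DSS_SUCCESS) {
        /* report the error */
    }
    /* With the default MKL_DSS_AUTO_ORDER the perm argument is ignored and
       may be passed as 0, as in the shipped examples. */
}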

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Input Parameters

opt Parameter to pass the DSS options. The default value for the permutation
type is MKL_DSS_AUTO_ORDER.

perm Array of length nRows. Contains a user-defined permutation vector


(accessed only if opt contains MKL_DSS_MY_ORDER or
MKL_DSS_GET_ORDER).

Output Parameters

handle Pointer to the data structure storing internal DSS results


(MKL_DSS_HANDLE).

Return Values
MKL_DSS_SUCCESS
MKL_DSS_STATE_ERR
MKL_DSS_INVALID_OPTION
MKL_DSS_REORDER_ERR
MKL_DSS_REORDER1_ERR


MKL_DSS_I32BIT_ERR
MKL_DSS_FAILURE
MKL_DSS_OUT_OF_MEMORY
MKL_DSS_MSG_LVL_ERR
MKL_DSS_TERM_LVL_ERR

dss_factor_real, dss_factor_complex
Compute factorization of the matrix with previously
specified location of non-zero elements.

Syntax
MKL_INT dss_factor_real(_MKL_DSS_HANDLE_t *handle, MKL_INT const *opt, void const
*rValues)
MKL_INT dss_factor_complex(_MKL_DSS_HANDLE_t *handle, MKL_INT const *opt, void const
*cValues)

Include Files
• mkl.h

Description
These routines compute the factorization of the matrix whose non-zero locations were previously specified by a
call to dss_define_structure and whose non-zero values are given in the array rValues or cValues.
These arrays must be of length nNonZeros as defined in a previous call to
dss_define_structure.

NOTE
The data type (single or double precision) of rValues and cValues must correspond to the precision
specified by the parameter opt in the routine dss_create.

The opt argument can contain one of the following options:

• MKL_DSS_POSITIVE_DEFINITE
• MKL_DSS_INDEFINITE
• MKL_DSS_HERMITIAN_POSITIVE_DEFINITE
• MKL_DSS_HERMITIAN_INDEFINITE

depending on your matrix's type.

NOTE
This routine supports the Progress Routine feature. See Progress Function section for details.
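As an illustration, the sketch below (continuing the hypothetical symmetric example; error handling omitted)
factors the matrix as real symmetric indefinite.

#include "mkl_dss.h"

void factor_sketch(_MKL_DSS_HANDLE_t handle)
{
    /* Non-zero values in the same order as the columns array passed to
       dss_define_structure (9 entries of the hypothetical 5-by-5 matrix). */
    double values[9] = { 4.0, -1.0, 4.0, -1.0, 4.0, -1.0, 4.0, -1.0, 4.0 };
    MKL_INT type = MKL_DSS_INDEFINITE;   /* or MKL_DSS_POSITIVE_DEFINITE */
    MKL_INT error = dss_factor_real(handle, type, values);
    if (error != MKL_DSS_SUCCESS) {
        /* e.g. MKL_DSS_ZERO_PIVOT or MKL_DSS_OUT_OF_MEMORY */
    }
}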

Input Parameters

handle Pointer to the data structure storing internal DSS results


(MKL_DSS_HANDLE).

opt Parameter to pass the DSS options. The default value is


MKL_DSS_POSITIVE_DEFINITE.

rValues Array of elements of the matrix A. Real data, single or double precision as it
is specified by the parameter opt in the routine dss_create.

cValues Array of elements of the matrix A. Complex data, single or double precision
as it is specified by the parameter opt in the routine dss_create.

Return Values
MKL_DSS_SUCCESS
MKL_DSS_STATE_ERR
MKL_DSS_INVALID_OPTION
MKL_DSS_OPTION_CONFLICT
MKL_DSS_VALUES_ERR
MKL_DSS_OUT_OF_MEMORY
MKL_DSS_ZERO_PIVOT
MKL_DSS_FAILURE
MKL_DSS_MSG_LVL_ERR
MKL_DSS_TERM_LVL_ERR
MKL_DSS_OOC_MEM_ERR
MKL_DSS_OOC_OC_ERR
MKL_DSS_OOC_RW_ERR

dss_solve_real, dss_solve_complex
Compute the corresponding solution vector and place
it in the output array.

Syntax
MKL_INT dss_solve_real(_MKL_DSS_HANDLE_t *handle, MKL_INT const *opt, void const
*rRhsValues, MKL_INT const *nRhs, void *rSolValues)
MKL_INT dss_solve_complex(_MKL_DSS_HANDLE_t *handle, MKL_INT const *opt, void const
*cRhsValues, MKL_INT const *nRhs, void *cSolValues)

Include Files
• mkl.h

Description
For each right-hand side column vector defined in the arrays rRhsValues, cRhsValues, or RhsValues,
these routines compute the corresponding solution vector and place it in the arrays rSolValues,
cSolValues, or SolValues respectively.

NOTE
The data type (single or double precision) of all arrays must be in correspondence with precision
specified by the parameter opt in the routine dss_create.

The lengths of the right-hand side and solution vectors, nRows and nCols respectively, must be defined in a
previous call to dss_define_structure.


By default, both routines perform the full solution step (it corresponds to phase = 33 in Intel MKL
PARDISO). The parameter opt enables you to calculate the final solution step-by-step, calling forward and
backward substitutions.
If it is set to MKL_DSS_FORWARD_SOLVE, the forward substitution (corresponding to phase = 331 in Intel
MKL PARDISO) is performed;
if it is set to MKL_DSS_DIAGONAL_SOLVE, the diagonal substitution (corresponding to phase = 332 in Intel
MKL PARDISO) is performed, if possible;
if it is set to MKL_DSS_BACKWARD_SOLVE, the backward substitution (corresponding to phase = 333 in Intel
MKL PARDISO) is performed.
For more details about using these substitutions for different types of matrices, see Separate Forward and
Backward Substitution in the Intel MKL PARDISO solver description.
This parameter can also control the number of refinement steps used during the solution stage: if it is set
to MKL_DSS_REFINEMENT_OFF, the maximum number of refinement steps is equal to zero, and if it is set to
MKL_DSS_REFINEMENT_ON (default value), the maximum number of refinement steps is equal to 2.
The MKL_DSS_CONJUGATE_SOLVE option added to the parameter opt enables solving a conjugate transposed
system A^H*X = B based on the factorization of the matrix A. This option is equivalent to the parameter
iparm[11] = 1 in Intel MKL PARDISO.
The MKL_DSS_TRANSPOSE_SOLVE option added to the parameter opt enables solving a transposed system
A^T*X = B based on the factorization of the matrix A. This option is equivalent to the parameter
iparm[11] = 2 in Intel MKL PARDISO.

Input Parameters

handle Pointer to the data structure storing internal DSS results


(MKL_DSS_HANDLE).

opt Parameter to pass the DSS options.

nRhs Number of the right-hand sides in the system of linear equations.

rRhsValues Array of size nRows * nRhs. Contains real right-hand side vectors. Real
data, single or double precision as it is specified by the parameter opt in
the routine dss_create.

cRhsValues Array of size nRows * nRhs. Contains complex right-hand side vectors.
Complex data, single or double precision as it is specified by the parameter
opt in the routine dss_create.

RhsValues Array of size nRows * nRhs. Contains right-hand side vectors. Real or
complex data, single or double precision as it is specified by the parameter
opt in the routine dss_create.

Output Parameters

rSolValues Array of size nCols * nRhs. Contains real solution vectors. Real data,
single or double precision as it is specified by the parameter opt in the
routine dss_create.

cSolValues Array of size nCols * nRhs. Contains complex solution vectors. Complex
data, single or double precision as it is specified by the parameter opt in
the routine dss_create.

Return Values
MKL_DSS_SUCCESS
MKL_DSS_STATE_ERR
MKL_DSS_INVALID_OPTION
MKL_DSS_OUT_OF_MEMORY
MKL_DSS_DIAG_ERR
MKL_DSS_FAILURE
MKL_DSS_MSG_LVL_ERR
MKL_DSS_TERM_LVL_ERR
MKL_DSS_OOC_MEM_ERR
MKL_DSS_OOC_OC_ERR
MKL_DSS_OOC_RW_ERR

dss_delete
Deletes all of the data structures created during the
solutions process.

Syntax
MKL_INT dss_delete(_MKL_DSS_HANDLE_t const *handle, MKL_INT const *opt)

Include Files
• mkl.h

Description
The routine dss_delete deletes all data structures created during the solving process.
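Putting the pieces together, the following sketch (hypothetical data; abbreviated error handling) shows the complete
calling sequence for a small symmetric positive definite system, in the style of the dss examples shipped in
examples/solverc/source.

#include <stdio.h>
#include "mkl_dss.h"

int dss_workflow_sketch(void)
{
    /* Hypothetical 5-by-5 symmetric positive definite matrix,
       upper triangle, one-based indexing. */
    MKL_INT nRows = 5, nCols = 5, nNonZeros = 9, nRhs = 1;
    MKL_INT rowIndex[6] = { 1, 3, 5, 7, 9, 10 };
    MKL_INT columns[9]  = { 1, 2, 2, 3, 3, 4, 4, 5, 5 };
    double  values[9]   = { 4.0, -1.0, 4.0, -1.0, 4.0, -1.0, 4.0, -1.0, 4.0 };
    double  rhs[5]      = { 1.0, 1.0, 1.0, 1.0, 1.0 };
    double  sol[5];
    _MKL_DSS_HANDLE_t handle;
    MKL_INT opt  = MKL_DSS_DEFAULTS;
    MKL_INT sym  = MKL_DSS_SYMMETRIC;
    MKL_INT type = MKL_DSS_POSITIVE_DEFINITE;
    MKL_INT error;

    error = dss_create(handle, opt);
    if (error != MKL_DSS_SUCCESS) goto failed;
    error = dss_define_structure(handle, sym, rowIndex, nRows, nCols,
                                 columns, nNonZeros);
    if (error != MKL_DSS_SUCCESS) goto failed;
    error = dss_reorder(handle, opt, 0);
    if (error != MKL_DSS_SUCCESS) goto failed;
    error = dss_factor_real(handle, type, values);
    if (error != MKL_DSS_SUCCESS) goto failed;
    error = dss_solve_real(handle, opt, rhs, nRhs, sol);
    if (error != MKL_DSS_SUCCESS) goto failed;
    dss_delete(handle, opt);
    printf("First solution component: %g\n", sol[0]);
    return 0;

failed:
    printf("DSS routine returned error %lli\n", (long long)error);
    return 1;
}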

Input Parameters

opt
Parameter to pass the DSS options. The default value is
MKL_DSS_MSG_LVL_WARNING + MKL_DSS_TERM_LVL_ERROR.

Output Parameters

handle
Pointer to the data structure storing internal DSS results
(MKL_DSS_HANDLE).

Return Values
MKL_DSS_SUCCESS
MKL_DSS_STATE_ERR
MKL_DSS_INVALID_OPTION
MKL_DSS_OUT_OF_MEMORY
MKL_DSS_MSG_LVL_ERR
MKL_DSS_TERM_LVL_ERR


dss_statistics
Returns statistics about various phases of the solving
process.

Syntax
MKL_INT dss_statistics(_MKL_DSS_HANDLE_t *handle, MKL_INT const *opt, _CHARACTER_STR_t
const *statArr, _DOUBLE_PRECISION_t *retValues)

Include Files
• mkl.h

Description
The dss_statistics routine returns statistics about various phases of the solving process. This routine
gathers the following statistics:

– time taken to do reordering,


– time taken to do factorization,
– duration of problem solving,
– determinant of the symmetric indefinite input matrix,
– inertia of the symmetric indefinite input matrix,
– number of floating point operations taken during factorization,
– total peak memory needed during the analysis and symbolic factorization,
– permanent memory needed from the analysis and symbolic factorization,
– memory consumption for the factorization and solve phases.

Statistics are returned in accordance with the input string specified by the parameter statArr. The value of
the statistics is returned in double precision in a return array, which you must allocate beforehand.
For multiple statistics, multiple string constants separated by commas can be used as input. Return values
are put into the return array in the same order as specified in the input string.
Statistics can only be requested at the appropriate stages of the solving process. For example, requesting
FactorTime before a matrix is factored leads to an error.
The following table shows the point at which each individual statistics item can be requested:
Statistics Calling Sequences
Type of Statistics When to Call

ReorderTime After dss_reorder is completed successfully.


FactorTime After dss_factor_real or dss_factor_complex is completed successfully.
SolveTime After dss_solve_real or dss_solve_complex is completed successfully.
Determinant After dss_factor_real or dss_factor_complex is completed successfully.
Inertia After dss_factor_real is completed successfully and the matrix is real, symmetric, and
indefinite.
Flops After dss_factor_real or dss_factor_complex is completed successfully.
Peakmem After dss_reorder is completed successfully.
Factormem After dss_reorder is completed successfully.
Solvemem After dss_factor_real or dss_factor_complex is completed successfully.

Input Parameters

handle Pointer to the data structure storing internal DSS results


(MKL_DSS_HANDLE).

opt Parameter to pass the DSS options.

statArr Input string that defines the type of the returned statistics. The parameter
can include one or more of the following string constants (case of the input
string has no effect):

ReorderTime          Amount of time taken to do the reordering.

FactorTime           Amount of time taken to do the factorization.

SolveTime            Amount of time taken to solve the problem after
                     factorization.

Determinant          Determinant of the matrix A.

                     For real matrices: the determinant is returned as
                     det_pow, det_base in two consecutive return array
                     locations, where 1.0 ≤ abs(det_base) < 10.0 and
                     determinant = det_base*10^(det_pow).

                     For complex matrices: the determinant is returned as
                     det_pow, det_re, det_im in three consecutive return
                     array locations, where 1.0 ≤ abs(det_re) + abs(det_im) < 10.0
                     and determinant = (det_re, det_im)*10^(det_pow).

Inertia              Inertia of a real symmetric matrix is defined as a
                     triplet of nonnegative integers (p,n,z), where p is
                     the number of positive eigenvalues, n is the number
                     of negative eigenvalues, and z is the number of zero
                     eigenvalues.

                     Inertia is returned as three consecutive return
                     array locations p, n, z.

                     Computing inertia can lead to incorrect results for
                     matrices with a cluster of eigenvalues which are
                     near 0.

                     Inertia of a k-by-k real symmetric positive definite
                     matrix is always (k, 0, 0). Therefore Inertia is
                     returned only in cases of real symmetric indefinite
                     matrices. For all other matrix types, an error
                     message is returned.

Flops                Number of floating point operations performed
                     during the factorization.

Peakmem              Total peak memory in kilobytes that the solver
                     needs during the analysis and symbolic factorization
                     phase.

Factormem            Permanent memory in kilobytes that the solver
                     needs from the analysis and symbolic factorization
                     phase in the factorization and solve phases.

Solvemem             Total double precision memory consumption
                     (kilobytes) of the solver for the factorization and
                     solve phases.

Output Parameters

retValues Value of the statistics returned.

Finding 'time used to reorder' and 'inertia' of a matrix


The example below illustrates the use of the dss_statistics routine.

To find the above values, call dss_statistics(handle, opt, statArr, retValue), where statArr is
"ReorderTime,Inertia".

In this example, retValue has the following values:

retValue[0] Time to reorder.

retValue[1] Positive Eigenvalues.

retValue[2] Negative Eigenvalues.

retValue[3] Zero Eigenvalues.
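A corresponding call might look like the following sketch (error handling omitted). It assumes the matrix was
factored as real symmetric indefinite, so that both ReorderTime and Inertia may be requested; the four values land
in retValue in the order shown above.

#include "mkl_dss.h"

void statistics_sketch(_MKL_DSS_HANDLE_t handle)
{
    char    statIn[]   = "ReorderTime,Inertia";  /* case does not matter */
    double  retValue[4];                         /* 1 time value + 3 inertia values */
    MKL_INT opt   = MKL_DSS_DEFAULTS;
    MKL_INT error = dss_statistics(handle, opt, statIn, retValue);
    if (error == MKL_DSS_SUCCESS) {
        /* retValue[0] = reordering time,
           retValue[1], retValue[2], retValue[3] = inertia (p, n, z) */
    }
}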

Return Values
MKL_DSS_SUCCESS
MKL_DSS_INVALID_OPTION
MKL_DSS_STATISTICS_INVALID_MATRIX
MKL_DSS_STATISTICS_INVALID_STATE
MKL_DSS_STATISTICS_INVALID_STRING
MKL_DSS_MSG_LVL_ERR
MKL_DSS_TERM_LVL_ERR

Iterative Sparse Solvers based on Reverse Communication Interface (RCI ISS)
Intel MKL supports iterative sparse solvers (ISS) based on the reverse communication interface (RCI),
referred to here as the RCI ISS interface. The RCI ISS interface implements a group of user-callable routines
that are used in the step-by-step solving process of a symmetric positive definite system (RCI conjugate
gradient solver, or RCI CG), and of a non-symmetric indefinite (non-degenerate) system (RCI flexible
generalized minimal residual solver, or RCI FGMRES) of linear algebraic equations. This interface uses the
general RCI scheme described in [Dong95].
See the Appendix A Linear Solvers Basics for discussion of terms and concepts related to the ISS routines.
The term RCI indicates that when the solver needs the results of certain operations (for example, matrix-
vector multiplications), the user performs them and passes the result to the solver. This makes the solver
more universal as it is independent of the specific implementation of the operations like the matrix-vector
multiplication. To perform such operations, the user can use the built-in sparse matrix-vector multiplications
and triangular solvers routines described in Sparse BLAS Level 2 and Level 3 Routines.

NOTE
The RCI CG solver is implemented in two versions: for system of equations with a single right-hand
side, and for systems of equations with multiple right-hand sides.
The CG method may fail to compute the solution, or may compute a wrong solution, if the matrix of the
system is not symmetric positive definite.
The FGMRES method may fail if the matrix is degenerate.

Table "RCI CG Interface Routines" lists the names of the routines, and describes their general use.
RCI ISS Interface Routines
Routine Description
Initializes the solver.
dcg_init, dcgmrhs_init, dfgmres_init
Checks the consistency and correctness of the user defined data.
dcg_check, dcgmrhs_check, dfgmres_check
Computes the approximate solution vector.
dcg, dcgmrhs, dfgmres
Retrieves the number of the current iteration.
dcg_get, dcgmrhs_get, dfgmres_get

The Intel MKL RCI ISS interface routines are normally invoked in this order:

1. <system_type>_init
2. <system_type>_check
3. <system_type>
4. <system_type>_get

Advanced users can change that order if they need it. Others should follow the above order of calls.
The following diagram indicates the typical order in which the RCI ISS interface routines are invoked.

Typical Order for Invoking RCI ISS interface Routines

See the code examples that use the RCI ISS interface routines to solve systems of linear equations in the
Intel MKL installation directory.

• examples/solverc/source


CG Interface Description
Each routine for the RCI CG solver is implemented in two versions: for a system of equations with a single
right-hand side (SRHS), and for a system of equations with multiple right-hand sides (MRHS). The names of
routines for a system with MRHS contain the suffix mrhs.

Routine Options
All of the RCI CG routines have common parameters for passing various options to the routines (see CG
Common Parameters). The values for these parameters can be changed during computations.

User Data Arrays


Many of the RCI CG routines take arrays of user data as input. For example, user arrays are passed to the
routine dcg to compute the solution of a system of linear algebraic equations. The Intel MKL RCI CG routines
do not make copies of the user input arrays to minimize storage requirements and improve overall run-time
efficiency.

CG Common Parameters

NOTE
The default and initial values listed below are assigned to the parameters by calling the dcg_init/
dcgmrhs_init routine.

n MKL_INT, this parameter sets the size of the problem in the dcg_init/
dcgmrhs_init routine. All the other routines use the ipar[0] parameter
instead. Note that the coefficient matrix A is a square matrix of size n*n.

x double array of size n for SRHS, or matrix of size (n*nrhs) for MRHS. This
parameter contains the current approximation to the solution. Before the
first call to the dcg/dcgmrhs routine, it contains the initial approximation to
the solution.

nrhs MKL_INT, this parameter sets the number of right-hand sides for MRHS
routines.

b double array containing a single right-hand side vector, or matrix of size


n*nrhs containing right-hand side vectors.

RCI_request MKL_INT, this parameter gives information about the result of work of the
RCI CG routines. Negative values of the parameter indicate that the routine
completed with errors or warnings. The 0 value indicates successful
completion of the task. Positive values mean that you must perform specific
actions:

RCI_request= 1           multiply the matrix by tmp[0:n - 1], put the
                         result in tmp[n:2*n - 1], and return the control to
                         the dcg/dcgmrhs routine;

RCI_request= 2 to perform the stopping tests. If they fail, return the


control to the dcg/dcgmrhs routine. If the stopping
tests succeed, it indicates that the solution is found
and stored in the x array;

RCI_request= 3 for SRHS: apply the preconditioner to tmp[2*n:3*n
- 1], put the result in tmp[3*n:4*n - 1], and
return the control to the dcg routine;

for MRHS: apply the preconditioner to


tmp[2+ipar[2]*n:(3 + ipar[2])*n - 1], put
the result in tmp[3*n:4*n - 1], and return the
control to the dcgmrhs routine.

Note that the dcg_get/dcgmrhs_get routine does not change the


parameter RCI_request. This enables use of this routine inside the reverse
communication computations.

ipar MKL_INT array, of size 128 for SRHS, and of size (128+2*nrhs) for MRHS.
This parameter specifies the integer set of data for the RCI CG
computations:

ipar[0] specifies the size of the problem. The dcg_init/


dcgmrhs_init routine assigns ipar[0]=n. All the
other routines use this parameter instead of n.
There is no default value for this parameter.

ipar[1] specifies the type of output for error and warning


messages generated by the RCI CG routines. The
default value 6 means that all messages are
displayed on the screen. Otherwise, the error and
warning messages are written to the newly created
files dcg_errors.txt and
dcg_check_warnings.txt, respectively. Note that
if ipar[5] and ipar[6] parameters are set to 0,
error and warning messages are not generated at
all.

ipar[2] for SRHS: contains the current stage of the RCI CG


computations. The initial value is 1;
for MRHS: contains the number of the right-hand
side for which the calculations are currently
performed.

WARNING
Avoid altering this variable during computations.

ipar[3] contains the current iteration number. The initial


value is 0.

ipar[4] specifies the maximum number of iterations. The


default value is min(150, n).

ipar[5] if the value is not equal to 0, the routines output


error messages in accordance with the parameter
ipar[1]. Otherwise, the routines do not output
error messages at all, but return a negative value of
the parameter RCI_request. The default value is 1.


ipar[6] if the value is not equal to 0, the routines output


warning messages in accordance with the parameter
ipar[1]. Otherwise, the routines do not output
warning messages at all, but they return a negative
value of the parameter RCI_request. The default
value is 1.

ipar[7] if the value is not equal to 0, the dcg/dcgmrhs


routine performs the stopping test for the maximum
number of iterations: ipar[3]≤ipar[4]. Otherwise,
the method is stopped and the corresponding value
is assigned to the RCI_request. If the value is 0,
the routine does not perform this stopping test. The
default value is 1.

ipar[8]                  if the value is not equal to 0, the dcg/dcgmrhs
                         routine performs the residual stopping test:
                         dpar[4]≤dpar[3] = dpar[0]*dpar[2] + dpar[1].
                         Otherwise, the method is stopped and the
                         corresponding value is assigned to the
                         RCI_request. If the value is 0, the routine does not
                         perform this stopping test. The default value is 0.

ipar[9] if the value is not equal to 0, the dcg/dcgmrhs


routine requests a user-defined stopping test by
setting the output parameter RCI_request=2. If
the value is 0, the routine does not perform the user
defined stopping test. The default value is 1.

NOTE
At least one of the parameters ipar[7]-
ipar[9] must be set to 1.

ipar[10] if the value is equal to 0, the dcg/dcgmrhs routine


runs the non-preconditioned version of the
corresponding CG method. Otherwise, the routine
runs the preconditioned version of the CG method,
and by setting the output parameter
RCI_request=3, indicates that you must perform
the preconditioning step. The default value is 0.

ipar[11:127]             are reserved and not used in the current RCI CG
                         SRHS and MRHS routines.

                         NOTE
                         For future compatibility, you must declare the
                         array ipar with length 128 for a single right-
                         hand side.

ipar[11:127 + 2*nrhs]    are reserved for internal use in the current RCI CG
                         SRHS and MRHS routines.

NOTE
For future compatibility, you must declare the
array ipar with length 128+2*nrhs for multiple
right-hand sides.

dpar double array, for SRHS of size 128, for MRHS of size (128+2*nrhs); this
parameter is used to specify the double precision set of data for the RCI CG
computations, specifically:

dpar[0]                  specifies the relative tolerance. The default value is
                         1.0e-6.

dpar[1] specifies the absolute tolerance. The default value is


0.0.

dpar[2] specifies the square norm of the initial residual (if it


is computed in the dcg/dcgmrhs routine). The initial
value is 0.0.

dpar[3] service variable equal to


dpar[0]*dpar[2]+dpar[1] (if it is computed in
the dcg/dcgmrhs routine). The initial value is 0.0.

dpar[4] specifies the square norm of the current residual.


The initial value is 0.0.

dpar[5] specifies the square norm of residual from the


previous iteration step (if available). The initial value
is 0.0.

dpar[6] contains the alpha parameter of the CG method. The


initial value is 0.0.

dpar[7] contains the beta parameter of the CG method, it is


equal to dpar[4]/dpar[5] The initial value is 0.0.

dpar[8:127]              are reserved and not used in the current RCI CG
                         SRHS and MRHS routines.

                         NOTE
                         For future compatibility, you must declare the
                         array dpar with length 128 for a single right-
                         hand side.

dpar[8:127 + 2*nrhs]     are reserved for internal use in the current RCI CG
                         SRHS and MRHS routines.


NOTE
For future compatibility, you must declare the
array dpar with length 128+2*nrhs for multiple
right-hand sides.

tmp double array of size (n*4)for SRHS, and (n*(3+nrhs))for MRHS. This
parameter is used to supply the double precision temporary space for the
RCI CG computations, specifically:

tmp[0:n - 1] specifies the current search direction. The initial


value is 0.0.

tmp[n:2*n - 1] contains the matrix multiplied by the current search


direction. The initial value is 0.0.

tmp[2*n:3*n - 1] contains the current residual. The initial value is 0.0.

tmp[3*n:4*n - 1] contains the inverse of the preconditioner applied to


the current residual for the SRHS version of CG.
There is no initial value for this parameter.

tmp[4*n:(4 + nrhs)*n - 1]  contains the inverse of the preconditioner applied to
                           the current residual for the MRHS version of CG.
                           There is no initial value for this parameter.

NOTE
You can define this array in the code using RCI CG SRHS as
double tmp[3*n] if you run only non-preconditioned CG iterations.
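As an illustration, the fragment below (a sketch for a single right-hand side) adjusts a few of these parameters after
dcg_init has filled in the defaults: it caps the iteration count, enables the built-in residual test, and disables the
user-defined test. The arrays x, b, and tmp are assumed to be allocated by the caller.

#include "mkl.h"

void cg_parameter_sketch(MKL_INT n, double *x, double *b, double *tmp)
{
    /* tmp must have at least 4*n elements for the SRHS version. */
    MKL_INT ipar[128];
    double  dpar[128];
    MKL_INT RCI_request;

    dcg_init(&n, x, b, &RCI_request, ipar, dpar, tmp);

    ipar[4] = 100;       /* at most 100 iterations */
    ipar[7] = 1;         /* keep the maximum-iteration stopping test */
    ipar[8] = 1;         /* enable the built-in residual stopping test */
    ipar[9] = 0;         /* disable the user-defined stopping test */
    dpar[0] = 1.0e-8;    /* relative tolerance for the residual test */

    dcg_check(&n, x, b, &RCI_request, ipar, dpar, tmp);
    /* A non-zero RCI_request here reports inconsistent or corrected settings. */
}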

FGMRES Interface Description

Routine Options
All of the RCI FGMRES routines have common parameters for passing various options to the routines (see
FGMRES Common Parameters). The values for these parameters can be changed during computations.

User Data Arrays


Many of the RCI FGMRES routines take arrays of user data as input. For example, user arrays are passed to
the routine dfgmres to compute the solution of a system of linear algebraic equations. To minimize storage
requirements and improve overall run-time efficiency, the Intel MKL RCI FGMRES routines do not make
copies of the user input arrays.

FGMRES Common Parameters

NOTE
The default and initial values listed below are assigned to the parameters by calling the dfgmres_init
routine.

n MKL_INT, this parameter sets the size of the problem in the dfgmres_init
routine. All the other routines use the ipar[0] parameter instead. Note
that the coefficient matrix A is a square matrix of size n*n.

x double array, this parameter contains the current approximation to the


solution vector. Before the first call to the dfgmres routine, it contains the
initial approximation to the solution vector.

b double array, this parameter contains the right-hand side vector.


Depending on user requests (see the parameter ipar[12]), it might
contain the approximate solution after execution.

RCI_request MKL_INT, this parameter gives information about the result of work of the
RCI FGMRES routines. Negative values of the parameter indicate that the
routine completed with errors or warnings. The 0 value indicates successful
completion of the task. Positive values mean that you must perform specific
actions:

RCI_request= 1           multiply the matrix by tmp[ipar[21] - 1:ipar[21] + n - 2],
                         put the result in tmp[ipar[22] - 1:ipar[22] + n - 2], and
                         return the control to the dfgmres routine;

RCI_request= 2           perform the stopping tests. If they fail, return the
                         control to the dfgmres routine. Otherwise, the
                         solution can be updated by a subsequent call to the
                         dfgmres_get routine;

RCI_request= 3           apply the preconditioner to tmp[ipar[21] - 1:ipar[21] + n - 2],
                         put the result in tmp[ipar[22] - 1:ipar[22] + n - 2], and
                         return the control to the dfgmres routine.

RCI_request= 4           check if the norm of the current orthogonal vector is
                         zero, within the rounding or computational errors.
                         Return the control to the dfgmres routine if it is not
                         zero, otherwise complete the solution process by
                         calling the dfgmres_get routine.

ipar[128] MKL_INT array, this parameter specifies the integer set of data for the RCI
FGMRES computations:

ipar[0]                  specifies the size of the problem. The
                         dfgmres_init routine assigns ipar[0]=n. All the
                         other routines use this parameter instead of n.
                         There is no default value for this parameter.

ipar[1] specifies the type of output for error and warning


messages that are generated by the RCI FGMRES
routines. The default value 6 means that all
messages are displayed on the screen. Otherwise
the error and warning messages are written to the
newly created file MKL_RCI_FGMRES_Log.txt. Note


that if ipar[5] and ipar[6] parameters are set to


0, error and warning messages are not generated at
all.

ipar[2] contains the current stage of the RCI FGMRES


computations. The initial value is 1.

WARNING
Avoid altering this variable during computations.

ipar[3] contains the current iteration number. The initial


value is 0.

ipar[4] specifies the maximum number of iterations. The


default value is min (150,n).

ipar[5] if the value is not 0, the routines output error


messages in accordance with the parameter
ipar[1]. If it is 0, the routines do not output error
messages at all, but return a negative value of the
parameter RCI_request. The default value is 1.

ipar[6] if the value is not 0, the routines output warning


messages in accordance with the parameter
ipar[1]. Otherwise, the routines do not output
warning messages at all, but they return a negative
value of the parameter RCI_request. The default
value is 1.

ipar[7] if the value is not equal to 0, the dfgmres routine


performs the stopping test for the maximum
number of iterations: ipar[3]≤ipar[4]. If the
value is 0, the dfgmres routine does not perform
this stopping test. The default value is 1.

ipar[8] if the value is not 0, the dfgmres routine performs


the residual stopping test: dpar[4]≤dpar[3].If the
value is 0, the dfgmres routine does not perform
this stopping test. The default value is 0.

ipar[9] if the value is not 0, the dfgmres routine indicates


that the user-defined stopping test should be
performed by setting RCI_request=2. If the value
is 0, the dfgmres routine does not perform the
user-defined stopping test. The default value is 1.

NOTE
At least one of the parameters ipar[7]-
ipar[9] must be set to 1.

ipar[10] if the value is 0, the dfgmres routine runs the non-
preconditioned version of the FGMRES method.
Otherwise, the routine runs the preconditioned
version of the FGMRES method, and requests that
you perform the preconditioning step by setting the
output parameter RCI_request=3. The default
value is 0.

ipar[11] if the value is not equal to 0, the dfgmres routine


performs the automatic test for zero norm of the
currently generated vector: dpar[6]≤dpar[7],
where dpar[7] contains the tolerance value.
Otherwise, the routine indicates that you must
perform this check by setting the output parameter
RCI_request=4. The default value is 0.

ipar[12] if the value is equal to 0, the dfgmres_get routine


updates the solution to the vector x according to the
computations done by the dfgmres routine. If the
value is positive, the routine writes the solution to
the right-hand side vector b. If the value is
negative, the routine returns only the number of the
current iteration, and does not update the solution.
The default value is 0.

NOTE
It is possible to call the dfgmres_get routine at
any place in the code, but you must pay special
attention to the parameter ipar[12]. The RCI
FGMRES iterations can be continued after the call
to dfgmres_get routine only if the parameter
ipar[12] is not equal to zero. If ipar[12] is
positive, then the updated solution overwrites
the right-hand side in the vector b. If you want
to run the restarted version of FGMRES with the
same right-hand side, then it must be saved in a
different memory location before the first call to
the dfgmres_get routine with positive
ipar[12].

ipar[13] contains the internal iteration counter that counts


the number of iterations before the restart takes
place. The initial value is 0.

WARNING
Do not alter this variable during computations.

ipar[14] specifies the number of the non-restarted FGMRES


iterations. To run the restarted version of the
FGMRES method, assign the number of iterations to


ipar[14] before the restart. The default value is


min(150, n), which means that by default the non-
restarted version of FGMRES method is used.

ipar[15] service variable specifying the location of the


rotated Hessenberg matrix from which the matrix
stored in the packed format (see Matrix Arguments
in the Appendix B for details) is started in the tmp
array.

ipar[16] service variable specifying the location of the


rotation cosines from which the vector of cosines is
started in the tmp array.

ipar[17] service variable specifying the location of the


rotation sines from which the vector of sines is
started in the tmp array.

ipar[18] service variable specifying the location of the


rotated residual vector from which the vector is
started in the tmp array.

ipar[19] service variable, specifies the location of the least


squares solution vector from which the vector is
started in the tmp array.

ipar[20] service variable specifying the location of the set of


preconditioned vectors from which the set is started
in the tmp array. The memory locations in the tmp
array starting from ipar[20] are used only for the
preconditioned FGMRES method.

ipar[21] specifies the memory location from which the first


vector (source) used in operations requested via
RCI_request is started in the tmp array.

ipar[22] specifies the memory location from which the


second vector (output) used in operations requested
via RCI_request is started in the tmp array.

ipar[23:127] are reserved and not used in the current RCI


FGMRES routines.

NOTE
You must declare the array ipar with length
128. While defining the array in the code as
ipar[23] works, there is no guarantee of future
compatibility with Intel MKL.

dpar[128] double array, this parameter specifies the double precision set of data for
the RCI FGMRES computations, specifically:

dpar[0] specifies the relative tolerance. The default value is
1.0e-6.

dpar[1] specifies the absolute tolerance. The default value is


0.0e-0.

dpar[2] specifies the Euclidean norm of the initial residual (if


it is computed in the dfgmres routine). The initial
value is 0.0.

dpar[3] service variable equal to


dpar[0]*dpar[2]+dpar[1] (if it is computed in
the dfgmres routine). The initial value is 0.0.

dpar[4] specifies the Euclidean norm of the current residual.


The initial value is 0.0.

dpar[5] specifies the Euclidean norm of residual from the


previous iteration step (if available). The initial value
is 0.0.

dpar[6] contains the norm of the generated vector. The


initial value is 0.0.

NOTE
In terms of [Saad03] this parameter is the
coefficient h(k+1,k) of the Hessenberg matrix.

dpar[7] contains the tolerance for the zero norm of the


currently generated vector. The default value is
1.0e-12.

dpar[8:127] are reserved and not used in the current RCI


FGMRES routines.

NOTE
You must declare the array dpar with length
128. While defining the array in the code as
double dpar[8] works, there is no guarantee
of future compatibility with Intel MKL.

tmp                      double array of size ((2*ipar[14] + 1)*n + ipar[14]*(ipar[14] + 9)/2 + 1)
                         used to supply the double precision temporary space for the
                         RCI FGMRES computations, specifically:

tmp[0:ipar[15] - 2]              contains the sequence of vectors generated by the
                                 FGMRES method. The initial value is 0.0.

tmp[ipar[15] - 1:ipar[16] - 2]   contains the rotated Hessenberg matrix generated
                                 by the FGMRES method; the matrix is stored in the
                                 packed format. There is no initial value for this part
                                 of the tmp array.

tmp[ipar[16] - 1:ipar[17] - 2]   contains the rotation cosines vector generated by
                                 the FGMRES method. There is no initial value for
                                 this part of the tmp array.

tmp[ipar[17] - 1:ipar[18] - 2]   contains the rotation sines vector generated by the
                                 FGMRES method. There is no initial value for this
                                 part of the tmp array.

tmp[ipar[18] - 1:ipar[19] - 2]   contains the rotated residual vector generated by
                                 the FGMRES method. There is no initial value for
                                 this part of the tmp array.

tmp[ipar[19] - 1:ipar[20] - 2]   contains the solution vector to the least squares
                                 problem generated by the FGMRES method. There is
                                 no initial value for this part of the tmp array.

tmp[ipar[20] - 1:*]              contains the set of preconditioned vectors generated
                                 for the FGMRES method by the user. This part of the
                                 tmp array is not used if the non-preconditioned
                                 version of the FGMRES method is called. There is no
                                 initial value for this part of the tmp array.

NOTE
You can define this array in the code as double
tmp[(2*ipar[14] + 1)*n + ipar[14]*(ipar[14] + 9)/2 + 1] if you run
only non-preconditioned FGMRES iterations.
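The skeleton below sketches how these parameters are typically used in the reverse communication loop for the
non-preconditioned case with the built-in stopping tests. my_matvec is a hypothetical user-supplied routine
computing y = A*x; it is not part of Intel MKL, and the caller is assumed to have allocated tmp with the size given
above.

#include "mkl.h"

/* Hypothetical user routine: y = A*x for the n-by-n system matrix. */
void my_matvec(MKL_INT n, const double *x, double *y);

void fgmres_sketch(MKL_INT n, double *x, double *b, double *tmp)
{
    MKL_INT ipar[128], RCI_request, itercount;
    double  dpar[128];

    dfgmres_init(&n, x, b, &RCI_request, ipar, dpar, tmp);
    ipar[8]  = 1;     /* built-in residual stopping test */
    ipar[9]  = 0;     /* no user-defined stopping test */
    ipar[11] = 1;     /* built-in test for the zero norm of the generated vector */
    dfgmres_check(&n, x, b, &RCI_request, ipar, dpar, tmp);

    for (;;) {
        dfgmres(&n, x, b, &RCI_request, ipar, dpar, tmp);
        if (RCI_request == 1) {
            /* multiply A by tmp[ipar[21]-1 : ipar[21]+n-2] and store the
               result in tmp[ipar[22]-1 : ipar[22]+n-2] */
            my_matvec(n, &tmp[ipar[21] - 1], &tmp[ipar[22] - 1]);
        } else {
            break;    /* 0: converged; negative: error or warning */
        }
    }
    /* Write the computed solution to x and retrieve the iteration count. */
    dfgmres_get(&n, x, b, &RCI_request, ipar, dpar, tmp, &itercount);
}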

RCI ISS Routines

dcg_init
Initializes the solver.

Syntax
void dcg_init (const MKL_INT *n , const double *x , const double *b , MKL_INT
*RCI_request , MKL_INT *ipar , double *dpar , double *tmp );

Include Files
• mkl.h

Description
The routine dcg_init initializes the solver. After initialization, all subsequent invocations of the Intel MKL
RCI CG routines use the values of all parameters returned by the routine dcg_init. Advanced users can skip
this step and set the values in the ipar and dpar arrays directly.

CAUTION
You can modify the contents of these arrays after they are passed to the solver routine only if you are
sure that the values are correct and consistent. You can perform a basic check for correctness and
consistency by calling the dcg_check routine, but it does not guarantee that the method will work
correctly.

Input Parameters

n Sets the size of the problem.

x Array of size n. Contains the initial approximation to the solution vector.


Normally it is equal to 0 or to b.

b Array of size n. Contains the right-hand side vector.

Output Parameters

RCI_request Gives information about the result of the routine.

ipar Array of size 128. Refer to the CG Common Parameters.

dpar Array of size 128. Refer to the CG Common Parameters.

tmp Array of size (n*4). Refer to the CG Common Parameters.

Return Values

RCI_request= 0 Indicates that the task completed normally.

RCI_request= -10000 Indicates failure to complete the task.

dcg_check
Checks consistency and correctness of the user
defined data.

Syntax
void dcg_check (const MKL_INT *n , const double *x , const double *b , MKL_INT
*RCI_request , MKL_INT *ipar , double *dpar , double *tmp );

Include Files
• mkl.h

Description
The routine dcg_check checks consistency and correctness of the parameters to be passed to the solver
routine dcg. However this operation does not guarantee that the solver returns the correct result. It only
reduces the chance of making a mistake in the parameters of the method. Skip this operation only if you are
sure that the correct data is specified in the solver parameters.
The lengths of all vectors must be defined in a previous call to the dcg_init routine.


Input Parameters

n Sets the size of the problem.

x Array of size n. Contains the initial approximation to the solution vector.


Normally it is equal to 0 or to b.

b Array of size n. Contains the right-hand side vector.

Output Parameters

RCI_request Gives information about result of the routine.

ipar Array of size 128. Refer to the CG Common Parameters.

dpar Array of size 128. Refer to the CG Common Parameters.

tmp Array of size (n*4). Refer to the CG Common Parameters.

Return Values

RCI_request= 0 Indicates that the task completed normally.

RCI_request= -1100 Indicates that the task is interrupted and the errors occur.

RCI_request= -1001 Indicates that there are some warning messages.

RCI_request= -1010 Indicates that the routine changed some parameters to


make them consistent or correct.

RCI_request= -1011 Indicates that there are some warning messages and that
the routine changed some parameters.

dcg
Computes the approximate solution vector.

Syntax
void dcg (const MKL_INT *n , double *x , const double *b , MKL_INT *RCI_request ,
MKL_INT *ipar , double *dpar , double *tmp );

Include Files
• mkl.h

Description
The dcg routine computes the approximate solution vector using the CG method [Young71]. The routine dcg
uses the vector in the array x before the first call as an initial approximation to the solution. The parameter
RCI_request gives you information about the task completion and requests results of certain operations that
are required by the solver.
Note that lengths of all vectors must be defined in a previous call to the dcg_init routine.

Input Parameters

n Sets the size of the problem.

x Array of size n. Contains the initial approximation to the solution vector.

b Array of size n. Contains the right-hand side vector.

tmp Array of size (n*4). Refer to the CG Common Parameters.

Output Parameters

RCI_request Gives information about result of work of the routine.

x Array of size n. Contains the updated approximation to the solution vector.

ipar Array of size 128. Refer to the CG Common Parameters.

dpar Array of size 128. Refer to the CG Common Parameters.

tmp Array of size (n*4). Refer to the CG Common Parameters.

Return Values

RCI_request=0 Indicates that the task completed normally and the solution
is found and stored in the vector x. This occurs only if the
stopping tests are fully automatic. For the user defined
stopping tests, see the description of the RCI_request= 2.

RCI_request=-1 Indicates that the routine was interrupted because the


maximum number of iterations was reached, but the
relative stopping criterion was not met. This situation
occurs only if you request both tests.

RCI_request=-2 Indicates that the routine was interrupted because of an


attempt to divide by zero. This situation happens if the
matrix is non-positive definite or almost non-positive
definite.

RCI_request=- 10 Indicates that the routine was interrupted because the


residual norm is invalid. This usually happens because the
value dpar[5] was altered outside of the routine, or the
dcg_check routine was not called.

RCI_request=-11 Indicates that the routine was interrupted because it enters


the infinite cycle. This usually happens because the values
ipar[7], ipar[8], ipar[9] were altered outside of the
routine, or the dcg_check routine was not called.

RCI_request= 1 Indicates that you must multiply the matrix by tmp[0:n -


1], put the result in the tmp[n:2*n - 1], and return
control back to the routine dcg.


RCI_request= 2 Indicates that you must perform the stopping tests. If they
fail, return control back to the dcg routine. Otherwise, the
solution is found and stored in the vector x.

RCI_request= 3           Indicates that you must apply the preconditioner to
                         tmp[2*n:3*n - 1], put the result in tmp[3*n:4*n - 1], and
                         return control back to the routine dcg.
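The following skeleton (a sketch, not a complete program) shows the typical reverse communication loop around
dcg for the non-preconditioned case with the built-in stopping tests. my_matvec is a hypothetical user-supplied
routine computing y = A*x; it is not part of Intel MKL.

#include "mkl.h"

/* Hypothetical user routine: y = A*x for the n-by-n system matrix. */
void my_matvec(MKL_INT n, const double *x, double *y);

void cg_sketch(MKL_INT n, double *x, double *b)
{
    MKL_INT ipar[128], RCI_request, itercount;
    double  dpar[128];
    double *tmp = (double *)mkl_malloc(4 * (size_t)n * sizeof(double), 64);
    if (tmp == NULL) return;

    dcg_init(&n, x, b, &RCI_request, ipar, dpar, tmp);
    ipar[8] = 1;     /* built-in residual stopping test */
    ipar[9] = 0;     /* no user-defined stopping test */
    dcg_check(&n, x, b, &RCI_request, ipar, dpar, tmp);

    do {
        dcg(&n, x, b, &RCI_request, ipar, dpar, tmp);
        if (RCI_request == 1) {
            /* A * tmp[0:n-1] -> tmp[n:2*n-1] */
            my_matvec(n, &tmp[0], &tmp[n]);
        }
    } while (RCI_request == 1);

    if (RCI_request == 0) {                 /* converged; the solution is in x */
        dcg_get(&n, x, b, &RCI_request, ipar, dpar, tmp, &itercount);
    }
    mkl_free(tmp);
}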

dcg_get
Retrieves the number of the current iteration.

Syntax
void dcg_get (const MKL_INT *n , const double *x , const double *b , const MKL_INT
*RCI_request , const MKL_INT *ipar , const double *dpar , const double *tmp , MKL_INT
*itercount );

Include Files
• mkl.h

Description
The routine dcg_get retrieves the current iteration number of the solutions process.

Input Parameters

n Sets the size of the problem.

x Array of size n. Contains the approximation vector to the solution.

b Array of size n. Contains the right-hand side vector.

RCI_request This parameter is not used.

ipar Array of size 128. Refer to the CG Common Parameters.

dpar Array of size 128. Refer to the CG Common Parameters.

tmp Array of size (n*4). Refer to the CG Common Parameters.

Output Parameters

itercount Returns the current iteration number.

Return Values
The routine dcg_get has no return values.

dcgmrhs_init
Initializes the RCI CG solver with MRHS.

Syntax
void dcgmrhs_init (const MKL_INT *n , const double *x , const MKL_INT *nrhs , const
double *b , const MKL_INT *method , MKL_INT *RCI_request , MKL_INT *ipar , double
*dpar , double *tmp );

Include Files
• mkl.h

Description
The routine dcgmrhs_init initializes the solver. After initialization all subsequent invocations of the Intel
MKL RCI CG with multiple right-hand sides (MRHS) routines use the values of all parameters that are
returned by dcgmrhs_init. Advanced users may skip this step and set the values to these parameters
directly in the appropriate routines.

WARNING
You can modify the contents of these arrays after they are passed to the solver routine only if you are
sure that the values are correct and consistent. You can perform a basic check for correctness and
consistency by calling the dcgmrhs_check routine, but it does not guarantee that the method will
work correctly.

Input Parameters

n Sets the size of the problem.

x Array of size n*nrhs. Contains the initial approximation to the solution


vectors. Normally it is equal to 0 or to b.

nrhs Sets the number of right-hand sides.

b Array of size n*nrhs. Contains the right-hand side vectors.

method Specifies the method of solution:


A value of 1 indicates CG with multiple right-hand sides (default value)

Output Parameters

RCI_request Gives information about the result of the routine.

ipar Array of size (128+2*nrhs). Refer to the CG Common Parameters.

dpar Array of size (128+2*nrhs). Refer to the CG Common Parameters.

tmp Array of size (n*(3+nrhs)). Refer to the CG Common Parameters.

Return Values

RCI_request= 0 Indicates that the task completed normally.

RCI_request= -10000 Indicates failure to complete the task.

dcgmrhs_check
Checks consistency and correctness of the user
defined data.

Syntax
void dcgmrhs_check (const MKL_INT *n , const double *x , const MKL_INT *nrhs , const
double *b , MKL_INT *RCI_request , MKL_INT *ipar , double *dpar , double *tmp );


Include Files
• mkl.h

Description
The routine dcgmrhs_check checks the consistency and correctness of the parameters to be passed to the
solver routine dcgmrhs. While this operation reduces the chance of making a mistake in the parameters, it
does not guarantee that the solver returns the correct result.
If you are sure that the correct data is specified in the solver parameters, you can skip this operation.

The lengths of all vectors must be defined in a previous call to the dcgmrhs_init routine.

Input Parameters

n Sets the size of the problem.

x Array of size n*nrhs. Contains the initial approximation to the solution


vectors. Normally it is equal to 0 or to b.

nrhs This parameter sets the number of right-hand sides.

b Array of size n*nrhs. Contains the right-hand side vectors.

Output Parameters

RCI_request Returns information about the results of the routine.

ipar Array of size (128+2*nrhs). Refer to the CG Common Parameters.

dpar Array of size (128+2*nrhs). Refer to the CG Common Parameters.

tmp Array of size (n*(3+nrhs)). Refer to the CG Common Parameters.

Return Values

RCI_request= 0 Indicates that the task completed normally.

RCI_request= -1100 Indicates that the task is interrupted and the errors occur.

RCI_request= -1001 Indicates that there are some warning messages.

RCI_request= -1010 Indicates that the routine changed some parameters to


make them consistent or correct.

RCI_request= -1011 Indicates that there are some warning messages and that
the routine changed some parameters.

dcgmrhs
Computes the approximate solution vectors.

Syntax
void dcgmrhs (const MKL_INT *n , double *x , const MKL_INT *nrhs , const double *b ,
MKL_INT *RCI_request , MKL_INT *ipar , double *dpar , double *tmp );

Include Files
• mkl.h

Description
The routine dcgmrhs computes approximate solution vectors using the CG with multiple right-hand sides
(MRHS) method [Young71]. The routine dcgmrhs uses the value that was in the x before the first call as an
initial approximation to the solution. The parameter RCI_request gives information about task completion
status and requests results of certain operations that are required by the solver.
Note that lengths of all vectors are assumed to have been defined in a previous call to the dcgmrhs_init
routine.

Input Parameters

n Sets the size of the problem, and the sizes of arrays x and b.

x Array of size n*nrhs. Contains the initial approximation to the solution


vectors.

nrhs Sets the number of right-hand sides.

b Array of size n*nrhs. Contains the right-hand side vectors.

tmp Array of size (n*(3+nrhs)). Refer to the CG Common Parameters.

Output Parameters

RCI_request Gives information about result of work of the routine.

x Array of size (n*nrhs). Contains the updated approximation to the solution


vectors.

ipar Array of size (128+2*nrhs). Refer to the CG Common Parameters.

dpar Array of size (128+2*nrhs). Refer to the CG Common Parameters.

tmp Array of size (n*(3+nrhs)). Refer to the CG Common Parameters.

Return Values

RCI_request=0 Indicates that the task completed normally and the solution
is found and stored in the vector x. This occurs only if the
stopping tests are fully automatic. For the user defined
stopping tests, see the description of the RCI_request= 2.

RCI_request=-1 Indicates that the routine was interrupted because the


maximum number of iterations was reached, but the
relative stopping criterion was not met. This situation
occurs only if both tests are requested by the user.

RCI_request=-2 The routine was interrupted because of an attempt to


divide by zero. This situation happens if the matrix is non-
positive definite or almost non-positive definite.


RCI_request= -10         Indicates that the routine was interrupted because the
                         residual norm is invalid. This usually happens because the
                         value dpar[5] was altered outside of the routine, or the
                         dcgmrhs_check routine was not called.

RCI_request= -11         Indicates that the routine was interrupted because it enters
                         the infinite cycle. This usually happens because the values
                         ipar[7], ipar[8], ipar[9] were altered outside of the
                         routine, or the dcgmrhs_check routine was not called.

RCI_request= 1           Indicates that you must multiply the matrix by tmp[0:n - 1],
                         put the result in tmp[n:2*n - 1], and return
                         control back to the routine dcgmrhs.

RCI_request= 2           Indicates that you must perform the stopping tests. If they
                         fail, return control back to the dcgmrhs routine. Otherwise,
                         the solution is found and stored in the vector x.

RCI_request= 3           Indicates that you must apply the preconditioner to
                         tmp[2*n:3*n - 1], put the result in tmp[3*n:4*n - 1],
                         and return control back to the routine dcgmrhs.

dcgmrhs_get
Retrieves the number of the current iteration.

Syntax
void dcgmrhs_get (const MKL_INT *n , const double *x , const MKL_INT *nrhs , const
double *b , const MKL_INT *RCI_request , const MKL_INT *ipar , const double *dpar ,
const double *tmp , MKL_INT *itercount );

Include Files
• mkl.h

Description
The routine dcgmrhs_get retrieves the current iteration number of the solving process.

Input Parameters

n Sets the size of the problem.

x Array of size n*nrhs. Contains the initial approximation to the solution


vectors.

nrhs Sets the number of right-hand sides.

b Array of size n*nrhs. Contains the right-hand side vectors.

RCI_request This parameter is not used.

ipar Array of size (128+2*nrhs). Refer to the CG Common Parameters.

dpar Array of size (128+2*nrhs). Refer to the CG Common Parameters.

tmp Array of size (n*(3+nrhs)). Refer to the CG Common Parameters.

Output Parameters

itercount Array of size nrhs. Returns the current iteration number for each right-hand
side.

Return Values
The routine dcgmrhs_get has no return values.

dfgmres_init
Initializes the solver.

Syntax
void dfgmres_init (const MKL_INT *n , const double *x , const double *b , MKL_INT
*RCI_request , MKL_INT *ipar , double *dpar , double *tmp );

Include Files
• mkl.h

Description
The routine dfgmres_init initializes the solver. After initialization all subsequent invocations of Intel MKL
RCI FGMRES routines use the values of all parameters that are returned by dfgmres_init. Advanced users
can skip this step and set the values in the ipar and dpar arrays directly.

WARNING
You can modify the contents of these arrays after they are passed to the solver routine only if you are
sure that the values are correct and consistent. You can perform a basic check for correctness and
consistency by calling the dfgmres_check routine, but it does not guarantee that the method will
work correctly.

Input Parameters

n Sets the size of the problem.

x Array of size n. Contains the initial approximation to the solution vector. Normally it is equal to 0 or to b.

b Array of size n. Contains the right-hand side vector.

Output Parameters

RCI_request Gives information about the result of the routine.

ipar Array of size 128. Refer to the FGMRES Common Parameters.

dpar Array of size 128. Refer to the FGMRES Common Parameters.

tmp Array of size ((2*ipar[14] + 1)*n + ipar[14]*(ipar[14] + 9)/2 + 1). Refer to the FGMRES Common Parameters.


Return Values

RCI_request= 0 Indicates that the task completed normally.

RCI_request= -10000 Indicates failure to complete the task.
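
For illustration only (this fragment is not part of the reference), the workspace can be allocated with the size formula above and initialized as follows; the restart length used to size tmp is an assumption (min(150, n), matching the documented default of ipar[14] in the FGMRES Common Parameters) and should stay consistent with the value you actually keep in ipar[14]:

#include <stdlib.h>
#include <mkl.h>

/* Illustrative only: allocate ipar, dpar, and tmp for the RCI FGMRES solver
   and initialize them with dfgmres_init.  Returns 0 on success.              */
int fgmres_setup (MKL_INT n, double *x, double *b,
                  MKL_INT **ipar_out, double **dpar_out, double **tmp_out)
{
    MKL_INT restart = (n < 150) ? n : 150;          /* candidate for ipar[14] */
    size_t tmp_size = (size_t)((2 * restart + 1) * n
                      + restart * (restart + 9) / 2 + 1);
    MKL_INT *ipar = (MKL_INT*) malloc (128 * sizeof (MKL_INT));
    double  *dpar = (double*)  malloc (128 * sizeof (double));
    double  *tmp  = (double*)  malloc (tmp_size * sizeof (double));
    MKL_INT rci_request;

    if (!ipar || !dpar || !tmp) return -1;

    dfgmres_init (&n, x, b, &rci_request, ipar, dpar, tmp);
    if (rci_request != 0) return -1;                /* -10000: initialization failed */

    *ipar_out = ipar;  *dpar_out = dpar;  *tmp_out = tmp;
    return 0;
}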

dfgmres_check
Checks consistency and correctness of the user
defined data.

Syntax
void dfgmres_check (const MKL_INT *n, const double *x, const double *b, MKL_INT
*RCI_request, MKL_INT *ipar, double *dpar, double *tmp );

Include Files
• mkl.h

Description
The routine dfgmres_check checks consistency and correctness of the parameters to be passed to the solver
routine dfgmres. However, this operation does not guarantee that the method gives the correct result. It
only reduces the chance of making a mistake in the parameters of the routine. Skip this operation only if you
are sure that the correct data is specified in the solver parameters.
The lengths of all vectors are assumed to have been defined in a previous call to the dfgmres_init routine.

Input Parameters

n Sets the size of the problem.

x Array of size n. Contains the initial approximation to the solution vector. Normally it is equal to 0 or to b.

b Array of size n. Contains the right-hand side vector.

Output Parameters

RCI_request Gives information about result of the routine.

ipar Array of size 128. Refer to the FGMRES Common Parameters.

dpar Array of size 128. Refer to the FGMRES Common Parameters.

tmp Array of size ((2*ipar[14] + 1)*n + ipar[14]*(ipar[14] + 9)/2 + 1). Refer to the FGMRES Common Parameters.

Return Values

RCI_request= 0 Indicates that the task completed normally.

RCI_request= -1100 Indicates that the task was interrupted and errors occurred.

RCI_request= -1001 Indicates that there are some warning messages.

RCI_request= -1010 Indicates that the routine changed some parameters to
make them consistent or correct.

RCI_request= -1011 Indicates that there are some warning messages and that
the routine changed some parameters.

dfgmres
Makes the FGMRES iterations.

Syntax
void dfgmres (const MKL_INT *n, double *x, double *b, MKL_INT *RCI_request, MKL_INT
*ipar, double *dpar, double *tmp );

Include Files
• mkl.h

Description
The routine dfgmres performs the FGMRES iterations [Saad03], using the value that was in the array x
before the first call as an initial approximation of the solution vector. To update the current approximation to
the solution, the dfgmres_get routine must be called. The RCI FGMRES iterations can be continued after the
call to the dfgmres_get routine only if the value of the parameter ipar[12] is not equal to 0 (default
value). Note that the updated solution overwrites the right-hand side in the vector b if the parameter
ipar[12] is positive, and the restarted version of the FGMRES method cannot be run. If you want to keep
the right-hand side, save it in a different memory location before the first call to the
dfgmres_get routine with a positive ipar[12].
The parameter RCI_request gives information about the task completion and requests results of certain
operations that the solver requires.
The lengths of all the vectors must be defined in a previous call to the dfgmres_init routine.

Input Parameters

n Sets the size of the problem.

x Array of size n. Contains the initial approximation to the solution vector.

b Array of size n. Contains the right-hand side vector.

tmp Array of size ((2*ipar[14] + 1)*n + ipar[14]*(ipar[14] + 9)/2 + 1). Refer to the FGMRES Common Parameters.

Output Parameters

RCI_request Informs about result of work of the routine.

ipar Array of size 128. Refer to the FGMRES Common Parameters.

dpar Array of size 128. Refer to the FGMRES Common Parameters.

tmp Array of size ((2*ipar[14] + 1)*n + ipar[14]*(ipar[14] + 9)/2 + 1). Refer to the FGMRES Common Parameters.


Return Values

RCI_request=0 Indicates that the task completed normally and the solution
is found and stored in the vector x. This occurs only if the
stopping tests are fully automatic. For the user defined
stopping tests, see the description of the RCI_request= 2
or 4.

RCI_request=-1 Indicates that the routine was interrupted because the
maximum number of iterations was reached, but the
relative stopping criterion was not met. This situation
occurs only if you request both tests.

RCI_request= -10 Indicates that the routine was interrupted because of an
attempt to divide by zero. Usually this happens if the
matrix is degenerate or almost degenerate. However, it
may happen if the parameter dpar is altered, or if the
method is not stopped when the solution is found.

RCI_request= -11 Indicates that the routine was interrupted because it
entered an infinite cycle. Usually this happens because the
values ipar[7], ipar[8], ipar[9] were altered outside of
the routine, or the dfgmres_check routine was not called.

RCI_request= -12 Indicates that the routine was interrupted because errors
were found in the method parameters. Usually this happens
if the parameters ipar and dpar were altered by mistake
outside the routine.

RCI_request= 1 Indicates that you must multiply the matrix by
tmp[ipar[21] - 1:ipar[21] + n - 2], put the result
in tmp[ipar[22] - 1:ipar[22] + n - 2], and return
control back to the routine dfgmres (see the sketch
following this list).

RCI_request= 2 Indicates that you must perform the stopping tests. If they
fail, return control to the dfgmres routine. Otherwise, the
FGMRES solution is found, and you can run the
dfgmres_get routine to update the computed solution in the
vector x.

RCI_request= 3 Indicates that you must apply the inverse preconditioner to
tmp[ipar[21] - 1:ipar[21] + n - 2], put the result
in tmp[ipar[22] - 1:ipar[22] + n - 2], and return
control back to the routine dfgmres.

RCI_request= 4 Indicates that you must check the norm of the currently
generated vector. If it is not zero within the
computational/rounding errors, return control to the
dfgmres routine. Otherwise, the FGMRES solution is found,
and you can run the dfgmres_get routine to update the
computed solution in the vector x.
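
The sketch below (illustrative only, not part of the reference) shows one possible shape of the reverse-communication loop when the fully automatic stopping tests are used, so that only RCI_request=1 has to be served; my_matvec is a hypothetical user routine computing y = A*x, and the preconditioning and norm-checking branches (RCI_request=3 and 4) are left out:

#include <mkl.h>

/* Hypothetical user routine: y = A*x. */
void my_matvec (MKL_INT n, const double *x, double *y);

/* Returns 0 on success (solution in x), nonzero otherwise. */
int fgmres_solve (MKL_INT n, double *x, double *b,
                  MKL_INT *ipar, double *dpar, double *tmp)
{
    MKL_INT rci_request, itercount;

    dfgmres_check (&n, x, b, &rci_request, ipar, dpar, tmp);
    if (rci_request == -1100) return 1;            /* inconsistent parameters */

    for (;;) {
        dfgmres (&n, x, b, &rci_request, ipar, dpar, tmp);
        if (rci_request == 0) {
            break;                                 /* stopping tests satisfied */
        } else if (rci_request == 1) {
            /* tmp[ipar[22]-1 ...] = A * tmp[ipar[21]-1 ...] (offsets are one-based) */
            my_matvec (n, &tmp[ipar[21] - 1], &tmp[ipar[22] - 1]);
        } else {
            return 1;        /* RCI_request = 2, 3, 4 or an error: not handled here */
        }
    }

    /* Write the final solution into x and retrieve the iteration count. */
    ipar[12] = 0;
    dfgmres_get (&n, x, b, &rci_request, ipar, dpar, tmp, &itercount);
    return (rci_request == 0) ? 0 : 1;
}

With user-defined stopping tests or a preconditioner, the loop gains the corresponding RCI_request=2, 3, and 4 branches, as in the Iterative Sparse Solver examples shipped with Intel MKL.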

dfgmres_get
Retrieves the number of the current iteration and
updates the solution.

Syntax
void dfgmres_get (const MKL_INT *n, double *x, double *b, MKL_INT *RCI_request, const
MKL_INT *ipar, const double *dpar, double *tmp, MKL_INT *itercount );

Include Files
• mkl.h

Description
The routine dfgmres_get retrieves the current iteration number of the solution process and updates the
solution according to the computations performed by the dfgmres routine. To retrieve the current iteration
number only, set the parameter ipar[12]= -1 beforehand. Normally, you should do this before proceeding
further with the computations. If the intermediate solution is needed, the method parameters must be set
properly. For details see FGMRES Common Parameters and the Iterative Sparse Solver code examples in the
Intel MKL installation directory:

• examples/solverc/source

Input Parameters

n Sets the size of the problem.

ipar Array of size 128. Refer to the FGMRES Common Parameters.

dpar Array of size 128. Refer to the FGMRES Common Parameters.

tmp Array of size ((2*ipar[14] + 1)*n + ipar[14]*(ipar[14] + 9)/2 + 1). Refer to the FGMRES Common Parameters.

Output Parameters

x Array of size n. If ipar[12]= 0, it contains the updated approximation to
the solution according to the computations done in the dfgmres routine.
Otherwise, it is not changed.

b Array of size n. If ipar[12]> 0, it contains the updated approximation to
the solution according to the computations done in the dfgmres routine.
Otherwise, it is not changed.

RCI_request Gives information about result of the routine.

itercount Contains the value of the current iteration number.

Return Values

RCI_request= 0 Indicates that the task completed normally.


RCI_request= -12 Indicates that the routine was interrupted because errors
were found in the routine parameters. Usually this happens
if the parameters ipar and dpar were altered by mistake
outside of the routine.

RCI_request= -10000 Indicates that the routine failed to complete the task.

RCI ISS Implementation Details


Several aspects of the Intel MKL RCI ISS interface are platform-specific and language-specific. To promote
portability across platforms and ease of use across different languages, include one of the Intel MKL RCI ISS
language-specific header files.

NOTE
Intel MKL does not support the RCI ISS interface unless you include the language-specific header file.

Preconditioners based on Incomplete LU Factorization Technique

Preconditioners, or accelerators, are used to accelerate an iterative solution process. In some cases, their use
can reduce the number of iterations dramatically and thus lead to better solver performance. Although the
terms preconditioner and accelerator are synonyms, hereafter only preconditioner is used.
Intel MKL provides two preconditioners, ILU0 and ILUT, for sparse matrices presented in the format accepted
in the Intel MKL direct sparse solvers (three-array variation of the CSR storage format described in Sparse
Matrix Storage Format ). The algorithms used are described in [Saad03].
The ILU0 preconditioner is based on a well-known factorization of the original matrix into a product of two
triangular matrices: lower and upper triangular matrices. Usually, such decomposition leads to some fill-in in
the resulting matrix structure in comparison with the original matrix. The distinctive feature of the ILU0
preconditioner is that it preserves the structure of the original matrix in the result.
Unlike the ILU0 preconditioner, the ILUT preconditioner preserves some resulting fill-in in the preconditioner
matrix structure. The distinctive feature of the ILUT algorithm is that it calculates each element of the
preconditioner and saves each one if it satisfies two conditions simultaneously: its value is greater than the
product of the given tolerance and matrix row norm, and its value is in the given bandwidth of the resulting
preconditioner matrix.
Both ILU0 and ILUT preconditioners can apply to any non-degenerate matrix. They can be used alone or
together with the Intel MKL RCI FGMRES solver (see Sparse Solver Routines). Avoid using these
preconditioners with MKL RCI CG solver because in general, they produce a non-symmetric resulting matrix
even if the original matrix is symmetric. Usually, an inverse of the preconditioner is required in this case. To
do this the Intel MKL triangular solver routine mkl_dcsrtrsv must be applied twice: for the lower triangular
part of the preconditioner, and then for its upper triangular part.

NOTE
Although ILU0 and ILUT preconditioners apply to any non-degenerate matrix, in some cases the
algorithm may fail to ensure successful termination and the required result. Whether or not the
preconditioner produces an acceptable result can only be determined in practice.
A preconditioner may increase the number of iterations for an arbitrary case of the system and the
initial solution, and even ruin the convergence. It is your responsibility as a user to choose a suitable
preconditioner.
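
As an illustration of applying the inverse of the preconditioner (this fragment is not part of the reference, and assumes the incomplete factors produced by dcsrilu0 in bilu0/ia/ja with one-based indexing), the two mkl_dcsrtrsv calls mentioned above can be arranged as follows; trvec is a user-provided scratch array of length n:

#include <mkl.h>

/* Illustrative only: apply B^{-1} = (L*U)^{-1} to src and store the result in dst,
   where the incomplete factors L and U are stored together in bilu0/ia/ja
   (CSR, one-based), as produced by dcsrilu0.                                      */
void apply_ilu0_inverse (MKL_INT n, const double *bilu0,
                         const MKL_INT *ia, const MKL_INT *ja,
                         const double *src, double *trvec, double *dst)
{
    char lower = 'L', upper = 'U', no_trans = 'N', unit = 'U', non_unit = 'N';

    /* Solve L*t = src (L has a unit diagonal). */
    mkl_dcsrtrsv (&lower, &no_trans, &unit, &n, bilu0, ia, ja, src, trvec);
    /* Solve U*dst = t. */
    mkl_dcsrtrsv (&upper, &no_trans, &non_unit, &n, bilu0, ia, ja, trvec, dst);
}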

General Scheme of Using ILUT and RCI FGMRES Routines
The general scheme for use is the same for both preconditioners. Some differences exist in the calling
parameters of the preconditioners and in the subsequent call of two triangular solvers. You can see all these
differences in the preconditioner code examples (dcsrilu*.*) in the examples folder of the Intel MKL
installation directory:

• examples/solverc/source

ILU0 and ILUT Preconditioners Interface Description


The concepts required to understand the use of the Intel MKL preconditioner routines are discussed in the
Appendix A Linear Solvers Basics.

User Data Arrays


The preconditioner routines take arrays of user data as input. To minimize storage requirements and improve
overall run-time efficiency, the Intel MKL preconditioner routines do not make copies of the user input arrays.

Common Parameters
Some parameters of the preconditioners are common with the FGMRES Common Parameters. The routine
dfgmres_init specifies their default and initial values. However, some parameters can be redefined with
other values. These parameters are listed below.

For the ILU0 preconditioner:


ipar[1] - specifies the destination of error messages generated by the ILU0 routine. The default value 6
means that all error messages are displayed on the screen. Otherwise the routine creates a log file called
MKL_PREC_log.txt and writes error messages to it. Note if the parameter ipar[5] is set to 0, then error
messages are not generated at all.
ipar[5] - specifies whether error messages are generated. If its value is not equal to 0, the ILU0 routine
returns error messages as specified by the parameter ipar[1]. Otherwise, the routine does not generate
error messages at all, but returns a negative value for the parameter ierr. The default value is 1.

For the ILUT preconditioner:


ipar[1] - specifies the destination of error messages generated by the ILUT routine. The default value 6
means that all messages are displayed on the screen. Otherwise the routine creates a log file called
MKL_PREC_log.txt and writes error messages to it. Note if the parameter ipar[5] is set to 0, then error
messages are not generated at all.
ipar[5] - specifies whether error messages are generated. If its value is not equal to 0, the ILUT routine
returns error messages as specified by the parameter ipar[1]. Otherwise, the routine does not generate
error messages at all, but returns a negative value for the parameter ierr. The default value is 1.
ipar[6] - if its value is greater than 0, the ILUT routine generates warning messages as specified by the
parameter ipar[1] and continues calculations. If its value is equal to 0, the routine returns a positive value
of the parameter ierr. If its value is less than 0, the routine generates a warning message as specified by
the parameter ipar[1] and returns a positive value of the parameter ierr. The default value is 1.
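
A minimal sketch of redefining these entries (the numeric values are illustrative examples only, not recommendations from the reference) might look as follows, after dfgmres_init has filled ipar and dpar with their defaults:

#include <mkl.h>

/* Illustrative only: override the ILU0/ILUT-related entries after dfgmres_init
   has filled ipar and dpar (both of length 128) with their defaults.            */
static void set_ilu_parameters (MKL_INT *ipar, double *dpar)
{
    ipar[1]  = 6;        /* report ILU0/ILUT messages on the screen              */
    ipar[5]  = 1;        /* generate error messages                              */
    ipar[6]  = 1;        /* ILUT only: generate warnings and continue            */
    ipar[30] = 1;        /* do not stop on small/zero diagonal elements          */
    dpar[30] = 1.0e-20;  /* threshold used to detect small diagonal values       */
    dpar[31] = 1.0e-16;  /* ILU0 only: replacement value for such diagonals      */
}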

dcsrilu0
ILU0 preconditioner based on incomplete LU
factorization of a sparse matrix.

Syntax
void dcsrilu0 (const MKL_INT *n , const double *a , const MKL_INT *ia , const MKL_INT
*ja , double *bilu0 , const MKL_INT *ipar , const double *dpar , MKL_INT *ierr );


Include Files
• mkl.h

Description
The routine dcsrilu0 computes a preconditioner B [Saad03] of a given sparse matrix A stored in the format
accepted in the direct sparse solvers:
A ≈ B = L*U, where L is a lower triangular matrix with a unit diagonal, U is an upper triangular matrix with a
non-unit diagonal, and the portrait of the original matrix A is used to store the incomplete factors L and U.

CAUTION
This routine supports only one-based indexing of the array parameters.

Input Parameters

n Size (number of rows or columns) of the original square n-by-n matrix A.

a Array containing the set of elements of the matrix A. Its length is equal to
the number of non-zero elements in the matrix A. Refer to the values
array description in the Sparse Matrix Storage Format for more details.

ia Array of size (n+1) containing begin indices of rows of the matrix A such
that ia[i] is the index in the array a of the first non-zero element from the
row i. The value of the last element ia[n] is equal to the number of non-
zero elements in the matrix A, plus one. Refer to the rowIndex array
description in the Sparse Matrix Storage Format for more details.

ja Array containing the column indices for each non-zero element of the
matrix A. It is important that the indices are in increasing order per row.
The matrix size is equal to the size of the array a. Refer to the columns
array description in the Sparse Matrix Storage Format for more details.

CAUTION
If column indices are not stored in ascending order for each row of
matrix, the result of the routine might not be correct.

ipar Array of size 128. This parameter specifies the integer set of data for both
the ILU0 and RCI FGMRES computations. Refer to the ipar array
description in the FGMRES Common Parameters for more details on
FGMRES parameter entries. The entries that are specific to ILU0 are listed
below.

ipar[30] specifies how the routine operates when a zero
diagonal element occurs during calculation. If this
parameter is set to 0 (the default value set by the
routine dfgmres_init), then the calculations are
stopped and the routine returns a non-zero error
value. Otherwise, the diagonal element is set to the
value of dpar[31] and the calculations continue.

NOTE
You can declare the ipar array with a size of 32. However, for future
compatibility you must declare the array ipar with length 128.

dpar Array of size 128. This parameter specifies the double precision set of data
for both the ILU0 and RCI FGMRES computations. Refer to the dpar array
description in the FGMRES Common Parameters for more details on
FGMRES parameter entries. The entries specific to ILU0 are listed below.

dpar[30] specifies a small value, which is compared with the
computed diagonal elements. When ipar[30] is not
0, then diagonal elements less than dpar[30] are
set to dpar[31]. The default value is 1.0e-16.

NOTE
This parameter can be set to a negative value,
because the calculation uses its absolute value.
If this parameter is set to 0, the comparison with
the diagonal element is not performed.

dpar[31] specifies the value that is assigned to the diagonal
element if its value is less than dpar[30] (see
above). The default value is 1.0e-10.

NOTE
You can declare the dpar array with a size of 32. However, for future
compatibility you must declare the array dpar with length 128.

Output Parameters

bilu0 Array B containing non-zero elements of the resulting preconditioning
matrix B, stored in the format accepted in direct sparse solvers. Its size is
equal to the number of non-zero elements in the matrix A. Refer to the
values array description in the Sparse Matrix Storage Format section for
more details.

ierr Error flag, gives information about the routine completion.

NOTE
To present the resulting preconditioning matrix in the CSR3 format the arrays ia (row indices) and ja
(column indices) of the input matrix must be used.

Return Values

ierr=0 Indicates that the task completed normally.


ierr=-101 Indicates that the routine was interrupted and that error
occurred: at least one diagonal element is omitted from the
matrix in CSR3 format (see Sparse Matrix Storage Format).

ierr=-102 Indicates that the routine was interrupted because the
matrix contains a diagonal element with the value of zero.

ierr=-103 Indicates that the routine was interrupted because the
matrix contains a diagonal element which is so small that it
could cause an overflow, or that it would cause a bad
approximation to ILU0.

ierr=-104 Indicates that the routine was interrupted because the
memory is insufficient for the internal work array.

ierr=-105 Indicates that the routine was interrupted because the
input matrix size n is less than or equal to 0.

ierr=-106 Indicates that the routine was interrupted because the
column indices ja are not in the ascending order.
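
A small self-contained illustration (not from the reference; the matrix and the parameter values are arbitrary examples) of calling dcsrilu0 on a one-based CSR matrix:

#include <stdio.h>
#include <mkl.h>

int main (void)
{
    /* 3x3 matrix in one-based CSR format:
         [ 4 -1  0
          -1  4 -1
           0 -1  4 ]                                              */
    MKL_INT n = 3;
    double  a[]  = { 4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0 };
    MKL_INT ia[] = { 1, 3, 6, 8 };
    MKL_INT ja[] = { 1, 2, 1, 2, 3, 2, 3 };

    double  bilu0[7];                 /* same number of non-zeros as A          */
    MKL_INT ipar[128] = { 0 };        /* set by hand here instead of dfgmres_init */
    double  dpar[128] = { 0.0 };
    MKL_INT ierr = 0;

    ipar[30] = 1;                     /* replace small/zero diagonal elements   */
    dpar[30] = 1.0e-16;               /* threshold for small diagonal values    */
    dpar[31] = 1.0e-10;               /* replacement value                      */

    dcsrilu0 (&n, a, ia, ja, bilu0, ipar, dpar, &ierr);
    printf ("dcsrilu0 returned ierr = %d\n", (int) ierr);
    return (ierr == 0) ? 0 : 1;
}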

dcsrilut
ILUT preconditioner based on the incomplete LU
factorization with a threshold of a sparse matrix.

Syntax
void dcsrilut (const MKL_INT *n, const double *a, const MKL_INT *ia, const MKL_INT *ja,
double *bilut, MKL_INT *ibilut, MKL_INT *jbilut, const double *tol, const MKL_INT
*maxfil, const MKL_INT *ipar, const double *dpar, MKL_INT *ierr);

Include Files
• mkl.h

Description
The routine dcsrilut computes a preconditioner B [Saad03] of a given sparse matrix A stored in the format
accepted in the direct sparse solvers:
A ≈ B = L*U, where L is a lower triangular matrix with unit diagonal and U is an upper triangular matrix with
non-unit diagonal.
The following threshold criteria are used to generate the incomplete factors L and U:
1) the resulting entry must be greater than the matrix current row norm multiplied by the parameter tol,
and
2) the number of the non-zero elements in each row of the resulting L and U factors must not be greater
than the value of the parameter maxfil.

CAUTION
This routine supports only one-based indexing of the array parameters.

Input Parameters

n Size (number of rows or columns) of the original square n-by-n matrix A.

a Array containing all non-zero elements of the matrix A. The length of the
array is equal to their number. Refer to values array description in the
Sparse Matrix Storage Format section for more details.

ia Array of size (n+1) containing indices of non-zero elements in the array a.
ia[i] is the index of the first non-zero element from the row i. The value
of the last element ia[n] is equal to the number of non-zeros in the matrix
A, plus one. Refer to the rowIndex array description in the Sparse Matrix
Storage Format for more details.

ja Array of size equal to the size of the array a. This array contains the column
numbers for each non-zero element of the matrix A. It is important that the
indices are in increasing order per row. Refer to the columns array
description in the Sparse Matrix Storage Format for more details.

CAUTION
If column indices are not stored in ascending order for each row of
matrix, the result of the routine might not be correct.

tol Tolerance for threshold criterion for the resulting entries of the
preconditioner.

maxfil Maximum fill-in, which is half of the preconditioner bandwidth. The number
of non-zero elements in the rows of the preconditioner cannot exceed
(2*maxfil+1).

ipar Array of size 128. This parameter is used to specify the integer set of data
for both the ILUT and RCI FGMRES computations. Refer to the ipar array
description in the FGMRES Common Parameters for more details on
FGMRES parameter entries. The entries specific to ILUT are listed below.

ipar[30] specifies how the routine operates if the value of the
computed diagonal element is less than the current
matrix row norm multiplied by the value of the
parameter tol. If ipar[30] = 0, then the
calculation is stopped and the routine returns a
non-zero error value. Otherwise, the value of the
diagonal element is set to a value determined by
dpar[30] (see its description below), and the
calculations continue.

NOTE
There is no default value for ipar[30] even if
the preconditioner is used within the RCI ISS
context. Always set the value of this entry.


NOTE
You must declare the array ipar with length 128. While defining the
array in the code as ipar[30] works, there is no guarantee of future
compatibility with Intel MKL.

dpar Array of size 128. This parameter specifies the double precision set of data
for both ILUT and RCI FGMRES computations. Refer to the dpar array
description in the FGMRES Common Parameters for more details on
FGMRES parameter entries. The entries that are specific to ILUT are listed
below.

dpar[30] used to adjust the value of small diagonal elements.
Diagonal elements with a value less than the current
matrix row norm multiplied by tol are replaced with
the value of dpar[30] multiplied by the matrix row
norm.

NOTE
There is no default value for dpar[30] entry
even if the preconditioner is used within RCI ISS
context. Always set the value of this entry.

NOTE
You must declare the array dpar with length 128. While defining the
array in the code as dpar[30] works, there is no guarantee of future
compatibility with Intel MKL.

Output Parameters

bilut Array containing non-zero elements of the resulting preconditioning matrix
B, stored in the format accepted in the direct sparse solvers. Refer to the
values array description in the Sparse Matrix Storage Format for more
details. The size of the array is equal to (2*maxfil+1)*n - maxfil*(maxfil+1) + 1.

NOTE
Provide enough memory for this array before calling the routine.
Otherwise, the routine may fail to complete successfully with a correct
result.

ibilut Array of size (n+1) containing indices of non-zero elements in the array
bilut. ibilut[i] is the index of the first non-zero element from the row i.
The value of the last element ibilut[n] is equal to the number of non-
zeros in the matrix B, plus one. Refer to the rowIndex array description in
the Sparse Matrix Storage Format for more details.

jbilut Array, its size is equal to the size of the array bilut. This array contains
the column numbers for each non-zero element of the matrix B. Refer to
the columns array description in the Sparse Matrix Storage Format for
more details.

ierr Error flag, gives information about the routine completion.

Return Values

ierr=0 Indicates that the task completed normally.

ierr=-101 Indicates that the routine was interrupted because of an
error: the number of elements in some matrix row specified
in the sparse format is equal to or less than 0.

ierr=-102 Indicates that the routine was interrupted because the
value of the computed diagonal element is less than the
product of the given tolerance and the current matrix row
norm, and it cannot be replaced as ipar[30]=0.

ierr=-103 Indicates that the routine was interrupted because the
element ia[i] is less than or equal to the element ia[i - 1]
(see Sparse Matrix Storage Format).

ierr=-104 Indicates that the routine was interrupted because the
memory is insufficient for the internal work arrays.

ierr=-105 Indicates that the routine was interrupted because the
input value of maxfil is less than 0.

ierr=-106 Indicates that the routine was interrupted because the size
n of the input matrix is less than 0.

ierr=-107 Indicates that the routine was interrupted because an
element of the array ja is less than 1, or greater than n
(see Sparse Matrix Storage Format).

ierr=101 The value of maxfil is greater than or equal to n. The
calculation is performed with the value of maxfil set to
(n-1).

ierr=102 The value of tol is less than 0. The calculation is
performed with the value of the parameter set to (-tol).

ierr=103 The absolute value of tol is greater than value of
dpar[30]; it can result in instability of the calculation.

ierr=104 The value of dpar[30] is equal to 0. It can cause
calculations to fail.
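
The following sketch (illustrative only, not from the reference) allocates the output arrays according to the size bound given above and calls dcsrilut; the CSR input arrays and the ipar/dpar settings are assumed to be prepared by the caller:

#include <stdlib.h>
#include <mkl.h>

/* Illustrative only: allocate the dcsrilut output arrays according to the
   documented bound (2*maxfil+1)*n - maxfil*(maxfil+1) + 1 and run the routine.
   Returns the ierr value of dcsrilut, or -1 if allocation fails.               */
int compute_ilut (MKL_INT n, const double *a, const MKL_INT *ia, const MKL_INT *ja,
                  double tol, MKL_INT maxfil,
                  const MKL_INT *ipar, const double *dpar,
                  double **bilut_out, MKL_INT **ibilut_out, MKL_INT **jbilut_out)
{
    size_t  nnz_bound = (size_t)(2 * maxfil + 1) * (size_t)n
                        - (size_t)maxfil * (size_t)(maxfil + 1) + 1;
    double  *bilut  = (double*)  malloc (nnz_bound * sizeof (double));
    MKL_INT *ibilut = (MKL_INT*) malloc ((size_t)(n + 1) * sizeof (MKL_INT));
    MKL_INT *jbilut = (MKL_INT*) malloc (nnz_bound * sizeof (MKL_INT));
    MKL_INT ierr = 0;

    if (!bilut || !ibilut || !jbilut) return -1;

    dcsrilut (&n, a, ia, ja, bilut, ibilut, jbilut, &tol, &maxfil, ipar, dpar, &ierr);

    *bilut_out = bilut;  *ibilut_out = ibilut;  *jbilut_out = jbilut;
    return (int) ierr;
}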

Sparse Matrix Checker Routines


Intel MKL provides a sparse matrix checker so that you can find errors in the storage of sparse matrices
before calling Intel MKL PARDISO, DSS, or Sparse BLAS routines.


sparse_matrix_checker
Checks correctness of sparse matrix.

Syntax
MKL_INT sparse_matrix_checker (sparse_struct* handle);

Include Files
• mkl.h

Description
The sparse_matrix_checker routine checks a user-defined array used to store a sparse matrix in order to
detect issues which could cause problems in routines that require sparse input matrices, such as Intel MKL
PARDISO, DSS, or Sparse BLAS.

Input Parameters

handle Pointer to the data structure describing the sparse array to check.

Return Values
The routine returns an error value. Additionally, the check_result parameter returns information about
where the error occurred, which can be used when message_level is MKL_NO_PRINT.

Sparse Matrix Checker Error Values

MKL_SPARSE_CHECKER_SUCCESS
    Meaning: The input array successfully passed all checks.

MKL_SPARSE_CHECKER_NON_MONOTONIC
    Meaning: The input array is not 0 or 1 based (ia[0] is not 0 or 1) or elements
    of ia are not in non-decreasing order as required.
    Location (C): ia[i] and ia[i + 1] are incompatible.
        check_result[0] = i
        check_result[1] = ia[i]
        check_result[2] = ia[i + 1]

MKL_SPARSE_CHECKER_OUT_OF_RANGE
    Meaning: The value of the ja array is lower than the number of the first column
    or greater than the number of the last column.
    Location (C): ia[i] and ia[i + 1] are incompatible.
        check_result[0] = i
        check_result[1] = ia[i]
        check_result[2] = ia[i + 1]

MKL_SPARSE_CHECKER_NONTRIANGULAR
    Meaning: The matrix_structure parameter is MKL_UPPER_TRIANGULAR and both ia and
    ja are not upper triangular, or the matrix_structure parameter is
    MKL_LOWER_TRIANGULAR and both ia and ja are not lower triangular.
    Location (C): ia[i] and ja[j] are incompatible.
        check_result[0] = i
        check_result[1] = ia[i] = j
        check_result[2] = ja[j]

MKL_SPARSE_CHECKER_NONORDERED
    Meaning: The elements of the ja array are not in non-decreasing order in each
    row as required.
    Location (C): ia[i] and ia[i + 1] are incompatible.
        check_result[0] = j
        check_result[1] = ja[j]
        check_result[2] = ja[j + 1]

See Also
sparse_matrix_checker_init Initializes handle for sparse matrix checker.
Intel MKL PARDISO - Parallel Direct Sparse Solver Interface
Sparse BLAS Level 2 and Level 3 Routines
Sparse Matrix Storage Formats

sparse_matrix_checker_init
Initializes handle for sparse matrix checker.

Syntax
void sparse_matrix_checker_init (sparse_struct* handle);

Include Files
• mkl.h

Description
The sparse_matrix_checker_init routine initializes the handle for the sparse_matrix_checker routine.
The handle variable contains this data:
Description of sparse_matrix_checker handle Data

n (MKL_INT)
    Order of the matrix stored in sparse array.

csr_ia (MKL_INT*)
    Pointer to ia array for matrix_format = MKL_CSR.

csr_ja (MKL_INT*)
    Pointer to ja array for matrix_format = MKL_CSR.

check_result[3] (MKL_INT)
    Indicates location of problem in array when message_level = MKL_NO_PRINT.
    See Sparse Matrix Checker Error Values for a description of the values
    returned in check_result.

indexing (sparse_matrix_indexing)
    Possible values: MKL_ZERO_BASED, MKL_ONE_BASED.
    Indexing style used in array.

matrix_structure (sparse_matrix_structures)
    Possible values: MKL_GENERAL_STRUCTURE, MKL_UPPER_TRIANGULAR,
    MKL_LOWER_TRIANGULAR, MKL_STRUCTURAL_SYMMETRIC.
    Type of sparse matrix stored in array.

matrix_format (sparse_matrix_formats)
    Possible values: MKL_CSR.
    Format of array used for sparse matrix storage.

message_level (sparse_matrix_message_levels)
    Possible values: MKL_NO_PRINT, MKL_PRINT.
    Determines whether or not feedback is provided on the screen.

print_style (sparse_matrix_print_styles)
    Possible values: MKL_C_STYLE, MKL_FORTRAN_STYLE.
    Determines style of messages when message_level = MKL_PRINT.

Input Parameters

handle Pointer to the data structure describing the sparse array to check.

Output Parameters

handle Pointer to the initialized data structure.

See Also
sparse_matrix_checker Checks correctness of sparse matrix.
Intel MKL PARDISO - Parallel Direct Sparse Solver Interface
Sparse BLAS Level 2 and Level 3 Routines
Sparse Matrix Storage Formats
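
A minimal sketch of the typical call sequence (illustrative only; the handle field names and constants follow the table above, and the small matrix is an arbitrary example):

#include <stdio.h>
#include <mkl.h>

int main (void)
{
    /* 3x3 upper-triangular matrix pattern in one-based CSR format. */
    MKL_INT n = 3;
    MKL_INT ia[] = { 1, 3, 5, 6 };
    MKL_INT ja[] = { 1, 2, 2, 3, 3 };
    MKL_INT error;
    sparse_struct handle;

    sparse_matrix_checker_init (&handle);    /* fill the handle with defaults */
    handle.n                = n;
    handle.csr_ia           = ia;
    handle.csr_ja           = ja;
    handle.indexing         = MKL_ONE_BASED;
    handle.matrix_structure = MKL_UPPER_TRIANGULAR;
    handle.matrix_format    = MKL_CSR;
    handle.message_level    = MKL_PRINT;
    handle.print_style      = MKL_C_STYLE;

    error = sparse_matrix_checker (&handle);
    printf ("sparse_matrix_checker returned %d\n", (int) error);
    return (error == MKL_SPARSE_CHECKER_SUCCESS) ? 0 : 1;
}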

Extended Eigensolver Routines 6
The Extended Eigensolver functionality in Intel Math Kernel Library (Intel MKL) is based on the FEAST
Eigenvalue Solver 2.0 (http://www.ecs.umass.edu/~polizzi/feast) in compliance with a BSD license
agreement.
Extended Eigensolver uses the same naming convention as the original FEAST package.

NOTE
Intel MKL only supports the shared memory programming (SMP) version of the eigenvalue solver.

• The FEAST Algorithm gives a brief description of the algorithm underlying the Extended Eigensolver.
• Extended Eigensolver Functionality describes the problems that can and cannot be solved with the
Extended Eigensolver and how to get the best results from the routines.
• Extended Eigensolver Interfaces gives a reference for calling Extended Eigensolver routines.

The FEAST Algorithm


The Extended Eigensolver functionality is a set of high-performance numerical routines for solving symmetric
standard eigenvalue problems, Ax=λx, or generalized symmetric-definite eigenvalue problems, Ax=λBx. It
yields all the eigenvalues (λ) and eigenvectors (x) within a given search interval [λmin, λmax]. It is based on
the FEAST algorithm, an innovative fast and stable numerical algorithm presented in [Polizzi09], which
fundamentally differs from the traditional Krylov subspace iteration based techniques (Arnoldi and Lanczos
algorithms [Bai00]) or other Davidson-Jacobi techniques [Sleijpen96]. The FEAST algorithm is inspired by the
density-matrix representation and contour integration techniques in quantum mechanics.
The FEAST numerical algorithm obtains eigenpair solutions using a numerically efficient contour integration
technique. The main computational tasks in the FEAST algorithm consist of solving a few independent linear
systems along the contour and solving a reduced eigenvalue problem. Consider a circle centered in the
middle of the search interval [λmin, λmax]. The numerical integration over the circle in the current version of
FEAST is performed using Ne-point Gauss-Legendre quadrature with xe the e-th Gauss node associated with
the weight ωe. For example, for the case Ne = 8:

( x1, ω1 ) = (0.183434642495649 , 0.362683783378361),


( x2, ω2 ) = (-0.183434642495649 , 0.362683783378361),
( x3, ω3 ) = (0.525532409916328 , 0.313706645877887),
( x4, ω4 ) = (-0.525532409916328 , 0.313706645877887),
( x5, ω5 ) = (0.796666477413626 , 0.222381034453374),
( x6, ω6 ) = (-0.796666477413626 , 0.222381034453374),
( x7, ω7 ) = (0.960289856497536 , 0.101228536290376), and
( x8, ω8 ) = (-0.960289856497536 , 0.101228536290376).

The figure FEAST Pseudocode shows the basic pseudocode for the FEAST algorithm for the case of real
symmetric (left pane) and complex Hermitian (right pane) generalized eigenvalue problems, using N for the
size of the system and M for the number of eigenvalues in the search interval (see [Polizzi09]).

NOTE
The pseudocode presents a simplified version of the actual algorithm. Refer to http://arxiv.org/abs/
1302.0432 for an in-depth presentation and mathematical proof of convergence of FEAST.


FEAST Pseudocode

[Figure: FEAST pseudocode. Left pane: A real symmetric, B symmetric positive definite (SPD). Right pane: A complex Hermitian, B Hermitian positive definite (HPD). ℜ{x} denotes the real part of x.]

Extended Eigensolver Functionality


Use Extended Eigensolver to compute all the eigenvalues and eigenvectors within a given search interval.
The eigenvalue problems covered are as follows:

• standard, Ax = λx
• A complex Hermitian
• A real symmetric
• generalized, Ax = λBx
• A complex Hermitian, B Hermitian positive definite (hpd)
• A real symmetric and B real symmetric positive definite (spd)
The Extended Eigensolver functionality offers:

• Real/Complex and Single/Double precisions: double precision is recommended to provide better accuracy
of eigenpairs.
• Reverse communication interfaces (RCI) provide maximum flexibility for specific applications. RCI are
independent of matrix format and inner system solvers, so you must provide your own linear system
solvers (direct or iterative) and matrix-matrix multiply routines.
• Predefined driver interfaces for dense, LAPACK banded, and sparse (CSR) formats are less flexible but are
optimized and easy to use:
• The Extended Eigensolver interfaces for dense matrices are likely to be slower than the comparable
LAPACK routines because the FEAST algorithm has a higher computational cost.

• The Extended Eigensolver interfaces for banded matrices support banded LAPACK-type storage.
• The Extended Eigensolver sparse interfaces support compressed sparse row format and use the Intel
MKL PARDISO solver.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Parallelism in Extended Eigensolver Routines


How you achieve parallelism in Extended Eigensolver routines depends on which interface you use.
Parallelism (via shared memory programming) is not explicitly implemented in Extended Eigensolver routines
within one node: the inner linear systems are currently solved one after another.

• Using the Extended Eigensolver RCI interfaces, you can achieve parallelism by providing a threaded inner
system solver and a matrix-matrix multiplication routine. When using the RCI interfaces, you are
responsible for activating the threaded capabilities of your BLAS and LAPACK libraries most likely using
the shell variable OMP_NUM_THREADS.
• Using the predefined Extended Eigensolver interfaces, parallelism can be implicitly obtained within the
shared memory version of BLAS, LAPACK or Intel MKL PARDISO. The shell variable MKL_NUM_THREADS can
be used for automatically setting the number of OpenMP threads (cores) for BLAS, LAPACK, and Intel MKL
PARDISO.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Achieving Performance With Extended Eigensolver Routines


In order to use the Extended Eigensolver Routines, you need to provide

• the search interval and the size of the subspace M0 (overestimation of the number of eigenvalues M within
a given search interval);
• the system matrix in dense, banded, or sparse CSR format if the Extended Eigensolver predefined
interfaces are used, or a high-performance complex direct or iterative system solver and matrix-vector
multiplication routine if RCI interfaces are used.
In return, you can expect

• fast convergence with very high accuracy when seeking up to 1000 eigenpairs (in two to four iterations
using M0 = 1.5M, and Ne = 8 or at most using Ne = 16 contour points);
• an extremely robust approach.


The performance of the basic FEAST algorithm depends on a trade-off between the choices of the number of
Gauss quadrature points Ne, the size of the subspace M0, and the number of outer refinement loops to reach
the desired accuracy. In practice you should use M0 > 1.5 M, Ne = 8, and at most two refinement loops.
For better performance:

• M0 should be much smaller than the size of the eigenvalue problem, so that the arithmetic complexity
depends mainly on the inner system solver (O(NM) for narrow-banded or sparse systems).
• Parallel scalability performance depends on the shared memory capabilities of the inner system
solver.
• For very large sparse and challenging systems, application users should make use of the Extended
Eigensolver RCI interfaces with customized highly-efficient iterative systems solvers and preconditioners.
• For the Extended Eigensolver interfaces for banded matrices, the parallel performance scalability is
limited.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Extended Eigensolver Interfaces

Extended Eigensolver Naming Conventions


There are two different types of interfaces available in the Extended Eigensolver routines:

1. The reverse communication interfaces (RCI):


?feast_<matrix type>_rci

These interfaces are matrix free format (the interfaces are independent of the matrix data formats).
You must provide matrix-vector multiply and direct/iterative linear system solvers for your own explicit
or implicit data format.
2. The predefined interfaces:
?feast_<matrix type><type of eigenvalue problem>

are predefined drivers for ?feast reverse communication interface that act on commonly used matrix
data storage (dense, banded and compressed sparse row representation), using internal matrix-vector
routines and selected inner linear system solvers.
For these interfaces:

• ? indicates the data type of matrix A (and matrix B if any) defined as follows:
s float

d double

c MKL_Complex8

z MKL_Complex16

• <matrix type> defined as follows:

  sy (symmetric real), he (Hermitian complex)
      Matrix format: Dense
      Inner linear system solver used by Extended Eigensolver: LAPACK dense solvers

  sb (symmetric banded real), hb (Hermitian banded complex)
      Matrix format: Banded-LAPACK
      Inner linear system solver used by Extended Eigensolver: Internal banded solver

  scsr (symmetric real), hcsr (Hermitian complex)
      Matrix format: Compressed sparse row
      Inner linear system solver used by Extended Eigensolver: PARDISO solver

  s (symmetric real), h (Hermitian complex)
      Matrix format: Reverse communications interfaces
      Inner linear system solver used by Extended Eigensolver: User defined

• <type of eigenvalue problem> is:


gv generalized eigenvalue problem

ev standard eigenvalue problem

For example, sfeast_scsrev is a single-precision routine with a symmetric real matrix stored in sparse
compressed-row format for a standard eigenvalue problem, and zfeast_hrci is a complex double-precision
routine with a Hermitian matrix using the reverse communication interface.
Note that:

• ? can be s or d if a matrix is real symmetric: <matrix type> is sy, sb, or scsr.


• ? can be c or z if a matrix is complex Hermitian: <matrix type> is he, hb, or hcsr.
• ? can be c or z if the Extended Eigensolver RCI interface is used for solving a complex Hermitian problem.
• ? can be s or d if the Extended Eigensolver RCI interface is used for solving a real symmetric problem.

feastinit
Initialize Extended Eigensolver input parameters with
default values.

Syntax
void feastinit (MKL_INT* fpm);

Include Files
• mkl.h

Description
This routine sets all Extended Eigensolver parameters to their default values.


Output Parameters

fpm Array, size 128. This array is used to pass various parameters to Extended
Eigensolver routines. See Extended Eigensolver Input Parameters for a
complete description of the parameters and their default values.

Extended Eigensolver Input Parameters


The input parameters for Extended Eigensolver routines are contained in an MKL_INT array named fpm. To
call the Extended Eigensolver interfaces, this array should be initialized using the routine feastinit.

Param Default Description

fpm[0] 0 Specifies whether Extended Eigensolver routines print runtime status.

fpm[0]=0 Extended Eigensolver routines do not generate runtime
messages at all.

fpm[0]=1 Extended Eigensolver routines print runtime status to the
screen.

fpm[1] 8 The number of contour points Ne = 8 (see the description of FEAST algorithm).
Must be one of {3,4,5,6,8,10,12,16,20,24,32,40,48}.

fpm[2] 12 Error trace double precision stopping criteria ε (ε = 10^(-fpm[2])).

fpm[3] 20 Maximum number of Extended Eigensolver refinement loops allowed. If no
convergence is reached within fpm[3] refinement loops, Extended Eigensolver
routines return info=2.

fpm[4] 0 User initial subspace. If fpm[4]=0 then Extended Eigensolver routines generate
initial subspace, if fpm[4]=1 the user supplied initial subspace is used.

fpm[5] 0 Extended Eigensolver stopping test.

fpm[5]=0 Extended Eigensolvers are stopped if this residual stopping
test is satisfied:
• generalized eigenvalue problem:

• standard eigenvalue problem:

where mode is the total number of eigenvalues found in the
search interval and ε = 10^(-fpm[6]) for real and complex or
ε = 10^(-fpm[2]) for double precision and double complex.

fpm[5]=1 Extended Eigensolvers are stopped if this trace stopping test
is satisfied:

where tracej denotes the sum of all eigenvalues found in the
search interval [emin, emax] at the j-th Extended
Eigensolver iteration:

fpm[6] 5 Error trace single precision stopping criteria (10^(-fpm[6])).

fpm[13] 0 fpm[13]=0 Standard use for Extended Eigensolver routines.

fpm[13]=1 Non-standard use for Extended Eigensolver routines: return
the computed eigenvectors subspace after one single
contour integration.

fpm[26] 0 Specifies whether Extended Eigensolver routines check input matrices (applies to
CSR format only).

fpm[26]=0 Extended Eigensolver routines do not check input matrices.

fpm[26]=1 Extended Eigensolver routines check input matrices.

fpm[27] 0 Check if matrix B is positive definite. Set fpm[27] = 1 to check if B is positive definite.

fpm[29] to fpm[62] - Reserved for future use.

fpm[63] 0 Use the Intel MKL PARDISO solver with the user-defined PARDISO iparm array
settings.

NOTE
This option can only be used by Extended Eigensolver Predefined Interfaces
for Sparse Matrices.

fpm[63]=0 Extended Eigensolver routines use the Intel MKL PARDISO
default iparm settings defined by calling the pardisoinit
subroutine.

fpm[63]=1 The values from fpm[64] to fpm[127] correspond to
iparm[0] to iparm[63] respectively according to the
formula fpm[64 + i]= iparm[i] for i = 0, 1, ..., 63.
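
As a brief illustration (the specific values below are arbitrary examples based on the table above, not recommendations), the parameters are typically initialized with feastinit and then selectively overridden:

#include <mkl.h>

/* Illustrative only: set up the fpm array for an Extended Eigensolver call. */
static void setup_feast_parameters (MKL_INT *fpm /* length 128 */)
{
    feastinit (fpm);   /* fill fpm with the default values                     */

    fpm[0]  = 1;       /* print runtime status to the screen                   */
    fpm[1]  = 16;      /* use 16 contour points instead of the default 8       */
    fpm[2]  = 10;      /* double precision stopping criterion 10^(-10)         */
    fpm[26] = 1;       /* check the input matrices (CSR interfaces only)       */
}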

Extended Eigensolver Output Details


Errors and warnings encountered during a run of the Extended Eigensolver routines are stored in an integer
variable, info. If the value of the output info parameter is not 0, either an error or warning was
encountered. The possible return values for the info parameter along with the error code descriptions are
given in the following table.
Return Codes for info Parameter
info Classification Description

202 Error Problem with size of the system n (n≤0)


201 Error Problem with size of initial subspace m0 (m0≤0 or m0>n)

200 Error Problem with emin,emax (emin≥emax)

(100+i) Error Problem with i-th value of the input Extended Eigensolver
parameter (fpm[i - 1]). Only the parameters in use are checked.

4 Warning Successful return of only the computed subspace after call with
fpm[13] = 1

3 Warning Size of the subspace m0 is too small (m0<m)

2 Warning No Convergence (number of iteration loops >fpm[3])

1 Warning No eigenvalue found in the search interval. See remark below for
further details.

0 Successful exit

-1 Error Internal error for allocation memory.

-2 Error Internal error of the inner system solver. Possible reasons: not
enough memory for inner linear system solver or inconsistent
input.

-3 Error Internal error of the reduced eigenvalue solver.
Possible cause: matrix B may not be positive definite. It can be
checked by setting fpm[27] = 1 before calling an Extended
Eigensolver routine, or by using LAPACK routines.

-4 Error Matrix B is not positive definite.

-(100+i) Error Problem with the i-th argument of the Extended Eigensolver
interface.

In some extreme cases the return value info=1 may indicate that the Extended Eigensolver routine has
failed to find the eigenvalues in the search interval. This situation could arise if a very large search interval is
used to locate a small and isolated cluster of eigenvalues (i.e., the dimension of the search interval is many
orders of magnitude larger than the number of contour points). It is then recommended either to increase the
number of contour points fpm[1] or to rescale the search interval more appropriately. Rescaling means that
the initial problem of finding all eigenvalues in the search interval [λmin, λmax] for the standard eigenvalue
problem A x=λx is replaced with the problem of finding all eigenvalues in the search interval [λmin/t, λmax/t]
for the standard eigenvalue problem (A/t) x=(λ/t) x, where t is a scaling factor.

Extended Eigensolver RCI Routines


If you do not require specific linear system solvers or matrix storage schemes, you can skip this section and
go directly to Extended Eigensolver Predefined Interfaces.

Extended Eigensolver RCI Interface Description


The Extended Eigensolver RCI interfaces can be used to solve standard or generalized eigenvalue problems,
and are independent of the format of the matrices. As mentioned earlier, the Extended Eigensolver algorithm
is based on the contour integration techniques of the matrix resolvent G(σ) = (σB - A)^(-1) over a circle. For
solving a generalized eigenvalue problem, Extended Eigensolver has to perform one or more of the following
operations at each contour point denoted below by Ze :

• Factorize the matrix (Ze*B - A)
• Solve the linear system (Ze*B - A)X = Y or (Ze*B - A)^H X = Y with multiple right-hand sides, where ^H
means transpose conjugate

• Matrix-matrix multiply BX = Y or AX = Y
For solving a standard eigenvalue problem, replace the matrix B with the identity matrix I.
The primary aim of RCI interfaces is to isolate these operations: the linear system solver, factorization of the
matrix resolvent at each contour point, and matrix-matrix multiplication. This gives universality to RCI
interfaces as they are independent of data structures and the specific implementation of the operations like
matrix-vector multiplication or inner system solvers. However, this approach requires some additional effort
when calling the interface. In particular, operations listed above are performed by routines that you supply on
data structures that you find most appropriate for the problem at hand.
To initialize an Extended Eigensolver RCI routine, set the job indicator (ijob) parameter to the value -1.
When the routine requires the results of an operation, it generates a special value of ijob to indicate the
operation that needs to be performed. The routine also returns ze, the coordinate along the complex contour,
the values of array work or workc, and the number of columns to be used. Your subroutine then must
perform the operation at the given contour point ze, store the results in prescribed array, and return control
to the Extended Eigensolver RCI routine.
The following pseudocode shows the general scheme for using the Extended Eigensolver RCI functionality for
a real symmetric problem:
ijob=-1; // initialization
do while (ijob!=0) {
?feast_srci(&ijob, &N, &Ze, work, workc, Aq, Bq,
fpm, &epsout, &loop, &Emin, &Emax, &M0, E, lambda, &q, res, &info);

switch(ijob) {
case 10: // Factorize the complex matrix (ZeB-A)
break;

case 11: // Solve the complex linear system (ZeB-A)x=workc


// Put result in workc
break;

case 30: // Perform multiplication A by Qi..Qj columns of QNxM0


// where i = fpm[23] and j = fpm[23]+fpm[24]−1
// Qi..Qj located in q starting from q+N*(i-1)
break;

case 40: // Perform multiplication B by Qi..Qj columns of QNxM0


// where i = fpm[23] and j = fpm[23]+fpm[24]−1
// Qi..Qj located in q starting from q+N*(i-1)
// Result is stored in work+N*(i-1)

break;
}
}

NOTE
The ? option in ?feast in the pseudocode given above should be replaced by either s or d, depending
on the matrix data type of the eigenvalue system.

The next pseudocode shows the general scheme for using the Extended Eigensolver RCI functionality for a
complex Hermitian problem:
ijob=-1; // initialization
while (ijob!=0) {
?feast_hrci(&ijob, &N, &Ze, work, workc, Aq, Bq,
fpm, &epsout, &loop, &Emin, &Emax, &M0, E, lambda, &q, res, &info);

switch (ijob) {
case 10: // Factorize the complex matrix (ZeB-A)


break;

case 11: // Solve the linear system (ZeB−A)y=workc


// Put result in workc
break;

case 20: // Factorize (if needed by case 21) the complex matrix (ZeB−A)ˆH
// ATTENTION: This option requires additional memory storage
// (i.e . the resulting matrix from case 10 cannot be overwritten)
break;

case 21: // Solve the linear system (ZeB−A)ˆHy=workc


// Put result in workc
// REMARK: case 20 becomes obsolete if this solve can be performed
// using the factorization in case 10
break;

case 30: // Multiply A by Qi..Qj columns of QNxM0,


// where i = fpm[23] and j = fpm[23]+fpm[24]−1
// Qi..Qj located in q starting from q+N*(i-1)
// Result is stored in work+N*(i-1)
break;

case 40: // Perform multiplication B by Qi..Qj columns of QNxM0


// where i = fpm[23] and j = fpm[23]+fpm[24]−1
// Qi..Qj located in q starting from q+N*(i-1)
// Result is stored in work+N*(i-1)
break;

}
}

NOTE
The ? option in ?feast in the pseudocode given above should be replaced by either c or z, depending
on the matrix data type of the eigenvalue system.
If case 20 can be avoided, performance could be up to twice as fast, and Extended Eigensolver
functionality would use half of the memory.

If an iterative solver is used along with a preconditioner, the factorization of the preconditioner could be
performed with ijob = 10 (and ijob = 20 if applicable) for a given value of Ze, and the associated iterative
solve would then be performed with ijob = 11 (and ijob = 21 if applicable).

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

?feast_srci/?feast_hrci
Extended Eigensolver RCI interface.

Syntax
void sfeast_srci (MKL_INT* ijob, const MKL_INT* n, MKL_Complex8* ze, float* work,
MKL_Complex8* workc, float* aq, float* sq, MKL_INT* fpm, float* epsout, MKL_INT* loop,
const float* emin, const float* emax, MKL_INT* m0, float* lambda, float* q, MKL_INT*
m, float* res, MKL_INT* info);
void dfeast_srci (MKL_INT* ijob, const MKL_INT* n, MKL_Complex16* ze, double* work,
MKL_Complex16* workc, double* aq, double* sq, MKL_INT* fpm, double* epsout, MKL_INT*
loop, const double* emin, const double* emax, MKL_INT* m0, double* lambda, double* q,
MKL_INT* m, double* res, MKL_INT* info);
void cfeast_hrci (MKL_INT* ijob, const MKL_INT* n, MKL_Complex8* ze, MKL_Complex8*
work, MKL_Complex8* workc, MKL_Complex8* aq, MKL_Complex8* sq, MKL_INT* fpm, float*
epsout, MKL_INT* loop, const float* emin, const float* emax, MKL_INT* m0, float*
lambda, MKL_Complex8* q, MKL_INT* m, float* res, MKL_INT* info);
void zfeast_hrci (MKL_INT* ijob, const MKL_INT* n, MKL_Complex16* ze, MKL_Complex16*
work, MKL_Complex16* workc, MKL_Complex16* aq, MKL_Complex16* sq, MKL_INT* fpm,
double* epsout, MKL_INT* loop, const double* emin, const double* emax, MKL_INT* m0,
double* lambda, MKL_Complex16* q, MKL_INT* m, double* res, MKL_INT* info);

Include Files
• mkl.h

Description
Compute eigenvalues as described in Extended Eigensolver RCI Interface Description.

Input Parameters

ijob Job indicator variable. On entry, a call to ?feast_srci/?feast_hrci with
ijob=-1 initializes the eigensolver.

n Sets the size of the problem. n > 0.

work Workspace array of size n by m0.

workc Workspace array of size n by m0.

aq, sq Workspace arrays of size m0 by m0.

fpm Array, size of 128. This array is used to pass various parameters to
Extended Eigensolver routines. See Extended Eigensolver Input Parameters
for a complete description of the parameters and their default values.

emin, emax The lower and upper bounds of the interval to be searched for eigenvalues;
emin≤emax.

m0 On entry, specifies the initial guess for subspace size to be used, 0 < m0≤n.
Set m0≥m where m is the total number of eigenvalues located in the interval
[emin, emax]. If the initial guess is wrong, Extended Eigensolver routines
return info=3.


q On entry, if fpm[4]=1, the array q of size n by m contains a basis of guess
subspace where n is the order of the input matrix.

Output Parameters

ijob On exit, the parameter carries the status flag that indicates the condition of
the return. The status information is divided into three categories:

1. A zero value indicates successful completion of the task.


2. A positive value indicates that the solver requires a matrix-vector
multiplication or solving a specific system with a complex coefficient.
3. A negative value indicates successful initiation.

A non-zero value of ijob specifically means the following:

• ijob = 10 - factorize the complex matrix Ze*B - A at a given contour
  point Ze and return the control to the ?feast_srci/?feast_hrci
  routine, where Ze is a complex number representing the contour point; its
  value is defined internally in ?feast_srci/?feast_hrci.
• ijob = 11 - solve the complex linear system (Ze*B - A)*y = workc, put
  the solution in workc and return the control to the ?feast_srci/?feast_hrci
  routine.
• ijob = 20 - factorize the complex matrix (Ze*B - A)^H at a given contour
  point Ze and return the control to the ?feast_srci/?feast_hrci
  routine, where Ze is a complex number representing the contour point; its
  value is defined internally in ?feast_srci/?feast_hrci.

  The symbol X^H denotes the conjugate transpose of matrix X.

• ijob = 21 - solve the complex linear system (Ze*B - A)^H*y = workc, put
  the solution in workc and return the control to the ?feast_srci/?feast_hrci
  routine. The case ijob=20 becomes obsolete if the solve can be performed
  using the factorization computed for ijob=10.
• ijob = 30 - multiply matrix A by Qi..Qj, put the result in work + N*(i - 1),
  and return the control to the ?feast_srci/?feast_hrci routine.

  i is fpm[23], and j is fpm[23] + fpm[24] - 1.

• ijob = 40 - multiply matrix B by Qi..Qj, put the result in work + N*(i - 1)
  and return the control to the ?feast_srci/?feast_hrci routine. If a
  standard eigenvalue problem is solved, just return work = q.

  i is fpm[23], and j is fpm[23] + fpm[24] - 1.

• ijob = -2 - rerun the ?feast_srci/?feast_hrci task with the same
  parameters.

ze Defines the coordinate along the complex contour. All values of ze are
generated by ?feast_srci/?feast_hrci internally.

fpm On output, contains coordinates of columns of work array needed for
iterative refinement. (See Extended Eigensolver RCI Interface Description.)

epsout On output, contains the relative error on the trace: |trace_i - trace_{i-1}| / max(|emin|, |emax|).

loop On output, contains the number of refinement loops executed. Ignored on
input.

lambda Array of length m0. On output, the first m entries of lambda are eigenvalues
found in the interval.

q On output, q contains all eigenvectors corresponding to lambda.

m The total number of eigenvalues found in the interval [emin, emax]: 0 ≤ m ≤ m0.

res Array of length m0. On exit, the first m components contain the relative
residual vector:

• generalized eigenvalue problem: ||A*x_i - λ_i*B*x_i||_1 / (max(|emin|, |emax|) * ||B*x_i||_1)
• standard eigenvalue problem: ||A*x_i - λ_i*x_i||_1 / (max(|emin|, |emax|) * ||x_i||_1)

for i=0, 1, …, m - 1, and where m is the total number of eigenvalues found in
the search interval.

info If info=0, the execution is successful. If info≠0, see Output Eigensolver info Details.

Extended Eigensolver Predefined Interfaces

The predefined interfaces include routines for standard and generalized eigenvalue problems, and for dense,
banded, and sparse matrices.

Matrix Type    Standard Eigenvalue Problem      Generalized Eigenvalue Problem
Dense          ?feast_syev, ?feast_heev         ?feast_sygv, ?feast_hegv
Banded         ?feast_sbev, ?feast_hbev         ?feast_sbgv, ?feast_hbgv
Sparse         ?feast_scsrev, ?feast_hcsrev     ?feast_scsrgv, ?feast_hcsrgv

Matrix Storage
The symmetric and Hermitian matrices used in Extended Eigensolvers predefined interfaces can be stored in
full, band, and sparse formats.
• In the full storage format (described in Full Storage in additional detail) you store all elements, all of the
elements in the upper triangle of the matrix, or all of the elements in the lower triangle of the matrix.

• In the band storage format (described in Band storage in additional detail), you store only the elements
along a diagonal band of the matrix.
• In the sparse format (described in Storage Arrays for a Matrix in CSR Format (3-Array Variation)), you
store only the non-zero elements of the matrix.
In generalized eigenvalue systems you must use the same family of storage format for both matrices A and
B. The bandwidth can be different for the banded format (klb can be different from kla), and the position of
the non-zero elements can also be different for the sparse format (CSR coordinates ib and jb can be
different from ia and ja).

?feast_syev/?feast_heev
Extended Eigensolver interface for standard
eigenvalue problem with dense matrices.

Syntax
void sfeast_syev (const char * uplo, const MKL_INT * n, const float * a, const MKL_INT
* lda, MKL_INT * fpm, float * epsout, MKL_INT * loop, const float * emin, const float
* emax, MKL_INT * m0, float * e, float * x, MKL_INT * m, float * res, MKL_INT * info);
void dfeast_syev (const char * uplo, const MKL_INT * n, const double * a, const MKL_INT
* lda, MKL_INT * fpm, double * epsout, MKL_INT * loop, const double * emin, const
double * emax, MKL_INT * m0, double * e, double * x, MKL_INT * m, double * res,
MKL_INT * info);
void cfeast_heev (const char * uplo, const MKL_INT * n, const MKL_Complex8 * a, const
MKL_INT * lda, MKL_INT * fpm, float * epsout, MKL_INT * loop, const float * emin,
const float * emax, MKL_INT * m0, float * e, MKL_Complex8 * x, MKL_INT * m, float *
res, MKL_INT * info);
void zfeast_heev (const char * uplo, const MKL_INT * n, const MKL_Complex16 * a, const
MKL_INT * lda, MKL_INT * fpm, double * epsout, MKL_INT * loop, const double * emin,
const double * emax, MKL_INT * m0, double * e, MKL_Complex16 * x, MKL_INT * m, double
* res, MKL_INT * info);

Include Files
• mkl.h

Description
The routines compute all the eigenvalues and eigenvectors for standard eigenvalue problems, Ax = λx, within
a given search interval.

Input Parameters

uplo Must be 'U' or 'L' or 'F' .

If uplo = 'U', a stores the upper triangular parts of A.

If uplo = 'L', a stores the lower triangular parts of A.

If uplo= 'F' , a stores the full matrix A.

n Sets the size of the problem. n > 0.

a Array of dimension lda by n, contains either full matrix A or upper or lower
triangular part of the matrix A, as specified by uplo.

lda The leading dimension of the array a. Must be at least max(1, n).

fpm Array, dimension of 128. This array is used to pass various parameters to
Extended Eigensolver routines. See Extended Eigensolver Input Parameters
for a complete description of the parameters and their default values.

emin, emax The lower and upper bounds of the interval to be searched for eigenvalues;
emin≤emax.

m0 On entry, specifies the initial guess for subspace dimension to be used, 0 <
m0≤n. Set m0≥m where m is the total number of eigenvalues located in the
interval [emin, emax]. If the initial guess is wrong, Extended Eigensolver
routines return info=3.

x On entry, if fpm[4]=1, the array x of size n by m contains a basis of guess
subspace where n is the order of the input matrix.

Output Parameters

epsout On output, contains the relative error on the trace: |trace_i - trace_{i-1}| / max(|emin|, |emax|).

loop On output, contains the number of refinement loops executed. Ignored on
input.

e Array of length m0. On output, the first m entries of e are eigenvalues found
in the interval.

x On output, the first m columns of x contain the orthonormal eigenvectors
corresponding to the computed eigenvalues e, with the i-th column of x
holding the eigenvector associated with e[i].

m The total number of eigenvalues found in the interval [emin, emax]: 0 ≤ m ≤ m0.

res Array of length m0. On exit, the first m components contain the relative
residual vector:

||A*x_i - e_i*x_i||_1 / (max(|emin|, |emax|) * ||x_i||_1)

for i=1, 2, …, m, and where m is the total number of eigenvalues found in
the search interval.

info If info=0, the execution is successful. If info≠0, see Output Eigensolver info Details.
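
The following minimal sketch shows one possible call to dfeast_syev for a small symmetric tridiagonal
matrix stored in full format. It assumes that feastinit (not described in this section) is used to initialize
fpm with default values, and it omits error handling.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* 3x3 symmetric tridiagonal matrix, stored in full format (uplo = 'F') */
    const MKL_INT n = 3, lda = 3;
    double a[9] = { 2.0, -1.0,  0.0,
                   -1.0,  2.0, -1.0,
                    0.0, -1.0,  2.0 };
    MKL_INT fpm[128], loop = 0, m0 = 3, m = 0, info = 0;
    double epsout = 0.0, emin = 0.0, emax = 5.0;   /* search interval */
    double e[3], x[9], res[3];                     /* e, res: length m0; x: n by m0 */
    char uplo = 'F';

    feastinit(fpm);                                /* default parameters */
    dfeast_syev(&uplo, &n, a, &lda, fpm, &epsout, &loop,
                &emin, &emax, &m0, e, x, &m, res, &info);

    if (info == 0)
        for (MKL_INT i = 0; i < m; i++)
            printf("e[%lld] = %f\n", (long long)i, e[i]);
    return (int)info;
}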

?feast_sygv/?feast_hegv
Extended Eigensolver interface for generalized
eigenvalue problem with dense matrices.

Syntax
void sfeast_sygv (const char * uplo, const MKL_INT * n, const float * a, const MKL_INT
* lda, const float * b, const MKL_INT * ldb, MKL_INT * fpm, float * epsout, MKL_INT *
loop, const float * emin, const float * emax, MKL_INT * m0, float * e, float * x,
MKL_INT * m, float * res, MKL_INT * info);

1741
6 Intel® Math Kernel Library Developer Reference

void dfeast_sygv (const char * uplo, const MKL_INT * n, const double * a, const MKL_INT
* lda, const double * b, const MKL_INT * ldb, MKL_INT * fpm, double * epsout, MKL_INT
* loop, const double * emin, const double * emax, MKL_INT * m0, double * e, double *
x, MKL_INT * m, double * res, MKL_INT * info);
void cfeast_hegv (const char * uplo, const MKL_INT * n, const MKL_Complex8 * a, const
MKL_INT * lda, const MKL_Complex8 * b, const MKL_INT * ldb, MKL_INT * fpm, float *
epsout, MKL_INT * loop, const float * emin, const float * emax, MKL_INT * m0, float *
e, MKL_Complex8 * x, MKL_INT * m, float * res, MKL_INT * info);
void zfeast_hegv (const char * uplo, const MKL_INT * n, const MKL_Complex16 * a, const
MKL_INT * lda, const MKL_Complex16 * b, const MKL_INT * ldb, MKL_INT * fpm, double *
epsout, MKL_INT * loop, const double * emin, const double * emax, MKL_INT * m0, double
* e, MKL_Complex16 * x, MKL_INT * m, double * res, MKL_INT * info);

Include Files
• mkl.h

Description
The routines compute all the eigenvalues and eigenvectors for generalized eigenvalue problems, Ax = λBx,
within a given search interval.

Input Parameters

uplo Must be 'U' or 'L' or 'F' .

If UPLO = 'U', a and b store the upper triangular parts of A and B respectively.

If UPLO = 'L', a and b store the lower triangular parts of A and B respectively.

If UPLO = 'F', a and b store the full matrices A and B respectively.

n Sets the size of the problem. n > 0.

a Array of dimension lda by n, contains either full matrix A or upper or lower
triangular part of the matrix A, as specified by uplo.

lda The leading dimension of the array a. Must be at least max(1, n).

b Array of dimension ldb by n, contains either full matrix B or upper or lower
triangular part of the matrix B, as specified by uplo.

ldb The leading dimension of the array b. Must be at least max(1, n).

fpm Array, dimension of 128. This array is used to pass various parameters to
Extended Eigensolver routines. See Extended Eigensolver Input Parameters
for a complete description of the parameters and their default values.

emin, emax The lower and upper bounds of the interval to be searched for eigenvalues;
emin≤emax.

m0 On entry, specifies the initial guess for subspace dimension to be used, 0 <
m0≤n. Set m0≥m where m is the total number of eigenvalues located in the
interval [emin, emax]. If the initial guess is wrong, Extended Eigensolver
routines return info=3.

x On entry, if fpm[4]=1, the array x of size n by m contains a basis of guess
subspace where n is the order of the input matrix.

Output Parameters

epsout On output, contains the relative error on the trace: |trace_i - trace_{i-1}| / max(|emin|, |emax|).

loop On output, contains the number of refinement loops executed. Ignored on
input.

e Array of length m0. On output, the first m entries of e are eigenvalues found
in the interval.

x On output, the first m columns of x contain the orthonormal eigenvectors
corresponding to the computed eigenvalues e, with the i-th column of x
holding the eigenvector associated with e[i].

m The total number of eigenvalues found in the interval [emin, emax]: 0 ≤ m ≤ m0.

res Array of length m0. On exit, the first m components contain the relative
residual vector:

||A*x_i - e_i*B*x_i||_1 / (max(|emin|, |emax|) * ||B*x_i||_1)

for i=1, 2, …, m, and where m is the total number of eigenvalues found in
the search interval.

info If info=0, the execution is successful. If info≠0, see Output Eigensolver info Details.
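
A generalized problem is set up in the same way, with the additional matrix B, which must be symmetric
positive definite for ?feast_sygv. The sketch below uses a simple diagonal B; as in the previous example,
feastinit and full storage are assumed and error handling is omitted.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    const MKL_INT n = 3, lda = 3, ldb = 3;
    double a[9] = { 2.0, -1.0,  0.0,
                   -1.0,  2.0, -1.0,
                    0.0, -1.0,  2.0 };
    double b[9] = { 1.0,  0.0,  0.0,      /* symmetric positive definite B */
                    0.0,  1.0,  0.0,
                    0.0,  0.0,  2.0 };
    MKL_INT fpm[128], loop = 0, m0 = 3, m = 0, info = 0;
    double epsout = 0.0, emin = 0.0, emax = 5.0;
    double e[3], x[9], res[3];
    char uplo = 'F';

    feastinit(fpm);
    dfeast_sygv(&uplo, &n, a, &lda, b, &ldb, fpm, &epsout, &loop,
                &emin, &emax, &m0, e, x, &m, res, &info);
    printf("info = %d, eigenvalues found: %d\n", (int)info, (int)m);
    return (int)info;
}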

?feast_sbev/?feast_hbev
Extended Eigensolver interface for standard
eigenvalue problem with banded matrices.

Syntax
void sfeast_sbev (const char * uplo, const MKL_INT * n, const MKL_INT * kla, const
float * a, const MKL_INT * lda, MKL_INT * fpm, float * epsout, MKL_INT * loop, const
float * emin, const float * emax, MKL_INT * m0, float * e, float * x, MKL_INT * m,
float * res, MKL_INT * info);
void dfeast_sbev (const char * uplo, const MKL_INT * n, const MKL_INT * kla, const
double * a, const MKL_INT * lda, MKL_INT * fpm, double * epsout, MKL_INT * loop, const
double * emin, const double * emax, MKL_INT * m0, double * e, double * x, MKL_INT * m,
double * res, MKL_INT * info);
void cfeast_hbev (const char * uplo, const MKL_INT * n, const MKL_INT * kla, const
MKL_Complex8 * a, const MKL_INT * lda, MKL_INT * fpm, float * epsout, MKL_INT * loop,
const float * emin, const float * emax, MKL_INT * m0, float * e, MKL_Complex8 * x,
MKL_INT * m, float * res, MKL_INT * info);


void zfeast_hbev (const char * uplo, const MKL_INT * n, const MKL_INT * kla, const
MKL_Complex16 * a, const MKL_INT * lda, MKL_INT * fpm, double * epsout, MKL_INT *
loop, const double * emin, const double * emax, MKL_INT * m0, double * e,
MKL_Complex16 * x, MKL_INT * m, double * res, MKL_INT * info);

Include Files
• mkl.h

Description
The routines compute all the eigenvalues and eigenvectors for standard eigenvalue problems, Ax = λx, within
a given search interval.

Input Parameters

uplo Must be 'U' or 'L' or 'F' .

If uplo = 'U', a stores the upper triangular parts of A.

If uplo = 'L', a stores the lower triangular parts of A.

If uplo= 'F' , a stores the full matrix A.

n Sets the size of the problem. n > 0.

kla The number of super- or sub-diagonals within the band in A (kla≥ 0).

a Array of dimension lda by n, contains either full matrix A or upper or lower
triangular part of the matrix A, as specified by uplo.

lda The leading dimension of the array a. Must be at least max(1, n).

fpm Array, dimension of 128. This array is used to pass various parameters to
Extended Eigensolver routines. See Extended Eigensolver Input Parameters
for a complete description of the parameters and their default values.

emin, emax The lower and upper bounds of the interval to be searched for eigenvalues;
emin≤emax.

m0 On entry, specifies the initial guess for subspace dimension to be used, 0 <
m0≤n. Set m0≥m where m is the total number of eigenvalues located in the
interval [emin, emax]. If the initial guess is wrong, Extended Eigensolver
routines return info=3.

x On entry, if fpm[4]=1, the array x of size n by m contains a basis of guess
subspace where n is the order of the input matrix.

Output Parameters

epsout On output, contains the relative error on the trace: |trace_i - trace_{i-1}| / max(|emin|, |emax|).

loop On output, contains the number of refinement loops executed. Ignored on
input.

e Array of length m0. On output, the first m entries of e are eigenvalues found
in the interval.

x On output, the first m columns of x contain the orthonormal eigenvectors
corresponding to the computed eigenvalues e, with the i-th column of x
holding the eigenvector associated with e[i].

m The total number of eigenvalues found in the interval [emin, emax]: 0 ≤ m ≤ m0.

res Array of length m0. On exit, the first m components contain the relative
residual vector:

||A*x_i - e_i*x_i||_1 / (max(|emin|, |emax|) * ||x_i||_1)

for i=1, 2, …, m, and where m is the total number of eigenvalues found in
the search interval.

info If info=0, the execution is successful. If info≠0, see Output Eigensolver info Details.

?feast_sbgv/?feast_hbgv
Extended Eigensolver interface for generalized
eigenvalue problem with banded matrices.

Syntax
void sfeast_sbgv (const char * uplo, const MKL_INT * n, const MKL_INT * kla, const
float * a, const MKL_INT * lda, const MKL_INT * klb, const float * b, const MKL_INT *
ldb, MKL_INT * fpm, float * epsout, MKL_INT * loop, const float * emin, const float *
emax, MKL_INT * m0, float * e, float * x, MKL_INT * m, float * res, MKL_INT * info);
void dfeast_sbgv (const char * uplo, const MKL_INT * n, const MKL_INT * kla, const
double * a, const MKL_INT * lda, const MKL_INT * klb, const double * b, const MKL_INT *
ldb, MKL_INT * fpm, double * epsout, MKL_INT * loop, const double * emin, const double
* emax, MKL_INT * m0, double * e, double * x, MKL_INT * m, double * res, MKL_INT *
info);
void cfeast_hbgv (const char * uplo, const MKL_INT * n, const MKL_INT * kla, const
MKL_Complex8 * a, const MKL_INT * lda, const MKL_INT * klb, const MKL_Complex8 * b,
const MKL_INT * ldb, MKL_INT * fpm, float * epsout, MKL_INT * loop, const float *
emin, const float * emax, MKL_INT * m0, float * e, MKL_Complex8 * x, MKL_INT * m,
float * res, MKL_INT * info);
void zfeast_hbgv (const char * uplo, const MKL_INT * n, const MKL_INT * kla, const
MKL_Complex16 * a, const MKL_INT * lda, const MKL_INT * klb, const MKL_Complex16 * b,
const MKL_INT * ldb, MKL_INT * fpm, double * epsout, MKL_INT * loop, const double *
emin, const double * emax, MKL_INT * m0, double * e, MKL_Complex16 * x, MKL_INT * m,
double * res, MKL_INT * info);

Include Files
• mkl.h

Description
The routines compute all the eigenvalues and eigenvectors for generalized eigenvalue problems, Ax = λBx,
within a given search interval.


NOTE
Both matrices A and B must use the same family of storage format. The bandwidth, however, can be
different (klb can be different from kla).

Input Parameters

uplo Must be 'U' or 'L' or 'F' .

If UPLO = 'U', a and b store the upper triangular parts of A and B respectively.

If UPLO = 'L', a and b store the lower triangular parts of A and B respectively.

If UPLO = 'F', a and b store the full matrices A and B respectively.

n Sets the size of the problem. n > 0.

kla The number of super- or sub-diagonals within the band in A (kla≥ 0).

a Array of dimension lda by n, contains either full matrix A or upper or lower
triangular part of the matrix A, as specified by uplo.

lda The leading dimension of the array a. Must be at least max(1, n).

klb The number of super- or sub-diagonals within the band in B (klb≥ 0).

b Array of dimension ldb by n, contains either full matrix B or upper or lower
triangular part of the matrix B, as specified by uplo.

ldb The leading dimension of the array b. Must be at least max(1, n).

fpm Array, dimension of 128. This array is used to pass various parameters to
Extended Eigensolver routines. See Extended Eigensolver Input Parameters
for a complete description of the parameters and their default values.

emin, emax The lower and upper bounds of the interval to be searched for eigenvalues;
emin≤emax.

m0 On entry, specifies the initial guess for subspace dimension to be used, 0 <
m0≤n. Set m0≥m where m is the total number of eigenvalues located in the
interval [emin, emax]. If the initial guess is wrong, Extended Eigensolver
routines return info=3.

x On entry, if fpm[4]=1, the array x of size n by m contains a basis of guess
subspace where n is the order of the input matrix.

Output Parameters

epsout On output, contains the relative error on the trace: |trace_i - trace_{i-1}| / max(|emin|, |emax|).

loop On output, contains the number of refinement loops executed. Ignored on
input.

e Array of length m0. On output, the first m entries of e are eigenvalues found
in the interval.

x On output, the first m columns of x contain the orthonormal eigenvectors
corresponding to the computed eigenvalues e, with the i-th column of x
holding the eigenvector associated with e[i].

m The total number of eigenvalues found in the interval [emin, emax]: 0 ≤ m ≤ m0.

res Array of length m0. On exit, the first m components contain the relative
residual vector:

||A*x_i - e_i*B*x_i||_1 / (max(|emin|, |emax|) * ||B*x_i||_1)

for i=1, 2, …, m, and where m is the total number of eigenvalues found in
the search interval.

info If info=0, the execution is successful. If info≠0, see Output Eigensolver info Details.

?feast_scsrev/?feast_hcsrev
Extended Eigensolver interface for standard
eigenvalue problem with sparse matrices.

Syntax
void sfeast_scsrev (const char * uplo, const MKL_INT * n, const float * a, const
MKL_INT * ia, const MKL_INT * ja, MKL_INT * fpm, float * epsout, MKL_INT * loop, const
float * emin, const float * emax, MKL_INT * m0, float * e, float * x, MKL_INT * m,
float * res, MKL_INT * info);
void dfeast_scsrev (const char * uplo, const MKL_INT * n, const double * a, const
MKL_INT * ia, const MKL_INT * ja, MKL_INT * fpm, double * epsout, MKL_INT * loop,
const double * emin, const double * emax, MKL_INT * m0, double * e, double * x,
MKL_INT * m, double * res, MKL_INT * info);
void cfeast_hcsrev (const char * uplo, const MKL_INT * n, const MKL_Complex8 * a, const
MKL_INT * ia, const MKL_INT * ja, MKL_INT * fpm, float * epsout, MKL_INT * loop, const
float * emin, const float * emax, MKL_INT * m0, float * e, MKL_Complex8 * x, MKL_INT *
m, float * res, MKL_INT * info);
void zfeast_hcsrev (const char * uplo, const MKL_INT * n, const MKL_Complex16 * a,
const MKL_INT * ia, const MKL_INT * ja, MKL_INT * fpm, double * epsout, MKL_INT *
loop, const double * emin, const double * emax, MKL_INT * m0, double * e,
MKL_Complex16 * x, MKL_INT * m, double * res, MKL_INT * info);

Include Files
• mkl.h

Description
The routines compute all the eigenvalues and eigenvectors for standard eigenvalue problems, Ax = λx, within
a given search interval.


Input Parameters

uplo Must be 'U' or 'L' or 'F' .

If uplo = 'U', a stores the upper triangular parts of A.

If uplo = 'L', a stores the lower triangular parts of A.

If uplo= 'F' , a stores the full matrix A.

n Sets the size of the problem. n > 0.

a Array containing the nonzero elements of either the full matrix A or the
upper or lower triangular part of the matrix A, as specified by uplo.

ia Array of length n + 1, containing indices of elements in the array a, such
that ia[i - 1] is the index in the array a of the first non-zero element from
row i. The value of the last element ia[n] is equal to the number of non-zeros plus one.

ja Array containing the column indices for each non-zero element of the
matrix A being represented in the array a. Its length is equal to the length
of the array a.

fpm Array, dimension of 128. This array is used to pass various parameters to
Extended Eigensolver routines. See Extended Eigensolver Input Parameters
for a complete description of the parameters and their default values.

emin, emax The lower and upper bounds of the interval to be searched for eigenvalues;
emin≤emax.

m0 On entry, specifies the initial guess for subspace dimension to be used, 0 <
m0≤n. Set m0≥m where m is the total number of eigenvalues located in the
interval [emin, emax]. If the initial guess is wrong, Extended Eigensolver
routines return info=3.

x On entry, if fpm[4]=1, the array x of size n by m contains a basis of guess
subspace where n is the order of the input matrix.

Output Parameters

fpm On output, the last 64 values correspond to Intel MKL PARDISO iparm[0]
to iparm[63] (regardless of the value of fpm[63] on input).

epsout On output, contains the relative error on the trace: |trace_i - trace_{i-1}| / max(|emin|, |emax|).

loop On output, contains the number of refinement loops executed. Ignored on
input.

e Array of length m0. On output, the first m entries of e are eigenvalues found
in the interval.

x On output, the first m columns of x contain the orthonormal eigenvectors
corresponding to the computed eigenvalues e, with the i-th column of x
holding the eigenvector associated with e[i].

m The total number of eigenvalues found in the interval [emin, emax]: 0
≤m≤m0.

res Array of length m0. On exit, the first m components contain the relative
residual vector:

||A*x_i - e_i*x_i||_1 / (max(|emin|, |emax|) * ||x_i||_1)

for i=1, 2, …, m, and where m is the total number of eigenvalues found in
the search interval.

info If info=0, the execution is successful. If info≠0, see Output Eigensolver info Details.
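
The sketch below builds the upper triangle of a small tridiagonal matrix in the 3-array CSR format with
one-based row pointers and column indices (as implied by the description of ia above) and calls
dfeast_scsrev. It assumes feastinit supplies the default fpm values and omits error handling.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    const MKL_INT n = 3;
    /* Upper triangle (uplo = 'U') of the tridiagonal matrix with 2 on the
       diagonal and -1 on the off-diagonals, in one-based CSR format */
    double  a[5]  = { 2.0, -1.0, 2.0, -1.0, 2.0 };
    MKL_INT ia[4] = { 1, 3, 5, 6 };      /* row pointers, ia[n] = nnz + 1 */
    MKL_INT ja[5] = { 1, 2, 2, 3, 3 };   /* column indices */
    MKL_INT fpm[128], loop = 0, m0 = 3, m = 0, info = 0;
    double epsout = 0.0, emin = 0.0, emax = 5.0;
    double e[3], x[9], res[3];
    char uplo = 'U';

    feastinit(fpm);
    dfeast_scsrev(&uplo, &n, a, ia, ja, fpm, &epsout, &loop,
                  &emin, &emax, &m0, e, x, &m, res, &info);
    printf("info = %d, eigenvalues found: %d\n", (int)info, (int)m);
    return (int)info;
}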

?feast_scsrgv/?feast_hcsrgv
Extended Eigensolver interface for generalized
eigenvalue problem with sparse matrices.

Syntax
void sfeast_scsrgv (const char * uplo, const MKL_INT * n, const float * a, const
MKL_INT * ia, const MKL_INT * ja, const float * b, const MKL_INT * ib, const MKL_INT *
jb, MKL_INT * fpm, float * epsout, MKL_INT * loop, const float * emin, const float *
emax, MKL_INT * m0, float * e, float * x, MKL_INT * m, float * res, MKL_INT * info);
void dfeast_scsrgv (const char * uplo, const MKL_INT * n, const double * a, const
MKL_INT * ia, const MKL_INT * ja, const double * b, const MKL_INT * ib, const MKL_INT *
jb, MKL_INT * fpm, double * epsout, MKL_INT * loop, const double * emin, const double
* emax, MKL_INT * m0, double * e, double * x, MKL_INT * m, double * res, MKL_INT *
info);
void cfeast_hcsrgv (const char * uplo, const MKL_INT * n, const MKL_Complex8 * a, const
MKL_INT * ia, const MKL_INT * ja, const MKL_Complex8 * b, const MKL_INT * ib, const
MKL_INT * jb, MKL_INT * fpm, float * epsout, MKL_INT * loop, const float * emin, const
float * emax, MKL_INT * m0, float * e, MKL_Complex8 * x, MKL_INT * m, float * res,
MKL_INT * info);
void zfeast_hcsrgv (const char * uplo, const MKL_INT * n, const MKL_Complex16 * a,
const MKL_INT * ia, const MKL_INT * ja, const MKL_Complex16 * b, const MKL_INT * ib,
const MKL_INT * jb, MKL_INT * fpm, double * epsout, MKL_INT * loop, const double *
emin, const double * emax, MKL_INT * m0, double * e, MKL_Complex16 * x, MKL_INT * m,
double * res, MKL_INT * info);

Include Files
• mkl.h

Description
The routines compute all the eigenvalues and eigenvectors for generalized eigenvalue problems, Ax = λBx,
within a given search interval.


NOTE
Both matrices A and B must use the same family of storage format. The position of the non-zero
elements can be different (CSR coordinates ib and jb can be different from ia and ja).

Input Parameters

uplo Must be 'U' or 'L' or 'F' .

If UPLO = 'U', a and b store the upper triangular parts of A and B respectively.

If UPLO = 'L', a and b store the lower triangular parts of A and B respectively.

If UPLO = 'F', a and b store the full matrices A and B respectively.

n Sets the size of the problem. n > 0.

a Array containing the nonzero elements of either the full matrix A or the
upper or lower triangular part of the matrix A, as specified by uplo.

ia Array of length n + 1, containing indices of elements in the array a, such
that ia[i - 1] is the index in the array a of the first non-zero element
from row i. The value of the last element ia[n] is equal to the number
of non-zeros plus one.

ja Array containing the column indices for each non-zero element of the
matrix A being represented in the array a . Its length is equal to the length
of the array a.

b Array containing the nonzero elements of either the full matrix B or the
upper or lower triangular part of the matrix B, as specified by uplo.

ib Array of length n + 1, containing indices of elements in the array b, such
that ib[i - 1] is the index in the array b of the first non-zero element
from row i. The value of the last element ib[n] is equal to the number
of non-zeros plus one.

jb Array containing the column indices for each non-zero element of the
matrix B being represented in the array b . Its length is equal to the length
of the array b.

fpm Array, dimension of 128. This array is used to pass various parameters to
Extended Eigensolver routines. See Extended Eigensolver Input Parameters
for a complete description of the parameters and their default values.

emin, emax The lower and upper bounds of the interval to be searched for eigenvalues;
emin≤emax.

m0 On entry, specifies the initial guess for subspace dimension to be used, 0 <
m0≤n. Set m0≥m where m is the total number of eigenvalues located in the
interval [emin, emax]. If the initial guess is wrong, Extended Eigensolver
routines return info=3.

x On entry, if fpm[4]=1, the array x of size n by m contains a basis of guess
subspace where n is the order of the input matrix.

Output Parameters

fpm On output, the last 64 values correspond to Intel MKL PARDISO iparm[0]
to iparm[63] (regardless of the value of fpm[63] on input).

epsout On output, contains the relative error on the trace: |trace_i - trace_{i-1}| / max(|emin|, |emax|).

loop On output, contains the number of refinement loops executed. Ignored on
input.

e Array of length m0. On output, the first m entries of e are eigenvalues found
in the interval.

x On output, the first m columns of x contain the orthonormal eigenvectors
corresponding to the computed eigenvalues e, with the i-th column of x
holding the eigenvector associated with e[i].

m The total number of eigenvalues found in the interval [emin, emax]: 0 ≤ m ≤ m0.

res Array of length m0. On exit, the first m components contain the relative
residual vector:

||A*x_i - λ_i*B*x_i||_1 / (max(|emin|, |emax|) * ||B*x_i||_1)

for i=1, 2, …, m, and where m is the total number of eigenvalues found in
the search interval.

info If info=0, the execution is successful. If info≠0, see Output Eigensolver info Details.

Chapter 7: Vector Mathematical Functions
This chapter describes Intel® MKL Vector Mathematics functions (VM), which compute a mathematical
function of each of the vector elements. VM includes a set of highly optimized functions (arithmetic, power,
trigonometric, exponential, hyperbolic, special, and rounding) that operate on vectors of real and complex
numbers.
Application programs that improve performance with VM include nonlinear programming software,
computation of integrals, financial calculations, computer graphics, and many others.
VM functions fall into the following groups according to the operations they perform:
• VM Mathematical Functions compute values of mathematical functions, such as sine, cosine, exponential,
or logarithm, on vectors stored contiguously in memory.
• VM Pack/Unpack Functions convert to and from vectors with positive increment indexing, vector indexing,
and mask indexing (see Appendix B for details on vector indexing methods).
• VM Service Functions set/get the accuracy modes and the error codes, and free memory.
The VM mathematical functions take an input vector as an argument, compute values of the respective
function element-wise, and return the results in an output vector. All the VM mathematical functions can
perform in-place operations, where the input and output arrays are at the same memory locations.
The Intel MKL interfaces are given in mkl_vml_functions.h.
Examples that demonstrate how to use the VM functions are located in:
${MKL}/examples/vmlc/source
See VM performance and accuracy data in the online VM Performance and Accuracy Data document available
at http://software.intel.com/en-us/articles/intel-math-kernel-library-documentation/

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

VM Data Types, Accuracy Modes, and Performance Tips


VM includes mathematical and pack/unpack vector functions for single and double precision vector
arguments of real and complex types. Intel MKL provides Fortran and C interfaces for all VM functions,
including the associated service functions. The Function Naming Conventions section below shows how to call
these functions.
Performance depends on a number of factors, including vectorization and threading overhead. The
recommended usage is as follows:

• Use VM for vector lengths larger than 40 elements.


• Use the Intel® Compiler for vector lengths less than 40 elements.
All VM vector functions support the following accuracy modes:

1753
7 Intel® Math Kernel Library Developer Reference

• High Accuracy (HA), the default mode


• Low Accuracy (LA), which improves performance by reducing accuracy of the two least significant bits
• Enhanced Performance (EP), which provides better performance at the cost of significantly reduced
accuracy. Approximately half of the bits in the mantissa are correct.
Note that using the EP mode does not guarantee accurate processing of corner cases and special values.
Although the default accuracy is HA, LA is sufficient in most cases. For applications that require less accuracy
(for example, media applications, some Monte Carlo simulations, etc.), the EP mode may be sufficient.
VM handles special values in accordance with the C99 standard [C99].
Intel MKL offers both functions and environment variables to switch between modes for VM. See the Intel
MKL Developer Guide for details about the environment variables. Use the vmlSetMode(mode) function (see
Table "Values of the mode Parameter") to switch between the HA, LA, and EP modes. The vmlGetMode()
function returns the current mode.
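
For example, the following sketch temporarily switches VM to the LA mode for one vdExp call and then
restores the previous mode; the integer type used for the saved mode (unsigned int) is an assumption.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    double a[4] = { 0.0, 1.0, 2.0, 3.0 }, y[4];

    unsigned int oldmode = vmlSetMode(VML_LA);  /* switch to Low Accuracy, keep old mode */
    vdExp(4, a, y);                             /* computed in LA mode */
    vmlSetMode(oldmode);                        /* restore the previous mode */

    printf("exp(2.0) ~ %f\n", y[2]);
    return 0;
}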

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

See Also
VM Naming Conventions

VM Naming Conventions
The VM function names are of mixed (lower and upper) case.
The VM mathematical and pack/unpack function names have the following structure:
v[m]<?><name><mod>
where

• v is a prefix indicating vector operations.


• [m] is an optional prefix for mathematical functions that indicates additional argument to specify a VM
mode for a given function call (see vmlSetMode for possible values and their description).
• <?> is a precision prefix that indicates one of the following data types:
s float.

d double.

c MKL_Complex8.

z MKL_Complex16.

• <name> indicates the function short name, with some of its letters in uppercase. See examples in Table
"VM Mathematical Functions".
• <mod> field (written in uppercase) is present only in the pack/unpack functions and indicates the
indexing method used:

i indexing with a positive increment

v indexing with an index vector

1754
Vector Mathematical Functions 7
m indexing with a mask vector.

The VM service function names have the following structure:


vml<name>
where
<name> indicates the function short name, with some of its letters in uppercase. See examples in Table "VM
Service Functions".
To call VM functions from an application program, use conventional function calls. For example, call the
vector single precision real exponential function as
vsExp ( n, a, y );
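
A short sketch contrasting the plain and the mode-argument variants of the same function; the VML_EP
constant selects the Enhanced Performance mode for that single call.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    const MKL_INT n = 3;
    float a[3] = { 0.0f, 1.0f, 2.0f }, y[3], z[3];

    vsExp(n, a, y);              /* v  + s prefix: single precision, global mode */
    vmsExp(n, a, z, VML_EP);     /* vm + s prefix: per-call mode argument        */

    for (MKL_INT i = 0; i < n; i++)
        printf("%f %f\n", y[i], z[i]);
    return 0;
}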

VM Function Interfaces
VM interfaces include the function names and argument lists. The following sections describe the interfaces
for the VM functions. Note that some of the functions have multiple input and output arguments.
Some VM functions may also take scalar arguments as input. See the function description for the naming
conventions of such arguments.

VM Mathematical Function Interfaces


v<?><name>( n, a, [scalar input arguments,]y );
v<?><name>( n, a, b, [scalar input arguments,]y );
v<?><name>( n, a, y, z );
vm<?><name>( n, a, [scalar input arguments,]y, mode );
vm<?><name>( n, a, b, [scalar input arguments,]y, mode );
vm<?><name>( n, a, y, z, mode );

VM Pack Function Interfaces


v<?>PackI( n, a, inca, y );
v<?>PackV( n, a, ia, y );
v<?>PackM( n, a, ma, y );

VM Unpack Function Interfaces


v<?>UnpackI( n, a, y, incy );
v<?>UnpackV( n, a, y, iy );
v<?>UnpackM( n, a, y, my );

VM Service Function Interfaces


oldmode = vmlSetMode( mode );
mode = vmlGetMode( void );
olderr = vmlSetErrStatus ( err );
err = vmlGetErrStatus( void );
olderr = vmlClearErrStatus( void );
oldcallback = vmlSetErrorCallBack( callback );
callback = vmlGetErrorCallBack( void );
oldcallback = vmlClearErrorCallBack( void );

Note that oldmode, olderr, and oldcallback refer to settings prior to the call.

1755
7 Intel® Math Kernel Library Developer Reference

VM Input Function Interfaces


n number of elements to be calculated

a first input vector

b second input vector

inca vector increment for the input vector a

ia index vector for the input vector a

ma mask vector for the input vector a

incy vector increment for the output vector y

iy index vector for the output vector y

my mask vector for the output vector y

err error code

mode VM mode

callback address of the callback function

VM Output Function Interfaces


y first output vector

z second output vector

err error code

mode VM mode

olderr former error code

oldmode former VM mode

callback address of the callback function

oldcallback address of the former callback function

See the data types of the parameters used in each function in the respective function description section. All
the Intel MKL VM mathematical functions can perform in-place operations.

Vector Indexing Methods


VM mathematical functions work only with unit stride. To accommodate arrays with other increments, or
more complicated indexing, you can gather the elements into a contiguous vector and then scatter them
after the computation is complete.
VM uses the three following indexing methods to do this task:

• positive increment
• index vector
• mask vector

1756
Vector Mathematical Functions 7
The indexing method used in a particular function is indicated by the indexing modifier (see the description of
the <mod> field in Function Naming Conventions). For more information on the indexing methods, see Vector
Arguments in VM in Appendix B.
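
The sketch below gathers every second element of an array with v?PackI, computes the sine on the
contiguous copy, and scatters the results back with v?UnpackI. The argument order follows the interfaces
shown above; the exact pointer and integer types used here are assumptions.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    const MKL_INT n = 4;
    /* process every second element of src (increment 2) */
    double src[8] = { 0.1, 9.0, 0.2, 9.0, 0.3, 9.0, 0.4, 9.0 };
    double packed[4], sines[4];

    vdPackI(n, src, 2, packed);   /* gather the strided elements into packed   */
    vdSin(n, packed, sines);      /* unit-stride computation                   */
    vdUnpackI(n, sines, src, 2);  /* scatter the results back with increment 2 */

    for (int i = 0; i < 8; i++)
        printf("src[%d] = %f\n", i, src[i]);
    return 0;
}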

VM Error Diagnostics
The VM mathematical functions incorporate the error handling mechanism, which is controlled by the
following service functions:

vmlGetErrStatus, vmlSetErrStatus, vmlClearErrStatus
These functions operate with a global variable called VM Error Status. The VM Error Status
flags an error, a warning, or a successful execution of a VM function.

vmlGetErrorCallBack, vmlSetErrorCallBack, vmlClearErrorCallBack
These functions enable you to customize the error handling. For example, you can identify a
particular argument in a vector where an error occurred or that caused a warning.

vmlSetMode, vmlGetMode
These functions get and set a VM mode. If you set a new VM mode using the vmlSetMode
function, you can store the previous VM mode returned by the routine and restore it at any
point of your application.

If both an error and a warning situation occur during the function call, the VM Error Status variable keeps
only the value of the error code. See Table "Values of the VM Error Status" for possible values. If a VM
function does not encounter errors or warnings, it sets the VM Error Status to VML_STATUS_OK.
If you use incorrect input arguments to a VM function (VML_STATUS_BADSIZE and VML_STATUS_BADMEM), the
function calls xerbla to report errors. See Table "Values of the VM Error Status" for details.
You can use the vmlSetMode and vmlGetMode functions to modify error handling behavior. Depending on the
VM mode, the error handling behavior includes the following operations:

• setting the VM Error Status to a value corresponding to the observed error or warning
• setting the errno variable to one of the values described in Table "Set Values of the errno Variable"
• writing error text information to the stderr stream
• raising the appropriate exception on an error, if necessary
• calling the additional error handler callback function that is set by vmlSetErrorCallBack.

Set Values of the errno Variable


Value of errno Description

0 No errors are detected.


EINVAL The array dimension is not positive.
EACCES NULL pointer is passed.
EDOM At least one of array values is out of a range of definition.
ERANGE At least one of array values caused a singularity, overflow or
underflow.
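
A minimal sketch of querying the VM Error Status (and, depending on the VM mode, errno) after a call
whose input contains an out-of-domain value:

#include <stdio.h>
#include <errno.h>
#include "mkl.h"

int main(void)
{
    double a[3] = { 1.0, -1.0, 4.0 }, y[3];   /* -1.0 is outside the sqrt domain */

    vmlClearErrStatus();                      /* start from VML_STATUS_OK */
    errno = 0;
    vdSqrt(3, a, y);

    int status = vmlGetErrStatus();
    if (status != VML_STATUS_OK)
        printf("VM Error Status = %d, errno = %d\n", status, errno);
    return 0;
}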

See Also
vmlGetErrStatus Gets the VM Error Status.
vmlSetErrStatus Sets the new VM Error Status according to err and stores the previous VM Error
Status to olderr.
vmlClearErrStatus Sets the VM Error Status to VML_STATUS_OK and stores the previous VM Error
Status to olderr.
vmlSetErrorCallBack Sets the additional error handler callback function and gets the old callback
function.
vmlGetErrorCallBack Gets the additional error handler callback function.

1757
7 Intel® Math Kernel Library Developer Reference

vmlClearErrorCallBack Deletes the additional error handler callback function and retrieves the
former callback function.
vmlGetMode Gets the VM mode.
vmlSetMode Sets a new mode for VM functions according to the mode parameter and stores the
previous VM mode to oldmode.

VM Mathematical Functions
This section describes VM functions that compute values of mathematical functions on real and complex
vector arguments with unit increment.
Each function is introduced by its short name, a brief description of its purpose, and the calling sequence for
each type of data, as well as a description of the input/output arguments.
The input range of parameters is equal to the mathematical range of the input data type, unless the function
description specifies input threshold values, which mark off the precision overflow, as follows:

• FLT_MAX denotes the maximum number representable in single precision real data type
• DBL_MAX denotes the maximum number representable in double precision real data type
Table "VM Mathematical Functions" lists available mathematical functions and associated data types.
VM Mathematical Functions
Function Data Types Description
Arithmetic Functions
v?Add s, d, c, z Addition of vector elements
v?Sub s, d, c, z Subtraction of vector elements
v?Sqr s, d Squaring of vector elements
v?Mul s, d, c, z Multiplication of vector elements
v?MulByConj c, z Multiplication of elements of one vector by conjugated elements of
the second vector
v?Conj c, z Conjugation of vector elements
v?Abs s, d, c, z Computation of the absolute value of vector elements
v?Arg c, z Computation of the argument of vector elements
v?LinearFrac s, d Linear fraction transformation of vectors
Power and Root Functions
v?Inv s, d Inversion of vector elements
v?Div s, d, c, z Division of elements of one vector by elements of the second vector
v?Sqrt s, d, c, z Computation of the square root of vector elements
v?InvSqrt s, d Computation of the inverse square root of vector elements
v?Cbrt s, d Computation of the cube root of vector elements
v?InvCbrt s, d Computation of the inverse cube root of vector elements
v?Pow2o3 s, d Raising each vector element to the power of 2/3
v?Pow3o2 s, d Raising each vector element to the power of 3/2
v?Pow s, d, c, z Raising each vector element to the specified power
v?Powx s, d, c, z Raising each vector element to the constant power
v?Hypot s, d Computation of the square root of sum of squares
Exponential and Logarithmic Functions
v?Exp s, d, c, z Computation of the exponential of vector elements
v?Expm1 s, d Computation of the exponential of vector elements decreased by 1
v?Ln s, d, c, z Computation of the natural logarithm of vector elements
v?Log10 s, d, c, z Computation of the denary logarithm of vector elements
v?Log1p s, d Computation of the natural logarithm of vector elements that are
increased by 1
Trigonometric Functions
v?Cos s, d, c, z Computation of the cosine of vector elements

v?Sin s, d, c, z Computation of the sine of vector elements
v?SinCos s, d Computation of the sine and cosine of vector elements
v?CIS c, z Computation of the complex exponent of vector elements (cosine and
sine combined to complex value)
v?Tan s, d, c, z Computation of the tangent of vector elements
v?Acos s, d, c, z Computation of the inverse cosine of vector elements
v?Asin s, d, c, z Computation of the inverse sine of vector elements
v?Atan s, d, c, z Computation of the inverse tangent of vector elements
v?Atan2 s, d Computation of the four-quadrant inverse tangent of elements of two
vectors
Hyperbolic Functions
v?Cosh s, d, c, z Computation of the hyperbolic cosine of vector elements
v?Sinh s, d, c, z Computation of the hyperbolic sine of vector elements
v?Tanh s, d, c, z Computation of the hyperbolic tangent of vector elements
v?Acosh s, d, c, z Computation of the inverse hyperbolic cosine of vector elements
v?Asinh s, d, c, z Computation of the inverse hyperbolic sine of vector elements
v?Atanh s, d, c, z Computation of the inverse hyperbolic tangent of vector elements.
Special Functions
v?Erf s, d Computation of the error function value of vector elements
v?Erfc s, d Computation of the complementary error function value of vector
elements
v?CdfNorm s, d Computation of the cumulative normal distribution function value of
vector elements
v?ErfInv s, d Computation of the inverse error function value of vector elements
v?ErfcInv s, d Computation of the inverse complementary error function value of
vector elements
v?CdfNormInv s, d Computation of the inverse cumulative normal distribution function
value of vector elements
v?LGamma s, d Computation of the natural logarithm for the absolute value of the
gamma function of vector elements
v?TGamma s, d Computation of the gamma function of vector elements
v?ExpInt1 s, d Computation of the exponential integral of vector elements
Rounding Functions
v?Floor s, d Rounding towards minus infinity
v?Ceil s, d Rounding towards plus infinity
v?Trunc s, d Rounding towards zero
v?Round s, d Rounding to nearest integer
v?NearbyInt s, d Rounding according to current mode
v?Rint s, d Rounding according to current mode and raising inexact result
exception
v?Modf s, d Computation of the integer and fractional parts
v?Frac s, d Computation of the fractional part

Special Value Notations


This section defines notations of special values for complex functions. The definitions are provided in text,
tables, or formulas.

• z, z1, z2, etc. denote complex numbers.


• i, i^2 = -1, is the imaginary unit.
• x, X, x1, x2, etc. denote real parts.
• y, Y, y1, y2, etc. denote imaginary parts.
• X and Y represent any finite positive IEEE-754 floating point values, if not stated otherwise.
• Quiet NaN and signaling NaN are denoted with QNAN and SNAN, respectively.

1759
7 Intel® Math Kernel Library Developer Reference

• The IEEE-754 positive infinities or floating-point numbers are denoted with a + sign before X, Y, etc.
• The IEEE-754 negative infinities or floating-point numbers are denoted with a - sign before X, Y, etc.
CONJ(z) and CIS(z) are defined as follows:
CONJ(x+i·y)=x-i·y
CIS(y)=cos(y)+i·sin(y).
The special value tables show the result of the function for the z argument at the intersection of the RE(z)
column and the i*IM(z) row. If the function raises an exception on the argument z, the lower part of this
cell shows the raised exception and the VM Error Status. An empty cell indicates that this argument is normal
and the result is defined mathematically.

Arithmetic Functions
Arithmetic functions perform the basic mathematical operations like addition, subtraction, multiplication or
computation of the absolute value of the vector elements.

v?Add
Performs element by element addition of vector a and
vector b.

Syntax
vsAdd( n, a, b, y );
vmsAdd( n, a, b, y, mode );
vdAdd( n, a, b, y );
vmdAdd( n, a, b, y, mode );
vcAdd( n, a, b, y );
vmcAdd( n, a, b, y, mode );
vzAdd( n, a, b, y );
vmzAdd( n, a, b, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a, b const float* for vsAdd, vmsAdd Pointers to arrays that contain the input vectors
a and b.
const double* for vdAdd, vmdAdd
const MKL_Complex8* for vcAdd,
vmcAdd
const MKL_Complex16* for vzAdd,
vmzAdd


mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsAdd, vmsAdd Pointer to an array that contains the output
vector y.
double* for vdAdd, vmdAdd
MKL_Complex8* for vcAdd, vmcAdd
MKL_Complex16* for vzAdd, vmzAdd

Description
The v?Add function performs element by element addition of vector a and vector b.
Special values for Real Function v?Add(x)
Argument 1 Argument 2 Result Exception

+0 +0 +0
+0 -0 +0
-0 +0 +0
-0 -0 -0
+∞ +∞ +∞
+∞ -∞ QNAN INVALID
-∞ +∞ QNAN INVALID
-∞ -∞ -∞
SNAN any value QNAN INVALID
any value SNAN QNAN INVALID
QNAN non-SNAN QNAN
non-SNAN QNAN QNAN

Specifications for special values of the complex functions are defined according to the following formula
Add(x1+i*y1,x2+i*y2) = (x1+x2) + i*(y1+y2)
Overflow in a complex function occurs (supported in the HA/LA accuracy modes only) when x1, x2, y1, y2
are finite numbers, but the real or imaginary part of the computed result is so large that it does not fit the
target precision. In this case, the function returns ∞ in that part of the result, raises the OVERFLOW exception,
and sets the VM Error Status to VML_STATUS_OVERFLOW (overriding any possible
VML_STATUS_ACCURACYWARNING status).
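
A minimal usage sketch for the double precision variants; the vm form takes the additional mode argument
described above.

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    const MKL_INT n = 4;
    double a[4] = { 1.0, 2.0, 3.0, 4.0 };
    double b[4] = { 0.5, 0.5, 0.5, 0.5 };
    double y[4];

    vdAdd(n, a, b, y);              /* y[i] = a[i] + b[i], global VM mode     */
    vmdAdd(n, a, b, y, VML_EP);     /* the same, forcing Enhanced Performance */

    for (MKL_INT i = 0; i < n; i++)
        printf("%f\n", y[i]);
    return 0;
}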

v?Sub
Performs element by element subtraction of vector b
from vector a.

Syntax
vsSub( n, a, b, y );
vmsSub( n, a, b, y, mode );
vdSub( n, a, b, y );
vmdSub( n, a, b, y, mode );


vcSub( n, a, b, y );
vmcSub( n, a, b, y, mode );
vzSub( n, a, b, y );
vmzSub( n, a, b, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a, b const float* for vsSub, vmsSub Pointers to arrays that contain the input vectors
a and b.
const double* for vdSub, vmdSub
const MKL_Complex8* for vcSub,
vmcSub
const MKL_Complex16* for vzSub,
vmzSub

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsSub, vmsSub Pointer to an array that contains the output
vector y.
double* for vdSub, vmdSub
MKL_Complex8* for vcSub, vmcSub
MKL_Complex16* for vzSub, vmzSub

Description
The v?Sub function performs element by element subtraction of vector b from vector a.
Special values for Real Function v?Sub(x)
Argument 1 Argument 2 Result Exception

+0 +0 +0
+0 -0 +0
-0 +0 -0
-0 -0 +0
+∞ +∞ QNAN INVALID
+∞ -∞ +∞
-∞ +∞ -∞
-∞ -∞ QNAN INVALID
SNAN any value QNAN INVALID
any value SNAN QNAN INVALID

QNAN non-SNAN QNAN


non-SNAN QNAN QNAN

Specifications for special values of the complex functions are defined according to the following formula
Sub(x1+i*y1,x2+i*y2) = (x1-x2) + i*(y1-y2).
Overflow in a complex function occurs (supported in the HA/LA accuracy modes only) when x1, x2, y1, y2
are finite numbers, but the real or imaginary part of the computed result is so large that it does not fit the
target precision. In this case, the function returns ∞ in that part of the result, raises the OVERFLOW exception,
and sets the VM Error Status to VML_STATUS_OVERFLOW (overriding any possible
VML_STATUS_ACCURACYWARNING status).

v?Sqr
Performs element by element squaring of the vector.

Syntax
vsSqr( n, a, y );
vmsSqr( n, a, y, mode );
vdSqr( n, a, y );
vmdSqr( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be calculated.

a const float* for vsSqr, Pointer to an array that contains the input vector a.
vmsSqr
const double* for vdSqr,
vmdSqr

mode const MKL_INT64 Overrides global VM mode setting for this function call. See
vmlSetMode for possible values and their description.

Output Parameters

Name Type Description

y float* for vsSqr, vmsSqr Pointer to an array that contains the output
vector y.
double* for vdSqr, vmdSqr

Description
The v?Sqr function performs element by element squaring of the vector.


Special Values for Real Function v?Sqr(x)


Argument Result Exception

+0 +0
-0 +0
+∞ +∞
-∞ +∞
QNAN QNAN
SNAN QNAN INVALID

v?Mul
Performs element by element multiplication of vector
a and vector b.

Syntax
vsMul( n, a, b, y );
vmsMul( n, a, b, y, mode );
vdMul( n, a, b, y );
vmdMul( n, a, b, y, mode );
vcMul( n, a, b, y );
vmcMul( n, a, b, y, mode );
vzMul( n, a, b, y );
vmzMul( n, a, b, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a, b const float* for vsMul, vmsMul Pointers to arrays that contain the input vectors
a and b.
const double* for vdMul, vmdMul
const MKL_Complex8* for vcMul,
vmcMul
const MKL_Complex16* for vzMul,
vmzMul

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsMul, vmsMul Pointer to an array that contains the output
vector y.
double* for vdMul, vmdMul
MKL_Complex8* for vcMul, vmcMul
MKL_Complex16* for vzMul, vmzMul

Description
The v?Mul function performs element by element multiplication of vector a and vector b.
Special values for Real Function v?Mul(x)
Argument 1 Argument 2 Result Exception

+0 +0 +0
+0 -0 -0
-0 +0 -0
-0 -0 +0
+0 +∞ QNAN INVALID
+0 -∞ QNAN INVALID
-0 +∞ QNAN INVALID
-0 -∞ QNAN INVALID
+∞ +0 QNAN INVALID
+∞ -0 QNAN INVALID
-∞ +0 QNAN INVALID
-∞ -0 QNAN INVALID
+∞ +∞ +∞
+∞ -∞ -∞
-∞ +∞ -∞
-∞ -∞ +∞
SNAN any value QNAN INVALID
any value SNAN QNAN INVALID
QNAN non-SNAN QNAN
non-SNAN QNAN QNAN

Specifications for special values of the complex functions are defined according to the following formula
Mul(x1+i*y1,x2+i*y2) = (x1*x2-y1*y2) + i*(x1*y2+y1*x2).
Overflow in a complex function occurs (supported in the HA/LA accuracy modes only) when x1, x2, y1, y2
are finite numbers, but the real or imaginary part of the computed result is so large that it does not fit the
target precision. In this case, the function returns ∞ in that part of the result, raises the OVERFLOW exception,
and sets the VM Error Status to VML_STATUS_OVERFLOW (overriding any possible
VML_STATUS_ACCURACYWARNING status).
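
A minimal usage sketch (not part of the original reference) for the double complex variant vzMul,
illustrating the formula above; MKL_Complex16 exposes real and imag members:

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    MKL_Complex16 a[2] = { { 1.0, 2.0 }, { 3.0, -1.0 } };
    MKL_Complex16 b[2] = { { 0.5, 0.5 }, { 2.0,  4.0 } };
    MKL_Complex16 y[2];
    MKL_INT n = 2;

    vzMul(n, a, b, y);   /* y[i] = (x1*x2 - y1*y2) + i*(x1*y2 + y1*x2) */

    for (int i = 0; i < 2; i++)
        printf("(%g,%g)*(%g,%g) = (%g,%g)\n",
               a[i].real, a[i].imag, b[i].real, b[i].imag, y[i].real, y[i].imag);
    return 0;
}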

v?MulByConj
Performs element by element multiplication of vector
a element and conjugated vector b element.

Syntax
vcMulByConj( n, a, b, y );
vmcMulByConj( n, a, b, y, mode );
vzMulByConj( n, a, b, y );
vmzMulByConj( n, a, b, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a, b const MKL_Complex8* for Pointers to arrays that contain the input vectors
vcMulByConj, vmcMulByConj a and b.

const MKL_Complex16* for


vzMulByConj, vmzMulByConj

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y MKL_Complex8* for vcMulByConj, Pointer to an array that contains the output


vmcMulByConj vector y.

MKL_Complex16* for vzMulByConj,


vmzMulByConj

Description
The v?MulByConj function performs element by element multiplication of vector a element and conjugated
vector b element.
Specifications for special values of the functions are found according to the formula
MulByConj(x1+i*y1,x2+i*y2) = Mul(x1+i*y1,x2-i*y2).
Overflow in a complex function occurs (supported in the HA/LA accuracy modes only) when x1, x2, y1, y2
are finite numbers, but the real or imaginary part of the computed result is so large that it does not fit the
target precision. In this case, the function returns ∞ in that part of the result, raises the OVERFLOW exception,
and sets the VM Error Status to VML_STATUS_OVERFLOW (overriding any possible
VML_STATUS_ACCURACYWARNING status).

v?Conj
Performs element by element conjugation of the
vector.

Syntax
vcConj( n, a, y );
vmcConj( n, a, y, mode );
vzConj( n, a, y );
vmzConj( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const MKL_Complex8* for vcConj, Pointer to an array that contains the input vector
vmcConj a.

const MKL_Complex16* for vzConj,


vmzConj

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y MKL_Complex8* for vcConj, vmcConj Pointer to an array that contains the output
vector y.
MKL_Complex16* for vzConj,
vmzConj

Description
The v?Conj function performs element by element conjugation of the vector.

No special values are specified. The function does not raise floating-point exceptions.

v?Abs
Computes absolute value of vector elements.

Syntax
vsAbs( n, a, y );
vmsAbs( n, a, y, mode );
vdAbs( n, a, y );
vmdAbs( n, a, y, mode );
vcAbs( n, a, y );
vmcAbs( n, a, y, mode );
vzAbs( n, a, y );
vmzAbs( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsAbs, vmsAbs Pointer to an array that contains the input vector
a.
const double* for vdAbs, vmdAbs
const MKL_Complex8* for vcAbs,
vmcAbs
const MKL_Complex16* for vzAbs,
vmzAbs

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsAbs, vmsAbs, vcAbs, Pointer to an array that contains the output
vmcAbs vector y.

double* for vdAbs, vmdAbs, vzAbs,


vmzAbs

Description
The v?Abs function computes an absolute value of vector elements.
Special Values for Real Function v?Abs(x)
Argument Result Exception

+0 +0
-0 +0
+∞ +∞
-∞ +∞
QNAN QNAN
SNAN QNAN INVALID

Specifications for special values of the complex functions are defined according to the following formula
Abs(z) = Hypot(RE(z),IM(z)).
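
A minimal usage sketch (not part of the original reference) for vzAbs, which maps a complex input vector to
a real output vector of magnitudes:

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    MKL_Complex16 a[2] = { { 3.0, 4.0 }, { -5.0, 12.0 } };
    double y[2];
    MKL_INT n = 2;

    vzAbs(n, a, y);      /* y[i] = Hypot(RE(a[i]), IM(a[i])) */

    for (int i = 0; i < 2; i++)
        printf("|(%g,%g)| = %g\n", a[i].real, a[i].imag, y[i]);
    return 0;
}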

v?Arg
Computes argument of vector elements.

Syntax
vcArg( n, a, y );
vmcArg( n, a, y, mode );
vzArg( n, a, y );
vmzArg( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const MKL_Complex8* for vcArg, Pointer to an array that contains the input vector
vmcArg a.

const MKL_Complex16* for vzArg,


vmzArg

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vcArg, vmcArg Pointer to an array that contains the output
vector y.
double* for vzArg, vmzArg

Description
The v?Arg function computes argument of vector elements.

See the Special Value Notations section for the conventions used in the table below.
Special Values for Complex Function v?Arg(z)
RE(z) -∞ -X -0 +0 +X +∞ NAN
i·IM(z
)
+i·∞ +3·π/4 +π/2 +π/2 +π/2 +π/2 +π/4 NAN
+i·Y +π +π/2 +π/2 +0 NAN
+i·0 +π +π +π +0 +0 +0 NAN
-i·0 -π -π -π -0 -0 -0 NAN
-i·Y -π -π/2 -π/2 -0 NAN
-i·∞ -3·π/4 -π/2 -π/2 -π/2 -π/2 -π/4 NAN
+i·NAN NAN NAN NAN NAN NAN NAN NAN

Notes:

• raises INVALID exception when real or imaginary part of the argument is SNAN
• Arg(z)=Atan2(IM(z),RE(z)).
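
A minimal usage sketch (not part of the original reference) for vzArg, illustrating the Atan2 relation above:

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    MKL_Complex16 a[3] = { { 1.0, 1.0 }, { -1.0, 0.0 }, { 0.0, -2.0 } };
    double y[3];
    MKL_INT n = 3;

    vzArg(n, a, y);      /* y[i] = Atan2(IM(a[i]), RE(a[i])) */

    for (int i = 0; i < 3; i++)
        printf("arg(%g,%g) = %g\n", a[i].real, a[i].imag, y[i]);
    return 0;
}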

v?LinearFrac
Performs linear fraction transformation of vectors a
and b with scalar parameters.

Syntax
vsLinearFrac( n, a, b, scalea, shifta, scaleb, shiftb, y );
vmsLinearFrac( n, a, b, scalea, shifta, scaleb, shiftb, y, mode );
vdLinearFrac( n, a, b, scalea, shifta, scaleb, shiftb, y );
vmdLinearFrac( n, a, b, scalea, shifta, scaleb, shiftb, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a, b const float* for vsLinearFrac, Pointers to arrays that contain the input vectors
vmsLinearFrac a and b.

const double* for vdLinearFrac,


vmdLinearFrac

scalea, scaleb const float for vsLinearFrac, Constant values for scaling multipliers of vectors
vmsLinearFrac a and b.

const double for vdLinearFrac,


vmdLinearFrac

shifta, shiftb const float for vsLinearFrac, Constant values for shifting addends of vectors a
vmsLinearFrac and b.

const double for vdLinearFrac,


vmdLinearFrac

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsLinearFrac, Pointer to an array that contains the output


vmsLinearFrac vector y.

double* for vdLinearFrac,


vmdLinearFrac

Description
The v?LinearFrac function performs a linear fraction transformation of vector a by vector b with scalar
parameters: scaling multipliers scalea, scaleb and shifting addends shifta, shiftb:

y[i]=(scalea·a[i]+shifta)/(scaleb·b[i]+shiftb), i=1,2 … n

The v?LinearFrac function is implemented in the EP accuracy mode only, therefore no special values are
defined for this function. If used in HA or LA mode, v?LinearFrac sets the VM Error Status to
VML_STATUS_ACCURACYWARNING (see the Values of the VM Status table). Correctness is guaranteed within
the threshold limitations defined for each input parameter (see the table below); otherwise, the behavior is
unspecified.

Threshold Limitations on Input Parameters

2^(EMIN/2) ≤ |scalea| ≤ 2^((EMAX-2)/2)
2^(EMIN/2) ≤ |scaleb| ≤ 2^((EMAX-2)/2)
|shifta| ≤ 2^(EMAX-2)
|shiftb| ≤ 2^(EMAX-2)
2^(EMIN/2) ≤ a[i] ≤ 2^((EMAX-2)/2)
2^(EMIN/2) ≤ b[i] ≤ 2^((EMAX-2)/2)
a[i] ≠ -(shifta/scalea)*(1-δ1), |δ1| ≤ 2^(1-(p-1)/2)
b[i] ≠ -(shiftb/scaleb)*(1-δ2), |δ2| ≤ 2^(1-(p-1)/2)

EMIN and EMAX are the minimum and maximum exponents and p is the number of significant bits (precision)
for the corresponding data type according to the ANSI/IEEE Standard 754-2008 ([IEEE754]):

• for single precision EMIN = -126, EMAX = 127, p = 24


• for double precision EMIN = -1022, EMAX = 1023, p = 53
The thresholds become less strict for common cases with scalea=0 and/or scaleb=0:

• if scalea=0, there are no limitations for the values of a[i] and shifta.
• if scaleb=0, there are no limitations for the values of b[i] and shiftb.

Example

To shift vector a by a scalar value with v?LinearFrac, set scaleb to 0. Note that even if scaleb is 0,
b must still be declared and passed to the function, as in the example below.

#include <stdio.h>
#include "mkl_vml.h"

int main()
{
    double a[10], b[10];   /* b is not used when scaleb is 0, but a valid array must still be passed */
    double r[10];
    double scalea = 1.0, scaleb = 0.0;
    double shifta = -1.0, shiftb = 1.0;

    MKL_INT i = 0, n = 10;

    a[0] = -10000.0000;
    a[1] = -7777.7777;
    a[2] = -5555.5555;
    a[3] = -3333.3333;
    a[4] = -1111.1111;
    a[5] = 1111.1111;
    a[6] = 3333.3333;
    a[7] = 5555.5555;
    a[8] = 7777.7777;
    a[9] = 10000.0000;

    vdLinearFrac( n, a, b, scalea, shifta, scaleb, shiftb, r );

    for (i = 0; i < 10; i++) {
        printf("%25.14f %25.14f\n", a[i], r[i]);
    }

    return 0;
}

To compute shifta/(scaleb·b[i]+shiftb) with v?LinearFrac, set scalea to 0. Note that even if
scalea is 0, a must still be declared and passed to the function.
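
A minimal sketch (not part of the original reference) of this second use case; the values chosen for b,
scaleb, and shiftb are arbitrary illustration data:

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    double b[4] = { 1.0, 2.0, 4.0, 8.0 };
    double a[4] = { 0.0, 0.0, 0.0, 0.0 };    /* not used when scalea is 0, but must be a valid array */
    double r[4];
    double scalea = 0.0, shifta = 1.0;       /* numerator reduces to shifta           */
    double scaleb = 2.0, shiftb = 0.0;       /* denominator is scaleb*b[i] + shiftb   */
    MKL_INT n = 4;

    vdLinearFrac(n, a, b, scalea, shifta, scaleb, shiftb, r);   /* r[i] = 1/(2*b[i]) */

    for (int i = 0; i < 4; i++)
        printf("1/(2*%g) = %g\n", b[i], r[i]);
    return 0;
}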

Power and Root Functions

v?Inv
Performs element by element inversion of the vector.

Syntax
vsInv( n, a, y );
vmsInv( n, a, y, mode );
vdInv( n, a, y );
vmdInv( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be calculated.

a const float* for vsInv, Pointer to an array that contains the input vector a.
vmsInv
const double* for vdInv,
vmdInv

mode const MKL_INT64 Overrides global VM mode setting for this function call. See
vmlSetMode for possible values and their description.

Output Parameters

Name Type Description

y float* for vsInv, vmsInv Pointer to an array that contains the output
vector y.
double* for vdInv, vmdInv

Description
The v?Inv function performs element by element inversion of the vector.

Special Values for Real Function v?Inv(x)
Argument Result VM Error Status Exception

+0 +∞ VML_STATUS_SING ZERODIVIDE
-0 -∞ VML_STATUS_SING ZERODIVIDE
+∞ +0
-∞ -0
QNAN QNAN
SNAN QNAN INVALID

v?Div
Performs element by element division of vector a by
vector b.

Syntax
vsDiv( n, a, b, y );
vmsDiv( n, a, b, y, mode );
vdDiv( n, a, b, y );
vmdDiv( n, a, b, y, mode );
vcDiv( n, a, b, y );
vmcDiv( n, a, b, y, mode );
vzDiv( n, a, b, y );
vmzDiv( n, a, b, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a, b const float* for vsDiv, vmsDiv Pointers to arrays that contain the input vectors
a and b.
const double* for vdDiv, vmdDiv
const MKL_Complex8* for vcDiv,
vmcDiv
const MKL_Complex16* for vzDiv,
vmzDiv

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Precision Overflow Thresholds for Real v?Div Function


Data Type Threshold Limitations on Input Parameters

single precision abs(a[i]) < abs(b[i]) * FLT_MAX


double precision abs(a[i]) < abs(b[i]) * DBL_MAX

Precision overflow thresholds for the complex v?Div function are beyond the scope of this document.

Output Parameters

Name Type Description

y float* for vsDiv, vmsDiv Pointer to an array that contains the output
vector y.
double* for vdDiv, vmdDiv
MKL_Complex8* for vcDiv, vmcDiv
MKL_Complex16* for vzDiv, vmzDiv

Description
The v?Div function performs element by element division of vector a by vector b.
Special values for Real Function v?Div(x)
Argument 1 Argument 2 Result VM Error Status Exception

X > +0 +0 +∞ VML_STATUS_SING ZERODIVIDE


X > +0 -0 -∞ VML_STATUS_SING ZERODIVIDE
X < +0 +0 -∞ VML_STATUS_SING ZERODIVIDE
X < +0 -0 +∞ VML_STATUS_SING ZERODIVIDE
+0 +0 QNAN VML_STATUS_SING
-0 -0 QNAN VML_STATUS_SING
X > +0 +∞ +0
X > +0 -∞ -0
+∞ +∞ QNAN
-∞ -∞ QNAN
QNAN QNAN QNAN
SNAN SNAN QNAN INVALID

Specifications for special values of the complex functions are defined according to the following formula
Div(x1+i*y1,x2+i*y2) = (x1+i*y1)*(x2-i*y2)/(x2*x2+y2*y2).
Overflow in a complex function occurs when x2+i*y2 is not zero, x1, x2, y1, y2 are finite numbers, but the
real or imaginary part of the exact result is so large that it does not fit the target precision. In that case, the
function returns ∞ in that part of the result, raises the OVERFLOW exception, and sets the VM Error Status to
VML_STATUS_OVERFLOW.
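
A minimal usage sketch (not part of the original reference) for vdDiv; the optional status check with
vmlClearErrStatus/vmlGetErrStatus is shown only to illustrate how the VML_STATUS_SING case above can be
detected:

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    double a[4] = { 1.0, -2.0, 3.0, 4.0 };
    double b[4] = { 2.0,  4.0, 0.0, 8.0 };   /* b[2] = 0 triggers the ZERODIVIDE case */
    double y[4];
    MKL_INT n = 4;

    vmlClearErrStatus();
    vdDiv(n, a, b, y);                       /* y[i] = a[i] / b[i] */

    if (vmlGetErrStatus() == VML_STATUS_SING)
        printf("at least one divisor was zero\n");
    for (int i = 0; i < 4; i++)
        printf("%g / %g = %g\n", a[i], b[i], y[i]);
    return 0;
}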

v?Sqrt
Computes a square root of vector elements.

Syntax
vsSqrt( n, a, y );
vmsSqrt( n, a, y, mode );
vdSqrt( n, a, y );
vmdSqrt( n, a, y, mode );
vcSqrt( n, a, y );
vmcSqrt( n, a, y, mode );
vzSqrt( n, a, y );
vmzSqrt( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsSqrt, vmsSqrt Pointer to an array that contains the input vector
a.
const double* for vdSqrt, vmdSqrt
const MKL_Complex8* for vcSqrt,
vmcSqrt
const MKL_Complex16* for vzSqrt,
vmzSqrt

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsSqrt, vmsSqrt Pointer to an array that contains the output
vector y.
double* for vdSqrt, vmdSqrt
MKL_Complex8* for vcSqrt, vmcSqrt
MKL_Complex16* for vzSqrt,
vmzSqrt

Description
The v?Sqrt function computes a square root of vector elements.
Special Values for Real Function v?Sqrt(x)
Argument Result VM Error Status Exception

X < +0 QNAN VML_STATUS_ERRDOM INVALID


+0 +0
-0 -0
-∞ QNAN VML_STATUS_ERRDOM INVALID
+∞ +∞
QNAN QNAN
SNAN QNAN INVALID

See the Special Value Notations section for the conventions used in the table below.
Special Values for Complex Function v?Sqrt(z)
RE(z) -∞ -X -0 +0 +X +∞ NAN

i·IM(z)

+i·∞ +∞+i·∞ +∞+i·∞ +∞+i·∞ +∞+i·∞ +∞+i·∞ +∞+i·∞ +∞+i·∞

+i·Y +0+i·∞ +∞+i·0 QNAN+i·QNAN

+i·0 +0+i·∞ +0+i·0 +0+i·0 +∞+i·0 QNAN+i·QNAN

-i·0 +0-i·∞ +0-i·0 +0-i·0 +∞-i·0 QNAN+i·QNAN

-i·Y +0-i·∞ +∞-i·0 QNAN+i·QNAN

-i·∞ +∞-i·∞ +∞-i·∞ +∞-i·∞ +∞-i·∞ +∞-i·∞ +∞-i·∞ +∞-i·∞


+i·NAN QNAN+i·QNAN QNAN+i·QNAN QNAN+i·QNAN QNAN+i·QNAN QNAN+i·QNAN +∞+i·QNAN QNAN+i·QNAN

Notes:

• raises INVALID exception when the real or imaginary part of the argument is SNAN
• Sqrt(CONJ(z))=CONJ(Sqrt(z)).

v?InvSqrt
Computes an inverse square root of vector elements.

Syntax
vsInvSqrt( n, a, y );
vmsInvSqrt( n, a, y, mode );
vdInvSqrt( n, a, y );
vmdInvSqrt( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsInvSqrt, Pointer to an array that contains the input vector
vmsInvSqrt a.

const double* for vdInvSqrt,


vmdInvSqrt

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsInvSqrt, vmsInvSqrt Pointer to an array that contains the output
vector y.
double* for vdInvSqrt, vmdInvSqrt

Description
The v?InvSqrt function computes an inverse square root of vector elements.
Special Values for Real Function v?InvSqrt(x)
Argument Result VM Error Status Exception

X < +0 QNAN VML_STATUS_ERRDOM INVALID


+0 +∞ VML_STATUS_SING ZERODIVIDE
-0 -∞ VML_STATUS_SING ZERODIVIDE
-∞ QNAN VML_STATUS_ERRDOM INVALID
+∞ +0
QNAN QNAN
SNAN QNAN INVALID

v?Cbrt
Computes a cube root of vector elements.

Syntax
vsCbrt( n, a, y );
vmsCbrt( n, a, y, mode );
vdCbrt( n, a, y );
vmdCbrt( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsCbrt, vmsCbrt Pointer to an array that contains the input vector
a.
const double* for vdCbrt, vmdCbrt

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsCbrt, vmsCbrt Pointer to an array that contains the output
vector y.
double* for vdCbrt, vmdCbrt

Description
The v?Cbrt function computes a cube root of vector elements.

Special Values for Real Function v?Cbrt(x)


Argument Result Exception

+0 +0
-0 -0
+∞ +∞
-∞ -∞
QNAN QNAN
SNAN QNAN INVALID

v?InvCbrt
Computes an inverse cube root of vector elements.

Syntax
vsInvCbrt( n, a, y );
vmsInvCbrt( n, a, y, mode );
vdInvCbrt( n, a, y );
vmdInvCbrt( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsInvCbrt, Pointer to an array that contains the input vector
vmsInvCbrt a.

const double* for vdInvCbrt,


vmdInvCbrt

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsInvCbrt, vmsInvCbrt Pointer to an array that contains the output
vector y.
double* for vdInvCbrt, vmdInvCbrt

Description
The v?InvCbrt function computes an inverse cube root of vector elements.
Special Values for Real Function v?InvCbrt(x)
Argument Result VM Error Status Exception

+0 +∞ VML_STATUS_SING ZERODIVIDE
-0 -∞ VML_STATUS_SING ZERODIVIDE
+∞ +0
-∞ -0
QNAN QNAN
SNAN QNAN INVALID

v?Pow2o3
Raises each element of a vector to the constant power
2/3.

Syntax
vsPow2o3( n, a, y );
vmsPow2o3( n, a, y, mode );
vdPow2o3( n, a, y );
vmdPow2o3( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsPow2o3, Pointer to an array that contains the input vector
vmsPow2o3 a.

const double* for vdPow2o3,


vmdPow2o3

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsPow2o3, vmsPow2o3 Pointer to an array that contains the output
vector y.
double* for vdPow2o3, vmdPow2o3

Description
The v?Pow2o3 function raises each element of a vector to the constant power 2/3.
Special Values for Real Function v?Pow2o3(x)
Argument Result Exception

+0 +0
-0 +0
+∞ +∞
-∞ +∞
QNAN QNAN
SNAN QNAN INVALID

v?Pow3o2
Raises each element of a vector to the constant power
3/2.

Syntax
vsPow3o2( n, a, y );
vmsPow3o2( n, a, y, mode );
vdPow3o2( n, a, y );
vmdPow3o2( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsPow3o2, Pointer to an array that contains the input vector
vmsPow3o2 a.

const double* for vdPow3o2,


vmdPow3o2

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Precision Overflow Thresholds for Pow3o2 Function


Data Type Threshold Limitations on Input Parameters

single precision abs(a[i]) < ( FLT_MAX )^(2/3)

double precision abs(a[i]) < ( DBL_MAX )^(2/3)

Output Parameters

Name Type Description

y float* for vsPow3o2, vmsPow3o2 Pointer to an array that contains the output
vector y.
double* for vdPow3o2, vmdPow3o2

Description
The v?Pow3o2 function raises each element of a vector to the constant power 3/2.

Special Values for Real Function v?Pow3o2(x)
Argument Result VM Error Status Exception

X < +0 QNAN VML_STATUS_ERRDOM INVALID


+0 +0
-0 -0
-∞ QNAN VML_STATUS_ERRDOM INVALID
+∞ +∞
QNAN QNAN
SNAN QNAN INVALID

v?Pow
Computes a to the power b for elements of two
vectors.

Syntax
vsPow( n, a, b, y );
vmsPow( n, a, b, y, mode );
vdPow( n, a, b, y );
vmdPow( n, a, b, y, mode );
vcPow( n, a, b, y );
vmcPow( n, a, b, y, mode );
vzPow( n, a, b, y );
vmzPow( n, a, b, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a, b const float* for vsPow, vmsPow Pointers to arrays that contain the input vectors
a and b.
const double* for vdPow, vmdPow
const MKL_Complex8* for vcPow,
vmcPow
const MKL_Complex16* for vzPow,
vmzPow

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Precision Overflow Thresholds for Real v?Pow Function

Data Type Threshold Limitations on Input Parameters

single precision abs(a[i]) < ( FLT_MAX )^(1/b[i])

double precision abs(a[i]) < ( DBL_MAX )^(1/b[i])

Precision overflow thresholds for the complex v?Pow function are beyond the scope of this document.

Output Parameters

Name Type Description

y float* for vsPow, vmsPow Pointer to an array that contains the output
vector y.
double* for vdPow, vmdPow
MKL_Complex8* for vcPow, vmcPow
MKL_Complex16* for vzPow, vmzPow

Description
The v?Pow function computes a to the power b for elements of two vectors.

The real function v(s/d)Pow has certain limitations on the input range of a and b parameters. Specifically, if
a[i] is positive, then b[i] may be arbitrary. For negative a[i], the value of b[i] must be an integer
(either positive or negative).
The complex function v(c/z)Pow has no input range limitations.
Special values for Real Function v?Pow(x)
Argument 1 Argument 2 Result VM Error Status Exception

+0 neg. odd integer +∞ VML_STATUS_ERRDOM ZERODIVIDE


-0 neg. odd integer -∞ VML_STATUS_ERRDOM ZERODIVIDE
+0 neg. even integer +∞ VML_STATUS_ERRDOM ZERODIVIDE
-0 neg. even integer +∞ VML_STATUS_ERRDOM ZERODIVIDE
+0 neg. non-integer +∞ VML_STATUS_ERRDOM ZERODIVIDE
-0 neg. non-integer +∞ VML_STATUS_ERRDOM ZERODIVIDE
+0 pos. odd integer +0
-0 pos. odd integer -0
+0 pos. even integer +0
-0 pos. even integer +0
+0 pos. non-integer +0
-0 pos. non-integer +0
-1 +∞ +1
-1 -∞ +1
+1 any value +1
+1 +0 +1
+1 -0 +1
+1 +∞ +1
+1 -∞ +1
+1 QNAN +1
any value +0 +1
+0 +0 +1
-0 +0 +1
+∞ +0 +1
-∞ +0 +1
QNAN +0 +1
any value -0 +1
+0 -0 +1
-0 -0 +1
+∞ -0 +1
-∞ -0 +1
QNAN -0 +1
X < +0 non-integer QNAN VML_STATUS_ERRDOM INVALID
|X| < 1 -∞ +∞
+0 -∞ +∞ VML_STATUS_ERRDOM ZERODIVIDE
-0 -∞ +∞ VML_STATUS_ERRDOM ZERODIVIDE
|X| > 1 -∞ +0
+∞ -∞ +0
-∞ -∞ +0
|X| < 1 +∞ +0
+0 +∞ +0
-0 +∞ +0
|X| > 1 +∞ +∞
+∞ +∞ +∞
-∞ +∞ +∞
-∞ neg. odd integer -0
-∞ neg. even integer +0
-∞ neg. non-integer +0
-∞ pos. odd integer -∞
-∞ pos. even integer +∞
-∞ pos. non-integer +∞
+∞ X < +0 +0
+∞ X > +0 +∞
QNAN QNAN QNAN
QNAN SNAN QNAN INVALID
SNAN QNAN QNAN INVALID
SNAN SNAN QNAN INVALID

The complex double precision versions of this function, vzPow and vmzPow, are implemented in the EP
accuracy mode only. If used in HA or LA mode, vzPow and vmzPow set the VM Error Status to
VML_STATUS_ACCURACYWARNING (see the Values of the VM Status table).
Overflow in a complex function occurs (supported in the HA/LA accuracy modes only) when x1, x2, y1, y2
are finite numbers, but the real or imaginary part of the computed result is so large that it does not fit the
target precision. In this case, the function returns ∞ in that part of the result, raises the OVERFLOW exception,
and sets the VM Error Status to VML_STATUS_OVERFLOW (overriding any possible
VML_STATUS_ACCURACYWARNING status).
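
A minimal usage sketch (not part of the original reference) for vdPow, keeping within the input range
limitation described above (an integer exponent is used for the negative base):

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    double a[3] = {  2.0, -3.0, 10.0 };
    double b[3] = {  0.5,  3.0, -2.0 };      /* b[1] is an integer because a[1] is negative */
    double y[3];
    MKL_INT n = 3;

    vdPow(n, a, b, y);                       /* y[i] = a[i]^b[i] */

    for (int i = 0; i < 3; i++)
        printf("%g^%g = %g\n", a[i], b[i], y[i]);
    return 0;
}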

v?Powx
Raises each element of a vector to the constant
power.

Syntax
vsPowx( n, a, b, y );
vmsPowx( n, a, b, y, mode );

vdPowx( n, a, b, y );
vmdPowx( n, a, b, y, mode );
vcPowx( n, a, b, y );
vmcPowx( n, a, b, y, mode );
vzPowx( n, a, b, y );
vmzPowx( n, a, b, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Number of elements to be calculated.

a const float* for vsPowx, vmsPowx Pointer to an array that contains the input vector
a.
const double* for vdPowx, vmdPowx
const MKL_Complex8* for vcPowx,
vmcPowx
const MKL_Complex16* for vzPowx,
vmzPowx
b
const float for vsPowx, vmsPowx Constant value for power b.

const double for vdPowx, vmdPowx


const MKL_Complex8 for vcPowx,
vmcPowx
const MKL_Complex16 for vzPowx,
vmzPowx

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Precision Overflow Thresholds for Real v?Powx Function


Data Type Threshold Limitations on Input Parameters

single precision abs(a[i]) < ( FLT_MAX )^(1/b)

double precision abs(a[i]) < ( DBL_MAX )^(1/b)

Precision overflow thresholds for the complex v?Powx function are beyond the scope of this document.

Output Parameters

Name Type Description

y float* for vsPowx, vmsPowx Pointer to an array that contains the output
vector y.
double* for vdPowx, vmdPowx
MKL_Complex8* for vcPowx, vmcPowx

MKL_Complex16* for vzPowx,


vmzPowx

Description
The v?Powx function raises each element of a vector to the constant power.

The real function v(s/d)Powx has certain limitations on the input range of a and b parameters. Specifically,
if a[i] is positive, then b may be arbitrary. For negative a[i], the value of b must be an integer (either
positive or negative).
The complex function v(c/z)Powx has no input range limitations.

Special values and VM Error Status treatment are the same as for the v?Pow function.

v?Hypot
Computes a square root of sum of two squared
elements.

Syntax
vsHypot( n, a, b, y );
vmsHypot( n, a, b, y, mode );
vdHypot( n, a, b, y );
vmdHypot( n, a, b, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Number of elements to be calculated.

a, b const float* for vsHypot, Pointers to arrays that contain the input vectors
vmsHypot a and b.

const double* for vdHypot,


vmdHypot

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Precision Overflow Thresholds for Hypot Function


Data Type Threshold Limitations on Input Parameters

single precision
abs(a[i]) < sqrt(FLT_MAX)
abs(b[i]) < sqrt(FLT_MAX)
double precision
abs(a[i]) < sqrt(DBL_MAX)
abs(b[i]) < sqrt(DBL_MAX)

Output Parameters

Name Type Description

y float* for vsHypot, vmsHypot Pointer to an array that contains the output
vector y.
double* for vdHypot, vmdHypot

Description
The function v?Hypot computes a square root of sum of two squared elements.
Special values for Real Function v?Hypot(x)
Argument 1 Argument 2 Result Exception

+0 +0 +0
-0 -0 +0
+∞ any value +∞
any value +∞ +∞
SNAN any value QNAN INVALID
any value SNAN QNAN INVALID
QNAN any value QNAN
any value QNAN QNAN
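
A minimal usage sketch (not part of the original reference) for the double precision variant vdHypot:

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    double a[3] = { 3.0,  5.0,  8.0 };
    double b[3] = { 4.0, 12.0, 15.0 };
    double y[3];
    MKL_INT n = 3;

    vdHypot(n, a, b, y);                     /* y[i] = sqrt(a[i]*a[i] + b[i]*b[i]) */

    for (int i = 0; i < 3; i++)
        printf("hypot(%g, %g) = %g\n", a[i], b[i], y[i]);
    return 0;
}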

Exponential and Logarithmic Functions

v?Exp
Computes an exponential of vector elements.

Syntax
vsExp( n, a, y );
vmsExp( n, a, y, mode );
vdExp( n, a, y );
vmdExp( n, a, y, mode );
vcExp( n, a, y );
vmcExp( n, a, y, mode );
vzExp( n, a, y );
vmzExp( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsExp, vmsExp Pointer to an array that contains the input vector
a.
const double* for vdExp, vmdExp
const MKL_Complex8* for vcExp,
vmcExp
const MKL_Complex16* for vzExp,
vmzExp

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Precision Overflow Thresholds for Real v?Exp Function


Data Type Threshold Limitations on Input Parameters

single precision a[i] < Ln( FLT_MAX )


double precision a[i] < Ln( DBL_MAX )

Precision overflow thresholds for the complex v?Exp function are beyond the scope of this document.

Output Parameters

Name Type Description

y float* for vsExp, vmsExp Pointer to an array that contains the output
vector y.
double* for vdExp, vmdExp
MKL_Complex8* for vcExp, vmcExp
MKL_Complex16* for vzExp, vmzExp

Description
The v?Exp function computes an exponential of vector elements.
Special Values for Real Function v?Exp(x)
Argument Result VM Error Status Exception

+0 +1
-0 +1
X > overflow +∞ VML_STATUS_OVERFLOW OVERFLOW
X < underflow +0 VML_STATUS_UNDERFLOW UNDERFLOW
+∞ +∞
-∞ +0
QNAN QNAN
SNAN QNAN INVALID

See the Special Value Notations section for the conventions used in the table below.
Special Values for Complex Function v?Exp(z)
RE(z) -∞ -X -0 +0 +X +∞ NAN

i·IM(z)

+i·∞ +0+i·0 QNAN+i·QNAN QNAN+i·QNAN QNAN+i·QNAN QNAN+i·QNAN +∞+i·QNAN QNAN+i·QNAN

INVALID INVALID INVALID INVALID INVALID INVALID

+i·Y +0·CIS(Y) +∞·CIS(Y) QNAN+i·QNAN

+i·0 +0·CIS(0) +1+i·0 +1+i·0 +∞+i·0 QNAN+i·0

-i·0 +0·CIS(0) +1-i·0 +1-i·0 +∞-i·0 QNAN-i·0

-i·Y +0·CIS(Y) +∞·CIS(Y) QNAN+i·QNAN

-i·∞ +0-i·0 QNAN+i·QNAN QNAN+i·QNAN QNAN+i·QNAN QNAN+i·QNAN +∞+i·QNAN QNAN+i·QNAN

INVALID INVALID INVALID INVALID INVALID

+i·NAN +0+i·0 QNAN+i·QNAN QNAN+i·QNAN QNAN+i·QNAN QNAN+i·QNAN +∞+i·QNAN QNAN+i·QNAN

INVALID INVALID INVALID INVALID

Notes:

• raises the INVALID exception when real or imaginary part of the argument is SNAN
• raises the INVALID exception on argument z=-∞+i·QNAN
• raises the OVERFLOW exception and sets the VM Error Status to VML_STATUS_OVERFLOW in the case of
overflow, that is, when RE(z), IM(z) are finite non-zero numbers, but the real or imaginary part of the
exact result is so large that it does not meet the target precision.
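
A minimal usage sketch (not part of the original reference) for the double precision variants, using the
vm-prefixed form to override the accuracy mode for a single call (VML_EP is one of the accuracy-mode values
accepted by vmlSetMode):

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    double a[4] = { 0.0, 1.0, -2.0, 710.0 }; /* 710 exceeds Ln(DBL_MAX), so y[3] is +Inf */
    double y[4];
    MKL_INT n = 4;

    vmdExp(n, a, y, VML_EP);                 /* Enhanced Performance accuracy for this call only */

    for (int i = 0; i < 4; i++)
        printf("exp(%g) = %g\n", a[i], y[i]);
    return 0;
}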

v?Expm1
Computes an exponential of vector elements
decreased by 1.

Syntax
vsExpm1( n, a, y );
vmsExpm1( n, a, y, mode );
vdExpm1( n, a, y );
vmdExpm1( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsExpm1, Pointer to an array that contains the input vector
vmsExpm1 a.

const double* for vdExpm1,


vmdExpm1

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Precision Overflow Thresholds for Expm1 Function
Data Type Threshold Limitations on Input Parameters

single precision a[i] < Ln( FLT_MAX )


double precision a[i] < Ln( DBL_MAX )

Output Parameters

Name Type Description

y float* for vsExpm1, vmsExpm1 Pointer to an array that contains the output
vector y.
double* for vdExpm1, vmdExpm1

Description
The v?Expm1 function computes an exponential of vector elements decreased by 1.
Special Values for Real Function v?Expm1(x)
Argument Result VM Error Status Exception

+0 +0
-0 +0
X > overflow +∞ VML_STATUS_OVERFLOW OVERFLOW
+∞ +∞
-∞ -1
QNAN QNAN
SNAN QNAN INVALID

v?Ln
Computes natural logarithm of vector elements.

Syntax
vsLn( n, a, y );
vmsLn( n, a, y, mode );
vdLn( n, a, y );
vmdLn( n, a, y, mode );
vcLn( n, a, y );
vmcLn( n, a, y, mode );
vzLn( n, a, y );
vmzLn( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsLn, vmsLn Pointer to an array that contains the input vector
a.
const double* for vdLn, vmdLn
const MKL_Complex8* for vcLn,
vmcLn
const MKL_Complex16* for vzLn,
vmzLn

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsLn, vmsLn Pointer to an array that contains the output
vector y.
double* for vdLn, vmdLn
MKL_Complex8* for vcLn, vmcLn
MKL_Complex16* for vzLn, vmzLn

Description
The v?Ln function computes natural logarithm of vector elements.
Special Values for Real Function v?Ln(x)
Argument Result VM Error Status Exception

+1 +0
X < +0 QNAN VML_STATUS_ERRDOM INVALID
+0 -∞ VML_STATUS_SING ZERODIVIDE
-0 -∞ VML_STATUS_SING ZERODIVIDE
-∞ QNAN VML_STATUS_ERRDOM INVALID
+∞ +∞
QNAN QNAN
SNAN QNAN INVALID

See the Special Value Notations section for the conventions used in the table below.
Special Values for Complex Function v?Ln(z)
RE(z) -∞ -X -0 +0 +X +∞ NAN
i·IM(z
)
+i·∞ +∞+i·π/2 +∞+i·π/2 +∞+i·π/2 +∞+i·π/2 +∞+i·π/4 +∞+i·QNAN

+i·Y +∞+i·π +∞+i·0 QNAN+i·QNAN

INVALID

+i·0 +∞+i·π -∞+i·π -∞+i·0 +∞+i·0 QNAN+i·QNAN


ZERODIVIDE ZERODIVIDE INVALID
-i·0 +∞-i·π -∞-i·π -∞-i·0 +∞-i·0 QNAN+i·QNAN
ZERODIVIDE ZERODIVIDE INVALID
-i·Y +∞-i·π +∞-i·0 QNAN+i·QNAN

INVALID

-i·∞ +∞-i·π/2 +∞-i·π/2 +∞-i·π/2 +∞-i·π/2 +∞-i·π/4 +∞+i·QNAN

+i·NAN +∞+i·QNAN QNAN+i·QNAN QNAN+i·QNAN QNAN+i·QNAN QNAN+i·QNAN +∞+i·QNAN QNAN+i·QNAN

INVALID INVALID INVALID INVALID INVALID

Notes:

• raises INVALID exception when real or imaginary part of the argument is SNAN

v?Log10
Computes denary logarithm of vector elements.

Syntax
vsLog10( n, a, y );
vmsLog10( n, a, y, mode );
vdLog10( n, a, y );
vmdLog10( n, a, y, mode );
vcLog10( n, a, y );
vmcLog10( n, a, y, mode );
vzLog10( n, a, y );
vmzLog10( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsLog10, Pointer to an array that contains the input vector
vmsLog10 a.

const double* for vdLog10,


vmdLog10
const MKL_Complex8* for vcLog10,
vmcLog10

const MKL_Complex16* for vzLog10,


vmzLog10

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsLog10, vmsLog10 Pointer to an array that contains the output
vector y.
double* for vdLog10, vmdLog10
MKL_Complex8* for vcLog10,
vmcLog10
MKL_Complex16* for vzLog10,
vmzLog10

Description
The v?Log10 function computes a denary logarithm of vector elements.
Special Values for Real Function v?Log10(x)
Argument Result VM Error Status Exception

+1 +0
X < +0 QNAN VML_STATUS_ERRDOM INVALID
+0 -∞ VML_STATUS_SING ZERODIVIDE
-0 -∞ VML_STATUS_SING ZERODIVIDE
-∞ QNAN VML_STATUS_ERRDOM INVALID
+∞ +∞
QNAN QNAN
SNAN QNAN INVALID

See the Special Value Notations section for the conventions used in the table below.
Special Values for Complex Function v?Log10(z)
RE(z) -∞ -X -0 +0 +X +∞ NAN
i·IM(z
)
+i·∞ +∞+i·QNAN

INVALID

+i·Y +∞+i·0 QNAN+i·QNAN

INVALID

+i·0 -∞+i·0 +∞+i·0 QNAN+i·QNAN


ZERODIVIDE ZERODIVIDE INVALID
-i·0 -∞-i·0 +∞-i·0 QNAN-i·QNAN
ZERODIVIDE ZERODIVIDE INVALID
-i·Y +∞-i·0 QNAN+i·QNAN

INVALID

-i·∞ +∞+i·QNAN

+i·NAN +∞+i·QNAN QNAN+i·QNAN QNAN+i·QNAN QNAN+i·QNAN QNAN+i·QNAN +∞+i·QNAN QNAN+i·QNAN

INVALID INVALID INVALID INVALID INVALID

Notes:

• raises INVALID exception when real or imaginary part of the argument is SNAN

v?Log1p
Computes a natural logarithm of vector elements that
are increased by 1.

Syntax
vsLog1p( n, a, y );
vmsLog1p( n, a, y, mode );
vdLog1p( n, a, y );
vmdLog1p( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsLog1p, Pointer to an array that contains the input vector
vmsLog1p a.

const double* for vdLog1p,


vmdLog1p

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsLog1p, vmsLog1p Pointer to an array that contains the output
vector y.
double* for vdLog1p, vmdLog1p

Description
The v?Log1p function computes a natural logarithm of vector elements that are increased by 1.
Special Values for Real Function v?Log1p(x)
Argument Result VM Error Status Exception

-1 -∞ VML_STATUS_SING ZERODIVIDE
X < -1 QNAN VML_STATUS_ERRDOM INVALID
+0 +0
-0 -0
-∞ QNAN VML_STATUS_ERRDOM INVALID
+∞ +∞
QNAN QNAN
SNAN QNAN INVALID

Trigonometric Functions

v?Cos
Computes cosine of vector elements.

Syntax
vsCos( n, a, y );
vmsCos( n, a, y, mode );
vdCos( n, a, y );
vmdCos( n, a, y, mode );
vcCos( n, a, y );
vmcCos( n, a, y, mode );
vzCos( n, a, y );
vmzCos( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsCos, vmsCos Pointer to an array that contains the input vector
a.
const double* for vdCos, vmdCos
const MKL_Complex8* for vcCos,
vmcCos
const MKL_Complex16* for vzCos,
vmzCos

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsCos, vmsCos Pointer to an array that contains the output
vector y.
double* for vdCos, vmdCos
MKL_Complex8* for vcCos, vmcCos
MKL_Complex16* for vzCos, vmzCos

Description
The v?Cos function computes cosine of vector elements.

Note that arguments abs(a[i]) ≤ 2^13 and abs(a[i]) ≤ 2^16 for single and double precisions respectively
are called fast computational path. These are trigonometric function arguments for which VM provides the
best possible performance. Avoid arguments that do not belong to the fast computational path in the VM
High Accuracy (HA) and Low Accuracy (LA) functions. Alternatively, you can use VM Enhanced Performance
(EP) functions that are fast on the entire function domain. However, these functions provide less accuracy.
Special Values for Real Function v?Cos(x)
Argument Result VM Error Status Exception

+0 +1
-0 +1
+∞ QNAN VML_STATUS_ERRDOM INVALID
-∞ QNAN VML_STATUS_ERRDOM INVALID
QNAN QNAN
SNAN QNAN INVALID

Specifications for special values of the complex functions are defined according to the following formula
Cos(z) = Cosh(i*z).

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain

optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

v?Sin
Computes sine of vector elements.

Syntax
vsSin( n, a, y );
vmsSin( n, a, y, mode );
vdSin( n, a, y );
vmdSin( n, a, y, mode );
vcSin( n, a, y );
vmcSin( n, a, y, mode );
vzSin( n, a, y );
vmzSin( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsSin, vmsSin Pointer to an array that contains the input vector
a.
const double* for vdSin, vmdSin
const MKL_Complex8* for vcSin,
vmcSin
const MKL_Complex16* for vzSin,
vmzSin

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsSin, vmsSin Pointer to an array that contains the output
vector y.
double* for vdSin, vmdSin
MKL_Complex8* for vcSin, vmcSin

MKL_Complex16* for vzSin, vmzSin

Description
The function computes sine of vector elements.
Note that arguments abs(a[i]) ≤ 2^13 and abs(a[i]) ≤ 2^16 for single and double precisions respectively
are called fast computational path. These are trigonometric function arguments for which VM provides the
best possible performance. Avoid arguments that do not belong to the fast computational path in the VM
High Accuracy (HA) and Low Accuracy (LA) functions. Alternatively, you can use VM Enhanced Performance
(EP) functions that are fast on the entire function domain. However, these functions provide less accuracy.
Special Values for Real Function v?Sin(x)
Argument Result VM Error Status Exception

+0 +0
-0 -0
+∞ QNAN VML_STATUS_ERRDOM INVALID
-∞ QNAN VML_STATUS_ERRDOM INVALID
QNAN QNAN
SNAN QNAN INVALID

Specifications for special values of the complex functions are defined according to the following formula
Sin(z) = -i*Sinh(i*z).

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

v?SinCos
Computes sine and cosine of vector elements.

Syntax
vsSinCos( n, a, y, z );
vmsSinCos( n, a, y, z, mode );
vdSinCos( n, a, y, z );
vmdSinCos( n, a, y, z, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsSinCos, Pointer to an array that contains the input vector
vmsSinCos a.

const double* for vdSinCos,


vmdSinCos

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y, z float* for vsSinCos, vmsSinCos Pointers to arrays that contain the output vectors
y (for sine values) and z (for cosine values).
double* for vdSinCos, vmdSinCos

Description
The function computes sine and cosine of vector elements.
Note that arguments abs(a[i]) ≤ 2^13 and abs(a[i]) ≤ 2^16 for single and double precisions respectively
are called fast computational path. These are trigonometric function arguments for which VM provides the
best possible performance. Avoid arguments that do not belong to the fast computational path in the VM
High Accuracy (HA) and Low Accuracy (LA) functions. Alternatively, you can use VM Enhanced Performance
(EP) functions that are fast on the entire function domain. However, these functions provide less accuracy.
Special Values for Real Function v?SinCos(x)
Argument Result 1 Result 2 VM Error Status Exception

+0 +0 +1
-0 -0 +1
+∞ QNAN QNAN VML_STATUS_ERRDOM INVALID
-∞ QNAN QNAN VML_STATUS_ERRDOM INVALID
QNAN QNAN QNAN
SNAN QNAN QNAN INVALID

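A minimal usage sketch (not part of the original reference) for vdSinCos, which writes sines to y and
cosines to z in a single pass:

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    double a[3] = { 0.0, 0.5, 1.0 };
    double s[3], c[3];
    MKL_INT n = 3;

    vdSinCos(n, a, s, c);                    /* s[i] = sin(a[i]), c[i] = cos(a[i]) */

    for (int i = 0; i < 3; i++)
        printf("sin(%g) = %g   cos(%g) = %g\n", a[i], s[i], a[i], c[i]);
    return 0;
}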

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.

Notice revision #20110804

v?CIS
Computes complex exponent of real vector elements
(cosine and sine of real vector elements combined into
a complex value).

Syntax
vcCIS( n, a, y );
vmcCIS( n, a, y, mode );
vzCIS( n, a, y );
vmzCIS( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vcCIS, vmcCIS Pointer to an array that contains the input vector
a.
const double* for vzCIS, vmzCIS

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y MKL_Complex8* for vcCIS, vmcCIS Pointer to an array that contains the output
vector y.
MKL_Complex16* for vzCIS, vmzCIS

Description
The v?CIS function computes the complex exponent of real vector elements (cosine and sine of real vector
elements combined into a complex value).
See the Special Value Notations section for the conventions used in the table below.
Special Values for Complex Function v?CIS(x)
x CIS(x)

+∞ QNAN+i·QNAN
INVALID

+0 +1+i·0

-0 +1-i·0

-∞ QNAN+i·QNAN
INVALID

NAN QNAN+i·QNAN

Notes:

• raises INVALID exception when the argument is SNAN


• raises INVALID exception and sets the VM Error Status to VML_STATUS_ERRDOM for x=+∞, x=-∞
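
A minimal usage sketch (not part of the original reference) for vzCIS, which takes a real input array and
produces a complex output array:

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    double a[3] = { 0.0, 1.5707963267948966, 3.141592653589793 };   /* 0, pi/2, pi */
    MKL_Complex16 y[3];
    MKL_INT n = 3;

    vzCIS(n, a, y);                          /* y[i] = cos(a[i]) + i*sin(a[i]) */

    for (int i = 0; i < 3; i++)
        printf("CIS(%g) = (%g, %g)\n", a[i], y[i].real, y[i].imag);
    return 0;
}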

v?Tan
Computes tangent of vector elements.

Syntax
vsTan( n, a, y );
vmsTan( n, a, y, mode );
vdTan( n, a, y );
vmdTan( n, a, y, mode );
vcTan( n, a, y );
vmcTan( n, a, y, mode );
vzTan( n, a, y );
vmzTan( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsTan, vmsTan Pointer to an array that contains the input vector
a.
const double* for vdTan, vmdTan
const MKL_Complex8* for vcTan,
vmcTan
const MKL_Complex16* for vzTan,
vmzTan

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsTan, vmsTan Pointer to an array that contains the output
vector y.
double* for vdTan, vmdTan
MKL_Complex8* for vcTan, vmcTan
MKL_Complex16* for vzTan, vmzTan

Description
The v?Tan function computes tangent of vector elements.

Note that arguments abs(a[i]) ≤ 2^13 and abs(a[i]) ≤ 2^16 for single and double precisions respectively
are called fast computational path. These are trigonometric function arguments for which VM provides the
best possible performance. Avoid arguments that do not belong to the fast computational path in the VM
High Accuracy (HA) and Low Accuracy (LA) functions. Alternatively, you can use VM Enhanced Performance
(EP) functions that are fast on the entire function domain. However, these functions provide less accuracy.
Special Values for Real Function v?Tan(x)
Argument Result VM Error Status Exception

+0 +0
-0 -0
+∞ QNAN VML_STATUS_ERRDOM INVALID
-∞ QNAN VML_STATUS_ERRDOM INVALID
QNAN QNAN
SNAN QNAN INVALID

Specifications for special values of the complex functions are defined according to the following formula
Tan(z) = -i*Tanh(i*z).

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

v?Acos
Computes inverse cosine of vector elements.

Syntax
vsAcos( n, a, y );
vmsAcos( n, a, y, mode );
vdAcos( n, a, y );
vmdAcos( n, a, y, mode );
vcAcos( n, a, y );
vmcAcos( n, a, y, mode );
vzAcos( n, a, y );
vmzAcos( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsAcos, vmsAcos Pointer to an array that contains the input vector
a.
const double* for vdAcos, vmdAcos
const MKL_Complex8* for vcAcos,
vmcAcos
const MKL_Complex16* for vzAcos,
vmzAcos

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsAcos, vmsAcos Pointer to an array that contains the output
vector y.
double* for vdAcos, vmdAcos
MKL_Complex8* for vcAcos, vmcAcos
MKL_Complex16* for vzAcos,
vmzAcos

Description
The v?Acos function computes inverse cosine of vector elements.
Special Values for Real Function v?Acos(x)
Argument Result VM Error Status Exception

+0 +π/2
-0 +π/2
+1 +0
-1 +π
|X| > 1 QNAN VML_STATUS_ERRDOM INVALID
+∞ QNAN VML_STATUS_ERRDOM INVALID
-∞ QNAN VML_STATUS_ERRDOM INVALID
QNAN QNAN
SNAN QNAN INVALID

See the Special Value Notations section for the conventions used in the table below.

Special Values for Complex Function v?Acos(z)
RE(z) -∞ -X -0 +0 +X +∞ NAN
i·IM(z
)
+i·∞ QNAN-i·∞

+i·Y +π-i·∞ +0-i·∞ QNAN+i·QNAN

+i·0 +π-i·∞ +0-i·∞ QNAN+i·QNAN

-i·0 +π+i·∞ +0+i·∞ QNAN+i·QNAN

-i·Y +π+i·∞ +0+i·∞ QNAN+i·QNAN

-i·∞ QNAN+i·∞

+i·NAN QNAN+i·∞ QNAN+i·QNAN QNAN+i·QNAN QNAN+i·∞ QNAN+i·QNAN

Notes:

• raises INVALID exception when real or imaginary part of the argument is SNAN
• Acos(CONJ(z))=CONJ(Acos(z)).

v?Asin
Computes inverse sine of vector elements.

Syntax
vsAsin( n, a, y );
vmsAsin( n, a, y, mode );
vdAsin( n, a, y );
vmdAsin( n, a, y, mode );
vcAsin( n, a, y );
vmcAsin( n, a, y, mode );
vzAsin( n, a, y );
vmzAsin( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsAsin, vmsAsin Pointer to an array that contains the input vector
a.
const double* for vdAsin, vmdAsin

const MKL_Complex8* for vcAsin,


vmcAsin
const MKL_Complex16* for vzAsin,
vmzAsin

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsAsin, vmsAsin Pointer to an array that contains the output
vector y.
double* for vdAsin, vmdAsin
MKL_Complex8* for vcAsin, vmcAsin
MKL_Complex16* for vzAsin,
vmzAsin

Description
The v?Asin function computes inverse sine of vector elements.
Special Values for Real Function v?Asin(x)
Argument Result VM Error Status Exception

+0 +0
-0 -0
+1 +π/2
-1 -π/2
|X| > 1 QNAN VML_STATUS_ERRDOM INVALID
+∞ QNAN VML_STATUS_ERRDOM INVALID
-∞ QNAN VML_STATUS_ERRDOM INVALID
QNAN QNAN
SNAN QNAN INVALID

Specifications for special values of the complex functions are defined according to the following formula
Asin(z) = -i*Asinh(i*z).

v?Atan
Computes inverse tangent of vector elements.

Syntax
vsAtan( n, a, y );
vmsAtan( n, a, y, mode );
vdAtan( n, a, y );
vmdAtan( n, a, y, mode );
vcAtan( n, a, y );
vmcAtan( n, a, y, mode );
vzAtan( n, a, y );
vmzAtan( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsAtan, vmsAtan Pointer to an array that contains the input vector
a.
const double* for vdAtan, vmdAtan
const MKL_Complex8* for vcAtan,
vmcAtan
const MKL_Complex16* for vzAtan,
vmzAtan

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsAtan, vmsAtan Pointer to an array that contains the output
vector y.
double* for vdAtan, vmdAtan
MKL_Complex8* for vcAtan, vmcAtan
MKL_Complex16* for vzAtan,
vmzAtan

Description
The v?Atan function computes inverse tangent of vector elements.
Special Values for Real Function v?Atan(x)
Argument Result VM Error Status Exception

+0 +0
-0 -0
+∞ +π/2
-∞ -π/2
QNAN QNAN
SNAN QNAN INVALID

Specifications for special values of the complex functions are defined according to the following formula
Atan(z) = -i*Atanh(i*z).

v?Atan2
Computes four-quadrant inverse tangent of elements
of two vectors.

Syntax
vsAtan2( n, a, b, y );
vmsAtan2( n, a, b, y, mode );
vdAtan2( n, a, b, y );
vmdAtan2( n, a, b, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a, b const float* for vsAtan2, Pointers to arrays that contain the input vectors
vmsAtan2 a and b.

const double* for vdAtan2,


vmdAtan2

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsAtan2, vmsAtan2 Pointer to an array that contains the output
vector y.
double* for vdAtan2, vmdAtan2

Description
The v?Atan2 function computes four-quadrant inverse tangent of elements of two vectors.

The elements of the output vectory are computed as the four-quadrant arctangent of a[i] / b[i].
Special values for Real Function v?Atan2(x)
Argument 1 Argument 2 Result Exception

-∞ -∞ -3*π/4
-∞ X < +0 -π/2
-∞ -0 -π/2
-∞ +0 -π/2
-∞ X > +0 -π/2
-∞ +∞ -π/4
X < +0 -∞ -π
X < +0 -0 -π/2
X < +0 +0 -π/2
X < +0 +∞ -0
-0 -∞ -π
-0 X < +0 -π
-0 -0 -π
-0 +0 -0
-0 X > +0 -0
-0 +∞ -0
+0 -∞ +π
+0 X < +0 +π
+0 -0 +π
+0 +0 +0
+0 X > +0 +0
+0 +∞ +0
X > +0 -∞ +π
X > +0 -0 +π/2
X > +0 +0 +π/2
X > +0 +∞ +0
+∞ -∞ +3*π/4
+∞ X < +0 +π/2
+∞ -0 +π/2
+∞ +0 +π/2
+∞ X > +0 +π/2
+∞ +∞ +π/4
X > +0 QNAN QNAN
X > +0 SNAN QNAN INVALID
QNAN X > +0 QNAN
SNAN X > +0 QNAN INVALID
QNAN QNAN QNAN
QNAN SNAN QNAN INVALID
SNAN QNAN QNAN INVALID
SNAN SNAN QNAN INVALID
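
A minimal usage sketch (not part of the original reference) for the double precision variant vdAtan2, with
one input pair per quadrant:

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    double a[4] = {  1.0,  1.0, -1.0, -1.0 };   /* "y" coordinates */
    double b[4] = {  1.0, -1.0, -1.0,  1.0 };   /* "x" coordinates */
    double y[4];
    MKL_INT n = 4;

    vdAtan2(n, a, b, y);                     /* y[i] = four-quadrant arctangent of a[i]/b[i] */

    for (int i = 0; i < 4; i++)
        printf("atan2(%g, %g) = %g\n", a[i], b[i], y[i]);
    return 0;
}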

Hyperbolic Functions

v?Cosh
Computes hyperbolic cosine of vector elements.

Syntax
vsCosh( n, a, y );
vmsCosh( n, a, y, mode );
vdCosh( n, a, y );
vmdCosh( n, a, y, mode );
vcCosh( n, a, y );
vmcCosh( n, a, y, mode );
vzCosh( n, a, y );
vmzCosh( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsCosh, vmsCosh Pointer to an array that contains the input vector
a.
const double* for vdCosh, vmdCosh
const MKL_Complex8* for vcCosh,
vmcCosh
const MKL_Complex16* for vzCosh,
vmzCosh

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Precision Overflow Thresholds for Real v?Cosh Function


Data Type Threshold Limitations on Input Parameters

single precision -Ln(FLT_MAX)-Ln2 < a[i] < Ln(FLT_MAX)+Ln2

double precision -Ln(DBL_MAX)-Ln2 < a[i] < Ln(DBL_MAX)+Ln2

Precision overflow thresholds for the complex v?Cosh function are beyond the scope of this document.

Output Parameters

Name Type Description

y float* for vsCosh, vmsCosh Pointer to an array that contains the output
vector y.
double* for vdCosh, vmdCosh
MKL_Complex8* for vcCosh, vmcCosh
MKL_Complex16* for vzCosh,
vmzCosh

Description
The v?Cosh function computes hyperbolic cosine of vector elements.
Special Values for Real Function v?Cosh(x)
Argument Result VM Error Status Exception

+0 +1
-0 +1
X > overflow +∞ VML_STATUS_OVERFLOW OVERFLOW
X < -overflow +∞ VML_STATUS_OVERFLOW OVERFLOW
+∞ +∞


-∞ +∞
QNAN QNAN
SNAN QNAN INVALID

See the Special Value Notations section for the conventions used in the table below.
Special Values for Complex Function v?Cosh(z)
RE(z) -∞ -X -0 +0 +X +∞ NAN

i·IM(z)

+i·∞ +∞+i·QNAN QNAN+i·QNAN QNAN-i·0 QNAN+i·0 QNAN+i·QNAN +∞+i·QNAN QNAN+i·QNAN

INVALID INVALID INVALID INVALID INVALID INVALID

+i·Y +∞·Cos(Y)- +∞·CIS(Y) QNAN+i·QNAN

i·∞·Sin(Y)

+i·0 +∞-i·0 +1-i·0 +1+i·0 +∞+i·0 QNAN+i·0

-i·0 +∞+i·0 +1+i·0 +1-i·0 +∞-i·0 QNAN-i·0

-i·Y +∞·Cos(Y)- +∞·CIS(Y) QNAN+i·QNAN

i·∞·Sin(Y)

-i·∞ +∞+i·QNAN QNAN+i·QNAN QNAN+i·0 QNAN-i·0 INVALID QNAN+i·QNAN +∞+i·QNAN QNAN+i·QNAN

INVALID INVALID INVALID INVALID INVALID

+i·NAN +∞+i·QNAN QNAN+i·QNAN QNAN+i·QNAN QNAN-i·QNAN QNAN+i·QNAN +∞+i·QNAN QNAN+i·QNAN

Notes:

• raises the INVALID exception when the real or imaginary part of the argument is SNAN
• raises the OVERFLOW exception and sets the VM Error Status to VML_STATUS_OVERFLOW in the case of
overflow, that is, when RE(z), IM(z) are finite non-zero numbers, but the real or imaginary part of the
exact result is so large that it does not meet the target precision.
• Cosh(CONJ(z))=CONJ(Cosh(z))
• Cosh(-z)=Cosh(z).
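A sketch (the function name and array values are illustrative) of checking the VM Error Status after a call whose inputs may exceed the overflow threshold, using the vmlClearErrStatus and vmlGetErrStatus service functions described later in this chapter:

#include <stdio.h>
#include <mkl.h>

void cosh_with_status_check(void)
{
    double a[2] = { 1.0, 1000.0 };   /* 1000.0 is above the double-precision threshold */
    double y[2];

    vmlClearErrStatus( );
    vdCosh( 2, a, y );
    if (vmlGetErrStatus( ) == VML_STATUS_OVERFLOW)
        printf( "at least one cosh result overflowed to infinity\n" );
}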

v?Sinh
Computes hyperbolic sine of vector elements.

Syntax
vsSinh( n, a, y );
vmsSinh( n, a, y, mode );
vdSinh( n, a, y );
vmdSinh( n, a, y, mode );
vcSinh( n, a, y );
vmcSinh( n, a, y, mode );
vzSinh( n, a, y );
vmzSinh( n, a, y, mode );


Include Files
• mkl.h
Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsSinh, vmsSinh Pointer to an array that contains the input vector
a.
const double* for vdSinh, vmdSinh
const MKL_Complex8* for vcSinh,
vmcSinh
const MKL_Complex16* for vzSinh,
vmzSinh

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.
Precision Overflow Thresholds for Real v?Sinh Function
Data Type Threshold Limitations on Input Parameters

single precision -Ln(FLT_MAX)-Ln2 < a[i] < Ln(FLT_MAX)+Ln2

double precision -Ln(DBL_MAX)-Ln2 < a[i] < Ln(DBL_MAX)+Ln2

Precision overflow thresholds for the complex v?Sinh function are beyond the scope of this document.

Output Parameters

Name Type Description

y float* for vsSinh, vmsSinh Pointer to an array that contains the output
vector y.
double* for vdSinh, vmdSinh
MKL_Complex8* for vcSinh, vmcSinh
MKL_Complex16* for vzSinh,
vmzSinh

Description
The v?Sinh function computes hyperbolic sine of vector elements.
Special Values for Real Function v?Sinh(x)
Argument Result VM Error Status Exception

+0 +0
-0 -0
X > overflow +∞ VML_STATUS_OVERFLOW OVERFLOW
X < -overflow -∞ VML_STATUS_OVERFLOW OVERFLOW
+∞ +∞
-∞ -∞
QNAN QNAN
SNAN QNAN INVALID

See the Special Value Notations section for the conventions used in the table below.

Special Values for Complex Function v?Sinh(z)
RE(z) -∞ -X -0 +0 +X +∞ NAN

i·IM(z)

+i·∞ -∞+i·QNAN QNAN+i·QNAN -0+i·QNAN +0+i·QNAN QNAN+i·QNAN +∞+i·QNAN QNAN+i·QNAN

INVALID INVALID INVALID INVALID INVALID INVALID

+i·Y -∞·Cos(Y)+ +∞·CIS(Y) QNAN+i·QNAN

i·∞·Sin(Y)

+i·0 -∞+i·0 -0+i·0 +0+i·0 +∞+i·0 QNAN+i·0

-i·0 -∞-i·0 -0-i·0 +0-i·0 +∞-i·0 QNAN-i·0

-i·Y -∞·Cos(Y)+ +∞·CIS(Y) QNAN+i·QNAN

i·∞·Sin(Y)

-i·∞ -∞+i·QNAN QNAN+i·QNAN -0+i·QNAN +0+i·QNAN QNAN+i·QNAN +∞+i·QNAN QNAN+i·QNAN

INVALID INVALID INVALID INVALID INVALID INVALID

+i·NAN -∞+i·QNAN QNAN+i·QNAN -0+i·QNAN +0+i·QNAN QNAN+i·QNAN +∞+i·QNAN QNAN+i·QNAN

Notes:

• raises the INVALID exception when the real or imaginary part of the argument is SNAN
• raises the OVERFLOW exception and sets the VM Error Status to VML_STATUS_OVERFLOW in the case of
overflow, that is, when RE(z), IM(z) are finite non-zero numbers, but the real or imaginary part of the
exact result is so large that it does not meet the target precision.
• Sinh(CONJ(z))=CONJ(Sinh(z))
• Sinh(-z)=-Sinh(z).

v?Tanh
Computes hyperbolic tangent of vector elements.

Syntax
vsTanh( n, a, y );
vmsTanh( n, a, y, mode );
vdTanh( n, a, y );
vmdTanh( n, a, y, mode );
vcTanh( n, a, y );
vmcTanh( n, a, y, mode );
vzTanh( n, a, y );
vmzTanh( n, a, y, mode );

Include Files
• mkl.h


Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsTanh, vmsTanh Pointer to an array that contains the input vector
a.
const double* for vdTanh, vmdTanh
const MKL_Complex8* for vcTanh,
vmcTanh
const MKL_Complex16* for vzTanh,
vmzTanh

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsTanh, vmsTanh Pointer to an array that contains the output
vector y.
double* for vdTanh, vmdTanh
MKL_Complex8* for vcTanh, vmcTanh
MKL_Complex16* for vzTanh,
vmzTanh

Description
The v?Tanh function computes hyperbolic tangent of vector elements.
Special Values for Real Function v?Tanh(x)
Argument Result Exception

+0 +0
-0 -0
+∞ +1
-∞ -1
QNAN QNAN
SNAN QNAN INVALID

See the Special Value Notations section for the conventions used in the table below.
Special Values for Complex Function v?Tanh(z)
RE(z) -∞ -X -0 +0 +X +∞ NAN

i·IM(z)

+i·∞ -1+i·0 QNAN+i·QNAN QNAN+i·QNAN QNAN+i·QNAN QNAN+i·QNAN +1+i·0 QNAN+i·QNAN

INVALID INVALID INVALID INVALID

+i·Y -1+i·0·Tan(Y) +1+i·0·Tan(Y) QNAN+i·QNAN

+i·0 -1+i·0 -0+i·0 +0+i·0 +1+i·0 QNAN+i·0

-i·0 -1-i·0 -0-i·0 +0-i·0 +1-i·0 QNAN-i·0

-i·Y -1+i·0·Tan(Y) +1+i·0·Tan(Y) QNAN+i·QNAN

-i·∞ -1-i·0 QNAN+i·QNAN QNAN+i·QNAN QNAN+i·QNAN QNAN+i·QNAN +1-i·0 QNAN+i·QNAN

INVALID INVALID INVALID INVALID

+i·NAN -1+i·0 QNAN+i·QNAN QNAN+i·QNAN QNAN+i·QNAN QNAN+i·QNAN +1+i·0 QNAN+i·QNAN

Notes:

• raises INVALID exception when real or imaginary part of the argument is SNAN
• Tanh(CONJ(z))=CONJ(Tanh(z))
• Tanh(-z)=-Tanh(z).

v?Acosh
Computes inverse hyperbolic cosine (nonnegative) of
vector elements.

Syntax
vsAcosh( n, a, y );
vmsAcosh( n, a, y, mode );
vdAcosh( n, a, y );
vmdAcosh( n, a, y, mode );
vcAcosh( n, a, y );
vmcAcosh( n, a, y, mode );
vzAcosh( n, a, y );
vmzAcosh( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsAcosh, Pointer to an array that contains the input vector
vmsAcosh a.

const double* for vdAcosh,


vmdAcosh
const MKL_Complex8* for vcAcosh,
vmcAcosh
const MKL_Complex16* for vzAcosh,
vmzAcosh


mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsAcosh, vmsAcosh Pointer to an array that contains the output
vector y.
double* for vdAcosh, vmdAcosh
MKL_Complex8* for vcAcosh,
vmcAcosh
MKL_Complex16* for vzAcosh,
vmzAcosh

Description
The v?Acosh function computes inverse hyperbolic cosine (nonnegative) of vector elements.
Special Values for Real Function v?Acosh(x)
Argument Result VM Error Status Exception

+1 +0
X < +1 QNAN VML_STATUS_ERRDOM INVALID
-∞ QNAN VML_STATUS_ERRDOM INVALID
+∞ +∞
QNAN QNAN
SNAN QNAN INVALID

See the Special Value Notations section for the conventions used in the table below.
Special Values for Complex Function v?Acosh(z)
RE(z) -∞ -X -0 +0 +X +∞ NAN
i·IM(z
)
+i·∞ +∞+i·π/2 +∞+i·π/2 +∞+i·π/2 +∞+i·π/2 +∞+i·π/4 +∞+i·QNAN

+i·Y +∞+i·π +∞+i·0 QNAN+i·QNAN

+i·0 +∞+i·π +0+i·π/2 +0+i·π/2 +∞+i·0 QNAN+i·QNAN

-i·0 +∞+i·π +0+i·π/2 +0+i·π/2 +∞+i·0 QNAN+i·QNAN

-i·Y +∞+i·π +∞+i·0 QNAN+i·QNAN

-i·∞ +∞-i·π/2 +∞-i·π/2 +∞-i·π/2 +∞-i·π/2 +∞-i·π/4 +∞+i·QNAN

+i·NAN +∞+i·QNAN QNAN+i·QNAN QNAN+i·QNAN QNAN+i·QNAN QNAN+i·QNAN +∞+i·QNAN QNAN+i·QNAN

Notes:

• raises INVALID exception when real or imaginary part of the argument is SNAN
• Acosh(CONJ(z))=CONJ(Acosh(z)).

v?Asinh
Computes inverse hyperbolic sine of vector elements.

Syntax
vsAsinh( n, a, y );
vmsAsinh( n, a, y, mode );
vdAsinh( n, a, y );
vmdAsinh( n, a, y, mode );
vcAsinh( n, a, y );
vmcAsinh( n, a, y, mode );
vzAsinh( n, a, y );
vmzAsinh( n, a, y, mode );

Include Files
• mkl.h
Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsAsinh, Pointer to an array that contains the input vector
vmsAsinh a.

const double* for vdAsinh,


vmdAsinh
const MKL_Complex8* for vcAsinh,
vmcAsinh
const MKL_Complex16* for vzAsinh,
vmzAsinh

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsAsinh, vmsAsinh Pointer to an array that contains the output
vector y.
double* for vdAsinh, vmdAsinh
MKL_Complex8* for vcAsinh,
vmcAsinh
MKL_Complex16* for vzAsinh,
vmzAsinh

Description
The v?Asinh function computes inverse hyperbolic sine of vector elements.


Special Values for Real Function v?Asinh(x)


Argument Result Exception

+0 +0
-0 -0
+∞ +∞
-∞ -∞
QNAN QNAN
SNAN QNAN INVALID

See the Special Value Notations section for the conventions used in the table below.
Special Values for Complex Function v?Asinh(z)
RE(z) -∞ -X -0 +0 +X +∞ NAN
i·IM(z
)
+i·∞ -∞+i·π/4 -∞+i·π/2 +∞+i·π/2 +∞+i·π/2 +∞+i·π/2 +∞+i·π/4 +∞+i·QNAN
+i·Y -∞+i·0 +∞+i·0 QNAN+i·QNAN

+i·0 +∞+i·0 +0+i·0 +0+i·0 +∞+i·0 QNAN+i·QNAN

-i·0 -∞-i·0 -0-i·0 +0-i·0 +∞-i·0 QNAN-i·QNAN

-i·Y -∞-i·0 +∞-i·0 QNAN+i·QNAN

-i·∞ -∞-i·π/4 -∞-i·π/2 -∞-i·π/2 +∞-i·π/2 +∞-i·π/2 +∞-i·π/4 +∞+i·QNAN


+i·NAN -∞+i·QNAN QNAN QNAN QNAN QNAN +∞+i·QNAN QNAN+i·QNAN
+i·QNAN +i·QNAN +i·QNAN +i·QNAN

Notes:

• raises INVALID exception when real or imaginary part of the argument is SNAN
• Asinh(CONJ(z))=CONJ(Asinh(z))
• Asinh(-z)=-Asinh(z).

v?Atanh
Computes inverse hyperbolic tangent of vector
elements.

Syntax
vsAtanh( n, a, y );
vmsAtanh( n, a, y, mode );
vdAtanh( n, a, y );
vmdAtanh( n, a, y, mode );
vcAtanh( n, a, y );
vmcAtanh( n, a, y, mode );
vzAtanh( n, a, y );
vmzAtanh( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsAtanh, Pointer to an array that contains the input vector
vmsAtanh a.

const double* for vdAtanh,


vmdAtanh
const MKL_Complex8* for vcAtanh,
vmcAtanh
const MKL_Complex16* for vzAtanh,
vmzAtanh

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsAtanh, vmsAtanh Pointer to an array that contains the output
vector y.
double* for vdAtanh, vmdAtanh
MKL_Complex8* for vcAtanh,
vmcAtanh
MKL_Complex16* for vzAtanh,
vmzAtanh

Description
The v?Atanh function computes inverse hyperbolic tangent of vector elements.
Special Values for Real Function v?Atanh(x)
Argument Result VM Error Status Exception

+1 +∞ VML_STATUS_SING ZERODIVIDE
-1 -∞ VML_STATUS_SING ZERODIVIDE
|X| > 1 QNAN VML_STATUS_ERRDOM INVALID
+∞ QNAN VML_STATUS_ERRDOM INVALID
-∞ QNAN VML_STATUS_ERRDOM INVALID
QNAN QNAN
SNAN QNAN INVALID

See the Special Value Notations section for the conventions used in the table below.
Special Values for Complex Function v?Atanh(z)
RE(z) -∞ -X -0 +0 +X +∞ NAN
i·IM(z
)
+i·∞ -0+i·π/2 -0+i·π/2 -0+i·π/2 +0+i·π/2 +0+i·π/2 +0+i·π/2 +0+i·π/2

+i·Y -0+i·π/2 +0+i·π/2 QNAN+i·QNAN

+i·0 -0+i·π/2 -0+i·0 +0+i·0 +0+i·π/2 QNAN+i·QNAN

-i·0 -0-i·π/2 -0-i·0 +0-i·0 +0-i·π/2 QNAN-i·QNAN

-i·Y -0-i·π/2 +0-i·π/2 QNAN+i·QNAN

-i·∞ -0-i·π/2 -0-i·π/2 -0-i·π/2 +0-i·π/2 +0-i·π/2 +0-i·π/2 +0-i·π/2


+i·NAN -0+i·QNAN QNAN -0+i·QNAN +0+i·QNAN QNAN +0+i·QNAN QNAN+i·QNAN
+i·QNAN +i·QNAN

Notes:

• Atanh(+-1+-i*0)=+-∞+-i*0, and ZERODIVIDE exception is raised


• raises INVALID exception when real or imaginary part of the argument is SNAN
• Atanh(CONJ(z))=CONJ(Atanh(z))
• Atanh(-z)=-Atanh(z).

Special Functions

v?Erf
Computes the error function value of vector elements.

Syntax
vsErf( n, a, y );
vmsErf( n, a, y, mode );
vdErf( n, a, y );
vmdErf( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsErf, vmsErf Pointer to an array that contains the input vector
a.
const double* for vdErf, vmdErf

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsErf, vmsErf Pointer to an array that contains the output
vector y.
double* for vdErf, vmdErf

Description

The Erf function computes the error function values for elements of the input vector a and writes them to
the output vector y.
The error function is defined as

erf(x) = (2/√π) * ∫ from 0 to x of exp(-t²) dt.

Useful relations:

erfc(x) = 1 - erf(x),

where erfc is the complementary error function;

Φ(x) = 1/2 * (1 + erf(x/√2)),

where Φ(x) is the cumulative normal distribution function;

Φ^-1(x) = √2 * erf^-1(2*x - 1),

where Φ^-1(x) and erf^-1(x) are the inverses to Φ(x) and erf(x), respectively.

The following figure illustrates the relationships among Erf family functions (Erf, Erfc, CdfNorm).

Erf Family Functions Relationship


Useful relations for these functions:

Special Values for Real Function v?Erf(x)


Argument Result Exception

+∞ +1
-∞ -1
QNAN QNAN
SNAN QNAN INVALID
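As a brief illustrative sketch of the relation between erf(x) and the cumulative normal distribution Φ(x), the fragment below evaluates both with vdErf and the related vdCdfNorm function described later in this section (the array contents and the function name are illustrative):

#include <mkl.h>

void erf_vs_cdfnorm(void)
{
    double a[3]   = { -1.0, 0.0, 1.0 };
    double e[3], phi[3];

    vdErf( 3, a, e );          /* e[i]   = erf(a[i])                         */
    vdCdfNorm( 3, a, phi );    /* phi[i] = 0.5 * (1 + erf(a[i]/sqrt(2)))     */
}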

See Also
v?Erfc
v?CdfNorm

v?Erfc
Computes the complementary error function value of
vector elements.

Syntax
vsErfc( n, a, y );
vmsErfc( n, a, y, mode );
vdErfc( n, a, y );
vmdErfc( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsErfc, vmsErfc Pointer to an array that contains the input vector
a.
const double* for vdErfc, vmdErfc

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsErfc, vmsErfc Pointer to an array that contains the output
vector y.
double* for vdErfc, vmdErfc

Description

The Erfc function computes the complementary error function values for elements of the input vector a and
writes them to the output vector y.
The complementary error function is defined as

erfc(x) = 1 - erf(x) = (2/√π) * ∫ from x to ∞ of exp(-t²) dt.

Useful relations:

Φ(x) = 1/2 * erfc(-x/√2) = 1 - 1/2 * erfc(x/√2),

where Φ(x) is the cumulative normal distribution function;

Φ^-1(x) = √2 * erf^-1(2*x - 1),

where Φ^-1(x) and erf^-1(x) are the inverses to Φ(x) and erf(x), respectively.


See also Figure "Erf Family Functions Relationship" in Erf function description for Erfc function relationship
with the other functions of Erf family.
Special Values for Real Function v?Erfc(x)
Argument Result VM Error Status Exception

X > underflow +0 VML_STATUS_UNDERFLOW UNDERFLOW


+∞ +0
-∞ +2
QNAN QNAN
SNAN QNAN INVALID

See Also
v?Erf
v?CdfNorm

v?CdfNorm
Computes the cumulative normal distribution function
values of vector elements.

Syntax
vsCdfNorm( n, a, y );
vmsCdfNorm( n, a, y, mode );
vdCdfNorm( n, a, y );
vmdCdfNorm( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsCdfNorm, Pointer to an array that contains the input vector
vmsCdfNorm a.

const double* for vdCdfNorm,


vmdCdfNorm

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsCdfNorm, vmsCdfNorm Pointer to an array that contains the output
vector y.
double* for vdCdfNorm, vmdCdfNorm

Description

The CdfNorm function computes the cumulative normal distribution function values for elements of the input
vector a and writes them to the output vector y.
The cumulative normal distribution function is defined as

CdfNorm(x) = Φ(x) = (1/√(2π)) * ∫ from -∞ to x of exp(-t²/2) dt.

Useful relations:

CdfNorm(x) = 1/2 * (1 + Erf(x/√2)) = 1/2 * Erfc(-x/√2),

where Erf and Erfc are the error and complementary error functions.

See also Figure "Erf Family Functions Relationship" in Erf function description for CdfNorm function
relationship with the other functions of Erf family.
Special Values for Real Function v?CdfNorm(x)
Argument Result VM Error Status Exception

X < underflow +0 VML_STATUS_UNDERFLOW UNDERFLOW


+∞ +1
-∞ +0
QNAN QNAN
SNAN QNAN INVALID

See Also
v?Erf
v?Erfc

v?ErfInv
Computes inverse error function value of vector
elements.

Syntax
vsErfInv( n, a, y );
vmsErfInv( n, a, y, mode );
vdErfInv( n, a, y );
vmdErfInv( n, a, y, mode );

Include Files
• mkl.h


Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsErfInv, Pointer to an array that contains the input vector
vmsErfInv a.

const double* for vdErfInv,


vmdErfInv

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsErfInv, vmsErfInv Pointer to an array that contains the output
vector y.
double* for vdErfInv, vmdErfInv

Description

The ErfInv function computes the inverse error function values for elements of the input vector a and writes
them to the output vector y:
y = erf^-1(a),
where erf(x) is the error function defined as

erf(x) = (2/√π) * ∫ from 0 to x of exp(-t²) dt.

Useful relations:

erfc^-1(x) = erf^-1(1 - x),

where erfc is the complementary error function;

Φ^-1(x) = √2 * erf^-1(2*x - 1) = -√2 * erfc^-1(2*x),


where Φ^-1(x), erf^-1(x), and erfc^-1(x) are the inverses to Φ(x), erf(x), and erfc(x), respectively.

Figure "ErfInv Family Functions Relationship" illustrates the relationships among ErfInv family functions
(ErfInv, ErfcInv, CdfNormInv).

ErfInv Family Functions Relationship

Useful relations for these functions:

Special Values for Real Function v?ErfInv(x)


Argument Result VM Error Status Exception

+0 +0
-0 -0
+1 +∞ VML_STATUS_SING ZERODIVIDE
-1 -∞ VML_STATUS_SING ZERODIVIDE
|X| > 1 QNAN VML_STATUS_ERRDOM INVALID
+∞ QNAN VML_STATUS_ERRDOM INVALID
-∞ QNAN VML_STATUS_ERRDOM INVALID
QNAN QNAN

SNAN QNAN INVALID

See Also
v?ErfcInv
v?CdfNormInv

v?ErfcInv
Computes the inverse complementary error function
value of vector elements.

Syntax
vsErfcInv( n, a, y );
vmsErfcInv( n, a, y, mode );
vdErfcInv( n, a, y );
vmdErfcInv( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsErfcInv, Pointer to an array that contains the input vector
vmsErfcInv a.

const double* for vdErfcInv,


vmdErfcInv

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsErfcInv, vmsErfcInv Pointer to an array that contains the output
vector y.
double* for vdErfcInv, vmdErfcInv

Description

The ErfcInv function computes the inverse complementary error function values for elements of the input
vector a and writes them to the output vector y.
The inverse complementary error function is defined as

erfcinv(x) = erfinv(1 - x),


where erf(x) denotes the error function and erfinv(x) denotes the inverse error function.

See also Figure "ErfInv Family Functions Relationship" in ErfInv function description for ErfcInv function
relationship with the other functions of ErfInv family.
Special Values for Real Function v?ErfcInv(x)
Argument Result VM Error Status Exception

+1 +0
+2 -∞ VML_STATUS_SING ZERODIVIDE
-0 +∞ VML_STATUS_SING ZERODIVIDE
+0 +∞ VML_STATUS_SING ZERODIVIDE
X < -0 QNAN VML_STATUS_ERRDOM INVALID
X > +2 QNAN VML_STATUS_ERRDOM INVALID
+∞ QNAN VML_STATUS_ERRDOM INVALID
-∞ QNAN VML_STATUS_ERRDOM INVALID
QNAN QNAN
SNAN QNAN INVALID

See Also
v?ErfInv
v?CdfNormInv

v?CdfNormInv
Computes the inverse cumulative normal distribution
function values of vector elements.

Syntax
vsCdfNormInv( n, a, y );
vmsCdfNormInv( n, a, y, mode );
vdCdfNormInv( n, a, y );
vmdCdfNormInv( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsCdfNormInv, Pointer to an array that contains the input vector
vmsCdfNormInv a.

const double* for vdCdfNormInv,


vmdCdfNormInv

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsCdfNormInv, Pointer to an array that contains the output


vmsCdfNormInv vector y.

double* for vdCdfNormInv,


vmdCdfNormInv

Description

The CdfNormInv function computes the inverse cumulative normal distribution function values for elements
of the input vector a and writes them to the output vector y.
The inverse cumulative normal distribution function is defined as

CdfNormInv(x) = CdfNorm^-1(x),

where CdfNorm(x) denotes the cumulative normal distribution function.

Useful relations:

CdfNormInv(x) = √2 * erfinv(2*x - 1) = -√2 * erfcinv(2*x),

where erfinv(x) denotes the inverse error function and erfcinv(x) denotes the inverse complementary
error function.
See also Figure "ErfInv Family Functions Relationship" in ErfInv function description for CdfNormInv
function relationship with the other functions of ErfInv family.
Special Values for Real Function v?CdfNormInv(x)
Argument Result VM Error Status Exception

+0.5 +0
+1 +∞ VML_STATUS_SING ZERODIVIDE
-0 -∞ VML_STATUS_SING ZERODIVIDE
+0 -∞ VML_STATUS_SING ZERODIVIDE
X < -0 QNAN VML_STATUS_ERRDOM INVALID
X > +1 QNAN VML_STATUS_ERRDOM INVALID
+∞ QNAN VML_STATUS_ERRDOM INVALID
-∞ QNAN VML_STATUS_ERRDOM INVALID

QNAN QNAN
SNAN QNAN INVALID

See Also
v?ErfInv
v?ErfcInv

v?LGamma
Computes the natural logarithm of the absolute value
of gamma function for vector elements.

Syntax
vsLGamma( n, a, y );
vmsLGamma( n, a, y, mode );
vdLGamma( n, a, y );
vmdLGamma( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsLGamma, Pointer to an array that contains the input vector
vmsLGamma a.

const double* for vdLGamma,


vmdLGamma

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsLGamma, vmsLGamma Pointer to an array that contains the output
vector y.
double* for vdLGamma, vmdLGamma

Description
The v?LGamma function computes the natural logarithm of the absolute value of gamma function for elements
of the input vector a and writes them to the output vector y. Precision overflow thresholds for the v?LGamma
function are beyond the scope of this document. If the result does not meet the target precision, the function
raises the OVERFLOW exception and sets the VM Error Status to VML_STATUS_OVERFLOW.


Special Values for Real Function v?LGamma(x)


Argument Result VM Error Status Exception

+1 +0
+2 +0
+0 +∞ VML_STATUS_SING ZERODIVIDE
-0 +∞ VML_STATUS_SING ZERODIVIDE
negative integer +∞ VML_STATUS_SING ZERODIVIDE
-∞ +∞
+∞ +∞
X > overflow +∞ VML_STATUS_OVERFLOW OVERFLOW
QNAN QNAN
SNAN QNAN INVALID

v?TGamma
Computes the gamma function of vector elements.

Syntax
vsTGamma( n, a, y );
vmsTGamma( n, a, y, mode );
vdTGamma( n, a, y );
vmdTGamma( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsTGamma, Pointer to an array that contains the input vector
vmsTGamma a.

const double* for vdTGamma,


vmdTGamma

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsTGamma, vmsTGamma Pointer to an array that contains the output
vector y.
double* for vdTGamma, vmdTGamma

Description
The v?TGamma function computes the gamma function for elements of the input vector a and writes them to
the output vector y. Precision overflow thresholds for the v?TGamma function are beyond the scope of this
document. If the result does not meet the target precision, the function raises the OVERFLOW exception and
sets the VM Error Status to VML_STATUS_OVERFLOW.
Special Values for Real Function v?TGamma(x)
Argument Result VM Error Status Exception

+0 +∞ VML_STATUS_SING ZERODIVIDE
-0 -∞ VML_STATUS_SING ZERODIVIDE
negative integer QNAN VML_STATUS_ERRDOM INVALID
-∞ QNAN VML_STATUS_ERRDOM INVALID
+∞ +∞
X > overflow +∞ VML_STATUS_OVERFLOW OVERFLOW
QNAN QNAN
SNAN QNAN INVALID

v?ExpInt1
Computes the exponential integral of vector elements.

Syntax
vsExpInt1( n, a, y );
vmsExpInt1( n, a, y, mode );
vdExpInt1( n, a, y );
vmdExpInt1( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsExpInt1, Pointer to an array that contains the input vector
vmsExpInt1 a.

const double* for vdExpInt1,


vmdExpInt1

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsExpInt1, vmsExpInt1 Pointer to an array that contains the output
vector y.

double* for vdExpInt1, vmdExpInt1

Description
The v?ExpInt1 function computes the exponential integral E1 of vector elements.

For positive real values x, this can be written as:

E1(x) = ∫ from x to ∞ of (e^(-t)/t) dt = ∫ from 1 to ∞ of (e^(-x*t)/t) dt.

For negative real values x, the result is defined as NAN.


Special Values for Real Function v?ExpInt1(x)
Argument Result VM Error Status Exception

x < +0 QNAN VML_STATUS_ERRDOM INVALID


+0 +∞ VML_STATUS_SING ZERODIVIDE
-0 +∞ VML_STATUS_SING ZERODIVIDE
+∞ +0
-∞ QNAN VML_STATUS_ERRDOM INVALID
QNAN QNAN
SNAN QNAN INVALID

Rounding Functions

v?Floor
Computes an integer value rounded towards minus
infinity for each vector element.

Syntax
vsFloor( n, a, y );
vmsFloor( n, a, y, mode );
vdFloor( n, a, y );
vmdFloor( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsFloor, Pointer to an array that contains the input vector
vmsFloor a.

const double* for vdFloor,


vmdFloor


mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsFloor, vmsFloor Pointer to an array that contains the output
vector y.
double* for vdFloor, vmdFloor

Description
The function computes an integer value rounded towards minus infinity for each vector element.

Special Values for Real Function v?Floor(x)


Argument Result Exception

+0 +0
-0 -0
+∞ +∞
-∞ -∞
SNAN QNAN INVALID
QNAN QNAN

v?Ceil
Computes an integer value rounded towards plus
infinity for each vector element.

Syntax
vsCeil( n, a, y );
vmsCeil( n, a, y, mode );
vdCeil( n, a, y );
vmdCeil( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.


a const float* for vsCeil, vmsCeil Pointer to an array that contains the input vector
a.
const double* for vdCeil, vmdCeil

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsCeil, vmsCeil Pointer to an array that contains the output
vector y.
double* for vdCeil, vmdCeil

Description
The function computes an integer value rounded towards plus infinity for each vector element.

Special Values for Real Function v?Ceil(x)


Argument Result Exception

+0 +0
-0 -0
+∞ +∞
-∞ -∞
SNAN QNAN INVALID
QNAN QNAN

v?Trunc
Computes an integer value rounded towards zero for
each vector element.

Syntax
vsTrunc( n, a, y );
vmsTrunc( n, a, y, mode );
vdTrunc( n, a, y );
vmdTrunc( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsTrunc, Pointer to an array that contains the input vector
vmsTrunc a.

const double* for vdTrunc,


vmdTrunc

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsTrunc, vmsTrunc Pointer to an array that contains the output
vector y.
double* for vdTrunc, vmdTrunc

Description
The function computes an integer value rounded towards zero for each vector element.

Special Values for Real Function v?Trunc(x)


Argument Result Exception

+0 +0
-0 -0
+∞ +∞
-∞ -∞
SNAN QNAN INVALID
QNAN QNAN

v?Round
Computes a value rounded to the nearest integer for
each vector element.

Syntax
vsRound( n, a, y );
vmsRound( n, a, y, mode );
vdRound( n, a, y );
vmdRound( n, a, y, mode );


Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsRound, Pointer to an array that contains the input vector
vmsRound a.

const double* for vdRound,


vmdRound

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsRound, vmsRound Pointer to an array that contains the output
vector y.
double* for vdRound, vmdRound

Description
The function computes a value rounded to the nearest integer for each vector element. Input elements that
are halfway between two consecutive integers are always rounded away from zero regardless of the rounding
mode.
Special Values for Real Function v?Round(x)
Argument Result Exception

+0 +0
-0 -0
+∞ +∞
-∞ -∞
SNAN QNAN INVALID
QNAN QNAN

v?NearbyInt
Computes a rounded integer value in the current
rounding mode for each vector element.

Syntax
vsNearbyInt( n, a, y );
vmsNearbyInt( n, a, y, mode );
vdNearbyInt( n, a, y );
vmdNearbyInt( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsNearbyInt, Pointer to an array that contains the input vector
vmsNearbyInt a.

const double* for vdNearbyInt,


vmdNearbyInt

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsNearbyInt, Pointer to an array that contains the output


vmsNearbyInt vector y.

double* for vdNearbyInt,


vmdNearbyInt

Description
The v?NearbyInt function computes a rounded integer value in the current rounding mode for each vector
element.
Special Values for Real Function v?NearbyInt(x)
Argument Result Exception

+0 +0
-0 -0
+∞ +∞
-∞ -∞
SNAN QNAN INVALID
QNAN QNAN

v?Rint
Computes a rounded integer value in the current
rounding mode.

Syntax
vsRint( n, a, y );
vmsRint( n, a, y, mode );
vdRint( n, a, y );
vmdRint( n, a, y, mode );


Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsRint, vmsRint Pointer to an array that contains the input vector
a.
const double* for vdRint, vmdRint

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsRint, vmsRint Pointer to an array that contains the output
vector y.
double* for vdRint, vmdRint

Description
The v?Rint function computes a rounded floating-point integer value using the current rounding mode for
each vector element.
The rounding mode affects the results computed for inputs that fall between consecutive integers. For
example:
• f(0.5) = 0, for rounding modes set to round to nearest, round toward zero, or round toward minus infinity.
• f(0.5) = 1, for the rounding mode set to round toward plus infinity.
• f(-1.5) = -2, for rounding modes set to round to nearest or round toward minus infinity.
• f(-1.5) = -1, for rounding modes set to round toward zero or round toward plus infinity.
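A sketch of how the current rounding mode influences these results, assuming the mode is changed with the standard C fesetround facility (the function name and array values are illustrative):

#include <fenv.h>
#include <mkl.h>

void rint_demo(void)
{
    double a[2] = { 0.5, -1.5 };
    double y[2];

    fesetround( FE_TONEAREST );   /* round to nearest (even): expect y = { 0.0, -2.0 } */
    vdRint( 2, a, y );

    fesetround( FE_UPWARD );      /* round toward plus infinity: expect y = { 1.0, -1.0 } */
    vdRint( 2, a, y );

    fesetround( FE_TONEAREST );   /* restore the default mode */
}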
Special Values for Real Function v?Rint(x)
Argument Result Exception

+0 +0
-0 -0
+∞ +∞
-∞ -∞
SNAN QNAN INVALID
QNAN QNAN

v?Modf
Computes a truncated integer value and the remaining
fraction part for each vector element.

Syntax
vsModf( n, a, y, z );
vmsModf( n, a, y, z, mode );
vdModf( n, a, y, z );

vmdModf( n, a, y, z, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsModf, vmsModf Pointer to an array that contains the input vector
a.
const double* for vdModf, vmdModf

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y, z float* for vsModf, vmsModf Pointer to an array that contains the output
vector y and z.
double* for vdModf, vmdModf

Description
The function computes a truncated integer value and the remaining fraction part for each vector element.

Special Values for Real Function v?Modf(x)


Argument Result: y(i) Result: z(i) Exception

+0 +0 +0
-0 -0 -0
+∞ +∞ +0
-∞ -∞ -0
SNAN QNAN QNAN INVALID
QNAN QNAN QNAN
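A minimal sketch of a vdModf call (the function name and values are illustrative):

#include <mkl.h>

void modf_demo(void)
{
    double a[3] = { 2.75, -3.25, 0.5 };
    double ipart[3], frac[3];

    /* ipart[i] = trunc(a[i]); frac[i] = a[i] - ipart[i], keeping the sign of a[i] */
    vdModf( 3, a, ipart, frac );
    /* expected: ipart = { 2, -3, 0 }, frac = { 0.75, -0.25, 0.5 } */
}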

v?Frac
Computes a signed fractional part for each vector
element.


Syntax
vsFrac( n, a, y );
vmsFrac( n, a, y, mode );
vdFrac( n, a, y );
vmdFrac( n, a, y, mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsFrac, vmsFrac Pointer to an array that contains the input vector
a.
const double* for vdFrac, vmdFrac

mode const MKL_INT64 Overrides global VM mode setting for this


function call. See vmlSetMode for possible
values and their description.

Output Parameters

Name Type Description

y float* for vsFrac, vmsFrac Pointer to an array that contains the output
vector y.
double* for vdFrac, vmdFrac

Description
The function computes a signed fractional part for each vector element.

Special Values for Real Function v?Frac(x)


Argument Result Exception

+0 +0
-0 -0
+∞ +0
-∞ -0
SNAN QNAN INVALID
QNAN QNAN

VM Pack/Unpack Functions
This section describes VM functions that convert vectors with unit increment to and from vectors with positive
increment indexing, vector indexing, and mask indexing (see Appendix B for details on vector indexing
methods).
The table below lists available VM Pack/Unpack functions, together with data types and indexing methods
associated with them.
VM Pack/Unpack Functions
Function Short Name   Data Types   Indexing Methods   Description
v?Pack                s, d, c, z   I, V, M            Gathers elements of arrays, indexed by different methods.
v?Unpack              s, d, c, z   I, V, M            Scatters vector elements to arrays with different indexing.

See Also
Vector Arguments in VM

v?Pack
Copies elements of an array with specified indexing to
a vector with unit increment.

Syntax
vsPackI( n, a, inca, y );
vsPackV( n, a, ia, y );
vsPackM( n, a, ma, y );
vdPackI( n, a, inca, y );
vdPackV( n, a, ia, y );
vdPackM( n, a, ma, y );
vcPackI( n, a, inca, y );
vcPackV( n, a, ia, y );
vcPackM( n, a, ma, y );
vzPackI( n, a, inca, y );
vzPackV( n, a, ia, y );
vzPackM( n, a, ma, y );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be calculated.

a const float* for vsPackI, Specifies pointer to an array that contains the input
vsPackV, vsPackM vector a. The arrays must be:

const double* for vdPackI, for v?PackI, at least (1 + (n-1)*inca)


vdPackV, vdPackM
for v?PackV, at least max( n,max(ia[j]) ), j=0,
const MKL_Complex8* for vcPackI, …, n-1
vcPackV, vcPackM
for v?PackM, at least n.
const MKL_Complex16* for
vzPackI, vzPackV, vzPackM

inca const MKL_INT for vsPackI, Specifies the increment for the elements of a.
vdPackI, vcPackI, vzPackI

ia Specifies the pointer to an array of size at least n


const int* for vsPackV, vdPackV,
that contains the index vector for the elements of a.
vcPackV, vzPackV

ma Specifies the pointer to an array of size at least n


const int* for vsPackM, vdPackM,
that contains the mask vector for the elements of a.
vcPackM, vzPackM

Output Parameters

Name Type Description

y float* for vsPackI, vsPackV, Pointer to an array of size at least n that


vsPackM contains the output vector y.

double* for vdPackI, vdPackV,


vdPackM
const MKL_Complex8* for vcPackI,
vcPackV, vcPackM
const MKL_Complex16* for vzPackI,
vzPackV, vzPackM
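A minimal sketch of gathering with increment indexing via vdPackI (the function name and array values are illustrative):

#include <mkl.h>

void pack_demo(void)
{
    double a[8] = { 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0 };
    double y[4];

    /* Gather every second element of a into the contiguous vector y. */
    vdPackI( 4, a, 2, y );     /* expected: y = { 0.0, 2.0, 4.0, 6.0 } */
}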

v?Unpack
Copies elements of a vector with unit increment to an
array with specified indexing.

Syntax
vsUnpackI( n, a, y, incy );
vsUnpackV( n, a, y, iy );
vsUnpackM( n, a, y, my );
vdUnpackI( n, a, y, incy );
vdUnpackV( n, a, y, iy );
vdUnpackM( n, a, y, my );
vcUnpackI( n, a, y, incy );
vcUnpackV( n, a, y, iy );
vcUnpackM( n, a, y, my );
vzUnpackI( n, a, y, incy );

vzUnpackV( n, a, y, iy );
vzUnpackM( n, a, y, my );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Specifies the number of elements to be


calculated.

a const float* for vsUnpackI,


Specifies the pointer to an array of size at least n
vsUnpackV, vsUnpackM that contains the input vector a.
const double* for vdUnpackI,
vdUnpackV, vdUnpackM
const MKL_Complex8* for
vcUnpackI, vcUnpackV, vcUnpackM
const MKL_Complex16* for
vzUnpackI, vzUnpackV, vzUnpackM

incy const MKL_INT for vsUnpackI, Specifies the increment for the elements of y.
vdUnpackI, vcUnpackI, vzUnpackI

iy const int* for vsUnpackV, Specifies the pointer to an array of size at least n
vdUnpackV, vcUnpackV, vzUnpackV that contains the index vector for the elements
of a.

my const int* for vsUnpackM, Specifies the pointer to an array of size at least n
vdUnpackM, vcUnpackM, vzUnpackM that contains the mask vector for the elements
of a.

Output Parameters

Name Type Description

y float* for vsUnpackI, vsUnpackV, Specifies the pointer to an array that contains the
vsUnpackM output vector y.

double* for vdUnpackI, The array must be:


vdUnpackV, vdUnpackM
for v?UnpackI, at least (1 + (n-1)*incy)
const MKL_Complex8* for for v?UnpackV, at least
vcUnpackI, vcUnpackV, vcUnpackM max( n,max(ia[j]) ),j=0,..., n-1,
const MKL_Complex16* for for v?UnpackM, at least n.
vzUnpackI, vzUnpackV, vzUnpackM

VM Service Functions
The VM Service functions enable you to set/get the accuracy mode and error code. These functions are
available both in the Fortran and C interfaces. The table below lists available VM Service functions and their
short description.


VM Service Functions
Function Short Name Description
vmlSetMode Sets the VM mode
vmlGetMode Gets the VM mode
MKLFreeTls Frees allocated VM/VS thread local storage memory from within DllMain
routine (Windows* OS only)
vmlSetErrStatus Sets the VM Error Status
vmlGetErrStatus Gets the VM Error Status
vmlClearErrStatus Clears the VM Error Status
vmlSetErrorCallBack Sets the additional error handler callback function
vmlGetErrorCallBack Gets the additional error handler callback function
vmlClearErrorCallBack Deletes the additional error handler callback function

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

vmlSetMode
Sets a new mode for VM functions according to the
mode parameter and stores the previous VM mode to
oldmode.

Syntax
oldmode = vmlSetMode( mode );

Include Files
• mkl.h

Input Parameters

Name Type Description

mode const MKL_UINT Specifies the VM mode to be set.

Output Parameters

Name Type Description

oldmode unsigned int Specifies the former VM mode.

Description
The vmlSetMode function sets a new mode for VM functions according to the mode parameter and stores the
previous VM mode to oldmode. The mode change has a global effect on all the VM functions within a thread.

NOTE
You can override the global mode setting and change the mode for a given VM function call by using a
respective vm[s,d]<Func> variant of the function.
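For example, a sketch of such a per-call override using the vmdCosh variant and the mode values documented in this chapter (the wrapper function name is illustrative):

#include <mkl.h>

void cosh_ep_once(MKL_INT n, const double *a, double *y)
{
    /* This single call runs with enhanced-performance accuracy and FTZ/DAZ
       enabled; the global VM mode set by vmlSetMode is left untouched. */
    vmdCosh( n, a, y, VML_EP | VML_FTZDAZ_ON | VML_ERRMODE_DEFAULT );
}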

The mode parameter is designed to control accuracy, handling of denormalized numbers, and error handling.
Table "Values of the mode Parameter" lists values of the mode parameter. You can obtain all other possible
values of the mode parameter from the mode parameter values by using bitwise OR ( | ) operation to
combine one value for accuracy, one value for handling of denormalized numbers, and one vlaue for error
control options. The default value of the mode parameter is VML_HA | VML_FTZDAZ_OFF |
VML_ERRMODE_DEFAULT.
The VML_FTZDAZ_ON mode is specifically designed to improve the performance of computations that involve
denormalized numbers at the cost of reasonable accuracy loss. This mode changes the numeric behavior of
the functions: denormalized input values are treated as zeros (DAZ = denormals-are-zero) and denormalized
results are flushed to zero (FTZ = flush-to-zero). Accuracy loss may occur if input and/or output values are
close to denormal range.
Values of the mode Parameter
Value of mode Description

Accuracy Control
VML_HA high accuracy versions of VM functions
VML_LA low accuracy versions of VM functions
VML_EP enhanced performance accuracy versions of VM functions
Denormalized Numbers Handling Control
VML_FTZDAZ_ON Faster processing of denormalized inputs is enabled.
VML_FTZDAZ_OFF Faster processing of denormalized inputs is disabled.
Error Mode Control
VML_ERRMODE_IGNORE No action is set for computation errors.
VML_ERRMODE_ERRNO On error, the errno variable is set.
VML_ERRMODE_STDERR On error, the error text information is written to stderr.
VML_ERRMODE_EXCEPT On error, an exception is raised.
VML_ERRMODE_CALLBACK On error, an additional error handler function is called.
VML_ERRMODE_DEFAULT On error, the errno variable is set, an exception is raised, and an
additional error handler function is called.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Examples
The following example shows how to set low accuracy, fast processing for denormalized numbers and stderr
error mode:


vmlSetMode( VML_LA );
vmlSetMode( VML_LA | VML_FTZDAZ_ON | VML_ERRMODE_STDERR );
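
A sketch of temporarily changing the mode and then restoring the previous setting returned by vmlSetMode (the chosen mode values are illustrative):

unsigned int oldmode = vmlSetMode( VML_EP | VML_FTZDAZ_ON | VML_ERRMODE_STDERR );
/* ... VM calls that should run with the temporary mode ... */
vmlSetMode( oldmode );   /* restore the previous mode */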

vmlGetMode
Gets the VM mode.

Syntax
mod = vmlGetMode( void );

Include Files
• mkl.h

Output Parameters

Name Type Description

mod unsigned int Specifies the packed mode parameter.

Description
The function vmlGetMode returns the VM mode parameter that controls accuracy, handling of denormalized
numbers, and error handling options. The mod variable value is a combination of the values listed in the
table "Values of the mode Parameter". You can obtain these values using the respective mask from the table
"Values of Mask for the mode Parameter".
Values of Mask for the mode Parameter
Value of mask Description

VML_ACCURACY_MASK Specifies mask for accuracy mode selection.


VML_FTZDAZ_MASK Specifies mask for FTZDAZ mode selection.
VML_ERRMODE_MASK Specifies mask for error mode selection.

See example below:

Examples
accm = vmlGetMode( ) & VML_ACCURACY_MASK;
denm = vmlGetMode( ) & VML_FTZDAZ_MASK;
errm = vmlGetMode( ) & VML_ERRMODE_MASK;
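
For example, to test a single field of the packed value (sketch, in the same fragment style as the examples above):

acc = vmlGetMode( ) & VML_ACCURACY_MASK;
if (acc == VML_HA)
    printf( "VM is currently in high-accuracy (HA) mode\n" );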

MKLFreeTls
Frees allocated VM/VS thread local storage memory
from within DllMain routine. Use on Windows* OS
only.

Syntax
void MKLFreeTls( const MKL_UINT fdwReason );

Include Files
• mkl_vml_functions_win.h

Description
The MKLFreeTls routine frees thread local storage (TLS) memory which has been allocated when using MKL
static libraries to link into a DLL on the Windows* OS. The routine should only be used within DllMain.

NOTE
It is only necessary to use MKLFreeTls for TLS data in the VM and VS domains.

Input Parameters

Name Type Description

fdwReason const MKL_UINT Reason code from the DllMain call.

Example
BOOL WINAPI DllMain(HINSTANCE hInst, DWORD fdwReason, LPVOID lpvReserved)
{
    /* customer code */
    MKLFreeTls(fdwReason);
    /* customer code */
    return TRUE;
}

vmlSetErrStatus
Sets the new VM Error Status according to err and
stores the previous VM Error Status to olderr.

Syntax
olderr = vmlSetErrStatus( status );

Include Files
• mkl.h

Input Parameters

Name Type Description

status const MKL_INT Specifies the VM error status to be set.

Output Parameters

Name Type Description

olderr int Specifies the former VM error status.

Description
Table "Values of the VM Status" lists possible values of the err parameter.
Values of the VM Status
Status Description

Successful Execution
VML_STATUS_OK The execution was completed successfully.


Warnings
VML_STATUS_ACCURACYWARNING The execution was completed successfully in a different accuracy
mode.

Errors
VML_STATUS_BADSIZE The array dimension is not positive.
VML_STATUS_BADMEM NULL pointer is passed.
VML_STATUS_ERRDOM At least one of the array values is out of the range of definition.
VML_STATUS_SING At least one of the input array values causes a divide-by-zero
exception or produces an invalid (QNaN) result.
VML_STATUS_OVERFLOW An overflow has happened during the calculation process.
VML_STATUS_UNDERFLOW An underflow has happened during the calculation process.

Examples
olderr = vmlSetErrStatus( VML_STATUS_OK );
olderr = vmlSetErrStatus( VML_STATUS_ERRDOM );
olderr = vmlSetErrStatus( VML_STATUS_UNDERFLOW );

vmlGetErrStatus
Gets the VM Error Status.

Syntax
err = vmlGetErrStatus( void );

Include Files
• mkl.h

Output Parameters

Name Type Description

err int Specifies the VM error status.

vmlClearErrStatus
Sets the VM Error Status to VML_STATUS_OK and
stores the previous VM Error Status to olderr.

Syntax
olderr = vmlClearErrStatus( void );

Include Files
• mkl.h

Output Parameters

Name Type Description

olderr int Specifies the former VM error status.

vmlSetErrorCallBack
Sets the additional error handler callback function and
gets the old callback function.

Syntax
oldcallback = vmlSetErrorCallBack( callback );

Include Files
• mkl.h

Input Parameters

Name Description

callback Pointer to the callback function. The callback function has the following format:

static int __stdcall MyHandler(DefVmlErrorContext* pContext)
{
    /* Handler body */
}


The passed error structure is defined as follows:

typedef struct _DefVmlErrorContext
{
    int    iCode;            /* Error status value */
    int    iIndex;           /* Index of the bad array element, or bad array
                                dimension, or bad array pointer */
    double dbA1;             /* Error argument 1 */
    double dbA2;             /* Error argument 2 */
    double dbR1;             /* Error result 1 */
    double dbR2;             /* Error result 2 */
    char   cFuncName[64];    /* Function name */
    int    iFuncNameLen;     /* Length of function name */
    double dbA1Im;           /* Error argument 1, imaginary part */
    double dbA2Im;           /* Error argument 2, imaginary part */
    double dbR1Im;           /* Error result 1, imaginary part */
    double dbR2Im;           /* Error result 2, imaginary part */
} DefVmlErrorContext;

Output Parameters

Name Type Description

oldcallback int Pointer to the former callback function.

Description
The callback function is called on each VM mathematical function error if VML_ERRMODE_CALLBACK error
mode is set (see "Values of the mode Parameter").
Use the vmlSetErrorCallBack() function if you need to define your own callback function instead of the
default empty callback function.
The input structure for a callback function contains the following information about the error encountered:

• the input value that caused an error


• location (array index) of this value
• the computed result value
• error code

• name of the function in which the error occurred.
You can insert your own error processing into the callback function. This may include correcting the passed
result values in order to pass them back and resume computation. The standard error handler is called after
the callback function only if it returns 0.
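
A sketch of a user-defined callback (the handler name and the printed message are illustrative; assumes <stdio.h>); returning 0 lets the standard error handler run afterwards:

static int __stdcall MyHandler(DefVmlErrorContext* pContext)
{
    printf( "VM error %d in %.*s, argument %g\n",
            pContext->iCode, pContext->iFuncNameLen, pContext->cFuncName, pContext->dbA1 );
    return 0;   /* 0: also invoke the standard error handler */
}

/* ... during initialization, with VML_ERRMODE_CALLBACK set ... */
vmlSetErrorCallBack( MyHandler );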

vmlGetErrorCallBack
Gets the additional error handler callback function.

Syntax
callback = vmlGetErrorCallBack( void );

Include Files
• mkl.h

Output Parameters

Name Description

callback Pointer to the callback function

vmlClearErrorCallBack
Deletes the additional error handler callback function
and retrieves the former callback function.

Syntax
oldcallback = vmlClearErrorCallBack( void );

Include Files
• mkl.h

Output Parameters

Name Type Description

oldcallback int Pointer to the former callback function

Statistical Functions 8
Statistical functions in Intel® MKL are known as the Vector Statistics (VS). They are designed for the purpose
of
• generating vectors of pseudorandom, quasi-random, and non-deterministic random numbers
• performing mathematical operations of convolution and correlation
• computing basic statistical estimates for single and double precision multi-dimensional datasets
The corresponding functionality is described in the respective Random Number Generators, Convolution and
Correlation, and Summary Statistics sections.
See VS performance data in the online VS Performance Data document available at
http://software.intel.com/en-us/articles/intel-math-kernel-library-documentation/
The basic notion in VS is a task. The task object is a data structure or descriptor that holds the parameters
related to a specific statistical operation: random number generation, convolution and correlation, or
summary statistics estimation. Such parameters can be an identifier of a random number generator, its
internal state and parameters, data arrays, their shape and dimensions, an identifier of the operation and so
forth. You can modify the VS task parameters using the VS service functions.
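As a brief illustrative sketch (error checking and seed choice are arbitrary, and the wrapper function name is hypothetical), a typical random number generation task is created, used, and deleted as follows:

#include <mkl.h>

void gaussian_demo(void)
{
    VSLStreamStatePtr stream;
    double r[1000];

    /* Create a task (stream) for the MT19937 basic generator with seed 777,
       draw 1000 standard normal numbers, then release the task. */
    vslNewStream( &stream, VSL_BRNG_MT19937, 777 );
    vdRngGaussian( VSL_RNG_METHOD_GAUSSIAN_ICDF, stream, 1000, r, 0.0, 1.0 );
    vslDeleteStream( &stream );
}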

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Random Number Generators


Intel MKL VS provides a set of routines implementing commonly used pseudorandom, quasi-random, or
non-deterministic random number generators with continuous and discrete distributions. To improve
performance, all these routines were developed using calls to the highly optimized Basic Random Number
Generators (BRNGs) and vector mathematical functions (VM, see the Vector Mathematical Functions chapter).
VS provides interfaces both for Fortran and C languages. For users of the C and C++ languages the
mkl_vsl.h header file is provided. All header files are found in the following directory:
${MKL}/include

All VS routines can be classified into three major categories:


• Transformation routines for different types of statistical distributions, for example, uniform, normal
(Gaussian), binomial, etc. These routines indirectly call basic random number generators, which are
pseudorandom, quasi-random, or non-deterministic random number generators. A detailed description of
the generators can be found in the Distribution Generators section.
• Service routines to handle random number streams: create, initialize, delete, copy, save to a binary file,
load from a binary file, get the index of a basic generator. The description of these routines can be found
in the Service Routines section.
• Registration routines for basic pseudorandom generators and routines that obtain properties of the
registered generators (see the Advanced Service Routines section).


The last two categories are referred to as service routines.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Random Number Generators Conventions


This document makes no specific differentiation between random, pseudorandom, and quasi-random
numbers, nor between random, pseudorandom, and quasi-random number generators unless the context
requires otherwise. For details, refer to the 'Random Numbers' section in the VS Notes document available on
the Intel® MKL web page.
All generators of nonuniform distributions, both discrete and continuous, are built on the basis of the uniform
distribution generators, called Basic Random Number Generators (BRNGs). Pseudorandom numbers with a
nonuniform distribution are obtained through an appropriate transformation of uniformly distributed
pseudorandom numbers. Such transformations are referred to as generation methods. For a given
distribution, several generation methods can be used. See VS Notes for the description of the methods available
for each generator.
An RNG task determines the environment in which random number generation is performed, in particular the
parameters of the BRNG and its internal state. The output of VS generators is a stream of random numbers that
are used in Monte Carlo simulations. A random stream descriptor and a random stream are used as
synonyms of an RNG task in this document unless the context requires otherwise.
The random stream descriptor specifies which BRNG should be used in a given transformation method. See
the Random Streams and RNGs in Parallel Computation section of VS Notes.
The term computational node means a logical or physical unit that can process data in parallel.

Random Number Generators Mathematical Notation


The following notation is used throughout the text:

N   The set of natural numbers N = {1, 2, 3, ...}.

Z   The set of integers Z = {..., -3, -2, -1, 0, 1, 2, 3, ...}.

R   The set of real numbers.

⌊a⌋   The floor of a (the largest integer less than or equal to a).

⊕ or xor   Bitwise exclusive OR.

C(α, k)   Binomial coefficient or combination (α∈R, α ≥ 0; k∈N∪{0}).
For α ≥ k the binomial coefficient is defined as
C(α, k) = α(α - 1)...(α - k + 1) / k!
If α < k, then C(α, k) = 0.

Φ(x)   Cumulative Gaussian distribution function
Φ(x) = (1/√(2π)) ∫ from -∞ to x of exp(-t²/2) dt,
defined over -∞ < x < +∞; Φ(-∞) = 0, Φ(+∞) = 1.

Γ(α)   The complete gamma function
Γ(α) = ∫ from 0 to ∞ of t^(α-1) e^(-t) dt, where α > 0.

B(p, q)   The complete beta function
B(p, q) = ∫ from 0 to 1 of t^(p-1) (1 - t)^(q-1) dt, where p > 0 and q > 0.

LCG(a, c, m)   Linear Congruential Generator x_(n+1) = (a·x_n + c) mod m, where a is
called the multiplier, c is called the increment, and m is called the modulus of the generator.

MCG(a, m)   Multiplicative Congruential Generator x_(n+1) = (a·x_n) mod m, a special
case of the Linear Congruential Generator where the increment c is taken to be 0.

GFSR(p, q)   Generalized Feedback Shift Register Generator
x_n = x_(n-p) ⊕ x_(n-q).

Random Number Generators Naming Conventions


The names of the routines, types, and constants in VS random number generators are case-sensitive and can
contain lowercase and uppercase characters (viRngUniform).


The names of generator routines have the following structure:


v<type of result>Rng<distribution>
where

• v is the prefix of a VS vector function.


• <type of result> is either s, d, or i and specifies one of the following types:
s float

d double

i int
Prefixes s and d apply to continuous distributions only; prefix i applies
only to the discrete case.

• rng indicates that the routine is a random generator.


• <distribution> specifies the type of statistical distribution.
Names of service routines follow the template below:
vsl<name>

where

• vsl is the prefix of a VS service function.


• <name> contains a short function name.
For a more detailed description of service routines, refer to Service Routines and Advanced Service Routines
sections.
The prototype of each generator routine corresponding to a given probability distribution fits the following
structure:
status = <function name>( method, stream, n, r, [<distribution parameters>] )

where

• method defines the method of generation. A detailed description of this parameter can be found in the table
"Values of <method> in method parameter". The structure of the method parameter name is explained
below.
• stream defines the descriptor of the random stream and must have a non-zero value. Random streams,
descriptors, and their usage are discussed further in Random Streams and Service Routines.
• n defines the number of random values to be generated. If n is less than or equal to zero, no values are
generated. Furthermore, if n is negative, an error condition is set.
• r defines the destination array for the generated numbers. The dimension of the array must be large
enough to store at least n random numbers.
• status defines the error status of a VS routine. See Error Reporting section for a detailed description of
error status values.
Additional parameters included into <distribution parameters> field are individual for each generator routine
and are described in detail in Distribution Generators section.
To invoke a distribution generator, use a call to the respective VS routine. For example, to obtain a vector r
of n independent and identically distributed random numbers with the normal (Gaussian) distribution, with
mean value a and standard deviation sigma, write the following:
status = vsRngGaussian( method, stream, n, r, a, sigma )

The name of a method parameter has the following structure:

VSL_RNG_METHOD_<distribution>_<method>
VSL_RNG_METHOD_<distribution>_<method>_ACCURATE

where

• <distribution> is the probability distribution.
• <method> is the method name.
The two forms of the method parameter name correspond to the fast and accurate modes of random
number generation, respectively (see the "Distribution Generators" section and VS Notes for details).
Method names VSL_RNG_METHOD_<distribution>_<method>
and
VSL_RNG_METHOD_<distribution>_<method>_ACCURATE
should be used with the
v<precision>Rng<distribution>
function only, where

• <precision> is
  s for a single precision continuous distribution
  d for a double precision continuous distribution
  i for a discrete distribution

• <distribution> is the probability distribution.

Table "Values of <method> in method parameter" provides specific predefined values of the method name.
The third column contains the names of the functions that use the given method.
Values of <method> in method parameter

Method   Short Description   Functions

STD   Standard method. Currently there is only one method for these functions.   Uniform (continuous), Uniform (discrete), UniformBits, UniformBits32, UniformBits64

BOXMULLER   Generates a normally distributed random number x through a pair of uniformly distributed numbers u1 and u2 according to the formula x = sqrt(-2·ln(u1))·sin(2π·u2).   Gaussian, GaussianMV

BOXMULLER2   Generates normally distributed random numbers x1 and x2 through a pair of uniformly distributed numbers u1 and u2 according to the formulas x1 = sqrt(-2·ln(u1))·sin(2π·u2), x2 = sqrt(-2·ln(u1))·cos(2π·u2).   Gaussian, GaussianMV, Lognormal

ICDF   Inverse cumulative distribution function method.   Exponential, Laplace, Weibull, Cauchy, Rayleigh, Gumbel, Bernoulli, Geometric, Gaussian, GaussianMV, Lognormal

GNORM   For α > 1, a gamma distributed random number is generated as a cube of a properly scaled normal random number; for 0.6 ≤ α < 1, a gamma distributed random number is generated using rejection from the Weibull distribution; for α < 0.6, a gamma distributed random number is obtained using a transformation of the exponential power distribution; for α = 1, the gamma distribution is reduced to the exponential distribution.   Gamma

CJA   For min(p, q) > 1, the Cheng method is used; for min(p, q) < 1, the Johnk method is used if q + K·p² + C ≤ 0 (K = 0.852..., C = -0.956...), otherwise the Atkinson switching algorithm is used; for max(p, q) < 1, the method of Johnk is used; for min(p, q) < 1, max(p, q) > 1, the Atkinson switching algorithm is used (CJA stands for the first letters of Cheng, Johnk, Atkinson); for p = 1 or q = 1, the inverse cumulative distribution function method is used; for p = 1 and q = 1, the beta distribution is reduced to the uniform distribution.   Beta

BTPE   Acceptance/rejection method for ntrial·min(p, 1 - p) ≥ 30 with decomposition into 4 regions: 2 parallelograms, triangle, left exponential tail, right exponential tail.   Binomial

H2PE   Acceptance/rejection method for large mode of distribution with decomposition into 3 regions: rectangular, left exponential tail, right exponential tail.   Hypergeometric

PTPE   Acceptance/rejection method for λ ≥ 27 with decomposition into 4 regions: 2 parallelograms, triangle, left exponential tail, right exponential tail; otherwise, the table lookup method is used.   Poisson

POISNORM   For λ ≥ 1, method based on Poisson inverse CDF approximation by Gaussian inverse CDF; for λ < 1, the table lookup method is used.   Poisson, PoissonV

NBAR   Acceptance/rejection method with decomposition into 5 regions: rectangular, 2 trapezoids, left exponential tail, right exponential tail.   NegBinomial

NOTE
In this document, routines are often referred to by their base name (Gaussian) when this does not
lead to ambiguity. In the routine reference, the full name (vsrnggaussian, vsRngGaussian) is
always used in prototypes and code examples.
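For illustration, here is a hedged sketch of a call that selects a specific generation method; it reuses the VSL_RNG_METHOD_GAUSSIAN_ICDF constant that also appears in the usage example later in this chapter, and the stream and the buffer r are assumed to have been prepared beforehand:

/* Sketch: generate 1000 double precision Gaussian numbers with mean 0.0 and
   standard deviation 1.0 using the inverse CDF method. 'stream' is assumed to
   be an initialized random stream and 'r' a buffer of at least 1000 doubles. */
status = vdRngGaussian( VSL_RNG_METHOD_GAUSSIAN_ICDF, stream, 1000, r, 0.0, 1.0 );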

Basic Generators
VS provides pseudorandom, quasi-random, and non-deterministic random number generators. This includes
the following BRNGs, which differ in speed and other properties:

• the 31-bit multiplicative congruential pseudorandom number generator MCG(1132489760, 2^31 - 1)
[L'Ecuyer99]
• the 32-bit generalized feedback shift register pseudorandom number generator GFSR(250,103)
[Kirkpatrick81]
• the combined multiple recursive pseudorandom number generator MRG32k3a [L'Ecuyer99a]
• the 59-bit multiplicative congruential pseudorandom number generator MCG(13^13, 2^59) from NAG
Numerical Libraries [NAG]
• Wichmann-Hill pseudorandom number generator (a set of 273 basic generators) from NAG Numerical
Libraries [NAG]
• Mersenne Twister pseudorandom number generator MT19937 [Matsumoto98] with period length 2^19937 - 1
of the produced sequence
• Set of 6024 Mersenne Twister pseudorandom number generators MT2203 [Matsumoto98],
[Matsumoto00]. Each of them generates a sequence of period length equal to 2^2203 - 1. Parameters of the
generators provide mutual independence of the corresponding sequences.
• SIMD-oriented Fast Mersenne Twister pseudorandom number generator SFMT19937 [Saito08] with a
period length equal to 2^19937 - 1 of the produced sequence.
• Sobol quasi-random number generator [Sobol76], [Bratley88], which works in arbitrary dimension. For
dimensions greater than 40 the user should supply initialization parameters (initial direction numbers and
primitive polynomials or direction numbers) by using the vslNewStreamEx function. See additional details on
the interface for registration of the parameters in the library in VS Notes.
• Niederreiter quasi-random number generator [Bratley92], which works in arbitrary dimension. For
dimensions greater than 318 the user should supply initialization parameters (irreducible polynomials or
direction numbers) by using the vslNewStreamEx function. See additional details on the interface for
registration of the parameters in the library in VS Notes.
• Non-deterministic random number generator (RDRAND-based generators only) [AVX], [IntelSWMan].


NOTE
You can use a non-deterministic random number generator only if the underlying hardware supports it.
For instructions on how to detect if an Intel CPU supports a non-deterministic random number
generator see, for example, Chapter 8: Post-32nm Processor Instructions in [AVX] or Chapter 4:
RdRand Instruction Usage in [BMT].

NOTE
The time required by some non-deterministic sources to generate a random number is not constant,
so you might have to make multiple requests before the next random number is available. VS limits
the number of retries for requests to the non-deterministic source to 10. You can redefine the
maximum number of retries during the initialization of the non-deterministic random number
generator with the vslNewStreamEx function.
For more details on the non-deterministic source implementation for Intel CPUs please refer to Section
7.3.17, Volume 1, Random Number Generator Instruction in [IntelSWMan] and Section 4.2.2, RdRand
Retry Loop in [BMT].

• Philox4x32-10 counter-based pseudorandom number generator with a period of 2^128 [Salmon11].
• ARS-5 counter-based pseudorandom number generator with a period of 2^128, which uses instructions from
the AES-NI set [Salmon11].

See some testing results for the generators in VS Notes and comparative performance data at https://
software.intel.com/en-us/articles/intel-math-kernel-library-documentation.
For some basic generators, VS provides two methods of creating independent random streams in
multiprocessor computations, which are the leapfrog method and the block-splitting method. These sequence
splitting methods are also useful in sequential Monte Carlo.
In addition, MT2203 pseudorandom number generator is a set of 6024 generators designed to create up to
6024 independent random sequences, which might be used in parallel Monte Carlo simulations. Another
generator that has the same feature is Wichmann-Hill. It allows creating up to 273 independent random
streams. The properties of the generators designed for parallel computations are discussed in detail in
[Coddington94].
You may want to design and use your own basic generators. VS provides means of registration of such user-
designed generators through the steps described in Advanced Service Routines section.
There is also an option to utilize externally generated random numbers in VS distribution generator routines.
For this purpose VS provides three additional basic random number generators:

– for external random data packed in 32-bit integer array


– for external random data stored in double precision floating-point array; data is supposed to be uniformly
distributed over (a,b) interval
– for external random data stored in single precision floating-point array; data is supposed to be uniformly
distributed over (a,b) interval.

Such basic generators are called the abstract basic random number generators.
See VS Notes for a more detailed description of the generator properties.


BRNG Parameter Definition


Predefined values for the brng input parameter are as follows:
Values of brng parameter

Value   Short Description

VSL_BRNG_MCG31   A 31-bit multiplicative congruential generator.

VSL_BRNG_R250   A generalized feedback shift register generator.

VSL_BRNG_MRG32K3A   A combined multiple recursive generator with two components of order 3.

VSL_BRNG_MCG59   A 59-bit multiplicative congruential generator.

VSL_BRNG_WH   A set of 273 Wichmann-Hill combined multiplicative congruential generators.

VSL_BRNG_MT19937   A Mersenne Twister pseudorandom number generator.

VSL_BRNG_MT2203   A set of 6024 Mersenne Twister pseudorandom number generators.

VSL_BRNG_SFMT19937   A SIMD-oriented Fast Mersenne Twister pseudorandom number generator.

VSL_BRNG_SOBOL   A 32-bit Gray code-based generator producing low-discrepancy sequences for dimensions 1 ≤ s ≤ 40; user-defined dimensions are also available.

VSL_BRNG_NIEDERR   A 32-bit Gray code-based generator producing low-discrepancy sequences for dimensions 1 ≤ s ≤ 318; user-defined dimensions are also available.

VSL_BRNG_IABSTRACT   An abstract random number generator for integer arrays.

VSL_BRNG_DABSTRACT   An abstract random number generator for double precision floating-point arrays.

VSL_BRNG_SABSTRACT   An abstract random number generator for single precision floating-point arrays.

VSL_BRNG_NONDETERM   A non-deterministic random number generator.

VSL_BRNG_PHILOX4X32X10   A Philox4x32-10 counter-based pseudorandom number generator.

VSL_BRNG_ARS5   An ARS-5 counter-based pseudorandom number generator that uses instructions from the AES-NI set.

See VS Notes for detailed description.
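For illustration, a hedged sketch of selecting an individual member of a generator family, assuming the MKL convention that members of families such as MT2203 or Wichmann-Hill are addressed by adding an index to the base constant:

/* Sketch: create a stream for the k-th member of the MT2203 family,
   assuming members are selected as VSL_BRNG_MT2203 + k (0 <= k < 6024). */
int k = 5;
VSLStreamStatePtr stream_k;
status = vslNewStream( &stream_k, VSL_BRNG_MT2203 + k, 777 );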


Random Streams
A random stream (or stream) is an abstract source of pseudo- and quasi-random sequences of uniform
distribution. You can operate with stream state descriptors only. A stream state descriptor, which holds state
descriptive information for a particular BRNG, is a necessary parameter in each routine of a distribution
generator. Only the distribution generator routines operate with random streams directly. See VS Notes for
details.

NOTE
Random streams associated with abstract basic random number generator are called the abstract
random streams. See VS Notes for detailed description of abstract streams and their use.

You can create an unlimited number of random streams with VS service routines such as NewStream and use
them in any distribution generator to get a sequence of numbers of a given probability distribution. When the
streams are no longer needed, delete them by calling the service routine DeleteStream.
VS provides the service functions SaveStreamF and LoadStreamF to save random stream descriptive data to a
binary file and to read this data from a binary file, respectively. See VS Notes for a detailed description.

BRNG Data Types


typedef void* VSLStreamStatePtr;

See Advanced Service Routines for the format of the stream state structure for user-designed generators.

Error Reporting
VS RNG routines return status codes of the performed operation to report errors to the calling program. The
application should perform error-related actions and/or recover from the error. The status codes are of
integer type and have the following format:
VSL_ERROR_<ERROR_NAME> - indicates VS errors common for all VS domains.
VSL_RNG_ERROR_<ERROR_NAME> - indicates VS RNG errors.
VS RNG error codes have negative values, while warning codes have positive values. A status code of zero
indicates successful completion of the operation: VSL_ERROR_OK (or its synonym VSL_STATUS_OK).
Status Codes

Status Code   Description

Common VSL

VSL_ERROR_OK, VSL_STATUS_OK   No error, execution is successful.

VSL_ERROR_BADARGS   Input argument value is not valid.

VSL_ERROR_CPU_NOT_SUPPORTED   CPU version is not supported.

VSL_ERROR_FEATURE_NOT_IMPLEMENTED   Feature invoked is not implemented.

VSL_ERROR_MEM_FAILURE   System cannot allocate memory.

VSL_ERROR_NULL_PTR   Input pointer argument is NULL.

VSL_ERROR_UNKNOWN   Unknown error.

VS RNG Specific

VSL_RNG_ERROR_BAD_FILE_FORMAT   File format is unknown.

VSL_RNG_ERROR_BAD_MEM_FORMAT   Descriptive random stream format is unknown.

VSL_RNG_ERROR_BAD_NBITS   The value in NBits field is bad.

VSL_RNG_ERROR_BAD_NSEEDS   The value in NSeeds field is bad.

VSL_RNG_ERROR_BAD_STREAM   The random stream is invalid.

VSL_RNG_ERROR_BAD_STREAM_STATE_SIZE   The value in StreamStateSize field is bad.

VSL_RNG_ERROR_BAD_UPDATE   Callback function for an abstract BRNG returns an invalid number of updated entries in a buffer, that is, < 0 or > nmax.

VSL_RNG_ERROR_BAD_WORD_SIZE   The value in WordSize field is bad.

VSL_RNG_ERROR_BRNG_NOT_SUPPORTED   BRNG is not supported by the function.

VSL_RNG_ERROR_BRNG_TABLE_FULL   Registration cannot be completed due to lack of free entries in the table of registered BRNGs.

VSL_RNG_ERROR_BRNGS_INCOMPATIBLE   Two BRNGs are not compatible for the operation.

VSL_RNG_ERROR_FILE_CLOSE   Error in closing the file.

VSL_RNG_ERROR_FILE_OPEN   Error in opening the file.

VSL_RNG_ERROR_FILE_READ   Error in reading the file.

VSL_RNG_ERROR_FILE_WRITE   Error in writing the file.

VSL_RNG_ERROR_INVALID_ABSTRACT_STREAM   The abstract random stream is invalid.

VSL_RNG_ERROR_INVALID_BRNG_INDEX   BRNG index is not valid.

VSL_RNG_ERROR_LEAPFROG_UNSUPPORTED   BRNG does not support the Leapfrog method.

VSL_RNG_ERROR_NO_NUMBERS   Callback function for an abstract BRNG returns zero as the number of updated entries in a buffer.

VSL_RNG_ERROR_QRNG_PERIOD_ELAPSED   Period of the generator is exceeded.

VSL_RNG_ERROR_SKIPAHEAD_UNSUPPORTED   BRNG does not support the Skip-Ahead method.

VSL_RNG_ERROR_UNSUPPORTED_FILE_VER   File format version is not supported.

VSL_RNG_ERROR_NONDETERM_NOT_SUPPORTED   Non-deterministic random number generator is not supported on the CPU running the application.

VSL_RNG_ERROR_NONDETERM_NRETRIES_EXCEEDED   Number of retries to generate a random number using the non-deterministic random number generator exceeds the threshold (see Section 7.2.1.12 Non-deterministic in [VS Notes] for more details).

VSL_RNG_ERROR_ARS5_NOT_SUPPORTED   ARS-5 random number generator is not supported on the CPU running the application.
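A minimal sketch of checking the status code returned by a VS routine follows; the distribution call and its parameters are placeholders for any VS RNG call, and stdio.h is assumed to be included for printf:

/* Sketch: check the status returned by a VS RNG call and report failures. */
int status = vdRngGaussian( VSL_RNG_METHOD_GAUSSIAN_ICDF, stream, n, r, 0.0, 1.0 );
if ( status != VSL_STATUS_OK ) {
    printf( "VS RNG routine failed with status %d\n", status );
    /* perform error-related actions or recover from the error */
}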


VS RNG Usage Model


A typical algorithm for VS random number generators is as follows:
1. Create and initialize stream/streams. Functions vslNewStream, vslNewStreamEx, vslCopyStream,
vslCopyStreamState, vslLeapfrogStream, vslSkipAheadStream.
2. Call one or more RNGs.
3. Process the output.
4. Delete the stream/streams. Function vslDeleteStream.

NOTE
You may reiterate steps 2-3. Random number streams may be generated for different threads.

The following example demonstrates generation of a random stream that is the output of the basic generator
MT19937. The seed is equal to 777. The stream is used to generate 10,000 normally distributed random
numbers in blocks of 1,000 random numbers with parameters a = 5 and sigma = 2. The stream is deleted
after the generation is completed. The purpose of the example is to calculate the sample mean for the normal
distribution with the given parameters.

Example of VS RNG Usage


#include <stdio.h>
#include "mkl_vsl.h"

int main()
{
    double r[1000];  /* buffer for random numbers */
    double s;        /* average */
    VSLStreamStatePtr stream;
    int i, j;

    /* Initializing */
    s = 0.0;
    vslNewStream( &stream, VSL_BRNG_MT19937, 777 );

    /* Generating */
    for ( i = 0; i < 10; i++ ) {
        vdRngGaussian( VSL_RNG_METHOD_GAUSSIAN_ICDF, stream, 1000, r, 5.0, 2.0 );
        for ( j = 0; j < 1000; j++ ) {
            s += r[j];
        }
    }
    s /= 10000.0;

    /* Deleting the stream */
    vslDeleteStream( &stream );

    /* Printing results */
    printf( "Sample mean of normal distribution = %f\n", s );

    return 0;
}

Additionally, examples that demonstrate usage of VS random number generators are available in:
${MKL}/examples/vslc/source

Service Routines
Stream handling comprises routines for creating, deleting, or copying the streams and getting the index of a
basic generator. A random stream can also be saved to and then read from a binary file. Table "Service
Routines" lists all available service routines.
Service Routines

Routine   Short Description

vslNewStream   Creates and initializes a random stream.

vslNewStreamEx   Creates and initializes a random stream for the generators with multiple initial conditions.

vsliNewAbstractStream   Creates and initializes an abstract random stream for integer arrays.

vsldNewAbstractStream   Creates and initializes an abstract random stream for double precision floating-point arrays.

vslsNewAbstractStream   Creates and initializes an abstract random stream for single precision floating-point arrays.

vslDeleteStream   Deletes previously created stream.

vslCopyStream   Copies a stream to another stream.

vslCopyStreamState   Creates a copy of a random stream state.

vslSaveStreamF   Writes a stream to a binary file.

vslLoadStreamF   Reads a stream from a binary file.

vslSaveStreamM   Writes a random stream descriptive data, including state, to a memory buffer.

vslLoadStreamM   Creates a new stream and reads stream descriptive data, including state, from the memory buffer.

vslGetStreamSize   Computes size of memory necessary to hold the random stream.

vslLeapfrogStream   Initializes the stream by the leapfrog method to generate a subsequence of the original sequence.

vslSkipAheadStream   Initializes the stream by the skip-ahead method.

vslGetStreamStateBrng   Obtains the index of the basic generator responsible for the generation of a given random stream.

vslGetNumRegBrngs   Obtains the number of currently registered basic generators.

Most of the generator-based work comprises three basic steps:

1. Creating and initializing a stream (vslNewStream, vslNewStreamEx, vslCopyStream,


vslCopyStreamState, vslLeapfrogStream, vslSkipAheadStream).
2. Generating random numbers with given distribution, see Distribution Generators.
3. Deleting the stream (vslDeleteStream).

Note that you can concurrently create multiple streams and obtain random data from one or several
generators by using the stream state. You must use the vslDeleteStream function to delete all the streams
afterwards.

vslNewStream
Creates and initializes a random stream.

Syntax
status = vslNewStream( &stream, brng, seed );

Include Files
• mkl.h

Input Parameters

Name Type Description

brng const MKL_INT Index of the basic generator to initialize the stream. See
Table Values of brng parameter for specific value.

seed const MKL_UINT Initial condition of the stream. In the case of a quasi-
random number generator seed parameter is used to set
the dimension. If the dimension is greater than the
dimension that brng can support or is less than 1, then the
dimension is assumed to be equal to 1.

Output Parameters

Name Type Description

stream VSLStreamStatePtr* Stream state descriptor

Description
For a basic generator with number brng, this function creates a new stream and initializes it with a 32-bit
seed. The seed is an initial value used to select a particular sequence generated by the basic generator brng.
The function is also applicable for generators with multiple initial conditions. Use this function to create and
initialize a new stream with a 32-bit seed only. If you need to provide multiple initial conditions such as
several 32-bit or wider seeds, use the function vslNewStreamEx. See VS Notes for a more detailed
description of stream initialization for different basic generators.

NOTE
This function is not applicable for abstract basic random number generators. Please use
vsliNewAbstractStream, vslsNewAbstractStream or vsldNewAbstractStream to utilize integer,
single-precision or double-precision external random data respectively.


Return Values

VSL_ERROR_OK, VSL_STATUS_OK   Indicates no error, execution is successful.

VSL_RNG_ERROR_INVALID_BRNG_INDEX   BRNG index is invalid.

VSL_ERROR_MEM_FAILURE   System cannot allocate memory for stream.

VSL_RNG_ERROR_NONDETERMINISTIC_NOT_SUPPORTED   Non-deterministic random number generator is not supported.

VSL_RNG_ERROR_ARS5_NOT_SUPPORTED   ARS-5 random number generator is not supported on the CPU running the application.
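Because the seed parameter sets the dimension for quasi-random basic generators (as noted in the parameter description above), a minimal sketch of creating a 10-dimensional Sobol stream might look like this:

/* Sketch: for VSL_BRNG_SOBOL the 'seed' argument selects the dimension,
   so this creates a quasi-random stream of dimension 10. */
VSLStreamStatePtr qstream;
status = vslNewStream( &qstream, VSL_BRNG_SOBOL, 10 );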

vslNewStreamEx
Creates and initializes a random stream for generators
with multiple initial conditions.

Syntax
status = vslNewStreamEx( &stream, brng, n, params );

Include Files
• mkl.h


Input Parameters

Name Type Description

brng const MKL_INT Index of the basic generator to initialize the stream. See
Table "Values of brng parameter" for specific value.

n const MKL_INT Number of initial conditions contained in params

params const unsigned int Array of initial conditions necessary for the basic generator
brng to initialize the stream. In the case of a quasi-random
number generator only the first element in params
parameter is used to set the dimension. If the dimension is
greater than the dimension that brng can support or is less
than 1, then the dimension is assumed to be equal to 1.

Output Parameters

Name Type Description

stream VSLStreamStatePtr* Stream state descriptor

Description
The vslNewStreamEx function provides an advanced tool to set the initial conditions for a basic generator if
its input arguments imply several initialization parameters. Initial values are used to select a particular
sequence generated by the basic generator brng. Whenever possible, use vslNewStream, which is analogous
to vslNewStreamEx except that it takes only one 32-bit initial condition. In particular, vslNewStreamEx may
be used to initialize the state table in Generalized Feedback Shift Register Generators (GFSRs). A more
detailed description of this issue can be found in VS Notes.
This function is also used to pass user-defined initialization parameters of quasi-random number generators
into the library. See VS Notes for the format for their passing and registration in VS.

NOTE
This function is not applicable for abstract basic random number generators. Please use
vsliNewAbstractStream, vslsNewAbstractStream or vsldNewAbstractStream to utilize integer,
single-precision or double-precision external random data respectively.


Return Values

VSL_ERROR_OK, VSL_STATUS_OK   Indicates no error, execution is successful.

VSL_RNG_ERROR_INVALID_BRNG_INDEX   BRNG index is invalid.

VSL_ERROR_MEM_FAILURE   System cannot allocate memory for stream.

VSL_RNG_ERROR_NONDETERMINISTIC_NOT_SUPPORTED   Non-deterministic random number generator is not supported.

VSL_RNG_ERROR_ARS5_NOT_SUPPORTED   ARS-5 random number generator is not supported on the CPU running the application.
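A minimal sketch of passing several 32-bit initial conditions through vslNewStreamEx; the seed values here are arbitrary:

/* Sketch: initialize an MT19937 stream with an array of 32-bit seeds. */
unsigned int seeds[3] = { 777, 12345, 67890 };
VSLStreamStatePtr stream;
status = vslNewStreamEx( &stream, VSL_BRNG_MT19937, 3, seeds );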

vsliNewAbstractStream
Creates and initializes an abstract random stream for
integer arrays.

Syntax
status = vsliNewAbstractStream( &stream, n, ibuf, icallback );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Size of the array ibuf

ibuf const unsigned int Array of n 32-bit integers

icallback Pointer to the callback function used for ibuf update

NOTE
Format of the callback function:

int iUpdateFunc( VSLStreamStatePtr stream, int* n, unsigned int ibuf[], int* nmin, int* nmax,
int* idx );

The callback function returns the number of elements in the array actually updated by the function. Table
"icallback Callback Function Parameters" gives the description of the callback function parameters.
icallback Callback Function Parameters
Parameters Short Description

stream Abstract random stream descriptor

n Size of ibuf

ibuf Array of random numbers associated with the stream stream

nmin Minimal quantity of numbers to update

nmax Maximal quantity of numbers that can be updated

idx Position in cyclic buffer ibuf to start update 0≤idx<n.


Output Parameters

Name Type Description

stream VSLStreamStatePtr* Descriptor of the stream state structure

Description
The vsliNewAbstractStream function creates a new abstract stream and associates it with an integer array
ibuf and your callback function icallback that is intended for updating of ibuf content.

Return Values

VSL_ERROR_OK, VSL_STATUS_OK Indicates no error, execution is successful.

VSL_ERROR_BADARGS Parameter n is not positive.

VSL_ERROR_MEM_FAILURE System cannot allocate memory for stream.

VSL_ERROR_NULL_PTR   Either buffer or callback function parameter is a NULL pointer.
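The following is a hedged sketch of an update callback and the associated abstract stream. The way the callback fills the cyclic buffer (here with values from rand(), standing in for a hypothetical external random source, with stdlib.h assumed) is only an illustration of the interface described above, not a recommended data source:

/* Sketch: refill at least *nmin entries of the cyclic buffer ibuf of size *n,
   starting at position *idx, and return how many entries were updated. */
static int iUpdateFunc( VSLStreamStatePtr stream, int* n, unsigned int ibuf[],
                        int* nmin, int* nmax, int* idx )
{
    int i;
    for ( i = 0; i < *nmin; i++ ) {
        ibuf[ (*idx + i) % (*n) ] = (unsigned int)rand(); /* placeholder data source */
    }
    return *nmin;
}
...
/* In a real application the buffer would typically be pre-filled with external
   random data; the callback is invoked when more numbers are needed. */
unsigned int buffer[1024];
VSLStreamStatePtr astream;
status = vsliNewAbstractStream( &astream, 1024, buffer, iUpdateFunc );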

vsldNewAbstractStream
Creates and initializes an abstract random stream for
double precision floating-point arrays.

Syntax
status = vsldNewAbstractStream( &stream, n, dbuf, a, b, dcallback );

Include Files
• mkl.h

Input Parameters

Name Type Description

n const MKL_INT Size of the array dbuf

dbuf   const double   Array of n double precision floating-point random numbers with uniform distribution over interval (a,b)

a   const double   Left boundary a

b   const double   Right boundary b

dcallback   See Note below   Pointer to the callback function used for update of the array dbuf

Output Parameters

Name Type Description

stream VSLStreamStatePtr* Descriptor of the stream state structure

NOTE
Format of the callback function:

int dUpdateFunc( VSLStreamStatePtr stream, int* n, double dbuf[], int* nmin, int* nmax, int*
idx );

The callback function returns the number of elements in the array actually updated by the function. Table
"dcallback Callback Function Parameters" gives the description of the callback function parameters.
dcallback Callback Function Parameters
Parameters Short Description

stream Abstract random stream descriptor

n Size of dbuf

dbuf Array of random numbers associated with the stream stream

nmin Minimal quantity of numbers to update

nmax Maximal quantity of numbers that can be updated

idx Position in cyclic buffer dbuf to start update 0≤idx<n.

Description
The vsldNewAbstractStream function creates a new abstract stream for double precision floating-point
arrays with random numbers of the uniform distribution over interval (a,b). The function associates the
stream with a double precision array dbuf and your callback function dcallback that is intended for updating
of dbuf content.

Return Values

VSL_ERROR_OK, VSL_STATUS_OK Indicates no error, execution is successful.

VSL_ERROR_BADARGS Parameter n is not positive.

VSL_ERROR_MEM_FAILURE System cannot allocate memory for stream.

VSL_ERROR_NULL_PTR   Either buffer or callback function parameter is a NULL pointer.

vslsNewAbstractStream
Creates and initializes an abstract random stream for
single precision floating-point arrays.

Syntax
status = vslsNewAbstractStream( &stream, n, sbuf, a, b, scallback );

Include Files
• mkl.h


Input Parameters

Name Type Description

n const MKL_INT Size of the array sbuf

sbuf   const float   Array of n single precision floating-point random numbers with uniform distribution over interval (a,b)

a   const float   Left boundary a

b   const float   Right boundary b

scallback   See Note below   Pointer to the callback function used for update of the array sbuf

Output Parameters

Name Type Description

stream VSLStreamStatePtr* Descriptor of the stream state structure

NOTE
Format of the callback function in C:

int sUpdateFunc( VSLStreamStatePtr stream, int* n, float sbuf[], int* nmin, int* nmax, int*
idx );

The callback function returns the number of elements in the array actually updated by the function. Table
"scallback Callback Function Parameters" gives the description of the callback function parameters.
scallback Callback Function Parameters
Parameters Short Description

stream Abstract random stream descriptor

n Size of sbuf

sbuf Array of random numbers associated with the stream stream

nmin Minimal quantity of numbers to update

nmax Maximal quantity of numbers that can be updated

idx Position in cyclic buffer sbuf to start update 0≤idx<n.

Description
The vslsNewAbstractStream function creates a new abstract stream for single precision floating-point
arrays with random numbers of the uniform distribution over interval (a,b). The function associates the
stream with a single precision array sbuf and your callback function scallback that is intended for updating of
sbuf content.

Return Values

VSL_ERROR_OK, VSL_STATUS_OK Indicates no error, execution is successful.

VSL_ERROR_BADARGS Parameter n is not positive.

VSL_ERROR_MEM_FAILURE   System cannot allocate memory for stream.

VSL_ERROR_NULL_PTR   Either buffer or callback function parameter is a NULL pointer.

vslDeleteStream
Deletes a random stream.

Syntax
status = vslDeleteStream( &stream );

Include Files
• mkl.h

Input/Output Parameters

Name Type Description

stream VSLStreamStatePtr* Stream state descriptor. Must have non-zero value. After
the stream is successfully deleted, the pointer is set to
NULL.

Description
The function deletes the random stream created by one of the initialization functions.

Return Values

VSL_ERROR_OK, VSL_STATUS_OK Indicates no error, execution is successful.

VSL_ERROR_NULL_PTR stream parameter is a NULL pointer.

VSL_RNG_ERROR_BAD_STREAM stream is not a valid random stream.

vslCopyStream
Creates a copy of a random stream.

Syntax
status = vslCopyStream( &newstream, srcstream );

Include Files
• mkl.h

Input Parameters

Name Type Description

srcstream const VSLStreamStatePtr Pointer to the stream state structure to be copied


Output Parameters

Name Type Description

newstream VSLStreamStatePtr* Copied random stream descriptor

Description
The function creates an exact copy of srcstream and stores its descriptor to newstream.

Return Values

VSL_ERROR_OK, VSL_STATUS_OK Indicates no error, execution is successful.

VSL_ERROR_NULL_PTR srcstream parameter is a NULL pointer.

VSL_RNG_ERROR_BAD_STREAM srcstream is not a valid random stream.

VSL_ERROR_MEM_FAILURE System cannot allocate memory for newstream.

vslCopyStreamState
Creates a copy of a random stream state.

Syntax
status = vslCopyStreamState( deststream, srcstream );

Include Files
• mkl.h

Input Parameters

Name Type Description

srcstream   const VSLStreamStatePtr   Pointer to the stream state structure from which the state structure is copied

Output Parameters

Name   Type   Description

deststream   VSLStreamStatePtr   Pointer to the stream state structure where the stream state is copied

Description
The vslCopyStreamState function copies a stream state from srcstream to the existing deststream stream.
Both the streams should be generated by the same basic generator. An error message is generated when the
index of the BRNG that produced deststream stream differs from the index of the BRNG that generated
srcstream stream.
Unlike vslCopyStream function, which creates a new stream and copies both the stream state and other
data from srcstream, the function vslCopyStreamState copies only srcstream stream state data to the
generated deststream stream.

Return Values

VSL_ERROR_OK, VSL_STATUS_OK Indicates no error, execution is successful.

VSL_ERROR_NULL_PTR Either srcstream or deststream is a NULL pointer.

VSL_RNG_ERROR_BAD_STREAM   Either srcstream or deststream is not a valid random stream.

VSL_RNG_ERROR_BRNGS_INCOMPATIBLE   BRNG associated with srcstream is not compatible with BRNG associated with deststream.
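One possible use of vslCopyStreamState is rewinding a stream to a previously saved state; a hedged sketch, where r is assumed to be a buffer of at least 1000 doubles:

/* Sketch: keep a checkpoint copy of a stream, generate numbers, then
   rewind the working stream back to the checkpoint state. */
VSLStreamStatePtr work, checkpoint;
status = vslNewStream( &work, VSL_BRNG_MT19937, 777 );
status = vslCopyStream( &checkpoint, work );          /* full copy serves as a checkpoint */
status = vdRngGaussian( VSL_RNG_METHOD_GAUSSIAN_ICDF, work, 1000, r, 0.0, 1.0 );
status = vslCopyStreamState( work, checkpoint );      /* same BRNG, state restored */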

vslSaveStreamF
Writes random stream descriptive data, including
stream state, to binary file.

Syntax
errstatus = vslSaveStreamF( stream, fname );

Include Files
• mkl.h

Input Parameters

Name Type Description

stream const VSLStreamStatePtr Random stream to be written to the file

fname const char* File name specified as a null-terminated string

Output Parameters

Name Type Description

errstatus int Error status of the operation

Description
The vslSaveStreamF function writes the random stream descriptive data, including the stream state, to the
binary file. Random stream descriptive data is saved to the binary file with the name fname. The random
stream stream must be a valid stream created by vslNewStream-like or vslCopyStream-like service
routines. If the stream cannot be saved to the file, errstatus has a non-zero value. The random stream can
be read from the binary file using the vslLoadStreamF function.

Return Values

VSL_ERROR_OK, VSL_STATUS_OK Indicates no error, execution is successful.

VSL_ERROR_NULL_PTR Either fname or stream is a NULL pointer.

VSL_RNG_ERROR_BAD_STREAM stream is not a valid random stream.

VSL_RNG_ERROR_FILE_OPEN Indicates an error in opening the file.

VSL_RNG_ERROR_FILE_WRITE Indicates an error in writing the file.


VSL_RNG_ERROR_FILE_CLOSE Indicates an error in closing the file.

VSL_ERROR_MEM_FAILURE System cannot allocate memory for internal needs.

vslLoadStreamF
Creates new stream and reads stream descriptive
data, including stream state, from binary file.

Syntax
errstatus = vslLoadStreamF( &stream, fname );

Include Files
• mkl.h

Input Parameters

Name Type Description

fname const char* File name specified as a null-terminated string

Output Parameters

Name Type Description

stream VSLStreamStatePtr* Pointer to a new random stream

errstatus int Error status of the operation

Description
The vslLoadStreamF function creates a new stream and reads stream descriptive data, including the stream
state, from the binary file. A new random stream is created using the stream descriptive data from the
binary file with the name fname. If the stream cannot be read (for example, an I/O error occurs or the file
format is invalid), errstatus has a non-zero value. To save random stream to the file, use vslSaveStreamF
function.

CAUTION
Calling vslLoadStreamF with a previously initialized stream pointer can have unintended
consequences such as a memory leak. To initialize a stream which has been in use until calling
vslLoadStreamF, you should call the vslDeleteStream function first to deallocate the resources.


Return Values

VSL_ERROR_OK, VSL_STATUS_OK Indicates no error, execution is successful.

VSL_ERROR_NULL_PTR fname is a NULL pointer.

VSL_RNG_ERROR_FILE_OPEN Indicates an error in opening the file.

VSL_RNG_ERROR_FILE_WRITE Indicates an error in writing the file.

VSL_RNG_ERROR_FILE_CLOSE Indicates an error in closing the file.

VSL_ERROR_MEM_FAILURE System cannot allocate memory for internal needs.

VSL_RNG_ERROR_BAD_FILE_FORMAT Unknown file format.

VSL_RNG_ERROR_UNSUPPORTED_FILE_VER File format version is unsupported.

VSL_RNG_ERROR_NONDETERMINISTIC_NOT_SUPPORTED   Non-deterministic random number generator is not supported.

VSL_RNG_ERROR_ARS5_NOT_SUPPORTED   ARS-5 random number generator is not supported on the CPU running the application.
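A minimal sketch of saving a stream to a binary file and restoring it later with vslLoadStreamF; the file name is arbitrary:

/* Sketch: save a stream to a binary file, delete it, and restore it later. */
VSLStreamStatePtr stream;
status = vslNewStream( &stream, VSL_BRNG_MT19937, 777 );
status = vslSaveStreamF( stream, "stream_state.bin" );
status = vslDeleteStream( &stream );
...
status = vslLoadStreamF( &stream, "stream_state.bin" );  /* creates a new stream */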

vslSaveStreamM
Writes random stream descriptive data, including
stream state, to a memory buffer.

Syntax
errstatus = vslSaveStreamM( stream, memptr );

Include Files
• mkl.h

Input Parameters

Name Type Description

stream const VSLStreamStatePtr Random stream to be written to the memory

memptr char* Memory buffer to save random stream descriptive data to

Output Parameters

Name Type Description

errstatus int Error status of the operation

Description
The vslSaveStreamM function writes the random stream descriptive data, including the stream state, to the
memory at memptr. Random stream stream must be a valid stream created by vslNewStream-like or
vslCopyStream-like service routines. The memptr parameter must be a valid pointer to the memory of size
sufficient to hold the random stream stream. Use the service routine vslGetStreamSize to determine this
amount of memory.


If the stream cannot be saved to the memory, errstatus has a non-zero value. The random stream can be
read from the memory pointed by memptr using the vslLoadStreamM function.

Return Values

VSL_ERROR_OK, VSL_STATUS_OK Indicates no error, execution is successful.

VSL_ERROR_NULL_PTR Either memptr or stream is a NULL pointer.

VSL_RNG_ERROR_BAD_STREAM stream is a NULL pointer.

vslLoadStreamM
Creates a new stream and reads stream descriptive
data, including stream state, from the memory buffer.

Syntax
errstatus = vslLoadStreamM( &stream, memptr );

Include Files
• mkl.h

Input Parameters

Name Type Description

memptr const char* Memory buffer to load random stream descriptive data from

Output Parameters

Name Type Description

stream VSLStreamStatePtr* Pointer to a new random stream

errstatus int Error status of the operation

Description
The vslLoadStreamM function creates a new stream and reads stream descriptive data, including the stream
state, from the memory buffer. A new random stream is created using the stream descriptive data from the
memory pointed to by memptr. If the stream cannot be read (for example, memptr is invalid), errstatus has a
non-zero value. To save a random stream to memory, use the vslSaveStreamM function. Use the service
routine vslGetStreamSize to determine the amount of memory sufficient to hold the random stream.

CAUTION
Calling vslLoadStreamM with a previously initialized stream pointer can have unintended consequences
such as a memory leak. To initialize a stream which has been in use until calling vslLoadStreamM,
you should call the vslDeleteStream function first to deallocate the resources.


Return Values

VSL_ERROR_OK, VSL_STATUS_OK Indicates no error, execution is successful.

VSL_ERROR_NULL_PTR memptr is a NULL pointer.

VSL_ERROR_MEM_FAILURE System cannot allocate memory for internal needs.

VSL_RNG_ERROR_BAD_MEM_FORMAT Descriptive random stream format is unknown.

VSL_RNG_ERROR_NONDETERMINISTIC_NOT_SUPPORTED   Non-deterministic random number generator is not supported.

VSL_RNG_ERROR_ARS5_NOT_SUPPORTED   ARS-5 random number generator is not supported on the CPU running the application.

vslGetStreamSize
Computes size of memory necessary to hold the
random stream.

Syntax
memsize = vslGetStreamSize( stream );

Include Files
• mkl.h

Input Parameters

Name Type Description

stream const VSLStreamStatePtr Random stream

Output Parameters

Name Type Description

memsize   int   Amount of memory in bytes necessary to hold the descriptive data of random stream stream

Description
The vslGetStreamSize function returns the size of memory in bytes which is necessary to hold the given
random stream. Use the output of the function to allocate the buffer to which you will save the random
stream by means of the vslSaveStreamM function.


Return Values

VSL_ERROR_OK, VSL_STATUS_OK Indicates no error, execution is successful.

VSL_RNG_ERROR_BAD_STREAM stream is a NULL pointer.
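A hedged sketch combining vslGetStreamSize, vslSaveStreamM, and vslLoadStreamM; error checking is omitted for brevity, and stdlib.h is assumed to be included for malloc and free:

/* Sketch: serialize a stream to a user-allocated memory buffer and restore it. */
VSLStreamStatePtr stream, restored;
status = vslNewStream( &stream, VSL_BRNG_MT19937, 777 );
int memsize = vslGetStreamSize( stream );        /* bytes needed for this stream */
char* buf = (char*)malloc( memsize );
status = vslSaveStreamM( stream, buf );
status = vslLoadStreamM( &restored, buf );       /* creates a new stream from buf */
free( buf );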

vslLeapfrogStream
Initializes a stream using the leapfrog method.

Syntax
status = vslLeapfrogStream( stream, k, nstreams );

Include Files
• mkl.h

Input Parameters

Name Type Description

stream   VSLStreamStatePtr   Pointer to the stream state structure to which the leapfrog method is applied

k   const MKL_INT   Index of the computational node, or stream number

nstreams   const MKL_INT   Largest number of computational nodes, or stride

Description
The vslLeapfrogStream function generates random numbers in a random stream with non-unit stride. This
feature is particularly useful in distributing random numbers from the original stream across the nstreams
buffers without generating the original random sequence with subsequent manual distribution.
One of the important applications of the leapfrog method is splitting the original sequence into non-
overlapping subsequences across nstreams computational nodes. The function initializes the original random
stream (see Figure "Leapfrog Method") to generate random numbers for the computational node k, 0 ≤k <
nstreams, where nstreams is the largest number of computational nodes used.

Leapfrog Method


The leapfrog method is supported only for those basic generators that allow splitting elements by the
leapfrog method, which is more efficient than simply generating them by a generator with subsequent
manual distribution across computational nodes. See VS Notes for details.
For quasi-random basic generators, the leapfrog method allows generating individual components of quasi-
random vectors instead of whole quasi-random vectors. In this case the nstreams parameter should be equal to
the dimension of the quasi-random vector, while the k parameter should be the index of the component to be
generated (0 ≤ k < nstreams). Other parameter values are not allowed.

The following code illustrates the initialization of three independent streams using the leapfrog method:

Code for Leapfrog Method


...
VSLStreamStatePtr stream1;
VSLStreamStatePtr stream2;
VSLStreamStatePtr stream3;

/* Creating 3 identical streams */
status = vslNewStream(&stream1, VSL_BRNG_MCG31, 174);
status = vslCopyStream(&stream2, stream1);
status = vslCopyStream(&stream3, stream1);

/* Leapfrogging the streams */
status = vslLeapfrogStream(stream1, 0, 3);
status = vslLeapfrogStream(stream2, 1, 3);
status = vslLeapfrogStream(stream3, 2, 3);

/* Generating random numbers */
...

/* Deleting the streams */
status = vslDeleteStream(&stream1);
status = vslDeleteStream(&stream2);
status = vslDeleteStream(&stream3);
...

Return Values

VSL_ERROR_OK, VSL_STATUS_OK Indicates no error, execution is successful.

VSL_ERROR_NULL_PTR stream is a NULL pointer.

VSL_RNG_ERROR_BAD_STREAM stream is not a valid random stream.

VSL_RNG_ERROR_LEAPFROG_UNSUPPORTED BRNG does not support Leapfrog method.

vslSkipAheadStream
Initializes a stream using the block-splitting method.

Syntax
status = vslSkipAheadStream( stream, nskip);


Include Files
• mkl.h

Input Parameters

Name Type Description

stream   VSLStreamStatePtr   Pointer to the stream state structure to which the block-splitting method is applied

nskip   const long long int   Number of skipped elements

Description
The vslSkipAheadStream function skips a given number of elements in a random stream. This feature is
particularly useful in distributing random numbers from original random stream across different
computational nodes. If the largest number of random numbers used by a computational node is nskip, then
the original random sequence may be split by vslSkipAheadStream into non-overlapping blocks of nskip
size so that each block corresponds to the respective computational node. The number of computational
nodes is unlimited. This method is known as the block-splitting method or as the skip-ahead method. (see
Figure "Block-Splitting Method").

Block-Splitting Method

The skip-ahead method is supported only for those basic generators that allow skipping elements by the
skip-ahead method, which is more efficient than simply generating them by generator with subsequent
manual skipping. See VS Notes for details.
Please note that for quasi-random basic generators the skip-ahead method works with components of quasi-
random vectors rather than with whole quasi-random vectors. Therefore, to skip NS quasi-random vectors,
set the nskip parameter equal to NS*DIMEN, where DIMEN is the dimension of the quasi-random vector.
If this operation results in exceeding the period of the quasi-random number generator, which is 2^32 - 1, the
library returns the VSL_RNG_ERROR_QRNG_PERIOD_ELAPSED error code.

The following code illustrates how to initialize three independent streams using the vslSkipAheadStream
function:

Code for Block-Splitting Method


VSLStreamStatePtr stream1;
VSLStreamStatePtr stream2;
VSLStreamStatePtr stream3;

/* Creating the 1st stream */
status = vslNewStream(&stream1, VSL_BRNG_MCG31, 174);

/* Skipping ahead by 7 elements the 2nd stream */
status = vslCopyStream(&stream2, stream1);
status = vslSkipAheadStream(stream2, 7);

/* Skipping ahead by 7 elements the 3rd stream */
status = vslCopyStream(&stream3, stream2);
status = vslSkipAheadStream(stream3, 7);

/* Generating random numbers */
...

/* Deleting the streams */
status = vslDeleteStream(&stream1);
status = vslDeleteStream(&stream2);
status = vslDeleteStream(&stream3);
...

Return Values

VSL_ERROR_OK, VSL_STATUS_OK Indicates no error, execution is successful.

VSL_ERROR_NULL_PTR stream is a NULL pointer.

VSL_RNG_ERROR_BAD_STREAM stream is not a valid random stream.

VSL_RNG_ERROR_SKIPAHEAD_UNSUPPORTED BRNG does not support the Skip-Ahead method.

VSL_RNG_ERROR_QRNG_PERIOD_ELAPSED Period of the quasi-random number generator is exceeded.

vslGetStreamStateBrng
Returns index of a basic generator used for generation
of a given random stream.

Syntax
brng = vslGetStreamStateBrng( stream );

Include Files
• mkl.h

Input Parameters

Name Type Description

stream const VSLStreamStatePtr Pointer to the stream state structure


Output Parameters

Name Type Description

brng   int   Index of the basic generator assigned for the generation of stream; negative in case of an error

Description
The vslGetStreamStateBrng function retrieves the index of a basic generator used for generation of a
given random stream.
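
The following fragment is a minimal usage sketch (the BRNG and seed are arbitrary illustration values; error
checking is omitted):

VSLStreamStatePtr stream;
int brng;
int status;

/* Create a stream based on the MT19937 basic generator */
status = vslNewStream(&stream, VSL_BRNG_MT19937, 777);

/* brng is expected to equal VSL_BRNG_MT19937; a negative value indicates an error */
brng = vslGetStreamStateBrng(stream);

status = vslDeleteStream(&stream);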

Return Values

VSL_ERROR_NULL_PTR stream is a NULL pointer.

VSL_RNG_ERROR_BAD_STREAM stream is not a valid random stream.

vslGetNumRegBrngs
Obtains the number of currently registered basic
generators.

Syntax
nregbrngs = vslGetNumRegBrngs( void );

Include Files
• mkl.h

Output Parameters

Name Type Description

nregbrngs    int    Number of basic generators registered at the moment of the function call

Description
The vslGetNumRegBrngs function obtains the number of currently registered basic generators. Whenever
the user registers a user-designed basic generator, the number of registered basic generators is incremented.
The maximum number of basic generators that can be registered is determined by the VSL_MAX_REG_BRNGS
parameter.
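
The following fragment is a minimal usage sketch (error checking and output handling are omitted):

int nregbrngs;

/* Query how many basic generators are currently registered,
   including any user-designed generators registered so far */
nregbrngs = vslGetNumRegBrngs();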

Distribution Generators
Intel MKL VS routines are used to generate random numbers with different types of distribution. Each
function group is introduced below by the type of underlying distribution and contains a short description of
its functionality, as well as specifications of the call sequence and the explanation of input and output
parameters. Table "Continuous Distribution Generators" and Table "Discrete Distribution Generators" list the
random number generator routines with data types and output distributions, and set the correspondence
between the data types of the generator routines and the basic random number generators.
Continuous Distribution Generators

Type of Distribution   Data Types   BRNG Data Type   Description

vRngUniform            s, d         s, d             Uniform continuous distribution on the interval [a,b)

vRngGaussian           s, d         s, d             Normal (Gaussian) distribution

vRngGaussianMV         s, d         s, d             Multivariate normal (Gaussian) distribution

vRngExponential        s, d         s, d             Exponential distribution

vRngLaplace            s, d         s, d             Laplace distribution (double exponential distribution)

vRngWeibull            s, d         s, d             Weibull distribution

vRngCauchy             s, d         s, d             Cauchy distribution

vRngRayleigh           s, d         s, d             Rayleigh distribution

vRngLognormal          s, d         s, d             Lognormal distribution

vRngGumbel             s, d         s, d             Gumbel (extreme value) distribution

vRngGamma              s, d         s, d             Gamma distribution

vRngBeta               s, d         s, d             Beta distribution

Discrete Distribution Generators

Type of Distribution   Data Types   BRNG Data Type   Description

vRngUniform            i            d                Uniform discrete distribution on the interval [a,b)

vRngUniformBits        i            i                Underlying BRNG integer recurrence

vRngUniformBits32      i            i                Uniformly distributed 32 bits in 32-bit chunks

vRngUniformBits64      i            i                Uniformly distributed 64 bits in 64-bit chunks

vRngBernoulli          i            s                Bernoulli distribution

vRngGeometric          i            s                Geometric distribution

vRngBinomial           i            d                Binomial distribution

vRngHypergeometric     i            d                Hypergeometric distribution

vRngPoisson            i            s (for VSL_RNG_METHOD_POISSON_POISNORM);        Poisson distribution
                                    s (for distribution parameter λ ≥ 27) and
                                    d (for λ < 27) for VSL_RNG_METHOD_POISSON_PTPE

vRngPoissonV           i            s                Poisson distribution with varying mean

vRngNegBinomial        i            d                Negative binomial distribution, or Pascal distribution


Modes of random number generation


The library provides two modes of random number generation, accurate and fast. Accurate generation mode
is intended for applications that are highly demanding of calculation accuracy. When used in this mode, the
generators produce random numbers that lie completely within the definitional domain for all values of the
distribution parameters. For example, random numbers obtained from a generator of a continuous distribution
that is uniform on the interval [a,b] belong to this interval irrespective of the values of a and b. Fast mode
provides high generation performance and also guarantees that the generated random numbers belong to the
definitional domain except for some specific values of the distribution parameters. The generation mode is set
by specifying the relevant value of the method parameter in the generator routines. The list of distributions
that support the accurate mode of generation is given in the table below.
Distribution Generators Supporting Accurate Mode
Type of Distribution Data Types

vRngUniform s, d

vRngExponential s, d

vRngWeibull s, d

vRngRayleigh s, d

vRngLognormal s, d

vRngGamma s, d

vRngBeta s, d

See additional details about accurate and fast mode of random number generation in VS Notes.
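
For example, the accurate generation mode for a supported distribution is requested by passing the
corresponding _ACCURATE method value. The fragment below is an illustrative sketch with arbitrary seed and
parameter values (error checking omitted):

VSLStreamStatePtr stream;
double r[1000];
int status;

status = vslNewStream(&stream, VSL_BRNG_MT19937, 777);

/* Accurate mode: every generated number is guaranteed to lie in [2.0, 5.0) */
status = vdRngUniform(VSL_RNG_METHOD_UNIFORM_STD_ACCURATE, stream, 1000, r, 2.0, 5.0);

status = vslDeleteStream(&stream);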

New method names


The current version of Intel MKL has a modified structure of VS RNG method names (see RNG Naming
Conventions for details). The old names are kept for backward compatibility. The tables below set the
correspondence between the new and legacy method names for the VS random number generators.
Method Names for Continuous Distribution Generators

RNG             Legacy Method Names                         New Method Names

vRngUniform     VSL_METHOD_SUNIFORM_STD,                    VSL_RNG_METHOD_UNIFORM_STD,
                VSL_METHOD_DUNIFORM_STD,                    VSL_RNG_METHOD_UNIFORM_STD_ACCURATE
                VSL_METHOD_SUNIFORM_STD_ACCURATE,
                VSL_METHOD_DUNIFORM_STD_ACCURATE

vRngGaussian    VSL_METHOD_SGAUSSIAN_BOXMULLER,             VSL_RNG_METHOD_GAUSSIAN_BOXMULLER,
                VSL_METHOD_SGAUSSIAN_BOXMULLER2,            VSL_RNG_METHOD_GAUSSIAN_BOXMULLER2,
                VSL_METHOD_SGAUSSIAN_ICDF,                  VSL_RNG_METHOD_GAUSSIAN_ICDF
                VSL_METHOD_DGAUSSIAN_BOXMULLER,
                VSL_METHOD_DGAUSSIAN_BOXMULLER2,
                VSL_METHOD_DGAUSSIAN_ICDF

vRngGaussianMV  VSL_METHOD_SGAUSSIANMV_BOXMULLER,           VSL_RNG_METHOD_GAUSSIANMV_BOXMULLER,
                VSL_METHOD_SGAUSSIANMV_BOXMULLER2,          VSL_RNG_METHOD_GAUSSIANMV_BOXMULLER2,
                VSL_METHOD_SGAUSSIANMV_ICDF,                VSL_RNG_METHOD_GAUSSIANMV_ICDF
                VSL_METHOD_DGAUSSIANMV_BOXMULLER,
                VSL_METHOD_DGAUSSIANMV_BOXMULLER2,
                VSL_METHOD_DGAUSSIANMV_ICDF

vRngExponential VSL_METHOD_SEXPONENTIAL_ICDF,               VSL_RNG_METHOD_EXPONENTIAL_ICDF,
                VSL_METHOD_DEXPONENTIAL_ICDF,               VSL_RNG_METHOD_EXPONENTIAL_ICDF_ACCURATE
                VSL_METHOD_SEXPONENTIAL_ICDF_ACCURATE,
                VSL_METHOD_DEXPONENTIAL_ICDF_ACCURATE

vRngLaplace     VSL_METHOD_SLAPLACE_ICDF,                   VSL_RNG_METHOD_LAPLACE_ICDF
                VSL_METHOD_DLAPLACE_ICDF

vRngWeibull     VSL_METHOD_SWEIBULL_ICDF,                   VSL_RNG_METHOD_WEIBULL_ICDF,
                VSL_METHOD_DWEIBULL_ICDF,                   VSL_RNG_METHOD_WEIBULL_ICDF_ACCURATE
                VSL_METHOD_SWEIBULL_ICDF_ACCURATE,
                VSL_METHOD_DWEIBULL_ICDF_ACCURATE

vRngCauchy      VSL_METHOD_SCAUCHY_ICDF,                    VSL_RNG_METHOD_CAUCHY_ICDF
                VSL_METHOD_DCAUCHY_ICDF

vRngRayleigh    VSL_METHOD_SRAYLEIGH_ICDF,                  VSL_RNG_METHOD_RAYLEIGH_ICDF,
                VSL_METHOD_DRAYLEIGH_ICDF,                  VSL_RNG_METHOD_RAYLEIGH_ICDF_ACCURATE
                VSL_METHOD_SRAYLEIGH_ICDF_ACCURATE,
                VSL_METHOD_DRAYLEIGH_ICDF_ACCURATE

vRngLognormal   VSL_METHOD_SLOGNORMAL_BOXMULLER2,           VSL_RNG_METHOD_LOGNORMAL_BOXMULLER2,
                VSL_METHOD_DLOGNORMAL_BOXMULLER2,           VSL_RNG_METHOD_LOGNORMAL_BOXMULLER2_ACCURATE
                VSL_METHOD_SLOGNORMAL_BOXMULLER2_ACCURATE,
                VSL_METHOD_DLOGNORMAL_BOXMULLER2_ACCURATE

                VSL_METHOD_SLOGNORMAL_ICDF,                 VSL_RNG_METHOD_LOGNORMAL_ICDF,
                VSL_METHOD_DLOGNORMAL_ICDF,                 VSL_RNG_METHOD_LOGNORMAL_ICDF_ACCURATE
                VSL_METHOD_SLOGNORMAL_ICDF_ACCURATE,
                VSL_METHOD_DLOGNORMAL_ICDF_ACCURATE

vRngGumbel      VSL_METHOD_SGUMBEL_ICDF,                    VSL_RNG_METHOD_GUMBEL_ICDF
                VSL_METHOD_DGUMBEL_ICDF

vRngGamma       VSL_METHOD_SGAMMA_GNORM,                    VSL_RNG_METHOD_GAMMA_GNORM,
                VSL_METHOD_DGAMMA_GNORM,                    VSL_RNG_METHOD_GAMMA_GNORM_ACCURATE
                VSL_METHOD_SGAMMA_GNORM_ACCURATE,
                VSL_METHOD_DGAMMA_GNORM_ACCURATE

vRngBeta        VSL_METHOD_SBETA_CJA,                       VSL_RNG_METHOD_BETA_CJA,
                VSL_METHOD_DBETA_CJA,                       VSL_RNG_METHOD_BETA_CJA_ACCURATE
                VSL_METHOD_SBETA_CJA_ACCURATE,
                VSL_METHOD_DBETA_CJA_ACCURATE

Method Names for Discrete Distribution Generators

RNG                 Legacy Method Name                 New Method Name

vRngUniform         VSL_METHOD_IUNIFORM_STD            VSL_RNG_METHOD_UNIFORM_STD

vRngUniformBits     VSL_METHOD_IUNIFORMBITS_STD        VSL_RNG_METHOD_UNIFORMBITS_STD

vRngBernoulli       VSL_METHOD_IBERNOULLI_ICDF         VSL_RNG_METHOD_BERNOULLI_ICDF

vRngGeometric       VSL_METHOD_IGEOMETRIC_ICDF         VSL_RNG_METHOD_GEOMETRIC_ICDF

vRngBinomial        VSL_METHOD_IBINOMIAL_BTPE          VSL_RNG_METHOD_BINOMIAL_BTPE

vRngHypergeometric  VSL_METHOD_IHYPERGEOMETRIC_H2PE    VSL_RNG_METHOD_HYPERGEOMETRIC_H2PE

vRngPoisson         VSL_METHOD_IPOISSON_PTPE,          VSL_RNG_METHOD_POISSON_PTPE,
                    VSL_METHOD_IPOISSON_POISNORM       VSL_RNG_METHOD_POISSON_POISNORM

vRngPoissonV        VSL_METHOD_IPOISSONV_POISNORM      VSL_RNG_METHOD_POISSONV_POISNORM

vRngNegBinomial     VSL_METHOD_INEGBINOMIAL_NBAR       VSL_RNG_METHOD_NEGBINOMIAL_NBAR

Continuous Distributions
This section describes routines for generating random numbers with continuous distribution.

vRngUniform
Generates random numbers with uniform distribution.

Syntax
status = vsRngUniform( method, stream, n, r, a, b );
status = vdRngUniform( method, stream, n, r, a, b );

Include Files
• mkl.h

Input Parameters

Name Type Description

method const MKL_INT Generation method; the specific values are as follows:

VSL_RNG_METHOD_UNIFORM_STD
VSL_RNG_METHOD_UNIFORM_STD_ACCURATE

Standard method.

stream VSLStreamStatePtr Pointer to the stream state structure

n const MKL_INT Number of random values to be generated

a const float for vsRngUniform Left bound a


const double for
vdRngUniform

b const float for vsRngUniform Right bound b

const double for


vdRngUniform

Output Parameters

Name Type Description

r float* for vsRngUniform Vector of n random numbers uniformly distributed over the
interval [a,b)
double* for vdRngUniform

Description
The vRngUniform function generates random numbers uniformly distributed over the interval [a, b), where
a, b are the left and right bounds of the interval, respectively, and a, b∈R ; a < b.
The probability density function is given by:

The cumulative distribution function is as follows:

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Return Values

VSL_ERROR_OK, VSL_STATUS_OK Indicates no error, execution is successful.

VSL_ERROR_NULL_PTR stream is a NULL pointer.


VSL_RNG_ERROR_BAD_STREAM stream is not a valid random stream.

VSL_RNG_ERROR_BAD_UPDATE Callback function for an abstract BRNG returns an invalid


number of updated entries in a buffer, that is, < 0 or >
nmax.

VSL_RNG_ERROR_NO_NUMBERS Callback function for an abstract BRNG returns 0 as the


number of updated entries in a buffer.

VSL_RNG_ERROR_QRNG_PERIOD_ELAPSED Period of the generator has been exceeded.

VSL_RNG_ERROR_NONDETERM_NRETRIES_EXCEED Number of retries to generate a random number by using


ED non-deterministic random number generator exceeds
threshold.

VSL_RNG_ERROR_ARS5_NOT_SUPPORTED ARS-5 random number generator is not supported on the


CPU running the application.
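
The following fragment sketches a typical call sequence for vdRngUniform; the BRNG, seed, bounds, and
buffer size are arbitrary illustration values, and error checking is omitted:

VSLStreamStatePtr stream;
double r[1000];
int status;

status = vslNewStream(&stream, VSL_BRNG_MT19937, 777);

/* 1000 double-precision numbers uniformly distributed over [0.0, 1.0) */
status = vdRngUniform(VSL_RNG_METHOD_UNIFORM_STD, stream, 1000, r, 0.0, 1.0);

status = vslDeleteStream(&stream);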

vRngGaussian
Generates normally distributed random numbers.

Syntax
status = vsRngGaussian( method, stream, n, r, a, sigma );
status = vdRngGaussian( method, stream, n, r, a, sigma );

Include Files
• mkl.h

Input Parameters

Name Type Description

method const MKL_INT Generation method. The specific values are as follows:

VSL_RNG_METHOD_GAUSSIAN_BOXMULLER
VSL_RNG_METHOD_GAUSSIAN_BOXMULLER2
VSL_RNG_METHOD_GAUSSIAN_ICDF

See brief description of the methods BOXMULLER,


BOXMULLER2, and ICDF in Table "Values of <method> in
method parameter"

stream VSLStreamStatePtr Pointer to the stream state structure

n const MKL_INT Number of random values to be generated

a const float for Mean valuea.


vsRngGaussian
const double for
vdRngGaussian

sigma const float for Standard deviationσ.


vsRngGaussian

const double for


vdRngGaussian

Output Parameters

Name Type Description

r float* for vsRngGaussian Vector of n normally distributed random numbers

double* for vdRngGaussian

Description
The vRngGaussian function generates random numbers with normal (Gaussian) distribution with mean value
a and standard deviation σ, where

a, σ∈R ; σ > 0.

The probability density function is given by:

The cumulative distribution function is as follows:

The cumulative distribution function Fa,σ(x) can be expressed in terms of standard normal distribution Φ(x)
as

Fa,σ(x) = Φ((x - a)/σ)

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Return Values

VSL_ERROR_OK, VSL_STATUS_OK Indicates no error, execution is successful.


VSL_ERROR_NULL_PTR stream is a NULL pointer.

VSL_RNG_ERROR_BAD_STREAM stream is not a valid random stream.

VSL_RNG_ERROR_BAD_UPDATE Callback function for an abstract BRNG returns an invalid


number of updated entries in a buffer, that is, < 0 or >
nmax.

VSL_RNG_ERROR_NO_NUMBERS Callback function for an abstract BRNG returns 0 as the


number of updated entries in a buffer.

VSL_RNG_ERROR_QRNG_PERIOD_ELAPSED Period of the generator has been exceeded.

VSL_RNG_ERROR_NONDETERM_NRETRIES_EXCEED Number of retries to generate a random number by using


ED non-deterministic random number generator exceeds
threshold.

VSL_RNG_ERROR_ARS5_NOT_SUPPORTED ARS-5 random number generator is not supported on the


CPU running the application.
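
The following fragment sketches a typical call sequence for vdRngGaussian with arbitrary illustration values
for the seed and the distribution parameters (error checking omitted):

VSLStreamStatePtr stream;
double r[1000];
int status;

status = vslNewStream(&stream, VSL_BRNG_MT19937, 777);

/* 1000 normally distributed numbers with mean a = 0.0 and standard deviation sigma = 1.0 */
status = vdRngGaussian(VSL_RNG_METHOD_GAUSSIAN_ICDF, stream, 1000, r, 0.0, 1.0);

status = vslDeleteStream(&stream);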

vRngGaussianMV
Generates random numbers from multivariate normal
distribution.

Syntax
status = vsRngGaussianMV( method, stream, n, r, dimen, mstorage, a, t );
status = vdRngGaussianMV( method, stream, n, r, dimen, mstorage, a, t );

Include Files
• mkl.h

Input Parameters

Name Type Description

method const MKL_INT Generation method. The specific values are as follows:

VSL_RNG_METHOD_GAUSSIANMV_BOXMULLER
VSL_RNG_METHOD_GAUSSIANMV_BOXMULLER2
VSL_RNG_METHOD_GAUSSIANMV_ICDF

See brief description of the methods BOXMULLER,


BOXMULLER2, and ICDF in Table "Values of <method> in
method parameter"

stream VSLStreamStatePtr Pointer to the stream state structure

n const MKL_INT Number of d-dimensional vectors to be generated

dimen const MKL_INT Dimension d ( d≥ 1) of output random vectors

mstorage const MKL_INT Matrix storage scheme for lower triangular matrix T. The
routine supports three matrix storage schemes:

• VSL_MATRIX_STORAGE_FULL— all d x d elements of the


matrix T are passed, however, only the lower triangle
part is actually used in the routine.
• VSL_MATRIX_STORAGE_PACKED— lower triangle
elements of T are packed by rows into a one-
dimensional array.
• VSL_MATRIX_STORAGE_DIAGONAL— only diagonal
elements of T are passed.

a const float* for Mean vector a of dimension d


vsRngGaussianMV
const double* for
vdRngGaussianMV

t const float* for Elements of the lower triangular matrix passed according to
vsRngGaussianMV the matrix T storage scheme mstorage.

const double* for


vdRngGaussianMV

Output Parameters

Name Type Description

r float* for vsRngGaussianMV Array of n random vectors of dimension dimen

double* for vdRngGaussianMV

Description
The vRngGaussianMV function generates random numbers with d-variate normal (Gaussian) distribution with
mean value a and variance-covariance matrix C, where a ∈ R^d and C is a d×d symmetric positive-definite matrix.

The probability density function is given by:

where x ∈ R^d.

Matrix C can be represented as C = T*T^T, where T is a lower triangular matrix, the Cholesky factor of C.

Instead of the variance-covariance matrix C, the generation routines require the Cholesky factor of C as input.
To compute the Cholesky factor of matrix C, the user may call the Intel MKL LAPACK factorization routines
?potrf or ?pptrf for the v?RngGaussianMV/v?rnggaussianmv routines (? means either s or d, for single and
double precision, respectively). See Application Notes for more details.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Application Notes
Since matrices are stored in Fortran by columns, while in C they are stored by rows, the usage of the Intel MKL
factorization routines (which assume Fortran matrix storage) in combination with the multivariate normal RNG
(which assumes C matrix storage) is slightly different in C and Fortran. The table below helps in using these
routines in C. For further information please refer to the appropriate VS example file.
Using Cholesky Factorization Routines in C

VSL_MATRIX_STORAGE_FULL
  Variance-covariance matrix argument in the factorization routine: C stored in a C two-dimensional array
  Factorization routine: spotrf for vsRngGaussianMV, dpotrf for vdRngGaussianMV
  UPLO parameter: 'U'
  Result of factorization used as input argument for the RNG: lower triangle of T (the upper triangle is not used)

VSL_MATRIX_STORAGE_PACKED
  Variance-covariance matrix argument in the factorization routine: lower triangle of C packed by columns into a one-dimensional array
  Factorization routine: spptrf for vsRngGaussianMV, dpptrf for vdRngGaussianMV
  UPLO parameter: 'L'
  Result of factorization used as input argument for the RNG: lower triangle of T packed by columns into a one-dimensional array

Return Values

VSL_ERROR_OK, VSL_STATUS_OK Indicates no error, execution is successful.

VSL_ERROR_NULL_PTR stream is a NULL pointer.

VSL_RNG_ERROR_BAD_STREAM stream is not a valid random stream.

VSL_RNG_ERROR_BAD_UPDATE Callback function for an abstract BRNG returns an invalid


number of updated entries in a buffer, that is, < 0 or >
nmax.

VSL_RNG_ERROR_NO_NUMBERS Callback function for an abstract BRNG returns 0 as the


number of updated entries in a buffer.

VSL_RNG_ERROR_QRNG_PERIOD_ELAPSED Period of the generator has been exceeded.

VSL_RNG_ERROR_NONDETERM_NRETRIES_EXCEEDED    Number of retries to generate a random number by using
                                             non-deterministic random number generator exceeds
                                             threshold.

VSL_RNG_ERROR_ARS5_NOT_SUPPORTED             ARS-5 random number generator is not supported on the
                                             CPU running the application.
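
The following fragment sketches a call to vdRngGaussianMV for a two-dimensional distribution with full
matrix storage. The Cholesky factor T is written out by hand for the illustrative variance-covariance matrix
C = {{1, 0.5}, {0.5, 1}}; in a real application T is typically obtained with ?potrf/?pptrf as described above
(error checking omitted):

#define NDIM 2

VSLStreamStatePtr stream;
double a[NDIM] = { 0.0, 0.0 };              /* mean vector                          */
double t[NDIM*NDIM] = { 1.0, 0.0,           /* lower triangular Cholesky factor of  */
                        0.5, 0.8660254 };   /* C; the upper triangle is not used    */
double r[1000*NDIM];                        /* 1000 two-dimensional random vectors  */
int status;

status = vslNewStream(&stream, VSL_BRNG_MT19937, 777);

status = vdRngGaussianMV(VSL_RNG_METHOD_GAUSSIANMV_ICDF, stream, 1000, r,
                         NDIM, VSL_MATRIX_STORAGE_FULL, a, t);

status = vslDeleteStream(&stream);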

vRngExponential
Generates exponentially distributed random numbers.

Syntax
status = vsRngExponential( method, stream, n, r, a, beta );
status = vdRngExponential( method, stream, n, r, a, beta );

Include Files
• mkl.h

Input Parameters

Name Type Description

method const MKL_INT Generation method. The specific values are as follows:

VSL_RNG_METHOD_EXPONENTIAL_ICDF
VSL_RNG_METHOD_EXPONENTIAL_ICDF_ACCURATE

Inverse cumulative distribution function method

stream VSLStreamStatePtr Pointer to the stream state structure

n const MKL_INT Number of random values to be generated

a const float for Displacement a


vsRngExponential
const double for
vdRngExponential

beta const float for Scalefactorβ.


vsRngExponential
const double for
vdRngExponential

Output Parameters

Name Type Description

r float* for vsRngExponential Vector of n exponentially distributed random numbers

double* for vdRngExponential

Description
The vRngExponential function generates random numbers with exponential distribution that has
displacement a and scalefactor β, where a, β∈R ; β > 0.

The probability density function is given by:


The cumulative distribution function is as follows:

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Return Values

VSL_ERROR_OK, VSL_STATUS_OK Indicates no error, execution is successful.

VSL_ERROR_NULL_PTR stream is a NULL pointer.

VSL_RNG_ERROR_BAD_STREAM stream is not a valid random stream.

VSL_RNG_ERROR_BAD_UPDATE Callback function for an abstract BRNG returns an invalid


number of updated entries in a buffer, that is, < 0 or >
nmax.

VSL_RNG_ERROR_NO_NUMBERS Callback function for an abstract BRNG returns 0 as the


number of updated entries in a buffer.

VSL_RNG_ERROR_QRNG_PERIOD_ELAPSED Period of the generator has been exceeded.

VSL_RNG_ERROR_NONDETERM_NRETRIES_EXCEED Number of retries to generate a random number by using


ED non-deterministic random number generator exceeds
threshold.

VSL_RNG_ERROR_ARS5_NOT_SUPPORTED ARS-5 random number generator is not supported on the


CPU running the application.
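
The following fragment sketches a typical call sequence for vdRngExponential; the seed and the distribution
parameters are arbitrary illustration values (error checking omitted):

VSLStreamStatePtr stream;
double r[1000];
int status;

status = vslNewStream(&stream, VSL_BRNG_MT19937, 777);

/* 1000 exponentially distributed numbers with displacement a = 0.0 and scalefactor beta = 2.0 */
status = vdRngExponential(VSL_RNG_METHOD_EXPONENTIAL_ICDF, stream, 1000, r, 0.0, 2.0);

status = vslDeleteStream(&stream);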

vRngLaplace
Generates random numbers with Laplace distribution.

Syntax
status = vsRngLaplace( method, stream, n, r, a, beta );
status = vdRngLaplace( method, stream, n, r, a, beta );

Include Files
• mkl.h

Input Parameters

Name Type Description

method const MKL_INT Generation method. The specific values are as follows:

VSL_RNG_METHOD_LAPLACE_ICDF

Inverse cumulative distribution function method

stream VSLStreamStatePtr Pointer to the stream state structure

n const MKL_INT Number of random values to be generated

a const float for vsRngLaplace Mean value a


const double for
vdRngLaplace

beta const float for vsRngLaplace Scalefactorβ.


const double for
vdRngLaplace

Output Parameters

Name Type Description

r float* for vsRngLaplace Vector of n Laplace distributed random numbers

double* for vdRngLaplace

Description
The vRngLaplace function generates random numbers with Laplace distribution with mean value (or
average) a and scalefactor β, where a, β∈R ; β > 0. The scalefactor value determines the standard
deviation as

The probability density function is given by:

The cumulative distribution function is as follows:


Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Return Values

VSL_ERROR_OK, VSL_STATUS_OK Indicates no error, execution is successful.

VSL_ERROR_NULL_PTR stream is a NULL pointer.

VSL_RNG_ERROR_BAD_STREAM stream is not a valid random stream.

VSL_RNG_ERROR_BAD_UPDATE Callback function for an abstract BRNG returns an invalid


number of updated entries in a buffer, that is, < 0 or >
nmax.

VSL_RNG_ERROR_NO_NUMBERS Callback function for an abstract BRNG returns 0 as the


number of updated entries in a buffer.

VSL_RNG_ERROR_QRNG_PERIOD_ELAPSED Period of the generator has been exceeded.

VSL_RNG_ERROR_NONDETERM_NRETRIES_EXCEED Number of retries to generate a random number by using


ED non-deterministic random number generator exceeds
threshold.

VSL_RNG_ERROR_ARS5_NOT_SUPPORTED ARS-5 random number generator is not supported on the


CPU running the application.
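
The following fragment sketches a typical call sequence for vdRngLaplace with arbitrary illustration values
(error checking omitted):

VSLStreamStatePtr stream;
double r[1000];
int status;

status = vslNewStream(&stream, VSL_BRNG_MT19937, 777);

/* 1000 Laplace distributed numbers with mean a = 0.0 and scalefactor beta = 1.0 */
status = vdRngLaplace(VSL_RNG_METHOD_LAPLACE_ICDF, stream, 1000, r, 0.0, 1.0);

status = vslDeleteStream(&stream);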

vRngWeibull
Generates Weibull distributed random numbers.

Syntax
status = vsRngWeibull( method, stream, n, r, alpha, a, beta );
status = vdRngWeibull( method, stream, n, r, alpha, a, beta );

Include Files
• mkl.h

Input Parameters

Name Type Description

method const MKL_INT Generation method. The specific values are as follows:

VSL_RNG_METHOD_WEIBULL_ICDF
VSL_RNG_METHOD_WEIBULL_ICDF_ACCURATE

Inverse cumulative distribution function method

stream VSLStreamStatePtr Pointer to the stream state structure

n const MKL_INT Number of random values to be generated

alpha const float for vsRngWeibull Shapeα.


const double for
vdRngWeibull

a const float for vsRngWeibull Displacement a


const double for
vdRngWeibull

beta const float for vsRngWeibull Scalefactorβ.


const double for
vdRngWeibull

Output Parameters

Name Type Description

r float* for vsRngWeibull Vector of n Weibull distributed random numbers

double* for vdRngWeibull

Description
The vRngWeibull function generates Weibull distributed random numbers with displacement a, scalefactor β,
and shape α, where α, β, a∈R ; α > 0, β > 0.

The probability density function is given by:

The cumulative distribution function is as follows:


Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Return Values

VSL_ERROR_OK, VSL_STATUS_OK Indicates no error, execution is successful.

VSL_ERROR_NULL_PTR stream is a NULL pointer.

VSL_RNG_ERROR_BAD_STREAM stream is not a valid random stream.

VSL_RNG_ERROR_BAD_UPDATE Callback function for an abstract BRNG returns an invalid


number of updated entries in a buffer, that is, < 0 or >
nmax.

VSL_RNG_ERROR_NO_NUMBERS Callback function for an abstract BRNG returns 0 as the


number of updated entries in a buffer.

VSL_RNG_ERROR_QRNG_PERIOD_ELAPSED Period of the generator has been exceeded.

VSL_RNG_ERROR_NONDETERM_NRETRIES_EXCEED Number of retries to generate a random number by using


ED non-deterministic random number generator exceeds
threshold.

VSL_RNG_ERROR_ARS5_NOT_SUPPORTED ARS-5 random number generator is not supported on the


CPU running the application.
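
The following fragment sketches a typical call sequence for vdRngWeibull with arbitrary illustration values
(error checking omitted):

VSLStreamStatePtr stream;
double r[1000];
int status;

status = vslNewStream(&stream, VSL_BRNG_MT19937, 777);

/* 1000 Weibull distributed numbers with shape alpha = 1.5, displacement a = 0.0, and scalefactor beta = 1.0 */
status = vdRngWeibull(VSL_RNG_METHOD_WEIBULL_ICDF, stream, 1000, r, 1.5, 0.0, 1.0);

status = vslDeleteStream(&stream);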

vRngCauchy
Generates Cauchy distributed random values.

Syntax
status = vsRngCauchy( method, stream, n, r, a, beta );
status = vdRngCauchy( method, stream, n, r, a, beta );

Include Files
• mkl.h

Input Parameters

Name Type Description

method const MKL_INT Generation method. The specific values are as follows:

VSL_RNG_METHOD_CAUCHY_ICDF

Inverse cumulative distribution function method

stream VSLStreamStatePtr Pointer to the stream state structure

n const MKL_INT Number of random values to be generated

a const float for vsRngCauchy Displacementa.

const double for vdRngCauchy

beta const float for vsRngCauchy Scalefactorβ.

const double for vdRngCauchy

Output Parameters

Name Type Description

r float* for vsRngCauchy Vector of n Cauchy distributed random numbers

double* for vdRngCauchy

Description
The function generates Cauchy distributed random numbers with displacement a and scalefactor β, where a,
β∈R ; β > 0.
The probability density function is given by:

The cumulative distribution function is as follows:

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Return Values

VSL_ERROR_OK, VSL_STATUS_OK Indicates no error, execution is successful.

VSL_ERROR_NULL_PTR stream is a NULL pointer.

VSL_RNG_ERROR_BAD_STREAM stream is not a valid random stream.

VSL_RNG_ERROR_BAD_UPDATE Callback function for an abstract BRNG returns an invalid


number of updated entries in a buffer, that is, < 0 or >
nmax.

VSL_RNG_ERROR_NO_NUMBERS Callback function for an abstract BRNG returns 0 as the


number of updated entries in a buffer.

VSL_RNG_ERROR_QRNG_PERIOD_ELAPSED Period of the generator has been exceeded.

VSL_RNG_ERROR_NONDETERM_NRETRIES_EXCEED Number of retries to generate a random number by using


ED non-deterministic random number generator exceeds
threshold.

VSL_RNG_ERROR_ARS5_NOT_SUPPORTED ARS-5 random number generator is not supported on the


CPU running the application.
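
The following fragment sketches a typical call sequence for vdRngCauchy with arbitrary illustration values
(error checking omitted):

VSLStreamStatePtr stream;
double r[1000];
int status;

status = vslNewStream(&stream, VSL_BRNG_MT19937, 777);

/* 1000 Cauchy distributed numbers with displacement a = 0.0 and scalefactor beta = 1.0 */
status = vdRngCauchy(VSL_RNG_METHOD_CAUCHY_ICDF, stream, 1000, r, 0.0, 1.0);

status = vslDeleteStream(&stream);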

vRngRayleigh
Generates Rayleigh distributed random values.

Syntax
status = vsRngRayleigh( method, stream, n, r, a, beta );
status = vdRngRayleigh( method, stream, n, r, a, beta );

Include Files
• mkl.h

Input Parameters

Name Type Description

method const MKL_INT Generation method. The specific values are as follows:

VSL_RNG_METHOD_RAYLEIGH_ICDF
VSL_RNG_METHOD_RAYLEIGH_ICDF_ACCURATE

Inverse cumulative distribution function method

stream VSLStreamStatePtr Pointer to the stream state structure

n const MKL_INT Number of random values to be generated

a const float for Displacement a


vsRngRayleigh
const double for
vdRngRayleigh

beta const float for Scalefactorβ.


vsRngRayleigh
const double for
vdRngRayleigh

Output Parameters

Name Type Description

r float* for vsRngRayleigh Vector of n Rayleigh distributed random numbers

double* for vdRngRayleigh

Description
The vRngRayleigh function generates Rayleigh distributed random numbers with displacement a and
scalefactor β, where a, β∈R ; β > 0.

The Rayleigh distribution is a special case of the Weibull distribution, where the shape parameter α = 2.

The probability density function is given by:

The cumulative distribution function is as follows:

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Return Values

VSL_ERROR_OK, VSL_STATUS_OK Indicates no error, execution is successful.

VSL_ERROR_NULL_PTR stream is a NULL pointer.

VSL_RNG_ERROR_BAD_STREAM stream is not a valid random stream.

VSL_RNG_ERROR_BAD_UPDATE Callback function for an abstract BRNG returns an invalid


number of updated entries in a buffer, that is, < 0 or >
nmax.

VSL_RNG_ERROR_NO_NUMBERS Callback function for an abstract BRNG returns 0 as the


number of updated entries in a buffer.

VSL_RNG_ERROR_QRNG_PERIOD_ELAPSED Period of the generator has been exceeded.

VSL_RNG_ERROR_NONDETERM_NRETRIES_EXCEED Number of retries to generate a random number by using


ED non-deterministic random number generator exceeds
threshold.

VSL_RNG_ERROR_ARS5_NOT_SUPPORTED ARS-5 random number generator is not supported on the


CPU running the application.
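
The following fragment sketches a typical call sequence for vdRngRayleigh with arbitrary illustration values
(error checking omitted):

VSLStreamStatePtr stream;
double r[1000];
int status;

status = vslNewStream(&stream, VSL_BRNG_MT19937, 777);

/* 1000 Rayleigh distributed numbers with displacement a = 0.0 and scalefactor beta = 1.0 */
status = vdRngRayleigh(VSL_RNG_METHOD_RAYLEIGH_ICDF, stream, 1000, r, 0.0, 1.0);

status = vslDeleteStream(&stream);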

vRngLognormal
Generates lognormally distributed random numbers.

Syntax
status = vsRngLognormal( method, stream, n, r, a, sigma, b, beta );
status = vdRngLognormal( method, stream, n, r, a, sigma, b, beta );

Include Files
• mkl.h

Input Parameters

Name Type Description

method const MKL_INT Generation method. The specific values are as follows:

VSL_RNG_METHOD_LOGNORMAL_BOXMULLER2
VSL_RNG_METHOD_LOGNORMAL_BOXMULLER2_ACCURATE

Box Muller 2 based method

VSL_RNG_METHOD_LOGNORMAL_ICDF
VSL_RNG_METHOD_LOGNORMAL_ICDF_ACCURATE

Inverse cumulative distribution function based method

stream VSLStreamStatePtr Pointer to the stream state structure

n const MKL_INT Number of random values to be generated

a const float for Average a of the subject normal distribution


vsRngLognormal
const double for
vdRngLognormal

sigma const float for Standard deviation σ of the subject normal distribution
vsRngLognormal
const double for
vdRngLognormal

b const float for Displacement b


vsRngLognormal
const double for
vdRngLognormal

beta const float for Scalefactor β.


vsRngLognormal
const double for
vdRngLognormal

Output Parameters

Name Type Description

r float* for vsRngLognormal Vector of n lognormally distributed random numbers

double* for vdRngLognormal

Description
The vRngLognormal function generates lognormally distributed random numbers with average of distribution
a and standard deviation σ of subject normal distribution, displacement b, and scalefactor β, where a, σ, b,
β∈R ; σ > 0 , β > 0.
The probability density function is given by:

The cumulative distribution function is as follows:


Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Return Values

VSL_ERROR_OK, VSL_STATUS_OK Indicates no error, execution is successful.

VSL_ERROR_NULL_PTR stream is a NULL pointer.

VSL_RNG_ERROR_BAD_STREAM stream is not a valid random stream.

VSL_RNG_ERROR_BAD_UPDATE Callback function for an abstract BRNG returns an invalid


number of updated entries in a buffer, that is, < 0 or >
nmax.

VSL_RNG_ERROR_NO_NUMBERS Callback function for an abstract BRNG returns 0 as the


number of updated entries in a buffer.

VSL_RNG_ERROR_QRNG_PERIOD_ELAPSED Period of the generator has been exceeded.

VSL_RNG_ERROR_NONDETERM_NRETRIES_EXCEED Number of retries to generate a random number by using


ED non-deterministic random number generator exceeds
threshold.

VSL_RNG_ERROR_ARS5_NOT_SUPPORTED ARS-5 random number generator is not supported on the


CPU running the application.
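
The following fragment sketches a typical call sequence for vdRngLognormal with arbitrary illustration values
(error checking omitted):

VSLStreamStatePtr stream;
double r[1000];
int status;

status = vslNewStream(&stream, VSL_BRNG_MT19937, 777);

/* 1000 lognormally distributed numbers with a = 0.0 and sigma = 1.0 for the subject normal
   distribution, displacement b = 0.0, and scalefactor beta = 1.0 */
status = vdRngLognormal(VSL_RNG_METHOD_LOGNORMAL_ICDF, stream, 1000, r, 0.0, 1.0, 0.0, 1.0);

status = vslDeleteStream(&stream);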

vRngGumbel
Generates Gumbel distributed random values.

Syntax
status = vsRngGumbel( method, stream, n, r, a, beta );
status = vdRngGumbel( method, stream, n, r, a, beta );

Include Files
• mkl.h

Input Parameters

Name Type Description

method const MKL_INT Generation method. The specific values are as follows:

VSL_RNG_METHOD_GUMBEL_ICDF

Inverse cumulative distribution function method

stream VSLStreamStatePtr Pointer to the stream state structure

n const MKL_INT Number of random values to be generated

a const float for vsRngGumbel Displacementa.

const double for vdRngGumbel

beta const float for vsRngGumbel Scalefactorβ.

const double for vdRngGumbel

Output Parameters

Name Type Description

r float* for vsRngGumbel Vector of n random numbers with Gumbel distribution

double* for vdRngGumbel

Description
The vRngGumbel function generates Gumbel distributed random numbers with displacement a and
scalefactor β, where a, β∈R ; β > 0.

The probability density function is given by:

The cumulative distribution function is as follows:

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804

Return Values

VSL_ERROR_OK, VSL_STATUS_OK Indicates no error, execution is successful.

VSL_ERROR_NULL_PTR stream is a NULL pointer.

VSL_RNG_ERROR_BAD_STREAM stream is not a valid random stream.

VSL_RNG_ERROR_BAD_UPDATE Callback function for an abstract BRNG returns an invalid


number of updated entries in a buffer, that is, < 0 or >
nmax.

VSL_RNG_ERROR_NO_NUMBERS Callback function for an abstract BRNG returns 0 as the


number of updated entries in a buffer.

VSL_RNG_ERROR_QRNG_PERIOD_ELAPSED Period of the generator has been exceeded.

VSL_RNG_ERROR_NONDETERM_NRETRIES_EXCEED Number of retries to generate a random number by using


ED non-deterministic random number generator exceeds
threshold.

VSL_RNG_ERROR_ARS5_NOT_SUPPORTED ARS-5 random number generator is not supported on the


CPU running the application.
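
The following fragment sketches a typical call sequence for vdRngGumbel with arbitrary illustration values
(error checking omitted):

VSLStreamStatePtr stream;
double r[1000];
int status;

status = vslNewStream(&stream, VSL_BRNG_MT19937, 777);

/* 1000 Gumbel distributed numbers with displacement a = 0.0 and scalefactor beta = 1.0 */
status = vdRngGumbel(VSL_RNG_METHOD_GUMBEL_ICDF, stream, 1000, r, 0.0, 1.0);

status = vslDeleteStream(&stream);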

vRngGamma
Generates gamma distributed random values.

Syntax
status = vsRngGamma( method, stream, n, r, alpha, a, beta );
status = vdRngGamma( method, stream, n, r, alpha, a, beta );

Include Files
• mkl.h

Input Parameters

Name Type Description

method const MKL_INT Generation method. The specific values are as follows:

VSL_RNG_METHOD_GAMMA_GNORM
VSL_RNG_METHOD_GAMMA_GNORM_ACCURATE

Acceptance/rejection method using random numbers with


Gaussian distribution. See brief description of the method
GNORM in Table "Values of <method> in method parameter"

stream VSLStreamStatePtr Pointer to the stream state structure

n const MKL_INT Number of random values to be generated

alpha const float for vsRngGamma Shape α.

const double for vdRngGamma

a const float for vsRngGamma Displacement a.

const double for vdRngGamma

beta const float for vsRngGamma Scalefactorβ.

const double for vdRngGamma

Output Parameters

Name Type Description

r float* for vsRngGamma Vector of n random numbers with gamma distribution

double* for vdRngGamma

Description
The vRngGamma function generates random numbers with gamma distribution that has shape parameter α,
displacement a, and scale parameter β, where α, β, and a∈R ; α > 0, β > 0.

The probability density function is given by:

where Γ(α) is the complete gamma function.


The cumulative distribution function is as follows:

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804


Return Values

VSL_ERROR_OK, VSL_STATUS_OK Indicates no error, execution is successful.

VSL_ERROR_NULL_PTR stream is a NULL pointer.

VSL_RNG_ERROR_BAD_STREAM stream is not a valid random stream.

VSL_RNG_ERROR_BAD_UPDATE Callback function for an abstract BRNG returns an invalid


number of updated entries in a buffer, that is, < 0 or >
nmax.

VSL_RNG_ERROR_NO_NUMBERS Callback function for an abstract BRNG returns 0 as the


number of updated entries in a buffer.

VSL_RNG_ERROR_QRNG_PERIOD_ELAPSED Period of the generator has been exceeded.

VSL_RNG_ERROR_NONDETERM_NRETRIES_EXCEED Number of retries to generate a random number by using


ED non-deterministic random number generator exceeds
threshold.

VSL_RNG_ERROR_ARS5_NOT_SUPPORTED ARS-5 random number generator is not supported on the


CPU running the application.
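
The following fragment sketches a typical call sequence for vdRngGamma with arbitrary illustration values
(error checking omitted):

VSLStreamStatePtr stream;
double r[1000];
int status;

status = vslNewStream(&stream, VSL_BRNG_MT19937, 777);

/* 1000 gamma distributed numbers with shape alpha = 2.0, displacement a = 0.0, and scale beta = 1.0 */
status = vdRngGamma(VSL_RNG_METHOD_GAMMA_GNORM, stream, 1000, r, 2.0, 0.0, 1.0);

status = vslDeleteStream(&stream);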

vRngBeta
Generates beta distributed random values.

Syntax
status = vsRngBeta( method, stream, n, r, p, q, a, beta );
status = vdRngBeta( method, stream, n, r, p, q, a, beta );

Include Files
• mkl.h

Input Parameters

Name Type Description

method const MKL_INT Generation method. The specific values are as follows:

VSL_RNG_METHOD_BETA_CJA
VSL_RNG_METHOD_BETA_CJA_ACCURATE

See brief description of the method CJA in Table "Values of


<method> in method parameter"

stream VSLStreamStatePtr Pointer to the stream state structure

n const MKL_INT Number of random values to be generated

p const float for vsRngBeta Shape p

const double for vdRngBeta

q const float for vsRngBeta Shape q

const double for vdRngBeta

a const float for vsRngBeta Displacementa.

const double for vdRngBeta

beta const float for vsRngBeta Scalefactorβ.

const double for vdRngBeta

Output Parameters

Name Type Description

r float* for vsRngBeta Vector of n random numbers with beta distribution

double* for vdRngBeta

Description
The vRngBeta function generates random numbers with beta distribution that has shape parameters p and
q, displacement a, and scale parameter β, where p, q, a, and β∈R ; p > 0, q > 0, β > 0.

The probability density function is given by:

where B(p, q) is the complete beta function.


The cumulative distribution function is as follows:

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804
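
The following fragment sketches a typical call sequence for vdRngBeta with arbitrary illustration values
(error checking omitted):

VSLStreamStatePtr stream;
double r[1000];
int status;

status = vslNewStream(&stream, VSL_BRNG_MT19937, 777);

/* 1000 beta distributed numbers with shapes p = 2.0 and q = 5.0, displacement a = 0.0, and scale beta = 1.0 */
status = vdRngBeta(VSL_RNG_METHOD_BETA_CJA, stream, 1000, r, 2.0, 5.0, 0.0, 1.0);

status = vslDeleteStream(&stream);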
