Download as pdf or txt
Download as pdf or txt
You are on page 1of 42

Jacobi method

Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
New Jacobi
SVD
The Jacobi method (II)
Some implementation details
Zlatko Drma c
University of Zagreb
Department of Mathematics
visiting Virginia Tech
October 18. 2007
Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
New Jacobi
SVD
Outline
1 Jacobi method versus bidiagonal SVDs
2 New Jacobi SVD
Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
Fast
bidiagonalization
based SVD
Onesided Jacobi
SVD
New Jacobi
SVD
. . . ....
1 Jacobi method versus bidiagonal SVDs
Fast bidiagonalization based SVD
Onesided Jacobi SVD
2 New Jacobi SVD
Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
Fast
bidiagonalization
based SVD
Onesided Jacobi
SVD
New Jacobi
SVD
Why are bidiagonal SVDs
faster?
Bidiagonalization: A = U
1
BV
T
1
is
A =
_

_





_

_
B =
_

_
o o o
o o o
o o o
o o o
o o o o
_

_
nite algorithm; 4mn
2
4n
3
/3 ops for B,
1.73 sec. for 1000 1000 (before 4.90, 2.79)
optimized, BLAS 2.49, Howell, Demmel, Fulton, ..
SVD(B), B = U
2
V
T
2
fast, mathematically sophisticated:
O(n
3
) in the QR SVD
O(n
2
) to O(n
3
) in the divide and conquer SVD
O(n
2
) Parlett and Dhillon MRRR ; O(n) data for only
Assemble: U = U
1
U
2
, V = V
1
V
2
. Use BLAS 3.
Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
Fast
bidiagonalization
based SVD
Onesided Jacobi
SVD
New Jacobi
SVD
Why are bidiagonal SVDs
faster?
Bidiagonalization: A = U
1
BV
T
1
is
A =
_

_





_

_
B =
_

_
o o o
o o o
o o o
o o o
o o o o
_

_
nite algorithm; 4mn
2
4n
3
/3 ops for B,
1.73 sec. for 1000 1000 (before 4.90, 2.79)
optimized, BLAS 2.49, Howell, Demmel, Fulton, ..
SVD(B), B = U
2
V
T
2
fast, mathematically sophisticated:
O(n
3
) in the QR SVD
O(n
2
) to O(n
3
) in the divide and conquer SVD
O(n
2
) Parlett and Dhillon MRRR ; O(n) data for only
Assemble: U = U
1
U
2
, V = V
1
V
2
. Use BLAS 3.
Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
Fast
bidiagonalization
based SVD
Onesided Jacobi
SVD
New Jacobi
SVD
... and the Jacobi SVD
A
(0)
= A ; V := I
A
(k+1)
= A
(k)
V
(k)
, V := VV
(k)
, k = 0, 1, 2, . . .
Under certain conditions, as k :
A
(k)
U, ((A
(k)
)
T
A
(k)
=
T
)
V
(0)
V
(1)
V
(k)
V
works with dense mn array A and dense n n array V;
busy memory trafc
high op count n(n 1)/2 * (one dot product and two
plane rotations) 3mn
2
+ 2n
3
ops per sweep. Many
sweeps (> 5, 6, 10) needed for numerical convergence. mn
2
ops only to check convergence.
all ops on BLAS 1 level
destroys any initial zero structure
It does not seem to be fair to compare with the bidiagonal
SVDs in terms of efciency. However, this Jacobi SVD can
be considered naive.
Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
Fast
bidiagonalization
based SVD
Onesided Jacobi
SVD
New Jacobi
SVD
What can be done?
Precondition for better convergence (transform initial A to
better X, i.e. X
T
X structured and more diagonal than A
T
A)
Preprocess (make the X above simpler, smaller,
structured; exploit the structure)
Reduce op count (nontrivial modications of the
algorithm)
Choose pivots wisely. Tailor pivot strategy for each given
matrix. Go for higher order convergence. Avoid extra work.
Stop just in time.
Upgrade AMAP ops to BLAS > 1 . Reduce memory
trafc. Software engineering, proling, ...
Z. Drma c and K. Veseli c: a project with the Volkswagen
Science Fundation.
Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
New Jacobi
SVD
A or A
T
?
R or R
T
Trinagular Jacobi
SVD
Space-time locality:
Tiling
Driver routine
Underow issues
. . . ....
1 Jacobi method versus bidiagonal SVDs
2 New Jacobi SVD
A or A
T
?
R or R
T
Trinagular Jacobi SVD
Space-time locality: Tiling
Driver routine
Underow issues
Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
New Jacobi
SVD
A or A
T
?
R or R
T
Trinagular Jacobi
SVD
Space-time locality:
Tiling
Driver routine
Underow issues
A or A
T
?
A =
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_












_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
,
A = UV
T
, A
T
= V
T
U
T
A =
_


_
A =
_
_
_
_
_
_
_
_






_
_
_
_
_
_
_
_
If m n, take A and A = QR.
If m n, take A
T
and A
T
= QR.
If m = n and [A, A
T
] ,= 0 (A
T
A ,= AA
T
)... interesting. Subtle.
Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
New Jacobi
SVD
A or A
T
?
R or R
T
Trinagular Jacobi
SVD
Space-time locality:
Tiling
Driver routine
Underow issues
H or M?
Let H = A
T
A, M = AA
T
. Which one is more diagonal?
Which one is easier prey for the Jacobi algorithm? Ready to
pay at most O(n
2
) for given A (O(n) for given H, M).
H = H
T
=
_
?
?
_
, M = M
T
=
_
?
?
_
(H) =
_
W
T
HW : W
T
W = I
_
, adjoint orbit
M (H) =|H|
F
= |M|
F
, thus
s(H)

i
h
2
ii
>

i
m
2
ii
=

i =j
h
2
ij
<

i =j
m
2
ij
s() attains maxima only at diagonal matrices in (H).
Interesting analysis (Brocketts dynamical systems, gradient
ow, critical points). Can be used to decide A or A
T
.
Note that it estimates the offnorm, and not the scaled
offnorm.
Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
New Jacobi
SVD
A or A
T
?
R or R
T
Trinagular Jacobi
SVD
Space-time locality:
Tiling
Driver routine
Underow issues
H or M?
Let H = UU
T
(spectral decomposition),
h
ii
=
n

j =1
[u
ij
[
2

j
, i = 1, . . . , n, and (1)
d(H) =
_
h
11
, . . . , h
nn
_
T
, (H) =
_

1
, . . . ,
n
_
T
.
d(H) = (U U)(H) = S(H), S = U U. (2)
S is orthostochastic. Thus, d(H) is majorised by (H)
(d(H) (H)) (Schur theorem).
Normalize (2) using the trace of H,
d(H)
Trace(H)
. .
d

(H)
= S
(H)
Trace(H)
. .

(H)
, (3)
then d

(H) and

(H) are two nite probability distributions


connected via the doubly stochastic matrix S.
Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
New Jacobi
SVD
A or A
T
?
R or R
T
Trinagular Jacobi
SVD
Space-time locality:
Tiling
Driver routine
Underow issues
A or A
T
d

(H) = S

(H), S 2stochastic
Thus, d

(H) has larger entropy than

(H). For a probability


distribution p =
_
p
1
, . . . , p
n
_
T
(p
i
0,

i
p
i
= 1) the entropy
of p is
(p) =
1
logn
n

i =1
p
i
logp
i
[0, 1] . (4)
Example: Let A be triang. QRF factor of the 4 4 Hilbert
matrix, H = A
T
A, M = AA
T
.

1
2.250e + 000 h
11
= 1.423e + 000

2
2.860e 002 h
22
= 4.636e 001

3
4.540e 005 h
33
= 2.413e 001

4
9.351e 009 h
44
= 1.506e 001.
m
11
= 2.235e +000 m
22
= 4.365e 002
m
33
= 1.302e 004 m
44
= 3.530e 008.
(H) 7.7e 001 > (M) 8.6e 002.
Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
New Jacobi
SVD
A or A
T
?
R or R
T
Trinagular Jacobi
SVD
Space-time locality:
Tiling
Driver routine
Underow issues
Entropy (Boltzman, von
Neumann, Shannon)
Entropy is a very, very, very special function.
If k 1, . . . , n, and q =

k
i =1
p
i
, then
(p
1
, . . . , p
n
) = (q, 1 q) + q
_
p
1
q
, . . . ,
p
k
q
_
+ (1 q)
_
p
k+1
1 q
, . . . ,
p
n
1 q
_
.
E(H) =
_
1 q
1
q
2

12

11

22
_
,
q
1
=

k
i =1
p
i
q
2
=

n
i =k+1
p
i

12
= (q
1
, q
2
)

11
= (
p
1
q
1
, . . . ,
p
k
q
1
),

22
= (
p
k+1
q
2
, . . . ,
p
n
q
2
)
(H) =
12
+q
1

11
+ q
2

22
. Can be used to guess block
structure (compare E(H), E(M)).
KullbackLeibler distance, relative entropy, Fisher metric, ...
Interpretations, applications, ... open
Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
New Jacobi
SVD
A or A
T
?
R or R
T
Trinagular Jacobi
SVD
Space-time locality:
Tiling
Driver routine
Underow issues
R or R
T
Once we choose A A or A
T
, and compute the
factorization
AP = Q
_
R
0
_
with blockpartitioned R
R =
_
R
[11]
R
[12]
0 R
[22]
_
where pivoting ensures
min
(R
[11]
) >
max
(R
[22]
).
Again, can choose to implicitly diagonalize R
T
R or RR
T
, i.e.
to use R or R
T
as input to a onesided Jacobi SVD.
Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
New Jacobi
SVD
A or A
T
?
R or R
T
Trinagular Jacobi
SVD
Space-time locality:
Tiling
Driver routine
Underow issues
R or R
T
H =
_
H
[11]
H
[12]
H
[21]
H
[22]
_
= R
T
R
=
_
R
T
[11]
R
[11]
R
T
[11]
R
[12]
R
T
[12]
R
[11]
R
T
[12]
R
[12]
+R
T
[22]
R
[22]
_
,
M =
_
M
[11]
M
[12]
M
[21]
M
[22]
_
=
_
R
[11]
R
T
[11]
+ R
[12]
R
T
[12]
R
[12]
R
T
[22]
R
[22]
R
T
[12]
R
[22]
R
T
[22]
_
Trace(M
[11]
) = Trace(H
[11]
) +|R
[12]
|
2
F
,
Trace(M
[22]
) = Trace(H
[22]
) |R
[12]
|
2
F
.
|M
[21]
|
F

max
(R
[22]
)

min
(R
[11]
)
|H
[21]
|
F
Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
New Jacobi
SVD
A or A
T
?
R or R
T
Trinagular Jacobi
SVD
Space-time locality:
Tiling
Driver routine
Underow issues
M versus H
0 50 100
10
2
10
1
10
0
10
1
10
2
0
20
40
60
80
100
0
50
100
0
0.2
0.4
0.6
0.8
1
0
20
40
60
80
100
0
50
100
0
0.2
0.4
0.6
0.8
1
Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
New Jacobi
SVD
A or A
T
?
R or R
T
Trinagular Jacobi
SVD
Space-time locality:
Tiling
Driver routine
Underow issues
M versus H
0 20 40 60
10
1
10
0
10
1
10
2
10
3
10
4
10
5
0
50
0
50
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0
50
0
50
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
New Jacobi
SVD
A or A
T
?
R or R
T
Trinagular Jacobi
SVD
Space-time locality:
Tiling
Driver routine
Underow issues
Quasidenite connection
In case of successful separation
min
(M
[11]
) >
max
(M
[22]
)

M I quasi-denite.
If U =
_
U
[11]
U
[12]
U
[21]
U
[22]
_
is the eigenvector matrix of M, then
U
T
[11]
U
[11]
~ U
T
[21]
U
[21]
U
T
[22]
U
[22]
~ U
T
[12]
U
[12]
(Lwner p.o.)

min
(U
[ii ]
) > 1/

2, U
1
[11]
U
[12]
, U
1
[22]
U
[21]
small.
R =
_
R
[11]
R
[12]
0 R
[22]
_
=
_
U
[11]
U
[12]
U
[21]
U
[22]
_
V
T
Prefer to have U as product of Jacobi rotations. Use
onesided Jacobi SVD on R
T
.
Need for Jacobi SVD tailored for lower triangular structured
matrices.
Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
New Jacobi
SVD
A or A
T
?
R or R
T
Trinagular Jacobi
SVD
Space-time locality:
Tiling
Driver routine
Underow issues
Triangular Jacobi SVD
Consider XV = U, with lower triangular X.
X =
_

_
o o o o o o o
o o o o o o
o o o o o
o o o o
o o o
o o
o

_

_
= [X
1
, X
2
, X
3
, X
4
]
Rotating in column spaces of X
4
, X
3
, [X
3
, X
4
], X
2
is efcient,
because of zero blocks. Less ops. Reduced memory
trafc. Moreover, rotating respectively, in spaces of X
1
, X
2
,
X
1
X
2
, X
1
, X
2
, X
3
, X
4
, X
3
X
4
, X
3
, X
4
, and nally in
[X
1
, X
2
] [X
3
, X
4
] has CUBIC convergence (Rhee, Hari).
Again, preprocessing + preconditioning in one. Good for
one sweep only. X becomes dense.
Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
New Jacobi
SVD
A or A
T
?
R or R
T
Trinagular Jacobi
SVD
Space-time locality:
Tiling
Driver routine
Underow issues
Almost banded Jacobi SVD
Now, X is dense, but X
s
is close to orthogonal and
X
T
s
X
s
= I + E, where the largest entries in [E[ are close to
diagonal. Must dynamically adapt to such an X.
X
T
X =
_

_
+ x x x x
+ x x
x x x
+ x
+ +
+
+

_
+ : rotated; : rotation skipped after dot product test;
x : dot product test skipped
Many unnecessary dot products saved in this way.
Dynamical switch to full rowcyclic strategy to ensure
(check, conrm) convergence.
Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
New Jacobi
SVD
A or A
T
?
R or R
T
Trinagular Jacobi
SVD
Space-time locality:
Tiling
Driver routine
Underow issues
0
50
0
20
40
60
20
0
20
0
20
40
60
0
20
40
60
20
10
0
0
20
40
60
0
20
40
60
20
10
0
0
20
40
60
0
20
40
60
20
10
0
Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
New Jacobi
SVD
A or A
T
?
R or R
T
Trinagular Jacobi
SVD
Space-time locality:
Tiling
Driver routine
Underow issues
Tiling
Tiled rowcyclic strategy with tile size b.
Simplied description of one full sweep
N
bl
= n/b|
for r = 1 to N
bl
i = (r 1) b +1
for d = 0 to k [Do the blocks (r , r ), . . . , (r + k, r +k)]
i = i + d b
for p = i to mini + b 1, n
for q = p +1 to mini +b 1, n
rotate pivot pair (p, q)
i = (r 1) b +1
for c = r + 1 to N
bl
j = (c 1) b + 1
for p = i to mini + b 1, n
for q = j to minj +b 1, n
rotate pivot pair (p, q)
Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
New Jacobi
SVD
A or A
T
?
R or R
T
Trinagular Jacobi
SVD
Space-time locality:
Tiling
Driver routine
Underow issues
Last sweep: convergence
Convergence:
max
i =j
[X
T
X[
ij
_
(X
T
X)
ii
(X
T
X)
jj
nu
Can dynamically, with some certainty, predict last, empty
sweep, i.e. n(n 1)/2 dot products not followed by
rotations. In that case use BLAS 3 (SSYRK) to compute
upper triangle of H = X
T
X and check convergence (5 or 10
times faster than BLAS 1 SDOTs).
Then inspect H and, if necessary, apply rotations to X.
Keep track of parallel rotations. If rotated in one sweep,
repeat this once more and check convergence. If
nonconvergence (a rare event), return control to
rowcycling on X, and do this again.
All controls and switches are dynamical, based on theory.
Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
New Jacobi
SVD
A or A
T
?
R or R
T
Trinagular Jacobi
SVD
Space-time locality:
Tiling
Driver routine
Underow issues
Jacobi ++ (Drma c, Veseli c 06.)
(A)P = Q
_
R
0
_
; = rank(R) (A = D
1
BD
2
)
R(1 : , 1 : n)
T
= Q
1
_
R
1
0
_
;
X = R
T
1
=
_
0

_
; X
T
X I quasidenite
X

U
x
= XJ
1
J
2
J

)
. .
V
x
V
x
= R
T
1
(X

)
U =
T
Q
_
U
x
0
0 I
m
_
; V = PQ
1
_
V
x
0
0 I
n
_
if = n, Q
1
V
x
= R
1
X

Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
New Jacobi
SVD
A or A
T
?
R or R
T
Trinagular Jacobi
SVD
Space-time locality:
Tiling
Driver routine
Underow issues
Jacobi ++ (Drma c, Veseli c 06.)
(A)P = Q
_
R
0
_
; = rank(R) (A = D
1
BD
2
)
R(1 : , 1 : n)
T
= Q
1
_
R
1
0
_
;
X = R
T
1
=
_
0

_
; X
T
X I quasidenite
X

U
x
= XJ
1
J
2
J

)
. .
V
x
V
x
= R
T
1
(X

)
U =
T
Q
_
U
x
0
0 I
m
_
; V = PQ
1
_
V
x
0
0 I
n
_
if = n, Q
1
V
x
= R
1
X

Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
New Jacobi
SVD
A or A
T
?
R or R
T
Trinagular Jacobi
SVD
Space-time locality:
Tiling
Driver routine
Underow issues
Jacobi ++ (Drma c, Veseli c 06.)
(A)P = Q
_
R
0
_
; = rank(R) (A = D
1
BD
2
)
R(1 : , 1 : n)
T
= Q
1
_
R
1
0
_
;
X = R
T
1
=
_
0

_
; X
T
X I quasidenite
X

U
x
= XJ
1
J
2
J

)
. .
V
x
V
x
= R
T
1
(X

)
U =
T
Q
_
U
x
0
0 I
m
_
; V = PQ
1
_
V
x
0
0 I
n
_
if = n, Q
1
V
x
= R
1
X

Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
New Jacobi
SVD
A or A
T
?
R or R
T
Trinagular Jacobi
SVD
Space-time locality:
Tiling
Driver routine
Underow issues
Jacobi ++ (Drma c, Veseli c 06.)
(A)P = Q
_
R
0
_
; = rank(R) (A = D
1
BD
2
)
R(1 : , 1 : n)
T
= Q
1
_
R
1
0
_
;
X = R
T
1
=
_
0

_
; X
T
X I quasidenite
X

U
x
= XJ
1
J
2
J

)
. .
V
x
V
x
= R
T
1
(X

)
U =
T
Q
_
U
x
0
0 I
m
_
; V = PQ
1
_
V
x
0
0 I
n
_
if = n, Q
1
V
x
= R
1
X

Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
New Jacobi
SVD
A or A
T
?
R or R
T
Trinagular Jacobi
SVD
Space-time locality:
Tiling
Driver routine
Underow issues
Jacobi ++ (Drma c, Veseli c 06.)
(A)P = Q
_
R
0
_
; = rank(R) (A = D
1
BD
2
)
R(1 : , 1 : n)
T
= Q
1
_
R
1
0
_
;
X = R
T
1
=
_
0

_
; X
T
X I quasidenite
X

U
x
= XJ
1
J
2
J

)
. .
V
x
V
x
= R
T
1
(X

)
U =
T
Q
_
U
x
0
0 I
m
_
; V = PQ
1
_
V
x
0
0 I
n
_
if = n, Q
1
V
x
= R
1
X

Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
New Jacobi
SVD
A or A
T
?
R or R
T
Trinagular Jacobi
SVD
Space-time locality:
Tiling
Driver routine
Underow issues
Jacobi ++ (Drma c, Veseli c 06.)
(A)P = Q
_
R
0
_
; = rank(R) (A = D
1
BD
2
)
R(1 : , 1 : n)
T
= Q
1
_
R
1
0
_
;
X = R
T
1
=
_
0

_
; X
T
X I quasidenite
X

U
x
= XJ
1
J
2
J

)
. .
V
x
V
x
= R
T
1
(X

)
U =
T
Q
_
U
x
0
0 I
m
_
; V = PQ
1
_
V
x
0
0 I
n
_
if = n, Q
1
V
x
= R
1
X

Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
New Jacobi
SVD
A or A
T
?
R or R
T
Trinagular Jacobi
SVD
Space-time locality:
Tiling
Driver routine
Underow issues
Jacobi ++ (Drma c, Veseli c 06.)
(A)P = Q
_
R
0
_
; = rank(R) (A = D
1
BD
2
)
R(1 : , 1 : n)
T
= Q
1
_
R
1
0
_
;
X = R
T
1
=
_
0

_
; X
T
X I quasidenite
X

U
x
= XJ
1
J
2
J

)
. .
V
x
V
x
= R
T
1
(X

)
U =
T
Q
_
U
x
0
0 I
m
_
; V = PQ
1
_
V
x
0
0 I
n
_
if = n, Q
1
V
x
= R
1
X

Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
New Jacobi
SVD
A or A
T
?
R or R
T
Trinagular Jacobi
SVD
Space-time locality:
Tiling
Driver routine
Underow issues
Jacobi ++ (Drma c, Veseli c 06.)
(A)P = Q
_
R
0
_
; = rank(R) (A = D
1
BD
2
)
R(1 : , 1 : n)
T
= Q
1
_
R
1
0
_
;
X = R
T
1
=
_
0

_
; X
T
X I quasidenite
X

U
x
= XJ
1
J
2
J

)
. .
V
x
V
x
= R
T
1
(X

)
U =
T
Q
_
U
x
0
0 I
m
_
; V = PQ
1
_
V
x
0
0 I
n
_
if = n, Q
1
V
x
= R
1
X

Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
New Jacobi
SVD
A or A
T
?
R or R
T
Trinagular Jacobi
SVD
Space-time locality:
Tiling
Driver routine
Underow issues
Jacobi ++ (Drma c, Veseli c 06.)
(A)P = Q
_
R
0
_
; = rank(R) (A = D
1
BD
2
)
R(1 : , 1 : n)
T
= Q
1
_
R
1
0
_
;
X = R
T
1
=
_
0

_
; X
T
X I quasidenite
X

U
x
= XJ
1
J
2
J

)
. .
V
x
V
x
= R
T
1
(X

)
U =
T
Q
_
U
x
0
0 I
m
_
; V = PQ
1
_
V
x
0
0 I
n
_
if = n, Q
1
V
x
= R
1
X

Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
New Jacobi
SVD
A or A
T
?
R or R
T
Trinagular Jacobi
SVD
Space-time locality:
Tiling
Driver routine
Underow issues
Where is V?
Want more, i.e. less computation for the same accuracy. Let
det(R) ,= 0, R
T
V = U. In Jacobi SVD U is the limit
matrix, and V is the accumulated product of Jacobi
rotations. Well, let us compute V a posteriori, without
accumulating rotations. This means that
R
T
V = U
is considered as matrix equation for V.
General problem: If XV = U is the Jacobi SVD, what kind
of X allows a posteriori computation of V, once we have U
and ?
Since U is orthogonal, U
T
X = V
T
, and
V = X
T
U
1
. Bad. Computed X
T

1
not
numerically orthogonal. Known problems.
Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
New Jacobi
SVD
A or A
T
?
R or R
T
Trinagular Jacobi
SVD
Space-time locality:
Tiling
Driver routine
Underow issues
A posteriori computation of V
If

V = computed(X
T

1
) then the quality of

V
depends on (X). Undesirable fact.
Strange way: V = X
1
U. Good. WHY?
Error analysis of Jacobi:

= (X + X)

V (=

U

),

V
T

V = I. Then
X
1

= (I +X
1
X)

V,
where (error analysis)
[X
1
X[ [X
1
[ [X[ [1/n]
nn
.
(X) = | [X
1
[ [X[ | does not depend on row scaling of X.
(X) = (X) for any diagonal . If X is triangular and

V = computed(X
1

)
then |

V(:, i )

V(:, i )| f (n) u (X) for all i .
Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
New Jacobi
SVD
A or A
T
?
R or R
T
Trinagular Jacobi
SVD
Space-time locality:
Tiling
Driver routine
Underow issues
Options in Step 1.
Let A

Q
_

R
0
_
be computed. Then
(A + A) =

Q
_

R
0
_
,
|A|
|A|
, |

Q

Q|
Let

R =
_

R
[11]

R
[12]
0

R
[22]
_
, [

r
k+1,k+1
[ [

r
11
[
Then |

R
[22]
|

n k
max
(

R).

R
[22]
0.

Q
_

R
[11]

R
[12]
0 0
_
= A + A

Q
_
0 0
0

R
[22]
_
Second QR returns k k matrix. Computed
[
i

i
[


1
, i = 1 : n, (
k+1
= .. =
n
= 0).
Can also test [

r
k+1,k+1
[ [

r
kk
[ to get [
i

i
[


k
.
Very efcient for low numerical rank k and standard
accuracy.
Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
New Jacobi
SVD
A or A
T
?
R or R
T
Trinagular Jacobi
SVD
Space-time locality:
Tiling
Driver routine
Underow issues
CAVEAT: underow and
overow
Overow problem:
max
> =overow.
Solution: A c A, max
i
|A(:, i )| < /

n.
Underow problems:
FATAL: nonconvergence, loss of accuracy
annoying: sloooow computation with denormals which
are as good as zeros, hardware and OS dependent
problems
easily removable by scaling (A c A)
NOT removable by scaling
Treating underow problems is more tricky
Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
New Jacobi
SVD
A or A
T
?
R or R
T
Trinagular Jacobi
SVD
Space-time locality:
Tiling
Driver routine
Underow issues
Rotation without squaring
G =
_
a c
c b
_
=
_
x
T
y
T
_
_
x y
_
, x, y R
m
.
Let |x|
2
> |y|
2
. Jacobi rotation is dened by
J =
1

1 + t
2
_
1 t
t 1
_
, where
t =
sign()
[[ +
_
1 +
2
, and =
b a
2c
.
=
_
b
a

_
a
b
2
c

ab
=
|y|
2
|x|
2

|x|
2
|y|
2
2cos (x, y)
What if |x|
2
/|y|
2
= Inf? = Inf?
2
= Inf?
What if computed(J) = I
2
independent of cos (x, y)?
What if t is denormalized?
Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
New Jacobi
SVD
A or A
T
?
R or R
T
Trinagular Jacobi
SVD
Space-time locality:
Tiling
Driver routine
Underow issues
Jacobi, Gau , Gram-Schmidt
G =
_
a c
c b
_
=
_
x
T
y
T
_
_
x y
_
, x, y R
m
.
Let |x|
2
> |y|
2
/

u. Then
cot 2 =
|y|
|x|

|x|
|y|
2
y
T
x
|x||y|

|x|
|y|
2
y
T
x
|x||y|
, [ cot 2[
1
2

u
tan
1
2
1
cot 2

y
T
x
|x|
2
=
y
T
x
|x||y|
|y|
|x|
,
[ tan[

u, cos 1, and y goes to


y

= y +x tan y x
y
T
x
|x|
2
x
y


_
y
|y|

_
y
T
x
|x||y|
__
x
|x|
__
|y|.
Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
New Jacobi
SVD
A or A
T
?
R or R
T
Trinagular Jacobi
SVD
Space-time locality:
Tiling
Driver routine
Underow issues
SMALL details: Underows
Some underows are just annoying. Example: small
denormalized summands in dot product with other
summands eg > 1. Example: QRF(sparse A)
Problem in Jacobi: If A = BD with illconditioned D, Jacobi
very accurate and converges swiftly (low number of dots
and rots). It should be much faster than QR, but it is much
slower. Frustrating.
X =
_
_
_
_
_
_
_
_
_
_
_
_
0 0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0
0 0 0 0
0 0 0
0 0
0

_
_
_
_
_
_
_
_
_
_
_
_
Scaling doesnt help. Set to zero doesnt help.
Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
New Jacobi
SVD
A or A
T
?
R or R
T
Trinagular Jacobi
SVD
Space-time locality:
Tiling
Driver routine
Underow issues
SMALL details: Underow
problems
X X + X

articial perturbation:
applied only once, simply and efciently
big enough to remove denormalized numbers
destroys all zeros, thus destroying initial triangular
structure
so small and structured that it does not harm high relative
accuracy properties and mechanisms
so small and structured that it does not interfere with the
convergence
so small and structured that it can be ignored at any time
we nd it convenient to do so
Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
New Jacobi
SVD
A or A
T
?
R or R
T
Trinagular Jacobi
SVD
Space-time locality:
Tiling
Driver routine
Underow issues
SMALL details: Underow
problems
X X + X

articial perturbation
(X

)
ij
= X
ij
+ sign(X
ij
) min[X
ii
[, [X
jj
[,
(X

)
ji
= sign(X
ij
) min[X
ii
[, [X
jj
[ i > j .

X =
_

X
11

X
12

X
21

X
22
_
=
_

X
11
0

X
21

X
22
_
+
_
0

X
12
0 0
_
.
[(X

)
ij
[
|X(i , :)|

[X
jj
[
|X(i , :)|
(since [X
jj
[ [X
ii
[)

=
_

X
11
0

X
21

X
22
+

X
22
__
I 0
0

W
_
+
_
0

X
12
0 0
_
=
_

X
11

X
12

W
T

X
21

X
22
+

X
22
__
I 0
0

W
_
|

X
22
(i , :)|
J
|

X
22
(i , :)|
Jacobi method
Zlatko Drma c
Jacobi method
versus
bidiagonal
SVDs
New Jacobi
SVD
A or A
T
?
R or R
T
Trinagular Jacobi
SVD
Space-time locality:
Tiling
Driver routine
Underow issues
Conclusion and future work
Z.D., K. Veseli c, V. Hari
Current code outperforms SGESVD for full SVD.
Comparable with SGESDD?
Better (coming soon)
Rotation part will be with block rotations
Aggressive implementation for standard absolute
accuracy and low numranks
The new code will be in the next release of LAPACK.
Better new RRQR = better Jacobi SVD.
ScaLAPACK. New design required. Numerical analysis
clear and clean with any pivoting and blocking. Efcient
parallel pivot strategy open.

You might also like