
Computational Multilinear Algebra

Charles F. Van Loan

Supported in part by the NSF contract CCR-9901988.


Involves Working With Kronecker Products

$$
\begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}
\otimes
\begin{bmatrix} c_{11} & c_{12} & c_{13} \\ c_{21} & c_{22} & c_{23} \\ c_{31} & c_{32} & c_{33} \end{bmatrix}
=
\begin{bmatrix}
b_{11}c_{11} & b_{11}c_{12} & b_{11}c_{13} & b_{12}c_{11} & b_{12}c_{12} & b_{12}c_{13} \\
b_{11}c_{21} & b_{11}c_{22} & b_{11}c_{23} & b_{12}c_{21} & b_{12}c_{22} & b_{12}c_{23} \\
b_{11}c_{31} & b_{11}c_{32} & b_{11}c_{33} & b_{12}c_{31} & b_{12}c_{32} & b_{12}c_{33} \\
b_{21}c_{11} & b_{21}c_{12} & b_{21}c_{13} & b_{22}c_{11} & b_{22}c_{12} & b_{22}c_{13} \\
b_{21}c_{21} & b_{21}c_{22} & b_{21}c_{23} & b_{22}c_{21} & b_{22}c_{22} & b_{22}c_{23} \\
b_{21}c_{31} & b_{21}c_{32} & b_{21}c_{33} & b_{22}c_{31} & b_{22}c_{32} & b_{22}c_{33}
\end{bmatrix}
$$

“Replicated Block Structure”
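A quick numerical illustration of this block structure (a minimal MATLAB/Octave sketch; the particular B and C are arbitrary examples):

B = [1 2; 3 4];
C = magic(3);                        % any 3-by-3 matrix
A = kron(B, C);                      % 6-by-6, with (i,j) block equal to b_ij * C
disp(norm(A(1:3, 4:6) - B(1,2)*C))   % the (1,2) block is b12*C, so this is 0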


Properties

Quite predictable:

(B ⊗ C)^T = B^T ⊗ C^T

(B ⊗ C)^{-1} = B^{-1} ⊗ C^{-1}

(B ⊗ C)(D ⊗ F) = BD ⊗ CF

B ⊗ (C ⊗ D) = (B ⊗ C) ⊗ D

Think twice:
B ⊗ C = C ⊗ B ?  Not in general; they agree only up to a perfect shuffle permutation:

B ⊗ C = (Perfect Shuffle)(C ⊗ B)(Perfect Shuffle)T
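All of these are easy to spot-check numerically; a small MATLAB/Octave sketch with random square matrices:

B = randn(3); C = randn(4); D = randn(3); F = randn(4);
norm(kron(B, C)' - kron(B', C'))               % transpose rule, ~ 0
norm(inv(kron(B, C)) - kron(inv(B), inv(C)))   % inverse rule, ~ 0
norm(kron(B, C)*kron(D, F) - kron(B*D, C*F))   % mixed product rule, ~ 0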


The Perfect Shuffle Sp,q
$$
S_{3,4}
\begin{bmatrix} 0\\ 1\\ 2\\ 3\\ 4\\ 5\\ 6\\ 7\\ 8\\ 9\\ 10\\ 11 \end{bmatrix}
=
\begin{bmatrix} 0\\ 4\\ 8\\ 1\\ 5\\ 9\\ 2\\ 6\\ 10\\ 3\\ 7\\ 11 \end{bmatrix}
\qquad\Longleftrightarrow\qquad
\begin{bmatrix} 0 & 4 & 8\\ 1 & 5 & 9\\ 2 & 6 & 10\\ 3 & 7 & 11 \end{bmatrix}
\;\longrightarrow\;
\begin{bmatrix} 0 & 1 & 2 & 3\\ 4 & 5 & 6 & 7\\ 8 & 9 & 10 & 11 \end{bmatrix}
$$

(Column-stacking the 4-by-3 matrix on the left gives the input vector; column-stacking its transpose gives the shuffled vector.)

Takes the length-pq “card deck” x, splits it into p piles of length-q each, and
then takes one card from each pile in turn until the deck is reassembled.
C ⊗ B = S_{m1,m2} (B ⊗ C) S_{n1,n2}^T,    B ∈ IR^{m1×n1},  C ∈ IR^{m2×n2}

E.g., with B ∈ IR^{2×2} and C ∈ IR^{2×3}:

$$
A = B\otimes C =
\begin{bmatrix}
b_{11}c_{11} & b_{11}c_{12} & b_{11}c_{13} & b_{12}c_{11} & b_{12}c_{12} & b_{12}c_{13}\\
b_{11}c_{21} & b_{11}c_{22} & b_{11}c_{23} & b_{12}c_{21} & b_{12}c_{22} & b_{12}c_{23}\\
b_{21}c_{11} & b_{21}c_{12} & b_{21}c_{13} & b_{22}c_{11} & b_{22}c_{12} & b_{22}c_{13}\\
b_{21}c_{21} & b_{21}c_{22} & b_{21}c_{23} & b_{22}c_{21} & b_{22}c_{22} & b_{22}c_{23}
\end{bmatrix}
$$

$$
S_{2,2}\,A\,S_{2,3}^{T} = C\otimes B =
\begin{bmatrix}
c_{11}b_{11} & c_{11}b_{12} & c_{12}b_{11} & c_{12}b_{12} & c_{13}b_{11} & c_{13}b_{12}\\
c_{11}b_{21} & c_{11}b_{22} & c_{12}b_{21} & c_{12}b_{22} & c_{13}b_{21} & c_{13}b_{22}\\
c_{21}b_{11} & c_{21}b_{12} & c_{22}b_{11} & c_{22}b_{12} & c_{23}b_{11} & c_{23}b_{12}\\
c_{21}b_{21} & c_{21}b_{22} & c_{22}b_{21} & c_{22}b_{22} & c_{23}b_{21} & c_{23}b_{22}
\end{bmatrix}
$$
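A sketch of how S_{p,q} can be built in MATLAB/Octave (as a row permutation of the identity, following the card-deck description), together with a check of the relation above:

rowsel = @(M, r) M(r, :);
Spq = @(p, q) rowsel(eye(p*q), reshape(reshape(1:p*q, q, p)', [], 1));

x = (0:11)';
disp((Spq(3, 4)*x)')                 % 0 4 8 1 5 9 2 6 10 3 7 11

m1 = 2; n1 = 2; m2 = 2; n2 = 3;
B = randn(m1, n1);  C = randn(m2, n2);
norm(kron(C, B) - Spq(m1, m2)*kron(B, C)*Spq(n1, n2)')   % C ⊗ B relation, ~ 0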
Inheriting Structure

If B and C are ...             then B ⊗ C is ...

nonsingular                    nonsingular
lower (upper) triangular       lower (upper) triangular
banded                         block banded
symmetric                      symmetric
positive definite              positive definite
stochastic                     stochastic
Toeplitz                       block Toeplitz
permutations                   a permutation
orthogonal                     orthogonal

Factorizations of B ⊗ C

Obvious...
B ⊗ C = (P_B^T L_B U_B) ⊗ (P_C^T L_C U_C) = (P_B ⊗ P_C)^T (L_B ⊗ L_C)(U_B ⊗ U_C)

B ⊗ C = (G_B G_B^T) ⊗ (G_C G_C^T) = (G_B ⊗ G_C)(G_B ⊗ G_C)^T

B ⊗ C = (Q_B R_B) ⊗ (Q_C R_C) = (Q_B ⊗ Q_C)(R_B ⊗ R_C)

Sort of...

B ⊗ C = (Q_B Λ_B Q_B^T) ⊗ (Q_C Λ_C Q_C^T) = (Q_B ⊗ Q_C)(Λ_B ⊗ Λ_C)(Q_B ⊗ Q_C)^T

B ⊗ C = (U_B Σ_B V_B^T) ⊗ (U_C Σ_C V_C^T) = (U_B ⊗ U_C)(Σ_B ⊗ Σ_C)(V_B ⊗ V_C)^T

Problematic...

B ⊗ C = (Q_B R_B P_B^T) ⊗ (Q_C R_C P_C^T) = (Q_B ⊗ Q_C)(R_B ⊗ R_C)(P_B ⊗ P_C)^T


“Sort of ”

The Kronecker product of two rectangular “diagonal” matrices (such as Σ_B ⊗ Σ_C above) is diagonal only up to a row permutation:

$$
\begin{bmatrix} \times & 0\\ 0 & \times\\ 0 & 0 \end{bmatrix}
\otimes
\begin{bmatrix} \times & 0 & 0\\ 0 & \times & 0\\ 0 & 0 & \times\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{bmatrix}
=
\begin{bmatrix}
\times & 0 & 0 & 0 & 0 & 0\\
0 & \times & 0 & 0 & 0 & 0\\
0 & 0 & \times & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & \times & 0 & 0\\
0 & 0 & 0 & 0 & \times & 0\\
0 & 0 & 0 & 0 & 0 & \times\\
0 & 0 & 0 & 0 & 0 & 0\\
\vdots & & & & & \vdots\\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
$$

but

$$
P\left(
\begin{bmatrix} \times & 0\\ 0 & \times\\ 0 & 0 \end{bmatrix}
\otimes
\begin{bmatrix} \times & 0 & 0\\ 0 & \times & 0\\ 0 & 0 & \times\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{bmatrix}
\right)
=
\begin{bmatrix}
\times & 0 & 0 & 0 & 0 & 0\\
0 & \times & 0 & 0 & 0 & 0\\
0 & 0 & \times & 0 & 0 & 0\\
0 & 0 & 0 & \times & 0 & 0\\
0 & 0 & 0 & 0 & \times & 0\\
0 & 0 & 0 & 0 & 0 & \times\\
0 & 0 & 0 & 0 & 0 & 0\\
\vdots & & & & & \vdots\\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
$$

for a suitable row permutation P.
“Fast” Factoring

Suppose B ∈ IR^{m×m} and C ∈ IR^{n×n} are positive definite.

A = B ⊗ C is an mn-by-mn symmetric positive definite matrix.

Ordinarily, a Cholesky factorization G G^T of this size would cost O(m^3 n^3) flops.

But G_A = G_B ⊗ G_C, and computing G_B and G_C requires only O(m^3 + n^3) flops.
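A minimal MATLAB/Octave check of this, for small random positive definite B and C (chol(·,'lower') supplies the Cholesky factors):

m = 4;  n = 3;
B = randn(m);  B = B*B' + m*eye(m);        % symmetric positive definite examples
C = randn(n);  C = C*C' + n*eye(n);
GB = chol(B, 'lower');  GC = chol(C, 'lower');
GA = chol(kron(B, C), 'lower');
norm(GA - kron(GB, GC))                    % GA = GB ⊗ GC, up to roundoff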


“Fast” Solving

If B, C ∈ IR^{m×m}, then the m^2-by-m^2 system

(B ⊗ C)x = f    ≡    C X B^T = F,    x = vec(X),  f = vec(F)

can be solved in O(m^3) flops:

C Y = F

X B^T = Y

via factorizations of B and C.
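In MATLAB/Octave the two small solves can be sketched with backslash standing in for the factorizations of B and C:

m = 5;
B = randn(m) + m*eye(m);  C = randn(m) + m*eye(m);   % nonsingular examples
F = randn(m);  f = F(:);                             % f = vec(F)
Y = C \ F;                                           % solve C Y = F
X = Y / B';                                          % solve X B' = Y
x = X(:);
norm(kron(B, C)*x - f)                               % ~ 0 up to roundoff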


The vec Operation

$$
X \in {\rm I\!R}^{m\times n} \;\Rightarrow\; \mathrm{vec}(X) =
\begin{bmatrix} X(:,1)\\ X(:,2)\\ \vdots\\ X(:,n) \end{bmatrix}
$$

Y = CXB T ⇒ vec(Y ) = (B ⊗ C)vec(X)


Matrix Equations

Sylvester:

F X + X G^T = C    ≡    (I_n ⊗ F + G ⊗ I_m) vec(X) = vec(C)

Lyapunov:

F X + X F^T = C    ≡    (I_n ⊗ F + F ⊗ I_n) vec(X) = vec(C)

Generalized Sylvester:

F_1 X G_1^T + F_2 X G_2^T = C    ≡    (G_1 ⊗ F_1 + G_2 ⊗ F_2) vec(X) = vec(C)

These can be solved in O(n3) time via the (generalized) Schur decomposition.
See Bartels and Stewart, Golub, Nash, and Van Loan, and J. Gardiner, M.R.
Wette, A.J. Laub, J.J. Amato, and C.B. Moler.
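A small MATLAB/Octave illustration of the Sylvester equivalence; forming the Kronecker matrix explicitly is only sensible for tiny m and n — the O(n^3) route is through the Schur-based algorithms cited above:

m = 4;  n = 3;
F = randn(m);  G = randn(n);  C = randn(m, n);
K = kron(eye(n), F) + kron(G, eye(m));     % I_n ⊗ F + G ⊗ I_m
x = K \ C(:);                              % vec(X)
X = reshape(x, m, n);
norm(F*X + X*G' - C)                       % ~ 0 up to roundoff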
Talk Outline

How do we solve

(1) min ‖W((B ⊗ C)x − d)‖    (weighted least squares)

(2) U^T [B ⊗ C | d] V = Σ    (total least squares)

(3) (A_p ⊗ · · · ⊗ A_1 − λI)x = b    (shifted linear systems)

given that these problems

(1′) min ‖(B ⊗ C)x − d‖

(2′) U^T (B ⊗ C)V = Σ

(3′) (A_p ⊗ · · · ⊗ A_1)x = b

are easy?
The Rise of the Kronecker Product

• Thriving Application Areas

• Sparse Factorizations for fast linear transforms

• Tensoring Low-Dimension Techniques


Thriving application areas such as Semidefinite
Programming...

Some sample problems...

( X ⊗ X + A^T D A ) u = f.

$$
\begin{bmatrix} 0 & A^T & I\\ A & 0 & 0\\ Z\otimes I & 0 & X\otimes I \end{bmatrix}
\begin{bmatrix} \Delta x\\ \Delta y\\ \Delta z \end{bmatrix}
=
\begin{bmatrix} r_d\\ r_p\\ r_c \end{bmatrix}.
$$

See Alizadeh, Haeberly, and Overton (1998).


Symmetric Kronecker Products

For symmetric X ∈ IRn×n and arbitrary B, C ∈ IRn×n this operation is defined


by

(B ⊗ C) svec(X) = svec( (C X B^T + B X C^T) / 2 )

where the “svec” operation is a normalized stacking of the lower-triangular portion of X’s columns, e.g.,

$$
X = \begin{bmatrix} x_{11} & x_{12} & x_{13}\\ x_{21} & x_{22} & x_{23}\\ x_{31} & x_{32} & x_{33} \end{bmatrix}
\;\Rightarrow\;
\mathrm{svec}(X) = \begin{bmatrix} x_{11} & \sqrt{2}\,x_{21} & \sqrt{2}\,x_{31} & x_{22} & \sqrt{2}\,x_{32} & x_{33} \end{bmatrix}^{T}.
$$

svec stacks X’s columns from the diagonal down, scaling the off-diagonal entries by √2.
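A sketch of svec (and of the symmetric Kronecker action) in MATLAB/Octave; the ordering and √2 scaling follow the example above, and the last line checks that svec preserves the trace inner product:

n = 3;
X = randn(n);  X = (X + X')/2;                      % symmetric test matrix
B = randn(n);  C = randn(n);

mask = tril(true(n));
scal = sqrt(2)*ones(n) - (sqrt(2) - 1)*eye(n);      % weights: 1 on the diagonal, sqrt(2) off it
sv = @(S) S(mask).*scal(mask);                      % svec: stack the lower triangle, column by column

disp(sv((C*X*B' + B*X*C')/2)')                      % the symmetric Kronecker action on svec(X), by the definition above

Y = randn(n);  Y = (Y + Y')/2;
disp(abs(trace(X*Y) - sv(X)'*sv(Y)))                % svec preserves the inner product, ~ 0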


The Rise of the Kronecker Product Cont’d

Sparse Factorizations

Kronecker products are proving to be a very effective way to look at fast


linear transforms.
A Sparse Factorization of the DFT Matrix

n = 2^t

F_n = A_t · · · A_1 P_n

P_n = S_{2,n/2} (I_2 ⊗ S_{2,n/4}) · · · (I_{n/4} ⊗ S_{2,2})

$$
A_q = I_r \otimes \begin{bmatrix} I_{L/2} & \Omega_{L/2}\\ I_{L/2} & -\Omega_{L/2} \end{bmatrix},
\qquad L = 2^q,\; r = n/L
$$

Ω_{L/2} = diag(1, ω_L, . . . , ω_L^{L/2−1}),    ω_L = exp(−2πi/L)


Different FFTs Correspond to Different
Factorizations of Fn

The Cooley-Tukey FFT is based on y = F_n x = A_t · · · A_1 P_n x:

x ← P_n x
for k = 1:t
    x ← A_k x
end
y ← x

The Gentleman-Sande FFT is based on y = F_n x = F_n^T x = P_n^T A_1^T · · · A_t^T x:

for k = t: −1:1
    x ← A_k^T x
end
y ← P_n^T x
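A direct MATLAB/Octave check of this factorization for n = 16 (a sketch only: the factors are formed as explicit matrices for illustration, which a real FFT never does):

n = 16;  t = log2(n);
rowsel = @(M, r) M(r, :);
Spq = @(p, q) rowsel(eye(p*q), reshape(reshape(1:p*q, q, p)', [], 1));   % perfect shuffle

P = eye(n);                              % Pn = S_{2,n/2}(I2 ⊗ S_{2,n/4}) ... (I_{n/4} ⊗ S_{2,2})
for k = 1:t-1
    P = P * kron(eye(2^(k-1)), Spq(2, n/2^k));
end

M = P;                                   % accumulate At ... A1 Pn
for q = 1:t
    L = 2^q;  r = n/L;  w = exp(-2*pi*1i/L);
    Omega = diag(w.^(0:L/2-1));
    M = kron(eye(r), [eye(L/2) Omega; eye(L/2) -Omega]) * M;
end

norm(M - fft(eye(n)))                    % agrees with the DFT matrix up to roundoff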
Matrix Transpose

If A ∈ IR^{m×n} and B = A^T, then vec(B) = S_{n,m} · vec(A). For example, with m = 2 and n = 3:

$$
\begin{bmatrix} a_{11}\\ a_{12}\\ a_{13}\\ a_{21}\\ a_{22}\\ a_{23} \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 1 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 0\\
0 & 1 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 1 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} a_{11}\\ a_{21}\\ a_{12}\\ a_{22}\\ a_{13}\\ a_{23} \end{bmatrix},
\qquad
\begin{bmatrix} a_{11} & a_{21}\\ a_{12} & a_{22}\\ a_{13} & a_{23} \end{bmatrix}
=
\begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23} \end{bmatrix}^{T}
$$
Multiple Pass Transpose

To compute B = AT where A ∈ IRm×n factor

Sn,m = Γt · · · Γ1

and then execute

a = vec(A)
for k = 1:t
a ← Γk a
end
Define B ∈ IRn×m by vec(B) = a.

Different transpose algorithms correspond to different factorizations of


Sn,m.
An Example

If m = pn, then Sn,m = Γ2Γ1 = (Ip ⊗ Sn,n)(Sn,p ⊗ In)

The first pass b^(1) = Γ_1 vec(A) corresponds to a block transposition:

$$
A = \begin{bmatrix} A_1\\ A_2\\ A_3\\ A_4 \end{bmatrix}
\;\rightarrow\;
B^{(1)} = \begin{bmatrix} A_1 & A_2 & A_3 & A_4 \end{bmatrix}.
$$

The second pass b^(2) = Γ_2 b^(1) carries out the transposition of the blocks:

$$
B^{(1)} = \begin{bmatrix} A_1 & A_2 & A_3 & A_4 \end{bmatrix}
\;\rightarrow\;
B^{(2)} = \begin{bmatrix} A_1^T & A_2^T & A_3^T & A_4^T \end{bmatrix}.
$$

Note that B^(2) = A^T.
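A quick MATLAB/Octave check of this two-factor splitting, reusing the perfect-shuffle construction sketched earlier:

rowsel = @(M, r) M(r, :);
Spq = @(p, q) rowsel(eye(p*q), reshape(reshape(1:p*q, q, p)', [], 1));

p = 3;  n = 4;  m = p*n;                     % m = pn
G1 = kron(Spq(n, p), eye(n));                % first pass: block transposition
G2 = kron(eye(p), Spq(n, n));                % second pass: transpose the blocks
norm(Spq(n, m) - G2*G1)                      % Sn,m = Γ2 Γ1, ~ 0

A = randn(m, n);
norm(G2*G1*A(:) - reshape(A', [], 1))        % vec(A') via the two passes, ~ 0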


The Rise of the Kronecker Product Cont’d

Tensoring Low-Dimension Techniques

Greater willingness to entertain problems of high dimension.


Tensoring Low Dimensional Ideas

Quadrature in one and three dimensions:

$$
\int_a^b f(x)\,dx \;\approx\; \sum_{i=1}^{n} w_i\, f(x_i) \;=\; w^T f(x)
$$

$$
\int_{a_1}^{b_1}\!\!\int_{a_2}^{b_2}\!\!\int_{a_3}^{b_3} f(x,y,z)\,dx\,dy\,dz
\;\approx\; \sum_{i=1}^{n_x}\sum_{j=1}^{n_y}\sum_{k=1}^{n_z} w_i^{(x)} w_j^{(y)} w_k^{(z)}\, f(x_i, y_j, z_k)
\;=\; \left(w^{(x)}\otimes w^{(y)}\otimes w^{(z)}\right)^T f(x\otimes y\otimes z)
$$
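A small MATLAB/Octave check of the tensor-product identity, using composite trapezoidal weights in each direction; the Kronecker form and the nested triple sum agree to roundoff (the grid vector is ordered with the x index slowest to match kron(wx, kron(wy, wz))):

nx = 9;  ny = 7;  nz = 11;
x = linspace(0, 1, nx);  y = linspace(0, 2, ny);  z = linspace(-1, 1, nz);
trapw = @(t) [diff(t(1:2))/2, (t(3:end) - t(1:end-2))/2, diff(t(end-1:end))/2];
wx = trapw(x);  wy = trapw(y);  wz = trapw(z);        % trapezoidal weight (row) vectors

f = @(x, y, z) exp(x).*cos(y) + z.^2;                 % test integrand
[Z, Y, X] = ndgrid(z, y, x);                          % z varies fastest, x slowest
Ikron = kron(wx, kron(wy, wz)) * f(X(:), Y(:), Z(:)); % (wx ⊗ wy ⊗ wz)' f(grid)

Iloop = 0;
for i = 1:nx
  for j = 1:ny
    for k = 1:nz
      Iloop = Iloop + wx(i)*wy(j)*wz(k)*f(x(i), y(j), z(k));
    end
  end
end
disp(abs(Ikron - Iloop))                              % same quadrature, ~ 0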


Higher Dimensional KP Problems

1. Andre, Nowak, and Van Veen (1997).


In higher-order statistics the calculations involve the “cumulants”
x ⊗ x ⊗ · · · ⊗ x. (Note that vec(xx^T) = x ⊗ x.)

2. de Lathauwer, de Moor, and Vandewalle (2000)


multilinear SVD, low rank approximation of tensors
Least Squares Problems that Involve Kronecker
Products
Some Least Squares Problems

Least squares problems of the form

min ‖(B ⊗ C)x − b‖

can be efficiently solved by computing the QR factorizations (or SVDs) of B


and C. See Fausett and Fulton (1994) and Fausett, Fulton, and Hashish (1997).
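A sketch of the idea in MATLAB/Octave: assuming B and C have full column rank, the least squares solution is x = (B^+ ⊗ C^+)b, which can be obtained from two small solves (backslash uses QR internally):

m1 = 8;  n1 = 3;  m2 = 7;  n2 = 4;
B = randn(m1, n1);  C = randn(m2, n2);  b = randn(m1*m2, 1);

D = reshape(b, m2, m1);        % b = vec(D)
X = (C \ D) / B';              % two small least squares solves
x = X(:);

norm(x - kron(B, C) \ b)       % matches the direct LS solution, ~ 0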

Barrlund (1998) on surface fitting with splines...

min ‖(A_1 ⊗ A_2)x − f‖ such that (B_1 ⊗ B_2)x = g


Weighted Least Squares

$$
\min\; \left\| W^{-1/2}\left((B\otimes C)x - b\right) \right\|_2
\quad\equiv\quad
\begin{bmatrix} W & B\otimes C\\ B^T\otimes C^T & 0 \end{bmatrix}
\begin{bmatrix} r\\ x \end{bmatrix}
=
\begin{bmatrix} b\\ 0 \end{bmatrix}
$$

Compute the QR factorizations


$$
B = Q_B \begin{bmatrix} R_B\\ 0 \end{bmatrix}, \qquad C = Q_C \begin{bmatrix} R_C\\ 0 \end{bmatrix}
$$
The augmented system transforms to

$$
\begin{bmatrix} E_{11} & E_{12} & R_B\otimes R_C\\ E_{21} & E_{22} & 0\\ R_B^T\otimes R_C^T & 0 & 0 \end{bmatrix}
\begin{bmatrix} \tilde r_1\\ \tilde r_2\\ x \end{bmatrix}
=
\begin{bmatrix} \tilde b_1\\ \tilde b_2\\ 0 \end{bmatrix}
$$

where
$$
\begin{bmatrix} E_{11} & E_{12}\\ E_{21} & E_{22} \end{bmatrix}
$$

is a simple permutation of (Q_B ⊗ Q_C)^T W (Q_B ⊗ Q_C). Solve the E_{22} system via conjugate gradients, exploiting structure.
When the blocks are Kronecker products...

$$
\min\;\left\| \begin{bmatrix} B_1\otimes C_1\\ B_2\otimes C_2 \end{bmatrix} x - b \right\|_2
$$

Compute a pair of GSVDs:

B_1 = U_{1B} D_{1B} X_B^T,   B_2 = U_{2B} D_{2B} X_B^T,
C_1 = U_{1C} D_{1C} X_C^T,   C_2 = U_{2C} D_{2C} X_C^T.

$$
\begin{bmatrix} B_1\otimes C_1\\ B_2\otimes C_2 \end{bmatrix}
=
\begin{bmatrix} U_{1B}\otimes U_{1C} & 0\\ 0 & U_{2B}\otimes U_{2C} \end{bmatrix}
\begin{bmatrix} D_{1B}\otimes D_{1C}\\ D_{2B}\otimes D_{2C} \end{bmatrix}
\left( X_B\otimes X_C \right)^T.
$$
Total Least Squares

LS:   min ‖r‖_2   subject to b + r ∈ ran(A),    A x_LS = b + r_opt

TLS:  min ‖E‖_F^2 + ‖r‖_2^2   subject to b + r ∈ ran(A + E),    (A + E_opt) x_TLS = b + r_opt

“Errors in Variables”
Total Least Squares Solution

To solve

min ‖E‖_F^2 + ‖r‖_2^2   subject to b + r ∈ ran(A + E),    (A + E_opt) x_TLS = b + r_opt

compute the SVD of [A | b] ∈ IR^{m×(n+1)}:

U^T [A | b] V = Σ

and set

x_TLS = −V(1:n, n + 1) / V(n + 1, n + 1)
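A minimal MATLAB/Octave sketch of this recipe on synthetic data (the noise level and the true coefficients are arbitrary):

m = 30;  n = 3;
A = randn(m, n);
b = A*[1; -2; 0.5] + 0.01*randn(m, 1);       % noisy right-hand side
[U, S, V] = svd([A b]);
xTLS = -V(1:n, n+1) / V(n+1, n+1);
disp(xTLS')                                  % close to [1 -2 0.5]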


If A is a Kronecker Product B ⊗ C...

We need the last column of V in U^T F V = Σ, where

F = [ B ⊗ C | b ].

First compute the SVDs of B and C:

U_B^T B V_B = Σ_B,    U_C^T C V_C = Σ_C.

If

$$
\tilde U = U_B \otimes U_C, \qquad
\tilde V = \begin{bmatrix} V_B\otimes V_C & 0\\ 0 & 1 \end{bmatrix}
$$

then

$$
\tilde U^T F \tilde V = \begin{bmatrix} \Sigma_B\otimes\Sigma_C & g \end{bmatrix} \equiv \tilde F,
\qquad g = \tilde U^T b.
$$

We need the smallest right singular vector of F̃.


Think Eigenvector

The right singular vectors of F̃ are the eigenvectors of

$$
C = \tilde F^T \tilde F
= \begin{bmatrix} \Sigma_B\otimes\Sigma_C & g \end{bmatrix}^T
  \begin{bmatrix} \Sigma_B\otimes\Sigma_C & g \end{bmatrix}
= \begin{bmatrix} D & z\\ z^T & \alpha \end{bmatrix}
$$

where

D = Σ_B^T Σ_B ⊗ Σ_C^T Σ_C   (square and diagonal)

z = (Σ_B ⊗ Σ_C)^T g

α = g^T g
Eigenvectors/Eigenvalues of Bordered Matrices

$$
C = \begin{bmatrix}
d_1 & 0 & 0 & 0 & z_1\\
0 & d_2 & 0 & 0 & z_2\\
0 & 0 & d_3 & 0 & z_3\\
0 & 0 & 0 & d_4 & z_4\\
z_1 & z_2 & z_3 & z_4 & \alpha
\end{bmatrix}
$$

The eigenvalues of C are the zeros of

$$
f(\mu) \;=\; \mu - \alpha + z^T (D - \mu I)^{-1} z
\;=\; \mu - \alpha + \sum_{i=1}^{N} \frac{z_i^2}{d_i - \mu}
$$

Can assume that the d_i are distinct and that the z_i are nonzero.
The zeros of f are separated by the d_i. Can show that the smallest root of f
(i.e., the smallest eigenvalue of C, the square of the smallest singular value of F̃) is between 0 and the smallest d_i.
From the eigenvalue equation
$$
\begin{bmatrix} D & z\\ z^T & \alpha \end{bmatrix}
\begin{bmatrix} x\\ -1 \end{bmatrix}
= \mu_{\min} \begin{bmatrix} x\\ -1 \end{bmatrix}
$$

it follows that

x = (D − µ_min I)^{−1} z.

Thus the sought-after singular vector of F̃ is just a unit vector in the direction

$$
\begin{bmatrix} (D - \mu_{\min} I)^{-1} z\\ -1 \end{bmatrix}
$$
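A sketch (MATLAB/Octave) of the computation this suggests: bisection for the smallest root of f on (0, min d_i), where f is increasing, followed by the eigenvector direction. The d, z, α below are synthetic values chosen only to exercise the formulas:

d = [1.0; 2.5; 4.0; 7.0];  z = [0.3; -0.2; 0.5; 0.1];  alpha = z'*z + 0.5;
f = @(mu) mu - alpha + sum(z.^2 ./ (d - mu));

lo = 0;  hi = min(d);                        % f(0) < 0 here; f -> +inf as mu -> min(d)
for it = 1:60                                % plain bisection; f is increasing on (0, min(d))
    mid = (lo + hi)/2;
    if f(mid) < 0, lo = mid; else, hi = mid; end
end
mu_min = (lo + hi)/2;
disp(f(mu_min))                              % ~ 0

x = (diag(d) - mu_min*eye(length(d))) \ z;   % the eigenvector direction is [x; -1]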
Overall Solution Procedure

1. First compute the SVDs of B and C:

U_B^T B V_B = Σ_B,    U_C^T C V_C = Σ_C

2. Find the smallest root µ_min of f(µ) = µ − α + z^T (D − µI)^{−1} z.

3. Get the smallest right singular vector w of

[ Σ_B ⊗ Σ_C | g ]

4. Get the TLS solution from

$$
v = \begin{bmatrix} V_B\otimes V_C & 0\\ 0 & 1 \end{bmatrix} w
$$
A Shifted Kronecker Product System
Frequency Response

Suppose we wish to evaluate

φ(µ) = c^T (A − µI)^{−1} d

for many different values of µ. It pays to precompute the real Schur decomposition:


$$
Q^T A Q = T =
\begin{bmatrix}
\times & \times & \times & \times & \times\\
0 & \times & \times & \times & \times\\
0 & \times & \times & \times & \times\\
0 & 0 & 0 & \times & \times\\
0 & 0 & 0 & 0 & \times
\end{bmatrix}
$$

φ(µ) = c^T Q (T − µI)^{−1} Q^T d    (O(n^2) per evaluation)


Solve (A_p ⊗ · · · ⊗ A_1 − λI) x = b.

First compute the real Schur decompositions A^(k) = Q^(k) T^(k) (Q^(k))^T, k = 1:p.

Set Q = Q^(p) ⊗ · · · ⊗ Q^(1). It follows that

Q^T (A^(p) ⊗ · · · ⊗ A^(1)) Q = T^(p) ⊗ · · · ⊗ T^(1) ≡ T

and we get

(T^(p) ⊗ · · · ⊗ T^(1) − λI_n) y = c

where y ∈ IR^n and c ∈ IR^n are defined by

x = (Q^(p) ⊗ · · · ⊗ Q^(1)) y,    c = (Q^(p) ⊗ · · · ⊗ Q^(1))^T b.
Unshifted Case

The standard approach for solving

(T^(p) ⊗ · · · ⊗ T^(1)) y = c,    T^(i) ∈ IR^{n_i×n_i}

is to recognize that

$$
T \;=\; \prod_{i=1}^{p} \left( I_{\mu_i} \otimes T^{(i)} \otimes I_{\rho_i} \right)
$$

where ρ_i = n_1 · · · n_{i−1} and µ_i = n_{i+1} · · · n_p for i = 1:p.


We then obtain...

y ← c
for i = 1:p
    y ← (I_{µ_i} ⊗ (T^(i))^{−1} ⊗ I_{ρ_i}) y
end
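A sketch of this loop in MATLAB/Octave: each factor is inverted by reshaping y so that T^(i) acts on one slice at a time (with ρ_i along the fastest-varying dimension), and the result is checked against a dense Kronecker solve:

n = [3 4 2];  p = numel(n);  N = prod(n);
T = {triu(rand(3)) + 3*eye(3), triu(rand(4)) + 4*eye(4), triu(rand(2)) + 2*eye(2)};
c = randn(N, 1);

y = c;
for i = 1:p
    rho = prod(n(1:i-1));  mu = prod(n(i+1:p));
    Y = reshape(y, rho, n(i), mu);
    for s = 1:mu
        Y(:, :, s) = Y(:, :, s) / T{i}.';    % apply (I_mu ⊗ T{i} ⊗ I_rho)^{-1}
    end
    y = Y(:);
end

norm(kron(T{3}, kron(T{2}, T{1}))*y - c)     % ~ 0 up to roundoff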
The p = 2 case: (F ⊗ G − λI)x = b
$$
\begin{bmatrix}
f_{11}G - \lambda I_m & f_{12}G & f_{13}G & f_{14}G\\
0 & f_{22}G - \lambda I_m & f_{23}G & f_{24}G\\
0 & 0 & f_{33}G - \lambda I_m & f_{34}G\\
0 & 0 & 0 & f_{44}G - \lambda I_m
\end{bmatrix}
\begin{bmatrix} y_1\\ y_2\\ y_3\\ y_4 \end{bmatrix}
=
\begin{bmatrix} c_1\\ c_2\\ c_3\\ c_4 \end{bmatrix}
$$

Solve the triangular system


(f44G − λIm) y4 = c4
for y4. Substituting this into the above system reduces it to
$$
\begin{bmatrix}
f_{11}G - \lambda I_m & f_{12}G & f_{13}G\\
0 & f_{22}G - \lambda I_m & f_{23}G\\
0 & 0 & f_{33}G - \lambda I_m
\end{bmatrix}
\begin{bmatrix} y_1\\ y_2\\ y_3 \end{bmatrix}
=
\begin{bmatrix} \tilde c_1\\ \tilde c_2\\ \tilde c_3 \end{bmatrix}
$$

where c̃i = ci − fi4Gy4, i = 1:3.


The General Triangular Case
It is recursive...
function y = KPShiftSolve(T, c, λ, n, α)
n is a p-by-1 integer vector.
T is a p-by-1 cell array where T{i} is an
ni-by-ni upper triangular matrix for i = 1:p.
c ∈ IR^N where N = n1 · · · np, and λ and α are scalars
such that A = αT{p} ⊗ · · · ⊗ T{1} − λI_N is nonsingular.
y ∈ IR^N solves Ay = c.
p ← length(n)
if p == 1
    y ← (αT{1} − λI)\c
else
    m ← n1 · · · np−1
    for i = np: −1:1
        idx ← 1 + (i − 1)m:im
        y(idx) ← KPShiftSolve(T, c(idx), λ, n(1:p − 1), αT{p}(i, i))
        z ← (T{p − 1} ⊗ · · · ⊗ T{1}) y(idx)
        for j = 1:i − 1
            jdx ← 1 + (j − 1)m:jm
            c(jdx) ← c(jdx) − αT{p}(j, i) z
        end
    end
end
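A small driver (MATLAB/Octave), assuming the recursion above has been transcribed literally into a function file KPShiftSolve.m (a hypothetical file name):

n = [3 2 4];  N = prod(n);  lambda = 0.7;
T = {triu(rand(3)) + eye(3), triu(rand(2)) + eye(2), triu(rand(4)) + eye(4)};
c = randn(N, 1);

y = KPShiftSolve(T, c, lambda, n, 1);               % alpha = 1 at the top level
A = kron(T{3}, kron(T{2}, T{1})) - lambda*eye(N);
norm(A*y - c)                                       % ~ 0 up to roundoff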
Two-by-Two Bumps

$$
\begin{bmatrix}
f_{11}G - \lambda I_m & f_{12}G & f_{13}G & f_{14}G\\
0 & f_{22}G - \lambda I_m & f_{23}G & f_{24}G\\
0 & 0 & f_{33}G - \lambda I_m & f_{34}G\\
0 & 0 & f_{43}G & f_{44}G - \lambda I_m
\end{bmatrix}
\begin{bmatrix} y_1\\ y_2\\ y_3\\ y_4 \end{bmatrix}
=
\begin{bmatrix} c_1\\ c_2\\ c_3\\ c_4 \end{bmatrix}
$$

Solve for y3 and y4 at the same time.

Revert to localized complex arithmetic.


$$
Z^H \begin{bmatrix} f_{33} & f_{34}\\ f_{43} & f_{44} \end{bmatrix} Z
= \begin{bmatrix} t_{33} & t_{34}\\ 0 & t_{44} \end{bmatrix}
$$
High Order Kronecker Products

y = (F_p ⊗ · · · ⊗ F_1) x,    F_i ∈ IR^{n_i×n_i}

It is convenient to make use of the factorization


F_p ⊗ · · · ⊗ F_1 = M_p · · · M_1
where
M_i = Π_{n_i, N/n_i}^T ( I_{N/n_i} ⊗ F_i ),    N = n_1 · · · n_p
Matlab:
Z←x
for i = 1:p
Z ← (Fi · reshape(Z, ni, N/ni))T
end
y ← reshape(Z, N, 1)
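A quick MATLAB/Octave check of this loop against an explicitly formed Kronecker product (small dimensions only):

n = [2 3 4];  p = numel(n);  N = prod(n);
F = {randn(2), randn(3), randn(4)};
x = randn(N, 1);

Z = x;
for i = 1:p
    Z = (F{i} * reshape(Z, n(i), N/n(i))).';
end
y = reshape(Z, N, 1);

norm(y - kron(F{3}, kron(F{2}, F{1}))*x)     % ~ 0 up to roundoff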
Conclusion

Our goal is to heighten the profile of the Kronecker product and develop an
“infrastructure” of methods thereby making it easier for the numerical linear
algebra community to spot Kronecker “opportunities”.

With precedent...

• The development of effective algorithms for the QR and SVD factorizations
turned many “A^T A” problems into least squares/singular value calculations.

• The development of the QZ algorithm (Moler and Stewart (1973)) for the
problem Ax = λBx prompted a rethink of “standard” eigenproblems that
have the form B^{−1}Ax = λx.
