VinAI Linear Algebra Slides Released
Tung Pham
11 Oct 2022
Wikipedia
Linear algebra is central to almost all areas of mathematics.
Linear algebra is also used in most sciences and fields of engineering,
because it allows modeling many natural phenomena, and computing
efficiently with such models.
William Stein
Mathematics is the art of reducing any problem to linear algebra.
India example
One-third of a collection of beautiful water lilies was offered to Shiva,
one-fifth to Vishnu, one-sixth to Surya, and one-fourth to the Devi. The
six that remained were presented to the guru. How many water lilies were
there in all?
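The puzzle reduces to one linear equation in one unknown; a worked solution (not part of the original slides):

```latex
\frac{x}{3}+\frac{x}{5}+\frac{x}{6}+\frac{x}{4}+6=x
\;\Longrightarrow\; \frac{57x}{60}+6=x
\;\Longrightarrow\; \frac{x}{20}=6
\;\Longrightarrow\; x=120.
```

So there were 120 water lilies in all.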
Euler brought to light that a system of linear equations does not have
to have a solution.
Gauss introduced the elimination method to solve systems of linear
equations.
In 1848, Sylvester introduced the term "matrix".
Cayley defined matrix multiplication.
With the development of computers, matrix calculations were sped up.
There is a story of a Nobel laureate who needed to compute the inverse of a
25 by 25 matrix; he had to access a supercomputer of that time.
Vector space
Given an origin O, each point A can be viewed as a vector OA.
The whole space can then be considered as a vector space.
For example:
Line: R
Plane: R^2
The set of polynomials with real coefficients:
p(x) = a_0 + a_1 x + ... + a_n x^n
A linear combination of vectors v_1, ..., v_n is an expression of the form
a_1 v_1 + a_2 v_2 + ... + a_n v_n.
Vectors v_1, ..., v_k are linearly dependent if there exist scalars a_1, ..., a_k, not all zero, such that
a_1 v_1 + ... + a_k v_k = 0.
Example: if G is the centroid of a triangle ABC, then GA + GB + GC = 0 (as vectors).
Vectors v_1, ..., v_k are linearly independent if
a_1 v_1 + ... + a_k v_k ≠ 0
for every choice of scalars a_1, ..., a_k that are not all zero.
Example: the polynomials 1, 2x − 1, 3x^2 − 2x, ... are linearly independent.
A map L : V → W is linear if
L(au + bv) = aL(u) + bL(v)
for any two vectors u and v in V and any scalars a and b (real numbers, complex
numbers, etc.).
Matrix
A rectangular array of numbers of size m × n, which represents a linear map
from an n-dimensional space to an m-dimensional space.
Example:
L = [2 2; 1 3], with e_1 = [1, 0], e_2 = [0, 1].
L(e_1) = [2 2; 1 3][1; 0] = [2; 1]
L(e_2) = [2 2; 1 3][0; 1] = [2; 3]
From this example, the map L takes the unit vectors e_1 and e_2 to the
vectors (2, 1) and (2, 3), which are the columns of the matrix L.
For any vector v = (x, y), v can be written as x × (1, 0) + y × (0, 1). Then
L(v) = x L(e_1) + y L(e_2) = x × (2, 1) + y × (2, 3).
When L is a m × n matrix, L : Rn → Rm .
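As a quick illustration (not from the slides; a minimal NumPy sketch using the 2 × 2 example above), the columns of a matrix are the images of the unit vectors, and the map is linear:

```python
import numpy as np

# Illustrative check: the columns of L are the images of e1 and e2.
L = np.array([[2.0, 2.0],
              [1.0, 3.0]])
e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])

assert np.allclose(L @ e1, L[:, 0])   # L(e1) is the first column
assert np.allclose(L @ e2, L[:, 1])   # L(e2) is the second column

# Linearity: L(x*e1 + y*e2) = x*L(e1) + y*L(e2)
x, y = 4.0, -1.5
assert np.allclose(L @ (x * e1 + y * e2), x * (L @ e1) + y * (L @ e2))
```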
Image of L
Image of L is the set L(V) ⊂ W.
Rank of L
The dimension of L(V) is also the rank of L.
Kernel of L
The kernel of L, denoted by Ker (L), is defined as
Ker(L) = {v ∈ V : L(v) = 0}
Proposition
Ker (L) is also a vector space.
If L(v_1) = L(v_2) = 0, then
L(a_1 v_1 + a_2 v_2) = a_1 L(v_1) + a_2 L(v_2) = 0 ⇒ a_1 v_1 + a_2 v_2 ∈ Ker(L).
Example:
L = [1 2 3; 4 5 6; 7 8 9]; its columns are v_1 = (1, 4, 7), v_2 = (2, 5, 8), v_3 = (3, 6, 9), and
v_1 − 2 v_2 + v_3 = 0.
Rank(L) is the dimension of L(V), Nullity (L) is the dimension of Ker (L)
and Dim(V) is the dimension of V.
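The rank-nullity relation can be checked numerically for the 3 × 3 example above; a small NumPy sketch (the code and variable names are illustrative, not from the slides):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])

rank = np.linalg.matrix_rank(A)
nullity = A.shape[1] - rank      # rank-nullity: Dim(V) = Rank + Nullity

# The column relation v1 - 2*v2 + v3 = 0 from the example:
v1, v2, v3 = A[:, 0], A[:, 1], A[:, 2]
assert np.allclose(v1 - 2 * v2 + v3, 0)
```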
Rank and Nullity Theorem
Rank(L) + Nullity(L) = Dim(V).
What are the ranks of xy^T and I + xx^T, where x, y are n × 1 vectors?
If T : R^n → R^m, could rank(T) be greater than n or m?
Prove that rank(AB) ≤ min{rank(A), rank(B)}.
Let A be a linear map from R^3 to R^3. Can the rank and the dimension of
the kernel of A both equal 2 at the same time?
Prove that rank(A) + rank(B) ≥ rank(A + B). When does equality hold?
Prove that Ker(T_1) ∩ Ker(T_2) ⊆ Ker(T_1 + T_2), and give an example where
equality does not hold.
Prove or disprove: Ker(T) = Ker(T^2) and Image(T) = Image(T^2)?
Le = λe.
Suppose e_1, ..., e_k are eigenvectors of L with the same eigenvalue λ. For
v = α_1 e_1 + ... + α_k e_k,
L(v) = L(α_1 e_1 + ... + α_k e_k) = λ α_1 e_1 + ... + λ α_k e_k = λ v,
so v is an eigenvector of L.
Suppose e_1, ..., e_k are eigenvectors with distinct eigenvalues λ_1, ..., λ_k and
e_k = α_1 e_1 + ... + α_{k−1} e_{k−1}.
Applying L to both sides:
L(α_1 e_1 + ... + α_{k−1} e_{k−1}) = λ_1 α_1 e_1 + ... + λ_{k−1} α_{k−1} e_{k−1}
L(e_k) = λ_k e_k
⇒ λ_1 α_1 e_1 + ... + λ_{k−1} α_{k−1} e_{k−1} = λ_k e_k = λ_k α_1 e_1 + ... + λ_k α_{k−1} e_{k−1}
⇒ (λ_1 α_1 − λ_k α_1) e_1 + ... + (λ_{k−1} α_{k−1} − λ_k α_{k−1}) e_{k−1} = 0.
n1 + . . . + nk = n.
V = V1 ⊕ V2 ⊕ . . . ⊕ Vk
Multiplicity of eigenvalues
An n × n matrix T has no more than n eigenvalues.
The eigenvalues are the roots of det(T − λI) = 0, a polynomial of degree n in λ.
Algebraic proof:
Let Vi be the vector space corresponding to eigenvalues λi for
i = 1, 2, . . . , k. The dimension of Vi is equal to ni , ni is the multiplicity of
λi .
We consider V = V_1 ⊕ V_2 ⊕ ... ⊕ V_k; we need to prove that Σ_{i=1}^k n_i
is also the dimension of V.
It is equivalent to show that if v1 ∈ V1 , v2 ∈ V2 , . . . , vk ∈ Vk , then
v1 , v2 , . . . , vk are independent.
Assume the contrary. Without loss of generality (WLOG), there exist a
minimal t and a_i ≠ 0, 1 ≤ i ≤ t, such that
a_1 v_1 + ... + a_t v_t = 0
L(a_1 v_1 + ... + a_t v_t) = λ_1 a_1 v_1 + ... + λ_t a_t v_t = 0.
Algebraic proof (continued):
From the previous slide, multiplying the first relation by λ_1 and applying L give
λ_1 a_1 v_1 + λ_1 a_2 v_2 + ... + λ_1 a_t v_t = 0
λ_1 a_1 v_1 + λ_2 a_2 v_2 + ... + λ_t a_t v_t = 0.
Subtracting, (λ_2 − λ_1) a_2 v_2 + ... + (λ_t − λ_1) a_t v_t = 0, a shorter
relation with nonzero coefficients, contradicting the minimality of t.
Contradiction.
Thus, V1 ⊕ V2 ⊕ . . . ⊕ Vk has dimension n1 + . . . + nk and it is a
subspace of V .
Hence, n ≥ Σ_{i=1}^k n_i.
Important properties
If A = P D P^{-1}, then A^n = P D^n P^{-1}.
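A minimal NumPy sketch (not from the slides) of this property, using the 2 × 2 example matrix from earlier; n = 5 is an arbitrary choice:

```python
import numpy as np

# For a diagonalizable A = P D P^{-1}, powers reduce to powers of D.
A = np.array([[2.0, 2.0],
              [1.0, 3.0]])
eigvals, P = np.linalg.eig(A)    # columns of P are eigenvectors
n = 5

An_direct = np.linalg.matrix_power(A, n)
An_diag = P @ np.diag(eigvals ** n) @ np.linalg.inv(P)
assert np.allclose(An_direct, An_diag)
```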
Leibniz's formula
For an n × n matrix A = (a_ij),
det(A) = Σ_σ sign(σ) Π_{i=1}^n a_{i,σ(i)},
where the sum runs over all permutations σ of {1, ..., n}.
Example: for n = 2, det(A) = a_11 a_22 − a_12 a_21.
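The Leibniz formula can be implemented directly by summing over permutations; a hypothetical helper `leibniz_det` (assuming NumPy and `itertools`, not from the slides), checked against `numpy.linalg.det`:

```python
import numpy as np
from itertools import permutations

def leibniz_det(A):
    """Determinant via the Leibniz formula: sum over all permutations."""
    n = A.shape[0]
    total = 0.0
    for perm in permutations(range(n)):
        # sign(sigma) = (-1)^(number of inversions)
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if perm[i] > perm[j])
        sign = -1.0 if inversions % 2 else 1.0
        prod = 1.0
        for i in range(n):
            prod *= A[i, perm[i]]
        total += sign * prod
    return total

A = np.array([[2.0, 2.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 1.0]])
assert np.isclose(leibniz_det(A), np.linalg.det(A))
```

Note the O(n!) cost: this is a definition, not how determinants are computed in practice.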
Spectral decomposition
If A is an n × n symmetric matrix, then A = Σ_{i=1}^n λ_i u_i u_i^T, i.e.
A = U Λ U^T.
Entrywise,
a_ij = Σ_{k=1}^n u_ik λ_k u_jk
a_ji = Σ_{k=1}^n u_jk λ_k u_ik
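A small NumPy sketch (illustrative, not from the slides) of the spectral decomposition of a symmetric matrix via `numpy.linalg.eigh`:

```python
import numpy as np

# For symmetric A, eigh returns A = U Λ U^T with orthonormal U.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])
lam, U = np.linalg.eigh(A)

# Rank-one reconstruction: A = sum_i λ_i u_i u_i^T
A_rebuilt = sum(lam[i] * np.outer(U[:, i], U[:, i]) for i in range(3))
assert np.allclose(A, A_rebuilt)
assert np.allclose(U @ U.T, np.eye(3))   # U is orthogonal
```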
Let A be a symmetric matrix.
All eigenvalues are real numbers: if Av = λv, then λ is a real number.
Eigenvectors with different eigenvalues are orthogonal.
If W is stable under A, then so is W ⊥ , i.e.
A(W) ⊂ W ⇒ A(W ⊥ ) ⊂ W ⊥ .
Step 1
Prove that all eigenvalues are real numbers.
Let (λ, v) be a pair of eigenvalue and (possibly complex) eigenvector, Av = λv.
Conjugating, A v̄ = λ̄ v̄, since A is real. Then
v̄^T A v = λ v̄^T v and v̄^T A v = (A v̄)^T v = λ̄ v̄^T v,
using A = A^T since A is symmetric. The LHS equals the RHS, so
(λ − λ̄) v̄^T v = 0; as v̄^T v > 0, λ = λ̄.
Then λ ∈ R.
Step 2
Prove that eigenvectors with different corresponding eigenvalues are
orthogonal.
Let Au = μu and Av = λv with μ ≠ λ. Then u^T A v = λ u^T v and v^T A u = μ v^T u.
However, u^T A v = (u^T A v)^T = v^T A^T (u^T)^T = v^T A u. It follows that
(λ − μ) u^T v = 0, hence u^T v = 0.
Step 3
If A ∈ Rn×n is symmetric and W is a subspace of Rn such that
A(W ) ⊂ W , then A(W ⊥ ) ⊂ W ⊥ .
Step 4
If W ⊥ is a subspace and A(W ⊥ ) ⊂ W ⊥ then W ⊥ contains an eigenvector
of A.
Choose an orthogonal basis u_1, ..., u_m of W^⊥. Because A u_j ∈ W^⊥, we can denote
A(u_j) = Σ_{i=1}^m r_ij u_i.
Step 4 continued
For w = Σ_{j=1}^m x_j u_j, where (x_1, ..., x_m) is an eigenvector of the
matrix R = (r_ij) with eigenvalue λ:
A(w) = Σ_{j=1}^m x_j A(u_j) = Σ_{j=1}^m x_j Σ_{i=1}^m r_ij u_i
= Σ_{i=1}^m (Σ_{j=1}^m r_ij x_j) u_i
= Σ_{i=1}^m λ x_i u_i
= λ w.
Step 5
Building the subspace W by induction.
For v = Σ_{i=1}^n α_i u_i,
v^T A v = (Σ_{i=1}^n α_i u_i^T)(Σ_{i=1}^n λ_i u_i u_i^T)(Σ_{i=1}^n α_i u_i)
= Σ_{i=1}^n α_i^2 λ_i.
Let A = (a_ij) be an n × n matrix and let e_i = [0, ..., 0, 1, 0, ..., 0]^T,
with the 1 in the i-th position. Then
e_i^T A e_j = e_i^T A_j = a_ij,
where A_j is the j-th column of A.
tr(A) = tr(A^T).
tr(αA + βB) = α tr(A) + β tr(B).
Let a, b be two n × 1 vectors; then tr(ab^T) = b^T a.
tr(XY) = tr(YX).
For a 2 × 2 matrix A (the Cayley–Hamilton theorem in dimension 2),
A^2 − tr(A) · A + det(A) · I_2 = O_2.
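The 2 × 2 identity can be verified numerically; a minimal NumPy sketch (the example matrix is an arbitrary choice, not from the slides):

```python
import numpy as np

A = np.array([[2.0, 2.0],
              [1.0, 3.0]])
tr = A[0, 0] + A[1, 1]                        # tr(A)
det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]   # det(A)

# A^2 - tr(A)·A + det(A)·I = O holds for every 2x2 matrix
residual = A @ A - tr * A + det * np.eye(2)
assert np.allclose(residual, np.zeros((2, 2)))
```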
A − λI =
[ a_11 − λ   a_12      ...   a_1n     ]
[ a_21       a_22 − λ  ...   a_2n     ]
[ ...        ...       ...   ...      ]
[ a_n1       a_n2      ...   a_nn − λ ]
det(A − λI) is a polynomial of λ which has roots λ_1, ..., λ_n, then
det(A − λI) = Π_{i=1}^n (λ_i − λ).
A = PΛP −1 ,
Avi = λi vi .
det(A) = volume(λ_1 v_1, ..., λ_n v_n) / volume(v_1, ..., v_n) = Π_{i=1}^n λ_i.
It is more geometric.
A unitary matrix U maps U(e_i) = U_i, where e_i = (0, ..., 1, ..., 0) (1 in the
i-th position) and U_i is the i-th column of U.
U^T U = I; U^T = U^{-1}.
Rotation and reflection matrices are two special cases of unitary matrices.
Rot(x) Rot(y) = Rot(x + y).
Ref(x) Ref(y) = Rot(2[x − y]).
Rot(x) Ref(y) = Ref(y + x/2).
Ref(x) Rot(y) = Ref(x − y/2).
Rot and Ref preserve the distance between two points.
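A short NumPy check (not from the slides) of the composition rule for rotations and of distance preservation; `Rot` is a hypothetical helper:

```python
import numpy as np

def Rot(t):
    """2x2 rotation matrix through angle t (radians)."""
    return np.array([[np.cos(t), -np.sin(t)],
                     [np.sin(t),  np.cos(t)]])

x, y = 0.7, 1.2
# Composing rotations adds the angles: Rot(x)Rot(y) = Rot(x + y)
assert np.allclose(Rot(x) @ Rot(y), Rot(x + y))

# Rotations preserve lengths (they are unitary): ||Rot(x)v|| = ||v||
v = np.array([3.0, -4.0])
assert np.isclose(np.linalg.norm(Rot(x) @ v), np.linalg.norm(v))
```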
M = U Σ V^T.
Geometry of Singular Value Decomposition
V_2 = [v_{ℓ+1}, ..., v_n], V = [V_1, V_2].
V^T M^T M V = [ Λ, 0_{ℓ×(n−ℓ)} ; 0_{(n−ℓ)×ℓ}, 0_{(n−ℓ)×(n−ℓ)} ].
V_1^T V_1 = I_ℓ
V_2^T V_2 = I_{n−ℓ}.
Define u_i = M v_i / σ_i for i = 1, ..., ℓ. For i ≠ j,
u_j^T u_i = (1/σ_j)(1/σ_i) v_j^T M^T M v_i = 0.
Let U_1 = M V_1 Λ^{-1/2}; then U_1 is an orthogonal matrix of size m × ℓ.
We have
I_n = [V_1, V_2] [V_1^T ; V_2^T] = V_1 V_1^T + V_2 V_2^T
U_1 Λ^{1/2} V_1^T = M V_1 Λ^{-1/2} Λ^{1/2} V_1^T = M V_1 I_ℓ V_1^T
= M (I_n − V_2 V_2^T) = M − M V_2 V_2^T = M,
since M V_2 = 0.
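The construction above can be replayed numerically; a NumPy sketch (the random M and the variable names are assumptions, not from the slides) that builds U_1 = M V_1 Λ^{-1/2} from the eigendecomposition of M^T M and recovers M:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 3))     # full column rank (w.h.p.), so ell = 3

# Eigen-decomposition of M^T M gives V1 and Λ = diag(σ_i^2)
lam, V1 = np.linalg.eigh(M.T @ M)
lam, V1 = lam[::-1], V1[:, ::-1]    # sort eigenvalues in descending order

U1 = M @ V1 @ np.diag(lam ** -0.5)  # U1 = M V1 Λ^{-1/2}
assert np.allclose(U1.T @ U1, np.eye(3))                 # orthonormal columns
assert np.allclose(U1 @ np.diag(lam ** 0.5) @ V1.T, M)   # U1 Λ^{1/2} V1^T = M
```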
Then
A_k = Σ_{i=1}^k σ_i u_i v_i^T.
Polar decomposition
Given a square matrix A, A = US, where U is a unitary matrix and S is a
symmetric non-negative definite matrix.
The first part is a "rotation"; the second part is a non-negative definite matrix.
The polar decomposition of a nonsingular matrix is unique.
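The polar factors can be obtained from the SVD: if A = W Σ V^T, then U = W V^T and S = V Σ V^T; a NumPy sketch (random test matrix, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))

W, sigma, Vt = np.linalg.svd(A)
U = W @ Vt                       # unitary "rotation" part
S = Vt.T @ np.diag(sigma) @ Vt   # symmetric non-negative definite part

assert np.allclose(U @ S, A)
assert np.allclose(U @ U.T, np.eye(4))          # U is unitary
assert np.allclose(S, S.T)                      # S is symmetric
assert np.all(np.linalg.eigvalsh(S) >= -1e-10)  # eigenvalues >= 0
```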
Definition
A is an upper triangular matrix if a_ij = 0 for all i > j.
Properties
A is a lower triangular matrix if A^T is an upper triangular matrix; that is,
a_ij = 0 for all i < j.
The product of two upper triangular matrices is an upper triangular
matrix.
The same holds for the product of two lower triangular matrices.
Geometric representation: the first k vectors/columns of the matrix A lie
in the space spanned by e_1, ..., e_k, where
e_i = [0, ..., 1, ..., 0]^T (1 in the i-th position).
QR decomposition
A matrix A can be decomposed as the product of two matrices Q and R, where
Q is an orthogonal matrix and R is an upper triangular matrix; here A is an
m × n matrix, Q is m × n, and R is n × n.
This process guarantees that the first k columns of Q span the same space
as the first k columns of A.
Hence, the coefficient matrix R is upper triangular.
The process is called the Gram–Schmidt process.
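A minimal NumPy sketch (not from the slides) using `numpy.linalg.qr`:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))

Q, R = np.linalg.qr(A)           # reduced QR: Q is 5x3, R is 3x3
assert np.allclose(Q @ R, A)
assert np.allclose(Q.T @ Q, np.eye(3))   # orthonormal columns
assert np.allclose(R, np.triu(R))        # R is upper triangular
```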
Cholesky decomposition
A positive definite matrix A can be decomposed as the product A = L L^T,
where L is a lower triangular matrix.
LU decomposition
A matrix A can be written as the product of L and U, where L is a lower
triangular matrix and U is an upper triangular matrix.
LDU decomposition
A = LDU, where D is a diagonal matrix and L and U are unitriangular
matrices (main diagonal entries all equal to 1).
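A small NumPy check of the Cholesky factorization (the positive definite test matrix is constructed for the demo, not from the slides):

```python
import numpy as np

# Build a positive definite matrix A = B B^T + I for the demo
rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
A = B @ B.T + np.eye(4)

L = np.linalg.cholesky(A)        # lower triangular factor
assert np.allclose(L @ L.T, A)
assert np.allclose(L, np.tril(L))    # L is lower triangular
```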
Rn = Image(A) ⊕ Image(A)⊥ .
Note that the map A is bijective between Ker (A)⊥ and Image(A).
We can define the inverse of A between Ker (A)⊥ and Image(A), then
extend the inverse map to Rm and Rn .
The map is called Moore-Penrose inverse, denoted by A+ .
A : Rm → Rn .
For any v ∈ Rn , decompose
v = v1 + v2
v1 ∈ Image(A); v2 ∈ Image(A)⊥ .
A^+ : R^n → R^m
A^+(v) = u_1, where u_1 ∈ Ker(A)^⊥ is the unique vector with A(u_1) = v_1.
Au = v; A^+ v = u_1 ⇒ u = u_1 + u_2, u_1 ⊥ u_2
Au = v = Au_1; AA^+ Au = AA^+ v = Au_1.
(A^+ A)^T = A^+ A.
Proof: For u ∈ Rm ,
A = U Σ V^T,
then
A^+ = V Σ^+ U^T,
where Σ^+ is the transpose of Σ with each singular value σ_i replaced by 1/σ_i.
Proof: For σ_i > 0, v_i ∈ Ker(A)^⊥, and
A = U Σ V^T = Σ_i σ_i u_i v_i^T ⇒ A v_i = σ_i u_i
⇒ A^+ u_i = (1/σ_i) v_i ⇒ A^+ = V Σ^+ U^T.
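The formula A^+ = V Σ^+ U^T can be checked against `numpy.linalg.pinv`; a sketch with a random full-column-rank matrix (an assumption for the demo, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 3))

U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
A_plus = Vt.T @ np.diag(1.0 / sigma) @ U.T    # A^+ = V Σ^+ U^T

assert np.allclose(A_plus, np.linalg.pinv(A))
assert np.allclose((A_plus @ A).T, A_plus @ A)    # A^+ A is symmetric
```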
Then
p_{i,k} = P(state k | state i) = Σ_j p_{1,i,j} p_{2,j,k}.
Norms are defined for vectors; they are also defined for matrices.
For a matrix A that is a map from R^m to R^n, the induced norm is
||A||_{p,q} = max_{x ≠ 0} ||Ax||_q / ||x||_p.
Assume that A = (a_ij) with i = 1, ..., m and j = 1, ..., n.
The SVD of A is
A = Σ_{i=1}^{min{m,n}} σ_i u_i v_i^T = U Σ V^T.
Then we have
Schatten norm:
||A||_{S,p} = (Σ_{i=1}^{min{m,n}} σ_i(A)^p)^{1/p}.
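Several of these norms are available directly in NumPy; a short sketch (random matrix, not from the slides) relating them to the singular values:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 3))
sigma = np.linalg.svd(A, compute_uv=False)

# Operator 2-norm is the largest singular value
assert np.isclose(np.linalg.norm(A, 2), sigma[0])
# Frobenius norm is the Schatten p = 2 norm
assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(sigma ** 2)))
# Schatten p = 1 (nuclear) norm is the sum of singular values
assert np.isclose(np.linalg.norm(A, 'nuc'), np.sum(sigma))
```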
Then we have, for x = Σ_{j=1}^n x_j v_j with ||x||_2 = 1,
Ax = [Σ_{i=1}^r σ_i u_i v_i^T][Σ_{j=1}^n x_j v_j] = Σ_{i=1}^r σ_i x_i u_i.
||Ax||_2^2 = Σ_{i=1}^r σ_i^2 x_i^2 ≤ σ_1^2 Σ_{i=1}^r x_i^2 ≤ σ_1^2.
It follows that ||Ax||_2 ≤ σ_1 for all ||x|| = 1, which means that ||A||_2 ≤ σ_1.
Then
||A||_2 ≤ √(Σ_{i=1}^r σ_i^2) = ||A||_F.
p = 1: for ||x||_1 = 1,
||Ax||_1 = Σ_{i=1}^m |Σ_{j=1}^n a_ij x_j| ≤ Σ_{i=1}^m Σ_{j=1}^n |a_ij||x_j|
= Σ_{j=1}^n |x_j| Σ_{i=1}^m |a_ij| ≤ max_j Σ_{i=1}^m |a_ij|.
Equality: x_{j*} = 1 for j* = arg max_j Σ_{i=1}^m |a_ij|.
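The max-absolute-column-sum formula for the induced 1-norm can be checked numerically; a NumPy sketch (random matrix, illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 3))

# Induced 1-norm equals the maximum absolute column sum
col_sums = np.abs(A).sum(axis=0)
assert np.isclose(np.linalg.norm(A, 1), col_sums.max())

# The bound is attained at x = e_{j*}, j* = argmax_j sum_i |a_ij|
j_star = int(np.argmax(col_sums))
x = np.zeros(3)
x[j_star] = 1.0
assert np.isclose(np.linalg.norm(A @ x, 1), col_sums.max())
```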
Dantzig selector (Candès and Tao, 2004): Solve the convex program
min_β ||β||_1 subject to Xβ = y,
or, with residual r = y − Xβ,
min_β ||β||_1 subject to ||X^T r||_∞ ≤ λ_p.
Matrix completion:
min_X ||X||_{S,1} subject to X_ij = M_ij, (i, j) ∈ Ω.