A Journey From Linear Algebra to Machine Learning
LINEAR ALGEBRA TO STATISTICAL LEARNING
Importance of Linear Algebra in ML
ML Models leveraging Linear Algebra
What is linear algebra?
• Linear equation
  − a1 x1 + . . . + an xn = b defines a hyperplane in (x1 , . . . , xn ) space
  − The common solutions of several such equations lie on the intersection of their hyperplanes (e.g., a straight line)
Standard Definitions
Shapes of Tensors
Matrix Addition
D = aB + c =⇒ Di,j = aBi,j + c
C = A + b =⇒ Ci,j = Ai,j + bj
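In NumPy both rules fall out of broadcasting; a small sketch with made-up values:

```python
import numpy as np

# D = aB + c: scale every entry of B by a, then add the scalar c
a, c = 2.0, 1.0
B = np.array([[1.0, 2.0], [3.0, 4.0]])
D = a * B + c                     # D[i, j] = a * B[i, j] + c

# C = A + b: the vector b is broadcast and added to every row of A
A = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([10.0, 20.0])
C = A + b                         # C[i, j] = A[i, j] + b[j]
```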
Multiplying Matrices
Tensors and Linear Classifier
Linear Transformation
• Ax=b
− where A ∈ Rn×n and b ∈ Rn
− More explicitly (n equations in n unknowns)
Ax = b
A^{-1} Ax = A^{-1} b
I_n x = A^{-1} b
x = A^{-1} b
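The same derivation in code, on a hypothetical 2 × 2 system; `np.linalg.solve` is the usual way to get x without forming A^{-1} explicitly:

```python
import numpy as np

# Hypothetical invertible system A x = b
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])

x_via_inverse = np.linalg.inv(A) @ b   # x = A^{-1} b, as in the derivation
x = np.linalg.solve(A, b)              # same answer, no explicit inverse

assert np.allclose(x, x_via_inverse)
```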
Disadvantage of closed-form solutions
• If A−1 exists, the same A−1 can be used for any given b
  − But A−1 often cannot be represented with sufficient precision on a digital computer
  − So it is not used in practice
• Gaussian elimination also has disadvantages
  − numerical instability (division by small numbers)
  − O(n^3) cost for an n × n matrix
• Software solutions use the value of b in finding x
  − E.g., the difference (derivative) between b and the current output is used iteratively
• Least squares solutions for an m × n system
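For an overdetermined m × n system, a least-squares solution can be sketched as follows (the data here is made up):

```python
import numpy as np

# Overdetermined system (3 equations, 2 unknowns): minimize ||Ax - b||_2
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

x, residual, rank, sing_vals = np.linalg.lstsq(A, b, rcond=None)
```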
Use of a Vector in Regression
• A design matrix
− N samples, D features
Norms
• f (x) = 0 =⇒ x = 0
• f (x + y) ≤ f (x) + f (y) (triangle inequality)
• f (αx) = |α| f (x) for all α ∈ R
LP Norm
• Definition: ∥x∥p = ( Σi |xi |^p )^{1/p}
− L2 Norm
• Called Euclidean norm
• Simply the Euclidean distance between the origin and the
point x
• written simply as ∥x∥
• Squared Euclidean norm is same as xT x
− L1 Norm
  • Useful when zero and small non-zero values must be distinguished
  • Note that L2 increases only slowly near the origin, e.g., 0.1^2 = 0.01
− L∞ Norm
  • Max norm: ∥x∥∞ = maxi |xi |
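These norms are all available through `np.linalg.norm`; a quick sketch with a made-up vector:

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])

l1 = np.linalg.norm(x, 1)          # sum of |x_i|
l2 = np.linalg.norm(x)             # Euclidean norm (default p = 2)
linf = np.linalg.norm(x, np.inf)   # max |x_i|

assert np.isclose(l2 ** 2, x @ x)  # squared L2 norm equals x^T x
```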
Use of norm in Regression
y(x, w) = w0 + Σ_{j=1}^{M−1} wj ϕj (x)

• Loss Function

L(w) = (1/2) Σ_{n=1}^{N} { y(xn , w) − tn }^2 + (λ/2) ∥w∥^2
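The loss above can be written directly in code; `Phi`, `t`, and the weights below are made-up stand-ins for a real design matrix and targets:

```python
import numpy as np

def regularized_loss(w, Phi, t, lam):
    """L(w) = 1/2 * sum_n (y(x_n, w) - t_n)^2 + lam/2 * ||w||^2."""
    err = Phi @ w - t              # y(x_n, w) - t_n for every sample n
    return 0.5 * (err @ err) + 0.5 * lam * (w @ w)

Phi = np.array([[1.0, 0.0],        # row n holds phi_j(x_n); column 0 is phi_0 = 1
                [1.0, 1.0],
                [1.0, 2.0]])
t = np.array([0.0, 1.0, 2.0])
w = np.array([0.0, 1.0])           # happens to fit these targets exactly
```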
Distance and Frobenius Norm
• Norm is the length of a vector
• Distance between two vectors (v, w)
− dist(v, w) = ∥v − w∥ = sqrt( (v1 − w1 )^2 + . . . + (vn − wn )^2 )
− Distance to the origin is the square root of the sum of squares
• Similar to L2 norm
∥A∥F = ( Σ_{i,j} A_{i,j}^2 )^{1/2}
• Frobenius in ML
− Layers of neural network involve matrix multiplication
− Regularization:
  minimize the Frobenius norm of the weight matrices ∥W(i) ∥F over the L layers

  JR = J + λ Σ_{i=1}^{L} ∥W(i) ∥F
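A minimal check of the Frobenius norm definition (the matrix is made up):

```python
import numpy as np

W = np.array([[1.0, 2.0], [2.0, 4.0]])

fro = np.linalg.norm(W, 'fro')                  # sqrt of the sum of squared entries
assert np.isclose(fro, np.sqrt((W ** 2).sum()))
```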
Frobenius in ML
LINEAR ALGEBRA TO IMAGE PROCESSING
What are eigenvalues?
Ax − λx = 0 =⇒ (A − λI)x = 0
• The trivial solution is x = 0.
• A non-trivial solution exists only when det(A − λI) = 0.
• Are eigenvectors unique?
  − No: if x is an eigenvector with eigenvalue λ, then βx (for any β ≠ 0) is also an eigenvector, with the same eigenvalue λ.
Calculating the Eigenvectors/ values
det [ a11 − λ  a12 ; a21  a22 − λ ] = 0 =⇒ (a11 − λ)(a22 − λ) − a12 a21 = 0
Eigenvalue example
• Consider

  A = [ 1 2 ; 2 4 ]

  λ^2 − (a11 + a22 )λ + (a11 a22 − a12 a21 ) = 0
  =⇒ λ^2 − (1 + 4)λ + (1 · 4 − 2 · 2) = 0
  =⇒ λ^2 = 5λ =⇒ λ = 0, λ = 5
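The example checks out numerically (a sketch using `np.linalg.eig`):

```python
import numpy as np

A = np.array([[1.0, 2.0], [2.0, 4.0]])   # the matrix from the slide
vals, vecs = np.linalg.eig(A)

# Each column of vecs satisfies A v = lambda v
for lam, v in zip(vals, vecs.T):
    assert np.allclose(A @ v, lam * v)
```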
Physical Interpretation
• And so on . . .
Spectral Decomposition Theorem
P = [ e1 , e2 , . . . , ek ] (k×k);   Λ = diag(λ1 , λ2 , . . . , λk ) (k×k)
Example of spectral decomposition
• Eigendecomposition is A = PΛPT
− where P is an orthogonal matrix composed of eigenvectors
of A
• Decomposition is not unique when two eigenvalues are
the same.
• Eigendecomposition only applies to square matrices. In a regression problem the data matrix has dimensions m × k, with m ≠ k in general, so this method fails. Solution: SVD.
• By convention order entries of Λ in descending order:
− Under this convention, eigendecomposition is unique if all
eigenvalues are unique
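A sketch of A = PΛP^T for a made-up symmetric matrix, using `np.linalg.eigh` (which returns an orthonormal P):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])   # made-up symmetric matrix
lam, P = np.linalg.eigh(A)               # eigenvalues in ascending order

Lam = np.diag(lam)
assert np.allclose(P @ Lam @ P.T, A)     # A = P Lambda P^T
assert np.allclose(P @ P.T, np.eye(2))   # P is orthogonal
```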
Singular Value Decomposition
• If A is rectangular m × k matrix of real numbers, then there
exists an m × m orthogonal matrix U and a k × k
orthogonal matrix V such that
A = U Λ V^T ,   where A is (m×k), U is (m×m), Λ is (m×k), V is (k×k), and U U^T = V V^T = I

A^T A = V Λ^2 V^T
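The shapes and orthogonality conditions can be checked directly; here on the 2 × 3 matrix used in the example that follows:

```python
import numpy as np

A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])         # 2 x 3

U, s, Vt = np.linalg.svd(A)              # full SVD: U is 2x2, Vt is 3x3
assert U.shape == (2, 2) and Vt.shape == (3, 3)
assert np.allclose(U @ U.T, np.eye(2))
assert np.allclose(Vt @ Vt.T, np.eye(3))
```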
Example for SVD
• V can be computed from A^T A:

  A = [ 3 1 1 ; −1 3 1 ]  =⇒  A^T A = [ 10 0 2 ; 0 10 4 ; 2 4 2 ]

  det(A^T A − γI) = 0 =⇒ γ1 = 12, γ2 = 10, γ3 = 0

  =⇒ v1^T = [ 1/√6 , 2/√6 , 1/√6 ],  v2^T = [ 2/√5 , −1/√5 , 0 ],  v3^T = [ 1/√30 , 2/√30 , −5/√30 ]
Example for SVD
A = [ 3 1 1 ; −1 3 1 ]
  = √12 [ 1/√2 ; 1/√2 ] [ 1/√6 , 2/√6 , 1/√6 ]  +  √10 [ 1/√2 ; −1/√2 ] [ 2/√5 , −1/√5 , 0 ]
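Numerically, the two rank-one terms rebuild A exactly:

```python
import numpy as np

A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])

u1 = np.array([1.0, 1.0]) / np.sqrt(2)
u2 = np.array([1.0, -1.0]) / np.sqrt(2)
v1 = np.array([1.0, 2.0, 1.0]) / np.sqrt(6)
v2 = np.array([2.0, -1.0, 0.0]) / np.sqrt(5)

# A = sigma_1 u1 v1^T + sigma_2 u2 v2^T with sigma_1 = sqrt(12), sigma_2 = sqrt(10)
A_rebuilt = np.sqrt(12) * np.outer(u1, v1) + np.sqrt(10) * np.outer(u2, v2)
```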
Digital Images
The (i, j)-th entry of the matrix is the (i, j)-th pixel value, which determines the intensity of light at that point in the image.
Image Attributes
Image Compression using SVD
Compression in color images
SVD of RGB image
• Obtain the singular values (Λ) and the singular vectors (U, V) of the image using SVD.
• The fraction of variance explained by singular value λi is λi^2 / Σj λj^2 .
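A compression sketch on a single channel; a random matrix stands in for a real image here:

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((64, 64))               # stand-in for one image channel

U, s, Vt = np.linalg.svd(img, full_matrices=False)

r = 10                                   # keep the top-r singular triplets
img_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

# Fraction of variance explained by the kept singular values
explained = np.sum(s[:r] ** 2) / np.sum(s ** 2)
```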
LINEAR ALGEBRA TO SPATIO-TEMPORAL DATA ANALYSIS
Matrix Decomposition
Example: spatio-temporal data
Global daily temperature
Global warming component
More Formally
X ≈ WH
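One way to obtain such a factorization is a truncated SVD; NMF (non-negative W and H) is another common choice for spatio-temporal data. The sizes below are made up:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((100, 30))        # e.g., 100 time steps x 30 locations

r = 5                            # number of components
U, s, Vt = np.linalg.svd(X, full_matrices=False)
W = U[:, :r] * s[:r]             # 100 x r: temporal patterns
H = Vt[:r, :]                    # r x 30: spatial patterns
X_hat = W @ H                    # rank-r approximation of X
```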
Textbook and References