CS 229 - Linear Algebra and Calculus Refresher


8/10/22, 9:10 PM CS 229 - Linear Algebra and Calculus refresher


Linear Algebra and Calculus refresher (https://stanford.edu/~shervine/teaching/cs-229/refresher-algebra-calculus)

By Afshine Amidi (https://twitter.com/afshinea) and Shervine Amidi (https://twitter.com/shervinea)

General notations

Definitions
❐ Vector ― We denote by $x \in \mathbb{R}^n$ a vector with $n$ entries, where $x_i \in \mathbb{R}$ is the $i$-th entry:

$$x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \in \mathbb{R}^n$$

❐ Matrix ― We denote by $A \in \mathbb{R}^{m \times n}$ a matrix with $m$ rows and $n$ columns, where $A_{i,j} \in \mathbb{R}$ is the entry located in the $i$-th row and $j$-th column:

$$A = \begin{pmatrix} A_{1,1} & \cdots & A_{1,n} \\ \vdots & & \vdots \\ A_{m,1} & \cdots & A_{m,n} \end{pmatrix} \in \mathbb{R}^{m \times n}$$

Remark: the vector $x$ defined above can be viewed as an $n \times 1$ matrix and is more specifically called a column vector.
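
As a rough illustration of these notations, the sketch below stores a vector and a matrix as NumPy arrays. NumPy is an assumption made here for the examples (the refresher itself does not prescribe a library), and the variable names are purely illustrative.

```python
import numpy as np

# A vector x in R^3, stored as a 1-D array.
x = np.array([1.0, 2.0, 3.0])

# A matrix A in R^{2x3}: m = 2 rows, n = 3 columns.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# NumPy indexes from 0, so the mathematical entry A_{1,2} is A[0, 1].
a_12 = A[0, 1]

# Viewing x as an n x 1 matrix, i.e. a column vector.
x_col = x.reshape(-1, 1)

print(x.shape, A.shape, x_col.shape)  # (3,) (2, 3) (3, 1)
print(a_12)                           # 2.0
```

Note the off-by-one between the 1-based indices used in the formulas and the 0-based indices used by the arrays.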

Main matrices
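❐ Identity matrix ― The identity matrix $I \in \mathbb{R}^{n \times n}$ is a square matrix with ones on its diagonal and zeros everywhere else:

$$I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & 1 \end{pmatrix}$$

Remark: for all matrices $A \in \mathbb{R}^{n \times n}$, we have $A \times I = I \times A = A$.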

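❐ Diagonal matrix ― A diagonal matrix $D \in \mathbb{R}^{n \times n}$ is a square matrix whose entries off the diagonal are all zero; the diagonal entries $d_1, ..., d_n$ are typically nonzero but may take any value:

$$D = \begin{pmatrix} d_1 & 0 & \cdots & 0 \\ 0 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & d_n \end{pmatrix}$$

Remark: we also write $D$ as $\textrm{diag}(d_1, ..., d_n)$.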
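
A minimal sketch of these two matrices, assuming NumPy (`np.eye` and `np.diag` are standard NumPy functions; the example values are arbitrary):

```python
import numpy as np

n = 3
I = np.eye(n)                     # identity matrix in R^{3x3}
D = np.diag([2.0, 5.0, 7.0])      # diag(d_1, d_2, d_3)

A = np.arange(9.0).reshape(n, n)  # an arbitrary 3x3 matrix

# Check the remark A I = I A = A.
left_ok = np.array_equal(A @ I, A)
right_ok = np.array_equal(I @ A, A)
print(left_ok, right_ok)          # True True

# Applied to a matrix, np.diag extracts the diagonal entries instead.
diag_entries = np.diag(D)
print(diag_entries)               # [2. 5. 7.]
```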

Matrix operations

Multiplication

❐ Vector-vector ― There are two types of vector-vector products:

inner product: for $x, y \in \mathbb{R}^n$, we have:

$$x^T y = \sum_{i=1}^n x_i y_i \in \mathbb{R}$$

outer product: for $x \in \mathbb{R}^m, y \in \mathbb{R}^n$, we have:

$$xy^T = \begin{pmatrix} x_1 y_1 & \cdots & x_1 y_n \\ \vdots & & \vdots \\ x_m y_1 & \cdots & x_m y_n \end{pmatrix} \in \mathbb{R}^{m \times n}$$

❐ Matrix-vector ― The product of matrix $A \in \mathbb{R}^{m \times n}$ and vector $x \in \mathbb{R}^n$ is a vector in $\mathbb{R}^m$, such that:

$$Ax = \begin{pmatrix} a_{r,1}^T x \\ \vdots \\ a_{r,m}^T x \end{pmatrix} = \sum_{i=1}^n a_{c,i} x_i \in \mathbb{R}^m$$

where $a_{r,i}^T$ are the row vectors and $a_{c,j}$ are the column vectors of $A$, and $x_i$ are the entries of $x$.

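❐ Matrix-matrix ― The product of matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times p}$ is a matrix in $\mathbb{R}^{m \times p}$, such that:

$$AB = \begin{pmatrix} a_{r,1}^T b_{c,1} & \cdots & a_{r,1}^T b_{c,p} \\ \vdots & & \vdots \\ a_{r,m}^T b_{c,1} & \cdots & a_{r,m}^T b_{c,p} \end{pmatrix} = \sum_{i=1}^n a_{c,i} b_{r,i}^T \in \mathbb{R}^{m \times p}$$

where $a_{r,i}^T, b_{r,i}^T$ are the row vectors and $a_{c,j}, b_{c,j}$ are the column vectors of $A$ and $B$ respectively.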
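
The products above can be checked numerically. The following is a small sketch assuming NumPy, with arbitrary example values; it compares the row-wise view (computed by the `@` operator) with the column and outer-product sums given in the formulas.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

inner = x @ y            # sum_i x_i y_i = 1*4 + 2*5 + 3*6 = 32
outer = np.outer(x, y)   # 3x3 matrix with entries x_i y_j

A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])   # A in R^{2x3}
B = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, 0.0]])        # B in R^{3x2}

# Matrix-vector product, and the same result as a weighted sum of columns.
Ax = A @ x
Ax_cols = sum(A[:, i] * x[i] for i in range(A.shape[1]))

# Matrix-matrix product, and the same result as a sum of outer products
# of the columns of A with the rows of B.
AB = A @ B
AB_outer = sum(np.outer(A[:, i], B[i, :]) for i in range(A.shape[1]))

print(inner)     # 32.0
print(Ax)        # [7. 5.]
print(AB.shape)  # (2, 2)
```

The shape of `AB` is $m \times p$ (here $2 \times 2$), matching the dimensions stated in the definition.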
​ ​ ​ ​

Other operations
❐ Transpose ― The transpose of a matrix $A \in \mathbb{R}^{m \times n}$, denoted $A^T$, is the $n \times m$ matrix obtained by swapping its rows and columns:

$$\forall i,j, \quad A_{i,j}^T = A_{j,i}$$

Remark: for matrices $A, B$ whose product is defined, we have $(AB)^T = B^T A^T$.

❐ Inverse ― The inverse of an invertible square matrix $A$ is denoted $A^{-1}$ and is the only matrix such that:

$$AA^{-1} = A^{-1}A = I$$

Remark: not all square matrices are invertible. Also, for invertible matrices $A, B$ of the same size, we have $(AB)^{-1} = B^{-1} A^{-1}$.
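
As a quick numerical check of the transpose and inverse identities, here is a sketch assuming NumPy; the two matrices are arbitrary invertible examples chosen for illustration.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])   # det = 5, invertible
B = np.array([[0.0, 1.0],
              [4.0, 2.0]])   # det = -4, invertible

# (AB)^T = B^T A^T
transpose_ok = np.allclose((A @ B).T, B.T @ A.T)

# A A^{-1} = A^{-1} A = I
A_inv = np.linalg.inv(A)
identity_ok = (np.allclose(A @ A_inv, np.eye(2))
               and np.allclose(A_inv @ A, np.eye(2)))

# (AB)^{-1} = B^{-1} A^{-1}
inverse_ok = np.allclose(np.linalg.inv(A @ B),
                         np.linalg.inv(B) @ np.linalg.inv(A))

print(transpose_ok, identity_ok, inverse_ok)  # True True True
```

Comparisons use `np.allclose` rather than exact equality because the inverse is computed in floating point.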

❐ Trace ― The trace of a square matrix $A$, denoted $\textrm{tr}(A)$, is the sum of its diagonal entries:

$$\textrm{tr}(A) = \sum_{i=1}^n A_{i,i}$$

Remark: for matrices $A, B$, we have $\textrm{tr}(A^T) = \textrm{tr}(A)$ and $\textrm{tr}(AB) = \textrm{tr}(BA)$.

❐ Determinant ― The determinant of a square matrix $A \in \mathbb{R}^{n \times n}$, denoted $|A|$ or $\det(A)$, is expressed recursively in terms of $A_{\backslash i, \backslash j}$, which is the matrix $A$ without its $i$-th row and $j$-th column. For any fixed row $i$, and with $|A| = A_{1,1}$ when $n = 1$:

$$\det(A) = |A| = \sum_{j=1}^n (-1)^{i+j} A_{i,j} |A_{\backslash i, \backslash j}|$$

Remark: $A$ is invertible if and only if $|A| \neq 0$. Also, $|AB| = |A||B|$ and $|A^T| = |A|$.
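
The recursive definition can be transcribed almost literally. The sketch below assumes NumPy and expands along the first row ($i = 1$); `det_cofactor` is a hypothetical helper written for illustration, and its cost grows factorially with $n$, so in practice a routine such as `np.linalg.det` would normally be preferred.

```python
import numpy as np

def det_cofactor(A):
    """Determinant by cofactor expansion along the first row (i = 1)."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        # Minor: A without its first row and its (j+1)-th column.
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        # With 0-based j, the sign (-1)^{i+j} for i = 1 becomes (-1)^j.
        total += (-1) ** j * A[0, j] * det_cofactor(minor)
    return total

A = np.array([[2.0, 0.0, 1.0],
              [1.0, 3.0, 2.0],
              [1.0, 1.0, 2.0]])
B = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 0.0],
              [2.0, 0.0, 1.0]])
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])   # second row is twice the first: not invertible

print(det_cofactor(A))   # 6.0
print(det_cofactor(S))   # 0.0
print(np.trace(A))       # 7.0

# |AB| = |A||B| and tr(AB) = tr(BA)
det_product_ok = np.isclose(np.linalg.det(A @ B),
                            np.linalg.det(A) * np.linalg.det(B))
trace_cyclic_ok = np.isclose(np.trace(A @ B), np.trace(B @ A))
```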

Matrix properties

Definitions

❐ Symmetric decomposition ― A given square matrix $A$ can be expressed in terms of its symmetric and antisymmetric parts as follows:

$$A = \underbrace{\frac{A + A^T}{2}}_{\textrm{Symmetric}} + \underbrace{\frac{A - A^T}{2}}_{\textrm{Antisymmetric}}$$
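
A short sketch of this decomposition, assuming NumPy and an arbitrary example matrix:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [4.0, 3.0]])

sym = (A + A.T) / 2       # symmetric part: equals its own transpose
antisym = (A - A.T) / 2   # antisymmetric part: equals minus its transpose

print(sym)      # [[1. 3.] [3. 3.]]
print(antisym)  # [[ 0. -1.] [ 1.  0.]]
```

Adding the two parts recovers `A`, since the $A^T$ terms cancel.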

❐ Norm ― A norm is a function $N : V \longrightarrow [0, +\infty)$ where $V$ is a vector space, and such that for all $x, y \in V$, we have:

$N(x + y) \leqslant N(x) + N(y)$
$N(ax) = |a| N(x)$ for a scalar $a$
if $N(x) = 0$, then $x = 0$

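For a vector $x$ with entries $x_1, ..., x_n$, the most commonly used norms are summed up in the table below:

| Norm | Notation | Definition | Use case |
|---|---|---|---|
| Manhattan, $L^1$ | $\lVert x \rVert_1$ | $\sum_{i=1}^n \lvert x_i \rvert$ | LASSO regularization |
| Euclidean, $L^2$ | $\lVert x \rVert_2$ | $\sqrt{\sum_{i=1}^n x_i^2}$ | Ridge regularization |
| $p$-norm, $L^p$ | $\lVert x \rVert_p$ | $\left( \sum_{i=1}^n \lvert x_i \rvert^p \right)^{\frac{1}{p}}$ | Hölder inequality |
| Infinity, $L^\infty$ | $\lVert x \rVert_\infty$ | $\max_i \lvert x_i \rvert$ | Uniform convergence |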
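
To make the table concrete, the sketch below computes each norm directly from its definition and compares it with `np.linalg.norm`. NumPy is assumed, and the vector and the choice $p = 3$ are arbitrary.

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])
p = 3

l1 = np.sum(np.abs(x))                    # |3| + |-4| + |0| = 7
l2 = np.sqrt(np.sum(x ** 2))              # sqrt(9 + 16) = 5
lp = np.sum(np.abs(x) ** p) ** (1 / p)    # (27 + 64)^(1/3)
linf = np.max(np.abs(x))                  # largest absolute entry = 4

# The same quantities through NumPy's built-in vector norm.
builtin = [np.linalg.norm(x, 1), np.linalg.norm(x, 2),
           np.linalg.norm(x, p), np.linalg.norm(x, np.inf)]

print(l1, l2, linf)  # 7.0 5.0 4.0
```

The absolute value inside the $p$-norm matters for odd $p$: without it, the negative entry would reduce the sum.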

❐ Linear dependence ― A set of vectors is said to be linearly dependent if at least one of the vectors in the set can be written as a linear combination of the others.

Remark: if no vector can be written this way, then the vectors are said to be linearly independent.

❐ Matrix rank ― The rank of a given matrix $A$ is denoted $\textrm{rank}(A)$ and is the dimension of the vector space generated by its columns. This is equivalent to the maximum number of linearly independent columns of $A$.
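
As an illustration of linear dependence and rank, the sketch below builds a matrix whose third column is the sum of the first two. It assumes NumPy; `np.linalg.matrix_rank` estimates the rank numerically (from singular values with a tolerance), which is an implementation detail of NumPy rather than something stated in this refresher.

```python
import numpy as np

# Third column = first column + second column, so the columns are
# linearly dependent and the rank is at most 2.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [0.0, 1.0, 1.0]])

dependent = np.allclose(A[:, 0] + A[:, 1], A[:, 2])
rank_A = np.linalg.matrix_rank(A)
rank_I = np.linalg.matrix_rank(np.eye(3))   # columns of I are independent

print(dependent, rank_A, rank_I)  # True 2 3
```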


❐ Positive semi-definite matrix ― A matrix $A \in \mathbb{R}^{n \times n}$ is positive semi-definite (PSD), denoted $A \succeq 0$, if we have:

$$A = A^T \quad \textrm{and} \quad \forall x \in \mathbb{R}^n, \quad x^T A x \geqslant 0$$

Remark: similarly, a matrix $A$ is said to be positive definite, denoted $A \succ 0$, if it is a PSD matrix which satisfies $x^T A x > 0$ for every non-zero vector $x$.
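
The definition quantifies over all vectors $x$, which cannot be tested exhaustively. A common practical check, used in the sketch below, relies on a fact not stated above: a symmetric matrix is PSD exactly when all of its eigenvalues are non-negative. NumPy is assumed, and `is_psd` and its tolerance `tol` are hypothetical names introduced for this example.

```python
import numpy as np

def is_psd(A, tol=1e-10):
    """Hypothetical helper: symmetric with all eigenvalues >= -tol."""
    if not np.allclose(A, A.T):
        return False
    # eigvalsh is intended for symmetric matrices and returns real values.
    return bool(np.all(np.linalg.eigvalsh(A) >= -tol))

X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
G = X.T @ X                    # x^T G x = ||X x||^2 >= 0, so G is PSD
M = np.array([[1.0, 2.0],
              [2.0, 1.0]])     # symmetric, eigenvalues 3 and -1: not PSD

# Spot-check the quadratic form on a few random vectors.
rng = np.random.default_rng(0)
samples = rng.normal(size=(100, 2))
min_quadratic_form = min(x @ G @ x for x in samples)

print(is_psd(G), is_psd(M))  # True False
```

The random spot-check can only disprove positive semi-definiteness, never prove it, which is why the eigenvalue test is used for the actual decision.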

❐ Eigenvalue, eigenvector ― Given a matrix $A \in \mathbb{R}^{n \times n}$, $\lambda$ is said to be an eigenvalue of $A$ if there exists a vector $z \in \mathbb{R}^n \backslash \{0\}$, called an eigenvector, such that we have:

$$Az = \lambda z$$

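❐ Spectral theorem ― Let $A \in \mathbb{R}^{n \times n}$. If $A$ is symmetric, then $A$ is diagonalizable by a real orthogonal matrix $U \in \mathbb{R}^{n \times n}$. Writing $\Lambda = \textrm{diag}(\lambda_1, ..., \lambda_n)$ for the diagonal matrix of eigenvalues, we have:

$$\exists \Lambda \textrm{ diagonal}, \quad A = U \Lambda U^T$$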
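
For a symmetric matrix, the decomposition can be obtained numerically. The sketch below assumes NumPy and uses `np.linalg.eigh`, which is designed for symmetric (Hermitian) input and returns eigenvalues in ascending order together with orthonormal eigenvectors as the columns of `U`.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])   # symmetric

eigenvalues, U = np.linalg.eigh(A)
Lambda = np.diag(eigenvalues)

# Each column z of U satisfies A z = lambda z.
z = U[:, 0]
eigen_ok = np.allclose(A @ z, eigenvalues[0] * z)

# U is orthogonal and A = U Lambda U^T.
orthogonal_ok = np.allclose(U.T @ U, np.eye(2))
reconstruct_ok = np.allclose(U @ Lambda @ U.T, A)

print(eigenvalues)  # [1. 3.]
```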

❐ Singular-value decomposition ― For a given matrix $A$ of dimensions $m \times n$, the singular-value decomposition (SVD) is a factorization technique that guarantees the existence of matrices $U$ ($m \times m$, unitary), $\Sigma$ ($m \times n$, diagonal) and $V$ ($n \times n$, unitary), such that:

$$A = U \Sigma V^T$$
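
A minimal sketch of the factorization, assuming NumPy. `np.linalg.svd` returns the singular values as a 1-D array and $V^T$ rather than $V$, so the $m \times n$ matrix $\Sigma$ is rebuilt by hand here; the example matrix is arbitrary.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # m = 2, n = 3

# full_matrices=True gives U of size m x m and Vt of size n x n.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Place the singular values on the diagonal of an m x n matrix.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

reconstruct_ok = np.allclose(U @ Sigma @ Vt, A)
U_orthogonal = np.allclose(U.T @ U, np.eye(2))
V_orthogonal = np.allclose(Vt @ Vt.T, np.eye(3))

print(U.shape, Sigma.shape, Vt.shape)  # (2, 2) (2, 3) (3, 3)
```

For a real matrix, the unitary factors are real orthogonal matrices, which is what the two orthogonality checks verify.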

Matrix calculus

❐ Gradient ― Let $f : \mathbb{R}^{m \times n} \rightarrow \mathbb{R}$ be a function and $A \in \mathbb{R}^{m \times n}$ be a matrix. The gradient of $f$ with respect to $A$ is an $m \times n$ matrix, denoted $\nabla_A f(A)$, such that:

$$\Big(\nabla_A f(A)\Big)_{i,j} = \frac{\partial f(A)}{\partial A_{i,j}}$$

Remark: the gradient of $f$ is only defined when $f$ is a function that returns a scalar.

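❐ Hessian ― Let $f : \mathbb{R}^n \rightarrow \mathbb{R}$ be a function and $x \in \mathbb{R}^n$ be a vector. The Hessian of $f$ with respect to $x$ is an $n \times n$ matrix, symmetric whenever the second partial derivatives are continuous, denoted $\nabla_x^2 f(x)$, such that:

$$\Big(\nabla_x^2 f(x)\Big)_{i,j} = \frac{\partial^2 f(x)}{\partial x_i \partial x_j}$$

Remark: the Hessian of $f$ is only defined when $f$ is a function that returns a scalar.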
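
The entry-wise definitions of the gradient and Hessian can be approximated with central finite differences. The sketch below assumes NumPy; `numerical_gradient`, `numerical_hessian`, the step sizes and the quadratic test function are all choices made for this illustration. For $f(x) = \frac{1}{2} x^T Q x + b^T x$ with symmetric $Q$, the gradient is $Qx + b$ and the Hessian is $Q$, which gives something to compare against.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Approximate each partial derivative df/dx_i by a central difference."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        grad[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return grad

def numerical_hessian(f, x, eps=1e-4):
    """Approximate each second partial derivative d2f/(dx_i dx_j)."""
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ei[i] = eps
            ej = np.zeros(n)
            ej[j] = eps
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * eps ** 2)
    return H

Q = np.array([[2.0, 1.0],
              [1.0, 3.0]])   # symmetric
b = np.array([1.0, -1.0])

def f(x):
    return 0.5 * x @ Q @ x + b @ x   # scalar-valued, as the remarks require

x0 = np.array([0.5, -2.0])
grad = numerical_gradient(f, x0)     # close to Q x0 + b = [0, -6.5]
hess = numerical_hessian(f, x0)      # close to Q
```

Finite differences are only approximations and are sensitive to the step size, so they are best suited to checking analytical derivatives, not replacing them.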

❐ Gradient operations ― For matrices $A, B, C$, the following gradient properties are worth keeping in mind:

$$\nabla_A \textrm{tr}(AB) = B^T \quad\quad \nabla_{A^T} f(A) = \left(\nabla_A f(A)\right)^T$$

$$\nabla_A \textrm{tr}(ABA^TC) = CAB + C^TAB^T \quad\quad \nabla_A |A| = |A| (A^{-1})^T$$

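
The gradient identities for $\textrm{tr}(AB)$, $\textrm{tr}(ABA^TC)$ and $|A|$ can be sanity-checked numerically. The sketch below assumes NumPy and square $3 \times 3$ matrices; `numerical_matrix_gradient` is a hypothetical finite-difference helper, $A$ is a fixed invertible example, and $B$, $C$ are random.

```python
import numpy as np

def numerical_matrix_gradient(f, A, eps=1e-6):
    """Central-difference estimate of df/dA_{i,j} for every entry of A."""
    grad = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            E = np.zeros_like(A)
            E[i, j] = eps
            grad[i, j] = (f(A + E) - f(A - E)) / (2 * eps)
    return grad

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 2.0]])   # det = 13, so A is invertible
rng = np.random.default_rng(0)
B = rng.normal(size=(3, 3))
C = rng.normal(size=(3, 3))

# grad_A tr(AB) = B^T
g_trace = numerical_matrix_gradient(lambda M: np.trace(M @ B), A)
trace_ok = np.allclose(g_trace, B.T, atol=1e-5)

# grad_A tr(A B A^T C) = C A B + C^T A B^T
g_quad = numerical_matrix_gradient(lambda M: np.trace(M @ B @ M.T @ C), A)
quad_ok = np.allclose(g_quad, C @ A @ B + C.T @ A @ B.T, atol=1e-5)

# grad_A |A| = |A| (A^{-1})^T
g_det = numerical_matrix_gradient(np.linalg.det, A)
det_ok = np.allclose(g_det, np.linalg.det(A) * np.linalg.inv(A).T, atol=1e-5)

print(trace_ok, quad_ok, det_ok)  # True True True
```

Agreement on one set of matrices does not prove the identities; it only guards against sign or transpose mistakes when applying them.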