Professional Documents
Culture Documents
CS 229 - Linear Algebra and Calculus Refresher
CS 229 - Linear Algebra and Calculus Refresher
CS 229 - Linear Algebra and Calculus Refresher
(https://stanford.edu/~shervine/teaching/cs-229/refresher-algebra-calculus#cs-229---
machine-learning)CS 229 - Machine Learning (teaching/cs-229) English
Probabilities Algebra
(https://stanford.edu/~shervine/teaching/cs-
229/refresher-algebra-
calculus#cheatsheet)Linear Algebra and Calculus
refresher Star 13,818
(https://stanford.edu/~shervine/teaching/cs-229/refresher-
algebra-calculus#notations)
General notations
Definitions
❐ Vector ― We note x ∈ Rn a vector with n entries, where xi ∈ R is the ith entry:
⎛ x1 ⎞
⎜ x2 ⎟
x=⎜ ⎟ ∈ Rn
⎜ ⋮ ⎟
⎝ xn ⎠
❐ Matrix ― We note A ∈ Rm×n a matrix with m rows and n columns, where Ai,j ∈ R is the
https://stanford.edu/~shervine/teaching/cs-229/refresher-algebra-calculus 1/7
8/10/22, 9:10 PM CS 229 - Linear Algebra and Calculus refresher
⎛ A1,1 ⋯ A1,n ⎞
A=⎜ ⋮ ⋮ ⎟∈R
m×n
⎝ Am,1 Am,n ⎠
⋯
Remark: the vector x defined above can be viewed as a n × 1 matrix and is more particularly called a
column-vector.
Main matrices
❐ Identity matrix ― The identity matrix I ∈ Rn×n is a square matrix with ones in its diagonal and
zero everywhere else:
⎛ 1 0 ⋯ 0 ⎞
⎜ ⋮ ⎟
I=⎜ ⎟
0 ⋱ ⋱
⎜ ⎟
⎜ 0 ⎟
⋮ ⋱ ⋱
⎝ 0 ⋯ 0 1 ⎠
❐ Diagonal matrix ― A diagonal matrix D ∈ Rn×n is a square matrix with nonzero values in its
diagonal and zero everywhere else:
⎛ d1 0 ⋯ 0 ⎞
⎜ ⋮ ⎟
D=⎜ ⎟
0 ⋱ ⋱
⎜ ⎟
⎜ 0 ⎟
⋮ ⋱ ⋱
⎝ 0 ⋯ 0 dn ⎠
(https://stanford.edu/~shervine/teaching/cs-229/refresher-
algebra-calculus#operations)
Matrix operations
Multiplication
https://stanford.edu/~shervine/teaching/cs-229/refresher-algebra-calculus 2/7
8/10/22, 9:10 PM CS 229 - Linear Algebra and Calculus refresher
i=1
⎛ x1 y1 ⋯ x1 yn ⎞
xy T = ⎜ ⋮ ⋮ ⎟∈R
m×n
⎝ xm y1 xm yn ⎠
⋯
⎛ ar,1 x ⎞
T
n
Ax = ⎜ ⎟ = ∑ ac,i xi ∈ Rm
⋮
⎝ aT x ⎠ i=1
r,m
where aTr,i are the vector rows and ac,j are the vector columns of A, and xi are the entries of x.
❐ Matrix-matrix ― The product of matrices A ∈ Rm×n and B ∈ Rn×p is a matrix of size Rn×p ,
such that:
⋮ ⋮
⎝ aT bc,1 aTr,m bc,p ⎠ i=1
r,m ⋯
where aTr,i , bTr,i are the vector rows and ac,j , bc,j are the vector columns of A and B respectively.
Other operations
❐ Transpose ― The transpose of a matrix A ∈ Rm×n , noted AT , is such that its entries are flipped:
https://stanford.edu/~shervine/teaching/cs-229/refresher-algebra-calculus 3/7
8/10/22, 9:10 PM CS 229 - Linear Algebra and Calculus refresher
❐ Inverse ― The inverse of an invertible square matrix A is noted A−1 and is the only matrix such
that:
AA−1 = A−1 A = I
Remark: not all square matrices are invertible. Also, for matrices A, B , we have (AB)−1 = B −1 A−1
❐ Trace ― The trace of a square matrix A, noted tr(A), is the sum of its diagonal entries:
n
tr(A) = ∑ Ai,i
i=1
follows:
n
det(A) = ∣A∣ = ∑(−1)i+j Ai,j ∣A\i,\j ∣
j=1
Remark: A is invertible if and only if ∣A∣ 0. Also, ∣AB∣ = ∣A∣∣B∣ and ∣AT ∣ = ∣A∣.
=
(https://stanford.edu/~shervine/teaching/cs-229/refresher-
algebra-calculus#properties)
Matrix properties
Definitions
https://stanford.edu/~shervine/teaching/cs-229/refresher-algebra-calculus 4/7
8/10/22, 9:10 PM CS 229 - Linear Algebra and Calculus refresher
❐ Symmetric decomposition ― A given matrix A can be expressed in terms of its symmetric and
antisymmetric parts as follows:
A + AT A − AT
A= +
2 2
Symmetric Antisymmetric
❐ Norm ― A norm is a function N : V ⟶ [0, +∞[ where V is a vector space, and such that for
all x, y ∈ V , we have:
N (x + y) ⩽ N (x) + N (y)
N (ax) = ∣a∣N (x) for a scalar
if N (x) = 0, then x = 0
For x ∈ V , the most commonly used norms are summed up in the table below:
(∑ xi )
n p
p-norm, L p
∣∣x∣∣p
p
Hölder inequality
i=1
Infinity, L∞ ∣∣x∣∣∞
max ∣xi ∣
Uniform convergence
i
❐ Linearly dependence ― A set of vectors is said to be linearly dependent if one of the vectors in
the set can be defined as a linear combination of the others.
Remark: if no vector can be written this way, then the vectors are said to be linearly independent.
❐ Matrix rank ― The rank of a given matrix A is noted rank(A) and is the dimension of the
vector space generated by its columns. This is equivalent to the maximum number of linearly
independent columns of A.
https://stanford.edu/~shervine/teaching/cs-229/refresher-algebra-calculus 5/7
8/10/22, 9:10 PM CS 229 - Linear Algebra and Calculus refresher
❐ Positive semi-definite matrix ― A matrix A ∈ Rn×n is positive semi-definite (PSD) and is noted
A ⪰ 0 if we have:
A = AT
and ∀x ∈ Rn , xT Ax ⩾ 0
Remark: similarly, a matrix A is said to be positive definite, and is noted A ≻ 0, if it is a PSD matrix
which satisfies for all non-zero vector x, x T
Ax > 0.
Az = λz
∃Λ diagonal, A = U ΛU T
A = U ΣV T
(https://stanford.edu/~shervine/teaching/cs-229/refresher-
algebra-calculus#calculus)
Matrix calculus
https://stanford.edu/~shervine/teaching/cs-229/refresher-algebra-calculus 6/7
8/10/22, 9:10 PM CS 229 - Linear Algebra and Calculus refresher
∂f (A)
(∇A f (A)) =
∂Ai,j
i,j
Remark: the gradient of f is only defined when f is a function that returns a scalar.
∂ 2 f (x)
(∇2x f (x)) =
∂xi ∂xj
i,j
Remark: the hessian of f is only defined when f is a function that returns a scalar.
❐ Gradient operations ― For matrices A, B, C , the following gradient properties are worth
having in mind:
T
∇A tr(AB) = B T
∇AT f (A) = (∇A f (A))
∇A tr(ABAT C) = CAB + C T AB T
∇A ∣A∣ = ∣A∣(A−1 )T
(https://twitter.com/shervinea) (https://linkedin.com/in/shervineamidi)
(https://github.com/shervinea) (https://scholar.google.com/citations?user=nMnMTm8AAAAJ)
https://stanford.edu/~shervine/teaching/cs-229/refresher-algebra-calculus 7/7