CS 229 - Linear Algebra and Calculus Refresher

8/10/22, 9:10 PM CS 229 - Linear Algebra and Calculus refresher
Want more content like this? Subscribe here

(https://docs.google.com/forms/d/e/1FAIpQLSeOr-
yp8VzYIs4ZtE9HVkRcMJyDcJ2FieM82fUsFoCssHu9DA/viewform) to be notified of new
releases!
(https://stanford.edu/~shervine/teaching/cs-229/refresher-algebra-calculus#cs-229---
machine-learning)CS 229 - Machine Learning (teaching/cs-229) English 
Probabilities Algebra
(https://stanford.edu/~shervine/teaching/cs-
229/refresher-algebra-
calculus#cheatsheet)Linear Algebra and Calculus
refresher Star 13,818
By Afshine Amidi (https://twitter.com/afshinea) and Shervine Amidi (https://twitter.com/shervinea)
(https://stanford.edu/~shervine/teaching/cs-229/refresher-
algebra-calculus#notations)
General notations
Definitions
❐ Vector ― We note x ∈ Rn a vector with n entries, where xi ∈ R is the ith entry:

⎛ x1 ⎞
⎜ x2 ⎟

x=⎜ ⎟ ∈ Rn
⎜ ⋮ ⎟

⎝ xn ⎠
❐ Matrix ― We note A ∈ Rm×n a matrix with m rows and n columns, where Ai,j ∈ R is the
entry located in the ith row and j th column:
https://stanford.edu/~shervine/teaching/cs-229/refresher-algebra-calculus 1/7
⎛ A1,1 ⋯ A1,n ⎞
A=⎜ ⋮ ⋮ ⎟∈R

m×n
⎝ Am,1 Am,n ⎠

⋯
Remark: the vector x defined above can be viewed as a n × 1 matrix and is more particularly called a
column-vector.
Main matrices
❐ Identity matrix ― The identity matrix I ∈ Rn×n is a square matrix with ones in its diagonal and
zero everywhere else:
⎛ 1 0 ⋯ 0 ⎞
⎜ ⋮ ⎟
I=⎜ ⎟
0 ⋱ ⋱
⎜ ⎟
⎜ 0 ⎟

⋮ ⋱ ⋱
⎝ 0 ⋯ 0 1 ⎠
Remark: for all matrices A ∈ Rn×n , we have A × I = I × A = A.
❐ Diagonal matrix ― A diagonal matrix D ∈ Rn×n is a square matrix with nonzero values in its
diagonal and zero everywhere else:
⎛ d1 0 ⋯ 0 ⎞
⎜ ⋮ ⎟

D=⎜ ⎟
0 ⋱ ⋱
⎜ ⎟
⎜ 0 ⎟

⋮ ⋱ ⋱
⎝ 0 ⋯ 0 dn ⎠
Remark: we also note D as diag(d1 , ..., dn ).
algebra-calculus#operations)
Matrix operations
Multiplication
❐ Vector-vector ― There are two types of vector-vector products:
inner product: for x, y ∈ Rn , we have:

n
x y = ∑ xi yi ∈ R
T

i=1
outer product: for x ∈ Rm , y ∈ Rn , we have:
⎛ x1 y1 ⋯ x1 yn ⎞
xy T = ⎜ ⋮ ⋮ ⎟∈R

m×n
⎝ xm y1 xm yn ⎠

⋯
❐ Matrix-vector ― The product of matrix A ∈ Rm×n and vector x ∈ Rn is a vector of size Rm ,

such that:
⎛ ar,1 x ⎞
T
n
Ax = ⎜ ⎟ = ∑ ac,i xi ∈ Rm

⋮
⎝ aT x ⎠ i=1

r,m
where aTr,i are the vector rows and ac,j are the vector columns of A, and xi are the entries of x.

❐ Matrix-matrix ― The product of matrices A ∈ Rm×n and B ∈ Rn×p is a matrix of size Rn×p ,
such that:
⎛ ar,1 bc,1 aTr,1 bc,p ⎞

T
⋯ n
AB = ⎜ ⎟ = ∑ ac,i bTr,i ∈ Rn×p

⋮ ⋮
⎝ aT bc,1 aTr,m bc,p ⎠ i=1

r,m ⋯
where aTr,i , bTr,i are the vector rows and ac,j , bc,j are the vector columns of A and B respectively.

Other operations
❐ Transpose ― The transpose of a matrix A ∈ Rm×n , noted AT , is such that its entries are flipped:
∀i, j, ATi,j = Aj,i

Remark: for matrices A, B , we have (AB)T = B T AT .
❐ Inverse ― The inverse of an invertible square matrix A is noted A−1 and is the only matrix such
that:
AA−1 = A−1 A = I
Remark: not all square matrices are invertible. Also, for matrices A, B , we have (AB)−1 = B −1 A−1
❐ Trace ― The trace of a square matrix A, noted tr(A), is the sum of its diagonal entries:
n
tr(A) = ∑ Ai,i
i=1
Remark: for matrices A, B , we have tr(AT ) = tr(A) and tr(AB) = tr(BA)
❐ Determinant ― The determinant of a square matrix A ∈ Rn×n , noted ∣A∣ or det(A) is

expressed recursively in terms of A\i,\j , which is the matrix A without its ith row and j th column, as

follows:
n
det(A) = ∣A∣ = ∑(−1)i+j Ai,j ∣A\i,\j ∣
j=1
Remark: A is invertible if and only if ∣A∣  0. Also, ∣AB∣ = ∣A∣∣B∣ and ∣AT ∣ = ∣A∣.
=

algebra-calculus#properties)
Matrix properties
Definitions
❐ Symmetric decomposition ― A given matrix A can be expressed in terms of its symmetric and
antisymmetric parts as follows:
A + AT A − AT
A= +
2 2

Symmetric Antisymmetric
❐ Norm ― A norm is a function N : V ⟶ [0, +∞[ where V is a vector space, and such that for
all x, y ∈ V , we have:
N (x + y) ⩽ N (x) + N (y)
N (ax) = ∣a∣N (x) for a scalar
if N (x) = 0, then x = 0
For x ∈ V , the most commonly used norms are summed up in the table below:
Norm Notation Definition Use case

n
Manhattan, L 1
∣∣x∣∣1
∑ ∣xi ∣
LASSO regularization
i=1
n
Euclidean, L 2
∣∣x∣∣2 ∑ x2i Ridge regularization
i=1
1
(∑ xi )
n p
p-norm, L p
∣∣x∣∣p
p

Hölder inequality
i=1
Infinity, L∞ ∣∣x∣∣∞
max ∣xi ∣
Uniform convergence
i
❐ Linearly dependence ― A set of vectors is said to be linearly dependent if one of the vectors in
the set can be defined as a linear combination of the others.
Remark: if no vector can be written this way, then the vectors are said to be linearly independent.
❐ Matrix rank ― The rank of a given matrix A is noted rank(A) and is the dimension of the
vector space generated by its columns. This is equivalent to the maximum number of linearly
independent columns of A.
❐ Positive semi-definite matrix ― A matrix A ∈ Rn×n is positive semi-definite (PSD) and is noted
A ⪰ 0 if we have:
A = AT
and ∀x ∈ Rn , xT Ax ⩾ 0
Remark: similarly, a matrix A is said to be positive definite, and is noted A ≻ 0, if it is a PSD matrix
which satisfies for all non-zero vector x, x T
Ax > 0.
❐ Eigenvalue, eigenvector ― Given a matrix A ∈ Rn×n , λ is said to be an eigenvalue of A if there

exists a vector z ∈ Rn \{0}, called eigenvector, such that we have:
Az = λz
❐ Spectral theorem ― Let A ∈ Rn×n . If A is symmetric, then A is diagonalizable by a real

orthogonal matrix U ∈ Rn×n . By noting Λ = diag(λ1 , ..., λn ), we have:
∃Λ diagonal, A = U ΛU T
❐ Singular-value decomposition ― For a given matrix A of dimensions m × n, the singular-value

decomposition (SVD) is a factorization technique that guarantees the existence of U m×m
unitary, Σ m × n diagonal and V n × n unitary matrices, such that:
A = U ΣV T
algebra-calculus#calculus)
Matrix calculus
❐ Gradient ― Let f : Rm×n → R be a function and A ∈ Rm×n be a matrix. The gradient of f

with respect to A is a m × n matrix, noted ∇A f (A), such that:
∂f (A)
(∇A f (A)) =
∂Ai,j

i,j
Remark: the gradient of f is only defined when f is a function that returns a scalar.
❐ Hessian ― Let f : Rn → R be a function and x ∈ Rn be a vector. The hessian of f with

respect to x is a n × n symmetric matrix, noted ∇2x f (x), such that:
∂ 2 f (x)
(∇2x f (x)) =
∂xi ∂xj

i,j
Remark: the hessian of f is only defined when f is a function that returns a scalar.
❐ Gradient operations ― For matrices A, B, C , the following gradient properties are worth
having in mind:
T
∇A tr(AB) = B T

∇AT f (A) = (∇A f (A))

∇A tr(ABAT C) = CAB + C T AB T
∇A ∣A∣ = ∣A∣(A−1 )T

 (https://twitter.com/shervinea)  (https://linkedin.com/in/shervineamidi) 
(https://github.com/shervinea)  (https://scholar.google.com/citations?user=nMnMTm8AAAAJ) 

CS 229 - Linear Algebra and Calculus Refresher

Uploaded by

Copyright:

Available Formats

You might also like

CS 229 - Linear Algebra and Calculus Refresher

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

CS 229 - Linear Algebra and Calculus Refresher

Uploaded by

Copyright:

Available Formats

8/10/22, 9:10 PM CS 229 - Linear Algebra and Calculus refresher

Want more content like this? Subscribe here

By Afshine Amidi (https://twitter.com/afshinea) and Shervine Amidi (https://twitter.com/shervinea)

entry located in the ith row and j th column:

Remark: for all matrices A ∈ Rn×n , we have A × I = I × A = A.

Remark: we also note D as diag(d1 , ..., dn ). ​ ​

❐ Vector-vector ― There are two types of vector-vector products:

inner product: for x, y ∈ Rn , we have:

outer product: for x ∈ Rm , y ∈ Rn , we have:

❐ Matrix-vector ― The product of matrix A ∈ Rm×n and vector x ∈ Rn is a vector of size Rm ,

⎛ ar,1 bc,1 aTr,1 bc,p ⎞

∀i, j, ATi,j = Aj,i

Remark: for matrices A, B , we have (AB)T = B T AT .

Remark: for matrices A, B , we have tr(AT ) = tr(A) and tr(AB) = tr(BA)

❐ Determinant ― The determinant of a square matrix A ∈ Rn×n , noted ∣A∣ or det(A) is

Norm Notation Definition Use case

❐ Eigenvalue, eigenvector ― Given a matrix A ∈ Rn×n , λ is said to be an eigenvalue of A if there

❐ Spectral theorem ― Let A ∈ Rn×n . If A is symmetric, then A is diagonalizable by a real

❐ Singular-value decomposition ― For a given matrix A of dimensions m × n, the singular-value

❐ Gradient ― Let f : Rm×n → R be a function and A ∈ Rm×n be a matrix. The gradient of f

❐ Hessian ― Let f : Rn → R be a function and x ∈ Rn be a vector. The hessian of f with

You might also like

Remark: we also note D as diag(d1 , ..., dn ).