
Strang's Introduction to Linear Algebra Notes

Chapter 1: Introduction to Vectors


1.2: Lengths and Dot Products
If v · w = 0, the cosine of the angle between the vectors is 0 - AKA the vectors are perpendicular.

u = v / ||v|| is the unit vector in the same direction as v

Schwarz inequality: |v · w| <= ||v|| ||w||

Triangle inequality: ||v + w|| <= ||v|| + ||w||

1.3: Matrices
Matrix multiplication on the right by a vector can be thought of as a linear combination of columns of the
matrix

Ax = x1 u + x2 v + x3 w, where u, v, w are the columns of A

One can also think of A as acting on the vector x: each element of Ax is a linear combination of the entries of x, with coefficients given by the corresponding row of A

One example is the difference matrix A, with rows (1, 0, 0), (-1, 1, 0), (0, -1, 1), so that Ax = (x1, x2 - x1, x3 - x2)

Of course, there's also the usual way of multiplying matrices, which takes dot products of the rows of the matrix with the columns of the other factor (or, for a vector, with the whole vector)
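
A quick numpy sketch (mine, not from the book) checking that the row picture and the column picture give the same Ax:

    import numpy as np

    A = np.array([[2.0, 5.0],
                  [1.0, 3.0]])
    x = np.array([1.0, 2.0])

    # Row picture: each entry of Ax is a dot product of a row of A with x
    row_picture = np.array([A[i, :] @ x for i in range(A.shape[0])])

    # Column picture: Ax is a linear combination of the columns of A
    col_picture = x[0] * A[:, 0] + x[1] * A[:, 1]

    assert np.allclose(row_picture, A @ x)
    assert np.allclose(col_picture, A @ x)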

One other way of looking at the equation Ax = b is as a system of equations, where b is known and we're trying to find x

The system of equations of the difference matrix can be written as follows: x1 = b1, x2 - x1 = b2, x3 - x2 = b3

This system of equations is well-behaved, and as a result the matrix is invertible. What that means
is that for every right-hand side b there is exactly one solution x. Note that this doesn't have to be the
case: if we had an under- or over-determined system, we could have infinitely many solutions / no
solutions at all.
How do these facts about the corresponding systems of equations translate over to the matrix
equation Ax = b?

Not having enough variables in an equation corresponds to redundant columns in the source
matrix, AKA the columns of A are dependent. Not having enough equations for our unknowns
means that our rows are dependent. Either of these can result in infinitely many solutions or no
solutions, depending on b.

Another way to look at dependence: the column / row vectors (in 3D) lie in the same plane,
instead of spanning the entire 3D space. This means that not every vector b can result from a
linear combination of the columns, and there is sometimes no solution.
Having dependent columns / rows means that the matrix is not invertible. If the matrix were
invertible, that would imply that there was a single solution to the system of equations. Since this
is clearly not the case, the matrix cannot be inverted.

Chapter 2: Solving Linear Equations


2.1 Vectors and Linear Equations

Take the following system of equations:

Viewing this in column form, it's clear why linearly dependent columns lead to a singular matrix:

If the columns are linearly dependent, then for some vectors b there will be no x such that a
linear combination of the columns equals b
The row picture, of course, is of two lines in the x-y plane, intersecting in a point

Now say we have a system of 3 linear equations, with coefficient matrix A; the matrix equation is then Ax = b.

What does it mean to multiply x by A?

Matrix multiplication can be thought of as an operation that's defined by its correspondence with the linear system of equations it represents

The row picture - each entry of Ax is the dot product of x with a row of A


The column picture - Ax is a linear combination of the columns of A

The column picture is the one Strang prefers, for its cleanliness

2.2 The Idea of Elimination


A visual of what elimination leads to: the upper triangular system

It's clear that these systems can be quickly solved with back substitution

Some nomenclature:

The pivot is the first nonzero element in the row that does the elimination (i.e. the row that is
subtracted from other rows)
The multiplier is (entry to eliminate) / (pivot coefficient), in the above example, 3
Elimination can fail in some scenarios - e.g. the pivot is 0. This happens in systems with no solution, as well
as in systems with infinitely many solutions. Sometimes, the pivot is 0 and the system is still solvable, but
rows need to be exchanged.
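
A rough sketch of forward elimination plus back substitution (my own code, assuming every pivot is nonzero so no row exchanges are needed):

    import numpy as np

    def solve_by_elimination(A, b):
        """Forward eliminate to an upper triangular system, then back substitute."""
        U = A.astype(float).copy()
        c = b.astype(float).copy()
        n = len(b)
        for j in range(n):                # pivot column
            for i in range(j + 1, n):     # rows below the pivot
                mult = U[i, j] / U[j, j]  # multiplier = entry to eliminate / pivot
                U[i, j:] -= mult * U[j, j:]
                c[i] -= mult * c[j]
        x = np.zeros(n)
        for i in range(n - 1, -1, -1):    # back substitution, bottom row up
            x[i] = (c[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
        return x

    A = np.array([[2.0, 4.0, -2.0], [4.0, 9.0, -3.0], [-2.0, -3.0, 7.0]])
    b = np.array([2.0, 8.0, 10.0])
    assert np.allclose(solve_by_elimination(A, b), np.linalg.solve(A, b))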

2.3 Elimination Using Matrices


What if we want to represent the elimination steps using matrices?

Consider the following elimination step, where we subtract 2 × the first row from the second. Represented in matrix form, this is multiplication on the left by E21 - the identity matrix with -2 in the (2, 1) position.

The elimination matrix E_ij that subtracts ℓ times the jth row from the ith row: change the (i, j) entry of the identity matrix from 0 to -ℓ, and you have E_ij
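
A tiny sketch (the function name is my own) of building E_ij and watching it knock out an entry:

    import numpy as np

    def elimination_matrix(n, i, j, ell):
        """Identity matrix with -ell placed in row i, column j (0-indexed)."""
        E = np.eye(n)
        E[i, j] = -ell
        return E

    A = np.array([[2.0, 1.0, 1.0], [4.0, -6.0, 0.0], [-2.0, 7.0, 2.0]])
    E21 = elimination_matrix(3, 1, 0, 2.0)  # subtract 2 x row 0 from row 1
    print(E21 @ A)                          # row 1 becomes [0, -8, -2]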


How do elimination matrices act on A?

The purpose of elimination is to produce a 0 in the row that's being acted upon, in the position below the pivot of the row that's being subtracted
The operation we're performing is E32 E31 E21 A = U (for a 3 × 3 system)
Also, what's the impact of the ordering of the E's on the final elimination matrix?
Ok, so how do we multiply the matrices on the left hand side?

We know what the result of the elimination is, so we can make various observations from this result

We also know that the rule for multiplying matrices must give the same result as matrix-vector
multiplication when the second factor is a single-column matrix

This also holds for multiple columns: AB = A [b1 b2 ...] = [Ab1 Ab2 ...]

Note that each column Ab_j can be viewed in the column fashion as a linear combination of the columns of A by the elements of b_j
Why?
Cool! We just defined matrix multiplication using matrix-vector multiplication, which came from
linear equations

The augmented matrix: [A b]

When performing elimination, add b as the last column, because the same operations act on b as on A
2.4 Rules for Matrix Operations


AB is a valid matrix product when the dimensions are (m × n)(n × p) - the inner dimensions must match

The (i, j) entry of AB is the dot product of the ith row of A with the jth column of B


Along with the column picture, there's a row picture for matrix multiplication

An example is elimination (EA = U) - think about the resulting rows of U and their relation to the rows of the
original matrices
Block multiplication

You can divide matrices into blocks. If, in the multiplication AB, the column cuts of A match the row
cuts of B, you can do the resulting multiplication just as if the blocks were numbers

One example is doing the multiplication column-times-row instead of row-times-column: AB = (column 1 of A)(row 1 of B) + ... + (column n of A)(row n of B), a sum of outer products
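
A quick check (mine) that summing the outer products (column k of A) times (row k of B) reproduces AB:

    import numpy as np

    A = np.random.rand(3, 4)
    B = np.random.rand(4, 2)

    # AB as a sum of (column of A)(row of B) outer products
    outer_sum = sum(np.outer(A[:, k], B[k, :]) for k in range(A.shape[1]))

    assert np.allclose(outer_sum, A @ B)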

2.5 Inverse Matrices


A number has an inverse if it's not zero - the matrix condition for not having an inverse is more interesting

Some interesting notes on invertibility

Ax = 0 for a nonzero vector x implies that A is not invertible

You can see how this is equivalent to linear dependence of the columns
A matrix is invertible iff its determinant is not 0

Diagonal matrix is invertible iff no diagonal entries are 0

A and B invertible implies AB is invertible, with (AB)^-1 = B^-1 A^-1

For square matrices, right inverses are left inverses

Gauss-Jordan Elimination

Think about A A^-1 = I in column form: A times each column of A^-1 gives the corresponding column of I

This is 3 systems of 3 linear equations; we can solve them all at once with elimination

Once you get to an upper triangular matrix, you perform another round of elimination, but upwards,
so that all elements above the pivots in U are 0, resulting in a diagonal matrix

Finally, you divide rows through by the elements on the diagonal to get I on the left and A^-1 on the
right

Why is the matrix on the right hand side A^-1?

We've multiplied A by a series of elimination matrices to get I, which implies that E, the product
of all the elimination matrices, is A^-1, and since we've performed the same steps on I, the right
hand matrix is E I = A^-1
Gauss-Jordan form explains why, when we computationally solve Ax = b, we generally won't try to invert
A. We need to solve n systems of equations to invert a matrix of dimension n, but to solve Ax = b we
simply need to solve one system of equations

So, when Gauss-Jordan elimination fails, we don't have an invertible matrix
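
A minimal Gauss-Jordan sketch (my own code, assuming A is invertible and no row exchanges are needed): row-reduce the augmented matrix [A | I] until the left half is I, and the right half is A^-1.

    import numpy as np

    def gauss_jordan_inverse(A):
        n = A.shape[0]
        M = np.hstack([A.astype(float), np.eye(n)])  # augmented [A | I]
        for j in range(n):
            M[j] /= M[j, j]                 # scale the pivot row so the pivot is 1
            for i in range(n):
                if i != j:
                    M[i] -= M[i, j] * M[j]  # clear the rest of the pivot column
        return M[:, n:]                     # right half is now the inverse

    A = np.array([[2.0, 1.0], [5.0, 3.0]])
    assert np.allclose(gauss_jordan_inverse(A), np.linalg.inv(A))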

2.6 Elimination = Factorization: A = LU


Lots of the interesting theorems and operations of linear algebra are factorizations of a matrix into
multiple other matrices

Elimination is one of these - when you take EA = U, E is always an invertible matrix (if there are no row
exchanges), so this is equivalent to factoring A = E^-1 U = LU

Each step of the elimination is intended to produce a 0 in the (i, j) element of the operand by multiplying
by E_ij, which has the multiplier -ℓ_ij in its (i, j) element; the inverse E_ij^-1 does the opposite, with +ℓ_ij there

An amazing thing happens - in the product L = E21^-1 E31^-1 E32^-1, each of the original multipliers ℓ_ij goes in directly, without any
cross-talk!

Why does this happen?

It happens because each row of U is a row of A minus multiples of the rows of U above it:
row i of U = (row i of A) - sum over j < i of ℓ_ij (row j of U). The rows above a given row in U do not
change afterwards, so the original row of A can be recovered by adding those multiples back - and
row i of A = sum over j < i of ℓ_ij (row j of U) + (row i of U) is exactly row i of LU
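
A sketch (mine, no row exchanges assumed) showing the multipliers dropping straight into L, with A = LU at the end:

    import numpy as np

    def lu_no_exchanges(A):
        n = A.shape[0]
        U = A.astype(float).copy()
        L = np.eye(n)
        for j in range(n):
            for i in range(j + 1, n):
                L[i, j] = U[i, j] / U[j, j]  # the multiplier goes directly into L
                U[i, j:] -= L[i, j] * U[j, j:]
        return L, U

    A = np.array([[2.0, 1.0, 0.0], [4.0, 3.0, 1.0], [-2.0, 1.0, 5.0]])
    L, U = lu_no_exchanges(A)
    assert np.allclose(L @ U, A)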

If we want U to be cleaner (have 1s on the diagonal like L), we can divide out by a diagonal matrix D of the pivots to get the new
factorization A = LDU

The best way to look at this is with the row picture, multiplying on the left

2.7 Transposes and Permutations


Transpose: the matrix flips over its diagonal, (A^T)_ij = A_ji

Some rules: (A + B)^T = A^T + B^T, (AB)^T = B^T A^T, (A^-1)^T = (A^T)^-1

Dot product / inner product

The inner product of two vectors is v^T w

Called the inner product because the transpose is on the inside; the outer product is v w^T

Note that the inner product of two n-dimensional vectors is a scalar, while the outer product is
an n × n matrix
There's a way to define transposes through dot products, but it doesn't seem to add intuition for me

Symmetric matrices have S^T = S

The inverse of a symmetric matrix is also symmetric


You can produce a symmetric square matrix from any rectangular matrix

A^T A and A A^T are both symmetric square matrices (but they're different matrices)

Look at the individual elements - the (i, j) and (j, i) elements are the same because as you swap
the order of the matrices, you also swap rows for columns
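
Quick numpy check (mine) that A^T A and A A^T are both symmetric but are different matrices when A is rectangular:

    import numpy as np

    A = np.random.rand(4, 2)        # rectangular
    AtA = A.T @ A                   # 2 x 2
    AAt = A @ A.T                   # 4 x 4

    assert np.allclose(AtA, AtA.T)  # symmetric
    assert np.allclose(AAt, AAt.T)  # symmetric
    assert AtA.shape != AAt.shape   # but they're different matrices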
Symmetric matrices make elimination a bit easier

During elimination of a symmetric matrix, A = LDU turns into S = LDL^T, so we only have to
perform elimination to find L and D, not carry the work over to the upper factor

Why is this the case?


Permutation matrices

Permutation matrices swap rows of the matrix they're multiplied by (when multiplying on the left)

One way to look at this: P is a product of single row exchanges, and P^T is the product of the
transposes in reverse order; since the transpose of a single row exchange is still that same row
exchange, P^T is the row exchanges in reverse order, which is exactly the
inverse, so P^T = P^-1
The transpose of a row exchange is the same row exchange because swapping row i and row j is
equivalent to swapping column i and column j in the identity matrix
Bringing this back to elimination - the A = LU factorization doesn't always work - namely when we need row
exchanges to eliminate

If we perform all our permutations first, then we get PA = LU
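
A quick check using scipy (as far as I know, scipy.linalg.lu returns P, L, U with A = P L U, so P^T A = L U is the "permute first, then eliminate" form):

    import numpy as np
    from scipy.linalg import lu

    A = np.array([[0.0, 2.0], [3.0, 4.0]])  # needs a row exchange: the first pivot is 0
    P, L, U = lu(A)                         # scipy's convention: A = P @ L @ U
    assert np.allclose(P.T @ A, L @ U)      # permute the rows of A first, then factor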

Chapter 3: Vector Spaces and Subspaces


3.1: Spaces of Vectors
The key attribute of vector spaces is that they're closed under linear combinations

Subspaces are vector spaces enclosed in a larger space (and are themselves closed under linear combinations)

Note that subspaces must always include the 0 vector!


The column space of A is the space of all linear combinations of the columns, and is a subspace of
R^m, where m is the number of rows

Ax = b is solvable iff b is in the column space C(A)

3.2: The Nullspace of A: Solving Ax = 0

3.3 The Rank and the Row Reduced Form

3.4 The Complete Solution to Ax = b


See Notes for 2.2 from Applications, much better explanation there.

3.5 Independence, Basis, and Dimension


Most notes in Applications 2.3

Bases for Matrix Spaces and Function Spaces

Chapter 4: Orthogonality

Questions
Why is ?

What is a geometric description of the subspace that we project into in least squares?

Need to rederive projection formulas for myself.

Need to rederive least squares formulas for myself + get intuitive understanding of each step.

4.1 Orthogonality of the Four Subspaces

4.2 Projections
We want to find projections of vectors onto subspaces - i.e. find the vectors in those subspaces that are
the closest to the original vector

Our goal, thus, is to find the vector p which is the projection of our original vector b into a subspace, as
well as the projection matrix P that takes b to p

Every subspace of R^m has an m × m projection matrix

Projection Onto a Line

The key to projection is orthogonality - consider the example where we're trying to project a vector b onto
a line in the direction of a: the resulting vector on the line is p

The error vector e = b - p connecting p and b is perpendicular to a

The projection is a multiple of a: p = x̂ a, with x̂ = (a^T b) / (a^T a)

Since x̂ is a scalar, we can rearrange to get the projection matrix P = (a a^T) / (a^T a), which is a rank 1
matrix
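
A small numpy sketch (mine) of projecting b onto the line through a, including the rank 1 projection matrix:

    import numpy as np

    a = np.array([1.0, 2.0, 2.0])
    b = np.array([1.0, 1.0, 1.0])

    x_hat = (a @ b) / (a @ a)      # scalar coefficient
    p = x_hat * a                  # projection of b onto the line through a
    e = b - p                      # error vector

    P = np.outer(a, a) / (a @ a)   # rank 1 projection matrix
    assert np.allclose(P @ b, p)
    assert np.isclose(a @ e, 0.0)  # error is perpendicular to a
    assert np.linalg.matrix_rank(P) == 1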

Projection Onto a Subspace

We have n linearly independent vectors in R^m (i.e. they form a basis for a subspace), and we want to
find the combination of the vectors that is closest to a given vector b

Our error vector is e = b - A x̂, where A is the matrix with the vectors as columns

We know that our error vector is perpendicular to the subspace, which implies that it's in the left
nullspace, so A^T (b - A x̂) = 0

Solving this, we get x̂ = (A^T A)^-1 A^T b, so p = A x̂ and P = A (A^T A)^-1 A^T

Why are we guaranteed A^T A is invertible?

The nullspace of A^T A: if A^T A x = 0, then Ax is in the left nullspace of A; but Ax is also in the
column space of A, and since the column space of A is orthogonal to
the left nullspace of A, this forces Ax = 0, so A^T A has the same nullspace as A
When A has linearly independent columns, its nullspace is trivial, so A^T A also has trivial
nullspace, and since it's square, that means it's invertible
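
A sketch (mine) of projecting b onto the column space of A with P = A (A^T A)^-1 A^T:

    import numpy as np

    A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # independent columns
    b = np.array([6.0, 0.0, 0.0])

    x_hat = np.linalg.solve(A.T @ A, A.T @ b)  # normal equations
    p = A @ x_hat                              # projection of b onto C(A)
    P = A @ np.linalg.inv(A.T @ A) @ A.T       # projection matrix

    assert np.allclose(P @ b, p)
    assert np.allclose(P @ P, P)               # projecting twice changes nothing
    assert np.allclose(A.T @ (b - p), 0)       # error lives in the left nullspace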

4.3 Least Squares Approximations


Often we have Ax = b where there are more rows than columns, and b is not in C(A), but we still want
to get a solution

The error e = b - Ax is nonzero in this case, but we want to minimize it

The solution x̂ that minimizes the error is the least squares solution

A x̂ = p is the projection of b into C(A)

What is the subspace we're projecting into?

The relation between the plane formed by the columns of A and the straight line we're fitting to is
unclear
So, when Ax = b has no solution, multiply by A^T and solve A^T A x̂ = A^T b

But how to solve this equation?


By geometry

Every Ax (for fitting a straight line) lies in the plane of the columns
We want the point on the plane closest to b, which is the projection p
This gives the smallest error e, and the points are fitted to a line
Don't really understand how this gives us a solution, it just sketches out what a solution looks
like
By algebra

We can decompose b = p + e, where p is in the column space and e is orthogonal to the column
space, i.e. e is in the left nullspace
The error is ||Ax - b||^2 = ||Ax - p||^2 + ||e||^2 by the Pythagorean theorem, since the two vectors are
orthogonal, and this is minimized for A x̂ = p
By calculus

We take partial derivatives of the squared error, and find the point where they're simultaneously 0; the
resulting system of equations is equivalent to A^T A x̂ = A^T b
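
A least squares sketch (mine): fit a line C + Dt to three points via the normal equations, and compare against np.linalg.lstsq:

    import numpy as np

    t = np.array([0.0, 1.0, 2.0])
    b = np.array([6.0, 0.0, 0.0])
    A = np.column_stack([np.ones_like(t), t])     # columns: [1, t] for the line C + D t

    x_normal = np.linalg.solve(A.T @ A, A.T @ b)  # solve A^T A x = A^T b
    x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

    assert np.allclose(x_normal, x_lstsq)         # C = 5, D = -3 for these points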

4.4 Orthogonal Bases and Gram-Schmidt


An orthogonal matrix is a matrix with orthonormal columns

An orthogonal matrix is really easy to work with

Q^T Q = I, which is clearly seen by taking dot products of the orthonormal columns

For the special case where Q is square, Q^T = Q^-1
It's obvious that every permutation matrix is orthogonal (each column is a single 1 and the rest 0s)

Multiplication by orthogonal matrices leaves lengths unchanged (||Qx|| = ||x||), as well as dot products

Projections Using Orthogonal Bases



Previously, when we projected, we had A with its columns as a basis for the subspace we wanted to project
into, and P = A (A^T A)^-1 A^T; but if the columns are orthonormal (A = Q), this simplifies to P = Q Q^T (and simplifies even
further if Q is square)

Note that if Q is square, the projection matrix is the identity, which makes sense - the space you're
projecting into is the whole space
E.g. the least squares solution becomes x̂ = Q^T b
So, you ask, how do we get this orthogonal matrix Q from our sad, sad matrix A?

The Gram-Schmidt Process

Very simple - say we have independent vectors a, b, c that we want to turn into orthonormal vectors q1, q2, q3

Simple: subtract away from each vector the portion that lies in the span of previous vectors

I.e. b has a portion that is parallel to a and a portion that is perpendicular.

The perpendicular portion is B = b - (a^T b / a^T a) a

For c, subtract away both the projection onto a and the projection onto B

At the end, divide by the lengths to get orthonormal vectors
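
A bare-bones Gram-Schmidt sketch (my code, assuming the input columns are independent):

    import numpy as np

    def gram_schmidt(A):
        """Return Q with orthonormal columns spanning the same space as A's columns."""
        Q = []
        for a in A.T:                        # work column by column
            v = a.astype(float)
            for q in Q:
                v = v - (q @ a) * q          # subtract the projection onto each earlier q
            Q.append(v / np.linalg.norm(v))  # normalize the perpendicular part
        return np.column_stack(Q)

    A = np.array([[1.0, 1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 0.0]])
    Q = gram_schmidt(A)
    assert np.allclose(Q.T @ Q, np.eye(2))   # orthonormal columns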

The QR Factorization

We went from a matrix A to a matrix Q - are these matrices related via matrix operations such that we
can factor A?
Yep - the key insight is similar to what makes L so clean in the LU factorization

in A = QR, each column of A is made up of a combination of that column's q and the preceding q's in Q; later columns aren't involved, which is why R is upper triangular


Any A with independent columns can be factored into A = QR, which is useful for least squares because it implies the normal equations A^T A x̂ = A^T b become R^T R x̂ = R^T Q^T b, i.e. R x̂ = Q^T b
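
A quick check (mine) that the QR route gives the same least squares answer, by solving R x̂ = Q^T b:

    import numpy as np

    A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
    b = np.array([6.0, 0.0, 0.0])

    Q, R = np.linalg.qr(A)                        # reduced QR: Q is 3x2, R is 2x2 upper triangular
    x_qr = np.linalg.solve(R, Q.T @ b)            # R is triangular, so this is just back substitution
    x_normal = np.linalg.solve(A.T @ A, A.T @ b)  # normal equations for comparison
    assert np.allclose(x_qr, x_normal)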

Chapter 5: Determinants
5.1 The Properties of Determinants
3 basic properties, plus more that follow as corollaries:

1. det I = 1

2. The determinant changes sign when two rows or columns are exchanged

3. The determinant is linear in each row separately (holding the other rows fixed)


The multiplicative linearity has a close relation to area / volume - multiply an n-dimensional matrix by
t, and the determinant picks up a factor of t^n, as if the determinant represented the volume taken up by the matrix

4. If two rows of A are equal, det A = 0

5. Subtracting a multiple of one row from another does not change the determinant

6. A matrix with a row of 0s has det = 0

7. If A is triangular then the determinant is the product of the diagonal entries

8. det A = 0 iff A is singular

9. det(AB) = (det A)(det B)

10. det(A^T) = det A

Every rule that applies to rows also applies to columns

5.2 Permutations and Cofactors


The Pivot Formula

If no row exchanges are involved, the product of the pivots is the determinant

(det P)(det A) = (det L)(det U)

det A = (det P)(det U), since det L = 1 and det P = ±1

det P is ±1, determined by the number of row exchanges; det U is the product of the pivots
You get this formula because subtracting multiples of rows during elimination doesn't change the determinant, so for an invertible matrix you
can reduce all the way to a triangular U, where linearity of the determinant in each row means you can factor out each
pivot
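
A sketch (mine) of the pivot formula using scipy's LU (its convention, as far as I know, is A = P L U, so det A = (det P) × product of the pivots):

    import numpy as np
    from scipy.linalg import lu

    A = np.array([[0.0, 2.0, 1.0], [1.0, 1.0, 0.0], [2.0, 7.0, 8.0]])
    P, L, U = lu(A)
    pivots = np.diag(U)      # the pivots sit on the diagonal of U
    sign = np.linalg.det(P)  # +1 or -1, from the row exchanges
    assert np.isclose(sign * np.prod(pivots), np.linalg.det(A))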

The Big Formula for Determinants

The pivot formula is much easier to compute, but it's harder to relate the end product back to the terms of
the initial matrix

The Big Formula has n! terms - fuck me

You can derive the formula, esp in the simple case, through applying linearity a bunch

det A = the sum over all n! column permutations (p1, ..., pn) of ± a_1p1 a_2p2 ... a_npn, with the sign given by the permutation

Each term in the formula uses each row and column once

Determinant by Cofactors

Another way to simplify the big formula is to group by a row or column

E.g. in the 3 × 3 case, det A = a11(a22 a33 - a23 a32) + a12(a23 a31 - a21 a33) + a13(a21 a32 - a22 a31)

The three things in parentheses are cofactors - determinants of smaller submatrices

det A = a_i1 C_i1 + ... + a_in C_in, where the cofactor C_ij = (-1)^(i+j) det(M_ij) and M_ij is the submatrix that throws out row i and column j of A

Cofactors are useful when a row or column has lots of 0s
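
A recursive cofactor-expansion sketch (my code; fine for small matrices, hopeless for large ones given the n! growth):

    import numpy as np

    def det_by_cofactors(A):
        """Expand along the first row: det A = sum over j of a_0j * C_0j."""
        n = A.shape[0]
        if n == 1:
            return A[0, 0]
        total = 0.0
        for j in range(n):
            minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)  # throw out row 0, column j
            total += (-1) ** j * A[0, j] * det_by_cofactors(minor)
        return total

    A = np.random.rand(4, 4)
    assert np.isclose(det_by_cofactors(A), np.linalg.det(A))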

5.3 Cramer's Rule, Inverses, and Volumes


Doesn't seem to be super important. Gives a closed form solution for Ax = b which is useful for some
abstract proofs, gives a closed form solution for the inverse in terms of cofactors and the determinant, and shows
some neat things about how the determinant relates to areas and volumes.

Chapter 6: Eigenvalues and Eigenvectors


6.1 Introduction to Eigenvalues
Eigenvalues and eigenvectors become key when we consider dynamic systems

E.g. we can find the steady state of a system, the limit of A^k u0, by using eigenvalues and eigenvectors

Most vectors change direction when multiplied by A, but eigenvectors are distinguished by the fact that
Ax is parallel to x

Basic equation: Ax = λx, where the eigenvalue λ determines how the vector is scaled by A
One interesting example is the identity matrix, which has one eigenvalue (λ = 1) but every vector is an
eigenvector

Most n × n matrices will have n independent eigenvectors and n eigenvalues

When A is raised to a power, the eigenvectors stay the same, while the eigenvalues are raised to the same
power

The geometric picture here is clear - the directionally invariant vectors stay that way, while the scaling
gets applied again and again
There's a cool picture here - say you have eigenvalues λ1 = 1 and |λ2| < 1; then the first eigenvector will be the
steady state since it doesn't get scaled as you keep applying A, while the second eigenvector's component will
decay as λ2^k goes to 0
Special properties of a matrix lead to special eigenvectors and eigenvalues

So: (A - λI)x = 0

This has a nonzero solution when A - λI is singular, i.e. det(A - λI) = 0


We can then solve this equation for the eigenvalues, and use the eigenvalues to get the eigenvectors
by substituting the eigenvalue and solving by elimination
Note that elimination does not preserve the eigenvalues

There are two useful sanity checks we can get out - trace(A) = λ1 + ... + λn and det A = λ1 λ2 ... λn - i.e. the sum of the diagonal is the sum of the
eigenvalues, and the product of the eigenvalues is the determinant
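
A quick numpy check (mine) of both sanity checks, plus the fact that powers of A keep the eigenvectors and raise the eigenvalues:

    import numpy as np

    A = np.array([[0.8, 0.3], [0.2, 0.7]])           # a Markov-style matrix
    lam, X = np.linalg.eig(A)                        # eigenvalues, eigenvectors as columns of X

    assert np.isclose(lam.sum(), np.trace(A))        # sum of eigenvalues = trace
    assert np.isclose(lam.prod(), np.linalg.det(A))  # product of eigenvalues = determinant

    # A^k has the same eigenvectors, with eigenvalues lam**k
    k = 5
    assert np.allclose(np.linalg.matrix_power(A, k) @ X, X @ np.diag(lam ** k))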
