Strang Linear Algebra Notes
Schwarz inequality: |v · w| ≤ ||v|| ||w||
Triangle inequality: ||v + w|| ≤ ||v|| + ||w||
1.3: Matrices
Multiplying a matrix on the right by a vector (Ax) can be thought of as a linear combination of the columns of A, with the entries of x as the coefficients
One can also think of A as acting on the vector x, insofar as the elements of Ax are combinations of the entries of x according to the corresponding rows of A
Of course, there's also the usual way of multiplying matrices which takes dot products of corresponding
rows and columns (or for a vector, the whole vector)
One other way of looking at the equation Ax = b is as a system of equations, where b is known and we're trying to find x
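As a quick sanity check of the column and row pictures, here's a small numpy sketch (the matrix and vector are made-up examples):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
x = np.array([1.0, 2.0])

# Column picture: Ax is a combination of A's columns, weighted by the entries of x
col_combo = x[0] * A[:, 0] + x[1] * A[:, 1]

# Row picture: each entry of Ax is a dot product of a row of A with x
row_dots = np.array([A[0] @ x, A[1] @ x])

print(col_combo, row_dots, A @ x)   # all three give [4. 7.]
```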
This system of equations is well-behaved, and as a result the matrix A is invertible. What that means is that for every right-hand side b there is exactly one solution x. Note that this doesn't have to be the case: if we had an under- or over-determined system, we could have infinitely many solutions / no solutions at all.
How do these facts about the corresponding systems of equations translate over to the matrix equation Ax = b?
Not having enough independent variables in the equations corresponds to redundant columns in the source matrix, AKA the columns of A are dependent. Not having enough independent equations for our unknowns means that our rows are dependent. Either of these can result in infinitely many solutions or no solutions, depending on b.
Another way to look at dependence: the column / row vectors (in 3D) lie in the same plane, instead of spanning the entire 3D space. This means that not all vectors b can result from a linear combination of the columns, and so there is sometimes no solution.
Having dependent columns / rows means that the matrix is not invertible. If the matrix were invertible, that would imply that there was exactly one solution to the system of equations for every b. Since this is clearly not the case, the matrix cannot be inverted.
Viewing this in column form, it's clear why linearly dependent columns lead to a singular matrix: if the columns are linearly dependent, then for some vectors b there will be no x such that a linear combination of the columns equals b
The row picture, of course, is of two lines in the x-y plane, intersecting in a point
Now say we have a system of 3 linear equations, with a 3x3 coefficient matrix A
The column picture is the one Strang prefers, for its cleanliness
Once elimination produces an upper triangular system, it can be quickly solved with back substitution
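A minimal back-substitution sketch (the upper triangular system here is an arbitrary example):

```python
import numpy as np

def back_substitute(U, c):
    """Solve Ux = c for upper triangular U, working from the last row up."""
    n = len(c)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        # subtract the already-known components, then divide by the pivot
        x[i] = (c[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

U = np.array([[2.0, 1.0, 1.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 4.0]])
c = np.array([5.0, 7.0, 8.0])
print(np.allclose(back_substitute(U, c), np.linalg.solve(U, c)))   # True
```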
Some nomenclature:
The pivot is the first nonzero element in the row that does the elimination (i.e. the row that is
subtracted from other rows)
The multiplier is (entry to eliminate) / (pivot)
Elimination can fail in some scenarios - e.g. the pivot is 0. This happens in systems with no solution, as well
as in systems with infinitely many solutions. Sometimes, the pivot is 0 and the system is still solvable, but
rows need to be exchanged.
Consider the elimination step where we subtract 2x the first row from the second.
The elimination matrix E_ij that subtracts a multiple ℓ of the jth row from the ith row is the identity matrix with -ℓ in its (i, j) entry
The purpose of elimination is to produce a 0 in the row that's being acted upon, in the column of the pivot of the row that's being subtracted
The operation we're performing is a multiplication on the left: E_21 A, one elimination matrix per step
Also, what's the impact of the ordering of the E_ij's on the final elimination matrix?
Ok, so how do we multiply the matrices on the left hand side?
We know what the result of Ax (matrix times vector) is, so we can make various observations from this result
We also know that the rule for multiplying matrices must give the same result as the matrix-vector multiplication when the right-hand matrix is a single column
When performing elimination, add b as the last column (the augmented matrix [A b]), because the same operations act on b as on A
An example is elimination (EA = U) - think about the resulting rows of U and their relation to the rows of the original matrices
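A small sketch of the elimination step above as a left-multiplication (the matrix is made up; the multiplier is 2, matching the step described earlier):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [4.0, 5.0]])

# E_21 subtracts 2x the first row from the second (multiplier = 4/2 = 2)
E21 = np.array([[ 1.0, 0.0],
                [-2.0, 1.0]])

print(E21 @ A)   # [[2. 1.], [0. 3.]]: a zero now sits below the pivot
```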
Block multiplication
You can divide matrices into blocks. If in the multiplication AB, the column cuts of A match the row cuts of B, you can do the resulting multiplication just as if the blocks were numbers
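A quick numpy check of block multiplication with a single cut (the sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 5))
B = rng.standard_normal((5, 3))

# Cut A's columns and B's rows at the same place
A1, A2 = A[:, :2], A[:, 2:]
B1, B2 = B[:2, :], B[2:, :]

# Multiply the blocks as if they were numbers: AB = A1 B1 + A2 B2
print(np.allclose(A1 @ B1 + A2 @ B2, A @ B))   # True
```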
You can see how this (a nonzero solution to Ax = 0) is equivalent to linear dependence of the columns
A matrix is invertible iff its determinant is not 0
Gauss-Jordan Elimination
Inverting A means solving AA^-1 = I, which is 3 systems of 3 linear equations (one per column of I); we can solve them all at once in elimination, on the augmented matrix [A I]
Once you get to an upper triangular matrix, you perform another round of elimination, but upwards, so that all elements above the pivots in U are 0, resulting in a diagonal matrix
Finally, you divide rows through by the elements on the diagonal to get I on the left and A^-1 on the right
We've multiplied [A I] by a series of elimination matrices to get I on the left, which implies that E, the product of all the elimination matrices, is A^-1, and since we've performed the same steps on I, the right-hand matrix is E I = A^-1
Gauss-Jordan form explains why, when we computationally solve Ax = b, we generally won't try to invert A. We need to solve n systems of equations to invert a matrix with dimension n, but to solve Ax = b we simply need to solve one system of equations
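A minimal Gauss-Jordan sketch, assuming no row exchanges are needed (the 2x2 matrix is just an example):

```python
import numpy as np

def gauss_jordan_inverse(A):
    """Invert A by eliminating on the augmented matrix [A | I]."""
    n = A.shape[0]
    aug = np.hstack([A.astype(float), np.eye(n)])
    for i in range(n):
        aug[i] /= aug[i, i]                    # scale so the pivot is 1
        for j in range(n):
            if j != i:
                aug[j] -= aug[j, i] * aug[i]   # clear the rest of column i
    return aug[:, n:]                          # right half is now A^-1

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
print(np.allclose(gauss_jordan_inverse(A), np.linalg.inv(A)))   # True
```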
Elimination is one of these factorizations - when you take EA = U, E is always an invertible matrix (if there are no row exchanges), so this is equivalent to factoring A = E^-1 U = LU
Each step of the elimination is intended to produce a 0 in the (i, j) element of the operand by multiplying by E_ij, with the multiplier -ℓ_ij in the (i, j) element of the elimination matrix; the inverse does the opposite (it adds the row back, with +ℓ_ij)
An amazing thing happens - in the product L (the inverses of the elimination matrices, taken in reverse order), each of the original multipliers goes in directly, without any cross-talk!
It happens because each row of A is a sum of the corresponding row of U and multiples of the rows of U above it: the rows above a given row in U do not change once they've been used for elimination, so the original row of A can be recovered by reversing the subtractions, and those multiples are exactly the multipliers that go into L
If we want U to be cleaner (have 1s on the diagonal like L), we can divide out by a diagonal matrix D to get the new factorization A = LDU
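A small sketch of the multipliers going straight into L (a made-up 3x3 matrix, no row exchanges needed):

```python
import numpy as np

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 5.0, 3.0],
              [2.0, 5.0, 5.0]])

L = np.eye(3)
U = A.copy()
for j in range(3):
    for i in range(j + 1, 3):
        L[i, j] = U[i, j] / U[j, j]   # the multiplier goes straight into L
        U[i] -= L[i, j] * U[j]

print(np.allclose(L @ U, A))   # True: the multipliers reassemble A with no cross-talk
```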
The best way to look at this is with the row picture, multiplying on the left
Some rules: (A + B)^T = A^T + B^T, (AB)^T = B^T A^T, (A^-1)^T = (A^T)^-1
v^T w is called the inner product because the transpose is on the inside; the outer product is v w^T, with the transpose on the outside
Note that the inner product of two n-dimensional vectors is a scalar, while the outer product is an n x n matrix
There's a way to define transposes through dot products, but it doesn't seem to add intuition for me
A^T A and A A^T are both symmetric square matrices (but they're different matrices)
Look at the individual elements - the (i, j) and (j, i) elements are the same, because as you swap the order of the matrices, you also swap rows for columns
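A quick check that A^T A and A A^T are both symmetric but different (random rectangular A):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 2))

AtA, AAt = A.T @ A, A @ A.T
print(np.allclose(AtA, AtA.T))   # True: 2x2 symmetric
print(np.allclose(AAt, AAt.T))   # True: 3x3 symmetric (a different matrix)
```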
Symmetric matrices make elimination a bit easier
Permutation matrices swap rows of the matrix they multiply (when applied on the left)
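A tiny illustration (this particular P swaps the last two rows):

```python
import numpy as np

P = np.array([[1, 0, 0],
              [0, 0, 1],
              [0, 1, 0]])
A = np.arange(9).reshape(3, 3)

print(P @ A)   # rows 1 and 2 of A are exchanged
print(A @ P)   # multiplying on the right permutes columns instead
```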
Ax = b is solvable iff b is in the column space of A
Chapter 4: Orthogonality
Questions
Why is ?
What is a geometric description of the subspace that we project into in least squares?
Need to rederive least squares formulas for myself + get intuitive understanding of each step.
4.2 Projections
We want to find projections of vectors onto subspaces - i.e. find the vectors in those subspaces that are
the closest to the original vector
Our goal, thus, is to find the vector p which is the projection of our original vector b into a subspace, as well as the projection matrix P that takes b to p
The key to projection is orthogonality - consider the example where we're trying to project a vector b onto a line in the direction of a: the resulting vector on the line is p = a (a^T b) / (a^T a)
Since (a^T b) / (a^T a) is a scalar, we can shift it around to get the projection matrix P = a a^T / (a^T a), which is a rank 1 matrix
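A small numerical check of projection onto a line (the vectors a and b are made up):

```python
import numpy as np

a = np.array([1.0, 2.0, 2.0])
b = np.array([1.0, 1.0, 1.0])

P = np.outer(a, a) / (a @ a)   # rank-1 projection matrix a a^T / (a^T a)
p = P @ b                      # projection of b onto the line through a

print(np.linalg.matrix_rank(P))     # 1
print(np.allclose(a @ (b - p), 0))  # True: the error b - p is orthogonal to a
```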
We have n linearly independent vectors in R^m (i.e. they form a basis for a subspace), and we want to find the combination of the vectors that is closest to a given vector b
Our error vector is e = b - Ax̂, where A is the matrix with the vectors as columns
We know that our error vector is perpendicular to the subspace, which implies that it's in the left nullspace of A, so A^T (b - Ax̂) = 0, i.e. A^T A x̂ = A^T b
The solution x̂ that minimizes the error is the least squares solution
The relation between the plane that is formed by the columns of A and the straight line we're fitting to is unclear
So, when Ax = b has no solution, multiply by A^T and solve A^T A x̂ = A^T b
Every Ax (for fitting a straight line) lies in the plane of the columns
We want the point on the plane closest to b, which is the projection p
This gives the smallest error e = b - p, and the points are fitted to a line
Don't really understand how this gives us a solution, it just sketches out what a solution looks
like
By algebra
We can decompose b = p + e, where p is in the column space and e is orthogonal to the column space, i.e. e is in the left nullspace
The squared error is ||Ax - b||^2 = ||Ax - p||^2 + ||e||^2 by the Pythagorean thm, since the two vectors are orthogonal, and it is smallest for Ax = p, i.e. x = x̂
By calculus
We take partial derivatives of the squared error and find the point where they're simultaneously 0; the resulting system of equations is equivalent to A^T A x̂ = A^T b
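A small least-squares sketch via the normal equations (the data points are made up):

```python
import numpy as np

# Fit b = C + D*t to three data points
t = np.array([0.0, 1.0, 2.0])
b = np.array([6.0, 0.0, 0.0])

A = np.column_stack([np.ones_like(t), t])   # columns: all-ones and t
x_hat = np.linalg.solve(A.T @ A, A.T @ b)   # solve A^T A x = A^T b
p = A @ x_hat                               # projection of b onto the column space

print(x_hat)                           # [C, D] = [5. -3.]
print(np.allclose(A.T @ (b - p), 0))   # True: the error is orthogonal to the columns
```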
Previously, when we projected, we had A with its columns as a basis for the subspace we wanted to project into, and P = A (A^T A)^-1 A^T, but if the matrix Q has orthonormal columns, this simplifies to P = Q Q^T (and simplifies even further if Q is square)
Note that if Q is square, the projection matrix Q Q^T is the identity, which makes sense - the space you're projecting into is the whole space
E.g. the least squares solution becomes x̂ = Q^T b
So, you ask, how do we get this orthogonal matrix Q from our sad, sad matrix A?
Very simple - say we have n independent vectors that we want to turn into n orthonormal vectors
Simple: subtract away from each vector the portion that lies in the span of the previous vectors, then normalize (see the sketch below)
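A minimal Gram-Schmidt sketch (classical version, assuming the input vectors are independent):

```python
import numpy as np

def gram_schmidt(vectors):
    """Turn independent vectors into the orthonormal columns of Q."""
    qs = []
    for v in vectors:
        w = v.astype(float)
        for q in qs:
            w -= (q @ v) * q                  # remove the part along each earlier q
        qs.append(w / np.linalg.norm(w))      # normalize what's left
    return np.column_stack(qs)

A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
Q = gram_schmidt([A[:, 0], A[:, 1]])
print(np.allclose(Q.T @ Q, np.eye(2)))   # True: the columns are orthonormal
```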
The QR Factorization
We went from a matrix A to a matrix Q - are these matrices related via matrix operations such that we can factor A = QR?
Yep - the key insight is similar to what makes L so clean in the LU factorization: R = Q^T A is upper triangular because each column a_j involves only q_1 through q_j
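A quick numpy check of the QR factorization (random rectangular A):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))

Q, R = np.linalg.qr(A)                   # reduced QR: Q is 4x3 with orthonormal columns
print(np.allclose(Q.T @ Q, np.eye(3)))   # True
print(np.allclose(np.triu(R), R))        # True: R is upper triangular
print(np.allclose(Q @ R, A))             # True: A = QR
```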
Chapter 5: Determinants
5.1 The Properties of Determinants
3 basic properties, plus more that follow as corollaries:
1. det I = 1
2. The determinant changes sign when two rows or columns are exchanged
3. The determinant is linear in each row separately
5. Subtracting a multiple of one row from another does not change the determinant
8. det A = 0 iff A is singular
If no row exchanges are involved, the product of the pivots is the determinant
det P is ±1, determined by whether the number of row exchanges is even or odd; det U is the product of the pivots
You get this formula because elimination doesn't change the determinant, so for an invertible matrix you can reduce all the way to the diagonal matrix D of pivots, where linearity of the determinant by row means you can factor out each pivot, leaving det I = 1
The pivot formula is much easier to compute, but it's harder to relate the end product back to the terms of
the initial matrix
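A quick check that the product of the pivots gives the determinant (a made-up matrix, no row exchanges needed):

```python
import numpy as np

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 5.0, 3.0],
              [2.0, 5.0, 5.0]])

# Eliminate to upper triangular form, then multiply the pivots
U = A.copy()
for j in range(3):
    for i in range(j + 1, 3):
        U[i] -= (U[i, j] / U[j, j]) * U[j]

print(np.prod(np.diag(U)))   # 16.0
print(np.linalg.det(A))      # 16.0 (up to floating point)
```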
You can derive the big formula for the determinant, esp in the small cases, through applying linearity a bunch
Each term in the formula uses each row and each column exactly once
Determinant by Cofactors
Chapter 6: Eigenvalues and Eigenvectors
Most vectors change direction when multiplied by A, but eigenvectors are distinguished by the fact that Ax is parallel to x
Basic equation: Ax = λx, where the eigenvalue λ determines how the vector is scaled by A
One interesting example is the identity matrix, which has one eigenvalue (λ = 1), but every vector is an eigenvector
When A is raised to a power, the eigenvectors stay the same, while the eigenvalues are raised to the same power
The geometric picture here is clear - the directionally invariant vectors stay that way, while the scaling
gets applied again and again
There's a cool picture here - say you have eigenvalues λ1 = 1 and |λ2| < 1; then the first eigenvector will be the steady state, since it doesn't get scaled as you keep applying A, while the second eigenvector will decay as λ2^k goes to 0
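A quick numerical illustration of the steady state / decay picture (a made-up Markov matrix whose eigenvalues happen to be 1 and 0.7):

```python
import numpy as np

A = np.array([[0.9, 0.2],
              [0.1, 0.8]])   # columns sum to 1

vals, vecs = np.linalg.eig(A)
print(np.sort(vals))         # [0.7 1.0]

# Applying A over and over kills the lambda = 0.7 component,
# leaving the eigenvector with lambda = 1 as the steady state
v = np.array([1.0, 0.0])
for _ in range(50):
    v = A @ v
print(v)                     # ~[0.667, 0.333], proportional to the steady-state eigenvector
```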
Special properties of a matrix lead to special eigenvectors and eigenvalues
So: