
Linear Algebra and Learning from Data

Multiplication Ax and AB
Column space of A
Independent rows and basis
Row rank = column rank

Neural Networks and Deep Learning / new course and book


math.mit.edu/learningfromdata
   
By rows:
$$\begin{bmatrix} 2 & 3 \\ 2 & 4 \\ 3 & 7 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2x_1 + 3x_2 \\ 2x_1 + 4x_2 \\ 3x_1 + 7x_2 \end{bmatrix}$$

By columns:
$$\begin{bmatrix} 2 & 3 \\ 2 & 4 \\ 3 & 7 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = x_1 \begin{bmatrix} 2 \\ 2 \\ 3 \end{bmatrix} + x_2 \begin{bmatrix} 3 \\ 4 \\ 7 \end{bmatrix}$$
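As a quick check on these two viewpoints, here is a minimal numpy sketch (numpy and the sample x are assumptions, not part of the slides) that computes Ax both ways:

```python
import numpy as np

A = np.array([[2, 3],
              [2, 4],
              [3, 7]])
x = np.array([4, 1])   # a sample (x1, x2), chosen only for illustration

# By rows: each entry of Ax is (row i of A) dot x
by_rows = np.array([A[i, :] @ x for i in range(A.shape[0])])

# By columns: Ax is the combination x1*(column 1) + x2*(column 2)
by_columns = x[0] * A[:, 0] + x[1] * A[:, 1]

print(by_rows)                              # [11 12 19]
assert np.array_equal(by_rows, by_columns)  # the two viewpoints agree
```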
$b = (b_1, b_2, b_3)$ is in the column space of A exactly when Ax = b has a solution $(x_1, x_2)$.
 
$$b = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \text{ is not in } C(A).$$

$$Ax = \begin{bmatrix} 2x_1 + 3x_2 \\ 2x_1 + 4x_2 \\ 3x_1 + 7x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \text{ is unsolvable.}$$

The first two equations force $x_1 = \tfrac{1}{2}$ and $x_2 = 0$. Then $3\left(\tfrac{1}{2}\right) + 7(0) = 1.5$ (not 1).
What are the column spaces of
$$A_2 = \begin{bmatrix} 2 & 3 & 5 \\ 2 & 4 & 6 \\ 3 & 7 & 10 \end{bmatrix} \quad \text{and} \quad A_3 = \begin{bmatrix} 2 & 3 & 1 \\ 2 & 4 & 1 \\ 3 & 7 & 1 \end{bmatrix}?$$
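One way to check the answers (a sketch assuming numpy, rather than hand computation) is to compare ranks: column 3 of A2 equals column 1 + column 2, while the three columns of A3 are independent.

```python
import numpy as np

A2 = np.array([[2, 3, 5],
               [2, 4, 6],
               [3, 7, 10]])
A3 = np.array([[2, 3, 1],
               [2, 4, 1],
               [3, 7, 1]])

print(np.linalg.matrix_rank(A2))   # 2: C(A2) is a plane in R^3
print(np.linalg.matrix_rank(A3))   # 3: C(A3) is all of R^3
```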
If column 1 of A is not all zero, put it into C.
If column 2 of A is not a multiple of column 1, put it into C.
If column 3 of A is not a combination of columns 1 and 2, put it
into C. Continue.
At the end C will have r columns (r ≤ n).
They will be a “basis” for the column space of A.
The left out columns are combinations of those basic columns in
C.
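Here is a small sketch of that column-by-column construction (assuming numpy and small exact integers, so the rank test is reliable), using the example matrix A that appears just below:

```python
import numpy as np

def basic_columns(A):
    """Keep a column only if it is not a combination of the columns already kept."""
    kept = []
    for j in range(A.shape[1]):
        candidate = kept + [A[:, j]]
        # the new column is independent exactly when it raises the rank
        if np.linalg.matrix_rank(np.column_stack(candidate)) == len(candidate):
            kept.append(A[:, j])
    return np.column_stack(kept)

A = np.array([[1, 3, 8],
              [1, 2, 6],
              [0, 1, 2]])
C = basic_columns(A)
print(C)        # columns (1,1,0) and (3,2,1): r = 2 basic columns
```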
   
If $A = \begin{bmatrix} 1 & 3 & 8 \\ 1 & 2 & 6 \\ 0 & 1 & 2 \end{bmatrix}$ then $C = \begin{bmatrix} 1 & 3 \\ 1 & 2 \\ 0 & 1 \end{bmatrix}$ (n = 3 columns in A, r = 2 columns in C).
 
If $A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{bmatrix}$ then $C = A$ (n = 3 columns in A, r = 3 columns in C).
   
If $A = \begin{bmatrix} 1 & 2 & 5 \\ 1 & 2 & 5 \\ 1 & 2 & 5 \end{bmatrix}$ then $C = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$ (n = 3 columns in A, r = 1 column in C).
   
$$A = \begin{bmatrix} 1 & 3 & 8 \\ 1 & 2 & 6 \\ 0 & 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 3 \\ 1 & 2 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 2 \end{bmatrix} = CR$$
All we are doing is putting the right numbers in R. Combinations of the columns of C produce the columns of A. Then A = CR stores this information as a matrix multiplication.
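A quick numerical check of this factorization for the example above (a sketch assuming numpy):

```python
import numpy as np

C = np.array([[1, 3],
              [1, 2],
              [0, 1]])
R = np.array([[1, 0, 2],
              [0, 1, 2]])

# Each column of R says how to combine the columns of C to rebuild a column of A
A = C @ R
print(A)
# [[1 3 8]
#  [1 2 6]
#  [0 1 2]]
```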
The number of independent columns equals the number of
independent rows

Look at A = CR by rows instead of columns. R has r rows. Multiplying by C takes combinations of those rows. Since A = CR, we get every row of A from the r rows of R. Those r rows are independent: a basis for the row space of A.
Column-row multiplication of matrices

$$AB = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}\begin{bmatrix} b_1^* \\ b_2^* \\ \vdots \\ b_n^* \end{bmatrix} = \underbrace{a_1 b_1^* + a_2 b_2^* + \cdots + a_n b_n^*}_{\text{sum of rank-1 matrices}}$$

Here $a_k$ is column k of A and $b_k^*$ is row k of B.

The (i, j) entry of $a_k b_k^*$ is $a_{ik} b_{kj}$.


Add to find $c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj} = (\text{row } i) \cdot (\text{column } j)$.
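A short sketch (numpy assumed, with a small made-up example) confirming that the sum of these rank-1 pieces equals the usual row-times-column product AB:

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])
B = np.array([[5., 6.],
              [7., 8.]])

# Column-row multiplication: add up (column k of A)(row k of B), a rank-1 matrix each
rank_one_sum = sum(np.outer(A[:, k], B[k, :]) for k in range(A.shape[1]))

print(rank_one_sum)                      # [[19. 22.], [43. 50.]]
assert np.allclose(rank_one_sum, A @ B)  # same answer as row-times-column
```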

$A = LU \qquad A = QR \qquad S = Q\Lambda Q^{T} \qquad A = X\Lambda X^{-1} \qquad A = U\Sigma V^{T}$
Deep Learning by Neural Networks

1 Key operation Composition F = F3(F2(F1(x0)))
2 Key rule Chain rule for derivatives
3 Key algorithm Stochastic gradient descent
4 Key subroutine Backpropagation
5 Key nonlinearity ReLU(x) = max(x, 0) = ramp function
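To make items 1 and 5 concrete, here is a toy forward pass (the weights A1, b1, A2, b2 are made-up numbers, not from the course) that composes two ReLU layers and a final linear layer:

```python
import numpy as np

def relu(v):
    # Key nonlinearity: ReLU(v) = max(v, 0), applied componentwise
    return np.maximum(v, 0)

# Hypothetical small weights and biases, only to illustrate the composition
A1, b1 = np.array([[1., -1.], [2., 1.]]), np.array([0., -1.])
A2, b2 = np.array([[1., 1.]]), np.array([0.5])

F1 = lambda v: relu(A1 @ v + b1)   # layer 1: linear map, then ReLU folds
F2 = lambda v: relu(A2 @ v + b2)   # layer 2
F3 = lambda v: 3.0 * v             # layer 3: final linear output

x0 = np.array([1., 2.])
print(F3(F2(F1(x0))))              # the composition F = F3(F2(F1(x0))) -> [10.5]
```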
Theorem

Suppose we have N hyperplanes $H_1, \ldots, H_N$ in m-dimensional space $R^m$. Those come from N linear equations $a_i^T x + b_i = 0$, in other words from Ax = b. Then the number of regions bounded by the N hyperplanes (including infinite regions) is probably r(N, m) and certainly not more:
$$r(N, m) = \sum_{i=0}^{m} \binom{N}{i} = \binom{N}{0} + \binom{N}{1} + \cdots + \binom{N}{m}.$$

Thus N = 1 hyperplane in $R^m$ produces $\binom{1}{0} + \binom{1}{1} = 2$ regions (one fold). And N = 2 hyperplanes will produce 1 + 2 + 1 = 4 regions provided m ≥ 2. When m = 1 we have 2 folds in a line, which only separates the line into r(2, 1) = 3 pieces.
The theorem will follow from the recursive formula

r(N, m) = r(N − 1, m) + r(N − 1, m − 1)

Figure: The r(2, 1) = 3 pieces of H create 3 new regions. Then the count becomes r(3, 2) = 4 + 3 = 7 flat regions in the continuous piecewise linear surface $z = F(x_1, x_2)$.
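The formula and the recursion are easy to check against each other in a few lines (a plain-Python sketch); in particular r(2, 1) = 3 and r(3, 2) = 7 as in the figure.

```python
from math import comb
from functools import lru_cache

def r_formula(N, m):
    # r(N, m) = C(N,0) + C(N,1) + ... + C(N,m)
    return sum(comb(N, i) for i in range(m + 1))

@lru_cache(maxsize=None)
def r_recursive(N, m):
    # No hyperplanes: one region.  m = 0: the sum collapses to C(N,0) = 1.
    if N == 0 or m == 0:
        return 1
    # r(N, m) = r(N-1, m) + r(N-1, m-1)
    return r_recursive(N - 1, m) + r_recursive(N - 1, m - 1)

print(r_formula(2, 1), r_formula(3, 2))        # 3 7
assert all(r_formula(N, m) == r_recursive(N, m)
           for N in range(10) for m in range(10))
```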
Backpropagation: Reverse Mode Graph for Derivatives of $x^2(x + y)$

 
Figure: Reverse-mode computation of the gradient $\left(\frac{\partial F}{\partial x}, \frac{\partial F}{\partial y}\right)$ at x = 2, y = 3.
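A hand-written version of that reverse-mode pass (a sketch; the actual graph in the figure is not reproduced here) gives the gradient at x = 2, y = 3:

```python
# Forward pass: intermediate values of F = x^2 * (x + y)
x, y = 2.0, 3.0
s = x * x      # s = x^2 = 4
t = x + y      # t = x + y = 5
F = s * t      # F = 20

# Reverse pass: start from dF/dF = 1 and move backward through the graph
dF_ds = t                      # F = s*t  =>  dF/ds = t = 5
dF_dt = s                      # F = s*t  =>  dF/dt = s = 4
dF_dx = dF_ds * 2 * x + dF_dt  # x enters through s (ds/dx = 2x) and through t (dt/dx = 1)
dF_dy = dF_dt                  # y enters only through t (dt/dy = 1)

print(F, dF_dx, dF_dy)         # 20.0 24.0 4.0
```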
AB first or BC first? Compute (AB)C or A(BC)?

First way:
AB = (m × n)(n × p) has mnp multiplications
(AB)C = (m × p)(p × q) has mpq multiplications

Second way:
BC = (n × p)(p × q) has npq multiplications
A(BC) = (m × n)(n × q) has mnq multiplications
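Plugging in some sample sizes (the numbers below are illustrative, not from the slides) shows how large the gap between the two orders can be:

```python
def cost_ab_first(m, n, p, q):
    # (AB)C: mnp multiplications for AB, then mpq for (AB) times C
    return m * n * p + m * p * q

def cost_bc_first(m, n, p, q):
    # A(BC): npq multiplications for BC, then mnq for A times (BC)
    return n * p * q + m * n * q

m, n, p, q = 100, 2, 100, 50       # A is 100x2, B is 2x100, C is 100x50
print(cost_ab_first(m, n, p, q))   # 520000
print(cost_bc_first(m, n, p, q))   # 20000  -> A(BC) is 26 times cheaper here
```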

You might also like