
Precept 6

COS 324 Staff


Pre-Precept Poll 1
How was the midterm?

A. Very easy: I didn’t have to think much about the problems
B. Easy: I quickly got most of the problems
C. Moderate: It took me some time to figure out the logic
D. Hard: I struggled to understand a lot of the questions
E. Very hard: I really couldn’t do most of the midterm
Pre-Precept Poll 2
Neural Network or Simpler (Maybe Linear) Model?

I have a giant dataset of images of cats and dogs and want to classify some new images.

A. Use a neural network!
B. A simpler model could work
Pre-Precept Poll 3
Neural Network or Simpler (Maybe Linear) Model?

I have a couple of images each of a zero and a one and want to classify some new images.

A. Use a neural network!
B. A simpler model could work
Pre-Precept Poll 4
Neural Network or Simple Linear Model?

I have a big set of datapoints of square footage versus house price, and want to estimate the price of some new houses.

A. Use a neural network!
B. A simpler model could work
Pre-Precept Poll 5
Neural Network or Simple Linear Model?

I have a big set of satellite images of a house versus house price, and want to estimate the price of some new houses.

A. Use a neural network!
B. A simple linear model could work
Matrix-Vector Product

$\frac{\partial}{\partial x} Ax$, where $A \in \mathbb{R}^{d \times d}$ and $x \in \mathbb{R}^{d}$

What is the dimension of $Ax$?

What is the derivative of $(Ax)_i$ (the $i$th entry of $Ax$) with respect to $x$?
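As a sanity check on these two questions, here is a finite-difference sketch (all variable names are my own, not from the precept): since $Ax$ is linear in $x$, its Jacobian should be exactly $A$, and the gradient of $(Ax)_i$ is row $i$ of $A$.

```python
import numpy as np

# Finite-difference sketch (names are my own): the Jacobian of A @ x with
# respect to x should equal A, so the gradient of (A @ x)[i] is row i of A.
rng = np.random.default_rng(0)
d = 4
A = rng.normal(size=(d, d))
x = rng.normal(size=d)

eps = 1e-6
jac = np.zeros((d, d))
for k in range(d):
    e = np.zeros(d)
    e[k] = eps
    # column k of the Jacobian: how A @ x moves when x_k moves
    jac[:, k] = (A @ (x + e) - A @ (x - e)) / (2 * eps)

print(np.allclose(jac, A, atol=1e-5))  # True: d(Ax)/dx = A
```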

Matrix Factorization
Matrix $M \in \mathbb{R}^{m \times n}$ written as $M = AB$, where $A \in \mathbb{R}^{m \times r}$ and $B \in \mathbb{R}^{r \times n}$

Suppose $M$ is a matrix of $m$ users’ preferences for $n$ movies

$M$ has some missing entries (not everyone has seen every movie)

Each movie has some $r$ attributes: length, genre, etc.

$A$ captures each user’s favorite attributes; $B$ captures each movie’s attributes.

$\sum_{(i,j)\in\Omega} \big(M_{ij} - (AB)_{ij}\big)^2$, where $\Omega$ is the set of non-missing indices in $M$
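A minimal NumPy sketch of this objective (the sizes, names, and random data are my own): sum the squared error between $M$ and $AB$, but only over the observed entries in $\Omega$.

```python
import numpy as np

# Sketch of the matrix-factorization objective (all names are my own):
# squared error between M and A @ B, summed only over observed entries.
rng = np.random.default_rng(0)
m, n, r = 5, 6, 2
A = rng.normal(size=(m, r))
B = rng.normal(size=(r, n))
M = A @ B + rng.normal(scale=0.1, size=(m, n))  # noisy low-rank matrix

observed = rng.random((m, n)) > 0.3  # True where M_ij is not missing

def masked_loss(M, A, B, observed):
    E = M - A @ B                    # error matrix
    return np.sum(E[observed] ** 2)  # only non-missing indices count

print(masked_loss(M, A, B, observed))
```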
Gradient Time!
Building Intuition

We know it is hard to take derivatives wrt vectors, matrices, etc.

Strategies we use:
1. Take derivative wrt one entry, then generalize to vectors, matrices
2. Pretend the problem has only 1D inputs, then try to build up

If $f(X, Y)$ outputs a scalar and $X, Y \in \mathbb{R}^{m \times n}$, what shape should $\frac{\partial f}{\partial X}$ be?
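Strategy 1 can be sketched directly in code (the scalar function $f(X, Y) = \sum_{ij} X_{ij} Y_{ij}$ here is my own toy example): differentiate with respect to one entry $X_{ij}$ at a time, and the collected results form a gradient with the same $m \times n$ shape as $X$.

```python
import numpy as np

# Strategy 1 on a toy scalar function f(X, Y) = sum(X * Y) (my own example):
# take the derivative entry by entry; the results stack into an m x n gradient.
rng = np.random.default_rng(0)
m, n = 3, 4
X = rng.normal(size=(m, n))
Y = rng.normal(size=(m, n))

f = lambda Z: np.sum(Z * Y)  # scalar-valued function of the matrix argument

eps = 1e-6
grad = np.zeros_like(X)
for i in range(m):
    for j in range(n):
        Xp = X.copy(); Xp[i, j] += eps
        Xm = X.copy(); Xm[i, j] -= eps
        grad[i, j] = (f(Xp) - f(Xm)) / (2 * eps)

print(grad.shape)                       # (3, 4): same shape as X
print(np.allclose(grad, Y, atol=1e-5))  # True: for this f, df/dX = Y
```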
Matrix Multiplication!

$f(A, B) = \frac{1}{|\Omega|} \sum_{(i,j)\in\Omega} \big(M_{ij} - (AB)_{ij}\big)^2$

Let $\hat{M} = AB$ (our current approximation for $M$), so $\hat{M}_{ij} = \sum_{z=1}^{r} A_{iz} B_{zj}$
[Figure: $A \in \mathbb{R}^{7 \times 4}$ times $B \in \mathbb{R}^{4 \times 10}$ gives $\hat{M} = AB \in \mathbb{R}^{7 \times 10}$; entry $\hat{M}_{ij}$ sums over the shared index $z$.]
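A quick check of the entry formula (my own example, using the figure's shapes): each entry of $AB$ is the dot product of a row of $A$ with a column of $B$.

```python
import numpy as np

# Check (my own example) that (A @ B)[i, j] equals the dot product of
# row i of A with column j of B, i.e. M_hat_ij = sum_z A_iz * B_zj.
rng = np.random.default_rng(0)
A = rng.normal(size=(7, 4))
B = rng.normal(size=(4, 10))
M_hat = A @ B

print(M_hat.shape)  # (7, 10)
i, j = 2, 5
print(np.isclose(M_hat[i, j], np.dot(A[i, :], B[:, j])))  # True
```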
Back to the objective!
Single Entry Derivative
$f(A, B) = \frac{1}{|\Omega|} \sum_{(i,j)\in\Omega} \big(M_{ij} - \hat{M}_{ij}\big)^2$, where $\hat{M}_{ij} = \sum_{z=1}^{r} A_{iz} B_{zj} = A_{i*} \cdot B_{*j}$

By the chain rule,

$\frac{\partial f}{\partial A_{ik}} = \sum_{j:(i,j)\in\Omega} \frac{\partial f}{\partial \hat{M}_{ij}} \cdot \frac{\partial \hat{M}_{ij}}{\partial A_{ik}}$

What is $\frac{\partial f}{\partial \hat{M}_{ij}}$? What is $\frac{\partial \hat{M}_{ij}}{\partial A_{ik}}$?
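Working through those two questions (my own check, with invented names): $\partial f / \partial \hat{M}_{ij} = -\frac{2}{|\Omega|}(M_{ij} - \hat{M}_{ij})$ and $\partial \hat{M}_{ij} / \partial A_{ik} = B_{kj}$. A finite-difference comparison on a fully observed $M$ (so $|\Omega| = mn$) agrees with the resulting single-entry derivative:

```python
import numpy as np

# Numerical check of the single-entry chain rule (all names my own):
# df/dM_hat_ij = -(2/|Omega|)(M_ij - M_hat_ij), dM_hat_ij/dA_ik = B_kj,
# so df/dA_ik = -(2/|Omega|) * sum_j (M_ij - M_hat_ij) * B_kj.
rng = np.random.default_rng(0)
m, n, r = 5, 6, 3
A = rng.normal(size=(m, r))
B = rng.normal(size=(r, n))
M = rng.normal(size=(m, n))  # fully observed here, so |Omega| = m * n

def f(A):
    return np.sum((M - A @ B) ** 2) / M.size

i, k = 1, 2
analytic = -(2 / M.size) * np.sum((M[i, :] - (A @ B)[i, :]) * B[k, :])

eps = 1e-6
Ap = A.copy(); Ap[i, k] += eps
Am = A.copy(); Am[i, k] -= eps
numeric = (f(Ap) - f(Am)) / (2 * eps)

print(np.isclose(analytic, numeric, atol=1e-5))  # True
```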
An Alternative Representation!
$f(A, B) = \frac{1}{|\Omega|} \sum_{(i,j)\in\Omega} \big(M_{ij} - \hat{M}_{ij}\big)^2 = \frac{1}{|\Omega|} \sum_{(i,j)\in\Omega} E_{ij}^2$

$E$ is a matrix of errors, and we measure the loss only on the non-missing indices.

$\frac{\partial f}{\partial A_{ik}} = \sum_{j:(i,j)\in\Omega} \frac{\partial f}{\partial E_{ij}} \cdot \frac{\partial E_{ij}}{\partial A_{ik}}$

What is $\frac{\partial f}{\partial E_{ij}}$? What is $\frac{\partial E_{ij}}{\partial A_{ik}}$?
Matrix Time
$\frac{\partial f}{\partial A_{ik}} = -\frac{2}{|\Omega|} \sum_{j:(i,j)\in\Omega} \big(M_{ij} - \hat{M}_{ij}\big) B_{kj}$

Assume no entries are missing.

What terms in the derivative change as we change 𝑘 ?

What terms in the derivative change as we change 𝑖 ?

Now try writing the matrix form!
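One way to test a candidate matrix form (this sketch and its names are my own, under the slide's no-missing-entries assumption, so $|\Omega| = mn$): stacking the single-entry derivatives suggests $\frac{\partial f}{\partial A} = -\frac{2}{|\Omega|}(M - \hat{M})B^{\top}$, which can be checked against finite differences.

```python
import numpy as np

# Sketch (my own check, no missing entries): the single-entry derivatives
# stack into the candidate matrix form df/dA = -(2/|Omega|) (M - A@B) @ B.T.
rng = np.random.default_rng(0)
m, n, r = 5, 6, 3
A = rng.normal(size=(m, r))
B = rng.normal(size=(r, n))
M = rng.normal(size=(m, n))

def f(A):
    return np.sum((M - A @ B) ** 2) / M.size

grad_matrix = -(2 / M.size) * (M - A @ B) @ B.T  # candidate matrix form

# Finite-difference gradient, entry by entry
eps = 1e-6
grad_fd = np.zeros_like(A)
for i in range(m):
    for k in range(r):
        Ap = A.copy(); Ap[i, k] += eps
        Am = A.copy(); Am[i, k] -= eps
        grad_fd[i, k] = (f(Ap) - f(Am)) / (2 * eps)

print(np.allclose(grad_matrix, grad_fd, atol=1e-5))  # True
```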


Feedforward Nets Notation

[Figure: a feedforward network with inputs $x_1, x_2$, hidden units $h_1, h_2, h_3$, and output $o$; its edges carry the weights 1, 3, −2, 2, 3, 1, 0, −2, −1.]

If $(x_1, x_2) = (1, -1)$, what is $(h_1, h_2, h_3)$?

If $g$ is the ReLU activation, what is $g((h_1, h_2, h_3))$?

Let $h = (h_1, h_2, h_3)$ and $x = (x_1, x_2)$

How can we pick $W \in \mathbb{R}^{3 \times 2}$ such that $h = Wx$?
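The $h = Wx$ computation followed by ReLU can be sketched in NumPy. The weight matrix below is hypothetical — the figure's exact wiring may differ — with row $k$ of $W$ holding the weights on the edges from $(x_1, x_2)$ into $h_k$:

```python
import numpy as np

# Sketch of h = W x followed by ReLU. W is a hypothetical wiring (the
# precept's figure may assign the edge weights differently): row k of W
# holds the weights from (x1, x2) into h_k.
W = np.array([[ 1.0, 3.0],
              [-2.0, 2.0],
              [ 3.0, 1.0]])  # W in R^{3x2}, chosen for illustration
x = np.array([1.0, -1.0])    # (x1, x2) = (1, -1)

h = W @ x                    # pre-activation hidden values
g_h = np.maximum(h, 0.0)     # ReLU zeroes out the negative entries

print(h)    # [-2. -4.  2.]
print(g_h)  # [0. 0. 2.]
```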

You might also like