
Precept 6

COS 324 Staff


Pre-Precept Poll 1
How was the midterm?

A. Very easy: I didn’t have to think much about the problems
B. Easy: I quickly got most of the problems
C. Moderate: It took me some time to figure out the logic
D. Hard: I struggled to understand a lot of the questions
E. Very hard: I really couldn’t do most of the midterm
Pre-Precept Poll 2
Neural Network or Simpler (Maybe Linear) Model?

I have a giant dataset of images of cats and dogs and want to classify some new images.

A. Use a neural network!
B. A simpler model could work
Pre-Precept Poll 3
Neural Network or Simpler (Maybe Linear) Model?

I have a couple of images each of a zero and a one and want to classify some new images.

A. Use a neural network!
B. A simpler model could work
Pre-Precept Poll 4
Neural Network or Simple Linear Model?

I have a big set of datapoints of square footage versus house price, and want to estimate the price of some new houses.

A. Use a neural network!
B. A simpler model could work
Pre-Precept Poll 5
Neural Network or Simple Linear Model?

I have a big set of satellite images of a house versus house price, and want to estimate the price of some new houses.

A. Use a neural network!
B. A simple linear model could work
Matrix-Vector Product

$\frac{\partial}{\partial x} Ax$, where $A \in \mathbb{R}^{d \times d}$ and $x \in \mathbb{R}^{d}$

What is the dimension of $Ax$?

What is the derivative of $(Ax)_i$ (the $i$th entry of $Ax$) with respect to $x$?
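As a sanity check on these two questions, here is a finite-difference sketch (all variable names are my own, not from the precept): since $Ax$ is linear in $x$, its Jacobian should be exactly $A$, and the gradient of $(Ax)_i$ is row $i$ of $A$.

```python
import numpy as np

# Finite-difference sketch (names are my own): the Jacobian of A @ x with
# respect to x should equal A, so the gradient of (A @ x)[i] is row i of A.
rng = np.random.default_rng(0)
d = 4
A = rng.normal(size=(d, d))
x = rng.normal(size=d)

eps = 1e-6
jac = np.zeros((d, d))
for k in range(d):
    e = np.zeros(d)
    e[k] = eps
    # column k of the Jacobian: how A @ x moves when x_k moves
    jac[:, k] = (A @ (x + e) - A @ (x - e)) / (2 * eps)

print(np.allclose(jac, A, atol=1e-5))  # True: d(Ax)/dx = A
```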

Matrix Factorization
Matrix $M \in \mathbb{R}^{m \times n}$ written as $M = AB$, where $A \in \mathbb{R}^{m \times r}$ and $B \in \mathbb{R}^{r \times n}$

Suppose $M$ is a matrix of $m$ users’ preferences for $n$ movies

$M$ has some missing entries (not everyone has seen every movie)

Each movie has some $r$ attributes: length, genre, etc.

$A$ captures each user’s favorite attributes; $B$ captures each movie’s attributes.

$\sum_{(i,j)\in\Omega} \big(M_{ij} - (AB)_{ij}\big)^2$, where $\Omega$ is the set of non-missing indices in $M$
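A minimal NumPy sketch of this objective (the sizes, names, and random data are my own): sum the squared error between $M$ and $AB$, but only over the observed entries in $\Omega$.

```python
import numpy as np

# Sketch of the matrix-factorization objective (all names are my own):
# squared error between M and A @ B, summed only over observed entries.
rng = np.random.default_rng(0)
m, n, r = 5, 6, 2
A = rng.normal(size=(m, r))
B = rng.normal(size=(r, n))
M = A @ B + rng.normal(scale=0.1, size=(m, n))  # noisy low-rank matrix

observed = rng.random((m, n)) > 0.3  # True where M_ij is not missing

def masked_loss(M, A, B, observed):
    E = M - A @ B                    # error matrix
    return np.sum(E[observed] ** 2)  # only non-missing indices count

print(masked_loss(M, A, B, observed))
```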
Gradient Time!
Building Intuition

We know it is hard to take derivatives wrt vectors, matrices, etc.

Strategies we use:
1. Take derivative wrt one entry, then generalize to vectors, matrices
2. Pretend the problem has only 1D inputs, then try to build up

If $f(X, Y)$ outputs a scalar and $X, Y \in \mathbb{R}^{m \times n}$, what shape should $\frac{\partial f}{\partial X}$ be?
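Strategy 1 can be sketched directly in code (the scalar function $f(X, Y) = \sum_{ij} X_{ij} Y_{ij}$ here is my own toy example): differentiate with respect to one entry $X_{ij}$ at a time, and the collected results form a gradient with the same $m \times n$ shape as $X$.

```python
import numpy as np

# Strategy 1 on a toy scalar function f(X, Y) = sum(X * Y) (my own example):
# take the derivative entry by entry; the results stack into an m x n gradient.
rng = np.random.default_rng(0)
m, n = 3, 4
X = rng.normal(size=(m, n))
Y = rng.normal(size=(m, n))

f = lambda Z: np.sum(Z * Y)  # scalar-valued function of the matrix argument

eps = 1e-6
grad = np.zeros_like(X)
for i in range(m):
    for j in range(n):
        Xp = X.copy(); Xp[i, j] += eps
        Xm = X.copy(); Xm[i, j] -= eps
        grad[i, j] = (f(Xp) - f(Xm)) / (2 * eps)

print(grad.shape)                       # (3, 4): same shape as X
print(np.allclose(grad, Y, atol=1e-5))  # True: for this f, df/dX = Y
```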
Matrix Multiplication!

$f(A, B) = \frac{1}{|\Omega|} \sum_{(i,j)\in\Omega} \big(M_{ij} - (AB)_{ij}\big)^2$

Let $\hat{M} = AB$ (our current approximation for $M$), so $\hat{M}_{ij} = \sum_{z=1}^{r} A_{iz} B_{zj}$
[Figure: $A \in \mathbb{R}^{7 \times 4}$ times $B \in \mathbb{R}^{4 \times 10}$ gives $\hat{M} = AB \in \mathbb{R}^{7 \times 10}$; entry $\hat{M}_{ij}$ sums over the shared index $z$.]
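A quick check of the entry formula (my own example, using the figure's shapes): each entry of $AB$ is the dot product of a row of $A$ with a column of $B$.

```python
import numpy as np

# Check (my own example) that (A @ B)[i, j] equals the dot product of
# row i of A with column j of B, i.e. M_hat_ij = sum_z A_iz * B_zj.
rng = np.random.default_rng(0)
A = rng.normal(size=(7, 4))
B = rng.normal(size=(4, 10))
M_hat = A @ B

print(M_hat.shape)  # (7, 10)
i, j = 2, 5
print(np.isclose(M_hat[i, j], np.dot(A[i, :], B[:, j])))  # True
```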
Back to the objective!
Single Entry Derivative
$f(A, B) = \frac{1}{|\Omega|} \sum_{(i,j)\in\Omega} \big(M_{ij} - \hat{M}_{ij}\big)^2$, where $\hat{M}_{ij} = \sum_{z=1}^{r} A_{iz} B_{zj} = A_{i*} \cdot B_{*j}$

By the chain rule,

$\frac{\partial f}{\partial A_{ik}} = \sum_{j:(i,j)\in\Omega} \frac{\partial f}{\partial \hat{M}_{ij}} \cdot \frac{\partial \hat{M}_{ij}}{\partial A_{ik}}$

What is $\frac{\partial f}{\partial \hat{M}_{ij}}$? What is $\frac{\partial \hat{M}_{ij}}{\partial A_{ik}}$?
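Working through those two questions (my own check, with invented names): $\partial f / \partial \hat{M}_{ij} = -\frac{2}{|\Omega|}(M_{ij} - \hat{M}_{ij})$ and $\partial \hat{M}_{ij} / \partial A_{ik} = B_{kj}$. A finite-difference comparison on a fully observed $M$ (so $|\Omega| = mn$) agrees with the resulting single-entry derivative:

```python
import numpy as np

# Numerical check of the single-entry chain rule (all names my own):
# df/dM_hat_ij = -(2/|Omega|)(M_ij - M_hat_ij), dM_hat_ij/dA_ik = B_kj,
# so df/dA_ik = -(2/|Omega|) * sum_j (M_ij - M_hat_ij) * B_kj.
rng = np.random.default_rng(0)
m, n, r = 5, 6, 3
A = rng.normal(size=(m, r))
B = rng.normal(size=(r, n))
M = rng.normal(size=(m, n))  # fully observed here, so |Omega| = m * n

def f(A):
    return np.sum((M - A @ B) ** 2) / M.size

i, k = 1, 2
analytic = -(2 / M.size) * np.sum((M[i, :] - (A @ B)[i, :]) * B[k, :])

eps = 1e-6
Ap = A.copy(); Ap[i, k] += eps
Am = A.copy(); Am[i, k] -= eps
numeric = (f(Ap) - f(Am)) / (2 * eps)

print(np.isclose(analytic, numeric, atol=1e-5))  # True
```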
An Alternative Representation!
$f(A, B) = \frac{1}{|\Omega|} \sum_{(i,j)\in\Omega} \big(M_{ij} - \hat{M}_{ij}\big)^2 = \frac{1}{|\Omega|} \sum_{(i,j)\in\Omega} E_{ij}^2$

$E$ is a matrix of errors, and we measure the loss only on the non-missing indices.

$\frac{\partial f}{\partial A_{ik}} = \sum_{j:(i,j)\in\Omega} \frac{\partial f}{\partial E_{ij}} \cdot \frac{\partial E_{ij}}{\partial A_{ik}}$

What is $\frac{\partial f}{\partial E_{ij}}$? What is $\frac{\partial E_{ij}}{\partial A_{ik}}$?
Matrix Time
$\frac{\partial f}{\partial A_{ik}} = -\frac{2}{|\Omega|} \sum_{j:(i,j)\in\Omega} \big(M_{ij} - \hat{M}_{ij}\big) B_{kj}$

Assume no entries are missing.

What terms in the derivative change as we change 𝑘 ?

What terms in the derivative change as we change 𝑖 ?

Now try writing the matrix form!
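One way to test a candidate matrix form (this sketch and its names are my own, under the slide's no-missing-entries assumption, so $|\Omega| = mn$): stacking the single-entry derivatives suggests $\frac{\partial f}{\partial A} = -\frac{2}{|\Omega|}(M - \hat{M})B^{\top}$, which can be checked against finite differences.

```python
import numpy as np

# Sketch (my own check, no missing entries): the single-entry derivatives
# stack into the candidate matrix form df/dA = -(2/|Omega|) (M - A@B) @ B.T.
rng = np.random.default_rng(0)
m, n, r = 5, 6, 3
A = rng.normal(size=(m, r))
B = rng.normal(size=(r, n))
M = rng.normal(size=(m, n))

def f(A):
    return np.sum((M - A @ B) ** 2) / M.size

grad_matrix = -(2 / M.size) * (M - A @ B) @ B.T  # candidate matrix form

# Finite-difference gradient, entry by entry
eps = 1e-6
grad_fd = np.zeros_like(A)
for i in range(m):
    for k in range(r):
        Ap = A.copy(); Ap[i, k] += eps
        Am = A.copy(); Am[i, k] -= eps
        grad_fd[i, k] = (f(Ap) - f(Am)) / (2 * eps)

print(np.allclose(grad_matrix, grad_fd, atol=1e-5))  # True
```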


Feedforward Nets Notation

[Figure: a feedforward network with inputs $x_1, x_2$, hidden units $h_1, h_2, h_3$, and output $o$; its edges carry the weights 1, 3, −2, 2, 3, 1, 0, −2, −1.]

If $(x_1, x_2) = (1, -1)$, what is $(h_1, h_2, h_3)$?

If $g$ is the ReLU activation, what is $g((h_1, h_2, h_3))$?

Let $h = (h_1, h_2, h_3)$ and $x = (x_1, x_2)$

How can we pick $W \in \mathbb{R}^{3 \times 2}$ such that $h = Wx$?
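The $h = Wx$ computation followed by ReLU can be sketched in NumPy. The weight matrix below is hypothetical — the figure's exact wiring may differ — with row $k$ of $W$ holding the weights on the edges from $(x_1, x_2)$ into $h_k$:

```python
import numpy as np

# Sketch of h = W x followed by ReLU. W is a hypothetical wiring (the
# precept's figure may assign the edge weights differently): row k of W
# holds the weights from (x1, x2) into h_k.
W = np.array([[ 1.0, 3.0],
              [-2.0, 2.0],
              [ 3.0, 1.0]])  # W in R^{3x2}, chosen for illustration
x = np.array([1.0, -1.0])    # (x1, x2) = (1, -1)

h = W @ x                    # pre-activation hidden values
g_h = np.maximum(h, 0.0)     # ReLU zeroes out the negative entries

print(h)    # [-2. -4.  2.]
print(g_h)  # [0. 0. 2.]
```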

You might also like