Winter1516 Lecture 3
Fei-Fei Li & Andrej Karpathy & Justin Johnson, 11 Jan 2016
Administrative
A1 is due Jan 20 (Wednesday). ~9 days left
Warning: Jan 18 (Monday) is a holiday (no class / office hours)
Recall from last time: Challenges in Visual Recognition
Camera pose, illumination, deformation, occlusion
Recall from last time: the data-driven approach, kNN
Recall from last time: Linear classifier
image x: [32x32x3], an array of numbers 0...1 (3072 numbers total)
f(x, W) → class scores
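As a quick refresher, a minimal numpy sketch of the linear score function for one CIFAR-10-sized image; the variable names, shapes, and the assumption of 10 classes are for illustration only, not code from the assignment.

    import numpy as np

    x = np.random.rand(3072)               # one image stretched into 3072 numbers (32*32*3)
    W = 0.001 * np.random.randn(10, 3072)  # one row of weights per class (10 classes assumed)
    b = np.zeros(10)                       # one bias per class

    scores = W.dot(x) + b                  # f(x, W) = Wx + b
    print(scores.shape)                    # (10,) -> one score per class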
Going forward: Loss function and Optimization

TODO:
1. Define a loss function that quantifies our unhappiness with the scores across the training data.
2. Come up with a way of efficiently finding the parameters that minimize the loss function (optimization).
Multiclass SVM loss

Suppose: 3 training examples, 3 classes. With some W the scores f(x, W) are:

        cat image   car image   frog image
cat        3.2         1.3         2.2
car        5.1         4.9         2.5
frog      -1.7         2.0        -3.1

Given an example (x_i, y_i), where x_i is the image and y_i is the (integer) label, and using the shorthand s = f(x_i, W) for the vector of scores, the Multiclass SVM loss is:

L_i = sum over j ≠ y_i of max(0, s_j - s_{y_i} + 1)

Losses: 2.9   0   12.9

The full training loss is the mean over all examples in the training data:

L = (1/N) sum_i L_i = (2.9 + 0 + 12.9)/3 ≈ 5.3
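A minimal numpy sketch of this per-example loss, checked against the frog-image column above; the function name and the margin argument are illustrative assumptions, not the assignment interface.

    import numpy as np

    def svm_loss_i(scores, y, margin=1.0):
        # Multiclass SVM loss for one example:
        # L_i = sum_{j != y} max(0, s_j - s_y + margin)
        margins = np.maximum(0, scores - scores[y] + margin)
        margins[y] = 0                            # the correct class contributes nothing
        return margins.sum()

    scores_frog = np.array([2.2, 2.5, -3.1])      # scores for the frog image
    print(svm_loss_i(scores_frog, y=2))           # 12.9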
There is a bug with the loss: suppose we have found a W such that L = 0. This W is not unique; scaling it (e.g. using 2W) also gives L = 0:
Suppose: 3 training examples, 3 classes. With some W the scores are:

cat    3.2    1.3    2.2
car    5.1    4.9    2.5
frog  -1.7    2.0   -3.1

Losses: 2.9   0

Before, for the car image (loss already 0):
= max(0, 1.3 - 4.9 + 1) + max(0, 2.0 - 4.9 + 1)
= max(0, -2.6) + max(0, -1.9)
= 0 + 0
= 0

With W twice as large, the loss is still 0:
= max(0, 2.6 - 9.8 + 1) + max(0, 4.0 - 9.8 + 1)
= max(0, -6.2) + max(0, -4.8)
= 0 + 0
= 0
Weight Regularization

L = (1/N) sum_i L_i + λ R(W),   where λ = regularization strength (hyperparameter)

In common use:
- L2 regularization: R(W) = sum_k sum_l W_{k,l}^2
- L1 regularization: R(W) = sum_k sum_l |W_{k,l}|
- Elastic net (L1 + L2)
- Max norm regularization (might see later)
- Dropout (will see later)
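A hedged, vectorized numpy sketch of the full loss (data term plus L2 regularization); the shapes and the name reg for λ are assumptions for illustration, not the exact assignment interface.

    import numpy as np

    def full_svm_loss(W, X, y, reg):
        # X: N x D data, y: N integer labels, W: D x C weights, reg: lambda
        scores = X.dot(W)                                  # N x C class scores
        correct = scores[np.arange(len(y)), y][:, None]    # score of each correct class
        margins = np.maximum(0, scores - correct + 1.0)
        margins[np.arange(len(y)), y] = 0                  # drop the j == y_i terms
        data_loss = margins.sum() / len(y)                 # mean over the training data
        reg_loss = reg * np.sum(W * W)                     # lambda * R(W), L2 penalty
        return data_loss + reg_loss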
L2 regularization: motivation. Among weights that produce the same scores, L2 regularization prefers smaller, more spread-out weights; for x = [1, 1, 1, 1], it prefers w = [0.25, 0.25, 0.25, 0.25] over w = [1, 0, 0, 0] even though both give the same score.
Softmax Classifier (Multinomial Logistic Regression)

Scores are now interpreted as unnormalized log probabilities of the classes:

P(Y = k | X = x_i) = e^{s_k} / sum_j e^{s_j},   where s = f(x_i; W)

We want to maximize the log likelihood of the correct class or, as a loss, minimize the negative log likelihood:

L_i = -log P(Y = y_i | X = x_i)

Worked example (scores for the cat image):

cat   3.2
car   5.1
frog -1.7

unnormalized log probabilities → exp → unnormalized probabilities → normalize → probabilities:

[3.2, 5.1, -1.7] → [24.5, 164.0, 0.18] → [0.13, 0.87, 0.00]

L_i = -log(0.13) ≈ 2.04
Q5: usually at initialization W are small numbers, so all s ≈ 0. What is the loss?
(All classes get probability 1/C, so L_i = -log(1/C) = log(C); with 3 classes, log(3) ≈ 1.1. This is a useful sanity check at the start of training.)
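A minimal numpy sketch of the softmax loss for one example, which also reproduces the Q5 sanity check; the function name and the max-shift trick are illustrative, not the exact assignment code.

    import numpy as np

    def softmax_loss_i(scores, y):
        # L_i = -log( e^{s_y} / sum_j e^{s_j} ), computed with the usual max-shift
        # for numerical stability (shifting does not change the probabilities)
        shifted = scores - np.max(scores)
        probs = np.exp(shifted) / np.sum(np.exp(shifted))
        return -np.log(probs[y])

    print(softmax_loss_i(np.array([3.2, 5.1, -1.7]), y=0))  # ~2.04 for the cat image
    print(softmax_loss_i(np.zeros(3), y=0))                  # all s ~= 0 -> log(3) ~= 1.1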
Softmax vs. SVM
(interactive demo: http://vision.stanford.edu/teaching/cs231n/linear-classify-demo/)
Optimization
Recap
- We have some dataset of (x, y)
- We have a score function: s = f(x; W) = Wx
- We have a loss function:
  Softmax: L_i = -log( e^{s_{y_i}} / sum_j e^{s_j} )
  SVM: L_i = sum over j ≠ y_i of max(0, s_j - s_{y_i} + 1)
  Full loss: L = (1/N) sum_i L_i + λ R(W)
Strategy #1: A first very bad idea solution: Random search
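A rough, self-contained numpy sketch of random search; the toy X_train, Y_train, loss function L, shapes, and iteration count are stand-ins for CIFAR-10 and the real training loss, not the slide's exact code.

    import numpy as np

    # stand-ins so the sketch runs; in practice X_train/Y_train are CIFAR-10 and L is the SVM loss
    X_train = np.random.rand(3073, 500)
    Y_train = np.random.randint(10, size=500)

    def L(X, y, W):
        scores = W.dot(X)                                  # 10 x N class scores
        correct = scores[y, np.arange(X.shape[1])]
        margins = np.maximum(0, scores - correct + 1.0)
        margins[y, np.arange(X.shape[1])] = 0
        return margins.sum() / X.shape[1]

    bestloss = float('inf')
    bestW = None
    for num in range(100):
        W = np.random.randn(10, 3073) * 0.0001             # try a random W
        loss = L(X_train, Y_train, W)                      # how unhappy are we with it?
        if loss < bestloss:                                # keep the best W seen so far
            bestloss, bestW = loss, W
    print(bestloss)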
Let's see how well this works on the test set...
current W:          gradient dW:
[ 0.34,             [ ?,
 -1.11,               ?,
  0.78,               ?,
  0.12,               ?,
  0.55,               ?,
  2.81,               ?,
 -3.1,                ?,
 -1.5,                ?,
  0.33 ]              ? ]
loss 1.25347
current W → W + h (first dim only) → re-evaluate the loss → the first entry of dW is approximately (new loss - old loss) / h; repeating this for every dimension fills in the gradient dW.
Evaluating the gradient numerically:
- approximate
- very slow to evaluate
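A hedged sketch of this finite-difference procedure; the function name, the use of np.nditer, and the toy loss in the usage line are illustrative, not assignment code.

    import numpy as np

    def eval_numerical_gradient(f, W, h=1e-4):
        # finite differences: dW[i] ~ (f(W + h in dim i) - f(W)) / h, one dimension at a time
        grad = np.zeros_like(W)
        fW = f(W)                            # loss at the current W (e.g. 1.25347)
        it = np.nditer(W, flags=['multi_index'])
        while not it.finished:
            ix = it.multi_index
            old = W[ix]
            W[ix] = old + h                  # nudge a single dimension
            grad[ix] = (f(W) - fW) / h       # slope of the loss along that dimension
            W[ix] = old                      # restore
            it.iternext()
        return grad

    W = np.array([0.34, -1.11, 0.78])        # toy weights
    print(eval_numerical_gradient(lambda w: np.sum(w ** 2), W))   # ~ [0.68, -2.22, 1.56]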
This is silly. The loss is just a function of W:

L = (1/N) sum_i L_i(f(x_i, W), y_i) + λ R(W)

We want ∇_W L. Instead of nudging each dimension numerically, use calculus to write down an expression for the gradient directly:

∇_W L = ... (some function of the data and W: the analytic gradient)
current W:          gradient dW = ... (some function of data and W):
[ 0.34,             [ -2.5,
 -1.11,                0.6,
  0.78,                0,
  0.12,                0.2,
  0.55,                0.7,
  2.81,               -0.5,
 -3.1,                 1.1,
 -1.5,                 1.3,
  0.33 ]              -2.1 ]
loss 1.25347
In summary:
- Numerical gradient: approximate, slow, easy to write
- Analytic gradient: exact, fast, error-prone
=> In practice: always use the analytic gradient, but check the implementation with the numerical gradient (a gradient check).
Gradient Descent
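A self-contained toy sketch of vanilla gradient descent; the quadratic loss_fun and the names evaluate_gradient and step_size are stand-ins for the real SVM/Softmax training loss and its analytic gradient.

    import numpy as np

    def loss_fun(w):
        return np.sum((w - 3.0) ** 2)          # toy loss, minimized at w = 3

    def evaluate_gradient(w):
        return 2.0 * (w - 3.0)                 # analytic gradient of the toy loss

    weights = np.zeros(5)
    step_size = 0.1                            # learning rate (hyperparameter)
    for _ in range(100):
        weights_grad = evaluate_gradient(weights)
        weights += -step_size * weights_grad   # step in the negative gradient direction
    print(weights)                             # -> close to [3, 3, 3, 3, 3]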
(Figure: loss contours over two weights W_1 and W_2; starting from the original W, we repeatedly take steps in the negative gradient direction.)
Mini-batch Gradient Descent
- only use a small portion of the training set to compute the gradient.
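A toy, self-contained sketch of the mini-batch idea: the gradient at each step is estimated from a random sample of the data instead of the full training set. The synthetic data, the squared-error loss, and the batch size of 256 (common mini-batch sizes are 32/64/128/256) are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.normal(loc=3.0, scale=1.0, size=10000)       # stand-in "training set"
    w = 0.0
    step_size = 0.05

    for it in range(500):
        batch = rng.choice(data, size=256, replace=False)   # sample a mini-batch
        grad = 2.0 * np.mean(w - batch)                     # gradient of mean (w - x)^2 on the batch
        w += -step_size * grad                              # parameter update
    print(w)                                                # -> close to 3.0 (the data mean)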
Example of optimization progress while
training a neural network.
The effects of step size (or learning rate)
The effects of different update formulas
Aside: Image Features
Example: Color (Hue) Histogram
Convert each pixel to its hue, assign it to one of several hue bins (+1 to that bin's count), and use the resulting histogram as the feature vector.
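A small sketch of such a hue histogram; the use of matplotlib's rgb_to_hsv and the choice of 10 bins are illustrative assumptions.

    import numpy as np
    from matplotlib.colors import rgb_to_hsv

    def hue_histogram(img, bins=10):
        # img: H x W x 3 RGB image with values in 0...1
        hue = rgb_to_hsv(img)[..., 0]                      # hue channel, values in 0...1
        hist, _ = np.histogram(hue, bins=bins, range=(0.0, 1.0))
        return hist                                        # one count per hue bin

    img = np.random.rand(32, 32, 3)                        # placeholder image
    print(hue_histogram(img))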
Example: HOG/SIFT features
In each 8x8 pixel region, quantize the edge orientation into 9 bins.
Many more: GIST, LBP, Texton, SSIM, ...
(image from vlfeat.org)
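A greatly simplified sketch of one HOG-style cell (a single 8x8 region); real HOG/SIFT add block normalization and interpolation, and the gradient-magnitude weighting here is an assumption for illustration.

    import numpy as np

    def hog_cell(patch, bins=9):
        # patch: 8x8 grayscale region; quantize edge orientation into `bins` bins,
        # letting each pixel vote with its gradient magnitude
        gy, gx = np.gradient(patch.astype(float))
        mag = np.hypot(gx, gy)
        ang = np.mod(np.arctan2(gy, gx), np.pi)            # unsigned orientation in [0, pi)
        idx = np.minimum((ang / np.pi * bins).astype(int), bins - 1)
        hist = np.zeros(bins)
        np.add.at(hist, idx, mag)                          # accumulate votes per orientation bin
        return hist

    print(hog_cell(np.random.rand(8, 8)))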
Example: Bag of Words
Collect local patch descriptors (visual word vectors) from images, learn a vocabulary of visual words with k-means (the centroids), and then represent each image by a histogram over the visual words (e.g. a 1000-d vector).
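An illustrative sketch of the two stages using scikit-learn's KMeans; the descriptor dimensionality, the number of descriptors, and the vocabulary size of 1000 are assumptions in the spirit of the slide.

    import numpy as np
    from sklearn.cluster import KMeans

    # Stage 1: learn a vocabulary of visual words from many local descriptors
    descriptors = np.random.rand(5000, 128)                # placeholder patch descriptors
    vocab = KMeans(n_clusters=1000, n_init=1, random_state=0).fit(descriptors)

    # Stage 2: describe one image by a histogram over its visual words (1000-d vector)
    def bow_histogram(image_descriptors, vocab):
        words = vocab.predict(image_descriptors)           # nearest centroid per descriptor
        return np.bincount(words, minlength=vocab.n_clusters)

    print(bow_histogram(np.random.rand(300, 128), vocab).shape)   # (1000,)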
With features, the pipeline becomes: image [32x32x3] → feature extraction → vector describing various image statistics → f → 10 numbers, indicating class scores. The classifier f is trained on top of the feature vectors rather than on the raw pixels.
Next class:
Becoming a backprop ninja
and
Neural Networks (part 1)