498 FA2019 Lecture 3:
Linear Classifiers
[Figure: example training and test images for classes such as cat, bird, deer, dog, and truck. Images are CC0 1.0 public domain.]
Linear Classifiers

Parametric approach: the input image is an array of 32x32x3 numbers (3072 numbers total), and f(x, W) produces 10 numbers giving the class scores. W holds the parameters or weights.

Linear classifier: f(x, W) = Wx + b

Shapes: x is (3072,), W is (10, 3072), b is (10,), so the scores f(x, W) have shape (10,).
Example with an image with 4 pixels and 2 classes:

Input image (2x2): [[56, 231], [24, 2]], stretched into a column vector x = [56, 231, 24, 2] of shape (4,). W has shape (2, 4) and b has shape (2,):

W = [[0.2, -0.5, 0.1, 2.0],
     [1.5,  1.3, 2.1, 0.0]],   b = [1.1, 3.2]

Scores = Wx + b:
0.2*56 + (-0.5)*231 + 0.1*24 + 2.0*2 + 1.1 = -96.8
1.5*56 +   1.3*231  + 2.1*24 + 0.0*2 + 3.2 = 437.9

Bias trick: absorb b into W as an extra column (e.g. [1.5, 1.3, 2.1, 0.0, 3.2]) and append a constant 1 to x, so that f(x, W) = Wx with no separate bias term.

Predictions are linear in the input: f(cx, W) = c * f(x, W). For example, scores [437.8, 62.0] for an image become [218.9, 31.0] when all pixel values are halved.
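A minimal NumPy sketch of this computation, using the values from the worked example above (the bias trick is included at the end):

```python
import numpy as np

# Four-pixel image, two classes; values taken from the worked example above.
x = np.array([56.0, 231.0, 24.0, 2.0])   # image stretched into a column, shape (4,)
W = np.array([[0.2, -0.5, 0.1, 2.0],     # one row of weights per class, shape (2, 4)
              [1.5,  1.3, 2.1, 0.0]])
b = np.array([1.1, 3.2])                 # one bias per class, shape (2,)

print(W @ x + b)                         # [-96.8 437.9]

# Bias trick: fold b into W as an extra column and append a constant 1 to x.
W_tilde = np.hstack([W, b[:, None]])     # shape (2, 5)
x_tilde = np.append(x, 1.0)              # shape (5,)
print(W_tilde @ x_tilde)                 # same scores: [-96.8 437.9]
```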
Geometric viewpoint: each class score f(x, W) = Wx + b is a linear function of the 32x32x3 pixel values (3072 numbers total).

[Figure: car score plotted against the value of pixel (15, 8, 0); in pixel space, the airplane, deer, and car classifiers correspond to hyperplanes, e.g. the set where the deer score = 0. Plot created using Wolfram Cloud.]
A classic hard case for a linear classifier: XOR, which is not linearly separable.

X | Y | F(X, Y)
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 0
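As a small illustrative sketch (not from the slides), a brute-force search over a grid of weights finds no linear decision rule that reproduces this table; the impossibility can also be shown exactly:

```python
import numpy as np
from itertools import product

points = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]                          # the XOR truth table above

# Try every (w1, w2, b) on a coarse grid; none classifies all four points.
found = False
for w1, w2, b in product(np.linspace(-2.0, 2.0, 41), repeat=3):
    preds = [int(w1 * x + w2 * y + b > 0) for x, y in points]
    if preds == labels:
        found = True
        break
print("linear separator found?", found)        # False
```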
Given a W, we can compute class scores f(x, W) for an image x. But how can we actually choose a good W?

[Figure: three columns of class scores for example cat, car, and frog images. Cat image by Nikita is licensed under CC-BY 2.0; car image is CC0 1.0 public domain; frog image is in the public domain.]
TODO:
1. Use a loss function to quantify how good a value of W is.
2. Find a W that minimizes the loss function (optimization).
Multiclass SVM Loss

"The score of the correct class should be higher than all the other scores."

[Figure: "Hinge loss" plotted against the score for the correct class: the loss falls linearly as the correct-class score rises, reaching zero once it exceeds the highest score among the other classes by the margin.]
Given an example (x_i, y_i), where x_i is the image and y_i is the (integer) label, let s = f(x_i, W) be the scores. Then the SVM loss has the form:

L_i = Σ_{j ≠ y_i} max(0, s_j - s_{y_i} + 1)
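A sketch of this loss for a single example in NumPy (the function name is my own):

```python
import numpy as np

def svm_loss_single(scores, y):
    # scores: shape (C,) array of class scores s = f(x_i, W)
    # y: integer index of the correct class y_i
    margins = np.maximum(0.0, scores - scores[y] + 1.0)  # margin of 1
    margins[y] = 0.0                                     # the sum skips j == y_i
    return margins.sum()

# Example with the cat/car/frog scores used later (correct class: cat):
print(svm_loss_single(np.array([3.2, 5.1, -1.7]), y=0))  # 2.9
```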
Full loss with regularization:

L(W) = (1/N) Σ_i L_i(f(x_i, W), y_i) + λ R(W)

λ = regularization strength (hyperparameter); R(W) is the regularizer, e.g. L2 regularization R(W) = Σ_{k,l} W_{k,l}^2 or L1 regularization R(W) = Σ_{k,l} |W_{k,l}|.
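A sketch of the full loss under this formula, assuming a per-example loss such as the SVM loss above (all names here are my own):

```python
import numpy as np

def full_loss(W, b, X, y, lam, per_example_loss):
    # X: (N, D) data matrix, y: (N,) integer labels, lam: regularization strength
    scores = X @ W.T + b                                  # (N, C) scores
    data_loss = np.mean([per_example_loss(s, yi) for s, yi in zip(scores, y)])
    reg_loss = lam * np.sum(W * W)                        # L2: R(W) = sum of squared weights
    return data_loss + reg_loss
```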
Purpose of Regularization:
- Express preferences among models beyond "minimize training error"
- Avoid overfitting: prefer simple models that generalize better
- Improve optimization by adding curvature

L2 regularization likes to "spread out" the weights.
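A standard illustration of this preference (the specific numbers are my own example, not from the extracted text): two weight vectors that produce the same score, where L2 regularization prefers the spread-out one:

```python
import numpy as np

x  = np.array([1.0, 1.0, 1.0, 1.0])
w1 = np.array([1.0, 0.0, 0.0, 0.0])       # puts all weight on one input
w2 = np.array([0.25, 0.25, 0.25, 0.25])   # spreads the weight out

print(w1 @ x, w2 @ x)                     # identical scores: 1.0 1.0
print(np.sum(w1**2), np.sum(w2**2))       # L2 penalties: 1.0 vs 0.25 -> w2 wins
```

Spreading the weights means the score depends a little on every input feature rather than a lot on one, which tends to generalize better.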
Regularization: Prefer Simpler Models

[Figure: training points in the (x, y) plane fit by two models, f1 and f2. The model f1 fits the training data perfectly; the model f2 has training error, but is simpler.]

Regularization pushes against fitting the data too well, so we don't fit noise in the data. Regularization is important! You should (usually) use it.
Cross-Entropy Loss (Multinomial Logistic Regression)
Want to interpret raw classifier scores as probabilities.

Unnormalized log-probabilities / logits:
cat 3.2
car 5.1
frog -1.7

Softmax function: P(Y = k | X = x_i) = exp(s_k) / Σ_j exp(s_j)

Cross-entropy loss: L_i = -log P(Y = y_i | X = x_i)
Q: What is the min / max possible loss L_i?
A: Min 0, max +infinity.
Q: If all scores are small random values, what is the loss?
A: -log(1/C) = log(C); for C = 10 classes, log(10) ≈ 2.3.
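This makes a useful sanity check at initialization; a quick sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
C = 10
scores = 0.01 * rng.standard_normal(C)    # small random scores
probs = np.exp(scores) / np.exp(scores).sum()
print(-np.log(probs[0]))                  # ~= log(10) ~= 2.3
```

If the loss at the start of training is far from log(C), something is likely wrong with the setup.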
Recap:

Softmax: L_i = -log( exp(s_{y_i}) / Σ_j exp(s_j) )
SVM: L_i = Σ_{j ≠ y_i} max(0, s_j - s_{y_i} + 1)
Full loss: L = (1/N) Σ_i L_i + λ R(W)
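For a side-by-side feel, a sketch computing both losses on the same scores (correct class: cat):

```python
import numpy as np

scores = np.array([3.2, 5.1, -1.7])
y = 0

# Multiclass SVM (hinge) loss
margins = np.maximum(0.0, scores - scores[y] + 1.0)
margins[y] = 0.0
print("SVM loss:", margins.sum())          # 2.9

# Cross-entropy (softmax) loss
probs = np.exp(scores - scores.max())
probs /= probs.sum()
print("softmax loss:", -np.log(probs[y]))  # ~2.04
```

Once the margins are satisfied, the SVM loss is exactly zero and stops caring; cross-entropy always wants the correct-class probability pushed closer to 1, so it never quite reaches zero.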