CERN Deep Learning and Vision
Jon Shlens
Google Research
28 April 2017
The hubris of artificial intelligence
The Summer Vision Project (MIT AI memo, 1966): http://dspace.mit.edu/handle/1721.1/6125
‘Simple’ problems proved most difficult.
cat?
Machine learning applied everywhere.
Example classes:
• electric ray
• barracuda
• coho salmon
• tench
• goldfish
• sawfish
• smalltooth sawfish
• guitarfish
• stingray
• roughtail stingray
• ...
Examples of artificial vision in action
• Good fine-grained classification (e.g., hibiscus vs. dahlia).
• Good generalization (e.g., two very different images both recognized as “meal”).
• Sensible errors (e.g., snake vs. dog).
History of techniques in ImageNet Challenge
ImageNet 2010
• Locality constrained linear coding + SVM (NEC & UIUC)
• Fisher kernel + SVM (Xerox Research Center Europe)
• SIFT features + LI2C (Nanyang Technological Institute)
• SIFT features + k-Nearest Neighbors (Laboratoire d'Informatique de Grenoble)
• Color features + canonical correlation analysis (National Institute of Informatics, Tokyo)

ImageNet 2011
• Compressed Fisher kernel + SVM (Xerox Research Center Europe)
• SIFT bag-of-words + VQ + SVM (University of Amsterdam & University of Trento)
• SIFT + ? (ISI Lab, Tokyo University)

ImageNet 2012
• Deep convolutional neural network (University of Toronto)
• Discriminatively trained DPMs (University of Oxford)
• Fisher-based SIFT features + SVM (ISI Lab, Tokyo University)
Deep convolutional neural networks
(figure: input image passed through a deep convolutional network, output label “cat”)
Loosely inspired by (what little) we know about the brain:
• no recurrence or feedback *
• no dynamics or state *
• no biophysics
f(z) = max(0, z)
The perceptron: a probabilistic model for information storage and organization in the brain.
F Rosenblatt (1958)
Employing a network for a task.
(figure: image fed through the stacked network, output label “dog”)
y = f(f(… f(x) …))
where y_j is the output of node j; each output node corresponds to one class label.
Example: how to classify with a network
P(j) = exp(y_j) / Σ_j' exp(y_j')
where y_j is the output of the node for class label j.
(figure: bar charts of the raw outputs y and the softmax probabilities P over the classes cat, dog, car, truck, cow, bicycle)
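To make the forward pass and the softmax classification concrete, here is a minimal NumPy sketch; the layer sizes and the random weights are illustrative choices, not values from the talk, while the class names are the ones shown on the slide.

```python
import numpy as np

def relu(z):
    # f(z) = max(0, z)
    return np.maximum(0.0, z)

def softmax(y):
    # P(j) = exp(y_j) / sum_j' exp(y_j')
    e = np.exp(y - y.max())            # subtract the max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
labels = ["cat", "dog", "car", "truck", "cow", "bicycle"]

x  = rng.normal(size=784)                         # flattened input image (illustrative size)
W1 = rng.normal(size=(100, 784)) * 0.01           # first-layer weights
W2 = rng.normal(size=(len(labels), 100)) * 0.01   # output-layer weights

y = W2 @ relu(W1 @ x)    # y = f(f(...)) applied layer by layer
P = softmax(y)           # probability assigned to each class label

print(labels[int(np.argmax(P))], P)
```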
∂loss / ∂w_i : the gradient of the loss with respect to each weight w_i, computed by propagating errors backwards through the network.
Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences
P Werbos (1974)
Learning Internal Representations by Error Propagation.
D Rumelhart, G Hinton, R Williams (1986)
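As a toy illustration of how ∂loss / ∂w_i drives learning, the sketch below runs gradient descent on a hand-made quadratic loss that stands in for a real network's loss; the learning rate, iteration count and target values are arbitrary.

```python
import numpy as np

def loss(w):
    # toy loss with its minimum at w = (1, 2); stands in for a network's training loss
    return (w[0] - 1.0) ** 2 + (w[1] - 2.0) ** 2

def grad(w, eps=1e-6):
    # numerical estimate of d loss / d w_i (backpropagation computes this analytically)
    g = np.zeros_like(w)
    for i in range(len(w)):
        d = np.zeros_like(w)
        d[i] = eps
        g[i] = (loss(w + d) - loss(w - d)) / (2 * eps)
    return g

w = np.array([0.0, 0.0])
for _ in range(100):
    w -= 0.1 * grad(w)     # w_i <- w_i - learning_rate * d loss / d w_i
print(w)                   # approaches [1, 2]
```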
Optimization is highly non-convex.
(figure: the loss plotted as a function of weight 1 and weight 2)
Dataset: handwritten zip-code digits (MNIST), P × P = 28 × 28 grayscale images; example shown: a “4”.
http://yann.lecun.com/exdb/mnist/
The classifier should tolerate many image transformations (a small augmentation sketch follows this list):
• translation
• cropping
• dilation
• contrast
• rotation
• scale
• brightness
• …
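A minimal sketch of generating a few of these variations for an image stored as a NumPy array; the shift ranges and scaling factors are arbitrary choices, not values from the talk.

```python
import numpy as np

def augment(image, rng):
    # image: 2-D array with pixel values in [0, 1]
    out = image
    out = np.roll(out, shift=rng.integers(-2, 3), axis=1)    # small horizontal translation
    out = np.roll(out, shift=rng.integers(-2, 3), axis=0)    # small vertical translation
    out = np.clip(out + rng.uniform(-0.1, 0.1), 0.0, 1.0)    # brightness shift
    mean = out.mean()
    out = np.clip((out - mean) * rng.uniform(0.8, 1.2) + mean, 0.0, 1.0)  # contrast change
    return out

rng = np.random.default_rng(0)
digit = rng.random((28, 28))          # stand-in for a 28 x 28 handwritten digit
print(augment(digit, rng).shape)      # (28, 28)
```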
Interlude for convolutions
(examples from https://docs.gimp.org/en/plug-in-convmatrix.html)

identity filter (3 × 3), which leaves the image unchanged:
0 0 0
0 1 0
0 0 0

• original image, filter (5 × 5): blur
• original image, filter (5 × 5): sharpen
• original image, filter (3 × 3): vertical edge detector
• original image, filter (3 × 3): all-edge detector
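To show what such filters compute, here is a small NumPy sketch of a 2-D convolution (strictly, the cross-correlation used by convolutional layers), applied with the identity kernel above and with a Sobel-style vertical-edge kernel; the kernel values and the synthetic test image are illustrative.

```python
import numpy as np

def conv2d(image, kernel):
    # 'valid' sliding-window cross-correlation, as used by convolutional layers
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

identity      = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=float)
vertical_edge = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)  # Sobel-style

image = np.tile(np.array([0.0] * 4 + [1.0] * 4), (8, 1))  # left half dark, right half bright
print(conv2d(image, identity))        # reproduces the interior of the image
print(conv2d(image, vertical_edge))   # responds strongly along the vertical edge
```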
Multi-layer perceptron on MNIST.
• input: P × P = 28 × 28 image of a handwritten zip-code digit (“4”)
• fully connected hidden layer (N = 100 units): # weights = N × P² = 78,400
• logistic classifier (M = 10): # weights = N × M = 1,000
The same task with a convolutional layer:
• input: P × P = 28 × 28 image of a handwritten zip-code digit (“4”)
• convolutional layer: N = 100 filters of size F × F = 5 × 5: # weights = N × F² = 2,500
• logistic classifier (M = 10): # weights = N × M × K = 1,000·K, where K is the number of spatial positions passed to the classifier
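A quick sanity check of these weight counts; N, M, P and F come from the slides, while the value of K below assumes a 'valid' convolution with stride 1 and no pooling, which is an assumption rather than a number from the talk.

```python
P, N, M, F = 28, 100, 10, 5

# fully connected version
fully_connected = N * P**2             # hidden layer that looks at every pixel
fc_classifier   = N * M                # logistic classifier on top of it
print(fully_connected, fc_classifier)  # 78400, 1000

# convolutional version
conv_layer = N * F**2                  # N shared 5 x 5 filters, reused at every position
K = (P - F + 1) ** 2                   # spatial positions for a 'valid' convolution (assumed)
conv_classifier = N * M * K            # classifier over all positions of all feature maps
print(conv_layer, K, conv_classifier)  # 2500, 576, 576000
```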
• grayscale image: input depth = 1
• RGB image: input depth = 3
Generalizing convolutions in depth.
(figure: a single edge detector vs. a filter bank; in a convolutional network the filter bank produces an output whose depth equals the number of filters)
• Input depth and output depth are independent parameters and need not be equal.
• Convolutional neural networks commonly operate with depths up to 1024.
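A hedged sketch of how input depth and output depth enter a convolutional layer's weight count; the specific sizes below are illustrative, not values from the talk.

```python
# A convolutional layer maps an H x W x input_depth volume to an
# H' x W' x output_depth volume; its weights form one F x F x input_depth
# filter per output channel.
F            = 3      # spatial filter size
input_depth  = 3      # e.g. an RGB image
output_depth = 64     # size of the filter bank; arbitrary, need not match input_depth

weights_per_filter = F * F * input_depth
total_weights      = weights_per_filter * output_depth
print(weights_per_filter, total_weights)   # 27, 1728
```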
The first convolutional neural network.
(figure: handwritten “4” → convolutional layer (N = 12) → convolutional layer (N = 12) → logistic classifier (M = 10))
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
S Ioffe and C Szegedy (2015)
Covariate shifts are problematic in machine learning.
• The distribution of inputs arriving at a layer shifts between time = 1 and time = N of training, because the earlier layers of the network keep changing.
• Adapting to the shift, or whitening the layer inputs, is impractical in an online setting.
(figure: a layer's input distribution at time = 1 vs. time = N)
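A toy demonstration of the effect; the two-layer setup and the factor of 3 are arbitrary. When the first layer's weights move during training, the distribution of inputs reaching the next layer shifts even though the data itself has not.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(10000, 100))           # a fixed batch of inputs

W1_t1 = rng.normal(size=(100, 100)) * 0.05  # first-layer weights at time = 1
W1_tN = W1_t1 * 3.0                         # the same weights later in training (time = N)

h_t1 = np.maximum(0.0, x @ W1_t1.T)         # input seen by layer 2 at time = 1
h_tN = np.maximum(0.0, x @ W1_tN.T)         # input seen by layer 2 at time = N

# the second layer's input distribution has shifted even though the data has not
print(h_t1.mean(), h_t1.std())
print(h_tN.mean(), h_tN.std())
```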
Previous methods for addressing covariate shifts
• adaptive per-weight learning rates (e.g. Adagrad)
• building invariances through normalization
• regularizing the network (e.g. dropout, maxout)
I Goodfellow et al. (2013); N Srivastava et al. (2014)
Mitigate covariate shift via batch normalization.
(figure: the 15th, 50th and 85th percentiles of a layer's input distribution over the course of training)
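A minimal sketch of the batch-normalization transform of Ioffe & Szegedy (2015): normalize each activation by its mini-batch mean and variance, then apply a learned scale γ and shift β. Only the training-time computation is shown here; the running averages used at test time are omitted, and the sizes are illustrative.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # x: (batch_size, num_features) activations from the previous layer
    mean  = x.mean(axis=0)                    # per-feature mini-batch mean
    var   = x.var(axis=0)                     # per-feature mini-batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)   # normalized activations
    return gamma * x_hat + beta               # learned scale and shift

rng = np.random.default_rng(0)
activations = rng.normal(loc=5.0, scale=3.0, size=(64, 10))
gamma = np.ones(10)
beta  = np.zeros(10)
out = batch_norm(activations, gamma, beta)
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))  # ~0 and ~1 per feature
```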
Batch normalization speeds up training enormously.
(figure: accuracy as a function of the number of training mini-batches)
Switching to other types of gradients
An important distinction: the gradient of the loss with respect to the weights vs. the gradient of the loss with respect to the input image.
• the former provides an update that “lives” in weight space
• the latter provides an update that “lives” in image space
Gradient propagation to find responsible pixels
(figure: an image from http://mscoco.org passed through Inception-v3, predicted label “dog”; pixel sensitivities visualized at layer 3 and layer 5)
Gradient propagation for distorting images.
• ∂loss / ∂image tells us which pixels are sensitive to the label (e.g. “dog”).
• Apply the gradient as a distortion, feed the distorted image back into the network, and iterate.
(figure: Inception-v3 repeatedly distorting an image toward the label “dog”)
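To make “an update that lives in image space” concrete, here is a toy sketch: a linear scorer stands in for Inception-v3 so the gradient with respect to the pixels can be written by hand, and all sizes, step counts and the target index are illustrative. Each step nudges the pixels in the direction that increases the score of the target label, then feeds the distorted image back in.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, num_pixels = 6, 784
W = rng.normal(size=(num_classes, num_pixels)) * 0.01   # toy linear 'network': scores = W @ image
image = rng.random(num_pixels)                           # flattened input image, pixels in [0, 1]
target = 1                                               # index of the label to amplify, e.g. "dog"

print("score before:", (W @ image)[target])
for _ in range(100):
    # d score[target] / d image is simply W[target] for this linear model;
    # for a real network the same quantity comes from backpropagating into the pixels
    image = np.clip(image + 0.1 * W[target], 0.0, 1.0)   # gradient step in image space, then iterate
print("score after: ", (W @ image)[target])              # the target label's score has increased
```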
Conclusions
Quick Start Guide
Online resources:
http://www.tensorflow.org
http://cs231n.github.io/convolutional-networks/
Google Brain Residency Program
g.co/brainresidency