Deep Learning PDF
Deep Learning
Kairit Sirts
Lecture in TUT
19.12.2016
Outline
2
Why deep learning?
Deep learning Gradient boosting
3
http://www.infoworld.com/article/3003315/big-data/deep-learning-a-brief-guide-for-practical-problem-solvers.html
What can be done with deep learning?
Handwritten digit recognition
5
Street view number recognition
6
Image classification
7
Image classification
10 objects
6000 labeled instances for each object
Best accuracy so far 96.53%
8
Image classification
9
Image classification
20 superclasses
100 fine-grained classes
600 labeled images per class
Best classification accuracy 75.72%
10
Detecting doodles
https://quickdraw.withgoogle.com
There are other simple and fun AI
experiments launched by Google
https://aiexperiments.withgoogle.com
11
Image captioning
12
Image captioning – not so great results
13
Automatic colorization of images
14
http://richzhang.github.io/colorization/resources/images/teaser3.jpg
Automatic colorization of images - failed
15
DeepDream
https://deepdreamgenerator.com
16
DeepDream
17
DeepDream
18
DeepDream
19
Word embeddings
20
http://metaoptimize.s3.amazonaws.com/cw-embeddings-ACL2010/embeddings-mostcommon.EMBEDDING_SIZE=50.png
Word embeddings
Highlighted clusters: months, weekdays, numbers
21
Word embeddings
23
http://karpathy.github.io/2015/05/21/rnn-effectiveness
Machine translation
24
Learning to play Atari Arcade games
25
https://www.youtube.com/watch?v=cjpEIotvwFY
AlphaGo
26
https://www.youtube.com/watch?v=PQCrX1sQSzY
Other tasks tackled with deep neural networks
• Speech recognition
• Various tasks in robotics
• Log analysis/risk detection
• Recommendation systems
• Motion detection from videos
• Business and Economics analytics
• Etc …
27
Deep learning demystified
How does deep learning work?
• Biological neuron • Artificial neuron
http://www.theprojectspot.com/tutorial-post/introduction-to-artificial-neural-networks-part-1/7
29
• Biological neural network • Artificial neural network
30
https://www.eeweb.com/blog/rob_riemen/deep-machine-learning-and-the-google-brain http://www.theprojectspot.com/tutorial-post/introduction-to-artificial-neural-networks-part-1/7
What happens inside a neuron?
• Net input: weighted sum of the inputs plus a bias, $z = \sum_{i=1}^{n} w_i x_i + b$
• Output: $h = f(z)$
31
Activation function
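• Step (threshold): $f(z) = \begin{cases} 1 & \text{if } z \ge th \\ 0 & \text{if } z < th \end{cases}$
• Sigmoid: $f(z) = \frac{1}{1 + e^{-z}}$
• Tanh: $f(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$
• ReLU: $f(z) = \max(0, z)$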
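The four activation functions can be written directly in Python. This is a minimal sketch using only the standard library; the default threshold `th=0.0` and the helper `neuron` are illustrative assumptions, not part of the slides.

```python
import math

def step(z, th=0.0):
    """Threshold activation: 1 if z >= th, else 0 (th=0.0 is an assumed default)."""
    return 1.0 if z >= th else 0.0

def sigmoid(z):
    """Logistic sigmoid: squashes z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Hyperbolic tangent: squashes z into (-1, 1)."""
    return (math.exp(z) - math.exp(-z)) / (math.exp(z) + math.exp(-z))

def relu(z):
    """Rectified linear unit: max(0, z)."""
    return max(0.0, z)

def neuron(inputs, weights, bias, activation):
    """Hypothetical helper: weighted sum of inputs plus bias, then activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)
```

For example, `neuron([1.0, 1.0], [0.5, 0.5], -1.0, sigmoid)` has net input 0 and therefore returns 0.5.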
32
https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/neural_networks.html
Single neuron logic gates
33
https://blog.abhranil.net/2015/03/03/training-neural-networks-with-genetic-algorithms/
XOR gate
• Cannot be done with a single neuron
• A hidden layer is necessary
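As a sketch of why one hidden layer is enough for XOR: AND, OR and NAND can each be computed by a single step-activation neuron, and XOR is the AND of OR and NAND. The weights and biases below are hand-picked for illustration and are assumptions, not values from the slides.

```python
def step_neuron(inputs, weights, bias):
    """Single neuron with a step activation (threshold at 0)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z >= 0 else 0

# Single-neuron gates; weights and biases are hand-picked assumptions.
def and_gate(a, b):
    return step_neuron((a, b), (1.0, 1.0), -1.5)

def or_gate(a, b):
    return step_neuron((a, b), (1.0, 1.0), -0.5)

def nand_gate(a, b):
    return step_neuron((a, b), (-1.0, -1.0), 1.5)

def xor_gate(a, b):
    """Two-layer network: the hidden layer computes OR and NAND, the output neuron ANDs them."""
    hidden = (or_gate(a, b), nand_gate(a, b))
    return and_gate(*hidden)

truth_table = [(a, b, xor_gate(a, b)) for a in (0, 1) for b in (0, 1)]
```

No single choice of weights and bias in `step_neuron` reproduces `truth_table`, because the XOR outputs are not linearly separable.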
8×9 + 9×9 + 9×9 + 9×4 = 270 weights
35
http://neuralnetworksanddeeplearning.com/
Backpropagation
ERROR → ERROR’
ERROR’ < ERROR
36
Diversion to calculus - derivative
• $y' = f'(x)$
• Derivative is the slope of the tangent
line
• It is the rate of change when going in
the direction of steepest ascent
37
Derivatives
38
Gradients
• Generalization of derivatives to
multivariate functions
• The gradient is a vector pointing in the
direction of steepest ascent
• $\nabla f(x, y) = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)$
• $\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}$ - partial derivatives: take the
derivative wrt one variable while
treating all others as constant
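A partial derivative can be approximated numerically by nudging one variable while holding the other fixed. The sketch below uses a hypothetical function $f(x, y) = x^2 y$, chosen only for illustration, whose analytic gradient is $(2xy, x^2)$.

```python
def f(x, y):
    """Hypothetical example function, not from the slides."""
    return x ** 2 * y

def numerical_gradient(func, x, y, eps=1e-6):
    """Central-difference estimate of (df/dx, df/dy) at the point (x, y)."""
    # Vary x only, treating y as a constant.
    df_dx = (func(x + eps, y) - func(x - eps, y)) / (2 * eps)
    # Vary y only, treating x as a constant.
    df_dy = (func(x, y + eps) - func(x, y - eps)) / (2 * eps)
    return df_dx, df_dy

grad = numerical_gradient(f, 2.0, 3.0)  # analytic gradient at (2, 3) is (12, 4)
```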
39
Gradients and backpropagation
40
Gradient descent
• An iterative algorithm
• Start with initial parameter values $\theta_0$
• Update parameters iteratively until
convergence:
$\theta_{t+1} := \theta_t - \alpha \nabla f(\theta_t)$
• $\alpha$ - learning rate, controls the step size
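The update rule can be sketched in a few lines. The objective $f(\theta) = (\theta - 3)^2$, the stopping tolerance and the iteration cap are assumptions made for this illustration; only the update itself comes from the slide.

```python
def grad_f(theta):
    """Gradient of the hypothetical objective f(theta) = (theta - 3)^2."""
    return 2.0 * (theta - 3.0)

def gradient_descent(grad, theta0, alpha=0.1, tol=1e-8, max_iters=10_000):
    """Repeat theta := theta - alpha * grad(theta) until the step is tiny."""
    theta = theta0
    for _ in range(max_iters):
        step = alpha * grad(theta)
        theta = theta - step
        if abs(step) < tol:  # assumed convergence criterion
            break
    return theta

theta_min = gradient_descent(grad_f, theta0=0.0)  # converges towards theta = 3
```

A larger `alpha` takes bigger steps; for this objective, any `alpha` above 1.0 makes the iterates diverge instead of converging.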
41
Deep learning demystified
How does backpropagation work?
Backpropagation explained
• Example from:
https://mattmazur.com/2015/03/17/
• 2 inputs
• 1 hidden layer with 2 neurons
• Bias terms in both the hidden and
output layer
• 2 outputs
43
Initial configuration
• Training values
• Initial weights: $w_1, \dots, w_8$
• Initial biases: $b_1, b_2$
44
Forward pass – first hidden unit
45
Forward pass – first hidden unit
46
Forward pass – second hidden unit
47
Forward pass – first output unit
48
Forward pass – second output unit
49
Forward pass – error of the first output
50
Forward pass – output error
51
Forward pass – output error
52
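The forward pass of the 2-2-2 network can be sketched as follows. The numeric values did not survive in this text; the inputs, targets, weights and biases below are assumed to match the referenced Matt Mazur example and should be checked against the original. The squared-error form $E = \sum \frac{1}{2}(target - out)^2$ is likewise an assumption.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# ASSUMED values (not present in the extracted slides).
i1, i2 = 0.05, 0.10                      # inputs
t1, t2 = 0.01, 0.99                      # training targets
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30  # input -> hidden weights
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55  # hidden -> output weights
b1, b2 = 0.35, 0.60                      # hidden and output biases

def forward(i1, i2):
    """Return the activations of both hidden units and both output units."""
    out_h1 = sigmoid(w1 * i1 + w2 * i2 + b1)
    out_h2 = sigmoid(w3 * i1 + w4 * i2 + b1)
    out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)
    out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2)
    return out_h1, out_h2, out_o1, out_o2

out_h1, out_h2, out_o1, out_o2 = forward(i1, i2)

# Total error: sum of squared errors over both outputs.
e_o1 = 0.5 * (t1 - out_o1) ** 2
e_o2 = 0.5 * (t2 - out_o2) ** 2
e_total = e_o1 + e_o2
```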
Backwards pass
• Consider $w_5$
• How much does a change in $w_5$ affect the
total error?
• Apply the chain rule:
53
Chain rule
• Formula for computing derivative of the composition of two or more functions
• $F(x) \equiv f(g(x)) \equiv (f \circ g)(x)$ - composition of functions $f$ and $g$
• $F'(x) = f'(g(x))\, g'(x)$
• Example: $F(x) = e^{3x}$, with $g(x) = 3x$ and $f(g(x)) = e^{g(x)} = e^{3x}$
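Carrying the example through, with the outer function written as $f(u) = e^{u}$ (the variable name $u$ is introduced here only for clarity), the chain rule gives:

```latex
\begin{aligned}
g(x) &= 3x, & g'(x) &= 3 \\
f(u) &= e^{u}, & f'(u) &= e^{u} \\
F'(x) &= f'(g(x))\, g'(x) = e^{3x} \cdot 3 = 3e^{3x}
\end{aligned}
```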
54
Backwards pass
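• Consider $w_5$
• How much does a change in $w_5$ affect the
total error?
• Apply the chain rule: $\frac{\partial E_{total}}{\partial w_5} = \frac{\partial E_{total}}{\partial out_{o1}} \cdot \frac{\partial out_{o1}}{\partial net_{o1}} \cdot \frac{\partial net_{o1}}{\partial w_5}$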
55
How much does error change wrt the output?
56
How much does output change wrt its net input?
57
Derivative of the sigmoid function
$f(z) = \frac{1}{1 + e^{-z}}$
$f'(z) = f(z)(1 - f(z))$
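The identity $f'(z) = f(z)(1 - f(z))$ can be sanity-checked against a finite-difference estimate. This is only a numerical check at a few hand-picked points, not a proof.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    """Closed-form derivative: f(z) * (1 - f(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

def finite_difference(func, z, eps=1e-6):
    """Central-difference approximation of the derivative of func at z."""
    return (func(z + eps) - func(z - eps)) / (2 * eps)

# Largest disagreement between the two estimates over a few test points.
max_gap = max(
    abs(sigmoid_prime(z) - finite_difference(sigmoid, z))
    for z in (-2.0, -0.5, 0.0, 0.5, 2.0)
)
```

The derivative peaks at $z = 0$, where it equals 0.25, and shrinks towards zero for large $|z|$.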
58
How much does output change wrt its net input?
59
How much does net input change wrt $w_5$?
60
Putting it all together
61
This is known as the delta rule
• The delta rule is the gradient descent rule for updating the weights of the inputs to
neurons in a single-layer neural network
62
Apply delta rule to outer layer weights
63
Update the weights with gradient descent
• Set learning rate $\alpha = 0.5$
$\theta_{t+1} := \theta_t - \alpha \nabla f(\theta_t)$
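The three chain-rule factors and the update of $w_5$ can be sketched as follows. The starting values are assumed to match the referenced Matt Mazur example (they are not preserved in this text), and the error is assumed to be $\frac{1}{2}(target - out)^2$ per output.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# ASSUMED values (not present in the extracted slides).
i1, i2 = 0.05, 0.10
t1 = 0.01                                # target for the first output
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6 = 0.40, 0.45
b1, b2 = 0.35, 0.60
alpha = 0.5                              # learning rate from the slide

# Forward pass up to the first output unit.
out_h1 = sigmoid(w1 * i1 + w2 * i2 + b1)
out_h2 = sigmoid(w3 * i1 + w4 * i2 + b1)
out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)

# Chain rule: dE/dw5 = dE/dout_o1 * dout_o1/dnet_o1 * dnet_o1/dw5
dE_dout = out_o1 - t1                    # derivative of 0.5 * (t1 - out_o1)^2
dout_dnet = out_o1 * (1.0 - out_o1)      # sigmoid derivative
dnet_dw5 = out_h1                        # net_o1 is linear in w5
dE_dw5 = dE_dout * dout_dnet * dnet_dw5

# Gradient descent step for w5.
w5_new = w5 - alpha * dE_dw5
```

Only the first output's error depends on $w_5$, which is why the second target does not appear here.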
64
Backpropagation to hidden layer
65
BP through hidden layer
66
BP through hidden layer
• Consider one of those:
67
BP through hidden layer
• Plug the values in:
68
BP through hidden layer
• Next we need $\frac{\partial out_{h1}}{\partial net_{h1}}$ and $\frac{\partial net_{h1}}{\partial w}$ for each
weight $w$
69
BP through hidden layer
• Putting it together
70
BP through hidden layer
• Compute the partial derivatives in the same
way for $w_2$, $w_3$ and $w_4$
• Update $w_2$, $w_3$ and $w_4$
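For the hidden-layer weights, each hidden unit feeds both outputs, so its error signal sums the contributions from both. The sketch below computes the gradients for $w_1$ to $w_4$. It assumes the starting values of the referenced Matt Mazur example and uses the original output-layer weights when propagating the error back; neither detail is recoverable from this text.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# ASSUMED values (not present in the extracted slides).
i1, i2 = 0.05, 0.10
t1, t2 = 0.01, 0.99
w = {"w1": 0.15, "w2": 0.20, "w3": 0.25, "w4": 0.30,
     "w5": 0.40, "w6": 0.45, "w7": 0.50, "w8": 0.55}
b1, b2 = 0.35, 0.60
alpha = 0.5

# Forward pass.
out_h1 = sigmoid(w["w1"] * i1 + w["w2"] * i2 + b1)
out_h2 = sigmoid(w["w3"] * i1 + w["w4"] * i2 + b1)
out_o1 = sigmoid(w["w5"] * out_h1 + w["w6"] * out_h2 + b2)
out_o2 = sigmoid(w["w7"] * out_h1 + w["w8"] * out_h2 + b2)

# Output-layer error signals: dE/dnet for each output unit.
delta_o1 = (out_o1 - t1) * out_o1 * (1.0 - out_o1)
delta_o2 = (out_o2 - t2) * out_o2 * (1.0 - out_o2)

# Each hidden unit collects error from both outputs through its outgoing weights.
dE_dout_h1 = delta_o1 * w["w5"] + delta_o2 * w["w7"]
dE_dout_h2 = delta_o1 * w["w6"] + delta_o2 * w["w8"]

# Hidden-layer error signals: multiply by the sigmoid derivative.
delta_h1 = dE_dout_h1 * out_h1 * (1.0 - out_h1)
delta_h2 = dE_dout_h2 * out_h2 * (1.0 - out_h2)

# Gradient wrt each hidden weight is the error signal times the incoming input.
grads = {"w1": delta_h1 * i1, "w2": delta_h1 * i2,
         "w3": delta_h2 * i1, "w4": delta_h2 * i2}
updated = {name: w[name] - alpha * g for name, g in grads.items()}
```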
71
After first update with backpropagation
72
Did the error decrease?
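One way to check is to run a full update of all eight weights and recompute the total error. The sketch below again assumes the starting values of the referenced Matt Mazur example and, as a simplifying assumption, leaves the biases fixed.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# ASSUMED values (not present in the extracted slides).
I1, I2 = 0.05, 0.10
T1, T2 = 0.01, 0.99
B1, B2 = 0.35, 0.60   # biases are kept fixed in this sketch
ALPHA = 0.5

def forward(w):
    """w is the list [w1, ..., w8]; returns hidden and output activations."""
    w1, w2, w3, w4, w5, w6, w7, w8 = w
    out_h1 = sigmoid(w1 * I1 + w2 * I2 + B1)
    out_h2 = sigmoid(w3 * I1 + w4 * I2 + B1)
    out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + B2)
    out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + B2)
    return out_h1, out_h2, out_o1, out_o2

def total_error(w):
    _, _, out_o1, out_o2 = forward(w)
    return 0.5 * (T1 - out_o1) ** 2 + 0.5 * (T2 - out_o2) ** 2

def train_step(w):
    """One backpropagation + gradient descent update of all eight weights."""
    w1, w2, w3, w4, w5, w6, w7, w8 = w
    out_h1, out_h2, out_o1, out_o2 = forward(w)
    d_o1 = (out_o1 - T1) * out_o1 * (1.0 - out_o1)
    d_o2 = (out_o2 - T2) * out_o2 * (1.0 - out_o2)
    d_h1 = (d_o1 * w5 + d_o2 * w7) * out_h1 * (1.0 - out_h1)
    d_h2 = (d_o1 * w6 + d_o2 * w8) * out_h2 * (1.0 - out_h2)
    grads = [d_h1 * I1, d_h1 * I2, d_h2 * I1, d_h2 * I2,
             d_o1 * out_h1, d_o1 * out_h2, d_o2 * out_h1, d_o2 * out_h2]
    return [wi - ALPHA * g for wi, g in zip(w, grads)]

weights = [0.15, 0.20, 0.25, 0.30, 0.40, 0.45, 0.50, 0.55]
error_before = total_error(weights)
error_after = total_error(train_step(weights))
```

Under these assumptions `error_after` is slightly smaller than `error_before`; repeating `train_step` many times keeps driving the error down.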
73
In conclusion
• Neural networks consist of artificial neurons organized into layers and connected
to each other with learnable weights.
• Backpropagation with gradient descent is the standard method for training neural
networks.
• Backpropagation can be used to compute the gradients of a neural network,
regardless of the depth of the network.
• Of course, there are other important tricks and tips but this is the basis of
understanding neural networks and deep learning.
74
Common neural network architectures
Feed-forward network
76
https://upload.wikimedia.org/wikipedia/en/5/54/Feed_forward_neural_net.gif
Recurrent neural network
77
Convolutional neural networks
78
http://parse.ele.tue.nl/education/cluster2
Autoencoders
• Output layer attempts to reconstruct
the input
• Used for unsupervised feature learning
• The hidden layer typically has fewer
neurons than the input, thus performing data
compression
79
Getting started with neural networks
Courses and tutorials
• https://www.coursera.org/learn/machine-learning
• Introductory course on machine learning, provides necessary background
• https://www.coursera.org/learn/neural-networks
• Course on neural networks – assumes knowledge about machine learning
• http://ufldl.stanford.edu/tutorial/
• Tutorial on deep learning that also covers some simpler machine learning methods
• http://cs231n.stanford.edu/
• Course on convolutional neural networks
• https://www.udacity.com/course/deep-learning--ud730
• Course on deep learning
81
Books
• http://www.deeplearningbook.org/
• Deep Learning: A Practitioner’s approach – not released yet
• Fundamentals of deep learning – not released yet
82
Low level libraries
• Theano - http://deeplearning.net/software/theano/
• TensorFlow - https://www.tensorflow.org/get_started/
• Python-based
• Automatic differentiation
• Can use CUDA for computing on GPU
• Torch – http://torch.ch/
• Based on Lua
• Modular pieces that are easy to combine
• Lots of pretrained models
83
Higher level libraries
• Keras - https://keras.io/
• Runs on top of Theano and TensorFlow
• Based on Python
• Modular
• Supports both convolutional and recurrent networks
• Supports arbitrary connectivity
• Runs on both CPU and GPU
84
Keras – example code
85
What else?
• Take the Machine Learning course in spring semester
• Use neural networks for your thesis work
• Potential supervisors at UT:
• Kairit Sirts (problems involving natural language)
• Mark Fishel (machine translation)
• Raul Vicente (computational neuroscience)
• Ilya Kuzovkin (computational neuroscience)
86
In conclusion - Deep learning
• Can be used to solve very complex problems
• Based on artificial neural networks with many hidden layers
• Each artificial neuron is a simple computational unit
• Neural networks are trained with the gradient descent algorithm
• The backpropagation algorithm is used to compute the gradients with respect to
tunable parameters
• There are many tutorials and online courses about deep learning
• There are various software libraries that make it relatively easy to get started
with deep learning
87