"Hello World" of Deep Learning: Disclaimer: This PPT Is Modified Based On Dr. Hung-Yi Lee
"Hello World" of Deep Learning: Disclaimer: This PPT Is Modified Based On Dr. Hung-Yi Lee
of deep learning
Disclaimer:
This PPT is modified based on
Dr. Hung-yi Lee
http://speech.ee.ntu.edu.tw/~tlkagk/courses_ML17.h
tml
Introduction
• If you want to learn Theano:
http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/Theano%20DNN.ecm.mp4/index.html
http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/RNN%20training%20(v6).ecm.mp4/index.html
• Keras runs on top of TensorFlow or Theano, which are very flexible but need some effort to learn.
Install TensorFlow
• Mac: open a Terminal window
• $ pip install tensorflow
import keras
print(keras.__version__)
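If the import succeeds and a version number is printed, the installation works. Keras also ships inside TensorFlow as tf.keras; a minimal sketch of the same check through TensorFlow (assuming TensorFlow 2.x):

import tensorflow as tf
print(tf.__version__)        # TensorFlow version
print(tf.keras.__version__)  # version of the bundled Keras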
“Hello world”
• Handwriting Digit Recognition (the MNIST task)
• Input: a 28 x 28 image → Machine → output: the digit, e.g. “1”
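A minimal sketch of a Keras network for this task (the layer sizes and activations are illustrative choices, not fixed by the slide):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
# input: the 28 x 28 image, flattened into a 784-dimensional vector
model.add(Dense(units=500, activation='relu', input_dim=28 * 28))
model.add(Dense(units=500, activation='relu'))
# output: 10 units, one per digit 0-9; softmax turns them into probabilities
model.add(Dense(units=10, activation='softmax'))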
Mini-batch
• Pick the 1st batch. Each example in it is fed through the NN (e.g. $x^1 \to y^1$, $x^{31} \to y^{31}$), and the loss of each output against its target $\hat{y}$ is $l^1$, $l^{31}$, .... The batch loss is $L' = l^1 + l^{31} + \cdots$; update the parameters once.
• Pick the 2nd batch (e.g. $x^2$, $x^{16}$, ...), with batch loss $L'' = l^2 + l^{16} + \cdots$; update the parameters once.
• ...
• Until all mini-batches have been picked: that is one epoch.
• Repeat 20 times (i.e. 20 epochs).
• Batch size = 1 is exactly stochastic gradient descent.
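In Keras, this whole training loop is one call. A sketch, assuming x_train has shape (N, 784) and y_train holds one-hot labels of shape (N, 10) (the variable names and optimizer are illustrative):

# configure the loss to minimize and the update rule
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# batch_size sets the mini-batch size (batch_size=1 gives stochastic gradient descent);
# epochs=20 sweeps over all mini-batches 20 times
model.fit(x_train, y_train, batch_size=100, epochs=20)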
Speed
• Very large batch size can yield worse performance.
(Network diagram: inputs $x_1, \ldots, x_N$; layer activations $a^1, a^2, \ldots$; outputs $y_1, \ldots, y_M$.)
Forward pass (the backward pass is similar):
$y = f(x) = \sigma\big(W^L \cdots \sigma\big(W^2\, \sigma(W^1 x + b^1) + b^2\big) \cdots + b^L\big)$
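A minimal numpy sketch of this forward pass (the sigmoid stands in for the generic activation $\sigma$; all shapes are assumptions):

import numpy as np

def sigma(z):
    # sigmoid activation, one common choice for the sigma in the formula
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    # computes sigma(W^L ... sigma(W^2 sigma(W^1 x + b^1) + b^2) ... + b^L)
    a = x
    for W, b in zip(weights, biases):
        a = sigma(W @ a + b)
    return a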
Speed - Matrix Operation
• Why is mini-batch faster than stochastic gradient descent?
• Stochastic gradient descent: one matrix-vector product per example, $z^1 = W^1 x$ for the first example, then $z^1 = W^1 x$ again for the next, ...
• Mini-batch: stack the examples as the columns of a matrix and compute them together in a single matrix-matrix product: $\begin{bmatrix} z^1 & z^1 \end{bmatrix} = W^1 \begin{bmatrix} x & x \end{bmatrix}$
• Practically, which one is faster? The batched matrix product: hardware (a GPU in particular) parallelizes across the stacked examples.
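A quick way to see the effect on a CPU with numpy (the sizes are made up, and the gap is far larger on a GPU):

import numpy as np
import time

W = np.random.randn(500, 784)   # weight matrix W^1
X = np.random.randn(784, 100)   # 100 examples, one per column

t0 = time.time()
for i in range(X.shape[1]):     # SGD style: one matrix-vector product per example
    z = W @ X[:, i]
t1 = time.time()
Z = W @ X                       # mini-batch style: a single matrix-matrix product
t2 = time.time()
print(f"one by one: {t1 - t0:.5f}s, batched: {t2 - t1:.5f}s")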
Keras
• case 1:
• case 2:
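A hedged sketch of the two usual ways to use a trained Keras model, which is plausibly what the two cases above refer to (that mapping is an assumption): case 1 scores the model on labeled test data, case 2 returns raw predictions.

# case 1 (assumed): evaluate on labeled test data -> [loss, accuracy]
score = model.evaluate(x_test, y_test)
print('Total loss on testing set:', score[0])
print('Accuracy of testing set:', score[1])

# case 2 (assumed): obtain the network outputs for new inputs
result = model.predict(x_test)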
Example 1