openSAP ml2 Week All Slides
Forward pass
▪ Calculates scores based on the weights of the hidden nodes
Backpropagation
▪ Calculates the error contributed by each neuron after each batch is processed
▪ Weights are modified based on the calculated error
Softmax
▪ A classifier that converts scores to probabilities for each class
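For illustration, a minimal softmax sketch in Python (not from the slides; the example scores are made up):

```python
import numpy as np

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    exp_scores = np.exp(scores - np.max(scores))  # subtract max for numerical stability
    return exp_scores / exp_scores.sum()

# Three class scores -> class probabilities
print(softmax(np.array([2.0, 1.0, 0.1])))  # approx. [0.66, 0.24, 0.10]
```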
Week 3: Deep Networks and Sequence Models
Unit 2: Introduction to Sequence Models
Introduction to Sequence Models
What we covered in the last unit
Deep networks
Content:
▪ Sequence data
▪ Sequence models
▪ Applications of sequence models
Time series
Day        1  2  3  4  5
Items sold 2  4  7  7  ?

Sentences
Position   1  2  3  4

Characteristics:
▪ Ordered elements
▪ Number of elements can vary
▪ Time series
External factors: weather, competing products, etc.
Internal factors: previous values
▪ Sentences
External factors: paragraph, conversation context, etc.
Internal factors: previous words
▪ Primary focus is on internal factors
Previous elements determine the next element, up to noise
Mathematically: et+1 = f(e1, e2, …, et) + ε
▪ n-gram models
Infer wt+1 based on a fixed number n of previous words (e.g., hidden Markov models)
Example: let n = 3
the cat is → cute ✔ (not "expensive", …)
the cat on the piano is → expensive ✔ (not "cute", …)
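A toy sketch of the n-gram idea (an assumed illustration, not course code): count which word follows each n-word history and predict the most frequent successor.

```python
from collections import Counter, defaultdict

def train_ngram(sentences, n=3):
    """Count the successors of every n-word history."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i in range(len(words) - n):
            counts[tuple(words[i:i + n])][words[i + n]] += 1
    return counts

def predict_next(counts, history):
    """Most frequent word following the last n words, if the history was seen."""
    successors = counts.get(tuple(history))
    return successors.most_common(1)[0][0] if successors else None

model = train_ngram(["the cat is cute", "the piano is expensive"], n=3)
print(predict_next(model, ["the", "cat", "is"]))  # -> 'cute'
```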
▪ Bag of words
Infer wt+1 based on a fixed-length vector of word counts built on previous words
Example: assume the dictionary consists of the words "the", "cat", "nice", "caught", "jump"
the cat caught mice → cried ✔
the caught cat mice → cried ✔ (same count vector: word order is ignored)
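The count-vector construction can be sketched as follows (assumed illustration); the scrambled sentence yields the same vector, which is exactly why both predict the same next word.

```python
def count_vector(sentence, dictionary):
    """Fixed-length vector of word counts over the given dictionary."""
    words = sentence.split()
    return [words.count(w) for w in dictionary]

dictionary = ["the", "cat", "nice", "caught", "jump"]
print(count_vector("the cat caught mice", dictionary))  # [1, 1, 0, 1, 0]
print(count_vector("the caught cat mice", dictionary))  # [1, 1, 0, 1, 0] - identical
```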
Introduction to Sequence Models
Sequence models
[Figure: a sequence model reads w1, w2, …, wt and predicts wt+1; a state h carries the encoded info of w1, w2, …, wt-1, and the current word wt is the input]

Motivation:
We change
wt+1 = f(w1, w2, …, wt-1, wt)
to
wt+1 = f(h, wt)
where h is the encoded info of w1, w2, …, wt-1

Remarks: When h can be expressed as a fixed-length feature vector:
▪ The function f can be learned in a similar way across different positions/states
▪ Sentences with different lengths are handled consistently
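These remarks can be sketched as a fold over the sequence (assumed illustration; update and predict stand in for learned functions):

```python
def next_element(sequence, update, predict, h0):
    """wt+1 = f(h, wt): the whole history is folded into a fixed-size state h."""
    h = h0
    for w in sequence:
        h = update(h, w)  # h now encodes everything seen so far
    return predict(h)     # prediction depends only on h, not on the sequence length
```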
▪ Text classification
[Figure: the whole sequence w1, w2, …, wt is mapped to a single label]
▪ Example: Sentiment analysis
the cat is so cute → positive ✔
the sky is blue → neutral ✔
this is bad → negative ✔
Introduction to Sequence Models
Other applications of sequence models – Sequence to sequence
[Figure: sequence to sequence — an encoder reads w1, w2, …, wt followed by <eos>; a decoder then emits o1, …, ok]
Example input: I bike to work
Coming up next:
▪ Distributed representations
▪ Word2Vec
Week 3: Deep Networks and Sequence Models
Unit 3: Representation Learning for NLP
Representation Learning for NLP
What we covered in the last unit
Sequential data
Sequence models
Content:
▪ Vector representations of words
▪ Distributed representations
▪ Word2Vec
One-hot encoding
▪ Definition: Given a vocabulary V, every word is converted to a vector in ℝ|V|×1 that is 0 at all indices but 1 at the index of the word represented
▪ Example: "Everybody likes dogs and hates bananas."
[Figure: "Everybody", "likes", and "dogs" each become a |V|-dimensional column vector with a single 1 at that word's index]
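A minimal one-hot sketch (assumed illustration; the vocabulary is taken from the example sentence):

```python
import numpy as np

def one_hot(word, vocabulary):
    """|V|-dimensional vector: all zeros except a 1 at the word's index."""
    vec = np.zeros(len(vocabulary))
    vec[vocabulary.index(word)] = 1.0
    return vec

vocab = ["everybody", "likes", "dogs", "and", "hates", "bananas"]
print(one_hot("dogs", vocab))  # [0. 0. 1. 0. 0. 0.]
```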
One-hot representation
▪ Straightforward implementation
▪ Meaning of a word is not encoded in the representation
▪ Vector dimensionality scales with vocabulary size
[Figure: the one-hot vectors of "Queen", "Prince", and "King" are mutually orthogonal — e.g., (1 0 0 0 0)(0 1 0 0 0)ᵀ = 0 — so no similarity between words is captured]
Distributed Representations
▪ Instead of storing all information in one dimension, we distribute the meaning across a fixed number of dimensions
▪ Encodes semantic and syntactic features of words
▪ Reduces the necessary number of dimensions

             Queen  Woman  King  Prince
Femininity   0.99   0.99   0.01  0.02
Masculinity  0.01   0.01   0.99  0.92
Royalty      0.99   0.99   0.99  0.81
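With the toy vectors from the table above, cosine similarity recovers the intended intuition (the numbers are the table's; the code is an assumed illustration, not from the slides):

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Dimensions: femininity, masculinity, royalty (values from the table above)
queen  = np.array([0.99, 0.01, 0.99])
king   = np.array([0.01, 0.99, 0.99])
prince = np.array([0.02, 0.92, 0.81])

print(cosine_similarity(king, prince))  # close to 1: similar words
print(cosine_similarity(queen, king))   # around 0.5: clearly less similar
```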
Word2Vec: Skip-Gram Model
▪ For each word in a corpus, predict the neighboring words in a context window of length c
▪ Example: Everybody likes dogs and hates bananas.
[Figure: the one-hot input word passes through the word embedding matrix W(V×N) to the hidden layer h, then through W′(N×V) to predict the context words x(o−c), …, x(o+c); the prediction error is backpropagated]
Word2Vec: Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean (2013). Distributed Representations of Words and Phrases and their Compositionality.
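A sketch of how skip-gram training pairs are generated from the example sentence (assumed illustration, not the course's code):

```python
def skipgram_pairs(sentence, c=2):
    """(center word, context word) pairs within a window of length c."""
    words = sentence.lower().rstrip(".").split()
    pairs = []
    for i, center in enumerate(words):
        for j in range(max(0, i - c), min(len(words), i + c + 1)):
            if j != i:
                pairs.append((center, words[j]))
    return pairs

# For the center word 'dogs' this yields ('dogs', 'everybody'), ('dogs', 'likes'),
# ('dogs', 'and'), ('dogs', 'hates'), among the other pairs.
print(skipgram_pairs("Everybody likes dogs and hates bananas.", c=2))
```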
Representation Learning for NLP
Word2Vec: CBOW
[Figure: the context words x(o−c), …, x(o+c) enter through the input layer and W(V×N), and their combined projection predicts the center word]
Representation Learning for NLP
Word2Vec captures similarity of words…
Word similarity
▪ Closest vectors in terms of (cosine) distance according to GoogleNews-vectors (trained on about 100 billion words)
▪ Plurals excluded for illustration
word2vec: https://code.google.com/archive/p/word2vec/
Representation Learning for NLP
…and beyond
[Figures: further examples of relationships captured by Word2Vec vectors, from Mikolov et al. (2013)]
Representation Learning for NLP
Vector representation for words
Demo in TensorBoard
▪ Visualization of embeddings in TensorBoard, using either t-SNE or PCA
▪ 10,000 word vectors with dimensionality of 128
▪ Reduced to d = 3 for visualization
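For reference, the TensorFlow 1.x embedding projector is configured roughly as follows (a sketch under the 2017-era API; embedding_var, the log directory, and the metadata file are assumptions):

```python
import tensorflow as tf
from tensorflow.contrib.tensorboard.plugins import projector

# A [10000, 128] variable holding the word vectors (assumed to be trained elsewhere)
embedding_var = tf.Variable(tf.random_normal([10000, 128]), name="word_embeddings")

config = projector.ProjectorConfig()
embedding = config.embeddings.add()
embedding.tensor_name = embedding_var.name
embedding.metadata_path = "metadata.tsv"  # one word label per line (assumed file)

writer = tf.summary.FileWriter("log_dir")
projector.visualize_embeddings(writer, config)  # TensorBoard then offers PCA/t-SNE views
```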
Week 3: Deep Networks and Sequence Models
Unit 4: Basic RNNs in TensorFlow
Basic RNNs in TensorFlow
What we covered in the last unit
Representation learning for NLP (Word2Vec)
Content:
▪ Recap: Sequence types
▪ Basic RNN cells
▪ Training RNNs
▪ One-to-One
Example: Height
Parent's height → Child's height
▪ One-to-Many
Example: Image captioning
Image → Sequence of words
▪ Many-to-One
Example: Sentiment analysis
Sequence of words → Sentiment
▪ Many-to-Many
Example: Machine translation
Sequence of words → Sequence of words
[Figure: unrolling an RNN — the recurrent loop x → h → y is unrolled into time steps, so xt-2, xt-1, xt feed ht-2, ht-1, ht, which emit yt-2, yt-1, yt; the weights Wx, Wh, Wy are shared across all steps]

[Figure: an RNN cell — ht-1 (memory of all previous inputs) combines with xt (input at step t) to form ht (memory of all previous inputs and input xt), which produces yt (output at step t)]

▪ xt = input at step t
▪ ht-1 = memory of all previous inputs until t-1
▪ Wh = weight parameters for hidden state ht
▪ Wx = weight parameters for input xt
Basic RNNs in TensorFlow
Understanding an RNN cell
▪ xt = input at step t
▪ ht-1 = memory of all previous inputs until t-1
▪ Wh = weight parameters for hidden state ht
▪ Wx = weight parameters for input xt
▪ F = activation function, e.g. tanh
▪ Wy = weight parameters for the output
▪ yt = output at step t
The cell computes ht = F(Wh ht-1 + Wx xt) and yt = softmax(Wy ht)
Takeaway
▪ We use the same weight parameters for all t
▪ The output depends on all previous inputs (via ht-1) and the current input xt
▪ Calculating the output means multiplying by the same matrices over and over again
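A minimal numpy sketch of the unrolled forward pass (assumed shapes and toy data; TensorFlow 1.x packages the same step as tf.nn.rnn_cell.BasicRNNCell):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_forward(xs, Wx, Wh, Wy, h0):
    """Unrolled forward pass: the same Wx, Wh, Wy are reused at every step."""
    h, ys = h0, []
    for x in xs:                      # one step per sequence element
        h = np.tanh(Wh @ h + Wx @ x)  # ht = F(Wh ht-1 + Wx xt)
        ys.append(softmax(Wy @ h))    # yt = softmax(Wy ht)
    return ys, h

# Toy dimensions: input size 4, hidden size 3, output size 2
rng = np.random.default_rng(0)
Wx, Wh, Wy = rng.normal(size=(3, 4)), rng.normal(size=(3, 3)), rng.normal(size=(2, 3))
xs = [rng.normal(size=4) for _ in range(5)]
outputs, final_h = rnn_forward(xs, Wx, Wh, Wy, np.zeros(3))
```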
Basic RNNs in TensorFlow
Training of RNNs
[Figure: backpropagating from yt to ht-3 multiplies by Wh at every step, so after three steps the gradient contains Wh³]
Gradient: rate of change of the cost function with respect to each parameter
▪ Our main problem: the same operation (matrix multiplication followed by an activation function) is repeated over and over
▪ Any eigenvalue of Wh smaller than one drives the corresponding gradient component toward zero
[Figure: the unrolled network applies the identical block (weights Wp, sum, activation Fn) to inputs i0, i1, … to produce h1, h2, h3, …]
[Figure: gradient flow from hT back to h1 shrinks from large to small and is interrupted at each step, thus making learning difficult]
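The eigenvalue argument can be checked numerically (a sketch with an assumed 2×2 Wh):

```python
import numpy as np

# Backpropagation through time multiplies by Wh once per step.
# With all eigenvalues below one, the repeated product vanishes.
Wh = np.array([[0.5, 0.0],
               [0.0, 0.9]])  # eigenvalues 0.5 and 0.9, both < 1

grad = np.eye(2)
for _ in range(50):
    grad = Wh @ grad

print(np.linalg.norm(grad))  # ~0.005: after 50 steps the gradient has all but vanished
```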
Week 3: Deep Networks and Sequence Models
Unit 5: Introduction to LSTM, GRU
Introduction to LSTM, GRU
What we covered in the last unit
▪ Basic RNN cells: unrolling over inputs xt-2, xt-1, xt to produce outputs ot-2, ot-1, ot
▪ −1 ≤ tanh(x) ≤ 1
The end-to-end architecture used to model the sequence remains the same, but the cell structure changes
[Figure: LSTM cell — the cell state runs ct-1 → ct along the top (with a tanh branch); four gate layers (σ, σ, tanh, σ) combine ht-1 and xt]
LSTM:
▪ Decide whether the current input matters
▪ Decide which part of the memory to forget
▪ Decide what to output at the current time
The cell state ct:
▪ Captures long-term dependencies
▪ Is updated only by element-wise operations
▪ Gradients can flow freely along it without suppression
ft = σ(Wf [ht-1, xt] + bf)
▪ Based on the past sequence and the current input, a forget regulator ft is created
▪ It dampens certain elements of ct-1
▪ As a result, some past information is forgotten
▪ This helps in remembering only the important parts of a sequence
it = σ(Wi [ht-1, xt] + bi)
nt = tanh(Wn [ht-1, xt] + bn)
▪ nt contributes new information from the current input
▪ it acts as an input regulator
ct = ft * ct-1 + it * nt
▪ After all updates in the memory cell, we get new long-term information in ct
▪ The gate inputs to ct are bounded, but its elements can become unbounded because of the addition at each step over multiple steps
ot = σ(Wo [ht-1, xt] + bo)
ht = ot ∗ tanh(ct)
▪ Computation of the new short-term information ht
▪ Since ct is unbounded, we bound it again with an activation function, in this case tanh
▪ ot acts as an output regulator
▪ It decides what fraction of the long-term information is considered for generating the current working memory
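Putting the four equations together as one step (a minimal numpy sketch with assumed shapes; TensorFlow 1.x provides the equivalent cell as tf.nn.rnn_cell.BasicLSTMCell):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, Wf, Wi, Wn, Wo, bf, bi, bn, bo):
    """One LSTM step implementing the four equations above."""
    z = np.concatenate([h_prev, x_t])  # [ht-1, xt]
    f_t = sigmoid(Wf @ z + bf)         # forget regulator
    i_t = sigmoid(Wi @ z + bi)         # input regulator
    n_t = np.tanh(Wn @ z + bn)         # candidate new information
    c_t = f_t * c_prev + i_t * n_t     # updated long-term memory
    o_t = sigmoid(Wo @ z + bo)         # output regulator
    h_t = o_t * np.tanh(c_t)           # new short-term (working) memory
    return h_t, c_t
```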
[Figure: two LSTM cells chained — (x0, h0, c0) produce (h1, c1), which combine with x1 to produce (h2, c2); each cell applies the same gates ft, it, nt, ot]
GRU: Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, Yoshua Bengio (Dec 2014). Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling.

Introduction to LSTM, GRU
Vanilla LSTM cell structure
[Figure: the vanilla LSTM cell structure]
ResNet: Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun (Dec 2015). Deep Residual Learning for Image Recognition.
Introduction to LSTM, GRU
Coming up next
Convolutional Networks
▪ Introduction to CNNs
▪ CNN architecture
▪ Accelerating deep CNN training
▪ Applications of CNNs