Seq2Seq
RNN
[Figure: RNN unrolled through time; forward pass with inputs x1, x2, x3, hidden states h0..h3, cells C1..C3, and outputs y1, y2, y3]
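A minimal NumPy sketch of the unrolled forward pass in the figure, assuming a vanilla (Elman) RNN with tanh activation; the weight names Wxh, Whh, Why are illustrative, not from the slides.

```python
import numpy as np

def rnn_forward(xs, h0, Wxh, Whh, Why, bh, by):
    """Unrolled RNN: h_t = tanh(Wxh x_t + Whh h_{t-1} + bh), y_t = Why h_t + by."""
    h, hs, ys = h0, [], []
    for x in xs:                              # the SAME parameters at every time step
        h = np.tanh(Wxh @ x + Whh @ h + bh)
        hs.append(h)
        ys.append(Why @ h + by)
    return hs, ys

# tiny example: 3 time steps, input dim 4, hidden dim 5, output dim 4
rng = np.random.default_rng(0)
Wxh, Whh = rng.normal(size=(5, 4)), rng.normal(size=(5, 5))
Why, bh, by = rng.normal(size=(4, 5)), np.zeros(5), np.zeros(4)
xs = [rng.normal(size=4) for _ in range(3)]                    # x1, x2, x3
hs, ys = rnn_forward(xs, np.zeros(5), Wxh, Whh, Why, bh, by)   # h1..h3, y1..y3
```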
Deep RNN
[Figure: deep RNN unrolled over time; parameters are shared across time steps within a layer, but differ between layers]
Recurrent neural network problems: vanishing and exploding gradients
Character-level language model example
Vocabulary: [h, e, l, o]
Example training sequence: “hello”
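A sketch of how the slide's training pair could be encoded: each character of “hello” except the last is fed as a one-hot vector over the vocabulary [h, e, l, o], and the target at each step is the next character.

```python
import numpy as np

vocab = ['h', 'e', 'l', 'o']
char_to_ix = {c: i for i, c in enumerate(vocab)}

def one_hot(c):
    v = np.zeros(len(vocab))
    v[char_to_ix[c]] = 1.0
    return v

seq = "hello"
inputs  = [one_hot(c) for c in seq[:-1]]     # feed    h, e, l, l
targets = [char_to_ix[c] for c in seq[1:]]   # predict e, l, l, o
print(targets)                               # [1, 2, 2, 3]
```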
Long short-term memory (LSTM)
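A minimal sketch of a single LSTM time step with the standard gate equations (input, forget, and output gates plus a candidate cell value); packing all four gates into one weight matrix W is an assumption for compactness.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W: (4H, X+H), b: (4H,). Gate order: input, forget, output, candidate."""
    H = h_prev.size
    z = W @ np.concatenate([x, h_prev]) + b
    i, f, o = sigmoid(z[:H]), sigmoid(z[H:2*H]), sigmoid(z[2*H:3*H])
    g = np.tanh(z[3*H:])
    c = f * c_prev + i * g        # additive cell update eases gradient flow over time
    h = o * np.tanh(c)            # output gate controls what the cell exposes
    return h, c

rng = np.random.default_rng(0)
X_dim, H = 4, 5
W, b = rng.normal(size=(4 * H, X_dim + H)), np.zeros(4 * H)
h, c = lstm_step(rng.normal(size=X_dim), np.zeros(H), np.zeros(H), W, b)
```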
Translation model
Seq2Seq
Seq2Seq - prediction
Greedy search
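A sketch of greedy decoding: at each step, take the single most probable next token. The step(prefix) interface is a hypothetical stand-in for the decoder; the toy example below just cycles through a 4-token vocabulary.

```python
import numpy as np

def greedy_decode(step, bos_id, eos_id, max_len=20):
    """step(prefix) -> log-probabilities over the vocabulary for the next token."""
    seq = [bos_id]
    for _ in range(max_len):
        nxt = int(step(seq).argmax())   # locally best token; may miss better sequences
        seq.append(nxt)
        if nxt == eos_id:
            break
    return seq

# toy "decoder": always prefers token (last + 1) mod 4, with 3 acting as <eos>
toy_step = lambda seq: np.eye(4)[(seq[-1] + 1) % 4]
print(greedy_decode(toy_step, bos_id=0, eos_id=3))   # [0, 1, 2, 3]
```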
Beam search
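Beam search keeps the beam_size best partial sequences by total log-probability instead of committing to one token per step. A sketch using the same hypothetical step(prefix) interface as above:

```python
import heapq
import numpy as np

def beam_search(step, bos_id, eos_id, beam_size=3, max_len=20):
    beams = [(0.0, [bos_id])]                       # (total log-prob, tokens)
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq[-1] == eos_id:                   # finished hypotheses carry over
                candidates.append((score, seq))
                continue
            log_probs = step(seq)
            for tok in np.argsort(log_probs)[-beam_size:]:   # top-k expansions
                candidates.append((score + float(log_probs[tok]), seq + [int(tok)]))
        beams = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
        if all(seq[-1] == eos_id for _, seq in beams):
            break
    return max(beams, key=lambda c: c[0])[1]        # best hypothesis found
```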
Word embeddings (GloVe, word2vec)
Word semantics
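Word embeddings place semantically related words close together, so relatedness can be measured with cosine similarity. The 3-d vectors below are illustrative, not actual GloVe or word2vec values.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# illustrative toy vectors; real embeddings have 100-300 dimensions
emb = {
    "king":  np.array([0.8, 0.6, 0.1]),
    "queen": np.array([0.7, 0.7, 0.1]),
    "apple": np.array([0.1, 0.2, 0.9]),
}
print(cosine(emb["king"], emb["queen"]))   # high: semantically related
print(cosine(emb["king"], emb["apple"]))   # low: unrelated
```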
Attention - motivation
Attention
[Figure: a query q is scored against keys k1, k2, k3 to produce attention weights]
Attention function
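A NumPy sketch of the standard scaled dot-product attention function, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, matching the figure above: one query q scored against keys k1, k2, k3.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]))   # one distribution per query
    return weights @ V

rng = np.random.default_rng(0)
q = rng.normal(size=(1, 4))        # a single query
K = rng.normal(size=(3, 4))        # keys k1, k2, k3
V = rng.normal(size=(3, 4))        # one value per key
print(attention(q, K, V).shape)    # (1, 4): a weighted mix of the values
```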
Self-attention
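In self-attention the queries, keys, and values are all learned projections of the same sequence, so every token attends to every other token. A self-contained sketch (the projection matrices are illustrative):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 8))                        # 3 tokens, model dim 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv                   # all derived from the SAME X
out = softmax(Q @ K.T / np.sqrt(8)) @ V            # each token attends to all tokens
print(out.shape)                                   # (3, 8)
```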
Transformer
Seq2Seq
Transformer
Multi-head attention
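Multi-head attention runs several attention heads in parallel on lower-dimensional projections and concatenates their outputs; the shapes below (2 heads over model dim 8) are illustrative.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attend(Q, K, V):
    return softmax(Q @ K.T / np.sqrt(K.shape[-1])) @ V

def multi_head(X, heads, Wo):
    """heads: list of (Wq, Wk, Wv) triples projecting to d_model // n_heads dims."""
    outs = [attend(X @ Wq, X @ Wk, X @ Wv) for Wq, Wk, Wv in heads]
    return np.concatenate(outs, axis=-1) @ Wo      # concat heads, then mix with Wo

rng = np.random.default_rng(1)
d_model, n_heads = 8, 2
d_head = d_model // n_heads
heads = [tuple(rng.normal(size=(d_model, d_head)) for _ in range(3)) for _ in range(n_heads)]
Wo = rng.normal(size=(d_model, d_model))
X = rng.normal(size=(3, d_model))
print(multi_head(X, heads, Wo).shape)              # (3, 8)
```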
BERT
BERT model
● Masked Language Modeling (MLM)
● Next Sentence Prediction (NSP)
Masked LM (MLM)
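A sketch of how MLM training inputs could be built: roughly 15% of tokens are selected, and of those 80% become [MASK], 10% a random token, and 10% stay unchanged; the model is trained to recover the originals. The tiny vocabulary is illustrative.

```python
import random

MASK = "[MASK]"
VOCAB = ["the", "cat", "sat", "on", "mat"]   # illustrative toy vocabulary

def mask_tokens(tokens, p=0.15, seed=0):
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < p:
            labels.append(tok)                # predict the original token here
            r = rng.random()
            inputs.append(MASK if r < 0.8 else rng.choice(VOCAB) if r < 0.9 else tok)
        else:
            labels.append(None)               # no loss at unselected positions
            inputs.append(tok)
    return inputs, labels

print(mask_tokens(["the", "cat", "sat", "on", "the", "mat"]))
```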
Next sentence prediction
Fine-tuning BERT
● Classification tasks such as sentiment analysis (see the sketch after this list).
● Question Answering tasks (e.g. SQuAD v1.1).
● Named Entity Recognition (NER).
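A minimal fine-tuning sketch for the sentiment-classification case, using the Hugging Face transformers library (assumes transformers and torch are installed); the model choice and hyperparameters are illustrative, and a real run would loop over a full dataset.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

texts = ["great movie!", "terrible plot."]
labels = torch.tensor([1, 0])                            # 1 = positive, 0 = negative
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
out = model(**batch, labels=labels)   # classification head on the [CLS] representation
out.loss.backward()                   # one illustrative gradient step
optimizer.step()
```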
Q&A