Seq2seq, Attention, Self-attention, Transformer, BERT

Tuan Nguyen - AI4E
Outline
● RNN review
● Seq2seq
● Beam search
● Attention
● Self-attention
● Transformer
● BERT
Recurrent Neural Network
Usually drawn as:
[Figure: an RNN cell with input x, output y, and a feedback loop on its hidden state]
RNN Formula
The state consists of a single “hidden” vector h:
h_t = f_W(h_{t-1}, x_t), typically h_t = tanh(W_hh · h_{t-1} + W_xh · x_t) and y_t = W_hy · h_t

[Figure: forward pass unrolled over time: (x1, h0) → h1 → y1 with cost C1, (x2, h1) → h2 → y2 with cost C2, (x3, h2) → h3 → y3 with cost C3]
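As a quick sanity check of the formula above, here is a minimal numpy sketch of one forward step; the weight names (W_xh, W_hh, W_hy) and the sizes are illustrative, not taken from the slides.

```python
import numpy as np

def rnn_step(x, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    """One forward step of a vanilla RNN: new hidden state and output."""
    h = np.tanh(W_xh @ x + W_hh @ h_prev + b_h)   # h_t = tanh(W_xh x_t + W_hh h_{t-1})
    y = W_hy @ h + b_y                            # y_t = W_hy h_t
    return h, y

# Illustrative sizes: 4-dim input (e.g. a one-hot character), 8-dim hidden state
rng = np.random.default_rng(0)
W_xh, W_hh, W_hy = rng.normal(size=(8, 4)), rng.normal(size=(8, 8)), rng.normal(size=(4, 8))
b_h, b_y = np.zeros(8), np.zeros(4)

h = np.zeros(8)                      # h0
for x in np.eye(4)[[0, 1, 2, 2]]:    # a short one-hot input sequence
    h, y = rnn_step(x, h, W_xh, W_hh, W_hy, b_h, b_y)
```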
Deep RNN

[Figure: a deep (stacked) RNN unrolled over time; within each layer, the same parameters are shared across all time steps]
Recurrent neural network problem: gradients vanish or explode over long sequences, so plain RNNs struggle with long-range dependencies.
Character-level language model example

Vocabulary: [h, e, l, o]

Example training sequence: "hello"
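A small sketch of how the "hello" example is set up: indices over the vocabulary [h, e, l, o], one-hot encoded, with each character used to predict the next one. The encoding details are illustrative, not the exact slide code.

```python
import numpy as np

vocab = ['h', 'e', 'l', 'o']
char_to_ix = {c: i for i, c in enumerate(vocab)}

text = "hello"
# Each input character predicts the next one: "hell" -> "ello"
inputs  = [char_to_ix[c] for c in text[:-1]]   # [0, 1, 2, 2]
targets = [char_to_ix[c] for c in text[1:]]    # [1, 2, 2, 3]

# One-hot encode the inputs for an RNN like the one sketched above
X = np.eye(len(vocab))[inputs]
print(X.shape)   # (4, 4): four time steps, vocabulary size 4
print(targets)   # indices the softmax output should assign high probability to
```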
Long short-term memory (LSTM)
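A minimal numpy sketch of the standard LSTM cell update (input, forget, and output gates plus a candidate state); the stacked-weight layout and sizes are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W maps [x; h_prev] to the stacked gate pre-activations."""
    z = W @ np.concatenate([x, h_prev]) + b
    H = h_prev.shape[0]
    i = sigmoid(z[0*H:1*H])        # input gate
    f = sigmoid(z[1*H:2*H])        # forget gate
    o = sigmoid(z[2*H:3*H])        # output gate
    g = np.tanh(z[3*H:4*H])        # candidate cell state
    c = f * c_prev + i * g         # new cell state
    h = o * np.tanh(c)             # new hidden state
    return h, c

# Illustrative sizes: 4-dim input, 8-dim hidden/cell state
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * 8, 4 + 8))
b = np.zeros(4 * 8)
h, c = np.zeros(8), np.zeros(8)
h, c = lstm_step(np.eye(4)[0], h, c, W, b)
```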
Translation model
Seq2seq
Seq2seq - prediction
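A schematic sketch of seq2seq prediction: an encoder RNN compresses the source sequence into its final hidden state, which initialises a decoder that emits one token at a time and feeds each prediction back in. All names, weights, and token ids below are toy placeholders, not a specific framework API; the argmax is greedy decoding, which the beam-search sketch further below generalises.

```python
import numpy as np

def encode(src_ids, embed, rnn_step, h0):
    """Run the encoder over the source sequence; return its final hidden state."""
    h = h0
    for tok in src_ids:
        h = rnn_step(embed[tok], h)
    return h

def greedy_decode(h, embed, rnn_step, out_proj, sos_id, eos_id, max_len=20):
    """Emit one token at a time, feeding each prediction back as the next input."""
    tok, out = sos_id, []
    for _ in range(max_len):
        h = rnn_step(embed[tok], h)
        tok = int(np.argmax(out_proj @ h))   # pick the most likely next token
        if tok == eos_id:
            break
        out.append(tok)
    return out

# Toy setup: vocabulary of 6 tokens, embedding size 5, hidden size 8, random weights
rng = np.random.default_rng(0)
V, H, E = 6, 8, 5
embed = rng.normal(size=(V, E))
W_x, W_h = rng.normal(size=(H, E)), rng.normal(size=(H, H))
out_proj = rng.normal(size=(V, H))
step = lambda x, h: np.tanh(W_x @ x + W_h @ h)

h_src = encode([2, 3, 4], embed, step, np.zeros(H))
print(greedy_decode(h_src, embed, step, out_proj, sos_id=0, eos_id=1))
```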
Greedy search
Beam search
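A sketch contrasting the two decoders: greedy search keeps only the single best token at each step, while beam search keeps the k highest-scoring partial sequences. The scoring function here is a made-up stand-in for a real decoder.

```python
import numpy as np

def beam_search(next_log_probs, sos_id, eos_id, beam_size=3, max_len=10):
    """Keep the beam_size best partial sequences instead of only the greedy best."""
    beams = [([sos_id], 0.0)]                       # (token prefix, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            if prefix[-1] == eos_id:                # finished hypotheses are kept as-is
                candidates.append((prefix, score))
                continue
            log_p = next_log_probs(prefix)
            for tok in np.argsort(log_p)[-beam_size:]:
                candidates.append((prefix + [int(tok)], score + float(log_p[tok])))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beams[0][0]

# Toy stand-in for a decoder: the same next-token distribution at every step
toy_log_p = np.log(np.array([0.05, 0.10, 0.40, 0.25, 0.15, 0.05]))
# Greedy search would simply take int(np.argmax(toy_log_p)) at each step
print(beam_search(lambda prefix: toy_log_p, sos_id=0, eos_id=1))
```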
Word embeddings (GloVe, word2vec)
Word semantics
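A toy illustration of the "semantics as geometry" idea behind word2vec/GloVe: related words have high cosine similarity, and relations appear as vector offsets (king - man + woman ≈ queen). The 3-dimensional vectors below are invented for illustration; real embeddings are learned from corpora and are typically 100-300 dimensional.

```python
import numpy as np

# Hypothetical toy embeddings (real GloVe/word2vec vectors are learned, not hand-picked)
emb = {
    "king":  np.array([0.8, 0.7, 0.1]),
    "queen": np.array([0.8, 0.7, 0.9]),
    "man":   np.array([0.2, 0.1, 0.1]),
    "woman": np.array([0.2, 0.1, 0.9]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(emb["king"], emb["queen"]))                  # high: related words
analogy = emb["king"] - emb["man"] + emb["woman"]         # relation as a vector offset
print(max(emb, key=lambda w: cosine(emb[w], analogy)))    # -> "queen" in this toy space
```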
Attention - motivation
Attention

[Figure: a query vector q scored against keys k1, k2, k3]
Attention
Attention function
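A numpy sketch of the attention function in its scaled dot-product form: score the query against each key, softmax the scores into weights, and return the weighted sum of the values. Shapes are illustrative.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of each query with each key
    weights = softmax(scores, axis=-1)   # attention weights sum to 1 over the keys
    return weights @ V, weights

# One query attending over three key/value pairs (like the q, k1..k3 picture above)
rng = np.random.default_rng(0)
q = rng.normal(size=(1, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
context, w = attention(q, K, V)
print(w)   # three weights summing to 1
```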
Self-attention
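Self-attention applies the same function within a single sequence: queries, keys, and values are all linear projections of the same input X. A minimal sketch with illustrative sizes and no masking.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Each position attends to every position of the same sequence X."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]), axis=-1)
    return weights @ V

# 5 tokens with 8-dim representations, projected to 8-dim Q/K/V
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)   # (5, 8): one output vector per token
```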
Transformer
Seq2seq
Transformer
Multi-head attention
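Multi-head attention runs several attention "heads" in parallel on lower-dimensional projections and concatenates their outputs before a final projection; a sketch with illustrative sizes (2 heads of width 4 for a model width of 8).

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, heads, W_o):
    """heads is a list of (W_q, W_k, W_v) triples, one per attention head."""
    outputs = []
    for W_q, W_k, W_v in heads:
        Q, K, V = X @ W_q, X @ W_k, X @ W_v
        weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]), axis=-1)
        outputs.append(weights @ V)
    return np.concatenate(outputs, axis=-1) @ W_o   # concat heads, then project

# 5 tokens, model width 8, 2 heads of width 4 each
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
heads = [tuple(rng.normal(size=(8, 4)) for _ in range(3)) for _ in range(2)]
W_o = rng.normal(size=(8, 8))
print(multi_head_attention(X, heads, W_o).shape)    # (5, 8)
```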
BERT
BERT model

● Masked Language Modeling (MLM)
● Next Sentence Prediction (NSP)
Masked LM (MLM)
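A sketch of how an MLM training example can be built: roughly 15% of the tokens are replaced by a [MASK] token and the model is trained to recover the originals at those positions. (BERT additionally swaps some selected tokens for random or unchanged tokens; that refinement is omitted here.)

```python
import random

def make_mlm_example(tokens, mask_token="[MASK]", mask_prob=0.15, seed=1):
    """Return (masked tokens, {position: original token}) for MLM training."""
    rng = random.Random(seed)
    masked, labels = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok           # the model must recover this token...
            masked[i] = mask_token    # ...from this masked position
    return masked, labels

tokens = "the man went to the store to buy a gallon of milk".split()
masked, labels = make_mlm_example(tokens)
print(masked)   # original sentence with some positions replaced by [MASK]
print(labels)   # {position: original token} pairs the model is trained to predict
```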
Next sentence prediction
Fine-tuning BERT
● Classification tasks such as sentiment analysis.
● Question answering tasks (e.g. SQuAD v1.1).
● Named entity recognition (NER).
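A minimal fine-tuning sketch for the classification case, assuming the Hugging Face transformers library and the bert-base-uncased checkpoint; the texts, labels, and single optimisation step are placeholders for a real dataset and training loop.

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Toy sentiment examples; a real setup would use a proper dataset and DataLoader
texts = ["a great movie", "a terrible movie"]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**batch, labels=labels)   # classification head sits on the [CLS] representation
outputs.loss.backward()                   # one fine-tuning step
optimizer.step()
```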
Q&A
