Pretrained Model
AI VIET NAM
Nguyễn Quốc Thái
CONTENT
1 – Pretrained Word Vectors
Pre-trained Language Models (LMs)
Task-specific data alone is not enough.
Reuse a pre-trained embedding matrix to get word vectors.
Pipeline: a pre-trained LM provides features for a classifier trained on labeled data.
Data:
"This movie is bad" → 0
"This movie is good" → 1
Popular pre-trained models: Word2Vec, GloVe, fastText, ELMo
Training data: Wikipedia, news, books, social networks, …
Word2Vec
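As a quick sketch of using such a model off the shelf, gensim's downloader can fetch the pretrained Google News Word2Vec vectors (the model name "word2vec-google-news-300" is from gensim-data's catalog; the download is large, ~1.6 GB):

import gensim.downloader as api

# Downloads the pretrained Google News Word2Vec vectors on first use.
wv = api.load("word2vec-google-news-300")
print(wv.most_similar("movie", topn=3))  # nearest neighbours by cosine similarity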
fastText
Perplexity
ELMo
GloVe
Pre-trained GloVe Embedding
Version: 6B tokens, 400K vocab, 50-d vectors
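A minimal sketch of loading this version into memory (assumes glove.6B.50d.txt from the official GloVe release has been downloaded; each line is a word followed by 50 floats):

import numpy as np

def load_glove(path="glove.6B.50d.txt"):
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *values = line.rstrip().split(" ")
            vectors[word] = np.asarray(values, dtype=np.float32)
    return vectors

glove = load_glove()
print(len(glove))               # ~400K words
print(glove["language"].shape)  # (50,)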
Find Synonyms
Find Analogies
Find Synonyms: given a word, find the top-k most similar words.
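A sketch of the top-k search by cosine similarity (reuses the glove dict loaded above; a brute-force scan over the 400K vocabulary, fine for a demo):

import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def top_k_synonyms(word, k=5):
    query = glove[word]
    scores = [(w, cosine(query, vec)) for w, vec in glove.items() if w != word]
    return sorted(scores, key=lambda x: x[1], reverse=True)[:k]

print(top_k_synonyms("movie", k=3))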
Find Analogies: given three words a, b, c, find a word d with the analogous relationship a : b :: c : d.
Example: "man" : "woman" :: "son" : "daughter"
The relationship Vec(a) + Vec(d) = Vec(b) + Vec(c) rearranges to Vec(d) = Vec(b) − Vec(a) + Vec(c), so we search for the word closest to that point.
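Rearranged that way, the search is one vector-arithmetic step plus a nearest-neighbour lookup (reusing glove and cosine from the sketches above):

def analogy(a, b, c, k=1):
    target = glove[b] - glove[a] + glove[c]  # Vec(d) ≈ Vec(b) - Vec(a) + Vec(c)
    scores = [(w, cosine(target, vec))
              for w, vec in glove.items() if w not in (a, b, c)]
    return sorted(scores, key=lambda x: x[1], reverse=True)[:k]

print(analogy("man", "woman", "son"))  # expect "daughter" near the top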
2 – Text Classification
Preprocessing
Representation: One-hot, BoW, TF-IDF, pre-trained GloVe
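For comparison, a small scikit-learn sketch of two of these representations on the toy reviews from earlier (CountVectorizer gives BoW counts, TfidfVectorizer reweights them):

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = ["This movie is bad", "This movie is good"]
bow = CountVectorizer().fit_transform(corpus)    # raw word counts
tfidf = TfidfVectorizer().fit_transform(corpus)  # counts reweighted by document rarity
print(bow.toarray())
print(tfidf.toarray().round(2))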
Neural Network
Review: Embedding Layer
Embedding matrix (vocab size = 7, embedding dim = 2):
0: [0.1, 3.1]
1: [0.5, 2.5]
2: [1.3, 0.6]
3: [0.4, 0.1]
4: [0.7, 1.4]
5: [2.3, 1.7]
6: [2.5, 2.5]

Input matrix (index-based representation), shape 2×5:
[0, 4, 2, 3, 1]
[5, 6, 1, 2, 3]

Select operation: each index selects the corresponding row of the embedding matrix.
Output matrix shape: 2×5×2
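The same lookup in PyTorch (the weights mirror the matrix above; nn.Embedding performs exactly this row selection):

import torch
import torch.nn as nn

weights = torch.tensor([[0.1, 3.1], [0.5, 2.5], [1.3, 0.6], [0.4, 0.1],
                        [0.7, 1.4], [2.3, 1.7], [2.5, 2.5]])
emb = nn.Embedding(num_embeddings=7, embedding_dim=2)
emb.weight.data.copy_(weights)      # install the example matrix

x = torch.tensor([[0, 4, 2, 3, 1],
                  [5, 6, 1, 2, 3]])  # input shape 2x5
print(emb(x).shape)                  # torch.Size([2, 5, 2])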
Pre-trained GloVe Embedding

Embedding matrix (randomly initialized for our vocab):
0 <oov>: 0.1 3.1
1 <pad>: 0.5 2.5
2 <unk>: 1.3 0.6
3 neural: 0.4 0.1
4 language: 0.7 1.4

GloVe embedding matrix:
0 <oov>: 0.1 0.1
1 <pad>: 0.5 0.5
2 <unk>: 0.3 0.6
3 language: 0.7 0.7
4 mưa: 0.7 0.4

Final embedding matrix: each word in our vocab that appears in GloVe takes its GloVe vector; words missing from GloVe (here "neural") are initialized to zeros.
0 <oov>: 0.1 0.1
1 <pad>: 0.5 0.5
2 <unk>: 0.3 0.6
3 neural: 0.0 0.0
4 language: 0.7 0.7
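A sketch of building that final matrix from the loaded glove dict (the vocab order and zero-init for missing words follow the example above; variable names are ours):

import numpy as np
import torch
import torch.nn as nn

vocab = ["<oov>", "<pad>", "<unk>", "neural", "language"]
dim = 50
final = np.zeros((len(vocab), dim), dtype=np.float32)
for i, word in enumerate(vocab):
    if word in glove:
        final[i] = glove[word]  # copy the pre-trained vector
    # else: stays all-zero, like "neural" in the example

# Use it to initialize the layer; freeze=False lets it be fine-tuned.
emb = nn.Embedding.from_pretrained(torch.from_numpy(final), freeze=False)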
3 - Summary
Basic NLP Course
01 Introduction
02 Preprocessing
03 Language Modeling
04 Part Of Speech (POS)
05 Constituency Parsing
06 Basic Vectorization
07 Word2Vec
08 Pretrained Model
2 - Preprocessing
Balanced vs. imbalanced data: proportions of positive and negative examples.
3 – Language Model
4 – POS Tagging - NER
5 – Constituency Parsing (CFG)
Grammar in CNF, parsed with CKY
G = (T, N, P, S, R)
T: a set of terminal symbols
N: a set of non-terminal symbols
P (⊆ N): a set of pre-terminal symbols
S: a start symbol
R: a set of rules or productions, R = {X → γ | X ∈ N, γ ∈ (T ∪ N)*}

Example grammar:
1. Start: S
2. S → NP VP
3. NP → Det Noun
4. NP → NN PP
5. PP → Prep NP
6. VP → V NP
7a. VP → V Args    7b. Args → NP PP
8. V → ate
9. NP → John
10. NP → ice-cream | snow
11. Noun → ice-cream | pizza
12. Noun → table | guy | campus
13. Det → the
14. Prep → on
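A minimal CKY recognizer sketch over a CNF subset of this grammar (the rule list is trimmed to what one example sentence needs):

rules = [
    ("S", ("NP", "VP")), ("NP", ("Det", "Noun")), ("VP", ("V", "NP")),
    ("V", ("ate",)), ("NP", ("John",)), ("Noun", ("pizza",)), ("Det", ("the",)),
]

def cky(words):
    n = len(words)
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):  # diagonal: pre-terminals covering each word
        table[i][i + 1] = {lhs for lhs, rhs in rules if rhs == (w,)}
    for span in range(2, n + 1):   # widen spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # try every split point
                for lhs, rhs in rules:
                    if len(rhs) == 2 and rhs[0] in table[i][k] and rhs[1] in table[k][j]:
                        table[i][j].add(lhs)
    return "S" in table[0][n]

print(cky("John ate the pizza".split()))  # True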
5 – Dependency Parsing
A graph G = (V, A)
V: vertices, {w0 = ROOT, w1, …, wn}, usually one per word in the sentence
A: arcs, {(wi, r, wj) : wi ≠ wj, wi ∈ V, wj ∈ V − {w0}, r ∈ Rx}
Rx: the set of all possible dependency relations in sentence x
Dependency Tree
Has a single ROOT
Projective
Acyclic
Unique path from ROOT to each word
6 – Basic Vectorization
One-hot encoding
Bag-of-words (BoW)
Bag-of-N-gram
7 – Word2Vec
8 – Pre-trained Embedding
Word2Vec
GloVe
fastText
ELMo
NLP Pipeline
The neural history of NLP
2001: Neural language models
2008: Multi-task learning
2013: Word embeddings; neural networks for NLP (CNNs)
2014: Sequence-to-sequence models (machine translation, image captioning)
References
1. https://web.stanford.edu/~jurafsky/slp3/
2. http://web.stanford.edu/class/cs224n/
3. https://d2l.ai/
4. http://nlpprogress.com/
5. https://github.com/undertheseanlp/NLP-Vietnamese-progress
Thanks! Any questions?