Pretrained Model
AI VIET NAM
Nguyễn Quốc Thái
CONTENT
1 – Pretrained Word Vectors
Pre-trained Language Models (LMs)
Task-specific data alone is not enough.
Reuse a pre-trained embedding matrix to get word vectors.
Pipeline: a pre-trained LM provides features for a classifier trained on labeled data.
Data:
"This movie is bad" → 0
"This movie is good" → 1
Popular pre-trained models: Word2Vec, GloVe, fastText, ELMo
Training data: Wikipedia, news, books, social networks, …
Word2Vec
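As a quick sketch of using such a model off the shelf, gensim's downloader can fetch the pretrained Google News Word2Vec vectors (the model name "word2vec-google-news-300" is from gensim-data's catalog; the download is large, ~1.6 GB):

import gensim.downloader as api

# Downloads the pretrained Google News Word2Vec vectors on first use.
wv = api.load("word2vec-google-news-300")
print(wv.most_similar("movie", topn=3))  # nearest neighbours by cosine similarity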
fastText
Perplexity
ELMo
GloVe
Pre-trained GloVe Embedding
Version: 6B tokens, 400K vocab, 50-d vectors
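A minimal sketch of loading this version into memory (assumes glove.6B.50d.txt from the official GloVe release has been downloaded; each line is a word followed by 50 floats):

import numpy as np

def load_glove(path="glove.6B.50d.txt"):
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *values = line.rstrip().split(" ")
            vectors[word] = np.asarray(values, dtype=np.float32)
    return vectors

glove = load_glove()
print(len(glove))               # ~400K words
print(glove["language"].shape)  # (50,)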
Find Synonyms
Find Analogies
Find Synonyms: given a word, find the top-k most similar words.
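A sketch of the top-k search by cosine similarity (reuses the glove dict loaded above; a brute-force scan over the 400K vocabulary, fine for a demo):

import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def top_k_synonyms(word, k=5):
    query = glove[word]
    scores = [(w, cosine(query, vec)) for w, vec in glove.items() if w != word]
    return sorted(scores, key=lambda x: x[1], reverse=True)[:k]

print(top_k_synonyms("movie", k=3))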
Find Analogies: given three words a, b, c, find a word d with the analogous relationship a : b :: c : d.
Example: "man" : "woman" :: "son" : "daughter"
The relationship Vec(a) + Vec(d) = Vec(b) + Vec(c) rearranges to Vec(d) = Vec(b) − Vec(a) + Vec(c), so we search for the word closest to that point.
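Rearranged that way, the search is one vector-arithmetic step plus a nearest-neighbour lookup (reusing glove and cosine from the sketches above):

def analogy(a, b, c, k=1):
    target = glove[b] - glove[a] + glove[c]  # Vec(d) ≈ Vec(b) - Vec(a) + Vec(c)
    scores = [(w, cosine(target, vec))
              for w, vec in glove.items() if w not in (a, b, c)]
    return sorted(scores, key=lambda x: x[1], reverse=True)[:k]

print(analogy("man", "woman", "son"))  # expect "daughter" near the top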
2 – Text Classification
Preprocessing
Representation: One-hot, BoW, TF-IDF, pre-trained GloVe
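For comparison, a small scikit-learn sketch of two of these representations on the toy reviews from earlier (CountVectorizer gives BoW counts, TfidfVectorizer reweights them):

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = ["This movie is bad", "This movie is good"]
bow = CountVectorizer().fit_transform(corpus)    # raw word counts
tfidf = TfidfVectorizer().fit_transform(corpus)  # counts reweighted by document rarity
print(bow.toarray())
print(tfidf.toarray().round(2))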
Neural Network
Review: Embedding Layer
Embedding matrix (vocab size = 7, embedding dim = 2):
0: [0.1, 3.1]
1: [0.5, 2.5]
2: [1.3, 0.6]
3: [0.4, 0.1]
4: [0.7, 1.4]
5: [2.3, 1.7]
6: [2.5, 2.5]

Input matrix (index-based representation), shape 2×5:
[0, 4, 2, 3, 1]
[5, 6, 1, 2, 3]

Select operation: each index selects the corresponding row of the embedding matrix.
Output matrix shape: 2×5×2
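The same lookup in PyTorch (the weights mirror the matrix above; nn.Embedding performs exactly this row selection):

import torch
import torch.nn as nn

weights = torch.tensor([[0.1, 3.1], [0.5, 2.5], [1.3, 0.6], [0.4, 0.1],
                        [0.7, 1.4], [2.3, 1.7], [2.5, 2.5]])
emb = nn.Embedding(num_embeddings=7, embedding_dim=2)
emb.weight.data.copy_(weights)      # install the example matrix

x = torch.tensor([[0, 4, 2, 3, 1],
                  [5, 6, 1, 2, 3]])  # input shape 2x5
print(emb(x).shape)                  # torch.Size([2, 5, 2])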
Pre-trained GloVe Embedding

Embedding matrix (randomly initialized for our vocab):
0 <oov>: 0.1 3.1
1 <pad>: 0.5 2.5
2 <unk>: 1.3 0.6
3 neural: 0.4 0.1
4 language: 0.7 1.4

GloVe embedding matrix:
0 <oov>: 0.1 0.1
1 <pad>: 0.5 0.5
2 <unk>: 0.3 0.6
3 language: 0.7 0.7
4 mưa: 0.7 0.4

Final embedding matrix: each word in our vocab that appears in GloVe takes its GloVe vector; words missing from GloVe (here "neural") are initialized to zeros.
0 <oov>: 0.1 0.1
1 <pad>: 0.5 0.5
2 <unk>: 0.3 0.6
3 neural: 0.0 0.0
4 language: 0.7 0.7
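A sketch of building that final matrix from the loaded glove dict (the vocab order and zero-init for missing words follow the example above; variable names are ours):

import numpy as np
import torch
import torch.nn as nn

vocab = ["<oov>", "<pad>", "<unk>", "neural", "language"]
dim = 50
final = np.zeros((len(vocab), dim), dtype=np.float32)
for i, word in enumerate(vocab):
    if word in glove:
        final[i] = glove[word]  # copy the pre-trained vector
    # else: stays all-zero, like "neural" in the example

# Use it to initialize the layer; freeze=False lets it be fine-tuned.
emb = nn.Embedding.from_pretrained(torch.from_numpy(final), freeze=False)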
3 - Summary
Basic NLP Course
01 Introduction
02 Preprocessing
03 Language Modeling
04 Part Of Speech (POS)
05 Constituency Parsing
06 Basic Vectorization
07 Word2Vec
08 Pretrained Model
2 - Preprocessing
Balanced vs. imbalanced data: proportions of positive and negative examples.
3 – Language Model
4 – POS Tagging - NER
5 – Constituency Parsing (CFG)
Grammar in CNF, parsed with CKY
G = (T, N, P, S, R)
T: a set of terminal symbols
N: a set of non-terminal symbols
P (⊆ N): a set of pre-terminal symbols
S: a start symbol
R: a set of rules or productions, R = {X → γ | X ∈ N, γ ∈ (T ∪ N)*}

Example grammar:
1. Start: S
2. S → NP VP
3. NP → Det Noun
4. NP → NN PP
5. PP → Prep NP
6. VP → V NP
7a. VP → V Args    7b. Args → NP PP
8. V → ate
9. NP → John
10. NP → ice-cream | snow
11. Noun → ice-cream | pizza
12. Noun → table | guy | campus
13. Det → the
14. Prep → on
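A minimal CKY recognizer sketch over a CNF subset of this grammar (the rule list is trimmed to what one example sentence needs):

rules = [
    ("S", ("NP", "VP")), ("NP", ("Det", "Noun")), ("VP", ("V", "NP")),
    ("V", ("ate",)), ("NP", ("John",)), ("Noun", ("pizza",)), ("Det", ("the",)),
]

def cky(words):
    n = len(words)
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):  # diagonal: pre-terminals covering each word
        table[i][i + 1] = {lhs for lhs, rhs in rules if rhs == (w,)}
    for span in range(2, n + 1):   # widen spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # try every split point
                for lhs, rhs in rules:
                    if len(rhs) == 2 and rhs[0] in table[i][k] and rhs[1] in table[k][j]:
                        table[i][j].add(lhs)
    return "S" in table[0][n]

print(cky("John ate the pizza".split()))  # True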
5 – Dependency Parsing
A graph G = (V, A)
V: vertices, {w0 = ROOT, w1, …, wn}, usually one per word in the sentence
A: arcs, {(wi, r, wj) : wi ≠ wj, wi ∈ V, wj ∈ V − {w0}, r ∈ Rx}
Rx: the set of all possible dependency relations in sentence x
Dependency Tree
Has a single ROOT
Projective
Acyclic
Unique path from ROOT to each word
6 – Basic Vectorization
One-hot encoding
Bag-of-words (BoW)
Bag-of-N-gram
7 – Word2Vec
8 – Pre-trained Embedding
Word2Vec
GloVe
fastText
ELMo
NLP Pipeline
The neural history of NLP
2001: Neural language models
2008: Multi-task learning
2013: Word embeddings; neural networks for NLP (CNNs)
2014: Sequence-to-sequence models (machine translation, image captioning)
References
1. https://web.stanford.edu/~jurafsky/slp3/
2. http://web.stanford.edu/class/cs224n/
3. https://d2l.ai/
4. http://nlpprogress.com/
5. https://github.com/undertheseanlp/NLP-Vietnamese-progress
Thanks! Any questions?