GPT
Lecturer: Ngoc Ba
VietAI Teaching Team
Founder @ ProtonX
Transformer
https://arxiv.org/abs/1706.03762
Application: the encoder layers form the basis of BERT.
BERT vs GPT
Contents
2019 GPT-2
Language Models are Unsupervised Multitask Learners
Experiments with large datasets and data-preparation techniques for training a single model on multiple different tasks.
2020 GPT-3
Language Models are Few-Shot Learners
Continuing the breakthrough by using zero-shot/one-shot/few-shot learning instead of fine-tuning the model.
4/2022 InstructGPT
Training language models to follow instructions with human feedback
Incorporating human-in-the-loop feedback and reinforcement learning into model training to avoid generating harmful or poisoned information.
[Timeline: 2019 → 4/2022 → 11/2022]
GPT-1
Improving Language Understanding by Generative Pre-Training (117M parameters; Radford et al., 2018)
Unsupervised pre-training with the Transformer decoder; the cross-attention to the encoder is removed.
[Diagram: decoder block — Output Embedding → Masked Multi-Headed Attention → Multi-Headed Attention → Linear → Softmax → Output Probabilities — trained against the cost function.]
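To make the "masked multi-headed attention" concrete, here is a minimal sketch of causal self-attention, the piece the GPT-1 decoder keeps once cross-attention is removed. Shapes and names are illustrative assumptions, not the lecture's code:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def masked_self_attention(x, Wq, Wk, Wv):
    """x: (seq_len, d_model). Each position may attend only to itself and
    earlier positions, so the model can be trained as a language model."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])            # (seq_len, seq_len)
    mask = np.triu(np.ones(scores.shape, dtype=bool), 1)
    scores = np.where(mask, -1e9, scores)              # hide future positions
    return softmax(scores) @ v

# Tiny usage example with random weights.
rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=(5, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
print(masked_self_attention(x, Wq, Wk, Wv).shape)      # (5, 8)
```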
GPT-1 vs GPT-2
GPT-1: 117M parameters, 4.6GB of data
GPT-2: 1.5B parameters, 40GB of data (WebText)
Final prompt (GPT-2 translation format): english sentence = french sentence
Inference result: fromage
GPT-2 vs GPT-3
GPT-2: 1.5B parameters, 40GB of data
GPT-3: 175B parameters, over 600GB of data
Dataset
In-context learning
Traditional: unsupervised pre-trained model + fine-tuning.
In-context learning: no fine-tuning, no gradient updates; zero-shot, one-shot, or few-shot learning (see the sketch after the link below).
https://arxiv.org/pdf/2005.14165.pdf
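A minimal sketch of how a few-shot prompt might be assembled. The English→French pairs follow the format used in the GPT-3 paper; the helper name and layout are assumptions:

```python
# Zero-shot: no example pairs; one-shot: one pair; few-shot: several pairs.
# No fine-tuning and no gradient updates happen: the "learning" is entirely
# in the context the model conditions on.
examples = [
    ("sea otter", "loutre de mer"),
    ("cheese", "fromage"),
]

def few_shot_prompt(pairs, query):
    lines = ["Translate English to French:"]
    lines += [f"{en} => {fr}" for en, fr in pairs]
    lines.append(f"{query} =>")
    return "\n".join(lines)

print(few_shot_prompt(examples, "peppermint"))
```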
Data quality
https://arxiv.org/pdf/2005.14165.pdf
Reality
Distributed training: copy the parameters to each device, then average across devices after each step (a sketch follows below).
https://arxiv.org/pdf/2001.08361.pdf
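A toy sketch of the copy-and-average idea under the assumption of synchronous data parallelism; the function names and the toy regression task are illustrative:

```python
import numpy as np

def data_parallel_step(params, shards, grad_fn, lr=0.1):
    # Every replica works from the same parameter copy,
    # computes gradients on its own data shard,
    # and the averaged gradient gives an identical update everywhere.
    grads = [grad_fn(params, shard) for shard in shards]  # one per replica
    avg_grad = np.mean(grads, axis=0)                     # all-reduce average
    return params - lr * avg_grad

# Toy usage: fit y = 2x with squared loss, gradient d/dw (w*x - y)^2.
grad_fn = lambda w, s: np.mean(2 * (w * s[:, 0] - s[:, 1]) * s[:, 0])
data = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]])
w = np.zeros(1)
for _ in range(50):
    w = data_parallel_step(w, np.split(data, 2), grad_fn)
print(w)  # ≈ [2.]
```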
Reality
GPT-2 Medium
345M parameters
https://bluestudio.ai/smart-hr
Inference Architecture
Internet → Load balancing (distribute the traffic) → Instance Group (auto scaling)
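A toy round-robin sketch of "distribute the traffic"; instance names are made up, and a real load balancer (and auto scaling) lives in the infrastructure, not in application code:

```python
import itertools

class LoadBalancer:
    def __init__(self, instances):
        self._next = itertools.cycle(instances)  # rotate through the group

    def route(self, request):
        instance = next(self._next)
        return f"{instance} handles {request!r}"

lb = LoadBalancer(["instance-1", "instance-2", "instance-3"])
for i in range(5):
    print(lb.route(f"prompt #{i}"))
# Auto scaling would grow or shrink the instance list based on traffic.
```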
Human-in-the-loop
The admin can review responses and regenerate new ones to improve content quality.
Instruction fine-tuning
https://arxiv.org/pdf/2210.11416.pdf
GPT-3.5 - InstructGPT
https://arxiv.org/pdf/2203.02155.pdf
GPT-3.5 - InstructGPT
Supervised fine-tuning (SFT) → Reward modeling (RM) → Reinforcement learning (RL)
https://arxiv.org/pdf/2203.02155.pdf
Supervised fine-tuning (SFT)
A prompt (e.g., "Write an email…") paired with a human-written demonstration is used to fine-tune GPT-3 (SFT); a sketch follows below.
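A minimal, self-contained sketch of the SFT objective. The tiny embedding+linear stand-in model and the random token IDs are assumptions, not GPT-3:

```python
import torch
import torch.nn.functional as F

vocab, d = 100, 16
model = torch.nn.Sequential(torch.nn.Embedding(vocab, d), torch.nn.Linear(d, vocab))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# One (prompt + human demonstration) sequence as token IDs, made up here.
tokens = torch.randint(0, vocab, (1, 12))

# Next-token cross-entropy: the same objective as pre-training, but computed
# on curated human demonstrations instead of raw web text.
logits = model(tokens[:, :-1])                      # predict each next token
loss = F.cross_entropy(logits.reshape(-1, vocab), tokens[:, 1:].reshape(-1))
loss.backward()
opt.step()
print(float(loss))
```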
Reward modeling (RM)
For a prompt ("Write an email…"), GPT-3 produces several outputs (Generated 1–4).
A human ranks them by scoring: Generated 4 > Generated 2 > Generated 1 > Generated 3, i.e. scores 4, 3, 2, 1 (see the sketch below).
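A small sketch of how one human ranking can be expanded into pairwise training comparisons for the reward model; the output names follow the slide:

```python
from itertools import combinations

# Human ranking from the slide, best first.
ranking = ["Generated 4", "Generated 2", "Generated 1", "Generated 3"]

# Every ordered pair (better, worse) becomes one training comparison.
pairs = list(combinations(ranking, 2))
for better, worse in pairs:
    print(f"{better} preferred over {worse}")
# A ranking of K outputs yields K*(K-1)/2 comparisons: here, 6.
```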
Reward modeling (RM)
Copy GPT-3 (175B) and reduce the weights to obtain a smaller reward (preference) model (6B); fine-tune it on the human rankings of the generated outputs for each prompt ("Write an email…").
Problem: the decisions made by humans are often subject to noise and miscalibration, so the reward model is trained on pairwise comparisons of outputs (e.g., Generated 1 vs Generated 3) in the style of the Bradley-Terry model.
https://en.wikipedia.org/wiki/Bradley%E2%80%93Terry_model
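A sketch of the pairwise, Bradley-Terry-style loss used for reward modeling, as in the InstructGPT paper: minimize -log σ(r(x, y_better) - r(x, y_worse)), so only the difference of the two scalar rewards matters, which absorbs per-annotator score miscalibration:

```python
import numpy as np

def pairwise_loss(r_better, r_worse):
    """-log sigmoid(r_better - r_worse); minimized when the reward model
    scores the human-preferred output higher."""
    return -np.log(1.0 / (1.0 + np.exp(-(r_better - r_worse))))

print(pairwise_loss(2.0, 0.5))  # small loss: preference already respected
print(pairwise_loss(0.5, 2.0))  # large loss: model disagrees with the human
```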
Training Process
A prompt, e.g. x: "Ngoc is", is fed to the agent: a copy of the supervised fine-tuned (SFT) model that becomes the tuned language model (the RL policy). It generates completions such as y: "a teacher of the VietAI NLP class" or y: "handsome."
Objective function: the score from the reward (preference) model, plus a penalty that restrains the policy from diverging excessively from the pretrained model: the Kullback-Leibler (KL) divergence. A sketch of this objective follows below.
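A hedged sketch of the per-example objective: the reward minus a KL penalty toward the SFT model. The `beta` value and the probability arrays are illustrative assumptions:

```python
import numpy as np

def rlhf_objective(reward, logp_policy, logp_sft, beta=0.02):
    """reward: scalar from the reward model for (x, y).
    logp_*: per-token log-probabilities of y under each model."""
    # Single-sample estimate of KL(policy || SFT) for the generated y.
    kl_penalty = np.sum(logp_policy - logp_sft)
    return reward - beta * kl_penalty

logp_policy = np.log(np.array([0.5, 0.4, 0.3]))
logp_sft = np.log(np.array([0.4, 0.4, 0.2]))
print(rlhf_objective(reward=1.3, logp_policy=logp_policy, logp_sft=logp_sft))
```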
What we learned today
Reinforcement Learning
● https://web.stanford.edu/class/cs234/
● https://web.stanford.edu/class/cs224n/slides/cs224n-2023-lecture11-prompting-rlhf.pdf
Thank you!