
GPT

Lecturer: Ngoc Ba
VietAI Teaching Team
Founder @ ProtonX
Transformer

Encoder Layers → Application: BERT
Decoder Layers → Application: GPT

https://arxiv.org/abs/1706.03762
BERT vs GPT

BERT - Google: Sentiment Analysis
- Performs sentiment analysis by utilizing information extracted from all words.
- Example input: [CLS] Tôi đi học ở Hà Nội ("I go to school in Hanoi")

GPT - OpenAI: Language Model
- Generates the next words using the history.
- Example: "Tôi đi học ở Hà" → Softmax → "Nội"
https://arxiv.org/pdf/1810.04805.pdf
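As a minimal sketch (not from the slides) of GPT-style next-word prediction: the snippet below takes a softmax over made-up output logits for the history "Tôi đi học ở Hà" and picks the most likely next token. The tiny vocabulary and logit values are illustrative assumptions.

```python
import numpy as np

# Hypothetical candidate next tokens and model logits for the history
# "Tôi đi học ở Hà"; a real GPT scores its full vocabulary.
vocab = ["Nội", "trường", "nhà"]
logits = np.array([4.2, 1.3, 0.5])

# Softmax turns the logits into a probability distribution over the vocabulary.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Greedy decoding: take the most probable token as the next word.
print(vocab[int(np.argmax(probs))])  # -> "Nội"
```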
History of GPT

2018: GPT-1
Improving Language Understanding by Generative Pre-Training
Discovery that the Transformer Decoder can perform multiple natural language tasks on its own,
without the use of an Encoder.

2019: GPT-2
Language Models are Unsupervised Multitask Learners
Experimentation with large datasets and data preparation techniques for training a single model on
multiple different tasks.

2020: GPT-3
Language Models are Few-Shot Learners
Continuing the breakthroughs by using zero-shot/one-shot/few-shot
learning instead of fine-tuning the model.

4/2022: InstructGPT
Training language models to follow instructions with human feedback
Incorporating human-in-the-loop review and reinforcement learning into model training to avoid generating harmful or poisoned
information.

11/2022: ChatGPT


History of GPT (2018, 2019, 2020, 4/2022, 11/2022)

The consistent philosophy: transform the language model to achieve
high performance across multiple tasks (multitask) without separate datasets and
models tailored to each individual task.
GPT-1

Improving Language Understanding by Generative Pre-Training
(117M parameters; Radford et al., 2018)

- A decoder consisting of 12 layers in a Transformer architecture.
- The training data, which consisted of more than 7,000 distinct books (4.6GB of text), was
obtained from BooksCorpus.
- The cross-attention to the encoder is removed.
- Unsupervised pre-training; k is the size of the context window.

[Architecture diagram: Output Embedding, Masked Multi-Headed Attention, Add & Norm,
Multi-Headed Attention (the removed cross-attention), Add & Norm, Feed Forward, Add & Norm,
Linear, Softmax, Output Probabilities; Outputs (shifted right)]
Radford et al., 2018
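For reference, the unsupervised pre-training objective from Radford et al., 2018 maximizes the log-likelihood of each token given the k preceding tokens:

L_1(\mathcal{U}) = \sum_i \log P(u_i \mid u_{i-k}, \ldots, u_{i-1}; \Theta)

where \mathcal{U} = (u_1, \ldots, u_n) is the unlabeled token corpus, k is the context window size, and \Theta are the model parameters.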
GPT-1
Supervised fine-tuning

A labeled dataset C, where each instance consists of a sequence of input tokens x1, ..., xm,
along with a label y.

Fine-tuning cost function:
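Reconstructed from Radford et al., 2018: the final decoder activation h_l^m for the input is fed into a linear output layer W_y to predict the label, and the supervised loss L_2 is combined with the language-modeling loss L_1 as an auxiliary objective:

P(y \mid x^1, \ldots, x^m) = \mathrm{softmax}(h_l^m W_y)

L_2(\mathcal{C}) = \sum_{(x, y)} \log P(y \mid x^1, \ldots, x^m)

L_3(\mathcal{C}) = L_2(\mathcal{C}) + \lambda \cdot L_1(\mathcal{C})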

Radford et al., 2018


GPT-2
Language Models are Unsupervised Multitask Learners

GPT-1: 117M parameters, 4.6GB of data
GPT-2: 1.5B parameters, 40GB of data (WebText)

WebText: scrape links posted on Reddit that have received a minimum of 3 upvotes.

Radford et al., 2019


GPT-2
Translation

Condition the language model on a context of example pairs of the format

    english sentence = french sentence

followed by a final prompt of the form

    english sentence =

Example context and prompt:

    sea otter = loutre de mer
    peppermint = menthe poivrée
    cheese =

Inference result: fromage

Radford et al., 2019
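A minimal sketch of this prompting-as-translation idea, assuming the Hugging Face transformers library and the public gpt2 checkpoint (not the lecture's exact setup): the model simply continues the pattern, and the text generated after "cheese =" is read off as the translation.

```python
from transformers import pipeline

# Greedy decoding with a small generation budget; the checkpoint and
# settings here are illustrative assumptions.
generator = pipeline("text-generation", model="gpt2")

prompt = (
    "sea otter = loutre de mer\n"
    "peppermint = menthe poivrée\n"
    "cheese ="
)

output = generator(prompt, max_new_tokens=5, do_sample=False)
print(output[0]["generated_text"])
```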


GPT-2

Radford et al., 2019


GPT-3
Language Models are Few-Shot Learners

GPT-2: 1.5B parameters, 40GB of data
GPT-3: 175B parameters, over 600GB of data

Traditional approach: an unsupervised (pre-trained) model is fine-tuned on a dataset for each task.

In-context learning:
- No fine-tuning, no gradient updates
- Zero-shot / one-shot / few-shot learning
https://arxiv.org/pdf/2005.14165.pdf
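To make the zero-/one-/few-shot distinction concrete, here is an illustrative set of prompts in the style of the GPT-3 paper's translation example; the task description and examples live entirely in the prompt, and no gradient updates are performed.

```python
# Illustrative zero-/one-/few-shot prompts; the exact formatting is an
# assumption, reusing the example pairs from the translation slide above.
zero_shot = "Translate English to French:\ncheese =>"

one_shot = (
    "Translate English to French:\n"
    "sea otter => loutre de mer\n"
    "cheese =>"
)

few_shot = (
    "Translate English to French:\n"
    "sea otter => loutre de mer\n"
    "peppermint => menthe poivrée\n"
    "cheese =>"
)
```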
Data quality

https://arxiv.org/pdf/2005.14165.pdf
Reality

Track all articles using a database


Distributed Training

- Copy the language model (GPT-2) onto each GPU (8× A6000).
- Distribute the data (5 million sentences) across the GPUs.
https://bluestudio.ai/smart-hr
Distributed Training

- Average the parameters across the GPUs (8× A6000).
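A minimal sketch of this parameter-averaging step, assuming PyTorch with torch.distributed already initialized (e.g. via torchrun); the model is a placeholder. In practice DistributedDataParallel averages gradients every step instead, but the idea is the same.

```python
import torch
import torch.distributed as dist

@torch.no_grad()
def average_parameters(model: torch.nn.Module) -> None:
    """Replace each replica's parameters with their average across all GPUs."""
    world_size = dist.get_world_size()
    for param in model.parameters():
        # Sum the parameter tensor over all processes, then divide.
        dist.all_reduce(param.data, op=dist.ReduceOp.SUM)
        param.data /= world_size
```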


Scaling Laws for Neural Language Models

https://arxiv.org/pdf/2001.08361.pdf
Reality

GPT-2 Medium
345M parameters

https://bluestudio.ai/smart-hr
Inference Architecture

Internet → Load balancing (distribute the traffic) → Instance Group (auto scaling)
Instances: e2 Medium (CPU, 8GB RAM)
Human-in-the-loop

- Cache all queries and their corresponding generated responses.
- The admin can review and regenerate responses to improve content quality.
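A minimal sketch of caching queries and generated responses so an admin can later review or regenerate them; the SQLite schema and the generate() callable are illustrative assumptions, not the lecture's actual system.

```python
import sqlite3

conn = sqlite3.connect("responses.db")
conn.execute("CREATE TABLE IF NOT EXISTS cache (prompt TEXT PRIMARY KEY, response TEXT)")

def cached_generate(prompt: str, generate) -> str:
    """Return the cached response for a prompt, generating and storing it if missing."""
    row = conn.execute("SELECT response FROM cache WHERE prompt = ?", (prompt,)).fetchone()
    if row is not None:
        return row[0]
    response = generate(prompt)  # call to the language model
    conn.execute("INSERT OR REPLACE INTO cache VALUES (?, ?)", (prompt, response))
    conn.commit()
    return response

def regenerate(prompt: str, generate) -> str:
    """Admin path: overwrite the cached response with a freshly generated one."""
    response = generate(prompt)
    conn.execute("INSERT OR REPLACE INTO cache VALUES (?, ?)", (prompt, response))
    conn.commit()
    return response
```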
Instruction finetuning

https://arxiv.org/pdf/2210.11416.pdf
GPT-3.5 - InstructGPT

Large-scale language models are capable of generating outputs that may be false, harmful,
or not particularly beneficial to the user.

InstructGPT aligns language models with user intent on a wide range of tasks by fine-tuning with
human feedback.

https://arxiv.org/pdf/2203.02155.pdf
GPT-3.5 - InstructGPT

Three training steps:
1. Supervised fine-tuning (SFT)
2. Reward modeling (RM)
3. Reinforcement learning (RL)
https://arxiv.org/pdf/2203.02155.pdf
Supervised fine-tuning (SFT)

Collect prompts such as:
- Write an email…
- Rephrase the sentence:…
- Translate the paragraph:…

Fine-tune GPT-3 on these prompts with human-written demonstration responses, producing the SFT model.
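A minimal sketch of the SFT step, assuming a Hugging Face causal LM; the gpt2 checkpoint, the single hard-coded prompt/response pair, and the hyperparameters are placeholders for illustration, not the lecture's actual setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Placeholder prompt/response demonstrations written by humans.
pairs = [("Write an email to a customer:", "Dear customer, thank you for ...")]

model.train()
for prompt, response in pairs:
    batch = tokenizer(prompt + " " + response, return_tensors="pt")
    # Standard causal-LM objective: the labels are the input ids themselves,
    # so the model learns to reproduce the demonstrated response.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```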
Reward modeling (RM)

For each prompt (e.g. "Write an email…"), GPT-3 generates several outputs (Generated 1, 2, 3, 4).
A human scores the outputs and ranks them, e.g. Generated 4 > Generated 2 > Generated 1 > Generated 3.
Reward modeling (RM)

The reward (preference) model is a copy of the language model with reduced size (6B instead of
175B parameters). It is fine-tuned on the human scoring and ranking of the generated outputs
(e.g. Generated 4 > Generated 2 > Generated 1 > Generated 3 for the prompt "Write an email…").

Problem: The decisions made by humans are often subject to noise and miscalibration.

Solution: Use pairwise comparisons instead of direct ratings.


Reward Model

The reward (preference) model assigns a score to each response (e.g. Generated 1 → 2, Generated 3 → 1);
the language model then learns to generate better responses.

Number of comparisons from each prompt: see the pairwise loss below.

https://en.wikipedia.org/wiki/Bradley%E2%80%93Terry_model
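For reference, the pairwise ranking loss from Ouyang et al., 2022: with K responses per prompt, a labeler's ranking yields \binom{K}{2} comparisons, and the reward model r_\theta is trained to score the preferred response y_w above the less-preferred one y_l:

\mathrm{loss}(\theta) = -\frac{1}{\binom{K}{2}} \, \mathbb{E}_{(x, y_w, y_l) \sim D} \left[ \log \sigma\big( r_\theta(x, y_w) - r_\theta(x, y_l) \big) \right]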
Training Process

- Copy the supervised fine-tuned (SFT) model to initialize the tuned language model (the RL policy, i.e. the agent).
- For a prompt x: "Ngoc is", the policy generates a response, e.g. y: "a teacher of the VietAI NLP class" or y: "handsome."
- The reward (preference) model acts as the environment: it scores the response, and a reinforcement learning update is applied to the policy.
- Plus a penalty that restrains us from diverging excessively from the pretrained model: the Kullback-Leibler (KL) divergence.
Objective function

The tuned language model (RL policy) generates a response (e.g. y: "a teacher of the VietAI NLP class"
or y: "handsome."), which is scored by the reward (preference) model while the policy is kept close to the
supervised fine-tuned (SFT) model.
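For reference, the RL objective from Ouyang et al., 2022 combines the reward with a KL penalty that keeps the policy \pi_\phi^{RL} close to the SFT model (the \gamma term mixes pretraining data back in, in the PPO-ptx variant):

\mathrm{objective}(\phi) = \mathbb{E}_{(x, y) \sim D_{\pi_\phi^{RL}}} \left[ r_\theta(x, y) - \beta \log \frac{\pi_\phi^{RL}(y \mid x)}{\pi^{SFT}(y \mid x)} \right] + \gamma \, \mathbb{E}_{x \sim D_{\mathrm{pretrain}}} \left[ \log \pi_\phi^{RL}(x) \right]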
What we learned today

● GPT and its versions, ranging from GPT-1 to InstructGPT

● Human-in-the-loop in building AI applications
Reinforcement Learning

● https://web.stanford.edu/class/cs234/
● https://web.stanford.edu/class/cs224n/slides/cs224n-2023-lecture11-prompting-rlhf.pdf

Thank you!

