
Introduction to

Language Models
Eve Fleisig & Kayo Yin
CS 294-162
August 28, 2023
Language Modeling

Image credit: jalammar.github.io/illustrated-word2vec/


Masked Language Modeling
BERT

Image credit: jalammar.github.io/illustrated-bert/
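As an illustration (not from the slides), here is a minimal sketch of the masked-LM objective using the Hugging Face transformers pipeline; the library usage and model name are assumed example choices:

    from transformers import pipeline

    # Assumed setup: the `transformers` library with an illustrative
    # model choice; not taken from the slides themselves.
    fill_mask = pipeline("fill-mask", model="bert-base-uncased")

    # BERT sees the whole sentence with some tokens masked out and
    # predicts each masked token from context on both sides.
    for pred in fill_mask("The capital of France is [MASK]."):
        print(pred["token_str"], round(pred["score"], 3))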


Causal Language Modeling
GPT

Image credit: jalammar.github.io/illustrated-gpt2/
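A matching sketch for the causal-LM objective, again with an assumed pipeline and model name:

    from transformers import pipeline

    # Illustrative model choice, not from the slides.
    generator = pipeline("text-generation", model="gpt2")

    # GPT predicts each token from left context only, so sampling the
    # next token repeatedly generates text autoregressively.
    out = generator("Language models are", max_new_tokens=20)
    print(out[0]["generated_text"])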


BERT vs. GPT

● Bidirectional encoder models (BERT) do better than generative models at non-generation tasks, for comparable training data and model complexity.

● Generative models (GPT) have training efficiency and scalability advantages that may ultimately make them more accurate. They can also solve downstream tasks in a zero-shot setting.
Transformer

Image credit: jalammar.github.io/illustrated-transformer/



Attention
Self-Attention

Image credit: jalammar.github.io/illustrated-gpt2/
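Since the slides walk through self-attention in pictures, here is a minimal NumPy sketch of single-head scaled dot-product attention (Vaswani et al., 2017); the projection matrices are assumed to be given:

    import numpy as np

    def self_attention(x, w_q, w_k, w_v):
        # x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_k) projections.
        q, k, v = x @ w_q, x @ w_k, x @ w_v
        # Every query scores every key; scale by sqrt(d_k) for a stable softmax.
        scores = q @ k.T / np.sqrt(k.shape[-1])             # (seq_len, seq_len)
        weights = np.exp(scores - scores.max(-1, keepdims=True))
        weights = weights / weights.sum(-1, keepdims=True)  # softmax over keys
        # Each position outputs a weighted mix of all value vectors.
        return weights @ v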
Multi-headed Attention
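Multi-headed attention runs several independent attention heads and concatenates their outputs; a sketch reusing self_attention and the NumPy import from above (the final output projection is omitted for brevity):

    def multi_head_attention(x, heads):
        # heads: list of (w_q, w_k, w_v) triples, one per attention head.
        # A full implementation would also apply an output projection W_O.
        outputs = [self_attention(x, w_q, w_k, w_v) for w_q, w_k, w_v in heads]
        return np.concatenate(outputs, axis=-1)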
Transformer

Image credit: jalammar.github.io/illustrated-transformer/


Transformer Input
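The transformer input adds positional information to the token embeddings. The slide content here is an image, so as one common (assumed) choice, here is the sinusoidal positional encoding from Vaswani et al. (2017):

    import numpy as np

    def sinusoidal_positional_encoding(seq_len, d_model):
        # d_model is assumed even in this sketch.
        pos = np.arange(seq_len)[:, None]            # (seq_len, 1)
        i = np.arange(d_model // 2)[None, :]         # (1, d_model/2)
        angles = pos / np.power(10000.0, 2 * i / d_model)
        pe = np.zeros((seq_len, d_model))
        pe[:, 0::2] = np.sin(angles)                 # even dimensions: sine
        pe[:, 1::2] = np.cos(angles)                 # odd dimensions: cosine
        return pe  # added elementwise to the token embeddings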
Transformer Encoder

Image credit: jalammar.github.io/illustrated-transformer/


Adding the Decoder

Image credit: jalammar.github.io/illustrated-transformer/
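The decoder's self-attention additionally masks future positions, so each token attends only leftward; a sketch of that causal mask, reusing the NumPy setup above:

    def causal_mask(seq_len):
        # Upper-triangular -inf mask: position i may attend only to j <= i.
        return np.triu(np.full((seq_len, seq_len), -np.inf), k=1)

    # Added to the attention scores before the softmax, this zeroes out
    # the weights on future tokens.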


BERT

Image credit: jalammar.github.io/illustrated-bert/


GPT
T5

Text-to-Text Transfer Transformer
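T5 casts every task as text in, text out, with the task named in an input prefix; a sketch using an assumed transformers pipeline and model choice:

    from transformers import pipeline

    # One model, many tasks: the task is selected by the input prefix.
    # Model choice is illustrative, not from the slides.
    t5 = pipeline("text2text-generation", model="t5-small")

    print(t5("translate English to German: The house is wonderful.")[0]["generated_text"])
    print(t5("summarize: Language models are neural networks trained to "
             "predict text, and they can be adapted to many tasks.")[0]["generated_text"])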


Pretraining & Fine-tuning

Pretraining: unsupervised objective (language modeling over unlabeled text)

Fine-tuning: supervised objective (labeled examples for the downstream task)
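A schematic of the two-stage recipe in PyTorch-style code; the model and batch fields are placeholders, sketched only to contrast the two objectives:

    import torch.nn.functional as F

    def pretrain_step(model, optimizer, batch):
        # Unsupervised objective: predict tokens from unlabeled text
        # (masked tokens for BERT, next tokens for GPT).
        logits = model(batch["input_ids"])                  # (batch, seq, vocab)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)),
                               batch["target_ids"].view(-1))
        loss.backward(); optimizer.step(); optimizer.zero_grad()

    def finetune_step(model, optimizer, batch):
        # Supervised objective: predict task labels from labeled examples.
        logits = model(batch["input_ids"])                  # (batch, num_classes)
        loss = F.cross_entropy(logits, batch["labels"])
        loss.backward(); optimizer.step(); optimizer.zero_grad()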
Prefixes & Prompting
Few- & Zero-Shot Learning

Generalization to new tasks without fine-tuning enabled by:

● Scaling data
● Scaling compute
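A sketch of how few-shot prompting works: labeled demonstrations are written into the prompt and the model simply continues the pattern, with no gradient updates (the strings are made up for illustration):

    demonstrations = [
        ("I loved this movie!", "positive"),
        ("Utterly boring.", "negative"),
    ]

    # Few-shot: a handful of worked examples, then the query to complete.
    prompt = "".join(f"Review: {x}\nSentiment: {y}\n\n" for x, y in demonstrations)
    prompt += "Review: A stunning achievement.\nSentiment:"

    # Zero-shot drops the demonstrations and keeps only a task description
    # plus the query; in both cases no model weights are updated.
    print(prompt)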
Scaling Data
C4 dataset (a cleaned Common Crawl corpus): introduced with T5; still in use
GPT-3 Training Data
Scaling Data & Compute

Kaplan et al., 2020; Hoffmann et al., 2022
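A back-of-the-envelope sketch of compute-optimal scaling, using the common C ≈ 6·N·D estimate of training FLOPs and the Chinchilla finding (Hoffmann et al., 2022) that training tokens should scale roughly in proportion to parameters, about 20 tokens per parameter; the constants are rounded:

    import math

    def chinchilla_optimal(compute_flops, tokens_per_param=20.0):
        # C ~= 6 * N * D and D ~= 20 * N  =>  N = sqrt(C / 120).
        n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
        n_tokens = tokens_per_param * n_params
        return n_params, n_tokens

    # Roughly reproduces Chinchilla: ~70B parameters, ~1.4T tokens.
    n, d = chinchilla_optimal(5.8e23)
    print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")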
Reinforcement Learning from Human Feedback
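A schematic of the RLHF objective: a reward model trained on human preference comparisons scores the policy's outputs, minus a KL penalty that keeps the tuned policy close to the pretrained reference model. The function names here are placeholders, and the policy update itself (typically PPO) is omitted:

    def rlhf_reward(reward_model, prompt, response,
                    policy_logprob, ref_logprob, beta=0.1):
        # reward_model: trained on human preference rankings of responses.
        rm_score = reward_model(prompt, response)
        # KL-style penalty against the frozen pretrained reference model,
        # which keeps the tuned policy from drifting too far.
        kl_penalty = policy_logprob - ref_logprob
        return rm_score - beta * kl_penalty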
Discussion
● What are the advantages and disadvantages of the different training or tuning methods that have been tried (task-specific training, pretrain/fine-tune, prompting, RLHF)?
● What is the role of systems research in scaling up LLMs? How could advances in systems research change scaling “laws”?
● What security issues do we need to consider when deploying LLMs into the real world?
● How can we improve the energy efficiency and carbon footprint of LLMs?
