
Transformers explained

“Attention is all you need.”


The technology behind ChatGPT, BERT, and all other language models
Do you know the full form of “GPT”?

“Generative Pre-trained Transformer”


It’s 2012.
Computers are good at vision.

Thanks to Neural Networks

Photo Credits: Google Brain


Convolutional neural networks (1987)

Photo Credits: Google Brain


Recurrent neural networks (RNN)

Photo Credits: Google Brain


“Sequential”
In language, the order of words matters.

“Dhravya went looking for trouble”

“Trouble went looking for Dhravya”


Problems with RNNs

1. Slow: they forget the context.
2. Expensive to train: you cannot parallelize them.
✨Transformers✨
An architecture that handles sequences like an RNN, but can be parallelized.

2017, Google + University of Toronto
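To make the parallelization point concrete, here is a minimal sketch (toy sizes and random numbers, all chosen here just for illustration) of why an RNN must process words one at a time while attention compares every word with every other word in a single matrix product:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 5, 8                          # toy sizes (assumption)
x = rng.normal(size=(seq_len, d))          # one embedding per word

# RNN: each hidden state depends on the previous one,
# so the time steps cannot be parallelized.
W_h, W_x = rng.normal(size=(d, d)), rng.normal(size=(d, d))
h = np.zeros(d)
for t in range(seq_len):
    h = np.tanh(h @ W_h + x[t] @ W_x)

# Attention: every word is compared with every other word
# in one matrix product, which a GPU can do all at once.
scores = x @ x.T / np.sqrt(d)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
out = weights @ x
print(out.shape)                           # (5, 8): context-aware word vectors
```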

Just:
1. Get a lot of GPUs
2. Get a LOT of data
3. Results will blow your mind.
How big?

GPT-3
But how does it work?

The three pillars

✨Transformers✨
Positional Encoding

Positional Encoding (tokenization)

I love science
1   2    3

Store information about word order in the data itself, not in the structure of the network.
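In the "Attention Is All You Need" paper this is done with sinusoidal positional encodings that are added to the word embeddings. A minimal NumPy sketch (the sequence length and embedding size below are toy values chosen for illustration):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encodings (Vaswani et al., 2017)."""
    pos = np.arange(seq_len)[:, None]          # word positions 0..seq_len-1
    i = np.arange(d_model)[None, :]            # embedding dimensions
    angle = pos / np.power(10000, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle[:, 0::2])       # even dimensions: sine
    pe[:, 1::2] = np.cos(angle[:, 1::2])       # odd dimensions: cosine
    return pe

# "I love science": each of the 3 positions gets its own unique vector,
# which is added to the corresponding token embedding before the encoder.
pe = positional_encoding(seq_len=3, d_model=8)
print(pe.shape)   # (3, 8)
```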
A helpful rule of thumb is that one token generally corresponds to ~4 characters of text for common English text. This translates to roughly ¾ of a word.

Photo taken from the GPT tokenizer at https://platform.openai.com/tokenizer
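A quick back-of-the-envelope check of that rule of thumb (estimates only; the exact count comes from the tokenizer linked above):

```python
text = "I love science"

# OpenAI's rule of thumb: ~4 characters of English per token,
# i.e. one token is roughly 3/4 of a word. Both are estimates.
tokens_from_chars = len(text) / 4              # 14 chars -> ~3.5 tokens
tokens_from_words = len(text.split()) / 0.75   # 3 words  -> ~4 tokens
print(round(tokens_from_chars, 1), round(tokens_from_words, 1))
```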
Transformers learn the importance of word order from the data.


Attention.

“The agreement on the European Economic Area was signed in August 1992.”

the European Economic Area -> la européenne économique zone


Understanding language

“European” comes after “Economic” in French, and there is gendered agreement between words:

the European Economic Area -> la zone économique européenne
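As a toy picture of what “attending” means (made-up vectors, not a trained model): when the decoder is about to emit “zone”, its attention weights are a softmax over scores against each English word, and the context it receives is the weighted average of those words:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy one-hot-style "embeddings" for the English words (illustration only).
english = ["the", "European", "Economic", "Area"]
keys = np.eye(4) * 3.0              # each word gets its own direction
values = keys

# Decoder state while it is about to emit "zone"; constructed here to
# point mostly at "Area", since "zone" translates "Area".
query = np.array([0.1, 0.1, 0.3, 3.0])

weights = softmax(query @ keys.T / np.sqrt(4))
print(dict(zip(english, weights.round(2))))   # highest weight on "Area"
context = weights @ values                    # weighted average fed to the decoder
```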


But…
How does the model know which
words it should be attending to?
Self-Attention
“The programmer crashed the server”
“That software developer’s name is Max. He manages the
servers.”
“He is working on fixing the servers. The software is broken.”
Self-Attention
Understand a word based on the context of the other words in the same input sentence.

Models attending to the word “server”:

“Server, can I get a check?”

“Looks like I just crashed the server”
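A minimal self-attention sketch (random toy weights and embeddings, not a trained model): every word in the sentence produces a query, a key, and a value, and the new representation of “server” is a mixture of all the words, weighted by how relevant each one is to it.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over one sentence (Vaswani et al., 2017)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # every word vs. every word
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax per word
    return weights @ V, weights

rng = np.random.default_rng(0)
words = ["Looks", "like", "I", "just", "crashed", "the", "server"]
d = 8
X = rng.normal(size=(len(words), d))                   # toy word embeddings
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))

out, weights = self_attention(X, W_q, W_k, W_v)
# Row for "server": how much it attends to every word in the same sentence.
print(dict(zip(words, weights[words.index("server")].round(2))))
```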
✨Transformers✨ boil down to:
positional encoding, attention, and self-attention.
Today, transformers power models like ChatGPT and BERT.
Today, anyone can train models on unlabelled data.
References

Morgan, Abby. "Explainable AI: Visualizing Attention in Transformers." Comet, 16 July 2023, www.comet.com/site/blog/explainable-ai-for-transformers/.

Vaswani, Ashish, et al. "Attention Is All You Need." Advances in Neural Information Processing Systems 30 (2017), https://arxiv.org/abs/1706.03762.

Muñoz, Eduardo. "Attention Is All You Need: Discovering the Transformer Paper. Detailed Implementation of a Transformer Model in TensorFlow." Towards Data Science, 2 Nov. 2020, towardsdatascience.com.
