
Transformers explained

“Attention is all you need.”


The technology behind ChatGPT, BERT, and all other language models
Do you know the full form of “GPT”?

“Generative Pre-trained Transformer”


It’s 2012.
Computers are good at vision.

Thanks to Neural Networks

Photo Credits: Google Brain


Convolutional neural networks (1987)

Photo Credits: Google Brain


Recurrent neural networks (RNN)

Photo Credits: Google Brain


“Sequential”
In language, the order of words matters.

“Dhravya went looking for trouble”

“Trouble went looking for Dhravya”


Problems with RNNs

1. Slow: they forget the context.
2. Expensive to train: you cannot parallelize them.
✨Transformers✨
An architecture that handles sequences like an RNN, but can be parallelized.

2017, Google + University of Toronto
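To make the parallelization point concrete, here is a minimal sketch (toy sizes and random numbers, all chosen here just for illustration) of why an RNN must process words one at a time while attention compares every word with every other word in a single matrix product:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 5, 8                          # toy sizes (assumption)
x = rng.normal(size=(seq_len, d))          # one embedding per word

# RNN: each hidden state depends on the previous one,
# so the time steps cannot be parallelized.
W_h, W_x = rng.normal(size=(d, d)), rng.normal(size=(d, d))
h = np.zeros(d)
for t in range(seq_len):
    h = np.tanh(h @ W_h + x[t] @ W_x)

# Attention: every word is compared with every other word
# in one matrix product, which a GPU can do all at once.
scores = x @ x.T / np.sqrt(d)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
out = weights @ x
print(out.shape)                           # (5, 8): context-aware word vectors
```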

Just:
1. Get a lot of GPUs
2. Get a LOT of data
3. Results will blow your mind.
How big?

GPT-3
But how does it work?

The three pillars

✨Transformers✨
Positional Encoding

Positional Encoding (tokenization)

I love science
1   2    3

Store information about word order in the data itself, not in the structure of the network.
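In the "Attention Is All You Need" paper this is done with sinusoidal positional encodings that are added to the word embeddings. A minimal NumPy sketch (the sequence length and embedding size below are toy values chosen for illustration):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encodings (Vaswani et al., 2017)."""
    pos = np.arange(seq_len)[:, None]          # word positions 0..seq_len-1
    i = np.arange(d_model)[None, :]            # embedding dimensions
    angle = pos / np.power(10000, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle[:, 0::2])       # even dimensions: sine
    pe[:, 1::2] = np.cos(angle[:, 1::2])       # odd dimensions: cosine
    return pe

# "I love science": each of the 3 positions gets its own unique vector,
# which is added to the corresponding token embedding before the encoder.
pe = positional_encoding(seq_len=3, d_model=8)
print(pe.shape)   # (3, 8)
```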
A helpful rule of thumb is that one token generally corresponds to ~4 characters of text for common English text. This translates to roughly ¾ of a word.

Photo taken from the GPT tokenizer at https://platform.openai.com/tokenizer
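A quick back-of-the-envelope check of that rule of thumb (estimates only; the exact count comes from the tokenizer linked above):

```python
text = "I love science"

# OpenAI's rule of thumb: ~4 characters of English per token,
# i.e. one token is roughly 3/4 of a word. Both are estimates.
tokens_from_chars = len(text) / 4              # 14 chars -> ~3.5 tokens
tokens_from_words = len(text.split()) / 0.75   # 3 words  -> ~4 tokens
print(round(tokens_from_chars, 1), round(tokens_from_words, 1))
```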
Transformers learn the importance of word order from the data.


Attention.

“The agreement on the European Economic Area was signed in August 1992.”

the European Economic Area -> la européenne économique zone


Understanding language

“European” comes after “Economic” in French, and there is gendered agreement between words:

the European Economic Area -> la zone économique européenne
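As a toy picture of what “attending” means (made-up vectors, not a trained model): when the decoder is about to emit “zone”, its attention weights are a softmax over scores against each English word, and the context it receives is the weighted average of those words:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy one-hot-style "embeddings" for the English words (illustration only).
english = ["the", "European", "Economic", "Area"]
keys = np.eye(4) * 3.0              # each word gets its own direction
values = keys

# Decoder state while it is about to emit "zone"; constructed here to
# point mostly at "Area", since "zone" translates "Area".
query = np.array([0.1, 0.1, 0.3, 3.0])

weights = softmax(query @ keys.T / np.sqrt(4))
print(dict(zip(english, weights.round(2))))   # highest weight on "Area"
context = weights @ values                    # weighted average fed to the decoder
```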


But…
How does the model know which
words it should be attending to?
Self-Attention
“The programmer crashed the server”
“That software developer’s name is Max. He manages the
servers.”
“He is working on fixing the servers. The software is broken.”
Self-Attention
Understand a word based on the context of the other words in the same input sentence.

Models attending to the word “server”:

“Server, can I get a check?”

“Looks like I just crashed the server”
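A minimal self-attention sketch (random toy weights and embeddings, not a trained model): every word in the sentence produces a query, a key, and a value, and the new representation of “server” is a mixture of all the words, weighted by how relevant each one is to it.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over one sentence (Vaswani et al., 2017)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # every word vs. every word
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax per word
    return weights @ V, weights

rng = np.random.default_rng(0)
words = ["Looks", "like", "I", "just", "crashed", "the", "server"]
d = 8
X = rng.normal(size=(len(words), d))                   # toy word embeddings
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))

out, weights = self_attention(X, W_q, W_k, W_v)
# Row for "server": how much it attends to every word in the same sentence.
print(dict(zip(words, weights[words.index("server")].round(2))))
```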
✨Transformers✨ boil down to:
positional encoding, attention, and self-attention.
Today, transformers power models like ChatGPT and BERT.
Today, anyone can train models on unlabelled data.
References

Morgan, Abby. "Explainable AI: Visualizing Attention in Transformers." Comet, 16 July 2023, www.comet.com/site/blog/explainable-ai-for-transformers/.

Vaswani, Ashish, et al. "Attention Is All You Need." Advances in Neural Information Processing Systems 30 (2017), https://arxiv.org/abs/1706.03762.

Muñoz, Eduardo. "Attention Is All You Need: Discovering the Transformer Paper. Detailed Implementation of a Transformer Model in TensorFlow." Towards Data Science, 2 Nov. 2020, towardsdatascience.com.
