
INTRODUCTION TO DEEP LEARNING & LARGE LANGUAGE MODELS

TWUMASI MENSAH-BOATENG
INTRODUCTION
WHY DEEP LEARNING?
• Big Data
  § Large datasets
  § Easy collection
  § Easy storage

• Hardware
  § GPUs - Graphics Processing Units
  § Massive parallelization

• Software
  § Open-source software
  § TensorFlow
  § PyTorch, etc.
BUILDING BLOCKS OF A NEURAL NETWORK
ARTIFICIAL NEURAL NETWORK
THE PERCEPTRON

$$\hat{y} = g\!\left(w_0 + \sum_{i=1}^{m} x_i w_i\right)$$

Where:
$g$ is the (non-linear) activation function
$w_0$ is the bias weight
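
A minimal NumPy sketch of the perceptron equation above; the sigmoid choice for $g$ and the toy numbers are purely illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def perceptron(x, w, w0, g=sigmoid):
    """Single perceptron: y_hat = g(w0 + sum_i x_i * w_i)."""
    return g(w0 + np.dot(x, w))

# Toy example with made-up numbers
x = np.array([1.0, 2.0])      # inputs x_1, x_2
w = np.array([0.5, -0.3])     # weights w_1, w_2
w0 = 0.1                      # bias weight
print(perceptron(x, w, w0))
```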
ACTIVATION FUNCTIONS

Sigmoid:
$$g(z) = \frac{1}{1 + e^{-z}}, \qquad g'(z) = g(z)\big(1 - g(z)\big)$$

Tanh:
$$g(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}, \qquad g'(z) = 1 - g(z)^2$$

ReLU:
$$g(z) = \max(0, z), \qquad g'(z) = \begin{cases} 1, & z > 0 \\ 0, & \text{otherwise} \end{cases}$$
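
These three activation functions and their derivatives can be written directly in NumPy; this is a minimal sketch, not tied to any particular framework:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # g'(z) = g(z)(1 - g(z))

def tanh(z):
    return np.tanh(z)

def d_tanh(z):
    return 1.0 - np.tanh(z) ** 2  # g'(z) = 1 - g(z)^2

def relu(z):
    return np.maximum(0.0, z)

def d_relu(z):
    return (z > 0).astype(float)  # 1 if z > 0, else 0
```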
IMPORTANCE OF ACTIVATION FUNCTIONS

Demonstration
NEURAL NETWORKS

Hidden layer:
$$z_j^{(1)} = g\!\left(w_{0,j}^{(1)} + \sum_{i=1}^{m} x_i\, w_{i,j}^{(1)}\right)$$

Output layer:
$$\hat{y} = g\!\left(w_0^{(2)} + \sum_{j=1}^{d} z_j^{(1)}\, w_j^{(2)}\right)$$
TRAINING A NEURAL NETWORK
• Training a neural network involves two main steps:
  1. Quantify the loss - forward pass.
  2. Optimize the loss - backpropagation.

Loss/Cost Function:
$$J(W) = \frac{1}{n} \sum_{i=1}^{n} L\!\left(f\big(x^{(i)}; W\big),\, y^{(i)}\right)$$

Objective Function:
$$W^{*} = \arg\min_{W} \frac{1}{n} \sum_{i=1}^{n} L\!\left(f\big(x^{(i)}; W\big),\, y^{(i)}\right)$$

Optimizing with Gradient Descent - Algorithm (sketched in code below):
1. Initialize weights randomly.
2. Loop until convergence:
3.   Compute the gradient, $\frac{\partial J(W)}{\partial W}$.
4.   Update the weights, $W \leftarrow W - \eta\, \frac{\partial J(W)}{\partial W}$.
5. Return the weights.

Demonstration
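
A minimal sketch of the gradient-descent loop above, shown on a toy one-dimensional objective; the objective, learning rate, and stopping rule are illustrative assumptions:

```python
import numpy as np

def gradient_descent(grad_J, w_init, lr=0.1, n_steps=1000, tol=1e-6):
    """Generic gradient descent: w <- w - eta * dJ(w)/dw."""
    w = np.asarray(w_init, dtype=float)
    for _ in range(n_steps):              # "loop until convergence"
        g = grad_J(w)                     # compute the gradient
        w_new = w - lr * g                # update the weights
        if np.linalg.norm(w_new - w) < tol:
            return w_new
        w = w_new
    return w

# Toy objective J(w) = (w - 3)^2 with gradient 2(w - 3); minimum at w = 3
print(gradient_descent(lambda w: 2 * (w - 3.0), w_init=[0.0]))
```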
TRAINING A NEURAL NETWORK

Computing the gradient: Backpropagation

$$\frac{\partial J(W)}{\partial w_1} = \frac{\partial J(W)}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z} \cdot \frac{\partial z}{\partial w_1}$$

$$\frac{\partial J(W)}{\partial w_2} = \frac{\partial J(W)}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial w_2}$$
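
A minimal worked example of these two chain-rule expressions for a tiny one-input, one-hidden-unit network; the sigmoid activations and squared-error loss are assumptions made for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny network: x --w1--> z --w2--> y_hat, with squared-error loss
x, y = 0.5, 1.0                 # one made-up training example
w1, w2 = 0.8, -0.4              # current weights

# Forward pass
z = sigmoid(w1 * x)             # hidden activation
y_hat = sigmoid(w2 * z)         # prediction
J = 0.5 * (y_hat - y) ** 2      # loss

# Backward pass: apply the chain rule from the slide
dJ_dyhat = y_hat - y
dyhat_dz = w2 * y_hat * (1 - y_hat)     # through the output sigmoid
dz_dw1 = x * z * (1 - z)                # through the hidden sigmoid
dyhat_dw2 = z * y_hat * (1 - y_hat)

dJ_dw1 = dJ_dyhat * dyhat_dz * dz_dw1   # dJ/dw1
dJ_dw2 = dJ_dyhat * dyhat_dw2           # dJ/dw2
print(dJ_dw1, dJ_dw2)
```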
10-MINUTE BREAK
INTRODUCTION TO LARGE LANGUAGE MODELS
WHAT IS LANGUAGE?
A language is a system of communication that allows individuals to express thoughts, ideas,
and emotions.

Linguistics is the study of language.


Fundamentals of Language

- Phonetics - the sounds of speech.
- Phonology - sound patterns.
- Morphology - word structure.
- Syntax - sentence structure.
- Semantics - meaning.
- Discourse.
- Part-of-speech annotation.
APPLICATIONS OF COMPUTATIONAL LINGUISTICS
MODELS OF COMPUTATIONAL LINGUISTICS
KNOWLEDGE-BASED MODELS
• Follows manually written rules.

• It needs a proof of concept.

ELIZA: the first chatbot
Eliza, a chatbot therapist (njit.edu)

Drawbacks
• It was not robust.
• It had few applications.
• It was not scalable.


DATA-DRIVEN MODELS
Instead of writing rules by hand, we have the computer learn the rules and regularities from data.

Demonstration with ChatGPT.


EVOLUTION OF LANGUAGE MODELS
SEQUENTIAL MODELING
Sequential modeling is designed to capture and utilize temporal dependencies within
data sequences. This involves understanding how each element in a sequence relates to
the previous elements and predicting future elements based on this temporal context.

Sequential modeling has applications in a wide range of tasks, including but not limited
to natural language processing (e.g., language modeling, machine translation, text
generation), time series analysis (e.g., forecasting, anomaly detection), speech
recognition, music generation, and DNA sequence analysis.
SEQUENTIAL MODELING
To model sequences, we need to (see the sketch after this list):

• Handle variable length sequences.

• Track long-term dependencies.

• Maintain information about order.

• Share parameters across the sequence.
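
One classic way to meet these requirements is a recurrent cell that reuses the same weights at every time step. The sketch below is a minimal NumPy illustration; the hidden size and input dimension are chosen arbitrarily:

```python
import numpy as np

def rnn_forward(x_seq, W_xh, W_hh, b_h):
    """Minimal recurrent cell: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h).

    The same weights are reused at every time step (parameter sharing),
    the loop handles any sequence length, and h_t carries information
    about order and earlier context through the sequence.
    """
    h = np.zeros(W_hh.shape[0])
    for x_t in x_seq:                      # works for variable-length sequences
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
    return h                               # final hidden state summarizes the sequence

# Illustrative sizes: 3-dimensional inputs, 5 hidden units, 7 time steps
rng = np.random.default_rng(0)
W_xh, W_hh, b_h = rng.normal(size=(5, 3)), rng.normal(size=(5, 5)), np.zeros(5)
x_seq = rng.normal(size=(7, 3))
print(rnn_forward(x_seq, W_xh, W_hh, b_h).shape)
```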


LARGE LANGUAGE MODELS
ENCODING LANGUAGE FOR NEURAL NETWORKS
• Vocabulary

• Indexing

• Embedding
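
A minimal sketch of these three steps on a made-up toy sentence; the whitespace tokenizer and random embedding table are illustrative assumptions (real models use learned subword tokenizers and trained embeddings):

```python
import numpy as np

# A hypothetical toy corpus just to illustrate the three steps
sentence = "large language models model language"
tokens = sentence.split()

# 1. Vocabulary: the set of unique tokens the model knows
vocab = sorted(set(tokens))

# 2. Indexing: map each token to an integer id
token_to_id = {tok: i for i, tok in enumerate(vocab)}
ids = [token_to_id[tok] for tok in tokens]

# 3. Embedding: look up a dense vector for each id
embedding_dim = 4
embedding_table = np.random.default_rng(0).normal(size=(len(vocab), embedding_dim))
embeddings = embedding_table[ids]          # shape: (num_tokens, embedding_dim)

print(ids)
print(embeddings.shape)
```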
THE MODEL: TRANSFORMERS
Self-Attention Mechanism: The Transformer model relies on self-attention to weigh the importance of different words in a sentence when processing natural language data. This mechanism allows the model to focus on relevant words and learn contextual representations effectively.

Parallelization: The Transformer model can process all words in a sequence simultaneously. This parallelization enables faster training and inference, making the Transformer more efficient for long sequences.

Encoder-Decoder Architecture: The Transformer typically consists of an encoder-decoder architecture, where the encoder processes the input sequence to create representations, and the decoder generates the output sequence based on these representations.
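
A minimal NumPy sketch of single-head, unmasked scaled dot-product self-attention, the core operation described above; the projection matrices and sizes are random placeholders:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention for one sequence.

    Every position attends to every other position in parallel; the
    attention weights express how important each word is to each other word.
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)        # (seq_len, seq_len) importance scores
    weights = softmax(scores, axis=-1)     # each row sums to 1
    return weights @ V                     # context-aware representations

# Illustrative sizes: 4 tokens, model dimension 8
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)   # (4, 8)
```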
ENCODER-ONLY MODELS
• They are also known as Auto-Encoding models

• They are mostly used for:

• Sentiment classification

• Named Entity Recognition

• Extractive Question Answering


E.g. BERT
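
For instance, with the Hugging Face transformers library (not part of these slides), an encoder-style model can be tried in a few lines; the default sentiment model the pipeline downloads is an assumption of this sketch:

```python
# Requires: pip install transformers
from transformers import pipeline

# Sentiment classification with an encoder (BERT-family) model
classifier = pipeline("sentiment-analysis")  # downloads a default checkpoint
print(classifier("Deep learning lectures are great!"))
```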
DECODER-ONLY MODELS
• They are known as Auto-regressive models.

• Tasks include text generation.

E.g. GPT-2:

"My name" → "My name is" → "My name is Tom"
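
A brief sketch of autoregressive generation with GPT-2 via the Hugging Face transformers pipeline, assuming the library and model weights are available; the prompt and token budget are illustrative:

```python
from transformers import pipeline

# Autoregressive generation: the model keeps predicting the next token,
# e.g. "My name" -> "My name is" -> "My name is Tom" ...
generator = pipeline("text-generation", model="gpt2")
print(generator("My name", max_new_tokens=5))
```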
ENCODER-DECODER MODELS
• They are also known as sequence-to-sequence models.

Tasks include:

• Summarization

• Translation

• Generative Question Answering

• E.g. T5, BART
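
A brief sketch of sequence-to-sequence use through the Hugging Face transformers pipeline; the t5-small checkpoint, example inputs, and length limits are illustrative assumptions:

```python
from transformers import pipeline

# Translation with an encoder-decoder model
translator = pipeline("translation_en_to_fr", model="t5-small")
print(translator("Large language models are powerful."))

# Summarization with the same checkpoint
summarizer = pipeline("summarization", model="t5-small")
text = ("Deep learning uses neural networks with many layers to learn "
        "representations of data for tasks such as vision and language.")
print(summarizer(text, max_length=30, min_length=5))
```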


SUMMARY

THANK YOU!
RESOURCES TO LEARN MORE
• Intro to Deep Learning (TensorFlow)

• Neural Networks: Zero to Hero (Andrej Karpathy)

• Intro to Deep Learning (MIT)

• https://www.deeplearning.ai/ai-notes/initialization/index.html

• https://playground.tensorflow.org/
