
INTRODUCTION TO DEEP LEARNING & LARGE LANGUAGE MODELS

TWUMASI MENSAH-BOATENG
INTRODUCTION
WHY DEEP LEARNING?
• Big Data
  § Large datasets
  § Easy collection
  § Easy storage

• Hardware
  § GPUs - Graphics Processing Units
  § Massive parallelization

• Software
  § Open-source software
  § TensorFlow
  § PyTorch, etc.
BUILDING BLOCKS OF A NEURAL NETWORK
ARTIFICIAL NEURAL NETWORK
THE PERCEPTRON

$$\hat{y} = g\!\left(w_0 + \sum_{i=1}^{m} x_i w_i\right)$$

Where:
$g$ is the (non-linear) activation function
$w_0$ is the bias weight
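
A minimal NumPy sketch of the perceptron equation above; the sigmoid choice for $g$ and the toy numbers are purely illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def perceptron(x, w, w0, g=sigmoid):
    """Single perceptron: y_hat = g(w0 + sum_i x_i * w_i)."""
    return g(w0 + np.dot(x, w))

# Toy example with made-up numbers
x = np.array([1.0, 2.0])      # inputs x_1, x_2
w = np.array([0.5, -0.3])     # weights w_1, w_2
w0 = 0.1                      # bias weight
print(perceptron(x, w, w0))
```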
ACTIVATION FUNCTIONS

Sigmoid:
$$g(z) = \frac{1}{1 + e^{-z}}, \qquad g'(z) = g(z)\big(1 - g(z)\big)$$

Tanh:
$$g(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}, \qquad g'(z) = 1 - g(z)^2$$

ReLU:
$$g(z) = \max(0, z), \qquad g'(z) = \begin{cases} 1, & z > 0 \\ 0, & \text{otherwise} \end{cases}$$
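
These three activation functions and their derivatives can be written directly in NumPy; this is a minimal sketch, not tied to any particular framework:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # g'(z) = g(z)(1 - g(z))

def tanh(z):
    return np.tanh(z)

def d_tanh(z):
    return 1.0 - np.tanh(z) ** 2  # g'(z) = 1 - g(z)^2

def relu(z):
    return np.maximum(0.0, z)

def d_relu(z):
    return (z > 0).astype(float)  # 1 if z > 0, else 0
```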
IMPORTANCE OF ACTIVATION FUNCTIONS

Demonstration
NEURAL NETWORKS

Hidden layer:
$$z_j^{(1)} = g\!\left(w_{0,j}^{(1)} + \sum_{i=1}^{m} x_i\, w_{i,j}^{(1)}\right)$$

Output layer:
$$\hat{y} = g\!\left(w_0^{(2)} + \sum_{j=1}^{d} z_j^{(1)}\, w_j^{(2)}\right)$$
TRAINING A NEURAL NETWORK
• Training a neural network involves two main steps:
  1. Quantify the loss - forward pass.
  2. Optimize the loss - backpropagation.

Loss/Cost Function:
$$J(W) = \frac{1}{n} \sum_{i=1}^{n} L\!\left(f\big(x^{(i)}; W\big),\, y^{(i)}\right)$$

Objective Function:
$$W^{*} = \arg\min_{W} \frac{1}{n} \sum_{i=1}^{n} L\!\left(f\big(x^{(i)}; W\big),\, y^{(i)}\right)$$

Optimizing with Gradient Descent - Algorithm (sketched in code below):
1. Initialize weights randomly.
2. Loop until convergence:
3.   Compute the gradient, $\frac{\partial J(W)}{\partial W}$.
4.   Update the weights, $W \leftarrow W - \eta\, \frac{\partial J(W)}{\partial W}$.
5. Return the weights.

Demonstration
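
A minimal sketch of the gradient-descent loop above, shown on a toy one-dimensional objective; the objective, learning rate, and stopping rule are illustrative assumptions:

```python
import numpy as np

def gradient_descent(grad_J, w_init, lr=0.1, n_steps=1000, tol=1e-6):
    """Generic gradient descent: w <- w - eta * dJ(w)/dw."""
    w = np.asarray(w_init, dtype=float)
    for _ in range(n_steps):              # "loop until convergence"
        g = grad_J(w)                     # compute the gradient
        w_new = w - lr * g                # update the weights
        if np.linalg.norm(w_new - w) < tol:
            return w_new
        w = w_new
    return w

# Toy objective J(w) = (w - 3)^2 with gradient 2(w - 3); minimum at w = 3
print(gradient_descent(lambda w: 2 * (w - 3.0), w_init=[0.0]))
```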
TRAINING A NEURAL NETWORK

Computing the gradient: Backpropagation

$$\frac{\partial J(W)}{\partial w_1} = \frac{\partial J(W)}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z} \cdot \frac{\partial z}{\partial w_1}$$

$$\frac{\partial J(W)}{\partial w_2} = \frac{\partial J(W)}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial w_2}$$
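
A minimal worked example of these two chain-rule expressions for a tiny one-input, one-hidden-unit network; the sigmoid activations and squared-error loss are assumptions made for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny network: x --w1--> z --w2--> y_hat, with squared-error loss
x, y = 0.5, 1.0                 # one made-up training example
w1, w2 = 0.8, -0.4              # current weights

# Forward pass
z = sigmoid(w1 * x)             # hidden activation
y_hat = sigmoid(w2 * z)         # prediction
J = 0.5 * (y_hat - y) ** 2      # loss

# Backward pass: apply the chain rule from the slide
dJ_dyhat = y_hat - y
dyhat_dz = w2 * y_hat * (1 - y_hat)     # through the output sigmoid
dz_dw1 = x * z * (1 - z)                # through the hidden sigmoid
dyhat_dw2 = z * y_hat * (1 - y_hat)

dJ_dw1 = dJ_dyhat * dyhat_dz * dz_dw1   # dJ/dw1
dJ_dw2 = dJ_dyhat * dyhat_dw2           # dJ/dw2
print(dJ_dw1, dJ_dw2)
```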
10-MINUTE BREAK
INTRODUCTION TO LARGE LANGUAGE MODELS
WHAT IS LANGUAGE?
A language is a system of communication that allows individuals to express thoughts, ideas,
and emotions.

Linguistics is the study of language.


Fundamentals of Language

- Phonetics - the sounds of speech.
- Phonology - sound patterns.
- Morphology - word structure.
- Syntax - sentence structure.
- Semantics - meaning.
- Discourse.
- Part-of-speech annotation.
APPLICATIONS OF COMPUTATIONAL LINGUISTICS
MODELS OF COMPUTATIONAL LINGUISTICS
KNOWLEDGE-BASED MODELS
• Follows manually written rules.

• It needs a proof of concept.

ELIZA: the first chatbot
Eliza, a chatbot therapist (njit.edu)

Drawbacks
• It was not robust.
• It had few applications.
• It was not scalable.


DATA-DRIVEN MODELS
Instead of writing rules by hand, we have the computer learn the rules and regularities from data.

Demonstration with ChatGPT.


EVOLUTION OF LANGUAGE MODELS
SEQUENTIAL MODELING
Sequential modeling is designed to capture and utilize temporal dependencies within
data sequences. This involves understanding how each element in a sequence relates to
the previous elements and predicting future elements based on this temporal context.

Sequential modeling has applications in a wide range of tasks, including but not limited
to natural language processing (e.g., language modeling, machine translation, text
generation), time series analysis (e.g., forecasting, anomaly detection), speech
recognition, music generation, and DNA sequence analysis.
SEQUENTIAL MODELING
To model sequences, we need to (see the sketch after this list):

• Handle variable length sequences.

• Track long-term dependencies.

• Maintain information about order.

• Share parameters across the sequence.
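
One classic way to meet these requirements is a recurrent cell that reuses the same weights at every time step. The sketch below is a minimal NumPy illustration; the hidden size and input dimension are chosen arbitrarily:

```python
import numpy as np

def rnn_forward(x_seq, W_xh, W_hh, b_h):
    """Minimal recurrent cell: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h).

    The same weights are reused at every time step (parameter sharing),
    the loop handles any sequence length, and h_t carries information
    about order and earlier context through the sequence.
    """
    h = np.zeros(W_hh.shape[0])
    for x_t in x_seq:                      # works for variable-length sequences
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
    return h                               # final hidden state summarizes the sequence

# Illustrative sizes: 3-dimensional inputs, 5 hidden units, 7 time steps
rng = np.random.default_rng(0)
W_xh, W_hh, b_h = rng.normal(size=(5, 3)), rng.normal(size=(5, 5)), np.zeros(5)
x_seq = rng.normal(size=(7, 3))
print(rnn_forward(x_seq, W_xh, W_hh, b_h).shape)
```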


LARGE LANGUAGE MODELS
ENCODING LANGUAGE FOR NEURAL NETWORKS
• Vocabulary

• Indexing

• Embedding
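
A minimal sketch of these three steps on a made-up toy sentence; the whitespace tokenizer and random embedding table are illustrative assumptions (real models use learned subword tokenizers and trained embeddings):

```python
import numpy as np

# A hypothetical toy corpus just to illustrate the three steps
sentence = "large language models model language"
tokens = sentence.split()

# 1. Vocabulary: the set of unique tokens the model knows
vocab = sorted(set(tokens))

# 2. Indexing: map each token to an integer id
token_to_id = {tok: i for i, tok in enumerate(vocab)}
ids = [token_to_id[tok] for tok in tokens]

# 3. Embedding: look up a dense vector for each id
embedding_dim = 4
embedding_table = np.random.default_rng(0).normal(size=(len(vocab), embedding_dim))
embeddings = embedding_table[ids]          # shape: (num_tokens, embedding_dim)

print(ids)
print(embeddings.shape)
```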
THE MODEL: TRANSFORMERS
Self-Attention Mechanism: The Transformer model relies on self-attention to weigh the importance of different words in a sentence when processing natural language data. This mechanism allows the model to focus on relevant words and learn contextual representations effectively.

Parallelization: The Transformer model can process all words in a sequence simultaneously. This parallelization enables faster training and inference, making the Transformer more efficient for long sequences.

Encoder-Decoder Architecture: The Transformer typically consists of an encoder-decoder architecture, where the encoder processes the input sequence to create representations, and the decoder generates the output sequence based on these representations.
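
A minimal NumPy sketch of single-head, unmasked scaled dot-product self-attention, the core operation described above; the projection matrices and sizes are random placeholders:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention for one sequence.

    Every position attends to every other position in parallel; the
    attention weights express how important each word is to each other word.
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)        # (seq_len, seq_len) importance scores
    weights = softmax(scores, axis=-1)     # each row sums to 1
    return weights @ V                     # context-aware representations

# Illustrative sizes: 4 tokens, model dimension 8
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)   # (4, 8)
```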
ENCODER-ONLY MODELS
• They are also known as Auto-Encoding models

• They are mostly used for:

• Sentiment classification

• Named Entity Recognition

• Extractive Question Answering


E.g. BERT
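
For instance, with the Hugging Face transformers library (not part of these slides), an encoder-style model can be tried in a few lines; the default sentiment model the pipeline downloads is an assumption of this sketch:

```python
# Requires: pip install transformers
from transformers import pipeline

# Sentiment classification with an encoder (BERT-family) model
classifier = pipeline("sentiment-analysis")  # downloads a default checkpoint
print(classifier("Deep learning lectures are great!"))
```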
DECODER-ONLY MODELS
• They are known as Auto-regressive models.

• Tasks include text generation.

E.g. GPT-2:

"My name" → "My name is" → "My name is Tom"
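
A brief sketch of autoregressive generation with GPT-2 via the Hugging Face transformers pipeline, assuming the library and model weights are available; the prompt and token budget are illustrative:

```python
from transformers import pipeline

# Autoregressive generation: the model keeps predicting the next token,
# e.g. "My name" -> "My name is" -> "My name is Tom" ...
generator = pipeline("text-generation", model="gpt2")
print(generator("My name", max_new_tokens=5))
```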
ENCODER-DECODER MODELS
• They are also known as sequence-to-sequence models.

Tasks include:

• Summarization

• Translation

• Generative Question Answering

• E.g. T5, BART
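
A brief sketch of sequence-to-sequence use through the Hugging Face transformers pipeline; the t5-small checkpoint, example inputs, and length limits are illustrative assumptions:

```python
from transformers import pipeline

# Translation with an encoder-decoder model
translator = pipeline("translation_en_to_fr", model="t5-small")
print(translator("Large language models are powerful."))

# Summarization with the same checkpoint
summarizer = pipeline("summarization", model="t5-small")
text = ("Deep learning uses neural networks with many layers to learn "
        "representations of data for tasks such as vision and language.")
print(summarizer(text, max_length=30, min_length=5))
```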


SUMMARY

THANK YOU!
RESOURCES TO LEARN MORE
• Intro to Deep Learning (TensorFlow)

• Neural Networks: Zero to Hero (Andrej Karpathy)

• Intro to Deep Learning (MIT)

• https://www.deeplearning.ai/ai-notes/initialization/index.html

• https://playground.tensorflow.org/
