006 NLP-pipelineSLides


General NLP Pipeline
Complete Natural Language Processing Masterclass
This section helps you conceptualise
the overall process, from raw text to
the point where it can be fed to a
machine learning model.
For instance, we take Amazon text
reviews to the point where we can
create a machine learning model that
tells us whether a review is positive
or negative.

Course Flow OVERVIEW //NIDIA


Text Wrangling & Preprocessing --> Text Normalization --> Word Embedding
--> Build Model --> Initialise Model --> Train Model
--> Test Model --> Evaluate Model
Text Pre-processing

Real-world data is usually messy, in
data science generally and not only in
NLP. In our case, text data can be
highly unstructured. The datasets used
while learning NLP usually require
some preprocessing, but not to the
extent you would face if you were
creating your own dataset from scratch.
Nevertheless, I will ensure that you
get good practice with cleaning some
very dirty data.

Clean the mess REALITY


Pre-processing

Tweets
Extract hashtags
Clean URLs
Mentions
Emojis
Smileys
Remove digits
Punctuation
Stop words - they don't add much
meaning - a, an, in, this, it, at, the
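
The cleaning steps listed above can be sketched with Python's standard library. This is only a rough sketch under a few assumptions: "clean" is taken to mean "remove", the regular expressions are deliberately simple and will miss edge cases, the stop-word set is just the handful of words from the slide, and the example tweet and URL are made up.

```python
import re
import string

# Small illustrative stop-word list taken from the slide; real lists are longer.
STOP_WORDS = {"a", "an", "in", "this", "it", "at", "the"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
# Very rough pattern for text smileys such as :) ;-) :D :( and not exhaustive.
SMILEY_RE = re.compile(r"[:;=][\-^]?[)(DPp]")


def clean_tweet(tweet):
    """Return (hashtags, cleaned_tokens) for one tweet."""
    hashtags = HASHTAG_RE.findall(tweet)  # extract hashtags before stripping them
    text = URL_RE.sub(" ", tweet)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = SMILEY_RE.sub(" ", text)
    # Dropping non-ASCII characters removes emojis (and any other non-ASCII text).
    text = text.encode("ascii", "ignore").decode()
    text = re.sub(r"\d+", " ", text)
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    return hashtags, tokens


# Hypothetical tweet, for illustration only.
hashtags, tokens = clean_tweet(
    "At #Coachella with @friend :) best 3 days of the year! https://example.com 🎉"
)
print(hashtags)  # ['Coachella']
print(tokens)    # ['with', 'best', 'days', 'of', 'year']
```

Hashtags are extracted first because they often carry the topic of the tweet; everything else is stripped so that only plain lowercase words remain.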

#Coachella TWEETS


Text Normalization

In computer science, canonical means
a standard state or behaviour of an
attribute. Here, we are putting the
text into a structure that conforms to
well-established patterns.
Tokenization
Stemming
Lemmatization
Sentence Segmentation
Spell correction
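
To make a few of the steps above concrete, here is a minimal standard-library sketch of sentence segmentation, tokenization, stemming and lemmatization. Everything in it is a simplification for illustration: `toy_stem` is a naive suffix stripper (not the Porter algorithm), `TOY_LEMMAS` is a hypothetical three-word lookup table standing in for a real lemmatizer's dictionary, and spell correction is left out. In practice you would normally reach for a library rather than write these by hand.

```python
import re


def split_sentences(text):
    """Naive sentence segmentation: split after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Lowercase word tokenization; keeps an apostrophe inside a word (it's)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", sentence.lower())


def toy_stem(token):
    """Toy suffix stripper, NOT the Porter algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


# Hypothetical lookup table standing in for a real lemmatizer's dictionary.
TOY_LEMMAS = {"was": "be", "better": "good", "books": "book"}


def toy_lemmatize(token):
    """Map a token to its dictionary form if the toy table knows it."""
    return TOY_LEMMAS.get(token, token)


text = "The books arrived quickly. Shipping was better than expected!"
for sentence in split_sentences(text):
    tokens = tokenize(sentence)
    print(tokens)
    print([toy_stem(t) for t in tokens])
    print([toy_lemmatize(t) for t in tokens])
```

Note the difference the sketch makes visible: stemming chops suffixes and may produce non-words (arrived becomes arriv), while lemmatization maps a word to a real dictionary form (better becomes good).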

standard state CONVERTING TEXT INTO SINGLE CANONICAL FORM
Word Embeddings

The aim is to encode (normalized)
words into vectors, each sitting at
some position in “word space”.
Essentially, we are representing a
vocabulary in a vector space.

Mathematical principles: cosine and
Euclidean distance.
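
As a small illustration of these two measures, the sketch below computes cosine similarity and Euclidean distance in plain Python. The three-dimensional vectors are hand-made and purely hypothetical; real embeddings are learned from data and have many more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: 1 means same direction, 0 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between the points u and v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hand-made toy vectors, purely hypothetical values.
vectors = {
    "king": [0.9, 0.8, 0.1],
    "prince": [0.8, 0.7, 0.2],
    "jellyfish": [0.1, 0.0, 0.9],
}

for other in ("prince", "jellyfish"):
    cos = cosine_similarity(vectors["king"], vectors[other])
    dist = euclidean_distance(vectors["king"], vectors[other])
    print(f"king vs {other}: cosine={cos:.3f}, euclidean={dist:.3f}")
```

With these toy values, king and prince point in nearly the same direction (high cosine similarity, small distance), while king and jellyfish do not.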

This is where the magic starts:
similarity. How do we know that king
is associated with prince, but not as
much with jellyfish?

Embedding layers allow the algorithm
to figure out:
man is to king as
woman is to ________
Space - dimensions, positions.
Closeness, distance.
EXAMPLE: WORD2VEC, GLOVE
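
The analogy above is often illustrated with vector arithmetic: take the vector for king, subtract man, add woman, and look for the nearest remaining word. The sketch below does this with hand-made toy vectors whose three dimensions loosely stand for (royalty, maleness, femaleness). These values are hypothetical and chosen so the example works; vectors from Word2Vec or GloVe are learned from large corpora and are not this tidy.

```python
import math

# Hand-made toy vectors: (royalty, maleness, femaleness). Hypothetical values.
VECTORS = {
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "king": [0.9, 0.9, 0.1],
    "queen": [0.9, 0.1, 0.9],
    "prince": [0.7, 0.9, 0.1],
    "jellyfish": [0.0, 0.2, 0.2],
}


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors=VECTORS):
    """Solve 'a is to b as c is to ?' using the vector b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the three query words so the answer is a new word.
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine_similarity(target, vectors[w]))


print(analogy("man", "king", "woman"))  # queen, with these toy vectors
```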
Build Model

Machine learning involves models, each
with its own ML pipeline, which
operate through a series of
mathematical operations to learn from
data in order to estimate the output
on unseen data.
Example: the dataset can have 1000
reviews labelled positive or negative.
The model can then predict whether an
unseen review is negative or positive.

Accuracy will vary depending on both
the type of data and the type of model.
Deep learning: Recurrent Neural
Networks (RNNs), including LSTMs.
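
To show what "learn from labelled reviews, then predict an unseen one" looks like in code, here is a minimal sketch. Note that it is deliberately NOT an LSTM: it swaps in a much simpler technique, a bag-of-words Naive Bayes classifier with add-one smoothing, so that it runs with the standard library alone. The class name `NaiveBayesSentiment` and the four training reviews are made up for illustration.

```python
import math
from collections import Counter


class NaiveBayesSentiment:
    """Bag-of-words multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / total_docs)  # class prior
            for word in text.lower().split():
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log(
                        (self.word_counts[c][word] + 1)
                        / (total_words + len(self.vocab))
                    )
            scores[c] = score
        return max(scores, key=scores.get)


# Tiny hypothetical training set; a real one would have far more reviews.
train_texts = [
    "great book loved it",
    "fast shipping great price",
    "terrible quality very disappointed",
    "slow shipping bad book",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayesSentiment().fit(train_texts, train_labels)
print(model.predict("loved the great price"))
print(model.predict("bad quality"))
```

A sequence model such as an LSTM would additionally take word order into account, which a bag-of-words model ignores.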

ML pipeline EXAMPLE: LSTM


Transfer Learning

Suppose my dataset consists of Netflix
movie titles. The movie titles make up
my vocabulary. But what if this is not
enough to train my model on? Transfer
learning allows the model to use
knowledge already learned from a
gigantic dataset: this knowledge is
transferred from a related task to the
new dataset.
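
One common form of this idea is to initialise a model's word vectors from vectors that were already trained elsewhere, and only start from scratch for words the source never saw. The sketch below assumes that form. The `PRETRAINED` dictionary is a hypothetical stand-in typed by hand (in practice it would be loaded from a large pre-trained model), and the titles and the helper name `build_embedding_table` are illustrative only.

```python
import random

# Hypothetical "pre-trained" vectors; in practice these would be loaded from a
# model trained on a very large corpus rather than typed by hand.
PRETRAINED = {
    "stranger": [0.2, 0.7, 0.1],
    "things": [0.5, 0.1, 0.4],
    "crown": [0.9, 0.3, 0.2],
}
EMBEDDING_DIM = 3


def build_embedding_table(vocabulary, pretrained, dim, seed=0):
    """Copy known vectors; randomly initialise words the source model lacks."""
    rng = random.Random(seed)
    table, reused = {}, 0
    for word in vocabulary:
        if word in pretrained:
            table[word] = list(pretrained[word])  # transferred knowledge
            reused += 1
        else:
            table[word] = [rng.uniform(-0.05, 0.05) for _ in range(dim)]
    return table, reused


# Small illustrative dataset of titles.
titles = ["Stranger Things", "The Crown", "Ozark"]
vocabulary = sorted({w for t in titles for w in t.lower().split()})

table, reused = build_embedding_table(vocabulary, PRETRAINED, EMBEDDING_DIM)
print(f"{reused} of {len(vocabulary)} words start from pre-trained vectors")
```

The words that start from pre-trained vectors already carry meaning before any training on the small dataset; the rest have to be learned from the titles alone.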

Initialising Model PRE-TRAINED - USING PREVIOUS KNOWLEDGE
Train Model

Think of it this way: we take the
dataset, now cleaned and converted
into a structured, machine-readable
integer form, and treat it as
knowledge. We iterate over this
knowledge, the way we would study
words repeatedly in order to learn
them. Training the model is simply
teaching the computer.
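
The two ideas in this slide, integer encoding and repeated passes over the data, can be sketched as follows. As an assumption for illustration, the model here is a simple perceptron rather than a neural network, labels are written as +1 (positive) and -1 (negative), and the four reviews are made up. Each full pass over the data is one "epoch".

```python
def build_vocab(texts):
    """Map each distinct word to an integer id, in order of first appearance."""
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(text, vocab):
    """Turn a text into a list of integer ids, skipping unknown words."""
    return [vocab[w] for w in text.lower().split() if w in vocab]


def train_perceptron(encoded, labels, vocab_size, epochs=5):
    """Labels are +1 / -1. Returns weights, bias and mistakes made per epoch."""
    weights = [0.0] * vocab_size
    bias = 0.0
    history = []
    for _ in range(epochs):  # study the same data repeatedly
        mistakes = 0
        for ids, y in zip(encoded, labels):
            score = bias + sum(weights[i] for i in ids)
            predicted = 1 if score > 0 else -1
            if predicted != y:  # learn only from mistakes
                for i in ids:
                    weights[i] += y
                bias += y
                mistakes += 1
        history.append(mistakes)
    return weights, bias, history


# Tiny hypothetical dataset.
texts = [
    "great book loved it",
    "fast shipping great price",
    "terrible quality very disappointed",
    "slow shipping bad book",
]
labels = [1, 1, -1, -1]

vocab = build_vocab(texts)
encoded = [encode(t, vocab) for t in texts]
weights, bias, history = train_perceptron(encoded, labels, len(vocab))
print("mistakes per epoch:", history)
```

The mistake count per epoch should fall as the passes repeat, which is the code-level version of studying the same words until they stick.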

teaching the computer CHOOSE MODEL


Test & Evaluate

Predict the output, then compare it
with the labelled data. Then evaluate
the model on unseen data.

Amazon review: "I love this book, it's
better than I expected & the shipping
was really fast!"
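
A minimal sketch of the evaluation step: run a model over held-out reviews, compare each prediction with its true label, and report accuracy plus the counts of each kind of outcome. The `keyword_model` below is only a hypothetical stand-in for a trained model, and apart from the Amazon review quoted above, the held-out reviews and their labels are made up.

```python
import re


def evaluate(predict, texts, labels):
    """Compare predictions with true labels; return accuracy and confusion counts."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for text, truth in zip(texts, labels):
        guess = predict(text)
        if guess == "positive":
            counts["tp" if truth == "positive" else "fp"] += 1
        else:
            counts["tn" if truth == "negative" else "fn"] += 1
    accuracy = (counts["tp"] + counts["tn"]) / len(labels)
    return accuracy, counts


# Hypothetical stand-in for a trained model: a simple keyword rule.
POSITIVE_WORDS = {"love", "better", "fast", "great"}
NEGATIVE_WORDS = {"bad", "slow", "broken", "worse"}


def keyword_model(text):
    words = set(re.findall(r"[a-z']+", text.lower()))
    score = len(words & POSITIVE_WORDS) - len(words & NEGATIVE_WORDS)
    return "positive" if score > 0 else "negative"


# Held-out reviews the "model" has not seen; labels are assumed for illustration.
held_out_texts = [
    "I love this book, it's better than I expected & the shipping was really fast!",
    "Arrived broken and the seller was slow to reply.",
    "Not great.",
]
held_out_labels = ["positive", "negative", "negative"]

accuracy, counts = evaluate(keyword_model, held_out_texts, held_out_labels)
print(f"accuracy={accuracy:.2f}", counts)
```

The third review is misclassified because the keyword rule does not understand negation, which is exactly the kind of weakness that evaluation on unseen data is meant to expose.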

Performance PRE-TRAINED - USING PREVIOUS KNOWLEDGE
