Stemming and Lemmatization


STEMMING AND LEMMATIZATION

INTRODUCTION
Stemming and lemmatization reduce a word to its root or core form.

Used in Natural Language Processing (NLP), usually as a pre-processing step.

They help recognize that some "different" words come from the same root.
Example:
- Eaten
- Eating
- Eats

2023FJ - JURESTI@TEC.MX 2
STEMMING
Easiest approach

Words are reduced to their word stems

A stem may not be the same as the root of the word

https://devopedia.org/images/article/218/8583.1569386710.png

Algorithms are usually heuristic
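
To make the heuristic idea concrete, here is a minimal sketch of a suffix-stripping stemmer. The function name `toy_stem`, the suffix list, and the minimum stem length are illustrative assumptions, not any published algorithm.

```python
# Toy heuristic stemmer: strips the first matching suffix from a fixed list.
# The suffix list and the minimum stem length are illustrative assumptions.
SUFFIXES = ("ing", "ed", "es", "s")  # checked in this order
MIN_STEM_LENGTH = 3


def toy_stem(word: str) -> str:
    """Return the word with one common ending removed, if a rule applies."""
    word = word.lower()
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and len(stem) >= MIN_STEM_LENGTH:
            return stem
    return word


for w in ["eating", "eats", "eaten", "studies"]:
    print(w, "->", toy_stem(w))
```

With these rules "eating" and "eats" both become "eat", "eaten" is left unchanged because no rule matches, and "studies" becomes "studi", a stem that is not a dictionary word.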

STEMMING ALGORITHMS
Porter stemmer
Created in 1980
Removes common endings from words

Usually applied first, as a starting point

Guarantees reproducibility
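
A short sketch of the Porter stemmer as exposed by NLTK, applied to the example words from the introduction. It assumes the `nltk` package is installed; no extra corpus download should be needed for this stemmer.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Example words taken from the introduction.
words = ["eaten", "eating", "eats"]
stems = {word: stemmer.stem(word) for word in words}

for word, stem in stems.items():
    print(f"{word} -> {stem}")
```

Because the rules are fixed, stemming the same word again gives the same result, which is what makes the output reproducible.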

STEMMING ALGORITHMS…
Snowball stemmer
Also known as Porter2
Generally considered an improvement over the Porter stemmer

More aggressive than the Porter stemmer
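
The easiest way to see where the two stemmers disagree is to run both on the same words. The sketch below assumes `nltk` is installed; the word list is an arbitrary choice for illustration.

```python
from nltk.stem import PorterStemmer, SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer("english")  # NLTK's English Snowball stemmer

# Print both stems side by side so differences are easy to spot.
for word in ["running", "generously", "fairly"]:
    print(f"{word}: porter={porter.stem(word)} snowball={snowball.stem(word)}")
```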

STEMMING ALGORITHMS…
Lancaster Stemmer
One of the most aggressive
NLTK allows you to add your own rules

Can transform words into strange stems
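
A minimal sketch of the Lancaster stemmer in NLTK with its default rule set, assuming `nltk` is installed. The word list is chosen only to show how short the resulting stems can get.

```python
from nltk.stem import LancasterStemmer

lancaster = LancasterStemmer()  # default rules; custom rules are not shown here

for word in ["maximum", "presumably", "provision", "cement"]:
    print(word, "->", lancaster.stem(word))
```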

STEMMING ALGORITHMS…
Regular Expression Stemmer
Lets you define a regular expression

Removes any prefixes or suffixes that match it
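
As a sketch, NLTK's `RegexpStemmer` takes the pattern to strip and a minimum word length. The pattern below is only an example choice and assumes `nltk` is installed.

```python
from nltk.stem import RegexpStemmer

# Strip a trailing "ing", "s", "e" or "able".
# Words shorter than 4 characters are returned unchanged (min=4).
stemmer = RegexpStemmer(r"ing$|s$|e$|able$", min=4)

for word in ["cars", "was", "compute", "advisable"]:
    print(word, "->", stemmer.stem(word))
```

Here "cars" becomes "car", while "was" is too short to be touched. The pattern is entirely up to the user, so the quality of the stems depends on how carefully it is written.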

LEMMATIZATION
Involves resolving words to their dictionary form

Requires linguistic knowledge

Gives better results, called "lemmas"

More complex to use

https://d2mk45aasx86xg.cloudfront.net/Example_to_understand_lemmatization_a73d97a04c.webp
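
The idea of resolving a word to its dictionary form can be sketched as a lookup keyed by the word and its part of speech. The table and the name `toy_lemmatize` below are hypothetical; a real lemmatizer relies on a full lexical resource rather than a hand-written dictionary.

```python
# Hypothetical mini-dictionary keyed by (word, part of speech).
LEMMA_TABLE = {
    ("eaten", "verb"): "eat",
    ("eating", "verb"): "eat",
    ("eats", "verb"): "eat",
    ("better", "adjective"): "good",
    ("meeting", "noun"): "meeting",
    ("meeting", "verb"): "meet",
}


def toy_lemmatize(word: str, pos: str) -> str:
    """Look up the dictionary form; fall back to the word itself if unknown."""
    return LEMMA_TABLE.get((word.lower(), pos), word)


print(toy_lemmatize("eaten", "verb"))    # irregular form resolved by lookup
print(toy_lemmatize("meeting", "noun"))  # same spelling, different lemma per POS
print(toy_lemmatize("meeting", "verb"))
```

The "meeting" entries show why linguistic knowledge is needed: the right lemma depends on how the word is used, which suffix stripping alone cannot decide.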

https://d2mk45aasx86xg.cloudfront.net/difference_between_Stemming_and_lemmatization_8_11zon_452539721d.webp

https://www.baeldung.com/wp-content/uploads/sites/4/2020/06/stemvslemma.png

WORDNET LEMMATIZER
Lexical database

Used by most search engines
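
A minimal sketch of NLTK's `WordNetLemmatizer`. It assumes `nltk` is installed and that the WordNet corpus has already been downloaded; the helper name `lemmatize_verbs` is an illustrative choice.

```python
from nltk.stem import WordNetLemmatizer

# One-time setup, assumed to have been done already (needs network access):
#   import nltk; nltk.download("wordnet")

lemmatizer = WordNetLemmatizer()


def lemmatize_verbs(words):
    """Return the WordNet lemma of each word, treating every word as a verb."""
    return [lemmatizer.lemmatize(word, pos="v") for word in words]
```

Calling `lemmatize_verbs(["eaten", "eating", "eats"])` should map all three words to "eat". Without the `pos` argument the lemmatizer treats words as nouns, so verb forms may come back unchanged.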

TEXTBLOB LEMMATIZER

N-GRAMS
Contiguous sequence of N elements

An element can be a word or a smaller unit (like a syllable)

Used extensively in NLP

https://images.deepai.org/django-summernote/2019-04-11/f98290ce-a9e9-48c6-8330-4e9a5fe55331.png

Uses:
- Text autocompletion
- Auto spell check
- Basic grammar check
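
A small pure-Python sketch of building n-grams over words and over characters, followed by a very rough next-word suggestion based on bigram counts. The sample sentence and the helper names `ngrams` and `suggest` are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def ngrams(elements, n):
    """Return all contiguous runs of n elements as tuples."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(elements[i : i + n]) for i in range(len(elements) - n + 1)]


words = "the cat sat on the mat".split()
word_bigrams = ngrams(words, 2)    # elements are words
char_trigrams = ngrams("stem", 3)  # elements are characters

# Count which word follows which, as a crude basis for autocompletion.
following = defaultdict(Counter)
for first, second in word_bigrams:
    following[first][second] += 1


def suggest(word):
    """Most frequent next word seen after `word`, or None if unseen."""
    counts = following.get(word)
    return counts.most_common(1)[0][0] if counts else None


print(word_bigrams)
print(char_trigrams)
print(suggest("cat"))
```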

USING NLTK
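
One possible way to combine the pieces in NLTK is sketched below: split a sentence into tokens, stem each token, then build bigrams over the stems. The sample sentence is an assumption, and a plain whitespace split is used so that no tokenizer data has to be downloaded.

```python
from nltk.stem import PorterStemmer
from nltk.util import ngrams

text = "He eats while she was eating"
tokens = text.lower().split()  # simple whitespace split, not an NLTK tokenizer

stemmer = PorterStemmer()
stems = [stemmer.stem(token) for token in tokens]

# nltk.util.ngrams returns an iterator of tuples, so materialize it as a list.
bigrams = list(ngrams(stems, 2))

print(stems)
print(bigrams)
```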
