Stemming and Lemmatization
INTRODUCTION
Used in Natural Language Processing (NLP)
Stemming and lemmatization reduce a word to its root or core
Help to understand that some "different" words come from the same core root
Usually a pre-processing step for NLP
Example (different forms of the same root, "eat"):
Eaten
Eating
Eats
2023FJ - JURESTI@TEC.MX 2
STEMMING
Easiest approach: strips word endings with fixed rules, so the result is not always a real word
STEMMING ALGORITHMS
Porter stemmer
Created in 1980
Removes common endings from words
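As a minimal sketch, assuming NLTK is installed, the Porter stemmer can be tried on the example words from the introduction. The variable names are illustrative only.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
words = ["eaten", "eating", "eats"]

# Apply the Porter suffix-stripping rules to each word.
stems = {word: stemmer.stem(word) for word in words}

for word, stem in stems.items():
    print(f"{word} -> {stem}")
```

"eating" and "eats" both reduce to "eat". Irregular forms such as "eaten" are generally not covered by suffix rules, which is one motivation for lemmatization.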
Snowball stemmer
Also known as Porter2
Generally considered an improvement over the original Porter stemmer
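The difference between the two stemmers can be seen on a single word. This is a small sketch, assuming NLTK is installed; the word "generously" is borrowed from the NLTK stemming examples.

```python
from nltk.stem import PorterStemmer, SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer("english")  # Snowball needs a language name

word = "generously"
porter_stem = porter.stem(word)
snowball_stem = snowball.stem(word)

# Print both results side by side for comparison.
print(f"Porter:   {word} -> {porter_stem}")
print(f"Snowball: {word} -> {snowball_stem}")
```

Snowball keeps the readable stem "generous", while the original Porter rules cut the word down further.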
Lancaster Stemmer
One of the most aggressive stemmers
NLTK allows you to add your own rules
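A short sketch of both points, assuming NLTK is installed. The custom rules are passed through the `rule_tuple` parameter, and the two rule strings below are the ones used in the NLTK docstring example, not rules from these slides.

```python
from nltk.stem import LancasterStemmer

# Default rule set: aggressive suffix removal.
default_stemmer = LancasterStemmer()
default_stem = default_stemmer.stem("maximum")

# Custom rule set written in the Lancaster rule notation.
# "s1t." means: for a word ending in "s", remove 1 letter, append "t", stop.
custom_stemmer = LancasterStemmer(rule_tuple=("ssen4>", "s1t."))
custom_stem = custom_stemmer.stem("ness")

print(default_stem)
print(custom_stem)
```

Passing `rule_tuple` replaces the default rules, so a custom stemmer only applies the rules it is given.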
Regular Expression Stemmer
Allows you to define a regular expression that matches the endings to remove
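A minimal sketch using NLTK's `RegexpStemmer`, assuming NLTK is installed. The pattern and the `min=4` length threshold are illustrative choices, not values from the slides.

```python
from nltk.stem import RegexpStemmer

# Remove any of these endings; words shorter than 4 characters are left alone.
stemmer = RegexpStemmer("ing$|s$|e$|able$", min=4)

words = ["cars", "was", "compute", "advisable"]
stems = [stemmer.stem(word) for word in words]

print(stems)  # ['car', 'was', 'comput', 'advis']
```

The stemmer simply deletes whatever the pattern matches, so the quality of the result depends entirely on the expression supplied.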
LEMMATIZATION
Involves resolving words to their dictionary form (the lemma)
https://d2mk45aasx86xg.cloudfront.net/Example_to_understand_lemmatization_a73d97a04c.webp
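The idea of resolving a word to its dictionary form can be illustrated with a toy lookup. This is only a sketch: `LEMMA_TABLE` and `toy_lemmatize` are hypothetical names, and the table is hand-written, whereas a real lemmatizer relies on a full dictionary and on part-of-speech information.

```python
# Hypothetical, hand-written table mapping inflected forms to lemmas.
LEMMA_TABLE = {
    "eaten": "eat",
    "eating": "eat",
    "eats": "eat",
    "mice": "mouse",
}


def toy_lemmatize(word: str) -> str:
    """Return the dictionary form if known, otherwise the lowercased word."""
    key = word.lower()
    return LEMMA_TABLE.get(key, key)


lemmas = [toy_lemmatize(w) for w in ["Eaten", "Eating", "Eats", "mice", "table"]]
print(lemmas)  # ['eat', 'eat', 'eat', 'mouse', 'table']
```

Unlike a suffix-stripping stemmer, a lookup handles irregular forms such as "eaten" and "mice", at the cost of needing a dictionary.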
https://d2mk45aasx86xg.cloudfront.net/difference_between_Stemming_and_lemmatization_8_11zon_452539721d.webp
https://www.baeldung.com/wp-content/uploads/sites/4/2020/06/stemvslemma.png
WORDNET LEMMATIZER
WordNet is a lexical database of English; the NLTK WordNet lemmatizer looks words up in it
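A possible way to call the NLTK WordNet lemmatizer is sketched below. It assumes NLTK is installed and that the WordNet data has been fetched once with `nltk.download("wordnet")`. The helper `lemmatize_all` and its `lemmatizer` parameter are hypothetical conveniences, not part of NLTK.

```python
from nltk.stem import WordNetLemmatizer


def lemmatize_all(words, pos="v", lemmatizer=None):
    """Lemmatize each word with the given part-of-speech tag.

    `lemmatizer` is any object with a `lemmatize(word, pos=...)` method;
    by default the NLTK WordNet lemmatizer is used.
    """
    lemmatizer = lemmatizer or WordNetLemmatizer()
    return [lemmatizer.lemmatize(word, pos=pos) for word in words]
```

With the WordNet data available, `lemmatize_all(["eaten", "eating", "eats"])` is expected to map all three verbs to "eat". The `pos` argument matters: without a part-of-speech tag the lemmatizer treats words as nouns.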
TEXTBLOB LEMMATIZER
N-GRAMS
A contiguous sequence of N elements (for example words or characters)
Uses:
Text autocompletion
Auto spell check
Basic grammar check
https://images.deepai.org/django-summernote/2019-04-11/f98290ce-a9e9-48c6-8330-4e9a5fe55331.png
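To make the autocompletion use concrete, here is a minimal pure-Python sketch that counts bigrams (N = 2) and suggests the most frequent next word. The function names and the tiny corpus are made up for illustration.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_bigram_model(tokens):
    """Count, for each word, which words follow it and how often."""
    followers = defaultdict(Counter)
    for first, second in ngrams(tokens, 2):
        followers[first][second] += 1
    return followers


def suggest_next(model, word):
    """Suggest the most frequent follower of `word`, or None if unseen."""
    counts = model.get(word)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


corpus = "i like to eat apples and i like to read books".split()
model = build_bigram_model(corpus)
suggestion = suggest_next(model, "like")
print(suggestion)  # to
```

Real autocompletion systems use far larger corpora and usually longer N-grams, but the counting idea is the same.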
USING NLTK
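N-grams can be generated with NLTK's `nltk.util.ngrams` helper. The snippet is a sketch that assumes NLTK is installed; the sentence is an arbitrary example and is split on whitespace rather than with a tokenizer.

```python
from nltk.util import ngrams

tokens = "stemming reduces words to a root".split()

# ngrams() returns an iterator of tuples, so wrap it in list() to reuse it.
bigrams = list(ngrams(tokens, 2))
trigrams = list(ngrams(tokens, 3))

print(bigrams)
print(trigrams)
```

A sentence of 6 tokens yields 5 bigrams and 4 trigrams, since each N-gram needs N consecutive tokens.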