Introduction to Natural Language Processing (NLP): ThS. Đặng Nhân Cách, Email: cach.dang@ut.edu.vn
NATURAL LANGUAGE PROCESSING (NLP)
Email: cach.dang@ut.edu.vn
Scope of discussion
https://en.wikipedia.org/wiki/Natural_language_processing
Flow chart of the NLP
CONCEPTS AND EXAMPLES
Stages of raw text data cleansing
NLTK (Natural Language Toolkit) is a Python library that supports many common NLP tasks, including tokenization, stop-word removal, and stemming.
Word Tokenize
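Word tokenization splits raw text into individual words and punctuation marks. The sketch below illustrates the idea with the Python standard library only; the function name word_tokenize_sketch and its regular expression are assumptions made for illustration, not NLTK's implementation. In NLTK the corresponding function is nltk.word_tokenize, which needs its tokenizer data downloaded first and may split some tokens differently (contractions, for example).

```python
import re

def word_tokenize_sketch(text):
    """Simplified word tokenizer (illustrative only, not NLTK's algorithm)."""
    # Match either a run of word characters (optionally with one internal
    # apostrophe part, e.g. "doesn't") or a single punctuation character.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)

if __name__ == "__main__":
    print(word_tokenize_sketch("Hello, world!"))
```

Punctuation is kept as separate tokens so that later stages can decide whether to drop it.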
Sentence Tokenize
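Sentence tokenization splits a document into sentences. As a rough sketch, assuming sentences end with ".", "!" or "?" and the next sentence starts with an uppercase letter, the idea can be shown with a regular expression. The name sent_tokenize_sketch is hypothetical; NLTK provides nltk.sent_tokenize for this task, which uses a trained model and handles many cases this sketch does not.

```python
import re

def sent_tokenize_sketch(text):
    """Simplified sentence splitter (illustrative only, not NLTK's algorithm)."""
    # Split on whitespace that follows ., ! or ? and precedes an uppercase letter.
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())
    return [p for p in parts if p]

if __name__ == "__main__":
    for sentence in sent_tokenize_sketch("NLP is fun. It has many stages! Is stemming one?"):
        print(sentence)
```

This rule breaks on abbreviations such as "Dr. Smith", which is one reason a trained tokenizer is usually preferred in practice.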
Stop-words removal
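Stop-word removal drops very common words (such as "the" or "is") that usually carry little meaning on their own. The sketch below assumes a tiny hand-written stop-word set, STOP_WORDS, chosen only for illustration; NLTK ships a much larger English list in nltk.corpus.stopwords, available after downloading the stopwords corpus.

```python
# Tiny illustrative stop-word set (an assumption, not NLTK's list).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "on", "and", "to"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Return the tokens whose lowercase form is not a stop word."""
    return [t for t in tokens if t.lower() not in stop_words]

if __name__ == "__main__":
    print(remove_stop_words(["The", "cat", "is", "on", "the", "mat"]))  # ['cat', 'mat']
```

Tokens are lowercased only for the comparison, so the surviving tokens keep their original casing.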
● Several words in a document may share the same root but appear in
different forms.
● Stemming is the process of stripping affixes (suffixes and prefixes)
from a word to obtain its root form (stem).
● E.g.: connect, connection, connected, connections, connects.
○ Stemmed word: connect
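The "connect" example above can be reproduced with a very naive suffix-stripping sketch. The suffix list and the function naive_stem are assumptions chosen to fit this example, and only suffixes are handled; this is not the Porter algorithm. In NLTK, a real stemmer is available as nltk.stem.PorterStemmer, whose stem method is called on one word at a time.

```python
# Suffixes are checked in this order, so longer ones are tried first.
SUFFIXES = ("ions", "ion", "ing", "ed", "s")

def naive_stem(word, min_stem_len=3):
    """Strip the first matching suffix, keeping at least min_stem_len characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word

if __name__ == "__main__":
    for w in ["connect", "connection", "connected", "connections", "connects"]:
        print(w, "->", naive_stem(w))
```

All five words reduce to "connect" here, but a rule set this small will mis-stem many other words, which is why established stemmers use more careful rules.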
Stemming