
Introduction to NLP

Definition of NLP
• NLP enables computers to understand natural language as humans do. Whether
the language is spoken or written, natural language processing uses artificial
intelligence to take real-world input, process it, and make sense of it in a way a
computer can understand.
• Just as humans have different sensors -- such as ears to hear and eyes to see --
computers have programs to read and microphones to collect audio. And just as
humans have a brain to process that input, computers have a program to process
their respective inputs.
• At some point in processing, the input is converted to code that the computer
can understand.
Process
Continued..
• NLU – Natural Language Understanding
• It acts as an interpreter of the user's input, which may be text or speech converted to text.
• It determines what the user said, what their intent is, and what they mean.
• NLG – Natural Language Generation
• What should we say to the user?
• The reply should be intelligent and conversational.
• It generates text from structured data.
• The reply must be in the context of what the user has asked.
• In NLG, we do text planning and sentence planning.
Continued..
• There are two main phases to natural language processing: data
preprocessing and algorithm development.
• Data preprocessing involves preparing and "cleaning" text data so that
machines can analyze it. Preprocessing puts data in a workable
form and highlights features in the text that an algorithm can work
with.
Several ways of Preprocessing

• Tokenization. This is when text is broken down into smaller units to
work with.
• Stop word removal. This is when common words are removed from
text so that the less common words that offer the most information about
the text remain.
• Lemmatization and stemming. This is when words are reduced to
their root forms for processing.
• Part-of-speech tagging. This is when words are marked based on the
part of speech they are, such as nouns, verbs and adjectives.
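The first two preprocessing steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a production pipeline. The stop word list and the function names `tokenize` and `remove_stop_words` are assumptions made for this example; real stop word lists are much longer.

```python
import re

# Illustrative stop word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "is", "are", "was", "of", "to", "in", "and", "on"}


def tokenize(text):
    """Tokenization: split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens):
    """Stop word removal: drop common words that carry little information."""
    return [token for token in tokens if token not in STOP_WORDS]


tokens = tokenize("The tank was full of water.")
content_tokens = remove_stop_words(tokens)
print(tokens)          # all word tokens, lowercased
print(content_tokens)  # tokens left after stop word removal
```

For the sample sentence, only "tank", "full" and "water" remain after stop word removal, which are the words that say the most about the sentence.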
Stemming vs. Lemmatization

Stemming
• Stemming strips the last few characters from a word, which often leads
to incorrect meanings and spellings.
• For instance, naively stemming the word ‘Caring’ would return ‘Car’.
• Stemming is used with large datasets where performance is an issue.
Lemmatization
• Lemmatization considers the context and converts the word to its
meaningful base form, which is called the lemma.
• For instance, lemmatizing the word ‘Caring’ would return ‘Care’.
• Lemmatization is computationally expensive since it involves look-up
tables and additional analysis of the word.
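The contrast can be sketched with two toy functions. Both are simplifications written for this example: `naive_stem` only strips a fixed list of suffixes, and `lookup_lemma` uses a tiny hand-written table instead of a real dictionary and ignores context. Real stemmers apply additional rules and may return different results.

```python
# Hypothetical suffix list for a naive stemmer (an assumption for illustration).
SUFFIXES = ("ing", "ed", "s")

# Hypothetical look-up table standing in for a real lemma dictionary.
LEMMA_TABLE = {"caring": "care", "cared": "care", "cares": "care"}


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def lookup_lemma(word):
    """Return the lemma from the table, or the word itself if unknown."""
    word = word.lower()
    return LEMMA_TABLE.get(word, word)


print(naive_stem("Caring"))    # suffix stripping gives "car"
print(lookup_lemma("Caring"))  # table look-up gives "care"
```

The stemmer is cheap because it never consults a dictionary, which is also why it can produce a different word ("car"). The look-up is only as good as its table.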
NLP and its applications
• NLP starts from how users communicate with each other.
• A computer should be able to replicate that communication.
Applications:
Speech Recognition
Sentiment Analysis
Machine Translation
Chatbots
Knowledge in Speech and Language Processing
• A natural language understanding system must have knowledge about
what the words mean and how the words and their meanings combine to
form a sentence.
• The different forms of knowledge required for natural language
understanding are:
Continued..
• Speech recognition, or speech-to-text, is the ability of a machine
or program to identify words spoken aloud and convert them into
readable text.
• Text-to-speech is a category of software or hardware that converts
text to artificial speech. A text-to-speech system is one that reads text
aloud through the computer's sound card or other speech synthesis
device.
• Both require knowledge about phonetics and phonology.
Continued…
• Morphology: the way words break down into component parts that carry
meanings, like singular versus plural.
• Syntax: the knowledge needed to order and group words together. For
example, (1.1) is a scrambled word sequence that is not a well-formed sentence:
(1.1) I’m I do, sorry that afraid Dave I’m can’t.
• Question answering. Ex: How much Chinese silk was exported to Western
Europe by the end of the 18th century? Answering this requires:
1. Lexical semantics: the meaning of all the words, like export or silk.
2. Compositional semantics: what exactly constitutes Western Europe as
opposed to Eastern or Southern Europe.
3. Compositional semantics also covers what end means when combined with
the 18th century.
4. The relationship of the words to the syntactic structure.
Variations of statement
Ambiguity
Challenges (AI-hard problems):
Lexical Ambiguity: When a sentence is divided into tokens, a single
word may have multiple meanings.
Ex: The tank was full of water. (A tank can be a container or a
military vehicle.)
Continued..

Syntactic Ambiguity: It concerns the structure of the sentence.
• A sentence is syntactically ambiguous when its words can be grouped
into more than one valid structure, each giving a different reading.
• Ex: Old men and women were taken to a safe place.
• Does "old" describe only the men, or both the men and the women?
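One way to make the ambiguity concrete is to count how many ways a tiny grammar can group the phrase "old men and women". The sketch below is an illustration only: the three-rule grammar (NP → Adjective NP, NP → NP Conjunction NP, NP → Noun), the word lists, and the function name `count_np_parses` are all assumptions made for this example.

```python
from functools import lru_cache

# Toy lexicon (an assumption for illustration).
ADJECTIVES = {"old"}
NOUNS = {"men", "women"}
CONJUNCTIONS = {"and"}


def count_np_parses(words):
    """Count the distinct noun-phrase structures for a list of words."""
    words = tuple(word.lower() for word in words)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways words[i:j] can be parsed as a noun phrase.
        total = 0
        if j - i == 1 and words[i] in NOUNS:
            total += 1  # NP -> Noun
        if j - i >= 2 and words[i] in ADJECTIVES:
            total += count(i + 1, j)  # NP -> Adjective NP
        for k in range(i + 1, j - 1):
            if words[k] in CONJUNCTIONS:
                total += count(i, k) * count(k + 1, j)  # NP -> NP Conj NP
        return total

    return count(0, len(words))


print(count_np_parses("old men and women".split()))  # prints 2
```

The two structures correspond to the two readings: "old" applied to "men and women" as a whole, or "old men" joined with "women".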
Continued..
Semantic Ambiguity: It concerns the meaning of the sentence.
• Ex: The car hit the pole while it was moving.
• What was moving? The car or the pole?
• Ex: A moving car hit the child when it was moving.
Removing ambiguity is the bigger challenge. To do so, we use
lemmatization, stemming and named entity recognition.
• Collect all the words and identify their parts of speech.
Continued..
• Pragmatic Ambiguity: The context of a phrase gives it multiple
interpretations.
• Ex: The police are coming?
Models and Algorithms
• State machines
• Rule systems
• Logic
• Probabilistic models
• Vector space models
Variations of the basic state machine model are:
• Deterministic and non-deterministic finite-state automata
• Finite-state transducers
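A deterministic finite-state automaton can be sketched as a transition table plus a loop over the input. The example below is a minimal illustration, assuming a toy language of strings such as "baa!" and "baaa!" (a "b", at least two "a"s, then "!"). The state numbers and the function name `accepts` are choices made for this example.

```python
# Transition table: (current state, input symbol) -> next state.
# Toy language (an assumption): "b", then two or more "a", then "!".
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,  # loop: any number of extra "a"s
    (3, "!"): 4,
}
START_STATE = 0
ACCEPT_STATES = {4}


def accepts(text):
    """Return True if the automaton ends in an accepting state."""
    state = START_STATE
    for symbol in text:
        key = (state, symbol)
        if key not in TRANSITIONS:
            return False  # no legal transition: reject
        state = TRANSITIONS[key]
    return state in ACCEPT_STATES


for sample in ["baa!", "baaaa!", "ba!", "baa"]:
    print(sample, accepts(sample))
```

The machine is deterministic because each (state, symbol) pair has at most one next state. A non-deterministic variant would map a pair to a set of possible states.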
Continued..
Formal rule systems:
• Regular grammars
• Regular relations
• Context-free grammars
• Feature-augmented grammars
Note: State machines and formal rule systems are the main tools used
when dealing with knowledge of phonology, morphology and syntax.
Probabilistic models:
• Markov model
• Hidden Markov model (HMM)
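As a rough illustration of a Markov model over words, the sketch below estimates the probability of the next word given only the current word (a bigram model). The two-sentence corpus, the `<s>` and `</s>` boundary markers, and the function name `train_bigram_model` are assumptions made for this example. A hidden Markov model would add hidden states, such as part-of-speech tags, on top of this idea.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Estimate P(next word | current word) from raw counts."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> mark the sentence start and end (a convention
        # assumed here).
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for current, following in zip(tokens, tokens[1:]):
            counts[current][following] += 1

    model = {}
    for current, following_counts in counts.items():
        total = sum(following_counts.values())
        model[current] = {
            word: count / total for word, count in following_counts.items()
        }
    return model


# Toy corpus (an assumption for illustration).
corpus = ["the car hit the pole", "the car was moving"]
model = train_bigram_model(corpus)
print(model["the"])  # distribution over the words that follow "the"
```

In this corpus "the" is followed by "car" twice and by "pole" once, so the model assigns "car" a probability of 2/3 after "the".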
