Introduction To Natural Language Processing (NLP)

6. Introduction to Natural Language Processing (NLP)
Lecture Overview
1. Students should appreciate the need for NLP
2. Students should understand the difference between
natural and formal language and the difficulty in
processing the former
3. Students should understand the fundamental
problems in NLP, mainly the ambiguities that arise
4. Students should understand the kinds of language
knowledge required: Phonology, Morphology, Syntax,
Semantics, Discourse, and World knowledge
5. Students should be familiar with the everyday
applications of NLP
What is NLP?
• Natural language processing (NLP) is a field of Artificial
Intelligence and Linguistics concerned with the
interactions between computers and human (natural)
languages.
• Specifically, it is the process of a computer extracting
meaningful information from natural language input
and/or producing natural language output.
• The aim is to build intelligent computers that can
interact with human beings the way a human being would. The
ultimate objective of NLP is to allow people to
communicate with computers in much the same way they
communicate with each other.
What is NLP? (2)
• NLP involves the reading and understanding of spoken or
written language through the medium of a computer. This
includes, for example, the automatic translation of one
language into another, but also spoken word recognition
and the automatic answering of questions. Computers often
have trouble with such tasks because they
usually try to understand the meaning of each individual
word, rather than the sentence or phrase as a whole. For
a translation program, it can therefore be difficult to capture
the linguistic nuance of the word ‘Greek’ in
the examples ‘My wife is Greek’ and ‘It’s all Greek to me’.
What is NLP? (3)
• Through NLP, computers learn to accurately manage and
apply overall linguistic meaning to text excerpts like
phrases or sentences. But this isn’t just useful for
translation or customer service chat bots: computers can
also use it to process spoken commands or even generate
audible responses that can be used in communication
with the blind, for example. Summarizing long texts or
targeting and extracting specific keywords and
information within a large body of text also requires a
deeper understanding of linguistic syntax than
computers had previously been able to achieve.
Natural Language?
• Refers to the language spoken by people,
e.g. English, Japanese, Swahili, Shona, as
opposed to Artificial and formal Languages,
like C++, Java, predicate logic etc.
• What do you think then are the main
differences between formal languages (such
as logics or programming languages) and a
natural language (such as English)?
NLP Definition
• Natural Language Processing is a theoretically
motivated range of computational techniques
for analysing and representing naturally
occurring texts/speech at one or more levels of
linguistic analysis for the purpose of achieving
human-like language PROCESSING for a range
of tasks or applications.
NLP Foundations
• Foundations are in computer science (AI,
theory, algorithms,…); linguistics; logic,
mathematics; statistics; and psychology
Why NLP?
• Two trends
1. An enormous amount of knowledge is now available
in machine-readable form as natural language text
• Huge amounts of data
• Internet = at least 20 billion web pages
– Text data – websites, blogs, tweets .......
– Audio data – speech .......
2. Conversational agents are becoming an important
form of human-computer communication
How much data was produced every
minute in 2017?
NLP Everyday...
• Before diving into the details of NLP, let us go through some
daily experiences that we may have noticed only as application
features, without recognising them as NLP applications.
1. Did you notice that Facebook has shown you an advertisement for
something related to your recent status updates?
2. Did you notice Google classifying your emails into Primary, Social,
Updates, Spam, etc.?
3. Did you notice that your smartphone keyboard has learnt the
patterns or words of your text input?
4. Did you notice Google giving you results and suggested queries
when you made a search?
5. Did you notice that much of the technology around you is adjusting
itself to suit your thoughts and expressions?
Some NLP Applications
1. Spelling and grammar checking
2. Information Retrieval
3. Information Extraction
4. Question answering
5. Machine Translation
6. Dialogue systems
7. Sentiment Analysis
8. Text Classification
9. Text Summarization
10. Speech Processing (text to speech, automatic speech recognition, speech to speech translation)
11. Text Processing (Morph Analyser, POS Tagging, Parsing)
12. Exam marking
1. Spelling and Grammar Checking

• All spelling checkers can flag words which aren’t in a
dictionary.
• Examples:
– *The neccesary steps are obvious.
– The necessary steps are obvious.
– * Its a fair exchange.
– It’s a fair exchange.
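The dictionary lookup described above can be sketched in a few lines of Python. This is only an illustration: the word list `DICTIONARY` and the function name `flag_unknown_words` are made up for this example, and a real checker would load a full word list.

```python
import re

# Toy dictionary (an assumption for this sketch, not a real word list).
DICTIONARY = {"the", "necessary", "steps", "are", "obvious",
              "its", "it's", "a", "fair", "exchange"}

def flag_unknown_words(sentence, dictionary=DICTIONARY):
    """Return the words in `sentence` that are not in `dictionary`."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return [w for w in words if w not in dictionary]

print(flag_unknown_words("The neccesary steps are obvious."))  # ['neccesary']
print(flag_unknown_words("Its a fair exchange."))              # []
```

The second call returns nothing because "its" is itself a valid word. Catching that error needs grammar checking, not just a dictionary lookup.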
2. Information Retrieval
• Information retrieval involves returning a set
of documents in response to a user query:
Internet search engines are a form of IR.
• Information Retrieval involves selecting from
a set of documents the ones that are relevant
to a query
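A very rough sketch of selecting relevant documents is to rank them by how many words they share with the query. The `retrieve` function and the small document list are assumptions made for illustration; real search engines use far richer scoring.

```python
import re

def tokenize(text):
    """Lower-case the text and return its set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query, documents):
    """Return documents sharing at least one word with the query, best match first."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(doc)), doc) for doc in documents]
    relevant = [pair for pair in scored if pair[0] > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in relevant]

# Illustrative document collection (hypothetical).
DOCS = [
    "Paris has been the French capital for many centuries",
    "The tank was full of water",
    "Harare is the capital of Zimbabwe",
]
print(retrieve("French capital", DOCS))
```

Here the first document matches two query words, the third matches one, and the second matches none, so it is not returned.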
3. Information Extraction
• Information extraction involves trying to
discover specific information from a set of
documents, emails, tweets, news items, etc.
• The information required can be described as
a template. For instance, for company joint
ventures, the template might have slots for
the companies, the dates, the products, the
amount of money involved. The slot fillers
are generally strings.
Information Extraction (2)
• Information extraction: Extraction of
meaning from email :- “We have decided to
meet tomorrow at 10:00am in the lab.”
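As a minimal sketch of template filling, the email sentence above can be processed with a few regular expressions. The slot names (`day`, `time`, `place`) and the patterns are assumptions chosen for this one example; they would not generalise to arbitrary emails.

```python
import re

def extract_meeting(text):
    """Fill a small meeting template (day, time, place) from one sentence."""
    day = re.search(r"\b(today|tomorrow)\b", text, re.IGNORECASE)
    time = re.search(r"\b(\d{1,2}:\d{2}\s?(?:am|pm))\b", text, re.IGNORECASE)
    place = re.search(r"\bin the (\w+)", text, re.IGNORECASE)
    return {
        "day": day.group(1) if day else None,
        "time": time.group(1) if time else None,
        "place": place.group(1) if place else None,
    }

email = "We have decided to meet tomorrow at 10:00am in the lab."
print(extract_meeting(email))
# {'day': 'tomorrow', 'time': '10:00am', 'place': 'lab'}
```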
Information Extraction (3)
• News: AN EARTHQUAKE struck Indonesia today - a
strapping 7.7 magnitude earthquake that struck early today
off the northern coast of the island of Sumatra. It caused
minor damage and there are no reports of any deaths,
although electricity was interrupted in several places.
– Location: Indonesia
– Region: Sumatra (Northern Coast)
– Magnitude: 7.7
– Deaths: Nil
– Damage: Minor
• Tweet: @nokia announces release of new PDA phones see
is.gd/iuTuY
– Who: Nokia
– What: Product announcement
Information Abstraction vs. Information Extraction
• "The Army Corps of Engineers, in their rush to
protect New Orleans by the start of the 2006
hurricane season, installed defective flood-control
pumps despite warnings from its own
expert about the defects."
• Extractive: "Army Corps of Engineers", "New
Orleans", and "defective flood-control pumps"
• Abstractive: "political negligence",
"inadequate protection from floods"
4. Question Answering
• Question answering attempts to find a
specific answer to a specific question from a
set of documents, or at least a short
piece of text that contains the answer.
• E.g. What is the capital of France?
– “Paris has been the French capital for many
centuries”
Question Answering (2)
Question Answering (3)
• Question Answering
– ”What time is the next bus from the city after the
5:00 pm bus ?”
– ”I am a 2nd year CSE student, which classes do I
have today ?”
– ”Which gene is associated with Diabetes ?”
– ”Who is Donald Trump ?”
5. Machine Translation
• Machine Translation: Translating content in one natural
language to another natural language. Example: Translating
an English sentence to Shona with the help of software.
6. Dialogue Systems
7. Sentiment/ Opinion Analysis
• Sentiment Analysis: Helps companies analyze a large
number of reviews on a product; helps customers
process the reviews provided on a product.
• Reviews about a restaurant :-
– “Best roast chicken in New Delhi.”
– “Service was very disappointing.”
• Another set of reviews
– “iPhone 4S is over-hyped.”
– “The hype about iPhone 4S is justified.”
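One simple (and limited) way to classify reviews like these is to count words from hand-made positive and negative lists. The word lists and the `sentiment` function below are assumptions for illustration only; practical systems learn such cues from data.

```python
import re

# Tiny hand-made lexicons (hypothetical, for this sketch only).
POSITIVE = {"best", "great", "good", "justified"}
NEGATIVE = {"disappointing", "over-hyped", "bad", "poor"}

def sentiment(review):
    """Label a review by counting positive and negative cue words."""
    words = re.findall(r"[a-z-]+", review.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for review in ["Best roast chicken in New Delhi.",
               "Service was very disappointing.",
               "iPhone 4S is over-hyped.",
               "The hype about iPhone 4S is justified."]:
    print(review, "->", sentiment(review))
```

Note that the last two reviews both mention hype; only the words around it ("over-hyped", "justified") separate the opinions, which is why plain keyword spotting is fragile.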
Sentiment/Opinion Analysis (2)
8. Text/ Document Classification
• Text classification or categorization involves
sorting text into fixed topic categories
Text/Document Classification (2)
9. Text summarization
• Text summarization: Extract keywords or
key-phrases from a large piece of text.
Creating an abstract of an entire article.
• Given a piece of text, automatically make a
summary satisfying required constraints.
Text Summarization (2)
• Examples of constraints:
– Summary should have all the information of the
document
– Summary should have only correct information of
the document.
– Summary should have information only from the
document and so on, depending on the user’s
needs!
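A minimal extractive sketch, assuming a word-frequency heuristic, is shown below. It only ever copies whole sentences from the input, so it trivially meets the constraint that the summary contains information only from the document. The function name `summarize` and the scoring rule are assumptions, not a standard algorithm from the slides.

```python
import re
from collections import Counter

def summarize(text, max_sentences=1):
    """Keep the sentences whose words are, on average, most frequent in the text."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z]+", text.lower()))

    def score(sentence):
        words = re.findall(r"[a-z]+", sentence.lower())
        return sum(freq[w] for w in words) / max(len(words), 1)

    ranked = sorted(sentences, key=score, reverse=True)
    chosen = set(ranked[:max_sentences])
    # Return the chosen sentences in their original order.
    return [s for s in sentences if s in chosen]

text = ("An earthquake struck Indonesia today. "
        "The earthquake caused minor damage. "
        "Electricity was interrupted in several places.")
print(summarize(text, max_sentences=1))
```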
10. Context Analysis
• Context analysis: Social networking sites can
‘fairly’ understand the topic of discussion, e.g. “4
of your friends posted about UZ Graduation”.
11. Speech processing
• Text to speech
– Converting electronic text to digital speech
• Automatic Speech Recognition
– Automatic transcription of spoken content to
electronic text
• Speech to speech translation
– Translating spoken content from one language to
another in real time or offline.
12. Text Processing
• Processing raw text
I. POS Tagging
– Ram/NNP goes/VBZ to/TO school/NN ..
II. Stemming
– running --> run
III. Morphological Analysis
– running --> run + ing
IV. Parsing
– Identifying sentence structure
– S --> NP + VP
(I) Text Processing : Part of Speech Tagging
Part of Speech Tagging (2)
• Part of speech (POS) recognition
– “Today is a beautiful day.”
– Today/Noun is/Verb a/Article beautiful/Adjective day/Noun
• “Interest rates interest economists for the interest of
the nation.” (word sense disambiguation)
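The simplest possible tagger just looks each word up in a table. The sketch below does that for the first example; the `LEXICON` table and the `tag` function are assumptions for illustration.

```python
# Hypothetical word-to-tag table covering only the example sentence.
LEXICON = {
    "today": "Noun",
    "is": "Verb",
    "a": "Article",
    "beautiful": "Adjective",
    "day": "Noun",
}

def tag(sentence):
    """Tag each word by table lookup; unknown words get 'Unknown'."""
    words = sentence.rstrip(".").split()
    return [(w, LEXICON.get(w.lower(), "Unknown")) for w in words]

print(tag("Today is a beautiful day."))
# [('Today', 'Noun'), ('is', 'Verb'), ('a', 'Article'), ('beautiful', 'Adjective'), ('day', 'Noun')]
```

A one-tag-per-word table cannot handle the second example, where "interest" is a noun in some positions and a verb in another; a real tagger has to use the surrounding words.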
(II) Text Processing : Stemming

• Stemming is the process of reducing
inflected (or sometimes derived) words to
their stem, base or root form.
• car, cars -> car
• run, ran, running -> run
• stemmer, stemming, stemmed -> stem
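A toy suffix-stripping stemmer illustrates the idea. This is a sketch under simple assumptions (a four-suffix list and a doubled-consonant rule), not the Porter algorithm or any library's stemmer.

```python
# Suffixes to strip, tried in this order (an assumption for this sketch).
SUFFIXES = ("ing", "ed", "er", "s")

def stem(word):
    """Strip one known suffix, then undo a doubled final consonant."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            base = word[: -len(suffix)]
            # "runn" -> "run", "stemm" -> "stem"
            if base[-1] == base[-2] and base[-1] not in "aeiou":
                base = base[:-1]
            return base
    return word

words = ["car", "cars", "run", "ran", "running", "stemmer", "stemming", "stemmed"]
print([stem(w) for w in words])
# ['car', 'car', 'run', 'ran', 'run', 'stem', 'stem', 'stem']
```

The irregular form "ran" is left unchanged: mapping it to "run" needs a dictionary of irregular forms (lemmatization) rather than suffix stripping.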
The 8 Parts of Speech
1. Noun
2. Pronoun
3. Verb
4. Adjective
5. Adverb
6. Preposition
7. Conjunction
8. Interjection
1. Nouns
• A noun is a part of speech that identifies a
person, animal, place, event, idea, thing
etc., e.g. a dog, a girl, country, nurse, cat,
party, oil, poverty, book, chair, school, and so
forth.
2. Pronoun
• A pronoun is a word that replaces a noun in a
sentence e.g. I, me, myself, this, she, he,
they, them, we, us, ours etc.
3. Adjective
• These are descriptive words that name an
attribute of a noun, such as sweet, beautiful,
red, small, sharp, brave, old, faithful, etc.
– A black car
– A beautiful dress
– Frank is a tall, skinny man
• Adjectives describe nouns by giving some
information about an object's size, shape,
age, color, origin or material
Adjective (2)
• An article is a kind of adjective which is
always used with and gives some information
about a noun. There are only three articles in
English: a, an and the. Articles are used very
often and are important for using English
accurately.
4. Verb
• A verb is a kind of word (part of speech) that
tells about an action or a state. It is the main
part of a sentence: every sentence has a verb.
In English, verbs are the only kind of word
that changes to show past or present tense
e.g. sleep, eat, read, come, listen, swim,
play, thought, accept, visited, wrote, dance,
drink, sat, yell, fell, fix, etc.
5. Adverb
• A word or phrase that describes or modifies the
meaning of a verb, adjective, or other
adverb, expressing manner, place, time, or
degree (e.g. gently, here, now, very)
• E.g. he sings loudly; she quickly disappeared;
she smiled cheerfully; the house was
spotlessly clean; the movie ended abruptly.
• Adverbs tell us how something happened, e.g.
brutally, gently, sloppily, randomly.
6. Preposition

• A word governing, and usually preceding, a noun or
pronoun and expressing a relation to another word or
element in the clause
• E.g. ‘the man on the platform’, ‘she arrived after dinner’,
‘I'm going with her’, ‘He sat on the chair’, ‘She was hiding
under the table’, ‘The cat jumped off the counter’, ‘He
drove over the bridge’, ‘The book belongs to Anthony’,
‘They were sitting by the tree’.
• Words such as on, at, in, by, with, until, about, before,
above, behind, during, to, across, below, after, beneath,
for, between, over, around, among, against, besides, etc.
are prepositions.
7. Conjunction
• A word that joins together sentences,
clauses, phrases, or words.
• Words such as “and”, “because”, “but”, “for”,
“if”, “or”, “until” and “when” are examples of
conjunctions (also known as connectives), e.g. “You can
have ice cream or pizza”, “He plays football and
cricket”, “The weather was cold but clear”, “I
waited at home until she arrived”, “He went to
bed because he was tired”.
8. Interjection
• Interjections are words used to express strong feeling or sudden
emotion. They are included in a sentence (usually at the start) to
express a sentiment such as surprise, disgust, joy, excitement,
or enthusiasm.
• Interjections for Greeting - include: Hello! , Hey! , Hi! ...
• Interjections for Joy - include: Hurrah! , Hurray! ...
• Interjections for Approval - include: Bravo! , Brilliant! ...
• Interjections for Surprise - include: Huh! ,Wow! Oh!...
• Interjections for Grief/Pain - include: Alas! , Ah!
• Others include: aha, ahh, ahoy, bingo, blah, cheers,
congratulations, duh, gosh, ha-ha, hallelujah, hmm, oops, ouch,
yap, yippee, yikes
Key NLP Components
1. Syntax
2. Semantics
3. Morphology
4. Phonetics
5. Pragmatics
6. Discourse
1. Syntax
• Syntax: the way words are used to form
phrases, e.g. it is part of English syntax that a
determiner such as “the” comes before a
noun, and also that determiners are
obligatory with certain singular nouns
• Syntax deals with the structure of sentences
• Syntax: the structuring of words into legal
larger phrases and sentences
2. Semantics
• Semantics – concerned with what words
mean and how these meanings combine in
sentences to form sentence meaning.
• Semantics refers to the explicit meaning of
words, phrases or sentences.
– Lexical semantics: the study of the meanings
of words
– Compositional semantics: how to combine
word meanings
3. Pragmatics
• Pragmatics refers to the implicit and
contextual meaning of words or sentences.
• Pragmatics – concerned with how sentences
are used in different situations and how use
affects the interpretation of the sentence.
– Presupposition: “Have you stopped beating
your wife?”
– Indirect speech acts: “Do you have a
stapler?”
4. Phonetics and Phonology
• Phonetics and phonology: speech sounds,
their production, and the rule systems that
govern their use
• Phonology – concerns how words are related
to the sounds that realize them.
5. Morphology
• Morphology – concerns how words are constructed from more basic
meaning units called morphemes. A morpheme is the primitive unit of
meaning in a language.
• Morphology: words and their composition from more basic units
– Cat, cats (inflectional morphology)
– Child, children
– Friend, friendly (derivational morphology)
• Morphology: the structure of words.
• For instance,
– “unusually” can be thought of as composed of a prefix un-, a stem
usual, and a suffix -ly.
– “composed” is compose plus the inflectional suffix -ed
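The decomposition above can be sketched as a search over known prefixes, stems and suffixes. The three small lists and the `segment` function are assumptions for this example; a real morphological analyser uses a full lexicon and spelling rules.

```python
# Toy affix and stem lists (hypothetical, chosen to cover the slide examples).
PREFIXES = ("un", "re")
SUFFIXES = ("ly", "ed", "ing", "s")
STEMS = {"usual", "compose", "friend", "cat"}

def segment(word):
    """Split `word` into (prefix, stem, suffix), or return None if no split fits."""
    for prefix in ("",) + PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in ("",) + SUFFIXES:
            if suffix and not rest.endswith(suffix):
                continue
            core = rest[: len(rest) - len(suffix)]
            # Also try restoring a dropped final "e" (compose + ed -> composed).
            for candidate in (core, core + "e"):
                if candidate in STEMS:
                    return (prefix, candidate, suffix)
    return None

print(segment("unusually"))  # ('un', 'usual', 'ly')
print(segment("composed"))   # ('', 'compose', 'ed')
print(segment("cats"))       # ('', 'cat', 's')
```

Irregular pairs such as child/children cannot be found this way, since "children" is not "child" plus a listed suffix.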
6. Discourse
• Discourse – concerns how the immediately preceding
sentences affect the interpretation of the next sentence,
for example, when interpreting pronouns and the
temporal aspects of the information.
• Utterance interpretation in the context of the text or dialog
– Sue took the trip to New York. She had a great time there.
• Sue/she;
• New York/there;
• took/had (time)
Challenges to NLP
Challenges to NLP (2)
Challenges to NLP (3)
• Ambiguous headlines:
– Include your children when baking cookies
– Local High School Dropouts Cut in Half
– Hospitals are Sued by 7 Foot Doctors
– Iraqi Head Seeks Arms

– Safety Experts Say School Bus Passengers Should Be Belted
– Teacher Strikes Idle Kids
What is Ambiguity?
• A word, term, notation, sign, symbol,
phrase, sentence, or any other form of
communication is said to be ambiguous if it
can be interpreted in more than one way.
• The term ambiguity is used to describe a
word, a phrase or a sentence with multiple
meanings (Fromkin, Rodman, Hyams, 2007).
1. Syntactic Ambiguity:
• Syntactic ambiguity, also known as structural or grammatical
ambiguity, is the phenomenon in which the same
sequence of words has two or more meanings that are
accounted for by different phrase structure analyses
• Syntactic ambiguity arises when the role a word
plays in a sentence is unclear. For instance, the
phrase “new houses and shops” can be analyzed
as either “new {houses and shops}”, i.e. both are
new, or “{new houses} and shops”, i.e. only the
houses are new
Syntactic Ambiguity (2)

• “The criminal shot the servant of the actress who was on the balcony.”
• Who was on the balcony: the servant or the actress?
Syntactic Ambiguity (3)

• NB: The ambiguity here is because of the
structure of the sentence rather than any single
word or phrase: the words are not
confusing but the sentence structure is
• Example: “He chased the girl in his car”.
– What does this mean? Did he chase a girl who was
already inside his car? Or did he use his car to chase
a girl (who was perhaps in another car)?
Syntactic Ambiguity (4)
Syntactic Ambiguity (5)

• “No student solved exactly 2 problems”
– Meaning 1: There was no student who solved
exactly 2 problems
– Meaning 2: There were exactly 2 problems that no
student solved
• “The chicken is ready to eat”
– It’s not clear if it is the chicken that is ready to eat
or if the chicken is ready to be eaten
Syntactic Ambiguity (6)

1. “Tsitsi bumped into a man with an umbrella”
2. “Small boys and girls are playing hide and seek”
3. “We saw the man with the telescope”
4. “Jesus answered him, ‘Truly I tell you, today, you will
be with me in paradise.’” (Luke 23:43)
2. Lexical Ambiguity
• Another major type of ambiguity is semantic (or lexical)
ambiguity, which can be seen in a sentence like “Visiting
relatives can cause problems”, which can be interpreted in two
different ways. The first interpretation is “Relatives who visit us
can cause problems”, whereas the second is “When we visit
relatives there can be problems”
• “Tsitsi loves her mother and Rudo does too”. The interpretations
can be that Rudo loves Tsitsi’s mother or that Rudo loves her own mother
• When a word or phrase has more than one possible
meaning and may cause confusion, it is called “lexical ambiguity”.
The word CHIP, for instance, means either 1. a small piece of wood,
2. a long thin piece of potato or 3. a small piece of silicon.
Lexical Ambiguity (2)
• Lexical ambiguity is thus the ambiguity of a
single word. Lexical ambiguity occurs when a
word has more than one meaning.
• Examples of such words include bank, pen, bat, book,
cricket, tank, fast, etc.: (1) “I saw a military tank”,
(2) “The tank was full of water”. NB: “tank” in
both instances belongs to the syntactic
category of noun, but the meanings are different

Lexical Ambiguity (3)
• ‘Pass’, for example, can mean a physical
handover of something, a decision not to
partake in something, or a measure of
success in an exam or another test format. It
also takes the same form as both
a verb and a noun. The difference in meaning
comes from the words that surround ‘pass’
within the sentence or phrase (I passed the
butter/on the opportunity/the exam).
Lexical Ambiguity (4)
• The word silver can be used as a noun, verb and
adjective, as shown in these 3 examples: (1) “She
bagged 2 silver medals”, (2) “His worries had
silvered his hair”, (3) “She made a silver speech”
• Bank Example:
– The men decided to wait by the bank
– The fisherman decided to wait by the bank
– The businessman decided to wait by the bank
– I opened a checking account by the bank
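One hedged way to picture how context picks a sense of "bank" is to count overlaps between the sentence and a handful of clue words per sense, in the spirit of simplified Lesk-style disambiguation. The sense labels and clue sets below are invented for this sketch.

```python
# Hypothetical clue words for two senses of "bank".
SENSES = {
    "river bank": {"river", "water", "fisherman", "fish", "shore"},
    "financial bank": {"money", "account", "businessman", "loan", "checking"},
}

def disambiguate(sentence, senses=SENSES):
    """Pick the sense whose clue words overlap most with the sentence."""
    words = set(sentence.lower().rstrip(".").split())
    scores = {sense: len(words & clues) for sense, clues in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "ambiguous"

for s in ["The men decided to wait by the bank",
          "The fisherman decided to wait by the bank",
          "The businessman decided to wait by the bank",
          "I opened a checking account by the bank"]:
    print(s, "->", disambiguate(s))
```

The first sentence contains no clue word for either sense, so it stays ambiguous, which matches how a human reader would treat it without more context.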
Lexical Ambiguity (5)
3. Pragmatic Ambiguity

• Pragmatic ambiguity arises when the statement is
not specific, and the context does not provide the
information needed to clarify the statement.
Information is missing, and must be inferred.
• Consider the example “I love you too”
• This can be interpreted as
– I love you (just like you love me)
– I love you (just like someone else does)
– I love you (and I love someone else)
– I love you (as well as liking you)
Pragmatic Ambiguity (2)

• Pragmatic ambiguity refers to a situation where the context of a
phrase gives it multiple interpretations. Resolving it is one of the hardest tasks in
NLP. The problem involves processing user intention, sentiment,
belief world, modals, etc., all of which are highly complex tasks.
• Consider this example. Tourist (checking out of the hotel): “Waiter,
go upstairs to my room and see if my sandals are there; do not be
late; I have to catch the train in 15 minutes.” Waiter (running
upstairs and coming back panting): “Yes sir, they are there.”
• Clearly, the waiter falls short of the tourist's expectation,
since he does not understand the pragmatics of the
situation.
When Ambiguity becomes useful
• Ambiguity has its importance in several fields
such as humor and advertising. To catch the
reader's attention, headlines in newspapers or
magazines tend to use syntactic and lexical
ambiguity
• To resolve the challenge of ambiguity, word
sense disambiguation is often used. Write
brief notes on what word sense
disambiguation is and how it works.
Steps in NLP

• There are generally five steps:
1. Lexical Analysis
2. Syntactic Analysis
3. Semantic Analysis
4. Discourse Integration
5. Pragmatic Analysis
1. Lexical Analysis
• Lexical Analysis − It involves identifying and
analyzing the structure of words. Lexicon of a
language means the collection of words and
phrases in a language. Lexical analysis is
dividing the whole chunk of text into
paragraphs, sentences, and words.
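The division into paragraphs, sentences and words can be sketched with regular expressions. The splitting rules here (blank lines separate paragraphs; ".", "!" or "?" end a sentence) are simplifying assumptions that break on abbreviations such as "e.g.".

```python
import re

def lexical_analysis(text):
    """Split text into paragraphs, each a list of sentences, each a list of words."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip()) if p]
    result = []
    for paragraph in paragraphs:
        sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
        result.append([re.findall(r"\w+(?:'\w+)?", s) for s in sentences])
    return result

text = ("Sue took the trip to New York. She had a great time there.\n\n"
        "The tank was full of water.")
for paragraph in lexical_analysis(text):
    print(paragraph)
# [['Sue', 'took', 'the', 'trip', 'to', 'New', 'York'], ['She', 'had', 'a', 'great', 'time', 'there']]
# [['The', 'tank', 'was', 'full', 'of', 'water']]
```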
2. Syntactic Analysis (Parsing)
• Syntactic Analysis (Parsing) − It involves
analysing the words in a sentence for grammar
and arranging them in a manner that shows
the relationships among the words. A
sentence such as “The school goes to boy” is
rejected by an English syntactic analyzer.
3. Semantic Analysis
• Semantic Analysis − It draws the exact
meaning or the dictionary meaning from the
text. The text is checked for meaningfulness.
This is done by mapping syntactic structures to
objects in the task domain. The semantic
analyzer rejects phrases such as “hot ice-cream”.
4. Discourse Integration
• Discourse Integration − The meaning of any
sentence depends upon the meaning of the
sentence just before it. In addition, it also
influences the meaning of the immediately
succeeding sentence.
5. Pragmatic Analysis
• Pragmatic Analysis − During this step, what was
said is re-interpreted in terms of what was actually
meant. It involves deriving those aspects of
language which require real world knowledge.
NLP Parse Tree Generation
• A parse tree is an ordered, rooted tree that
represents the syntactic structure of a sentence.
• The parse tree breaks down the sentence into
structured parts so that the computer can easily
understand and process it. In order for the
parsing algorithm to construct a parse tree, a set
of rewrite rules, which describe what tree
structures are legal, need to be constructed.
Parse Tree (2)
• These rules say that a certain symbol may be
expanded in the tree by a sequence of other symbols.
According to the first rule, if there are two
strings, a Noun Phrase (NP) and a Verb Phrase (VP), then
the string formed by NP followed by VP is a
sentence. The rewrite rules for the sentence are as
follows −
• S → NP VP
• NP → DET N | DET ADJ N
• VP → V NP
Parse Tree (3)
Parse Tree (4)
• The parse tree is the entire structure, starting from S and
ending in each of the leaf nodes (John, hit, the, ball). The
following abbreviations are used in the tree:
• S for sentence, the top-level structure in this example
• NP for noun phrase. The first (leftmost) NP, a single noun
"John", serves as the subject of the sentence. The second
one is the object of the sentence.
• VP for verb phrase, which serves as the predicate
• V for verb. In this case, it's a transitive verb hit.
• D for determiner, in this instance the definite article "the"
• N for noun
Parse Tree (5)
Parse Tree (6)
• Articles (DET) − a | an | the
• Nouns − bird | birds | grain | grains
• Noun Phrase (NP) − Article + Noun | Article +
Adjective + Noun
• = DET N | DET ADJ N
• Verbs − pecks | pecking | pecked
• Verb Phrase (VP) − NP V | V NP
• Adjectives (ADJ) − beautiful | small | chirping
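The small grammar and vocabulary above can be turned into a toy top-down parser. This is a sketch under assumptions: it uses only the rules S → NP VP, NP → DET N | DET ADJ N and VP → V NP, and the names `LEXICON`, `RULES`, `parse`, `expand` and `parse_sentence` are made up for this example.

```python
# Word categories taken from the lists above.
LEXICON = {
    "DET": {"a", "an", "the"},
    "N": {"bird", "birds", "grain", "grains"},
    "ADJ": {"beautiful", "small", "chirping"},
    "V": {"pecks", "pecking", "pecked"},
}
# Rewrite rules: each symbol maps to its possible expansions.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["DET", "N"], ["DET", "ADJ", "N"]],
    "VP": [["V", "NP"]],
}

def parse(symbol, tokens, pos):
    """Yield (tree, next_pos) for every way `symbol` matches tokens from `pos`."""
    if symbol in LEXICON:
        if pos < len(tokens) and tokens[pos] in LEXICON[symbol]:
            yield (symbol, tokens[pos]), pos + 1
        return
    for expansion in RULES[symbol]:
        yield from expand(symbol, expansion, tokens, pos, [])

def expand(symbol, remaining, tokens, pos, children):
    """Match the remaining symbols of one expansion, collecting child trees."""
    if not remaining:
        yield (symbol, *children), pos
        return
    for child, next_pos in parse(remaining[0], tokens, pos):
        yield from expand(symbol, remaining[1:], tokens, next_pos, children + [child])

def parse_sentence(sentence):
    """Return a parse tree covering the whole sentence, or None if rejected."""
    tokens = sentence.lower().split()
    for tree, end in parse("S", tokens, 0):
        if end == len(tokens):
            return tree
    return None

print(parse_sentence("The bird pecks the grains"))
print(parse_sentence("pecks the bird grains"))  # None: not a legal structure
```

The first call prints a nested tuple with S at the root and the words at the leaves; the second returns None because no sequence of rewrite rules produces that word order.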
Parse Tree (7)
Parse Tree (8)
Parse Tree (9)
Parse Tree (10)
Further Reading
• https://www.tutorialspoint.com/artificial_intelligence/artificial_intelligence_natural_language_processing.htm
• http://nlp.stanford.edu:8080/parser/index.jsp
• https://www.youtube.com/watch?v=fOvTtapxa9c
