
Natural Language Processing
KOE-088
Unit 1
Prepared by Sandhya Avasthi, AP, Department of CSE
Topics Covered in Unit 1
✓Introduction
✓The study of Language
✓Applications of NLP
✓Evaluating Language Understanding Systems
✓Different levels of Language Analysis
✓Representations and understanding
✓Organization of Natural language Understanding System
✓Linguistic Background: An outline of English syntax.
NLP-Introduction
• NLP is a branch of computer science, and more specifically of artificial intelligence (AI),
concerned with giving computers the ability to understand text and spoken words in much
the same way human beings can.
• NLP combines computational linguistics (rule-based modeling of human language) with
statistical, machine learning, and deep learning models.
• It drives computer programs that translate text from one language to another, respond to
spoken commands, and summarize large volumes of text rapidly, even in real time.
• NLP has two main components: natural language understanding (NLU) and natural language
generation (NLG).
What is natural language?

❑It means human language. The most common way that people
communicate is by speaking or writing in one of the natural languages,
such as English, Chinese, German, or French.

❑ There are two forms of natural language: written and spoken forms.
NLU and NLG
▪ NLP (Natural Language Processing) covers all methods for the pure processing of
language by algorithmic, statistical, heuristic, and other means.

▪ NLU (Natural Language Understanding) refers to the actual understanding of a text
formulated in some natural language.

▪ The goals of NLU are to gain insights into human cognition, to develop artificial agents
that act as assistants, and to solve a major subproblem of AI.

▪ Natural Language Generation (NLG) is the next step that takes the language that has
been processed (NLP) and understood (NLU), then produces written or spoken
narratives from a data set.
What is Understanding?
To understand a statement is to
• determine its truth (with justification)
• calculate its entailments
• take appropriate action in light of it
• translate it into another language

History of NLU
▪ 1960s: Pattern-matching with small rule-sets
▪ 1970-80s: Linguistically rich, logic-driven, grounded systems; restricted applications
▪ 1990s: the statistical revolution in NLP leads to a decrease in NLU work
▪ 2010s: NLU returns to center stage, mixing techniques from previous decades
Technological and Cognitive Goals
• According to James Allen (1987), there are two underlying motivations for building a
computational theory of language:
- technological goal
- cognitive goal
• The technological goal is to build better computers and solutions that work.
• The cognitive goal is to build a computational analog of the human
language-processing mechanism.
Study of Natural Language
Applications of NLP
Applications are divided into two classes
• Text-based applications involve the processing of written text, such as
books, newspapers, reports, manuals, and email messages. These are
all reading-based tasks.
• Dialog-based applications involve human-machine communication. Most
naturally this involves spoken language, but it also includes interaction
using keyboards.
Text-based Application
• Finding appropriate documents on certain topics from a database of texts
- example, finding relevant books in a library
• Extracting information from messages or articles on certain topics
- example, building a database of all stock transactions described in the
news on a given day
• Translating documents from one language to another
- example, producing automobile repair manuals in many different languages
• Summarizing texts for certain purposes
- example, producing a 3-page summary of a 1000-page government report
Dialog-based applications
• Question-answering systems, where natural language is used to query a database
- example, a query system to a personnel database
• Automated customer service over the telephone
- example, to perform banking transactions or order items from a catalogue
• Tutoring systems, where the machine interacts with a student
- example, an automated mathematics tutoring system
• Spoken language control of a machine
- example, voice control of a computing device
• Cooperative problem-solving systems
- example, a system that helps a person plan and schedule freight shipments

Problems/issues in dialogue systems: the language used in dialogue is very different from that of written text, and the
system needs to participate actively in order to maintain a natural, smooth-flowing dialogue. Dialogue requires the use of
acknowledgments to verify that things are understood, and an ability to both recognize and generate
clarification sub-dialogues when something is not clearly understood.
Evaluating Language understanding System
What counts as understanding varies from application to application. If this is so, how can you tell
whether a system works?
Black Box Evaluation-
▪ Run the program and see how well it performs the task it was designed to do.
▪ If the program is meant to answer questions about a database of facts, you might ask it questions to
see how good it is at producing the correct answers.
▪ If the system is designed to participate in simple conversations on a certain topic, you might try
conversing with it.
▪ This is called black box evaluation because it evaluates system performance without looking
inside to see how it works.
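The black box idea can be sketched in a few lines of Python: the evaluator only calls the system and compares its answers to gold answers, never looking at how it works. This is a minimal illustration under stated assumptions; `toy_qa`, its tiny fact table, and the exact-match scoring rule are all hypothetical choices made for the example.

```python
from typing import Callable, List, Tuple

def black_box_accuracy(system: Callable[[str], str],
                       test_set: List[Tuple[str, str]]) -> float:
    """Score a system only by the answers it gives; its internals are never inspected."""
    if not test_set:
        raise ValueError("test set is empty")
    correct = sum(1 for question, gold in test_set
                  if system(question).strip().lower() == gold.strip().lower())
    return correct / len(test_set)

# Hypothetical question-answering system over a tiny database of facts.
FACTS = {"capital of france": "Paris", "capital of japan": "Tokyo"}

def toy_qa(question: str) -> str:
    key = question.lower().rstrip("?").replace("what is the ", "")
    return FACTS.get(key, "I don't know")

tests = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Japan?", "Tokyo"),
    ("What is the capital of Italy?", "Rome"),
]
print(f"accuracy = {black_box_accuracy(toy_qa, tests):.2f}")  # accuracy = 0.67
```

Any function with the same signature could be swapped in for `toy_qa` without changing the evaluator, which is exactly what makes this evaluation "black box".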
Limitations
▪ It is problematic in early stages of research because early evaluation results can be misleading.
▪ Techniques that produce the best results in the short term may not lead to the best results in the long
term.
Evaluating Language Understanding System
Glass Box Evaluation-
This method identifies the various subcomponents of a system and then evaluates each one with
appropriate tests.
This is called glass box evaluation because you look inside at the structure of the system.
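In contrast to black box testing, a glass box evaluation scores each subcomponent against its own gold-standard data. The sketch below assumes a hypothetical two-stage system (a whitespace tokenizer and a lexicon-based part-of-speech tagger); the component names, lexicon, and tag set are illustrative only.

```python
from typing import Dict, List

def tokenize(text: str) -> List[str]:
    """Hypothetical component 1: split on whitespace, strip punctuation, lowercase."""
    return [w.strip(".,!?").lower() for w in text.split() if w.strip(".,!?")]

# Hypothetical component 2: a lexicon-based part-of-speech tagger.
LEXICON = {"the": "DET", "dog": "N", "barks": "V", "cat": "N", "sleeps": "V"}

def tag(tokens: List[str]) -> List[str]:
    return [LEXICON.get(t, "N") for t in tokens]  # unknown words default to noun

def accuracy(predicted: List[str], gold: List[str]) -> float:
    if len(predicted) != len(gold) or not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

def glass_box_report(text: str, gold_tokens: List[str], gold_tags: List[str]) -> Dict[str, float]:
    """Score each subcomponent separately against its own gold-standard data."""
    return {
        "tokenizer": accuracy(tokenize(text), gold_tokens),
        # The tagger is given the gold tokens, so tokenizer errors cannot
        # lower its score: each component is tested in isolation.
        "tagger": accuracy(tag(gold_tokens), gold_tags),
    }

print(glass_box_report("The dog barks loudly.",
                       ["the", "dog", "barks", "loudly"],
                       ["DET", "N", "V", "ADV"]))
# {'tokenizer': 1.0, 'tagger': 0.75}
```

The report pinpoints the tagger (which mislabels the unknown word "loudly") as the weak component, information a single end-to-end score would hide. It also illustrates the limitation noted next: the breakdown only makes sense if there is agreement that "tokenizer" and "tagger" are the right components to test.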
Limitations
• Problem with glass box evaluation is that it requires some consensus on what various
components of a natural language system should be.
• Achieving such a consensus is an area of considerable activity at present.
Example- ELIZA Program, developed by Joseph Weizenbaum at MIT in the mid-1960s
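ELIZA is the classic 1960s example of pattern matching with a small rule set: it understands nothing, but matches keywords and echoes fragments back with the pronouns swapped. The sketch below is a minimal imitation of that technique; the three rules and the reflection table are hypothetical and are not Weizenbaum's original script.

```python
import re

# Hypothetical rules in the spirit of ELIZA (not Weizenbaum's original script).
# Each rule pairs a pattern with a reply template; the first matching rule wins.
RULES = [
    (re.compile(r"\bi need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (.+)", re.IGNORECASE), "Tell me more about your {0}."),
]
DEFAULT_REPLY = "Please go on."

# Swap first and second person so the echoed fragment reads naturally.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are",
               "you": "I", "your": "my"}

def reflect(fragment: str) -> str:
    return " ".join(REFLECTIONS.get(w.lower(), w) for w in fragment.split())

def respond(utterance: str) -> str:
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*(reflect(g) for g in match.groups()))
    return DEFAULT_REPLY

print(respond("I need a holiday."))      # Why do you need a holiday?
print(respond("I am tired of my job."))  # How long have you been tired of your job?
print(respond("The weather is nice."))   # Please go on.
```

Because the system only manipulates surface strings, it can produce plausible replies without any representation of meaning, which is why ELIZA is usually cited as a contrast to genuine understanding.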
Example- NLU System
Different levels of Language Analysis
• An NLU system must use considerable knowledge about the structure of the language
itself, including what the words are, how words combine to form sentences, what
the words mean, and how word meanings contribute to sentence meanings.
• NLU must also take into account aspects of what makes humans intelligent: their
general world knowledge and their reasoning abilities.
Example-
To answer questions or to participate in a conversation, a person not only must
know a lot about the structure of the language being used, but also must know
about the world in general and the conversational setting in particular.
Forms of Knowledge relevant for NLU(1)
• Phonetic and phonological knowledge
-how words are related to the sounds that realize them.
-Such knowledge is crucial for speech-based systems
• Morphological knowledge
-how words are constructed from more basic meaning units called morphemes.
-A morpheme is the primitive unit of meaning in a language
- for example, meaning of the word "friendly" is derivable from the meaning of the
noun "friend" and the suffix "-ly", which transforms a noun into an adjective
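Morphological knowledge of the "friend" + "-ly" kind can be sketched as a lookup over a stem lexicon and a table of suffixes, each suffix recording which word class it attaches to and which class it produces. This is a hypothetical toy analyzer; the stems, suffixes, and class labels are assumptions for illustration, and real analyzers also model spelling changes.

```python
from typing import Optional, Tuple

# Hypothetical stem lexicon: stem -> word class.
STEMS = {"friend": "N", "kind": "ADJ", "teach": "V", "happy": "ADJ"}

# Hypothetical derivational suffixes: suffix -> (class it attaches to, class it produces).
SUFFIXES = {"ly": ("N", "ADJ"), "ness": ("ADJ", "N"), "er": ("V", "N")}

def analyze(word: str) -> Optional[Tuple[str, str, str]]:
    """Return (stem, suffix, resulting class) if the word is a known stem plus a suffix."""
    for suffix, (attaches_to, produces) in SUFFIXES.items():
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            if STEMS.get(stem) == attaches_to:
                return stem, "-" + suffix, produces
    return None

print(analyze("friendly"))   # ('friend', '-ly', 'ADJ')
print(analyze("kindness"))   # ('kind', '-ness', 'N')
print(analyze("happiness"))  # None: the y -> i spelling change is not modeled
```

The failure on "happiness" shows why practical morphological analyzers need spelling rules in addition to a morpheme list.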
Forms of Knowledge relevant for NLU(2)
• Syntactic knowledge
-how words can be put together to form correct sentences
-determines what structural role each word plays in the sentence and what
phrases are subparts of what other phrases.
• Semantic knowledge
-what words mean and how these meanings combine in sentences to form
sentence meanings.
-the study of context-independent meaning: the meaning a sentence has regardless
of the context in which it is used.
Forms of Knowledge relevant for NLU(3)
• Pragmatic knowledge
How sentences are used in different situations and how use affects the
interpretation of the sentence.
• Discourse knowledge
How the immediately preceding sentences affect the interpretation of the next
sentence. This information is especially important for interpreting pronouns
and for interpreting the temporal aspects of the information conveyed.
• World knowledge
Includes general knowledge about the structure of the world that language
users must have in order to, for example, maintain a conversation.
It includes what each language user must know about the other user’s beliefs
and goals.
Syntax, Semantics, and Pragmatics
There are three different levels of linguistic analysis relevant to NLP:
• Syntax: Is the given text grammatically well-formed?
• Semantics: What is the meaning of the given text?
• Pragmatics: What is the purpose of the text?
Syntax, Semantics, and Pragmatics
Following examples may help you understand the distinction between syntax, semantics, and
pragmatics.
1. Language is one of the fundamental aspects of human behavior and is a crucial
component of our lives.
2. Green frogs have large noses.
3. Green ideas have large noses.
4. Large have green ideas nose.
▪ Sentence 1 agrees with all that is known about syntax, semantics, and pragmatics.
▪ Sentence 2 is well-formed syntactically and semantically, but not pragmatically, because there is
no apparent reason for saying it.
▪ Sentence 3 is much worse. Not only is it obviously pragmatically ill-formed, it is also
semantically ill-formed (ideas cannot be green).
▪ Sentence 4 is unintelligible, even though it contains the same words as sentence 3. It does not
even have enough structure to allow you to say what is wrong with it. Thus it is syntactically
ill-formed.
Representations and understanding
A crucial component of understanding involves computing a representation of the
meaning of sentences and texts.
✓Most words have multiple meanings, which we will call senses.
✓The word "cook", for example, has a sense as a verb and a sense as a noun;
"dish" has multiple senses as a noun as well as a sense as a verb.
✓This ambiguity would inhibit the system from making the appropriate inferences
needed to model understanding.
✓The disambiguation problem appears much easier than it actually is because
people do not generally notice ambiguity.
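One well-known way to choose among word senses is the simplified Lesk algorithm: pick the sense whose dictionary gloss shares the most content words with the surrounding context. The sketch below applies it to the noun senses of "dish"; the sense labels, the glosses, and the stopword list are hypothetical and written only for this example.

```python
from typing import Dict, Set

# Hypothetical sense inventory for the noun "dish"; glosses are illustrative only.
DISH_SENSES: Dict[str, str] = {
    "dish/container": "a shallow container for holding or serving food",
    "dish/food": "a particular item of prepared food cooked for a meal",
}

STOPWORDS = {"a", "an", "the", "for", "or", "of", "to", "in", "on", "she", "he", "it"}

def content_words(text: str) -> Set[str]:
    return {w.strip(".,").lower() for w in text.split()} - STOPWORDS

def simplified_lesk(context: str, senses: Dict[str, str]) -> str:
    """Choose the sense whose gloss shares the most content words with the context."""
    ctx = content_words(context)
    return max(senses, key=lambda s: len(ctx & content_words(senses[s])))

print(simplified_lesk("She cooked a spicy dish for the meal.", DISH_SENSES))            # dish/food
print(simplified_lesk("Put the leftovers in a shallow dish for serving.", DISH_SENSES))  # dish/container
```

Word overlap is a weak signal (a context that happens to share no words with either gloss gives an arbitrary answer), which is one reason disambiguation is harder than it appears.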
Representations and understanding
A useful representation of language has the following two properties:
• The representation must be precise and unambiguous. You should be able to
express every distinct reading of a sentence as a distinct formula in the
representation.
• The representation should capture the intuitive structure of the natural language
sentences that it represents. For example:
- sentences that appear to be structurally similar should have similar
structural representations;
- the meanings of two sentences that are paraphrases of each other should be
closely related to each other.
Syntax : Representing Sentence Structure
• The syntactic structure of a sentence indicates the way that words in the sentence are
related to each other.
• The structure indicates how words are grouped together into phrases, what words
modify what other words, and what words are of central importance in the sentence.
• The structure may identify the types of relationships that exist between phrases and
can store other information about the particular sentence structure that may be
needed for later processing.
• For example, the sentence "Rice flies like sand" has two structural representations.
• Most syntactic representations of language are based on the notion of
context-free grammars, which represent sentence structure in terms of what
phrases are subparts of other phrases.
• "like" is a verb (V) in the first representation and a preposition (P) in the
second.
Logical Form
• The structure of a sentence does not fully reflect its meaning.
• For example, the two interpretations of "the catch" in the example below have the same syntactic
structure; the different meanings arise from an ambiguity concerning the sense of the word "catch".
• The intended meaning of a sentence depends on the situation in which the sentence is
produced.
• The key division is between context-independent meaning and context-dependent meaning.
• The logical form encodes possible word senses and identifies the semantic relationships between
the words and phrases.
• Many of these relationships are captured using an abstract set of semantic
relationships between the verb and its NPs.

Example- The NP "the catch" can have different meanings depending on whether the speaker is talking about a
baseball game or a fishing expedition.
The fact that "catch" may refer to a baseball move or the results of a fishing expedition is knowledge about
English and is independent of the situation in which the word is used.
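The division between context-independent and context-dependent meaning can be sketched as two steps: the logical form keeps every sense that English allows for "catch", and a separate contextual step selects one sense using the situation. The sense labels and the setting-to-sense table are hypothetical; a real system would derive the choice from discourse and domain knowledge rather than a lookup.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet

@dataclass(frozen=True)
class LogicalForm:
    """Context-independent meaning of a word: every sense English allows for it."""
    word: str
    senses: FrozenSet[str]

# Hypothetical sense labels: that "catch" has these senses is knowledge about English.
CATCH = LogicalForm("catch", frozenset({"CATCH/baseball-play", "CATCH/fish-caught"}))

# Hypothetical situational knowledge, consulted only during contextual interpretation.
SETTING_TO_SENSE: Dict[str, str] = {
    "baseball game": "CATCH/baseball-play",
    "fishing expedition": "CATCH/fish-caught",
}

def contextual_interpretation(lf: LogicalForm, setting: str) -> str:
    """Select the one sense that fits the situation in which the word is used."""
    sense = SETTING_TO_SENSE.get(setting)
    if sense not in lf.senses:
        raise ValueError(f"no sense of {lf.word!r} fits the setting {setting!r}")
    return sense

print(contextual_interpretation(CATCH, "baseball game"))       # CATCH/baseball-play
print(contextual_interpretation(CATCH, "fishing expedition"))  # CATCH/fish-caught
```

Keeping the ambiguous logical form separate from the contextual step mirrors the point above: the sense inventory is fixed knowledge about English, while the choice among senses depends on the situation.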
Knowledge Representation
• The knowledge representation (KR) is the language a system uses to represent and reason
about its application domain.
• This is the language in which all the specific knowledge about the application
is represented.
• The goal of contextual interpretation is to take a representation of the structure of a
sentence and its logical form, and to map this into some expression in the KR that
allows the system to perform the appropriate task in the domain.
• First-order predicate calculus (FOPC) is used here as the final representation language because
it is relatively well known, well studied, and precisely defined.
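Because FOPC is precisely defined, a formula can be checked mechanically against a model of the domain. The sketch below stores a hypothetical domain as ground atoms and evaluates quantified formulas by iterating over a finite set of individuals, under a closed-world assumption (any atom not listed is false); the individuals and predicates are invented for illustration.

```python
from typing import Callable, Set, Tuple

# Hypothetical application domain: a finite set of individuals and ground FOPC atoms.
DOMAIN = {"jack", "sue", "fido"}
FACTS: Set[Tuple[str, ...]] = {
    ("Person", "jack"), ("Person", "sue"), ("Dog", "fido"),
    ("Owns", "jack", "fido"),
}

def holds(*atom: str) -> bool:
    """An atom is true iff it is listed (closed-world assumption)."""
    return atom in FACTS

def exists(body: Callable[[str], bool]) -> bool:
    return any(body(x) for x in DOMAIN)

def forall(body: Callable[[str], bool]) -> bool:
    return all(body(x) for x in DOMAIN)

# ∃x ∃y (Person(x) ∧ Dog(y) ∧ Owns(x, y)): "Someone owns a dog."
someone_owns_a_dog = exists(lambda x: exists(
    lambda y: holds("Person", x) and holds("Dog", y) and holds("Owns", x, y)))

# ∀x (Person(x) → ∃y Owns(x, y)): "Every person owns something."
every_person_owns_something = forall(
    lambda x: not holds("Person", x) or exists(lambda y: holds("Owns", x, y)))

print(someone_owns_a_dog, every_person_owns_something)  # True False
```

The implication Person(x) → ... is written as `not holds("Person", x) or ...`, its standard truth-functional equivalent. The second formula is false because Sue owns nothing in this model.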
Organization of Natural Language Understanding System (1)
▪ The Figure shows the organization
of an NLP/NLU-based system.
▪ It includes interpretation processes
that map from one representation to
another.
▪ The process that maps a sentence to its
syntactic structure and logical form is
called the parser.
▪ It uses knowledge about words and
word meanings (the lexicon) and a set of
rules defining the legal structures
(the grammar) in order to assign a
syntactic structure and a logical form
to an input sentence.
Organization of Natural Language Understanding System (2)
▪ The process that transforms the syntactic structure and logical form into a final meaning
representation is called contextual processing.
▪ This process includes issues such as identifying the objects referred to by noun phrases
(for example, "the man").
▪ It also includes the inferential processing required to interpret the sentence appropriately
within the application domain.
▪ It uses knowledge of the discourse context and knowledge of the application to produce a final
representation.
▪ The system then performs the reasoning tasks for the application.
▪ If this requires a response to the user, the meaning that must be expressed is passed to the
generation component of the system.
▪ The generation component uses knowledge of the discourse context, together with information on
the grammar and lexicon, to plan the form of an utterance, which is then mapped into words by a
realization process.
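The organization above can be sketched as a chain of functions, one per stage, each consulting its own knowledge source: the parser uses a lexicon and grammar, contextual processing uses the discourse context, reasoning uses the application's knowledge, and generation realizes the answer as words. Everything in this sketch (the one-rule grammar, the dictionaries, the stage names) is a hypothetical stand-in for much larger components.

```python
from typing import Dict, List

# Hypothetical knowledge sources, one for each stage described above.
LEXICON = {"who": "WH", "owns": "V", "the": "DET", "dog": "N"}
DISCOURSE_CONTEXT = {"the dog": "fido"}          # entities already mentioned
APPLICATION_FACTS = {("owns", "jack", "fido")}   # the application's knowledge base

def parse(sentence: str) -> Dict[str, str]:
    """Parser: lexicon plus a one-rule grammar (WH V DET N) -> logical form."""
    words = sentence.lower().rstrip("?").split()
    tags = [LEXICON[w] for w in words]           # unknown words raise KeyError here
    if tags != ["WH", "V", "DET", "N"]:
        raise ValueError("sentence is not covered by this toy grammar")
    return {"pred": words[1], "agent": "?x", "theme": " ".join(words[2:])}

def contextual_processing(lf: Dict[str, str]) -> Dict[str, str]:
    """Map noun phrases to domain objects using the discourse context."""
    return {**lf, "theme": DISCOURSE_CONTEXT[lf["theme"]]}

def reason(meaning: Dict[str, str]) -> List[str]:
    """Application reasoning: find every agent x such that pred(x, theme) holds."""
    return sorted(agent for (pred, agent, theme) in APPLICATION_FACTS
                  if pred == meaning["pred"] and theme == meaning["theme"])

def generate(answers: List[str]) -> str:
    """Generation: plan the content of the reply, then realize it as words."""
    if not answers:
        return "Nobody does."
    return " and ".join(name.capitalize() for name in answers) + "."

def respond(sentence: str) -> str:
    return generate(reason(contextual_processing(parse(sentence))))

print(respond("Who owns the dog?"))  # Jack.
```

Keeping each stage behind its own function boundary means any one of them can be replaced (for example, a real parser for the one-rule grammar) without touching the others.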
Example Sentences
1. Visiting relatives can be trying.
2. Visiting museums can be trying.

▪ Both sentences have identical syntactic structure, and both are
syntactically ambiguous.
▪ In sentence 1, the subject might be relatives who are visiting you, or the event
of you visiting relatives.
▪ Sentence 2, however, has only one valid semantic interpretation, since
museums are objects that cannot visit someone; they can only be visited.
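The reasoning above can be sketched as a semantic filter applied after parsing: both syntactic readings are generated, and a selectional restriction (the agent of "visit" must be animate) discards the reading that makes museums do the visiting. The animacy table and the wording of the glosses are hypothetical.

```python
from typing import List

# Hypothetical semantic feature: can this noun be the agent of "visit"?
ANIMATE = {"relatives": True, "museums": False}

def readings(noun: str) -> List[str]:
    """Semantically valid readings of the subject 'Visiting <noun>'."""
    candidates = [
        # Gerund reading: someone visits <noun>, so <noun> is the object.
        ("object", f"the event of visiting {noun}"),
        # Modifier reading: "visiting" describes <noun>, so <noun> is the agent.
        ("agent", f"{noun} who are visiting"),
    ]
    # Selectional restriction: only animate things can do the visiting.
    return [gloss for role, gloss in candidates if role != "agent" or ANIMATE[noun]]

print(readings("relatives"))  # ['the event of visiting relatives', 'relatives who are visiting']
print(readings("museums"))    # ['the event of visiting museums']
```

Syntax alone produces two candidates for each sentence; it is the semantic check, not the grammar, that leaves sentence 2 with a single interpretation.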
Syntactic Processing
Outline of English syntax
Elements of simple noun/verb/adjective/adverbial phrase
What is Syntactic Processing?
• Syntactic processing is the process of analyzing the grammatical
structure of a sentence to understand its meaning.
• This involves identifying the parts of speech in a sentence, such
as nouns, verbs, adjectives, and adverbs, and how they relate to each
other in order to give proper meaning to the sentence.
• It helps to understand the relationships between individual words in the
sentence.

Only the first sentence is grammatically correct and hence meaningful.


Word
• The most basic unit of linguistic structure appears to be the word.
• The study of morphology concerns the construction of words from more basic
components corresponding roughly to meaning units.
• There are two basic ways that new words are formed, traditionally classified as
inflectional forms and derivational forms.
• Inflectional forms use a root form of a word and typically add a suffix so that
the word appears in the appropriate form given the sentence (for example, "book" becomes
"books" and "walk" becomes "walked").
• Derivational forms create a new word, often of a different word class, from an existing one
(for example, "friendly" from "friend").
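Regular inflection can be sketched as a few suffix rules keyed on how the root ends. The functions below are a minimal illustration covering only regular English plurals and past tenses; irregular forms (child/children, go/went) and consonant doubling (stop/stopped) are deliberately not handled.

```python
VOWELS = set("aeiou")

def pluralize(noun: str) -> str:
    """Regular English plural; irregular forms such as child -> children are not handled."""
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    if noun.endswith("y") and len(noun) > 1 and noun[-2] not in VOWELS:
        return noun[:-1] + "ies"
    return noun + "s"

def past_tense(verb: str) -> str:
    """Regular past tense; irregular verbs and consonant doubling (stop -> stopped) are not handled."""
    if verb.endswith("e"):
        return verb + "d"
    if verb.endswith("y") and len(verb) > 1 and verb[-2] not in VOWELS:
        return verb[:-1] + "ied"
    return verb + "ed"

print(pluralize("book"), pluralize("box"), pluralize("city"))  # books boxes cities
print(past_tense("walk"), past_tense("carry"))                 # walked carried
```

A real morphological component would pair rules like these with an exception list for irregular forms.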
Classes of words
➢Nouns (objects, places, concepts, events, qualities)
➢Adjectives
➢Verbs
➢Adverbs
Element of Simple Noun Phrases
• Noun phrases (NPs) are used to refer to things: objects, places, concepts,
events, qualities, and so on. The simplest NP consists of a single pronoun: he,
she, they, you, me, it, I, and so on.
• Pronouns can refer to physical objects, events, and qualities:
- physical object: It hid under the rug.
- event: Once I opened the door, I regretted it for months.
- quality: He was so angry, but he didn’t show it.
• Another basic form of noun phrase consists of a name or proper noun,
such as John or Rochester. These nouns appear in capitalized form in
carefully written English.
Specifiers and Qualifiers
• Qualifiers describe the general class of objects identified by the head noun.
• Specifiers indicate how many such objects are being described, as well as how the
objects being described relate to the speaker and hearer.
• Specifiers are constructed out of ordinals (such as first and second), cardinals
(such as one and two), and determiners.
• Determiners can be subdivided into the following general classes:
- articles: the, a, and an.
- demonstratives: this, that, these, and those.
- possessives: John’s, the fat man’s, her, my, and whose.
- wh-determiners: which and what.
- quantifying determiners: some, every, most, no, any, both, and half.
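The specifier/qualifier/head division can be sketched for simple NPs by stripping closed-class specifier words from the front and treating the last word as the head. This is a simplification under stated assumptions: the word lists are partial, possessive NPs such as "John’s" are left out because they are open-ended, and the head is assumed to be the final word.

```python
from typing import List, Tuple

# Closed-class specifier words, grouped as in the determiner classes above.
# Possessive NPs such as "John's" are open-ended and are left out of this sketch.
DETERMINERS = {
    "article": {"the", "a", "an"},
    "demonstrative": {"this", "that", "these", "those"},
    "possessive": {"her", "my", "whose"},
    "wh-determiner": {"which", "what"},
    "quantifying": {"some", "every", "most", "no", "any", "both", "half"},
}
ORDINALS = {"first", "second", "third"}   # partial lists for illustration
CARDINALS = {"one", "two", "three"}

def determiner_class(word: str) -> str:
    """Return the determiner class of a word, or '' if it is not a determiner."""
    for cls, members in DETERMINERS.items():
        if word in members:
            return cls
    return ""

def is_specifier(word: str) -> bool:
    return bool(determiner_class(word)) or word in ORDINALS or word in CARDINALS

def analyze_np(np: str) -> Tuple[List[str], List[str], str]:
    """Split a simple NP into (specifiers, qualifiers, head), assuming the head is the last word."""
    words = np.lower().split()
    if not words:
        raise ValueError("empty noun phrase")
    i = 0
    while i < len(words) - 1 and is_specifier(words[i]):
        i += 1
    return words[:i], words[i:-1], words[-1]

print(analyze_np("the first three old men"))  # (['the', 'first', 'three'], ['old'], 'men')
print(determiner_class("those"))              # demonstrative
```

Stopping before the last word guarantees that a one-word NP such as a pronoun is still analyzed as a bare head.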
Verb Phrases and Simple Sentences
• While an NP is used to refer to things, a sentence (S) is used to assert, query, or
command. You may assert that some sentence is true, ask whether a sentence is
true, or command someone to do something described in the sentence.
• A simple declarative sentence consists of an NP, the subject, followed by a verb phrase
(VP), the predicate.
Transitivity and Passives
• The last verb in a verb sequence is called the main verb, and is drawn from the open class
of verbs.
• Depending on the verb, a wide variety of complement structures are allowed:
- intransitive verbs take no complement (Jack laughed, He will have been running)
- transitive verbs take an NP complement (Jack found a key)
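The dependence of complement structure on the main verb is often recorded in the lexicon as subcategorization information. The sketch below assumes a hypothetical lexicon with just two complement patterns, "none" (intransitive) and "NP" (transitive); the verb "eat", which allows both, is an added illustration not taken from the examples above.

```python
from typing import Dict, Set

# Hypothetical subcategorization lexicon: complement patterns each main verb accepts.
# "none" = no complement (intransitive); "NP" = a noun-phrase object (transitive).
SUBCAT: Dict[str, Set[str]] = {
    "laugh": {"none"},        # Jack laughed
    "find": {"NP"},           # Jack found a key
    "eat": {"none", "NP"},    # Jack ate / Jack ate a sandwich
}

def complement_ok(verb: str, complement: str) -> bool:
    """True if the main verb (given in base form) allows this complement structure."""
    return complement in SUBCAT.get(verb, set())

print(complement_ok("laugh", "NP"))  # False: "*Jack laughed a key" is ill-formed
print(complement_ok("find", "NP"))   # True
```

A parser can consult such a table to reject verb phrases whose complements the main verb does not allow.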
Particles
• Some verb forms are constructed from a verb and an additional word called a
particle.
• Particles overlap with the class of prepositions. Some examples are up, out, over,
and in.
• With verbs such as look, take, or put, you can construct many different verbs by
combining the verb with a particle.
- Example: look up, look out, look over
• The ambiguity between a particle and a preposition can result in two different readings
for the same sentence.
• For example, look over the paper would mean reading the paper, if you
consider over a particle (the verb is look over).
• In contrast, the same sentence would mean looking at something else behind or
above the paper, if you consider over a preposition (the verb is look).
Review Questions
1. Discuss natural language understanding.
2. What is natural language generation?
3. Discuss the concept of language understanding system.
4. Describe lexical semantics.
5. What is syntactic processing?
