
Introduction

OBJECTIVES
After reading this chapter, the student will be able to understand:
Introduction to Natural Language Processing.
History of NLP.
Generic NLP Systems.
Levels of NLP.
Knowledge in Language Processing.
Ambiguity in NLP.
Stages in NLP.
Challenges for NLP.
Application Areas of NLP.



1. Introduction to Natural Language Processing

Natural Language refers to the languages spoken by people, e.g. English, Hindi, Marathi, as opposed to artificial programming languages like C, C++, Java, etc.

A natural language or ordinary language is any language that has evolved naturally in humans through use and repetition without conscious planning or premeditation. Natural languages can take different forms, such as written text, speech or signing. They are distinguished from constructed and formal languages, such as those used to program computers or to study logic.

Natural language processing (NLP) is a branch of artificial intelligence that helps computers understand, interpret and manipulate human language. NLP draws from many disciplines, including computer science and computational linguistics, in its pursuit to fill the gap between human communication and computer understanding.

Natural Language Processing (NLP) is a field of research and application that determines the way computers can be used to understand and manage natural language text or speech to do useful things. The term "natural" in the context of language is used to distinguish human languages (such as Gujarati, English, Spanish and French) from computer languages (such as C, C++, Java and Prolog). The definition of Natural Language Processing clarifies that it is a theoretically motivated range of computational techniques (multiple methods or techniques for language analysis) for analyzing and representing naturally occurring text (such as English, Gujarati and Punjabi) at one or more levels of linguistic analysis, for the purpose of achieving human-like language processing for a range of tasks or applications.

2. Need for Natural Language Processing (NLP)

Significant growth in the volume and variety of data is due to the amount of unstructured text data. In fact, up to 80% of all your data is unstructured text data. Companies collect huge amounts of documents, emails, social media, and other text-based information to get to know their customers better, offer services or market their products. However, most of this data is unused and untouched.

Text analytics, through the use of natural language processing (NLP), holds the key to unlocking the business value within these vast data resources.

In the era of big data, businesses can fully utilize the data potential and take advantage of the latest parallel text analytics and NLP algorithms packaged in a variety of open source software, namely R, Python etc.

Consider the example given in Figure 1:

[Figure 1: Sample Text in natural form. The figure shows text as meaningless character strings, e.g. "Coovoel2* ekk: Idsilk Ikef vnnjfj?"]

Computers "see" text in English the same way you have seen Figure 1. Normally, people have no trouble understanding natural language, as they have common-sense knowledge, reasoning capacity and experience for understanding the context of the text. But this is not the case with computers. Computers don't have inbuilt common-sense knowledge, reasoning capacity or experience. Unless we teach computers to do so, they will not understand any natural language.

3. Goals of Natural Language Processing:

1. The ultimate goal of natural language processing is for computers to achieve human-like comprehension of texts/languages. When this is achieved, computer systems will be able to understand, draw inferences from, summarize, translate and generate accurate and natural human text and language.
2. The goal of natural language processing is to specify a language comprehension and production theory to such a level of detail that a person is able to write a computer program which can understand and produce natural language.
3. The basic goal of NLP is to accomplish human-like language processing. The choice of the word "processing" is very deliberate and should not be replaced with "understanding". Although the field of NLP was originally referred to as Natural Language Understanding (NLU), that goal has not yet been accomplished. A full NLU system would be able to:
- Paraphrase an input text.
- Translate the text into another language.
- Answer questions about the contents of the text.
- Draw inferences from the text.
; ( 2 ) Natural Language Processing


4. Brief overview of NLP

The field of study that focuses on the interactions between human language and computers is called Natural Language Processing, or NLP for short. It sits at the intersection of computer science, artificial intelligence, and computational linguistics.

The essence of Natural Language Processing lies in making computers understand natural language. That is not an easy task, though. Computers can understand the structured form of data, like spreadsheets and the tables in a database. Human languages, texts, and voices, however, form an unstructured category of data that is difficult for the computer to understand. This is where the need for Natural Language Processing arises.

There is a lot of natural language data out there in various forms, and it would be very useful if computers could understand and process that data. We can train models in accordance with the expected output in different ways. Humans have been writing for thousands of years, there are lots of literature pieces available, and it would be great if we could make computers understand them. But the task is never going to be easy. There are various challenges, such as understanding the correct meaning of a sentence, correct Named-Entity Recognition (NER), correct prediction of the various parts of speech, and coreference resolution (the most challenging thing, in my opinion).

Computers can't truly understand human language. If we feed enough data and train a model properly, it can distinguish and try categorizing various parts of speech (noun, verb, adjective, etc.) based on previously fed data and experiences. If it encounters a new word, it tries to make the nearest guess, which can be embarrassingly wrong at times.

It's very difficult for a computer to extract the exact meaning from a sentence. For example: "The boy radiated fire like vibes." Did the boy have a very motivating personality, or did he actually radiate fire? As you can see, parsing English with a computer is going to be complicated.

[Figure 2: NLP in the Computer science taxonomy. NLP appears alongside Database, Artificial Intelligence and Web, with sub-areas including Information Retrieval (using ontology), Machine Translation (E-M & M-E), Text Categorization, Text Summarization (abstractive and extractive) and Morphological Analysis.]

5. History of NLP

NLP began in the 1950s as the intersection of artificial intelligence and linguistics. NLP was originally distinct from text information retrieval (IR), which employs highly scalable statistics-based techniques to index and search large volumes of text efficiently; Manning et al. [1] provide an excellent introduction to IR. With time, however, NLP and IR have converged somewhat. Currently, NLP borrows from several very diverse fields, requiring today's NLP researchers and developers to broaden their mental knowledge base significantly.

Early simplistic approaches, for example word-for-word Russian-to-English machine translation, were defeated by homographs (identically spelled words with multiple meanings) and by metaphor. This led to the apocryphal story of the Biblical "the spirit is willing, but the flesh is weak" being translated to "the vodka is agreeable, but the meat is spoiled."

Chomsky's 1956 theoretical analysis of language grammars provided an estimate of the problem's difficulty, influencing the creation (1963) of Backus-Naur Form (BNF) notation. BNF is used to specify a "context-free grammar" (CFG), and is commonly used to



represent programming-language syntax. A language's BNF specification is a set of derivation rules that collectively validate program code syntactically. ("Rules" here are absolute constraints, not expert systems' heuristics.) Chomsky also identified still more restrictive "regular" grammars, the basis of the regular expressions used to specify text-search patterns. Regular expression syntax, defined by Kleene (1956), was first supported by Ken Thompson's grep utility on UNIX.

Subsequently (1970s), lexical-analyzer (lexer) generators and parser generators such as the lex/yacc combination utilized grammars. A lexer transforms text into tokens; a parser validates a token sequence. Lexer/parser generators greatly simplify programming-language implementation. They take regular-expression and BNF specifications, respectively, as input, and generate code and lookup tables that determine lexing/parsing decisions.

While CFGs are theoretically inadequate for natural language, they are often employed for NLP in practice. Programming languages are typically designed deliberately with a restrictive CFG variant, an LALR(1) grammar (LALR: Look-Ahead parser with Left-to-right processing and Rightmost (bottom-up) derivation), to simplify implementation. An LALR(1) parser scans text left-to-right, operates bottom-up (i.e., it builds compound constructs from simpler ones), and uses a look-ahead of a single token to make parsing decisions.

The Prolog language was originally invented (1970) for NLP applications. Its syntax is especially suited for writing grammars. In the easiest implementation mode (top-down parsing), however, rules must be phrased differently (i.e., right-recursively) from those intended for a yacc-style parser. Top-down parsers are easier to implement than bottom-up parsers (they don't need generators), but are much slower.

Recent research has increasingly focused on unsupervised and semi-supervised learning algorithms. Such algorithms can learn from data that has not been hand-annotated with the desired answers, or from a combination of annotated and non-annotated data. Generally, this task is much more difficult than supervised learning, and it typically produces less accurate results for a given amount of input data. However, there is an enormous amount of non-annotated data available (including, among other things, the entire content of the World Wide Web), which can often make up for the inferior results.

Modern NLP consists of speech recognition, machine learning and machine translation. These parts, when combined, would allow artificial intelligence to gain real knowledge of the world, not just play chess or move around an obstacle course. In the near future, computers will be able to read all of the information online, learn from it, solve problems and possibly cure diseases. NLP and AI research will not stop until both are at a human level of awareness and understanding.

6. Generic NLP system

[Figure 3: Generic NLP System. Typed input, message text or speech is passed to a natural language processor, whose outputs include a database update and a spoken response.]

Any natural language processing system should start with some input and end with effective and accurate output. The inputs for a natural language processor can be text or speech. The system can generate a variety of outputs. Output may be in the form of an answer when the input is a question. Similarly, outputs can be a database update, a spoken response, semantics, part of speech, morphology of a word, semantics of a word/sentence, etc.
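The input/output shape of such a system can be sketched in a few lines. This is a minimal, hypothetical sketch, not a real architecture. Everything in it is an assumption made for illustration: the tiny lexicon POS_LEXICON, the one-entry FACTS table standing in for a database, and the transcribe() placeholder for a speech recognizer.

```python
import re

# Hypothetical toy resources standing in for real NLP components.
POS_LEXICON = {"who": "PRON", "wrote": "VERB", "hamlet": "NOUN"}
FACTS = {("wrote", "hamlet"): "Shakespeare"}  # a stand-in "database"


def transcribe(audio):
    """Placeholder for a speech recognizer; not implemented in this sketch."""
    raise NotImplementedError("speech input needs a real speech recognizer")


def process(data, kind="text"):
    """Generic NLP system: accept text or speech, return several kinds of output."""
    text = transcribe(data) if kind == "speech" else data
    words = re.findall(r"[a-z]+", text.lower())
    output = {
        "tokens": words,
        "pos": [(w, POS_LEXICON.get(w, "UNK")) for w in words],
    }
    if text.rstrip().endswith("?"):  # a question calls for an answer as output
        key = tuple(w for w in words if POS_LEXICON.get(w) in ("VERB", "NOUN"))
        output["answer"] = FACTS.get(key, "unknown")
    return output
```

With this sketch, process("Who wrote Hamlet?") returns the tokens, their part-of-speech tags and the answer "Shakespeare". A statement such as "Hamlet is a play." yields only tokens and tags, with "UNK" for words outside the toy lexicon.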



7. Levels of NLP

Natural Language Processing works on multiple levels, and most often these different areas synergize well with each other. NLP can broadly be divided into various levels, as shown in the figure.

[Figure: Levels of NLP. Recoverable labels include Parsing, Contextual reasoning, Application reasoning and execution, and Utterance planning.]

Phonology: It deals with the interpretation of speech sounds within and across words.

Morphology: It is a study of the way words are built up from smaller meaning-bearing units called morphemes. For example, the word "fox" has a single morpheme, while the word "cats" has two morphemes: "cat" and the morpheme "-s", which distinguishes singular from plural. A morphological lexicon is the list of stems and affixes together with basic information, such as whether the stem is a Noun stem or a Verb stem [21]. The detailed analysis of this level is discussed in chapter 4.

Syntax: It is a study of formal relationships between words. It studies how words are clustered in classes in the form of Part-of-Speech (POS), how they are grouped with their neighbours into phrases, and the way words depend on each other in a sentence.

Semantics: It is a study of the meaning of words that are associated with grammatical structure. It consists of two kinds of approaches: syntax-driven semantic analysis and semantic grammar. The detailed explanation of this level is discussed in chapter 4.

Discourse: At this level, NLP works with text longer than a sentence. There are two types of discourse processing: anaphora resolution and discourse/text structure recognition. Anaphora resolution is the replacing of words such as pronouns with the entities they refer to. Discourse structure recognition determines the function of sentences in the text, which adds a meaningful representation of the text.

Reasoning: It is used to produce an answer to a question that is not explicitly stored in a database. A Natural Language Interface to Database (NLIDB) carries out reasoning based on the data stored in the database. For example, consider a database that holds academic information about students, and a user poses a query such as: "Which student is likely to fail in Maths subject?" To answer the query, the NLIDB needs a domain expert to narrow down the reasoning process.

8. Knowledge in Language Processing

A natural language understanding system must have knowledge about what the words mean, how words combine to form sentences, how word meanings combine to form sentence meanings and so on. The different forms of knowledge required for natural language understanding are given below.

PHONETIC AND PHONOLOGICAL KNOWLEDGE
Phonetics is the study of language at the level of sounds, while phonology is the study of the combination of sounds into organized units of speech, and of the formation of syllables and larger units. Phonetic and phonological knowledge are essential for speech-based systems, as they deal with how words are related to the sounds that realize them.

MORPHOLOGICAL KNOWLEDGE
Morphology concerns word formation. It is a study of the patterns of formation of words by the combination of sounds into minimal distinctive units of meaning called morphemes. Morphological knowledge concerns how words are constructed from morphemes.

SYNTACTIC KNOWLEDGE
Syntax is the level at which we study how words combine to form phrases, phrases combine to form clauses and clauses join to make sentences. Syntactic analysis concerns sentence formation. It deals with how words can be put together to form correct sentences. It also determines what structural role each word plays in the sentence and what phrases are subparts of what other phrases.

SEMANTIC KNOWLEDGE
It concerns the meanings of words and sentences. This is the study of context-independent meaning, that is, the meaning a sentence has no matter in which context it is used. Defining the meaning of a sentence is very difficult due to the ambiguities involved.

PRAGMATIC KNOWLEDGE



Pragmatics is the extension of meaning or semantics. Pragmatics deals with the contextual aspects of meaning in particular situations. It concerns how sentences are used in different situations and how use affects the interpretation of the sentence.

DISCOURSE KNOWLEDGE
Discourse concerns connected sentences. It is a study of chunks of language that are bigger than a single sentence. Discourse concerns inter-sentential links, that is, how the immediately preceding sentences affect the interpretation of the next sentence. Discourse knowledge is important for interpreting pronouns and the temporal aspects of the information conveyed.

WORLD KNOWLEDGE
World knowledge is nothing but the everyday knowledge that all speakers share about the world. It includes general knowledge about the structure of the world and what each language user must know about the other user's beliefs and goals. This is essential to make language understanding much better.

Knowledge representation and reasoning systems have incorporated natural language as interfaces to expert systems or knowledge bases that performed tasks separate from natural language processing. As this book shows, however, the computational nature of representation and inference in natural language makes it the ideal model for all tasks in an intelligent computer system. Natural language processing combines the qualitative characteristics of human knowledge processing with a computer's quantitative advantages, allowing for in-depth, systematic processing of vast amounts of information. The essays in this interdisciplinary book cover a range of implementations and designs, from formal computational models to large-scale natural language processing systems.

9. Ambiguity in NLP

Natural Language Processing (NLP) is an area of research and application that explores how computers can be used to understand and manipulate natural language text or speech to do useful things. Text-based NLP has been regarded as consisting of various levels. They are:
- Lexical Analysis: analysis of word forms
- Syntactic Analysis: structure processing
- Semantic Analysis: meaning representation
- Discourse Analysis: processing of interrelated sentences
- Pragmatic Analysis: the purposeful use of sentences in situations

Ambiguity can occur at all these levels. It is a property of linguistic expressions: if an expression (word/phrase/sentence) has more than one interpretation, we can refer to it as ambiguous.

For example, consider the sentence "The chicken is ready to eat". The interpretations can be:
- The chicken (bird) is ready to be fed, or
- The chicken (food) is ready to be eaten.

Consider another sentence: "There was not a single man at the party". The interpretations in this case can be:
- Lack of bachelors at the party, or
- Lack of men altogether.

There are different types of ambiguities:

1. Lexical Ambiguity: This is the ambiguity of a single word. A word can be ambiguous with respect to its syntactic class, e.g. book, study.
For example, the word "silver" can be used as a noun, an adjective, or a verb:
- She bagged two silver medals.
- She made a silver speech.
- His worries had silvered his hair.

Lexical ambiguity can be resolved by lexical category disambiguation, i.e., part-of-speech tagging, because many words may belong to more than one lexical category. Part-of-speech tagging is the process of assigning a part of speech or lexical category, such as noun, verb, pronoun, preposition, adverb or adjective, to each word in a sentence.

Lexical Semantic Ambiguity: This type of lexical ambiguity occurs when a single word is associated with multiple senses, e.g. bank, pen, fast, bat, cricket etc.
For example:
1. The tank was full of water.
2. I saw a military tank.


Words can have multiple meanings in such sentences. Consider the sentence "I saw a bat." The possible meanings of the words, which change the context of the sentence, are:
- bat: flying mammal / wooden club?
- saw: past tense of "see" / present tense of "saw" (to cut with a saw)

The occurrence of "tank" in both sentences corresponds to the syntactic category noun, but their meanings are different. Lexical semantic ambiguity is resolved using word sense disambiguation (WSD) techniques. WSD aims at automatically assigning the meaning of a word in its context, in a computational manner.

2. Syntactic Ambiguity: Structural ambiguities are syntactic ambiguities. Structural ambiguity is of two kinds: Scope Ambiguity and Attachment Ambiguity.
- Scope Ambiguity: Scope ambiguity involves operators and quantifiers.
Consider the example: "Old men and women were taken to safe locations." The scope of the adjective (i.e., the amount of text it qualifies) is ambiguous. Is the structure (old (men and women)) or ((old men) and women)?
The scope of quantifiers is often not clear and creates ambiguity. Consider "Every man loves a woman". The interpretations can be: for every man there is a woman (possibly a different one for each man), or there is one particular woman who is loved by every man.
- Attachment Ambiguity: A sentence has attachment ambiguity if a constituent fits more than one position in a parse tree. Attachment ambiguity arises from uncertainty about attaching a phrase or clause to a part of a sentence.
Consider the example: "The man saw the girl with the telescope." It is ambiguous whether the man saw the girl carrying a telescope, or saw her through his telescope. The meaning depends on whether the preposition "with" is attached to the girl or to the man.
Consider the example: "Buy books for children". The prepositional phrase "for children" can be either adverbial, attaching to the verb "buy", or adjectival, attaching to the object noun "books".

3. Semantic Ambiguity: This occurs when the meaning of the words themselves can be misinterpreted. Even after the syntax and the meanings of the individual words have been resolved, there are two ways of reading the sentence.
Consider the example: "Seema loves her mother and Sriya does too". The interpretations can be: Sriya loves Seema's mother, or Sriya loves her own mother.
Semantic ambiguities arise because a computer is generally not in a position to distinguish what is logical from what is not.
Consider the example: "The car hit the pole while it was moving". The interpretations can be:
- The car, while moving, hit the pole.
- The car hit the pole while the pole was moving.
The first interpretation is preferred over the second one because we have a model of the world that helps us distinguish what is logical (or possible) from what is not. Supplying a computer with a model of the world is not so easy.
Consider the example: "We saw his duck". "Duck" can refer to the person's bird or to a motion he made.
Semantic ambiguity happens when a sentence contains an ambiguous word or phrase.

4. Discourse Ambiguity: Discourse-level processing needs a shared world or shared knowledge, and the interpretation is carried out using this context. Anaphoric ambiguity comes under the discourse level.
- Anaphoric Ambiguity: Anaphors are the entities that have been previously introduced into the discourse.
Consider the example: "The horse ran up the hill. It was very steep. It soon got tired."
The anaphoric reference of "it" in the two situations causes ambiguity. "Steep" applies to a surface, hence "it" can be the hill. "Tired" applies to an animate object, hence "it" can be the horse.

5. Pragmatic Ambiguity: Pragmatic ambiguity refers to a situation where the context of a phrase gives it multiple interpretations. It is one of the hardest tasks in NLP. The problem involves processing user intention, sentiment, belief, world knowledge, modality etc., all of which are highly complex tasks.



Consider the example:
Tourist (checking out of the hotel): Waiter, go upstairs to my room and see if my sandals are there; do not be late; I have to catch the train in 15 minutes.
Waiter (running upstairs and coming back panting): Yes sir, they are there.
Clearly, the waiter falls short of the tourist's expectation, since he does not understand the pragmatics of the situation.

Pragmatic ambiguity arises when the statement is not specific, and the context does not provide the information needed to clarify the statement. Information is missing and must be inferred. Consider the example: "I love you too." This can be interpreted as:
- I love you (just like you love me)
- I love you (just like someone else does)
- I love you (and I love someone else)
- I love you (as well as liking you)

It is a highly complex task to resolve all these kinds of ambiguities, especially in the upper levels of NLP. The meaning of a word, phrase, or sentence cannot be understood in isolation. Contextual knowledge is needed to interpret the meaning, and pragmatic and world knowledge are required at the higher levels. It is not easy to create a world model for disambiguation tasks. Linguistic tools and lexical resources are needed for the development of disambiguation techniques. Resource-poor languages lag behind resource-rich languages in implementing these techniques. Rule-based methods are language specific, whereas stochastic or statistical methods are language independent. Automatic resolution of all these ambiguities involves several long-standing problems. Full-fledged disambiguation techniques that take care of all the ambiguities still need to be developed. They are very much necessary for the accurate working of NLP applications such as Machine Translation, Information Retrieval, Question Answering etc.

Statistical approaches to ambiguity resolution in Natural Language Processing are:
1. Probabilistic model
2. Part of Speech Tagging
   - Rule-Based Approaches
   - Markov Model Approaches
   - Maximum Entropy Approaches
   - HMM-Based Taggers
3. Machine Learning Approaches

10. Stages of NLP

There are, in general, five steps in natural language processing:

Lexical Analysis: It involves identifying and analyzing the structure of words. The lexicon of a language means the collection of words and phrases in that language. Lexical analysis divides the whole chunk of text into paragraphs, sentences, and words.

Lexical analysis in NLP deals with the study of words with respect to their lexical meaning and part of speech. This level of linguistic processing utilizes a language's lexicon, which is a collection of individual lexemes. A lexeme is a basic unit of lexical meaning. It is an abstract unit of morphological analysis that represents the set of forms or "senses" taken by a single morpheme. "Duck", for example, can take the form of a noun or a verb, but its part of speech and lexical meaning can only be derived in context with the other words used in the phrase/sentence. This, in fact, is an early step towards a more sophisticated Information Retrieval system, where precision is improved through part-of-speech tagging.

Syntactic Analysis (Parsing): It involves analyzing the words in the sentence for grammar and arranging them in a manner that shows the relationships among them. A sentence such as "The school goes to boy" is rejected by an English syntactic analyzer.

[Figure: the stages Lexical Analysis, Syntactic Analysis, Semantic Analysis and Discourse Integration shown in sequence.]

Semantic Analysis: It concerns what words mean and how these meanings combine in sentences to form sentence meanings. It draws the exact meaning, or the dictionary meaning, from the text. The text is checked for meaningfulness. This is done by mapping syntactic structures to objects in the task domain. The semantic analyzer disregards a phrase such as "hot ice-cream". Another example is "plant" (industrial plant / living organism).
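The lexical, syntactic and semantic stages above can be illustrated with a toy pipeline that reproduces the two examples in the text. This is a minimal sketch under stated assumptions, not a real analyzer. The lexicon, the three-rule grammar (S → NP VP, NP → Det Adj? N, VP → V P? NP) and the "frozen" feature are all hypothetical, chosen only for illustration. The parser is a simple top-down recognizer.

```python
import re

# Hypothetical toy lexicon: word -> part-of-speech tag.
LEXICON = {
    "the": "Det", "a": "Det", "hot": "Adj",
    "boy": "N", "school": "N", "tea": "N", "ice-cream": "N",
    "goes": "V", "likes": "V", "to": "P",
}
# Hypothetical semantic features for the meaningfulness check.
NOUN_FEATURES = {"boy": {"animate"}, "school": {"place"},
                 "tea": {"drink"}, "ice-cream": {"frozen"}}
ADJ_CONFLICTS = {"hot": {"frozen"}}  # "hot" cannot describe something frozen


def lexical_analysis(text):
    """Split text into sentences, then each sentence into lowercase word tokens."""
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s]
    return [re.findall(r"[a-z]+(?:-[a-z]+)*", s.lower()) for s in sentences]


def parse_np(tags, i):
    """NP -> Det Adj? N. Return the position after the NP, or None."""
    if i < len(tags) and tags[i] == "Det":
        i += 1
        if i < len(tags) and tags[i] == "Adj":
            i += 1
        if i < len(tags) and tags[i] == "N":
            return i + 1
    return None


def parse_vp(tags, i):
    """VP -> V P? NP. Return the position after the VP, or None."""
    if i < len(tags) and tags[i] == "V":
        i += 1
        if i < len(tags) and tags[i] == "P":
            i += 1
        return parse_np(tags, i)
    return None


def syntactic_analysis(tokens):
    """S -> NP VP, parsed top-down; accept only if every token is consumed."""
    if not all(t in LEXICON for t in tokens):
        return False
    tags = [LEXICON[t] for t in tokens]
    i = parse_np(tags, 0)
    return i is not None and parse_vp(tags, i) == len(tags)


def semantic_analysis(tokens):
    """Reject adjective-noun pairs whose features conflict, e.g. "hot ice-cream"."""
    return not any(
        ADJ_CONFLICTS.get(adj, set()) & NOUN_FEATURES.get(noun, set())
        for adj, noun in zip(tokens, tokens[1:])
    )
```

In this toy grammar, "the school goes to boy" is rejected because "boy" lacks the determiner that the NP rule requires. "The boy likes the hot ice-cream" parses correctly but fails the semantic check. A real analyzer would use a broad-coverage grammar and lexical resources instead of these hand-written tables.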



Discourse Integration: This concerns how the immediately preceding sentences affect the interpretation of the next sentence. The meaning of any sentence depends upon the meaning of the sentence just before it. In addition, it also brings about the meaning of the immediately succeeding sentence.

Pragmatic Analysis: This concerns how sentences are used in different situations and how this affects the interpretation of the sentence. During this stage, what was said is re-interpreted as what was actually meant. It involves deriving those aspects of language which require real-world knowledge.

[Figure 4: Stages of NLP, applied to the surface form "I want to print Ali's init file".
Morphological analysis: individual words are analyzed into their components, e.g. to (prep), print (verb), Ali (noun).
Syntactic analysis.
Semantic analysis: a transformation is made from the input text to an internal representation that reflects the meaning.
Discourse analysis: resolving references between sentences.
Pragmatic analysis: reinterpreting what was said as what was actually meant.]

Morphological Analysis:
The morphological level of linguistic processing deals with the study of word structures and word formation, focusing on the analysis of the individual components of words. The most important unit of morphology, defined as the "minimal unit of meaning", is referred to as the morpheme. For example, the word "unhappiness" can be broken down into three morphemes (prefix, stem, and suffix), each conveying some form of meaning. The prefix un- refers to "not being", while the suffix -ness refers to "a state of being". The stem "happy" is considered a free morpheme since it is a "word" in its own right. Bound morphemes (prefixes and suffixes) require a free morpheme to which they can be attached, and therefore cannot appear as a "word" on their own.

In Information Retrieval, document and query terms can be stemmed to match the morphological variants of terms between the documents and the query. The singular form of a noun in a query will then match its plural form in a document, and vice versa, thereby increasing recall.

Syntactic Analysis
The part-of-speech tagging output of the lexical analysis can be used at the syntactic level of linguistic processing to group words into phrase and clause brackets. Syntactic analysis, also referred to as "parsing", allows the extraction of phrases, such as noun phrases, which convey more meaning than the individual words by themselves.

In Information Retrieval, parsing can be leveraged to improve indexing, since phrases can be used as representations of documents and provide better information than single-word indices alone. In the same way, phrases that are syntactically derived from the query offer better search keys to match with documents that are similarly parsed.
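The two Information Retrieval ideas above, stemming for recall and phrase terms for sharper search keys, can be sketched together in a toy indexer. This is a minimal sketch under stated assumptions, not a standard stemmer or parser. The tag lexicon TAGS, the crude plural-stripping stem() and the noun-run chunker are hypothetical simplifications.

```python
# Hypothetical toy tag lexicon; a real system would use a trained POS tagger.
TAGS = {"the": "Det", "of": "P", "about": "P",
        "information": "N", "retrieval": "N", "system": "N", "systems": "N"}


def stem(word):
    """Crude plural stripper (an assumption; far simpler than a real stemmer)."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def word_terms(text):
    """Single-word index terms, stemmed so singular and plural forms match."""
    return {stem(w) for w in text.lower().split()}


def phrase_terms(text):
    """Adjacent stemmed noun pairs inside each run of nouns (a toy noun-phrase chunker)."""
    terms, run = set(), []
    for word in text.lower().split() + ["."]:  # "." is a sentinel that closes the last run
        if TAGS.get(word) == "N":
            run.append(stem(word))
        else:
            terms.update(f"{a} {b}" for a, b in zip(run, run[1:]))
            run = []
    return terms


def search(query, docs, terms):
    """Ids of documents whose index terms include every term of the query."""
    wanted = terms(query)
    return [doc_id for doc_id, text in docs.items() if wanted <= terms(text)]
```

For example, take the query "information retrieval systems" and two documents: d1 is "the information retrieval system" and d2 is "retrieval of information about the systems". Stemmed single-word matching returns both documents, since "systems" matches "system". Phrase matching keeps only d1, because only d1 contains the noun phrase "information retrieval".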



ne:
timesas in the case of the news headli
Nevertheless, syntax canstill be ambiguousat ly refers to how Syntactic Net
back to gain black belt” — which actual
o “Boy paralyzed after tumor fights disease and anes ©) who
of a tumor but enduredthe fight against the
‘a boy was paral yzed becau se
ultimately gained a high level of compet ence in martial arts. |wo a
a\ 4A ° . ee
[Figure: the example sentence "I want to print Ali's .init file" at the morphological and syntactic stages. Stems: I (pronoun), want (verb), to (prep), print (verb), Ali (noun), 's (possessive), .init (adj), file (noun). Syntactic analysis: parse tree.]

Semantic Analysis
The semantic level of linguistic processing deals with the determination of what a sentence really means by relating syntactic features and disambiguating words with multiple definitions to the given context. This level entails the appropriate interpretation of the meaning of sentences, rather than analysis at the level of individual words or phrases.

In Information Retrieval, the query and document matching process can be performed on a conceptual level, as opposed to matching simple terms, thereby further increasing system precision. Moreover, by applying semantic analysis to the query, term expansion becomes possible with the use of lexical sources, offering improved retrieval of relevant documents even if the exact terms are not used in the query. Precision may increase with query expansion, and recall will probably increase as well.

Pragmatic Analysis
The pragmatic level of linguistic processing deals with the use of real-world knowledge and understanding how this impacts the meaning of what is being communicated. By analyzing the contextual dimension of the documents and queries, a more detailed representation is derived.

In Information Retrieval, this level of Natural Language Processing primarily engages query processing and understanding by integrating the user's history and goals as well as the context in which the query is being made. Contexts may include time and location.

This level of analysis can enable major advances in Information Retrieval, as it facilitates the conversation between the IR system and the users, allowing the elicitation of the purpose for which the sought information will be used, thereby ensuring that the information retrieval system is fit for purpose.
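The semantic-level term expansion described above can be sketched with a small synonym table standing in for a lexical source such as a thesaurus or WordNet. The table, the document texts and the simple any-term-overlap matching are hypothetical, chosen only to show that an expanded query can retrieve a document that shares no exact terms with the original query.

```python
from typing import Dict, List, Set

# Hypothetical lexical source: a tiny synonym table standing in for a thesaurus such as WordNet.
SYNONYMS: Dict[str, Set[str]] = {
    "car": {"automobile", "vehicle"},
    "cheap": {"inexpensive", "affordable"},
    "buy": {"purchase"},
}


def expand_query(terms: List[str]) -> Set[str]:
    """Return the original query terms plus their synonyms."""
    expanded: Set[str] = set()
    for term in terms:
        term = term.lower()
        expanded.add(term)
        expanded |= SYNONYMS.get(term, set())
    return expanded


def matching_docs(query_terms: Set[str], docs: Dict[str, str]) -> List[str]:
    """Return ids of documents sharing at least one term with the query (no ranking)."""
    return [
        doc_id
        for doc_id, text in docs.items()
        if query_terms & set(text.lower().split())
    ]


docs = {
    "d1": "Affordable automobile for sale",
    "d2": "Weather report for today",
}
query = ["cheap", "car"]
print(matching_docs(set(query), docs))           # [] (no exact term overlap)
print(matching_docs(expand_query(query), docs))  # ['d1']
```

Expansion broadens what can match, which is why it tends to help recall; whether precision also improves depends on how well the lexical source fits the query's intended sense.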



[Figure: the remaining stages for the example sentence. Discourse: to whom the pronoun refers; to whom the proper noun 'Ali' refers; what are the files to be printed. Pragmatic: execute the command lpr /ali/stuff.init.]

Discourse Analysis
The discourse level of linguistic processing deals with the analysis of structure and meaning of text beyond a single sentence, making connections between words and sentences. At this level, Anaphora Resolution is also achieved by identifying the entity referenced by an anaphor (most commonly in the form of, but not limited to, a pronoun). An example is shown below.

"I voted for Obama because he was most aligned with my values," she said.
Figure 5: Anaphora Resolution Illustration

With the capability to recognize and resolve anaphora relationships, document and query representations are improved, since, at the lexical level, the implicit presence of concepts is accounted for throughout the document as well as in the query, while at the semantic and discourse levels, an integrated content representation of the documents and queries is generated.

Structured documents also benefit from analysis at the discourse level, since sections can be broken down into (1) title, (2) abstract, (3) introduction, (4) body, (5) results, (6) analysis, (7) conclusion, and (8) references. Information Retrieval systems are significantly improved, as the specific role of each piece of information is determined, for example whether it is a conclusion, an opinion, a prediction, or a fact.

11. Applications of NLP
In the context of Human Computer Interface (HCI), there are many NLP applications, such as information retrieval systems, information extraction, machine learning systems, question answering systems, dialogue systems, email routing, telephone banking, speech recognition systems, documentation retrieval systems, document summarization, discourse management, multilingual query processing, and natural language interfaces to database systems. Currently, interactive applications may be classified into the following categories:

Speech Recognition / Speech Understanding and Synthesis / Speech Generation: A speech understanding system attempts to perform semantic and pragmatic processing of a spoken utterance to understand what the user is saying and act on what is being said. The research area in this category includes linguistic analysis and the design and development of efficient and effective algorithms for speech recognition and synthesis.

Language Translator: It is the task of automatically converting one natural language into another, preserving the meaning of the input text and producing an equivalent text in the output language. The research area in this category includes language modelling.

Information Retrieval (IR): It is a scientific discipline that deals with the analysis, design and implementation of computerized systems that address the representation, organization, and access to large amounts of heterogeneous information encoded in digital format. The search engine is a well-known application of IR: it accepts a query from the user and returns the relevant documents. It returns documents, not the relevant answers; users are left to extract answers from the returned documents. The research area in IR includes information searching, information extraction, information categorization and information summarization from unstructured information.

Information Extraction: It includes the extraction of structured information from unstructured text. It is an activity of filling predefined templates from natural language text. The research area in this category includes identifying named entities, resolving anaphora and identifying relationships between entities.

Question Answering (QA): It is passage retrieval in a specific domain. It is the process of finding answers for a given question from a large collection of documents.

Natural Language Interface to Database (NLIDB): It is the process of finding answers from a database by asking questions in natural language.



Dialog Systems: It is the study of dialog between humans and computers. It determines the grammar and style of the sentence and, based on that, gives a response to the user. The research area in this category includes the design of conversational agents, human-robot dialog and the analysis of human-human dialog.

Discourse Management / Story Understanding / Text Generation: The task of identifying the discourse structure is to identify the nature of the discourse relationships between sentences, such as elaboration, explanation and contrast, and also to classify the speech acts in a chunk of text (for example, yes-no questions, statements and assertions).
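The search engine described under Information Retrieval returns ranked documents rather than answers. A minimal sketch of that behaviour follows, assuming whitespace tokenization, one common TF-IDF weighting (idf = ln(N/df) + 1) and cosine similarity; the documents and function names are illustrative, and real engines add stemming, stop-word removal and inverted indexes.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def tokenize(text: str) -> List[str]:
    """Assumed tokenizer: lowercase and split on whitespace (no stemming or stop words)."""
    return text.lower().split()


def build_index(docs: Dict[str, str]) -> Tuple[Dict[str, Vector], Dict[str, float]]:
    """Return TF-IDF vectors per document and the idf table, using idf = ln(N/df) + 1."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    df: Counter = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))  # document frequency: count each term once per document
    n_docs = len(docs)
    idf = {term: math.log(n_docs / count) + 1.0 for term, count in df.items()}
    vectors = {
        doc_id: {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for doc_id, tokens in tokenized.items()
    }
    return vectors, idf


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, docs: Dict[str, str], top_k: int = 3) -> List[str]:
    """Rank documents by cosine similarity to the query; return ids, not answers."""
    vectors, idf = build_index(docs)
    q_vec = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scored = sorted(
        ((cosine(q_vec, vec), doc_id) for doc_id, vec in vectors.items()),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [doc_id for score, doc_id in scored[:top_k] if score > 0]


docs = {
    "d1": "natural language processing helps computers understand text",
    "d2": "speech recognition converts speech to text",
    "d3": "the weather is sunny today",
}
print(search("text processing", docs))  # ['d1', 'd2']
```

The output is a ranked list of document ids; extracting an actual answer from d1 or d2 is left to the user, or to a Question Answering component layered on top.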

Expected Questions

1. What is Natural Language Processing (NLP)? Discuss the various stages involved in the NLP process with a suitable example.
2. What is Natural Language Understanding? Discuss the various levels of analysis under it with examples.
3. What do you mean by ambiguity in natural language? Explain with a suitable example. Discuss various ways to resolve ambiguity in NL.
4. What do you mean by lexical ambiguity and syntactic ambiguity in natural language? What are the different ways to resolve these ambiguities?
5. List various applications of NLP and discuss any two applications in detail.
