
EXPLAIN IN DETAIL ALL THE STEPS INVOLVED IN NATURAL LANGUAGE PROCESSING (30)

According to Copestake (2014), Natural language processing systems take strings of words
(sentences) as their input and produce structured representations capturing the meaning of those
strings as their output. The nature of this output depends heavily on the task at hand. A natural
language understanding system serving as an interface to a database might accept questions in
English which relate to the kind of data held by the database. In this case the meaning of the input
(the output of the system) might be expressed in terms of structured SQL queries which can be
directly submitted to the database.
The steps involved in natural language processing are as follows:
1. Morphological Analysis:

Individual words are analyzed into their components, and non-word tokens, such as punctuation, are separated from the words.

2. Syntactic Analysis:

Linear sequences of words are transformed into structures that show how the words relate to each other. Some word sequences may be rejected if they violate the language's rules for how words may be combined.

3. Semantic Analysis:

The structures created by the syntactic analyzer are assigned meanings.

4. Discourse Integration:

The meaning of an individual sentence may depend on the sentences that precede it, and may in turn influence the meanings of the sentences that follow it.

5. Pragmatic Analysis:

The structure representing what was said is reinterpreted to determine what was actually meant.
For example, the sentence “Do you know what time it is?” should be interpreted as a request to be
told the time.
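The five steps above can be sketched as a toy pipeline. The token and structure representations below are illustrative assumptions for this answer, not the output of any real NLP library:

```python
import re

def morphological_analysis(text):
    # Split input into word tokens and separated punctuation tokens.
    return re.findall(r"\w+|[^\w\s]", text)

def syntactic_analysis(tokens):
    # Toy structure: pair each token with a guessed syntactic category.
    lexicon = {"do": "verb", "you": "pronoun", "know": "verb",
               "what": "wh-word", "time": "noun", "it": "pronoun", "is": "verb"}
    return [(t, lexicon.get(t.lower(), "punct" if not t.isalnum() else "unknown"))
            for t in tokens]

def semantic_analysis(structure):
    # Toy meaning: the literal content of the sentence.
    return {"literal": "asks whether the hearer knows the time"}

def discourse_integration(meaning, previous_sentences):
    # Toy discourse step: interpretation may draw on preceding sentences.
    meaning["context"] = previous_sentences
    return meaning

def pragmatic_analysis(meaning):
    # Reinterpret the literal meaning as the intended speech act.
    meaning["intended"] = "request to be told the time"
    return meaning

tokens = morphological_analysis("Do you know what time it is?")
meaning = pragmatic_analysis(
    discourse_integration(semantic_analysis(syntactic_analysis(tokens)), []))
print(tokens)
print(meaning["intended"])
```

Each function here stands in for an entire processing stage; real systems interleave the stages rather than running them strictly in sequence.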

According to Copestake (2014), a simplified view of Natural Language Processing emphasises several distinct stages. In real systems these stages rarely all occur as separated, sequential processes.

[Figure: Logical Steps in Natural Language Processing. Source: Copestake (2014)]
1. Phonology:

This level deals with the interpretation of speech sounds within and across words. There are, in fact,
three types of rules used in phonological analysis: 1) phonetic rules – for sounds within words; 2)
phonemic rules – for variations of pronunciation when words are spoken together, and; 3) prosodic
rules – for fluctuation in stress and intonation across a sentence. In an NLP system that accepts
spoken input, the sound waves are analyzed and encoded into a digitized signal for interpretation by
various rules or by comparison to the particular language model being utilized.

2. Morphological Analysis:

The preliminary stage which takes place before syntax analysis is morphological processing.
The purpose of this stage of language processing is to break strings of language input into sets
of tokens corresponding to discrete words, sub-words and punctuation forms. For example, a
word like "unhappily" can be broken into three sub-word tokens: un-, happy, -ly.
Morphology is concerned primarily with recognising how base words have been modified to
form other words with similar meanings but often with different syntactic categories.
Modification typically occurs by the addition of prefixes and/or suffixes, but other textual
changes can also take place. In general there are three different cases of word form
modification: inflection (e.g. "climb" becomes "climbs"), derivation (e.g. "happy" becomes
"unhappily") and compounding (e.g. "note" plus "book" becomes "notebook").
As a language, English is easier to tokenise and apply morphological analysis to than
many others. In some East Asian languages, words are not separated (by whitespace characters) in
their written form (examples include Japanese and some Chinese languages). In many
languages the morphology of words can be ambiguous in ways that can only be resolved by
carrying out syntactic and/or semantic analysis on the input. Simple examples in English
occur between plural nouns and singular verbs: "climbs" as in 'there are many climbs in the
Alps' or 'he climbs Everest in March'. This example of ambiguity can be resolved by syntax
analysis alone but other examples are more complex. "Undoable" could be analysed as ((un-do) -
able) or as (un- (do-able)), ambiguity which cannot always be resolved at the syntax level alone.
The output from the morphological processing phase is a string of tokens which can then be
used for lexicon lookup. These tokens may contain tense, number, gender and proximity
information (depending on the language) and in some cases may also contain additional
syntactic information for the parser. The next stage of processing is syntax analysis.
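This stage can be sketched as a minimal affix-stripping routine. The affix lists and the spelling-repair rule ("happi" back to "happy") are illustrative assumptions, far simpler than a real morphological analyser:

```python
PREFIXES = ["un", "re", "dis"]
SUFFIXES = ["ly", "able", "s"]

def split_affixes(word):
    """Break a word into prefix, stem and suffix tokens,
    e.g. "unhappily" -> ["un-", "happy", "-ly"]."""
    tokens = []
    for p in PREFIXES:
        if word.startswith(p):
            tokens.append(p + "-")
            word = word[len(p):]
            break
    suffix = None
    for s in SUFFIXES:
        if word.endswith(s):
            suffix = "-" + s
            word = word[:-len(s)]
            break
    # Crude spelling repair after stripping "-ly": "happi" -> "happy".
    if word.endswith("i"):
        word = word[:-1] + "y"
    tokens.append(word)
    if suffix:
        tokens.append(suffix)
    return tokens

print(split_affixes("unhappily"))  # -> ['un-', 'happy', '-ly']
print(split_affixes("climbs"))     # -> ['climb', '-s']
```

Note that such surface rules cannot by themselves resolve ambiguous cases like "undoable", which is the point made above.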
3. Syntactic Analysis:

A language processor must carry out a number of different functions primarily based around
syntax analysis and semantic analysis. The purpose of syntax analysis is two-fold: to check
that a string of words (a sentence) is well-formed and to break it up into a structure that shows
the syntactic relationships between the different words. A syntactic analyser (or parser) does
this using a dictionary of word definitions (the lexicon) and a set of syntax rules (the
grammar). A simple lexicon only contains the syntactic category of each word, a simple
grammar describes rules which indicate only how syntactic categories can be combined to
form phrases of different types.
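A lexicon and grammar of the kind just described can be sketched as follows. The categories and rules are illustrative assumptions, vastly smaller than any real grammar:

```python
# Toy lexicon: syntactic category of each known word.
LEXICON = {"the": "Det", "large": "Adj", "cat": "Noun",
           "chased": "Verb", "rat": "Noun"}

# Toy grammar: S -> NP VP ; NP -> Det Adj* Noun ; VP -> Verb NP
def parse_np(tags, i):
    """Try to recognise a noun phrase starting at position i;
    return the position after it, or None on failure."""
    if i < len(tags) and tags[i] == "Det":
        j = i + 1
        while j < len(tags) and tags[j] == "Adj":
            j += 1
        if j < len(tags) and tags[j] == "Noun":
            return j + 1
    return None

def parse_sentence(words):
    """Return True if the word string is well-formed under the toy grammar."""
    tags = [LEXICON.get(w) for w in words]
    if None in tags:
        return False
    i = parse_np(tags, 0)                      # subject NP
    if i is None or i >= len(tags) or tags[i] != "Verb":
        return False
    i = parse_np(tags, i + 1)                  # object NP
    return i == len(tags)

print(parse_sentence("the large cat chased the rat".split()))   # True
print(parse_sentence("cat the chased rat large the".split()))   # False
```

This shows both roles of syntax analysis at once: ill-formed word sequences are rejected, while well-formed ones are broken into phrase-level structure.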

After semantic analysis the next stage of processing deals with pragmatics. Unfortunately
there is no universally agreed distinction between semantics and pragmatics. This document,
in common with several other authors [Russell & Norvig], makes the distinction as follows:
semantic analysis associates meaning with isolated utterances/sentences; pragmatic analysis
interprets the results of semantic analysis from the perspective of a specific context (the
context of the dialogue or state of the world etc). This means that with a sentence like "The
large cat chased the rat" semantic analysis can produce an expression which means the large
cat but cannot carry out the further step of inference required to identify the large cat as Felix.
This would be left up to pragmatic analysis. In some cases, like the example just described,
pragmatic analysis simply fits actual objects/events which exist in a given context with object
references obtained during semantic analysis. In other cases pragmatic analysis can
disambiguate sentences which cannot be fully disambiguated during the syntax and semantic
analysis phases. As an example consider the sentence "Put the apple in the basket on the
shelf". There are two semantic interpretations for this sentence: either "in the basket"
identifies which apple is meant and "on the shelf" gives the destination, or "in the basket on
the shelf" as a whole gives the destination. Pragmatic analysis can choose between them by
checking which reading fits the actual state of the world.
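The two readings of "Put the apple in the basket on the shelf" can be sketched as nested bracketings; the tuple notation below is an illustrative assumption, not a standard logical form:

```python
# Reading 1: "in the basket" identifies which apple; the shelf is the destination.
reading1 = ("put", ("apple", ("in", "basket")), ("on", "shelf"))

# Reading 2: the destination is the basket that is on the shelf.
reading2 = ("put", "apple", ("in", ("basket", ("on", "shelf"))))

# Syntax and semantics alone admit both; pragmatic analysis picks the reading
# that matches the actual scene (e.g. is an apple already in a basket?).
print(reading1 != reading2)  # True: the two parses are genuinely distinct
```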

4. Semantic Analysis

Once the computer has arrived at an analysis of the input sentence's syntactic structure, a semantic
analysis is needed to ascertain the meaning of the sentence. Two caveats are needed before I proceed.
First, as before, the subject is more complex than can be thoroughly discussed here, so I will proceed
by describing what seem to me to be the main issues and giving some examples. Second, I act as if
syntactic analysis and semantic analysis are two distinct and separated procedures when in an NLP
system they may in fact be interwoven.

From the syntactic structure of a sentence the NLP system will attempt to produce the logical form of
the sentence. Logical form is context-free in that it does not require that the sentence be interpreted
within its overall context in the discourse or conversation in which it occurs. And logical form
attempts to state the meaning of the sentence without reference to the particular natural language.
Thus the intent seems to be to make it closer to the notion of a proposition than to the original
sentence.
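For a sentence like "The large cat chased the rat", a context-free logical form might be sketched as below. The event-variable style and the predicate names are illustrative assumptions:

```python
# Event-style logical form: exists e. chase(e) & agent(e, c) & cat(c)
#                                   & large(c) & patient(e, r) & rat(r)
logical_form = [
    ("chase", "e1"),
    ("agent", "e1", "c1"), ("cat", "c1"), ("large", "c1"),
    ("patient", "e1", "r1"), ("rat", "r1"),
]

def predicates_about(entity, form):
    """Collect the unary predicates that hold of a single entity."""
    return [pred for pred, *args in form if args == [entity]]

print(predicates_about("c1", logical_form))  # -> ['cat', 'large']
```

Nothing here identifies which particular cat "c1" is; as the discussion above notes, linking it to an individual such as Felix is left to pragmatic analysis.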

The basic or primitive unit of meaning for semantic will be not the word but the sense, because
words may have different senses, like those listed in the dictionary for the same word. One attempt to
help with this is for the different senses can be organized into a set of classes of objects; this
representation is called an ontology. Aristotle noted classes of substance, quantity, quality, relation,
place, time, position, state, action, and affection, and Allen notes we can add events, ideas, concepts,
and plans. Actions and events are especially influential. Events are important in many theories
because they provide a structure of organizing the interpretation of sentences. Actions are carried out
by agents. Also important is the already mentioned notion of a situation.

Just noting different senses of a word does not of course tell you which one is being used in a
particular sentence, and so ambiguity is still a problem for semantic interpretation. (Allen notes that
some senses are more specific (less vague) than others, and virtually all senses involve some degree
of vagueness in that they could theoretically be made more precise.) A word with different senses is
said to have lexical ambiguity. At the semantic level one must also note the possibility
of structural ambiguity. "Every boy loves a dog" is ambiguous between many dogs or one dog. This
kind of semantic structural ambiguity will involve quantifier scoping.
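The two quantifier scopings of "Every boy loves a dog" can be written out in first-order logic (a standard rendering, not taken from the source):

```latex
% Wide scope for "every": possibly a different dog for each boy.
\forall b\,\bigl(\mathit{boy}(b) \rightarrow \exists d\,(\mathit{dog}(d) \land \mathit{loves}(b,d))\bigr)

% Wide scope for "a": one particular dog loved by every boy.
\exists d\,\bigl(\mathit{dog}(d) \land \forall b\,(\mathit{boy}(b) \rightarrow \mathit{loves}(b,d))\bigr)
```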

Thus far we have written as if the process of syntactic analysis, yielding a syntactic representation in
a context-free grammar or definite clause grammar, and the process of semantic analysis, which
takes this syntactic representation and yields a statement in a logical form language, were two
separate processes. We have used the phrase "semantic interpretation" loosely for the latter process;
actually we might think of semantic interpretation as going from the sentence to the logical form or
from the syntactic structure or representation to the logical form. There is a more specialized use of
"semantic interpretation" involved in the use of various techniques to link syntactic and semantic
analysis. In this specialized sense, the method of semantic interpretation allows logical forms to be
computed while parsing. A popular version of this pursues a rule-by-rule style, with each syntactic
rule corresponding to a semantic rule, so that each well-formed syntactic constituent will have a
corresponding well-formed semantic (logical form) meaning constituent. But other approaches are
possible, including those that attempt to produce a semantic interpretation directly from the sentence
without using syntactic analysis and those that attempt to parse based on semantic structure. Just as
in the case of syntactic analysis, statistics might be used to disambiguate words into the most likely
sense.

5. Discourse

While syntax and semantics work with sentence-length units, the discourse level of NLP works with
units of text longer than a sentence. That is, it does not interpret multi-sentence texts as just
concatenated sentences, each of which can be interpreted singly. Rather, discourse focuses on the
properties of the text as a whole that convey meaning by making connections between component
sentences. Several types of discourse processing can occur at this level, two of the most common
being anaphora resolution and discourse/text structure recognition. Anaphora resolution is the
replacing of words such as pronouns, which are semantically vacuous, with the appropriate entity to
which they refer. Discourse/text structure recognition determines the functions of sentences in
the text, which, in turn, adds to the meaningful representation of the text. For example, newspaper
articles can be deconstructed into discourse components such as: Lead, Main Story, Previous Events,
Evaluation, Attributed Quotes, and Expectation.
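A naive recency-based sketch of anaphora resolution follows. The pronoun and noun lists are illustrative assumptions; real resolvers also check gender, number and syntactic constraints:

```python
PRONOUNS = {"he", "she", "it", "they"}
KNOWN_NOUNS = {"cat", "rat", "councilors", "demonstrators", "permit"}

def resolve_anaphora(tokens):
    """Replace each pronoun with the most recently mentioned known noun."""
    resolved, last_noun = [], None
    for tok in tokens:
        low = tok.lower()
        if low in PRONOUNS and last_noun is not None:
            resolved.append(last_noun)   # naive heuristic: most recent noun wins
        else:
            resolved.append(tok)
            if low in KNOWN_NOUNS:
                last_noun = tok
    return resolved

print(resolve_anaphora("the cat chased the rat because it was hungry".split()))
```

The heuristic resolves "it" to "rat" simply because it was mentioned last, whereas world knowledge (hunger motivates chasing) favours "cat"; this is exactly the kind of case the pragmatic level below must handle.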

6. Pragmatic

This level is concerned with the purposeful use of language in situations and utilizes context over
and above the contents of the text for understanding. The goal is to explain how extra meaning is
read into texts without actually being encoded in them. This requires much world knowledge,
including the understanding of intentions, plans, and goals. Some NLP applications may utilize
knowledge bases and inferencing modules. For example, the following two sentences require
resolution of the anaphoric term "they", but this resolution requires pragmatic or world knowledge:
"The city councilors refused the demonstrators a permit because they feared violence." "The city
councilors refused the demonstrators a permit because they advocated revolution."
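A toy sketch of how world knowledge might break the tie for "they" in the two sentences above; the knowledge-base entries are illustrative assumptions standing in for a real inferencing module:

```python
# Toy world knowledge: which property is plausible for which group.
WORLD_KNOWLEDGE = {
    ("city councilors", "feared violence"): True,
    ("demonstrators", "feared violence"): False,
    ("city councilors", "advocated revolution"): False,
    ("demonstrators", "advocated revolution"): True,
}

def resolve_they(candidates, property_phrase):
    """Pick the candidate referent for which the stated property is plausible."""
    plausible = [c for c in candidates
                 if WORLD_KNOWLEDGE.get((c, property_phrase), False)]
    return plausible[0] if len(plausible) == 1 else None

groups = ["city councilors", "demonstrators"]
print(resolve_they(groups, "feared violence"))       # -> 'city councilors'
print(resolve_they(groups, "advocated revolution"))  # -> 'demonstrators'
```

The same syntactic and semantic structure yields two different referents; only the extra-textual knowledge changes, which is what distinguishes this level from the ones before it.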

References
Copestake, A., 2014. Natural Language Processing. Faculty of Computer Science, Cambridge University, 19
May.

Negnevitsky, M., 2005. Artificial Intelligence: A Guide to Intelligent Systems. 2nd ed. King's Lynn, Great
Britain: Biddles Ltd.

Thirumuruganathan, S., 2010. Building Bayesian Network Based Expert Systems from Rules. Faculty of the
Graduate School of The University of Texas at Arlington.
