2-Lecture Two - (Back Ground of NLP)

Background and Overview of NLP
Adama Science and Technology University

School of Electrical Engineering and Computing
Department of CSE
Dr. Mesfin Abebe Haile (2022)
NLP: Definition
 Natural language refers to human languages (Amharic, Afaan
Oromo, Tigrigna, English, Arabic, Chinese, etc.), as opposed to
artificial/programming languages such as C++, Java, Pascal,
etc.
 Natural language is represented using texts in spoken or written
forms.
 NLP is the computerized approach to analyzing text that is
based on both a set of theories and a set of technologies.
01/02/23 2
NLP: Definition
 A more comprehensive definition of NLP is given as:

 An interdisciplinary field of study dealing with computational
techniques for analyzing and representing naturally occurring
texts at one or more levels of linguistic analysis for the purpose of
achieving human-like language processing for a range of tasks or
applications.
 ...interdisciplinary field...
 Several fields including linguistics, psycholinguistics,
mathematics, computer science, and electrical engineering
contribute to the research and development of NLP.
NLP: Definition
 ...computational techniques...
 Multiple models, methods and algorithms are employed to
accomplish a particular type of language analysis.
 ...naturally occurring texts...

 Texts can be in spoken or written forms representing natural
languages used by humans to communicate to one another.
 ...levels of linguistic analysis...

 Multiple types of language processing are known to be at work when
humans produce or comprehend language.
NLP: Definition
 ...human-like language processing...

 NLP strives for human-like performance, and is thus considered a
discipline within Artificial Intelligence.
 ...tasks or applications...
 The goal of NLP is to accomplish human-like language processing
for various tasks and applications such as machine translation,
information retrieval, question-answering, etc.
NLP: Definition
 The task of Natural Language Understanding (NLU) is equivalent to the role of the reader/listener, whereas the task of Natural Language Generation (NLG) is that of the
writer/speaker.
NLP: Importance of NLP
 NLP bridges the communication gap between people and computers.
 It can lead to better and more natural communication with computers.
 It makes it possible to process the ever-increasing amount of natural language data generated by
people, e.g. to extract required information from the web.
NLP: Difficulty of NLP
 People generally don’t appreciate how intelligent they are as natural
language processors.
 To them, processing language seems deceptively simple because no conscious effort is required.
 Since computers are orders of magnitude faster, many find it hard to believe
that computers are not good at processing natural languages.
NLP: Difficulty of NLP
 NLP is hard because of:

 Ambiguity - A word, term, phrase or sentence could mean several possible things.
 Computer languages are designed to be unambiguous.
 Variability - Lots of ways to express the same thing.

 Computer languages have variability too, but the equivalence of expressions can often be
automatically detected.
NLP: Brief History
 Research and development on NLP started along with the advent of computers.
 The field emerged in the US from the strong desire of having a Machine Translation
system that automatically translates texts from Russian journals into English.
 However, the initial efforts to develop an accurate machine translation system were
not successful as automatic translation could not be realized just by translating words.
NLP: Brief History
 It was then understood that human-like translations require analyses of languages
at different levels such as:
 Word level,
 Phrase and sentence level,
 Sequential sentences,
 Whole text context,
 Beyond the text (knowledge about the world).
NLP: Brief History
 Such understandings helped many researchers and developers realize that they needed a more
adequate theory of language.
 Key contributors in this field include:

 Noam Chomsky, in his work on generative grammars,
 Claude Shannon, in his work applying probabilistic models of automata to language,
 John Backus and Peter Naur, in their work on context-free grammars for programming languages.
NLP: Course Coverage and
Knowledge Requirement
Levels of Linguistic Analysis:
Morphology
 Morphology is the study of the componential nature of words.
 At the morphological level, the smallest meaning-carrying parts of words (morphemes), such as roots and affixes, are analyzed.
Levels of Linguistic Analysis: Syntax
 Syntax refers to the study of structural relationships between words in a sentence.

 Syntactic analysis requires both a grammar and a parser, the output of which is a
representation of the sentence that reveals the structural dependency relationships
between the words.
Semantics
 Semantics deals with the meaning of words, phrases and sentences.
 Semantic analysis requires knowledge of:
 Lexical semantics: the meanings of the component words,
 Compositional semantics: how components combine to form larger meanings.
Discourse
 Discourse level deals with the properties of the text as a whole that
convey meaning by making connections between component sentences.
 Several methods are used in discourse processing, two of the most
common being:
 Anaphora resolution: replacing words such as pronouns, which are
semantically vacant, with the appropriate entity to which they refer; and
 Discourse/text structure recognition: determining the functions of sentences
in the text (which adds to the meaningful representation of the text).
Pragmatics
 Pragmatics is concerned with the purposeful use of language in situations and utilizes context
over and above the contents of the text for understanding.
 Pragmatics deals with world knowledge – outside the contents of the document.
Applications of NLP: Spelling Correction
and Grammar Checking
Applications of NLP: Information
Retrieval
 Information Retrieval provides a list of potentially relevant documents in response to a user’s query.
Applications of NLP: Information
Extraction
 Information extraction focuses on the recognition, tagging and extraction of certain key elements of information (e.g. persons,
companies, locations, organizations, etc.) from large collections of text into a structured representation.
Applications of NLP: Machine
Translation
 Machine Translation is the automatic translation of text from one language to another.
Applications of NLP: Question-
Answering
 Question-Answering provides the user with either just the text of the answer itself or answer-providing passages.
Applications of NLP: Dialogue
Systems
 Dialogue Systems are agents that converse with human beings in a coherent structure using several modes of communication such
as text, speech, gesture, etc.
Applications of NLP: Text
Summarization
 Text summarization is an application of NLP that reduces a larger text into a shorter, yet richly constituted representation of the
original document.
Related Fields: Modes of Language
Representation
Related Fields: Speech Recognition
 Speech Recognition is the process of converting spoken words
(acoustic signals) into equivalent text.
 Speech Synthesis, also known as Text-to-Speech system,
performs the reverse process, i.e. artificially produces human
speech from a given text.
Related Fields: Optical Character
Recognition (OCR)
 Optical Character Recognition (OCR) is a computerized system
that converts non-editable text (e.g. scanned or photographed pages) to machine-encoded text.
 If the text to be converted is handwritten, the system is also
known as Intelligent Character Recognition (ICR).
Morphological Analysis
Introduction: Terminologies
 Morphology - the study of the structure of words.
 Morpheme - the minimal meaning-bearing unit of morphology, e.g. help, ful and ness in helpfulness.
 Stem - the part of a word that never changes even when
morphologically inflected. For example, walk is the stem for the
words walk, walks, walking, and walked.
 Root/Lemma - the citation form of a set of words, e.g. break is the root
form for the words break, breaks, breaking, broke, and broken.
 Part-of-Speech / Lexical Category / Word Class - a linguistic
category of words that explains how the word is used in a
sentence.
 Although different languages may have different classification
schemes, English and Amharic words are usually classified into
eight lexical categories: noun, pronoun, adjective, verb,
adverb, preposition, conjunction and interjection.
 Morphologically important parts-of-speech in English and
Amharic include: nouns, adjectives and verbs.
 Morphological Analysis - the process of finding the morphemes of a
word.
 It is an important component of Spelling Correction, Machine
Translation, Information Retrieval, Text Generation and other
natural language systems.
 Morphological Generation - the process of generating different
words from morphemes.
 Lemmatisation - the process of finding the root/lemma of a word.
 Stemming - the process of finding the stem of a word.
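The difference between stemming and lemmatisation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the suffix list and the irregular-form table below are hypothetical toy data, and real stemmers and lemmatisers use far richer rules and dictionaries.

```python
# Toy data (assumptions for illustration only, not a real morphological lexicon).
SUFFIXES = ("ing", "ed", "s")
IRREGULAR_LEMMAS = {"broke": "break", "broken": "break"}


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatise(word: str) -> str:
    """Look up irregular forms first, then fall back to suffix stripping."""
    return IRREGULAR_LEMMAS.get(word, stem(word))


for w in ["walk", "walks", "walking", "walked"]:
    print(w, "->", stem(w))
for w in ["breaks", "breaking", "broke", "broken"]:
    print(w, "->", lemmatise(w))
```

Suffix stripping alone leaves broke and broken unchanged, which is why the lemmatiser in this sketch needs a lookup table for irregular forms.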
Introduction: Kinds of Morphemes
 Morphemes can be classified in two ways:
 Free versus Bound,
 Roots, Affixes versus Combining Forms.
 Free Morphemes - morphemes that can stand on their own to
give meaning. Example:
 Friend in friendly,
 Large in enlarge,
 Help in helpfulness,
 Perform in performance.
 Bound Morphemes - morphemes that cannot stand on their
own as a word. Example:
 ly in friendly,
 en in enlarge,
 ful and ness in helpfulness,
 ance in performance.
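As a rough sketch of how free and bound morphemes might be separated automatically, the following Python snippet segments a word into prefixes, a free root and suffixes. The three inventories are assumptions built only from the examples on these slides; a real morphological analyzer would need a full lexicon and spelling rules.

```python
# Hypothetical toy inventories taken from the slide examples.
FREE_ROOTS = ("friend", "large", "help", "perform")
PREFIXES = ("en", "un")
SUFFIXES = ("ly", "ful", "ness", "ance")


def split_affixes(text, inventory):
    """Return text as a list of affixes from inventory, or None if it cannot be fully covered."""
    if text == "":
        return []
    for affix in inventory:
        if text.startswith(affix):
            rest = split_affixes(text[len(affix):], inventory)
            if rest is not None:
                return [affix] + rest
    return None


def segment(word):
    """Return (prefixes, free_root, suffixes), or None if no analysis is found."""
    for root in FREE_ROOTS:
        start = word.find(root)
        if start == -1:
            continue
        before = split_affixes(word[:start], PREFIXES)
        after = split_affixes(word[start + len(root):], SUFFIXES)
        if before is not None and after is not None:
            return before, root, after
    return None


for w in ["friendly", "enlarge", "helpfulness", "performance"]:
    print(w, "->", segment(w))
```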
 Roots, Affixes versus Combining Forms:
 Roots: morphemes (within a non-compound word) that make the
most precise and concrete contribution to the word's meaning, and
are either the sole morpheme or else the only one that is not an
affix.
Example: break in breaks, help in unhelpfulness.
 Affixes: bound morphemes that either precede, follow or are
inserted inside the root or stem.
Example: Prefix: en in enlarge is an affix that precedes the
root large.
Suffix: ly in largely is an affix that follows the root large.
 Combining Forms: morphemes that are formed from two bound
or free-like roots.
Example: Two free roots: photo and graph in photograph.
Two bound roots: electro- and -lysis in electrolysis.
Bound and free roots: Ethio- and American in Ethio-American.
Introduction: Morphological Types
 There are three types of morphological structures:
 Isolating,
 Agglutinative,
 Inflectional.
 Isolating: Languages with isolating morphological structures
have morphemes representing words in the language in most
cases.
 There is little or no morphological change in words, and such
languages do not require extensive study on morphological
analysis.
Introduction: Morphological Types
 Agglutinative: Languages with agglutinative morphological
structures have words formed from lots of morphemes that are
glued together.
 Words in these language groups have lots of easily separable
morphemes.
 Inflectional: In languages with inflectional morphological
structures, morphemes are fused together and require a complex
morphological analyzer to separate them.
 Morphemes may be fused together in several ways such as
affixation and doubling all or part of a word (reduplication).
Introduction: Morphological Rules
 Words can be formed from morphemes in two ways:
 Derivational Morphology,
 Inflectional Morphology.
 Derivational Morphology:
 Derivational Morphology is a morphology concerned with the way in
which words are derived from morphemes through processes such as
affixation or compounding.
 This derivation process usually changes the part-of-speech category.
Introduction: Morphological Rules
 Inflectional Morphology is a morphology that deals with the
combination of a word with a morpheme, usually resulting in a
word of the same class as the original stem, and serving the same
syntactic function.
 Inflections do not change the part-of-speech category, only the
grammatical function.
 Inflection can be achieved by marking a word category for person
(first, second, third), gender (feminine, neuter, masculine), number
(singular, plural), case (subjective/nominative,
objective/accusative/dative, possessive/genitive), definiteness
(definite, indefinite), degree (positive, comparative, superlative),
tense (past, present, future), aspect (perfective,
imperfective/continuous), politeness (impolite, polite), etc.
Syntax and Parsing

School of SoEEC
Department of Computer Science and
Engineering
Introduction
 Syntax - refers to the way words are related to each other in a
sentence.
 Syntactic Analysis - analyzes:
 How words are grouped together into phrases;
 Which words modify other words;
 Which words are of central importance to the sentence.
 Syntactic Analysis is used in many NLP applications such as:
 Grammar Checking,
 Question Answering,
 Information Extraction,
 Machine Translation.
Phrases: Noun Phrases
Sentences
 Simple Sentences (English):
 The computer is on the table.
 He went home.
 Compound Sentences (English):
 I like coffee, and he likes tea.
 He played football, and she watched television.
 Complex Sentences (English):
 He was driving the car that he bought from his father.
 We rented our house to friends while we were abroad.
Simple Sentences
Parsing
 Parsing - a derivation process which identifies the structure of
sentences using a given grammar.
 It can be considered a special case of a search problem.
 Two basic methods of searching are used:
 Top-down strategy
 Bottom-up strategy
 Methods of improving efficiency:
 Storing lexical rules separately
 Chunking
Parsing Strategies: Top-Down Parsing
 Top-down parsing starts with the symbol S and then searches through different ways to
rewrite the symbols until the input sentence is generated.
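The top-down strategy can be sketched as a small backtracking recognizer. The grammar and lexicon below are hypothetical toy data chosen for illustration (they are not taken from the lecture), and the grammar deliberately has no left-recursive rules, which a naive top-down search cannot handle.

```python
# Hypothetical toy grammar: each non-terminal maps to its alternative rewrites.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Pron"]],
    "VP": [["V", "NP"], ["V"]],
}
# Hypothetical toy lexicon: one category per word.
LEXICON = {
    "the": "Det", "a": "Det", "dog": "N", "ball": "N",
    "he": "Pron", "chased": "V", "slept": "V",
}


def derives(symbols, words):
    """Return True if the symbol sequence can be rewritten into exactly the word sequence."""
    if not symbols:
        return not words
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        # Non-terminal: try every rewrite rule, left to right, with backtracking.
        return any(derives(rhs + rest, words) for rhs in GRAMMAR[first])
    # Lexical category: it must match the category of the next input word.
    return bool(words) and LEXICON.get(words[0]) == first and derives(rest, words[1:])


def recognize(sentence):
    """Start from S and search for a derivation of the sentence."""
    return derives(["S"], sentence.lower().split())


print(recognize("the dog chased a ball"))
print(recognize("he slept"))
print(recognize("dog the chased"))
```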
Parsing Strategies: Bottom-Up Parsing
 Bottom-up parsing starts with words in a sentence and uses production rules backward to
reduce the sequence of symbols until it consists solely of S.
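A minimal sketch of the bottom-up strategy is a shift-reduce recognizer with backtracking. The rules and lexicon are hypothetical toy data, and a practical parser would use a more efficient control strategy than exhaustive search.

```python
# Hypothetical toy rules as (left-hand side, right-hand side) pairs.
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("NP", ("Pron",)),
    ("VP", ("V", "NP")),
    ("VP", ("V",)),
]
# Hypothetical toy lexicon: one category per word.
LEXICON = {
    "the": "Det", "a": "Det", "dog": "N", "ball": "N",
    "he": "Pron", "chased": "V", "slept": "V",
}


def reduces_to_s(stack, words):
    """Return True if the stack plus the remaining words can be reduced to a single S."""
    if stack == ("S",) and not words:
        return True
    # Reduce: apply a rule backward to the top of the stack.
    for lhs, rhs in RULES:
        if len(stack) >= len(rhs) and stack[-len(rhs):] == rhs:
            if reduces_to_s(stack[:-len(rhs)] + (lhs,), words):
                return True
    # Shift: push the category of the next word onto the stack.
    if words and words[0] in LEXICON:
        return reduces_to_s(stack + (LEXICON[words[0]],), words[1:])
    return False


def recognize(sentence):
    """Start from the words and try to reduce them to S."""
    return reduces_to_s((), sentence.lower().split())


print(recognize("the dog chased a ball"))
print(recognize("he slept"))
print(recognize("chased the dog"))
```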
Towards Efficient Parsing: Separating
Lexical Rules
 The efficiency of parsing algorithms can be improved if lexical rules are stored separately
in a structure called lexicon, which specifies the possible categories for each word.
 The following example shows the lexical rules separated from other grammatical rules.
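One possible way to picture this separation in code is shown below. The rules and lexicon entries are hypothetical and are not the ones from the original slide figure; the point is only that grammar rules mention categories, while the lexicon lists the possible categories of each word.

```python
# Grammar rules refer only to categories, never to individual words (toy data).
GRAMMAR_RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("VP", ("V", "NP")),
]

# The lexicon maps each word to the set of categories it may take (toy data).
LEXICON = {
    "the": {"Det"},
    "can": {"N", "V", "Aux"},
    "book": {"N", "V"},
    "flight": {"N"},
}


def categories(word):
    """Return the possible categories of a word (empty set if the word is unknown)."""
    return LEXICON.get(word.lower(), set())


def tag_options(sentence):
    """List the candidate categories of every word in the sentence."""
    return [(w, sorted(categories(w))) for w in sentence.split()]


print(tag_options("Book the flight"))
```

With this layout the parser looks each word up once, instead of trying one grammar rule per word and category.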
Towards Efficient Parsing: Chunking
 Chunking, also called partial parsing, is a technique which attempts to
model human parsing by breaking the text up into small pieces, each
parsed separately.
 Chunk boundaries correspond roughly to the pauses in everyday speech.
 For example, consider the following sentence.
 When I read a sentence, I read it a chunk at a time.
 Then, the following chunks can be identified.
 [When I read] [a sentence], [I read it] [a chunk] [at a time].
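A much simpler kind of chunking can be sketched in Python: grouping pre-tagged tokens into noun-phrase chunks of the form Det? Adj* N+. This is a swapped-in illustration, not the bracketing shown above; the tag names and the hand-assigned tags are assumptions.

```python
def np_chunks(tagged):
    """Group (word, tag) pairs into NP chunks (Det? Adj* N+); other tokens stay on their own."""
    chunks, i = [], 0
    while i < len(tagged):
        j = i
        if tagged[j][1] == "Det":          # optional determiner
            j += 1
        while j < len(tagged) and tagged[j][1] == "Adj":   # any adjectives
            j += 1
        k = j
        while k < len(tagged) and tagged[k][1] == "N":     # one or more nouns
            k += 1
        if k > j:
            chunks.append(("NP", [w for w, _ in tagged[i:k]]))
            i = k
        else:
            # No noun follows, so emit the current token as a one-word chunk.
            chunks.append((tagged[i][1], [tagged[i][0]]))
            i += 1
    return chunks


# Hand-tagged version of the example sentence (tags are assumptions).
sentence = [
    ("When", "Adv"), ("I", "Pron"), ("read", "V"), ("a", "Det"), ("sentence", "N"),
    ("I", "Pron"), ("read", "V"), ("it", "Pron"), ("a", "Det"), ("chunk", "N"),
    ("at", "P"), ("a", "Det"), ("time", "N"),
]
for label, words in np_chunks(sentence):
    print(label, words)
```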
Towards Efficient Parsing: Chunking
 Each chunk can then be parsed separately.
 In addition to perhaps being a better model of human behavior
than full parsing methods, other advantages of chunk parsing are
as follows:
 Because a chunk parser only needs to deal with small, non-recursive
clauses, it is able to process text much more quickly.
 A chunk parser is easier to implement and requires much less
memory to parse.
 When a full parser fails, it must discard an entire sentence, even if it
got much of the structure correct.
 A chunk parser only discards a few words when it cannot figure out
how to proceed.
Semantic Analysis
School of SoEEC
Department of Computer Science and
Engineering
Dr. Mesfin Abebe Haile (2021)
Introduction to Semantics
 Semantic Analysis involves extraction of context-independent aspects of a sentence's
meaning, including the semantic roles of entities mentioned in the sentence, and
quantification information, such as cardinality, iteration, and dependency.
 Semantic Analysis is an important component for many NLP applications.
Cultural Analysis and Linguistic
Semantics
 The culture of a society has an impact on semantic analysis.
 How do we represent the meanings of tella, injera, besso, tej, teff,
etc. so as to translate them into a foreign language?
 There is a close link between the life of the society and the lexicon
of the language spoken.
 For example, English ice and snow both correspond to Amharic በረዶ.
 Politeness is marked grammatically in Amharic, but not in English.
First Order Predicate Calculus:
Elements of FOPC
 A predicate represents a property of, or a relation between, terms that
can be true or false.
 In a given interpretation, an n-ary predicate can be defined as a function
from tuples of n terms to {True, False}.
 For example: Brother(Abebe, Kebede), Left-of(Square1, Square2),
GreaterThan(plus(1,1), plus(0,1)).
 Connectives are used to compose complex representations.
 Truth table
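The truth table of the standard connectives (¬, ∧, ∨, ⇒) can be generated with a short Python sketch. The function names are illustrative choices, not part of the lecture.

```python
from itertools import product


def implies(p, q):
    """Material implication: false only when p is true and q is false."""
    return (not p) or q


def truth_table():
    """Rows of (P, Q, not P, P and Q, P or Q, P implies Q) for all truth assignments."""
    rows = []
    for p, q in product([True, False], repeat=2):
        rows.append((p, q, not p, p and q, p or q, implies(p, q)))
    return rows


print("P      Q      ¬P     P∧Q    P∨Q    P⇒Q")
for row in truth_table():
    print("  ".join(f"{str(value):5}" for value in row))
```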
Elements of FOPC
 An atomic sentence is simply a predicate applied to a tuple of
terms.
 For example: Owns(Abebe, Car1)
 Sold(Abebe, Car1, Kebede)
 Its semantics is True or False depending on the interpretation.
 The standard propositional connectives ( ∨, ¬, ∧, ⇒ ) can be
used to construct complex sentences:
 For example: Owns(Abebe, Car1) ∨ Owns(Kebede, Car1)
 Sold(Abebe, Car1, Kebede) ⇒ ¬Owns(Abebe, Car1)
 The semantics is the same as in propositional logic.
Example
 Sheraton Addis is a hotel.
 Hotel(SheratonAddis)
 Sheraton Addis serves Ethiopian food.
 Serves(SheratonAddis, EthiopianFood)
 I have only five Birr and I don’t have a lot of time.

 Have(Speaker, FiveBirr) ∧ ¬ Have(Speaker, LotOfTime)
 Sheraton Addis is near AAU.
 Near(LocationOf(SheratonAddis), LocationOf(AAU))
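To make the idea of an interpretation concrete, the sketch below stores the atomic facts that hold in one hypothetical situation and evaluates atomic and complex sentences against it. The fact set and helper names are assumptions for illustration.

```python
# A hypothetical interpretation: the set of atomic facts that are true.
FACTS = {
    ("Hotel", ("SheratonAddis",)),
    ("Serves", ("SheratonAddis", "EthiopianFood")),
    ("Have", ("Speaker", "FiveBirr")),
    ("Owns", ("Kebede", "Car1")),
    ("Sold", ("Abebe", "Car1", "Kebede")),
}


def holds(predicate, *terms):
    """An n-ary predicate maps a tuple of n terms to True or False."""
    return (predicate, terms) in FACTS


def implies(p, q):
    """Material implication."""
    return (not p) or q


# Have(Speaker, FiveBirr) ∧ ¬Have(Speaker, LotOfTime)
short_on_time = holds("Have", "Speaker", "FiveBirr") and not holds("Have", "Speaker", "LotOfTime")

# Owns(Abebe, Car1) ∨ Owns(Kebede, Car1)
either_owns = holds("Owns", "Abebe", "Car1") or holds("Owns", "Kebede", "Car1")

# Sold(Abebe, Car1, Kebede) ⇒ ¬Owns(Abebe, Car1)
sold_means_not_owner = implies(
    holds("Sold", "Abebe", "Car1", "Kebede"),
    not holds("Owns", "Abebe", "Car1"),
)

print(holds("Hotel", "SheratonAddis"), short_on_time, either_owns, sold_means_not_owner)
```

Changing the fact set changes the truth values, which is what "True or False depending on the interpretation" means in practice.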
Semantic Networks
 A semantic network is a network which represents semantic
relations among concepts.
 It is often used as a form of knowledge representation.
 It is a directed or undirected graph consisting of vertices, which
represent concepts, and edges, which represent semantic relations between them.
 A semantic network is used when one has knowledge that is best
understood as a set of concepts that are related to one another.
Semantic Networks
 Example:
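The original figure is not reproduced here. As a stand-in, a small hypothetical network can be sketched in Python as a list of labelled directed edges; the concepts and relation names are assumptions chosen for illustration.

```python
# Hypothetical network: (source concept, relation, target concept).
EDGES = [
    ("Injera", "is-a", "Bread"),
    ("Bread", "is-a", "Food"),
    ("Injera", "made-from", "Teff"),
    ("Food", "used-for", "Eating"),
]


def related(concept, relation):
    """Concepts directly linked from `concept` by `relation`."""
    return [target for source, rel, target in EDGES if source == concept and rel == relation]


def ancestors(concept):
    """Follow is-a edges transitively and return every more general concept."""
    found, frontier = [], [concept]
    while frontier:
        node = frontier.pop()
        for parent in related(node, "is-a"):
            if parent not in found:
                found.append(parent)
                frontier.append(parent)
    return found


def inherited(concept, relation):
    """Values of `relation` stated on the concept itself or on any is-a ancestor."""
    values = []
    for node in [concept] + ancestors(concept):
        values.extend(related(node, relation))
    return values


print(ancestors("Injera"))
print(inherited("Injera", "used-for"))
```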
Discovering Latent Semantics
 Latent Semantic Analysis (LSA) aims to discover something about
the meaning behind the words; about the topics in the documents.
 What is the difference between topics and words?
 Words are observable,
 Topics are not observable; they are latent.
 How to find out topics from the words in an automatic way?
 We can imagine them as:
 a compression of words;
 a combination of words.
Discovering Latent Semantics
 LSA implements the idea that the meaning of a passage is the sum of the
meanings of its words:
 meaning(word1) + meaning(word2) + … + meaning(wordn) =
meaning(passage)
 This “bag of words” function shows that a passage is considered to
be an unordered collection of word tokens and that the meanings are additive.
 By creating an equation of this kind for every passage of language
that a learner observes, we get a large system of linear equations.
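The additive "bag of words" assumption can be illustrated with a few lines of Python. This sketch shows only the summation step, not LSA itself: the 2-dimensional word vectors are made-up values, whereas LSA would estimate them from a large collection of passages.

```python
# Hypothetical "meaning" vectors; in LSA these would be estimated from data.
WORD_MEANING = {
    "coffee": (1.0, 0.0),
    "tea": (0.5, 0.5),
    "hot": (0.25, 0.75),
}


def passage_meaning(passage):
    """Sum the meaning vectors of the word tokens; unknown words contribute nothing."""
    total = [0.0, 0.0]
    for token in passage.lower().split():
        vector = WORD_MEANING.get(token, (0.0, 0.0))
        total[0] += vector[0]
        total[1] += vector[1]
    return tuple(total)


print(passage_meaning("hot coffee"))
print(passage_meaning("coffee hot"))
```

Because addition ignores order, both passages get the same vector, which is exactly the sense in which the passage is treated as an unordered collection of tokens.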
Vector Space Model
 Represent the document as a vector where each entry corresponds to a
different word and the number at that entry corresponds to how many
times that word was present in the document (or some function of it).
 The number of words is huge.
 Select and use a smaller set of words that are of interest.
 E.g. uninteresting words: ‘and’, ‘the’, ‘at’, ‘is’, etc. These are called
stop-words.
 Stemming: remove endings. E.g. ‘learn’, ‘learning’, ‘learnable’,
‘learned’ could be substituted by the single stem ‘learn’.
 Other simplifications can also be invented and used.
 The set of different remaining words is called the dictionary or
vocabulary. Fix an ordering of the terms in the dictionary so that you
can refer to them by their index.
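The steps above can be sketched as a small Python pipeline: remove stop-words, stem, fix a vocabulary ordering, and count. The stop-word list comes from the slide; the suffix list and the two sample documents are assumptions for illustration.

```python
from collections import Counter

STOP_WORDS = {"and", "the", "at", "is"}
SUFFIXES = ("able", "ing", "ed")   # toy stemmer, not a real stemming algorithm


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokens(text):
    """Lowercase, strip punctuation, drop stop-words and stem what remains."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return [stem(w) for w in words if w and w not in STOP_WORDS]


def build_vocabulary(docs):
    """Sorted term list, so every term has a fixed index."""
    return sorted({term for doc in docs for term in tokens(doc)})


def vectorize(doc, vocabulary):
    """Count vector of the document over the fixed vocabulary."""
    counts = Counter(tokens(doc))
    return [counts[term] for term in vocabulary]


docs = [
    "The student is learning and the teacher learned.",
    "Learnable models learn at scale.",
]
vocabulary = build_vocabulary(docs)
print(vocabulary)
for doc in docs:
    print(vectorize(doc, vocabulary))
```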
Question & Answer
Thank You !!!