A Beginner's Introduction To Natural Language Processing (NLP)


It’s not easy to teach machines how humans communicate. In recent
years, numerous technological innovations have enabled computers to
process language much the way we humans do.

What is Natural Language Processing?

Natural Language Processing (NLP) is a branch of artificial intelligence
dealing with the interaction between humans and computers using
natural language. The ultimate aim of NLP is to read, understand, and
make sense of human language in a useful way. Most NLP techniques
depend on machine learning to derive meaning from human languages.

A typical interaction between machines and humans using Natural
Language Processing could go as follows:

1. The human talks to the computer
2. The computer captures the audio
3. The audio is converted to text
4. The text data is processed
5. The response is converted back to audio
6. The computer plays the audio file, responding to the human
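The six steps above can be sketched as a small pipeline. This is only an illustrative skeleton: the function names are invented for the example, and the speech-to-text and text-to-speech steps are stand-ins that a real system would replace with calls to speech recognition and synthesis services.

```python
# Sketch of the six-step interaction loop. The speech and "understanding"
# functions below are placeholders, not real recognition or synthesis code.

def speech_to_text(audio: bytes) -> str:
    # Placeholder: pretend the captured audio decodes to this utterance.
    return "what is the weather today"

def process_text(text: str) -> str:
    # Placeholder processing step: route the query to a canned answer.
    if "weather" in text:
        return "It is sunny today."
    return "Sorry, I did not understand."

def text_to_speech(text: str) -> bytes:
    # Placeholder: a real system would synthesize audio here.
    return text.encode("utf-8")

def interact(audio_in: bytes) -> bytes:
    text = speech_to_text(audio_in)   # steps 2-3: capture and transcribe
    reply = process_text(text)        # step 4: process the text
    return text_to_speech(reply)      # steps 5-6: synthesize and play back

print(interact(b"dummy audio").decode())
```

The point of the sketch is the shape of the loop: every stage consumes the previous stage’s output, so improving any one stage (better transcription, better understanding) improves the whole interaction.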

The use of Natural Language Processing

These are typical applications where NLP is a driving force:

 NLP is used in language translation applications such as Google
Translate
 Word processors like Microsoft Word and web apps like Grammarly
use NLP to check the grammatical accuracy of text
 Interactive Voice Response (IVR) applications used in call centers to
answer specific users’ queries
 Personal assistant applications such as Alexa, Siri, OK Google, and Cortana

How does Natural Language Processing work?

Natural Language Processing applies algorithms to recognize and
extract the rules of natural language, so that raw language data is
transformed into a machine-understandable form. When we provide text to
the computer, it uses algorithms to extract the meaning of every
sentence and gather the essential data from it. Sometimes, however,
the computer fails to extract the exact meaning of a sentence, which
can lead to unexpected results.

For example, a common error involves translating the word
“online” from English to Russian. In English, “online” means
“connected to a network,” but one of its Russian translations instead
carries the sense of “interactive.”

Another example is a bot-based English sentence-restructuring tool that
rewords a sentence in a way that can change its whole meaning.

Here is a common English proverb, which we ask the tool to reframe:

The spirit is willing, but the flesh is weak

Here is how the tool rewords it:

The soul is prepared; however, the tissue is powerless.

What are the techniques used in NLP?

Syntactic and semantic analysis are the key techniques used to complete
the tasks of Natural Language Processing.

Below is the explanation of their use:


1. Syntax
Syntax is the arrangement of words in a sentence such that they make
sense grammatically. In Natural Language Processing, syntactic analysis
is used to determine how a natural language aligns with the rules of
grammar. Specific algorithms are used to apply grammar rules to groups
of words and extract their meaning. Syntax further includes some
specific techniques:

 Lemmatization: Reducing the various inflected forms of a word to a
single base form for easier analysis
 Morphological segmentation: Dividing words into their smallest
meaningful units, called morphemes
 Word segmentation: Dividing a large piece of continuous text into
distinct units
 Part-of-speech tagging: Identifying the part of speech of each
word
 Parsing: Performing grammatical analysis of a given sentence
 Sentence breaking: Placing sentence boundaries within a
large piece of text
 Stemming: Cutting inflected words down to their root form
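To make stemming and lemmatization concrete, here is a minimal, purely illustrative sketch. The suffix list and the tiny lemma dictionary are assumptions invented for the example; a real system would use an established algorithm such as the Porter stemmer and a full lemma dictionary.

```python
# Toy suffix-stripping stemmer and dictionary-based lemmatizer.
# The suffix list and lemma table are illustrative assumptions only.

SUFFIXES = ["ing", "ed", "es", "s"]   # checked in order, longest first
LEMMAS = {"better": "good", "ran": "run", "mice": "mouse"}

def stem(word: str) -> str:
    """Strip the first matching suffix (a crude approximation of stemming)."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def lemmatize(word: str) -> str:
    """Look up an irregular form; otherwise fall back to the stemmer."""
    return LEMMAS.get(word, stem(word))

print(stem("playing"))      # -> play
print(lemmatize("better"))  # -> good
print(lemmatize("walked"))  # -> walk
```

Note the difference the example surfaces: stemming only chops suffixes, while lemmatization can map irregular forms like “better” to a genuinely different base word.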
2. Semantics
Semantics refers to the meaning and logic conveyed through a
text. Semantic analysis is one of the most complex aspects of NLP and
hasn’t been fully solved yet.

Semantics involves implementing computer algorithms to determine the
interpretation of words and the structure of sentences.

Here are some techniques in semantic analysis:

 Named entity recognition (NER): Identifying the parts of a text that
can be classified into predetermined groups, such as the names of
places and people
 Word sense disambiguation: Determining the sense of a
word based on its context
 Natural language generation: Turning semantic representations
stored in a database into human language
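Word sense disambiguation can be illustrated with a heavily simplified, Lesk-style sketch: pick the sense whose dictionary gloss shares the most words with the sentence’s context. The two “bank” glosses below are hand-written for the example, not taken from a real lexical database.

```python
# Simplified Lesk-style word sense disambiguation: choose the sense whose
# gloss overlaps most with the surrounding context. Glosses are invented.

SENSES = {
    "bank/finance": "an institution that accepts money deposits and lends money",
    "bank/river": "the sloping land alongside a river or stream of water",
}

def disambiguate(word: str, context: str) -> str:
    context_words = set(context.lower().split())
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES.items():
        overlap = len(context_words & set(gloss.split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(disambiguate("bank", "they sat on the bank of the river"))  # -> bank/river
```

Real systems use far richer signals (word embeddings, trained classifiers), but the overlap idea is the same: the context decides the sense.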

The business benefits of NLP

Spellcheck and search are so mainstream that we often take them for granted,
especially at work, where Natural Language Processing provides several
productivity benefits.

For example, if you want information about your leave balance, you can
save the time of asking your Human Resources manager: many companies
offer chatbot-based search to which you can submit a question and get
answers about any company policy. Integrated search tools like these can
make customer resource calls and accounting up to 10x shorter.

In addition, NLP helps recruiters sort job profiles, attract more diverse
candidates, and select more qualified employees. NLP also helps
in spam detection, keeping unwanted emails out of your mailbox. Gmail
and Outlook use NLP to sort messages from specific senders into folders
you create.

In addition, sentiment analysis tools help organizations promptly recognize
whether tweets and messages about them are positive or negative, so that they
can resolve client concerns. These tools don’t just process the words in a
social network post; they also consider the context in which those words
appear. An English word can carry a negative, neutral, or positive meaning
depending on context, so NLP is used to thoroughly understand a customer’s
post by determining the emotion behind the words.
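A minimal sketch of lexicon-based sentiment scoring shows why context matters: a word’s polarity can flip when it is negated. The word lists and negation rule below are illustrative assumptions, far simpler than production sentiment models.

```python
# Minimal lexicon-based sentiment scorer with a tiny negation rule.
# The word lists are illustrative assumptions, not a real sentiment lexicon.

POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "sad"}
NEGATORS = {"not", "never", "no"}

def sentiment(text: str) -> str:
    words = text.lower().replace(".", "").split()
    score = 0
    for i, word in enumerate(words):
        value = (word in POSITIVE) - (word in NEGATIVE)
        # Flip polarity if the previous word negates this one.
        if i > 0 and words[i - 1] in NEGATORS:
            value = -value
        score += value
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this product"))  # -> positive
print(sentiment("This is not good"))     # -> negative
```

Even this toy rule ("not good" is negative despite containing a positive word) illustrates the point made above: sentiment analysis must look at context, not isolated words.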

NLP use cases

Multi-language machine translation: NLP-powered translation tools can
be used to translate low-impact content like regulatory texts or emails, and
to speed up communication with partners as well as other business
interactions.

Advertising: NLP can be used to detect new potential customers on social
media by evaluating their digital footprint, which powers targeted
campaigns.

Sentiment and context analysis: NLP helps in generating more granular
insights. Customer interactions or feedback can be evaluated not only for
general positive or negative sentiment but also classified according to the
context of what was being discussed.

Brand monitoring: Billions of social media interactions can be analyzed to
find out what customers are saying about your brand and your competitors’
brands. We have done an in-house project to develop a specifically tuned
‘Twitter ear’ that listens for and surfaces any conversations about topics
of interest.

Call center operations: A speech-to-text transcript of a call can be
generated and then evaluated using NLP to bring attention to the most
important inquiries. One can ascertain general customer satisfaction and
identify where the support staff needs training.

HR and recruiting: With NLP, recruiters can find candidates more
efficiently: it speeds up candidate search by surfacing relevant
resumes and helps craft bias-proof, gender-neutral job descriptions.

Chatbots: Gartner predicted that chatbots would account for 85% of customer
interactions by 2020. The next wave of chatbots is voice-driven: chatbots
that can understand human speech and ‘speak back,’ rather than
interacting in a text-based fashion.

What is the future of NLP?


Today, NLP strives to identify subtle distinctions in the meaning of
language, whether caused by spelling errors, lack of context, or
differences in dialect.

As an NLP experiment, Microsoft launched an Artificial
Intelligence (AI) chatbot named Tay on Twitter in 2016. The
idea was that the more users conversed with the chatbot, the
smarter it would get. However, within 16 hours of its launch, Microsoft had to
take Tay down because of its abusive and racist comments.

The tech giant learned a lot from this experience, and some months later it
released its second-generation English-language chatbot, Zo, which uses a
combination of advanced approaches to recognize and initiate conversation.
Other organizations are also experimenting with bots that remember details
from an individual discussion.

The future holds both challenges and opportunities for Natural Language
Processing, and the field is advancing faster than ever before. In the
coming years, it is likely to mature to a level where far more complex
applications become practical.

Conclusion

NLP and machine learning applications play a pivotal role in supporting
machine-human communication. With more research in this sphere, there
will be more developments that make machines smarter at learning and
understanding human language.
What is Parsing in NLP?
Parsing is the process of examining the grammatical structure and
relationships inside a given sentence or text in natural language
processing (NLP). It involves analyzing the text to determine the roles of
specific words, such as nouns, verbs, and adjectives, as well as their
interrelationships.

This analysis produces a structured representation of the text, allowing
NLP systems to understand how the words in a phrase connect to one
another. Parsers expose the structure of a sentence by constructing parse
trees or dependency trees that illustrate the hierarchical and syntactic
relationships between words.

This essential NLP stage is crucial for a variety of language-understanding
tasks, allowing machines to extract meaning, provide coherent answers,
and execute tasks such as machine translation, sentiment analysis, and
information extraction.

Types of Parsing in NLP

The types of parsing are core steps in NLP, allowing machines to
perceive the structure and meaning of text, which is required for a
variety of language processing activities. There are two main types of
parsing in NLP, as follows:
Syntactic Parsing

Syntactic parsing deals with a sentence’s grammatical structure. It involves
analyzing the sentence to determine parts of speech, sentence boundaries,
and word relationships. The two most common approaches are as
follows:

 Constituency Parsing: Constituency parsing builds parse trees that
break down a sentence into its constituents, such as noun phrases
and verb phrases. It displays a sentence’s hierarchical structure,
demonstrating how words are arranged into bigger grammatical units.
 Dependency Parsing: Dependency parsing depicts grammatical
links between words by constructing a tree structure in which each
word in the sentence depends on another. Because it focuses on word
relationships such as subject-verb-object relations, it is frequently
used in tasks such as information extraction and machine translation.
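To see what a dependency analysis looks like, here is the sentence “John is playing a game” expressed as (head, relation, dependent) triples. The relation labels (nsubj, aux, obj, det) follow common dependency conventions, but the triples here are written by hand for illustration, not produced by an actual parser.

```python
# Hand-written dependency analysis of "John is playing a game", shown as
# (head, relation, dependent) triples -- the kind of structure a
# dependency parser would produce.

dependencies = [
    ("playing", "nsubj", "John"),  # "John" is the subject of "playing"
    ("playing", "aux",   "is"),    # "is" is an auxiliary of "playing"
    ("playing", "obj",   "game"),  # "game" is the object of "playing"
    ("game",    "det",   "a"),     # "a" is the determiner of "game"
]

# In a well-formed dependency tree, every word except the root ("playing")
# has exactly one head:
heads = {dep: head for head, _, dep in dependencies}
print(heads["John"])  # -> playing
```

Contrast this with constituency parsing, which would instead group the same words into nested phrases (an NP “John” and a VP “is playing a game” containing the NP “a game”).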

Semantic Parsing

Semantic parsing goes beyond syntactic structure to extract a sentence’s
meaning, or semantics. It attempts to understand the roles of words in the
context of a certain task and how they interact with one another. Semantic
parsing is used in a variety of NLP applications, such as question
answering, knowledge base population, and text understanding. It is
essential for activities requiring the extraction of actionable information from
text.

Parsing Techniques in NLP


The fundamental link between a sentence and its grammar is derived from
a parse tree: a tree that shows how the grammar was used to construct the
sentence. There are two main parsing techniques, commonly known as
top-down and bottom-up.
Top-Down Parsing

 Using the top-down approach, the parser attempts to build a parse
tree from the root node S down to the leaves.
 The procedure begins with the assumption that the input can be
derived from the selected start symbol S.
 The next step is to find the tops of all trees that can begin with S,
by looking at the grammar rules with S on the left-hand side; this
generates all the possible trees.
 Top-down parsing is a search with a specific objective in mind: it
attempts to replicate the original derivation by rederiving the
sentence from the start symbol, so the production tree is recreated
from the top down.
 Top-down, left-to-right, and backtracking are prominent search
strategies used in this method.
 The search begins with the root node labeled S (the start symbol),
expands internal nodes using productions whose left-hand side
equals the internal node, and continues until the leaves are parts
of speech (terminals).
 If the leaf nodes (parts of speech) do not match the input string, we
must backtrack to the most recent node processed and apply another
production to it.

Let’s consider the grammar rule:

Sentence (S) = Noun Phrase (NP) + Verb Phrase (VP) + Prepositional Phrase (PP)

Take the sentence “John is playing a game” and apply top-down parsing.
If a predicted part of speech does not match the input string, the parser
backtracks to the node NP and tries its next production; the proper noun
“John” is matched, so when a predicted verb later fails to match the input,
the parser backtracks all the way to the node S.

The top-down technique has the advantage of never wasting time
investigating trees that cannot result in S, which means it never
examines subtrees that cannot find a place in some rooted tree.
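The top-down procedure above can be sketched as a small recursive descent parser with backtracking. The grammar is simplified relative to the rule stated above (it drops the optional prepositional phrase), and the lexicon is a hand-made assumption covering just this sentence.

```python
# Toy top-down (recursive descent) parser with backtracking.
# Grammar and lexicon are simplified assumptions for illustration.

GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["PNoun"], ["Det", "Noun"]],
    "VP": [["Aux", "Verb", "NP"]],
}
LEXICON = {"john": "PNoun", "is": "Aux", "playing": "Verb",
           "a": "Det", "game": "Noun"}

def parse(symbol, words, pos):
    """Try to derive words[pos:] from symbol; return (tree, new_pos) or None."""
    if symbol in GRAMMAR:                       # non-terminal: try each rule
        for rule in GRAMMAR[symbol]:
            children, p = [], pos
            for part in rule:
                result = parse(part, words, p)
                if result is None:
                    break                       # mismatch: backtrack, next rule
                child, p = result
                children.append(child)
            else:
                return (symbol, children), p
        return None
    # terminal (part of speech): match the next input word
    if pos < len(words) and LEXICON.get(words[pos]) == symbol:
        return (symbol, words[pos]), pos + 1
    return None

tree, end = parse("S", "john is playing a game".split(), 0)
print(tree)
```

When parsing the object, the parser first tries NP → PNoun, fails on “a”, and backtracks to NP → Det Noun, exactly the backtracking behavior described above.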

Bottom-Up Parsing

 Bottom-up parsing begins with the words of the input and attempts to
build trees from the words up, again by applying grammar rules one
at a time.
 The parse is successful if it builds a tree rooted in the start symbol S
that covers all of the input. Bottom-up parsing is a type of data-driven
search: it attempts to reverse the production process and reduce the
sentence back to the start symbol S.
 It reverses productions to reduce the string of tokens to the start
symbol, and the string is recognized by generating the rightmost
derivation in reverse.
 The goal of reaching the start symbol S is accomplished through a
series of reductions: when the right-hand side of some rule matches
a substring of the input string, the substring is replaced with the left-
hand side of the matched production, and the process is repeated
until the start symbol is reached.
 Bottom-up parsing can thus be thought of as a reduction process,
constructing the parse tree in postorder.
Considering the grammar rules stated above and the input sentence
“John is playing a game”, bottom-up parsing starts from the individual
words and reduces them step by step until the start symbol S is reached.
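The reduction process can be sketched as a toy shift-reduce parser over the same simplified grammar and lexicon used earlier (both assumptions made for this example). Each step either shifts the next word’s part of speech onto a stack or reduces a matching right-hand side to its left-hand side; a greedy strategy like this happens to work for this grammar, though real shift-reduce parsers use lookahead to decide when to reduce.

```python
# Toy bottom-up (shift-reduce) parser. Grammar and lexicon are simplified
# assumptions; the greedy reduce strategy works for this grammar only.

RULES = [                       # (left-hand side, right-hand side)
    ("NP", ["PNoun"]),
    ("NP", ["Det", "Noun"]),
    ("VP", ["Aux", "Verb", "NP"]),
    ("S",  ["NP", "VP"]),
]
LEXICON = {"john": "PNoun", "is": "Aux", "playing": "Verb",
           "a": "Det", "game": "Noun"}

def shift_reduce(words):
    stack, trace = [], []
    for word in words:
        stack.append(LEXICON[word])                 # shift
        trace.append(f"shift  -> {stack}")
        reduced = True
        while reduced:                              # reduce while possible
            reduced = False
            for lhs, rhs in RULES:
                if stack[-len(rhs):] == rhs:
                    del stack[-len(rhs):]           # pop the right-hand side
                    stack.append(lhs)               # push the left-hand side
                    trace.append(f"reduce -> {stack}")
                    reduced = True
                    break
    return stack, trace

stack, trace = shift_reduce("john is playing a game".split())
print(stack)   # a single ["S"] means the whole input was reduced
```

Reading the trace shows the postorder construction described above: “a game” reduces to NP, then “is playing NP” reduces to VP, and finally NP VP reduces to S.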

Parsers and Their Types in NLP

As previously stated, a parser is essentially a procedural interpretation of
a grammar. It searches through the space of possible trees to find the best
tree for the given text. Let’s look at some of the available parsers below.

 Recursive Descent Parser
o A recursive descent parser is a top-down parser that iteratively breaks down
the highest-level grammar rule into subrules. It is frequently implemented as a
set of recursive functions, each of which handles a certain grammar rule.
o This style of parser is frequently employed in hand-crafted parsers for simple
programming languages and domain-specific languages.
 Shift-Reduce Parser
o A shift-reduce parser is a sort of bottom-up parser that starts with the input
and builds a parse tree by performing a series of shift (transfer data to the
stack) and reduction (apply grammar rules) operations.
o Shift-reduce parsers are used in programming language parsing and are
frequently used with LR (Left-to-right, Rightmost derivation) or LALR (Look-
Ahead LR) parsing techniques.

 Chart Parser
o A chart parser is a parsing algorithm that parses efficiently by
using dynamic programming and chart data structures. To avoid
unnecessary work, it stores and reuses intermediate parsing results.
o The Earley parser is a type of chart parser that is commonly used for parsing
context-free grammars.

 Regexp Parser
o A regexp (regular expression) parser is used to match patterns and extract
text. It scans a larger text or document for substrings that match a specific
regular expression pattern.
o Text processing and information retrieval tasks make extensive use of regexp
parsers.
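A regexp parser is the easiest of these to demonstrate, since Python’s standard `re` module provides one. The sketch below scans text for substrings matching a deliberately simplistic email pattern (real email validation is far more involved).

```python
import re

# A regexp "parser" in the sense above: scan a larger text for substrings
# matching a pattern. The email pattern is intentionally simplistic.

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

text = "Contact alice@example.com or bob@mail.example.org for details."
print(EMAIL.findall(text))
```

Unlike the grammar-driven parsers above, a regexp parser recovers no hierarchical structure; it only locates flat matches, which is why it suits extraction tasks rather than full syntactic analysis.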

Each of these parsers serves a different purpose and has its own set of
benefits and drawbacks. The parser chosen is determined by the nature of
the parsing task, the grammar of the language being processed, and the
application’s efficiency requirements.

How Does the Parser Work?


The first step is to identify the sentence’s subject. The parser divides the
text sequence into groups of words that are associated with a phrase; such
a collection of mutually related words is referred to as the subject.

Syntactic parsing and part-of-speech tagging are based on context-free
grammar structures, which depend on the structure or arrangement of
words, not on their context.

The most important thing to remember is that a sentence can be
syntactically valid according to the grammar even if it makes no
contextual sense.

Applications of Parsing in NLP


Parsing is a key natural language processing approach for analyzing and
comprehending the grammatical structure of natural language text. Parsing
is important in NLP for various reasons. Some of them are mentioned
below:

Syntactic Analysis: Parsing helps determine the syntactic structure of
sentences by detecting parts of speech, phrases, and grammatical
relationships between words. This information is critical for understanding
sentence grammar.

Named Entity Recognition (NER): NER parsers can detect and classify
entities in text, such as the names of people, organizations, and locations,
among other things. This is essential for information extraction and text
comprehension.
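As a flavor of the task, here is a toy rule-based entity spotter: it treats runs of capitalized words as candidate entities. Real NER systems use trained statistical or neural sequence models; this sketch, including its small stopword list, is only an illustrative assumption.

```python
import re

# Toy rule-based named-entity spotter: runs of capitalized words that are
# not common sentence-initial words. Real NER uses trained models; the
# regex heuristic and stopword list here are illustrative assumptions.

STOP = {"The", "A", "An", "This", "It"}

def find_entities(text: str):
    runs = re.findall(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*", text)
    return [run for run in runs if run not in STOP]

print(find_entities("The team met Ada Lovelace in London last May."))
```

The heuristic already shows the hard parts of NER: “May” is matched here even though it is a date word, not a person or place, which is why classification into entity types needs more than capitalization.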

Semantic Role Labeling (SRL): SRL parsers determine the semantic
roles of words in a sentence, such as which word is the “agent,” “patient,” or
“instrument” in a given activity. This is essential for understanding the
meaning of sentences.

Machine Translation: Parsing can be used to assess source-language
syntax and generate syntactically correct translations in the target
language. This is necessary for machine translation systems such as
Google Translate.
Question Answering: Parsing is used in question-answering systems to
help break down a question into its grammatical components, allowing the
system to search a corpus for relevant replies.

Text Summarization: Parsing extracts the essential syntactic and
semantic structures of a text, which is necessary for producing short,
coherent summaries.

Information Extraction: Parsing is used to extract structured information
from unstructured text, such as data from resumes, news articles, or
product reviews.

End-Note
In NLP, parsing is the foundation for understanding the structure of human
language. Parsing is the bridge connecting natural language to
computational understanding, serving diverse applications like syntactic
analysis, semantic role labeling, machine translation, and more. As NLP
technology advances, parsing will continue to be a critical component in
improving language understanding, making it more accessible, responsive,
and valuable in a wide range of applications.
