

ADVANCES IN COMPUTER SCIENCE
Volume 7

Chief Editor
Dr. Mukesh Singla
Professor, Department of Computer Science and Engineering, Satpriya Group of Institutions, Rohtak, Haryana, India

AkiNik Publications
New Delhi
Published By: AkiNik Publications

AkiNik Publications
169, C-11, Sector - 3,
Rohini, Delhi-110085, India
Toll Free (India) – 18001234070
Phone No. – 9711224068, 9911215212
Email – akinikbooks@gmail.com

Chief Editor: Dr. Mukesh Singla

The author/publisher has attempted to trace and acknowledge the materials reproduced in this publication and apologizes if permission and acknowledgement to publish in this form have not been given. If any material has not been acknowledged, please write and let us know so that we may rectify it.

© AkiNik Publications
Publication Year: 2020
Pages: 162
Paperback ISBN: 978-93-90217-43-4
E-Book ISBN: 978-93-90217-44-1
Book DOI: https://doi.org/10.22271/ed.book.784
Price: ₹ 787/-
Contents

Chapters (Page No.)

1. Natural Language Processing: Applications, Techniques and Challenges (Irum Naz Sodhar, Akhtar Hussain Jalbani, Abdul Hafeez Buller, Azeem Ayaz Mirani and Anam Naz Sodhar) 01-25
2. Low False Rate Alarms Using Machine Learning Techniques (Praveen V, Juhitha K, Sandhya A, Harish B and Dr. C Narasimham) 27-48
3. Verifiable Encryption and Decryption using Biometric Authentication (V. Amulya, A Monika, MLR Sainadh, S. Pavan Kumar and Dr. C. Narasimham) 49-65
4. Face Recognition using Wavelet Filter Bank (WFB) Design (Dr. Mohd. Abdul Muqeet) 67-85
5. S. Beer Centurial Problem in Cybernetics and Methods of Its Resolution (Petro P. Trokhimchuck) 87-117
6. Learning Apps a Technological Advancement and Its Impact on the Memory of Children (Dr. Anindita Gupta) 119-129
7. Face Recognition using LBP-Based Adaptive Directional Wavelet Transform (Dr. Mohd. Abdul Muqeet) 131-144
8. Artificial Intelligence and Its Challenges (Manisha Verma) 145-162
Chapter - 1
Natural Language Processing: Applications,
Techniques and Challenges

Authors
Irum Naz Sodhar
Lecturer, Shaheed Benazir Bhutto University, Shaheed
Benazirabad, Sindh, Pakistan
Akhtar Hussain Jalbani
Associate Professor, Quaid-e-Awam University of Engineering,
Science & Technology, Nawabshah, Sindh, Pakistan
Abdul Hafeez Buller
Engineer, Quaid-e-Awam University of Engineering, Science
& Technology, Nawabshah, Sindh, Pakistan
Azeem Ayaz Mirani
Lecturer, Shaheed Benazir Bhutto University, Shaheed
Benazirabad, Sindh, Pakistan
Anam Naz Sodhar
Postgraduate Student, Quaid-e-Awam University of
Engineering, Science & Technology, Nawabshah, Sindh,
Pakistan

Chapter - 1
Natural Language Processing: Applications, Techniques
and Challenges
Irum Naz Sodhar, Akhtar Hussain Jalbani, Abdul Hafeez Buller, Azeem Ayaz Mirani and
Anam Naz Sodhar

Abstract
Natural language processing (NLP) is a branch of artificial intelligence that deals with the computational interpretation of natural language. It addresses one of the central challenges of artificial intelligence: interpreting human language and its complexities. NLP is a main focus of current research because of real-life applications including information retrieval, information extraction, machine translation, text simplification, sentiment analysis and text summarization. Text-processing software such as MS Word and other word processors are important NLP applications for automated grammar and spell checking, and Office 365, which provides both typing and dictation features, is a good example of an NLP application. The Internet of Things (IoT) has brought new ways to communicate and to translate content from one language to another. It eases communication for language learners, researchers and tourists in new places; several IoT devices have been introduced in the market for sharing and communication, and IoT has made it easy to use smart devices as language translators without a human assistant, for example smart language translators, the C-Pen and TT-easy. This chapter focuses on the applications, techniques, challenges and future contributions in the field of NLP.
Keywords: natural language processing (NLP), information retrieval, information extraction, machine translation, text simplification
I. Introduction
A natural language, or ordinary language, is any language that has evolved naturally in humans through use and repetition, without conscious planning. Natural language takes many forms, including speech, singing, facial expressions, signs and body gestures; it is a human adaptation built from words, signs, gestures and other activities. In recent years artificial intelligence has taken on important applications in human life, opening a new stage of technology with many new opportunities. As the importance of artificial intelligence has grown, many sub-fields have arisen to contribute to human life, and many new applications have appeared. Artificial intelligence now reaches into education, health, agriculture, natural language interpretation and many other aspects of life, and its rapid, efficient contribution shows its growing importance in real-life activities. Important fields of artificial intelligence include evolutionary computation, vision and robotics, expert systems, speech processing, planning, machine learning and natural language processing. This chapter is concerned with natural language processing.
1.1 Natural language processing (NLP)
Natural language processing (NLP) is a subfield of artificial intelligence that focuses on computational-linguistic interpretation. The field encompasses several areas of textual and audio interpretation, integrating statistical machine learning methods. It also covers pragmatic research in computational linguistics, which has become very broad and powerful through the implementation of various techniques (J. Li et al., 2016). The increasing availability and capability of NLP techniques improve computational language accuracy day by day, and NLP and machine learning are among the most active areas of research. NLP draws on other fields such as psychology, cognitive science and linguistics. It is concerned with computational and engineering models built to support human-computer interaction and human language understanding; to this end, several software packages have been developed for language modelling so that computational language can be interpreted easily.
Three broad concepts are commonly used in NLP.
1.1.1 Subjectivity
Subjective representation of the world is a basic concept of human psychology and the main source of subjective experience. In this context, subjective experiences are registered through the five basic senses and natural language (I. Li et al., 2018). Subjective consciousness of mind is formed as an integral part of the five basic senses: taction, olfaction, gustation, vision and audition. As a subject, one can hear voices, see images, taste flavours, feel touch and smell odours in a natural and philological context, which is also called natural or universal language. For this reason, NLP is sometimes referred to as the study of the structure of subjective experience.
1.1.2 Consciousness
Consciousness of the human mind is an important concern of NLP, which distinguishes two components: the conscious and the unconscious. Subjective representation that occurs within awareness is consciousness, whereas subjective representation that occurs outside awareness is referred to as the unconscious mind.
1.1.3 Learning
Human cognitive learning starts at birth and begins with the five active senses. NLP learning, in contrast, is based on derived learning principles termed modelling: a computerized, programmed model that bases subjective learning experience on the consciousness of mind formed by the five basic senses. NLP requires a detailed description of the sequence of sensory and linguistic representations, followed by a comprehensive and detailed codification process.
1.2 Natural language processing importance
NLP can be sub-divided into two broad categories, core areas and application areas, which deal with different kinds of research. The core areas of NLP investigate fundamental problems such as language modelling, capturing the naturally occurring relationships of words in a language. The core areas also cover morphological processing, that is, identifying the meaningful components of words; syntactic parsing, which builds the sentence diagrams used to attempt suitable processing of language text; and semantic processing, which distils the meaning of words, phrases, sentences and higher levels of abstraction in a piece of text. NLP is also a core part of personal improvement and the treatment of phobias and anxiety (Colneriĉ, N. 2018). Machine interpretations use perceptual and interactive thought together with communication techniques to conveniently reshape thoughts and ideas. Although NLP deals with converting text and understanding human language, it should not be assumed that NLP is concerned only with language and its interpretation; today it has a much broader area of interest.

1.3 Natural language processing history in short
Neuro-linguistic programming is an approach to human communication concerned with the translation of grammatical context; it was created by Richard Bandler for personal and interpersonal development and psychotherapy. Historically, the computational field emerged around 1950 with the integration of artificial intelligence and human natural language. Its basic core was information retrieval (IR) for natural language text, which employs highly scalable techniques to search and index large volumes of text in databases. Today most researchers consider that NLP as a field of the computing world appeared around 1970. Richard Bandler, a student at U.C. Santa Cruz, collaborated with Dr. John Grinder, and Leslie Cameron-Bandler, Robert Dilts, Judith DeLozier and David Gordon continued and expanded the work of Bandler and Grinder. Today NLP is the focus of many researchers building smart statistical models of language meaning (semantics) for human and machine understanding, and it is considered one of the most dominant areas of research in artificial intelligence and neuro-linguistic programming (NLP). NLP has also been characterized as a quasi-religion belonging to today's world, a categorization contributed by sociologists and anthropologists; Jean M. Langford, for example, categorized it as folk magic, implying a practice with symbolic efficacy.
1.4 Internet of things and natural language processing
The Internet of Things (IoT) is a thing-oriented (objects), internet-oriented (middleware) and semantic-oriented (information) network of objects. These three paradigms (objects, internet and semantics) are what make IoT-enabled devices able to sense, process and communicate in real-time environments. Objects with internet connectivity and different functionalities can communicate with each other without interference. The IoT consists of a series of connected devices that sense and share data to optimize performance; objects with limited battery life, small size and modest processing speed monitor the environment in the desired area of application. This sensing lets IoT devices see, hear, think and react, sharing data and information with each other to make decisions based on the obtained data. IoT devices are made smarter by underlying technologies that include communication technologies, ubiquitous and pervasive computing, embedded technology, internet protocols and physical architectures. Smart objects work on their relevant tasks with domain-specific information and applications, and IoT supports interaction among domain-specific applications across heterogeneous domain services, i.e. the sensors and actuators of each domain are responsible for communicating directly with each other. The Internet of Things is an emerging part of almost all fields of life: a collection of smart devices with internet connectivity, considered a main part of today's daily life because of the fast-growing population of smart devices. Many new applications have arisen because smart devices are easy to carry, and the smart world has grown more advanced due to the rapid growth of microcontrollers and nanotechnology. Smart nano-sensor devices act as human assistants by translating chemical and physical parameters of the real world that are difficult to measure, and by recording voice that can be translated, managed and processed over the internet; this is where cloud computing begins. Many smart devices offered by different companies are used as human language translators.
1.5 Feature selection methods
Feature selection is an important area of research in machine learning, pattern recognition, statistics and data mining. Selecting features, and subsets of content features, is important for classifying, recognizing and dealing with text: a feature subset is selected from the original feature set on the basis of feature relevance and redundancy. There are four main categories of features: completely irrelevant or noisy features, weakly relevant and redundant features, weakly relevant but non-redundant features, and strongly relevant features. Strongly relevant features are crucial for discriminative power and for improving prediction accuracy, and non-redundant features can be effective for prediction even when they are individually weak. Relevant features provide the accuracy needed for prediction, whereas irrelevant features do not contribute to prediction accuracy. Normally, adequate prediction accuracy requires selecting both strongly relevant and weakly relevant features, while noisy and irrelevant features should be removed to obtain a good and effective prediction model. It is therefore important to determine whether the relevance of a feature is weak or strong: a weak feature is not necessarily worthless, because it can become very important in combination with another strong feature.
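As a purely illustrative sketch (not part of the original chapter), the snippet below ranks text features by relevance with the chi-squared statistic, assuming scikit-learn and an invented four-document corpus with toy labels:

# Minimal, hypothetical sketch of relevance-based feature selection for text.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

docs = ["good service and friendly staff", "terrible food",
        "friendly staff, great food", "bad service"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative (toy labels)

vec = CountVectorizer()
X = vec.fit_transform(docs)                           # bag-of-words features
selector = SelectKBest(chi2, k=3).fit(X, labels)      # keep the 3 most relevant terms
selected = [vec.get_feature_names_out()[i] for i in selector.get_support(indices=True)]
print(selected)                                       # the strongly relevant terms survive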
2. NLP models in context
2.1 Machine learning models and approaches
Machine learning models learn from statistical input data and produce automated output. These models differ in nature and implement different, task-specific mathematical functions. Machine learning comprises many approaches used for efficient classification, clustering and accurate prediction, and these approaches are generally classified as supervised, unsupervised and semi-supervised machine learning, each with its own methods and models for data transformation, classification and prediction (Alishahi et al., 2019). The two categories described in detail below are:
1. Supervised machine learning models
2. Unsupervised machine learning models
2.1.1 Supervised machine learning
Supervised machine learning is the task of learning a function that maps an input to an output based on example input-output pairs. Supervised learning infers a function from labelled training data consisting of a set of training examples, where each pair consists of an input object and its desired output; it therefore relies on a set of inputs with labelled data (Jung & Lee, 2019).
2.1.1.1 Support vector machine (SVM)
The support vector machine brings machine learning into natural language processing for crucial computational-linguistics applications such as word classification and text categorization. SVMs help to investigate and solve conventional passive learning problems (Rameshbhai & Paulose, 2019), and they can address natural language processing issues such as imbalanced training data and the difficulty of obtaining sufficient training data. SVMs are a crucial part of natural language processing research in tasks such as POS (part-of-speech) tagging, word sense disambiguation, NP (noun-phrase) chunking, information extraction, relation extraction, semantic role labelling and dependency analysis. These applications involve multi-class classification: the multi-class task is first converted into binary classification problems, a classifier is trained for each binary problem, and finally the combined classifier result is obtained.

Fig 1: Support vector machine

Figure 1 shows how the support vector machine algorithm separates mixed data into classes. An SVM is based on a hyperplane in feature space with maximal distance, i.e. margin, to all training examples. Overall, the SVM is a strong classification algorithm with better generalization capability on unseen data than other classification methods such as KNN or decision trees (Jung & Lee, 2019), (Rajendran et al., 2019).
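To make the text-categorization use concrete, here is a minimal sketch, assuming scikit-learn and an invented four-document training set (none of this comes from the chapter itself):

# Minimal, hypothetical sketch: linear SVM for text categorization.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

train_docs = ["stock prices fell sharply", "the team won the final",
              "markets rallied on earnings", "coach praises young striker"]
train_labels = ["finance", "sports", "finance", "sports"]

clf = make_pipeline(TfidfVectorizer(), LinearSVC())  # TF-IDF features + max-margin classifier
clf.fit(train_docs, train_labels)
print(clf.predict(["shares dropped after the report"]))  # expected: ['finance']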
2.1.1.2 Decision trees
Decision trees are a supervised machine learning approach and an important tool for efficient decision making. A decision tree organizes the data set hierarchically to support decisions in different machine learning applications, and an important part of the approach is selecting the attribute tested at each node of the tree. Decision trees are used to classify unseen instances: a tree induced from the data yields rules that can be extracted and applied. In natural language processing, decision trees can perform well in ambiguous situations; disambiguation problems, phonetic ambiguities and dialogue endings are areas where they perform well. An important aspect of language applications is handling morphologically complex languages through morphological parsing, and the structures produced by morphological parsers can take many forms, such as strings, trees or networks. The next level is semantic knowledge: drawing on different levels of semantics and the arrangement of words can help to recognize the correct sense of a word or to detect that it is ambiguous (Jung & Lee, 2019).

Fig 2: Decision Node

Figure 2 shows a decision tree. Decision trees can also be applied to probabilistic grammars, which is important for resolving prepositional-phrase attachment, and they play an important role in developing statistical models for parsing.
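A hedged sketch of the disambiguation use mentioned above, assuming scikit-learn and a toy, invented data set for two senses of the word "bank":

# Minimal, hypothetical sketch: decision tree for a toy word-sense task.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier

contexts = ["river bank was flooded", "deposit money at the bank",
            "bank of the river eroded", "the bank approved the loan"]
senses = ["river", "finance", "river", "finance"]

vec = CountVectorizer()
X = vec.fit_transform(contexts)
tree = DecisionTreeClassifier(max_depth=3).fit(X, senses)  # each node tests one word feature
print(tree.predict(vec.transform(["she walked along the river bank"])))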
2.1.1.3 Random forest
Random forest is a supervised machine learning model. A random forest classifies on the basis of the results obtained from an ensemble of decision trees (I. Li et al., 2019): the predicted output is the mode of the outputs of the individual trees in the forest. A random forest grows each tree on a random sample of the training data, so model performance improves and overfitting is controlled by reducing the variance of the overall model. The method is based on a collection of randomly constructed decision trees, each trained on a different subsample of the dataset.

Fig 3: Random Forest

Figure 3 illustrates how a random forest works in classification and regression applications. In language modelling, random forests are a useful approach to predicting text from randomly grown decision trees (DT), and they can generalize the data to a more advanced level for text prediction.
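The following sketch is illustrative only: it assumes scikit-learn and an invented toy corpus, and uses a random forest to predict the next word from the two preceding words, in the spirit of the text-prediction use described above:

# Minimal, hypothetical sketch: random forest as a toy next-word predictor.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer

corpus = "the cat sat on the mat the dog sat on the rug".split()
X_dicts = [{"w1": corpus[i], "w2": corpus[i + 1]} for i in range(len(corpus) - 2)]
y = [corpus[i + 2] for i in range(len(corpus) - 2)]  # the next word to predict

vec = DictVectorizer()
X = vec.fit_transform(X_dicts)                        # one-hot encode the two context words
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(forest.predict(vec.transform([{"w1": "sat", "w2": "on"}])))  # likely 'the'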
2.1.1.4 K-nearest neighbour
K-nearest neighbour is a supervised statistical model based on pattern recognition. It has important applications in natural language processing, e.g. text categorization. K-nearest neighbour searches for the nearest neighbours of a test document among pre-classified text documents: it computes similarity scores and ranks the k neighbours by similarity (Jung & Lee, 2019).

Fig 4: K nearest neighbour

Figure 4 shows how KNN finds the neighbours of a test point within a set of test or similar data. The similarity can then be used to predict the class of a text document: if more than one neighbour belongs to the same category, their scores are summed as the weight of that category, and the category with the highest score is assigned to the test document (Al-Makhadmeh & Tolba, 2019).
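A minimal sketch of KNN text categorization, assuming scikit-learn, cosine similarity over TF-IDF vectors and invented documents:

# Minimal, hypothetical sketch: KNN text categorization.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

docs = ["cheap flights and hotels", "goal scored in extra time",
        "book your holiday package", "league title race tightens"]
labels = ["travel", "sports", "travel", "sports"]

knn = make_pipeline(TfidfVectorizer(),
                    KNeighborsClassifier(n_neighbors=3, metric="cosine"))
knn.fit(docs, labels)
print(knn.predict(["last minute hotel deals"]))  # the nearest labelled documents decide the class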
2.1.2 Unsupervised machine learning
Unsupervised learning refers to learning from a data set without supervision, i.e. without known labels. In contrast to supervised learning methods, unsupervised machine learning cannot perform classification or regression directly, because the data sets are untrained and unlabelled, which makes it difficult to train a model in the usual way. However, unsupervised learning can be used to discover the structure of the data, and it is an important approach for applications in which previously unknown data patterns must be discovered. It should be kept in mind that these patterns are often only rough approximations of what supervised learning achieves. Unsupervised learning is often used in situations where the desired output is not available for the application or analysis, e.g. determining whether the targets of student learning are fully satisfied (Jung & Lee, 2019).
2.1.2.1 Clustering
Clustering is a machine learning approach that groups unsupervised or unlabelled data points into chunks or clusters, placing similar data into the same cluster (Liu et al., 2019), while objects in different clusters differ from each other. A given data set can be divided into clusters by applying different clustering algorithms. These algorithms fall into two categories: partitioning algorithms, which produce a flat partition, and hierarchical algorithms, which follow a hierarchical structure. There are a number of clustering methods in machine learning that can be used in NLP for text categorization and text clustering, such as k-means clustering, fuzzy clustering, density-based clustering and model-based clustering (Alishahi et al., 2019).
2.1.2.2 K-means clustering
K-means clustering is an important approach in unsupervised machine learning. It is used to evaluate and support decision making on unlabelled data (i.e. undefined or ungrouped data). K-means defines k centroids and assigns every data point to its nearest cluster; the "mean" in k-means refers to averaging the data, i.e. finding the centroids (Alishahi et al., 2019).

Fig 5: K-means clustering

Figure 5 shows how k clusters are used to label data in a training data set. The mean distance between data points and their cluster centroid is the most important factor in choosing the value of k. Clustering is important when a large volume of data points must be grouped by similar format or closely related characteristics.
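As an illustration (invented sentences, assuming scikit-learn), k-means can group TF-IDF document vectors around k centroids as follows:

# Minimal, hypothetical sketch: k-means clustering of short texts.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = ["interest rates and inflation", "striker signs new contract",
        "central bank policy meeting", "midfielder injured before derby"]

X = TfidfVectorizer().fit_transform(docs)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)  # 2 centroids, each point joins the nearest
print(km.labels_)  # e.g. [0 1 0 1]: a finance-like and a sports-like cluster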
2.1.2.3 Fuzzy clustering
Fuzzy clustering is a clustering method in which one piece of data is allowed to belong to more than one cluster. Fuzzy clustering is a partitioning method that generalizes membership, so that a data point has a degree of membership in each cluster; the clusters themselves are identified by similarity measures (Al-Makhadmeh & Tolba, 2019).

Fig 6: Fuzzy clustering

Figure 6 shows fuzzy clustering, which produces a fuzzy partition of a finite set of n elements. Important application areas for fuzzy clustering include bioinformatics, image analysis and marketing.
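A hedged sketch of fuzzy c-means, assuming the scikit-fuzzy package (skfuzzy) is installed; the points are invented and the call follows that package's cmeans interface:

# Minimal, hypothetical sketch: fuzzy c-means with scikit-fuzzy (assumed installed).
import numpy as np
import skfuzzy as fuzz

# Toy 2-D points forming two loose, slightly overlapping groups; shape is (features, samples).
points = np.array([[1.0, 1.2, 0.9, 5.0, 5.2, 4.8],
                   [1.1, 0.9, 1.0, 5.1, 4.9, 5.0]])

cntr, u, *rest = fuzz.cluster.cmeans(points, 2, 2.0, error=0.005, maxiter=100)
print(u.round(2))  # membership of every point in each cluster; each column sums to 1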
2.1.2.4 Hidden Markov model
A hidden Markov model is a finite set of states, each of which is associated with a probability distribution; transitions among the states are governed by a set of transition probabilities. The HMM is a statistical model that is a variation on the Markov chain. It has many rich applications in machine learning, NLP and data mining tasks, including text pattern recognition, handwriting recognition, speech synthesis, part-of-speech tagging and speech recognition (Al-Makhadmeh & Tolba, 2019). Other applications include gene prediction, machine translation and time-series analysis. An HMM helps a program reach the most likely decision based on both previous decisions (such as a previously recognized word or sentence) and the current data (such as an audio snippet).

Fig 7: Hidden Markov model

Figure 7 outlines the three steps of an HMM. HMMs are widely used today in word-vocabulary and word-management games that help improve language skills, as well as in natural language processing, i.e. computational processing.
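To make the decision logic concrete, here is a minimal, self-contained Viterbi sketch over a two-state HMM; the states, probabilities and sentence are invented for illustration and are not taken from the chapter:

# Minimal, hypothetical sketch: Viterbi decoding for a toy POS-tagging HMM.
states = ["NOUN", "VERB"]
start_p = {"NOUN": 0.6, "VERB": 0.4}
trans_p = {"NOUN": {"NOUN": 0.3, "VERB": 0.7}, "VERB": {"NOUN": 0.8, "VERB": 0.2}}
emit_p = {"NOUN": {"dogs": 0.5, "bark": 0.1, "cats": 0.4},
          "VERB": {"dogs": 0.1, "bark": 0.8, "cats": 0.1}}

def viterbi(words):
    # best[t][s] = (probability of the best path ending in state s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s].get(words[0], 1e-6), None) for s in states}]
    for w in words[1:]:
        best.append({s: max(((best[-1][p][0] * trans_p[p][s] * emit_p[s].get(w, 1e-6), p)
                             for p in states), key=lambda x: x[0]) for s in states})
    tag = max(states, key=lambda s: best[-1][s][0])   # most probable final state
    path = [tag]
    for t in range(len(words) - 1, 0, -1):            # backtrack through stored predecessors
        tag = best[t][tag][1]
        path.append(tag)
    return list(reversed(path))

print(viterbi(["dogs", "bark"]))  # expected: ['NOUN', 'VERB']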
2.1.2.5 Association analysis
Association rules can be used, with some modifications, to mine associations between items in unstructured data (Al-Makhadmeh & Tolba, 2019).

Fig 8: Association analysis

Figure 8 shows association rules. They can help to generate a statistical thesaurus, to search large statistical data sets, and to mine grammatical and lexical rules efficiently; rules are retained when they meet a user-specified minimum support and a user-specified minimum confidence.
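A small sketch of the two thresholds mentioned above, computing support and confidence for one candidate rule over invented transactions:

# Minimal, hypothetical sketch: support and confidence of one association rule.
transactions = [{"noun", "verb"}, {"noun", "adjective"},
                {"noun", "verb", "adverb"}, {"verb"}]

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

antecedent, consequent = {"noun"}, {"verb"}
rule_support = support(antecedent | consequent)   # fraction of transactions with both items
confidence = rule_support / support(antecedent)   # how often the consequent follows the antecedent
print(rule_support, round(confidence, 2))         # 0.5 and 0.67; keep the rule if both exceed the minimums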
3. Applications of NLP (natural language processing)
Natural language processing (NLP) is a field of computer science, artificial intelligence and computational linguistics concerned with the interaction between computers and humans through natural languages such as English, Arabic and Urdu; humans interact with computers by means of NLP (Olex et al., 2019). Artificial neural networks, which model nonlinear processes, also provide tools to solve NLP-related problems such as classification (multilingual classification, word labelling, named entity recognition, POS tagging, sentiment analysis, document identification, gender identification and so on), clustering (automatic document arrangement, theme extraction, grouping text into different classes, handling sets of unlabelled text, filtering text and so on), regression (examining the relationship between two or more dependent and independent variables), pattern recognition in different settings, decision making, visualization, computer vision and others (I. Li et al., 2019).
The four most important applications of NLP (natural language processing) are:
1. Machine translation
2. Automatic summarization
3. Sentiment analysis
4. Text classification and question answering
3.1 Machine translation
Machine translation (MT) is the sub-area of computational linguistics that seeks to use software to translate text or speech from one language to another. It is an area of study in which the computer learns to translate using programmed models, and machine translation is one of the most motivating technologies one comes across. Computers now behave more like humans in this respect, because NLP applications allow them to interact with different people at the same time without human effort. Machine translation is the automated translation of text in either written or spoken form.

3.1.1 Types of MT (machine translation)
3.1.1.1 Rule-based machine translation (RBMT)
RBMT, developed many years ago, was the first practical approach to machine translation. It works by parsing a source sentence to identify the text and analyse its structure, and then converting it into the target language based on linguistic rules; the rules for the language pair are defined explicitly. Rule-based translation has largely been replaced by statistical machine translation or hybrid systems (Olex et al., 2019), (Jung & Lee, 2019).
3.1.1.2 Statistical machine translation (SMT)
Statistical machine translation relies on training with huge amounts of data from multilingual, bilingual or monolingual corpora. The system learns the relation between source text and its translation, and both are used by the SMT system to generate the final result, i.e. the translation the given source text should receive; the translation itself does not use explicit rules of grammar and punctuation. The machine translation engines most used today include Google Translate and Bing Translator, and other machine translation tools are available for linguistic translation on different platforms. SMT makes it possible to measure the percentage of the text that has been translated and avoids handcrafted translation rules (Jung & Lee, 2019).
3.1.1.3 Example-based machine translation (EBMT)
In an example-based machine translation (EBMT) system, a sentence is translated by analogy with previously translated examples. Translating a single sentence is easy and the result will most probably be accurate; translating a huge amount of text in a single pass, however, produces results with a lot of ambiguity, and large numbers of sentences take a long time to execute (Alishahi et al., 2019), (Al-Makhadmeh & Tolba, 2019).
3.1.1.4 Hybrid
In the first configuration, text is translated first by the RBMT engine and then processed by the statistical machine, which corrects errors where they occur. In the second configuration, the RBMT engine does not translate the text itself but supports the SMT engine by supplying input data (Al-Makhadmeh & Tolba, 2019). The two main components of hybrid systems are:
1. Rule-based machine translation (RBMT)
2. Statistical machine translation (SMT)
A hybrid system uses both approaches, rule-based machine translation (RBMT) and statistical machine translation (SMT). Currently, many researchers are taking up the neural machine translation (NMT) approach from different perspectives.
3.2 Automatic summarization
Automatic text summarization, a common problem in natural language processing (NLP), reduces a text to a shorter piece of text that preserves its key content. There are two main approaches to summarizing text in NLP: extraction-based and abstraction-based summarization (Al-Makhadmeh & Tolba, 2019).
3.2.1 Extraction-based summarization
The extraction-based summarization technique takes a text document and combines selected parts of the text to make a summary; the extract is built entirely from the existing text, without any change (Al-Makhadmeh & Tolba, 2019).
For example: Text: Ali and Alia went to attend the marriage ceremony of a cousin in Dhaka. In the city, Alia gave birth to a child named Yousif.
Text summary: Ali and Alia attend marriage ceremony Dhaka. Alia birth Yousif. Here "Ali", "Alia", "attend", "marriage ceremony", "birth" and "Yousif" are extracted words combined to generate the summary. Sometimes such a summary is completely out of sense.
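As a hedged illustration of extraction-based summarization (not the chapter's own method), the sketch below scores sentences by word frequency and keeps the highest-scoring one; the text is an invented paraphrase of the toy example above:

# Minimal, hypothetical sketch: frequency-based extractive summarization.
import re
from collections import Counter

text = ("Ali and Alia went to Dhaka to attend a marriage ceremony. "
        "The ceremony in Dhaka was large. Alia gave birth to Yousif in the city.")

sentences = re.split(r"(?<=\.)\s+", text)
freq = Counter(re.findall(r"[a-z]+", text.lower()))          # word frequencies over the whole text

def score(sentence):
    return sum(freq[w] for w in re.findall(r"[a-z]+", sentence.lower()))

summary = max(sentences, key=score)                           # keep the highest-scoring sentence verbatim
print(summary)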
3.2.2 Abstraction-based summarization
Abstraction-based summarization is a technique that shortens or rephrases the text document. Abstraction is applied to text summarization using deep learning, and it can remove grammatical issues from the generated text. The abstraction technique works much as a human does, and abstraction is better than extraction; however, the text summarization algorithms needed for abstraction are hard to build, which is why extraction is commonly used.
Abstraction text summary: Ali and Alia came to Dhaka, where Yousif was born.
3.3 Sentiment analysis
Sentiment analysis, also called opinion mining, is the area of natural language processing that develops systems that try to identify and extract views expressed in text (Kim, 2014), (Rameshbhai & Paulose, 2019). Once a system identifies the views, it extracts the attributes of the statement. Opinions and views can be framed as follows: text can be divided into two components, facts and opinions/views. Facts are objective expressions in the text, whereas opinions/views are usually subjective expressions that convey public views, sentiments and feelings about a topic. Sentiment analysis has two sub-problems: subjectivity classification and polarity classification (Liu et al., 2019). Subjectivity classification means classifying a sentence as subjective or objective, while polarity classification means deciding whether it expresses a positive, neutral or negative opinion/view (J. Li et al., 2016b).
3.3.1 Levels of sentiment analysis
There are three levels of sentiment analysis of text:
Level 1: (Document) sentiment analysis takes the views expressed by a whole document or section.
Level 2: (Sentence) sentiment analysis considers the views expressed by a single sentence or statement.
Level 3: (Sub-sentence) sentiment analysis takes the views expressed by sub-expressions inside a single statement.
3.3.2 Algorithms of sentiment analysis
There are many algorithms and methods available for implementing sentiment analysis systems; they are classified as (J. Li et al., 2016b), (Rajendran et al., 2019):
• Rule-based: systems that perform sentiment analysis on the basis of a set of rules applied to the text
• Automatic: systems that depend on machine learning techniques applied to the text
• Hybrid: systems that combine the rule-based and automatic approaches
3.3.2.1 Rule-based approach
Rule-based approaches use a set of hand-crafted rules over the language to identify subjectivity, polarity or the subject of an opinion/view. Rules are defined over inputs produced by steps such as stemming, tokenization, part-of-speech tagging and parsing; other resources include lexicons (i.e. lists of words and expressions) (Jung & Lee, 2019).
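A minimal sketch of a rule-based scorer, using a tiny invented lexicon and a single negation rule; a real system would use much larger lexicons and richer rules:

# Minimal, hypothetical sketch: lexicon-based (rule-based) polarity scoring.
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}
NEGATIONS = {"not", "never", "no"}

def polarity(sentence):
    words = sentence.lower().split()
    score = 0
    for i, w in enumerate(words):
        hit = (w in POSITIVE) - (w in NEGATIVE)      # +1, -1 or 0 for each word
        if i > 0 and words[i - 1] in NEGATIONS:      # a simple negation rule flips the hit
            hit = -hit
        score += hit
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(polarity("the food was not good"))              # negative
print(polarity("great service and excellent food"))   # positive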
3.3.2.2 Automatic approach
The automatic approach does not rely on manually created rules. Text classification is performed to identify the sentiment of the text, and the response is returned as a category (positive, neutral or negative) using a classification algorithm (Rajendran et al., 2019). Such algorithms include naive Bayes, linear regression, support vector machines and deep learning (Al-Makhadmeh & Tolba, 2019).

3.3.2.3 Hybrid approach
The hybrid approach combines the rule-based and automatic approaches, using both in order to obtain better results.
3.4 Text classification and question answering
Text classification is the technique of dividing text into categories according to the needs of the context. Free text is available from many sources such as newspapers, social media, chat boxes, online forums and so on, and text classification can be used to arrange this text in a suitable format. It is a basic task of natural language processing (NLP) to classify text at the level of documents, paragraphs, sentences, words, letters and so on.

Fig 9: Text classification

Text classification can be done in two ways, manually or automatically: manual classification is done by humans and automatic classification by software, and the automatic approach is fast and effective. There are many techniques for automatic text classification, but three are most used: rule-based systems (RBS), machine learning based systems (MLBS) and hybrid systems (HS).
3.4.1 Rule-based system (RBS)
Rule-based techniques classify text using a set of rules developed manually for the texts. A rule-based system (RBS) is easily understood by humans: it is easy to check whether a rule has been applied to the text or not, and easy to improve the rules at low cost. The technique also has drawbacks: the developer must be familiar with the domain and have appropriate information about the system, so building the rules is a very time-consuming process, and rule-based systems can be difficult to implement (Al-Makhadmeh & Tolba, 2019).
3.4.2 Machine learning based system (MLBS)
Text classification using machine learning builds the classification of new text on previous results. It starts with the collection of data, followed by pre-processing, and then training on the data, i.e. extracting the appropriate text features; a machine learning algorithm can then be used for different purposes (Jung & Lee, 2019). Naive Bayes, support vector machines and deep learning are the algorithms most used for text classification.
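A minimal sketch of the collect, pre-process, train and predict flow described above, assuming scikit-learn and an invented labelled data set:

# Minimal, hypothetical sketch: machine-learning-based text classification.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_docs = ["win a free prize now", "meeting agenda attached",
              "claim your free reward", "project report for review"]
train_labels = ["spam", "ham", "spam", "ham"]

model = make_pipeline(TfidfVectorizer(), MultinomialNB())  # pre-processing and classifier in one pipeline
model.fit(train_docs, train_labels)                        # training on previously labelled text
print(model.predict(["free prize inside"]))                # expected: ['spam']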
3.4.3 Hybrid system (HS)
A hybrid system is a combination of a rule-based system (RBS) and a machine learning based system, used to further improve the outcome. The technique is very useful because it improves the quality of text classification.
3.5 Question answering
Question answering is a field of information retrieval and natural language processing that focuses on how a system can automatically answer questions posed by users in natural language. The computer interprets the natural-language text input, translating between languages where necessary, so that the system can produce a valid answer, i.e. an answer to the question the user actually asked.
4. NLP techniques
4.1 Challenges in applications of natural language processing
4.1.1 Text format
The main challenges in machine translation are the syntactic and semantic translation problems that arise for words and phrases. In many languages a single keyword has several senses; a human can recognize the right sense of a word with several meanings, but a computer cannot understand human language just by looking at the text, and the hidden meaning of the text may be lost when the translator does not recognize it. Syntactic problems occur because of the variety of differences between languages (Alishahi et al., 2019).
4.1.2 Quality
A major problem in machine translation is the use of words: a single word has different meanings in different contexts, and it is not always possible for software to understand the nature of the words and the situation. Human beings understand context, especially in different situations, and recognize emotions, non-verbal communication and so on (J. Li et al., 2016b).
4.1.3 Lexical gap
In natural language processing the same meaning can be expressed in a variety of ways. Because a question can usually only be answered if every concept it refers to is recognized, the lexical gap significantly raises the number of questions a system must be able to answer. There are three types of lexical gap. The first is phonological: for example, English allows the consonant cluster /spr/ at the beginning of a word, as in "spring" or "spray", but not every combination of consonants can occur. The second is the morphological gap, which concerns word forms that could exist but are in fact absent. The third is the semantic gap, which concerns the meanings that words can describe, for example gender identification from words.
4.1.4 Ambiguity
Ambiguity is the occurrence of the same phrase having a variety of meanings; it can be structural or syntactic. When one word, phrase, sentence or paragraph has more than one meaning it is referred to as ambiguous, and ambiguity is pervasive in natural language. The main types of analysis affected by ambiguity are lexical analysis, syntactic analysis, semantic analysis, discourse analysis and pragmatic analysis.
4.1.4.1 Lexical analysis
Lexical ambiguity occurs when a word itself is ambiguous. It can be resolved by lexical category disambiguation, that is, POS tagging. Since many words may belong to more than one lexical category, POS tagging is the process of assigning a part of speech or lexical category, such as noun, verb, pronoun, preposition, adverb or adjective, to each word.
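As a hedged example, assuming NLTK and its tagger models are installed (resource names can vary across NLTK versions), lexical category disambiguation by POS tagging looks like this:

# Minimal, hypothetical sketch: POS tagging with NLTK to resolve lexical ambiguity.
import nltk

# One-time model downloads; quiet no-ops if already present.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = nltk.word_tokenize("They can fish near the bank")
print(nltk.pos_tag(tokens))
# In this context 'can' is typically tagged as a modal (MD) and 'fish' as a verb (VB),
# even though both words could also be nouns.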
4.1.4.2 Syntactic analysis
Structural ambiguities give rise to syntactic ambiguities, of which there are two kinds: scope ambiguity and attachment ambiguity. Scope ambiguity involves operators and quantifiers (the amount of text they cover). Attachment ambiguity arises from uncertainty about which part of a sentence or paragraph a phrase or clause attaches to.
4.1.4.3 Semantic analysis
In semantic analysis, the meaning of the words themselves can be misunderstood: even after the syntax and the meanings of the individual words have been resolved, there may be more than one way of reading the sentence. Semantic ambiguity arises from the fact that a computer is generally not in a position to distinguish what is logical from what is not.
4.1.4.4 Discourse analysis
Discourse analysis concerns shared knowledge and interpretation drawn from outside the sentence, using context.

4.1.5 Multilingualism
Multilingual data involves text in more than one language: data that cover more languages are known as multilingual data and must include at least two languages. Information on the web is present in different languages; web users have different native languages, which is why native languages are widely used on the web for different purposes. Natural language processing tasks, from automatic analysis of syntax to semantics and discourse, as well as machine learning techniques (both supervised and unsupervised), must therefore cope with this. The most important reason for using multilingual data is to extract the linguistic information present in it, but it is often the case that extracting such information from the data requires a linguistic analysis of each language.
4.1.6 Named entity extraction
Named entity extraction (also known as named entity recognition, entity identification or entity chunking) is a subtask of information extraction that tries to locate and classify named entities in unstructured data. The core issue of named entity extraction is to recognize named entities, which are generally organized into person names, locations and organizations, in a textual document; in general only features, labels, topics, locations and so on are extracted. Spelling mistakes occur during typing or conversation; artificial intelligence, as a broad field, uses machine learning to correct spelling automatically, which means a system is able to correct misspelled words and still extract the entities.
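A hedged example of named entity extraction, assuming spaCy and its small English model (en_core_web_sm) are installed; the sentence is invented:

# Minimal, hypothetical sketch: named entity extraction with spaCy.
import spacy

nlp = spacy.load("en_core_web_sm")  # small English pipeline, installed separately
doc = nlp("Irum Sodhar joined Quaid-e-Awam University in Nawabshah in 2019.")
print([(ent.text, ent.label_) for ent in doc.ents])
# Typical labels include PERSON, ORG, GPE and DATE, though exact spans depend on the model version.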
References
1. Li J, Monroe W, Jurafsky D. Understanding neural networks through
representation erasure, 2016. arXiv preprint arXiv:1612.08220.
2. Fabbri AR, Li I, Trairatvorakul P, He Y, Ting WT, Tung R et al.
Tutorialbank: A manually-collected corpus for prerequisite chains,
survey extraction and resource recommendation, 2018. arXiv preprint
arXiv:1805.04617.
3. Colneriĉ N, Demsar J. Emotion recognition on Twitter: Comparative study and training a unison model. IEEE Transactions on Affective Computing, 2018.
4. Grinder J, Bandler R. Reframing: Neuro-linguistic programming and the
transformation of meaning. Moab, UT: Real People Press, 1983, 156.
5. Langford JM. Medical mimesis: healing signs of a cosmopolitan"
quack". American Ethnologist. 1999; 26(1):24-46.

6. Al-Makhadmeh Z, Tolba A. Automatic hate speech detection using killer natural language processing optimizing ensemble deep learning approach. Computing, 2019. https://doi.org/10.1007/s00607-019-00745-0.
7. Alishahi A, Chrupała G, Linzen T. Analyzing and interpreting neural
networks for NLP: A report on the first Blackbox NLP workshop.
Natural Language Engineering. 2019; 25(4):543-557.
https://doi.org/10.1017/S135132491900024X
8. Jung N, Lee G. Automated classification of building information
modeling (BIM) case studies by BIM use based on natural language
processing (NLP) and unsupervised learning. Advanced Engineering
Informatics. 2019; 41:100-917. https://doi.org/10.1016/j.aei.2019.04.007
9. Kim Y. Convolutional neural networks for sentence classification.
EMNLP 2014-2014 Conference on Empirical Methods in Natural
Language Processing, Proceedings of the Conference, 2014, 1746-1751.
https://doi.org/10.3115/v1/d14-1181
10. Li I, Fabbri AR, Tung RR, Radev DR. What Should I Learn First:
Introducing LectureBank for NLP Education and Prerequisite Chain
Learning. Proceedings of the AAAI Conference on Artificial
Intelligence. 2019; 33:6674-6681.
https://doi.org/10.1609/aaai.v33i01.33016674.
11. Li J, Chen X, Hovy E, Jurafsky D. Visualizing and understanding neural
models in NLP. Conference of the North American Chapter of the
Association for Computational Linguistics: Human Language
Technologies, NAACL HLT 2016-Proceedings of the Conference,
2016a, 681-691.
12. Li J, Chen X, Hovy E, Jurafsky D. Visualizing and understanding neural
models in NLP. Conference of the North American Chapter of the
Association for Computational Linguistics: Human Language
Technologies, NAACL HLT -Proceedings of the Conference, 2016b,
681-691. https://doi.org/10.18653/v1/n16-1082
13. Liu Z, Zhu H, Chong TY. An NLP-PCA based trading strategy on the Chinese stock market. 2019; 334:80-89. https://doi.org/10.2991/hsmet-19.2019.16.
14. Olex A, Maffey L, McInnes B. NLP Whack-a-Mole: Challenges in cross-domain temporal expression extraction. NAACL. 2019; 2:3682-3692. https://www.aclweb.org/anthology/N19-1369

15. Rajendran A, Zhang C, Abdul-Mageed M. UBC-NLP at SemEval-2019 Task 6: Ensemble learning of offensive content with enhanced training data, 2019, 775-781. https://doi.org/10.18653/v1/s19-2136
16. Rameshbhai CJ, Paulose J. Opinion mining on newspaper headlines
using SVM and NLP. International Journal of Electrical and Computer
Engineering. 2019; 9(3):2152-2163.
https://doi.org/10.11591/ijece.v9i3.pp2152-2163.
