Unit V
Ontologies are formal definitions of vocabularies that allow us to define difficult or complex
structures and new relationships between vocabulary terms and members of classes that we
define. Ontologies generally describe specific domains such as scientific research areas.
Example:
An ontology depicting a movie:
Components:
1. Individuals –
Individuals are also known as instances of objects or concepts. They may or may not be present in an ontology, and they represent the atomic level of an ontology.
For example, in the above ontology of movie, individuals can be a film (Titanic), a director
(James Cameron), an actor (Leonardo DiCaprio).
2. Classes –
Sets or collections of various objects are termed classes.
For example, in the above ontology representing movie, movie genre (e.g. Thriller, Drama),
types of person (Actor or Director) are classes.
3. Attributes –
Properties that objects may possess.
For example, a movie is described by the set of ‘parts’ it contains like Script, Director, Actors.
4. Relations –
Ways in which concepts are related to one another.
For example, as shown above in the diagram a movie has to have a script and actors in it.
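The four components above can be sketched in code. Below is a minimal Python sketch of the movie ontology, in which the class names, individuals, and relation names (directedBy, hasActor) are illustrative assumptions, not part of any standard vocabulary:

```python
# A minimal sketch of the movie ontology: classes, individuals,
# and relations as plain Python structures.

classes = {"Movie", "Person", "Actor", "Director", "Genre"}

# Individuals (instances) mapped to the class they belong to.
individuals = {
    "Titanic": "Movie",
    "James Cameron": "Director",
    "Leonardo DiCaprio": "Actor",
}

# Relations between individuals as (subject, relation, object) triples.
relations = [
    ("Titanic", "directedBy", "James Cameron"),
    ("Titanic", "hasActor", "Leonardo DiCaprio"),
]

def related(subject, relation):
    """Return all objects linked to `subject` via `relation`."""
    return [o for s, r, o in relations if s == subject and r == relation]
```

A real ontology would be written in a language such as OWL, but the same triple structure underlies it.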
3.1 ONTOLOGICAL ENGINEERING
Concepts such as Events, Time, Physical Objects, and Beliefs occur in many
different domains. Representing these abstract concepts is sometimes called ontological
engineering.
Figure 3.13 The upper ontology of the world, showing the topics to be
covered later in the chapter. Each link indicates that the lower concept is a
specialization of the upper one. Specializations are not necessarily disjoint;
a human is both an animal and an agent, for example.
For example, a shopper would normally have the goal of buying a basketball, rather
than a particular basketball such as BB9. There are two choices for representing categories in
first-order logic: predicates and objects. That is, we can use the predicate Basketball(b), or we
can reify the category as an object, Basketballs.
Notice that because Dogs is a category and is a member of DomesticatedSpecies, the
latter must be a category of categories. Categories can also be defined by providing necessary
and sufficient conditions for membership. For example, a bachelor is an unmarried adult male:
x ∈ Bachelors ⇔ Unmarried(x) ∧ x ∈ Adults ∧ x ∈ Males
Physical Composition
We use the general PartOf relation to say that one thing is part of another. Objects can
be grouped into PartOf hierarchies, reminiscent of the Subset hierarchy:
PartOf(Bucharest, Romania)
PartOf(Romania, EasternEurope)
PartOf(EasternEurope, Europe)
PartOf(Europe, Earth)
The PartOf relation is transitive and reflexive:
PartOf(x, y) ∧ PartOf(y, z) ⇒ PartOf(x, z)
PartOf(x, x)
Therefore, we can conclude PartOf(Bucharest, Earth).
For example, if the apples are Apple1, Apple2, and Apple3, then
BunchOf ({Apple1,Apple2,Apple3})
denotes the composite object with the three apples as parts (not elements). We can
define BunchOf in terms of the PartOf relation. Obviously, each element of s is part of
BunchOf(s):
∀x x ∈ s ⇒ PartOf(x, BunchOf(s))
Furthermore, BunchOf(s) is the smallest object satisfying this condition. In other words,
BunchOf(s) must be part of any object that has all the elements of s as parts:
∀y [∀x x ∈ s ⇒ PartOf(x, y)] ⇒ PartOf(BunchOf(s), y)
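The PartOf facts and the transitivity rule can be sketched in Python. The geographic facts below mirror the chain that lets us conclude PartOf(Bucharest, Earth); the function simply applies reflexivity and transitivity:

```python
# PartOf facts as a set of (part, whole) pairs.
part_of = {
    ("Bucharest", "Romania"),
    ("Romania", "EasternEurope"),
    ("EasternEurope", "Europe"),
    ("Europe", "Earth"),
}

def is_part_of(x, y):
    """PartOf is reflexive (x is part of x) and transitive:
    follow any fact (x, b) and recurse on (b, y)."""
    if x == y:
        return True
    return any(a == x and is_part_of(b, y) for a, b in part_of)
```

Note the recursion assumes the facts form no cycle, which holds for a part-of hierarchy.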
Measurements
In both scientific and commonsense theories of the world, objects have height, mass,
cost, and so on. The values that we assign for these properties are called measures.
Length(L1)=Inches(1.5)=Centimeters(3.81)
Similar axioms can be written for pounds and kilograms, seconds and days, and dollars
and cents. Measures can be used to describe objects as follows:
Diameter (Basketball12)=Inches(9.5)
ListPrice(Basketball12)=$(19)
∀d d ∈ Days ⇒ Duration(d) = Hours(24)
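The measure functions can be sketched by converting every unit to a common base. Here inches are taken as the base unit, an arbitrary assumption for illustration; the point is that Inches(1.5) and Centimeters(3.81) denote the same length object:

```python
# Measures as conversions to a common base unit (inches).

def Inches(n):
    return float(n)  # base unit

def Centimeters(n):
    return n / 2.54  # 1 inch = 2.54 cm exactly

def Hours(n):
    return n * 3600.0  # base time unit: seconds

# Length(L1) = Inches(1.5) = Centimeters(3.81): both map to the
# same value, so the two expressions describe one length.
```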
Time Intervals
The event calculus opens up the possibility of talking about time and time intervals.
We will consider two kinds of time intervals: moments and extended intervals. The distinction
is that only moments have zero duration:
Partition({Moments,ExtendedIntervals}, Intervals )
i∈Moments⇔Duration(i)=Seconds(0)
The functions Begin and End pick out the earliest and latest moments in an interval,
and the function Time delivers the point on the time scale for a moment.
The function Duration gives the difference between the end time and the start time.
Two intervals Meet if the end time of the first equals the start time of the second. The
complete set of interval relations, as proposed by Allen (1983), is shown graphically in Figure
12.2 and logically below:
Meet(i,j) ⇔ End(i)=Begin(j)
Before(i,j) ⇔ End(i) < Begin(j)
After (j,i) ⇔ Before(i, j)
During(i,j) ⇔ Begin(j) < Begin(i) < End(i) < End(j)
Overlap(i,j) ⇔ Begin(i) < Begin(j) < End(i) < End(j)
Begins(i,j) ⇔ Begin(i) = Begin(j)
Finishes(i,j) ⇔ End(i) = End(j)
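The interval relations above translate directly into code. A sketch, with an interval represented as a (begin, end) pair of time points:

```python
# Allen's interval relations; i and j are (begin, end) pairs.

def meet(i, j):
    return i[1] == j[0]          # End(i) = Begin(j)

def before(i, j):
    return i[1] < j[0]           # End(i) < Begin(j)

def after(j, i):
    return before(i, j)

def during(i, j):
    return j[0] < i[0] and i[1] < j[1]

def overlap(i, j):
    return i[0] < j[0] < i[1] < j[1]

def begins(i, j):
    return i[0] == j[0]

def finishes(i, j):
    return i[1] == j[1]
```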
3.7 EVENTS
Event calculus reifies fluents and events. The fluent At(Shankar, Berkeley) is an
object that refers to the fact of Shankar being in Berkeley, but does not by itself say anything
about whether it is true. To assert that a fluent is actually true at some point in time we use the
predicate T, as in T(At(Shankar, Berkeley), t). Events are described as instances of event
categories. The event E1 of Shankar flying from San Francisco to Washington, D.C. is
described as:
E1 ∈ Flyings ∧ Flyer(E1, Shankar) ∧ Origin(E1, SF) ∧ Destination(E1, DC)
Alternatively, we can define a three-argument version of the category of flying events and say:
E1 ∈ Flyings(Shankar, SF, DC)
We then use Happens(E1, i) to say that the event E1 took
place over the time interval i, and we say the same thing in functional form with Extent(E1) = i.
We represent time intervals by a (start, end) pair of times; that is, i = (t1, t2) is the time interval
that starts at t1 and ends at t2. The complete set of predicates for one version of the event
calculus is:

T(f, t) — Fluent f is true at time t
Happens(e, i) — Event e happens over the time interval i
Initiates(e, f, t) — Event e causes fluent f to start to hold at time t
Terminates(e, f, t) — Event e causes fluent f to cease to hold at time t
Clipped(f, i) — Fluent f ceases to be true at some point during time interval i
Restored(f, i) — Fluent f becomes true sometime during time interval i

We assume a distinguished event, Start, that describes the initial state by saying which fluents are
initiated or terminated at the start time. We define T by saying that a fluent holds at a point in
time if the fluent was initiated by an event at some time in the past and was not made false
(clipped) by an intervening event. A fluent does not hold if it was terminated by an event and
not made true (restored) by another event. Formally, the axioms are:

Happens(e, (t1, t2)) ∧ Initiates(e, f, t1) ∧ ¬Clipped(f, (t1, t)) ∧ t1 < t ⇒ T(f, t)
Happens(e, (t1, t2)) ∧ Terminates(e, f, t1) ∧ ¬Restored(f, (t1, t)) ∧ t1 < t ⇒ ¬T(f, t)

where Clipped and Restored are defined by:

Clipped(f, (t1, t2)) ⇔ ∃ e, t, t3  Happens(e, (t, t3)) ∧ t1 ≤ t < t2 ∧ Terminates(e, f, t)
Restored(f, (t1, t2)) ⇔ ∃ e, t, t3  Happens(e, (t, t3)) ∧ t1 ≤ t < t2 ∧ Initiates(e, f, t)
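The T axiom can be sketched over a finite database of facts. In this illustration the event names and times are invented assumptions; the logic follows the axiom: a fluent holds at t if some event initiated it at t1 < t and it was not clipped in between:

```python
# (event, fluent, time) facts; names and times are illustrative.
initiates = [("Arrive1", "At(Shankar, Berkeley)", 1)]
terminates = [("Depart1", "At(Shankar, Berkeley)", 5)]

def clipped(f, t1, t2):
    """Fluent f is terminated at some time t with t1 <= t < t2."""
    return any(fl == f and t1 <= t < t2 for _, fl, t in terminates)

def T(f, t):
    """Fluent f holds at t if initiated at some t1 < t and not clipped
    on the interval (t1, t)."""
    return any(fl == f and t1 < t and not clipped(f, t1, t)
               for _, fl, t1 in initiates)
```

So Shankar is in Berkeley at time 3 (after arriving, before departing) but not at time 6.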
What we need is a model of the mental objects that are in someone’s head (or
something’s knowledge base) and of the mental processes that manipulate those mental objects.
The model does not have to be detailed. We do not have to be able to predict how many
milliseconds it will take for a particular agent to make a deduction. We will be happy just to be
able to conclude that mother knows whether or not she is sitting.
We begin with the propositional attitudes that an agent can have toward mental objects:
attitudes such as Believes, Knows, Wants, Intends, and Informs. The difficulty is that these
attitudes do not behave like “normal” predicates.
For example, suppose we try to assert that Lois knows that Superman can fly:
Knows(Lois, CanFly(Superman))
One minor issue with this is that we normally think of CanFly(Superman) as a sentence, but here
it appears as a term. That issue can be patched up just by reifying CanFly(Superman), making it
a fluent. A more serious problem is that, if it is true that Superman is Clark Kent, then we must
conclude that Lois knows that Clark can fly:
(Superman = Clark) ∧ Knows(Lois, CanFly(Superman)) |= Knows(Lois, CanFly(Clark))
Modal logic is designed to address this problem. Regular logic is concerned with a single
modality, the modality of truth, allowing us to express “P is true.” Modal logic includes special
modal operators that take sentences (rather than terms) as arguments.
For example, “A knows P” is represented with the notation K_A P, where K is the modal
operator for knowledge. It takes two arguments: an agent (written as the subscript) and a
sentence. The syntax of modal logic is the same as that of first-order logic, except that sentences can
also be formed with modal operators. In first-order logic a model contains a set of objects and
an interpretation that maps each name to the appropriate object, relation, or function. In modal
logic we want to be able to consider both the possibility that Superman’s secret identity is Clark
and that it isn’t. Therefore, we will need a more complicated model, one that consists of a
collection of possible worlds rather than just one true world. The worlds are connected in a
graph by accessibility relations, one relation for each modal operator. We say that world w1 is
accessible from world w0 with respect to the modal operator K_A if everything in w1 is
consistent with what A knows in w0, and we write this as Acc(K_A, w0, w1). In diagrams such
as Figure 12.4 we show accessibility as an arrow between possible worlds. In general, a
knowledge atom K_A P is true in world w if and only if P is true in every world accessible from
w. The truth of more complex sentences is derived by recursive application of this rule and the
normal rules of first-order logic. That means that modal logic can be used to reason about
nested knowledge sentences: what one agent knows about another agent’s knowledge. For
example, we can say that, even though Lois doesn’t know whether Superman’s secret identity
is Clark Kent, she does know that Clark knows:
K_Lois [K_Clark Identity(Superman, Clark) ∨ K_Clark ¬Identity(Superman, Clark)]
Figure 3.15 shows some possible worlds for this domain, with accessibility relations for Lois and Superman.
Figure 3.15
In the TOP-LEFT diagram, it is common knowledge that Superman knows his own
identity, and neither he nor Lois has seen the weather report. So in w0 the worlds w0 and w2
are accessible to Superman; maybe rain is predicted, maybe not. For Lois all four worlds are
accessible from each other; she doesn’t know anything about the report or if Clark is Superman.
But she does know that Superman knows whether he is Clark, because in every world that is
accessible to Lois, either Superman knows I, or he knows ¬I. Lois does not know which is the
case, but either way she knows Superman knows. In the TOP-RIGHT diagram it is common
knowledge that Lois has seen the weather report. So in w4 she knows rain is predicted and in
w6 she knows rain is not predicted. Superman does not know the report, but he knows that Lois
knows, because in every world that is accessible to him, either she knows R or she knows ¬
R. In the BOTTOM diagram we represent the scenario where it is common knowledge that
Superman knows his identity, and Lois might or might not have seen the weather report. We
represent this by combining the two top scenarios, and adding arrows to show that Superman
does not know which scenario actually holds. Lois does know, so we don’t need to add any
arrows for her. In w0 Superman still knows I but not R, and now he does not know whether
Lois knows R. From what Superman knows, he might be in w0 or w2, in which case Lois does
not know whether R is true, or he could be in w4, in which case she knows R, or w6, in which
case she knows ¬R.
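The possible-worlds semantics above can be sketched in a few lines: K_A P holds in a world w exactly when P is true in every world accessible from w for agent A. The two-world fragment below (only w0 and w2, where I means "Superman is Clark" and R means "rain is predicted") is an illustrative simplification of the TOP-LEFT diagram:

```python
# Each world assigns truth values to the atomic propositions I and R.
worlds = {
    "w0": {"I": True, "R": True},
    "w2": {"I": True, "R": False},
}

# acc[agent][w] = set of worlds accessible from w for that agent.
acc = {"Superman": {"w0": {"w0", "w2"}}}

def knows(agent, prop, w):
    """K_agent prop holds in w iff prop is true in every accessible world."""
    return all(worlds[v][prop] for v in acc[agent][w])
```

In w0, Superman knows I (it is true in both accessible worlds) but does not know R, matching the narrative above.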
CLASSICAL PLANNING
Classical Planning is the planning where an agent takes advantage of the problem structure to
construct complex plans of an action. The agent performs three tasks in classical planning:
Initial state: the representation of the starting state as a conjunction of ground,
functionless atoms.
Actions: defined by a set of action schemas, which implicitly define
the ACTION() and RESULT() functions.
Result: the state obtained by applying the actions chosen by the agent.
Goal: like a precondition, a conjunction of literals (each of which may be
positive or negative).
The following example will make PDDL easier to understand. The air cargo transport
problem is based on loading and unloading cargo and flying it from one place to another;
it can be illustrated with the following actions.
Below is the PDDL description for air cargo transport:
Init(On(C1, SFO) ∧ On(C2, JFK) ∧ On(P1, SFO) ∧ On(P2, JFK) ∧ Cargo(C1) ∧ Cargo(C2) ∧
Plane(P1) ∧ Plane(P2) ∧ Airport(JFK) ∧ Airport(SFO))
Goal(On(C1, JFK) ∧ On(C2, SFO))
Action(Load(c, p, a),
  PRECOND: On(c, a) ∧ On(p, a) ∧ Cargo(c) ∧ Plane(p) ∧ Airport(a)
  EFFECT: ¬On(c, a) ∧ In(c, p))
Action(Unload(c, p, a),
  PRECOND: In(c, p) ∧ On(p, a) ∧ Cargo(c) ∧ Plane(p) ∧ Airport(a)
  EFFECT: On(c, a) ∧ ¬In(c, p))
Action(Fly(p, from, to),
  PRECOND: On(p, from) ∧ Plane(p) ∧ Airport(from) ∧ Airport(to)
  EFFECT: ¬On(p, from) ∧ On(p, to))
The actions described above (Load, Unload, and Fly) affect two predicates: On(x, a), meaning that object x is at airport a, and In(c, p), meaning that cargo c is inside plane p.
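The action-schema semantics can be sketched by treating a state as a set of ground atoms: an action is applicable when its precondition atoms are a subset of the state, and its effect deletes negative literals and adds positive ones. The sketch below checks and applies the Load action from the air cargo problem (string atoms are a simplifying assumption):

```python
# State = set of ground atoms, here encoded as strings.
init = {"On(C1,SFO)", "On(P1,SFO)", "Cargo(C1)", "Plane(P1)", "Airport(SFO)"}

def applicable_load(c, p, a, state):
    """Load's PRECOND: On(c,a), On(p,a), Cargo(c), Plane(p), Airport(a)."""
    precond = {f"On({c},{a})", f"On({p},{a})",
               f"Cargo({c})", f"Plane({p})", f"Airport({a})"}
    return precond <= state

def apply_load(c, p, a, state):
    """Load's EFFECT: delete On(c,a), add In(c,p)."""
    return (state - {f"On({c},{a})"}) | {f"In({c},{p})"}
```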
The spare tire problem is that the agent needs to change a flat tire: the aim is to put a good
spare tire onto the car's axle. Four actions are used to define the spare tire problem:
1. PlanSAT: It is the question asking if there exists any plan that solves a planning problem.
2. Bounded PlanSAT: It is the question asking whether there is a solution of length k or less.
We found that:
Note: PSPACE is the class of problems that can be solved by a deterministic
Turing machine using a polynomial amount of space.
From the above, it can be concluded that:
It is seen from the above state space tree that the heuristic value decreases from h(n) = 3 at the
start state to h(n) = 0 at the goal state. However, we can create and use several heuristic functions
as per the requirement. It is also clear from the above example that a heuristic function h(n) can be
defined as the information required to solve a given problem more efficiently. The information can
be related to the nature of the state, the cost of transforming from one state to another, goal node
characteristics, etc., and is expressed as a heuristic function.
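The idea of guiding search with h(n) can be sketched as greedy best-first search: always expand the frontier node with the smallest heuristic value. The toy graph and heuristic values below are illustrative assumptions, chosen so h decreases from 3 at the start to 0 at the goal, as in the example above:

```python
import heapq

# Toy search graph and heuristic estimates (assumed values).
graph = {"S": ["A", "B"], "A": ["G"], "B": ["G"], "G": []}
h = {"S": 3, "A": 1, "B": 2, "G": 0}

def greedy_best_first(start, goal):
    """Expand the node with the lowest h(n) first; return the path found."""
    frontier = [(h[start], start, [start])]
    seen = set()
    while frontier:
        _, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt in graph[node]:
            heapq.heappush(frontier, (h[nxt], nxt, path + [nxt]))
    return None
```

Note that greedy best-first is fast but not guaranteed optimal; A* would order the frontier by g(n) + h(n) instead.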
Properties of a Heuristic Search Algorithm
Use of heuristic function in a heuristic search algorithm leads to following properties of a heuristic
search algorithm:
The heuristic method refers to finding the best possible solution to a problem quickly, effectively, and
efficiently. The word heuristic is derived from the ancient Greek word 'eurisko', meaning to find, discover, or
search. It is a practical mental shortcut for problem-solving and decision-making that reduces cognitive load
and does not require the solution to be perfect. The method is helpful for getting a satisfactory solution to a
much larger problem within a limited time frame.
The trial-and-error heuristic is the most fundamental heuristic. It can be applied in all situations, from
matching nuts and bolts to finding the answer to an algebra problem. Some common heuristics used to solve
mathematical problems are visual representation, forward/backward reasoning, additional assumptions, and
simplification.
o First Principle - Understanding the Problem: It is the first step to solve a problem. This is the most important
principle because before solving a problem, it is required to understand the real problem. But many people
skip this principle of finding the initial suitable approach. The principle is focused on knowing the problem and
looking at the problem from other angles.
The various aspects covered under this principle are: what is the problem, what is going on, is there any other
way to explain the problem, is there all required information available, etc. These all points help in
understanding the actual problem and its aspects.
o Second Principle - Making a Plan: A problem can be solved in many different ways. The second
principle says that we must find the best way to reach the solution to the given problem.
For this purpose, the right strategy is to first find the requirement. The 'working backward' approach can help
with this: people assume they already have a solution and use it to work back toward the starting
point.
It also helps in making an overview of the possibilities, removing the less efficient immediately, comparing all
the remaining possibilities, or applying symmetry. This improves the judgment ability as well as the creativity
of a person.
o Third Principle - Implementing the Plan: After making the proper strategy, the plan can be implemented.
However, for this, it is necessary to be patient and give the required time to solve the problem. Because
implementing the plan is tougher than making a plan. If the plan does not provide any solution or does not
stand as per the expectations, then it is advised to repeat the second principle in a better way.
o Fourth Principle - Evaluation and Adaptation: This principle evaluates whether things are going the planned
way. In other words, we compare progress against the plan so that the best way of solving the problem
can be kept. Some plans may work while others may not. So, after proper evaluation, the most
appropriate way can be adopted to solve the main problem.
o Dividing Technique: Under this technique, the original problem is divided into smaller pieces or sub-problems
so that the answer can be found more easily. After solving these sub-problems separately, they can be merged
to produce the final answer to the original problem.
o Inductive Method: This method involves a smaller problem than the original problem, which has been solved
already. The original bigger problem can be solved by deriving the generalization from the smaller problem or
by using the same method that is applied in the previous problem.
o Reduction Method: As we know, the problem is solved by different factors and causes, this method sets
various limits for the main problem in advance. It is helpful in reducing the leeway of the original problem and
getting the solution easily.
o Constructive Method: Under this method, the problem is solved step by step, and when the first step is passed,
the solution is taken as a victory. After it, consecutive steps are taken to reach the final stage. It helps in getting
the best way to solve the problem and getting a successful result.
o Local Search Method: In this method, the most feasible way of solving a problem is searched and used.
Continuous improvement is made in the method during the solving process, and when there is no more scope
for improvement, the method gets to the end, and the final result is the answer to the problem.
Psychology:
Cognitive Maps:
Cognitive maps, too, were found to be created and manipulated using heuristics. Cognitive maps are internal
representations of our physical environment, particularly those linked with spatial relationships. Our
memory uses these internal representations as a guide in our external surroundings. When asked about map
imaging, distancing, and similar topics, respondents were found to frequently distort the visuals. These
aberrations arise from the mind regularizing the images it stores.
Philosophy: An excellent example is a model that is a heuristic device for comprehending what it models
because it is never identical to what it models. In this sense, heuristics include stories, analogies, and the like.
The concept of utopia, as articulated in Plato's best-known work, The Republic, is a classic example. It
implies that the "ideal city" represented in The Republic is neither offered as a goal to strive for nor as a
guiding principle for growth. Rather, it demonstrates how everything would have to be connected, and how
one thing would lead to another (sometimes with disastrous consequences), if particular principles were
chosen and followed to the letter.
The noun heuristic is frequently used to define a rule-of-thumb, technique, or method. Heuristics are
important in creative thinking and the formation of scientific hypotheses, according to science philosophers.
Law: Heuristics are used in legal theory, particularly in the theory of law and economics, when a step-by-
step analysis is practicable, insofar as "practicality" is determined by the interests of a governing body.
The current securities regulatory structure is based on the assumption that all investors are completely
rational. Actual investors are constrained by cognitive biases, heuristics, and framing effects. For example,
the legal drinking age for unaccompanied persons in all states of the United States is 21 years. It is
considered that people must be mature enough to make judgments considering the risks of alcohol intake.
Given that people mature at varying rates, the age of 21 may be too late for some and too early for others.
The rather arbitrary deadline is adopted in this circumstance because it is hard or impracticable to determine
whether an individual is mature enough for society to trust them with such a high level of responsibility.
However, one proposed amendment would make completion of an alcohol education course, rather than
reaching the age of 21, the condition for legal alcohol possession. Because completion of such a course would
probably be optional rather than mandatory, teenage alcohol policy would then be decided more case by case
rather than heuristically.
Stereotyping: The heuristic method is also used by people to make opinions or judgments about things
that are not familiar to them or which they have never seen. They work as a mental shortcut to guessing
everything about a person as per his/her social status, actions, and background. It's not just related to making
assumptions about a person but also about an event, experience, and all the other things. It can be pure
guessing also. Stereotypes, as initially defined by journalist Walter Lippmann in his book Public
Opinion (1922), are mental images formed by our experiences and the information we are given about the
world.
Artificial Intelligence: This method is also helpful in AI to find the solution space. In artificial intelligence
systems, a heuristic can be used to seek a solution space. The heuristic is obtained by modifying the weight
of branches based on how likely each branch is to lead to a destination node or by applying a function that
the designer has programmed into the system.
Heuristic method vs. exact solution method:
- The heuristic method is a mathematical method that provides a good solution to a particular problem; the exact solution method focuses on finding the provably optimal solution to a problem.
- The heuristic method consumes less time; the exact method consumes more time.
- The heuristic method provides a good, immediate, short-term, or approximate solution or decision; the exact method provides an optimal, perfect, or rational solution or decision.
DESCRIPTION LOGIC IN AI
When the left-hand side is an atomic concept, the ⊑ symbol introduces a primitive definition (giving
only necessary conditions), while the ≐ symbol introduces a real definition, with necessary and
sufficient conditions.
In general, it is possible to have complex concept expressions on the left-hand side as well.
Example (using the bachelor definition from earlier):
Bachelor ≐ Unmarried ⊓ Adult ⊓ Male
The term "artificial neural network" refers to a biologically inspired sub-field of artificial intelligence modeled after
the brain. An artificial neural network is usually a computational network based on the biological neural networks that
constitute the structure of the human brain. Just as the human brain has neurons interconnected with one another,
artificial neural networks also have neurons that are linked to each other in the various layers of the network. These
neurons are known as nodes.
Artificial neural network tutorial covers all the aspects related to the artificial neural network. In this tutorial, we will
discuss ANNs, Adaptive resonance theory, Kohonen self-organizing map, Building blocks, unsupervised learning,
Genetic algorithm, etc.
The given figure illustrates the typical diagram of Biological Neural Network.
The typical Artificial Neural Network looks something like the given figure.
Dendrites from Biological Neural Network represent inputs in Artificial Neural Networks, cell nucleus represents
Nodes, synapse represents Weights, and Axon represents Output.
Relationship between the biological neural network and the artificial neural network:
Dendrites → Inputs
Cell nucleus → Nodes
Synapse → Weights
Axon → Output
An artificial neural network is an attempt, in the field of artificial intelligence, to mimic the network of neurons
that makes up the human brain, so that computers have an option to understand things and make decisions in a human-
like manner. The artificial neural network is designed by programming computers to behave simply like interconnected
brain cells.
There are on the order of 100 billion neurons in the human brain. Each neuron is connected to somewhere
between 1,000 and 100,000 others. In the human brain, data is stored in a distributed manner, and we can extract
more than one piece of this data from memory in parallel when necessary. We can say that the human brain is made
up of incredibly amazing parallel processors.
We can understand the artificial neural network with the example of a digital logic gate that takes inputs and
gives an output: an "OR" gate, which takes two inputs. If one or both of the inputs are "On," the output is "On";
if both inputs are "Off," the output is "Off." Here the output depends entirely on the input. Our brain does
not perform the same task: the relationship between outputs and inputs keeps changing, because the neurons in our
brain are "learning."
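The OR gate example can be realized as a single threshold neuron. The weights and bias below are assumptions chosen by hand so that the unit fires whenever either input is on; a learning algorithm would find such values automatically:

```python
# An OR gate as one artificial neuron: weighted sum plus bias,
# followed by a threshold. Weights/bias are hand-picked assumptions.

def or_neuron(x1, x2, w1=1.0, w2=1.0, bias=-0.5):
    weighted_sum = w1 * x1 + w2 * x2 + bias
    return 1 if weighted_sum > 0 else 0
```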
Input Layer:
As the name suggests, it accepts inputs in several different formats provided by the programmer.
Hidden Layer:
The hidden layer presents in-between input and output layers. It performs all the calculations to find hidden features
and patterns.
Output Layer:
The input goes through a series of transformations using the hidden layer, which finally results in output that is
conveyed using this layer.
The artificial neural network takes input and computes the weighted sum of the inputs and includes a bias. This
computation is represented in the form of a transfer function.
The resulting weighted total is passed as input to an activation function to produce the output. Activation functions
decide whether a node should fire or not; only those that fire make it to the output layer. There are distinctive
activation functions available that can be applied depending on the sort of task we are performing.
Artificial neural networks operate on numerical values and can perform more than one task simultaneously.
Unlike in traditional programming, where data is stored in a database, here information is stored on the whole
network. The disappearance of a couple of pieces of data in one place doesn't prevent the network from working.
After ANN training, the information may produce output even with inadequate data. The loss of performance here relies
upon the significance of missing data.
For an ANN to be able to adapt, it is important to select representative examples and to train the network toward
the desired output by demonstrating these examples to it. The success of the network is directly
proportional to the chosen instances; if the event cannot be shown to the network in all its aspects, the network
can produce false output.
The corruption of one or more cells of an ANN does not prevent it from generating output, and this feature makes
the network fault-tolerant.
There is no particular guideline for determining the structure of artificial neural networks. The appropriate network
structure is accomplished through experience, trial, and error.
This is the most significant issue with ANNs: when an ANN produces a solution, it does not provide any insight as to
why or how, which decreases trust in the network.
Hardware dependence:
Artificial neural networks need processors with parallel processing power, as per their structure. Therefore, the
realization of the equipment is dependent.
ANNs can work with numerical data. Problems must be converted into numerical values before being introduced to
ANN. The presentation mechanism to be resolved here will directly impact the performance of the network. It relies on
the user's abilities.
The network is trained down to a specific value of the error, and this value does not guarantee optimum results.
If the weighted sum is equal to zero, a bias is added to make the output non-zero, or to otherwise scale up the
system's response. The bias acts like an extra input fixed at 1 with its own weight. The total of the weighted inputs
can lie anywhere in the range from 0 to positive infinity, so to keep the response within the limits of the desired
value, a certain maximum value is benchmarked, and the total of the weighted inputs is passed through the activation
function.
The activation function refers to the set of transfer functions used to achieve the desired output. There are different
kinds of activation functions, but they are primarily either linear or non-linear. Some of the commonly used
activation functions are the binary, linear, and tan hyperbolic sigmoidal activation functions. Let us take a look
at each of them in detail:
Binary:
In the binary activation function, the output is either a 1 or a 0. To accomplish this, a threshold value is set
up: if the net weighted input of the neuron exceeds the threshold (here taken as 1), the activation function returns 1;
otherwise it returns 0.
Sigmoidal Hyperbolic:
The sigmoidal hyperbola function is generally seen as an "S"-shaped curve. Here the tan hyperbolic function is used
to approximate the output from the actual net input. The function is defined as:
f(x) = tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
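The activation functions just described can be sketched together in a few lines. The threshold of 1 for the binary function follows the description above; the logistic sigmoid is included alongside tanh since both are "S"-shaped:

```python
import math

def binary(x, threshold=1.0):
    """Binary activation: 1 if the net weighted input exceeds the threshold."""
    return 1 if x > threshold else 0

def sigmoid(x):
    """Logistic sigmoid: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Tan hyperbolic: squashes any real input into (-1, 1)."""
    return math.tanh(x)
```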
Feedback ANN:
In this type of ANN, the output is fed back into the network so that the best-evolved results are reached internally.
As per the University of Massachusetts Lowell Center for Atmospheric Research, feedback networks feed information
back into themselves and are well suited to solving optimization problems. Internal system error corrections utilize
feedback ANNs.
Feed-Forward ANN:
A feed-forward network is a basic neural network comprising an input layer, an output layer, and at least one hidden
layer of neurons. By assessing its output against its input, the strength of the network can be gauged from the group
behavior of the associated neurons, and the output is decided. The primary advantage of this network is that it learns to
evaluate and recognize input patterns.
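A minimal feed-forward pass can be sketched as follows: each layer computes a weighted sum of its inputs plus a bias and applies a sigmoid activation, and the input flows through the hidden layer to the output layer. The tiny sizes and weight values in the usage example are illustrative assumptions:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One layer: for each neuron, weighted sum + bias, then sigmoid.
    `weights` is one row of input weights per neuron."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(x, hidden_w, hidden_b, out_w, out_b):
    """Feed-forward pass: input -> hidden layer -> output layer."""
    return layer(layer(x, hidden_w, hidden_b), out_w, out_b)
```

For example, a 2-input, 2-hidden-neuron, 1-output network with all weights 0.5 produces a single output strictly between 0 and 1.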
Long Short-Term Memory (LSTM) is a sophisticated version of the recurrent neural network (RNN) architecture,
created to model chronological sequences and their long-range dependencies more precisely than traditional RNNs.
Its main features are the internal design of an LSTM cell, the various modifications introduced into the LSTM
structure, and a few applications of LSTMs that are in high demand. This tutorial also provides a comparison of
LSTMs with GRUs, and ends with a list of the drawbacks of the LSTM network and a description of new
attention-based models, which are swiftly replacing LSTMs in the real world.
Introduction:
LSTM networks extend recurrent neural networks (RNNs), and were mainly designed to deal with situations in which
RNNs do not work. An RNN is an algorithm that processes the current input by taking into account the
output of previous events (feedback) and then storing it in its internal memory for a brief amount of time (short-
term memory). Among its many applications, the best known are in the areas of non-Markovian speech
control and music composition. However, RNNs have some drawbacks. The first is that they fail to store information
over long periods of time: sometimes a piece of data stored a considerable time ago is needed to determine the
current output, and RNNs are utterly incapable of managing such "long-term dependencies." The second
issue is that there is no fine-grained control over which part of the context needs to be carried forward and which
part of the past should be forgotten. Other issues with RNNs are the exploding and vanishing gradients (explained
later) that occur while training an RNN through backpropagation. Therefore, Long Short-Term Memory (LSTM) was
brought into the picture. It was designed so that the vanishing gradient problem is almost entirely removed while
the training model is left unaltered. LSTMs solve problems with long time lags, and they also handle the
effects of noise, distributed representations, and continuous values. With LSTMs, there is no need to keep a
finite number of states decided beforehand, as required by the hidden Markov model (HMM). LSTMs offer us
an extensive range of parameters, such as learning rates and input and output biases; therefore, no fine
adjustments are needed. The effort to update each weight is reduced to O(1) with LSTMs, similar to Back Propagation
Through Time (BPTT), which is a significant advantage.
In training a network, the primary objective is to reduce the loss (cost or error) seen in the output of the
network when training data is passed through it. We compute the gradient of the loss with respect to the weights,
adjust the weights accordingly, and repeat this process until we arrive at an optimal set of weights for which the
loss is as low as possible. This is the idea behind backpropagation. Sometimes, however, the gradient becomes
vanishingly small. The gradient in one layer depends on components of the gradients in the following layers: if any
component is small (less than one), the resulting gradient is even smaller. This is known as "the scaling effect."
Multiplying this effect by the learning rate, itself a small value between 0.1 and 0.001, produces an even lower
value. The change in weights is then minimal, and the network produces nearly the same output as before. Conversely,
if the gradients are large because of large components, the weights are pushed far beyond their ideal values; this is
commonly referred to as the exploding-gradient problem. To stop this scaling effect, the neural network unit was
rebuilt so that the scale factor is fixed at one. The cell was then enhanced by a number of
gating units and was named the LSTM.
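The scaling effect described above can be demonstrated numerically. The per-step factors (0.5 and 1.5) and the 50 steps below are arbitrary illustrative choices; the point is only that repeated multiplication by a factor below or above one drives the gradient toward zero or toward infinity:

```python
# Repeatedly multiplying per-step factors mimics how a gradient is rescaled
# as it is propagated back through many layers or time steps.
def scaled_gradient(factor, steps, start=1.0):
    g = start
    for _ in range(steps):
        g *= factor  # each layer contributes one multiplicative factor
    return g

vanishing = scaled_gradient(0.5, 50)   # factor < 1: gradient shrinks toward 0
exploding = scaled_gradient(1.5, 50)   # factor > 1: gradient blows up
print(vanishing, exploding)
```

After 50 steps the first gradient is on the order of 10^-15 while the second is in the hundreds of millions, which is why weight updates either stall or overshoot; fixing the scale factor at one, as the LSTM cell does, avoids both extremes.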
Architecture:
The main structural difference between RNNs and LSTMs is that the hidden layer of an LSTM is a gated unit, or gated
cell. It has four layers that interact with one another to produce both the output of the cell and the cell state.
These two outputs are passed on to the next time step. Unlike RNNs, which contain a single neural-net layer of tanh,
LSTMs comprise three logistic sigmoid gates and one tanh layer. The gates were added to limit the information that
passes through the cell: they decide which portion of the data is needed by the next cell and which parts are to be
discarded. A gate's output typically falls in the range 0-1, where "0" means "reject all" and "1" means "include all."
Each LSTM cell has three inputs (xt, ht-1, and Ct-1) and two outputs (ht and Ct). At a specific time t, ht is the
hidden state and Ct is the cell state or memory, while xt is the current data point or input. The first sigmoid layer
takes two inputs, ht-1 and xt, where ht-1 is the hidden state of the previous cell. It is known as the
forget gate, since its output selects how much of the information from the previous cell should be kept. Its output is
a number in [0, 1] that is multiplied (pointwise) with the previous cell state.
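The interaction of the gates above can be sketched for a single scalar feature. The weight values in `weights` are made-up placeholders, not trained parameters, and the equations follow the standard LSTM cell formulation (forget, input, and output gates plus a tanh candidate):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One LSTM time step for a single scalar feature (illustrative weights only).
def lstm_step(x_t, h_prev, c_prev, w):
    f = sigmoid(w["f"] * x_t + w["uf"] * h_prev)          # forget gate: how much of c_prev to keep
    i = sigmoid(w["i"] * x_t + w["ui"] * h_prev)          # input gate: how much new info to admit
    c_tilde = math.tanh(w["c"] * x_t + w["uc"] * h_prev)  # candidate cell content
    c_t = f * c_prev + i * c_tilde                        # new cell state (memory)
    o = sigmoid(w["o"] * x_t + w["uo"] * h_prev)          # output gate
    h_t = o * math.tanh(c_t)                              # new hidden state
    return h_t, c_t

weights = {"f": 0.5, "uf": 0.1, "i": 0.6, "ui": 0.2,
           "c": 0.7, "uc": 0.3, "o": 0.8, "uo": 0.4}
h, c = 0.0, 0.0
for x in [1.0, -0.5, 0.25]:          # a short input sequence
    h, c = lstm_step(x, h, c, weights)
print(h, c)
```

Note how the forget gate `f` multiplies the previous cell state pointwise, exactly as described above: an output near 0 discards the old memory, an output near 1 keeps it intact.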
Applications:
LSTM models have to be trained on a training dataset before being used in real-world applications. Some of the most
demanding applications are discussed below:
1. Text generation, or language modelling, involves predicting the next word whenever a sequence of words is
   supplied as input. Language models can operate at the character level, the n-gram level, the sentence level,
   or even the paragraph level.
2. Image captioning involves analysing a photograph and expressing its content as a sentence. To do this, we need
   a dataset consisting of many photos with corresponding descriptive captions. A trained model detects the
   features of the images: this is the photo data. The captions are processed to keep only the most suggestive
   words: this is the text data. Combining these two types of information, the model's job is to produce a
   descriptive phrase for the image, one word at a time, using the previously predicted words and the image as
   input.
3. Speech and Handwriting Recognition
4. Music generation is identical in spirit to text generation: LSTMs predict musical notes instead of words by
   studying the mix of notes fed into the input.
5. Language translation involves converting a sequence in one language into the corresponding sequence in a
   different language. As with image captioning, a dataset containing phrases and their translations is cleaned
   first, and only the relevant portion is used to build the model. An encoder-decoder LSTM model converts the
   input sequence into its vector representation (encoding) and then decodes that vector into the translated
   sequence.
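The prediction task behind text generation (item 1) can be illustrated without any neural network at all. The toy model below simply counts bigrams in a tiny made-up corpus and predicts the most frequent follower of a word; an LSTM learns far richer statistics of the same kind, conditioned on a much longer history:

```python
from collections import defaultdict, Counter

# Toy next-word prediction: count which word follows which in a tiny corpus.
# An LSTM language model learns this mapping with a neural network instead.
corpus = "the bird pecks the grains and the bird sings".split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word):
    # Return the most frequent follower of `word` in the training text.
    return bigrams[word].most_common(1)[0][0]

print(predict_next("the"))  # "bird" (it follows "the" twice, "grains" once)
```

The weakness of this counting approach, its inability to use context further back than one word, is precisely the long-term-dependency problem that motivates LSTMs.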
Drawbacks:
Everything in the world indeed has its advantages and disadvantages. LSTMs are no exception, and they also come
with a few disadvantages that are discussed below:
1. They became popular because they solved the problem of vanishing gradients. However, they are unable to
   eliminate it completely. The issue is that data still has to be moved from cell to cell for evaluation.
   Furthermore, the cell has become quite complex with the additional features (such as the forget gate)
   that are now part of the picture.
2. They require a lot of time and resources to train and prepare for real-world applications. Technically
   speaking, they need high memory bandwidth because of the linear layers present within each cell, which the
   hardware is usually unable to supply. In terms of hardware, therefore, LSTMs are quite inefficient.
3. With the growth of data mining, scientists are searching for models that can store past data for longer
   periods than LSTMs. The motivation behind such models is the human habit of dividing a chunk of information
   into smaller parts to make it easier to remember.
4. LSTMs are affected by different random weight initializations and in this respect behave similarly to
   feed-forward neural networks. They favour small weight initializations over large ones.
AI - Natural Language Processing
Natural Language Processing (NLP) refers to the AI method of communicating with an intelligent system using
a natural language such as English.
Processing of natural language is required when you want an intelligent system, such as a robot, to perform as per
your instructions, or when you want to hear a decision from a dialogue-based clinical expert system, etc.
The field of NLP involves making computers perform useful tasks with the natural languages humans use.
The input and output of an NLP system can be −
Speech
Written Text
Components of NLP
There are two components of NLP as given −
Natural Language Understanding (NLU)
Understanding involves the following tasks −
Mapping the given input in natural language into useful representations.
Analyzing different aspects of the language.
Natural Language Generation (NLG)
It is the process of producing meaningful phrases and sentences in the form of natural language from some internal
representation. It involves text planning, sentence planning, and text realization.
Difficulties in NLU
NL has an extremely rich form and structure.
It is very ambiguous. There can be different levels of ambiguity −
Lexical ambiguity − It occurs at a very primitive level, such as the word level.
For example, should the word “board” be treated as a noun or a verb?
Syntax-level ambiguity − A sentence can be parsed in different ways.
For example, “He lifted the beetle with red cap.” − Did he use the cap to lift the beetle, or did he lift a beetle
that had a red cap?
Referential ambiguity − Referring to something using pronouns. For example, Rima went to Gauri.
She said, “I am tired.” − Exactly who is tired?
One input can mean different meanings.
Many inputs can mean the same thing.
NLP Terminology
Phonology − It is the study of organizing sounds systematically.
Morphology − It is the study of the construction of words from primitive meaningful units.
Morpheme − It is a primitive unit of meaning in a language.
Syntax − It refers to arranging words to make a sentence. It also involves determining the structural
role of words in the sentence and in phrases.
Semantics − It is concerned with the meaning of words and how to combine words into meaningful
phrases and sentences.
Pragmatics − It deals with using and understanding sentences in different situations and how the
interpretation of a sentence is affected.
Discourse − It deals with how the immediately preceding sentence can affect the interpretation of the
next sentence.
World Knowledge − It includes general knowledge about the world.
Steps in NLP
There are five general steps −
Lexical Analysis − It involves identifying and analyzing the structure of words. The lexicon of a language
means the collection of words and phrases in that language. Lexical analysis divides the whole chunk
of text into paragraphs, sentences, and words.
Syntactic Analysis (Parsing) − It involves analysing the words in a sentence for grammar and
arranging the words in a manner that shows the relationships among them. A sentence such as
“The school goes to boy” is rejected by an English syntactic analyzer.
Semantic Analysis − It draws the exact meaning, or the dictionary meaning, from the text. The text is
checked for meaningfulness. This is done by mapping syntactic structures to objects in the task domain.
The semantic analyzer disregards sentences such as “hot ice-cream”.
Discourse Integration − The meaning of any sentence depends upon the meaning of the sentence
just before it. In addition, it also influences the meaning of the immediately succeeding sentence.
Pragmatic Analysis − During this step, what was said is re-interpreted in terms of what it actually meant. It
involves deriving those aspects of language which require real-world knowledge.
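The first step above, lexical analysis, can be sketched with a simple regex-based tokenizer. Real NLP toolkits handle abbreviations, quotes, and punctuation far more carefully; this is only a minimal illustration of dividing a chunk of text into sentences and words, using example sentences from this section:

```python
import re

# Minimal lexical analysis: split raw text into sentences, then into words.
def lexical_analysis(text):
    # Sentence boundaries approximated by terminal punctuation.
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words approximated by runs of letters (and apostrophes).
    return [re.findall(r"[A-Za-z']+", s) for s in sentences]

tokens = lexical_analysis("The bird pecks the grains. The school goes to boy.")
print(tokens)
# [['The', 'bird', 'pecks', 'the', 'grains'], ['The', 'school', 'goes', 'to', 'boy']]
```

Note that this step accepts “The school goes to boy” without complaint: detecting that it is ungrammatical is the job of the next step, syntactic analysis.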
Implementation Aspects of Syntactic Analysis
There are a number of algorithms researchers have developed for syntactic analysis, but we consider only
the following simple methods −
Context-Free Grammar
Top-Down Parser
Let us see them in detail −
Context-Free Grammar
It is a grammar that consists of rules with a single symbol on the left-hand side of each rewrite rule. Let us
create a grammar to parse the sentence −
“The bird pecks the grains”
Articles (DET) − a | an | the
Nouns − bird | birds | grain | grains
Noun Phrase (NP) − Article + Noun | Article + Adjective + Noun
= DET N | DET ADJ N
Verbs − pecks | pecking | pecked
Verb Phrase (VP) − NP V | V NP
Adjectives (ADJ) − beautiful | small | chirping
The parse tree breaks down the sentence into structured parts so that the computer can easily understand
and process it. In order for the parsing algorithm to construct this parse tree, a set of rewrite rules, which
describe what tree structures are legal, need to be constructed.
These rules say that a certain symbol may be expanded in the tree by a sequence of other symbols.
The first rule states that a Noun Phrase (NP) followed by a Verb Phrase (VP) forms a
sentence (S). The rewrite rules for the sentence are as follows −
S → NP VP
NP → DET N | DET ADJ N
VP → V NP
Lexicon −
DET → a | the
ADJ → beautiful | perching
N → bird | birds | grain | grains
V → peck | pecks | pecking
The parse tree can be created as shown −
Now consider the above rewrite rules. Since V can be replaced by both "peck" and "pecks", sentences such
as "The bird peck the grains" are wrongly permitted, i.e., a subject-verb agreement error is accepted
as correct.
Merit − It is the simplest style of grammar, and is therefore widely used.
Demerits −
They are not highly precise. For example, “The grains peck the bird” is syntactically correct
according to the parser, and even though it makes no sense, the parser takes it as a correct sentence.
To achieve high precision, multiple sets of grammar rules need to be prepared. Completely
different sets of rules may be required for parsing singular and plural variations, passive sentences, etc., which can
lead to a huge, unmanageable set of rules.
Top-Down Parser
Here, the parser starts with the S symbol and attempts to rewrite it into a sequence of terminal symbols that
matches the classes of the words in the input sentence, until it consists entirely of terminal symbols.
These are then checked against the input sentence to see whether they match. If not, the process starts over again
with a different set of rules. This is repeated until a specific rule is found which describes the structure of the
sentence.
Merit − It is simple to implement.
Demerits −
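The top-down procedure can be sketched as a small recursive-descent recognizer over the rewrite rules and lexicon given earlier. Trying each production in turn and falling back on failure corresponds to the "start over with a different set of rules" step; the sketch below is a recognizer only, not a full parse-tree builder:

```python
# Rewrite rules and lexicon from the Context-Free Grammar section.
# Terminals are words; non-terminals are keys of GRAMMAR.
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["DET", "N"], ["DET", "ADJ", "N"]],
    "VP":  [["V", "NP"]],
    "DET": [["a"], ["the"]],
    "ADJ": [["beautiful"], ["perching"]],
    "N":   [["bird"], ["birds"], ["grain"], ["grains"]],
    "V":   [["peck"], ["pecks"], ["pecking"]],
}

def parse(symbol, words, pos):
    """Top-down expansion of `symbol` starting at words[pos]; yields end positions."""
    if symbol not in GRAMMAR:            # terminal: must match the next input word
        if pos < len(words) and words[pos] == symbol:
            yield pos + 1
        return
    for production in GRAMMAR[symbol]:   # try each rewrite rule in turn (backtracking)
        ends = [pos]
        for sym in production:
            ends = [e2 for e in ends for e2 in parse(sym, words, e)]
        yield from ends

def accepts(sentence):
    words = sentence.lower().split()
    # The sentence is accepted if some expansion of S consumes every word.
    return any(end == len(words) for end in parse("S", words, 0))

print(accepts("the bird pecks the grains"))  # True
print(accepts("the bird peck the grains"))   # True: agreement error still accepted
print(accepts("pecks the bird grains the"))  # False: no expansion of S matches
```

The second call shows the overgeneration discussed earlier: because the grammar lists "peck" and "pecks" as interchangeable expansions of V, the subject-verb agreement error slips through.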