Module - 1: Introduction To AI
Introduction to AI
1. Introduction
In computing, data is information that has been translated into a form that is efficient for
movement or processing. Relative to today's computers and transmission media, data is
information converted into binary digital form. Raw data is a term used to describe data in its
most basic digital format.
Data collection is the process of gathering and measuring information from countless
different sources. In order to use the data we collect to develop practical artificial intelligence
(AI) and machine learning solutions, it must be collected and stored in a way that makes
sense for the business problem at hand.
AI works by combining large amounts of data with fast, iterative processing and intelligent
algorithms, allowing the software to learn automatically from patterns or features in the data.
The process requires multiple passes at the data to find connections and derive meaning from
undefined data.
As well as serving as input for AI systems, data plays a vital role in training, validating, and testing AI outputs. At this step of AI development, data is used to create a training set and a test set.
Knowledge is the information about a domain that can be used to solve problems in that
domain. To solve many problems requires much knowledge, and this knowledge must be
represented in the computer. As part of designing a program to solve problems, we must
define how the knowledge will be represented.
Data consists of raw figures and facts. Information, unlike data, provides insight derived from analyzing the collected data, and is specific to the inferences drawn from it. Data carries no inherent meaning, whereas information exists to provide insight and meaning.
Intelligent means having or showing the ability to easily learn or understand things, to deal with new or difficult situations, or to handle problems in a way that resembles or suggests the ability of an intelligent person.
Artificial Intelligence (AI) refers to the ability of machines to perform cognitive tasks like
thinking, perceiving, learning, problem solving and decision making; it is inspired by the
ways people use their brains to perceive, learn, reason out and decide the action.
The basic objective of AI (also called heuristic programming, machine intelligence, or the simulation of cognitive behavior) is to enable computers to perform such intellectual tasks as decision making, problem solving, perception, and understanding human communication (in any language, and translating among them).
Deep learning is a machine learning technique that teaches computers to do what comes naturally to humans: learn by example. Many developers are applying the latest deep learning technologies to advance their businesses.
There are large numbers of fields of Artificial Intelligence technology like autonomous
vehicles, computer vision, automatic text generation, and the like, where the scope and use of
deep learning are increasing.
Take the example of the self-driving feature in cars such as Tesla's Autopilot, where deep learning is a key technology that enables them to recognize a stop sign or to distinguish a pedestrian from a lamppost.
2. Facial Recognition
Artificial Intelligence has made it possible to recognize individual faces using biometric mapping. This has led to path-breaking advancements in surveillance technology. The system compares the captured facial data with a database of known faces to find a match.
However, this has also faced a lot of criticism for breach of privacy.
3. Automating Repetitive Tasks
AI can execute the same kind of work over and over again without breaking a sweat. To understand this feature better, let's take the example of Siri, the voice-enabled assistant created by Apple Inc. It handles a great many commands every single day!
From asking to take up notes for a brief, to rescheduling the calendar for a meeting, to
guiding us through the streets with navigation, the assistant has it all covered.
Earlier, all of these activities had to be done manually which used to take up a lot of time and
effort.
The automation would not only lead to increased efficiencies but also result in lower
overhead costs and in some cases a safer work environment.
4. Data Ingestion
With every passing day, the data we all produce grows exponentially, and this is where AI steps in. Instead of this data being fed in manually, AI-enabled systems not only gather it but also analyze it in light of their previous experience.
Data ingestion is the transportation of data from assorted sources to a storage medium where it can be accessed, used, and analyzed by an organization.
AI, with the help of neural networks, analyzes a large amount of such data and helps in
providing a logical inference out of it.
5. Chatbots
Chatbots are software that provides a window for solving customer problems through either audio or textual input. Earlier, bots responded only to specific commands: if you said the wrong thing, the bot did not know what you meant.
The bot was only as smart as it was programmed to be. The real change came when these
chatbots were enabled by artificial intelligence.
Now, you don’t have to be ridiculously specific when you are talking to the chatbot. It
understands language, not just commands.
There are a lot of companies that have moved on from voice process executives to chatbots to
help customers solve their problems.
The chatbots not only offer services revolving around the issues customers face but also provide product suggestions to users. All this, thanks to AI.
6. Quantum Computing
AI is helping solve complex quantum physics problems with the accuracy of supercomputers
with the help of quantum neural networks. This can lead to path-breaking developments in
the near future.
It is an interdisciplinary field that focuses on building quantum algorithms for improving
computational tasks within AI, including sub-fields like machine learning.
7. Cloud Computing
The next Artificial Intelligence characteristic is cloud computing. With such a huge amount of data being churned out every day, storing it all in physical form would have been a major problem.
AI capabilities are working within the business cloud computing environment to make
organizations more efficient, strategic, and insight-driven.
However, the advent of Cloud Computing has saved us from such worries.
Microsoft Azure is one of the prominent players in the cloud computing industry. It offers to
deploy your own machine learning models to your data stored in cloud servers without any
lock-in.
Computer vision relies on pattern recognition and deep learning to recognize what's in a
picture or video.
3. AI Applications:
3.1 Automation
Industry has often sought to leverage technology to drive productivity. So, to reduce
production costs, industries have automated many repetitive activities and processes to reduce
the amount of human intervention required. Machines and computers use automation to
perform repetitive tasks and adapt to changes in circumstances. Automation has been widely
adopted in both blue-collar and white-collar workplaces.
Machine learning is a revolutionary idea: feed a machine a large amount of data, and it will
use the experience gained from the data to improve its own algorithm and process data better
in the future. The most significant arm of machine learning is Neural Networks. Neural
Networks are interconnected networks of nodes called neurons or perceptrons. These are
loosely modeled on the way the human brain processes information.
Neural Networks store data, learn from it, and improve their abilities to sort new data. For
example, a Neural Network tasked with identifying dogs can be fed various images of dogs
tagged with the type of dog. Over time, it will learn what kind of image corresponds to what
kind of dog. The machine therefore learns from experience and improves itself.
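The idea of learning from stored, labelled examples can be sketched with a toy nearest-neighbour classifier; the feature vectors and breed labels below are invented stand-ins for real image data:

```python
# Minimal sketch of "learning from experience": a 1-nearest-neighbour
# classifier improves simply by storing more labelled examples.

def train(examples):
    # "Training" for nearest-neighbour is just storing experience.
    return list(examples)

def predict(memory, features):
    # Classify a new example by its closest stored example.
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(memory, key=lambda ex: dist(ex[0], features))[1]

# Toy "dog images" reduced to (height_cm, weight_kg) feature vectors.
memory = train([
    ((25, 6), "dachshund"),
    ((60, 30), "labrador"),
    ((70, 40), "german shepherd"),
])
print(predict(memory, (62, 33)))  # the closest stored example wins
```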
Deep Learning is a subset of Machine Learning. In Deep Learning, Neural Networks are
arranged into sprawling networks with a large number of layers that are trained using massive
amounts of data. It is different from most other kinds of Machine Learning, which generally
stress training on labeled data (for example, a picture of a dog with a tag identifying the name
of the dog, and some instructions on how to process each of these). In Deep Learning, the
sprawling artificial Neural Network is fed unlabeled data and not given any instructions. It
determines the important characteristics and purpose of the data itself, while storing it as
experience. Returning to our dog example: when images of a dog are fed to a Deep Learning
Neural Network, the machine itself determines the important characteristics of each breed of
dog from the images, and can then use these to identify a given dog’s breed.
Machine Vision seeks to allow computers to see. A computer captures images from a mounted
camera and converts them from analog to digital (the latter can be easily analyzed). Machine
Vision methods often seek to simulate the human eye. Machine Vision has various potential
uses, such as signature identification and medical image analysis.
NLP techniques (including voice recognition, text translation, and sentiment analysis) allow
computers to comprehend human language and speech. While Siri and Alexa are examples of
commercially available products using NLP algorithms, the major technology companies have
developed far more advanced NLP techniques than the ones Siri and Alexa use.
4. Intelligent agent
An intelligent agent is a program that can make decisions or perform a service based on its
environment, user input and experiences. These programs can be used to autonomously
gather information on a regular, programmed schedule or when prompted by the user in real
time. Intelligent agents may also be referred to as a bot, which is short for robot.
Typically, an agent program, using parameters the user has provided, searches all or some
part of the internet, gathers information the user is interested in and presents it to them on a
periodic or requested basis. Data intelligent agents can extract any specifiable information,
such as included keywords or publication date. In agents that employ artificial intelligence (AI), user input is collected using sensors, such as microphones or cameras, and agent output is delivered through actuators, such as speakers or screens. The practice of having information brought to a user by an agent is called push technology.
Common characteristics of intelligent agents are adaptation based on experience, real time
problem solving, analysis of error or success rates and the use of memory-based storage and
retrieval.
For enterprises, intelligent agents can be used for applications in data mining, data analytics
and customer service and support (CSS). Consumers can also use intelligent agents to
compare the prices of similar products and notify the user when a website update occurs.
Intelligent agents are also similar to software agents which are autonomous computer
programs.
Examples of Agents:
A software agent has keystrokes, file contents, and received network packets acting as sensors, and displays on the screen, files, and sent network packets acting as actuators.
A human agent has eyes, ears, and other organs acting as sensors, and hands, legs, mouth, and other body parts acting as actuators.
A robotic agent has cameras and infrared range finders acting as sensors, and various motors acting as actuators.
Figure 1.2: Agents
Agents can be grouped into five classes based on their degree of perceived intelligence and capability:
Simple Reflex Agents
Model-Based Reflex Agents
Goal-Based Agents
Utility-Based Agents
Learning Agents
Simple reflex agents
Simple reflex agents ignore the rest of the percept history and act only on the basis of the current percept. Percept history is the history of all that an agent has perceived to date. The agent function is based on the condition-action rule: a rule that maps a state (condition) to an action. If the condition is true, the action is taken; otherwise it is not. This agent function succeeds only when the environment is fully observable. For simple reflex agents operating in partially observable environments, infinite loops are often unavoidable; it may be possible to escape them if the agent can randomize its actions.
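The condition-action rule behaviour described above can be sketched as follows, using an invented vacuum-world percept format:

```python
# Minimal sketch of a simple reflex agent: it looks only at the current
# percept and fires the first matching condition-action rule.
# The percept keys and rule set are invented for illustration.

RULES = [
    (lambda p: p["dirty"], "Suck"),              # condition -> action
    (lambda p: p["location"] == "A", "Right"),
    (lambda p: p["location"] == "B", "Left"),
]

def simple_reflex_agent(percept):
    # No percept history is kept: only the current percept is consulted.
    for condition, action in RULES:
        if condition(percept):
            return action
    return "NoOp"

print(simple_reflex_agent({"location": "A", "dirty": True}))   # Suck
print(simple_reflex_agent({"location": "A", "dirty": False}))  # Right
```

Note that the agent carries no internal state: change the environment's rules and the rule table must be rewritten, exactly the limitation described below.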
Problems with simple reflex agents:
If any change occurs in the environment, the collection of rules needs to be updated.
Model-based reflex agents
A model-based agent works by finding a rule whose condition matches the current situation. It can handle partially observable environments through the use of a model of the world. The agent has to keep track of an internal state, which is adjusted by each percept and depends on the percept history. The current state is stored inside the agent, which maintains some kind of structure describing the part of the world that cannot be seen.
Goal-based agents
These agents take decisions based on how far they currently are from their goal (a description of desirable situations). Their every action is intended to reduce the distance from the goal. This gives the agent a way to choose among multiple possibilities, selecting the one that reaches a goal state. The knowledge that supports its decisions is represented explicitly and can be modified, which makes these agents more flexible. They usually require search and planning, and their behavior can easily be changed.
Utility-based agents
Utility-based agents are used when there are multiple possible alternatives and the best one must be chosen. They choose actions based on a preference (utility) for each state. Sometimes achieving the desired goal is not enough: we may look for a quicker, safer, or cheaper trip to reach a destination. Agent happiness should be taken into consideration, and utility describes how "happy" the agent is. Because of the uncertainty in the world, a utility agent chooses the action that maximizes the expected utility. A utility function maps a state onto a real number that describes the associated degree of happiness.
Figure 1.5: Utility-based agents
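The expected-utility choice can be sketched as follows; the actions, outcome probabilities, and utility values below are invented for illustration:

```python
# Minimal sketch of a utility-based agent: with uncertain outcomes, it
# picks the action whose expected utility is highest.

def expected_utility(outcomes):
    # outcomes: list of (probability, utility) pairs for one action
    return sum(p * u for p, u in outcomes)

def choose(actions):
    # maximize expected utility over all available actions
    return max(actions, key=lambda a: expected_utility(actions[a]))

# Two invented routes to a destination (the "quicker, safer, cheaper trip")
routes = {
    "highway":   [(0.9, 8), (0.1, 2)],   # usually fast, small risk of jams
    "back_road": [(1.0, 5)],             # slower but certain
}
print(choose(routes))  # highway: 0.9*8 + 0.1*2 = 7.4 > 5.0
```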
Learning Agent:
A learning agent in AI is the type of agent that can learn from its past experiences; it has learning capabilities. It starts to act with basic knowledge and then adapts automatically through learning. A learning agent has four conceptual components:
1. Learning element: It is responsible for making improvements by learning from the environment.
2. Critic: The learning element takes feedback from the critic, which describes how well the agent is doing with respect to a fixed performance standard.
3. Performance element: It is responsible for selecting external actions.
4. Problem generator: It is responsible for suggesting actions that will lead to new and informative experiences.
Whenever the agent is confronted with a problem, its first move is to seek a solution in its knowledge system; this is known as searching for the solution in the knowledge base. Another approach is to search for a solution by moving through different states. The agent's search stops when it reaches the goal state.
There are many approaches for searching a particular goal state from all the states that the
agent can be in.
There are many search algorithms which are followed by an agent for solving the problems
by searching. Some of them are:
Random search:
In this search technique, an agent simply keeps checking random states to see whether each is the goal state. This is not an effective way to search for a solution because each node can be visited again and again, no fixed path is followed, and problems such as infinite searching can occur.
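A minimal sketch of random search over an invented state space; note that it tracks no path and may revisit states, which is exactly the inefficiency just described:

```python
# Minimal sketch of random search: keep sampling states at random until
# the goal test succeeds (or a step budget runs out).

import random

def random_search(states, is_goal, max_steps=10_000):
    for step in range(1, max_steps + 1):
        state = random.choice(states)   # may revisit states already seen
        if is_goal(state):
            return state, step
    return None, max_steps              # goal not found within the budget

random.seed(0)  # fixed seed so the run is reproducible
state, steps = random_search(list(range(100)), lambda s: s == 42)
print(state, steps)
```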
A* search:
It is one of the best-known and most popular techniques used in pathfinding and graph traversal. It decides which node to expand on the basis of an f-score, f(n) = g(n) + h(n), where g(n) is the cost of the path so far and h(n) is a heuristic estimate of the remaining cost to the goal; the node with the lowest f-score is expanded first.
The problem-solving agent performs precisely by defining problems and their solutions. So we can say that problem solving is a part of artificial intelligence that encompasses a number of techniques, such as trees, B-trees, and heuristic algorithms, to solve a problem.
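In the standard formulation, A* expands the frontier node with the lowest f(n) = g(n) + h(n). A sketch on an invented toy graph and heuristic:

```python
# Minimal sketch of A* search on a small weighted graph. The frontier is
# a priority queue ordered by f = g + h; the lowest f is expanded first.

import heapq

def a_star(graph, h, start, goal):
    frontier = [(h[start], 0, start, [start])]   # (f, g, node, path)
    best_g = {}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        if node in best_g and best_g[node] <= g:
            continue                              # already reached cheaper
        best_g[node] = g
        for nxt, cost in graph[node]:
            g2 = g + cost
            heapq.heappush(frontier, (g2 + h[nxt], g2, nxt, path + [nxt]))
    return None, float("inf")

# Invented toy graph: edges with costs, plus heuristic estimates to G.
graph = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 6)],
         "B": [("G", 1)], "G": []}
h = {"S": 4, "A": 2, "B": 1, "G": 0}
print(a_star(graph, h, "S", "G"))  # (['S', 'A', 'B', 'G'], 4)
```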
A problem-solving agent in Artificial Intelligence is a goal-based agent that focuses on goals; it is one embodiment of a group of algorithms and techniques for solving a well-defined problem in the area of Artificial Intelligence. These agents differ from reflex agents, which merely map states to actions and cannot cope when the demands of storing and learning both grow. The different stages that problem-solving agents perform to arrive at a desired state or solution are:
1. Articulating or expressing the desired goal, and the problem to be worked on, clearly.
The term "Artificial neural network" refers to a biologically inspired sub-field of artificial
intelligence modeled after the brain. An Artificial neural network is usually a computational
network based on biological neural networks that construct the structure of the human brain.
Just as the human brain has neurons interconnected with each other, artificial neural networks also have neurons linked to each other in the various layers of the network. These neurons are known as nodes.
Artificial neural network tutorial covers all the aspects related to the artificial neural network.
In this tutorial, we will discuss ANNs, Adaptive resonance theory, Kohonen self-organizing
map, Building blocks, unsupervised learning, Genetic algorithm, etc.
The given figure illustrates the typical diagram of Biological Neural Network. The typical
Artificial Neural Network looks something like the given figure.
Figure 1.8: Artificial Neural Networks
Dendrites from Biological Neural Network represent inputs in Artificial Neural Networks,
cell nucleus represents Nodes, synapse represents Weights, and Axon represents Output.
An Artificial Neural Network is a construct in the field of Artificial Intelligence that attempts to mimic the network of neurons that makes up the human brain, so that computers can understand things and make decisions in a human-like manner. The artificial neural network is created by programming computers to behave like interconnected brain cells.
There are roughly 100 billion neurons in the human brain, and each neuron has somewhere in the range of 1,000 to 100,000 connection points. In the human brain, data is stored in a distributed manner, and we can extract more than one piece of this data from memory in parallel when necessary. We can say that the human brain is made up of incredibly powerful parallel processors.
We can understand artificial neural networks with an example. Consider a digital logic gate that takes an input and gives an output: an "OR" gate with two inputs. If one or both inputs are "On," the output is "On"; if both inputs are "Off," the output is "Off." Here the output depends entirely on the input. Our brain does not work this way: the relationship between outputs and inputs keeps changing, because the neurons in our brain are "learning."
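The contrast with a fixed OR gate can be sketched with a single perceptron that learns the OR function from examples; the learning rate and epoch count below are arbitrary illustrative choices:

```python
# Minimal sketch of "learning" versus a hard-wired gate: a one-neuron
# perceptron adjusts its weights from labelled examples of OR.

def step(x):
    return 1 if x >= 0 else 0

def train_or(epochs=10, lr=0.1):
    w = [0.0, 0.0]
    b = 0.0
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
    for _ in range(epochs):
        for (x1, x2), target in data:
            out = step(w[0] * x1 + w[1] * x2 + b)
            err = target - out
            w[0] += lr * err * x1    # perceptron learning rule
            w[1] += lr * err * x2
            b += lr * err
    return w, b

w, b = train_or()
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), step(w[0] * x1 + w[1] * x2 + b))
```

Unlike the logic gate, nothing here is hard-wired: the same code would learn AND if fed AND-labelled examples instead.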
Input Layer:
As the name suggests, it accepts inputs in several different formats provided by the programmer.
Hidden Layer:
The hidden layer sits between the input and output layers. It performs all the calculations needed to find hidden features and patterns.
Output Layer:
The input goes through a series of transformations using the hidden layer, which finally
results in output that is conveyed using this layer.
The artificial neural network takes input and computes the weighted sum of the inputs and
includes a bias. This computation is represented in the form of a transfer function.
Advantages of Artificial Neural Networks
1. Parallel processing capability:
Artificial neural networks have a numerical strength that lets them perform more than one task simultaneously.
2. Storing data on the entire network:
Unlike in traditional programming, data is stored on the whole network, not in a database. The disappearance of a few pieces of data in one place does not prevent the network from working.
3. Capability to work with incomplete knowledge:
After training, an ANN may produce output even with inadequate data. The loss of performance here depends upon the significance of the missing data.
4. Having a memory distribution:
For an ANN to be able to adapt, it is important to determine the examples and to train the network according to the desired output by showing these examples to the network. The network's success is directly proportional to the chosen instances; if the event cannot be shown to the network in all its aspects, the network can produce false output.
5. Having fault tolerance:
Corruption of one or more cells of an ANN does not prevent it from generating output, and this feature makes the network fault-tolerant.
Disadvantages of Artificial Neural Networks
1. Assurance of proper network structure:
There is no particular guideline for determining the structure of artificial neural networks. The appropriate network structure is found through experience and trial and error.
2. Unrecognized behavior of the network:
This is the most significant issue with ANNs. When an ANN produces a solution, it gives no insight into why or how, which decreases trust in the network.
3. Hardware dependence:
Artificial neural networks require processors with parallel processing power, in accordance with their structure. For this reason, realization of the network depends on suitable hardware.
4. Difficulty of showing the problem to the network:
ANNs can work only with numerical data. Problems must be converted into numerical values before being introduced to the ANN. The representation mechanism chosen here directly influences the performance of the network, and this relies on the user's abilities.
5. The duration of the network is unknown:
The network is reduced to a certain error value, and this value does not guarantee optimum results.
Artificial neural networks, which stepped into the world in the mid-20th century, are developing exponentially. Above, we have examined the advantages of artificial neural networks and the issues encountered in the course of their use. It should not be overlooked that the disadvantages of ANNs, a flourishing branch of science, are being eliminated one by one, while their advantages grow day by day. This means that artificial neural networks will progressively become an irreplaceable part of our lives.
Artificial Neural Network can be best represented as a weighted directed graph, where the
artificial neurons form the nodes. The association between the neurons outputs and neuron
inputs can be viewed as the directed edges with weights. The Artificial Neural Network
receives the input signal from the external source in the form of a pattern and image in the
form of a vector. These inputs are then mathematically assigned by the notations x(n) for
every n number of inputs.
Afterward, each input is multiplied by its corresponding weight (these weights are the details the artificial neural network uses to solve a specific problem). In general terms, these weights represent the strength of the interconnections between neurons inside the artificial neural network. All the weighted inputs are summed inside the computing unit.
If the weighted sum is zero, a bias is added to make the output non-zero, or else to scale up the system's response. The bias has a fixed input of 1 and its own weight. Here the total of the weighted inputs can lie anywhere from 0 to positive infinity. To keep the response within the limits of the desired value, a certain maximum value is set as a benchmark, and the total of weighted inputs is passed through the activation function.
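The weighted-sum-plus-bias computation can be sketched as follows; the input values, weights, and the choice of a sigmoid transfer function are illustrative assumptions:

```python
# Minimal sketch of one artificial neuron: weighted inputs are summed,
# a bias is added, and the total passes through an activation function.

import math

def neuron(inputs, weights, bias, activation):
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(total)

# Sigmoid squashes any real input into the range (0, 1).
sigmoid = lambda t: 1 / (1 + math.exp(-t))

# Invented inputs x(n) and their corresponding weights.
print(neuron([0.5, 0.3], [0.4, 0.7], 0.1, sigmoid))
```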
The activation function refers to the set of transfer functions used to achieve the desired
output. There is a different kind of the activation function, but primarily either linear or non-
linear sets of functions. Some of the commonly used sets of activation functions are the
Binary, linear, and Tan hyperbolic sigmoidal activation functions. Let us take a look at each
of them in details:
Binary:
In a binary activation function, the output is either a one or a zero. To accomplish this, a threshold value is set up: if the net weighted input of the neuron exceeds the threshold, the activation function returns one; otherwise it returns zero.
Sigmoidal Hyperbolic:
The sigmoidal hyperbola function is generally seen as an "S"-shaped curve. Here the tan hyperbolic function is used to approximate the output from the actual net input. The function is defined as:
F(x) = tanh(x) = (e^x − e^−x) / (e^x + e^−x)
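A sketch of the two activation functions just described; the threshold value here is an arbitrary illustrative choice:

```python
# Minimal sketch of a binary threshold activation and the tanh sigmoid.

import math

def binary(net, threshold=0.0):
    # output 1 if the net weighted input exceeds the threshold, else 0
    return 1 if net > threshold else 0

def tanh(net):
    # the "S"-shaped sigmoidal hyperbolic function
    return (math.exp(net) - math.exp(-net)) / (math.exp(net) + math.exp(-net))

print(binary(0.7), binary(-0.2))   # 1 0
print(round(tanh(0.0), 3))         # 0.0
```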
There are various types of Artificial Neural Networks (ANNs), built by analogy with the neurons and network functions of the human brain, that perform tasks in a similar way. The majority of artificial neural networks have some similarities with their more complex biological counterparts and are very effective at their intended tasks, for example segmentation or classification.
Feedback ANN:
In this type of ANN, the output is returned into the network to achieve the best internally evolved results. According to the University of Massachusetts Lowell Center for Atmospheric Research, feedback networks feed information back into themselves and are well suited to solving optimization problems. Internal system error corrections utilize feedback ANNs.
Feed-Forward ANN:
In this type of ANN, signals travel in only one direction, from the input layer through any hidden layers to the output layer, with no feedback loops.
Fuzzy Logic Systems (FLS) produce acceptable but definite output in response to incomplete,
ambiguous, distorted, or inaccurate (fuzzy) input.
Fuzzy Logic (FL) is a method of reasoning that resembles human reasoning. The approach of
FL imitates the way of decision making in humans that involves all intermediate possibilities
between digital values YES and NO.
The conventional logic block that a computer can understand takes precise input and
produces a definite output as TRUE or FALSE, which is equivalent to human’s YES or NO.
The inventor of fuzzy logic, Lotfi Zadeh, observed that unlike computers, human decision making includes a range of possibilities between YES and NO, such as:
CERTAINLY YES
POSSIBLY YES
CANNOT SAY
POSSIBLY NO
CERTAINLY NO
The fuzzy logic works on the levels of possibilities of input to achieve the definite output.
Implementation
It can be implemented in systems with various sizes and capabilities ranging from
small micro-controllers to large, networked, workstation-based control systems.
Fuzzification Module − It transforms the system inputs, which are crisp numbers, into fuzzy sets. It splits the input signal into five steps such as:
LP − x is Large Positive
MP − x is Medium Positive
S − x is Small
MN − x is Medium Negative
LN − x is Large Negative
Knowledge Base − It stores the IF-THEN rules provided by experts.
Inference Engine − It simulates the human reasoning process by making fuzzy inference on the inputs and the IF-THEN rules.
Defuzzification Module − It transforms the fuzzy set obtained by the inference engine into a crisp value.
Membership Function
Membership functions allow you to quantify linguistic term and represent a fuzzy set
graphically. A membership function for a fuzzy set A on the universe of discourse X is
defined as μA:X → [0,1].
Here, each element of X is mapped to a value between 0 and 1. It is called membership value
or degree of membership. It quantifies the degree of membership of the element in X to the
fuzzy set A.
Multiple membership functions can be used to fuzzify a numerical value. Simple membership functions are preferred, since complex functions do not add precision to the output.
All membership functions for LP, MP, S, MN, and LN are shown as below −
The triangular membership function shapes are most common among various other
membership function shapes such as trapezoidal, singleton, and Gaussian.
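A triangular membership function of this kind can be sketched as follows; the (a, b, c) corner points below are invented for illustration:

```python
# Minimal sketch of a triangular membership function mu_A : X -> [0, 1],
# rising linearly from a to its peak at b and falling back to zero at c.

def triangular(x, a, b, c):
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)   # rising edge
    return (c - x) / (c - b)       # falling edge

# "x is Small", centred on 0 over the -10 V .. +10 V input range
print(triangular(0, -5, 0, 5))     # 1.0  (full membership)
print(triangular(2.5, -5, 0, 5))   # 0.5  (partial membership)
print(triangular(8, -5, 0, 5))     # 0.0  (outside the set)
```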
Here, the input to 5-level fuzzifier varies from -10 volts to +10 volts. Hence the
corresponding output also changes.
Let us consider an air conditioning system with 5-level fuzzy logic system. This system
adjusts the temperature of air conditioner by comparing the room temperature and the target
temperature value.
Algorithm
Convert crisp data into fuzzy data sets using membership functions. (fuzzification)
Development
Linguistic variables are input and output variables in the form of simple words or sentences.
For room temperature, cold, warm, hot, etc., are linguistic terms.
Every member of this set is a linguistic term and it can cover some portion of overall
temperature values.
Create a matrix of room temperature values versus target temperature values that an air
conditioning system is expected to provide.
RoomTemp/Target | Very_Cold | Cold | Warm | Hot | Very_Hot
Build a set of rules into the knowledge base in the form of IF-THEN-ELSE structures.
Fuzzy set operations perform evaluation of rules. The operations used for OR and AND are
Max and Min respectively. Combine all results of evaluation to form a final result. This result
is a fuzzy value.
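The Min/Max rule evaluation can be sketched as follows; the rules and membership degrees below are invented for illustration:

```python
# Minimal sketch of fuzzy rule evaluation: AND is taken as Min, OR as
# Max, and the rule results are combined with Max into one fuzzy value.

AND, OR = min, max

# Invented degrees of membership for the current readings.
room = {"cold": 0.7, "warm": 0.2}
target = {"warm": 0.8}

# Rule 1: IF room is cold AND target is warm THEN heat strongly
rule1 = AND(room["cold"], target["warm"])   # min(0.7, 0.8) = 0.7
# Rule 2: IF room is warm OR target is warm THEN heat gently
rule2 = OR(room["warm"], target["warm"])    # max(0.2, 0.8) = 0.8

# Combine all rule strengths into a single fuzzy result.
print(max(rule1, rule2))  # 0.8
```

A defuzzification step would then turn this fuzzy value back into a crisp control output.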
Automotive Systems
Automatic Gearboxes
Four-Wheel Steering
Hi-Fi Systems
Photocopiers
Television
Domestic Goods
Microwave Ovens
Refrigerators
Toasters
Vacuum Cleaners
Washing Machines
Environment Control
Air Conditioners/Dryers/Heaters
Humidifiers
Advantages of FLSs
You can modify an FLS by just adding or deleting rules, thanks to the flexibility of fuzzy logic.
Fuzzy logic systems can accept imprecise, distorted, or noisy input information.
Fuzzy logic is a solution to complex problems in all fields of life, including medicine,
as it resembles human reasoning and decision making.
Disadvantages of FLSs
They are suitable only for problems that do not need high accuracy.
Natural Language Processing (NLP) refers to the AI method of communicating with intelligent systems using a natural language such as English.
Processing of natural language is required when you want an intelligent system, such as a robot, to perform as per your instructions, or when you want to hear a decision from a dialogue-based clinical expert system, etc.
The field of NLP involves making computers perform useful tasks with the natural languages humans use. The input and output of an NLP system can be −
Speech
Written Text
Components of NLP
There are two components of NLP:
Natural Language Understanding (NLU) − It involves mapping a given input in natural language into useful representations and analyzing different aspects of the language.
Natural Language Generation (NLG) − It is the process of producing meaningful phrases and sentences in the form of natural language from some internal representation. It involves −
Text planning − It includes retrieving the relevant content from the knowledge base.
Difficulties in NLU
For example, “He lifted the beetle with red cap.” − Did he use a cap to lift the beetle, or did he lift a beetle that had a red cap?
NLP Terminology
Semantics − It is concerned with the meaning of words and how to combine words
into meaningful phrases and sentences.
Discourse − It deals with how the immediately preceding sentence can affect the
interpretation of the next sentence.
Steps in NLP
Lexical Analysis − It involves identifying and analyzing the structure of words, dividing the whole text into paragraphs, sentences, and words.
Syntactic Analysis (Parsing) − It involves analyzing the words in a sentence for grammar and arranging them in a manner that shows the relationships among the words.
Semantic Analysis − It draws the exact or dictionary meaning from the text. The text is checked for meaningfulness. This is done by mapping syntactic structures onto objects in the task domain. The semantic analyzer disregards sentences such as “hot ice-cream”.
Discourse Integration − The meaning of any sentence depends upon the meaning of the sentence just before it, and it can in turn influence the meaning of the sentence that immediately follows.
Pragmatic Analysis − During this, what was said is re-interpreted based on what it
actually meant. It involves deriving those aspects of language which require real-world
knowledge.
There are a number of algorithms researchers have developed for syntactic analysis, but we
consider only the following simple methods −
Context-Free Grammar
Top-Down Parser
Context-Free Grammar
It is a grammar that consists of rules with a single symbol on the left-hand side of the
rewrite rules. Let us create a grammar to parse a sentence −
The parse tree breaks down the sentence into structured parts so that the computer can easily
understand and process it. In order for the parsing algorithm to construct this parse tree, a set
of rewrite rules, which describe what tree structures are legal, need to be constructed.
These rules say that a certain symbol may be expanded in the tree by a sequence of other
symbols. According to the first rewrite rule, if there are two strings Noun Phrase (NP) and
Verb Phrase (VP), then the string of NP followed by VP is a sentence. The rewrite
rules for the sentence are as follows −
S → NP VP
NP → DET N
VP → V NP
Lexicon −
DET → a | the
N → bird | grains
V → peck | pecks
Now consider the above rewrite rules. Since V can be replaced by either "peck" or "pecks",
sentences such as "The bird peck the grains" are wrongly permitted, i.e., the subject-verb
agreement error is accepted as correct.
Demerits −
They are not highly precise. For example, “The grains peck the bird” is
syntactically correct according to the parser, but even though it makes no sense, the
parser takes it as a correct sentence.
To bring out high precision, multiple sets of grammar need to be prepared. It may
require completely different sets of rules for parsing singular and plural variations,
passive sentences, etc., which can lead to the creation of a huge set of rules that is
unmanageable.
Top-Down Parser
Here, the parser starts with the S symbol and attempts to rewrite it into a sequence of
terminal symbols that matches the classes of the words in the input sentence until it consists
entirely of terminal symbols.
These are then checked against the input sentence to see if they match. If not, the process is
started over again with a different set of rules. This is repeated until a specific rule is found
which describes the structure of the sentence.
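As a rough sketch of such a top-down parser, the following hypothetical recursive-descent parser accepts sentences for a toy grammar of the form S → NP VP, NP → DET N, VP → V NP; the class, method names, and lexicon below are illustrative:

```java
import java.util.Arrays;
import java.util.List;

// Minimal recursive-descent (top-down) parser for a toy grammar:
//   S -> NP VP,  NP -> DET N,  VP -> V NP
// All names and lexicon entries here are illustrative.
class ToyTopDownParser {
    static final List<String> DET = Arrays.asList("a", "the");
    static final List<String> N = Arrays.asList("bird", "grains");
    static final List<String> V = Arrays.asList("peck", "pecks");

    // Each method tries to expand one non-terminal starting at position i
    // and returns the position after the matched span, or -1 on failure.
    static int np(String[] w, int i) {
        if (i + 1 < w.length && DET.contains(w[i]) && N.contains(w[i + 1]))
            return i + 2;
        return -1;
    }

    static int vp(String[] w, int i) {
        if (i < w.length && V.contains(w[i]))
            return np(w, i + 1);
        return -1;
    }

    // S -> NP VP; the sentence is accepted only if all words are consumed.
    static boolean parse(String sentence) {
        String[] w = sentence.toLowerCase().split("\\s+");
        int afterNp = np(w, 0);
        if (afterNp < 0) return false;
        return vp(w, afterNp) == w.length;
    }
}
```

As discussed above, such a parser also accepts "The bird peck the grains", since the grammar does not encode subject-verb agreement.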
Demerits −
It is inefficient: if a wrong expansion is chosen, the parser must backtrack and
repeat work, and left-recursive rules can cause it to loop forever.
Expert systems (ES) are one of the prominent research domains of AI. They were introduced
by researchers at Stanford University's Computer Science Department.
Expert systems are computer applications developed to solve complex problems in a
particular domain, at the level of extraordinary human intelligence and expertise.
High performance
Understandable
Reliable
Highly responsive
Advising
Deriving a solution
Diagnosing
Explaining
Interpreting input
Predicting results
Knowledge Base
Inference Engine
User Interface
Knowledge is required to exhibit intelligence. The success of any ES majorly depends upon
the collection of highly accurate and precise knowledge.
What is Knowledge?
Data is a collection of facts. Information is data organized as facts about the task
domain. Data, information, and past experience combined together are termed knowledge.
Knowledge representation
It is the method used to organize and formalize the knowledge in the knowledge base. It is in
the form of IF-THEN-ELSE rules.
Knowledge Acquisition
The success of any expert system majorly depends on the quality, completeness, and
accuracy of the information stored in the knowledge base.
The knowledge base is formed by readings from various experts, scholars, and the
Knowledge Engineers. The knowledge engineer is a person with the qualities of empathy,
quick learning, and case analyzing skills.
He acquires information from the subject expert by recording, interviewing, and observing
him at work, etc. He then categorizes and organizes the information in a meaningful way, in
the form of IF-THEN-ELSE rules, to be used by the inference engine. The knowledge
engineer also monitors the development of the ES.
Inference Engine
Use of efficient procedures and rules by the Inference Engine is essential in deducing a
correct, flawless solution.
In case of knowledge-based ES, the Inference Engine acquires and manipulates the
knowledge from the knowledge base to arrive at a particular solution.
Applies rules repeatedly to the facts, which are obtained from earlier rule application.
Resolves rules conflict when multiple rules are applicable to a particular case.
Forward Chaining
Backward Chaining
Forward Chaining
It is a strategy of an expert system to answer the question, “What can happen next?”
Here, the Inference Engine follows the chain of conditions and derivations and finally
deduces the outcome. It considers all the facts and rules and sorts them before arriving at
a solution.
This strategy is followed for working on conclusion, result, or effect. For example, prediction
of share market status as an effect of changes in interest rates.
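The forward-chaining loop described above can be sketched as repeated rule application over a set of facts; the rules and fact names below are illustrative, not from any particular shell:

```java
import java.util.*;

// Minimal forward-chaining sketch: a rule fires when all its antecedents are
// known facts, adding its consequent. All rule/fact names are illustrative.
class ForwardChainer {
    static class Rule {
        List<String> ifAll; String then;
        Rule(String then, String... ifAll) {
            this.then = then;
            this.ifAll = Arrays.asList(ifAll);
        }
    }

    // Repeatedly apply rules to the known facts until nothing new is derived.
    static Set<String> infer(List<Rule> rules, Set<String> facts) {
        Set<String> known = new HashSet<>(facts);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Rule r : rules)
                if (!known.contains(r.then) && known.containsAll(r.ifAll)) {
                    known.add(r.then);   // fire the rule, adding its conclusion
                    changed = true;
                }
        }
        return known;
    }
}
```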
Backward Chaining
With this strategy, an expert system finds out the answer to the question, “Why did this
happen?”
On the basis of what has already happened, the Inference Engine tries to find out which
conditions could have happened in the past for this result. This strategy is followed for
finding out cause or reason. For example, diagnosis of blood cancer in humans.
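Backward chaining, as described above, starts from the conclusion and works back through the conditions that could establish it. A minimal sketch, with illustrative rules and facts (and no cycle handling, for brevity):

```java
import java.util.*;

// Minimal backward-chaining sketch: to prove a goal, find a rule that
// concludes it and recursively prove each of its conditions.
// All rule/fact names are illustrative; no cycle handling, for brevity.
class BackwardChainer {
    // Map: conclusion -> the list of conditions that would establish it.
    static Map<String, List<String>> rules = new HashMap<>();
    static Set<String> facts = new HashSet<>();

    static boolean prove(String goal) {
        if (facts.contains(goal)) return true;   // goal is a known fact
        List<String> conds = rules.get(goal);
        if (conds == null) return false;         // no rule concludes this goal
        for (String c : conds)
            if (!prove(c)) return false;         // every condition must hold
        return true;
    }
}
```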
User Interface
The user interface provides interaction between the user of the ES and the ES itself. It
generally uses Natural Language Processing so that it can be used by a user who is
well-versed in the task domain. The user of the ES need not necessarily be an expert in
Artificial Intelligence.
It explains how the ES has arrived at a particular recommendation. The explanation may
appear in the following forms −
Its technology should be adaptable to the user's requirements, not the other way round.
No technology can offer an easy and complete solution. Large systems are costly and require
significant development time and computer resources. ESs have their limitations, which
include −
Application
Description
Design Domain
Medical Domain
Diagnosis systems to deduce the cause of disease from observed data; conducting medical
operations on humans.
Monitoring Systems
Comparing data continuously with the observed system or with prescribed behaviour, such as
leakage monitoring in a long petroleum pipeline.
Knowledge Domain
Finance/Commerce
Detection of possible fraud, suspicious transactions, stock market trading, Airline scheduling,
cargo scheduling.
There are several levels of ES technologies available. Expert systems technologies include −
o Large databases.
Tools − They reduce the effort and cost involved in developing an expert system to a
large extent.
Shells − A shell is nothing but an expert system without the knowledge base. A shell
provides the developers with knowledge acquisition, an inference engine, a user interface,
and an explanation facility. For example, a few shells are given below −
o Java Expert System Shell (JESS), which provides a fully developed Java API for
creating an expert system.
Know and establish the degree of integration with the other systems and databases.
Realize how the concepts can represent the domain knowledge best.
The knowledge engineer uses sample cases to test the prototype for any deficiencies
in performance.
Test and ensure the interaction of the ES with all elements of its environment,
including end users, databases, and other information systems.
Cater for new interfaces with other information systems, as those systems evolve.
Benefits of Expert Systems
Less Production Cost − Production cost is reasonable. This makes them affordable.
Speed − They offer great speed. They reduce the amount of work an individual puts
in.
Steady response − They work steadily without getting emotional, tense, or fatigued.
Uninformed search is a group of widely used, general-purpose search algorithms. These
algorithms are brute-force operations: they have no extra information about the search space;
the only information they have is how to traverse or visit the nodes of the tree. Thus
uninformed search algorithms are also called blind search algorithms. The search algorithm
produces the search tree without using any domain knowledge, which is brute force in
nature. They differ from informed search algorithms in that a goal check happens only when
a node is generated or expanded, and they have no background information on how to
approach the goal.
BFS is a search operation for finding the nodes in a tree. The algorithm works breadthwise:
it starts the search from the root node, expands all the successor nodes at the current level
before moving ahead, and then moves breadthwise for further expansion.
It occupies a lot of memory space and execution time when the solution is at the
bottom or end of the tree, and it uses a FIFO queue.
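The breadthwise expansion with a FIFO queue described above can be sketched as follows; the adjacency-list graph shape is illustrative:

```java
import java.util.*;

// Minimal BFS sketch over an adjacency-list graph, using a FIFO queue.
// The graph encoding is illustrative.
class BreadthFirstSearchSketch {
    static List<Integer> bfs(Map<Integer, List<Integer>> adj, int root) {
        List<Integer> order = new ArrayList<>();
        Set<Integer> visited = new HashSet<>();
        Queue<Integer> queue = new ArrayDeque<>();  // FIFO queue
        visited.add(root);
        queue.add(root);
        while (!queue.isEmpty()) {
            int node = queue.remove();
            order.add(node);
            // expand all successors at this level before going deeper
            for (int next : adj.getOrDefault(node, Collections.emptyList()))
                if (visited.add(next))   // true only the first time we see it
                    queue.add(next);
        }
        return order;   // nodes in breadth-first visiting order
    }
}
```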
DFS is a recursive algorithm that traverses a graph or tree depth-wise, which is how it
derives its name. DFS uses a stack for its implementation. The process of search is similar to
BFS; the only difference lies in the expansion of nodes, which is depth-wise in this case.
Unlike BFS, DFS requires much less memory, because the stack stores only the nodes on
the path it explores depth-wise.
In comparison to BFS, the execution time is also less if the expansion of nodes is
correct. If the path is not correct, the recursion continues, and there is no
guarantee that one may find the solution. This may result in an infinite loop.
The DFS search algorithm is not optimal, and it may generate large steps and possibly
high cost to find the solution.
The DLS algorithm is one of the uninformed strategies. A depth-limited search is close to
DFS, but it addresses the demerit of DFS by imposing a depth limit: nodes at the depth limit
behave as if they have no successors. Depth-limited search can be halted in two
cases:
o SFV: The Standard failure value which tells that there is no solution to the
problem.
o CFV: The Cutoff failure value tells that there is no solution within the given
depth.
The DLS is efficient in memory space utilization.
It has the demerit of incompleteness: it is complete only if the solution lies above the
depth limit.
The UCS algorithm is used for visiting the weighted tree. The main goal of the uniform cost
search is to fetch a goal node and find the true path, including the cumulative cost. The
following are the properties of the UCS algorithm:
The expansion takes place on the basis of cost from the root. The UCS is implemented
using a priority queue.
The UCS does not care about the number of steps, and so it may end up in an infinite loop.
We can say that UCS is an optimal algorithm, as it always chooses the path with the lowest
cost.
It proves itself when the search space is large and the depth is not known.
This algorithm has one demerit: it iterates over all the previous steps.
The algorithm is known to be complete only if the branching factor is finite.
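The priority-queue expansion by cumulative cost described above can be sketched as follows; the edge encoding and weights are illustrative:

```java
import java.util.*;

// Minimal uniform-cost search sketch: a priority queue ordered by cumulative
// path cost from the root. Edge encoding ({to, cost} pairs) is illustrative.
class UniformCostSearchSketch {
    // Returns the cheapest cost from start to goal, or -1 if unreachable.
    static int ucs(Map<Integer, int[][]> edges, int start, int goal) {
        // queue entries are {cumulativeCost, node}, cheapest first
        PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) -> a[0] - b[0]);
        Set<Integer> expanded = new HashSet<>();
        pq.add(new int[]{0, start});
        while (!pq.isEmpty()) {
            int[] e = pq.poll();
            int cost = e[0], node = e[1];
            if (node == goal) return cost;       // cheapest path to goal found
            if (!expanded.add(node)) continue;   // already expanded more cheaply
            for (int[] edge : edges.getOrDefault(node, new int[0][]))
                pq.add(new int[]{cost + edge[1], edge[0]});
        }
        return -1;   // goal not reachable
    }
}
```

Note how the cheaper two-step path is preferred over a more expensive direct edge, which is what makes UCS optimal.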
The two-way or bidirectional search algorithm runs two searches simultaneously, one in the
forward direction and the other in the backward direction. The search stops when the two
simultaneous searches intersect each other at the goal node. Either search may use any of the
algorithms discussed above, like BFS, DFS, etc.
The implementation is difficult, and the goal node should be known in advance to
execute it.
DFS is known as the Depth First Search Algorithm which provides the steps to traverse each
and every node of a graph without repeating any node. This algorithm is the same as Depth
First Traversal for a tree but differs in maintaining a Boolean to check if the node has already
been visited or not. This is important for graph traversal as cycles also exist in the graph. A
stack is maintained in this algorithm to store the suspended nodes while traversal. It is named
so because we first travel to the depth of each adjacent node and then continue traversing
another adjacent node.
This algorithm is contrary to the BFS algorithm where all the adjacent nodes are visited
followed by neighbors to the adjacent nodes. It starts exploring the graph from one node and
explores its depth before backtracking. Two things are considered in this algorithm:
Visiting a Vertex: Selection of a vertex or node of the graph to traverse.
Exploration of a Vertex: Traversing all the nodes adjacent to a visited vertex.
procedure DFS_implement(G, v):
    let St be a stack
    St.push(v)
    while St has elements
        v = St.pop()
        if v is not labeled as visited:
            label v as visited
            for all edges from v to w in G.adjacentEdges(v) do
                St.push(w)
Linear Traversal also exists for DFS that can be implemented in 3 ways:
Preorder
Inorder
PostOrder
Reverse postorder is a very useful traversal order, used in topological sorting as well as
various analyses. A stack is also maintained to store the nodes whose exploration is still
pending.
In DFS, the below steps are followed to traverse the graph. For example, a given graph, let us
start traversal from 1:
Below are the steps to DFS Algorithm with advantages and disadvantages:
Step1: Node 1 is visited and added to the sequence as well as the spanning tree.
Step2: The adjacent node of 1, that is 4, is explored; thus 1 is pushed onto the stack, and 4
is added to the sequence as well as the spanning tree.
Step3: One of the adjacent nodes of 4 is explored; thus 4 is pushed onto the stack, and 3
enters the sequence and spanning tree.
Step4: Adjacent nodes of 3 are explored by pushing it onto the stack and 10 enters the
sequence. As there is no adjacent node to 10, thus 3 is popped out of the stack.
Step5: Another adjacent node of 3 is explored: 3 is pushed onto the stack and 9 is visited. As
there is no adjacent node of 9, 3 is popped out, and the last adjacent node of 3, i.e., 2, is visited.
A similar process is followed for all the nodes till the stack becomes empty, which is the
stop condition for the traversal algorithm. The dotted lines in the spanning tree refer to
the back edges present in the graph.
In this way, all the nodes in the graph are traversed without repeating any of the nodes.
Advantages: The memory requirement for this algorithm is very low; it has lower space
complexity than BFS.
Code:
import java.util.Stack;

public class DepthFirstSearch {
    static void depthFirstSearch(int[][] matrix, int source) {
        boolean[] visited = new boolean[matrix.length];
        visited[source - 1] = true;
        Stack<Integer> stack = new Stack<>();
        stack.push(source);
        int i, x;
        System.out.println("Depth first order is");
        System.out.println(source);
        while (!stack.isEmpty()) {
            x = stack.pop();
            for (i = 0; i < matrix.length; i++) {
                if (matrix[x - 1][i] == 1 && visited[i] == false) {
                    stack.push(x);        // suspend the current vertex
                    visited[i] = true;
                    System.out.println(i + 1);
                    x = i + 1;            // move to the newly visited vertex
                    i = -1;               // rescan its neighbours from the start
                }
            }
        }
    }

    public static void main(String[] args) {
        // 0/1 adjacency matrix for a sample 5-vertex graph
        int[][] mymatrix = {
            {0, 1, 1, 0, 0},
            {1, 0, 0, 1, 0},
            {1, 0, 0, 0, 1},
            {0, 1, 0, 0, 0},
            {0, 0, 1, 0, 0}
        };
        depthFirstSearch(mymatrix, 1);
    }
}
Output:
Explanation of the above program: We made a graph having 5 vertices and used a visited
array to keep track of all visited vertices. For each node that has unvisited adjacent nodes,
the same algorithm repeats till all the nodes are visited; the algorithm then goes back to the
calling vertex and pops it from the stack.
Depth-limited search
This search strategy is similar to DFS with a small difference: in depth-limited search, we
limit the search by imposing a depth limit l on the depth of the search tree, so it does not
need to explore till infinity. As a result, depth-first search is a special case of depth-limited
search in which the limit l is infinite.
Depth-limited search on a binary tree
In the above figure, the depth limit is 1. So only levels 0 and 1 get expanded, giving the
DFS sequence A->B->C starting from the root node A. It does not give a satisfactory result
because we could not reach the goal node I.
Set a variable NODE to the initial state, i.e., the root node.
Set a variable GOAL which contains the value of the goal state.
Loop each node by traversing in DFS manner till the depth-limit value.
While performing the looping, start removing the elements from the stack in LIFO
order.
If the goal state is found, return goal state. Else terminate the search.
Completeness: Depth-limited search does not guarantee to reach the goal node.
Optimality: It does not give an optimal solution as it expands the nodes till the depth-
limit.
Note: Depth-limit search terminates with two kinds of failures: the standard failure value
indicates “no solution,” and cut-off value, which indicates “no solution within the depth-
limit.”
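The steps above can be sketched as a recursive DFS with a depth cutoff; the tree encoding below is illustrative:

```java
import java.util.*;

// Minimal depth-limited search sketch: DFS that refuses to expand nodes
// below the depth limit l. The tree encoding is illustrative.
class DepthLimitedSearchSketch {
    // Returns true if the goal is reachable within the given depth limit.
    static boolean dls(Map<String, List<String>> children, String node,
                       String goal, int limit) {
        if (node.equals(goal)) return true;
        if (limit == 0) return false;   // cutoff: behave as if no successors exist
        for (String child : children.getOrDefault(node, Collections.emptyList()))
            if (dls(children, child, goal, limit - 1))
                return true;
        return false;
    }
}
```

Iterative deepening, discussed next, simply calls such a routine with limit 0, 1, 2, ... until the goal is found.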
This search is a combination of BFS and DFS, as BFS guarantees to reach the goal node and
DFS occupies less memory space. Therefore, iterative deepening search combines these two
advantages of BFS and DFS to reach the goal node. It gradually increases the depth-limit
from 0,1,2 and so on and reach the goal node.
In the above figure, the goal node is H and initial depth-limit =[0-1]. So, it will expand level
0 and 1 and will terminate with A->B->C sequence. Further, change the depth-limit =[0-3], it
will again expand the nodes from level 0 till level 3 and the search terminate with A->B->D-
>F->E->H sequence where H is the desired goal node.
Loop each node up to the limit value and further increase the limit value accordingly.
Completeness: Iterative deepening search is complete, like BFS, provided the branching
factor is finite.
Space Complexity: It has the same space complexity as depth-first search, i.e., O(bd).
Note: Generally, iterative deepening search is required when the search space is large, and the
depth of the solution is unknown.
Bidirectional search
The strategy behind bidirectional search is to run two searches simultaneously: one forward
search from the initial state and the other backward from the goal, hoping that both searches
will meet in the middle. As soon as the two searches intersect one another, the bidirectional
search terminates with the goal node. This search is implemented by replacing the goal test
with a check of whether the two searches intersect, because if they do, a solution has been
found.
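A minimal sketch of this idea, using two BFS-style frontiers and a replaced goal test that checks for intersection; the graph is assumed undirected, a full implementation would also track visited nodes, and all names are illustrative:

```java
import java.util.*;

// Minimal bidirectional search sketch: two frontiers, one expanded from the
// start and one from the goal, stopping when they intersect. Frontier-only
// for brevity (no visited sets); the graph encoding is illustrative.
class BidirectionalSearchSketch {
    static boolean meets(Map<Integer, List<Integer>> adj, int start, int goal) {
        Set<Integer> fromStart = new HashSet<>(Arrays.asList(start));
        Set<Integer> fromGoal = new HashSet<>(Arrays.asList(goal));
        while (!fromStart.isEmpty() && !fromGoal.isEmpty()) {
            if (!Collections.disjoint(fromStart, fromGoal)) return true;
            fromStart = expand(adj, fromStart);      // one forward step
            if (!Collections.disjoint(fromStart, fromGoal)) return true;
            fromGoal = expand(adj, fromGoal);        // one backward step
        }
        return false;   // a frontier emptied: the searches cannot meet
    }

    // One breadth-wise expansion step of a frontier.
    static Set<Integer> expand(Map<Integer, List<Integer>> adj, Set<Integer> frontier) {
        Set<Integer> next = new HashSet<>();
        for (int n : frontier)
            next.addAll(adj.getOrDefault(n, Collections.emptyList()));
        return next;
    }
}
```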
Some toy problems, such as 8-puzzle, 8-queen, tic-tac-toe, etc., can be solved more
efficiently with the help of a heuristic function. Let’s see how:
Consider the following 8-puzzle problem where we have a start state and a goal state. Our
task is to slide the tiles of the current/start state and place it in an order followed in the goal
state. There can be four moves either left, right, up, or down. There can be several ways to
convert the current/start state to the goal state, but, we can use a heuristic function h(n) to
solve the problem more efficiently.
So, there are a total of three tiles out of position, i.e., 6, 5 and 4 (do not count the empty tile
present in the goal state), i.e., h(n) = 3. Now, we need to minimize the value of h(n) to 0.
We can construct a state-space tree to minimize the h(n) value to 0, as shown below:
It is seen from the above state-space tree that the goal state is reached by minimizing from
h(n)=3 to h(n)=0. However, we can create and use several heuristic functions as per the
requirement. It is also clear from the above example that a heuristic function h(n) can be
defined as the information required to solve a given problem more efficiently. The
information can be related to the nature of the state, the cost of transforming from one state
to another, goal node characteristics, etc., which is expressed as a heuristic function.
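The misplaced-tiles heuristic described above can be sketched as follows; the flat board encoding (with 0 for the empty tile) is illustrative, not the exact figure from the text:

```java
// Minimal sketch of the misplaced-tiles heuristic h(n) for the 8-puzzle:
// count the tiles that differ from the goal, skipping the empty tile (0).
// The board encoding is illustrative.
class MisplacedTiles {
    static int h(int[] state, int[] goal) {
        int misplaced = 0;
        for (int i = 0; i < state.length; i++)
            if (state[i] != 0 && state[i] != goal[i])   // do not count the blank
                misplaced++;
        return misplaced;   // h(n) = 0 means the goal state is reached
    }
}
```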
Exercises
Module -2
Local and Adversarial search
1. Hill Climbing Algorithm: Hill climbing search is a local search problem. The
purpose of the hill climbing search is to climb a hill and reach the topmost peak/ point of that
hill. It is based on the heuristic search technique where the person who is climbing up on the
hill estimates the direction which will lead him to the highest peak.
To understand the concept of hill climbing algorithm, consider the below landscape
representing the goal state/peak and the current state of the climber. The topographical
regions shown in the figure can be defined as:
Global Maximum: It is the highest point on the hill, which is the goal state.
Local Maximum: It is a peak that is higher than its neighboring states but lower than the
global maximum.
Flat local maximum: It is the flat area over the hill where it has no uphill or
downhill. It is a saturated point of the hill.
Simple hill climbing is the simplest technique to climb a hill. The task is to reach the highest
peak of the mountain. Here, the movement of the climber depends on his moves/steps. If he
finds his next step better than the previous one, he continues to move; else he remains in the
same state. This search focuses only on his previous and next steps.
2. If the CURRENT node=GOAL node, return GOAL and terminate the search.
Steepest-ascent hill climbing is different from simple hill climbing search. Unlike simple hill
climbing, it considers all the successor nodes, compares them, and chooses the node which is
closest to the solution. Steepest-ascent hill climbing is similar to best-first search because it
evaluates every successor node instead of just one.
Note: Both simple, as well as steepest-ascent hill climbing search, fails when there is no
closer node.
2. If the CURRENT node=GOAL node, return GOAL and terminate the search.
3. Loop until a better node is not found to reach the solution.
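The steepest-ascent loop described above can be sketched over a simple one-dimensional landscape; the elevation array and names are illustrative:

```java
// Minimal steepest-ascent hill-climbing sketch over a 1-D landscape:
// from the current position, move to the best neighbour while it improves.
// The landscape encoding is illustrative.
class HillClimbingSketch {
    // heights[i] is the elevation at position i; returns the index reached.
    static int climb(int[] heights, int current) {
        while (true) {
            int best = current;
            // compare all successors (the left and right neighbours)
            for (int next : new int[]{current - 1, current + 1})
                if (next >= 0 && next < heights.length
                        && heights[next] > heights[best])
                    best = next;
            if (best == current) return current;   // no better neighbour: a peak
            current = best;
        }
    }
}
```

Note that starting on the wrong slope strands the climber on a local maximum, the limitation discussed below.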
Stochastic hill climbing does not focus on all the nodes. It selects one node at random and
decides whether it should be expanded or search for a better one.
The random-restart algorithm is based on a try-and-try strategy. It iteratively searches the
nodes and selects the best one at each step until the goal is found. The success depends most
commonly on the shape of the hill. If there are few plateaus, local maxima, and ridges, it
becomes easy to reach the destination.
Hill climbing algorithm is a fast and furious approach. It finds the solution state rapidly
because it is quite easy to improve a bad state. But there are the following limitations of this
search:
Local Maxima: It is a peak of the mountain that is higher than all its
neighboring states but lower than the global maximum. It is not the goal peak because
there is another peak higher than it.
Ridges: It is a challenging problem where the person commonly finds two or more local
maxima of the same height. It becomes difficult for the person to navigate to the right
point, and he gets stuck at that point itself.
2. Simulated Annealing
Simulated annealing is similar to the hill climbing algorithm. It works on the current
situation. It picks a random move instead of picking the best move. If the move improves
the current situation, it is always accepted as a step towards the solution state; else it
accepts the move with a probability less than 1. This search technique was first used in 1980
to solve VLSI layout problems. It is also applied to factory scheduling and other large
optimization tasks.
Local beam search is quite different from random-restart search. It keeps track of k states
instead of just one. It selects k randomly generated states, and expand them at each step. If
any state is a goal state, the search stops with success. Else it selects the best k successors
from the complete list and repeats the same process. In random-restart search, each search
process runs independently, but in local beam search, the necessary information is shared
between the parallel search processes.
This search can suffer from a lack of diversity among the k states.
In every simulated annealing example, a random new point is generated. The distance
between the current point and the new point is based on a probability distribution whose
scale is proportional to the temperature. The algorithm aims at all those points that minimize
the objective under certain constraints and probabilities. Points that raise the objective are
also accepted, in order to explore all the possible solutions instead of concentrating only on
local minima.
There are a set of steps that are performed for simulated annealing in ai. These steps can be
summarized as follows:
Simulated annealing creates a trial point randomly. The algorithm selects the distance
between the current point and the trial point according to a probability distribution whose
scale is the temperature. The annealing function sets the distance of the trial-point
distribution. To keep the boundaries intact, the trial point is shifted gradually.
The simulated annealing formula then determines whether the new point is better than the
old one or not. If the new point is better, it becomes the next point; if it is worse, it can still
be accepted, depending upon the simulated annealing acceptance function.
The algorithm systematically reduces the temperature, selecting the best point generated in
the process.
The annealing parameters are set for lowering the values, raising and reducing the
temperature. The simulated annealing parameters are based on the values of the probable
gradients of every dimension of the objective.
The simulated annealing is concluded when it reaches the lowest minimum or meets any of
the specified stopping criteria.
Some of the conditions that are considered as the basis to stop the simulated-annealing are as
follows:
The simulated annealing runs until the value of the objective function goes below the
tolerance value. The default value is 1e-6.
The default value of iterations in simulated-annealing is INF. This can be set to any
positive integer as well. When the algorithm exceeds the iteration value, it stops.
The annealing concludes when the maximum number of evaluations is achieved. The
default value of such evaluations is 3000 * number of variables.
The default value of maximum time is Inf, and when that is reached, the algorithm
stops.
When the best objective function value goes below the objective limit, it concludes. The
default value of this limit is -Inf.
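The propose-accept-cool loop described in the steps above can be sketched on a toy one-dimensional objective; the objective f(x) = x*x, the cooling schedule, and the step scale are all illustrative choices, not fixed parameters of the method:

```java
import java.util.Random;

// Minimal simulated-annealing sketch: propose a random nearby trial point,
// always accept improvements, accept worse points with probability
// exp(-delta / T), and cool the temperature each iteration.
// The objective f(x) = x*x and all parameters are illustrative.
class SimulatedAnnealingSketch {
    static double f(double x) { return x * x; }   // objective to minimize

    static double anneal(long seed) {
        Random rnd = new Random(seed);
        double x = 10.0;                               // illustrative start point
        for (double t = 1.0; t > 1e-4; t *= 0.99) {    // cooling schedule
            // trial-point distance shrinks in proportion to the temperature
            double trial = x + (rnd.nextDouble() - 0.5) * 2.0 * t * 10;
            double delta = f(trial) - f(x);
            // accept improvements always; worse moves with prob exp(-delta/t)
            if (delta < 0 || rnd.nextDouble() < Math.exp(-delta / t))
                x = trial;
        }
        return x;   // should end near the minimum at x = 0
    }
}
```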
At the onset, a city class needs to be created to specify several destinations the
travelling salesman would visit.
After that, a class has to be created that keeps track of the cities.
Then a class is created that models the tour of the travelling salesman.
With all the different classes and the information in hand, a simulated-annealing
algorithm is created.
4. Beam Search
A heuristic search algorithm that examines a graph by extending the most promising nodes
in a limited set is known as beam search. Beam search is a heuristic search technique that
always expands the W best nodes at each level. It progresses level by level and moves
downwards only from the best W nodes at each level. Beam search constructs its search tree
using breadth-first search: it generates all the successors of the current level's states at each
level of the tree. However, at each level, it only evaluates W states; other nodes are not taken
into account.
The heuristic cost associated with the node is used to choose the best nodes. The width of the
beam search is denoted by W. If B is the branching factor, at every depth, there will always
be W × B nodes under consideration, but only W will be chosen. More states are trimmed
when the beam width is reduced.
When W = 1, the search becomes a hill-climbing search in which the best node is always
chosen from the successor nodes. If the beam width is unlimited, no states are pruned, and
beam search is identical to breadth-first search.
The beam width bounds the amount of memory needed to complete the search, but it comes
at the cost of completeness and optimality (possibly it will not find the best solution). The
reason for this danger is that the desired state could have been pruned.
Example: The search tree generated using this algorithm with W = 2 & B = 3 is given below :
The black nodes are selected based on their heuristic values for further expansion.
Start
Create a list OPEN containing the start NODE.
Loop:
    if NODE = Goal,
        set FOUND = True and exit the loop
    else
        find SUCCs of NODE, if any, with their estimated costs,
        and keep only the best W of them in OPEN.
If FOUND = True,
    return the goal path
else
    return No solution
Stop
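The level-wise pruning to width W described above can be sketched as follows; the tree and heuristic values are illustrative:

```java
import java.util.*;

// Minimal beam-search sketch: expand level by level, keeping only the W
// nodes with the best (lowest) heuristic cost at each level.
// The tree encoding and heuristic values are illustrative.
class BeamSearchSketch {
    // children: node -> successors; h: node -> heuristic cost.
    // Returns the final (deepest) beam of at most W nodes.
    static List<String> lastLevel(Map<String, List<String>> children,
                                  Map<String, Integer> h, String root, int w) {
        List<String> level = new ArrayList<>(Arrays.asList(root));
        while (true) {
            List<String> next = new ArrayList<>();
            for (String n : level)                    // generate all successors
                next.addAll(children.getOrDefault(n, Collections.emptyList()));
            if (next.isEmpty()) return level;
            next.sort(Comparator.comparing(h::get)); // best heuristic first
            // prune to width W: the rest are not taken into account
            level = next.subList(0, Math.min(w, next.size()));
        }
    }
}
```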
5. Genetic Algorithms
Genetic Algorithms (GAs) are adaptive heuristic search algorithms that belong to the larger
class of evolutionary algorithms. Genetic algorithms are based on the ideas of natural
selection and genetics. They are an intelligent exploitation of random search, provided with
historical data, to direct the search into the region of better performance in the solution
space. They are commonly used to generate high-quality solutions for optimization problems
and search problems.
Genetic algorithms simulate the process of natural selection, which means that species which
can adapt to changes in their environment survive, reproduce, and go on to the next
generation. In simple words, they simulate “survival of the fittest” among individuals of
consecutive generations to solve a problem. Each generation consists of a population of
individuals, and each individual represents a point in the search space and a possible
solution. Each individual is represented as a string of characters/integers/floats/bits. This
string is analogous to the chromosome.
Genetic algorithms are based on an analogy with genetic structure and behavior of
chromosome of the population. Following is the foundation of GAs based on this analogy –
1. Individuals in a population compete for resources and mates.
2. Those individuals who are successful (fittest) then mate to create more offspring than
others.
3. Genes from the “fittest” parents propagate throughout the generation; that is, sometimes
parents create offspring which are better than either parent.
Search space
A population of individuals is maintained within the search space. Each individual represents
a solution in the search space to the given problem. Each individual is coded as a finite-length
vector (analogous to a chromosome) of components. These variable components are
analogous to genes. Thus a chromosome (individual) is composed of several genes (variable
components).
Fitness Score
A fitness score is given to each individual which shows the ability of that individual to
“compete”. Individuals having optimal (or near-optimal) fitness scores are sought.
The GA maintains a population of n individuals (chromosomes/solutions) along with their
fitness scores. The individuals having better fitness scores are given more chance to
reproduce than others. The individuals with better fitness scores are selected to mate and
produce better offspring by combining the chromosomes of the parents. The population size
is static, so room has to be created for new arrivals. So, some individuals die and get
replaced by new arrivals, eventually creating a new generation when all the mating
opportunities of the old population are exhausted. It is hoped that over successive
generations better solutions will arrive, while the least fit die.
Each new generation has, on average, more “good genes” than the individuals (solutions) of
previous generations. Thus each new generation has better “partial solutions” than previous
generations. Once the offspring produced are not significantly different from the offspring
produced by previous populations, the population has converged. The algorithm is then said
to have converged to a set of solutions for the problem.
Once the initial generation is created, the algorithm evolves the generation using the
following operators −
1) Selection Operator: The idea is to give preference to the individuals with good fitness
scores and allow them to pass their genes to the successive generations.
2) Crossover Operator: This represents mating between individuals. Two individuals are
selected using the selection operator, and crossover sites are chosen randomly. Then the
genes at these crossover sites are exchanged, creating a completely new individual
(offspring).
3) Mutation Operator: The key idea is to insert random genes in the offspring to maintain
diversity in the population and avoid premature convergence.
For example –
Given a target string, the goal is to produce the target string starting from a random string of
the same length. In the following implementation, the analogies are: the characters A-Z, a-z,
0-9 and other special symbols are the genes; a string generated from these characters is a
chromosome/solution/individual; and the fitness score is the number of characters that differ
from the characters of the target string at each index. So an individual with a lower fitness
value is given more preference.
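The string-evolution example above can be sketched as follows. This is a minimal sketch, not the full implementation: the target string, population size, mutation rate, and truncation-style selection are illustrative assumptions.

```python
import random
import string

GENES = string.ascii_letters + string.digits + " "
TARGET = "Hello AI"          # hypothetical target string

def random_individual():
    # A chromosome: a random string of the same length as the target.
    return "".join(random.choice(GENES) for _ in range(len(TARGET)))

def fitness(ind):
    # Number of characters differing from the target (lower is better).
    return sum(1 for a, b in zip(ind, TARGET) if a != b)

def crossover(p1, p2):
    # Exchange genes at a randomly chosen crossover site.
    site = random.randint(1, len(TARGET) - 1)
    return p1[:site] + p2[site:]

def mutate(ind, rate=0.1):
    # Each gene is replaced by a random gene with probability `rate`.
    return "".join(random.choice(GENES) if random.random() < rate else g
                   for g in ind)

def evolve(pop_size=100, generations=300):
    population = [random_individual() for _ in range(pop_size)]
    for gen in range(generations):
        population.sort(key=fitness)
        if fitness(population[0]) == 0:      # population has converged
            return population[0], gen
        # Selection: the fitter half gets the chance to reproduce;
        # the best individual survives unchanged (elitism).
        parents = population[:pop_size // 2]
        population = [population[0]] + [
            mutate(crossover(random.choice(parents), random.choice(parents)))
            for _ in range(pop_size - 1)]
    population.sort(key=fitness)
    return population[0], generations

best, gen = evolve()
print(best, gen)
```

Each generation replaces the old population, mirroring the text: the static population size forces old individuals to die as new offspring arrive.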
Online search is a necessary idea for unknown environments, where the agent does not know
what states exist or what its actions do. In this state of ignorance, the agent faces an
exploration problem and must use its actions as experiments in order to learn enough to make
deliberation worthwhile. The canonical example of online search is a robot that is placed in a
new building and must explore it to build a map that it can use for getting from A to B.
Methods for escaping from labyrinths—required knowledge for aspiring heroes of antiquity
—are also examples of online search algorithms. Spatial exploration is not the only form of
exploration, however. Consider a newborn baby: it has many possible actions but knows the
outcomes of none of them, and it has experienced only a few of the possible states that it can
reach. The baby’s gradual discovery of how the world works is, in part, an online search
process.
Online search problems. An online search problem must be solved by an agent executing
actions, rather than by pure computation. We assume a deterministic and fully observable
environment (these assumptions can be relaxed), but we stipulate that the agent knows only
the following:
• ACTIONS(s), the legal actions in state s;
• the step-cost function c(s, a, s′), which cannot be used until the agent knows that s′ is
the outcome; and
• GOAL-TEST(s).
Note in particular that the agent cannot determine RESULT(s, a) except by actually being in s
and doing a. For example, in the maze problem shown in Figure 4.19, the agent does not
know that going Up from (1,1) leads to (1,2); nor, having done that, does it know that going
Down will take it back to (1,1). This degree of ignorance can be reduced in some applications
—for example, a robot explorer might know how its movement actions work and be ignorant
only of the locations of obstacles.
Game playing is an important domain of artificial intelligence. Games don't require much
knowledge; the only knowledge we need to provide is the rules, the legal moves, and the
conditions for winning or losing the game.
Both players try to win the game, so both try to make the best possible move at each turn.
Searching techniques like BFS (Breadth-First Search) are not practical here because the
branching factor is very high, so searching would take a lot of time. We therefore need other
search procedures.
6. Minimax search
The most common search technique in game playing is the Minimax search procedure. It is a
depth-first, depth-limited search procedure, used for games like chess and tic-tac-toe.
MOVEGEN: generates all the moves possible from the current position.
STATICEVALUATION: returns a value representing the goodness of a position from the
viewpoint of the two players.
Minimax applies to two-player games, so we call the first player PLAYER1 and the second
PLAYER2. The value of each node is backed up from its children: for PLAYER1 the
backed-up value is the maximum of its children's values, and for PLAYER2 it is the
minimum. The algorithm provides the most promising move to PLAYER1, assuming that
PLAYER2 also makes its best move. It is a recursive algorithm, as the same procedure
occurs at each level.
Figure 2.9: Before backing-up of values
We assume that PLAYER1 will start the game. Four levels are generated. The values of
nodes H, I, J, K, L, M, N, O are provided by the STATICEVALUATION function. Level 3 is
a maximizing level, so each node at level 3 takes the maximum value of its children. Level 2
is a minimizing level, so each of its nodes takes the minimum value of its children. This
process continues upward. The value of A is 23, which means A should choose move C to win.
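The backed-up-value computation can be sketched as follows, with MOVEGEN and STATICEVALUATION supplied as functions. The toy game tree and its leaf values are hypothetical, not the ones of Figure 2.9.

```python
def minimax(node, depth, maximizing, movegen, static_evaluation):
    # Depth-limited, depth-first search: back up the maximum of the
    # children's values on maximizing levels, the minimum on minimizing ones.
    moves = movegen(node)
    if depth == 0 or not moves:
        return static_evaluation(node)
    values = [minimax(m, depth - 1, not maximizing, movegen, static_evaluation)
              for m in moves]
    return max(values) if maximizing else min(values)

# Hypothetical two-level game tree: internal nodes list their children,
# leaves carry static-evaluation scores.
tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"],
        "D": 3, "E": 5, "F": 2, "G": 9}
movegen = lambda n: tree[n] if isinstance(tree[n], list) else []
static_evaluation = lambda n: tree[n]
print(minimax("A", 2, True, movegen, static_evaluation))  # MAX(MIN(3,5), MIN(2,9)) = 3
```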
The word 'pruning' means cutting off branches and leaves. In data science, pruning is a
widely used term referring to pre- and post-pruning in decision trees and random forests.
Alpha-beta pruning is simply the pruning of useless branches in a game tree. The alpha-beta
pruning algorithm was discovered independently by several researchers in the mid-1900s.
Minimax is a classic depth-first search technique for a sequential two-player game. The two
players are called MAX and MIN. The minimax algorithm is designed for finding the optimal
move for MAX, the player at the root node. The search tree is created by recursively
expanding all nodes from the root in a depth-first manner until either the end of the game or
the maximum search depth is reached. Let us explore this algorithm in detail.
As already mentioned, there are two players in the game, viz. Max and Min. Max plays
first. Max's task is to maximise its reward, while Min's task is to minimise Max's reward,
increasing its own reward at the same time. Let's say Max can take actions a, b, or c.
Which one of them will give Max the best reward when the game ends? To answer this
question, we need to explore the game tree to a sufficient depth and assume that Min plays
optimally to minimise the reward of Max.
Here is an example. Four coins are in a row and each player can pick up one coin or two
coins on his/her turn. The player who picks up the last coin wins. Assuming that Max plays
first, what move should Max make to win?
If Max picks two coins, then only two coins remain and Min can pick two coins and win.
Thus picking up 1 coin shall maximise Max’s reward.
As you might have noticed, the nodes of the tree in the figure below have values inscribed
on them; these are called minimax values. The minimax value of a node is the utility of the
node if it is a terminal node.
If the node is a non-terminal Max node, the minimax value of the node is the maximum of the
minimax values of all of the node’s successors. On the other hand, if the node is a non-
terminal Min node, the minimax value of the node is the minimum of the minimax values of
all of the node’s successors.
Now we will discuss the idea behind alpha-beta pruning. Applying alpha-beta pruning to the
standard minimax algorithm yields exactly the same decision as the standard algorithm, but
it prunes (cuts down) the branches that are irrelevant, i.e. those that cannot affect the final
decision made by the algorithm. This avoids needless work in the interpretation of complex
trees.
Now let us discuss the intuition behind this technique. Let us try to find minimax decision in
the below tree :
Figure 2.12: After backing-up of values
In this case,
Minimax Decision = MAX {MIN {3, 5, 10}, MIN {2, a, b}, MIN {2, 7, 3}}
Here you may wonder how we can find the maximum when some values are missing. The
answer: in the second term, the minimum is some value c = MIN(2, a, b), which must satisfy
c ≤ 2. So when we choose the maximum of 3, c, and 2, the result is 3 regardless of a and b.
We have reached a decision without ever looking at those nodes, and this is where alpha-beta
pruning comes into play.
Alpha: Alpha is the best choice, i.e. the highest value, that we have found at any
instance along the path of the Maximizer. The initial value of alpha is −∞.
Beta: Beta is the best choice, i.e. the lowest value, that we have found at any instance
along the path of the Minimizer. The initial value of beta is +∞.
Each node keeps track of its alpha and beta values. Alpha can be updated only on
MAX's turn and, similarly, beta can be updated only on MIN's turn: MAX updates only
alpha values and MIN updates only beta values.
While backing up the tree, node values (not alpha and beta values) are passed to parent
nodes; alpha and beta values are passed down to child nodes only.
1. We start with the initial move, defining alpha and beta with their worst-case values:
α = −∞ and β = +∞. We prune a node only when alpha becomes greater than or equal
to beta.
2. Since the initial value of alpha is less than beta, nothing is pruned yet. Now it is MAX's
turn, so at node D the value of alpha is calculated: max(2, 3) = 3. So the value of alpha at
node D is 3.
3. The next move is at node B, and it is MIN's turn. At node B the value of beta becomes
min(3, +∞) = 3, so at node B, alpha = −∞ and beta = 3.
Figure 2.14: Max-Max
In the next step, the algorithm traverses the next successor of node B, which is node E, and
the values α = −∞ and β = 3 are passed down.
4. Now it is MAX's turn, so at node E we look for the maximum. The current value of alpha
at E is −∞, and comparing it with 5 gives MAX(−∞, 5) = 5. So at node E, alpha = 5 and
beta = 3. Alpha is now greater than or equal to beta, which satisfies the pruning condition,
so the right successor of node E is pruned (never traversed) and the value at node E will
be 5.
Figure 2.15: Max-Max
6. In the next step the algorithm returns to node A from node B. At node A, alpha is updated
to MAX(−∞, 3) = 3, so the values of alpha and beta at node A become 3 and +∞
respectively, and these values are passed down to node C and then to node F.
7. At node F, alpha (3) is first compared with the left child, 0: MAX(3, 0) = 3; then with the
right child, 1: MAX(3, 1) = 3. Alpha remains 3, but the node value of F becomes 1.
Figure 2.16: Max-Max
8. Node F returns its node value 1 to C, where it is compared with the beta value. It is now
MIN's turn, so MIN(+∞, 1) = 1. At node C, α = 3 and β = 1, and alpha is greater than
beta, which again satisfies the pruning condition. So the next successor of node C, i.e. G,
is pruned and the algorithm never computes the entire subtree under G.
The tree above is the final tree, showing which nodes were computed and which were not.
For this example the optimal value for the maximizer is 3.
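The steps above can be sketched as follows. MAX and MIN levels alternate down the tree; the values of the leaves that end up pruned (the 9 under E and the pair under G) are hypothetical stand-ins, since the algorithm never examines them.

```python
def alphabeta(node, maximizing, alpha, beta):
    if isinstance(node, int):          # leaf: static-evaluation value
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:          # pruning condition: alpha >= beta
                break                  # skip the remaining children
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)
            if alpha >= beta:
                break
        return value

# Tree from the worked example: A (MAX) -> B, C (MIN) -> D, E, F, G (MAX) -> leaves.
tree = [[[2, 3], [5, 9]], [[0, 1], [7, 5]]]
print(alphabeta(tree, True, float("-inf"), float("inf")))  # optimal value: 3
```

Running it reproduces the result of the walkthrough: the right child of E and the whole subtree under G are cut off, and the maximizer's optimal value is 3.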
The effectiveness of alpha-beta pruning depends on the order in which nodes are examined;
move ordering therefore plays an important role.
1. Worst ordering: In some cases alpha-beta pruning prunes none of the nodes and behaves
like the standard minimax algorithm, while still paying the overhead of maintaining the
alpha and beta factors, so no time is saved. This is called worst ordering. It occurs when
the best move lies on the right side of the tree.
2. Ideal ordering: In other cases many nodes are pruned. This is called ideal ordering. It
occurs when the best move lies on the left side of the tree. Because we apply DFS, the
left side of the tree is searched first, and with ideal ordering alpha-beta can search twice
as deep as minimax in the same amount of time.
Rules to find a good ordering
Nodes should be ordered so that the best candidates are examined first.
• a set of variables,
• a domain for each variable, and
• a set of constraints.
The aim is to choose a value for each variable so that the resulting possible world satisfies the
constraints; we want a model of the constraints.
A finite CSP has a finite set of variables and a finite domain for each variable. Many of the
methods considered in this chapter only work for finite CSPs, although some are designed for
infinite, even continuous, domains.
The multidimensional aspect of these problems, where each variable can be seen as a separate
dimension, makes them difficult to solve but also provides structure that can be exploited.
• Find a model.
• Find the best model, given a measure of how good models are; see Section 4.10.
CSPs are very common, so it is worth trying to find relatively efficient ways to solve them.
Determining whether there is a model for a CSP with finite domains is NP-hard (see box) and
no known algorithms exist to solve such problems that do not use exponential time in the
worst case. However, just because a problem is NP-hard does not mean that all instances are
difficult to solve. Many instances have structure that can be exploited.
CSP solvers can be faster than state-space searchers because a CSP solver can quickly
eliminate large swaths of the search space: once we find that a partial assignment is not a
solution, we can immediately discard all further refinements of that partial assignment.
Consider a small part of car assembly, consisting of 15 tasks: install the axles (front and
back), affix all four wheels (right and left, front and back), tighten the nuts for each wheel,
affix the hubcaps, and inspect the final assembly. Represent the tasks with 15 variables:
X = {AxleF, AxleB, WheelRF, WheelLF, WheelRB, WheelLB, NutsRF, NutsLF, NutsRB, NutsLB, CapRF,
CapLF, CapRB, CapLB, Inspect}. The value of each variable is the time at which the task starts.
We assert that the inspection comes last and takes 3 minutes; for every variable X except
Inspect we add a constraint of the form X + dX ≤ Inspect.
There is also a requirement to get the whole assembly done in 30 minutes; we can achieve
that by limiting the domain of every variable:
Di = {1, 2, 3, …, 27}.
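A minimal sketch of checking one candidate assignment against these constraints. Only a few of the 15 tasks appear, and the task durations (other than the stated 3-minute inspection) are illustrative assumptions, not from the text.

```python
# Durations d_X in minutes; only Inspect's 3 minutes comes from the text.
durations = {"AxleF": 10, "WheelRF": 1, "NutsRF": 2, "CapRF": 1, "Inspect": 3}

def satisfies(assignment):
    # Precedence: every task X must satisfy X + d_X <= Inspect.
    inspect = assignment["Inspect"]
    for task, start in assignment.items():
        if task != "Inspect" and start + durations[task] > inspect:
            return False
    # Deadline: the whole assembly, including inspection, fits in 30 minutes.
    return inspect + durations["Inspect"] <= 30

print(satisfies({"AxleF": 1, "WheelRF": 11, "NutsRF": 12,
                 "CapRF": 14, "Inspect": 27}))   # True
print(satisfies({"AxleF": 20, "WheelRF": 11, "NutsRF": 12,
                 "CapRF": 14, "Inspect": 27}))   # False: 20 + 10 > 27
```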
The simplest kind of CSP involves variables that have discrete, finite domains. E.g. Map-
coloring problems, scheduling with time limits, the 8-queens problem.
A discrete domain can be infinite. e.g. The set of integers or strings. With infinite domains, to
describe constraints, a constraint language must be used instead of enumerating all allowed
combinations of values.
CSP with continuous domains are common in the real world and are widely studied in the
field of operations research.
The simplest type is the unary constraint, which restricts the value of a single variable.
A binary constraint relates two variables. (e.g. SA≠NSW.) A binary CSP is one with only
binary constraints, can be represented as a constraint graph.
Constraint hypergraph: consists of ordinary nodes (circles in the figure) and hypernodes (the
squares), which represent n-ary constraints.
Two ways to transform an n-ary CSP to a binary one:
a. Every finite domain constraint can be reduced to a set of binary constraints if enough
auxiliary variables are introduced, so we could transform any CSP into one with only binary
constraints.
b. The dual-graph transformation: create a new graph in which there will be one variable for
each constraint in the original graph, and one binary constraint for each pair of constraints in
the original graph that share variables.
e.g. If the original graph has variable {X,Y,Z} and constraints <(X,Y,Z),C1> and
<(X,Y),C2>, then the dual graph would have variables {C1,C2} with the binary constraint
<(X,Y),R1>, where (X,Y) are the shared variables and R1 is a new relation that defines the
constraint between the shared variables.
We might prefer a global constraint (such as Alldiff) rather than a set of binary constraints
for two reasons:
1) it is easier and less error-prone to write the problem description using a global
constraint; and
2) it is possible to design special-purpose inference algorithms for global constraints that
are not available for a set of more primitive constraints.
A number of inference techniques use the constraints to infer which variable/value pairs are
consistent and which are not. These include node, arc, path, and k-consistency.
constraint propagation: Using the constraints to reduce the number of legal values for a
variable, which in turn can reduce the legal values for another variable, and so on.
local consistency: If we treat each variable as a node in a graph and each binary constraint as
an arc, then the process of enforcing local consistency in each part of the graph causes
inconsistent values to be eliminated throughout the graph.
Node consistency
A single variable (a node in the CSP network) is node-consistent if all the values in the
variable’s domain satisfy the variable’s unary constraint.
Arc consistency
A variable in a CSP is arc-consistent if every value in its domain satisfies the variable’s
binary constraints.
Xi is arc-consistent with respect to another variable Xj if for every value in the current domain
Di there is some value in the domain Dj that satisfies the binary constraint on the arc (Xi, Xj).
Arc consistency tightens down the domains (unary constraint) using the arcs (binary
constraints).
AC-3 algorithm:
AC-3 maintains a queue of arcs, which initially contains all the arcs in the CSP.
AC-3 then pops an arbitrary arc (Xi, Xj) off the queue and makes Xi arc-consistent with
respect to Xj.
If this revises Di, then all arcs (Xk, Xi), where Xk is a neighbor of Xi, are added to the queue.
If Di is revised down to nothing, then the whole CSP has no consistent solution, and AC-3
returns failure;
otherwise, it keeps checking, trying to remove values from the domains of variables, until no
more arcs are in the queue.
The result is an arc-consistent CSP that has the same solutions as the original one but has
smaller domains.
The complexity of AC-3:
Assume a CSP with n variables, each with domain size at most d, and with c binary
constraints (arcs). Checking the consistency of one arc can be done in O(d²) time, so the
total worst-case time is O(cd³).
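A sketch of AC-3, with binary constraints given as predicates on ordered pairs of variables. The X < Y instance at the end is an illustration, not from the text.

```python
from collections import deque

def revise(domains, constraints, xi, xj):
    # Delete values of Xi that have no supporting value in Dj.
    pred = constraints[(xi, xj)]
    revised = False
    for v in list(domains[xi]):
        if not any(pred(v, w) for w in domains[xj]):
            domains[xi].remove(v)
            revised = True
    return revised

def ac3(domains, constraints):
    queue = deque(constraints)                  # initially all arcs
    while queue:
        xi, xj = queue.popleft()
        if revise(domains, constraints, xi, xj):
            if not domains[xi]:
                return False                    # no consistent solution
            # Re-check every arc pointing at Xi (except the one from Xj).
            for (xk, xm) in constraints:
                if xm == xi and xk != xj:
                    queue.append((xk, xi))
    return True

# Example: X < Y over domains {1, 2, 3}.
domains = {"X": {1, 2, 3}, "Y": {1, 2, 3}}
constraints = {("X", "Y"): lambda a, b: a < b,
               ("Y", "X"): lambda a, b: a > b}
ac3(domains, constraints)
print(domains)   # X loses 3, Y loses 1
```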
Path consistency
Path consistency: A two-variable set {Xi, Xj} is path-consistent with respect to a third
variable Xm if, for every assignment {Xi = a, Xj = b} consistent with the constraint on {Xi,
Xj}, there is an assignment to Xm that satisfies the constraints on {Xi, Xm} and {Xm, Xj}.
Path consistency tightens the binary constraints by using implicit constraints that are inferred
by looking at triples of variables.
K-consistency
K-consistency: A CSP is k-consistent if, for any set of k-1 variables and for any consistent
assignment to those variables, a consistent value can always be assigned to any kth variable.
If we take a CSP with n nodes and make it strongly n-consistent, we are guaranteed to find a
solution in time O(n²d). But any algorithm for establishing n-consistency must take time
exponential in n in the worst case, and also requires space exponential in n.
Global constraints
A global constraint is one involving an arbitrary number of variables (but not necessarily all
variables). Global constraints can be handled by special-purpose algorithms that are more
efficient than general-purpose methods.
A simple algorithm: first remove any variable in the constraint that has a singleton domain,
and delete that variable's value from the domains of the remaining variables. Repeat as long
as there are singleton variables. If at any point an empty domain is produced, or there are
more variables than domain values left, then an inconsistency has been detected.
e.g.
Atmost(10, P1, P2, P3, P4): no more than 10 personnel are assigned in total.
If each variable has the domain {3, 4, 5, 6}, the Atmost constraint cannot be satisfied.
We can enforce consistency by deleting the maximum value of any domain if it is not
consistent with the minimum values of the other domains.
e.g. If each variable in the example has the domain {2, 3, 4, 5, 6}, the values 5 and 6 can be
deleted from each domain.
For large resource-limited problems with integer values, domains are represented by upper
and lower bounds and are managed by bounds propagation.
e.g.
suppose there are two flights, F1 and F2, in an airline-scheduling problem, for which the
planes have capacities 165 and 385, respectively. The initial domains for the numbers of
passengers on each flight are then
D1 = [0, 165] and D2 = [0, 385].
Now suppose we have the additional constraint that the two flights together must carry 420
people: F1 + F2 = 420. Propagating bounds constraints, we reduce the domains to
D1 = [35, 165] and D2 = [255, 385].
A CSP is bounds consistent if for every variable X, and for both the lower-bound and upper-
bound values of X, there exists some value of Y that satisfies the constraint between X and Y
for every variable Y.
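For the flight example, bounds propagation on F1 + F2 = 420 can be sketched as follows; the loop repeatedly re-tightens each variable's bounds from the other variable's bounds until nothing changes.

```python
# Capacities give the initial bounds; the constraint is F1 + F2 = 420.
lo = {"F1": 0, "F2": 0}
hi = {"F1": 165, "F2": 385}
TOTAL = 420

changed = True
while changed:
    changed = False
    for x, y in (("F1", "F2"), ("F2", "F1")):
        new_lo = max(lo[x], TOTAL - hi[y])   # x >= 420 - max(y)
        new_hi = min(hi[x], TOTAL - lo[y])   # x <= 420 - min(y)
        if (new_lo, new_hi) != (lo[x], hi[x]):
            lo[x], hi[x] = new_lo, new_hi
            changed = True

print(lo, hi)   # F1 in [35, 165], F2 in [255, 385]
```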
Sudoku
A Sudoku puzzle can be considered a CSP with 81 variables, one for each square. We use the
variable names A1 through A9 for the top row (left to right), down to I1 through I9 for the
bottom row. The empty squares have the domain {1, 2, 3, 4, 5, 6, 7, 8, 9} and the pre-filled
squares have a domain consisting of a single value.
There are 27 different Alldiff constraints: one for each row, column, and box of 9 squares:
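The variable and constraint structure can be sketched as follows; this builds only the scheme (81 variables, 27 Alldiff groups), leaving out the machinery that enforces the constraints.

```python
# 81 variables A1..I9; 27 Alldiff groups: 9 rows, 9 columns, 9 boxes.
ROWS, COLS = "ABCDEFGHI", "123456789"
squares = [r + c for r in ROWS for c in COLS]
alldiff_groups = (
    [[r + c for c in COLS] for r in ROWS] +                  # 9 rows
    [[r + c for r in ROWS] for c in COLS] +                  # 9 columns
    [[r + c for r in rs for c in cs]                         # 9 boxes
     for rs in ("ABC", "DEF", "GHI") for cs in ("123", "456", "789")])

print(len(squares), len(alldiff_groups))   # 81 27
```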
Backtracking search, a form of depth-first search, is commonly used for solving CSPs.
Inference can be interwoven with search.
Backtracking search: A depth-first search that chooses values for one variable at a time and
backtracks when a variable has no legal values left to assign.
The backtracking algorithm repeatedly chooses an unassigned variable and then tries all
values in the domain of that variable in turn, trying to extend the assignment to a solution. If
an inconsistency is detected, BACKTRACK returns failure, causing the previous call to try
another value.
1) Which variable should be assigned next, and in what order should its values be tried?
2) What inferences should be performed at each step of the search?
3) When the search arrives at an assignment that violates a constraint, can the search avoid
repeating this failure?
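The backtracking scheme can be sketched as follows; the three-region map-coloring instance used to exercise it is a made-up example.

```python
def backtrack(assignment, variables, domains, consistent):
    # Depth-first: choose values for one variable at a time and backtrack
    # when a variable has no legal values left to assign.
    if len(assignment) == len(variables):
        return dict(assignment)
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        assignment[var] = value
        if consistent(assignment):
            result = backtrack(assignment, variables, domains, consistent)
            if result is not None:
                return result
        del assignment[var]
    return None                      # failure: caller tries another value

# Made-up instance: three mutually adjacent regions, three colors.
variables = ["WA", "NT", "SA"]
domains = {v: ["red", "green", "blue"] for v in variables}
adjacent = [("WA", "NT"), ("WA", "SA"), ("NT", "SA")]
consistent = lambda a: all(a[x] != a[y] for x, y in adjacent
                           if x in a and y in a)
print(backtrack({}, variables, domains, consistent))
```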
SELECT-UNASSIGNED-VARIABLE
Variable selection—fail-first
Minimum-remaining-values (MRV) heuristic: the idea of choosing the variable with the
fewest "legal" values. Also known as the "most constrained variable" or "fail-first" heuristic,
it picks the variable that is most likely to cause a failure soon, thereby pruning the search
tree. If some variable X has no legal values left, the MRV heuristic will select X and failure
will be detected immediately, avoiding pointless searches through other variables.
E.g. After the assignment for WA=red and NT=green, there is only one possible value for
SA, so it makes sense to assign SA=blue next rather than assigning Q.
[Powerful guide]
Degree heuristic: The degree heuristic attempts to reduce the branching factor on future
choices by selecting the variable that is involved in the largest number of constraints on other
unassigned variables. [useful tie-breaker]
e.g. SA is the variable with highest degree 5; the other variables have degree 2 or 3; T has
degree 0.
ORDER-DOMAIN-VALUES
Value selection—fail-last
If we are trying to find all the solutions to a problem (not just the first one), then the ordering
does not matter.
Least-constraining-value heuristic: prefer the value that rules out the fewest choices for the
neighboring variables in the constraint graph. (Try to leave the maximum flexibility for
subsequent variable assignments.)
e.g. We have generated the partial assignment with WA=red and NT=green and that our next
choice is for Q. Blue would be a bad choice because it eliminates the last legal value left for
Q’s neighbor, SA, therefore prefers red to blue.
2. Interleaving search and inference
Advantage: For many problems the search will be more effective if we combine the MRV
heuristic with forward checking.
Disadvantage: Forward checking only makes the current variable arc-consistent; it doesn't
look ahead and make all the other variables arc-consistent.
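Forward checking itself can be sketched as follows: after each assignment it deletes inconsistent values from the neighbors' domains, and the record of what it pruned is what lets it supply conflict sets with no extra work. The map-coloring constraint used here is an illustration.

```python
def forward_check(var, value, domains, constraints):
    # After assigning var = value, delete values from neighbors' domains
    # that are inconsistent with it. Return the pruned (variable, value)
    # pairs so they can be restored on backtracking.
    pruned = []
    for (x, y), pred in constraints.items():
        if x == var:
            for w in list(domains[y]):
                if not pred(value, w):
                    domains[y].remove(w)
                    pruned.append((y, w))
    return pruned

# Illustration: coloring constraint WA != NT.
domains = {"NT": {"red", "green", "blue"}}
constraints = {("WA", "NT"): lambda a, b: a != b}
pruned = forward_check("WA", "red", domains, constraints)
print(domains["NT"], pruned)   # NT's domain loses "red"
```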
Intelligent backtracking
e.g.
Suppose we have generated the partial assignment {Q=red, NSW=green, V=blue, T=red}.
When we try the next variable SA, we see every value violates a constraint.
Intelligent backtracking: Backtrack to a variable that was responsible for making one of the
possible values of the next variable (e.g. SA) impossible.
Conflict set for a variable: A set of assignments that are in conflict with some value for that
variable.
(e.g. The set {Q=red, NSW=green, V=blue} is the conflict set for SA.)
(e.g. backjumping would jump over T and try a new value for V.)
Forward checking can supply the conflict set with no extra work.
Whenever forward checking based on an assignment X=x deletes a value from Y’s domain,
add X=x to Y’s conflict set;
If the last value is deleted from Y’s domain, the assignment in the conflict set of Y are added
to the conflict set of X.
Conflict-directed backjumping:
e.g.
We try T=red next and then assign NT, Q, V, SA, no assignment can work for these last 4
variables.
Eventually we run out of values to try at NT, but simple backjumping cannot help, because
NT does not have a complete conflict set of preceding variables that caused it to fail.
The set {WA, NSW} is a deeper notion of conflict set for NT: it is what caused NT, together
with any subsequent variables, to have no consistent solution. So the algorithm should
backtrack to NSW and skip over T.
A backjumping algorithm that uses conflict sets defined in this way is called conflict-directed
backjumping.
How to compute it:
When a variable's domain becomes empty, a "terminal" failure occurs, and that variable has
a standard conflict set.
Let Xj be the current variable and conf(Xj) its conflict set. If every possible value for Xj fails,
backjump to the most recent variable Xi in conf(Xj), and set
conf(Xi) ← conf(Xi) ∪ conf(Xj) − {Xi}.
The conflict set for a variable means: there is no solution from that variable onward, given
the preceding assignment to the conflict set.
e.g.
assign WA, NSW, T, NT, Q, V, SA.
SA fails, and its conflict set is {WA, NT, Q}. (standard conflict set)
After backjumping from a contradiction, how to avoid running into the same problem again:
Constraint learning: The idea of finding a minimum set of variables from the conflict set that
causes the problem. This set of variables, along with their corresponding values, is called
a no-good. We then record the no-good, either by adding a new constraint to the CSP or by
keeping a separate cache of no-goods.
Exercise
Module-3
Knowledge and Reasoning
A knowledge base is required so that an agent can update its knowledge, learn from
experience, and take action according to that knowledge.
2. LOGIC
Logic-based approaches to AI differ, at least on the surface, from existing forms of classical
machine learning and deep learning. It is crucial to keep in mind that, just as there are many
forms of machine learning, there are many different forms of logic-based approaches to AI,
each with its own set of tradeoffs.
2.1 Propositional logic (PL) is the simplest form of logic, where all statements are made of
propositions. A proposition is a declarative statement that is either true or false. PL is a
technique for representing knowledge in logical and mathematical form.
Example:
a) It is Sunday.
b) The Sun rises from the West. (False proposition)
c) 3 + 3 = 7. (False proposition)
d) 5 is a prime number.
The syntax of propositional logic defines the allowable sentences for the knowledge
representation. There are two types of Propositions:
a. Atomic Propositions
b. Compound propositions
a) "2 + 2 is 4" is an atomic proposition, as it is a true fact.
b) "The Sun is cold" is also an atomic proposition, as it is a false fact.
o Compound proposition: Compound propositions are constructed by combining
simpler or atomic propositions, using parenthesis and logical connectives.
Example:
a) "It is raining today, and the street is wet."
b) "Ankit is a doctor, and his clinic is in Mumbai."
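A compound proposition's truth value is determined by the truth values of its atomic parts. A minimal sketch for "It is raining today, and the street is wet" as P ∧ Q, with P and Q as the atomic propositions:

```python
from itertools import product

# P: "It is raining today."   Q: "The street is wet."
for P, Q in product([True, False], repeat=2):
    print(P, Q, "->", P and Q)   # P ∧ Q holds only when both atoms are true
```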
3. Horn Clauses
The definite clause language does not allow a contradiction to be stated. However, a simple
expansion of the language can allow proof by contradiction.
An integrity constraint is a clause of the form
false←a1∧...∧ak.
where the ai are atoms and false is a special atom that is false in all interpretations.
A Horn clause is either a definite clause or an integrity constraint. That is, a Horn clause has
either false or a normal atom as its head.
Integrity constraints allow the system to prove that some conjunction of atoms is false in all
models of a knowledge base - that is, to prove disjunctions of negations of
atoms. Recall that ¬p is the negation of p, which is true in an interpretation when p is false in
that interpretation, and p∨q is the disjunction of p and q, which is true in an interpretation
if p is true or q is true or both are true in the interpretation. The integrity
constraint false←a1∧...∧ak is logically equivalent to ¬a1∨...∨¬ak.
A Horn clause knowledge base can imply negations of atoms, as shown in Example 5.16.
Example 5.16: Consider the knowledge base KB1:
false←a∧b.
a←c.
b←c.
The atom c is false in all models of KB1. For suppose c were true in some model I of KB1;
then a and b would both be true in I (otherwise I would not be a model of KB1).
Because false is false in I while a and b are true in I, the first clause would be false in I, a
contradiction to I being a model of KB1. Thus, c is false in all models of KB1. This is
expressed as
KB1 ⊧ ¬c
which means that ¬c is true in all models of KB1, and so c is false in all models of KB1.
Although the language of Horn clauses does not allow disjunctions and negations to be
input, disjunctions of negations of atoms can be derived.
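Example 5.16 can be checked mechanically by enumerating all interpretations of {a, b, c} and keeping those that satisfy the three clauses (a sketch; each clause `h ← body` is read as the implication body ⇒ h):

```python
from itertools import product

# KB1:  false <- a ∧ b,   a <- c,   b <- c
def is_model(a, b, c):
    return (not (a and b)      # false <- a ∧ b: a and b cannot both hold
            and (not c or a)   # a <- c: if c then a
            and (not c or b))  # b <- c: if c then b

models = [(a, b, c) for a, b, c in product([True, False], repeat=3)
          if is_model(a, b, c)]
# c is false in every model, i.e. KB1 ⊧ ¬c.
print(models)
```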
In the topic of propositional logic, we have seen how to represent statements using
propositional logic. Unfortunately, in propositional logic we can only represent facts that are
either true or false. PL is not sufficient to represent complex sentences or natural-language
statements; it has very limited expressive power. Consider a sentence that quantifies over
objects: we cannot represent it using PL.
To represent such statements, PL is not sufficient, so we require a more powerful logic, such
as first-order logic.
a. Syntax
b. Semantics
The syntax of FOL determines which collections of symbols are logical expressions in first-
order logic. The basic syntactic elements of first-order logic are symbols. We write
statements in short-hand notation in FOL.
Variables: x, y, z, a, b, ...
Connectives: ∧, ∨, ¬, ⇒, ⇔
Equality: ==
Quantifiers: ∀, ∃
Atomic sentences:
o Atomic sentences are the most basic sentences of first-order logic. These sentences
are formed from a predicate symbol followed by a parenthesis with a sequence of
terms.
o We can represent atomic sentences as Predicate (term1, term2, ......, term n).
Complex Sentences:
o Complex sentences are made by combining atomic sentences using connectives.
Consider the statement "x is an integer." It consists of two parts: the first part, x, is the
subject of the statement, and the second part, "is an integer," is known as the predicate.
Universal Quantifier:
Universal quantifier is a symbol of logical representation, which specifies that the statement
within its range is true for everything or every instance of a particular thing.
o For all x
o For each x
o For every x.
Example:
Example: Let x be a variable ranging over a universe of discourse (UOD). The statement
"All men drink coffee" is then written with the universal quantifier and read as: for all x, if x
is a man, then x drinks coffee.
Existential Quantifier:
Existential quantifiers are the type of quantifiers, which express that the statement within its
scope is true for at least one instance of something.
It is denoted by the logical operator ∃, which resembles an inverted E. When it is used with a
predicate variable, it is called an existential quantifier.
Note: With the existential quantifier we always use the AND (conjunction) symbol (∧).
If x is a variable, then the existential quantifier is written ∃x or ∃(x), and it is read as:
Example:
It will be read as: There are some x where x is a boy who is intelligent.
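Over a finite universe of discourse, the two quantifiers correspond to Python's all and any. The domain and predicates below are made-up illustrations.

```python
# A made-up finite universe of discourse.
domain = ["Ram", "Shyam", "Ravi"]
boy = set(domain)               # predicate: everyone here is a boy
intelligent = {"Ravi"}          # predicate: who is intelligent

# ∃x boy(x) ∧ intelligent(x): "Some boys are intelligent" (∃ uses ∧).
exists = any(x in boy and x in intelligent for x in domain)
# ∀x boy(x) ⇒ intelligent(x): with ∀ we use implication, not conjunction.
forall = all((x not in boy) or (x in intelligent) for x in domain)
print(exists, forall)   # True False
```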
Inference in First-Order Logic is used to deduce new facts or sentences from existing
sentences. Before understanding the FOL inference rule, let's understand some basic
terminologies used in FOL.
Substitution:
Equality:
First-order logic does not only use predicates and terms for making atomic sentences; it also
supports equality. We can use the equality symbol to specify that two terms refer to the
same object.
As in the above example, the object referred by the Brother (John) is similar to the object
referred by Smith. The equality symbol can also be used with negation to represent that two
terms are not the same objects.
As in propositional logic, we also have inference rules in first-order logic. Following are
some basic inference rules in FOL:
1. Universal Generalization
2. Universal Instantiation
3. Existential Instantiation
4. Existential introduction
1. Universal Generalization:
o Universal generalization is a valid inference rule which states that if premise P(c) is
true for any arbitrary element c in the universe of discourse, then we can have a
conclusion as ∀ x P(x).
Example: Let's represent, P(c): "A byte contains 8 bits", so for ∀ x P(x) "All bytes contain 8
bits.", it will also be true.
2. Universal Instantiation:
o Universal instantiation, also called universal elimination or UI, is a valid inference
rule. It can be applied multiple times to add new sentences.
o The new KB is logically equivalent to the previous KB.
o As per UI, we can infer any sentence obtained by substituting a ground term for the
variable.
o The UI rule states that we can infer any sentence P(c) by substituting a ground term c
(a constant within the domain of x) into ∀ x P(x), for any object in the universe of discourse.
Example:
"All kings who are greedy are evil." So let our knowledge base contain this detail in the
form of FOL:
∀x King(x) ∧ Greedy(x) → Evil(x)
So from this information, we can infer any of the following statements using Universal
Instantiation:
King(John) ∧ Greedy(John) → Evil(John)
King(Richard) ∧ Greedy(Richard) → Evil(Richard)
King(Father(John)) ∧ Greedy(Father(John)) → Evil(Father(John))
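As a sketch, Universal Instantiation on the kings rule can be mimicked in code by checking the ground premises for one constant at a time. The fact set and helper function below are hypothetical illustrations:

```python
# Universal Instantiation sketch: from ∀x King(x) ∧ Greedy(x) ⇒ Evil(x),
# infer the ground conclusion for a chosen constant.
# The fact tuples below are assumptions based on the kings example.
facts = {("King", "John"), ("Greedy", "John"), ("King", "Richard")}

def instantiate(constant):
    """Ground instance of the rule for one constant: returns Evil(c) if premises hold."""
    if ("King", constant) in facts and ("Greedy", constant) in facts:
        return ("Evil", constant)
    return None

print(instantiate("John"))     # ('Evil', 'John')
print(instantiate("Richard"))  # None: Greedy(Richard) is not in the KB
```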
3. Existential Instantiation:
Existential instantiation (also called existential elimination) states that from ∃x P(x) we can
infer P(c) for a new constant symbol c (a Skolem constant) that does not appear anywhere else
in the knowledge base.
Example:
From ∃x Crown(x) ∧ OnHead(x, John), we can infer Crown(K) ∧ OnHead(K, John), where K
is a new constant symbol.
4. Existential introduction:
Existential introduction (also called existential generalization) states that from P(c), for some
element c, we can infer ∃x P(x). For example, from "Priyanka got good marks in English" we
can infer "Someone got good marks in English."
Propositional Logic vs. First-Order Logic
Logic is a technique that represents knowledge in logical and mathematical form. There are
two main types: Propositional Logic (PL) and First-Order Logic (FOL).
Since propositional logic works on 0 and 1, it is also known as 'Boolean Logic'. In this type of
logic, symbolic variables are used to represent propositions, and any complete statement is
treated as a single unit that is either true or false.
FOL articulates natural language statements concisely. Another name for First-Order Logic is
'Predicate Logic'.
Facts about First Order Logic
FOL is known as a powerful language which is used to develop information about objects
and to express relationships between those objects.
Unlike PL, FOL assumes that the world contains objects, relations, and
functions.
FOL has two main key features or parts: 'Syntax' & 'Semantics'.
Propositional Logic converts a complete sentence into a single symbol and makes it logical,
whereas in First-Order Logic the relations within a particular sentence are made explicit,
involving objects, relations, and functions. PL can easily represent an individual statement,
but it does not signify or express generalization, specialization, or patterns: for example,
'QUANTIFIERS' cannot be used in PL, but in FOL users can easily use quantifiers to express
such statements.
What is Unification?
Unification is the process of making two different logical atomic expressions identical by
finding a suitable substitution. For example, for the atoms P(x) and P(John), the substitution
θ = {John/x} is a unifier: after applying this substitution, both expressions become identical.
o The UNIFY algorithm is used for unification, which takes two atomic sentences and
returns a unifier for those sentences (If any exist).
o Unification is a key component of all first-order inference algorithms.
o It returns fail if the expressions do not match with each other.
o The simplest substitution that unifies two expressions is called their Most General Unifier or MGU.
E.g. Let's say there are two different expressions, P(x, y), and P(a, f(z)).
In this example, we need to make both above statements identical to each other. For this, we
will perform the substitution.
P(x, y)......... (i)
P(a, f(z))......... (ii)
o Substitute x with a, and y with f(z) in the first expression, and it will be represented
as a/x and f(z)/y.
o With both the substitutions, the first expression will be identical to the second
expression and the substitution set will be: [a/x, f(z)/y].
o The predicate symbols must be the same; atoms or expressions with different predicate
symbols can never be unified.
o The number of arguments in both expressions must be identical.
o Unification will fail if the same variable would have to be bound to two different terms;
for example, P(x, x) cannot be unified with P(a, b).
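The substitution procedure sketched above can be written as a short recursive algorithm. The term encoding below (variables as strings prefixed with `?`, compound terms as tuples) is an assumption for illustration, and the occurs check is omitted for brevity:

```python
def is_variable(t):
    # Convention (an assumption): variables are strings starting with "?".
    return isinstance(t, str) and t.startswith("?")

def substitute(t, theta):
    # Apply the substitution theta to a term, chasing variable bindings.
    if is_variable(t) and t in theta:
        return substitute(theta[t], theta)
    if isinstance(t, tuple):
        return tuple(substitute(a, theta) for a in t)
    return t

def unify(x, y, theta=None):
    """Return a most general unifier (MGU) of x and y, or None on failure.
    The occurs check is omitted to keep the sketch short."""
    if theta is None:
        theta = {}
    x, y = substitute(x, theta), substitute(y, theta)
    if x == y:
        return theta
    if is_variable(x):
        theta[x] = y
        return theta
    if is_variable(y):
        theta[y] = x
        return theta
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        # Predicate/function symbol (first element) and arity must match.
        for a, b in zip(x, y):
            theta = unify(a, b, theta)
            if theta is None:
                return None
        return theta
    return None  # different predicate symbols or argument counts

# P(x, y) and P(a, f(z)): compound terms are tuples, e.g. ("f", "?z") for f(z).
print(unify(("P", "?x", "?y"), ("P", "a", ("f", "?z"))))
# {'?x': 'a', '?y': ('f', '?z')}  — i.e. the MGU [a/x, f(z)/y] from the text
```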
Forward Chaining and backward chaining in AI
In artificial intelligence, forward and backward chaining are important topics, but before
understanding them, let's first see where these two terms come from.
Inference engine:
The inference engine is the component of the intelligent system in artificial intelligence,
which applies logical rules to the knowledge base to infer new information from known facts.
The first inference engine was part of an expert system. An inference engine commonly
proceeds in two modes, which are:
A. Forward chaining
B. Backward chaining
A. Forward Chaining
Forward chaining is also known as forward deduction or the forward reasoning method when
using an inference engine. Forward chaining is a form of reasoning which starts with the
atomic sentences in the knowledge base and applies inference rules (Modus Ponens) in the
forward direction to extract more data until a goal is reached.
The forward-chaining algorithm starts from known facts, triggers all rules whose premises
are satisfied, and adds their conclusions to the known facts. This process repeats until the
problem is solved.
Properties of Forward-Chaining:
o It is a bottom-up, data-driven approach: reasoning moves from the known facts toward the goal.
o It derives all the facts that follow from the current knowledge base.
o It is commonly used in expert systems and production rule systems.
Consider the following famous example which we will use in both approaches:
Example:
"As per the law, it is a crime for an American to sell weapons to hostile nations. Country A,
an enemy of America, has some missiles, and all the missiles were sold to it by Robert, who
is an American citizen."
To solve the above problem, first, we will convert all the above facts into first-order definite
clauses, and then we will use a forward-chaining algorithm to reach the goal.
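Assuming the definite clauses have already been instantiated with the relevant constants (Robert, the missile T1, and country A), the forward-chaining loop itself is only a few lines. The string encodings below are illustrative assumptions:

```python
# Ground forward-chaining sketch of the crime example. The rules are ground
# instances of the definite clauses (variable matching is omitted for brevity).
facts = {"American(Robert)", "Missile(T1)", "Owns(A,T1)", "Enemy(A,America)"}

rules = [
    # Missile(x) => Weapon(x)
    (["Missile(T1)"], "Weapon(T1)"),
    # Missile(x) & Owns(A,x) => Sells(Robert,x,A)
    (["Missile(T1)", "Owns(A,T1)"], "Sells(Robert,T1,A)"),
    # Enemy(x,America) => Hostile(x)
    (["Enemy(A,America)"], "Hostile(A)"),
    # American(x) & Weapon(y) & Sells(x,y,z) & Hostile(z) => Criminal(x)
    (["American(Robert)", "Weapon(T1)", "Sells(Robert,T1,A)", "Hostile(A)"],
     "Criminal(Robert)"),
]

# Fire every rule whose premises are all known; repeat until nothing new is added.
changed = True
while changed:
    changed = False
    for premises, conclusion in rules:
        if conclusion not in facts and all(p in facts for p in premises):
            facts.add(conclusion)
            changed = True

print("Criminal(Robert)" in facts)  # True: the goal is reached
```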
B. Backward Chaining:
Backward chaining is also known as backward deduction or the backward reasoning method.
It starts from the goal, finds rules whose conclusions match the goal, and works backwards
through their premises until it reaches facts known to be true in the knowledge base.
Resolution
Resolution is a theorem-proving technique that proceeds by building refutation proofs, i.e.,
proofs by contradiction. It was invented by the mathematician John Alan Robinson in 1965.
Resolution is used when several statements are given and we need to prove a
conclusion from those statements. Unification is a key concept in proofs by resolution.
Resolution is a single inference rule which can efficiently operate on the conjunctive normal
form or clausal form.
Clause: A disjunction of literals (atomic sentences) is called a clause. A clause containing
exactly one literal is known as a unit clause.
Note: To better understand this topic, first learn FOL in AI.
The resolution rule for first-order logic is simply a lifted version of the propositional rule.
Resolution can resolve two clauses if they contain complementary literals, which are assumed
to be standardized apart so that they share no variables.
This rule is also called the binary resolution rule because it only resolves exactly two literals.
To better understand all the above steps, we will take an example in which we will apply
resolution.
Example:
a. John likes all kinds of food.
b. Apples and vegetables are food.
c. Anything anyone eats without being killed is food.
d. Anil eats peanuts and is still alive.
e. Harry eats everything that Anil eats.
Prove by resolution that:
f. John likes peanuts.
In the first step, we will convert all the given statements into first-order logic:
a. ∀x ¬ food(x) V likes(John, x)
b. food(Apple) Λ food(vegetables)
c. ∀x ∀y ¬ [eats(x, y) Λ ¬ killed(x)] V food(y)
d. eats (Anil, Peanuts) Λ alive(Anil)
e. ∀x ¬ eats(Anil, x) V eats(Harry, x)
f. ∀x ¬[¬killed(x)] V alive(x)
g. ∀x ¬ alive(x) V ¬ killed(x)
h. likes(John, Peanuts).
o Move negation (¬) inwards and rewrite:
a. ∀x ¬ food(x) V likes(John, x)
b. food(Apple) Λ food(vegetables)
c. ∀x ∀y ¬ eats(x, y) V killed(x) V food(y)
d. eats (Anil, Peanuts) Λ alive(Anil)
e. ∀x ¬ eats(Anil, x) V eats(Harry, x)
f. ∀x killed(x) V alive(x)
g. ∀x ¬ alive(x) V ¬ killed(x)
h. likes(John, Peanuts).
o Rename variables or standardize variables
a. ∀x ¬ food(x) V likes(John, x)
b. food(Apple) Λ food(vegetables)
c. ∀y ∀z ¬ eats(y, z) V killed(y) V food(z)
d. eats (Anil, Peanuts) Λ alive(Anil)
e. ∀w¬ eats(Anil, w) V eats(Harry, w)
f. ∀g killed(g) V alive(g)
g. ∀k ¬ alive(k) V ¬ killed(k)
h. likes(John, Peanuts).
o Eliminate existential quantifiers (Skolemization).
In this step, we eliminate the existential quantifier ∃; this process is known
as Skolemization. In this example problem there is no existential quantifier,
so all the statements remain the same in this step.
o Drop universal quantifiers.
In this step we drop all universal quantifiers, since every remaining variable is
implicitly universally quantified, so the quantifiers are no longer needed.
a. ¬ food(x) V likes(John, x)
b. food(Apple)
c. food(vegetables)
d. ¬ eats(y, z) V killed(y) V food(z)
e. eats (Anil, Peanuts)
f. alive(Anil)
g. ¬ eats(Anil, w) V eats(Harry, w)
h. killed(g) V alive(g)
i. ¬ alive(k) V ¬ killed(k)
j. likes(John, Peanuts).
In this step, we apply negation to the conclusion statement, which will be written
as ¬likes(John, Peanuts)
Now in this step, we will solve the problem by resolution tree using substitution. For the
above problem, it will be given as follows:
Hence the negated conclusion leads to a contradiction with the given set of statements, so
the original conclusion, likes(John, Peanuts), is proved.
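The refutation can be replayed mechanically. The sketch below works on ground clauses (variables already substituted, e.g. x → Peanuts), a simplifying assumption that avoids unification; it saturates the clause set with binary resolvents until the empty clause appears:

```python
from itertools import combinations

# Clauses are frozensets of literals; a negated literal is prefixed with "~".
def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolvents(c1, c2):
    """All binary resolvents of two clauses on complementary literals."""
    out = []
    for lit in c1:
        if negate(lit) in c2:
            out.append((c1 - {lit}) | (c2 - {negate(lit)}))
    return out

clauses = {
    frozenset({"~food(Peanuts)", "likes(John,Peanuts)"}),
    frozenset({"~eats(Anil,Peanuts)", "killed(Anil)", "food(Peanuts)"}),
    frozenset({"eats(Anil,Peanuts)"}),
    frozenset({"alive(Anil)"}),
    frozenset({"~alive(Anil)", "~killed(Anil)"}),
    frozenset({"~likes(John,Peanuts)"}),  # the negated goal
}

derived_empty = False
while not derived_empty:
    new = set()
    for c1, c2 in combinations(clauses, 2):
        for r in resolvents(c1, c2):
            if not r:                 # the empty clause: contradiction found
                derived_empty = True
            new.add(frozenset(r))
    if new <= clauses:                # saturated without a contradiction
        break
    clauses |= new

print(derived_empty)  # True: contradiction, so likes(John, Peanuts) holds
```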
Exercise
Module-4
Handling Uncertainty
Quantifying Uncertainty
Quantifying uncertainty concerns how an agent can act rationally under uncertainty by
maintaining degrees of belief. The term uncertainty refers to a situation or information which
is either unknown or imperfect.
Earlier, we saw that problem-solving agents rely on belief states (which represent all
possible states and generate the future plan) to handle uncertainty. But this approach has
certain drawbacks:
o If the right contingent (future) plan must be maintained, it can grow arbitrarily large.
o Sometimes no plan guarantees reaching the goal, so a method is needed to compare the
pros and cons of plans that are not guaranteed.
Need of Uncertainty
To understand the need, let’s see the below example of uncertain reasoning:
Consider the diagnosis of a cancer patient. By following the propositional logic, a rule can be
derived as:
Age ⇒ Cancer
But this rule is incorrect, as not all cancers are caused by age. There can be other
possible causes like environment, genetics, skin type, etc. We can rewrite the rule as:
Age ∨ Environment ∨ Genetics ∨ … ⇒ Cancer
Unfortunately, this rule will also not work, because there can be an unlimited number of
causes of cancer. The only way to make the rule applicable is to make it logically exhaustive,
but using logic in a medical domain like this fails for the following reasons:
Laziness: It is too much work to list the complete set of antecedents or consequents needed
to ensure an exceptionless rule, and too hard to use such rules.
In statistics and probability theory, the Bayes’ theorem (also known as the Bayes’ rule) is a
mathematical formula used to determine the conditional probability of events. Essentially, the
Bayes’ theorem describes the probability of an event based on prior knowledge of the
conditions that might be relevant to the event.
The theorem is named after the English statistician Thomas Bayes; the formula was published
posthumously in 1763. It is considered the foundation of the special statistical inference
approach called Bayesian inference.
Besides statistics, the Bayes’ theorem is also used in various disciplines, with medicine and
pharmacology as the most notable examples. In addition, the theorem is commonly employed
in different fields of finance. Some of the applications include but are not limited to,
modeling the risk of lending money to borrowers or forecasting the probability of the success
of an investment.
In its general form, the theorem is written as:
P(A|B) = P(B|A) × P(A) / P(B)
Where:
P(A|B) – the probability of event A occurring, given event B has occurred
P(B|A) – the probability of event B occurring, given event A has occurred
P(A) and P(B) – the unconditional probabilities of events A and B
Note that Bayes' theorem is most useful when events A and B are dependent; if A and B are
independent, then P(A|B) simply reduces to P(A).
A special case of the Bayes' theorem is when event A is a binary variable with two mutually
exclusive outcomes, A+ and A–. In such a case, the theorem is expressed in the following way:
P(A+|B) = P(B|A+) × P(A+) / [P(B|A+) × P(A+) + P(B|A–) × P(A–)]
Where:
P(B|A+) – the probability of event B occurring given that event A+ has occurred
P(B|A–) – the probability of event B occurring given that event A– has occurred
Imagine you are a financial analyst at an investment bank. According to your research
of publicly-traded companies, 60% of the companies that increased their share price by more
than 5% in the last three years replaced their CEOs during the period.
At the same time, only 35% of the companies that did not increase their share price by more
than 5% in the same period replaced their CEOs. Knowing that the probability that the stock
prices grow by more than 5% is 4%, find the probability that the shares of a company that
fires its CEO will increase by more than 5%.
Before finding the probabilities, you must first define the notation of the probabilities.
P(A) – the probability that the stock price increases by more than 5%
P(B) – the probability that the CEO is replaced
P(A|B) – the probability that the stock price increases by more than 5%, given that the CEO
has been replaced
P(B|A) – the probability that the CEO is replaced, given that the stock price has increased
by more than 5%
Thus, the probability that the shares of a company that replaces its CEO will grow by more
than 5% is 6.67%.
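The arithmetic behind this answer is a direct application of Bayes' theorem together with the law of total probability:

```python
# Worked computation of the CEO example above via Bayes' theorem.
# A = share price grows by more than 5%; B = the CEO is replaced.
p_b_given_a = 0.60      # P(B|A): CEO replaced given price rose > 5%
p_b_given_not_a = 0.35  # P(B|A-): CEO replaced given price did not rise > 5%
p_a = 0.04              # P(A): price rises > 5%

# Law of total probability: P(B) = P(B|A)P(A) + P(B|A-)P(A-)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(f"{p_a_given_b:.2%}")  # 6.67%
```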
Uncertainty:
Till now, we have learned knowledge representation using first-order logic and propositional
logic with certainty, which means we were sure about the predicates. With this knowledge
representation we might write A→B, meaning if A is true then B is true. But consider a
situation where we are not sure whether A is true or not; then we cannot express this
statement. This situation is called uncertainty.
So to represent uncertain knowledge, where we are not sure about the predicates, we need
uncertain reasoning or probabilistic reasoning.
Causes of uncertainty:
Following are some leading causes of uncertainty in the real world:
1. Information from unreliable sources
2. Experimental errors
3. Equipment faults
4. Temperature variation
5. Climate change
Probabilistic reasoning:
In the real world, there are lots of scenarios, where the certainty of something is not
confirmed, such as "It will rain today," "behavior of someone for some situations," "A match
between two teams or two players." These are probable sentences for which we can assume
that it will happen but not sure about it, so here we use probabilistic reasoning.
In probabilistic reasoning, there are two ways to solve problems with uncertain knowledge:
o Bayes' rule
o Bayesian Statistics
1. 0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.
2. P(A) = 0 indicates that event A is impossible.
3. P(A) = 1 indicates that event A is certain.
We can find the probability of an uncertain event by using the below formula.
o P(¬A) + P(A) = 1.
Random variables: Random variables are used to represent the events and objects in the real
world.
Posterior Probability: The probability that is calculated after all evidence or information has
been taken into account. It is a combination of the prior probability and the new information.
Conditional probability:
Conditional probability is the probability of one event occurring given that another event has
already happened.
Let's suppose we want to calculate the probability of event A when event B has already
occurred, "the probability of A under the condition B". It can be written as:
P(A|B) = P(A⋀B) / P(B)
where P(A⋀B) is the joint probability of A and B. If instead the probability of A is given and
we need to find P(B|A), then it is given as:
P(B|A) = P(A⋀B) / P(A)
This can be explained using the Venn diagram below: once event B has occurred, the sample
space is reduced to the set B, and we calculate event A given B by dividing the probability
P(A⋀B) by P(B).
Example:
In a class, 70% of the students like English and 40% of the students like both English and
mathematics. What percentage of the students who like English also like mathematics?
Solution:
P(Mathematics | English) = P(English ⋀ Mathematics) / P(English) = 0.40 / 0.70 ≈ 0.57
Hence, about 57% of the students who like English also like mathematics.
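The same answer follows in two lines from the definition of conditional probability:

```python
# The class example, computed from the definition P(M|E) = P(E ∧ M) / P(E).
p_english = 0.70           # P(E): student likes English
p_english_and_math = 0.40  # P(E ∧ M): student likes both subjects

p_math_given_english = p_english_and_math / p_english
print(f"{p_math_given_english:.0%}")  # 57%
```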
Probability of a given event = number of ways the event can occur / total number of outcomes.
Probability always takes a value between 0 and 1. If the probability is 0, the event will
never occur, and if it is 1, it will occur for sure.
Considering that the chance of March being cold (event S) is only 30%, P(S) = 0.3.
Then P(S∧T) means the probability of S AND T, i.e., the probability of both March and April
being cold.
Proofs of the properties are not given here; you can work them out yourself using Venn
diagrams.
Conditional Probability
Conditional probability is defined as the probability of one event given that another event has
occurred. It is denoted by P(B|A) and is read as: "probability of B given A."
Hidden Markov models (HMMs) are a formal foundation for making probabilistic
models of linear sequence 'labeling' problems. They provide a conceptual toolkit
for building complex models just by drawing an intuitive picture. They are at the heart
of a diverse range of programs, including gene finding, profile searching, multiple
sequence alignment and regulatory site identification. HMMs are the Legos of
computational sequence analysis.
Hidden Markov Model (HMM)
We use an HMM when we cannot observe the states themselves but only the result of some
probability function (observation) of those states. An HMM is a statistical Markov model in
which the system being modeled is assumed to be a Markov process with unobserved
(hidden) states.
Markov Model: a series of (hidden) states z = {z_1, z_2, ...} drawn from a state alphabet
S = {s_1, s_2, ..., s_|S|}, where each z_i belongs to S.
Assumptions of HMM
An HMM, too, is built upon several assumptions, of which the following is vital. Output
independence assumption: each output observation is conditionally independent of all other
hidden states and all other observations, given the current hidden state.
Emission Probability Matrix: the probability of a hidden state generating output v_i, given
that the state at the corresponding time was s_j.
Hidden Markov Model as a finite state machine
Consider the example given below in Fig. 3, which elaborates how a person feels in different
climates.
Set of observed states (S) = {Happy, Grumpy}
Set of hidden states (Q) = {Sunny, Rainy}
Observed states for four days = {z1 = Happy, z2 = Grumpy, z3 = Grumpy, z4 = Happy}
The feeling that you perceive from a person's expression is called the observation, since you
can observe it.
The weather that influences the person's feeling is called the hidden state, since you cannot
observe it.
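A minimal forward-algorithm sketch for this Sunny/Rainy example is shown below. The numeric initial, transition, and emission probabilities are illustrative assumptions, since the text gives none:

```python
# Forward algorithm for the Sunny/Rainy HMM example.
# All numeric probabilities below are illustrative assumptions.
states = ["Sunny", "Rainy"]
observations = ["Happy", "Grumpy", "Grumpy", "Happy"]  # the four observed days

initial = {"Sunny": 0.6, "Rainy": 0.4}
transition = {
    "Sunny": {"Sunny": 0.7, "Rainy": 0.3},
    "Rainy": {"Sunny": 0.4, "Rainy": 0.6},
}
emission = {
    "Sunny": {"Happy": 0.8, "Grumpy": 0.2},
    "Rainy": {"Happy": 0.3, "Grumpy": 0.7},
}

# Forward pass: alpha[s] = P(observations so far, current hidden state = s)
alpha = {s: initial[s] * emission[s][observations[0]] for s in states}
for obs in observations[1:]:
    alpha = {
        s: emission[s][obs] * sum(alpha[p] * transition[p][s] for p in states)
        for s in states
    }

# Total probability of the whole observation sequence under the model:
print(sum(alpha.values()))  # ≈ 0.0547
```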