Module - 1: Introduction To AI
Introduction to AI
1. Introduction
In computing, data is information that has been translated into a form that is efficient for
movement or processing. Relative to today's computers and transmission media, data is
information converted into binary digital form. Raw data is a term used to describe data in its
most basic digital format.
Data collection is the process of gathering and measuring information from countless
different sources. In order to use the data we collect to develop practical artificial intelligence
(AI) and machine learning solutions, it must be collected and stored in a way that makes
sense for the business problem at hand.
AI works by combining large amounts of data with fast, iterative processing and intelligent
algorithms, allowing the software to learn automatically from patterns or features in the data.
The process requires multiple passes at the data to find connections and derive meaning from
undefined data.
As well as serving as input for AI systems, data plays a vital role in training, validating, and testing AI outputs. At this step of AI development, data is used to create a training set and a test set.
Knowledge is the information about a domain that can be used to solve problems in that
domain. To solve many problems requires much knowledge, and this knowledge must be
represented in the computer. As part of designing a program to solve problems, we must
define how the knowledge will be represented.
Data consists of raw figures and facts. Information, unlike data, provides insight derived from analyzing the collected data, and is specific to the inferences drawn from it. Data carries no inherent meaning, whereas information exists to provide insight and meaning.
Intelligent means having or showing the ability to easily learn or understand things, to deal with new or difficult situations, or to handle problems in a way that resembles or suggests the ability of an intelligent person.
Artificial Intelligence (AI) refers to the ability of machines to perform cognitive tasks like
thinking, perceiving, learning, problem solving and decision making; it is inspired by the
ways people use their brains to perceive, learn, reason out and decide the action.
The basic objective of AI (also called heuristic programming, machine intelligence, or the simulation of cognitive behavior) is to enable computers to perform such intellectual tasks as decision making, problem solving, perception, and understanding human communication (in any language, and translating among them).
Deep learning is a machine learning technique that teaches computers to do what comes naturally to humans: learn by example. Many developers are applying the latest deep learning technologies to advance their businesses.
There are large numbers of fields of Artificial Intelligence technology like autonomous
vehicles, computer vision, automatic text generation, and the like, where the scope and use of
deep learning are increasing.
Take the example of the self-driving feature in cars such as Tesla's Autopilot, where deep learning is a key technology that enables them to recognize a stop sign or to distinguish a pedestrian from a lamppost.
2. Facial Recognition
Artificial Intelligence has made it possible to recognize individual faces using biometric mapping. This has led to path-breaking advancements in surveillance technology. The system compares the captured facial data with a database of known faces to find a match.
However, this has also faced a lot of criticism for breach of privacy.
3. Automating Repetitive Tasks
AI can execute the same kind of work over and over again without breaking a sweat. To understand this feature better, let's take the example of Siri, the voice-enabled assistant created by Apple Inc. It handles a great many commands every single day!
From asking to take up notes for a brief, to rescheduling the calendar for a meeting, to
guiding us through the streets with navigation, the assistant has it all covered.
Earlier, all of these activities had to be done manually which used to take up a lot of time and
effort.
The automation would not only lead to increased efficiencies but also result in lower
overhead costs and in some cases a safer work environment.
4. Data Ingestion
With every passing day, the data we all produce grows exponentially, and this is where AI steps in. Instead of this data being fed in manually, AI-enabled systems not only gather it but also analyze it in light of their previous experience.
Data ingestion is the transportation of data from assorted sources to a storage medium where it can be accessed, used, and analyzed by an organization.
AI, with the help of neural networks, analyzes a large amount of such data and helps in
providing a logical inference out of it.
5. Chatbots
Chatbots are software that provides a window for solving customer problems through either audio or textual input. Earlier, bots responded only to specific commands: if you said the wrong thing, the bot did not know what you meant.
The bot was only as smart as it was programmed to be. The real change came when these
chatbots were enabled by artificial intelligence.
Now, you don’t have to be ridiculously specific when you are talking to the chatbot. It
understands language, not just commands.
There are a lot of companies that have moved on from voice process executives to chatbots to
help customers solve their problems.
The chatbots not only offer services revolving around the issues customers face but also provide product suggestions to users. All this, thanks to AI.
6. Quantum Computing
AI is helping solve complex quantum physics problems with the accuracy of supercomputers
with the help of quantum neural networks. This can lead to path-breaking developments in
the near future.
It is an interdisciplinary field that focuses on building quantum algorithms for improving
computational tasks within AI, including sub-fields like machine learning.
7. Cloud Computing
The next Artificial Intelligence characteristic is cloud computing. With such a huge amount of data being churned out every day, storing it all in physical form would have been a major problem.
AI capabilities are working within the business cloud computing environment to make
organizations more efficient, strategic, and insight-driven.
However, the advent of Cloud Computing has saved us from such worries.
Microsoft Azure is one of the prominent players in the cloud computing industry. It offers to
deploy your own machine learning models to your data stored in cloud servers without any
lock-in.
Computer vision relies on pattern recognition and deep learning to recognize what's in a
picture or video.
3. AI Applications:
3.1 Automation
Industry has often sought to leverage technology to drive productivity. So, to reduce
production costs, industries have automated many repetitive activities and processes to reduce
the amount of human intervention required. Machines and computers use automation to
perform repetitive tasks and adapt to changes in circumstances. Automation has been widely
adopted in both blue-collar and white-collar workplaces.
Machine learning is a revolutionary idea: feed a machine a large amount of data, and it will
use the experience gained from the data to improve its own algorithm and process data better
in the future. The most significant arm of machine learning is Neural Networks. Neural
Networks are interconnected networks of nodes called neurons or perceptrons. These are
loosely modeled on the way the human brain processes information.
Neural Networks store data, learn from it, and improve their abilities to sort new data. For
example, a Neural Network tasked with identifying dogs can be fed various images of dogs
tagged with the type of dog. Over time, it will learn what kind of image corresponds to what
kind of dog. The machine therefore learns from experience and improves itself.
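The idea of learning from stored, labelled examples can be sketched with a toy nearest-neighbour classifier; the feature vectors and breed labels below are invented stand-ins for real image data:

```python
# Minimal sketch of "learning from experience": a 1-nearest-neighbour
# classifier improves simply by storing more labelled examples.

def train(examples):
    # "Training" for nearest-neighbour is just storing experience.
    return list(examples)

def predict(memory, features):
    # Classify a new example by its closest stored example.
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(memory, key=lambda ex: dist(ex[0], features))[1]

# Toy "dog images" reduced to (height_cm, weight_kg) feature vectors.
memory = train([
    ((25, 6), "dachshund"),
    ((60, 30), "labrador"),
    ((70, 40), "german shepherd"),
])
print(predict(memory, (62, 33)))  # the closest stored example wins
```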
Deep Learning is a subset of Machine Learning. In Deep Learning, Neural Networks are
arranged into sprawling networks with a large number of layers that are trained using massive
amounts of data. It is different from most other kinds of Machine Learning, which generally
stress training on labeled data (for example, a picture of a dog with a tag identifying the name
of the dog, and some instructions on how to process each of these). In Deep Learning, the
sprawling artificial Neural Network is fed unlabeled data and not given any instructions. It
determines the important characteristics and purpose of the data itself, while storing it as
experience. Returning to our dog example: when images of a dog are fed to a Deep Learning
Neural Network, the machine itself determines the important characteristics of each breed of
dog from the images, and can then use these to identify a given dog’s breed.
Machine Vision seeks to allow computers to see. A computer captures images from a mounted
camera and converts them from analog to digital (the latter can be easily analyzed). Machine
Vision methods often seek to simulate the human eye. Machine Vision has various potential
uses, such as signature identification and medical image analysis.
NLP techniques (including voice recognition, text translation, and sentiment analysis) allow
computers to comprehend human language and speech. While Siri and Alexa are examples of
commercially available products using NLP algorithms, the major technology companies have
developed far more advanced NLP techniques than the ones Siri and Alexa use.
4. Intelligent agent
An intelligent agent is a program that can make decisions or perform a service based on its
environment, user input and experiences. These programs can be used to autonomously
gather information on a regular, programmed schedule or when prompted by the user in real
time. Intelligent agents may also be referred to as a bot, which is short for robot.
Typically, an agent program, using parameters the user has provided, searches all or some
part of the internet, gathers information the user is interested in and presents it to them on a
periodic or requested basis. Data intelligent agents can extract any specifiable information,
such as included keywords or publication date. In agents that employ artificial intelligence (AI), user input is collected using sensors, such as microphones or cameras, and agent output is delivered through actuators, such as speakers or screens. The practice of having information brought to a user by an agent is called push technology.
Common characteristics of intelligent agents are adaptation based on experience, real time
problem solving, analysis of error or success rates and the use of memory-based storage and
retrieval.
For enterprises, intelligent agents can be used for applications in data mining, data analytics
and customer service and support (CSS). Consumers can also use intelligent agents to
compare the prices of similar products and notify the user when a website update occurs.
Intelligent agents are also similar to software agents which are autonomous computer
programs.
Examples of Agents:
A software agent has keystrokes, file contents, and received network packets acting as sensors, and displays on the screen, files, and sent network packets acting as actuators.
A human agent has eyes, ears, and other organs acting as sensors, and hands, legs, mouth, and other body parts acting as actuators.
A robotic agent has cameras and infrared range finders acting as sensors, and various motors acting as actuators.
Figure 1.2: Agents
Agents can be grouped into five classes based on their degree of perceived intelligence and capability:
Simple Reflex Agents
Model-Based Reflex Agents
Goal-Based Agents
Utility-Based Agents
Learning Agents
Simple reflex agents
Simple reflex agents ignore the rest of the percept history and act only on the basis of the current percept. Percept history is the history of all that an agent has perceived to date. The agent function is based on the condition-action rule: a rule that maps a state (condition) to an action. If the condition is true, the action is taken; otherwise it is not. This agent function succeeds only when the environment is fully observable. For simple reflex agents operating in partially observable environments, infinite loops are often unavoidable; it may be possible to escape them if the agent can randomize its actions.
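The condition-action rule behaviour described above can be sketched as follows, using an invented vacuum-world percept format:

```python
# Minimal sketch of a simple reflex agent: it looks only at the current
# percept and fires the first matching condition-action rule.
# The percept keys and rule set are invented for illustration.

RULES = [
    (lambda p: p["dirty"], "Suck"),              # condition -> action
    (lambda p: p["location"] == "A", "Right"),
    (lambda p: p["location"] == "B", "Left"),
]

def simple_reflex_agent(percept):
    # No percept history is kept: only the current percept is consulted.
    for condition, action in RULES:
        if condition(percept):
            return action
    return "NoOp"

print(simple_reflex_agent({"location": "A", "dirty": True}))   # Suck
print(simple_reflex_agent({"location": "A", "dirty": False}))  # Right
```

Note that the agent carries no internal state: change the environment's rules and the rule table must be rewritten, exactly the limitation described below.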
Problems with simple reflex agents:
If any change occurs in the environment, the collection of rules needs to be updated.
Model-based reflex agents
A model-based agent works by finding a rule whose condition matches the current situation. It can handle partially observable environments through the use of a model of the world. The agent has to keep track of an internal state, which is adjusted by each percept and depends on the percept history. The current state is stored inside the agent, which maintains some kind of structure describing the part of the world that cannot be seen.
Goal-based agents
These agents take decisions based on how far they currently are from their goal (a description of desirable situations). Their every action is intended to reduce the distance from the goal. This gives the agent a way to choose among multiple possibilities, selecting the one that reaches a goal state. The knowledge that supports its decisions is represented explicitly and can be modified, which makes these agents more flexible. They usually require search and planning, and their behavior can easily be changed.
Utility-based agents
Utility-based agents are used when there are multiple possible alternatives and the best one must be chosen. They choose actions based on a preference (utility) for each state. Sometimes achieving the desired goal is not enough: we may look for a quicker, safer, or cheaper trip to reach a destination. Agent happiness should be taken into consideration, and utility describes how "happy" the agent is. Because of the uncertainty in the world, a utility agent chooses the action that maximizes the expected utility. A utility function maps a state onto a real number that describes the associated degree of happiness.
Figure 1.5: Utility-based agents
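The expected-utility choice can be sketched as follows; the actions, outcome probabilities, and utility values below are invented for illustration:

```python
# Minimal sketch of a utility-based agent: with uncertain outcomes, it
# picks the action whose expected utility is highest.

def expected_utility(outcomes):
    # outcomes: list of (probability, utility) pairs for one action
    return sum(p * u for p, u in outcomes)

def choose(actions):
    # maximize expected utility over all available actions
    return max(actions, key=lambda a: expected_utility(actions[a]))

# Two invented routes to a destination (the "quicker, safer, cheaper trip")
routes = {
    "highway":   [(0.9, 8), (0.1, 2)],   # usually fast, small risk of jams
    "back_road": [(1.0, 5)],             # slower but certain
}
print(choose(routes))  # highway: 0.9*8 + 0.1*2 = 7.4 > 5.0
```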
Learning Agent:
A learning agent in AI is the type of agent that can learn from its past experiences; it has learning capabilities. It starts to act with basic knowledge and then adapts automatically through learning. A learning agent has four conceptual components:
1. Learning element: It is responsible for making improvements by learning from the environment.
2. Critic: The learning element takes feedback from the critic, which describes how well the agent is doing with respect to a fixed performance standard.
3. Performance element: It is responsible for selecting external actions.
4. Problem generator: It is responsible for suggesting actions that will lead to new and informative experiences.
Whenever the agent is confronted with a problem, its first move is to seek a solution in its knowledge system; this is known as searching for the solution in the knowledge base. Another approach is to search for a solution by moving through different states. The agent's search stops when it reaches the goal state.
There are many approaches for searching a particular goal state from all the states that the
agent can be in.
There are many search algorithms which are followed by an agent for solving the problems
by searching. Some of them are:
Random search:
In this search technique, an agent simply keeps checking random states to see whether each is the goal state. This is not an effective way to search for a solution because each node can be visited again and again, no fixed path is followed, and problems such as infinite searching can occur.
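A minimal sketch of random search over an invented state space; note that it tracks no path and may revisit states, which is exactly the inefficiency just described:

```python
# Minimal sketch of random search: keep sampling states at random until
# the goal test succeeds (or a step budget runs out).

import random

def random_search(states, is_goal, max_steps=10_000):
    for step in range(1, max_steps + 1):
        state = random.choice(states)   # may revisit states already seen
        if is_goal(state):
            return state, step
    return None, max_steps              # goal not found within the budget

random.seed(0)  # fixed seed so the run is reproducible
state, steps = random_search(list(range(100)), lambda s: s == 42)
print(state, steps)
```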
A* search:
It is one of the best-known and most popular techniques used in pathfinding and graph traversal. It decides which node to expand on the basis of an f-score, f(n) = g(n) + h(n), where g(n) is the cost of the path so far and h(n) is a heuristic estimate of the remaining cost to the goal; the node with the lowest f-score is expanded first.
The problem-solving agent performs precisely by defining problems and their solutions. So we can say that problem solving is a part of artificial intelligence that encompasses a number of techniques, such as trees, B-trees, and heuristic algorithms, to solve a problem.
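In the standard formulation, A* expands the frontier node with the lowest f(n) = g(n) + h(n). A sketch on an invented toy graph and heuristic:

```python
# Minimal sketch of A* search on a small weighted graph. The frontier is
# a priority queue ordered by f = g + h; the lowest f is expanded first.

import heapq

def a_star(graph, h, start, goal):
    frontier = [(h[start], 0, start, [start])]   # (f, g, node, path)
    best_g = {}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        if node in best_g and best_g[node] <= g:
            continue                              # already reached cheaper
        best_g[node] = g
        for nxt, cost in graph[node]:
            g2 = g + cost
            heapq.heappush(frontier, (g2 + h[nxt], g2, nxt, path + [nxt]))
    return None, float("inf")

# Invented toy graph: edges with costs, plus heuristic estimates to G.
graph = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 6)],
         "B": [("G", 1)], "G": []}
h = {"S": 4, "A": 2, "B": 1, "G": 0}
print(a_star(graph, h, "S", "G"))  # (['S', 'A', 'B', 'G'], 4)
```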
A problem-solving agent in Artificial Intelligence is a goal-based agent that focuses on goals; it is one embodiment of a group of algorithms and techniques for solving a well-defined problem in the area of Artificial Intelligence. These agents differ from reflex agents, which merely map states to actions and cannot cope when the demands of storing and learning both grow. The different stages that problem-solving agents perform to arrive at a desired state or solution are:
1. Articulating or expressing the desired goal, and the problem to be worked on, clearly.
The term "Artificial neural network" refers to a biologically inspired sub-field of artificial
intelligence modeled after the brain. An Artificial neural network is usually a computational
network based on biological neural networks that construct the structure of the human brain.
Just as the human brain has neurons interconnected with each other, artificial neural networks also have neurons linked to each other in the various layers of the network. These neurons are known as nodes.
Artificial neural network tutorial covers all the aspects related to the artificial neural network.
In this tutorial, we will discuss ANNs, Adaptive resonance theory, Kohonen self-organizing
map, Building blocks, unsupervised learning, Genetic algorithm, etc.
The given figure illustrates the typical diagram of Biological Neural Network. The typical
Artificial Neural Network looks something like the given figure.
Figure 1.8: Artificial Neural Networks
Dendrites from Biological Neural Network represent inputs in Artificial Neural Networks,
cell nucleus represents Nodes, synapse represents Weights, and Axon represents Output.
An Artificial Neural Network is a construct in the field of Artificial Intelligence that attempts to mimic the network of neurons that makes up the human brain, so that computers can understand things and make decisions in a human-like manner. The artificial neural network is created by programming computers to behave like interconnected brain cells.
There are roughly 100 billion neurons in the human brain, and each neuron has somewhere in the range of 1,000 to 100,000 connection points. In the human brain, data is stored in a distributed manner, and we can extract more than one piece of this data from memory in parallel when necessary. We can say that the human brain is made up of incredibly powerful parallel processors.
We can understand artificial neural networks with an example. Consider a digital logic gate that takes an input and gives an output: an "OR" gate with two inputs. If one or both inputs are "On," the output is "On"; if both inputs are "Off," the output is "Off." Here the output depends entirely on the input. Our brain does not work this way: the relationship between outputs and inputs keeps changing, because the neurons in our brain are "learning."
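The contrast with a fixed OR gate can be sketched with a single perceptron that learns the OR function from examples; the learning rate and epoch count below are arbitrary illustrative choices:

```python
# Minimal sketch of "learning" versus a hard-wired gate: a one-neuron
# perceptron adjusts its weights from labelled examples of OR.

def step(x):
    return 1 if x >= 0 else 0

def train_or(epochs=10, lr=0.1):
    w = [0.0, 0.0]
    b = 0.0
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
    for _ in range(epochs):
        for (x1, x2), target in data:
            out = step(w[0] * x1 + w[1] * x2 + b)
            err = target - out
            w[0] += lr * err * x1    # perceptron learning rule
            w[1] += lr * err * x2
            b += lr * err
    return w, b

w, b = train_or()
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), step(w[0] * x1 + w[1] * x2 + b))
```

Unlike the logic gate, nothing here is hard-wired: the same code would learn AND if fed AND-labelled examples instead.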
Input Layer:
As the name suggests, it accepts inputs in several different formats provided by the programmer.
Hidden Layer:
The hidden layer sits between the input and output layers. It performs all the calculations needed to find hidden features and patterns.
Output Layer:
The input goes through a series of transformations using the hidden layer, which finally
results in output that is conveyed using this layer.
The artificial neural network takes input and computes the weighted sum of the inputs and
includes a bias. This computation is represented in the form of a transfer function.
Advantages of Artificial Neural Networks
1. Parallel processing capability:
Artificial neural networks have a numerical strength that lets them perform more than one task simultaneously.
2. Storing data on the entire network:
Unlike in traditional programming, data is stored on the whole network, not in a database. The disappearance of a few pieces of data in one place does not prevent the network from working.
3. Capability to work with incomplete knowledge:
After training, an ANN may produce output even with inadequate data. The loss of performance here depends upon the significance of the missing data.
4. Having a memory distribution:
For an ANN to be able to adapt, it is important to determine the examples and to train the network according to the desired output by showing these examples to the network. The network's success is directly proportional to the chosen instances; if the event cannot be shown to the network in all its aspects, the network can produce false output.
5. Having fault tolerance:
Corruption of one or more cells of an ANN does not prevent it from generating output, and this feature makes the network fault-tolerant.
Disadvantages of Artificial Neural Networks
1. Assurance of proper network structure:
There is no particular guideline for determining the structure of artificial neural networks. The appropriate network structure is found through experience and trial and error.
2. Unrecognized behavior of the network:
This is the most significant issue with ANNs. When an ANN produces a solution, it gives no insight into why or how, which decreases trust in the network.
3. Hardware dependence:
Artificial neural networks require processors with parallel processing power, in accordance with their structure. For this reason, realization of the network depends on suitable hardware.
4. Difficulty of showing the problem to the network:
ANNs can work only with numerical data. Problems must be converted into numerical values before being introduced to the ANN. The representation mechanism chosen here directly influences the performance of the network, and this relies on the user's abilities.
5. The duration of the network is unknown:
The network is reduced to a certain error value, and this value does not guarantee optimum results.
Artificial neural networks, which stepped into the world in the mid-20th century, are developing exponentially. Above, we have examined the advantages of artificial neural networks and the issues encountered in the course of their use. It should not be overlooked that the disadvantages of ANNs, a flourishing branch of science, are being eliminated one by one, while their advantages grow day by day. This means that artificial neural networks will progressively become an irreplaceable part of our lives.
Artificial Neural Network can be best represented as a weighted directed graph, where the
artificial neurons form the nodes. The association between the neurons outputs and neuron
inputs can be viewed as the directed edges with weights. The Artificial Neural Network
receives the input signal from the external source in the form of a pattern and image in the
form of a vector. These inputs are then mathematically assigned by the notations x(n) for
every n number of inputs.
Afterward, each input is multiplied by its corresponding weight (these weights are the details the artificial neural network uses to solve a specific problem). In general terms, these weights represent the strength of the interconnections between neurons inside the artificial neural network. All the weighted inputs are summed inside the computing unit.
If the weighted sum is zero, a bias is added to make the output non-zero, or else to scale up the system's response. The bias has a fixed input of 1 and its own weight. Here the total of the weighted inputs can lie anywhere from 0 to positive infinity. To keep the response within the limits of the desired value, a certain maximum value is set as a benchmark, and the total of weighted inputs is passed through the activation function.
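The weighted-sum-plus-bias computation can be sketched as follows; the input values, weights, and the choice of a sigmoid transfer function are illustrative assumptions:

```python
# Minimal sketch of one artificial neuron: weighted inputs are summed,
# a bias is added, and the total passes through an activation function.

import math

def neuron(inputs, weights, bias, activation):
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(total)

# Sigmoid squashes any real input into the range (0, 1).
sigmoid = lambda t: 1 / (1 + math.exp(-t))

# Invented inputs x(n) and their corresponding weights.
print(neuron([0.5, 0.3], [0.4, 0.7], 0.1, sigmoid))
```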
The activation function refers to the set of transfer functions used to achieve the desired
output. There is a different kind of the activation function, but primarily either linear or non-
linear sets of functions. Some of the commonly used sets of activation functions are the
Binary, linear, and Tan hyperbolic sigmoidal activation functions. Let us take a look at each
of them in details:
Binary:
In a binary activation function, the output is either a one or a zero. To accomplish this, a threshold value is set up: if the net weighted input of the neuron exceeds the threshold, the activation function returns one; otherwise it returns zero.
Sigmoidal Hyperbolic:
The sigmoidal hyperbola function is generally seen as an "S"-shaped curve. Here the tan hyperbolic function is used to approximate the output from the actual net input. The function is defined as:
F(x) = tanh(x) = (e^x − e^−x) / (e^x + e^−x)
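A sketch of the two activation functions just described; the threshold value here is an arbitrary illustrative choice:

```python
# Minimal sketch of a binary threshold activation and the tanh sigmoid.

import math

def binary(net, threshold=0.0):
    # output 1 if the net weighted input exceeds the threshold, else 0
    return 1 if net > threshold else 0

def tanh(net):
    # the "S"-shaped sigmoidal hyperbolic function
    return (math.exp(net) - math.exp(-net)) / (math.exp(net) + math.exp(-net))

print(binary(0.7), binary(-0.2))   # 1 0
print(round(tanh(0.0), 3))         # 0.0
```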
There are various types of Artificial Neural Networks (ANNs), built by analogy with the neurons and network functions of the human brain, that perform tasks in a similar way. The majority of artificial neural networks have some similarities with their more complex biological counterparts and are very effective at their intended tasks, for example segmentation or classification.
Feedback ANN:
In this type of ANN, the output is returned into the network to achieve the best internally evolved results. According to the University of Massachusetts Lowell Center for Atmospheric Research, feedback networks feed information back into themselves and are well suited to solving optimization problems. Internal system error corrections utilize feedback ANNs.
Feed-Forward ANN:
In this type of ANN, signals travel in only one direction, from the input layer through any hidden layers to the output layer, with no feedback loops.
Fuzzy Logic Systems (FLS) produce acceptable but definite output in response to incomplete,
ambiguous, distorted, or inaccurate (fuzzy) input.
Fuzzy Logic (FL) is a method of reasoning that resembles human reasoning. The approach of
FL imitates the way of decision making in humans that involves all intermediate possibilities
between digital values YES and NO.
The conventional logic block that a computer can understand takes precise input and
produces a definite output as TRUE or FALSE, which is equivalent to human’s YES or NO.
The inventor of fuzzy logic, Lotfi Zadeh, observed that unlike computers, human decision making includes a range of possibilities between YES and NO, such as:
CERTAINLY YES
POSSIBLY YES
CANNOT SAY
POSSIBLY NO
CERTAINLY NO
The fuzzy logic works on the levels of possibilities of input to achieve the definite output.
Implementation
It can be implemented in systems with various sizes and capabilities ranging from
small micro-controllers to large, networked, workstation-based control systems.
Fuzzification Module − It transforms the system inputs, which are crisp numbers, into fuzzy sets. It splits the input signal into five steps such as:
LP − x is Large Positive
MP − x is Medium Positive
S − x is Small
MN − x is Medium Negative
LN − x is Large Negative
Knowledge Base − It stores the IF-THEN rules provided by experts.
Inference Engine − It simulates the human reasoning process by making fuzzy inference on the inputs and the IF-THEN rules.
Defuzzification Module − It transforms the fuzzy set obtained by the inference engine into a crisp value.
Membership Function
Membership functions allow you to quantify linguistic term and represent a fuzzy set
graphically. A membership function for a fuzzy set A on the universe of discourse X is
defined as μA:X → [0,1].
Here, each element of X is mapped to a value between 0 and 1. It is called membership value
or degree of membership. It quantifies the degree of membership of the element in X to the
fuzzy set A.
Multiple membership functions can be used to fuzzify a numerical value. Simple membership functions are preferred, since complex functions do not add precision to the output.
All membership functions for LP, MP, S, MN, and LN are shown as below −
The triangular membership function shapes are most common among various other
membership function shapes such as trapezoidal, singleton, and Gaussian.
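A triangular membership function of this kind can be sketched as follows; the (a, b, c) corner points below are invented for illustration:

```python
# Minimal sketch of a triangular membership function mu_A : X -> [0, 1],
# rising linearly from a to its peak at b and falling back to zero at c.

def triangular(x, a, b, c):
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)   # rising edge
    return (c - x) / (c - b)       # falling edge

# "x is Small", centred on 0 over the -10 V .. +10 V input range
print(triangular(0, -5, 0, 5))     # 1.0  (full membership)
print(triangular(2.5, -5, 0, 5))   # 0.5  (partial membership)
print(triangular(8, -5, 0, 5))     # 0.0  (outside the set)
```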
Here, the input to 5-level fuzzifier varies from -10 volts to +10 volts. Hence the
corresponding output also changes.
Let us consider an air conditioning system with 5-level fuzzy logic system. This system
adjusts the temperature of air conditioner by comparing the room temperature and the target
temperature value.
Algorithm
Convert crisp data into fuzzy data sets using membership functions. (fuzzification)
Development
Linguistic variables are input and output variables in the form of simple words or sentences.
For room temperature, cold, warm, hot, etc., are linguistic terms.
Every member of this set is a linguistic term and it can cover some portion of overall
temperature values.
Create a matrix of room temperature values versus target temperature values that an air
conditioning system is expected to provide.
RoomTemp/Target | Very_Cold | Cold | Warm | Hot | Very_Hot
Build a set of rules into the knowledge base in the form of IF-THEN-ELSE structures.
Fuzzy set operations perform evaluation of rules. The operations used for OR and AND are
Max and Min respectively. Combine all results of evaluation to form a final result. This result
is a fuzzy value.
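The Min/Max rule evaluation can be sketched as follows; the rules and membership degrees below are invented for illustration:

```python
# Minimal sketch of fuzzy rule evaluation: AND is taken as Min, OR as
# Max, and the rule results are combined with Max into one fuzzy value.

AND, OR = min, max

# Invented degrees of membership for the current readings.
room = {"cold": 0.7, "warm": 0.2}
target = {"warm": 0.8}

# Rule 1: IF room is cold AND target is warm THEN heat strongly
rule1 = AND(room["cold"], target["warm"])   # min(0.7, 0.8) = 0.7
# Rule 2: IF room is warm OR target is warm THEN heat gently
rule2 = OR(room["warm"], target["warm"])    # max(0.2, 0.8) = 0.8

# Combine all rule strengths into a single fuzzy result.
print(max(rule1, rule2))  # 0.8
```

A defuzzification step would then turn this fuzzy value back into a crisp control output.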
Automotive Systems
Automatic Gearboxes
Four-Wheel Steering
Hi-Fi Systems
Photocopiers
Television
Domestic Goods
Microwave Ovens
Refrigerators
Toasters
Vacuum Cleaners
Washing Machines
Environment Control
Air Conditioners/Dryers/Heaters
Humidifiers
Advantages of FLSs
You can modify an FLS by just adding or deleting rules, thanks to the flexibility of fuzzy logic.
Fuzzy logic systems can accept imprecise, distorted, or noisy input information.
Fuzzy logic is a solution to complex problems in all fields of life, including medicine,
as it resembles human reasoning and decision making.
Disadvantages of FLSs
They are suitable only for problems that do not need high accuracy.
Natural Language Processing (NLP) refers to the AI method of communicating with intelligent systems using a natural language such as English.
Processing of natural language is required when you want an intelligent system, such as a robot, to perform as per your instructions, or when you want to hear a decision from a dialogue-based clinical expert system, etc.
The field of NLP involves making computers perform useful tasks with the natural languages humans use. The input and output of an NLP system can be −
Speech
Written Text
Components of NLP
There are two components of NLP:
Natural Language Understanding (NLU) − It involves mapping a given input in natural language into useful representations and analyzing different aspects of the language.
Natural Language Generation (NLG) − It is the process of producing meaningful phrases and sentences in the form of natural language from some internal representation. It involves −
Text planning − It includes retrieving the relevant content from the knowledge base.
Difficulties in NLU
For example, “He lifted the beetle with red cap.” − Did he use a cap to lift the beetle, or did he lift a beetle that had a red cap?
NLP Terminology
Semantics − It is concerned with the meaning of words and how to combine words
into meaningful phrases and sentences.
Discourse − It deals with how the immediately preceding sentence can affect the
interpretation of the next sentence.
Steps in NLP
Lexical Analysis − It involves identifying and analyzing the structure of words, dividing the whole text into paragraphs, sentences, and words.
Syntactic Analysis (Parsing) − It involves analyzing the words in a sentence for grammar and arranging them in a manner that shows the relationships among the words.
Semantic Analysis − It draws the exact or dictionary meaning from the text. The text is checked for meaningfulness. This is done by mapping syntactic structures onto objects in the task domain. The semantic analyzer disregards sentences such as “hot ice-cream”.
Discourse Integration − The meaning of any sentence depends upon the meaning of the sentence just before it, and it can in turn influence the meaning of the sentence that immediately follows.
Pragmatic Analysis − During this, what was said is re-interpreted based on what it
actually meant. It involves deriving those aspects of language which require real-world
knowledge.
There are a number of algorithms researchers have developed for syntactic analysis, but we
consider only the following simple methods −
Context-Free Grammar
Top-Down Parser
Context-Free Grammar
It is a grammar that consists of rules with a single symbol on the left-hand side of the
rewrite rules. Let us create a grammar to parse a sentence −
The parse tree breaks down the sentence into structured parts so that the computer can easily
understand and process it. In order for the parsing algorithm to construct this parse tree, a set
of rewrite rules, which describe what tree structures are legal, need to be constructed.
These rules say that a certain symbol may be expanded in the tree by a sequence of other
symbols. According to the first rewrite rule, if there are two strings Noun Phrase (NP) and
Verb Phrase (VP), then the string of NP followed by VP is a sentence. The rewrite
rules for the sentence are as follows −
S → NP VP
NP → DET N
VP → V NP
Lexicon −
DET → a | the
N → bird | grains
V → peck | pecks
Now consider the above rewrite rules. Since V can be replaced by either "peck" or "pecks",
sentences such as "The bird peck the grains" are wrongly permitted, i.e., the subject-verb
agreement error is accepted as correct.
Demerits −
They are not highly precise. For example, “The grains peck the bird” is
syntactically correct according to the parser, but even though it makes no sense, the
parser takes it as a correct sentence.
To bring out high precision, multiple sets of grammar need to be prepared. It may
require completely different sets of rules for parsing singular and plural variations,
passive sentences, etc., which can lead to the creation of a huge set of rules that is
unmanageable.
Top-Down Parser
Here, the parser starts with the S symbol and attempts to rewrite it into a sequence of
terminal symbols that matches the classes of the words in the input sentence until it consists
entirely of terminal symbols.
These are then checked against the input sentence to see if they match. If not, the process is
started over again with a different set of rules. This is repeated until a specific rule is found
which describes the structure of the sentence.
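As a rough sketch of such a top-down parser, the following hypothetical recursive-descent parser accepts sentences for a toy grammar of the form S → NP VP, NP → DET N, VP → V NP; the class, method names, and lexicon below are illustrative:

```java
import java.util.Arrays;
import java.util.List;

// Minimal recursive-descent (top-down) parser for a toy grammar:
//   S -> NP VP,  NP -> DET N,  VP -> V NP
// All names and lexicon entries here are illustrative.
class ToyTopDownParser {
    static final List<String> DET = Arrays.asList("a", "the");
    static final List<String> N = Arrays.asList("bird", "grains");
    static final List<String> V = Arrays.asList("peck", "pecks");

    // Each method tries to expand one non-terminal starting at position i
    // and returns the position after the matched span, or -1 on failure.
    static int np(String[] w, int i) {
        if (i + 1 < w.length && DET.contains(w[i]) && N.contains(w[i + 1]))
            return i + 2;
        return -1;
    }

    static int vp(String[] w, int i) {
        if (i < w.length && V.contains(w[i]))
            return np(w, i + 1);
        return -1;
    }

    // S -> NP VP; the sentence is accepted only if all words are consumed.
    static boolean parse(String sentence) {
        String[] w = sentence.toLowerCase().split("\\s+");
        int afterNp = np(w, 0);
        if (afterNp < 0) return false;
        return vp(w, afterNp) == w.length;
    }
}
```

As discussed above, such a parser also accepts "The bird peck the grains", since the grammar does not encode subject-verb agreement.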
Demerits −
It is inefficient: if a wrong expansion is chosen, the parser must backtrack and
repeat work, and left-recursive rules can cause it to loop forever.
Expert systems (ES) are one of the prominent research domains of AI. They were introduced
by researchers at Stanford University's Computer Science Department.
Expert systems are computer applications developed to solve complex problems in a
particular domain, at the level of extraordinary human intelligence and expertise.
High performance
Understandable
Reliable
Highly responsive
Advising
Deriving a solution
Diagnosing
Explaining
Interpreting input
Predicting results
Knowledge Base
Inference Engine
User Interface
Knowledge is required to exhibit intelligence. The success of any ES majorly depends upon
the collection of highly accurate and precise knowledge.
What is Knowledge?
Data is a collection of facts. Information is data organized as facts about the task
domain. Data, information, and past experience combined together are termed knowledge.
Knowledge representation
It is the method used to organize and formalize the knowledge in the knowledge base. It is in
the form of IF-THEN-ELSE rules.
Knowledge Acquisition
The success of any expert system majorly depends on the quality, completeness, and
accuracy of the information stored in the knowledge base.
The knowledge base is formed by readings from various experts, scholars, and the
Knowledge Engineers. The knowledge engineer is a person with the qualities of empathy,
quick learning, and case analyzing skills.
He acquires information from the subject expert by recording, interviewing, and observing
him at work, etc. He then categorizes and organizes the information in a meaningful way, in
the form of IF-THEN-ELSE rules, to be used by the inference engine. The knowledge
engineer also monitors the development of the ES.
Inference Engine
Use of efficient procedures and rules by the Inference Engine is essential in deducing a
correct, flawless solution.
In case of knowledge-based ES, the Inference Engine acquires and manipulates the
knowledge from the knowledge base to arrive at a particular solution.
Applies rules repeatedly to the facts, which are obtained from earlier rule application.
Resolves rules conflict when multiple rules are applicable to a particular case.
Forward Chaining
Backward Chaining
Forward Chaining
It is a strategy of an expert system to answer the question, “What can happen next?”
Here, the Inference Engine follows the chain of conditions and derivations and finally
deduces the outcome. It considers all the facts and rules and sorts them before arriving at
a solution.
This strategy is followed for working on conclusion, result, or effect. For example, prediction
of share market status as an effect of changes in interest rates.
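The forward-chaining loop described above can be sketched as repeated rule application over a set of facts; the rules and fact names below are illustrative, not from any particular shell:

```java
import java.util.*;

// Minimal forward-chaining sketch: a rule fires when all its antecedents are
// known facts, adding its consequent. All rule/fact names are illustrative.
class ForwardChainer {
    static class Rule {
        List<String> ifAll; String then;
        Rule(String then, String... ifAll) {
            this.then = then;
            this.ifAll = Arrays.asList(ifAll);
        }
    }

    // Repeatedly apply rules to the known facts until nothing new is derived.
    static Set<String> infer(List<Rule> rules, Set<String> facts) {
        Set<String> known = new HashSet<>(facts);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Rule r : rules)
                if (!known.contains(r.then) && known.containsAll(r.ifAll)) {
                    known.add(r.then);   // fire the rule, adding its conclusion
                    changed = true;
                }
        }
        return known;
    }
}
```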
Backward Chaining
With this strategy, an expert system finds out the answer to the question, “Why did this
happen?”
On the basis of what has already happened, the Inference Engine tries to find out which
conditions could have happened in the past for this result. This strategy is followed for
finding out cause or reason. For example, diagnosis of blood cancer in humans.
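Backward chaining, as described above, starts from the conclusion and works back through the conditions that could establish it. A minimal sketch, with illustrative rules and facts (and no cycle handling, for brevity):

```java
import java.util.*;

// Minimal backward-chaining sketch: to prove a goal, find a rule that
// concludes it and recursively prove each of its conditions.
// All rule/fact names are illustrative; no cycle handling, for brevity.
class BackwardChainer {
    // Map: conclusion -> the list of conditions that would establish it.
    static Map<String, List<String>> rules = new HashMap<>();
    static Set<String> facts = new HashSet<>();

    static boolean prove(String goal) {
        if (facts.contains(goal)) return true;   // goal is a known fact
        List<String> conds = rules.get(goal);
        if (conds == null) return false;         // no rule concludes this goal
        for (String c : conds)
            if (!prove(c)) return false;         // every condition must hold
        return true;
    }
}
```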
User Interface
The user interface provides interaction between the user of the ES and the ES itself. It
generally uses Natural Language Processing so that it can be used by a user who is
well-versed in the task domain. The user of the ES need not necessarily be an expert in
Artificial Intelligence.
It explains how the ES has arrived at a particular recommendation. The explanation may
appear in the following forms −
Its technology should be adaptable to the user's requirements, not the other way round.
No technology can offer an easy and complete solution. Large systems are costly and require
significant development time and computer resources. ESs have their limitations, which
include −
Application
Description
Design Domain
Medical Domain
Diagnosis systems to deduce the cause of disease from observed data; conducting medical
operations on humans.
Monitoring Systems
Comparing data continuously with the observed system or with prescribed behaviour, such as
leakage monitoring in a long petroleum pipeline.
Knowledge Domain
Finance/Commerce
Detection of possible fraud, suspicious transactions, stock market trading, Airline scheduling,
cargo scheduling.
There are several levels of ES technologies available. Expert systems technologies include −
o Large databases.
Tools − They reduce the effort and cost involved in developing an expert system to a
large extent.
Shells − A shell is nothing but an expert system without the knowledge base. A shell
provides the developers with knowledge acquisition, an inference engine, a user interface,
and an explanation facility. For example, a few shells are given below −
o Java Expert System Shell (JESS), which provides a fully developed Java API for
creating an expert system.
Know and establish the degree of integration with the other systems and databases.
Realize how the concepts can represent the domain knowledge best.
The knowledge engineer uses sample cases to test the prototype for any deficiencies
in performance.
Test and ensure the interaction of the ES with all elements of its environment,
including end users, databases, and other information systems.
Cater for new interfaces with other information systems, as those systems evolve.
Benefits of Expert Systems
Less Production Cost − Production cost is reasonable. This makes them affordable.
Speed − They offer great speed. They reduce the amount of work an individual puts
in.
Steady response − They work steadily without getting emotional, tense, or fatigued.
Uninformed search is a group of widely used, general-purpose search algorithms. These
algorithms are brute-force operations: they have no extra information about the search space;
the only information they have is how to traverse or visit the nodes of the tree. Thus
uninformed search algorithms are also called blind search algorithms. The search algorithm
produces the search tree without using any domain knowledge, which is brute force in
nature. They differ from informed search algorithms in that a goal check happens only when
a node is generated or expanded, and they have no background information on how to
approach the goal.
BFS is a search operation for finding the nodes in a tree. The algorithm works breadthwise:
it starts the search from the root node, expands all the successor nodes at the current level
before moving ahead, and then moves breadthwise for further expansion.
It occupies a lot of memory space and execution time when the solution is at the
bottom or end of the tree, and it uses a FIFO queue.
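The breadthwise expansion with a FIFO queue described above can be sketched as follows; the adjacency-list graph shape is illustrative:

```java
import java.util.*;

// Minimal BFS sketch over an adjacency-list graph, using a FIFO queue.
// The graph encoding is illustrative.
class BreadthFirstSearchSketch {
    static List<Integer> bfs(Map<Integer, List<Integer>> adj, int root) {
        List<Integer> order = new ArrayList<>();
        Set<Integer> visited = new HashSet<>();
        Queue<Integer> queue = new ArrayDeque<>();  // FIFO queue
        visited.add(root);
        queue.add(root);
        while (!queue.isEmpty()) {
            int node = queue.remove();
            order.add(node);
            // expand all successors at this level before going deeper
            for (int next : adj.getOrDefault(node, Collections.emptyList()))
                if (visited.add(next))   // true only the first time we see it
                    queue.add(next);
        }
        return order;   // nodes in breadth-first visiting order
    }
}
```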
DFS is a recursive algorithm that traverses a graph or tree depth-wise, which is how it
derives its name. DFS uses a stack for its implementation. The process of search is similar to
BFS; the only difference lies in the expansion of nodes, which is depth-wise in this case.
Unlike BFS, DFS requires much less memory, because the stack stores only the nodes on
the path it explores depth-wise.
In comparison to BFS, the execution time is also less if the expansion of nodes is
correct. If the path is not correct, the recursion continues, and there is no
guarantee that one may find the solution. This may result in an infinite loop.
The DFS search algorithm is not optimal, and it may generate large steps and possibly
high cost to find the solution.
The DLS algorithm is one of the uninformed strategies. A depth-limited search is close to
DFS, but it addresses the demerit of DFS by imposing a depth limit: nodes at the depth limit
behave as if they have no successors. Depth-limited search can be halted in two
cases:
o SFV: The Standard failure value which tells that there is no solution to the
problem.
o CFV: The Cutoff failure value tells that there is no solution within the given
depth.
The DLS is efficient in memory space utilization.
It has the demerit of incompleteness: it is complete only if the solution lies above the
depth limit.
The UCS algorithm is used for visiting the weighted tree. The main goal of the uniform cost
search is to fetch a goal node and find the true path, including the cumulative cost. The
following are the properties of the UCS algorithm:
The expansion takes place on the basis of cost from the root. The UCS is implemented
using a priority queue.
The UCS does not care about the number of steps, and so it may end up in an infinite loop.
We can say that UCS is an optimal algorithm, as it always chooses the path with the lowest
cost.
It proves itself when the search space is large and the depth is not known.
This algorithm has one demerit: it iterates over all the previous steps.
The algorithm is known to be complete only if the branching factor is finite.
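The priority-queue expansion by cumulative cost described above can be sketched as follows; the edge encoding and weights are illustrative:

```java
import java.util.*;

// Minimal uniform-cost search sketch: a priority queue ordered by cumulative
// path cost from the root. Edge encoding ({to, cost} pairs) is illustrative.
class UniformCostSearchSketch {
    // Returns the cheapest cost from start to goal, or -1 if unreachable.
    static int ucs(Map<Integer, int[][]> edges, int start, int goal) {
        // queue entries are {cumulativeCost, node}, cheapest first
        PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) -> a[0] - b[0]);
        Set<Integer> expanded = new HashSet<>();
        pq.add(new int[]{0, start});
        while (!pq.isEmpty()) {
            int[] e = pq.poll();
            int cost = e[0], node = e[1];
            if (node == goal) return cost;       // cheapest path to goal found
            if (!expanded.add(node)) continue;   // already expanded more cheaply
            for (int[] edge : edges.getOrDefault(node, new int[0][]))
                pq.add(new int[]{cost + edge[1], edge[0]});
        }
        return -1;   // goal not reachable
    }
}
```

Note how the cheaper two-step path is preferred over a more expensive direct edge, which is what makes UCS optimal.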
The two-way or bidirectional search algorithm runs two searches simultaneously, one in the
forward direction and the other in the backward direction. The search stops when the two
simultaneous searches intersect each other at the goal node. Either search may use any of the
algorithms discussed above, like BFS, DFS, etc.
The implementation is difficult, and the goal node should be known in advance to
execute it.
DFS is known as the Depth First Search Algorithm which provides the steps to traverse each
and every node of a graph without repeating any node. This algorithm is the same as Depth
First Traversal for a tree but differs in maintaining a Boolean to check if the node has already
been visited or not. This is important for graph traversal as cycles also exist in the graph. A
stack is maintained in this algorithm to store the suspended nodes while traversal. It is named
so because we first travel to the depth of each adjacent node and then continue traversing
another adjacent node.
This algorithm is contrary to the BFS algorithm where all the adjacent nodes are visited
followed by neighbors to the adjacent nodes. It starts exploring the graph from one node and
explores its depth before backtracking. Two things are considered in this algorithm:
Visiting a Vertex: Selection of a vertex or node of the graph to traverse.
Exploration of a Vertex: Traversing all the nodes adjacent to a visited vertex.
procedure DFS_implement(G, v):
    let St be a stack
    St.push(v)
    while St has elements
        v = St.pop()
        if v is not labeled as visited:
            label v as visited
            for all edges from v to w in G.adjacentEdges(v) do
                St.push(w)
Linear Traversal also exists for DFS that can be implemented in 3 ways:
Preorder
Inorder
PostOrder
Reverse postorder is a very useful traversal order, used in topological sorting as well as
various analyses. A stack is also maintained to store the nodes whose exploration is still
pending.
In DFS, the below steps are followed to traverse the graph. For example, a given graph, let us
start traversal from 1:
Below are the steps to DFS Algorithm with advantages and disadvantages:
Step1: Node 1 is visited and added to the sequence as well as the spanning tree.
Step2: The adjacent node of 1, that is 4, is explored; thus 1 is pushed onto the stack, and 4
is added to the sequence as well as the spanning tree.
Step3: One of the adjacent nodes of 4 is explored; thus 4 is pushed onto the stack, and 3
enters the sequence and spanning tree.
Step4: Adjacent nodes of 3 are explored by pushing it onto the stack and 10 enters the
sequence. As there is no adjacent node to 10, thus 3 is popped out of the stack.
Step5: Another adjacent node of 3 is explored: 3 is pushed onto the stack and 9 is visited. As
there is no adjacent node of 9, 3 is popped out, and the last adjacent node of 3, i.e., 2, is visited.
A similar process is followed for all the nodes till the stack becomes empty, which is the
stop condition for the traversal algorithm. The dotted lines in the spanning tree refer to
the back edges present in the graph.
In this way, all the nodes in the graph are traversed without repeating any of the nodes.
Advantages: The memory requirement for this algorithm is very low; it has lower space
complexity than BFS.
Code:
import java.util.Stack;

public class DepthFirstSearch {
    static void depthFirstSearch(int[][] matrix, int source) {
        boolean[] visited = new boolean[matrix.length];
        visited[source - 1] = true;
        Stack<Integer> stack = new Stack<>();
        stack.push(source);
        int i, x;
        System.out.println("Depth first order is");
        System.out.println(source);
        while (!stack.isEmpty()) {
            x = stack.pop();
            for (i = 0; i < matrix.length; i++) {
                if (matrix[x - 1][i] == 1 && visited[i] == false) {
                    stack.push(x);        // suspend the current vertex
                    visited[i] = true;
                    System.out.println(i + 1);
                    x = i + 1;            // move to the newly visited vertex
                    i = -1;               // rescan its neighbours from the start
                }
            }
        }
    }

    public static void main(String[] args) {
        // 0/1 adjacency matrix for a sample 5-vertex graph
        int[][] mymatrix = {
            {0, 1, 1, 0, 0},
            {1, 0, 0, 1, 0},
            {1, 0, 0, 0, 1},
            {0, 1, 0, 0, 0},
            {0, 0, 1, 0, 0}
        };
        depthFirstSearch(mymatrix, 1);
    }
}
Output:
Explanation of the above program: We made a graph having 5 vertices and used a visited
array to keep track of all visited vertices. For each node that has unvisited adjacent nodes,
the same algorithm repeats till all the nodes are visited; the algorithm then goes back to the
calling vertex and pops it from the stack.
Depth-limited search
This search strategy is similar to DFS with a small difference: in depth-limited search, we
limit the search by imposing a depth limit l on the depth of the search tree, so it does not
need to explore till infinity. As a result, depth-first search is a special case of depth-limited
search in which the limit l is infinite.
Depth-limited search on a binary tree
In the above figure, the depth limit is 1. So only levels 0 and 1 get expanded, giving the
DFS sequence A->B->C starting from the root node A. It does not give a satisfactory result
because we could not reach the goal node I.
Set a variable NODE to the initial state, i.e., the root node.
Set a variable GOAL which contains the value of the goal state.
Loop each node by traversing in DFS manner till the depth-limit value.
While performing the looping, start removing the elements from the stack in LIFO
order.
If the goal state is found, return goal state. Else terminate the search.
Completeness: Depth-limited search does not guarantee to reach the goal node.
Optimality: It does not give an optimal solution as it expands the nodes till the depth-
limit.
Note: Depth-limit search terminates with two kinds of failures: the standard failure value
indicates “no solution,” and cut-off value, which indicates “no solution within the depth-
limit.”
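The steps above can be sketched as a recursive DFS with a depth cutoff; the tree encoding below is illustrative:

```java
import java.util.*;

// Minimal depth-limited search sketch: DFS that refuses to expand nodes
// below the depth limit l. The tree encoding is illustrative.
class DepthLimitedSearchSketch {
    // Returns true if the goal is reachable within the given depth limit.
    static boolean dls(Map<String, List<String>> children, String node,
                       String goal, int limit) {
        if (node.equals(goal)) return true;
        if (limit == 0) return false;   // cutoff: behave as if no successors exist
        for (String child : children.getOrDefault(node, Collections.emptyList()))
            if (dls(children, child, goal, limit - 1))
                return true;
        return false;
    }
}
```

Iterative deepening, discussed next, simply calls such a routine with limit 0, 1, 2, ... until the goal is found.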
This search is a combination of BFS and DFS, as BFS guarantees to reach the goal node and
DFS occupies less memory space. Therefore, iterative deepening search combines these two
advantages of BFS and DFS to reach the goal node. It gradually increases the depth-limit
from 0,1,2 and so on and reach the goal node.
In the above figure, the goal node is H and initial depth-limit =[0-1]. So, it will expand level
0 and 1 and will terminate with A->B->C sequence. Further, change the depth-limit =[0-3], it
will again expand the nodes from level 0 till level 3 and the search terminate with A->B->D-
>F->E->H sequence where H is the desired goal node.
Loop each node up to the limit value and further increase the limit value accordingly.
Completeness: Iterative deepening search is complete, like BFS, provided the branching
factor is finite.
Space Complexity: It has the same space complexity as depth-first search, i.e., O(bd).
Note: Generally, iterative deepening search is required when the search space is large, and the
depth of the solution is unknown.
Bidirectional search
The strategy behind bidirectional search is to run two searches simultaneously: one forward
search from the initial state and the other backward from the goal, hoping that both searches
will meet in the middle. As soon as the two searches intersect one another, the bidirectional
search terminates with the goal node. This search is implemented by replacing the goal test
with a check of whether the two searches intersect, because if they do, a solution has been
found.
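A minimal sketch of this idea, using two BFS-style frontiers and a replaced goal test that checks for intersection; the graph is assumed undirected, a full implementation would also track visited nodes, and all names are illustrative:

```java
import java.util.*;

// Minimal bidirectional search sketch: two frontiers, one expanded from the
// start and one from the goal, stopping when they intersect. Frontier-only
// for brevity (no visited sets); the graph encoding is illustrative.
class BidirectionalSearchSketch {
    static boolean meets(Map<Integer, List<Integer>> adj, int start, int goal) {
        Set<Integer> fromStart = new HashSet<>(Arrays.asList(start));
        Set<Integer> fromGoal = new HashSet<>(Arrays.asList(goal));
        while (!fromStart.isEmpty() && !fromGoal.isEmpty()) {
            if (!Collections.disjoint(fromStart, fromGoal)) return true;
            fromStart = expand(adj, fromStart);      // one forward step
            if (!Collections.disjoint(fromStart, fromGoal)) return true;
            fromGoal = expand(adj, fromGoal);        // one backward step
        }
        return false;   // a frontier emptied: the searches cannot meet
    }

    // One breadth-wise expansion step of a frontier.
    static Set<Integer> expand(Map<Integer, List<Integer>> adj, Set<Integer> frontier) {
        Set<Integer> next = new HashSet<>();
        for (int n : frontier)
            next.addAll(adj.getOrDefault(n, Collections.emptyList()));
        return next;
    }
}
```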
Some toy problems, such as 8-puzzle, 8-queen, tic-tac-toe, etc., can be solved more
efficiently with the help of a heuristic function. Let’s see how:
Consider the following 8-puzzle problem where we have a start state and a goal state. Our
task is to slide the tiles of the current/start state and place it in an order followed in the goal
state. There can be four moves either left, right, up, or down. There can be several ways to
convert the current/start state to the goal state, but, we can use a heuristic function h(n) to
solve the problem more efficiently.
So, there are a total of three tiles out of position, i.e., 6, 5 and 4 (do not count the empty tile
present in the goal state), i.e., h(n) = 3. Now, we need to minimize the value of h(n) to 0.
We can construct a state-space tree to minimize the h(n) value to 0, as shown below:
It is seen from the above state-space tree that the goal state is reached by minimizing from
h(n)=3 to h(n)=0. However, we can create and use several heuristic functions as per the
requirement. It is also clear from the above example that a heuristic function h(n) can be
defined as the information required to solve a given problem more efficiently. The
information can be related to the nature of the state, the cost of transforming from one state
to another, goal node characteristics, etc., which is expressed as a heuristic function.
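The misplaced-tiles heuristic described above can be sketched as follows; the flat board encoding (with 0 for the empty tile) is illustrative, not the exact figure from the text:

```java
// Minimal sketch of the misplaced-tiles heuristic h(n) for the 8-puzzle:
// count the tiles that differ from the goal, skipping the empty tile (0).
// The board encoding is illustrative.
class MisplacedTiles {
    static int h(int[] state, int[] goal) {
        int misplaced = 0;
        for (int i = 0; i < state.length; i++)
            if (state[i] != 0 && state[i] != goal[i])   // do not count the blank
                misplaced++;
        return misplaced;   // h(n) = 0 means the goal state is reached
    }
}
```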
Exercises
Module -2
Local and Adversarial search
1. Hill Climbing Algorithm: Hill climbing search is a local search problem. The
purpose of the hill climbing search is to climb a hill and reach the topmost peak/ point of that
hill. It is based on the heuristic search technique where the person who is climbing up on the
hill estimates the direction which will lead him to the highest peak.
To understand the concept of hill climbing algorithm, consider the below landscape
representing the goal state/peak and the current state of the climber. The topographical
regions shown in the figure can be defined as:
Global Maximum: It is the highest point on the hill, which is the goal state.
Local Maximum: It is a peak that is higher than its neighboring states but lower than the
global maximum.
Flat local maximum: It is the flat area over the hill where it has no uphill or
downhill. It is a saturated point of the hill.
Simple hill climbing is the simplest technique to climb a hill. The task is to reach the highest
peak of the mountain. Here, the movement of the climber depends on his moves/steps. If he
finds his next step better than the previous one, he continues to move; else he remains in the
same state. This search focuses only on his previous and next steps.
2. If the CURRENT node=GOAL node, return GOAL and terminate the search.
Steepest-ascent hill climbing is different from simple hill climbing search. Unlike simple hill
climbing, it considers all the successor nodes, compares them, and chooses the node which is
closest to the solution. Steepest-ascent hill climbing is similar to best-first search because it
evaluates every successor node instead of just one.
Note: Both simple, as well as steepest-ascent hill climbing search, fails when there is no
closer node.
2. If the CURRENT node=GOAL node, return GOAL and terminate the search.
3. Loop until a better node is not found to reach the solution.
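The steepest-ascent loop described above can be sketched over a simple one-dimensional landscape; the elevation array and names are illustrative:

```java
// Minimal steepest-ascent hill-climbing sketch over a 1-D landscape:
// from the current position, move to the best neighbour while it improves.
// The landscape encoding is illustrative.
class HillClimbingSketch {
    // heights[i] is the elevation at position i; returns the index reached.
    static int climb(int[] heights, int current) {
        while (true) {
            int best = current;
            // compare all successors (the left and right neighbours)
            for (int next : new int[]{current - 1, current + 1})
                if (next >= 0 && next < heights.length
                        && heights[next] > heights[best])
                    best = next;
            if (best == current) return current;   // no better neighbour: a peak
            current = best;
        }
    }
}
```

Note that starting on the wrong slope strands the climber on a local maximum, the limitation discussed below.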
Stochastic hill climbing does not focus on all the nodes. It selects one node at random and
decides whether it should be expanded or search for a better one.
The random-restart algorithm is based on a try-and-try strategy. It iteratively searches the
nodes and selects the best one at each step until the goal is found. The success depends most
commonly on the shape of the hill. If there are few plateaus, local maxima, and ridges, it
becomes easy to reach the destination.
Hill climbing algorithm is a fast and furious approach. It finds the solution state rapidly
because it is quite easy to improve a bad state. But there are the following limitations of this
search:
Local Maxima: It is a peak of the mountain that is higher than all its
neighboring states but lower than the global maximum. It is not the goal peak because
there is another peak higher than it.
Ridges: It is a challenging problem where the person commonly finds two or more local
maxima of the same height. It becomes difficult for the person to navigate to the right
point, and he gets stuck at that point itself.
2. Simulated Annealing
Simulated annealing is similar to the hill climbing algorithm. It works on the current
situation. It picks a random move instead of picking the best move. If the move improves
the current situation, it is always accepted as a step towards the solution state; else it
accepts the move with a probability less than 1. This search technique was first used in 1980
to solve VLSI layout problems. It is also applied to factory scheduling and other large
optimization tasks.
Local beam search is quite different from random-restart search. It keeps track of k states
instead of just one. It selects k randomly generated states, and expand them at each step. If
any state is a goal state, the search stops with success. Else it selects the best k successors
from the complete list and repeats the same process. In random-restart search, each search
process runs independently, but in local beam search, the necessary information is shared
between the parallel search processes.
This search can suffer from a lack of diversity among the k states.
In every simulated annealing example, a random new point is generated. The distance
between the current point and the new point is based on a probability distribution whose
scale is proportional to the temperature. The algorithm aims at all those points that minimize
the objective under certain constraints and probabilities. Points that raise the objective are
also accepted, in order to explore all the possible solutions instead of concentrating only on
local minima.
There are a set of steps that are performed for simulated annealing in ai. These steps can be
summarized as follows:
Simulated annealing creates a trial point randomly. The algorithm selects the distance
between the current point and the trial point according to a probability distribution whose
scale is the temperature. The annealing function sets the distance of the trial-point
distribution. To keep the boundaries intact, the trial point is shifted gradually.
The simulated annealing formula then determines whether the new point is better than the
old one or not. If the new point is better, it becomes the next point; if it is worse, it can still
be accepted, depending upon the simulated annealing acceptance function.
The algorithm systematically reduces the temperature, selecting the best point generated in
the process.
The annealing parameters are set for lowering the values, raising and reducing the
temperature. The simulated annealing parameters are based on the values of the probable
gradients of every dimension of the objective.
The simulated annealing is concluded when it reaches the lowest minimum or meets any of
the specified stopping criteria.
Some of the conditions that are considered as the basis to stop the simulated-annealing are as
follows:
The simulated annealing runs until the value of the objective function goes below the
tolerance value. The default value is 1e-6.
The default value of iterations in simulated-annealing is INF. This can be set to any
positive integer as well. When the algorithm exceeds the iteration value, it stops.
The annealing concludes when the maximum number of evaluations is achieved. The
default value of such evaluations is 3000 * number of variables.
The default value of maximum time is Inf, and when that is reached, the algorithm
stops.
When the best objective function value goes below the objective limit, it concludes. The
default value of this limit is -Inf.
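The propose-accept-cool loop described in the steps above can be sketched on a toy one-dimensional objective; the objective f(x) = x*x, the cooling schedule, and the step scale are all illustrative choices, not fixed parameters of the method:

```java
import java.util.Random;

// Minimal simulated-annealing sketch: propose a random nearby trial point,
// always accept improvements, accept worse points with probability
// exp(-delta / T), and cool the temperature each iteration.
// The objective f(x) = x*x and all parameters are illustrative.
class SimulatedAnnealingSketch {
    static double f(double x) { return x * x; }   // objective to minimize

    static double anneal(long seed) {
        Random rnd = new Random(seed);
        double x = 10.0;                               // illustrative start point
        for (double t = 1.0; t > 1e-4; t *= 0.99) {    // cooling schedule
            // trial-point distance shrinks in proportion to the temperature
            double trial = x + (rnd.nextDouble() - 0.5) * 2.0 * t * 10;
            double delta = f(trial) - f(x);
            // accept improvements always; worse moves with prob exp(-delta/t)
            if (delta < 0 || rnd.nextDouble() < Math.exp(-delta / t))
                x = trial;
        }
        return x;   // should end near the minimum at x = 0
    }
}
```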
At the onset, a city class needs to be created to specify several destinations the
travelling salesman would visit.
After that, a class has to be created that keeps track of the cities.
Then a class is created that models the tour of the travelling salesman.
With all the different classes and the information in hand, a simulated-annealing
algorithm is created.
4. Beam Search
A heuristic search algorithm that examines a graph by extending the most promising nodes
in a limited set is known as beam search. Beam search is a heuristic search technique that
always expands the W best nodes at each level. It progresses level by level and moves
downwards only from the best W nodes at each level. Beam search constructs its search tree
using breadth-first search: it generates all the successors of the current level's states at each
level of the tree. However, at each level, it only evaluates W states; other nodes are not taken
into account.
The heuristic cost associated with the node is used to choose the best nodes. The width of the
beam search is denoted by W. If B is the branching factor, at every depth, there will always
be W × B nodes under consideration, but only W will be chosen. More states are trimmed
when the beam width is reduced.
When W = 1, the search becomes a hill-climbing search in which the best node is always
chosen from the successor nodes. If the beam width is unlimited, no states are pruned, and
beam search is identical to breadth-first search.
The beam width bounds the amount of memory needed to complete the search, but it comes
at the cost of completeness and optimality (possibly it will not find the best solution). The
reason for this danger is that the desired state could have been pruned.
Example: The search tree generated using this algorithm with W = 2 & B = 3 is given below :
The black nodes are selected based on their heuristic values for further expansion.
Start
Create a list OPEN containing the start NODE.
Loop:
    if NODE = Goal,
        set FOUND = True and exit the loop
    else
        find SUCCs of NODE, if any, with their estimated costs,
        and keep only the best W of them in OPEN.
If FOUND = True,
    return the goal path
else
    return No solution
Stop
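The level-wise pruning to width W described above can be sketched as follows; the tree and heuristic values are illustrative:

```java
import java.util.*;

// Minimal beam-search sketch: expand level by level, keeping only the W
// nodes with the best (lowest) heuristic cost at each level.
// The tree encoding and heuristic values are illustrative.
class BeamSearchSketch {
    // children: node -> successors; h: node -> heuristic cost.
    // Returns the final (deepest) beam of at most W nodes.
    static List<String> lastLevel(Map<String, List<String>> children,
                                  Map<String, Integer> h, String root, int w) {
        List<String> level = new ArrayList<>(Arrays.asList(root));
        while (true) {
            List<String> next = new ArrayList<>();
            for (String n : level)                    // generate all successors
                next.addAll(children.getOrDefault(n, Collections.emptyList()));
            if (next.isEmpty()) return level;
            next.sort(Comparator.comparing(h::get)); // best heuristic first
            // prune to width W: the rest are not taken into account
            level = next.subList(0, Math.min(w, next.size()));
        }
    }
}
```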
5. Genetic Algorithms
Genetic Algorithms (GAs) are adaptive heuristic search algorithms that belong to the larger
class of evolutionary algorithms. Genetic algorithms are based on the ideas of natural
selection and genetics. They are an intelligent exploitation of random search, provided with
historical data, to direct the search into the region of better performance in the solution
space. They are commonly used to generate high-quality solutions for optimization problems
and search problems.
Genetic algorithms simulate the process of natural selection, which means that species which
can adapt to changes in their environment survive, reproduce, and go on to the next
generation. In simple words, they simulate “survival of the fittest” among individuals of
consecutive generations to solve a problem. Each generation consists of a population of
individuals, and each individual represents a point in the search space and a possible
solution. Each individual is represented as a string of characters/integers/floats/bits. This
string is analogous to the chromosome.
Genetic algorithms are based on an analogy with genetic structure and behavior of
chromosome of the population. Following is the foundation of GAs based on this analogy –
1. Individuals in a population compete for resources and mates.
2. Those individuals who are successful (fittest) then mate to create more offspring than
others.
3. Genes from the “fittest” parents propagate throughout the generation; that is, sometimes
parents create offspring which are better than either parent.
Search space
A population of individuals is maintained within the search space. Each individual represents
a solution in the search space to the given problem. Each individual is coded as a finite-length
vector (analogous to a chromosome) of components. These variable components are
analogous to genes. Thus a chromosome (individual) is composed of several genes (variable
components).
Fitness Score
A fitness score is given to each individual which shows the ability of that individual to
“compete”. Individuals having optimal (or near-optimal) fitness scores are sought.
The GA maintains a population of n individuals (chromosomes/solutions) along with their
fitness scores. The individuals having better fitness scores are given more chance to
reproduce than others. The individuals with better fitness scores are selected to mate and
produce better offspring by combining the chromosomes of the parents. The population size
is static, so room has to be created for new arrivals. So, some individuals die and get
replaced by new arrivals, eventually creating a new generation when all the mating
opportunities of the old population are exhausted. It is hoped that over successive
generations better solutions will arrive, while the least fit die.
Each new generation has, on average, more “good genes” than the individuals (solutions) of
previous generations. Thus each new generation has better “partial solutions” than previous
generations. Once the offspring produced are not significantly different from the offspring
produced by previous populations, the population has converged. The algorithm is then said
to have converged to a set of solutions for the problem.
Once the initial generation is created, the algorithm evolves the generation using the
following operators −
1) Selection Operator: The idea is to give preference to the individuals with good fitness
scores and allow them to pass their genes to the successive generations.
2) Crossover Operator: This represents mating between individuals. Two individuals are
selected using the selection operator, and crossover sites are chosen randomly. Then the
genes at these crossover sites are exchanged, creating a completely new individual
(offspring).
3) Mutation Operator: The key idea is to insert random genes in the offspring to maintain
diversity in the population and avoid premature convergence.
For example –
Given a target string, the goal is to produce the target string starting from a random string of
the same length. In the following implementation, the analogies are: the characters A-Z, a-z,
0-9 and other special symbols are the genes; a string generated from these characters is a
chromosome/solution/individual; and the fitness score is the number of characters that differ
from the characters of the target string at each index. So an individual with a lower fitness
value is given more preference.
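The string-evolution example above can be sketched as follows. This is a minimal sketch, not the full implementation: the target string, population size, mutation rate, and truncation-style selection are illustrative assumptions.

```python
import random
import string

GENES = string.ascii_letters + string.digits + " "
TARGET = "Hello AI"          # hypothetical target string

def random_individual():
    # A chromosome: a random string of the same length as the target.
    return "".join(random.choice(GENES) for _ in range(len(TARGET)))

def fitness(ind):
    # Number of characters differing from the target (lower is better).
    return sum(1 for a, b in zip(ind, TARGET) if a != b)

def crossover(p1, p2):
    # Exchange genes at a randomly chosen crossover site.
    site = random.randint(1, len(TARGET) - 1)
    return p1[:site] + p2[site:]

def mutate(ind, rate=0.1):
    # Each gene is replaced by a random gene with probability `rate`.
    return "".join(random.choice(GENES) if random.random() < rate else g
                   for g in ind)

def evolve(pop_size=100, generations=300):
    population = [random_individual() for _ in range(pop_size)]
    for gen in range(generations):
        population.sort(key=fitness)
        if fitness(population[0]) == 0:      # population has converged
            return population[0], gen
        # Selection: the fitter half gets the chance to reproduce;
        # the best individual survives unchanged (elitism).
        parents = population[:pop_size // 2]
        population = [population[0]] + [
            mutate(crossover(random.choice(parents), random.choice(parents)))
            for _ in range(pop_size - 1)]
    population.sort(key=fitness)
    return population[0], generations

best, gen = evolve()
print(best, gen)
```

Each generation replaces the old population, mirroring the text: the static population size forces old individuals to die as new offspring arrive.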
Online search is a necessary idea for unknown environments, where the agent does not know
what states exist or what its actions do. In this state of ignorance, the agent faces an
exploration problem and must use its actions as experiments in order to learn enough to make
deliberation worthwhile. The canonical example of online search is a robot that is placed in a
new building and must explore it to build a map that it can use for getting from A to B.
Methods for escaping from labyrinths—required knowledge for aspiring heroes of antiquity
—are also examples of online search algorithms. Spatial exploration is not the only form of
exploration, however. Consider a newborn baby: it has many possible actions but knows the
outcomes of none of them, and it has experienced only a few of the possible states that it can
reach. The baby’s gradual discovery of how the world works is, in part, an online search
process.
Online search problems. An online search problem must be solved by an agent executing
actions, rather than by pure computation. We assume a deterministic and fully observable
environment (these assumptions can be relaxed), but we stipulate that the agent knows only
the following:
• ACTIONS(s), the legal actions in state s;
• the step-cost function c(s, a, s′), which cannot be used until the agent knows that s′ is
the outcome; and
• GOAL-TEST(s).
Note in particular that the agent cannot determine RESULT(s, a) except by actually being in s
and doing a. For example, in the maze problem shown in Figure 4.19, the agent does not
know that going Up from (1,1) leads to (1,2); nor, having done that, does it know that going
Down will take it back to (1,1). This degree of ignorance can be reduced in some applications
—for example, a robot explorer might know how its movement actions work and be ignorant
only of the locations of obstacles.
Game playing is an important domain of artificial intelligence. Games don't require much
knowledge; the only knowledge we need to provide is the rules, the legal moves, and the
conditions for winning or losing the game.
Both players try to win the game, so both try to make the best possible move at each turn.
Searching techniques like BFS (Breadth-First Search) are not practical here because the
branching factor is very high, so searching would take a lot of time. We therefore need other
search procedures.
6. Minimax search
The most common search technique in game playing is the Minimax search procedure. It is a
depth-first, depth-limited search procedure, used for games like chess and tic-tac-toe.
MOVEGEN: generates all the moves possible from the current position.
STATICEVALUATION: returns a value representing the goodness of a position from the
viewpoint of the two players.
Minimax applies to two-player games, so we call the first player PLAYER1 and the second
PLAYER2. The value of each node is backed up from its children: for PLAYER1 the
backed-up value is the maximum of its children's values, and for PLAYER2 it is the
minimum. The algorithm provides the most promising move to PLAYER1, assuming that
PLAYER2 also makes its best move. It is a recursive algorithm, as the same procedure
occurs at each level.
Figure 2.9: Before backing-up of values
We assume that PLAYER1 will start the game. Four levels are generated. The values of
nodes H, I, J, K, L, M, N, O are provided by the STATICEVALUATION function. Level 3 is
a maximizing level, so each node at level 3 takes the maximum value of its children. Level 2
is a minimizing level, so each of its nodes takes the minimum value of its children. This
process continues upward. The value of A is 23, which means A should choose move C to win.
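The backed-up-value computation can be sketched as follows, with MOVEGEN and STATICEVALUATION supplied as functions. The toy game tree and its leaf values are hypothetical, not the ones of Figure 2.9.

```python
def minimax(node, depth, maximizing, movegen, static_evaluation):
    # Depth-limited, depth-first search: back up the maximum of the
    # children's values on maximizing levels, the minimum on minimizing ones.
    moves = movegen(node)
    if depth == 0 or not moves:
        return static_evaluation(node)
    values = [minimax(m, depth - 1, not maximizing, movegen, static_evaluation)
              for m in moves]
    return max(values) if maximizing else min(values)

# Hypothetical two-level game tree: internal nodes list their children,
# leaves carry static-evaluation scores.
tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"],
        "D": 3, "E": 5, "F": 2, "G": 9}
movegen = lambda n: tree[n] if isinstance(tree[n], list) else []
static_evaluation = lambda n: tree[n]
print(minimax("A", 2, True, movegen, static_evaluation))  # MAX(MIN(3,5), MIN(2,9)) = 3
```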
The word 'pruning' means cutting off branches and leaves. In data science, pruning is a
widely used term referring to pre- and post-pruning in decision trees and random forests.
Alpha-beta pruning is simply the pruning of useless branches in a game tree. The alpha-beta
pruning algorithm was discovered independently by several researchers in the mid-1900s.
Minimax is a classic depth-first search technique for a sequential two-player game. The two
players are called MAX and MIN. The minimax algorithm is designed for finding the optimal
move for MAX, the player at the root node. The search tree is created by recursively
expanding all nodes from the root in a depth-first manner until either the end of the game or
the maximum search depth is reached. Let us explore this algorithm in detail.
As already mentioned, there are two players in the game, viz. Max and Min. Max plays
first. Max's task is to maximise its reward, while Min's task is to minimise Max's reward,
increasing its own reward at the same time. Let's say Max can take actions a, b, or c.
Which one of them will give Max the best reward when the game ends? To answer this
question, we need to explore the game tree to a sufficient depth and assume that Min plays
optimally to minimise the reward of Max.
Here is an example. Four coins are in a row and each player can pick up one coin or two
coins on his/her turn. The player who picks up the last coin wins. Assuming that Max plays
first, what move should Max make to win?
If Max picks two coins, then only two coins remain and Min can pick two coins and win.
Thus picking up 1 coin shall maximise Max’s reward.
As you might have noticed, the nodes of the tree in the figure below have values inscribed
on them; these are called minimax values. The minimax value of a node is the utility of the
node if it is a terminal node.
If the node is a non-terminal Max node, the minimax value of the node is the maximum of the
minimax values of all of the node’s successors. On the other hand, if the node is a non-
terminal Min node, the minimax value of the node is the minimum of the minimax values of
all of the node’s successors.
Now we will discuss the idea behind alpha-beta pruning. Applying alpha-beta pruning to the
standard minimax algorithm yields exactly the same decision as the standard algorithm, but
it prunes (cuts down) the branches that are irrelevant, i.e. those that cannot affect the final
decision made by the algorithm. This avoids needless work in the interpretation of complex
trees.
Now let us discuss the intuition behind this technique. Let us try to find minimax decision in
the below tree :
Figure 2.12: After backing-up of values
In this case,
Minimax Decision = MAX {MIN {3, 5, 10}, MIN {2, a, b}, MIN {2, 7, 3}}
Here you may wonder how we can find the maximum when some values are missing. The
answer: in the second term, the minimum is some value c = MIN(2, a, b), which must satisfy
c ≤ 2. So when we choose the maximum of 3, c, and 2, the result is 3 regardless of a and b.
We have reached a decision without ever looking at those nodes, and this is where alpha-beta
pruning comes into play.
Alpha: Alpha is the best choice, i.e. the highest value, that we have found at any
instance along the path of the Maximizer. The initial value of alpha is −∞.
Beta: Beta is the best choice, i.e. the lowest value, that we have found at any instance
along the path of the Minimizer. The initial value of beta is +∞.
Each node keeps track of its alpha and beta values. Alpha can be updated only on
MAX's turn and, similarly, beta can be updated only on MIN's turn: MAX updates only
alpha values and MIN updates only beta values.
While backing up the tree, node values (not alpha and beta values) are passed to parent
nodes; alpha and beta values are passed down to child nodes only.
1. We start with the initial move, defining alpha and beta with their worst-case values:
α = −∞ and β = +∞. We prune a node only when alpha becomes greater than or equal
to beta.
2. Since the initial value of alpha is less than beta, nothing is pruned yet. Now it is MAX's
turn, so at node D the value of alpha is calculated: max(2, 3) = 3. So the value of alpha at
node D is 3.
3. The next move is at node B, and it is MIN's turn. At node B the value of beta becomes
min(3, +∞) = 3, so at node B, alpha = −∞ and beta = 3.
Figure 2.14: Max-Max
In the next step, the algorithm traverses the next successor of node B, which is node E, and
the values α = −∞ and β = 3 are passed down.
4. Now it is MAX's turn, so at node E we look for the maximum. The current value of alpha
at E is −∞, and comparing it with 5 gives MAX(−∞, 5) = 5. So at node E, alpha = 5 and
beta = 3. Alpha is now greater than or equal to beta, which satisfies the pruning condition,
so the right successor of node E is pruned (never traversed) and the value at node E will
be 5.
Figure 2.15: Max-Max
6. In the next step the algorithm returns to node A from node B. At node A, alpha is updated
to MAX(−∞, 3) = 3, so the values of alpha and beta at node A become 3 and +∞
respectively, and these values are passed down to node C and then to node F.
7. At node F, alpha (3) is first compared with the left child, 0: MAX(3, 0) = 3; then with the
right child, 1: MAX(3, 1) = 3. Alpha remains 3, but the node value of F becomes 1.
Figure 2.16: Max-Max
8. Node F returns its node value 1 to C, where it is compared with the beta value. It is now
MIN's turn, so MIN(+∞, 1) = 1. At node C, α = 3 and β = 1, and alpha is greater than
beta, which again satisfies the pruning condition. So the next successor of node C, i.e. G,
is pruned and the algorithm never computes the entire subtree under G.
The tree above is the final tree, showing which nodes were computed and which were not.
For this example the optimal value for the maximizer is 3.
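The steps above can be sketched as follows. MAX and MIN levels alternate down the tree; the values of the leaves that end up pruned (the 9 under E and the pair under G) are hypothetical stand-ins, since the algorithm never examines them.

```python
def alphabeta(node, maximizing, alpha, beta):
    if isinstance(node, int):          # leaf: static-evaluation value
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:          # pruning condition: alpha >= beta
                break                  # skip the remaining children
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)
            if alpha >= beta:
                break
        return value

# Tree from the worked example: A (MAX) -> B, C (MIN) -> D, E, F, G (MAX) -> leaves.
tree = [[[2, 3], [5, 9]], [[0, 1], [7, 5]]]
print(alphabeta(tree, True, float("-inf"), float("inf")))  # optimal value: 3
```

Running it reproduces the result of the walkthrough: the right child of E and the whole subtree under G are cut off, and the maximizer's optimal value is 3.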
The effectiveness of alpha-beta pruning depends on the order in which nodes are examined;
move ordering therefore plays an important role.
1. Worst ordering: In some cases alpha-beta pruning prunes none of the nodes and behaves
like the standard minimax algorithm, while still paying the overhead of maintaining the
alpha and beta factors, so no time is saved. This is called worst ordering. It occurs when
the best move lies on the right side of the tree.
2. Ideal ordering: In other cases many nodes are pruned. This is called ideal ordering. It
occurs when the best move lies on the left side of the tree. Because we apply DFS, the
left side of the tree is searched first, and with ideal ordering alpha-beta can search twice
as deep as minimax in the same amount of time.
Rules to find a good ordering
Nodes should be ordered so that the best candidates are examined first.
• a set of variables,
• a domain for each variable, and
• a set of constraints.
The aim is to choose a value for each variable so that the resulting possible world satisfies the
constraints; we want a model of the constraints.
A finite CSP has a finite set of variables and a finite domain for each variable. Many of the
methods considered in this chapter only work for finite CSPs, although some are designed for
infinite, even continuous, domains.
The multidimensional aspect of these problems, where each variable can be seen as a separate
dimension, makes them difficult to solve but also provides structure that can be exploited.
• Find a model.
• Find the best model, given a measure of how good models are; see Section 4.10.
CSPs are very common, so it is worth trying to find relatively efficient ways to solve them.
Determining whether there is a model for a CSP with finite domains is NP-hard (see box) and
no known algorithms exist to solve such problems that do not use exponential time in the
worst case. However, just because a problem is NP-hard does not mean that all instances are
difficult to solve. Many instances have structure that can be exploited.
CSP solvers can be faster than state-space searchers because a CSP solver can quickly
eliminate large swaths of the search space: once we find that a partial assignment is not a
solution, we can immediately discard all further refinements of that partial assignment.
Consider a small part of car assembly, consisting of 15 tasks: install the axles (front and
back), affix all four wheels (right and left, front and back), tighten the nuts for each wheel,
affix the hubcaps, and inspect the final assembly. Represent the tasks with 15 variables:
X = {AxleF, AxleB, WheelRF, WheelLF, WheelRB, WheelLB, NutsRF, NutsLF, NutsRB, NutsLB, CapRF,
CapLF, CapRB, CapLB, Inspect}. The value of each variable is the time at which the task starts.
We assert that the inspection comes last and takes 3 minutes; for every variable X except
Inspect we add a constraint of the form X + dX ≤ Inspect.
There is also a requirement to get the whole assembly done in 30 minutes; we can achieve
that by limiting the domain of every variable:
Di = {1, 2, 3, …, 27}.
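A minimal sketch of checking one candidate assignment against these constraints. Only a few of the 15 tasks appear, and the task durations (other than the stated 3-minute inspection) are illustrative assumptions, not from the text.

```python
# Durations d_X in minutes; only Inspect's 3 minutes comes from the text.
durations = {"AxleF": 10, "WheelRF": 1, "NutsRF": 2, "CapRF": 1, "Inspect": 3}

def satisfies(assignment):
    # Precedence: every task X must satisfy X + d_X <= Inspect.
    inspect = assignment["Inspect"]
    for task, start in assignment.items():
        if task != "Inspect" and start + durations[task] > inspect:
            return False
    # Deadline: the whole assembly, including inspection, fits in 30 minutes.
    return inspect + durations["Inspect"] <= 30

print(satisfies({"AxleF": 1, "WheelRF": 11, "NutsRF": 12,
                 "CapRF": 14, "Inspect": 27}))   # True
print(satisfies({"AxleF": 20, "WheelRF": 11, "NutsRF": 12,
                 "CapRF": 14, "Inspect": 27}))   # False: 20 + 10 > 27
```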
The simplest kind of CSP involves variables that have discrete, finite domains. E.g. Map-
coloring problems, scheduling with time limits, the 8-queens problem.
A discrete domain can be infinite. e.g. The set of integers or strings. With infinite domains, to
describe constraints, a constraint language must be used instead of enumerating all allowed
combinations of values.
CSP with continuous domains are common in the real world and are widely studied in the
field of operations research.
The simplest type is the unary constraint, which restricts the value of a single variable.
A binary constraint relates two variables. (e.g. SA≠NSW.) A binary CSP is one with only
binary constraints, can be represented as a constraint graph.
Constraint hypergraph: consists of ordinary nodes (circles in the figure) and hypernodes (the
squares), which represent n-ary constraints.
Two ways to transform an n-ary CSP to a binary one:
a. Every finite domain constraint can be reduced to a set of binary constraints if enough
auxiliary variables are introduced, so we could transform any CSP into one with only binary
constraints.
b. The dual-graph transformation: create a new graph in which there will be one variable for
each constraint in the original graph, and one binary constraint for each pair of constraints in
the original graph that share variables.
e.g. If the original graph has variable {X,Y,Z} and constraints <(X,Y,Z),C1> and
<(X,Y),C2>, then the dual graph would have variables {C1,C2} with the binary constraint
<(X,Y),R1>, where (X,Y) are the shared variables and R1 is a new relation that defines the
constraint between the shared variables.
We might prefer a global constraint (such as Alldiff) rather than a set of binary constraints
for two reasons:
1) it is easier and less error-prone to write the problem description using a global
constraint; and
2) it is possible to design special-purpose inference algorithms for global constraints that
are not available for a set of more primitive constraints.
A number of inference techniques use the constraints to infer which variable/value pairs are
consistent and which are not. These include node, arc, path, and k-consistency.
constraint propagation: Using the constraints to reduce the number of legal values for a
variable, which in turn can reduce the legal values for another variable, and so on.
local consistency: If we treat each variable as a node in a graph and each binary constraint as
an arc, then the process of enforcing local consistency in each part of the graph causes
inconsistent values to be eliminated throughout the graph.
Node consistency
A single variable (a node in the CSP network) is node-consistent if all the values in the
variable’s domain satisfy the variable’s unary constraint.
Arc consistency
A variable in a CSP is arc-consistent if every value in its domain satisfies the variable’s
binary constraints.
Xi is arc-consistent with respect to another variable Xj if for every value in the current domain
Di there is some value in the domain Dj that satisfies the binary constraint on the arc (Xi, Xj).
Arc consistency tightens down the domains (unary constraint) using the arcs (binary
constraints).
AC-3 algorithm:
AC-3 maintains a queue of arcs, which initially contains all the arcs in the CSP.
AC-3 then pops an arbitrary arc (Xi, Xj) off the queue and makes Xi arc-consistent with
respect to Xj.
If this revises Di, then all arcs (Xk, Xi), where Xk is a neighbor of Xi, are added to the queue.
If Di is revised down to nothing, then the whole CSP has no consistent solution, and AC-3
returns failure;
otherwise, it keeps checking, trying to remove values from the domains of variables, until no
more arcs are in the queue.
The result is an arc-consistent CSP that has the same solutions as the original one but has
smaller domains.
The complexity of AC-3:
Assume a CSP with n variables, each with domain size at most d, and with c binary
constraints (arcs). Checking the consistency of one arc can be done in O(d²) time, so the
total worst-case time is O(cd³).
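A sketch of AC-3, with binary constraints given as predicates on ordered pairs of variables. The X < Y instance at the end is an illustration, not from the text.

```python
from collections import deque

def revise(domains, constraints, xi, xj):
    # Delete values of Xi that have no supporting value in Dj.
    pred = constraints[(xi, xj)]
    revised = False
    for v in list(domains[xi]):
        if not any(pred(v, w) for w in domains[xj]):
            domains[xi].remove(v)
            revised = True
    return revised

def ac3(domains, constraints):
    queue = deque(constraints)                  # initially all arcs
    while queue:
        xi, xj = queue.popleft()
        if revise(domains, constraints, xi, xj):
            if not domains[xi]:
                return False                    # no consistent solution
            # Re-check every arc pointing at Xi (except the one from Xj).
            for (xk, xm) in constraints:
                if xm == xi and xk != xj:
                    queue.append((xk, xi))
    return True

# Example: X < Y over domains {1, 2, 3}.
domains = {"X": {1, 2, 3}, "Y": {1, 2, 3}}
constraints = {("X", "Y"): lambda a, b: a < b,
               ("Y", "X"): lambda a, b: a > b}
ac3(domains, constraints)
print(domains)   # X loses 3, Y loses 1
```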
Path consistency
Path consistency: A two-variable set {Xi, Xj} is path-consistent with respect to a third
variable Xm if, for every assignment {Xi = a, Xj = b} consistent with the constraint on {Xi,
Xj}, there is an assignment to Xm that satisfies the constraints on {Xi, Xm} and {Xm, Xj}.
Path consistency tightens the binary constraints by using implicit constraints that are inferred
by looking at triples of variables.
K-consistency
K-consistency: A CSP is k-consistent if, for any set of k-1 variables and for any consistent
assignment to those variables, a consistent value can always be assigned to any kth variable.
If we take a CSP with n nodes and make it strongly n-consistent, we are guaranteed to find a
solution in time O(n²d). But any algorithm for establishing n-consistency must take time
exponential in n in the worst case, and also requires space exponential in n.
Global constraints
A global constraint is one involving an arbitrary number of variables (but not necessarily all
variables). Global constraints can be handled by special-purpose algorithms that are more
efficient than general-purpose methods.
A simple algorithm: first remove any variable in the constraint that has a singleton domain,
and delete that variable's value from the domains of the remaining variables. Repeat as long
as there are singleton variables. If at any point an empty domain is produced, or there are
more variables than domain values left, then an inconsistency has been detected.
e.g.
Atmost(10, P1, P2, P3, P4): no more than 10 personnel are assigned in total.
If each variable has the domain {3, 4, 5, 6}, the Atmost constraint cannot be satisfied.
We can enforce consistency by deleting the maximum value of any domain if it is not
consistent with the minimum values of the other domains.
e.g. If each variable in the example has the domain {2, 3, 4, 5, 6}, the values 5 and 6 can be
deleted from each domain.
For large resource-limited problems with integer values, domains are represented by upper
and lower bounds and are managed by bounds propagation.
e.g.
suppose there are two flights, F1 and F2, in an airline-scheduling problem, for which the
planes have capacities 165 and 385, respectively. The initial domains for the numbers of
passengers on each flight are then
D1 = [0, 165] and D2 = [0, 385].
Now suppose we have the additional constraint that the two flights together must carry 420
people: F1 + F2 = 420. Propagating bounds constraints, we reduce the domains to
D1 = [35, 165] and D2 = [255, 385].
A CSP is bounds consistent if for every variable X, and for both the lower-bound and upper-
bound values of X, there exists some value of Y that satisfies the constraint between X and Y
for every variable Y.
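For the flight example, bounds propagation on F1 + F2 = 420 can be sketched as follows; the loop repeatedly re-tightens each variable's bounds from the other variable's bounds until nothing changes.

```python
# Capacities give the initial bounds; the constraint is F1 + F2 = 420.
lo = {"F1": 0, "F2": 0}
hi = {"F1": 165, "F2": 385}
TOTAL = 420

changed = True
while changed:
    changed = False
    for x, y in (("F1", "F2"), ("F2", "F1")):
        new_lo = max(lo[x], TOTAL - hi[y])   # x >= 420 - max(y)
        new_hi = min(hi[x], TOTAL - lo[y])   # x <= 420 - min(y)
        if (new_lo, new_hi) != (lo[x], hi[x]):
            lo[x], hi[x] = new_lo, new_hi
            changed = True

print(lo, hi)   # F1 in [35, 165], F2 in [255, 385]
```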
Sudoku
A Sudoku puzzle can be considered a CSP with 81 variables, one for each square. We use the
variable names A1 through A9 for the top row (left to right), down to I1 through I9 for the
bottom row. The empty squares have the domain {1, 2, 3, 4, 5, 6, 7, 8, 9} and the pre-filled
squares have a domain consisting of a single value.
There are 27 different Alldiff constraints: one for each row, column, and box of 9 squares:
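The variable and constraint structure can be sketched as follows; this builds only the scheme (81 variables, 27 Alldiff groups), leaving out the machinery that enforces the constraints.

```python
# 81 variables A1..I9; 27 Alldiff groups: 9 rows, 9 columns, 9 boxes.
ROWS, COLS = "ABCDEFGHI", "123456789"
squares = [r + c for r in ROWS for c in COLS]
alldiff_groups = (
    [[r + c for c in COLS] for r in ROWS] +                  # 9 rows
    [[r + c for r in ROWS] for c in COLS] +                  # 9 columns
    [[r + c for r in rs for c in cs]                         # 9 boxes
     for rs in ("ABC", "DEF", "GHI") for cs in ("123", "456", "789")])

print(len(squares), len(alldiff_groups))   # 81 27
```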
Backtracking search, a form of depth-first search, is commonly used for solving CSPs.
Inference can be interwoven with search.
Backtracking search: A depth-first search that chooses values for one variable at a time and
backtracks when a variable has no legal values left to assign.
The backtracking algorithm repeatedly chooses an unassigned variable and then tries all
values in the domain of that variable in turn, trying to extend the assignment to a solution. If
an inconsistency is detected, BACKTRACK returns failure, causing the previous call to try
another value.
1) Which variable should be assigned next, and in what order should its values be tried?
2) What inferences should be performed at each step of the search?
3) When the search arrives at an assignment that violates a constraint, can the search avoid
repeating this failure?
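The backtracking scheme can be sketched as follows; the three-region map-coloring instance used to exercise it is a made-up example.

```python
def backtrack(assignment, variables, domains, consistent):
    # Depth-first: choose values for one variable at a time and backtrack
    # when a variable has no legal values left to assign.
    if len(assignment) == len(variables):
        return dict(assignment)
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        assignment[var] = value
        if consistent(assignment):
            result = backtrack(assignment, variables, domains, consistent)
            if result is not None:
                return result
        del assignment[var]
    return None                      # failure: caller tries another value

# Made-up instance: three mutually adjacent regions, three colors.
variables = ["WA", "NT", "SA"]
domains = {v: ["red", "green", "blue"] for v in variables}
adjacent = [("WA", "NT"), ("WA", "SA"), ("NT", "SA")]
consistent = lambda a: all(a[x] != a[y] for x, y in adjacent
                           if x in a and y in a)
print(backtrack({}, variables, domains, consistent))
```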
SELECT-UNASSIGNED-VARIABLE
Variable selection—fail-first
Minimum-remaining-values (MRV) heuristic: the idea of choosing the variable with the
fewest "legal" values. Also known as the "most constrained variable" or "fail-first" heuristic,
it picks the variable that is most likely to cause a failure soon, thereby pruning the search
tree. If some variable X has no legal values left, the MRV heuristic will select X and failure
will be detected immediately, avoiding pointless searches through other variables.
E.g. After the assignment for WA=red and NT=green, there is only one possible value for
SA, so it makes sense to assign SA=blue next rather than assigning Q.
[Powerful guide]
Degree heuristic: The degree heuristic attempts to reduce the branching factor on future
choices by selecting the variable that is involved in the largest number of constraints on other
unassigned variables. [useful tie-breaker]
e.g. SA is the variable with highest degree 5; the other variables have degree 2 or 3; T has
degree 0.
ORDER-DOMAIN-VALUES
Value selection—fail-last
If we are trying to find all the solutions to a problem (not just the first one), then the ordering
does not matter.
Least-constraining-value heuristic: prefer the value that rules out the fewest choices for the
neighboring variables in the constraint graph. (Try to leave the maximum flexibility for
subsequent variable assignments.)
e.g. We have generated the partial assignment with WA=red and NT=green and that our next
choice is for Q. Blue would be a bad choice because it eliminates the last legal value left for
Q’s neighbor, SA, therefore prefers red to blue.
2. Interleaving search and inference
Advantage: For many problems the search will be more effective if we combine the MRV
heuristic with forward checking.
Disadvantage: Forward checking only makes the current variable arc-consistent; it doesn't
look ahead and make all the other variables arc-consistent.
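Forward checking itself can be sketched as follows: after each assignment it deletes inconsistent values from the neighbors' domains, and the record of what it pruned is what lets it supply conflict sets with no extra work. The map-coloring constraint used here is an illustration.

```python
def forward_check(var, value, domains, constraints):
    # After assigning var = value, delete values from neighbors' domains
    # that are inconsistent with it. Return the pruned (variable, value)
    # pairs so they can be restored on backtracking.
    pruned = []
    for (x, y), pred in constraints.items():
        if x == var:
            for w in list(domains[y]):
                if not pred(value, w):
                    domains[y].remove(w)
                    pruned.append((y, w))
    return pruned

# Illustration: coloring constraint WA != NT.
domains = {"NT": {"red", "green", "blue"}}
constraints = {("WA", "NT"): lambda a, b: a != b}
pruned = forward_check("WA", "red", domains, constraints)
print(domains["NT"], pruned)   # NT's domain loses "red"
```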
Intelligent backtracking
e.g.
Suppose we have generated the partial assignment {Q=red, NSW=green, V=blue, T=red}.
When we try the next variable SA, we see every value violates a constraint.
Intelligent backtracking: Backtrack to a variable that was responsible for making one of the
possible values of the next variable (e.g. SA) impossible.
Conflict set for a variable: A set of assignments that are in conflict with some value for that
variable.
(e.g. The set {Q=red, NSW=green, V=blue} is the conflict set for SA.)
(e.g. backjumping would jump over T and try a new value for V.)
Forward checking can supply the conflict set with no extra work.
Whenever forward checking based on an assignment X=x deletes a value from Y’s domain,
add X=x to Y’s conflict set;
If the last value is deleted from Y’s domain, the assignment in the conflict set of Y are added
to the conflict set of X.
Conflict-directed backjumping:
e.g.
We try T=red next and then assign NT, Q, V, SA, no assignment can work for these last 4
variables.
Eventually we run out of values to try at NT, but simple backjumping cannot help, because
NT does not have a complete conflict set of preceding variables that caused it to fail.
The set {WA, NSW} is a deeper notion of conflict set for NT: it is what caused NT, together
with any subsequent variables, to have no consistent solution. So the algorithm should
backtrack to NSW and skip over T.
A backjumping algorithm that uses conflict sets defined in this way is called conflict-directed
backjumping.
How to compute it:
When a variable's domain becomes empty, a "terminal" failure occurs, and that variable has
a standard conflict set.
Let Xj be the current variable and conf(Xj) its conflict set. If every possible value for Xj fails,
backjump to the most recent variable Xi in conf(Xj), and set
conf(Xi) ← conf(Xi) ∪ conf(Xj) − {Xi}.
The conflict set for a variable means: there is no solution from that variable onward, given
the preceding assignment to the conflict set.
e.g.
assign WA, NSW, T, NT, Q, V, SA.
SA fails, and its conflict set is {WA, NT, Q}. (standard conflict set)
After backjumping from a contradiction, how to avoid running into the same problem again:
Constraint learning: The idea of finding a minimum set of variables from the conflict set that
causes the problem. This set of variables, along with their corresponding values, is called
a no-good. We then record the no-good, either by adding a new constraint to the CSP or by
keeping a separate cache of no-goods.
Exercise
Module-3
Knowledge and Reasoning
A knowledge base is required so that an agent can update its knowledge, learn from
experience, and take action according to that knowledge.
2. LOGIC
Logic-based approaches to AI differ, at least on the surface, from existing forms of classical
machine learning and deep learning. It is crucial to keep in mind that, just as there are many
forms of machine learning, there are many different forms of logic-based approaches to AI,
each with its own set of tradeoffs.
2.1 Propositional logic (PL) is the simplest form of logic, where all statements are made of
propositions. A proposition is a declarative statement that is either true or false. PL is a
technique for representing knowledge in logical and mathematical form.
Example:
a) It is Sunday.
b) The Sun rises from the West. (False proposition)
c) 3 + 3 = 7. (False proposition)
d) 5 is a prime number.
The syntax of propositional logic defines the allowable sentences for the knowledge
representation. There are two types of Propositions:
a. Atomic Propositions
b. Compound propositions
a) "2 + 2 is 4" is an atomic proposition, as it is a true fact.
b) "The Sun is cold" is also an atomic proposition, as it is a false fact.
o Compound proposition: Compound propositions are constructed by combining
simpler or atomic propositions, using parenthesis and logical connectives.
Example:
a) "It is raining today, and the street is wet."
b) "Ankit is a doctor, and his clinic is in Mumbai."
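A compound proposition's truth value is determined by the truth values of its atomic parts. A minimal sketch for "It is raining today, and the street is wet" as P ∧ Q, with P and Q as the atomic propositions:

```python
from itertools import product

# P: "It is raining today."   Q: "The street is wet."
for P, Q in product([True, False], repeat=2):
    print(P, Q, "->", P and Q)   # P ∧ Q holds only when both atoms are true
```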
3. Horn Clauses
The definite clause language does not allow a contradiction to be stated. However, a simple
expansion of the language can allow proof by contradiction.
An integrity constraint is a clause of the form
false←a1∧...∧ak.
where the ai are atoms and false is a special atom that is false in all interpretations.
A Horn clause is either a definite clause or an integrity constraint. That is, a Horn clause has
either false or a normal atom as its head.
Integrity constraints allow the system to prove that some conjunction of atoms is false in all
models of a knowledge base - that is, to prove disjunctions of negations of
atoms. Recall that ¬p is the negation of p, which is true in an interpretation when p is false in
that interpretation, and p∨q is the disjunction of p and q, which is true in an interpretation
if p is true or q is true or both are true in the interpretation. The integrity
constraint false←a1∧...∧ak is logically equivalent to ¬a1∨...∨¬ak.
A Horn clause knowledge base can imply negations of atoms, as shown in Example 5.16.
Example 5.16: Consider the knowledge base KB1:
false←a∧b.
a←c.
b←c.
The atom c is false in all models of KB1. For suppose c were true in some model I of KB1;
then a and b would both be true in I (otherwise I would not be a model of KB1).
Because false is false in I while a and b are true in I, the first clause would be false in I, a
contradiction to I being a model of KB1. Thus, c is false in all models of KB1. This is
expressed as
KB1 ⊧ ¬c
which means that ¬c is true in all models of KB1, and so c is false in all models of KB1.
Although the language of Horn clauses does not allow disjunctions and negations to be
input, disjunctions of negations of atoms can be derived.
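Example 5.16 can be checked mechanically by enumerating all interpretations of {a, b, c} and keeping those that satisfy the three clauses (a sketch; each clause `h ← body` is read as the implication body ⇒ h):

```python
from itertools import product

# KB1:  false <- a ∧ b,   a <- c,   b <- c
def is_model(a, b, c):
    return (not (a and b)      # false <- a ∧ b: a and b cannot both hold
            and (not c or a)   # a <- c: if c then a
            and (not c or b))  # b <- c: if c then b

models = [(a, b, c) for a, b, c in product([True, False], repeat=3)
          if is_model(a, b, c)]
# c is false in every model, i.e. KB1 ⊧ ¬c.
print(models)
```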
In the topic of propositional logic, we have seen how to represent statements using
propositional logic. Unfortunately, in propositional logic we can only represent facts that are
either true or false. PL is not sufficient to represent complex sentences or natural-language
statements; it has very limited expressive power. Consider a sentence that quantifies over
objects: we cannot represent it using PL.
To represent such statements, PL is not sufficient, so we require a more powerful logic, such
as first-order logic.
a. Syntax
b. Semantics
The syntax of FOL determines which collections of symbols are logical expressions in first-
order logic. The basic syntactic elements of first-order logic are symbols. We write
statements in short-hand notation in FOL.
Variables: x, y, z, a, b, ...
Connectives: ∧, ∨, ¬, ⇒, ⇔
Equality: ==
Quantifiers: ∀, ∃
Atomic sentences:
o Atomic sentences are the most basic sentences of first-order logic. These sentences
are formed from a predicate symbol followed by a parenthesis with a sequence of
terms.
o We can represent atomic sentences as Predicate (term1, term2, ......, term n).
Complex Sentences:
o Complex sentences are made by combining atomic sentences using connectives.
Consider the statement "x is an integer." It consists of two parts: the first part, x, is the
subject of the statement, and the second part, "is an integer," is known as the predicate.
Universal Quantifier:
Universal quantifier is a symbol of logical representation, which specifies that the statement
within its range is true for everything or every instance of a particular thing.
o For all x
o For each x
o For every x.
Example:
Example: Let x be a variable ranging over a universe of discourse (UOD). The statement
"All men drink coffee" is then written with the universal quantifier and read as: for all x, if x
is a man, then x drinks coffee.
Existential Quantifier:
Existential quantifiers are the type of quantifiers, which express that the statement within its
scope is true for at least one instance of something.
It is denoted by the logical operator ∃, which resembles an inverted E. When it is used with a
predicate variable, it is called an existential quantifier.
Note: With the existential quantifier we always use the AND (conjunction) symbol (∧).
If x is a variable, then the existential quantifier is written ∃x or ∃(x), and it is read as:
Example:
It will be read as: There are some x where x is a boy who is intelligent.
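Over a finite universe of discourse, the two quantifiers correspond to Python's all and any. The domain and predicates below are made-up illustrations.

```python
# A made-up finite universe of discourse.
domain = ["Ram", "Shyam", "Ravi"]
boy = set(domain)               # predicate: everyone here is a boy
intelligent = {"Ravi"}          # predicate: who is intelligent

# ∃x boy(x) ∧ intelligent(x): "Some boys are intelligent" (∃ uses ∧).
exists = any(x in boy and x in intelligent for x in domain)
# ∀x boy(x) ⇒ intelligent(x): with ∀ we use implication, not conjunction.
forall = all((x not in boy) or (x in intelligent) for x in domain)
print(exists, forall)   # True False
```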
Inference in First-Order Logic is used to deduce new facts or sentences from existing
sentences. Before understanding the FOL inference rule, let's understand some basic
terminologies used in FOL.
Substitution:
Equality:
First-order logic does not only use predicates and terms for making atomic sentences; it also
supports equality. We can use the equality symbol to specify that two terms refer to the
same object.
As in the above example, the object referred by the Brother (John) is similar to the object
referred by Smith. The equality symbol can also be used with negation to represent that two
terms are not the same objects.
As in propositional logic, we also have inference rules in first-order logic. Following are
some basic inference rules in FOL:
1. Universal Generalization
2. Universal Instantiation
3. Existential Instantiation
4. Existential introduction
1. Universal Generalization:
o Universal generalization is a valid inference rule which states that if premise P(c) is
true for any arbitrary element c in the universe of discourse, then we can have a
conclusion as ∀ x P(x).
Example: Let's represent, P(c): "A byte contains 8 bits", so for ∀ x P(x) "All bytes contain 8
bits.", it will also be true.
2. Universal Instantiation:
o Universal instantiation, also called universal elimination or UI, is a valid inference
rule. It can be applied multiple times to add new sentences.
o The new KB is logically equivalent to the previous KB.
o As per UI, we can infer any sentence obtained by substituting a ground term for the
variable.
o The UI rule states that we can infer any sentence P(c) by substituting a ground term c
(a constant within the domain of x) into ∀ x P(x), for any object in the universe of discourse.
Example:
"All kings who are greedy are evil." So let our knowledge base contain this detail in the
form of FOL:
∀x King(x) ∧ Greedy(x) → Evil(x)
So from this information, we can infer any of the following statements using Universal
Instantiation:
King(John) ∧ Greedy(John) → Evil(John)
King(Richard) ∧ Greedy(Richard) → Evil(Richard)
King(Father(John)) ∧ Greedy(Father(John)) → Evil(Father(John))
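As a sketch, Universal Instantiation on the kings rule can be mimicked in code by checking the ground premises for one constant at a time. The fact set and helper function below are hypothetical illustrations:

```python
# Universal Instantiation sketch: from ∀x King(x) ∧ Greedy(x) ⇒ Evil(x),
# infer the ground conclusion for a chosen constant.
# The fact tuples below are assumptions based on the kings example.
facts = {("King", "John"), ("Greedy", "John"), ("King", "Richard")}

def instantiate(constant):
    """Ground instance of the rule for one constant: returns Evil(c) if premises hold."""
    if ("King", constant) in facts and ("Greedy", constant) in facts:
        return ("Evil", constant)
    return None

print(instantiate("John"))     # ('Evil', 'John')
print(instantiate("Richard"))  # None: Greedy(Richard) is not in the KB
```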
3. Existential Instantiation:
Existential instantiation (also called existential elimination) states that from ∃x P(x) we can
infer P(c) for a new constant symbol c (a Skolem constant) that does not appear anywhere else
in the knowledge base.
Example:
From ∃x Crown(x) ∧ OnHead(x, John), we can infer Crown(K) ∧ OnHead(K, John), where K
is a new constant symbol.
4. Existential introduction:
Existential introduction (also called existential generalization) states that from P(c), for some
element c, we can infer ∃x P(x). For example, from "Priyanka got good marks in English" we
can infer "Someone got good marks in English."
Propositional Logic vs. First-Order Logic
Logic is a technique that represents knowledge in logical and mathematical form. There are
two main types: Propositional Logic (PL) and First-Order Logic (FOL).
Since propositional logic works on 0 and 1, it is also known as 'Boolean Logic'. In this type of
logic, symbolic variables are used to represent propositions, and any complete statement is
treated as a single unit that is either true or false.
FOL articulates natural language statements concisely. Another name for First-Order Logic is
'Predicate Logic'.
Facts about First Order Logic
FOL is known as a powerful language which is used to develop information about objects
and to express relationships between those objects.
Unlike PL, FOL assumes that the world contains objects, relations, and
functions.
FOL has two main key features or parts: 'Syntax' & 'Semantics'.
Propositional Logic converts a complete sentence into a single symbol and makes it logical,
whereas in First-Order Logic the relations within a particular sentence are made explicit,
involving objects, relations, and functions. PL can easily represent an individual statement,
but it does not signify or express generalization, specialization, or patterns: for example,
'QUANTIFIERS' cannot be used in PL, but in FOL users can easily use quantifiers to express
such statements.
What is Unification?
Unification is the process of making two different logical atomic expressions identical by
finding a suitable substitution. For example, for the atoms P(x) and P(John), the substitution
θ = {John/x} is a unifier: after applying this substitution, both expressions become identical.
o The UNIFY algorithm is used for unification, which takes two atomic sentences and
returns a unifier for those sentences (If any exist).
o Unification is a key component of all first-order inference algorithms.
o It returns fail if the expressions do not match with each other.
o The simplest substitution that unifies two expressions is called their Most General Unifier or MGU.
E.g. Let's say there are two different expressions, P(x, y), and P(a, f(z)).
In this example, we need to make both above statements identical to each other. For this, we
will perform the substitution.
P(x, y)......... (i)
P(a, f(z))......... (ii)
o Substitute x with a, and y with f(z) in the first expression, and it will be represented
as a/x and f(z)/y.
o With both the substitutions, the first expression will be identical to the second
expression and the substitution set will be: [a/x, f(z)/y].
o The predicate symbols must be the same; atoms or expressions with different predicate
symbols can never be unified.
o The number of arguments in both expressions must be identical.
o Unification will fail if the same variable would have to be bound to two different terms;
for example, P(x, x) cannot be unified with P(a, b).
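The substitution procedure sketched above can be written as a short recursive algorithm. The term encoding below (variables as strings prefixed with `?`, compound terms as tuples) is an assumption for illustration, and the occurs check is omitted for brevity:

```python
def is_variable(t):
    # Convention (an assumption): variables are strings starting with "?".
    return isinstance(t, str) and t.startswith("?")

def substitute(t, theta):
    # Apply the substitution theta to a term, chasing variable bindings.
    if is_variable(t) and t in theta:
        return substitute(theta[t], theta)
    if isinstance(t, tuple):
        return tuple(substitute(a, theta) for a in t)
    return t

def unify(x, y, theta=None):
    """Return a most general unifier (MGU) of x and y, or None on failure.
    The occurs check is omitted to keep the sketch short."""
    if theta is None:
        theta = {}
    x, y = substitute(x, theta), substitute(y, theta)
    if x == y:
        return theta
    if is_variable(x):
        theta[x] = y
        return theta
    if is_variable(y):
        theta[y] = x
        return theta
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        # Predicate/function symbol (first element) and arity must match.
        for a, b in zip(x, y):
            theta = unify(a, b, theta)
            if theta is None:
                return None
        return theta
    return None  # different predicate symbols or argument counts

# P(x, y) and P(a, f(z)): compound terms are tuples, e.g. ("f", "?z") for f(z).
print(unify(("P", "?x", "?y"), ("P", "a", ("f", "?z"))))
# {'?x': 'a', '?y': ('f', '?z')}  — i.e. the MGU [a/x, f(z)/y] from the text
```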
Forward Chaining and backward chaining in AI
In artificial intelligence, forward and backward chaining are important topics, but before
understanding them, let's first see where these two terms come from.
Inference engine:
The inference engine is the component of the intelligent system in artificial intelligence,
which applies logical rules to the knowledge base to infer new information from known facts.
The first inference engine was part of an expert system. An inference engine commonly
proceeds in two modes, which are:
A. Forward chaining
B. Backward chaining
A. Forward Chaining
Forward chaining is also known as forward deduction or the forward reasoning method when
using an inference engine. Forward chaining is a form of reasoning which starts with the
atomic sentences in the knowledge base and applies inference rules (Modus Ponens) in the
forward direction to extract more data until a goal is reached.
The forward-chaining algorithm starts from known facts, triggers all rules whose premises
are satisfied, and adds their conclusions to the known facts. This process repeats until the
problem is solved.
Properties of Forward-Chaining:
o It is a bottom-up, data-driven approach: reasoning moves from the known facts toward the goal.
o It derives all the facts that follow from the current knowledge base.
o It is commonly used in expert systems and production rule systems.
Consider the following famous example which we will use in both approaches:
Example:
"As per the law, it is a crime for an American to sell weapons to hostile nations. Country A,
an enemy of America, has some missiles, and all the missiles were sold to it by Robert, who
is an American citizen."
To solve the above problem, first, we will convert all the above facts into first-order definite
clauses, and then we will use a forward-chaining algorithm to reach the goal.
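Assuming the definite clauses have already been instantiated with the relevant constants (Robert, the missile T1, and country A), the forward-chaining loop itself is only a few lines. The string encodings below are illustrative assumptions:

```python
# Ground forward-chaining sketch of the crime example. The rules are ground
# instances of the definite clauses (variable matching is omitted for brevity).
facts = {"American(Robert)", "Missile(T1)", "Owns(A,T1)", "Enemy(A,America)"}

rules = [
    # Missile(x) => Weapon(x)
    (["Missile(T1)"], "Weapon(T1)"),
    # Missile(x) & Owns(A,x) => Sells(Robert,x,A)
    (["Missile(T1)", "Owns(A,T1)"], "Sells(Robert,T1,A)"),
    # Enemy(x,America) => Hostile(x)
    (["Enemy(A,America)"], "Hostile(A)"),
    # American(x) & Weapon(y) & Sells(x,y,z) & Hostile(z) => Criminal(x)
    (["American(Robert)", "Weapon(T1)", "Sells(Robert,T1,A)", "Hostile(A)"],
     "Criminal(Robert)"),
]

# Fire every rule whose premises are all known; repeat until nothing new is added.
changed = True
while changed:
    changed = False
    for premises, conclusion in rules:
        if conclusion not in facts and all(p in facts for p in premises):
            facts.add(conclusion)
            changed = True

print("Criminal(Robert)" in facts)  # True: the goal is reached
```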
B. Backward Chaining:
Backward chaining is also known as backward deduction or the backward reasoning method.
It starts from the goal, finds rules whose conclusions match the goal, and works backwards
through their premises until it reaches facts known to be true in the knowledge base.
Resolution
Resolution is a theorem-proving technique that proceeds by building refutation proofs, i.e.,
proofs by contradiction. It was invented by the mathematician John Alan Robinson in 1965.
Resolution is used when several statements are given and we need to prove a
conclusion from those statements. Unification is a key concept in proofs by resolution.
Resolution is a single inference rule which can efficiently operate on the conjunctive normal
form or clausal form.
Clause: A disjunction of literals (atomic sentences) is called a clause. A clause containing
exactly one literal is known as a unit clause.
Note: To better understand this topic, first learn FOL in AI.
The resolution rule for first-order logic is simply a lifted version of the propositional rule.
Resolution can resolve two clauses if they contain complementary literals, which are assumed
to be standardized apart so that they share no variables.
This rule is also called the binary resolution rule because it only resolves exactly two literals.
To better understand all the above steps, we will take an example in which we will apply
resolution.
Example:
a. John likes all kinds of food.
b. Apples and vegetables are food.
c. Anything anyone eats without being killed is food.
d. Anil eats peanuts and is still alive.
e. Harry eats everything that Anil eats.
Prove by resolution that:
f. John likes peanuts.
In the first step, we will convert all the given statements into first-order logic:
a. ∀x ¬ food(x) V likes(John, x)
b. food(Apple) Λ food(vegetables)
c. ∀x ∀y ¬ [eats(x, y) Λ ¬ killed(x)] V food(y)
d. eats (Anil, Peanuts) Λ alive(Anil)
e. ∀x ¬ eats(Anil, x) V eats(Harry, x)
f. ∀x ¬[¬killed(x)] V alive(x)
g. ∀x ¬ alive(x) V ¬ killed(x)
h. likes(John, Peanuts).
o Move negation (¬) inwards and rewrite:
a. ∀x ¬ food(x) V likes(John, x)
b. food(Apple) Λ food(vegetables)
c. ∀x ∀y ¬ eats(x, y) V killed(x) V food(y)
d. eats (Anil, Peanuts) Λ alive(Anil)
e. ∀x ¬ eats(Anil, x) V eats(Harry, x)
f. ∀x killed(x) V alive(x)
g. ∀x ¬ alive(x) V ¬ killed(x)
h. likes(John, Peanuts).
o Rename variables or standardize variables
a. ∀x ¬ food(x) V likes(John, x)
b. food(Apple) Λ food(vegetables)
c. ∀y ∀z ¬ eats(y, z) V killed(y) V food(z)
d. eats (Anil, Peanuts) Λ alive(Anil)
e. ∀w¬ eats(Anil, w) V eats(Harry, w)
f. ∀g killed(g) V alive(g)
g. ∀k ¬ alive(k) V ¬ killed(k)
h. likes(John, Peanuts).
o Eliminate existential quantifiers (Skolemization).
In this step, we eliminate the existential quantifier ∃; this process is known
as Skolemization. In this example problem there is no existential quantifier,
so all the statements remain the same in this step.
o Drop universal quantifiers.
In this step we drop all universal quantifiers, since every remaining variable is
implicitly universally quantified, so the quantifiers are no longer needed.
a. ¬ food(x) V likes(John, x)
b. food(Apple)
c. food(vegetables)
d. ¬ eats(y, z) V killed(y) V food(z)
e. eats (Anil, Peanuts)
f. alive(Anil)
g. ¬ eats(Anil, w) V eats(Harry, w)
h. killed(g) V alive(g)
i. ¬ alive(k) V ¬ killed(k)
j. likes(John, Peanuts).
In this step, we apply negation to the conclusion statement, which will be written
as ¬likes(John, Peanuts)
Now in this step, we will solve the problem by resolution tree using substitution. For the
above problem, it will be given as follows:
Hence the negated conclusion leads to a contradiction with the given set of statements, so
the original conclusion, likes(John, Peanuts), is proved.
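The refutation can be replayed mechanically. The sketch below works on ground clauses (variables already substituted, e.g. x → Peanuts), a simplifying assumption that avoids unification; it saturates the clause set with binary resolvents until the empty clause appears:

```python
from itertools import combinations

# Clauses are frozensets of literals; a negated literal is prefixed with "~".
def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolvents(c1, c2):
    """All binary resolvents of two clauses on complementary literals."""
    out = []
    for lit in c1:
        if negate(lit) in c2:
            out.append((c1 - {lit}) | (c2 - {negate(lit)}))
    return out

clauses = {
    frozenset({"~food(Peanuts)", "likes(John,Peanuts)"}),
    frozenset({"~eats(Anil,Peanuts)", "killed(Anil)", "food(Peanuts)"}),
    frozenset({"eats(Anil,Peanuts)"}),
    frozenset({"alive(Anil)"}),
    frozenset({"~alive(Anil)", "~killed(Anil)"}),
    frozenset({"~likes(John,Peanuts)"}),  # the negated goal
}

derived_empty = False
while not derived_empty:
    new = set()
    for c1, c2 in combinations(clauses, 2):
        for r in resolvents(c1, c2):
            if not r:                 # the empty clause: contradiction found
                derived_empty = True
            new.add(frozenset(r))
    if new <= clauses:                # saturated without a contradiction
        break
    clauses |= new

print(derived_empty)  # True: contradiction, so likes(John, Peanuts) holds
```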
Exercise
Module-4
Handling Uncertainty
Quantifying Uncertainty
Quantifying uncertainty concerns how an agent can act rationally under uncertainty by
maintaining degrees of belief. The term uncertainty refers to a situation or information which
is either unknown or imperfect.
Earlier, we saw that problem-solving agents rely on belief states (which represent all
possible states and generate the future plan) to handle uncertainty. But this approach has
certain drawbacks:
o If the right contingent (future) plan must be maintained, it can grow arbitrarily large.
o Sometimes no plan guarantees reaching the goal, so a method is needed to compare the
pros and cons of plans that are not guaranteed.
Need of Uncertainty
To understand the need, let’s see the below example of uncertain reasoning:
Consider the diagnosis of a cancer patient. By following the propositional logic, a rule can be
derived as:
Age ⇒ Cancer
But this rule is incorrect, as not all cancers are caused by age. There can be other
possible causes like environment, genetics, skin type, etc. We can rewrite the rule as:
Age ∨ Environment ∨ Genetics ∨ … ⇒ Cancer
Unfortunately, this rule will also not work, because there can be an unlimited number of
causes of cancer. The only way to make the rule applicable is to make it logically exhaustive,
but using logic in a medical domain like this fails for the following reasons:
Laziness: It is too much work to list the complete set of antecedents or consequents needed
to ensure an exceptionless rule, and too hard to use such rules.
In statistics and probability theory, the Bayes’ theorem (also known as the Bayes’ rule) is a
mathematical formula used to determine the conditional probability of events. Essentially, the
Bayes’ theorem describes the probability of an event based on prior knowledge of the
conditions that might be relevant to the event.
The theorem is named after the English statistician Thomas Bayes; the formula was published
posthumously in 1763. It is considered the foundation of the special statistical inference
approach called Bayesian inference.
Besides statistics, the Bayes’ theorem is also used in various disciplines, with medicine and
pharmacology as the most notable examples. In addition, the theorem is commonly employed
in different fields of finance. Some of the applications include but are not limited to,
modeling the risk of lending money to borrowers or forecasting the probability of the success
of an investment.
In its general form, the theorem is written as:
P(A|B) = P(B|A) × P(A) / P(B)
Where:
P(A|B) – the probability of event A occurring, given event B has occurred
P(B|A) – the probability of event B occurring, given event A has occurred
P(A) and P(B) – the unconditional probabilities of events A and B
Note that Bayes' theorem is most useful when events A and B are dependent; if A and B are
independent, then P(A|B) simply reduces to P(A).
A special case of the Bayes' theorem is when event A is a binary variable with two mutually
exclusive outcomes, A+ and A–. In such a case, the theorem is expressed in the following way:
P(A+|B) = P(B|A+) × P(A+) / [P(B|A+) × P(A+) + P(B|A–) × P(A–)]
Where:
P(B|A+) – the probability of event B occurring given that event A+ has occurred
P(B|A–) – the probability of event B occurring given that event A– has occurred
Imagine you are a financial analyst at an investment bank. According to your research
of publicly-traded companies, 60% of the companies that increased their share price by more
than 5% in the last three years replaced their CEOs during the period.
At the same time, only 35% of the companies that did not increase their share price by more
than 5% in the same period replaced their CEOs. Knowing that the probability that the stock
prices grow by more than 5% is 4%, find the probability that the shares of a company that
fires its CEO will increase by more than 5%.
Before finding the probabilities, you must first define the notation of the probabilities.
P(A) – the probability that the stock price increases by more than 5%
P(B) – the probability that the CEO is replaced
P(A|B) – the probability that the stock price increases by more than 5%, given that the CEO
has been replaced
P(B|A) – the probability that the CEO is replaced, given that the stock price has increased
by more than 5%
Thus, the probability that the shares of a company that replaces its CEO will grow by more
than 5% is 6.67%.
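The arithmetic behind this answer is a direct application of Bayes' theorem together with the law of total probability:

```python
# Worked computation of the CEO example above via Bayes' theorem.
# A = share price grows by more than 5%; B = the CEO is replaced.
p_b_given_a = 0.60      # P(B|A): CEO replaced given price rose > 5%
p_b_given_not_a = 0.35  # P(B|A-): CEO replaced given price did not rise > 5%
p_a = 0.04              # P(A): price rises > 5%

# Law of total probability: P(B) = P(B|A)P(A) + P(B|A-)P(A-)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(f"{p_a_given_b:.2%}")  # 6.67%
```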
Uncertainty:
Till now, we have learned knowledge representation using first-order logic and propositional
logic with certainty, which means we were sure about the predicates. With this knowledge
representation we might write A→B, meaning if A is true then B is true. But consider a
situation where we are not sure whether A is true or not; then we cannot express this
statement. This situation is called uncertainty.
So to represent uncertain knowledge, where we are not sure about the predicates, we need
uncertain reasoning or probabilistic reasoning.
Causes of uncertainty:
Following are some leading causes of uncertainty in the real world:
1. Information from unreliable sources
2. Experimental errors
3. Equipment faults
4. Temperature variation
5. Climate change
Probabilistic reasoning:
In the real world, there are lots of scenarios, where the certainty of something is not
confirmed, such as "It will rain today," "behavior of someone for some situations," "A match
between two teams or two players." These are probable sentences for which we can assume
that it will happen but not sure about it, so here we use probabilistic reasoning.
In probabilistic reasoning, there are two ways to solve problems with uncertain knowledge:
o Bayes' rule
o Bayesian Statistics
1. 0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.
2. P(A) = 0 indicates that event A is impossible.
3. P(A) = 1 indicates that event A is certain.
We can find the probability of an uncertain event by using the below formula.
o P(¬A) + P(A) = 1.
Random variables: Random variables are used to represent the events and objects in the real
world.
Posterior Probability: The probability that is calculated after all evidence or information has
been taken into account. It is a combination of the prior probability and the new information.
Conditional probability:
Conditional probability is the probability of one event occurring given that another event has
already happened.
Let's suppose we want to calculate the probability of event A when event B has already
occurred, "the probability of A under the condition B". It can be written as:
P(A|B) = P(A⋀B) / P(B)
where P(A⋀B) is the joint probability of A and B. If instead the probability of A is given and
we need to find P(B|A), then it is given as:
P(B|A) = P(A⋀B) / P(A)
This can be explained using the Venn diagram below: once event B has occurred, the sample
space is reduced to the set B, and we calculate event A given B by dividing the probability
P(A⋀B) by P(B).
Example:
In a class, 70% of the students like English and 40% of the students like both English and
mathematics. What percentage of the students who like English also like mathematics?
Solution:
P(Mathematics | English) = P(English ⋀ Mathematics) / P(English) = 0.40 / 0.70 ≈ 0.57
Hence, about 57% of the students who like English also like mathematics.
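The same answer follows in two lines from the definition of conditional probability:

```python
# The class example, computed from the definition P(M|E) = P(E ∧ M) / P(E).
p_english = 0.70           # P(E): student likes English
p_english_and_math = 0.40  # P(E ∧ M): student likes both subjects

p_math_given_english = p_english_and_math / p_english
print(f"{p_math_given_english:.0%}")  # 57%
```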
Probability of a given event = number of ways the event can occur / total number of outcomes.
Probability always takes a value between 0 and 1. If the probability is 0, the event will
never occur, and if it is 1, it will occur for sure.
Considering that the chance of March being cold (event S) is only 30%, P(S) = 0.3.
Then P(S∧T) means the probability of S AND T, i.e., the probability of both March and April
being cold.
Proofs of the properties are not given here; you can work them out yourself using Venn
diagrams.
Conditional Probability
Conditional probability is defined as the probability of one event given that another event has
occurred. It is denoted by P(B|A) and is read as: "probability of B given A."
Hidden Markov models (HMMs) are a formal foundation for making probabilistic
models of linear sequence 'labeling' problems. They provide a conceptual toolkit
for building complex models just by drawing an intuitive picture. They are at the heart
of a diverse range of programs, including gene finding, profile searching, multiple
sequence alignment and regulatory site identification. HMMs are the Legos of
computational sequence analysis.
Hidden Markov Model (HMM)
We use an HMM when we cannot observe the states themselves but only the result of some
probability function (observation) of those states. An HMM is a statistical Markov model in
which the system being modeled is assumed to be a Markov process with unobserved
(hidden) states.
Markov Model: a series of (hidden) states z = {z_1, z_2, ...} drawn from a state alphabet
S = {s_1, s_2, ..., s_|S|}, where each z_i belongs to S.
Assumptions of HMM
An HMM, too, is built upon several assumptions, of which the following is vital. Output
independence assumption: each output observation is conditionally independent of all other
hidden states and all other observations, given the current hidden state.
Emission Probability Matrix: the probability of a hidden state generating output v_i, given
that the state at the corresponding time was s_j.
Hidden Markov Model as a finite state machine
Consider the example given below in Fig. 3, which elaborates how a person feels in different
climates.
Set of observed states (S) = {Happy, Grumpy}
Set of hidden states (Q) = {Sunny, Rainy}
Observed states for four days = {z1 = Happy, z2 = Grumpy, z3 = Grumpy, z4 = Happy}
The feeling that you perceive from a person's expression is called the observation, since you
can observe it.
The weather that influences the person's feeling is called the hidden state, since you cannot
observe it.
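A minimal forward-algorithm sketch for this Sunny/Rainy example is shown below. The numeric initial, transition, and emission probabilities are illustrative assumptions, since the text gives none:

```python
# Forward algorithm for the Sunny/Rainy HMM example.
# All numeric probabilities below are illustrative assumptions.
states = ["Sunny", "Rainy"]
observations = ["Happy", "Grumpy", "Grumpy", "Happy"]  # the four observed days

initial = {"Sunny": 0.6, "Rainy": 0.4}
transition = {
    "Sunny": {"Sunny": 0.7, "Rainy": 0.3},
    "Rainy": {"Sunny": 0.4, "Rainy": 0.6},
}
emission = {
    "Sunny": {"Happy": 0.8, "Grumpy": 0.2},
    "Rainy": {"Happy": 0.3, "Grumpy": 0.7},
}

# Forward pass: alpha[s] = P(observations so far, current hidden state = s)
alpha = {s: initial[s] * emission[s][observations[0]] for s in states}
for obs in observations[1:]:
    alpha = {
        s: emission[s][obs] * sum(alpha[p] * transition[p][s] for p in states)
        for s in states
    }

# Total probability of the whole observation sequence under the model:
print(sum(alpha.values()))  # ≈ 0.0547
```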