
Subject: Artificial Intelligence (3170716)

IMP Question
Q. No. Questions
1 What is artificial intelligence? Define the different task domains of artificial
intelligence.
Artificial Intelligence is “the study of how to make computers do things, which, at the
moment, people do better”.
According to the father of Artificial Intelligence, John McCarthy, it is “The science
and engineering of making intelligent machines, especially intelligent computer
programs”.
Artificial Intelligence is a “way of making a computer, a computer-controlled robot,
or software think intelligently, in a manner similar to how intelligent humans think”.

Task Domains of Artificial Intelligence (AI)


Artificial Intelligence tasks are divided into three groups: Mundane Tasks, Formal
Tasks, and Expert Tasks.
Mundane Tasks:
 Perception
 Vision
 Speech
 Natural Language
 Language Understanding
 Language Generation
 Language Translation
 Common Sense Reasoning
 Robot Control
 Planning

Formal Tasks:
 Games: chess, checkers, etc.
 Mathematics
 Geometry
 Logic
 Integration and Differentiation
 Verification
 Theorem Proving

Expert Tasks:
 Engineering (Design, Fault finding, Manufacturing planning)
 Scientific Analysis
 Medical Diagnosis
 Financial Analysis

2 What is the significance of the “Turing Test” in AI? Explain how it is performed.

The success of the intelligent behavior of a system can be measured with the Turing Test.
Two persons and a machine to be evaluated participate in the test. One of the two
persons plays the role of the tester. Each of them sits in a different room. The
tester does not know who is the machine and who is the human. He interrogates both
intelligences by typing questions and sending them, and receives typed responses.
This test aims at fooling the tester. If the tester fails to distinguish the machine's
responses from the human's responses, then the machine is said to be intelligent.

3 Discuss with examples: AI Problem Characteristics.

Heuristics cannot be generalized, as they are domain specific. Most problems


requiring simulation of intelligence use heuristic search extensively. Some heuristics
are used to define the control structure that guides the search process, as seen in the
example described above. But heuristics can also be encoded in the rules to represent
the domain knowledge. Since most AI problems make use of knowledge and guided
search through the knowledge, AI can be described as the study of techniques for
solving exponentially hard problems in polynomial time by exploiting knowledge
about problem domain. To use the heuristic search for problem solving, we suggest
analysis of the problem for the following considerations:
1. Is the problem decomposable into small sub-problems which are easy to
solve?
2. Can solution steps be ignored or undone?
3. Is the universe of the problem predictable?
4. Is a good solution to the problem absolute or relative?
5. Is the solution to the problem a state or a path?
6. What is the role of knowledge in solving a problem using artificial
intelligence?
7. Does the task of solving a problem require human interaction?

1. Is the problem decomposable into small sub-problems which are easy to solve?
Can the problem be broken down into smaller problems to be solved independently?
The decomposable problem can be solved easily.
Example: symbolic integration. An expression such as ∫(x² + 3x + sin²x·cos²x) dx can be
divided into three smaller integrals, each solved independently; the results are then
merged (summed) to get the final result.
2. Can solution steps be ignored or undone?
In the Theorem Proving problem, a lemma that has been proved can be ignored for
the next steps.
Such problems are called Ignorable problems.
In the 8-Puzzle, Moves can be undone and backtracked.
Such problems are called Recoverable problems.

In Playing Chess, moves cannot be retracted once made.


Such problems are called Irrecoverable problems.
Ignorable problems can be solved using a simple control structure that never
backtracks. Recoverable problems can be solved using
backtracking. Irrecoverable problems can be solved by recoverable style methods
via planning.

3. Is the universe of the problem predictable?


In the 8-Puzzle, every time we make a move we know exactly what will happen; this is a
certain-outcome problem. In Playing Bridge, we cannot know exactly where all the
cards are or what the other players will do on their turns.

Uncertain outcome!
For certain-outcome problems, planning can be used to generate a sequence of
operators that is guaranteed to lead to a solution.
For uncertain-outcome problems, a sequence of generated operators can only have
a good probability of leading to a solution. Plan revision is made as the plan is carried
out and the necessary feedback is provided.

4. Is a good solution to the problem absolute or relative?


In the Travelling Salesman Problem, we have to try all paths to find the shortest one;
this is a best-path problem. An any-path problem can be solved using heuristics that
suggest good paths to explore. For best-path problems, a much more exhaustive search
must be performed.

5. Is the solution to the problem a state or a path?


In the Water Jug Problem, the path that leads to the goal must be reported.
A path-solution problem can be reformulated as a state-solution problem by
describing a state as a partial path to a solution. The question is whether that is natural
or not.
6. What is the role of knowledge in solving a problem using artificial
intelligence?

Playing Chess
Consider again the problem of playing chess. Suppose you had unlimited computing
power available. How much knowledge would be required by a perfect program? The
answer to this question is very little—just the rules for determining legal moves and
some simple control mechanism that implements an appropriate search procedure.
Additional knowledge about such things as good strategy and tactics could of course
help considerably to constrain the search and speed up the execution of the program.
Knowledge is important only to constrain the search for a solution.
Reading Newspaper
Now consider the problem of scanning daily newspapers to decide which are
supporting the Democrats and which are supporting the Republicans in some
upcoming election. Again assuming unlimited computing power, how much
knowledge would be required by a computer trying to solve this problem? This time
the answer is a great deal.
It would have to know such things as:
 The names of the candidates in each party.
 The fact that if the major thing you want to see done is have taxes lowered,
you are probably supporting the Republicans.
 The fact that if the major thing you want to see done is improved education
for minority students, you are probably supporting the Democrats.
 The fact that if you are opposed to big government, you are probably
supporting the Republicans.
 And so on …

7. Does the task of solving a problem require human interaction?


Sometimes it is useful to program computers to solve problems in ways that the
majority of people would not be able to understand.
This is fine if the level of the interaction between the computer and its human users
is problem-in, solution-out.
But increasingly we are building programs that require intermediate interaction with
people, both to provide additional input to the program and to provide additional
reassurance to the user.
In solitary problems, there is no intermediate communication and no
demand for an explanation of the reasoning process.
In conversational problems, intermediate communication provides either
additional assistance to the computer or additional information to the user.

4 Explain Goal Based Agent and Utility based Agent architecture with proper
diagram.

Types of Agents

Agents can be grouped into five classes based on their degree of perceived
intelligence and capability:

 Simple Reflex Agents


 Model-Based Reflex Agents
 Goal-Based Agents
 Utility-Based Agents
 Learning Agent

Goal-based agents
 These kinds of agents take decisions based on how far they currently are
from their goal (a description of desirable situations).
 Their every action is intended to reduce the distance from the goal. This
gives the agent a way to choose among multiple possibilities, selecting the
one which reaches a goal state.
 The knowledge that supports its decisions is represented explicitly and can
be modified, which makes these agents more flexible.
 They usually require search and planning. The goal-based agent’s behavior
can easily be changed.
 They choose their actions in order to achieve goals. Goal-based approach is
more flexible than reflex agent since the knowledge supporting a decision is
explicitly modeled, thereby allowing for modifications.
 Goal − It is the description of desirable situations.

Utility-based agents

 Sometimes achieving the desired goal is not enough.


 Goals are inadequate when −
o There are conflicting goals, out of which only a few can be achieved.
o Goals have some uncertainty of being achieved, and you need to
weigh the likelihood of success against the importance of a goal.
 Agents developed with their end uses as building blocks are called
utility-based agents.
 When there are multiple possible alternatives, then to decide which one
is best, utility-based agents are used. They choose actions based on
a preference (utility) for each state.
 We may look for a quicker, safer, cheaper trip to reach a destination.
Agent happiness should be taken into consideration.
 Utility describes how “happy” the agent is. Because of the uncertainty
in the world, a utility agent chooses the action that maximizes the
expected utility.
 A utility function maps a state onto a real number which describes the
associated degree of happiness.
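To make the idea concrete, below is a minimal Python sketch of a utility-based agent loop. The environment, the utility function, and the stochastic transition model are made-up assumptions for illustration, not a standard implementation.

# A minimal sketch of a utility-based agent (illustrative assumptions only).
def utility(state):
    """Map a state onto a real number describing its desirability."""
    return -abs(state - 10)          # hypothetical: prefer states near 10

def expected_utility(state, action, model):
    """Weigh each possible outcome of an action by its probability."""
    return sum(p * utility(s2) for (s2, p) in model(state, action))

def choose_action(state, actions, model):
    """Pick the action that maximizes expected utility."""
    return max(actions, key=lambda a: expected_utility(state, a, model))

# Hypothetical stochastic model: the action usually works, sometimes does nothing.
def model(state, action):
    return [(state + action, 0.8), (state, 0.2)]

print(choose_action(5, [-1, 0, 1], model))   # -> 1 (moves toward the state 10)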

5 Explain various properties of task environment

The environment has multifold properties −


1. Fully observable vs Partially Observable
 If an agent sensor can sense or access the complete state of an environment at
each point of time then it is a fully observable environment, else it is partially
observable.
 A fully observable environment is easy, as there is no need to maintain an
internal state to keep track of the history of the world.
 If an agent has no sensors at all, then the environment is called
unobservable.
2. Deterministic vs Stochastic
 If an agent's current state and selected action can completely determine the
next state of the environment, then such environment is called a deterministic
environment.
 A stochastic environment is random in nature and cannot be determined
completely by an agent.
 In a deterministic, fully observable environment, agent does not need to worry
about uncertainty.
3. Episodic vs Sequential
 In an episodic environment, there is a series of one-shot actions, and only the
current percept is required for the action.
 However, in Sequential environment, an agent requires memory of past
actions to determine the next best actions.
4. Single-agent vs Multi-agent
 If only one agent is involved in an environment, and operating by itself then
such an environment is called single agent environment.
 However, if multiple agents are operating in an environment, then such an
environment is called a multi-agent environment.
 The agent design problems in the multi-agent environment are different from
single agent environment.
5. Static vs Dynamic
 If the environment can change itself while an agent is deliberating then such
environment is called a dynamic environment else it is called a static
environment.
 Static environments are easy to deal with, because an agent does not need to
keep looking at the world while deciding on an action.
 However for dynamic environment, agents need to keep looking at the world
at each action.
 Taxi driving is an example of a dynamic environment whereas Crossword
puzzles are an example of a static environment.
6. Discrete vs Continuous:
 If in an environment there are a finite number of percepts and actions that can
be performed within it, then such an environment is called a discrete
environment else it is called continuous environment.
 A chess game comes under a discrete environment, as there is a finite number of
moves that can be performed.
 A self-driving car is an example of a continuous environment.
7. Known vs Unknown
 Known and unknown are not actually a feature of an environment, but it is an
agent's state of knowledge to perform an action.
 In a known environment, the results for all actions are known to the agent.
While in unknown environment, agent needs to learn how it works in order to
perform an action.
 It is quite possible for a known environment to be partially observable and
for an unknown environment to be fully observable.
8. Accessible vs Inaccessible
 If an agent can obtain complete and accurate information about the state's
environment, then such an environment is called an Accessible environment
else it is called inaccessible.
 An empty room whose state can be defined by its temperature is an example
of an accessible environment.
 Information about an event on earth is an example of Inaccessible
environment.

6 Write difference between Informed and Uninformed Search in AI.

Informed Search algorithms have information on the goal state, which helps in more
efficient searching. This information is obtained by a function that estimates how
close a state is to the goal state. Examples: Greedy Best-First Search and A* Search.
Uninformed Search algorithms have no additional information on the goal node
other than the one provided in the problem definition. The plans to reach the goal
state from the start state differ only by the order and length of actions. Examples:
Depth First Search and Breadth-First Search.

Parameters       | Informed Search                          | Uninformed Search
-----------------|------------------------------------------|------------------------------------------
Known as         | It is also known as Heuristic Search.    | It is also known as Blind Search.
Using Knowledge  | It uses knowledge for the searching      | It doesn't use knowledge for the
                 | process.                                 | searching process.
Performance      | It finds a solution more quickly.        | It finds a solution slowly compared to
                 |                                          | an informed search.
Completion       | It may or may not be complete.           | It is always complete.
Cost Factor      | Cost is low.                             | Cost is high.
Time             | It consumes less time because of quick   | It consumes moderate time because of
                 | searching.                               | slow searching.
Direction        | There is a direction given about the     | No suggestion is given regarding the
                 | solution.                                | solution.
Implementation   | It is less lengthy when implemented.     | It is more lengthy when implemented.
Efficiency       | It is more efficient, as efficiency      | It is comparatively less efficient, as
                 | takes into account cost and performance. | the incurred cost is more and the speed
                 | The incurred cost is less and the speed  | of finding a solution is slow.
                 | of finding solutions is quick.           |
Computational    | Computational requirements are lessened. | Comparatively higher computational
requirements     |                                          | requirements.
Size of search   | Has a wide scope in terms of handling    | Solving a massive search task is
problems         | large search problems.                   | challenging.
Examples of      | Greedy Search, A* Search, AO* Search,    | Depth First Search (DFS), Breadth
Algorithms       | Hill Climbing Algorithm                  | First Search (BFS), Branch and Bound

7 Write difference between BFS and DFS with example.

S. No. | Parameters            | BFS                                      | DFS
-------|-----------------------|------------------------------------------|------------------------------------------
1.     | Stands for            | BFS stands for Breadth First Search.     | DFS stands for Depth First Search.
2.     | Data Structure        | BFS uses a Queue data structure for      | DFS uses a Stack data structure.
       |                       | finding the shortest path.               |
3.     | Definition            | BFS is a traversal approach in which we  | DFS is also a traversal approach, in
       |                       | first walk through all nodes on the      | which the traversal begins at the root
       |                       | same level before moving on to the next  | node and proceeds through the nodes as
       |                       | level.                                   | far as possible until we reach a node
       |                       |                                          | with no unvisited nearby nodes.
4.     | Technique             | BFS can be used to find the single-      | In DFS, we might traverse through more
       |                       | source shortest path in an unweighted    | edges to reach a destination vertex
       |                       | graph because, in BFS, we reach a        | from a source.
       |                       | vertex with a minimum number of edges    |
       |                       | from the source vertex.                  |
5.     | Conceptual Difference | BFS builds the tree level by level.      | DFS builds the tree sub-tree by sub-tree.
6.     | Approach used         | It works on the concept of FIFO          | It works on the concept of LIFO
       |                       | (First In First Out).                    | (Last In First Out).
7.     | Suitable for          | BFS is more suitable for searching       | DFS is more suitable when there are
       |                       | vertices closer to the given source.     | solutions away from the source.
8.     | Suitable for Decision | BFS considers all neighbors first and    | DFS is more suitable for game or puzzle
       | Trees                 | is therefore not suitable for the        | problems. We make a decision, then
       |                       | decision-making trees used in games or   | explore all paths through this decision.
       |                       | puzzles.                                 | If this decision leads to a winning
       |                       |                                          | situation, we stop.
9.     | Time Complexity       | The time complexity of BFS is O(V + E)   | The time complexity of DFS is also
       |                       | when an adjacency list is used and       | O(V + E) when an adjacency list is used
       |                       | O(V^2) when an adjacency matrix is       | and O(V^2) when an adjacency matrix is
       |                       | used, where V stands for vertices and E  | used, where V stands for vertices and E
       |                       | for edges.                               | for edges.
10.    | Visiting of Siblings/ | Here, siblings are visited before the    | Here, children are visited before the
       | Children              | children.                                | siblings.
11.    | Removal of Traversed  | Nodes that are traversed several times   | The visited nodes are added to the stack
       | Nodes                 | are deleted from the queue.              | and then removed when there are no more
       |                       |                                          | nodes to visit.
12.    | Backtracking          | In BFS there is no concept of            | DFS is a recursive algorithm that uses
       |                       | backtracking.                            | the idea of backtracking.
13.    | Applications          | BFS is used in various applications      | DFS is used in various applications such
       |                       | such as bipartite graphs, shortest       | as acyclic graphs and topological
       |                       | paths, etc.                              | ordering, etc.
14.    | Memory                | BFS requires more memory.                | DFS requires less memory.
15.    | Optimality            | BFS is optimal for finding the shortest  | DFS is not optimal for finding the
       |                       | path.                                    | shortest path.
16.    | Space Complexity      | In BFS, the space complexity is more     | DFS has lower space complexity, because
       |                       | critical as compared to time             | at a time it needs to store only a
       |                       | complexity.                              | single path from the root to the leaf
       |                       |                                          | node.
17.    | Speed                 | BFS is slow as compared to DFS.          | DFS is fast as compared to BFS.
18.    | When to use?          | When the target is close to the source,  | When the target is far from the source,
       |                       | BFS performs better.                     | DFS is preferable.
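The contrast between the two traversals can be seen in a short Python sketch; the adjacency-list graph below is a made-up example.

# BFS uses a queue (FIFO); DFS uses a stack (LIFO).
from collections import deque

graph = {'A': ['B', 'C'], 'B': ['D', 'E'], 'C': ['F'], 'D': [], 'E': [], 'F': []}

def bfs(start):
    visited, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()            # FIFO: siblings before children
        order.append(node)
        for nbr in graph[node]:
            if nbr not in visited:
                visited.add(nbr)
                queue.append(nbr)
    return order

def dfs(start):
    visited, order, stack = set(), [], [start]
    while stack:
        node = stack.pop()                # LIFO: go deep before siblings
        if node not in visited:
            visited.add(node)
            order.append(node)
            stack.extend(reversed(graph[node]))
    return order

print(bfs('A'))   # ['A', 'B', 'C', 'D', 'E', 'F']  (level by level)
print(dfs('A'))   # ['A', 'B', 'D', 'E', 'C', 'F']  (sub-tree by sub-tree)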
8 What is a state space search? Explain with respect to the water jug problem.
Problem: There are two jugs, of volume A gallons and B gallons. Neither has any
measuring mark on it. There is a pump that can be used to fill the jugs with
water. How can you get exactly x gallons of water into the A-gallon jug,
assuming an unlimited supply of water?
Let's assume we have an A = 4 gallon and a B = 3 gallon jug, and we want exactly 2
gallons of water in jug A (i.e. the 4-gallon jug). How will we do this?

Solution:
We are given two jugs, a 4-gallon one and a 3-gallon one. Neither has any measuring
marks on it. There is a pump which can be used to fill the jugs with water. How can
we get exactly 2 gallons of water into the 4-gallon jug?

The state space for this problem can be described as the set of ordered pairs of integers
(X, Y) such that X = 0, 1, 2, 3 or 4 and Y = 0, 1, 2 or 3; X is the number of gallons of
water in the 4-gallon jug and Y the quantity of water in the 3-gallon jug.

The start state is (0, 0) and the goal state is (2, n) for any value of n, as the problem
does not specify how many gallons need to be filled in the 3-gallon jug (0, 1, 2, 3).
So the problem has one initial state and many goal states. Some problems may have
many initial states and one or many goal states.

As in chess playing, the operators are represented as rules whose left sides are matched
against the current state and whose right sides describe the new state which results from
applying the rule.
In order to describe the operators completely, here are some assumptions not
mentioned in the problem statement:
1. We can fill a jug from the pump.
2. We can pour water out a jug, onto the ground.
3. We can pour water out of one jug into the other.
4. No other measuring devices are available.
All such additional assumptions need to be given when converting a problem
statement in English to a formal representation of the problem, suitable for use by a
program.
To solve the water jug problem, all we need, in addition to the problem description
given above, is a control structure which loops through a simple cycle in which some
rule whose left side matches the current state is chosen, the appropriate change to the
state is made as described in the corresponding right side and the resulting state is
checked to see if it corresponds to a goal state.
The operators to be used to solve the problem can be described as shown in below
table:
Rule | State                          | New State        | Process
-----|--------------------------------|------------------|-------------------------------------------------
1    | (X, Y | X < 4)                 | (4, Y)           | Fill the 4-gallon jug
2    | (X, Y | Y < 3)                 | (X, 3)           | Fill the 3-gallon jug
3    | (X, Y | X >= d > 0)            | (X - d, Y)       | Pour some water out of the 4-gallon jug
4    | (X, Y | Y >= d > 0)            | (X, Y - d)       | Pour some water out of the 3-gallon jug
5    | (X, Y | X > 0)                 | (0, Y)           | Empty the 4-gallon jug
6    | (X, Y | Y > 0)                 | (X, 0)           | Empty the 3-gallon jug
7    | (X, Y | X + Y >= 4 and Y > 0)  | (4, Y - (4 - X)) | Pour water from the 3-gallon jug into the
     |                                |                  | 4-gallon jug until the 4-gallon jug is full
8    | (X, Y | X + Y >= 3 and X > 0)  | (X - (3 - Y), 3) | Pour water from the 4-gallon jug into the
     |                                |                  | 3-gallon jug until the 3-gallon jug is full
9    | (X, Y | X + Y <= 4 and Y > 0)  | (X + Y, 0)       | Pour all the water from the 3-gallon jug
     |                                |                  | into the 4-gallon jug
10   | (X, Y | X + Y <= 3 and X > 0)  | (0, X + Y)       | Pour all the water from the 4-gallon jug
     |                                |                  | into the 3-gallon jug
11   | (0, 2)                         | (2, 0)           | Pour the 2 gallons from the 3-gallon jug
     |                                |                  | into the 4-gallon jug
12   | (2, Y)                         | (0, Y)           | Empty the 2 gallons in the 4-gallon jug
     |                                |                  | on the ground
There are several sequences of operators which will solve the problem, two such
sequences are shown in Fig.
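A minimal Python sketch of this state space search, assuming the 4- and 3-gallon jugs above: states are (X, Y) pairs, the successor function applies the fill/empty/pour rules, and a breadth-first search reports a path to a goal state.

from collections import deque

def successors(x, y):
    """Apply the fill, empty, and pour rules to state (x, y)."""
    return {(4, y), (x, 3),                              # fill a jug
            (0, y), (x, 0),                              # empty a jug
            (min(4, x + y), y - (min(4, x + y) - x)),    # pour 3-gal into 4-gal
            (x - (min(3, x + y) - y), min(3, x + y))}    # pour 4-gal into 3-gal

def solve(start=(0, 0)):
    frontier, parents = deque([start]), {start: None}
    while frontier:
        state = frontier.popleft()
        if state[0] == 2:                # goal: 2 gallons in the 4-gallon jug
            path = []
            while state is not None:
                path.append(state)
                state = parents[state]
            return path[::-1]
        for nxt in successors(*state):
            if nxt not in parents:
                parents[nxt] = state
                frontier.append(nxt)

print(solve())   # one shortest sequence, e.g. (0,0) (0,3) (3,0) (3,3) (4,2) (0,2) (2,0)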

9 Explain Hill Climbing Algorithm with its types.

Hill Climbing is a heuristic search used for mathematical optimization problems in the field
of Artificial Intelligence.
Given a large set of inputs and a good heuristic function, it tries to find a sufficiently good
solution to the problem. This solution may not be the global maximum.
Features of Hill Climbing
1. Variant of generate and test algorithm: It is a variant of the generate and test algorithm.
The generate and test algorithm is as follows :
1. Generate possible solutions.
2. Test to see if this is the expected solution.
3. If the solution has been found quit else go to step 1.
Hence we call Hill Climbing a variant of the generate and test algorithm, as it takes
feedback from the test procedure. This feedback is then utilized by the generator in
deciding the next move in the search space.
2. Uses the Greedy approach: At any point in the state space, the search moves only in
that direction which optimizes the cost function, with the hope of finding the optimal
solution at the end.

Types of Hill Climbing


A. Simple Hill climbing:
It examines the neighboring nodes one by one and selects the first neighboring node which
optimizes the current cost as the next node.
Algorithm for Simple Hill climbing :
Step 1 : Evaluate the initial state. If it is a goal state then stop and return success.
Otherwise, make initial state as current state.
Step 2 : Loop until the solution state is found or there are no new operators present which
can be applied to the current state.
a) Select an operator that has not yet been applied to the current state and apply it to produce
a new state.
b) Perform these steps to evaluate the new state:
i. If the new state is a goal state, then stop and return success.
ii. If it is better than the current state, then make it the current state and proceed further.
iii. If it is not better than the current state, then continue in the loop until a solution is
found.
Step 3 : Exit.
B. Steepest-Ascent Hill climbing:
It first examines all the neighboring nodes and then selects the node closest to the solution
state as the next node.
Algorithm for Steepest Ascent Hill climbing :
Step 1 : Evaluate the initial state. If it is a goal state then stop and return success.
Otherwise, make initial state as current state.
Step 2 : Repeat these steps until a solution is found or the current state does not change:
a) Select an operator that has not yet been applied to the current state.
b) Initialize a new ‘best state’ equal to the current state and apply the operator to produce a
new state.
c) Perform these steps to evaluate the new state:
i. If the new state is a goal state, then stop and return success.
ii. If it is better than the best state, then make it the best state; else continue the loop
with another new state.
d) Make the best state the current state and go to Step 2 (b).
Step 3 : Exit
C. Stochastic hill climbing:
It does not examine all the neighboring nodes before deciding which node to select. It just
selects a neighboring node at random and decides (based on the amount of improvement in
that neighbor) whether to move to that neighbor or to examine another.
Step 1: Evaluate the initial state. If it is a goal state then stop and return success.
Otherwise, make the initial state the current state.
Step 2: Repeat these steps until a solution is found or the current state does not change.
a) Apply the successor function to the current state and generate all the neighbor states.
b) Among the generated neighbor states which are better than the current state, choose one
randomly (or based on some probability function).
c) If the chosen state is the goal state, then return success; else make it the current state
and repeat Step 2.
Step 3: Exit.
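A minimal Python sketch of the first two variants on a made-up one-dimensional objective function (a single peak at x = 7, so no local-maximum trouble arises here).

def objective(x):
    return -(x - 7) ** 2                  # single peak at x = 7

def simple_hill_climb(x=0, step=1):
    while True:
        better = [n for n in (x - step, x + step) if objective(n) > objective(x)]
        if not better:
            return x                      # no better neighbour: a maximum
        x = better[0]                     # take the FIRST improving neighbour

def steepest_ascent(x=0, step=1):
    while True:
        best = max((x - step, x + step), key=objective)
        if objective(best) <= objective(x):
            return x
        x = best                          # move to the BEST neighbour

print(simple_hill_climb())                # -> 7
print(steepest_ascent())                  # -> 7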

State Space diagram for Hill Climbing


The state-space diagram is a graphical representation of the set of states our search
algorithm can reach versus the value of our objective function (the function which we wish
to maximize).
X-axis: denotes the state space, i.e. the states or configurations our algorithm may reach.
Y-axis: denotes the values of the objective function corresponding to a particular state.
The best solution will be the state where the objective function has its maximum
value (the global maximum).
Different regions in the State Space Diagram:
1. Local maximum: It is a state which is better than its neighboring states; however,
there exists a state which is better than it (the global maximum). This state is better
because here the value of the objective function is higher than that of its neighbors.
2. Global maximum: It is the best possible state in the state space diagram. This is
because, at this stage, the objective function has the highest value.
3. Plateau/flat local maximum: It is a flat region of state space where neighboring
states have the same value.
4. Ridge: It is a region that is higher than its neighbors but itself has a slope. It is a
special kind of local maximum.
5. Current state: The region of state space diagram where we are currently present
during the search.
6. Shoulder: It is a plateau that has an uphill edge.
Problems in different regions in Hill climbing
Hill climbing cannot reach the optimal/best state (global maximum) if it enters any of the
following regions:
1. Local maximum: At a local maximum all neighboring states have a value that is
worse than the current state. Since hill-climbing uses a greedy approach, it will not
move to the worse state and terminate itself. The process will end even though a
better solution may exist.
To overcome the local maximum problem: Utilize the backtracking technique.
Maintain a list of visited states. If the search reaches an undesirable state, it can
backtrack to the previous configuration and explore a new path.
2. Plateau: On the plateau, all neighbors have the same value. Hence, it is not
possible to select the best direction.
To overcome plateaus: Make a big jump. Randomly select a state far away from
the current state. Chances are that we will land in a non-plateau region.
3. Ridge: Any point on a ridge can look like a peak because movement in all possible
directions is downward. Hence the algorithm stops when it reaches this state.
To overcome Ridge: In this kind of obstacle, use two or more rules before testing.
It implies moving in several directions at once.

10 Explain A* search Algorithm .

 A* search is the most commonly known form of best-first search.


 It uses the heuristic function h(n) and the cost to reach node n from the start state,
g(n). It has combined features of UCS (Uniform Cost Search) and greedy best-
first search, by which it solves the problem efficiently.
 A* search algorithm finds the shortest path through the search space using the
heuristic function. This search algorithm expands fewer nodes of the search tree and
provides optimal results faster.
 A* algorithm is similar to UCS except that it uses g(n) + h(n) instead of g(n).
 In the A* search algorithm, we use the search heuristic as well as the cost to reach
the node. Hence we can combine both costs as f(n) = g(n) + h(n), and this sum is
called the fitness number.

Algorithm of A* search:
Step1: Place the starting node in the OPEN list.
Step 2: Check if the OPEN list is empty or not, if the list is empty then return failure
and stops.
Step 3: Select the node from the OPEN list which has the smallest value of evaluation
function (g+h), if node n is goal node then return success and stop, otherwise
Step 4: Expand node n and generate all of its successors, and put n into the closed
list. For each successor n', check whether n' is already in the OPEN or CLOSED list,
if not then compute evaluation function for n' and place into Open list.
Step 5: Else, if node n' is already in OPEN or CLOSED, then it should be attached to
the back-pointer which reflects the lowest g(n') value.
Step 6: Return to Step 2.
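A minimal Python sketch of these steps, keeping an OPEN priority queue ordered on f(n) = g(n) + h(n) and a CLOSED set. The example graph is not given explicitly in the text, so the edge costs and heuristic values below are assumptions chosen so that the run reproduces the iterations of the worked example that follows.

import heapq

graph = {'S': [('A', 1), ('G', 10)], 'A': [('B', 2), ('C', 1)],
         'B': [('D', 5)], 'C': [('D', 3), ('G', 4)], 'D': [('G', 2)], 'G': []}
h = {'S': 5, 'A': 3, 'B': 4, 'C': 2, 'D': 6, 'G': 0}

def a_star(start='S', goal='G'):
    open_list = [(h[start], 0, start, [start])]       # (f, g, node, path)
    closed = set()
    while open_list:
        f, g, node, path = heapq.heappop(open_list)   # smallest f first
        if node == goal:
            return path, g
        if node in closed:
            continue
        closed.add(node)
        for nbr, cost in graph[node]:
            if nbr not in closed:
                g2 = g + cost
                heapq.heappush(open_list, (g2 + h[nbr], g2, nbr, path + [nbr]))
    return None

print(a_star())   # (['S', 'A', 'C', 'G'], 6), matching the worked example below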
Advantages:
 A* search algorithm is better than other search algorithms.
 A* search algorithm is optimal and complete.
 This algorithm can solve very complex problems.
Disadvantages:
 It does not always produce the shortest path, as it is mostly based on heuristics
and approximation.
 A* search algorithm has some complexity issues.
 The main drawback of A* is memory requirement as it keeps all generated
nodes in the memory, so it is not practical for various large-scale problems.
Example:
In this example, we will traverse the given graph using the A* algorithm. The
heuristic value of all states is given in the below table so we will calculate the f(n) of
each state using the formula f(n)= g(n) + h(n), where g(n) is the cost to reach any
node from start state.
Here we will use OPEN and CLOSED list.
Solution:

Initialization: {(S, 5)}


Iteration1: {(S--> A, 4), (S-->G, 10)}
Iteration2: {(S--> A-->C, 4), (S--> A-->B, 7), (S-->G, 10)}
Iteration3: {(S--> A-->C--->G, 6), (S--> A-->C--->D, 11), (S--> A-->B, 7), (S-->G,
10)}
Iteration 4 will give the final result: S--->A--->C--->G, which provides the optimal
path with cost 6.
Points to remember:
 A* algorithm returns the path which occurred first, and it does not search for
all remaining paths.
 The efficiency of A* algorithm depends on the quality of heuristic.
 A* algorithm expands all nodes which satisfy the condition f(n) < C*, where C* is
the cost of the optimal solution.
Complete: A* algorithm is complete as long as:
 Branching factor is finite.
 Cost at every action is fixed.

Optimal: A* search algorithm is optimal if it follows below two conditions:


 Admissible: the first condition required for optimality is that h(n) should be
an admissible heuristic for A* tree search. An admissible heuristic is
optimistic in nature.
 Consistency: the second required condition is consistency, which is needed
only for A* graph search.
If the heuristic function is admissible, then A* tree search will always find the least
cost path.
Time Complexity: The time complexity of the A* search algorithm depends on the
heuristic function, and the number of nodes expanded is exponential in the depth of the
solution d. So the time complexity is O(b^d), where b is the branching factor.
Space Complexity: The space complexity of the A* search algorithm is O(b^d).
11 What is problem reduction technique? Using this explain AO* search with an
example.

When a problem can be divided into a set of sub problems, where each sub problem
can be solved separately and a combination of these will be a solution, AND-OR
graphs or AND - OR trees are used for representing the solution.
The decomposition of the problem or problem reduction generates AND arcs.
AND-OR Graph

The figure shows an AND-OR graph


1. To pass any exam, we have two options: either cheating or hard work.
2. In this graph we are given two choices: first, do cheating (the red line), or
work hard and (the arc) pass.
3. When we have more than one choice and we have to pick one, we apply an OR
condition to choose one (that's what we did here).
4. Basically, the arc here denotes an AND condition.
5. Here we have drawn the arc between work hard and pass because, by doing
hard work, the possibility of passing the exam is higher than by cheating.
A* Vs AO*
1. Both are part of informed search technique and use heuristic values to solve
the problem.
2. A solution is guaranteed in both algorithms.
3. A* always gives an optimal solution (the shortest path with the lowest cost), but it
is not guaranteed that AO* will always provide an optimal solution.
4. Reason: AO* does not explore all the solution paths once it has got a
solution.
How AO* works
Let's try to understand it with the following diagram

The algorithm always moves towards a lower cost value.


Basically, we will calculate the cost function here: F(n) = G(n) + H(n), where
H is the heuristic/estimated value of a node and G is the actual cost or edge value (here a
unit value).
Here we have taken every edge value as 1, so we can focus mainly on the
heuristic values.
1. The Purple color values are edge values (here all are same that is one).
2. The Red color values are Heuristic values for nodes.
3. The Green color values are New Heuristic values for nodes.
Procedure:
1. In the above diagram we have two ways from A: to D, or to B-C (because
of the AND condition). Calculate the cost to select a path.
2. F(A-D) = 1 + 10 = 11 and F(A-BC) = 1 + 1 + 6 + 12 = 20
3. As we can see, F(A-D) is less than F(A-BC), so the algorithm chooses the
path A-D.
4. From D we have one choice, that is F-E.
5. F(A-D-FE) = 1 + 1 + 4 + 4 = 10
6. Basically, 10 is the cost of reaching FE from D. The heuristic value of node
D also denotes the cost of reaching FE from D. So, the new heuristic value of
D is 10.
7. And the cost from A-D remains the same, that is 11.
Suppose we have searched this path and we have got the goal state; then we will
never explore the other path. (This is what AO* says, but here we are going to explore
the other path as well to see what happens.)
Let's explore the other path:
1. In the above diagram we have two ways from A: to D, or to B-C (because
of the AND condition). Calculate the cost to select a path.
2. F(A-D) = 1 + 10 = 11 and F(A-BC) = 1 + 1 + 6 + 12 = 20
3. As we know, the cost of F(A-BC) is higher, but let's take a look.
4. Now from B we have two paths, G and H; let's calculate the cost.
5. F(B-G) = 5 + 1 = 6 and F(B-H) = 7 + 1 = 8
6. The cost of F(B-H) is more than F(B-G), so we will take the path B-G.
7. The heuristic value from G to I is 1, but let's calculate the cost from G to I.
8. F(G-I) = 1 + 1 = 2, which is less than the heuristic value 5. So, the new
heuristic value of G is 2.
9. Since G has a new value, the cost from B to G must also have changed. Let's
see the new cost from B to G:
10. F(B-G) = 1 + 2 = 3. This means the new heuristic value of B is 3.
11. But A is associated with both B and C.
12. As we can see from the diagram, C has only one choice, one node to
explore, that is J. The heuristic value of C is 12.
13. The cost from C to J: F(C-J) = 1 + 1 = 2, which is less than the heuristic value.
14. Now the new heuristic value of C is 2.
15. And the new cost from A to BC is F(A-BC) = 1 + 1 + 2 + 3 = 7, which is
less than F(A-D) = 11.
16. In this case, choosing the path A-BC is more cost effective than A-D.
But this will only happen if the algorithm explores this path as well. According to the
algorithm, it will not explore this path once a solution is found (here we have just done
it to see how the other path can also be correct).
So it is not guaranteed in all cases, but in some cases the algorithm will still get the
optimal solution.
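The cost comparison at node A can be reproduced with a small Python sketch. The graph structure is reconstructed from the prose above (an OR branch A-D and an AND branch A-BC), with unit edge costs and the red heuristic values; treat it as an illustration of one AO* cost calculation, not a full AO* implementation.

# Heuristic values (the red numbers in the example above).
h = {'B': 6, 'C': 12, 'D': 10, 'E': 4, 'F': 4, 'G': 5, 'H': 7, 'I': 1, 'J': 1}

# Alternatives from A: an OR branch to D alone, and an AND branch to B and C.
alternatives = {'A': [('D',), ('B', 'C')]}

def cost(children):
    """f = sum over the AND set of (unit edge cost + child's heuristic)."""
    return sum(1 + h[c] for c in children)

for option in alternatives['A']:
    print(option, cost(option))
# ('D',)      11   -> F(A-D)  = 1 + 10         = 11
# ('B', 'C')  20   -> F(A-BC) = 1 + 6 + 1 + 12 = 20, so AO* picks A-D first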

12 Discuss Min-Max search method.

 Mini-max algorithm is a recursive or backtracking algorithm which is used in


decision-making and game theory. It provides an optimal move for the player,
assuming that the opponent is also playing optimally.
 Mini-Max algorithm uses recursion to search through the game-tree.
 Min-Max algorithm is mostly used for game playing in AI, such as Chess,
Checkers, Tic-Tac-Toe, Go, and various other two-player games. This algorithm
computes the minimax decision for the current state.
 In this algorithm two players play the game; one is called MAX and the other is
called MIN.
 The players fight it out so that the opponent gets the minimum benefit
while they themselves get the maximum benefit.
 Both players of the game are opponents of each other, where MAX will select
the maximized value and MIN will select the minimized value.
 The minimax algorithm performs a depth-first search for the
exploration of the complete game tree.
 The minimax algorithm proceeds all the way down to the terminal nodes of the
tree, then backtracks up the tree as the recursion unwinds.
Working of Min-Max Algorithm:
 The working of the minimax algorithm can be easily described using an
example. Below we have taken an example of game-tree which is representing
the two-player game.
 In this example, there are two players: one is called Maximizer and the other is
called Minimizer.
 Maximizer will try to get the Maximum possible score, and Minimizer will
try to get the minimum possible score.
 This algorithm applies DFS, so in this game-tree, we have to go all the way
through the leaves to reach the terminal nodes.
 At the terminal nodes, the terminal values are given, so we will compare those
values and backtrack up the tree until the initial state is reached. The following are
the main steps involved in solving the two-player game tree:
Step-1: In the first step, the algorithm generates the entire game-tree and applies the
utility function to get the utility values for the terminal states. In the below tree
diagram, let's take A as the initial state of the tree. Suppose the maximizer takes the
first turn, which has a worst-case initial value of -infinity, and the minimizer takes the
next turn, which has a worst-case initial value of +infinity.

Step 2: Now, first we find the utility values for the Maximizer. Its initial value is -∞,
so we will compare each value in the terminal state with the initial value of the
Maximizer and determine the higher node values. It will find the maximum among them all.
 For node D: max(-1, -∞) => max(-1, 4) = 4
 For node E: max(2, -∞) => max(2, 6) = 6
 For node F: max(-3, -∞) => max(-3, -5) = -3
 For node G: max(0, -∞) => max(0, 7) = 7
Step 3: In the next step, it's the minimizer's turn, so it will compare all node values
with +∞ and will find the 3rd-layer node values.
 For node B= min(4,6) = 4
 For node C= min (-3, 7) = -3

Step 4: Now it's the Maximizer's turn, and it will again choose the maximum of all
node values and find the maximum value for the root node. In this game tree, there
are only 4 layers, hence we immediately reach the root node; but in real games,
there will be more than 4 layers.
 For node A max(4, -3)= 4
That was the complete workflow of the minimax two player game.
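The same computation can be written as a short recursive Python sketch; the nested lists encode the example tree's leaf utilities exactly as worked through above.

def minimax(node, maximizing):
    if isinstance(node, int):                  # terminal node: return its utility
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

tree = [[[-1, 4], [2, 6]],                     # B = min(D, E) = min(4, 6) = 4
        [[-3, -5], [0, 7]]]                    # C = min(F, G) = min(-3, 7) = -3
print(minimax(tree, True))                     # A = max(4, -3) = 4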
Properties of Mini-Max algorithm:
 Complete- The Min-Max algorithm is complete. It will definitely find a solution
(if one exists) in a finite search tree.
 Optimal- The Min-Max algorithm is optimal if both opponents are playing
optimally.
 Time complexity- As it performs DFS for the game-tree, the time
complexity of the Min-Max algorithm is O(b^m), where b is the branching factor of
the game-tree and m is the maximum depth of the tree.
 Space Complexity- The space complexity of the Mini-max algorithm is similar
to that of DFS, which is O(bm).
Limitation of the minimax Algorithm:
o The main drawback of the minimax algorithm is that it gets really slow for
complex games such as Chess, Go, etc. These types of games have a huge
branching factor, and the player has lots of choices to decide between.
o This limitation of the minimax algorithm can be improved upon using alpha-beta
pruning, which we have discussed in the next topic.

13 Discuss Alpha-Beta cutoffs procedure in game playing.


 Alpha-beta pruning is a modified version of the minimax algorithm. It is an
optimization technique for the minimax algorithm.
 As we have seen in the minimax search algorithm, the number of game
states it has to examine is exponential in the depth of the tree. We cannot
eliminate the exponent, but we can cut it in half. Hence there is a technique
by which, without checking each node of the game tree, we can compute the
correct minimax decision, and this technique is called pruning. It involves
two threshold parameters, Alpha and Beta, for future expansion, so it is called
alpha-beta pruning. It is also called the Alpha-Beta Algorithm.
 Alpha-beta pruning can be applied at any depth of a tree, and sometimes it not
only prunes the tree leaves but also entire sub-trees.
 The two parameters can be defined as:
a. Alpha: The best (highest-value) choice we have found so far at any point
along the path of Maximizer. The initial value of alpha is -∞.
b. Beta: The best (lowest-value) choice we have found so far at any point along
the path of Minimizer. The initial value of beta is +∞.
 Alpha-beta pruning applied to a standard minimax algorithm returns the same
move as the standard algorithm does, but it removes all the nodes which do not
really affect the final decision but make the algorithm slow. Hence, by
pruning these nodes, it makes the algorithm fast.
The main condition required for alpha-beta pruning is:
1. α >= β

Working of Alpha-Beta Pruning:


Let's take an example of a two-player search tree to understand the working of alpha-
beta pruning.
Step 1: At the first step, the Max player will make the first move from node A, where
α = -∞ and β = +∞. These values of alpha and beta are passed down to node B, where
again α = -∞ and β = +∞, and node B passes the same values to its child D.

Step 2: At node D, the value of α will be calculated, as it is Max's turn. The value of
α is compared first with 2 and then with 3, and max(2, 3) = 3 will be the value of α
at node D; the node value will also be 3.
Step 3: Now the algorithm backtracks to node B, where the value of β will change, as
this is Min's turn. Now β = +∞ will be compared with the available subsequent node
values, i.e. min(∞, 3) = 3; hence at node B now α = -∞ and β = 3.
In the next step, the algorithm traverses the next successor of node B, which is node E,
and the values α = -∞ and β = 3 will also be passed down.

Step 4: At node E, Max will take its turn, and the value of alpha will change. The
current value of alpha is compared with 5, so max(-∞, 5) = 5; hence at node E,
α = 5 and β = 3, where α >= β, so the right successor of E will be pruned, and the
algorithm will not traverse it; the value at node E will be 5.

Step 5: At the next step, the algorithm again backtracks the tree, from node B to node A.
At node A, the value of alpha will be changed; the maximum available value is 3, as
max(-∞, 3) = 3, and β = +∞. These two values are now passed to the right successor of A,
which is node C.
At node C, α = 3 and β = +∞, and the same values will be passed on to node F.
Step 6: At node F, the value of α will again be compared, first with the left child, which
is 0, giving max(3, 0) = 3, and then with the right child, which is 1, giving max(3, 1) = 3;
α remains 3, but the node value of F becomes 1.

Step 7: Node F returns the node value 1 to node C. At C, α = 3 and β = +∞; here the
value of beta will be changed: it is compared with 1, so min(∞, 1) = 1. Now at C,
α = 3 and β = 1, and again the condition α >= β is satisfied, so the next child of C, which
is G, will be pruned, and the algorithm will not compute the entire sub-tree G.
Step 8: C now returns the value 1 to A, and the best value for A is max(3, 1) = 3.
The final game tree shows which nodes were computed and which were never
computed. Hence the optimal value for the maximizer is 3 for this example.
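A minimal Python sketch of the same procedure. The leaf values under D and F match the trace above; the pruned leaves (the right child of E and the sub-tree G) are assumed placeholder values, since the trace never examines them.

def alphabeta(node, maximizing, alpha=float('-inf'), beta=float('inf')):
    if isinstance(node, int):                  # terminal node
        return node
    if maximizing:
        value = float('-inf')
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:
                break                          # prune: Min will never allow this
        return value
    value = float('inf')
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:
            break                              # prune: Max already has better
    return value

tree = [[[2, 3], [5, 9]],                      # at E, the leaf 9 is pruned
        [[0, 1], [7, 5]]]                      # the sub-tree G ([7, 5]) is pruned
print(alphabeta(tree, True))                   # optimal value at A = 3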

14 Discuss the different approaches to knowledge representation.

Relational Knowledge:
 The simplest way to represent declarative facts is as a set of relations of the
same sort used in the database system.
 Provides a framework to compare two objects based on equivalent attributes.
 Any instance in which two different objects are compared is a relational type
of knowledge.
 The reason that this representation is simple is that, standing alone, it provides
very weak inferential capabilities; but knowledge represented in this form may
serve as the input to more powerful inference engines.
 The table below shows a simple way to store facts.
o The facts about a set of objects are put systematically in columns.
o This representation provides little opportunity for inference.

 Given the facts, it is not possible to answer a simple question such as: “Who is
the heaviest player?”
 But if a procedure for finding the heaviest player is provided, then these facts will
enable that procedure to compute an answer.
 We can ask things like who “bats left” and “throws right”.

Inheritable Knowledge:
 Here, the knowledge elements inherit attributes from their parents.
 The knowledge is embodied in the design hierarchies found in the functional,
physical and process domains.
 Within the hierarchy, elements inherit attributes from their parents, but in
many cases not all attributes of the parent elements can be prescribed to the child
elements.
 The inheritance is a powerful form of inference, but not adequate.
 The basic KR (Knowledge Representation) needs to be augmented with
inference mechanism.
 In order to support property inheritance, objects must be organized into
classes, and classes must be arranged into a generalization hierarchy.
 Figure below shows some additional baseball knowledge inserted into a
structure that is so arranged.
 Boxed nodes — objects and values of attributes of objects.
 Lines represent attributes.
 Arrows — point from object to its value.
 This structure is known as a slot and filler structure, semantic network or a
collection of frames.

Steps to retrieve a value V for attribute A of an instance object O:


 Find the object O in the knowledge base.
 If there is a value there for the attribute A, report it.
 Otherwise, look for a value of the instance attribute; if there is none, fail.
 Otherwise, go to the node corresponding to that value and look for a
value for the attribute A. If one is found, report it.
 Otherwise, repeat until there is no value for the isa attribute or until an answer is
found:
1. Get the value of the isa attribute and move to that node.
2. See if there is a value for the attribute A. If there is, report it.
This procedure is simple. It does not say what we should do if there is more than one
value of the instance or isa attribute.
We can apply above procedure to our example knowledge base to derive answer to
the following questions
1. team(Pee-Wee-Reese) = Brooklyn-Dodgers. This attribute had a value stored
explicitly in the knowledge base.
2. bats(Three-Fingers-Brown) = Right. To get a value for the attribute
bats required going up the isa hierarchy to the class Baseball-player. But
what we found there was not a value but a rule for computing a value. This
rule required another value (that for handed) as input. So the entire process
had to begin again recursively to find a value for handed. This time, it was
necessary to go all the way up to Person to discover that the default value for
handed for a person is Right. Now the rule for bats can be applied,
producing the result Right.
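A minimal Python sketch of this retrieval procedure over a toy slot-and-filler knowledge base. The frames are assumptions based on the baseball example; the rule for computing "bats" from "handed" is stored as a string but not executed, to keep the sketch short.

kb = {
    'Person':          {'handed': 'Right'},
    'Baseball-player': {'isa': 'Person',
                        'bats': 'same as handed'},        # a rule, not a value
    'Pee-Wee-Reese':   {'instance': 'Baseball-player',
                        'team': 'Brooklyn-Dodgers'},
}

def get_value(obj, attr):
    """Look for attr locally, then climb instance/isa links upward."""
    node = obj
    while node is not None:
        frame = kb.get(node, {})
        if attr in frame:
            return frame[attr]
        node = frame.get('instance') or frame.get('isa')  # climb the hierarchy
    return None

print(get_value('Pee-Wee-Reese', 'team'))     # Brooklyn-Dodgers (stored locally)
print(get_value('Pee-Wee-Reese', 'handed'))   # Right (inherited from Person)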

Inferential Knowledge:
 This knowledge generates new information from the given information.
 This new information does not require further data gathering from source, but
does require analysis of the given information to generate new knowledge.
 Example: given a set of relations and values, one may infer other values or
relations. A predicate logic (a mathematical deduction) is used to infer from
a set of attributes. Inference through predicate logic uses a set of logical
operations to relate individual data.
Represent knowledge as formal logic:
All dogs have tails ∀x: dog(x) → hastail(x)
Advantages:
 A set of strict rules.
 Can be used to derive more facts.
 Truths of new statements can be verified.
 Guaranteed correctness.
Many inference procedures are available to implement the standard rules of logic
popular in AI systems, e.g. automated theorem proving.

Procedural Knowledge:
A representation in which the control information, to use the knowledge, is embedded
in the knowledge itself. For example, computer programs, directions, and recipes;
these indicate specific use or implementation;
Knowledge is encoded in some procedures, small programs that know how to do
specific things, how to proceed.
Advantages:
 Heuristic or domain specific knowledge can be represented.
 Extended logical inferences, such as default reasoning, are facilitated.
 Side effects of actions may be modeled. Some rules may become false in time.
Keeping track of this in large systems may be tricky.
Disadvantages:
 Completeness — not all cases may be represented.
 Consistency — not all deductions may be correct. e.g. if we know that Fred
is a bird we might deduce that Fred can fly. Later we might discover that Fred
is an emu.
 Modularity is sacrificed. Changes in knowledge base might have far-reaching
effects.
 Cumbersome control information.

15 Explain with example how choosing the granularity of representation and


finding the right structure are crucial issues in knowledge representation.

The fundamental goal of Knowledge Representation is to facilitate inferencing


(conclusions) from knowledge.
The issues that arise while using KR techniques are many. Some of these are
explained below,
Choosing Granularity
 At what level should the knowledge be represented, and what are the
primitives?
 Should there be a small number of high-level facts or a large number of
low-level primitives?
 High-level facts may not be adequate for inference, while low-level
primitives may require a lot of storage.
 Example of Granularity:
Suppose we are interested in the following fact:
John spotted Sue.
This could be represented as
Spotted(agent(John), object(Sue))
Such a representation would make it easy to answer questions such as:
Who spotted Sue?
Suppose we want to know:
Did John see Sue?
Given only the one fact, we cannot discover the answer.
We can add other facts, such as
Spotted(x, y) → Saw(x, y)
We can now infer the answer to the question.

Finding Right structure


 Given a large amount of knowledge stored in a database, how can the relevant
parts be accessed when they are needed?
 This is about access to right structure for describing a particular situation.
 This requires selecting an initial structure and then revising the choice.
 While doing so, it is necessary to solve following problems:
1. How to perform an initial selection of the most appropriate structure.
2. How to fill in appropriate details from the current situations.
3. How to find a better structure if the one chosen initially turns out not
to be appropriate.
4. What to do if none of the available structures is appropriate.
5. When to create and remember a new structure.
 There is no good, general purpose method for solving all these problems.
Some knowledge representation techniques solve some of these issues.

16 Explain the Forward and Backward Reasoning with example.


A. Forward Chaining:
Forward chaining is also known as forward deduction or forward reasoning
when using an inference engine. Forward chaining is a form of reasoning which starts
with atomic sentences in the knowledge base and applies inference rules (Modus
Ponens) in the forward direction to extract more data until a goal is reached.
The forward-chaining algorithm starts from known facts, triggers all rules whose
premises are satisfied, and adds their conclusions to the known facts. This process
repeats until the problem is solved.
Properties of Forward-Chaining:
 It is a bottom-up approach, as it moves from the bottom to the top.
 It is a process of making a conclusion based on known facts or data,
starting from the initial state and reaching the goal state.
 The forward-chaining approach is also called data-driven, as we reach the
goal using the available data.
 The forward-chaining approach is commonly used in expert systems, such as
CLIPS, business, and production rule systems.
Consider the following famous example which we will use in both approaches:
Example:
"As per the law, it is a crime for an American to sell weapons to enemy nations.
Country Nono, an enemy of America, has some missiles, and all the missiles were
sold to it by Colonel, who is an American citizen."
Prove that "Colonel is criminal."
To solve the above problem, first, we will convert all the above facts into first-order
definite clauses, and then we will use a forward-chaining algorithm to reach the goal.
Facts Conversion into FOL(First Order Logic):
 It is a crime for an American to sell weapons to enemy nations. (Let's say x,
y, and z are variables)
American (x) 𝖠 Weapon(y) 𝖠 sells (x, y, z) 𝖠 Enemy(z,America) →
Criminal(x) ...(1)
 Country Nono has some missiles. ∃x: Owns(Nono, x) 𝖠 Missile(x). It can
be written as two definite clauses, as below:
Owns(Nono, x) .......................... (2)
Missile(x) ............................. (3)
 All of the missiles were sold to country Nono by Colonel.
∀x: Missile(x) 𝖠 Owns(Nono, x) → Sells(Colonel, x, Nono) .......... (4)
 Missiles are weapons.
Missile(x) → Weapon(x) ................. (5)
 Country Nono is an enemy of America.
Enemy(Nono, America) ................... (6)
 Colonel is American.
American(Colonel) ...................... (7)
Forward chaining proof:
Step-1:
In the first step we will start with the known facts and will choose the sentences which
do not have implications, such as: American(Colonel), Enemy(Nono, America),
Owns(Nono, x), and Missile(x). All these facts will be represented as below.
Step-2:
At the second step, we will look at those facts which can be inferred from the available
facts with satisfied premises. (Make sure not to treat as given facts those clauses which
appear on the R.H.S. of a rule in the FOL above; for example, the Sells clause is present
on the L.H.S. of rule (1), but it is derived from rule (4), hence it is not a given fact.)
Rule (1) does not have its premises satisfied, so it will not be added in the first iteration.
Rules (2) and (3) are already added.
Rule (4) is satisfied with the substitution {y/x}, so Sells(Colonel, x, Nono) is added,
which is inferred from the conjunction of rules (2) and (3).
Rule (5) is satisfied, so Weapon(x) is added.
Step-3:
At step 3, we can see that rule (1) is satisfied with the substitution {x/Colonel, y/x,
z/Nono}, so we can add Criminal(Colonel), which is inferred from all the available
facts. Hence we have reached our goal statement.

Hence it is proved that Colonel is a criminal using the forward chaining approach.
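A minimal Python sketch of forward chaining. To stay short it compresses the first-order clauses above into propositional symbols (so the variable substitutions disappear); the loop keeps firing every rule whose premises are all known until nothing new can be added.

rules = [
    ({'Missile'}, 'Weapon'),                                   # rule (5)
    ({'Missile', 'OwnsNono'}, 'SellsColonelNono'),             # rule (4)
    ({'American', 'Weapon', 'SellsColonelNono', 'EnemyNono'},
     'Criminal'),                                              # rule (1)
]
facts = {'American', 'EnemyNono', 'OwnsNono', 'Missile'}

changed = True
while changed:
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)          # the rule fires: add its conclusion
            changed = True

print('Criminal' in facts)                 # True: the goal has been derived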

B. Backward Chaining:
Backward-chaining is also known as a backward deduction or backward reasoning
method when using an inference engine. A backward chaining algorithm is a form of
reasoning, which starts with the goal and works backward, chaining through rules to
find known facts that support the goal.
Properties of backward chaining:
 It is known as a top-down approach.
 Backward-chaining is based on modus ponens inference rule.
 In backward chaining, the goal is broken into sub-goal or sub-goals to prove
the facts true.
 It is called a goal-driven approach, as a list of goals decides which rules are
selected and used.
 Backward -chaining algorithm is used in game theory, automated theorem
proving tools, inference engines, proof assistants, and various AI applications.
 The backward-chaining method mostly used a depth-first search strategy for
proof.
Example:
In backward-chaining, we will use the same above example, and will rewrite all the
rules.
 American(x) 𝖠 Weapon(y) 𝖠 Sells(x, y, z) 𝖠 Enemy(z, America) →
Criminal(x) .......... (1)
 Owns(Nono, x) .......... (2)
 Missile(x) .......... (3)
 Missile(x) 𝖠 Owns(Nono, x) → Sells(Colonel, x, Nono) .......... (4)
 Missile(x) → Weapon(x) .......... (5)
 Enemy(Nono, America) .......... (6)
 American(Colonel) .......... (7)
Backward-Chaining proof:
In Backward chaining, we will start with our goal predicate, which is
Criminal(Colonel), and then infer further rules.
Step-1:
At the first step, we take the goal fact Criminal(Colonel). From the goal fact we
infer other facts as sub-goals, and at last we prove those sub-goals true.
Step-2:
At the second step, we infer from the goal fact the other facts that satisfy the
rules. In Rule-(1), the goal predicate Criminal(Colonel) appears with the
substitution {x/Colonel}, so we add all the conjunctive premises below the first
level, replacing x with Colonel. Here American(Colonel) is a known fact, so it is
proved immediately.
Step-3:
At step-3, we expand the sub-goal Weapon(y): by Rule-(5) it reduces to the sub-goal
Missile(y), and the known fact Missile(M1) proves it with the substitution {y/M1}.

Step-4:
At step-4, the sub-goal Sells(Colonel, M1, z) is reduced by Rule-(4) to the
sub-goals Missile(M1) and Owns(Nono, M1), with the substitution {z/Nono}. Both are
known facts, so they are proved, and hence all the statements are proved true using
backward chaining.
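Below is a matching backward-chaining sketch in Python, under the same illustrative
encoding as the forward-chaining sketch above (lowercase symbols are variables; M1
is the introduced constant). It is a depth-first prover that unifies each goal
against known facts and against rule conclusions, as in the steps above; a sketch,
not a production implementation:

import itertools

fresh = itertools.count()

def is_var(t):
    return isinstance(t, str) and t.islower()

def walk(t, s):
    # follow variable bindings to the current value of a term
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    """Unify two terms (symbols or tuples) under substitution s; None on failure."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def rename(term, suffix):
    # standardize apart: give rule variables a unique suffix per rule use
    if is_var(term):
        return term + suffix
    if isinstance(term, tuple):
        return tuple(rename(t, suffix) for t in term)
    return term

def prove(goals, facts, rules, s):
    """Yield every substitution under which all goals hold (depth-first)."""
    if not goals:
        yield s
        return
    first, rest = goals[0], goals[1:]
    for fact in facts:                        # try the known facts
        s2 = unify(first, fact, s)
        if s2 is not None:
            yield from prove(rest, facts, rules, s2)
    for premises, conclusion in rules:        # try the rules backward
        suffix = "_" + str(next(fresh))
        s2 = unify(first, rename(conclusion, suffix), s)
        if s2 is not None:
            subgoals = [rename(p, suffix) for p in premises] + rest
            yield from prove(subgoals, facts, rules, s2)

facts = [("American", "Colonel"), ("Enemy", "Nono", "America"),
         ("Owns", "Nono", "M1"), ("Missile", "M1")]
rules = [
    ([("Missile", "x"), ("Owns", "Nono", "x")],
     ("Sells", "Colonel", "x", "Nono")),
    ([("Missile", "x")], ("Weapon", "x")),
    ([("American", "x"), ("Weapon", "y"), ("Sells", "x", "y", "z"),
      ("Enemy", "z", "America")], ("Criminal", "x")),
]
print(any(True for _ in prove([("Criminal", "Colonel")], facts, rules, {})))  # True

prove works goal-first: Criminal(Colonel) unifies with the conclusion of rule (1),
whose premises become the sub-goals traced in steps 2 to 4.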
17 Discuss Bayes' theorem.
Bayes' theorem is also known as Bayes' rule, Bayes' law, or Bayesian reasoning,
which determines the probability of an event with uncertain knowledge.
In probability theory, it relates the conditional probability and marginal probabilities
of two random events.
Bayes' theorem was named after the British mathematician Thomas Bayes. The
Bayesian inference is an application of Bayes' theorem, which is fundamental to
Bayesian statistics.
It is a way to calculate the value of P(B|A) with the knowledge of P(A|B).
Bayes' theorem allows updating the probability prediction of an event by observing
new information of the real world.
Example: If the chance of cancer is related to one's age, then by using Bayes'
theorem we can determine the probability of cancer more accurately with the help
of age.
Bayes' theorem can be derived using the product rule and the conditional
probability of event A given event B. From the product rule we can write:
P(A ∧ B) = P(A|B) P(B)
Similarly, the probability of event B with known event A:
P(A ∧ B) = P(B|A) P(A)
Equating the right-hand sides of both equations, we get:
P(A|B) = P(B|A) P(A) / P(B) ...(a)
The above equation (a) is called Bayes' rule or Bayes' theorem. This equation is
the basis of most modern AI systems for probabilistic inference.
It shows the simple relationship between joint and conditional probabilities. Here,
P(A|B) is known as the posterior, which we need to calculate; it reads as the
probability of hypothesis A after observing evidence B.
P(B|A) is called the likelihood: the probability of the evidence given that the
hypothesis is true.
P(A) is called the prior probability: the probability of the hypothesis before
considering the evidence.
P(B) is called the marginal probability: the probability of the evidence alone.
In equation (a), P(B) can in general be expanded over hypotheses as
P(B) = Σi P(Ai) P(B|Ai), hence Bayes' rule can be written as:
P(Ai|B) = P(B|Ai) P(Ai) / Σk P(Ak) P(B|Ak)
where A1, A2, A3, ..., An is a set of mutually exclusive and exhaustive events.
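As a quick numeric illustration of this expanded form, the following sketch uses
made-up priors and likelihoods (the numbers are hypothetical, not from the text)
and checks that the posteriors over the exhaustive hypotheses sum to 1:

# Three mutually exclusive and exhaustive hypotheses A1, A2, A3
priors = [0.5, 0.3, 0.2]          # P(Ai); these sum to 1
likelihoods = [0.9, 0.5, 0.1]     # P(B | Ai)

p_b = sum(p * l for p, l in zip(priors, likelihoods))            # P(B)
posteriors = [p * l / p_b for p, l in zip(priors, likelihoods)]  # P(Ai | B)
print(posteriors, sum(posteriors))   # the posteriors sum to 1.0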
Applying Bayes' rule:
Bayes' rule allows us to compute the single term P(B|A) in terms of P(A|B), P(B), and
P(A). This is very useful in cases where we have a good probability of these three
terms and want to determine the fourth one. Suppose we observe the effect of some
unknown cause and want to compute that cause; then Bayes' rule becomes:
P(cause|effect) = P(effect|cause) P(cause) / P(effect)
Example-1:
Question: What is the probability that a patient has meningitis, given that the
patient has a stiff neck?
Given Data:
A doctor is aware that disease meningitis causes a patient to have a stiff neck, and it
occurs 80% of the time. He is also aware of some more facts, which are given as
follows:
 The Known probability that a patient has meningitis disease is 1/30,000.
 The Known probability that a patient has a stiff neck is 2%.
Let a be the proposition that the patient has a stiff neck and b the proposition
that the patient has meningitis. Then we can state:
P(a|b) = 0.8
P(b) = 1/30000
P(a) = 0.02
Applying Bayes' rule: P(b|a) = P(a|b) P(b) / P(a) = (0.8 × 1/30000) / 0.02
= 1/750 ≈ 0.0013.
Hence, we can assume that 1 patient out of 750 patients has meningitis disease with
a stiff neck.
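The same computation as a short Python check, with the values taken from the
example above:

# P(meningitis | stiff neck) via Bayes' rule
p_a_given_b = 0.8        # P(stiff neck | meningitis)
p_b = 1 / 30000          # P(meningitis)
p_a = 0.02               # P(stiff neck)

p_b_given_a = p_a_given_b * p_b / p_a
print(p_b_given_a)       # 0.00133..., i.e. about 1 in 750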
18 Discuss Bayesian network and its application.
"A Bayesian network is a probabilistic graphical model which represents a set of
variables and their conditional dependencies using a directed acyclic graph."
It is also called a Bayes network, belief network, decision network, or Bayesian
model.
Bayesian networks are probabilistic, because these networks are built from a
probability distribution, and also use probability theory for prediction and anomaly
detection.
Real world applications are probabilistic in nature, and to represent the relationship
between multiple events, we need a Bayesian network. It can also be used in various
tasks including prediction, anomaly detection, diagnostics, automated insight,
reasoning, time series prediction, and decision making under uncertainty.
Bayesian Network can be used for building models from data and expert opinions,
and it consists of two parts:
 Directed Acyclic Graph
 Table of conditional probabilities.
The generalized form of a Bayesian network that represents and solves decision
problems under uncertain knowledge is known as an influence diagram.
A Bayesian network graph is made up of nodes and arcs (directed links), where:
 Each node corresponds to a random variable, which can be continuous or
discrete.
 Arcs (directed arrows) represent the causal relationships or conditional
probabilities between random variables. These directed links connect pairs of
nodes in the graph. A link means that one node directly influences the other;
if there is no directed link between two nodes, they are independent of each
other.
o For example, consider a network graph whose nodes represent the random
variables A, B, C, and D.
o If node B is connected with node A by a directed arrow, then node A is
called the parent of node B.
o A node C with no link from node A is independent of node A.
The Bayesian network has mainly two components:
 Causal Component
 Actual numbers
Each node in the Bayesian network has a conditional probability distribution
P(Xi | Parents(Xi)), which determines the effect of the parents on that node.
Bayesian network is based on Joint probability distribution and conditional
probability. So let's first understand the joint probability distribution:
Joint probability distribution:
If we have variables x1, x2, x3, ..., xn, then the probabilities of the different
combinations of x1, x2, x3, ..., xn are known as the joint probability distribution
P[x1, x2, x3, ..., xn]. By the chain rule it can be written as:
P[x1, x2, x3, ..., xn] = P[x1 | x2, x3, ..., xn] P[x2, x3, ..., xn]
= P[x1 | x2, x3, ..., xn] P[x2 | x3, ..., xn] ... P[xn-1 | xn] P[xn]
In general, for each variable Xi in a Bayesian network we can write:
P(Xi | Xi-1, ..., X1) = P(Xi | Parents(Xi))
Explanation of Bayesian network:
Let's understand the Bayesian network through an example by creating a directed
acyclic graph:
Example: Harry installed a new burglar alarm at his home to detect burglary. The
alarm reliably responds to a burglary but also responds to minor earthquakes. Harry
has two neighbors, David and Sophia, who have taken the responsibility to inform
Harry at work when they hear the alarm. David always calls Harry when he hears the
alarm, but sometimes he confuses the phone ringing with the alarm and calls then
too. On the other hand, Sophia likes to listen to loud music, so sometimes she
misses the alarm. With this evidence, we would like to compute the probability of
the following alarm scenario.
Problem:
Calculate the probability that the alarm has sounded, but neither a burglary nor
an earthquake has occurred, and both David and Sophia called Harry.
Solution:
 The Bayesian network for the above problem is given below. The network
structure shows that Burglary and Earthquake are the parent nodes of Alarm,
directly affecting the probability of the alarm going off, while David's and
Sophia's calls depend only on the alarm.
 The network thus encodes our assumptions: the neighbors do not perceive the
burglary directly, do not notice minor earthquakes, and do not confer before
calling.
 The conditional distribution for each node is given as a conditional
probability table, or CPT.
 Each row in a CPT must sum to 1, because its entries represent an exhaustive
set of cases for the variable.
 In a CPT, a Boolean variable with k Boolean parents contains 2^k probability
entries (one per combination of parent values). Hence, if there are two parents,
the CPT will contain 4 probability values.
List of all events occurring in this network:
 Burglary (B)
 Earthquake (E)
 Alarm (A)
 David Calls (D)
 Sophia Calls (S)
We can write the events of the problem statement in the form of probabilities as
P[D, S, A, B, E], and rewrite this using the joint probability distribution:
P[D, S, A, B, E] = P[D | S, A, B, E] P[S, A, B, E]
= P[D | S, A, B, E] P[S | A, B, E] P[A, B, E]
= P[D | A] P[S | A, B, E] P[A, B, E]
= P[D | A] P[S | A] P[A | B, E] P[B, E]
= P[D | A] P[S | A] P[A | B, E] P[B | E] P[E]
Let's take the observed probabilities for the Burglary and Earthquake components:
P(B=True) = 0.002, the probability of a burglary.
P(B=False) = 0.998, the probability of no burglary.
P(E=True) = 0.001, the probability of a minor earthquake.
P(E=False) = 0.999, the probability that no earthquake occurred.
We can provide the conditional probabilities as per the below tables:
Conditional probability table for Alarm A:
The conditional probability of Alarm A depends on Burglary and Earthquake:

B       E       P(A=True)   P(A=False)
True    True    0.94        0.06
True    False   0.95        0.05
False   True    0.31        0.69
False   False   0.001       0.999
Conditional probability table for David Calls:
The conditional probability that David calls depends on the state of the alarm:

A       P(D=True)   P(D=False)
True    0.91        0.09
False   0.05        0.95
Conditional probability table for Sophia Calls:
The conditional probability that Sophia calls depends on its parent node, Alarm:

A       P(S=True)   P(S=False)
True    0.75        0.25
False   0.02        0.98
From the formula of joint distribution, we can write the problem statement in the form
of probability distribution:
P(S, D, A, ¬B, ¬E) = P(S|A) * P(D|A) * P(A|¬B ∧ ¬E) * P(¬B) * P(¬E)
= 0.75* 0.91* 0.001* 0.998*0.999
= 0.00068045.
Hence, a Bayesian network can answer any query about the domain by using the
joint distribution.
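As a sketch, the joint probability above can be evaluated directly from the CPTs,
and any other query can be answered by summing the joint distribution over the
unobserved variables. The dictionary encoding and the example marginal P(D=True)
below are illustrative choices:

from itertools import product

# CPTs from the tables above, stored as P(variable = True | parents)
P_B_TRUE = 0.002
P_E_TRUE = 0.001
P_A_TRUE = {(True, True): 0.94, (True, False): 0.95,
            (False, True): 0.31, (False, False): 0.001}   # keyed by (B, E)
P_D_TRUE = {True: 0.91, False: 0.05}                      # keyed by A
P_S_TRUE = {True: 0.75, False: 0.02}                      # keyed by A

def bern(p_true, value):
    """Probability that a Boolean variable takes `value`, given P(True)."""
    return p_true if value else 1.0 - p_true

def joint(d, s, a, b, e):
    # P(D,S,A,B,E) = P(D|A) P(S|A) P(A|B,E) P(B) P(E), as derived above
    return (bern(P_D_TRUE[a], d) * bern(P_S_TRUE[a], s) *
            bern(P_A_TRUE[(b, e)], a) * bern(P_B_TRUE, b) * bern(P_E_TRUE, e))

# The query from the problem statement: P(S, D, A, ¬B, ¬E)
print(joint(d=True, s=True, a=True, b=False, e=False))    # ~0.00068045

# Example of another query: the marginal probability that David calls
p_d = sum(joint(True, s, a, b, e)
          for s, a, b, e in product([True, False], repeat=4))
print(p_d)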
19 Requirements for a knowledge representation system:
A good knowledge representation system must possess the following properties.
o Representational Accuracy:
A major property of a knowledge representation system is that its representation
is adequate, i.e., it can represent all the knowledge the AI system requires to
deal with a particular field or domain.
o Inferential Adequacy:
The KR system should have the ability to manipulate the representational
structures to produce new knowledge corresponding to the existing structures.
o Inferential Efficiency:
The ability to direct the inferential mechanism into the most productive
directions by storing appropriate guides.
o Acquisitional Efficiency:
The ability to acquire new knowledge automatically, helping the AI add to its
current knowledge and consequently become increasingly smarter and more
productive.
20 Cycle of Knowledge Representation in AI.
Artificial Intelligent Systems usually consist of various components to display their
intelligent behavior. Some of these components include:
 Perception
 Learning
 Knowledge Representation & Reasoning
 Planning
 Execution
The following example shows how the different components of an AI system interact
with the real world to display intelligent behavior.
 The Perception component retrieves data or information from the environment.
With the help of this component, the system can gather data from its
surroundings, find out the sources of noises, and check whether it has been
damaged by anything. It also defines how to respond when any sense has been
detected.
 Then there is the Learning component, which learns from the data captured by
the Perception component. The goal is to build computers that can be taught
instead of being programmed. Learning focuses on the process of self-
improvement; in order to learn new things, the system requires knowledge
acquisition, inference, acquisition of heuristics, faster searches, etc.
 The main component in the cycle is Knowledge Representation and
Reasoning, which shows human-like intelligence in the machines.
Knowledge representation is all about understanding intelligence. Instead of
trying to understand or build brains from the bottom up, its goal is to
understand and build intelligent behavior from the top-down and focus on
what an agent needs to know in order to behave intelligently. Also, it defines
how automated reasoning procedures can make this knowledge available as
needed.
 The Planning and Execution components depend on the analysis of knowledge
representation and reasoning. Here, planning involves taking an initial state,
considering the preconditions and effects of the available actions, and finding
a sequence of actions that achieves a state in which a particular goal holds.
Once planning is completed, the final stage is the execution of the entire
process.
21. What are different fuzzy set operations?
22. Explain the difference between Boolean and Fuzzy Set membership using a suitable
example.
23. With neat diagram explain fuzzy inference process. Explain its each component in
brief.
24. What are the types of Fuzzy inference system? What are the main differences between
them?
25. Explain the Mamdani fuzzy inference system with an example.
26. Give an example of the Sugeno inference model.
27. What is the main difference between probability and fuzzy logic?
28. Discuss various defuzzification methods.
29. Discuss various ANN applications.
30. Explain ANN.
31. Explain Rosenblatt’s perceptron model.
32. Explain Multilayer Perceptron NN with neat diagram.
33. Discuss main disadvantage of single layer perceptron.
34. How Does Back-Propagation in Artificial Neural Networks Work?
35. Explain different types of activation functions.