
CS2351 Artificial Intelligence

A Course Material on

Artificial Intelligence


Mrs. J.Justina Princy Thilagavathy







This is to certify that the e-course material

Subject Code : CS2351

Subject : Artificial Intelligence

Class : III Year CSE

has been prepared by me and that it meets the knowledge requirements of the university curriculum.

Signature of the Author

Name: J. Justina Princy Thilagavathy

Designation: Assistant Professor

This is to certify that the course material prepared by Mrs. J. Justina Princy Thilagavathy is of adequate
quality. She has referred to more than five books, at least one of which is by a foreign author.

Signature of HOD

Name: Mrs. P. Murugapriya

Head & AP





1 Introduction

2 Agents

3 Problem formulation

4 uninformed search strategies

5 heuristics, informed search strategies

6 constraint satisfaction


7 Logical agents, Propositional logic

8 inferences

9 first-order logic, inferences in first order logic

10 forward chaining

11 backward chaining

12 unification, Resolution


13 Planning with state-space search

14 partial Order Planning

15 Planning graphs, planning and acting in the real world



16 Uncertainty , Review of probability

17 probabilistic Reasoning

18 Bayesian networks

19 inferences in Bayesian networks, Temporal models

20 Hidden Markov models


21 Learning from observation

22 Inductive learning

23 Decision trees

24 Explanation based learning

25 Statistical Learning methods

26 Reinforcement Learning


A Glossary

B Question bank

C Previous year question papers




Aim: To learn the basics of designing intelligent agents that can solve general purpose problems,
represent and process knowledge, plan and act, reason under uncertainty and can learn from experiences.


Introduction – Agents – Problem formulation – uninformed search strategies – heuristics
– informed search strategies – constraint satisfaction


Logical agents – propositional logic – inferences – first-order logic – inferences in first-order logic
– forward chaining – backward chaining – unification – resolution


Planning with state-space search – partial-order planning – planning graphs – planning and acting in
the real world


Uncertainty – review of probability – probabilistic reasoning – Bayesian networks –
inferences in Bayesian networks – Temporal models – Hidden Markov models

Learning from observation – Inductive learning – Decision trees – Explanation-based learning –
Statistical Learning methods – Reinforcement Learning


1. S. Russell and P. Norvig, “Artificial Intelligence – A Modern Approach”, Second
Edition, Pearson Education, 2003.


1. David Poole, Alan Mackworth, Randy Goebel, “Computational Intelligence: A Logical
Approach”, Oxford University Press, 2004.
2. G. Luger, “Artificial Intelligence: Structures and Strategies for Complex Problem Solving”,
Fourth Edition, Pearson Education, 2002.
3. N. J. Nilsson, “Artificial Intelligence: A New Synthesis”, Elsevier Publishers, 1998.





The objective of Artificial Intelligence is to understand how a system can perceive, understand, predict and
manipulate a world far larger and more complicated than itself. The field of Artificial Intelligence also attempts
to build intelligent entities.


Artificial Intelligence is the study of how to make computers do things at which, at the moment,
people are better.


• Building systems that think like humans

“The exciting new effort to make computers think … machines with minds, in the full and
literal sense” -- Haugeland, 1985

“The automation of activities that we associate with human thinking, … such as
decision-making, problem solving, learning, …” -- Bellman, 1978

• Building systems that act like humans

“The art of creating machines that perform functions that require intelligence when
performed by people” -- Kurzweil, 1990

“The study of how to make computers do things at which, at the moment, people
are better” -- Rich and Knight, 1991

• Building systems that think rationally

“The study of mental faculties through the use of computational models” -- Charniak
and McDermott, 1985

“The study of the computations that make it possible to perceive, reason, and act” --
Winston, 1992

• Building systems that act rationally

“A field of study that seeks to explain and emulate intelligent behavior in terms of
computational processes” -- Schalkoff, 1990

“The branch of computer science that is concerned with the automation of
intelligent behavior” -- Luger and Stubblefield, 1993

Agent = perceive + act
• Thinking
• Reasoning
• Planning

Agent: an entity in a program or environment capable of generating action.

An agent uses perception of the environment to make decisions about actions to take. The
perception capability is usually called a sensor. The actions can depend on the most recent perception or
on the entire history (percept sequence).

An agent is anything that can be viewed as perceiving its environment through sensors and acting
upon the environment through actuators.
Ex: Robotic agent
Human agent



Agents interact with environment through sensors and actuators.



Percept sequence | Action

[A, Clean] | Right
[A, Dirty] | Suck
[B, Clean] | Left
[B, Dirty] | Suck
[A, Clean], [A, Clean] | Right
[A, Clean], [A, Dirty] | Suck

Fig: Partial tabulation of a simple agent function for the vacuum-cleaner world

Agent Function

1. The agent function is a mathematical function that maps a sequence of perceptions into an action.

2. The function is implemented as the agent program.

3. The part of the agent that carries out an action is called an actuator.

4. Environment → sensors → agent function → actuators → environment


A rational agent is one that can take the right decision in every situation.

Performance measure: a set of criteria/test bed for the success of the agent's behavior.

The performance measures should be based on the desired effect of the agent on
the environment.


The agent's rational behavior depends on:

1. the performance measure that defines success

2. the agent's prior knowledge of the environment

3. the actions that it is capable of performing

4. the agent's percept sequence to date.

Definition: for every possible percept sequence, the agent is expected to take an
action that will maximize its performance measure.

Agent Autonomy:

An agent is omniscient if it knows the actual outcome of its actions. This is not possible in
practice. An environment can sometimes be completely known in advance.

Exploration: sometimes an agent must perform an action simply to gather information.

Autonomy: the capacity to compensate for partial or incorrect prior knowledge (usually by learning).

Task environment – the problem to which the agent is a solution. It includes (PEAS):
Performance measure, Environment, Actuators, Sensors




Agent Type | Performance Measure | Environment | Actuators | Sensors

Taxi driver | Safe, fast, legal, comfort, maximize profits | Roads, other traffic, pedestrians, customers | Steering, accelerator, brake, signal, horn | Camera, sonar, GPS, speedometer, keyboard, etc.

Medical diagnosis system | Healthy patient, minimize costs, lawsuits | Patient, hospital, staff | Screen display (questions, tests, diagnoses, treatments, referrals) | Keyboard (entry of symptoms, findings, patient's answers)

Properties of Task Environment:

• Fully Observable (vs. Partly Observable)

– Agent sensors give complete state of the environment at each point in time

– Sensors detect all the aspects that are relevant to the choice of action.

– An environment might be partially observable because of noisy and inaccurate sensors or
because parts of the state are simply missing from the sensor data.

• Deterministic (vs. Stochastic)

– Next state of the environment is completely determined by the current state
and the action executed by the agent

– Strategic environment (if the environment is deterministic except for the actions
of other agents)

• Episodic (vs. Sequential)

– The agent's experience can be divided into episodes; in each episode the agent perceives
and then performs a single action

• The next episode does not depend on the actions taken in previous episodes

– In a sequential environment, the current decision can affect all future decisions

• Static (vs. Dynamic)

– Environment doesn’t change as the agent is deliberating

– Semi dynamic

• Discrete (vs. Continuous)

– Applies to the state of the environment, to the way time is handled, and to the percepts and actions of the agent

• Chess game : discrete

• Taxi driving : continuous

• Single Agent (vs. Multi Agent)

– Competitive, cooperative multi-agent environments

– Communication is a key issue in multi agent environments.

Partially Observable:

Ex: An automated taxi cannot see what other drivers are thinking.

Stochastic:
Ex: Taxi driving is clearly stochastic in this sense, because one can never predict the
behavior of the traffic exactly.

Semidynamic:

If the environment itself does not change with the passage of time but the agent's
performance score does, the environment is called semidynamic.

Single Agent vs. Multi-agent:

An agent solving a crossword puzzle by itself is clearly in a single-agent environment.

An agent playing chess is in a two-agent environment.


Example of Task Environments and Their Classes

Four types of agents:

1. Simple reflex agent

2. Model-based reflex agent

3. Goal-based agent

4. Utility-based agent

Simple reflex agent

A simple reflex agent (SRA) works only if the correct decision can be made on the basis of the
current percept alone, that is, only if the environment is fully observable.


– no plan, no goal
– do not know what they want to achieve

– do not know what they are doing

Condition-action rule

– If condition then action

Ex: medical diagnosis system.

Algorithm Explanation:

INTERPRET-INPUT:

This function generates an abstracted description of the current state from the percept.

RULE-MATCH:

This function returns the first rule in the set of rules that matches the given state description.

RULE-ACTION:

The selected rule is executed as the action for the given percept.
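The three steps can be sketched in Python for the two-square vacuum world. The function names and the particular rule set are illustrative assumptions, not part of the original algorithm; the point is that the chosen action depends only on the current percept.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, str]
Rule = Tuple[Callable[[State], bool], str]  # (condition, action)

# Condition-action rules, checked in order (illustrative rule set).
RULES: List[Rule] = [
    (lambda s: s["status"] == "Dirty", "Suck"),
    (lambda s: s["location"] == "A", "Right"),
    (lambda s: s["location"] == "B", "Left"),
]


def interpret_input(percept: Tuple[str, str]) -> State:
    """INTERPRET-INPUT: abstract description of the current state from the percept."""
    location, status = percept
    return {"location": location, "status": status}


def rule_match(state: State, rules: List[Rule]) -> Rule:
    """RULE-MATCH: return the first rule whose condition matches the state."""
    for rule in rules:
        condition, _ = rule
        if condition(state):
            return rule
    raise ValueError("no rule matches the state")


def simple_reflex_agent(percept: Tuple[str, str]) -> str:
    state = interpret_input(percept)
    _, action = rule_match(state, RULES)  # RULE-ACTION: the matched rule's action
    return action


print(simple_reflex_agent(("A", "Dirty")))  # Suck
print(simple_reflex_agent(("B", "Clean")))  # Left
```

Because rules are tried in order, "Suck" takes priority over movement whenever the current square is dirty.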

Model-Based Reflex Agents:


An agent which combines the current percept with the old internal state to
generate updated description of the current state.

If the world is not fully observable, the agent must remember observations about the
parts of the environment it cannot currently observe.

This usually requires an internal representation of the world (or internal state).

Since this representation is a model of the world, we call this model-based agent.

Ex: Braking problem


1. Reflex agent with internal state

2. Sensors do not provide the complete state of the world

3. Must keep its internal state

Updating the internal state requires two kinds of knowledge:

1. How the world evolves

2. How the agent's actions affect the world

Algorithm Explanation:

UPDATE-STATE: This function is responsible for creating the new internal state description.
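A minimal Python sketch of a model-based reflex agent for the vacuum world is shown below. The method update_state plays the role of the state-update step (UPDATE-STATE in Russell and Norvig); the class name, the "NoOp" action and the assumption that dirt never reappears (so "how the world evolves" is trivial) are illustrative choices of this sketch.

```python
from typing import Dict, Optional, Tuple


class ModelBasedVacuumAgent:
    """Reflex agent that keeps an internal model of both squares (illustrative sketch)."""

    def __init__(self) -> None:
        # Internal state: last known status of each square (None = not yet observed).
        self.model: Dict[str, Optional[str]] = {"A": None, "B": None}

    def update_state(self, percept: Tuple[str, str]) -> str:
        """Combine the new percept with the old internal state; return the location."""
        location, status = percept
        self.model[location] = status
        return location

    def __call__(self, percept: Tuple[str, str]) -> str:
        location = self.update_state(percept)
        if self.model[location] == "Dirty":
            # Knowledge of how the action affects the world: after Suck the square is clean.
            self.model[location] = "Clean"
            return "Suck"
        if all(status == "Clean" for status in self.model.values()):
            return "NoOp"  # both squares known to be clean
        return "Right" if location == "A" else "Left"


agent = ModelBasedVacuumAgent()
print(agent(("A", "Dirty")))  # Suck
print(agent(("A", "Clean")))  # Right
print(agent(("B", "Clean")))  # NoOp
```

Unlike the simple reflex agent, this agent can stop once it remembers that the square it cannot currently see is also clean.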

Goal-based agents:

The agent has a purpose and the action to be taken depends on the current state
and on what it tries to accomplish (the goal).

In some cases the goal is easy to achieve. In others it involves planning, sifting through a
search space for possible solutions, developing a strategy.



– Action depends on the goal. (consideration of future)

– e.g. path finding


– Fundamentally different from the condition-action rule.

– Search and Planning

– Solving “car-braking” problem?

– Yes, possible … but not likely natural.

• Appears less efficient.

Utility-based agents

If one state is preferred over another, then it has higher utility for the agent.

Utility-Function(state) = real number (degree of happiness)


The agent is aware of a utility function that estimates how close the current state is to the
agent's goal.

• Characteristics

– Generates high-quality behavior

– Maps the internal states to real numbers (e.g., game playing)

• Chooses actions that lead to states with higher utility values, as measured by the utility function
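The following is a minimal sketch of a utility-based agent in Python, assuming a hypothetical grid world: the agent predicts the state each action leads to (its model of the world) and picks the action whose resulting state has the highest utility. The grid, the goal square and the choice of negated Manhattan distance as the utility function are all assumptions made for illustration.

```python
from typing import Dict, Tuple

Position = Tuple[int, int]
ACTIONS: Dict[str, Position] = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
GOAL: Position = (3, 0)  # hypothetical goal square


def result(state: Position, action: str) -> Position:
    """World model: the state reached by applying an action."""
    dx, dy = ACTIONS[action]
    return (state[0] + dx, state[1] + dy)


def utility(state: Position) -> float:
    """Map a state to a real number: negated Manhattan distance to GOAL."""
    return -float(abs(state[0] - GOAL[0]) + abs(state[1] - GOAL[1]))


def utility_based_agent(state: Position) -> str:
    # Pick the action whose predicted outcome has the highest utility.
    return max(ACTIONS, key=lambda a: utility(result(state, a)))


print(utility_based_agent((0, 0)))  # right
print(utility_based_agent((3, 2)))  # down
```

A goal-based agent only distinguishes goal states from non-goal states; the utility function instead grades every state, so the agent can compare actions even when none of them reaches the goal immediately.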


Learning Agents

Agents capable of acquiring new competence through observations and
actions. A learning agent has the following components:

Learning element

Uses feedback from the critic to suggest modifications to the existing rules

Performance element

Collection of knowledge and procedures for selecting the driving actions

Choice depends on the learning element

Critic

Observes the world and passes information to the learning element

Problem generator

Identifies certain areas of behavior that need improvement and
suggests experiments


Agent Example

A file manager agent.

Sensors: commands like ls, du, pwd.

Actuators: commands like tar, gzip, cd, rm, cp, etc.

Purpose: compress and archive files that have not been used in a while.

Environment: fully observable (but partially observed), deterministic (strategic),

episodic, dynamic, discrete.

Problem Formulation

• Problem formulation is the process of deciding what actions and states to consider,
given a goal

Formulate goal, formulate problem


Four components of problem definition

– Initial state – that the agent starts in

– Possible Actions

• Uses a successor function

– Returns a set of <action, successor> pairs


• State space – the state space forms a graph in which the nodes are
states and the arcs between nodes are actions.

• Path – a sequence of states connected by a sequence of actions

– Goal Test – which determine whether a given state is goal state

– Path cost – function that assigns a numeric cost to each path.


• Route finding

• Touring (traveling salesman)

• Logistics

• VLSI layout

• Robot navigation

• Learning



Example-1 : Vacuum World

Problem Formulation

• States

– $2 \times 2^2 = 8$ states

– Formula: $n \cdot 2^n$ states (for n squares)

• Initial State

– Any one of 8 states

• Successor Function

– Legal states that result from three actions (Left, Right, Suck)

• Goal Test

– All squares are clean

• Path Cost

– Number of steps (each step costs a value of 1)
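The formulation above can be written down directly in Python. In this sketch (an assumed encoding, not one given in the notes) a state is a tuple (agent_location, dirt_in_A, dirt_in_B), the successor function returns <action, successor> pairs for Left, Right and Suck, and every step costs 1.

```python
from itertools import product
from typing import List, Tuple

State = Tuple[str, bool, bool]  # (agent location, dirt in A?, dirt in B?)

# 2 locations x 2^2 dirt configurations = 8 states.
STATES: List[State] = [(loc, a, b) for loc, a, b in product("AB", (True, False), (True, False))]


def successors(state: State) -> List[Tuple[str, State]]:
    """Return <action, successor> pairs for the actions Left, Right and Suck."""
    loc, dirt_a, dirt_b = state
    suck = ("A", False, dirt_b) if loc == "A" else ("B", dirt_a, False)
    return [
        ("Left", ("A", dirt_a, dirt_b)),
        ("Right", ("B", dirt_a, dirt_b)),
        ("Suck", suck),
    ]


def goal_test(state: State) -> bool:
    _, dirt_a, dirt_b = state
    return not dirt_a and not dirt_b  # all squares are clean


STEP_COST = 1  # each step costs 1, so path cost = number of steps

print(len(STATES))  # 8
```

Moving Left in square A (or Right in square B) leaves the state unchanged, which is why those arcs appear as self-loops in the state-space diagram.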


State Space for the Vacuum World.

Labels on Arcs denote L: Left, R: Right, S: Suck


• Uninformed strategies use only the information available in the problem definition

– Also known as blind searching

– Uninformed search methods:

• Breadth-first search

• Uniform-cost search

• Depth-first search

• Depth-limited search

• Iterative deepening search



The root node is expanded first, then all the successors of the root node are expanded, then their successors, and so on.

• Expand the shallowest unexpanded node

• Place all new successors at the end of a FIFO queue
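A minimal breadth-first search sketch in Python, assuming the graph is given as a dictionary of successor lists. The example graph is hypothetical (the figure for the route-finding example is not reproduced here), and this version keeps a set of already generated states so repeated states are not re-added.

```python
from collections import deque
from typing import Dict, List, Optional

Graph = Dict[str, List[str]]


def breadth_first_search(graph: Graph, start: str, goal: str) -> Optional[List[str]]:
    frontier = deque([[start]])  # FIFO queue of paths
    reached = {start}            # states already generated (avoids repeated states)
    while frontier:
        path = frontier.popleft()  # shallowest unexpanded node
        node = path[-1]
        if node == goal:
            return path
        for succ in graph.get(node, []):
            if succ not in reached:
                reached.add(succ)
                frontier.append(path + [succ])  # new successors go to the end of the queue
    return None


# Hypothetical graph (the example figure is not reproduced here).
GRAPH: Graph = {"S": ["A", "B", "C"], "A": ["D"], "B": ["G"], "C": ["G"], "D": ["G"]}
print(breadth_first_search(GRAPH, "S", "G"))  # ['S', 'B', 'G']
```

Every path in the queue is stored in full, which makes the memory cost of breadth-first search visible: the frontier holds a whole level of the tree at once.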



Properties of Breadth-First Search

• Complete

– Yes if b (max branching factor) is finite

• Time

– $1 + b + b^2 + \cdots + b^d + b(b^d - 1) = O(b^{d+1})$

– exponential in d

• Space

– $O(b^{d+1})$

– Keeps every node in memory

– This is the big problem; an agent that generates nodes at 10 MB/sec will fill about
860 GB in 24 hours (10 MB/s × 86,400 s = 864,000 MB)

• Optimal

– Yes (if cost is 1 per step); not optimal in general

Lessons from Breadth First Search

• The memory requirements are a bigger problem for breadth-first search than is
execution time

• Exponential-complexity search problems cannot be solved by uninformed methods
for any but the smallest instances

Ex: Route finding problem


Task: Find the route from S to G using BFS.


Step 2:


Answer: The goal is found at depth level 2; the path is SBG (or SCG).

Time complexity


Expand one node down to the deepest level of the tree. If a dead end occurs, backtrack to the
nearest previous node that still has unexpanded successors.

• Expand the deepest unexpanded node

• Unexplored successors are placed on a stack until fully explored

• Enqueue nodes in LIFO (last-in, first-out) order; that is, use a stack data structure to
order nodes.

• It has a modest memory requirement.

• It needs to store only a single path from the root to a leaf node, along with the
remaining unexpanded sibling nodes for each node on the path

• Backtracking search uses even less memory.
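A minimal depth-first search sketch in Python, using an explicit LIFO stack as described above. The graph is hypothetical, and the check against the current path (rather than a global visited set) is the "avoid repeated states along the path" modification listed under the properties of DFS.

```python
from typing import Dict, List, Optional

Graph = Dict[str, List[str]]


def depth_first_search(graph: Graph, start: str, goal: str) -> Optional[List[str]]:
    stack = [[start]]  # LIFO stack of paths
    while stack:
        path = stack.pop()  # deepest unexpanded node
        node = path[-1]
        if node == goal:
            return path
        # Push successors in reverse so the first-listed successor is expanded first.
        for succ in reversed(graph.get(node, [])):
            if succ not in path:  # avoid repeated states along the current path
                stack.append(path + [succ])
    return None


# Hypothetical graph (the example figure is not reproduced here).
GRAPH: Graph = {"S": ["A", "B", "C"], "A": ["D"], "B": ["G"], "C": ["G"], "D": ["G"]}
print(depth_first_search(GRAPH, "S", "G"))  # ['S', 'A', 'D', 'G']
```

On this graph breadth-first search returns the shallower S-B-G, while DFS commits to the first branch and returns S-A-D-G, illustrating why DFS is not optimal.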



Properties of Depth-First Search

• Complete

– No: fails in infinite-depth spaces, spaces with loops

• Modify to avoid repeated states along the path

– Yes: in finite spaces

• Time

– $O(b^m)$

– Not great if m is much larger than d

– But if the solutions are dense, this may be faster than breadth-first search

• Space

– $O(bm)$ … linear space

• Optimal

– No

• When search hits a dead-end, can only back up one level at a time even if the
“problem” occurs because of a bad operator choice near the top of the tree.
Hence, only does “chronological backtracking”


• If more than one solution exists, or the number of levels is high, then DFS is a good choice,
because only a small portion of the search space is explored.


• Not guaranteed to find a solution.

Example: Route finding problem

Given problem:

Task: Find a route from A to B

Step 1:

Step 2:


Step 3:


Step 4:


Answer: The path found at depth level 3 is SADG.

A cutoff (the maximum depth level) is introduced in this search technique to overcome the
disadvantage of depth-first search in unbounded trees. The cutoff value depends on the number of states. DLS
can be implemented as a simple modification to the general tree-search algorithm or to the
recursive DFS algorithm; it imposes a fixed depth limit on DFS.

A variation of depth-first search that uses a depth limit

– Alleviates the problem of unbounded trees

– Search to a predetermined depth l (“ell”)

– Nodes at depth l have no successors

• Same as depth-first search if l = ∞

• Can terminate with two kinds of failure

Standard failure: indicates no solution

Cut off: indicates no solution within the depth limit
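A recursive depth-limited search can be sketched in Python as follows; it distinguishes the two kinds of failure by returning None for standard failure and a "cutoff" sentinel when the depth limit was hit. The five-state map is hypothetical (the example figure is not reproduced), chosen so the answer matches the path ABDE discussed below.

```python
from typing import Dict, List, Union

Graph = Dict[str, List[str]]
CUTOFF = "cutoff"  # sentinel: no solution within the depth limit
Result = Union[List[str], str, None]  # a path, CUTOFF, or None (standard failure)


def depth_limited_search(graph: Graph, node: str, goal: str, limit: int) -> Result:
    if node == goal:
        return [node]
    if limit == 0:
        return CUTOFF  # nodes at depth l are treated as having no successors
    cutoff_occurred = False
    for succ in graph.get(node, []):
        result = depth_limited_search(graph, succ, goal, limit - 1)
        if result == CUTOFF:
            cutoff_occurred = True
        elif result is not None:
            return [node] + result
    return CUTOFF if cutoff_occurred else None


# Hypothetical map with five states.
MAP: Graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
print(depth_limited_search(MAP, "A", "E", 4))  # ['A', 'B', 'D', 'E']
print(depth_limited_search(MAP, "A", "E", 2))  # cutoff
print(depth_limited_search(MAP, "A", "Z", 4))  # None
```

With limit 2 the goal at depth 3 is never reached, so the result is a cutoff rather than a standard failure; with limit 4 the whole (finite) map is searched and an absent goal gives standard failure.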

Properties of Depth-Limited Search

• Complete

– Yes if l ≥ d; incomplete if l < d

• Time

– $O(b^l)$

• Space

– $O(bl)$

• Optimal

– No; in particular, not optimal if l > d


• Cut off level is introduced in DFS Technique.


• No guarantee to find the optimal solution.

E.g.: Route finding problem




The number of states in the given map is five. So it is possible to get the goal state at the
maximum depth of four. Therefore the cut off value is four.

Task: find a path from A to E.

1. 2. 3. 4.





Answer: Path = ABDE Depth=3



• Iterative deepening depth-first search is a strategy that sidesteps the issue of choosing the
best depth limit by trying all possible depth limits

Uses depth-first search

Finds the best depth limit

Gradually increases the depth limit; 0, 1, 2, … until a goal is found
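Iterative deepening simply calls depth-limited search with limits 0, 1, 2, and so on. The Python sketch below is self-contained, so it repeats a compact DLS; the max_depth bound is an assumption added only to guarantee termination in this sketch, and the graph is hypothetical.

```python
from typing import Dict, List, Optional, Union

Graph = Dict[str, List[str]]
CUTOFF = "cutoff"


def dls(graph: Graph, node: str, goal: str, limit: int) -> Union[List[str], str, None]:
    """Depth-limited search: a path, None (no solution) or CUTOFF."""
    if node == goal:
        return [node]
    if limit == 0:
        return CUTOFF
    cutoff = False
    for succ in graph.get(node, []):
        result = dls(graph, succ, goal, limit - 1)
        if result == CUTOFF:
            cutoff = True
        elif result is not None:
            return [node] + result
    return CUTOFF if cutoff else None


def iterative_deepening_search(graph: Graph, start: str, goal: str,
                               max_depth: int = 50) -> Optional[List[str]]:
    # Try depth limits 0, 1, 2, ... (max_depth is only a safety bound for this sketch).
    for depth in range(max_depth + 1):
        result = dls(graph, start, goal, depth)
        if result != CUTOFF:
            return result  # a path, or None if the whole space was searched
    return None


# Hypothetical graph: DFS would reach G via A-B-D-G first; IDS finds the shallower A-F-G.
GRAPH: Graph = {"A": ["B", "C", "F"], "B": ["D", "E"], "C": ["H"], "D": ["G"], "F": ["G"]}
print(iterative_deepening_search(GRAPH, "A", "G"))  # ['A', 'F', 'G']
```

Each iteration uses only depth-first memory, yet because shallow limits are tried first, the first solution returned is a shallowest one, as in breadth-first search.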

Iterative Lengthening Search:

The idea is to use increasing path-cost limit instead of increasing depth limits. The
resulting algorithm called iterative lengthening search.



Properties of Iterative Deepening


• Complete

– Yes

• Time: $N(\mathrm{IDS}) = (d)b + (d-1)b^2 + \cdots + (1)b^d$

– $O(b^d)$

• Space

– $O(bd)$

• Optimal

– Yes if step cost = 1

– Can be modified to explore uniform cost tree



• This method is preferred for large state space and when the depth of the search
is not known.

• Memory requirements are modest.

• Like BFS it is complete



Many states are expanded multiple times.

Lessons from Iterative Deepening Search

• If branching factor is b and solution is at depth d, then nodes at depth d are

generated once, nodes at depth d-1 are generated twice, etc.

– Hence $b^d + 2b^{d-1} + \cdots + d\,b \le b^d / (1 - 1/b)^2 = O(b^d)$.

– If b = 4, then the worst case is $1.78 \times 4^d$, i.e., 78% more nodes searched
than exist at depth d (in the worst case).

• Faster than BFS even though IDS generates repeated states

– BFS generates nodes up to level d+1

– IDS only generates nodes up to level d

• In general, iterative deepening search is the preferred uninformed search

method when there is a large search space and the depth of the solution is
not known
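The node counts behind these lessons can be checked with a few lines of Python. The sketch counts generated nodes excluding the root, as in the N(IDS) formula above; b = 10 and d = 5 are illustrative values.

```python
def nodes_ids(b: int, d: int) -> int:
    """N(IDS) = (d)b + (d-1)b^2 + ... + (1)b^d: depth-i nodes are generated d - i + 1 times."""
    return sum((d - i + 1) * b ** i for i in range(1, d + 1))


def nodes_bfs(b: int, d: int) -> int:
    """b + b^2 + ... + b^d + (b^(d+1) - b): BFS also generates most of level d + 1."""
    return sum(b ** i for i in range(1, d + 1)) + (b ** (d + 1) - b)


def ids_overhead_bound(b: int) -> float:
    """Worst-case ratio 1 / (1 - 1/b)^2 of IDS nodes to the b^d nodes at depth d."""
    return 1 / (1 - 1 / b) ** 2


print(nodes_ids(10, 5))                 # 123450
print(nodes_bfs(10, 5))                 # 1111100
print(round(ids_overhead_bound(4), 2))  # 1.78
```

Even though IDS regenerates the upper levels several times, for b = 10 and d = 5 it generates roughly a ninth as many nodes as BFS, because BFS pays for almost a whole extra level.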

Example: Route finding problem





Task: Find a path from A to G.



Answer: Since IDS tries the lowest depth limits first, the shallowest solution path A-F-G is selected.

It is a strategy that simultaneously searches in both directions, that is, forward from the
initial state and backward from the goal state, and stops when the two searches meet
in the middle.

• Alternate searching from the start state toward the goal and from the goal state
toward the start.

• Stop when the frontiers intersect.

• Works well only when there are unique start and goal states.

• Requires the ability to generate “predecessor” states.

• Can (sometimes) lead to finding a solution more quickly.
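A minimal bidirectional search sketch in Python, assuming an undirected graph so that the predecessors of a state are simply its neighbours. It alternates one breadth-first level from each side and stops as soon as a newly generated state has already been reached by the other search. This basic version returns a path when the frontiers meet; it does not add the extra bookkeeping some implementations use to prove the meeting point is optimal.

```python
from collections import deque
from typing import Deque, Dict, List, Optional

Graph = Dict[str, List[str]]
Parents = Dict[str, Optional[str]]


def bidirectional_search(graph: Graph, start: str, goal: str) -> Optional[List[str]]:
    """Alternate BFS levels from start and from goal until the frontiers meet."""
    if start == goal:
        return [start]
    parents_f: Parents = {start: None}  # forward search tree
    parents_b: Parents = {goal: None}   # backward search tree
    frontier_f: Deque[str] = deque([start])
    frontier_b: Deque[str] = deque([goal])

    def expand_level(frontier: Deque[str], parents: Parents, other: Parents) -> Optional[str]:
        # Expand every node currently in this frontier (one BFS level).
        for _ in range(len(frontier)):
            node = frontier.popleft()
            for nbr in graph.get(node, []):
                if nbr not in parents:
                    parents[nbr] = node
                    frontier.append(nbr)
                    if nbr in other:  # the two searches meet here
                        return nbr
        return None

    while frontier_f and frontier_b:
        meet = expand_level(frontier_f, parents_f, parents_b)
        if meet is None:
            meet = expand_level(frontier_b, parents_b, parents_f)
        if meet is not None:
            path: List[str] = []
            node: Optional[str] = meet
            while node is not None:  # walk back from the meeting point to start
                path.append(node)
                node = parents_f[node]
            path.reverse()
            node = parents_b[meet]
            while node is not None:  # walk forward from the meeting point to goal
                path.append(node)
                node = parents_b[node]
            return path
    return None


CHAIN: Graph = {"S": ["A"], "A": ["S", "B"], "B": ["A", "G"], "G": ["B"]}
print(bidirectional_search(CHAIN, "S", "G"))  # ['S', 'A', 'B', 'G']
```

Each search only has to reach about half the solution depth, which is where the $O(b^{d/2})$ bounds come from; the price is storing both search trees to detect the meeting point.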

Properties of Bidirectional Search:

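1. Time Complexity: $O(b^{d/2})$

2. Space Complexity: $O(b^{d/2})$

3. Complete: Yes (if b is finite and both searches are breadth-first)

4. Optimal: Yes (if step costs are all identical and both searches are breadth-first)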

Advantage:

Reduces time complexity and space complexity.


Disadvantages:

The space requirement is the most significant weakness of bidirectional search. If the two
searches do not meet at all, the search becomes complicated. In backward search,
calculating predecessors is a difficult task. If more than one goal state exists, multiple
backward searches are required explicitly.


• Completeness

– Will a solution always be found if one exists?

• Time

– How long does it take to find the solution?

– Often represented as the number of nodes searched

• Space

– How much memory is needed to perform the search?

– Often represented as the maximum number of nodes stored at once

• Optimal

– Will the optimal (least cost) solution be found?

• Time and space complexity are measured in

– b – maximum branching factor of the search tree

– m – maximum depth of the state space

– d – depth of the least cost solution



Heuristic / Informed

It uses additional information about nodes (heuristics) that have not yet been explored to
decide which nodes to examine next.

• Uses problem-specific knowledge

• Can find solutions more efficiently than search strategies that do not use domain-specific knowledge

• Can find solutions even when there is limited time available

General approach of informed search:

• Best-first search: a node is selected for expansion based on an evaluation function f(n)

Idea: the evaluation function measures distance to the goal.

* Choose the node that appears best

• Best-first search algorithms differ in the evaluation function

– The evaluation function incorporates the problem-specific knowledge in the form of h(n)

– h(n), the heuristic function, is a component of f(n): the estimated cost of the cheapest path
from n to the goal node

• h(n) = 0 if n is the goal node


• The fringe is a queue sorted in decreasing order of desirability.

• Special cases: greedy search, A* search


• Expands the node that appears to be closest to the goal

• Consider the route-finding problem in Romania

– Use of $h_{SLD}$, the straight-line distance heuristic

– Evaluation function f(n) = h(n) (heuristic), estimate of cost from n to goal

A best-first search that uses h(n) to select the next node to expand is called greedy search.



For the given graph and estimated costs, the start state is A and the goal state is B.
Apply the evaluation function h(n) to find a path from A to B.

From F, the goal state B is reached. Therefore, the path from A to B found by greedy search is A-S-F-B,
with cost 140 + 99 + 211 = 450. This is the problem of finding a route from Arad (A) to Bucharest (B)
via Sibiu (S) and Fagaras (F).
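The greedy search on the Romania map can be sketched in Python with a priority queue ordered by h(n) alone. The sketch includes only part of the map; the road lengths and straight-line distances to Bucharest are the values from Russell and Norvig's textbook example, and the function name is an assumption of this sketch.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# Part of the Romania road map (km) and straight-line distances to Bucharest (h_SLD).
ROADS: Dict[str, Dict[str, int]] = {
    "Arad": {"Zerind": 75, "Sibiu": 140, "Timisoara": 118},
    "Zerind": {"Arad": 75, "Oradea": 71},
    "Oradea": {"Zerind": 71, "Sibiu": 151},
    "Timisoara": {"Arad": 118, "Lugoj": 111},
    "Lugoj": {"Timisoara": 111},
    "Sibiu": {"Arad": 140, "Oradea": 151, "Fagaras": 99, "Rimnicu Vilcea": 80},
    "Fagaras": {"Sibiu": 99, "Bucharest": 211},
    "Rimnicu Vilcea": {"Sibiu": 80, "Pitesti": 97},
    "Pitesti": {"Rimnicu Vilcea": 97, "Bucharest": 101},
    "Bucharest": {"Fagaras": 211, "Pitesti": 101},
}
H_SLD: Dict[str, int] = {
    "Arad": 366, "Zerind": 374, "Oradea": 380, "Timisoara": 329, "Lugoj": 244,
    "Sibiu": 253, "Fagaras": 176, "Rimnicu Vilcea": 193, "Pitesti": 100, "Bucharest": 0,
}


def greedy_best_first_search(start: str, goal: str) -> Optional[Tuple[List[str], int]]:
    """Expand the frontier node with the smallest h(n); return (path, path cost)."""
    counter = 0  # tie-breaker so heapq never compares paths
    frontier = [(H_SLD[start], counter, start, [start], 0)]
    explored = set()
    while frontier:
        _, _, city, path, cost = heapq.heappop(frontier)
        if city == goal:
            return path, cost
        if city in explored:
            continue
        explored.add(city)
        for nbr, dist in ROADS[city].items():
            if nbr not in explored:
                counter += 1
                heapq.heappush(frontier, (H_SLD[nbr], counter, nbr, path + [nbr], cost + dist))
    return None


print(greedy_best_first_search("Arad", "Bucharest"))
# (['Arad', 'Sibiu', 'Fagaras', 'Bucharest'], 450)
```

For comparison, the route Arad-Sibiu-Rimnicu Vilcea-Pitesti-Bucharest in the same map costs 140 + 80 + 97 + 101 = 418 km, which shows why greedy search is not optimal: it never looks at the cost already travelled.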

• Completeness: No (compare depth-first search)

- Check on repeated states

- Minimizing h(n) can result in false starts, e.g. Iasi to Fagaras.

Properties of greedy best-first search:

• Complete? No – can get stuck in loops, e.g., Iasi → Neamt → Iasi → Neamt

• Time? $O(b^m)$, but a good heuristic can give dramatic improvement

• Space? $O(b^m)$ -- keeps all nodes in memory

• Optimal? No


• Standard search problem:

– state is a "black box": any data structure that supports a successor function,
a heuristic function, and a goal test

• CSP:

– state is defined by variables $X_i$ with values from domain $D_i$

– goal test is a set of constraints specifying allowable combinations of values for
subsets of variables

– Simple example of a formal representation language

• Allows useful general-purpose algorithms with more power than standard search

Arc consistency:

1. Arc refers to a directed arc in the constraint graph.

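2. Arc consistency checking can be applied either as a preprocessing step before the beginning
of the search process, or as a propagation step after every assignment during search. In either
case, the process must be applied repeatedly until no more inconsistencies remain.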
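The repeated arc-checking process is usually implemented as the AC-3 algorithm. The Python sketch below is a minimal version under the assumption that every binary constraint is the same function ok(x, y) (here, "different colours", as in map colouring); the four-region fragment of the Australia map with WA = red and Q = green fixed is an illustrative example.

```python
from collections import deque
from typing import Callable, Dict, List, Set

Domains = Dict[str, Set[str]]
Neighbors = Dict[str, List[str]]
Constraint = Callable[[str, str], bool]


def revise(domains: Domains, xi: str, xj: str, ok: Constraint) -> bool:
    """Delete values of xi that have no supporting value in the domain of xj."""
    removed = False
    for x in set(domains[xi]):  # iterate over a copy while deleting
        if not any(ok(x, y) for y in domains[xj]):
            domains[xi].discard(x)
            removed = True
    return removed


def ac3(domains: Domains, neighbors: Neighbors, ok: Constraint) -> bool:
    """Make every directed arc consistent; return False if a domain becomes empty."""
    queue = deque((xi, xj) for xi in neighbors for xj in neighbors[xi])
    while queue:
        xi, xj = queue.popleft()
        if revise(domains, xi, xj, ok):
            if not domains[xi]:
                return False  # inconsistency detected
            for xk in neighbors[xi]:  # arcs into xi must be checked again
                if xk != xj:
                    queue.append((xk, xi))
    return True


def different(a: str, b: str) -> bool:
    return a != b  # map colouring: adjacent regions get different colours


# Fragment of the Australia map with WA = red and Q = green already fixed.
domains = {"WA": {"red"}, "NT": {"red", "green", "blue"},
           "SA": {"red", "green", "blue"}, "Q": {"green"}}
neighbors = {"WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"],
             "SA": ["WA", "NT", "Q"], "Q": ["NT", "SA"]}
print(ac3(domains, neighbors, different))  # False: NT and SA would both need blue
```

Used as a preprocessing step, AC-3 detects this dead end before any search is done; used after each assignment, it prunes the domains that the search still has to try.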

Path consistency:

Path consistency means that any consistent assignment to a pair of adjacent variables can always be
extended to a third neighboring variable.


Stronger forms of propagation can be defined using the notion called k-consistency.
A CSP is k-consistent if, for any set of k − 1 variables and for any consistent
assignment to those variables, a consistent value can always be assigned to any kth variable.

Example: Map-Coloring

• Variables WA, NT, Q, NSW, V, SA, T

• Domains Di = {red,green,blue}

• Constraints: adjacent regions must have different colors

• e.g., WA ≠ NT, or (WA,NT) in {(red,green),(red,blue),(green,red),(green,blue),(blue,red),(blue,green)}


• Solutions are complete and consistent assignments, e.g., WA = red, NT = green, Q = red,
NSW = green, V = red, SA = blue, T = green

Constraint graph

• Binary CSP: each constraint relates two variables

• Constraint graph: nodes are variables, arcs are constraints
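A minimal backtracking search for the map-colouring CSP can be sketched in Python, with the constraint graph stored as an adjacency list. This is plain chronological backtracking with a fixed variable order and value order (assumptions of this sketch, with no variable-ordering heuristics).

```python
from typing import Dict, List, Optional

# Constraint graph for the Australia map: nodes are variables, arcs are constraints.
NEIGHBORS: Dict[str, List[str]] = {
    "WA": ["NT", "SA"],
    "NT": ["WA", "SA", "Q"],
    "SA": ["WA", "NT", "Q", "NSW", "V"],
    "Q": ["NT", "SA", "NSW"],
    "NSW": ["Q", "SA", "V"],
    "V": ["SA", "NSW"],
    "T": [],
}
COLORS = ["red", "green", "blue"]
ORDER = ["WA", "NT", "Q", "NSW", "V", "SA", "T"]  # fixed variable order


def consistent(var: str, value: str, assignment: Dict[str, str]) -> bool:
    # Binary constraints: adjacent regions must have different colours.
    return all(assignment.get(n) != value for n in NEIGHBORS[var])


def backtrack(assignment: Dict[str, str]) -> Optional[Dict[str, str]]:
    if len(assignment) == len(ORDER):
        return dict(assignment)  # complete and consistent assignment
    var = next(v for v in ORDER if v not in assignment)
    for value in COLORS:
        if consistent(var, value, assignment):
            assignment[var] = value
            result = backtrack(assignment)
            if result is not None:
                return result
            del assignment[var]  # undo and try the next value
    return None


print(backtrack({}))
```

With this order the search returns the same colouring as the example solution except for T, which it colours red; since Tasmania has no neighbours in the constraint graph, any colour is consistent for it.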


Varieties of CSPs

• Discrete variables

– finite domains:

• n variables, domain size d: $O(d^n)$ complete assignments

• e.g., Boolean CSPs, including Boolean satisfiability (NP-complete)

– infinite domains:

• integers, strings, etc.

• e.g., job scheduling, variables are start/end days for each job

• need a constraint language, e.g., StartJob1 + 5 ≤ StartJob3

• Continuous variables

– e.g., start/end times for Hubble Space Telescope observations

– linear constraints solvable in polynomial time by linear programming

Varieties of constraints:

• Unary constraints involve a single variable,

– e.g., SA ≠ green

• Binary constraints involve pairs of variables,

– e.g., SA ≠ WA

• Higher-order constraints involve 3 or more variables,

– e.g., cryptarithmetic column constraints



Knowledge representation
A variety of ways of representing knowledge (facts) have been exploited in AI programs. Facts are truths
in some relevant world; these are the things we want to represent.
Propositional logic

It is a way of representing knowledge. In logic and mathematics, a propositional calculus
or logic is a formal system in which formulae representing propositions can be formed by
combining atomic propositions using logical connectives.

Sentences considered in propositional logic are not arbitrary sentences but are the ones
that are either true or false, but not both. This kind of sentences are called propositions.


Some facts in propositional logic:

It is raining. - RAINING
It is sunny - SUNNY
It is windy - WINDY

If it is raining, then it is not sunny - RAINING -> ¬SUNNY

Elements of propositional logic

Simple sentences which are true or false are basic propositions. Larger and more complex
sentences are constructed from basic propositions by combining them with connectives.
Thus propositions and connectives are the basic elements of propositional logic. Though
there are many connectives, we are going to use the following five basic connectives
here: NOT, AND, OR, IF_THEN (or IMPLY), IF_AND_ONLY_IF. They are also
denoted by the symbols ¬, ∧, ∨, ⇒ and ⇔, respectively.

Inference is deriving new sentences from old.


Modus ponens

There are standard patterns of inference that can be applied to derive chains of
conclusions that lead to the desired goal. These patterns of inference are called inference rules.

Propositions tell about the notion of truth, and they can be applied to logical reasoning. We can
have logical entailment between sentences: a sentence follows logically from another sentence.
In mathematical notation we write α ⊨ β, meaning that the sentence α entails the sentence β.

Knowledge-based agents or logical agents: the central component of a knowledge-based agent is its
knowledge base, or KB.
Informally, a knowledge base is a set of sentences. Each sentence is expressed in a language
called a knowledge representation language and represents some assertion about the world.

The syntax of propositional logic defines the allowable sentences. The atomic sentences,
the indivisible syntactic elements, consist of a single proposition symbol. Each such
symbol stands for a proposition that can be true or false. We will use uppercase names for
symbols: P, Q, R, and so on.

Complex sentences are constructed from simpler sentences using logical
connectives. There are five connectives in common use: ¬ (not), ∧ (and), ∨ (or), ⇒ (implies)
and ⇔ (if and only if).

First order Logic

Whereas propositional logic assumes the world contains facts, first-order logic (like natural language)
assumes the world contains

Objects: people, houses, numbers, colors, baseball games, wars, …

Relations: red, round, prime, brother of, bigger than, part of, comes between, …
Functions: father of, best friend, one more than, plus, …

The basic syntactic elements of first-order logic are the symbols that stand for objects,
relations, and functions. The symbols come in three kinds:

a) constant symbols, which stand for objects;

b) predicate symbols, which stand for relations;
c) and function symbols, which stand for functions.

We adopt the convention that these symbols will begin with uppercase letters.

Example: Constant symbols: Richard and John; predicate symbols: Brother, OnHead, Person, King,
and Crown; function symbol: LeftLeg.


There is need to express properties of entire collections of objects,instead of enumerating

the objects by name. Quantifiers let us do this.FOL contains two standard quantifiers

a) Universal ( ) and

b) Existential ( )

Universal quantification

( x) P(x) : means that P holds forall values of x in the domain associated with that

E.g., ( x) dolphin(x) => mammal(x)

Existential quantification

( x)P(x) means that P holds for some value of x in the domain associated with that

E.g., ( x) mammal(x) ^ lays-eggs(x)
Permits one to make a statement about some object without naming it

Explain Universal Quantifiers with an example.

Rules such as "All kings are persons,'' is written in first-order logic as

x King(x) => Person(x)

where is pronounced as “ For all ..”

Thus, the sentence says, "For all x, if x is a king, then z is a

person." The symbol x is called a variable(lower case letters)

The sentence x P,where P is a logical expression says that P is true for every object x.


Existential quantifiers with an example.

Universal quantification makes statements about every object. It is possible to make a statement about some object in the universe without naming it, by using an existential quantifier.

"King John has a crown on his head"

∃x Crown(x) ∧ OnHead(x, John)

∃x is pronounced "There exists an x such that ..." or "For some x ..."

Connection between universal and existential quantifiers

"Everyone likes ice cream" is equivalent to

"there is no one who does not like ice cream".
This can be expressed as:
∀x Likes(x, IceCream) is equivalent to ¬∃x ¬Likes(x, IceCream)
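The equivalence can be checked exhaustively on a small finite domain. This Python sketch (the three-person domain is hypothetical) tries every possible interpretation of Likes(x, IceCream) and confirms both dualities, ∀x P ≡ ¬∃x ¬P and ∃x P ≡ ¬∀x ¬P. A finite check only illustrates the law; it is not a proof for arbitrary domains.

```python
from itertools import product
from typing import Dict

PEOPLE = ["Ann", "Bob", "Cy"]   # a hypothetical three-person domain


def dualities_hold() -> bool:
    """Check, for every interpretation of Likes(x, IceCream) over PEOPLE, that
    ∀x Likes ≡ ¬∃x ¬Likes and ∃x Likes ≡ ¬∀x ¬Likes."""
    for values in product([False, True], repeat=len(PEOPLE)):   # 2**3 = 8 cases
        likes: Dict[str, bool] = dict(zip(PEOPLE, values))
        forall = all(likes[x] for x in PEOPLE)
        exists = any(likes[x] for x in PEOPLE)
        if forall != (not any(not likes[x] for x in PEOPLE)):
            return False
        if exists != (not all(not likes[x] for x in PEOPLE)):
            return False
    return True


print(dualities_hold())  # True
```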
Knowledge Engineering
Discuss the steps of knowledge engineering by applying them to any real-world application of your choice.

The general process of knowledge-base construction is called knowledge engineering. A knowledge engineer is someone who investigates a particular domain, learns what concepts are important in that domain, and creates a formal representation of the objects and relations in the domain. We will illustrate the knowledge engineering process in an electronic circuit domain that should already be fairly familiar.

The steps associated with the knowledge engineering process are :

1. Identify the task.

The task will determine what knowledge must be represented in order to connect problem instances to answers. This step is analogous to the PEAS process for designing agents.

2. Assemble the relevant knowledge. The knowledge engineer might already be an expert in the domain, or might need to work with real experts to extract what they know, a process called knowledge acquisition.

3. Decide on a vocabulary of predicates, functions, and constants. That is, translate

the important domain-level concepts into logic-level names.


Once the choices have been made, the result is a vocabulary that is known as the ontology of the domain. The word ontology means a particular theory of the nature of being or existence.

4. Encode general knowledge about the domain.

The knowledge engineer writes down the axioms for all the vocabulary terms. This pins down
(to the extent possible) the meaning of the terms, enabling the expert to check the content.
Often, this step reveals misconceptions or gaps in the vocabulary that must be fixed by returning
to step 3 and iterating through the process.

5. Encode a description of the specific problem instance.

For a logical agent, problem instances are supplied by the sensors, whereas a "disembodied"
knowledge base is supplied with additional sentences in the same way that traditional
programs are supplied with input data.

6. Pose queries to the inference procedure and get answers.

This is where the reward is: we can let the inference procedure operate on the axioms and
problem-specific facts to derive the facts we are interested in knowing.

7. Debug the knowledge base.

For example, the axiom ∀x NumOfLegs(x, 4) ⇒ Mammal(x) is false for reptiles and amphibians, so it has to be corrected.

To understand this seven-step process better, we now apply it to an extended example: the domain of electronic circuits.

The electronic circuits domain

We will develop an ontology and knowledge base that allow us to reason about digital circuits of the kind shown in Figure 8.4. We follow the seven-step process for knowledge engineering.

Identify the task

There are many reasoning tasks associated with digital circuits. At the highest level, one analyzes the circuit's functionality; at a lower level, one analyzes the circuit's structure. For example, what are all the gates connected to the first input terminal? Does the circuit contain feedback loops? These will be our tasks in this section.


Assemble the relevant knowledge

What do we know about digital circuits? For our purposes, they are composed of wires and gates. Signals flow along wires to the input terminals of gates, and each gate produces a signal on its output terminal that flows along another wire.

Decide on a vocabulary

We now know that we want to talk about circuits, terminals, signals, and gates. The next step is to choose functions, predicates, and constants to represent them. We will start from individual gates and move up to circuits. First, we need to be able to distinguish a gate from other gates. This is handled by naming gates with constants: X1, X2, and so on.

Encode general knowledge of the domain

One sign that we have a good ontology is that there are very few general rules which need
to be specified. A sign that we have a good vocabulary is that each rule can be stated clearly
and concisely. With our example, we need only seven simple rules to describe everything we
need to know about circuits:

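(Notation: Signal(t) is the signal at terminal t, In(n, g) and Out(n, g) are the nth input and output terminals of gate g, and Type(g) is the type of gate g.)

1. If two terminals are connected, then they have the same signal:
∀t1, t2 Connected(t1, t2) ⇒ Signal(t1) = Signal(t2)

2. The signal at every terminal is either 1 or 0 (but not both):
∀t Signal(t) = 1 ∨ Signal(t) = 0
1 ≠ 0

3. Connected is a commutative predicate:
∀t1, t2 Connected(t1, t2) ⇔ Connected(t2, t1)

4. An OR gate's output is 1 if and only if any of its inputs is 1:
∀g Type(g) = OR ⇒ (Signal(Out(1, g)) = 1 ⇔ ∃n Signal(In(n, g)) = 1)

5. An AND gate's output is 0 if and only if any of its inputs is 0:
∀g Type(g) = AND ⇒ (Signal(Out(1, g)) = 0 ⇔ ∃n Signal(In(n, g)) = 0)

6. An XOR gate's output is 1 if and only if its inputs are different:
∀g Type(g) = XOR ⇒ (Signal(Out(1, g)) = 1 ⇔ Signal(In(1, g)) ≠ Signal(In(2, g)))

7. A NOT gate's output is different from its input:
∀g Type(g) = NOT ⇒ Signal(Out(1, g)) ≠ Signal(In(1, g))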

Encode the specific problem instance

The circuit shown in Figure 8.4 is encoded as circuit C1 with the following description.

First, we categorize the gates:

Type(X1)= XOR Type(X2)= XOR

Pose queries to the inference procedure

What combinations of inputs would cause the first output of C1 (the sum bit) to be 0 and the second output of C1 (the carry bit) to be 1?
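The notes only list Type(X1) = XOR and Type(X2) = XOR, so the following Python sketch assumes that C1 is the one-bit full adder of Russell & Norvig's Figure 8.4 (XOR gates X1 and X2, AND gates A1 and A2, OR gate O1). It answers the query by simulating the gate rules over all eight input combinations rather than by logical inference, which is enough to see what answer the inference procedure should return.

```python
from itertools import product
from typing import List, Tuple


# Gate behaviour from rules 4-6 (signals are the integers 0 and 1).
def or_gate(*inputs: int) -> int:
    return int(any(inputs))          # 1 iff any input is 1


def and_gate(*inputs: int) -> int:
    return int(all(inputs))          # 0 iff any input is 0


def xor_gate(a: int, b: int) -> int:
    return int(a != b)               # 1 iff the inputs differ


def c1(in1: int, in2: int, in3: int) -> Tuple[int, int]:
    """Assumed wiring of C1 as a one-bit full adder; returns (sum, carry)."""
    x1 = xor_gate(in1, in2)
    total = xor_gate(x1, in3)        # output of X2: the sum bit
    a1 = and_gate(in1, in2)
    a2 = and_gate(x1, in3)
    carry = or_gate(a2, a1)          # output of O1: the carry bit
    return total, carry


# The query: which input combinations give sum = 0 and carry = 1?
answers: List[Tuple[int, ...]] = [ins for ins in product((0, 1), repeat=3) if c1(*ins) == (0, 1)]
print(answers)  # [(0, 1, 1), (1, 0, 1), (1, 1, 0)]
```

As expected for an adder, sum = 0 with carry = 1 happens exactly when two of the three inputs are 1.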

Debug the knowledge base

We can perturb the knowledge base in various ways to see what kinds of erroneous behaviors emerge.

Usage of First Order Logic.

The best way to understand the usage of first-order logic is through examples. The examples can be taken from some simple domains. In knowledge representation, a domain is just some part of the world about which we wish to express some knowledge.

Assertions and queries in first-order logic

Sentences are added to a knowledge base using TELL, exactly as in propositional logic. Such sentences are called assertions.

For example, we can assert that John is a king and that kings are persons:

TELL(KB, King(John))

where KB is the knowledge base.

TELL(KB, ∀x King(x) ⇒ Person(x)).

We can ask questions of the knowledge base using ASK. For example, ASK(KB, King(John)) returns true. Questions asked using ASK are called queries or goals.

ASK(KB, ∃x Person(x)) will also return true.
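The TELL/ASK interface can be illustrated with a deliberately tiny Python sketch. TinyKB and its methods are hypothetical names, not a real library, and the class only supports ground unary facts plus rules of the form ∀x P(x) ⇒ Q(x). An ASK with a variable returns the substitutions that make the query true (here {x/John}), and an empty list means "false".

```python
from typing import Dict, List, Set, Tuple


class TinyKB:
    """Hypothetical toy knowledge base: ground unary facts such as King(John)
    and rules of the single form ∀x P(x) ⇒ Q(x)."""

    def __init__(self) -> None:
        self.facts: Set[Tuple[str, str]] = set()
        self.rules: List[Tuple[str, str]] = []

    def tell_fact(self, pred: str, const: str) -> None:
        self.facts.add((pred, const))
        self._saturate()

    def tell_rule(self, premise: str, conclusion: str) -> None:
        self.rules.append((premise, conclusion))
        self._saturate()

    def _saturate(self) -> None:
        """Apply every rule until no new facts appear."""
        changed = True
        while changed:
            changed = False
            for premise, conclusion in self.rules:
                for pred, const in list(self.facts):
                    if pred == premise and (conclusion, const) not in self.facts:
                        self.facts.add((conclusion, const))
                        changed = True

    def ask(self, pred: str, arg: str) -> List[Dict[str, str]]:
        """Return substitutions that make pred(arg) true; [] means false.
        A true ground query returns [{}] (true, with no bindings needed)."""
        if arg.startswith("?"):
            return [{arg: const} for p, const in sorted(self.facts) if p == pred]
        return [{}] if (pred, arg) in self.facts else []


kb = TinyKB()
kb.tell_fact("King", "John")        # TELL(KB, King(John))
kb.tell_rule("King", "Person")      # TELL(KB, ∀x King(x) ⇒ Person(x))
print(kb.ask("King", "John"))       # [{}]
print(kb.ask("Person", "?x"))       # [{'?x': 'John'}]
```

Inference happens eagerly at TELL time here; a real first-order system would instead run a general inference procedure when ASK is called.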


The kinship domain

The first example we consider is the domain of family relationships, or kinship. This domain includes facts such as

"Elizabeth is the mother of Charles" and

"Charles is the father of William" and rules such as

"One's grandmother is the mother of one's parent."
Clearly, the objects in our domain are people.
We will have two unary predicates, Male and Female.

Kinship relations (parenthood, brotherhood, marriage, and so on) will be represented by binary predicates: Parent, Sibling, Brother, Sister, Child, Daughter, Son, Spouse, Husband, Grandparent, Grandchild, Cousin, Aunt, and Uncle.

We will use functions for Mother and Father.
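A minimal Python sketch of this vocabulary, assuming a closed world containing only the facts quoted above: Mother and Father become dictionaries (functions from a child to a parent), Male and Female become sets, and the grandmother rule becomes a Boolean function. The helper names are hypothetical.

```python
from typing import Dict, Set

# Mother and Father are functions from a child to a parent.
MOTHER: Dict[str, str] = {"Charles": "Elizabeth"}  # "Elizabeth is the mother of Charles"
FATHER: Dict[str, str] = {"William": "Charles"}    # "Charles is the father of William"
FEMALE: Set[str] = {"Elizabeth"}                   # assumed extensions of the
MALE: Set[str] = {"Charles", "William"}            # unary predicates Female and Male


def parents(child: str) -> Set[str]:
    """Parent(p, child) holds when p = Mother(child) or p = Father(child)."""
    return {p for p in (MOTHER.get(child), FATHER.get(child)) if p is not None}


def grandmother(g: str, c: str) -> bool:
    """Grandmother(g, c) ⇔ ∃p Parent(p, c) ∧ Mother(p) = g."""
    return any(MOTHER.get(p) == g for p in parents(c))


def gender_axioms_hold() -> bool:
    """Check Mother(c) = m ⇒ Female(m) and Father(c) = f ⇒ Male(f) on the stored facts."""
    return (all(m in FEMALE for m in MOTHER.values())
            and all(f in MALE for f in FATHER.values()))


print(grandmother("Elizabeth", "William"))  # True
print(grandmother("Charles", "William"))    # False
print(gender_axioms_hold())                 # True
```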

Forward chaining with an example.

Using deduction to reach a conclusion from a set of antecedents is called forward chaining. In other words, the system starts from a set of facts and a set of rules, and tries to find a way of using these rules and facts to deduce a conclusion or come up with a suitable course of action. This is known as data-driven reasoning.


The proof tree generated by forward chaining.

Example knowledge base
• The law says that it is a crime for an American to sell weapons to hostile nations. The country Nono, an enemy of America, has some missiles, and all of its missiles were sold to it by Colonel West, who is American.

• Prove that Col. West is a criminal.

... it is a crime for an American to sell weapons to hostile nations:
American(x) ∧ Weapon(y) ∧ Sells(x, y, z) ∧ Hostile(z) ⇒ Criminal(x)
Nono … has some missiles, i.e., ∃x Owns(Nono, x) ∧ Missile(x):
Owns(Nono, M1) and Missile(M1)
… all of its missiles were sold to it by Colonel West:
Missile(x) ∧ Owns(Nono, x) ⇒ Sells(West, x, Nono)
Missiles are weapons:
Missile(x) ⇒ Weapon(x)
An enemy of America counts as "hostile":
Enemy(x, America) ⇒ Hostile(x)
West, who is American …
American(West)
The country Nono, an enemy of America …
Enemy(Nono, America)


(a) The initial facts appear at the bottom level.

(b) The facts inferred on the first iteration are at the middle level.

(c) The facts inferred on the second iteration are at the top level.
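The levels of the proof tree can be reproduced with a short Python sketch of forward chaining over definite clauses. Atoms are tuples, variables are strings starting with "?", and M1 is a constant naming one of Nono's missiles (as in Russell & Norvig's version of this example). This is a naive, level-by-level matcher written for illustration, not an efficient implementation such as Rete.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Atom = Tuple[str, ...]             # ("Missile", "?x") means Missile(x)
Rule = Tuple[List[Atom], Atom]     # (premises, conclusion): a definite clause
Subst = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match(pattern: Atom, fact: Atom, theta: Subst) -> Optional[Subst]:
    """Extend theta so that pattern equals the ground fact, or return None."""
    if len(pattern) != len(fact) or pattern[0] != fact[0]:
        return None
    theta = dict(theta)
    for p, f in zip(pattern[1:], fact[1:]):
        if is_var(p):
            if theta.setdefault(p, f) != f:   # variable already bound to something else
                return None
        elif p != f:
            return None
    return theta


def match_all(premises: List[Atom], facts: List[Atom], theta: Subst) -> Iterator[Subst]:
    """Yield every substitution that satisfies all premises against the facts."""
    if not premises:
        yield theta
        return
    for fact in facts:
        extended = match(premises[0], fact, theta)
        if extended is not None:
            yield from match_all(premises[1:], facts, extended)


def forward_chain(rules: List[Rule], facts: Set[Atom]) -> List[Set[Atom]]:
    """Return the facts level by level: initial facts, then each iteration's new facts."""
    known = set(facts)
    levels = [set(facts)]
    while True:
        new: Set[Atom] = set()
        for premises, conclusion in rules:
            for theta in match_all(premises, list(known), {}):
                inferred = tuple(theta.get(t, t) for t in conclusion)
                if inferred not in known:
                    new.add(inferred)
        if not new:          # fixed point: nothing new can be derived
            return levels
        levels.append(new)
        known |= new


RULES: List[Rule] = [
    ([("American", "?x"), ("Weapon", "?y"), ("Sells", "?x", "?y", "?z"), ("Hostile", "?z")],
     ("Criminal", "?x")),
    ([("Missile", "?x"), ("Owns", "Nono", "?x")], ("Sells", "West", "?x", "Nono")),
    ([("Missile", "?x")], ("Weapon", "?x")),
    ([("Enemy", "?x", "America")], ("Hostile", "?x")),
]
FACTS: Set[Atom] = {
    ("Owns", "Nono", "M1"), ("Missile", "M1"),
    ("American", "West"), ("Enemy", "Nono", "America"),
}

for depth, level in enumerate(forward_chain(RULES, FACTS)):
    print(depth, sorted(level))
```

Level 1 contains Weapon(M1), Sells(West, M1, Nono) and Hostile(Nono); level 2 contains Criminal(West), matching the middle and top levels described above.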



Backward chaining with an example.

Forward chaining applies a set of rules and facts to deduce whatever conclusions can be derived. In backward chaining, we start from a conclusion, which is the hypothesis we wish to prove, and we aim to show how that conclusion can be reached from the rules and facts in the knowledge base. The conclusion we are aiming to prove is called a goal, and reasoning in this way is known as goal-driven reasoning.


Backward chaining example



(a) To prove Criminal(West), we have to prove the four conjuncts below it.

(b) Some of these are in the knowledge base, and others require further backward chaining.
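Backward chaining on the same knowledge base can be sketched in Python as a pair of generators: bc_or proves one goal by trying each rule whose head unifies with it, and bc_and proves a list of subgoals left to right. Facts are rules with no premises, and each rule's variables are renamed on every use (standardizing apart). Atoms here contain no function terms, so the simple unifier needs no occurs check; M1 again names one of Nono's missiles.

```python
import itertools
from typing import Dict, Iterator, List, Optional, Tuple

Atom = Tuple[str, ...]
Rule = Tuple[List[Atom], Atom]      # facts are rules with no premises
Subst = Dict[str, str]
_fresh = itertools.count()


def is_var(term: str) -> bool:
    return term.startswith("?")


def walk(term: str, theta: Subst) -> str:
    """Follow variable bindings until reaching a constant or an unbound variable."""
    while is_var(term) and term in theta:
        term = theta[term]
    return term


def unify(a: Atom, b: Atom, theta: Subst) -> Optional[Subst]:
    """Unify two flat atoms (no function terms, so no occurs check is needed)."""
    if len(a) != len(b) or a[0] != b[0]:
        return None
    theta = dict(theta)
    for x, y in zip(a[1:], b[1:]):
        x, y = walk(x, theta), walk(y, theta)
        if x == y:
            continue
        if is_var(x):
            theta[x] = y
        elif is_var(y):
            theta[y] = x
        else:
            return None
    return theta


def standardize_apart(rule: Rule) -> Rule:
    """Rename the rule's variables so that each use of the rule gets fresh ones."""
    n = next(_fresh)

    def rename(atom: Atom) -> Atom:
        return tuple(f"{t}_{n}" if is_var(t) else t for t in atom)

    premises, head = rule
    return [rename(p) for p in premises], rename(head)


def bc_or(kb: List[Rule], goal: Atom, theta: Subst) -> Iterator[Subst]:
    """Prove one goal: try every rule (or fact) whose head unifies with it."""
    for rule in kb:
        premises, head = standardize_apart(rule)
        extended = unify(head, goal, theta)
        if extended is not None:
            yield from bc_and(kb, premises, extended)


def bc_and(kb: List[Rule], goals: List[Atom], theta: Subst) -> Iterator[Subst]:
    """Prove a list of goals left to right, threading the substitution through."""
    if not goals:
        yield theta
        return
    for extended in bc_or(kb, goals[0], theta):
        yield from bc_and(kb, goals[1:], extended)


KB: List[Rule] = [
    ([("American", "?x"), ("Weapon", "?y"), ("Sells", "?x", "?y", "?z"), ("Hostile", "?z")],
     ("Criminal", "?x")),
    ([("Missile", "?x"), ("Owns", "Nono", "?x")], ("Sells", "West", "?x", "Nono")),
    ([("Missile", "?x")], ("Weapon", "?x")),
    ([("Enemy", "?x", "America")], ("Hostile", "?x")),
    ([], ("Owns", "Nono", "M1")), ([], ("Missile", "M1")),
    ([], ("American", "West")), ([], ("Enemy", "Nono", "America")),
]

for answer in bc_or(KB, ("Criminal", "?who"), {}):
    print("Criminal:", walk("?who", answer))   # Criminal: West
```

The depth-first, left-to-right order mirrors the proof tree: Criminal(West) is reduced to its four conjuncts, and Weapon, Sells and Hostile are each proved by a further rule.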








The agent first generates a goal to achieve and then constructs a plan to achieve it from the current state.


Representation Using Problem Solving Approach

• Forward search

• Backward search
• Heuristic search

Representation Using Planning Approach

• STRIPS: Stanford Research Institute Problem Solver

• Representation for states and goals

• Representation for plans

• Situation space and plan space

• Solutions
Why Planning?

Intelligent agents must operate in the world. They are not simply passive reasoners (knowledge representation, reasoning under uncertainty) or problem solvers (search); they must also act on the world.
We want intelligent agents to act in "intelligent ways": taking purposeful actions, predicting the expected effect of such actions, and composing actions together to achieve complex goals.
E.g., if we have a robot, we want the robot to decide what to do and how to act to achieve our goals.


Planning Problem

How to change the world to suit our needs

Critical issue: we need to reason about what the world will be like after doing a few
actions, not just what it is like now
GOAL: Craig has coffee
CURRENTLY: robot in mailroom, has no coffee, coffee not made, Craig in office etc.

TO DO: goto lounge, make coffee


Partial-Order Planning Algorithms

Partially Ordered Plan
• Plan
– Steps
– Ordering constraints
– Variable binding constraints
– Causal links
• POP Algorithm
– Make initial plan
– Loop until plan is complete
   – Select a subgoal
   – Choose an operator
   – Resolve threats
Choose Operator
• Choose operator(c, Sneeds)

Choose a step S from the plan, or a new step S obtained by instantiating an operator, that has c as an effect.
• If there's no such step, Fail


• Add causal link S →c Sneeds

• Add ordering constraint S < Sneeds

• Add variable binding constraints if necessary

• Add S to steps if necessary

Nondeterministic choice

• Choose – pick one of the options arbitrarily

• Fail – go back to the most recent nondeterministic choice and try a different option that has not been tried before

Resolve Threats

• A step S threatens a causal link Si →c Sj iff ¬c ∈ effects(S) and it is possible that Si < S < Sj
• For each threat, choose

– Promote S: S < Si < Sj

– Demote S: Si < Sj < S

If the resulting plan is inconsistent, then Fail
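The consistency test behind Resolve Threats can be sketched in Python by treating the ordering constraints as a directed graph: the plan is consistent exactly when that graph has no cycle. The function names and the small plan in the usage example are hypothetical. The sketch tries the two options listed above (placing S before Si, or after Sj) and returns the first consistent one; the real POP algorithm treats this as a nondeterministic choice that it may later backtrack over.

```python
from typing import Dict, Optional, Set, Tuple

Ordering = Set[Tuple[str, str]]   # (a, b) means step a must come before step b


def consistent(order: Ordering) -> bool:
    """An ordering is consistent iff its constraint graph has no cycle."""
    graph: Dict[str, Set[str]] = {}
    for a, b in order:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    state = {node: "new" for node in graph}   # new / active / done (depth-first search)

    def acyclic_from(node: str) -> bool:
        state[node] = "active"
        for succ in graph[node]:
            if state[succ] == "active":
                return False                  # back edge: a cycle
            if state[succ] == "new" and not acyclic_from(succ):
                return False
        state[node] = "done"
        return True

    return all(state[n] != "new" or acyclic_from(n) for n in graph)


def resolve_threat(order: Ordering, threat: str, producer: str,
                   consumer: str) -> Optional[Ordering]:
    """Protect the causal link producer --c--> consumer from step `threat`.

    Tries ordering the threat before the producer, then after the consumer,
    and returns the first consistent result, or None (Fail)."""
    for extra in ((threat, producer), (consumer, threat)):
        candidate = order | {extra}
        if consistent(candidate):
            return candidate
    return None


# Hypothetical plan: causal link A --c--> B, and step C has ¬c as an effect.
order = {("Start", "A"), ("A", "B"), ("Start", "C"), ("C", "Finish"), ("B", "Finish")}
print(resolve_threat(order, "C", "A", "B") is not None)   # True: C < A works
# If C is already forced to lie between A and B, neither option is consistent.
print(resolve_threat(order | {("A", "C"), ("C", "B")}, "C", "A", "B"))  # None
```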

Threats with Variables

If c has variables in it, things are trickier.

• S is a threat if there is any instantiation of the variables that makes ¬c ∈ effects(S)

• We could possibly resolve the threat by adding a negative variable binding constraint, saying that two variables, or a variable and a constant, cannot be bound to one another

• Another strategy is to ignore such threats until the very end, hoping that the variables will become bound and make things easier to deal with
Shopping Domain
• Actions
– Buy(x, store)
   Pre: At(store), Sells(store, x)
   Eff: Have(x)
– Go(x, y)
   Pre: At(x)
   Eff: At(y), ¬At(x)
• Start: At(Home) ∧ Sells(SM, Milk) ∧ Sells(SM, Banana) ∧ Sells(HW, Drill)
• Goal: Have(Milk) ∧ Have(Banana) ∧ Have(Drill)
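The action schemas can be checked with a small STRIPS-style simulator in Python. Literals are plain strings, HW is the hardware store (written HDW in the figure), and the plan below is just one linearization we chose by hand; the sketch verifies a plan by executing it and does not search for one.

```python
from typing import FrozenSet, Tuple

Action = Tuple[str, FrozenSet[str], FrozenSet[str], FrozenSet[str]]  # name, pre, add, delete


def buy(x: str, store: str) -> Action:
    return (f"Buy({x}, {store})",
            frozenset({f"At({store})", f"Sells({store}, {x})"}),   # preconditions
            frozenset({f"Have({x})"}),                              # add list
            frozenset())                                            # delete list


def go(x: str, y: str) -> Action:
    return (f"Go({x}, {y})",
            frozenset({f"At({x})"}),
            frozenset({f"At({y})"}),
            frozenset({f"At({x})"}))


def apply_action(state: FrozenSet[str], action: Action) -> FrozenSet[str]:
    """STRIPS semantics: check preconditions, remove the delete list, add the add list."""
    name, pre, add, delete = action
    missing = pre - state
    if missing:
        raise ValueError(f"{name}: unsatisfied preconditions {sorted(missing)}")
    return (state - delete) | add


START = frozenset({"At(Home)", "Sells(SM, Milk)", "Sells(SM, Banana)", "Sells(HW, Drill)"})
GOAL = frozenset({"Have(Milk)", "Have(Banana)", "Have(Drill)"})

# One linearization of a plan for the shopping problem.
PLAN = [go("Home", "HW"), buy("Drill", "HW"), go("HW", "SM"),
        buy("Milk", "SM"), buy("Banana", "SM")]

state = START
for action in PLAN:
    state = apply_action(state, action)
print(GOAL <= state)   # True
```

A partial-order planner would leave Buy(Milk) and Buy(Banana) unordered relative to each other; any linearization consistent with the causal links works.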

Shopping problem
[Figure: successive stages of the partial-order plan for the shopping problem. Steps: Start, Go, Buy(Drill), Buy(Milk), Buy(Bananas), Finish; preconditions At(HDW), Sells(HDW, D), At(SM), Sells(SM, M), Sells(SM, B); goal Have(D), Have(M), Have(B); variable bindings x1 = Home, x2 = Home, x2 = HDW. NB: causal links imply ordering of steps.]


• Levels

• Mutex between actions

• Mutex between fluents

• Graph plan algorithm



• Conditional planning or contingency planning

• Execution monitoring and replanning

• Continuous planning

• Multiagent planning

• Times, schedules, and resources

• Critical path method

• Hierarchical task network planning


To act rationally under uncertainty we must be able to evaluate how likely certain
things are. With FOL a fact F is only useful if it is known to be true or false. But we need
to be able to evaluate how likely it is that F is true. By weighing likelihoods of events
(probabilities) we can develop mechanisms for acting rationally under uncertainty.

Dental Diagnosis example.

In FOL we might formulate

∀p symptom(p, toothache) → disease(p, cavity) ∨ disease(p, gumDisease) ∨ …

When do we stop?
We cannot list all possible causes.
We also want to rank the possibilities. We don't want to start drilling for a cavity before checking for more likely causes first.

Axioms Of Probability

Given a set U (universe), a probability function is a function defined over the subsets of U that maps each subset to the real numbers and that satisfies the Axioms of Probability:

1. Pr(U) = 1
2. Pr(A) ∈ [0, 1]
3. Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B)

Note: if A ∩ B = {}, then Pr(A ∪ B) = Pr(A) + Pr(B).
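As a sanity check, the three axioms can be verified exhaustively for a concrete probability function. This Python sketch assumes one roll of a fair six-sided die (our example, not from the notes) and uses exact fractions so the equality in axiom 3 is not disturbed by rounding.

```python
from fractions import Fraction
from itertools import chain, combinations
from typing import FrozenSet, List

U: FrozenSet[int] = frozenset(range(1, 7))   # assumed example: one roll of a fair die


def pr(event: FrozenSet[int]) -> Fraction:
    """Uniform probability: fraction of outcomes in U that belong to the event."""
    return Fraction(len(event & U), len(U))


def all_events(universe: FrozenSet[int]) -> List[FrozenSet[int]]:
    """Every subset of the universe (its power set)."""
    items = sorted(universe)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def satisfies_axioms() -> bool:
    events = all_events(U)                            # all 2**6 = 64 subsets
    return (pr(U) == 1
            and all(0 <= pr(a) <= 1 for a in events)
            and all(pr(a | b) == pr(a) + pr(b) - pr(a & b)
                    for a in events for b in events))


even, low = frozenset({2, 4, 6}), frozenset({1, 2, 3})
print(pr(even | low))      # 5/6, i.e. 1/2 + 1/2 - 1/6
print(satisfies_axioms())  # True
```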


• Probability is a natural way to represent uncertainty

• People have intuitive notions about probabilities
• Many of these are wrong or inconsistent
• Most people don't get what probabilities mean
• Understanding probabilities
• Initially, probabilities are understood as "relative frequencies"
• This works well for dice and coin flips
• For more complicated events, this is problematic
• What is the probability that Obama will be reelected?
• This event only happens once
• We can't count frequencies
• It still seems like a meaningful question
• In general, all events are unique

Probabilities and Beliefs

• Suppose I have flipped a coin and hidden the outcome
• What is P(Heads)?
• Note that this is a statement about a belief, not a statement about the world
• The world is in exactly one state (at the macro level), and it is in that state with probability 1
• Assigning truth values to probability statements is very tricky business
• It must reference the speaker's state of knowledge

Frequentism and Subjectivism

• Frequentists hold that probabilities must come from relative frequencies
• This is a purist viewpoint
• It is undermined by the fact that relative frequencies are often unobtainable
• It often requires complicated and convoluted assumptions to come up with probabilities
• Subjectivists: probabilities are degrees of belief
o Taints the purity of probabilities
o Often more practical
Types of probability:
1 Unconditional or prior probabilities
2 Conditional or posterior probabilities


• Representing Knowledge in an Uncertain Domain
• A belief network is used to encode the meaningful dependence between variables.
o Nodes represent random variables
o Arcs represent direct influence
o Each node has a conditional probability table that gives that variable's probability given the different states of its parents
o The network is a directed acyclic graph (DAG)

The Semantics of Belief Networks

• To construct the net, think of it as representing the joint probability distribution.
• To infer from the net, think of it as representing conditional independence statements.
• Calculate an entry of the joint probability distribution by multiplying the individual conditional probabilities:
o P(X1 = x1, ..., Xn = xn)
o = P(X1 = x1 | Parents(X1)) × ... × P(Xn = xn | Parents(Xn))
• Note: each Xi only has to be conditioned on its immediate parents, not on all other nodes (provided the parents of Xi come earlier in the ordering):
o P(Xi | Xi−1, ..., X1) = P(Xi | Parents(Xi))
• To incrementally construct a network:
1. Decide on the variables
2. Decide on an ordering of them
3. Do until no variables are left:

a. Pick a variable and make a node for it

b. Set its parents to the minimal set of pre-existing nodes on which it directly depends
c. Define its conditional probability table
• Often, the resulting conditional probability tables are much smaller than the exponential size of the full joint distribution
• If the nodes are not ordered with root causes first, the conditional probability tables become larger
• Different tables may encode the same probabilities.
• Some canonical distributions that appear in conditional probability tables:
o deterministic logical relationship (e.g. AND, OR)
o deterministic numeric relationship (e.g. MIN)
o parametric relationship (e.g. weighted sum in neural net)
o noisy logical relationship (e.g. noisy-OR, noisy-MAX)

Direction-dependent separation or D-separation:

• If every undirected path between 2 nodes is blocked (d-separated) given evidence node(s) E, then the 2 nodes are independent given E.
• Evidence node(s) E d-separate X and Y if every path between them contains a node Z such that:
o Z is in E and has an arrow in on the path leading from X and an arrow out on the path leading to Y (or vice versa), or
o Z is in E and has arrows out leading to both X and Y, or
o Z has arrows in from both X and Y, and neither Z nor any of Z's descendants is in E

Inference in Belief Networks

• We want to compute posterior probabilities of query variables given evidence

• Types of inference for belief networks:
o Diagnostic inference: from symptoms to causes
o Causal inference: from causes to symptoms
o Intercausal inference: between causes of a common effect
o Mixed inference: combines those above

Inference in Multiply Connected Belief Networks

• Multiply connected graphs have 2 nodes connected by more than one path
• Techniques for handling them:
o Clustering: group some of the intermediate nodes into one meganode.
Pro: perhaps the best way to get an exact evaluation.
Con: conditional probability tables may grow exponentially in size.
o Cutset conditioning: obtain simpler polytrees by instantiating variables as constants.
Con: may obtain an exponential number of simpler polytrees.
Pro: it may be safe to ignore trees with low probability (bounded cutset conditioning).
o Stochastic simulation: run through the net with randomly chosen values for each node (weighted by prior probabilities).
Bayes' nets:
A technique for describing complex joint distributions (models) using simple, local distributions (conditional probabilities)
More properly called graphical models
Local interactions chain together to give global, indirect interactions

A Bayesian network is a graphical structure that allows us to represent and reason about an uncertain domain. The nodes in a Bayesian network represent a set of random variables, X = X1, ..., Xi, ..., Xn, from the domain. A set of directed arcs (or links) connects pairs of nodes, Xi → Xj, representing the direct dependencies between variables.

Assuming discrete variables, the strength of the relationship between variables is quantified by conditional probability distributions associated with each node. The only constraint on the arcs allowed in a BN is that there must not be any directed cycles: you cannot return to a node simply by following directed arcs.

Such networks are called directed acyclic graphs, or simply dags. There are a number of steps that a knowledge engineer must undertake when building a Bayesian network. At this stage we will present these steps as a sequence; however, it is important to note that in the real world the process is not so simple.

Nodes and values

First, the knowledge engineer must identify the variables of interest. This involves
answering the question: what are the nodes to represent and what values can they take, or
what state can they be in? For now we will consider only nodes that take discrete values.
The values should be both mutually exclusive and exhaustive, which means that the variable must take on exactly one of these values at a time. Common types of discrete nodes include:

Boolean nodes, which represent propositions, taking the binary values true (T) and false (F). In a medical diagnosis domain, the node Cancer would represent the proposition that a patient has cancer.

Ordered values. For example, a node Pollution might represent a patient's pollution exposure and take the values low, medium, high.

Integral values. For example, a node called Age might represent a patient's age and have possible values from 1 to 120.

Even at this early stage, modeling choices are being made. For example, an alternative to representing a patient's exact age might be to clump patients into different age groups, such as baby, child, adolescent, young, middle-aged, old. The trick is to choose values that represent the domain efficiently.

1 Representation of joint probability distribution


2 Conditional independence relation in Bayesian network



1 Tell
2 Ask
3 Kinds of inferences
4 Use of Bayesian network

• In general, the problem of Bayes net inference is NP-hard (exponential in the size of the graph).
• For singly-connected networks, or polytrees, in which there are no undirected loops, there are linear-time algorithms based on belief propagation.
• Each node sends local evidence messages to its children and parents.
• Each node updates its belief in each of its possible values based on incoming messages from its neighbors and propagates evidence on to its neighbors.
• There are approximations to inference for general networks based on loopy belief propagation, which iteratively refines the probabilities; it often converges to a good approximation, but convergence is not guaranteed.


1 Monitoring or filtering
2 Prediction

Bayes' Theorem

Many of the methods used for dealing with uncertainty in expert systems are based on Bayes' Theorem.

P(A)      Probability of event A
P(A ∧ B)  Probability of events A and B occurring together
P(A | B)  Conditional probability of event A given that event B has occurred
If A and B are independent, then P(A | B) = P(A).

Expert systems usually deal with events that are not independent, e.g. a disease and its symptoms are not independent.

P(A ∧ B) = P(A | B) * P(B) = P(B | A) * P(A), therefore P(A | B) = P(B | A) * P(A) / P(B)

Uses of Bayes' Theorem

In doing an expert task, such as medical diagnosis, the goal is to determine identifications (diseases) given observations (symptoms). Bayes' Theorem provides such a relationship:

P(A | B) = P(B | A) * P(A) / P(B)

Suppose: A = patient has measles, B = patient has a rash

Then: P(measles | rash) = P(rash | measles) * P(measles) / P(rash)

The desired diagnostic relationship on the left can be calculated based on the known statistical quantities on the right.
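A worked instance of the measles formula in Python. The notes give no statistics, so the three input probabilities below are hypothetical numbers chosen only to show the arithmetic.

```python
def bayes(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """P(A | B) = P(B | A) * P(A) / P(B)."""
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b


# Hypothetical numbers for illustration only (not taken from the notes):
p_rash_given_measles = 0.9   # P(rash | measles)
p_measles = 0.01             # P(measles)
p_rash = 0.05                # P(rash)

print(round(bayes(p_rash_given_measles, p_measles, p_rash), 3))  # 0.18
```

Even though a rash is very likely given measles, the posterior P(measles | rash) stays fairly low because measles itself is rare: the prior P(A) matters as much as the likelihood.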

Joint Probability Distribution

Given a set of random variables X1 ... Xn, an atomic event is an assignment of a
particular value to each Xi. The joint probability distribution is a table that assigns a
probability to each atomic event. Any question of conditional probability can be answered
from the joint.

Toothache ¬ Toothache
Cavity 0.04 0.06
¬ Cavity 0.01 0.89


The size of the table is combinatoric: the product of the number of possibilities for
each random variable. The time to answer a question from the table will also be
combinatoric. Lack of evidence: we may not have statistics for some table entries, even
though those entries are not impossible.
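Using the Cavity/Toothache table above, this Python sketch shows how marginals and a conditional probability are read off a joint distribution: marginals by summing rows or columns, and P(Cavity | Toothache) as P(Cavity ∧ Toothache) / P(Toothache). The function names are our own.

```python
from typing import Dict, Tuple

# The joint distribution from the table above, keyed by (cavity, toothache).
JOINT: Dict[Tuple[bool, bool], float] = {
    (True, True): 0.04, (True, False): 0.06,
    (False, True): 0.01, (False, False): 0.89,
}


def p_cavity() -> float:
    """Marginal P(Cavity): sum the Cavity row."""
    return sum(p for (cavity, _), p in JOINT.items() if cavity)


def p_toothache() -> float:
    """Marginal P(Toothache): sum the Toothache column."""
    return sum(p for (_, toothache), p in JOINT.items() if toothache)


def p_cavity_given_toothache() -> float:
    """P(Cavity | Toothache) = P(Cavity ∧ Toothache) / P(Toothache)."""
    return JOINT[(True, True)] / p_toothache()


print(round(p_cavity(), 2), round(p_toothache(), 2), round(p_cavity_given_toothache(), 2))
# 0.1 0.05 0.8
```

With n Boolean variables the dictionary would need 2**n entries, which is exactly the combinatorial growth described above.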

Chain Rule

We can compute probabilities using a chain rule as follows:

P(A ∧ B ∧ C) = P(A | B ∧ C) * P(B | C) * P(C)
If A is conditionally independent of some other conditions U given C1 ∧ ... ∧ Cn, we have
P(A | C1 ∧ ... ∧ Cn ∧ U) = P(A | C1 ∧ ... ∧ Cn)
This allows a conditional probability to be computed more easily from smaller tables using the chain rule.

Bayesian Networks

Bayesian networks, also called belief networks or Bayesian belief networks, express
relationships among variables by directed acyclic graphs with probability tables stored at
the nodes.[Example from Russell & Norvig.]
1 A burglary can set the alarm off
2 An earthquake can set the alarm off
3 The alarm can cause Mary to call
4 The alarm can cause John to call
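For the burglary network just described, here is a Python sketch of inference by enumeration. Each joint entry is the product of every node's probability given its parents, and P(Burglary | JohnCalls, MaryCalls) is obtained by summing out Earthquake and Alarm and normalizing. The conditional probability values are not given in these notes; the ones below are those commonly quoted for Russell & Norvig's version of the example and should be treated as assumptions.

```python
from itertools import product
from typing import Dict, Tuple

# Assumed CPTs: the burglary network as usually quoted from Russell & Norvig.
P_BURGLARY = 0.001
P_EARTHQUAKE = 0.002
P_ALARM: Dict[Tuple[bool, bool], float] = {      # P(Alarm | Burglary, Earthquake)
    (True, True): 0.95, (True, False): 0.94,
    (False, True): 0.29, (False, False): 0.001,
}
P_JOHN = {True: 0.90, False: 0.05}               # P(JohnCalls | Alarm)
P_MARY = {True: 0.70, False: 0.01}               # P(MaryCalls | Alarm)


def p(p_true: float, value: bool) -> float:
    """Probability that a Boolean variable takes `value`, given P(true)."""
    return p_true if value else 1.0 - p_true


def joint(b: bool, e: bool, a: bool, j: bool, m: bool) -> float:
    """Product of each node's probability given its parents."""
    return (p(P_BURGLARY, b) * p(P_EARTHQUAKE, e) * p(P_ALARM[(b, e)], a)
            * p(P_JOHN[a], j) * p(P_MARY[a], m))


def p_burglary_given(j: bool, m: bool) -> float:
    """Inference by enumeration: sum out Earthquake and Alarm, then normalize."""
    score = {b: sum(joint(b, e, a, j, m) for e, a in product((True, False), repeat=2))
             for b in (True, False)}
    return score[True] / (score[True] + score[False])


print(round(joint(False, False, True, True, True), 6))   # 0.000628
print(round(p_burglary_given(True, True), 3))            # 0.284
```

Enumeration is exponential in the number of hidden variables, which is why the message-passing methods described next matter for larger networks.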

Computing with Bayesian Networks


If a Bayesian network is well structured as a poly-tree (at most one path between
any two nodes), then probabilities can be computed relatively efficiently. One kind of
algorithm, due to Judea Pearl, uses a message-passing style in which nodes of the network
compute probabilities and send them to nodes they are connected to. Several software
packages exist for computing with belief networks.

A Hidden Markov Model (HMM) tagger chooses the tag for each word that maximizes:
[Jurafsky, op. cit.] P(word | tag) * P(tag | previous n tags)

For a bigram tagger, this is approximated as:

ti = argmaxj P( wi | tj ) P( tj | ti - 1 )

In practice, trigram taggers are most often used, and a search is made for the best
set of tags for the whole sentence; accuracy is about 96%.


A hidden Markov model (HMM) is an augmentation of the Markov chain to include observations. In addition to the state transitions of the Markov chain, an HMM includes observations of the state. These observations can be partial, in that different states can map to the same observation, and noisy, in that the same state can stochastically map to different observations at different times.

The assumptions behind an HMM are that the state at time t+1 only depends on the state at time t, as in the Markov chain, and that the observation at time t only depends on the state at time t. The observations are modeled using the variable Ot for each time t, whose domain is the set of possible observations. The belief network representation of an HMM is depicted in the figure. Although the belief network is shown for four stages, it can proceed indefinitely.

A stationary HMM includes the following probability distributions:

P(S0) specifies initial conditions.

P(St+1|St) specifies the dynamics.
P(Ot|St) specifies the sensor model.

There are a number of tasks that are common for HMMs.

The problem of filtering or belief-state monitoring is to determine the current state based on the current and previous observations, namely to determine P(Si | O0, ..., Oi).


Note that all state and observation variables after Si are irrelevant because they are
not observed and can be ignored when this conditional distribution is computed.
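Filtering can be done incrementally: predict the next state with the dynamics P(St+1 | St), then weight by the sensor model P(Ot | St) and normalize. This Python sketch assumes the two-state "umbrella world" parameters from Russell & Norvig (these numbers are not in the notes) and, as in that example, starts with no observation at time 0, so the indexing differs slightly from P(Si | O0, ..., Oi) above.

```python
from typing import Dict

Dist = Dict[str, float]

# Assumed parameters: the "umbrella world" from Russell & Norvig.
PRIOR: Dist = {"rain": 0.5, "dry": 0.5}                       # P(S0)
TRANSITION = {"rain": {"rain": 0.7, "dry": 0.3},               # P(S_t+1 | S_t)
              "dry": {"rain": 0.3, "dry": 0.7}}
SENSOR = {"rain": {"umbrella": 0.9, "none": 0.1},              # P(O_t | S_t)
          "dry": {"umbrella": 0.2, "none": 0.8}}


def filter_step(belief: Dist, observation: str) -> Dist:
    """One filtering step: predict with the dynamics, then weight by the sensor model."""
    predicted = {s2: sum(TRANSITION[s1][s2] * belief[s1] for s1 in belief) for s2 in belief}
    unnormalized = {s: SENSOR[s][observation] * predicted[s] for s in predicted}
    total = sum(unnormalized.values())
    return {s: p / total for s, p in unnormalized.items()}


belief = PRIOR
for obs in ["umbrella", "umbrella"]:
    belief = filter_step(belief, obs)
    print(round(belief["rain"], 3))   # 0.818, then 0.883
```

Each step costs time proportional to the square of the number of states, independent of how many observations came before, which is what makes filtering practical for long sequences.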

The problem of smoothing is to determine a state based on past and future observations. Suppose an agent has observed up to time k and wants to determine the state at time i for i < k; the smoothing problem is to determine

P(Si | O0, ..., Ok).

All of the variables Si and Oi for i > k can be ignored.






What is learning?

Learning denotes changes in the system that are adaptive in the sense that they enable the
system to do the same task or tasks drawn from the same population more effectively the next
time (Simon, 1983).

Learning is making useful changes in our minds (Minsky, 1985).

Learning is constructing or modifying representations of what is being experienced

(Michalski, 1986).

A computer program learns if it improves its performance at some task through experience

(Mitchell, 1997).

So what is learning?

(1) acquire and organize knowledge (by building, modifying and organizing internal representations of some external reality);
(2) discover new knowledge and theories (by creating hypotheses that explain some data or phenomena);
(3) acquire skills (by gradually improving their motor or cognitive skills through repeated practice, sometimes involving little or no conscious thought).
(4) Learning results in changes in the agent (or mind) that improve its competence and/or efficiency.

(5) Learning is essential for unknown environments, i.e., when the designer lacks omniscience.

o Learning is useful as a system construction method,

o Expose the agent to reality rather than trying to write it down
o Learning modifies the agent's decision mechanisms to improve performance


Learning agents:

• Four Components

1. Performance Element: collection of knowledge and procedures to decide on the next action, e.g. walking, turning, drawing, etc.

2. Learning Element: takes in feedback from the critic and modifies the performance element

3. Critic: provides the learning element with information on how well the agent is doing based on a
fixed performance standard. E.g. the audience

4. Problem Generator: provides the performance element with suggestions on new actions to take.

Components of the Performance Element

• A direct mapping from conditions on the current state to actions

• Information about the way the world evolves

• Information about the results of possible actions the agent can take

• Utility information indicating the desirability of world states

Learning element


• Design of a learning element is affected by

– Which components of the performance element are to be learned

– What feedback is available to learn these components

– What representation is used for the components

Type of feedback:

– Supervised learning: correct answers for each example

– Unsupervised learning: correct answers not given

– Reinforcement learning: occasional rewards


Inductive Learning: in supervised learning we have a set of {xi, f(xi)} for 1 ≤ i ≤ n, and our aim is to determine f by some adaptive algorithm. It is a machine learning approach in which rules are inferred from facts or data. In logic, it corresponds to reasoning from the specific to the general. Theoretical results in machine learning mainly deal with a type of inductive learning called supervised learning. In supervised learning, an algorithm is given samples that are labeled in some useful way. In the case of inductive learning algorithms, like artificial neural networks, a real robot may learn only from previously gathered data. Another option is to let the robot learn everything around it by inducing facts from the environment. This is known as inductive learning. Finally, you could get the robot to evolve and optimize its performance over several generations.
f(x) is the target function

An example is a pair [x, f(x)]

Learning task: find a hypothesis h such that h(x)

f(xi) ]}, i = 1,2,…,N Construct h so that it agrees with f.

The hypothesis h is consistent if it agrees with f on all observations.

Ockham’s razor: Select the simplest consistent hypothesis.
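To make "select the simplest consistent hypothesis" concrete, here is a minimal Python sketch. It assumes, purely for illustration, a hypothesis space of polynomials with small integer coefficients, with the degree serving as the measure of simplicity. The function names `poly` and `simplest_consistent` and the sample data are hypothetical.

```python
from itertools import product

def poly(coeffs, x):
    """Evaluate a0 + a1*x + a2*x**2 + ... for the given coefficients."""
    return sum(a * x ** k for k, a in enumerate(coeffs))

def simplest_consistent(examples, max_degree=3, coeff_range=range(-3, 4)):
    """Return the lowest-degree polynomial (small integer coefficients) that
    agrees with f on every example [x, f(x)], or None if there is none."""
    for degree in range(max_degree + 1):          # simpler hypotheses first
        for coeffs in product(coeff_range, repeat=degree + 1):
            if all(poly(coeffs, x) == fx for x, fx in examples):
                return coeffs                     # first consistent = simplest
    return None

# Training examples generated by the (hidden) target f(x) = 2x + 1
examples = [(0, 1), (1, 3), (2, 5), (3, 7)]
print(simplest_consistent(examples))   # (1, 2), i.e. h(x) = 1 + 2x
```

Higher-degree polynomials (for example one with an extra zero coefficient) would also be consistent, but the search stops at the first, simplest one.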

How can we achieve good generalization?


Simplest: construct a decision tree with one leaf for every example (memory-based learning). This does not generalize well.

Advanced: choose each split so that the purity of the resulting subsets increases (ideally, each subset contains only yes or only no examples).



• Come up with a set of attributes to describe the object or situation.

• Collect a complete set of examples (training set) from which the decision tree can derive a
hypothesis to define (answer) the goal predicate.

Decision Tree Example:

Problem: decide whether to wait for a table at a restaurant, based on the following attributes:

1. Alternate: is there an alternative restaurant nearby?

2. Bar: is there a comfortable bar area to wait in?

3. Fri/Sat: is today Friday or Saturday?

4. Hungry: are we hungry?

5. Patrons: number of people in the restaurant (None, Some, Full)

6. Price: price range ($, $$, $$$)

7. Raining: is it raining outside?

8. Reservation: have we made a reservation?

9. Type: kind of restaurant (French, Italian, Thai, Burger)

10. WaitEstimate: estimated waiting time (0-10, 10-30, 30-60, >60)

Logical Representation of a Path

∀r [Patrons(r, Full) ∧ WaitEstimate(r, 10-30) ∧ Hungry(r, No)] ⇒ WillWait(r)


Expressiveness of Decision Trees

• Any Boolean function can be written as a decision tree

• E.g., for Boolean functions, each truth table row → path to a leaf

• Trivially, there is a consistent decision tree for any training set, with one path to a leaf for each example (unless f is nondeterministic in x), but it probably won't generalize to new examples

• Prefer to find more compact decision trees


Limitations of decision trees:

– Can only describe one object at a time.

– Some functions require an exponentially large decision tree.

• E.g. Parity function, majority function

• Decision trees are good for some kinds of functions, and bad for others.

• There is no one efficient representation for all kinds of functions.
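The exponential blow-up for the parity function can be checked with a small Python sketch (the helper names are hypothetical). Fixing any proper subset of the bits still leaves examples of both parities, so every path must test all n bits and a consistent tree needs 2^n leaves. The code grows a tree until each leaf is pure and counts the leaves.

```python
from itertools import product

def parity(bits):
    """Boolean parity: True iff an odd number of bits are 1."""
    return sum(bits) % 2 == 1

def count_leaves(examples, attrs):
    """Split on the attributes in order until every leaf is pure;
    return the number of leaves of the resulting tree."""
    labels = {parity(x) for x in examples}
    if len(labels) <= 1 or not attrs:
        return 1
    a, rest = attrs[0], attrs[1:]
    return sum(count_leaves([x for x in examples if x[a] == v], rest) for v in (0, 1))

for n in range(1, 6):
    all_examples = list(product((0, 1), repeat=n))   # full truth table
    print(n, count_leaves(all_examples, list(range(n))))
# prints 1 2, 2 4, 3 8, 4 16, 5 32 (one pair per line)
```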

Principle Behind the Decision-Tree-Learning Algorithm

• Uses a general principle of inductive learning often called Ockham's razor: "The most likely hypothesis is the simplest one that is consistent with all observations."

• Decision trees can express any function of the input attributes.

Decision tree learning Algorithm:

• Aim: find a small tree consistent with the training examples

• Idea: (recursively) choose "most significant" attribute as root of (sub)tree
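The recursive idea above can be sketched in Python as follows. This is a simplified version under stated assumptions: examples are (attribute-dictionary, label) pairs, branches are created only for attribute values that actually occur in the examples, and the "most significant" attribute is taken to be the one with the largest information gain (equivalently, the smallest remainder, as described under "Using information theory"). The helper names and the toy data set are hypothetical, not the restaurant table.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Information content of the class distribution in `labels`, in bits."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def remainder(examples, attr):
    """Expected entropy still needed after splitting `examples` on `attr`."""
    n = len(examples)
    total = 0.0
    for v in {x[attr] for x, _ in examples}:
        subset = [y for x, y in examples if x[attr] == v]
        total += len(subset) / n * entropy(subset)
    return total

def mode(examples):
    """Most common classification among the examples."""
    return Counter(y for _, y in examples).most_common(1)[0][0]

def dtl(examples, attributes, default):
    """Return a leaf label, or a node of the form (attribute, {value: subtree})."""
    if not examples:
        return default
    labels = [y for _, y in examples]
    if len(set(labels)) == 1:
        return labels[0]
    if not attributes:
        return mode(examples)
    # "Most significant" attribute: largest information gain = smallest remainder
    best = min(attributes, key=lambda a: remainder(examples, a))
    rest = [a for a in attributes if a != best]
    branches = {}
    for v in {x[best] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[best] == v]
        branches[v] = dtl(subset, rest, mode(examples))
    return (best, branches)

def classify(tree, x):
    """Follow the branches matching x's attribute values down to a leaf."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[x[attr]]
    return tree

# Hypothetical toy data: (attribute values, WillWait)
examples = [
    ({"Patrons": "Some", "Hungry": "Yes"}, True),
    ({"Patrons": "Some", "Hungry": "No"}, True),
    ({"Patrons": "None", "Hungry": "Yes"}, False),
    ({"Patrons": "Full", "Hungry": "Yes"}, True),
    ({"Patrons": "Full", "Hungry": "No"}, False),
]
tree = dtl(examples, ["Hungry", "Patrons"], default=False)
print(tree)
```

On this toy data, Patrons is chosen at the root (remainder 0.4 bits versus about 0.95 bits for Hungry), and Hungry is tested only in the Patrons = Full branch.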

Choosing an attribute tests:

• Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative"

• Patrons? is a better choice

Attribute-based representations

• Examples described by attribute values (Boolean, discrete, continuous)

• E.g., situations where I will/won't wait for a table:


• Classification of examples is positive (T) or negative (F)

Using information theory

• To implement Choose-Attribute in the DTL algorithm

• Information Content (Entropy):

I(P(v1), …, P(vn)) = Σi=1..n −P(vi) log2 P(vi)

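• For a training set containing p positive examples and n negative examples:

I(p/(p+n), n/(p+n)) = −(p/(p+n)) log2(p/(p+n)) − (n/(p+n)) log2(n/(p+n))

A chosen attribute A divides the training set E into subsets E1, …, Ev according to their values for A, where A has v distinct values; subset Ei contains pi positive and ni negative examples:

remainder(A) = Σi=1..v ((pi + ni)/(p + n)) I(pi/(pi + ni), ni/(pi + ni))

Information Gain (IG), or reduction in entropy from the attribute test:

IG(A) = I(p/(p+n), n/(p+n)) − remainder(A)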

• Choose the attribute with the largest IG

• For the training set, p = n = 6, I(6/12, 6/12) = 1 bit

• Patrons has the highest IG of all attributes and so is chosen by the DTL algorithm as the root.
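The entropy and gain figures above can be reproduced with a short Python sketch of I, remainder and IG. The split counts are an assumption taken from the 12-example restaurant data set in Russell and Norvig's textbook, which these notes do not reproduce: Patrons splits the examples into None (0 positive, 2 negative), Some (4, 0) and Full (2, 4), while Type splits them into French (1, 1), Italian (1, 1), Thai (2, 2) and Burger (2, 2).

```python
from math import log2

def I(*probs):
    """Information content in bits; terms with P = 0 contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)

def remainder(splits, p, n):
    """splits: one (p_i, n_i) pair of positive/negative counts per value of A."""
    return sum((pi + ni) / (p + n) * I(pi / (pi + ni), ni / (pi + ni))
               for pi, ni in splits)

def gain(splits):
    """IG(A) = I(p/(p+n), n/(p+n)) - remainder(A)."""
    p = sum(pi for pi, _ in splits)
    n = sum(ni for _, ni in splits)
    return I(p / (p + n), n / (p + n)) - remainder(splits, p, n)

patrons = [(0, 2), (4, 0), (2, 4)]          # None, Some, Full
type_ = [(1, 1), (1, 1), (2, 2), (2, 2)]    # French, Italian, Thai, Burger
print(I(6 / 12, 6 / 12))         # 1.0
print(round(gain(patrons), 3))   # 0.541
print(round(gain(type_), 3))     # 0.0
```

Type leaves every branch at a 50/50 mix of positive and negative examples, so its gain is zero, which is why Patrons is preferred.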

Assessing the performance of the learning algorithm:

• A learning algorithm is good if it produces hypotheses that do a good job of predicting the classifications of unseen examples

• Test the algorithm’s prediction performance on a set of new examples, called a test set.
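A minimal holdout experiment in Python, assuming a hypothetical Boolean target (x1 AND x2 over five attributes) and the memory-based learner described earlier (one lookup-table entry per training example, falling back on the majority class). It shows the methodology: learn from the training set only, then measure accuracy on the held-out test set. All names are illustrative.

```python
import random
from itertools import product

def target(x):
    """Hidden target f used only to label the data (assumed: x1 AND x2)."""
    return x[0] == 1 and x[1] == 1

def memorize(training_set):
    """Memory-based learner: a lookup table, with the majority class as default."""
    table = dict(training_set)
    labels = list(table.values())
    default = max(set(labels), key=labels.count)
    return lambda x: table.get(x, default)

def accuracy(h, examples):
    """Fraction of examples whose label h predicts correctly."""
    return sum(h(x) == y for x, y in examples) / len(examples)

data = [(x, target(x)) for x in product((0, 1), repeat=5)]   # 32 examples
rng = random.Random(0)              # fixed seed so the split is repeatable
rng.shuffle(data)
train, test = data[:20], data[20:]  # training set / held-out test set
h = memorize(train)
print(accuracy(h, train))   # 1.0 on the training set
print(accuracy(h, test))    # accuracy on unseen examples
```

The lookup table is perfect on the training set, but on the test set it can only fall back on its default guess, which is the poor generalization noted for memory-based learning. Repeating the experiment with different random splits and training-set sizes gives a learning curve.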



• Extract general rules from examples

• Basic idea

– Given an example, construct a proof that the goal predicate applies to the example, using the background knowledge.


– In parallel, construct a generalized proof with variabilized goal.

– Construct a new rule, with the leaves of the proof tree as the LHS and the variabilized goal as the RHS.

– Drop any conditions that are always true regardless of value of variables in the goal.

• Any partial subtree can be used for the extracted general rule; how should it be chosen?

• Efficiency, Operationality, Generality

– Too many rules slow down reasoning

– Rules should provide a speed increase by eliminating dead ends and shortening the proof

– As general as possible to cover the most cases

• Tradeoffs, how to maximize the efficiency of the knowledge base?
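The "construct a generalized proof in parallel" idea can be sketched in Python as below. Everything here is a hypothetical illustration: the Cup domain theory, the facts about obj1 and all helper names are invented. Atoms are flat tuples, variables are strings starting with "?", known facts are treated as the operational leaves, and the prover does no backtracking and does not drop always-true conditions. Each rule applied to the specific goal is applied to the variabilized goal too, so the leaves of the generalized proof become the LHS of the new rule.

```python
from itertools import count

# Hypothetical domain theory (Horn rules: head <= body) and facts.
RULES = [
    (("Cup", "?o"), [("Liftable", "?o"), ("HoldsLiquid", "?o")]),
    (("Liftable", "?o"), [("Light", "?o"), ("HasHandle", "?o")]),
    (("HoldsLiquid", "?o"), [("Concave", "?o"), ("FlatBottom", "?o")]),
]
FACTS = {("Light", "obj1"), ("HasHandle", "obj1"), ("Concave", "obj1"),
         ("FlatBottom", "obj1"), ("Red", "obj1")}

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def walk(t, s):
    """Follow variable bindings in substitution s."""
    while is_var(t) and t in s:
        t = s[t]
    return t

def subst(atom, s):
    return (atom[0],) + tuple(walk(t, s) for t in atom[1:])

def unify(a, b, s):
    """Unify two flat atoms; variables of b (the rule) are bound first."""
    if a[0] != b[0] or len(a) != len(b):
        return None
    s = dict(s)
    for x, y in zip(a[1:], b[1:]):
        x, y = walk(x, s), walk(y, s)
        if x == y:
            continue
        if is_var(y):
            s[y] = x
        elif is_var(x):
            s[x] = y
        else:
            return None
    return s

def rename(atom, k):
    """Give a rule's variables fresh names by appending the counter value k."""
    return (atom[0],) + tuple(f"{t}{k}" if is_var(t) else t for t in atom[1:])

fresh = count()

def solve(goal, ggoal, s, gs):
    """Prove `goal` under s while mirroring each step on the variabilized
    `ggoal` under gs. Returns (s, gs, leaves) or None."""
    if subst(goal, s) in FACTS:                   # operational leaf
        return s, gs, [ggoal]
    for head, body in RULES:
        k = next(fresh)
        head, body = rename(head, k), [rename(b, k) for b in body]
        s1, gs1 = unify(goal, head, s), unify(ggoal, head, gs)
        if s1 is None or gs1 is None:
            continue
        leaves = []
        for sub in body:                          # same rule step in both proofs
            result = solve(sub, sub, s1, gs1)
            if result is None:
                break
            s1, gs1, sub_leaves = result
            leaves += sub_leaves
        else:
            return s1, gs1, leaves
    return None

s, gs, leaves = solve(("Cup", "obj1"), ("Cup", "?x"), {}, {})
lhs = [subst(leaf, gs) for leaf in leaves]
rhs = subst(("Cup", "?x"), gs)
print(lhs, "=>", rhs)
# [('Light', '?x'), ('HasHandle', '?x'), ('Concave', '?x'), ('FlatBottom', '?x')] => ('Cup', '?x')
```

The irrelevant fact Red(obj1) never appears in the proof, so it is not part of the extracted rule.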



Learn probabilistic theories of the world from experience

♦ We focus on the learning of Bayesian networks

♦ More specifically, given input data (or evidence), learn probabilistic theories of the world (or hypotheses)

View learning as Bayesian updating of a probability distribution over the hypothesis space

H is the hypothesis variable, with values h1, h2, … and prior P(H). The jth observation dj gives the outcome of random variable Dj; the training data are d = d1, …, dN.

Given the data so far, each hypothesis has a posterior probability:

P(hi|d) = αP(d|hi)P(hi)

where P(d|hi) is called the likelihood

Predictions use a likelihood-weighted average over all hypotheses:

P(X|d) = Σi P(X|d, hi)P(hi|d) = Σi P(X|hi)P(hi|d)


Suppose there are five kinds of bags of candies:

10% are h1: 100% cherry candies

20% are h2: 75% cherry candies + 25% lime candies

40% are h3: 50% cherry candies + 50% lime candies

20% are h4: 25% cherry candies + 75% lime candies

10% are h5: 100% lime candies

Then we observe candies drawn from some bag:

What kind of bag is it? What flavour will the next candy be?
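Bayesian updating for the candy example can be sketched in Python. The priors and lime proportions come from the five bag types above. The assumption that every observed candy is lime, the use of independent draws, and the function names are only for illustration.

```python
PRIORS = [0.1, 0.2, 0.4, 0.2, 0.1]      # P(h1) ... P(h5)
P_LIME = [0.0, 0.25, 0.5, 0.75, 1.0]    # P(lime | hi)

def posterior(observations):
    """P(hi | d) = alpha * P(d | hi) * P(hi), assuming independent draws."""
    unnorm = []
    for prior, p_lime in zip(PRIORS, P_LIME):
        likelihood = 1.0
        for candy in observations:
            likelihood *= p_lime if candy == "lime" else 1 - p_lime
        unnorm.append(likelihood * prior)
    alpha = 1 / sum(unnorm)             # normalizing constant
    return [alpha * u for u in unnorm]

def predict_lime(observations):
    """P(next = lime | d) = sum_i P(lime | hi) P(hi | d)."""
    return sum(p * w for p, w in zip(P_LIME, posterior(observations)))

for n in range(0, 11, 2):
    d = ["lime"] * n
    print(n, [round(p, 3) for p in posterior(d)], round(predict_lime(d), 3))
```

After a single lime candy the posterior over h1…h5 is (0, 0.1, 0.4, 0.3, 0.2) and P(next = lime | d) = 0.65, up from the prior prediction of 0.5; as more limes arrive, h5 comes to dominate.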


1. The true hypothesis eventually dominates the Bayesian prediction, given that the true hypothesis is in the prior.

2. The Bayesian prediction is optimal, whether the data set be small or large.

On the other hand

1. The hypothesis space is usually very large or infinite, so summing over the hypothesis space is often intractable.
2. Overfitting can occur when the hypothesis space is too expressive, such that some hypotheses fit the data set well.
3. Use the prior to penalize complexity.


• Active Reinforcement learning

• Passive Reinforcement learning

Reinforcement learning

• Frequency of rewards:

– E.g., chess: reinforcement received at end of game

– E.g., table tennis: each point scored can be viewed as a reward

[Figure: learning agent (Performance standard, Critic, Learning element, Performance element, Problem generator, Sensors, Actuators, Environment; links labelled feedback, changes, knowledge, learning goals)]

• The reward is part of the input percept, so the agent must be hardwired to recognize that part as a reward and not as another sensory input.

• E.g., animal psychologists have studied reinforcement on animals.

Passive reinforcement learning

• Direct utility estimation

• Adaptive dynamic programming

• Temporal difference learning

Active reinforcement learning

• Exploration

• Learning an action-value function

Passive Reinforcement learning

The agent's policy is fixed

– in state s, it always executes the action π(s)

• Goal: how good is the policy?

• The passive learning agent has

– no knowledge about the transition model T(s, a, s′)

– no knowledge about the reward function R(s)

• It executes a set of trials in the environment using its policy π.

– it starts in state (1,1) and experiences a sequence of state transitions until it reaches one of the terminal states (4,2) or (4,3).

• E.g., (1,1)-0.04 → (1,2)-0.04 → (1,3)-0.04 → (2,3)-0.04 → (3,3)-0.04 → (3,2)-0.04 → (3,3)-0.04 → (4,3)+1

• Use the information about rewards to learn the expected utility Uπ(s):

Uπ(s) = E[ Σt=0..∞ γ^t R(St) ], where S0 = s and policy π is followed

Utility is the expected sum of (discounted) rewards obtained if policy π is followed.
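Direct utility estimation, the first passive method listed earlier, can be sketched on the trial above. The sketch assumes no discounting (γ = 1) and averages the observed reward-to-go over every visit to a state. The function name is hypothetical.

```python
from collections import defaultdict

# The example trial: (state, reward received in that state)
trial = [((1, 1), -0.04), ((1, 2), -0.04), ((1, 3), -0.04), ((2, 3), -0.04),
         ((3, 3), -0.04), ((3, 2), -0.04), ((3, 3), -0.04), ((4, 3), 1.0)]

def direct_utility_estimates(trials, gamma=1.0):
    """Average the observed (discounted) reward-to-go over every visit to each state."""
    totals, visits = defaultdict(float), defaultdict(int)
    for episode in trials:
        reward_to_go = 0.0
        for state, reward in reversed(episode):    # accumulate from the end
            reward_to_go = reward + gamma * reward_to_go
            totals[state] += reward_to_go
            visits[state] += 1
    return {s: totals[s] / visits[s] for s in totals}

U = direct_utility_estimates([trial])
print(round(U[(1, 1)], 2), round(U[(3, 3)], 2))   # 0.72 0.92
```

(3,3) is visited twice in this trial, with rewards-to-go 0.88 and 0.96, so its estimate is their average, 0.92.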

Adaptive dynamic programming

• Idea: learn how states are connected.

• An adaptive dynamic programming (ADP) agent

– learns the transition model T(s, π(s), s′) of the environment

– solves the Markov decision process using a dynamic programming method

• Learning the transition model is easy in a fully observable environment:

– it is a supervised learning task with input = state-action pair and output = resulting state

– the transition model can be represented as a table of probabilities

• Estimate the transition probability T(s, a, s′) from the frequency with which s′ is reached when executing a in s.

• E.g., from state (1,3), Right is executed three times and the resulting state is (2,3) two of those times, so T((1,3), Right, (2,3)) is estimated to be 2/3.
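The frequency estimate of T can be sketched in Python. The notes only say that two of the three Right actions from (1,3) ended in (2,3); the third outcome, (1,2), is an assumption for illustration, and the function name is hypothetical.

```python
from collections import Counter, defaultdict
from fractions import Fraction

def estimate_transitions(experience):
    """Maximum-likelihood estimate of T(s, a, s') from observed (s, a, s') triples."""
    outcome_counts = defaultdict(Counter)
    for s, a, s_next in experience:
        outcome_counts[(s, a)][s_next] += 1
    T = {}
    for (s, a), outcomes in outcome_counts.items():
        n = sum(outcomes.values())                  # times a was executed in s
        for s_next, k in outcomes.items():
            T[(s, a, s_next)] = Fraction(k, n)
    return T

experience = [((1, 3), "Right", (2, 3)),
              ((1, 3), "Right", (2, 3)),
              ((1, 3), "Right", (1, 2))]            # third outcome assumed
T = estimate_transitions(experience)
print(T[((1, 3), "Right", (2, 3))])   # 2/3
```

An ADP agent would then plug these estimates, together with the observed rewards, into the dynamic-programming step that solves the Markov decision process.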



1. Artificial Intelligence - The study of how to make computers do things at which, at the moment, people are better.

2. Turing test - Defines the intelligent behavior as the ability to achieve human-level
performance in all cognitive tasks, sufficient to fool an interrogator.

3. Agent - Anything that can be viewed as perceiving its environment through sensors
and acting upon that environment through actuators.

4. Rational agent - Rational agent is one that does the right thing. A system is rational
if it does the “right thing”, given what it knows.

5. Omniscient agent - One that knows the actual outcome of its actions and can act accordingly.

6. Agent program - Takes the current percept as input from the sensors and returns an action to the actuators.

7. Agent function - An abstract mathematical description that maps any given percept sequence to an action.

8. Problem solving agent - Decides what to do by finding sequences of actions that lead to desirable states.

9. Backtracking search - A variant of depth-first search in which only one successor is generated at a time rather than all successors; each partially expanded node remembers which successor to generate next.

10. Depth-limited search - Depth-first search supplied with a predetermined depth limit l; that is, nodes at depth l are treated as if they have no successors.


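11. Uninformed search - Uses no additional information about states beyond that provided in the problem definition; it can only generate successors and distinguish a goal state from a non-goal state. Also known as blind search.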

12. Informed search - It is one that uses problem-specific knowledge beyond the
definition of the problem itself and can find solutions more efficiently than an
uninformed strategy.

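13. Iterative deepening search - A general strategy, often used in combination with depth-first search, that finds the best depth limit by gradually increasing the limit (first 0, then 1, then 2, and so on) until a goal is found.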

14. Breadth first search - The root node is expanded first then all the nodes generated
by the root node are expanded next and their successors and so on.

15. Greedy best-first search - Expands the node that is closest to the goal, on the
grounds that this is likely to lead to a solution quickly. Thus, it evaluates nodes by
using the heuristic function f(n) = h(n).

16. A* search - Evaluates nodes by combining g(n), the cost to reach the node, and h(n), the estimated cost to get from the node to the goal: f(n) = g(n) + h(n).

17. Recursive best-first search - A simple recursive algorithm that attempts to mimic the operation of standard best-first search, but using only linear space.

18. Local maximum - A peak that is higher than each of its neighboring states, but lower than the global maximum.

19. Ridges - Result in a sequence of local maxima that is very difficult for greedy algorithms to navigate.

20. Plateaux - An area of the state-space landscape where the evaluation function is flat.

21. Hill-climbing search - Simply a loop that continually moves in the direction of increasing value, that is, uphill. It terminates when it reaches a "peak" where no neighbor has a higher value.

22. Genetic algorithm - A variant of stochastic beam in which successor states are
generated by combining two parent states, rather than by modifying a single state.


23. Online search problems -Solved only by an agent executing actions, rather than by
a purely computational process. Assume that the agent knows the following:

24. ACTIONS(s) - Returns a list of actions allowed in state s.

25. Linear constraints - Constraints in which each variable appears only in linear form.

26. Unary Constraints – Constraints that restrict the value of a single variable.

27. Binary constraints - Constraints that relate two variables. A binary CSP is one with only binary constraints; it can be represented as a constraint graph.

28. Game - Defined by the initial state, the legal actions in each state, a terminal test
and a utility function that applies to terminal states.

29. Offline search - Offline search algorithms compute a complete solution before setting foot in the real world and then execute the solution without recourse to their percepts.

30. Commutative problem - A problem is commutative if the order of application of any given set of actions has no effect on the outcome.

31. Minimum remaining values - Choosing the variable with the fewest "legal" values. Also called the "most constrained variable" or "fail-first" heuristic.

32. Informed search strategy - Uses problem specific knowledge beyond the
definition of the problem itself.

33. Best First Search approach - An instance of the general TREE SEARCH
algorithm in which a node is selected for expansion based on an evaluation
function, f (n).

34. Nested quantifiers - Used to express more complex sentences using multiple quantifiers.

35. Equality symbol - Used to make statements to the effect that two terms refer to the same object.

36. Higher Order Logic - allows quantifying over relations and functions as well as
over objects.
37. First Order Logic - Representation language that is far more powerful than
propositional logic.

38. Declarative approach - The representation language makes it easy to express knowledge in the form of sentences, which simplifies the construction problem.

39. Syntax - Describes the possible configurations that can constitute sentences.

40. Semantics - Determines the facts in the world to which the sentences refer.

41. Entailment - The relation between sentences whereby a new sentence is necessarily true, given that the old sentences are true.

42. Tuple - Collection of objects arranged in a fixed order and is written with angle
brackets surrounding the objects.

43. Symbols - The basic syntactic elements of first-order logic are the symbols that stand for objects, relations and functions. The symbols come in three kinds: constant symbols, which stand for objects; predicate symbols, which stand for relations; and function symbols, which stand for functions.

44. Ground term - The term without variables.

45. Inference - The task of deriving the new sentence.

46. Datalog - Set of first order definite clauses with no function symbols.

47. Data complexity - Complexity of inference as a function of the number of ground facts in the database.

48. Prolog programs - set of definite clauses written in a notation somewhat different
from standard first-order logic.


49. Skolemization - Process of removing existential quantifiers by elimination.

50. Situations - logical terms consisting of the initial situation and all situations that
are generated by applying an action to a situation.

51. Fluent - functions and predicates that vary from one situation to the next, such as
the location of the agent.

52. Learning - takes many forms, depending on the nature of the performance element,
the component to be improved, and the available feedback.

53. Inductive learning - Learn a function from examples of its inputs and outputs.

54. PAC-learning algorithm - Any learning algorithm that returns hypotheses that are probably approximately correct.
55. Sample complexity - The number of required examples, as a function of ε and δ.

56. Neuron - A cell in the brain whose principal function is the collection, processing
and dissemination of electrical signals.

57. Epoch - Each cycle through the examples is called an epoch.

58. Communication - The intentional exchange of information brought about by the production and perception of signs drawn from a shared system of conventional signs. Most animals use signs to represent important messages.

59. Language - Enables us to communicate most of what we know about the world.

60. Grammar - A finite set of rules that specifies a language. Formal languages always have an official grammar; natural languages have no official grammar.

61. Metaphor - A figure of speech in which a phrase with one literal meaning is used
to suggest a different meaning by way of an analogy.

62. Discourse - Any string of language, usually one that is more than one sentence long.


63. Reference resolution - Interpretation of a pronoun or a definite noun phrase that refers to an object in the world.

64. Information retrieval - The task of finding documents that are relevant to a user's need for information. The best-known examples of information retrieval systems are search engines on the World Wide Web.



