A Course Material on
Artificial Intelligence
By
ASSISTANT PROFESSOR
QUALITY CERTIFICATE
This is to certify that the course material prepared by Mrs. J. Justina Princy Thilagavathy is of adequate quality and meets the knowledge requirement of the university curriculum. She has referred to more than five books, at least one of which is by a foreign author.
Signature of HD
Head & AP
TABLE OF CONTENTS
1. Introduction
2. Agents
3. Problem formulation
6. Constraint satisfaction
8. Inferences
10. Forward chaining
11. Backward chaining
12. Unification, Resolution
UNIT III - PLANNING
17. Probabilistic reasoning
18. Bayesian networks
UNIT V - LEARNING
22. Inductive learning
23. Decision trees
APPENDICES
A. Glossary
Aim: To learn the basics of designing intelligent agents that can solve general-purpose problems, represent and process knowledge, plan and act, reason under uncertainty, and learn from experience.
UNIT V LEARNING 9
Learning from observation - Inductive learning – Decision trees – Explanation based learning –
Statistical Learning methods - Reinforcement Learning
BOOK:
1. S. Russell and P. Norvig, “Artificial Intelligence – A Modern Approach”, Second Edition, Pearson Education, 2003.
REFERENCES:
UNIT-1
PROBLEM SOLVING
INTRODUCTION:
The objective of Artificial Intelligence is to understand how a system can perceive, understand, predict, and manipulate a world far larger and more complicated than itself. The goal of the field of Artificial Intelligence is to build intelligent entities.
DEFINITION:
Artificial Intelligence is the study of how to make computers do things at which, at the moment,
people are better.
SOME DEFINITIONS OF AI
“The exciting new effort to make computers think … machines with minds, in the full and
literal sense” -- Haugeland, 1985
“The art of creating machines that perform functions that require intelligence when
performed by people” -- Kurzweil, 1990
“The study of how to make computers do things at which, at the moment, people
are better” -- Rich and Knight, 1991
“The study of mental faculties through the use of computational models” -- Charniak
and McDermott, 1985
“The study of the computations that make it possible to perceive, reason, and act” --
Winston, 1992
“A field of study that seeks to explain and emulate intelligent behavior in terms of
computational processes” -- Schalkoff, 1990
AGENTS
Agent = perceive + act
Thinking
Reasoning
Planning
Definition:
An agent is anything that can be viewed as perceiving its environment through sensors and acting
upon the environment through actuators.
Ex: Robotic agent
Human agent
INTELLIGENT AGENT:
An agent uses perception of the environment to make decisions about actions to take.
The perception capability is usually called a sensor.
The actions can depend on the most recent perception or on the entire history (percept
sequence).
Fig: partial tabulation of a simple agent function for the vacuum-cleaner world (two locations, A and B)
Agent Function
1.The agent function is a mathematical function that maps a sequence of perceptions into
action.
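The agent function can be illustrated with a small sketch. The percepts, actions and table entries below are illustrative assumptions in the spirit of the vacuum-cleaner example, not part of the prescribed syllabus.

```python
# A minimal sketch of an agent function as a lookup table.
# Percepts, actions and table entries are illustrative assumptions,
# loosely based on the two-square vacuum-cleaner world (locations A and B).

table = {
    (("A", "Dirty"),): "Suck",
    (("A", "Clean"),): "Right",
    (("B", "Dirty"),): "Suck",
    (("B", "Clean"),): "Left",
}

percept_sequence = []          # the entire percept history seen so far

def table_driven_agent(percept):
    """Agent function: maps the percept sequence seen so far to an action."""
    percept_sequence.append(percept)
    # Fall back to a do-nothing action for sequences not listed in the table.
    return table.get(tuple(percept_sequence), "NoOp")

print(table_driven_agent(("A", "Dirty")))   # -> Suck
```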
RATIONAL AGENT:
A rational agent is one that can take the right decision in every situation.
Performance measure: a set of criteria/test bed for the success of the agent's behavior.
The performance measures should be based on the desired effect of the agent on
the environment.
Rationality:
Definition: for every possible percept sequence, the agent is expected to take an
action that will maximize its performance measure.
Agent Autonomy:
An agent is omniscient if it knows the actual outcome of its actions; this is not possible in practice.
Autonomy: the capacity to compensate for partial or incorrect prior knowledge (usually by
learning).
NATURE OF ENVIRONMENTS:
An agent interacts with its environment through sensors and actuators. Environments can be classified along several dimensions:
– Fully observable: the agent's sensors give it the complete state of the environment at each point in time, and detect all aspects that are relevant to the choice of action.
– Strategic: the environment is deterministic except for the actions of other agents.
– Episodic: the agent's experience can be divided into episodes, each episode consisting of the agent perceiving and then performing a single action.
– Semi-dynamic
Partially observable:
If the agent's sensors give it access to only part of the state of the environment, the environment is partially observable.
Semi-dynamic:
If the environment itself does not change with the passage of time, but the agent's performance score does, the environment is semi-dynamic.
An agent solving a cross word puzzle by itself is clearly in a single agent environment.
Types of agents:
1. simple reflex agent
2. model-based reflex agent
3. goal-based agent
4. utility-based agent
Definition:
A simple reflex agent (SRA) works only if the correct decision can be made on the basis of the current percept alone, that is, only if the environment is fully observable.
Characteristics
– no plan, no goal
– do not know what they want to achieve
– act using condition-action rules
Algorithm explanation:
INTERPRET-INPUT:
This function generates an abstracted description of the current state from the percept.
RULE-MATCH:
This function returns the first rule in the set of rules that matches the given state description.
RULE-ACTION:
The action component of the matched rule is returned and executed.
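A minimal sketch of the simple reflex agent loop just described (INTERPRET-INPUT, RULE-MATCH, rule action). The condition-action rules shown are assumed vacuum-cleaner-world rules, added only for illustration.

```python
# Simple reflex agent: acts only on the current percept via condition-action rules.
# The rules below are illustrative assumptions for the vacuum-cleaner world.

RULES = [
    (lambda state: state["status"] == "Dirty", "Suck"),
    (lambda state: state["location"] == "A",   "Right"),
    (lambda state: state["location"] == "B",   "Left"),
]

def interpret_input(percept):
    """Generate an abstracted state description from the raw percept."""
    location, status = percept
    return {"location": location, "status": status}

def rule_match(state, rules):
    """Return the action of the first rule whose condition matches the state."""
    for condition, action in rules:
        if condition(state):
            return action
    return "NoOp"

def simple_reflex_agent(percept):
    state = interpret_input(percept)
    return rule_match(state, RULES)      # the matched rule's action part is returned

print(simple_reflex_agent(("A", "Dirty")))   # -> Suck
```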
Definition:
An agent which combines the current percept with the old internal state to
generate updated description of the current state.
If the world is not fully observable, the agent must remember observations about the
parts of the environment it cannot currently observe.
This usually requires an internal representation of the world (or internal state).
Since this representation is a model of the world, we call this model-based agent.
Characteristics
Algorithm explanation:
UPDATE-INPUT: This function is responsible for creating the new internal state description.
Goal-based agents:
The agent has a purpose and the action to be taken depends on the current state
and on what it tries to accomplish (the goal).
In some cases the goal is easy to achieve. In others it involves planning, sifting through a
search space for possible solutions, developing a strategy.
Characteristics
Utility-based agents
If one state is preferred over the other, then it has higher utility for the agent
The agent is aware of a utility function that estimates how close the current state is to the
agent's goal.
• Characteristics
Learning Agents
Learning element
Performance element
Critic
Problem generator
Agent Example
Purpose: compress and archive files that have not been used in a while.
Problem Formulation
• Problem formulation is the process of deciding what actions and states to consider,
given a goal
Search
Execute
PROBLEMS
– Possible Actions
• State Space – the state space forms a graph in which the nodes are
states and arcs between nodes are actions.
• Path
• Route finding
• Logistics
• VLSI layout
• Robot navigation
• Learning
TOY PROBLEM
Problem Formulation
• States
– 2 × 2² = 8 states (2 agent locations × 2² dirt configurations)
• Initial State
• Successor Function
– Legal states that result from three actions (Left, Right, Suck)
• Goal Test
• Path Cost
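This formulation can be written down directly as code. The sketch below encodes the eight-state vacuum world (states, initial state, successor function, goal test, unit path cost); the function names are our own, not from any standard library.

```python
# Problem formulation for the 2-location vacuum world (2 x 2^2 = 8 states).
# A state is (agent_location, dirt_at_A, dirt_at_B); names are illustrative.
from itertools import product

STATES = [(loc, a, b) for loc, a, b in product("AB", [True, False], [True, False])]
INITIAL_STATE = ("A", True, True)            # assumed initial state: both squares dirty

def successors(state):
    """Successor function: legal (action, next_state) pairs for Left, Right, Suck."""
    loc, dirt_a, dirt_b = state
    result = [("Left",  ("A", dirt_a, dirt_b)),   # Left moves to / stays at A
              ("Right", ("B", dirt_a, dirt_b))]   # Right moves to / stays at B
    if loc == "A":
        result.append(("Suck", ("A", False, dirt_b)))
    else:
        result.append(("Suck", ("B", dirt_a, False)))
    return result

def goal_test(state):
    """Goal: no dirt in either square."""
    _, dirt_a, dirt_b = state
    return not dirt_a and not dirt_b

def path_cost(path):
    """Each step costs 1, so the path cost is just the number of actions."""
    return len(path)

print(len(STATES), goal_test(("B", False, False)))   # -> 8 True
```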
• Uninformed strategies use only the information available in the problem definition
• Breadth-first search
• Uniform-cost search
• Depth-first search
• Depth-limited search
BREADTH-FIRST SEARCH
Definition:
The root node is expanded first, and then all the nodes generated by the node are expanded.
Implementation:
• Complete
• Time
– 1 + b + b² + … + b^d + b(b^d − 1) = O(b^(d+1))
– exponential in d
• Space
– O(b^(d+1))
– This is the big problem; an agent that generates nodes at 10 MB/sec will produce about 860 GB in 24 hours
• Optimal
• The memory requirements are a bigger problem for breadth-first search than is
execution time
Given: a search graph with start node S and goal node G (the step-by-step expansion figure is omitted).
Answer: The path found at the 2nd depth level is S-B-G (or S-C-G).
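A minimal breadth-first search sketch. The graph used below is an assumed example chosen to match the S-B-G / S-C-G answer above, since the original figure is not available.

```python
from collections import deque

# Assumed graph corresponding to the worked example: S is the start, G the goal.
GRAPH = {"S": ["A", "B", "C"], "A": ["D"], "B": ["G"], "C": ["G"], "D": [], "G": []}

def breadth_first_search(start, goal, graph):
    """Expand the shallowest unexpanded node first (FIFO queue of paths).
    This is a tree-search sketch; repeated-state checking is omitted."""
    frontier = deque([[start]])
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:
            return path                      # shallowest solution found
        for child in graph[node]:
            frontier.append(path + [child])
    return None

print(breadth_first_search("S", "G", GRAPH))   # -> ['S', 'B', 'G']
```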
DEPTH-FIRST SEARCH
Definition:
Expand one node to the deepest level of the tree. If a dead end occurs, backtracking is done to the most recent previous node whose successors have not all been expanded.
• Nodes are placed on the fringe in LIFO (last-in, first-out) order; that is, a stack data structure is used to order the nodes.
• It needs to store only a single path from the root to a leaf node, along with
remaining unexpanded sibling nodes for each node on a path
Implementation:
• Complete
• Time
– O(b^m)
– But if the solutions are dense, this may be faster than breadth-first search
• Space
– O(bm)…linear space
• Optimal
– No
• When search hits a dead-end, can only back up one level at a time even if the
“problem” occurs because of a bad operator choice near the top of the tree.
Hence, only does “chronological backtracking”
Advantage:
• If more than one solution exists, or the number of levels is high, then DFS is best because only a small portion of the search space is explored.
Disadvantage:
Given: a search graph with start node S and goal node G (the step-by-step expansion figure is omitted).
Answer: The path found at the 3rd level is S-A-D-G.
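A minimal depth-first search sketch. Again the graph is an assumed example chosen to match the S-A-D-G answer above.

```python
# Assumed graph for the worked example above: the solution found is S-A-D-G.
GRAPH = {"S": ["A", "B", "C"], "A": ["D"], "B": [], "C": [], "D": ["G"], "G": []}

def depth_first_search(start, goal, graph):
    """Expand the deepest node first: the frontier is used as a LIFO stack."""
    frontier = [[start]]                     # stack of paths
    while frontier:
        path = frontier.pop()                # last in, first out
        node = path[-1]
        if node == goal:
            return path
        # Push children in reverse so the leftmost child is expanded first.
        for child in reversed(graph[node]):
            if child not in path:            # avoid trivial loops along a path
                frontier.append(path + [child])
    return None

print(depth_first_search("S", "G", GRAPH))   # -> ['S', 'A', 'D', 'G']
```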
DEPTH-LIMITED SEARCH
Definition:
A cut-off (maximum depth level) is introduced in this search technique to overcome the disadvantage of depth-first search. The cut-off value depends on the number of states. DLS can be implemented as a simple modification to the general tree-search algorithm or to the recursive DFS algorithm. DLS imposes a fixed depth limit on a DFS.
• Complete
– Only if l ≥ d (incomplete when l < d)
• Time
– O(b^l)
• Space
– O(bl)
• Optimal
– No if l > d
Advantage:
Disadvantage:
Given: a map with five states A, B, C, D and E.
The number of states in the given map is five, so it is possible to reach the goal state at a maximum depth of four. Therefore the cut-off value is four.
(Step-by-step expansion figure omitted.)
Definition:
• Iterative deepening depth-first search is a strategy that sidesteps the issue of choosing the best depth limit by trying all possible depth limits in turn.
A related idea is to use increasing path-cost limits instead of increasing depth limits; the resulting algorithm is called iterative lengthening search.
Implementation:
• Complete
– Yes
• Time: N(IDS) = (d)b + (d − 1)b² + … + (1)b^d
– O(b^d)
• Space
– O(bd)
• Optimal
Advantages:
• This method is preferred for large state space and when the depth of the search
is not known.
Disadvantages:
– If b = 4, then the worst case is 1.78 × 4^d, i.e., 78% more nodes are searched than exist at depth d (in the worst case).
Given: a search tree with root A, children B, C and F, and the goal G below F (expansion figure omitted).
Limit = 0: only A is expanded.
Limit = 1: A is expanded, generating B, C and F.
Limit = 2: the search reaches G below F.
Answer: Since it is an IDS tree, the solution at the lowest depth limit, i.e. A-F-G, is selected as the solution path.
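A sketch of iterative deepening search as repeated depth-limited searches. The graph is an assumed reconstruction of the example above, in which the shallowest solution is A-F-G.

```python
# Iterative deepening DFS: repeated depth-limited searches with limit 0, 1, 2, ...
# Assumed graph for the example above; the shallowest solution found is A-F-G.
GRAPH = {"A": ["B", "C", "F"], "B": ["D", "E"], "C": [], "F": ["G"],
         "D": [], "E": [], "G": []}

def depth_limited_search(node, goal, graph, limit, path=None):
    """Recursive DFS that treats nodes at depth `limit` as having no successors."""
    path = (path or []) + [node]
    if node == goal:
        return path
    if limit == 0:
        return None                          # cutoff reached
    for child in graph[node]:
        result = depth_limited_search(child, goal, graph, limit - 1, path)
        if result is not None:
            return result
    return None

def iterative_deepening_search(start, goal, graph, max_depth=20):
    for limit in range(max_depth + 1):       # limit = 0, 1, 2, ...
        result = depth_limited_search(start, goal, graph, limit)
        if result is not None:
            return result
    return None

print(iterative_deepening_search("A", "G", GRAPH))   # -> ['A', 'F', 'G']
```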
BI-DIRECTIONAL SEARCH
Definition:
It is a strategy that simultaneously searches both the directions (i.e) forward from the
initial state and backward from the goal state and stops when the two searches meet
in the Middle.
• Alternate searching from the start state toward the goal and from the goal state
toward the start.
• Works well only when there are unique start and goal states.
3. Complete: Yes
4. Optimal: Yes
Advantages:
Disadvantages:
The space requirement is the most significant weakness of bi-directional search. If the two searches do not meet at all, complexity arises in the search technique. In the backward search, calculating predecessors is a difficult task. If more than one goal state exists, then explicit multiple-state searches are required.
• Completeness
• Time
• Space
• Optimal
Heuristic / Informed
It uses additional information about nodes (heuristics) that have not yet been explored to
decide which nodes to examine next
Can find solutions more efficiently than search strategies that do not use domain specific
knowledge.
Best-first search: node is selected for expansion based on an evaluation function f(n)
Implementation:
Definition:
A best-first search that uses the heuristic h(n), the estimated cost from node n to the goal, to select the next node to expand is called greedy search.
Ex:
Given,
Solution:
From the given graph and the estimated costs, the goal state B (Bucharest) is to be reached from A (Arad). Apply the evaluation function h(n) to find a path from A to B.
From F (Fagaras) the goal state B is reached. Therefore the path from A to B using greedy search is A-S-F-B, with path cost 450 (i.e. 140 + 99 + 211), for the problem of finding a route from Arad to Bucharest.
• Complete? No – can get stuck in loops, e.g., Iasi → Neamt → Iasi → Neamt
• Optimal? No
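A sketch of greedy best-first search on the Arad-to-Bucharest route-finding example above. The straight-line-distance values are the usual textbook figures and the map fragment is partial; both should be treated as illustrative.

```python
import heapq

# Fragment of the Romania map from the route-finding example above.
# h(n): straight-line distance to Bucharest (textbook values, used for illustration).
H = {"Arad": 366, "Zerind": 374, "Timisoara": 329, "Sibiu": 253,
     "Oradea": 380, "Fagaras": 176, "Rimnicu Vilcea": 193, "Bucharest": 0}
GRAPH = {
    "Arad": ["Zerind", "Sibiu", "Timisoara"],
    "Sibiu": ["Arad", "Oradea", "Fagaras", "Rimnicu Vilcea"],
    "Fagaras": ["Sibiu", "Bucharest"],
    "Zerind": ["Arad"], "Timisoara": ["Arad"], "Oradea": ["Sibiu"],
    "Rimnicu Vilcea": ["Sibiu"], "Bucharest": [],
}

def greedy_best_first(start, goal, graph, h):
    """Expand the node with the lowest heuristic value f(n) = h(n) first."""
    frontier = [(h[start], [start])]
    visited = set()
    while frontier:
        _, path = heapq.heappop(frontier)
        node = path[-1]
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for child in graph[node]:
            heapq.heappush(frontier, (h[child], path + [child]))
    return None

print(greedy_best_first("Arad", "Bucharest", GRAPH, H))
# -> ['Arad', 'Sibiu', 'Fagaras', 'Bucharest']
```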
• Standard search problem: the state is a "black box" – any data structure that supports the successor function, heuristic function, and goal test.
• CSP: the state is defined by variables Xi with values from a domain Di, and the goal test is a set of constraints specifying allowable combinations of values for subsets of variables.
• This allows useful general-purpose algorithms with more power than standard search algorithms.
Arc consistency:
An arc X → Y is consistent iff for every value x of X there is some allowed value y of Y.
Path consistency:
Path consistency means that any pair of adjacent variables can always be extended to a third neighbouring variable.
K-consistency:
A CSP is k-consistent if, for any set of k − 1 variables and any consistent assignment to those variables, a consistent value can always be assigned to any k-th variable.
Example: Map-Coloring
• Domains Di = {red,green,blue}
Constraint graph
Varieties of CSPs
• Discrete variables
– finite domains:
– infinite domains:
• e.g., job scheduling, variables are start/end days for each job
• Continuous variables
Varieties of constraints:
– e.g., SA ≠ green
– e.g., SA ≠ WA
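A minimal backtracking sketch of the map-colouring CSP described above; the neighbour relation encodes the Australia map, and the consistency check implements binary constraints such as SA ≠ WA.

```python
# Map-colouring CSP sketch: variables are regions, domains are {red, green, blue},
# and the constraint is that neighbouring regions take different colours.
NEIGHBOURS = {
    "WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"], "SA": ["WA", "NT", "Q", "NSW", "V"],
    "Q": ["NT", "SA", "NSW"], "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"], "T": [],
}
DOMAIN = ["red", "green", "blue"]

def consistent(var, value, assignment):
    """Binary constraint check: no neighbour may already have the same colour."""
    return all(assignment.get(n) != value for n in NEIGHBOURS[var])

def backtracking_search(assignment=None):
    assignment = assignment or {}
    if len(assignment) == len(NEIGHBOURS):
        return assignment
    var = next(v for v in NEIGHBOURS if v not in assignment)   # pick an unassigned variable
    for value in DOMAIN:
        if consistent(var, value, assignment):
            result = backtracking_search({**assignment, var: value})
            if result is not None:
                return result
    return None                                                # no value worked: backtrack

print(backtracking_search())
```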
Knowledge representation
A variety of ways of representing knowledge (facts) have been exploited in AI programs. Facts are truths in some relevant world; these are the things we want to represent.
Propositional logic
Example
Inference
Inference is deriving new sentences from old.
Modus ponens
There are standard patterns of inference that can be applied to derive chains of
conclusions that lead to the desired goal. These patterns of inference are called inference
rules.
Entailment
Propositions tell about the notion of truth and can be applied to logical reasoning. We can have logical entailment between sentences: a sentence follows logically from another sentence. In mathematical notation we write KB ⊨ α.
Knowledge-based agents (logical agents): the central component of a knowledge-based agent is its knowledge base, or KB. Informally, a knowledge base is a set of sentences. Each sentence is expressed in a language called a knowledge representation language and represents some assertion about the world.
The syntax of propositional logic defines the allowable sentences. The atomic sentences - the indivisible syntactic elements - consist of a single proposition symbol. Each such symbol stands for a proposition that can be true or false. We will use uppercase names for symbols: P, Q, R, and so on.
The basic syntactic elements of first-order logic are the symbols that stand for objects, relations, and functions. The symbols come in three kinds: constant symbols (which stand for objects), predicate symbols (which stand for relations), and function symbols (which stand for functions).
We adopt the convention that these symbols will begin with uppercase letters.
Example: Constant symbols: Richard
Quantifiers
a) Universal (∀) and
b) Existential (∃)
Universal quantification
(∀x) P(x) means that P holds for all values of x in the domain associated with that variable.
Existential quantification
(∃x) P(x) means that P holds for some value of x in the domain associated with that variable.
E.g., (∃x) mammal(x) ∧ lays-eggs(x)
This permits one to make a statement about some object without naming it.
The sentence ∀x P, where P is a logical expression, says that P is true for every object x.
Example
The task will determine what knowledge must be represented in order to connect problem instances to answers. This step is analogous to the PEAS process for designing agents.
Once the choices have been made, the result is a vocabulary that is known as the ontology of the domain. The word ontology means a particular theory of the nature of being or existence.
The knowledge engineer writes down the axioms for all the vocabulary terms. This pins down
(to the extent possible) the meaning of the terms, enabling the expert to check the content.
Often, this step reveals misconceptions or gaps in the vocabulary that must be fixed by returning
to step 3 and iterating through the process.
For a logical agent, problem instances are supplied by the sensors, whereas a "disembodied"
knowledge base is supplied with additional sentences in the same way that traditional
programs are supplied with input data.
This is where the reward is: we can let the inference procedure operate on the axioms and
problem-specific facts to derive the facts we are interested in knowing.
We will develop an ontology and knowledge base that allow us to reason about digital Circuits
of the kind shown in Figure 8.4. We follow the seven-step process for knowledge engineering
There are many reasoning tasks associated with digital circuits. At the highest level, one
analyzes the circuit's functionality. For example, what are all the gates connected to the first
input terminal? Does the circuit contain feedback loops? These will be our tasks in this section.
What do we know about digital circuits? For our purposes, they are composed of wires and gates. Signals flow along wires to the input terminals of gates, and each gate produces a signal on its output terminal.
Decide on a vocabulary:
We now know that we want to talk about circuits, terminals, signals, and gates. The next
step is to choose functions, predicates, and constants to represent them. We will start from
individual gates and move up to circuits. First, we need to be able to distinguish a gate from
other gates. This is handled by naming gates with constants: X1, X2, and so on.
One sign that we have a good ontology is that there are very few general rules which need
to be specified. A sign that we have a good vocabulary is that each rule can be stated clearly
and concisely. With our example, we need only seven simple rules to describe everything we
need to know about circuits:
1. If two terminals are connected, then they have the same signal:
The circuit shown in Figure 8.4 is encoded as circuit C1 with the following description.
What combinations of inputs would cause the first output of C1 (the sum bit) to be 0 and the second output of C1 (the carry bit) to be 1?
We can perturb the knowledge base in various ways to see what kinds of erroneous
behaviors
emerge.
The best way to find usage of first-order logic is through examples. The examples can be taken from some simple domains. In knowledge representation, a domain is just some part of the world about which we wish to express some knowledge.
Sentences are added to a knowledge base using TELL, exactly as in propositional logic.
Such sentences are called assertions.
For example, we can assert that John is a king and that kings are persons:
Note:
(c) The facts inferred on the 2nd iteration are at the top level
ALGORITHM
Forward chaining applies a set of rules and facts to deduce whatever conclusions can be
derived. In backward chaining ,we start from a conclusion, which is the hypothesis we
wish to prove and we aim to show how that conclusion can be reached from the rules and
facts in the data base. The conclusion we are aiming to prove is called a goal, and the
reasoning in this way is known as goal-driven.
Note:
(a) To prove Criminal(West), we have to prove the four conjuncts below it.
(b) Some of them are in the knowledge base, and others require further backward chaining.
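The forward-chaining procedure described above can be sketched over propositional definite clauses. The rules and facts below are a propositionalised, simplified version of the Criminal(West) example and are assumptions made for illustration only.

```python
# Forward chaining over propositional definite clauses: repeatedly fire any rule
# whose premises are all known facts, until no new conclusions can be derived.
# Rules and facts below are illustrative assumptions.
RULES = [
    ({"american", "weapon", "sells", "hostile"}, "criminal"),
    ({"missile"}, "weapon"),
    ({"missile", "owns"}, "sells"),
    ({"enemy"}, "hostile"),
]
FACTS = {"american", "missile", "owns", "enemy"}

def forward_chaining(rules, facts):
    facts = set(facts)
    changed = True
    while changed:                      # keep iterating while new facts appear
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)   # rule fires: add its conclusion
                changed = True
    return facts

print("criminal" in forward_chaining(RULES, FACTS))   # -> True
```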
UNIFICATION:
UNIFY(p, q) = θ, where SUBST(θ, p) = SUBST(θ, q); that is, unification finds a substitution θ that makes two sentences identical.
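A compact sketch of the unification algorithm. The term encoding (tuples for compound terms, lowercase strings for variables) is our own convention, and the occurs check is omitted for brevity.

```python
# A small unification sketch for first-order terms. Variables are strings that
# start with a lowercase letter; compound terms are tuples (functor, arg1, ...).
# Note: the occurs check is omitted for brevity.
def is_variable(t):
    return isinstance(t, str) and t[:1].islower()

def substitute(t, theta):
    """Apply the substitution theta to a term."""
    while is_variable(t) and t in theta:
        t = theta[t]
    if isinstance(t, tuple):
        return tuple([t[0]] + [substitute(a, theta) for a in t[1:]])
    return t

def unify(x, y, theta=None):
    """Return a substitution theta with SUBST(theta, x) = SUBST(theta, y), or None."""
    if theta is None:
        theta = {}
    x, y = substitute(x, theta), substitute(y, theta)
    if x == y:
        return theta
    if is_variable(x):
        return {**theta, x: y}
    if is_variable(y):
        return {**theta, y: x}
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y) and x[0] == y[0]:
        for a, b in zip(x[1:], y[1:]):
            theta = unify(a, b, theta)
            if theta is None:
                return None
        return theta
    return None

# UNIFY(Knows(John, x), Knows(y, Mary)) -> {'y': 'John', 'x': 'Mary'}
print(unify(("Knows", "John", "x"), ("Knows", "y", "Mary")))
```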
RESOLUTION:
INF
CNF
INF WITH REFUTATION
CNF WITH REFUTATION
UNIT III-PLANNING
The agent first generates a goal to achieve and then constructs a plan to achieve it from the current state.
PROBLEMSOLVING TO PLANNING
Forward search
Backward search
Heuristic search
Solutions
Why Planning ?
Intelligent agents must operate in the world. They are not simply passive reasoners (knowledge representation, reasoning under uncertainty) or problem solvers (search); they must also act on the world.
We want intelligent agents to act in “intelligent ways”. Taking purposeful actions, predicting the
expected effect of such actions, composing actions together to achieve complex goals.
E.g. if we have a robot we want robot to decide what to do; how to act to achieve our goals
Planning Problem
Choose a step S from the plan, or a new step S obtained by instantiating an operator that has c as an effect.
• If there is no such step, Fail.
• Fail – go back to the most recent non-deterministic choice and try a different one that has not been tried before.
Resolve threats:
• A step S threatens a causal link Si →c Sj iff ¬c ∈ effects(S) and it is possible that Si < S < Sj.
• For each threat, choose a way to resolve it, e.g., by ordering S before Si (demotion) or after Sj (promotion).
Threats with Variables If c has variables in it, things are kind of tricky.
•We could possibly resolve the threat by adding a negative variable binding constraint,
saying that two variables or a variable and a constant cannot be bound to one another
• Another strategy is to ignore such threats until the very end, hoping that the variables will
become bound and make things easier to deal with
Shopping Domain
• Start (initial state): At(Home), Sells(SM, Milk), Sells(SM, Bananas), Sells(HDW, Drill)
• Goal: Have(Milk) ∧ Have(Bananas) ∧ Have(Drill)
• Actions:
– Buy(x, store)
  Pre: At(store), Sells(store, x)
  Eff: Have(x)
– Go(x, y)
  Pre: At(x)
  Eff: At(y), ¬At(x)
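The shopping operators above can be encoded in a STRIPS-like style, with precondition, add and delete lists; the dictionary-based encoding below is an assumption made for this sketch, not a fixed notation from the notes.

```python
# A STRIPS-style encoding of the shopping operators: each action has a
# precondition set and add/delete effect sets. Predicates are encoded as tuples.
def go(x, y):
    return {"name": f"Go({x},{y})",
            "pre":  {("At", x)},
            "add":  {("At", y)},
            "del":  {("At", x)}}

def buy(item, store):
    return {"name": f"Buy({item},{store})",
            "pre":  {("At", store), ("Sells", store, item)},
            "add":  {("Have", item)},
            "del":  set()}

def apply_action(state, action):
    """Apply an action if its preconditions hold; return the successor state."""
    if not action["pre"] <= state:
        raise ValueError(f"preconditions of {action['name']} not satisfied")
    return (state - action["del"]) | action["add"]

# Initial state of the shopping problem (HDW = hardware store, SM = supermarket).
state = {("At", "Home"), ("Sells", "HDW", "Drill"),
         ("Sells", "SM", "Milk"), ("Sells", "SM", "Bananas")}
state = apply_action(state, go("Home", "HDW"))
state = apply_action(state, buy("Drill", "HDW"))
print(("Have", "Drill") in state)   # -> True
```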
Shopping problem
(Partial-order planning diagrams omitted.) The plan grows from the start step, which supplies At(Home) and the Sells facts, toward the finish step, which requires Have(D), Have(M) and Have(B). Buy(Drill) requires At(HDW) and Sells(HDW, D); Buy(Milk) and Buy(Bananas) require At(SM) and the corresponding Sells facts. Go(HDW) supplies At(HDW) from At(x1) with the binding x1 = Home, and Go(SM) supplies At(SM) from At(x2) with the binding x2 = HDW, which also resolves the threat arising from ¬At(x2). Note that causal links imply an ordering of the steps.
Levels
Mutex between actions
Mutex holds between fluents
Graph plan algorithm
Continuous planning
Multiagent planning
UNIT-IV: UNCERTAINTY
4.1 UNCERTAINTY
To act rationally under uncertainty we must be able to evaluate how likely certain
things are. With FOL a fact F is only useful if it is known to be true or false. But we need
to be able to evaluate how likely it is that F is true. By weighing likelihoods of events
(probabilities) we can develop mechanisms for acting rationally under uncertainty.
When do we stop?
Cannot list all possible causes.
We also want to rank the possibilities. We don’t want to start drilling for a cavity before
checking for more likely causes first.
Axioms Of Probability
1.Pr(U) = 1
2.Pr(A) ∈[0,1]
3.Pr(A ∪B) = Pr(A) + Pr(B) –Pr(A ∩B)
Multiply connected graphs have 2 nodes connected by more than one path
Techniques for handling:
o Clustering: Group some of the intermediate nodes into one meganode.
Pro: Perhaps best way to get exact evaluation.
Con: Conditional probability tables may exponentially increase in size.
o Cutset conditioning: Obtain simpler polytrees by instantiating variables as constants.
Con: May obtain an exponential number of simpler polytrees.
Pro: It may be safe to ignore trees with low probability (bounded cutset conditioning).
o Stochastic simulation: run through the net with randomly chosen values for each node (weighted by prior probabilities).
Bayes’ nets:
A technique for describing complex joint distributions (models) using simple, local
distributions
(conditional probabilities)
More properly called graphical models
Local interactions chain together to give global indirect interactions
Such networks are called directed acyclic graphs, or simply dags. There are a
number of steps that a knowledge engineer must undertake when building a Bayesian
network. At this stage we will present these steps as a sequence; however it is important to
note that in the real-world the process is not so simple.
Boolean nodes, which represent propositions, taking the binary values true (T)
and false (F). In a medical diagnosis domain, the node Cancer would represent
the proposition that a patient has cancer.
Ordered values. For example, a node Pollution might represent a patient's pollution exposure and take the values low, medium, high.
Integral values. For example, a node called Age might represent a patient’s age
and have possible values from 1 to 120.
Even at this early stage, modeling choices are being made. For example, an
alternative to representing a patient’s exact age might be to clump patients into different
age groups, such as baby, child, adolescent, young, middleaged, old. The trick is to choose
values that represent the domain efficiently.
In general, the problem of Bayes Net inference is NP-hard (exponential in the size
of the graph).
For singly-connected networks or polytrees in which there are no undirected loops,
there are linear time algorithms based on belief propagation.
Each node sends local evidence messages to its children and parents.
Each node updates its belief in each of its possible values based on incoming messages from its neighbors and propagates evidence on to its neighbors.
There are approximations to inference for general networks based on loopy belief propagation, which iteratively refines probabilities and often converges to a good approximation.
TEMPORAL MODELS
1 Monitoring or filtering
2 Prediction
Bayes' Theorem
Many of the methods used for dealing with uncertainty in expert systems are based
on Bayes' Theorem.
Notation:
P(A) - probability of event A
P(A ∧ B) - probability of events A and B occurring together
P(A | B) - conditional probability of event A given that event B has occurred
If A and B are independent, then P(A | B) = P(A).
Expert systems usually deal with events that are not independent, e.g. a disease and
its symptoms are not independent.
Theorem
P(A ∧ B) = P(A | B) · P(B) = P(B | A) · P(A), therefore P(A | B) = P(B | A) · P(A) / P(B)
The desired diagnostic relationship on the left can be calculated based on the known
statistical quantities on the right.
Toothache ¬ Toothache
Cavity 0.04 0.06
¬ Cavity 0.01 0.89
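The diagnostic use of the joint table above can be checked directly, since P(A | B) = P(A ∧ B) / P(B); the numbers below are taken from the table.

```python
# Joint distribution taken from the table above.
P = {("Cavity", "Toothache"): 0.04, ("Cavity", "NoToothache"): 0.06,
     ("NoCavity", "Toothache"): 0.01, ("NoCavity", "NoToothache"): 0.89}

p_toothache = P[("Cavity", "Toothache")] + P[("NoCavity", "Toothache")]   # P(Toothache) = 0.05
p_cavity_given_toothache = P[("Cavity", "Toothache")] / p_toothache       # 0.04 / 0.05

print(round(p_toothache, 2), round(p_cavity_given_toothache, 2))          # -> 0.05 0.8
```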
Problems:
The size of the table is combinatoric: the product of the number of possibilities for
each random variable. The time to answer a question from the table will also be
combinatoric. Lack of evidence: we may not have statistics for some table entries, even
though those entries are not impossible.
Chain Rule
Bayesian Networks
Bayesian networks, also called belief networks or Bayesian belief networks, express
relationships among variables by directed acyclic graphs with probability tables stored at
the nodes.[Example from Russell & Norvig.]
1 A burglary can set the alarm off
2 An earthquake can set the alarm off
3 The alarm can cause Mary to call
4 The alarm can cause John to call
If a Bayesian network is well structured as a poly-tree (at most one path between
any two nodes), then probabilities can be computed relatively efficiently. One kind of
algorithm, due to Judea Pearl, uses a message-passing style in which nodes of the network
compute probabilities and send them to nodes they are connected to. Several software
packages exist for computing with belief networks.
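A small sketch of how the burglary/earthquake/alarm network defines a joint distribution as a product of local conditional probabilities. The CPT numbers are the commonly quoted illustrative values for this example and are assumptions here, not figures given in these notes.

```python
# Burglary / Earthquake / Alarm / JohnCalls / MaryCalls network from the text above.
# CPT numbers below are the commonly used illustrative values (assumptions here).
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94, (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}     # P(JohnCalls | Alarm)
P_M = {True: 0.70, False: 0.01}     # P(MaryCalls | Alarm)

def joint(b, e, a, j, m):
    """P(b, e, a, j, m) as a product of the local conditional probabilities."""
    pb = P_B if b else 1 - P_B
    pe = P_E if e else 1 - P_E
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return pb * pe * pa * pj * pm

# P(JohnCalls, MaryCalls, Alarm, no burglary, no earthquake) ~= 0.00063
print(joint(False, False, True, True, True))
```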
A Hidden Markov Model (HMM) tagger chooses the tag for each word that maximizes:
P(word | tag) × P(tag | previous n tags)   [Jurafsky, op. cit.]
In practice, trigram taggers are most often used, and a search is made for the best
set of tags for the whole sentence; accuracy is about 96%.
The assumptions behind an HMM are that the state at time t+1 only depends on the
state at time t, as in the Markov chain. The observation at time t only depends on the state
at time t. The observations are modeled using the variable for each time t whose domain is
the set of possible observations. The belief network representation of an HMM is depicted
in Figure. Although the belief network is shown for four stages, it can proceed indefinitely.
Note that all state and observation variables after Si are irrelevant because they are not observed, and can be ignored when the conditional distribution P(Si | O0, ..., Ok) is computed.
UNIT-V
LEARNING
Introduction:
What is learning?
Learning denotes changes in the system that are adaptive in the sense that they enable the
system to do the same task or tasks drawn from the same population more effectively the next
time (Simon, 1983).
(Michalski, 1986).
A computer program learns if it improves its performance at some task through experience
(Mitchell, 1997).
So what is learning?
(1) acquire and organize knowledge (by building, modifying and organizing internal
representations of some external reality);
(2) discover new knowledge and theories (by creating hypotheses that explain some data or
phenomena);
(3) acquire skills (by gradually improving their motor or cognitive skills through repeated
practice,
sometimes involving little or no conscious thought).
(4) Learning results in changes in the agent (or mind) that improve its competence and/or
efficiency.
(5) Learning is essential for unknown environments, i.e., when the designer lacks omniscience.
Learning agents:
• Four Components
1. Performance Element: collection of knowledge and procedures to decide on the next action.
2. Learning Element: takes in feedback from the critic and modifies the performance element
accordingly.
3. Critic: provides the learning element with information on how well the agent is doing based on a
fixed performance standard. E.g. the audience
4. Problem Generator: provides the performance element with suggestions on new actions to take.
• Information about the results of possible actions the agent can take
Learning element
Type of feedback:
Inductive learning: in supervised learning we have a set of pairs {(xi, f(xi))} for 1 ≤ i ≤ n, and our aim is to determine f by some adaptive algorithm. It is a machine learning approach in which rules are inferred from facts or data; in logic, it is reasoning from the specific to the general. Theoretical results in machine learning mainly deal with a type of inductive learning called supervised learning, in which an algorithm is given samples that are labeled in some useful way. With inductive learning algorithms such as artificial neural networks, a real robot may learn only from previously gathered data. Another option is to let the robot learn everything around it by inducing facts from the environment; this is known as inductive learning. Finally, one could let the robot evolve and optimise its performance over several generations.
Simplest: Construct a decision tree with one leaf for every example = memory based learning.
Not very good generalization.
Advanced: Split on each variable so that the purity of each split increases (i.e. either only yes or
only no)
• Collect a complete set of examples (training set) from which the decision tree can derive a
hypothesis to define (answer) the goal predicate.
Problem: decide whether to wait for a table at a restaurant, based on the following attributes:
• Trivially, there is a consistent decision tree for any training set with one path to leaf for
each
Limitations
• Decision trees are good for some kinds of functions, and bad for others.
“The most likely hypothesis is the simplest one that is consistent with all
observations.”
• Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all
negative"
Attribute-based representations
A chosen attribute A divides the training set E into subsets E1, … , Ev according to their values for A, where A has v distinct values. The information gain (IG), or expected reduction in entropy, from the attribute test is
Gain(A) = I(p/(p+n), n/(p+n)) − remainder(A), where
remainder(A) = Σᵢ ((pᵢ + nᵢ)/(p + n)) · I(pᵢ/(pᵢ + nᵢ), nᵢ/(pᵢ + nᵢ)).
• Patrons has the highest IG of all attributes and so is chosen by the DTL algorithm as the
root
• A learning algorithm is good if it produces hypotheses that do a good job of predicting the classifications of unseen examples.
• Test the algorithm's prediction performance on a set of new examples, called a test set.
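The entropy and remainder calculation above can be made concrete. The sketch below computes the information gain of one Boolean attribute; the tiny data set is an illustrative assumption, not the restaurant data from the text.

```python
import math

def entropy(pos, neg):
    """Entropy I(p, n) of a Boolean classification with pos/neg example counts."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h

def information_gain(examples, attribute):
    """Gain(A) = I(parent) - remainder(A), where A splits the examples into subsets."""
    pos = sum(1 for e in examples if e["wait"])
    neg = len(examples) - pos
    remainder = 0.0
    for value in {e[attribute] for e in examples}:
        subset = [e for e in examples if e[attribute] == value]
        sp = sum(1 for e in subset if e["wait"])
        remainder += len(subset) / len(examples) * entropy(sp, len(subset) - sp)
    return entropy(pos, neg) - remainder

# Tiny illustrative data set: does "hungry" predict whether we wait?
EXAMPLES = [
    {"hungry": True,  "wait": True}, {"hungry": True,  "wait": True},
    {"hungry": False, "wait": False}, {"hungry": False, "wait": True},
]
print(round(information_gain(EXAMPLES, "hungry"), 3))   # -> 0.311
```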
• Basic idea
– Given an example, construct a proof for the goal predicate that applies using the background
knowledge.
– Construct a new rule, LHS with the leaves of the proof tree and RHS with the variabilized
goal.
– Drop any conditions that are always true regardless of value of variables in the goal.
• Any partial subtree can be used for the extracted general rule; how do we choose?
– Rules should provide a speed increase by eliminating dead-ends and shortening the proof.
View learning as Bayesian updating of a probability distribution over the hypothesis space
H is the hypothesis variable, with values h1, h2, . . . and prior P(H). The jth observation dj gives the outcome of random variable Dj; the training data are d = d1, . . . , dN.
P(hi|d) = αP(d|hi)P(hi)
Example
What kind of bag is it? What flavour will the next candy be?
2. The Bayesian prediction is optimal, whether the data set is small or large.
1. The hypothesis space is usually very large or infinite, so summing over the hypothesis space is often intractable.
2. Overfitting occurs when the hypothesis space is too expressive, so that some hypotheses fit the data set too well.
3. Use prior to penalize complexity.
Reinforcement learning
•Frequency of rewards:
(Figure: learning-agent architecture, showing the environment connected through sensors and actuators to the agent, whose components are the critic, learning element, performance element and problem generator; the critic compares feedback against a performance standard and the learning element sends changes to the performance element.)
• The reward is part of the input percept; the agent must be hardwired to recognize it as a reward and not as another sensory input. E.g., animal psychologists have studied reinforcement in animals.
– The agent starts in state (1,1) and experiences a sequence of state transitions until it reaches a terminal state.
• Idea: learn how states are connected - the adaptive dynamic programming (ADP) agent.
– This is a supervised learning task with input = state-action pair and output = resulting state; the transition model can be represented as a table of probabilities.
• Estimate the transition probability T(s, a, s') from the frequency with which s' is reached when executing a in s.
• E.g., from state (1,3), Right is executed three times and the resulting state is (2,3) two of those times, so T((1,3), Right, (2,3)) is estimated to be 2/3.
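The frequency estimate above, T((1,3), Right, (2,3)) ≈ 2/3, can be computed directly from observed transitions. The outcome of the third trial is not stated in the notes, so it is assumed below.

```python
from collections import Counter, defaultdict

# Observed (state, action, next_state) transitions: Right was executed three
# times in state (1,3) and led to (2,3) twice, as in the example above.
observations = [((1, 3), "Right", (2, 3)),
                ((1, 3), "Right", (2, 3)),
                ((1, 3), "Right", (1, 3))]   # assumed outcome of the third trial

def estimate_transition_model(observations):
    """Estimate T(s, a, s') as the observed frequency of s' after doing a in s."""
    counts = defaultdict(Counter)
    for s, a, s_next in observations:
        counts[(s, a)][s_next] += 1
    model = {}
    for (s, a), outcomes in counts.items():
        total = sum(outcomes.values())
        for s_next, c in outcomes.items():
            model[(s, a, s_next)] = c / total
    return model

T = estimate_transition_model(observations)
print(T[((1, 3), "Right", (2, 3))])   # -> 0.666...
```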
GLOSSARY
2. Turing test - Defines the intelligent behavior as the ability to achieve human-level
performance in all cognitive tasks, sufficient to fool an interrogator.
3. Agent - Anything that can be viewed as perceiving its environment through sensors
and acting upon that environment through actuators.
4. Rational agent - Rational agent is one that does the right thing. A system is rational
if it does the “right thing”, given what it knows.
5. Omniscience agent - It is one which knows the actual outcome of its actions & can
act accordingly.
6. Agent program - Takes the current percept as input from the sensors and returns an action to the actuators.
7. Agent function - Abstract mathematical description. That maps any given percept
sequence to an action.
10. Depth limited search - Supplying depth-first with a predetermined depth limit l.
That is, nodes at depth l are treated as if they have no successors. This approach is
called depth-limited search.
11. Uniformed search - Distinguish a goal state from a non-goal state. Also known as
blind search.
12. Informed search - It is one that uses problem-specific knowledge beyond the
definition of the problem itself and can find solutions more efficiently than an
uninformed strategy.
14. Breadth first search - The root node is expanded first then all the nodes generated
by the root node are expanded next and their successors and so on.
15. Greedy best-first search - Expands the node that is closest to the goal, on the
grounds that this is likely to lead to a solution quickly. Thus, it evaluates nodes by
using the heuristic function f(n) = h(n).
16. A* search - evaluates nodes by combining g(n), the cost to reach the node, and
h(n), the cost to get from the node to the goal. f (n)=g(n)+h(n)
18. Local maxima - Is a peak that is higher than each of its neighboring states, but
lower than the global maximum.
19. Ridges - Results in a sequence of local maxima that is very difficult for greedy
algorithms to navigate.
20. Plateaux - An area of the state space landscape where the evaluation function is flat.
21. Hill Climbing Search - Is simply a loop that continually moves in the direction of
increasing value that is uphill. It terminates when it reaches a “peak” where no
neighbor has a higher value.
22. Genetic algorithm - A variant of stochastic beam in which successor states are
generated by combining two parent states, rather than by modifying a single state.
23. Online search problems -Solved only by an agent executing actions, rather than by
a purely computational process. Assume that the agent knows the following:
25. Linear constraints - Constraints in which each variable appears only in linear
form.
26. Unary Constraints – Constraints that restrict the value of a single variable.
27. Binary Constraints - A binary constraint relates two variables. A CSP with only binary constraints can be represented as a constraint graph.
28. Game - Defined by the initial state, the legal actions in each state, a terminal test
and a utility function that applies to terminal states.
29. Offline search - Compute a complete solution before setting in the real world and
then execute the solution without recourse to their percepts.
31. Minimum remaining values - Choosing the variable with the fewest "legal" values. Otherwise called the "most constrained variable" or "fail first" heuristic.
32. Informed search strategy - Uses problem specific knowledge beyond the
definition of the problem itself.
33. Best First Search approach - An instance of the general TREE SEARCH
algorithm in which a node is selected for expansion based on an evaluation
function, f (n).
34. Nested Quantifier - Express the more complex sentences using multiple
quantifiers.
35. Equality symbol - Used to make the statements more effective that two terms refer
to the same object.
36. Higher Order Logic - allows quantifying over relations and functions as well as
over objects.
37. First Order Logic - Representation language that is far more powerful than
propositional logic.
39. Syntax - Describes the possible configuration that can constitute sentences.
40. Semantics - Determines the facts in the world to which the sentences refer.
41. Entailment - The generation of new sentences that are necessarily true given that the old sentences are true. This relation between sentences is called entailment.
42. Tuple - Collection of objects arranged in a fixed order and is written with angle
brackets surrounding the objects.
43. Symbols - The basic syntactic elements of first order logic are the symbols that
stand for objects, relations and functions. The symbols are in three kinds. Constant
symbols which stand for objects, Predicate symbols which stand for relations and
Function symbol which stand for functions.
46. Datalog - Set of first order definite clauses with no function symbols.
48. Prolog programs - set of definite clauses written in a notation somewhat different
from standard first-order logic.
50. Situations - logical terms consisting of the initial situation and all situations that
are generated by applying an action to a situation.
51. Fluent - functions and predicates that vary from one situation to the next, such as
the location of the agent.
52. Learning - takes many forms, depending on the nature of the performance element,
the component to be improved, and the available feedback.
53. Inductive learning - Learn a function from examples of its inputs and outputs.
54. PAC-learning algorithm - Any learning algorithm that returns hypothesis that are
probably approximately correct.
55. Sample Complexity - The number of required examples, as a function of ε.
56. Neuron - A cell in the brain whose principal function is the collection, processing
and dissemination of electrical signals.
59. Define language - enables us to communicate most of what we know about the
world.
60. Grammar -A finite set of rules that specifies a language. Formal languages always
have grammar. Natural languages have no grammar.
61. Metaphor - A figure of speech in which a phrase with one literal meaning is used
to suggest a different meaning by way of an analogy.
62. Discourse - any string of language usually that is more than one sentence long.
64. Information retrieval - Task of finding documents that are relevant to a user’s
need for information. The best known example of information retrieval systems are
search engines on the World Wide Web.
QUESTION BANK
Unit I
Possible 2 marks:
A rational agent is one that does the right thing. A system is rational if it
does the “right thing”, given what it knows.
7. State the needs of a computer to pass the turing test. (anna univ 2005)
It is one which knows the actual outcome of its actions & can act
accordingly.
2. Deterministic Vs Stochastic.
3. Episodic Vs Sequential
4. Static Vs Dynamic
5. Discrete Vs Continuous
The current decision does not affect whether the next part is defective.
17. What are the problems arises when knowledge of the states or actions is
incomplete?
2. Contingency problems
3. Exploration problems
1. Completeness
2. Optimality
3. Time Complexity
4. Space Complexity
ii) Touring
i) Initial state
ii) Actions
This term has no information about the number of steps or the path cost from the current state to the goal state. Such strategies can only distinguish a goal state from a non-goal state. Also known as blind search.
The time complexity is O(b^d), where d is the depth of the shallowest solution and b is the branching factor (the number of successors at each level).
The root node is expanded first then all the nodes generated by the root node
are expanded next and their successors and so on.
2. What is meant by PEAS? List out few agents types and describes their PEAS? (anna
univ 2004)
6. Explain in detail iterative deepening depth-first search. Write an algorithm for it.
7. Describe in brief the depth-first search and breadth-first search algorithms and also
mention their advantages. (anna univ 2005)
UNIT II
POSSIBLE 2 MARKS:
2. Define A* search.
A* search evaluates nodes by combining g(n), the cost to reach the node,
and h(n), the cost to get from the node to the goal.
f (n)=g(n)+h(n)
3. Define Consistency.
A heuristic h(n) is consistent if, for every node n and every successor n' of n generated by any action a, the estimated cost of reaching the goal from n is no greater than the step cost of getting to n' plus the estimated cost of reaching the goal from n'.
5. What are the reasons that hill climbing often gets stuck? (anna univ 2004)
Local maxima:
Ridges:
Plateaux:
8. Why a hill climbing search is called a greedy local search? (anna univ 2004)
Hill climbing is sometimes called greedy local search because it grabs a
good neighbor state without thinking ahead about where to go next.
Linear constraints are the constraints in which each variable appears only in
linear form.
Unary Constraints:
Binary Constraints:
A heuristic h(n) is consistent if, for every node n and every successor n' of n generated by any action a, the estimated cost of reaching the goal from n is no greater than the step cost of getting to n' plus the estimated cost of reaching the goal from n'.
A game can be defined by the initial state, the legal actions in each state, a
terminal test and a utility function that applies to terminal states.
The problem with minimax search is that the number of game states it has to examine is exponential in the number of moves. We can't eliminate the exponent, but we can effectively cut it in half. The trick is that it is possible to compute the correct minimax decision without looking at every node in the game tree. This technique is called alpha-beta pruning.
Backtracking search is used for a depth first search that chooses values for
one variable at a time and backtracks when a variable has no legal values left to assign.
Choosing the variable with the fewest “legal” values is called the minimum
remaining values heuristic. Otherwise called as “most constraint variable” or “fail first”
Best first search typically use a heuristic function h (n) that estimates the
cost of the solution from n.
h(n) = estimated cost of the cheapest path from node n to a goal node.
2. Trace the operation of A* search applied to the problem of getting to Bucharest from
Lugoj using the straight-line distance heuristic. (anna univ 2004)
3. Invent a heuristic function for the 8-puzzle that sometimes overestimates, and show
how it can lead to a suboptimal solution on a particular problem.
UNIT III
POSSIBLE 2 MARKS:
1. What are the standard quantifiers of First Order Logic? (Apr/May 2008)
They are:
i) Universal Quantifiers
ii) Existential Quantifiers
Thus it is true if and only if all the above sentences are true, that is, if P is true for all objects x in the universe. Hence, ∀ is called the universal quantifier.
To say, for example, that king john has a crown on his head, we write
The sentence says that P is true for at least one object x. Hence, ∃ is called the existential quantifier.
The Nested Quantifier is to express the more complex sentences using multiple
quantifiers. For example, “Brothers are siblings” can be written as
Consecutive quantifiers of the same type can be written as one quantifier with
several variables. For example, to say that siblinghood is a symmetric relationship, we can
write
The two quantifiers can be connected with each other through negation. It can be
explained through negation. It can be explained with the following example.
This means “Everyone likes ice cream” is equivalent to “there is no one who does
not like ice cream”.
The equality symbol is used to make the statements more effective that two terms
refer to the same object.
The Higher Order Logic allows quantifying over relations and functions as well as
over objects.
Eg: The two objects are equal if and only if, all the properties to them are
equivalent.
First Order Logic, a representation language that is far more powerful than
propositional logic. First Order Logic commits to the existence of objects and relations.
Relations - equals
Functions - plus
The representation language makes it easy to express the knowledge in the form of
sentences. This simplifies the construction problem enormously. This is called as
declarative approach.
ii) Semantics: It determines the facts in the world to which the sentences
refer.
The generations of new sentences that are necessarily true given the old sentences
are true. This relation between sentences is called entailment.
A tuple is a collection of objects arranged in a fixed order and is written with angle
brackets surrounding the objects.
{< Richard the Lionheart, King John>, <King John, Richard the
Lion heart>}
The basic syntactic elements of first order logic are the symbols that stand
for objects, relations and functions. The symbols are in three kinds. Constant symbols
which stand for objects, Predicate symbols which stand for relations and Function symbol
which stand for functions.
The task of deriving the new sentence from the old is called Inference.
The set of first order definite clauses with no function symbols is called
datalog.
Enemy(Nono, America)
The “inner loop” of the algorithm involves finding all possible unifiers such that the
premise of a rule unifies with a suitable set of facts in the knowledge base. This is called
Pattern Matching.
The first called OR-Parallelism comes from the possibility of a goal unifying with
many different clauses in the knowledge base. Each gives rise to an independent branch in
the search space that can lead to a potential solution and branches can be solved in parallel.
The second called AND-Parallelism comes from the possibility of solving each
conjunct in the body of an implication in parallel.
First-order resolution requires that sentences be in conjunctive normal form, that is, a conjunction of clauses, where each clause is a disjunction of literals. Literals can contain variables, which are assumed to be universally quantified.
For example, American(x) ∧ Weapon(y) ∧ Sells(x, y, z) ∧ Hostile(z) ⇒ Criminal(x) becomes, in CNF,
¬American(x) ∨ ¬Weapon(y) ∨ ¬Sells(x, y, z) ∨ ¬Hostile(z) ∨ Criminal(x).
Demodulation
Paramodulation
Situations, which denote the states resulting from executing actions. This approach
is called Situation Calculus.
Situations are logical terms consisting of the initial situation and all
situations that are generated by applying an action to a situation.
Fluent are functions and predicates that vary from one situation to the next,
such as the location of the agent.
Atemporal or eternal predicates and functions are also allowed.
1. Explain the various steps associated with the knowledge engineering process?
Discuss them by applying the steps to any real world application of your choice.
(May/June 2007)
2. What are the various ontologies involved in situation calculus?
(May/June 2007)
(Nov/Dec 2007)
6. Explain the steps involved in representing knowledge using first order logic.
(Nov/Dec 2007)
(Nov/Dec 2007)
8. How are the facts represented using propositional logic? Give an example.
(Nov/Dec 2005)
9. Describe Non-Monotonic logic with an example. (Nov/Dec 2005)
UNIT IV
TWO MARKS:
1. What is learning?
Learning takes many forms, depending on the nature of the performance
element, the component to be improved, and the available feedback.
b) Regression:
Learning a continuous function is called regression.
a) Alternate
b) Bar
c) Fri/Sat
d) Hungry
e) Patrons
f) Price
g) Raining
h) Reservation
i) Type
j) Wait Estimate
Parity function:
Majority function:
b) Divide it into two disjoint sets: the training set and the test set.
c) Apply the learning algorithm to the training set, generating a hypothesis h.
d) Measures the percentage of examples in the test set that are correctly
classified by h.
e) Repeat steps 1 to 4 for different sizes of training sets and different randomly
selected training sets of each size.
The agent’s policy is fixed and the task is to learn the utilities of states, this
could also involve learning a model of the environment.
1. Explain with proper example how EM algorithm can be used for learning with
hidden variables. (Nov/Dec 2007)
2. Describe how decision trees could be used for inductive learning. Explain its
effectiveness with a suitable example. (Nov/Dec 2007)
3. Explain the explanation-based learning. (Nov/Dec 2007)
4. Discuss on learning with hidden variables. (Nov/Dec 2007)
5. i) What do you understand by soft computing?
6. ii)Differentiate conventional and formal learning techniques / Theory and learning
via forms of reward and punishment. (Nov/Dec 2005)
7. Discuss partial order planning with unbound variables. (Nov/Dec 2005)
8. With reference to planning discuss progression and regression. (Nov/Dec 2005)
9. What are the languages suited for planning? (Nov/Dec 2005)
UNIT V
1. What is communication?
Communication is the intentional exchange of information brought about by
the production and perception of signs drawn from a shared system of conventional
signs. Most animals use signs to represent important messages.
2. Define language.
Language enables us to communicate most of what we know about the
world.
3. Why would an agent bother to perform a speech act when it could be doing a
“regular” action?
A group of agents exploring together gains an advantage by being able to do
the following.
Query
Inform
Request
Acknowledge
Promise
4. Differentiate formal language Vs natural language.
Formal language:
For example, a language in the first order logic, the terminal symbols include ^ and
P, and a typical string is “P ^ Q”. The String is not a member of the language.
Natural language:
5. Define Grammar.
A grammar is a finite set of rules that specifies a language. Formal
languages always have grammar. Natural languages have no grammar.
7. Define Lexicon.
The list of allowable words called lexicon. The words are grouped into the
categories or parts of speech familiar to dictionary users. Nouns, pronouns and names
to denote things, verbs to denote events, adjective to modify nouns and adverbs to
modify verbs.
Should return a parse tree with root S whose leaves are the “the wumpus is dead”
and whose internal nodes are nonterminal symbols from the grammar ε0.
1. A document collection
2. A query posed in a query language
3. A result set
4. A representation of the result set.
1. Explain the Machine Translation System with a neat sketch. Analyze its learning
probablities. (May/June 2007)
2. Perform Bottom Up and Top Down Parsing for the input “the wumpus is dead”.
(May/June 2007)
3. i) Describe the process involved in communication using the example sentence “the
wumpus is dead”
ii) Write short notes on semantic representation. (May/June 2007)
Or
(b) Explain the following search strategies(ANS: Page number-74)
(i) best first search
(ii) A* search
12. (a) Explain Min Max procedure (ANS: Page number-165)
Or
(b) Describe alpha beta pruning and give the other modifications to the min
max procedure to improve its performance, (ANS: Page number-167)
13. (a) Illustrate the use of predicate logic to represent knowledge with a suitable example. (ANS: Page number-240)
Or
(b) Consider the following sentences:
John likes all kinds of food.
Apples are food.
Chicken is food.
Anything anyone eats and isn't killed by is food.
Bill eats peanuts and is still alive.
Sue eats everything Bill eats.
(i) Translate these sentences into formulas in predicate logic.
(ii) Prove that John likes peanuts using backward chaining.
(iii) Convert the formulas of part (i) into clause form.
(iv) Prove that John likes peanuts using resolution. (ANS: Page number-253)
PART B – (5 × 16 = 80 marks)
11.(a) Explain in detail on the characteristics and applications of
13. (a) Explain the concept of planning with state space search using example
(Or)
(b) Explain the use of planning graph in providing better heuristic estimate
with suitable example.
15.(a) Explain the concept of learning using decision trees and neural network
approach
(Or)
(b) Write short notes on:
(1) Statistical learning
(2) Explanation based learning.