UNIT – 01
Meaning and definition of artificial intelligence
Unit-01/Lecture-01

Introduction to AI
What is artificial intelligence?
Artificial Intelligence is the branch of computer science concerned with making computers
behave like humans.
Major AI textbooks define artificial intelligence as "the study and design of intelligent agents," where an intelligent agent is a system that perceives its environment and takes actions that maximize its chances of success. John McCarthy, who coined the term in 1956, defined it as "the science and engineering of making intelligent machines, especially intelligent computer programs."

The definitions of AI given in some textbooks fall into four approaches, summarized below:
Systems that think like humans: "The exciting new effort to make computers think … machines with minds, in the full and literal sense." (Haugeland, 1985)

Systems that think rationally: "The study of mental faculties through the use of computer models." (Charniak and McDermott, 1985)

Systems that act like humans: "The art of creating machines that perform functions that require intelligence when performed by people." (Kurzweil, 1990)

Systems that act rationally: "Computational intelligence is the study of the design of intelligent agents." (Poole et al., 1998)

The four approaches in more detail are as follows:
(a) Acting humanly: The Turing Test approach
o Test proposed by Alan Turing in 1950.
o The computer is asked questions by a human interrogator.
The computer passes the test if the interrogator, after posing some written questions, cannot tell whether the written responses come from a person or not. To be programmed to pass the test, the computer needs the following capabilities:
Natural language processing, to enable it to communicate successfully in English.
Knowledge representation, to store what it knows or hears.
Automated reasoning, to use the stored information to answer questions and to draw new conclusions.
Machine learning, to adapt to new circumstances and to detect and extrapolate patterns.
To pass the complete Turing Test, the computer will also need:
Computer vision, to perceive objects, and Robotics, to manipulate objects and move about.
(b) Thinking humanly: The cognitive modelling approach
To say that a program thinks like a human, we need to get inside the actual working of the human mind:
(a) through introspection, trying to capture our own thoughts as they go by;
(b) through psychological experiments.
Allen Newell and Herbert Simon, who developed GPS, the "General Problem Solver", tried to compare the reasoning steps of their program to traces of human subjects solving the same problems. The interdisciplinary field of cognitive science brings together computer models from AI and experimental techniques from psychology to construct precise and testable theories of the workings of the human mind.
(c) Thinking rationally: The "laws of thought" approach
The Greek philosopher Aristotle was one of the first to attempt to codify "right thinking", that is, irrefutable reasoning processes. His syllogisms provided patterns for argument structures that always yield correct conclusions when given correct premises. For example: Socrates is a man; all men are mortal; therefore Socrates is mortal.
These laws of thought were supposed to govern the operation of the mind; their study
initiated a field called logic.

(d) Acting rationally: The rational agent approach

An agent is something that acts. Computer agents are not mere programs; they are expected to have the following attributes as well:
(a) operating under autonomous control,
(b) perceiving their environment,
(c) persisting over a prolonged time period,
(d) adapting to change.
A rational agent is one that acts so as to achieve the best outcome.

The History of Artificial Intelligence

The gestation of artificial intelligence (1943-1955)
There were a number of early examples of work that can be characterized as AI, but it was Alan Turing who first articulated a complete vision of AI in his 1950 article "Computing Machinery and Intelligence." Therein, he introduced the Turing test, machine learning, genetic algorithms, and reinforcement learning.

The birth of artificial intelligence (1956)
McCarthy convinced Minsky, Claude Shannon, and Nathaniel Rochester to help him bring together U.S. researchers interested in automata theory, neural nets, and the study of intelligence. They organized a two-month workshop at Dartmouth in the summer of 1956. Perhaps the longest-lasting thing to come out of the workshop was an agreement to adopt McCarthy's new name for the field: artificial intelligence.
Early enthusiasm, great expectations (1952-1969)
The early years of AI were full of successes, in a limited way. General Problem Solver (GPS) was a computer program created in 1957 by Herbert Simon and Allen Newell to build a universal problem-solver machine. The order in which the program considered subgoals and possible actions was similar to that in which humans approached the same problems. Thus, GPS was probably the first program to embody the
"thinking humanly" approach.
A dose of reality (1966-1973)
From the beginning, AI researchers were not shy about making predictions of their coming successes. The following statement by Herbert Simon in 1957 is often quoted:
"It is not my aim to surprise or shock you, but the simplest way I can summarize is to say that there are now in the world machines that think, that learn and that create. Moreover, their ability to do these things is going to increase rapidly until, in a visible future, the range of problems they can handle will be coextensive with the range to which the human mind has been applied."
Knowledge-based systems: The key to power? (1969-1979)
Dendral was an influential pioneer project in artificial intelligence of the 1960s, and the expert-system software that it produced. Its primary aim was to help organic chemists identify unknown organic molecules by analyzing their mass spectra and using knowledge of chemistry. It was done at Stanford University by Edward Feigenbaum, Bruce Buchanan, Joshua Lederberg, and Carl Djerassi.

AI becomes an industry (1980-present)
In 1981, the Japanese announced the "Fifth Generation" project, a 10-year plan to build intelligent computers running Prolog. Overall, the AI industry boomed from a few million dollars in 1980 to billions of dollars in 1988.

The return of neural networks (1986-present)
Psychologists including David Rumelhart and Geoff Hinton continued the study of neural-net models of memory.

AI becomes a science (1987-present)
In recent years, approaches based on hidden Markov models (HMMs) have come to dominate the area. Speech technology and the related field of handwritten character recognition are already making the transition to widespread industrial and consumer applications. The Bayesian network formalism was invented to allow efficient representation of, and rigorous reasoning with, uncertain knowledge.

The emergence of intelligent agents (1995-present)
One of the most important environments for intelligent agents is the Internet.

The state of the art:


What can AI do today?
Autonomous planning and scheduling: A hundred million miles from Earth, NASA's Remote Agent program became the first on-board autonomous planning program to control the scheduling of operations for a spacecraft (Jonsson et al., 2000). Remote Agent generated plans from high-level goals specified from the ground, and it monitored the operation of the spacecraft as the plans were executed, detecting, diagnosing, and recovering from problems as they occurred.

Game playing: IBM's Deep Blue became the first computer program to defeat the world champion in a chess match when it bested Garry Kasparov by a score of 3.5 to 2.5 in an exhibition match (Goodman and Keene, 1997).

Autonomous control: The ALVINN computer vision system was trained to steer a car to keep it following a lane. It was placed in CMU's NAVLAB computer-controlled minivan and used to navigate across the United States; for 2850 miles it was in control of steering the vehicle 98% of the time.

Diagnosis: Medical diagnosis programs based on probabilistic analysis have been able to perform at the level of an expert physician in several areas of medicine.

Logistics planning: During the Persian Gulf crisis of 1991, U.S. forces deployed a Dynamic Analysis and Replanning Tool, DART (Cross and Walker, 1994), to do automated logistics planning and scheduling for transportation. This involved up to 50,000 vehicles, cargo items, and people at a time, and had to account for starting points, destinations, routes, and conflict resolution among all parameters. The AI planning techniques allowed a plan to be generated in hours that would have taken weeks with older methods. The Defense Advanced Research Projects Agency (DARPA) stated that this single application more than paid back DARPA's 30-year investment in AI.

Robotics: Many surgeons now use robot assistants in microsurgery. HipNav (DiGioia et al., 1996) is a system that uses computer vision techniques to create a three-dimensional model of a patient's internal anatomy and then uses robotic control to guide the insertion of a hip-replacement prosthesis.

Language understanding and problem solving: PROVERB (Littman et al., 1999) is a computer program that solves crossword puzzles better than most humans, using constraints on possible word fillers, a large database of past puzzles, and a variety of information sources, including dictionaries and online databases such as a list of movies and the actors that appear in them.

INTELLIGENT AGENTS
Agents and environments
An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators. This simple idea is illustrated in Figure 1.

o A human agent has eyes, ears, and other organs for sensors, and hands, legs, mouth, and other body parts for actuators.
o A robotic agent might have cameras and infrared range finders for sensors and various motors for actuators.
o A software agent receives keystrokes, file contents, and network packets as sensory inputs and acts on the environment by displaying on the screen, writing files, and sending network packets.

Figure 1: Agents interact with environments through sensors and actuators.


Percept
We use the term percept to refer to the agent's perceptual inputs at any given instant.

Percept Sequence
An agent's percept sequence is the complete history of everything the agent has ever
perceived.

Agent function
Mathematically speaking, we say that an agent's behavior is described by the agent function
that maps any given percept sequence to an action.

Agent program
Internally, the agent function for an artificial agent will be implemented by an agent program. It is important to keep these two ideas distinct: the agent function is an abstract mathematical description; the agent program is a concrete implementation, running on the agent architecture.
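The distinction can be illustrated with a small sketch: the table below is a partial agent function for a hypothetical two-cell vacuum world (an illustrative example, not from the text), and the lookup code is one concrete agent program implementing it.

```python
# A table-driven agent program. The table is (part of) the abstract agent
# function: it maps percept sequences to actions. The closure below is one
# concrete program implementing that function.

def make_table_driven_agent(table):
    percepts = []                      # the percept sequence seen so far
    def agent_program(percept):
        percepts.append(percept)
        return table.get(tuple(percepts), "NoOp")
    return agent_program

# Partial lookup table for a two-cell vacuum world: locations A and B,
# each percept is (location, status).
table = {
    (("A", "Dirty"),): "Suck",
    (("A", "Clean"),): "Right",
    (("A", "Clean"), ("B", "Dirty")): "Suck",
}

agent = make_table_driven_agent(table)
print(agent(("A", "Clean")))   # Right
print(agent(("B", "Dirty")))   # Suck
```

The table-driven approach makes the function/program distinction concrete, but it also shows why real agent programs cannot be tables: the table grows exponentially with the length of the percept sequence.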
Rational Agent
A rational agent is one that does the right thing; conceptually speaking, every entry in the table for the agent function is filled out correctly. Obviously, doing the right thing is better than doing the wrong thing. The right action is the one that will cause the agent to be most successful.

Performance measures
A performance measure embodies the criterion for success of an agent's behaviour. When an agent is plunked down in an environment, it generates a sequence of actions according to the percepts it receives. This sequence of actions causes the environment to go through a sequence of states. If the sequence is desirable, then the agent has performed well.
Rationality
What is rational at any given time depends on four things:
o The performance measure that defines the criterion of success.
o The agent's prior knowledge of the environment.
o The actions that the agent can perform.
o The agent's percept sequence to date.
This leads to a definition of a rational agent:
For each possible percept sequence, a rational agent should select an action that is expected to maximize its performance measure, given the evidence provided by the percept sequence and whatever built-in knowledge the agent has.
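The definition amounts to choosing the action with the highest expected performance given the percept sequence. A schematic sketch, where the toy performance estimate for a vacuum world is purely illustrative:

```python
# Rational choice as an argmax: pick the action whose expected performance,
# given the percept sequence, is highest.

def rational_action(actions, expected_performance, percepts):
    return max(actions, key=lambda a: expected_performance(percepts, a))

# Illustrative estimate: 'Suck' scores well when the current square is dirty;
# otherwise moving on scores better than sucking a clean square.
def expected_performance(percepts, action):
    location, status = percepts[-1]
    if status == "Dirty":
        return 10 if action == "Suck" else 0
    return 1 if action in ("Left", "Right") else 0

print(rational_action(["Suck", "Left", "Right"],
                      expected_performance,
                      [("A", "Dirty")]))   # Suck
```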
Omniscience, learning, and autonomy
An omniscient agent knows the actual outcome of its actions and can act accordingly; but omniscience is impossible in reality.
Doing actions in order to modify future percepts, sometimes called information gathering, is an important part of rationality.
Our definition requires a rational agent not only to gather information, but also to learn as much as possible from what it perceives.
To the extent that an agent relies on the prior knowledge of its designer rather than on its own percepts, we say that the agent lacks autonomy. A rational agent should be autonomous: it should learn what it can to compensate for partial or incorrect prior knowledge.
Task environments
We must think about task environments, which are essentially the "problems" to which rational agents are the "solutions."

Specifying the task environment
The rationality of the simple vacuum-cleaner agent needs specification of:
o the performance measure,
o the environment,
o the agent's actuators and sensors.

PEAS
All of these are grouped together under the heading of the task environment. We call this the PEAS (Performance, Environment, Actuators, Sensors) description. In designing an agent, the first step must always be to specify the task environment as fully as possible.
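For example, a PEAS description for an automated taxi driver (a standard textbook illustration; the particular entries here are indicative, not exhaustive) can be written down as a simple structure:

```python
# PEAS description for a hypothetical automated taxi, grouped under the
# four headings of the task environment.
peas_taxi = {
    "Performance": ["safe", "fast", "legal", "comfortable trip", "maximize profits"],
    "Environment": ["roads", "other traffic", "pedestrians", "customers"],
    "Actuators":   ["steering", "accelerator", "brake", "signal", "horn", "display"],
    "Sensors":     ["cameras", "sonar", "speedometer", "GPS", "odometer", "engine sensors"],
}

for heading, items in peas_taxi.items():
    print(f"{heading}: {', '.join(items)}")
```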

RGPV QUESTIONS
Q.1 Explain Artificial Intelligence. Also discuss the areas where it can be used. (June 2014, 7 marks)
Q.2 What are the characteristics of AI problems? Explain the areas where AI can be applied. (June 2010, 6 marks)
Q.3 Define artificial intelligence. Also tell the areas where it can be used. (June 2005, 10 marks)
Unit-01/Lecture-02
Various types of production systems
A knowledge representation formalism consists of a collection of condition-action rules (production rules, or operators), a database which is modified in accordance with the rules, and a production system interpreter which controls the operation of the rules, i.e. the 'control mechanism' of a production system, determining the order in which production rules are fired. A system that uses this form of knowledge representation is called a production system.
A production system consists of rules and facts. Knowledge is encoded in a declarative form which comprises a set of rules of the form

Situation ------------ Action

i.e. SITUATION implies ACTION.

Example:
IF the initial state is a goal state THEN quit.
The major components of an AI production system are:
i. a global database,
ii. a set of production rules, and
iii. a control system.

The global database is the central data structure used by an AI production system. The production rules operate on the global database. Each rule has a precondition that is either satisfied or not by the database. If the precondition is satisfied, the rule can be applied. Application of the rule changes the database. The control system chooses which applicable rule should be applied and ceases computation when a termination condition on the database is satisfied. If several rules are to fire at the same time, the control system resolves the conflicts.
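These three components can be sketched as a minimal production-system interpreter; the counting rule at the end is a toy example for illustration only:

```python
# Minimal production-system interpreter: a global database, a set of
# condition-action rules, and a control loop that fires the first applicable
# rule until a termination condition on the database holds.

def run_production_system(database, rules, is_goal, max_steps=100):
    for _ in range(max_steps):
        if is_goal(database):          # termination condition satisfied
            return database
        # Conflict resolution: fire the first rule whose precondition holds.
        for precondition, action in rules:
            if precondition(database):
                database = action(database)   # applying a rule changes the database
                break
        else:
            return None                # no applicable rule: failure
    return None

# Toy example: the database is an integer; one rule increments it toward 5.
rules = [(lambda db: db < 5, lambda db: db + 1)]
print(run_production_system(0, rules, lambda db: db == 5))   # 5
```

Real systems differ mainly in the conflict-resolution strategy (here: first applicable rule) and in how the database and rule preconditions are represented.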
Four classes of production systems:-
1. A monotonic production system
2. A non monotonic production system
3. A partially commutative production system
4. A commutative production system.

Advantages of production systems:

1. Production systems provide an excellent tool for structuring AI programs.
2. Production systems are highly modular because the individual rules can be added, removed or modified independently.
3. The production rules are expressed in a natural form, so the statements contained in the knowledge base should be a recording of an expert thinking out loud.

Disadvantages of production systems:
One important disadvantage is that it may be very difficult to analyse the flow of control within a production system, because the individual rules don't call each other.

Production systems describe the operations that can be performed in a search for a solution to the problem. They can be classified as follows.
Monotonic production system: a system in which the application of a rule never prevents the later application of another rule that could also have been applied at the time the first rule was selected.

Partially commutative production system: a production system in which, if the application of a particular sequence of rules transforms state X into state Y, then any permutation of those rules that is allowable also transforms state X into state Y.

Theorem proving falls under monotonic, partially commutative systems. The blocks world and 8-puzzle problems come under non-monotonic, partially commutative systems. Problems like chemical analysis and synthesis come under monotonic, not partially commutative systems. Playing the game of bridge comes under non-monotonic, not partially commutative systems.

For any problem, several production systems exist. Some will be more efficient than others. Though it may seem that there is no relationship between kinds of problems and kinds of production systems, in practice there is a definite relationship.

Partially commutative, monotonic production systems are useful for solving ignorable problems. These systems are important from an implementation standpoint because they can be implemented without the ability to backtrack to previous states when it is discovered that an incorrect path was followed. Such systems increase efficiency, since it is not necessary to keep track of the changes made in the search process.

Non-monotonic, partially commutative systems are useful for problems in which changes occur but can be reversed and in which the order of operations is not critical (e.g. the 8-puzzle problem).

Production systems that are not partially commutative are useful for many problems in which irreversible changes occur, such as chemical analysis. When dealing with such systems, the order in which operations are performed is very important, and hence correct decisions have to be made the first time.
Characteristics of production systems

• Separation of knowledge (the rules) and control (the recognize-act cycle)
• Natural mapping onto state-space search (data- or goal-driven)
• Modularity of production rules (rules represent chunks of knowledge)
• Pattern-directed control (more flexible than algorithmic control)
• Opportunities for heuristic control can be built into the rules
• Tracing and explanation (simple control, informative rules)
• Language independence
• A model for human problem solving (SOAR, ACT*)
8-Puzzle Problem
The 8-puzzle consists of eight numbered, movable tiles set in a 3x3 frame. One cell of the frame is always empty, making it possible to move an adjacent numbered tile into the empty cell. Such a puzzle is illustrated in the following diagram.
The program is to change the initial configuration into the goal configuration. A solution to the problem is an appropriate sequence of moves, such as "move tile 5 to the right, move tile 7 to the left, move tile 6 down, etc."

To solve a problem using a production system, we must specify the global database, the rules, and the control strategy. For the 8-puzzle problem, these three components are the problem states, the moves, and the goal. In this problem, each tile configuration is a state. The set of all configurations is the space of problem states, or the problem space; there are only 362,880 different configurations of the 8 tiles and the blank space. Once the problem states have been conceptually identified, we must construct a computer representation, or description, of them. This description is then used as the database of a production system. For the 8-puzzle, a straightforward description is a 3x3 array or matrix of numbers. The initial global database is the description of the initial problem state. Virtually any kind of data structure can be used to describe states.
A move transforms one problem state into another state. The 8-puzzle is conveniently interpreted as having the following four moves: move the empty space (blank) to the left, move the blank up, move the blank to the right, and move the blank down. These moves are modeled by production rules that operate on the state descriptions in the appropriate manner. The rules each have preconditions that must be satisfied by a state description in order for them to be applicable to that state description. Thus the precondition for the rule associated with "move blank up" is derived from the requirement that the blank space must not already be in the top row.
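The four blank-moving rules and their edge-of-board preconditions can be sketched as follows. Encoding the board as a flat tuple of nine numbers (0 for the blank) is one common illustrative choice, not prescribed by the text:

```python
# 8-puzzle move rules: each rule has a precondition on the blank's position
# and an action that swaps the blank with a neighbouring tile.
# State: tuple of 9 ints, 0 = blank, cell index = row*3 + col.

def moves(state):
    i = state.index(0)              # position of the blank
    row, col = divmod(i, 3)
    result = []
    # (precondition, index offset) for moving the blank up, down, left, right
    for ok, offset in [(row > 0, -3), (row < 2, 3), (col > 0, -1), (col < 2, 1)]:
        if ok:                      # precondition: blank not already on that edge
            j = i + offset
            s = list(state)
            s[i], s[j] = s[j], s[i] # apply the rule: swap blank and tile
            result.append(tuple(s))
    return result

start = (1, 2, 3,
         4, 0, 5,
         6, 7, 8)
print(len(moves(start)))   # 4: a centre blank can move in all four directions
```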
The problem goal condition forms the basis for the termination condition of the production system. The control strategy repeatedly applies rules to state descriptions until a description of a goal state is produced. It also keeps track of the rules that have been applied, so that it can compose them into a sequence representing the problem solution. A solution to the 8-puzzle problem is shown in the following diagrams.

Water-Jug Problem
Statement: We are given two jugs, a 4-litre one and a 3-litre one. Neither has any measuring markers on it. There is a pump that can be used to fill the jugs with water. How can we get exactly 2 litres of water into the 4-litre jug?

Solution:
The state space for this problem can be defined as

{ (i, j) | i = 0, 1, 2, 3, 4 ; j = 0, 1, 2, 3 }

where 'i' represents the number of litres of water in the 4-litre jug and 'j' represents the number of litres of water in the 3-litre jug. The initial state is (0, 0), that is, no water in either jug. The goal state is (2, n) for any value of 'n'.
To solve this we have to make some assumptions not mentioned in the problem. They are:

1. We can fill a jug from the pump.
2. We can pour water out of a jug onto the ground.
3. We can pour water from one jug to another.
4. There is no measuring device available.

The various operators (production rules) available to solve this problem may be stated as given in the following figure.
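The standard rules (fill, empty, pour between jugs) can also be sketched in code, together with a breadth-first search over the (i, j) state space. This encoding is one common choice for illustration, not a transcription of the original figure:

```python
from collections import deque

# Production rules for the water-jug problem over states (i, j), where i is
# the amount in the 4-litre jug and j the amount in the 3-litre jug.
def successors(i, j):
    to4 = min(i + j, 4)   # level of the 4-litre jug after pouring 3 -> 4
    to3 = min(i + j, 3)   # level of the 3-litre jug after pouring 4 -> 3
    return {
        (4, j), (i, 3),           # fill either jug from the pump
        (0, j), (i, 0),           # empty either jug onto the ground
        (to4, j - (to4 - i)),     # pour the 3-litre jug into the 4-litre jug
        (i - (to3 - j), to3),     # pour the 4-litre jug into the 3-litre jug
    }

# Breadth-first search for a shortest sequence ending with 2 litres
# in the 4-litre jug.
def solve(start=(0, 0)):
    frontier, seen = deque([[start]]), {start}
    while frontier:
        path = frontier.popleft()
        if path[-1][0] == 2:      # goal: (2, n) for any n
            return path
        for nxt in successors(*path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(path + [nxt])
    return None

print(solve())   # a shortest 6-move sequence from (0, 0) to 2 litres in the 4-litre jug
```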
The Missionaries and Cannibals Problem

Statement: Three missionaries and three cannibals find themselves on one side of a river and would like to get to the other side. The missionaries are not sure what else the cannibals have agreed to, so they manage the trip across the river in such a way that the number of missionaries on either side of the river is never less than the number of cannibals on the same side. The only boat available holds at most two people at a time. How can everyone get across the river without the missionaries risking being eaten?

Solution:
The state for this problem can be defined as
{ (i, j) | i = 0, 1, 2, 3 ; j = 0, 1, 2, 3 }
where i represents the number of missionaries on one side of the river and j represents the number of cannibals on the same side. The initial state is (3, 3), that is, three missionaries and three cannibals on one side of the river (bank 1), with (0, 0) on the other side (bank 2). The goal state is (3, 3) at bank 2 and (0, 0) at bank 1.

To solve this problem we will make the following assumptions:

1. The number of cannibals should never be greater than the number of missionaries on either side.

2. Only one boat is available for travel.

3. Only one or at most two people can go in the boat at a time.

4. All six have to cross the river from bank 1.

5. There is no restriction on the number of trips that can be made to reach the goal.

6. Both the missionaries and the cannibals can row the boat.
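Under these assumptions the state space can be searched breadth-first. A sketch, where a state records the missionaries, cannibals and boat on bank 1 (the encoding and function names are illustrative):

```python
from collections import deque

# Missionaries-and-cannibals search. State: (m, c, boat) = missionaries,
# cannibals and boat on bank 1 (boat = 1 means the boat is at bank 1).

def safe(m, c):
    # Missionaries are never outnumbered on either bank (unless absent).
    return (m == 0 or m >= c) and (3 - m == 0 or 3 - m >= 3 - c)

def successors(state):
    m, c, boat = state
    d = -1 if boat else 1          # people leave the bank the boat is on
    for dm, dc in [(1, 0), (2, 0), (0, 1), (0, 2), (1, 1)]:   # boat loads
        nm, nc = m + d * dm, c + d * dc
        if 0 <= nm <= 3 and 0 <= nc <= 3 and safe(nm, nc):
            yield (nm, nc, 1 - boat)

def solve(start=(3, 3, 1), goal=(0, 0, 0)):
    frontier, seen = deque([[start]]), {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt in successors(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(path + [nxt])

print(len(solve()) - 1)   # 11 crossings are needed
```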


RGPV QUESTIONS                                                     Year       Marks
Q.1 Discuss the characteristics of production systems              June.2014  7
Q.2 Define the various types of production system and give its
    characteristics                                                June.2013  10
Q.3 You are given two jugs, a 4-gallon jug and a 3-gallon one.
    Neither has any measuring markers on it. There is a pump
    that can be used to fill the jugs with water. How can you
    get exactly 2 gallons of water into the 4-gallon jug?
    Write the production rules for the above problem.              June.2012  10
Q.4 Explain the purpose of production system with examples.
    Also explain the various types of production system.           June.2012  10
Q.5 Explain various types of production systems with their
    characteristics                                                June.2011  10
Q.6 Find the state space representation of the 8-Puzzle problem    June.2011  10
Q.7 Explain the purpose of production system. What are its
    characteristics? Mention different types of production
    system with brief explanation.                                 June.2006  10
Unit-01/Lecture-03
Breadth First Search (BFS)
Breadth-first search is a simple strategy in which the root node is expanded first, then all successors of the root node are expanded next, then their successors, and so on. In general, all the nodes are expanded at a given depth in the search tree before any nodes at the next level are expanded.
Breadth-first search is implemented by calling TREE-SEARCH with an empty fringe that is a first-in-first-out (FIFO) queue, assuring that the nodes that are visited first will be expanded first. In other words, calling TREE-SEARCH(problem, FIFO-QUEUE()) results in breadth-first search. The FIFO queue puts all newly generated successors at the end of the queue, which means that shallow nodes are expanded before deeper nodes.

Breadth-first search on a simple binary tree. At each stage, the node to be expanded next is indicated by a marker.
Properties of breadth-first search
o Complete? Yes (if the branching factor b is finite).
o Time? O(b^(d+1)), as derived below.
o Space? O(b^(d+1)) - keeps every node in memory.
o Optimal? Yes (if all step costs are identical).
Time complexity for BFS
Assume every state has b successors. The root of the search tree generates b nodes at the first level, each of which generates b more nodes, for a total of b^2 at the second level. Each of these generates b more nodes, yielding b^3 nodes at the third level, and so on. Now suppose that the
solution is at depth d. In the worst case, we would expand all but the last node at level d, generating b^(d+1) - b nodes at level d+1.
Then the total number of nodes generated is b + b^2 + b^3 + … + b^d + (b^(d+1) - b) = O(b^(d+1)).
Every node that is generated must remain in memory, because it is either part of the fringe or is an ancestor of a fringe node. The space complexity is, therefore, the same as the time complexity.
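The FIFO-queue behaviour described above can be sketched in Python (an illustrative sketch, not part of the original notes; the small binary tree and its node names are invented for the example):

```python
from collections import deque

def breadth_first_search(start, goal, successors):
    """Expand shallow nodes before deeper ones: the fringe is a FIFO queue."""
    fringe = deque([[start]])            # queue of paths; popleft = oldest first
    visited = {start}                    # repeated-state checking
    while fringe:
        path = fringe.popleft()          # shallowest unexpanded node first
        node = path[-1]
        if node == goal:
            return path
        for child in successors.get(node, []):
            if child not in visited:
                visited.add(child)
                fringe.append(path + [child])   # new successors go to the end
    return None

# A small hypothetical binary tree, as in the figure above.
tree = {'A': ['B', 'C'], 'B': ['D', 'E'], 'C': ['F', 'G']}
print(breadth_first_search('A', 'G', tree))  # → ['A', 'C', 'G']
```

Because every level is exhausted before the next is touched, the first path returned is always a shallowest one.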

Depth first search

Depth-first search always expands the deepest node in the current fringe of the search tree. The progress of the search is illustrated in figure 1.31. The search proceeds immediately to the deepest level of the search tree, where the nodes have no successors. As those nodes are expanded, they are dropped from the fringe, so the search "backs up" to the next shallowest node that still has unexplored successors.

Fig. Depth-first search on a binary tree. Nodes that have been expanded and have no descendants in the fringe can be removed from memory; these are shown in black. Nodes at depth 3 are assumed to have no successors and M is the only goal node.
This strategy can be implemented by TREE-SEARCH with a last-in-first-out (LIFO) queue, also known as a stack.
Depth-first search has very modest memory requirements. It needs to store only a single path from the root to a leaf node, along with the remaining unexpanded sibling nodes for each node on the path. Once a node has been expanded, it can be removed from memory as soon as its descendants have been fully explored.
For a state space with a branching factor b and maximum depth m, depth-first search requires storage of only bm + 1 nodes.
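A matching sketch of depth-first search only changes the fringe from a FIFO queue to a LIFO stack (illustrative, not from the original notes; the same invented tree as in the BFS sketch):

```python
def depth_first_search(start, goal, successors):
    """Always expand the deepest node: the fringe is a LIFO stack."""
    fringe = [[start]]                   # stack of paths; pop = newest first
    while fringe:
        path = fringe.pop()              # deepest unexpanded node first
        node = path[-1]
        if node == goal:
            return path
        # Push children in reverse so the left-most child is expanded first.
        for child in reversed(successors.get(node, [])):
            if child not in path:        # avoid trivial loops along this path
                fringe.append(path + [child])
    return None

tree = {'A': ['B', 'C'], 'B': ['D', 'E'], 'C': ['F', 'G']}
print(depth_first_search('A', 'E', tree))  # → ['A', 'B', 'E']
```

Note that the stack holds only the current path plus unexpanded siblings, which is exactly the modest bm + 1 storage described above.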
Drawback of depth-first search
The drawback of depth-first search is that it can make a wrong choice and get stuck going down a very long (or even infinite) path when a different choice would lead to a solution near the root of the search tree. For example, depth-first search will explore the entire left subtree even if node C is a goal node.
BACKTRACKING SEARCH

A variant of depth-first search called backtracking search uses less memory: only one successor is generated at a time rather than all successors, so only O(m) memory is needed rather than O(bm).

S.NO RGPV QUESTIONS                                             Year        Marks
Q.1  Compare breadth first search and depth first search        June.2014   7
     techniques.
Q.2  Explain best first, depth first and breadth first          June.2011,  10
     search. When would best first search be worse than         Dec.2009
     simple breadth first search?
Q.3  Differentiate between Breadth First Search and             June.2007   7
     Depth First Search.
Unit-01/Lecture-04
INFORMED SEARCH AND EXPLORATION
Informed (Heuristic) Search Strategies
An informed search strategy is one that uses problem-specific knowledge beyond the definition of the problem itself. It can find solutions more efficiently than an uninformed strategy.
Best-first search
Best-first search is an instance of the general TREE-SEARCH or GRAPH-SEARCH algorithm in which a node is selected for expansion based on an evaluation function f(n). The node with the lowest evaluation is selected for expansion, because the evaluation measures the distance to the goal.
This can be implemented using a priority queue, a data structure that will maintain the fringe in ascending order of f-values.
Heuristic functions
A heuristic technique helps in solving problems, even though there is no guarantee that it will never lead in the wrong direction. There are heuristics of general applicability as well as domain-specific ones. The general-purpose strategies, in order to be used in a specific domain, are coupled with some domain-specific heuristics. There are two major ways in which domain-specific heuristic information can be incorporated into a rule-based search procedure:
- In the rules themselves
- As a heuristic function that evaluates individual problem states and determines how desirable they are.
A heuristic function is a function that maps from problem state descriptions to measures of desirability, usually represented as number weights. The value of a heuristic function at a given node in the search process gives a good estimate of that node being on the desired path to the solution. Well designed heuristic functions can provide a fairly good estimate of whether a path is good or not. ("The sum of the distances traveled so far" is a simple heuristic function in the traveling salesman problem.) The purpose of a heuristic function is
to guide the search process in the most profitable directions, by suggesting which path to
follow first when more than one path is available. However in many problems, the cost of
computing the value of a heuristic function would be more than the effort saved in the
search process. Hence generally there is a trade-off between the cost of evaluating a
heuristic function and the savings in search that the function provides.
A heuristic function, or simply a heuristic, is a function that ranks alternatives in various search algorithms at each branching step, based on the available information, in order to decide which branch to follow during a search.
The key component of the best-first search algorithm is a heuristic function, denoted by h(n):
h(n) = estimated cost of the cheapest path from node n to a goal node.
For example, in Romania, one might estimate the cost of the cheapest path from Arad to Bucharest via the straight-line distance from Arad to Bucharest.
Heuristic functions are the most common form in which additional knowledge is imparted to the search algorithm.
Greedy Best-first search
Greedy best-first search tries to expand the node that is closest to the goal, on the grounds that this is likely to lead to a solution quickly.
It evaluates the nodes by using the heuristic function f(n) = h(n).
Taking the example of route-finding problems in Romania, the goal is to reach Bucharest starting from the city Arad. We need to know the straight-line distances to Bucharest from various cities. For example, the initial state is In(Arad), and the straight-line distance heuristic hSLD(In(Arad)) is found to be 366.
Using the straight-line distance heuristic hSLD, the goal state can be reached faster.
Fig. Values of hSLD - straight-line distances to Bucharest

Fig. Stages in greedy best-first search for Bucharest using the straight-line distance heuristic hSLD. Nodes are labeled with their h-values.
The figure shows the progress of greedy best-first search using hSLD to find a path from Arad to Bucharest. The first node to be expanded from Arad will be Sibiu, because it is closer to Bucharest than either Zerind or Timisoara. The next node to be expanded will be Fagaras, because it is closest. Fagaras in turn generates Bucharest, which is the goal.
Properties of greedy search
o Complete? No - it can get stuck in loops, e.g., Iasi → Neamt → Iasi → Neamt. It is complete in finite spaces with repeated-state checking.
o Time? O(b^m), but a good heuristic can give dramatic improvement.
o Space? O(b^m) - keeps all nodes in memory.
o Optimal? No.
Greedy best-first search is not optimal, and it is incomplete.
The worst-case time and space complexity is O(b^m), where m is the maximum depth of the search space.
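The behaviour just described can be sketched in Python (an illustrative sketch, not from the original notes; only a fragment of the Romania map is encoded, and road costs are omitted because greedy best-first search ranks nodes by h alone):

```python
import heapq

# Straight-line distances to Bucharest (hSLD), as in the figure above.
h = {'Arad': 366, 'Zerind': 374, 'Timisoara': 329, 'Sibiu': 253,
     'Oradea': 380, 'Fagaras': 176, 'Rimnicu Vilcea': 193, 'Bucharest': 0}

# A fragment of the Romania road map (neighbours only).
roads = {'Arad': ['Zerind', 'Timisoara', 'Sibiu'],
         'Sibiu': ['Arad', 'Oradea', 'Fagaras', 'Rimnicu Vilcea'],
         'Fagaras': ['Sibiu', 'Bucharest']}

def greedy_best_first(start, goal):
    fringe = [(h[start], start, [start])]   # priority queue ordered by h(n)
    visited = set()
    while fringe:
        _, node, path = heapq.heappop(fringe)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nxt in roads.get(node, []):
            if nxt not in visited:
                heapq.heappush(fringe, (h[nxt], nxt, path + [nxt]))
    return None

print(greedy_best_first('Arad', 'Bucharest'))
# → ['Arad', 'Sibiu', 'Fagaras', 'Bucharest']
```

The expansion order matches the text: Sibiu (h = 253) before Zerind and Timisoara, then Fagaras (h = 176), which generates the goal.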
A* Search
A* search is the most widely used form of best-first search. The evaluation function f(n) is obtained by combining
(1) g(n) = the cost to reach the node, and
(2) h(n) = the estimated cost to get from the node to the goal:
f(n) = g(n) + h(n).
A* search is both optimal and complete. A* is optimal if h(n) is an admissible heuristic - that is, provided that h(n) never overestimates the cost to reach the goal.
An obvious example of an admissible heuristic is the straight-line distance hSLD that we used in getting to Bucharest; it can never be an overestimate. The values of g are computed from the step costs shown in the Romania map, and the values of hSLD are given in the figure above.


Fig. Stages in A* search for Bucharest. Nodes are labeled with f = g + h. The h-values are the straight-line distances to Bucharest taken from the previous figure.
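The same route-finding fragment can illustrate A* (an illustrative sketch, not from the original notes; the step costs 140, 99, 80, 97, 101 and 211 are the road distances between these cities on the standard Romania map, and the h-values are the hSLD figures):

```python
import heapq

h = {'Arad': 366, 'Sibiu': 253, 'Fagaras': 176, 'Rimnicu Vilcea': 193,
     'Pitesti': 100, 'Bucharest': 0}

# Step costs g taken from the Romania road map.
roads = {'Arad': [('Sibiu', 140)],
         'Sibiu': [('Fagaras', 99), ('Rimnicu Vilcea', 80)],
         'Fagaras': [('Bucharest', 211)],
         'Rimnicu Vilcea': [('Pitesti', 97)],
         'Pitesti': [('Bucharest', 101)]}

def a_star(start, goal):
    fringe = [(h[start], 0, start, [start])]     # ordered by f = g + h
    best_g = {start: 0}
    while fringe:
        f, g, node, path = heapq.heappop(fringe)
        if node == goal:
            return path, g
        for nxt, cost in roads.get(node, []):
            ng = g + cost
            if ng < best_g.get(nxt, float('inf')):   # keep only cheapest g
                best_g[nxt] = ng
                heapq.heappush(fringe, (ng + h[nxt], ng, nxt, path + [nxt]))
    return None, None

path, cost = a_star('Arad', 'Bucharest')
print(path, cost)
# → ['Arad', 'Sibiu', 'Rimnicu Vilcea', 'Pitesti', 'Bucharest'] 418
```

Note how the admissible heuristic steers A* away from the shorter-looking Fagaras route (total cost 450) to the genuinely cheaper Pitesti route (418).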
S.NO RGPV QUESTIONS                                            Year       Marks
Q.1  What do you understand by heuristic? Explain hill         June.2014  7
     climbing method and compare it with generate and
     test method.
Q.2  Why heuristic search techniques are considered to be      June.2011  10
     more powerful than the traditional search techniques?
Q.3  What do you understand by heuristic? Explain hill         June.2006  10
     climbing method and compare it with generate and
     test method.
Unit-01/Lecture-05
LOCAL SEARCH ALGORITHMS AND OPTIMIZATION PROBLEMS
o In many optimization problems, the path to the goal is irrelevant; the goal state itself is the solution.
o For example, in the 8-queens problem, what matters is the final configuration of queens, not the order in which they are added.
o In such cases, we can use local search algorithms. They operate using a single current state (rather than multiple paths) and generally move only to neighbours of that state.
o The important applications of this class of problems are (a) integrated-circuit design, (b) factory-floor layout, (c) job-shop scheduling, (d) automatic programming, (e) telecommunications network optimization, (f) vehicle routing, and (g) portfolio management.
Key advantages of Local Search Algorithms
(1) They use very little memory - usually a constant amount; and
(2) they can often find reasonable solutions in large or infinite (continuous) state spaces for which systematic algorithms are unsuitable.
OPTIMIZATION PROBLEMS
In addition to finding goals, local search algorithms are useful for solving pure optimization problems, in which the aim is to find the best state according to an objective function.
State Space Landscape
Local search is best explained using the state space landscape, as shown in the figure below.
A landscape has both "location" (defined by the state) and "elevation" (defined by the value of the heuristic cost function or objective function).
If elevation corresponds to cost, then the aim is to find the lowest valley - a global minimum; if elevation corresponds to an objective function, then the aim is to find the highest peak - a global maximum.
Local search algorithms explore this landscape. A complete local search algorithm always finds a goal if one exists; an optimal algorithm always finds a global minimum/maximum.

Fig. A one-dimensional state space landscape in which elevation corresponds to the objective function. The aim is to find the global maximum. Hill-climbing search modifies the current state to try to improve it, as shown by the arrow. The various topographic features are defined in the text.
Hill-climbing search
The hill-climbing search algorithm, as shown in the figure, is simply a loop that continually moves in the direction of increasing value - that is, uphill. It terminates when it reaches a "peak" where no neighbour has a higher value.

Fig. The hill-climbing search algorithm (steepest ascent version), which is the most basic
local search technique. At each step the current node is replaced by the best neighbour; the
neighbour with the highest VALUE. If the heuristic cost estimate h is used, we could find the
neighbour with the lowest h.
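The loop in the figure can be sketched as follows (an illustrative sketch, not from the original notes; the objective function f and the ±1 neighbourhood over integer states are invented for the example):

```python
def hill_climbing(start, value, neighbours):
    """Repeatedly move to the best neighbour; stop at a peak
    (which may be only a local maximum)."""
    current = start
    while True:
        best = max(neighbours(current), key=value, default=current)
        if value(best) <= value(current):
            return current               # no neighbour is higher: a peak
        current = best

# A hypothetical objective with a local maximum at x = 2 (value 0)
# and the global maximum at x = 8 (value 20).
def f(x):
    return -(x - 2) ** 2 if x < 5 else 20 - (x - 8) ** 2

peak = hill_climbing(0, f, lambda x: [x - 1, x + 1])
print(peak)  # starting at 0 the climb stops at the local maximum, x = 2
```

Starting instead at x = 6 reaches the global maximum at x = 8, which is exactly the sensitivity to the starting state that the next paragraphs discuss.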
Hill-climbing is sometimes called greedy local search because it grabs a good neighbour state without thinking ahead about where to go next. Greedy algorithms often perform quite well.
Problems with hill-climbing
Hill-climbing often gets stuck for the following reasons:
o Local maxima: a local maximum is a peak that is higher than each of its neighbouring states, but lower than the global maximum. Hill-climbing algorithms that reach the vicinity of a local maximum will be drawn upwards towards the peak, but will then be stuck with nowhere else to go.
o Ridges: a ridge is shown in Figure 2.10. Ridges result in a sequence of local maxima that is very difficult for greedy algorithms to navigate.
o Plateaux: a plateau is an area of the state space landscape where the evaluation function is flat. It can be a flat local maximum, from which no uphill exit exists, or a shoulder, from which it is possible to make progress.
Fig. Illustration of why ridges cause difficulties for hill-climbing. The grid of states (dark circles) is superimposed on a ridge rising from left to right, creating a sequence of local maxima that are not directly connected to each other. From each local maximum, all the available options point downhill.
Hill-climbing variations
Stochastic hill-climbing
o Random selection among the uphill moves.
o The selection probability can vary with the steepness of the uphill move.
First-choice hill-climbing
o Implements stochastic hill-climbing by generating successors randomly until one better than the current state is found.
Random-restart hill-climbing
o Tries to avoid getting stuck in local maxima.
Simulated annealing search
A hill-climbing algorithm that never makes "downhill" moves towards states with lower value (or higher cost) is guaranteed to be incomplete, because it can get stuck on a local maximum. In contrast, a purely random walk - that is, moving to a successor chosen uniformly at random from the set of successors - is complete, but extremely inefficient.
Simulated annealing is an algorithm that combines hill-climbing with a random walk in a way that yields both efficiency and completeness.
The figure shows the simulated annealing algorithm. It is quite similar to hill climbing. Instead of picking the best move, however, it picks a random move. If the move improves the situation, it is always accepted. Otherwise, the algorithm accepts the move with some probability less than 1. The probability decreases exponentially with the "badness" of the move - the amount ΔE by which the evaluation is worsened.
Simulated annealing was first used extensively to solve VLSI layout problems in the early 1980s. It has been applied widely to factory scheduling and other large-scale optimization tasks.
The SIMULATED ANNEALING algorithm

1 Evaluate the initial state.
2 If it is a goal state then quit;
  otherwise make this initial state the current state and proceed.
3 Set BESTSOFAR to the current state.
4 Set T from the annealing schedule.
5 Repeat
    select an operator that can be applied to the current state;
    apply the operator and create a new state;
    evaluate this new state;
    find ΔE = the difference between the values of the current and new states;
    If the new state is a goal state then quit;
    Otherwise compare it with the current state:
      If better, set BESTSOFAR to the value of this state and make the new state the current state.
      If not better, make it the current state with probability p'. This involves generating a random number in the range 0 to 1 and comparing it with p'; if the number is less than p', accept the new state as the current state, otherwise do nothing.
    Revise T in the annealing schedule, dependent on the number of nodes in the tree.
  Until a solution is found or there are no more new operators.
6 Return BESTSOFAR as the answer.
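The numbered procedure above can be sketched in Python (an illustrative sketch, not from the original notes; the geometric cooling schedule, the ±1 neighbour function and the objective f are assumptions for the example, and the acceptance probability p' uses the usual exp(ΔE/T) form):

```python
import math
import random

def simulated_annealing(start, value, neighbour, t0=10.0, cooling=0.95,
                        steps=2000, rng=random.Random(0)):
    current = best = start
    t = t0                                   # T from the annealing schedule
    for _ in range(steps):
        nxt = neighbour(current, rng)        # apply a random operator
        delta = value(nxt) - value(current)  # ΔE
        # Always accept improvements; accept worsenings with
        # probability exp(ΔE / T), which shrinks as T cools.
        if delta > 0 or rng.random() < math.exp(delta / t):
            current = nxt
        if value(current) > value(best):
            best = current                   # track BESTSOFAR
        t = max(t * cooling, 1e-9)           # revise T (geometric schedule)
    return best                              # BESTSOFAR

# Hypothetical objective: local maximum 0 at x = 2, global maximum 20 at x = 8.
def f(x):
    return -(x - 2) ** 2 if x < 5 else 20 - (x - 8) ** 2

print(simulated_annealing(0, f, lambda x, r: x + r.choice([-1, 1])))
```

While the temperature is high the walk can cross the valley between the two peaks; once T is tiny the loop behaves like plain hill climbing.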
Fig. The simulated annealing search algorithm, a version of stochastic hill climbing where some downhill moves are allowed.
S.NO RGPV QUESTIONS                                            Year       Marks
Q.1  What do you understand by heuristic? Explain hill         June.2006  10
     climbing method and compare it with generate and
     test method.
UNIT-01/LECTURE-06
HILL CLIMBING PROCEDURE
This is a variety of depth-first (generate-and-test) search. A feedback is used here to decide on the direction of motion in the search space. In depth-first search, the test function will merely accept or reject a solution. But in hill climbing the test function is provided with a heuristic function which gives an estimate of how close a given state is to the goal state. The hill climbing test procedure is as follows:
1. Generate the first proposed solution as is done in the depth-first procedure. See if it is a solution. If so, quit; else continue.
2. From this solution generate a new set of solutions using some application rules.
3. For each element of this set:
(i) Apply the test function. If it is a solution, quit.
(ii) Else see whether it is closer to the goal state than the solutions already generated. If yes, remember it; else discard it.
4. Take the best element so far generated and use it as the next proposed solution. This step corresponds to a move through the problem space in the direction of the goal state.
5. Go back to step 2.
Sometimes this procedure may lead to a position which is not a solution, but from which there is no move that improves things. This will happen if we have reached one of the following three states:
(a) A "local maximum", which is a state better than all its neighbors, but not better than some other states farther away. Local maxima sometimes occur within sight of a solution. In such cases they are called "foothills".
(b) A "plateau", which is a flat area of the search space, in which neighboring states have the same value. On a plateau, it is not possible to determine the best direction in which to move by making local comparisons.
(c) A "ridge", which is an area of the search space that is higher than the surrounding areas, but cannot be climbed in a single move.
To overcome these problems we can:
(a) Backtrack to some earlier node and try a different direction. This is a good way of dealing with local maxima.
(b) Make a big jump in some direction to a new area of the search space. This can be done by applying two or more rules, or the same rule several times, before testing. This is a good strategy for dealing with plateaux and ridges.
Hill climbing becomes inefficient in large problem spaces, and when combinatorial explosion occurs. But it is useful when combined with other methods.
Steepest Ascent Hill Climbing

This differs from the basic hill climbing algorithm by choosing the best successor rather than the first successor that is better. This indicates that it has elements of the breadth-first algorithm.

The algorithm proceeds as follows.

Steepest ascent hill climbing algorithm
1 Evaluate the initial state

S
2 If it is goal state Then quit

otherwise make the current state this initial state and proceed;

3 Repeat
set Target to be the state that any successor of the current state can better;

for each operator that can be applied to the current state

apply the new operator and create a new state

evaluate this state

If this state is goal state Then quit

Otherwise compare with Target


m
If better set Target to this value o
.c
a
If Target is better than current state set current state to Target

m
a
Until a solution is found or current state does not change


d
from which no subsequent improvement can be made and this state is not the solution.

u
t
Local maximum state is a state which is better than its neighbours but is not better than

S
states faraway. These are often known as foothills. Plateau states are states which have
approximately the same value and it is not clear in which direction to move in order to reach
the solution. Ridge states are special typesw of local maximum states. The surrounding area
is basically unfriendly and makes it difficult to escape from, in single steps, and so the path
peters out when surrounded by ridges. Escape relies on: backtracking to a previous good
state and proceed in a completely different direction--- involves keeping records of the
current path from the outset; making a gigantic leap forward to a different part of the
search space perhaps by applying a sensible small step repeatedly, good for plateaux;
applying more than one rule at a time before testing, good for ridges.
None of these escape strategies can guarantee success.

S.NO RGPV QUESTION                                  YEAR                      MARKS
Q.1  (question text not legible in the source)      Jun.2012; June.2006,      08; 10
                                                    2007, 2009; Dec.2008,
                                                    2009
Q.2  (question text not legible in the source)      Dec.2013; June.2006, 07   8
Q.3  (question text not legible in the source)      Dec.2005; Dec.2006        5
UNIT-01/LECTURE-07
Best first Search
Best-first search is a search algorithm which explores a graph by expanding the most
promising node chosen according to a specified rule.

Judea Pearl described best-first search as estimating the promise of node n by a "heuristic
evaluation function f(n) which, in general, may depend on the description of n, the
description of the goal, the information gathered by the search up to that point, and most
important, on any extra knowledge about the problem domain."

Some authors have used "best-first search" to refer specifically to a search with a heuristic that attempts to predict how close the end of a path is to a solution, so that paths which are judged to be closer to a solution are extended first. This specific type of search is called greedy best-first search.

Efficient selection of the current best candidate for extension is typically implemented using a priority queue.

The A* search algorithm is an example of best-first search, as is B*. Best-first algorithms are often used for path finding in combinatorial search. (Note that neither A* nor B* is a greedy best-first search, as they incorporate the distance from the start in addition to estimated distances to the goal.)
An algorithm implementing best-first search follows.[3]

OPEN = [initial state]


while OPEN is not empty or until a goal is found
do
1. Remove the best node from OPEN, call it n.
2. If n is the goal state, backtrace path to n (through recorded parents) and return path.
3. Create n's successors.
4. Evaluate each successor, add it to OPEN, and record its parent.
done

Note that this version of the algorithm is not complete, i.e. it does not always find a possible
path between two nodes, even if there is one. For example, it gets stuck in a loop if it arrives
at a dead end, that is a node with the only successor being its parent. It would then go back
to its parent, add the dead-end successor to the OPEN list again, and so on.

The following version extends the algorithm to use an additional CLOSED list, containing all
nodes that have been evaluated and will not be looked at again. As this will avoid any node
being evaluated twice, it is not subject to infinite loops.

OPEN = [initial state]
CLOSED = []
while OPEN is not empty
do
 1. Remove the best node from OPEN, call it n, add it to CLOSED.
 2. If n is the goal state, backtrace path to n (through recorded parents) and return path.
 3. Create n's successors.
 4. For each successor do:
    a. If it is not in CLOSED and it is not in OPEN: evaluate it, add it to OPEN, and record its parent.
    b. Otherwise, if this new path is better than the previous one, change its recorded parent.
       i. If it is not in OPEN, add it to OPEN.
       ii. Otherwise, adjust its priority in OPEN using this new evaluation.
done

Also note that the given pseudo code of both versions just terminates when no path is
found. An actual implementation would of course require special handling of this case.
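The CLOSED-list version can be turned into runnable code (an illustrative sketch, not from the original notes; the graph and the f-scores are invented, and step 4b's re-parenting is omitted for brevity because the scores here depend only on the node, not on the path to it):

```python
import heapq

def best_first(start, goal, successors, f):
    open_heap = [(f(start), start)]          # OPEN as a priority queue
    parents = {start: None}                  # recorded parents for backtracing
    closed = set()                           # CLOSED: never re-examined
    while open_heap:
        _, n = heapq.heappop(open_heap)      # 1. remove the best node
        if n in closed:
            continue                         # skip stale queue entries
        closed.add(n)
        if n == goal:                        # 2. backtrace through parents
            path = []
            while n is not None:
                path.append(n)
                n = parents[n]
            return path[::-1]
        for s in successors.get(n, []):      # 3-4. expand and evaluate
            if s not in closed and s not in parents:
                parents[s] = n
                heapq.heappush(open_heap, (f(s), s))
    return None                              # OPEN exhausted: no path found

# Hypothetical graph and node scores (lower f is better).
graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D', 'G'], 'D': ['G']}
score = {'A': 3, 'B': 2, 'C': 1, 'D': 2, 'G': 0}
print(best_first('A', 'G', graph, score.get))  # → ['A', 'C', 'G']
```

The CLOSED set is what prevents the dead-end loop described for the first version, and the empty-OPEN return handles the "no path" case the text mentions.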

Best-first search in its most basic form consists of the following algorithm (adapted from
Pearl, 1984):

The first step is to define the OPEN list with a single node, the starting node. The second step is to check whether or not OPEN is empty. If it is empty, then the algorithm returns failure and exits. The third step is to remove the node with the best score, n, from OPEN and place it in CLOSED. The fourth step "expands" the node n, where expansion is the identification of successor nodes of n. The fifth step then checks each of the successor nodes to see whether or not one of them is the goal node. If any successor is the goal node, the algorithm returns success and the solution, which consists of a path traced backwards from the goal to the start node. Otherwise, the algorithm proceeds to the sixth step. For every successor node, the algorithm applies the evaluation function, f, to it, then checks to see if the node has been in either OPEN or CLOSED. If the node has not been in either, it gets added to OPEN. Finally, the seventh step establishes a looping structure by sending the algorithm back to the second step. This loop will only be broken if the algorithm returns success in step five or failure in step two.

The algorithm is represented here in pseudo-code:

1. Define a list, OPEN, consisting solely of a single node, the start node, s.
2. IF the list is empty, return failure.
3. Remove from the list the node n with the best score (the node where f is the minimum), and move it to a list, CLOSED.
4. Expand node n.
5. IF any successor to n is the goal node, return success and the solution (by tracing the path from the goal node to s).
6. FOR each successor node:
   a) apply the evaluation function, f, to the node.
   b) IF the node has not been in either list, add it to OPEN.
7. Establish a looping structure by sending the algorithm back to the second step.

Pearl adds a third step to the FOR loop that is designed to prevent re-expansion of nodes that have already been visited. This step has been omitted above because it is not common to all best-first search algorithms.
A is the initial node, which is expanded to B, C and D. A heuristic function, say the cost of reaching the goal, is applied to each of these nodes. Since D is the most promising, it is expanded next, producing two successor nodes E and F. The heuristic function is applied to them. Now, of the remaining nodes, B looks more promising and hence it is expanded, generating nodes G and H. Again, when the nodes are evaluated, E appears to be the best, so it is expanded, giving rise to nodes I and J. In the next step J is expanded, since it is the most promising. This process continues until a solution is found.

Above figure shows the best - first search tree. Since a search tree may generate duplicate
nodes, usually a search graph is preferred.
The best-first search is implemented by an algorithm known as the A* algorithm. The algorithm
searches a directed graph in which each node represents a point in the problem space. Each
node will contain a description of the problem state it represents, and it will have links to its
parent nodes and successor nodes. In addition, it will also indicate how promising it is for the
search process. The A* algorithm uses two lists, OPEN and CLOSED. The list OPEN contains
nodes that have been generated and have had the heuristic function applied to them, but
whose successors have not yet been generated. The list CLOSED contains nodes which have
been examined, i.e., whose successors have been generated.

A heuristic function f estimates the merits of each generated node. This function f has two
components, g and h. The function g gives the cost of getting from the initial state to the
current node. The function h is an estimate of the additional cost of getting from the current
node to a goal state. The function f (= g + h) gives the estimated cost of getting from the
initial state to a goal state via the current node.
S.NO RGPV QUESTION YEAR MARKS
Q.1 Find the OPEN and CLOSED list nodes for the diagram given below. (Figure omitted: a tree with nodes B, C, D, E.) June.2013 10
Q.2 Explain best first, depth first and breadth first search. When would best first search be worse than simple breadth first search? June.2011 10
UNIT-01/LECTURE-08
A* algorithm
A* uses a best-first search and finds a least-cost path from a given initial node to one goal
node (out of one or more possible goals). As A* traverses the graph, it follows a path of the
lowest expected total cost or distance, keeping a sorted priority queue of alternate path
segments along the way.

It uses a knowledge-plus-heuristic cost function of node x (usually denoted f(x)) to
determine the order in which the search visits nodes in the tree. The cost function is a sum
of two functions:

the past path-cost function, which is the known distance from the starting node to the
current node x (usually denoted g(x))

a future path-cost function, which is an admissible "heuristic estimate" of the distance
from x to the goal (usually denoted h(x)).
The h(x) part of the f(x) function must be an admissible heuristic; that is, it must not
overestimate the distance to the goal. Thus, for an application like routing, h(x) might
represent the straight-line distance to the goal, since that is physically the smallest possible
distance between any two points or nodes.
If the heuristic h satisfies the additional condition h(x) <= d(x,y) + h(y) for every edge (x, y) of
the graph (where d denotes the length of that edge), then h is called monotone, or
consistent. In such a case, A* can be implemented more efficiently—roughly speaking, no
node needs to be processed more than once (see closed set below)—and A* is equivalent to
running Dijkstra's algorithm with the reduced cost d'(x, y) := d(x, y) + h(y) - h(x).
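The consistency condition h(x) <= d(x, y) + h(y) can be checked mechanically over every edge of a graph. The following sketch uses invented edge lengths and heuristic values; only the inequality itself comes from the text above:

```python
# Check the consistency (monotonicity) condition h(x) <= d(x, y) + h(y)
# for every edge (x, y) of a graph.

def is_consistent(edges, h):
    """edges: dict mapping an edge (x, y) to its length d(x, y)."""
    return all(h[x] <= d + h[y] for (x, y), d in edges.items())

# Hypothetical graph: edge lengths and heuristic estimates to the goal 'G'.
edges = {('A', 'B'): 2, ('B', 'G'): 3, ('A', 'G'): 6}
h = {'A': 5, 'B': 3, 'G': 0}
print(is_consistent(edges, h))  # True: h never drops faster than the edge length
```

A consistent heuristic is automatically admissible along any path, which is why A* can then avoid re-processing nodes.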

Process
Like all informed search algorithms, it first searches the routes that appear to be most likely
to lead towards the goal. What sets A* apart from a greedy best-first search is that it also
takes the distance already traveled into account; the g(x) part of the heuristic is the cost
from the starting point, not simply the local cost from the previously expanded node.

Starting with the initial node, it maintains a priority queue of nodes to be traversed, known
as the open set or fringe. The lower f(x) for a given node x, the higher its priority. At each
step of the algorithm, the node with the lowest f(x) value is removed from the queue, the f
and g values of its neighbors are updated accordingly, and these neighbors are added to the
queue. The algorithm continues until a goal node has a lower f value than any node in the
queue (or until the queue is empty). (Goal nodes may be passed over multiple times if there
remain other nodes with lower f values, as they may lead to a shorter path to a goal.) The f
value of the goal is then the length of the shortest path, since h at the goal is zero in an
admissible heuristic.
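The process described above can be sketched as a compact A* implementation. This is a minimal illustration, not the full OPEN/CLOSED bookkeeping given later; the example graph and heuristic values are invented:

```python
import heapq

def a_star(start, goal, neighbors, h):
    """A* search. neighbors(n) -> iterable of (m, cost) pairs; h(n) -> admissible estimate."""
    open_q = [(h(start), start)]          # priority queue (fringe) ordered by f(x) = g(x) + h(x)
    g = {start: 0}                        # past path-cost g(x)
    parent = {start: None}
    while open_q:
        f, n = heapq.heappop(open_q)      # node with the lowest f value
        if n == goal:                     # goal popped with lowest f: path length is optimal
            path = [n]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return g[goal], list(reversed(path))
        for m, cost in neighbors(n):
            tentative = g[n] + cost
            if m not in g or tentative < g[m]:   # found a cheaper route to m: update it
                g[m] = tentative
                parent[m] = n
                heapq.heappush(open_q, (tentative + h(m), m))
    return None

# Hypothetical graph: adjacency lists with edge costs, plus estimates to goal 'G'.
graph = {'A': [('B', 1), ('C', 4)], 'B': [('C', 2), ('G', 5)], 'C': [('G', 1)], 'G': []}
h = {'A': 3, 'B': 2, 'C': 1, 'G': 0}
print(a_star('A', 'G', lambda n: graph[n], lambda n: h[n]))  # (4, ['A', 'B', 'C', 'G'])
```

Note how each node's predecessor is recorded in `parent`, exactly as the revision described above, so the actual sequence of steps (not just its length) can be recovered.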

The algorithm described so far gives us only the length of the shortest path. To find the
actual sequence of steps, the algorithm can be easily revised so that each node on the path
keeps track of its predecessor. After this algorithm is run, the ending node will point to its
predecessor, and so on, until some node's predecessor is the start node.

Additionally, if the heuristic is monotonic (or consistent, see below), a closed set of nodes
already traversed may be used to make the search more efficient.

Algorithm:

1. Start with OPEN containing the initial node. Its g = 0 and f ' = h '. Set CLOSED to the empty list.

2. Repeat:

If OPEN is empty, stop and return failure.

Else pick the BESTNODE on OPEN with the lowest f ' value and place it on CLOSED.

If BESTNODE is a goal state, return success and stop.

Else generate the successors of BESTNODE.

For each SUCCESSOR do the following:

1. Set SUCCESSOR to point back to BESTNODE. (Back links will help to recover the path.)

2. Compute g(SUCCESSOR) = g(BESTNODE) + cost of getting from BESTNODE to SUCCESSOR.

3. If SUCCESSOR is the same as any node on OPEN, call that node OLD and add OLD to BESTNODE's successors. Check g(OLD) and g(SUCCESSOR). If g(SUCCESSOR) is cheaper, then reset OLD's parent link to point to BESTNODE. Update g(OLD) and f '(OLD).

4. If SUCCESSOR was not on OPEN, see if it is on CLOSED. If so, call the node on CLOSED OLD, and if the new path is better, as earlier, set the parent link and g and f ' values appropriately.

5. If SUCCESSOR was not already on either OPEN or CLOSED, then put it on OPEN and add it to the list of BESTNODE's successors. Compute f '(SUCCESSOR) = g(SUCCESSOR) + h '(SUCCESSOR).

Best first searches will always find good paths to a goal after exploring the entire state
space. All that is required is that a good measure of goal distance be used.

The A* Algorithm
Best first search is a simplified A*.

Start with OPEN holding the initial nodes.

Pick the BEST node on OPEN such that f = g + h' is minimal.
If BEST is a goal node, quit and return the path from the initial node to BEST. Otherwise,
remove BEST from OPEN and add all of BEST's children to OPEN, labelling each with its path
from the initial node.

Graceful decay of admissibility

If h' rarely overestimates h by more than d, then the A* algorithm will rarely find a solution
whose cost is more than d greater than the cost of the optimal solution.
S.NO RGPV QUESTION YEAR MARKS
Q.1 Dec.2012, Dec.2009, Jun.2004, Jun.2008 10
Q.2 June.2004, June.2008, Dec.2009 10
Q.3 Dec-2013 7
UNIT-01/LECTURE-09

AO* algorithms
AO* algorithm (pronounced ``A-O-star''): an algorithm similar to A* for heuristic search of an AND-OR graph.
AO* Algorithm
1. Initialise the graph to the start node.
2. Traverse the graph following the current path, accumulating nodes that have not yet
been expanded or solved.
3. Pick any of these nodes and expand it; if it has no successors, assign FUTILITY as its
value, otherwise calculate f' for each of the successors.
4. If f' is 0 then mark the node as SOLVED.
5. Change the value of f' for the newly created node to reflect its successors by back
propagation.
6. Wherever possible use the most promising routes, and if a node is marked as SOLVED
then mark the parent node as SOLVED.
7. If the starting node is SOLVED or its value is greater than FUTILITY, stop; else repeat from 2.
AO* Algorithm 2.
The AO* ALGORITHM

The problem reduction algorithm we just described is a simplification of an algorithm
described by Martelli and Montanari, and by Nilsson. Nilsson calls it the AO* algorithm, the
name we assume.
1. Place the start node s on open.
2. Using the search tree constructed thus far, compute the most promising solution tree T
3. Select a node n that is both on open and a part of T. Remove n from open and place it on
closed.
4. If n is a terminal goal node, label n as solved. If the solution of n results in any of n’s
ancestors being solved, label all the ancestors as solved. If the start node s is solved, exit
with success where T is the solution tree. Remove from open all nodes with a solved
ancestor.
5. If n is not a solvable node (operators cannot be applied), label n as unsolvable. If the start
node is labeled as unsolvable, exit with failure. If any of n’s ancestors become unsolvable
because n is, label them unsolvable as well. Remove from open all nodes with unsolvable
ancestors.
6. Otherwise, expand node n generating all of its successors. For each such successor node
that contains more than one sub problem, generate their successors to give individual sub
problems. Attach to each newly generated node a back pointer to its predecessor. Compute
the cost estimate h* for each newly generated node and place all such nodes that do not yet
have descendants on open. Next, recompute the values of h* at n and each ancestor of n.
7. Return to step 2.

Problem Reduction with AO* Algorithm
PROBLEM REDUCTION (AND-OR graphs - AO* Algorithm)
When a problem can be divided into a set of sub-problems, where each sub-problem can be
solved separately and a combination of these will be a solution, AND-OR graphs or AND-OR
trees are used for representing the solution. The decomposition of the problem, or problem
reduction, generates AND arcs. One AND arc may point to any number of successor nodes,
all of which must be solved for the arc to point to a solution. Several arcs may also emerge
from a single node, indicating several possible solutions. Hence the graph is known as
AND-OR instead of AND. The figure shows an AND-OR graph.

An algorithm to find a solution in an AND-OR graph must handle AND arcs appropriately.
The A* algorithm cannot search AND-OR graphs efficiently. This can be understood from the
given figure.
FIGURE : AND-OR graph

In figure (a) the top node A has been expanded, producing two arcs, one leading to B and one
leading to C-D. The numbers at each node represent the value of f ' at that node (the cost of
getting to the goal state from the current state). For simplicity, it is assumed that every
operation (i.e., applying a rule) has unit cost, i.e., each arc with a single successor will have a
cost of 1, and each AND arc with multiple successors has a cost of 1 for each of its
components. With the information available so far, it appears that C is the most promising
node to expand, since its f ' = 3 is the lowest; but going through B would be better, since to
use C we must also use D, and the cost would be 9 (3+4+1+1). Through B it would be 6 (5+1).
Thus the choice of the next node to expand depends not only on its f ' value but also on whether
that node is part of the current best path from the initial node. Figure (b) makes this
clearer. In the figure, the node G appears to be the most promising node, with the least f ' value.
But G is not on the current best path, since to use G we must use the arc G-H with a cost of 9,
and this in turn demands that further arcs be used (with a cost of 27). The path from A through
B, E-F is better, with a total cost of 18 (17+1). Thus we can see that to search an AND-OR graph,
the following three things must be done.
1. Traverse the graph starting at the initial node and following the current best path, and
accumulate the set of nodes that are on the path and have not yet been expanded.
2. Pick one of these unexpanded nodes and expand it. Add its successors to the graph and
compute f ' (the cost of the remaining distance) for each of them.

3. Change the f ' estimate of the newly expanded node to reflect the new information
produced by its successors. Propagate this change backward through the graph, and decide
which is the current best path.

The propagation of revised cost estimates backward in the tree is not necessary in the A*
algorithm. This is because in the AO* algorithm expanded nodes are re-examined so that the
current best path can be selected. The working of the AO* algorithm is illustrated in the figure
as follows:
Referring to the figure: the initial node is expanded and D is marked initially as the promising
node. D is expanded, producing an AND arc E-F. The f ' value of D is updated to 10. Going
backwards we can see that the AND arc B-C is better; it is now marked as the current best path.
B and C have to be expanded next. This process continues until a solution is found or all
paths have led to dead ends, indicating that there is no solution. In the A* algorithm the path
from one node to the other is always that of the lowest cost, and it is independent of the
paths through other nodes.
The algorithm for performing a heuristic search of an AND-OR graph is given below. Unlike the
A* algorithm, which used two lists OPEN and CLOSED, the AO* algorithm uses a single
structure G. G represents the part of the search graph generated so far. Each node in G
points down to its immediate successors and up to its immediate predecessors, and also carries
the value of h', the cost of a path from itself to a set of solution nodes. The cost of getting
from the start node to the current node, g, is not stored as in the A* algorithm. This is
because it is not possible to compute a single such value, since there may be many paths to
the same state. In the AO* algorithm, h' serves as the estimate of goodness of a node. Also, a
threshold value called FUTILITY is used. If the estimated cost of a solution is greater than
FUTILITY, then the search is abandoned as too expensive to be practical.
For representing the above graphs, the AO* algorithm is as follows.

AO* ALGORITHM:
1. Let G consist only of the node representing the initial state; call this node INIT. Compute
h'(INIT).

2. Until INIT is labeled SOLVED or h'(INIT) becomes greater than FUTILITY, repeat the
following procedure:

(I) Trace the marked arcs from INIT and select an unexpanded node NODE.

(II) Generate the successors of NODE. If there are no successors, then assign FUTILITY as
h'(NODE). This means that NODE is not solvable. If there are successors, then for each
one, called SUCCESSOR, that is not also an ancestor of NODE, do the following:

(a) Add SUCCESSOR to graph G.

(b) If SUCCESSOR is a terminal node, mark it SOLVED and assign zero to its h' value.

(c) If SUCCESSOR is not a terminal node, compute its h' value.

(III) Propagate the newly discovered information up the graph by doing the following. Let S
be a set of nodes that have been marked SOLVED. Initialize S to NODE. Until S is empty,
repeat the following procedure:

(a) Select a node from S, call it CURRENT, and remove it from S.

(b) Compute the h' of each of the arcs emerging from CURRENT. Assign the minimum h' to
CURRENT.

(c) Mark the minimum-cost path as the best path out of CURRENT.

(d) Mark CURRENT SOLVED if all of the nodes connected to it through the newly marked
arc have been labeled SOLVED.

(e) If CURRENT has been marked SOLVED or its h' has just changed, its new status
must be propagated backwards up the graph; hence all the ancestors of CURRENT are added
to S.
AO* Search Procedure.

1. Place the start node on open.

2. Using the search tree, compute the most promising solution tree TP.

3. Select a node n that is both on open and a part of TP; remove n from open and place it on
closed.

4. If n is a goal node, label n as solved. If the start node is solved, exit with success, where TP
is the solution tree; remove all nodes from open with a solved ancestor.
5. If n is not a solvable node, label n as unsolvable. If the start node is labeled as unsolvable,
exit with failure. Remove all nodes from open with unsolvable ancestors.
6. Otherwise, expand node n, generating all of its successors; compute the cost for each
newly generated node and place all such nodes on open.
7. Go back to step (2).
Note: AO* will always find a minimum-cost solution.
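The core cost computation behind these procedures, choosing the cheapest arc out of a node, where an AND arc costs the sum of its components plus one per component, can be illustrated with a small recursive evaluator. The tree and heuristic values below are invented to mirror figure (a):

```python
# Minimum cost of solving a node in an AND-OR tree, assuming unit arc costs:
# an OR arc costs 1 + cost(child); an AND arc costs the sum of (1 + cost(child))
# over all of its components.

def solution_cost(node, tree, h):
    arcs = tree.get(node, [])
    if not arcs:                     # terminal node: use its heuristic estimate
        return h[node]
    # Each arc is a list of successor nodes: a one-element list is an OR arc,
    # a multi-element list is an AND arc whose parts must all be solved.
    return min(sum(1 + solution_cost(c, tree, h) for c in arc) for arc in arcs)

# Hypothetical tree matching figure (a): A -> B, or A -> (C AND D).
tree = {'A': [['B'], ['C', 'D']]}
h = {'B': 5, 'C': 3, 'D': 4}
print(solution_cost('A', tree, h))  # 6: through B (5+1), cheaper than C-D (3+4+1+1 = 9)
```

This reproduces the arithmetic of the figure: the path through B costs 6 while the AND arc C-D costs 9, so B lies on the best solution tree.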
Various types of control strategies:
We will now consider the problem of deciding which rule to apply next during the process of
searching for a solution. This question arises when more than one rule will
have its left side match the current state.

The first requirement of a control strategy is that it must cause motion. The second
requirement of a control strategy is that it must be systematic. We will explain these two
with respect to the water jug problem. If we had implemented a strategy of always choosing
the first rule that matches, we might never have solved the problem. Any strategy that
causes some motion will eventually lead towards a solution, but if it is not followed
systematically we may explore many useless paths first. One way to follow a systematic
control strategy is to construct a tree with the initial state as its root. Obtain the next level
by applying all possible rules to the first-level leaf nodes, and continue the process until some
rule produces a goal state. For the water jug problem a tree can be constructed as given in
the following diagram.
The control strategy for this search process is called breadth-first search. Other systematic
control strategies are also available. For example, we can select one single branch of a tree
until it yields a solution or until some pre-specified depth has been reached. If not, we go
back and explore other branches. This is called depth-first search. The water jug problem
will lead to an answer under any such control strategy because the problem is simple. This is
not always the case.

S.NO RGPV QUESTION YEAR MARKS


Q.1 Explain AO* algorithm. June.2014 10
Q.2 June.2004 10
REFERENCE

BOOK                                         AUTHOR                         PRIORITY
Artificial Intelligence                      Elaine Rich & Kevin Knight     1
Artificial Intelligence: A Modern Approach   Russell Stuart & Peter Norvig  2

UNIT-02
Knowledge Representation
UNIT-02/LECTURE-01
Knowledge Representation:
For the purpose of solving complex problems encountered in AI, we need both a large amount
of knowledge and some mechanism for manipulating that knowledge to create solutions to new
problems. A variety of ways of representing knowledge (facts) have been exploited in AI
programs. In all varieties of knowledge representation, we deal with two kinds of entities.

A. Facts: Truths in some relevant world. These are the things we want to represent.

B. Representations of facts in some chosen formalism. These are the things we will
actually be able to manipulate.
One way to think of structuring these entities is at two levels: (a) the knowledge level, at which
facts are described, and (b) the symbol level, at which representations of objects at the
knowledge level are defined in terms of symbols that can be manipulated by programs.

The facts and representations are linked with two-way mappings. This link is called
representation mappings. The forward representation mapping maps from facts to
representations. The backward representation mapping goes the other way, from
representations to facts.

One common representation is natural language (particularly English) sentences. Regardless of


the representation for facts we use in a program, we may also need to be concerned with an
English representation of those facts in order to facilitate getting information into and out of the
system. We need mapping functions from English sentences to the representation we actually
use and from it back to sentences.

What to Represent?

Let us first consider what kinds of knowledge might need to be represented in AI systems:

Objects
-- Facts about objects in our world domain. e.g. Guitars have strings, trumpets are brass
instruments.

Events
-- Actions that occur in our world. e.g. Steve Vai played the guitar in Frank Zappa's Band.

Performance
-- A behaviour like playing the guitar involves knowledge about how to do things.

Meta-knowledge
-- Knowledge about what we know. e.g. Bobrow's Robot who plans a trip. It knows that it can
read street signs along the way to find out where it is.
Thus in solving problems in AI we must represent knowledge, and there are two entities to deal
with:

Facts
-- truths about the real world and what we represent. This can be regarded as the knowledge
level.
Representation of the facts
-- which we manipulate. This can be regarded as the symbol level, since we usually define the
representation in terms of symbols that can be manipulated by programs.

We can structure these entities at two levels

the knowledge level


-- at which facts are described

the symbol level


-- at which representations of objects are defined in terms of symbols that can be
manipulated in programs (see Fig. 5)

Fig 5 Two Entities in Knowledge Representation

English or natural language is an obvious way of representing and handling facts. Logic enables
us to consider the following fact: Spot is a dog, as dog(Spot). We could then infer that all dogs
have tails with: ∀x: dog(x) → hasatail(x). We can then deduce:
hasatail(Spot)

Using an appropriate backward mapping function, the English sentence Spot has a tail can be
generated.

The available mapping functions are not always one-to-one but rather are many-to-many, which
is a characteristic of English representations. The sentences All dogs have tails and Every dog
has a tail both say that each dog has a tail, but the first could also say that each dog has more
than one tail; try substituting teeth for tails. When an AI program manipulates the internal
representation of facts, these new representations should also be interpretable as new
representations of facts.
Consider the classic problem of the mutilated chess board. Problem: in a normal chess board
the two opposite corner squares have been eliminated. The given task is to cover all the
squares on the remaining board with dominoes so that each domino covers two squares. No
overlapping of dominoes is allowed. Can it be done? Consider three data structures.

Fig. 3.1 Mutilated Checkerboard

The first two are illustrated in the diagrams above, and the third data structure is the number of
black squares and the number of white squares. The first diagram loses the colour of the
squares and a solution is not easy to see; the second preserves the colours but produces no
easier path; whereas counting the number of squares of each colour, giving 32 black and 30
white, yields an immediate solution of NO, since a domino must cover one white square and
one black square, so the numbers of squares of each colour must be equal for a positive
solution.
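The counting argument can be checked directly. In this sketch, square colour follows the usual (row + column) parity convention, and which parity is called "black" is just a labelling choice to match the counts in the text:

```python
# Count the two colours on an 8x8 board with two opposite corners removed.
removed = {(0, 0), (7, 7)}   # opposite corners share the same colour (even parity)
squares = [(r, c) for r in range(8) for c in range(8) if (r, c) not in removed]

black = sum((r + c) % 2 for r, c in squares)   # odd-parity squares: untouched, 32 remain
white = len(squares) - black                   # even-parity squares: two removed, 30 remain
print(black, white)  # 32 30: unequal counts, so no perfect domino tiling exists
```

Since every domino covers exactly one square of each colour, equal counts are necessary for a tiling, and 32 is not equal to 30.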

o
.c
Using Knowledge

a
We have briefly mentioned where knowledge is used in AI systems. Let us consider a little
further what applications there are and how knowledge may be used.

Learning
-- acquiring knowledge. This is more than simply adding new facts to a knowledge base. New
data may have to be classified prior to storage for easy retrieval, etc. Interaction and inference
with existing facts is needed to avoid redundancy and replication in the knowledge, and also so
that facts can be updated.

Retrieval
-- The representation scheme used can have a critical effect on the efficiency of the method.
Humans are very good at it. Many AI methods have tried to model human retrieval (see lecture
on distributed reasoning).

Reasoning
-- Infer facts from existing data.

If a system only knows:

Miles Davis is a Jazz Musician.


All Jazz Musicians can play their instruments well.

If things like Is Miles Davis a Jazz Musician? or Can Jazz Musicians play their instruments well?
are asked, then the answer is readily obtained from the data structures and procedures.

However, a question like Can Miles Davis play his instrument well? requires reasoning.

The above are all related. For example, it is fairly obvious that learning and reasoning involve
retrieval etc.
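The Miles Davis inference above can be expressed as a one-rule forward-chaining step. This is a toy sketch; only the two facts and the rule come from the text, and the relation names are invented:

```python
# Facts from the text, plus the rule "all jazz musicians can play their instruments well",
# applied by a single forward-chaining step.
facts = {('jazz_musician', 'Miles Davis')}

def infer(facts):
    derived = {('plays_well', who) for rel, who in facts if rel == 'jazz_musician'}
    return facts | derived

print(('plays_well', 'Miles Davis') in infer(facts))  # True: answered by reasoning,
                                                      # not by direct retrieval
```

Retrieval alone could answer "Is Miles Davis a jazz musician?" by set membership; the `infer` step is what turns the stored fact plus the rule into the new fact.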

Properties for Knowledge Representation Systems
The following properties should be possessed by a knowledge representation system.

Representational Adequacy
-- the ability to represent the required knowledge;
Inferential Adequacy
-- the ability to manipulate the knowledge represented to produce new knowledge
corresponding to that inferred from the original;
Inferential Efficiency
-- the ability to direct the inferential mechanisms into the most productive directions by
storing appropriate guides;
Acquisitional Efficiency
-- the ability to acquire new knowledge using automatic methods wherever possible rather
than reliance on human intervention.

Approaches to Knowledge Representation


Simple relational knowledge
The simplest way of storing facts is to use a relational method where each fact about a set of
objects is set out systematically in columns. This representation gives little opportunity for
inference, but it can be used as the knowledge basis for inference engines.

• Simple way to store facts.


• Each fact about a set of objects is set out systematically in columns (Fig. 7).
• Little opportunity for inference.
• Knowledge basis for inference engines.

Figure: Simple Relational Knowledge

We can ask things like:


Who is dead?
Who plays Jazz/Trumpet etc.?
This sort of representation is popular in database systems.
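A relational fact table of the kind in Fig. 7 can be queried directly as column lookups. The musicians and attribute values below are invented stand-ins for the figure's rows:

```python
# Simple relational knowledge: each fact is a row of attribute columns.
musicians = [
    {'name': 'Miles Davis', 'style': 'Jazz', 'instrument': 'Trumpet', 'alive': False},
    {'name': 'Steve Vai',   'style': 'Rock', 'instrument': 'Guitar',  'alive': True},
]

# "Who is dead?" and "Who plays Trumpet?" become simple column filters:
print([m['name'] for m in musicians if not m['alive']])              # ['Miles Davis']
print([m['name'] for m in musicians if m['instrument'] == 'Trumpet'])  # ['Miles Davis']
```

As the text notes, such a table supports lookup but gives little opportunity for inference; there is no way to derive a fact that is not already a row.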
Inheritable knowledge

Relational knowledge is made up of objects consisting of


a
• attributes
• corresponding associated values.

We extend the base more by allowing inference mechanisms:

Property inheritance
Elements inherit values from being members of a class.
Data must be organised into a hierarchy of classes (Fig. 8).

Fig. 8 Property Inheritance Hierarchy

• Boxed nodes -- objects and values of attributes of objects.
• Values can be objects with attributes and so on.
• Arrows -- point from object to its value.
• This structure is known as a slot and filler structure, semantic network or a collection of frames.

The algorithm to retrieve a value for an attribute of an instance object:

Find the object in the knowledge base.
If there is a value for the attribute, report it.
Otherwise look for a value of instance; if none, fail.
Otherwise go to that node and find a value for the attribute and then report it.
Otherwise search through using isa until a value is found for the attribute.
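The retrieval steps above can be sketched as a climb along `instance` and `isa` links. The objects and attribute values in this knowledge base are invented for illustration:

```python
# Property-inheritance lookup along instance/isa links, following the steps above.
kb = {
    'Musician':      {'isa': None, 'can_play': True},
    'Jazz-Musician': {'isa': 'Musician', 'style': 'Jazz'},
    'Miles':         {'instance': 'Jazz-Musician', 'instrument': 'Trumpet'},
}

def get_value(obj, attr):
    node = kb[obj]                         # find the object in the knowledge base
    while node is not None:
        if attr in node:                   # a value for the attribute: report it
            return node[attr]
        # otherwise climb the instance link, then isa links, until a value is found
        parent = node.get('instance') or node.get('isa')
        node = kb[parent] if parent else None
    return None                            # no value anywhere in the hierarchy: fail

print(get_value('Miles', 'instrument'))  # Trumpet (stored directly on the object)
print(get_value('Miles', 'style'))       # Jazz (inherited via the instance link)
print(get_value('Miles', 'can_play'))    # True (inherited via isa from Musician)
```

This is exactly the slot-and-filler behaviour of Fig. 8: an attribute stored on a class is visible from every instance below it unless a lower node overrides it.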

Inferential Knowledge
Represent knowledge as formal logic:

All dogs have tails: ∀x: dog(x) → hasatail(x)

Advantages:

• A set of strict rules.


o Can be used to derive more facts.
o Truths of new statements can be verified.
o Guaranteed correctness.
• Many inference procedures available to in implement standard rules of logic.

• Popular in AI systems, e.g. automated theorem proving.

Procedural Knowledge

Basic idea:

• Knowledge encoded in some procedures


• small programs that know how to do specific things, how to proceed.
• e.g. a parser in a natural language understander has the knowledge that a noun phrase may
contain articles, adjectives and nouns. It is represented by calls to routines that know how to
process articles, adjectives and nouns.

Advantages:

• Heuristic or domain specific knowledge can be represented.
• Extended logical inferences, such as default reasoning, facilitated.
• Side effects of actions may be modelled. Some rules may become false in time. Keeping track of
this in large systems may be tricky.
Disadvantages:
• Completeness -- not all cases may be represented.
• Consistency -- not all deductions may be correct.
e.g. If we know that Fred is a bird we might deduce that Fred can fly. Later we might discover
that Fred is an emu.
• Modularity is sacrificed. Changes in knowledge base might have far-reaching effects.
• Cumbersome control information.
S.NO RGPV QUESTION YEAR MARKS
Q.1 Explain the various approaches of knowledge representation schema with the help of examples. June.2014 7
Q.2 Explain knowledge representation & problems associated with it, with examples. June.2013 10
Q.3 Discuss various approaches of knowledge representation with the help of examples. June.2011 10


UNIT-02/LECTURE-02
Problems in representing knowledge:
Below are listed issues that should be raised when using a knowledge representation technique:
Important Attributes
-- Are there any attributes that occur in many different types of problem?
There are two, instance and isa, and each is important because each supports property
inheritance.
Relationships
-- What about the relationship between the attributes of an object, such as inverses, existence,
techniques for reasoning about values, and single-valued attributes? We can consider an example
of an inverse in:
band(John Zorn, Naked City)
a
This can be treated as John Zorn plays in the band Naked City or John Zorn's band is Naked City.
Another representation is band = Naked City
m
a
band-members = John Zorn, Bill Frissell, Fred Frith, Joey Barron
Granularity
n
-- At what level should the knowledge be represented, and what are the primitives? Choosing
the granularity of representation: primitives are fundamental concepts such as holding, seeing,
playing, and as English is a very rich language with over half a million words, it is clear we will
find difficulty in deciding which words to choose as our primitives in a series of situations.
If Tom feeds a dog then it could become:
feeds(tom, dog)
If Tom gives the dog a bone, it could become:
gives(tom, dog, bone)
Are these the same? In any sense does giving an object food constitute feeding?
If give(x, food) → feed(x), then we are making progress.

But we need to add certain inferential rules.

In the famous program on relationships, Louise is Bill's cousin. How do we represent this?
louise = daughter(brother or sister(father or mother(bill)))
Suppose it is Chris: then we do not know whether Chris is male or female, and then son applies
as well.

Clearly the separate levels of understanding require different levels of primitives and these need
many rules to link together apparently similar primitives.
Obviously there is a potential storage problem and the underlying question must be what level
of comprehension is needed.

Logic Knowledge Representation


We briefly mentioned how logic can be used to represent simple facts in the last lecture. Here
we will highlight major principles involved in knowledge representation. In particular predicate
logic will be met in other knowledge representation schemes and reasoning methods.

A more comprehensive treatment is given in the third-year Expert Systems course.

Symbols used: the following standard logic symbols we use in this course are:
∀ For all
∃ There exists
→ Implies
¬ Not
∨ Or
∧ And
d
Propositional Logic

Propositional logic is a special case of FOPL. It is simple to deal with, and a decision procedure
exists for it. We can easily represent real-world facts as logical propositions. Propositions are
elementary atomic sentences, written as formulas or well-formed formulas (wffs). Propositions
may either be true or false. Some examples of simple propositions are:
it is raining
snow is white
people live on the moon
Compound propositions are formed from wffs using the logical connectives "not, and, or,
if ... then (implication) and if and only if". For example, "it is raining and wind is blowing" is a
compound proposition.
An example of propositional logic is given below:
it is raining RAINING
it is hot HOT
it is windy WINDY

It is a way of representing knowledge.

In logic and mathematics, a propositional calculus or logic is a formal system in which formulae
representing propositions can be formed by combining atomic propositions using logical
connectives.

Sentences considered in propositional logic are not arbitrary sentences but are the ones that
are either true or false, but not both. This kind of sentence is called a proposition.
Example
Some facts in propositional logic:
n
It is raining. - RAINING
y
It is sunny -
dSUNNY
It is windy -
u WINDY

t
If it is raining ,then it is not sunny - RAINING -> ¬ SUNNY

S
the elements of propositional logic

Simple sentences which are true or false are basic propositions. Larger and more complex
sentences are constructed from basic propositions by combining them with connectives. Thus
propositions and connectives are the basic elements of propositional logic. Though there are
many connectives, we are going to use the following five basic connectives here:
NOT, AND, OR, IF_THEN (or IMPLY), IF_AND_ONLY_IF. They are also denoted by the symbols
¬, ∧, ∨, →, ↔, respectively.
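As a quick illustration (a sketch of my own, not from the original notes), the five connectives can be modeled directly as Python functions on truth values:

```python
# A minimal sketch of the five basic connectives as Python functions.
def NOT(p): return not p
def AND(p, q): return p and q
def OR(p, q): return p or q
def IMPLIES(p, q): return (not p) or q   # false only when p is true and q is false
def IFF(p, q): return p == q             # true exactly when both sides agree

# Example from earlier: RAINING -> NOT SUNNY, with RAINING = True, SUNNY = False
raining, sunny = True, False
print(IMPLIES(raining, NOT(sunny)))      # True
```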

The connectives used in propositional logic.


The syntax of propositional logic defines the allowable sentences. The atomic sentences (the indivisible syntactic elements) consist of a single proposition symbol. Each such symbol stands for a proposition that can be true or false. We will use uppercase names for symbols: P, Q, R, and so on.
Complex sentences are constructed from simpler sentences using logical connectives. There are
five connectives in common use:

a formal grammar of propositional logic:

Compare different knowledge representation languages.

S.NO  RGPV QUESTION                                        YEAR       MARKS
Q.1   What do you understand by the representation of     June.2012  10
      Knowledge? Explain the various problems in
      representing knowledge.
Q.2   What is propositional logic? Explain the knowledge  June.2011  10
      representation using propositional logic.

UNIT-02/LECTURE-03
Predicate logic
Predicate is a proposition in which some of the Boolean values are replaced by Boolean-valued
functions and quantified expressions. Predicate logic combines these functions using the operators of
propositional calculus and quantifiers like ∀ (for-all) and ∃ (there-exists). Some authors call
predicates propositional functions.

Boolean-valued function is one with one or more variables that gives a Boolean result. For
example, prime(n) is a function that returns true if n is prime and false if otherwise. ∀x(citizen(x,
India)) returns true if everyone on the planet is a citizen of India and false otherwise.
∃x(citizen(x, India)) returns true if at least one person on the planet is a citizen of India and false
otherwise.
Quantifiers are specified as (with X as variable and P as proposition):

Name              Example   Meaning
Universal         ∀X.P      For all X, P is true
Existential       ∃X.P      There exists a value of X such that P is true
Unique Existence  ∃!X.P     There exists a unique value of X such that P is true
It can be inferred that:

1. Negation of a Universal Specification: ¬∀xQ(x) is equivalent to ∃x¬Q(x).

2. Negation of an Existential Specification: ¬∃xQ(x) is equivalent to ∀x¬Q(x).

3. For any predicate Q(x, y):
   a. Commutation of Universal Quantifiers: ∀x∀y Q(x, y) is equivalent to ∀y∀x Q(x, y)
   b. Commutation of Existential Quantifiers: ∃x∃y Q(x, y) is equivalent to ∃y∃x Q(x, y)
   c. ∃x∀y Q(x, y) ⊃ ∀y∃x Q(x, y)

In this example, all X who are colleagues are persons: ∀X.(colleague(X) ⊃ person(X))
In this example, there exists an X whom the mother (hema) married, and X is male:
∃X.(mother(hema, X) ∩ male(X))
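On a finite domain these quantifier equivalences can be checked mechanically, with all() playing the role of ∀ and any() the role of ∃. The following is an illustrative sketch (the domain and predicate are arbitrary choices of mine):

```python
# Checking the quantifier-negation equivalences over a finite domain.
domain = range(10)
Q = lambda x: x % 2 == 0          # an arbitrary predicate

# ¬∀x Q(x) is equivalent to ∃x ¬Q(x)
print((not all(Q(x) for x in domain)) == any(not Q(x) for x in domain))  # True
# ¬∃x Q(x) is equivalent to ∀x ¬Q(x)
print((not any(Q(x) for x in domain)) == all(not Q(x) for x in domain))  # True
```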

Deduction Rules (Rules of Inference): Things can be deduced from the given propositional logic using certain rules of inference and rules of replacement. Rules of inference are generally represented in the form [Premise/Conclusion] as:
Premise
Conclusion

Conjunction: Given a derivation, if the first step is proposition p and the second step is
proposition q, then it could be concluded as conjunction of p and q. Conjunction can also be
called as conjunction introduction or &-introduction or logical multiplication.
p/q could be written as p ∩ q

Simplification: A conjunction of p and q can be concluded as p or as q. Simplification can also be called conjunction elimination or &-elimination.
(p ∩ q)/p OR (p ∩ q)/q

Addition: If the derivation is proposition p, then the conclusion might be a disjunction of p and any proposition q. Addition can also be called disjunction introduction or ∨-introduction.
p/(p ∪ q) OR q/(p ∪ q)
Disjunctive Syllogism: If the first step of derivation is proposition p ∪ q and the next step is ¬p, then the conclusion is q.
(p ∪ q)/¬p/q

Modus Ponens: If the first step of derivation is proposition p → q and the next step is p, then the conclusion is q. Modus Ponens is also called modus ponendo ponens or detachment or a type of →-elimination.
(p → q)/p/q
Modus Tollens: If the first step of derivation is proposition p → q and the next step is ¬q, then the conclusion is ¬p. Modus Tollens is also called modus tollendo tollens or a type of →-elimination.
(p → q)/¬q/¬p

Hypothetical Syllogism: If the first step of derivation is proposition p → q and the next step is q → r, then the conclusion is p → r. Hypothetical syllogism is also called chain reasoning or chain deduction.
(p → q)/(q → r)/(p → r)

Constructive Dilemma: If the first step of derivation is proposition (p → r) & (q → s) and the next step is (p ∪ q), then the conclusion is (r ∪ s).
((p → r) & (q → s))/(p ∪ q)/(r ∪ s)
Another example of Constructive Dilemma: (p ∪ q)/(p → r)/(q → s)/(r ∪ s)

Destructive Dilemma: If the first step of derivation is proposition (¬p ∪ ¬q), the second step is (r → p), and the next step is (s → q), then the conclusion is (¬r ∪ ¬s).
(¬p ∪ ¬q)/(r → p)/(s → q)/(¬r ∪ ¬s)

Absorption: If the first step of derivation is proposition (p → q), then the conclusion is (p → (p & q)).
(p → q)/(p → (p & q))
Excluded Middle Introduction: During any step of the derivation, we could include a disjunction of a proposition p along with its negation. This inclusion has no effect on the derivation. This could be represented as (p ∪ ¬p).
n
Universal Specification: Given an assertion of the form ∀xP(x), we could conclude P(s), where s is any value of the variable x.
∀xP(x)/P(s)

Existential Specification: Given an assertion of the form ∃xP(x), we could conclude P(s),
where s is a constant and not a variable. ∃xP(x)/P(s)

Universal Generalization: Given a derivation of the form Q(x) where x is a free variable which could be represented by any value within a specific domain, then we could conclude the derivation as ∀xQ(x).
Existential Generalization: Given a derivation of the form Q(x) where x denotes a constant value, then we could conclude the derivation as ∃xQ(x).
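The premise/conclusion form of these rules can be rendered as small functions; the encoding below (pairs for implications, ('not', s) for negations, None for "cannot infer") is my own illustration, not the text's notation:

```python
# A few inference rules as functions over simple sentence encodings.
def modus_ponens(implication, fact):
    p, q = implication                      # implication p -> q as the pair (p, q)
    return q if fact == p else None

def modus_tollens(implication, negated):
    p, q = implication
    return ('not', p) if negated == ('not', q) else None

def disjunctive_syllogism(disjunction, negated):
    p, q = disjunction                      # the clause p OR q
    return q if negated == ('not', p) else None

print(modus_ponens(('raining', 'wet'), 'raining'))        # wet
print(modus_tollens(('raining', 'wet'), ('not', 'wet')))  # ('not', 'raining')
print(disjunctive_syllogism(('p', 'q'), ('not', 'p')))    # q
```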

Deduction Rules (Rules of Replacement): In any given derivation, there are ways by which we
could replace a given proposition with its equivalent. These substitutions are based on the rules
of replacement. Rules of replacement are different from rules of inference because here we
don't infer anything but instead just provide an equivalent using different symbols. Rules of
inferences can be applied to the derivation as a whole only but rules of replacement can be
applied on parts of the derivation. Rules of inferences are unidirectional but rules of
replacement are bidirectional.

The following table summarizes various properties that can be applied in predicate calculus. De Morgan's property in particular lets us replace disjunctions with conjunctions (and vice versa) without changing the actual meaning.
Property                         Meaning                             Meaning
Double Negation (¬-elimination)  ¬¬p ≡ p
Commutativity                    p ∨ q ≡ q ∨ p                       p ∧ q ≡ q ∧ p
Associativity                    (p ∨ q) ∨ r ≡ p ∨ (q ∨ r)           (p ∧ q) ∧ r ≡ p ∧ (q ∧ r)
Tautology                        p ≡ p ∨ p                           p ≡ p ∧ p
Distributivity                   p ∨ (q ∧ r) ≡ (p ∨ q) ∧ (p ∨ r)     p ∧ (q ∨ r) ≡ (p ∧ q) ∨ (p ∧ r)
Transposition (Contraposition)   p ⊃ q ≡ ¬q ⊃ ¬p
Exportation                      p ⊃ (q ⊃ r) ≡ (p ∧ q) ⊃ r
Idempotence                      p ∨ p ≡ p                           p ∧ p ≡ p
Identity                         p ∨ ¬p ≡ true                       p ∧ ¬p ≡ false
deMorgan                         ¬(p ∨ q) ≡ ¬p ∧ ¬q                  ¬(p ∧ q) ≡ ¬p ∨ ¬q
Implication                      p ⊃ q ≡ ¬p ∨ q
Quantification                   ¬∀xP(x) ≡ ∃x¬P(x)                   ¬∃xP(x) ≡ ∀x¬P(x)
Material Equivalence             p ⇔ q ≡ (p ⊃ q) ∧ (q ⊃ p)           p ⇔ q ≡ (p ∧ q) ∨ (¬p ∧ ¬q)
                                 p ⇔ q ≡ (¬p ∨ q) ∧ (¬q ∨ p)
Material equivalence can also be called "biconditional introduction/elimination" or "↔-introduction/elimination".
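Since the propositional variables range only over true and false, each replacement rule in the table can be verified by brute force over all truth assignments. A quick sketch (my own check, not part of the notes):

```python
from itertools import product

def implies(a, b):
    return (not a) or b

# Exhaustively verify a few replacement rules from the table above.
for p, q, r in product([True, False], repeat=3):
    assert (not (p or q)) == ((not p) and (not q))           # deMorgan
    assert (not (p and q)) == ((not p) or (not q))           # deMorgan
    assert implies(p, q) == implies(not q, not p)            # Transposition
    assert implies(p, implies(q, r)) == implies(p and q, r)  # Exportation
    assert (p == q) == ((p and q) or (not p and not q))      # Material Equivalence
print("all replacement rules verified")
```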

Conditional and Indirect Proof: Given any proposition, the deduction could be made directly
using the rules of inference or rules of replacement or just as a premise. If the deduction is
straightforward, then the deduction is called as direct deduction.

Conditional Proof: Given the premise and conclusion, we wish to prove the conclusion. Most conditionals are of the form p → q, where q is the conclusion that follows as an implication of the premise p. To prove that the conditional is true, we must prove that q follows from p. Let's check whether the conclusion follows from the premise, using some examples:
1. Given that (p → q) ∧ (q → r) is the premise and (p → r) is the conclusion.
The premise is generally represented with a right arrow preceding it.
|→ (p → q) ∧ (q → r) … (1)
Given the premise, using Simplification, we could select
p → q … (2)
Given p → q, assume the premise p; thus, using Modus Ponens, we could conclude q. … (3)
From equation (1), we could also use Simplification and select (q → r).
From (q → r) and using (3) as the premise, we could apply Modus Ponens and infer r. … (4)

Thus, we could infer p → r as the conclusion.

2. Let's assume that we are given statements of the form: a → (b ∨ c), a → ¬d, and d ⇔ b. We need to conclude that a → c.
The premise is generally represented with a right arrow preceding it.
|→ a → (b ∨ c) … (1)
Given the implication, a is the initial premise. … (2)
Using Material Equivalence, we could infer that:
d ⇔ b ≡ (d → b) ∧ (b → d) … (3)
Using Simplification, we could infer (b → d) from (3). … (4)
Using the initial premise a and (a → ¬d), we could use Modus Ponens to attain ¬d. … (5)
Using (4) and (5) and applying Modus Tollens, we could attain ¬b. … (6)
Using the initial premise a and (a → (b ∨ c)), we could use Modus Ponens to attain (b ∨ c). … (7)
Using (6) and (7) and applying Disjunctive Syllogism, we could infer c.
Thus, we could infer a → c, which means c (conclusion) follows from a (premise).

Indirect Proof: Indirect Proof is also called proof by reductio ad absurdum (reduce to the absurd). In indirect proof, we have to conclude that the basis of the premise is false. Initially, we assume that the premise is true, which is the opposite of what we have to conclude. Then we reach a contradiction: the conjunction of the premise p (or any other intermediate sentence q) and the negation of the premise ¬p (or of the intermediate sentence, ¬q).
Let's consider that the premise is p and an intermediate step contains q ∧ ¬q. As the intermediate step is achievable from the premise, using conditional proof, we could represent p → (q ∧ ¬q).
Using Transposition [p ⊃ q ≡ ¬q ⊃ ¬p], we can attain: ¬(q ∧ ¬q) → ¬p
Using DeMorgan's Law [¬(p ∧ q) ≡ ¬p ∨ ¬q], we can attain: (¬q ∨ q) → ¬p … (1)
(¬q ∨ q) is an excluded middle introduction. … (2)
Using (1) and (2) and applying Modus Ponens, we could conclude ¬p, which is the opposite of the premise p. As the negation of the premise is true, the premise is concluded to be false.

S.NO  RGPV QUESTION                                  YEAR       MARKS
Q.1   Convert the following sentences to             June.2012  10
      predicate logic:
      i) All Romans were either loyal to Caesar
         or hated him.
      ii) Everyone is loyal to someone.
      iii) Marcus tried to assassinate Caesar.
      iv) Caesar was a ruler.
Q.2   Convert the following sentences to             June.2011  10
      predicate logic:
      i) Marcus was a man.
      ii) All men are mortal.
      iii) No mortal lives longer than 150 years.
Q.3   Convert the following sentences to             June.2013  10
      predicate logic:
      i) Marcus was a man.
      ii) All Pompeians were Romans.
      iii) No mortal lives longer than 150 years.
      iv) 4 is an even number.
      v) All even numbers are divisible by 2.
http://iiscs.wssu.edu/drupal/node/3661


UNIT-02/LECTURE-04
Clausal Form: Clausal form is one in which there is the antecedent and the consequent.
Consequent depends upon the antecedent. Antecedent needs to be verified before the
consequent can be concluded. Existential quantifiers are not required and universal quantifiers
are implicit in the use of variables in atomic proposition. No operators other than conjunction
[∩] and disjunction [∪] are needed. We can represent the clausal form in any one of the

following ways:
b1 ∪ b2 ∪ … ∪ bn ⊂ a1 ∩ a2 ∩ … ∩ am
b1 ∪ b2 ∪ … ∪ bn ← a1 ∩ a2 ∩ … ∩ am
a1 ∩ a2 ∩ … ∩ am → b1 ∪ b2 ∪ … ∪ bn
means if all the A's are true, then at least one B is true. Here, it is obvious that all A's indicate
every A and thus it is universally quantified. If m = 0, then there is no antecedent to be tested
and thus the consequent is implied and called as a fact.
plays(saleem, cricket) ⊂ plays(saleem, game) ∩ game(cricket)
In this example, if saleem plays a game and the game is cricket, then it implies that saleem plays cricket. Here, the person name (saleem) and the game name (cricket) are fixed and thus existential quantifiers are not needed.
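One way to see this ground clause in action is to encode it as data over a small fact base; the representation below is my own sketch, not from the text:

```python
# The rule plays(saleem, cricket) <- plays(saleem, game) AND game(cricket),
# with every argument already a constant, so no quantifiers are involved.
facts = {("plays", "saleem", "game"), ("game", "cricket")}
head = ("plays", "saleem", "cricket")
body = [("plays", "saleem", "game"), ("game", "cricket")]

# The consequent (head) is derivable exactly when every antecedent
# literal in the body is a known fact.
if all(lit in facts for lit in body):
    facts.add(head)
print(head in facts)  # True
```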
Resolution and Unification: We can use propositions to discover new theorems from known axioms or assertions and other known theorems. Resolution is a principle of inference that allows inferred propositions to be computed from given propositions. As an example, if a ⊂ b and c ⊂ d and a is identical to d, then the resolution is c ⊂ b. Resolution is also called the control strategy of the logic programming language.
Few Examples:
older(a,b) ⊂ mother(a,b)
wiser(a,b) ⊂ older(a,b)
Resolution: wiser(a,b) ⊂ mother(a,b)
By stating that mothers are older and older people are wiser, we resolve that mothers are wiser.
The most common form of resolution is called as binary resolution. In binary resolution, one
clause contains a literal and the other contains the negation of the literal. In this case, a new
clause is produced by removing the literal and its negation and then making the disjunction of
the remaining elements of both the clauses.

Unit Resolution:  from P(x) ∨ Q(x) and ¬P(A), infer Q(A)
Resolution:       from P(x) ∨ Q(x) and ¬Q(x) ∨ R(x), infer P(x) ∨ R(x)

Resolution requires the given logical statements to be converted into Conjunctive Normal Form
(CNF) or clause form. A formula is said to be in CNF form if it is made up of conjunction of
clauses, where a clause is made up of disjunction of literals. This is required for resolution
because resolution acts on a pair of disjunctions to produce a new disjunction. The use of
conjunction of clauses is required because of the assumption that all clauses are assumed to be
true at the same time. In resolution, P(x) ∨ Q(x), ¬Q(x) ∨ R(x) can be converted to P(x) ∨
R(x) because among Q(x) and ¬Q(x), one of them should always be true and the other should always be false. Thus Q(x) ∨ ¬Q(x) is a tautology.
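At the propositional level, binary resolution is short to write down. In the sketch below (my own encoding, not the book's), a clause is a frozenset of literal strings and '~' marks negation:

```python
# Binary resolution over propositional clauses.
def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """All resolvents of the two clauses: drop a complementary pair and
    take the disjunction (set union) of what remains."""
    return [(c1 - {lit}) | (c2 - {negate(lit)})
            for lit in c1 if negate(lit) in c2]

r = resolve(frozenset({"P", "Q"}), frozenset({"~Q", "R"}))
print(r[0] == frozenset({"P", "R"}))   # True: P ∨ Q and ¬Q ∨ R give P ∨ R
```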

Clauses

A Clause is a disjunction of literals. A Ground Clause is a variable-free clause. A Horn Clause is a clause with at most one positive literal. A Definite Clause is a Horn Clause with exactly one positive literal.
Notice that implications are equivalent to Horn or Definite clauses:
(A IMPLIES B) is equivalent to ((NOT A) OR B)
(A AND B IMPLIES FALSE) is equivalent to ((NOT A) OR (NOT B)).

Formulae
A Formula is either:
• an atomic formula, or
• a Negation, i.e. the NOT of a formula, or
• a Conjunctive Formula, i.e. the AND of formulae, or
• a Disjunctive Formula, i.e. the OR of formulae, or
• an Implication, that is a formula of the form (formula1 IMPLIES formula2), or
• an Equivalence, that is a formula of the form (formula1 IFF formula2), or
• a Universally Quantified Formula, that is a formula of the form (ALL variable formula). We say that occurrences of variable are bound in formula [we should be more precise]. Or
• an Existentially Quantified Formula, that is a formula of the form (EXISTS variable formula). We say that occurrences of variable are bound in formula [we should be more precise].

An occurrence of a variable in a formula that is not bound, is said to be free. A formula where all
occurrences of variables are bound is called a closed formula, one where all variables are free is
called an open formula.

A formula that is a conjunction of clauses is said to be in Clausal Form. We shall see that there is a sense in which every formula is equivalent to a clausal form.

Often it is convenient to refer to terms and formulae with a single name. Form or Expression is
used to this end.

Substitutions

• Given a term s, the result [substitution instance] of substituting a term t in s for a variable x, s[t/x], is:
  o t, if s is the variable x
  o y, if s is the variable y different from x
  o F(s1[t/x] s2[t/x] .. sn[t/x]), if s is F(s1 s2 .. sn).
• Given a formula A, the result (substitution instance) of substituting a term t in A for a variable x, A[t/x], is:
  o FALSE, if A is FALSE,
  o P(t1[t/x] t2[t/x] .. tn[t/x]), if A is P(t1 t2 .. tn),
  o (B[t/x] AND C[t/x]) if A is (B AND C), and similarly for the other connectives,
  o (ALL x B) if A is (ALL x B), (similarly for EXISTS),
  o (ALL y B[t/x]), if A is (ALL y B) and y is different from x (similarly for EXISTS).
• The substitution [t/x] can be seen as a map from terms to terms and from formulae to formulae. We can similarly define [t1/x1 t2/x2 .. tn/xn], where t1 t2 .. tn are terms and x1 x2 .. xn are variables, as a map, the [simultaneous] substitution of x1 by t1, x2 by t2, .., of xn by tn. [If all the terms t1 .. tn are variables, the substitution is called an alphabetic variant, and if they are ground terms, it is called a ground substitution.] Note that a simultaneous substitution is not the same as a sequential substitution.

Unification

• Given two substitutions S = [t1/x1 .. tn/xn] and V = [u1/y1 .. um/ym], the composition of S and V, S . V, is the substitution obtained by:
  o applying V to t1 .. tn [the operation on substitutions with just this property is called concatenation], and
  o adding any pair uj/yj such that yj is not in {x1 .. xn}.
For example: [G(x y)/z].[A/x B/y C/w D/z] is [G(A B)/z A/x B/y C/w].
Composition is an operation that is associative and non-commutative.
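The composition rule can be mirrored in a few lines of Python; in this sketch (mine, not the text's), a substitution is a dict from variable names to terms and a compound term is a tuple whose first element is the function symbol:

```python
# Composition of substitutions S.V: apply V to S's terms, then add the
# pairs of V whose variables S does not already bind.
def apply_sub(term, sub):
    if isinstance(term, str):
        return sub.get(term, term)            # variable, or a name not in sub
    return (term[0],) + tuple(apply_sub(a, sub) for a in term[1:])

def compose(S, V):
    out = {x: apply_sub(t, V) for x, t in S.items()}
    out.update({y: u for y, u in V.items() if y not in S})
    return out

S = {"z": ("G", "x", "y")}                    # [G(x y)/z]
V = {"x": "A", "y": "B", "w": "C", "z": "D"}  # [A/x B/y C/w D/z]
print(compose(S, V))  # {'z': ('G', 'A', 'B'), 'x': 'A', 'y': 'B', 'w': 'C'}
```

This reproduces the worked example above: z maps to G(A B), the pair D/z from V is dropped because S already binds z, and the remaining pairs of V are carried over.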

• A set of forms f1 .. fn is unifiable iff there is a substitution S such that f1.S = f2.S = .. = fn.S. We then say that S is a unifier of the set. For example, {P(x F(y) B) P(x F(B) B)} is unified by [A/x B/y] and also unified by [B/y].

A Most General Unifier (MGU) of a set of forms f1 .. fn is a substitution S that unifies this set
and such that for any other substitution T that unifies the set there is a substitution V such that
S.V = T. The result of applying the MGU to the forms is called a Most General Instance (MGI).
Here are some examples:

FORMULAE                  MGU             MGI
------------------------------------------------------------
(P x), (P A)              [A/x]           (P A)
------------------------------------------------------------
(P (F x) y (G y)),        [x/y x/z]       (P (F x) x (G x))
(P (F x) z (G x))
------------------------------------------------------------
(F x (G y)),              [(G u)/x y/z]   (F (G u) (G y))
(F (G u) (G z))
------------------------------------------------------------
(F x (G y)),              Not Unifiable
(F (G u) (H z))
------------------------------------------------------------
(F x (G x) x),            Not Unifiable
(F (G u) (G (G z)) z)
------------------------------------------------------------
This last example is interesting: we first find that (G u) should replace x, then that (G z) should replace x; finally that x and z are equivalent. So we need x->(G z) and x->z to be both true. This would be possible only if z and (G z) were equivalent. That cannot happen for a finite term. To recognize cases such as this that do not allow unification [we cannot replace z by (G z) since z occurs in (G z)], we need what is called an Occur Test. Most Prolog implementations use unification extensively but do not do the occur test for efficiency reasons.

The determination of Most General Unifier is done by the Unification Algorithm. Here is the
pseudo code for it:

FUNCTION Unify WITH PARAMETERS form1, form2, and assign RETURNS MGU, where form1
and form2 are the forms that we want to unify, and assign is initially nil.

1. Use the Find-Difference function described below to determine the first elements where
form1 and form2 differ and one of the elements is a variable. Call difference-set the
value returned by Find-Difference. This value will be either the atom Fail, if the two
forms cannot be unified; or null, if the two forms are identical; or a pair of the form (Variable
Expression).

2. If Find-Difference returned the atom Fail, Unify also returns Fail and we cannot unify the
two forms.

3. If Find-Difference returned nil, then Unify will return assign as MGU.

4. Otherwise, we replace each occurrence of Variable by Expression in form1 and form2; we compose the given assignment assign with the assignment that maps Variable into Expression, and we repeat the process for the new form1, form2, and assign.

a
FUNCTION Find-Difference WITH PARAMETERS form1 and form2 RETURNS pair, where form1 and form2 are e-expressions.

1. If form1 and form2 are the same variable, return nil.

2. Otherwise, if either form1 or form2 is a variable, and it does not appear anywhere in the other form, then return the pair (Variable Other-Form), otherwise return Fail.
3. Otherwise, if either form1 or form2 is an atom then if they are the same atom then return
nil otherwise return Fail.
4. Otherwise both form1 and form2 are lists.
Apply the Find-Difference function to corresponding elements of the two lists until either a
call returns a non-null value or the two lists are simultaneously exhausted, or some
elements are left over in one list.
In the first case, that non-null value is returned; in the second, nil is returned; in the third,
Fail is returned
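A compact Python rendering of the same idea, including the occur test discussed above, might look as follows (this is my sketch rather than a transcription of the pseudocode; variables are marked with a leading '?'):

```python
# Unification with an occur test; a substitution is a dict, and terms are
# strings (variables start with '?') or tuples for compound forms.
def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def substitute(t, s):
    if is_var(t):
        return substitute(s[t], s) if t in s else t
    if isinstance(t, tuple):
        return tuple(substitute(a, s) for a in t)
    return t

def occurs(v, t, s):
    t = substitute(t, s)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, s) for a in t)

def unify(t1, t2, s=None):
    s = {} if s is None else s
    t1, t2 = substitute(t1, s), substitute(t2, s)
    if t1 == t2:
        return s
    if is_var(t1):
        return None if occurs(t1, t2, s) else {**s, t1: t2}   # occur test
    if is_var(t2):
        return unify(t2, t1, s)
    if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == len(t2):
        for a, b in zip(t1, t2):
            s = unify(a, b, s)
            if s is None:
                return None                                    # Fail
        return s
    return None                                                # Fail

print(unify(("P", "?x"), ("P", "A")))   # {'?x': 'A'}, i.e. the MGU [A/x]
print(unify("?x", ("G", "?x")))         # None: rejected by the occur test
```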

S.NO  RGPV QUESTION                                            YEAR       MARKS
Q.1   Consider the following facts:                            June,2006  20
      i) The members of the friends club are suresh,
         megha, niket and shruti.
      ii) Suresh is married to megha.
      iii) Niket is shruti's brother.
      iv) The spouse of every married person in the club
          is also in the club.
      v) The last meeting of the club was at suresh's house.
      a) Represent these facts in predicate logic.
      b) Convert it into clause form.
      c) From the facts given above, most people would be
         able to decide on the truth of the following
         additional statements:
         vi) The last meeting of the club was at megha's house.
         vii) Shruti is not married.
      Can you construct resolution proofs to demonstrate
      the truth of each of these statements given the facts
      listed above? Do so if possible; otherwise give the reason.
Q.2   Explain Unification algorithm.                           June,2006  10
Q.3   Convert the following to clause form:                    June,2013  10
      ∀X∃yP(x,y) -> ∃Q(y, x)

UNIT-02/LECTURE-05
Resolution Refutation: Resolution refutation is used to prove a theorem by negating the goal
that should be proved. This negated goal should be added to the actual set of true axioms that
are present in the given statements. After adding the negated goal, the rules of inferences are
used to resolve that this logic leads to a contradiction. If the negated goal contradicts the given
set of axioms, then the original goal is said to be consistent with the set of axioms.

Steps involved in resolution refutation proofs are:


1. Convert the given premises or axioms into clause form.
2. Add the negation of the goal to the set of given axioms.
3. Resolve them and produce a contradiction. Contradiction could be shown by ending up with an empty clause. Whenever there is a contradiction in the clauses, an empty clause can always be generated, and thus resolution is said to be refutation complete.
Let's consider a simple example.
Given axioms: (1) Mango is a Tree. (2) All Trees are Plants. (3) All Plants need water.
Goal: Mango tree needs water.
1. Convert the premises into predicates.
   a. tree(mango)
   b. ∀x (tree(x) → plants(x))
   c. ∀x (plants(x) → water(x))
2. The goal required is:
   water(mango)
3. Using Modus Ponens on Step 1 (a and b), we get
   plants(mango)
4. Using Modus Ponens on Step 3 and 1 (c), we get
   water(mango)
5. Now, convert the predicates into clause form:

   Predicate                    Clause Form
   tree(mango)                  tree(mango)
   ∀x (tree(x) → plants(x))     ¬tree(x) ∨ plants(x)
   ∀x (plants(x) → water(x))    ¬plants(x) ∨ water(x)
   Negate the conclusion water(mango):  ¬water(mango)
6. Combining (¬tree(x) ∨ plants(x)) and (¬plants(x) ∨ water(x)), and using resolution, we get (¬tree(x) ∨ water(x))


7. Combining tree(mango) and (¬tree(x) ∨ water(x)), we get
water(mango)
8. Considering water(mango) and ¬water(mango), we get an empty clause.
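After substituting mango for x, the whole refutation reduces to propositional resolution and can be mechanized; the saturation loop below is an illustrative sketch with names of my choosing:

```python
from itertools import combinations

def negate(l):
    return l[1:] if l.startswith("~") else "~" + l

def resolvents(c1, c2):
    return [(c1 - {l}) | (c2 - {negate(l)}) for l in c1 if negate(l) in c2]

def refutes(clauses):
    """Saturate with binary resolution; True iff the empty clause appears."""
    clauses = set(clauses)
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for r in resolvents(c1, c2):
                if not r:
                    return True        # empty clause: contradiction found
                new.add(r)
        if new <= clauses:
            return False               # saturated without a refutation
        clauses |= new

# tree(mango), ¬tree ∨ plants, ¬plants ∨ water, plus the negated goal.
clauses = [frozenset({"tree"}), frozenset({"~tree", "plants"}),
           frozenset({"~plants", "water"}), frozenset({"~water"})]
print(refutes(clauses))  # True: the negated goal is contradicted
```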

Strategies for Resolution: A strategy for resolution defines the way in which various clauses are considered for resolution. There are many strategies for the resolution process. A strategy is called a complete strategy if by using the refutation-complete rule we are guaranteed to find a refutation provided the set of clauses given is unsatisfiable. Refutation-complete means that we could attain unsatisfiability by using the inference rules alone, provided the set of clauses given is unsatisfiable. A set of clauses is said to be unsatisfiable if there is no interpretation available that could make the set satisfiable. Any clause (C1) is called an ancestor of another clause (C2) only if C2 is a resolvent (result of resolution) of C1 and a different clause, or C2 is a resolvent of a descendant of C1 and a different clause.

• Breadth-First Strategy: Breadth-First strategy is of the complete strategy type. Here, every clause is compared for resolution with every other clause provided, to produce the first level of clauses. The clauses in the first level are then compared for resolution with the original clauses, to produce the second level of clauses. Thus, the clauses at the nth level are generated by resolving the (n-1)th level clause elements with the original clauses as well as all the previous levels of clauses. Breadth-First
strategy guarantees a shortest solution path as every level is exhausted before
moving to the next level. If the process is continued for many levels, it is guaranteed
to find a refutation if one exists and thus this strategy is said to be complete.
Breadth-First strategy is better for a small set of clauses.
Given the clauses: tree(mango), ¬tree(x) ∨ plants(x), ¬plants(x) ∨ water(x)
Negation of the goal: ¬water(mango)

Level 0:  tree(mango)    ¬tree(x) ∨ plants(x)    ¬plants(x) ∨ water(x)    ¬water(mango)
LEVEL 1:  plants(mango)    ¬tree(x) ∨ water(x)    ¬plants(mango)
LEVEL 2:  water(mango)    water(mango)    ¬tree(mango)    ¬tree(mango)    water(mango)
LEVEL 3:  EMPTY (the 3rd level is stopped as the Empty Clause is found)

• Unit Preference Strategy: Here, the resultant clause produced is preferred to have
fewer literals than the parent clause. Also, it is preferred to do resolution with unit
clauses. Thus, unit clauses are used for resolution whenever they are available and
this causes the resolution refutation to be restricted.
Given the clauses: tree(mango), ¬tree(x) ∨ plants(x), ¬plants(x) ∨ water(x)
Negation of the goal: ¬water(mango)

tree(mango)    ¬tree(x) ∨ plants(x)
        {mango/x}
plants(mango)    ¬plants(x) ∨ water(x)
        {mango/x}
¬water(mango)    water(mango)

EMPTY


• Set of Support Strategy: This strategy is said to be good for a large set of clauses.
Given a set of input clauses (S), we can select a subset (T) of (S), which is called as
the set of support. The given set of clauses is divided into two partitions: (1) set of
support and (2) auxiliary set.
Make sure that the auxiliary set is not contradictory. Generally, the given set of clauses is not
contradictory and the contradiction arises only with the addition of the negation of the goal. So,
auxiliary set is generally made up of all the given axioms excluding the negation of the goal.
The set of support must consist of clauses that form the negated theorem or descendants of the negated theorem. The set of support is generally started just with the negation of the goal.
Using the auxiliary set alone, it is not possible to derive contradictions, and thus one should take at least one clause from the set of support. The strategy requires that one of the resolvents in each resolution must be from the set of support, i.e., it should either come from the negation of the goal or have an ancestor in the set of support.
Given the clauses as: tree(mango), ¬tree(x) ∨ plants(x), ¬plants(x) ∨ water(x)
Negation of the goal: ¬water(mango)
Set of support = Negation of the goal = ¬water(mango)
All the other clauses are considered to be in the auxiliary set.
During the first iteration, ¬plants(mango) is obtained as the resolvent. This resolvent is not coming from the negation of the goal, but one of its ancestors is from the set of support. Its ancestors are ¬water(mango) and ¬plants(x) ∨ water(x). Now, we add the resolvent to the set of support and thus the set of support becomes ¬water(mango), ¬plants(mango). Now, we apply the new member of the set of support with the original auxiliary set. The new resolvent is added to the set of support and the process continues until an empty resolvent is obtained.

Auxiliary Set: tree(mango), ¬tree(x) ∨ plants(x), ¬plants(x) ∨ water(x)
Set of Support: ¬water(mango)

¬water(mango)    tree(mango)    ¬tree(x) ∨ plants(x)    ¬plants(x) ∨ water(x)
        ¬plants(mango)

Set of Support: ¬water(mango), ¬plants(mango)

¬plants(mango)    tree(mango)    ¬tree(x) ∨ plants(x)    ¬plants(x) ∨ water(x)
        ¬tree(mango)
Set of Support: ¬water(mango), ¬plants(mango), ¬tree(mango)

¬tree(mango) tree(mango) ¬tree(x) ∨ plants(x) ¬plants(x) ∨ water(x)

EMPTY

• Linear Input Form Strategy: Here the negated goal is taken first and resolved with
any one of the original axioms but not the goal. The resultant clause is then resolved
with any one of the axioms to get another new clause, which is then resolved with any one of the axioms. This process continues until an empty clause is obtained.
Given the clauses as: tree(mango), ¬tree(x) ∨ plants(x), ¬plants(x) ∨
water(x)
Negation of the goal: ¬water(mango)

¬water(mango) ¬plants(x) ∨ water(x)

{mango/x}

¬plants(mango) ¬tree(x) ∨ plants(x)

{mango/x}

tree(mango) ¬tree(mango)

EMPTY

It is also noted that Linear Input Strategy allows those resolutions where one of the clauses is a member of the original set of axioms and/or the negated theorem. Linear Input Strategy is not refutation complete.

• Ancestry Filtering Strategy: The condition in this strategy is that at least one member of the clauses being resolved either is a member of the original set of clauses or is an ancestor of the other clause being resolved.

S.NO  RGPV QUESTION       YEAR       MARKS
Q.1   Write short note:-  June.2014  10
      i) Resolution
      ii) Refutation
      iii) Inferencing

UNIT-02/LECTURE-06

RESOLUTION IN PREDICATE LOGIC

Two literals are contradictory if one can be unified with the negation of the other. For example, man(x) and ¬man(Himalayas) are contradictory, since man(x) and man(Himalayas) can be unified. In predicate logic, the unification algorithm is used to locate pairs of literals that cancel out. It is important that if two instances of the same variable occur, they must be given identical substitutions. The resolution algorithm for predicate logic is as follows.

Let F be a set of given statements and S a statement to be proved.

1. Convert all the statements of F to clause form.

2. Negate S and convert the result to clause form. Add it to the set of clauses obtained in 1.

3. Repeat until either a contradiction is found or no progress can be made or a predetermined
amount of effort has been expended.
a
a) Select two clauses. Call them parent clauses.
m
a
b) Resolve them together. The resolvent will be the disjunction of all of these literals of both
n
clauses. If there is a pair of literals T1 and T2 such that one parent clause contains Ti and the
y
other contains T2 and if T1 and T2 are unifiable, then neither t1 nor T2 should appear in the
d
resolvent. Here Ti and T2 are called complimentary literals.
u
t
© If the resolvent is the empty clause , then a contradiction has been found. If it is not, then

S
add it to the set of clauses available to the procedure.

Using resolution to produce proof is illustrated in the following statements.

1. Marcus was a man.

2. Marcus was a Pompeian.

3. All Pompeians were Romans.

4. Caesar was a ruler

5. All Romans were either loyal to Caesar or hated him.

6. Everyone is loyal to someone.

7. People only try to assassinate rulers they are not loyal to.

8. Marcus tried to assassinate Caesar.


Compare different knowledge representation languages

Resolution Refutation Proofs


The basic steps
Convert the set of rules and facts into clause form (conjunction of clauses)
Insert the negation of the goal as another clause
Use resolution to deduce a refutation

If a refutation is obtained, then the goal can be deduced from the set of facts and rules.

Conversion to Normal Form

A formula is said to be in clause form if it is of the form:
∀x1 ∀x2 … ∀xn [C1 ∧ C2 ∧ … ∧ Ck]
All first-order logic formulas can be converted to clause form.
We shall demonstrate the conversion on the formula:
∀x {p(x) ⇒ ∃z { ¬∀y [q(x,y) ⇒ p(f(x1))] ∧ ∀y [q(x,y) ⇒ p(x)] }}

Step 1: Take the existential closure and eliminate redundant quantifiers. This introduces ∃x1 and eliminates the redundant ∃z:
∃x1 ∀x {p(x) ⇒ { ¬∀y [q(x,y) ⇒ p(f(x1))] ∧ ∀y [q(x,y) ⇒ p(x)] }}

Step 2: Rename any variable that is quantified more than once. y has been quantified twice, so:
∃x1 ∀x {p(x) ⇒ { ¬∀y [q(x,y) ⇒ p(f(x1))] ∧ ∀z [q(x,z) ⇒ p(x)] }}

Step 3: Eliminate implication.
∃x1 ∀x {¬p(x) ∨ { ¬∀y [¬q(x,y) ∨ p(f(x1))] ∧ ∀z [¬q(x,z) ∨ p(x)] }}

Step 4: Move ¬ all the way inwards.
∃x1 ∀x {¬p(x) ∨ { ∃y [q(x,y) ∧ ¬p(f(x1))] ∧ ∀z [¬q(x,z) ∨ p(x)] }}

Step 5: Push the quantifiers to the right.
∃x1 ∀x {¬p(x) ∨ { [∃y q(x,y) ∧ ¬p(f(x1))] ∧ [∀z ¬q(x,z) ∨ p(x)] }}

Step 6: Eliminate existential quantifiers (Skolemization).
Pick out the leftmost ∃y B(y) and replace it by B(f(xi1, xi2, …, xin)), where:
a) xi1, xi2, …, xin are all the distinct free variables of ∃y B(y) that are universally quantified to the left of ∃y B(y), and
b) f is any n-ary function constant which does not occur already.
Skolemization (x1 has no universal quantifier to its left, so it becomes the constant a; y becomes g(x)):
∀x {¬p(x) ∨ { [q(x,g(x)) ∧ ¬p(f(a))] ∧ [∀z ¬q(x,z) ∨ p(x)] }}

Step 7: Move all universal quantifiers to the left.
∀x ∀z {¬p(x) ∨ { [q(x,g(x)) ∧ ¬p(f(a))] ∧ [¬q(x,z) ∨ p(x)] }}

Step 8: Distribute ∧ over ∨.
∀x ∀z {[¬p(x) ∨ q(x,g(x))] ∧ [¬p(x) ∨ ¬p(f(a))] ∧ [¬p(x) ∨ ¬q(x,z) ∨ p(x)] }

Step 9: (Optional) Simplify.
∀x {[¬p(x) ∨ q(x,g(x))] ∧ ¬p(f(a)) }

Resolution

If Unify(zj, ¬qk) = θ, then:

z1 ∨ … ∨ zm ,   q1 ∨ … ∨ qn
----------------------------------------------------
SUBST(θ, z1 ∨ … ∨ zj−1 ∨ zj+1 ∨ … ∨ zm ∨ q1 ∨ … ∨ qk−1 ∨ qk+1 ∨ … ∨ qn)

S.NO   RGPV QUESTION                                  YEAR        MARKS
Q.1    Convert the following sentences to             June.2012   10
       predicate logic:
       1. Marcus was a man.
       2. Marcus was a Pompeian.
       3. All Pompeians were Romans.
       4. Caesar was a ruler.
       5. All Romans were either loyal to Caesar or hated him.
       6. Everyone is loyal to someone.
       7. People only try to assassinate rulers they are not loyal to.
       8. Marcus tried to assassinate Caesar.

UNIT-02/LECTURE-07
forward chaining
Using a deduction to reach a conclusion from a set of antecedents is called forward chaining. In other words, the system starts from a set of facts and a set of rules, and tries to find a way of using these rules and facts to deduce a conclusion or come up with a suitable course of action. This is known as data-driven reasoning.

Fig.: The proof tree generated by forward chaining.
Example knowledge base
• The law says that it is a crime for an American to sell weapons to hostile nations. The country Nono, an enemy of America, has some missiles, and all of its missiles were sold to it by Colonel West, who is American.
• Prove that Col. West is a criminal.

... it is a crime for an American to sell weapons to hostile nations:
American(x) ∧ Weapon(y) ∧ Sells(x,y,z) ∧ Hostile(z) ⇒ Criminal(x)
Nono … has some missiles, i.e., ∃x Owns(Nono,x) ∧ Missile(x):
Owns(Nono,M1) and Missile(M1)
… all of its missiles were sold to it by Colonel West:
Missile(x) ∧ Owns(Nono,x) ⇒ Sells(West,x,Nono)
Missiles are weapons:
Missile(x) ⇒ Weapon(x)
An enemy of America counts as "hostile":
Enemy(x,America) ⇒ Hostile(x)
West, who is American …
American(West)
The country Nono, an enemy of America …
Enemy(Nono,America)

Note:
(a) The initial facts appear in the bottom level.
(b) Facts inferred on the first iteration appear in the middle level.
(c) Facts inferred on the second iteration appear at the top level.
Forward chaining algorithm

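As an illustration, here is a minimal data-driven (forward-chaining) sketch in Python for the Col. West example above. The tuple encoding of atoms, the "?"-prefix convention for variables, and the helper names (match, substitute, forward_chain) are my own illustrative choices, not the textbook's algorithm verbatim.

```python
facts = {
    ("Owns", "Nono", "M1"),
    ("Missile", "M1"),
    ("American", "West"),
    ("Enemy", "Nono", "America"),
}

# Each rule is (list of premise patterns, conclusion pattern);
# strings beginning with "?" are variables.
rules = [
    ([("Missile", "?x"), ("Owns", "Nono", "?x")],
     ("Sells", "West", "?x", "Nono")),
    ([("Missile", "?x")], ("Weapon", "?x")),
    ([("Enemy", "?x", "America")], ("Hostile", "?x")),
    ([("American", "?x"), ("Weapon", "?y"),
      ("Sells", "?x", "?y", "?z"), ("Hostile", "?z")],
     ("Criminal", "?x")),
]

def match(pattern, fact, bindings):
    """Match one pattern against one ground fact, extending bindings."""
    if len(pattern) != len(fact):
        return None
    b = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if b.get(p, f) != f:
                return None
            b[p] = f
        elif p != f:
            return None
    return b

def substitute(pattern, bindings):
    """Apply variable bindings to a pattern, producing a ground atom."""
    return tuple(bindings.get(t, t) for t in pattern)

def forward_chain(facts, rules):
    """Apply rules to facts until no new facts appear (data-driven)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            candidates = [dict()]       # binding sets satisfying premises so far
            for prem in premises:
                candidates = [nb for b in candidates for fact in facts
                              for nb in [match(prem, fact, b)] if nb is not None]
            for b in candidates:
                new_fact = substitute(conclusion, b)
                if new_fact not in facts:
                    facts.add(new_fact)
                    changed = True
    return facts

derived = forward_chain(facts, rules)
print(("Criminal", "West") in derived)  # True
```

The first iteration derives Sells(West,M1,Nono), Weapon(M1) and Hostile(Nono) (the middle level of the proof tree), and the second derives Criminal(West) (the top level), mirroring the note above.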
backward chaining
Forward chaining applies a set of rules and facts to deduce whatever conclusions can be derived.
In backward chaining, we start from a conclusion, which is the hypothesis we wish to prove, and we aim to show how that conclusion can be reached from the rules and facts in the database. The conclusion we are aiming to prove is called a goal, and reasoning in this way is known as goal-driven reasoning.
Backward chaining example

Fig.: Proof tree constructed by backward chaining to prove that West is a criminal.
Note:
(a) To prove Criminal(West), we have to prove the four conjuncts below it.
(b) Some of these are in the knowledge base, and others require further backward chaining.
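The goal-driven direction can also be sketched in code. Below is a small backward-chaining sketch for the same example: the prover starts from the goal Criminal(?who) and works back through rule heads to the facts. The unification helpers (is_var, walk, unify, rename, prove) are illustrative names of my own; rule variables are renamed on each use so bindings don't clash, and there are no function terms, so no occurs-check is needed.

```python
from itertools import count

facts = {
    ("Owns", "Nono", "M1"),
    ("Missile", "M1"),
    ("American", "West"),
    ("Enemy", "Nono", "America"),
}
rules = [
    ([("Missile", "?x"), ("Owns", "Nono", "?x")],
     ("Sells", "West", "?x", "Nono")),
    ([("Missile", "?x")], ("Weapon", "?x")),
    ([("Enemy", "?x", "America")], ("Hostile", "?x")),
    ([("American", "?x"), ("Weapon", "?y"),
      ("Sells", "?x", "?y", "?z"), ("Hostile", "?z")],
     ("Criminal", "?x")),
]

_fresh = count()

def is_var(t):
    return t.startswith("?")

def walk(t, s):
    """Follow variable bindings in substitution s to a final value."""
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    """Unify two atoms (tuples of constants/variables) under s."""
    if len(a) != len(b):
        return None
    s = dict(s)
    for x, y in zip(a, b):
        x, y = walk(x, s), walk(y, s)
        if x == y:
            continue
        if is_var(x):
            s[x] = y
        elif is_var(y):
            s[y] = x
        else:
            return None
    return s

def rename(rule):
    """Give a rule fresh variable names so separate uses don't clash."""
    premises, conclusion = rule
    tag = "_%d" % next(_fresh)
    r = lambda atom: tuple(t + tag if is_var(t) else t for t in atom)
    return [r(p) for p in premises], r(conclusion)

def prove(goals, s):
    """Yield substitutions under which all goals follow (goal-driven)."""
    if not goals:
        yield s
        return
    goal, rest = goals[0], goals[1:]
    for fact in facts:                       # the goal matches a known fact
        s2 = unify(goal, fact, s)
        if s2 is not None:
            yield from prove(rest, s2)
    for premises, conclusion in map(rename, rules):
        s2 = unify(goal, conclusion, s)      # the goal matches a rule head:
        if s2 is not None:                   # prove the rule's premises next
            yield from prove(premises + rest, s2)

s = next(prove([("Criminal", "?who")], {}))
print(walk("?who", s))  # West
```

Note how the conjuncts of the Criminal rule become subgoals, exactly as in the proof tree: some close directly against the knowledge base, others require further backward chaining.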
Every sentence of first-order logic can be converted into an inferentially equivalent CNF sentence. In particular, the CNF sentence will be unsatisfiable just when the original sentence is unsatisfiable, so we have a basis for doing proofs by contradiction on the CNF sentences. Here we have to eliminate existential quantifiers.
S.NO RGPV QUESTION YEAR MARKS
Q.1 June.2011 10
Q.2 Dec.2012 10
Q.3 June.2011 10

UNIT-02/LECTURE-08
Deduction
An Inference Rule is a rule for obtaining a new formula [the consequence] from a set of given
formulae [the premises].
A most famous inference rule is Modus Ponens:
{A, NOT A OR B}
---------------
B
For example:
{Sam is tall,
if Sam is tall then Sam is unhappy}
------------------------------------
Sam is unhappy

a
When we introduce inference rules we want them to be sound, that is, we want the consequence of the rule to be a logical consequence of the premises of the rule. Modus Ponens is sound. But the following rule, called Abduction, is not:
{B, NOT A OR B}
--------------
A
Theorem Proving:
The aim of theorem proving is to prove a theorem stated as a proposition; this can be done by negating it and finding an inconsistency. Theorem proving has been a basis for logic programming. Most propositions can be represented in Horn clause format. When propositions are used for resolution, only a restricted form (headless Horn form) can be used.

Horn Clause: A Horn clause helps in representing procedures as specifications, which assists in automatic deduction of queries of interest. A Horn clause is either headed or headless. The headed form can have multiple propositions on the right-hand side (called the antecedent) but only one atomic proposition on the left-hand side (called the consequent). The headless form has no left side and is used to state facts. Headless Horn clauses are called facts (or logical statements),
while headed Horn clauses are called rules.


Let's consider an example of sorting a list: Here, we specify the characteristics of a sorted list
but not the actual process of rearranging a list.
sort(old_list, new_list) ⊂ permutation(old_list, new_list) ∩ sorted(new_list)
sorted(list) ⊂ ∀x such that 1 ≤ x < n, list(x) ≤ list(x+1)
Then, we need to define the process of permutation:
permutation(old_list, new_list) ⊂ …
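The specification above can be made concrete with a small Python sketch: we state what a sorted list is (a sorted permutation of the input) and find one by generate-and-test, never writing the rearranging process itself. The helper names (is_sorted, sort_spec) are my own.

```python
from itertools import permutations

def is_sorted(lst):
    """The sorted(list) specification: list(x) <= list(x+1) for 1 <= x < n."""
    return all(lst[i] <= lst[i + 1] for i in range(len(lst) - 1))

def sort_spec(old_list):
    """Find a sorted permutation purely by specification (generate-and-test)."""
    for perm in permutations(old_list):
        if is_sorted(perm):
            return list(perm)

print(sort_spec([3, 1, 2]))  # [1, 2, 3]
```

Generate-and-test over all permutations is factorially slow, of course; the point is that the specification alone suffices to define the result, which is exactly the declarative style of logic programming.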
Structures are represented using atomic propositions. A predicate with zero or more arguments is written in functional notation as functor(parameters). A fact is a term that is followed by a period and is similar to a Horn clause without a right-hand side. Here are some headless Horn clause (fact) examples, which have no conditions or right-hand part.
student(noraini). % noraini is a student.
brother(arif, amri). % arif is the brother of amri.
So, after specifying the facts above, if you generate the query brother(arif,X), the logical language responds X = amri. But if the query is brother(amri,X), the logical language responds No.
Rule Statements: A rule is a term that is followed by :- (meaning "if") and a series of terms separated by commas, ending with a period ".". The presence of the period indicates to the compiler or interpreter that the rule ends there. A headed Horn clause (rule) has a head h, which is a predicate, and a body, which is a list of predicates p1, p2, … pn, written in any one of the following formats:
h ← p1, p2, … pn
h ⇐ p1, p2, … pn
h :- p1, p2, … pn
This means that h is true only if p1, p2, … and pn are simultaneously true. The right-side part is called the antecedent (if part) and it may be a single term or a conjunction of terms. The left side is called the consequent (then part) and it must be one term only.

To make the logical language answer the brother(amri,X) query as X = arif, we need to add a rule as below:
brother(X, Y) :- brother(Y,X), !. % Please note the period at the end.
Legal goals or queries are made up of conjunctive propositions and/or propositions with
variables. For example:
brother(amri, X). % X is the variable to query for.

Theorem proving is a proposition represented in headless Horn format, and for this we want the system to either prove or disprove it.
student(hafiz). % Verify whether hafiz is a student or not.

Conjunction: Conjunction is the process of joining multiple terms with logical AND operations. Logical AND is represented using a comma or ∧. Implication is represented using :- or ⊃. As an example, a person who can read as well as write is called literate. Here, read and write are the conditions that must be true to imply that literate is true. By default, it is assumed that all the variables used are universally quantified. This is represented as:
literate(C) :- read(C), write(C) % if read & write are true, then literate is true.
is equivalent to read(C) ∧ write(C) ⊃ literate(C)

S.NO   RGPV QUESTION   YEAR                              MARKS
Q.1                    Dec.2010, Dec.2011, Dec.2012      5
Q.2                    Dec.2009                          10

UNIT-02/LECTURE-09
What is reasoning?
When we require any knowledge system to do something it has not been explicitly told how to do, it must reason.
The system must figure out what it needs to know from what it already knows.
We have already seen simple examples of reasoning or drawing inferences. For example, if we know: Robins are birds.
All birds have wings. Then if we ask: Do robins have wings?

How can we reason?

To a certain extent this will depend on the knowledge representation chosen, although a good knowledge representation scheme has to allow easy, natural and plausible reasoning. Listed below are very broad methods of how we may reason.

Formal reasoning
-- Basic rules of inference with logic knowledge representations.
Procedural reasoning
-- Uses procedures that specify how to perhaps solve (sub) problems.
Reasoning by analogy
-- Humans are good at this, more difficult for AI systems. E.g. if we are asked Can robins fly?, the system might reason that robins are like sparrows and it knows sparrows can fly so ...
Generalisation and abstraction
-- Again humans are effective at this. This is basically getting towards learning and understanding methods.
Meta-level reasoning
-- Once again uses knowledge about what you know and perhaps ordering it in some kind of importance.

Uncertain Reasoning?

Unfortunately the world is an uncertain place.


Any AI system that seeks to model and reason in such a world must be able to deal with this.
In particular it must be able to deal with:


• Incompleteness -- compensate for lack of knowledge.
• Inconsistencies -- resolve ambiguities and contradictions.
• Change -- it must be able to update its world knowledge base over time.
Clearly, in order to deal with this, some decisions that are made are more likely to be true (or false) than others, and we must introduce methods that can cope with this uncertainty.
There are three basic methods that can do this:

• Symbolic methods.
• Statistical methods.
• Fuzzy logic methods.

Non-Monotonic Reasoning

Predicate logic and the inferences we perform on it are an example of monotonic reasoning.
In monotonic reasoning, if we enlarge a set of axioms we cannot retract any existing assertions or axioms.
Humans do not adhere to this monotonic structure when reasoning:
• we need to jump to conclusions in order to plan and, more basically, survive.
• we cannot anticipate all possible outcomes of our plan.
• we must make assumptions about things we do not specifically know about.

Traditional systems based on predicate logic are monotonic. Here the number of statements known to be true increases with time. New statements are added and new theorems are proved, but the previously known statements never become invalid.

In monotonic systems there is no need to check for inconsistencies between new statements and the old knowledge. When a proof is made, the basis of the proof need not be remembered, since the old statements never disappear. But monotonic systems are not good in real problem domains where the information is incomplete, situations change and new assumptions are generated while solving new problems.

Non-monotonic reasoning is based on default reasoning or "most probabilistic choice": S is assumed to be true as long as there is no evidence to the contrary. For example, when we visit a friend's home, we buy biscuits for the children because we believe that most children like biscuits. In this case we do not have information to the contrary. A computational description of default reasoning must relate the lack of information on X to a conclusion on Y.

Default reasoning (or most probabilistic choice) is defined as follows:

Definition 1: If X is not known, then conclude Y.

Definition 2: If X cannot be proved, then conclude Y.

Definition 3: If X cannot be proved in some allocated amount of time, then conclude Y.
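Definition 2 can be sketched as negation-as-failure in a few lines of Python, using the biscuits example: conclude Y unless the contrary X is provable. The trivial prover and the function names (provable, default_conclude) are my own illustrative choices; a real system would use a full inference procedure.

```python
facts = {"child(ali)"}

def provable(statement):
    """Trivial prover: only explicitly stored facts count as provable."""
    return statement in facts

def default_conclude(x_statement, y_statement):
    """Definition 2: if X cannot be proved, conclude Y; otherwise conclude nothing."""
    return None if provable(x_statement) else y_statement

print(default_conclude("dislikes_biscuits(ali)", "likes_biscuits(ali)"))
# -> likes_biscuits(ali): the contrary is not provable, so the default holds

facts.add("dislikes_biscuits(ali)")   # new information arrives ...
print(default_conclude("dislikes_biscuits(ali)", "likes_biscuits(ali)"))
# -> None: the earlier conclusion must be retracted (non-monotonicity)
```

The second call illustrates exactly the non-monotonic behaviour discussed below: enlarging the fact base invalidates a previously drawn conclusion.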

It is to be noted that the above reasoning process lies outside the realm of logic. It concludes Y if X cannot be proved, but never bothers to find out whether X can be proved or not. Hence default reasoning systems cannot be characterized formally. Even if one succeeds in gaining complete information at the moment, the validity of the information may not hold forever, since it is a changing world. What appears to be true now may not be so at a later time (in a non-monotonic system).
One way to solve the problem of a changing world is to delete statements when they are no longer accurate, and replace them by more accurate statements. This leads to a non-monotonic system in which statements can be deleted as well as added to the knowledge base. When a statement is deleted, other related statements may also have to be deleted. Non-monotonic reasoning systems may be necessary due to any of the following reasons.

- The presence of incomplete information requires default reasoning.

- A changing world must be described by a changing database.

- Generating a complete solution to a problem may require temporary assumptions about partial solutions.

Non-monotonic systems are harder to deal with than monotonic systems. This is because when a statement is deleted as "no longer valid", other related statements have to be backtracked and they should be either deleted or new proofs have to be found for them. This is called dependency-directed backtracking (DDB). In order to propagate the current changes into the database, the statements on which a particular proof depends should also be stored along with the proof. Thus non-monotonic systems require more storage space as well as more processing time than monotonic systems.

Default reasoning

This is a very common form of non-monotonic reasoning. Here we want to draw conclusions based on what is most likely to be true.

We have already seen examples of this and possible ways to represent this knowledge.
We will discuss two approaches to do this:
• Non-Monotonic logic.
• Default logic.

DO NOT get confused about the labels Non-Monotonic and Default being applied to both reasoning and a particular logic. Non-Monotonic reasoning is a generic description of a class of reasoning. Non-Monotonic logic is a specific theory. The same goes for Default reasoning and Default logic.
Non-Monotonic Logic
This is basically an extension of first-order predicate logic to include a modal operator, M. The
purpose of this is to allow for consistency.

For example: ∀x : plays_instrument(x) ∧ M improvises(x) → jazz_musician(x)

states that, for all x, if x plays an instrument and if the fact that x can improvise is consistent with all other knowledge, then we can conclude that x is a jazz musician.

How do we define consistency?


One common solution (consistent with PROLOG notation) is: to show that fact P is true, attempt to prove ¬P. If we fail, we may say that P is consistent (since ¬P is false).

However consider the famous set of assertions relating to President Nixon.

∀x : Republican(x) ∧ M ¬Pacifist(x) → ¬Pacifist(x)

∀x : Quaker(x) ∧ M Pacifist(x) → Pacifist(x)

Now this states that Quakers tend to be pacifists and Republicans tend not to be.

BUT Nixon was both a Quaker and a Republican, so we could assert:

Quaker(Nixon)

Republican(Nixon)

This now leads to our total knowledge becoming inconsistent.

y
Default Logic
Default logic introduces a new inference rule:

A : B
-----
  C

which states that if A is deducible and it is consistent to assume B, then conclude C.
Now this is similar to Non-Monotonic logic but there are some distinctions: new inference rules are used for computing the set of plausible extensions. So in the Nixon example above, Default logic can support both assertions, since it does not say anything about how to choose between them -- it will depend on the inference being made. In Default logic any nonmonotonic expressions are rules of inference rather than expressions.

S.NO RGPV QUESTION YEAR MARKS


Q.1 Explain monotonic and non-monotonic June.2013 10
reasoning with example June.2006
Q.2 Differentiate the monotonic and non- June.2011 10
monotonic reasoning


REFERENCE

BOOK                                          AUTHOR                          PRIORITY
Artificial Intelligence                       Elaine Rich & Kevin Knight      1
Artificial Intelligence: A Modern Approach    Russell Stuart & Peter Norvig   2

UNIT-03
Probabilistic Reasoning
UNIT-03/LECTURE-01

Bayes Theorem
P(Hi | E) = P(E | Hi) · P(Hi) / Σk P(E | Hk) · P(Hk)

This reads that, given some evidence E, the probability that hypothesis Hi is true is equal to the ratio of the probability that E will be true given Hi times the a priori probability of Hi, to the sum, over the set of all hypotheses, of the probability of E given each hypothesis times the probability of that hypothesis.

The set of all hypotheses must be mutually exclusive and exhaustive.


Thus, if we examine medical evidence to diagnose an illness, we must know all the prior probabilities of finding each symptom and also the probability of having an illness based on certain symptoms being observed.
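A tiny numeric sketch of Bayes' theorem for such a diagnosis, over a mutually exclusive and exhaustive set of hypotheses. All the numbers (priors and likelihoods of a fever symptom) are invented for illustration:

```python
priors = {"flu": 0.10, "cold": 0.25, "healthy": 0.65}      # P(H_k)
likelihood = {"flu": 0.90, "cold": 0.60, "healthy": 0.05}  # P(fever | H_k)

def posterior(h, priors, likelihood):
    """P(h | E) = P(E|h) P(h) / sum_k P(E|H_k) P(H_k)."""
    evidence = sum(likelihood[k] * priors[k] for k in priors)
    return likelihood[h] * priors[h] / evidence

p = posterior("flu", priors, likelihood)
print(round(p, 3))  # posterior probability of flu given a fever
```

Because the hypotheses are exhaustive and mutually exclusive, the posteriors sum to 1 — which is exactly the closed-set assumption criticized below.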

Bayesian statistics lie at the heart of most statistical reasoning systems.


How is Bayes theorem exploited?
The key is to formulate the problem correctly:
P(A|B) states the probability of A given only B's evidence. If there is other relevant evidence, then it must also be considered.

Herein lies a problem:

• All events must be mutually exclusive. However, in real-world problems events are not generally unrelated. For example, in diagnosing measles, the symptoms of spots and a fever are related. This means that computing the conditional probabilities gets complex. In general, given prior evidence p and some new observation N, computing P(N|p) grows exponentially for large sets of p.
• All events must be exhaustive. This means that in order to compute all probabilities the set of possible events must be closed. Thus if new information arises, the set must be created afresh and all probabilities recalculated.
events must be closed. Thus if new information arises the set must be created afresh and all probabilities
recalculated.
Thus simple Bayes rule-based systems are not suitable for uncertain reasoning:
• Knowledge acquisition is very hard.
• Too many probabilities needed -- too large a storage space.
• Computation time is too large.
• Updating new information is difficult and time consuming.
• Exceptions like ``none of the above'' cannot be represented.
• Humans are not very good probability estimators.

However, Bayesian statistics still provide the core to reasoning in many uncertain reasoning systems with
suitable enhancement to overcome the above problems.
We will look at three broad categories:
• Certainty factors,
• Dempster-Shafer models,
• Bayesian networks.

Belief Models and Certainty Factors


This approach has been suggested by Shortliffe and Buchanan and used in their famous medical diagnosis
MYCIN system.
MYCIN is essentially an expert system. Here we only concentrate on the probabilistic reasoning aspects of MYCIN.
• MYCIN represents knowledge as a set of rules.
• Associated with each rule is a certainty factor
• A certainty factor is based on measures of belief MB and disbelief MD of a hypothesis H given evidence E, where P is the standard probability.
• The certainty factor CF of some hypothesis given evidence E is defined as:
CF(H,E) = MB(H,E) − MD(H,E)

Reasoning with Certainty factors


• Rules are expressed as: if <evidence list> then there is suggestive evidence with probability p for <symptom>.
• MYCIN uses rules to reason backward to clinical data evidence from its goal of predicting a disease-causing organism.
• Certainty factors initially supplied by experts are changed according to the previous formulae.
• How do we perform reasoning when several rules are chained together? Measures of belief and disbelief given several observations are calculated as follows:
• How about our belief about several hypotheses taken together? Measures of belief given several hypotheses to be combined logically are calculated as follows:
Disbelief is calculated similarly.
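The combination formulas referred to above are not reproduced in these notes; the sketch below uses the standard MYCIN-style formulas from the literature (belief from independent evidence combines as mb1 + mb2(1 − mb1); conjunctions take the minimum, disjunctions the maximum; CF = MB − MD). The function names and numbers are illustrative, not from the original text.

```python
def combine_mb(mb1, mb2):
    """Belief from two independent pieces of evidence for one hypothesis."""
    return mb1 + mb2 * (1 - mb1)

def mb_and(mb1, mb2):
    """Belief in a conjunction of hypotheses: the minimum."""
    return min(mb1, mb2)

def mb_or(mb1, mb2):
    """Belief in a disjunction of hypotheses: the maximum."""
    return max(mb1, mb2)

def certainty(mb, md):
    """Certainty factor: CF = MB - MD."""
    return mb - md

mb = combine_mb(0.4, 0.6)            # 0.4 + 0.6 * (1 - 0.4) = 0.76
print(round(certainty(mb, 0.1), 2))  # 0.66
```

Note how combine_mb never exceeds 1 and grows with each confirming rule -- each independent rule adds belief in proportion to the remaining doubt, which is what lets MYCIN avoid full joint probabilities.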

Overcoming the Bayes Rule shortcomings

Certainty factors do adhere to the rules of Bayesian statistics, but they can represent tractable knowledge systems:
• Individual rules contribute belief in a hypothesis -- basically a conditional probability.
• The formulae for combination of evidence / hypotheses basically assume that all rules are independent, ruling out the need for joint probabilities.
• The burden of guaranteeing independence is placed on the rule writer.
a
m
UNIT-03/Lecture 02
Dempster-Shafer Models
This can be regarded as a more general approach to representing uncertainty than the Bayesian approach.
Bayesian methods are sometimes inappropriate:
Let A represent the proposition Demi Moore is attractive. Then the axioms of probability insist that P(A) + P(¬A) = 1.
Now suppose that Andrew does not even know who Demi Moore is. Then:
• We cannot say that Andrew believes the proposition if he has no idea what it means.
• Also, it is not fair to say that he disbelieves the proposition.
• It would therefore be meaningful to denote Andrew's belief in A, B(A), and his belief in ¬A, B(¬A), as both being 0.
• Certainty factors do not allow this.

Dempster-Shafer Calculus
The basic idea in representing uncertainty in this model is:
• Set up a confidence interval -- an interval of probabilities within which the true probability lies with a
certain confidence -- based on the Belief B and plausibility PL provided by some evidence E for a
proposition P.
• The belief brings together all the evidence that would lead us to believe in P with some certainty.
• The plausibility brings together the evidence that is compatible with P and is not inconsistent with it.
• This method allows for further additions to the set of knowledge and does not assume disjoint
outcomes.
If Θ is the set of possible outcomes, then a mass probability, M, is defined for each member of the set 2^Θ of all subsets of Θ and takes values in the range [0,1]. The null set, ∅, is also a member of 2^Θ.
NOTE: This deals with set theory terminology that will be dealt with in a tutorial shortly. Also see exercises to get experience of problem solving in this important subject matter.
M is a probability density function defined not just for Θ but for all of its subsets.
So if Θ is the set { Flu (F), Cold (C), Pneumonia (P) } then 2^Θ is the set { ∅, {F}, {C}, {P}, {F, C}, {F, P}, {C, P}, {F, C, P} }.
• The confidence interval is then defined as [B(E), PL(E)], where B(E) = Σ_{A⊆E} M(A), i.e. all the evidence that makes us believe in the correctness of P, and PL(E) = 1 − B(¬E), where B(¬E) sums all the evidence that contradicts P.


Combining beliefs
• We have the ability to assign M to a set of hypotheses.
• To combine multiple sources of evidence to a single (or multiple) hypothesis do the following:
o Suppose M1 and M2 are two belief functions.
o Let X be the set of subsets of Θ to which M1 assigns a nonzero value and let Y be a similar set for M2.
o Then to get a new belief function M3 from the combination of beliefs in M1 and M2 we do:
M3(Z) = Σ_{X∩Y=Z} M1(X)·M2(Y) / (1 − Σ_{X∩Y=∅} M1(X)·M2(Y)), whenever Z ≠ ∅.
NOTE: We define M3(∅) to be 0 so that the orthogonal sum remains a basic probability assignment.
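Dempster's combination can be sketched directly over the frame {F, C, P} above. The mass numbers below are invented for illustration, and the function name (combine) is my own; subsets of Θ are represented as frozensets so they can key a dictionary.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's orthogonal sum of two basic probability assignments."""
    combined, conflict = {}, 0.0
    for (a, va), (b, vb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:                          # X ∩ Y = Z, Z nonempty
            combined[inter] = combined.get(inter, 0.0) + va * vb
        else:                              # conflicting mass (X ∩ Y = ∅)
            conflict += va * vb
    norm = 1.0 - conflict                  # renormalize away the conflict
    return {k: v / norm for k, v in combined.items()}

theta = frozenset({"F", "C", "P"})
m1 = {frozenset({"F"}): 0.6, theta: 0.4}          # evidence for Flu
m2 = {frozenset({"F", "C"}): 0.7, theta: 0.3}     # evidence for Flu-or-Cold

m = combine(m1, m2)
print(round(m[frozenset({"F"})], 3))
```

Notice that mass left on Θ itself models "don't know" -- the representational freedom that, as argued above, Bayesian probabilities and certainty factors lack.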

m
Bayesian networks also called Belief Networks or Probabilistic Inference Networks.
The basic idea is:
a
n
• Knowledge in the world is modular -- most events are conditionally independent of most other events.
• Adopt a model that can use a more local representation to allow interactions between events that only
affect each other.
y
• Some events may only be unidirectional others may be bidirectional -- make a distinction between
these in model.
d
u
• Events may be causal and thus get chained together in a network.

Implementation
• A Bayesian Network is a directed acyclic graph:
o A graph where the directions are links which indicate dependencies that exist between nodes.
o Nodes represent propositions about events or events themselves.
o Conditional probabilities quantify the strength of dependencies.

Consider the following example:


• Consider the probability that my car won't start.
• If my car won't start then it is likely that:
o The battery is flat, or
o The starting motor is broken.
In order to decide whether to fix the car myself or send it to the garage I make the following decision:
• If the headlights do not work then the battery is likely to be flat, so I fix it myself.
• If the starting motor is defective then send car to garage.
• If battery and starting motor both gone send car to garage.
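The car example can be sketched as a tiny two-cause network: battery flat (B) and starter broken (S) are independent parents of "car won't start" (N). All the probabilities below, and the conditional table itself, are invented for illustration; the inference is plain enumeration of the joint distribution.

```python
from itertools import product

P_B = 0.1                       # P(battery flat)
P_S = 0.05                      # P(starter broken)

def p_no_start(b, s):
    """P(car won't start | B=b, S=s) -- a hypothetical conditional table."""
    table = {(True, True): 0.99, (True, False): 0.9,
             (False, True): 0.95, (False, False): 0.01}
    return table[(b, s)]

def posterior_battery_given_no_start():
    """P(B | car won't start), by enumerating the joint distribution."""
    num = den = 0.0
    for b, s in product([True, False], repeat=2):
        joint = ((P_B if b else 1 - P_B)
                 * (P_S if s else 1 - P_S)
                 * p_no_start(b, s))
        den += joint
        if b:
            num += joint
    return num / den

print(round(posterior_battery_given_no_start(), 3))
```

Only three local distributions are stored here instead of a full joint table over all variables -- the modularity that makes Bayesian networks tractable, and that would extend to an extra "headlights" node for the decision above.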

Reasoning in Bayesian nets


• Probabilities in links obey standard conditional probability axioms.
• Therefore follow links in reaching hypothesis and update beliefs accordingly.
• A few broad classes of algorithms have been used to help with this:
o Pearl's message passing method.
o Clique triangulation.
o Stochastic methods.
o Basically they all take advantage of clusters in the network and use their limits on the influence to
constrain the search through net.
o They also ensure that probabilities are updated correctly.
• Since information is local, information can be readily added and deleted with minimum effect on the whole network. ONLY affected nodes need updating.

UNIT-03/Lecture 03

Fuzzy Logic

This topic is treated more formally in other courses. Here we summarize the main points for the sake of completeness.
Fuzzy logic is a totally different approach to representing uncertainty:
• It focuses on ambiguities in describing events rather than the uncertainty about the occurrence of an event.
• Changes the definitions of set theory and logic to allow this.
• Traditional set theory defines set memberships as a Boolean predicate.
Fuzzy Set Theory
a
• Fuzzy set theory defines set membership as a possibility distribution.

This basically states that we can take n possible events and use f to generate a single possible outcome. This extends set membership since we could have varying definitions of, say, hot curries. One person might declare that only curries of Vindaloo strength or above are hot whilst another might say Madras and above are hot. We could allow for these variations in definition by allowing both possibilities in fuzzy definitions.
Once set membership has been redefined we can develop new logics based on combining fuzzy set memberships.

Uncertain Reasoning
Sometimes the knowledge in rules is not certain. Rules may then be enhanced by adding information about how certain the conclusions drawn from the rules may be. Here we describe certainty factors and their manipulation.
Often, experts can't give definite answers. This may require an inference mechanism that derives conclusions by combining uncertainties.

Fuzzy Inferencing
The process of fuzzy reasoning is incorporated into what is called a Fuzzy Inferencing System. It is comprised of three steps that process the system inputs into the appropriate system outputs. These steps are 1) Fuzzification, 2) Rule Evaluation, and 3) Defuzzification. The system is illustrated in the following figure.
1. Fuzzification : is the first step in the fuzzy inferencing process. This involves a domain formation where
crisp inputs are transformed into fuzzy inputs. Crisp inputs are exact inputs measured by sensors and
passed into the control system for processing, such as temperature, pressure, rpm's, etc.. Each crisp input
that is to be processed by the FIU has its own group of membership functions or sets to which they are
transformed. This group of membership functions exists within a universe of discourse that holds all
relevant values that the crisp input can possess. The following shows the structure of membership
functions within a universe of discourse for a crisp input.

2. Degree of membership: the degree to which a crisp value is compatible with a membership function, a value from 0 to 1, also known as the truth value or fuzzy input.

3. Membership function (MF): defines a fuzzy set by mapping crisp values from its domain to the set's associated degree of membership.

4. Crisp inputs: distinct or exact inputs to a certain system variable, usually measured parameters external to the control system, e.g. 6 Volts.

5. Label: a descriptive name used to identify a membership function.

6. Scope (or domain): the width of the membership function, the range of concepts, usually numbers, over which a membership function is mapped.

7. Universe of discourse: the range of all possible values, or concepts, applicable to a system variable.

When designing the number of membership functions for an input variable, labels must initially be determined for the membership functions. The number of labels corresponds to the number of regions into which the universe should be divided, such that each label describes a region of behavior. A scope must be assigned to each membership function that numerically identifies the range of input values that correspond to a label. The shape of the membership function should be representative of the variable; however, this shape is also restricted by the computing resources available, since complicated shapes require more complex descriptive equations or large lookup tables. The next figure shows examples of possible shapes for membership functions.
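The fuzzification step described above can be sketched in a few lines of code. This is a minimal illustration only: the triangular membership shape, the "temperature" variable, and the label ranges below are invented for the example, not taken from the notes.

```python
# A minimal fuzzification sketch. The variable, labels, and ranges
# are invented for illustration.

def triangular_mf(a, b, c):
    """Return a triangular membership function with feet at a and c, peak at b."""
    def mf(x):
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mf

# Universe of discourse for a crisp "temperature" input, divided into
# three labelled regions.
temperature_sets = {
    "cold": triangular_mf(-10.0, 0.0, 15.0),
    "warm": triangular_mf(5.0, 20.0, 35.0),
    "hot":  triangular_mf(25.0, 40.0, 55.0),
}

def fuzzify(crisp_value, fuzzy_sets):
    """Map a crisp sensor reading to a degree of membership (0..1) per label."""
    return {label: mf(crisp_value) for label, mf in fuzzy_sets.items()}

print(fuzzify(18.0, temperature_sets))
```

Calling `fuzzify(18.0, temperature_sets)` yields a degree of membership for each label: a crisp 18 degrees is mostly "warm" and not at all "cold" or "hot".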

Reasoning Under Uncertainty


Human expertise is based on effective application of learned biases. These biases must be tempered with an understanding of the strengths and weaknesses (range of applicability) of each bias.
In expert systems, a model of inexact reasoning is needed to capture the judgmental, ``art of good
guessing'' quality of science.
In this section we discuss several approaches to reasoning under uncertainty.
• Bayesian model of conditional probability

• EMYCIN's method, an approximation of Bayesian
• Bayesian nets, a more compact representation used for multiple variables.
UNIT-03/Lecture 04
Certainty Factors
Logic and rules provide all or nothing answers
An expert might want to say that something provides evidence for a conclusion, but it is not definite.
For example, the MYCIN system, an early expert system that diagnosed bacterial blood infections, used
rules of this form:
if the infection is primary-bacteremia
and the site of the culture is one of the sterile sites
and the suspected portal of entry is the gastrointestinal tract
then there is suggestive evidence (0.7) that the infection is bacteroid
0.7 is a certainty factor
Certainty factors have been quantified using various different systems, including linguistics ones (certain,
fairly certain, likely, unlikely, highly unlikely, definitely not) and various numeric scales, such as 0-10, 0-1,
and -1 to 1. We shall concentrate on the -1 to 1 version.
Certainty factors may apply both to facts and to rules, or rather to the conclusion(s) of rules.

A "Theory" of Certainty
Certainty factors range from -1 to +1
As the certainty factor (CF) approaches 1 the evidence is stronger for a hypothesis.
As the CF approaches -1 the confidence against the hypothesis gets stronger.
A CF around 0 indicates that there is little evidence either for or against the hypothesis.
Certainty Factors and Rules
Premises for rules are formed by the and and or of a number of facts.
The certainty factors associated with each condition are combined to produce a certainty factor for the
whole premise.
For two conditions P1 and P2:
CF(P1 and P2) = min(CF(P1), CF(P2))
CF(P1 or P2) = max(CF(P1), CF(P2))

The combined CF of the premises is then multiplied by the CF of the rule to get the CF of the conclusion
Example
if (P1 and P2) or P3 then C1 (0.7) and C2 (0.3)
Assume CF(P1) = 0.6, CF(P2) = 0.4, CF(P3) = 0.2
CF(P1 and P2) = min(0.6, 0.4) = 0.4
CF((P1 and P2) or P3) = max(0.4, 0.2) = 0.4
CF(C1) = 0.7 * 0.4 = 0.28
CF(C2) = 0.3 * 0.4 = 0.12
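The min/max rules and the worked example above can be checked directly with a short sketch (using the example's numbers):

```python
# Sketch of the premise-combination rules: min for AND, max for OR,
# then multiply the premise CF by the rule's CF.

def cf_and(*cfs):
    return min(cfs)

def cf_or(*cfs):
    return max(cfs)

# if (P1 and P2) or P3 then C1 (0.7) and C2 (0.3)
cf_p1, cf_p2, cf_p3 = 0.6, 0.4, 0.2

premise = cf_or(cf_and(cf_p1, cf_p2), cf_p3)  # max(min(0.6, 0.4), 0.2) = 0.4
cf_c1 = 0.7 * premise                         # 0.7 * 0.4 = 0.28
cf_c2 = 0.3 * premise                         # 0.3 * 0.4 = 0.12
print(premise, cf_c1, cf_c2)
```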
________________________________________ m
Combining Multiple CF's
Suppose two rules make conclusions about C.
How do we combine evidence from two rules?
Let CFR1(C) be the current CF for C.
Let CFR2(C) be the CF for C resulting from a new rule.
The new CF is calculated as follows:
CFR1(C) + CFR2(C) - CFR1(C) * CFR2(C)
    when CFR1(C) and CFR2(C) are both positive
CFR1(C) + CFR2(C) + CFR1(C) * CFR2(C)
    when CFR1(C) and CFR2(C) are both negative
[CFR1(C) + CFR2(C)] / [1 - min(|CFR1(C)|, |CFR2(C)|)]
    when CFR1(C) and CFR2(C) are of opposite sign
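The three combination cases above can be collected into one function - a sketch of the MYCIN-style parallel combination, not a production implementation:

```python
def combine_cf(cf1, cf2):
    """Combine two certainty factors for the same conclusion (MYCIN-style)."""
    if cf1 >= 0 and cf2 >= 0:
        return cf1 + cf2 - cf1 * cf2
    if cf1 < 0 and cf2 < 0:
        return cf1 + cf2 + cf1 * cf2
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))

print(combine_cf(0.6, 0.4))    # both positive: 0.6 + 0.4 - 0.24 = 0.76
print(combine_cf(-0.5, -0.2))  # both negative: -0.5 - 0.2 + 0.1 = -0.6
print(combine_cf(0.8, -0.3))   # opposite signs: 0.5 / 0.7, about 0.714
```

Note that combining two positive CFs always yields a result at least as strong as either alone but never exceeding 1, which matches the intended reading of CF = 1 as certainty.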
________________________________________
What do certainty factors mean?
• They are guesses by an expert about the relevance of evidence.
• They are ad hoc.
• CF's are tuned by trial and error.
• CF's hide more knowledge.
Certainty factors quantify the confidence that an expert might have in a conclusion that s/he has arrived
at. We have given rules for combining certainty factors to obtain estimates of the certainty to be
associated with conclusions obtained by using uncertain rules and uncertain evidence.
Certainty Factor : A certainty factor is a number, often in the range -1 to +1, which is associated with a
condition or an action of a
rule. In more detail, each component of a condition may have a certainty factor associated with it - for
example if the condition is of the form A and B, then there could be a certainty factor for A and a certainty
factor for B.

A certainty factor of 1 means that the fact (or proposition) is highly certain. A certainty factor of 0 means
no information about whether the proposition is true or not. A certainty factor of -1 means that the
proposition is certainly false. A certainty factor of 0.7 means that the proposition is quite likely to be true,
and so on.
The certainty factors of conditions are associated with facts held in working memory. Certainty factors for
actions are stored as part of the rules.
Rules for manipulating certainty factors are given in the lecture notes on uncertain reasoning.
However, here is a simple example. Suppose that there is a rule
if P then Q (0.7)
meaning that if P is true, then, with certainty factor 0.7, Q follows. Suppose also that P is stored in
working memory with an associated certainty factor of 0.8. Suppose that the rule above fires (see also
match-resolve-act cycle). Then Q will be added to working memory with an associated certainty factor of
0.7 * 0.8 = 0.56.
condition-action rule
A condition-action rule, also called a production or production rule, is a rule of the form
if condition then action.
The condition may be a compound one using connectives like and, or, and not. The action, too, may be
compound. The action can affect the value of
working memory variables, or take some real world action, or potentially do other things, including
stopping the production system.
Rule-Based Systems
The knowledge of many expert systems is principally stored in their collections of rules.
One of the most popular methods for representing knowledge is in the form of Production Rules. These are in the form of:
if conditions then conclusion
If: 1) the gram stain of the organism is gram-negative, and
    2) the morphology of the organism is rod, and
    3) the aerobicity of the organism is anaerobic,
Then: there is suggestive evidence (0.6) that the identity of the organism is Bacteroides.

Advantages of Rules
• Knowledge comes in meaningful chunks.
• New knowledge can be added incrementally.
• Rules can make conclusions based on different kinds of data, depending on what is available.
• Rule conclusions provide ``islands'' that give multiplicative power.
• Rules can be used to provide explanations, control the problem-solving process, and check new rules for errors.

EMYCIN
EMYCIN was the first widely used expert system tool.


• Good for learning expert systems
• Limited in applicability to ``finite classification'' problems:
o Diagnosis
o Identification
• Good explanation capability
• Certainty factors
Several derivative versions exist.
Rule-Based Expert Systems[Shortliffe, E. Computer-based medical consultations: MYCIN. New York:
Elsevier, 1976.]
MYCIN diagnoses infectious blood diseases using a backward-chained (exhaustive) control strategy.
The algorithm, ignoring certainty factors, is basically back chaining:
Given:
1. a list of diseases, Goal-list
2. initial symptoms, DB
3. Rules
For each g ∈ Goal-list do
    If prove(g, DB, Rules) then Print(``Diagnosis:'', g)
Function prove(goal, DB, Rules)
    If goal ∈ DB then return True
    elseif ∃ r ∈ Rules such that the RHS of r contains goal
        then return provelist(LHS, DB, Rules)   [provelist calls prove with each condition of the LHS]
    else Ask the user about goal and return the answer
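The back-chaining pseudocode above can be rendered as a short Python sketch. It ignores certainty factors and, slightly more forgiving than the pseudocode, tries every rule whose conclusion matches the goal before falling back to asking the user; the rule and findings below are invented, loosely shaped like the MYCIN example.

```python
def prove(goal, db, rules, ask=lambda g: False):
    """True if goal is a known fact, or provable from some rule's premises."""
    if goal in db:
        return True
    for premises, conclusion in rules:
        if conclusion == goal and all(prove(p, db, rules, ask) for p in premises):
            return True
    return ask(goal)  # fall back to asking the user (stubbed out here)

# Invented rule and findings, loosely shaped like the MYCIN example:
rules = [
    (("gram-negative", "rod", "anaerobic"), "bacteroides"),
]
db = {"gram-negative", "rod", "anaerobic"}

for g in ["bacteroides"]:
    if prove(g, db, rules):
        print("Diagnosis:", g)
```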
SLOT AND FILLER STRUCTURE
Why use this data structure?
• It enables attribute values to be retrieved quickly
o assertions are indexed by the entities
o binary predicates are indexed by the first argument, e.g. team(Mike-Hall, Cardiff).
• Properties of relations are easy to describe.
• It allows ease of consideration as it embraces aspects of object-oriented programming.
So called because:
• A slot is an attribute value pair in its simplest form.
• A filler is a value that a slot can take -- could be a numeric, string (or any data type) value or a pointer to
another slot.
• A weak slot and filler structure does not consider the content of the representation. m
We will study two types:
• Semantic Nets.
• Frames.

UNIT-03/Lecture 04
Semantic Network :
Semantic networks are a knowledge representation technique. More specifically, they are a way of recording all the relevant relationships between members of a set of objects and types. "Object" means an individual (a particular person, or other particular animal or object, such as a particular cat, tree, chair, brick, etc.). "Type" means a set of related objects - the set of all persons, cats, trees, chairs, bricks, mammals, plants, furniture, etc. Possible relationships include the special set-theoretic relationships isa (set membership) and ako (the subset relation), and also general relationships like likes and child-of. Technically a semantic network is a node- and edge-labelled directed graph, and they are frequently depicted that way. Here is a pair of labelled nodes and a single labelled edge (relationship) between them (there could be more than one relationship between a single pair):

Here is a larger fragment of a semantic net, showing 4 labelled nodes (Fifi, cat, mammal, milk) and three
labelled edges (isa, ako, likes) between them.

slot : A slot in a frame is like a field in a record or struct in languages like Pascal, Modula-2 and C.
However, slots can be added dynamically to frames, and slots contain substructure, called facets. The
facets would normally include a value, perhaps a default, quite likely some demons, and possibly some
flags like the iProlog frame system's cache and multi_valued facets.
The major idea is that:
• The meaning of a concept comes from its relationship to other concepts, and that,
• The information is stored by interconnecting nodes with labelled arcs.
Representation in a Semantic Net

These values can also be represented in logic as: isa(person, mammal), instance(Mike-Hall, person)
team(Mike-Hall, Cardiff)
We have already seen how conventional predicates such as lecturer(dave) can be written as instance
(dave, lecturer) Recall that isa and instance represent inheritance and are popular in many knowledge
representation schemes. But we have a problem: how can we have more than 2-place predicates in semantic nets? E.g. score(Cardiff, Llanelli, 23-6). Solution:
• Create new nodes to represent new objects either contained or alluded to in the knowledge, game and
fixture in the current example.
As a more complex example consider the sentence: John gave Mary the book. Here we have several
aspects of an event.

Inference in a Semantic Net


Basic inference mechanism: follow links between nodes.
Two methods to do this:
Intersection search
-- the notion that spreading activation out of two nodes and finding their intersection finds relationships
among objects. This is achieved by assigning a special tag to each visited node.
Many advantages including entity-based organisation and fast parallel implementation. However very
structured questions need highly structured networks.
Inheritance
-- the isa and instance representation provide a mechanism to implement this.
Inheritance also provides a means of dealing with default reasoning. E.g. we could represent:
• Emus are birds.
• Typically birds fly and have wings.
• Emus run.
in the following Semantic net:
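The emu example can also be sketched as a small net with default inheritance, where the most specific node wins. The instance name "edna" and the dictionary encoding are invented for illustration; only the bird/emu facts come from the notes.

```python
# net encodes: emus are birds (ako); typically birds fly and have wings;
# emus run (overriding the flying default). "edna" is an invented instance.
net = {
    "bird": {"flies": True, "has_wings": True},
    "emu":  {"ako": "bird", "flies": False, "runs": True},
    "edna": {"isa": "emu"},
}

def lookup(node, prop, net):
    """Follow isa/ako links upward; the most specific value wins."""
    while node is not None:
        frame = net[node]
        if prop in frame:
            return frame[prop]
        node = frame.get("isa") or frame.get("ako")
    return None

print(lookup("edna", "flies", net))      # False -- emu overrides the bird default
print(lookup("edna", "has_wings", net))  # True  -- inherited from bird
```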
In making certain inferences we will also need to distinguish between the link that defines a new entity and holds its value and the other kind of link that relates two existing entities. Consider the example shown where the height of two people is depicted and we also wish to compare them.

We need extra nodes for the concept as well as its value.

Special procedures are needed to process these nodes, but without this distinction the analysis would be
very limited.
Extending Semantic Nets
Here we will consider some extensions to Semantic nets that overcome a few problems (see Exercises) or extend their expression of knowledge.
Partitioned Networks
Partitioned Semantic Networks allow for:
• propositions to be made without commitment to truth.
• expressions to be quantified.
Basic idea: Break network into spaces which consist of groups of nodes and arcs and regard each space as
a node.
Consider the following: Andrew believes that the earth is flat. We can encode the proposition the earth is
flat in a space and within it have nodes and arcs that represent the fact (Fig. 15). We can then have nodes and arcs to link this space to the rest of the network to represent Andrew's belief.
Fig. 12 Partitioned network
Now consider the quantified expression: Every parent loves their child To represent this we:
• Create a general statement, GS, special class.
• Make node g an instance of GS.
• Every element will have at least 2 attributes:
o a form that states which relation is being asserted.
o one or more forall ( ) or exists ( ) connections -- these represent universally quantifiable variables in such statements, e.g. x, y in parent(x) : child(y) loves(x,y)

Here we have to construct two spaces one for each x,y. NOTE: We can express variables as existentially
qualified variables and express the event of love having an agent p and receiver b for every parent p
which could simplify the network (See Exercises).
Also If we change the sentence to Every parent loves child then the node of the object being acted on
(the child) lies outside the form of the general statement. Thus it is not viewed as an existentially qualified
variable whose value may depend on the agent. (See Exercises and Rich and Knight book for examples of
this) So we could construct a partitioned network as in Fig. 16

Fig. 12 Partitioned network


Frames : Frames are a knowledge representation technique. They resemble an extended form of record
(as in Pascal and Modula-2) or struct (using C terminology) or class (in Java) in that they have a number of
slots which are like fields in a record or struct, or variable in a class. Unlike a record/struct/class, it is
possible to add slots to a frame dynamically (i.e. while the program is executing) and the contents of the
slot need not be a simple value. If there is no value present in a slot, the frame system may use a default
for frames of that type, or there may be a demon present to help compute a value for the slot.

Generic Frame : A frame that serves as a template for building instance frames. For example, a generic frame might describe the "elephant" concept in general, giving defaults for various elephant features (number of legs, ears, presence of trunk and tusks, colour, size, weight, habitat, membership of the class of mammals, etc.), while an instance frame would describe a particular elephant, say "Dumbo", who might have a missing tusk and who would thus have the default for number of tusks overridden by specifically setting number of tusks to 1. Instance frames are said to inherit their slots from the generic frame used to create them. Generic frames may also inherit slots from other generic frames of which they are a subconcept (as with mammal and elephant - elephant inherits all the properties of mammal that are encoded in the mammal generic frame - warm blood, bear young alive, etc.).

Frames can also be regarded as an extension to Semantic nets. Indeed it is not clear where the distinction between a semantic net and a frame ends. Semantic nets were initially used to represent labelled connections between objects. As tasks became more complex the representation needed to be more structured; the more structured the system, the more beneficial it becomes to use frames. A frame is a collection of attributes or slots and associated values that describe some real world entity. Frames on their own are not particularly helpful, but frame systems are a powerful way of encoding information to support reasoning. Set theory provides a good basis for understanding frame systems. Each frame represents:
• a class (set), or
• an instance (an element of a class).
Frame Knowledge Representation
Figure: A simple frame system
Here the frames Person, Adult-Male, Rugby-Player and Rugby-Team are all classes and the frames Robert-
Howley and Cardiff-RFC are instances.
• The isa relation is in fact the subset relation.
• The instance relation is in fact element of.
• The isa attribute possesses a transitivity property. This implies: Robert-Howley is a Back and a Back is a
Rugby-Player who in turn is an Adult-Male and also a Person.
• Both isa and instance have inverses which are called subclasses or all instances.
• There are attributes that are associated with the class or set such as cardinality and on the other hand
there are attributes that are possessed by each member of the class or set.
DISTINCTION BETWEEN SETS AND INSTANCES
It is important that this distinction is clearly understood.
Cardiff-RFC can be thought of as a set of players or as an instance of a Rugby-Team.
If Cardiff-RFC were a class then
• its instances would be players

• it could not be a subclass of Rugby-Team otherwise its elements would be members of Rugby-Team
which we do not want.
Instead we make it a subclass of Rugby-Player and this allows the players to inherit the correct properties
enabling us to let the Cardiff-RFC to inherit information about teams.
This means that Cardiff-RFC is an instance of Rugby-Team.
BUT There is a problem here:
• A class is a set and its elements have properties.
• We wish to use inheritance to bestow values on its members.
• But there are properties that the set or class itself has such as the manager of a team.
This is why we need to view Cardiff-RFC as a subset of one class players and an instance of teams. We
seem to have a CATCH 22. Solution: MetaClasses
A metaclass is a special class whose elements are themselves classes.
Now consider our rugby teams as:

Figure: A Metaclass frame system


The basic metaclass is Class, and this allows us to
• define classes which are instances of other classes, and (thus)
• inherit properties from this class.
Inheritance of default values occurs when one element or class is an instance of a class.
Slots as Objects
How can we to represent the following properties in frames?
• Attributes such as weight, age can be attached and make sense.
• Constraints on values such as age being less than a hundred
• Default values
• Rules for inheritance of values such as children inheriting parent's names
• Rules for computing values
• Many values for a slot.
A slot is a relation that maps from its domain of classes to its range of values.
A relation is a set of ordered pairs, so one relation is a subset of another.
Since a slot is a set, the set of all slots can be represented by a metaclass called Slot, say.
NOTE the following:
• Instances of SLOT are slots
• Associated with SLOT are attributes that each instance will inherit.
• Each slot has a domain and range.
• Range is split into two parts: one is the class of the elements and the other is a constraint, which is a logical expression; if absent it is taken to be true.
• If there is a value for default then it must be passed on unless an instance has its own value.
• The to-compute attribute involves a procedure to compute its value, e.g. in Position, where we use the dot notation to assign values to the slot of a frame.
• Transfers-through lists other slots from which values can be derived by inheritance.
• instance (an element of a class).
Interpreting frames
A frame system interpreter must be capable of the following in order to exploit the frame slot
representation:
• Consistency checking -- when a slot value is added to the frame, relying on the domain attribute and checking that the value is legal using range and range constraints.
• Propagation of definition values along isa and instance links.
• Inheritance of default values along isa and instance links.
• Computation of value of slot as needed.
• Checking that only correct number of values computed.
• Demon : A demon is a facet of a slot in a frame which causes some action to be taken when the frame is
accessed in certain types of ways. For example, an if-needed demon is activated or triggered if the value
of the slot is required and a value has not yet been stored in the slot, and it should calculate or otherwise
obtain a value for the slot, while a range demon is triggered if a new value is added to the slot, to check that the value added is permissible for this particular slot.


• Here is a list of the demon types supported by the iProlog frame implementation:
if_added
demons are triggered when a new value is put into a slot.
if_removed
demons are triggered when a value is removed from a slot.
if_replaced
is triggered when a slot value is replaced.
if_needed
demons are triggered when there is no value present in an instance frame and a value must be computed
from a generic frame.
if_new
is triggered when a new frame is created.
range
is triggered when a new value is added. The value must satisfy the range constraint specified for the slot.
help
is triggered when the range demon is triggered and returns false.
The following are not demons but demon-related slots in a frame.
cache
• means that when a value is computed it is stored in the instance frame.
multi_valued
• means that the slot may contain more than one value.
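A toy frame system can illustrate two of the demon types listed above: an if_needed demon that computes a missing value on demand, and a range demon that vets new values. The class design and the elephant defaults below are an illustrative sketch, not iProlog's actual API.

```python
class Frame:
    def __init__(self, name, parent=None):
        self.name, self.parent = name, parent
        self.slots = {}        # slot -> stored value
        self.if_needed = {}    # slot -> demon computing a missing value
        self.range_check = {}  # slot -> range demon vetting new values

    def set(self, slot, value):
        # range demon: reject values that are not permissible for the slot
        check = self._facet("range_check", slot)
        if check and not check(value):
            raise ValueError(f"{value!r} not permissible for slot {slot!r}")
        self.slots[slot] = value

    def get(self, slot):
        # look locally, then inherit up the parent chain;
        # an if_needed demon fires when no value is stored
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            demon = frame.if_needed.get(slot)
            if demon is not None:
                return demon(self)
            frame = frame.parent
        return None

    def _facet(self, kind, slot):
        frame = self
        while frame is not None:
            facet = getattr(frame, kind).get(slot)
            if facet is not None:
                return facet
            frame = frame.parent
        return None

elephant = Frame("elephant")
elephant.slots["tusks"] = 2                       # class default
elephant.range_check["tusks"] = lambda v: 0 <= v <= 2
elephant.if_needed["legs"] = lambda f: 4          # computed on demand

dumbo = Frame("dumbo", parent=elephant)
print(dumbo.get("tusks"))   # 2 -- inherited default
dumbo.set("tusks", 1)       # passes the range demon
print(dumbo.get("tusks"))   # 1 -- instance overrides the default
print(dumbo.get("legs"))    # 4 -- supplied by the if_needed demon
```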
Strong Slot and Filler Structures : Represent links between objects according to more rigid rules.
• Specific notions of what types of object and relations between them are provided.
• Represent knowledge about common situations.
UNIT-03/Lecture 05

Conceptual Dependency (CD)
Conceptual Dependency was originally developed to represent knowledge acquired from natural language input.
The goals of this theory are:
• To help in the drawing of inference from sentences.

• To be independent of the words used in the original input.
• That is to say: For any 2 (or more) sentences that are identical in meaning there should be only one representation of that meaning.

It has been used by many programs that purport to understand English (MARGIE, SAM, PAM). CD was developed by Schank et al., as were the previous examples.
CD provides: a structure into which nodes representing information can be placed
• a specific set of primitives
• at a given level of granularity.
Sentences are represented as a series of diagrams depicting actions using both abstract and real physical
situations.
• The agent and the objects are represented
• The actions are built up from a set of primitive acts which can be modified by tense.
Examples of Primitive Acts are:
ATRANS -- Transfer of an abstract relationship. e.g. give.
PTRANS -- Transfer of the physical location of an object. e.g. go.
PROPEL -- Application of a physical force to an object. e.g. push.
MTRANS-- Transfer of mental information. e.g. tell.
MBUILD -- Construct new information from old. e.g. decide.

SPEAK -- Utter a sound. e.g. say.


ATTEND-- Focus a sense on a stimulus. e.g. listen, watch.
MOVE -- Movement of a body part by owner. e.g. punch, kick.
GRASP-- Actor grasping an object. e.g. clutch.
INGEST-- Actor ingesting an object. e.g. eat.
EXPEL -- Actor getting rid of an object from body. e.g. ????.
Six primitive conceptual categories provide building blocks which are the set of allowable dependencies in
the concepts in a sentence:
PP-- Real world objects.
ACT-- Real world actions.
PA -- Attributes of objects.
AA -- Attributes of actions.
T-- Times.
LOC -- Locations.
How do we connect these things together?
Consider the example:
John gives Mary a book

• Arrows indicate the direction of dependency. Letters above indicate certain relationships:
o-- object.
R-- recipient-donor.
I -- instrument e.g. eat with a spoon. m
D-- destination e.g. going home.
o
.c
• Double arrows ( ) indicate two-way links between the actor (PP) and action (ACT).
• The actions are built from the set of primitive acts (see above).
o These can be modified by tense etc.
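The "John gives Mary a book" ATRANS example can be sketched as a data structure; the field names below are illustrative stand-ins for Schank's graphical notation, not part of it.

```python
# CD graph for "John gave Mary a book": an ATRANS act with object,
# recipient-donor (R) roles, and a past-tense modifier.
cd = {
    "act": "ATRANS",        # transfer of an abstract relationship (possession)
    "actor": "John",
    "object": "book",
    "recipient": "Mary",    # R: recipient-donor link
    "donor": "John",
    "tense": "p",           # p -- past
}

def paraphrase(event):
    """Render one ATRANS event back into English (toy generator)."""
    assert event["act"] == "ATRANS"
    verb = "gave" if event["tense"] == "p" else "gives"
    return f'{event["actor"]} {verb} {event["recipient"]} the {event["object"]}'

print(paraphrase(cd))  # John gave Mary the book
```

Because the meaning lives in the primitive act and its roles rather than in the surface words, "John gave Mary a book" and "Mary was given a book by John" would map to this same structure.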
a
The use of tense and mood in describing events is extremely important and Schank introduced the following modifiers:
p -- past
f -- future
t -- transition
ts -- start transition
tf -- finished transition
k -- continuing
? -- interrogative
/ -- negative
delta -- timeless
c -- conditional
The absence of any modifier implies the present tense.
So the past tense of the above example:
John gave Mary a book becomes:

The double arrow has an object (actor), PP, and an action, ACT, i.e. PP ACT. The triple arrow ( ) is also a two-way link, but between an object, PP, and its attribute, PA, i.e. PP PA.
It represents isa type dependencies, e.g. Dave lecturer: Dave is a lecturer.
Primitive states are used to describe many state descriptions such as height, health, mental state, physical state.
There are many more physical states than primitive actions. They use a numeric scale.
E.g. John height(+10) -- John is the tallest; John height(< average) -- John is short; Frank Zappa health(-10) -- Frank Zappa is dead; Dave mental_state(-10) -- Dave is sad; Vase physical_state(-10) -- The vase is broken.
You can also specify things like the time of occurrence in the relation ship.
For Example: John gave Mary the book yesterday

Now let us consider a more complex sentence: Since smoking can kill you, I stopped. Let's look at how we represent the inference that smoking can kill:
• Use the notion of one to apply the knowledge to.
• Use the primitive act of INGESTing smoke from a cigarette to one.
• Killing is a transition from being alive to dead. We use triple arrows to indicate a transition from one
state to another.
• Have a conditional, c causality link. The triple arrow indicates dependency of one concept on another.

To add the fact that I stopped smoking


• Use similar rules to imply that I smoke cigarettes.

• The qualification attached to this dependency indicates that the instance INGESTing smoke has
stopped.
Advantages of CD:
• Using these primitives involves fewer inference rules.
• Many inference rules are already represented in CD structure.
• The holes in the initial structure help to focus on the points still to be established.
Disadvantages of CD:
• Knowledge must be decomposed into fairly low level primitives.

• Impossible or difficult to find correct set of primitives.
• A lot of inference may still be required.

• Representations can be complex even for relatively simple actions. Consider:
Dave bet Frank five pounds that Wales would win the Rugby World Cup.
• Complex representations require a lot of storage.

UNIT-03/Lecture 06

Scripts
A script is a structure that prescribes a set of circumstances which could be expected to follow on from
one another. It is similar to a thought sequence or a chain of situations which could be anticipated.
Scripts are beneficial because:
• Events tend to occur in known runs or patterns.
• Causal relationships between events exist.
• Entry conditions exist which allow an event to take place
• Prerequisites exist upon events taking place. E.g. when a student progresses through a degree scheme
or when a purchaser buys a house.
The components of a script include:
Entry Conditions -- these must be satisfied before events in the script can occur.
Results -- Conditions that will be true after events in script occur.
Props -- Slots representing objects involved in events.

Roles -- Persons involved in the events.


Track -- Variations on the script. Different tracks may share components of the same script.
Scenes-- The sequence of events that occur. Events are represented in conceptual dependency form.
Scripts are useful in describing certain situations such as robbing a bank. This might involve:
• Getting a gun.
• Hold up a bank.
• Escape with the money.
Here the Props might be
• Gun, G.
• Loot, L.
• Bag, B
• Get away car, C.
The Roles might be:
• Robber, S.
• Cashier, M.
• Bank Manager, O.
• Policeman, P.
The Entry Conditions might be:
• S is poor.
• S is destitute.
The Results might be:
• S has more money. m
• O is angry.
o
.c
• M is in a state of shock.
• P is shot.
There are 3 scenes: obtaining the gun, robbing the bank and the getaway.
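The script components above can be collected into a plain data structure, with a trivial checker that verifies the entry conditions before "running" the scenes and asserting the results (an illustrative sketch, not a full script applier):

```python
# The bank-robbery script, mirroring the components listed in the notes.
robbery_script = {
    "props": {"G": "gun", "L": "loot", "B": "bag", "C": "getaway car"},
    "roles": {"S": "robber", "M": "cashier", "O": "bank manager", "P": "policeman"},
    "entry_conditions": ["S is poor", "S is destitute"],
    "scenes": ["obtaining the gun", "robbing the bank", "the getaway"],
    "results": ["S has more money", "O is angry",
                "M is in a state of shock", "P is shot"],
}

def run_script(script, world):
    """Apply the script only if its entry conditions hold in the world state."""
    if not all(cond in world for cond in script["entry_conditions"]):
        return world
    for scene in script["scenes"]:
        print("Scene:", scene)
    return world | set(script["results"])

world = run_script(robbery_script, {"S is poor", "S is destitute"})
print("S has more money" in world)  # True
```

Once a script is activated, detailed questions ("does the robber end up with money?") can be answered by looking up the results it asserts, which is exactly the use the notes describe.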
a
• If a particular script is to be applied it must be activated and the activating depends on its significance.

m
• If a topic is mentioned in passing then a pointer to that script could be held.
• If the topic is important then the script should be opened.
a
• The danger lies in having too many active scripts much as one might have too many windows open on

n
the screen or too many recursive calls in a program.
• Provided events follow a known trail we can use scripts to represent the actions involved and use them
to answer detailed questions.
y
• Different trails may be allowed for different outcomes of Scripts ( e.g. The bank robbery goes wrong).
CYC
What is CYC?
• An ambitious attempt to form a very large knowledge base aimed at capturing commonsense reasoning.

• Initial goals to capture knowledge from a hundred randomly selected articles in the EnCYClopedia
Britannica.
• Both Implicit and Explicit knowledge encoded.
• Emphasis on study of underlying information (assumed by the authors but not needed to be told to the readers).
Example: Suppose we read that Wellington learned of Napoleon's death
Then we (humans) can conclude Napoleon never knew that Wellington had died.
How do we do this?
We require special implicit knowledge or commonsense such as:
• We only die once.
• You stay dead.
• You cannot learn of anything when dead.
• Time cannot go backwards.

Why build large knowledge bases:


Brittleness
-- Specialised knowledge bases are brittle. Hard to encode new situations and non-graceful degradation in
performance. Commonsense based knowledge bases should have a firmer foundation.
Form and Content
-- Knowledge representation may not be suitable for AI. Commonsense strategies could point out where
difficulties in content may affect the form.
Shared Knowledge
-- Should allow greater communication among systems with common bases and assumptions.
Machine Learning : Machine learning refers to the ability of computers to automatically acquire new
knowledge, learning from, for example, past cases or experience, from the computer's own experiences,
or from exploration. Machine learning has many uses such as finding rules to direct marketing campaigns
based on lessons learned from analysis of data from supermarket loyalty campaigns; or learning to
recognize characters from people's handwriting. Machine learning enables computer software to adapt to
changing circumstances, enabling it to make better decisions than non-AI software. Synonyms: learning,
automatic learning.

Model-based Reasoning : Model-based reasoning (MBR) concentrates on reasoning about a system’s


behavior from an explicit model of the mechanisms underlying that behavior. Model-based techniques
can very succinctly represent knowledge more completely and at a greater level of detail than techniques that encode experience, because they employ models that are compact axiomatic systems from which
large amounts of information can be deduced.
Natural Language Processing : English is an example of a natural language, a computer language isn't. For
a computer to process a natural language, it would have to mimic what a human does. That is, the
computer would have to recognize the sequence of words spoken by a person or another computer,
understand the syntax or grammar of the words (i.e., do a syntactical analysis), and then extract the
meaning of the words. A limited amount of meaning can be derived from a sequence of words taken out

m
of context (i.e., by semantic analysis); but much more of the meaning depends on the context in which
the words are spoken (e.g., who spoke them, under what circumstances, with what tone, and what else
a
was said, particularly before the words), which would require a pragmatic analysis to extract. To date,

n
natural language processing is poorly developed and computers are not yet able to even approach the
ability of humans to extract meaning from natural languages; yet there are already valuable practical
applications of the technology.
y
How is CYC coded?
• By hand.
• Special CYCL language:
o LISP like.
o Frame based.
o Multiple inheritance.
o Slots are fully fledged objects.
o Generalized inheritance -- any link, not just isa and instance.
Genuine Randomness
-- Card games are a good example. We may not be able to predict any outcomes with certainty, but we have knowledge about the likelihood of certain events (e.g. being dealt an ace) and we can exploit this.
Exceptions
-- Symbolic methods can represent this. However, if the number of exceptions is large, such systems tend to break down; many common sense and expert reasoning tasks are examples. Statistical techniques can summarise large numbers of exceptions without resorting to enumeration.
Basic Statistical methods -- Probability
The basic approach statistical methods adopt to deal with uncertainty is via the axioms of probability:
• Probabilities are (real) numbers in the range 0 to 1.
• A probability of P(A) = 0 indicates total certainty that A will not occur, P(A) = 1 total certainty that it will, and values in between some degree of (un)certainty.
• Probabilities can be calculated in a number of ways.
Very Simply
Probability = (number of desired outcomes) / (total number of outcomes)
So given a pack of playing cards the probability of being dealt an ace from a full normal deck is 4 (the
number of aces) / 52 (number of cards in deck) which is 1/13. Similarly the probability of being dealt a
spade suit is 13 / 52 = 1/4.
If you have a choice of k items from a set of n items, then the number of ways of making this choice is n! / (k! * (n-k)!) (where ! = factorial). So the chance of winning the national lottery (choosing 6 from 49) is 49! / (6! * 43!) = 13,983,816 to 1.
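The choose-k-from-n count can be computed directly. A minimal sketch (the class and method names are our own, not from the text):

```java
public class Combinations {
    // Number of ways of choosing k items from n: n! / (k! * (n-k)!).
    // Built up incrementally so we never form the huge full factorials;
    // the division is exact at every step.
    public static long choose(int n, int k) {
        long result = 1;
        for (int i = 1; i <= k; i++) {
            result = result * (n - k + i) / i;
        }
        return result;
    }

    public static void main(String[] args) {
        // 6 numbers from 49: 13,983,816 possible tickets, so the chance
        // of winning is 1 in 13,983,816.
        System.out.println(choose(49, 6));  // 13983816
    }
}
```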
• Conditional probability, P(A|B), indicates the probability of event A given that we know event B has occurred.
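Conditional probability can be illustrated with the same card deck, using P(A|B) = P(A and B) / P(B) and counting equally likely outcomes (class and method names here are illustrative):

```java
public class ConditionalProbability {
    // P(ace | spade) = P(ace and spade) / P(spade), counted over a
    // standard 52-card deck of equally likely draws.
    public static double pAceGivenSpade() {
        double pAceAndSpade = 1.0 / 52.0;  // only the ace of spades
        double pSpade = 13.0 / 52.0;       // 13 spades in the deck
        return pAceAndSpade / pSpade;      // = 1/13
    }

    public static void main(String[] args) {
        // Knowing the card is a spade tells us nothing extra about
        // whether it is an ace: the result equals the plain P(ace).
        System.out.println(pAceGivenSpade());
    }
}
```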
UNIT-04
Game Playing and Natural Language Processing
UNIT-04/LECTURE-01

Game Playing
Formulating Game Playing as Search

• Consider 2-person, zero-sum, perfect information (i.e., both players have access to complete
information about the state of the game, so no information is hidden from either player) games.
Players alternate moves and there is no chance (e.g., using dice) involved
• Examples: Tic-Tac-Toe, Checkers, Chess, Go, Nim, and Othello
• Iterative methods apply here because search space is too large for interesting games to search
for a "solution." Therefore, search will be done before EACH move in order to select the best
next move to be made.
• Adversary methods needed because alternate moves are made by an opponent who is trying to win. Therefore must incorporate the idea that an adversary makes moves that are "not controllable" by you.
• Evaluation function is used to evaluate the "goodness" of a configuration of the game. Unlike in heuristic search, where the evaluation function was a non-negative estimate of the cost from the start node to a goal passing through the given node, here the evaluation function, also called the static evaluation function, estimates board quality in leading to a win for one player.
• Instead of modeling the two players separately, the zero-sum assumption and the fact that we don't have, in general, any information about how our opponent plays mean we'll use a single evaluation function to describe the goodness of a board with respect to BOTH players. That is, f(n) = large positive value means the board associated with node n is good for me and bad for you; f(n) = large negative value means the board is bad for me and good for you; f(n) near 0 means the board is a neutral position; f(n) = +infinity means a winning position for me; f(n) = -infinity means a winning position for you.
• Example of an Evaluation Function for Tic-Tac-Toe:
f(n) = [number of 3-lengths open for me] - [number of 3-lengths open for you]
where a 3-length is a complete row, column, or diagonal.
• Most evaluation functions are specified as a weighted sum of "features": (w1 * feat1) + (w2 * feat2) + ... + (wn * featn). For example, in chess some features evaluate piece placement on the board and other features describe configurations of several pieces. Deep Blue has about 6000 features in its evaluation function.
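The Tic-Tac-Toe evaluation can be coded directly. A sketch assuming a 3x3 char board with 'X' for me, 'O' for you and ' ' for empty (all names are our own):

```java
public class TicTacToeEval {
    // The 8 "3-lengths": 3 rows, 3 columns, 2 diagonals,
    // each given as three (row, col) squares.
    private static final int[][][] LINES = {
        {{0,0},{0,1},{0,2}}, {{1,0},{1,1},{1,2}}, {{2,0},{2,1},{2,2}},
        {{0,0},{1,0},{2,0}}, {{0,1},{1,1},{2,1}}, {{0,2},{1,2},{2,2}},
        {{0,0},{1,1},{2,2}}, {{0,2},{1,1},{2,0}}
    };

    // f(n) = [3-lengths open for 'me'] - [3-lengths open for 'you'].
    // A line is open for a player if the opponent occupies none of its squares.
    public static int evaluate(char[][] board, char me, char you) {
        int score = 0;
        for (int[][] line : LINES) {
            boolean openForMe = true, openForYou = true;
            for (int[] sq : line) {
                char c = board[sq[0]][sq[1]];
                if (c == you) openForMe = false;
                if (c == me)  openForYou = false;
            }
            score += (openForMe ? 1 : 0) - (openForYou ? 1 : 0);
        }
        return score;
    }

    public static void main(String[] args) {
        // X in a corner, O in the center: O blocks 3 lines for X,
        // X blocks 4 lines for O, so f = (8-4) - (8-3) = -1.
        char[][] board = {
            {'X', ' ', ' '},
            {' ', 'O', ' '},
            {' ', ' ', ' '}
        };
        System.out.println(evaluate(board, 'X', 'O'));  // -1
    }
}
```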

Game Trees

• Root node represents the configuration of the board at which a decision must be made as to
what is the best single move to make next. If it is my turn to move, then the root is labeled a
"MAX" node indicating it is my turn; otherwise it is labeled a "MIN" node to indicate it is my
opponent's turn.
• Arcs represent the possible legal moves for the player that the arcs emanate from
• Each level of the tree has nodes that are all MAX or all MIN; since moves alternate, the nodes at
level i are of the opposite kind from those at level i+1
Searching Game Trees using the Minimax Algorithm

Steps used in picking the next move:

1. Create start node as a MAX node (since it's my turn to move) with current board configuration
2. Expand nodes down to some depth (i.e., ply) of lookahead in the game
3. Apply the evaluation function at each of the leaf nodes
4. "Back up" values for each of the non-leaf nodes until a value is computed for the root node. At
MIN nodes, the backed up value is the minimum of the values associated with its children. At
MAX nodes, the backed up value is the maximum of the values associated with its children.
5. Pick the operator associated with the child node whose backed up value determined the value at
the root

Note: The above process of "backing up" values gives the optimal strategy that BOTH players would
follow given that they both have the information computed at the leaf nodes by the evaluation function.
This is implicitly assuming that your opponent is using the same static evaluation function you are, and
that they are applying it at the same set of nodes in the search tree.

Minimax Algorithm in Java

// The pseudocode made concrete. State, isLeaf, staticEvaluation,
// isMaxNode and successorsOf are assumed to be supplied by the game.
public int minimax(State s)
{
    if (isLeaf(s))
        return staticEvaluation(s);

    // back up values from the successors s1, s2, ..., sk of s
    List<State> successors = successorsOf(s);
    int[] v = new int[successors.size()];
    for (int i = 0; i < successors.size(); i++)
        v[i] = minimax(successors.get(i));

    int best = v[0];
    for (int value : v)
        best = isMaxNode(s) ? Math.max(best, value) : Math.min(best, value);
    return best;
}

Example of Minimax Algorithm

For example, in a 2-ply search, the MAX player considers all (3) possible moves. The opponent MIN also considers all possible moves. The evaluation function is applied to the leaf level only.

MAX              S
              /  |  \
MIN          A   B   C
            /|\  / \  / \
           D E F G  H I  J
         100 3 -1 6  5 2  9

Once the static evaluation function is applied at the leaf nodes, backing up values can begin. First we
compute the backed-up values at the parents of the leaves. Node A is a MIN node corresponding to the
fact that it is a position where it's the opponent's turn to move. A's backed-up value is -1 (= min(100, 3, -1)), meaning that if the opponent ever reaches the board associated with this node, then it will pick the move associated with the arc from A to F. Similarly, B's backed-up value is 5 (corresponding to child H) and C's backed-up value is 2 (corresponding to child I).

Next, we backup values to the next higher level, in this case to the MAX node S. Since it is our turn to
move at this node, we select the move that looks best based on the backed-up values at each of S's
children. In this case the best child is B since B's backed-up value is 5 (= max(-1, 5, 2)). So the minimax
value for the root node S is 5, and the move selected based on this 2-ply search is the move associated
with the arc from S to B.

It is important to notice that the backed-up values are used at nodes A, B, and C to evaluate which is
best for S; we do not apply the static evaluation function at any non-leaf node. Why? Because it is
assumed that the values computed at nodes farther ahead in the game (and therefore lower in the tree)
are more accurate evaluations of quality and therefore are preferred over the evaluation function values
if applied at the higher levels of the tree.

Notice that, in general, the backed-up value of a node changes as we search more plies. For example, A's
backed-up value is -1. But if we had searched one more ply, D, E and F will have their own backed-up
values, which are almost certainly going to be different from 100, 3 and -1, respectively. And, in turn, A
will likely not have -1 as its backed-up value. We are implicitly assuming that the deeper we search, the
better the quality of the final outcome.
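The backing-up of values in this example can be reproduced in a few lines. A sketch hard-coding the leaf values of the 2-ply tree above (the array layout and names are ours):

```java
public class MinimaxExample {
    // Leaf values grouped per MIN node, from the 2-ply example:
    // A -> {100, 3, -1}, B -> {6, 5}, C -> {2, 9}; the root S is MAX.
    public static int backedUpRootValue(int[][] leavesPerMinNode) {
        int best = Integer.MIN_VALUE;
        for (int[] leaves : leavesPerMinNode) {
            int worst = Integer.MAX_VALUE;       // MIN backs up the minimum
            for (int v : leaves) worst = Math.min(worst, v);
            best = Math.max(best, worst);        // MAX backs up the maximum
        }
        return best;
    }

    public static void main(String[] args) {
        int[][] tree = {{100, 3, -1}, {6, 5}, {2, 9}};
        // min(100,3,-1) = -1, min(6,5) = 5, min(2,9) = 2; max(-1,5,2) = 5
        System.out.println(backedUpRootValue(tree));  // 5
    }
}
```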

Alpha-Beta Pruning

• Minimax computes the optimal playing strategy but does so inefficiently, because it first generates a complete tree and then computes and backs up static-evaluation-function values. For example, from an average chess position there are 38 possible moves. So, looking ahead 12 plies involves generating 1 + 38 + 38^2 + ... + 38^12 = (38^13 - 1)/(38 - 1) nodes, and applying the static evaluation function at 38^12 (about 9 billion billion) positions, which is far beyond the capabilities of any computer in the foreseeable future. Can we devise another algorithm that is guaranteed to produce the same result (i.e., minimax value at the root) but does less work (i.e., generates fewer nodes)? Yes -- Alpha-Beta.
• Basic idea: "If you have an idea that is surely bad, don't take the time to see how truly awful it is." -- Pat Winston
Example of how to use this idea for pruning away useless work:

MAX            S           minimax val >= 100
             /   \
MIN         A     B        minimax val <= 20
           / \   /|\
        200  100 D E ?
               120 20

In the above example we are performing a depth-first search to depth (ply) 2, where children
are generated and visited left-to-right. At this stage of the search we have just finished
generating B's second child, E, and computed the static evaluation function at E (=20). Before
generating B's third child notice the current situation: S is a MAX node and its left child A has a
minimax value of 100, so S's minimax value must eventually be some number >= 100. Similarly,
B has generated two children, D and E, with values 120 and 20, respectively, so B's final minimax
value must be <= min(120, 20) = 20 since B is a MIN node.

The fact that S's minimax value must be at least 100 while B's minimax value must be no greater than 20 means that no matter what value is computed for B's third child, S's minimax value will be 100. In other words, S's minimax value does not depend on knowing the value of B's third child. Hence, we can cutoff the search below B, ignoring generating any other children after D and E.

Alpha-Beta Algorithm

• Traverse the search tree in depth-first order


• Assuming we stop the search at ply d, then at each of these nodes we generate, we apply the
static evaluation function and return this value to the node's parent
• At each non-leaf node, store a value indicating the best backed-up value found so far. At MAX
nodes we'll call this alpha, and at MIN nodes we'll call the value beta. In other words, alpha =
best (i.e., maximum) value found so far at a MAX node (based on its descendant's values). Beta =
best (i.e., minimum) value found so far at a MIN node (based on its descendant's values).
• The alpha value (of a MAX node) is monotonically non-decreasing
• The beta value (of a MIN node) is monotonically non-increasing
• Given a node n, cutoff the search below n (i.e., don't generate any more of n's children) if
o n is a MAX node and alpha(n) >= beta(i) for some MIN node ancestor i of n. This is called
a beta cutoff.
o n is a MIN node and beta(n) <= alpha(i) for some MAX node ancestor i of n. This is called
an alpha cutoff.
In the example shown above an alpha cutoff occurs at node B because beta(B) = 20 < alpha(S) = 100.

An example of a beta cutoff at node B (because alpha(B) = 25 > beta(S) = 20) is shown below:

MIN            S           beta = 20
             /   \
MAX         A     B        alpha = 25
           / \   /|\
         20  -10 D E ?
               -20 25
• To avoid searching for the ancestor nodes in order to make the above tests, we can carry down the tree the best values found so far at the ancestors. That is, at a MAX node n, beta = minimum of all the beta values at MIN node ancestors of n. Similarly, at a MIN node n, alpha = maximum of all the alpha values at MAX node ancestors of n. Thus, now at each non-leaf node we'll store both an alpha value and a beta value.
• Initially, assign to the root values of alpha = -infinity and beta = +infinity.
• See the text for a pseudocode description of the full Alpha-Beta algorithm.
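One compact recursive formulation of Alpha-Beta is sketched below (our own version, not the text's pseudocode); a counter makes the pruning visible when it is run on the 2-ply example tree from the Minimax section:

```java
import java.util.List;

public class AlphaBeta {
    // A game-tree node: either a leaf carrying a static value, or an
    // internal node with children.  (This tree class is our own.)
    static class Node {
        final Integer value;           // non-null only at leaves
        final List<Node> children;
        Node(int value) { this.value = value; this.children = List.of(); }
        Node(Node... kids) { this.value = null; this.children = List.of(kids); }
        boolean isLeaf() { return value != null; }
    }

    static int leavesEvaluated = 0;    // to observe how much is pruned

    // Depth-first Alpha-Beta; 'max' says whose turn it is at node n.
    static int alphaBeta(Node n, int alpha, int beta, boolean max) {
        if (n.isLeaf()) { leavesEvaluated++; return n.value; }
        for (Node child : n.children) {
            int v = alphaBeta(child, alpha, beta, !max);
            if (max) alpha = Math.max(alpha, v);
            else     beta  = Math.min(beta, v);
            if (alpha >= beta) break;  // cutoff: remaining children pruned
        }
        return max ? alpha : beta;
    }

    public static void main(String[] args) {
        // The 2-ply example: S(MAX) -> A{100,3,-1}, B{6,5}, C{2,9}.
        Node s = new Node(
            new Node(new Node(100), new Node(3), new Node(-1)),
            new Node(new Node(6), new Node(5)),
            new Node(new Node(2), new Node(9)));
        int v = alphaBeta(s, Integer.MIN_VALUE, Integer.MAX_VALUE, true);
        // Same root value as Minimax (5), but C's second leaf (9) is
        // never evaluated: only 6 of the 7 leaves are visited.
        System.out.println(v + " after " + leavesEvaluated + " leaf evaluations");
    }
}
```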

Example of Alpha-Beta Algorithm on a 3-Ply Search Tree

Consider a 3-ply search tree (figure not reproduced in this copy) in which a beta cutoff occurs at node F and alpha cutoffs occur at nodes C and D. In this case we've pruned 10 nodes (O,H,R,S,I,T,U,K,Y,Z) from the 26 that are generated by Minimax.

Effectiveness of Alpha-Beta
• Alpha-Beta is guaranteed to compute the same minimax value for the root node as computed by Minimax.
• In the worst case Alpha-Beta does NO pruning, examining b^d leaf nodes, where each node has b children and a d-ply search is performed.
• In the best case, Alpha-Beta will examine only about 2b^(d/2) leaf nodes. Hence if you hold fixed the number of leaf nodes (as a measure of the amount of time you have allotted before a decision must be made), then you can search twice as deep as Minimax!
• The best case occurs when each player's best move is the leftmost alternative (i.e., the first child generated). So, at MAX nodes the child with the largest value is generated first, and at MIN nodes the child with the smallest value is generated first.
• In the chess program Deep Blue, they found empirically that Alpha-Beta pruning meant that the average branching factor at each node was about 6 instead of about 35-40.

Cutting off Search (or, when to stop and apply the evaluation function)
So far we have assumed a fixed depth d where the search is stopped and the static evaluation function is applied. But there are variations on this that are important to note:

• Don't stop at non-quiescent nodes. If a node represents a state in the middle of an exchange of pieces, then the node is not quiescent and therefore the evaluation function may not give a reliable estimate of board quality. Or, another definition for chess: "a state is non-quiescent if any piece is attacked by one of lower value, or by more pieces than defenses, or if any check exists on a square controlled by the opponent." In this case, expand more nodes and only apply the evaluation function at quiescent nodes.
• The identification of non-quiescent nodes partially deals with the horizon effect. A negative horizon is where the state seen by the evaluation function is evaluated as better than it really is because an undesirable effect is just beyond this node (i.e., the search horizon). A positive horizon is where the evaluation function wrongly underestimates the value of a state when positive actions just over the search horizon indicate otherwise.
• Iterative Deepening is frequently used with Alpha-Beta so that searches to successively deeper plies can be attempted if there is time, and the move selected is the one computed by the deepest search completed when the time limit is reached.
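The iterative-deepening loop can be sketched as follows; `depthLimitedSearch` here is a hypothetical stand-in for a real depth-limited Minimax/Alpha-Beta, so only the control structure is meaningful:

```java
public class IterativeDeepening {
    // Run depth-limited searches at increasing ply, keeping the move
    // chosen by the deepest search that completed before the deadline.
    public static int chooseMove(int maxDepth, long timeBudgetMillis) {
        long deadline = System.currentTimeMillis() + timeBudgetMillis;
        int bestMove = -1;
        for (int depth = 1; depth <= maxDepth
                && System.currentTimeMillis() < deadline; depth++) {
            bestMove = depthLimitedSearch(depth);  // deepest completed result wins
        }
        return bestMove;
    }

    // Stand-in: a real search would return the best move found at this ply.
    static int depthLimitedSearch(int depth) {
        return depth;
    }

    public static void main(String[] args) {
        System.out.println(chooseMove(6, 100));
    }
}
```

Note that earlier, shallower results are not wasted: they cost little compared with the final iteration, and with Alpha-Beta they can seed good move ordering for the next, deeper pass.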

Natural Language Processing

Natural Language Processing (NLP) is the process of computer analysis of input provided in a
human language (natural language), and conversion of this input into a useful form of
representation.

The field of NLP is primarily concerned with getting computers to perform useful and interesting
tasks with human languages. The field of NLP is secondarily concerned with helping us come to
a better understanding of human language.

• The input/output of a NLP system can be: –


written text


– speech
We will mostly concerned with written text (not speech). m
• To process written text, we need:
o
.c
– lexical, syntactic, semantic knowledge about the language –
discourse information, real world knowledge

a
• To process spoken language, we need everything required to process written text, plus
the challenges of speech recognition and speech synthesis.

There are two components of NLP:

1. Natural Language Understanding
– Mapping the given input in the natural language into a useful representation.
– Different levels of analysis required: morphological analysis, syntactic analysis, semantic analysis, discourse analysis, ...
2. Natural Language Generation
– Producing output in the natural language from some internal representation.
– Different levels of synthesis required: deep planning (what to say), syntactic generation.

NL Understanding is much harder than NL Generation; but still, both of them are hard.

The difficulty in NL understanding arises from the following facts:

• Natural language is extremely rich in form and structure, and very ambiguous.
– How to represent meaning,
– Which structures map to which meaning structures.
• One input can mean many different things. Ambiguity can be at different levels.
– Lexical (word level) ambiguity -- different meanings of words
– Syntactic ambiguity -- different ways to parse the sentence
– Interpreting partial information -- how to interpret pronouns
– Contextual information -- context of the sentence may affect the meaning of that sentence.
• Many inputs can mean the same thing.
• Interaction among components of the input is not clear.

The following language related information are useful in NLP:

• Phonology – concerns how words are related to the sounds that realize them.

• Morphology – concerns how words are constructed from more basic meaning
units called morphemes. A morpheme is the primitive unit of meaning in a
language.

• Syntax – concerns how words can be put together to form correct sentences, and determines what structural role each word plays in the sentence and what phrases are subparts of other phrases.

• Semantics – concerns what words mean and how these meanings combine in sentences to form sentence meanings. The study of context-independent meaning.

• Pragmatics – concerns how sentences are used in different situations and how use affects the interpretation of the sentence.

• Discourse – concerns how the immediately preceding sentences affect the interpretation of the next sentence. For example, interpreting pronouns and interpreting the temporal aspects of the information.

• World Knowledge – includes general knowledge about the world. What each language user must know about the other's beliefs and goals.
Natural Language Understanding

The steps in natural language understanding are as follows:

Words
↓ Morphological Analysis
Morphologically analyzed words (another step: POS tagging)
↓ Syntactic Analysis
Syntactic Structure
↓ Semantic Analysis
Context-independent meaning representation
↓ Discourse Processing
Final meaning representation
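The pipeline above can be caricatured in code. The sketch below implements only a toy morphological step and skips the later stages entirely; everything here is illustrative, not a real analyzer:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ToyNLU {
    // Morphological analysis (toy): mark a plural "s" on the stem.
    // Real morphological analyzers handle far more than this one rule.
    static String morph(String word) {
        return word.endsWith("s") && word.length() > 3
                ? word.substring(0, word.length() - 1) + "+PL"
                : word;
    }

    // Words -> morphologically analyzed words.  The later stages
    // (syntactic, semantic, discourse analysis) are omitted; this only
    // illustrates the shape of the pipeline.
    public static List<String> analyze(String sentence) {
        return Arrays.stream(sentence.toLowerCase().split("\\s+"))
                     .map(ToyNLU::morph)
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(analyze("The dogs bark"));  // [the, dog+PL, bark]
    }
}
```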
UNIT-05
Learning, Common Sense, Expert Systems

Introduction to Learning
Machine Learning is the study of how to build computer systems that adapt and improve
with experience. It is a subfield of Artificial Intelligence and intersects with cognitive
science, information theory, and probability theory, among others.

Classical AI deals mainly with deductive reasoning; learning represents inductive reasoning. Deductive reasoning arrives at answers to queries relating to a particular situation starting from a set of general axioms, whereas inductive reasoning arrives at general axioms from a set of particular instances.

Classical AI often suffers from the knowledge acquisition problem in real life applications, where obtaining and updating the knowledge base is costly and prone to errors. Machine learning serves to solve the knowledge acquisition bottleneck by obtaining the result from data by induction.
Machine learning is particularly attractive in several real life problems because of the following reasons:

• Some tasks cannot be defined well except by example
• Working environment of machines may not be known at design time
• Explicit knowledge encoding may be difficult and not available
• Environments change over time
• Biological systems learn
Recently, learning is widely used in a number of application areas including:

• Data mining and knowledge discovery
• Speech/image/video (pattern) recognition
• Adaptive control
• Autonomous vehicles/robots
• Decision support systems
• Bioinformatics
• WWW

Formally, a computer program is said to learn from experience E with respect to some
class of tasks T and performance measure P, if its performance at tasks in T, as measured
by P, improves with experience E.
Thus a learning system is characterized by:

• a task T,
• experience E, and
• a performance measure P.

Examples: Learning to play chess
T: Play chess
P: Percentage of games won in world tournament
E: Opportunity to play against self or other players

Learning to drive a van
T: Drive on a public highway using vision sensors
P: Average distance traveled before an error (according to human observer)
E: Sequence of images and steering actions recorded during human driving.

a
The block diagram of a generic learning system which can realize the above definition is not reproduced in this copy; the system consists of the following components:

1. Goal: Defined with respect to the task to be performed by the system


2. Model: A mathematical function which maps perception to actions
3. Learning rules: Which update the model parameters with new experience such
that the performance measures with respect to the goals is optimized
4. Experience: A set of perception (and possibly the corresponding actions)

12.1.1 Taxonomy of Learning Systems

Several classification of learning systems are possible based on the above components
as follows:

Goal/Task/Target Function:

Prediction: To predict the desired output for a given input based on previous input/output
pairs. E.g., to predict the value of a stock given other inputs like market index, interest
rates etc.

Categorization: To classify an object into one of several categories based on features of the object. E.g., a robotic vision system to categorize a machine part into one of the categories, spanner, hammer etc. based on the part's dimension and shape.

Clustering: To organize a group of objects into homogeneous segments. E.g., a satellite image analysis system which groups land areas into forest, urban and water body, for better utilization of natural resources.

Planning: To generate an optimal sequence of actions to solve a particular problem. E.g., an Unmanned Air Vehicle which plans its path to obtain a set of pictures and avoid enemy anti-aircraft guns.
Models:

• Propositional and FOL rules
• Decision trees
• Linear separators
• Neural networks
• Graphical models
• Temporal models like hidden Markov models

Learning Rules:

Learning rules are often tied up with the model of learning used. Some common rules are
gradient descent, least square error, expectation maximization and margin maximization.
Experiences:

Learning algorithms use experiences in the form of perceptions or perception-action pairs to improve their performance. The nature of experiences available varies with applications. Some common situations are described below.

Supervised learning: In supervised learning a teacher or oracle is available which provides the desired action corresponding to a perception. A set of perception-action pairs provides what is called a training set. Examples include an automated vehicle where a set of vision inputs and the corresponding steering actions are available to the learner.

Unsupervised learning: In unsupervised learning no teacher is available. The learner only discovers persistent patterns in the data, consisting of a collection of perceptions. This is also called exploratory learning. Finding out malicious network attacks from a sequence of anomalous data packets is an example of unsupervised learning.

Active learning: Here not only is a teacher available, but the learner also has the freedom to ask the teacher for suitable perception-action example pairs which will help the learner to improve its performance. Consider a news recommender system which tries to learn a user's preferences and categorize news articles as interesting or uninteresting to the user. The system may present a particular article (of which it is not sure) to the user and ask whether it is interesting or not.

Reinforcement learning: In reinforcement learning a teacher is available, but the teacher, instead of directly providing the desired action corresponding to a perception, returns reward and punishment to the learner for its action corresponding to a perception. Examples include a robot in an unknown terrain, where it gets a punishment when it hits an obstacle and a reward when it moves smoothly.
In order to design a learning system the designer has to make the following choices based on the application.
12.1.2 Mathematical formulation of the inductive learning problem

• Extrapolate from a given set of examples so that we can make accurate predictions about future examples.
• Supervised versus Unsupervised learning: We want to learn an unknown function f(x) = y, where x is an input example and y is the desired output. Supervised learning implies we are given a set of (x, y) pairs by a "teacher." Unsupervised learning means we are only given the xs. In either case, the goal is to estimate f.

Inductive Bias
• Inductive learning is an inherently conjectural process because any knowledge created by generalization from specific facts cannot be proven true; it can only be proven false. Hence, inductive inference is falsity preserving, not truth preserving.
To generalize beyond the specific training examples, we need constraints or biases on
what f is best. That is, learning can be viewed as searching the Hypothesis Space H of
possible f functions.
• A bias allows us to choose one f over another one
• A completely unbiased inductive algorithm could only memorize the training examples
and could not say anything more about other unseen examples.
• Two types of biases are commonly used in machine learning:
o Restricted Hypothesis Space Bias
Allow only certain types of f functions, not arbitrary ones

o Preference Bias
Define a metric for comparing fs so as to determine whether one is better than
another

Inductive Learning Framework

• Raw input data from sensors are preprocessed to obtain a feature vector, x, that
adequately describes all of the relevant features for classifying examples.
• Each x is a list of (attribute, value) pairs. For example,

x = (Person = Sue, Eye-Color = Brown, Age = Young, Sex = Female)

The number of attributes (also called features) is fixed (positive, finite). Each
attribute has a fixed, finite number of possible values.

Each example can be interpreted as a point in an n-dimensional feature space, where n is the
number of attributes.
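Viewing each example as a point in feature space suggests one of the simplest learners: predict the label of the nearest stored training example. A sketch over discrete (attribute, value) features (all names and data here are illustrative):

```java
import java.util.List;
import java.util.Map;

public class NearestNeighbor {
    // Distance between two examples = number of attributes on which
    // they differ (Hamming distance over the fixed attribute set).
    static int distance(Map<String, String> a, Map<String, String> b) {
        int d = 0;
        for (String attr : a.keySet())
            if (!a.get(attr).equals(b.get(attr))) d++;
        return d;
    }

    // Classify x by the label of the closest training example.
    public static String classify(List<Map<String, String>> xs,
                                  List<String> ys,
                                  Map<String, String> x) {
        int best = 0;
        for (int i = 1; i < xs.size(); i++)
            if (distance(xs.get(i), x) < distance(xs.get(best), x)) best = i;
        return ys.get(best);
    }

    public static void main(String[] args) {
        List<Map<String, String>> xs = List.of(
            Map.of("Eye-Color", "Brown", "Age", "Young", "Sex", "Female"),
            Map.of("Eye-Color", "Blue",  "Age", "Old",   "Sex", "Male"));
        List<String> ys = List.of("yes", "no");
        Map<String, String> query =
            Map.of("Eye-Color", "Brown", "Age", "Old", "Sex", "Female");
        // The query differs from the first example in 1 attribute and
        // from the second in 2, so it takes the first example's label.
        System.out.println(classify(xs, ys, query));  // yes
    }
}
```

The distance metric is exactly the bias here: preferring the nearest point is a preference bias in the sense described above.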
Neural Networks
Artificial neural networks are among the most powerful learning models. They have the
versatility to approximate a wide range of complex functions representing multi-dimensional
input-output maps. Neural networks also have inherent adaptability, and can perform robustly
even in noisy environments.

An Artificial Neural Network (ANN) is an information processing paradigm that is inspired by


the way biological nervous systems, such as the brain, process information. The key element of
this paradigm is the novel structure of the information processing system. It is composed of a
large number of highly interconnected simple processing elements (neurons) working in unison
to solve specific problems. ANNs, like people, learn by example. An ANN is configured for a
specific application, such as pattern recognition or data classification, through a learning process.
Learning in biological systems involves adjustments to the synaptic connections that exist
between the neurons. This is true of ANNs as well. ANNs can process information at a great
speed owing to their highly massive parallelism.

Neural networks, with their remarkable ability to derive meaning from complicated or imprecise data, can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. A trained neural network can be thought of as an "expert" in the category of information it has been given to analyse. This expert can then be used to provide projections given new situations of interest and answer "what if" questions. Other advantages include:

• Adaptive learning: An ability to learn how to do tasks based on the data given for training or initial experience.
• Self-Organisation: An ANN can create its own organisation or representation of the information it receives during learning time.
• Real Time Operation: ANN computations may be carried out in parallel, and special hardware devices are being designed and manufactured which take advantage of this capability.
• Fault Tolerance via Redundant Information Coding: Partial destruction of a network leads to the corresponding degradation of performance. However, some network capabilities may be retained even with major network damage.
Artificial Neural Networks

Artificial neural networks are represented by a set of nodes, often arranged in layers, and a set of weighted directed links connecting them. The nodes are equivalent to neurons, while the links denote synapses. The nodes are the information processing units and the links act as communicating media.

There are a wide variety of networks depending on the nature of information processing carried
out at individual nodes, the topology of the links, and the algorithm for adaptation of link
weights. Some of the popular among them include:

Perceptron: This consists of a single neuron with multiple inputs and a single output. It has
restricted information processing capability. The information processing is done through a
transfer function which is either linear or non-linear.
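A perceptron of this kind can be written in a few lines. The sketch below uses a step transfer function and the classical perceptron learning rule to learn the (linearly separable) AND function; the epoch count and learning rate are arbitrary choices of ours:

```java
public class Perceptron {
    double[] w = new double[3];   // w[0] is the bias weight

    // Output: weighted sum of the inputs passed through a step function.
    int predict(int x1, int x2) {
        double sum = w[0] + w[1] * x1 + w[2] * x2;
        return sum > 0 ? 1 : 0;
    }

    // Perceptron learning rule: w <- w + rate * (target - output) * input.
    void train(int[][] xs, int[] targets, int epochs, double rate) {
        for (int e = 0; e < epochs; e++)
            for (int i = 0; i < xs.length; i++) {
                int err = targets[i] - predict(xs[i][0], xs[i][1]);
                w[0] += rate * err;            // bias input is always 1
                w[1] += rate * err * xs[i][0];
                w[2] += rate * err * xs[i][1];
            }
    }

    public static void main(String[] args) {
        int[][] xs = {{0,0}, {0,1}, {1,0}, {1,1}};
        int[] and = {0, 0, 0, 1};  // AND is linearly separable, so it converges
        Perceptron p = new Perceptron();
        p.train(xs, and, 25, 0.1);
        for (int[] x : xs)
            System.out.println(x[0] + " AND " + x[1] + " = " + p.predict(x[0], x[1]));
    }
}
```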

Multi-layered Perceptron (MLP): It has a layered architecture consisting of input, hidden and
output layers. Each layer consists of a number of perceptrons. The output of each layer is
transmitted to the input of nodes in other layers through weighted links. Usually, this
transmission is done only to nodes of the next layer, leading to what are known as feed forward
networks. MLPs were proposed to extend the limited information processing capabilities of simple perceptrons, and are highly versatile in terms of their approximation ability. Training or weight adaptation is done in MLPs using supervised backpropagation learning.
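A forward pass through a tiny MLP can be shown with hand-picked weights (training them by backpropagation is omitted here). The hidden layer below computes OR and AND of the two inputs, and the output layer combines them into XOR, a function a single perceptron cannot represent:

```java
public class TinyMLP {
    static int step(double x) { return x > 0 ? 1 : 0; }

    // 2 inputs -> 2 hidden units -> 1 output, all with step activations.
    // Weights are hand-picked, not learned:
    //   h1 = OR(x1, x2), h2 = AND(x1, x2), y = h1 AND NOT h2 = XOR(x1, x2).
    public static int forward(int x1, int x2) {
        int h1 = step(x1 + x2 - 0.5);   // OR
        int h2 = step(x1 + x2 - 1.5);   // AND
        return step(h1 - h2 - 0.5);     // XOR
    }

    public static void main(String[] args) {
        for (int x1 = 0; x1 <= 1; x1++)
            for (int x2 = 0; x2 <= 1; x2++)
                System.out.println(x1 + " XOR " + x2 + " = " + forward(x1, x2));
    }
}
```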

Recurrent Neural Networks: RNN topology involves backward links from the output to the input and hidden layers. The notion of time is encoded in the RNN information processing scheme. They are thus used in applications like speech processing where the inputs are time-sequence data.
Self-Organizing Maps: SOMs or Kohonen networks have a grid topology, with unequal grid weights. The topology of the grid provides a low-dimensional visualization of the data distribution. They are thus used in applications which typically involve organization and human browsing of a large volume of data. Learning is performed using a winner-take-all strategy in an unsupervised mode.
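One winner-take-all update can be sketched as below; the tiny one-dimensional grid, learning rate and squared-distance measure are illustrative assumptions (a full SOM would also update the winner's grid neighbours):

```python
def winner(grid, x):
    # Index of the unit whose weight vector is closest to the input
    return min(range(len(grid)),
               key=lambda i: sum((w - xi) ** 2 for w, xi in zip(grid[i], x)))

def som_update(grid, x, lr=0.5):
    # Winner-take-all, unsupervised: only the winning unit's weights move
    # toward the presented input; no target output is involved.
    i = winner(grid, x)
    grid[i] = [w + lr * (xi - w) for w, xi in zip(grid[i], x)]
    return i

grid = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]  # a tiny 1-D map of 3 units
print(som_update(grid, [0.9, 1.1]))          # unit 1 wins and moves toward the input
```

Repeating this over many inputs pulls different grid units toward different regions of the data, which is what produces the low-dimensional visualization mentioned above.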
Common Sense
In artificial intelligence research, commonsense knowledge is the collection of facts and information that an ordinary person is expected to know. The commonsense knowledge problem is the ongoing project in the field of knowledge representation (a sub-field of artificial intelligence) to create a commonsense knowledge base: a database containing all the general knowledge that most people possess, represented in a way that makes it available to artificial intelligence programs that use natural language or make inferences about the ordinary world. Such a database is a type of ontology, of which the most general are called upper ontologies.
The problem is considered to be among the hardest in all of AI research because the breadth and detail of commonsense knowledge is enormous. Any task that requires commonsense knowledge is considered AI-complete: to be done as well as a human being does it, it requires the machine to appear as intelligent as a human being. These tasks include machine translation, object recognition, text mining and many others. To do these tasks perfectly, the machine simply has to know what the text is talking about or what objects it may be looking at, and this is impossible in general unless the machine is familiar with all the same concepts that an ordinary person is familiar with.
Information in a commonsense knowledge base may include, but is not limited to, the following:
• An ontology of classes and individuals
• Parts and materials of objects
• Functions and uses of objects
• Properties of objects (such as color and size)
• Locations of objects and layouts of locations
• Locations of actions and events
• Durations of actions and events
• Preconditions of actions and events
• Effects (postconditions) of actions and events
• Subjects and objects of actions
• Behaviors of devices
• Stereotypical situations or scripts
• Human goals and needs
• Emotions
• Plans and strategies
• Story themes
• Contexts
Expert Systems
An expert system is a computer program that contains some of the subject-specific knowledge of one or more human experts. Expert systems are meant to solve real problems which would normally require a specialized human expert (such as a doctor or a mineralogist). Building an expert system therefore first involves extracting the relevant knowledge from the human expert. Such knowledge is often heuristic in nature, based on useful "rules of thumb" rather than absolute certainties. Extracting it from the expert in a form that can be used by a computer is generally a difficult task, requiring its own expertise. A knowledge engineer has the job of extracting this knowledge and building the expert system's knowledge base.
A first attempt at building an expert system is unlikely to be very successful. This is partly because the expert generally finds it very difficult to express exactly what knowledge and rules they use to solve a problem. Much of it is almost subconscious, or appears so obvious that they don't even bother mentioning it. Knowledge acquisition for expert systems is a big area of research, with a wide variety of techniques developed. However, generally it is important to develop an initial prototype based on information extracted by interviewing the expert, then iteratively refine it based on feedback both from the expert and from potential users of the expert system.
In order to do such iterative development from a prototype, it is important that the expert system is written in a way that it can easily be inspected and modified. The system should be able to explain its reasoning (to the expert, user and knowledge engineer) and answer questions about the solution process. Updating the system shouldn't involve rewriting a whole lot of code - just adding or deleting localized chunks of knowledge.
The most widely used knowledge representation scheme for expert systems is rules. Typically, the rules won't have certain conclusions - there will just be some degree of certainty that the conclusion will hold if the conditions hold. Statistical techniques are used to determine these certainties. Rule-based systems, with or without certainties, are generally easily modifiable and make it easy to provide reasonably helpful traces of the system's reasoning. These traces can be used in providing explanations of what it is doing.
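One classic scheme for such certainties (MYCIN-style certainty factors, a named technique the notes do not go into) combines two positive pieces of evidence for the same conclusion as cf = cf1 + cf2*(1 - cf1), so evidence accumulates without ever exceeding full certainty:

```python
def combine_cf(cf1, cf2):
    # Two positive certainty factors supporting the same conclusion
    # reinforce each other but can never exceed full certainty (1.0).
    return cf1 + cf2 * (1 - cf1)

print(combine_cf(0.6, 0.5))  # two moderately certain rules combine to about 0.8
```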
Expert systems have been used to solve a wide range of problems in domains such as medicine, mathematics, engineering, geology, computer science, business, law, defence and education. Within each domain, they have been used to solve problems of different types. Types of problem include diagnosis (e.g., of a system fault, disease or student error); design (of a computer system, hotel etc.); and interpretation (of, for example, geological data). The appropriate problem solving technique tends to depend more on the problem type than on the domain. Whole books have been written on how to choose your knowledge representation and reasoning methods given the characteristics of your problem.
The following figure shows the most important modules that make up a rule-based expert system. The user interacts with the system through a user interface (which may use menus, natural language or any other style of interaction). Then an inference engine is used to reason with both the expert knowledge (extracted from our friendly expert) and data specific to the particular problem being solved. The expert knowledge will typically be in the form of a set of IF-THEN rules. The case-specific data includes both data provided by the user and partial conclusions (along with certainty measures) based on this data. In a simple forward-chaining rule-based system the case-specific data will be the elements in working memory.
[Figure: modules of a rule-based expert system - user interface, inference engine, knowledge base and case-specific data, with the general-purpose shell components shown in a dotted box]
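The forward-chaining loop described above can be sketched as a toy inference engine; the rules and facts below are invented purely for illustration and carry no certainty measures:

```python
def forward_chain(rules, facts):
    # Working memory starts with the user-supplied facts; a rule fires
    # whenever all of its IF-conditions are present, adding its THEN
    # conclusion, until nothing new can be derived.
    memory = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in memory and all(c in memory for c in conditions):
                memory.add(conclusion)
                changed = True
    return memory

# Invented toy IF-THEN rules (not from any real expert system)
rules = [
    (["has_fever", "has_rash"], "suspect_measles"),
    (["suspect_measles"], "recommend_specialist"),
]
print(sorted(forward_chain(rules, ["has_fever", "has_rash"])))
```

A trace of which rules fired, and in what order, is exactly the raw material an explanation subsystem would present to the user.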
Almost all expert systems also have an explanation subsystem, which allows the program to explain its reasoning to the user. Some systems also have a knowledge base editor, which helps the expert or knowledge engineer to easily update and check the knowledge base.
One important feature of expert systems is the way they (usually) separate domain-specific knowledge from more general-purpose reasoning and representation techniques. The general-purpose part (in the dotted box in the figure) is referred to as an expert system shell. As we see in the figure, the shell provides the inference engine (and knowledge representation scheme), a user interface, an explanation system and sometimes a knowledge base editor. Given a new kind of problem to solve (say, car design), we can usually find a shell that provides the right sort of support for that problem, so all we need to do is provide the expert knowledge. There are numerous commercial expert system shells, each one appropriate for a slightly different range of problems. (Expert systems work in industry includes both writing expert system shells and writing expert systems using shells.) Using shells to write expert systems generally greatly reduces the cost and time of development.