AI Notes

Chapter 1: Introduction
What is artificial intelligence?
It is the science and engineering of making intelligent machines, especially intelligent
computer programs. It is related to the similar task of using computers to understand human
intelligence, but AI does not have to confine itself to methods that are biologically
observable.
It is the duplication of human thought processes by machine:

 Learning from experience
 Interpreting ambiguities
 Rapid response to varying situations
 Applying reasoning to problem-solving
 Manipulating environment by applying knowledge
 Thinking and reasoning
Yes, but what is intelligence?
Intelligence is the computational part of the ability to achieve goals in the world. Varying
kinds and degrees of intelligence occur in people, many animals and some machines.
Isn't there a solid definition of intelligence that doesn't depend on relating it to human
intelligence?
Not yet. The problem is that we cannot yet characterize in general what kinds of
computational procedures we want to call intelligent. We understand some of the mechanisms
of intelligence and not others.
Acting humanly: The Turing Test approach
Fig. The imitation game
Abridged history of AI (summary)
1943 McCulloch & Pitts: Boolean circuit model of brain
1950 Turing's "Computing Machinery and Intelligence"
1950s Early AI programs, including Samuel's checkers program, Newell & Simon's
Logic Theorist, Gelernter's Geometry Engine
1956 Dartmouth meeting: "Artificial Intelligence" adopted
1965 Robinson's complete algorithm for logical reasoning
1966—73 AI discovers computational complexity, neural network research almost
disappears
1969—79 early development of knowledge-based systems
1980-- AI becomes an industry
1986-- Neural networks return to popularity
1987-- AI becomes a science
1995-- The emergence of intelligent agents
Benard O.Osero, Bsc, Msc Comp Sci, CCNA.

Goals of AI
 Replicate human intelligence
"AI is the study of complex information processing problems that often have their roots in
some aspect of biological information processing. The goal of the subject is to identify
solvable and interesting information processing problems, and solve them." -- David Marr
 Solve knowledge-intensive tasks
"AI is the design, study and construction of computer programs that behave intelligently." --
Tom Dean
"... to achieve their full impact, computer systems must have more than processing power--
they must have intelligence. They need to be able to assimilate and use large bodies of
information and collaborate with and help people find new ways of working together
effectively. The technology must become more responsive to human needs and styles of
work, and must employ more natural means of communication." -- Barbara Grosz and
Randall Davis
 Intelligent connection of perception and action
AI not centered around representation of the world, but around action in the world. Behavior-
based intelligence. (see Rod Brooks in the movie Fast, Cheap and Out of Control)
 Enhance human-human, human-computer and computer-computer interaction/communication
Computer can sense and recognize its users, see and recognize its environment, respond
visually and audibly to stimuli. New paradigms for interacting productively with computers
using speech, vision, natural language, 3D virtual reality, 3D displays, more natural and
powerful user interfaces, etc. (See, for example, projects in Microsoft's "Advanced
Interactivity and Intelligence" group.)
Some Application Areas of AI
 Game Playing
Deep Blue chess program beat world champion Garry Kasparov
 Speech Recognition
PEGASUS spoken language interface to American Airlines' EAASY SABRE reservation
system, which allows users to obtain flight information and make reservations over the
telephone. The 1990s saw significant advances in speech recognition, so that limited
systems are now successful.
 Computer Vision
Face recognition programs in use by banks, government, etc. The ALVINN system from
CMU autonomously drove a van from Washington, D.C. to San Diego (all but 52 of 2,849
miles), averaging 63 mph day and night, and in all weather conditions. Handwriting
recognition, electronics and manufacturing inspection, photointerpretation, baggage
inspection, reverse engineering to automatically construct a 3D geometric model.
 Expert Systems
Application-specific systems that rely on obtaining the knowledge of human experts in an
area and programming that knowledge into a system.
o Diagnostic Systems
Microsoft Office Assistant in Office 97 provides customized help by decision-
theoretic reasoning about an individual user. MYCIN system for diagnosing bacterial
infections of the blood and suggesting treatments. Intellipath pathology diagnosis
system (AMA approved). Pathfinder medical diagnosis system, which suggests tests
and makes diagnoses. Whirlpool customer assistance center.
o System Configuration
DEC's XCON system for custom hardware configuration. Radiotherapy treatment
planning.
o Financial Decision Making
Credit card companies, mortgage companies, banks, and the U.S. government employ
AI systems to detect fraud and expedite financial transactions. For example, AMEX
credit check. Systems often use learning algorithms to construct profiles of customer
usage patterns, and then use these profiles to detect unusual patterns and take
appropriate action.
o Classification Systems
Put information into one of a fixed set of categories using several sources of
information. E.g., financial decision making systems. NASA developed a system for
classifying very faint areas in astronomical images into either stars or galaxies with
very high accuracy by learning from human experts' classifications.
 Mathematical Theorem Proving
Use inference methods to prove new theorems.
 Natural Language Understanding
AltaVista's translation of web pages. Translation of Caterpillar Truck manuals into 20
languages. (Note: One early system translated the English sentence "The spirit is willing but
the flesh is weak" into the Russian equivalent of "The vodka is good but the meat is rotten.")
 Scheduling and Planning
Automatic scheduling for manufacturing. DARPA's DART system used in Desert Storm and
Desert Shield operations to plan logistics of people and supplies. American Airlines rerouting
contingency planner. European space agency planning and scheduling of spacecraft assembly,
integration and verification.
Some AI "Grand Challenge" Problems
 Translating telephone
 Accident-avoiding car
 Aids for the disabled
 Smart clothes
 Intelligent agents that monitor and manage information by filtering, digesting, abstracting
 Tutors
 Self-organizing systems, e.g., that learn to assemble something by observing a human do it.
A Framework for Building AI Systems

Perception
Intelligent biological systems are physically embodied in the world and experience the world through
their sensors (senses). For an autonomous vehicle, input might be images from a camera and range
information from a rangefinder. For a medical diagnosis system, perception is the set of symptoms
and test results that have been obtained and input to the system manually. Includes areas of vision,
speech processing, natural language processing, and signal processing (e.g., market data and acoustic
data).
Reasoning
Inference, decision-making, classification from what is sensed and what the internal "model" is of the
world. Might be a neural network, logical deduction system, Hidden Markov Model induction,
heuristic searching a problem space, Bayes Network inference, genetic algorithms, etc. Includes areas
of knowledge representation, problem solving, decision theory, planning, game theory, machine
learning, uncertainty reasoning, etc.

Action
Biological systems interact within their environment by actuation, speech, etc. All behavior is
centered around actions in the world. Examples include controlling the steering of a Mars rover or
autonomous vehicle, or suggesting tests and making diagnoses for a medical diagnosis system.
Includes areas of robot actuation, natural language generation, and speech synthesis.
Some Fundamental Issues for Most AI Problems
 Representation
Facts about the world have to be represented in some way, e.g., mathematical logic is one
language that is used in AI. Deals with the questions of what to represent and how to
represent it. How to structure knowledge? What is explicit, and what must be inferred? How
to encode "rules" for inferencing so as to find information that is only implicitly known? How
to deal with incomplete, inconsistent, and probabilistic knowledge? Epistemology issues
(what kinds of knowledge are required to solve problems).
Example: "The fly buzzed irritatingly on the window pane. Jill picked up the newspaper."
Inference: Jill has malicious intent; she is not intending to read the newspaper, or use it to
start a fire, or ...
Example: Given 17 sticks in 3 x 2 grid, remove 5 sticks to leave exactly 3 squares.
 Search
Many tasks can be viewed as searching a very large problem space for a solution. For
example, Checkers has about 10^40 states, and Chess has about 10^120 states in a typical game.
Use of heuristics (meaning "serving to aid discovery") and constraints.
 Inference
From some facts others can be inferred. Related to search. For example, knowing "All
elephants have trunks" and "Clyde is an elephant," can we answer the question "Does Clyde
have a trunk?" What about "Peanuts has a trunk, is it an elephant?" Or "Peanuts lives in a tree
and has a trunk, is it an elephant?" Deduction, abduction, non-monotonic reasoning, reasoning
under uncertainty.
 Learning
Inductive inference, neural networks, genetic algorithms, artificial life, evolutionary
approaches.
 Planning
Starting with general facts about the world, facts about the effects of basic actions, facts about
a particular situation, and a statement of a goal, generate a strategy for achieving that goal in
terms of a sequence of primitive steps or actions.
The State of the Art

 Computer beats human in a chess game.
 Computer-human conversation using speech recognition.
 Computer program can chat with human
 Expert system controls a spacecraft.
 Robot can walk on stairs and hold a cup of water.
 Language translation for web-pages.
 Home appliances use fuzzy logic.

Agent and Environment
 An agent is anything that can be viewed as perceiving its environment through
sensors and acting upon that environment through actuators. A human agent has
eyes, ears, and other organs for sensors and hands, legs, mouth, and other body parts
for actuators. A robotic agent might have cameras and infrared range finders for
sensors and various motors for actuators. A software agent receives keystrokes, file
contents, and network packets as sensory inputs and acts on the environment by
displaying on the screen, writing files, and sending network packets. We will make
the general assumption that every agent can perceive its own actions (but not always
the effects).
 We use the term percept to refer to the agent's perceptual inputs at any given instant.
An agent's percept sequence is the complete history of everything the agent has ever
perceived. In general, an agent's choice of action at any given instant can depend on
the entire percept sequence observed to date.
 If we can specify the agent's choice of action for every possible percept sequence,
then we have said more or less everything there is to say about the agent.
Mathematically speaking, we say that an agent's behavior is described by the agent
function that maps any given percept sequence to an action:
 f : P* → A
 The agent program runs on the physical architecture to produce f.

Fig. Agents interact with environments through sensors and actuators
Fig. Vacuum cleaner world
Percepts: location and contents, e.g., [A, Dirty]

Actions: Left, Right, Suck, NoOp
For Vacuum Cleaner Agent:
Percept sequence                Action
[A, Clean]                      Right
[A, Dirty]                      Suck
[B, Clean]                      Left
[B, Dirty]                      Suck
[A, Clean], [A, Clean]          Right
[A, Clean], [A, Dirty]          Suck
function Reflex-Vacuum-Agent([location, status]) returns an action
  if status = Dirty then return Suck
  else if location = A then return Right
  else if location = B then return Left
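The percept table and the reflex rule above can be compared side by side. The following is a minimal Python sketch, not the notes' own implementation: the names TABLE, make_table_driven_agent and reflex_vacuum_agent are illustrative, and returning "NoOp" for a percept history that is not in the table is an assumption.

```python
# Percepts are (location, status) pairs; actions are plain strings.
# TABLE holds exactly the rows listed in the percept-sequence table above.
TABLE = {
    (("A", "Clean"),): "Right",
    (("A", "Dirty"),): "Suck",
    (("B", "Clean"),): "Left",
    (("B", "Dirty"),): "Suck",
    (("A", "Clean"), ("A", "Clean")): "Right",
    (("A", "Clean"), ("A", "Dirty")): "Suck",
}


def make_table_driven_agent(table):
    """Agent function f: percept sequence -> action, stored as a lookup table."""
    percepts = []

    def agent(percept):
        percepts.append(percept)
        # Assumption: histories missing from the table map to NoOp.
        return table.get(tuple(percepts), "NoOp")

    return agent


def reflex_vacuum_agent(percept):
    """Same behaviour, but decided from the current percept only."""
    location, status = percept
    if status == "Dirty":
        return "Suck"
    if location == "A":
        return "Right"
    return "Left"


agent = make_table_driven_agent(TABLE)
history = [("A", "Clean"), ("A", "Dirty")]
table_actions = [agent(p) for p in history]
reflex_actions = [reflex_vacuum_agent(p) for p in history]
print(table_actions, reflex_actions)
```

The table grows with every additional time step, while the reflex rule stays the same size.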
Rationality
Definition of Rational Agent:
For each possible percept sequence, a rational agent should select an action that is expected to maximize its
performance measure, given the evidence provided by the percept sequence and whatever built-in knowledge
the agent has.
Rational ≠ omniscient (percepts may not supply all relevant information)
Rational ≠ clairvoyant (action outcomes may not be as expected)
Hence, rational ≠ successful
Rational ⇒ exploration, learning, autonomy
PEAS (Performance measure, Environment, Actuators, Sensors)
To design a rational agent, we must specify the task environments. Task environments are
essentially the "problems" to which rational agents are the "solutions." Those task
environments come in a variety of flavors and the flavor of the task environment directly
affects the appropriate design for the agent program.
Consider, e.g., the task of designing an automated taxi:
Agent type: Taxi driver
  Performance measure: safe, fast, legal, comfortable trip, maximize profits
  Environment: roads, other traffic, pedestrians, customers
  Actuators: steering, accelerator, brake, signal, horn, display
  Sensors: cameras, sonar, speedometer, GPS, odometer, accelerometer, engine sensors, keyboard
Figure PEAS description of the task environment for an automated taxi.
Agent type: Medical diagnosis system
  Performance measure: healthy patient, minimize costs, lawsuits
  Environment: patient, hospital, staff
  Actuators: display questions, tests, diagnoses, treatments, referrals
  Sensors: keyboard entry of symptoms, findings, patient's answers
Agent type: Internet shopping agent
  Performance measure: price, quality, appropriateness, efficiency
  Environment: WWW sites, vendors, shippers
  Actuators: display to user, follow URL, fill in form
  Sensors: HTML pages (text, graphics, scripts)
The range of task environments that might arise in AI is obviously vast. We can, however,
identify a fairly small number of dimensions along which task environments can be
categorized.
Fully observable vs. partially observable:

If an agent's sensors give it access to the complete state of the environment at each point in
time, then we say that the task environment is fully observable. An environment might be
partially observable because of noisy and inaccurate sensors or because parts of the state are
simply missing from the sensor data. For example, a vacuum agent with only a local dirt
sensor cannot tell whether there is dirt in other squares, and an automated taxi cannot see
what other drivers are thinking.
Deterministic vs. stochastic.

If the next state of the environment is completely determined by the current state and the
action executed by the agent, then we say the environment is deterministic; otherwise, it is
stochastic.
Episodic vs. sequential.
In an episodic task environment, the agent's experience is divided into atomic episodes.

Each episode consists of the agent perceiving and then performing a single action. Crucially,
the next episode does not depend on the actions taken in previous episodes. In episodic
environments, the choice of action in each episode depends only on the episode itself. In
sequential environments, on the other hand, the current decision could affect all future
decisions. Chess and taxi driving are sequential: in both cases, short-term actions can have
long-term consequences. Episodic environments are much simpler than sequential
environments because the agent does not need to think ahead.
Static vs. dynamic.

If the environment can change while an agent is deliberating, then we say the environment
is dynamic for that agent; otherwise, it is static. If the environment itself does not
change with the passage of time but the agent's performance score does, then we say the
environment is semidynamic. Taxi driving is clearly dynamic: the other cars and the taxi
itself keep moving while the driving algorithm dithers about what to do next. Chess, when
played with a clock, is semidynamic. Crossword puzzles are static.
Discrete vs. continuous.

The discrete/continuous distinction can be applied to the state of the environment, to the way
time is handled, and to the percepts and actions of the agent. For example, a discrete-state
environment such as a chess game has a finite number of distinct states. Chess also has a
discrete set of percepts and actions. Taxi driving is a continuous-state and continuous-time problem.
Single agent vs. multiagent.

Single-agent and multiagent environments are distinguished by the number of agents in the
environment. For example, an agent solving a crossword puzzle by itself is clearly in a
single-agent environment, whereas an agent playing chess is in a two-agent environment.
As one might expect, the hardest case is partially observable, stochastic, sequential, dynamic,
continuous, and multiagent. The real world is partially observable, stochastic, sequential,
dynamic, continuous, multi-agent.
There are four basic kinds of agent program that embody the principles underlying almost all
intelligent systems:
• Simple reflex agents;
• Model-based reflex agents;
• Goal-based agents; and
• Utility-based agents.
All these can be turned into learning agents.
Agent types; simple reflex
 Select action on the basis of only the current percept, e.g. the vacuum agent.
 Large reduction in possible percept/action situations.
 Implemented through condition-action rules
 If dirty then suck
function REFLEX-VACUUM-AGENT([location, status]) returns an action
  if status == Dirty then return Suck
  else if location == A then return Right
  else if location == B then return Left
Reduction from 4^T to 4 entries
Agent types; reflex and state

 To tackle partially observable environments.
 Maintain internal state
 Over time update state using world knowledge
 How does the world change.
 How do actions affect world.
⇒Model of World
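A reflex agent with internal state can be sketched for the two-square vacuum world as follows. This is a hedged Python sketch under stated assumptions: the world model assumes that Suck always cleans the current square and that dirt never reappears, and the name make_model_based_vacuum_agent is illustrative.

```python
def make_model_based_vacuum_agent():
    # Internal state: what the agent believes about each square (None = unknown).
    model = {"A": None, "B": None}

    def agent(percept):
        location, status = percept
        model[location] = status          # update the state from the new percept
        if status == "Dirty":
            # "How do actions affect the world": assume Suck cleans this square.
            model[location] = "Clean"
            return "Suck"
        # "How does the world change": assume clean squares stay clean.
        if all(s == "Clean" for s in model.values()):
            return "NoOp"                 # nothing left to do, as far as the model knows
        return "Right" if location == "A" else "Left"

    return agent


agent = make_model_based_vacuum_agent()
percepts = [("A", "Dirty"), ("A", "Clean"), ("B", "Clean"), ("B", "Clean")]
actions = [agent(p) for p in percepts]
print(actions)
```

Unlike the simple reflex agent, this one stops moving once its model says both squares are clean, even though it can only sense the square it is standing on.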
Agent types; goal-based

 The agent needs a goal to know which situations are desirable.
o Things become difficult when long sequences of actions are required to find
the goal.
 Typically investigated in search and planning research.
 Major difference: future is taken into account
 Is more flexible since knowledge is represented explicitly and can be manipulated.

Agent types; utility-based
 Certain goals can be reached in different ways.
o Some are better, have a higher utility.
 Utility function maps a (sequence of) state(s) onto a real number.
 Improves on goals:
o Selecting between conflicting goals
o Select appropriately between several goals based on likelihood of success.
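One common way to weigh goals by likelihood of success is to compare expected utilities. The sketch below is illustrative only: the action names, probabilities and utility values are made-up numbers, and expected_utility and choose_action are hypothetical helper names.

```python
def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs for one action."""
    return sum(p * u for p, u in outcomes)


def choose_action(action_outcomes):
    """Pick the action whose expected utility is highest."""
    return max(action_outcomes, key=lambda a: expected_utility(action_outcomes[a]))


# Hypothetical numbers: the highway is usually fast but sometimes jammed,
# the back road is slower but reliable.
routes = {
    "highway": [(0.7, 10.0), (0.3, 2.0)],
    "back_road": [(1.0, 7.0)],
}
best = choose_action(routes)
print(best, expected_utility(routes[best]))
```

A goal-based agent would treat both routes as equally good because both reach the destination; the utility function is what separates them.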
Agent types; learning

 All previous agent-programs describe methods for selecting actions.
o Yet it does not explain the origin of these programs.
o Learning mechanisms can be used to perform this task.
o Teach them instead of instructing them.
o Advantage is the robustness of the program toward initially unknown
environments.
 Learning element: introduce improvements in performance element.

 Critic provides feedback on the agent's performance based on a fixed performance standard.
 Performance element: selecting actions based on percepts.
 Corresponds to the previous agent programs

 Problem generator: suggests actions that will lead to new and informative
experiences.
 Exploration vs. exploitation
KNOWLEDGE
• Data = collection of facts, measurements, statistics
• Information = organized data
• Knowledge = contextual, relevant, actionable information
– Strong experiential and reflective elements
– Good leverage and increasing returns
– Dynamic
– Branches and fragments with growth
– Difficult to estimate impact of investment
– Uncertain value in sharing
– Evolves over time with experience
• Explicit knowledge
– Objective, rational, technical
– Policies, goals, strategies, papers, reports
– Codified
– Leaky knowledge
• Tacit knowledge
– Subjective, cognitive, experiential learning
– Highly personalized
– Difficult to formalize
– Sticky knowledge

Chapter 2: Problem Solving
Problem-solving agent
Four general steps in problem solving:
 Goal formulation
o What are the successful world states?
 Problem formulation
o What actions and states to consider, given the goal?
 Search
o Determine the possible sequences of actions that lead to states of known
value, then choose the best sequence.
 Execute
o Given the solution, perform the actions.
function SIMPLE-PROBLEM-SOLVING-AGENT(percept) returns an action
  static: seq, an action sequence
          state, some description of the current world state
          goal, a goal
          problem, a problem formulation
  state ← UPDATE-STATE(state, percept)
  if seq is empty then
    goal ← FORMULATE-GOAL(state)
    problem ← FORMULATE-PROBLEM(state, goal)
    seq ← SEARCH(problem)
  action ← FIRST(seq)
  seq ← REST(seq)
  return action
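The agent loop above translates fairly directly into Python. This is a sketch under assumptions, not a definitive implementation: the percept is taken to be the current state, ROADS is a hypothetical one-way road fragment, actions are named after the city driven to, and a plain breadth-first search stands in for SEARCH.

```python
from collections import deque

# Hypothetical one-way road fragment, used only to exercise the agent.
ROADS = {"Arad": ["Sibiu"], "Sibiu": ["Fagaras"], "Fagaras": ["Bucharest"], "Bucharest": []}


def bfs(problem):
    """SEARCH: return a list of cities to drive to, or None if no route exists."""
    start, goal = problem
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        city, path = frontier.popleft()
        if city == goal:
            return path
        for nxt in ROADS[city]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [nxt]))
    return None


class SimpleProblemSolvingAgent:
    def __init__(self, formulate_goal, formulate_problem, search):
        self.seq = []            # remaining actions of the current plan
        self.state = None
        self.formulate_goal = formulate_goal
        self.formulate_problem = formulate_problem
        self.search = search

    def __call__(self, percept):
        self.state = percept     # UPDATE-STATE (assumption: percept is the state)
        if not self.seq:
            goal = self.formulate_goal(self.state)
            problem = self.formulate_problem(self.state, goal)
            # Assumption: fall back to NoOp if the plan is empty or search fails.
            self.seq = self.search(problem) or ["NoOp"]
        action, self.seq = self.seq[0], self.seq[1:]
        return action


agent = SimpleProblemSolvingAgent(
    formulate_goal=lambda state: "Bucharest",
    formulate_problem=lambda state, goal: (state, goal),
    search=bfs,
)
first = agent("Arad")
second = agent("Sibiu")
print(first, second, agent.seq)
```

The search runs only on the first call; later calls simply replay the stored plan.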
EXAMPLE:
 On holiday in Romania; currently in Arad

o Flight leaves tomorrow from Bucharest
 Formulate goal
o Be in Bucharest
 Formulate problem
o States: various cities
o Actions: drive between cities
 Find solution
o Sequence of cities; e.g. Arad, Sibiu, Fagaras, Bucharest, …

Selecting a state space
 Real world is absurdly complex.
State space must be abstracted for problem solving.
 (Abstract) state = set of real states.
 (Abstract) action = complex combination of real actions.
o e.g. Arad → Zerind represents a complex set of possible routes, detours, rest stops, etc.
o The abstraction is valid if the path between two states is reflected in the real world.
 (Abstract) solution = set of real paths that are solutions in the real world.
 Each abstract action should be “easier” than the real problem.
Formulating Problem as a Graph

In the graph
 each node represents a possible state;

 a node is designated as the initial state;
 one or more nodes represent goal states, states in which the agent’s goal is considered
accomplished.
 each edge represents a state transition caused by specific agent action;
 associated to each edge is the cost of performing that transition.
State space graph of vacuum world

Example: vacuum world
 States?? two locations, each with or without dirt: 2 x 2^2 = 8 states.
 Initial state?? Any state can be initial
 Actions?? {Left, Right, Suck}
 Goal test?? Check whether squares are clean.
o Path cost?? Number of actions to reach goal.
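The vacuum-world formulation above is small enough to enumerate and search exhaustively. The following Python sketch encodes a state as (robot location, dirt in A, dirt in B); that encoding and the helper names are assumptions made for illustration.

```python
from collections import deque
from itertools import product

# State: (robot location, dirt in A?, dirt in B?)  ->  2 x 2 x 2 = 8 states.
STATES = list(product("AB", [True, False], [True, False]))


def result(state, action):
    """Successor state for one of the actions {Left, Right, Suck}."""
    loc, dirt_a, dirt_b = state
    if action == "Left":
        return ("A", dirt_a, dirt_b)
    if action == "Right":
        return ("B", dirt_a, dirt_b)
    if action == "Suck":
        return (loc, dirt_a and loc != "A", dirt_b and loc != "B")
    raise ValueError(action)


def is_goal(state):
    """Goal test: both squares are clean."""
    return not state[1] and not state[2]


def shortest_plan(start):
    """Breadth-first search; path cost is the number of actions."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, plan = frontier.popleft()
        if is_goal(state):
            return plan
        for action in ("Left", "Right", "Suck"):
            nxt = result(state, action)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, plan + [action]))
    return None


plan = shortest_plan(("A", True, True))
print(len(STATES), plan)
```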
Example: 8-puzzle
 States?? Integer location of each tile
 Initial state?? Any state can be initial
 Actions?? {Left, Right, Up, Down}
 Goal test?? Check whether goal configuration is reached
o Path cost?? Number of actions to reach goal

Problem Solving as Search
Search space: set of states reachable from an initial state S0 via a (possibly empty/finite/infinite)
sequence of state transitions.
To achieve the problem’s goal
 search the space for a (possibly optimal) sequence of transitions starting from S0 and leading
to a goal state;
 execute (in order) the actions associated to each transition in the identified sequence.
Depending on the features of the agent’s world the two steps above can be interleaved.
How do we reach a goal state?
There may be several possible ways. Or none!
Factors to consider:
 cost of finding a path;

 cost of traversing a path.
Problem Solving as Search

 Reduce the original problem to a search problem.
 A solution for the search problem is a path initial state–goal state.
 The solution for the original problem is either
o the sequence of actions associated with the path
o Or the description of the goal state.
Example: The 8-puzzle
It can be generalized to 15-puzzle, 24-puzzle, or (n^2 − 1)-puzzle for n ≥ 6.
States: configurations of tiles

Operators: move one tile Up/Down/Left/Right
 There are 9! = 362,880 possible states (all permutations of {blank, 1, 2, 3, 4, 5, 6, 7, 8}).
 There are 16! possible states for the 15-puzzle.
 Not all states are reachable from a given state.
(In fact, exactly half of them are reachable from a given state.)
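The "exactly half" claim can be checked with the standard inversion-parity test for boards of odd width such as the 3 x 3 puzzle: two states are mutually reachable when their tile sequences, read row by row and ignoring the blank, have the same inversion parity. The sketch below assumes a flat 9-tuple encoding with 0 for the blank; the function names are illustrative.

```python
from math import factorial


def inversions(state):
    """Number of tile pairs that appear in the wrong order (blank ignored)."""
    tiles = [t for t in state if t != 0]
    return sum(
        1
        for i in range(len(tiles))
        for j in range(i + 1, len(tiles))
        if tiles[i] > tiles[j]
    )


def same_half(a, b):
    """True if b is reachable from a on a 3 x 3 board (equal inversion parity)."""
    return inversions(a) % 2 == inversions(b) % 2


goal = (1, 2, 3, 4, 5, 6, 7, 8, 0)
one_move = (1, 2, 3, 4, 5, 6, 7, 0, 8)   # blank slid one cell to the left
swapped = (2, 1, 3, 4, 5, 6, 7, 8, 0)    # tiles 1 and 2 exchanged

reachable_count = factorial(9) // 2
print(same_half(goal, one_move), same_half(goal, swapped), reachable_count)
```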
How can an artificial agent represent the states and the state
space for this problem?
Go from state S to state G.
Problem formulation
 A problem is defined by:
o An initial state, e.g. Arad
o Successor function S(X) = set of action-state pairs
 e.g. S(Arad) = {<Arad → Zerind, Zerind>, …}
initial state + successor function = state space
o Goal test, can be
 Explicit, e.g. x = ‘at Bucharest’
 Implicit, e.g. checkmate(x)
o Path cost (additive)
 e.g. sum of distances, number of actions executed, …
 c(x,a,y) is the step cost, assumed to be >= 0
A solution is a sequence of actions from initial to goal state.
Optimal solution has the lowest path cost.
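The four components (initial state, successor function, goal test, additive path cost) are enough to run a uniform-cost search for the optimal solution. This is a minimal Python sketch: the ROADS dictionary is a fragment of the usual textbook Romania map, and its distances are not given in these notes, so treat them as illustrative assumptions.

```python
import heapq

# Assumed road distances (fragment of the textbook Romania map).
ROADS = {
    ("Arad", "Zerind"): 75,
    ("Arad", "Sibiu"): 140,
    ("Arad", "Timisoara"): 118,
    ("Sibiu", "Fagaras"): 99,
    ("Sibiu", "Rimnicu Vilcea"): 80,
    ("Fagaras", "Bucharest"): 211,
    ("Rimnicu Vilcea", "Pitesti"): 97,
    ("Pitesti", "Bucharest"): 101,
}


def successors(city):
    """Successor function: yields (next city, step cost); roads are two-way."""
    for (a, b), cost in ROADS.items():
        if a == city:
            yield b, cost
        elif b == city:
            yield a, cost


def uniform_cost_search(start, goal_test):
    """Expand the cheapest path first; returns (path cost, path) or None."""
    frontier = [(0, start, [start])]
    best = {start: 0}
    while frontier:
        cost, city, path = heapq.heappop(frontier)
        if goal_test(city):
            return cost, path
        if cost > best.get(city, float("inf")):
            continue                      # stale queue entry
        for nxt, step in successors(city):
            new_cost = cost + step        # additive path cost, step cost >= 0
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(frontier, (new_cost, nxt, path + [nxt]))
    return None


cost, path = uniform_cost_search("Arad", lambda c: c == "Bucharest")
print(cost, path)
```

With these distances the route through Fagaras has fewer steps but costs 450, so the lowest-cost solution goes through Rimnicu Vilcea and Pitesti instead.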
Problem formulation
1. Choose an appropriate data structure to represent the world states.
2. Define each operator as a precondition/effects pair where the precondition holds exactly in the
states the operator applies to, effects describe how a state changes into a successor state by the
application of the operator.
3. Specify an initial state.

4. Provide a description of the goal (used to check if a reached state is a goal state).
Formulating the 8-puzzle Problem

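States: each represented by a 3 × 3 array of numbers in [0 . . . 8], where value 0 is for the empty cell.
 Operators: 24 operators of the form Op(r,c,d) where r, c ∈ {1, 2, 3}, d ∈ {L,R,U,D}.
 Op(r,c,d) moves the empty space at position (r, c) in the direction d.
Example: Op(3,2,R)
We have 24 operators in this problem formulation (moves that would leave the board are
excluded: 2 per corner cell, 3 per edge cell, 4 for the center) . . .
20 too many! Four operators that move the empty space Left, Right, Up or Down suffice,
whatever its position.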
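The four-operator formulation can be sketched as a successor function that first locates the empty cell and then tries each direction. The code below is an illustrative Python sketch: it uses 0-based row and column indices (the notes use 1 to 3), and the names MOVES, find_blank and successors are assumptions.

```python
# Direction name -> (row offset, column offset) for the empty cell.
MOVES = {"L": (0, -1), "R": (0, 1), "U": (-1, 0), "D": (1, 0)}


def find_blank(state):
    """Return (row, col) of the empty cell, which is stored as 0."""
    for r, row in enumerate(state):
        for c, value in enumerate(row):
            if value == 0:
                return r, c
    raise ValueError("state has no empty cell")


def successors(state):
    """List of (operator, new state) pairs for the legal moves of the empty cell."""
    r, c = find_blank(state)
    result = []
    for name, (dr, dc) in MOVES.items():
        nr, nc = r + dr, c + dc
        if 0 <= nr < 3 and 0 <= nc < 3:
            grid = [list(row) for row in state]
            grid[r][c], grid[nr][nc] = grid[nr][nc], grid[r][c]
            result.append((name, tuple(tuple(row) for row in grid)))
    return result


center = ((1, 2, 3), (4, 0, 5), (6, 7, 8))
corner = ((0, 1, 2), (3, 4, 5), (6, 7, 8))

# Counting every legal (position, direction) pair recovers the 24 operators.
legal_pairs = sum(
    1
    for r in range(3)
    for c in range(3)
    for dr, dc in MOVES.values()
    if 0 <= r + dr < 3 and 0 <= c + dc < 3
)
print(len(successors(center)), [m for m, _ in successors(corner)], legal_pairs)
```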
Problem types
 Deterministic, fully observable ⇒single state problem
o Agent knows exactly which state it will be in; solution is a sequence.
 Partial knowledge of states and actions:
o Non-observable ⇒sensorless or conformant problem
 Agent may have no idea where it is; solution (if any) is a sequence.
o Nondeterministic and/or partially observable ⇒contingency problem
 Percepts provide new information about current state; solution is a tree or
policy; often interleave search and execution.
 If uncertainty is caused by actions of another agent: adversarial problem
o Unknown state space ⇒exploration problem (“online”)
 When states and actions of the environment are unknown.

 Problem solutions need well-defined problems, and a well-defined problem must define
the space of possible solutions explicitly. We use search to solve well-defined
problems.
Constraint satisfaction problems

What is a CSP?
 Finite set of variables V1, V2, …, Vn

 Finite set of constraints C1, C2, …, Cm
 Nonempty domain of possible values for each variable: DV1, DV2, …, DVn
 Each constraint Ci limits the values that variables can take,
 e.g., V1 ≠ V2
 A state is defined as an assignment of values to some or all variables.
 Consistent assignment: an assignment that does not violate the constraints.
 An assignment is complete when every variable is mentioned.
 A solution to a CSP is a complete assignment that satisfies all constraints.
 Some CSPs require a solution that maximizes an objective function.
 Applications: Scheduling the time of observations on the Hubble Space Telescope, Floor planning,
Map coloring, Cryptography
 CSPs are a special kind of problem: states defined by values of a fixed set of variables, goal test
defined by constraints on variable values
Varieties of Constraints
 Unary constraints involve a single variable.
 e.g. SA ≠ green
 Binary constraints involve pairs of variables.
 e.g. SA ≠ WA
 Higher-order constraints involve 3 or more variables.
 e.g. cryptarithmetic column constraints.
 Preferences (soft constraints), e.g. red is better than green; often representable by a cost for each
variable assignment → constrained optimization problems.
CSP example: map coloring
 Variables: WA, NT, Q, NSW, V, SA, T

 Domains: Di={red,green,blue}
 Constraints: adjacent regions must have different colors.
o E.g. WA ≠ NT (if the language allows this)
o E.g. (WA,NT) ∈ {(red,green),(red,blue),(green,red),…}

 Solutions are assignments satisfying all constraints, e.g.
 {WA=red,NT=green,Q=red,NSW=green,V=red,SA=blue,T=green}
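The map-coloring CSP can be solved with a plain backtracking search over partial assignments. The Python sketch below is a minimal illustration: the NEIGHBORS adjacency list is the usual map of Australian states and is an assumption here, since the notes do not list which regions are adjacent.

```python
# Assumed adjacency of the regions (standard Australia map); T has no neighbors.
NEIGHBORS = {
    "WA": ["NT", "SA"],
    "NT": ["WA", "SA", "Q"],
    "Q": ["NT", "SA", "NSW"],
    "NSW": ["Q", "SA", "V"],
    "V": ["SA", "NSW"],
    "SA": ["WA", "NT", "Q", "NSW", "V"],
    "T": [],
}
COLORS = ["red", "green", "blue"]


def consistent(var, value, assignment):
    """Binary constraint: no already-assigned neighbor has the same color."""
    return all(assignment.get(n) != value for n in NEIGHBORS[var])


def backtrack(assignment):
    """Extend a consistent partial assignment until it is complete."""
    if len(assignment) == len(NEIGHBORS):
        return assignment
    var = next(v for v in NEIGHBORS if v not in assignment)
    for value in COLORS:
        if consistent(var, value, assignment):
            result = backtrack({**assignment, var: value})
            if result is not None:
                return result
    return None


def is_solution(assignment):
    """Complete assignment that satisfies every adjacency constraint."""
    return len(assignment) == len(NEIGHBORS) and all(
        assignment[v] != assignment[n] for v in NEIGHBORS for n in NEIGHBORS[v]
    )


solution = backtrack({})
notes_solution = {"WA": "red", "NT": "green", "Q": "red", "NSW": "green",
                  "V": "red", "SA": "blue", "T": "green"}
print(solution, is_solution(notes_solution))
```

The search may return a different coloring from the one quoted above; both satisfy the constraints.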
Constraint graph
CSP benefits
 Standard representation pattern
 Generic goal and successor functions
 Generic heuristics (no domain specific expertise).
Constraint graph = nodes are variables, edges show constraints.
 Graph can be used to simplify search.
o e.g. Tasmania is an independent subproblem.
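Independent subproblems are the connected components of the constraint graph, so they can be found with a simple graph traversal. In the sketch below the adjacency list is again the assumed standard map of Australia, and components is an illustrative helper name.

```python
# Assumed adjacency of the regions (standard Australia map).
NEIGHBORS = {
    "WA": ["NT", "SA"],
    "NT": ["WA", "SA", "Q"],
    "Q": ["NT", "SA", "NSW"],
    "NSW": ["Q", "SA", "V"],
    "V": ["SA", "NSW"],
    "SA": ["WA", "NT", "Q", "NSW", "V"],
    "T": [],
}


def components(neighbors):
    """Split the constraint graph into connected components (depth-first)."""
    seen, parts = set(), []
    for start in neighbors:
        if start in seen:
            continue
        stack, part = [start], set()
        while stack:
            v = stack.pop()
            if v in part:
                continue
            part.add(v)
            stack.extend(n for n in neighbors[v] if n not in part)
        seen |= part
        parts.append(part)
    return parts


parts = components(NEIGHBORS)
print(parts)
```

Each component can be colored separately, so Tasmania never has to be considered while searching over the mainland.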
Cryptarithmetic conventions
 Each letter or symbol represents only one digit throughout the problem;
 When letters are replaced by their digits, the resultant arithmetical operation must be
correct;
 The numerical base, unless specifically stated, is 10;
 Numbers must not begin with a zero;
 There must be only one solution to the problem.
1.
    S E N D
  + M O R E
  ---------
  M O N E Y
We see at once that M in the total must be 1, since the total of the column SM cannot reach as
high as 20. Now if M in this column is replaced by 1, how can we make this column total as
much as 10 to provide the 1 carried over to the left below? Only by making S very large: 9 or
8. In either case the letter O must stand for zero: the summation of SM could produce only 10
or 11, but we cannot use 1 for letter O as we have already used it for M.
If letter O is zero, then in column EO we cannot reach a total as high as 10, so that there will
be no 1 to carry over from this column to SM. Hence S must positively be 9.
Since the summation EO gives N, and letter O is zero, N must be 1 greater than E and the
column NR must total over 10. To put it into an equation: E + 1 = N
From the NR column we can derive the equation: N + R + (+ 1) = E + 10
We have to insert the expression (+ 1) because we don’t know yet whether 1 is carried over
from column DE. But we do know that 1 has to be carried over from column NR to EO.
Subtract the first equation from the second: R + (+1) = 9

We cannot let R equal 9, since we already have S equal to 9. Therefore we will have to make
R equal to 8; hence we know that 1 has to be carried over from column DE.
Column DE must total at least 12, since Y cannot be 1 or zero. What values can we give D
and E to reach this total? We have already used 9 and 8 elsewhere. The only digits left that
are high enough are 7, 6 and 7, 5. But remember that one of these has to be E, and N is 1
greater than E. Hence E must be 5, N must be 6, while D is 7. Then Y turns out to be 2, and
the puzzle is completely solved.
    S E N D          9 5 6 7
  + M O R E        + 1 0 8 5
  ---------        ---------
  M O N E Y        1 0 6 5 2
2.
    T W O
  + T W O
  -------
  F O U R
Let us first try F = 0. Now give O the highest possible value, 9. Then R must be 8 (9 + 9 = 18,
carry 1) and T must be 4 (4 + 4 + 1 = 9). Checking the remaining digits, U = 3 works, so W must
be 6 (6 + 6 + 1 = 13). Note that F = 0 relaxes the convention that a number must not begin with zero.
    T W O          4 6 9
  + T W O        + 4 6 9
  -------        -------
  F O U R        0 9 3 8
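Both worked puzzles can be checked by brute force: try every assignment of distinct digits to the letters and keep those for which the sum holds. The Python sketch below is one simple way to do that, not the method used in the derivations above; solve and its allow_leading_zero flag are hypothetical names, and the flag exists because the second puzzle's answer uses F = 0.

```python
from itertools import permutations


def solve(a, b, total, allow_leading_zero=False):
    """Return every digit assignment (letter -> digit) with a + b == total."""
    # Net place-value weight of each letter: addends count +, the total counts -.
    coef = {}
    for word, sign in ((a, 1), (b, 1), (total, -1)):
        for power, ch in enumerate(reversed(word)):
            coef[ch] = coef.get(ch, 0) + sign * 10 ** power
    letters = sorted(coef)
    weights = [coef[ch] for ch in letters]
    leading = [letters.index(word[0]) for word in (a, b, total)]

    solutions = []
    for digits in permutations(range(10), len(letters)):
        if sum(w * d for w, d in zip(weights, digits)) != 0:
            continue                      # arithmetic does not hold
        if not allow_leading_zero and any(digits[i] == 0 for i in leading):
            continue                      # a number would begin with zero
        solutions.append(dict(zip(letters, digits)))
    return solutions


send_more = solve("SEND", "MORE", "MONEY")
two_two_relaxed = solve("TWO", "TWO", "FOUR", allow_leading_zero=True)
two_two_strict = solve("TWO", "TWO", "FOUR")
print(send_more)
```

With leading zeros forbidden, the first puzzle has a single solution, while TWO + TWO = FOUR has several, for example 734 + 734 = 1468.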
Game Playing
Summary
 Games are fun (and dangerous)
 They illustrate several important points about AI
 Perfection is unattainable -> approximation
 Good idea to think about what to think about
 Uncertainty constrains the assignment of values to states
 Games are to AI as grand prix racing is to automobile design.
 Games are a form of multi-agent environment

o What do other agents do and how do they affect our success?
o Cooperative vs. competitive multi-agent environments.
o Competitive multi-agent environments give rise to adversarial problems a.k.a.
games
 Why study games?
o Fun; historically entertaining
o Interesting subject of study because they are hard
 Chess game:
 average branching factor: 35; each player makes about 50 moves -> search tree: 35^100
nodes
Relation of Search and Games

 Search – no adversary
 Solution is (heuristic) method for finding goal
 Heuristics and CSP techniques can find optimal solution
 Evaluation function: estimate of cost from start to goal through given node
 Examples: path planning, scheduling activities
 Games – adversary
 Solution is strategy (strategy specifies move for every possible opponent reply).
 Time limits force an approximate solution
 Evaluation function: evaluate “goodness” of game position
 Examples: chess, checkers, Othello, backgammon
Types Of Games
Multiplayer Games allow more than one player
Game setup
 Two players: MAX and MIN
 MAX moves first and they take turns until the game is over. The winner gets a reward, the loser
gets a penalty.
 Games as search:
o Initial state: e.g. board configuration of chess
o Successor function: list of (move,state) pairs specifying legal moves.
o Terminal test: Is the game finished?
o Utility function: Gives numerical value of terminal states.

 E.g. win (+1), lose (-1) and draw (0) in tic-tac-toe (next)
 MAX uses search tree to determine next move.
 Partial Game Tree for Tic Tac Toe
Optimal strategies
 Find the contingent strategy for MAX assuming an infallible MIN opponent.
 Assumption: Both players play optimally !!
 Given a game tree, the optimal strategy can be determined by using the minimax value of
each node:
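MINIMAX-VALUE(n) =
    UTILITY(n)                                  if n is a terminal state
    max of MINIMAX-VALUE(s) over successors s   if n is a MAX node
    min of MINIMAX-VALUE(s) over successors s   if n is a MIN node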
Two-Ply Game Tree
Minimax maximizes the worst-case outcome for max.

Production System
Production systems are applied to problem-solving programs that must perform a wide range of
searches. Production systems are symbolic AI systems; the difference between the two terms is
largely one of semantics. A symbolic AI system need not fit the strict definition of a production
system, but in practice the two do not differ much.
Production systems are composed of three parts, a global database, production rules and a control
structure.
A production system (or production rule system) is a computer program typically used to provide
some form of artificial intelligence, which consists primarily of a set of rules about behaviour. These
rules, termed productions, are a basic representation found useful in automated planning, expert
systems and action selection. A production system provides the mechanism necessary to execute
productions in order to achieve some goal for the system.
Productions consist of two parts: a sensory precondition (or "IF" statement) and an action (or
"THEN"). If a production's precondition matches the current state of the world, then the production is
said to be triggered. If a production's action is executed, it is said to have fired.
The first production systems were developed by Newell and Simon in the 1950s, and the idea was written
up in Newell and Simon (1972).
"Production" in the title of these notes (or "production rule") is a synonym for "rule", i.e. for a
condition-action rule (see below). The term seems to have originated with the term used for
rewriting rules in the Chomsky hierarchy of grammar types, where for example context-free
grammar rules are sometimes referred to as context-free productions.
Rules
These are also called condition-action rules.

These components of a rule-based system have the form:
if <condition> then <conclusion>
or
if <condition> then <action>
Example:
if patient has high levels of the enzyme ferritin in their blood
and patient has the Cys282→Tyr mutation in HFE gene
then conclude patient has haemochromatosis*
* medical validity of this rule is not asserted here
Rules can be evaluated by:
 backward chaining
 forward chaining

Backward Chaining
 To determine if a decision should be made, work backwards looking for justifications for the
decision.
 Eventually, a decision must be justified by facts.
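Backward chaining can be sketched as a short recursive procedure. This is an illustration under stated assumptions, not a definitive implementation: rules are written as (conditions, conclusion) pairs, the fact names are made up, and the second rule is hypothetical and added only to give the search something to chain through.

```python
def backward_chain(goal, rules, facts, seen=frozenset()):
    """Return True if goal can be justified from facts using the rules."""
    if goal in facts:                 # a decision is eventually justified by facts
        return True
    if goal in seen:                  # guard against circular rule chains
        return False
    for conditions, conclusion in rules:
        if conclusion == goal and all(
            backward_chain(c, rules, facts, seen | {goal}) for c in conditions
        ):
            return True
    return False

rules = [
    (("high_ferritin", "hfe_mutation"), "haemochromatosis"),
    (("blood_test_flagged",), "high_ferritin"),   # hypothetical extra rule
]
facts = {"blood_test_flagged", "hfe_mutation"}
print(backward_chain("haemochromatosis", rules, facts))
```

Starting from the decision "haemochromatosis", the procedure works backwards to the conditions of the rule, and from "high_ferritin" back to the fact "blood_test_flagged".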
Forward Chaining
 Given some facts, work forward through inference net.

 Discovers what conclusions can be derived from data.
Forward Chaining 2
Until a problem is solved or no rule's 'if' part is satisfied by the current situation:
1. Collect rules whose 'if' parts are satisfied.

2. If more than one rule's 'if' part is satisfied, use a conflict resolution strategy to eliminate all
but one.
3. Do what the rule's 'then' part says to do.

Production Rules
A production rule system consists of
 a set of rules
 working memory that stores temporary data
 a forward chaining inference engine
Match-Resolve-Act Cycle
The match-resolve-act cycle is what the inference engine does.
loop
    match conditions of rules with contents of working memory
    if no rule matches then stop
    resolve conflicts (select one of the matching rules)
    act (i.e. perform the conclusion part of the selected rule)
end loop
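The cycle above can be sketched as a tiny forward-chaining engine. Everything here is an assumption made for illustration: rules are (name, conditions, additions) triples over sets of facts, the conflict resolution strategy is simply "first rule in rule order", and a rule is only considered matched if firing it would add something new (otherwise the loop would never stop).

```python
def run(rules, memory):
    """Run the match-resolve-act cycle; return final memory and firing order."""
    memory = set(memory)
    fired = []
    while True:
        # match: rules whose conditions hold and that would add a new fact
        conflict_set = [r for r in rules if r[1] <= memory and not r[2] <= memory]
        if not conflict_set:
            break                      # no rule matches: stop
        # resolve: keep a single rule (here, the first one in rule order)
        name, _, additions = conflict_set[0]
        # act: perform the conclusion part of the rule
        memory |= additions
        fired.append(name)
    return memory, fired

rules = [
    ("r1", {"a"}, {"b"}),
    ("r2", {"b", "c"}, {"d"}),
    ("r3", {"a"}, {"c"}),
]
final_memory, order = run(rules, {"a"})
print(sorted(final_memory), order)
```

Other conflict resolution strategies (most specific rule, most recently matched fact, explicit priorities) only change the line that picks from conflict_set.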
Chapter-3
3.1. Uninformed Search

3.1.1 Breadth-first search (BFS)
 Description
 A simple strategy in which the root is expanded first, then all the root's successors are expanded
next, then their successors, and so on.
 We visit the search tree level by level: all nodes at a given depth are expanded before any
nodes at the next level are expanded.
 Order in which nodes are expanded.
 Performance Measure:
 Completeness:
 BFS is complete provided the branching factor b is finite: it visits every level in turn, so if
the shallowest goal is at some finite depth d, BFS will eventually reach it.
 Optimality:
 BFS is not optimal unless all actions have the same cost.
 Space complexity and Time complexity:

 Consider a state space where each node has branching factor b. The root of the tree
generates b nodes, each of which generates b more nodes, yielding b^2; each of these generates b more,
yielding b^3, and so on.
 In the worst case, suppose that our solution is at depth d and we expand all nodes
but the last node at level d. Then the total number of generated nodes is
b + b^2 + b^3 + ... + b^d + (b^(d+1) – b) = O(b^(d+1)), which is the time complexity of BFS.
 As all the nodes must be retained in memory while we expand our search, the space
complexity is the same as the time complexity plus the root node: O(b^(d+1)).
 Conclusion:
 Space complexity is an even bigger problem for BFS than its exponential execution
time.
 Time complexity is still a major problem; to convince yourself, look at the table below.
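A minimal BFS sketch follows. It is an illustration only: the small graph is made up, and the visited set turns the tree search described above into a graph search so that repeated states are not expanded twice. The helper generated_nodes simply evaluates the worst-case count b + b^2 + ... + b^d + (b^(d+1) - b).

```python
from collections import deque

def bfs(start, goal, successors):
    """Return a shallowest path from start to goal, or None."""
    frontier = deque([[start]])        # FIFO queue of paths
    visited = {start}
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in successors(node):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None

def generated_nodes(b, d):
    """Worst-case number of nodes generated for branching factor b, depth d."""
    return sum(b ** i for i in range(1, d + 1)) + (b ** (d + 1) - b)

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"],
         "D": ["F"], "E": ["F"], "F": []}
print(bfs("A", "F", lambda n: graph[n]))
print(generated_nodes(10, 2))
```

With b = 10 the count grows by roughly a factor of ten per level, which is why memory runs out long before time does.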
3.1.2. Depth-first search (DFS)

 Description:
 DFS progresses by expanding the first child node of the search tree that appears, going
deeper and deeper until a goal node is found or until it hits a node that has no children. Then the
search backtracks, returning to the most recent node it hasn’t finished exploring.
 Order in which nodes are expanded
 Performance Measure:
 Completeness:
 DFS is not complete. Suppose the search starts by expanding the left subtree of the root,
which contains no solution and is unbounded, while a different choice near the root would
lead to a solution. The search then keeps going deeper forever and never reaches the goal.
 Optimality:

 Suppose there is more than one goal node, and the search first expands the left subtree of
the root, which contains a solution at a very deep level, while the right subtree of the root has a
solution near the root. DFS returns the first goal it finds, which is not guaranteed to be the
best one, so DFS is not optimal.
 Time Complexity:
 Consider a state space identical to that of BFS, with branching factor b, and start the
search from the root.
 In the worst case the goal lies in the last branch explored (even if it is at a shallow level), so
DFS generates all the nodes of the tree, which is O(b^m), where m is the maximum depth.
 Space Complexity:
 Unlike BFS, DFS has very modest memory requirements: it needs to store only
the path from the root to a leaf node, along with the unexpanded siblings of each node on the path.
Remember that BFS needs to store all the explored nodes in memory.
 DFS removes a node from memory once all of its descendants have been expanded.
 With branching factor b and maximum depth m, DFS requires storage of only bm + 1
nodes, which is O(bm), compared to the O(b^(d+1)) of BFS.
 Conclusion:
 DFS may fail to terminate when a path in the search tree is
infinite, so we perform DFS to a limited depth, which is called Depth-limited Search.
3.1.3 Depth Limited Search
• Breadth first has computational, especially space, problems. Depth first can run off down a very
long (or infinite) path.
• Idea: introduce a depth limit on branches to be expanded.
• Don’t expand a branch below this depth.
• Most useful if you know the maximum depth of the solution.
 Perform depth first search but only to a pre-specified depth limit L.

 No node on a path that is more than L steps from the initial state is placed on the Frontier.
 We “truncate” the search by looking only at paths of length L or less.
 Description:
 The unbounded tree problem that appears in DFS can be fixed by imposing a limit on the depth that
DFS can reach. We call this limit the depth limit l; it solves the infinite path problem.
 Performance Measure:
 Completeness:
 The depth limit introduces another problem: if we choose l < d, DLS will never reach
a goal, so DLS is not complete.
 Optimality:
 One can view DFS as a special case of DLS: DFS is DLS with l = infinity.
 DLS is not optimal even if l > d.
 Time Complexity: O(b^l)
 Space Complexity: O(bl)
 Conclusion:
 DLS can be used when there is prior knowledge about the problem, which is not always the case.
Typically, we will not know the depth of the shallowest goal of a problem unless we have solved this
problem before.

It is depth-first search with depth limit l.
 i.e. nodes at depth l have no successors.
 Problem knowledge can be used
 Solves the infinite-path problem.
 If l < d then incompleteness results.
 If l > d then not optimal.
 Time complexity: O(b^l)
 Space complexity: O(bl)
Advantages
 Will always terminate
 Will find solution if there is one in the depth bound
Disadvantages
• Too small a depth bound misses solutions
• Too large a depth bound may find poor solutions when there are better ones
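The trade-off between a small and a large depth bound can be seen in a short recursive sketch. The tree, the node names and the "cutoff" return value are assumptions chosen for illustration; goal nodes are those whose name starts with "G".

```python
def depth_limited_search(node, is_goal, successors, limit):
    """Return a path to a goal, "cutoff" if the limit was hit, or None."""
    if is_goal(node):
        return [node]
    if limit == 0:
        return "cutoff"                # nodes at the depth limit get no successors
    cutoff = False
    for child in successors(node):
        result = depth_limited_search(child, is_goal, successors, limit - 1)
        if result == "cutoff":
            cutoff = True
        elif result is not None:
            return [node] + result
    return "cutoff" if cutoff else None

# Hypothetical tree: a deep goal G1 on the left, a shallow goal G2 on the right.
tree = {"A": ["B", "C"], "B": ["D"], "D": ["G1"], "C": ["G2"]}
succ = lambda n: tree.get(n, [])
goal = lambda n: n.startswith("G")

for limit in (1, 2, 3):
    print(limit, depth_limited_search("A", goal, succ, limit))
```

With limit 1 the search is cut off before any goal (l < d), with limit 2 it finds the shallow goal G2, and with limit 3 it returns the deeper goal G1 first, which is the non-optimal behaviour described above.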
3.1.4. Search Strategies’ Comparison:

Here is a table that compares the performance measures of each search strategy.
3.2. Informed Search

- more powerful than uninformed
- Informed = use problem-specific knowledge
3.2.1. Hill Climbing

 Here feedback from the test procedure is used to help the generator decide which direction to
move in search space.
 The test function is augmented with a heuristic function that provides an estimate of how
close a given state is to the goal state.
 Computation of heuristic function can be done with negligible amount of computation.
 Greedy local search
Hill climbing is often used when a good heuristic function is available for evaluating states but when
no other useful knowledge is available
 Loop that continuously moves in the direction of increasing value
 Terminates when it reaches a “Peak”
 Problem: depending on initial state, can get stuck in local maxima
This simple policy has three well-known drawbacks:
1. Local maxima: a local maximum, as opposed to the global maximum.
2. Plateaus: an area of the search space where the evaluation function is flat, thus requiring a random walk.
3. Ridges: where there are steep slopes and the search direction is not towards the top but towards the side.
Variations of Hill Climbing
 Stochastic hill-climbing
o Random selection among the uphill moves.
o The selection probability can vary with the steepness of the uphill move.
 First-choice hill-climbing
o Implements stochastic hill climbing by generating successors randomly until a better one is
found.
 Random-restart hill-climbing
o Tries to avoid getting stuck in local maxima.
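A minimal sketch of steepest-ascent hill climbing with random restarts is given below. The one-dimensional landscape heights and all function names are assumptions made for illustration; states are positions in the list and the neighbours of a position are the positions next to it.

```python
import random

def hill_climb(state, value, neighbors):
    """Move to the best neighbour until no neighbour is strictly better."""
    while True:
        best = max(neighbors(state), key=value)
        if value(best) <= value(state):
            return state               # a peak (or a plateau) has been reached
        state = best

def random_restart(value, neighbors, random_state, restarts, rng):
    """Run hill climbing from several random states and keep the best peak."""
    peaks = [hill_climb(random_state(rng), value, neighbors)
             for _ in range(restarts)]
    return max(peaks, key=value)

heights = [1, 3, 5, 4, 2, 6, 8, 9, 7]          # local maximum 5, global maximum 9
value = lambda i: heights[i]
neighbors = lambda i: [j for j in (i - 1, i + 1) if 0 <= j < len(heights)]
random_state = lambda rng: rng.randrange(len(heights))

print(hill_climb(0, value, neighbors))          # gets stuck on the local maximum
print(hill_climb(4, value, neighbors))          # reaches the global maximum
print(random_restart(value, neighbors, random_state, 20, random.Random(0)))
```

Which peak is reached depends only on the initial state; restarting from many random states makes it very likely, though not certain, that at least one run ends on the global maximum.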
3.2.2. Best First Search

 General approach of informed search:
o Best-first search: node is selected for expansion based on an evaluation function f(n)
 Idea: evaluation function measures distance to the goal.
o Choose node which appears best
 Implementation:
o fringe is queue sorted in decreasing order of desirability.

o Special cases: greedy search, A* search
 Best First Search is a general search strategy
 Uses an evaluation function f(n) in deciding which node (in queue) to expand next
 Note: “best” could be misleading (it is relative, not absolute)
 Greedy search is one type of Best First Search
3.2.2.1.Greedy Search
 Use a heuristic h() (cost estimate to goal) as the evaluation function

 Example: straight-line distance in finding a path from one city to another
 Evaluation function f(n) = h(n) (heuristic) = estimate of cost from n to goal
e.g., hSLD(n) = straight-line distance from n to Bucharest
 Greedy best-first search expands the node that appears to be closest to goal
 Complete? No – can get stuck in loops, e.g., Iasi → Neamt → Iasi → Neamt → ...
 Time? O(b^m), but a good heuristic can give dramatic improvement
 Space? O(b^m) -- keeps all nodes in memory
 Optimal? No
 But can be acceptable in practice
3.2.2.2. A* Search
 Best-known form of best-first search.
 Idea: avoid expanding paths that are already expensive.
 Evaluation function f(n)=g(n) + h(n)
o g(n) the cost (so far) to reach the node.
o h(n) estimated cost to get from the node to the goal.
o f(n) estimated total cost of path through n to goal.
 A* search uses an admissible heuristic
o A heuristic is admissible if it never overestimates the cost to reach the goal
o Admissible heuristics are optimistic
Formally:
1. h(n) <= h*(n) where h*(n) is the true cost from n
2. h(n) >= 0 so h(G)=0 for any goal G.
e.g. hSLD(n) never overestimates the actual road distance
example:

Find Bucharest starting at Arad
f(Arad) = c(??,Arad)+h(Arad)=0+366=366
Initial State:
Expand Arad and determine f(n) for each node

 f(Sibiu)=c(Arad,Sibiu)+h(Sibiu)=140+253=393
 f(Timisoara)=c(Arad,Timisoara)+h(Timisoara)=118+329=447
 f(Zerind)=c(Arad,Zerind)+h(Zerind)=75+374=449
 Best choice is Sibiu
And so on…
Admissible Heuristic
 A heuristic h(n) is admissible if for every node n,
h(n) ≤ h*(n), where h*(n) is the true cost to reach the goal state from n.
 An admissible heuristic never overestimates the cost to reach the goal, i.e., it is optimistic
 Example: hSLD(n) (never overestimates the actual road distance)
 Theorem: If h(n) is admissible, A* using TREE-SEARCH is optimal
A* Search Evaluation

 Completeness: YES
 Time complexity: (exponential with path length)
 Space complexity: (all nodes are stored)
 Optimality: YES
 Cannot expand f_(i+1) until f_i is finished.
 A* expands all nodes with f(n) < C*, where C* is the cost of the optimal solution
 A* expands some nodes with f(n) = C*
 A* expands no nodes with f(n) > C*
Also optimally efficient (not including ties)
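Greedy search and A* differ only in the evaluation function, which the following sketch makes explicit. It is an illustration under assumptions: the road distances and straight-line distances are the values usually quoted for the Romania example (only the figures for Arad, Sibiu, Timisoara and Zerind appear in these notes), only part of the map is included, and an expanded set is used so the search is a graph search rather than the TREE-SEARCH version.

```python
import heapq

EDGES = [("Arad", "Zerind", 75), ("Arad", "Sibiu", 140), ("Arad", "Timisoara", 118),
         ("Zerind", "Oradea", 71), ("Oradea", "Sibiu", 151), ("Sibiu", "Fagaras", 99),
         ("Sibiu", "Rimnicu Vilcea", 80), ("Fagaras", "Bucharest", 211),
         ("Rimnicu Vilcea", "Pitesti", 97), ("Rimnicu Vilcea", "Craiova", 146),
         ("Pitesti", "Craiova", 138), ("Pitesti", "Bucharest", 101)]
H_SLD = {"Arad": 366, "Zerind": 374, "Oradea": 380, "Timisoara": 329, "Sibiu": 253,
         "Fagaras": 176, "Rimnicu Vilcea": 193, "Pitesti": 100, "Craiova": 160,
         "Bucharest": 0}

ROADS = {}
for a, b, cost in EDGES:                       # roads can be driven both ways
    ROADS.setdefault(a, {})[b] = cost
    ROADS.setdefault(b, {})[a] = cost

def best_first(start, goal, use_path_cost):
    """A* if use_path_cost (f = g + h), greedy search otherwise (f = h)."""
    frontier = [(H_SLD[start], 0, [start])]    # priority queue ordered by f
    expanded = set()
    while frontier:
        f, g, path = heapq.heappop(frontier)
        node = path[-1]
        if node == goal:
            return path, g
        if node in expanded:
            continue
        expanded.add(node)
        for nxt, cost in ROADS[node].items():
            g2 = g + cost
            f2 = (g2 if use_path_cost else 0) + H_SLD[nxt]
            heapq.heappush(frontier, (f2, g2, path + [nxt]))
    return None

print(best_first("Arad", "Bucharest", use_path_cost=True))
print(best_first("Arad", "Bucharest", use_path_cost=False))
```

On this data A* returns the route through Rimnicu Vilcea and Pitesti with cost 418, while greedy search goes through Fagaras and pays 450, which illustrates why greedy search is not optimal.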
3.2.3. Adversarial Search

MINIMAX procedure
 Perfect play for deterministic games
 Idea: choose move to position with highest minimax value = best achievable payoff against
best play
 E.g., 2-ply game:
MINIMAX Algorithm
minimax(player, board)
    if (game over in current board position)
        return the utility of this position (win, lose or draw)
    children = all legal moves for player from this board
    if (max's turn)
        return maximal score of calling minimax on all the children
    else (min's turn)
        return minimal score of calling minimax on all the children
 Complete? Yes (if tree is finite)

 Optimal? Yes (against an optimal opponent)
 Time complexity? O(b^m)
 Space complexity? O(bm) (depth-first exploration)
 For chess, b ≈ 35, m ≈ 100 for "reasonable" games
 exact solution completely infeasible
Alpha Beta Pruning
 ALPHA-BETA pruning is a method that reduces the number of nodes explored in the Minimax
strategy.
 It reduces the time required for the search by not exploring moves that are already known to be
worse for the current player than a move examined earlier.

 The implementation of alpha-beta keeps track of the best move found so far for each side as it moves
through the tree.
Properties of α-β

 Pruning does not affect final result
 Good move ordering improves effectiveness of pruning
 With "perfect ordering," time complexity = O(b^(m/2))
 doubles depth of search
Why it is called alpha-beta?
 A simple example of the value of reasoning about which computations are relevant
 α is the value of the best (i.e., highest-value) choice found so far at any choice point along the
path for max
 If v is worse than α, max will avoid it
 prune that branch
 Define β similarly for min
Chapter 4
4.1.1 Logics are formal languages for formalizing reasoning, in particular for representing
information such that conclusions can be drawn
Logic involves:
– A language with a syntax for specifying what is a legal expression in the language;
syntax defines well formed sentences in the language
– Semantics for associating elements of the language with elements of some subject
matter. Semantics defines the "meaning" of sentences (link to the world); i.e.,
semantics defines the truth of a sentence with respect to each possible world
– Inference rules for manipulating sentences in the language
4.1.2. Syntax (grammar, internal structure of the language)

– Vocabulary: grammatical categories
– Identifying Well-Formed Formulae (“WFFs”)
4.1.3 Semantics (pertaining to meaning and truth value)
– Translation
– Truth functions
– Truth tables for the connectives
4.1.4. Connectives (“Sentence-Forming Operators”)

~ negation “not,” “it is not the case that”
⋅ conjunction “and”
∨ disjunction “or” (inclusive)
⊃ conditional “if – then,” “implies”
≣ biconditional “if and only if,” “iff”
• Connect sentences to make new sentences
• Negation attaches to one sentence
– It is not raining ∼ R
• Conjunction, disjunction, conditional and biconditional attach two sentences together
– It is raining and it is cold R ∙ C

– If it rains then it pours R⊃P
4.1.5. Well-Formed Formulae

Rules for WFF
1. A sentence letter by itself is a WFF
A     B     Z
2. The result of putting ~ immediately in front of a WFF is a WFF
~A     ~B     ~~B     ~(A ∨ B)     ~(~C ⊃ D)
3. The result of putting ⋅, ∨, ⊃, or ≣ between two WFFs and surrounding the whole thing with
parentheses is a WFF
(A ⋅ B)     (~C ∨ D)     ((~C ∨ D) ⊃ (E ≣ (F ⋅ ~G)))
4. Outside parentheses may be dropped
A ⋅ B     ~C ∨ D     (~C ∨ D) ⊃ (E ≣ (F ⋅ ~G))
A sentence that can be constructed by applying the rules for constructing WFFs one at a time is a
WFF
A sentence which can't be so constructed is not a WFF.
– Atomic sentences are wffs:
Propositional symbol (atom)
Examples: P, Q, R, BlockIsRed, SeasonIsWinter
– Complex or compound wffs.
Given w1 and w2 wffs:
~w1 (negation)
(w1 ⋅ w2) (conjunction)
(w1 ∨ w2) (disjunction)
(w1 ⊃ w2) (implication; w1 is the antecedent;
w2 is the consequent)
(w1 ≣ w2) (biconditional)
4.1.6. Tautology
If a wff is True under all the interpretations of its constituent atoms, we say that
the wff is valid or that it is a tautology.
Examples:
1. P ⊃ P     2. ~(P ⋅ ~P)     3. [P ⊃ (Q ⊃ P)]     4. [((P ⊃ Q) ⊃ P) ⊃ P]
An inconsistent sentence or contradiction is a sentence that is False under all interpretations. The
world is never like what it describes, as in “It’s raining and it’s not raining.”
4.1.7.Validity
An argument is a sequence of propositions. The final proposition is called the conclusion of the argument, while the other
propositions are called the premises or hypotheses of the argument.
An argument is valid whenever the truth of all its premises implies the truth of its conclusion, i.e. whenever
(p1 ⋅ p2 ⋅ ... ⋅ pn) ⊃ q is a tautology, where p1, ..., pn are the premises and q is the conclusion.
One can use the rules of inference to show the validity of an argument.
Note that p1, p2, … q are generally compound propositions or wffs.
4.2.
Intelligent agents should have capacity for:

 Perceiving: acquiring information from the environment,
 Knowledge Representation: representing its understanding of the world,

 Reasoning: inferring the implications of what it knows and of the choices it has, and
 Acting: choosing what it wants to do and carrying it out.
4.2.1.Knowledge Base
 The representation of knowledge and the reasoning processes that bring knowledge to life are
central to the entire field of AI
 Knowledge and reasoning also play a crucial role in dealing with partially observable
environments
 The central component of a knowledge-based agent is its knowledge base.
 Knowledge base = set of sentences in a formal language

 Declarative approach to building an agent (or other system):
 TELL it what it needs to know
 Then it can ASK itself what to do
- answers should follow from the KB
4.2.2.Entailment
 Entailment means that one thing follows from another:
KB ╞ α
 Knowledge base KB entails sentence α if and only if α is true in all worlds where KB is
true
o e.g., the KB containing “the Giants won” and “the Reds won” entails “Either the
Giants won or the Reds won”
o E.g., x+y = 4 entails 4 = x+y
o Entailment is a relationship between sentences (i.e., syntax) that is based on
semantics
Inference
 Notation :KB ├i α = sentence α can be derived from KB by procedure i
 Soundness: i is sound if whenever KB ├i α, it is also true that KB╞ α
 Completeness: i is complete if whenever KB╞ α, it is also true that KB ├i α
Sound Rules of Inference

Here are some examples of sound rules of inference
 A rule is sound if its conclusion is true whenever the premise is true
Each can be shown to be sound using a truth table
RULE              PREMISE            CONCLUSION
Modus Ponens      A, A → B           B
And Introduction  A, B               A ∧ B
And Elimination   A ∧ B              A
Double Negation   ¬¬A                A
Unit Resolution   A ∨ B, ¬B          A
Resolution        A ∨ B, ¬B ∨ C      A ∨ C
Soundness of Modus Ponens

A      B      A→B    OK?
True   True   True   ✓
True   False  False  ✓
False  True   True   ✓
False  False  True   ✓
Horn Clause
A Horn sentence or Horn clause has the form:
P1 ∧ P2 ∧ P3 ... ∧ Pn → Q
or alternatively
¬P1 ∨ ¬P2 ∨ ¬P3 ... ∨ ¬Pn ∨ Q
where Ps and Q are non-negated atoms
• To get a proof for Horn sentences, apply Modus Ponens repeatedly until nothing can be done
• We will use the Horn clause form later
4.2.3.Propositional Logic
Propositional Logic Syntax
 Propositional logic is the simplest logic – illustrates basic ideas
 All objects described are fixed or unique
 E.g. "John is a student" student(john) ; Here John refers to one unique person.
 In propositional logic (PL) a user defines a set of propositional symbols, like P and Q. The user
defines the semantics of each of these symbols. For example,
 P means "It is hot"
 Q means "It is humid"
 R means "It is raining"
 The proposition symbols:
S, S1, S2 etc are sentences
– If S is a sentence, ¬S is a sentence (negation)
– If S1 and S2 are sentences, S1 ∧ S2 is a sentence (conjunction)
– If S1 and S2 are sentences, S1 ∨ S2 is a sentence (disjunction)
– If S1 and S2 are sentences, S1 → S2 is a sentence (implication)
– If S1 and S2 are sentences, S1 ↔ S2 is a sentence (biconditional)
Propositional Logic Semantics

 Each model specifies true/false for each proposition symbol
With these three symbols there are 2^3 = 8 possible models, which can be enumerated automatically.
Rules for evaluating truth with respect to a model m:
¬S is true iff S is false
S1 ∧ S2 is true iff S1 is true and S2 is true
S1 ∨ S2 is true iff S1 is true or S2 is true
S1 → S2 is true iff S1 is false or S2 is true
i.e., is false iff S1 is true and S2 is false
S1 ↔ S2 is true iff S1 → S2 is true and S2 → S1 is true

 Simple recursive process evaluates an arbitrary sentence, e.g.,
¬P1,2 ∧ (P2,2 ∨ P3,1) = true ∧ (true ∨ false) = true ∧ true = true
Truth Table for Connectives
Validity and satisfiability

A sentence is valid if it is true in all models,
e.g., True, A ∨ ¬A, A → A, (A ∧ (A → B)) → B
Validity is connected to inference via the Deduction Theorem:
KB ╞ α if and only if (KB → α) is valid
A sentence is satisfiable if it is true in some model
e.g., A ∨ B, C
A sentence is unsatisfiable if it is true in no models
e.g., A ∧ ¬A
Satisfiability is connected to inference via the following:
KB ╞ α if and only if (KB ∧ ¬α) is unsatisfiable
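These definitions can be checked mechanically by enumerating all models. The sketch below is an illustration only: sentences are represented as Python functions from a model (a dict of truth values) to True or False, and the function names are assumptions.

```python
from itertools import product

def models(symbols):
    """Generate every assignment of True/False to the given symbols."""
    for values in product([True, False], repeat=len(symbols)):
        yield dict(zip(symbols, values))

def implies(a, b):
    return (not a) or b

def is_valid(sentence, symbols):
    return all(sentence(m) for m in models(symbols))

def is_satisfiable(sentence, symbols):
    return any(sentence(m) for m in models(symbols))

def entails(kb, alpha, symbols):
    """KB entails alpha iff alpha is true in every model where KB is true."""
    return all(alpha(m) for m in models(symbols) if kb(m))

symbols = ["A", "B"]
kb = lambda m: m["A"] and implies(m["A"], m["B"])      # A and (A -> B)
alpha = lambda m: m["B"]

print(entails(kb, alpha, symbols))
print(is_valid(lambda m: implies(kb(m), alpha(m)), symbols))
print(is_satisfiable(lambda m: kb(m) and not alpha(m), symbols))
```

The three printed results agree with each other as the two theorems require: the KB entails B, the implication from KB to B is valid, and KB together with the negation of B is unsatisfiable.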
Logical Equivalence
 Two sentences are logically equivalent iff true in same models: α ≡ ß iff α╞ β and β╞ α
Resolution
 Conjunctive Normal Form (CNF)
o conjunction of clauses, where each clause is a disjunction of literals
 E.g., (A ∨ ¬B) ∧ (B ∨ ¬C ∨ ¬D)
 Resolution is sound and complete for propositional logic
 Conversion to CNF
B1,1 ↔ (P1,2 ∨ P2,1)

1. Eliminate ↔, replacing α ↔ β with (α → β) ∧ (β → α).
(B1,1 → (P1,2 ∨ P2,1)) ∧ ((P1,2 ∨ P2,1) → B1,1)

2. Eliminate →, replacing α → β with ¬α ∨ β.
(¬B1,1 ∨ P1,2 ∨ P2,1) ∧ (¬(P1,2 ∨ P2,1) ∨ B1,1)
3. Move ¬ inwards using de Morgan's rules and double-negation:
(¬B1,1 ∨ P1,2 ∨ P2,1) ∧ ((¬P1,2 ∧ ¬P2,1) ∨ B1,1)
4. Apply distributivity law (∨ over ∧) and flatten:
(¬B1,1 ∨ P1,2 ∨ P2,1) ∧ (¬P1,2 ∨ B1,1) ∧ (¬P2,1 ∨ B1,1)
 Resolution Algorithm
 Proof by contradiction, i.e., show KB ∧ ¬α unsatisfiable
 Propositional Resolution
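The resolution procedure can be sketched on clauses in CNF. In this illustration, clauses are frozensets of literal strings with "~" marking negation, and the knowledge base is the three clauses obtained from the CNF conversion above plus the extra fact ¬B1,1; that extra fact and the query are assumptions added to make a complete example.

```python
from itertools import combinations

def negate(literal):
    return literal[1:] if literal.startswith("~") else "~" + literal

def resolvents(c1, c2):
    """All clauses obtained by resolving c1 and c2 on one complementary pair."""
    return [frozenset((c1 - {lit}) | (c2 - {negate(lit)}))
            for lit in c1 if negate(lit) in c2]

def unsatisfiable(clauses):
    """Saturate under resolution; True iff the empty clause is derived."""
    clauses = set(clauses)
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for r in resolvents(c1, c2):
                if not r:
                    return True        # empty clause: contradiction found
                new.add(r)
        if new <= clauses:
            return False               # nothing new can be derived
        clauses |= new

def clause(*literals):
    return frozenset(literals)

KB = [clause("~B11", "P12", "P21"), clause("~P12", "B11"),
      clause("~P21", "B11"), clause("~B11")]

# To show KB entails ~P12, add the negated query P12 and look for a contradiction.
print(unsatisfiable(KB + [clause("P12")]))
```

Adding P12 lets the clause ¬P1,2 ∨ B1,1 resolve to B1,1, which clashes with ¬B1,1 and yields the empty clause, so the KB entails ¬P1,2.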
Advantages of propositional logic:

· Simple.
· No decidability problems.
Limitations of Propositional Calculus

 An argument may not be provable using propositional logic, but may be provable using
predicate logic.
 e.g. All horses are animals.
Therefore, the head of a horse is the head of an animal.

We know that this argument is correct and yet it cannot be proved under propositional logic,
but it can be proved under predicate logic.
 Limited representational power.
 Simple statements may require large and awkward representations.
4.2.4.First Order Predicate Logic (FOPL)
Predicate Logic (FOPL) provides

i) A language to express assertions (axioms) about certain "worlds ".
ii) An inference system or deductive apparatus whereby we may draw conclusions from
such assertions and
iii) A semantics based on set theory.
The language of FOPL consists of

i) A set of constant symbols (to name particular individuals such as table, a,b,c,d,e etc. - these depend
on the application)
ii) A set of variables (to refer to arbitrary individuals)
iii) A set of predicate symbols (to represent relations such as On, Above etc. -these depend on the
application)
iv) A set of function symbols (to represent functions - these depend on the application)
v) The logical connectives ∧, ∨, →, ↔, ¬ (to capture and, or, implies, iff and not)
vi) The Universal Quantifier, ∀, and the Existential Quantifier, ∃ (to capture “all”, “every”, “some”,
“few”, “there exists” etc.)
vii) Normally a special binary relation of equality (=) is considered (at least in mathematics) as part of
the language.
Quantification
Universal Quantification
∀ <variables> <sentence>
Everyone at KEC is smart:
∀x At(x,KEC) → Smart(x)
∀x P is true in a model m iff P is true with x being each possible object in the model
 Roughly speaking, equivalent to the conjunction of instantiations of P
At(KingJohn,KEC) → Smart(KingJohn)
∧ At(Richard,KEC) → Smart(Richard)
∧ ...
 Common mistake to avoid:

 Typically, → is the main connective with ∀
 Common mistake: using ∧ as the main connective with ∀:
∀x At(x,KEC) ∧ Smart(x) means “Everyone is at KEC and everyone is smart”
Existential Quantification
∃ <variables> <sentence>
 Someone at KEC is smart:
 ∃x At(x,KEC) ∧ Smart(x)
 ∃x P is true in a model m iff P is true with x being some
possible object in the model
 Typically, ∧ is the main connective with ∃
 Common mistake: using → as the main connective with ∃:
∃x At(x,KEC) → Smart(x) is true if there is anyone who is not at KEC

Properties of Quantifiers
 ∀x ∀y is the same as ∀y ∀x
 ∃x ∃y is the same as ∃y ∃x
 ∃x ∀y is not the same as ∀y ∃x
 ∃x ∀y Loves(x,y)
 “There is a person who loves everyone in the world”
 ∀y ∃x Loves(x,y)
 “Everyone in the world is loved by at least one person”
 Quantifier duality: each can be expressed using the other
 ∀x Likes(x,IceCream) ≡ ¬∃x ¬Likes(x,IceCream)
 ∃x Likes(x,Broccoli) ≡ ¬∀x ¬Likes(x,Broccoli)
Example 1
For example, Suppose we wish to represent in FOPL the following sentences
a) “Everyone loves Janet”
b) “Not everyone loves Daphne”
c) “Everyone is loved by their mother”
We introduce constant symbols j and d to represent Janet and Daphne respectively, a binary
predicate symbol L to represent loves, and the unary function symbol m to represent the
mother of the person given as argument.
The above sentences may now be represented in FOPL by
a) ∀x.L(x,j)
b) ∃x.¬L(x,d)
c) ∀x.L(m(x),x)
Example 2
We will express the following in first order predicate calculus
“sam is Kind”
“Every kind person has someone who loves them”
“sam loves someone”
The non-logical symbols of our language are
the constant sam and
the unary predicate (or property) Kind and
the binary predicate Loves.
We may represent the above sentences as
1. Kind(sam)
2. ∀x.(Kind(x) → ∃y.Loves(y,x))
3. ∃y Loves(sam,y)
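Over a finite domain the quantifiers reduce to Python's all and any, so the three sentences of Example 2 can be evaluated directly. The domain and the Kind and Loves relations below are a made-up interpretation chosen for illustration; ann and bob are hypothetical individuals.

```python
people = ["sam", "ann", "bob"]                 # the universe of discourse
kind = {"sam"}
loves = {("ann", "sam"), ("sam", "bob"), ("bob", "ann")}   # (lover, loved)

def Kind(x):
    return x in kind

def Loves(x, y):
    return (x, y) in loves

# 1. Kind(sam)
s1 = Kind("sam")
# 2. for all x: Kind(x) implies there exists y with Loves(y, x)
s2 = all((not Kind(x)) or any(Loves(y, x) for y in people) for x in people)
# 3. there exists y with Loves(sam, y)
s3 = any(Loves("sam", y) for y in people)

# Quantifier order matters:
exists_forall = any(all(Loves(x, y) for y in people) for x in people)
forall_exists = all(any(Loves(x, y) for x in people) for y in people)

# Quantifier duality: "for all x P(x)" equals "not exists x not P(x)".
duality = all(Kind(x) for x in people) == (not any(not Kind(x) for x in people))

print(s1, s2, s3, exists_forall, forall_exists, duality)
```

This interpretation is a model of all three sentences. It also separates the two quantifier orders: everyone is loved by at least one person, yet no single person loves everyone.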
Some Semantic Issues

An interpretation (of the language of FOPL) consists of
a) a non empty set of objects (the Universe of Discourse, D) containing designated
individuals named by the constant symbols
b) for each function symbol in the language of FOPL, a corresponding function over D.
c) for each predicate symbol in the language of FOPL, a corresponding relation over D.
An interpretation is said to be a model for a set of sentences Γ if each sentence of Γ is true
under the given interpretation.

 The interpretation of a formula F in first order predicate logic consists of fixing a
non-empty domain of values D and associating a value with every constant,
function and predicate in the formula F as follows:
 (1) Every constant has an associated value in D.
 (2) Every function f, of arity n, is defined by the correspondence f : D^n → D,
where D^n = {(x1, ..., xn) | x1 ∈ D, ..., xn ∈ D}.
 (3) Every predicate P, of arity n, is defined by the correspondence P : D^n → {true, false}.
 Interpretation Example
Using FOL
 Brothers are siblings
∀x,y Brother(x,y) → Sibling(x,y)
 One's mother is one's female parent
∀m,c Mother(c) = m ↔ (Female(m) ∧ Parent(m,c))
 “Sibling” is symmetric
∀x,y Sibling(x,y) ↔ Sibling(y,x)
 Marcus was a man
 Man(Marcus)
 Marcus was a Pompeian
 Pompeian(Marcus)
 All Pompeians were Romans
 ∀x: Pompeian(x) → Roman(x)
 All Romans were either loyal to Caesar or hated him
 ∀x: Roman(x) → loyalto(x,Caesar) ∨ hate(x,Caesar)
 Everyone is loyal to someone
 ∀x: ∃y: loyalto(x,y)
 People only try to assassinate rulers they are not loyal to
∀x: ∀y: person(x) ∧ ruler(y) ∧ tryassassinate(x,y) → ¬loyalto(x,y)

4.3
Inference Rules
Complex deductive arguments can be judged valid or invalid based on whether or not the steps in that
argument follow the nine basic rules of inference. These rules of inference are all relatively simple,
although when presented in formal terms they can look overly complex.
Conjunction:
1. P
2. Q
3. Therefore, P and Q.
1. It is raining in New York.
2. It is raining in Boston
3. Therefore, it is raining in both New York and Boston
Simplification
1. P and Q.
2. Therefore, P.
1. It is raining in both New York and Boston.
2. Therefore, it is raining in New York.
Addition
1. P
2. Therefore, P or Q.
1. It is raining
2. Therefore, either it is raining or the sun is shining.
Absorption
1. If P, then Q.
2. Therefore, if P then P and Q.
1. If it is raining, then I will get wet.
2. Therefore, if it is raining, then it is raining and I will get wet.
Modus Ponens
1. If P then Q (P -> Q).
2. P.
3. Therefore, Q.
1. If it is raining, then I will get wet.
2. It is raining.
3. Therefore, I will get wet.
Modus Tollens
1. If P then Q.
2. Not Q. (~Q).
3. Therefore, not P (~P).

1. If it had rained this morning, I would have gotten wet.
2. I did not get wet.
3. Therefore, it did not rain this morning.
Hypothetical Syllogism
1. If P then Q.
2. If Q then R.
3. Therefore, if P then R.
1. If it rains, then I will get wet.
2. If I get wet, then my shirt will be ruined.
3. Therefore, if it rains, then my shirt will be ruined.
Disjunctive Syllogism/unit resolution
1. Either P or Q (P v Q).
2. Not P (~P).
3. Therefore, Q.
1. Either it rained or I took a cab to the movies.
2. It did not rain.
3. Therefore, I took a cab to the movies.
Constructive Dilemma
1. (If P then Q) and (If R then S).
2. P or R.
3. Therefore, Q or S.
1. If it rains, then I will get wet and if it is sunny, then I will be dry.
2. Either it will rain or it will be sunny.
3. Therefore, either I will get wet or I will be dry.
The above rules of inference, when combined with the rules of replacement, mean that propositional
calculus is "complete." Propositional calculus is simply another name for propositional logic.
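Each of the nine rules can be confirmed by a truth table: a rule is valid if the conclusion is true in every row where all the premises are true. The sketch below is an illustration; the encoding of a rule as a pair of functions over four proposition values p, q, r, s is an assumption. It checks all nine rules and, for contrast, the invalid pattern "affirming the consequent".

```python
from itertools import product

def implies(a, b):
    return (not a) or b

# Each rule maps to (premises, conclusion), both functions of (p, q, r, s).
RULES = {
    "Conjunction":            (lambda p, q, r, s: [p, q],
                               lambda p, q, r, s: p and q),
    "Simplification":         (lambda p, q, r, s: [p and q],
                               lambda p, q, r, s: p),
    "Addition":               (lambda p, q, r, s: [p],
                               lambda p, q, r, s: p or q),
    "Absorption":             (lambda p, q, r, s: [implies(p, q)],
                               lambda p, q, r, s: implies(p, p and q)),
    "Modus Ponens":           (lambda p, q, r, s: [implies(p, q), p],
                               lambda p, q, r, s: q),
    "Modus Tollens":          (lambda p, q, r, s: [implies(p, q), not q],
                               lambda p, q, r, s: not p),
    "Hypothetical Syllogism": (lambda p, q, r, s: [implies(p, q), implies(q, r)],
                               lambda p, q, r, s: implies(p, r)),
    "Disjunctive Syllogism":  (lambda p, q, r, s: [p or q, not p],
                               lambda p, q, r, s: q),
    "Constructive Dilemma":   (lambda p, q, r, s: [implies(p, q) and implies(r, s), p or r],
                               lambda p, q, r, s: q or s),
}

def is_valid(premises, conclusion):
    """True iff the conclusion holds in every row where all premises hold."""
    for values in product([True, False], repeat=4):
        if all(premises(*values)) and not conclusion(*values):
            return False
    return True

for name, (premises, conclusion) in RULES.items():
    print(name, is_valid(premises, conclusion))

# Affirming the consequent (from "if P then Q" and Q, infer P) is not valid.
print(is_valid(lambda p, q, r, s: [implies(p, q), q], lambda p, q, r, s: p))
```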
Unification
Unification, in computer science and logic, is an algorithmic process by which one attempts to solve
the satisfiability problem. The goal of unification is to find a substitution which demonstrates that two
seemingly different terms are in fact either identical or just equal. Unification is widely used
in automated reasoning, logic programming and programming language type system implementation.
Several kinds of unification are commonly studied: that for theories without any equations (the empty
theory) is referred to as syntactic unification: one wishes to show that (pairs of) terms are identical.
If one has a non-empty equational theory, then one is typically interested in showing the equality of (a
pair of) terms; this is referred to as semantic unification. Since substitutions can be ordered into
a partial order, unification can be understood as the procedure of finding a join on a lattice.
We also need some way of binding variables to values in a consistent way so that components of
sentences can be matched. This is the process of Unification.
Binding
A binding list is a set of entries of the form v = e where v is a variable and e is an object. Given an
expression p and a binding list σ, we write pσ for the instantiation of p using the bindings in σ.
Unifier
Given two expressions p and q, a unifier is a binding list σ such that pσ = qσ.
Most General Unifier
MGU is a unifier that binds the fewest variables or binds them to less specific expressions.
Most General Unifier (MGU) Algorithm for expressions p and q
1. If either p or q is either an object constant or a variable, then:
i). If p = q, then p and q already unify and we return { }.
ii). If either p or q is a variable, then return the result of binding that variable to the other expression.
iii). Otherwise return failure.
2. If neither p nor q is an object constant or a variable, then they must both be compound expressions
(suppose each is made up of p1, ..., pn and q1, ..., qm) and must be unified one component at a time.
i). If the types and any function/relation constant are not equal, return failure.
ii). If n ≠ m, then return failure.
iii). Otherwise do the following:
a). Set σ = { }, k = 0.
b). If k = n then stop and return σ as the mgu of p and q.
c). Otherwise, increment k and apply mgu recursively to pkσ and qkσ.
 If pkσ and qkσ unify, add the new bindings to σ and return to step 2(iii)(b).
 If pkσ and qkσ fail to unify then return failure for the unification of p and q.
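The MGU procedure can be sketched in Python as follows. This is an illustrative sketch, not a reference implementation. The term representation is an assumption: variables are strings starting with "?", constants are other strings, and compound expressions are tuples whose first element is the function/relation constant. The occurs check is an addition that the steps above do not mention; it prevents binding a variable to a term containing itself.

```python
def is_var(t):
    """Assumed convention: variables are strings that start with '?'."""
    return isinstance(t, str) and t.startswith("?")


def substitute(t, s):
    """Apply the binding list s (a dict) to the term t."""
    if is_var(t):
        return substitute(s[t], s) if t in s else t
    if isinstance(t, tuple):
        return tuple(substitute(a, s) for a in t)
    return t


def occurs(v, t, s):
    """True if variable v appears inside term t under the bindings s."""
    t = substitute(t, s)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, s) for a in t)


def unify(p, q, s=None):
    """Return a most general unifier of p and q as a dict, or None on failure."""
    s = {} if s is None else s
    p, q = substitute(p, s), substitute(q, s)
    if p == q:                       # step 1(i): already identical
        return s
    if is_var(p):                    # step 1(ii): bind the variable
        return None if occurs(p, q, s) else {**s, p: q}
    if is_var(q):
        return None if occurs(q, p, s) else {**s, q: p}
    if isinstance(p, tuple) and isinstance(q, tuple):
        # steps 2(i) and 2(ii): same arity and same function/relation constant
        if len(p) != len(q) or p[0] != q[0]:
            return None
        for a, b in zip(p[1:], q[1:]):   # step 2(iii): one component at a time
            s = unify(a, b, s)
            if s is None:
                return None
        return s
    return None                      # step 1(iii): two different constants


mgu = unify(("knows", "john", "?x"), ("knows", "?y", "mary"))
print(mgu)
```

Here `knows`, `john` and `mary` are hypothetical symbols chosen for the example.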
Resolution Refutation System

 Resolution is a technique for proving theorems in predicate calculus
 Resolution is a sound inference rule that, when used to produce a refutation, is also complete
 Resolution theorem proving, and the resolution refutation system in particular, has an important
practical application: it made the current generation of Prolog interpreters possible
 The resolution principle describes a way of finding contradictions in a database of clauses
with minimum substitution
 Resolution Refutation proves a theorem by negating the statement to be proved and adding
the negated goal to the set of axioms that are known or have been assumed to be true
 It then uses the resolution rule of inference to show that this leads to a contradiction
 Steps in Resolution Refutation Proof
1. Put the premises or axioms into clause form
2. Add the negations of what is to be proved in clause form, to the set of axioms
3. Resolve these clauses together, producing new clauses that logically follow from them
4. Produce a contradiction by generating the empty clause
Discussion on Steps
 Resolution Refutation proofs require that the axioms and the negation of the goal be placed in
a normal form called the clause form

 Clausal form represents the logical database as a set of disjunctions of literals
 Resolution is applied to two clauses when one contains a literal and the other its negation
 The substitutions used to produce the empty clause are those under which the opposite of the
negated goal is true
 If these literals contain variables, they must be unified to make them equivalent
 A new clause is then produced consisting of the disjunction of all the predicates in the two
clauses minus the literal and its negative instance (which are said to have been “resolved
away”)
 Example:
We wish to prove that “Fido will die” from the statements that
“Fido is a dog” and “all dogs are animals” and “all animals will die”
Convert these predicates to clause form:
Predicate Form | Clause Form
∀x: [dog(x) → animal(x)] | ¬dog(x) V animal(x)
dog(fido) | dog(fido)
∀y: [animal(y) → die(y)] | ¬animal(y) V die(y)
Apply Resolution
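The refutation for the Fido example can be sketched in Python. To keep the sketch short it works on ground clauses only: the variables are assumed to have already been bound to fido by unification, so no unifier is computed here. Literals are plain strings, with "~" standing for ¬; the function names are illustrative assumptions.

```python
from itertools import combinations


def negate(lit):
    """Return the complementary literal ('~' marks negation)."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def resolvents(c1, c2):
    """All clauses obtained by resolving c1 and c2 on one complementary pair."""
    return [(c1 - {lit}) | (c2 - {negate(lit)})
            for lit in c1 if negate(lit) in c2]


def refutes(clauses):
    """True if the empty clause can be derived, i.e. the set is contradictory."""
    known = set(clauses)
    while True:
        new = set()
        for c1, c2 in combinations(known, 2):
            for r in resolvents(c1, c2):
                if not r:            # empty clause: contradiction found
                    return True
                new.add(frozenset(r))
        if new <= known:             # nothing new can be derived
            return False
        known |= new


axioms = [
    frozenset({"~dog(fido)", "animal(fido)"}),
    frozenset({"dog(fido)"}),
    frozenset({"~animal(fido)", "die(fido)"}),
]
negated_goal = frozenset({"~die(fido)"})

proved = refutes(axioms + [negated_goal])
print("Fido will die:", proved)
```

Adding the negated goal makes the clause set contradictory, so the goal die(fido) follows from the axioms. The axioms on their own are consistent, so `refutes(axioms)` returns False.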
Q.1. Anyone passing the Artificial Intelligence exam and winning the lottery is happy. But anyone
who studies or is lucky can pass all their exams. Ali did not study but he is lucky. Anyone who is
lucky wins the lottery. Is Ali happy?
Anyone passing the AI exam and winning the lottery is happy
∀x: [pass(x, AI) Λ win(x, lottery) → happy(x)]
Anyone who studies or is lucky can pass all their exams
∀x ∀y: [study(x) V lucky(x) → pass(x, y)]
Ali did not study but he is lucky
¬study(ali) Λ lucky(ali)
Anyone who is lucky wins the lottery
∀x: [lucky(x) → win(x, lottery)]
Change to clausal form

1. ¬pass(X,AI) V ¬win(X,lottery) V happy(X)
2. ¬study(Y) V pass(Y,Z)
3. ¬lucky(W) V pass(W,V)
4. ¬study(ali)
5. lucky(ali)
6. ¬lucky(U) V win(U,lottery)
7. Add negation of the conclusion: ¬happy(ali)
4.4.
Symbolic versus statistical reasoning
The (symbolic) methods basically represent belief under uncertainty as being
 True,
 False, or
 Neither True nor False.
Some methods also had problems with
 Incomplete Knowledge
 Contradictions in the knowledge.
Statistical methods provide a method for representing beliefs that are not certain (or uncertain) but for
which there may be some supporting (or contradictory) evidence.
Statistical methods offer advantages in two broad scenarios:
Genuine Randomness
-- Card games are a good example. We may not be able to predict any outcomes with
certainty but we have knowledge about the likelihood of certain items (e.g. like being dealt an
ace) and we can exploit this.
Exceptions
-- Symbolic methods can represent these. However, if the number of exceptions is large, such
systems tend to break down, as in many common-sense and expert reasoning tasks.
Statistical techniques can summarise large numbers of exceptions without resorting to enumeration.
Basic Statistical methods -- Probability

The basic approach statistical methods adopt to deal with uncertainty is via the axioms of probability:
 Probabilities are (real) numbers in the range 0 to 1.
 A probability of P(A) = 0 indicates that A is certainly false, P(A) = 1 that A is certainly true, and values
in between some degree of (un)certainty.
 Probabilities can be calculated in a number of ways.

Very Simply
Probability = (number of desired outcomes) / (total number of outcomes)
So given a pack of playing cards the probability of being dealt an ace from a full normal deck
is 4 (the number of aces) / 52 (number of cards in deck) which is 1/13. Similarly the
probability of being dealt a spade suit is 13 / 52 = 1/4.
If you have a choice of k items from a set of n items then
the formula n! / (k! (n – k)!) is applied to find the number of ways of making this choice (! =
factorial).
So the chance of winning the national lottery (choosing 6 from 49) is 1 in
49! / (6! 43!) = 13,983,816.
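These counts are easy to verify. The short sketch below uses Python's standard `fractions.Fraction` and `math.comb` (available from Python 3.8); the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

# Probability = (number of desired outcomes) / (total number of outcomes)
p_ace = Fraction(4, 52)      # four aces in a 52-card deck
p_spade = Fraction(13, 52)   # thirteen spades in a 52-card deck

# Number of ways of choosing k = 6 items from n = 49: n! / (k! (n - k)!)
lottery_combinations = comb(49, 6)
p_lottery = Fraction(1, lottery_combinations)

print(p_ace, p_spade, lottery_combinations)
```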
 Conditional probability, P(A|B), indicates the probability of event A given that we know
event B has occurred.
Bayes Theorem
 This states:
P(Hi|E) = P(E|Hi) P(Hi) / Σk P(E|Hk) P(Hk)
o This reads that, given some evidence E, the probability that hypothesis Hi is true is
equal to the ratio of the probability that E will be true given Hi times the a
priori probability of Hi, to the sum over the set of all hypotheses of the probability of E
given each hypothesis times the probability of that hypothesis.
o The set of all hypotheses must be mutually exclusive and exhaustive.
o Thus, to examine medical evidence to diagnose an illness, we must know
the prior probability of each illness and also the probability of observing each
symptom given each illness.
Bayesian statistics lie at the heart of most statistical reasoning systems.
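As an illustration, Bayes' theorem over a mutually exclusive and exhaustive set of hypotheses can be sketched as below. The three hypotheses and all the numbers are made up for the example; they are not clinical data.

```python
def posterior(priors, likelihoods):
    """Return P(H | E) for every hypothesis H.

    priors[h]      = P(h), assumed to sum to 1 over all hypotheses
    likelihoods[h] = P(E | h)
    """
    evidence = sum(likelihoods[h] * priors[h] for h in priors)  # P(E)
    return {h: likelihoods[h] * priors[h] / evidence for h in priors}


# Hypothetical priors and likelihoods of observing the evidence "fever".
priors = {"flu": 0.1, "cold": 0.3, "healthy": 0.6}
likelihoods = {"flu": 0.9, "cold": 0.2, "healthy": 0.05}

post = posterior(priors, likelihoods)
print(post)
```

With these assumed numbers P(E) = 0.09 + 0.06 + 0.03 = 0.18, so the posterior for flu is 0.09 / 0.18 = 0.5 even though its prior was only 0.1.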
How is Bayes theorem exploited?
 The key is to formulate problem correctly:
P(A|B) states the probability of A given only B's evidence. If there is other relevant evidence
then it must also be considered.
Herein lies a problem:
 All events must be mutually exclusive. However, in real-world problems events are
generally not unrelated. For example in diagnosing measles, the symptoms of spots and a fever
are related. This means that computing the conditional probabilities gets complex.
In general, given prior evidence p and some new observation N, the cost of computing
P(H|N,p) grows exponentially for large sets of p.

 All events must be exhaustive. This means that in order to compute all probabilities the set of
possible events must be closed. Thus if new information arises the set must be created afresh
and all probabilities recalculated.
Thus Simple Bayes rule-based systems are not suitable for uncertain reasoning.
 Knowledge acquisition is very hard.
 Too many probabilities needed -- too large a storage space.
 Computation time is too large.
 Updating new information is difficult and time consuming.
 Exceptions like ``none of the above'' cannot be represented.
 Humans are not very good probability estimators.
However, Bayesian statistics still provide the core to reasoning in many uncertain reasoning systems
with suitable enhancement to overcome the above problems.

We will look at three broad categories:
 Certainty factors,
 Dempster-Shafer models,
 Bayesian networks.
Belief Models and Certainty Factors

This approach has been suggested by Shortliffe and Buchanan and used in their famous medical
diagnosis MYCIN system.
MYCIN is essentially an expert system. Here we only concentrate on the probabilistic reasoning
aspects of MYCIN.
 MYCIN represents knowledge as a set of rules.
 Associated with each rule is a certainty factor
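 A certainty factor is based on measures of belief B and disbelief D of an hypothesis H given
evidence E as follows:
B(H|E) = 1 if P(H) = 1, otherwise (max[P(H|E), P(H)] – P(H)) / (1 – P(H))
D(H|E) = 1 if P(H) = 0, otherwise (P(H) – min[P(H|E), P(H)]) / P(H)
where P(H) is the standard probability.

 The certainty factor C of some hypothesis H given evidence E is defined as:
C(H|E) = B(H|E) – D(H|E)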
Reasoning with Certainty factors

 Rules are expressed as: if evidence list E1 Λ E2 Λ ... then there is suggestive evidence with
probability p for symptom H.
 MYCIN uses rules to reason backward to clinical data evidence from its goal of predicting a
disease-causing organism.
 Certainty factors are initially supplied by experts and then changed according to the previous formulae.
 How do we perform reasoning when several rules are chained together?
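Measures of belief and disbelief given several observations are calculated as follows:
B(H|E1 Λ E2) = 0 if D(H|E1 Λ E2) = 1, otherwise B(H|E1) + B(H|E2) (1 – B(H|E1))
D(H|E1 Λ E2) = 0 if B(H|E1 Λ E2) = 1, otherwise D(H|E1) + D(H|E2) (1 – D(H|E1))
 How about our belief about several hypotheses taken together? Measures of belief given
several hypotheses H1 and H2 to be combined logically are calculated as follows:
B(H1 Λ H2|E) = min(B(H1|E), B(H2|E))
B(H1 V H2|E) = max(B(H1|E), B(H2|E))
Disbelief is calculated similarly.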
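A small sketch of the standard MYCIN-style combination rules follows: two observations supporting the same hypothesis are combined incrementally, conjunctions take the minimum, disjunctions take the maximum, and the certainty factor is belief minus disbelief. The function names and the numbers are assumptions made for illustration.

```python
def combine_evidence(b1, b2):
    """Belief in H after two independent supporting observations."""
    return b1 + b2 * (1 - b1)


def belief_and(b1, b2):
    """Belief in the conjunction of two hypotheses."""
    return min(b1, b2)


def belief_or(b1, b2):
    """Belief in the disjunction of two hypotheses."""
    return max(b1, b2)


def certainty_factor(belief, disbelief):
    """Certainty factor C = B - D, a value between -1 and 1."""
    return belief - disbelief


# Hypothetical values: two observations each lend some belief to H.
b = combine_evidence(0.3, 0.2)     # 0.3 + 0.2 * 0.7
cf = certainty_factor(b, 0.1)
print(b, cf)
```

The second observation only acts on the belief still "missing" after the first one, so the combined value can never exceed 1.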
Bayesian networks
These are also called Belief Networks or Probabilistic Inference Networks. Initially developed by
Pearl (1988).
The basic idea is:
 Knowledge in the world is modular -- most events are conditionally independent of most
other events.
 Adopt a model that can use a more local representation to allow interactions between events
that only affect each other.
 Some influences may be unidirectional, others bidirectional -- make a distinction
between these in the model.
 Events may be causal and thus get chained together in a network.
Implementation
 A Bayesian Network is a directed acyclic graph:
o A graph whose directed links indicate dependencies that exist
between nodes.
o Nodes represent propositions about events or events themselves.
o Conditional probabilities quantify the strength of dependencies.
Consider the following example:
 The probability, that my car won't start.
 If my car won't start then it is likely that
o The battery is flat or
o The starting motor is broken.
In order to decide whether to fix the car myself or send it to the garage I make the following decision:
 If the headlights do not work then the battery is likely to be flat so I fix it myself.
 If the starting motor is defective then send car to garage.
 If battery and starting motor both gone send car to garage.
The network to represent this is as follows:
Fig. A simple Bayesian network
Reasoning in Bayesian(belief) nets

 Probabilities in links obey standard conditional probability axioms.
 Therefore follow links in reaching hypothesis and update beliefs accordingly.
 A few broad classes of algorithms have been used to help with this:
o Pearl's message passing method.
o Clique triangulation.
o Stochastic methods.
o Basically they all take advantage of clusters in the network and use their limits on the
influence to constrain the search through net.
o They also ensure that probabilities are updated correctly.
 Since information is local, it can be readily added and deleted with minimum effect
on the whole network. ONLY affected nodes need updating.
 Example
o Consider problem: “block-lifting”
o B: the battery is charged.
o L: the block is liftable.
o M: the arm moves.
o G: the gauge indicates that the battery is charged

o p(G,M,B,L) = p(G|M,B,L)p(M|B,L)p(B|L)p(L)= p(G|B)p(M|B,L)p(B)p(L)
o Specification:
 Traditional: 16 rows
 Bayesian networks: 8 rows
 Reasoning: top-down
o Example:
o if the block is liftable, compute the probability of arm moving.
o I.e., Compute p(M | L)
o Solution:
Insert parent nodes:
p(M|L) = p(M,B|L) + p(M,¬B|L)
Use chain rule:
p(M|L) = p(M|B,L)p(B|L) + p(M|¬B,L)p(¬B|L)
Remove independent node:
p(B|L) =p(B) : B does not have PARENT
p(¬B|L) = p(¬B) = 1 – p(B)
p(M|L) = p(M|B,L)p(B) + p(M|¬B,L)(1 – p(B))
= 0.9 × 0.95 + 0.0 × (1 – 0.95)
= 0.855
 Reasoning: bottom-up
Example:
If the arm cannot move
Compute the probability that the block is not liftable.
I.e., Compute: p(¬L|¬M)
Use Bayes' rule:
p(¬L|¬M) = p(¬M|¬L) p(¬L) / p(¬M)
Compute p(¬M|¬L) by top-down reasoning:
p(¬M|¬L) = 0.9525 (left as an exercise)
p(¬L) = 1 – p(L) = 1 – 0.7 = 0.3
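Both directions of reasoning can be checked numerically. The sketch below uses p(B) = 0.95, p(L) = 0.7, p(M|B,L) = 0.9 and p(M|¬B,L) = 0.0 from the worked example. The two remaining table entries, p(M|B,¬L) = 0.05 and p(M|¬B,¬L) = 0.0, are not stated in the text; they are assumptions chosen so that p(¬M|¬L) comes out as the quoted 0.9525.

```python
p_B = 0.95   # battery is charged
p_L = 0.7    # block is liftable

# p(M | B, L), keyed by (B, L). The entries for L = False are assumed values.
p_M_given = {
    (True, True): 0.9,
    (False, True): 0.0,
    (True, False): 0.05,   # assumption, see lead-in
    (False, False): 0.0,   # assumption, see lead-in
}


def p_M_given_L(liftable):
    """Top-down: sum out the parent B, using p(B | L) = p(B)."""
    return sum(p_M_given[(b, liftable)] * (p_B if b else 1 - p_B)
               for b in (True, False))


p_M_L = p_M_given_L(True)                 # p(M | L)
p_notM_notL = 1 - p_M_given_L(False)      # p(¬M | ¬L)

# Bottom-up: Bayes' rule needs p(¬M), obtained by summing over L.
p_M = p_M_L * p_L + p_M_given_L(False) * (1 - p_L)
p_notL_notM = p_notM_notL * (1 - p_L) / (1 - p_M)

print(round(p_M_L, 4), round(p_notM_notL, 4), round(p_notL_notM, 4))
```

Under these assumptions p(¬M) = 0.38725, which gives p(¬L|¬M) of roughly 0.74.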

Chapter-5
Knowledge Representation.
Solving complex AI problems requires large amounts of knowledge and mechanisms for manipulating
that knowledge. The inference mechanisms that operate on knowledge rely on the way the knowledge
is represented. A good knowledge representation model allows more powerful inference
mechanisms to operate on it. When representing knowledge one has to consider two things.
1. Facts, which are truths in some relevant world.
2. Representation of facts in some chosen formalism . These are the things which are actually
manipulated by inference mechanism.
Knowledge representation schemes are useful only if there are functions that map facts to
representations and vice versa. AI is more concerned with a natural language representation of facts
and the functions which map natural language sentences into some representational formalism. An
appealing way of representing facts is using the language of logic. Logical formalism provides a way
of deriving new knowledge from the old through mathematical deduction. In this formalism, we can
conclude that a new statement is true by proving that it follows from the statements already known to
be facts.
STRUCTURED REPRESENTATION OF KNOWLEDGE

Representing knowledge using a logical formalism, like predicate logic, has several advantages. Such
formalisms can be combined with powerful inference mechanisms like resolution, which makes reasoning with
facts easy. But complex structures of the world (objects and their relationships, events, sequences
of events, etc.) cannot be described easily using a logical formalism.
A good system for the representation of structured knowledge in a particular domain should possess
the following four properties:
(i) Representational Adequacy:- The ability to represent all kinds of knowledge that are needed in that
domain.
(ii) Inferential Adequacy :- The ability to manipulate the represented structure and infer new
structures.
(iii) Inferential Efficiency:- The ability to incorporate additional information into the knowledge
structure that will aid the inference mechanisms.
(iv) Acquisitional Efficiency :- The ability to acquire new information easily, either by direct insertion
or by program control.
The techniques that have been developed in AI systems to accomplish these objectives fall under two
categories:
1. Declarative Methods:- In these, knowledge is represented as a static collection of facts which are
manipulated by general procedures. Here the facts need to be stored only once and they can be used in
any number of ways. Facts can be easily added to declarative systems without changing the general
procedures.
2. Procedural Methods:- In these, knowledge is represented as procedures. Default reasoning and
probabilistic reasoning are examples of procedural methods. In these, heuristic knowledge of “how to
do things efficiently” can be easily represented.
In practice most knowledge representation schemes employ a combination of both. Most
knowledge representation structures have been developed for programs that handle natural
language input. One of the reasons that knowledge structures are so important is that they provide a
way to represent information about commonly occurring patterns of things. Such descriptions are
sometimes called schemas. One definition of schema is
“Schema refers to an active organization of the past reactions, or of past experience, which must
always be supposed to be operating in any well adapted organic response”.
By using schemas, people as well as programs can exploit the fact that the real world is not random.
There are several types of schemas that have proved useful in AI programs. They include
Approaches to Knowledge Representation

We briefly survey some representation schemes.
 Simple relational knowledge
 Inheritable knowledge
 Inferential knowledge
 Procedural knowledge
Simple relational knowledge

The simplest way of storing facts is to use a relational method where each fact about a set of objects is
set out systematically in columns. This representation gives little opportunity for inference, but it can
be used as the knowledge basis for inference engines.
 Simple way to store facts.
 Each fact about a set of objects is set out systematically in columns (Fig. 7).
 Little opportunity for inference.
 Knowledge basis for inference engines.
 Exist in form of tables, like tables in database
 Each relation (row in table) itself can provide very weak inferential capabilities.
 Relations may serve as the input to powerful inference engines
Figure: Simple Relational Knowledge

We can ask things like:
 Who is dead?
 Who plays Jazz/Trumpet etc.?
This sort of representation is popular in database systems.
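A relational store of this kind can be sketched as a list of rows with a simple selection function. The original figure is not reproduced here, so the rows below are hypothetical placeholders, as are the column names and the `select` helper.

```python
# Each row is one fact about an object, set out in fixed columns.
musicians = [
    {"name": "Musician A", "style": "Jazz", "instrument": "Trumpet", "alive": False},
    {"name": "Musician B", "style": "Avant-garde", "instrument": "Saxophone", "alive": True},
    {"name": "Musician C", "style": "Jazz", "instrument": "Piano", "alive": True},
]


def select(rows, **conditions):
    """Return the names of all rows whose columns match every condition."""
    return [row["name"] for row in rows
            if all(row[col] == value for col, value in conditions.items())]


dead = select(musicians, alive=False)          # "Who is dead?"
jazz_players = select(musicians, style="Jazz")  # "Who plays Jazz?"
print(dead, jazz_players)
```

The store answers lookups directly but derives nothing new on its own, which is why it usually serves as input to a separate inference engine.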
Inheritable knowledge
Relational knowledge is made up of objects consisting of
 attributes
 corresponding associated values.
We extend the base more by allowing inference mechanisms:
 Property inheritance
o elements inherit values from being members of a class.
o data must be organised into a hierarchy of classes (Fig. 8).
Fig. 8 Property Inheritance Hierarchy

 Boxed nodes -- objects and values of attributes of objects.

 Values can be objects with attributes and so on.
 Arrows -- point from object to its value.
 This structure is known as a slot and filler structure, semantic network or a collection of
frames.
The algorithm to retrieve a value for an attribute of an instance object:
1. Find the object in the knowledge base.
2. If there is a value for the attribute, report it.
3. Otherwise look for a value of the instance attribute; if there is none, fail.
4. Otherwise go to that node and find a value for the attribute, and then report it.
5. Otherwise search upward through isa links until a value is found for the attribute.
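The retrieval steps can be sketched over a small dictionary-based knowledge base. The object names follow the Mike-Hall example used in these notes; the `legs` attribute and its values are hypothetical, added only to show a default being overridden lower in the hierarchy. As a small generalization, the sketch also lets a class node start its search from its own isa link.

```python
# Hypothetical knowledge base: each node maps attributes to values.
kb = {
    "mammal": {"legs": 4},
    "person": {"isa": "mammal", "legs": 2},
    "Mike-Hall": {"instance": "person", "team": "Cardiff"},
}


def get_value(kb, obj, attr):
    """Look up attr on obj, inheriting through instance and isa links."""
    node = kb.get(obj)
    if node is None:                 # step 1: object not in the knowledge base
        return None
    if attr in node:                 # step 2: value stored on the object itself
        return node[attr]
    # step 3: move to the class given by instance (or isa for a class node)
    parent = node.get("instance") or node.get("isa")
    while parent is not None:        # steps 4 and 5: climb the hierarchy
        node = kb.get(parent, {})
        if attr in node:
            return node[attr]
        parent = node.get("isa")
    return None                      # no value found anywhere: fail


print(get_value(kb, "Mike-Hall", "team"), get_value(kb, "Mike-Hall", "legs"))
```

The lookup for `legs` stops at `person` and never reaches `mammal`, so the more specific value wins.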
Inferential knowledge
 Facts represented in a logical form, which facilitates reasoning.
 An inference engine is required.
Procedural knowledge
 Coding actions to be performed when a condition is satisfied.
o Example
o IF
 Has a fever of more than 39 °C
 Is reluctant to eat
 Skin has red dots
o THEN
 Suspect petechial fever
 Implementation:
o Writing actions in LISP Programming Language
o Writing actions in a production system framework, like CLIPS or JESS
Issues in Knowledge Representation

Below are listed issues that should be raised when using a knowledge representation technique:
Important Attributes

-- Are there any attributes that occur in many different types of problem?
There are two, instance and isa, and each is important because each supports property
inheritance.
Relationships
-- What about the relationship between the attributes of an object, such as, inverses, existence,
techniques for reasoning about values and single valued attributes. We can consider an
example of an inverse in
band(John Zorn,Naked City)
This can be treated as John Zorn plays in the band Naked City or John Zorn's band is Naked
City.
Another representation is band = Naked City
band-members = John Zorn, Bill Frissell, Fred Frith, Joey Barron,
Granularity
-- At what level should the knowledge be represented and what are the primitives?
Choosing the granularity of representation: primitives are fundamental concepts such as holding,
seeing and playing. As English is a very rich language with over half a million words, it is clear
that we will find difficulty in deciding which words to choose as our primitives in a series of
situations.
 At what level of detail should knowledge be represented?
o Balance the trade-off
 High-level facts may not be adequate for inference
 Low-level primitives may require a lot of storage.
How should sets of objects be represented?
o By names.
o By extensional definition.
o By intensional definition
Given a large amount of knowledge stored, how can relevant parts be accessed?
o Selecting an initial structure.
o Revising the choice.
If Tom feeds a dog then it could become:
feeds(tom, dog)
If Tom gives the dog a bone like:
gives(tom, dog,bone) Are these the same?
In any sense does giving an object food constitute feeding?
If give(x, food) → feed(x) then we are making progress.
But we need to add certain inferential rules.
In the famous program on relationships, Louise is Bill's cousin. How do we represent this?
louise = daughter(brother or sister(father or mother(bill)))
Suppose it is Chris: then we do not know whether Chris is male or female, and so son applies as well.
Clearly the separate levels of understanding require different levels of primitives and these need many
rules to link together apparently similar primitives.
Obviously there is a potential storage problem and the underlying question must be what level of
comprehension is needed.
What are Semantic networks and frames? Explain with suitable examples.
A semantic network is often used as a form of knowledge representation. It is a directed graph
consisting of vertices which represent concepts and edges which represent semantic relations
between the concepts.
The following semantic relations are commonly represented in a semantic net.
Meronymy (A is part of B)
Holonymy (B has A as a part of itself)
Hyponymy (or troponymy) (A is subordinate of B; A is kind of B)
Hypernymy (A is superordinate of B)

Synonymy (A denotes the same as B)
Antonymy (A denotes the opposite of B)
Example
The physical attributes of a person can be represented as in the following figure using a semantic
net.
These values can also be represented in logic as: isa(person, mammal), instance(Mike-Hall, person)
team(Mike-Hall, Cardiff)
( For detail of semantic network please follow lecture slides)
Frames
Frames are descriptions of conceptual individuals. Frames can exist for ``real'' objects such as ``The
Everest Hotel'', sets of objects such as ``Hotels'', or more ``abstract'' objects such as ``Cola-Wars'' or
``Gulfwar''.
A Frame system is a collection of objects. Each object contains a number of slots. A slot represents
an attribute. Each slot has a value. The value of an attribute can be another object. Frames are
essentially defined by their relationships with other frames. Relationships between frames are
represented using slots. If a frame f is in a relationship r to a frame g, then we put the value g in the r
slot of f.
For example, suppose we are describing the following genealogical tree:
The frame describing Adam might look something like:

Adam:
sex: Male
spouse: Beth
child: (Charles Donna Ellen)
where sex, spouse, and child are slots.
The genealogical tree would then be described by (at least) seven frames, describing the following
individuals: Adam, Beth, Charles, Donna, Ellen, Male, and Female.

A frame can be considered just a convenient way to represent a set of predicates applied to constant
symbols (e.g. ground instances of predicates.). For example, the frame above could be written:
sex(Adam,Male)
spouse(Adam,Beth)
child(Adam,Charles)
child(Adam,Donna)
child(Adam,Ellen)
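The correspondence between a frame and its ground predicates can be sketched with a nested dictionary, one entry per frame and one key per slot. The function name `to_predicates` is an assumption made for illustration; only Adam's frame is filled in, since the other frames are not spelled out in the text.

```python
# One frame: slot names map to fillers; a list holds a multi-valued slot.
frames = {
    "Adam": {
        "sex": "Male",
        "spouse": "Beth",
        "child": ["Charles", "Donna", "Ellen"],
    },
}


def to_predicates(frames):
    """Flatten frames into (slot, frame, value) triples, i.e. slot(frame, value)."""
    facts = []
    for name, slots in frames.items():
        for slot, filler in slots.items():
            values = filler if isinstance(filler, list) else [filler]
            facts.extend((slot, name, value) for value in values)
    return facts


facts = to_predicates(frames)
for slot, name, value in facts:
    print(f"{slot}({name},{value})")
```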
Frames can also be regarded as an extension to semantic nets. Indeed it is not clear where the
distinction between a semantic net and a frame lies. Semantic nets were initially used to represent
labelled connections between objects. As tasks became more complex, the representation needed to
be more structured, and the more structured the system, the more beneficial it becomes to use frames. A
frame is a collection of attributes or slots and associated values that describe some real world entity.
Semantic Networks
 First introduced by Quillian back in the late-60s
M. Ross Quillian. "Semantic Memories", In M. M. Minsky, editor, Semantic
Information Processing, pages 216-270. Cambridge, MA: MIT Press, 1968
 Semantic network is simple representation scheme which uses a graph of labeled nodes and
labeled directed arcs to encode knowledge
 Nodes – objects, concepts, events
 Arcs – relationships between nodes
 Graphical depiction associated with semantic networks is a big reason for their popularity
 The idea behind a semantic network is that knowledge is often best understood as a set of
concepts that are related to one another.
 Arcs define binary relations which hold between objects denoted by the nodes.

 Inheritance reasoning in semantic nets: to infer a property for an individual,
 follow MemberOf and SubsetOf links up the hierarchy
 stop at the category with a property link
Advantages of semantic nets

 simplicity of inference
 ease of visualizing, even for large nets
 ease of representing default values for categories
 & ease of overriding defaults by more specific values
but, awkward or impossible
 to capture many of FOL's representational capabilities

 negation, disjunction, existential quantification, ...
 when extended to do so, it loses its attractive simplicity
Frames
 Frames: semantic nets with properties

 A frame represents an entity as a set of slots (attributes) and associated values
 A frame can represent a specific entity, or a general concept
 Frames are implicitly associated with one another because the value of a slot can be another
frame
 3 components of a frame
· frame name
· attributes (slots)
· values (fillers: list of values, range, string, etc.)
 Book Frame
Slot | Filler
Title | AI: A Modern Approach
Author | Russell & Norvig
Year | 2003
 More natural support of values than semantic nets

 Can be easily implemented using object-oriented programming techniques
 Inheritance is easily controlled
 Similar to Object-Oriented programming paradigm
 Benefits:
 Makes programming easier by grouping related knowledge
 Easily understood by non-developers
 Expressive power

 Easy to set up slots for new properties and relations
 Easy to include default information and detect missing values
 Drawbacks:
 No standards
 More of a general methodology than a specific representation:
• A frame for a classroom will be different for a professor and for a maintenance
worker
 No associated reasoning/inference mechanisms
Conceptual Dependency (CD)

This representation is used in natural language processing to represent the meaning of
sentences in such a way that inferences can be made from them. It is independent of the
language in which the sentences were originally stated. The CD representation of a sentence is built out
of primitives, which are not words belonging to the language but are conceptual; these primitives are
combined to form the meanings of words. As an example consider the event represented by the
sentence.
In the above representation the symbols have the following meaning:
Arrows indicate direction of dependency

Double arrow indicates a two-way link between actor and action
P indicates past tense
ATRANS is one of the primitive acts used by the theory. It indicates transfer of possession
O indicates the object case relation
R indicates the recipient case relation
Conceptual dependency provides a structure in which knowledge can be represented and also a set of
building blocks from which representations can be built. A typical set of primitive actions is
ATRANS - Transfer of an abstract relationship (e.g. give)
PTRANS - Transfer of the physical location of an object (e.g. go)
PROPEL - Application of physical force to an object (e.g. push)
MOVE - Movement of a body part by its owner (e.g. kick)
GRASP - Grasping of an object by an actor (e.g. throw)
INGEST - Ingesting of an object by an animal (e.g. eat)
EXPEL - Expulsion of something from the body of an animal (e.g. cry)
MTRANS - Transfer of mental information (e.g. tell)
MBUILD - Building new information out of old (e.g. decide)
SPEAK - Production of sounds (e.g. say)
ATTEND - Focusing of a sense organ toward a stimulus (e.g. listen)
A second set of building blocks is the set of allowable dependencies among the conceptualizations
described in a sentence.
Advantages of CD:
 Using these primitives involves fewer inference rules.
 Many inference rules are already represented in CD structure.
 The holes in the initial structure help to focus on the points still to be established.
Disadvantages of CD:
 Knowledge must be decomposed into fairly low level primitives.
 Impossible or difficult to find correct set of primitives.
 A lot of inference may still be required.
 Representations can be complex even for relatively simple actions. Consider:
Dave bet Frank five pounds that Wales would win the Rugby World Cup.
Complex representations require a lot of storage
Applications of CD:
MARGIE
(Meaning Analysis, Response Generation and Inference on English) -- model natural
language understanding.
SAM
(Script Applier Mechanism) -- Scripts to understand stories. See next section.
PAM
(Plan Applier Mechanism) -- Scripts to understand stories.
Script
A script is a structured representation of background world knowledge. This structure contains knowledge
about objects, actions, and situations that are described in the input text. Consider, for example, the
knowledge about shopping or entering a restaurant. This kind of stored knowledge about
stereotypical events is called a script.
A script is a structure that prescribes a set of circumstances which could be expected to follow on
from one another.
It is similar to a thought sequence or a chain of situations which could be anticipated.
It could be considered to consist of a number of slots or frames but with more specialised roles.
Scripts are beneficial because:
 Events tend to occur in known runs or patterns.
 Causal relationships between events exist.
 Entry conditions exist which allow an event to take place
 Prerequisites exist upon events taking place. E.g. when a student progresses through a degree
scheme or when a purchaser buys a house.
The components of a script include:
Entry Conditions
-- these must be satisfied before events in the script can occur.
Results
-- Conditions that will be true after events in script occur.
Props
-- Slots representing objects involved in events.
Roles
-- Persons involved in the events.
Track
-- Variations on the script. Different tracks may share components of the same script.
Scenes
-- The sequence of events that occur. Events are represented in conceptual dependency form.
Advantages of Scripts:
 Ability to predict events.
 A single coherent interpretation may be built up from a collection of observations.
Disadvantages:
 Less general than frames.
 May not be suitable to represent all kinds of knowledge.
Scripts are useful in describing certain situations such as robbing a bank.

Chapter-6
Refer pulchowk notes+
What is an artificial neural network?

An artificial neural network is a system based on the operation of biological neural networks; in other
words, it is an emulation of a biological neural system. Why would the implementation of
artificial neural networks be necessary? Although computing these days is truly advanced, there are certain tasks
that a program made for a common microprocessor is unable to perform; even so, a software
implementation of a neural network can be made, with its own advantages and disadvantages.
Advantages:
• A neural network can perform tasks that a linear program cannot.
• When an element of the neural network fails, the network can continue to operate because of its
parallel nature.
• A neural network learns and does not need to be reprogrammed.
• It can be implemented in any application area.
• It is straightforward to implement.
Disadvantages:
• The neural network needs training to operate.
• The architecture of a neural network is different from the architecture of microprocessors and
therefore needs to be emulated.
• Requires high processing time for large neural networks.
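To make "learns and does not need to be reprogrammed" and "needs training to operate" concrete, here is a minimal sketch of a single artificial neuron trained with the perceptron learning rule. The function names, the zero initial weights, the learning rate and the choice of the OR function as training data are all assumptions for illustration; a single neuron like this can only learn linearly separable functions.

```python
def step(x):
    """Threshold activation: fire (1) when the weighted input is non-negative."""
    return 1 if x >= 0 else 0


def predict(weights, bias, inputs):
    """Output of one neuron for one input vector."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron rule: nudge weights by lr * error * input after each sample."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias


# Training data for logical OR: (inputs, expected output).
OR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

weights, bias = train_perceptron(OR_SAMPLES)
print([predict(weights, bias, x) for x, _ in OR_SAMPLES])  # → [0, 1, 1, 1]
```

The same train_perceptron code learns a different function (for example AND) when given different samples; only the training data changes, not the program.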
Expert System Architecture
Explanation Based Learning
The EBL Hypothesis

By understanding why an example is a member of a concept, one can learn the essential properties of the
concept.
Trade-off
EBL trades the need to collect many examples for the ability to “explain” single examples (using a “domain” theory).
Learning by Generalizing Explanations

Given
– Goal (e.g., some predicate calculus statement)
– Situation Description (facts)
– Domain Theory (inference rules)
– Operationality Criterion
Use a problem solver to justify the goal in terms of the facts, using the rules.
Generalize the justification as much as possible.
The operationality criterion states which other terms can appear in the generalized result.
• An explanation is an inter-connected collection of “pieces” of knowledge (inference rules, rewrite
rules, etc.).
• These “rules” are connected using unification, as in Prolog.
• The generalization task is to compute the most general unifier that allows the “knowledge pieces”
to be connected together as generally as possible.
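The unification step mentioned above can be sketched as follows. This is a textbook-style most general unifier, not the full EBL generalization procedure. The term encoding is an assumption made for this example: variables are strings starting with "?", compound terms are tuples, and everything else is a constant.

```python
def is_var(term):
    """Variables are written as strings starting with '?' (an assumed convention)."""
    return isinstance(term, str) and term.startswith("?")


def walk(term, subst):
    """Follow variable bindings until reaching an unbound variable or a non-variable."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term


def occurs(var, term, subst):
    """Occurs check: does var appear inside term under the current bindings?"""
    term = walk(term, subst)
    if term == var:
        return True
    if isinstance(term, tuple):
        return any(occurs(var, arg, subst) for arg in term)
    return False


def unify(a, b, subst=None):
    """Return a most general unifier of a and b as a dict, or None on failure."""
    subst = {} if subst is None else subst
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return None if occurs(a, b, subst) else {**subst, a: b}
    if is_var(b):
        return None if occurs(b, a, subst) else {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None


# Connecting two "knowledge pieces": on(?x, table) and on(block_a, ?y).
print(unify(("on", "?x", "table"), ("on", "block_a", "?y")))
```

The resulting substitution binds only what is needed to make the two terms identical, which is what lets the pieces of an explanation be joined as generally as possible.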

Machine Vision
• The goal of Machine Vision is to create a model of the real world from images.
– A machine vision system recovers useful information about a scene from its two-dimensional
projections.
– The world is three-dimensional.
– The images are two-dimensional and digitized.
• Knowledge about the objects (regions) in a scene and projection geometry is required.
• The information which is recovered differs depending on the application.
– Satellite, medical images etc.
• Processing takes place in stages:
– Enhancement, segmentation, image analysis and matching (pattern recognition).
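The four stages can be illustrated with a toy pipeline on a tiny grey-level image. Everything here is an assumption chosen for brevity: the function names, contrast stretching as the enhancement step, a fixed threshold for segmentation, area and bounding box as the analysis features, and nearest-feature matching against two hypothetical models. Real systems use far richer methods at each stage.

```python
def enhance(image):
    """Enhancement: stretch grey levels to the full 0..255 range."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    if hi == lo:
        return [[0 for _ in row] for row in image]
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in image]


def segment(image, threshold=128):
    """Segmentation: mark pixels at or above the threshold as foreground (1)."""
    return [[1 if p >= threshold else 0 for p in row] for row in image]


def analyse(mask):
    """Image analysis: area and bounding-box size of the foreground region."""
    pixels = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not pixels:
        return None
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return {
        "area": len(pixels),
        "height": max(rows) - min(rows) + 1,
        "width": max(cols) - min(cols) + 1,
    }


def match(features, models):
    """Matching: name of the stored model whose features are closest."""
    def distance(model):
        return sum(abs(features[k] - model[k]) for k in features)
    return min(models, key=lambda name: distance(models[name]))


IMAGE = [
    [10, 10, 10, 10],
    [10, 60, 60, 10],
    [10, 60, 60, 10],
    [10, 10, 10, 10],
]
MODELS = {  # hypothetical object models
    "square": {"area": 4, "height": 2, "width": 2},
    "bar": {"area": 4, "height": 1, "width": 4},
}

features = analyse(segment(enhance(IMAGE)))
print(features, match(features, MODELS))
```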

Machine Vision Stages
Boltzmann Machine
A Boltzmann machine is a type of stochastic recurrent neural network invented by Geoffrey
Hinton and Terry Sejnowski. Boltzmann machines can be seen as
the stochastic, generative counterpart of Hopfield nets. They were one of the first examples of a
neural network capable of learning internal representations, and are able to represent and (given
sufficient time) solve difficult combinatorial problems. However, due to a number of practical issues,
Boltzmann machines with unconstrained connectivity have not proven useful for practical
problems in machine learning or inference. They are still theoretically intriguing, however, due to the
locality and Hebbian nature of their training algorithm, as well as their parallelism and the
resemblance of their dynamics to simple physical processes. If the connectivity is constrained, the
learning can be made efficient enough to be useful for practical problems.
They are named after the Boltzmann distribution in statistical mechanics, which is used in their
sampling function.
A Boltzmann machine, like a Hopfield network, is a network of units with an "energy" defined for the network.
It also has binary units, but unlike Hopfield nets, Boltzmann machine units are stochastic. The global
energy, $E$, in a Boltzmann machine is identical in form to that of a Hopfield network:
$$E = -\sum_{i<j} w_{ij}\, s_i\, s_j + \sum_i \theta_i\, s_i$$
Where:
• $w_{ij}$ is the connection strength between unit $j$ and unit $i$.
• $s_i$ is the state, $s_i \in \{0, 1\}$, of unit $i$.
• $\theta_i$ is the threshold of unit $i$.
The connections in a Boltzmann machine have two restrictions:
• $w_{ii} = 0$ for all $i$. (No unit has a connection with itself.)
• $w_{ij} = w_{ji}$ for all $i, j$. (All connections are symmetric.)
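As a small worked example, the sketch below computes the global energy of a three-unit network. It assumes the usual Hopfield-style form with thresholds, E = -sum over i<j of w_ij * s_i * s_j, plus sum over i of theta_i * s_i, and binary states 0 or 1. The function names and the numeric weights are made up for illustration.

```python
def check_connections(w):
    """Verify the two restrictions: zero diagonal and symmetric weights."""
    n = len(w)
    no_self = all(w[i][i] == 0 for i in range(n))
    symmetric = all(w[i][j] == w[j][i] for i in range(n) for j in range(i + 1, n))
    return no_self and symmetric


def global_energy(w, theta, s):
    """Energy of state s, assuming E = -sum_{i<j} w_ij s_i s_j + sum_i theta_i s_i."""
    if not check_connections(w):
        raise ValueError("weights must be symmetric with a zero diagonal")
    n = len(s)
    pair_term = sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
    threshold_term = sum(theta[i] * s[i] for i in range(n))
    return -pair_term + threshold_term


# Illustrative three-unit network (values are arbitrary).
W = [
    [0, 1, -2],
    [1, 0, 3],
    [-2, 3, 0],
]
THETA = [0.5, -1.0, 0.0]

print(global_energy(W, THETA, [1, 1, 0]))  # → -1.5
```

With units 0 and 1 on, the only active pair contributes w_01 = 1, so the pair term is 1 and the threshold term is 0.5 - 1.0 = -0.5, giving E = -1 - 0.5 = -1.5.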

