AI Notes
Intelligence is the computational part of the ability to achieve goals in the world. Varying
kinds and degrees of intelligence occur in people, many animals and some machines.
Isn't there a solid definition of intelligence that doesn't depend on relating it to human
intelligence?
Not yet. The problem is that we cannot yet characterize in general what kinds of
computational procedures we want to call intelligent. We understand some of the mechanisms
of intelligence and not others.
"AI is the study of complex information processing problems that often have their roots in
some aspect of biological information processing. The goal of the subject is to identify
solvable and interesting information processing problems, and solve them." -- David Marr
"AI is the design, study and construction of computer programs that behave intelligently." --
Tom Dean
"... to achieve their full impact, computer systems must have more than processing power--
they must have intelligence. They need to be able to assimilate and use large bodies of
information and collaborate with and help people find new ways of working together
effectively. The technology must become more responsive to human needs and styles of
work, and must employ more natural means of communication." -- Barbara Grosz and
Randall Davis
AI not centered around representation of the world, but around action in the world. Behavior-
based intelligence. (see Rod Brooks in the movie Fast, Cheap and Out of Control)
Computer can sense and recognize its users, see and recognize its environment, respond
visually and audibly to stimuli. New paradigms for interacting productively with computers
using speech, vision, natural language, 3D virtual reality, 3D displays, more natural and
powerful user interfaces, etc. (See, for example, projects in Microsoft's "Advanced
Interactivity and Intelligence" group.)
Game Playing
Deep Blue Chess program beat world champion Garry Kasparov
Speech Recognition
PEGASUS spoken language interface to American Airlines' EAASY SABRE reservation
system, which allows users to obtain flight information and make reservations over the
telephone. The 1990s have seen significant advances in speech recognition so that limited
systems are now successful.
Computer Vision
Face recognition programs in use by banks, government, etc. The ALVINN system from
CMU autonomously drove a van from Washington, D.C. to San Diego (all but 52 of 2,849
miles), averaging 63 mph day and night, and in all weather conditions. Handwriting
recognition, electronics and manufacturing inspection, photointerpretation, baggage
inspection, reverse engineering to automatically construct a 3D geometric model.
Expert Systems
Application-specific systems that rely on obtaining the knowledge of human experts in an
area and programming that knowledge into a system.
o Diagnostic Systems
Microsoft Office Assistant in Office 97 provides customized help by decision-
theoretic reasoning about an individual user. MYCIN system for diagnosing bacterial
infections of the blood and suggesting treatments. Intellipath pathology diagnosis
Translating telephone
Accident-avoiding car
Aids for the disabled
Smart clothes
Intelligent agents that monitor and manage information by filtering, digesting, abstracting
Tutors
Self-organizing systems, e.g., that learn to assemble something by observing a human do it.
Reasoning
Inference, decision-making, classification from what is sensed and what the internal "model" is of the
world. Might be a neural network, logical deduction system, Hidden Markov Model induction,
heuristic searching a problem space, Bayes Network inference, genetic algorithms, etc. Includes areas
of knowledge representation, problem solving, decision theory, planning, game theory, machine
learning, uncertainty reasoning, etc.
Representation
Facts about the world have to be represented in some way, e.g., mathematical logic is one
language that is used in AI. Deals with the questions of what to represent and how to
represent it. How to structure knowledge? What is explicit, and what must be inferred? How
to encode "rules" for inferencing so as to find information that is only implicitly known? How
to deal with incomplete, inconsistent, and probabilistic knowledge? Epistemology issues
(what kinds of knowledge are required to solve problems).
Example: "The fly buzzed irritatingly on the window pane. Jill picked up the newspaper."
Inference: Jill has malicious intent; she is not intending to read the newspaper, or use it to
start a fire, or ...
Search
Many tasks can be viewed as searching a very large problem space for a solution. For
example, Checkers has about 10^40 states, and Chess has about 10^120 states in a typical game.
Use of heuristics (meaning "serving to aid discovery") and constraints.
Inference
From some facts others can be inferred. Related to search. For example, knowing "All
elephants have trunks" and "Clyde is an elephant," can we answer the question "Does Clyde
have a trunk?" What about "Peanuts has a trunk, is it an elephant?" Or "Peanuts lives in a tree
and has a trunk, is it an elephant?" Deduction, abduction, non-monotonic reasoning, reasoning
under uncertainty.
Learning
Inductive inference, neural networks, genetic algorithms, artificial life, evolutionary
approaches.
Planning
Starting with general facts about the world, facts about the effects of basic actions, facts about
a particular situation, and a statement of a goal, generate a strategy for achieving that goal in
terms of a sequence of primitive steps or actions.
If we can specify the agent's choice of action for every possible percept sequence,
then we have said more or less everything there is to say about the agent.
Mathematically speaking, we say that an agent's behavior is described by the agent
function that maps any given percept sequence to an action.
f : P* → A
The agent program runs on the physical architecture to produce f
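The idea of an agent function that maps a whole percept sequence to an action can be sketched as a lookup table. This is only an illustration: the two-square vacuum world, the table entries and the name make_table_driven_agent are assumptions, not part of these notes.

```python
def make_table_driven_agent(table, default="NoOp"):
    """Build an agent program that implements f : P* -> A by table lookup."""
    percepts = []  # the percept sequence observed so far

    def agent_program(percept):
        percepts.append(percept)
        # The key is the entire percept sequence, not just the latest percept.
        return table.get(tuple(percepts), default)

    return agent_program


# Hypothetical table for a two-square vacuum world; percept = (location, status).
TABLE = {
    (("A", "Dirty"),): "Suck",
    (("A", "Clean"),): "Right",
    (("A", "Dirty"), ("A", "Clean")): "Right",
    (("A", "Clean"), ("B", "Dirty")): "Suck",
}

agent = make_table_driven_agent(TABLE)
actions = [agent(("A", "Dirty")), agent(("A", "Clean"))]
print(actions)
```

The table grows with every possible percept sequence, which is why practical agent programs compute f rather than store it.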
For each possible percept sequence, a rational agent should select an action that is expected to maximize its
performance measure, given the evidence provided by the percept sequence and whatever built-in knowledge
the agent has.
To design a rational agent, we must specify the task environments. Task environments are
essentially the "problems" to which rational agents are the "solutions." Those task
environments come in a variety of flavors and the flavor of the task environment directly
affects the appropriate design for the agent program.
The range of task environments that might arise in AI is obviously vast. We can, however,
identify a fairly small number of dimensions along which task environments can be
categorized.
As one might expect, the hardest case is partially observable, stochastic, sequential, dynamic,
continuous, and multiagent. The real world is partially observable, stochastic, sequential,
dynamic, continuous, multi-agent.
There are four basic kinds of agent program that embody the principles underlying almost all
intelligent systems:
• Simple reflex agents;
• Model-based reflex agents;
• Goal-based agents; and
• Utility-based agents.
All these can be turned into learning agents.
KNOWLEDGE
• Data = collection of facts, measurements, statistics
• Information = organized data
• Knowledge = contextual, relevant, actionable information
– Strong experiential and reflective elements
– Good leverage and increasing returns
– Dynamic
– Branches and fragments with growth
– Difficult to estimate impact of investment
– Uncertain value in sharing
– Evolves over time with experience
• Explicit knowledge
– Objective, rational, technical
– Policies, goals, strategies, papers, reports
– Codified
– Leaky knowledge
• Tacit knowledge
– Subjective, cognitive, experiential learning
– Highly personalized
– Difficult to formalize
– Sticky knowledge
Problem-solving agent
Four general steps in problem solving:
Goal formulation
o What are the successful world states?
Problem formulation
o What actions and states to consider, given the goal
Search
o Determine the possible sequences of actions that lead to states of known
value, and then choose the best sequence.
Execute
o Given the solution, perform the actions.
function SIMPLE-PROBLEM-SOLVING-AGENT(percept) returns an action
  static: seq, an action sequence
          state, some description of the current world state
          goal, a goal
          problem, a problem formulation
  state ← UPDATE-STATE(state, percept)
  if seq is empty then
      goal ← FORMULATE-GOAL(state)
      problem ← FORMULATE-PROBLEM(state, goal)
      seq ← SEARCH(problem)
  action ← FIRST(seq)
  seq ← REST(seq)
  return action
Example: 8-puzzle
States?? Integer location of each tile
Initial state?? Any state can be initial
Actions?? {Left, Right, Up, Down}
Goal test?? Check whether goal configuration is reached
Path cost?? Number of actions to reach goal
search the space for a (possibly optimal) sequence of transitions starting from S0 and leading
to a goal state;
execute (in order) the actions associated to each transition in the identified sequence.
Depending on the features of the agent’s world the two steps above can be interleaved.
Factors to consider:
How can an artificial agent represent the states and the state
space for this problem?
Problem formulation
A problem is defined by:
o An initial state, e.g. Arad
o Successor function S(X)= set of action-state pairs
e.g. S(Arad)={<Arad → Zerind, Zerind>,…}
initial state + successor function = state space
o Goal test, can be
Explicit, e.g. x=‘at Bucharest’
Implicit, e.g. checkmate(x)
o Path cost (additive)
e.g. sum of distances, number of actions executed, …
c(x,a,y) is the step cost, assumed to be >= 0
A solution is a sequence of actions from initial to goal state.
Optimal solution has the lowest path cost.
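The four components of a problem (initial state, successor function, goal test, additive path cost) can be sketched in code as follows. The road fragment and its distances are taken from the standard textbook Romania map and are an assumption here, as is the choice of uniform-cost search to find the lowest-cost solution.

```python
import heapq

# Step costs c(x, a, y): road distances for a fragment of the Romania map (assumed values).
ROADS = {
    "Arad": {"Zerind": 75, "Sibiu": 140, "Timisoara": 118},
    "Zerind": {"Arad": 75, "Oradea": 71},
    "Oradea": {"Zerind": 71, "Sibiu": 151},
    "Timisoara": {"Arad": 118},
    "Sibiu": {"Arad": 140, "Oradea": 151, "Fagaras": 99, "Rimnicu Vilcea": 80},
    "Fagaras": {"Sibiu": 99, "Bucharest": 211},
    "Rimnicu Vilcea": {"Sibiu": 80, "Pitesti": 97},
    "Pitesti": {"Rimnicu Vilcea": 97, "Bucharest": 101},
    "Bucharest": {"Fagaras": 211, "Pitesti": 101},
}


def successors(state):
    """S(x): list of (action, next_state) pairs."""
    return [(f"{state} -> {nxt}", nxt) for nxt in ROADS[state]]


def uniform_cost_search(initial, goal_test):
    """Return (actions, path_cost) for a lowest-cost solution, or None."""
    frontier = [(0, initial, [])]  # (path cost so far, state, actions so far)
    best = {initial: 0}            # cheapest known cost to each state
    while frontier:
        cost, state, actions = heapq.heappop(frontier)
        if goal_test(state):
            return actions, cost
        if cost > best.get(state, float("inf")):
            continue  # stale queue entry; a cheaper path was found later
        for action, nxt in successors(state):
            new_cost = cost + ROADS[state][nxt]
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(frontier, (new_cost, nxt, actions + [action]))
    return None


plan, total = uniform_cost_search("Arad", lambda s: s == "Bucharest")
print(plan, total)
```

Because every step cost is non-negative, the first time a goal state is popped from the queue its path cost is the lowest possible.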
Problem formulation
1. Choose an appropriate data structure to represent the world states.
2. Define each operator as a precondition/effects pair where the precondition holds exactly in the
states the operator applies to, effects describe how a state changes into a successor state by the
application of the operator.
3. Specify an initial state.
Example: Op(3,2,R)
Varieties of Constraints
Unary constraints involve a single variable.
e.g. SA ≠ green
Binary constraints involve pairs of variables.
e.g. SA ≠ WA
Higher-order constraints involve 3 or more variables.
e.g. cryptarithmetic column constraints.
Preferences (soft constraints), e.g. red is better than green; often representable by a cost for each
variable assignment, which gives constrained optimization problems.
Constraint graph
CSP benefits
Standard representation pattern
Generic goal and successor functions
Generic heuristics (no domain specific expertise).
Constraint graph = nodes are variables, edges show constraints.
Graph can be used to simplify search.
o e.g. Tasmania is an independent subproblem.
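A minimal backtracking sketch for a map-colouring CSP with binary "not equal" constraints is given below. The Australia adjacency list is the usual textbook example and is assumed here (the notes only name SA, WA and Tasmania); the function names are illustrative.

```python
# Constraint graph: nodes are variables, edges are "must differ" constraints.
NEIGHBORS = {
    "WA": ["NT", "SA"],
    "NT": ["WA", "SA", "Q"],
    "SA": ["WA", "NT", "Q", "NSW", "V"],
    "Q": ["NT", "SA", "NSW"],
    "NSW": ["Q", "SA", "V"],
    "V": ["SA", "NSW"],
    "T": [],  # Tasmania has no edges: an independent subproblem
}
COLORS = ["red", "green", "blue"]


def consistent(var, value, assignment):
    """A value is allowed if no already-assigned neighbour has the same colour."""
    return all(assignment.get(n) != value for n in NEIGHBORS[var])


def backtrack(assignment):
    """Generic depth-first search over partial assignments."""
    if len(assignment) == len(NEIGHBORS):
        return assignment
    var = next(v for v in NEIGHBORS if v not in assignment)
    for value in COLORS:
        if consistent(var, value, assignment):
            result = backtrack({**assignment, var: value})
            if result is not None:
                return result
    return None  # no colour works: undo and try another value higher up


solution = backtrack({})
print(solution)
```

Nothing in backtrack or consistent is specific to map colouring apart from the data, which is the sense in which CSP goal and successor functions are generic.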
Cryptarithmetic conventions
Each letter or symbol represents only one digit throughout the problem;
When letters are replaced by their digits, the resultant arithmetical operation must be
correct;
The numerical base, unless specifically stated, is 10;
Numbers must not begin with a zero;
There must be only one solution to the problem.
1.
S E N D
+ M O R E
------------
M O N E Y
We see at once that M in the total must be 1, since the total of the column SM cannot reach as
high as 20. Now if M in this column is replaced by 1, how can we make this column total as
much as 10 to provide the 1 carried over to the left below? Only by making S very large: 9 or
8. In either case the letter O must stand for zero: the summation of SM could produce only 10
or 11, but we cannot use 1 for letter O as we have already used it for M.
If letter O is zero, then in column EO we cannot reach a total as high as 10, so that there will
be no 1 to carry over from this column to SM. Hence S must positively be 9.
Since the summation EO gives N, and letter O is zero, N must be 1 greater than E and the
column NR must total over 10. To put it into an equation: E + 1 = N
From the NR column we can derive a second equation: N + R (+ 1) = E + 10
We have to insert the expression (+ 1) because we don’t know yet whether 1 is carried over
from column DE. But we do know that 1 has to be carried over from column NR to EO.
Substituting the first equation into the second gives R (+ 1) = 9. R cannot be 9, since S is
already 9, so R must be 8 and 1 must be carried over from column DE.
Column DE must total at least 12, since Y cannot be 1 or zero. What values can we give D
and E to reach this total? We have already used 9 and 8 elsewhere. The only digits left that
are high enough are 7, 6 and 7, 5. But remember that one of these has to be E, and N is 1
greater than E. Hence E must be 5, N must be 6, while D is 7. Then Y turns out to be 2, and
the puzzle is completely solved.
  S E N D        9 5 6 7
+ M O R E      + 1 0 8 5
---------      ---------
M O N E Y      1 0 6 5 2
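The hand derivation can be checked by brute force. The sketch below is an assumption-laden illustration (the function name and the weighting trick are not from the notes): it tries every assignment of distinct digits, rejects leading zeros, and keeps the assignments that make the sum correct.

```python
from itertools import permutations


def solve_cryptarithm(addends, total):
    """Return every assignment of distinct digits with sum(addends) == total."""
    # Each letter gets a weight: +10^position in the addends, -10^position in the total.
    # The sum is correct exactly when the weighted digit sum is zero.
    weight = {}
    for word, sign in [(w, 1) for w in addends] + [(total, -1)]:
        for pos, ch in enumerate(reversed(word)):
            weight[ch] = weight.get(ch, 0) + sign * 10 ** pos
    letters = sorted(weight)
    coeffs = [weight[ch] for ch in letters]
    # Positions of letters that start a word (they may not be zero).
    leading = sorted({letters.index(w[0]) for w in addends + [total]})
    solutions = []
    for digits in permutations(range(10), len(letters)):
        if any(digits[i] == 0 for i in leading):
            continue
        if sum(c * d for c, d in zip(coeffs, digits)) == 0:
            solutions.append(dict(zip(letters, digits)))
    return solutions


solutions = solve_cryptarithm(["SEND", "MORE"], "MONEY")
print(solutions)
```

The search visits about 1.8 million permutations, so it takes a few seconds; the column-by-column reasoning above reaches the same answer with almost no search.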
2.
T W O
+ T W O
_____
F O U R
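Let us first try F = 0 (note that this relaxes the convention above that numbers must not begin
with a zero; if the convention is kept, F must be 1, as in 734 + 734 = 1468). Now give O the
highest possible value, 9. Then R must be 8 (9 + 9 = 18, carry 1) and T must be 4, with 1 carried
in from the W column (4 + 4 + 1 = 9). Finally W + W + 1 must equal U + 10; choosing among the
remaining digits, W = 6 gives U = 3.
  T W O        4 6 9
+ T W O      + 4 6 9
-------      -------
F O U R      0 9 3 8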
Game Playing
Summary
Games are fun (and dangerous)
They illustrate several important points about AI
Perfection is unattainable -> approximation
Benard O.Osero, Bsc, Msc Comp Sci, CCNA.
Good idea to think about what to think about
Uncertainty constrains the assignment of values to states
Games are to AI as grand prix racing is to automobile design.
Types Of Games
Game setup
Two players: MAX and MIN
MAX moves first and they take turns until the game is over. Winner gets reward, loser
gets penalty.
Games as search:
o Initial state: e.g. board configuration of chess
o Successor function: list of (move,state) pairs specifying legal moves.
o Terminal test: Is the game finished?
o Utility function: Gives numerical value of terminal states.
Optimal strategies
Find the contingent strategy for MAX assuming an infallible MIN opponent.
Assumption: Both players play optimally !!
Given a game tree, the optimal strategy can be determined by using the minimax value of
each node:
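MINIMAX-VALUE(n) =
  UTILITY(n)                                          if n is a terminal state
  max of MINIMAX-VALUE(s) over the successors s of n  if n is a MAX node
  min of MINIMAX-VALUE(s) over the successors s of n  if n is a MIN node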
Production systems are composed of three parts, a global database, production rules and a control
structure.
A production system (or production rule system) is a computer program typically used to provide
some form of artificial intelligence, which consists primarily of a set of rules about behaviour. These
rules, termed productions, are a basic representation found useful in automated planning, expert
systems and action selection. A production system provides the mechanism necessary to execute
productions in order to achieve some goal for the system.
Productions consist of two parts: a sensory precondition (or "IF" statement) and an action (or
"THEN"). If a production's precondition matches the current state of the world, then the production is
said to be triggered. If a production's action is executed, it is said to have fired.
The first production systems were developed by Newell and Simon in the 1950s, and the idea was written
up in Newell and Simon (1972).
"Production" in the title of these notes (or "production rule") is a synonym for "rule", i.e. for a
condition-action rule (see below). The term seems to have originated with the term used for
rewriting rules in the Chomsky hierarchy of grammar types, where for example context-free
grammar rules are sometimes referred to as context-free productions.
Rules
or
if <condition> then <action>
Example:
if patient has high levels of the enzyme ferritin in their blood
and patient has the Cys282→Tyr mutation in HFE gene
then conclude patient has haemochromatosis*
backward chaining
forward chaining
To determine if a decision should be made, work backwards looking for justifications for the
decision.
Eventually, a decision must be justified by facts.
Forward Chaining
Forward Chaining 2
Until a problem is solved or no rule's 'if' part is satisfied by the current situation:
a set of rules
working memory that stores temporary data
a forward chaining inference engine
Match-Resolve-Act Cycle
loop
match conditions of rules with contents of working memory
if no rule matches then stop
resolve conflicts
act (i.e. perform conclusion part of rule)
end loop
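The match-resolve-act cycle can be sketched as below. The first rule is the haemochromatosis example from these notes; the second rule, the conflict-resolution strategy (first matching rule wins) and all names are assumptions made for illustration.

```python
def forward_chain(rules, facts):
    """rules: list of (name, set_of_conditions, conclusion). Returns (memory, fired)."""
    memory = set(facts)  # working memory
    fired = []           # names of rules in the order they fired
    while True:
        # Match: rules whose conditions all hold and that would add something new.
        conflict_set = [r for r in rules if r[1] <= memory and r[2] not in memory]
        if not conflict_set:
            break  # no rule matches: stop
        # Resolve: assumed strategy, take the first rule in rule order.
        name, _, conclusion = conflict_set[0]
        # Act: perform the conclusion part of the rule.
        memory.add(conclusion)
        fired.append(name)
    return memory, fired


RULES = [
    ("R1", {"high ferritin", "Cys282Tyr mutation in HFE"}, "haemochromatosis"),
    ("R2", {"haemochromatosis"}, "refer to specialist"),  # hypothetical second rule
]

memory, fired = forward_chain(RULES, {"high ferritin", "Cys282Tyr mutation in HFE"})
print(fired, memory)
```

R2 only becomes applicable after R1 has fired, which shows how conclusions added to working memory trigger further rules.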
Chapter-3
Performance Measure:
Completeness:
Breadth-first search is complete: provided the branching factor b is finite, it visits every
level in turn, so it will find a solution at some finite depth d.
Optimality:
breadth-first search is not optimal unless all actions have the same cost.
Space complexity and Time complexity:
Performance Measure:
Completeness:
DFS is not complete. To see why, suppose the search starts by expanding the left subtree of the
root and follows a very long (possibly infinite) path, when a different choice near the root would
have led to a solution. If the left subtree contains no solution and is unbounded, the search keeps
going deeper forever and never reaches the solution. In this case we say that DFS is not complete.
Optimality:
• Breadth-first search has computational problems, especially with space. Depth-first search can
run off down a very long (or infinite) path.
• Idea: introduce a depth limit on branches to be expanded.
• Don’t expand a branch below this depth.
• Most useful if you know the maximum depth of the solution.
Advantages
Will always terminate
Will find solution if there is one in the depth bound
Disadvantages
• Too small a depth bound misses solutions
• Too large a depth bound may find poor solutions when there are better ones
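Depth-limited search can be sketched as a recursive depth-first search that refuses to expand below the bound. The infinite binary tree used as a test problem (node n has children 2n and 2n+1) is a hypothetical example, not from the notes.

```python
def depth_limited_search(state, goal_test, successors, limit):
    """Return a path (list of states) to a goal within `limit` steps, else None."""
    if goal_test(state):
        return [state]
    if limit == 0:
        return None  # depth bound reached: do not expand this branch further
    for child in successors(state):
        path = depth_limited_search(child, goal_test, successors, limit - 1)
        if path is not None:
            return [state] + path
    return None


def successors(n):
    """Hypothetical infinite tree: plain depth-first search would never return."""
    return [2 * n, 2 * n + 1]


too_shallow = depth_limited_search(1, lambda n: n == 11, successors, 2)
deep_enough = depth_limited_search(1, lambda n: n == 11, successors, 3)
print(too_shallow, deep_enough)
```

With a bound of 2 the goal at depth 3 is missed, while a bound of 3 finds it; both calls terminate even though the tree is infinite.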
3.2.2.1.Greedy Search
3.2.2. A* Search
Best-known form of best-first search.
Idea: avoid expanding paths that are already expensive.
Evaluation function f(n)=g(n) + h(n)
o g(n) the cost (so far) to reach the node.
o h(n) estimated cost to get from the node to the goal.
o f(n) estimated total cost of path through n to goal.
A* search uses an admissible heuristic
o A heuristic is admissible if it never overestimates the cost to reach the goal
o Admissible heuristics are optimistic
Formally:
1. h(n) <= h*(n) where h*(n) is the true cost from n
2. h(n) >= 0 so h(G)=0 for any goal G.
e.g. hSLD(n) never overestimates the actual road distance
Admissible Heuristic
A heuristic h(n) is admissible if for every node n,
h(n) ≤ h*(n), where h*(n) is the true cost to reach the goal state from n.
An admissible heuristic never overestimates the cost to reach the goal, i.e., it is optimistic
Example: hSLD(n) (never overestimates the actual road distance)
Theorem: If h(n) is admissible, A* using TREE-SEARCH is optimal
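A small sketch of A* with f(n) = g(n) + h(n) follows. The road distances and the straight-line distances to Bucharest (hSLD) are the usual textbook values and are assumed here. Note that this version remembers the cheapest g found for each state rather than doing a pure TREE-SEARCH, which is safe for this heuristic because hSLD is also consistent.

```python
import heapq

# Assumed fragment of the Romania road map (step costs).
ROADS = {
    "Arad": {"Zerind": 75, "Sibiu": 140, "Timisoara": 118},
    "Zerind": {"Arad": 75},
    "Timisoara": {"Arad": 118},
    "Sibiu": {"Arad": 140, "Fagaras": 99, "Rimnicu Vilcea": 80},
    "Fagaras": {"Sibiu": 99, "Bucharest": 211},
    "Rimnicu Vilcea": {"Sibiu": 80, "Pitesti": 97},
    "Pitesti": {"Rimnicu Vilcea": 97, "Bucharest": 101},
    "Bucharest": {"Fagaras": 211, "Pitesti": 101},
}
# Assumed straight-line distances to Bucharest: h(n) never exceeds the road distance.
H_SLD = {"Arad": 366, "Zerind": 374, "Timisoara": 329, "Sibiu": 253,
         "Fagaras": 176, "Rimnicu Vilcea": 193, "Pitesti": 100, "Bucharest": 0}


def a_star(start, goal):
    """Return (path, cost) of a lowest-cost path from start to goal, or None."""
    frontier = [(H_SLD[start], 0, start, [start])]  # (f, g, state, path)
    best_g = {start: 0}
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return path, g
        if g > best_g[state]:
            continue  # a cheaper route to this state was already expanded
        for nxt, step in ROADS[state].items():
            g2 = g + step
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + H_SLD[nxt], g2, nxt, path + [nxt]))
    return None


path, cost = a_star("Arad", "Bucharest")
print(path, cost)
```

Bucharest is first generated through Fagaras with f = 450, but it is not returned until it is popped, by which time the cheaper route through Pitesti (f = 418) has replaced it.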
A* Search Evaluation
MINIMAX Algorithm
minimax(player, board)
    if (game over in current board position)
        return winner
    children = all legal moves for player from this board
    if (max's turn)
        return maximal score of calling minimax on all the children
    else (min's turn)
        return minimal score of calling minimax on all the children
ALPHA-BETA pruning is a method that reduces the number of nodes explored by the Minimax
strategy.
It reduces the time required for the search by cutting off branches that cannot affect the final
decision, so that no time is wasted searching moves that are obviously bad for the current player.
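Minimax with alpha-beta pruning can be sketched on a small game tree written as nested lists. The tree and the `visited` bookkeeping are illustrative assumptions; a real game would generate children from the board position instead.

```python
import math


def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, visited=None):
    """node is either a number (terminal utility) or a list of child nodes."""
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)  # record which leaves were actually evaluated
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # MIN above will never let play reach here
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)
        if alpha >= beta:
            break  # MAX above already has a better option
    return value


# Hypothetical two-ply tree: MAX at the root, three MIN nodes, nine leaves.
TREE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
seen = []
best = alphabeta(TREE, True, visited=seen)
print(best, seen)
```

The result is the same as plain minimax, but the leaves 4 and 6 are never evaluated: once the second MIN node is known to be worth at most 2, it cannot beat the 3 that MAX already has.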
Properties of α-β
Chapter 4
4.1.1 Logics are formal languages for formalizing reasoning, in particular for representing
information such that conclusions can be drawn
Logic involves:
– A language with a syntax for specifying what is a legal expression in the language;
syntax defines well formed sentences in the language
– Semantics for associating elements of the language with elements of some subject
matter. Semantics defines the "meaning" of sentences (link to the world); i.e.,
semantics defines the truth of a sentence with respect to each possible world
– Inference rules for manipulating sentences in the language
4.1.7.Validity
An argument is valid whenever the truth of all its premises implies the truth of its conclusion.
An argument is a sequence of propositions. The final proposition is called the conclusion of the argument while the other
propositions are called the premises or hypotheses of the argument.
One can use the rules of inference to show the validity of an argument.
Note that p1, p2, … q are generally compound propositions or wffs.
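Besides using rules of inference, validity of a propositional argument can be checked directly from the definition by enumerating all truth assignments. The sketch below does this; the function names and the two example arguments are illustrative assumptions.

```python
from itertools import product


def is_valid(premises, conclusion, symbols):
    """Valid iff no assignment makes every premise true and the conclusion false."""
    for values in product([True, False], repeat=len(symbols)):
        env = dict(zip(symbols, values))
        if all(p(env) for p in premises) and not conclusion(env):
            return False  # counterexample found
    return True


def implies(a, b):
    return (not a) or b


# p -> q, p, therefore q
modus_ponens = is_valid(
    [lambda e: implies(e["p"], e["q"]), lambda e: e["p"]],
    lambda e: e["q"],
    ["p", "q"],
)

# p -> q, q, therefore p (affirming the consequent)
affirming_consequent = is_valid(
    [lambda e: implies(e["p"], e["q"]), lambda e: e["q"]],
    lambda e: e["p"],
    ["p", "q"],
)

print(modus_ponens, affirming_consequent)
```

The second argument fails because the assignment p = false, q = true satisfies both premises but not the conclusion.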
4.2.
Horn Clause
A Horn sentence or Horn clause has the form:
P1 ∧ P2 ∧ P3 ... ∧ Pn → Q
or alternatively
¬P1 ∨ ¬P2 ∨ ¬P3 ... ∨ ¬Pn ∨ Q
where Ps and Q are non-negated atoms
• To get a proof for Horn sentences, apply Modus Ponens repeatedly until nothing can be done
• We will use the Horn clause form later
4.2.3.Propositional Logic
Propositional Logic Syntax
Propositional logic is the simplest logic – illustrates basic ideas
All objects described are fixed or unique
E.g. "John is a student" student(john) ; Here John refers to one unique person.
In propositional logic (PL) a user defines a set of propositional symbols, like P and Q. The user
defines the semantics of each of these symbols. For example,
P means "It is hot"
Q means "It is humid"
R means "It is raining"
The proposition symbols S, S1, S2, etc. are sentences
– If S is a sentence, ¬S is a sentence (negation)
– If S1 and S2 are sentences, S1 ∧ S2 is a sentence (conjunction)
– If S1 and S2 are sentences, S1 ∨ S2 is a sentence (disjunction)
– If S1 and S2 are sentences, S1 => S2 is a sentence (implication)
– If S1 and S2 are sentences, S1 <=> S2 is a sentence (biconditional)
Logical Equivalence
Two sentences are logically equivalent iff true in same models: α ≡ β iff α ⊨ β and β ⊨ α
Resolution
Conjunctive Normal Form (CNF)
o conjunction of disjunctions of literals (clauses)
E.g., (A ∨ ¬B) ∧ (B ∨ ¬C ∨ ¬D)
Resolution is sound and complete for propositional logic
Conversion to CNF
Propositional Resolution
Example 2
We will express the following in first order predicate calculus
“sam is Kind”
“Every kind person has someone who loves them”
“sam loves someone”
The non-logical symbols of our language are
the constant sam and
the unary predicate (or property) Kind and
the binary predicate Loves.
We may represent the above sentences as
1. Kind(sam)
2. ∀x.(Kind(x) → ∃y.Loves(y,x))
3. ∃y Loves(sam,y)
Using FOL
Brothers are siblings
∀x,y Brother(x,y) → Sibling(x,y)
One's mother is one's female parent
∀m,c Mother(c) = m ⇔ (Female(m) ∧ Parent(m,c))
“Sibling” is symmetric
∀x,y Sibling(x,y) ⇔ Sibling(y,x)
Marcus was a man
Man(Marcus)
Marcus was a Pompeian
Pompeian(Marcus)
All Pompeians were Romans
∀x: Pompeian(x) → Roman(x)
All Romans were either loyal to Caesar or hated him
∀x: Roman(x) → loyalto(x,Caesar) V hate(x,Caesar)
Everyone is loyal to someone
∀x: ∃y: loyalto(x,y)
People only try to assassinate rulers they are not loyal to
∀x: ∀y: person(x) AND ruler(y) AND tryassassinate(x,y) → ~loyalto(x,y)
Inference Rules
Complex deductive arguments can be judged valid or invalid based on whether or not the steps in that
argument follow the nine basic rules of inference. These rules of inference are all relatively simple,
although when presented in formal terms they can look overly complex.
Conjunction:
1. P
2. Q
3. Therefore, P and Q.
1. It is raining in New York.
2. It is raining in Boston
3. Therefore, it is raining in both New York and Boston
Simplification
1. P and Q.
2. Therefore, P.
1. It is raining in both New York and Boston.
2. Therefore, it is raining in New York.
Addition
1. P
2. Therefore, P or Q.
1. It is raining
2. Therefore, either it is raining or the sun is shining.
Absorption
1. If P, then Q.
2. Therefore, if P then P and Q.
1. If it is raining, then I will get wet.
2. Therefore, if it is raining, then it is raining and I will get wet.
Modus Ponens
1. If P then Q (P → Q).
2. P.
3. Therefore, Q.
1. If it is raining, then I will get wet.
2. It is raining.
3. Therefore, I will get wet.
Modus Tollens
1. If P then Q.
2. Not Q. (~Q).
3. Therefore, not P (~P).
The above rules of inference, when combined with the rules of replacement, mean that propositional
calculus is "complete." Propositional calculus is simply another name for propositional logic.
Unification
In computer science and logic, unification is an algorithmic process by which one attempts to solve
the satisfiability problem. The goal of unification is to find a substitution which demonstrates that two
seemingly different terms are in fact either identical or just equal. Unification is widely used
in automated reasoning, logic programming and programming language type system implementation.
Several kinds of unification are commonly studied: that for theories without any equations (the empty
theory) is referred to as syntactic unification: one wishes to show that (pairs of) terms are identical.
If one has a non-empty equational theory, then one is typically interested in showing the equality of (a
pair of) terms; this is referred to as semantic unification. Since substitutions can be ordered into
a partial order, unification can be understood as the procedure of finding a join on a lattice.
We also need some way of binding variables to values in a consistent way so that components of
sentences can be matched. This is the process of Unification.
Binding
A binding list is a set of entries of the form v = e where v is a variable and e is an object. Given an
expression p and a binding list σ, we write pσ for the instantiation of p using the bindings in σ.
Unifier
The most general unifier (MGU) is a unifier that binds the fewest variables or binds them to less specific expressions.
Dog(fido) Dog(fido)
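Syntactic unification can be sketched as follows. The term representation is an assumption chosen for brevity: variables are strings starting with "?", constants are other strings, and a compound term such as Loves(sam, y) is the tuple ("Loves", "sam", "?y"). The returned binding list may bind one variable to another variable that is itself bound.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def walk(term, subst):
    """Follow variable bindings until reaching an unbound variable or a non-variable."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term


def occurs(var, term, subst):
    """Occurs check: does var appear inside term under the current bindings?"""
    term = walk(term, subst)
    if term == var:
        return True
    return isinstance(term, tuple) and any(occurs(var, a, subst) for a in term[1:])


def unify(x, y, subst=None):
    """Return a binding list (dict) that makes x and y identical, or None."""
    subst = {} if subst is None else subst
    x, y = walk(x, subst), walk(y, subst)
    if x == y:
        return subst
    if is_var(x):
        return None if occurs(x, y, subst) else {**subst, x: y}
    if is_var(y):
        return None if occurs(y, x, subst) else {**subst, y: x}
    if (isinstance(x, tuple) and isinstance(y, tuple)
            and len(x) == len(y) and x[0] == y[0]):
        for a, b in zip(x[1:], y[1:]):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None  # different constants, predicates or arities


print(unify(("Dog", "?x"), ("Dog", "fido")))
print(unify(("Loves", "?x", "fido"), ("Loves", "sam", "?y")))
```

Only the bindings needed to make the terms match are added, which is what makes the result a most general unifier for these inputs.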
Apply Resolution
Q.1. Anyone passing the Artificial Intelligence exam and winning the lottery is happy. But anyone
who studies or is lucky can pass all their exams. Ali did not study but he is lucky. Anyone who is
lucky wins the lottery. Is Ali happy?
Anyone passing the AI Exam and winning the lottery is happy
∀x: [pass(x, AI) ∧ win(x, lottery) → happy(x)]
Anyone who studies or is lucky can pass all their exams
∀x ∀y: [study(x) ∨ lucky(x) → pass(x, y)]
Ali did not study but he is lucky
¬study(ali) ∧ lucky(ali)
Anyone who is lucky wins the lottery
∀x: [lucky(x) → win(x, lottery)]
4.4.
Symbolic versus statistical reasoning
Symbolic methods basically represent belief as being
True,
False, or
Neither True nor False.
Some methods also had problems with
Incomplete Knowledge
Contradictions in the knowledge.
Statistical methods provide a method for representing beliefs that are not certain (or uncertain) but for
which there may be some supporting (or contradictory) evidence.
Statistical methods offer advantages in two broad scenarios:
Genuine Randomness
-- Card games are a good example. We may not be able to predict any outcomes with
certainty but we have knowledge about the likelihood of certain items (e.g. like being dealt an
ace) and we can exploit this.
Exceptions
-- Symbolic methods can represent exceptions. However, if the number of exceptions is large, such
systems tend to break down, as in many common sense and expert reasoning tasks.
Statistical techniques can summarise large numbers of exceptions without resorting to enumeration.
So given a pack of playing cards the probability of being dealt an ace from a full normal deck
is 4 (the number of aces) / 52 (number of cards in deck) which is 1/13. Similarly the
probability of being dealt a spade suit is 13 / 52 = 1/4.
If you choose k items from a set of n items, then the formula
$$C(n, k) = \frac{n!}{k!\,(n-k)!}$$
is applied to find the number of ways of making this choice (! = factorial).
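The card probabilities and the counting formula n! / (k! (n-k)!) can be checked with exact arithmetic. The five-card hand is an added illustration, not an example from the notes.

```python
from fractions import Fraction
from math import comb, factorial

# Probability of being dealt an ace, and of being dealt a spade, from a full deck.
p_ace = Fraction(4, 52)
p_spade = Fraction(13, 52)


def choose(n, k):
    """Number of ways to choose k items from n: n! / (k! * (n - k)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))


# Illustrative use: the number of distinct five-card hands.
hands = choose(52, 5)
print(p_ace, p_spade, hands)
```

Python's built-in math.comb computes the same quantity and is the more practical choice for large n.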
Bayes Theorem
This states:
$$P(H_i \mid E) = \frac{P(E \mid H_i)\,P(H_i)}{\sum_{k=1}^{n} P(E \mid H_k)\,P(H_k)}$$
o This reads: given some evidence E, the probability that hypothesis H_i is true is
equal to the ratio of the probability that E will be true given H_i times the a
priori probability of H_i, to the sum over the set of all hypotheses of the probability
of E given each hypothesis times the probability of that hypothesis.
o The set of all hypotheses must be mutually exclusive and exhaustive.
o Thus, to diagnose an illness from medical evidence, we must know the prior
probability of each illness and also the probability of observing each symptom
given each illness.
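A minimal numeric sketch of Bayes' theorem over a mutually exclusive and exhaustive set of hypotheses follows. All of the probabilities and hypothesis names below are made up for illustration; they are not medical data.

```python
def posterior(priors, likelihoods):
    """priors[h] = P(h); likelihoods[h] = P(E | h). Returns P(h | E) for every h."""
    # Denominator: total probability of the evidence over all hypotheses.
    evidence = sum(likelihoods[h] * priors[h] for h in priors)
    return {h: likelihoods[h] * priors[h] / evidence for h in priors}


# Hypothetical priors (they sum to 1) and hypothetical values of P(spots | h).
priors = {"measles": 0.01, "flu": 0.09, "healthy": 0.90}
likelihoods = {"measles": 0.90, "flu": 0.10, "healthy": 0.01}

post = posterior(priors, likelihoods)
print(post)
```

With these numbers each product P(E | h) P(h) equals 0.009, so observing spots lifts measles from a prior of 1% to a posterior of one third.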
Bayesian statistics lie at the heart of most statistical reasoning systems.
How is Bayes theorem exploited?
The key is to formulate problem correctly:
P(A|B) states the probability of A given only B's evidence. If there is other relevant evidence
then it must also be considered.
Herein lies a problem:
All events must be mutually exclusive. However in real world problems events are not
generally unrelated. For example in diagnosing measles, the symptoms of spots and a fever
are related. This means that computing the conditional probabilities gets complex.
In general, given some prior evidence p and a new observation N, computing the updated belief
P(H | N, p) becomes increasingly complex as the body of evidence grows.
How about our belief about several hypotheses taken together? Measures of belief given
several hypotheses and to be combined logically are calculated as follows:
Bayesian networks
These are also called Belief Networks or Probabilistic Inference Networks. Initially developed by
Pearl (1988).
The basic idea is:
Knowledge in the world is modular -- most events are conditionally independent of most
other events.
Adopt a model that can use a more local representation to allow interactions between events
that only affect each other.
Some dependencies between events may be unidirectional, others bidirectional -- make a distinction
between these in the model.
Events may be causal and thus get chained together in a network.
Implementation
A Bayesian Network is a directed acyclic graph:
o A graph whose directed links indicate the dependencies that exist
between nodes.
o Nodes represent propositions about events or events themselves.
o Conditional probabilities quantify the strength of dependencies.
Consider the following example:
The probability that my car won't start.
If my car won't start then it is likely that
o The battery is flat or
o The starting motor is broken.
In order to decide whether to fix the car myself or send it to the garage I make the following decision:
If the headlights do not work then the battery is likely to be flat, so I fix it myself.
If the starting motor is defective then send the car to the garage.
If the battery and starting motor have both gone, send the car to the garage.
The network to represent this is as follows:
Knowledge representation schemes are useful only if there are functions that map facts to
representations and vice versa. AI is more concerned with a natural language representation of facts
and the functions which map natural language sentences into some representational formalism. An
appealing way of representing facts is using the language of logic. Logical formalism provides a way
of deriving new knowledge from the old through mathematical deduction. In this formalism, we can
conclude that a new statement is true by proving that it follows from the statements already known to
be facts.
A good system for the representation of structured knowledge in a particular domain should possess
the following four properties:
(i) Representational Adequacy:- The ability to represent all kinds of knowledge that are needed in that
domain.
(ii) Inferential Adequacy :- The ability to manipulate the represented structure and infer new
structures.
(iii) Inferential Efficiency:- The ability to incorporate additional information into the knowledge
structure that will aid the inference mechanisms.
(iv) Acquisitional Efficiency :- The ability to acquire new information easily, either by direct insertion
or by program control.
The techniques that have been developed in AI systems to accomplish these objectives fall under two
categories:
1. Declarative Methods:- In these, knowledge is represented as a static collection of facts which are
manipulated by general procedures. Here the facts need to be stored only once and they can be used in
any number of ways. Facts can be easily added to declarative systems without changing the general
procedures.
In practice most knowledge representations employ a combination of both. Most
knowledge representation structures have been developed to handle programs that handle natural
language input. One of the reasons that knowledge structures are so important is that they provide a
way to represent information about commonly occurring patterns of things. Such descriptions are
sometimes called schemas. One definition of schema is
“Schema refers to an active organization of the past reactions, or of past experience, which must
always be supposed to be operating in any well adapted organic response”.
By using schemas, people as well as programs can exploit the fact that the real world is not random.
There are several types of schemas that have proved useful in AI programs. They include
Inheritable knowledge
Relational knowledge is made up of objects consisting of attributes and corresponding associated values.
We extend the base further by allowing inference mechanisms:
Property inheritance
o elements inherit values from being members of a class.
o data must be organised into a hierarchy of classes (Fig. 8).
Inferential knowledge
Facts represented in a logical form, which facilitates reasoning.
An inference engine is required.
Procedural knowledge
Coding actions to be performed when a condition is satisfied.
o Example
o IF
Has a fever above 39 °C
Is reluctant to eat
Skin has red dots
o THEN
Suspect petechial fever
Implementation:
o Writing actions in the LISP programming language
o Writing actions in a production system framework, such as CLIPS or JESS
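The IF-THEN example above can be sketched as a tiny production rule in Python. This is only an illustration of the idea, not how LISP or a production system framework would express it; the fact names (`fever_above_39c`, `poor_appetite`, `red_dots_on_skin`) and the function `fire_rules` are hypothetical names chosen for this sketch.

```python
# A minimal sketch of condition-action (production) rules.
# All fact and rule names below are illustrative assumptions.
RULES = [
    {
        "if": {"fever_above_39c", "poor_appetite", "red_dots_on_skin"},
        "then": "suspect_petechial_fever",
    },
]


def fire_rules(facts, rules=RULES):
    """Return the conclusion of every rule whose conditions are all present."""
    known = set(facts)
    return [rule["then"] for rule in rules if rule["if"] <= known]


# The rule fires only when all three conditions hold.
print(fire_rules({"fever_above_39c", "poor_appetite", "red_dots_on_skin"}))
print(fire_rules({"fever_above_39c"}))
```

Keeping the rules as data, separate from the procedure that fires them, means new rules can be added without touching `fire_rules`.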
What are Semantic networks and frames? Explain with suitable examples.
Example
The physical attributes of a person can be represented as in the following figure using a semantic
net.
These values can also be represented in logic as:
isa(person, mammal), instance(Mike-Hall, person), team(Mike-Hall, Cardiff)
(For details of semantic networks, please follow the lecture slides.)
Frames
Frames are descriptions of conceptual individuals. Frames can exist for "real" objects such as "The
Everest Hotel", sets of objects such as "Hotels", or more "abstract" objects such as "Cola-Wars" or
"Gulf War".
A Frame system is a collection of objects. Each object contains a number of slots. A slot represents
an attribute. Each slot has a value. The value of an attribute can be another object. Frames are
essentially defined by their relationships with other frames. Relationships between frames are
represented using slots. If a frame f is in a relationship r to a frame g, then we put the value g in the r
slot of f.
For example, suppose we are describing the following genealogical tree:
Frames can also be regarded as an extension to semantic nets. Indeed, it is not clear where the
distinction between a semantic net and a frame lies. Semantic nets were initially used to represent
labelled connections between objects. As tasks became more complex, the representation needed to
be more structured. The more structured the system, the more beneficial it is to use frames. A
frame is a collection of attributes or slots and associated values that describe some real world entity.
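A frame system of this kind can be sketched in a few lines of Python. The class `Frame`, its methods `put` and `get`, and the family members below are all hypothetical names invented for this illustration. The sketch shows one point only: a relationship r from frame f to frame g is stored by putting g in the r slot of f.

```python
class Frame:
    """A frame: a named collection of slots, each holding a value."""

    def __init__(self, name, **slots):
        self.name = name
        self.slots = dict(slots)

    def put(self, relation, value):
        # The value may be a plain attribute value or another Frame.
        self.slots[relation] = value

    def get(self, relation):
        # Returns None when the slot has not been filled.
        return self.slots.get(relation)


# Hypothetical genealogy: Carl's parents are Adam and Beth.
adam = Frame("Adam", sex="male")
beth = Frame("Beth", sex="female")
carl = Frame("Carl", sex="male")
carl.put("father", adam)  # relationship "father" from carl to adam
carl.put("mother", beth)

print(carl.get("father").name)
```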
Semantic Networks
First introduced by Quillian in the late 1960s
M. Ross Quillian. "Semantic Memories", In M. M. Minsky, editor, Semantic
Information Processing, pages 216-270. Cambridge, MA: MIT Press, 1968
A semantic network is a simple representation scheme which uses a graph of labeled nodes and
labeled directed arcs to encode knowledge
Nodes – objects, concepts, events
Arcs – relationships between nodes
Graphical depiction associated with semantic networks is a big reason for their popularity
The idea behind a semantic network is that knowledge is often best understood as a set of
concepts that are related to one another.
Arcs define binary relations which hold between objects denoted by the nodes.
Inheritance reasoning in semantic nets:
follow MemberOf & SubsetOf links up the hierarchy
stop at the category with a property link to infer the property for an individual
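The inheritance procedure above can be sketched as a short lookup loop. The dictionaries `LINKS` and `PROPERTIES`, the function `lookup`, and the `blood` property are assumptions made for this example. The node names reuse the Mike-Hall example from earlier in these notes. The sketch also assumes each node has at most one parent and that the hierarchy has no cycles.

```python
# Hierarchy links: (node, link type) -> parent category.
LINKS = {
    ("Mike-Hall", "MemberOf"): "person",
    ("person", "SubsetOf"): "mammal",
}

# Property links: (node, property) -> value. "blood" is a made-up property.
PROPERTIES = {
    ("Mike-Hall", "team"): "Cardiff",
    ("mammal", "blood"): "warm",
}


def lookup(node, prop):
    """Climb MemberOf/SubsetOf links until a node carries the property."""
    while node is not None:
        if (node, prop) in PROPERTIES:
            return PROPERTIES[(node, prop)]  # stop at the first property link
        node = LINKS.get((node, "MemberOf")) or LINKS.get((node, "SubsetOf"))
    return None  # reached the top of the hierarchy without finding it


print(lookup("Mike-Hall", "team"))   # found on the individual itself
print(lookup("Mike-Hall", "blood"))  # inherited from the mammal category
```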
Frames
Book Frame
Slot Filler
Benefits:
Makes programming easier by grouping related knowledge
Easily understood by non-developers
Expressive power
Conceptual dependency (CD) provides a structure in which knowledge can be represented and also a set of
building blocks from which representations can be built. One set of building blocks is a set of primitive actions.
A second set of building blocks is the set of allowable dependencies among the conceptualizations
described in a sentence.
Advantages of CD:
Using these primitives involves fewer inference rules.
Many inference rules are already represented in CD structure.
The holes in the initial structure help to focus on the points still to be established.
Disadvantages of CD:
Knowledge must be decomposed into fairly low level primitives.
Impossible or difficult to find correct set of primitives.
A lot of inference may still be required.
Representations can be complex even for relatively simple actions. Consider:
Dave bet Frank five pounds that Wales would win the Rugby World Cup.
Complex representations require a lot of storage
Applications of CD:
MARGIE (Meaning Analysis, Response Generation and Inference on English) -- models natural language understanding.
SAM (Script Applier Mechanism) -- uses scripts to understand stories. See next section.
PAM (Plan Applier Mechanism) -- uses plans to understand stories.
Script
A script is a structured representation of background world knowledge. This structure contains knowledge
about objects, actions, and situations that are described in the input text. Consider, for example, our
knowledge about shopping or about going to a restaurant: this kind of stored knowledge about
stereotypical events is called a script.
A script is a structure that prescribes a set of circumstances which could be expected to follow on
from one another.
It is similar to a thought sequence or a chain of situations which could be anticipated.
It could be considered to consist of a number of slots or frames but with more specialised roles.
Scripts are beneficial because:
Events tend to occur in known runs or patterns.
Causal relationships between events exist.
Entry conditions exist which allow an event to take place
Prerequisites exist upon events taking place. E.g. when a student progresses through a degree
scheme or when a purchaser buys a house.
The components of a script include:
Entry Conditions -- these must be satisfied before events in the script can occur.
Results -- conditions that will be true after events in the script occur.
Props -- slots representing objects involved in events.
Roles -- persons involved in the events.
Track -- variations on the script. Different tracks may share components of the same script.
Scenes -- the sequence of events that occur. Events are represented in conceptual dependency form.
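The script components listed above map naturally onto a record type. The sketch below is a simplification under stated assumptions: the class `Script`, its methods, and the restaurant details are hypothetical, and scenes are plain strings rather than conceptual dependency structures.

```python
from dataclasses import dataclass, field


@dataclass
class Script:
    name: str
    track: str              # which variation of the script this is
    entry_conditions: set   # must hold before the script can run
    results: set            # hold after the script has run
    props: list             # objects involved in the events
    roles: list             # persons involved in the events
    scenes: list = field(default_factory=list)  # ordered events

    def applicable(self, facts):
        """True when every entry condition is among the known facts."""
        return self.entry_conditions <= set(facts)

    def expected_after(self, scene):
        """Predict the scenes that should follow a given observed scene."""
        position = self.scenes.index(scene)
        return self.scenes[position + 1:]


restaurant = Script(
    name="restaurant",
    track="coffee shop",
    entry_conditions={"customer_hungry", "customer_has_money"},
    results={"customer_not_hungry", "customer_has_less_money"},
    props=["table", "menu", "food", "bill"],
    roles=["customer", "waiter", "cook"],
    scenes=["entering", "ordering", "eating", "paying", "leaving"],
)

print(restaurant.applicable({"customer_hungry", "customer_has_money"}))
print(restaurant.expected_after("ordering"))
```

The `expected_after` method illustrates the main benefit claimed for scripts: once one scene is recognised, the later scenes can be predicted even if the text never mentions them.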
Advantages of Scripts:
Ability to predict events.
A single coherent interpretation may be built up from a collection of observations.
Disadvantages:
Less general than frames.
May not be suitable to represent all kinds of knowledge.
Disadvantages of neural networks:
The neural network needs training to operate.
The architecture of a neural network differs from the architecture of microprocessors and
therefore needs to be emulated.
Requires high processing time for large neural networks.
Trade-off
The need to collect many examples is traded for the ability to “explain” single examples (using a “domain theory”).
Use a problem solver to justify, using the rules, the goal in terms of the facts.
Generalize the justification as much as possible.
The operationality criterion states which other terms can appear in the generalized result.
Boltzmann Machine
A Boltzmann machine is a type of stochastic recurrent neural network invented by Geoffrey
Hinton and Terry Sejnowski. Boltzmann machines can be seen as
the stochastic, generative counterpart of Hopfield nets. They were one of the first examples of a
neural network capable of learning internal representations, and are able to represent and (given
sufficient time) solve difficult combinatoric problems. However, due to a number of issues discussed
below, Boltzmann machines with unconstrained connectivity have not proven useful for practical
problems in machine learning or inference. They are still theoretically intriguing, however, due to the
locality and Hebbian nature of their training algorithm, as well as their parallelism and the
resemblance of their dynamics to simple physical processes. If the connectivity is constrained, the
learning can be made efficient enough to be useful for practical problems.
They are named after the Boltzmann distribution in statistical mechanics, which is used in their
sampling function.
A Boltzmann machine, like a Hopfield network, is a network of units with an "energy" defined for the network.
It also has binary units, but unlike Hopfield nets, Boltzmann machine units are stochastic. The global
energy, $E$, in a Boltzmann machine is identical in form to that of a Hopfield network:
$$E = -\sum_{i<j} w_{ij}\, s_i\, s_j + \sum_i \theta_i\, s_i$$
Where:
$w_{ij}$ is the connection strength between unit $j$ and unit $i$.
$s_i$ is the state, $s_i \in \{0, 1\}$, of unit $i$.
$\theta_i$ is the threshold of unit $i$.
The connections in a Boltzmann machine have two restrictions:
$w_{ii} = 0 \quad \forall i$. (No unit has a connection with itself.)
$w_{ij} = w_{ji} \quad \forall i, j$. (All connections are symmetric.)
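As a worked illustration, the global energy can be computed directly. The sketch assumes the usual Hopfield-form energy with thresholds, E = -(sum over pairs i<j of w_ij * s_i * s_j) + (sum over i of theta_i * s_i), and binary states in {0, 1}. The function name `global_energy` and the example weights are made up for this sketch. It checks the two connection restrictions before computing anything.

```python
def global_energy(w, s, theta):
    """Energy of a state s given a weight matrix w and thresholds theta."""
    n = len(s)
    for i in range(n):
        if w[i][i] != 0:
            raise ValueError("no unit may have a connection with itself")
        for j in range(n):
            if w[i][j] != w[j][i]:
                raise ValueError("connections must be symmetric")
    # Sum each pair of units once (i < j), then add the threshold term.
    pair_term = sum(
        w[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n)
    )
    threshold_term = sum(theta[i] * s[i] for i in range(n))
    return -pair_term + threshold_term


# Hypothetical 3-unit network with units 0 and 1 on, unit 2 off.
w = [[0, 1, -2],
     [1, 0, 3],
     [-2, 3, 0]]
s = [1, 1, 0]
theta = [0.5, -1, 2]

# Only the pair (0, 1) is active: E = -(1) + (0.5 - 1) = -1.5
print(global_energy(w, s, theta))
```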