
COS351-D/501/2004

ARTIFICIAL INTELLIGENCE TECHNIQUES

TUTORIAL LETTER 501 FOR COS351-D

Study guide

SCHOOL OF COMPUTING

Table of Contents

About this guide
The syllabus
Outcomes
Part 1
    Notes on Chapter 2
    Notes on Chapter 3
    Notes on Chapter 5
    Notes on Chapter 7
    Notes on Chapter 8
Part 2
    Notes on Chapter 9
    Notes on Chapter 11
    Notes on Chapter 12
Part 3
    Notes on Chapter 13
    Notes on Chapter 14
    Notes on Chapter 15
    Notes on Chapter 16
    Notes on Chapter 17

Afrikaans students: The study guide for COS351-D is available in English only. If you experience any problems with the English terminology or explanations, you are welcome to contact the lecturers.

About this guide

The prescribed book for COS351-D is:

Nils J. Nilsson. Artificial Intelligence: A New Synthesis. Morgan Kaufmann Publishers, 1998.

This guide tells you which chapters are included in the course material, and contains comments on those sections of the book that you may find difficult to understand. Fortunately the prescribed book is reasonably well written, and you should be able to follow most of the explanations.

The syllabus

The course is divided into three parts. After studying each part, you have to submit an assignment. Part 1 covers chapters 1, 2, 3, 5, 7 and 8; Part 2 covers chapters 9, 11 and 12; and Part 3 covers chapters 13, 14, 15, 16 and 17.

Some of the sections in a chapter may be omitted. These sections are listed in the notes on the chapter. If there are no notes on a particular chapter, it does not mean that the chapter is not important. We only comment on aspects of the textbook that are unclear or incorrect, including the parts where students have indicated to us in the past that they cannot follow the given explanations.

In this guide we will often say that a particular section in the prescribed book is to be read only. This means that the section is interesting but not examinable. We urge you to read these sections at least once, but you do not have to study their contents. The last section of every chapter is called Additional Readings and Discussion. These sections are all to be read only.

Outcomes

In each chapter, some terminology, theory and artificial intelligence techniques are described. You should be able to define all the terminology introduced, and explain the theory. Most chapters also focus on some specific problem solving skills that you should acquire. We list these in more detail below.

Problem solving skills:

Chapter 1:  • None

Chapter 2:  • Calculate the outcome of a given sequence of connected TLUs for any given input vector.

Chapter 3:  • Train a single TLU.
            • Train a k-layer feedforward neural net by backpropagation.

Chapter 5:  • Specify suitable production rules for a given system.

Chapter 7:  • Design a suitable state representation for a given search problem.
            • Draw a graph of the complete state space for a given search problem.

Chapter 8:  • Implement the breadth-first, depth-first, and iterative deepening search algorithms.
            • Solve a given search problem by hand using each of the above search algorithms.

Chapter 9:  • Define suitable cost and heuristic functions for a given search problem.
            • Implement the uniform cost and A* search algorithms.
            • Solve a given search problem by hand using each of the above search algorithms.

Chapter 11: • Give a representation of a given search problem as a set of constraints.
            • Apply constraint propagation to a given problem to reduce the search space.
            • Apply heuristic repair to a given problem.

Chapter 12: • Apply the minimax procedure to a given adversarial search problem.
            • Apply the alpha-beta pruning procedure to a given adversarial search problem.

Chapter 13: • Use a given set of inference rules to prove theorems.
            • Construct truth tables, and use them to reason about validity and entailment.

Chapter 14: • Convert a given wff to clause form.
            • Prove theorems using resolution and resolution refutation.
            • Apply the unit-preference, set of support, linear input and ancestry filtering strategies in resolution refutations.

Chapter 15: • Determine whether a given wff is valid.
            • Determine whether a given wff is true or false in a given interpretation.

Chapter 16: • Convert a given problem statement to first order predicate calculus.
            • Convert a set of first order sentences to clause form.
            • Use resolution refutation to prove theorems and extract answers from queries.

Chapter 17: • Convert a given set of clauses to a Prolog program.
            • Trace through a given Prolog program.

Part 1

The first part of the course consists of the following chapters: 1, 2, 3, 5, 7 and 8.

Chapters 4 and 6 are not included in the syllabus.

Notes on Chapter 2

Section 2.1

Note that the contents of the feature vector in Figure 2.2 on page 23 do not relate to the example discussed in Figure 2.1. Figure 2.2 simply shows an example of a feature vector - the designer decides what each coordinate represents.

Sections 2.1.3 and 2.1.4

The mathematics in these sections is covered in COS101-S, COS113-W and COS211-X.

Section 2.2.1

The production systems introduced in this section are similar to the work covered in COS301-Y.

Section 2.2.2

Consider the TISA units in Figure 2.7. Every TISA unit is used to represent one rule in a production system.

Let's see how it works. The inhibit input of the topmost TISA unit is 0. If the test input c1 is 1, then the output of the rightmost TLU is 1, and the action a1 is triggered. If the test input c1 were 0, then the output of the rightmost TLU would be 0 and the action a1 would not be triggered.

In the first case (inhibit input is 0 and c1 is 1), the TLU at the bottom of the first TISA unit produces a 1. So the inhibit input for the next TISA unit is 1, and the second TISA unit cannot trigger the action a2. All the other TISA units also receive an inhibit input of 1.

If, on the other hand, both the inhibit input and c1 are 0, then the inhibit input to the second TISA unit is 0, and so the action a2 will be triggered if c2 is 1.

Section 2.2.3 is to be read only.



Notes on Chapter 3

Sections 3.2.1 and 3.2.2

In Chapter 2, you were introduced to Threshold Logic Units, or TLUs. A TLU is capable of classifying input vectors into two categories. In this chapter, you will learn how to train a TLU. As is explained in Section 3.2.1, a TLU is trained (by adjusting weights) on a set of sample data, each element of which consists of an input vector together with the required output value. The output value is the correct classification of the input vector. Usually these output values are taken to be 0 or 1. This required output value is compared with the output value calculated by the TLU to determine the error made by the TLU. The TLU uses the dot product of the input vector X and the weight vector W, defined by

X · W = x1w1 + x2w2 + ... + xnwn,

to calculate its output. Namely, the output is 1 when X · W ≥ 0, else it is 0.

Suppose there are only two input values x1 and x2. Then X · W = 0 is the equation of a straight line through the origin on a two-dimensional plane. You should be familiar with this equation, although you may recognise it better in the form y = mx + c. If we rewrite the equation x1w1 + x2w2 = 0, we get x1 = (-w2/w1)x2 + 0, which is in the format you may be used to. The gradient of the line is (-w2/w1), and its offset from the origin is 0. The line X · W = 0 separates the input vectors into two sets, namely those points X for which X · W ≥ 0, and those points for which X · W < 0.

If there are more than two input values, then X · W = 0 is the equation of a hyperplane through the origin in the space of input vectors. If there are three input values this can be visualised as a plane through the origin, but in more than three dimensions one cannot draw a picture any more.

It is convenient to use a threshold of 0, which means that the hyperplane defined by the equation X · W = 0 goes through the origin of the space of input vectors. If we want to use a non-zero threshold θ, this can be accommodated as explained in Section 3.2.2. Namely, add another input value xn+1, which is always 1, and set the weight wn+1 equal to -θ. So instead of using the equation

x1w1 + x2w2 + ... + xnwn = 0

as linear boundary, we now use the equation

x1w1 + x2w2 + ... + xnwn + wn+1 = 0

or, equivalently,

x1w1 + x2w2 + ... + xnwn = θ.

Sections 3.2.3 to 3.2.6

The Gradient Descent Method, which is used to train a TLU, requires a mathematical background which the majority of you do not have. You should work through Sections 3.2.3 to 3.2.6, but if you don't have the required background in partial derivatives, the mathematical explanations will not make much sense, and you can replace them with the intuitive explanation we give below.

The aim of the training process is to set the weights so as to minimize the error made by the TLU. The error for each input vector X is given by

g = (d - f)²,
where f is the actual response of the TLU, and d is the desired response. (We square (d - f) to make sure we get a positive error.) Keeping the input vector X fixed for the moment, we see that each different weight vector will give a different error value g. We naturally want to choose a weight vector that will minimize g. So given the error produced by the current input vector, we make a slight adjustment to the weight vector. We cannot make the change too radical, as this would amount to "forgetting" all the previous input vectors in the training set, and focusing only on the current input vector. We have to find a weight vector that will perform as well as possible overall, taking into account the entire training set. For each input vector in the training set, an adjustment is made to the weight vector which will make the error for that input vector a little bit smaller.

In mathematical terms, we accomplish this by doing a negative gradient descent of the error function in weight space. What does this mean? Suppose again that we have only two input values. Then we also have two weights. These weight vectors can be represented as points on a two-dimensional plane. Keeping the input vector X fixed, we can calculate the error produced for each weight vector by the error function g, and plot this on a third axis. The error curve looks a bit like a magic carpet floating above the weight plane. The gradient descent method always follows a downward slope of the error function (the "magic carpet"). This is repeated for each error function produced by each input vector, until the process converges and we have found a weight vector that produces a zero error for the entire training set. This is not always possible, as is illustrated by the XOR function, which cannot be simulated by a TLU (see Section 2.2.2).

The fly in the ointment is that the gradient descent method is based on calculating gradients (or slopes), which involves partial derivatives. And, as you may remember from high school mathematics, not all functions are differentiable at all points. In particular, the threshold function is not continuously differentiable.

The two methods considered in the textbook to work around this problem are the Widrow-Hoff procedure and the Generalized Delta procedure. The former ignores the threshold function and uses the dot product as output, while the latter replaces the threshold function with a sigmoid function, which is continuously differentiable. The sigmoid function is shown in Figure 3.2 on page 42.

As we've mentioned above, you don't have to understand how the gradient of the error function in weight space is calculated in each of these procedures. But you should understand and be able to apply the rest of Section 3.2.4. The minus sign in the gradient -2(d-f)X used in the Widrow-Hoff procedure indicates that the gradient is negative, so the error curve follows a downward slope near the point W. The factor 2 is incorporated into the learning rate parameter c, which determines to what extent the error produced by the current input vector influences the new weights. The larger c is, the greater the influence of the error produced by the current input vector. Similarly, you don't have to be able to derive the gradient -2(d-f)f(1-f)X used in the Generalized Delta procedure, but you should understand and be able to apply everything else in Section 3.2.5.

Let us look at an example: We will train a TLU to determine when the robot introduced in Section 2.1 should move east in its two-dimensional grid-space world. The procedure is described in Section 3.2.6. We start with the weight vector having all weights equal to 0. First, we use the input vectors suggested in Figure 3.3, and then we continue until the TLU gives the correct output for all input vectors. Recall that there are no tight spaces (spaces that are only one cell wide) in the grid-space world. This eliminates a large number of input vectors that would otherwise be possible. We also assume that what is meant by "large, unmovable objects" (see Section 2.1) is that boundaries never change direction more than once every two cells.

As explained in Section 3.2.2 and in this tutorial letter, we add a ninth input, which is always 1, to the input vector in order to find the threshold value. The final threshold value is the negation of the ninth element of the final weight vector, which in our case is 1. Note that different TLUs can implement the same linearly separable function. Our TLU differs from the one given in Figure 2.6, but it is also able to determine when the robot should move east.

Weight change rule: Wi+1 = Wi + c(di - fi)Xi

Rate of change constant: c = 1
Input vector: Xi
Desired output: di
Observed output: fi

W1  = ( 0 0 0 0 0 0 0 0 0 )
X1  = ( 0 0 0 0 1 1 0 0 1 )
d1  = 0; X1 · W1 = 0; f1 = 1

W2  = ( 0 0 0 0 -1 -1 0 0 -1 )
X2  = ( 1 1 1 0 0 0 0 0 1 )
d2  = 1; X2 · W2 = -1; f2 = 0

W3  = ( 1 1 1 0 -1 -1 0 0 0 )
X3  = ( 0 0 1 0 0 0 0 0 1 )
d3  = 1; X3 · W3 = 1; f3 = 1

W4  = ( 1 1 1 0 -1 -1 0 0 0 )
X4  = ( 0 0 0 0 0 0 0 0 1 )
d4  = 0; X4 · W4 = 0; f4 = 1

W5  = ( 1 1 1 0 -1 -1 0 0 -1 )
X5  = ( 0 0 0 0 1 0 0 0 1 )
d5  = 0; X5 · W5 = -2; f5 = 0

W6  = ( 1 1 1 0 -1 -1 0 0 -1 )
X6  = ( 0 1 1 0 0 0 0 0 1 )
d6  = 1; X6 · W6 = 1; f6 = 1

W7  = ( 1 1 1 0 -1 -1 0 0 -1 )
X7  = ( 1 0 0 0 0 0 0 0 1 )
d7  = 0; X7 · W7 = 0; f7 = 1

W8  = ( 0 1 1 0 -1 -1 0 0 -2 )
X8  = ( 0 1 1 1 0 0 0 0 1 )
d8  = 0; X8 · W8 = 0; f8 = 1

W9  = ( 0 0 0 -1 -1 -1 0 0 -3 )
X9  = ( 0 1 1 0 0 0 0 0 1 )
d9  = 1; X9 · W9 = -3; f9 = 0

W10 = ( 0 1 1 -1 -1 -1 0 0 -2 )
X10 = ( 0 0 1 0 0 0 0 0 1 )
d10 = 1; X10 · W10 = -1; f10 = 0

W11 = ( 0 1 2 -1 -1 -1 0 0 -1 )
X11 = ( 0 0 1 1 0 0 0 0 1 )
d11 = 0; X11 · W11 = 0; f11 = 1

W12 = ( 0 1 1 -2 -1 -1 0 0 -2 )
X12 = ( 0 0 1 0 0 0 0 0 1 )
d12 = 1; X12 · W12 = -1; f12 = 0

W13 = ( 0 1 2 -2 -1 -1 0 0 -1 )

At this point, no further input changes the weight vector, so the TLU has been fully trained. The input weight vector is (0, 1, 2, -2, -1, -1, 0, 0), and the threshold value is 1.
Section 3.3.3

Backpropagation is a technique that enables a neural net to adjust its weights automatically from training examples. This method computes how much the performance of a neural net improves with individual weight changes. It is called backpropagation because it computes changes to the weights in the final layer first, reuses much of the same computation to compute changes to the weights in the penultimate layer, and ultimately goes back to the initial layer.

As in Section 3.2, you can skip the calculation of the gradient of the error function g as a partial derivative in weight space. The overall idea is to make a large change to a particular weight if the change leads to a large reduction in the errors observed at the output nodes. However, a change to one weight affects other weights and inputs and outputs to nodes as well. A change to a node's input values results in a change to the output that depends on the slope of the threshold function. All these factors are taken into account in the calculations on pages 47 to 49.

You should understand the principle of backpropagation, and pick up the discussion again in the middle of page 49, where the general expression for the δ's is given. Note that each δ is a measure of the sensitivity of the squared error to changes in the input of the corresponding sigmoid function.

So how do we apply the backpropagation method? Consider the example on page 50 that refers to Figure 3.6. The inputs to the two sigmoid functions in the first layer are computed by taking the dot products of the input vector with the weights. First we have (1*2) + (0*-2) + (1*0) = 2. The output of the sigmoid function (Figure 3.2) for an input of 2 is 0.881. The input to the second sigmoid function in the first layer is (1*1) + (0*3) + (1*-1) = 0, and its output is 0.5.

Now we can calculate the input to the sigmoid function in the second layer: (0.881*3) + (0.5*-2) + (1*-1) = 0.643. The output is 0.665.

The next step is to calculate the base case δ(2) = (d-f)f(1-f) = -0.665(0.665)(0.335) = -0.148145.

Now we have to calculate the δ's in the first layer by using the relevant formula on page 49:
δ1(1) = f1(1)(1 - f1(1))(δ(2)w1(1)) = 0.881(0.119)(-0.148*3) = -0.046548, and similarly for δ2(1).

The next step is to calculate new weights by using the relevant formula on page 49. For example,
W1(1) = [2, -2, 0] + (-0.047)[1, 0, 1] = [1.953, -2.000, -0.047].

We hope that this chapter has served to dispel some misconceptions you may have had about neural networks. They are founded on sound mathematical principles, and there is nothing supernatural about them.

Notes on Chapter 5

Section 5.2 is to be read only.

Notes on Chapter 7

Section 7.5

Consider the definition of the branching factor of a tree in the middle of page 124. If the nodes of a tree do not all have the same number of successors, we take the node with the maximum number of successors, say m successors, and call m the branching factor of that tree.

Notes on Chapter 8

Section 8.3

The textbook describes the breadth-first search algorithm, but does not give an algorithm which can easily be converted to a computer program. Breadth-first search can be implemented using two lists, called OPEN and CLOSED respectively. The OPEN list contains states that have been generated but whose children have not yet been examined. OPEN is maintained as a queue, or first-in-first-out data structure. That is, elements are added to the back of the queue, and removed from the front of the queue. In a C++ implementation, the STL queue container class can be used to implement the OPEN queue. The CLOSED list contains states that have already been examined, and can be implemented using any list-type structure. (It is often unnecessary to implement this list - whether it is useful depends on the particular problem.)

What information do we need to keep in a state? This depends on the problem description. The choice of state representation can greatly influence the efficiency and simplicity of a solution. It is therefore worthwhile to sit down and carefully design the state description of any search problem before embarking on an implementation. It is often useful to keep a link to the parent state in the search tree as part of the state representation, in order to reconstruct the solution path from the start state to the final state.

Here is the C++-style pseudo-code for the breadth-first search algorithm .

{
create a queue OPEN containing the start state;
create an empty list CLOSED;
while (OPEN is not empty)
{
remove the front element from OPEN, call it X;
if X is a goal state
{
return SUCCESS;
}
else
{
generate all the children of X;
put X on CLOSED;
discard children of X that are already on OPEN or CLOSED;
add the remaining children to the back of OPEN;
}
}
return FAIL;
}

Section 8.4

As in the case of breadth-first search, depth-first search can also be implemented using two lists, called OPEN and CLOSED respectively. The OPEN list again contains states that have been generated but whose children have not yet been examined. In the depth-first search algorithm, OPEN is maintained as a stack, or last-in-first-out data structure. That is, elements are added to the front (or top) of the stack, as well as removed from the front (or top) of the stack. (Recall that, in the breadth-first search algorithm, elements were added to the back of the queue but removed from the front.) In a C++ implementation, the STL stack container class can be used to implement the OPEN stack. CLOSED again records states that have already been examined, and can be implemented using any list-type structure. (Again the usefulness of this list depends on the problem you are solving.)

As in the case of breadth-first search, careful consideration of the state representation can save you a great deal of difficulty during the implementation phase, and result in a simpler, more efficient program. It is therefore worthwhile to carefully consider what information you need to store in a state, and how you want to represent this information. It may be useful to store a link to the parent state in the state representation, in order to reconstruct the solution path from the start state to the final state.

Here is the C++-style pseudo-code for the depth-first search algorithm :



{
create a stack OPEN containing the start state;
create an empty list CLOSED;
while (OPEN is not empty)
{
pop the top element of OPEN, call it X;
if X is a goal state
{
return SUCCESS;
}
else
{
generate all the children of X;
put X on CLOSED;
discard children of X that are already on OPEN or CLOSED;
push the remaining children onto OPEN;
}
}
return FAIL;
}

Section 8.5

Consider the formula for Nbf on page 135. Suppose we have a tree with a branching factor of b and we want to count the number of nodes up to level d: there is one node on level 0, at most b nodes on level 1, at most b² nodes on level 2, ..., at most b^d nodes on level d. So we just add the nodes on all the levels together to get the maximum number of nodes expanded by a breadth-first search.

Part 2

The second part of the course consists of the following chapters: 9, 11 and 12.

Chapter 10 is not included in the syllabus of this course.

Notes on Chapter 9

Section 9.2

W e spotted an error in step 6 of GRAPHSEARCH that only appears in som e copies of the textbook: the last sentence
of step 6 is m issing. Make sure that the last sentence of step 6 appears in your copy: “Put these m em bers of M on
OPEN.”

Section 9.2.1

In step 8 on page 145 the author says that ties among minimal values are resolved in favour of the deepest node in the search tree. This is only one way to choose between two nodes with the same values: another approach is to select the node whose parent was expanded first.

Section 9.2.2

The last step in the proof of Lemma 9.1 is to show that f(n*) = f(n0).

We know that f(n*) = g(n*) + h(n*), where g(n*) is the cost of the minimal path from n0 (the start node) to node n*, and h(n*) is the cost of the minimal path from node n* to the goal.

We also know that f(n0) = g(n0) + h(n0), where g(n0) = 0 and h(n0) is the cost of the minimal path from n0 to the goal node.

So f(n*) = h(n0) = f(n0).

Section 9.2.3

Consider the line in the proof of Theorem 9.3 starting with "For any node ni and its successor ni+1 ...". The equation in this line follows from the fact that

g(ni+1) - g(ni) = c(ni, ni+1).

Sections 9.2.4 and 9.2.5 are to be read only.



Section 9.3

A Manhattan distance is calculated by counting the number of moves a tile has to make in order to reach its goal cell if there were no other tiles in its way. Consider the start node in Figure 9.1 on page 140. The Manhattan distance between tile 2 and the blank cell is 3.

Figure 9.11 shows a graph that was generated by the formula on page 158 and can be used in several ways. If you have already performed a search such as the one depicted in Figure 9.2, you can use the graph to determine the effective branching factor: the goal node is at depth 5 and a total of 13 nodes have been expanded. This gives us a branching factor of 1.34. (Note that the value of 1.2 in the prescribed book is incorrect.) On the other hand, if you want to solve a more difficult version of the same problem, you can read from the graph that a depth of 30 would result in about 2000 nodes if the branching factor were 1.2.

Notes on Chapter 11

Section 11.4 is to be read only. The comments we give below on hill-climbing search are for your interest only.

Nilsson's description of hill-climbing is unusual. In step 2 of his algorithm he simply selects the best successor and discards the remaining successors. In most AI textbooks the remaining siblings are placed at the front of OPEN in decreasing order in terms of their v values. If you page back to the GRAPHSEARCH algorithm of Section 9.2 on page 141, step 6 will read as follows:

6. Expand node n, generating a set M of successors. Install M as successors of n in Tr by creating arcs from n to each member of M. Reorder the set M in decreasing order according to their v values. Place these members of M at the front of OPEN.

Replace his algorithm on page 190 with the above. If you use our algorithm, you can see that hill-climbing search is a depth-first search which uses an evaluation function v to determine the most promising next state. So, unlike the description given in the textbook, we are guaranteed to find a solution if one exists. Even if there are infinite paths, any reasonable evaluation function will prevent an infinite traversal of a futile path in the search tree.

Notes on Chapter 12

In order to clarify the application of the alpha-beta procedure we provide the solutions to exercise 12.2, page 213 of the prescribed book.

12.2.1 The answer is D.

We illustrate the method with the picture on the next page. The tip nodes are MIN nodes, so the backed-up values of the MAX parent nodes are the maximum values of the tip node successors: E(3), F(8), G(7), H(1), I(5), J(8) and K(10). Now we move up one level. The nodes at depth 1 are MIN nodes, so we need to take the minimum values of their successors' backed-up values: B(3), C(1) and D(8). The start node A is a MAX node, and it selects the maximum value of its successors' backed-up values: the value 8.

This means that the best move the first player can make is move D. Then the second player's best move is J (because he knows that if he should choose K, the first player will choose X!). The first player will then go for move V.

Note that Nilsson does not give the pseudo-code for the minimax procedure. He simply describes the procedure, and in the next section he gives the pseudo-code for the alpha-beta procedure. The reason is probably that it is almost always better to use the alpha-beta procedure.

12.2.2 The answer is O, Q, I (hence T and U) and Y.

We show a trace of the complete solution using the alpha-beta procedure on page 205. Please note that the parameters α and β are not reference parameters. If their values should change during a call, the new values are not passed back.

Here follows a trace of the alpha-beta procedure on the tree of exercise 12.2, p. 213, Nilsson.

Call 1
AB(A; -∞, ∞)
α = -∞; β = ∞
n1 = B; n2 = C; n3 = D
k = 1
(go to step 2)
α = max[-∞, AB(B, -∞, ∞)]
  = max[-∞, 3] (see Call 2 below)
  = 3
(steps 3 and 4)
k = 2
(step 2 again)
α = max[3, AB(C, 3, ∞)]
  = max[3, 3] (see Call 10 below)
  = 3
(steps 3 and 4)
k = 3
(step 2 again)
α = max[3, AB(D, 3, ∞)]
  = max[3, 8] (see Call 14 below)
  = 8
return α = 8

Call 2
AB(B; -∞, ∞)
α = -∞; β = ∞
n1 = E; n2 = F; n3 = G
k = 1
(go to step 2')
β = min[∞, AB(E, -∞, ∞)]
  = min[∞, 3] (see Call 3 below)
  = 3
(steps 3' and 4')
k = 2
(step 2' again)
β = min[3, AB(F, -∞, 3)]
  = min[3, 3] (see Call 6 below)
  = 3
(steps 3' and 4')
k = 3
(step 2' again)
β = min[3, AB(G, -∞, 3)]
  = min[3, 3] (see Call 8 below)
  = 3
return β = 3

Call 3
AB(E; -∞, ∞)
α = -∞; β = ∞
n1 = L; n2 = M
k = 1
(go to step 2)
α = max[-∞, AB(L, -∞, ∞)]
  = max[-∞, 2] (see Call 4 below)
  = 2
(steps 3 and 4)
k = 2
(step 2 again)
α = max[2, AB(M, 2, ∞)]
  = max[2, 3] (see Call 5 below)
  = 3
return α = 3

Call 4
AB(L, -∞, ∞) = 2

Call 5
AB(M, 2, ∞) = 3

Call 6
AB(F; -∞, 3)
α = -∞; β = 3
n1 = N; n2 = O
k = 1
(go to step 2)
α = max[-∞, AB(N, -∞, 3)]
  = max[-∞, 8] (see Call 7 below)
  = 8
(step 3)
α = 8 is greater than β = 3 (here we cut off node O!)
return β = 3

Call 7
AB(N, -∞, 3) = 8

Call 8
AB(G; -∞, 3)
α = -∞; β = 3
n1 = P; n2 = Q
k = 1
(go to step 2)
α = max[-∞, AB(P, -∞, 3)]
  = max[-∞, 7] (see Call 9 below)
  = 7
(step 3)
α = 7 is greater than β = 3 (here we cut off node Q!)
return β = 3

Call 9
AB(P, -∞, 3) = 7

Call 10
AB(C;3,4)
" = 3; $ = 4
n 1 = H; n 2 = I
k=1
(go to step 2')
$ = m in[4, AB(H,3,4)]
= m in[4,3] (See Call 11 below)
=3
(step 3')
$=" (Here we cut off node I!)
return "=3

Call 11
AB(H,3,∞)
α = 3; β = ∞
n1 = R; n2 = S
k=1
(go to step 2)
α = max[3, AB(R,3,∞)]
= max[3,0] (See Call 12 below)
= 3
(steps 3 and 4)
k=2, (step 2 again)
α = max[3, AB(S,3,∞)]
= max[3,1] (See Call 13 below)
= 3
return α=3

Call 12
AB(R,3,∞) = 0

Call 13
AB(S,3,∞) = 1

Call 14
AB(D,3,∞)
α = 3; β = ∞
n1 = J; n2 = K
k=1
(go to step 2')
β = min[∞, AB(J,3,∞)]
= min[∞,8] (See Call 15 below)
= 8
(steps 3' and 4')
k=2
(step 2' again)
β = min[8, AB(K,3,8)]
= min[8,8] (See Call 18 below)
= 8
(steps 3' and 4')
return β=8

Call 15
AB(J,3,∞)
α = 3; β = ∞
n1 = V; n2 = W
k=1
(go to step 2)
α = max[3, AB(V,3,∞)]
= max[3,8] (See Call 16 below)
= 8
(steps 3 and 4)
k=2
(step 2 again)
α = max[8, AB(W,8,∞)]
= max[8,4] (See Call 17 below)
= 8
(steps 3 and 4)
return α=8

Call 16
AB(V,3,∞) = 8

Call 17
AB(W,8,∞) = 4

Call 18
AB(K,3,8)
α = 3; β = 8
n1 = X; n2 = Y
k=1
(go to step 2)
α = max[3, AB(X,3,8)]
= max[3,10] (See Call 19 below)
= 10
(step 3)
α=10 is greater than β=8 (Here we cut off node Y!)
return β=8

Call 19
AB(X,3,8) = 10
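The trace above can be checked with a short program. The sketch below is our own Python rendering of the fail-hard AB procedure (a MAX node returns β on a cut-off and a MIN node returns α, as in the trace); it is not Nilsson's code. The leaf values are those of the tree above, and the leaves that are cut off (O, Q, I and Y) are given arbitrary placeholder values, since the procedure never examines them.

```python
import math

# Tree of Exercise 12.2: MAX and MIN levels alternate, starting with MAX at A.
# Leaves cut off in the trace (O, Q, I, Y) get placeholder values (0 here);
# the assertions below confirm they are never evaluated.
CHILDREN = {
    'A': ['B', 'C', 'D'], 'B': ['E', 'F', 'G'], 'C': ['H', 'I'],
    'D': ['J', 'K'], 'E': ['L', 'M'], 'F': ['N', 'O'], 'G': ['P', 'Q'],
    'H': ['R', 'S'], 'J': ['V', 'W'], 'K': ['X', 'Y'],
}
LEAF = {'L': 2, 'M': 3, 'N': 8, 'O': 0, 'P': 7, 'Q': 0,
        'R': 0, 'S': 1, 'I': 0, 'V': 8, 'W': 4, 'X': 10, 'Y': 0}

visited = []  # leaves actually evaluated, to confirm the cut-offs

def AB(n, alpha, beta, maximizing):
    if n in LEAF:
        visited.append(n)
        return LEAF[n]
    if maximizing:                      # steps 2-4 of the MAX case
        for child in CHILDREN[n]:
            alpha = max(alpha, AB(child, alpha, beta, False))
            if alpha >= beta:           # step 3: cut off remaining children
                return beta
        return alpha
    else:                               # steps 2'-4' of the MIN case
        for child in CHILDREN[n]:
            beta = min(beta, AB(child, alpha, beta, True))
            if beta <= alpha:           # step 3': cut off remaining children
                return alpha
        return beta

value = AB('A', -math.inf, math.inf, True)
print(value)                             # 8, as returned by Call 1
print(sorted(set(LEAF) - set(visited)))  # ['I', 'O', 'Q', 'Y'] are cut off
```

Note that the placeholder values cannot influence the answer: the cut-offs guarantee those branches are never visited.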

Sections 12.6 and 12.7 are to be read only.



Part 3

The third part of the course consists of the following chapters: 13, 14, 15, 16 and 17.

Notes on Chapter 13

Section 13.4

In COS161 you were introduced to the propositional calculus, and to the natural deduction proof system. In this
module you will be introduced to another proof system, namely resolution (see Chapter 14). The set of inference rules
R is determined by the proof system used, and whether the agent is a perfect reasoner. For example, R may be the
natural deduction rules you used in COS161, or it may be the resolution inference rule you will encounter in Chapter
14, or it may be some arbitrary set of inference rules such as the set presented in the example in Section 13.4. When
no confusion can arise, we omit explicit reference to the set of inference rules. In this module, this will usually be the
case.

Nilsson calls ωn a theorem of the set Δ if Δ ⊢ ωn. It is also called a deduction from Δ in some textbooks. If Δ is the
empty set, we call ωn a theorem, and write it as follows: ⊢ ωn. This means that there is a proof of ωn from the empty
set of premisses.

Sections 13.5.5 and 13.8.1

The symbol ≡ is a metalinguistic symbol (see Section 13.8.1) and therefore cannot be used in an abbreviation for a
sentence in the propositional calculus. Nilsson is guilty here of the same confusion he warns readers about in Section
13.8.1. What he probably means is that ω1 ≡ ω2 is an abbreviation for

(ω1 ⊃ ω2) ∧ (ω2 ⊃ ω1) ≡ T.

Sections 13.8.3 and 13.8.4

There are a number of useful laws that can be used to convert a wff into an equivalent wff (i.e. they have the same
truth value under all interpretations). We give a more comprehensive list here to supplement the laws given in the
textbook.

1. α ∧ β ≡ β ∧ α	Commutativity
   α ∨ β ≡ β ∨ α

2. (α ∧ β) ∧ γ ≡ α ∧ (β ∧ γ)	Associativity
   (α ∨ β) ∨ γ ≡ α ∨ (β ∨ γ)

3. α ∧ (β ∨ γ) ≡ (α ∧ β) ∨ (α ∧ γ)	Distributivity
   α ∨ (β ∧ γ) ≡ (α ∨ β) ∧ (α ∨ γ)

4. α ∧ ¬α ≡ F	Complement
   α ∨ ¬α ≡ T

5. ¬¬α ≡ α	Double negation

6. α ∧ T ≡ α	Identity
   α ∨ F ≡ α

7. α ∧ α ≡ α	Idempotence
   α ∨ α ≡ α

8. α ∧ F ≡ F	Boundedness
   α ∨ T ≡ T
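Each of these laws can be verified mechanically by comparing truth tables. The sketch below (our own illustration, not part of the textbook) encodes one representative form of each law as a pair of Python functions over the truth values of α, β and γ, and checks that the two sides agree under all interpretations:

```python
from itertools import product

# Each entry pairs the left- and right-hand side of a law as functions of the
# truth values of alpha (a), beta (b) and gamma (c).
laws = {
    "commutativity (and)": (lambda a, b, c: a and b, lambda a, b, c: b and a),
    "associativity (or)":  (lambda a, b, c: (a or b) or c,
                            lambda a, b, c: a or (b or c)),
    "distributivity":      (lambda a, b, c: a and (b or c),
                            lambda a, b, c: (a and b) or (a and c)),
    "complement (and)":    (lambda a, b, c: a and (not a), lambda a, b, c: False),
    "double negation":     (lambda a, b, c: not (not a), lambda a, b, c: a),
    "identity (and)":      (lambda a, b, c: a and True, lambda a, b, c: a),
    "idempotence (or)":    (lambda a, b, c: a or a, lambda a, b, c: a),
    "boundedness (or)":    (lambda a, b, c: a or True, lambda a, b, c: True),
}

def equivalent(lhs, rhs):
    """Two wffs are equivalent iff they agree under all 8 interpretations."""
    return all(bool(lhs(a, b, c)) == bool(rhs(a, b, c))
               for a, b, c in product([True, False], repeat=3))

for name, (lhs, rhs) in laws.items():
    assert equivalent(lhs, rhs), name
print("all laws verified")
```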

Notes on Chapter 14

Section 14.1.1

Nilsson sometimes uses the set notation to refer to a single clause, i.e. to a disjunction of literals. This can be quite
confusing, since he also uses the set notation to refer to a set of clauses. For example, sometimes {P,Q,R} refers
to the single clause P ∨ Q ∨ R, and sometimes it refers to the set consisting of the clauses P, Q and R. We
recommend that you use the set notation only when referring to a set of clauses, and that you write a single clause
as a disjunction of literals. We list all the instances where he uses the set notation, instead of disjunction, to write
a single clause:

Section	Clause
14.1.1	{P,Q,¬R} should be P ∨ Q ∨ ¬R.
14.1.1	{} should be nil. (We don't use the empty set for the empty clause.)
14.1.2	{λ} ∪ Σ1 and {¬λ} ∪ Σ2 both refer to single clauses.
14.1.3	{λ} ∪ Σ1 and {¬λ} ∪ Σ2 both refer to single clauses.

Section 14.2

In step 3 of the conversion to clause form, the commutative law has to be applied before the distributive law can be
applied. That is, after step 2,

(P ∧ ¬Q) ∨ (¬R ∨ P)
≡ (¬R ∨ P) ∨ (P ∧ ¬Q)	by commutativity
≡ (¬R ∨ P ∨ P) ∧ (¬R ∨ P ∨ ¬Q)	by distributivity and associativity
≡ (¬R ∨ P) ∧ (¬R ∨ P ∨ ¬Q)	by idempotence
≡ (P ∨ ¬R) ∧ (¬Q ∨ ¬R ∨ P)	by commutativity
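Since every step applies one of the equivalence laws, the resulting CNF must agree with the original wff under all interpretations. This can be confirmed by brute force (our own sketch, not part of the textbook):

```python
from itertools import product

# The original wff after step 2, and the CNF obtained above.
dnf = lambda P, Q, R: (P and not Q) or ((not R) or P)
cnf = lambda P, Q, R: (P or not R) and (not Q or not R or P)

# The two wffs must agree under all 8 interpretations of P, Q, R.
assert all(dnf(*v) == cnf(*v) for v in product([True, False], repeat=3))
print("conversion preserves equivalence")
```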

Omit the explanation on page 233 on how to convert DNF to CNF using a matrix. You may not use it. You may only
convert a DNF to CNF using the laws listed above.

Section 14.4.2

Set of Support

In this strategy you select a subset of the input (the initial clauses) and call it your set of support. Then you apply
resolution refutation, but one of the clauses being resolved must be a member of the set of support, or a descendant
of a member of the set of support. Nilsson chooses the negation of the goal as his set of support, but there are other
choices as well. Consider the example below.

The initial set of clauses:


1) A
2) ¬C
3) ¬A ∨ ¬B

Prove: (A ∧ ¬B ∧ ¬C).

The negation of this goal is the set of support:


4) ¬A ∨ B ∨ C

Proof:

5) ¬A ∨ B (Resolve 2 and 4. 4 comes from the set of support.)


6) ¬A (Resolve 3 and 5. 5 is a descendant of 4.)
7) Nil (Resolve 1 and 6. 6 is a descendant of 5, and therefore of 4 as well.)
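The proof above can be replayed mechanically. The sketch below is our own minimal propositional resolution prover with a set-of-support restriction; clauses are frozensets of literal strings, with "~" marking negation (a representation we choose for this illustration, not Nilsson's):

```python
def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolvents(c1, c2):
    """All clauses obtainable by resolving c1 and c2 on one literal."""
    return [(c1 - {lit}) | (c2 - {negate(lit)})
            for lit in c1 if negate(lit) in c2]

def refute(clauses, support):
    """Breadth-first resolution; one parent must be in, or derived from,
    the set of support.  Returns True iff the empty clause (nil) is derived."""
    derived = set(support)
    frontier = list(support)
    while frontier:
        c = frontier.pop(0)
        for other in clauses | derived:
            for r in resolvents(c, other):
                if not r:
                    return True            # derived nil
                if r not in derived:
                    derived.add(r)
                    frontier.append(r)
    return False

clauses = {frozenset({"A"}), frozenset({"~C"}), frozenset({"~A", "~B"})}
support = {frozenset({"~A", "B", "C"})}    # negation of the goal A ∧ ¬B ∧ ¬C
print(refute(clauses, support))            # True: nil is derivable
```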

Section 14.5

There are three types of Horn clauses:


• A single atom, e.g. P. This clause consists of a single positive literal, and is also called a fact.
• An implication, e.g. P ∧ Q ⊃ R. This implication is written in clause form as
¬P ∨ ¬Q ∨ R. Note that all the positive literals appearing in the antecedent of the implication are
negative literals in the clause, and that the consequent of the implication is the only positive literal
in the clause. A clause with a single positive literal, and one or more negative literals, is called a rule.
• A disjunction of negative literals, e.g. ¬P ∨ ¬Q. This can be written as the implication
P ∧ Q ⊃ F. Note that we cannot write this as the implication P ∧ Q ⊃ , because the latter is not
syntactically correct. A clause with no positive literals, and one or more negative literals, is called a
goal.
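The three types can be recognised mechanically by counting positive literals. A small sketch (our own illustration), with a clause written as a list of literal strings and "~" marking negation:

```python
def classify_horn(clause):
    """Classify a clause (list of literals, '~' = negation) as a Horn type.

    fact: a single positive literal; rule: one positive plus negative
    literals; goal: only negative literals.  Returns None for the empty
    clause or a non-Horn clause (more than one positive literal).
    """
    positives = [lit for lit in clause if not lit.startswith("~")]
    negatives = [lit for lit in clause if lit.startswith("~")]
    if len(positives) > 1:
        return None                      # not a Horn clause
    if positives and not negatives:
        return "fact"
    if positives and negatives:
        return "rule"
    if negatives:
        return "goal"
    return None                          # empty clause

print(classify_horn(["P"]))              # fact
print(classify_horn(["~P", "~Q", "R"]))  # rule: P ∧ Q ⊃ R
print(classify_horn(["~P", "~Q"]))       # goal: P ∧ Q ⊃ F
```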

Notes on Chapter 15

Section 15.2

The language of the predicate calculus has a number of different components, and one should take care to combine
them in a syntactically correct fashion:
• Terms are built up using object constants, variables (introduced in Section 15.4) and function
constants. Terms cannot be combined using connectives, and they are not wffs. Terms make up
the arguments of relation constants.
• Composite wffs are built up from relation constants using the connectives of propositional calculus,
and the quantifiers ∀ and ∃ (introduced in Section 15.4).

Section 15.4

After introducing variables and quantifiers, Nilsson defines a closed wff or closed sentence as one in which every
occurrence of every variable x occurs within the scope of a quantification (∀x) or (∃x). If x falls within the scope of
a quantification (∀x) or (∃x), we say that x is bound by the quantifier. If a variable is not bound, it is called free. The
explanation in the textbook is quite cryptic, and we suggest that you refer back to your COS161 textbook for a more
detailed explanation. Note that what Nilsson calls a closed sentence is in some other textbooks simply called a
sentence (i.e. a sentence is a closed wff). An open wff is one in which not all occurrences of all the variables are
bound. Here are some examples:

(∃x) P(x,y): The variable x is bound, but the variable y is free.

(∃x) Q(x) ∧ (∃y) Q(y): Both x and y are bound. The given wff is therefore closed.
Q(x) ∧ (∃y) Q(y): x is free, and y is bound.
Q(y) ∧ (∃y) Q(y): The first occurrence of y is free, but the remaining occurrences of y are
bound.
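The bound/free distinction can be made precise with a small program. In the sketch below (our own illustration) a wff is a nested tuple, and single lower-case letters in argument positions are assumed to be variables; the function returns the set of free variables, so a wff is closed exactly when the result is empty:

```python
# A wff is a nested tuple: ('P', 'x', 'y') is an atom with arguments x and y,
# ('and', w1, w2) and ('not', w) are connectives, and ('forall', 'x', w) and
# ('exists', 'x', w) are quantifications.  Single lower-case letters are
# treated as variables (an assumption of this sketch); anything else is a
# constant.
def free_vars(wff, bound=frozenset()):
    op = wff[0]
    if op in ('forall', 'exists'):
        return free_vars(wff[2], bound | {wff[1]})
    if op == 'not':
        return free_vars(wff[1], bound)
    if op in ('and', 'or', 'implies'):
        return free_vars(wff[1], bound) | free_vars(wff[2], bound)
    # otherwise an atom: its arguments are terms
    return {t for t in wff[1:] if t.islower() and len(t) == 1} - bound

# (∃x) P(x,y): x is bound, y is free
print(free_vars(('exists', 'x', ('P', 'x', 'y'))))                  # {'y'}
# Q(y) ∧ (∃y) Q(y): the first occurrence of y is free, so y is reported free
print(free_vars(('and', ('Q', 'y'), ('exists', 'y', ('Q', 'y')))))  # {'y'}
```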

Section 15.5

Nilsson often writes a wff ω as ω(ξ) to indicate that the variable ξ occurs in the wff ω. This does not mean that ξ is
the only variable occurring in ω. For example, if ω is the wff (∃x) Q(x) ∧ (∃y) Q(y), we may write it as ω(x) or ω(y) or
ω(x,y), depending on which of x and/or y we are concerned with at the moment.

In Section 15.5.3, the idea of replacing a variable with another variable is mentioned. In Chapter 16 you will again
come across this idea. You should take extreme care when replacing variables with terms. We will defer a more in-
depth discussion on this topic until Chapter 16, but for now, write "Careful! η must be a new variable which does not
already occur in ω" in the margin next to the equivalence
(∀ξ)ω(ξ) ≡ (∀η)ω(η).

A similar comment holds for his replacement of the constant symbol α with the variable ξ in the Existential
generalization rule in Section 15.5.4. Add the following sentence: "Careful! ξ must be a new variable which does not
already occur in ω." Here is an example of what may go wrong without this addition: Let ω be the sentence (∃x)P(x,c).
Consider the interpretation with domain the natural numbers, P the "larger than" relation, and c the constant 0. Our
sentence states that there exists a natural number x which is larger than 0. Now apply the existential generalization
rule, replacing c by the variable x, and adding (∃x) to the front of ω to get:
(∃x)(∃x)P(x,x).
The first occurrence of (∃x) has no effect, since it is immediately followed by another occurrence of (∃x). We have
deduced that there exists a natural number x which is larger than itself!

Solution to Exercise 15.1 on page 250

There are many possible models. We have to decide which property of natural numbers we are going to map the
predicates On and Clear to, and which numbers we are going to map the constants A, B, C, and Fl to. Then we have
to check that each of the formulas is true under this interpretation. Here is one possibility:

Fl maps to 0.
A maps to 2.
B maps to 1.
C maps to 1.
On maps to the relation >.
Clear maps to the property of being odd.

We now have to check that this interpretation is a model of the given formulas:

The statement "If 2 > 0 then 1 is odd" is true.
The statement "If 1 is odd and 1 is odd, then 2 > 1" is true.
The statement "1 is odd or 2 is odd" is true.
The statement "1 is odd" is true.
The statement "1 is odd" is true.

Since all the given formulas are true in our interpretation, it is a model.
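The same checks can be mirrored in code. The sketch below hard-codes the interpretation and evaluates each of the five statements as paraphrased above (these formulas are our reconstruction from the paraphrases, not copied from the exercise):

```python
# The interpretation: Fl maps to 0, A to 2, B and C to 1; On is ">" and
# Clear is "is odd".
Fl, A, B, C = 0, 2, 1, 1
On = lambda x, y: x > y
Clear = lambda x: x % 2 == 1
implies = lambda p, q: (not p) or q      # material implication

statements = [
    implies(On(A, Fl), Clear(B)),              # "If 2 > 0 then 1 is odd"
    implies(Clear(B) and Clear(C), On(A, B)),  # "If 1 is odd and 1 is odd, then 2 > 1"
    Clear(B) or Clear(A),                      # "1 is odd or 2 is odd"
    Clear(B),                                  # "1 is odd"
    Clear(C),                                  # "1 is odd"
]
print(all(statements))  # True: the interpretation is a model
```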

Notes on Chapter 16

Section 16.1

Nilsson again mentions that he sometimes uses the set notation to refer to a single clause. We list all the instances
where he uses the set notation, instead of disjunction, to write a single clause.

Section	Clause
16.2	(γ1 − {φ}) and (γ2 − {ψ}) are both single clauses.
16.2	{P(u), P(v)} and {¬P(x), ¬P(y)} are both single clauses.

Nilsson also uses curly braces where he should be using square brackets:

Section Clause
16.4 Replace the pairs of curly braces with pairs of square brackets in steps 5 and 6.
16.5 Replace the pair of curly braces with a pair of square brackets in point 1.

A substitution instance of an expression is obtained by substituting terms for variables in that expression. That is, we
replace variables by terms in the expression. Unlike in the existential generalization rule given in Section 15.5.4, we
may only replace variables. We may not replace constant symbols or functional expressions. For example, we may
substitute f(x) for y (i.e. replace y by f(x)), but we may not substitute y for f(x) (i.e. replace f(x) by y).

Section 16.2

Note that all the sets in this section refer to single clauses, and not to sets of clauses. Also note that the literals in
γ1 must be unifiable, and similarly for the literals in γ2. In the example given, P(u) and P(v) must be unifiable, and so
must ¬P(x) and ¬P(y). Don't confuse the generalized resolution rule given here with the common mistake described
in the last bulleted item in Section 14.1.2. For example, the resolvent of the clauses P(u) ∨ Q(v) and ¬P(x) ∨ ¬Q(y)
is not nil. There are two possible resolvents, namely Q(v) ∨ ¬Q(y) and P(u) ∨ ¬P(x).
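Generalized resolution depends on unifying the literals being resolved upon. Below is a minimal unification sketch (our own illustration; it omits the occurs check and assumes that single lower-case letters are variables, with tuples for functional expressions):

```python
def is_var(t):
    """A term is a variable if it is a single lower-case letter (assumption)."""
    return isinstance(t, str) and len(t) == 1 and t.islower()

def substitute(t, s):
    """Apply substitution s (a dict variable -> term) to term t."""
    if is_var(t):
        return substitute(s[t], s) if t in s else t
    if isinstance(t, tuple):                 # functional expression ('f', arg, ...)
        return (t[0],) + tuple(substitute(a, s) for a in t[1:])
    return t                                 # constant

def unify(t1, t2, s=None):
    """Return a most general unifier of t1 and t2 (no occurs check), or None."""
    if s is None:
        s = {}
    t1, t2 = substitute(t1, s), substitute(t2, s)
    if t1 == t2:
        return s
    if is_var(t1):
        return {**s, t1: t2}
    if is_var(t2):
        return {**s, t2: t1}
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and len(t1) == len(t2) and t1[0] == t2[0]):
        for a, b in zip(t1[1:], t2[1:]):
            s = unify(a, b, s)
            if s is None:
                return None
        return s
    return None

print(unify('u', 'x'))                 # {'u': 'x'}: P(u) unifies with P(x)
print(unify(('f', 'x'), ('f', 'A')))   # {'x': 'A'}
print(unify('A', 'B'))                 # None: distinct constants
```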

Section 16.4

Step 9 in the conversion to clause form will typically be repeated a number of times during a proof by resolution.
Before each step in the proof, it may be necessary to rename some of the variables in one of the clauses involved.

In order to obtain a better understanding of the conversion of arbitrary wffs to clause form, the solution to exercise
16.3(2), p 266, may be considered:

Exercise 16.3(2), p 266

We follow the steps provided in Section 16.4.

The original expression:

(∀x)[¬P(x) ⊃ (∀y)[(∀z)[Q(x,y)] ⊃ ¬(∀z)[R(y,x)]]]

1. Eliminate the implication signs:

(∀x)[¬P(x) ∨ (∀y)[(∀z)[Q(x,y)] ⊃ ¬(∀z)[R(y,x)]]]

(∀x)[¬P(x) ∨ (∀y)[¬(∀z)[Q(x,y)] ∨ ¬(∀z)[R(y,x)]]]

2. Reduce the scopes of the negation signs:

(∀x)[¬P(x) ∨ (∀y)[(∃z)[¬Q(x,y)] ∨ (∃z)[¬R(y,x)]]]

3. Standardize the variables:

(∀x)[¬P(x) ∨ (∀y)[(∃z)[¬Q(x,y)] ∨ (∃z)[¬R(y,x)]]]

(∀x)[¬P(x) ∨ (∀y)[(∃z)[¬Q(x,y)] ∨ (∃u)[¬R(y,x)]]]

4. Eliminate existential quantifiers:

(∀x)[¬P(x) ∨ (∀y)[¬Q(x,y)] ∨ [¬R(y,x)]]

5. Convert to prenex normal form:

(∀x)(∀y)[¬P(x) ∨ [¬Q(x,y)] ∨ [¬R(y,x)]]

6. The matrix is already in conjunctive normal form:

(∀x)(∀y)[¬P(x) ∨ [¬Q(x,y)] ∨ [¬R(y,x)]]

7. Eliminate the universal quantifiers:

¬P(x) ∨ [¬Q(x,y)] ∨ [¬R(y,x)]

8. The clauses are:

{¬P(x), ¬Q(x,y), ¬R(y,x)}

Figure 16.1 shows a proof tree. You may also write proofs linearly. For example, the given proof can be written as
follows:

1. ¬I(A,27)	Negation of wff to be proved
2. ¬P(x) ∨ ¬P(y) ∨ ¬I(x,27) ∨ ¬I(y,28) ∨ S(x,y)	Assumptions (2 to 7)
3. P(A)
4. P(B)
5. I(A,27) ∨ I(A,28)
6. I(B,27)
7. ¬S(B,A)

8. I(A,28)	From 1 and 5
9. ¬P(x) ∨ ¬P(A) ∨ ¬I(x,27) ∨ S(x,A)	From 2 and 8 using {A/y}
10. ¬P(x) ∨ ¬I(x,27) ∨ S(x,A)	From 3 and 9
11. ¬I(B,27) ∨ S(B,A)	From 4 and 10 using {B/x}
12. S(B,A)	From 6 and 11
13. Nil	From 7 and 12.
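Once clause 2 has been grounded with the substitutions used in the proof ({A/y} and {B/x}), every step is a propositional resolution. The sketch below (our own illustration) replays the proof with clauses as frozensets of literal strings, "~" marking negation:

```python
def resolve(c1, c2, lit):
    """Resolve c1 (containing lit) with c2 (containing its negation)."""
    neg = lit[1:] if lit.startswith("~") else "~" + lit
    assert lit in c1 and neg in c2
    return (c1 - {lit}) | (c2 - {neg})

cl1 = frozenset({"~I(A,27)"})
# Clause 2 with both substitutions {A/y} and {B/x} already applied:
cl2 = frozenset({"~P(B)", "~P(A)", "~I(B,27)", "~I(A,28)", "S(B,A)"})
cl3 = frozenset({"P(A)"})
cl4 = frozenset({"P(B)"})
cl5 = frozenset({"I(A,27)", "I(A,28)"})
cl6 = frozenset({"I(B,27)"})
cl7 = frozenset({"~S(B,A)"})

c8 = resolve(cl1, cl5, "~I(A,27)")    # I(A,28), step 8 of the proof
c9 = resolve(cl2, c8, "~I(A,28)")     # steps 9-11, grounded
c10 = resolve(c9, cl3, "~P(A)")
c11 = resolve(c10, cl4, "~P(B)")
c12 = resolve(c11, cl6, "~I(B,27)")   # S(B,A), step 12
c13 = resolve(c12, cl7, "S(B,A)")     # the empty clause, step 13
print(c13 == frozenset())             # True
```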

Section 16.7 is to be read only.

Notes on Chapter 17

Section 17.5 is to be read only.



©
UNISA
