CH 5,6,7
[Graph: nodes generated per move (log scale, 10 to 100,000) vs. search depth (10 to 50), comparing brute-force minimax with branch-and-bound pruning]
Efficiency of Branch and Bound
From the graph above we can see the advantage
of branch-and-bound pruning over the
brute-force minimax search.
For example:
at a budget of about a million nodes per move in the
8-puzzle, brute-force search reaches depth 25,
whereas branch-and-bound reaches depth 35.
About 40% deeper!
We can also see that we get better results for the
15-puzzle than for the 8-puzzle.
An Analytic Model
Is this surprising result specific to the
sliding-tile puzzles, or is it more general?
A model has been defined with a uniform
branching factor and depth, in which each edge is
independently assigned a cost of 0 or 1 with probability p.
This model represents the tile puzzle: in the tile
puzzle, each movement of a tile either increases
or decreases the h value by one.
An Analytic Model
Since each move increases the g function
by one, the f function either increases by 2 or
does not increase at all.
It has been proved that if the expected number of
zero-cost edges below each node is less than one,
finding the lowest-cost path takes exponential
time, while if it is greater than one, the time
is polynomial.
An Analytic Model
For example:
if the probability is 0.5, then for a binary tree the
expected number of zero-cost edges per node is
2 * 0.5 = 1, whereas for a ternary tree the
expected number of zero-cost edges is
3 * 0.5 = 1.5!
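This threshold behavior can be checked numerically. The sketch below (illustrative names, not from the slides) computes the expected number of zero-cost edges per node and estimates by simulation how often a zero-cost root-to-leaf path survives when b * p > 1:

```python
import random

def expected_zero_cost_edges(b, p):
    """Expected number of zero-cost edges out of a node: b * p."""
    return b * p

def has_zero_cost_path(b, p, depth, rng):
    """Does a zero-cost root-to-leaf path exist in one random tree?"""
    if depth == 0:
        return True
    # An edge has cost 0 with probability p; short-circuits on the first hit.
    return any(rng.random() < p and has_zero_cost_path(b, p, depth - 1, rng)
               for _ in range(b))

rng = random.Random(0)
# b * p = 1.5 > 1: zero-cost paths survive to depth 12 in most sampled trees,
# which is why search time stays polynomial in this regime.
hits = sum(has_zero_cost_path(3, 0.5, 12, rng) for _ in range(2000))
```

When b * p < 1 the same simulation shows the zero-cost paths dying out, matching the exponential-time case described above.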
f(n) = g(n)+h(n)
The difference is that g(n) in RTA* is computed
differently than in A*: g(n) in RTA* is the distance
of node n from the current state, not from the
initial state.
The implementation stores the frontier in an open list,
and after each move the g value of every node
in the open list is updated relative to the new
current state.
RTA* - The Drawbacks
The time to make a move is linear in the size of
the open list.
It is not clear exactly how to update the g values.
It is not clear how to find the path to the next
destination node chosen from the open list.
But these problems can be solved in constant
time per move !
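One RTA* decision step can be sketched as follows, assuming the state space is given as a dictionary of (neighbor, edge cost) lists and h is a mutable table of heuristic values; all names are illustrative, and the sample graph reproduces the values from the example slide:

```python
def rta_star_step(state, graph, h):
    """One RTA* move: go to the neighbor minimizing f = edge cost + h, and
    store the SECOND-best f into h(state): the estimated cost of solving
    from here if the solver ever comes back."""
    f = sorted((cost + h[n], n) for n, cost in graph[state])
    h[state] = f[1][0] if len(f) > 1 else f[0][0]
    return f[0][1]

# The graph from the example slide (all edge costs 1):
graph = {'a': [('b', 1), ('c', 1), ('d', 1)],
         'b': [('a', 1), ('e', 1), ('i', 1)]}
h = {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4, 'i': 5}
first = rta_star_step('a', graph, h)     # moves to b; h(a) becomes f(c) = 3
second = rta_star_step(first, graph, h)  # moves back to a; h(b) becomes 5
```

Each step is constant time per neighbor, matching the claim that the per-move cost need not grow with the open list.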
RTA* - Example
[Figure: example graph. Start node a has neighbors b, c, d at edge cost 1, with h(b)=1, h(c)=2, h(d)=3; b leads on to e and i with h(e)=4, h(i)=5; deeper nodes j, k, m follow]
RTA* - Example
In the example, we start at node a and compute
f(b)=1+1=2, f(c)=1+2=3, f(d)=1+3=4.
The problem solver moves to b because its f value is
minimal, and updates h(a) to the second-best value:
h(a)=f(c)=3. It then generates nodes e and i and computes
f(e)=1+4=5, f(i)=1+5=6, f(a)=1+3=4.
The problem solver moves back to a and updates
f(b)=1+5=6, and so on.
RTA* - Example
As we can see, we will not get into an infinite loop
even though we allow backtracking, since each
time we gather more information and use it to
decide the next move.
Note: RTA* does not require a good admissible
heuristic and will find a solution in any case
(though a good heuristic function gives better
results).
RTA*'s running time is linear in the number of moves
made, and so is the size of the hash table it stores.
Completeness of RTA*
RTA* is complete under the following
restrictions:
The problem space must be finite.
A goal must be reachable from every state.
There can be no cycles in the graph with zero or
negative cost.
The heuristic values returned must be finite.
Correctness of RTA*
RTA* makes decisions based on limited
information, and therefore the quality of the
decision it makes is the best relative to the part of
the search space it has seen so far.
The nodes that need to be expanded by RTA* are
similar to the open list in A*.
The main difference is in the definitions of g and h.
The completeness of RTA* can be proved by
induction on the number of moves made.
Solution Quality vs.
Computation
We should also consider the quality of the
solution returned by RTA*.
This depends on the accuracy of the heuristic
function and on the search depth.
A choice must be made among families of
heuristic functions: some are more accurate
but more expensive to compute, while others
are less accurate but cheaper to compute.
Learning-RTA* (LRTA*)
Until now, RTA* solved the problem over a single
trial.
We would now like to improve the algorithm so that
it also performs well over multiple problem-solving
trials.
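The change from RTA* can be sketched as follows; the only difference is the value stored back into h (the best f rather than the second best), which keeps the learned values admissible so they can be reused across trials. Names and the sample graph are illustrative:

```python
def lrta_star_step(state, graph, h):
    """One LRTA* move: go to the neighbor with minimal f = cost + h, and
    store the BEST f back into h (RTA* stores the second best)."""
    best_f, best_n = min((cost + h[n], n) for n, cost in graph[state])
    h[state] = max(h[state], best_f)   # learned values never decrease
    return best_n

# Illustrative graph (edge costs 1) and initial heuristic table:
graph = {'a': [('b', 1), ('c', 1), ('d', 1)]}
h = {'a': 0, 'b': 1, 'c': 2, 'd': 3}
move = lrta_star_step('a', graph, h)   # moves to b; h(a) raised to 1 + h(b) = 2
```

Over repeated trials the stored h values converge toward the exact distances along frequently travelled routes.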
Learning-RTA* (LRTA*)
[Figure: the 24-puzzle, a 5x5 sliding-tile puzzle with tiles 1-24 and one blank]
Enhancing the Manhattan
distance
We can perform a single search for each tile,
starting from its goal position, and record how
many moves of that tile are required to move it
to every other position.
Doing this for all tiles results in a table which
gives, for every possible position of each tile, its
Manhattan distance from its goal position.
Then, since each move moves one tile, for a
given state we add the Manhattan distances of
each tile to get an admissible heuristic for the
state.
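The summation step can be sketched as follows, assuming a state is a tuple with state[i] giving the tile at position i, the goal placing tile t at index t, and 0 denoting the blank; these conventions are illustrative, not from the slides:

```python
def manhattan(state, n):
    """Admissible heuristic for the n x n sliding-tile puzzle: the sum of
    every tile's Manhattan distance to its goal square. The blank (0)
    contributes nothing, since only tile moves are counted."""
    total = 0
    for pos, tile in enumerate(state):
        if tile:
            total += abs(pos // n - tile // n) + abs(pos % n - tile % n)
    return total

goal_distance = manhattan(tuple(range(9)), 3)        # goal state: 0
one_move = manhattan((1, 0, 2, 3, 4, 5, 6, 7, 8), 3) # one tile displaced: 1
```

Since each move changes the position of exactly one tile by one square, the sum can never overestimate the true solution length.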
Enhancing the Manhattan
distance
However, this heuristic function is not accurate, since it
ignores the interactions between the tiles.
The obvious next step is to repeat the process on all
possible pairs of tiles.
In other words, for each pair of tiles, and each
combination of positions they could occupy, perform a
search to their goal positions, and count only moves of
the two tiles of interest. We call this value the pairwise
distance of the two tiles from their goal locations.
Enhancing the Manhattan
distance
Of course the goal is to find the shortest path from
the goal state to all possible positions of the two
tiles, where only moves of the two tiles of interest
are counted.
For almost all pairs of tiles and positions, their
pairwise distances will equal the sum of their
Manhattan distances from their goal positions.
However, there are three types of cases where the
pairwise distance exceeds the combined
Manhattan distance.
Enhancing the Manhattan
distance - the first case
1 Two tiles are in the same row or column but are reversed
relative to their goal positions.
In order to reach the tiles' goal positions, one tile must
move up or down to enable the other to reach its goal
location, and then return to the row and move back to its
place.
[Figure: two tiles reversed in the same row; the pairwise distance exceeds the sum of their Manhattan distances by 2]
Enhancing the Manhattan
distance - the second case
2 The corners of the puzzle.
[Figure: the 3 and 4 tiles at a corner of the puzzle; the pairwise distance exceeds the sum of their Manhattan distances by 2]
If the 3 tile is in its goal position, but some tile other than the 4 is in
the 4 position, the 3 tile will have to move temporarily to correctly
position the 4 tile. This requires two moves of the 3 tile, one to
move it out of position, and another to move it back. Thus the
pairwise distance of the two tiles exceeds the sum of their
Manhattan distances by two moves.
Enhancing the Manhattan
distance - the third case
3 In the last moves of the solution.
[Figure: the last moves of the solution; the pairwise distance exceeds the sum of the Manhattan distances by 2]
[Figure: a game tree viewed as an AND/OR tree, with OR nodes for our moves and AND nodes for the opponent's]
Minimax
Minimax theorem - every two-person zero-sum game
is either a forced win for one player or a forced draw
for either player; in principle, these optimal minimax
strategies can be computed.
Example - tic-tac-toe
[Figure: tic-tac-toe positions evaluated as (open lines for X) minus (open lines for O), e.g. 3-0 = 3 and 2-0 = 2]
Behavior of Evaluation Function
[Figure: candidate moves for X scored with the same evaluation function, e.g. 4-3 = 1 and 4-2 = 2]
Minimax search
Search as deeply as possible given the computational
resources of the machine and the time constraints on
the game.
Evaluate the nodes at the search frontier by the
heuristic function.
Where MIN is to move, save the minimum of its
children's values. Where MAX is to move, save the
maximum of its children's values.
A move is made to the child of the root with the largest or
smallest value, depending on whether MAX or MIN is
moving.
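The procedure above can be sketched as a recursive function; `children` and `evaluate` are caller-supplied stand-ins, and the example tree reproduces the leaf values from the example slide:

```python
def minimax(node, depth, maximizing, children, evaluate):
    """Depth-limited minimax. `children(node)` yields successors and
    `evaluate(node)` is the heuristic static evaluation."""
    succ = children(node)
    if depth == 0 or not succ:
        return evaluate(node)          # frontier: static evaluation
    values = [minimax(c, depth - 1, not maximizing, children, evaluate)
              for c in succ]
    return max(values) if maximizing else min(values)

# The example tree: 16 leaves, alternating MAX/MIN levels, root is MAX.
tree = ((((4, 5), (3, 2)), ((6, 7), (8, 9))),
        (((1, 10), (2, 11)), ((12, 13), (14, 14))))

def children(node):
    return node if isinstance(node, tuple) else ()

def evaluate(node):
    return node                        # leaves are their own values

root_value = minimax(tree, 4, True, children, evaluate)  # backs up to 4
```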
Minimax search - Example
[Figure: a minimax tree; leaf values 4 5 3 2 6 7 8 9 1 10 2 11 12 13 14 14 back up through alternating MIN and MAX levels to a root (MAX) value of 4]
Alpha-Beta Pruning
[Figure: alpha-beta pruning on the example tree; interior bounds such as i >= 6 at a MAX node and n <= 2 at a MIN node allow whole subtrees to be cut off]
Alpha-Beta
Deep pruning - the right half of the tree in the example.
The next slide shows code for alpha-beta pruning:
MAXIMIN - assumes that its argument node is a
maximizing node.
MINIMAX - the symmetric counterpart, for a minimizing node.
V(N) - heuristic static evaluation of node N.
MAXIMIN(node: N, lowerbound: alpha, upperbound: beta)
  IF N is at the search depth, RETURN V(N)
  FOR each child Ni of N
    value := MINIMAX(Ni, alpha, beta)
    IF value > alpha, alpha := value
    IF alpha >= beta, RETURN alpha
  RETURN alpha
(MINIMAX is symmetric: it tightens beta with its children's MAXIMIN values, returns beta on a cutoff, and finally RETURNs beta.)
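A runnable version of the MAXIMIN/MINIMAX pair, under the same fail-hard convention as the pseudocode (names follow the slide; the tree is the earlier example, and `children`/`evaluate` are illustrative stand-ins):

```python
INF = float('inf')

def maximin(N, depth, alpha, beta, children, evaluate):
    """MAXIMIN from the slide: N is a maximizing node."""
    succ = children(N)
    if depth == 0 or not succ:
        return evaluate(N)             # V(N), the static evaluation
    for Ni in succ:
        value = minimax_ab(Ni, depth - 1, alpha, beta, children, evaluate)
        if value > alpha:
            alpha = value
        if alpha >= beta:
            return alpha               # cutoff: MIN will never allow this line
    return alpha

def minimax_ab(N, depth, alpha, beta, children, evaluate):
    """The symmetric counterpart for minimizing nodes."""
    succ = children(N)
    if depth == 0 or not succ:
        return evaluate(N)
    for Ni in succ:
        value = maximin(Ni, depth - 1, alpha, beta, children, evaluate)
        if value < beta:
            beta = value
        if alpha >= beta:
            return beta                # cutoff: MAX will never allow this line
    return beta

tree = ((((4, 5), (3, 2)), ((6, 7), (8, 9))),
        (((1, 10), (2, 11)), ((12, 13), (14, 14))))

def children(node):
    return node if isinstance(node, tuple) else ()

def evaluate(node):
    return node

root_value = maximin(tree, 4, -INF, INF, children, evaluate)  # same answer: 4
```

Pruning never changes the value backed up to the root; it only avoids visiting subtrees that cannot affect it.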
Performance of Alpha-Beta
Efficiency depends on the order in which the
nodes are encountered at the search frontier.
Optimal - effective branching factor b^(1/2) - if the
largest child of a MAX node is generated first, and
the smallest child of a MIN node is generated first.
Worst - b.
Average - b^(3/4), with random ordering.
Games with chance
Chance nodes: nodes where chance events
happen (rolling dice, flipping a coin, etc.).
Evaluate a chance node by averaging over outcome
probabilities:
value(C) = sum over i of P(di) * best value in S(C, di), where
C is a chance node,
P(di) is the probability of rolling di,
S(C, di) is the set of positions generated by
applying all legal moves for roll di to C.
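The averaging rule can be sketched as follows; the outcome set, probabilities, and successor function are illustrative stand-ins (a fair coin rather than dice):

```python
def expected_value(chance_node, rolls, prob, successors, value):
    """Chance-node value: the sum over outcomes d of P(d) times the best
    position value in S(C, d). All parameter names are illustrative."""
    return sum(prob(d) * max(value(s) for s in successors(chance_node, d))
               for d in rolls)

# Toy example: a fair coin; heads offers positions worth 3 or -1, tails 2.
outcomes = {'H': [3, -1], 'T': [2]}
ev = expected_value('C', ['H', 'T'],
                    prob=lambda d: 0.5,
                    successors=lambda c, d: outcomes[d],
                    value=lambda s: s)   # 0.5*3 + 0.5*2 = 2.5
```

In a full game tree, `value(s)` would itself be a recursive minimax call rather than a static number.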
Games with chance
[Figures: a backgammon board, and a search tree with chance nodes whose values are probability-weighted averages of their children]
Additional Enhancements
A number of additional enhancements have been
developed to improve performance under limited
computation.
We briefly discuss the most important of these
below.
Node Ordering
By using node ordering we can get close to the optimal b^(1/2).
Instead of generating the tree left-to-right,
node ordering reorders the tree based on the static
evaluations of the interior nodes.
To save space, only the immediate children are
reordered after the parent is fully expanded.
Iterative Deepening
Another idea is to use iterative deepening. In
two-player games played under a clock, when time
runs out, the move recommended by the last
completed iteration is made.
Iterative deepening can be combined with node ordering
to improve pruning efficiency: instead of the
heuristic value, we can use the value from the
previous iteration.
Quiescence
Quiescence search makes a secondary search
from a position whose value is unstable.
This way a stable evaluation is obtained.
Transposition Tables
For efficiency, it is important to detect when a
state has already been searched.
To detect such states, previously generated
game states, together with their minimax values,
are saved in a transposition table.
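A minimal transposition-table sketch, assuming game states are hashable; real game programs usually key the table with Zobrist hashes, which is not shown here:

```python
table = {}

def minimax_tt(state, depth, maximizing, children, evaluate):
    """Minimax with a transposition table keyed on (state, depth, player)."""
    key = (state, depth, maximizing)
    if key in table:                   # this state was already searched
        return table[key]
    succ = children(state)
    if depth == 0 or not succ:
        value = evaluate(state)
    else:
        results = [minimax_tt(c, depth - 1, not maximizing, children, evaluate)
                   for c in succ]
        value = max(results) if maximizing else min(results)
    table[key] = value                 # cache the result for transpositions
    return value

tree = ((((4, 5), (3, 2)), ((6, 7), (8, 9))),
        (((1, 10), (2, 11)), ((12, 13), (14, 14))))

def children(node):
    return node if isinstance(node, tuple) else ()

def evaluate(node):
    return node

root_value = minimax_tt(tree, 4, True, children, evaluate)
```

In games where different move orders reach the same position, the cache avoids re-searching whole subtrees.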
Opening Book
Best-First Minimax Search - Example
[Figures: a sequence of trees in which the principal leaf - the leaf reached by following the best move at each level from the root - is repeatedly expanded and the backed-up values are revised]
Best First search
Full-width search is good insurance against
missing a move (and making a mistake).
Most game programs that use selective search
use a combined algorithm that starts with a full-width
search to a nominal depth, and then
searches more selectively below that depth.