CH 5,6,7


Introduction

 One of the major problems of the algorithm schemas we have seen so far is that they take exponential time to find the optimal solution.
 Therefore, these algorithms are usually used for solving relatively small problems.
 However, most often we do not need the optimal solution; a near-optimal solution can satisfy most “real” problems.
Introduction
 A serious drawback of the above algorithms is that they must search all the way to a complete solution before making a commitment to even the first move of the solution.
 The reason is that an optimal first move cannot be guaranteed until the entire solution is found and shown to be at least as good as any other solution.
Two Player Games
 Heuristic search in two-player games adopts an entirely different set of assumptions.
 Example - a chess game:
 the actions are made before the consequences are known
 there is a limited amount of time
 a move that has been made can’t be revoked.

Real-Time Single-Agent Search
 Our goal is to apply the assumptions of two-player games to single-agent heuristic search.
 So far we had to examine all of the available moves, and whenever we backtracked, each move that had been tried was a waste of time from which we gained no information.
Minimin Lookahead Search
 Similar to the minimax search we used in two-player games, we will use the minimin algorithm for the single problem-solving agent.
 This algorithm always looks for the minimal route to the goal by choosing the minimal next node each time, because there is only one player making all of the decisions.
Minimin Lookahead Search
 The search proceeds by running a minimin lookahead in planning mode, and at the end of the search we execute the best move that was found. From that point we repeat the lookahead search procedure.
 There are a few heuristics that can be used in this algorithm:
 A* heuristic function: f(n) = g(n) + h(n)
 Fixed-depth heuristic function: search to a fixed g(n) cost.
 Fixed f(n) cost: search the frontier for the minimal node.
Minimin Lookahead Search
 If a goal state is encountered before the search horizon, then the path is terminated and a heuristic value of zero is assigned to the goal.
 If a path ends in a non-goal dead end before the horizon is reached, then a heuristic value of infinity is assigned to the dead-end node, guaranteeing that the path will not be chosen.
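
Below is a minimal Python sketch of minimin lookahead. The helper names successors(state) (yielding (move, next_state, edge_cost) triples), h(state), and is_goal(state) are assumptions for illustration, not part of the original slides.

    INFINITY = float("inf")

    def minimin(state, depth, g, successors, h, is_goal):
        # minimal f = g + h over the frontier below state
        if is_goal(state):
            return g                    # goal before the horizon: h = 0
        if depth == 0:
            return g + h(state)         # frontier node: static evaluation
        best = INFINITY                 # a dead end stays at infinity
        for _move, child, cost in successors(state):
            best = min(best, minimin(child, depth - 1, g + cost,
                                     successors, h, is_goal))
        return best

    def best_first_move(state, horizon, successors, h, is_goal):
        # plan with minimin lookahead; return only the best first move
        return min(successors(state),
                   key=lambda mc: minimin(mc[1], horizon - 1, mc[2],
                                          successors, h, is_goal))[0]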
Branch-and-Bound Pruning
 An obvious question is whether every frontier
node must be examined to find one of minimum
cost.
 If we allow heuristic evaluations of interior nodes, then pruning is possible. By using an admissible f function, we can apply the branch-and-bound method to reduce the number of nodes checked.
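
As a sketch of how the pruning changes the minimin code above (same assumed helpers), a partial path whose admissible f value already reaches the best complete frontier value found so far can be cut off:

    def minimin_bb(state, depth, g, alpha, successors, h, is_goal):
        # alpha: best (lowest) complete frontier value found so far;
        # the initial call passes alpha = INFINITY
        if is_goal(state):
            return g
        if depth == 0:
            return g + h(state)
        for _move, child, cost in successors(state):
            if g + cost + h(child) >= alpha:
                continue                # admissible f cannot improve on alpha
            alpha = min(alpha, minimin_bb(child, depth - 1, g + cost,
                                          alpha, successors, h, is_goal))
        return alpha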
Efficiency of Branch and Bound
[Figure: nodes generated per move (log scale, 10 to 1,000,000) versus search depth (10 to 50), comparing brute-force minimin search with branch-and-bound pruning on the Eight and Fifteen Puzzles.]
Efficiency of Branch and Bound
 From the graph above we can see the advantage of branch-and-bound pruning versus the brute-force minimin search.
 For example:
 on a scale of a million nodes per move, in the Eight Puzzle brute-force search reaches a depth of 25 moves, whereas branch-and-bound reaches 35 moves - an improvement of about 40%!
 We can also see that we get better results for the Fifteen Puzzle than for the Eight Puzzle.
An Analytic Model
 Is the former surprising result special to the sliding-tile puzzles, or is it more general?
 A model has been defined in which each edge is assigned a cost of 0 or 1, with probability p of a zero-cost edge, on a tree of uniform branching factor and depth.
 This model represents the tile puzzle: in the tile puzzle, each movement of a tile either increases or decreases the h value by one.
An Analytic Model
 Since for each move the g function increases by one, the f function either increases by 2 or doesn’t increase at all.
 It has been proved that if the expected number of zero-cost edges below a node is less than one, finding the lowest-cost route takes exponential time, while if it is more than one, the time is polynomial.
An Analytic Model
 For example:
 if the probability is 0.5, then for a binary tree the expected number of zero-cost edges below a node is 2*0.5 = 1, whereas for a ternary tree it is 3*0.5 = 1.5!
 We see that a ternary tree can be searched more efficiently than a binary tree!
An Analytic Model
 But this model is not so accurate, for a number of reasons:
 We can predict the results only up to a certain point.
 The model applies only to a limited depth, because it assumes the probability of a zero-cost edge is the same for all edges, whereas in the sliding-tile puzzle the probability is not the same for every node beyond some depth (the probability of a positive-cost edge increases).
Real-Time-A* (RTA*)
 So far, we only found the solution one move at a time, but not for several moves.
 The initial idea would be to repeat the single-move procedure several times. But that leads to several problems.
Real-Time-A* (RTA*)
Problems:
 We might move to a node that has already been visited, and end up in an infinite loop.
 If we don’t allow visiting previously visited nodes, then we may reach a state all of whose neighbors have already been visited.
 Due to the limited information known in each state, we want to allow backtracking in cases where we won’t repeat the same moves from that state.
Real-Time-A* (RTA*)
Solution:
 We should allow backtracking only if the cost of returning to a previous point plus the estimated cost from there is less than the estimated cost from the current point.
 Real-Time-A* (RTA*) is an efficient algorithm for implementing this solution.
RTA* Algorithm
 In RTA* the value of f(n) for node n is as in A*:
f(n) = g(n) + h(n)
 The difference is that g(n) in RTA* is computed differently than in A*: g(n) of RTA* is the distance of node n from the current state, not from the initial state.
 The frontier is stored in an open list, and after each move we update the g value of each node in the open list relative to the new current state.
RTA* - The Drawbacks
 The time to make a move is linear in the size of
the open list.
 It is not clear exactly how to update the g values.
 It is not clear how to find the path to the next destination node chosen from the open list.
But these problems can be solved in constant time per move!
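
A compact Python sketch of RTA* follows (same assumed helpers as before, plus an initial heuristic h0(state); states are assumed hashable). Learned h values live in a hash table, so each move costs time linear only in the number of neighbors:

    INFINITY = float("inf")

    def rta_star(start, successors, h0, is_goal, max_steps=100000):
        h = {}                                  # learned heuristic values
        state, path = start, [start]
        for _ in range(max_steps):
            if is_goal(state):
                return path
            # f of each neighbor measured from the current state: cost + h
            scored = sorted(((c + h.get(s, h0(s)), m, s)
                             for m, s, c in successors(state)),
                            key=lambda t: t[0])
            # store the SECOND-best f as h of the state being left: if we
            # ever return here, the best move will already have been taken
            h[state] = scored[1][0] if len(scored) > 1 else INFINITY
            state = scored[0][2]                # move to the best neighbor
            path.append(state)
        return None                             # step bound exceeded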
RTA* - Example

[Figure: example graph. The agent starts at node a; its neighbors b, c, and d are one move away with h(b)=1, h(c)=2, h(d)=3, and b leads on to e and i with h(e)=4, h(i)=5.]
RTA* - Example
 In the example, we start at node a and compute
f(b)=1+1=2, f(c)=1+2=3, f(d)=1+3=4.
 The problem solver moves to b because it is minimal, and stores h(a)=f(c)=3, the second-best f value. Then it generates nodes e and i and computes
f(e)=1+4=5, f(i)=1+5=6, f(a)=1+3=4.
 The problem solver moves back to a and updates f(b)=1+5=6, and so on.
RTA* - Example
 As we can see, we won’t get into an infinite loop even though we do allow backtracking, since each time we gather more information and decide the next move according to it.
 Note: RTA* does not require a good admissible heuristic function and will find a solution in any case (though a good heuristic function will give better results).
 RTA*’s running time is linear in the number of moves made, and so is the size of the hash table stored.
Completeness of RTA*
 RTA* is complete under the following restrictions:
 The problem space must be finite.
 A goal must be reachable from every state.
 There can’t be cycles in the graph with zero or negative cost.
 The heuristic values returned must be finite.
Correctness of RTA*
 RTA* makes decisions based on limited
information, and therefore the quality of the
decision it makes is the best relative to the part of
the search space it has seen so far.
 The nodes that need to be expanded by RTA* are
similar to the open list in A*.
 The main difference is in the definitions of g and h.
 The correctness of RTA* can be proved by induction on the number of moves made.
Solution Quality vs.
Computation
 We should also consider the quality of the
solution that is returned by RTA*.
 This depends on the accuracy of the heuristic
function and the search depth.
 A choice must be made among a family of heuristic functions, some of which are more accurate but more expensive to compute, while others are less accurate but simpler to compute.
Learning-RTA* (LRTA*)
 Until now, RTA* solved the problem over a single problem-solving trial.
 We would now like to improve the algorithm so that it is also good over multiple problem-solving trials.
Learning-RTA* (LRTA*)

 The algorithm is the same, except for one change that makes it suitable for the new problem:
the algorithm stores the best value of the heuristic function each time, instead of the second-best value.
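
In terms of the RTA* sketch above (illustrative code, not the slides’ own), the change is a single line: store the best neighbor f rather than the second best, which keeps admissible h values admissible across trials:

    def lrta_update(h, state, scored):
        # scored: neighbor (f, move, state) triples sorted by f, as in rta_star
        h[state] = scored[0][0]     # LRTA*: best f (RTA* stores scored[1][0])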
Convergence of LRTA*
 An important advantage of LRTA* is that, through the repetition of problem-solving trials, the heuristic values become the exact values!
 This advantage holds under the following circumstances:
 The initial and goal states are chosen randomly.
 The initial heuristic values are admissible, i.e. do not overestimate the distance to the nearest goal.
 Ties are broken randomly; otherwise, having found one optimal solution, we might keep finding the same one each time and never find the other paths to the goal.
Convergence of LRTA*

 Theorem 5.2: In a finite space with finite positive edge costs and non-overestimating initial heuristic values, in which a goal state is reachable from every state, over repeated trials of LRTA* the heuristic values will eventually converge to their exact values along every optimal path.
Conclusion
 In real-time large-scale applications, we can’t use the standard single-agent heuristic search algorithms, because of their high cost and the fact that they do not return a solution before searching the entire expanded tree.
 Minimin solves the problem for such cases.
 Branch-and-bound pruning greatly improves the results given by minimin.
 RTA* efficiently solves the problem of abandoning one trail for a better-looking one.
Conclusion
 RTA* guarantees finding a solution.
 RTA* makes optimal local decisions.
 The more lookahead is used, the higher the cost but the better the quality of the solution.
 The family of heuristics varies in the accuracy of the solution and in computational complexity.
 The optimal level of lookahead depends on the relative costs of simulating vs. executing moves.
 LRTA* is an algorithm that solves the problem over repeated problem-solving trials while preserving the completeness of the solution.
Heuristic from Relaxed Models

 A heuristic function returns the exact cost of reaching a goal in a simplified or relaxed version of the original problem.
 This means that we remove some of the constraints of the problem we are dealing with.
Heuristic from Relaxed Models -
Example

 Consider the problem of navigating in a network of roads from an initial location to a goal location.
 A good heuristic is to estimate the cost between two points as the straight-line distance.
 We remove the constraint of the original problem that we have to move along the roads, and assume that we are allowed to move in a straight line between two points. Thus we get a relaxation of the original problem.
Relaxation example - TSP problem
 We can describe the problem as a graph with 3 constraints:
1 Our tour covers all the cities.
2 Every node has degree two:
 an edge entering the node and
 an edge leaving the node.
3 The graph is connected.
 If we remove constraint 2:
we get a spanning graph, and the optimal solution to this problem is the MST (minimum spanning tree). A sketch of this bound appears below.
 If we remove constraint 3:
the graph need not be connected, and the optimal solution to this problem is the solution to the assignment problem.
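
A hedged sketch of the MST relaxation as an admissible TSP bound, using Prim’s algorithm in O(n^2); the names cities and dist (a symmetric distance matrix) are assumptions for illustration:

    def mst_cost(cities, dist):
        # cities: list of city indices; dist[i][j]: distance between i and j
        if not cities:
            return 0.0
        best = {c: dist[cities[0]][c] for c in cities[1:]}
        total = 0.0
        while best:
            c = min(best, key=best.get)         # cheapest city to attach
            total += best.pop(c)
            for other in best:                  # relax remaining attach costs
                best[other] = min(best[other], dist[c][other])
        return total                            # MST cost <= optimal tour cost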
Relaxation example - Tile Puzzle
problem

 One of the constraints in this problem is that a tile can only slide into the position occupied by the blank.
 If we remove this constraint, we allow any tile to be moved one position horizontally or vertically, and the cost of solving each tile is then its Manhattan distance to its goal location.
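
A sketch of the Manhattan-distance heuristic this relaxation yields (the state encoding - a tuple where state[pos] holds the tile at position pos and 0 is the blank - is an assumption for illustration):

    def manhattan(state, goal, width):
        total = 0
        for pos, tile in enumerate(state):
            if tile == 0:
                continue                        # the blank does not count
            goal_pos = goal.index(tile)
            total += (abs(pos % width - goal_pos % width) +    # columns
                      abs(pos // width - goal_pos // width))   # rows
        return total

    # Example: one move away from the Eight Puzzle goal
    goal = (1, 2, 3, 4, 5, 6, 7, 8, 0)
    state = (1, 2, 3, 4, 5, 6, 7, 0, 8)
    assert manhattan(state, goal, 3) == 1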
The STRIPS Problem formulation
 We would like to derive such heuristics
automatically.
 In order to do that we need a formal description
language that is richer than the problem space
graph.
 One such language is called STRIPS.
 In this language we have predicates and
operators.
 Let’s see a STRIPS representation of the Eight Puzzle problem.
STRIPS - Eight Puzzle Example
1 On(x,y) = tile x is in location y.
2 Clear(z) = location z is clear.
3 Adj(y,z) = location y is adjacent to location z.
4 Move(x,y,z) = move tile x from location y to location z.
In the language we have:
 A precondition list - for example, to execute Move(x,y,z) we must have: On(x,y), Clear(z), Adj(y,z).
 An add list - predicates that were not true before the operator was executed and are true after it.
 A delete list - a subset of the preconditions that are no longer true after the operator is executed.
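
An illustrative encoding of the Move operator in this style (the dict layout is an assumption for illustration; the predicate strings follow the slide):

    move = {
        "name":   "Move(x, y, z)",
        "pre":    ["On(x, y)", "Clear(z)", "Adj(y, z)"],
        "add":    ["On(x, z)", "Clear(y)"],
        "delete": ["On(x, y)", "Clear(z)"],
    }
    # Relaxation: dropping "Clear(z)" from move["pre"] gives the relaxed
    # problem whose exact solution cost is the Manhattan distance.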
STRIPS - Eight Puzzle Example

 Now, in order to construct a simplified or relaxed problem, we only have to remove some of the preconditions.
 For example, by removing Clear(z) we allow tiles to move to any adjacent location.
 In general, the hard part is to identify which relaxed problems have the property that their exact solution can be computed efficiently.
Admissibility and Consistency
 The heuristics that are derived by this method are both admissible and consistent.
 Note: the cost in the simplified graph should be as close as possible to the cost in the original graph.
 Admissibility means that the simplified graph has an equal or lower cost than the lowest-cost path in the original graph.
 Consistency means that a heuristic h satisfies, for every neighbor n’ of n,
h(n) ≤ c(n,n’) + h(n’)
where h(n) is the actual optimal cost of reaching a goal in the graph of the relaxed problem.
Heuristic from Multiple Subgoals
 We begin by presenting an alternative derivation of the Manhattan distance heuristic for the sliding-tile puzzles.
 Any description of this problem is likely to describe the goal state as a set of subgoals, where each subgoal is to correctly position an individual tile, ignoring the interactions with the other tiles.
Enhancing the Manhattan
distance
 In the Manhattan distance, for each tile we look for the optimal solution ignoring the other tiles, counting only moves of the tile in question.
 Therefore the heuristic function we get isn’t accurate.
[Figure: a Twenty-Four Puzzle (5x5) board position illustrating the Manhattan distance heuristic.]
Enhancing the Manhattan
distance
 We can perform a single search for each tile, starting from its goal position, and record how many moves of that tile are required to move it to every other position.
 Doing this for all tiles results in a table which gives, for every possible position of each tile, its Manhattan distance from its goal position.
 Then, since each move moves only one tile, for a given state we add the Manhattan distances of the tiles to get an admissible heuristic for the state. A sketch of building such a table appears below.
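
A sketch of precomputing that per-tile table with a breadth-first search from the tile’s goal position (assumed encoding: positions 0..width*width-1 on a square board; only moves of the tile in question are counted, so plain grid BFS reproduces the Manhattan distances):

    from collections import deque

    def tile_distance_table(goal_pos, width):
        # moves needed to bring the tile from each position to goal_pos
        dist = {goal_pos: 0}
        queue = deque([goal_pos])
        while queue:
            pos = queue.popleft()
            row, col = divmod(pos, width)
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                r, c = row + dr, col + dc
                if 0 <= r < width and 0 <= c < width:
                    nxt = r * width + c
                    if nxt not in dist:
                        dist[nxt] = dist[pos] + 1
                        queue.append(nxt)
        return dist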
Enhancing the Manhattan
distance
 However, this heuristic function isn’t accurate, since it ignores the interactions between the tiles.
 The obvious next step is to repeat the process on all possible pairs of tiles.
 In other words, for each pair of tiles, and each combination of positions they could occupy, perform a search to their goal positions, counting only moves of the two tiles of interest. We call this value the pairwise distance of the two tiles from their goal locations.
Enhancing the Manhattan
distance
 Of course the goal is to find the shortest path from
the goal state to all possible positions of the two
tiles, where only moves of the two tiles of interest
are counted.
 For almost all pairs of tiles and positions, their
pairwise distances will equal the sum of their
Manhattan distances from their goal positions.
 However, there are three types of cases where the
pairwise distance exceeds the combined
Manhattan distance.
Enhancing the Manhattan
distance - the first case
1 Two tiles are in the same row or column but are reversed relative to their goal positions.
In order to reach the goal positions, one tile must move up or down to enable the other one to get to its goal location, and then return to the row and move back into its place.
[Figure: two tiles reversed in the same row; cost relative to Manhattan: +2.]
Enhancing the Manhattan
distance - the second case
2 The corners of the puzzle.

[Figure: a corner of the puzzle with the 3 tile in place and a wrong tile in the 4 position; cost relative to Manhattan: +2.]
 If the 3 tile is in its goal position, but some tile other than the 4 is in the 4 position, the 3 tile will have to move temporarily to correctly position the 4 tile. This requires two moves of the 3 tile, one to move it out of position and another to move it back. Thus the pairwise distance of the two tiles exceeds the sum of their Manhattan distances by two moves.
Enhancing the Manhattan
distance - the third case
3 In the last moves of the solution.

[Figure: the 1 and 5 tiles near the upper-left corner in the last moves of the solution; cost relative to Manhattan: +2.]
A detailed explanation is in the next slide.

Enhancing the Manhattan
distance - the third case
 Before the last move, either the 1 or the 5 tile must be in the upper-left corner of the goal state. Thus, the last move must move either the 1 tile right or the 5 tile down.
 Since the Manhattan distance of these tiles is computed to their goal positions, unless the 1 tile is in the left-most column, its Manhattan distance will not accommodate a path through the upper-left corner. Similarly, unless the 5 tile is in the top row, its Manhattan distance will not accommodate a path through the upper-left corner.
 Thus, if the 1 tile is not in the left-most column and the 5 tile is not in the top row, we can add two moves to the sum of their Manhattan distances: the pairwise distance of the 1 and 5 tiles will be two moves greater than the sum of their Manhattan distances.
Enhancing the Manhattan
distance
 The states of these searches are distinguishable only by the positions of the two tiles and the blank, and hence there are (n choose 3) = O(n^3) different states, where n is the number of tiles.
 Since there are (n choose 2) = O(n^2) pairs of tiles, there are O(n^2) such searches to perform, for an overall time complexity of O(n^5). The size of the resulting table is O(n^4), one entry for each pair of tiles in each combination of positions.
Applying the Heuristics
 The next question is how to automatically handle the interactions between these individual heuristics to compute an overall admissible heuristic estimate for a particular state.
 If we represent a state as a graph with a node for each tile and an edge between each pair, weighted by the pairwise distance, we need to select a set of edges, no two sharing a node, such that the sum of the selected edges is maximized. This problem is called the maximum weighted matching problem, and can be solved in O(n^3) time, where n is the number of nodes.
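
A hedged sketch of this combination step; networkx’s max_weight_matching is a real library call, while tiles and pair_excess (each pair’s pairwise distance minus the sum of their Manhattan distances) are assumed inputs:

    import networkx as nx

    def matching_heuristic(tiles, pair_excess, manhattan_sum):
        # admissible bound: Manhattan sum plus the best disjoint pair excesses
        g = nx.Graph()
        for i, a in enumerate(tiles):
            for b in tiles[i + 1:]:
                if pair_excess[(a, b)] > 0:
                    g.add_edge(a, b, weight=pair_excess[(a, b)])
        matching = nx.max_weight_matching(g)   # disjoint pairs, max total weight
        return manhattan_sum + sum(g[a][b]["weight"] for a, b in matching)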
Higher-Order Heuristics
 Of course, when using the pairwise solution as a heuristic, it is still not the exact solution cost.
 For example, consider a case such as:
[Figure: a position in which the interaction involves more than two tiles.]
 Therefore, in order to get the full power of these heuristics, we need to extend the idea from pairs of tiles to triples, etc.
Pattern Databases
 In the tile puzzles seen earlier, each legal move moves only one tile and therefore affects only one subgoal.
 This enables us to add the heuristic estimates for the individual tiles.
 This isn’t the same for all problems.
 For example, in Rubik’s Cube each legal twist moves a large fraction of the individual cubies.
Pattern Databases
 The simple heuristic is a 3-dimensional Manhattan distance, where for every cubie we compute the minimum number of moves required to correctly position and orient it, and sum these values over all cubies.
 Here we have to divide the sum by 8, since every twist moves 8 cubies.
 A better heuristic is as before, but computing the sums of moves for the edge and corner cubies separately (in contrast to the previous heuristic, which lumps them together).
 For the edge cubies we divide the value by 4, and for the corner cubies we also divide the value by 4, since each twist moves 4 edge cubies and 4 corner cubies.
Pattern Databases
 We can compute the heuristic function by a table lookup, which is often more efficient since it saves time during execution of the program.
 The use of such tables is called pattern databases.
 A pattern database stores the number of moves needed to solve different patterns of subsets of the puzzle elements.
Pattern Databases
 For example, the Manhattan distance function is usually computed with the aid of a small table that contains the Manhattan distance of each cubie from all possible positions and orientations.
 The idea can be developed much further: for the 8 corner cubies, each cubie can be in one of 3 different orientations, but the orientation of the last cubie is determined by the other 7.
 This results in 8!*3^7 = 88,179,840 different states.
Pattern Databases
 We can use a breadth-first search and record in a table the number of moves required to solve each combination of corner cubies (this table requires 42 megabytes).
 During an IDA* search, as each state is generated, a unique index into the heuristic table is computed, followed by a reference to the table. The stored value is the number of moves needed to solve just the corner cubies, and thus a lower bound on the number of moves needed to solve the entire puzzle.
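
A sketch of this construction as a breadth-first search backwards from the goal pattern; the pattern states and pattern_successors (which applies the puzzle’s twists directly in the abstracted space) are assumed helpers for illustration:

    from collections import deque

    def build_pattern_db(goal_pattern, pattern_successors):
        # map each reachable pattern to the number of moves needed to solve it
        db = {goal_pattern: 0}
        queue = deque([goal_pattern])
        while queue:
            p = queue.popleft()
            for q in pattern_successors(p):
                if q not in db:                 # first visit = shortest distance
                    db[q] = db[p] + 1
                    queue.append(q)
        return db

    # During IDA*, db[pattern(state)] is a lower bound on the true distance.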
Pattern Databases
 We can improve the heuristic by considering the 12 edge cubies as well. The edge cubies can be in one of 12! permutations, and each can be in one of two different orientations, but the orientation of the last cubie is determined by the other 11. However, this requires too much memory.
 Therefore we will compute and store pattern databases for subsets of the edge cubies.
 We can compute the possible combinations for 6 cubies (for 7 cubies it would take too much memory). The number of possible combinations for 6 of the 12 edge cubies is (12!/6!)*2^6 = 42,577,920.
 Similarly, we can compute the corresponding heuristic table for the remaining 6 edge cubies.
Pattern Databases
 The heuristic used for the experiments is the maximum of all 3 of these values: all 8 corner cubies, and 2 groups of 6 edge cubies each.
 The total amount of memory for all 3 tables is 82 megabytes.
 The total time to generate all 3 heuristic tables was about an hour.
 Even though the result was only a small increase compared to the value for the corner cubies alone, it yields a significant performance improvement.
 Given more memory, we could compute and store even larger pattern databases.
Computer Chess
A natural domain for studying AI
 The game is well structured.
 Perfect information game.
 Early programmers and AI researchers were
often amateur chess players as well.
Brief History of Computer Chess
 Maelzel’s Chess Machine
 1769 - chess automaton built by Baron Wolfgang von Kempelen of Austria.
 Appeared to automatically move the pieces on a board on top of the machine, and played excellent chess.
 The puzzle of how the machine played was solved in 1836 by Edgar Allan Poe.
Brief History of Computer Chess
Maelzel’s Chess Machine
Brief History of Computer Chess
 Early 1950s - first serious paper on computer chess, written by Claude Shannon. Described minimax search with a heuristic static evaluation function and anticipated the need for more selective search algorithms.
 1956 - invention of alpha-beta pruning by John McCarthy. Used in early programs such as Samuel’s checkers player and Newell, Shaw and Simon’s chess program.
Brief History of Computer Chess

 1982 - development of Belle by Condon and Thompson. Belle was the first machine whose hardware was specifically designed to play chess, in order to achieve speed and search depth.
 1997 - the Deep Blue machine was the first to defeat the human world champion, Garry Kasparov, in a six-game match.
Checkers
 1952 - Samuel developed a checkers program that learned its own evaluation function through self-play.
 1992 - Chinook (J. Schaeffer) wins the U.S. Open. At the world championship, Marion Tinsley beat Chinook.
Othello

 Othello programs are better than the best humans.
 A large number of pieces change hands in each move.
 The best Othello program today is Logistello (Michael Buro).
Backgammon
 Unlike the above games, backgammon includes a roll of the dice, introducing a random element.
 The best backgammon program is TD-Gammon (Gerry Tesauro), comparable to the best human players today.
 It learns an evaluation function using temporal-difference learning.
Card games
 In addition to a random element, hidden information is introduced.
 The best bridge program is GIB (M. Ginsberg).
 Bridge programs are not competitive with the best human players.
 Poker programs are worse relative to their human counterparts.
 Poker involves a strong psychological element when played by people.
Other games - Summary
 The greater the branching factor, the worse the performance.
 Go - branching factor 361 - very poor performance. Checkers - branching factor 4 - very good performance.
 Backgammon is an exception: despite a large branching factor it still gets good results.
Brute-Force Search
 We begin by considering a purely brute-force approach to game playing.
 Clearly, this will only be feasible for small games, but it provides a basis for further discussion.
 Example - 5-stone Nim:
 played by 2 players with a pile of stones.
 Each player in turn removes one or two stones from the pile.
 The player who removes the last stone wins the game.
Example - Game Tree for 5-Stone Nim

[Figure: game tree for 5-stone Nim, alternating OR nodes (our moves) and AND nodes (the opponent’s moves), from 5 stones at the root down to 0 at the leaves.]
Minimax
 Minimax theorem - every two-person zero-sum game is a forced win for one player or a forced draw for either player, and in principle these optimal minimax strategies can be computed.
 Performing this algorithm on tic-tac-toe results in the root being labeled a draw.
Heuristic Evaluation Functions

 Problem: how do we evaluate positions where brute force is out of the question?
 Solution: use a heuristic static evaluation function to estimate the merit of a position when the final outcome has not yet been determined.
Example of heuristic Function
 Chess:
 the number of pieces of each type on the board, multiplied by the relative value of that type, summed up for each color. By subtracting the weighted material of the black player from the weighted material of the white player, we receive the relative strength of the position for each player.
Heuristic Evaluation Functions
 A heuristic static evaluation function for a two
player game is a function from a state to a
number.
 The goal of a two player game is to reach a
winning state, but the number of moves
required to get there is unimportant.
 Other features must be taken into account to
get to an overall evaluation function.
Heuristic Evaluation Functions
 Given a heuristic static evaluation function, it is straightforward to write a program to play a game.
 From any given position, we simply generate all the legal moves, apply our static evaluator to the position resulting from each move, and then move to the position with the largest or smallest evaluation, depending on whether we are MAX or MIN.
Example - tic-tac-toe
Behavior of Evaluation Function
 Detect if the game is over.
 If X is the Maximizer, the function should return ∞ if there are three X’s in a row and -∞ if there are three O’s in a row.
 Otherwise, count the number of different rows, columns, and diagonals occupied by X, minus the number occupied by O.
Example: First moves of tic-tac-toe

[Figure: the three distinct first moves for X. A corner move scores 3-0 = 3, the center move 4-0 = 4, and an edge move 2-0 = 2.]
Example - tic-tac-toe
Behavior of Evaluation Function

 This algorithm is extremely efficient, requiring time that is only linear in the number of legal moves.
 Its drawback is that it only considers the immediate consequences of each move (it doesn’t look over the horizon).
Minimax Search

[Figure: “Where does X go?” - a two-ply tic-tac-toe search; frontier positions carry evaluations such as 4-3 = 1 and 4-2 = 2, which are backed up to choose X’s move.]
Minimax search
 Search as deeply as possible given the computational resources of the machine and the time constraints on the game.
 Evaluate the nodes at the search frontier by the heuristic function.
 Where MIN is to move, save the minimum of its children’s values; where MAX is to move, save the maximum of its children’s values.
 A move is made to a child of the root with the largest or smallest value, depending on whether MAX or MIN is moving.
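
A minimal Python sketch of this procedure (assumed helpers: moves(state) yields successor positions and evaluate(state) is the heuristic static evaluation):

    def minimax(state, depth, maximizing, moves, evaluate):
        children = list(moves(state))
        if depth == 0 or not children:      # frontier or terminal position
            return evaluate(state)
        values = [minimax(c, depth - 1, not maximizing, moves, evaluate)
                  for c in children]
        return max(values) if maximizing else min(values)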
Minimax Search Example - Minimax Tree
[Figure: a minimax tree. Frontier values (4, 5, 3, 2, 6, 7, 8, 9, 1, 10, 2, 11, 12, 13, 14, 14) are backed up: MIN levels take the minimum and MAX levels the maximum of their children, giving the root the value 4.]
Alpha-Beta Pruning

 By using alpha-beta pruning, the minimax value of the root of a game tree can be determined without having to examine all the nodes.
Alpha-Beta Pruning Example
[Figure: an alpha-beta pruning example. The root a takes the value 4; bounds at interior nodes (e.g. i >= 6, n <= 2, o <= 1, q <= 2) mark subtrees that are cut off because they cannot affect the root’s value.]
Alpha-Beta
 Deep pruning - the right half of the tree in the example.
 The next slide shows code for alpha-beta pruning:
 MAXIMIN - assumes that its argument node is a maximizing node.
 MINIMAX - the same, for a minimizing node.
 V(N) - the heuristic static evaluation of node N.
MAXIMIN (node: N, lowerbound: alpha, upperbound: beta)
  IF N is at the search depth, RETURN V(N)
  FOR each child Ni of N
    value = MINIMAX(Ni, alpha, beta)
    IF value > alpha, alpha := value
    IF alpha >= beta, RETURN alpha
  RETURN alpha

MINIMAX (node: N, lowerbound: alpha, upperbound: beta)
  IF N is at the search depth, RETURN V(N)
  FOR each child Ni of N
    value = MAXIMIN(Ni, alpha, beta)
    IF value < beta, beta := value
    IF beta <= alpha, RETURN beta
  RETURN beta
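
A runnable Python rendering of this pseudocode (a sketch; children(state) and V(state) are the assumed equivalents of the slide’s child generation and static evaluation):

    def maximin(state, depth, alpha, beta, children, V):
        kids = list(children(state))
        if depth == 0 or not kids:
            return V(state)
        for child in kids:
            alpha = max(alpha, minimax_ab(child, depth - 1, alpha, beta,
                                          children, V))
            if alpha >= beta:
                return alpha            # cutoff: MIN above never allows this
        return alpha

    def minimax_ab(state, depth, alpha, beta, children, V):
        kids = list(children(state))
        if depth == 0 or not kids:
            return V(state)
        for child in kids:
            beta = min(beta, maximin(child, depth - 1, alpha, beta,
                                     children, V))
            if beta <= alpha:
                return beta             # cutoff: MAX above never allows this
        return beta

    # Root call for the player to move (MAX):
    # value = maximin(start, depth, float("-inf"), float("inf"), children, V)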
Performance of Alpha-Beta
 Efficiency depends on the order in which the
nodes are encountered at the search frontier.
 Optimal - b½ - if the largest child of a MAX
node is generated first, and the smallest child
of a MIN node is generated first.
 Worst - b.
 Average b¾ - random ordering.
Games with chance
 Chance nodes: nodes where chance events happen (rolling dice, flipping a coin, etc.)
 Evaluate the expected value by averaging over the outcome probabilities, where:
 C is a chance node
 P(di) is the probability of rolling di (1, 2, …, 12)
 S(C,di) is the set of positions generated by applying all legal moves for roll di to C
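
A sketch of that evaluation (illustrative helper names, not from the slides: rolls() yields (probability, roll) pairs, positions(state, roll) enumerates S(C, di), and value() recursively evaluates a position):

    def chance_value(state, rolls, positions, value, maximizing):
        # probability-weighted average of the best outcome for each roll
        best = max if maximizing else min
        return sum(p * best(value(s) for s in positions(state, roll))
                   for p, roll in rolls())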
Games with chance

[Figure: a backgammon board and a search tree with chance nodes. Each die-roll branch carries probability 0.5 in the example; the MIN values (2, 4, 0, -2) are averaged through the chance nodes to give 3 and -1 at the MAX level.]
Additional Enhancements
 A number of additional improvements have been
developed to improve performance with limited
computation.
 We briefly discuss the most important of these
below.
Node Ordering
 By using node ordering we can get close to b½.
 Node ordering: instead of generating the tree left-to-right, we reorder the tree based on the static evaluations of the interior nodes.
 To save space, only the immediate children are reordered after the parent is fully expanded.
Iterative Deepening
 Another idea is to use iterative deepening in timed two-player games: when time runs out, the move recommended by the last completed iteration is made.
 It can be combined with node ordering to improve pruning efficiency: instead of using the heuristic value, we can order nodes by their values from the previous iteration.
Quiescence
 Quiescence search makes a secondary search from any position whose value is unstable.
 This way a stable evaluation is obtained.
Transposition Tables
 For efficiency, it is important to detect when a state has already been searched.
 In order to detect a searched state, previously generated game states, with their minimax values, are saved in a transposition table.
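
A sketch of adding such a table to the minimax sketch above (assumptions: states are hashable, and the key includes the depth so a shallower cached value is never reused for a deeper search):

    table = {}

    def minimax_tt(state, depth, maximizing, moves, evaluate):
        key = (state, depth, maximizing)
        if key in table:
            return table[key]               # position already searched
        children = list(moves(state))
        if depth == 0 or not children:
            value = evaluate(state)
        else:
            values = [minimax_tt(c, depth - 1, not maximizing, moves, evaluate)
                      for c in children]
            value = max(values) if maximizing else min(values)
        table[key] = value
        return value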
Opening Book

 Most board games start with the same initial state.
 A table of good initial moves, based on human expertise, is used; it is known as an opening book.
Endgame Databases
 A database of endgame moves, with their minimax values, is used.
 In checkers, endgame databases cover positions with eight or fewer pieces on the board.
 A technique for calculating endgame databases is retrograde analysis.
Special Purpose Hardware
 The faster the machine, the deeper the search in the time available, and the better it plays.
 The best machines today are based on special-purpose hardware designed and built only to play chess.
Selective Search
 The fundamental reason that humans are competitive with computers is that they are very selective in their choice of positions to examine, unlike programs, which do full-width fixed-depth searches.
 Selective search: searching only an “interesting” part of the domain.
 Example - best-first minimax.
Best First Minimax
 Given a partially expanded minimax tree, the backed-up minimax value of the root is determined by one of the leaf nodes, as is the value of every node on the path from the root to that leaf.
 This path is known as the principal variation, and the leaf is known as the principal leaf.
 In general, best-first minimax will generate an unbalanced tree and make different move decisions than full-width fixed-depth alpha-beta.
Best-First Minimax Search - Example
[Figure: four snapshots of best-first minimax. At each step the current principal leaf is expanded, its value is backed up along the principal variation, and a new principal leaf is selected for expansion.]
Best First search
 Full-width search is good insurance against missing a move (and making a mistake).
 Most game programs that use selective searches use a combined algorithm that starts with a full-width search to a nominal depth, and then searches more selectively below that depth.
