
Chapter 5:

Adversarial Search
(Game-Playing)

Ch5: Adversarial Search 1


Outline

❖ Games
❖ Optimal decisions in games
❖ Minimax algorithm
❖ Alpha-Beta Pruning (α-β pruning)
❖ Stochastic Games



Which Problems can we Solve?
❖ The task environments suitable for the search algorithms we have looked at so far are:
➢ Fully observable
➢ Deterministic
➢ Sequential
➢ Static
➢ Discrete
➢ Single agent

❖ Here we consider the situation where other agents are also acting on the world.



Games
❖ Multiagent environments:
➢ Cooperative
➢ Competitive (the agents’ goals are in conflict) ➔ adversarial search problems ➔ these problems are known as games
❖ In mathematical game theory (a branch of economics), any multiagent environment (either cooperative or competitive) is a game, provided that the impact of each agent on the others is significant.
❖ In AI, games are usually what game theorists would call deterministic, turn-taking, two-player, zero-sum games of perfect information.
➢ Zero-sum ➔ one player’s loss is the other’s gain.
➢ Perfect information ➔ both players have access to complete information about the state of the game. No information is hidden from either player.
❖ Examples: chess, checkers, Connect 4, Othello, Go, tic-tac-toe, …



Features of these Games
❖ Fully observable: game state is visible to both players
❖ Deterministic: no element of chance
❖ Sequential: action taken now affects future choices
❖ Static: the world doesn’t change during deliberation
❖ Discrete: the game state can be represented exactly using a
finite representation
❖ Multiagent: two agents whose actions alternate; the utility values at the end of the game are always equal and opposite (e.g., +1 and –1)



Game problem formulation
❖ Two players: MAX and MIN
❖ MAX moves first, and the players take turns until the game is over.
➢ The winner gets a reward; the loser gets a penalty.
➢ “Zero sum” means the sum of the reward and the penalty is a constant.
❖ Formal definition as a search problem:
➢ Initial state: the set-up specified by the rules, e.g., the initial board configuration in chess.
➢ Player(s): defines which player has the move in state s.
➢ Actions(s): returns the set of legal moves in state s.
➢ Result(s,a): the transition model, which defines the result of move a in state s.
➢ Terminal-Test(s): is the game finished? True if finished, false otherwise.
➢ Utility(s,p): gives the numerical value of terminal state s for player p.
▪ Values vary from game to game:
▪ E.g., win (+1), lose (-1), and draw (0) in tic-tac-toe.
▪ E.g., win (+1), lose (0), and draw (1/2) in chess ➔ constant sum.
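The formal definition above can be sketched in code. The following is a minimal Python sketch for tic-tac-toe, not a reference implementation: the board encoding (a 9-tuple of "X", "O" or " "), the choice of "X" for MAX, and the helper `winner` are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

State = Tuple[str, ...]  # assumed encoding: 9 cells, each "X", "O" or " "

# The eight winning lines: three rows, three columns, two diagonals.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def initial_state() -> State:
    """Initial state: the empty board."""
    return (" ",) * 9

def player(s: State) -> str:
    """Player(s): MAX plays "X" and moves first; MIN plays "O"."""
    return "X" if s.count("X") == s.count("O") else "O"

def actions(s: State) -> List[int]:
    """Actions(s): indices of the empty cells."""
    return [i for i, cell in enumerate(s) if cell == " "]

def result(s: State, a: int) -> State:
    """Result(s, a): the board after the player to move marks cell a."""
    board = list(s)
    board[a] = player(s)
    return tuple(board)

def winner(s: State) -> Optional[str]:
    """Hypothetical helper: the mark that completed a line, if any."""
    for i, j, k in LINES:
        if s[i] != " " and s[i] == s[j] == s[k]:
            return s[i]
    return None

def terminal_test(s: State) -> bool:
    """Terminal-Test(s): someone has won, or the board is full."""
    return winner(s) is not None or " " not in s

def utility(s: State, p: str) -> int:
    """Utility(s, p): +1 for a win, -1 for a loss, 0 for a draw."""
    w = winner(s)
    if w is None:
        return 0
    return 1 if w == p else -1
```

For example, after the moves 0, 3, 1, 4, 2 from the initial state, "X" owns the top row, so `terminal_test` is true and the utilities for "X" and "O" are +1 and -1, which sum to zero.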



Game tree

❖ As for a search problem, the initial state, action set and transition model define a game tree for the game:
➢ a tree where the nodes are game states and the edges are moves.

❖ We draw the tree for the two players, MAX and MIN, where MAX moves first.

❖ The next slide gives a partial game tree for tic-tac-toe.



Partial game tree for tic-tac-toe

[Figure: partial game tree for tic-tac-toe]

❖ How do we search this tree to find the optimal move?
❖ Terminal states are labeled depending on the winner (MAX = +1, MIN = -1).
❖ High values are good for MAX and bad for MIN.



Optimal strategies
❖ The key point is that we have to take into account what the other player is doing.
❖ Rather than the simple path that is a solution in a search problem, we need a contingent strategy, which specifies
➢ MAX’s move in the initial state,
➢ then MAX’s moves in the states resulting from every possible response by MIN,
➢ then MAX’s moves in the states resulting from every possible response by MIN to those moves,
➢ …
❖ This gives us an optimal strategy in the sense that we do as well as we can against an infallible opponent.



Minimax search
❖ A game tree one move deep (two half-moves, i.e., 2 ply):



Minimax search (Cont’d)
❖ Given a game tree, we determine the optimal strategy by establishing the minimax value of each node: the utility (for MAX) of being in the corresponding state, assuming that both players play optimally from there to the end of the game.
❖ How do we do this?
➢ The minimax value of a terminal state is just its utility.
➢ Assume our utility function gives terminal nodes high positive values if they are good for MAX,
➢ and low values if they are good for MIN.
➢ Now look at the leaf nodes and consider which ones MAX wants:
▪ the ones with high values.
➢ MAX could choose these nodes if it were MAX’s turn to play.
➢ So the value of the MAX-node parent of a set of nodes is the max of all the child values.

Minimax search (Cont’d)
➢ Similarly, when MIN plays, MIN wants the node with the lowest value.
➢ So the MIN-node parent of a set of nodes gets the min of all their values.
❖ I.e., given a choice, MAX prefers to move to a state of maximum value, whereas MIN prefers a state of minimum value.
❖ We back up values until we get to the children of the start node, and MAX can use these to decide which node to choose.
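The backing-up rule described above can be written as one recursive definition. This is a sketch in the notation of the game formulation (Player, Actions, Result, Terminal-Test, Utility), written in upper case here as an assumed convention:

```latex
\[
\mathrm{MINIMAX}(s) =
\begin{cases}
\mathrm{UTILITY}(s) & \text{if } \mathrm{TERMINAL\text{-}TEST}(s) \\[4pt]
\max\limits_{a \in \mathrm{ACTIONS}(s)} \mathrm{MINIMAX}(\mathrm{RESULT}(s, a)) & \text{if } \mathrm{PLAYER}(s) = \mathrm{MAX} \\[4pt]
\min\limits_{a \in \mathrm{ACTIONS}(s)} \mathrm{MINIMAX}(\mathrm{RESULT}(s, a)) & \text{if } \mathrm{PLAYER}(s) = \mathrm{MIN}
\end{cases}
\]
```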



Minimax algorithm

Designed to find the optimal strategy for MAX and the best move:

1. Generate the whole game tree, down to the leaves.

2. Apply the utility (payoff) function to each leaf.

3. Back up values from the leaves through the branch nodes:
➢ a MAX node computes the max of its child values
➢ a MIN node computes the min of its child values

4. At the root: choose the move leading to the child of highest value.
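The four steps above can be illustrated on a tree that is already fully generated. The sketch below assumes a tree encoded as nested Python lists with integer utilities at the leaves (an encoding chosen for illustration); the leaf values are those of the three-branch example used in these slides.

```python
from typing import List, Tuple, Union

# Assumed encoding: a leaf is its utility; an inner node is a list of children.
Tree = Union[int, List["Tree"]]

def minimax_value(node: Tree, is_max: bool) -> int:
    """Steps 2 and 3: take leaf utilities and back them up the tree."""
    if isinstance(node, int):
        return node
    values = [minimax_value(child, not is_max) for child in node]
    return max(values) if is_max else min(values)

def best_move(root: List[Tree]) -> Tuple[int, int]:
    """Step 4: return (index of the best child, its backed-up value) for MAX."""
    values = [minimax_value(child, is_max=False) for child in root]
    best = max(range(len(values)), key=values.__getitem__)
    return best, values[best]

# Step 1: the whole tree, written out by hand (MAX root, three MIN children).
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(best_move(tree))  # (0, 3)
```

The MIN nodes back up 3, 2 and 2, so MAX picks the first child, whose value is 3.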



Minimax search (Cont’d)
MINIMAX(B) = min(3,12,8) = 3
MINIMAX(C) = min(2,4,6) = 2
MINIMAX(D) = min(14,5,2) = 2



Minimax search (Cont’d)
MINIMAX(root) = max(min(3,12,8), min(2,4,6), min(14,5,2)) = max(3,2,2) = 3

❖ There’s an algorithm for this.


Minimax algorithm
❖ Recursive Depth First Search:
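The slide's pseudocode figure is not part of this text. As a hedged sketch, one common way to write the recursive depth-first version uses two mutually recursive functions, one for MAX nodes and one for MIN nodes. The function names and the toy stone-taking game below are assumptions for illustration only.

```python
def minimax_decision(state, actions, result, terminal_test, utility):
    """Return the action for MAX (to move in `state`) with the highest minimax value."""
    def max_value(s):
        if terminal_test(s):
            return utility(s)
        return max(min_value(result(s, a)) for a in actions(s))

    def min_value(s):
        if terminal_test(s):
            return utility(s)
        return min(max_value(result(s, a)) for a in actions(s))

    # MAX chooses the action whose resulting MIN node has the highest value.
    return max(actions(state), key=lambda a: min_value(result(state, a)))

# Hypothetical toy game: players alternately remove 1 or 2 stones from a pile;
# whoever takes the last stone wins. State = (stones_left, max_to_move).
def nim_actions(s):
    return [n for n in (1, 2) if n <= s[0]]

def nim_result(s, a):
    return (s[0] - a, not s[1])

def nim_terminal(s):
    return s[0] == 0

def nim_utility(s):
    # If MAX is to move on an empty pile, MIN took the last stone and won.
    return -1 if s[1] else 1

print(minimax_decision((4, True), nim_actions, nim_result, nim_terminal, nim_utility))  # 1
```

With four stones, taking one leaves MIN with three, from which every reply lets MAX take the last stone, so the search returns the action 1.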



Properties of minimax
❖ Complete?
➢ Yes (if the tree is finite)
❖ Optimal?
➢ Yes (against an optimal opponent)
❖ Time complexity?
➢ O(b^m), where b is the branching factor and m is the maximum depth of the tree
❖ Space complexity?
➢ O(bm) (depth-first exploration)

❖ For chess, b ≈ 35 and m ≈ 100 for "reasonable" games
➢ an exact solution is completely infeasible
❖ It is usually impossible to develop the whole search tree.
➢ Moves must be made in a reasonable amount of time.
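To see why the exact solution is infeasible, the chess figures above can be plugged into the exponential time bound. A rough worked estimate, using the stated approximations b ≈ 35 and m ≈ 100:

```latex
\[
b^{m} \approx 35^{100} = 10^{\,100 \log_{10} 35} \approx 10^{\,100 \times 1.544} \approx 10^{154}
\]
```

By contrast, the depth-first space requirement grows only as b · m ≈ 35 × 100 = 3500 nodes.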

Solution to the complexity problem

❖ Two solutions:

➢ Early cutoff of the search tree
▪ depth-limited Minimax search (MinimaxCutoff).

➢ Dynamic pruning of redundant branches of the search tree
▪ Procedure: Alpha-Beta pruning



Cutting off search
❖ Idea:
➢ Cut off the search before a terminal state is reached.
❖ Problem:
➢ Utility is defined only for terminal states.
❖ Solution:
➢ Apply a heuristic evaluation function to the states where the search is cut off,
▪ which estimates the utility of the position.

❖ MinimaxCutoff search is identical to Minimax search except that
1. TERMINAL-TEST(s) is replaced by CUTOFF-TEST(s)
2. UTILITY(s) is replaced by EVAL(s)
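The two substitutions can be sketched on a nested-list tree. In this minimal sketch the cutoff test is a simple depth budget and `average_leaf` is a deliberately crude stand-in for EVAL; both are assumptions for illustration. A real evaluation function would estimate a position from its features without looking at its descendants.

```python
from typing import Callable, List, Union

# Assumed encoding: a leaf is its utility; an inner node is a list of children.
Tree = Union[int, List["Tree"]]

def minimax_cutoff(node: Tree, depth: int, is_max: bool,
                   eval_fn: Callable[[Tree], float]) -> float:
    # CUTOFF-TEST replaces TERMINAL-TEST: stop at a leaf or when depth runs out.
    if isinstance(node, int) or depth == 0:
        # EVAL replaces UTILITY: estimate the value of this position.
        return eval_fn(node)
    values = [minimax_cutoff(c, depth - 1, not is_max, eval_fn) for c in node]
    return max(values) if is_max else min(values)

def average_leaf(node: Tree) -> float:
    """Stand-in EVAL: exact on leaves, mean of the subtree's leaf means otherwise."""
    if isinstance(node, int):
        return float(node)
    vals = [average_leaf(c) for c in node]
    return sum(vals) / len(vals)

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax_cutoff(tree, 2, True, average_leaf))  # 3.0, same as full minimax
print(minimax_cutoff(tree, 1, True, average_leaf))  # about 7.67, only an estimate
```

With a depth budget of 1 the search never reaches the leaves, so the answer depends entirely on the quality of the evaluation function.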



Example—Tic-tac-toe.
❖ The evaluation function heuristic
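The heuristic shown on the original slide is not reproduced in this text. As an assumption, the sketch below uses a heuristic commonly chosen for tic-tac-toe: the number of rows, columns and diagonals still open for MAX minus the number still open for MIN. The board encoding and the function names are hypothetical.

```python
from typing import Sequence

# The eight lines of the board: three rows, three columns, two diagonals.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def open_lines(board: Sequence[str], opponent: str) -> int:
    """Count the lines that contain no piece of `opponent`."""
    return sum(all(board[i] != opponent for i in line) for line in LINES)

def evaluate(board: Sequence[str], max_mark: str = "X", min_mark: str = "O") -> int:
    """Assumed EVAL: (lines still open for MAX) - (lines still open for MIN)."""
    return open_lines(board, min_mark) - open_lines(board, max_mark)

# MAX ("X") holds the centre, MIN ("O") holds the top-left corner.
board = ["O", " ", " ",
         " ", "X", " ",
         " ", " ", " "]
print(evaluate(board))  # 5 - 4 = 1
```

Here five lines contain no "O" and four contain no "X", so the position is scored slightly in favour of MAX.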



Example—Tic-tac-toe. (Cont’d)
❖ Unsurprisingly (for anyone who has ever played tic-tac-toe), the move shown on the slide is the best move. [Figure: board with MAX’s first move]
❖ So MAX moves, then MIN replies, and then MAX searches again.
❖ Here there are two equally good best moves, so we can break the tie randomly.
❖ Then we let MIN move and do the search again.
❖ And so on.


α-β pruning
❖ It is possible to compute the correct minimax decision without looking
at every node in the game tree
❖ Example: do a depth-first search until the first leaf, tracking the range of possible values at each node. The ranges below are for the root (MAX) and its MIN children, from left to right:
➢ Step 1: root [-∞, +∞]; first MIN node [-∞, +∞]
➢ Step 2: root [-∞, +∞]; first MIN node [-∞, 3]
➢ Step 3: root [-∞, +∞]; first MIN node [-∞, 3] (no change)
➢ Step 4: root [3, +∞]; first MIN node [3, 3]
➢ Step 5: root [3, +∞]; MIN nodes [3, 3], [-∞, 2]. The second MIN node is worse for MAX.
➢ Step 6: root [3, 14]; MIN nodes [3, 3], [-∞, 2], [-∞, 14]
➢ Step 7: root [3, 5]; MIN nodes [3, 3], [-∞, 2], [-∞, 5]
➢ Step 8: root [3, 3]; MIN nodes [3, 3], [-∞, 2], [2, 2]


General alpha-beta pruning

❖ α is the value of the best (i.e., highest-value) choice found so far at any choice point along the path for MAX
❖ If v is worse than α (α > v), MAX will avoid it
➢ prune that branch
❖ Define β similarly for MIN

[Figure: path through alternating Player and Opponent levels, with node m (value α) higher up and node n (value v) lower down]



The α-β algorithm
❖ Depth-first search
– only considers nodes along a single path from the root at any time

α = highest-value choice found at any choice point of the path for MAX
(initially, α = −∞)
β = lowest-value choice found at any choice point of the path for MIN
(initially, β = +∞)

❖ Pass the current values of α and β down to child nodes during search.

❖ Update the values of α and β during search:
➢ MAX updates α at MAX nodes
➢ MIN updates β at MIN nodes
❖ Prune the remaining branches at a node when α ≥ β
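The rules above translate into a short recursive function. This is a minimal sketch on a tree encoded as nested lists (an assumed encoding); the `visited` list is a hypothetical addition that records which leaves were actually evaluated.

```python
import math
from typing import List, Union

# Assumed encoding: a leaf is its utility; an inner node is a list of children.
Tree = Union[int, List["Tree"]]

def alphabeta(node: Tree, alpha: float, beta: float,
              is_max: bool, visited: List[int]) -> float:
    if isinstance(node, int):
        visited.append(node)  # record each leaf that is evaluated
        return node
    if is_max:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, visited))
            alpha = max(alpha, value)   # MAX updates alpha at MAX nodes
            if alpha >= beta:           # prune the remaining branches
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True, visited))
        beta = min(beta, value)         # MIN updates beta at MIN nodes
        if alpha >= beta:               # prune the remaining branches
            break
    return value

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
visited: List[int] = []
print(alphabeta(tree, -math.inf, math.inf, True, visited))  # 3
print(visited)  # [3, 12, 8, 2, 14, 5, 2]
```

On the three-branch example the leaves 4 and 6 are never evaluated: once the second MIN node has seen 2, its β (2) is no greater than α (3), so its remaining children are pruned.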



α-β Example Revisited

❖ Do a depth-first search until the first leaf.
➢ Root (MAX): initial values α = −∞, β = +∞.
➢ α, β are passed to the kids: the first MIN node gets α = −∞, β = +∞.
➢ MIN updates β based on kids: first MIN node α = −∞, β = 3.
➢ MIN updates β based on kids: no change (α = −∞, β = 3).
➢ 3 is returned as the node value. MAX updates α based on kids: root α = 3, β = +∞.
➢ α, β are passed to the kids: the second MIN node gets α = 3, β = +∞.
➢ MIN updates β based on kids: second MIN node α = 3, β = 2.
➢ α ≥ β, so prune.
➢ 2 is returned as the node value. MAX updates α based on kids: no change (root α = 3, β = +∞).
➢ α, β are passed to the kids: the third MIN node gets α = 3, β = +∞.
➢ MIN updates β based on kids: third MIN node α = 3, β = 14.
➢ MIN updates β based on kids: third MIN node α = 3, β = 5.
➢ 2 is returned as the node value (root α = 3, β = +∞).
❖ MAX calculates the same node value and makes the same move!


The α-β algorithm



Final Comments about Alpha-Beta Pruning
❖ Pruning does not affect the final result.

❖ Entire subtrees can be pruned.

❖ Good move ordering improves the effectiveness of pruning.

❖ Repeated states are again possible.
➢ Store them in memory = transposition table
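The effect of move ordering can be checked by counting evaluated leaves. The sketch below is illustrative only: it runs a compact alpha-beta search on two hand-written orderings of one small tree (assumed data, encoded as nested lists), one with the best moves first and one with the best moves last.

```python
import math

def count_leaves(node, alpha=-math.inf, beta=math.inf, is_max=True):
    """Alpha-beta search returning (value, number of leaves evaluated)."""
    if isinstance(node, int):
        return node, 1
    best = -math.inf if is_max else math.inf
    seen = 0
    for child in node:
        value, n = count_leaves(child, alpha, beta, not is_max)
        seen += n
        if is_max:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:  # prune the remaining children
            break
    return best, seen

# Same game, two orderings (assumed data): best moves first vs. best moves last.
best_first = [[3, 8, 12], [2, 4, 6], [2, 5, 14]]
best_last = [[6, 4, 2], [14, 5, 2], [12, 8, 3]]
print(count_leaves(best_first))  # (3, 5)
print(count_leaves(best_last))   # (3, 9)
```

Both orderings return the same value, 3, but the best-first ordering evaluates five of the nine leaves while the best-last ordering evaluates all nine.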



Example 1
❖ Which nodes can be pruned?

[Figure: game tree with levels labeled Max, Min, Max; leaf values as extracted: 5 6 / 3 4 1 2 7 8]

❖ Answer: NONE! Because the most favorable nodes for both players are explored last (i.e., in the diagram, they are on the right-hand side).


Example 2: the exact mirror image of Example 1
❖ Which nodes can be pruned?

[Figure: game tree with levels labeled Max, Min, Max; leaf values as extracted: 3 4 / 6 5 8 7 2 1]

❖ Answer: LOTS! Because the most favorable nodes for both players are explored first (i.e., in the diagram, they are on the left-hand side).
