Adversarial Search

❖ Games
❖ Optimal decisions in games
❖ Minimax algorithm
❖ Alpha-Beta Pruning (α-β pruning)
❖ Stochastic Games

Which Problems can we Solve?
❖ The task environments which are suitable for the search
algorithms we’ve looked at so far are:
 Fully observable
 Deterministic
 Sequential
 Static
 Discrete
 Single agent

❖ Here we will consider the situation where other agents are also
messing with the world.

❖ Multiagent environments:
 Cooperative
 Competitive (in which the agents’ goals are in conflict) ➔ adversarial search
problems ➔ these problems are known as games
❖ In mathematical game theory (a branch of economics), any multiagent environment
(either cooperative or competitive) is a game, provided that the impact of
each agent on the others is significant.
❖ In AI, games are usually what game theorists would call deterministic,
turn-taking, two-player, zero-sum games of perfect information.
 Zero-sum ➔ one player’s loss is the other’s gain.
 Perfect information ➔ both players have access to complete information about
the state of the game. No information is hidden from either player.
❖ Examples: chess, checkers, Connect 4, Othello, go, tic-tac-toe, …

Features of these Games
❖ Fully observable: game state is visible to both players
❖ Deterministic: no element of chance
❖ Sequential: action taken now affects future choices
❖ Static: the world doesn’t change during deliberation
❖ Discrete: the game state can be represented exactly using a
finite representation
❖ Multi agent: two agents whose actions alternate and the utility
values at the end of the game are always equal and opposite
(+1 and –1)

Game problem formulation
❖ Two players: MAX and MIN
❖ MAX moves first and they take turns until the game is over
 Winner gets reward, loser gets penalty.
 “Zero sum” means the sum of the reward and the penalty is a constant.
❖ Formal definition as a search problem:
 Initial state: Set-up specified by the rules, e.g., initial board configuration of
 Player(s): Defines which player has the move in a state.
 Actions(s): Returns the set of legal moves in a state.
 Result(s,a): Transition model defines the result of a move.
 Terminal-Test(s): Is the game finished? True if finished, false otherwise.
 Utility function(s,p): Gives numerical value of terminal state s for player p.
▪ Utility values vary from game to game:
▪ E.g., win (+1), lose (-1), and draw (0) in tic-tac-toe.
▪ E.g., win (+1), lose (0), and draw (1/2) in chess.➔ (Constant Sum)
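
As an illustration, the six components of the formal definition can be written out for tic-tac-toe. This is a minimal sketch, not a reference implementation: the board encoding (a 9-tuple in row-major order), the choice of "X" for MAX, and all function names are assumptions made for the example.

```python
from typing import List, Optional, Tuple

# Assumed encoding: 9 cells in row-major order, each "X", "O", or None.
Board = Tuple[Optional[str], ...]

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

# Initial state: the empty board.
INITIAL_STATE: Board = (None,) * 9


def player(s: Board) -> str:
    """Player(s): MAX plays "X" and moves first, MIN plays "O"."""
    return "X" if s.count(None) % 2 == 1 else "O"


def winner(s: Board) -> Optional[str]:
    """Helper: the mark that fills a complete line, if any."""
    for i, j, k in LINES:
        if s[i] is not None and s[i] == s[j] == s[k]:
            return s[i]
    return None


def terminal_test(s: Board) -> bool:
    """Terminal-Test(s): someone has won, or the board is full."""
    return winner(s) is not None or None not in s


def actions(s: Board) -> List[int]:
    """Actions(s): indices of the empty cells (none once the game is over)."""
    if terminal_test(s):
        return []
    return [i for i, cell in enumerate(s) if cell is None]


def result(s: Board, a: int) -> Board:
    """Result(s, a): the board after the player to move marks cell a."""
    return s[:a] + (player(s),) + s[a + 1:]


def utility(s: Board, p: str) -> int:
    """Utility(s, p): win +1, lose -1, draw 0 for player p."""
    w = winner(s)
    if w is None:
        return 0
    return 1 if w == p else -1
```

With these values the two players' utilities in any terminal state sum to 0, which is the zero-sum property described above.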

Game tree

❖ As for a search problem, the initial state, action set and
transition model define a game tree for the game.
 a tree where the nodes are game states and the edges are moves

❖ We draw the tree assuming the two players, MAX and MIN
where MAX moves first.

❖ The next slide gives a partial game tree for tic-tac-toe.

Partial game tree for tic-tac-toe
[Figure: partial game tree for tic-tac-toe]
❖ Terminal states are labeled depending on the winner.
❖ High values are good for MAX and bad for MIN.
❖ How do we search this tree to find the optimal move?

Optimal strategies
❖ The key point is that we have to take into account what the other
player is doing.
❖ Rather than the simple path that is a solution in a search
problem, we need a contingent strategy, which specifies
 MAX’s move in the initial state,
 then MAX’s moves in the states resulting from every possible
response by MIN,
 then MAX’s moves in the states resulting from every possible
response by MIN to those moves
 …
❖ This gives us an optimal strategy in the sense that we do as
well as we can against an infallible opponent.

Minimax search
❖ One-move-deep (two half-moves, i.e., 2-ply) game tree:

Minimax search (Cont’d)
❖ Given a game tree, we determine the optimal strategy by
establishing the minimax value of each node s, which is the utility (for
MAX) of being in the corresponding state,
assuming that both players play optimally from there to the end of the game.
❖ How do we do this?
 The minimax value of a terminal state is just its utility.
 Assume our utility function gives terminal nodes high positive values if
they are good for MAX
 And low values if they are good for MIN
 Now, look at the leaf nodes and consider which ones MAX wants:
▪ Ones with high values.
 MAX could choose these nodes if it was his turn to play.
 So, the value of the MAX-node parent of a set of nodes is the max of all
the child values.

Minimax search (Cont’d)
 Similarly, when MIN plays he wants the node with the lowest value.
 So the MIN-node parent of a set of nodes gets the min of all their
values.
❖ I.e., given a choice, MAX prefers to move to a state of maximum
value, whereas MIN prefers a state of minimum value.
❖ We back up values until we get to the children of the start node, and
MAX can use this to decide which node to choose.
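
The backing-up rule described above can be summarized as one recursive definition. The following is a sketch that reuses the function names from the formal game definition (Player, Actions, Result, Terminal-Test, Utility); utilities are taken from MAX's point of view.

```latex
\[
\mathrm{MINIMAX}(s) =
\begin{cases}
\mathrm{UTILITY}(s, \mathrm{MAX}) & \text{if } \mathrm{TERMINAL\text{-}TEST}(s) \\[4pt]
\max\limits_{a \in \mathrm{ACTIONS}(s)} \mathrm{MINIMAX}(\mathrm{RESULT}(s, a)) & \text{if } \mathrm{PLAYER}(s) = \mathrm{MAX} \\[4pt]
\min\limits_{a \in \mathrm{ACTIONS}(s)} \mathrm{MINIMAX}(\mathrm{RESULT}(s, a)) & \text{if } \mathrm{PLAYER}(s) = \mathrm{MIN}
\end{cases}
\]
```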

Minimax algorithm

Designed to find the optimal strategy for Max and find best move:

1. Generate the whole game tree, down to the leaves.

2. Apply utility (payoff) function to each leaf.

3. Back-up values from leaves through branch nodes:

a Max node computes the Max of its child values
a Min node computes the Min of its child values

4. At root: choose the move leading to the child of highest value.

Minimax search (Cont’d)
MINIMAX(B) = min(3,12,8) =3
MINIMAX(C) = min(2,4,6) =2
MINIMAX(D) = min(14,5,2) =2

Minimax search (Cont’d)
MINIMAX(root) = max(min(3,12,8), min(2,4,6), min(14,5,2)) = max(3,2,2) =3

❖ There’s an algorithm for this.

Minimax algorithm
❖ Recursive Depth First Search:
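
A minimal sketch of the recursive, depth-first computation, applied to the 2-ply example above. The tree encoding (nested Python lists whose leaves are utilities for MAX) and the function names are assumptions made for this illustration; a real game would generate children with Actions and Result instead of storing the tree.

```python
from typing import List, Tuple, Union

# Assumed encoding: a leaf is its utility (for MAX); an internal node is
# the list of its child subtrees.
Tree = Union[int, List["Tree"]]


def minimax_value(node: Tree, is_max: bool) -> int:
    """Back up values depth-first: max at MAX nodes, min at MIN nodes."""
    if isinstance(node, int):          # terminal state: value is its utility
        return node
    child_values = [minimax_value(child, not is_max) for child in node]
    return max(child_values) if is_max else min(child_values)


def minimax_decision(root: List[Tree]) -> Tuple[int, int]:
    """At the root (a MAX node), pick the child with the highest value.

    Returns (index of the chosen child, its minimax value).
    """
    values = [minimax_value(child, is_max=False) for child in root]
    best = max(range(len(values)), key=lambda i: values[i])
    return best, values[best]


# The example tree: MIN nodes B, C, D under a MAX root.
EXAMPLE: List[Tree] = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]

if __name__ == "__main__":
    move, value = minimax_decision(EXAMPLE)
    print(f"MAX chooses child {move} with minimax value {value}")
```

On the example tree this reproduces MINIMAX(B) = 3, MINIMAX(C) = 2, MINIMAX(D) = 2, so MAX moves to B.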

Properties of minimax
❖ Complete?
 Yes (if tree is finite)
❖ Optimal?
 Yes (against an optimal opponent)
❖ Time complexity?
 O(b^m), where b is the branching factor and m is the maximum depth of the tree
❖ Space complexity?
 O(bm) (depth-first exploration)

❖ For chess, b ≈ 35, m ≈100 for "reasonable" games

 exact solution completely infeasible
❖ It is usually impossible to develop the whole search tree.
 Moves must be made in a reasonable amount of time
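
To make the scale concrete, the chess figures above can be plugged into the two complexity bounds. This is a back-of-the-envelope sketch; the machine speed of 10^9 nodes per second is a purely hypothetical assumption chosen for illustration.

```python
import math

b, m = 35, 100                       # figures quoted above for chess

# Time: b**m nodes, i.e. about 10**time_exponent.
time_exponent = m * math.log10(b)

# Space: about b*m nodes for depth-first exploration.
space_nodes = b * m

# Hypothetical machine speed, used only to give a feeling for the scale.
NODES_PER_SECOND = 1e9
SECONDS_PER_YEAR = 365 * 24 * 3600
years_exponent = time_exponent - math.log10(NODES_PER_SECOND * SECONDS_PER_YEAR)

print(f"time:  about 10^{time_exponent:.0f} nodes")
print(f"space: about {space_nodes} nodes")
print(f"at 1e9 nodes/s: about 10^{years_exponent:.0f} years")
```

The time bound is roughly 10^154 nodes, while the space bound is only a few thousand nodes: time, not memory, is what rules out an exact solution.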

Solution to the complexity problem

❖ Two solutions:

 Early cutoff of the search tree

▪ depth limited Minimax search (MINIMAXcutoff).

 Dynamic pruning of redundant branches of the search tree

▪ Procedure: Alpha-Beta pruning

Cutting off search
❖ Idea:
 Cut off the search before the terminal states are reached.
❖ Problem:
 Utility is defined only for terminal states.
❖ Solution:
 Apply a heuristic evaluation function to the states at which the search is cut off,
▪ which estimates the utility of the position.

❖ MinimaxCutoff search is identical to Minimax search except that:

1. TERMINAL-TEST(s) is replaced by CUTOFF-TEST(s)
2. UTILITY(s) is replaced by EVAL (s)

❖ The evaluation function heuristic
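
A minimal sketch of depth-limited minimax for tic-tac-toe, with the two substitutions listed above. The evaluation function used here is an assumption: it is a common textbook heuristic (lines still open for MAX minus lines still open for MIN) and is not necessarily the one the original slide showed. The board encoding and all names are likewise illustrative.

```python
import math
from typing import Optional, Tuple

# Assumed encoding: 9 cells, row-major; "X" is MAX, "O" is MIN, None is empty.
Board = Tuple[Optional[str], ...]
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(s: Board) -> Optional[str]:
    for i, j, k in LINES:
        if s[i] is not None and s[i] == s[j] == s[k]:
            return s[i]
    return None


def open_lines(s: Board, p: str) -> int:
    """Number of lines that contain no mark of p's opponent."""
    opponent = "O" if p == "X" else "X"
    return sum(1 for line in LINES if all(s[i] != opponent for i in line))


def evaluate(s: Board) -> float:
    """EVAL(s): heuristic estimate of the utility of s for MAX."""
    w = winner(s)
    if w == "X":
        return math.inf
    if w == "O":
        return -math.inf
    return open_lines(s, "X") - open_lines(s, "O")


def cutoff_test(s: Board, depth: int) -> bool:
    """CUTOFF-TEST(s): depth budget used up, or the game is over."""
    return depth == 0 or winner(s) is not None or None not in s


def minimax_cutoff(s: Board, to_move: str, depth: int) -> float:
    """Minimax with CUTOFF-TEST in place of TERMINAL-TEST and EVAL in place of UTILITY."""
    if cutoff_test(s, depth):
        return evaluate(s)
    nxt = "O" if to_move == "X" else "X"
    values = [minimax_cutoff(s[:a] + (to_move,) + s[a + 1:], nxt, depth - 1)
              for a in range(9) if s[a] is None]
    return max(values) if to_move == "X" else min(values)


def best_move(s: Board, depth: int) -> int:
    """Best cell for MAX ("X") to mark, searching `depth` plies ahead."""
    moves = [a for a in range(9) if s[a] is None]
    return max(moves, key=lambda a: minimax_cutoff(
        s[:a] + ("X",) + s[a + 1:], "O", depth - 1))


if __name__ == "__main__":
    print(best_move((None,) * 9, depth=2))
```

Under this particular heuristic, a 2-ply search from the empty board prefers the center square (cell 4).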

Example—Tic-tac-toe. (Cont’d)
❖ Unsurprisingly (for anyone who has ever played tic-tac-toe), the best move is:
[Figure: board showing the move chosen by MAX]
❖ So MAX moves, then MIN replies, and then MAX searches again.
❖ Here there are two equally good best moves.
 So we can break the tie randomly.
 Then we let MIN move and do the search again.
❖ And so on.

α-β pruning
❖ It is possible to compute the correct minimax decision without looking
at every node in the game tree
❖ Example
 Do a depth-first search until the first leaf is reached. Each node is annotated with its range of possible values, initially [-∞, +∞].

α-β pruning Example (Cont’d)
❖ Ranges of possible values shown in the figures as the search proceeds (MIN nodes listed from left to right):
1. MIN nodes [3, 3], [-∞, 2]. The node with range [-∞, 2] is worse for MAX.
2. Root [3, 14]; MIN nodes [3, 3], [-∞, 2], [-∞, 14].
3. Root [3, 5]; MIN nodes [3, 3], [-∞, 2], [-∞, 5].
4. MIN nodes [3, 3], [-∞, 2], [2, 2].

General alpha-beta pruning

❖ α is the value of the best (i.e., highest-value) choice
found so far at any choice point along the path for MAX
❖ If v is worse than α (α > v),
MAX will avoid it
 prune that branch
❖ Define β similarly for MIN
[Figure: search path with levels labeled Player and Opponent, and nodes labeled m, n, and v]

The α-β algorithm
❖ Depth-first search
– only considers nodes along a single path from the root at any time

❖ α = highest-value choice found at any choice point of the path for MAX
(initially, α = −∞)
❖ β = lowest-value choice found at any choice point of the path for MIN
(initially, β = +∞)

❖ Pass the current values of α and β down to child nodes during the search.

❖ Update the values of α and β during the search:
➢ MAX updates α at MAX nodes
➢ MIN updates β at MIN nodes
❖ Prune the remaining branches at a node when α ≥ β
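
The rules above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the pseudocode from the original slide: the game tree is given explicitly as nested lists (leaves are utilities for MAX), and the `visited` list is an extra added here only to show which leaves are actually examined.

```python
import math
from typing import List, Union

# Assumed encoding: a leaf is its utility; an internal node is a list of children.
Tree = Union[int, List["Tree"]]


def alphabeta(node: Tree, alpha: float, beta: float,
              is_max: bool, visited: List[int]) -> float:
    """Return the minimax value of node, pruning when alpha >= beta."""
    if isinstance(node, int):
        visited.append(node)           # record every leaf that is evaluated
        return node
    if is_max:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, visited))
            alpha = max(alpha, value)  # MAX updates alpha at MAX nodes
            if alpha >= beta:
                break                  # prune the remaining children
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True, visited))
        beta = min(beta, value)        # MIN updates beta at MIN nodes
        if alpha >= beta:
            break                      # prune the remaining children
    return value


# The 2-ply example tree: MIN nodes B, C, D under a MAX root.
EXAMPLE: List[Tree] = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]

if __name__ == "__main__":
    seen: List[int] = []
    root_value = alphabeta(EXAMPLE, -math.inf, math.inf, True, seen)
    print(root_value, seen)
```

On the example tree the root value is 3, the same as plain minimax, and the leaves 4 and 6 under node C are never evaluated.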

α-β Example Revisited
❖ Do a depth-first search until the first leaf, tracking α and β at each node (B, C, D are the MIN nodes of the minimax example):
1. Initial values at the root: α = −∞, β = +∞. α and β are passed to the kids.
2. Node B: MIN updates β, based on kids: β = 3. The root still has β = +∞.
3. Node B: MIN updates β, based on kids. No change: β = 3.
4. 3 is returned as the node value. MAX updates α, based on kids: α = 3, β = +∞ at the root.
5. α and β are passed to the kids: node C starts with α = 3, β = +∞.
6. Node C: MIN updates β, based on kids: β = 2.
7. Node C: α = 3, β = 2, so α ≥ β: prune.
8. 2 is returned as the node value. MAX updates α, based on kids. No change: α = 3, β = +∞ at the root.
9. α and β are passed to the kids: node D starts with α = 3, β = +∞.
10. Node D: MIN updates β, based on kids: β = 14.
11. Node D: MIN updates β, based on kids: β = 5.
12. 2 is returned as the node value.
13. MAX calculates the same node value, and makes the same move!

Final Comments about Alpha-Beta Pruning
❖ Pruning does not affect final results

❖ Entire subtrees can be pruned.

❖ Good move ordering improves effectiveness of pruning

❖ Repeated states are again possible.

 Store them in memory = transposition table
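
A minimal sketch of the transposition-table idea, using tic-tac-toe and plain minimax to keep it short. The board encoding, the use of a Python dict as the table, and the function names are assumptions made for this example. Combining a table with α-β pruning needs extra care, because a pruned search may only establish a bound on a node's value; that is not shown here.

```python
from typing import Dict, Optional, Tuple

# Assumed encoding: 9 cells, row-major; "X" is MAX, "O" is MIN, None is empty.
Board = Tuple[Optional[str], ...]
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(s: Board) -> Optional[str]:
    for i, j, k in LINES:
        if s[i] is not None and s[i] == s[j] == s[k]:
            return s[i]
    return None


def minimax(s: Board, table: Dict[Board, int]) -> int:
    """Minimax value of s for MAX, caching each state's value in `table`.

    The board alone is a valid key because it determines whose turn it is.
    """
    if s in table:                     # repeated state: reuse the stored value
        return table[s]
    w = winner(s)
    if w is not None:
        value = 1 if w == "X" else -1
    elif None not in s:
        value = 0                      # full board, no winner: draw
    else:
        to_move = "X" if s.count(None) % 2 == 1 else "O"
        values = [minimax(s[:a] + (to_move,) + s[a + 1:], table)
                  for a in range(9) if s[a] is None]
        value = max(values) if to_move == "X" else min(values)
    table[s] = value
    return value


if __name__ == "__main__":
    table: Dict[Board, int] = {}
    print(minimax((None,) * 9, table), len(table))
```

Each distinct board is searched once, no matter how many move orders lead to it, so the table ends up with only a few thousand entries for the whole game.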

Example1
❖ Which nodes can be pruned?
[Figure: game tree with MAX at the root; values printed in the figure: 5 6 / 3 4 1 2 7 8]

Example1 (Cont’d)
❖ Answer: NONE! Because the most favorable nodes for both players
are explored last (i.e., in the diagram, they are on the right-hand side).

Example2: the exact mirror image of Example1
❖ Which nodes can be pruned?
[Figure: game tree with MAX at the root; values printed in the figure: 3 4 / 6 5 8 7 2 1]

Example2 (Cont’d)
❖ Answer: LOTS! Because the most favorable nodes for both players are
explored first (i.e., in the diagram, they are on the left-hand side).
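
The effect of move ordering can be checked numerically. The sketch below does not use the trees of Example1 and Example2, whose shape is not recoverable from the text; as an assumption it reuses the 2-ply tree from the minimax example and reorders its children. Sorting by true minimax values is only for illustration: a real program does not know these values in advance and orders moves heuristically.

```python
import math
from typing import List, Tuple, Union

Tree = Union[int, List["Tree"]]       # leaf utility, or list of child subtrees


def minimax(node: Tree, is_max: bool) -> int:
    if isinstance(node, int):
        return node
    values = [minimax(c, not is_max) for c in node]
    return max(values) if is_max else min(values)


def alphabeta(node: Tree, is_max: bool, alpha: float, beta: float,
              seen: List[int]) -> float:
    if isinstance(node, int):
        seen.append(node)              # count every leaf that is evaluated
        return node
    value = -math.inf if is_max else math.inf
    for child in node:
        v = alphabeta(child, not is_max, alpha, beta, seen)
        if is_max:
            value = max(value, v)
            alpha = max(alpha, value)
        else:
            value = min(value, v)
            beta = min(beta, value)
        if alpha >= beta:
            break                      # prune the remaining children
    return value


def ordered(node: Tree, is_max: bool, best_first: bool) -> Tree:
    """Copy of the tree with each node's children sorted by minimax value."""
    if isinstance(node, int):
        return node
    kids = [ordered(c, not is_max, best_first) for c in node]
    # Best child is the highest value for MAX and the lowest value for MIN.
    descending = (is_max == best_first)
    return sorted(kids, key=lambda c: minimax(c, not is_max), reverse=descending)


def leaves_examined(tree: Tree) -> Tuple[float, int]:
    seen: List[int] = []
    value = alphabeta(tree, True, -math.inf, math.inf, seen)
    return value, len(seen)


EXAMPLE: List[Tree] = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]

if __name__ == "__main__":
    print("as given:   ", leaves_examined(EXAMPLE))
    print("best first: ", leaves_examined(ordered(EXAMPLE, True, True)))
    print("worst first:", leaves_examined(ordered(EXAMPLE, True, False)))
```

All three orderings return the same root value, 3. With the best moves first only 5 of the 9 leaves are examined, with the original ordering 7, and with the worst moves first all 9, which mirrors the contrast between Example1 and Example2.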

