Ch5 AdversarialSearch

Chapter 5:
Adversarial Search
(Game-Playing)
Ch5: Adversarial Search 1

Outline
❖ Games
❖ Optimal decisions in games
❖ Minimax algorithm
❖ Alpha-Beta Pruning (α-β pruning)
❖ Stochastic Games

Which Problems can we Solve?
❖ The task environments which are suitable for the search
algorithms we’ve looked at so far are:
 Fully observable
 Deterministic
 Sequential
 Static
 Discrete
 Single agent
❖ Here we will consider the situation where there are other agents
messing with the world.

Games
❖ Multiagent environments:
 Cooperative
 Competitive (the agents’ goals are in conflict) ➔ adversarial search
problems ➔ these problems are known as games
❖ In mathematical game theory (a branch of economics), any multiagent
environment (either cooperative or competitive) is a game, provided that the
impact of each agent on the others is significant.
❖ In AI, games are usually what game theorists would call deterministic,
turn-taking, two-player, zero-sum games of perfect information.
 Zero-sum ➔ one player’s loss is the other’s gain.
 Perfect information ➔ both players have access to complete information about
the state of the game. No information is hidden from either player.
❖ Examples: chess, checkers, Connect 4, Othello, Go, tic-tac-toe, …

Features of these Games
❖ Fully observable: game state is visible to both players
❖ Deterministic: no element of chance
❖ Sequential: action taken now affects future choices
❖ Static: the world doesn’t change during deliberation
❖ Discrete: the game state can be represented exactly using a
finite representation
❖ Multi agent: two agents whose actions alternate and the utility
values at the end of the game are always equal and opposite
(+1 and –1)

Game problem formulation
❖ Two players: MAX and MIN
❖ MAX moves first and they take turns until the game is over
 Winner gets reward, loser gets penalty.
 “Zero sum” means the sum of the reward and the penalty is a constant.
❖ Formal definition as a search problem:
 Initial state: Set-up specified by the rules, e.g., initial board configuration of
chess.
 Player(s): Defines which player has the move in state s.
 Actions(s): Returns the set of legal moves in state s.
 Result(s,a): Transition model; defines the result of move a in state s.
 Terminal-Test(s): Is the game finished? True if finished, false otherwise.
 Utility(s,p): Utility function; gives the numerical value of terminal state s for player p.
▪ Utility values vary from game to game:
▪ E.g., win (+1), lose (-1), and draw (0) in tic-tac-toe.
▪ E.g., win (+1), lose (0), and draw (1/2) in chess ➔ constant sum.
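The formal definition above maps onto a small set of functions. A minimal sketch for tic-tac-toe follows; the board encoding (a 9-tuple, row by row) and the function names `player`, `actions`, `result`, `terminal_test` and `utility` are assumptions chosen to mirror the slide, not taken from any library.

```python
# Sketch only: board encoding and names are assumptions for illustration.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

INITIAL_STATE = (None,) * 9  # empty 3x3 board


def player(s):
    """MAX plays 'X' and moves first, so X is to move when counts are equal."""
    return 'X' if s.count('X') == s.count('O') else 'O'


def actions(s):
    """Legal moves: indices of the empty squares."""
    return [i for i, c in enumerate(s) if c is None]


def result(s, a):
    """Transition model: the player to move marks square a."""
    board = list(s)
    board[a] = player(s)
    return tuple(board)


def winner(s):
    """Return 'X' or 'O' if someone has three in a line, else None."""
    for a, b, c in LINES:
        if s[a] is not None and s[a] == s[b] == s[c]:
            return s[a]
    return None


def terminal_test(s):
    """The game is over on a win or when the board is full."""
    return winner(s) is not None or all(c is not None for c in s)


def utility(s, p):
    """Win (+1), lose (-1), draw (0) from the point of view of player p."""
    w = winner(s)
    if w is None:
        return 0
    return 1 if w == p else -1
```

Because the utilities for the two players are always equal and opposite here, `utility(s, 'X') + utility(s, 'O')` is 0 in every terminal state.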

Game tree
❖ As for a search problem, the initial state, action set and
transition model define a game tree for the game.
 a tree where the nodes are game states and the edges are
moves.
❖ We draw the tree assuming the two players, MAX and MIN
where MAX moves first.
❖ The next slide gives a partial game tree for tic-tac-toe.

Partial game tree for tic-tac-toe
❖ How do we search this tree to find the optimal move?
❖ Terminal states are labeled depending on the winner
(MAX = +1, MIN = -1).
❖ High values are good for MAX and bad for MIN.

Optimal strategies
❖ Key thing is that we have to take into account what the other
player is doing.
❖ Rather than the simple path that is a solution in a search
problem, we need a contingent strategy, which specifies
 MAX’s move in the initial state,
 then MAX’s moves in the states resulting from every possible
response by MIN,
 then MAX’s moves in the states resulting from every possible
response by MIN to those moves
 …
❖ This gives us an optimal strategy in the sense that we do as
well as we can against an infallible opponent.

Minimax search
❖ One-move-deep (two half-moves, i.e., 2-ply) game tree:

Minimax search (Cont’d)
❖ Given a game tree, we determine the optimal strategy by
establishing the minimax value of each node s, which is the utility (for
MAX) of being in the corresponding state.
❖ More precisely, it is the value assuming that both players play
optimally from that state to the end of the game.
❖ How do we do this?
 Obviously, Minimax value of a terminal state is just its utility.
 Assume our utility function gives terminal nodes high positive values if
they are good for MAX
 And low values if they are good for MIN
 Now, look at the leaf nodes and consider which ones MAX wants:
▪ Ones with high values.
 MAX could choose these nodes if it was his turn to play.
 So, the value of the MAX-node parent of a set of nodes is the max of all
the child values.
 Similarly, when MIN plays he wants the node with the lowest value.
 So the MIN-node parent of a set of nodes gets the min of all their
values.
❖ I.e., given a choice, MAX prefers to move to a state of maximum
value, whereas MIN prefers a state of minimum value.
❖ We back up values until we get to the children of the start node, and
MAX can use this to decide which node to choose.

Minimax algorithm
Designed to find the optimal strategy for MAX and find the best move:
1. Generate the whole game tree, down to the leaves.
2. Apply the utility (payoff) function to each leaf.
3. Back up values from the leaves through the branch nodes:
▪ a MAX node computes the max of its child values
▪ a MIN node computes the min of its child values
4. At the root: choose the move leading to the child of highest value.

MINIMAX(B) = min(3,12,8) =3
MINIMAX(C) = min(2,4,6) =2
MINIMAX(D) = min(14,5,2) =2

MINIMAX(root) = max(min(3,12,8), min(2,4,6), min(14,5,2)) = max(3,2,2) =3
❖ There’s an algorithm for this.

Minimax algorithm
❖ Recursive Depth First Search:
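A minimal sketch of the recursive depth-first computation, assuming the game tree is given as nested Python lists whose leaves are terminal utilities for MAX. The function names `minimax` and `best_move` and the tree encoding are illustrative assumptions; the leaf values are those of the B, C, D example above.

```python
def minimax(node, is_max):
    """Return the minimax value of a node.

    A node is either a number (utility of a terminal state, for MAX)
    or a list of child nodes.
    """
    if not isinstance(node, list):
        return node  # terminal state: its minimax value is its utility
    values = [minimax(child, not is_max) for child in node]
    return max(values) if is_max else min(values)


def best_move(root):
    """Index of the root child with the highest backed-up value (root is MAX)."""
    values = [minimax(child, False) for child in root]
    return max(range(len(values)), key=values.__getitem__)


# Children B, C, D of the root, each a MIN node over three leaves.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree, True))  # 3
print(best_move(tree))      # 0, i.e. the move leading to B
```

The recursion visits every leaf exactly once, which is why the running time grows with the size of the whole tree.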

Properties of minimax
❖ Complete?
 Yes (if tree is finite)
❖ Optimal?
 Yes (against an optimal opponent)
❖ Time complexity?
 O(b^m), where b is the branching factor and m is the maximum depth of the tree
❖ Space complexity?
 O(bm) (depth-first exploration)
❖ For chess, b ≈ 35, m ≈ 100 for "reasonable" games
 exact solution completely infeasible (about 35^100 ≈ 10^154 nodes)
❖ It is usually impossible to develop the whole search tree.
 Moves must be made in a reasonable amount of time

Solution to the complexity problem
❖ Two solutions:
 Early cutoff of the search tree
▪ depth-limited Minimax search (MINIMAXcutoff).
 Dynamic pruning of redundant branches of the search tree
▪ Procedure: Alpha-Beta pruning

Cutting off search
❖ Idea:
 Cut off the search tree before the terminal state is reached.
❖ Problem:
 Utility is defined only for terminal states.
❖ Solution:
 Apply a heuristic evaluation function EVAL to the states where the search is cut off,
▪ which estimates the utility of the position.
❖ MinimaxCutoff search is identical to Minimax search except:
1. TERMINAL-TEST(s) is replaced by CUTOFF-TEST(s)
2. UTILITY(s) is replaced by EVAL(s)
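The two substitutions can be sketched as follows. This is an illustration under stated assumptions: the tree is a nested list as in the earlier minimax example, the cutoff is a fixed depth limit, and `evaluate` is a hypothetical stand-in heuristic (the mean of the leaves below a node), not an evaluation function from the slides.

```python
def leaves(node):
    """All leaf values below a node (a leaf is any non-list value)."""
    if not isinstance(node, list):
        return [node]
    return [x for child in node for x in leaves(child)]


def evaluate(node):
    """Hypothetical EVAL: exact at a leaf, otherwise the mean of the leaves below."""
    vals = leaves(node)
    return sum(vals) / len(vals)


def cutoff_test(node, depth, limit):
    """CUTOFF-TEST: stop at terminal states or when the depth limit is reached."""
    return not isinstance(node, list) or depth >= limit


def minimax_cutoff(node, depth, is_max, limit):
    """Minimax with TERMINAL-TEST -> CUTOFF-TEST and UTILITY -> EVAL."""
    if cutoff_test(node, depth, limit):
        return evaluate(node)
    values = [minimax_cutoff(child, depth + 1, not is_max, limit)
              for child in node]
    return max(values) if is_max else min(values)


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
exact = minimax_cutoff(tree, 0, True, limit=2)     # searches to the leaves
estimate = minimax_cutoff(tree, 0, True, limit=1)  # cut off one level early
```

With `limit=2` the search reaches the leaves and returns the true minimax value; with `limit=1` it returns only what the heuristic estimates, which can differ from the true value. The quality of the decision therefore depends on the quality of EVAL.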

Example—Tic-tac-toe.
❖ The evaluation function heuristic
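The slide's figure is not reproduced here, so the following is only an assumed example of such a heuristic: a commonly used one for tic-tac-toe scores a position as the number of lines still open for MAX minus the number still open for MIN. The names `open_lines` and `evaluate`, and the 9-tuple board encoding, are illustrative.

```python
# Assumed heuristic, for illustration only:
# EVAL(s) = (lines still open for X) - (lines still open for O)
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals


def open_lines(board, p):
    """Count lines that contain no mark of p's opponent."""
    opponent = 'O' if p == 'X' else 'X'
    return sum(1 for line in LINES
               if all(board[i] != opponent for i in line))


def evaluate(board):
    """Positive values favour MAX ('X'), negative values favour MIN ('O')."""
    return open_lines(board, 'X') - open_lines(board, 'O')


def place(square, mark='X'):
    """Helper: an otherwise empty board with one mark on the given square."""
    board = [None] * 9
    board[square] = mark
    return tuple(board)


print(evaluate((None,) * 9))  # 0: empty board, 8 open lines each
print(evaluate(place(4)))     # 4: X in the centre
print(evaluate(place(0)))     # 3: X in a corner
print(evaluate(place(1)))     # 2: X on an edge
```

Under this assumed heuristic, a first move in the centre scores highest when only MAX's own move is considered.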

Example—Tic-tac-toe. (Cont’d)

❖ Unsurprisingly (for anyone who has ever played tic-tac-toe), the move
shown in the figure is the best move.
❖ So MAX moves and then MIN replies, and then MAX
searches again:
• Here there are two equally good best moves.
• So we can break the tie randomly.
• Then we let MIN move and do the search again.
• And so on.

α-β pruning
❖ It is possible to compute the correct minimax decision without looking
at every node in the game tree
❖ Example: do DF-search until the first leaf, keeping for each node the
range of possible values [lower bound, upper bound].
 Initially: root [-∞,+∞], first MIN node [-∞,+∞]

α-β pruning Example
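Range of possible values at each step (B, C, D are the MIN children of the root, with leaves 3, 12, 8; 2, 4, 6; and 14, 5, 2):
1. First leaf of B is 3: root [-∞,+∞], B [-∞,3]
2. Second leaf of B is 12: root [-∞,+∞], B [-∞,3]
3. Third leaf of B is 8: root [3,+∞], B [3,3]
4. First leaf of C is 2: root [3,+∞], B [3,3], C [-∞,2]
▪ This node is worse for MAX, so the remaining leaves of C are pruned.
5. First leaf of D is 14: root [3,14], B [3,3], C [-∞,2], D [-∞,14]
6. Second leaf of D is 5: root [3,5], B [3,3], C [-∞,2], D [-∞,5]
7. Third leaf of D is 2: root [3,3], B [3,3], C [-∞,2], D [2,2]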

General alpha-beta pruning
❖ α is the value of the best (i.e., highest-value) choice
found so far at any choice point along the path for MAX
❖ If v is worse than α (α > v), MAX will avoid it
 prune that branch
❖ Define β similarly for MIN
(Figure: a path alternating between Player and Opponent levels, with nodes m and n and the value v marked.)

The α-β algorithm
❖ Depth first search
– only considers nodes along a single path from root at any time
α = highest-value choice found at any choice point of path for MAX
(initially, α = −infinity)
β = lowest-value choice found at any choice point of path for MIN
(initially, β = +infinity)
❖ Pass current values of α and β down to child nodes during search.
❖ Update values of α and β during search:
➢ MAX updates α at MAX nodes
➢ MIN updates β at MIN nodes
❖ Prune remaining branches at a node when α ≥ β
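The rules above can be sketched in a few lines. This is a minimal illustration, assuming the game tree is given as nested Python lists with terminal utilities (for MAX) at the leaves; the name `alphabeta` and the optional `visited` list, used only to record which leaves are examined, are assumptions of this sketch.

```python
import math


def alphabeta(node, is_max, alpha=-math.inf, beta=math.inf, visited=None):
    """Minimax value of node, pruning branches once alpha >= beta."""
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)  # record each leaf that is actually examined
        return node
    if is_max:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)   # MAX updates alpha at MAX nodes
            if alpha >= beta:
                break                   # prune remaining children
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)         # MIN updates beta at MIN nodes
        if alpha >= beta:
            break                       # prune remaining children
    return value


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
seen = []
print(alphabeta(tree, True, visited=seen))  # 3
print(seen)                                 # [3, 12, 8, 2, 14, 5, 2]
```

On this tree the leaves 4 and 6 are never examined, yet the value at the root is the same as plain minimax would return.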

α-β Example Revisited
❖ Do DF-search until the first leaf, passing α and β down and updating them on the way back up (B, C, D are the MIN children of the root):
1. α, β initial values at the root: α = −∞, β = +∞. α, β passed to kids: B starts with α = −∞, β = +∞.
2. Leaf 3: MIN updates β, based on kids. B: α = −∞, β = 3.
3. Leaves 12 and 8: MIN updates β, based on kids. No change. B: α = −∞, β = 3.
4. 3 is returned as node value of B. MAX updates α, based on kids. Root: α = 3, β = +∞.
5. α, β passed to kids: C starts with α = 3, β = +∞.
6. Leaf 2: MIN updates β, based on kids. C: α = 3, β = 2.
7. At C, α ≥ β, so prune.
8. 2 is returned as node value of C. MAX updates α, based on kids. No change. Root: α = 3, β = +∞.
9. α, β passed to kids: D starts with α = 3, β = +∞.
10. Leaf 14: MIN updates β, based on kids. D: α = 3, β = 14.
11. Leaf 5: MIN updates β, based on kids. D: α = 3, β = 5.
12. Leaf 2: 2 is returned as node value of D. Root: α = 3, β = +∞.
❖ MAX calculates the same node value, and makes the same move!

The α-β algorithm

Final Comments about Alpha-Beta Pruning
❖ Pruning does not affect final results
❖ Entire subtrees can be pruned.
❖ Good move ordering improves effectiveness of pruning
❖ Repeated states are again possible.
 Store them in memory = transposition table

Example1
❖ Which nodes can be pruned?
(Figure: game tree with Max, Min, Max levels; leaf values, left to right: 3 4 1 2 7 8 5 6.)

Example1 (Cont’d)
❖ Answer: NONE! Because the most favorable nodes for both players
are explored last (i.e., in the diagram, are on the right-hand side).
(Figure: game tree with Max, Min, Max levels; leaf values, left to right: 3 4 1 2 7 8 5 6.)

Example2 : the exact mirror image of example1
❖ Which nodes can be pruned?
(Figure: game tree with Max, Min, Max levels; leaf values, left to right: 6 5 8 7 2 1 3 4.)

Example2 (Cont’d)
❖ Answer: LOTS! Because the most favorable nodes for both players
are explored first (i.e., in the diagram, are on the left-hand side).
(Figure: game tree with Max, Min, Max levels; leaf values, left to right: 6 5 8 7 2 1 3 4.)
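The effect of move ordering in the two examples can be checked with a short sketch. It assumes both trees are binary with Max, Min, Max levels and that the leaves read left to right as 3 4 1 2 7 8 5 6 and 6 5 8 7 2 1 3 4; the tree shape, the leaf order and the helper name `alphabeta` are assumptions made for this illustration.

```python
import math


def alphabeta(node, is_max, alpha, beta, visited):
    """Alpha-beta search over a nested-list tree, recording examined leaves."""
    if not isinstance(node, list):
        visited.append(node)
        return node
    value = -math.inf if is_max else math.inf
    for child in node:
        child_value = alphabeta(child, not is_max, alpha, beta, visited)
        if is_max:
            value = max(value, child_value)
            alpha = max(alpha, value)
        else:
            value = min(value, child_value)
            beta = min(beta, value)
        if alpha >= beta:
            break  # remaining children cannot affect the result
    return value


# Example 1: best moves for both players are on the right.
example1 = [[[3, 4], [1, 2]], [[7, 8], [5, 6]]]
# Example 2: best moves for both players are on the left.
example2 = [[[6, 5], [8, 7]], [[2, 1], [3, 4]]]

seen1, seen2 = [], []
value1 = alphabeta(example1, True, -math.inf, math.inf, seen1)
value2 = alphabeta(example2, True, -math.inf, math.inf, seen2)
print(value1, len(seen1))  # 6 8  (no leaf pruned)
print(value2, seen2)       # 6 [6, 5, 8, 2, 1]  (leaves 7, 3 and 4 pruned)
```

Both trees have the same root value, but the second ordering examines only five of the eight leaves.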
