4 Adversarial Search: Game Trees


28-08-2023

Adversarial Search
Two-person games
Russell & Norvig (text book) and
Patrick Henry Winston (reference book)

Game Theory
• Mathematical game theory, a branch of Economics,
views any multiagent environment as a game, provided
that the impact of each agent on the others is
“significant”, regardless of whether the agents are
cooperative or competitive.
• Game playing was one of the first tasks undertaken in AI.
• By 1950, chess had been tackled by Konrad Zuse,
Claude Shannon, Norbert Wiener and by Alan Turing.
• The state of a game is easy to represent, and agents
are usually restricted to a small number of actions whose
outcomes are defined by precise rules.

* Environments with many agents are best viewed as economies rather than games


Complexity
• In tic-tac-toe there are nine first moves, with 8 possible
responses to each of them, followed by 7 possible
responses to each of these, and so on.
• It follows that there are 9 × 8 × 7 × … × 1 = 9! (= 362,880)
possible paths.
• Although it is not impossible for a computer to search this
number of paths exhaustively, many important problems (e.g.
chess) exhibit factorial or exponential complexity on a much
larger scale.
• For example, chess has about 10^120 possible game paths;
checkers has about 10^40, some of which may never occur in an
actual game.
• These spaces are difficult or impossible to search
exhaustively.

Game-Tree Sizes
• Sizes of game trees (total number of nodes):
– Nim-5: 28 nodes
– Tic-Tac-Toe: ~10^5 nodes
– Checkers: ~10^31 nodes
– Chess: ~10^123 nodes
– Go: ~10^360 nodes
• In practice it is intractable to find a solution
with minimax

Types of games

                          deterministic                 chance

perfect information       chess, checkers, go,          backgammon,
                          othello, tic-tac-toe          monopoly

imperfect information                                   bridge, poker,
                                                        scrabble

Typical case
• Zero-sum: one player’s loss is the other’s gain
• Perfect information: both players have access to complete
information about the state of the game. No information is
hidden from either player.
• No chance (e.g., using dice) involved
• Examples: Tic-Tac-Toe, Checkers, Chess, Go, Nim,
Othello
• Not: Bridge, Solitaire, Backgammon, ...
• Imperfect information: game of Bridge, as not all cards
are visible to each player.
• Competitive multiagent environments give rise to
adversarial search problems, also known as games


Games vs. search problems


• The problem-solving agent is not alone any more
– Multiagent, conflicts
• Default: deterministic, turn-taking, two-player,
zero-sum game of perfect information
– Perfect information vs. imperfect, or probability
• "Unpredictable" opponent → must specify a move
for every possible opponent reply
• Time limits → unlikely to find the goal, must
approximate

Game formalization
• Initial state
• A successor function
– Returns a list of (move, state) pairs
• Terminal test
– Terminal states
• Utility function (or objective function)
– A numeric value for the terminal states
• Game tree
– The state space
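The formalization above can be sketched as a small Python interface; all class and method names here are illustrative assumptions, not from the slides:

```python
class Game:
    """Abstract two-player game, following the formalization above."""

    def initial_state(self):
        raise NotImplementedError

    def successors(self, state):
        """Return a list of (move, state) pairs."""
        raise NotImplementedError

    def is_terminal(self, state):
        """Terminal test."""
        raise NotImplementedError

    def utility(self, state):
        """Utility (objective) function: numeric value of a terminal state."""
        raise NotImplementedError
```

A concrete game (tic-tac-toe, nim, ...) would subclass this and fill in the four operations; the game tree is then implicit in repeated calls to `successors`.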


Two-Person Games
• A game can be formally defined as a kind
of search problem with an initial state, a set
of operators, a terminal test, and a utility
function
• A search tree may be constructed, with a
large number of states
• States at depth d and depth d+1 belong to
different players (two plies)

MiniMax Game tree


• MiniMax is a depth-first, depth-limited, recursive search
procedure.
• This method is used for playing games in which there are 2
players taking turns to play moves.
• Physically it is just a tree of all possible moves.
• MiniMax game trees are best suited for games in which both
players can see the entire game situation.
• The strategy behind the MiniMax algorithm is that it
assumes both players will play to the best of their
ability.
• Works for zero-sum, perfect-information games.


Optimal Play

[Figure: a two-ply game tree with leaf values 2, 7, 1, 8 under the MIN nodes. The MIN nodes back up the values 2 and 1; MAX chooses the branch worth 2, which is the optimal play.]

Minimax
• Perfect play for deterministic games: optimal strategy
• Idea: choose move to position with highest minimax value
= best achievable payoff against best play
• E.g., 2-ply game: only two half-moves


Evaluation function
• An evaluation function or static evaluator is used to
evaluate the “goodness” of a game position.
– Contrast with heuristic search, where the evaluation function
was a non-negative estimate of the cost from the start node
to a goal passing through the given node.
• The zero-sum assumption allows us to use a single
evaluation function to describe the goodness of a
board with respect to both players.
– f(n) >> 0: position n good for me and bad for you
– f(n) << 0: position n bad for me and good for you
– f(n) near 0: position n is a neutral position
– f(n) = +infinity: win for me
– f(n) = -infinity: win for you

Game Tree

• The key idea is that the more look-ahead we can
do, that is,
– the deeper in the tree we can look, the better our
evaluation of a position will be,
– even with a simple evaluation function.


Evaluation function for Tic-Tac-Toe
• Example:
f(n) = [# of 3-lengths open for me] - [# of 3-lengths
open for you]
where a 3-length is a complete row, column, or diagonal
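This evaluation function can be sketched directly; the board representation (a 3x3 list of 'X', 'O', or None) and the function names are assumptions for illustration:

```python
# All 3-lengths on a 3x3 board: rows, columns, and the two diagonals.
LINES = (
    [[(r, c) for c in range(3)] for r in range(3)] +            # rows
    [[(r, c) for r in range(3)] for c in range(3)] +            # columns
    [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]]  # diagonals
)

def open_lines(board, player):
    """Count 3-lengths still open for `player` (not blocked by the opponent)."""
    return sum(1 for line in LINES
               if all(board[r][c] in (player, None) for r, c in line))

def evaluate(board, me, you):
    """f(n) = [# 3-lengths open for me] - [# 3-lengths open for you]."""
    return open_lines(board, me) - open_lines(board, you)
```

For example, with only an X in the center, all 8 lines are open for X but only 4 for O, so f = 4.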

E.g. 2) Tic-tac-toe: game tree
(2-player, deterministic, turns)


MiniMax Algorithm
• It computes the minimax decision from the
current state.
• It uses a simple recursive computation of
the minimax values of each successor
state.
• The recursion proceeds all the way down
to the leaves of the tree, and then the
minimax values are backed up through the
tree as the recursion unwinds.


MiniMaxValue Function
• If the state is terminal, then
– return the corresponding utility function value
• else if MAX is to move in the state, then
– return the highest MiniMaxValue of the
successors of the state
• else
– return the lowest MiniMaxValue of the
successors of the state
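The function above translates directly to Python; the `game` object is assumed to provide the `is_terminal`, `utility`, and `successors` operations from the game formalization, plus a `to_move` query (all names illustrative):

```python
def minimax_value(game, state):
    """Recursive minimax: utility at leaves, max/min backed up elsewhere."""
    if game.is_terminal(state):
        return game.utility(state)
    # Values of all successor states, computed recursively.
    values = [minimax_value(game, s) for _, s in game.successors(state)]
    if game.to_move(state) == "MAX":
        return max(values)
    return min(values)
```

Run on the two-ply example tree with leaves 2, 7, 1, 8, this backs up MIN values 2 and 1 and returns 2 at the MAX root.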

Pseudo-code that implements MiniMax
• It is a simple recursive alternation of
maximization and minimization at each
layer.
• We assume that we count the depth value
down from the max depth, so that when we
reach a depth of 0, we apply our static
evaluation to the board.
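A minimal sketch of the depth-limited scheme just described, assuming hypothetical `moves` and `static_eval` helpers supplied by the caller:

```python
def minimax(board, depth, maximizing, moves, static_eval):
    """Depth-limited minimax: apply the static evaluator when depth hits 0."""
    children = moves(board, maximizing)
    if depth == 0 or not children:
        return static_eval(board)        # cut off: evaluate the board
    if maximizing:
        return max(minimax(c, depth - 1, False, moves, static_eval)
                   for c in children)
    return min(minimax(c, depth - 1, True, moves, static_eval)
               for c in children)
```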


Efficiency of Minimax

Criterion    Minimax
Complete?    yes, in theory
Optimal?     yes
Time         O(b^m)
Space        O(bm)


Properties of minimax
• Complete? Yes (if the tree is finite)
• Optimal? Yes (against an optimal opponent)
• Time complexity? O(b^m)
• Space complexity? O(bm) (depth-first exploration)

• For chess, b ≈ 35, m ≈ 100 for "reasonable" games
→ exact solution completely infeasible

Drawback of minimax search


• The number of game states is exponential in
the number of moves.
– Solution: do not examine every node
– ==> Alpha-beta pruning
• Remove branches that do not influence the final
decision


Alpha-beta pruning
• We can improve on the performance of the minimax
algorithm through alpha-beta pruning.

• Basic idea: “If you have an idea that is surely bad, don't
take the time to see how truly awful it is.” -- Pat Winston

[Figure: a MAX root over two MIN nodes with leaves 2, 7 and 1, ?. The left MIN node backs up 2, so MAX ≥ 2; the right MIN node sees leaf 1, so its value is ≤ 1. We don't need to compute the value at the remaining node: no matter what it is, it can't affect the value of the root MAX node.]

[Figure: MAX node A over MIN nodes B and C; B over MAX nodes D and E; D over leaves H = 6 and I = 5; E over leaves J = 8 and K.
• On discovering util(D) = 6, we know that util(B) ≤ 6.
• On discovering util(J) = 8, we know that util(E) ≥ 8.
• STOP! We can stop expansion of E, as best play will not go via E; the value of K is irrelevant: prune it! What else can you deduce now?
(The two node shapes mark the agent and the opponent.)]


Example with MIN


[Figure: a MIN root (β ≤ 5) over MAX nodes. The left MAX node has α = 5 after its leaves 3, 4, 5. At the right MAX node, as soon as the child with value 6 is generated we know its alpha value will be at least 6, exceeding the MIN root's β = 5, so we don't need to generate the remaining nodes (and the subtrees below them). Some of the other nodes still need to be looked at.]

Example of Alpha-Beta Pruning

[Figure: a MAX root over three MIN nodes; the leaf values are 3, 12, 8, 2, 14, 5, 2. The MIN nodes back up 3, 2, and 2, so the MAX root's value is 3, and some leaves are pruned along the way.]


MiniMax Example (e.g. from Patrick Henry Winston's book)

[Figure: a minimax tree over the leaf values 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2.]


Alpha-Beta Example

[Figures: a step-by-step alpha-beta trace over the same tree with leaf values 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2. Slide by slide, interior values are backed up from left to right (0, -3, 3, 2, ...) and subtrees are pruned as cutoffs occur; the value finally backed up to the root is 1.]

Alpha-Beta Pruning
• Alpha = the value of the best choice (i.e.
highest value) we have found so far at any
choice point along the path for MAX.
• Beta = the value of the best choice (i.e.
lowest value) we have found so far at any
choice point along the path for MIN.


Alpha-Beta Pruning
• When applied to a standard minimax tree, it
returns the same move as minimax would, but
prunes away branches that cannot possibly
influence the final decision
• May eliminate some static evaluations
• May eliminate some node expansions

Alpha-Beta Pruning
• Alpha values of MAX nodes can never decrease.
• Beta values of MIN nodes can never increase.
• Rules for discontinuing the search:
– Search can be discontinued below any MIN node
having a beta value less than or equal to the alpha
value of any of its MAX node ancestors.
– Search can be discontinued below any MAX node
having an alpha value greater than or equal to the
beta value of any of its MIN node ancestors.


Why is it called α-β?
• α is the value of the
best (i.e., highest-
value) choice found so
far at any choice point
along the path for MAX
• If v is worse than α,
MAX will avoid it
→ prune that branch
• β can be similarly
defined for MIN

Pseudo-code for Alpha-Beta
• We start out with the range of possible scores (as
defined by alpha and beta) going from minus infinity to
plus infinity.
• We call Max-Value with the current board state. If we are
at a leaf, we return the static value.


Pseudo-code for Alpha-Beta (continued)
• Otherwise, we look at each of the successors of this
state (by applying the legal-move function), and for each
successor we call the minimizer (Min-Value), keeping
track of the maximum value returned in alpha.
• If the value of alpha (the lower bound on the score) ever
gets to be greater than or equal to beta (the upper bound),
then we know that we don't need to keep looking (this is
called a cutoff) and we return alpha immediately.
• Otherwise we return alpha at the end of the loop. The
minimizer is completely symmetric.


Efficiency of Alpha-Beta procedure


• The efficiency of Alpha-Beta procedure
depends on the order in which successors
of a node are examined:
– If we were lucky, at a MIN node we would
always consider the nodes in order from
low to high score.
– And at a MAX node the nodes in order
from high to low score.

The Alpha-Beta Procedure


• Suppose that there is a game that always allows a
player to choose among b different moves, and we
want to look d moves ahead.
• Then our search tree has b^d leaves.
• Therefore, if we do not use alpha-beta pruning, we
would have to apply the static evaluation function
N_d = b^d times.


Optimal Alpha-Beta Ordering

[Figure: a branching-factor-3, depth-3 tree with nodes numbered in the order they are considered: the root's children 2, 3, 4; grandchildren 5-13; leaves 14-40. Under optimal ordering only a subset of the leaves must be evaluated.]

The Alpha-Beta Procedure


Of course, the efficiency gained by the alpha-beta method
always depends on the rules and the current configuration
of the game.
However, if we assume that somehow new children of a
node are explored in a particular order (those nodes p are
explored first that will yield maximum values e(p) at depth d
for MAX and minimum values for MIN) the number of
nodes to be evaluated is:

N_d = 2b^(d/2) - 1                        for even d
N_d = b^((d+1)/2) + b^((d-1)/2) - 1       for odd d


Properties of α-β
• Pruning does not affect the final result

• Good move ordering improves the effectiveness of pruning

• The efficiency of the α-β procedure depends on the order in
which successors of a node are examined:
– At a MIN node, consider the nodes in order from low to high
score
– At a MAX node, consider the nodes in order from high to low
score

• With "perfect ordering," time complexity = O(b^(m/2))
→ doubles the depth of search

Effectiveness of Alpha-beta pruning


• Alpha-Beta is guaranteed to compute the same value for the
root node as computed by Minimax.
• Worst case: NO pruning, examining O(b^d) leaf nodes,
where each node has b children and a d-ply search is
performed
• Best case: examine only O(b^(d/2)) leaf nodes.
– You can search twice as deep as Minimax! Or the
effective branching factor is b^(1/2), i.e. sqrt(b), rather than b.
• The best case is when each player's best move is the leftmost
alternative, i.e.
– at MAX nodes the child with the largest value is generated
first, and
– at MIN nodes the child with the smallest value is generated
first.


Alpha-Beta Pruning Analysis


• Worst case:
– Bad ordering: Alpha-beta prunes NO nodes
• Best case:
– Assume cooperative oracle orders nodes
• Best value on left
• “If an opponent has some response that makes
move bad no matter what the moving player does,
then the move is bad.”
• Implies: check move where opposing player has
choice, check all own moves

Cutting off search


• MinimaxCutoff is identical to MinimaxValue
except
1. Terminal-Test is replaced by Cutoff-Test
2. Utility is replaced by Evaluation Function
• 4-ply lookahead is a hopeless chess player!
– 4-ply ≈ human novice
– 8-ply ≈ typical PC, human master
– 12-ply ≈ Deep Blue, Kasparov


Game Playing – Example 2


• Nim (a simple game)
• Start with a single pile of tokens
• At each move the player must select a pile
and divide the tokens into two non-empty,
non-equal piles
[Figure: a pile of tokens split successively into smaller unequal piles.]


A variant of the game nim


• A number of tokens are placed on a table
between the two opponents
• A move consists of dividing a pile of tokens into
two nonempty piles of different sizes
• For example, 6 tokens can be divided into piles
of 5 and 1 or 4 and 2, but not 3 and 3
• The first player who can no longer make a move
loses the game
• For a reasonable number of tokens, the state
space can be exhaustively searched
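The move rule above (split one pile into two non-empty piles of different sizes) is easy to enumerate; a sketch where a state is a tuple of pile sizes, with illustrative names:

```python
def nim_moves(piles):
    """All states reachable by splitting one pile into two unequal piles."""
    results = set()
    for i, p in enumerate(piles):
        rest = piles[:i] + piles[i + 1:]
        for a in range(1, p // 2 + 1):
            b = p - a
            if a != b:                               # unequal piles only
                results.add(tuple(sorted(rest + (a, b))))
    return sorted(results)
```

A state with no legal moves (only piles of size 1 and 2 remain) is terminal, and by the rules the player to move there loses.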

State space for a variant of nim

[Figure: the exhaustively drawn state space. Source: George F. Luger's book]


Game Playing – Nim
• Remember that larger values are taken to be better
for MAX
• Assume that we use a utility function of
1 = a win for MAX
0 = a win for MIN
• We only compare values, "larger or smaller", so the
actual sizes do not matter
– in other games we might use {+1, 0, -1} for
{win, draw, lose}.

Some Chess Programs


• The earliest serious chess program
(MacHack-6), which had a rating of 1200,
searched on average to a depth of 4.
• Belle, one of the first hardware-
assisted chess programs, doubled the depth to 8
and gained about 800 rating points (2000).
• Deep Blue, which searched to an average depth
of about 13, beat the world champion with a
rating of about 2900.


U.S. Chess Federation Rating

Some features of Deep Blue


• It had 256 specialized chess processors coupled
into a 32-node supercomputer.
• It examined around 30 billion moves per
minute.
• The typical search depth was 13 ply, but in
some dynamic situations it could go as deep as
30.
• In Deep Blue, it was found empirically that with Alpha-
Beta pruning the average branching factor at each node
was about 6 instead of about 35-40.


Chess: Evaluation function examples
• Alan Turing’s function for chess
– f(n) = w(n)/b(n)
– where w(n) = sum of the point values of white’s pieces and
b(n) = sum of black’s
• Most evaluation functions are specified as a
weighted sum of position features:
f(n) = w1*feat1(n) + w2*feat2(n) + ... + wk*featk(n)
– Example features for chess are piece count, piece
placement, squares controlled, etc.

Evaluation functions
• Deep Blue has about 6000 features in its evaluation
function
• For chess, typically linear weighted sum of features
Eval(white) = w1 f1(s) + w2 f2(s) + … + wn fn(s)
– e.g., w1 = 9 for queen, w2 = 5 for rook, … wn = 1 for pawn
f1(s) = (number of white queens) – (number of black
queens), etc.
Score = Eval(white) - Eval(black)
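A sketch of such a linear material evaluation using the standard weights quoted above; the piece encoding (uppercase = white, lowercase = black) and the function name are assumptions for illustration:

```python
# Material weights: queen 9, rook 5, bishop/knight 3, pawn 1.
WEIGHTS = {"q": 9, "r": 5, "b": 3, "n": 3, "p": 1}

def material_score(pieces):
    """Eval(white) - Eval(black) as a weighted piece-count difference."""
    score = 0
    for piece in pieces:
        w = WEIGHTS.get(piece.lower(), 0)   # kings and unknown codes score 0
        score += w if piece.isupper() else -w
    return score
```

A full evaluation function would add further weighted features (piece placement, squares controlled, ...) to this material term.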


Chinese Checkers

• Move all your pieces into your opponent’s home area.
• In each move, a piece can either move to a neighboring
position or jump over any number of pieces.

Chinese Checkers
• Sample moves for RED (bottom) player:

42
28-08-2023

Chinese Checkers
[Figure: board positions labeled with positional values from 0 up to 8, increasing toward the opponent's home area.]

• Idea for an important feature:
• assign positional values
• sum the values for all pieces of each player
• The feature progress is the difference of this sum between the players

Chinese Checkers
• Another important feature:
– For successful play, no piece should be left behind.
– Therefore add another feature,
coherence: the difference between the players in terms of the
spread between the smallest and highest positional value of their pieces.

• Weights used in the program:
• 1 for progress
• 2 for coherence
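The two features and the quoted weights (1 for progress, 2 for coherence) can be sketched as follows, assuming each player's pieces are given as a list of positional values; all names, and the exact sign convention for coherence, are illustrative assumptions:

```python
def progress(mine, theirs):
    """Difference in summed positional values between the players."""
    return sum(mine) - sum(theirs)

def coherence(mine, theirs):
    """Smaller spread (max - min positional value) means no piece left behind."""
    return (max(theirs) - min(theirs)) - (max(mine) - min(mine))

def evaluate(mine, theirs, w_progress=1, w_coherence=2):
    """Weighted sum of the two features, from the first player's viewpoint."""
    return (w_progress * progress(mine, theirs)
            + w_coherence * coherence(mine, theirs))
```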


Single move games


• The prisoners’ dilemma is the best-known game of
strategy in social science.
• It helps us understand what governs the balance
between cooperation and competition in business, in
politics, and in social settings.
• In game theory, it is a situation in which two players
each have two options whose outcome depends crucially
on the simultaneous choice made by the other, often
formulated in terms of two prisoners separately deciding
whether to confess to a crime.
• In the traditional version of the game, the police have
arrested two suspects and are interrogating them in
separate rooms.
– Each can either confess, thereby implicating the other, or keep
silent. No matter what the other suspect does, each can improve
his own position by confessing.

Prisoner’s Dilemma
• Consider the following story: Two alleged burglars, Alice and
Bob, are caught red handed near the scene of a burglary and
are interrogated separately.
• A prosecutor offers each a deal: if you testify against your
partner as the leader of a burglary ring, you’ll go free for
being the cooperative one, while your partner will serve 10
years in prison.
• However, if you both testify against each other, you’ll both
get 5 years.
• Alice and Bob also know that if both refuse to testify they will
serve only 1 year each for the lesser charge of possessing
stolen property.


Best strategy
• Now Alice and Bob face the so-called prisoner’s
dilemma: should they testify or refuse?
• Being rational agents, Alice and Bob each want to
maximize their own expected utility.
• Let’s assume that Alice is callously unconcerned about
her partner’s fate, so her utility decreases in proportion
to the number of years she will spend in prison,
regardless of what happens to Bob.
• Bob feels exactly the same way.

Pay-off Matrix

• Alice analyzes the payoff matrix as follows: “Suppose
Bob testifies. Then I get 5 years if I testify and 10 years if
I don’t, so in that case testifying is better.
• On the other hand, if Bob refuses, then I get 0 years if I
testify and 1 year if I refuse, so in that case as well
testifying is better.
• So in either case, it’s better for me to testify, so that’s
what I must do.” Alice has discovered that
testify is a dominant strategy for the game.
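Alice's case analysis can be checked mechanically; a sketch using the payoff numbers from the story (entries are years in prison, so lower is better; all names are illustrative):

```python
# YEARS[(alice_move, bob_move)] = (alice_years, bob_years)
YEARS = {
    ("testify", "testify"): (5, 5),
    ("testify", "refuse"):  (0, 10),
    ("refuse",  "testify"): (10, 0),
    ("refuse",  "refuse"):  (1, 1),
}

def is_dominant_for_alice(move):
    """True if `move` is at least as good for Alice against every Bob reply."""
    other = "refuse" if move == "testify" else "testify"
    return all(YEARS[(move, bob)][0] <= YEARS[(other, bob)][0]
               for bob in ("testify", "refuse"))
```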


Dominant strategy

Application of the prisoners’ dilemma in
Economics and Business
• Consider two firms, say Coca-Cola and Pepsi, selling
similar products. Each must decide on a pricing strategy.
– They best exploit their joint market power when both charge a
high price; each makes a profit of ten million dollars per month.
• If one sets a competitive low price, it wins a lot of
customers away from the rival.
– Suppose its profit rises to twelve million dollars, and that of the
rival falls to seven million. If both set low prices, the profit of each
is nine million dollars.
• Here, the low-price strategy is akin to the prisoner’s
confession, and the high-price akin to keeping silent.
– Call the former cheating, and the latter cooperation.
– Then cheating is each firm’s dominant strategy, but the result
when both “cheat” is worse for each than that of both
cooperating.


Nash equilibrium
• If Alice is clever as well as rational, she will continue to
reason as follows: Bob’s dominant strategy is also to
testify. Therefore, he will testify and we will both get five
years.
• When each player has a dominant strategy, the
combination of those strategies is called a dominant
strategy equilibrium.
• An equilibrium is essentially a local optimum in the
space of policies; it is the top of a peak that slopes
downward along every dimension, where a dimension
corresponds to a player’s strategy choices.
• The mathematician John Nash (1928–2015) proved that
every game has at least one equilibrium.
• The general concept of equilibrium is called Nash
equilibrium in his honor.

Drawback
• The dilemma in the prisoner’s dilemma is that the
equilibrium outcome is worse for both players than the
outcome they would get if they both refused to testify.
• It is certainly an allowable option for both of them to
refuse to testify, but is hard to see how rational agents
can get there, given the definition of the game.
• Either player contemplating playing refuse will realize
that he or she would do better by playing testify.
• That is the attractive power of an equilibrium point.
• Game theorists agree that being a Nash equilibrium is a
necessary condition for being a solution—although they
disagree whether it is a sufficient condition.


Expectiminimax
• Its application is for games that contain a certain
element of unpredictability, and as such the game tree is
not deterministic.
• Typical games that can use this method include
Backgammon and others with dice rolling.
• An expectiminimax game tree includes another set of nodes,
Chance nodes, which are shown as circles.
• At each player's turn there is a chance node for every
possible outcome of the random element:
– in the case of rolling 2 dice there are 21 distinct outcomes, each
with an associated probability of that node occurring.

Expectiminimax
• Terminal nodes
• Max and Min nodes
• Evaluated exactly the same way as before
• Chance nodes
– Evaluated by taking the weighted average of
the values resulting from all possible dice rolls


ExpectiMiniMax
• Expectiminimax(n) =
Utility(n), if n is a terminal node
max over successors(n), if n is a Max node
min over successors(n), if n is a Min node
sum over successors s of Prob(s) × Expectiminimax(s), if n is a Chance node

– Time complexity increases to O(b^m n^m) for
expectiminimax, where n is the number of distinct
chance outcomes, e.g. dice rolls.
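The three-way definition above translates directly; the game interface is assumed to label each node and to supply outcome probabilities at chance nodes (all names illustrative):

```python
def expectiminimax(game, state):
    """Utility at leaves; max/min at player nodes; expectation at chance nodes."""
    if game.is_terminal(state):
        return game.utility(state)
    kind = game.node_type(state)      # "MAX", "MIN", or "CHANCE"
    if kind == "CHANCE":
        # Weighted average over all possible random outcomes.
        return sum(p * expectiminimax(game, s)
                   for p, s in game.chance_successors(state))
    values = [expectiminimax(game, s) for _, s in game.successors(state)]
    return max(values) if kind == "MAX" else min(values)
```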

Schematic game tree for a backgammon
position


An order-preserving transformation on
leaf values changes the best move


Summary
• Deep learning based methods
• Adversarial Search Methods:
– MiniMax
– Alpha-Beta Cut-off
– ExpectiMiniMax
