4 Adversarial Search: Game Tree
Adversarial Search
Two-person games
Russell & Norvig (textbook) and
Patrick Henry Winston (reference book)
Game Theory
• Mathematical game theory, a branch of economics, views
any multi-agent environment as a game, provided that the
impact of each agent on the others is "significant",
regardless of whether the agents are cooperative or
competitive.
• Game playing was one of the first tasks undertaken in AI.
• By 1950, chess had been tackled by Konrad Zuse,
Claude Shannon, Norbert Wiener, and Alan Turing.
• The state of a game is easy to represent, and agents
are usually restricted to a small no. of actions whose
outcomes are defined by precise rules.
* Environments with many agents are best viewed as economies rather than games
1
28-08-2023
Complexity
• In tic-tac-toe there are nine first moves with 8 possible
responses to each of them, followed by 7 possible
responses to each of these, and so on.
• It follows that tic-tac-toe has 9 × 8 × 7 × … × 1, i.e. 9!
(= 362,880), possible game paths.
• Although it is not impossible for a computer to search this
no. of paths exhaustively, many important problems (e.g.,
chess) exhibit factorial or exponential complexity on a
much larger scale.
• For example, chess has about 10^120 possible game paths;
checkers has about 10^40, some of which may never occur in
an actual game.
• These spaces are difficult or impossible to search
exhaustively.
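The arithmetic above is easy to check directly; a quick sketch in Python:

```python
import math

# Tic-tac-toe: at most 9 * 8 * ... * 1 = 9! distinct move sequences
# (an upper bound; games that end early have shorter paths).
paths = math.factorial(9)
print(paths)  # → 362880

# Chess, at roughly 10^120 game paths, is larger by well over
# a hundred orders of magnitude.
print(10**120 // paths > 10**114)  # → True
```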
Game-Tree Sizes
• Sizes of game trees (total no. of nodes):
– Nim-5: 28 nodes
– Tic-Tac-Toe: ~10^5 nodes
– Checkers: ~10^31 nodes
– Chess: ~10^123 nodes
– Go: ~10^360 nodes
• In practice it is intractable to find a solution
with minimax
Types of games
                        deterministic                  chance
perfect information     chess, checkers, go,           backgammon, monopoly
                        othello, tic-tac-toe
imperfect information                                  bridge, poker, scrabble
Typical case
• Zero-sum: one player’s loss is the other’s gain
• Perfect information: both players have access to complete
information about the state of the game. No information is
hidden from either player.
• No chance (e.g., using dice) involved
• Examples: Tic-Tac-Toe, Checkers, Chess, Go, Nim,
Othello
• Not: Bridge, Solitaire, Backgammon, ...
• Imperfect information: game of Bridge, as not all cards
are visible to each player.
• Competitive multi-agent environments give rise to
adversarial search, also known as games
Game formalization
• Initial state
• A successor function
– Returns a list of (move, state) pairs
• Terminal test
– Terminal states
• Utility function (or objective function)
– A numeric value for the terminal states
• Game tree
– The state space
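The components above can be made concrete with a toy game. A minimal sketch, using an illustrative Nim variant (5 sticks, remove 1 or 2, taking the last stick wins; the class and method names are our own):

```python
class NimGame:
    """Game formalization: initial state, successor function,
    terminal test, and utility function."""

    def initial_state(self):
        return (5, 0)                     # (sticks left, player to move)

    def successors(self, state):
        # Returns a list of (move, state) pairs, as in the formalization.
        sticks, player = state
        return [(take, (sticks - take, 1 - player))
                for take in (1, 2) if take <= sticks]

    def is_terminal(self, state):
        return state[0] == 0              # no sticks left

    def utility(self, state, player):
        # The player who just moved took the last stick and wins.
        mover = 1 - state[1]
        return 1 if mover == player else -1

g = NimGame()
print(g.successors(g.initial_state()))  # → [(1, (4, 1)), (2, (3, 1))]
```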
Two-Person Games
• A game can be formally defined as a kind
of search problem with initial state, a set
of operators, a terminal test, and a utility
function
• A search tree may be constructed, with a
large number of states
• States at depth d and depth d+1 belong to
different players (two plies)
Optimal Play
[Figure: a 2-ply game tree with leaf values 2, 7, 1, 8. The two MIN
nodes back up the values 2 and 1; MAX chooses the left branch, so the
value of optimal play is 2.]
Minimax
• Perfect play for deterministic games: optimal strategy
• Idea: choose move to position with highest minimax value
= best achievable payoff against best play
• E.g., 2-ply game: only two half-moves
Evaluation function
• Evaluation function or static evaluator is used to
evaluate the “goodness” of a game position.
– Contrast with heuristic search, where the evaluation function
was a non-negative estimate of the cost of a path from the start
node to a goal passing through the given node.
Game Tree
MiniMax Algorithm
• It computes the minimax decision from the
current state.
• It uses a simple recursive computation of
the minimax values of each successor
state.
• The recursion proceeds all the way down
to the leaves of the tree, and then the
minimax values are backed up through the
tree as the recursion unwinds.
MiniMaxValue Function
• if the state is terminal then
– return the corresponding utility value
• else if MAX is to move in the state
– return the highest MiniMaxValue of the
successors of the state
• else (MIN is to move)
– return the lowest MiniMaxValue of the
successors of the state
Efficiency of Minimax
Criterion    Minimax
Optimal?     yes
Time         O(b^m)
Space        O(b·m)
Properties of minimax
• Complete? Yes (if tree is finite)
• Optimal? Yes (against an optimal opponent)
• Time complexity? O(b^m)
• Space complexity? O(b·m) (depth-first exploration)
Alpha-beta pruning
• We can improve on the performance of the minimax
algorithm through alpha-beta pruning.
• Basic idea: “If you have an idea that is surely bad, don't
take the time to see how truly awful it is.” -- Pat Winston
• We don't need to compute the value at this node: no matter what
it is, it can't affect the value of the root MAX node.
[Figure: MAX root with bound >= 2; its left MIN child evaluates to 2
(leaves 2 and 7), while the right MIN child is bounded <= 1 after its
first leaf 1, so its remaining leaf (marked "?") is pruned.]
[Figure: on discovering util(D) = 6, a MIN node's bound becomes <= 6.]
[Figure: a MIN node whose successors so far have values 3, 4, 5, 6.
As soon as the successor with value 6 is generated, the alpha value
below it will be at least 6, so the remaining successors (and the
subtrees below them) need not be generated; some of the earlier
siblings still need to be looked at.]
[Figure: the classic 2-ply trace: a MAX root over three MIN nodes with
leaves (3, 12, 8), (2, …), (14, 5, 2). The MIN values back up as 3, 2,
2 and the root value is 3, with the second MIN node pruned after its
first leaf 2.]
MiniMax Example

[Worked example: a game tree over the leaf values
0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2,
evaluated bottom-up with minimax.]

Alpha-Beta Example

[The same tree evaluated step by step with alpha-beta pruning, one
step per slide. Interior values are backed up in the order
0, -3, 3, 2, 1, -5, …, and the final root value is 1; subtrees whose
values can no longer affect an already established backed-up value
are pruned.]
Alpha-Beta Pruning
• Alpha = the value of the best choice (i.e.
highest value) we have found so far at any
choice point along the path for MAX.
Alpha-Beta Pruning
• When applied to a standard minimax tree, it
returns the same move as minimax would, but
prunes away branches that cannot possibly
influence the final decision
• May eliminate some static evaluations
• May eliminate some node expansions
Alpha-Beta Pruning
• Alpha values of MAX nodes can never decrease.
• Beta values of MIN nodes can never increase.
• Rules for discontinuing the search:
– Search can be discontinued below any MIN node
having a beta value less than or equal to the alpha
value of any of its MAX-node ancestors.
– Search can be discontinued below any MAX node
having an alpha value greater than or equal to the
beta value of any of its MIN-node ancestors.
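The two discontinuation rules above correspond to the two `break` statements in a standard implementation. A minimal sketch over the same nested-list tree encoding used for plain minimax (leaf values follow the classic three-MIN-node example):

```python
def alphabeta(node, alpha, beta, maximizing):
    """Minimax with alpha-beta pruning; leaves are numbers,
    internal nodes are lists of child subtrees.
    alpha = best value guaranteed for MAX so far on this path,
    beta  = best value guaranteed for MIN so far on this path."""
    if not isinstance(node, list):
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:      # a MIN ancestor will never allow this
                break              # prune the remaining children
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:      # a MAX ancestor will never allow this
                break
        return value

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree, float("-inf"), float("inf"), True))  # → 3
```

Note the second MIN node is abandoned as soon as its first leaf (2) falls below alpha = 3, exactly as in the worked figure.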
• β can be similarly defined for MIN: the value of the
best (i.e. lowest-value) choice found so far at any
choice point along the path for MIN.
Properties of α-β
• Pruning does not affect final result
[Figure: state space for a variant of Nim.]
Evaluation functions
• Deep Blue has about 6000 features in its evaluation
function
• For chess, typically a linear weighted sum of features:
  Eval(white) = w1·f1(s) + w2·f2(s) + … + wn·fn(s)
  – e.g., w1 = 9 for the queen, w2 = 5 for a rook, …, wn = 1 for a pawn;
    f1(s) = (number of white queens) - (number of black queens), etc.
  Score = Eval(white) - Eval(black)
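As a sketch of the weighted sum (the weights and material-count features follow the example above; the state encoding is our own illustration):

```python
def eval_material(weights, features, state):
    """Linear weighted evaluation: sum of w_i * f_i(state)."""
    return sum(w * f(state) for w, f in zip(weights, features))

# Illustrative state: piece counts for white (w) and black (b).
state = {"wQ": 1, "bQ": 1, "wR": 2, "bR": 1, "wP": 6, "bP": 7}
features = [
    lambda s: s["wQ"] - s["bQ"],   # queen difference
    lambda s: s["wR"] - s["bR"],   # rook difference
    lambda s: s["wP"] - s["bP"],   # pawn difference
]
weights = [9, 5, 1]                # queen 9, rook 5, pawn 1

# 9*0 + 5*1 + 1*(-1) = 4: white is up a rook, down a pawn.
print(eval_material(weights, features, state))  # → 4
```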
Chinese Checkers
Chinese Checkers
• Sample moves for RED (bottom) player:
Chinese Checkers
[Figure: the Chinese Checkers board annotated with positional values
from 0 (home row) up to 8 (goal corner), used as an advancement
feature in the evaluation function.]
Chinese Checkers
• Another important feature:
For successful play, no piece should be left behind.
Therefore add another feature
coherence: the difference between the players in the spread
between the smallest and highest positional value of their pieces.
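One way the coherence feature might be computed; the sign convention (positive favours the player) is an assumption for illustration:

```python
def coherence(my_values, opp_values):
    """Difference between the players in the spread between the
    highest and smallest positional value of their pieces.
    A smaller spread means no piece is left behind."""
    my_spread = max(my_values) - min(my_values)
    opp_spread = max(opp_values) - min(opp_values)
    return opp_spread - my_spread   # assumed sign: positive favours us

# Our pieces span values 3..5 (spread 2); the opponent's span 1..7
# (spread 6), so coherence favours us by 4.
print(coherence([3, 4, 5], [1, 6, 7]))  # → 4
```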
Prisoner’s Dilemma
• Consider the following story: two alleged burglars, Alice and
Bob, are caught red-handed near the scene of a burglary and
are interrogated separately.
• A prosecutor offers each a deal: if you testify against your
partner as the leader of a burglary ring, you’ll go free for
being the cooperative one, while your partner will serve 10
years in prison.
• However, if you both testify against each other, you’ll both
get 5 years.
• Alice and Bob also know that if both refuse to testify they will
serve only 1 year each for the lesser charge of possessing
stolen property.
Best strategy
• Now Alice and Bob face the so-called prisoner's
dilemma: should they testify or refuse?
• Being rational agents, Alice and Bob each want to
maximize their own expected utility.
• Let’s assume that Alice is callously unconcerned about
her partner’s fate, so her utility decreases in proportion
to the number of years she will spend in prison,
regardless of what happens to Bob.
• Bob feels exactly the same way.
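Alice's reasoning can be checked mechanically. A minimal sketch (the payoff numbers are prison years taken from the story above; the helper name is our own):

```python
# Payoffs as (Alice_years, Bob_years) in prison for each action pair:
# sole testifier goes free, sole refuser serves 10, mutual testimony
# costs 5 each, mutual refusal 1 each.
years = {
    ("testify", "testify"): (5, 5),
    ("testify", "refuse"): (0, 10),
    ("refuse", "testify"): (10, 0),
    ("refuse", "refuse"): (1, 1),
}

def is_dominant_for_alice(action):
    """True if `action` gives Alice no more prison time than the
    alternative, whatever Bob plays."""
    other = "refuse" if action == "testify" else "testify"
    return all(years[(action, b)][0] <= years[(other, b)][0]
               for b in ("testify", "refuse"))

print(is_dominant_for_alice("testify"))  # → True
print(is_dominant_for_alice("refuse"))   # → False
```

By symmetry the same check holds for Bob, which is why (testify, testify) is the dominant strategy equilibrium discussed below.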
Pay-off Matrix

Years in prison (Alice, Bob), from the story above:

                    Bob: testify     Bob: refuse
  Alice: testify    (5, 5)           (0, 10)
  Alice: refuse     (10, 0)          (1, 1)
Dominant strategy

• A strategy is dominant for a player if it yields a better outcome
than every other strategy, no matter what the other player does.
• For Alice, testify dominates: if Bob testifies, she serves 5 years
instead of 10; if Bob refuses, she goes free instead of serving 1 year.
Nash equilibrium
• If Alice is clever as well as rational, she will continue to
reason as follows: Bob’s dominant strategy is also to
testify. Therefore, he will testify and we will both get five
years.
• When each player has a dominant strategy, the
combination of those strategies is called a dominant
strategy equilibrium.
• An equilibrium is essentially a local optimum in the
space of policies; it is the top of a peak that slopes
downward along every dimension, where a dimension
corresponds to a player’s strategy choices.
• The mathematician John Nash (1928–2015) proved that
every game has at least one equilibrium.
• The general concept of equilibrium is called Nash
equilibrium in his honor.
Drawback
• The dilemma in the prisoner’s dilemma is that the
equilibrium outcome is worse for both players than the
outcome they would get if they both refused to testify.
• It is certainly an allowable option for both of them to
refuse to testify, but it is hard to see how rational agents
can get there, given the definition of the game.
• Either player contemplating playing refuse will realize
that he or she would do better by playing testify.
• That is the attractive power of an equilibrium point.
• Game theorists agree that being a Nash equilibrium is a
necessary condition for being a solution—although they
disagree whether it is a sufficient condition.
Expectiminimax
• Its application is to games that contain an
element of unpredictability, so the game tree is
not deterministic.
Expectiminimax
• Terminal nodes
• Max and Min nodes
• Exactly the same way as before
• Chance nodes
– Evaluated by taking the weighted average of
the values resulting from all possible dice rolls
ExpectiMiniMax
• Expectiminimax(n) =
Utility(n), if n is a terminal node
max over successors s of Expectiminimax(s), if n is a MAX node
min over successors s of Expectiminimax(s), if n is a MIN node
Σ_s P(s) · Expectiminimax(s), if n is a chance node
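The case analysis above can be sketched directly in Python. The tuple-based tree encoding (node kind plus children, with chance children as (probability, subtree) pairs) is our own illustration:

```python
def expectiminimax(node):
    """node is a number (terminal), ("max", children), ("min", children),
    or ("chance", [(prob, child), ...])."""
    if not isinstance(node, tuple):
        return node                       # terminal: its utility
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    # chance node: probability-weighted average over the outcomes
    return sum(p * expectiminimax(c) for p, c in children)

# A fair coin flip between a MIN node worth 1 and one worth 4:
# expected value 0.5*1 + 0.5*4 = 2.5.
tree = ("chance", [(0.5, ("min", [1, 3])), (0.5, ("min", [4, 6]))])
print(expectiminimax(tree))  # → 2.5
```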
An order-preserving transformation on leaf values can
change the best move when chance nodes are present:
unlike minimax, expectiminimax is sensitive to the
magnitudes of the leaf values, not just their ordering.
Summary
• Deep learning based methods
• Adversarial Search Methods:
– MiniMax
– Alpha-Beta Cut-off
– ExpectiMiniMax