Professional Documents
Culture Documents
Document PDF
Document PDF
Games & AI
Deterministic Chance
1
The minimax principle Example: Tic-tac-toe
Assume the opponent plays to win and |
Terminology:
MAX = you, MIN = the opponent.
| | | | | |
Opponent Opponent
| | | | | |
(MIN) move (MIN) move
| | | | | |
| | | | | | | | | | | | | | | |
Your Your
| | | | | | | | | | | |
(MAX) (MAX)
move | | | | | | | | | | move | | | | | | | | | |
Minimax +1 0 0 0 0 +1
value
| | | | | | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | |
Utility = +1 Utility = 0 Utility = 0 Utility = 0 Utility = 0 Utility = +1 Utility = +1 Utility = 0 Utility = 0 Utility = 0 Utility = 0 Utility = +1
| | | | | |
Opponent Opponent
| | | | | |
(MIN) move (MIN) move
| | | | | |
Minimax Minimax
value 0 0 0 value 0 0 0
| | | | | | | | | | | | | | | |
Your Your
| | | | | | | | | | | |
(MAX) (MAX)
move | | | | | | | | | | move | | | | | | | | | |
Minimax +1 0 0 0 0 +1 Minimax +1 0 0 0 0 +1
value value
| | | | | | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | |
Utility = +1 Utility = 0 Utility = 0 Utility = 0 Utility = 0 Utility = +1 Utility = +1 Utility = 0 Utility = 0 Utility = 0 Utility = 0 Utility = +1
2
The minimax value The minimax algorithm
1. Start with utilities of terminal nodes
Minimax value for node n = 2. Propagate them back to root node by choosing the
minimax strategy
A
A
1 max
Utility(n) If n is a terminal node
B
B C
C D
D E
E
Max(Minimax-values of successors) If n is a MAX node -5 -6 0 1
min
Min(Minimax-values of successors) If n is a MIN node F G H I J K L M N O
-7
7 -5 3 9 -6 0 2 1 3 2
High utility favours you (MAX), therefore choose move with highest utility
Low utility favours the opponent (MIN), therefore choose move with
lowest utility
Figure borrowed from V. Pavlovic
B
B C
C D
D E
E B
B C
C D
D E
E
-5 -6 0 1 -5 -6 0 1
min min
F G H I J K L M N O F G H I J K L M N O
-7
7 -5 3 9 -6 0 2 1 3 2 -7
7 -5 3 9 -6 0 2 1 3 2
3
First three levels of the tic-tac-toe state space reduced by symmetry
3 states
(instead of 9)
| | |
| = | = | = |
| Image from G. F. Luger, Artificial Intelligence, 2002
W X A
-3 -5
Slide adapted from V. Pavlovic
F G H I J K L M max FF G H I J K L M
-5 -10 8 2 =? -5 -10 8 2
N O P Q R S T U V N O P Q R S T U V F
4 9 -6 0 3 5 -7 -9 4 9 -6 0 3 5 -7 -9
B B
W X A W X A
-3 -5 -3 -5
Slide adapted from V. Pavlovic Slide adapted from V. Pavlovic
4
Alpha-Beta Example Alpha-Beta Example
minimax(N,3,4) minimax(F,2,4) is returned to
max F G H I J K L M max F G H I J K L M
=? -5 -10 8 2 =4
= -5 -10 8 2
N
N O P Q R S T U V F N O P Q R S T U V F
4 9 -6 0 3 5 -7 -9 4 9 -6 0 3 5 -7 -9
B B
W X gold: terminal state A W X gold: terminal state A
-3 -5 -3 -5
Slide adapted from V. Pavlovic Slide adapted from V. Pavlovic
min B C D E min B C D E
=? 0 =? 0
max F G H I J K L M max F G H I J K L M
=4 -5 -10 8 2 =4 -5 -10 8 2 W
O O
min N O
O P Q R S T U V F min N O P Q R S T U V F
4 =? 9 -6 0 3 5 -7 -9 4 =? 9 -6 0 3 5 -7 -9
B B
W X gold: terminal state A W X gold: terminal state (depth limit) A
-3 -5 -3 -5
Slide adapted from V. Pavlovic Slide adapted from V. Pavlovic
beta = -3, minimum seen so far O's beta (-3) < F's alpha (4): Stop expanding O (alpha cut-off)
min B C D E min B C D E
=? 0 =? 0
max F G H I J K L M max F G H I J K L M
=4 -5 -10 8 2 =4 -5 -10 8 2
O O
min N O P Q R S T U V F min N O P Q R S T U V F
4 =-3
= 9 -6 0 3 5 -7 -9 4 =-3 9 -6 0 3 5 -7 -9
B B
W X gold: terminal state (depth limit) A W X gold: terminal state (depth limit) A
-3 -5 -3 -5
Slide adapted from V. Pavlovic Slide adapted from V. Pavlovic
5
Alpha-Beta Example Alpha-Beta Example
Why?
Smart opponent selects W or worse O's upper bound is 3 minimax(F,2,4) is returned to
So MAX shouldn't select O:-3 since N:4 is better
alpha not changed (maximizing)
A Call A Call
max =? max =?
Stack Stack
min B C D E min B C D E
=? 0 =? 0
max F G H I J K L M max F G H I J K L M
=4 -5 -10 8 2 =4 -5 -10 8 2
O
min N O P Q R S T U V F min N O P Q R S T U V F
4 =-3 9 -6 0 3 5 -7 -9 4 =-3 9 -6 0 3 5 -7 -9
B B
W X gold: terminal state (depth limit) A W X gold: terminal state (depth limit) A
-3 -5 -3 -5
Slide adapted from V. Pavlovic Slide adapted from V. Pavlovic
A Call A Call
max =? max =?
Stack Stack
min B C D E min B C D E
=4
= 0 =4
= 0
max F G H I J K L M max F G H I J K L M
=4 -5 -10 8 2 =4 -5 -10 8 2
min N O P Q R S T U V min N O P Q R S T U V G
4 =-3 9 -6 0 3 5 -7 -9 4 =-3 9 -6 0 3 5 -7 -9
B B
W X gold: terminal state (depth limit) A W X gold: terminal state (depth limit) A
-3 -5 -3 -5
Slide adapted from V. Pavlovic Slide adapted from V. Pavlovic
beta = -5, minimum seen so far alpha = -5, maximum seen so far
A Call A Call
max =? max =-5
=?
Stack Stack
min B C D E min B C D E
=-5
=4
= 0 =-5
=4
= 0
max F G H I J K L M max F G H I J K L M
=4 -5 -10 8 2 =4 -5 -10 8 2
min N O P Q R S T U V min N O P Q R S T U V
4 =-3 9 -6 0 3 5 -7 -9 4 =-3 9 -6 0 3 5 -7 -9
B
W X gold: terminal state (depth limit) A W X gold: terminal state (depth limit) A
-3 -5 -3 -5
Slide adapted from V. Pavlovic Slide adapted from V. Pavlovic
6
Alpha-Beta Example Alpha-Beta Example
minimax(C,1,4) minimax(H,2,4)
A Call A Call
max =-5
=? max =-5
=?
Stack Stack
min B CC D E min B CC D E
=-5
=4
= =? 0 =-5
=4
= =? 0
max F G H I J K L M max F G H I J K L M
=4 -5 -10 8 2 =4 -5 -10 8 2
min N O P Q R S T U V min N O P Q R S T U V H
4 =-3 9 -6 0 3 5 -7 -9 4 =-3 9 -6 0 3 5 -7 -9
C C
W X gold: terminal state (depth limit) A W X gold: terminal state (depth limit) A
-3 -5 -3 -5
Slide adapted from V. Pavlovic Slide adapted from V. Pavlovic
A Call A Call
max =-5
=? max =-5
=?
Stack Stack
min B CC D E min B CC D E
=-5
=4
= =-10
=? 0 =-5
=4
= =-10
=? 0
max F G H I J K L M max F G H I J K L M
=4 -5 -10 8 2 =4 -5 -10 8 2
min N O P Q R S T U V min N O P Q R S T U V
4 =-3 9 -6 0 3 5 -7 -9 4 =-3 9 -6 0 3 5 -7 -9
C C
W X gold: terminal state (depth limit) A W X gold: terminal state (depth limit) A
-3 -5 -3 -5
Slide adapted from V. Pavlovic Slide adapted from V. Pavlovic
A Call A Call
max =-5
=? max =-5
=0
=?
Stack Stack
min B CC D E min B CC D E
=-5
=4
= =-10
=? 0 =-5
=4
= =-10
=? 0
max F G H I J K L M max F G H I J K L M
=4 -5 -10 8 2 =4 -5 -10 8 2
min N O P Q R S T U V min N O P Q R S T U V
4 =-3 9 -6 0 3 5 -7 -9 4 =-3 9 -6 0 3 5 -7 -9
D
W X gold: terminal state (depth limit) A W X gold: terminal state (depth limit) A
-3 -5 -3 -5
Slide adapted from V. Pavlovic Slide adapted from V. Pavlovic
7
Alpha-Beta Example Alpha-Beta Example
minimax(D,1,4) is returned to
min B CC D E min B CC D E
E
=-5
=4
= =-10
=? 0 =-5
=4
= =-10
=? 0 =-7
max F G H I J K L M max F G H I J KK L M
M
=4 -5 -10 8 2 =4 -5 -10 8 =5 2 =-7
min N O P Q R S T U V min N O P Q R S T U V
4 =-3 9 -6 0 3 5 -7 -9 4 =-3 9 -6 0 3 5 -7 -9
W X gold: terminal state (depth limit) A W X gold: terminal state (depth limit) All A
-3 -5 -3 -5
Slide adapted from V. Pavlovic Slide adapted from V. Pavlovic
min B CC D E min B CC D E
E
=-5
=4
= =-10
=? 0 =-5
=4
= =-10
=? 0 =-7
max F G H I J K L M max F G H I J K L M
M
=4 -5 -10 8 2 =4 -5 -10 8 2 =-7
min N O P Q R S T U V min N O P Q R S T U V
4 =-3 9 -6 0 3 5 -7 -9 4 =-3 9 -6 0 3 5 -7 -9
W X gold: terminal state (depth limit) A W X gold: terminal state (depth limit) Only 4 A
-3 -5 -3 -5
Slide adapted from V. Pavlovic Slide adapted from V. Pavlovic
=3 =3
==84
=
=48 = 3
=
=24
=2 =2 =2
=3
Which nodes will not be expanded when expanding from left to right?
8
Alpha-Beta pruning rule Alpha-Beta pruning rule
Stop expanding Stop expanding
max node n if (n) > higher in the tree max node n if (n) > higher in the tree
min node n if (n) < higher in the tree min node n if (n) < higher in the tree
= 3
==78 =3
=9
==34
= 8 =32 = 9
=
= 4
==92 =2 =3 =2
Which nodes will not be expanded when expanding from left to right? Which nodes will not be expanded when expanding from right to left?
Example, chess:
SBE = " Material Balance"+ " Center Control"+ ...
Material balance = value of white pieces value of black pieces, where
pawn = +1, knight & bishop = +3, rook = +5, queen = +9, king = ?
http://en.wikipedia.org/wiki/Computer_chess
Evaluation functions take many other factors into account, however, such as
pawn structure, the fact that doubled bishops are usually worth more,
centralized pieces are worth more, and so on. The protection of kings is
usually considered, as well as the phase of the game (opening, middle or
endgame).
9
http://en.wikipedia.org/wiki/Computer_chess
Nalimov Endgame Tablebases, which use state-of-the-art compression Dice games, card games,...
techniques, require 7.05 GB of hard disk space for all five-piece endings. It is
estimated that to cover all the six-piece endings will require at least 1 terabyte.
Seven-piece tablebases are currently a long way off.
Extend the minimax tree with chance
While Nalimov Endgame Tablebases handle en passant positions, they assume
layers.
that castling is not possible. This is a minor flaw that is probably of interest only A
to endgame specialists. =4
= max
Compute the expected
More importantly, they do not take into account the fifty move rule. Nalimov value over outcomes.
50/50 50/50
Endgame Tablebases only store the shortest possible mate (by ply) for each 50/50 50/50 chance
4 -2
position. However in certain rare positions, the stronger side cannot force win
before running into the fifty move rule. A chess program that searches the Select move with
.5 .5 .5 .5
database and obtains a value of mate in 85 will not be able to know in advance if the highest B C D E
=2 =6 =0 =-4 min
such a position is actually a draw according to the fifty move rule, or if it is a win, expected value.
because there will be a piece exchange or pawn move along the way. Various
solutions including the addition of a "distance to zero" (DTZ) counter have been 7 2 9 6 5 0 8 -4
proposed to handle this problem, but none have been implemented yet.
10