
Tic-Tac-Toe AI Player using Minimax and

Alpha-Beta Pruning

By Yoong Zeng, Puvanesh Rao, Kaartiik Vijayan, Swaroop Kovoor

1. Introduction
Zero-sum games like tic-tac-toe and chess operate on the principle of Pareto
optimality, in which the total utility score is divided among the two players [1].
This means an increase in one player's score results in a decrease in the other
player's score.

Part of building an AI entity capable of a higher win rate involves implementing
techniques from combinatorial game theory such as the minimax algorithm. One of
the high points of the minimax algorithm was its use in Deep Blue, designed by
IBM to beat world chess champion Garry Kasparov in 1997 [2]. Moreover, as
minimax continued to be studied, Google's DeepMind designed the state-of-the-art
player AlphaZero, which learns through self-play and is based on the principles of
minimax.

This paper applies minimax with alpha-beta pruning to allow an AI entity to play
4-by-4 tic-tac-toe with a significantly reduced loss rate. The minimax algorithm
computes the value of each successor node and uses backtracking to find the best
move [3]. It begins by assigning the two players the roles of MAX and MIN: MAX
makes moves to maximize its own score, while MIN moves to minimize MAX's score.
The heuristics of the minimax algorithm involve predicting the state of the board
ahead of time in order to make the best move, which makes the algorithm good at
anticipating the opponent's move and countering it. However, one of the main
challenges we faced was the runtime of minimax: for a 4-by-4 tic-tac-toe game, the
number of possible move sequences is bounded by 16!, about 20,922,789,888,000.
This results in high computational time and, consequently, high CPU resource
usage. To shorten the runtime, this paper applies alpha-beta pruning to minimax.
Since time is too limited for minimax to examine every node in the game tree, the
main goal of alpha-beta pruning is to increase the minimax algorithm's efficiency by
pruning unnecessary moves [3].
The rest of this paper is organized as follows. Section 2 presents the methodology,
detailing the problem as well as the challenges faced while implementing minimax
with alpha-beta pruning for the 4x4 tic-tac-toe AI entity. Sections 2.1 and 2.2
cover the problem definition and the workings of the 4x4 tic-tac-toe game.
Sections 2.3 and 2.4 describe in greater detail the inner workings of the minimax
algorithm and the challenges that arose during its implementation. Sections 2.5
and 2.6 describe the implementation of alpha-beta pruning, with code examples,
followed by the challenges faced while incorporating it into the minimax algorithm.
Section 2.7 walks through the flow of the game tree diagrammatically, showing how
game states and depths work in unison. Section 2.8 explains the different heuristic
functions incorporated into the minimax algorithm and their purposes. Finally,
Section 3 concludes the paper with a discussion of the project and its results, as
well as how aspects of the algorithm could be improved in the future.

2. Methodology and Implementation

2.1 Problem Definition and Representation

As introduced above, the main problem this paper addresses is allowing the AI
entity to play a game of 4-by-4 tic-tac-toe without ending in the worst possible
outcome. In other words, the algorithm aims to maximize the AI player's win rate
by finding the best possible move among all the states it recurses through.

The paper also addresses the cost of the minimax algorithm, which has a time
complexity of O(b^m) and a space complexity of O(bm), where b is the number of
legal moves at each point and m is the maximum depth of the tree [4]. This high
computational cost means the algorithm searches through approximately
20,922,789,888,000 possible move sequences to find the best move, which takes
far too long for a zero-sum game like tic-tac-toe [5]. Moreover, the high
computational time places a heavy load on the CPU, which is not an appropriate
way to handle the game, or any application for that matter. Our implementation of
heuristic evaluation functions and resource limits, together with alpha-beta
pruning, makes the minimax algorithm feasible by decreasing the number of nodes
it evaluates in its search tree, which in turn reduces the number of searches and,
proportionally, the time taken.
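
As a quick check, the size of this state space can be reproduced directly in
Python:

import math

# 16 choices for the first move, 15 for the second, and so on: an upper
# bound on the move sequences playable from an empty 4x4 board.
print(math.factorial(16))  # 20922789888000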

2.2 Tic-Tac-Toe

Tic-tac-toe is both a zero-sum game and a perfect-information game, the latter
meaning that each player can see all the pieces on the board at all times [6]. In
this respect it is similar to chess, checkers, and Go.

The game comes in many grid sizes, but to demonstrate the capabilities of both
the minimax algorithm and alpha-beta pruning, the chosen grid size is 4-by-4, that
is, 4 columns by 4 rows. The game is straightforward and is played between two
entities, whether human players or AI players.

The 4x4 tic-tac-toe game can be won in two ways:

1) A player wins the game immediately by placing four of their marks in a straight
line (diagonal, horizontal, or vertical). Diagram 1 shows one of the 10 possible
winning positions. In this paper's implementation, the winning states for this rule
are defined by the following state space, where indices 0-15 represent the 16 cells
on the board:

self.win_position = [
    [0, 1, 2, 3],      # rows
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
    [0, 4, 8, 12],     # columns
    [1, 5, 9, 13],
    [2, 6, 10, 14],
    [3, 7, 11, 15],
    [0, 5, 10, 15],    # main diagonal
    [3, 6, 9, 12]      # anti-diagonal
]
Diagram 1: Win Rule 1 for 4x4 Tic-Tac-Toe Game Won by Player ‘X’
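
A minimal sketch of how this table can drive the rule-1 win check follows; the
helper name is_win and the cell encoding (1 for the maximizing player, -1 for the
opponent, 0 for an empty cell, matching the game tree example in Section 2.7) are
illustrative assumptions rather than the project's exact code:

WIN_POSITIONS = [
    [0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15],  # rows
    [0, 4, 8, 12], [1, 5, 9, 13], [2, 6, 10, 14], [3, 7, 11, 15],  # columns
    [0, 5, 10, 15], [3, 6, 9, 12],                                 # diagonals
]

def is_win(board, player):
    # board is a flat list of 16 cells; player is 1 (Max) or -1 (Min).
    # Rule 1: a player wins by occupying all four cells of any line.
    return any(all(board[i] == player for i in line) for line in WIN_POSITIONS)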

2) If the preceding rule does not conclusively determine the winner, the second
rule comes into play: the winner is the player with the larger number of dominating
straight lines, where a line may run diagonally, vertically, or horizontally. This is
shown in Diagram 2, where player 'O' wins by dominating 4 lines on the board
(outlined in red), while player 'X' manages only 3.

Diagram 2: Win Rule 2 for a 4x4 Tic-Tac-Toe Game Won by Player ‘O’
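
Rule 2 can be sketched as follows, under one plausible reading in which a player
dominates a line by holding strictly more of its cells than the opponent; the
helper name dominating_lines is hypothetical (the project's own version is the
endboard_score() function discussed in Section 2.8), and WIN_POSITIONS is reused
from the sketch above:

def dominating_lines(board, player):
    # Count the lines in which `player` holds strictly more cells
    # than the opponent does.
    count = 0
    for line in WIN_POSITIONS:
        mine = sum(1 for i in line if board[i] == player)
        theirs = sum(1 for i in line if board[i] == -player)
        if mine > theirs:
            count += 1
    return count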

2.3 Minimax Algorithm

Minimax is a recursive algorithm used to choose an optimal move for a player,
assuming the opponent also plays optimally. As its name suggests, its goal is to
minimize the maximum loss, that is, to minimize the worst-case scenario. Minimax
works by calling itself recursively, descending level by level into the game tree,
until it reaches a terminal state, the end state of the game. Once the terminal
state is found, the score obtained from it is passed one level up.

For the implementation of the optimized AI player that serves as the basis of this
project, we utilized the minimax algorithm with the two players assigned as Max
and Min.

The algorithm begins with the first player, Max, trying its first move. Minimax then
recurses through all combinations of Max's and Min's moves. When either player
wins or the game comes to a draw, an evaluation value is assigned to the board to
indicate its situation. If a feature on the board favors Max, that feature is given a
positive value; otherwise, it is given a negative value. The final evaluation value is
the sum of all feature values. Max chooses the maximum evaluation value and Min
chooses the minimum; eventually, Max decides the best move.

The following pseudocode shows the minimax algorithm's architecture. The
algorithm takes the arguments node, depth, and maximizingPlayer: node defines
the game node, depth defines the height of the tree, and maximizingPlayer
defines whether the current player is Max. The algorithm first checks whether the
depth of the game tree is 0 or the node is a terminal state; if so, it returns the
heuristic value of that node.

Lines 1-3

function minimax(node, depth, maximizingPlayer) is
    if depth = 0 or node is a terminal node then
        return the heuristic value of node

Next, if the player is indeed the maximizingPlayer, it is assigned the worst
possible score, negative infinity (−∞). The algorithm then explores each child of
the node, comparing the current value against what the recursive minimax() call
returns for each depth explored (depth − 1) and keeping the maximum. The best
score (value) is returned.

Lines 4-8

    if maximizingPlayer then
        value := −∞
        for each child of node do
            value := max(value, minimax(child, depth − 1, FALSE))
        return value

If the player is the opposing, minimizing player, the value starts at positive
infinity (+∞). The algorithm then explores each child of the node, comparing the
current value against what the recursive minimax() call returns for each depth
explored (depth − 1) and keeping the minimum. The best value is returned.

Lines 9-13

    else (* minimizing player *)
        value := +∞
        for each child of node do
            value := min(value, minimax(child, depth − 1, TRUE))
        return value
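
Assembled into Python, the fragments above might look as follows; children(),
is_terminal(), and heuristic_value() are assumed game-specific helpers, not code
from the project:

import math

def minimax(node, depth, maximizing_player):
    # Lines 1-3: stop at depth 0 or at a terminal node.
    if depth == 0 or is_terminal(node):
        return heuristic_value(node)
    if maximizing_player:
        # Lines 4-8: Max starts from the worst case and keeps the maximum.
        value = -math.inf
        for child in children(node):
            value = max(value, minimax(child, depth - 1, False))
        return value
    else:
        # Lines 9-13: Min starts from +infinity and keeps the minimum.
        value = math.inf
        for child in children(node):
            value = min(value, minimax(child, depth - 1, True))
        return value

A root call such as minimax(start_node, max_depth, True) would then return the
value of the best line of play for Max.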

The flowchart for the minimax algorithm is shown in Diagram 3: the algorithm
begins at the MiniMax-Start state and ends at MiniMax-End by returning the
heuristic value (HV) of the best, optimal move.
Diagram 3: Flowchart for Minimax Algorithm.

2.4 Challenges Faced During Implementation of the Minimax Algorithm

The main challenge in implementing the minimax algorithm was determining the
terminal state of the board. As mentioned above, the terminal state is the end
state of the game, and coming up with an appropriate heuristic function to test for
it was crucial. For this version of tic-tac-toe there are several kinds of terminal
state to test: a win by occupying a full straight line, a win by the larger number of
dominating straight lines, and a tie. Hence, a function is created to check whether
the game is over. If a player has won, the corresponding score is returned;
otherwise, the function counts dominating straight lines. If the returned score is a
positive integer, the maximizing player wins; if it is negative, the opponent wins;
and if it is zero, the game ends in a tie. If the game is not over, the heuristic
function instead checks which player occupies potential winning positions.
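
A hedged sketch of this check, reusing the is_win and dominating_lines helpers
sketched in Section 2.2 (the score magnitudes here are illustrative assumptions,
not the project's exact constants):

def terminal_score(board):
    # Positive: maximizing player wins; negative: opponent wins; 0: tie.
    if is_win(board, 1):
        return 10
    if is_win(board, -1):
        return -10
    if all(cell != 0 for cell in board):  # board full: rule-2 tiebreak
        diff = dominating_lines(board, 1) - dominating_lines(board, -1)
        return 10 if diff > 0 else (-10 if diff < 0 else 0)
    return None  # the game is not over yet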

2.5 Alpha-Beta Pruning

Alpha-beta pruning is a search technique whose objective is to reduce the number
of nodes that the minimax algorithm evaluates in its search tree. One of the
challenges we faced with the minimax algorithm is its high computational time,
since it must recursively visit every possible node before returning the best move
for the player. Were it not for alpha-beta pruning, minimax would not have been
feasible for making the AI choose the best move, because the algorithm would
take too long, especially with a depth of 16 given the game's 4x4 grid structure.

Hence, the larger the depth of the tree, the more time minimax takes to make a
decision and find the best choice, compared to a depth limited to 3, at which the
algorithm does not need to take into account all possible states down to the
terminal state where the game ends.

The following pseudocode shows the architecture of alpha-beta pruning. In
conjunction with the minimax algorithm, alpha-beta pruning takes the arguments
node, depth, and maximizingPlayer: node defines the game node, depth defines
the height of the tree, and maximizingPlayer defines whether the current player
is Max. The only difference in the alpha-beta implementation is that it takes two
extra arguments, alpha (α) and beta (β). Alpha is the best choice found so far for
player Max and should be as high as possible; beta is the best choice found so far
for player Min and should be as low as possible. Both alpha and beta are kept and
updated at each node.

The algorithm then proceeds to check the first condition, whether the depth of the
game tree is 0 or the node is a terminal state; if so, it returns the heuristic value
of that node.
Lines 1-3

function alphabeta(node, depth, α, β, maximizingPlayer) is
    if depth = 0 or node is a terminal node then
        return the heuristic value of node

If the player is Max, the value is assigned negative infinity, the worst possible
case. The algorithm explores each child of the node, comparing the current value
against what the recursive alphabeta call returns for each depth explored
(depth − 1) and keeping the maximum. Alpha (α) is then updated to max(α, value),
the better of alpha and the value returned by the recursive call. The pruning
comes into play on lines 9 and 10: when alpha becomes greater than or equal to
beta, the remaining children are pruned off, or cut off (break). Lastly, the best
value is returned.

Lines 4-11

    if maximizingPlayer then
        value := −∞
        for each child of node do
            value := max(value, alphabeta(child, depth − 1, α, β, FALSE))
            α := max(α, value)
            if α ≥ β then
                break (* β cut-off *)
        return value

Lines 12-19 show the Min player's turn, for which the value starts at positive
infinity (+∞). The algorithm explores each child of the node, comparing the current
value against what the recursive alphabeta call returns for each depth explored
(depth − 1) and keeping the minimum. Beta (β) is then updated to min(β, value),
the better of beta and the value returned by the recursive call. The pruning comes
into play on lines 17 and 18: when alpha becomes greater than or equal to beta,
the remaining children are pruned off, or cut off (break). The best value is finally
returned.
Lines 12-19

    else (* minimizing player *)
        value := +∞
        for each child of node do
            value := min(value, alphabeta(child, depth − 1, α, β, TRUE))
            β := min(β, value)
            if α ≥ β then
                break (* α cut-off *)
        return value
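
In Python, the complete function might read as follows, again with children(),
is_terminal(), and heuristic_value() as assumed game-specific helpers; the root
call would be alphabeta(start_node, max_depth, -math.inf, math.inf, True):

import math

def alphabeta(node, depth, alpha, beta, maximizing_player):
    if depth == 0 or is_terminal(node):
        return heuristic_value(node)
    if maximizing_player:
        value = -math.inf
        for child in children(node):
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # beta cut-off: Min already has a better option elsewhere
        return value
    else:
        value = math.inf
        for child in children(node):
            value = min(value, alphabeta(child, depth - 1, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:
                break  # alpha cut-off: Max already has a better option elsewhere
        return value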

Diagram 4 shows the logic of alpha-beta pruning. At depth 1 (Min), the child node
with value 5 is chosen instead of 8 because it is the minimum, and the alpha-beta
function prunes the grayed-out subtrees, which do not need to be explored (when
moves are evaluated from left to right). This is because we already know that each
such subtree as a whole can only yield the value of an equivalent subtree or
worse, and as such cannot influence the final result.

Diagram 4: An illustration of alpha–beta pruning.


2.6 Challenges Faced During Implementation of Alpha-Beta Pruning

The major challenge with the alpha-beta pruning implementation was that, as
initially implemented, it still did not optimize the player's move: even with
alpha-beta pruning, the algorithm took a long time to compute the next best move.
To solve this problem, the algorithm does not check for possible winning moves
until the game depth reaches 3, because only by then have the players made
enough moves for the next move to potentially dominate a straight line. Therefore,
at depth 3, the algorithm checks which move the maximizing player needs to make
in order to increase its winning chances.
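
One way this gate could look in code, as a sketch assuming that evaluate() is the
eval() heuristic of Section 2.8 and terminal_score() is the check sketched in
Section 2.4:

def node_score(board, depth):
    # Terminal boards are scored exactly.
    score = terminal_score(board)
    if score is not None:
        return score
    # Non-terminal boards are only scanned for line threats from depth 3 on.
    if depth >= 3:
        return evaluate(board)
    return 0  # too shallow for any line to be threatened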

2.7 Game Tree

An example of a game tree from the minimax algorithm implemented in this project
is shown in Diagram 5, assuming the state of the board is:

[0, 0, 1, 0,
1, 1, 0, 1,
0, -1, 0, -1,
-1, 1, 1, 1]
Diagram 5: Possible Game Tree for Minimax

White - Maximizing Player
Black - Minimizing Player

The algorithm tests each available position on the board to choose the best
possible move, looking for a win for the maximizing player while also preventing
the opponent from winning. In this case, the best possible move is position 12,
which gives a score of 10 and prevents the opposing player from winning. The
other two positions, 9 and 11, are ignored because they give a lower score of 5,
meaning the opponent would have a higher chance of winning. The algorithm does
not calculate the value of every end node because alpha-beta pruning is applied:
it saves computation time by stopping the algorithm from descending into a node
once a better option has been found at another node.
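
For reference, the flat 16-cell encoding used above (1 for the maximizing player,
−1 for the minimizing player, 0 for an empty cell) can be rendered as a 4x4 grid:

board = [0, 0, 1, 0,
         1, 1, 0, 1,
         0, -1, 0, -1,
         -1, 1, 1, 1]

# Print the flat 16-cell list as a 4x4 grid of X, O, and empty cells.
symbols = {0: '.', 1: 'X', -1: 'O'}
for row in range(4):
    print(' '.join(symbols[board[4 * row + col]] for col in range(4)))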

2.8 Heuristic Function

When dealing with game trees, the heuristic function is generally referred to as the
evaluation function or the static evaluator. The static evaluator takes in a board
position and gives it a score: the higher the score, the better for the player, and
the lower the score, the better for the opponent. There are two heuristic functions
in this algorithm, the gameover() function and the eval() function.

Each time the minimax algorithm is called, the board is tested to see whether the
game is still running or over. If the game is over, the function tries to determine
the winning player and returns the appropriate score, using the wins() and
win_score() functions respectively. If the game is over but there is no winner by
straight line, the player with the larger number of dominating straight lines wins
the game; this is determined by the endboard_score() function. On the other hand,
if the game is not over but the depth is greater than or equal to 3, the eval()
function is called. This function evaluates only the goodness of the board state for
the maximizing player; if the board is bad for the maximizing player, it is good for
the opponent. The goodness or badness of the board state is determined by
testing the occurrences of both players in potential winning states: the higher the
score obtained, the better for the maximizing player.

The winning indices are stored in an instance variable when the player object is
created. These indices are mapped onto the current board position to determine
whether a player has won on that board. If a win is found, the win_score() function
checks which player won and returns an evaluation score based on whether it was
the player or the opponent who won during that move.
Similarly, when the board is full, the endboard_score() function checks all the
positions on the board to determine the dominant player and returns an evaluation
score. Finally, the heuristic function eval() follows the equation E(X) = N(X) − N(O),
where E(X) is the returned evaluation score, N(X) denotes the number of positions
where the player could still win, and N(O) the number where the opponent could
still win. Therefore, if the evaluation score is positive, the state is good for the
player, and vice versa.
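
A sketch of eval() under this equation, reusing WIN_POSITIONS from Section 2.2
and reading "positions where a player could win" as winning lines not yet blocked
by the opponent; the exact counting rule in the project's code may differ:

def evaluate(board):
    # E(X) = N(X) - N(O): winning lines still open to each player.
    n_x = sum(1 for line in WIN_POSITIONS if all(board[i] != -1 for i in line))
    n_o = sum(1 for line in WIN_POSITIONS if all(board[i] != 1 for i in line))
    return n_x - n_o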

3. Conclusion

The results of this paper show the capability of the minimax algorithm as a
technique to optimize play, reduce the worst-case scenario, and consequently give
the AI player a higher win rate in a game of 4-by-4 tic-tac-toe. The minimax
algorithm on its own does not prove to be ideal if time complexity and space
complexity are taken as the dominating factors. However, the addition of
alpha-beta pruning allows minimax to be used without excessive computational
time, while allowing the AI player to do what it was designed to do: provide a
tougher challenge.

4. References

[1] Krauthammer, C. (2018) "Be Afraid". https://www.weeklystandard.com/be-afraid/article/9802

[2] Järvensivu, M. (2018) Developing and Optimizing Artificial Intelligence in Zero-sum
Games. https://www.theseus.fi/bitstream/handle/10024/150758/Matti_Jarvensivu.pdf?sequence=1&isAllowed=y

[3] Russell, S. J. and Norvig, P. (2016) Artificial Intelligence: A Modern Approach. Pearson
Education Limited, Malaysia.

[4] Borovska, P. and Lazarova, M. (2007) Efficiency of Parallel Minimax Algorithm for
Game Tree Search. ACM International Conference Proceeding Series, 285, 14.
https://doi.org/10.1145/1330598.1330615

[5] Kang, X.Y., Wang, Y.Q. and Hu, Y.R. (2019) Research on Different Heuristics for Minimax
Algorithm Insight from Connect-4 Game. Journal of Intelligent Learning Systems and
Applications, 11, 15-31. https://doi.org/10.4236/jilsa.2019.112002

[6] Khomskii, Y. (2010) Infinite Games (Section 1.1). https://www.math.uni-hamburg/Infinite Games
