
Playing the Rock-Paper-Scissors Game with a Genetic Algorithm
Fathelalem F. Ali*, Zensho Nakao†, Yen-Wei Chen†

*Department of Management & Information Systems
Faculty of International Studies, Meio University
Nago-shi, Okinawa 905-8585, Japan
Phone: (+81) 98-051-1207
Email: ali@mis.meio-u.ac.jp

†Department of Electrical & Electronics Engineering
Faculty of Engineering, University of the Ryukyus
Okinawa 903-0213, Japan

Abstract

This paper describes a strategy to follow whilst playing the Rock-Paper-Scissors game. Instead of making a biased decision, a rule is adopted where the outcomes of the game in the last few turns are observed and then a deterministic decision is made. Such a strategy is encoded into a genetic string, and a genetic algorithm (GA) works on a population of such strings. Good strings are produced at later generations. The strategy is found to be successful, and its efficiency is demonstrated by testing it against systematic as well as human strategies.

1. Introduction

Many concepts and examples in game theory can provide good models in constructing abstract evolutionary systems. Though game theory was originally developed by von Neumann and Morgenstern [13] for application to economic theory, it has later spread to many other disciplines. Maynard-Smith and Price [17] opened the door to the wide use of game theory in evolutionary ecology. In our current work, we construct an evolutionary system to be applied to the Rock-Paper-Scissors (RPS) game. Rock-Paper-Scissors is a classical two-person game to quickly decide on a winner. It is a game that children as well as adults play, mathematicians analyze, and a certain species of lizard in California takes very seriously [14].

We use a genetic algorithm [1][2] to train a player that makes use of the historical behavior of the opponent during the past few games to guide its current decision. Rock-Paper-Scissors is a good model for experimental and theoretical investigations of cooperative short-memory behavior.

2. The Rock-Paper-Scissors Rule

In its simplest form, each of two players has a choice of Scissors, Paper, or Rock. The two players simultaneously make a choice each. Depending on the two players' choices, a winner is decided according to the rule in Table 1.

Table 1
Player A    Player B    Winner
Scissors    Paper       Player A
Scissors    Rock        Player B
Paper       Rock        Player A

Player A and Player B face each other and simultaneously display their hands in one of the following three shapes: a fist denoting a rock; the forefinger and middle finger extended and spread so as to suggest scissors; or a downward-facing palm denoting a sheet of paper. As in Table 1, the rock wins over the scissors since it can shatter them, the scissors win over the paper since they can cut it, and the paper wins over the rock since it can be wrapped around the latter. The winner is awarded a point, though there is no award in the case of a tie.

If the game is repeated several times, the player who favors one of the options over the others places himself at a disadvantage. The best strategy for each player is to play each of the options with the same frequency of 1/3, in a manner that yields the opponent as little information as possible about any particular decision.

The game is made more interesting by playing it repeatedly with the same player or a group of players, thereby permitting partial time histories of behavior to guide future trials.
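As a quick illustration, the rule of Table 1 can be sketched in Python (the `BEATS` table and the function name are ours, not from the paper):

```python
# Winner decision for one Rock-Paper-Scissors turn.
# Each choice beats exactly one other: Rock > Scissors, Scissors > Paper, Paper > Rock.
BEATS = {"R": "S", "S": "P", "P": "R"}

def winner(a, b):
    """Return 'A', 'B', or 'tie' for player choices a, b in {'R', 'P', 'S'}."""
    if a == b:
        return "tie"
    return "A" if BEATS[a] == b else "B"
```

For example, `winner("S", "P")` returns `"A"`, matching the first row of Table 1.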

0-7803-6375-2/00/$10.00 ©2000 IEEE.


3. The GA Strategy

If we want the computer to play the game, we assume a smart strategy would have two featured aspects:

• Offensive aspect: gathering information about the favorites of the opponent during the course of the game.

• Defensive aspect: giving as little information as possible about the computer's particular decision.

We adopt a coding scheme similar to that of Axelrod [3]: allowing the decision rule to depend mainly on the behavior of the opponent in the previous three games. In each game, the opponent's choice has three possibilities: Scissors (S), Paper (P), or Rock (R). So, a particular behavior sequence can be coded as a three-letter string. For example, SPR would represent the sequence where the opponent chose Scissors, Paper, and Rock in the last three games. Treating this code as a base-3 integer, the behavioral alphabet is coded as S = 0, P = 1, R = 2.

By doing so, the three-letter sequence can be used to generate a number between 0 and 26. Hence, three consecutive Scissors choices of the opponent would decode to 0 (000), while three consecutive Rocks would decode to 26 (222). So, we can define a particular strategy to be a ternary string of length 27. The i-th 0, 1, or 2 corresponds to the i-th behavioral sequence. Using this scheme, for example, a 1 in position 12 (the sequence 110, i.e. the opponent played P, P, S) would be decoded as: play Paper. Such a strategy is based on the information gathered from the opponent's behavior in the previous trials of the game (i.e. it enforces the offensive aspect).

We go further and add some capricious features to our strategy to enforce the defensive aspect. We assume this would counteract the opponent's attempts to trace the strategy's logic, and at the same time act as a countermeasure against subtle behavior of the opponent. We add four more letters to the string to express the caprice. The value C decoded from these four letters is used to calculate the probability P_caprice of taking a capricious random decision, rather than the deterministic decision encoded in the string strategy:

    P_caprice = a × C    (a is a scaling factor, a ≪ 1)

Since the set of rules generated by a 31-ternary string depends on the past three plays, behavior at the beginning of the game is indeterminate. To get around this problem, we add three letters to the coding to specify a strategy's premises, or assumptions about pre-game behavior. These three letters are used initially to specify the assumed behavior of the opponent prior to the beginning of the game, and later to keep a rolling actual history of the opponent's behavior. Thus, a 34-ternary string represents a particular strategy, with 27 ternaries for the decision rule, four ternaries for the caprice tendency, and three as a history reservoir (Figure 1).

[Figure 1: A strategy string]

4. The Genetic Operators

An elitist selection scheme is used for the selection of parents, where strings with high fitness are inserted into the next generation without undergoing further genetic operations. On the other hand, strings with lower fitness are replaced by offspring reproduced from the strings with higher fitness.

For the crossover operation, a uniform crossover was used, where an offspring is generated randomly, ternary by ternary, from two parents.

In mutation, a bit-wise mutation is applied with small probability; a ternary is mutated to one of the two other values. For example, a 0-ternary is mutated to 1 or 2 (e.g. by flipping a coin).

5. Mechanics of the Tournament

In the computer simulation, we set a tournament of two games: a training game and a one-to-one game. In the training game, the GA trains a randomly generated string (R-player) to play against an opponent (O-player), e.g. a human. Prior to the beginning of the game, the game course is decided. The GA generates a population of strings randomly, then the game starts between the R-player and the O-player. For each choice of the O-player, in addition to the R-player, the GA makes a choice for each string in the population. The process is repeated for game-course times. Then to each string in the population a fitness value is attached, calculated as the percentage of wins to the summation of wins and losses. Following this step, the
GA reproduces a new population through application of the genetic operators on the current population. The game continues for a specified number of generations. The strategy encoded in the string with the best fitness is then adopted for future games between the O-player and the GA-player.

In the second, one-to-one game, the O-player plays against the GA-player, which adopts the best strategy obtained through the training game. The game goes on for several rounds, and the player with the least losses and most wins is announced as the WINNER.

During the one-to-one game, the decoded value of the caprice C is scaled by a factor A, which is proportional to the number of successive losses of the GA strategy during the game.

The algorithm below shows the mechanics of the genetic algorithm during the training game:

Begin
    t = 0; c = 0
    Initialize P(t)    (P(t) is initialized randomly)
    While (t ≠ maximum-generation) Do
        While (c ≠ game-course) Do
            O-player and R-player make a choice each.
            The GA makes a choice for each string in the population.
            Calculate scores
            c = c + 1
        End
        c = 0
        Evaluate P(t)
        Select P̂(t) from P(t)
        Select P̃(t) from P(t)
        Crossover P̃(t)
        Mutate P̃(t)
        Evaluate P̃(t)
        P(t + 1) = P̂(t) ∪ P̃(t)
        t = t + 1
    End
End

In this algorithm, P(t) denotes a population of μ individuals at generation t, P̂(t) is a special set of λ (λ < μ) elitist individuals, P̃(t) is a population of (μ − λ) individuals selected randomly from among P(t), and pop-size denotes the population size.

6. Computer Simulation

In the computer simulation, we tested the GA strategy against two different opponents: a systematic opponent, and humans.

6.1. GA vs Systematic Player

Here an opponent that makes a systematic choice is challenged by the GA strategy. The systematic choice is made by a function that tends to make a biased choice: it prefers one choice over the others, or makes the choices systematically in turn.

Table 2 shows the conditions for the simulation.

Table 2
Number of generations    50
Game course              150
Population size          30
Elite selection rate     50%
Mutation rate            0.01
a                        0.005

The graph of Figure 2 shows the best fitness as well as the population average fitness during a training game under the environment of Table 2.

[Figure 2: Best & average fitness through generations (training game, GA vs systematic player)]

Looking at the graphs in Figure 2, we notice that the GA, after a random start, soon seems to understand the pattern of the behavior of the systematic player, and then develops strategies that challenge the systematic player.

The best strategy obtained is then adopted, and five sets of one-to-one games, with 100-, 200-, 300-, 400-, and 500-long courses, are carried out. The strategy maintained a high score against the systematic player, as appears in Figure 3.

[Figure 3: Performance of GA vs systematic player (one-to-one game)]

[Figure 4: Best & average fitness through generations during the training game]
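As an illustrative sketch of the string decoding described in Section 3 (the helper names, the list representation, and the example values are our own assumptions, not the authors' code), one move of a 34-ternary strategy might look like this in Python:

```python
import random

# Behavioral alphabet from Section 3: S = 0, P = 1, R = 2.
LETTERS = "SPR"

def history_index(last3):
    """Encode the opponent's last three choices, e.g. 'SPR', as a base-3 integer in 0..26."""
    d = [LETTERS.index(c) for c in last3]
    return d[0] * 9 + d[1] * 3 + d[2]

def decide(strategy, history, a=0.005):
    """One move of a 34-ternary strategy: 27 decision ternaries,
    4 caprice ternaries, and a 3-letter history reservoir.
    With probability P_caprice = a * C, a capricious random move is played."""
    decision, caprice = strategy[:27], strategy[27:31]
    C = int("".join(str(t) for t in caprice), 3)   # caprice value decoded as base 3
    if random.random() < a * C:
        return random.choice(LETTERS)              # defensive, capricious decision
    return LETTERS[decision[history_index(history)]]  # deterministic, offensive decision
```

For instance, with history "SPR" (base-3 index 5) and a decision ternary of 1 at that position, the deterministic move is Paper; larger caprice ternaries raise the probability of a random move instead.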

6.2. GA vs Human

We set up an environment of ten people to play against the proposed GA strategy.

Table 3 shows the parameters and simulation environment of the GA during the training game.

Table 3
Number of generations    …
Game course              …
Population size          …
Elite selection rate     …
Mutation rate            0.015
a                        0.005

[Figure 5: Elite GA strategy vs 10 players (30 sets for each player)]
Figure 4 shows the best fitness as well as the population average fitness during the training game against player 1.

The computer then adopts the best strategy obtained during the training game for the subsequent main game. Each player plays 30 sets (30 times), and Figure 5 shows the output of the tournament.

7. Conclusion

The GA strategy maintained superiority over the systematic player as well as the human players. For the systematic strategy, it is relatively easy for the GA to predict the pattern of the behavior of the opponent. In contrast, the behavior of human players is continuously vacillating and is difficult to predict. Nevertheless, the genetic algorithm, with the adopted rule for coding, developed good strategies that performed very well.

A novel feature of our approach is the introduction of the caprice concept, which gives room for capricious behavior away from the encoded strategy when encountering subtle behavior patterns of the opponent that are not predicted during the preliminary training phase.

The authors believe there is still a lot to do with the experiments for the approach: experiments with a longer historical record of the GA opponent (more than three!), and trying the GA against other strategies (e.g. Tit-for-tat!), to mention but a few.

The work demonstrates a machine learning application using genetic algorithms. The problem considered is drawn from the game area: an archetypical problem where decisions are usually non-deterministic and mostly biased.

References

[1] D. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, 1989.

[2] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, 3rd edition, Springer-Verlag, New York, 1996.

[3] R. Axelrod, Genetic Algorithm for the Prisoner Dilemma Problem, in [4], pp. 32-41.

[4] L. Davis (Editor), Genetic Algorithms and Simulated Annealing, Morgan Kaufmann Publishers, San Mateo, CA, 1987.

[5] C.G. Langton, C. Taylor, J.D. Farmer, & S. Rasmussen (Editors), Artificial Life II, SFI Studies in the Sciences of Complexity, Vol. X, Addison-Wesley, 1991.

[6] J. Maynard-Smith, Evolution and the Theory of Games, Cambridge: Cambridge University Press, 1982.

[7] J. Maynard-Smith, G.R. Price, The Logic of Animal Conflict, Nature, Vol. 246, pp. 15-18, London, 1973.

[8] J.H. Holland, Adaptation in Natural and Artificial Systems, Ann Arbor, MI: Univ. of Michigan Press, 1975.

[9] G.H. Burgin, Systems identification by quasilinearization and evolutionary programming, J. Cybern., vol. 3, no. 2, pp. 6-75, 1973.

[10] L.J. Fogel, Autonomous automata, Ind. Res., Vol. 4, pp. 14-19, 1962.

[11] I. Rechenberg, Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution, Stuttgart, Germany: Frommann-Holzboog, 1973.

[12] H.-P. Schwefel, G. Rudolph, Contemporary evolution strategies, 3rd Int. Conf. on Artificial Life (Lecture Notes in Artificial Intelligence), F. Morán, A. Moreno, J.J. Merelo, and P. Chacón, Eds., Berlin, Germany: Springer, Vol. 929, pp. 893-907, 1995.

[13] J. von Neumann and O. Morgenstern, Theory of Games and Economic Behavior, Princeton University Press, Princeton, 1947.

[14] B. Sinervo, C.M. Lively, The Rock-Paper-Scissors game and the evolution of alternative male strategies, Nature, 380, pp. 240-243, 1996.

[15] J. Nash, Non-cooperative games, Annals of Mathematics, 54, pp. 286-295, 1951.

[16] J.M. Smith, G.A. Parker, The Logic of Animal Conflict, Nature, 246, pp. 15-18, 1973.

[17] J.M. Smith, Evolution and the Theory of Games, Cambridge University Press, London, 1982.

[18] S. Stahl, A Gentle Introduction to Game Theory, American Mathematical Society, Mathematical World, Vol. 13, 1999.
