
Evolving a Checkers Player Without Relying on Human Expertise

David B. Fogel
Natural Selection, Inc., 3333 N. Torrey Pines Ct., Suite 200, La Jolla, CA 92037
Phone: (858) 455-6449; Fax: (858) 455-1560; dfogel@natural-selection.com

Evolutionary algorithms can be used to learn how to play complex games of strategy without relying on human expertise. In this article, I discuss the use of evolutionary computation and artificial neural networks in learning how to play checkers. Starting from neural networks that were created randomly, an evolutionary algorithm has been able to craft a network that can play checkers at a nearly expert level. No features other than the positions of pieces on the board and the piece differential were provided. The evolutionary algorithm learned everything else on its own, simply by playing the game.

Introduction

Imagine yourself seated at a table. In front of you is an 8 X 8 board of squares that alternate colors. I’m seated across from you and tell you that we’re going to play a game. Each of us starts with 12 pieces placed on alternating squares, as shown in Figure 1. You are playing “white” and I’m playing “red.” The red player moves first. Initially you think I might be giving myself an advantage, but since you have no idea of what game we are about to play, you continue to listen.

You can only move your pieces diagonally forward one square at a time, unless they are next to an opponent’s piece and there is an empty square directly behind that piece, in which case you are forced to jump over the opponent’s piece. In fact, you must continue jumping over the opponent’s pieces in succession, if possible. In doing so, you also remove your opponent’s piece or pieces from the board. If you have more than one possibility to jump your opponent’s pieces, you can choose which way you’d like to execute the jump. If any of your pieces makes it to the back row of the board, it becomes a special piece called a “king.” This piece can move forward or backward diagonally, again one square at a time unless it is involved in a jump.

“Let’s play,” I say. You protest naturally that I haven’t told you the object of the game.

“Right, let’s play,” and I make my first move. You counter with a move. We play for several more moves and eventually I declare the game to be over.

“Let’s play again,” I suggest.

“But wait,” you say, “did I win, or lose, or draw, or what?”

“I’m not telling yet. Let’s play again.”

Now imagine that we play five such games, and only after the fifth game do I tell you that you earned, say, seven points for playing those games. I don’t tell you which games earned you the points, or even indeed if you might have started with, say, 20 points and lost points on each and every game. The only way for you to find out is to play another series of five games and compare the total points that you receive after that series.

Here’s the critical thought experiment: How long would it take you to become an expert at this game? How many games would you have to play? What features about the play of the game would you look for? One obvious feature might be the piece differential: the difference in the number of pieces that I have and the number of pieces that you have. It would also be fairly easy to correlate whether it was good to be ahead on pieces or behind once you find that the game ends consistently when one player has no more pieces and that, over many trials, the player who ends with no moves receives fewer points. But how long would it take you to become really competent in this game?

The game at hand, of course, is checkers, a common board game of skill that includes no randomness in play (unlike backgammon, for example). Ever since computers were first developed, there has been interest in designing computer algorithms to play games like checkers. Chess has received the most attention. The success of Deep Blue over world chess champion Garry Kasparov in 1997 was recognized widely in both the popular and scientific press. Other games have also been tackled, including checkers, Othello, and backgammon. With a few exceptions, these efforts have relied on domain-specific information programmed into an algorithm in the form of weighted features that were believed to be important for assessing the relative worth of alternative positions in the game. The programs relied, essentially, on human expertise to defeat human expertise.

Figure 1. The opening board position. The picture is from an Internet gaming site, www.zone.com, where people can log in to play a variety of games. In this case, the game is checkers. Players can chat by using the “chat box” below the board.

Defeating a human champion, or even an everyday person that you meet, in a game of skill using just a computer algorithm is a noteworthy accomplishment. But the significance of such accomplishments in terms of computational intelligence can be quite minor. After all, if we program human expertise in the form of chunks of knowledge and rules about how to analyze different situations, then we can rely on the sheer speed of the computer to evaluate many more positions than any human could ever imagine. Deep Blue evaluated 200 million chess boards every second. No human can match this, or even come close. But where is the intelligence in an automaton like Deep Blue? Everything it “knows,” it knows because it was effectively told. It learned nothing on its own. A system that never learns, and has no capability of ever learning, does not deserve the description of intelligent. As I wrote in Evolutionary Computation [Fogel 2000], intelligence is the ability to adapt behavior to meet desired goals in a range of environments. Without an ability to adapt, to learn which behaviors are appropriate in different settings, there can be no intelligence.

One challenge that is more significant than relying on existing human expertise to generate a program that can play a game of skill, like checkers or chess, is to have an algorithm learn competent strategies without such knowledge, simply by playing successive games between candidate strategies. Those that do well are favored over those that do poorly. Variations can be made to algorithms that do well, and the process of playing games can then be iterated. The hope is that a competent algorithm for playing the game, whatever that game might be, would emerge from this process after several iterations. Such a protocol matches nicely within the framework of evolutionary computation. By simulating the evolutionary process, by which individuals compete for survival, it is possible to use the computer as a tool for problem solving, and that tool can be used to address games where optimal strategies are unknown. Checkers is one such game.

A Brief History of Computer Checkers

Many attempts have been made to write programs that play checkers. The current computer world champion checkers program is called Chinook [Schaeffer 1996]. The program evaluates alternative checkerboard positions by using a weighted sum of features that are considered important. These features include (1) the piece count, (2) the king count, (3) whose turn it is, and (4) the potential for a checker to advance uncontested to a king. Chinook can access previous games played by grand masters and has a complete endgame database for all possible checkerboard positions with eight or fewer pieces. The correct moves in these endgames have been enumerated completely, so in these cases Chinook plays without error. Everything that Chinook knows has been programmed by hand; no learning algorithms have been employed with success, either to adjust the weights of the features or to add or delete features, or indeed to invent new features.

In contrast to Chinook, the best-known effort in designing an algorithm to play checkers was that by Samuel [Samuel 1959]. This was one of the first apparently successful experiments in machine learning. Samuel’s method relied in part on the use of a polynomial evaluation function that consisted of a subset of weighted features chosen from a larger list of possibilities. The technique relied on a self-learning procedure whereby one player competed against the other. The loser was replaced by a deterministic variant of the winner, created by altering the weights on the features that were used, or in some cases by replacing features that had low weight with other features. Samuel’s program, which also included rote learning of games played by masters, was played against and defeated R.W. Nealey in 1962. IBM Research News described Nealey as “a former Connecticut checkers champion, and one of the nation’s foremost players.”

Unfortunately, the success of Samuel’s effort was overstated, and it continues to be overstated. Consider the following:

1. Nealey, in fact, only became a Connecticut champion later, and his level of play on a national level was uncalibrated.

2. The game itself was not well played: Using Chinook, Schaeffer [1996, pp. 93–97] showed that both Nealey and Samuel’s program made several errors.

3. Nealey defeated Samuel’s program in a rematch the next year, and Samuel played four games with his program against both the world champion and challenger in 1966, losing all eight games.

4. Subsequent judgment by the editor of a checkers magazine in the mid-1970s put Samuel’s program below the “Class B” level. To place this in context, ratings are assigned by points: 2400+ is grand master, 2200–2399 is master, 2000–2199 is expert, 1800–1999 is Class A, 1600–1799 is Class B, and so forth. Chinook is rated higher than 2800, and the closest human competitors are in the 2600s.

In retrospect, Samuel’s program had one “lucky” and widely publicized early victory. As Schaeffer [1996] wrote, “The promise of the 1962 Nealey game was an illusion.”

The promise of machine learning methods for playing games of skill, however, is not an illusion. A significant challenge is to devise a method of having neural networks learn how to play a game such as checkers without being given expert knowledge in the form of weighted features, prior games from masters, look-up tables of enumerated endgame positions, and so forth, as was done by Samuel and Schaeffer. This new approach appears to be a necessary precursor to any effort to generate machine intelligence that is capable of solving new problems in new ways [Fogel et al. 1966]. The measure of success is the level of play that can be attained against humans without having preprogrammed the requisite knowledge to play well.

Kumar Chellapilla, a doctoral student in engineering at the University of California at San Diego, and I have conducted an experiment that uses evolutionary algorithms to evolve neural networks that represent strategies for playing checkers. These strategies were competed against themselves starting from completely random initializations. Like the thought experiment at the opening of this paper, points were assigned for a win, loss, or draw in each of a series of games. Only the total point score was used to represent the quality of play of a neural network (that is, no credit assignment was employed, even to the level of identifying which games were won, lost, or tied). Networks with the highest scores were maintained as parents for the next generation. Offspring networks were created by randomly varying the connection weights of their parents, and the process was iterated. After 250 generations, the best evolved neural network was tested against a range of human players on an Internet gaming site (www.zone.com) without telling them that they were playing a program. The results indicate that the network not only rose to a high level of play, it was also able to draw against a master.

Figure 2. The neural network topology chosen for the evolutionary checkers experiments. The networks have 32 input nodes (blue) that correspond to the 32 possible positions on the board. The two hidden layers (green) comprise 40 and 10 hidden nodes, respectively. All input nodes are connected directly to the output node (red) with a weight of 1.0. Bias terms affect each hidden and output node as a threshold term (not pictured).


Experiments and Results

The details of the experiment are numerous; those who are interested in repeating the procedure can consult Chellapilla and Fogel [1999a]. Because of space constraints, only the basic implementation is reviewed here.

Each board was represented by a vector of length 32, with each component corresponding to an available position on the board. Components in the vector could take on elements from {–K, –1, 0, 1, K}, where K was the value assigned for a king, “1” was the value of a regular checker, and “0” represented an empty square. The sign of the value indicated whether the piece in question belonged to the player (positive) or the opponent (negative).
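To make this encoding concrete, here is a minimal Python sketch. The function name, the square-ordering convention, and the use of ±2 to mark kings in the raw position are illustrative assumptions of mine, not the authors’ code; only the {–K, –1, 0, 1, K} alphabet and the sign convention come from the text.

```python
# Illustrative sketch of the board encoding described above. Names and
# conventions here are assumptions, not the authors' actual implementation.
K = 2.0  # value assigned to a king (set to 2.0 at the start of evolution)

def encode_board(squares, king_value=K):
    """Map the 32 playable squares to the length-32 input vector.

    Each entry of `squares` is 0 (empty), +1/-1 (own/opponent checker),
    or +2/-2 (own/opponent king); kings are rescaled to +/- king_value.
    """
    return [(king_value if s > 0 else -king_value) if abs(s) == 2 else float(s)
            for s in squares]

# Example: the opening position from the mover's point of view --
# 12 own checkers, 8 empty squares, 12 opposing checkers.
print(encode_board([1] * 12 + [0] * 8 + [-1] * 12))
```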
A player’s move was determined by evaluating the presumed quality of potential future positions. This evaluation function was structured as a fully connected feed-forward neural network (Figure 2). Because of the direct connections between the input and output, the neural networks could compute the piece differential implicitly, but this was the only feature that was allowed a priori (and it constitutes novice, not expert, knowledge). Note also that the value K and the weights of the network could evolve. The standard heuristic in checkers is to use K = 1.5. Such a “cheat” was avoided here.
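A sketch of the 32-40-10-1 network of Figure 2 follows. The hyperbolic-tangent squashing function is an assumption on my part (the article does not name the nonlinearity); the fixed weight of 1.0 on the direct input-to-output connections follows the figure caption, and it is what makes the piece differential available implicitly.

```python
import math
import random

# Sketch of the evaluation network of Figure 2; tanh squashing is assumed.
def make_network():
    rnd = lambda: random.uniform(-0.2, 0.2)  # initial weight range from the text
    return {
        "w1": [[rnd() for _ in range(32)] for _ in range(40)],  # inputs -> hidden 1
        "b1": [rnd() for _ in range(40)],
        "w2": [[rnd() for _ in range(40)] for _ in range(10)],  # hidden 1 -> hidden 2
        "b2": [rnd() for _ in range(10)],
        "w3": [rnd() for _ in range(10)],                       # hidden 2 -> output
        "b3": rnd(),
    }

def evaluate(net, board):
    """Return a scalar in (-1, 1), the presumed worth of `board`."""
    h1 = [math.tanh(sum(w * x for w, x in zip(row, board)) + b)
          for row, b in zip(net["w1"], net["b1"])]
    h2 = [math.tanh(sum(w * x for w, x in zip(row, h1)) + b)
          for row, b in zip(net["w2"], net["b2"])]
    out = sum(w * h for w, h in zip(net["w3"], h2)) + net["b3"]
    out += sum(board)  # direct input-output connections with weight 1.0:
                       # the piece differential enters the evaluation implicitly
    return math.tanh(out)
```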
When a board was presented to a neural network for evaluation, the output node designated a scalar value that was interpreted as the worth of that board from the position of the player whose pieces were denoted by positive values. The closer the output was to 1.0, the better the position. Similarly, the closer the output was to –1.0, the worse the board. All positions that were wins for the player were assigned a value of 1.0 and all losses were assigned the value –1.0.
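Given this interpretation, ranking candidate moves reduces to comparing the network’s outputs for the boards they produce. A one-ply greedy version is sketched below purely for illustration; the experiment itself chose moves by minimax over a four-ply look-ahead, as described next. `evaluate` is the sketch above, and the list of candidate boards is assumed to come from a checkers move generator that is not reproduced here.

```python
def choose_board(net, candidate_boards):
    """Pick the successor board the network scores highest.

    One-ply greedy selection for illustration only; the actual experiment
    selected moves by minimax over a four-ply look-ahead.
    """
    return max(candidate_boards, key=lambda b: evaluate(net, b))
```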
To begin the evolutionary algorithm, a population of 15 neural networks (strategies) was initialized with weights drawn at random from the range [–0.2, 0.2]. The value of K was set initially to 2.0 for all neural networks. New offspring were created from these 15 networks, one offspring per parent, by using Gaussian random variation of the weights (see Chellapilla and Fogel [1999a]).
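The offspring variation can be sketched as follows. A single fixed mutation step size is a simplification of mine; the experiment used a self-adaptive Gaussian scheme detailed in Chellapilla and Fogel [1999a].

```python
import random

def _perturb(value, sigma):
    # Recursively add Gaussian noise to a weight, a vector, or a matrix.
    if isinstance(value, list):
        return [_perturb(v, sigma) for v in value]
    return value + random.gauss(0.0, sigma)

def mutate(parent, sigma=0.05):
    """One offspring per parent: perturb every weight and bias.

    The step size here is illustrative; the real scheme self-adapted its
    step sizes, and the king value K was allowed to evolve as well.
    """
    return {key: _perturb(value, sigma) for key, value in parent.items()}
```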
All parents and offspring competed for survival by playing games of checkers and receiving points for their resulting play. Each player in turn played one checkers game against each of five randomly selected opponents from the population (with replacement). In each of these five games, the player always played red, whereas the randomly selected opponent always played white. In each game, the player scored –2, 0, or +1 points, respectively, for a loss, draw, or win. Points were similarly assigned to the opponents depending on their outcome. Each individual move was determined using a look-ahead of four ply (two moves on each side) in a minimax fashion. After all 30 neural networks had played their series of games, the 15 networks with the highest point totals were retained as parents for the next generation and the process was continued for 250 generations.
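Putting the pieces together, one generation of the competition and selection can be sketched as below, reusing `make_network` and `mutate` from the sketches above. The `play_game` stub stands in for a full checkers engine that picks each move by four-ply minimax over `evaluate`; it returns a fake outcome here because the move generator is not reproduced.

```python
import random

def play_game(red_net, white_net):
    # Placeholder for an actual game with four-ply minimax move selection.
    return random.choice([-2, 0, 1])  # red's points: loss, draw, or win

def one_generation(parents):
    """15 parents plus 15 offspring; each plays five games as red."""
    population = parents + [mutate(p) for p in parents]
    scores = [0] * len(population)
    for i in range(len(population)):
        for _ in range(5):  # five opponents drawn at random, with replacement
            j = random.randrange(len(population))
            outcome = play_game(population[i], population[j])
            scores[i] += outcome
            scores[j] += {1: -2, 0: 0, -2: 1}[outcome]  # opponent's own result
    ranked = sorted(range(len(population)), key=lambda k: scores[k], reverse=True)
    return [population[k] for k in ranked[:15]]  # survivors parent the next round

# The process then iterates for 250 generations:
# parents = [make_network() for _ in range(15)]
# for _ in range(250):
#     parents = one_generation(parents)
```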

The best evolved neural network from the 250th generation was used to play against human opponents on www.zone.com. Each player logging on to this site is assigned an initial rating of 1600, and the player’s rating then increases or decreases depending on his or her performance against other rated competitors. Over the course of 2 weeks, Kumar and I played 90 games against opponents. None of them hinted that we were using a program to play. In all, 47 games were played as red and 43 as white, and we relied mainly on a six-ply look-ahead in these matches.

Figure 3 shows a histogram of the number of games played against players of various ratings and the win–draw–loss record attained in each category. The evolved neural network easily defeated players rated 1800 or lower and had almost as many losses as wins against opponents rated between 1800 and 1900. It was not competitive with players rated more than 2000 (the expert and master categories). The final rating for the neural network was estimated at 1901.98, placing it as a Class A player. For comparison, recall that Samuel’s learning machine, with its prescribed features that were believed to be important, was judged to be below the Class B level. (Chinook, in contrast, stands as a grand master and world champion at 2800+.) Figure 4 shows the sequential rating of the evolved neural network and the rating of the opponents we played throughout all 90 games. Like the other players, the network began its rating at 1600 and then climbed fairly steadily. The best performance of the evolved network was likely recorded in a game against a player rated 2207 (master level), which ended in a draw. At the time, this opponent was ranked number 18 on the website out of more than 40,000 registered players. The sequence of moves for this game can be found in Chellapilla and Fogel [1999a].

Figure 3. The performance of the evolved neural network after 250 generations, played throughout 90 games against human opponents on www.zone.com. The histogram indicates the rating of the opponent and the associated performance against opponents with that rating. Ratings are binned into intervals of 100 units (that is, 1650 corresponds to opponents who were rated between 1600 and 1699). The numbers above each bar indicate the number of wins, draws, and losses, respectively. Note that the evolved network generally defeated opponents who were rated below 1800 and played to about an equal number of wins and losses with those who were rated between 1800 and 1899.

Summary

The results show that an evolutionary algorithm can start with essentially no information in the game of checkers beyond the piece differential and learn, over successive generations, how to play at a level that is challenging to many humans, and even to earn a draw against a master. It is important to emphasize that whatever the computer program learned, it did not learn it from either of its authors: both Kumar and I are poor checkers players, and the program was able to defeat each of us by just the 10th generation.

It is interesting to go back to a comment offered by Allen Newell in 1961 [Minsky 1961]: “It is extremely doubtful whether there is enough information in ‘win, lose, or draw’ when referred to the whole play of the game [such as checkers] to permit any learning at all over available time scales.” In contrast to this conjecture, the results demonstrate that not only can learning occur, but that such learning can be sufficient to at least once play on par with someone ranked at the master level, in the top 20 on a global Internet gaming site. Furthermore, learning can take place on even less information than is offered in win, lose, or draw, because the neural networks in this case never received specific feedback on individual games played. Only an overall point score was made available. Nevertheless, evolution was able to extract nonlinear structure from the game and capture information in the neural networks that was useful in identifying favorable board positions. The specific features that the neural networks invented are unknown, and subsequent analysis may be directed at unmasking these features.

Let’s return to the opening scene of this paper. Suppose that instead of facing an 8 X 8 checkerboard, you instead were looking at a 1 X 32 “board” where the pieces seemed to be able to move in a strange series of ways.

Different squares at different distances appeared to be connected. You might think you were playing some bizarre new game, but of course, all that I’ve intended here is to have scanned the checkerboard from left to right, from top to bottom. This removes the spatial characteristics of the game. This configuration would undoubtedly impose a greater handicap in your attempt to discover useful features and tactics for winning. But this is exactly what the neural network we have evolved had to contend with. It knew nothing of the spatial nature of the game. Adding the provision for allowing the neural networks to evaluate board positions according to spatial features (that it alone invents) offers promise for improving the level of play. Some initial steps in this direction have been offered in Chellapilla and Fogel [1999b]. It remains to be seen how much computation is required for the neural networks to evolve to an “expert” level. Further experimentation is ongoing.
If you’d like to play against our evolved neural network, Kumar and I will be offering matches at the 2000 Congress on Evolutionary Computation in San Diego, California, July 16–19, 2000. The congress is cosponsored by the IEEE Neural Networks Council, Evolutionary Programming Society, and Institution of Electrical Engineers. Look for more information about the meeting at http://pcgipseca.cee.hw.ac.uk/cec2000/.

Acknowledgment

The author would like to thank Kumar Chellapilla not only for his efforts on the project but also for formatting the figures that were used in this article.
Figure 4. The sequential rating of the evolved neural network (ENN) throughout the 90 games played against human opponents. The graph indicates both the network’s rating and the corresponding rating of the opponent on each game, along with the result (win, draw, or loss). The highest rating for the ENN was 1975.8 on game 74. The evolved network completed the 90 games by earning a Class A rating.

References

Chellapilla, K. and Fogel, D.B. 1999a. Evolution, neural networks, games, and intelligence. Proceedings of the Institute of Electrical and Electronics Engineers 87, 9, pp. 1471–1496.

Chellapilla, K. and Fogel, D.B. 1999b. Evolving neural networks to play checkers without relying on expert knowledge. IEEE Transactions on Neural Networks 10, 6. In press.

Fogel, D.B. 2000. Evolutionary Computation: Toward a New Philosophy of Machine Intelligence. Second ed. Piscataway, NJ: IEEE Press.

Fogel, L.J., Owens, A.J., and Walsh, M.J. 1966. Artificial Intelligence through Simulated Evolution. New York: John Wiley.

Minsky, M. 1961. Steps toward artificial intelligence. Proceedings of the IRE 49, 1, pp. 8–30.

Samuel, A.L. 1959. Some studies in machine learning using the game of checkers. IBM Journal of Research and Development 3, 3, pp. 210–219.

Schaeffer, J. 1996. One Jump Ahead: Challenging Human Supremacy in Checkers. Berlin: Springer.
