Games of Knightian Uncertainty as AGI testbeds

Spyridon Samothrakis, IADS, University of Essex, Colchester CO4 3SQ, UK, ssamot@essex.ac.uk
Dennis J.N.J. Soemers, DACS, Maastricht University, PHS1, 6229 EN Maastricht, NL, dennis.soemers@maastrichtuniversity.nl
Damian Machlanski, CSEE, University of Essex, Colchester CO4 3SQ, UK, d.machlanski@essex.ac.uk

Abstract—Arguably, for the latter part of the late 20th and early 21st centuries, games have been seen as the drosophila of AI. Games are a set of exciting testbeds, whose solutions (in terms of identifying optimal players) would lead to machines that would possess some form of general intelligence, or at the very least help us gain insights toward building intelligent machines. Following impressive successes in traditional board games like Go, Chess, and Poker, but also video games like the Atari 2600 collection, it is clear that this is not the case. Games have been attacked successfully, but we are nowhere near AGI developments (or, as harsher critics might say, useful AI developments!). In this short vision paper, we argue that for game research to become relevant to the AGI pathway again, we need to be able to address Knightian uncertainty in the context of games, i.e. agents need to be able to adapt to rapid changes in game rules on the fly, with no warning, no previous data, and no model access.

I. INTRODUCTION

Artificial Intelligence (AI) research papers have traditionally justified their use of games as benchmarks with the claim that they provide scaffolding for Artificial General Intelligence (AGI); the paper (and blog post) by Togelius [1] best captures the early optimism in the process. Games are cultural artefacts of considerable importance, but also excellent "mini-universes" one can experiment in. Although it was never clear what this AGI would look like at the limit (although certain authors have tried to be more specific [2], with practical levels of automation being discussed even further back [3]), one can assume that the goal was to end all need for labour, or to "summon" an entity that would solve almost all human problems. Within the wider scope of understanding intelligence, games would also play a significant role in telling us more about ourselves, thus potentially helping with advances in fields like psychology and the broader cognitive sciences. However, we have lately experienced a shift; games are no longer seen as the "way to AGI." The advent of Large Language Models (LLMs) moved the interest of the AI community away from games and towards methods that learn from vast quantities of data through a self-supervised process.

The problem of games (and related research, such as game competitions) not being up to the task of helping bring about AGI was spotted early on by Chollet [4]. The insights identified never seemed to catch on with the game AI community, which became ever more deeply involved in games for the sake of games (i.e. studying the artefact itself), rather than using games with the goal of AGI in mind. This paper works on the premise that, while a large part of the games community is interested in games for the sake of games, the almost universal shift to games as cultural artefacts is just a deviation stemming from perceived impotence. Drawing inspiration from recent discussions in the field of economics [5], we propose a set of example games and benchmarks that can help rebuild trust in the "game AI for AGI" agenda. In particular, we propose working around a new class of games, which we term "Games of Knightian Uncertainty," where rapid rule changes are the norm.

The rest of this paper is organised as follows: Section II outlines the causes behind the failure of games as AGI testbeds, Section III discusses specifically why General Game Playing is no longer a useful AGI benchmark, while Section IV introduces Knightian Games. We conclude with a very short discussion in Section V.

II. WHAT PROBLEMS ARE STILL HARD FOR AI AND WHY?

The recent successes of LLMs have created a situation where it is hard to argue that there is an alternative path to AGI that does not involve huge amounts of data and many GPU cycles. Yet these models fail at a multitude of tasks, ranging from addition to reasoning. The status quo is so pro-LLM that the burden of proving their shortcomings lies with the accuser, as the default perspective is that LLMs are indeed a form of AGI (or an AGI-to-be, pending some minor problems here and there). A more honest evaluation would show that the original criticisms of Chollet [4] and Mitchell [6] still hold. More specifically, the problems we face come from at least two major aspects related to foundational practices in modern Machine Learning (ML), discussed in the following subsections.
Fig. 1: Training, test, and OOD test data, and the default responses of modern regressors. (a) A board simulated using y = cos(x0)cos(x1), with the colour intensity denoting y, and x0, x1 ∼ U[−20, 20] \ [5, 15] for both the training-set and test-set data points, while the OOD test set is drawn from U[5, 15] (the shaded area). (b) Results using LightGBM; test MSE outside the OOD area is ≈ 0.01, while inside the OOD area it is ≈ 0.42. (c) Results using a standard MLP (trained using AdamW, 4 layers of 20 ReLU neurons); test MSE outside the OOD area is ≈ 0.01, while inside the OOD area it is ≈ 0.29. Notice the significant error on the OOD test set (within the shaded area) – the gap between data points is far too wide for modern ML methods to generalise, while a human would arguably fill in the pattern trivially.

A. Representation learning learns incomplete representations

The basic selling point of representation learning (i.e. what neural networks aim to do) is to help create abstractions that can be reused widely beyond the specific case. This was to be achieved through standard machine learning (given enough data) or by combining various data streams into one large network [7]. Intuitively, the discrete analogue of this research programme would be genetic programming/symbolic regression, where the basic functions are discovered by an algorithm. Unfortunately, this is almost never the case. The latent variables learnt by neural networks often do not capture any global properties meaningfully and thus fail to extrapolate, something covered extensively by Chollet [4] and termed "memorising infinity" by Saba [8]. Representation learning, instead of creating useful abstractions, seems to be trying to fill a huge hashmap-like structure with as much knowledge as possible. An example of this is shown in Figure 1, where out-of-the-box regressors successfully generalise on a test set drawn from the same distribution, but fail when the test set comes from a different distribution. Note that symbolic regression would be successful in this case, but as we increase input dimensions, it becomes computationally intractable. The situation is bad enough that, in order to reach superhuman levels, agents have to play games orders of magnitude more than humans do, in order to fill those differentiable, infinitely large hashmaps.

As a consequence of memorising instead of learning robust abstractions, and combined with other issues such as catastrophic forgetting, agents quite often lose to adversarial examples [9].
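To make the failure mode behind Figure 1 concrete, here is a minimal sketch of that style of experiment (our illustrative reconstruction, not the authors' original code; exact errors depend on seeds and hyperparameters): fit an out-of-the-box regressor on y = cos(x0)cos(x1) with the square [5, 15]^2 held out of training, then measure the error inside and outside that region.

# Illustrative sketch of a Fig. 1-style OOD experiment (assumes numpy,
# scikit-learn and lightgbm are installed; numbers are illustrative only).
import numpy as np
from lightgbm import LGBMRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

def target(x):
    return np.cos(x[:, 0]) * np.cos(x[:, 1])

def sample_outside_hole(n, low=-20.0, high=20.0, hole=(5.0, 15.0)):
    # Rejection-sample points uniformly in [low, high]^2, excluding the
    # [5, 15]^2 square that serves as the OOD region.
    points = []
    while len(points) < n:
        x = rng.uniform(low, high, size=(n, 2))
        in_hole = np.all((x > hole[0]) & (x < hole[1]), axis=1)
        points.extend(x[~in_hole])
    return np.asarray(points[:n])

X_train, X_test = sample_outside_hole(20_000), sample_outside_hole(5_000)
X_ood = rng.uniform(5.0, 15.0, size=(5_000, 2))   # the held-out square

model = LGBMRegressor(n_estimators=500).fit(X_train, target(X_train))
print("in-distribution MSE:", mean_squared_error(target(X_test), model.predict(X_test)))
print("OOD MSE:", mean_squared_error(target(X_ood), model.predict(X_ood)))

A symbolic-regression baseline that recovered cos(x0)cos(x1) directly would, by construction, extrapolate into the held-out square.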
B. The real world is non-stationary and open

Openness tends to refer to different things in Reinforcement Learning (RL) and economics. In RL, the environment is considered to be somewhat stationary [10], [11]. One is not expected to learn Go and then generalise to backgammon, but at best to slide into an adjacent game. That is, most open-ended AI frameworks assume that there is a series of fixed environments that do not differ dramatically from one another, and open-endedness tends to focus on agents developing increasingly more complicated behaviour. In economics, openness comes from new events that appear unexpectedly, where no statistical relationship with what was happening before can be established. Imposing closed-model thinking on open systems has been argued to be catastrophic [12]. Static environments, where an agent tries to identify how to operate in a single reward regime [13], [14], are the norm among AI theories, with the upward generalisation limit often set to understanding the causal structure of the world [15]. This implies that a large portion of all possible "realities" would have been traversed by an agent throughout their lifetime, and that at some point the agent will select how to act based on past observations. There is an argument to be made, however, stemming from the failure to make sufficient progress in robotics and self-driving cars, that the world is not stationary but changes all the time in unexpected ways, which makes statistical reasoning unsuitable as AI's core mechanism of analysis, due to far too many "unknown unknowns".

III. GENERAL GAME PLAYING

The most obvious example of using games as "AGI scaffolding" comes from general game playing competitions. These include GVG-AI [16] and GGP [17], but also allied approaches [18] and Atari games. The idea here is that one has to get insights beyond certain hand-coded heuristics for playing a single game (something popular in single-game tasks) and come up with algorithms that could attack any game from scratch. The problem with these competitions is that the agents involved are not general in any sense. Almost always, the agents assume the existence of an easy-to-access (either online or offline) model, which does not adequately capture settings where more general intelligence would be needed. At best, what is learnt is how to act in a wide array of similar environments, i.e. the resulting agents are adapted to a specific game, not adaptive to wider environments. Overall, the independent and sequential adaptation regime that this type of research promotes might be important for algorithmic development, but it does very little to support AGI goals. Quoting Hernández-Orallo [19], "AI is the science and engineering of making machines do tasks they have never seen and have not been prepared for beforehand", and if we assume the agent has received a perfect model a priori, a massive chunk of the problem is effectively solved by the model. Researchers have indeed tried to soften these requirements, but with some notable exceptions [4], open settings (as for example in Wang et al. [20]) remain limited to very basic distributional shifts between what one uses during training and during testing.

IV. KNIGHTIAN UNCERTAINTY

A. Types of uncertainty

Researchers developing game-playing agents have so far (mostly) focused on two forms of uncertainty: epistemic and aleatoric [21]. Both forms of uncertainty are very well known to game designers and game players. Assuming an extensive-form game (the most general form of such games), epistemic uncertainty refers to the lack of knowledge that an agent playing the game has about the world. In practical terms, this translates into a quantification of "how good" an action is. Given enough playthroughs, a well-adapted agent should drop this to zero; the effects of every action should be known to them. In contrast, aleatoric uncertainty reflects parts of the rules of the game where randomness is irreducible, e.g. dice rolls or drawing cards (or, in reinforcement learning lingo, stochastic transitions, rewards and actions). In traditional games, depending on game type, there are incentives for agents to play around with both forms of uncertainty (e.g. increasing aleatoric uncertainty through bluffing in poker so as to increase the epistemic uncertainty of your opponents, or decreasing epistemic uncertainty while increasing aleatoric uncertainty through exploratory actions). Both forms have been studied extensively in RL, mainly through the exploration literature (e.g. see Jiang et al. [22] or Turner et al. [23] for excellent reviews), and attacking them is vital for all agents, but doing so does not address the main problems of agents learning useful abstractions and being able to act with limited data.

B. Knightian uncertainty

Historically, a third form of uncertainty, termed Knightian uncertainty, has been both important and overlooked. The concept is borrowed from economics (for an excellent overview, see Sunstein [5]) where, as Keynes puts it, there might be situations where "there is no scientific basis on which to form any calculable probability whatever", and it is tightly coupled with "black swans" or "unknown unknowns". The real world is an excellent source of such events; however, it was never clear how these concepts could be applied to games. What we propose here is to allow out-of-distribution (OOD) events to take place habitually throughout a game. As an example, imagine an agent playing chess when suddenly the rules for how the rook moves change, and it now moves the same way as a bishop or a king, while the board's starting configuration is random. Whatever policies and value functions were learnt up to this point are now moot. If one treats everything as aleatoric or epistemic uncertainty, the intrusion of new rules mid-game would cause havoc to an already trained agent, and would require (at best) extensive retraining to bring the epistemic uncertainty levels down to something manageable. To push things even further, imagine an agent taught how to play backgammon, but now forced to play checkers with the remaining pieces without having seen checkers before. Although no formal definition of a game of Knightian uncertainty currently exists, we propose the following:

Definition 1. A game of Knightian uncertainty is one where the transition function, rewards, and the set of actions and observations are themselves non-stationary functions that can change abruptly at any point.

In other words, not only is the agent never sure which game they are playing (this is generally called "incompleteness" in game-theoretic lingo), but they also do not know which games are potentially possible. One might claim that an agent might need some hints to link back to prior knowledge, but this should not come from linking back to probabilistic feature spaces (and the metric functions this alludes to), but from creating abstractions and reasoning, something which is probably embedded within human value functions. Language and the way we communicate through stories, and the generalisation capacity inherent in successful models [24], potentially combined with causality [25], could be a way forward.
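As an illustration of Definition 1, the sketch below (an illustrative example of ours; the rule-set objects and their names are hypothetical placeholders, not an existing benchmark API) wraps several rule sets behind a single Gym-style interface and swaps them abruptly mid-episode, with no signal to the agent.

# Illustrative sketch of a "game of Knightian uncertainty": an environment
# whose rules can change abruptly at any step. The rule sets (e.g.
# "chess_standard", "rook_moves_like_bishop") are hypothetical placeholders.
import random

class KnightianGameEnv:
    def __init__(self, rule_sets, switch_prob=0.01):
        # rule_sets: dict mapping a name to an object exposing reset() and
        # step(action), each with its own actions, observations and rewards.
        self.rule_sets = rule_sets
        self.switch_prob = switch_prob
        self.current = random.choice(list(rule_sets.values()))

    def reset(self):
        self.current = random.choice(list(self.rule_sets.values()))
        return self.current.reset()

    def step(self, action):
        # With small probability the rules change mid-episode: transition
        # function, rewards, actions and observations may all differ, and
        # the agent receives no indication that this has happened.
        if random.random() < self.switch_prob:
            self.current = random.choice(list(self.rule_sets.values()))
        return self.current.step(action)

The point of such a wrapper is that neither the transition function nor the action and observation structure can be treated as fixed, which is exactly what breaks agents that only model epistemic and aleatoric uncertainty.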
To help support research on such games, a set of benchmarks should be organised to create measurable objectives, building upon GGP. One such example, portrayed in Figure 2, would be as follows. There are two different "steps" involved, in which participants are given a set of games (e.g. variations of chess) and are asked to generalise either to different variations of the same game (e.g. different chess variants) or to completely different games (e.g. backgammon). We call the first step "near OOD" (it reflects games that would be reachable through domain randomisation) and the second step "far OOD", which includes different games – no model is to be provided, thus making search impossible. During evaluation, agents would be exposed to a limited set of demo games (e.g. 5-10 demo runs) for their new setup, but otherwise no further information should be provided. The interface should follow the well-known observation-action paradigm of AI-GYM, but with observations and actions of arbitrary size. Example setups for a number of demo cases are provided below.

1) Chess: Training Set: Players learn the fundamentals of chess through the training set, which represents the standard version of the game played on an 8×8 grid with traditional pieces and rules. Near OOD: Test set variations include Chess960 (Fischer random chess), where the starting position of the pieces is randomised, and chess variants with alternative board sizes, alternative pieces, or new types of pieces. Far OOD/Knightian: Playing poker and backgammon after training on near OOD chess variants.
[Figure 2 diagram: Training (e.g. a limited collection of GVG-AI games) → 5-10 demo runs → Testing Near OOD (i.e. variations of the training games) → 5-10 demo runs → Testing Far OOD (e.g. games different to the training set), with a "retrain" arrow linking the stages.]

Fig. 2: A potential benchmarking setup for Knightian uncertainty. Agents train using the original game and are expected to generalise to variations, while after training with variations ("Near OOD") they are asked to generalise to different games.
2) Poker: Training Set: Both limit and no-limit Texas Hold'em. Near OOD: Poker variants such as Omaha and 5-Card Draw. Far OOD/Knightian: Playing Chess or Go after training on near OOD poker variants.

3) Mario: Training Set: Players learn the basic mechanics and controls of Mario Bros, a classic platformer. Near OOD: Variations include different levels, challenges and power-ups, as well as fan-made levels and mods created by the community. Far OOD: Playing Pac-Man following extensive training in the near OOD setting.

4) GVG-AI: Training Set: A small set of GVG-AI (General Video Game AI) games. Near OOD: Variations of the original training set (e.g. adding new enemies to Space Invaders, if it was included in the training set). Far OOD: A different set of GVG-AI games that have as little relationship to the near OOD setting as possible.
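To make the Figure 2 protocol concrete across these demo cases, the following is a minimal, hypothetical evaluation harness (make_game, the Agent methods, and the game identifiers are placeholders of ours, not an existing API): unrestricted training on the training set, a handful of demo runs on each unseen game, and a scored run with no model access.

# Hypothetical harness for the Fig. 2 protocol; make_game() and the agent
# interface (act/observe) are illustrative placeholders, not a real library.
def run_episode(env, agent, learn=False):
    obs, done, total = env.reset(), False, 0.0
    while not done:
        action = agent.act(obs)
        obs, reward, done = env.step(action)
        total += reward
        if learn:
            agent.observe(obs, reward, done)
    return total

def evaluate(agent, train_games, near_ood_games, far_ood_games, demo_runs=5):
    # Unlimited training on the training set.
    for g in train_games:
        for _ in range(10_000):
            run_episode(make_game(g), agent, learn=True)
    scores = {}
    for label, games in (("near_ood", near_ood_games), ("far_ood", far_ood_games)):
        for g in games:
            # Only a few demo runs on the unseen game; no model access.
            for _ in range(demo_runs):
                run_episode(make_game(g), agent, learn=True)
            scores[(label, g)] = run_episode(make_game(g), agent, learn=False)
    return scores

# e.g. evaluate(agent, ["chess"], ["chess960"], ["backgammon"])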
V. CONCLUSION

The games community would either have to accept its irrelevance in the AGI race, or refocus on game benchmarks that matter. In tandem with other AI subcommunities, there is a tendency to ignore the harder problems precisely because they are hard, and because solving them requires new approaches that are not available at the time. However, if one is to accept that games do have a role to play in setting up meaningful benchmarks, current competition setups would have to change substantially and test the limits of the generalisation capacity of game-playing agents.
REFERENCES

[1] J. Togelius, "AI researchers, video games are your friends!" in Computational Intelligence: International Joint Conference, IJCCI 2015, Lisbon, Portugal, November 12-14, 2015, Revised Selected Papers. Springer, 2017, pp. 3–18.
[2] M. R. Morris, J. Sohl-Dickstein, N. Fiedel, T. Warkentin, A. Dafoe, A. Faust, C. Farabet, and S. Legg, "Levels of AGI: Operationalizing progress on the path to AGI," arXiv preprint arXiv:2311.02462, 2023.
[3] H. Braverman, Labor and Monopoly Capital: The Degradation of Work in the Twentieth Century. NYU Press, 1974. [Online]. Available: http://www.jstor.org/stable/j.ctt9qfrkf
[4] F. Chollet, "On the measure of intelligence," 2019.
[5] C. R. Sunstein, "Knightian uncertainty," Available at SSRN 4662711, 2023.
[6] M. Mitchell, "Abstraction and analogy-making in artificial intelligence," Annals of the New York Academy of Sciences, vol. 1505, no. 1, pp. 79–101, 2021.
[7] K. Rakelly, A. Gupta, C. Florensa, and S. Levine, "Which mutual-information representation learning objectives are sufficient for control?" Advances in Neural Information Processing Systems, vol. 34, pp. 26345–26357, 2021.
[8] W. Saba, "Memorizing vs. Understanding (read: Data vs. Knowledge)," https://medium.com/ontologik/memorizing-vs-understanding-read-data-vs-knowledge-d27c5c756740, accessed 01-04-2024.
[9] T. T. Wang, A. Gleave, T. Tseng, K. Pelrine, N. Belrose, J. Miller, M. D. Dennis, Y. Duan, V. Pogrebniak, S. Levine et al., "Adversarial policies beat superhuman Go AIs," in International Conference on Machine Learning. PMLR, 2023, pp. 35655–35739.
[10] K. O. Stanley, J. Lehman, and L. Soros, "Open-endedness: The last grand challenge you've never heard of," O'Reilly, 2017.
[11] M. Samvelyan, A. Khan, M. D. Dennis, M. Jiang, J. Parker-Holder, J. N. Foerster, R. Raileanu, and T. Rocktäschel, "Maestro: Open-ended environment design for multi-agent reinforcement learning," in The Eleventh International Conference on Learning Representations, 2022.
[12] B. J. Loasby, "Closed models and open systems," Journal of Economic Methodology, vol. 10, no. 3, pp. 285–306, 2003.
[13] K. J. Friston and K. E. Stephan, "Free-energy and the brain," Synthese, vol. 159, pp. 417–458, 2007.
[14] M. Hutter, Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability. Springer Science & Business Media, 2005.
[15] J. Richens and T. Everitt, "Robust agents learn causal world models," arXiv, vol. abs/2402.10877, 2024. [Online]. Available: https://api.semanticscholar.org/CorpusID:267740124
[16] D. Perez-Liebana, S. Samothrakis, J. Togelius, T. Schaul, S. M. Lucas, A. Couëtoux, J. Lee, C.-U. Lim, and T. Thompson, "The 2014 general video game playing competition," IEEE Transactions on Computational Intelligence and AI in Games, vol. 8, no. 3, pp. 229–243, 2016.
[17] H. Finnsson and Y. Björnsson, "Learning simulation control in general game-playing agents," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 24, no. 1, 2010, pp. 954–959.
[18] M. Stephenson, E. Piette, D. J. N. J. Soemers, and C. Browne, "An overview of the Ludii general game system," in 2019 IEEE Conference on Games (CoG). IEEE, 2019, pp. 864–865.
[19] J. Hernández-Orallo, "Evaluation in artificial intelligence: from task-oriented to ability-oriented measurement," Artificial Intelligence Review, vol. 48, pp. 397–447, 2017.
[20] R. Wang, J. Lehman, A. Rawal, J. Zhi, Y. Li, J. Clune, and K. Stanley, "Enhanced POET: Open-ended reinforcement learning through unbounded invention of learning challenges and their solutions," in International Conference on Machine Learning. PMLR, 2020, pp. 9940–9951.
[21] O. Lockwood and M. Si, "A review of uncertainty for deep reinforcement learning," in Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, vol. 18, no. 1, 2022, pp. 155–162.
[22] M. Jiang, T. Rocktäschel, and E. Grefenstette, "General intelligence requires rethinking exploration," Royal Society Open Science, vol. 10, no. 6, p. 230539, 2023.
[23] A. Turner, L. Smith, R. Shah, A. Critch, and P. Tadepalli, "Optimal policies tend to seek power," Advances in Neural Information Processing Systems, vol. 34, pp. 23063–23074, 2021.
[24] D. Hupkes, V. Dankers, M. Mul, and E. Bruni, "Compositionality decomposed: How do neural networks generalise?" Journal of Artificial Intelligence Research, vol. 67, pp. 757–795, 2020.
[25] B. Schölkopf, "Causality for machine learning," in Probabilistic and Causal Inference: The Works of Judea Pearl, 2022, pp. 765–804.
