Games of Knightian Uncertainty as AGI Testbeds: Introduction
Abstract—Arguably, for the latter part of the late 20th and early 21st centuries, games have been seen as the drosophila of AI. Games are a set of exciting testbeds, whose solutions (in […]

[…] premise that, while a large part of the games community is interested in games for the sake of games, the almost universal shift to games as cultural artefacts is just a deviation stemming […]

[…] any global properties meaningfully and thus fail to extrapolate, something covered extensively by Chollet [4] and termed "memorising infinity" by Saba [8]. Representational learning, instead of creating useful abstractions, seems to be trying to fill a huge hashmap-like structure with as much knowledge as possible. An example of this is shown in Figure 1: while out-of-the-box regressors successfully generalise to a test set that comes from the same distribution, they fail when it comes from a different distribution. Note that symbolic regression would be successful in this case, but as we increase input dimensions, it becomes computationally intractable. The situation is bad enough that, in order to reach superhuman levels, agents have to play games orders of magnitude more than humans do, so as to fill those differentiable, infinitely-large hashmaps. As a consequence of memorising instead of learning robust abstractions, and combined with other issues such as catastrophic forgetting, agents quite often lose to adversarial examples [9].

B. The real world is non-stationary and open

Openness tends to refer to different things in Reinforcement Learning (RL) and economics; in RL, the environment is considered to be somewhat stationary [10], [11]. One is not expected to learn Go and then generalise to backgammon, but at best to slide into an adjacent game. That is, most open-ended AI frameworks assume that there is a series of fixed environments that do not differ dramatically from one another, and open-endedness tends to focus on agents exhibiting increasingly more complicated behaviour. In economics, openness comes from new events that appear unexpectedly, such that no statistical relationship with what was happening before can be established. Imposing closed-model thinking on open systems has been argued to be catastrophic [12]. Static environments, where an agent tries to identify how to operate in a single reward regime [13], [14], are the norm among AI theories, with the upward generalisation limit often set to understanding the causal structure of the world [15]. This implies that a large portion of all possible "realities" would have been traversed by an agent throughout their lifetime, and at some point an agent will select how to act based on past observations. There is an argument to be made, however, stemming from the failure to make sufficient progress in robotics and self-driving cars, that the world is not stationary but changes all the time in unexpected ways, which makes statistical reasoning unsuitable as AI's core mechanism of analysis due to way too many "unknown unknowns".

III. GENERAL GAME PLAYING

The most obvious example of using games as "AGI scaffolding" comes from general game playing competitions. This includes GVG-AI [16] and GGP [17], but also allied approaches [18] and Atari games. The idea here is that one has to gain insights beyond certain hand-coded heuristics for playing a single game (something popular in single-game tasks) and come up with algorithms that could attack any game from scratch. The problem with these competitions is that the agents involved are not general in any sense. Almost always, the agents assume the existence of an easy-to-access (either online or offline) model, which does not adequately capture settings where more general intelligence would be needed. At best, what is learnt is how to act in a wide array of similar environments; i.e., the outcome is agents adapted to a specific game, not agents adaptive to wider environments. Overall, the independent and sequential adaptation regime that this type of research promotes might be important for algorithmic development but does very little to support AGI goals. Quoting Hernández-Orallo [19], "AI is the science and engineering of making machines do tasks they have never seen and have not been prepared for beforehand"; and if we assume the agent has received a perfect model a priori, a massive chunk of the problem is effectively solved by the model. Researchers have indeed tried to soften these requirements, but with some notable exceptions [4], open settings (as for example in Wang et al. [20]) remain limited to very basic distributional shifts between what one uses during training and during testing.

IV. KNIGHTIAN UNCERTAINTY

[…] functions were learnt until this point, they are now moot. If one treats everything as aleatoric or epistemic uncertainty, the intrusion of new rules mid-game would cause havoc to an already trained agent, and would require (at best) extensive retraining to bring the epistemic uncertainty levels down to something manageable. To push things even further, imagine an agent taught how to play backgammon, but now forced to play checkers with the remaining pieces, without having seen checkers before. Although a formal definition of what constitutes a game of Knightian uncertainty does not currently exist, we propose the following:
Fig. 2: A potential benchmarking setup for Knightian uncertainty. Agents train using the original game and are expected to
generalise to variations, while after training with variations (“Near OOD”) they are asked to generalise to different games.
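The setup in Fig. 2 can be phrased as a tiny evaluation harness. Everything here is a sketch under assumed interfaces (`reset`, `step`, `act`, `train_fn`); the paper does not prescribe an API. The key property being encoded is that learning happens once, on the original game only, and the agent is then evaluated frozen on the near-OOD and far-OOD games.

```python
def evaluate(agent, env, episodes=10):
    """Mean episode return of a frozen (non-learning) agent."""
    total = 0.0
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            obs, reward, done = env.step(agent.act(obs))
            total += reward
    return total / episodes

def run_knightian_benchmark(agent, train_env, near_ood, far_ood, train_fn):
    train_fn(agent, train_env)  # learning happens here, and only here
    return {
        "train": evaluate(agent, train_env),
        "near_ood": [evaluate(agent, e) for e in near_ood],
        # No retraining and no warning before the unrelated games:
        "far_ood": [evaluate(agent, e) for e in far_ood],
    }

# Minimal stubs so the harness is runnable end to end.
class OneStepEnv:
    def __init__(self, reward):
        self.reward = reward
    def reset(self):
        return 0
    def step(self, action):
        return 0, self.reward, True  # obs, reward, done

class NullAgent:
    def act(self, obs):
        return 0

scores = run_knightian_benchmark(NullAgent(), OneStepEnv(1.0),
                                 near_ood=[OneStepEnv(0.5)],
                                 far_ood=[OneStepEnv(0.0)],
                                 train_fn=lambda agent, env: None)
```

Under this reading, a benchmark instance is fully specified by the triple of environment sets; any access to a model of the far-OOD games would defeat the purpose of the setup.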
2) Poker: Training Set: Both limit and no-limit Texas Hold'em. Near OOD: Poker variants such as Omaha and 5-Card Draw. Far OOD/Knightian: Playing Chess or Go after training in the near-OOD poker variants.

3) Mario: Training Set: Players learn the basic mechanics and controls of Mario Bros, a classic platformer. Near OOD: Variations include different levels, challenges, and power-ups, as well as fan-made levels and mods created by the community. Far OOD: Playing Pac-Man following extensive training in the near-OOD setting.

4) GVG-AI: Training Set: A small set of GVG-AI (General Video Game AI) games. Near OOD: Variations of the original training set (e.g. adding new enemies to Space Invaders, if it was included in the training set). Far OOD: A different set of GVG-AI games that have as little relationship to the near-OOD setting as possible.

V. CONCLUSION

The games community would either have to accept its irrelevance in the AGI race, or refocus on game benchmarks that matter. In tandem with other AI subcommunities, there is a tendency to ignore the harder problems precisely because they are hard, and because solving them requires new approaches that are not available at the time. However, if one is to accept that games do have a role to play in setting up meaningful benchmarks, current competition setups would have to change widely and test the limits of the generalisation capacity of game-playing agents.

REFERENCES

[1] J. Togelius, "AI researchers, video games are your friends!" in Computational Intelligence: International Joint Conference, IJCCI 2015 Lisbon, Portugal, November 12-14, 2015, Revised Selected Papers. Springer, 2017, pp. 3–18.
[2] M. R. Morris, J. Sohl-Dickstein, N. Fiedel, T. Warkentin, A. Dafoe, A. Faust, C. Farabet, and S. Legg, "Levels of AGI: Operationalizing progress on the path to AGI," arXiv preprint arXiv:2311.02462, 2023.
[3] H. Braverman, Labor and Monopoly Capital: The Degradation of Work in the Twentieth Century. NYU Press, 1974. [Online]. Available: http://www.jstor.org/stable/j.ctt9qfrkf
[4] F. Chollet, "On the measure of intelligence," 2019.
[5] C. R. Sunstein, "Knightian uncertainty," Available at SSRN 4662711, 2023.
[6] M. Mitchell, "Abstraction and analogy-making in artificial intelligence," Annals of the New York Academy of Sciences, vol. 1505, no. 1, pp. 79–101, 2021.
[7] K. Rakelly, A. Gupta, C. Florensa, and S. Levine, "Which mutual-information representation learning objectives are sufficient for control?" Advances in Neural Information Processing Systems, vol. 34, pp. 26345–26357, 2021.
[8] W. Saba, "Memorizing vs. Understanding (read: Data vs. Knowledge)," https://medium.com/ontologik/memorizing-vs-understanding-read-data-vs-knowledge-d27c5c756740 [Accessed 01-04-2024].
[9] T. T. Wang, A. Gleave, T. Tseng, K. Pelrine, N. Belrose, J. Miller, M. D. Dennis, Y. Duan, V. Pogrebniak, S. Levine et al., "Adversarial policies beat superhuman Go AIs," in International Conference on Machine Learning. PMLR, 2023, pp. 35655–35739.
[10] K. O. Stanley, J. Lehman, and L. Soros, "Open-endedness: The last grand challenge you've never heard of," O'Reilly, 2017.
[11] M. Samvelyan, A. Khan, M. D. Dennis, M. Jiang, J. Parker-Holder, J. N. Foerster, R. Raileanu, and T. Rocktäschel, "Maestro: Open-ended environment design for multi-agent reinforcement learning," in The Eleventh International Conference on Learning Representations, 2022.
[12] B. J. Loasby, "Closed models and open systems," Journal of Economic Methodology, vol. 10, no. 3, pp. 285–306, 2003.
[13] K. J. Friston and K. E. Stephan, "Free-energy and the brain," Synthese, vol. 159, pp. 417–458, 2007.
[14] M. Hutter, Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability. Springer Science & Business Media, 2005.
[15] J. Richens and T. Everitt, "Robust agents learn causal world models," arXiv, vol. abs/2402.10877, 2024. [Online]. Available: https://api.semanticscholar.org/CorpusID:267740124
[16] D. Perez-Liebana, S. Samothrakis, J. Togelius, T. Schaul, S. M. Lucas, A. Couëtoux, J. Lee, C.-U. Lim, and T. Thompson, "The 2014 general video game playing competition," IEEE Transactions on Computational Intelligence and AI in Games, vol. 8, no. 3, pp. 229–243, 2016.
[17] H. Finnsson and Y. Björnsson, "Learning simulation control in general game-playing agents," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 24, no. 1, 2010, pp. 954–959.
[18] M. Stephenson, E. Piette, D. J. N. J. Soemers, and C. Browne, "An overview of the Ludii general game system," in 2019 IEEE Conference on Games (CoG). IEEE, 2019, pp. 864–865.
[19] J. Hernández-Orallo, "Evaluation in artificial intelligence: from task-oriented to ability-oriented measurement," Artificial Intelligence Review, vol. 48, pp. 397–447, 2017.
[20] R. Wang, J. Lehman, A. Rawal, J. Zhi, Y. Li, J. Clune, and K. Stanley, "Enhanced POET: Open-ended reinforcement learning through unbounded invention of learning challenges and their solutions," in International Conference on Machine Learning. PMLR, 2020, pp. 9940–9951.
[21] O. Lockwood and M. Si, "A review of uncertainty for deep reinforcement learning," in Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, vol. 18, no. 1, 2022, pp. 155–162.
[22] M. Jiang, T. Rocktäschel, and E. Grefenstette, "General intelligence requires rethinking exploration," Royal Society Open Science, vol. 10, no. 6, p. 230539, 2023.
[23] A. Turner, L. Smith, R. Shah, A. Critch, and P. Tadepalli, "Optimal policies tend to seek power," Advances in Neural Information Processing Systems, vol. 34, pp. 23063–23074, 2021.
[24] D. Hupkes, V. Dankers, M. Mul, and E. Bruni, "Compositionality decomposed: How do neural networks generalise?" Journal of Artificial Intelligence Research, vol. 67, pp. 757–795, 2020.
[25] B. Schölkopf, "Causality for machine learning," in Probabilistic and Causal Inference: The Works of Judea Pearl, 2022, pp. 765–804.
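Section IV's point that previously learnt value functions become moot once new rules intrude mid-game can be made concrete with a toy sketch. The corridor dynamics below are hypothetical and stand in for a real game; the "learned" policy is a memorised lookup table, echoing the hashmap caricature of Section II.

```python
def play(step_rule, policy, start=0, goal=10, max_steps=50):
    """True if the policy reaches the goal under the given dynamics."""
    s = start
    for _ in range(max_steps):
        s = step_rule(s, policy(s))
        if s >= goal:
            return True
    return False

# Training rules for a toy corridor: action +1 moves toward the goal.
original_rules = lambda s, a: s + a

# The "learned" policy is a memorised lookup table over seen states.
policy_table = {s: +1 for s in range(11)}
policy = lambda s: policy_table.get(s, +1)

# Knightian intrusion: mid-benchmark the dynamics invert, unannounced.
inverted_rules = lambda s, a: s - a

before = play(original_rules, policy)  # the memorised policy succeeds
after = play(inverted_rules, policy)   # same table, new rules: it cannot
```

Nothing in the stale table signals that it is now wrong; only retraining against the new rules (or an agent that reasons beyond its stored statistics) could recover, which is precisely the failure mode the proposed benchmarks are meant to expose.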