Unlearnable Games and "Satisficing" Decisions: A Simple Model For A Complex World
We introduce the "SK-game", a discrete-time binary choice model inspired by mean-field spin-glasses. We show that even in a completely static environment, agents are unable to learn collectively-optimal strategies. This is either because the learning process gets trapped in a sub-optimal fixed point, or because learning never converges and leads to a never-ending evolution of agents' intentions. Contrary to the hope that learning might save the standard "rational expectation" framework in economics, we argue that complex situations are generically unlearnable and agents must make do with satisficing solutions, as argued long ago by Herbert Simon [1]. Only a centralized, omniscient agent endowed with enormous computing power could qualify to determine the optimal strategy of all agents. Using a mix of analytical arguments and numerical simulations, we find that (i) long memory of past rewards is beneficial to learning, whereas over-reaction to the recent past is detrimental and leads to cycles or chaos; (ii) increased competition destabilizes fixed points and leads first to chaos and, in the high-competition limit, to quasi-cycles; (iii) some amount of randomness in the learning process, perhaps paradoxically, allows the system to reach better collective decisions; (iv) non-stationary, "aging" behaviour spontaneously emerges in a large swath of the parameter space of our complex but static world. On the positive side, we find that the learning process allows cooperative systems to coordinate around satisficing solutions with rather high (but markedly sub-optimal) average reward. However, hyper-sensitivity to the game parameters makes it impossible to predict ex ante who will be better or worse off in our stylized economy.
formed social planner with colossal computing power was dictating their strategies to each agent.

Hence, our model provides an explicit example of Herbert Simon's satisficing principle [1]: with reasonable, boundedly rational methods, agents facing complex problems can only reach some satisfying and sufficing but sub-optimal solutions. In fact, in the absence of the omniscient and omnipotent social planner alluded to above, agents can hardly do better – the world in which they live is de facto unlearnable. This is the main message of our paper.

A further interesting twist of the model is that the fixed point reached by the learning process depends sensitively on the initial condition and/or the detailed structure of the interaction network. As a consequence, which agents will do better than average and which will do worse is totally unpredictable, even when the economic interactions between agents are fixed and known. Correspondingly, small changes in the interaction network (as a result of – say – small exogenous shocks, or as some agents die and others are born [15]) can lead to very large changes in individual strategies that agents would have to re-learn from scratch.

This feature actually corresponds to another possible definition of a "complex" system [17]: small perturbations can lead to disproportionately large reconfigurations of the system – think of sand (or snow) avalanches as an archetype of this type of behaviour: a single grain of sand might leave the whole slope intact or, occasionally, trigger a landslide [23] (see also, e.g., [4, 6, 22, 24–29] for early and more recent discussions of complexity ideas in the context of economics).

B. Related Ideas

Our work follows the footsteps of several important contributions on the subject of learning in economics, beyond the papers already quoted above, such as [1, 9], and in particular the complex two-player game of Galla & Farmer [14]. Another relevant work is the study of Grandmont [3] on the stability of economic equilibrium under different learning rules. Although his general conclusions are somewhat similar to ours, there are important differences, due to the fact that complexity, in our model, arises from the strong interaction between agents. The main differences are the following:

• While our agents can in some cases (i.e. when interactions are reciprocal) converge to a locally stable equilibrium, this equilibrium is rather inefficient compared to the best theoretical one, which is unattainable without the help of a central planner with super-powers. Furthermore, instead of a single equilibrium as in the Grandmont case, the equilibrium reached by our learning agents is one among an exponential number of possible equilibria, each sensitively dependent on the parameters of the model. In other words, which equilibrium is reached by the agents is totally unpredictable.

• As soon as some small amount of noise is present, Grandmont-style instabilities set in and the system starts exploring the set of possible equilibria, albeit in a slow, non-ergodic way. In other words, the stability of the visited equilibria increases with the age of the system.

• In the case where interactions are sufficiently non-reciprocal, there are no fixed points of the learning dynamics except the trivial one where agents play randomly at each turn (as in "rock-paper-scissors"). However, this is not what agents agree to do: they keep holding strong beliefs at each time step, such beliefs evolving chaotically in time – but now in an ergodic way.[30]

Another somewhat related idea can be found in the recent paper of Hirano & Stiglitz [31], where there can be a plethora of equilibrium trajectories, converging neither to a steady state nor even to a limit cycle, what they call "wobbly" macro-dynamics. We may finally mention the work of Dosi et al. [32], which shows that sophisticated learning rules can fail to improve individual and collective outcomes when interacting agents are heterogeneous. Although the context is quite different, such a finding appears to be consistent with the saturation of the average reward at sub-optimal satisficing levels that we observe when the memory loss rate of our learning agents vanishes.

C. Outline of the paper

The layout of the paper is as follows. In Sec. II, we introduce our spin-glass inspired game. We then review the model's main features and present what we believe are the most important conceptual results in the socio-economic context in Sec. III. In Sec. IV, we enter the technical analysis of the "SK-game" by discussing the existence and abundance of fixed point solutions. A similar analysis is conducted for short limit cycles when there are no fluctuations in the system in Sec. V. Having determined when these solutions may or may not exist, we take interest in the N → ∞ dynamics in Sec. VI by writing the Dynamical Mean-Field Theory of the problem. Combining the static and dynamic pictures, we focus on the fully rational (or "zero temperature") limit and in particular on the role of learning on the collective dynamics in Sec. VII. Section VIII finally considers the effect of fluctuations stemming from the bounded rationality of agents in their decision making process. In Sec. IX, we summarize our findings and discuss future perspectives as well as the relevance of our model for other complex systems such as biological neural networks.
II. A SIMPLE MODEL FOR A COMPLEX WORLD

A. Set-up of the Model

As a minimal, stylized model for decision making in a complex environment of interacting agents, we restrict ourselves to binary decisions, as in many papers on models with social interactions, see e.g. [15, 33–36]. At every timestep t, each agent i plays Si(t) = ±1, with i = 1, ..., N, which can be thought of, for example, as the decision of an investor to buy or to sell the stock market, or the decision of a firm to increase or to decrease production, etc. The incentive to play Si(t) = +1 is Qi(t), the agent's estimate of the payoff associated to Si(t) = +1 compared to that of Si(t) = −1. The actual decision of agent i is probabilistic and drawn using the classic "logit" rule [37], i.e. sampled from a Boltzmann distribution over the choices,

P[Si(t) = ±1] = e^{±βQi(t)} / (e^{βQi(t)} + e^{−βQi(t)}) = (1/2)[1 ± tanh(βQi(t))],   (1)

or, equivalently, the expected choice mi (or "intention") of agent i at time t is given by

mi(t) := ⟨Si(t)⟩ = tanh(βQi(t)).   (2)

The parameter β, assumed henceforth to be independent of i, is analogous to the inverse temperature in statistical physics and represents the agent's rationality or intensity of choice.[38] The limit β → ∞ corresponds to perfectly rational agents, in the sense that they systematically pick the choice corresponding to their preference (given by the sign of Qi(t)), while setting β = 0 gives erratic agents that randomly pick either decision with probability 1/2 regardless of the value of Qi(t). This will in fact turn out to be the case, in our model, in a whole region of parameter space: when β is smaller than some critical value βc, all intentions mi converge to zero, leading to a random string of decisions.

The evolution of the preference Qi(t) is where the learning takes place. We resort to so-called "Q-learning" [39], i.e. reinforcement learning with a memory loss parameter α. Given the (yet unspecified) reward ±Ri(t) associated to making the choice ±1 at time t, the evolution of incentives (and, in turn, beliefs) is given by

Qi(t + 1) = (1 − α)Qi(t) + αRi(t).   (3)

This map amounts to calculating an Exponentially Weighted Moving Average (EWMA) of the history of rewards Ri(t). Taking α = 0, the agent's preferences are fixed at their initial values, and we thus restrict ourselves to α > 0. When α → 0, Qi(t) is approximately given by the average reward over the last α^{-1} time steps. Note here that this averaging of past rewards is not exactly the same as the accumulation rule (where the reward would not be multiplied by α in Eq. (3)) appearing in some forms of "Experience Weighted Attraction" that are popular in the socio-economic context [40]. In fact one can always rescale the prefactor α into β, so our choice just means that Q does not artificially explode as α → 0, which would impose that β effectively diverges in that limit.

Now, the missing ingredient is the specification of the rewards, which encodes the heterogeneity and non-reciprocity of interactions. Inspired by the theory of spin-glasses, in particular by the Sherrington-Kirkpatrick (SK) model [41, 42], we set

Ri(t) = ∑_{j=1}^{N} Jij Sj(t).   (4)

Here, the matrix elements Jij specify the mutually beneficial or competitive nature of the interactions between i and j. (Note that Jij measures the impact of the decision of j on the reward of i.)

In the context of firm networks, a client-supplier relation would correspond to Jij > 0, whereas two firms i, j competing for the same clients would correspond to Jij < 0. In the so-called "Dean problem", Jij > 0 means that agents i and j get along well whereas Jij < 0 means that they are in conflict [43]. The sign of Si then determines in which of the two available rooms agent i should sit, in order to minimize the number of possible conflicts. A predator-prey situation is when Jij × Jji < 0, meaning that if i makes a gain, j makes a loss and vice versa.

Note that the reward Ri(t) depends on the actual (realized) decisions of other players, and not on their expected decisions or intentions. In other words, agents resort to online learning, which differs from offline learning where other players' decisions Si(t) are averaged over large batch sizes during which their inclinations would be assumed constant and replaced by their expectation mi(t).

Based on the learning dynamics, agents thus make a decision based on an imperfectly learned approximation of what other players are likely to do. Indeed, using Eq. (4) to express Qi(t) as the time-weighted sum (EWMA) of past rewards, plugging in Eq. (3), and finally replacing in Eq. (2), one finds that the evolution of agent i's intention writes

mi(t + 1) = tanh(β ∑_{j=1}^{N} Jij m̃^α_j(t)),   (5)

where m̃^α_j(t) can be interpreted as the estimate of agent j's expected decision at time t + 1 based on its past actions up to time t,

m̃^α_i(t) = α ∑_{t′≤t} (1 − α)^{t−t′} Si(t′).   (6)

Expressed in this form, it is clear that there is a characteristic timescale τα ∼ 1/α over which past choices contribute to the moving average m̃^α_i(t). Note that offline learning would correspond to a different evolution equation,
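To make this set-up concrete, the dynamics of Eqs. (1), (3) and (4) can be simulated in a few lines. The sketch below is ours, not a reference implementation: the decomposition of Jij into symmetric and antisymmetric Gaussian parts controlled by the asymmetry parameter ε used later in the text, the 1/N variance of the entries, and all parameter values are illustrative assumptions.

```python
import numpy as np

def make_couplings(N, eps, rng):
    """Assumed construction of the couplings: a symmetric Gaussian part
    weighted by (1 - eps/2) plus an antisymmetric part weighted by eps/2,
    entries of variance 1/N. eps = 0: fully reciprocal; eps = 2: zero-sum."""
    A = rng.normal(size=(N, N)) / np.sqrt(N)
    J = (1 - eps / 2) * (A + A.T) / np.sqrt(2) + (eps / 2) * (A - A.T) / np.sqrt(2)
    np.fill_diagonal(J, 0.0)
    return J

def run_sk_game(N=64, eps=0.0, alpha=0.1, beta=50.0, T=2000, seed=0):
    """Simulate the SK-game: logit decisions (Eq. (1)), rewards
    R_i = sum_j J_ij S_j (Eq. (4)), Q-learning update (Eq. (3))."""
    rng = np.random.default_rng(seed)
    J = make_couplings(N, eps, rng)
    Q = np.zeros(N)
    S = rng.choice([-1, 1], size=N)
    for _ in range(T):
        p_plus = 0.5 * (1 + np.tanh(beta * Q))   # Eq. (1)
        S = np.where(rng.random(N) < p_plus, 1, -1)
        R = J @ S                                 # Eq. (4)
        Q = (1 - alpha) * Q + alpha * R           # Eq. (3)
    return J, S, R

J, S, R = run_sk_game()
print("average reward per agent:", np.mean(S * R))
```

With long memory (small α) and nearly rational agents (large β), the reciprocal case settles onto a configuration where the average reward S·R/N is positive, as discussed in Sec. III.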
FIG. 1. Qualitative phase diagram in the (ε, α) plane for the SK-game in the noiseless (left) and weak noise (right) regimes. FP and (Q) refer to "fixed point" and "quasi" respectively. "Random" refers to the case where agents pick ±1 with equal probability (mi(t) = 0 ∀t), whereas "chaos" means that at a given instant of time players have well defined intentions mi but these evolve chaotically with time.

FIG. 2. Evolution of the average reward with (a) the memory loss rate α, for ε = {0, 0.6, 0.85, 1.05, 1.5, 2} from dark purple to light green, β → ∞; (b) the asymmetry ε for α = 0.01 and α = 0.1, β → ∞, the vertical dotted line indicating the value εc above which the system becomes chaotic [61]; (c) the noise level 1/β for α = 0.01, ε = {0, 0.6} (dark purple and blue respectively), note the non-monotonic behaviour; (d) system size N for α = 0.01, β → ∞, ε = {0, 0.6} (dark purple and blue respectively), continuous lines showing fits RN = R∞ − A N^{-2/3}, excluding N = 32. The dotted line indicates the best fit for the SK ground state, for which A ≈ 1.5 [62]. In (a), (b) and (c) N = 256 and the dashed line represents the Parisi solution for R∞.

agent i to play +1 or −1) is bound to be extremely difficult. Defining the average reward per agent as

RN := (1/N) ∑_i Si Ri = (1/N) ∑_{i,j=1}^{N} Si Jij Sj,   (12)

and noting that ∑_{i,j=1}^{N} Si Jij Sj = (1 − ε/2) ∑_{i,j=1}^{N} Si J^S_ij Sj, the largest possible average reward R∞ for N → ∞ can be exactly computed using the celebrated Parisi solution of the classical SK model and reads [42]:

lim_{N→∞} RN := R∞ = 0.7631... × (2 − ε).   (13)

However, in practice this optimal value can only be reached using algorithms that need a time exponential in N.[58] Hence, it is expected that simple learning algorithms will inevitably fail to find the true optimal solution. Nevertheless, we also know from the spin-glass folklore [42] (see below for more precise statements) that many configurations of {Si} correspond to quasi-optima or, in the language of H. Simon, satisficing solutions [1] (see also [21]). It is in a sense the proliferation of such sub-optimal solutions that prevents simple algorithms from finding the optimum optimorum. Furthermore, if learning indeed converges (which is not the case when ε is too large, i.e. when interactions are not reciprocal enough), the obtained fixed point depends heavily on the initial condition and/or the specific interaction matrix J.

A. Phase Diagram in the Noiseless Limit

Let us first consider the case where agents always choose the action that would have had the best average reward in the past, Qi(t) (assuming that other agents still played what they played). This corresponds to the noiseless learning limit β → ∞. In this case, the iteration map Eq. (5) becomes

Si(t + 1) = sign(∑_{t′≤t} (1 − α)^{t−t′} ∑_{j=1}^{N} Jij Sj(t′)),   (14)

and the model is fully specified by two parameters: α (controlling the memory time scale of the agents) and ε (controlling the reciprocity of interactions). For N not too large, the evolution of Eq. (14) leads to either fixed points, or oscillations, or else chaos. The schematic phase diagram in the (α, ε) plane is shown in Fig. 1.

One clearly sees a region for (α, ε) small where learning reaches a fixed point, in which the average reward is close to, but significantly below, the theoretical optimum R∞ given by Eq. (13), see Fig. 2.[59] Note that learning definitely helps: for ε = 0, most fixed points are characterized by a typical reward R ≈ 1.01 [60], significantly worse than the value ≈ 1.40 reached by our learning agents, extrapolated to N → ∞.

As ε and α are varied one observes the following features:

• When ε is not too large (interactions sufficiently reciprocal) and α increases (shorter and shorter memory), learning progressively ceases to converge
to a fixed point, and agents fail to coordinate on a mutually beneficial equilibrium. A similar effect was observed in dynamical models of supply chains, where over-reaction leads to oscillating prices and production – the so-called bullwhip effect [63] (see also [22, 64]).

• Conversely, when α is small (long memory) and ε increases, the probability to reach a fixed point progressively decreases, and when a fixed point is reached, the average reward is reduced – see Fig. 2(b). Beyond some threshold value, the dynamics becomes completely chaotic, leading to a further loss of reward. Note that "chaos" here means that although agents have well defined intentions mi(t) ≠ 0 at any given instant of time, these intentions evolve chaotically forever.[65]

• Surprisingly, oscillations reappear when ε becomes larger than unity, i.e. when interactions are mostly anti-symmetric, "predator-prey" like. Perhaps reminiscent of the famous Lotka-Volterra model, agents' decisions and payoffs become periodic, with a period that scales anomalously as α^{-1/2}, i.e. much shorter than the natural memory time scale α^{-1}. Although not the Nash equilibrium mi ≡ 0, this oscillating state allows the average reward to be positive, even when at each instant of time some agents have negative rewards.

• Only in the extreme competition limit ε = 2 (corresponding to a zero-sum game, Eq. (13)) and for small α are agents able to learn that the unique Nash equilibrium is to play random strategies mi ≡ 0 (see Fig. 1, bottom right region).[66]

• Finally, in the extreme (and unrealistic) case α = 1, where agents choose their strategy based only on the last reward, the system evolves, as ε increases from zero, from high-frequency oscillations with period L = 2 to "weak chaos", to "strong chaos" when ε ≈ 1, and finally back to oscillations of period L = 4 when ε → 2.

FIG. 3. Evolution of the steady-state two-point correlation function for α = 1, β → ∞; markers indicate simulations of the game at N = 256 averaged over 128 samples of disorder and initial conditions, error bars show 95% confidence intervals, and continuous lines represent the solution of the DMFT equation (see Eq. (30) below). (a) ε = 0.1 (ε < εc ≈ 0.8), cycles of length L = 2. (b) ε = 0.85 (εc < ε < 1), "weakly" chaotic behavior. (c) ε = 1.05 (1 < ε < 2 − εc), "strongly" chaotic behavior. (d) ε = 1.5 (ε > 2 − εc), cycles of length L = 4.

The autocorrelation functions corresponding to the different cases described above are plotted in Fig. 3. Note that the signature of oscillations of period L is that C(nL) ≡ 1 for all integer n. However, note that when ε < εc, not all spins flip at each time step. The fact that C(2n + 1) = 0 means that half of the spins in fact remain fixed in time, while the other half oscillate in sync between +Si and −Si.[68] In the chaotic phases, C(τ) tends to zero for large τ, with either underdamped or overdamped oscillations. Hence in these cases the configuration {Si} evolves indefinitely with time, and hardly ever revisits the same states.
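The noiseless map Eq. (14), the average reward Eq. (12), and a two-point correlation diagnostic in the spirit of Fig. 3 can be put together in a short numerical sketch. This is our own illustration: the coupling construction with asymmetry ε and the definition C(τ) = (1/N)⟨S(t)·S(t+τ)⟩ are assumptions, as are all parameter values. At small α and ε = 0 the dynamics settles on a fixed point, while at α = 1 the map reduces to parallel sign dynamics and C(2n) = 1 signals fixed points or period-2 oscillations.

```python
import numpy as np

def make_couplings(N, eps, rng):
    # Assumed construction: symmetric + antisymmetric Gaussian parts, variance 1/N.
    A = rng.normal(size=(N, N)) / np.sqrt(N)
    J = (1 - eps / 2) * (A + A.T) / np.sqrt(2) + (eps / 2) * (A - A.T) / np.sqrt(2)
    np.fill_diagonal(J, 0.0)
    return J

def run_noiseless(J, alpha, T, rng):
    # Eq. (14): each agent plays the sign of the EWMA of its past rewards,
    # implemented through the Q-recursion of Eq. (3).
    N = J.shape[0]
    S = rng.choice([-1, 1], size=N)
    Q = alpha * (J @ S)
    traj = []
    for _ in range(T):
        S = np.where(Q >= 0, 1, -1)
        Q = (1 - alpha) * Q + alpha * (J @ S)
        traj.append(S.copy())
    return np.array(traj)

def average_reward(J, S):
    # Eq. (12): R_N = (1/N) sum_{ij} S_i J_ij S_j.
    return S @ J @ S / len(S)

def correlation(traj, tau_max):
    # Assumed steady-state definition: C(tau) = (1/N) <S(t) . S(t+tau)>_t.
    T, N = traj.shape
    return np.array([np.mean(np.sum(traj[:T - tau] * traj[tau:], axis=1)) / N
                     for tau in range(tau_max + 1)])

rng = np.random.default_rng(0)
J = make_couplings(N=64, eps=0.0, rng=rng)

# Long memory, reciprocal interactions: convergence to a fixed point.
traj = run_noiseless(J, alpha=0.05, T=3000, rng=rng)
print("reward at small alpha:", average_reward(J, traj[-1]))

# alpha = 1: memory of the last reward only, S(t+1) = sign(J S(t)).
traj1 = run_noiseless(J, alpha=1.0, T=700, rng=rng)
C = correlation(traj1[500:], tau_max=4)
print("C(0..4):", np.round(C, 3))
```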
when ε → 2. B. Finite N , Large t vs. Finite t, Large N
point correlation C(τ) is not equal to one for all τ (which would be the case for a fixed point) but reaches a positive plateau value for large τ: C(τ → ∞) = C∞ > 0. Only when ε = 0 does one find C∞ = 1. The same holds for quasi-cycles of length L if one considers the correlation function computed for τ = nL, with n an integer: C(nL → ∞) = C∞ > 0, with C∞ = 1 when ε = 2.

The schematic phase diagram drawn in Fig. 1 then continues to hold in the large N, finite t limit, provided one interprets "fixed points/cycles" as "quasi-fixed points/cycles" in the sense defined above. More details on the subtle role of α, N and t will be provided in the next, technical sections.

FIG. 4. Distribution of individual rewards at β → ∞ for N = 512 and 128 initial conditions and realizations of the disorder in the fully symmetric case ε = 0, for α = {0.5, 0.1, 0.01} (dark to light coloring). The dashed line is the Sommers-Dupont analytical solution of the SK model [69]. Inset: associated survival function in lin-log scale, focusing on the right tail; the dotted line corresponds to a Gaussian fit.

C. Noisy Learning & "Aging"
In the presence of noise, the "convictions" |mi| of agents naturally decrease, and in fact become zero (i.e. decisions are totally random) beyond a critical noise level that depends on the asymmetry parameter ε: more asymmetry leads to more fragile convictions (see Fig. 16 below for a more precise description).

When the noise is weak but non-zero, strict fixed points do not exist anymore, but are replaced (for ε small enough) by quasi-fixed points – in the sense that the intentions mi fluctuate around some plateau value for very long times, before evolving to another configuration completely uncorrelated with the previous one. This process goes on forever, albeit at a rate that slows down with time: plateaus become more and more permanent. This is called "aging" in the context of glassy systems.

In a socio-economic context, it means that a form of quasi-equilibrium is temporarily reached by the learning process, but such a quasi-equilibrium will be completely disrupted after some time, even in the absence of any exogenous shocks. This is very similar to the quasi-nonergodic scenario recently proposed in Ref. [15], although in our case the evolution time is not constant but increases with the "age" of the system, i.e. the amount of time the game has been running.

Perhaps counter-intuitively, however, the role of noise is on average beneficial when 1/β is not too large. Indeed, as shown in Fig. 2, the average reward first increases as a small amount of noise is introduced, before reaching a maximum beyond which "irrationality" becomes detrimental. The intuition is that without noise, the system gets trapped by fixed points with large basins of attraction, but lower average rewards. A small amount of noise allows agents to reassess their intentions and collectively reach more favorable quasi-fixed points, much as with simulated annealing or stochastic gradient descent.

When learning leads to a chaotic evolution, i.e. when Jij and Jji are close to uncorrelated (ε ∼ 1), noise in the learning process does not radically change the evolution of the system: deterministic chaos just becomes noisy chaos. However, there is still a distinction between a low-noise phase where, at each instant of time, agents have non-zero expected decisions mi (that will evolve over time) and a high-noise phase where agents always make random choices between ±1 with probability 1/2 (see Fig. 1).

Finally, in the case where learning leads to cycles, any amount of noise irremediably disrupts the synchronisation process, and cycles are replaced by pseudo-cycles, with either underdamped or overdamped characteristics. In the limit ε → 2, fluctuations drive the system to a paramagnetic state where q = C(0) = 0 (see Fig. 16 below), meaning the agents remain undecided.

D. Individual Rewards

As we have noted above, the average reward is close to, but significantly below, the theoretical optimum R∞ given by Eq. (13). However, some agents are better off than others, in the sense that the individual reward ei at the fixed point (when fixed points exist) differs from agent to agent. Noting that

ei := S⋆i R⋆i = |∑_j Jij S⋆j|,   (16)

where the second equality holds because at the fixed point one must have S⋆i = sign(R⋆i), it is clear that in the fully reciprocal case ε = 0 all rewards ei are positive. The distribution ρ(e) of these rewards over agents is expected to be self-averaging for large N, i.e. independent of the specific realisation of the Jij and of the initial condition. Such a distribution is shown in Fig. 4. One notices that ρ(e) vanishes linearly when e → 0,

ρ(e) ≈ κe (e → 0), κ ≈ 1.6,
as for the standard SK model, although the value of κ is distinctly different from the one obtained for the true optimal states of the SK model, for which κSK ≈ 0.6 [69, 70]. Such a discrepancy is expected, since the fixed points are obtained as the long-time limit of the learning process – in particular, since κ > κSK, the number of poorly rewarded agents is too high compared to what it would be in the truly optimal state. Note that once α is sufficiently small for the system to reach a fixed point, its precise value does not seem to have an impact on the distribution of rewards and κ.

Another important remark is that the distribution of rewards ρ(e) does not develop a "gap" for small e, i.e. a region where ρ(e) is exactly zero. In other words, although all agents have positive rewards, some of them are very small. This is associated with the so-called "marginal stability" of the equilibrium state [71], to wit, its fragility with respect to small perturbations, as discussed in more detail in the next subsection.

For very large e, the distribution ρ(e) decreases like a Gaussian (Fig. 4 inset), corresponding to Central Limit Theorem behaviour in that regime, as for the SK model.[72] Fig. 5 shows how the rewards of individual agents evolve from an initially random configuration, before settling to constant (but heterogeneous) values at the fixed point.

As competitive effects get stronger (i.e. as ε increases) and the system ceases to reach a fixed point, the distribution ρ(e) develops a tail for negative values of e, meaning that some agents make negative gains, see Fig. 6. In the extreme "predator-prey" limit ε = 2, the distribution ρ(e) becomes perfectly symmetric around e = 0, as expected – see Fig. 6, rightmost plot. However, note that there is no persistence in time of the winners: the individual reward autocorrelation function eventually decays to zero, possibly with oscillations in the competitive region.

FIG. 5. Evolution of individual rewards in time for N = 256, α = 0.01, β → ∞, (a) ε = 0 and (b) ε = 0.6. Right: histogram of the individual rewards after a single timestep (shaded) and at the final time (unshaded).

Now, the interesting point about our model is best seen through the cross-overlap

C×N := (1/N) ∑_i (e^a_i − ⟨e^a⟩)(e^b_i − ⟨e^b⟩), a ≠ b,   (17)

where a, b correspond to two different initial conditions and ⟨e⟩ denotes the cross-sectional average reward. As shown in Fig. 7, C×N goes to zero at large N, indicating that the final outcome of the game, in terms of the winners and the losers, cannot be predicted. The dependence on N appears to be non-trivial, with different exponents governing the decay of the mean overlap C×N (decaying as N^{-0.85}) and its standard deviation (decaying as N^{-2/3}).

A similar effect would be observed if, instead of changing the initial condition, one were to randomly change the interaction matrix J by a tiny amount ϵ. The statement here is that for any small ϵ, C×N goes to zero for sufficiently large N. This is called "disorder chaos" in the context of spin-glasses [73]; by analogy with known results for the SK model, we conjecture that C×N is a decreasing function of N ϵ^ζ, where ζ is believed to be equal to 3 in the SK case [74]. This means that when N ≫ ϵ^{-ζ}, the rewards in two systems with nearly the same interaction structure, starting from the same initial conditions, will be close to independent.

Such a sensitive dependence of the whole equilibrium state of the system (in our case the full knowledge of the intentions mi of all agents) prevents any kind of "common knowledge" assumption about what other agents will decide to do in a specific environment. No reasonable learning process can lead to a predictable outcome; even a benevolent social planner assigning their optimal strategy to all agents would not be able to do so without a perfect knowledge of all interactions between agents and without exponentially powerful (in N) computing abilities. Such a "radically complex" situation leads to "radical uncertainty" in the sense that the behaviour of agents, even rational, cannot be predicted. Learning agents can only achieve satisficing solutions, which are furthermore hypersensitive to details. As we have seen in Sec. III C, any amount of noise in the learning process will make the whole system "jump" from one satisficing solution to another in the course of time.

F. Increasing Cooperativity

A way to help agents coordinate is to use rewards given by Eq. (11) with J0 > 0, representing a non-zero average
FIG. 6. Distribution of individual rewards for N = 256 and α = {1, 0.5, 0.01} (black triangles, purple squares and green circles respectively), β → ∞, measured over 32 initial conditions and realizations of the disorder. From left to right: ε = {0, 0.1, 0.85, 1.05, 1.5, 2}.
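These distributional features can be probed with a short β → ∞ simulation. The sketch below is ours (the coupling construction and all parameters are assumptions): at a fixed point of the reciprocal game ε = 0, Eq. (16) gives e_i = |∑_j Jij S⋆_j| ≥ 0, while in the zero-sum limit ε = 2 the total reward ∑_i e_i vanishes identically, so the cross-sectional mean is zero.

```python
import numpy as np

def sk_game_rewards(N, eps, alpha, T, seed):
    """Noiseless (beta -> infinity) learning run; returns individual
    rewards e_i = S_i R_i at the final configuration (cf. Eq. (16)).
    Coupling construction (symmetric/antisymmetric Gaussian parts,
    variance 1/N) is an assumption."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(N, N)) / np.sqrt(N)
    J = (1 - eps / 2) * (A + A.T) / np.sqrt(2) + (eps / 2) * (A - A.T) / np.sqrt(2)
    np.fill_diagonal(J, 0.0)
    S = rng.choice([-1, 1], size=N)
    Q = alpha * (J @ S)
    for _ in range(T):
        S = np.where(Q >= 0, 1, -1)
        Q = (1 - alpha) * Q + alpha * (J @ S)
    return S * (J @ S)

# Fully reciprocal case: at a fixed point all individual rewards are positive.
e0 = sk_game_rewards(N=128, eps=0.0, alpha=0.05, T=4000, seed=3)
print("eps = 0 : min e_i =", round(float(e0.min()), 3),
      ", mean e_i =", round(float(e0.mean()), 3))

# Zero-sum limit eps = 2: J is antisymmetric, so sum_i e_i = S.J.S = 0 exactly.
e2 = sk_game_rewards(N=128, eps=2.0, alpha=0.05, T=4000, seed=3)
print("eps = 2 : mean e_i =", round(float(e2.mean()), 3))
```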
FIG. 7. Cross-overlap C×N for N = {128, 256, 512} (caption not preserved).
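The cross-overlap of Eq. (17), shown in Fig. 7, can be estimated by running the noiseless game twice on the same couplings from two different initial conditions. This is a sketch of ours: the coupling construction, parameters and seeds are assumptions, not values from the text.

```python
import numpy as np

def fixed_point_rewards(J, seed, alpha=0.05, T=4000):
    """beta -> infinity learning run; returns individual rewards e_i = S_i R_i."""
    rng = np.random.default_rng(seed)
    N = J.shape[0]
    S = rng.choice([-1, 1], size=N)
    Q = alpha * (J @ S)
    for _ in range(T):
        S = np.where(Q >= 0, 1, -1)
        Q = (1 - alpha) * Q + alpha * (J @ S)
    return S * (J @ S)

def cross_overlap(e_a, e_b):
    """Eq. (17): C_N^x = (1/N) sum_i (e_i^a - <e^a>)(e_i^b - <e^b>)."""
    return np.mean((e_a - e_a.mean()) * (e_b - e_b.mean()))

rng = np.random.default_rng(11)
N = 128
A = rng.normal(size=(N, N)) / np.sqrt(N)
J = (A + A.T) / np.sqrt(2)        # same couplings for both runs, eps = 0
np.fill_diagonal(J, 0.0)

e_a = fixed_point_rewards(J, seed=1)   # two runs differing only in the
e_b = fixed_point_rewards(J, seed=2)   # random initial condition
print("cross-overlap:", float(cross_overlap(e_a, e_b)),
      " reward variance:", float(e_a.var()))
```

Note that cross_overlap(e, e) reduces to the variance of the rewards, so the ratio C×N / var(e) measures how much of the "who wins, who loses" structure is reproducible across runs.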
namics, we have observed that the fraction of trajectories converging to seemingly dynamically inaccessible configurations significantly increases, especially when α ≪ 1. While further work is required to precisely assess the effectiveness of learning with self-reinforcement (particularly as finite size effects appear to play a significant role), such a result is consistent with the overall influence of learning reported here.

H. Core Messages

In line with the conclusions of Galla & Farmer [14], our multi-agent binary decision model provides an explicit counter-example to the idea that learning could save the rational expectation framework (cf. Sec. I and Ref. [9]). Learning, in general, does not converge to any fixed point, even when the environment (in our case the interaction matrix J) is completely static: non-stationarity is self-induced by the complexity of the game that agents are trying to learn, as also recently argued in [13].

When learning does indeed converge (which requires, at a minimum, a high level of reciprocity between agents), the collective state reached by the system is far from the optimal state, which only a benevolent, omniscient social planner with formidable powers can achieve. In other words, even more sophisticated learning rules would not really improve the outcome: the SK-game is unlearnable and – as argued by H. Simon [1] – agents must resort to suboptimal, satisficing solutions.

Furthermore, any small random perturbation (noise in the learning process, or slow evolution in the environment) eventually destabilises any fixed point reached by the learning process, and completely reshuffles the collective state of the system: in the long run, agents initially favoring the +1 decision end up favoring −1, and better-off agents end up being the underdogs, and vice-versa (much as in the simpler model of Ref. [15]).

Finally, even in the most favourable case of a fully reciprocal game with slow learning, the average reward is in fact improved when some level of noise (or irrationality) is introduced in the learning rule, before degrading again for large noise.

IV. FIXED POINT ANALYSIS AND COMPLEXITY

We have seen that our model displays a wide variety of dynamical behaviours. We will mostly focus, in the following, on the long memory case α ≪ 1, which is most relevant for thinking about learning in a (semi-)realistic context. In this case, one can show that the exponential moving average of the realized values Si(t) converges to that of the expected values mi(t). Indeed, as detailed in Appendix A,

⟨(α ∑_{t′≤t} (1 − α)^{t−t′} (mi(t′) − Si(t′)))²⟩ ≤ α/(2 − α) → 0 as α → 0.   (18)

This means that up to fluctuations of order √α, we can describe the dynamics of the system through a deterministic iteration on mi(t), in fact corresponding to offline learning. (We will see below that the neglected fluctuations are of order √α/β.)

Further making the ansatz that the mean-field dynamics will eventually reach a fixed point mi(t) = m⋆i ∀i given sufficient time, Eq. (5) then yields

m⋆i = tanh(β ∑_j Jij m⋆j).   (19)

This equation is known in the spin-glass literature as the Naive Mean-Field Equations (NMFE) when ε = 0, and defines a so-called static Quantal Response Equilibrium, similar to its fully mean-field equivalent (Jij = J/N) studied in [56].

To the reader familiar with the physics of disordered systems, this equation is immediately reminiscent of the celebrated Thouless-Anderson-Palmer (TAP) equation [77] describing the mean magnetization in the Sherrington-Kirkpatrick (SK) spin-glass [41]. Physically, the NMFE is satisfied when minimizing the free energy of a system of N sites comprising M → ∞ binary spins, with sites interacting through an SK-like Hamiltonian [78]. Despite being seemingly simpler than its previously mentioned TAP counterpart, which includes an additional Onsager "reaction term", the NMFE shares many of its properties. Relevant to our problem, both the NMFE and the TAP equations have a paramagnetic phase (m⋆i = 0 ∀i) for β < βc, while above this critical value there is a spin-glass phase where q⋆ = N^{-1} ∑_i (m⋆i)² > 0 and solutions are exponentially abundant in N [78–80]. The NMFE has a critical temperature 1/βc = 2, as opposed to 1/βc = 1 in the TAP case, while the two equations become strictly equivalent in the β → ∞ limit.

Using known properties from the spin-glass literature, we can therefore already establish that if the system
of complex collective dynamics. Only in some cases does reaches a fixed point when interactions are fully recip-
learning converge to non-trivial fixed points where strate- rocal (ε = 0) and memory is long ranged, it will be either
gies are probabilistic but with time independent proba- a trivial fixed point where agents continue making ran-
bility p± to play ±1, such that p± = (1±m⋆i )/2 for agent i. dom decisions for ever (m⋆i = 0 ∀i), or, when learning
Such a steady-state would be analogous to an economic is not too noisy (β > βc ) the number of fixed points is
equilibrium (although it is essential to dissociate this no- ∼ exp[Σ(β)N ], where Σ(β) is called the “complexity”.
tion from that of a thermodynamic equilibrium, which In this second case, the fixed point actually reached by
may only exist in the case of fully reciprocal interactions, learning depends sensitively on the initial conditions and
ε = 0). the interaction matrix J.
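The fixed-point condition (19) is easy to probe numerically. The sketch below is our own illustration, not the model's learning dynamics: we assume symmetric Gaussian couplings of variance 1/N and add a damping factor `gamma` for stability, since a plain parallel update of the NMFE can fall onto a period-2 cycle.

```python
import numpy as np

rng = np.random.default_rng(0)
N, beta, gamma = 100, 4.0, 0.1   # beta well above beta_c = 1/2

# Fully reciprocal (eps = 0) SK-like couplings: symmetric, variance 1/N
X = rng.normal(0.0, 1.0 / np.sqrt(N), (N, N))
J = (X + X.T) / np.sqrt(2)
np.fill_diagonal(J, 0.0)

# Damped iteration of the NMFE m = tanh(beta * J m)
m = rng.uniform(-1, 1, N)
for _ in range(20000):
    m = (1 - gamma) * m + gamma * np.tanh(beta * J @ m)

residual = np.max(np.abs(m - np.tanh(beta * J @ m)))  # should be small
q = np.mean(m**2)                                     # > 0: glassy solution
print(residual, q)
```

For β below β_c the same iteration collapses onto the paramagnetic solution m = 0, whereas above β_c it settles onto one of the many non-trivial solutions, with the one selected depending on the initial condition, consistent with the discussion above.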
How is this standard picture altered when interactions are no longer reciprocal? In such cases, the system cannot be described using the equilibrium statistical mechanics machinery.

A. Critical Noise Level

In order to extend the notion of critical noise β_c to ε > 0, one can naively look at the linear stability of the paramagnetic solution m⋆_i = 0 ∀i to Eq. (19). Just as in the TAP case [42], expanding the hyperbolic tangent to second order and projecting the vector of m_i on an eigenvector of J, the stability condition can be expressed through the largest eigenvalue of the interaction matrix. Adapting known results from random matrix theory to our specific problem formulation, the spectrum of J can be expressed as an interpolation between a Wigner semi-circle on the real axis (ε = 0), the Ginibre ensemble (ε = 1) and a Wigner semi-circle on the imaginary axis (ε = 2) [81]. The resulting critical "temperature" is then given by

T_c(ε) = 1/β_c(ε) = (2 − ε)² / (2√(υ(ε))),   (20)

recovering the known result 1/β_c = 2 for the case ε = 0. (We recall that we have set the interaction variance σ² to unity throughout the paper. If needed, σ can be reinstated by the rescaling β → βσ.)

B. The Elusive Complexity

To determine if there is still an exponential number of fixed points to reach below the candidate critical noise level, i.e. if there is a spin-glass phase for β > β_c(ε) when ε > 0, we should study the complexity, defined for a single realization of the disorder as

Σ(β, ε) = lim_{N→∞} (1/N) log N_J(N, β, ε),   (21)

where N_J is the number of fixed points in the system for a given interaction matrix. There are then two ways to compute an average of this quantity over the disorder: the "quenched" complexity, where the mean of the logarithm of the number of solutions is considered, and its "annealed" counterpart, where the logarithm is taken of the mean number of solutions. The former is usually considered to be more representative, as unlikely samples leading to an abnormally large number of solutions can be observed to dominate the latter (see e.g. [21, 82] for recent examples), but it requires a more involved calculation with the use of the so-called "replica trick" [42]. In the TAP case, quenched and annealed complexities coincide for solutions above a certain free-energy threshold [60] (where most solutions lie, but importantly not the ground state).

As a matter of fact, even in the annealed case, the computation of the TAP complexity has proved to be a formidable task, and has sparked a large amount of controversy, as the original solution computed by Bray and Moore (BM) [60] was put into question before being (partially) salvaged by the metastability of TAP states in the thermodynamic limit [83]. For a relatively up-to-date summary of the situation, we refer the reader to G. Parisi's contribution in [84].

While the BM approach can be adapted to the NMFE [79, 85], several aspects of the calculation remain unclear, particularly as the absence of a sub-dominant reaction term means that the argument of the metastability of states in the N → ∞ limit is no longer valid a priori, although numerical results support the marginally stable nature of NMFE fixed points in the thermodynamic limit [79]. We leave its extension to ε > 0 to a later dedicated work.

Nevertheless, the previously introduced critical β_c and the existing computation of the number of fixed points as a function of ε in the β → ∞ limit [61, 86] can be used to conjecture the boundaries of the region in (β, ε) space where the complexity Σ is non-vanishing. Indeed, in the zero temperature case, it has been shown [61] that the annealed complexity can be expressed as a function of the asymmetry parameter η defined in Eq. (10) as

Σ(η) = −(η/2) x² + log 2 + log Φ(ηx),   (22)

with Φ the Gaussian cumulative density and x the solution to

x Φ(ηx) = Φ′(ηx),   Φ(x) := (1/2) erfc(−x/√2).   (23)

The main insight provided by this result is that the complexity vanishes at η = 0, corresponding to ε = 1, where the paramagnetic fixed point is supposed to be unstable since β_c(ε = 1) = √2. As the complexity is a decreasing function of temperature, this therefore means that ε = 1 is an upper limit for the existence of fixed points when β is finite. This conjecture is also consistent with the breakdown of fixed point solutions to the dynamical mean field theory below the critical noise level that will be discussed in Sec. VIII A, as well as with the saddle point equations obtained when adapting the BM calculation to ε > 0.

Combining these two somewhat heuristic delimitations for the existence of a large number of non-trivial fixed points, we obtain the critical lines shown in Fig. 9(a). Overlaying these borders with the annealed complexity measured numerically, we find a very good agreement. In particular, the vanishing of the complexity at ε = 1 appears to be consistent for T < T_c(ε = 1) = 1/√2, as shown in Fig. 9(b). The agreement with the β → ∞ analytical result, represented by the continuous line, also appears to validate our counting method at low temperatures. Note that one can in fact show that for ε > 1 and N = ∞, the only fixed point (or Nash equilibrium) is the "rock-paper-scissors" equilibrium m⋆_i = 0, ∀i.
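Equations (22)-(23) are straightforward to evaluate; the minimal sketch below (the function and variable names are ours) recovers the Tanaka-Edwards value Σ ≈ 0.1992 quoted in the caption of Fig. 9 for the fully symmetric case η = 1:

```python
import math

def sigma(eta, tol=1e-12):
    """Annealed fixed-point complexity at beta -> infinity, Eqs. (22)-(23)."""
    Phi = lambda x: 0.5 * math.erfc(-x / math.sqrt(2))             # Gaussian CDF
    phi = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)  # its derivative
    x = 0.5
    for _ in range(200):                  # solve x * Phi(eta x) = Phi'(eta x)
        x_new = phi(eta * x) / Phi(eta * x)
        if abs(x_new - x) < tol:
            break
        x = x_new
    return -0.5 * eta * x * x + math.log(2.0) + math.log(Phi(eta * x))

print(round(sigma(1.0), 4))   # fully symmetric case: 0.1992
print(round(sigma(1e-9), 4))  # eta -> 0 (eps = 1): complexity vanishes
```

Evaluated on a grid of η, the same routine reproduces the β → ∞ curve of Fig. 9(b), once η is mapped back to ε via Eq. (10).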
FIG. 9. (a) Annealed complexity of the Naive Mean-Field Equation in (T, ε) space, where we recall T = β⁻¹, measured numerically for N = 40. The dashed line represents the critical temperature T_c(ε) for which the paramagnetic fixed point ceases to be stable, while the continuous line indicates ε = 1, inferred from the β → ∞ result. (b) Annealed complexity as a function of ε for varying temperatures T < 1/√2, i.e. in the bottom region of (a), where the complexity vanishes at ε = 1. The continuous line represents the β → ∞ analytical solution, recovering the result of Tanaka and Edwards [87], Σ ≈ 0.1992 for ε = 0. For ε > 1 and T = 0, the only possible fixed point is m⋆_i = 0, ∀i.

V. COUNTING LIMIT CYCLES

In the previous section, we have established the region of parameter space where exponentially numerous fixed points exist, which might possibly be reached by learning in the slow limit α ≪ 1. However, limit cycles of various lengths turn out to also be exponentially numerous when ε < 1, so we need to discuss them as well before understanding the long term fate of the learning process within our stylized complex world.

A. Cycles without Memory (α = 1)

In the memory-less limit, the dynamics becomes that of the extensively studied Hopfield model [86, 88, 89], where the binary variable represents the activation of a neuron evolving as

S_i(t + 1) = sign(∑_j J_ij S_j(t)),   (24)

with parallel updates. Counting limit cycles of length L is even more difficult than counting fixed points (which formally correspond to L = 1). Some progress has been reported by Hwang et al. [61] in the memory-less case α = 1. The notion of fixed point complexity Σ (defined in Eq. (21)) can be extended to the limit cycle complexity Σ_L for limit cycles of length L, with Σ_{L=1} ≡ Σ. The results of Hwang et al. [61] can be summarized as follows:

• When ε < 1, limit cycles with L = 2 have the largest complexity, which is exactly twice the fixed point complexity: Σ₂ = 2Σ₁ (as was in fact previously shown by Gutfreund et al. [86]).

• The complexities Σ_L(ε) all go to zero when ε = 1.

• When 1 < ε ≤ 2, limit cycles with L = 4 dominate, with Σ₄(ε) ≥ Σ₂(2 − ε).

• Close to ε = 1, the cut-off length L_c, beyond which limit cycles become exponentially rare, grows exponentially with N: L_c ∼ e^{aN}, where a weakly depends on ε.

From this analysis, one may surmise that:

a. When a limit cycle is reached by the dynamics, it is overwhelmingly likely to be of length L = 2 for ε < 1 and of length L = 4 for 1 < ε ≤ 2.

b. Even if exponentially less numerous, exponentially long cycles will dominate when e^{aN} > e^{NΣ₂}, which occurs when ε_c < ε < 2 − ε_c, with ε_c ≈ 0.8.

These predictions are well obeyed by our numerical data, see Figs. 3, 11. Note however the strong finite N effects that show up in the latter figure, which we discuss in the next sections.

B. Cycles with Memory (α < 1)

When α < 1 and β = ∞, the update of S_i(t) is given by Eq. (14); the dynamics has the same fixed points independently of α, but of course different limit cycles, which may in fact cease to exist when α is small. In this section, we attempt to enumerate the number of cycles of length L in the spirit of the calculation of Hwang et al. [61] for α < 1. As detailed in Appendix B, we write the number of these cycles as a sum over all possible trajectories of a product of δ functions ensuring that the α < 1 dynamics of the Q_i are satisfied between two consecutive time-steps, while a product of Heaviside step functions enforces S_i(t) = sign(Q_i(t)). Introducing the integral representation of the δ function, averaging over the disorder and making appropriate changes of variable to decouple the N dimensions, the (annealed) complexity of cycles of length L writes

Σ_L(α, η) = saddle_{R̂,K̂,V̂} { ∑_{s<t} iR̂(t, s) iK̂(t, s) − (η/2) ∑_{t,s} V̂(t, s)V̂(s, t) + log I_L },   (25)

where R̂(t, s) and K̂(t, s) are symmetric matrices while V̂(t, s) is not a priori, and I_L is explicitly given in the
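The dominance of short cycles for reciprocal couplings is easy to observe directly: for ε = 0 the parallel dynamics (24) is in fact guaranteed to settle onto a fixed point or a 2-cycle. A minimal sketch (our own illustration, with symmetric Gaussian couplings of variance 1/N):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50
X = rng.normal(0.0, 1.0 / np.sqrt(N), (N, N))
J = (X + X.T) / np.sqrt(2)        # fully reciprocal case, eps = 0
np.fill_diagonal(J, 0.0)

S = rng.choice([-1.0, 1.0], N)
history = [S.copy()]
for _ in range(5000):             # parallel (synchronous) updates, Eq. (24)
    S = np.where(J @ S >= 0, 1.0, -1.0)
    history.append(S.copy())

# For symmetric J, synchronous dynamics ends on a cycle of length <= 2;
# L = 2 is the typical outcome, in line with the counting argument above
is_fixed = np.array_equal(history[-1], history[-2])
is_two_cycle = np.array_equal(history[-1], history[-3])
print(is_fixed or is_two_cycle)
```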
FIG. 10. (a) L = 2 cycle complexity from the numerical resolution of the saddle point equations for ε = {0, 0.2, 0.4, 0.6} from dark purple to light green, dashed lines indicating the fixed point complexity associated to each parameter. (b) Order parameter at the L = 2 saddle, showing the nontrivial coalescence of the cycle and fixed point saddle points as α is decreased. The numerical resolution appears to break down when we get close to α = 0.

FIG. 11. Steady-state two-point correlation between configurations shifted by τ = 2 time-steps in the α = 1, β → ∞ limit from finite N numerical simulations averaged over 128 samples of disorder and initial conditions, error bars showing 95% confidence intervals. The N = {64, 128} simulations are run for t = 10⁸ time-steps to illustrate taking the t → ∞ limit before N → ∞, whereas N = {256, 512, 1024} have been simulated for t = 5 × 10⁶ time-steps to recover the N → ∞ before t → ∞ regime. The continuous line represents the N → ∞ DMFT solution integrated numerically, while the vertical dotted line corresponds to the critical value ε_c found by Hwang et al. [61].

Appendix, and where we can notably identify −iR̂(t, s) = C(|t − s|). As a sanity check, one can verify that the L = 1 case, corresponding to the fixed point complexity, is indeed independent of α and is given by the same expression as Eq. (22), see Appendix B 1. In a similar vein, one can recover Σ₂(α = 1) = 2Σ₁(η).

Numerically solving the saddle point equations for decreasing values of α, it appears that Σ₂(α, η) → Σ(η) when α → 0⁺, see Fig. 10. While numerical difficulties prevent us from exploring very small values of α, it seems clear that the saddle point corresponding to the L = 2 cycles eventually coalesces with the fixed point saddle (which is known to be a sub-dominant saddle point when α = 1, see [61] and the discussion above). In any case, and perhaps surprisingly, there does not appear to be a critical value of α below which fixed points become more abundant than cycles. We therefore expect a progressive crossover and not a sharp transition.

VI. DYNAMICAL MEAN-FIELD THEORY

We have thus seen that both fixed points and limit cycles are exponentially numerous. However, the question remains as to what happens dynamically, as the existence of a large number of fixed points or limit cycles by no means guarantees that these will be reached at long times.

In fact, the number of agents N is expected to play a major role in determining the long term fate of the system. In particular, there are strong indications that the time τ_r needed to reach a fixed point or a limit cycle grows itself exponentially with N, at least when α = 1 [89]. More precisely,

τ_r ∼ N^s e^{N B(ε)},   (26)

where s is an exponent (possibly dependent on ε) and B(ε) an effective barrier such that B(ε = 0) = 0. Hence, one expects that as N grows, fixed points/limit cycles will in fact never be reached, even if they are numerous. This is in fact what happens numerically, see Fig. 11.

What is then to be expected in the limit N → ∞? In order to study the complicated learning dynamics that takes place, we will resort to Dynamical Mean-Field Theory (DMFT). In a nutshell, DMFT allows deterministic or stochastic dynamics, in discrete or continuous time, of a large number N of interacting degrees of freedom to be rewritten as a one-dimensional stochastic process with self-consistent conditions. While difficult to solve both analytically and numerically due to their self-consistent nature, DMFT equations have proved very effective at describing a very wide range of complex systems – see [90] for a recent review. Note however that such an approach is only valid when N → ∞; as will be clear later, strong finite size effects can appear and change the conclusions obtained using DMFT.

In our case, we write the DMFT for the evolution of the incentives Q_i(t), which directly yield m_i(t) = tanh(βQ_i(t)). In order to do so, we rewrite our online learning process, which depends on the realized S_i(t), as an expression solely in terms of m_i(t) with additional fluctuations,

∑_j J_ij S_j(t) = ∑_j J_ij m_j(t) + η_i(t),   η_i(t) = ∑_j J_ij ξ_j(t),   (27)

with ξ_i(t) = S_i(t) − m_i(t), and hence ⟨ξ_i(t)⟩ = 0 and ⟨ξ_i(t)ξ_i(s)⟩ = (1 − (m_i(t))²)δ_{t,s}. Now, assuming the central limit theorem holds, the random variables η_i become Gaussian for large N with

⟨η_i(t)⟩ = 0,   ⟨η_i(t)η_j(s)⟩ = υ(ε)(1 − q(t))δ_{t,s} δ_{i,j},   (28)
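The variance in Eq. (28) is easy to verify by direct sampling. The sketch below is our own check; for simplicity we take i.i.d. Gaussian entries of variance υ/N with υ = 1/2 (the ε = 1 value quoted later in the text), rather than the paper's exact interpolation in ε:

```python
import numpy as np

rng = np.random.default_rng(2)
N, upsilon, n_samples = 1000, 0.5, 300

J = rng.normal(0.0, np.sqrt(upsilon / N), (N, N))   # i.i.d. entries, ~ eps = 1
m = rng.uniform(-1, 1, N)                           # frozen intentions m_j
q = np.mean(m**2)

# eta_i = sum_j J_ij (S_j - m_j), with S_j = +/-1 drawn with mean m_j, Eq. (27)
etas = []
for _ in range(n_samples):
    S = np.where(rng.random(N) < (1 + m) / 2, 1.0, -1.0)
    etas.append(J @ (S - m))
var_emp = np.var(np.asarray(etas))

print(var_emp, upsilon * (1 - q))   # the two should agree, as in Eq. (28)
```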
where q(t) = C(t, t) as defined in Eq. (15). As required, in the noiseless limit β → ∞, one has q(t) = 1 ∀t and the random variables η_i are identically zero.

Starting from the N equations

Q_i(t + 1) = (1 − α)Q_i(t) + α ∑_j J_ij m_j(t) + αη_i(t) + αh_i(t),   (29)

where h_i(t) is an arbitrary external field that will eventually be set to 0, the DMFT can be derived using path integral techniques or the cavity method, the latter being detailed in Appendix C. Remaining in discrete time to explore the entire range of values of α, one finds, in the N → ∞ limit,

Q(t + 1) = (1 − α)Q(t) + α²(1 − ε) ∑_{s<t} G(t, s)m(s) + αϕ(t) + αh(t),   (30)

with ⟨ϕ(t)⟩ = 0, and

⟨ϕ(t)ϕ(s)⟩ = υ(ε)[C(t, s) + (1 − q(t))δ_{t,s}].   (31)

The memory kernel G and correlation function C are then to be determined self-consistently,

G(t, s) = ⟨∂m(t)/∂h(s)|_{h=0}⟩,   C(t, s) = ⟨m(t)m(s)⟩,   (32)

where the averages ⟨. . .⟩ are over the realisations of the random variable ϕ. These discrete time dynamics can first be integrated numerically, with an iterative scheme updating both the memory kernel and the correlation function until convergence, see [91] or [92]. As detailed in the original work of Eissfeller and Opper [93, 94], one can also make use of Novikov's theorem to compute the response function directly from correlations, avoiding the unpleasant task of taking finite differences on noisy trajectories, at the cost of inverting the correlation matrix. Note that this inversion will however mean that very long trajectories become difficult to integrate.

While we shall see that this numerical resolution can provide precious intuition to understand the role of finite N in the dynamics, a continuous description will be much more convenient to obtain analytical insights. In the α ≪ 1, t ≫ 1 regime, we can rescale the time as t → t/α. Interestingly, doing so requires expanding Q(t + 1) to second order if one is to keep an explicit dependence on α. The resulting continuous dynamics reads

(α/2) Q̈(t) + Q̇(t) = −Q(t) + (1 − ε) ∫₀ᵗ ds G(t, s)m(s) + ϕ(t) + h(t),   (33)

with

⟨ϕ(t)ϕ(s)⟩ = υ(ε)[C(t, s) + α(1 − q(t))δ(t − s)],   (34)

and the memory kernel and correlation function similarly defined self-consistently,

G(t, s) = ⟨δm(t)/δh(s)|_{h=0}⟩,   C(t, s) = ⟨m(t)m(s)⟩,   (35)

with, we recall, q(t) = C(t, t) = ⟨m²(t)⟩. Very importantly, note that the rescaling in time introduces a prefactor α in the variance of ϕ, which stems from the noise in the learning process. Since 1 − q(t) ∼ β⁻² for large β, this extra term is of order α/β², as anticipated above.

In the next sections, the DMFT equations will be used to shed light on the dynamical behaviour of the model in the limit N → ∞.

VII. NOISELESS LEARNING

In this section, we use both the DMFT equations and the results on the complexity of fixed points and limit cycles to classify the different dynamical behaviours of the learning process in the noiseless case β → ∞, where the realized and expected decisions are equal, m_i(t) = S_i(t) = sign(Q_i(t)).

A. The Memory-less Limit α = 1

In this case, corresponding to Eq. (24), both approaches (DMFT and complexity of limit cycles) seem to agree on the overall picture: as ε increases from 0 to 1, the system transitions from L = 2 cycles to overdamped oscillations and chaos – see Fig. 3. However, upon scrutiny, one realizes that the perfect agreement between DMFT and direct numerical simulations of the dynamics for finite N is only valid in a region where ε is small and N is large enough – see Fig. 11. In particular, when 0.5 ≲ ε ≲ 0.8, L = 2 cycles do persist when N is smaller than ∼ 200. For larger N, the lag 2 autocorrelation function C(τ = 2) is noticeably smaller than unity, and well predicted by DMFT as soon as N ≳ 1000.

What happens for ε ≲ 0.5 when N → ∞? The numerical solution of the DMFT equations suggests the following scenario: when ε < ε_RM ≈ 0.473, the long time value m∞ of the correlation with the initial conditions C(0, 2n) at even time steps is strictly positive, hence the subscript "RM" for Remnant Magnetisation.[95] It is only exactly equal to one for ε = 0 (permanent oscillations) and decreases to reach zero when ε → ε_RM [94]. Below ε_RM, we can conclude that the system is not ergodic, which will have important implications for the finite temperature dynamics. For asymmetries greater than ε_RM, on the other hand, the decorrelation becomes exponential and we enter a bona fide chaotic, ergodic regime.

Although memory-less learning is clearly unrealistic, these results are rather instructive. The system is indeed unable to display aggregate coordination when interactions are mutually independent (chaotic region around ε = 1). Placing ourselves in the socio-economic setting, it seems evident that the number of players vastly exceeds the number of iterations, and the results above indicate that the chaotic region is in fact quite large. Perhaps more importantly, in the absence of learning, agents will see their decisions vary at a high frequency without ever
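As an illustration of the iterative scheme mentioned above, the discrete DMFT (30)-(32) collapses to a particularly simple self-consistent problem at ε = 1 and β → ∞, since the (1 − ε) memory term drops out and q(t) = 1. The sketch below is our own simplification (the full scheme of Refs. [91-94] also tracks the response function G): it alternates between sampling Gaussian noise paths ϕ with covariance υC and re-measuring C from the resulting single-site trajectories.

```python
import numpy as np

rng = np.random.default_rng(3)
T, n_paths, alpha, upsilon = 40, 4000, 0.5, 0.5   # upsilon(eps = 1) = 1/2

C = np.eye(T)                        # initial guess for C(t, s)
for _ in range(30):                  # self-consistent loop
    # Gaussian noise paths with covariance upsilon * C (Eq. (31) with q = 1)
    L = np.linalg.cholesky(upsilon * C + 1e-10 * np.eye(T))
    phi = rng.normal(size=(n_paths, T)) @ L.T
    # effective single-site process, Eq. (30) at eps = 1
    Q = np.zeros((n_paths, T))
    Q[:, 0] = rng.normal(size=n_paths)
    for t in range(T - 1):
        Q[:, t + 1] = (1 - alpha) * Q[:, t] + alpha * phi[:, t]
    m = np.where(Q >= 0, 1.0, -1.0)  # beta -> infinity: m = sign(Q)
    C = 0.5 * C + 0.5 * (m.T @ m) / n_paths   # damped update of C(t, s)

C_tau = np.mean(np.diag(C, k=10))    # steady-state correlation at lag 10
print(C_tau)                          # well below 1: decorrelation at eps = 1
```

The damped update of C is a standard trick to stabilize the self-consistent loop; the measured C(τ) decays with the lag, in line with the chaotic behaviour at ε = 1 discussed below.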
FIG. 12. Convergence to fixed points with decreasing α, for different values of ε < 0.8, β → ∞ and finite N. (a) Steady-state two-point correlation function between successive configurations from finite N numerical simulations averaged over 200 samples of disorder and initial conditions, error-bars showing 95% confidence intervals. (b), (c) and (d) Sample trajectories of 32 randomly chosen sites among N = 256 for ε = 0.4, t0 = 10⁵/α, for α = {1.0, 0.88, 0.7} respectively.

FIG. 13. Influence of finite size N, non-reciprocity ε and memory span α on the learning dynamics. (a) Steady-state two-point correlation function shifted by τ = 2/α in the β → ∞ limit for different memory loss rates α, from light green to black (color map on the right axis). Symbols correspond to direct simulations and plain lines to the solutions of the DMFT equations, while the dashed line is the solution to the DMFT equations for α → 0. The N = 128 simulations are initialized with t0 = 10⁸ time-steps, whereas N = {256, 512} have been simulated for t0 = 10⁶ iterations before taking measurements. Results are averaged over 32 samples, with error-bars showing 95% confidence intervals. (b) and (c) Sample trajectories for all N = 128 sites for ε = 1 and α = {1.0, 0.1} respectively, clearly reaching chaotic and fixed-point steady states.

reaching any static steady-state, not only when the game is close to zero sum (ε → 2), but even when it is fully reciprocal (ε → 0). Clearly, this last point underlines the importance of introducing memory to recover realistic learning dynamics.

B. Memory Helps Convergence to Fixed Points

For N not too large, we observe numerically that the fraction of "frozen" agents, for which S_i(t + 1) = S_i(t), quickly tends to 1 as α decreases from 1, as shown in Fig. 12. This is somewhat consistent with intuition, as the learning dynamics averages rewards over a period τ_α ∼ 1/α, meaning that the high frequency cycles observed for α = 1 are expected to be "washed out" when α is sufficiently small. Since fixed points exist in large numbers, it appears natural that they are eventually reached, given their abundance at zero temperature. However, as we have shown in the previous section, L = 2 limit cycles are still much more numerous than fixed points for α ≳ 0.5. The fact that C(τ = 1) approaches unity as α is reduced much faster than in Fig. 10(b) suggests that the basin of attraction of fixed points quickly expands, at the expense of L = 2 limit cycles.[96]

Our numerical results therefore indicate that for any finite size system which has enough time to reach a steady state, the effect of α is effectively to help the system find fixed points – see Fig. 13. Focusing for example on the points corresponding to N = 128 and α = 0.1, we indeed observe that the 95% quantile includes C(2/α) = 1 even for ε = 1, i.e. fixed points can be reached even in the chaotic regime with modest simulation times, which would be an overwhelmingly improbable scenario in the memory-less case, as illustrated by Figs. 13 (b) and (c).

As the number of agents N increases, we enter the DMFT regime shown as plain lines in Fig. 13. One finds that decreasing the value of α slows down the decorrelation of the system. However, for small α, the evolution becomes a function of ατ only, as suggested by Eq. (33) when α → 0: the dynamical slowdown is dominated by the long memory of learning itself.

Fig. 13 shows that when ε ≳ 0.5, sufficiently large systems (described by DMFT) decorrelate with time for all α, and we expect C(τ → ∞) → 0: learning leads to chaos in such cases.

When ε ≲ 0.5, on the other hand, we found that there is ergodicity breaking, in the sense that C(τ → ∞) > 0, as we found above for cycles when α = 1. More precisely, a numerical analysis of the DMFT equations suggests that when α → 0 and ε is small, 1 − C(τ → ∞) is extremely small but non zero. For example, when ε = 0.4 we found 1 − C(τ → ∞) ≈ 0.005. This is compatible with the numerical results of [94].

In other words, there seems to exist a critical value ε_RM(α) separating the ergodic, chaotic phase for ε > ε_RM(α) from the non-ergodic, quasi fixed-point behaviour for ε < ε_RM(α). However, our numerical results are not precise enough to ascertain the dependence of ε_RM on α, which seems to hover around the value 0.473 found for α = 1. More work on this specific point would be needed to understand such a weak dependence on the memory length.

The precise dynamical behavior of the autocorrelation function C(τ) can be ascertained in the continuous limit α → 0 when ε = 1. Indeed, the influence of the memory kernel vanishes in this case where interactions are exactly
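The stabilizing effect of memory is simple to reproduce in a direct simulation of the noiseless learning dynamics. The sketch below is our own illustration: we assume the parametrization J = [(2 − ε)J^S + εJ^A]/2, with J^S symmetric and J^A antisymmetric Gaussian, which reproduces υ(0) = 1 and υ(1) = 1/2, though the paper's exact convention may differ. It compares the fraction of frozen agents at α = 1 and α = 0.2:

```python
import numpy as np

def frozen_fraction(alpha, eps=0.4, N=64, t_relax=200_000, seed=4):
    """Simulate Q(t+1) = (1-alpha) Q(t) + alpha J S(t), S = sign(Q), beta -> inf."""
    rng = np.random.default_rng(seed)
    X = rng.normal(0.0, 1.0 / np.sqrt(N), (N, N))
    Y = rng.normal(0.0, 1.0 / np.sqrt(N), (N, N))
    JS, JA = (X + X.T) / np.sqrt(2), (Y - Y.T) / np.sqrt(2)
    J = ((2 - eps) * JS + eps * JA) / 2          # assumed interpolation in eps

    Q = rng.normal(size=N)
    for _ in range(t_relax):                     # relax towards a steady state
        Q = (1 - alpha) * Q + alpha * (J @ np.sign(Q))
    # fraction of agents with S_i(t+1) = S_i(t), averaged over 100 steps
    frozen = []
    for _ in range(100):
        S = np.sign(Q)
        Q = (1 - alpha) * Q + alpha * (J @ S)
        frozen.append(np.mean(np.sign(Q) == S))
    return float(np.mean(frozen))

f_fast, f_slow = frozen_fraction(1.0), frozen_fraction(0.2)
print(f_fast, f_slow)   # smaller alpha typically freezes more agents
```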
FIG. 14. Evolution of the time shifted autocorrelation function in the non-symmetric case ε = 1, β → ∞, for different system sizes and memory loss parameters, averaged over 20 realizations. Left: lin-lin scale, errorbars showing 95% confidence intervals, continuous lines representing the numerically integrated full DMFT equations (Eq. (36)). Right: lin-log scale and rescaling of the time shift by α such that points collapse onto a single curve (errorbars not shown), black continuous line representing the analytical solution found by solving the Sompolinsky & Crisanti ODE Eq. (38), dashed line representing a pure exponential decay with characteristic time τ₁.

non-symmetric, leaving us with

Q̇(t) = −Q(t) + ϕ(t),   (α → 0)   (36)

where we emphasize that the time variable has been rescaled as t → αt. From there, the classical solution method proposed by Crisanti & Sompolinsky [97, 98] can be straightforwardly adapted, with a small modification due to our parametrization of the interaction matrix, which scales the variance of the entries by a factor 1/2 for ε = 1, see Appendix D. The two-point autocorrelation function is found to be given by

C(τ) = (2/π) sin⁻¹(∆(τ)/∆(0)),   (37)

where ∆(τ) = ⟨Q(t + τ)Q(t)⟩ follows the second-order ordinary differential equation

∆̈(τ) = ∆(τ) − (1/2) C(τ),   (38)

with ∆(0) = 1 − 2/π [99]. Linearizing Eq. (37) for small ∆, i.e. C(τ) ≃ 2∆(τ)/(π∆(0)) with ∆(0) = (π − 2)/π, Eq. (38) reduces to ∆̈ ≃ [(π − 3)/(π − 2)]∆. Its decaying solution means that the autocorrelation very quickly decays exponentially, C(τ) ∝ e^{−τ/τ₁} with

τ₁ = √((π − 2)/(π − 3)) ≈ 2.84.   (39)

Both the full solution, obtained by integrating the ODE numerically, and this exponential decay are shown in Fig. 14, displaying a very satisfactory match with numerical simulations.

C. Anomalous Stretching of Cycles

Having gained a better understanding of how memory allows the system to find fixed points when they exist, a central question is what will happen if the memory loss rate is reduced in the region of parameter space where there are only limit cycles. As previously stated, averaging over a period τ_α ∼ 1/α clearly suggests that the occurrence of short cycles (starting at L = 4 for α = 1) should gradually vanish.

Naively, one might expect a simple rescaling in time t → t/α, yielding cycles – when they exist – of period inversely proportional to α itself. Looking at the numerical results from both the finite size game and the DMFT integrated numerically in Fig. 15 (a), it quickly appears that such a simple rescaling in time does not provide the correct description. Indeed, the period of cycles is observed to be proportional to 1/√α, i.e. much shorter than 1/α – see Figs. 15 (b) and (c).

One important aspect to note is that there is some decorrelation, as the second peak of C(τ) does not quite reach unity (in Fig. 15 (b)), meaning that we may see quasi-cycles and not exact limit cycles, complicating the analytical description of the phenomenon. Just as true fixed points in the N → ∞ limit only exist for ε = 0, it appears that only the case ε = 2 displays true limit cycles.

Another subtle point is that, similarly to the ε < 1 cases discussed above, we expect the time taken to reach these cycles to depend on the system size and the relative distance to the chaotic region. This is confirmed by the DMFT solved for fixed trajectory times for ε = 1.5 (light crosses), which progressively departs from the ω₀ ∼ √α regime around α = 0.1.

To understand how such non-trivial stretching occurs, we go back to the continuous DMFT equation,

(α/2) Q̈(t) = −Q̇(t) − Q(t) + (1 − ε) ∫₀ᵗ ds G(t, s)m(s) + ϕ(t) + h(t).

While the presence of the second order derivative Q̈(t) appears natural to recover limit cycles, it should be noted that this term, being pre-factored by α, is superficially subdominant relative to the dissipation represented by Q̇(t). While we have seen that there is some decorrelation, the fact that robust oscillations are present therefore suggests that the complicated self-consistent forcing terms almost exactly compensate dissipation over a period, allowing the system to periodically revisit quasi-identical configurations. In fact, the shape of these oscillations is far from sinusoidal, but rather of see-saw type, see Fig. 15 (b). This suggests that in the limit α → 0, Q̈(t) diverges each time Ċ(τ) changes sign, such that (α/2)Q̈(t) cannot be neglected and therefore sets the relevant time scale to α^{−1/2}. We have however not been able to perform a more precise singular perturbation analysis of this phenomenon.

Going beyond this rather loose argument and precisely characterizing such see-saw patterns appears very challenging, and is left for future work. A possible approach would be to first take the ε = 2 case, where true cycles
FIG. 15. Evolution of the oscillation frequency ω₀ with the memory loss rate α for β → ∞, ε > 2 − ε_c, averaged over 96 samples of disorder and initial conditions, errorbars showing 95% confidence intervals. (a) Log-log plot highlighting the near square root dependency ω₀ ∼ √α, the dashed line corresponding to an exponent 1/2. (b) Two-point autocorrelation for ε = 1.8, N = 256 as a function of the rescaled time lag for different values of α, confirming the suitability of the scaling, in particular at short time scales. (c) Power spectrum of the autocorrelation for the same parameters, displaying the maximum at ω₀ and secondary peaks at odd multiples of this fundamental frequency, as expected from the triangular aspect of the autocorrelation.

should exist, and to assume the correlation function is an exact triangular wave of frequency ω as suggested by Fig. 15 (c). As a result, m(s) = sign(Q(s)) is an exact square wave, and the convolution with G can be written as a product in Fourier space. Enforcing the dissipation

FIG. 16. Heat map of q = C(0) in (T = 1/β, ε) space from numerical simulations for N = 256, t0 = 10⁶, averaged over 32 samples of disorder and initial conditions, with (a), (b) and (c) corresponding to α = {0.5, 0.1, 0.01} respectively. The white dashed line represents the critical temperature T_c(ε) at which the paramagnetic fixed point ceases to be stable (Sec. IV).

Taking for instance α = 1, it is indeed clear that the iteration

m_i(t + 1) = tanh(β ∑_j J_ij S_j(t))

will have extremely large fluctuations in the argument on the right hand side. As a result, we expect to lose the sharp transition as a function of T that can be observed for the NMFE (see Fig. 16). The order parameter q instead continuously tends to 0 with T, regardless of the asymmetry ε. This regime is shown in Fig. 16 (a), representing the heat map of q for α = 0.5. Clearly, the linear stability analysis of the paramagnetic fixed point presented in Sec. IV cannot hold when the thermal fluctuations are not averaged over large periods of time. To find a richer phenomenology, we will therefore focus on the α ≪ 1 regime where more complex dynamics can be observed.

A. Fixed Points for Intentions

In Sec. IV, we studied the fixed points of the NMFE
over a period to be zero, one could then perhaps find a that the game may reach if the fluctuations from imper-
closed equation for Q and ω if appropriate ansätze for fect learning can be neglected, i.e. if α → 0. Now, we have
the response and forcing functions are taken. seen that the DMFT equations proved effective in the
zero “temperature” limit, and importantly established
that true fixed points or two-cycles are only reached in
VIII. NOISY LEARNING finite time in the fully reciprocal case ε = 0 when the sys-
tem size diverges (see Fig. 11). The same equations can
also be used to revisit this finite β regime. √
While we have shown that the β → ∞ deterministic
Going back to Eq. (33) and neglecting the term in α
limit can be relatively well understood with the analyt-
from the correlation function as we did in the static setup,
ical tools at our disposal, one of the key features of our
fixed point solutions require
model is the uncertainty in the decision occurring for √
boundedly rational agents. Besides, it is also in this situa- Q = (1 − ε)mχ + J qυ(ε)z, (40)
tion that the online learning dynamics differ significantly
from the more widely studied offline learning where the where z is now a static white noise of unit variance, q
entire model can be understood in terms of deterministic is simply the now constant autocorrelation and χ is the
mixed strategies parameterized by the coefficients mi (t) integrated response function that we assume to be time-
(compare Eqs. (5) and (7)). translation invariant,
When α is close to unity and β becomes small, the fluc- ∞
tuations are too large for coordination to occur. Taking χ=∫ dτ G(τ ). (41)
0
The averages on the effective process can now be taken on z to self-consistently solve for q and χ (see e.g. [55] for a more detailed description). The resulting set of equations is then

q = ⟨m²(z)⟩_z,    (42)

to be solved simultaneously with

χ = ⟨ β(1 − m²(z)) / (1 − β(1 − ε)χ(1 − m²(z))) ⟩_z,    (43)

where m(z) is the solution to

m(z) = tanh(β(1 − ε)χ m(z) + β √(q υ(ε)) z).    (44)

Although our model is entirely built on a dynamical evolution equation, and not on a notion of thermal equilibrium, this set of self-consistent equations coincides with the replica-symmetric solution of the NMFE model found by Bray, Sompolinsky and Yu [78] for ε = 0. Since replica symmetry is broken in the whole low temperature phase of the NMFE model, we expect that these static solutions of the DMFT cannot correctly describe the long time limit of the dynamics, as we now show.

The numerical solutions for the DMFT fixed point equations are shown in Fig. 17 (a) and compared to numerical results of the game for small α and for ε = 0.1 (similar results, not shown, are obtained for other values of ε ≲ 0.8) [100]. We find that the long time behaviour of q for the direct simulation of the SK-game (circles and squares) and for the long time dynamical solution of the DMFT equations match very well, but differ from the value of q inferred from the set of self-consistent equations established above. This is expected, since with such a solution the order parameter q approaches unity exponentially fast as T → 0, whereas the fact that the probability of small local fields (i.e. rewards in the game analogy) vanishes linearly (see Fig. 4) suggests that q = 1 − κT², as for the full RSB solution of [78], but with presumably a different value of κ.

To ascertain the range over which this non-trivial mean-field solution should be valid, we can study the stability of the DMFT fixed point close to the critical temperature 1/βc, following the procedure first detailed in [54]. Considering a random perturbation to the fixed point ϵξ(t), with ξ(t) a Gaussian white noise and ϵ ≪ 1, we study the perturbed solution

Q(t) = Q0 + ϵQ1(t),    (45)

with Q0 the fixed point given in Eq. (40), where the noise is no longer static but similarly given by ϕ(t) = √(q υ(ε)) z + ϵϕ1(t). Replacing in the DMFT continuous dynamics for α → 0⁺ and collecting terms of order ϵ, we find that the perturbation evolves as

Q̇1(t) = −Q1(t) + β(1 − ε)(1 − m²(z)) ∫_0^t ds G(t, s) Q1(s) + ϕ1(t) + ξ(t),    (46)

where we have used sech²(βQ0) = 1 − m²(z) from Eq. (44), giving in Fourier space

Q̂1(ω) = (ϕ̂1(ω) + ξ̂(ω)) / (iω + 1 − β(1 − ε)(1 − m²(z)) Ĝ(ω)),    (47)

where we have again assumed that the memory kernel is time-translation invariant.

Now, in the limit βQ1 ≪ 1, i.e. close to the critical temperature, one can linearize the hyperbolic tangent tanh(βQ1(t)) and write a closed equation for the spectral density of Q1 at order β²Q1²,

1/⟨|Q̂1(ω)|²⟩ = |iω + 1 − β(1 − ε)Ĝ(ω)|² − β²υ(ε).    (48)

As a result, we have the criterion for the onset of instability at ω = 0:

β(1 − ε)χ = 1 − β √(υ(ε)),    (49)

where we noticed Ĝ(ω = 0) = χ, given, close to βc, by

χ = (1/(2β(1 − ε))) (1 − √(1 − 4β²(1 − ε))).    (50)

Taking ε = 0, we recover the criterion found by Bray et al. [78] for the critical temperature, giving Tc = 1/βc = 2 in their case. For non-zero ε, we can also find the critical temperature by replacing Eq. (50) in (49), to recover yet again the critical value given by Eq. (20).

The invalidity of the fixed point solution is illustrated in Fig. 17 (b), where the spectral density evaluated at ω = 0 can be observed to become negative for T < Tc(ε). While the linearization used to obtain the relation is expected to become invalid, we understand this negativity as a strong sign that the solution is unstable, and thus likely invalid throughout the low “temperature” phase, consistently with the discrepancy observed between the static prediction and both the direct simulations and the dynamic solution of the DMFT shown in Fig. 17 (a). In this finite β regime, we therefore effectively only have what we previously referred to as quasi-fixed points at best (including when ε = 0), even when neglecting the vanishing fluctuations caused by the online learning dynamics as α → 0.

B. Aging

The inadequacy of the static solution of the DMFT equations to describe the long term dynamics of the system is a well known symptom associated with the “aging” phenomenon [101], i.e. the fact that equilibrium is never reached and all correlation functions depend on the “age” of the system [102]:

C(tw, tw + t) = C_relax(t) + C_aging(t, tw),    (51)
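The two-time correlation entering Eq. (51) can be estimated directly from simulated trajectories. The following is a minimal sketch; the estimator and the toy trajectories are ours, purely for illustration (in the SK-game, the trajectory would come from the learning dynamics itself):

```python
import numpy as np

def two_time_corr(m, tw, t):
    """Estimate C(tw, tw + t) = (1/N) sum_i m_i(tw) m_i(tw + t) from one
    trajectory m of shape (T, N); in practice, also average over realizations."""
    return float(np.mean(m[tw] * m[tw + t]))

rng = np.random.default_rng(0)
# A frozen configuration never decorrelates ...
frozen = np.tile(np.sign(rng.normal(size=64)), (1000, 1))
# ... while an uncorrelated sequence loses memory immediately.
noisy = np.sign(rng.normal(size=(1000, 64)))
c_frozen = two_time_corr(frozen, 100, 500)
c_noisy = two_time_corr(noisy, 100, 500)
```

Aging manifests itself when such estimates, plotted against t, depend systematically on the waiting time tw, as in Fig. 18.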
The aging part often takes the scaling form

C_aging(t, tw) = C(t/tw),    (52)

corresponding to the aging behaviour found in a wide range of glassy models [102, 106–111]. Interestingly, this is not the behavior observed in the “physical” SK model [112, 113], which is typically found to display “sub-aging”, although the precise scaling of the aging correlation function remains unclear. It is however important to emphasize that the learning dynamics of the SK-game is markedly different from the physical dynamics of the SK model.

FIG. 18. Aging behavior of the system for N = 256, averages performed over 576 realizations. Color indicates the value of αtw, see scale on the far right. (a) and (b): Aging two-point correlation functions with the initial power law decay removed to isolate the aging component, plotted as a function of t/tw; inset showing the entire correlation function as a function of αt, dashed line representing the power law fit A t^{−x} of the first relaxation. (a) α = 0.5, β = 4, ε = 0; (b) α = 0.2, β = 2, ε = 0.25. (c) and (d): Partially aging two-point correlation functions plotted as a function of αt. (c) α = 0.5, β = 4, ε = 0.5; (d) α = 0.2, β = 2, ε = 0.5.

C. Chaos and (quasi-)Limit cycles

When the non-reciprocity of interactions is sufficiently small and quasi-fixed points exist, we have established that boundedly rational systems display complicated aging dynamics when learning noise, parameterised by the value of β, is present. The immediate question is now how such noise influences the complex dynamics, chaos and (quasi) limit cycles that we have found in the β → ∞, α ≪ 1 regime (see Sec. VII).

To qualitatively illustrate the effect of non-zero noise, we have run simulations for α = 0.1 and different values of ε and β. Fig. 19 displays individual trajectories of the intentions mi(t), as well as the distribution of the values of individual mi over all agents, realizations and time-steps. We also show the auto-correlation function C(τ) (assumed to be time-translation invariant over the short time scales considered), which is also compared to the DMFT solved numerically. Note that for the smallest value ε = 0.1, the trajectories illustrate the previously discussed quasi-fixed points emerging from the online dynamics. Clearly, while the correlation function remains close to constant, individual intentions are not exactly frozen (notice the wiggles in the top row of Fig. 19, especially for β = 2), explaining how the system as a whole eventually decorrelates and displays aging, as discussed in the previous section.

In the chaotic regime around ε = 1, it is clear that decreasing β (increasing noise) spreads the distribution of the mi, which is less and less concentrated around ±1, as an immediate consequence of the smoothed out hyperbolic tangent. As a result, the equal-time autocorrelation C(0) = q naturally decreases when β decreases. It is furthermore interesting to note that its value significantly decreases as the asymmetry parameter ε increases. We expect a decrease in α to have a similar role, as suggested by the phase diagrams presented in Fig. 16 (b) and (c). Dynamically, the decay of the autocorrelation in this chaotic regime appears to be more or less independent of the strength of the noise, which can be seen by comparing the insets of the third row of Fig. 19. While it is known that external noise kills deterministic chaos in neural networks with uncorrelated couplings [118], what is interesting in our case is that both the non-linearity of the hyperbolic tangent (governed by β) and the strength of the effective noise (which scales as 1 − q, see Eq. (34)) are varied simultaneously and, in a sense, self-consistently. Determining the way the decorrelation rate evolves with both β and ε is therefore quite non trivial, and would be a very interesting endeavor.

Where we previously had limit cycles, for ε = 1.5 for instance (last row of Fig. 19), it appears that oscillations survive for large values of β. Note however that in this case the value q = C(0) appears to very quickly vanish when β decreases. This is consistent with the linear stability analysis of the paramagnetic solution, which should be valid for vanishingly small α. As visible in Fig. 16 (c), we indeed expect that the region in which the system displays any form of aggregate coordination becomes increasingly narrow as ε gets closer to its maximum value of 2. Precisely for ε = 2, the system likely becomes fully disordered (q = 0) for any finite value of β when α → 0. For the small but finite values of α presented here, this is not quite the case, however, and large asymmetries ε > 2 − εc do give rise to clear oscillations, both in individual trajectories and in the correlation function (for instance, the bottom row of Fig. 19 shows for ε = 1.5 that some oscillations can be somewhat sustained).

D. Role of the Noise: Recap

In the presence of noise, the “conviction” of agents naturally goes down, in the sense that individual mi’s become smaller in absolute value, i.e. less polarized around ±1. For small α (long memory time), there exists a well defined transition line in the plane ε (asymmetry), 1/β (amplitude of the noise) above which agents start playing randomly at each round (i.e. mi = 0), but below which
FIG. 19. Finite β trajectories obtained from numerical simulations at α = 0.1, N = 256, t0 = 10^8, averaged over 96 samples of disorder and initial conditions for (a) β = 4, (b) β = 2 and, from top to bottom, ε = {0.1, 0.85, 1.05, 1.5}. Each line displays (from left to right) the evolution of 16 randomly selected agents, the associated histogram of mi over both agents, time and realizations, and the autocorrelation function, assumed to be time translation invariant on short time scales. Third row: insets representing the evolution of the normalized autocorrelation C(τ)/C(0) with a logarithmic vertical scale.
Agents, therefore, must do with satisficing solutions, as argued long ago by H. Simon [1], an idea embodied by our model in a concrete and tangible way. Only a centralized, omniscient agent may be able to ascribe an optimal strategy to all agents – which incidentally raises the question of trust: would agents even agree to follow the central planner's advice? Would they even believe in her ability to solve complex problems, knowing that their solution sensitively depends on all parameters of the model? If a finite fraction of all agents fail to comply, the resulting average reward will drop precipitously below the optimal value, and not be much better than the result obtained through individual learning.

As general ideas of interest in a socio-economic context, we have established that

1. long memory of past rewards is beneficial to learning, whereas over-reaction to the recent past is detrimental;

2. increased competition generically destabilizes fixed points and leads first to chaos and, in the high competition limit, to quasi-cycles;

3. some amount of noise in the learning process, quite paradoxically, allows the system to reach better collective decisions, in the sense that the average reward is increased;

4. non-ergodic behaviour spontaneously appears (in the form of “aging”) in a large swath of parameter space, when α, 1/β and ε are small.

On the positive side, we have shown that learning is far from useless: instead of getting stuck in one of the most numerous fixed points with low average reward, the learning process does allow the system to coordinate around satisficing solutions with rather high (but not optimal) average reward. Numerically, the average reward at the end of the learning process is, for ε = 0, ≈ 8% below the optimal value, whereas the majority of fixed points lead to a much worse average reward, ≈ 33% below the optimal value [60].

B. Technical Results & Conjectures

From a statistical mechanics perspective, our model is next of kin to, but different from, several well studied models; a synthesis of our original results can be found in Figs. 1, 2, 9, 16. For example, when α = 1, β → ∞, the dynamics is equivalent to a Hopfield model of learning with non-symmetric Gaussian synaptic couplings, for which many results are known, in particular on the number of fixed points and L-cycles. Introducing some memory with α < 1, we found that previously dynamically unattainable fixed points become typical solutions, replacing the short limit cycles in which the α = 1 parallel dynamics get stuck. We also showed how the number of L-cycles can be calculated for all values of α < 1.

The chaotic region that is known to exist when interactions are mostly non-symmetric (ε ≈ 1) also appears to be reduced by memory. When couplings are mostly non-reciprocal, ε ≲ 2, periodic oscillations survive, but we found that decreasing α non-trivially increases the cycle length, as α^{−1/2}.

When β is finite, the fixed point solutions to the dynamics correspond to the so-called Naive Mean-Field Equations of spin glasses, another model that has been studied in detail [78]. One knows in particular that such solutions become exponentially abundant for small enough noise 1/β and for ε = 0, a result that we have extended to all ε < 1. When ε > 1, on the other hand, the only fixed point (or Nash equilibrium) is mi = 0, ∀i, i.e. completely random decisions at each time step.

The Dynamical Mean-Field Theory (DMFT) is a tool of choice for investigating the dynamics of the model when N → ∞. DMFT is however frustratingly difficult to exploit analytically in the general case, so we are left with numerical solutions of our DMFT equations, which accurately match direct numerical simulations of the model when N is large, but fail to capture some specific features arising when N is small. From our numerical results, we conjectured that quasi-fixed points (and correspondingly, aging dynamics) persist for small noise and when ε ≤ ε⋆, where the value of ε⋆ is difficult to ascertain but could be as high as εRM = 0.47, perhaps related to the remnant magnetisation transition found in [94].

For β = ∞, the long-time, zero temperature autocorrelation C(∞) appears to drop extremely slowly with ε, but we have not been able to get an analytical result. DMFT equations also clearly lead to the anomalous α^{−1/2} stretching of the cycles mentioned above, but an analytic solution again eluded us.

One of the reasons analytical progress with DMFT is difficult is presumably related to the phenomenon of “Replica Symmetry Breaking” and its avatar in the present context of dynamical learning. Indeed, any attempt to expand around a static solution of the DMFT equations leads to inconsistencies in the interesting situation β > βc(ε), when decisions are not purely random, see Fig. 17(b). In fact, as shown in Fig. 17(a), the value of the order parameter q predicted by such static solutions is substantially off the value found from the long time, numerical solution of the DMFT equations, which itself coincides with direct simulations of the SK-game. En passant, we noticed that the value of q seems to be given by a universal function of βc(ε)/β, independently of the value of ε. Again, we have not been able to understand why this should be the case.

Finally, we have numerically established several interesting results concerning average and individual rewards that would deserve further investigation. For example, the average reward seems to converge towards its asymptotic value as N^{−2/3}, exactly as for the SK model, although, as already noted above, this asymptotic value
is ≈ 8% below the optimal SK value. Is it possible to characterize more precisely the ensemble of configurations reached after learning in the long memory limit α → 0? Can one, in particular, understand analytically the distribution of individual rewards shown in Fig. 4 and the corresponding asymptotic value of the average reward, as well as its non monotonic behaviour as a function of the noise parameter β? These are, in our opinion, quite interesting theoretical questions left for future investigation.

C. Extensions and Final Remarks

Many extensions of the very simple framework presented here can be imagined for the model to be more representative of real socio-economic systems. For example, by analogy with spin-glasses, going beyond fully connected interactions and towards a more realistic network structure should not change the overall phenomenology, although some subtle differences may show up. While analytical predictions become even more challenging, recent works on dynamical mean-field theories with finite connectivity, so far developed for Lotka-Volterra type systems, could perhaps be adapted to the learning dynamics of our model.

Allowing the interaction network to evolve with time would of course also make the model more realistic, as in [13]; in this case one would have to distinguish whether the learning time is much longer or much shorter than the reshuffling time of the network.

Other interesting additions to Ising games could also include the introduction of self-excitation [119] or of alternative decision rules that might be less statistical mechanics-friendly [36]. Extensions to multinary decisions, beyond the binary case considered here, as well as to higher-order interactions (i.e. a “p-spin game”), would obviously be interesting as well, especially as higher order interactions are known to change the phenomenology of the SK model (see e.g. [102]). In particular, we expect that in the p-spin case with p ≥ 3 a much larger gap would develop between the optimal average reward and the one reached by learning.

Finally, whereas temperature in physics is the same for all spins, there is no reason to believe that the amount of noise β or the memory span α should be the same for all agents. Introducing such heterogeneities might be worth exploring, as some agents with longer memory may fare systematically better than others, like in Minority Games, see [34].

Beyond the socio-economic context that was our initial motivation for its design, we believe that the simplicity and generality of our model make it a suitable candidate to describe a much wider range of complex systems. In the context of biological neural networks, the parameter α indeed allows one to interpolate between simple discrete-time Hopfield networks [88] and continuous-time models where Qi is an activation variable for the firing rates mi [120–125]. Although in our case the influence of β introduces some perhaps unwanted stochasticity, these fluctuations can in principle be suppressed (at least partially) with sufficiently small α. The memory loss parameter could also represent an interesting way to tune the effective slowing down of the dynamics caused by symmetry, as described in [105]. Here, the description of real neural networks would likely require much sparser interactions, but also perhaps the introduction of some dedicated dynamics for the interactions themselves, see e.g. [126] for recent ideas.

Closer to our original motivation, more applied socio-economic problems might benefit from the introduction of this type of reinforcement learning. In macroeconomics for instance, some form of “habit formation” could perhaps be relevant to extend existing descriptions of input/output networks [13, 64], where client/supplier relationships are probably strongly affected by history (on this point, see Kirman's classic study of Marseilles' fish market [127], see also [29]). Finally, while it is an aspect of our model we have not investigated here, previous works have reported that similar dynamics yield interesting volatility clusters and heavy tails, corresponding to sudden changes of quasi-fixed points. Such effects might be relevant to describe financial time series [14, 51].

In the context of financial markets, our model could also be used to challenge the idea that efficiency is reached by evolution. Indeed, a theory that has been proposed to explain empirical observations going against the Efficient Market Hypothesis is the so-called “Adaptive Markets Hypothesis”, stating that inefficiencies stem from the (transient) evolution of a market towards true efficiency [128, 129]. However, reinterpreting the Qi in our model as some form of fitness measure, it appears unlikely that simple evolutionary dynamics (akin to the learning dynamics) could overcome the type of radical complexity discussed here. As a matter of fact, such an evolutionary twist to the “SK-game” could also be used to conjecture that simple Darwinian dynamics is unlikely to lead to a global optimum in a complex biological setting [130].

Last but not least, we believe that the learning dynamics presented here may be useful from a purely algorithmic point of view in the study of spin-glasses and so-called TAP states. Indeed, in the β → ∞ limit, we have seen that our iteration relatively frequently finds fixed points in regions where their abundance is known to be sub-exponential (close to ε = 1 in particular), and this even for relatively large values of α. Interestingly, similar exponentially weighted moving averages have been employed in past numerical studies of TAP states for symmetric interactions [131, 132], but on the magnetizations mi themselves, and not on the local fields Qi as is the case above.

Using an offline version of our learning procedure could then be of use to effectively converge to fixed points of the TAP equations or Naive Mean-Field Equations, and study their properties. Perhaps even more interestingly,
ACKNOWLEDGMENTS

Indeed, given the assumption of independence in time, i.e. ⟨(Sj(t′) − mj(t′))(Sj(t′′) − mj(t′′))⟩ = δ_{t′,t′′}(1 − mj(t′)²), we have

⟨(m̃^α_j(t) − α ∑_{t′≤t} (1 − α)^{t−t′} mj(t′))²⟩ = α² ∑_{t′≤t} (1 − α)^{2(t−t′)} ⟨(Sj(t′) − mj(t′))²⟩ ≤ α² ∑_{t′≤t} (1 − α)^{2(t−t′)} = α/(2 − α) → 0 as α → 0.

We can then make the ansatz that the expected decision reaches a fixed point m⋆j after some time. For sufficiently large t and small but finite values of α, we will therefore have

⟨m̃^α_j(t)⟩ ≃ m⋆j,    (A3)

with fluctuations characterized by

⟨(m̃^α_j(t) − m⋆j)²⟩ = (α/2)(1 − (m⋆j)²) + O(α²).    (A4)
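The leading-order result (A4) is easy to confirm numerically with i.i.d. spins of fixed mean m⋆ (a toy verification with self-chosen parameters; the exact geometric sum gives α(1 − (m⋆)²)/(2 − α), which reduces to (A4) at leading order in α):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, m_star, T = 0.1, 0.5, 200_000
# i.i.d. decisions S(t) = +/-1 with mean m_star
S = np.where(rng.random(T) < 0.5 * (1.0 + m_star), 1.0, -1.0)
# exponentially weighted moving average of the decisions
ewma = np.empty(T)
acc = 0.0
for t in range(T):
    acc = (1.0 - alpha) * acc + alpha * S[t]
    ewma[t] = acc
var_emp = float(ewma[1000:].var())                    # discard the transient
var_th = alpha * (1.0 - m_star**2) / (2.0 - alpha)    # = (alpha/2)(1 - m*^2) + O(alpha^2)
```

The empirical variance matches the geometric-sum prediction to within statistical error, and both shrink linearly with α, as stated below Eq. (A4).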
To study the influence of the memory loss rate α < 1, we may adapt the method of Hwang et al., although this requires the introduction of either a strong nonlinearity in the exponent and subsequent saddle equations, or of new variables. We opt for the latter, and write the number of cycles of length L as

N_L(N, α, ε) = ∑_{Si(t)} ∫_{−∞}^{∞} ( ∏_{i=1}^{N} ∏_{t=1}^{L} dQi(t) ) |det J^α|^N ∏_{i,t} δ( Qi(t + 1) − (1 − α)Qi(t) − α ∑_j Jij Sj(t) ) Θ( Qi(t)Si(t) ),    (B1)

where the Dirac δ ensures that the dynamics are satisfied at each step, while the Heaviside Θ enforces Si(t) = sign(Qi(t)) ∀t. The L × L matrix J^α is the α-dependent Jacobian ensuring that the zeros of the δ function are correctly weighted, i.e. for L > 1

J^α_{ts} = { −(1 − α), if s = t (mod L); 1, if s = t + 1 (mod L); 0, otherwise.    (B2)
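Setting L = 1 in Eq. (B1), the δ function forces Qi = ∑_j Jij Sj, so fixed points are exactly the configurations with Si = sign(∑_j Jij Sj), independently of α. For small N they can therefore be counted by brute force; the following sketch (ours, exponential in N, purely illustrative) does just that:

```python
import itertools
import numpy as np

def count_fixed_points(J):
    """Count configurations S in {-1,+1}^N with S_i * (J S)_i > 0 for all i,
    i.e. S = sign(J S): the L = 1 solutions of Eq. (B1), independent of alpha."""
    N = J.shape[0]
    total = 0
    for bits in itertools.product((-1.0, 1.0), repeat=N):
        S = np.array(bits)
        if np.all(S * (J @ S) > 0.0):
            total += 1
    return total

# Symmetric couplings: fixed points come in +/- pairs (S and -S are both
# solutions), and the maximizer of S^T J S is one-flip stable, so for generic
# Gaussian J the count is even and at least 2.
rng = np.random.default_rng(1)
N = 10
B = rng.normal(size=(N, N))
J = (B + B.T) / 2.0
np.fill_diagonal(J, 0.0)
n_fp = count_fixed_points(J)
```

Comparing such exhaustive counts at small N with the saddle-point complexity derived below is a useful sanity check, though the asymptotic regime only sets in at much larger N.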
The first step is, as usual, to perform the average over the disorder after introducing the integral representation of the Dirac δ,

N_L(N, α, ε) = |det J^α|^N ∑_{Si(t)} ∫_{−∞}^{∞} ( ∏_{i=1}^{N} ∏_{t=1}^{L} dQi(t) dλi(t)/2π ) ( ∏_{i,t} Θ(Si(t)Qi(t)) ) exp[ − i ∑_{i,t} λi(t)Qi(t + 1) + i(1 − α) ∑_{i,t} λi(t)Qi(t) − (α²/2N)(1 − ε/2)² ∑_{i<j} ( ∑_t [λi(t)Sj(t) + λj(t)Si(t)] )² − (α²/2N)(ε/2)² ∑_{i<j} ( ∑_t [λi(t)Sj(t) − λj(t)Si(t)] )² ].    (B3)

The last two terms, resulting from the average over the disorder, may be rearranged to give

− (α²/2N) ∑_{t,s} ( ∑_{i,j} [ υ(ε) λi(t)Sj(t) λi(s)Sj(s) + (1 − ε) λi(t)Sj(t) λj(s)Si(s) ] − ((2 − ε)²/2) ∑_i λi(t)Si(t) λi(s)Si(s) ).

In terms of the overlaps K(t, s) = (1/N) ∑_i Si(t)Si(s), R(t, s) = (1/N) ∑_i λi(t)λi(s), V(t, s) = (1/N) ∑_i λi(t)Si(s) and U(t, s) = (1/N) ∑_i λi(t)Si(t)λi(s)Si(s), this contribution reads

− (1/2) N α² ∑_{t,s} [ υ(ε) R(t, s) K(t, s) + (1 − ε) V(t, s) V(s, t) − ((2 − ε)²/(2N)) U(t, s) ].

In the limit N → ∞, the O(N^{−1}) term is insignificant and can thus be neglected. These order parameters are then enforced through their integral representations, introducing conjugate variables; the corresponding terms in the exponent read

− i ∑_{t,s} K̂(t, s)( N K(t, s) − ∑_i Si(t)Si(s) ) − i ∑_{t,s} R̂(t, s)( N R(t, s) − ∑_i λi(t)λi(s) ).
For V(t, s) we have to be a bit more careful, as the expression is not linear for all terms: the diagonal t = s gives a quadratic term that will have to be treated separately. Noticing that the product V(t, s)V(s, t) is symmetric, we start by considering t < s,

∫_{−∞}^{∞} ( ∏_{t<s} dV(t, s)/2π ) e^{ − ∑_{t<s} iV(t,s) [ N V̂(t,s) − iN α²(1−ε) V(s,t) ] } = ∏_{t<s} δ( N V̂(t, s) − iN α²(1 − ε) V(s, t) ).    (B11)

Finally, the diagonal V(t, t), for which the expression is quadratic, can be computed with a Gaussian integral,

∫_{−∞}^{∞} ( ∏_t dV(t, t) ) e^{ − (N α²(1−ε)/2) ∑_t V(t,t)² − iN ∑_t V(t,t) V̂(t,t) } ∼ e^{ − (N/(2α²(1−ε))) ∑_t V̂(t,t)² },    (B13)

where

I_L = ∑_{S(t)} ∫_{−∞}^{∞} ( ∏_t dQ(t) dλ(t)/2π ) ( ∏_t Θ(S(t)Q(t)) ) exp[ − i ∑_t λ(t)Q(t + 1) + i(1 − α) ∑_t λ(t)Q(t) + (1/2) α² υ(ε) ∑_{t,s} ( S(t)S(s) iK̂(t, s) + λ(t)λ(s) iR̂(t, s) ) + α²(1 − ε) ∑_{t,s} λ(t)S(s) iV̂(t, s) ].    (B17)
At this stage, we may notice that S(t)² = 1 ∀t, and as such that the diagonal part of the sum over K̂(t, s) can be taken out of I_L. Doing so and combining this contribution with the first term of the exponent of Eq. (B16), we have

N_L(N, α, ε) ∼ ∫_{−∞}^{∞} ( ∏_{t,s} dK̂(t, s) dR̂(t, s) dV̂(t, s) ) exp( N [ (1/2) α² υ(ε) ∑_t ( iR̂(t, t) + 1 ) iK̂(t, t) + (1/2) α² (1 − ε + ε²/2) ∑_{s≠t} iR̂(t, s) iK̂(t, s) − (1/2) α² (1 − ε) ∑_{t,s} V̂(t, s)V̂(s, t) + log I_L + log |det J^α| ] ).    (B18)

Now, the first term in the exponent can be integrated over the K̂(t, t) exactly, yielding a product of δ functions fixing iR̂(t, t) = −1 for all t. The complexity then reads

Σ_L(α, ε) = saddle_{R̂,K̂,V̂} { (1/2) α² υ(ε) ∑_{s≠t} iR̂(t, s) iK̂(t, s) − (1/2) α² (1 − ε) ∑_{t,s} V̂(t, s)V̂(s, t) + log I_L + log |det J^α| },    (B20)

with now

I_L = ∑_{S(t)} ∫_{−∞}^{∞} ( ∏_t dQ(t) dλ(t)/2π ) ∏_t Θ(S(t)Q(t)) exp[ − i ∑_t λ(t)Q(t + 1) + i(1 − α) ∑_t λ(t)Q(t) − (1/2) α² υ(ε) [ ∑_t λ(t)² − ∑_{s≠t} ( S(t)S(s) iK̂(t, s) − λ(t)λ(s) iR̂(t, s) ) ] + α²(1 − ε) ∑_{t,s} λ(t)S(s) iV̂(t, s) ].    (B21)

and b ∈ R^L,

b(t) = Q(t + 1) − (1 − α)Q(t) − α²(1 − ε) ∑_s V̂(t, s)S(s),    (B24)

such that we in fact have

b = J^α Q + c,    c(t) = −α²(1 − ε) ∑_s V̂(t, s)S(s).    (B25)

Computing the Gaussian integral on λ,

I_L = ∑_{S(t)} ( e^{ (1/2) α² υ(ε) ∑_{s≠t} S(t)S(s) iK̂(t,s) } / √(det A) ) ∫_{−∞}^{∞} ( ∏_t dQ(t)/√(2π) ) ∏_t Θ(S(t)Q(t)) exp( − (1/2) (J^α Q + c)^⊺ A^{−1} (J^α Q + c) ).    (B26)
where à = (J α )−1 A(J α )−1 As a result, the ∣det J α ∣ contributions in the complexity cancel out, and we find ourselves with
1 1
ΣL (α, ε) = saddle { α2 υ(ε) ∑ iR̂(t, s)iK̂(t, s) − α2 (1 − ε) ∑ V̂ (t, s)V̂ (s, t) + log IL }, (B28)
R̂,K̂,V̂ 2 s≠t 2 t,s
with now
1 2
υ(ε) ∑s≠t S(t)S(s)iK̂(t,s) ∞
e2α dQ(t) 1
IL = ∑ √ ∫ (∏ √ ) ∏ Θ(S(t)(Q − (J α )−1 c)(t)) exp (− Q⊺ Ã−1 Q) . (B29)
{S(t)} det à −∞ t 2π t 2
The integral is challenging to study in generality, as the non-diagonal nature of J α and A means that we cannot factorize
the integrand. We now move √ on to specific cases of interest. The√ equations can be further simplified through the rescalings
α2 υ(ε)K̂(t, s) → K̂(t, s), α υ(ε)V̂ (t, s) → V̂ (t, s) and Q(t) → α υ(ε), in which case we finally get
η
ΣL (α, η) = saddle { ∑ iR̂(t, s)iK̂(t, s) − ∑ V̂ (t, s)V̂ (s, t) + log IL }, (B30)
R̂,K̂,V̂ s<t 2 t,s
where $\Psi_L(x_1,\dots,x_L;C)$ is the cumulative distribution of an $L$-dimensional Gaussian with zero mean vector and covariance matrix $C$, evaluated up to $x_t$, $t \in \{1,\dots,L\}$. In our case, the upper bounds of integration are given by $\Gamma = S \circ (J^\alpha)^{-1}c$, with now
\[
c(t) = \eta\sum_s \hat{V}(t,s)S(s),
\tag{B32}
\]
while the covariance matrix must be handled with care, as its off-diagonal elements must be adapted to the presence of $S(t)$ in the Heaviside step function: they pick up a factor $S(t)S(s)$, so that
\[
C(t,s) = \begin{cases} \tilde{A}(t,t) & \text{for } t = s, \\ S(t)S(s)\,\tilde{A}(t,s) & \text{for } t \neq s. \end{cases}
\tag{B33}
\]
1. Fixed Points
We can start by checking that we recover the known result for L = 1. In this case, J α = α, and we only have to solve for a
scalar V̂ (1, 1) = x. Here,
\[
I_L = 2\,\Psi_1\Big(\frac{\eta}{\alpha}x;\,\frac{1}{\alpha^2}\Big) = 2\,\Phi(\eta x),
\tag{B35}
\]
therefore the saddle point equation becomes
\[
\Sigma_1(\alpha,\eta) := \Sigma_{\mathrm{FP}}(\eta) = \max_x \Big\{-\frac{\eta}{2}x^2 + \log 2 + \log\Phi(\eta x)\Big\},
\tag{B36}
\]
from which the expression in the main text can immediately be recovered.
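To make the reduction concrete, the one-dimensional maximization in (B36) is easy to evaluate numerically. A minimal sketch, assuming the $-\frac{\eta}{2}x^2$ normalization of the objective and a crude grid search (function names are ours, not from the text):

```python
import math

def phi(x: float) -> float:
    """Standard normal CDF, Phi(x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sigma_fp(eta: float) -> float:
    """Grid maximization of -(eta/2) x^2 + log 2 + log Phi(eta x)."""
    best = -math.inf
    for i in range(5001):
        x = i * 1e-3  # scan x in [0, 5]; the optimum sits at x >= 0
        best = max(best, -0.5 * eta * x * x + math.log(2.0)
                   + math.log(phi(eta * x)))
    return best
```

At $\eta = 0$ the expression reduces to $\log 2 + \log\Phi(0) = 0$ for any $x$, and $\Sigma_{\mathrm{FP}}(\eta)$ grows towards $\log 2$ as $\eta$ increases.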
2. Two-cycles
There are now six variables to solve for: iR̂, iK̂, V̂1 = V̂ (1, 1), V̂2 = V̂ (2, 2), V̂12 = V̂ (1, 2) and V̂21 = V̂ (2, 1). In this case, the
Jacobian matrix is given by
\[
J^\alpha = \begin{bmatrix} -(1-\alpha) & 1 \\ 1 & -(1-\alpha) \end{bmatrix} \;\Rightarrow\; (J^\alpha)^{-1} = \frac{1}{1-(1-\alpha)^2}\begin{bmatrix} 1-\alpha & 1 \\ 1 & 1-\alpha \end{bmatrix}.
\tag{B37}
\]
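As a throwaway sanity check of the inverse in (B37), one can verify numerically that the product with $J^\alpha$ gives the identity (the value of $\alpha$ below is arbitrary):

```python
alpha = 0.3  # arbitrary test value in (0, 1)
a = -(1.0 - alpha)
J = [[a, 1.0], [1.0, a]]
pref = 1.0 / (1.0 - (1.0 - alpha) ** 2)
Jinv = [[pref * (1.0 - alpha), pref], [pref, pref * (1.0 - alpha)]]

# J @ Jinv should be the 2x2 identity matrix
prod = [[sum(J[i][k] * Jinv[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)]
```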
Now, it is immediately apparent that IL is a function of the product S1 S2 = ±1 rather than S1 and S2 . As a result, the
complexity is given by
\[
\Sigma_2(\alpha,\eta) = \operatorname*{saddle}_{\hat{R},\hat{K},\hat{V}_1,\hat{V}_2,\hat{V}_{12},\hat{V}_{21}}\Big\{i\hat{R}\,i\hat{K} - \frac{\eta}{2}\big(\hat{V}_1^2 + \hat{V}_2^2 + 2\hat{V}_{12}\hat{V}_{21}\big) + \log 2 + \log \sum_{S=\pm1} e^{S\,i\hat{K}}\,\Psi_2(\Gamma_1,\Gamma_2;C)\Big\}.
\tag{B40}
\]
The saddle point equations read
\[
\Big(i\hat{K} + \frac{i\hat{R}}{1-(i\hat{R})^2}\Big)\sum_{S=\pm1} e^{S i\hat{K}}\,\Psi_2(\Gamma_1,\Gamma_2;C) - \frac{1}{2}\sum_{S=\pm1} e^{S i\hat{K}}\int_{-\infty}^{\Gamma_1}\mathrm{d}Q_1\int_{-\infty}^{\Gamma_2}\mathrm{d}Q_2\; Q^\intercal\,\frac{\partial C^{-1}}{\partial i\hat{R}}\,Q\;\frac{e^{-\frac{1}{2}Q^\intercal C^{-1}Q}}{2\pi\sqrt{\det C}} = 0,
\tag{B42}
\]
\[
-\eta\hat{V}_1 \sum_{S=\pm1} e^{S i\hat{K}}\,\Psi_2(\Gamma_1,\Gamma_2;C) + \frac{\eta(1-\alpha)}{1-(1-\alpha)^2}\sum_{S=\pm1} e^{S i\hat{K}}\int_{-\infty}^{\Gamma_2}\mathrm{d}Q_2\,\frac{e^{-\frac{1}{2}Q_2^{\star\intercal}C^{-1}Q_2^{\star}}}{\sqrt{2\pi\det C}} + \frac{\eta}{1-(1-\alpha)^2}\sum_{S=\pm1} S\,e^{S i\hat{K}}\int_{-\infty}^{\Gamma_1}\mathrm{d}Q_1\,\frac{e^{-\frac{1}{2}Q_1^{\star\intercal}C^{-1}Q_1^{\star}}}{\sqrt{2\pi\det C}} = 0,
\tag{B43}
\]
\[
-\eta\hat{V}_2 \sum_{S=\pm1} e^{S i\hat{K}}\,\Psi_2(\Gamma_1,\Gamma_2;C) + \frac{\eta}{1-(1-\alpha)^2}\sum_{S=\pm1} S\,e^{S i\hat{K}}\int_{-\infty}^{\Gamma_2}\mathrm{d}Q_2\,\frac{e^{-\frac{1}{2}Q_2^{\star\intercal}C^{-1}Q_2^{\star}}}{\sqrt{2\pi\det C}} + \frac{\eta(1-\alpha)}{1-(1-\alpha)^2}\sum_{S=\pm1} e^{S i\hat{K}}\int_{-\infty}^{\Gamma_1}\mathrm{d}Q_1\,\frac{e^{-\frac{1}{2}Q_1^{\star\intercal}C^{-1}Q_1^{\star}}}{\sqrt{2\pi\det C}} = 0,
\tag{B44}
\]
\[
-\eta\hat{V}_{21} \sum_{S=\pm1} e^{S i\hat{K}}\,\Psi_2(\Gamma_1,\Gamma_2;C) + \frac{\eta(1-\alpha)}{1-(1-\alpha)^2}\sum_{S=\pm1} S\,e^{S i\hat{K}}\int_{-\infty}^{\Gamma_2}\mathrm{d}Q_2\,\frac{e^{-\frac{1}{2}Q_2^{\star\intercal}C^{-1}Q_2^{\star}}}{\sqrt{2\pi\det C}} + \frac{\eta}{1-(1-\alpha)^2}\sum_{S=\pm1} e^{S i\hat{K}}\int_{-\infty}^{\Gamma_1}\mathrm{d}Q_1\,\frac{e^{-\frac{1}{2}Q_1^{\star\intercal}C^{-1}Q_1^{\star}}}{\sqrt{2\pi\det C}} = 0,
\tag{B45}
\]
\[
-\eta\hat{V}_{12} \sum_{S=\pm1} e^{S i\hat{K}}\,\Psi_2(\Gamma_1,\Gamma_2;C) + \frac{\eta}{1-(1-\alpha)^2}\sum_{S=\pm1} e^{S i\hat{K}}\int_{-\infty}^{\Gamma_2}\mathrm{d}Q_2\,\frac{e^{-\frac{1}{2}Q_2^{\star\intercal}C^{-1}Q_2^{\star}}}{\sqrt{2\pi\det C}} + \frac{\eta(1-\alpha)}{1-(1-\alpha)^2}\sum_{S=\pm1} S\,e^{S i\hat{K}}\int_{-\infty}^{\Gamma_1}\mathrm{d}Q_1\,\frac{e^{-\frac{1}{2}Q_1^{\star\intercal}C^{-1}Q_1^{\star}}}{\sqrt{2\pi\det C}} = 0,
\tag{B46}
\]
with $Q_1^\star = [Q_1\;\,\Gamma_2]^\intercal$ and similarly for $Q_2^\star$.
We now take the particular case $\alpha = 1$. In this case, we expect two-cycles with zero overlap between consecutive steps, meaning $i\hat{R} = 0$; the matrix $C$ is then the identity matrix, and thus $\Psi_2(\Gamma_1,\Gamma_2;C) = \Phi(\Gamma_1)\Phi(\Gamma_2)$. In order for $i\hat{R} = 0$ to satisfy Eq. (B41), we then require $i\hat{K} = 0$ and $\hat{V}_1 = \hat{V}_2 = 0$, such that $\sum_{S=\pm1} S\,e^{S i\hat{K}}\,\Psi_2(\Gamma_1,\Gamma_2;C) = 0$. Given
\[
\frac{\partial C^{-1}}{\partial i\hat{R}}\Big|_{i\hat{R}=0} = \begin{bmatrix} 0 & S \\ S & 0 \end{bmatrix}
\tag{B49}
\]
for α = 1, the solution iK̂ = 0 satisfies the saddle point equation (B42) while the independence of the integrands on S ensures
that V̂1 = V̂2 = 0 are compatible with Eqs. (B43)-(B44). Eqs. (B45)-(B46) then suggest the ansatz V̂12 = V̂21 = x, in which case
the problem is finally reduced to
\[
\Sigma_2(\alpha=1,\eta) = \max_x \big\{-\eta x^2 + 2\log 2 + 2\log\Phi(\eta x)\big\} = 2\,\Sigma_{\mathrm{FP}}(\eta),
\tag{B50}
\]
recovering the known solution of [61, 86].
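Numerically, the doubling identity (B50) can be sanity-checked against (B36) by maximizing both one-dimensional objectives over the same grid. This sketch assumes the $-\frac{\eta}{2}x^2$ normalization of $\Sigma_{\mathrm{FP}}$; all function names are ours:

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sigma_fp(eta):
    # fixed-point complexity, as in Eq. (B36)
    return max(-0.5 * eta * x * x + math.log(2.0) + math.log(phi(eta * x))
               for x in (i * 1e-3 for i in range(5001)))

def sigma_2cycle(eta):
    # alpha = 1 two-cycle complexity, as in Eq. (B50)
    return max(-eta * x * x + 2.0 * math.log(2.0)
               + 2.0 * math.log(phi(eta * x))
               for x in (i * 1e-3 for i in range(5001)))
```

Both maximizations share the same optimizer $x^\star$, so the ratio of the two complexities is exactly 2.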
We start from the $N \gg 1$ discrete difference equations, to which we have added an external field $h_i(t)$,
\[
Q_i(t+1) = (1-\alpha)Q_i(t) + \alpha\sum_j J_{ij}\, m_j(t) + \alpha\eta_i(t) + h_i(t), \qquad \langle \eta_i(t)\eta_j(s)\rangle \approx \upsilon(\varepsilon)\big(1-q(t)\big)\delta_{t,s}\,\delta_{i,j},
\tag{C1}
\]
and introduce a new agent at index i = 0. The influence of this newly introduced agent on the dynamic equation of agents i ≠ 0
is then
\[
\sum_j J_{ij}\, m_j(t) + h_i(t) \;\to\; \sum_j J_{ij}\, m_j(t) + J_{i0}\, m_0(t) + h_i(t).
\tag{C2}
\]
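Though the analytical route proceeds via the cavity construction, the discrete dynamics (C1) themselves are straightforward to iterate. A minimal sketch under assumptions not fixed by the text: symmetric Gaussian couplings of variance $1/N$ (a stand-in for $\varepsilon = 0$), binary decisions $m_j(t) = \mathrm{sign}(Q_j(t))$, and no noise or external field:

```python
import math
import random

random.seed(0)
N, alpha, T = 100, 0.5, 100

# Symmetric couplings J_ij = J_ji ~ N(0, 1/N), zero diagonal
J = [[0.0] * N for _ in range(N)]
for i in range(N):
    for j in range(i + 1, N):
        J[i][j] = J[j][i] = random.gauss(0.0, 1.0 / math.sqrt(N))

Q = [random.gauss(0.0, 1.0) for _ in range(N)]
for _ in range(T):
    m = [1.0 if q >= 0.0 else -1.0 for q in Q]  # binary decisions sign(Q)
    Q = [(1.0 - alpha) * Q[i]
         + alpha * sum(J[i][j] * m[j] for j in range(N))
         for i in range(N)]
```

Iterating this map is what the dynamical mean-field treatment summarizes in the large-$N$ limit; depending on parameters, trajectories settle onto fixed points or keep evolving.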
Since $N$ is large, we consider the response at linear order, meaning that the expected decision of all agents $i > 0$ becomes
\[
m_i(t) \;\to\; m_i(t) + \sum_{s<t}\sum_j \underbrace{\frac{\partial m_i(t)}{\partial h_j(s)}\Big|_{h=0}}_{\chi_{ij}(t,s)}\, J_{j0}\, m_0(s),
\tag{C3}
\]
as Jj0 m0 (t) can be seen as the modification of the effective field “felt” by all agents j, and χij (t, s) is simply the linear response
function to a small external field. The dynamics that the newly introduced agent follows is then given by
\[
Q_0(t+1) = (1-\alpha)Q_0(t) + \alpha\sum_{i>0} J_{0i}\, m_i(t) + \sum_{s<t}\Big(\sum_{ij} J_{0i}\,\chi_{ij}(t,s)\,J_{j0}\Big) m_0(s) + \alpha\eta_0(t) + h_0(t).
\tag{C4}
\]
The sum over $i,j$ can be split into its diagonal and off-diagonal parts. On the diagonal, we assume the central limit theorem holds, yielding
\[
\sum_i J_{0i}\,\chi_{ii}(t,s)\,J_{i0} = N\Big[\langle J_{0i}\,\chi_{ii}(t,s)\,J_{i0}\rangle + O\Big(\frac{1}{\sqrt{N}}\Big)\Big] \approx (1-\varepsilon)\,\langle\chi_{ii}(t,s)\rangle,
\tag{C5}
\]
while the off-diagonal contribution will be sub-dominant, as its mean is zero given that non-opposing entries of the interaction matrix are uncorrelated. We can also assume the CLT is valid for the sum over indices $i$ at leading order, in which case
where $\xi_0$ is a Gaussian of zero mean, correlated in time as $\langle\xi_0(t)\xi_0(s)\rangle = C_0(t,s) = \langle m_0(t)m_0(s)\rangle$. Bringing everything together, one realizes that there are no cross contributions between agents at leading order, and thus that we can drop the index 0 and recover the equation in the main text,
with a new noise term combining the original thermal-like fluctuations and the effective contribution from the disorder averaging,
with ⟨ϕ(t)⟩ = 0, and
\[
\langle\phi(t)\phi(s)\rangle = \upsilon(\varepsilon)\big[C(t,s) + (1-q(t))\,\delta_{t,s}\big],
\tag{C8}
\]
and where the memory kernel and correlation function are to be determined self-consistently,
\[
G(t,s) = \langle\chi_{ii}(t,s)\rangle = \Big\langle \frac{\partial m(t)}{\partial h(s)}\Big|_{h=0}\Big\rangle, \qquad C(t,s) = \langle m(t)m(s)\rangle.
\tag{C9}
\]
Note that in order to eliminate the external field, we have expressed the susceptibility with the noise term, resulting in a
rescaling G(t, s) → αG(t, s). From this expression, the continuous limit can be taken, changing the sum in time weighted by α
into an integral,
\[
\frac{\alpha}{2}\ddot{Q}(t) + \dot{Q}(t) = -Q(t) + (1-\varepsilon)\int_0^t \mathrm{d}s\, G(t,s)\,m(s) + \phi(t) + h(t).
\tag{C10}
\]
\[
\ddot{\Delta}(\tau) = \Delta(\tau) - \frac{1}{2}C(\tau),
\tag{D2}
\]
where therefore $\Delta(\tau) = \langle Q(t+\tau)Q(t)\rangle$, and where, thanks to the Gaussian nature of the fluctuations, we have
\[
C(\tau) = \int_{-\infty}^{\infty}\frac{\mathrm{d}z}{\sqrt{2\pi}}\,e^{-\frac{1}{2}z^2}\Big[\int_{-\infty}^{\infty}\frac{\mathrm{d}x}{\sqrt{2\pi}}\,e^{-\frac{1}{2}x^2}\,\mathrm{sign}\Big(\sqrt{\Delta(0)-\lvert\Delta(\tau)\rvert}\,x + \sqrt{\lvert\Delta(\tau)\rvert}\,z\Big)\Big]^2
= \int_{-\infty}^{\infty}\frac{\mathrm{d}z}{\sqrt{2\pi}}\,e^{-\frac{1}{2}z^2}\,\mathrm{erf}\Big(\sqrt{\frac{\lvert\Delta(\tau)\rvert}{\Delta(0)-\lvert\Delta(\tau)\rvert}}\,\frac{z}{\sqrt{2}}\Big)^2
= \frac{2}{\pi}\sin^{-1}\Big(\frac{\Delta(\tau)}{\Delta(0)}\Big),
\tag{D3}
\]
(see [133] for the last step above). Now, by identifying that the only difference with the original problem is a factor 2 in $\Delta$, we can directly use the result $\Delta(0) = 1 - \frac{2}{\pi}$, found by enforcing the condition required for $\Delta(\tau)$ to decay monotonically [99]. We may finally expand the inverse sine to recover
\[
\ddot{\Delta}(\tau) \underset{\tau\gg1}{\sim} \Big(1 - \frac{1}{\pi\Delta(0)}\Big)\Delta(\tau)
\tag{D4}
\]
and therefore
\[
C(\tau) \underset{\tau\gg1}{\sim} \frac{2}{\pi}\,e^{-\tau/\tau_1}, \qquad \tau_1 = \sqrt{\frac{\pi-2}{\pi-3}} \approx 2.84.
\tag{D5}
\]
We notice that the value of the characteristic time is identical to that of the standard Sompolinsky & Crisanti case (the only difference being a factor 2 in the magnitude of $\Delta(\tau)$), and is therefore not equal to the value $\sqrt{\pi/(\pi-2)} \approx 1.66$ originally given in [97] and [98].
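The two ingredients of this appendix are easy to verify numerically: the Gaussian identity behind (D3), $\mathbb{E}[\mathrm{sign}(X)\,\mathrm{sign}(Y)] = \frac{2}{\pi}\sin^{-1}\rho$ for unit Gaussians with correlation $\rho$, and the value of the characteristic time $\tau_1$. A quick check (the value $\rho = 0.6$ is an arbitrary stand-in for $\Delta(\tau)/\Delta(0)$):

```python
import math
import random

# Monte Carlo check of the arcsine law used in (D3)
random.seed(1)
rho = 0.6  # arbitrary correlation, stands in for Delta(tau)/Delta(0)
n = 200_000
acc = 0.0
for _ in range(n):
    x, z = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    y = rho * x + math.sqrt(1.0 - rho * rho) * z  # corr(x, y) = rho
    acc += math.copysign(1.0, x) * math.copysign(1.0, y)
arcsine = (2.0 / math.pi) * math.asin(rho)

# Characteristic time of (D5) versus the value quoted in [97, 98]
tau1 = math.sqrt((math.pi - 2.0) / (math.pi - 3.0))
tau1_quoted = math.sqrt(math.pi / (math.pi - 2.0))
```

Note that with $\Delta(0) = 1 - 2/\pi$, the coefficient $1 - 1/(\pi\Delta(0))$ in (D4) equals $(\pi-3)/(\pi-2) = 1/\tau_1^2$, as it should for an exponential decay with rate $1/\tau_1$.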
[1] H. A. Simon, The Quarterly Journal of Economics, 99 (1955).
[2] The complexity of the whole means that anything that can happen is really, despite the experience gained, impossible to predict, let alone imagine. It is pointless to attempt to describe it, because any solution is conceivable.
[3] J.-M. Grandmont, Econometrica, 741 (1998).
[4] A. Kirman, Complex economics: individual and collective rationality (Routledge, 2010).
[5] C. Hommes, Journal of Economic Literature 59, 149 (2021).
[6] G. Dosi and A. Roventini, Journal of Evolutionary Economics 29, 1 (2019).
[7] M. King and J. Kay, Radical uncertainty: Decision-making for an unknowable future (Hachette UK, 2020).
[8] A. Marcet and T. J. Sargent, Journal of Economic Theory 48, 337 (1989).
[9] G. W. Evans, S. Honkapohja, et al., Rethinking expectations: The way forward for macroeconomics 68 (2013).
[10] R. Pemantle, Probability Surveys 4, 1 (2007).
[11] D. Lamberton, P. Tarrès, et al., Ann. Appl. Probab. 14, 1424 (2004).
[12] J. Moran, A. Fosset, D. Luzzati, J.-P. Bouchaud, and M. Benzaquen, Chaos: An Interdisciplinary Journal of Nonlinear Science 30, 053123 (2020).
[13] C. Colon and J.-P. Bouchaud, Available at SSRN 4300311 (2022).
[14] T. Galla and J. D. Farmer, Proceedings of the National Academy of Sciences 110, 1232 (2013).
[15] J.-P. Bouchaud and R. E. Farmer, Journal of Political Economy 131, 000 (2023).
[16] G. Parisi, Physica A: Statistical Mechanics and its Applications 263, 557 (1999), proceedings of the 20th IUPAP International Conference on Statistical Physics.
[17] G. Parisi, Advances in Complex Systems 10, 223 (2007).
[18] J.-P. Bouchaud, Entropy 23, 1676 (2021).
[19] "Common knowledge" means that "You know what I know and I know that you know what I know".
[20] S. Galluccio, J.-P. Bouchaud, and M. Potters, Physica A: Statistical Mechanics and its Applications 259, 449 (1998).
[21] J. Garnier-Brun, M. Benzaquen, S. Ciliberti, and J.-P. Bouchaud, Journal of Statistical Mechanics: Theory and Experiment 2021, 093408 (2021).
[22] D. Sharma, J.-P. Bouchaud, M. Tarzia, and F. Zamponi, Journal of Statistical Mechanics: Theory and Experiment 2021, 063403 (2021).
[23] P. Bak, How nature works: the science of self-organized criticality (Springer Science & Business Media, 2013).
[24] P. W. Anderson, The economy as an evolving complex system (CRC Press, 2018).
[25] P. Bak, K. Chen, J. Scheinkman, and M. Woodford, Ricerche Economiche 47, 3 (1993).
[26] R. Bookstaber, The end of theory: Financial crises, the failure of economics, and the sweep of human interaction (Princeton University Press, 2017).
[27] M. Shiino and T. Fukai, Journal of Physics A: Mathematical and General 23, L1009 (1990).
[28] J. Moran and J.-P. Bouchaud, Physical Review E 100, 032307 (2019).
[29] G. Dosi, The Foundations of Complex Evolving Economies: Part One: Innovation, Organization, and Industrial Dynamics (Oxford University Press, 2023).
[30] For other examples of models where learning leads to chaotic dynamics, see e.g. [134, 135].
[31] T. Hirano and J. E. Stiglitz, Land Speculation and Wobbly Dynamics with Endogenous Phase Transitions, Tech. Rep. (National Bureau of Economic Research, 2022).
[32] G. Dosi, M. Napoletano, A. Roventini, J. E. Stiglitz, and T. Treibich, Economic Inquiry 58, 1487 (2020).
[33] W. A. Brock and S. N. Durlauf, The Review of Economic Studies 68, 235 (2001).
[85] F. R. Waugh, C. M. Marcus, and R. M. Westervelt, Physical Review Letters 64, 1986 (1990).
[86] H. Gutfreund, J. Reger, and A. Young, Journal of Physics A: Mathematical and General 21, 2775 (1988).
[87] F. Tanaka and S. Edwards, Journal of Physics F: Metal Physics 10, 2769 (1980).
[88] J. J. Hopfield, Proceedings of the National Academy of Sciences 79, 2554 (1982).
[89] U. Bastolla and G. Parisi, Journal of Physics A: Mathematical and General 31, 4583 (1998).
[90] L. F. Cugliandolo, arXiv preprint arXiv:2305.01229 (2023).
[91] F. Roy, G. Biroli, G. Bunin, and C. Cammarota, Journal of Physics A: Mathematical and Theoretical 52, 484001 (2019).
[92] F. Mignacco, F. Krzakala, P. Urbani, and L. Zdeborová, Advances in Neural Information Processing Systems 33, 9540 (2020).
[93] H. Eissfeller and M. Opper, Physical Review Letters 68, 2094 (1992).
[94] H. Eissfeller and M. Opper, Physical Review E 50, 709 (1994).
[95] The convergence to m∞ is as slow as a power law of τ, which makes its numerical determination difficult. Obtaining the precise value of εRM is therefore challenging [94].
[96] Note that C(τ = 2) remains equal to 1 in the whole range of parameters shown in Fig. 12, i.e. we only observe fixed points or 2-cycles.
[97] H. Sompolinsky, A. Crisanti, and H.-J. Sommers, Physical Review Letters 61, 259 (1988).
[98] A. Crisanti and H. Sompolinsky, Physical Review A 37, 4865 (1988).
[99] A. Crisanti and H. Sompolinsky, Physical Review E 98, 062120 (2018).
[100] We have in fact noted that in this regime, all results seem to collapse when plotted vs. T/Tc(ε), as in Fig. 17(a).
[101] L. F. Cugliandolo and J. Kurchan, Journal of Physics A: Mathematical and General 27, 5749 (1994).
[102] J.-P. Bouchaud, L. F. Cugliandolo, J. Kurchan, and M. Mézard, Spin glasses and random fields 12, 161 (1998).
[103] E. Vincent, J. Hammann, M. Ocio, J.-P. Bouchaud, and L. F. Cugliandolo, in Complex Behaviour of Glassy Systems: Proceedings of the XIV Sitges Conference, Sitges, Barcelona, Spain, 10–14 June 1996 (Springer, 2007) pp. 184–219.
[104] A. Altieri, F. Roy, C. Cammarota, and G. Biroli, Physical Review Letters 126, 258301 (2021).
[105] D. Martí, N. Brunel, and S. Ostojic, Physical Review E 97, 062314 (2018).
[106] J.-P. Bouchaud, Journal de Physique I 2, 1705 (1992).
[107] H. Rieger, Journal of Physics A: Mathematical and General 26, L615 (1993).
[108] L. F. Cugliandolo, J. Kurchan, and F. Ritort, Physical Review B 49, 6331 (1994).
[109] H. Yoshino, Journal of Physics A: Mathematical and General 29, 1421 (1996).
[110] L. Berthier and J.-P. Bouchaud, Physical Review B 66, 054404 (2002).
[111] T. Arnoulx de Pirey and G. Bunin, Physical Review Letters 130, 098401 (2023).
[112] E. Marinari, G. Parisi, and D. Rossetti, The European Physical Journal B - Condensed Matter and Complex Systems 2, 495 (1998).
[113] A. Baldassarri, Physical Review E 58, 7047 (1998).
[114] L. F. Cugliandolo, J. Kurchan, P. Le Doussal, and L. Peliti, Physical Review Letters 78, 350 (1997).
[115] E. Marinari and D. Stariolo, Journal of Physics A: Mathematical and General 31, 5021 (1998).
[116] L. Berthier and J. Kurchan, Nature Physics 9, 310 (2013).
[117] G. Iori and E. Marinari, Journal of Physics A: Mathematical and General 30, 4489 (1997).
[118] L. Molgedey, J. Schuchhardt, and H. G. Schuster, Physical Review Letters 69, 3717 (1992).
[119] A. Antonov, A. Leonidov, and A. Semenov, Physica A: Statistical Mechanics and its Applications 561, 125305 (2021).
[120] C. Van Vreeswijk and H. Sompolinsky, Science 274, 1724 (1996).
[121] N. Brunel, Journal of Computational Neuroscience 8, 183 (2000).
[122] K. Rajan, L. F. Abbott, and H. Sompolinsky, Physical Review E 82, 011903 (2010).
[123] M. Stern, H. Sompolinsky, and L. F. Abbott, Physical Review E 90, 062710 (2014).
[124] J. Kadmon and H. Sompolinsky, Physical Review X 5, 041030 (2015).
[125] D. G. Clark and L. F. Abbott, arXiv preprint arXiv:2302.08985 (2023).
[126] U. Pereira-Obilinovic, J. Aljadeff, and N. Brunel, Physical Review X 13, 011009 (2023).
[127] A. P. Kirman and N. J. Vriend, in Interaction and market structure: Essays on heterogeneity in economics (Springer, 2000) pp. 33–56.
[128] A. W. Lo, Journal of Portfolio Management, Forthcoming (2004).
[129] A. W. Lo, Journal of Investment Consulting 7, 21 (2005).
[130] T. Brotto, G. Bunin, and J. Kurchan, Journal of Statistical Physics 166, 1065 (2017).
[131] T. Aspelmeier, R. A. Blythe, A. J. Bray, and M. A. Moore, Physical Review B 74, 184411 (2006).
[132] T. Aspelmeier and M. A. Moore, Physical Review E 100, 032127 (2019).
[133] A. Prudnikov, Y. A. Brychkov, and O. Marichev, More special functions (Integrals and Series vol 3) (New York: Gordon and Breach, 1990).
[134] W. A. Brock and C. H. Hommes, Econometrica: Journal of the Econometric Society, 1059 (1997).
[135] W. A. Brock and C. H. Hommes, Journal of Economic Dynamics and Control 22, 1235 (1998).
[136] A. Montanari, SIAM Journal on Computing (2021).
[137] P. Cizeau and J.-P. Bouchaud, Journal of Physics A: Mathematical and General 26, L187 (1993).
[138] K. Janzen, A. Engel, and M. Mézard, Physical Review E 82, 021127 (2010).
[139] I. Neri, F. Metz, and D. Bollé, Journal of Statistical Mechanics: Theory and Experiment 2010, P01010 (2010).