Unlearnable Games and "Satisficing" Decisions: A Simple Model For A Complex World
We introduce the "SK-game", a discrete-time binary choice model inspired by mean-field spin-glasses. We show that even in a completely static environment, agents are unable to learn collectively-optimal strategies. This is either because the learning process gets trapped in a sub-optimal fixed point, or because learning never converges and leads to a never-ending evolution of agents' intentions. Contrary to the hope that learning might save the standard "rational expectation" framework in economics, we argue that complex situations are generically unlearnable and agents must make do with satisficing solutions, as argued long ago by Herbert Simon [1]. Only a centralized, omniscient agent endowed with enormous computing power could qualify to determine the optimal strategy of all agents. Using a mix of analytical arguments and numerical simulations, we find that (i) long memory of past rewards is beneficial to learning, whereas over-reaction to the recent past is detrimental and leads to cycles or chaos; (ii) increased competition destabilizes fixed points and leads first to chaos and, in the high-competition limit, to quasi-cycles; (iii) some amount of randomness in the learning process, perhaps paradoxically, allows the system to reach better collective decisions; (iv) non-stationary, "aging" behaviour spontaneously emerges in a large swath of the parameter space of our complex but static world. On the positive side, we find that the learning process allows cooperative systems to coordinate around satisficing solutions with rather high (but markedly sub-optimal) average reward. However, hyper-sensitivity to the game parameters makes it impossible to predict ex ante who will be better or worse off in our stylized economy.
formed social planner with colossal computing power was dictating their strategies to each agent.

Hence, our model provides an explicit example of Herbert Simon's satisficing principle [1]: with reasonable, boundedly rational methods, agents facing complex problems can only reach some satisfying and sufficing but sub-optimal solutions. In fact, in the absence of the omniscient and omnipotent social planner alluded to above, agents can hardly do better – the world in which they live is de facto unlearnable. This is the main message of our paper.

A further interesting twist of the model is that the fixed point reached by the learning process depends sensitively on the initial condition and/or the detailed structure of the interaction network. As a consequence, which agents will do better than average and which will do worse is totally unpredictable, even when the economic interactions between agents are fixed and known. Correspondingly, small changes in the interaction network (as a result of – say – small exogenous shocks, or as some agents die and others are born [15]) can lead to very large changes in individual strategies that agents would have to re-learn from scratch.

This feature actually corresponds to another possible definition of a "complex" system [17]: small perturbations can lead to disproportionately large reconfigurations of the system – think of sand (or snow) avalanches as an archetype of this type of behaviour: a single grain of sand might leave the whole slope intact or, occasionally, trigger a landslide [23] (see also, e.g., [4, 6, 22, 24–29] for early and more recent discussions of complexity ideas in the context of economics).

B. Related Ideas

Our work follows the footsteps of several important contributions on the subject of learning in economics, beyond the papers already quoted above, such as [1, 9], and in particular the complex two-player game of Galla & Farmer [14]. Another relevant work is the study of Grandmont [3] on the stability of economic equilibrium under different learning rules. Although his general conclusions are somewhat similar to ours, there are important differences, due to the fact that complexity, in our model, arises from the strong interaction between agents. The main differences are the following:

• While our agents can in some cases (i.e. when interactions are reciprocal) converge to a locally stable equilibrium, this equilibrium is rather inefficient compared to the best theoretical one, which is unattainable without the help of a central planner with super-powers. Furthermore, instead of a single equilibrium as in the Grandmont case, the equilibrium reached by our learning agents is one among an exponential number of possible equilibria, each sensitively dependent on the parameters of the model. In other words, which equilibrium is reached by the agents is totally unpredictable.

• As soon as some small amount of noise is present, Grandmont-style instabilities set in and the system starts exploring the set of possible equilibria, albeit in a slow, non-ergodic way. In other words, the stability of the visited equilibria increases with the age of the system.

• In the case where interactions are sufficiently non-reciprocal, there are no fixed points of the learning dynamics except the trivial one where agents play randomly at each turn (as in "rock-paper-scissors"). However, this is not what agents agree to do: they keep holding strong beliefs at each time step, such beliefs evolving chaotically in time – but now in an ergodic way.[30]

Another somewhat related idea can be found in the recent paper of Hirano & Stiglitz [31], where there can be a plethora of equilibrium trajectories, converging neither to a steady state nor even to a limit cycle, what they call "wobbly" macro-dynamics. We may finally mention the work of Dosi et al. [32], which shows that sophisticated learning rules can fail to improve individual and collective outcomes when interacting agents are heterogeneous. Although the context is quite different, such a finding appears to be consistent with the saturation of the average reward at sub-optimal satisficing levels that we observe when the memory loss rate of our learning agents vanishes.

C. Outline of the paper

The layout of the paper is as follows. In Sec. II, we introduce our spin-glass inspired game. We then review the model's main features and present what we believe are the most important conceptual results in the socio-economic context in Sec. III. In Sec. IV, we enter the technical analysis of the "SK-game" by discussing the existence and abundance of fixed point solutions. A similar analysis is conducted for short limit cycles when there are no fluctuations in the system in Sec. V. Having determined when these solutions may or may not exist, we take interest in the N → ∞ dynamics in Sec. VI by writing the Dynamical Mean-Field Theory of the problem. Combining the static and dynamic pictures, we focus on the fully rational (or "zero temperature") limit and in particular on the role of learning on the collective dynamics in Sec. VII. Section VIII finally considers the effect of fluctuations stemming from the bounded rationality of agents in their decision making process. In Sec. IX, we summarize our findings and discuss future perspectives as well as the relevance of our model for other complex systems such as biological neural networks.
II. A SIMPLE MODEL FOR A COMPLEX WORLD

A. Set-up of the Model

As a minimal, stylized model for decision making in a complex environment of interacting agents, we restrict ourselves to binary decisions, as in many papers on models with social interactions, see e.g. [15, 33–36]. At every timestep t, each agent i plays Si(t) = ±1, with i = 1, ..., N, which can be thought of, for example, as the decision of an investor to buy or to sell the stock market, or the decision of a firm to increase or to decrease production, etc. The incentive to play Si(t) = +1 is Qi(t), the agent's estimate of the payoff associated to Si(t) = +1 compared to that of Si(t) = −1. The actual decision of agent i is probabilistic and drawn using the classic "logit" rule [37], i.e. sampled from a Boltzmann distribution over the choices,

P[Si(t) = ±1] = e^{±βQi(t)} / (e^{βQi(t)} + e^{−βQi(t)}) = (1/2)[1 ± tanh(βQi(t))],   (1)

or, equivalently, the expected choice mi (or "intention") of agent i at time t is given by

mi(t) := ⟨Si(t)⟩ = tanh(βQi(t)).   (2)

The parameter β, assumed henceforth to be independent of i, is analogous to the inverse temperature in statistical physics and represents the agent's rationality or intensity of choice.[38] The limit β → ∞ corresponds to perfectly rational agents, in the sense that they systematically pick the choice corresponding to their preference (given by the sign of Qi(t)), while setting β = 0 gives erratic agents that randomly pick either decision with probability 1/2 regardless of the value of Qi(t). This will in fact turn out to be the case, in our model, in a whole region of parameter space: when β is smaller than some critical value βc, all intentions mi converge to zero, leading to a random string of decisions.

The evolution of the preference Qi(t) is where the learning takes place. We resort to so-called "Q-learning" [39], i.e. reinforcement learning with a memory loss parameter α. Given the (yet unspecified) reward ±Ri(t) associated to making the choice ±1 at time t, the evolution of incentives (and, in turn, beliefs) is given by

Qi(t + 1) = (1 − α)Qi(t) + αRi(t).   (3)

This map amounts to calculating an Exponentially Weighted Moving Average (EWMA) of the history of rewards Ri(t). Taking α = 0, the agent's preferences are fixed at their initial values, and we thus restrict ourselves to α > 0. When α → 0, Qi(t) is approximately given by the average reward over the last α^{-1} time steps. Note here that this averaging of past rewards is not exactly the same as the accumulation rule (where the reward would not be multiplied by α in Eq. (3)) appearing in some forms of "Experience Weighted Attraction" that are popular in the socio-economic context [40]. In fact one can always rescale the prefactor α into β, so our choice just means that Q does not artificially explode as α → 0, which would impose that β effectively diverges in that limit.

Now, the missing ingredient is the specification of the rewards, which encodes the heterogeneity and non-reciprocity of interactions. Inspired by the theory of spin-glasses, in particular by the Sherrington-Kirkpatrick (SK) model [41, 42], we set

Ri(t) = ∑_{j=1}^{N} Jij Sj(t).   (4)

Here, the matrix elements Jij specify the mutually beneficial or competitive nature of the interactions between i and j. (Note that Jij measures the impact of the decision of j on the reward of i.)

In the context of firm networks, a client-supplier relation would correspond to Jij > 0, whereas two firms i, j competing for the same clients would correspond to Jij < 0. In the so-called "Dean problem", Jij > 0 means that agents i and j get along well whereas Jij < 0 means that they are in conflict [43]. The sign of Si then determines in which of the two available rooms agent i should sit, in order to minimize the number of possible conflicts. A predator-prey situation is when Jij × Jji < 0, meaning that if i makes a gain, j makes a loss and vice versa.

Note that the reward Ri(t) depends on the actual (realized) decisions of other players, and not on their expected decisions or intentions. In other words, agents resort to online learning, which differs from offline learning where other players' decisions Si(t) are averaged over large batch sizes during which their inclinations would be assumed constant and replaced by their expectation mi(t).

Based on the learning dynamics, agents thus make a decision based on an imperfectly learned approximation of what other players are likely to do. Indeed, using Eq. (4) to express Qi(t) as the time-weighted sum (EWMA) of past rewards, plugging in Eq. (3), and finally replacing in Eq. (2), one finds that the evolution of agent i's intention writes

mi(t + 1) = tanh(β ∑_{j=1}^{N} Jij m̃^α_j(t)),   (5)

where m̃^α_j(t) can be interpreted as the estimate of agent j's expected decision at time t + 1 based on its past actions up to time t,

m̃^α_i(t) = α ∑_{t′≤t} (1 − α)^{t−t′} Si(t′).   (6)

Expressed in this form, it is clear that there is a characteristic timescale τα ∼ 1/α over which past choices contribute to the moving average m̃^α_i(t). Note that offline learning would correspond to a different evolution equation,
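To make this set-up concrete, the dynamics of Eqs. (1), (3) and (4) can be simulated in a few lines. The sketch below is ours, not a reference implementation: the decomposition of Jij into symmetric and antisymmetric Gaussian parts controlled by the asymmetry parameter ε used later in the text, the 1/N variance of the entries, and all parameter values are illustrative assumptions.

```python
import numpy as np

def make_couplings(N, eps, rng):
    """Assumed construction of the couplings: a symmetric Gaussian part
    weighted by (1 - eps/2) plus an antisymmetric part weighted by eps/2,
    entries of variance 1/N. eps = 0: fully reciprocal; eps = 2: zero-sum."""
    A = rng.normal(size=(N, N)) / np.sqrt(N)
    J = (1 - eps / 2) * (A + A.T) / np.sqrt(2) + (eps / 2) * (A - A.T) / np.sqrt(2)
    np.fill_diagonal(J, 0.0)
    return J

def run_sk_game(N=64, eps=0.0, alpha=0.1, beta=50.0, T=2000, seed=0):
    """Simulate the SK-game: logit decisions (Eq. (1)), rewards
    R_i = sum_j J_ij S_j (Eq. (4)), Q-learning update (Eq. (3))."""
    rng = np.random.default_rng(seed)
    J = make_couplings(N, eps, rng)
    Q = np.zeros(N)
    S = rng.choice([-1, 1], size=N)
    for _ in range(T):
        p_plus = 0.5 * (1 + np.tanh(beta * Q))   # Eq. (1)
        S = np.where(rng.random(N) < p_plus, 1, -1)
        R = J @ S                                 # Eq. (4)
        Q = (1 - alpha) * Q + alpha * R           # Eq. (3)
    return J, S, R

J, S, R = run_sk_game()
print("average reward per agent:", np.mean(S * R))
```

With long memory (small α) and nearly rational agents (large β), the reciprocal case settles onto a configuration where the average reward S·R/N is positive, as discussed in Sec. III.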
FIG. 1. Qualitative phase diagram in the (ε, α) plane for the SK-game in the noiseless (left) and weak noise (right) regimes. FP and (Q) refer to "fixed point" and "quasi" respectively. "Random" refers to the case where agents pick ±1 with equal probability (mi(t) = 0 ∀t), whereas "chaos" means that at a given instant of time players have well defined intentions mi but these evolve chaotically with time.

FIG. 2. Evolution of the average reward with (a) the memory loss rate α, for ε = {0, 0.6, 0.85, 1.05, 1.5, 2} from dark purple to light green, β → ∞; (b) the asymmetry ε for α = 0.01 and α = 0.1, β → ∞, the vertical dotted line indicating the value εc above which the system becomes chaotic [61]; (c) the noise level 1/β for α = 0.01, ε = {0, 0.6} (dark purple and blue respectively), note the non-monotonic behaviour; (d) system size N for α = 0.01, β → ∞, ε = {0, 0.6} (dark purple and blue respectively), continuous lines showing fits RN = R∞ − A N^{-2/3}, excluding N = 32. The dotted line indicates the best fit for the SK ground state, for which A ≈ 1.5 [62]. In (a), (b) and (c) N = 256 and the dashed line represents the Parisi solution for R∞.

agent i to play +1 or −1) is bound to be extremely difficult. Defining the average reward per agent as

RN := (1/N) ∑_i Si Ri = (1/N) ∑_{i,j=1}^{N} Si Jij Sj,   (12)

and noting that ∑_{i,j=1}^{N} Si Jij Sj = (1 − ε/2) ∑_{i,j=1}^{N} Si J^S_ij Sj, the largest possible average reward R∞ for N → ∞ can be exactly computed using the celebrated Parisi solution of the classical SK model and reads [42]:

lim_{N→∞} RN := R∞ = 0.7631... × (2 − ε).   (13)

However, in practice this optimal value can only be reached using algorithms that need a time exponential in N.[58] Hence, it is expected that simple learning algorithms will inevitably fail to find the true optimal solution. Nevertheless, we also know from the spin-glass folklore [42] (see below for more precise statements) that many configurations of {Si} correspond to quasi-optima or, in the language of H. Simon, satisficing solutions [1] (see also [21]). It is in a sense the proliferation of such sub-optimal solutions that prevents simple algorithms from finding the optimum optimorum. Furthermore, if learning indeed converges (which is not the case when ε is too large, i.e. when interactions are not reciprocal enough), the obtained fixed point depends heavily on the initial condition and/or the specific interaction matrix J.

A. Phase Diagram in the Noiseless Limit

Let us first consider the case where agents always choose the action that would have had the best average reward in the past, Qi(t) (assuming that other agents still played what they played). This corresponds to the noiseless learning limit β → ∞. In this case, the iteration map Eq. (5) becomes

Si(t + 1) = sign(∑_{t′≤t} (1 − α)^{t−t′} ∑_{j=1}^{N} Jij Sj(t′)),   (14)

and the model is fully specified by two parameters: α (controlling the memory time scale of the agents) and ε (controlling the reciprocity of interactions). For N not too large, the evolution of Eq. (14) leads to either fixed points, or oscillations, or else chaos. The schematic phase diagram in the (α, ε) plane is shown in Fig. 1.

One clearly sees a region for (α, ε) small where learning reaches a fixed point, in which the average reward is close to, but significantly below, the theoretical optimum R∞ given by Eq. (13), see Fig. 2.[59] Note that learning definitely helps: for ε = 0, most fixed points are characterized by a typical reward R ≈ 1.01 [60], significantly worse than the value ≈ 1.40 reached by our learning agents, extrapolated to N → ∞.

As ε and α are varied one observes the following features:

• When ε is not too large (interactions sufficiently reciprocal) and α increases (shorter and shorter memory), learning progressively ceases to converge
to a fixed point, and agents fail to coordinate on a mutually beneficial equilibrium. A similar effect was observed in dynamical models of supply chains, where over-reaction leads to oscillating prices and production – the so-called bullwhip effect [63] (see also [22, 64]).

• Conversely, when α is small (long memory) and ε increases, the probability to reach a fixed point progressively decreases, and when a fixed point is reached, the average reward is reduced – see Fig. 2(b). Beyond some threshold value, the dynamics becomes completely chaotic, leading to a further loss of reward. Note that "chaos" here means that although agents have well defined intentions mi(t) ≠ 0 at any given instant of time, these intentions evolve chaotically forever.[65]

• Surprisingly, oscillations reappear when ε becomes larger than unity, i.e. when interactions are mostly anti-symmetric, "predator-prey" like. Perhaps reminiscent of the famous Lotka-Volterra model, agents' decisions and payoffs become periodic, with a period that scales anomalously as α^{-1/2}, i.e. much shorter than the natural memory time scale α^{-1}. Although not the Nash equilibrium mi ≡ 0, this oscillating state allows the average reward to be positive, even when at each instant of time some agents have negative rewards.

• Only in the extreme competition limit ε = 2 (corresponding to a zero-sum game, Eq. (13)) and for small α are agents able to learn that the unique Nash equilibrium is to play random strategies mi ≡ 0 (see Fig. 1, bottom right region).[66]

• Finally, in the extreme (and unrealistic) case α = 1, where agents choose their strategy based only on the last reward, the system evolves, as ε increases from zero, from high-frequency oscillations with period L = 2 to "weak chaos", to "strong chaos" when ε ≈ 1, and finally back to oscillations of period L = 4 when ε → 2.

FIG. 3. Evolution of the steady-state two-point correlation function for α = 1, β → ∞; markers indicate simulations of the game at N = 256 averaged over 128 samples of disorder and initial conditions, error bars show 95% confidence intervals, and continuous lines represent the solution of the DMFT equation (see Eq. (30) below). (a) ε = 0.1 (ε < εc ≈ 0.8), cycles of length L = 2. (b) ε = 0.85 (εc < ε < 1), "weakly" chaotic behavior. (c) ε = 1.05 (1 < ε < 2 − εc), "strongly" chaotic behavior. (d) ε = 1.5 (ε > 2 − εc), cycles of length L = 4.

The autocorrelation functions corresponding to the different cases described above are plotted in Fig. 3. Note that the signature of oscillations of period L is that C(nL) ≡ 1 for all integer n. However, note that when ε < εc, not all spins flip at each time step. The fact that C(2n + 1) = 0 means that half of the spins in fact remain fixed in time, while the other half oscillate in sync between +Si and −Si.[68] In the chaotic phases, C(τ) tends to zero for large τ, with either underdamped or overdamped oscillations. Hence in these cases the configuration {Si} evolves indefinitely with time, and hardly ever revisits the same states.
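The noiseless map Eq. (14), the average reward Eq. (12), and a two-point correlation diagnostic in the spirit of Fig. 3 can be put together in a short numerical sketch. This is our own illustration: the coupling construction with asymmetry ε and the definition C(τ) = (1/N)⟨S(t)·S(t+τ)⟩ are assumptions, as are all parameter values. At small α and ε = 0 the dynamics settles on a fixed point, while at α = 1 the map reduces to parallel sign dynamics and C(2n) = 1 signals fixed points or period-2 oscillations.

```python
import numpy as np

def make_couplings(N, eps, rng):
    # Assumed construction: symmetric + antisymmetric Gaussian parts, variance 1/N.
    A = rng.normal(size=(N, N)) / np.sqrt(N)
    J = (1 - eps / 2) * (A + A.T) / np.sqrt(2) + (eps / 2) * (A - A.T) / np.sqrt(2)
    np.fill_diagonal(J, 0.0)
    return J

def run_noiseless(J, alpha, T, rng):
    # Eq. (14): each agent plays the sign of the EWMA of its past rewards,
    # implemented through the Q-recursion of Eq. (3).
    N = J.shape[0]
    S = rng.choice([-1, 1], size=N)
    Q = alpha * (J @ S)
    traj = []
    for _ in range(T):
        S = np.where(Q >= 0, 1, -1)
        Q = (1 - alpha) * Q + alpha * (J @ S)
        traj.append(S.copy())
    return np.array(traj)

def average_reward(J, S):
    # Eq. (12): R_N = (1/N) sum_{ij} S_i J_ij S_j.
    return S @ J @ S / len(S)

def correlation(traj, tau_max):
    # Assumed steady-state definition: C(tau) = (1/N) <S(t) . S(t+tau)>_t.
    T, N = traj.shape
    return np.array([np.mean(np.sum(traj[:T - tau] * traj[tau:], axis=1)) / N
                     for tau in range(tau_max + 1)])

rng = np.random.default_rng(0)
J = make_couplings(N=64, eps=0.0, rng=rng)

# Long memory, reciprocal interactions: convergence to a fixed point.
traj = run_noiseless(J, alpha=0.05, T=3000, rng=rng)
print("reward at small alpha:", average_reward(J, traj[-1]))

# alpha = 1: memory of the last reward only, S(t+1) = sign(J S(t)).
traj1 = run_noiseless(J, alpha=1.0, T=700, rng=rng)
C = correlation(traj1[500:], tau_max=4)
print("C(0..4):", np.round(C, 3))
```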
when ε → 2. B. Finite N , Large t vs. Finite t, Large N
point correlation C(τ) is not equal to one for all τ (which would be the case for a fixed point) but reaches a positive plateau value for large τ: C(τ → ∞) = C∞ > 0. Only when ε = 0 does one find C∞ = 1. The same holds for quasi-cycles of length L if one considers the correlation function computed for τ = nL, with n an integer: C(nL → ∞) = C∞ > 0, with C∞ = 1 when ε = 2.

The schematic phase diagram drawn in Fig. 1 then continues to hold in the large N, finite t limit, provided one interprets "fixed points/cycles" as "quasi-fixed points/cycles" in the sense defined above. More details on the subtle role of α, N and t will be provided in the next, technical sections.

FIG. 4. Distribution of individual rewards at β → ∞ for N = 512 and 128 initial conditions and realizations of the disorder in the fully symmetric case ε = 0, for α = {0.5, 0.1, 0.01} (dark to light coloring). The dashed line is the Sommers-Dupont analytical solution of the SK model [69]. Inset: associated survival function in lin-log scale, focusing on the right tail; the dotted line corresponds to a Gaussian fit.

C. Noisy Learning & "Aging"
In the presence of noise, the "convictions" |mi| of agents naturally decrease, and in fact become zero (i.e. decisions are totally random) beyond a critical noise level that depends on the asymmetry parameter ε: more asymmetry leads to more fragile convictions (see Fig. 16 below for a more precise description).

When the noise is weak but non-zero, strict fixed points do not exist anymore, but are replaced (for ε small enough) by quasi-fixed points – in the sense that the intentions mi fluctuate around some plateau value for very long times, before evolving to another configuration completely uncorrelated with the previous one. This process goes on forever, albeit at a rate that slows down with time: plateaus become more and more permanent. This is called "aging" in the context of glassy systems.

In a socio-economic context, it means that a form of quasi-equilibrium is temporarily reached by the learning process, but such a quasi-equilibrium will be completely disrupted after some time, even in the absence of any exogenous shocks. This is very similar to the quasi-nonergodic scenario recently proposed in Ref. [15], although in our case the evolution time is not constant but increases with the "age" of the system, i.e. the amount of time the game has been running.

Perhaps counter-intuitively, however, the role of noise is on average beneficial when 1/β is not too large. Indeed, as shown in Fig. 2, the average reward first increases as a small amount of noise is introduced, before reaching a maximum beyond which "irrationality" becomes detrimental. The intuition is that without noise, the system gets trapped by fixed points with large basins of attraction, but lower average rewards. A small amount of noise allows agents to reassess their intentions and collectively reach more favorable quasi-fixed points, much as with simulated annealing or stochastic gradient descent.

When learning leads to a chaotic evolution, i.e. when Jij and Jji are close to uncorrelated (ε ∼ 1), noise in the learning process does not radically change the evolution of the system: deterministic chaos just becomes noisy chaos. However, there is still a distinction between a low-noise phase where, at each instant of time, agents have non-zero expected decisions mi (that will evolve over time) and a high-noise phase where agents always make random choices between ±1 with probability 1/2 (see Fig. 1).

Finally, in the case where learning leads to cycles, any amount of noise irremediably disrupts the synchronisation process, and cycles are replaced by pseudo-cycles, with either underdamped or overdamped characteristics. In the limit ε → 2, fluctuations drive the system to a paramagnetic state where q = C(0) = 0 (see Fig. 16 below), meaning the agents remain undecided.

D. Individual Rewards

As we have noted above, the average reward is close to, but significantly below, the theoretical optimum R∞ given by Eq. (13). However, some agents are better off than others, in the sense that the individual reward ei at the fixed point (when fixed points exist) differs from agent to agent. Noting that

ei := S⋆i R⋆i = |∑_j Jij S⋆j|,   (16)

where the second equality holds because at the fixed point one must have S⋆i = sign(R⋆i), it is clear that in the fully reciprocal case ε = 0 all rewards ei are positive. The distribution ρ(e) of these rewards over agents is expected to be self-averaging for large N, i.e. independent of the specific realisation of the Jij and of the initial condition. Such a distribution is shown in Fig. 4. One notices that ρ(e) vanishes linearly when e → 0,

ρ(e) ≈ κe (e → 0), κ ≈ 1.6,
as for the standard SK model, although the value of κ is distinctly different from the one obtained for the true optimal states of the SK model, for which κSK ≈ 0.6 [69, 70]. Such a discrepancy is expected, since the fixed points are obtained as the long-time limit of the learning process – in particular, since κ > κSK, the number of poorly rewarded agents is too high compared to what it would be in the truly optimal state. Note that once α is sufficiently small for the system to reach a fixed point, its precise value does not seem to have an impact on the distribution of rewards and κ.

Another important remark is that the distribution of rewards ρ(e) does not develop a "gap" for small e, i.e. a region where ρ(e) is exactly zero. In other words, although all agents have positive rewards, some of them are very small. This is associated with the so-called "marginal stability" of the equilibrium state [71], to wit, its fragility with respect to small perturbations, as discussed in more detail in the next subsection.

For very large e, the distribution ρ(e) decreases like a Gaussian (Fig. 4 inset), corresponding to Central Limit Theorem behaviour in that regime, as for the SK model.[72] Fig. 5 shows how the rewards of individual agents evolve from an initially random configuration, before settling to constant (but heterogeneous) values at the fixed point.

As competitive effects get stronger (i.e. as ε increases) and the system ceases to reach a fixed point, the distribution ρ(e) develops a tail for negative values of e, meaning that some agents make negative gains, see Fig. 6. In the extreme "predator-prey" limit ε = 2, the distribution ρ(e) becomes perfectly symmetric around e = 0, as expected – see Fig. 6, rightmost plot. However, note that there is no persistence in time of the winners: the individual reward autocorrelation function eventually decays to zero, possibly with oscillations in the competitive region.

FIG. 5. Evolution of individual rewards in time for N = 256, α = 0.01, β → ∞, (a) ε = 0 and (b) ε = 0.6. Right: histogram of the individual rewards after a single timestep (shaded) and at the final time (unshaded).

Now, the interesting point about our model is best seen through the cross-overlap

C×N := (1/N) ∑_i (e^a_i − ⟨e^a⟩)(e^b_i − ⟨e^b⟩), a ≠ b,   (17)

where a, b correspond to two different initial conditions and ⟨e⟩ denotes the cross-sectional average reward. As shown in Fig. 7, C×N goes to zero at large N, indicating that the final outcome of the game, in terms of the winners and the losers, cannot be predicted. The dependence on N appears to be non-trivial, with different exponents governing the decay of the mean overlap C×N (decaying as N^{-0.85}) and its standard deviation (decaying as N^{-2/3}).

A similar effect would be observed if, instead of changing the initial condition, one were to randomly change the interaction matrix J by a tiny amount ϵ. The statement here is that for any small ϵ, C×N goes to zero for sufficiently large N. This is called "disorder chaos" in the context of spin-glasses [73]; by analogy with known results for the SK model, we conjecture that C×N is a decreasing function of N ϵ^ζ, where ζ is believed to be equal to 3 in the SK case [74]. This means that when N ≫ ϵ^{-ζ}, the rewards in two systems with nearly the same interaction structure, starting from the same initial conditions, will be close to independent.

Such a sensitive dependence of the whole equilibrium state of the system (in our case the full knowledge of the intentions mi of all agents) prevents any kind of "common knowledge" assumption about what other agents will decide to do in a specific environment. No reasonable learning process can lead to a predictable outcome; even a benevolent social planner assigning their optimal strategy to all agents would not be able to do so without a perfect knowledge of all interactions between agents and without exponentially powerful (in N) computing abilities. Such a "radically complex" situation leads to "radical uncertainty" in the sense that the behaviour of agents, even rational, cannot be predicted. Learning agents can only achieve satisficing solutions, which are furthermore hypersensitive to details. As we have seen in Sec. III C, any amount of noise in the learning process will make the whole system "jump" from one satisficing solution to another in the course of time.

F. Increasing Cooperativity

A way to help agents coordinate is to use rewards given by Eq. (11) with J0 > 0, representing a non-zero average
FIG. 6. Distribution of individual rewards for N = 256 and α = {1, 0.5, 0.01} (black triangles, purple squares and green circles respectively), β → ∞, measured over 32 initial conditions and realizations of the disorder. From left to right: ε = {0, 0.1, 0.85, 1.05, 1.5, 2}.
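These distributional features can be probed with a short β → ∞ simulation. The sketch below is ours (the coupling construction and all parameters are assumptions): at a fixed point of the reciprocal game ε = 0, Eq. (16) gives e_i = |∑_j Jij S⋆_j| ≥ 0, while in the zero-sum limit ε = 2 the total reward ∑_i e_i vanishes identically, so the cross-sectional mean is zero.

```python
import numpy as np

def sk_game_rewards(N, eps, alpha, T, seed):
    """Noiseless (beta -> infinity) learning run; returns individual
    rewards e_i = S_i R_i at the final configuration (cf. Eq. (16)).
    Coupling construction (symmetric/antisymmetric Gaussian parts,
    variance 1/N) is an assumption."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(N, N)) / np.sqrt(N)
    J = (1 - eps / 2) * (A + A.T) / np.sqrt(2) + (eps / 2) * (A - A.T) / np.sqrt(2)
    np.fill_diagonal(J, 0.0)
    S = rng.choice([-1, 1], size=N)
    Q = alpha * (J @ S)
    for _ in range(T):
        S = np.where(Q >= 0, 1, -1)
        Q = (1 - alpha) * Q + alpha * (J @ S)
    return S * (J @ S)

# Fully reciprocal case: at a fixed point all individual rewards are positive.
e0 = sk_game_rewards(N=128, eps=0.0, alpha=0.05, T=4000, seed=3)
print("eps = 0 : min e_i =", round(float(e0.min()), 3),
      ", mean e_i =", round(float(e0.mean()), 3))

# Zero-sum limit eps = 2: J is antisymmetric, so sum_i e_i = S.J.S = 0 exactly.
e2 = sk_game_rewards(N=128, eps=2.0, alpha=0.05, T=4000, seed=3)
print("eps = 2 : mean e_i =", round(float(e2.mean()), 3))
```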
FIG. 7. Cross-overlap C×N for N = {128, 256, 512} (caption not preserved).
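The cross-overlap of Eq. (17), shown in Fig. 7, can be estimated by running the noiseless game twice on the same couplings from two different initial conditions. This is a sketch of ours: the coupling construction, parameters and seeds are assumptions, not values from the text.

```python
import numpy as np

def fixed_point_rewards(J, seed, alpha=0.05, T=4000):
    """beta -> infinity learning run; returns individual rewards e_i = S_i R_i."""
    rng = np.random.default_rng(seed)
    N = J.shape[0]
    S = rng.choice([-1, 1], size=N)
    Q = alpha * (J @ S)
    for _ in range(T):
        S = np.where(Q >= 0, 1, -1)
        Q = (1 - alpha) * Q + alpha * (J @ S)
    return S * (J @ S)

def cross_overlap(e_a, e_b):
    """Eq. (17): C_N^x = (1/N) sum_i (e_i^a - <e^a>)(e_i^b - <e^b>)."""
    return np.mean((e_a - e_a.mean()) * (e_b - e_b.mean()))

rng = np.random.default_rng(11)
N = 128
A = rng.normal(size=(N, N)) / np.sqrt(N)
J = (A + A.T) / np.sqrt(2)        # same couplings for both runs, eps = 0
np.fill_diagonal(J, 0.0)

e_a = fixed_point_rewards(J, seed=1)   # two runs differing only in the
e_b = fixed_point_rewards(J, seed=2)   # random initial condition
print("cross-overlap:", float(cross_overlap(e_a, e_b)),
      " reward variance:", float(e_a.var()))
```

Note that cross_overlap(e, e) reduces to the variance of the rewards, so the ratio C×N / var(e) measures how much of the "who wins, who loses" structure is reproducible across runs.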
namics, we have observed that the fraction of trajectories converging to seemingly dynamically inaccessible configurations significantly increases, especially when α ≪ 1. While further work is required to precisely assess the effectiveness of learning with self-reinforcement (particularly as finite size effects appear to play a significant role), such a result is consistent with the overall influence of learning reported here.

H. Core Messages

In line with the conclusions of Galla & Farmer [14], our multi-agent binary decision model provides an explicit counter-example to the idea that learning could save the rational expectation framework (cf. Sec. I and Ref. [9]). Learning, in general, does not converge to any fixed point, even when the environment (in our case the interaction matrix J) is completely static: non-stationarity is self-induced by the complexity of the game that agents are trying to learn, as also recently argued in [13].

When learning does indeed converge (which requires, at a minimum, a high level of reciprocity between agents), the collective state reached by the system is far from the optimal state, which only a benevolent, omniscient social planner with formidable powers can achieve. In other words, even more sophisticated learning rules would not really improve the outcome: the SK-game is unlearnable and – as argued by H. Simon [1] – agents must resort to suboptimal, satisficing solutions.

Furthermore, any small random perturbation (noise in the learning process, or slow evolution in the environment) eventually destabilises any fixed point reached by the learning process, and completely reshuffles the collective state of the system: in the long run, agents initially favoring the +1 decision end up favoring −1, and better-off agents end up being the underdogs, and vice-versa (much as in the simpler model of Ref. [15]).

Finally, even in the most favourable case of a fully reciprocal game with slow learning, the average reward is in fact improved when some level of noise (or irrationality) is introduced in the learning rule, before degrading again for large noise.

IV. FIXED POINT ANALYSIS AND COMPLEXITY

We have seen that our model displays a wide variety of dynamical behaviours. We will mostly focus, in the following, on the long memory case α ≪ 1, which is most relevant for thinking about learning in a (semi-)realistic context. In this case, one can show that the exponential moving average of the realized values Si(t) converges to that of the expected values mi(t). Indeed, as detailed in Appendix A,

⟨(α ∑_{t′≤t} (1 − α)^{t−t′} (mi(t′) − Si(t′)))²⟩ ≤ α/(2 − α) → 0 as α → 0.   (18)

This means that up to fluctuations of order √α, we can describe the dynamics of the system through a deterministic iteration on mi(t), in fact corresponding to offline learning. (We will see below that the neglected fluctuations are of order √α/β.)

Further making the ansatz that the mean-field dynamics will eventually reach a fixed point mi(t) = m⋆i ∀i given sufficient time, Eq. (5) then yields

m⋆i = tanh(β ∑_j Jij m⋆j).   (19)

This equation is known in the spin-glass literature as the Naive Mean-Field Equations (NMFE) when ε = 0, and defines a so-called static Quantal Response Equilibrium, similar to its fully mean-field equivalent (Jij = J/N) studied in [56].

To the reader familiar with the physics of disordered systems, this equation is immediately reminiscent of the celebrated Thouless-Anderson-Palmer (TAP) equation [77] describing the mean magnetization in the Sherrington-Kirkpatrick (SK) spin-glass [41]. Physically, the NMFE is satisfied when minimizing the free energy of a system of N sites comprising M → ∞ binary spins, with sites interacting through an SK-like Hamiltonian [78]. Despite being seemingly simpler than its previously mentioned TAP counterpart, which includes an additional Onsager "reaction term", the NMFE shares many of its properties. Relevant to our problem, both the NMFE and the TAP equations have a paramagnetic phase (m⋆i = 0 ∀i) for β < βc, while above this critical value there is a spin-glass phase where q⋆ = N^{-1} ∑_i (m⋆i)² > 0 and solutions are exponentially abundant in N [78–80]. The NMFE has a critical temperature 1/βc = 2, as opposed to 1/βc = 1 in the TAP case, while the two equations become strictly equivalent in the β → ∞ limit.

Using known properties from the spin-glass literature, we can therefore already establish that if the system
of complex collective dynamics. Only in some cases does reaches a fixed point when interactions are fully recip-
learning converge to non-trivial fixed points where strate- rocal (ε = 0) and memory is long ranged, it will be either
gies are probabilistic but with time independent proba- a trivial fixed point where agents continue making ran-
bility p± to play ±1, such that p± = (1±m⋆i )/2 for agent i. dom decisions for ever (m⋆i = 0 ∀i), or, when learning
Such a steady-state would be analogous to an economic is not too noisy (β > βc ) the number of fixed points is
equilibrium (although it is essential to dissociate this no- ∼ exp[Σ(β)N ], where Σ(β) is called the “complexity”.
tion from that of a thermodynamic equilibrium, which In this second case, the fixed point actually reached by
may only exist in the case of fully reciprocal interactions, learning depends sensitively on the initial conditions and
ε = 0). the interaction matrix J.
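The fixed-point condition (19) is easy to probe numerically. The sketch below is our own illustration, not the model's learning dynamics: we assume symmetric Gaussian couplings of variance 1/N and add a damping factor `gamma` for stability, since a plain parallel update of the NMFE can fall onto a period-2 cycle.

```python
import numpy as np

rng = np.random.default_rng(0)
N, beta, gamma = 100, 4.0, 0.1   # beta well above beta_c = 1/2

# Fully reciprocal (eps = 0) SK-like couplings: symmetric, variance 1/N
X = rng.normal(0.0, 1.0 / np.sqrt(N), (N, N))
J = (X + X.T) / np.sqrt(2)
np.fill_diagonal(J, 0.0)

# Damped iteration of the NMFE m = tanh(beta * J m)
m = rng.uniform(-1, 1, N)
for _ in range(20000):
    m = (1 - gamma) * m + gamma * np.tanh(beta * J @ m)

residual = np.max(np.abs(m - np.tanh(beta * J @ m)))  # should be small
q = np.mean(m**2)                                     # > 0: glassy solution
print(residual, q)
```

For β below β_c the same iteration collapses onto the paramagnetic solution m = 0, whereas above β_c it settles onto one of the many non-trivial solutions, with the one selected depending on the initial condition, consistent with the discussion above.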
How is this standard picture altered when interactions are no longer reciprocal? In such cases, the system cannot be described using the equilibrium statistical mechanics machinery.

A. Critical Noise Level

In order to extend the notion of critical noise β_c to ε > 0, one can naively look at the linear stability of the paramagnetic solution m⋆_i = 0 ∀i to Eq. (19). Just as in the TAP case [42], expanding the hyperbolic tangent to second order and projecting the vector of m_i on an eigenvector of J, the stability condition can be expressed through the largest eigenvalue of the interaction matrix. Adapting known results from random matrix theory to our specific problem formulation, the spectrum of J can be expressed as an interpolation between a Wigner semi-circle on the real axis (ε = 0), the Ginibre ensemble (ε = 1) and a Wigner semi-circle on the imaginary axis (ε = 2) [81]. The resulting critical "temperature" is then given by

T_c(ε) = 1/β_c(ε) = (2 − ε)² / (2√(υ(ε))),   (20)

recovering the known result 1/β_c = 2 for the case ε = 0. (We recall that we have set the interaction variance σ² to unity throughout the paper. If needed, σ can be reinstated by the rescaling β → βσ.)

B. The Elusive Complexity

To determine if there is still an exponential number of fixed points to reach below the candidate critical noise level, i.e. if there is a spin-glass phase for β > β_c(ε) when ε > 0, we should study the complexity, defined for a single realization of the disorder as

Σ(β, ε) = lim_{N→∞} (1/N) log N_J(N, β, ε),   (21)

where N_J is the number of fixed points in the system for a given interaction matrix. There are then two ways to compute an average of this quantity over the disorder: the "quenched" complexity, where the mean of the logarithm of the number of solutions is considered, and its "annealed" counterpart, where the logarithm is taken of the mean number of solutions. The former is usually considered to be more representative, as unlikely samples leading to an abnormally large number of solutions can be observed to dominate the latter (see e.g. [21, 82] for recent examples), but it requires a more involved calculation with the use of the so-called "replica trick" [42]. In the TAP case, quenched and annealed complexities coincide for solutions above a certain free-energy threshold [60] (where most solutions lie, but importantly not the ground state).

As a matter of fact, even in the annealed case, the computation of the TAP complexity has proved to be a formidable task, and has sparked a large amount of controversy, as the original solution computed by Bray and Moore (BM) [60] was put into question before being (partially) salvaged by the metastability of TAP states in the thermodynamic limit [83]. For a relatively up-to-date summary of the situation, we refer the reader to G. Parisi's contribution in [84].

While the BM approach can be adapted to the NMFE [79, 85], several aspects of the calculation remain unclear, particularly as the absence of a sub-dominant reaction term means that the argument of the metastability of states in the N → ∞ limit is no longer valid a priori, although numerical results support the marginally stable nature of NMFE fixed points in the thermodynamic limit [79]. We leave its extension to ε > 0 to a later dedicated work.

Nevertheless, the previously introduced critical β_c and the existing computation of the number of fixed points as a function of ε in the β → ∞ limit [61, 86] can be used to conjecture the boundaries of the region in (β, ε) space where the complexity Σ is non-vanishing. Indeed, in the zero temperature case, it has been shown [61] that the annealed complexity can be expressed as a function of the asymmetry parameter η defined in Eq. (10) as

Σ(η) = −(η/2) x² + log 2 + log Φ(ηx),   (22)

with Φ the Gaussian cumulative density and x the solution to

x Φ(ηx) = Φ′(ηx),   Φ(x) := (1/2) erfc(−x/√2).   (23)

The main insight provided by this result is that the complexity vanishes at η = 0, corresponding to ε = 1, where the paramagnetic fixed point is supposed to be unstable since β_c(ε = 1) = √2. As the complexity is a decreasing function of temperature, this therefore means that ε = 1 is an upper limit for the existence of fixed points when β is finite. This conjecture is also consistent with the breakdown of fixed point solutions to the dynamical mean field theory below the critical noise level that will be discussed in Sec. VIII A, as well as with the saddle point equations obtained when adapting the BM calculation to ε > 0.

Combining these two somewhat heuristic delimitations for the existence of a large number of non-trivial fixed points, we obtain the critical lines shown in Fig. 9(a). Overlaying these borders with the annealed complexity measured numerically, we find a very good agreement. In particular, the vanishing of the complexity at ε = 1 appears to be consistent for T < T_c(ε = 1) = 1/√2, as shown in Fig. 9(b). The agreement with the β → ∞ analytical result, represented by the continuous line, also appears to validate our counting method at low temperatures. Note that one can in fact show that for ε > 1 and N = ∞, the only fixed point (or Nash equilibrium) is the "rock-paper-scissors" equilibrium m⋆_i = 0, ∀i.
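Equations (22)-(23) are straightforward to evaluate; the minimal sketch below (the function and variable names are ours) recovers the Tanaka-Edwards value Σ ≈ 0.1992 quoted in the caption of Fig. 9 for the fully symmetric case η = 1:

```python
import math

def sigma(eta, tol=1e-12):
    """Annealed fixed-point complexity at beta -> infinity, Eqs. (22)-(23)."""
    Phi = lambda x: 0.5 * math.erfc(-x / math.sqrt(2))             # Gaussian CDF
    phi = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)  # its derivative
    x = 0.5
    for _ in range(200):                  # solve x * Phi(eta x) = Phi'(eta x)
        x_new = phi(eta * x) / Phi(eta * x)
        if abs(x_new - x) < tol:
            break
        x = x_new
    return -0.5 * eta * x * x + math.log(2.0) + math.log(Phi(eta * x))

print(round(sigma(1.0), 4))   # fully symmetric case: 0.1992
print(round(sigma(1e-9), 4))  # eta -> 0 (eps = 1): complexity vanishes
```

Evaluated on a grid of η, the same routine reproduces the β → ∞ curve of Fig. 9(b), once η is mapped back to ε via Eq. (10).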
FIG. 9. (a) Annealed complexity of the Naive Mean-Field Equation in (T, ε) space, where we recall T = β⁻¹, measured numerically for N = 40. The dashed line represents the critical temperature T_c(ε) for which the paramagnetic fixed point ceases to be stable, while the continuous line indicates ε = 1, inferred from the β → ∞ result. (b) Annealed complexity as a function of ε for varying temperatures T < 1/√2, i.e. in the bottom region of (a), where the complexity vanishes at ε = 1. The continuous line represents the β → ∞ analytical solution, recovering the result of Tanaka and Edwards [87], Σ ≈ 0.1992 for ε = 0. For ε > 1 and T = 0, the only possible fixed point is m⋆_i = 0, ∀i.

V. COUNTING LIMIT CYCLES

In the previous section, we have established the region of parameter space where exponentially numerous fixed points exist, which might possibly be reached by learning in the slow limit α ≪ 1. However, limit cycles of various lengths turn out to also be exponentially numerous when ε < 1, so we need to discuss them as well before understanding the long term fate of the learning process within our stylized complex world.

A. Cycles without Memory (α = 1)

In the memory-less limit, the dynamics becomes that of the extensively studied Hopfield model [86, 88, 89], where the binary variable represents the activation of a neuron evolving as

S_i(t + 1) = sign(∑_j J_ij S_j(t)),   (24)

with parallel updates. Counting limit cycles of length L is even more difficult than counting fixed points (which formally correspond to L = 1). Some progress has been reported by Hwang et al. [61] in the memory-less case α = 1. The notion of fixed point complexity Σ (defined in Eq. (21)) can be extended to the limit cycle complexity Σ_L for limit cycles of length L, with Σ_{L=1} ≡ Σ. The results of Hwang et al. [61] can be summarized as follows:

• When ε < 1, limit cycles with L = 2 have the largest complexity, which is exactly twice the fixed point complexity: Σ₂ = 2Σ₁ (as was in fact previously shown by Gutfreund et al. [86]).

• The complexities Σ_L(ε) all go to zero when ε = 1.

• When 1 < ε ≤ 2, limit cycles with L = 4 dominate, with Σ₄(ε) ≥ Σ₂(2 − ε).

• Close to ε = 1, the cut-off length L_c, beyond which limit cycles become exponentially rare, grows exponentially with N: L_c ∼ e^{aN}, where a weakly depends on ε.

From this analysis, one may surmise that:

a. When a limit cycle is reached by the dynamics, it is overwhelmingly likely to be of length L = 2 for ε < 1 and of length L = 4 for 1 < ε ≤ 2.

b. Even if exponentially less numerous, exponentially long cycles will dominate when e^{aN} > e^{NΣ₂}, which occurs when ε_c < ε < 2 − ε_c, with ε_c ≈ 0.8.

These predictions are well obeyed by our numerical data, see Figs. 3, 11. Note however the strong finite N effects that show up in the latter figure, which we discuss in the next sections.

B. Cycles with Memory (α < 1)

When α < 1 and β = ∞, the update of S_i(t) is given by Eq. (14); the dynamics has the same fixed points independently of α, but of course different limit cycles, which may in fact cease to exist when α is small. In this section, we attempt to enumerate the number of cycles of length L in the spirit of the calculation of Hwang et al. [61] for α < 1. As detailed in Appendix B, we write the number of these cycles as a sum over all possible trajectories of a product of δ functions ensuring that the α < 1 dynamics of the Q_i are satisfied between two consecutive time-steps, while a product of Heaviside step functions enforces S_i(t) = sign(Q_i(t)). Introducing the integral representation of the δ function, averaging over the disorder and making appropriate changes of variable to decouple the N dimensions, the (annealed) complexity of cycles of length L writes

Σ_L(α, η) = saddle_{R̂,K̂,V̂} { ∑_{s<t} iR̂(t, s) iK̂(t, s) − (η/2) ∑_{t,s} V̂(t, s)V̂(s, t) + log I_L },   (25)

where R̂(t, s) and K̂(t, s) are symmetric matrices while V̂(t, s) is not a priori, and I_L is explicitly given in the
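The dominance of short cycles for reciprocal couplings is easy to observe directly: for ε = 0 the parallel dynamics (24) is in fact guaranteed to settle onto a fixed point or a 2-cycle. A minimal sketch (our own illustration, with symmetric Gaussian couplings of variance 1/N):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50
X = rng.normal(0.0, 1.0 / np.sqrt(N), (N, N))
J = (X + X.T) / np.sqrt(2)        # fully reciprocal case, eps = 0
np.fill_diagonal(J, 0.0)

S = rng.choice([-1.0, 1.0], N)
history = [S.copy()]
for _ in range(5000):             # parallel (synchronous) updates, Eq. (24)
    S = np.where(J @ S >= 0, 1.0, -1.0)
    history.append(S.copy())

# For symmetric J, synchronous dynamics ends on a cycle of length <= 2;
# L = 2 is the typical outcome, in line with the counting argument above
is_fixed = np.array_equal(history[-1], history[-2])
is_two_cycle = np.array_equal(history[-1], history[-3])
print(is_fixed or is_two_cycle)
```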
FIG. 10. (a) L = 2 cycle complexity from the numerical resolution of the saddle point equations for ε = {0, 0.2, 0.4, 0.6} from dark purple to light green, dashed lines indicating the fixed point complexity associated to each parameter. (b) Order parameter at the L = 2 saddle, showing the nontrivial coalescence of the cycle and fixed point saddle points as α is decreased. The numerical resolution appears to break down when we get close to α = 0.

FIG. 11. Steady-state two-point correlation between configurations shifted by τ = 2 time-steps in the α = 1, β → ∞ limit from finite N numerical simulations averaged over 128 samples of disorder and initial conditions, error bars showing 95% confidence intervals. The N = {64, 128} simulations are run for t = 10⁸ time-steps to illustrate taking the t → ∞ limit before N → ∞, whereas N = {256, 512, 1024} have been simulated for t = 5 × 10⁶ time-steps to recover the N → ∞ before t → ∞ regime. The continuous line represents the N → ∞ DMFT solution integrated numerically, while the vertical dotted line corresponds to the critical value ε_c found by Hwang et al. [61].

Appendix, and where we can notably identify −iR̂(t, s) = C(|t − s|). As a sanity check, one can verify that the L = 1 case, corresponding to the fixed point complexity, is indeed independent of α and is given by the same expression as Eq. (22), see Appendix B 1. In a similar vein, one can recover Σ₂(α = 1) = 2Σ₁(η).

Numerically solving the saddle point equations for decreasing values of α, it appears that Σ₂(α, η) → Σ(η) when α → 0⁺, see Fig. 10. While numerical difficulties prevent us from exploring very small values of α, it seems clear that the saddle point corresponding to the L = 2 cycles eventually coalesces with the fixed point saddle (which is known to be a sub-dominant saddle point when α = 1, see [61] and the discussion above). In any case, and perhaps surprisingly, there does not appear to be a critical value of α below which fixed points become more abundant than cycles. We therefore expect a progressive crossover and not a sharp transition.

VI. DYNAMICAL MEAN-FIELD THEORY

We have thus seen that both fixed points and limit cycles are exponentially numerous. However, the question remains as to what happens dynamically, as the existence of a large number of fixed points or limit cycles by no means guarantees that these will be reached at long times.

In fact, the number of agents N is expected to play a major role in determining the long term fate of the system. In particular, there are strong indications that the time τ_r needed to reach a fixed point or a limit cycle grows itself exponentially with N, at least when α = 1 [89]. More precisely,

τ_r ∼ N^s e^{N B(ε)},   (26)

where s is an exponent (possibly dependent on ε) and B(ε) an effective barrier such that B(ε = 0) = 0. Hence, one expects that as N grows, fixed points/limit cycles will in fact never be reached, even if they are numerous. This is in fact what happens numerically, see Fig. 11.

What is then to be expected in the limit N → ∞? In order to study the complicated learning dynamics that takes place, we will resort to Dynamical Mean-Field Theory (DMFT). In a nutshell, DMFT allows deterministic or stochastic dynamics, in discrete or continuous time, of a large number N of interacting degrees of freedom to be rewritten as a one-dimensional stochastic process with self-consistent conditions. While difficult to solve both analytically and numerically due to their self-consistent nature, DMFT equations have proved very effective at describing a very wide range of complex systems – see [90] for a recent review. Note however that such an approach is only valid when N → ∞; as will be clear later, strong finite size effects can appear and change the conclusions obtained using DMFT.

In our case, we write the DMFT for the evolution of the incentives Q_i(t), which directly yield m_i(t) = tanh(βQ_i(t)). In order to do so, we rewrite our online learning process, which depends on the realized S_i(t), as an expression solely in terms of m_i(t) with additional fluctuations,

∑_j J_ij S_j(t) = ∑_j J_ij m_j(t) + η_i(t),   η_i(t) = ∑_j J_ij ξ_j(t),   (27)

with ξ_i(t) = S_i(t) − m_i(t), and hence ⟨ξ_i(t)⟩ = 0 and ⟨ξ_i(t)ξ_i(s)⟩ = (1 − (m_i(t))²)δ_{t,s}. Now, assuming the central limit theorem holds, the random variables η_i become Gaussian for large N with

⟨η_i(t)⟩ = 0,   ⟨η_i(t)η_j(s)⟩ = υ(ε)(1 − q(t))δ_{t,s} δ_{i,j},   (28)
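The variance in Eq. (28) is easy to verify by direct sampling. The sketch below is our own check; for simplicity we take i.i.d. Gaussian entries of variance υ/N with υ = 1/2 (the ε = 1 value quoted later in the text), rather than the paper's exact interpolation in ε:

```python
import numpy as np

rng = np.random.default_rng(2)
N, upsilon, n_samples = 1000, 0.5, 300

J = rng.normal(0.0, np.sqrt(upsilon / N), (N, N))   # i.i.d. entries, ~ eps = 1
m = rng.uniform(-1, 1, N)                           # frozen intentions m_j
q = np.mean(m**2)

# eta_i = sum_j J_ij (S_j - m_j), with S_j = +/-1 drawn with mean m_j, Eq. (27)
etas = []
for _ in range(n_samples):
    S = np.where(rng.random(N) < (1 + m) / 2, 1.0, -1.0)
    etas.append(J @ (S - m))
var_emp = np.var(np.asarray(etas))

print(var_emp, upsilon * (1 - q))   # the two should agree, as in Eq. (28)
```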
where q(t) = C(t, t) as defined in Eq. (15). As required, in the noiseless limit β → ∞, one has q(t) = 1 ∀t and the random variables η_i are identically zero.

Starting from the N equations

Q_i(t + 1) = (1 − α)Q_i(t) + α ∑_j J_ij m_j(t) + αη_i(t) + αh_i(t),   (29)

where h_i(t) is an arbitrary external field that will eventually be set to 0, the DMFT can be derived using path integral techniques or the cavity method, the latter being detailed in Appendix C. Remaining in discrete time to explore the entire range of values of α, one finds, in the N → ∞ limit,

Q(t + 1) = (1 − α)Q(t) + α²(1 − ε) ∑_{s<t} G(t, s)m(s) + αϕ(t) + αh(t),   (30)

with ⟨ϕ(t)⟩ = 0, and

⟨ϕ(t)ϕ(s)⟩ = υ(ε)[C(t, s) + (1 − q(t))δ_{t,s}].   (31)

The memory kernel G and correlation function C are then to be determined self-consistently,

G(t, s) = ⟨∂m(t)/∂h(s)|_{h=0}⟩,   C(t, s) = ⟨m(t)m(s)⟩,   (32)

where the averages ⟨. . .⟩ are over the realisations of the random variable ϕ. These discrete time dynamics can first be integrated numerically, with an iterative scheme updating both the memory kernel and the correlation function until convergence, see [91] or [92]. As detailed in the original work of Eissfeller and Opper [93, 94], one can also make use of Novikov's theorem to compute the response function directly from correlations, avoiding the unpleasant task of taking finite differences on noisy trajectories, at the cost of inverting the correlation matrix. Note that this inversion will however mean that very long trajectories become difficult to integrate.

While we shall see that this numerical resolution can provide precious intuition to understand the role of finite N in the dynamics, a continuous description will be much more convenient to obtain analytical insights. In the α ≪ 1, t ≫ 1 regime, we can rescale the time as t → t/α. Interestingly, doing so requires expanding Q(t + 1) to second order if one is to keep an explicit dependence on α. The resulting continuous dynamics reads

(α/2) Q̈(t) + Q̇(t) = −Q(t) + (1 − ε) ∫₀ᵗ ds G(t, s)m(s) + ϕ(t) + h(t),   (33)

with

⟨ϕ(t)ϕ(s)⟩ = υ(ε)[C(t, s) + α(1 − q(t))δ(t − s)],   (34)

and the memory kernel and correlation function similarly defined self-consistently,

G(t, s) = ⟨δm(t)/δh(s)|_{h=0}⟩,   C(t, s) = ⟨m(t)m(s)⟩,   (35)

with, we recall, q(t) = C(t, t) = ⟨m²(t)⟩. Very importantly, note that the rescaling in time introduces a prefactor α in the variance of ϕ, which stems from the noise in the learning process. Since 1 − q(t) ∼ β⁻² for large β, this extra term is of order α/β², as anticipated above.

In the next sections, the DMFT equations will be used to shed light on the dynamical behaviour of the model in the limit N → ∞.

VII. NOISELESS LEARNING

In this section, we use both the DMFT equations and the results on the complexity of fixed points and limit cycles to classify the different dynamical behaviours of the learning process in the noiseless case β → ∞, where the realized and expected decisions are equal, m_i(t) = S_i(t) = sign(Q_i(t)).

A. The Memory-less Limit α = 1

In this case, corresponding to Eq. (24), both approaches (DMFT and complexity of limit cycles) seem to agree on the overall picture: as ε increases from 0 to 1, the system transitions from L = 2 cycles to overdamped oscillations and chaos – see Fig. 3. However, upon scrutiny, one realizes that the perfect agreement between DMFT and direct numerical simulations of the dynamics for finite N is only valid in a region where ε is small and N is large enough – see Fig. 11. In particular, when 0.5 ≲ ε ≲ 0.8, L = 2 cycles do persist when N is smaller than ∼ 200. For larger N, the lag 2 autocorrelation function C(τ = 2) is noticeably smaller than unity, and well predicted by DMFT as soon as N ≳ 1000.

What happens for ε ≲ 0.5 when N → ∞? The numerical solution of the DMFT equations suggests the following scenario: when ε < ε_RM ≈ 0.473, the long time value m∞ of the correlation with the initial conditions C(0, 2n) at even time steps is strictly positive, hence the subscript "RM" for Remnant Magnetisation.[95] It is only exactly equal to one for ε = 0 (permanent oscillations) and decreases to reach zero when ε → ε_RM [94]. Below ε_RM, we can conclude that the system is not ergodic, which will have important implications for the finite temperature dynamics. For asymmetries greater than ε_RM, on the other hand, the decorrelation becomes exponential and we enter a bona fide chaotic, ergodic regime.

Although memory-less learning is clearly unrealistic, these results are rather instructive. The system is indeed unable to display aggregate coordination when interactions are mutually independent (chaotic region around ε = 1). Placing ourselves in the socio-economic setting, it seems evident that the number of players vastly exceeds the number of iterations, and the results above indicate that the chaotic region is in fact quite large. Perhaps more importantly, in the absence of learning, agents will see their decisions vary at a high frequency without ever
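As an illustration of the iterative scheme mentioned above, the discrete DMFT (30)-(32) collapses to a particularly simple self-consistent problem at ε = 1 and β → ∞, since the (1 − ε) memory term drops out and q(t) = 1. The sketch below is our own simplification (the full scheme of Refs. [91-94] also tracks the response function G): it alternates between sampling Gaussian noise paths ϕ with covariance υC and re-measuring C from the resulting single-site trajectories.

```python
import numpy as np

rng = np.random.default_rng(3)
T, n_paths, alpha, upsilon = 40, 4000, 0.5, 0.5   # upsilon(eps = 1) = 1/2

C = np.eye(T)                        # initial guess for C(t, s)
for _ in range(30):                  # self-consistent loop
    # Gaussian noise paths with covariance upsilon * C (Eq. (31) with q = 1)
    L = np.linalg.cholesky(upsilon * C + 1e-10 * np.eye(T))
    phi = rng.normal(size=(n_paths, T)) @ L.T
    # effective single-site process, Eq. (30) at eps = 1
    Q = np.zeros((n_paths, T))
    Q[:, 0] = rng.normal(size=n_paths)
    for t in range(T - 1):
        Q[:, t + 1] = (1 - alpha) * Q[:, t] + alpha * phi[:, t]
    m = np.where(Q >= 0, 1.0, -1.0)  # beta -> infinity: m = sign(Q)
    C = 0.5 * C + 0.5 * (m.T @ m) / n_paths   # damped update of C(t, s)

C_tau = np.mean(np.diag(C, k=10))    # steady-state correlation at lag 10
print(C_tau)                          # well below 1: decorrelation at eps = 1
```

The damped update of C is a standard trick to stabilize the self-consistent loop; the measured C(τ) decays with the lag, in line with the chaotic behaviour at ε = 1 discussed below.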
FIG. 12. Convergence to fixed points with decreasing α, for different values of ε < 0.8, β → ∞ and finite N. (a) Steady-state two-point correlation function between successive configurations from finite N numerical simulations averaged over 200 samples of disorder and initial conditions, error-bars showing 95% confidence intervals. (b), (c) and (d) Sample trajectories of 32 randomly chosen sites among N = 256 for ε = 0.4, t0 = 10⁵/α, for α = {1.0, 0.88, 0.7} respectively.

FIG. 13. Influence of finite size N, non-reciprocity ε and memory span α on the learning dynamics. (a) Steady-state two-point correlation function shifted by τ = 2/α in the β → ∞ limit for different memory loss rates α, from light green to black (color map on the right axis). Symbols correspond to direct simulations and plain lines to the solutions of the DMFT equations, while the dashed line is the solution to the DMFT equations for α → 0. The N = 128 simulations are initialized with t0 = 10⁸ time-steps, whereas N = {256, 512} have been simulated for t0 = 10⁶ iterations before taking measurements. Results are averaged over 32 samples, with error-bars showing 95% confidence intervals. (b) and (c) Sample trajectories for all N = 128 sites for ε = 1 and α = {1.0, 0.1} respectively, clearly reaching chaotic and fixed-point steady states.

reaching any static steady-state, not only when the game is close to zero sum (ε → 2), but even when it is fully reciprocal (ε → 0). Clearly, this last point underlines the importance of introducing memory to recover realistic learning dynamics.

B. Memory Helps Convergence to Fixed Points

For N not too large, we observe numerically that the fraction of "frozen" agents, for which S_i(t + 1) = S_i(t), quickly tends to 1 as α decreases from 1, as shown in Fig. 12. This is somewhat consistent with intuition, as the learning dynamics averages rewards over a period τ_α ∼ 1/α, meaning that the high frequency cycles observed for α = 1 are expected to be "washed out" when α is sufficiently small. Since fixed points exist in large numbers, it appears natural that they are eventually reached, given their abundance at zero temperature. However, as we have shown in the previous section, L = 2 limit cycles are still much more numerous than fixed points for α ≳ 0.5. The fact that C(τ = 1) approaches unity as α is reduced much faster than in Fig. 10(b) suggests that the basin of attraction of fixed points quickly expands, at the expense of L = 2 limit cycles.[96]

Our numerical results therefore indicate that for any finite size system which has enough time to reach a steady state, the effect of α is effectively to help the system find fixed points – see Fig. 13. Focusing for example on the points corresponding to N = 128 and α = 0.1, we indeed observe that the 95% quantile includes C(2/α) = 1 even for ε = 1, i.e. fixed points can be reached even in the chaotic regime with modest simulation times, which would be an overwhelmingly improbable scenario in the memory-less case, as illustrated by Figs. 13 (b) and (c).

As the number of agents N increases, we enter the DMFT regime shown as plain lines in Fig. 13. One finds that decreasing the value of α slows down the decorrelation of the system. However, for small α, the evolution becomes a function of ατ only, as suggested by Eq. (33) when α → 0: the dynamical slowdown is dominated by the long memory of learning itself.

Fig. 13 shows that when ε ≳ 0.5, sufficiently large systems (described by DMFT) decorrelate with time for all α, and we expect C(τ → ∞) → 0: learning leads to chaos in such cases.

When ε ≲ 0.5, on the other hand, we found that there is ergodicity breaking, in the sense that C(τ → ∞) > 0, as we found above for cycles when α = 1. More precisely, a numerical analysis of the DMFT equations suggests that when α → 0 and ε is small, 1 − C(τ → ∞) is extremely small but non zero. For example, when ε = 0.4 we found 1 − C(τ → ∞) ≈ 0.005. This is compatible with the numerical results of [94].

In other words, there seems to exist a critical value ε_RM(α) separating the ergodic, chaotic phase for ε > ε_RM(α) from the non-ergodic, quasi fixed-point behaviour for ε < ε_RM(α). However, our numerical results are not precise enough to ascertain the dependence of ε_RM on α, which seems to hover around the value 0.473 found for α = 1. More work on this specific point would be needed to understand such a weak dependence on the memory length.

The precise dynamical behavior of the autocorrelation function C(τ) can be ascertained in the continuous limit α → 0 when ε = 1. Indeed, the influence of the memory kernel vanishes in this case where interactions are exactly
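The stabilizing effect of memory is simple to reproduce in a direct simulation of the noiseless learning dynamics. The sketch below is our own illustration: we assume the parametrization J = [(2 − ε)J^S + εJ^A]/2, with J^S symmetric and J^A antisymmetric Gaussian, which reproduces υ(0) = 1 and υ(1) = 1/2, though the paper's exact convention may differ. It compares the fraction of frozen agents at α = 1 and α = 0.2:

```python
import numpy as np

def frozen_fraction(alpha, eps=0.4, N=64, t_relax=200_000, seed=4):
    """Simulate Q(t+1) = (1-alpha) Q(t) + alpha J S(t), S = sign(Q), beta -> inf."""
    rng = np.random.default_rng(seed)
    X = rng.normal(0.0, 1.0 / np.sqrt(N), (N, N))
    Y = rng.normal(0.0, 1.0 / np.sqrt(N), (N, N))
    JS, JA = (X + X.T) / np.sqrt(2), (Y - Y.T) / np.sqrt(2)
    J = ((2 - eps) * JS + eps * JA) / 2          # assumed interpolation in eps

    Q = rng.normal(size=N)
    for _ in range(t_relax):                     # relax towards a steady state
        Q = (1 - alpha) * Q + alpha * (J @ np.sign(Q))
    # fraction of agents with S_i(t+1) = S_i(t), averaged over 100 steps
    frozen = []
    for _ in range(100):
        S = np.sign(Q)
        Q = (1 - alpha) * Q + alpha * (J @ S)
        frozen.append(np.mean(np.sign(Q) == S))
    return float(np.mean(frozen))

f_fast, f_slow = frozen_fraction(1.0), frozen_fraction(0.2)
print(f_fast, f_slow)   # smaller alpha typically freezes more agents
```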
FIG. 14. Evolution of the time shifted autocorrelation function in the non-symmetric case ε = 1, β → ∞, for different system sizes and memory loss parameters, averaged over 20 realizations. Left: lin-lin scale, errorbars showing 95% confidence intervals, continuous lines representing the numerically integrated full DMFT equations (Eq. (36)). Right: lin-log scale and rescaling of the time shift by α such that points collapse onto a single curve (errorbars not shown), black continuous line representing the analytical solution found by solving the Sompolinsky & Crisanti ODE Eq. (38), dashed line representing a pure exponential decay with characteristic time τ₁.

non-symmetric, leaving us with

Q̇(t) = −Q(t) + ϕ(t),   (α → 0)   (36)

where we emphasize that the time variable has been rescaled as t → αt. From there, the classical solution method proposed by Crisanti & Sompolinsky [97, 98] can be straightforwardly adapted, with a small modification due to our parametrization of the interaction matrix, which scales the variance of the entries by a factor 1/2 for ε = 1, see Appendix D. The two-point autocorrelation function is found to be given by

C(τ) = (2/π) sin⁻¹(∆(τ)/∆(0)),   (37)

where ∆(τ) = ⟨Q(t + τ)Q(t)⟩ follows the second-order ordinary differential equation

∆̈(τ) = ∆(τ) − (1/2) C(τ),   (38)

with ∆(0) = 1 − 2/π [99]. Linearizing Eq. (37) for small ∆, i.e. C(τ) ≃ 2∆(τ)/(π∆(0)) with ∆(0) = (π − 2)/π, Eq. (38) reduces to ∆̈ ≃ [(π − 3)/(π − 2)]∆. Its decaying solution means that the autocorrelation very quickly decays exponentially, C(τ) ∝ e^{−τ/τ₁} with

τ₁ = √((π − 2)/(π − 3)) ≈ 2.84.   (39)

Both the full solution, obtained by integrating the ODE numerically, and this exponential decay are shown in Fig. 14, displaying a very satisfactory match with numerical simulations.

C. Anomalous Stretching of Cycles

Having gained a better understanding of how memory allows the system to find fixed points when they exist, a central question is what will happen if the memory loss rate is reduced in the region of parameter space where there are only limit cycles. As previously stated, averaging over a period τ_α ∼ 1/α clearly suggests that the occurrence of short cycles (starting at L = 4 for α = 1) should gradually vanish.

Naively, one might expect a simple rescaling in time t → t/α, yielding cycles – when they exist – of period inversely proportional to α itself. Looking at the numerical results from both the finite size game and the DMFT integrated numerically in Fig. 15 (a), it quickly appears that such a simple rescaling in time does not provide the correct description. Indeed, the period of cycles is observed to be proportional to 1/√α, i.e. much shorter than 1/α – see Figs. 15 (b) and (c).

One important aspect to note is that there is some decorrelation, as the second peak of C(τ) does not quite reach unity (in Fig. 15 (b)), meaning that we may see quasi-cycles and not exact limit cycles, complicating the analytical description of the phenomenon. Just as true fixed points in the N → ∞ limit only exist for ε = 0, it appears that only the case ε = 2 displays true limit cycles.

Another subtle point is that, similarly to the ε < 1 cases discussed above, we expect the time taken to reach these cycles to depend on the system size and the relative distance to the chaotic region. This is confirmed by the DMFT solved for fixed trajectory times for ε = 1.5 (light crosses), which progressively departs from the ω₀ ∼ √α regime around α = 0.1.

To understand how such non-trivial stretching occurs, we go back to the continuous DMFT equation,

(α/2) Q̈(t) = −Q̇(t) − Q(t) + (1 − ε) ∫₀ᵗ ds G(t, s)m(s) + ϕ(t) + h(t).

While the presence of the second order derivative Q̈(t) appears natural to recover limit cycles, it should be noted that this term, being pre-factored by α, is superficially subdominant relative to the dissipation represented by Q̇(t). While we have seen that there is some decorrelation, the fact that robust oscillations are present therefore suggests that the complicated self-consistent forcing terms almost exactly compensate dissipation over a period, allowing the system to periodically revisit quasi-identical configurations. In fact, the shape of these oscillations is far from sinusoidal, but rather of see-saw type, see Fig. 15 (b). This suggests that in the limit α → 0, Q̈(t) diverges each time Ċ(τ) changes sign, such that (α/2)Q̈(t) cannot be neglected and therefore sets the relevant time scale to α^{−1/2}. We have however not been able to perform a more precise singular perturbation analysis of this phenomenon.

Going beyond this rather loose argument and precisely characterizing such see-saw patterns appears very challenging, and is left for future work. A possible approach would be to first take the ε = 2 case, where true cycles
FIG. 15. Evolution of the oscillation frequency ω₀ with the memory loss rate α for β → ∞, ε > 2 − ε_c, averaged over 96 samples of disorder and initial conditions, errorbars showing 95% confidence intervals. (a) Log-log plot highlighting the near square root dependency ω₀ ∼ √α, the dashed line corresponding to an exponent 1/2. (b) Two-point autocorrelation for ε = 1.8, N = 256 as a function of the rescaled time lag for different values of α, confirming the suitability of the scaling, in particular at short time scales. (c) Power spectrum of the autocorrelation for the same parameters, displaying the maximum at ω₀ and secondary peaks at odd multiples of this fundamental frequency, as expected from the triangular aspect of the autocorrelation.

should exist, and to assume the correlation function is an exact triangular wave of frequency ω as suggested by Fig. 15 (c). As a result, m(s) = sign(Q(s)) is an exact square wave, and the convolution with G can be written as a product in Fourier space. Enforcing the dissipation

FIG. 16. Heat map of q = C(0) in (T = 1/β, ε) space from numerical simulations for N = 256, t0 = 10⁶, averaged over 32 samples of disorder and initial conditions, with (a), (b) and (c) corresponding to α = {0.5, 0.1, 0.01} respectively. The white dashed line represents the critical temperature T_c(ε) at which the paramagnetic fixed point ceases to be stable (Sec. IV).

Taking for instance α = 1, it is indeed clear that the iteration

m_i(t + 1) = tanh(β ∑_j J_ij S_j(t))

will have extremely large fluctuations in the argument on the right hand side. As a result, we expect to lose the sharp transition as a function of T that can be observed for the NMFE (see Fig. 16). The order parameter q instead continuously tends to 0 with T, regardless of the asymmetry ε. This regime is shown in Fig. 16 (a), representing the heat map of q for α = 0.5. Clearly, the linear stability analysis of the paramagnetic fixed point presented in Sec. IV cannot hold when the thermal fluctuations are not averaged over large periods of time. To find a richer phenomenology, we will therefore focus on the α ≪ 1 regime where more complex dynamics can be observed.

A. Fixed Points for Intentions

In Sec. IV, we studied the fixed points of the NMFE
over a period to be zero, one could then perhaps find a that the game may reach if the fluctuations from imper-
closed equation for Q and ω if appropriate ansätze for fect learning can be neglected, i.e. if α → 0. Now, we have
the response and forcing functions are taken. seen that the DMFT equations proved effective in the
zero “temperature” limit, and importantly established
that true fixed points or two-cycles are only reached in
VIII. NOISY LEARNING finite time in the fully reciprocal case ε = 0 when the sys-
tem size diverges (see Fig. 11). The same equations can
also be used to revisit this finite β regime. √
While we have shown that the β → ∞ deterministic
Going back to Eq. (33) and neglecting the term in α
limit can be relatively well understood with the analyt-
from the correlation function as we did in the static setup,
ical tools at our disposal, one of the key features of our
fixed point solutions require
model is the uncertainty in the decision occurring for √
boundedly rational agents. Besides, it is also in this situa- Q = (1 − ε)mχ + J qυ(ε)z, (40)
tion that the online learning dynamics differ significantly
from the more widely studied offline learning where the where z is now a static white noise of unit variance, q
entire model can be understood in terms of deterministic is simply the now constant autocorrelation and χ is the
mixed strategies parameterized by the coefficients mi (t) integrated response function that we assume to be time-
(compare Eqs. (5) and (7)). translation invariant,
When α is close to unity and β becomes small, the fluc- ∞
tuations are too large for coordination to occur. Taking χ=∫ dτ G(τ ). (41)
0
The averages on the effective process can now be taken on z to self-consistently solve for q and χ (see e.g. [55] for a more detailed description). The resulting set of equations is then

q = ⟨m²(z)⟩_z,    (42)

to be solved simultaneously with

χ = ⟨ β(1 − m²(z)) / (1 − β(1 − ε)χ(1 − m²(z))) ⟩_z,    (43)

where m(z) is the solution to

m(z) = tanh(β(1 − ε)χ m(z) + β √(q υ(ε)) z).    (44)

Although our model is entirely built on a dynamical evolution equation, and not on a notion of thermal equilibrium, this set of self-consistent equations coincides with the replica-symmetric solution of the NMFE model found by Bray, Sompolinsky and Yu [78] for ε = 0. Since replica symmetry is broken in the whole low temperature phase of the NMFE model, we expect that these static solutions of the DMFT cannot correctly describe the long time limit of the dynamics, as we now show.

The numerical solutions for the DMFT fixed point equations are shown in Fig. 17 (a) and compared to numerical results of the game for small α and for ε = 0.1 (similar results, not shown, are obtained for other values of ε ≲ 0.8) [100]. We find that the long time behaviour of q for the direct simulation of the SK-game (circles and squares) and for the long time dynamical solution of the DMFT equations match very well, but differ from the value of q inferred from the set of self-consistent equations established above. This is expected, since with such a solution the order parameter q approaches unity exponentially fast as T → 0, whereas the fact that the probability of small local fields (i.e. rewards in the game analogy) vanishes linearly (see Fig. 4) suggests that q = 1 − κT², as for the full RSB solution of [78], but with presumably a different value of κ.

To ascertain the range over which this non-trivial mean-field solution should be valid, we can study the stability of the DMFT fixed point close to the critical temperature 1/βc, following the procedure first detailed in [54]. Considering a random perturbation to the fixed point ϵξ(t), with ξ(t) a Gaussian white noise and ϵ ≪ 1, we study the perturbed solution

Q(t) = Q0 + ϵQ1(t),    (45)

with Q0 the fixed point given in Eq. (40), where the noise is no longer static but similarly given by ϕ(t) = √(q υ(ε)) z + ϵϕ1(t). Replacing in the DMFT continuous dynamics for α → 0⁺ and collecting terms of order ϵ, we find that the perturbation evolves as

Q̇1(t) = −Q1(t) + β(1 − ε)(1 − m²(z)) ∫_0^t ds G(t, s) Q1(s) + ϕ1(t) + ξ(t),    (46)

where we have used sech²(βQ0) = 1 − m²(z) from Eq. (44), giving in Fourier space

Q̂1(ω) = (ϕ̂1(ω) + ξ̂(ω)) / (iω + 1 − β(1 − ε)(1 − m²(z)) Ĝ(ω)),    (47)

where we have again assumed that the memory kernel is time-translation invariant.

Now, in the limit βQ1 ≪ 1, i.e. close to the critical temperature, one can linearize the hyperbolic tangent tanh(βQ1(t)) and write a closed equation for the spectral density of Q1 at order β²Q1²,

1/⟨|Q̂1(ω)|²⟩ = |iω + 1 − β(1 − ε)Ĝ(ω)|² − β²υ(ε).    (48)

As a result, we have the criterion for the onset of instability at ω = 0:

β(1 − ε)χ = 1 − β √(υ(ε)),    (49)

where we noticed Ĝ(ω = 0) = χ, given, close to βc, by

χ = (1/(2β(1 − ε))) (1 − √(1 − 4β²(1 − ε))).    (50)

Taking ε = 0, we recover the criterion found by Bray et al. [78] for the critical temperature, giving Tc = 1/βc = 2 in their case. For non-zero ε, we can also find the critical temperature by replacing Eq. (50) in (49), to recover yet again the critical value given by Eq. (20).

The invalidity of the fixed point solution is illustrated in Fig. 17 (b), where the spectral density evaluated at ω = 0 can be observed to become negative for T < Tc(ε). While the linearization used to obtain the relation is expected to become invalid, we understand this negativity as a strong sign that the solution is unstable, and thus likely invalid throughout the low “temperature” phase, consistently with the discrepancy observed between the static prediction and both the direct simulations and the dynamic solution of the DMFT shown in Fig. 17 (a). In this finite β regime, we therefore effectively only have what we previously referred to as quasi-fixed points at best (including when ε = 0), even when neglecting the vanishing fluctuations caused by the online learning dynamics as α → 0.

B. Aging

The inadequacy of the static solution of the DMFT equations to describe the long term dynamics of the system is a well known symptom associated with the “aging” phenomenon [101], i.e. the fact that equilibrium is never reached and all correlation functions depend on the “age” of the system [102]:

C(tw, tw + t) = C_relax(t) + C_aging(t, tw),    (51)
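The two-time correlation entering Eq. (51) can be estimated directly from simulated trajectories. The following is a minimal sketch; the estimator and the toy trajectories are ours, purely for illustration (in the SK-game, the trajectory would come from the learning dynamics itself):

```python
import numpy as np

def two_time_corr(m, tw, t):
    """Estimate C(tw, tw + t) = (1/N) sum_i m_i(tw) m_i(tw + t) from one
    trajectory m of shape (T, N); in practice, also average over realizations."""
    return float(np.mean(m[tw] * m[tw + t]))

rng = np.random.default_rng(0)
# A frozen configuration never decorrelates ...
frozen = np.tile(np.sign(rng.normal(size=64)), (1000, 1))
# ... while an uncorrelated sequence loses memory immediately.
noisy = np.sign(rng.normal(size=(1000, 64)))
c_frozen = two_time_corr(frozen, 100, 500)
c_noisy = two_time_corr(noisy, 100, 500)
```

Aging manifests itself when such estimates, plotted against t, depend systematically on the waiting time tw, as in Fig. 18.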
The aging part often takes the scaling form

C_aging(t, tw) = C(t/tw),    (52)

corresponding to the aging behaviour found in a wide range of glassy models [102, 106–111]. Interestingly, this is not the behavior observed in the “physical” SK model [112, 113], which is typically found to display “sub-aging”, although the precise scaling of the aging correlation function remains unclear. It is however important to emphasize that the learning dynamics of the SK-game is markedly different from the physical dynamics of the SK model.

FIG. 18. Aging behavior of the system for N = 256, averages performed over 576 realizations. Color indicates the value of αtw, see scale on the far right. (a) and (b): Aging two-point correlation functions with the initial power law decay removed to isolate the aging component, plotted as a function of t/tw; inset showing the entire correlation function as a function of αt, dashed line representing the power law fit A t^{−x} of the first relaxation. (a) α = 0.5, β = 4, ε = 0; (b) α = 0.2, β = 2, ε = 0.25. (c) and (d): Partially aging two-point correlation functions plotted as a function of αt. (c) α = 0.5, β = 4, ε = 0.5; (d) α = 0.2, β = 2, ε = 0.5.

C. Chaos and (quasi-)Limit cycles

When the non-reciprocity of interactions is sufficiently small and quasi-fixed points exist, we have established that boundedly rational systems display complicated aging dynamics when learning noise, parameterised by the value of β, is present. The immediate question is now how such noise influences the complex dynamics, chaos and (quasi) limit cycles that we have found in the β → ∞, α ≪ 1 regime (see Sec. VII).

To qualitatively illustrate the effect of non-zero noise, we have run simulations for α = 0.1 and different values of ε and β. Fig. 19 displays individual trajectories of the intentions mi(t), as well as the distribution of the values of individual mi over all agents, realizations and time-steps. We also show the auto-correlation function C(τ) (assumed to be time-translation invariant over the short time scales considered), which is also compared to the DMFT solved numerically. Note that for the smallest value ε = 0.1, the trajectories illustrate the previously discussed quasi-fixed points emerging from the online dynamics. Clearly, while the correlation function remains close to constant, individual intentions are not exactly frozen (notice the wiggles in the top row of Fig. 19, especially for β = 2), explaining how the system as a whole eventually decorrelates and displays aging, as discussed in the previous section.

In the chaotic regime around ε = 1, it is clear that decreasing β (increasing noise) spreads the distribution of the mi, which is less and less concentrated around ±1, as an immediate consequence of the smoothed out hyperbolic tangent. As a result, the equal-time autocorrelation C(0) = q naturally decreases when β decreases. It is furthermore interesting to note that its value significantly decreases as the asymmetry parameter ε increases. We expect a decrease in α to have a similar role, as suggested by the phase diagrams presented in Fig. 16 (b) and (c). Dynamically, the decay of the autocorrelation in this chaotic regime appears to be more or less independent of the strength of the noise, which can be seen by comparing the insets of the third row of Fig. 19. While it is known that external noise kills deterministic chaos in neural networks with uncorrelated couplings [118], what is interesting in our case is that both the non-linearity of the hyperbolic tangent (governed by β) and the strength of the effective noise (which scales as 1 − q, see Eq. (34)) are varied simultaneously and, in a sense, self-consistently. Determining the way the decorrelation rate evolves with both β and ε is therefore quite non trivial, and would be a very interesting endeavor.

Where we previously had limit cycles, for ε = 1.5 for instance (last row of Fig. 19), it appears that oscillations survive for large values of β. Note however that in this case the value q = C(0) appears to very quickly vanish when β decreases. This is consistent with the linear stability analysis of the paramagnetic solution, which should be valid for vanishingly small α. As visible in Fig. 16 (c), we indeed expect that the region in which the system displays any form of aggregate coordination becomes increasingly narrow as ε gets closer to its maximum value of 2. Precisely for ε = 2, the system likely becomes fully disordered (q = 0) for any finite value of β when α → 0. For the small but finite values of α presented here, this is not quite the case, however, and large asymmetries ε > 2 − εc do give rise to clear oscillations, both in individual trajectories and in the correlation function (for instance, the bottom row of Fig. 19 shows for ε = 1.5 that some oscillations can be somewhat sustained).

D. Role of the Noise: Recap

In the presence of noise, the “conviction” of agents naturally goes down, in the sense that individual mi’s become smaller in absolute value, i.e. less polarized around ±1. For small α (long memory time), there exists a well defined transition line in the plane ε (asymmetry), 1/β (amplitude of the noise) above which agents start playing randomly at each round (i.e. mi = 0), but below which
FIG. 19. Finite β trajectories obtained from numerical simulations at α = 0.1, N = 256, t0 = 10^8, averaged over 96 samples of disorder and initial conditions for (a) β = 4, (b) β = 2 and, from top to bottom, ε = {0.1, 0.85, 1.05, 1.5}. Each line displays (from left to right) the evolution of 16 randomly selected agents, the associated histogram of mi over both agents, time and realizations, and the autocorrelation function, assumed to be time translation invariant on short time scales. Third row: insets representing the evolution of the normalized autocorrelation C(τ)/C(0) with a logarithmic vertical scale.
Agents, therefore, must do with satisficing solutions, as argued long ago by H. Simon [1], an idea embodied by our model in a concrete and tangible way. Only a centralized, omniscient agent may be able to ascribe an optimal strategy to all agents – which incidentally raises the question of trust: would agents even agree to follow the central planner's advice? Would they even believe in her ability to solve complex problems, knowing that their solution sensitively depends on all parameters of the model? If a finite fraction of all agents fail to comply, the resulting average reward will drop precipitously below the optimal value, and not be much better than the result obtained through individual learning.

As general ideas of interest in a socio-economic context, we have established that

1. long memory of past rewards is beneficial to learning, whereas over-reaction to the recent past is detrimental;

2. increased competition generically destabilizes fixed points and leads first to chaos and, in the high competition limit, to quasi-cycles;

3. some amount of noise in the learning process, quite paradoxically, allows the system to reach better collective decisions, in the sense that the average reward is increased;

4. non-ergodic behaviour spontaneously appears (in the form of “aging”) in a large swath of parameter space, when α, 1/β and ε are small.

On the positive side, we have shown that learning is far from useless: instead of getting stuck in one of the most numerous fixed points with low average reward, the learning process does allow the system to coordinate around satisficing solutions with rather high (but not optimal) average reward. Numerically, the average reward at the end of the learning process is, for ε = 0, ≈ 8% below the optimal value, whereas the majority of fixed points lead to a much worse average reward, ≈ 33% below the optimal value [60].

B. Technical Results & Conjectures

From a statistical mechanics perspective, our model is next of kin to, but different from, several well studied models; a synthesis of our original results can be found in Figs. 1, 2, 9, 16. For example, when α = 1, β → ∞, the dynamics is equivalent to a Hopfield model of learning with non-symmetric Gaussian synaptic couplings, for which many results are known, in particular on the number of fixed points and L-cycles. Introducing some memory with α < 1, we found that previously dynamically unattainable fixed points become typical solutions, replacing the short limit cycles in which the α = 1 parallel dynamics get stuck. We also showed how the number of L-cycles can be calculated for all values of α < 1.

The chaotic region that is known to exist when interactions are mostly non-symmetric (ε ≈ 1) also appears to be reduced by memory. When couplings are mostly non-reciprocal, ε ≲ 2, periodic oscillations survive, but we found that decreasing α non-trivially increases the cycle length, as α^{−1/2}.

When β is finite, the fixed point solutions to the dynamics correspond to the so-called Naive Mean-Field Equations of spin glasses, another model that has been studied in detail [78]. One knows in particular that such solutions become exponentially abundant for small enough noise 1/β and for ε = 0, a result that we have extended to all ε < 1. When ε > 1, on the other hand, the only fixed point (or Nash equilibrium) is mi = 0, ∀i, i.e. completely random decisions at each time step.

The Dynamical Mean-Field Theory (DMFT) is a tool of choice for investigating the dynamics of the model when N → ∞. DMFT is however frustratingly difficult to exploit analytically in the general case, so we are left with numerical solutions of our DMFT equations, which accurately match direct numerical simulations of the model when N is large, but fail to capture some specific features arising when N is small. From our numerical results, we conjectured that quasi-fixed points (and correspondingly, aging dynamics) persist for small noise and when ε ≤ ε⋆, where the value of ε⋆ is difficult to ascertain but could be as high as εRM = 0.47, perhaps related to the remnant magnetisation transition found in [94].

For β = ∞, the long-time, zero temperature autocorrelation C(∞) appears to drop extremely slowly with ε, but we have not been able to get an analytical result. DMFT equations also clearly lead to the anomalous α^{−1/2} stretching of the cycles mentioned above, but an analytic solution again eluded us.

One of the reasons analytical progress with DMFT is difficult is presumably related to the phenomenon of “Replica Symmetry Breaking” and its avatar in the present context of dynamical learning. Indeed, any attempt to expand around a static solution of the DMFT equations leads to inconsistencies in the interesting situation β > βc(ε), when decisions are not purely random, see Fig. 17(b). In fact, as shown in Fig. 17(a), the value of the order parameter q predicted by such static solutions is substantially off the value found from the long time, numerical solution of the DMFT equations, which itself coincides with direct simulations of the SK-game. En passant, we noticed that the value of q seems to be given by a universal function of βc(ε)/β, independently of the value of ε. Again, we have not been able to understand why this should be the case.

Finally, we have numerically established several interesting results concerning average and individual rewards that would deserve further investigation. For example, the average reward seems to converge towards its asymptotic value as N^{−2/3}, exactly as for the SK model, although, as already noted above, this asymptotic value
is ≈ 8% below the optimal SK value. Is it possible to characterize more precisely the ensemble of configurations reached after learning in the long memory limit α → 0? Can one, in particular, understand analytically the distribution of individual rewards shown in Fig. 4 and the corresponding asymptotic value of the average reward, as well as its non monotonic behaviour as a function of the noise parameter β? These are, in our opinion, quite interesting theoretical questions left for future investigation.

C. Extensions and Final Remarks

Many extensions of the very simple framework presented here can be imagined for the model to be more representative of real socio-economic systems. For example, by analogy with spin-glasses, going beyond fully connected interactions and towards a more realistic network structure should not change the overall phenomenology, although some subtle differences may show up. While analytical predictions become even more challenging, recent works on dynamical mean-field theories with finite connectivity, so far developed for Lotka-Volterra type systems, could perhaps be adapted to the learning dynamics of our model.

Allowing the interaction network to evolve with time would of course also make the model more realistic, as in [13]; in this case one would have to distinguish whether the learning time is much longer or much shorter than the reshuffling time of the network.

Other interesting additions to Ising games could also include the introduction of self-excitation [119] or of alternative decision rules that might be less statistical mechanics-friendly [36]. Extensions to multinary decisions, beyond the binary case considered here, as well as to higher-order interactions (i.e. a “p-spin game”), would obviously be interesting as well, especially as higher order interactions are known to change the phenomenology of the SK model (see e.g. [102]). In particular, we expect that in the p-spin case with p ≥ 3 a much larger gap would develop between the optimal average reward and the one reached by learning.

Finally, whereas temperature in physics is the same for all spins, there is no reason to believe that the amount of noise β or the memory span α should be the same for all agents. Introducing such heterogeneities might be worth exploring, as some agents with longer memory may fare systematically better than others, like in Minority Games, see [34].

Beyond the socio-economic context that was our initial motivation for its design, we believe that the simplicity and generality of our model make it a suitable candidate to describe a much wider range of complex systems. In the context of biological neural networks, the parameter α indeed allows one to interpolate between simple discrete-time Hopfield networks [88] and continuous-time models where Qi is an activation variable for the firing rates mi [120–125]. Although in our case the influence of β introduces some perhaps unwanted stochasticity, these fluctuations can in principle be suppressed (at least partially) with sufficiently small α. The memory loss parameter could also represent an interesting way to tune the effective slowing down of the dynamics caused by symmetry, as described in [105]. Here, the description of real neural networks would likely require much sparser interactions, but also perhaps the introduction of some dedicated dynamics for the interactions themselves, see e.g. [126] for recent ideas.

Closer to our original motivation, more applied socio-economic problems might benefit from the introduction of this type of reinforcement learning. In macroeconomics for instance, some form of “habit formation” could perhaps be relevant to extend existing descriptions of input/output networks [13, 64], where client/supplier relationships are probably strongly affected by history (on this point, see Kirman's classic study of Marseilles' fish market [127], see also [29]). Finally, while it is an aspect of our model we have not investigated here, previous works have reported that similar dynamics yield interesting volatility clusters and heavy tails, corresponding to sudden changes of quasi-fixed points. Such effects might be relevant to describe financial time series [14, 51].

In the context of financial markets, our model could also be used to challenge the idea that efficiency is reached by evolution. Indeed, a theory that has been proposed to explain empirical observations going against the Efficient Market Hypothesis is the so-called “Adaptive Markets Hypothesis”, stating that inefficiencies stem from the (transient) evolution of a market towards true efficiency [128, 129]. However, reinterpreting the Qi in our model as some form of fitness measure, it appears unlikely that simple evolutionary dynamics (akin to the learning dynamics) could overcome the type of radical complexity discussed here. As a matter of fact, such an evolutionary twist to the “SK-game” could also be used to conjecture that simple Darwinian dynamics is unlikely to lead to a global optimum in a complex biological setting [130].

Last but not least, we believe that the learning dynamics presented here may be useful from a purely algorithmic point of view in the study of spin-glasses and so-called TAP states. Indeed, in the β → ∞ limit, we have seen that our iteration relatively frequently finds fixed points in regions where their abundance is known to be sub-exponential (close to ε = 1 in particular), and this even for relatively large values of α. Interestingly, similar exponentially weighted moving averages have been employed in past numerical studies of TAP states for symmetric interactions [131, 132], but on the magnetizations mi themselves, and not on the local fields Qi as is the case above.

Using an offline version of our learning procedure could then be of use to effectively converge to fixed points of the TAP equations or Naive Mean-Field Equations, and study their properties. Perhaps even more interestingly,
ACKNOWLEDGMENTS

Indeed, given the assumption of independence in time, i.e. ⟨(Sj(t′) − mj(t′))(Sj(t′′) − mj(t′′))⟩ = δ_{t′,t′′}(1 − mj(t′)²), we have

⟨(m̃^α_j(t) − α ∑_{t′≤t} (1 − α)^{t−t′} mj(t′))²⟩ = α² ∑_{t′≤t} (1 − α)^{2(t−t′)} ⟨(Sj(t′) − mj(t′))²⟩ ≤ α² ∑_{t′≤t} (1 − α)^{2(t−t′)} = α/(2 − α) → 0 as α → 0.

We can then make the ansatz that the expected decision reaches a fixed point m⋆j after some time. For sufficiently large t and small but finite values of α, we will therefore have

⟨m̃^α_j(t)⟩ ≃ m⋆j,    (A3)

with fluctuations characterized by

⟨(m̃^α_j(t) − m⋆j)²⟩ = (α/2)(1 − (m⋆j)²) + O(α²).    (A4)
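The leading-order result (A4) is easy to confirm numerically with i.i.d. spins of fixed mean m⋆ (a toy verification with self-chosen parameters; the exact geometric sum gives α(1 − (m⋆)²)/(2 − α), which reduces to (A4) at leading order in α):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, m_star, T = 0.1, 0.5, 200_000
# i.i.d. decisions S(t) = +/-1 with mean m_star
S = np.where(rng.random(T) < 0.5 * (1.0 + m_star), 1.0, -1.0)
# exponentially weighted moving average of the decisions
ewma = np.empty(T)
acc = 0.0
for t in range(T):
    acc = (1.0 - alpha) * acc + alpha * S[t]
    ewma[t] = acc
var_emp = float(ewma[1000:].var())                    # discard the transient
var_th = alpha * (1.0 - m_star**2) / (2.0 - alpha)    # = (alpha/2)(1 - m*^2) + O(alpha^2)
```

The empirical variance matches the geometric-sum prediction to within statistical error, and both shrink linearly with α, as stated below Eq. (A4).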
To study the influence of the memory loss rate α < 1, we may adapt the method of Hwang et al., although this requires the introduction of either a strong nonlinearity in the exponent and subsequent saddle equations, or of new variables. We opt for the latter, and write the number of cycles of length L as

N_L(N, α, ε) = ∑_{Si(t)} ∫_{−∞}^{∞} ( ∏_{i=1}^{N} ∏_{t=1}^{L} dQi(t) ) |det J^α|^N ∏_{i,t} δ( Qi(t + 1) − (1 − α)Qi(t) − α ∑_j Jij Sj(t) ) Θ( Qi(t)Si(t) ),    (B1)

where the Dirac δ ensures that the dynamics are satisfied at each step, while the Heaviside Θ enforces Si(t) = sign(Qi(t)) ∀t. The L × L matrix J^α is the α-dependent Jacobian ensuring that the zeros of the δ function are correctly weighted, i.e. for L > 1

J^α_{ts} = { −(1 − α), if s = t (mod L); 1, if s = t + 1 (mod L); 0, otherwise.    (B2)
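Setting L = 1 in Eq. (B1), the δ function forces Qi = ∑_j Jij Sj, so fixed points are exactly the configurations with Si = sign(∑_j Jij Sj), independently of α. For small N they can therefore be counted by brute force; the following sketch (ours, exponential in N, purely illustrative) does just that:

```python
import itertools
import numpy as np

def count_fixed_points(J):
    """Count configurations S in {-1,+1}^N with S_i * (J S)_i > 0 for all i,
    i.e. S = sign(J S): the L = 1 solutions of Eq. (B1), independent of alpha."""
    N = J.shape[0]
    total = 0
    for bits in itertools.product((-1.0, 1.0), repeat=N):
        S = np.array(bits)
        if np.all(S * (J @ S) > 0.0):
            total += 1
    return total

# Symmetric couplings: fixed points come in +/- pairs (S and -S are both
# solutions), and the maximizer of S^T J S is one-flip stable, so for generic
# Gaussian J the count is even and at least 2.
rng = np.random.default_rng(1)
N = 10
B = rng.normal(size=(N, N))
J = (B + B.T) / 2.0
np.fill_diagonal(J, 0.0)
n_fp = count_fixed_points(J)
```

Comparing such exhaustive counts at small N with the saddle-point complexity derived below is a useful sanity check, though the asymptotic regime only sets in at much larger N.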
The first step is, as usual, to perform the average over the disorder after introducing the integral representation of the Dirac δ,

N_L(N, α, ε) = |det J^α|^N ∑_{Si(t)} ∫_{−∞}^{∞} ( ∏_{i=1}^{N} ∏_{t=1}^{L} dQi(t) dλi(t)/2π ) ( ∏_{i,t} Θ(Si(t)Qi(t)) ) exp[ − i ∑_{i,t} λi(t)Qi(t + 1) + i(1 − α) ∑_{i,t} λi(t)Qi(t) − (α²/2N)(1 − ε/2)² ∑_{i<j} ( ∑_t [λi(t)Sj(t) + λj(t)Si(t)] )² − (α²/2N)(ε/2)² ∑_{i<j} ( ∑_t [λi(t)Sj(t) − λj(t)Si(t)] )² ].    (B3)

The last two terms, resulting from the average over the disorder, may be rearranged to give

− (α²/2N) ∑_{t,s} ( ∑_{i,j} [ υ(ε) λi(t)Sj(t) λi(s)Sj(s) + (1 − ε) λi(t)Sj(t) λj(s)Si(s) ] − ((2 − ε)²/2) ∑_i λi(t)Si(t) λi(s)Si(s) ).

In terms of the overlaps K(t, s) = (1/N) ∑_i Si(t)Si(s), R(t, s) = (1/N) ∑_i λi(t)λi(s), V(t, s) = (1/N) ∑_i λi(t)Si(s) and U(t, s) = (1/N) ∑_i λi(t)Si(t)λi(s)Si(s), this contribution reads

− (1/2) N α² ∑_{t,s} [ υ(ε) R(t, s) K(t, s) + (1 − ε) V(t, s) V(s, t) − ((2 − ε)²/(2N)) U(t, s) ].

In the limit N → ∞, the O(N^{−1}) term is insignificant and can thus be neglected. These order parameters are then enforced through their integral representations, introducing conjugate variables; the corresponding terms in the exponent read

− i ∑_{t,s} K̂(t, s)( N K(t, s) − ∑_i Si(t)Si(s) ) − i ∑_{t,s} R̂(t, s)( N R(t, s) − ∑_i λi(t)λi(s) ).
For V(t, s) we have to be a bit more careful, as the expression is not linear for all terms: the diagonal t = s gives a quadratic term that will have to be treated separately. Noticing that the product V(t, s)V(s, t) is symmetric, we start by considering t < s,

∫_{−∞}^{∞} ( ∏_{t<s} dV(t, s)/2π ) e^{ − ∑_{t<s} iV(t,s) [ N V̂(t,s) − iN α²(1−ε) V(s,t) ] } = ∏_{t<s} δ( N V̂(t, s) − iN α²(1 − ε) V(s, t) ).    (B11)

Finally, the diagonal V(t, t), for which the expression is quadratic, can be computed with a Gaussian integral,

∫_{−∞}^{∞} ( ∏_t dV(t, t) ) e^{ − (N α²(1−ε)/2) ∑_t V(t,t)² − iN ∑_t V(t,t) V̂(t,t) } ∼ e^{ − (N/(2α²(1−ε))) ∑_t V̂(t,t)² },    (B13)

where

I_L = ∑_{S(t)} ∫_{−∞}^{∞} ( ∏_t dQ(t) dλ(t)/2π ) ( ∏_t Θ(S(t)Q(t)) ) exp[ − i ∑_t λ(t)Q(t + 1) + i(1 − α) ∑_t λ(t)Q(t) + (1/2) α² υ(ε) ∑_{t,s} ( S(t)S(s) iK̂(t, s) + λ(t)λ(s) iR̂(t, s) ) + α²(1 − ε) ∑_{t,s} λ(t)S(s) iV̂(t, s) ].    (B17)
At this stage, we may notice that S(t)² = 1 ∀t, and as such that the diagonal part of the sum over K̂(t, s) can be taken out of I_L. Doing so and combining this contribution with the first term of the exponent of Eq. (B16), we have

N_L(N, α, ε) ∼ ∫_{−∞}^{∞} ( ∏_{t,s} dK̂(t, s) dR̂(t, s) dV̂(t, s) ) exp( N [ (1/2) α² υ(ε) ∑_t ( iR̂(t, t) + 1 ) iK̂(t, t) + (1/2) α² (1 − ε + ε²/2) ∑_{s≠t} iR̂(t, s) iK̂(t, s) − (1/2) α² (1 − ε) ∑_{t,s} V̂(t, s)V̂(s, t) + log I_L + log |det J^α| ] ).    (B18)

Now, the first term in the exponent can be integrated over the K̂(t, t) exactly, yielding a product of δ functions fixing iR̂(t, t) = −1 for all t. The complexity then reads

Σ_L(α, ε) = saddle_{R̂,K̂,V̂} { (1/2) α² υ(ε) ∑_{s≠t} iR̂(t, s) iK̂(t, s) − (1/2) α² (1 − ε) ∑_{t,s} V̂(t, s)V̂(s, t) + log I_L + log |det J^α| },    (B20)

with now

I_L = ∑_{S(t)} ∫_{−∞}^{∞} ( ∏_t dQ(t) dλ(t)/2π ) ∏_t Θ(S(t)Q(t)) exp[ − i ∑_t λ(t)Q(t + 1) + i(1 − α) ∑_t λ(t)Q(t) − (1/2) α² υ(ε) [ ∑_t λ(t)² − ∑_{s≠t} ( S(t)S(s) iK̂(t, s) − λ(t)λ(s) iR̂(t, s) ) ] + α²(1 − ε) ∑_{t,s} λ(t)S(s) iV̂(t, s) ].    (B21)

and b ∈ R^L,

b(t) = Q(t + 1) − (1 − α)Q(t) − α²(1 − ε) ∑_s V̂(t, s)S(s),    (B24)

such that we in fact have

b = J^α Q + c,    c(t) = −α²(1 − ε) ∑_s V̂(t, s)S(s).    (B25)

Computing the Gaussian integral on λ,

I_L = ∑_{S(t)} ( e^{ (1/2) α² υ(ε) ∑_{s≠t} S(t)S(s) iK̂(t,s) } / √(det A) ) ∫_{−∞}^{∞} ( ∏_t dQ(t)/√(2π) ) ∏_t Θ(S(t)Q(t)) exp( − (1/2) (J^α Q + c)^⊺ A^{−1} (J^α Q + c) ).    (B26)
where à = (J α )−1 A(J α )−1 As a result, the ∣det J α ∣ contributions in the complexity cancel out, and we find ourselves with
1 1
ΣL (α, ε) = saddle { α2 υ(ε) ∑ iR̂(t, s)iK̂(t, s) − α2 (1 − ε) ∑ V̂ (t, s)V̂ (s, t) + log IL }, (B28)
R̂,K̂,V̂ 2 s≠t 2 t,s
with now
1 2
υ(ε) ∑s≠t S(t)S(s)iK̂(t,s) ∞
e2α dQ(t) 1
IL = ∑ √ ∫ (∏ √ ) ∏ Θ(S(t)(Q − (J α )−1 c)(t)) exp (− Q⊺ Ã−1 Q) . (B29)
{S(t)} det à −∞ t 2π t 2
The integral is challenging to study in generality, as the non-diagonal nature of J α and A means that we cannot factorize
the integrand. We now move √ on to specific cases of interest. The√ equations can be further simplified through the rescalings
α2 υ(ε)K̂(t, s) → K̂(t, s), α υ(ε)V̂ (t, s) → V̂ (t, s) and Q(t) → α υ(ε), in which case we finally get
η
ΣL (α, η) = saddle { ∑ iR̂(t, s)iK̂(t, s) − ∑ V̂ (t, s)V̂ (s, t) + log IL }, (B30)
R̂,K̂,V̂ s<t 2 t,s
where $\Psi_L(x_1,\dots,x_L;C)$ is the cumulative distribution of an $L$-dimensional Gaussian with zero mean vector and covariance matrix $C$, evaluated up to $x_t$, $t \in \{1,\dots,L\}$. In our case, the upper bounds of integration are given by $\Gamma = S \circ (J^\alpha)^{-1}c$, with now
\[
c(t) = \eta\sum_s \hat{V}(t,s)S(s),
\tag{B32}
\]
while the covariance matrix must be handled with care, as its off-diagonal elements must be adapted to the presence of $S(t)$ in the Heaviside step function: they pick up a factor $S(t)S(s)$, so that
\[
C(t,s) = \begin{cases} \tilde{A}(t,t) & \text{for } t = s, \\ S(t)S(s)\,\tilde{A}(t,s) & \text{for } t \neq s. \end{cases}
\tag{B33}
\]
1. Fixed Points
We can start by checking that we recover the known result for L = 1. In this case, J α = α, and we only have to solve for a
scalar V̂ (1, 1) = x. Here,
\[
I_L = 2\,\Psi_1\Big(\frac{\eta}{\alpha}x;\,\frac{1}{\alpha^2}\Big) = 2\,\Phi(\eta x),
\tag{B35}
\]
therefore the saddle point equation becomes
\[
\Sigma_1(\alpha,\eta) := \Sigma_{\mathrm{FP}}(\eta) = \max_x \Big\{-\frac{\eta}{2}x^2 + \log 2 + \log\Phi(\eta x)\Big\},
\tag{B36}
\]
from which the expression in the main text can immediately be recovered.
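To make the reduction concrete, the one-dimensional maximization in (B36) is easy to evaluate numerically. A minimal sketch, assuming the $-\frac{\eta}{2}x^2$ normalization of the objective and a crude grid search (function names are ours, not from the text):

```python
import math

def phi(x: float) -> float:
    """Standard normal CDF, Phi(x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sigma_fp(eta: float) -> float:
    """Grid maximization of -(eta/2) x^2 + log 2 + log Phi(eta x)."""
    best = -math.inf
    for i in range(5001):
        x = i * 1e-3  # scan x in [0, 5]; the optimum sits at x >= 0
        best = max(best, -0.5 * eta * x * x + math.log(2.0)
                   + math.log(phi(eta * x)))
    return best
```

At $\eta = 0$ the expression reduces to $\log 2 + \log\Phi(0) = 0$ for any $x$, and $\Sigma_{\mathrm{FP}}(\eta)$ grows towards $\log 2$ as $\eta$ increases.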
2. Two-cycles
There are now six variables to solve for: iR̂, iK̂, V̂1 = V̂ (1, 1), V̂2 = V̂ (2, 2), V̂12 = V̂ (1, 2) and V̂21 = V̂ (2, 1). In this case, the
Jacobian matrix is given by
\[
J^\alpha = \begin{bmatrix} -(1-\alpha) & 1 \\ 1 & -(1-\alpha) \end{bmatrix} \;\Rightarrow\; (J^\alpha)^{-1} = \frac{1}{1-(1-\alpha)^2}\begin{bmatrix} 1-\alpha & 1 \\ 1 & 1-\alpha \end{bmatrix}.
\tag{B37}
\]
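As a throwaway sanity check of the inverse in (B37), one can verify numerically that the product with $J^\alpha$ gives the identity (the value of $\alpha$ below is arbitrary):

```python
alpha = 0.3  # arbitrary test value in (0, 1)
a = -(1.0 - alpha)
J = [[a, 1.0], [1.0, a]]
pref = 1.0 / (1.0 - (1.0 - alpha) ** 2)
Jinv = [[pref * (1.0 - alpha), pref], [pref, pref * (1.0 - alpha)]]

# J @ Jinv should be the 2x2 identity matrix
prod = [[sum(J[i][k] * Jinv[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)]
```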
Now, it is immediately apparent that IL is a function of the product S1 S2 = ±1 rather than S1 and S2 . As a result, the
complexity is given by
\[
\Sigma_2(\alpha,\eta) = \operatorname*{saddle}_{\hat{R},\hat{K},\hat{V}_1,\hat{V}_2,\hat{V}_{12},\hat{V}_{21}}\Big\{i\hat{R}\,i\hat{K} - \frac{\eta}{2}\big(\hat{V}_1^2 + \hat{V}_2^2 + 2\hat{V}_{12}\hat{V}_{21}\big) + \log 2 + \log \sum_{S=\pm1} e^{S\,i\hat{K}}\,\Psi_2(\Gamma_1,\Gamma_2;C)\Big\}.
\tag{B40}
\]
The saddle point equations read
\[
\Big(i\hat{K} + \frac{i\hat{R}}{1-(i\hat{R})^2}\Big)\sum_{S=\pm1} e^{S i\hat{K}}\,\Psi_2(\Gamma_1,\Gamma_2;C) - \frac{1}{2}\sum_{S=\pm1} e^{S i\hat{K}}\int_{-\infty}^{\Gamma_1}\mathrm{d}Q_1\int_{-\infty}^{\Gamma_2}\mathrm{d}Q_2\; Q^\intercal\,\frac{\partial C^{-1}}{\partial i\hat{R}}\,Q\;\frac{e^{-\frac{1}{2}Q^\intercal C^{-1}Q}}{2\pi\sqrt{\det C}} = 0,
\tag{B42}
\]
\[
-\eta\hat{V}_1 \sum_{S=\pm1} e^{S i\hat{K}}\,\Psi_2(\Gamma_1,\Gamma_2;C) + \frac{\eta(1-\alpha)}{1-(1-\alpha)^2}\sum_{S=\pm1} e^{S i\hat{K}}\int_{-\infty}^{\Gamma_2}\mathrm{d}Q_2\,\frac{e^{-\frac{1}{2}Q_2^{\star\intercal}C^{-1}Q_2^{\star}}}{\sqrt{2\pi\det C}} + \frac{\eta}{1-(1-\alpha)^2}\sum_{S=\pm1} S\,e^{S i\hat{K}}\int_{-\infty}^{\Gamma_1}\mathrm{d}Q_1\,\frac{e^{-\frac{1}{2}Q_1^{\star\intercal}C^{-1}Q_1^{\star}}}{\sqrt{2\pi\det C}} = 0,
\tag{B43}
\]
\[
-\eta\hat{V}_2 \sum_{S=\pm1} e^{S i\hat{K}}\,\Psi_2(\Gamma_1,\Gamma_2;C) + \frac{\eta}{1-(1-\alpha)^2}\sum_{S=\pm1} S\,e^{S i\hat{K}}\int_{-\infty}^{\Gamma_2}\mathrm{d}Q_2\,\frac{e^{-\frac{1}{2}Q_2^{\star\intercal}C^{-1}Q_2^{\star}}}{\sqrt{2\pi\det C}} + \frac{\eta(1-\alpha)}{1-(1-\alpha)^2}\sum_{S=\pm1} e^{S i\hat{K}}\int_{-\infty}^{\Gamma_1}\mathrm{d}Q_1\,\frac{e^{-\frac{1}{2}Q_1^{\star\intercal}C^{-1}Q_1^{\star}}}{\sqrt{2\pi\det C}} = 0,
\tag{B44}
\]
\[
-\eta\hat{V}_{21} \sum_{S=\pm1} e^{S i\hat{K}}\,\Psi_2(\Gamma_1,\Gamma_2;C) + \frac{\eta(1-\alpha)}{1-(1-\alpha)^2}\sum_{S=\pm1} S\,e^{S i\hat{K}}\int_{-\infty}^{\Gamma_2}\mathrm{d}Q_2\,\frac{e^{-\frac{1}{2}Q_2^{\star\intercal}C^{-1}Q_2^{\star}}}{\sqrt{2\pi\det C}} + \frac{\eta}{1-(1-\alpha)^2}\sum_{S=\pm1} e^{S i\hat{K}}\int_{-\infty}^{\Gamma_1}\mathrm{d}Q_1\,\frac{e^{-\frac{1}{2}Q_1^{\star\intercal}C^{-1}Q_1^{\star}}}{\sqrt{2\pi\det C}} = 0,
\tag{B45}
\]
\[
-\eta\hat{V}_{12} \sum_{S=\pm1} e^{S i\hat{K}}\,\Psi_2(\Gamma_1,\Gamma_2;C) + \frac{\eta}{1-(1-\alpha)^2}\sum_{S=\pm1} e^{S i\hat{K}}\int_{-\infty}^{\Gamma_2}\mathrm{d}Q_2\,\frac{e^{-\frac{1}{2}Q_2^{\star\intercal}C^{-1}Q_2^{\star}}}{\sqrt{2\pi\det C}} + \frac{\eta(1-\alpha)}{1-(1-\alpha)^2}\sum_{S=\pm1} S\,e^{S i\hat{K}}\int_{-\infty}^{\Gamma_1}\mathrm{d}Q_1\,\frac{e^{-\frac{1}{2}Q_1^{\star\intercal}C^{-1}Q_1^{\star}}}{\sqrt{2\pi\det C}} = 0,
\tag{B46}
\]
with $Q_1^\star = [Q_1\;\,\Gamma_2]^\intercal$ and similarly for $Q_2^\star$.
We now take the particular case $\alpha = 1$. In this case, we expect two-cycles with zero overlap between consecutive steps, meaning $i\hat{R} = 0$; the matrix $C$ is then the identity matrix, and thus $\Psi_2(\Gamma_1,\Gamma_2;C) = \Phi(\Gamma_1)\Phi(\Gamma_2)$. In order for $i\hat{R} = 0$ to satisfy Eq. (B41), we then require $i\hat{K} = 0$ and $\hat{V}_1 = \hat{V}_2 = 0$, such that $\sum_{S=\pm1} S\,e^{S i\hat{K}}\,\Psi_2(\Gamma_1,\Gamma_2;C) = 0$. Given
\[
\frac{\partial C^{-1}}{\partial i\hat{R}}\Big|_{i\hat{R}=0} = \begin{bmatrix} 0 & S \\ S & 0 \end{bmatrix}
\tag{B49}
\]
for α = 1, the solution iK̂ = 0 satisfies the saddle point equation (B42) while the independence of the integrands on S ensures
that V̂1 = V̂2 = 0 are compatible with Eqs. (B43)-(B44). Eqs. (B45)-(B46) then suggest the ansatz V̂12 = V̂21 = x, in which case
the problem is finally reduced to
\[
\Sigma_2(\alpha=1,\eta) = \max_x \big\{-\eta x^2 + 2\log 2 + 2\log\Phi(\eta x)\big\} = 2\,\Sigma_{\mathrm{FP}}(\eta),
\tag{B50}
\]
recovering the known solution of [61, 86].
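Numerically, the doubling identity (B50) can be sanity-checked against (B36) by maximizing both one-dimensional objectives over the same grid. This sketch assumes the $-\frac{\eta}{2}x^2$ normalization of $\Sigma_{\mathrm{FP}}$; all function names are ours:

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sigma_fp(eta):
    # fixed-point complexity, as in Eq. (B36)
    return max(-0.5 * eta * x * x + math.log(2.0) + math.log(phi(eta * x))
               for x in (i * 1e-3 for i in range(5001)))

def sigma_2cycle(eta):
    # alpha = 1 two-cycle complexity, as in Eq. (B50)
    return max(-eta * x * x + 2.0 * math.log(2.0)
               + 2.0 * math.log(phi(eta * x))
               for x in (i * 1e-3 for i in range(5001)))
```

Both maximizations share the same optimizer $x^\star$, so the ratio of the two complexities is exactly 2.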
We start from the $N \gg 1$ discrete difference equations, to which we have added an external field $h_i(t)$,
\[
Q_i(t+1) = (1-\alpha)Q_i(t) + \alpha\sum_j J_{ij}\, m_j(t) + \alpha\eta_i(t) + h_i(t), \qquad \langle \eta_i(t)\eta_j(s)\rangle \approx \upsilon(\varepsilon)\big(1-q(t)\big)\delta_{t,s}\,\delta_{i,j},
\tag{C1}
\]
and introduce a new agent at index i = 0. The influence of this newly introduced agent on the dynamic equation of agents i ≠ 0
is then
\[
\sum_j J_{ij}\, m_j(t) + h_i(t) \;\to\; \sum_j J_{ij}\, m_j(t) + J_{i0}\, m_0(t) + h_i(t).
\tag{C2}
\]
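Though the analytical route proceeds via the cavity construction, the discrete dynamics (C1) themselves are straightforward to iterate. A minimal sketch under assumptions not fixed by the text: symmetric Gaussian couplings of variance $1/N$ (a stand-in for $\varepsilon = 0$), binary decisions $m_j(t) = \mathrm{sign}(Q_j(t))$, and no noise or external field:

```python
import math
import random

random.seed(0)
N, alpha, T = 100, 0.5, 100

# Symmetric couplings J_ij = J_ji ~ N(0, 1/N), zero diagonal
J = [[0.0] * N for _ in range(N)]
for i in range(N):
    for j in range(i + 1, N):
        J[i][j] = J[j][i] = random.gauss(0.0, 1.0 / math.sqrt(N))

Q = [random.gauss(0.0, 1.0) for _ in range(N)]
for _ in range(T):
    m = [1.0 if q >= 0.0 else -1.0 for q in Q]  # binary decisions sign(Q)
    Q = [(1.0 - alpha) * Q[i]
         + alpha * sum(J[i][j] * m[j] for j in range(N))
         for i in range(N)]
```

Iterating this map is what the dynamical mean-field treatment summarizes in the large-$N$ limit; depending on parameters, trajectories settle onto fixed points or keep evolving.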
Since $N$ is large, we consider the response at linear order, meaning that the expected decision of all agents $i > 0$ becomes
\[
m_i(t) \;\to\; m_i(t) + \sum_{s<t}\sum_j \underbrace{\frac{\partial m_i(t)}{\partial h_j(s)}\Big|_{h=0}}_{\chi_{ij}(t,s)}\, J_{j0}\, m_0(s),
\tag{C3}
\]
as Jj0 m0 (t) can be seen as the modification of the effective field “felt” by all agents j, and χij (t, s) is simply the linear response
function to a small external field. The dynamics that the newly introduced agent follows is then given by
\[
Q_0(t+1) = (1-\alpha)Q_0(t) + \alpha\sum_{i>0} J_{0i}\, m_i(t) + \sum_{s<t}\Big(\sum_{ij} J_{0i}\,\chi_{ij}(t,s)\,J_{j0}\Big) m_0(s) + \alpha\eta_0(t) + h_0(t).
\tag{C4}
\]
The sum over $i,j$ can be split into its diagonal and off-diagonal parts. On the diagonal, we assume the central limit theorem holds, yielding
\[
\sum_i J_{0i}\,\chi_{ii}(t,s)\,J_{i0} = N\Big[\langle J_{0i}\,\chi_{ii}(t,s)\,J_{i0}\rangle + O\Big(\frac{1}{\sqrt{N}}\Big)\Big] \approx (1-\varepsilon)\,\langle\chi_{ii}(t,s)\rangle,
\tag{C5}
\]
while the off-diagonal contribution will be sub-dominant, as its mean is zero given that non-opposing entries of the interaction matrix are uncorrelated. We can also assume the CLT is valid for the sum over indices $i$ at leading order, in which case
where $\xi_0$ is a Gaussian of zero mean, correlated in time as $\langle\xi_0(t)\xi_0(s)\rangle = C_0(t,s) = \langle m_0(t)m_0(s)\rangle$. Bringing everything together, one realizes that there are no cross contributions between agents at leading order, and thus that we can drop the index 0 and recover the equation in the main text,
with a new noise term combining the original thermal-like fluctuations and the effective contribution from the disorder averaging,
with ⟨ϕ(t)⟩ = 0, and
\[
\langle\phi(t)\phi(s)\rangle = \upsilon(\varepsilon)\big[C(t,s) + (1-q(t))\,\delta_{t,s}\big],
\tag{C8}
\]
and where the memory kernel and correlation function are to be determined self-consistently,
\[
G(t,s) = \langle\chi_{ii}(t,s)\rangle = \Big\langle \frac{\partial m(t)}{\partial h(s)}\Big|_{h=0}\Big\rangle, \qquad C(t,s) = \langle m(t)m(s)\rangle.
\tag{C9}
\]
Note that in order to eliminate the external field, we have expressed the susceptibility with the noise term, resulting in a
rescaling G(t, s) → αG(t, s). From this expression, the continuous limit can be taken, changing the sum in time weighted by α
into an integral,
\[
\frac{\alpha}{2}\ddot{Q}(t) + \dot{Q}(t) = -Q(t) + (1-\varepsilon)\int_0^t \mathrm{d}s\, G(t,s)\,m(s) + \phi(t) + h(t).
\tag{C10}
\]
\[
\ddot{\Delta}(\tau) = \Delta(\tau) - \frac{1}{2}C(\tau),
\tag{D2}
\]
where therefore $\Delta(\tau) = \langle Q(t+\tau)Q(t)\rangle$, and where, thanks to the Gaussian nature of the fluctuations, we have
\[
C(\tau) = \int_{-\infty}^{\infty}\frac{\mathrm{d}z}{\sqrt{2\pi}}\,e^{-\frac{1}{2}z^2}\Big[\int_{-\infty}^{\infty}\frac{\mathrm{d}x}{\sqrt{2\pi}}\,e^{-\frac{1}{2}x^2}\,\mathrm{sign}\Big(\sqrt{\Delta(0)-\lvert\Delta(\tau)\rvert}\,x + \sqrt{\lvert\Delta(\tau)\rvert}\,z\Big)\Big]^2
= \int_{-\infty}^{\infty}\frac{\mathrm{d}z}{\sqrt{2\pi}}\,e^{-\frac{1}{2}z^2}\,\mathrm{erf}\Big(\sqrt{\frac{\lvert\Delta(\tau)\rvert}{\Delta(0)-\lvert\Delta(\tau)\rvert}}\,\frac{z}{\sqrt{2}}\Big)^2
= \frac{2}{\pi}\sin^{-1}\Big(\frac{\Delta(\tau)}{\Delta(0)}\Big),
\tag{D3}
\]
(see [133] for the last step above). Now, by identifying that the only difference with the original problem is a factor 2 in $\Delta$, we can directly use the result $\Delta(0) = 1 - \frac{2}{\pi}$, found by enforcing the condition required for $\Delta(\tau)$ to decay monotonically [99]. We may finally expand the inverse sine to recover
\[
\ddot{\Delta}(\tau) \underset{\tau\gg1}{\sim} \Big(1 - \frac{1}{\pi\Delta(0)}\Big)\Delta(\tau)
\tag{D4}
\]
and therefore
\[
C(\tau) \underset{\tau\gg1}{\sim} \frac{2}{\pi}\,e^{-\tau/\tau_1}, \qquad \tau_1 = \sqrt{\frac{\pi-2}{\pi-3}} \approx 2.84.
\tag{D5}
\]
We notice that the value of the characteristic time is identical to that of the standard Sompolinsky & Crisanti case (the only difference being a factor 2 in the magnitude of $\Delta(\tau)$), and is therefore not equal to the value $\sqrt{\pi/(\pi-2)} \approx 1.66$ originally given in [97] and [98].
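The two ingredients of this appendix are easy to verify numerically: the Gaussian identity behind (D3), $\mathbb{E}[\mathrm{sign}(X)\,\mathrm{sign}(Y)] = \frac{2}{\pi}\sin^{-1}\rho$ for unit Gaussians with correlation $\rho$, and the value of the characteristic time $\tau_1$. A quick check (the value $\rho = 0.6$ is an arbitrary stand-in for $\Delta(\tau)/\Delta(0)$):

```python
import math
import random

# Monte Carlo check of the arcsine law used in (D3)
random.seed(1)
rho = 0.6  # arbitrary correlation, stands in for Delta(tau)/Delta(0)
n = 200_000
acc = 0.0
for _ in range(n):
    x, z = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    y = rho * x + math.sqrt(1.0 - rho * rho) * z  # corr(x, y) = rho
    acc += math.copysign(1.0, x) * math.copysign(1.0, y)
arcsine = (2.0 / math.pi) * math.asin(rho)

# Characteristic time of (D5) versus the value quoted in [97, 98]
tau1 = math.sqrt((math.pi - 2.0) / (math.pi - 3.0))
tau1_quoted = math.sqrt(math.pi / (math.pi - 2.0))
```

Note that with $\Delta(0) = 1 - 2/\pi$, the coefficient $1 - 1/(\pi\Delta(0))$ in (D4) equals $(\pi-3)/(\pi-2) = 1/\tau_1^2$, as it should for an exponential decay with rate $1/\tau_1$.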
[1] H. A. Simon, The Quarterly Journal of Economics, 99 (1955).
[2] The complexity of the whole means that anything that can happen is really, despite the experience gained, impossible to predict, let alone imagine. It is pointless to attempt to describe it, because any solution is conceivable.
[3] J.-M. Grandmont, Econometrica, 741 (1998).
[4] A. Kirman, Complex economics: individual and collective rationality (Routledge, 2010).
[5] C. Hommes, Journal of Economic Literature 59, 149 (2021).
[6] G. Dosi and A. Roventini, Journal of Evolutionary Economics 29, 1 (2019).
[7] M. King and J. Kay, Radical uncertainty: Decision-making for an unknowable future (Hachette UK, 2020).
[8] A. Marcet and T. J. Sargent, Journal of Economic Theory 48, 337 (1989).
[9] G. W. Evans, S. Honkapohja, et al., Rethinking expectations: The way forward for macroeconomics 68 (2013).
[10] R. Pemantle, Probability Surveys 4, 1 (2007).
[11] D. Lamberton, P. Tarrès, et al., Ann. Appl. Probab. 14, 1424 (2004).
[12] J. Moran, A. Fosset, D. Luzzati, J.-P. Bouchaud, and M. Benzaquen, Chaos: An Interdisciplinary Journal of Nonlinear Science 30, 053123 (2020).
[13] C. Colon and J.-P. Bouchaud, Available at SSRN 4300311 (2022).
[14] T. Galla and J. D. Farmer, Proceedings of the National Academy of Sciences 110, 1232 (2013).
[15] J.-P. Bouchaud and R. E. Farmer, Journal of Political Economy 131, 000 (2023).
[16] G. Parisi, Physica A: Statistical Mechanics and its Applications 263, 557 (1999), proceedings of the 20th IUPAP International Conference on Statistical Physics.
[17] G. Parisi, Advances in Complex Systems 10, 223 (2007).
[18] J.-P. Bouchaud, Entropy 23, 1676 (2021).
[19] "Common knowledge" means that "You know what I know and I know that you know what I know".
[20] S. Galluccio, J.-P. Bouchaud, and M. Potters, Physica A: Statistical Mechanics and its Applications 259, 449 (1998).
[21] J. Garnier-Brun, M. Benzaquen, S. Ciliberti, and J.-P. Bouchaud, Journal of Statistical Mechanics: Theory and Experiment 2021, 093408 (2021).
[22] D. Sharma, J.-P. Bouchaud, M. Tarzia, and F. Zamponi, Journal of Statistical Mechanics: Theory and Experiment 2021, 063403 (2021).
[23] P. Bak, How nature works: the science of self-organized criticality (Springer Science & Business Media, 2013).
[24] P. W. Anderson, The economy as an evolving complex system (CRC Press, 2018).
[25] P. Bak, K. Chen, J. Scheinkman, and M. Woodford, Ricerche Economiche 47, 3 (1993).
[26] R. Bookstaber, The end of theory: Financial crises, the failure of economics, and the sweep of human interaction (Princeton University Press, 2017).
[27] M. Shiino and T. Fukai, Journal of Physics A: Mathematical and General 23, L1009 (1990).
[28] J. Moran and J.-P. Bouchaud, Physical Review E 100, 032307 (2019).
[29] G. Dosi, The Foundations of Complex Evolving Economies: Part One: Innovation, Organization, and Industrial Dynamics (Oxford University Press, 2023).
[30] For other examples of models where learning leads to chaotic dynamics, see e.g. [134, 135].
[31] T. Hirano and J. E. Stiglitz, Land Speculation and Wobbly Dynamics with Endogenous Phase Transitions, Tech. Rep. (National Bureau of Economic Research, 2022).
[32] G. Dosi, M. Napoletano, A. Roventini, J. E. Stiglitz, and T. Treibich, Economic Inquiry 58, 1487 (2020).
[33] W. A. Brock and S. N. Durlauf, The Review of Economic Studies 68, 235 (2001).
[85] F. R. Waugh, C. M. Marcus, and R. M. Westervelt, Physical Review Letters 64, 1986 (1990).
[86] H. Gutfreund, J. Reger, and A. Young, Journal of Physics A: Mathematical and General 21, 2775 (1988).
[87] F. Tanaka and S. Edwards, Journal of Physics F: Metal Physics 10, 2769 (1980).
[88] J. J. Hopfield, Proceedings of the National Academy of Sciences 79, 2554 (1982).
[89] U. Bastolla and G. Parisi, Journal of Physics A: Mathematical and General 31, 4583 (1998).
[90] L. F. Cugliandolo, arXiv preprint arXiv:2305.01229 (2023).
[91] F. Roy, G. Biroli, G. Bunin, and C. Cammarota, Journal of Physics A: Mathematical and Theoretical 52, 484001 (2019).
[92] F. Mignacco, F. Krzakala, P. Urbani, and L. Zdeborová, Advances in Neural Information Processing Systems 33, 9540 (2020).
[93] H. Eissfeller and M. Opper, Physical Review Letters 68, 2094 (1992).
[94] H. Eissfeller and M. Opper, Physical Review E 50, 709 (1994).
[95] The convergence to m∞ is as slow as a power law of τ, which makes its numerical determination difficult. Obtaining the precise value of εRM is therefore challenging [94].
[96] Note that C(τ = 2) remains equal to 1 in the whole range of parameters shown in Fig. 12, i.e. we only observe fixed points or 2-cycles.
[97] H. Sompolinsky, A. Crisanti, and H.-J. Sommers, Physical Review Letters 61, 259 (1988).
[98] A. Crisanti and H. Sompolinsky, Physical Review A 37, 4865 (1988).
[99] A. Crisanti and H. Sompolinsky, Physical Review E 98, 062120 (2018).
[100] We have in fact noted that in this regime, all results seem to collapse when plotted vs. T/Tc(ε), as in Fig. 17(a).
[101] L. F. Cugliandolo and J. Kurchan, Journal of Physics A: Mathematical and General 27, 5749 (1994).
[102] J.-P. Bouchaud, L. F. Cugliandolo, J. Kurchan, and M. Mézard, Spin glasses and random fields 12, 161 (1998).
[103] E. Vincent, J. Hammann, M. Ocio, J.-P. Bouchaud, and L. F. Cugliandolo, in Complex Behaviour of Glassy Systems: Proceedings of the XIV Sitges Conference, Sitges, Barcelona, Spain, 10–14 June 1996 (Springer, 2007) pp. 184–219.
[104] A. Altieri, F. Roy, C. Cammarota, and G. Biroli, Physical Review Letters 126, 258301 (2021).
[105] D. Martí, N. Brunel, and S. Ostojic, Physical Review E 97, 062314 (2018).
[106] J.-P. Bouchaud, Journal de Physique I 2, 1705 (1992).
[107] H. Rieger, Journal of Physics A: Mathematical and General 26, L615 (1993).
[108] L. F. Cugliandolo, J. Kurchan, and F. Ritort, Physical Review B 49, 6331 (1994).
[109] H. Yoshino, Journal of Physics A: Mathematical and General 29, 1421 (1996).
[110] L. Berthier and J.-P. Bouchaud, Physical Review B 66, 054404 (2002).
[111] T. Arnoulx de Pirey and G. Bunin, Physical Review Letters 130, 098401 (2023).
[112] E. Marinari, G. Parisi, and D. Rossetti, The European Physical Journal B - Condensed Matter and Complex Systems 2, 495 (1998).
[113] A. Baldassarri, Physical Review E 58, 7047 (1998).
[114] L. F. Cugliandolo, J. Kurchan, P. Le Doussal, and L. Peliti, Physical Review Letters 78, 350 (1997).
[115] E. Marinari and D. Stariolo, Journal of Physics A: Mathematical and General 31, 5021 (1998).
[116] L. Berthier and J. Kurchan, Nature Physics 9, 310 (2013).
[117] G. Iori and E. Marinari, Journal of Physics A: Mathematical and General 30, 4489 (1997).
[118] L. Molgedey, J. Schuchhardt, and H. G. Schuster, Physical Review Letters 69, 3717 (1992).
[119] A. Antonov, A. Leonidov, and A. Semenov, Physica A: Statistical Mechanics and its Applications 561, 125305 (2021).
[120] C. Van Vreeswijk and H. Sompolinsky, Science 274, 1724 (1996).
[121] N. Brunel, Journal of Computational Neuroscience 8, 183 (2000).
[122] K. Rajan, L. F. Abbott, and H. Sompolinsky, Physical Review E 82, 011903 (2010).
[123] M. Stern, H. Sompolinsky, and L. F. Abbott, Physical Review E 90, 062710 (2014).
[124] J. Kadmon and H. Sompolinsky, Physical Review X 5, 041030 (2015).
[125] D. G. Clark and L. F. Abbott, arXiv preprint arXiv:2302.08985 (2023).
[126] U. Pereira-Obilinovic, J. Aljadeff, and N. Brunel, Physical Review X 13, 011009 (2023).
[127] A. P. Kirman and N. J. Vriend, in Interaction and market structure: Essays on heterogeneity in economics (Springer, 2000) pp. 33–56.
[128] A. W. Lo, Journal of Portfolio Management, Forthcoming (2004).
[129] A. W. Lo, Journal of Investment Consulting 7, 21 (2005).
[130] T. Brotto, G. Bunin, and J. Kurchan, Journal of Statistical Physics 166, 1065 (2017).
[131] T. Aspelmeier, R. A. Blythe, A. J. Bray, and M. A. Moore, Physical Review B 74, 184411 (2006).
[132] T. Aspelmeier and M. A. Moore, Physical Review E 100, 032127 (2019).
[133] A. Prudnikov, Y. A. Brychkov, and O. Marichev, More special functions (Integrals and Series vol 3) (New York: Gordon and Breach, 1990).
[134] W. A. Brock and C. H. Hommes, Econometrica: Journal of the Econometric Society, 1059 (1997).
[135] W. A. Brock and C. H. Hommes, Journal of Economic Dynamics and Control 22, 1235 (1998).
[136] A. Montanari, SIAM Journal on Computing (2021).
[137] P. Cizeau and J.-P. Bouchaud, Journal of Physics A: Mathematical and General 26, L187 (1993).
[138] K. Janzen, A. Engel, and M. Mézard, Physical Review E 82, 021127 (2010).
[139] I. Neri, F. Metz, and D. Bollé, Journal of Statistical Mechanics: Theory and Experiment 2010, P01010 (2010).