Findling & Wyart - 2021 - Computation Noise in Human Learning and Decision-Making: Origin, Impact, Function

Computation noise in human learning and decision-making: origin, impact, function
Charles Findling¹,² and Valentin Wyart¹,²

Making sense of uncertain and volatile environments, a cognitive process modeled across domains as statistical inference, constitutes a difficult yet ubiquitous challenge for human intelligence. Besides sensory errors and exploratory choices, recent research has identified the limited computational precision of cognitive inference as a surprisingly large contributor to the variability and suboptimality of perceptual and reward-guided decisions made under uncertainty. This focused review discusses the theoretical and experimental evidence scattered across psychology and neuroscience which, taken together, provides key insights into the origin, impact and function of this ‘computation noise’ for learning and decision-making. Moving beyond the classical description of internal noise as a performance-limiting constraint on neural function and cognition, we outline the possible emergent benefits of computation noise for adaptive behavior in adverse conditions and highlight open questions for future research.

¹ Laboratoire de Neurosciences Cognitives et Computationnelles, Institut National de la Santé et de la Recherche Médicale, Paris, France
² Département d’Études Cognitives, École Normale Supérieure, Université PSL, Paris, France

Corresponding authors: Findling, Charles (, Wyart, Valentin (

Current Opinion in Behavioral Sciences 2021, 38:124–132

This review comes from a themed issue on Computational cognitive neuroscience

Edited by Angela J Langdon and Geoffrey Schoenbaum

For a complete overview see the Issue and the Editorial

Available online 12th March 2021

2352-1546/© 2021 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creative-

Prominent variability of human decisions made under uncertainty
Humans routinely navigate uncertain and volatile environments in everyday life, from changing weather to unexpected incidents on our regular metro line. In such conditions, adaptive behavior requires making dynamic probabilistic inferences about external events (e.g. making accurate predictions about the weather) and action-outcome contingencies (e.g. choosing a detour which minimizes additional delays on our way to work).

In the laboratory, these dynamic inferences are investigated using controlled paradigms generating different forms of uncertainty (Figure 1a). The ‘weather prediction’ task [1–3] probes the ability to combine multiple sources of uncertain information about the location of a hidden reward. After learning probabilistic associations between a set of symbols (presented one at a time) and the location of the reward, subjects are asked to predict the location of the reward based on sequences of symbols. Each symbol taken in isolation predicts the same reward location as before, but this uncertain information has to be combined across symbols to predict the location of the reward accurately. In this task, even after reaching ceiling performance at predicting the location of the reward based on symbols presented one at a time, humans (and non-human primates) show a large trial-to-trial variability in their predictions based on sequences of symbols [3,4]. In other words, subjects sometimes choose a reward location which is not associated with the highest posterior probability of reward given the presented sequence of symbols.

Another widely used paradigm, the ‘reversal learning’ task [5–7], probes the ability to monitor changes in the reward probabilities associated with choice options (e.g. the two arms of a bandit, Figure 1a). Because reward probabilities are uncertain, subjects need to distinguish ‘false alarms’ (missing rewards when choosing the option associated with the highest reward probability) from genuine changes in reward probabilities. Humans engaged in this task (or one of its variants) make a substantial fraction of ‘non-greedy’ choices: choices which do not maximize expected reward but reduce the uncertainty about recently unchosen options [5,8]. Therefore, despite the differences between these experimental paradigms, human decisions made in uncertain and volatile environments exhibit a pervasive variability which limits their accuracy.

Large contribution of inference errors to human decision variability
The origin of decision variability under uncertainty has usually been assigned to a single source, located either at the input or at the output of the probabilistic inference process used to update an internal model of the environment – the location of the hidden reward in the weather prediction task, or the reward probabilities associated with each choice option in the reversal learning task. However, these accounts fail to explain empirical observations, such as why decision variability grows linearly with the number of presented symbols in the weather prediction task, at a rate which greatly exceeds sensory variability [4], or why subjects make non-greedy decisions even in conditions where they observe, after each choice, the foregone reward associated with the unchosen option, so that there is equal uncertainty about chosen and unchosen options [8,9,10].

Figure 1
(a) Description of experimental paradigms studying dynamic inferences under uncertainty. Left: visual variant of the weather prediction task. Each trial consists of a sequence of oriented patterns, drawn from one of two generative probability distributions (sources A and B). At sequence offset, subjects are prompted to indicate the source from which they believed the oriented patterns were drawn. Right: restless variant of the reversal learning task. On each trial, subjects are asked to choose between two colored symbols (options), and then obtain the reward associated with the chosen option. The mean rewards associated with the two options drift continuously and randomly over time (thick lines represent the drifting mean rewards associated with the two options, whereas thin lines represent rewards sampled from the drifting means). (b) Contributions of distinct sources of errors to human behavioral variability in the weather prediction task (left) and the reversal learning task (right). Left: inference errors explain about 90% of the observed behavioral variability in the weather prediction task. Right: inference errors in reinforcement learning explain more than 60% of seemingly non-greedy decisions when the reward associated with the foregone option is not observed (partial outcome condition, left), and more than 85% when the foregone reward is observed and there is thus equal uncertainty about chosen and unchosen options (complete outcome condition, right). (c) Decomposition of human inference errors in terms of a computation bias-variance trade-off in the weather prediction task (left) and the reversal learning task (right). The bias term (left) corresponds to predictable errors across repetitions of the same trial, whereas the variance term (right) corresponds to unpredictable errors across trial repetitions – that is, computation noise. In the weather prediction task, the bias term is split into temporal biases (green), perceptual biases (blue), trial history biases (brown), and other unspecified biases (gray). Computation noise explains about two thirds of human inference errors in the two tasks.

These different effects can be readily explained by the presence of significant inference errors – that is, errors in the updating of the internal model of the environment (Figure 1b). In the weather prediction task [4], these cognitive errors accumulate across successive inference steps and explain why decision variability grows with the number of stimuli in the sequence. Strikingly, inference errors account for more than 85% of the observed decision variability in this task. This result means that subjects almost never choose a reward location which is not associated with the highest perceived probability of reward. Rather, they make substantial cognitive errors when inferring the most probable reward location based on the presented sequence of stimuli. In a ‘random walk’ variant of the reversal learning task where non-greedy decisions reduce the uncertainty about recently unchosen options, inference errors still make up more than 60% of these decisions [8]. This observation means that a substantial
fraction of these adaptive decisions is not driven by an explicit arbitration between exploration and exploitation, but by cognitive errors when inferring the reward probabilities associated with the different choice options. In other words, many of the non-greedy decisions labeled as ‘exploratory’ (uncertainty-minimizing) when assuming noise-free inference reflect ‘exploitative’ (reward-maximizing) decisions based on reward probabilities misestimated because of inference noise. Accounting for inference errors during reward-guided learning also explains the non-greedy decisions observed when foregone rewards are presented together with obtained rewards, that is, in conditions where these decisions have no adaptive value in terms of reward maximization.

Computation bias-variance structure of human inference errors
The surprisingly large contribution of inference errors to human decision variability requires qualifying their nature. The statistical signatures of inference errors differ from those of stochastic choice policies (e.g. softmax, Thompson sampling): inference errors committed at one point in time corrupt (in a normative sense) the internal model of the environment and thus propagate forward, within a sequence of stimuli in the weather prediction task [4], or across successive trials in sequential reinforcement paradigms such as the reversal learning task [8]. In contrast to stochastic choice policies [11], inference errors do not reflect a sampling-based ‘read-out’ of the internal model of the environment, but fluctuations of the internal model itself.

A key question concerns the statistical structure of inference errors, in particular whether they produce random (unpredictable) or systematic (predictable) variability in behavior, a decomposition known as the ‘bias-variance’ trade-off. Because inference errors are defined as deviations from optimal (or near-optimal) inference [4,8], they may be caused by different cognitive biases known to affect probabilistic inference under uncertainty. For example, evidence integration shows idiosyncratic temporal biases (e.g. primacy or recency) that can produce large inference errors [12–15]. Similarly, confirmation biases and other forms of ‘trial history’ effects have been reported across decision domains [16–19]. A fundamental difference between such cognitive biases and computation ‘noise’ (that is, stochastic variability in the updating of the internal model of the environment) is that biases should produce correlated inference errors across repetitions of the same trial [4,8,20]. Instead of performing an exhaustive search of all possible cognitive biases [21], leveraging the consistency of human decisions across repeated trials [22] revealed that random variance accounts for 65% of inference errors across perceptual and reward-guided decisions [4,8] (Figure 1c). In other words, about two thirds of inference errors arise from the limited precision of underlying computations rather than from biased (systematically wrong) inference [23].

In volatile environments where the state of the environment changes over time, computation noise shows a scaling variance which matches Weber’s law of perceptual discrimination, prevalent in numerous sensory modalities. Across different variants of the reversal learning task [8,24], computation noise grows with the prediction error between observed and expected reward: the ‘temporal difference’ (surprise) signal which drives model updating in reinforcement learning (RL) [25]. This Weber scaling structure of computation noise is not observed in stable environments where the state of the environment is uncertain but fixed, as in the weather prediction task [4]. These differences may be due to the distinct roles of surprise for learning in these two types of uncertain environments. Inferring the state of volatile environments constitutes an estimation task where surprise indicates a possible change in the current state of the environment, and is thus the primary signal for updating the internal model of the environment [26–28]. By contrast, inferring the state of stable environments relies on the integration of uncertain information over much longer (ideally infinite) time constants [29]. The relevance of surprise for inferring the state of a stable environment decays over time as the amount of integrated information grows, and optimal inference does not require computing surprise to update the internal model of the environment. Future research should further examine the relation between surprise, model updating and computation noise.

Despite these fine-grained differences, computation noise accounts for a dominant fraction of inference errors across perceptual and reward-guided decisions, two canonical types of decisions studied and theorized using different paradigms and cognitive models [4,8]. This pattern of findings sets the limited precision of probabilistic inference as an upper bound on the accuracy and trial-to-trial consistency of human decisions made in uncertain environments.

Constraints of computation noise on learning and decision-making
Computation noise, defined above as random variability in the updating of the internal model of the environment (the location of the hidden reward in the weather prediction task, or the reward probabilities associated with each choice option in the reversal learning task), can have very different substrates in the brain. Computation noise can reflect genuine neural noise such as stochastic synaptic release at the cellular level [30], random fluctuations in the tight excitation-inhibition balance required by neural populations to perform precise computations [31,32], or the variable pooling of task-relevant neural responses by top-down attention across cortical hierarchies [33]. But
computation noise can also arise from task-independent (background) input to neural circuits implementing probabilistic inference [34,35,36], that is, ‘effective’ noise which may not be random in an absolute sense, but generates trial-to-trial variability in the updating of the internal model. Irrespective of its precise substrates, this type of internal noise is widely seen as a performance-limiting constraint which neural systems have evolved to cope with using ‘efficient’ mechanisms.

At the neural level, correlated noise across neurons (which does not average out by pooling neural responses) has been theorized [37] and recently shown [38,39] to constrain the precision of neural representations. Because this correlated noise aligns strikingly well with the neural ‘dimensions’ (in population space) which predict decisions on a trial-by-trial basis [40], its impact on task performance is expected to be substantial, consistent with the large contribution of computation noise to decision variability. In agreement with this view, correlated noise has been shown to be the prime target of top-down attentional modulation [41–43], which decreases pairwise correlations between sensory neurons tuned to attended locations or features, and increases the gain of neural representations at the population level [44]. At the theoretical level, attention has been described as an evolved mechanism for approximate inference which can increase the precision of task-relevant computations in the presence of limited processing resources [45].

Other neural constraints which may give rise to computation noise include the multiplexing of several cognitive tasks by context-dependent computations in shared neural circuits in parietal and prefrontal cortices [36,46]. Concurrent, irrelevant input that has not been fully suppressed by context-dependent computations would produce task-independent variability with the same statistical signatures as the computation noise observed in the weather prediction task [4].

In terms of cognition, different lines of research have identified ‘efficient coding’ mechanisms for dealing with external and internal sources of noise [47]. Human perceptual biases running opposite to prior expectations (in apparent contradiction with statistical inference) can be explained by efficient coding principles [48]. Similarly, the variability and occasional irrationality of human preference-based decisions may arise from strikingly similar mechanisms applied to value signals [49]. The notion of limited processing resources, which lies at the heart of efficient coding theories, has recently been instantiated as a finite number of ‘particles’ in a sampling-based description of statistical inference [50,51–53]. As shown by these two examples, certain cognitive biases may be the consequence of computation noise. Sampling-based computations in a hierarchical inference circuit can produce either primacy or recency effects depending on the main source of uncertainty [15]. More complex, ‘winner-take-all’ biases in information integration may have emerged to mitigate the impact of computation noise on performance [54].

Together, these recent behavioral findings blur the line between bias and variance in the statistical sense [23]: certain cognitive biases may correspond to direct mechanistic consequences of computation noise, while others may guard against the variability and suboptimality triggered by computation noise. In either case, this deep interplay between cognitive bias and variance emphasizes the importance of characterizing computation noise for understanding even seemingly unrelated aspects of human cognition [20,22,23].

Regulation of computation noise by noradrenergic neuromodulation
The large contribution of computation noise to human learning and decision-making suggests that neural systems should be involved in its active regulation [4,8]. Previous research which did not explicitly consider computation noise has identified the anterior cingulate cortex (ACC) as a selective neural correlate of behavioral variability in uncertain and volatile environments which require arbitrating between exploration and reward maximization [55,56]. Accounting for the presence of computation noise in reinforcement learning has revealed that trial-to-trial fluctuations in ACC activity covary with variability in the updating of choice values based on obtained rewards, even when this variability does not result in a behavioral switch (i.e. overt exploration) on the subsequent trial (Figure 2a) [8]. This finding indicates that the phasic ACC activity observed preceding exploratory choices may reflect a genuine resetting of the internal model of the environment rather than a temporary ‘release’ of the reliance on the internal model for guiding behavior.

Besides the ACC, the magnitude of computation noise in the updating of choice values also correlates robustly with trial-to-trial fluctuations in pupil dilation [8], a non-invasive proxy of locus coeruleus-norepinephrine (LC-NE) activity [57,58,59]. Noradrenergic neuromodulation has been linked to the regulation of arousal, but also to strategic exploration through the adaptive gating of neural variability in the ACC and other decision circuits in the prefrontal cortex [60]. In volatile environments, pupil dilation reflects not only the updating of the internal model following surprising events [61], but also the variability of the resulting behavior [62]. Accounting for the presence of computation noise in similar conditions showed that pupil-linked variability in behavior arises from computation noise rather than from fluctuations in the exploration-exploitation trade-off (Figure 2b), an effect observed even when there is equal uncertainty about chosen and unchosen options [8]. Recent simultaneous recordings from the LC and the ACC have further shown that phasic LC activation increases pairwise correlations between ACC neurons [63], in a way that resembles the attentional modulation of correlated noise in visual cortex [41–44].

Figure 2
Panels: (a) anterior cingulate cortex (ACC); (b) phasic pupil dilation; (c) reanalysis of Jepma et al., 2010: contributions to behavioral variability; (d) computation noise estimates. Axis labels: parameter estimate (a.u.); posterior density.
(a) Regression of trial-to-trial fluctuations of computation noise with BOLD activity in the anterior cingulate cortex (ACC) locked to the onset of the choice period. Left: the correlations of ACC activity with computation noise (blue) and prediction error triggered by the previous reward (orange) emerge before choice onset and follow similar time courses. Right: fluctuations of ACC activity predict trial-to-trial changes in behavioral variability to the same extent in the partial outcome condition (left bar) and the complete outcome condition (right bar) where exploration has no adaptive value in terms of reward maximization. (b) Regression of trial-to-trial fluctuations of computation noise with phasic pupil dilation locked to the onset of the choice period. Left: the correlation of pupil dilation with learning noise (blue) emerges before choice onset and precedes the correlation of pupil dilation with choice value (purple). Right: like ACC activity, fluctuations of pupil dilation predict trial-to-trial changes in behavioral variability to the same extent in the partial and complete outcome conditions. (c) Reanalysis of behavioral data from a human pharmacological study of the effect of reboxetine (noradrenaline reuptake inhibitor) on reinforcement learning in a restless four-armed bandit task. Contribution of inference errors in reinforcement learning to behavioral variability for the placebo group (left) and the reboxetine group (right). Computation noise in reinforcement learning explains a larger fraction of behavioral variability in the reboxetine group (80%) than in the placebo group (53%). (d) Posterior distributions of computation noise estimates for the placebo group (gray) and reboxetine group (dark blue). Reboxetine increases the magnitude of computation noise during reinforcement learning.

The idea that noradrenergic neuromodulation may regulate computation noise rather than the exploration-exploitation trade-off is also supported by causal pharmacological manipulations of the LC-NE system in humans. Small doses of atomoxetine, a noradrenaline reuptake inhibitor, increase the rate of endogenous perceptual alternations of bistable stimuli [64], in a task which does not involve any form of exploration. Furthermore, small doses of reboxetine, another noradrenaline reuptake inhibitor, do not produce exploratory behavior [65] but seem to increase the magnitude of computation noise in a preliminary reanalysis of this dataset using a reinforcement learning model which accounts for the presence of computation noise (Figure 2c,d) [8,66]. Future research should validate this preliminary finding and further investigate the possible involvement of the noradrenergic system in the active regulation of computation noise.

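
The review does not spell out the reinforcement learning model with computation noise, so the sketch below is only a minimal illustration under stated assumptions, not the model used in [8]. It assumes a delta-rule learner whose value update is corrupted by Gaussian noise with a standard deviation proportional to the size of the update (one simple way to express the Weber scaling with prediction error described above), a greedy (argmax) choice policy, and a two-armed bandit in the complete outcome condition, where the rewards of both options are observed on every trial. The names `noisy_delta_update` and `fraction_flipped_choices`, the noise-scaling parameter `zeta`, and all default values are hypothetical. The function counts how often the noisy learner's greedy choice differs from that of a noise-free learner fed the same outcomes, that is, choices which would look ‘non-greedy’ when assuming noise-free inference.

```python
import random


def noisy_delta_update(value, reward, alpha, zeta, rng):
    """Delta-rule update corrupted by Weber-scaled computation noise.

    Hypothetical parameterisation (an assumption, not the published model):
    the noise standard deviation is zeta * |alpha * delta|, so updates
    triggered by larger prediction errors are also noisier.
    """
    delta = reward - value              # prediction error (surprise)
    update = alpha * delta              # noise-free update
    noise_sd = zeta * abs(update)       # Weber scaling of the noise
    return value + update + rng.gauss(0.0, noise_sd)


def fraction_flipped_choices(zeta, n_trials=2000, alpha=0.3,
                             p_reward=(0.6, 0.4), seed=1):
    """Fraction of trials where a noisy learner's greedy choice differs
    from that of a noise-free learner receiving the same rewards."""
    rng = random.Random(seed)
    exact = [0.5, 0.5]   # noise-free value estimates
    noisy = [0.5, 0.5]   # value estimates corrupted by computation noise
    flips = 0
    for _ in range(n_trials):
        # Greedy (argmax) choices; ties go to option 0 for both learners.
        choice_exact = 0 if exact[0] >= exact[1] else 1
        choice_noisy = 0 if noisy[0] >= noisy[1] else 1
        flips += int(choice_exact != choice_noisy)
        # Complete outcome condition: the rewards of both options are shown,
        # so both value estimates are updated on every trial.
        rewards = [1.0 if rng.random() < p else 0.0 for p in p_reward]
        for k in range(2):
            exact[k] += alpha * (rewards[k] - exact[k])
            noisy[k] = noisy_delta_update(noisy[k], rewards[k], alpha, zeta, rng)
    return flips / n_trials


if __name__ == "__main__":
    for zeta in (0.0, 0.25, 0.5, 1.0):
        print(zeta, fraction_flipped_choices(zeta))
```

With `zeta=0.0` the two learners remain identical and the fraction is exactly zero; any positive `zeta` yields choices that look non-greedy under a noise-free model even though the choice rule contains no exploration term. Because the noise is injected into the value update rather than into the choice rule (as a softmax would do), each error persists and decays over later trials, which is the propagation property that distinguishes inference errors from stochastic choice policies.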

Emergent benefits of computation noise for learning and decision-making
The research discussed above describes computation noise as an important constraint on neural function and cognition, whose impact on performance may be controlled by a specific neuromodulatory pathway. A remaining puzzle concerns the reasons for its very existence. If computation noise drives such a large fraction of human behavioral variability and suboptimality in uncertain environments, why hasn’t it been suppressed or reduced to a larger extent during evolution?

A first possible answer is that computation noise may optimize a trade-off between the marginal payoff of a computation and the cost associated with performing the computation at a certain precision [8]. The ACC has recently been hypothesized to arbitrate such a cost-benefit trade-off by computing an ‘expected value of control’, defined as the difference between an expected payoff and its associated cost in terms of cognitive conflict [67]. The cost associated with a computation is likely to grow with its precision, such that the ACC may optimize this computation cost-benefit trade-off by regulating computation noise [8] (possibly through bidirectional connectivity with the LC-NE system [58]). This proposal provides a natural explanation for the existence of computation noise, but also makes testable predictions. In particular, computation noise should increase in highly volatile conditions where the marginal payoff of maintaining a precise internal model of the environment decreases, a prediction which has recently been validated experimentally [24]. The Weber scaling structure of computation noise also drives transient increases in behavioral variability following surprising events by resetting the current state of the internal model [8,24].

[…] behavioral variability when confronted with a computer-simulated competitor that is capable of predicting their upcoming choices [73,74].

Finally, and more generally, computation noise may produce beneficial stochasticity for the attractor-like dynamics of neural circuits implementing probabilistic inference [34,35,36,46]. Their dynamics are typically low-dimensional and converge on strong attractor states, such that computation noise could serve to destabilize these attractor states and allow for more flexible transitions between them. Such flexibility is particularly welcome in uncertain and volatile environments where the state of the environment can change unpredictably, and the internal model (reflected in the current attractor state of the circuit) needs to be updated following each change. This flexibility is typically implemented in cognitive models through explicit sophistication (e.g. hierarchical inference) [6,75,76]. Whether computation noise may provide such cognitive flexibility at zero cost therefore constitutes another important open question for future research.

Conflict of interest statement
Nothing declared.

Acknowledgements
We thank Marieke Jepma and Sander Nieuwenhuis for sharing the behavioral data of their pharmacological study, and Vasilisa Skvortsova for useful discussions. This work was supported by a starting grant from the European Research Council (OPTIMIZERR, ERC-StG-759341) awarded to V.W., and an institutional grant from the Agence Nationale de la Recherche (FrontCog, ANR-17-EURE-0017) awarded to the Département d’Études Cognitives of the École Normale Supérieure.
