
PERSPECTIVES

If deep learning is the answer, what is the question?

Andrew Saxe, Stephanie Nelli and Christopher Summerfield

Abstract | Neuroscience research is undergoing a minor revolution. Recent advances in machine learning and artificial intelligence research have opened up new ways of thinking about neural computation. Many researchers are excited by the possibility that deep neural networks may offer theories of perception, cognition and action for biological brains. This approach has the potential to radically reshape our approach to understanding neural systems, because the computations performed by deep networks are learned from experience, and not endowed by the researcher. If so, how can neuroscientists use deep networks to model and understand biological brains? What is the outlook for neuroscientists who seek to characterize computations or neural codes, or who wish to understand perception, attention, memory and executive functions? In this Perspective, our goal is to offer a road map for systems neuroscience research in the age of deep learning. We discuss the conceptual and methodological challenges of comparing behaviour, learning dynamics and neural representations in artificial and biological systems, and we highlight new research questions that have emerged for neuroscience as a direct consequence of recent advances in machine learning.

Recent years have seen a dramatic resurgence in optimism about the progress of artificial intelligence (AI) research, driven by advances in deep learning1. ‘Deep learning’ is the name given to a methodological toolkit for building multilayer (or ‘deep’) neural networks that solve challenging problems in supervised classification2, generative modelling3 or reinforcement learning4,5. Neuroscience and AI research have a rich shared history6, and deep networks are now increasingly being considered as promising theories of neural computation. The recent literature is studded with comparisons of the behaviour and activity of biological and artificial systems7–21, summarized in a growing number of review articles22–30.

In this Perspective, we assess the opportunities and challenges presented by this new wave of intellectual synergy between neuroscience and AI research. We begin by considering the recent proposals that have sought to reframe neural theory as a deep learning problem. We assess extant evidence that deep networks form representations or exhibit behaviours in ways that resemble biological agents and consider a host of new questions, inspired by deep learning, that neuroscientists are only just beginning to address. In doing so, we highlight specific falsifiable hypotheses that often underpin deep learning models, drawing on the domains of perception, memory, inference and control processes. We point to the limits of correlating representations of brains and complex deep learning architectures, and argue for a focus on learning trajectories and complex behaviours. Finally, we ask how deep network theories can provide explanation and understanding, by drawing on recent research that is beginning to develop mathematical descriptions of network learning dynamics and behaviour. In doing so, we argue that deep networks can and should be used to provide a new generation of falsifiable theories of how humans and other animals think, learn and behave.

Neoconnectionism?
The idea that neural networks can serve as theories of neural computation is not new. During the parallel distributed processing movement of the 1980s, psychologists and computer scientists proposed neural networks as solutions to key problems in perception, memory and language31. Contemporary deep networks resemble scaled-up connectionist models, and recent advances in machine learning are also heavily indebted to the ubiquity of digital data and the relatively low cost of computation in the twenty-first century26. It might thus be tempting to dismiss current excitement around deep learning models for neuroscience as a rehashing of earlier ideas, owing more to the slow churn of scientific fashion than to genuine intellectual progress. However, many researchers believe that deep learning models have the potential to radically reshape neural theory, and to open new avenues for symbiotic research between neuroscience and AI research23,32–34. This is because contemporary deep networks are grounded in quasi-naturalistic sensory signals (such as image pixels13 or auditory spectrograms15) that allow them to perform tasks of far greater complexity than was previously possible. Contemporary deep networks can thus learn ‘end-to-end’ (that is, without researcher intervention) in a sensory ecology that resembles our own: natural sounds and scenes for supervised learning and generative modelling, and 3D environments with realistic physics for deep reinforcement learning. This advent of end-to-end models of biological function has enabled researchers to attempt to model, for the first time, the de novo emergence of neural computations that can solve real-world problems.

Networks capable of high performance on complex real-world tasks have enabled a host of recent advances at the intersection of machine learning and neuroscience. For example, one major line of research has examined the representations formed by supervised deep networks that are trained to label objects in natural scenes2 (Fig. 1).



Fig. 1 | Representational equivalence between neural networks and the primate brain. This figure summarizes evidence for representational correspondence between deep networks and biological brains. a | Left: schematic illustration of simple and complex cell receptive fields from mammalian primary visual cortex (V1). Right: example filters learned in the first hidden layer of a deep convolutional neural network (CNN)2. b | Representational similarity analysis is a method by which the similarity of the population response to each stimulus (images of faces, bees, leaves and balls in this example) can be assessed. Example representational similarity matrices illustrating the similarity (blue indicating similar and red indicating dissimilar) in population activity evoked by objects in early visual areas of the primate brain (left; recorded with electrophysiology) and in the intermediate layers in a deep CNN14 (right). c | Hypothetical neural firing rates in response to a series of natural images (dark blue trace) and corresponding hypothetical activity predicted as a linear transform of the neural network activity (light blue trace)13. d | Representational similarity matrices as in part b but comparing inferior temporal cortex (IT) with the final layers of a CNN14. e | Illustration of the relationship between variance explained in IT signals and classification accuracy for pseudo-randomly generated neural networks that are trained to maximize either classification performance (light blue line) or explanation of variance in neural signals (dark blue line)13. f | Left: state space analysis of neural signals from macaque lateral intraparietal area (LIP) recorded during a dot motion categorization task. Red and blue lines show different motion directions belonging to opposing categories; trajectories are plotted in two dimensions. Right: same analysis conducted on hidden units of a recurrent neural network (RNN)45. g | Left: state space analysis performed on neural signals recorded from macaque dorsomedial prefrontal cortex (DMPFC) during performance of a long-interval or short-interval reproduction task, plotted in three dimensions. Right: same analysis conducted on hidden units of an RNN48. Part a adapted with permission from ref.2, Neural Information Processing Systems. Parts b and d adapted from ref.14, CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). Panel c left insert credit: Wolfgang Flamisch/Getty. Panel c middle insert credit: KDP/Getty. Part f adapted with permission from ref.45, Elsevier. Part g adapted with permission from ref.48, Elsevier.

A striking observation is that biologically plausible neural representations can emerge in networks that combine gradient descent with a handful of simple computational principles29 (gradient descent is a training method where weights are adjusted incrementally to encourage the network outputs towards an objective). When deep networks are endowed with properties including local connectivity, convolutions, pooling and normalization, the early layers acquire simple filters for orientation and spatial frequency2, just like neurons in the primary visual cortex (Fig. 1a), whereas in deeper layers, the distributions and similarity structure of neural representations for objects and categories resemble those in the primate ventral stream12–14,19 (Fig. 1b,d). Notably, representational equivalence may be stronger in networks that perform object recognition more accurately13 (Fig. 1e). One corollary of these findings is that the sophisticated behaviours and structured neural representations observed in humans and other animals might emerge from a limited set of computational principles, as long as the input data are sufficiently rich and the network is appropriately optimized25,35.
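To make the parenthetical definition concrete, the following is a minimal sketch of gradient descent in a small two-layer network (our illustration, not code from the cited studies); the toy data, architecture and learning rate are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy supervised problem: map 100 random "stimuli" to 2-dimensional targets.
X = rng.standard_normal((100, 10))
Y = rng.standard_normal((100, 2))

# A small two-layer network with a tanh non-linearity.
W1 = 0.1 * rng.standard_normal((10, 32))
W2 = 0.1 * rng.standard_normal((32, 2))
lr = 0.05  # learning rate

for step in range(500):
    H = np.tanh(X @ W1)            # hidden-layer activity
    Y_hat = H @ W2                 # network output
    err = Y_hat - Y                # mismatch with the objective
    # Gradient descent: adjust each weight a small step down the error slope.
    dW2 = H.T @ err / len(X)
    dW1 = X.T @ ((err @ W2.T) * (1 - H ** 2)) / len(X)
    W2 -= lr * dW2
    W1 -= lr * dW1

print("final mean squared error:", float((err ** 2).mean()))
```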


A deep learning framework
This claim has potentially profound implications for neuroscience. It has already prompted calls for systems neuroscientists to refrain from building theories that impose intuitive functional importance on neural circuits by fiat, and instead to study the computations that emerge spontaneously during the training of deep networks23,32–34. One such enjoinder33 that invokes the term ‘deep learning framework’ encourages researchers to avoid explicit characterizations of neural computation (for example, how simulated neurons with handcrafted tuning curves and hand-engineered network connectivity might implement a certain function such as object recognition). Instead, it proposes that the role of the researcher is to specify the overall network architecture, the learning rule and the cost function; control is thus relinquished over the microstructure of computation, which instead emerges organically over the course of network training33. This proposal has raised the question of whether neural computation is sufficiently interpretable to be worth explaining at all. A related proposal draws an analogy between optimization over computation in neural networks and optimization over biological forms by evolution: in both cases, interpretable functional adaptations emerge without meaningful constraints being imposed on the search process32. In other words, it has been claimed that neural systems are fundamentally uninterpretable, and that structured theories of perception and cognition are ‘just-so stories’ that reflect more closely the researcher’s quest for meaning than the reality of neural computation32. We consider this view in more detail herein.

The claim has also been made that modelling brains as neural networks relieves researchers of the burden of exhaustively documenting and interpreting the coding properties of single neurons33. As methodological advances have permitted simultaneous recordings from large numbers of neurons36, a doctrine has emerged according to which neural representation is dynamically multiplexed across populations37. From this perspective, single neurons code for multiple experimental variables and their interactions38–41, exhibiting non-linear mixed selectivity. Although a focus on population coding emerged independently of the growing interest in deep learning, mixed selectivity is often (but not always)42 a hallmark of coding in deep network models4. In the brain, this tendency seems to be most pronounced in higher cortical areas, such as the parietal cortex and prefrontal cortex, that support working memory and action selection38–40. In these regions, the coding properties of single neurons can be highly heterogeneous and vary in mystifying ways over the course of a given trial38,39,43. However, when neural activity is examined at the population level — for example, using dimensionality reduction — neural patterns emerge that meaningfully distinguish experimental variables44,45.

Another key observation is that these patterns of population activity can be recreated when the same analysis is applied to unit activations in recurrent neural networks trained to evaluate time-varying decision evidence44–46 (Fig. 1f), judge the length of a time interval47,48 (Fig. 1g) or maintain information over a delay period49–51. Accordingly, deep recurrent neural networks that have been trained from scratch are increasingly being proposed as computational theories for sensorimotor integration and working memory that go beyond existing models with more ‘handcrafted’ features, such as attractor models. In the domain of working memory, a particularly interesting new line of research has used recurrent networks to address a key question in systems neuroscience, namely when codes for stored information should be static or dynamic49. This work has contributed to claims that it is futile to characterize the coding properties of individual cells or to infer how they participate in computation38. Instead, it is argued, the computational model is explainable only at the aggregate level of the population, which is ultimately driven by the structure of the network and the way it is optimized.
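As an illustration of this population-level approach, here is a minimal sketch (ours, with synthetic data standing in for neural recordings or RNN unit activations): single-unit traces are mixed and heterogeneous, but projecting onto the first principal components recovers trajectories that separate the experimental conditions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic population: 200 "neurons" x 50 time points for two task conditions.
time = np.linspace(0, 1, 50)
latent = np.stack([np.sin(2 * np.pi * time), np.cos(2 * np.pi * time)])
mixing = rng.standard_normal((200, 2))      # each neuron mixes the latents
cond_a = mixing @ latent + 0.5 * rng.standard_normal((200, 50))
cond_b = mixing @ (latent * [[1.0], [-1.0]]) + 0.5 * rng.standard_normal((200, 50))

# PCA via SVD of mean-centred data; individual units look mixed, but the
# two-dimensional projection separates the conditions' state-space trajectories.
data = np.hstack([cond_a, cond_b])
data = data - data.mean(axis=1, keepdims=True)
pcs, _, _ = np.linalg.svd(data, full_matrices=False)
traj_a = pcs[:, :2].T @ cond_a              # condition A trajectory
traj_b = pcs[:, :2].T @ cond_b              # condition B trajectory
print("mean trajectory separation:", float(np.abs(traj_a - traj_b).mean()))
```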
exhibiting non-​linear mixed selectivity. semantically meaningful representations or
Although a focus on population coding From framework to hypothesis exhibit complex behaviours, as humans and
emerged independently of the growing The deep learning framework proposes other animals do. This endeavour sounds
interest in deep learning, mixed selectivity is powerful new tools for modelling the suspiciously similar to contemporary AI
often (but not always)42 a hallmark of coding bewildering volumes of data that are research itself. The deep learning framework
in deep network models4. In the brain, this now routinely recorded in systems seems to break with a long tradition
tendency seems to be most pronounced in neuroscience laboratories. However, of searching for explanations of neural
higher cortical areas, such as the parietal we hope that enthusiasm for deep networks computation in biological brains. Rather,
cortex and prefrontal cortex, that support as computational models will be tempered it seems to propose sweeping away existing
working memory and action selection38–40. with a sober consideration of how they can knowledge about how specific classes of
In these regions, the coding properties of be usefully deployed to understand neural computation underpin behaviour, merging
single neurons can be highly heterogenous mechanism and cognitive function. That the goals of theoretical neuroscience with
and vary in mystifying ways over the course is, if deep learning is the answer, what are those of contemporary AI research.
of a given trial38,39,43. However, when neural the questions that neuroscientists should We recognize the promise of the deep
activity is examined at the population ultimately be asking? learning framework and are excited about




Fig. 2 | Emerging methods for comparing deep learning and the brain. a | Comparing representational change throughout learning. Top: over the course of learning and development, behaviour may systematically improve (schematized here as reduction in error on a task). Bottom: experiments can track how neural representations change during learning (schematized as representational similarity analysis (RSA) matrices in inferior temporal cortex (IT) at three time points; top row), and whether these changes are predicted by deep networks trained using specific learning rules (schematized as RSA matrices of a convolutional neural network (CNN); bottom row). Comparing learning trajectories can help assess whether the learning procedure, rather than only the final representations, in deep neural networks is similar to that in the primate brain. b | Finer-grained comparisons of behaviour. Top: measuring discriminability of one image against distractor objects isolates behavioural variance that is specifically image-driven but not predicted by the object. Patterns of confusion among individual images are shared by humans (y axis) and macaques (primate zone) but not deep networks. Light blue bars show the human-performance consistency of models based on low-level visual representations, while dark blue bars show human-performance consistency of publicly available deep neural networks (VGG is a model developed by the Visual Geometry Group at the University of Oxford, NYU is a model developed at New York University)11. Bottom: classification performance of ResNet-50 trained from scratch on ImageNet (a database of images of objects) is close to perfect when it is trained and tested on standard colour images (left) and when it is trained and tested on images with additive uniform noise (middle). However, when it is trained on images with salt-and-pepper noise and tested on images with uniform noise (right), performance is at chance even though the noise types do not seem different to human observers9. This is one way in which humans generalize better than current deep networks. c | Causal tests of deep learning models. One approach, illustrated schematically here, uses ‘closed-loop’ experimental designs to test the predictive power of deep networks57,58,149. In one study149, natural images were presented to mice while evoked neural activity (dark blue traces in the top-right panel) was recorded, and a deep neural network was trained to predict this activity (the light blue trace illustrates correspondence between unit 1 and neuron 1 in the bottom-right panel). The deep network was then used to compute a maximally exciting input image (MEI) that strongly activates that particular neuron in the model (peach pathway). This MEI is then shown to the mouse, and the resulting neural response is measured (peach part of the top-right panel). If the deep network captures the mapping from pixels to neural response, the MEI should also strongly excite the biological neuron (neuron 1). Part b adapted from ref.11, CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). Part b adapted with permission from ref.9, Neural Information Processing Systems. Part c adapted from ref.149, Springer Nature Limited. Part c left insert (chairs) credit: Wolfgang Flamisch/Getty. Part c middle insert (mountain) credit: KDP/Getty.

We recognize the promise of the deep learning framework and are excited about the new possibilities offered by neural network models as theories of neural computation. We believe the strongest version of this framework will build on existing neural theories and maintain a pointed focus on explaining computation in biological brains. In other words, we hope that deep learning will provide not just a framework for neuroscience research but also a set of explicit hypotheses about behaviour, learning dynamics and neural representation in biological networks.

Deep networks as neural models
The deep learning framework is built on the proposal that neural networks learn representations and computations that resemble those in biological brains (Fig. 2). However, it is possible that the equivalence between deep networks and animal brains has been overstated22. Indeed, comparing the multivariate representations in brains and neural models is fraught with statistical challenges54. Currently, one popular approach is to learn a linear mapping from network units to neurons, and to evaluate the predictive validity of the resulting regression model in a held-out dataset. If this approach is adopted for image classification, the highest-performing deep networks can explain an impressive 60% of the variance in neuronal responses in the primate inferior temporal cortex. However, neural networks that perform substantially worse at image classification explain just 5% less55. Indeed, the difference in the accuracy of predictions of blood oxygen level-dependent signals between trained networks and untrained networks (that is, those with random weights) is quite small — on the order of 5–10% accuracy difference for most visual regions19. It is often forgotten that landmark studies on which claims of equivalent representations in deep networks and the brain are based actually used deep networks that were not trained with gradient descent13. It is not fully clear, then, whether existing evidence strongly separates deep learning from the more generic notion of computation in a densely wired multilayer network. Thus, an important goal for future research is going to be to more rigorously and systematically assess the status of the claim that deep networks and biological brains learn in similar ways, for example by measuring and comparing changes in representations over learning (Fig. 2a). Testing whether neural signals are a linear transform of model activations is a good start, but such a relationship could exist even if the neural patterns in brains and neural networks differ wildly in terms of sparsity or dimensionality. Stricter tests of shared coding are provided by methods that restrict the freedom of the mapping function, such as representational similarity analysis56, in which representations are characterized by the distance between population activity vectors evoked by different inputs. Representational similarity analysis discloses a superficial resemblance between brains and networks, but this agreement may be driven principally by shared similarity structure for stimuli that are physically similar, such as faces14. To go beyond correlations, systems neuroscience will need to use causal assays of the predictive links between artificial and biological networks, such as harnessing network activations for novel image synthesis57,58 (Fig. 2c). Such closed-loop experimental designs promise strong tests of the mapping between artificial and biological brains.
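To make these comparison methods concrete, the sketch below (our own illustration, with synthetic placeholders standing in for network unit activations and recorded neurons) implements a ridge-regression encoding model evaluated on held-out stimuli, a representational similarity comparison, and the closed-form analogue of the maximally exciting input for a linear model.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(2)

# Placeholders: 200 stimuli, 500 network units, 80 recorded neurons whose
# responses are partly a linear function of the first 80 units, plus noise.
units = rng.standard_normal((200, 500))
neurons = units[:, :80] @ rng.standard_normal((80, 80)) / np.sqrt(80) \
          + 0.5 * rng.standard_normal((200, 80))

train, test = np.arange(150), np.arange(150, 200)

# 1) Linear encoding model: ridge regression from units to neurons,
#    scored by variance explained on held-out stimuli.
lam = 10.0
W = np.linalg.solve(units[train].T @ units[train] + lam * np.eye(500),
                    units[train].T @ neurons[train])
pred = units[test] @ W
r2 = 1 - ((neurons[test] - pred) ** 2).sum() / \
         ((neurons[test] - neurons[test].mean(0)) ** 2).sum()
print("held-out variance explained:", round(float(r2), 3))

# 2) Representational similarity analysis: correlate the two representational
#    dissimilarity matrices, with no fitted mapping between the two systems.
rdm_units = pdist(units[test], metric="correlation")
rdm_neurons = pdist(neurons[test], metric="correlation")
print("RSA (Spearman):", round(float(spearmanr(rdm_units, rdm_neurons).correlation), 3))

# 3) Closed-loop direction: for this linear model, the unit pattern that
#    maximally excites neuron 0 at fixed norm is simply its weight vector;
#    deep non-linear models instead require gradient ascent on the input (Fig. 2c).
mei_0 = W[:, 0] / np.linalg.norm(W[:, 0])
```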
Another way to test the equivalence between biological and artificial systems is to study their patterns of response. This is vital because the computations in a neural system can often profitably be understood in the context of the behaviour they produce59. Revealingly, humans and machines make strikingly different sorts of errors in assays of object recognition. In one study, networks were prone to confuse object classes that humans and even monkeys could safely tell apart, such as dogs and guitars, and patterns of confusion among individual images were shared by humans and macaques, but not deep networks11 (Fig. 2b, top). Similarly, humans generalize far better than deep networks to images that have been perturbed by the addition of pixelated noise, or bandpass filtering9 (Fig. 2b, bottom), and are less prone to be fooled by deliberately misleading images7,21,22. There is a widespread view that biological vision exhibits a robustness that is currently lacking in supervised deep neural networks.

More generally, animal behaviour is richly structured, in theory permitting researchers to make systematic comparisons with machine performance60. For example, animal decisions are subject to stereotyped biases, but are also irreducibly noisy61; animals are flexible but forgetful, behaving as if memory and control systems were capacity limited62; and the rate and effectiveness of information acquisition depend strongly on the structure and order of study materials63. Mature theories of biological brain function should be able to account for these phenomena, and we hope that future deep network models will be held to this standard.


Thus far, we have argued that neural networks, and in particular modern tools from deep learning, have great potential to shape our theories of neural computation. However, we have offered two reasons to be cautious. First, we should take care not to overstate the extent to which existing experimental comparisons between deep networks and biological systems endorse deep learning as a framework for biology. Second, if we wish to use deep learning as a framework for neuroscience, it is important to be clear about what new research questions it allows us to ask. If we wish to adjust learning rules or architectures to model biological systems, where do we begin? What empirical phenomena might deep networks predict that conventional models from classical neuroscience might not? Which theories can we validate or falsify? In what follows, we take steps towards answering these questions.

Learning rules for perception
Perception provides a key opportunity to test several planks of the deep learning hypothesis. For instance, psychologists and neuroscientists have long debated the extent to which perceptual representations are prespecified by evolution or learned via experience64; as an example, whether primate face representations are innate or acquired remains controversial63,65. The deep learning hypothesis reframes this debate by asking whether neural codes could emerge from a learning principle applied to a relatively generic architecture and starting point. One strong candidate is supervised learning with gradient descent, in which representations are sculpted by feedback about the label, name or category associated with a sensory input29. As detailed earlier herein, supervised models have been a major focus of comparisons between deep networks and biology29. However, a long tradition in neuroscience emphasizes unsupervised principles such as Hebbian learning, or relatedly that representations are formed by a pressure to accurately predict the spatially or temporally local environment under an efficiency constraint66–68. Indeed, recent deep generative models show a remarkable ability to disentangle complex, high-dimensional signals into their latent factors under this latter self-supervised objective69,70. By contrast, a successful AI model that has yet to impact neuroscience proposes instead that representation formation is driven by the need to accurately predict the motivational value of experience5. It remains to be seen whether some combination of these more complex learning mechanisms can account for the full diversity of perceptual neural responses across modalities without building in specific domain content.

A second proposal of deep learning that is in need of testing is end-to-end learning. One way to evaluate learning rules is to assess their ability to furnish deep networks with rich representations and complex behaviours when exposed to naturalistic data. However, this approach is challenging. As noted earlier, learning may only modestly improve the match to data, suggesting that deep network representations rather than deep learning might drive much of the correspondence. Moreover, standard supervised models, such as those described earlier that are popular for explaining primate object recognition, seem to require improbable quantities of labelled data — unlike human infants, who gain sophisticated object understanding even before language is acquired. Another challenge for networks trained with gradient descent is to identify a biologically realistic implementation — that is, one where updates are local and forward and backward connectivity in the network is not required to be symmetric. Although mechanisms adopted by machine learning researchers for assigning credit to individual synapses were once thought to be biologically implausible, we now have a growing set of candidate implementations in need of empirical tests71,72.

Given these difficulties, a more direct test of different learning principles could focus on how representations change during prolonged training. This opens the door to studies of perceptual learning that can attempt to confirm or refute these predictions73,74. For example, Fig. 3 shows the predictions of a neural network model trained to classify tilted gratings with gradient descent74. Extant neural and behavioural phenomena emerge seamlessly from the model, such as stronger sharpening of the tuning functions of the most informative neurons (Fig. 3b,c), earlier representational changes in higher cortical stages (that is, deeper layers) during training (Fig. 3d), greater proneness to transfer coarse rather than fine discrimination abilities to other non-trained stimuli (Fig. 3e) and the transfer of fine discrimination early but not late during training73,74.

Critically, other learning principles may make qualitatively different predictions (Fig. 3f). For example, under correlational Hebbian learning, one might predict that the most active (rather than most informative) neurons would change their tuning the most. Other learning principles such as contrastive Hebbian learning72 or predictive coding75, which involve feedback connectivity, might predict different distributions and timings of activity changes across layers. In this way, comparisons of learning trajectories can distinguish computational principles of learning even without specifying biological implementations. This approach opens the door to a new programme of experiments focused on measuring the dynamics of representational change across cortical stages during prolonged training using macroscopic imaging techniques such as functional MRI (fMRI) or wide-field calcium imaging.
Deep learning for cognition
Deep neural networks excel at classifying complex inputs into distinct classes such as objects or words. Equally important, however, is what our brains do next: we link objects and items into diverse knowledge structures that describe our world. We know, for example, that a dog can bark and that a maple is a type of tree. Moreover, we form semantic categories from multimodal features, connecting the written and spoken name for an object with its shape, odour and texture. This conceptual knowledge of the world transcends physical appearance, interlinking diverse and even unobservable object properties (for example, that a dog has a spleen). The abstractions we acquire over the course of development form the building blocks for flexible generalization and higher-level cognition in maturity76.

Evaluating deep learning insights beyond the realm of perceptual tasks is a key open opportunity for neuroscientists. The behaviour of humans and other animals is governed by a rich array of cognitive functions, including modular memory processes and attentional and task-level control, and neural systems for navigation, planning, mental simulation, reasoning and abstract inference. These cognitive functions are implemented in a regionally specialized brain, in which a patchwork of subcortical and allocortical structures interconnects with granular and infragranular cortical zones, each housing unique cell types and circuits. If we are committed to deploying deep learning models as theories for biology, we need to take seriously the question of how such elaborate structure in cognition and behaviour emerges via optimization.



Fig. 3 | Testing principles of learning using perceptual learning paradigms. a | A deep network model of perceptual learning74. Clockwise-oriented or anticlockwise-oriented visual inputs flow through layers of weights to an output layer that reports the direction of rotation. ht and hr denote the last hidden unit for the target and reference images, respectively; p(CW) is the probability that the target image is clockwise with respect to the reference. b | Tuning curve slope changes due to learning measured in primate primary visual cortex (V1)150. c | Tuning curve slope changes due to gradient descent learning in the model illustrated in part a74. d | Timing of peak synaptic changes in each layer. The weights at higher layers change earlier74. e | Behavioural performance transfer to different orientations (Ori) and spatial frequencies (SF) after training on tasks of different angular separation74. f | A schematized conceptual space of learning rules, in which specific learning rules are points. Experimental observations may span large regions of this space (coloured circles) and thus could be theoretically consistent with multiple learning rules. Intersecting many constraints can begin to narrow the set of candidate learning algorithms. TO, trained orientation. Parts a, c, d and e adapted from ref.74, CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). Part b adapted from ref.150, Springer Nature Limited.

How do humans learn abstract representations, divorced from physical object properties? How do we assemble knowledge into relational structures such as trees, rings and grids? How do we compose new behaviours from existing subcomponents? How do we rapidly acquire and generalize new memories? These are important questions for AI researchers as well, and indeed, some have expressed a hope that machine learning will soon offer more powerful models in which higher cognitive functions emerge naturally via a ‘blind search’ process, allowing neuroscientists to sidestep the problem of modelling them explicitly32. Indeed, recent advances in AI research have followed the successful fusion of deep learning with other methods, such as reinforcement learning5, content-addressable memory77,78 or Monte Carlo tree search4, demonstrating a proof of concept for end-to-end learning in complex cognitive architectures.


Neuroscientists can harness their familiar experimental toolkit to study cognition using deep networks, pressing towards more complex behaviours and exposing current limitations of the deep learning hypothesis. One potentially fruitful approach is to identify specific problems or tasks in which human performance and network performance are qualitatively different. For example, if networks struggle with one problem class whereas humans do not, one can use techniques from psychology and neuroscience to identify how humans solve the problem, and then ask how the learning rule or architecture may be varied to accommodate the same behaviour in deep networks. Experimental interrogation of interacting brain systems can help to identify constraints on network submodules that generate forms of information processing that resemble memory, attention or reasoning systems in biological agents. Thus, the research programme should build upon the work of past decades, in which neuroscientists have experimentally dissected cognitive systems, in many cases providing a detailed, computationally grounded account of their function. For example, we understand a great deal about the navigation system in the rodent medial temporal lobe (MTL)79, the motor system in songbirds80 and the saccadic system in the macaque81. With this foundation, deep learning approaches can help explain key computational trade-offs and underlying reasons for the specialized, interacting subsystems evident in biology.

Abstraction and generalization
Deep networks excel when data are abundant and training is exhaustive. However, they struggle to extrapolate this knowledge to new environments populated by previously unseen features and objects. Humans, by contrast, seem to generalize effectively28. For example, most people can navigate a foreign city where the language, coinage and customs are unfamiliar, because they understand concepts such as ‘greeting’, ‘taxi’ and ‘map’. A popular view is that deep networks fail to transfer knowledge because they do not form neural codes that abstract over physically dissimilar domains. Building deep networks that can generalize in this way would be a major milestone for machine learning. In parallel, this difference provides an incentive for neuroscientists to study how biological brains encode, compose and generalize abstract knowledge82–84.

Unfortunately, key methodological challenges arise for neuroscientists seeking to address this question. First, it is unknown whether experimental animals such as rodents and macaques (or even our closer primate cousins)85 have evolved neural mechanisms that permit the strong, flexible transfer of knowledge characteristic of human intelligence. It is thus unclear whether invasive tools for recording and interference (such as electrophysiology or optogenetics) can be used to study generalization and transfer in animals. Moreover, to study human abstraction, we are obliged to use macroscopic imaging methods such as fMRI, magnetoencephalography and electroencephalography that are less well suited to revealing how computation unfolds in neural circuits. Nevertheless, inventive new ways of using these tools are being developed, allowing researchers to probe replay86–88, changes in excitatory–inhibitory balance89,90 and hexagonal (grid) coding91,92 in human brain signals. Second, humans (and other animals) usually enter the laboratory with rich past experiences that sculpt the ways in which they learn. This complicates direct comparisons between humans and neural networks, because it is difficult to imbue artificial systems with equivalent priors, or to eliminate human priors using wholly novel stimuli. Third, humans and neural networks learn over very different timescales. For example, deep reinforcement learning systems exceed human performance on Atari video games but require many times more training than a human player to acquire this proficiency93.

In an end-to-end learning system, abstract representations need to be grounded in experience. One possibility is that lifelong exposure to huge volumes of sensory data might allow strong invariances to emerge naturally via either supervised or unsupervised learning. There is evidence that cells in the MTL, which sits at the apex of the primate ventral stream, develop physically invariant coding properties. For example, in humans, ‘concept’ cells encode famous individuals or landmarks, irrespective of whether they are denoted by pictures or words94. Echoes of this coding scheme in which MTL coding is tied more tightly to allocentric space can be seen in other animals. For example, in rodents, hippocampal place cells encode locations in a way that is invariant to viewpoint and heading direction95, and in primates, ‘schema’ cells activate across environments in a way that allows for generalization over common spatial configurations96. These neural codes for high-level concepts can form when different features, objects or locations are repeatedly associated in space or time, for example through Hebbian learning97. Indeed, fMRI studies of statistical learning have revealed that neural similarities (such as multivoxel pattern overlap) in the MTL recapitulate association strengths for pairs, lines, maps or hierarchies of stimuli94,98–104. Moreover, in a bandit task, the entorhinal cortex is one brain region where a consistent mapping exists between neural patterns and the covariance among stimuli and rewards, irrespective of the physical images involved105. Stitching together multiple patterns of association, and learning their structures, could enable animals to learn a comprehensive model of the world that can be used for navigation, inference and planning106.
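As a toy illustration of how Hebbian co-occurrence learning could turn temporal statistics into representational similarity (our construction, not a model from the cited studies): items visited along a random walk on a ring acquire overlapping codes whose pattern similarity is highest for neighbouring items.

```python
import numpy as np

rng = np.random.default_rng(4)

n = 12                                       # 12 stimuli arranged on a ring
codes = np.eye(n)                            # start from distinct one-hot codes
M = np.zeros((n, n))                         # Hebbian association matrix

state = 0
for _ in range(5000):                        # random walk: neighbours co-occur
    nxt = (state + rng.choice([-1, 1])) % n
    M[state, nxt] += 1e-3                    # strengthen co-active pairs
    M[nxt, state] += 1e-3
    state = nxt

# Blend each stimulus code with its learned associates.
learned = codes + M @ codes

# Pattern similarity now tracks ring distance: neighbours are most similar.
sim = learned @ learned.T
print("similarity of item 0 to items 1..6:", np.round(sim[0, 1:7], 3))
```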
In parallel with this growing emphasis on the virtues of model-based computation in neuroscience, machine learning researchers are building powerful deep generative models that are capable of disentangling the world into its latent factors, and recomposing these to construct realistic synthetic 3D images70,107,108. However, to date, wiring these generative models up with control systems to build intelligent agents has proved challenging, despite some promising efforts78. Indeed, AI researchers have struggled to build model-based systems that can hold their own against model-free agents in benchmark problems such as Atari games109. It is paradoxical that this is occurring against a rich backdrop of neuroscience research that emphasizes the virtues of model-based inference.

Neuroscientists have even begun to unravel how seemingly idiosyncratic coding properties in the MTL and other structures may be hallmarks of a normative scheme for computationally efficient planning and inference110,111. For example, grid cell codes in the medial entorhinal cortex and elsewhere may be signatures of a neural code that has learned the geometry by which space itself is structured, potentially supporting transfer learning for navigation111. There are even hints that this coding scheme may apply to non-spatial as well as spatial domains92, potentially laying the foundations for a theory of higher-order human reasoning112. Although machine learning researchers have noted that lattice-like codes may emerge when deep reinforcement learning systems are trained to navigate113,114, they have yet to build on these insights for building stronger AI. More generally, understanding how to simulate biologically plausible model-based computations in a way that is useful to machine learning researchers is a potentially rich intellectual seam that neuroscientists are only just beginning to exploit.

Resource allocation in learning
Humans and other animals continue to learn across their lifespan. This ‘continual’ learning might allow a human to acquire a second language, a monkey to adopt a new social role or a rodent to navigate in a novel environment.


This is in stark contrast to most current AI systems, which lack the flexibility to acquire new behaviours once they have achieved convergence on an initial task. Building machines that can learn continually, as humans and other animals do, is proving one of the thorniest challenges in contemporary machine learning research115. Fortunately, however, this question has opened up new avenues for neuroscience research focused on how biology may have solved continual learning8,116.

It has long been noted that, in neural networks, learning pursuant to an initial task A is often overwritten during subsequent training on task B (known as ‘catastrophic interference’)117. This occurs because a parameterization that solves task A is not guaranteed to solve any other task, and so during training on task B, gradient descent drives network weights away from the local minimum (that is, a setting that specifies a local optimum for behaviour) for task A. It occurs even when the network has sufficient capacity to perform both tasks, because simultaneous (or ‘interleaved’) training allows the discovery of a setting that jointly solves tasks A and B. In humans, new learning can sometimes degrade extant performance, for example when memorizing associate pairs A–C after having encoded pairs A–B, but in general interference effects are far less dramatic than for neural networks118.
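The phenomenon is easy to reproduce in simulation. The sketch below (ours; a linear network and random regression tasks are deliberately simple stand-ins) shows task A performance collapsing after sequential training on task B, while interleaved training finds a joint solution.

```python
import numpy as np

rng = np.random.default_rng(5)

# Two unrelated regression tasks that must share one linear network.
X_a, Y_a = rng.standard_normal((50, 20)), rng.standard_normal((50, 5))
X_b, Y_b = rng.standard_normal((50, 20)), rng.standard_normal((50, 5))

def loss(W, X, Y):
    return float(((X @ W - Y) ** 2).mean())

def train(W, X, Y, steps=500, lr=0.02):
    for _ in range(steps):
        W = W - lr * 2 * X.T @ (X @ W - Y) / len(X)   # gradient descent
    return W

W = train(np.zeros((20, 5)), X_a, Y_a)                # learn task A...
print("task A loss after training A:", loss(W, X_a, Y_a))
W = train(W, X_b, Y_b)                                # ...then train only on B
print("task A loss after training B:", loss(W, X_a, Y_a))   # interference

# Interleaving the two tasks instead finds a setting that serves both.
W_joint = train(np.zeros((20, 5)), np.vstack([X_a, X_b]), np.vstack([Y_a, Y_b]))
print("task A loss after interleaved training:", loss(W_joint, X_a, Y_a))
print("task B loss after interleaved training:", loss(W_joint, X_b, Y_b))
```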
One popular model suggests that mammals have evolved to solve continual learning by using complementary learning systems in the hippocampus and neocortex116,119,120. Unlike the cortex, the hippocampus can rapidly learn sparse (or ‘pattern-separated’) representations of specific experiences, often called ‘episodic memories’121, and these memories are replayed offline during periods of rest or sleep122. Hippocampal replay provides an opportunity for virtual interleaving of past and present experiences, potentially allowing memories to be gradually consolidated into neocortical circuits in a way that circumvents the problem of catastrophic interference. This theory is supported by a wealth of evidence, including the finding that hippocampal damage leads to a gradient of retrograde amnesia123, and reports of double dissociations between instance-based memory (or ‘recollection’) in the hippocampus and summaries of past experience (or ‘familiarity’) in the neocortex124. In more recent years, artificial replay of past experiences has emerged as a crucial factor that enables deep networks to exhibit strong performance in temporally correlated environments125, including deep reinforcement learning agents for dynamic video games5. Pleasingly, this has allowed theorists to draw a link between computational solutions to continual learning in biological intelligence and AI126. Adaptations of the complementary learning system framework allow it to account for seemingly contradictory phenomena, such as the involvement of MTL structures in rapid statistical learning116.

Although evidence that offline replay may be important for memory consolidation has grown, the problem of continual learning has prompted new questions for neuroscientists. Is biological learning actively partitioned so as to avoid catastrophic interference? Unlike neural networks, animals do not always benefit from interleaved study conditions (imagine learning the violin and the cello at once). For example, humans who have been trained in a blocked regimen to classify naturalistic stimuli (trees) according to two orthogonal boundaries (their ‘leafiness’ versus their ‘branchiness’) perform better on a later interleaved test compared with those who experienced the same conditions at training and test8. Other evidence from human category learning implies that human knowledge may be actively partitioned by time and context127,128. Indeed, promising solutions to continual learning in the machine learning literature rely on the identification of weight subspaces in which new learning is least likely to cause retrospective interference, for example by ‘freezing’ synapses that are more likely to participate in extant tasks129,130. These tools are more effective when coupled with a gating process that overtly earmarks neural subspaces for new learning, in a way that resembles top-down attention in the primate neocortex131,132. Another intriguing possibility is that unsupervised processes facilitate continual learning in biological systems by clustering neural representations according to their context. Hebbian learning might encourage the formation of orthogonal neural codes for different temporal contexts133, which in turn enables tasks to be learned in different neural subspaces132. The curious phenomenon of ‘representational drift’ (in which neural codes meander unpredictably over time)134 might reflect the allocation of information to different neural circuits in distinct contexts, enabling task knowledge to be partitioned in a way that minimizes interference135.
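The ‘freezing’ strategy can be sketched as a quadratic penalty that anchors weights in proportion to their importance for earlier tasks, in the spirit of elastic weight consolidation129. The toy version below is ours and uses a deliberately crude importance measure; it is an illustration, not the published algorithm.

```python
import numpy as np

rng = np.random.default_rng(6)
# Task A relies more on some input features than others (varying scales).
X_a = rng.standard_normal((50, 20)) * np.linspace(0.2, 2.0, 20)
Y_a = rng.standard_normal((50, 5))
X_b, Y_b = rng.standard_normal((50, 20)), rng.standard_normal((50, 5))

# Learn task A with plain gradient descent.
W = np.zeros((20, 5))
for _ in range(500):
    W -= 0.02 * 2 * X_a.T @ (X_a @ W - Y_a) / len(X_a)
W_a = W.copy()

# Crude per-weight importance for task A: input variance feeding each weight.
importance = np.repeat(np.diag(X_a.T @ X_a / len(X_a))[:, None], 5, axis=1)

# Learn task B while "freezing" important weights via a quadratic anchor.
lam = 20.0
for _ in range(500):
    grad_b = 2 * X_b.T @ (X_b @ W - Y_b) / len(X_b)
    W -= 0.02 * (grad_b + lam * importance * (W - W_a))

def loss(W, X, Y):
    return float(((X @ W - Y) ** 2).mean())

print("task A loss after anchored training on B:", loss(W, X_a, Y_a))
print("task B loss after anchored training on B:", loss(W, X_b, Y_b))
```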
A more general question concerning resource allocation is how biological systems have evolved both to minimize negative transfer (interference) and maximize positive transfer (generalization) among tasks. One fascinating theoretical perspective argues that the capacity limits that are inherent in biological control processes are a response to this conundrum136. Using simulations involving deep networks, Musslick et al. show that shared and separate task representations have mixed costs and benefits, with shared codes enabling generalization between tasks at the risk of interference between tasks. They suggest that the brain has found a solution by promoting shared neural codes, which in turn enables strong transfer, but deploying control processes to gate out irrelevant tasks that might provoke interference. They suggest that this answers the question of why, despite a brain that comprises billions of neurons and trillions of connections, humans struggle with multitasking problems such as typing a line of computer code while answering a verbal question136.

Understanding deep networks
To fully realize the promise of deep neural networks as scientific theories of brain function, we need to understand how they work. Unfortunately, the computations performed by deep networks are enmeshed in millions of trainable parameters, such that they have been dubbed ‘black boxes’. Despite this complexity, however, in neural networks we can access every synaptic weight and unit activation over the course of learning, a feat that remains impossible in animal models. These considerations raise thorny questions concerning the utility of deep networks as neural models, and more generally, what it means to ‘understand’ a neural process via a computational model34.

Thus far, many neuroscientists drawing on the deep learning toolkit have preferred to use simulations of off-the-shelf black box deep networks as neural models29. However, collaborations between theoretical neuroscientists, physicists and computer scientists have paved the way for a new approach that uses idealized neural network models to understand the mathematical principles by which they learn137, and deploys the results to predict or explain phenomena in psychology or neuroscience138. For this endeavour to be tractable, deep network models must be simplified (Fig. 4), for example by using linear activation functions (‘deep linear’ networks)139 (Fig. 4a–c) or specially structured environments140,141.



Fig. 4 | Understanding deep networks using idealized models. Although mathematical insight into general non-linear deep networks remains challenging, there are a growing number of well-understood simplified settings. a,b | The error-corrective learning process in deep neural networks is often simulated on a computer and can exhibit complex training error dynamics (part a) and complex synaptic weight dynamics (part b) as schematized here (solid curves). By simplifying the neural non-linearity, deep linear networks permit exact analytical solutions for training error dynamics (part a) and weight dynamics (part b) from certain initializations, plotted as dotted curves139. These solutions explicitly describe the trajectory of every weight over the course of training, removing the need to simulate these networks and directly revealing the impact of dataset statistics on learning dynamics. c | Analytical solutions have shed light on various phenomena, including how training speed in deep linear networks depends on network initialization. As schematized here, deep linear networks starting with small random weights train exponentially slowly as depth increases, those with unsupervised layerwise pretraining train linearly and training speed in networks with large orthogonal initializations is depth independent, consistent with empirical findings from simulations of non-linear networks137,139. d | Learning dynamics can simplify in very large ‘wide’ networks with many non-linear neurons. Schematic of training error of non-linear networks of different sizes trained on the same task from different random initial conditions. Simulations of small networks with few neurons often exhibit complex trajectories that end at local minima with non-zero error (light purple). By contrast, simulations of large networks with many neurons reliably find a zero-error solution and take similar trajectories (dark purple). Remarkably, as the number of neurons tends to infinity in a specific initialization regime, the trajectory can be described analytically142,143 (dashed red line). e,f | Tractable settings can also arise by placing assumptions on how data are generated. In one influential approach, a ‘teacher’ neural network labels data for a ‘student’ neural network. This setting allows analytical description of both training (blue) and testing (light blue) error, schematized in part e, and permits analysis of the overtraining phenomenon140,141,144,145. As schematized in part f, the student–teacher setting enables analytical predictions (dashed red line) for the generalization error in the ‘high-dimensional’ regime, where data are scarce relative to the number of weights. These predictions closely match the performance of simulated large networks (purple dots), and explain why generalization error peaks at the transition from overparameterization (overparam) to underparameterization (underparam)141,145,151.

structured environments140,141. Often the behaviour of deep networks becomes simpler in ‘limit’ cases, such as when the number of neurons per layer diverges towards infinity (infinite width limit)142,143 (Fig. 4d) or when the number of data samples and model parameters both diverge towards infinity but their ratio is finite (the high-dimensional limit)144,145 (Fig. 4e,f). Paradoxically, infinite-size networks can be more interpretable than those with fewer units, because their learning trajectory is stabler and not prone to being waylaid by bad local minima in the loss landscape, leading to suboptimal outcomes141,144,146 (Fig. 4d). Leveraging these simplifying assumptions has allowed researchers to derive exact solutions for the learning trajectories that every single synapse will follow in certain networks139,142,143 (Fig. 4a,b,d). These network idealizations have generated mathematical insight into perplexing questions about network behaviour, including why deep networks are often slower to train138 (Fig. 4c), why an initial epoch of layer-by-layer statistical learning reminiscent of critical period plasticity (‘unsupervised pretraining’) can accelerate future learning with gradient descent139 (Fig. 4c) and why generalization to unseen data suffers at the transition to overparameterization144–146 (Fig. 4e,f). This work challenges the notion that deep neural networks are black boxes and promises interpretable neural network models of biological phenomena.
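These exact solutions are straightforward to check numerically. The sketch below is a minimal illustration under the idealizations of ref.139 (a two-layer linear network, whitened inputs and small ‘balanced’ initial weights aligned with the singular vectors of the input–output correlations); it is our own construction, not the original code. Simulated gradient descent then tracks a closed-form logistic trajectory for each mode, with no fitted parameters.

```python
# Minimal sketch: exact learning dynamics in a deep linear network, following
# the idealizations of ref. 139 (whitened inputs, small balanced weights
# aligned to the data's singular vectors). Our construction, for illustration.
import numpy as np

rng = np.random.default_rng(1)
n = 8
Sigma_yx = rng.standard_normal((n, n))                     # input-output correlations
Sigma_yx /= np.linalg.svd(Sigma_yx, compute_uv=False)[0]   # top singular value = 1
U, s, Vt = np.linalg.svd(Sigma_yx)

u0, lr, steps = 1e-4, 0.02, 2000
W1 = np.sqrt(u0) * Vt        # balanced initialization: every mode starts at
W2 = np.sqrt(u0) * U         # strength u0 and evolves independently

sim = np.empty(steps)
for t in range(steps):
    E = Sigma_yx - W2 @ W1                  # residual error
    W1 += lr * W2.T @ E                     # full-batch gradient descent on
    W2 += lr * E @ W1.T                     # 0.5 * ||Sigma_yx - W2 @ W1||^2
    sim[t] = U[:, 0] @ (W2 @ W1) @ Vt[0]    # strength of the leading mode

# Closed-form solution of tau * du/dt = 2u(s - u): a logistic curve per mode.
time = lr * np.arange(1, steps + 1)
theory = s[0] / (1 + (s[0] / u0 - 1) * np.exp(-2 * s[0] * time))
print("max |simulation - theory| =", float(np.abs(sim - theory).max()))
```

The residual discrepancy is just the Euler discretization error of the finite learning rate; in the gradient-flow limit the two curves coincide.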


Recently, this approach has been applied to the study of semantic cognition138 (Fig. 5). During development, children transition through quasi-discrete stages in which they rapidly acquire new categories or concepts. Their learning is also highly structured: for example, semantic knowledge is progressively differentiated, as children pick up on broader hierarchical distinctions (‘animal’ versus ‘plant’) before finer distinctions (‘rose’ versus ‘daisy’), and displays stereotyped errors (such as thinking that worms have bones)147. Deep networks trained on richly structured data (Fig. 5a) are known to exhibit these phenomena148, but only recently has it been shown that stage-like transitions arise from so-called saddle points in the error surface (Fig. 5c), that progressive differentiation arises from the way the singular values of the input–output correlations drive learning over time (Fig. 5a–d), and that semantic illusions arise from pressure to sacrifice accuracy on exceptions to meet the global supervised objective138 (Fig. 5e). Moreover, these phenomena can be shown to be a consequence of depth itself, arising in deep linear networks but not shallow networks (Fig. 5c,e), even though the two classes of model converge to identical terminal solutions. This highlights the importance for neuroscientists of studying learning dynamics — that is, the trajectory that learning takes — rather than simply examining representations in networks that have converged.
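The singular-value mechanism can be illustrated in a few lines. In the toy sketch below (our own example in the spirit of ref.138; the items and properties are invented), broader distinctions in a hierarchical domain correspond to larger singular values of the input–output correlation matrix, and, because the time to learn a mode in a deep linear network scales approximately as (1/s)ln(s/u0)139, larger modes are acquired first.

```python
# Toy illustration of progressive differentiation (refs 138,139). The
# hierarchy, items and properties are invented for this sketch.
import numpy as np

# Four one-hot items: fish, bird, flower, tree (two animals, then two plants).
X = np.eye(4)
# Rows are properties shared by all items, by a category, or by one item.
Y = np.array([
    [1, 1, 1, 1],    # 'is living'  (shared by everything)
    [1, 1, 0, 0],    # 'can move'   (animal level)
    [0, 0, 1, 1],    # 'has roots'  (plant level)
    [1, 0, 0, 0],    # 'can swim'   (item level)
    [0, 1, 0, 0],    # 'can fly'
    [0, 0, 1, 0],    # 'has petals'
    [0, 0, 0, 1],    # 'has bark'
], dtype=float)

U, s, Vt = np.linalg.svd(Y @ X.T / 4)      # input-output correlation modes
u0 = 1e-3                                  # small initial mode strength
t_half = np.log(s / u0) / (2 * s)          # approximate time to learn a mode
for k, (sv, t) in enumerate(zip(s, t_half)):
    print(f"mode {k}: singular value = {sv:.2f}, learned at t ~ {t:.1f}")
# Larger singular values are learned sooner: the broad animal/plant split
# (mode 1, sign pattern +,+,-,- up to overall sign) precedes item-level modes.
print("animal/plant mode loadings:", np.round(Vt[1], 2))
```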
One potential concern is that insights acquired in this way might not scale, because models are idealizations that eschew the messy complexity of state-of-the-art deep networks and make assumptions that are false for biology (such as linear transduction, or layers of infinite width). However, we argue that neural theory is well served by analytical formulations of complex phenomena that give rise to specific, falsifiable predictions for neural circuits and systems. We hope that neuroscientists will incorporate reductions of deep network models into their canonical set of neural theories, rather than only seeking correspondences between brains and fully fledged deep learning systems that offer little hope of being understood.

Conclusions
Deep learning models have much to offer neuroscience. Most exciting is the potential to go beyond handcrafting of function, and to understand how computation emerges from experience. Neuroscientists have recognized this opportunity, but its exploitation has only just begun. In this Perspective, we have tried to offer a road map for researchers wishing to use deep networks as neural theories. Our principal exhortation for neuroscientists is to use deep networks as predictive models that make falsifiable predictions, and to use model idealization methods to provide genuine understanding of how and why they might capture biological phenomena. We caution against using increasingly complex models and simulations that outpace our conceptual insight, and discourage the blind search for correspondences in neural codes formed by biological and artificial systems. Instead, we hope that neuroscientists will build models that explain human behaviour, learning dynamics and neural coding in rich and fruitful ways, but without losing the interpretability inherent to classical neural models.
[Fig. 5 image: a | a hierarchical tree in which ‘Living things’ splits into ‘Animal’ and ‘Plant’, then into ‘Fish’, ‘Bird’, ‘Flower’ and ‘Tree’, with individual items (for example, ‘Oak’, ‘Pine’, ‘Rose’, ‘Daisy’, ‘Canary’, ‘Robin’ and ‘Goldfish’) at the leaves; b | item embeddings traced from the start to the end of learning; c | error versus learning time for shallow and deep networks; d | learning speed versus hierarchical distinction; e | error on the proposition ‘Worm has bones’ versus learning time for shallow and deep networks.]

Fig. 5 | Developmental trajectories in deep linear neural networks. a | An idealized hierarchical environment. Items (leaf nodes) possess many properties, such as ‘can fly’ or ‘has roots’. Nearby items in the tree are more likely to share properties. b | Two-dimensional embedding of internal representations for each item over learning in a deep linear network trained to output each item’s properties. The network exhibits progressive differentiation, passing through a series of stages in which higher hierarchical distinctions are learned before lower distinctions. c | As schematized here, only deep networks exhibit quasi-stage-like transitions in learning, which arise from saddle points in the error surface138. d | For a class of hierarchies, learning speed decreases as a function of hierarchical level (indicated by colour, as in part a), and the network exhibits progressive differentiation starting with the broadest distinction. e | Deep but not shallow networks can make transient errors on specific items and properties (such as asserting that ‘worms have bones’) during learning, reminiscent of human semantic development. Parts a, b, d and e adapted with permission from ref.138, National Academy of Sciences, USA. Insert in part c adapted with permission from ref.137, Annual Reviews.

Andrew Saxe✉, Stephanie Nelli✉ and Christopher Summerfield✉
Department of Experimental Psychology, University of Oxford, Oxford, UK.
✉e-mail: andrew.saxe@psy.ox.ac.uk; stephanie.nelli@psy.ox.ac.uk; christopher.summerfield@psy.ox.ac.uk
https://doi.org/10.1038/s41583-020-00395-8
Published online xx xx xxxx

References
1. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
2. Krizhevsky, A., Hinton, G. E. & Sutskever, I. ImageNet classification with deep convolutional neural networks. Adv. Neural Inform. Process. Syst. 25, 1106–1114 (2012).
3. Eslami, S. M. A. et al. Neural scene representation and rendering. Science 360, 1204–1210 (2018).
4. Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).
5. Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
6. Hassabis, D., Kumaran, D., Summerfield, C. & Botvinick, M. Neuroscience-inspired artificial intelligence. Neuron 95, 245–258 (2017).
7. Golan, T., Raju, P. C. & Kriegeskorte, N. Controversial stimuli: pitting neural networks against each other as models of human recognition. Preprint at arXiv https://arxiv.org/abs/1911.09288 (2020).
8. Flesch, T., Balaguer, J., Dekker, R., Nili, H. & Summerfield, C. Comparing continual task learning in minds and machines. Proc. Natl Acad. Sci. USA 115, E10313–E10322 (2018).
9. Geirhos, R. et al. Generalisation in humans and deep neural networks. NeurIPS Proc. (2018).
10. Zhou, Z. & Firestone, C. Humans can decipher adversarial images. Nat. Commun. 10, 1334 (2019).
11. Rajalingham, R. et al. Large-scale, high-resolution comparison of the core visual object recognition behavior of humans, monkeys, and state-of-the-art deep artificial neural networks. J. Neurosci. 38, 7255–7269 (2018).
12. Yamins, D. L. K. & DiCarlo, J. J. Using goal-driven deep learning models to understand sensory cortex. Nat. Neurosci. 19, 356–365 (2016).


13. Yamins, D. L. K. et al. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proc. Natl Acad. Sci. USA 111, 8619–8624 (2014).
14. Khaligh-Razavi, S.-M. & Kriegeskorte, N. Deep supervised, but not unsupervised, models may explain IT cortical representation. PLoS Comput. Biol. 10, e1003915 (2014).
15. Kell, A. J. E., Yamins, D. L. K., Shook, E. N., Norman-Haignere, S. V. & McDermott, J. H. A task-optimized neural network replicates human auditory behavior, predicts brain responses, and reveals a cortical processing hierarchy. Neuron 98, 630–644.e16 (2018).
16. Cichy, R. M., Khosla, A., Pantazis, D., Torralba, A. & Oliva, A. Comparison of deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals hierarchical correspondence. Sci. Rep. 6, 27755 (2016).
17. Kar, K., Kubilius, J., Schmidt, K., Issa, E. B. & DiCarlo, J. J. Evidence that recurrent circuits are critical to the ventral stream’s execution of core object recognition behavior. Nat. Neurosci. 22, 974–983 (2019).
18. Kietzmann, T. C. et al. Recurrence is required to capture the representational dynamics of the human visual system. Proc. Natl Acad. Sci. USA 116, 21854–21863 (2019).
19. Guclu, U. & van Gerven, M. A. J. Deep neural networks reveal a gradient in the complexity of neural representations across the ventral stream. J. Neurosci. 35, 10005–10014 (2015).
20. Elsayed, G. F. et al. Adversarial examples that fool both computer vision and time-limited humans. NeurIPS Proc. (2018).
21. Ullman, S., Assif, L., Fetaya, E. & Harari, D. Atoms of recognition in human and computer vision. Proc. Natl Acad. Sci. USA 113, 2744–2749 (2016).
22. Sinz, F. H., Pitkow, X., Reimer, J., Bethge, M. & Tolias, A. S. Engineering a less artificial intelligence. Neuron 103, 967–979 (2019).
23. Marblestone, A. H., Wayne, G. & Kording, K. P. Toward an integration of deep learning and neuroscience. Front. Comput. Neurosci. 10, 1–61 (2016).
24. Kell, A. J. & McDermott, J. H. Deep neural network models of sensory systems: windows onto the role of task constraints. Curr. Opin. Neurobiol. 55, 121–132 (2019).
25. Kriegeskorte, N. Deep neural networks: a new framework for modeling biological vision and brain information processing. Annu. Rev. Vis. Sci. 1, 417–446 (2015).
26. Bowers, J. S. Parallel distributed processing theory in the age of deep networks. Trends Cogn. Sci. 21, 950–961 (2017).
27. Cichy, R. M. & Kaiser, D. Deep neural networks as scientific models. Trends Cogn. Sci. 23, 305–317 (2019).
28. Lake, B. M., Ullman, T. D., Tenenbaum, J. B. & Gershman, S. J. Building machines that learn and think like people. Behav. Brain Sci. 40, e253 (2017).
29. Lindsay, G. W. Convolutional neural networks as a model of the visual system: past, present, and future. J. Cogn. Neurosci. https://doi.org/10.1162/jocn_a_01544 (2020).
30. Zador, A. M. A critique of pure learning and what artificial neural networks can learn from animal brains. Nat. Commun. 10, 3770 (2019).
31. Rogers, T. T. & McClelland, J. L. Parallel distributed processing at 25: further explorations in the microstructure of cognition. Cogn. Sci. 38, 1024–1077 (2014).
32. Hasson, U., Nastase, S. A. & Goldstein, A. Direct fit to nature: an evolutionary perspective on biological and artificial neural networks. Neuron 105, 416–434 (2020).
33. Richards, B. A. et al. A deep learning framework for neuroscience. Nat. Neurosci. 22, 1761–1770 (2019).
34. Lillicrap, T. P. & Kording, K. P. What does it mean to understand a neural network? Preprint at arXiv https://arxiv.org/abs/1907.06374 (2019).
35. Saxe, A., Bhand, M., Mudur, R., Suresh, B. & Ng, A. Y. Unsupervised learning models of primary cortical receptive fields and receptive field plasticity. Adv. Neural Inform. Process. Syst. 25, 1971–1979 (2011).
36. Stevenson, I. H. & Kording, K. P. How advances in neural recording affect data analysis. Nat. Neurosci. 14, 139–142 (2011).
37. Saxena, S. & Cunningham, J. P. Towards the neural population doctrine. Curr. Opin. Neurobiol. 55, 103–111 (2019).
38. Fusi, S., Miller, E. K. & Rigotti, M. Why neurons mix: high dimensionality for higher cognition. Curr. Opin. Neurobiol. 37, 66–74 (2016).
39. Rigotti, M. et al. The importance of mixed selectivity in complex cognitive tasks. Nature 497, 585–590 (2013).
40. Johnston, W. J., Palmer, S. E. & Freedman, D. J. Nonlinear mixed selectivity supports reliable neural computation. PLoS Comput. Biol. 16, e1007544 (2020).
41. Raposo, D., Kaufman, M. T. & Churchland, A. K. A category-free neural population supports evolving demands during decision-making. Nat. Neurosci. 17, 1784–1792 (2014).
42. Higgins, I. et al. Unsupervised deep learning identifies semantic disentanglement in single inferotemporal neurons. Preprint at arXiv https://arxiv.org/abs/2006.14304 (2020).
43. Park, I. M., Meister, M. L. R., Huk, A. C. & Pillow, J. W. Encoding and decoding in parietal cortex during sensorimotor decision-making. Nat. Neurosci. 17, 1395–1403 (2014).
44. Mante, V., Sussillo, D., Shenoy, K. V. & Newsome, W. T. Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature 503, 78–84 (2013).
45. Chaisangmongkon, W., Swaminathan, S. K., Freedman, D. J. & Wang, X.-J. Computing by robust transience: how the fronto-parietal network performs sequential, category-based decisions. Neuron 93, 1504–1517.e4 (2017).
46. Engel, T. A., Chaisangmongkon, W., Freedman, D. J. & Wang, X. J. Choice-correlated activity fluctuations underlie learning of neuronal category representation. Nat. Commun. 6, 6454 (2015).
47. Remington, E. D., Egger, S. W., Narain, D., Wang, J. & Jazayeri, M. A dynamical systems perspective on flexible motor timing. Trends Cogn. Sci. 22, 938–952 (2018).
48. Remington, E. D., Narain, D., Hosseini, E. A. & Jazayeri, M. Flexible sensorimotor computations through rapid reconfiguration of cortical dynamics. Neuron 98, 1005–1019.e5 (2018).
49. Orhan, A. E. & Ma, W. J. A diverse range of factors affect the nature of neural representations underlying short-term memory. Nat. Neurosci. 22, 275–283 (2019).
50. Masse, N. Y., Rosen, M. C. & Freedman, D. J. Reevaluating the role of persistent neural activity in short-term memory. Trends Cogn. Sci. 24, 242–258 (2020).
51. Masse, N. Y., Yang, G. R., Song, H. F., Wang, X.-J. & Freedman, D. J. Circuit mechanisms for the maintenance and manipulation of information in working memory. Nat. Neurosci. 22, 1159–1167 (2019).
52. Lindsey, J., Ocko, S. A., Ganguli, S. & Deny, S. A unified theory of early visual representations from retina to cortex through anatomically constrained deep CNNs. Preprint at bioRxiv https://doi.org/10.1101/511535 (2019).
53. Rahwan, I. et al. Machine behaviour. Nature 568, 477–486 (2019).
54. Thompson, J. A. F., Bengio, Y., Formisano, E. & Schönwiesner, M. How can deep learning advance computational modeling of sensory information processing? Preprint at arXiv https://arxiv.org/abs/1810.08651 (2018).
55. Schrimpf, M. et al. Brain-score: which artificial neural network for object recognition is most brain-like? Preprint at bioRxiv https://doi.org/10.1101/407007 (2018).
56. Kriegeskorte, N. Representational similarity analysis – connecting the branches of systems neuroscience. Front. Syst. Neurosci. 2, 4 (2008).
57. Bashivan, P., Kar, K. & DiCarlo, J. J. Neural population control via deep image synthesis. Science 364, eaav9436 (2019).
58. Ponce, C. R. et al. Evolving images for visual neurons using a deep generative network reveals coding principles and neuronal preferences. Cell 177, 999–1009.e10 (2019).
59. Krakauer, J. W., Ghazanfar, A. A., Gomez-Marin, A., MacIver, M. A. & Poeppel, D. Neuroscience needs behavior: correcting a reductionist bias. Neuron 93, 480–490 (2017).
60. Gomez-Marin, A. & Ghazanfar, A. A. The life of behavior. Neuron 104, 25–36 (2019).
61. Rich, A. S. & Gureckis, T. M. Lessons for artificial intelligence from the study of natural stupidity. Nat. Mach. Intell. 1, 174–180 (2019).
62. Shenhav, A., Botvinick, M. M. & Cohen, J. D. The expected value of control: an integrative theory of anterior cingulate cortex function. Neuron 79, 217–240 (2013).
63. Deen, B. et al. Organization of high-level visual cortex in human infants. Nat. Commun. 8, 13995 (2017).
64. Op de Beeck, H. P., Pillet, I. & Ritchie, J. B. Factors determining where category-selective areas emerge in visual cortex. Trends Cogn. Sci. 23, 784–797 (2019).
65. Arcaro, M. J., Schade, P. F., Vincent, J. L., Ponce, C. R. & Livingstone, M. S. Seeing faces is necessary for face-domain formation. Nat. Neurosci. 20, 1404–1412 (2017).
66. Olshausen, B. A. & Field, D. J. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607–609 (1996).
67. Simoncelli, E. P. & Olshausen, B. A. Natural image statistics and neural representation. Annu. Rev. Neurosci. 24, 1193–1216 (2001).
68. Friston, K. J. The free-energy principle: a unified brain theory? Nat. Rev. Neurosci. 11, 127–138 (2010).
69. Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. Preprint at arXiv https://arxiv.org/abs/1312.6114 (2014).
70. Burgess, C. P. et al. MONet: unsupervised scene decomposition and representation. Preprint at arXiv https://arxiv.org/abs/1901.11390 (2019).
71. Lillicrap, T. P., Cownden, D., Tweed, D. B. & Akerman, C. J. Random synaptic feedback weights support error backpropagation for deep learning. Nat. Commun. 7, 13276 (2016).
72. Detorakis, G., Bartley, T. & Neftci, E. Contrastive Hebbian learning with random feedback weights. Neural Netw. https://doi.org/10.1016/j.neunet.2019.01.008 (2019).
73. Saxe, A. Deep Linear Networks: A Theory of Learning in the Brain and Mind. Thesis, Stanford Univ. (2015).
74. Wenliang, L. K. & Seitz, A. R. Deep neural networks for modeling visual perceptual learning. J. Neurosci. 38, 6028–6044 (2018).
75. Whittington, J. C. R. & Bogacz, R. An approximation of the error backpropagation algorithm in a predictive coding network with local Hebbian synaptic plasticity. Neural Comput. 29, 1229–1262 (2017).
76. Murphy, G. L. The Big Book of Concepts (MIT Press, 2002).
77. Graves, A. et al. Hybrid computing using a neural network with dynamic external memory. Nature 538, 471–476 (2016).
78. Wayne, G. et al. Unsupervised predictive memory in a goal-directed agent. Preprint at arXiv https://arxiv.org/abs/1803.10760 (2018).
79. Moser, E. I., Kropff, E. & Moser, M.-B. Place cells, grid cells, and the brain’s spatial representation system. Annu. Rev. Neurosci. 31, 69–89 (2008).
80. Fee, M. S., Kozhevnikov, A. A. & Hahnloser, R. H. R. Neural mechanisms of vocal sequence generation in the songbird. Ann. N. Y. Acad. Sci. 1016, 153–170 (2004).
81. Hanes, D. P. & Schall, J. D. Neural control of voluntary movement initiation. Science 274, 427–430 (1996).
82. Tenenbaum, J. B., Kemp, C., Griffiths, T. L. & Goodman, N. D. How to grow a mind: statistics, structure, and abstraction. Science 331, 1279–1285 (2011).
83. Tervo, D. G. R., Tenenbaum, J. B. & Gershman, S. J. Toward the neural implementation of structure learning. Curr. Opin. Neurobiol. 37, 99–105 (2016).
84. Behrens, T. E. J. et al. What is a cognitive map? Organizing knowledge for flexible behavior. Neuron 100, 490–509 (2018).
85. Penn, D. C., Holyoak, K. J. & Povinelli, D. J. Darwin’s mistake: explaining the discontinuity between human and nonhuman minds. Behav. Brain Sci. 31, 109–130 (2008).
86. Schuck, N. W. & Niv, Y. Sequential replay of nonspatial task states in the human hippocampus. Science 364, eaaw5181 (2019).
87. Kurth-Nelson, Z., Economides, M., Dolan, R. J. & Dayan, P. Fast sequences of non-spatial state representations in humans. Neuron 91, 194–204 (2016).
88. Liu, Y., Dolan, R. J., Kurth-Nelson, Z. & Behrens, T. E. J. Human replay spontaneously reorganizes experience. Cell 178, 640–652.e14 (2019).
89. Barron, H. C. et al. Unmasking latent inhibitory connections in human cortex to reveal dormant cortical memories. Neuron 90, 191–203 (2016).
90. Koolschijn, R. S. et al. The hippocampus and neocortical inhibitory engrams protect against memory interference. Neuron 101, 528–541.e6 (2019).


91. Doeller, C. F., Barry, C. & Burgess, N. Evidence for grid cells in a human memory network. Nature 463, 657–661 (2010).
92. Constantinescu, A. O., O’Reilly, J. X. & Behrens, T. E. J. Organizing conceptual knowledge in humans with a gridlike code. Science 352, 1464–1468 (2016).
93. Tsividis, P. A., Pouncy, T., Xu, J., Tenenbaum, J. B. & Gershman, S. J. Human learning in Atari (AAAI, 2017).
94. Schapiro, A. C., Rogers, T. T., Cordova, N. I., Turk-Browne, N. B. & Botvinick, M. M. Neural representations of events arise from temporal community structure. Nat. Neurosci. 16, 486–492 (2013).
95. O’Keefe, J. & Dostrovsky, J. The hippocampus as a spatial map. Preliminary evidence from unit activity in the freely-moving rat. Brain Res. 34, 171–175 (1971).
96. Baraduc, P., Duhamel, J.-R. & Wirth, S. Schema cells in the macaque hippocampus. Science 363, 635–639 (2019).
97. Miyashita, Y. Neuronal correlate of visual associative long-term memory in the primate temporal cortex. Nature 335, 817–820 (1988).
98. Garvert, M. M., Dolan, R. J. & Behrens, T. E. A map of abstract relational knowledge in the human hippocampal-entorhinal cortex. eLife 6, e17086 (2017).
99. Schapiro, A. C., Kustner, L. V. & Turk-Browne, N. B. Shaping of object representations in the human medial temporal lobe based on temporal regularities. Curr. Biol. 22, 1622–1627 (2012).
100. Schapiro, A. C., Turk-Browne, N. B., Norman, K. A. & Botvinick, M. M. Statistical learning of temporal community structure in the hippocampus. Hippocampus 26, 3–8 (2016).
101. Schlichting, M. L., Mumford, J. A. & Preston, A. R. Learning-related representational changes reveal dissociable integration and separation signatures in the hippocampus and prefrontal cortex. Nat. Commun. 6, 8151 (2015).
102. Zeithamova, D., Dominick, A. L. & Preston, A. R. Hippocampal and ventral medial prefrontal activation during retrieval-mediated learning supports novel inference. Neuron 75, 168–179 (2012).
103. Park, S. A., Miller, D. S., Nili, H., Ranganath, C. & Boorman, E. D. Map making: constructing, combining, and inferring on abstract cognitive maps. Neuron 107, 1226–1238.e8 (2020).
104. Kumaran, D., Banino, A., Blundell, C., Hassabis, D. & Dayan, P. Computations underlying social hierarchy learning: distinct neural mechanisms for updating and representing self-relevant information. Neuron 92, 1135–1147 (2016).
105. Baram, A. B., Muller, T. H., Nili, H., Garvert, M. & Behrens, T. E. J. Entorhinal and ventromedial prefrontal cortices abstract and generalise the structure of reinforcement learning problems. Preprint at bioRxiv https://doi.org/10.1101/827253 (2019).
106. Dolan, R. J. & Dayan, P. Goals and habits in the brain. Neuron 80, 312–325 (2013).
107. Higgins, I. et al. Early visual concept learning with unsupervised deep learning. Preprint at arXiv https://arxiv.org/abs/1606.05579 (2016).
108. Higgins, I. et al. SCAN: learning hierarchical compositional visual concepts. Preprint at arXiv https://arxiv.org/abs/1707.03389 (2018).
109. Hessel, M. et al. Rainbow: combining improvements in deep reinforcement learning (AAAI, 2018).
110. Stachenfeld, K. L., Botvinick, M. M. & Gershman, S. J. Design principles of the hippocampal cognitive map. Int. Conf. Neural Inform. Process. Syst. 2, 2528–2536 (2014).
111. Whittington, J. C. et al. The Tolman-Eichenbaum machine: unifying space and relational memory through generalisation in the hippocampal formation. Preprint at bioRxiv https://doi.org/10.1101/770495 (2019).
112. Bellmund, J. L. S., Gärdenfors, P., Moser, E. I. & Doeller, C. F. Navigating cognition: spatial codes for human thinking. Science 362, eaat6766 (2018).
113. Cueva, C. J. & Wei, X.-X. Emergence of grid-like representations by training recurrent neural networks to perform spatial localization. Preprint at arXiv https://arxiv.org/abs/1803.07770 (2018).
114. Banino, A. et al. Vector-based navigation using grid-like representations in artificial agents. Nature 557, 429–433 (2018).
115. Parisi, G. I., Kemker, R., Part, J. L., Kanan, C. & Wermter, S. Continual lifelong learning with neural networks: a review. Neural Netw. 113, 54–71 (2019).
116. Schapiro, A. C., Turk-Browne, N. B., Botvinick, M. M. & Norman, K. A. Complementary learning systems within the hippocampus: a neural network modelling approach to reconciling episodic memory with statistical learning. Phil. Trans. R. Soc. B 372, 20160049 (2017).
117. French, R. Catastrophic forgetting in connectionist networks. Trends Cogn. Sci. 3, 128–135 (1999).
118. McCloskey, M. & Cohen, N. J. Catastrophic interference in connectionist networks: the sequential learning problem. in Psychology of Learning and Motivation Vol. 24 109–165 (Academic, 1989).
119. McClelland, J. L., McNaughton, B. L. & O’Reilly, R. C. Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory. Psychol. Rev. 102, 419–457 (1995).
120. O’Reilly, R. C., Bhattacharyya, R., Howard, M. D. & Ketz, N. Complementary learning systems. Cogn. Sci. 38, 1229–1248 (2014).
121. Tulving, E. Episodic memory: from mind to brain. Annu. Rev. Psychol. 53, 1–25 (2002).
122. Carr, M. F., Jadhav, S. P. & Frank, L. M. Hippocampal replay in the awake state: a potential substrate for memory consolidation and retrieval. Nat. Neurosci. 14, 147–153 (2011).
123. Zola-Morgan, S. & Squire, L. The primate hippocampal formation: evidence for a time-limited role in memory storage. Science 250, 288–290 (1990).
124. Yonelinas, A. P. The nature of recollection and familiarity: a review of 30 years of research. J. Mem. Lang. 46, 441–517 (2002).
125. van de Ven, G. M. & Tolias, A. S. Generative replay with feedback connections as a general strategy for continual learning. Preprint at arXiv https://arxiv.org/abs/1809.10635 (2019).
126. Kumaran, D., Hassabis, D. & McClelland, J. L. What learning systems do intelligent agents need? Complementary learning systems theory updated. Trends Cogn. Sci. 20, 512–534 (2016).
127. Qian, T. & Aslin, R. N. Learning bundles of stimuli renders stimulus order as a cue, not a confound. Proc. Natl Acad. Sci. USA 111, 14400–14405 (2014).
128. Collins, A. G. E. & Frank, M. J. Cognitive control over learning: creating, clustering, and generalizing task-set structure. Psychol. Rev. 120, 190–229 (2013).
129. Kirkpatrick, J. et al. Overcoming catastrophic forgetting in neural networks. Proc. Natl Acad. Sci. USA 114, 3521–3526 (2017).
130. Zenke, F., Poole, B. & Ganguli, S. Continual learning through synaptic intelligence. Proc. Mach. Learn. Res. 70, 3987–3995 (2017).
131. Masse, N. Y., Grant, G. D. & Freedman, D. J. Alleviating catastrophic forgetting using context-dependent gating and synaptic stabilization. Proc. Natl Acad. Sci. USA 115, E10467–E10475 (2018).
132. Zeng, G., Chen, Y., Cui, B. & Yu, S. Continual learning of context-dependent processing in neural networks. Nat. Mach. Intell. 1, 364–372 (2019).
133. Bouchacourt, F., Palminteri, S., Koechlin, E. & Ostojic, S. Temporal chunking as a mechanism for unsupervised learning of task-sets. eLife 9, e50469 (2020).
134. Harvey, C. D., Coen, P. & Tank, D. W. Choice-specific sequences in parietal cortex during a virtual-navigation decision task. Nature 484, 62–68 (2012).
135. Rule, M. E., O’Leary, T. & Harvey, C. D. Causes and consequences of representational drift. Curr. Opin. Neurobiol. 58, 141–147 (2019).
136. Musslick, S. et al. Multitasking capability versus learning efficiency in neural network architectures. in Annual Meeting of the Cognitive Science Society 829–834 (Cognitive Science Society, 2017).
137. Bahri, Y. et al. Statistical mechanics of deep learning. Annu. Rev. Condens. Matter Phys. 11, 501–528 (2020).
138. Saxe, A. M., McClelland, J. L. & Ganguli, S. A mathematical theory of semantic development in deep neural networks. Proc. Natl Acad. Sci. USA 116, 11537–11546 (2019).
139. Saxe, A. M., McClelland, J. L. & Ganguli, S. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. Preprint at arXiv https://arxiv.org/abs/1312.6120 (2014).
140. Seung, H. S., Sompolinsky, H. & Tishby, N. Statistical mechanics of learning from examples. Phys. Rev. A 45, 6056–6091 (1992).
141. Goldt, S., Advani, M. S., Saxe, A. M., Krzakala, F. & Zdeborová, L. Dynamics of stochastic gradient descent for two-layer neural networks in the teacher-student setup. NeurIPS Proc. (2019).
142. Jacot, A., Gabriel, F. & Hongler, C. Neural tangent kernel: convergence and generalization in neural networks. NeurIPS Proc. (2018).
143. Lee, J. et al. Wide neural networks of any depth evolve as linear models under gradient descent. NeurIPS Proc. (2019).
144. Advani, M. S. & Saxe, A. M. High-dimensional dynamics of generalization error in neural networks. Preprint at arXiv https://arxiv.org/abs/1710.03667 (2017).
145. Krogh, A. & Hertz, J. A. Generalization in a linear perceptron in the presence of noise. J. Phys. A Math. Gen. 25, 1135–1147 (1992).
146. Dauphin, Y. et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. NeurIPS Proc. (2014).
147. Carey, S. Précis of ‘The Origin of Concepts’. Behav. Brain Sci. 34, 113–124 (2011).
148. Rogers, T. T. & McClelland, J. L. Semantic Cognition: A Parallel Distributed Processing Approach (MIT Press, 2004).
149. Walker, E. Y. et al. Inception loops discover what excites neurons most using deep predictive models. Nat. Neurosci. 22, 2060–2065 (2019).
150. Schoups, A., Vogels, R., Qian, N. & Orban, G. Practising orientation identification improves orientation coding in V1 neurons. Nature 412, 549–553 (2001).
151. Belkin, M., Hsu, D., Ma, S. & Mandal, S. Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proc. Natl Acad. Sci. USA 116, 15849–15854 (2019).

Acknowledgements
This work was supported by generous funding from the European Research Council (ERC Consolidator award to C.S. and Special Grant Agreement 3 of the Human Brain Project) and a Sir Henry Dale Fellowship to A.S. from the Wellcome Trust and Royal Society (grant number 216386/Z/19/Z). A.S. is a CIFAR Azrieli Global Scholar in the Learning in Machines & Brains programme.

Author contributions
The authors contributed equally to all aspects of the article.

Competing interests
The authors declare no competing interests.

Peer review information
Nature Reviews Neuroscience thanks M. Jazayeri and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© Springer Nature Limited 2020
