Exactly Solvable Statistical Physics Models for Large Neuronal Populations
Christopher W. Lynn,1,2,3,∗ Qiwei Yu,4 Rich Pang,5 William Bialek,2,4,6 and Stephanie E. Palmer7,8

1 Initiative for the Theoretical Sciences, The Graduate Center, City University of New York, New York, NY 10016, USA
2 Joseph Henry Laboratories of Physics, Princeton University, Princeton, NJ 08544, USA
3 Department of Physics, Quantitative Biology Institute, and Wu Tsai Institute, Yale University, New Haven, CT 06520, USA
4 Lewis–Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
5 Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544, USA
6 Center for Studies in Physics and Biology, Rockefeller University, New York, NY 10065, USA
7 Department of Organismal Biology and Anatomy, University of Chicago, Chicago, IL 60637, USA
8 Department of Physics, University of Chicago, Chicago, IL 60637, USA

arXiv:2310.10860v1 [physics.bio-ph] 16 Oct 2023
(Dated: October 18, 2023)
Maximum entropy methods provide a principled path connecting measurements of neural activity
directly to statistical physics models, and this approach has been successful for populations of
N ∼ 100 neurons. As N increases in new experiments, we enter an undersampled regime where we
have to choose which observables should be constrained in the maximum entropy construction. The
best choice is the one that provides the greatest reduction in entropy, defining a “minimax entropy”
principle. This principle becomes tractable if we restrict attention to correlations among pairs of
neurons that link together into a tree; we can find the best tree efficiently, and the underlying
statistical physics models are exactly solved. We use this approach to analyze experiments on
N ∼ 1500 neurons in the mouse hippocampus, and show that the resulting model captures the
distribution of synchronous activity in the network.
It has long been hoped that neural networks in the brain could be described using concepts from statistical physics [1–6]. More recently, our ability to explore the brain has been revolutionized by techniques that record the electrical activity from thousands of individual neurons, simultaneously [7–13]. Maximum entropy methods connect these data to theory, starting with measured properties of the network and arriving at models that are mathematically equivalent to statistical physics problems [14, 15]. In some cases these models provide successful, parameter–free predictions for many detailed features of the neural activity pattern [16, 17]. The same ideas have been used in contexts ranging from the evolution of protein families to ordering in flocks of birds [18–24]. But as experiments probe systems with more and more degrees of freedom, the number of samples that we can collect typically does not increase in proportion. Here we present a strategy for building maximum entropy models in this undersampled regime, and apply this strategy to data from 1000+ neurons in the mouse hippocampus.

Consider a system of N variables x = {x_i}, i = 1, 2, ..., N. To describe the system, we would like to write down the probability distribution P(x) over these microscopic degrees of freedom, in the same way that we write the Boltzmann distribution for a system in equilibrium. In the example that we discuss below, all the x_i are observed at the same moment in time, but we could also include time in the index i, so that P(x) becomes a probability distribution of trajectories.

From these N variables we can construct a set of K operators or observables {f_ν(x)}, ν = 1, 2, ..., K. The maximum entropy approach takes this limited number of observables seriously, and insists that expectation values for these quantities predicted by the model match the values measured in experiment,

  ⟨f_ν(x)⟩_P = ⟨f_ν(x)⟩_exp ,  (1)

or more explicitly,

  Σ_x P(x) f_ν(x) = (1/M) Σ_{m=1}^{M} f_ν(x^(m)) ,  (2)

where x^(m) is the mth sample out of M samples in total. Notice that to have control over errors in the measurement of all K expectation values, we need to have K ≪ NM.

There are infinitely many distributions that obey these matching conditions, but the idea of maximum entropy is that we should choose the one that has the least possible structure, or equivalently generates samples that are as random as possible while obeying the constraints in Eq. (1). From Shannon we know that “as random as possible” translates uniquely to finding the distribution that has the maximum entropy consistent with the constraints [25, 26]. The solution of this optimization problem has the form

  P(x) = (1/Z) exp[ −Σ_{ν=1}^{K} λ_ν f_ν(x) ] ,  (3)

where the coupling constants λ_ν must be chosen to satisfy Eq. (1). We emphasize that in using this approach, the
model in Eq. (3) is something that needs to be tested—in systems such as networks of neurons, there is no H–theorem telling us that entropy will be maximized, nor is there a unique choice for the constraints that would correspond to the Hamiltonian of an equilibrium system.

Without a Hamiltonian, how should we choose the observables f_ν? Probability distributions define a code for the data [25] in which each state x is mapped to a code word of length

  ℓ(x) = −log P(x) ,  (4)

so the mean code length for the data is

  ⟨ℓ⟩_exp ≡ −(1/M) Σ_{n=1}^{M} log P(x^(n)) .  (5)

Combining Eqs. (1) and (3), ⟨ℓ⟩_exp is exactly the entropy of P. Thus, among all maximum entropy distributions, the one that gives the shortest description of the data is the one with minimum entropy. This “minimax entropy” principle was discussed 25 years ago [27], but has attracted relatively little attention.

Every time we add a constraint, the maximum possible entropy is reduced [28], and this entropy reduction is an information gain. Thus the minimax entropy principle tells us to choose observables whose expectation values provide as much information as possible about the microscopic variables. The problem is that finding these maximally informative observables is generally intractable.

To make progress we restrict the class of observables that we consider. With populations of N ∼ 100 neurons, it can be very effective to constrain just the mean activity of each neuron and the correlations among pairs [14, 16, 17]. But this corresponds to K ∝ N² constraints, and at large N we will violate the good sampling condition K ≪ NM. To restore good sampling, we try constraining only N_c of the correlations, which define links between specific pairs of neurons that together form a graph G. If we describe the individual neurons as being either active or silent, so that x_i ∈ {0, 1}, then the maximum entropy distribution is an Ising model on the graph G,

  P_G(x) = (1/Z) exp[ Σ_{(ij)∈G} J_ij x_i x_j + Σ_i h_i x_i ] ,  (6)

where {h_i, J_ij} must be adjusted to match the measured expectation values ⟨x_i⟩_exp and ⟨x_i x_j⟩_exp for pairs (ij) ∈ G. The minimax entropy principle tells us that we should find the graph G with a fixed number of links N_c such that P_G(x) has the smallest entropy while obeying these constraints. This remains intractable.

Statistical physics problems are hard because of feedback loops. If we can eliminate these loops then we can find the partition function exactly, as in one–dimensional systems or on Bethe lattices [29]. If the graph G has no loops then it describes a tree T, and among other simplifications we can write the entropy

  −Σ_x P_T(x) log P_T(x) ≡ S_T = S_ind − Σ_{(ij)∈T} I_ij ,  (7)

where S_ind is the independent entropy of the individual variables, and I_ij is the mutual information between x_i and x_j [Fig. 1(a)]. Because of the constraints on the maximum entropy distribution, I_ij is the same whether we compute it from the model or from the data. Thus we can compute the entropy of the pairwise maximum entropy model on any tree without constructing the model itself.

FIG. 1. Computing the optimal tree of pairwise correlations. (a) Mutual information between variables in a hypothetical system. (b) One can build the optimal tree one variable at a time, from an arbitrary starting variable, by iteratively adding the connection corresponding to the largest mutual information (dashed) from the tree (solid) to the remaining variables; this is Prim’s algorithm. (c) Optimal tree that minimizes the entropy S_T and maximizes the information I_T.

Minimizing the entropy S_T in Eq. (7) is equivalent to maximizing the total mutual information

  I_T = Σ_{(ij)∈T} I_ij .  (8)

This defines a minimum spanning tree problem [15], which admits a number of efficient solutions. For example, one can grow the optimal tree by greedily attaching the new variable x_i with the largest mutual information I_ij to an existing variable x_j [Figs. 1(b) and 1(c)]; this is Prim’s algorithm, which runs in O(N²) time [30]. By restricting observables to pairwise correlations that form a tree, we can solve the minimax entropy problem exactly, even at very large N. Further, we can give explicit expressions for the fields and couplings [15, 31],

  J_ij = ln [ ⟨x_i x_j⟩ (1 − ⟨x_i⟩ − ⟨x_j⟩ + ⟨x_i x_j⟩) / ( (⟨x_i⟩ − ⟨x_i x_j⟩)(⟨x_j⟩ − ⟨x_i x_j⟩) ) ] ,  (9)

  h_i = ln [ ⟨x_i⟩ / (1 − ⟨x_i⟩) ] + Σ_{j∈N_i} ln [ (1 − ⟨x_i⟩)(⟨x_i⟩ − ⟨x_i x_j⟩) / ( ⟨x_i⟩ (1 − ⟨x_i⟩ − ⟨x_j⟩ + ⟨x_i x_j⟩) ) ] ,  (10)

where N_i are the neighbors of i on the tree. The total number of constraints is K = 2N − 1, so if the number of independent samples M ≫ 2, then we are well sampled.
[...] short periods of activity, providing a natural way to discretize into active/silent (x_i = 1/0) in each video frame. [...] pairs of neurons are negatively correlated [Fig. 2(b)], the strongest mutual information belongs to pairs that are positively correlated [Fig. 2(c)]. Together, these observations suggest that a sparse network of positively correlated neurons may provide a large amount of information about the collective neural activity.

FIG. 2. Mutual information in a large population of neurons. (a) Ranked order of all significant mutual information I_ij in a population of N = 1485 neurons in the mouse hippocampus [32]. Solid line and shaded region indicate estimates and errors (two standard deviations) of I_ij. (b) Distribution of correlation coefficients over neuron pairs, with percentages indicating the proportion of positively and negatively correlated pairs. (c) Mutual information I_ij versus correlation coefficient, where each point represents a distinct neuron pair. Estimates and errors are the same as in (a).

Applying our method, we identify the tree of maximally informative correlations [Fig. 3(a)], which captures I_T = 26.2 bits of information; this is more than 50× the average we find on a random tree [Fig. 3(b)]. Although a tree includes only 2/N ≈ 0.1% of all pairwise correlations, we have I_T ≈ 0.144 S_ind, so that the correlation structure we capture is as strong as freezing the states of 214 randomly selected neurons. Another way to assess the strength of the interactions is to see that on the optimal tree the mean activity of each neuron ⟨x_i⟩ deviates strongly from what is predicted by the field h_i alone,

  ⟨x_i⟩_{J=0} = 1/(1 + e^{−h_i}) .  (11)

This is shown in Fig. 3(d), where we compare the optimal tree with a random tree. So despite limited connectivity, the effective fields

  h_i^eff = h_i + Σ_j J_ij x_j  (12)

[...]

FIG. 3. [...] the perimeter are leaves (having one connection) with distance from the central neuron decreasing in the clockwise direction. (c) Distributions of Ising interactions J_ij [Eq. (9)]. (d) Average activities ⟨x_i⟩ versus local fields h_i, where each point represents one neuron, and the dashed line illustrates the independent prediction [Eq. (11)].
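To make Eqs. (9)–(11) concrete, here is a small sketch (our own illustration with made-up moments, not the authors’ code) that builds the fields and couplings in closed form on a three-neuron chain and checks, by exhaustive enumeration, that the Ising model of Eq. (6) reproduces the constrained expectation values; the last line is the independent prediction of Eq. (11):

```python
import itertools
import numpy as np

def tree_couplings(means, pair_means, edges):
    """Closed-form {J_ij, h_i} of Eqs. (9)-(10) from first and second moments
    on a tree; pair_means[(i, j)] stores <x_i x_j> for each edge (i, j)."""
    J = {}
    h = np.log(means / (1 - means))        # first term of Eq. (10)
    for (i, j) in edges:
        a, b, ab = means[i], means[j], pair_means[(i, j)]
        J[(i, j)] = np.log(ab * (1 - a - b + ab)
                           / ((a - ab) * (b - ab)))        # Eq. (9)
        # neighbor corrections of Eq. (10), one for each endpoint
        h[i] += np.log((1 - a) * (a - ab) / (a * (1 - a - b + ab)))
        h[j] += np.log((1 - b) * (b - ab) / (b * (1 - a - b + ab)))
    return J, h

# made-up but consistent moments for a chain 0 - 1 - 2
means = np.array([0.3, 0.5, 0.2])
pair_means = {(0, 1): 0.2, (1, 2): 0.15}
edges = [(0, 1), (1, 2)]
J, h = tree_couplings(means, pair_means, edges)

# exact check over all 2^3 states that Eq. (6) reproduces the moments
states = np.array(list(itertools.product([0, 1], repeat=3)))
logw = states @ h + sum(J[(i, j)] * states[:, i] * states[:, j]
                        for (i, j) in edges)
p = np.exp(logw)
p /= p.sum()
model_means = states.T @ p            # matches `means`
pred_indep = 1 / (1 + np.exp(-h))     # Eq. (11): prediction with J = 0
```

On the toy chain, `pred_indep` differs visibly from the true means, the same deviation the text describes for the optimal tree in Fig. 3(d).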
[...] Ising ferromagnet. Is it possible that a backbone of ferromagnetic interactions captures some of the collective behavior in the network? One signature of this collective behavior is the probability P(K) that K out of the N neurons are simultaneously active within a window of time [14, 17, 24, 33]. For independent neurons, this distribution approximately [...] probability that K ∼ 50 or more neurons will be active in synchrony (Fig. 4, red). Although the detailed patterns of activity in the system are shaped by competing interactions that are missing from our ferromagnetic backbone, this shows that large–scale synchrony can emerge from a sparse network of the strongest positive correlations.

Thus far, we have focused on a single population of N ∼ 1500 neurons. But as we observe larger populations, how do the maximally informative correlations [...]

FIG. 5. [...] each of the different neurons. (b) Fraction of the independent entropy I_T/S_ind explained by the optimal and random trees as a function of population size N. Lines and shaded regions represent median and interquartile range over different central neurons.
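A convenient feature of a tree-structured model, not spelled out above, is that it can be sampled directly from the measured moments, with no Monte Carlo: root the tree, draw the root from its mean activity, and draw each child from the conditional P(x_j | x_i) implied by the pair marginal. The sketch below (our own construction, using the same toy moments as our earlier examples) uses this to estimate the synchrony distribution P(K):

```python
import numpy as np

def sample_tree(means, pair_means, edges, n_samples, rng):
    """Ancestral sampling of the tree model; `edges` must be (parent, child)
    pairs listed so that every parent is sampled before its children."""
    N = len(means)
    X = np.zeros((n_samples, N), dtype=int)
    root = edges[0][0]
    X[:, root] = rng.random(n_samples) < means[root]
    for (i, j) in edges:
        p11 = pair_means[(i, j)]
        cond = np.where(X[:, i] == 1,
                        p11 / means[i],                     # P(x_j=1 | x_i=1)
                        (means[j] - p11) / (1 - means[i]))  # P(x_j=1 | x_i=0)
        X[:, j] = rng.random(n_samples) < cond
    return X

rng = np.random.default_rng(2)
means = np.array([0.3, 0.5, 0.2])          # toy moments
pair_means = {(0, 1): 0.2, (1, 2): 0.15}
edges = [(0, 1), (1, 2)]                   # chain rooted at neuron 0
X = sample_tree(means, pair_means, edges, 200_000, rng)
P_K = np.bincount(X.sum(axis=1), minlength=4) / len(X)   # estimate of P(K)
```

For comparison, the independent-neuron P(K) is the convolution of N Bernoulli distributions; the gap between the two is the kind of check made in Fig. 4.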
In summary, it has been appreciated for nearly two decades that the maximum entropy principle provides a link from data directly to statistical physics models, and this is useful in networks of neurons as well as other complex systems [14–24]. Less widely emphasized is that we do not have “the” maximum entropy model, but rather a collection of possible models depending on what features of the system behavior we choose to constrain. Quite generally we should choose the features that are most informative, leading to the minimax entropy principle [27]. As we study larger and larger populations of neurons we enter an undersampled regime in which selecting a limited number of maximally informative features is not only conceptually appealing but also a practical necessity.

The problem is that the minimax entropy principle is intractable in general. Here we have made progress in two steps. First, following previous successes, we focus on constraining the mean activity and pairwise correlations. Second, we take the lesson of the Bethe lattice and select only pairs that define a tree. Once we do this, the relevant statistical mechanics problem is solved exactly, and the optimal tree can be found in quadratic time [15, 31]. This means that there is a non–trivial family of statistical physics models for large neural populations that we can construct very efficiently, and it is worth asking whether these models can capture any of the essential collective behavior in real networks.

We find that the optimal tree correctly predicts the distribution of synchronous activity (Fig. 4), and these models capture more of the correlation structure as we look to larger networks [Fig. 5(b)]. The key to this success is the heavy–tailed distribution of mutual information [Fig. 2(a)], and this in turn may be grounded in the heavy–tailed distribution of physical connections [34]. While these models cannot capture all aspects of collective behavior, these observations provide at least a starting point for simplified models of the much larger systems now becoming accessible to experiments.

We thank L. Meshulam and J.L. Gauthier for guiding us through the data of Ref. [32], and C.M. Holmes and D.J. Schwab for helpful discussions. This work was supported in part by the National Science Foundation, through the Center for the Physics of Biological Function (PHY–1734030) and a Graduate Research Fellowship (C.M.H.); by the National Institutes of Health through the BRAIN initiative (R01EB026943); by the James S McDonnell Foundation through a Postdoctoral Fellowship Award (C.W.L.); and by Fellowships from the Simons Foundation and the John Simon Guggenheim Memorial Foundation (W.B.).

∗ Corresponding author: christopher.lynn@yale.edu

[1] Norbert Wiener, Nonlinear Problems in Random Theory (MIT Press, Cambridge, MA, 1958).
[2] Leon N Cooper, “A possible organization of animal memory and learning,” in Collective Properties of Physical Systems: Proceedings of Nobel Symposium 24, edited by B Lundqvist and S Lundqvist (Academic Press, New York, 1973) pp. 252–264.
[3] William A Little, “The existence of persistent states in the brain,” Math. Biosci. 19, 101–120 (1974).
[4] John J Hopfield, “Neural networks and physical systems with emergent collective computational abilities,” Proc. Natl. Acad. Sci. U.S.A. 79, 2554–2558 (1982).
[5] Daniel J Amit, Modeling Brain Function: The World of Attractor Neural Networks (Cambridge University Press, 1989).
[6] John Hertz, Anders Krogh, and Richard G Palmer, Introduction to the Theory of Neural Computation (Addison–Wesley, Redwood City, 1991).
[7] Ronen Segev, Joe Goodhouse, Jason Puchalla, and Michael J Berry II, “Recording spikes from a large fraction of the ganglion cells in a retinal patch,” Nat. Neurosci. 7, 1155–1162 (2004).
[8] AM Litke, N Bezayiff, EJ Chichilnisky, W Cunningham, W Dabrowski, AA Grillo, M Grivich, P Grybos, P Hottowy, S Kachiguine, et al., “What does the eye tell the brain?: Development of a system for the large-scale recording of retinal output activity,” IEEE Trans. Nucl. Sci. 51, 1434–1440 (2004).
[9] Jason E Chung, Hannah R Joo, Jiang Lan Fan, Daniel F Liu, Alex H Barnett, Supin Chen, Charlotte Geaghan-Breiner, Mattias P Karlsson, Magnus Karlsson, Kye Y Lee, et al., “High-density, long-lasting, and multi-region electrophysiological recordings using polymer electrode arrays,” Neuron 101, 21–31 (2019).
[10] Daniel A Dombeck, Christopher D Harvey, Lin Tian, Loren L Looger, and David W Tank, “Functional imaging of hippocampal place cells at cellular resolution during virtual navigation,” Nat. Neurosci. 13, 1433–1440 (2010).
[11] Lin Tian, Jasper Akerboom, Eric R Schreiter, and Loren L Looger, “Neural activity imaging with genetically encoded calcium indicators,” Prog. Brain Res. 196, 79–94 (2012).
[12] Jeffrey Demas, Jason Manley, Frank Tejera, Kevin Barber, Hyewon Kim, Francisca Martínez Traub, Brandon Chen, and Alipasha Vaziri, “High-speed, cortex-wide volumetric recording of neuroactivity at cellular resolution using light beads microscopy,” Nat. Methods 18, 1103–1111 (2021).
[13] Nicholas A Steinmetz, Cagatay Aydin, Anna Lebedeva, Michael Okun, Marius Pachitariu, Marius Bauza, Maxime Beau, Jai Bhagat, Claudia Böhm, Martijn Broux, et al., “Neuropixels 2.0: A miniaturized high-density probe for stable, long-term brain recordings,” Science 372, eabf4588 (2021).
[14] Elad Schneidman, Michael J Berry II, Ronen Segev, and William Bialek, “Weak pairwise correlations imply strongly correlated network states in a neural population,” Nature 440, 1007 (2006).
[15] H Chau Nguyen, Riccardo Zecchina, and Johannes Berg, “Inverse statistical problems: from the inverse Ising problem to data science,” Adv. Phys. 66, 197–261 (2017).
[16] Leenoy Meshulam, Jeffrey L Gauthier, Carlos D Brody, David W Tank, and William Bialek, “Collective behav-