Mechanisms of Dimensionality Reduction and Decorrelation in Deep Neural Networks
(Received 30 October 2017; revised manuscript received 26 November 2018; published 14 December 2018)
Deep neural networks are widely used in various domains. However, the nature of computations at each layer
of the deep networks is far from being well understood. Increasing the interpretability of deep neural networks
is thus important. Here, we construct a mean-field framework to understand how compact representations are
developed across layers, not only in deterministic deep networks with random weights but also in generative deep
networks where unsupervised learning is carried out. Our theory shows that the deep computation implements
a dimensionality reduction while maintaining a finite level of weak correlations between neurons for possible
feature extraction. Mechanisms of dimensionality reduction and decorrelation are unified in the same framework.
This work may pave the way for understanding how a sensory hierarchy works.
DOI: 10.1103/PhysRevE.98.062313
MECHANISMS OF DIMENSIONALITY REDUCTION AND … PHYSICAL REVIEW E 98, 062313 (2018)
be blocked from passing through that layer, where the neural activity becomes completely independent. High correlations imply strong statistical dependence and thus redundancy. An efficient representation must not be highly redundant [12], because a highly redundant representation cannot be easily disentangled and is thus not useful for computation; e.g., coadaptation of neural activities is harmful for feature extraction [22].

The dimensionality reduction results from the nested nonlinear transformation of input data. For a mechanistic explanation, by noting that \Delta_{ij}^l is of the order O(1/\sqrt{N}), we expand C_{ij}^l = K_{ij}\Delta_{ij}^l + O[(\Delta_{ij}^l)^2] in the large-N limit, where \kappa \equiv \overline{[\phi'(x_i^0)]^2} (the average is taken over the random network parameters; see Appendix B). For the random model, N\Delta^1 = g^2\kappa^2(N\Delta^0 + 1), which determines a critical \Delta^* = g^2\kappa^2/(1 - g^2\kappa^2) (the so-called operating point), such that a first boost of the correlation strength is observed when \Delta^0 < \Delta^* (Fig. 2); otherwise, the correlation level is maintained, or decorrelation is achieved. The iteration of \Delta^l can be used to derive

\tilde{D}^1 = \frac{1}{(N-1)\Delta^0 + 1 + \Upsilon},

where \Upsilon > 0 and its value depends on the layer's parameters. Thus \Upsilon determines how significantly the dimensionality is reduced, and the dimensionality reduction is explained as \tilde{D}^1 < \tilde{D}^0 = 1/[(N-1)\Delta^0 + 1]. This relationship carries over to deeper layers (Appendix B), owing to an additive positive term in the denominator of the dimension formula. Therefore, the decorrelation of redundant inputs together with the dimensionality reduction is theoretically explained.

III. A STOCHASTIC DEEP NETWORK

It is of practical interest to see whether a deep generative model trained in an unsupervised way has a similar collective behavior. We consider a deep belief network (DBN) as a typical example of stochastic deep networks [5], in which each neuron's activity at one hidden layer takes a binary value (±1) according to a stochastic function of the neuron's preactivation. Specifically, the DBN is composed of multiple restricted Boltzmann machines (RBMs) stacked on top of each other (Fig. 1). An RBM is a two-layer neural network with no lateral connections within each layer; the bottom (top) layer is also called the visible (hidden) layer. Therefore, given the input h^l at the lth layer, the neural representation at the higher [(l+1)th] layer is determined by a conditional probability:

P(h^{l+1}|h^l) = \prod_i \frac{e^{h_i^{l+1}([w^{l+1}h^l]_i + b_i^{l+1})}}{2\cosh([w^{l+1}h^l]_i + b_i^{l+1})}.   (2)

Similarly, P(h^l|h^{l+1}) is also factorized.

The DBN as a generative model, once the network parameters (weights and biases) are learned (so-called training) from a data distribution, can be used to reproduce samples mimicking that data distribution. With deep layers, the network becomes more expressive at capturing high-order interdependence among components of a high-dimensional input, compared with a shallow RBM network. To study the expressive property of the DBN, we first specify a data distribution generated by a random RBM whose parameters follow the normal distribution N(0, g/N) for weights and N(0, \sigma_b) for biases. Using the random RBM as a data generator allows us to calculate analytically the complexity of the input data. In the random RBM, the hidden neural activity h at the top layer can be marginalized over using the conditional independence [Eq. (2)]; thus the distribution of the representation v at the bottom layer can be expressed as (Appendix C)

P(v) = \frac{1}{Z} e^{\sum_i b_i v_i} \prod_a 2\cosh([w v]_a + b_a),   (3)

where a is the site index of hidden neurons and Z is the partition function. Based on the Bethe approximation [23], which captures weak correlations among neurons, the covariance of the neural activity (with the same definition as before) under Eq. (3) can be computed from the approximate free energy using linear response theory (Appendix D). The estimated statistics of the random-RBM representation are used as a starting point from which the mean-field complexity-propagation equation [Eq. (1)] iterates, for the investigation of the dimensionality and the redundancy reduction in the deep generative model.

Finally, we study the generative deep network where the network parameters are learned in a bottom-up pass from the representations at lower layers. The network parameters for each stacked RBM in the DBN are updated by the contrastive divergence procedure truncated to one step [24]. With this layerwise training, each layer learns a nonlinear transformation of the data, and upper layers are conjectured to learn more abstract (complex) concepts, which is a key step in object recognition problems [25]. One typical learning trajectory for each layer is shown in the top inset of Fig. 3(a), where the reconstruction error decreases with the learning epoch. The input data and subsequent representation complexity are captured very well by the theory [Fig. 3(b)]. We use the mean-field framework derived for deterministic networks to study the complexity propagation (starting from the statistics of the input data), which is reasonable because, to suppress the noise due to sampling, the mean activities at the intermediate layer are used as the input data when the next layer is learned [24]. Therefore, the stochasticity of the neural response is implicitly encoded into the learned parameters during training.

Compared to the initial input dimensionality, the representation dimensionality created by successive layers becomes lower [Fig. 3(a)], which coincides with the observations in the deterministic random deep networks. This feature does not change when more neurons are used in each layer. During learning, the evolution of the dimensionality displays a nonmonotonic behavior [Fig. 3(c)]: the dimensionality first increases and then decreases to a stationary value. Moreover, the learning decorrelates the correlated input, whereas, after the first drop, the learning seems to preserve a finite level of correlations [bottom inset of Fig. 3(a)]. These compact representations may remove some irrelevant factors in the input, which facilitates the formation of easily decoded representations at deeper layers.
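The factorized ±1 stochastic update of Eq. (2) is straightforward to exercise numerically. The sketch below is an illustration only (not the paper's code); the layer sizes, the numpy-based sampler, and the helper name `sample_layer` are assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_layer(h, w, b, rng):
    """Sample h^{l+1} ~ P(h^{l+1} | h^l) of Eq. (2) for ±1 units.

    For a binary unit, P(h_i = +1 | h) = e^{x_i} / (2 cosh x_i)
    = 1 / (1 + e^{-2 x_i}), with preactivation x_i = [w h]_i + b_i.
    """
    x = w @ h + b
    p_plus = 1.0 / (1.0 + np.exp(-2.0 * x))
    return np.where(rng.random(x.shape) < p_plus, 1.0, -1.0)

# Toy layer: N0 = 4 visible units, N1 = 3 hidden units,
# weights ~ N(0, g/N0) and biases ~ N(0, sigma_b), as in the random model.
N0, N1, g, sigma_b = 4, 3, 0.8, 0.1
w = rng.normal(0.0, np.sqrt(g / N0), size=(N1, N0))
b = rng.normal(0.0, np.sqrt(sigma_b), size=N1)
h0 = rng.choice([-1.0, 1.0], size=N0)
h1 = sample_layer(h0, w, b, rng)
```

Stacking such conditional samplers layer by layer reproduces the top-down generative pass of the DBN.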
FIG. 3. (a) Representation behavior as a function of depth in generative deep networks. Ten network realizations are considered for each network width. The top inset shows an example (N = 150) of the reconstruction error (\epsilon \equiv \|h - h'\|_2^2) between the input h and the reconstruction h' for each layer during learning. The bottom inset shows the overall strength of covariance as a function of depth. (b) Numerically estimated off-diagonal correlation versus its theoretical prediction (N = 150). In these plots, we generate M = 60 000 training examples (each an N-dimensional vector) from the random RBM whose parameters follow the normal distribution N(0, g/N) for weights and N(0, \sigma_b) for biases; g = 0.8 and \sigma_b = 0.1. These examples are then learned by the DBN (see simulation details in Appendix C). (c) One typical learning trial shows how the estimated dimensionality evolves.
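The normalized dimensionality used throughout, \tilde{D} = [\mathrm{Tr}(C)]^2/[N\,\mathrm{Tr}(C^2)], can be estimated directly from sampled activity. The following is a minimal illustrative sketch (synthetic Gaussian data stand in for network activity; the function name is our own):

```python
import numpy as np

def normalized_dimensionality(samples):
    """Estimate D~ = [Tr(C)]^2 / (N Tr(C^2)) from an (M, N) sample matrix.

    C is the covariance of the N units; D~ is 1/N times the participation
    ratio of the eigenvalues of C, so 1/N <= D~ <= 1.
    """
    C = np.cov(samples, rowvar=False)
    N = C.shape[0]
    return np.trace(C) ** 2 / (N * np.trace(C @ C))

rng = np.random.default_rng(1)
# Uncorrelated units give D~ close to 1; perfectly correlated units
# give D~ = 1/N, since C becomes rank one.
x_uncorr = rng.normal(size=(5000, 50))
common = rng.normal(size=(5000, 1))
x_corr = np.repeat(common, 50, axis=1)
d_uncorr = normalized_dimensionality(x_uncorr)
d_corr = normalized_dimensionality(x_corr)
```

The two limiting cases bracket the behavior reported in Fig. 3(a): redundant (correlated) representations have small \tilde{D}, decorrelated ones have \tilde{D} near 1.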
Our theoretical analysis in random neural networks will likely carry over to this unsupervised learning system; e.g., the operating point may explain the low level of preserved correlation.

By looking at the eigenvalue distribution of the covariance matrix, we find that the distribution for the unsupervised learning system deviates significantly from the Marchenko-Pastur law of a Wishart ensemble [26] (Appendix E). For the random neural networks, the eigenvalue distribution at deep layers seems to assign a higher probability density when the eigenvalue gets close to zero, yet a lower density at the tail of the distribution, compared with the Marchenko-Pastur law of a random-sample covariance matrix [27] (Appendix E). Therefore, the dimensionality reduction and its relationship with decorrelation are a nontrivial result of the deep computation.

IV. SUMMARY

Brain computation can be thought of as a transformation of internal representations along different stages of a hierarchy [3,21]. Deep artificial neural networks can also be interpreted as a way of creating progressively better representations of input sensory data. Our work provides mean-field evidence for this picture: compact representations of relatively low dimensionality are progressively created by the deep computation, while a small level of correlations is still maintained to make feature extraction possible, in accord with the redundancy reduction hypothesis [12]. In the deep computation, more abstract concepts captured at higher layers along the hierarchy are typically built upon less abstract ones at lower layers, and high-level representations are generally invariant to local changes of the input [2], which coincides with our theory demonstrating a compact (compressed) representation formed by a series of dimensionality reductions. Unwanted variability may be suppressed in this compressed representation. It has been hypothesized that neuronal manifolds at lower layers are strongly entangled with each other, while at later stages manifolds are flattened so that relevant information can be easily decoded by downstream areas [1,2,21]; this connects to the small level of correlations preserved in the network for a representation that may be maximally disentangled [28].

Our work thus provides a theoretical underpinning of hierarchical representations through a physics explanation of dimensionality reduction and decorrelation. It encourages several directions, such as generalizing this theory to more complex architectures and data distributions, demonstrating how the compact representation helps generalization (invariance) or discrimination (selectivity) in a neural system [29,30], and using the revealed principles to control
the complexity of internal representations for an engineering application.

ACKNOWLEDGMENTS

I thank Hai-Jun Zhou and Chang-Song Zhou for their insightful comments. This research was supported by AMED under Grant No. JP18dm020700 and the start-up budget Grant No. 74130-18831109 of the 100 Talents Program of Sun Yat-sen University.

APPENDIX A: DERIVATION OF MEAN-FIELD EQUATIONS FOR THE COMPLEXITY PROPAGATION IN DEEP RANDOM NEURAL NETWORKS

We first derive the mean-field equation for the mean activity m_i^l by noting that its preactivation is \tilde{a}_i^l + b_i^l = a_i^l + [w^l m^{l-1}]_i + b_i^l, where a_i^l behaves like a Gaussian random variable with zero mean and variance \Delta_{ii}^l, depending on the fluctuating input; thus the average for the mean activity can be computed as a Gaussian integral:

m_i^l = \langle h_i^l \rangle = \int Dt\, \phi\bigl(\sqrt{\Delta_{ii}^l}\, t + [w^l m^{l-1}]_i + b_i^l\bigr).   (A1)

Analogously, to compute the covariance C_{ij}^l, one first evaluates the statistics of the preactivations of units i and j, i.e., a_i^l and a_j^l, which follow a joint Gaussian distribution with zero mean, variances \Delta_{ii}^l and \Delta_{jj}^l, respectively, and covariance \Delta_{ij}^l. Therefore, one can use two independent standard Gaussian random variables x and y with zero mean and unit variance to parametrize this joint distribution, which results in

C_{ij}^l = \int Dx\, Dy\, \phi\bigl(\sqrt{\Delta_{ii}}\, x + b_i^l + [w^l m^{l-1}]_i\bigr)\, \phi\bigl(\sqrt{\Delta_{jj}}\,(\psi x + y\sqrt{1-\psi^2}) + b_j^l + [w^l m^{l-1}]_j\bigr) - m_i^l m_j^l,   (A2)

where Dx = e^{-x^2/2}\,dx/\sqrt{2\pi} and \psi = \Delta_{ij}/\sqrt{\Delta_{ii}\Delta_{jj}}. The superscript l is omitted for the covariance of a^l. It is easy to verify that the above parametrization of a_i^l and a_j^l follows the same statistics as mentioned above.

APPENDIX B: THEORETICAL ANALYSIS IN THE LARGE-N LIMIT

In the thermodynamic limit, the covariance of the activation a^l, \Delta_{ij}^l, is of the order O(1/\sqrt{N}), where N is the network width. This is because both the weights and the (connected) correlations are also of the order O(1/\sqrt{N}). Thus, one can expand the covariance equation [Eq. (A2)] in the small-\Delta_{ij} limit. After the expansion and some simple algebra, one obtains

C_{ij}^l = K_{ij}\Delta_{ij} + O(\Delta_{ij}^2),   (B1)

where K_{ij} = \phi'(x_i^0)\phi'(x_j^0) and x_{i,j}^0 = b_{i,j}^l + [w^l m^{l-1}]_{i,j}. It follows that

\Delta^l \equiv \frac{2}{N(N-1)}\sum_{i<j} K_{ij}^2 \Delta_{ij}^2 = g^2\,\overline{K_{ij}^2}\Bigl[\Delta^{l-1} + \frac{1}{N^2}\sum_i \bigl(C_{ii}^{l-1}\bigr)^2\Bigr],   (B2)

where \overline{K_{ij}^2} = \bigl(\overline{[\phi'(x_i^0)]^2}\bigr)^2, in which the mean (overline) is taken over the quenched disorder, based on the facts that the correlation between different weights is negligible and that the covariance of the mean preactivations of different units is negligible as well in the thermodynamic limit.

Finally, \overline{[\phi'(x_i^0)]^2} = \int Dt\, Du\, [\phi'(\sqrt{\sigma_b}\, u + \sqrt{g Q^{l-1}}\, t)]^2, where one independent Gaussian random variable (u) corresponds to the randomness of the bias, and the other Gaussian random variable (t) corresponds to the random mean preactivation [w^l m^{l-1}]_i because of the random weights. In addition, gQ specifies the variance of the mean preactivation, with the definition Q = \frac{1}{N}\sum_i m_i^2 (the spin-glass order parameter in physics). Clearly, Q can be iteratively computed from one layer to the next as follows:

Q^l = \int Dt\, Du\, \phi^2\bigl(\sqrt{\sigma_b}\, u + \sqrt{g Q^{l-1}}\, t\bigr).   (B3)

Note that the initial Q^0 = 0 by the construction of the random model.

Looking at l = 1 for the random model, one finds \overline{[\phi'(x_i^0)]^2} = \int Du\, [\phi'(\sqrt{\sigma_b}\, u)]^2 \equiv \kappa as defined in the main text. It follows that \Delta^1 = g^2\kappa^2\Delta^0 + g^2\kappa^2/N. Following the definition of the normalized dimensionality, \tilde{D} = [\mathrm{Tr}(C)]^2/[N\,\mathrm{Tr}(C^2)], one notes that \overline{[\phi'(x_i^0)]^4}/\bigl(\overline{[\phi'(x_i^0)]^2}\bigr)^2 \geq 1.

To derive the relationship between the dimensionalities of consecutive layers, we first define K_1^l = \frac{1}{N}\sum_i C_{ii}^l and K_2^l = \frac{1}{N}\sum_i (C_{ii}^l)^2, and then we get the normalized dimensionality of layer l as

\tilde{D}^l = \frac{(K_1^l)^2}{(N-1)\Delta^l + K_2^l},   (B4)

which is compared with the counterpart at the higher layer l+1 given by

\tilde{D}^{l+1} = \frac{(K_1^l)^2}{(N-1)\Delta^l + K_2^l + (K_1^l)^2\,\Upsilon}.   (B5)

To derive Eq. (B5), we used Eq. (B2). Because the additive term (K_1^l)^2\Upsilon in the denominator is always positive, Eq. (B5) explains the dimensionality reduction across layers. The value of the additive term thus determines how significantly the dimensionality is reduced. Its behavior with an increasing number of layers can also be analyzed within the large-N expansion. First, we derive the recursion equation for K_1^l by following the same principle as mentioned above. Second, according to the definition, it is easy to write that

\Upsilon = \frac{\int Dt\, Du\, [\phi'(\sqrt{\sigma_b}\, u + \sqrt{g Q^{l-1}}\, t)]^4}{\bigl(\int Dt\, Du\, [\phi'(\sqrt{\sigma_b}\, u + \sqrt{g Q^{l-1}}\, t)]^2\bigr)^2}.   (B7)

Last, we find that the additive positive term tends to a very small value as the number of layers increases (Fig. 4), which is consistent with the observation in a finite-N system. This implies that the estimated dimensionality at deep layers becomes nearly constant, due to the nearly vanishing additive term.

In the layerwise training (Appendix C), the parameters of each RBM are updated according to

\Delta w_{ia} = \eta\bigl[\langle \sigma_i \langle s_a\rangle_{p(s_a|\sigma)}\rangle_{\rm data} - \langle s_a \sigma_i\rangle_{\rm model}\bigr],   (C3a)

\Delta b_i^v = \eta\bigl[\langle \sigma_i\rangle_{\rm data} - \langle \sigma_i\rangle_{\rm model}\bigr],   (C3b)

\Delta b_a^h = \eta\bigl[\langle\langle s_a\rangle_{p(s_a|\sigma)}\rangle_{\rm data} - \langle s_a\rangle_{\rm model}\bigr],   (C3c)

where \eta specifies a learning rate. Since there are no lateral connections between neurons at each layer, given one layer's activity, the other layer's activity is factorized as

P(s|\sigma) = \prod_a \frac{e^{s_a([w\sigma]_a + b_a^h)}}{2\cosh([w\sigma]_a + b_a^h)}.   (C4)
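Equations (C3a)-(C3c) can be turned into a minimal CD-1 step for ±1 units. The sketch below is an illustration under the paper's conventions, not the actual training code: the batch average replaces the exact model average, the single reconstruction step implements the one-step truncation, and all names (`cd1_update`, `mean_hidden`) are our own.

```python
import numpy as np

rng = np.random.default_rng(2)

def mean_hidden(sigma, w, bh):
    """<s_a>_{p(s|sigma)} = tanh([w sigma]_a + b_a^h) for ±1 hidden units."""
    return np.tanh(sigma @ w + bh)

def cd1_update(sigma_data, w, bv, bh, eta, rng):
    """One CD-1 step of Eqs. (C3a)-(C3c) on a mini-batch of ±1 visible configs."""
    B = sigma_data.shape[0]
    # Positive (data) phase.
    h_mean_data = mean_hidden(sigma_data, w, bh)
    # Negative (model) phase: sample hidden, reconstruct visible once.
    # For ±1 units, P(s = +1) = (1 + tanh(x)) / 2.
    h_samp = np.where(rng.random(h_mean_data.shape) < (1 + h_mean_data) / 2, 1.0, -1.0)
    v_mean = np.tanh(h_samp @ w.T + bv)
    v_samp = np.where(rng.random(v_mean.shape) < (1 + v_mean) / 2, 1.0, -1.0)
    h_mean_model = mean_hidden(v_samp, w, bh)
    # Parameter increments.
    dw = eta * (sigma_data.T @ h_mean_data - v_samp.T @ h_mean_model) / B
    dbv = eta * (sigma_data.mean(0) - v_samp.mean(0))
    dbh = eta * (h_mean_data.mean(0) - h_mean_model.mean(0))
    return w + dw, bv + dbv, bh + dbh

# Toy sizes; the paper uses B = 150 and eta = 0.12 initially.
N, Na, B = 20, 10, 150
w = rng.normal(0, np.sqrt(0.8 / N), size=(N, Na))
bv, bh = np.zeros(N), np.zeros(Na)
batch = rng.choice([-1.0, 1.0], size=(B, N))
w, bv, bh = cd1_update(batch, w, bv, bh, eta=0.12, rng=rng)
```

Iterating this update over mini-batches, with the learning-rate decay and weight decay of Appendix C, gives the layerwise training loop.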
N(0, g/N) for weights and N(0, \sigma_b) for biases, with g = 0.8 and \sigma_b = 0.1 unless otherwise specified. Using the RBM as a data generator allows us to control the complexity of the input data. In addition, the RBM has been used to model many real data sets (e.g., handwritten digits [24]). In the numerical simulations (Fig. 3 in the main text), we generate M = 60 000 training examples (each example is an N-dimensional vector) from the random RBM. These examples are then learned by the RBMs in the DBN. We divide the entire data set into mini-batches of size B = 150. One epoch corresponds to a sweep of the entire data set. Each RBM is trained for tens of epochs until the reconstruction error (\epsilon \equiv \|h - h'\|_2^2) between the input h and the reconstructed input h' does not decrease. We use an initial learning rate of 0.12 divided by t/10 at the tth epoch, and an \ell_2 weight-decay parameter of 0.0025.

APPENDIX D: ESTIMATING THE COVARIANCE STRUCTURE OF A RANDOM RESTRICTED BOLTZMANN MACHINE

To study the complexity propagation in the DBN, it is necessary to evaluate the statistics of the input data distribution, which is provided by a random RBM in the model setup of stochastic neural networks. This is because the estimated covariance can be used as a starting point from which the mean-field complexity-propagation equation iterates. In addition, characterizing the RBM representation may provide insights towards deep representations, since the RBM is a building block for deep models and moreover a universal approximator of discrete distributions [31].

Given the RBM, the hidden neural activity at the higher layer (e.g., s) can be marginalized over using the conditional independence [Eq. (C4)]; thus the distribution of the representation at the lower layer (e.g., \sigma) can be expressed as

P(\sigma) = \sum_s P(s, \sigma) = \frac{1}{Z} e^{\sum_i \sigma_i b_i^v} \prod_a 2\cosh\bigl([w\sigma]_a + b_a^h\bigr),   (D1)

where Z is the partition function. For binary units the diagonal covariance is fixed exactly by the hard-spin constraint, i.e., C_{ii} = 1 - m_i^2, an identity that the linear response of the Bethe approximation [32] does not automatically respect. Therefore, we impose the statistical consistency of the diagonal terms on a corrected free energy as \tilde{F}_{\rm Bethe} = F_{\rm Bethe} - \frac{1}{2}\sum_i \Lambda_i (1 - m_i^2) [33-35]. Following a procedure similar to our previous work [16], we obtain the following mean-field iterative equations:

m_{i\to a} = \tanh\Bigl(b_i^v - \Lambda_i m_i + \sum_{a' \in \partial i \setminus a} u_{a' \to i}\Bigr),   (D2a)

u_{a' \to i} = \frac{1}{2}\ln \frac{\cosh\bigl(b_{a'}^h + G_{a' \to i} + w_{ia'}\bigr)}{\cosh\bigl(b_{a'}^h + G_{a' \to i} - w_{ia'}\bigr)},   (D2b)

where G_{a' \to i} \equiv \sum_{j \in \partial a' \setminus i} w_{ja'} m_{j \to a'} and the correction introduces an Onsager term (-\Lambda_i m_i). The cavity magnetization m_{i\to a} can be understood as the message passed from the visible node i to the factor node a, while the cavity bias u_{a\to i} is interpreted as the message passed from the factor node a to the visible node i. In fact, Eq. (D2) is not closed: \{\Lambda_i\} must be computed from the correlations. Therefore, we define the cavity susceptibility \chi_{i\to a,k} \equiv \partial m_{i\to a}/\partial b_k^v [36]. According to this definition and linear response theory, we close Eq. (D2) by obtaining the following susceptibility propagation equations:

\chi_{i\to a,k} = \bigl(1 - m_{i\to a}^2\bigr)\sum_{a' \in \partial i \setminus a} \Gamma_{a' \to i} P_{a' \to i,k} + \delta_{ik}\bigl(1 - m_{i\to a}^2\bigr) - \Lambda_i C_{ik},   (D3a)

C_{ik} = \frac{\bigl(1 - m_i^2\bigr) F_{ik}}{1 + \bigl(1 - m_i^2\bigr)\Lambda_i},   (D3b)

\Lambda_i = \frac{F_{ii} - 1}{1 - m_i^2},   (D3c)

where the full magnetization m_i = \tanh\bigl(b_i^v - \Lambda_i m_i + \sum_{a' \in \partial i} u_{a' \to i}\bigr), \Gamma_{a\to i} \equiv \frac{\tanh(w_{ia})\bigl[1 - \tanh^2(b_a^h + G_{a\to i})\bigr]}{1 - \tanh^2(b_a^h + G_{a\to i})\tanh^2(w_{ia})}, P_{a\to i,k} \equiv \sum_{j\in\partial a\setminus i}\chi_{j\to a,k}, and F_{ik} \equiv \delta_{ik} + \sum_{a\in\partial i}\Gamma_{a\to i}P_{a\to i,k}.
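With the Onsager correction switched off (\Lambda_i = 0, i.e., plain belief propagation on the Bethe free energy rather than the corrected scheme closed by Eq. (D3)), the cavity equations (D2) reduce to a simple fixed-point iteration. The sketch below is illustrative only: a fully connected toy RBM, damping added for stability, and all names are our own.

```python
import numpy as np

rng = np.random.default_rng(3)

def bethe_magnetizations(w, bv, bh, iters=200, damping=0.5):
    """Iterate the cavity equations (D2) with Lambda_i = 0 (plain Bethe/BP).

    m[i, a] is the cavity magnetization m_{i->a} from visible node i to
    factor (hidden unit) a; u[a, i] is the cavity bias u_{a->i} back.
    Returns the full visible magnetizations m_i.
    """
    N, Na = w.shape
    m = np.zeros((N, Na))
    u = np.zeros((Na, N))
    for _ in range(iters):
        # G_{a->i} = sum_{j != i} w_{ja} m_{j->a}
        G = (w * m).sum(axis=0)[:, None] - (w * m).T
        u_new = 0.5 * (np.log(np.cosh(bh[:, None] + G + w.T))
                       - np.log(np.cosh(bh[:, None] + G - w.T)))
        u = damping * u + (1 - damping) * u_new
        # m_{i->a} = tanh(b_i^v + sum_{a' != a} u_{a'->i})
        total_u = u.sum(axis=0)
        m = np.tanh(bv[:, None] + total_u[:, None] - u.T)
    return np.tanh(bv + u.sum(axis=0))

N, Na = 10, 5
w = rng.normal(0, np.sqrt(0.5 / N), size=(N, Na))
bv = rng.normal(0, 0.3, size=N)
bh = rng.normal(0, 0.3, size=Na)
m_full = bethe_magnetizations(w, bv, bh)
```

The weak-coupling regime used here (weights of order 1/\sqrt{N}) is exactly where the Bethe approximation is accurate and the iteration converges; closing the scheme with Eqs. (D3a)-(D3c) would additionally yield the covariance C_{ik}.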
FIG. 5. The eigenvalue distribution of the covariance matrix estimated from the deep computation. (a) The distribution for the deep networks with random weights (N = 100). One hundred instances are used. In the inset, the distribution at deeper layers is compared with the Marchenko-Pastur law of a random-sample covariance matrix; the tail part is enlarged for comparison. (b) The distribution obtained from an unsupervised learning system of deep belief networks is strongly different from the Marchenko-Pastur law. Ten instances are used.
dynamics of learning in RBMs [37]). Our mean-field formula [Eqs. (D2) and (D3)] may offer a basis to be further improved to address this interesting special property, although one can enforce the symmetry by taking [C_{ij} + C_{ji}]/2 in our mean-field formula.

APPENDIX E: THE EIGENVALUE DISTRIBUTION OF THE COVARIANCE MATRIX

To analyze the eigenvalue distribution of the covariance matrix at each layer of the deep networks, we first construct a random-sample covariance matrix. More precisely, we consider a real Wishart ensemble, where a random-sample covariance matrix is defined as \frac{1}{N}\xi\xi^T, in which \xi is an N \times P matrix whose entries independently follow a normal distribution N(0, \varsigma^2). In fact, \xi can be thought of as a random uncorrelated pattern matrix. To compare the real Wishart ensemble with the covariance matrix estimated from the mean-field theory of deep networks, we choose P = N, and \varsigma^2 is obtained by matching the range of the eigenvalues. The designed random-sample covariance matrix obeys the Marchenko-Pastur law for the density of eigenvalues [26,27]:

\mu(\lambda) = \frac{1}{2\pi\lambda\varsigma^2}\bigl[(\lambda - \lambda_-)(\lambda_+ - \lambda)\bigr]^{1/2},   (E1)

where \lambda \in [\lambda_-, \lambda_+], \lambda_- = 0, and \lambda_+ = 4\varsigma^2.

For the random neural networks, the eigenvalue distribution at deep layers seems to assign a higher probability density when the eigenvalue gets close to zero, yet a lower density at the tail of the distribution [Fig. 5(a)], compared with the Marchenko-Pastur law of a random-sample covariance matrix. For the unsupervised learning system, we find that the distribution deviates significantly from the Marchenko-Pastur law of a Wishart ensemble: it has a Gaussian-like bulk part together with a long tail [Fig. 5(b)]. Therefore, the dimensionality reduction and its relationship with decorrelation are a nontrivial result of the deep computation.
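The Wishart construction and the density of Eq. (E1) can be checked numerically in a few lines. This is an illustrative sketch, not the paper's code; N and \varsigma^2 are arbitrary here.

```python
import numpy as np

rng = np.random.default_rng(4)

# Empirical eigenvalues of the random-sample covariance matrix (1/N) xi xi^T
# with P = N, to be compared with the Marchenko-Pastur density of Eq. (E1).
N = 400
sigma2 = 1.0  # variance ς^2 of the entries of xi
xi = rng.normal(0.0, np.sqrt(sigma2), size=(N, N))
C = xi @ xi.T / N
eigs = np.linalg.eigvalsh(C)

def mp_density(lam, s2):
    """Marchenko-Pastur density of Eq. (E1) for the square case P = N:
    support [0, 4 ς^2], with an inverse-square-root edge at zero."""
    lam_plus = 4.0 * s2
    out = np.zeros_like(lam)
    inside = (lam > 0) & (lam < lam_plus)
    out[inside] = np.sqrt(lam[inside] * (lam_plus - lam[inside])) / (
        2.0 * np.pi * lam[inside] * s2)
    return out
```

Plotting a histogram of `eigs` against `mp_density` reproduces the reference curve used in the insets of Fig. 5; the empirical spectrum fills the support [0, 4\varsigma^2] up to finite-size edge fluctuations.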
[1] J. J. DiCarlo and D. D. Cox, Trends Cognit. Sci. 11, 333 (2007).
[2] J. J. DiCarlo, D. Zoccolan, and N. C. Rust, Neuron 73, 415 (2012).
[3] N. Kriegeskorte and R. A. Kievit, Trends Cognit. Sci. 17, 401 (2013).
[4] D. Marr, Proc. R. Soc. London, Ser. B 176, 161 (1970).
[5] G. E. Hinton and R. R. Salakhutdinov, Science 313, 504 (2006).
[6] B. Poole, S. Lahiri, M. Raghu, J. Sohl-Dickstein, and S. Ganguli, in Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett (Curran Associates, Red Hook, NY, 2016), pp. 3360-3368.
[7] B. Babadi and H. Sompolinsky, Neuron 83, 1213 (2014).
[8] J. Kadmon and H. Sompolinsky, in Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett (Curran Associates, Red Hook, NY, 2016), pp. 4781-4789.
[9] E. Domany, W. Kinzel, and R. Meir, J. Phys. A: Math. Gen. 22, 2081 (1989).
[10] G. F. Elsayed and J. P. Cunningham, Nat. Neurosci. 20, 1310 (2017).
[11] U. Guclu and M. A. J. van Gerven, J. Neurosci. 35, 10005 (2015).
[12] H. Barlow, in Sensory Communication, edited by W. Rosenblith (MIT, Cambridge, MA, 1961), pp. 217-234.
[13] J. Tubiana and R. Monasson, Phys. Rev. Lett. 118, 138301 (2017).
[14] S. S. Schoenholz, J. Pennington, and J. Sohl-Dickstein, arXiv:1710.06570.
[15] A. Barra, G. Genovese, P. Sollich, and D. Tantari, Phys. Rev. E 97, 022310 (2018).
[16] H. Huang and T. Toyoizumi, Phys. Rev. E 91, 050101 (2015).
[17] H. Huang and T. Toyoizumi, Phys. Rev. E 94, 062310 (2016).
[18] H. Huang, J. Stat. Mech.: Theory Exp. (2017) 053302.
[19] J. Pennington and Y. Bahri, in Proceedings of the 34th International Conference on Machine Learning, Vol. 70 of Proceedings of Machine Learning Research, edited by D. Precup and Y. W. Teh (PMLR, International Convention Centre, Sydney, Australia, 2017), pp. 2798-2806.
[20] K. Rajan, L. Abbott, and H. Sompolinsky, in Advances in Neural Information Processing Systems 23, edited by J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta (Curran Associates, Red Hook, NY, 2010), pp. 1975-1983.
[21] D. L. Yamins and J. J. DiCarlo, Nat. Neurosci. 19, 356 (2016).
[22] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, J. Mach. Learn. Res. 15, 1929 (2014).
[23] H. A. Bethe, Proc. R. Soc. London, Ser. A 150, 552 (1935).
[24] G. Hinton, S. Osindero, and Y. Teh, Neural Comput. 18, 1527 (2006).
[25] H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng, Commun. ACM 54, 95 (2011).
[26] V. Marcenko and L. Pastur, Math. USSR-Sb. 1, 457 (1967).
[27] R. Cherrier, D. S. Dean, and A. Lefévre, Phys. Rev. E 67, 046112 (2003).
[28] A. Achille and S. Soatto, arXiv:1706.01350.
[29] H. Hong, D. L. Yamins, N. J. Majaj, and J. J. DiCarlo, Nat. Neurosci. 19, 613 (2016).
[30] M. Pagan, L. S. Urban, M. P. Wohl, and N. C. Rust, Nat. Neurosci. 16, 1132 (2013).
[31] N. Le Roux and Y. Bengio, Neural Comput. 20, 1631 (2008).
[32] M. Mézard and A. Montanari, Information, Physics, and Computation (Oxford University, Oxford, 2009).
[33] H. Huang and Y. Kabashima, Phys. Rev. E 87, 062129 (2013).
[34] J. Raymond and F. Ricci-Tersenghi, Phys. Rev. E 87, 052111 (2013).
[35] M. Yasuda and K. Tanaka, Phys. Rev. E 87, 012134 (2013).
[36] S. Higuchi and M. Mézard, J. Phys.: Conf. Ser. 233, 012003 (2010).
[37] A. Decelle, G. Fissore, and C. Furtlehner, Europhys. Lett. 119, 60001 (2017).