

Analyzing the Scaling of Connectivity in Neuromorphic Hardware and in Models of Neural Networks

Johannes Partzsch and René Schüffny

Abstract— In recent years, neuromorphic hardware systems have significantly grown in size. With more and more neurons and synapses integrated in such systems, the neural connectivity and its configurability have become crucial design constraints. To tackle this problem, we introduce a generic extended graph description of connection topologies that allows a systematic analysis of connectivity in both neuromorphic hardware and neural network models. The unifying nature of our approach enables a close exchange between hardware and models. For an existing hardware system, the optimally matched network model can be extracted. Inversely, a hardware architecture may be fitted to a particular model network topology with our description method. As a further strength, the extended graph can be used to quantify the amount of configurability for a certain network topology. This is a hardware design variable that has widely been neglected, mainly because of a missing analysis method. To condense our analysis results, we develop a classification for the scaling complexity of network models and neuromorphic hardware, based on the total number of connections and the configurability. We find a gap between several models and existing hardware, making these hardware systems either impossible or inefficient to use for scaled-up network models. In this respect, our analysis results suggest models with locality in their connections as a promising approach for tackling this scaling gap.

Index Terms— Connectivity, network scaling, network topology, neuromorphic hardware.

Manuscript received May 12, 2010; revised January 4, 2011; accepted March 13, 2011. Date of publication May 13, 2011; date of current version June 2, 2011. This work was supported in part by the European Union Seventh Framework Programme (FP7/2007-2013) under Grant 269921 (BrainScaleS) and Grant 269459. The work of J. Partzsch was supported by a doctoral scholarship of the Konrad Adenauer Foundation. The authors are with the Chair for Parallel VLSI Systems and Neuro-Microelectronics, Circuits and Systems Laboratory, Department of Electrical Engineering and Information Technology, University of Technology Dresden, Saxony 01062, Germany (e-mail: johannes.partzsch@tu-dresden.de; schueffn@iee.et.tu-dresden.de). Digital Object Identifier 10.1109/TNN.2011.2134109

I. INTRODUCTION

IN RECENT YEARS, several hardware systems for mimicking biological spiking neural networks (NNs) have been implemented [1]–[5]. These systems either offer unspecific, relatively freely configurable connectivity or implement hard-wired task-specific connections. However, the cost of freely configurable connections is growing fast with the number of neurons in the system, because the number of possible connections scales quadratically with the neuron count. This becomes critical for systems with thousands of neurons, as they are currently under development [6], [7]. On the other hand, completely hard-wired connectivity, due to its lack of flexibility, can only be used for very specific applications.

Given these prospects, a middle course between free configurability and hard-wired connections seems to be most promising for building large-scale neuromorphic hardware. Such an approach has also been adopted by biological neural networks. The work of Stepanyants et al. [8] shows that the existing synaptic connections in the brain are chosen (i.e., configured) from a much larger pool of potential connections, while potential all-to-all connectivity is not maintained on a larger scale in the brain. As this example demonstrates, orienting hardware system design on biological constraints could be very fruitful in restricting the architectural design space and in enabling a more efficient implementation of biologically inspired NN models. However, investigations on connectivity and configurability in neuromorphic hardware are often purely hardware-specific [9]–[12], not taking into account the biological counterpart.

While these general investigations are missing, specific biologically realistic network models have been realized in neuromorphic hardware, such as locally coupled networks for orientation selectivity [1], [3], networks for olfactory processing [2], realizations of selective attention mechanisms [13], or reproductions of spiking regimes in generic random networks [14]. However, these implementations are often very restricted to the particular model they were designed for. Furthermore, they represent only a small fraction of the various models for spiking NNs. For example, those models can be based on connectivity measurements, such as the V1 models of Häusler and Maass [15] and Kremkow et al. [16], or use computational neural principles for model construction, such as the HMAX model [17] or the more biologically centered column model of Lundqvist et al. [18].

When it comes to orienting neuromorphic system design on biologically realistic network models, the models' connectivity has to be analyzed. Methods for such studies can be adopted from the active research field of complex networks, which has investigated properties and functional implications of connectivity in all kinds of networks, including NNs (for reviews, see [19], [20]). Properties of network connectivity, such as the degree distribution, the small-world property, or node distances [19], have been studied extensively in recent years. Furthermore, fundamental network classes, like preferential attachment, connectivity optimization, and random graphs [19]–[21], have been defined and investigated. Specifically, for biologically centered NNs, optimization approaches [22] and scaling of connectivity [23], [24] have been studied. However, all of these investigations, while providing useful insights into general connectivity properties, draw no connection to neuromorphic hardware design.

The above connectivity analyses are based on a combination of theoretical results, often concerning asymptotic properties for large numbers of neurons, and calculations on single graphs (networks). In this paper, we employ a complementary approach that is more suited for neuromorphic hardware design. For that, we represent a whole network class by an extended graph model. Thereby, a network class can be any set of networks. For example, all networks that can be realized with a neuromorphic hardware system are described by a corresponding extended graph. This approach has several advantages. First of all, the amount of configurability for implementing a certain network class (e.g., a NN model) can be determined. Following the approach of [25], this constitutes an independent dimension of network complexity analysis [19], [24] and is an important design factor for neuromorphic hardware, determining the amount of configuration memory and the number of switches needed for implementing the network's connections. Second, connectivity of neuromorphic hardware systems and NN models can be described with the same extended graph, enabling direct comparisons of models and hardware. Third, common properties of a class can be calculated directly from the extended graph, avoiding the averaging over a large number of single networks, which is computationally expensive [21], [26]. Generally speaking, our extended graph representation is an intermediate approach between common network analysis tools. In contrast to theoretical predictions for a certain network class, such as uniform random graphs [20], it can be used for a wide variety of network classes. On the other hand, it is more general than averaging over representative graphs of a network structure [19], [21]. Finally, it does not make any assumptions on the underlying connectivity structure. This, for example, is in contrast to the method by Feng and Greene [27], who predicted the amount of configuration from the Rent exponent based on a 2-D placement model by Donath [28].

Our approach also differs from the connection set algebra [29], which generates network structures from basic templates using generic construction rules. While connections can be extended with arbitrary attributes, the result is still a single netlist. In contrast, our extended graph approach describes sets of netlists. While this set approach is implicitly contained in the extended graph, it was explicitly used by Bianconi [25] to derive the entropy of a random graph ensemble. This measure is identical to our definition of configurability, however, the applications are diverse. While the set approach of Bianconi is suited for studying general network classes, the extended graph can be used for modular hierarchical descriptions of network topologies, making it more convenient to use for specialized network models and hardware systems.

We employ the extended graph method to analyze the scaling of some common topological models of NNs, specifically population-based models and models with local connectivity, and recent neuromorphic hardware systems. Thereby, we concentrate on the total number of connections and the amount of configurability. Thus, these investigations somewhat differ from common scaling analyses, e.g., on the degree distribution [19] or on the local distribution of connections, as expressed in Rent's rule [23], [30].

We introduce the extended graph description in Section II, together with network properties that can be calculated from it and methods for representing random graphs. In Section III, several network models and their scaling properties are analyzed and classified with the methods of Section II. Section IV performs the same analyses for representative neuromorphic hardware systems and interprets differences to the network models. Furthermore, it relates the results to other common scaling analyses in Section V. Finally, consequences of the results and research prospects are discussed in Section VI.

II. NETWORK DESCRIPTION

A. Elements of the Description

The connectivity of a specific NN can be described by a directed graph consisting of nodes and directed edges. Following the terminology of NNs, we name the nodes as neurons and the edges as synapses or simply connections. Furthermore, we combine all connections starting from a single neuron into a hyperedge, which is a connection that has a single source neuron but may have several target neurons. This concept is used, for example, in partitioning computer networks [31]. To better identify the synapses (i.e., inputs) of a neuron, a synapse element is introduced that joins several connections in one element. Thus, it can have multiple inputs, but has only one output, effectively representing a reverse hyperedge. This joining of single edges has been used, for example, when describing dependency graphs of parallel programs [32]. In contrast to the synapse element, neuron elements are restricted to have a single input.

As mentioned above, neuron and synapse elements are sufficient to describe a single network. To be able to represent a whole class of networks, we introduce a permutation element. This element represents all possible assignments of its inputs to its outputs, implying equal number of inputs and outputs. However, for convenience we allow a differing input and output count, assuming that the missing inputs or outputs are not connected. With the permutation element, a set of graphs can be merged into a single extended graph. Each single graph then corresponds to a combination of assignments for each permutation element in the extended graph, which we call a configuration in the following. Symbols for the three elements in the extended graph are depicted in Fig. 1(a).

From a hardware perspective, the permutation element is similar to a switch matrix. However, it differs in the allowed combinations of active switches. Each input has to be connected to exactly one output and vice versa. Thus, whereas a switch matrix with 3 inputs and 3 outputs would allow 2^(3·3) = 512 combinations, a permutation element of the same size would represent 3! = 6 assignments [Fig. 1(b)]. Thereby, concurrency situations, in which realizing one connection excludes the realization of others, can be modeled.
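As a quick numerical illustration of this difference (not part of the original paper), the following Python lines count both configuration spaces; for three inputs and outputs they reproduce the 512 and 6 combinations mentioned above.

    from math import factorial

    def switch_matrix_combinations(n_in, n_out):
        # every crosspoint switch can be set independently
        return 2 ** (n_in * n_out)

    def permutation_assignments(n):
        # every input is wired to exactly one output and vice versa
        return factorial(n)

    print(switch_matrix_combinations(3, 3), permutation_assignments(3))   # 512 6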

However, not all forms of concurrency are representable with simple permutation elements. For example, the neuromorphic waferscale system by Schemmel et al. [6] combines the outputs of 64 neurons on a serial channel that itself is switched. Such a switching of grouped inputs, while not adequately modeled with simple permutation elements, can be described by a grouped permutation element. In this element, inputs are divided into groups of equal size and assignments are made between groups, not between single inputs. This introduces further constraints, as Fig. 1(c) illustrates. However, the network models and hardware systems analyzed in this paper can be described completely without grouped permutation elements, so that we do not further introduce them here.

Fig. 1. Extended graph description. (a) Symbols for the elements: 1 neuron, 2 synapse, 3 permutation element. (b) Comparison of a switch matrix and a permutation element: the switches are denoted as grey circles; for the permutation element, the remaining two allowed assignments are shown (dashed lines) if the first input is assigned to the second output. (c) Comparison of permutation elements of size 4 without groups and with group size 2; for the ungrouped matrix, only the allowed assignments for the first input are shown. (d) Example system, consisting of N neurons, each having Nsyn synapses; the N possible inputs for the synapses have to be transmitted over a channel with capacity for B neuron outputs.

Fig. 1(d) shows an example description containing typical elements of neuromorphic hardware: a bandwidth-restricted channel, look-up tables for the connectivity, and a synapse-and-neuron block. The channel to the hardware unit can transmit pulses from B neurons only, where B is lower than the number of neurons N in the system. Thus, the signals transmitted over the channel have to be selected from the N neuron outputs, modeled by a permutation element. At the channel output, the remaining B signals can be distributed over the synapse block, modeled by a permutation element for each synapse element, which itself connects to a single neuron. The outputs of the neurons are then fed back to the channel input. This simple example shows how the connectivity of neuromorphic systems can be represented by an extended graph.

B. Properties and Algorithms

One motivation of the developed description is to extract characteristic measures of the connectivity in a unified and automated way. In doing so, these measures can then be extracted for any network class for which an extended graph has been created. In contrast to derivations in [27], no a priori assumptions about the underlying connectivity are required, because the connectivity of the whole class is described by the extended graph.

Simple measures that can be extracted from the extended graph are the number of neurons (by counting the number of neuron elements) and the number of available synapses (by summing the input counts of the synapse elements). Note that the number of synapses in the extended graph may be higher than the synapses actually required in a single network, i.e., synapses could be switched off when not needed.

An informative measure of the extended graph is the number of possible configurations, expressed as the minimum configuration memory. This value represents the number of bits that are needed to select a single configuration out of the set of possible configurations. The same measure was employed by Bianconi [25], named entropy, to characterize randomized graph ensembles. Also, similar definitions have been used in [27] with a more abstract connectivity model and in [26] for extracting entropy measures from single netlists. Also, this entropy was used in [33] as an inverse measure of graph complexity.

The overall configuration memory can be calculated by summing the memory amounts I_j of all permutation elements (j). The value I_j is computed by taking the dyadic logarithm, denoted as ld(.), of the number of permutations

    I_j = ld( N_j! / (N_out,1^(j)! · N_out,2^(j)! · ... · N_out,k^(j)!) ).    (1)

Thereby, N_j is the size of the permutation element, which is the maximum of its inputs and its outputs. For a correct value, indistinguishable permutations have to be accounted for. These can occur, for example, when some outputs of a permutation element are connected to a single synapse element, as the synapse inputs are freely permutable. In the equation, those permutations are removed by counting the number of indistinguishable outputs to an element N_out,i^(j) and reducing the number of permutations accordingly. In contrast, we define all inputs of permutation elements as being distinguishable, so that when two permutation elements are connected, permutations in the connections between them are accounted for only once.
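The following Python sketch evaluates (1) for a single permutation element; the grouping of indistinguishable outputs is passed in explicitly. Applied to one neuron of the example system of Fig. 1(d) with N = 20 and Nsyn = 10 and no bandwidth limit, and treating the ten outputs tied to the synapse element and the ten unused outputs as two indistinguishable groups (our reading of the element, not stated verbatim in the paper), it gives about 17.5 bit per neuron, i.e., roughly 350 bit for all 20 neurons (cf. Table I).

    from math import factorial, log2

    def config_memory_bits(n_size, out_group_sizes):
        """Configuration memory I_j of one permutation element, cf. (1):
        log2 of N_j! divided by the factorials of the sizes of the groups
        of indistinguishable outputs."""
        permutations = factorial(n_size)
        for n_out in out_group_sizes:
            permutations //= factorial(n_out)
        return log2(permutations)

    # one neuron: 20 inputs, 10 outputs to one synapse element, 10 unused outputs
    per_neuron = config_memory_bits(20, [10, 10])
    print(per_neuron, 20 * per_neuron)   # about 17.5 bit per neuron, roughly 350 bit in total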
Another informative measure of the extended graph is the maximum number of simultaneous connections. In a network without permutation elements, this value is equal to the number of synapses. However, due to the constraints introduced by the permutation elements, the actual maximum number of simultaneous connections may be smaller than the synapse count. This is reminiscent of the maximum flow in directed networks [34], in which the capacities of parts of the edges may not be exploited, because bottlenecks in other parts of the network exist. In fact, the extended graph can be transformed into a directed graph, with each node representing a permutation element and edges between the nodes having capacity equal to the number of connections between the corresponding permutation elements. Therefore, branches in the connections between elements (i.e., true hyperedges) have to be shifted toward the neuron outputs and the capacities of the involved edges multiplied with the number of branches. The whole transformation process is illustrated in Fig. 2. For the calculation of the maximum flow in the resulting directed graph, standard algorithms exist that iteratively find non-exploited paths between source and sink node until no such path exists anymore [35]. In the example in Fig. 2, the resulting maximum flow is 8. This value then is an upper bound of the maximum number of simultaneous connections.
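To make the flow computation concrete, the sketch below runs a plain Edmonds-Karp search (shortest augmenting paths, one of the standard algorithms referred to above) on a simplified rendering of the Fig. 2 example with N = 4, B = 3, and Nsyn = 2. The node names and the way branching is folded into the edge capacities are our own simplification for illustration, not a verbatim copy of the figure; it nevertheless yields the maximum flow of 8.

    from collections import deque

    def max_flow(cap, s, t):
        """Edmonds-Karp: repeatedly augment along shortest residual paths."""
        flow = 0
        while True:
            parent = {s: None}
            queue = deque([s])
            while queue and t not in parent:          # BFS for an augmenting path
                u = queue.popleft()
                for v, c in cap[u].items():
                    if c > 0 and v not in parent:
                        parent[v] = u
                        queue.append(v)
            if t not in parent:
                return flow
            bottleneck, v = float("inf"), t           # bottleneck along the path
            while parent[v] is not None:
                bottleneck = min(bottleneck, cap[parent[v]][v])
                v = parent[v]
            v = t                                     # update residual capacities
            while parent[v] is not None:
                u = parent[v]
                cap[u][v] -= bottleneck
                cap[v][u] = cap[v].get(u, 0) + bottleneck
                v = u
            flow += bottleneck

    # simplified rendering of the Fig. 2 example (N = 4, B = 3, Nsyn = 2)
    cap = {
        "src":  {"chan": 12},                        # 3 channel slots, each branching to 4 neurons
        "chan": {f"syn{i}": 3 for i in range(4)},    # each synapse block sees the B = 3 signals
        "sink": {},
    }
    for i in range(4):
        cap[f"syn{i}"] = {"sink": 2}                 # Nsyn = 2 synapses per neuron
    print(max_flow(cap, "src", "sink"))              # -> 8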

Fig. 2. Generation of a directed graph with edge capacities from the example system in Fig. 1(d) with N = 4, B = 3, and Nsyn = 2. The resulting maximum flow is 4 · 2 = 8.

Fig. 3. Extraction of a matrix with connection probabilities from the extended graph example in Fig. 2. The iteration for the first neuron is shown, starting at the thick arrow.

TABLE I
TYPICAL PROPERTIES OF EXAMPLE SYSTEM IN FIG. 1(d) WITH N = 20, Nsyn = 10 AND DIFFERENT BANDWIDTHS B. RIGHTMOST COLUMN: IN THE CHANNEL PERMUTATION ELEMENT, FOUR OUTPUTS WERE ALREADY ASSIGNED

                             B = 20     B = 15     B = 8      B = 15, 4 fixed
    Config. (I)              350 bit    245 bit    17 bit     243 bit
    Max. flow                200        200        160        200
    Extracted probabilities  p = 0.5    p = 0.5    p = 0.4    pfix = 0.67, p = 0.46
Up to now, we have introduced only scalar measures to analyze the network structure. However, these reflect only mean characteristics. To investigate local structural differences, a connection matrix may be extracted from the extended graph description. Such a matrix is a common visualization method for synaptic connections [36], [37]. However, instead of synaptic weights we calculate connection probabilities. We start from the assumption that each (distinguishable) assignment of a permutation element has equal probability of being chosen. In this ideal case, the configuration memory calculated with (1) is equal to the entropy E_j of the assignments, calculated over the assignment probabilities (p_i). The following expression shows this equality:

    E_j = − Σ_i p_i · ld(p_i) = ld(K_j) = ld(2^I_j) = I_j    (2)

where K_j denotes the number of distinguishable assignments of the permutation element j. Thus, if the extended graph description of a network class results in equal assignment probabilities, its configurability is perfectly matched to that class. For extracting a connection matrix, we want to calculate the connection probabilities of this perfectly matched network class. For this, we use a modification of the Dijkstra algorithm for shortest paths [38] on each neuron individually. Instead of the path length, we store for each output of an element the probability that at least one connection from the current neuron to that output exists. Fig. 3 shows the procedure for the first neuron of the example system in Fig. 1(d). For ungrouped permutation elements, the assignment probabilities can be accounted for by dividing the input probabilities by the maximum of input and output count. Connections via different inputs of an element are disjoint, and thus can be combined by summing their probabilities (denoted as multiplication, because probability values are equal at the inputs). In the example of Fig. 3, all neurons have the same input connectivity, leading to a uniform connection matrix with probabilities 0.5. To distinguish the values in the extracted connection matrix from probabilities in network classes with random connections, we call them extracted probabilities.

Table I shows how the introduced properties reflect different aspects of a network structure. For visualizing this, we changed the bandwidth of the example system depicted in Fig. 1(d). If the bandwidth is sufficient for transmitting all neuron outputs, i.e., B ≥ 20, the system is only restricted by the number of synapses per neuron (leftmost column in Table I). Then, it represents all networks with up to 10 synapses per neuron. This is also reflected in the maximum flow, which equals the number of synapses, and in the amount of configuration memory, which is the logarithm of the number of networks with 20 neurons and 10 synapses per neuron. Note that this value represents those networks with exactly 10 synapses per neuron, because it does not include configuration memory for switching off synapses or neurons. This is because we are interested in the configurability of the routing network. Enable bits for synapses and neurons could be added easily, but would distort the purely routing-based value.

The extracted connection matrix shows a uniform network with probability 0.5 for all connections. An informative measure of this matrix is its entropy, meaning the sum over the entropies of all connections when treated as independent symbols with two states (on/off), (2) with i ∈ {0, 1}. The entropy for the extracted connection matrix is 400 bits, which is thus slightly higher than the 350-bit configuration for the example system. This is because, in the system, connections to a single neuron are not completely independent of each other, as the sum of synapses for this neuron needs to be 10. This example shows that the difference between the entropy of the extracted connection matrix and the configuration amount indicates dependencies between connections. In other words, if the existence of a connection depends on another connection, part of its (existence) information is already contained in the other connection. This reduces the extra amount of information needed to store the existence of the dependent connection, compared to treating both connections as independent. Thus, if the configuration amount in a system is reduced, dependencies between connections increase, even if the pure number of connections and the extracted connection probabilities stay constant. This is reflected in the example system (Table I). Restricting the bandwidth adds dependencies between connections to different neurons, which results in a lower configuration memory. However, this is not reflected in the extracted connection matrix and in the maximum flow value at first. Only if the bandwidth is lower than the number of synapses per neuron (case B = 8), the total number of connections reduces, because there are not enough signals from the channel for feeding each synapse with a different input. This shows that a characterization based on the counting of connections, as is the case for the maximum flow method and the extraction of connection probabilities, may reveal strong structural constraints but fails to detect still-significant dependencies between connections. In contrast, these are well reflected in the configuration memory measure.

Dependencies between connections may be visualized by fixing a part of the connections, as shown in the rightmost column of Table I. This causes additional structure to appear in the extracted connection matrix. In the example, four neuron outputs are guaranteed to be transmitted over the channel at the cost of the other outputs. This results in a higher probability for connections from the four fixed neurons, whereas connections from other neurons suffer from a lower probability. In the extreme case when all 15 inputs to the channel were assigned, all connections from five neurons, i.e., five columns in the matrix, would have probability 0, whereas the others would have a probability of 10/15 = 0.67. Such in-depth investigations may be used for evaluating the impact of lowered configuration amount on the structure of the network connectivity.
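The matrix entropy used in this comparison is simply the sum of per-connection binary entropies. A minimal sketch (assuming the extracted probabilities are available as a matrix) reproduces the 400 bits of the uniform 0.5 matrix quoted above.

    from math import log2

    def matrix_entropy_bits(p_matrix):
        """Entropy E of an extracted connection matrix: every entry is treated
        as an independent on/off symbol, cf. (2) with i in {0, 1}."""
        e = 0.0
        for row in p_matrix:
            for p in row:
                if 0.0 < p < 1.0:
                    e += -(p * log2(p) + (1.0 - p) * log2(1.0 - p))
        return e

    # uniform extracted matrix of the example system (N = 20, p = 0.5)
    uniform = [[0.5] * 20 for _ in range(20)]
    print(matrix_entropy_bits(uniform))   # 400 bits, slightly above the 350-bit configuration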
C. Transformation of Random Graphs

The extended graph description, despite its possibility for configuration, is a deterministic description of a network structure. Therefore, it cannot directly represent random structure elements, often expressed in a model as connection probabilities [15], [18]. Thus, a transformation method for probabilistic connections is required. In the preceding section, we have introduced a method for extracting connection probabilities from an extended graph, based on optimal usage of the configurability in the system. For transforming random elements of a model, we effectively need an inverse method. Then, if the resulting extended graph is an efficient representation of the original model, the extracted connection probabilities should be close to the original values. As a second requirement, the extended graph should provide enough connection resources to account for the random variations inherent in the model. In other words, a realization (graph) of a random network structure should have a high probability to be contained in the corresponding extended graph.

In the following, we use random graphs [19], [20] as starting point for the transformation. These are widely used in network behavior analysis [15], [39], [40]. Thereby, we differentiate between uniform random graphs with constant connection probability p over all possible connections, and nonuniform random graphs, in which the connection probability may vary.

A simple transformation approach is to use a description like that of the example system in Fig. 1(d), but removing the bandwidth restriction and adjusting the number of synapses for each neuron individually. Given this basic structure, the number of synapses Nsyn,j for each neuron j remains to be calculated. For this, we calculate the expected number of neurons K(Nsyn) in the network with a certain number of synapses Nsyn. Rounding each K(Nsyn) to the nearest integer then determines the number of neurons in the extended graph with Nsyn,j = Nsyn synapses. These rounded values may not sum up to the number N of neurons in the network as would be required, i.e., Σ_{i=0}^{N} K(i) = N. In this case, neurons are either added or removed, beginning at the neurons that have the mean number of synapses per neuron. To allow for smooth variation of the total number of synapses in the network, we modify the connection probabilities p_gen,ij with a factor s

    p_gen,ij = p_ij + s · p_ij · (1 − p_ij).    (3)
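For a uniform random graph this construction can be sketched in a few lines. The sketch assumes that each neuron's synapse count follows a binomial distribution with N trials (self-connections included, which matches the expected N²·p synapses used in the later examples); the exact adjustment to N neurons described above is omitted.

    from math import comb

    def expected_neuron_numbers(n, p):
        """Expected number K(Nsyn) of neurons with a given synapse count in a
        uniform random graph with n neurons and connection probability p."""
        return [n * comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

    def synapse_counts_ungrouped(n, p):
        """Per-neuron synapse counts of the ungrouped extended graph: round
        each K(Nsyn) to the nearest integer (exact-N adjustment omitted)."""
        counts = []
        for nsyn, k in enumerate(expected_neuron_numbers(n, p)):
            counts.extend([nsyn] * round(k))
        return counts

    counts = synapse_counts_ungrouped(100, 0.3)
    print(len(counts), sum(counts))   # close to 100 neurons and 100**2 * 0.3 = 3000 synapses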

Having generated an extended graph, the mapping of the neurons in a realization of the given random graph to the neurons in the extended graph needs to be determined. Thereby, the optimal mapping may vary between realizations because of random variations, and thus it cannot be fixed beforehand. An optimal mapping to this problem is obtained by ordering both extended graph and realization neurons by their number of synapses and assigning them on a one-to-one basis. That is, the neuron in the realization with the most synapses is mapped to the extended graph neuron with the most synapses, and so on.

For verifying that the generated extended graph indeed covers the most likely realizations of the random graph, the probability Pmap of a realization to be contained in the extended graph may be derived. For the special case that the synapse count distribution is the same for all neurons, as in uniform random graphs, this value can be calculated analytically with the following procedure. Let n_j, j = 0, 1, ..., N be the number of neurons having j synapses, giving Σ_{j=0}^{N} n_j = N, and 𝒩 be the set of all different (n_j) that are contained in a given extended graph (i.e., in the provided number of synapses per neuron). Because the (n_j) represent disjoint sets of realizations, we may sum them up to get the probability of an arbitrary realization of the random graph to be mappable to an extended graph

    Pmap = P(𝒩) = Σ_{(n_j)∈𝒩} p_0^{n_0} · ... · p_N^{n_N} · N! / (n_0! · ... · n_N!)
         = N! · Σ_{(n_j)∈𝒩} Π_{j=0}^{N} p_j^{n_j} / n_j!    (4)

with p_j = p_x(j) being the probability of a neuron to have exactly j synapses. This sum can be factorized. For any synapse count k, all (n_j) having the same partial sum n_0 + n_1 + · · · + n_k share the same possible combinations of n_{k+1}, ..., n_N. Therefore, one may rewrite (4) as a sum of N products, each consisting of two sums. This makes an iteration over all synapse counts possible and gives a calculation complexity of O(N³).
As stated, this algorithm for analytically calculating Pmap only works if all neurons have the same synapse count distribution. In general, however, exactly calculating Pmap would require iterating over all N! assignments from random graph to extended graph neurons, which is computationally infeasible. As an alternative, we approximate Pmap by generating a large number of realizations and test for each whether it is mappable, comparable to Monte Carlo methods [41].
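A minimal Monte Carlo sketch of this test for uniform random graphs is shown below. It builds the provided synapse counts with the safety factor s of (3) and checks mappability via the sorted one-to-one assignment described above; padding missing entries at the mean count is our simplification of the add/remove step, and the binomial degree model is the same assumption as before.

    import random
    from math import comb

    def provided_counts(n, p, s=0.1):
        """Synapse counts offered by the generated extended graph, with the
        connection probability inflated per (3) as a safety margin (sketch)."""
        p_gen = p + s * p * (1 - p)
        counts = []
        for nsyn in range(n + 1):
            k = round(n * comb(n, nsyn) * p_gen ** nsyn * (1 - p_gen) ** (n - nsyn))
            counts.extend([nsyn] * k)
        while len(counts) < n:
            counts.append(round(n * p_gen))   # pad missing neurons at the mean count
        return counts

    def estimate_pmap(provided, n, p, n_realizations=100):
        """Monte Carlo estimate of Pmap: a realization is mappable if, after
        sorting, every neuron's synapse count fits into a provided count."""
        provided = sorted(provided, reverse=True)
        hits = 0
        for _ in range(n_realizations):
            degrees = sorted((sum(random.random() < p for _ in range(n))
                              for _ in range(n)), reverse=True)
            if all(d <= c for d, c in zip(degrees, provided)):
                hits += 1
        return hits / n_realizations

    # estimate for the N = 100, p = 0.3 example with a modest safety margin (cf. Fig. 4)
    print(estimate_pmap(provided_counts(100, 0.3), 100, 0.3))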
The mapping probability Pmap only reflects the case that a realization is completely contained in the extended graph. Alternatively, it may be tolerable that a small number of synapses in individual realizations are not mappable to the extended graph. Following [42], we call this number synapse loss. The expected (relative) synapse loss L_j can be calculated independently for each neuron as

    L_j = E(S_j > Nsyn,j) / E(S_j) = [ Σ_{n=Nsyn,j+1}^{N} (n − Nsyn,j) · p_j(n) ] / [ Σ_{n=1}^{N} n · p_j(n) ]    (5)

where S_j is the random number of synapses of neuron j, and p_j(n) denotes the probability that neuron j has n synapses. For a global measure L, the values in the numerator and denominator have to be summed separately over all neurons. For the Monte Carlo method, L can be calculated as average over all realizations.
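For a binomially distributed synapse count, (5) can be evaluated directly. The sketch below computes the expected relative loss for one neuron of the N = 100, p = 0.3 example when 30 synapses are provided; the binomial distribution for p_j(n) is our assumption, as before.

    from math import comb

    def expected_synapse_loss(n_provided, n, p):
        """Relative synapse loss L_j of (5) for one neuron whose synapse count
        follows a binomial distribution with n trials and probability p."""
        pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
        lost = sum((k - n_provided) * pmf[k] for k in range(n_provided + 1, n + 1))
        expected = sum(k * pmf[k] for k in range(1, n + 1))
        return lost / expected

    print(expected_synapse_loss(30, 100, 0.3))   # roughly 0.05-0.06 for this example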
Fig. 4(a) shows results for the above generation method. The synapse loss is close to zero at the expected number of synapses (3000), and a sufficient mapping probability can be reached by adding a small number of synapses. This confirms that the introduced transformation method for random graphs results in an efficient extended graph representation. Fig. 4(b) shows that the statistical evaluation leads to well-bounded approximation results already for a moderate number of 100 realizations. Expecting the deviations to decrease with the square root of this number, we choose 10 000 realizations per evaluation for sufficient accuracy. For consistency, we use this method in the rest of this paper.

Fig. 4. (a) Mapping probability Pmap and synapse loss L for generated extended graphs with different safety factors s, reflected in the varying total synapse count. The given uniform random graph has N = 100 neurons and connection probability p = 0.3. The vertical line represents the expected synapse count. (b) Deviation of the Monte Carlo evaluation method from the analytical solution: black line, analytical solution from (5); grey area, region of results for 10000 runs of the Monte Carlo method, each with 100 realizations.

In the extended graphs discussed so far, only one permutation element was assigned to each neuron for choosing the inputs to its synapses. Thus, structure in the network topology is reflected only in the synapse count per neuron. The extracted connection matrix then consists of rows with identical probability values. This is depicted in the left part of Fig. 5. Consequently, an extension of the above transformation method should enable differentiation inside a single row. This can be done by organizing the columns, i.e., the sending neurons, into disjoint groups. Each neuron then has one permutation element per group for choosing its synapses. The right part of Fig. 5 shows such an extended graph with two groups.

Fig. 5. Two possible representations of a random graph and corresponding extracted connection matrices: ungrouped (left) and grouped (right, 2 groups). For clarity, inputs to the individual permutation elements are omitted; permutation elements get their inputs from the vertical lines going through them. Gray values in connection matrices are only for illustrating the effect of grouping on the extracted probabilities.

By grouping, more variability is added to the generation process. The number of synapses to a neuron can be chosen independently for each group. Furthermore, neuron outputs can be freely assigned to groups as long as the group sizes are not exceeded. However, optimization over both degrees of freedom is an infeasible task, because it would have to account for permutations of neurons, which scales exponentially. In the following, we therefore choose the assignment of neurons to groups by hand, following the structure of the given random graph, and only optimize the number of synapses per group and neuron.

The generation method for ungrouped representations as described at the beginning of this section can be adopted to the grouped case by calculating the synapse count distribution for each neuron and group individually. The ungrouped method employed the degree of freedom in the mapping of neurons from a realization of the given random graph to the extended graph. In a grouped representation, however, permutations of neurons over groups would possibly change the synapse count distributions for that group, making it difficult to exploit these permutations in the generation process, because this would require iteration over all possible permutations. In contrast, when allowing permutations inside groups only, synapse count distributions stay constant. With this restriction, the same approach as for ungrouped representations can be followed.

Let us choose a single neuron group and regard its grouped inputs. For each input group, the synapse count distribution can be discretized to yield a sequence of synapse counts for the group. However, it is not straightforward to combine synapse counts of different groups, as required for individual neurons. Iterating over all possible assignments would be infeasible because of exponential scaling, therefore, we use a heuristic solution based on sampling a locally optimized combined distribution. As iteration variable, we use the expected number of neurons n_fit in the random graph that can be mapped to a neuron j with n_k,j inputs in the kth group, i.e., whose numbers of synapses in each group are sufficient to hold the numbers of synapses per group in the random graph neurons

    n_fit = N_i · P(S_1,i ≤ n_1,j) · ... · P(S_K,i ≤ n_K,j)    (6)

where N_i is the number of neurons in the group and S_k,i is the random number of synapses from group k to a neuron in group i. The optimized combined distribution is formed by starting with all n_k,j = 0 and iteratively increasing the synapse counts n_k,j, such that in each step the increase of the value n_fit is maximized. This is assured by choosing in each step the maximum value of the relative probability increase f_S(S_k,j + 1)/P(S_k,j) over all k. During this procedure, each time the value of n_fit crosses an integer value, a neuron with the current values of (n_k,j) is added to the extended graph representation.
For evaluating the resulting representations, we use the same Monte Carlo-like method as for ungrouped representations. Thereby, for determining an optimized assignment from random realization neurons to extended graph neurons, we only consider permutations inside groups and calculate the optimal assignment in each group with a linear assignment problem solver [38], using the synapse loss as cost function.
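Such a per-group assignment can be sketched with a standard linear assignment solver, taking as cost the number of synapses of a realization neuron that would not fit into an extended graph neuron. The use of scipy here is our choice for illustration, not necessarily the solver of [38].

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def assign_group(realization_counts, provided_counts):
        """Assign realization neurons of one group to extended graph neurons
        of the same group, minimizing the total synapse loss."""
        real = np.asarray(realization_counts)[:, None]   # rows: realization neurons
        prov = np.asarray(provided_counts)[None, :]      # columns: extended graph neurons
        cost = np.maximum(real - prov, 0)                # synapses that would not fit
        rows, cols = linear_sum_assignment(cost)
        return cols.tolist(), int(cost[rows, cols].sum())

    mapping, loss = assign_group([3, 7, 5, 4], [4, 4, 6, 8])
    print(mapping, loss)   # e.g. the neuron with 7 synapses gets the slot with 8; loss 0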
Fig. 6(a) shows the number of realized synapses and the synapse loss with respect to the number of groups for the grouped generation method. Neurons in the network were divided into equally sized groups. According to (3), we chose a factor s = 0. As Fig. 6(a) shows, this results in a synapse loss of approx. 1%, decreasing at a higher number of groups. As illustrated in Fig. 6(a) and (b), the number of realized synapses increases with the number of groups, whereas the configuration memory decreases. This is because synapses are only configurable for input signals of one group, which restricts their variability. Each neuron thus has to provide enough synapses in each of its input groups to ensure a faithful representation. The smaller the groups, the more overhead this restriction results in. On the other hand, with less variability in the synapses, the amount of configuration reduces. In the limit case of N groups with one neuron each, every possible connection is realized directly with a synapse and no permutation elements are needed, also reducing synapse loss to zero.

Fig. 6. Evaluation of the grouped generation method for the example random graph of Fig. 4. (a) Synapse count and synapse loss. (b) Configuration memory with respect to the number of groups. The x-axis is in logarithmic scale for clarity, ticks represent used group counts.

As a summary of this section, Table II lists the measures for characterizing the connectivity of the extended graph descriptions.

TABLE II
MEASURES FOR CHARACTERIZING NETWORK CLASSES AND THEIR EXTENDED GRAPH REPRESENTATIONS

    Value                        Represents
    synapse count                available synapses in extended graph
    max. flow                    maximum number of simultaneous connections between neurons and available synapses (equal to synapse count in most networks)
    config. memory I             number of different connectivity graphs between neurons and available synapses
    extracted connection matrix  optimal network structure for extended graph
    matrix entropy E             minimum configuration memory of a network if all connections are independent from each other
    mapping probability Pmap     probability that a realization of a random graph is mappable to an extended graph
    synapse loss L               expected number of synapses of a random graph realization that are not mappable to an extended graph

D. Software Implementation

The extended graph, the algorithms for extracting the properties in Table II and the transformations of random graphs were all implemented in C++ and interfaced to Python for simplified usage. Network structures were defined with a subset of the PyNN language [43]. Hardware systems were described using a hierarchical approach comparable to hardware description languages (HDLs). Thereby, modules were defined that could contain inputs, outputs, other modules, or basic elements (neurons, synapses, P-elements). The module description was realized as a set of Python functions, so that it can be executed to generate the underlying extended graph.
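The module description itself is not given in the paper; the following is a hypothetical sketch of what such executable Python module functions could look like for the example system of Fig. 1(d). All class and function names (Module, add, neuron_block, ...) are illustrative assumptions, not the authors' API.

    class Module:
        def __init__(self, name):
            self.name = name
            self.elements = []      # basic elements: neurons, synapses, P-elements
            self.submodules = []

        def add(self, kind, **params):
            # kind is one of 'neuron', 'synapse', 'perm'
            self.elements.append((kind, params))
            return len(self.elements) - 1

        def add_module(self, module):
            self.submodules.append(module)
            return module

    def neuron_block(bandwidth, n_syn):
        """One neuron: a permutation element selects n_syn of the bandwidth
        channel signals and feeds them into the neuron's synapse element."""
        block = Module("neuron_block")
        perm = block.add("perm", n_in=bandwidth, n_out=n_syn)
        syn = block.add("synapse", n_in=n_syn, source=perm)
        block.add("neuron", source=syn)
        return block

    def example_system(n_neurons=20, n_syn=10, bandwidth=15):
        """Module tree for the example system of Fig. 1(d)."""
        top = Module("example_system")
        top.add("perm", n_in=n_neurons, n_out=bandwidth)   # bandwidth-limited channel
        for _ in range(n_neurons):
            top.add_module(neuron_block(bandwidth, n_syn))
        return top

    print(len(example_system().submodules))   # 20 neuron blocks

Executing such functions yields a nested module tree from which a flat extended graph can be elaborated, analogous to structural elaboration in an HDL.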

III. SCALING OF NN MODELS

For illustrating the transformation of randomness in topologies to the extended graph description in Section II-C, we used uniform random graphs for simplicity. However, these graphs are also employed for understanding the behavior of large-scale NNs [39], [44]. We therefore start our analysis with this kind of network.

Fig. 7(a) shows the progress of synapse count and configuration memory for uniform random graphs with 100 neurons when neurons are divided into equally sized groups as described in conjunction with Fig. 6. Effectively, this parametric plot combines Fig. 6(a) and (b) with the number of groups as the varying parameter. The starting point of the curves corresponds to an ungrouped representation. Its synapse count is close to the expected number of synapses in the corresponding random graph (denoted by circle), in accordance with the results of Fig. 4(a). In contrast, the configuration amount is slightly smaller than the theoretical value, i.e., the entropy of the random graph. This is caused by the fixed number of synapses per neuron in the extended graph, contrasting the randomly distributed synapse count in the random graph and adding slight dependencies between connections, as discussed in conjunction with Table I. All curves in Fig. 7(a) end at a fully connected extended graph. This is caused by the limit case with group size 1. There, permutation elements have a single input and output, thus degenerating to a one-to-one connection and reducing configuration amount to zero. Furthermore, because in a uniform random graph each neuron pair may be connected in principle, an extended graph representation with groups of size 1 has to provide a synapse for each possible connection, leading to the limit of 10^4 synapses in Fig. 7(a).

Fig. 7. Properties of uniform random graphs. (a) Parametric plot of configuration memory over synapse count, varying with group size, for random graphs with 100 neurons and different connection probabilities p; circles denote theoretical values: expected synapse count and entropy of the corresponding random graph. (b) Scaling of synapse count and configuration memory with number of neurons for p = 0.3, for ungrouped and complete (group size 1) representations; in the synapse count plot, circles denote the maximum flow values for the representations.

In Section II-C, it was argued that the configuration memory decreases with the group size, because each synapse chooses its input from a reducing subset of neuron outputs only. Paradoxically, for small connection probabilities [p = 0.1 in Fig. 7(a)] configuration memory increases before tending toward zero. This is because the extended graph representation has to provide additional synapses when groups get smaller to ensure a low synapse loss. Effectively, the number of synapses per group and neuron stays approximately constant if the connection probability is low, mainly because the extended graph has to account for the statistical spread in the random realizations. Consequently, the number of synapses increases almost linearly with the number of groups. At the same time, the configuration per synapse decreases only slightly if the number of groups is small. Overall, this leads to the increasing curves in Fig. 7(a). In contrast, if a neuron provides more synapses in a group than half of the group size, configuration per synapse always reduces more rapidly than the number of synapses increases. Literally speaking, the corresponding permutation element then stores those inputs of a group that do not drive a synapse of the neuron, so that each additional synapse reduces the total configuration.

As expected for uniform random graphs, both synapse count and configuration amount scale quadratically with the number of neurons N. This principal behavior is not changed by the transformation to an extended graph [Fig. 7(b)]. The maximum flow values of the networks [circles in Fig. 7(b)] are identical to their synapse counts, because there are no extra restrictions on the number of simultaneous connections. This is a property of all the transformations employed here, therefore, we omit the flow values for the remainder of this section.

A variety of models create structured network topologies by using partly uniform random graphs. In this approach, populations of neurons are formed, and the connections between two populations have constant probability. The models of Häusler and Maass [15] and Kremkow et al. [16] for area V1 of visual cortex, as well as the model of Lundqvist et al. [18] for cortical macro columns are of this type.

The population structure gives a natural division into groups of neurons. Fig. 8 shows the extracted connection matrices when using this grouping in the transformation to the extended graph. Compared to the original probability matrices, the description resembles the original structure well. Thereby, the two V1 models are very similar, with the Häusler and Maass model having more variations in the interconnection probabilities and more populations that are not directly connected to each other. The Lundqvist et al. model clearly differs in that it has small populations that are only sparsely connected. However, if two populations are connected, the individual connections have a high probability, as can be seen from the absence of light grey areas in the right column of Fig. 8.

Fig. 8. Original and extracted connection probability matrices for population-based network models.

The analysis of the grouped transformations reflects the differences in the models. To arrive at a curve as in Fig. 7(a), we iteratively increased the number of groups by dividing the biggest group into two equal parts. Fig. 9(a) shows the parametric plot of synapse count and configuration with respect to the increasing group count. The Kremkow et al. model is very similar to a uniform random graph. In contrast, the Häusler and Maass model, despite having a higher expected synapse count and a higher entropy (see circles), shows a parametric curve with less synapses and less configuration. This is due to the fact that the model has bigger differences in the probability values. Especially favorable in this context is the higher number of connections that will not be generated at all. Thereby, the number of possible connections, marking the endpoint of the parametric curve, is reduced. This relation is also responsible for the comparatively large differences to the behavior of the Lundqvist et al. model. Although it has a comparable number of expected synapses, its parametric curve steeply decreases to end at a small number of neuron-to-neuron connections that are possible at all in the model. When implementing a network model on a distributed system, neurons have to be divided into groups, which corresponds to choosing a point somewhere in the middle of the parametric curves in Fig. 9(a). Thus, the steep decrease of the parametric curve in the model of Lundqvist et al. corresponds to less implementation effort on such a system compared to the other models, both in terms of configuration and synapse count.

Fig. 9. Properties of population-based network models. (a) Parametric plot of configuration over synapse count for model sizes of approx. 800 neurons, symbols as in Fig. 7(a). (b) Scaling of synapse count (upper plots) and configuration memory (lower plots) with number of neurons in double-logarithmic scale. For scaling, population sizes were increased proportionally, except for the Lundqvist et al. model, where the number of macro-columns was increased, resulting in constant group sizes. For comparison, a uniform random graph with p = 0.05 is included in the plots. Because this graph has only one population, either group size N (grouped) or group size 1 (complete) was chosen for the scaling diagrams.

The differences between the models are also present during scaling, as Fig. 9(b) shows. An especially interesting property of the networks is the difference in synapse count between grouped and complete (group size 1) representation. Whereas this difference is comparatively high for the Kremkow et al. model and for the uniform random graph, it is significantly reduced in the Häusler and Maass model and relatively small in the Lundqvist et al. model. Especially for the V1 models, the Häusler and Maass model has a significantly higher synapse count than the Kremkow et al. model when grouped by population sizes, but a lower synapse count in a complete representation. This reflects the differences depicted in Fig. 9(a) (compare the starting and end points of the parametric curves).

Due to the smaller-sized populations and fewer directly connected populations, also the configuration memory is significantly reduced in the Lundqvist et al. model. However, all models, including the Lundqvist et al. model, scale quadratically in the number of neurons, both for their synapses and for their configuration amount. Thus, from a scaling point of view, the models are equivalent to fully connected and uniform random networks, they only differ in their offsets.

From a biological viewpoint, the number of synapses per neuron is bounded. Thus, a quadratic scaling of expected synapse count with neuron count as in the above models is not plausible on a bigger scale, because it means a linear increase in the number of synapses per neuron. Alternatively, the expected number of synapses per neuron can be held constant. As an example, we do so by relating connection probability inversely proportional to neuron count, as is also done for getting theoretical results in the limit of large N [20]. Fig. 10 shows the scaling results for this case. For an ungrouped representation, scaling is approximately linear for synapse count and configuration memory, as expected. However, when the neurons in the network are divided into groups, complexity increases toward quadratic scaling (see example for group size 10 in the figure). This is because each possible neuron-to-neuron connection may be existent in a random realization in principle. As a consequence, each neuron has to provide synapses for each group of sending neurons. Then, because the number of fixed-size groups scales linearly with neuron count N, the number of synapses per neuron also approaches linear scaling with N, leading to quadratic scaling of overall synapse count with the number of neurons. Thus, one has to be careful when implementing such networks, as partitioning may result in bad scaling of the network.

Fig. 10. Scaling of uniform random graphs with constant expected number of synapses per neuron; connection probability p = 50/N.

As an alternative to uniform and population-based random networks, locally coupled random graphs are used for network analysis, often employing a Gaussian-shaped connection probability with respect to connection distance [40], [42], [45]. This is argued to be more biologically plausible. As an instance of such a topology, we use a quadratic grid of neurons that are connected to each other depending on their Euclidean distance d in the grid as

    p(d) = e^(−d² / (2σ_p²))    (7)

where we chose the width σ_p equal to four grid spacings, according to [40]. Fig. 11(a) shows the corresponding connection matrix, with the neurons sorted row-wise. The local coupling in two dimensions is visible as a set of diagonals. For the transformation to the extended graph, we arranged each row of neurons into one group. This neglects the location of the connections in one row, but retains the dependence on the distance between rows, as the extracted probability matrix shows.

Fig. 11. Properties of locally coupled network with Gaussian-shaped connection probability function. (a) Original and extracted probability matrix for a grid of 16 × 16 neurons. (b) Scaling of synapse count and configuration memory.
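A short sketch of (7) for the row-wise sorted grid, e.g., the 16 × 16 grid of Fig. 11(a), is given below; σ_p is set to four grid spacings as in the text, and self-connection entries (d = 0, p = 1) are included for simplicity.

    import numpy as np

    def gaussian_grid_probabilities(side, sigma_p=4.0):
        """Connection probabilities of (7) for a side x side grid of neurons,
        sorted row-wise; p depends only on the Euclidean grid distance."""
        rows, cols = np.divmod(np.arange(side * side), side)
        d2 = (rows[:, None] - rows[None, :]) ** 2 + (cols[:, None] - cols[None, :]) ** 2
        return np.exp(-d2 / (2.0 * sigma_p ** 2))

    p = gaussian_grid_probabilities(16)    # the 16 x 16 grid of Fig. 11(a)
    print(p.shape, p.sum())                # 256 x 256 entries; their sum is the expected synapse count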

Schemmel et al. CLANN Choi et al.

off-chip
N N
N
(32) (2) (1) P
P
P P P
S S S
32 32 32

64 P P P
1 2 192 192 192 S 1
1 S S S
on-chip

S 2
S
S 2

32
S S S S
192
S

Fig. 13. Extended graph representations of the three neuromorphic hardware systems analyzed in this section. For clarity, some connections are bundled and
drawn as thick lines with number of connections indicated.

CLANN Schemmel et al. Choi et al.


of the connections in one row, but retains the dependence on the distance between rows, as the extracted probability matrix shows.

[Fig. 14 panels: CLANN (4 chips, 128 neurons), Schemmel et al. (2 chips, 768 neurons), and Choi et al. (10 × 10 neurons); color scale: connection probability p from 0 to 1.]

Fig. 14. Extracted probability matrices for the hardware approaches analyzed in this paper.

From a scaling point of view, local coupling is favorable, as Fig. 11(b) shows. The grouped representation reaches linear scaling for bigger network sizes both in terms of synapse count and configuration amount. Theoretically, a complete representation would contain all possible connections, because p is always greater than 0, but if one neglects very small connection probabilities (p ≤ 0.01), scaling of synapse count again becomes linear. This is an advantage compared to uniform random graphs with bounded synapse-per-neuron count, because linear scaling is retained independently of the level of partitioning, i.e., the group size.
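As a quick plausibility check of this linear-scaling argument (our own back-of-the-envelope sketch, not taken from the paper): with σ_p equal to four grid spacings, the probability in (7) drops below 0.01 beyond a fixed distance, so the number of potential partners per neuron is bounded independently of the network size.

```python
import math

# Sketch (assumptions: sigma_p = 4 grid spacings, cutoff p <= 0.01 as in the text).
sigma_p = 4.0
p_cut = 0.01

# Solve p(d) = exp(-d^2 / (2 sigma_p^2)) = p_cut for the cutoff distance d_max
d_max = sigma_p * math.sqrt(2.0 * math.log(1.0 / p_cut))
# Rough upper bound on partners within that radius (area of the disc in grid cells)
partners = math.pi * d_max ** 2

print(f"cutoff distance: {d_max:.1f} grid spacings")          # about 12 spacings
print(f"potential partners per neuron: about {partners:.0f}")  # independent of N
```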
The number of synapses and the amount of configurability are informative measures on how many resources a network model minimally requires when implemented in neuromorphic hardware or software. Thus, these values can be used to categorize models in terms of their connection complexity. As discussed in this section, both the absolute values per network size (e.g., synapses per neuron) and their scaling are of interest. To allow for a scalar scaling value instead of the scaling plots in Figs. 9–11, we assume that a certain network measure X depends on the number of neurons N with a power law, i.e., X = X̄ · N^r. The exponent r then determines the scaling of X with the network size N. If one extracts X for two network sizes N1 and N2, yielding values X1 and X2, this exponent can be calculated as

r = log(X2/X1) / log(N2/N1).    (8)

For the network classification, we used N1 = 500 and N2 = 1500. For each of the networks discussed in this section, we calculated for both network sizes the number of synapses X1S and X2S as well as the amount of configuration X1C and X2C. Using (8), the exponents rS and rC could then be calculated. We plotted these exponents against the absolute values for the small-size network (N1 = 500), X1S and X1C, resulting in the classification diagrams in Fig. 12.

For a broader comparison, we included two models with fixed connections. The first one is a network with nearest neighbour connectivity (four-neighborhood), as used, for example, in image processing tasks [46]. The second one is the so-called HMAX model by Riesenhuber and Poggio [17], which implements hierarchical feature recognition in a feed-forward network (see [24] for details on the implementation of connectivity used here). Both models have a low total synapse count that scales approximately linearly with the neuron count. Only the uniform random graph with constant number of synapses per neuron shows a somewhat similar characteristic. For classification, we have put these three models into a single category concerning the synapse count [see lines in Fig. 12(a)]. The population-based models all fall into the quadratic scaling class. Thereby, the Häusler and Maass model sticks out with a higher synapse count. Even more synapses are implemented in the Gaussian network model, but with an approximately linear scaling. The disadvantageous partitioning properties of the uniform random graph, as discussed in conjunction with Fig. 10, are also reflected in the classification diagram. The connection complexity of the partitioned graph increases both in terms of scaling and absolute synapse count, in contrast to the locally coupled Gaussian network.

Classification by configuration shows partially different results. The population-based models again exhibit quadratic scaling. Thereby, the Kremkow model needs an especially high amount of configuration, falling in the same class as the uniform random graph. Using a constant synapse-per-neuron count, the random graph model again shows approximately linear scaling, which approaches quadratic scaling during partitioning. The Gaussian network exhibits intermediate scaling complexity but low absolute configuration values.
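As an illustration of how (8) condenses a scaling plot into a single number, the following sketch uses the N1 = 500 and N2 = 1500 convention stated above. The example counts are hypothetical values for illustration, not numbers from the paper.

```python
import math

def scaling_exponent(x1: float, x2: float, n1: int = 500, n2: int = 1500) -> float:
    """Scaling exponent r from (8), assuming X = X_bar * N**r."""
    return math.log(x2 / x1) / math.log(n2 / n1)

# Hypothetical measurements X1, X2 at N1 = 500 and N2 = 1500 neurons:
print(scaling_exponent(25_000, 75_000))          # X ~ N        -> r = 1.0
print(scaling_exponent(12_500, 112_500))         # X ~ N^2      -> r = 2.0
print(scaling_exponent(500 * math.log2(500),
                       1500 * math.log2(1500)))  # X ~ N log N  -> r ~ 1.15
```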
Fig. 15. Scaling of neuromorphic hardware approaches with the number of hardware units. For a suitable range and axis scaling in all three plots, system sizes were varied as follows: 1–20 chips for CLANN, 1–10 chips for Schemmel et al., and 10 × 10 to 70 × 70 neurons for Choi et al. For comparison, the scaling of a uniform random graph as in Fig. 9(b) (grouped curve) was included in (b) and (c). (a) Comparison of configuration memory and synapse count. (b) Number of synapses over neuron count. (c) Configuration memory over neuron count.
While the limits of the categories in Fig. 12 are chosen somewhat arbitrarily, the diagrams give an informative overview of network models and facilitate their comparison in terms of connection complexity. Due to the generic nature of the extended graph, these diagrams can also be generated for neuromorphic hardware systems, as we will show in the next section.

IV. SCALING OF NEUROMORPHIC HARDWARE APPROACHES

From the neuromorphic hardware systems available today, we choose three that are representative of different connectivity approaches. These are the system by Schemmel et al. [5], the CLANN system [4], and the system by Choi et al. [1]. From the system descriptions, extended graphs can be derived, which are shown in Fig. 13 and explained in the following. As a general convention, we assume free configurability for connections between chips of each system, but avoid doubling neuron–neuron connections. For that, permutation elements, receiving inputs from all N neurons in the system, are added, one per synapse block of each neuron. Thus, the analysis only covers connectivity restrictions inside a single chip.

The system by Schemmel et al. [5] employs a blockwise organization of neurons and synapses. Thereby, two blocks of 192 neurons with 256 × 192 synapses each were integrated on a chip (with only one block shown in Fig. 13). Synapses in a column are driven by the same signal, so that the synapses of each neuron in a block are driven by the same 256 inputs. All inputs may be configured to receive their spikes from outside the chip. Alternatively, each of the first 192 columns can receive its input from the corresponding neuron in the opposite block.

The CLANN chip, as described in [4], consists of a block of 32 neurons with 64 synapses each. Thereby, 32 synapses of each neuron are hard-wired, receiving input from one of the neurons on the chip, whereas the remaining 32 synapses can be fed individually from outside using an address–event representation (AER) protocol.

A more specialized connectivity is realized in the system by Choi et al. [1]. Their chip is used for image processing and resembles localized filter kernels by hard-wired nearest neighbor connections between neurons in a rectangular grid. Additionally, each neuron has one synapse that can be fed with an arbitrary input via an AER bus. Thus, each neuron has only five synapses altogether owing to its special application.
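The per-chip resources stated above can be tabulated directly. The following sketch is our own summary of the figures quoted in this section, with the neuron counts of the smallest systems in Fig. 14 as a cross-check; the 10 × 10 grid for the Choi et al. chip is only an example size, not the actual chip dimension.

```python
# Sketch: per-chip neuron and synapse counts as quoted in the text above.
chips = {
    # name: (neurons per chip, synapses per neuron)
    "Schemmel et al.": (2 * 192, 256),  # two blocks of 192 neurons, 256 inputs each
    "CLANN":           (32, 64),        # 32 hard-wired + 32 AER-fed synapses
    "Choi et al.":     (10 * 10, 5),    # example 10 x 10 grid, 4 neighbors + 1 AER synapse
}

for name, (neurons, syn_per_neuron) in chips.items():
    print(f"{name}: {neurons} neurons/chip, "
          f"{neurons * syn_per_neuron} synapses/chip")

# Cross-check with the system sizes shown in Fig. 14:
print("CLANN, 4 chips:", 4 * 32, "neurons")              # 128 neurons
print("Schemmel et al., 2 chips:", 2 * 384, "neurons")   # 768 neurons
```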
Fig. 14 shows the extracted connection matrix for each system. For the CLANN system, the hard-wired synapses inside a chip are visible as blocks along the main diagonal; the rest of the connections are fed from outside and thus have uniform probability. In the Schemmel et al. system, the regions with higher probability are caused by the internal connections between opposite blocks; the rest have uniform connectivity from outside. The matrix for the Choi et al. system clearly shows the nearest neighbor connectivity as diagonals, with uniform probability otherwise. Apart from the diagonals, the connection probability is very low, because only one synapse for each neuron is fed from outside the chip.

The CLANN and the Schemmel et al. system resemble each other in the extracted connection matrices. However, they differ in their configurability relative to the synapse count, as Fig. 15(a) shows. The similar amount of configuration per synapse in the CLANN and the Choi et al. system leads to almost equal curves in this diagram, despite both systems greatly differing in their number of synapses per neuron. This is resolved when plotting against neuron count, as can be seen from Fig. 15(b) and (c). Synapse count increases linearly with the number of neurons for all systems [Fig. 15(b)], as expected from the constant number of synapses per neuron. The configuration amount scales as N log(N) with the neuron count N for all systems; however, the Schemmel et al. and Choi et al. chips restrict the configurability more than the CLANN chip.

The expressiveness of the extracted probability matrices in Fig. 14 can be judged by comparing the entropy of the matrix to the configuration amount of the system (Fig. 16 and also the discussion in Section II-B). If both values are similar, as is the case for the CLANN and the Choi et al. system, connection probabilities in the matrix are likely to be independent of each other, so that the probability matrix faithfully reflects the connectivity properties. In contrast, if the actual configuration amount is smaller than the entropy, as for the Schemmel et al. system, connections are correlated, i.e., choosing a single connection changes the probabilities of other connections, as was illustrated in Table I for a partly fixed connection assignment. In such a case, the extracted connection matrix does not fully resemble the underlying constraints on connectivity.
In the Schemmel et al. system, synapses in the same column are strongly correlated, because they are driven by the same input signal (Fig. 13).

[Fig. 16 panels (a)–(c): configuration memory (bit) over number of neurons, with a uniform, p = 0.05 curve for comparison.]

Fig. 16. Entropies of extracted probability matrices for the hardware systems in Fig. 15. Solid lines denote entropies, and dashed lines the configuration amount from Fig. 15(c) as reference. (a) CLANN, (b) Schemmel et al., and (c) Choi et al.

[Fig. 17 panels: (a) exponent rS of synapse count over synapses per neuron; (b) exponent rC of configuration over configuration per synapse (bit).]

Fig. 17. Classification diagrams for the neuromorphic hardware approaches in Fig. 15. (a) Classification by synapse count. (b) Classification by configuration memory. Meaning of the axes is the same as in Fig. 12. System sizes were chosen to be similar to the neuron counts N1 = 500 and N2 = 1500 used for the network model classification [16 (N1) and 47 (N2) chips for CLANN, 1 (N1) and 4 (N2) chips for Schemmel et al., 22 × 22 (N1) and 39 × 39 (N2) neurons for Choi et al.].

From the characterizations carried out in this section, the strengths and weaknesses of the three hardware systems can be identified. The CLANN system has a high amount of configurability, but its number of synapses per neuron is limited; furthermore, it integrates a relatively low number of neurons on a chip, so that bigger networks would require a high number of chips. Nevertheless, the configurability ensures that the system is maximally utilized. The Schemmel et al. system integrates more synapses and neurons on a single chip, but it offers fewer configurations in terms of connectivity. However, as the analysis in Section III shows, less configuration may be compensated by providing a higher number of synapses [Fig. 9(a)], so that the higher integration density makes up for the lower configurability, at the expense of lower synapse utilization. Finally, the Choi et al. system integrates far more neurons on a single chip than the other systems, but this is paid for by the low number of synapses per neuron and the mostly fixed connectivity.

The differences in the systems can be concisely visualized by classification diagrams similar to Fig. 12, which are shown in Fig. 17. All hardware approaches employ a strictly linear scaling of synapse count, because the number of synapses per neuron is fixed in all systems. However, the differences in the absolute synapse-per-neuron values clearly discriminate the systems, as discussed in conjunction with Fig. 15. Concerning configuration memory, the systems show a scaling slightly above linear (rC ≈ 1.2). This is because the number of synapses or inputs to a single chip or neuron is constant (linear scaling) and the number of neurons a single input can choose from increases with system size (logarithmic scaling), leading to a scaling of the order O(N log(N)). Again, the differences in the absolute configuration-per-synapse values are more significant, reflecting the discussion of Fig. 15. These absolute values may seem too low for the Choi et al. and the CLANN system, given that synapses can be configured independently. With the 512 neurons in the small-size CLANN system, one would expect 9-bit configuration per synapse. However, our calculation method takes into account that synapses to a single neuron are interchangeable in their order, which alone reduces the configuration to approx. 5.3 bits per synapse. Additionally, the final value is derived by averaging over all synapses in the system, including the fixed ones. This results in the lower values shown in the diagram.
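The 9-bit and 5.3-bit figures, and the slightly super-linear configuration exponent, can be reproduced with elementary counting. The sketch below is our own check of the numbers quoted above, assuming independent source selection and unordered synapses; it is not the authors' exact calculation method.

```python
import math

# Selecting the source of one synapse among 512 neurons independently:
bits_independent = math.log2(512)                     # 9 bits per synapse

# If the 32 configurable synapses of a neuron are interchangeable in their order,
# only the *set* of 32 sources out of 512 matters:
bits_unordered = math.log2(math.comb(512, 32)) / 32   # about 5.3 bits per synapse
print(bits_independent, round(bits_unordered, 1))

# Configuration scaling of order N*log(N) yields an exponent slightly above 1,
# evaluated here with (8) between N1 = 500 and N2 = 1500 neurons:
r_c = math.log((1500 * math.log2(1500)) / (500 * math.log2(500))) / math.log(3)
print(round(r_c, 2))                                  # about 1.15
```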
Fig. 18 shows combined classification diagrams for network models and neuromorphic hardware. The CLANN and Schemmel et al. systems are sufficient for most models concerning the absolute synapse-per-neuron values, whereas the Choi et al. system, as expected, fits only the nearest neighbor network. However, in terms of synapse scaling, all hardware systems cannot cope with the network models, except for the nearest neighbor and the uniform constant-input network, because all other networks have a scaling exponent rS > 1. Still, a hardware system may be able to implement networks with higher synapse scaling (rS > 1) up to a limited network size N if it provides excess synapses per neuron at a smaller network size. This is shown exemplarily in Fig. 18(a) for the Schemmel et al. system. For a network size of 3000 neurons, the system can, considering the pure number of synapses, implement all investigated network models except for the Häusler/Maass model.

The comparison of configuration memory in Fig. 18(b) shows similar results to those for the synapse count. Hardware systems do not reach the scaling of network models. Moreover, only the CLANN system reaches an absolute configuration-per-synapse value that is sufficient to at least represent a subset of the network models. Because configuration scaling is partly correlated to synapse scaling, enhancing the latter in neuromorphic hardware may overcome both scaling gaps.
[Fig. 18 panels: (a) exponent rS of synapse count over synapses per neuron; (b) exponent rC of configuration over configuration per synapse (bit); entries for the Häusler/Maass, Kremkow et al., uniform p = 0.05, Lundqvist et al., HMAX, NN, uniform p = 50/N, and Gaussian models and for the CLANN, Schemmel et al., and Choi et al. systems.]

Fig. 18. Comparing classification diagrams for network models and neuromorphic hardware systems. (a) Classification by synapse count. (b) Classification by configuration memory. Meaning of the axes is the same as in Fig. 12. The solid line in (a) connects (virtual) systems with different scaling exponent rS that, however, would have the same number of synapses (768 000) as the Schemmel et al. system at a network size of N = 3000.

An approach to enhance the synapse scaling is to combine multiple neurons to arrive at a single new neuron with more synapses. This is, for example, used in the FACETS wafer-scale system [6]. By variably adjusting the synapse-per-neuron count, networks with scaling rS > 1 can be implemented without having excess synapses at smaller network sizes. While such an approach alleviates the synapse constraint imposed by a single chip, it does not resolve the scaling gap, in principle, as the number of synapses per chip stays constant. For example, quadratic synapse scaling would then correspond to a quadratic increase in the number of chips.

Besides the discrepancies between neuromorphic hardware systems and network models, our analyses have shown great differences between the hardware systems. It may be argued that a comparison only in terms of connectivity is unfair, because it neglects other important design considerations, such as the implemented neuron and synapse model, speedup, and robustness, which are determined by the applications the systems were implemented for. However, such an analysis reveals general constraints in terms of connectivity that have to be accounted for in neuromorphic hardware design.

First, the number of synapses and the configuration amount scale approximately linearly in hardware systems, but quadratically in most network models (Fig. 18). In hardware systems such as the above ones, the number of synapses as well as the configurability per synapse cannot be increased after chip implementation, so that linear scaling is a rather hard boundary on neuromorphic hardware systems even when using a variable number of synapses per neuron [6]. On the one hand, sufficient reserves could therefore be foreseen in the hardware design; on the other, quadratic network scaling may not be realistic for bigger network sizes, as discussed in Section III. Thus, network models could be modified to exhibit more compatible scaling properties.

Second, the realization of a network topology offers some degree of flexibility, such that the amount of configurability can be reduced when providing more synapses [Fig. 9(a)]. However, hardware utilization is reduced at the same time, meaning that the percentage of actually used synapses for a single network is decreased.

Third, more flexibility has to be paid for by less integration density in terms of neurons and synapses, as can be seen from the differing number of neurons per chip. Besides differences in synapse circuit size, this might be caused by the additional switches, decoders, and memory needed for a more flexible connectivity. Thus, restricting a hardware design to a certain network topology may greatly increase the feasible integration density. The Choi et al. system somewhat follows this approach, implementing restrictive nearest neighbour connectivity at a high number of neurons per chip.

V. RELATION TO OTHER SCALING MEASURES

In the scaling analyses in Sections III and IV, we have used the total number of synapses and the amount of configurability. How do these values relate to common scaling and complexity measures, such as the degree distribution [19] and Rent's rule?

The total number of synapses Stot is directly connected to the mean degree N̄syn via the number of neurons, i.e., Stot = N̄syn · N. Thus, a degree distribution that is independent of the network size would result in a linear scaling of Stot. Conversely, a quadratic scaling of Stot with the network size corresponds to a linear increase of the mean degree. Apart from the mean, no information about the degree distribution can be extracted from the total connection count. However, for the network models analyzed in this paper, the shape of the distribution can be easily inferred from the model definitions. Uniform random graphs have a binomial degree distribution [19]. Because population-based models consist of several areas with uniform connectivity, their degree distribution is a superposition of binomial distributions. Gaussian-connected networks have a degree distribution similar to uniform random graphs, with a potential extension to a quarter of the mean degree depending on the boundary condition.

All analyzed hardware systems have a constant number of synapses per neuron, resulting in a single peak in the degree distribution. However, as synapses can be switched off in hardware, this distribution only determines the maximum degree, i.e., arbitrary degree distributions with the given degree maximum may be realized. With a more flexible synapse-to-neuron configuration [6], even this restriction could be diminished.

Besides the distribution of degrees, the distribution of connections between groups of neurons has also been used for complexity analysis, expressed in Rent's rule [23], [24], [47], [48]. This power-law relation connects the number of neurons in a certain subpartition of a network with the number
of connections that the partition forms with the surrounding network. Thereby, the exponent of the power law, which is called the Rent exponent r, is used as a measure of the routing complexity of the network. Investigations with Rent's rule are commonly performed with networks of constant size. However, the expected number of connections to/from a partition depends both on the Rent exponent and on the mean degree. Thus, when scaling up a network, two independent measures have to be taken into account for a partition: the scaling of the mean degree, as can be extracted from our analysis, and Rent's rule.

We have analyzed the network models of Section III with Rent's rule in a prior publication [24]. There, we have found a high routing complexity for uniform random and population-based models. In contrast, models with local connectivity had a significantly lower Rent exponent, which is also theoretically predicted by the network definitions: locality measures are commonly calculated in a low-dimensional space, which can be directly related to a maximum Rent exponent [30], [49]. For hardware systems that employ a matrix-like arrangement of synapses, internal all-to-all connectivity is feasible in principle. This corresponds to the highest possible routing complexity (r = 1). However, the connectivity between chips affects the behavior for bigger Rent partitions, so that a different power-law relation is valid for this region, possibly with a lower Rent exponent.

Rent's rule counts the absolute number of connections to/from a partition and thus does not include any dependence on configurability. Still, some qualitative similarities between the amount of configurability and the complexity in Rent's rule can be observed. The more the sources of connections for an arbitrary target neuron may vary (high configurability), the more difficult it is to divide the network such that the resulting partitions are maximally decoupled. At the same time, a better decoupling corresponds to a lower Rent exponent. This principal relation is reflected in the results of [27] and can be illustrated with uniform random graphs. These graphs have maximum configurability (entropy) of all the graphs with a given mean degree. At the same time, analysis with Rent's rule shows maximum routing complexity [24]. In contrast, local connectivity both restricts configurability and the amount of long-range connections, resulting in a lower Rent exponent.
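For readers unfamiliar with Rent's rule, the following sketch estimates a Rent-like exponent for a locally coupled grid network like the one of Section III by counting the connections crossing the boundary of growing square partitions and fitting the power law T ∝ n^r on a log–log scale. This is a simplified, window-based estimate of our own, given purely for illustration; the actual Rent analysis of the models is reported in [24].

```python
import numpy as np

rng = np.random.default_rng(2)
side, sigma_p = 32, 4.0                        # 32 x 32 grid, Gaussian coupling as in (7)

ix, iy = np.meshgrid(np.arange(side), np.arange(side))
coords = np.column_stack([ix.ravel(), iy.ravel()])
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
p = np.exp(-d**2 / (2 * sigma_p**2))
np.fill_diagonal(p, 0.0)
adj = rng.random(p.shape) < p                  # one network realization

sizes, terminals = [], []
for k in range(2, side // 2 + 1):              # square partitions of k x k neurons
    inside = (coords[:, 0] < k) & (coords[:, 1] < k)
    # connections crossing the partition boundary (either direction)
    t = adj[inside][:, ~inside].sum() + adj[~inside][:, inside].sum()
    sizes.append(k * k)
    terminals.append(t)

# Fit T = t0 * n^r  ->  log T = log t0 + r * log n
r, log_t0 = np.polyfit(np.log(sizes), np.log(terminals), 1)
print(f"estimated Rent-like exponent r = {r:.2f}")   # well below 1 for local coupling
```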
VI. CONCLUSION

In this paper, we have introduced an extended graph method for formalizing and characterizing the connectivity of NN models and neuromorphic hardware. This especially allowed us to extract the amount of configurability minimally required for a network topology, a measure that is of great importance to neuromorphic hardware design but has been widely neglected so far. Furthermore, we investigated scaling properties of models and hardware. This allowed us to categorize both in classes of differing connection complexity.

The most prominent difference between neuromorphic hardware systems and most of the analyzed network models is the scaling of connectivity (Fig. 18): uniform random and population-based network models scaled quadratically, whereas hardware systems scaled linearly. This concerns both the number of synapses and the minimally required amount of configurability. Linear scaling in the analyzed systems results from the constant number of synapses that are assigned to a hardware neuron. Most neuromorphic hardware chips are designed like that, including the systems investigated in this paper [1], [4], [5], but also, for example, the FLANN system [4] and the learning chip by Arthur and Boahen [50]. With the simple addition of configurable neuron merger circuits as proposed in [6], the number of synapses per neuron can be flexibly increased, as discussed in Section IV. While this relaxes the synapse-per-neuron count constraint, the number of neurons on a single chip potentially decreases with network size, thus requiring more chips in the system. In other words, the synapse scaling in a given network determines the scaling of the number of chips, thus transferring the scaling constraint from the chip level to the system level. An interesting alternative hardware approach to relax the synapse scaling constraint is the chip implemented by Vogelstein et al. [51]. Instead of implementing multiple hardware synapses per neuron, they use only one synapse circuit per neuron that generates all synaptic input. Thus, an arbitrary number of synapses per neuron could be implemented in principle, while maintaining a constant number of neurons per chip. However, the number of synapses is limited by the bandwidth of the pulse input channel, which could be quite restrictive, because each pulse has to be re-sent multiple times in order to model a decaying synaptic current. Also, synaptic plasticity is calculated off-chip. Thus, all constraints on connectivity are essentially transferred to the surrounding system. In contrast, as stated before, the analysis in this paper assumed infinite connectivity resources off-chip. In a more global investigation of hardware systems, throughput and memory constraints could as well be described and analyzed by our method, assuming the details of the off-chip interconnection network are known. However, as memory, bandwidth, and processing elements are constant once a hardware module has been implemented, scaling stays inevitably linear with the number of modules.

Given the hardware scaling constraints, quadratically scaling networks are very costly to enlarge within neuromorphic hardware. Thus, network models with less demanding and potentially linear scaling are advantageous from an implementation point of view. In our analysis, we have found that networks with localized connectivity, such as nearest neighbor and distance-dependent connection probability models, exhibit synapse and configuration scaling close to linear. Not surprisingly, they have been shown to be scalable in existing neuromorphic hardware designs [1], [42]. As a consequence, also a small-world topology [52], consisting of local connectivity and some global connections, employs a favorable scaling, as long as the global connectivity is sparse, while keeping the network diameter low as argued for brain networks [53].

From a biological point of view, quadratic scaling seems to be feasible for the network sizes we have used in our analysis, as the potential (feasible) connectivity has been found to be all-to-all up to an order of 10^4 neurons (corresponding to a geometrical extension of some 100 μm) [8]. Also, local connectivity measurements assume uniform distribution of the
connections, which is known as Peter's rule [36]. However, bigger networks are again bound to localized connectivity [8], [40]. This could be included into present population-based network models, such as [15] and [16], to make them biologically realistic also on a bigger scale. The model by Lundqvist et al. [18] is already based on a geometrical layout and employs a columnar structure. However, the intercolumnar connections scale quadratically in the model definition of [18], which leads to the overall quadratic results seen in our analysis (Figs. 9 and 12). Lundqvist et al. note that these connections could be thinned out in bigger networks. If this thinning resulted in linear scaling of the intercolumnar connections, the whole network would grow linearly, making the model a good candidate for neuromorphic implementation.

While the extended graph is especially fitted to network topologies with a large amount of configurability like the ones discussed above, it is generally bound to descriptive network definitions. Topologies resulting from generative approaches, e.g., preferential attachment [19], cannot be represented directly. For hardware systems, this is no limitation, because these can be described in a nongenerative way. For completely deterministic network descriptions that result in a single netlist (e.g., nearest neighbor or all-to-all connectivity), the extended graph reduces to a conventional graph, which could also be characterized with existing measures [19]. Consequently, care has to be taken in the network definition and its transformation to the extended graph to include all relevant sources of topological variability. For example, a single realization of a random graph, i.e., a netlist, described with the extended graph, would have no configurability.

Scaling results such as ours can be used to identify and constrain models for biologically realistic connectivity. This requires linking to biological scaling requirements, which has been done for synapse count and distribution [23], [53], [54]. Results of this paper could also be adapted to our analyses. For relating the configuration amount in our method to biological measurements, the work of Stepanyants and Chklovskii [8] points to an important aspect. From the available connection points between axons and dendrites, only a fraction of 10–30% is occupied by synapses. Thus, independent of axon and dendrite growth, biological neural networks take advantage of a significant amount of variability, or configurability, in their connections that can be shaped by plasticity mechanisms [55]. Biologically realistic network models need to take this configurability into account, which could be quantified using the methods introduced in this article. Related to this is the finding that networks formed by brain regions are well described by random graphs with first-order statistics, corresponding to high entropy values [33]. While the brain networks on the single-neuron level may show differing statistics, the results of [8] and [33] indicate that both the global and local neural circuits in nervous systems exhibit a high entropy, corresponding to a relatively low degree of structure. At the same time, geometrical constraints in the brain necessitate a certain amount of structure [24]. Thus, it can be stated that neural circuits constitute a compromise between unspecific (for example, uniform random) and highly structured (e.g., nearest neighbor) networks. From this observation, we argue that large-scale neuromorphic hardware designs need to aim for a similar compromise in order to offer sufficiently powerful processing while allowing for an efficient implementation. The constraint comparisons between hardware and brain performed by Moses et al. [56] may be used to guide this process. Following this approach, the extended graph could be used for generating hardware designs that are fitted to a specific configurable network topology. In a naive approach, circuit realizations of the basic graph elements (neuron, synapse, P-element) could be implemented and connected according to the connections in the extended graph. While such a procedure alone would lead to an inefficient hardware realization, it still opens an avenue to a more automated design of neuromorphic chips.

ACKNOWLEDGMENT

The authors would like to thank C. Mayr for many fruitful comments on the manuscript.

REFERENCES

[1] T. Y. W. Choi, B. E. Shi, and K. A. Boahen, “An ON-OFF orientation selective address event representation image transceiver chip,” IEEE Trans. Circuits Syst. I, vol. 51, no. 2, pp. 342–353, Feb. 2004.
[2] T. J. Koickal, A. Hamilton, T. C. Pearce, S. L. Tan, J. A. Covington, and J. W. Gardner, “Analog VLSI design of an adaptive neuromorphic chip for olfactory systems,” in Proc. IEEE Int. Symp. Circuits Syst., Island of Kos, Greece, May 2006, pp. 4547–4550.
[3] E. Chicca, P. Lichtsteiner, T. Delbruck, G. Indiveri, and R. Douglas, “Modeling orientation selectivity using a neuromorphic multi-chip system,” in Proc. Int. Symp. Circuits Syst., Island of Kos, Greece, 2006, pp. 1–4.
[4] M. Giulioni, “Networks of spiking neurons and plastic synapses: Implementation and control,” Ph.D. dissertation, Facolta Sci. Mate. Fis. Naturali, Università degli studi di Roma Tor Vergata, Rome, Italy, 2008.
[5] J. Schemmel, A. Gruebl, K. Meier, and E. Mueller, “Implementing synaptic plasticity in a VLSI spiking neural network model,” in Proc. Int. Joint Conf. Neural Netw., Vancouver, BC, Canada, Oct. 2006, pp. 1–6.
[6] J. Schemmel, J. Fieres, and K. Meier, “Wafer-scale integration of analog neural networks,” in Proc. Int. Joint Conf. Neural Netw., Hong Kong, Jun. 2008, pp. 431–438.
[7] S. Furber and A. Brown, “Biologically-inspired massively-parallel architectures - computing beyond a million processors,” in Proc. 9th Int. Conf. Appl. Concurrency Syst. Des., Augsburg, Germany, Jul. 2009, pp. 3–12.
[8] A. Stepanyants and D. B. Chklovskii, “Neurogeometry and potential synaptic connectivity,” Trends Neurosci., vol. 28, no. 7, pp. 387–394, Jul. 2005.
[9] J. Bailey and D. Hammerstrom, “Why VLSI implementations of associative VLCNs require connection multiplexing,” in Proc. IEEE Int. Conf. Neural Netw., vol. 2. San Diego, CA, Jul. 1988, pp. 173–180.
[10] E. Culurciello and A. G. Andreou, “A comparative study of access topologies for chip-level address-event communication channels,” IEEE Trans. Neural Netw., vol. 14, no. 5, pp. 1266–1277, Sep. 2003.
[11] J. Navaridas, M. Lujan, J. Miguel-Alonso, L. A. Plana, and S. Furber, “Understanding the interconnection network of SpiNNaker,” in Proc. 23rd Int. Conf. Supercomput., 2009, pp. 286–295.
[12] P. Merolla, J. Arthur, B. Shi, and K. Boahen, “Expandable networks for neuromorphic chips,” IEEE Trans. Circuits Syst. I, vol. 54, no. 2, pp. 301–311, Feb. 2007.
[13] C. Bartolozzi and G. Indiveri, “Selective attention in multi-chip address-event systems,” Sensors, vol. 9, no. 7, pp. 5076–5098, Jun. 2009.
[14] D. Brüderle, “Neuroscientific modeling with a mixed-signal VLSI hardware system,” Ph.D. dissertation, Kirchhoff Inst. Phys., Ruperto-Carola Univ. Heidelberg, Heidelberg, Germany, 2009.
[15] S. Häusler and W. Maass, “A statistical analysis of information-processing properties of lamina-specific cortical microcircuit models,” Cereb. Cortex, vol. 17, no. 1, pp. 149–162, 2007.
[16] J. Kremkow, A. Kumar, S. Rotter, and A. Aertsen, “Emergence of population synchrony in a layered network of the cat visual cortex,” Neurocomputing, vol. 70, nos. 10–12, pp. 2069–2073, Jun. 2007.
[17] M. Riesenhuber and T. Poggio, “Hierarchical models of object recognition in cortex,” Nature Neurosci., vol. 2, no. 11, pp. 1019–1025, Nov. 1999.
[18] M. Lundqvist, M. Rehn, M. Djurfeldt, and A. Lansner, “Attractor dynamics in a modular network model of neocortex,” Netw.: Comput. Neural Syst., vol. 17, no. 3, pp. 253–276, 2006.
[19] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and D.-U. Hwang, “Complex networks: Structure and dynamics,” Phys. Rep., vol. 424, nos. 4–5, pp. 175–308, Feb. 2006.
[20] M. E. J. Newman, “The structure and function of complex networks,” SIAM Rev., vol. 45, no. 2, pp. 167–256, 2003.
[21] J. Wang and G. Provan, “Topological analysis of specific spatial complex networks,” Adv. Complex Syst., vol. 12, no. 1, pp. 45–71, 2009.
[22] D. B. Chklovskii, “Exact solution for the optimal neuronal layout problem,” Neural Comput., vol. 16, no. 10, pp. 2067–2078, Oct. 2004.
[23] V. Beiu and W. Ibrahim, “Does the brain really outperform Rent’s rule?” in Proc. ISCAS, 2008, pp. 640–643.
[24] J. Partzsch and R. Schüffny, “On the routing complexity of neural network models - Rent’s rule revisited,” in Proc. Eur. Symp. Artif. Neural Netw., Bruges, Belgium, Apr. 2009, pp. 595–600.
[25] G. Bianconi, “The entropy of randomized network ensembles,” Europhys. Lett., vol. 81, no. 2, pp. 28005-1–28005-6, Jan. 2008.
[26] J. Kim and T. Wilhelm, “What is a complex graph?” Phys. A, vol. 387, no. 11, pp. 2637–2652, 2008.
[27] W. Feng and J. Greene, “Post-placement interconnect entropy: How many configuration bits does a programmable logic device need?” in Proc. Int. Workshop Syst.-Level Interconn. Predict., Munich, Germany, Mar. 2006, pp. 41–48.
[28] W. Donath, “Placement and average interconnection lengths of computer logic,” IEEE Trans. Circuits Syst., vol. 26, no. 4, pp. 272–277, Apr. 1979.
[29] M. Djurfeldt, “Large-scale simulation of neuronal systems,” Ph.D. dissertation, School Comput. Syst. Sci., KTH Royal Inst. Technol., Stockholm, Sweden, 2009.
[30] D. Bassett, D. Greenfield, A. Meyer-Lindenberg, D. Weinberger, S. Moore, and E. Bullmore, “Efficient physical embedding of topologically complex information processing networks in brains and computer circuits,” PLoS Comput. Biol., vol. 6, no. 4, pp. e1000748-1–e1000748-14, Apr. 2010.
[31] B. Hendrickson and T. G. Kolda, “Graph partitioning models for parallel computing,” Parallel Comput., vol. 26, no. 12, pp. 1519–1534, Nov. 2000.
[32] S. Isoda, Y. Kobayashi, and T. Ishida, “Global compaction of horizontal microprograms based on the generalized data dependency graph,” IEEE Trans. Comput., vol. 32, no. 10, pp. 922–933, Oct. 1983.
[33] J. Wang and G. Provan, “Characterizing the structural complexity of real-world complex networks,” Lect. Notes Comput. Sci., vol. 4, no. 1, pp. 1178–1189, 2009.
[34] A. V. Goldberg and R. E. Tarjan, “A new approach to the maximum-flow problem,” J. Assoc. Comput. Mach., vol. 35, no. 4, pp. 921–940, Oct. 1988.
[35] G. Gallo, M. Grigoriadis, and R. Tarjan, “A fast parametric maximum flow algorithm and applications,” SIAM J. Comput., vol. 18, no. 1, pp. 30–55, 1989.
[36] T. Binzegger, R. Douglas, and K. Martin, “A quantitative map of the circuit of cat primary visual cortex,” J. Neurosci., vol. 24, no. 39, pp. 8441–8453, Sep. 2004.
[37] Y. He, Z. Chen, and A. Evans, “Small-world anatomical networks in the human brain revealed by cortical thickness from MRI,” Cereb. Cortex, vol. 17, no. 10, pp. 2407–2419, Jan. 2007.
[38] R. Jonker and A. Volgenant, “A shortest augmenting path algorithm for dense and sparse linear assignment problems,” Computing, vol. 38, no. 4, pp. 325–340, 1987.
[39] N. Brunel, “Dynamics of sparsely connected networks of excitatory and inhibitory spiking neurons,” J. Comput. Neurosci., vol. 8, no. 3, pp. 183–208, 2000.
[40] C. Mehring, U. Hehl, M. Kubo, M. Diesmann, and A. Aertsen, “Activity dynamics and propagation of synchronous spiking in locally connected random networks,” Biol. Cybern., vol. 88, no. 5, pp. 395–408, 2003.
[41] N. Metropolis and S. Ulam, “The Monte Carlo method,” J. Amer. Stat. Assoc., vol. 44, no. 247, pp. 335–341, Sep. 1949.
[42] J. Fieres, J. Schemmel, and K. Meier, “Realizing biological spiking network models in a configurable wafer-scale hardware system,” in Proc. Int. Joint Conf. Neural Netw., Hong Kong, Jun. 2008, pp. 969–976.
[43] A. Davison, D. Brüderle, J. Eppler, J. Kremkow, E. Mueller, D. Pecevski, L. Perrinet, and P. Yger, “PyNN: A common interface for neuronal network simulators,” Front. Neuroinf., vol. 2, no. 11, pp. 1–10, 2009.
[44] C. Börgers and N. Kopell, “Synchronization in networks of excitatory and inhibitory neurons with sparse, random connectivity,” Neural Comput., vol. 15, no. 3, pp. 509–538, Mar. 2003.
[45] W. Maass, T. Natschläger, and H. Markram, “Fading memory and kernel properties of generic cortical microcircuit models,” J. Physiol., vol. 98, nos. 4–6, pp. 315–330, 2004.
[46] C. Mayr and R. Schüffny, “Neighborhood rank order coding for robust texture analysis and feature extraction,” in Proc. 7th Int. Conf. Hyb. Intell. Syst., Kaiserslautern, Germany, Sep. 2007, pp. 290–301.
[47] M. A. Sivilotti, “Wiring considerations in analog VLSI systems, with application to field-programmable networks,” Ph.D. dissertation, Dept. Eng. Appl. Sci., CalTech, Pasadena, 1991.
[48] P. Christie and D. Stroobandt, “The interpretation and application of Rent’s rule,” IEEE Trans. Very Large Scale Integr. Syst., vol. 8, no. 6, pp. 639–648, Dec. 2000.
[49] L. Hagen, A. Kahng, J. Fadi, and C. Ramachandran, “On the intrinsic Rent parameter and spectra-based partitioning methodologies,” IEEE Trans. Comput.-Aid. Integr. Circuits Syst., vol. 13, no. 1, pp. 27–37, Jan. 1994.
[50] J. Arthur and K. Boahen, “Learning in silicon: Timing is everything,” in Proc. Adv. Neural Inf. Process. Syst., 2006, pp. 1–8.
[51] R. J. Vogelstein, U. Mallik, J. T. Vogelstein, and G. Cauwenberghs, “Dynamically reconfigurable silicon array of spiking neurons with conductance-based synapses,” IEEE Trans. Neural Netw., vol. 18, no. 1, pp. 253–265, Jan. 2007.
[52] D. Watts and S. Strogatz, “Collective dynamics of ‘small-world’ networks,” Nature, vol. 393, pp. 440–442, Jun. 1998.
[53] M. Changizi, “Scaling the brain and its connections,” in Evolution of Nervous Systems, J. Kaas, Ed. Amsterdam, The Netherlands: Elsevier, 2007.
[54] V. Braitenberg, “Brain size and number of neurons: An exercise in synthetic neuroanatomy,” J. Comput. Neurosci., vol. 10, no. 1, pp. 71–77, 2001.
[55] D. Feldman, “Synaptic mechanisms for plasticity in neocortex,” Annu. Rev. Neurosci., vol. 32, pp. 33–55, Jul. 2009.
[56] M. Moses, S. Forrest, A. Davis, M. Lodder, and J. Brown, “Scaling theory for information networks,” J. Royal Soc. Interf., vol. 5, no. 29, pp. 1469–1480, Dec. 2008.

Johannes Partzsch received the M.Sc. degree in electrical engineering from the University of Technology Dresden, Saxony, Germany, in 2007. He is currently pursuing the Ph.D. degree in optimized architectures for large-scale neuromorphic circuits at the Chair for Parallel VLSI Systems and Neural Circuits, University of Technology Dresden. His current research interests include bio-inspired circuits, topological analysis of neural networks, and modeling of synaptic plasticity.

René Schüffny received the Dr.Ing (Ph.D.) and Dr.Ing.habil. (D.Sc.) degrees from the University of Technology Dresden, Saxony, Germany, in 1976 and 1983, respectively. He has been with the endowed Chair for Parallel VLSI Systems and Neural Circuits, University of Technology Dresden, since April 1997. He is author or co-author of numerous publications in the above field and has acted as a reviewer for several international journals. His current research interests include complementary metal-oxide-semiconductor image sensors and vision chips, design and modeling of analog and digital parallel very large scale integrated architectures, and neural networks.
