
Article

https://doi.org/10.1038/s41586-020-2735-5

Supplementary information

Third-order nanocircuit elements for neuromorphic engineering
In the format provided by the
authors and unedited

Nature | www.nature.com/nature
Third-Order Nano-Circuit Elements for Neuromorphic Engineering

Suhas Kumar1, R. Stanley Williams2 and Ziwen Wang3


1 Hewlett Packard Labs, Palo Alto, CA, USA
2 Texas A&M University, College Station, TX, USA
3 Stanford University, Stanford, CA, USA

Supplementary Information
Contents:
1. Specificity of material composition to neuromorphic behaviour
2. Additional details of the device structure and measurement schemes
2.1. Fabrication of the third-order element
2.2. Measurement details
3. Details on transmission electron microscopy
4. Details on in-operando x-ray spectroscopy
5. Extended experimental data
5.1. Chaos in beats
5.2. Identifying additional neuromorphic properties
5.3. Energy required to generate spikes and comparison to transistor-based neurons
5.4. Comments on the origins of the different dynamical functions
6. Details on performance of devices
6.1. The range of parameters enabling neuromorphic behaviours
6.2. Variability, yield and stability
7. Additional details on the compact model
7.1. A discussion on the order of complexity of circuit elements
7.2. Representation of the Mott transition by using a parallel metallic conduction
7.3. Alternate representations of Equation 5
7.4. Physical insights and comparing to biological neurons
7.5. A two-state-variable model cannot produce neuromorphic behaviour
7.6. An alternate model, illustrating generality
8. Comparison to prior literature
9. Details on the experimental demonstration of analogue computing
9.1. Background on solving optimization problems with coupled oscillators
9.2. Construction and operation of the coupled oscillator system
9.3. Performance evaluation

1. Specificity of material composition to neuromorphic behaviour
Negative differential resistance (NDR) is one of the signatures of local activity, the ability of a system to amplify small
fluctuations in its state (i.e., it can act as an amplifier). [Mainzer & Chua, Local Activity Principle, ICP (2013); IEEE
Trans. Cir. Sys. 65, 1165 (2015)] In materials such as NbO2, the absence of NDR implies passivity. [IEEE Trans. Cir.
Sys. 65, 1165 (2015)] The regions of local activity and passivity are disjoint and contiguous sets in a parameter space.
A subset of the region of local activity is the edge of chaos, which can be broadly described as a state of impending
transient amplification of energy. A further subset of the edge of chaos is the sharp edge of chaos, which
contains instabilities in addition to impending energy amplification. A comprehensive mathematical theory of these
ideas has been developed by Chua. [Mainzer & Chua, Local Activity Principle, ICP (2013); Int. J. Bif. Chaos 15, 3435
(2015)] Chua further proved that a neuromorphic action potential requires operation near the edge of chaos (within the
region of local activity), [Int. J. Bif. Chaos 22, 1250098, (2012)] while periodic oscillations require local activity but
not the edge of chaos. [IEEE Trans. Cir. Sys. 65, 1165 (2015)]
The foregoing discussion exhibits remarkable agreement with our experimental data on regions of different dynamical
properties within a parameter space (Figure S1). It is apparent that within the parameter space in Figure S1a
(reproduction of Figure 1h), the neuromorphic dynamics (Mott dynamics) is a subset of the static Mott behaviour, which
is further a subset of NDR (local activity). Additionally, local activity is disjoint and nominally contiguous with the ‘no
NDR’ (passive) region. Figure S1b is a summary of a part of Chua’s theory of local activity that closely resembles our
experimental data. Although there have been numerous observations of NDR in the past, it is possible that the parameter
space within which a search had to be conducted in order to identify regions of additional activity (e.g., neuromorphic
dynamics) was never recognized. Introduction of additional electro-physical processes and tuning of the stoichiometry
in this work enabled us to sweep the space of local activity with greater resolution, which resulted in the discovery of
additional dynamical behaviours.

[Figure S1 panels: (a) This work: regions labelled No NDR, NDR, Static Mott and Mott dynamics (neuronal activity), plotted against x in NbOx (1.0 to 2.5). (b) Chua’s theory of local activity: regions labelled Passive (No NDR), Locally active (NDR), Edge of chaos (stable) and Sharp edge of chaos, in the plane of Parameter 1 and Parameter 2.]

Figure S1: Experimental results compared to Chua’s prediction. (a) Reproduction of experimental data from Figure 1h. (b)
Illustration of Chua’s theory of local activity, redrawn from [Int. J. Bif. Chaos 15, 3435 (2015)]

2. Additional details of the device structure and measurement schemes

2.1. Fabrication of the third-order element

The third-order elements were fabricated on 12-inch Si wafers using standard manufacturing foundry processes. First, 240
nm of W was deposited atop fresh Si wafers using a physical vapor deposition process. The W layer was then capped with
20 nm of SiNx, followed by 900 nm of SiO2, both deposited via plasma-enhanced chemical vapor deposition.
The SiNx layer prevented oxidation of W from the oxygen in
SiO2. Extreme ultraviolet lithography was used to define via regions on the SiO2 layer ranging from 2 µm down to 50
nm in diameter. The via regions were defined as squares, which came out circular for sizes <100 nm. Dry reactive ion
etching was performed to etch the material (SiO2 and SiNx) in the lithographically defined via regions, to create via
holes down to the W layer. The walls of the via holes were coated with a thin conformal layer of SiNx, which was then
etched using directed ion etching to remove the SiNx layer coating the bottom W layer, thereby leaving SiNx on the side-
walls of the via holes. The via holes were then filled with TiN via plasma-enhanced chemical vapor deposition, such
that the TiN made contact with the underlying W, and was spaced from the SiO2 by the SiNx layer. The resulting
structure was planarized using chemical-mechanical polishing. The aforementioned process was followed by a back-
end-of-the-line deposition of a blanket layer of NbOx using reactive sputtering, followed by lithographically defined
reactive sputtering of TiN and evaporation of Pt. An additional resistor in series (Rint) was fabricated by deposition of
5 nm of Ti, followed by breaking of vacuum and evaporation of 20 nm of Pt. The atmospheric oxidation of the specific
thickness of Ti led to the desired resistance.
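
[Figure S2 panels: (a)-(c) micrographs. (d) Wafer schematic, labelled 12 in. (e) Electrode array schematic. (f) Layer stack labelled Pt, TiN, NbO2, SiO2, TiN, SiNx and W.]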

Figure S2: Chip layout and device structure. (a) Optical micrograph of the chip used. Scale bar is 2 mm. (b) Scanning electron
micrograph of the bottom electrode array before the back-end-of-the-line processing. Scale bar is 6 µm. (c) Magnified scanning
electron micrograph of the smaller electrodes. Scale bar is 1 µm. (d-e) Schematic layouts of the wafer and the electrode array. (f)
Schematic structure of a single circuit element.

Sputter deposition of NbOx was performed using the following recipe:
Step 1: Ar cleaning: Using 50 W of RF power in a pressure of 3 mTorr, with Ar flow at 30 sccm and O2 flow at 6
sccm for 240 s at a base pressure between 1×10⁻⁷ and 3×10⁻⁷ Torr.
Step 2: Using a target of Nb2O5 with an RF power of 150 W at a pressure of 3 mTorr, with Ar flow at 30 sccm and O2
flow at 0 - 10 sccm for 200 - 900 s (with a pre-sputter time of 240 s) at a base pressure between 1×10⁻⁷ and 5×10⁻⁷ Torr.
Some of the films with a higher oxygen composition were deposited (following an Ar cleaning step) using a target of
Nb with a DC power of 150 W, with Ar flow at 15 - 20 sccm and O2 flow at 15 - 30 sccm for 1000 - 4000 s (with a
pre-sputter time of 120 s) at a base pressure between 3×10⁻⁸ and 2×10⁻⁷ Torr.
These films were incorporated into electrical devices either in a crossbar structure or in a nanovia structure. The
stoichiometry of the films was established via x-ray photoemission spectroscopy (XPS) (solid circles in Figure 1h) or
electron energy loss spectroscopy (EELS) in a transmission electron microscope (solid stars in Figure 1h). The solid
stars in Figure 1h correspond to data points on specific devices that were electrically measured prior to undergoing a
destructive EELS measurement. The XPS measurements were performed close to the devices that were electrically
measured, and it was confirmed that the composition of the film was uniform (within measurement error) throughout
the wafer.

2.2. Measurement details

Quasi-static current-voltage curves were measured using an Agilent B1500 parameter analyzer and a Cascade probe
station. The parameter analyzer was controlled through a General-Purpose Interface Bus (GPIB) using software
programs compiled in Igor. The dynamical current was measured using a LeCroy 820Zi oscilloscope and/or an Agilent
B1500 parameter analyzer, and the circuit element was probed using a Cascade probe station. The DC input voltages
were provided using software control of the Agilent B1500 parameter analyzer. The contact/interface resistance of the
electrodes contacting the NbOx layer (70-350 Ω, depending on the device’s structure/size) was measured in elements
in which the NbOx layer was purposefully omitted during the fabrication process, so that the resistance between the
electrodes was mostly the contact resistance. This interface resistance did not include the intentionally fabricated
internal resistance (Rint). While the cables and probing techniques may have introduced measurable parasitics, we did
not consider them in our simulations.

Although the static im-vm curve had a positive slope for current levels greater than 0.3 mA, so that the circuit would
not be expected to oscillate there, oscillations were observed in this range (Figure 2). The most reasonable explanation
is a series resistance of 100-200 Ω in the electrodes: when this is subtracted from the measurement,
the resulting ‘intrinsic’ im-vm curve has a negative slope over the entire range of bias points examined (Figure S3).
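
The correction described above amounts to subtracting the voltage dropped across the series resistance from each measured point. A minimal sketch follows, assuming a synthetic current-controlled curve: the functional form and all numerical values other than the 150 Ω used in Figure S3 are illustrative assumptions rather than measured data, and the function names are hypothetical.

```python
import numpy as np

def subtract_series_resistance(i_m, v_m, r_series):
    """Return the voltage across the intrinsic element after removing the i*R drop."""
    return np.asarray(v_m) - np.asarray(i_m) * r_series

def differential_resistance(i_m, v):
    """Numerical dv/di along a current-controlled curve."""
    return np.gradient(v, i_m)

# Synthetic example (assumed shape, for illustration only).
i_m = np.linspace(0.1e-3, 0.9e-3, 81)        # current, A
v_intrinsic = 0.4 / (1.0 + i_m / 0.3e-3)     # toy curve with NDR everywhere, V
r_series = 150.0                             # series resistance, ohm
v_measured = v_intrinsic + i_m * r_series    # what the instrument would record

v_corrected = subtract_series_resistance(i_m, v_measured, r_series)
slope_measured = differential_resistance(i_m, v_measured)
slope_corrected = differential_resistance(i_m, v_corrected)
```

In this toy case the measured slope turns positive at the highest currents, while the corrected slope stays negative throughout, which is the qualitative situation the text describes.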

[Figure S3 panels: (a) im (mA) versus vm (V), showing the measured curve and the curve with 150 Ω subtracted. (b) iRint (mA) versus vRint (V) for Rint.]

Figure S3: Subtraction of the interface resistance. (a) Experimentally measured quasi-static current-voltage curve, and the
same curve with 150 Ω of series resistance subtracted. Most parts of the corrected curve contain NDR, consistent with oscillations
occurring throughout the quasi-DC curve. (b) Quasi-static current-voltage curve measured on Rint, demonstrating that it is a linear
resistor.

The circuit elements’ construction included an internal series resistor (Rint) that could either be used or bypassed by
contacting either of two different pads via probe tips (Figure 1b). This allowed measurements of only the memristor-
capacitor element, which were effectively measurements of only the memristors when performed in quasi-static mode
(Figures 1d, S4a). For some of the dynamical measurements, we utilized the internal resistor to create a load-line that
could bias the memristor at different points on the quasi-static im-vm curve depending on the applied external voltage
(Figures 2b-c, 4, S4b). In the aforementioned measurements, we utilized only the internal capacitance (Cint) as the
capacitor required to create oscillatory behaviour. For these cases, in Equations 1-5, C = Cint and RS = Rint. For the
remaining dynamical measurements, we bypassed the internal resistor and included an external resistor (Rext) and an
external capacitor (Cp). The external capacitor was typically larger than the internal one in order to slow down the
dynamics of the system, which allowed us to record the temporal dynamics with better resolution (Figures 2d, 3, S4c).
For these cases, in Equations 1-5, C = Cp (Cp >> Cint such that the total capacitance is dominated by Cp) and RS = Rext.
Approximately, Rint = 2 kΩ and Cint = 100 pF.
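
As a rough guide to how the choice of C and RS sets the bias point and the time scale, the sketch below evaluates the standard series load line, i = (vext - v)/RS, and a first-order RC time constant. Treating RS·C as the relevant time scale is a simplifying assumption made here (the full dynamics follow Equations 1-5 of the main text), and the external component values and the bias voltages are hypothetical.

```python
def load_line_current(v_ext, v_m, r_s):
    """Current through a series resistor r_s when v_ext is applied and v_m drops across the element."""
    return (v_ext - v_m) / r_s

def rc_time_constant(r_s, c):
    """First-order time-scale estimate, in seconds."""
    return r_s * c

R_INT = 2e3       # ohm, approximate value quoted in the text
C_INT = 100e-12   # F, approximate value quoted in the text
R_EXT = 2e3       # ohm, hypothetical external resistor
C_P = 10e-9       # F, hypothetical external capacitor (C_P >> C_INT)

tau_internal = rc_time_constant(R_INT, C_INT)   # about 200 ns
tau_external = rc_time_constant(R_EXT, C_P)
slowdown = tau_external / tau_internal          # factor by which the dynamics slow down
i_bias = load_line_current(v_ext=1.5, v_m=0.5, r_s=R_INT)  # hypothetical bias point
```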

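[Figure S4 panels: circuit schematics (a), (b) and (c). Each shows the integrated component, containing RNbO2(T, iNbO2), Cint and Rint, with the voltages vm and vdev marked. (a) Driven by a current source iext, with im = iext. (b) Driven by a voltage source vext. (c) Driven by vext through an external resistor Rext, with an external capacitor Cp and im = iRs.]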

Figure S4: Measurement schemes on the neuromorphic element. (a) Bypassing of the internal resistor (Rint) for measuring the
im-vm curve. (b) Utilizing the internal resistor and the internal capacitor to generate dynamics. (c) Inclusion of an external resistor
and an external capacitor, and bypassing of the internal resistor, to generate slower dynamics.

3. Details on transmission electron microscopy


The as-deposited niobium dioxide layer was slightly over-oxidized (NbO2+δ) and amorphous, as detailed elsewhere. [Sci.
Rep. 6, 34294 (2016); Nature Comm. 8, 658 (2017)] We performed cross sectional transmission electron microscopy
on a circuit element that was operated once past the hysteresis, which was therefore expected to have experienced
temperatures of >1000 K (from temperature calibration via conduction mechanisms). [For a description of the conduction
mechanism, studied on devices similar to the ones here, see Appl. Phys. Lett. 108, 023505 (2016)] We used a Titan Themis
300 scanning transmission electron microscope to perform these studies, including electron energy loss spectroscopy
and electron diffraction studies on our samples. Via high-resolution transmission electron micrographs, we identified a
sub-10 nm crystalline region in the monoclinic structure with [001] orientation (Figure S5). This crystal structure was
specific to NbO2. [Sci. Rep. 6, 34294 (2016)] Further, the regions immediately surrounding the crystallized region
remained amorphous (similar to the rest of the film) and were also more highly oxidized compared to the rest of the film
(NbO2+δ+ε). This indicated that oxygen from the as-grown NbO2+δ was expelled into the surrounding region to create a
localized region of NbO2, followed by crystallization of the NbO2. NbO2 is expected to crystallize at ~1000 K, which
was close to the local temperatures the circuit element was expected to have experienced. Therefore, the crystallization
likely happened due to the Joule heating during initial operation being localized to a sub-10 nm region, thereby causing
ionic motion to create a thermodynamically lower energy configuration (both chemically and structurally). This resulted
in the formation of a permanent conduction channel, which defined the effective device area.

One natural question that follows the above study is whether a pristine device contains crystallized regions of any
stoichiometry. To resolve this, we studied pristine devices using transmission electron microscopy, along with electron
diffraction patterns of selected regions, which showed that all pristine devices contained an amorphous active material
(NbO2+δ) (Figure S6). Further, when we electrically operated a previously pristine device (using a current source), we
were able to observe NDR, where we were careful to not supply a large enough current to cause the box-shaped
hysteresis. In all such devices, we were able to observe NDR (Figure S6d), but there was no crystallization or change
in stoichiometry within the NbO2+δ layer. This indicated that pristine devices were amorphous in the as-grown state,
and that NDR can be observed in amorphous films of NbO2+δ, without specificity to stoichiometry (i.e.: NDR was
observed for a range of δ, as shown in Figure 1h in the main text).

A question that follows the above results is whether the localized crystalline region of NbO2, seen in Figure S5, indeed
was responsible for the box-shaped hysteresis. In other words, could the box-shaped hysteresis rely on the amorphous
NbO2+δ+ε surrounding the crystallized NbO2 region? This is an important question to answer because it will provide
further confirmation (or contradiction) of the claim that the box-shaped hysteresis is caused by a Mott transition in NbO2,
and is specific to the stoichiometry (i.e.: is seen only when a part of the active region is NbO2). To resolve this, we
supplied relatively large currents (~1.5 mA) into a relatively smaller device (with nominal radius rdev = 25 nm), which
resulted in the expected NDR and box-shaped hysteresis (Figure S7a). We expected this process to have caused a
significantly large increase in local temperature, leading to crystallization throughout the NbO2+δ region. A transmission
electron micrograph of this device indeed confirmed crystallization in all the regions between the electrodes that were
measured via electron diffraction (Figures S7b-d). Furthermore, via EELS analysis, we confirmed that the entire region
between the electrodes was in the NbO2 stoichiometry, which even extended beyond the edges of the region between
the electrodes. The fact that this device exhibited the box-shaped hysteresis proves that the electrical behaviour was
caused purely by the crystallized NbO2 region and did not require the presence of any other stoichiometry. Thus, it is
possible to engineer the electrical behaviour observed here in smaller sized devices, across large arrays of many devices,
by controlling the stoichiometry of the active region, which is a scalable route to construction of large-scale arrays of
devices.

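[Figure S5 panels: (a) cross-sectional micrograph with layers labelled Pt, TiN, NbO2+δ, SiO2, SiNx and TiN. (b) EELS map with legend entries including Ti, Nb, Pt and Si. (c) High-resolution micrograph with regions labelled TiN, NbO2, NbO2+δ+ε, NbO2+δ, TiN and SiO2, and a diffraction inset labelled [001]. Scale bars of 20 nm and 10 nm appear in the panels.]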

Figure S5: Transmission electron microscopy, EELS and electron diffraction on an operated device. (a) Cross sectional
transmission electron micrograph of the circuit element. (b) Electron energy loss spectral (EELS) mapping of different elements
(colour-coded with the legend) within the dashed yellow rectangle in (a). (c) High-resolution transmission electron micrograph
within the green rectangle in (a). Locally crystallized region of NbO2 is marked with a white rectangle. Inset is an electron
diffraction pattern of the crystallized NbO2 region, indicating a monoclinic structure in the [001] orientation.

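[Figure S6 panels: (a) micrograph with layers labelled TiN, NbO2+δ, TiN and SiO2. (b) High-resolution micrograph. (c) Electron diffraction pattern. (d) Current (mA) versus Voltage (V). Scale bars of 40 nm and 25 nm appear in the panels.]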

Figure S6: Transmission electron microscopy and electron diffraction on a pristine device. (a) Transmission electron
micrograph of a pristine device. (b) High-resolution transmission electron micrograph within the green rectangle in (a). (c) An
electron diffraction pattern of the NbO2+δ region in (b), confirming the amorphous phase of the material. (d) A current-voltage
curve measured using a current source on a previously pristine device, which remained amorphous following the presented
operation. NDR is seen in pristine amorphous devices, while the box-shaped hysteresis is not observed.

[Figure S7 panels: (a) im (mA) versus vm (V). (b) Micrograph with layers labelled TiN, NbO2, TiN and SiO2; scale bar 20 nm. (c) High-resolution micrograph.]

Figure S7: Transmission electron microscopy and electron diffraction on a smaller operated device. (a) A current-voltage
curve measured using a current source, exhibiting both NDR and the box-shaped hysteresis. (b) Transmission electron micrograph
of the same device in (a). (c) High-resolution transmission electron micrograph within the red rectangle in (b). (d) An electron
diffraction pattern of the NbO2 region in (c), exhibiting crystallization throughout the regions defined by the bottom electrode.

4. Details on in-operando x-ray spectroscopy

Since electron microscopy confirmed that NbO2 was the active compound, we noted that among the oxides of Nb,
only NbO2 undergoes a Mott insulator-to-metal transition when heated beyond roughly 1000 K. There is a debate
in recent literature on the origin of the box-shaped hysteresis in the im-vm curve, with one hypothesis suggesting that
the Mott transition in NbO2 causes this behaviour. [Nature Comm. 8, 658 (2017); Adv. Func. Mat. 29, 1906731 (2019);
Nature Comm. 10, 1628, (2019)] Since we sought the dynamical properties of the box-shaped hysteresis, we wanted to
construct a fairly accurate but compact model of the circuit element. To achieve that, we needed to understand the
fundamental physics causing the hysteretic behaviour. Therefore, we performed in-operando x-ray absorption
spectro-microscopy on prototype crossbar NbO2 devices, which were similar to the nanovia circuit element in terms of their
im-vm behaviour. [Nature Comm. 9, 2030 (2018)] The prototype devices were constructed with a similar material stack
compared to the nanovia circuit element, except that we eliminated the TiN layer, since it underwent slight oxidation
and produced Ti-O absorption that interfered with the Nb-O spectra. The setup and operation of the measurement,
illustrated in Figure S8a, are further detailed elsewhere. [J. Appl. Phys. 118, 034502 (2015); Nanoscale 9, 1793 (2017)]
The setup allows detection of the effects of Joule heating by collecting x-ray absorption spectro-microscopy data in two
states of the sample under test: with and without an applied electrical bias (current or voltage). The setup further
enhances very low signals (<0.01% of the background) by employing time-multiplexing and averaging over repeated
short-duration (≥0.1 µs) measurements, thereby effectively eliminating spatial drifts, persistent changes in sample
behaviour with time, changes in the beam, etc. We modified this capability to enable collection of data for two different
applied biases (instead of only with and without bias), so that we could probe the state of NbO2 at two different current
levels: just before the hysteresis (0.6 mA) and just after the hysteresis (1 mA). This isolated the effects within the
hysteresis, especially from the spectral changes for currents in the NDR region.
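
The benefit of interleaving the two bias states can be seen in a toy numerical model. The sketch below assumes a detector signal made of a constant background, a slow linear drift, Gaussian noise and a small bias-dependent change. All of these values and names are illustrative assumptions and do not describe the actual beamline electronics.

```python
import numpy as np

rng = np.random.default_rng(0)

n_pairs = 200_000      # number of (low, high) bias pairs
dt = 0.1e-6            # duration of one measurement slot, s
true_delta = 1e-4      # bias-induced change, as a fraction of the background
drift_rate = 0.25      # background drift per second (assumed)
noise_sigma = 1e-3     # per-slot noise (assumed)

t = np.arange(2 * n_pairs) * dt
background = 1.0 + drift_rate * t + rng.normal(0.0, noise_sigma, t.size)

# Interleaved: bias alternates low, high, low, high, ...
state_interleaved = np.arange(t.size) % 2
signal_interleaved = background + true_delta * state_interleaved
delta_interleaved = np.mean(signal_interleaved[1::2] - signal_interleaved[0::2])

# Sequential: all low-bias slots first, then all high-bias slots.
state_sequential = (np.arange(t.size) >= n_pairs).astype(float)
signal_sequential = background + true_delta * state_sequential
delta_sequential = (np.mean(signal_sequential[n_pairs:])
                    - np.mean(signal_sequential[:n_pairs]))
```

With these assumed numbers the interleaved estimate is biased only by the drift accumulated over a single slot, whereas the sequential estimate absorbs the drift accumulated over half the run, which is far larger than the change being measured.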

The O K-edge x-ray spectrum in Figure 1g contains bands corresponding to NbO2, with the device bands identified.
[Nature Comm. 8, 658 (2017); Nanoscale 9, 1793 (2017)] Figure 1g also displays the difference between the spectra
at the two different currents noted above. Since the spectral difference is about 50 times lower in intensity compared
to the spectrum, the two individual spectra that produced the difference do not appear distinct. We therefore display
only the spectral difference, which shows two main features. The first is a dip followed by a peak at the π* band of
NbO2, which indicates a downshift of the π* band in the high-current state of NbO2 and is characteristic of
increased electrical conductance. The downshift of the π* band is consistent with the electronic changes during the
Mott transition. The second is the appearance of a new peak at the d||* energy of NbO2, corroborated by an
increase in the overall spectral weight, seen as an increased post-absorption-edge signal. This indicates Nb-Nb
dimerization in the low-current state of NbO2, which is by definition the structural distortion in the low-temperature
monoclinic phase, essentially a distorted version of the high-temperature rutile phase. These spectral signatures are
identical to those of VO2, which involves similar electronic and structural transitions. [Adv. Mater. 26, 7505 (2014); Appl. Phys.
Lett. 108, 073102 (2016)] Although until recently it was unknown if the Mott transition of NbO2 was similar to that in
VO2, first-principles calculations have now confirmed this correspondence, which is a noteworthy corroboration of our
experimental data. [Comp. Mater. Sci. 173, 109434 (2020)] This data is therefore clear confirmation that the box-
shaped hysteresis in NbO2 is associated with the electronic and structural changes of the Mott transition. This allowed
us to construct a simple compact model that captures the underlying physics, and provided a deeper understanding of
the state variables.

[Figure S8 panels: (a) schematic showing the x-ray beam, zone plate, devices and detector, with timing traces versus t for the x-rays, im (alternating between the insulator and metal states), vDet. 1 and vDet. 2. (b) im (mA) versus vm (V).]

Figure S8: In-operando x-ray spectroscopy. (a) Schematic illustration of the measurement setup, along with the biasing
scheme. im is the applied device current, and vDet. 1 and vDet. 2 are the two detector signals, which gate the output of the
detector synchronously with each other and with the device current. The asynchronous x-rays from the synchrotron are also
marked. High and low values of im are chosen to enable operation across a Joule-heating-driven Mott transition, thereby
allowing the probing of the insulating and metallic states within the window defined by im. (b) im-vm curve of the NbO2
device under measurement.

5. Extended experimental data


5.1. Chaos in beats
Figure S9 displays oscillations from the third-order element when biased at voltages lower than those required to access
the neuromorphic action potential. Data were collected using the internal resistor to create the load lines and an external
applied voltage, without an external capacitor (Figure S4b). In each of the current vs. time plots (Figure S9a) there are
oscillations, most of which indicate the presence of two frequencies that create beats visible within the timescales
displayed. As the applied voltage is increased, the power spectra (Figure S9b) reveal that the peak frequencies approach
each other and cross. At the crossing, the frequencies overlap, the envelope of the oscillations becomes extremely
sensitive to any perturbations, and appears highly irregular (Figure S9a middle). The Poincaré plots of the oscillation
envelopes (Figure S9c) confirm that during crossing of the frequencies, the limit cycle in the envelope of the oscillations
collapses, indicating chaos. Chaos has an important function in neurons, especially in the light of the relatively recent
study on the causal relationship between the edge of chaos and neuromorphic action potential. [Int. J. Bif. Chaos. 22,
1250098, (2012)] Chaos is also sometimes explicitly included as a required property of a neuron model. [IEEE Trans.
Neural Net. 15, 1063 (2004)] In similar NbO2 switches, chaos has been previously observed at lower bias currents,
though not explicitly in relation to neuromorphic properties. [Nature 548, 318, (2017)] Here we also note that the two
peaks in the power spectra (Figure S9b) have nearly identical and mirror-imaged noise profiles, suggesting that an
interacting source of noise or fluctuations was a driving force that was amplified by the intrinsic resonance frequencies
of the device. Coupled thermal noise causing chaotic dynamics was an idea explored earlier and is likely worth pursuing
for these results on beating oscillators. [Nature 548, 318, (2017)]
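
The relation between two nearby frequencies, the beat envelope and a delay plot of that envelope can be illustrated with a synthetic signal. The sketch below is not a model of the device and does not reproduce chaos: it assumes two ideal sinusoids of equal amplitude, with frequencies and a sampling rate chosen only so that the numbers fall in a range similar to Figure S9. The helper name `return_map` is hypothetical.

```python
import numpy as np

fs = 200e6                     # sampling rate, Hz (assumed)
n = 10_000                     # number of samples, i.e. a 50 microsecond record
t = np.arange(n) / fs
f1, f2 = 8.4e6, 8.6e6          # two nearby oscillation frequencies, Hz (assumed)
x = np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)

# Power spectrum: two peaks separated by the beat frequency |f2 - f1|.
freqs = np.fft.rfftfreq(n, d=1 / fs)
power = np.abs(np.fft.rfft(x)) ** 2
peak_freqs = np.sort(freqs[np.argsort(power)[-2:]])

# Envelope from the analytic signal (negative frequencies zeroed in the FFT).
spectrum = np.fft.fft(x)
weights = np.zeros(n)
weights[0] = 1.0
weights[1:n // 2] = 2.0
weights[n // 2] = 1.0
envelope = np.abs(np.fft.ifft(spectrum * weights))

def return_map(series, lag):
    """Pairs (s(t), s(t + lag)) used for a delay plot of a time series."""
    return series[:-lag], series[lag:]

env_now, env_later = return_map(envelope, lag=50)
```

For this ideal case the envelope is a smooth periodic function, so the delay plot traces a closed curve. An irregular envelope, as reported at the frequency crossing, would instead scatter the points.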

[Figure S9 panels, one row per applied voltage (vext = 1.45, 1.50, 1.56, 1.61, 1.66 and 1.72 V): (a) im (mA) versus t (µs), 0 to 50 µs. (b) Amplitude (a.u.) versus f (MHz), 8 to 9 MHz. (c) im (t) (mA) versus im (t+Δ) (mA).]

Figure S9: Extended data on chaos in beats. (a) Temporal dynamics of im, measured at different external applied voltages, as
annotated. (b) Power spectra of data in (a). (c) Poincaré plots of the envelopes of the oscillations in (a).

5.2. Identifying additional neuromorphic properties
Many neuron models attempt to emulate specific properties exhibited by biological neurons. [Dynamical Systems in
Neuroscience, MIT Press (2007)] And many, but not all, of those properties have known computational functions.
Therefore, experimental efforts to construct neuromorphic systems identify known neuron behaviours. [Nature Comm.
9, 4661 (2018)] In this context, we discuss some of the neuromorphic properties exhibited by our third-order circuit
element and also identify certain dynamical behaviours of our element that we believe are unique among electrical
systems even if they are not directly relevant to biological neurons. In the main text and in the preceding section, we
identified the following six properties of the third-order circuit element:
i. Self-sustained oscillations
ii. Periodic (tonic) spiking
iii. Periodic (tonic) bursting
iv. Burst-number adaptation (with applied bias)
v. Damped spiking
vi. Chaos
In addition to the data presented in the main text, in Figures S10-S14, we display temporal dynamics of im at various
vext and identify the following additional properties:
vii. Mixed-mode: Figure S10 details coexistence of self-sustained oscillations, periodic spiking, periodic bursting,
and burst-number adaptation. As the applied bias is increased, the different behaviours transition smoothly,
where multiple behaviours can be seen to coexist at intermediate applied biases. A specific version of such
behaviour, a burst followed by periodic single spikes, is usually referred to as intrinsically bursting. [IEEE
Trans. Neural Net. 14, 1569 (2003)]
viii. Random firing: Figure S11a shows the response to specific values of vext, where periodic spiking coexisted with
beyond-threshold (or super-threshold) damped spiking, such that the spikes occurred at random positions in time (with
a certain periodicity), while an expected spike that did not occur was usually replaced by
damped spiking. We note this as a special case of mixed-mode operation.
ix. Beyond-threshold damped oscillations: Typically, neuron models produce spiking above a certain threshold
potential and produce either no spiking or damped spiking (known as sub-threshold spiking) below the threshold
potential. In our element, we observe damped spiking above the threshold required for spiking, while
we observe no activity below a certain threshold potential (that required to access the nonlinearity) (Figure
S11b-c). In other words, while most neuron models exhibit damped activity in sub-threshold regions,
our circuit element produces those behaviours post-threshold. In recent work, dendritic neurons have exhibited
a similar property. [Science 367, 83 (2020)] The property of post-threshold damping is distinct from damped
spiking because the former refers to damped spiking occurring specifically post-threshold. The latter has been
previously referred to as sub-threshold (damped) oscillations. [IEEE Trans. Neural Net. 15, 1063 (2004)]
x. Beyond-threshold self-sustained oscillations: At applied biases higher than those required for beyond-threshold
damped spiking, we observed low-amplitude self-sustained oscillations (Figures S12a-b). We distinguish this
behaviour from beyond-threshold damped oscillations, which is listed here as a distinct property. [Nature
Comm. 9, 4661 (2018)]
xi. Beyond-threshold frequency adaptation: In Figure S12a, it is noticeable that the sub-threshold oscillations start
as continuous oscillations, and after nearly 25 µs, break into bursts of oscillations. This behaviour was repeated
under similar experimental conditions. This is likely due to some slower dynamics (such as slow heating of the
circuit element) causing continuous oscillations to transform into bursting. This is similar to spike
frequency adaptation in neuron models. [Nature Comm. 9, 4661 (2018)]
xii. Phasic spiking: Figure S12c is a temporal current response where an initial spike following the onset of the
applied voltage (indicated by a black arrow) is followed by no spikes. This is referred to as phasic spiking.
[IEEE Trans. Neural Net. 15, 1063 (2004)]
xiii. Phasic bursting: Figure S12d is a temporal current response where an initial burst of spikes following the onset
of the applied voltage (indicated by a black arrow) is followed by no spikes or burst of spikes. This is referred
to as phasic bursting. [IEEE Trans. Neural Net. 15, 1063 (2004)]
xiv. Integrate and fire: One of the typical neuromorphic properties is the ability to integrate input energy (e.g., in
the form of applied pulses) and produce an action potential only when the integrated energy exceeds a certain
threshold. Typical implementations of this property use a physical state variable such as temperature: every
input pulse raises the temperature of the Mott device by a certain amount, the increments accumulate because of
the inherent thermal capacitance, and on reaching the Mott transition temperature the device produces a jump
in current (an action potential). [Sci. Rep. 10, 4292 (2020)]
In our devices, we observe this property: a single input pulse may not have sufficient amplitude to trigger periodic
spiking, but a series of such pulses eventually triggers periodic spiking (Figure S13).
xv. Refractory period adaptation: With changing input potential, the refractory period between spikes may change,
which is known as refractory period adaptation, or adaptation of spike timing/frequency. We observe such a
property when the applied external voltage is temporally tuned (Figure S14). This qualifies as refractory
period adaptation rather than adaptation of spike widths (both of which can lead to changes in frequency),
because the spike width is relatively unchanged compared to the refractory period between subsequent spikes.
[Front. Neurosci. 5, 73 (2011)]
While all these properties were measured without using an external capacitor, we did not identify any new properties
upon inclusion of an external capacitor, whose only purpose was to slow down the dynamics.
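
The thermal integrate-and-fire mechanism described in item xiv can be sketched with a single thermal state variable. The sketch below is a minimal illustration and not the compact model of this work: it assumes Newton's law of cooling with the Rth and Cth values listed for the compact model in Section 7.3, while the pulse power, pulse width, gap and the 1070 K firing threshold are hypothetical values chosen only for illustration.

```python
import math

R_TH = 2e7       # thermal resistance in K/W (value listed for the compact model)
C_TH = 1.2e-16   # thermal capacitance in J/K (value listed for the compact model)
T_AMB = 300.0    # ambient temperature in K


def pulses_to_fire(power_w, width_s, gap_s, t_fire_k, max_pulses=100):
    """Return the 1-based index of the first pulse during which the
    temperature reaches t_fire_k, or None if it never does.

    The temperature follows the exact exponential solution of
    C_th dT/dt = P - (T - T_amb) / R_th for piecewise-constant power P.
    """
    tau = R_TH * C_TH                     # thermal time constant in s
    heat = math.exp(-width_s / tau)       # relaxation factor over one pulse
    cool = math.exp(-gap_s / tau)         # relaxation factor over one gap
    rise_ss = power_w * R_TH              # steady-state temperature rise in K
    rise = 0.0                            # temperature rise above ambient in K
    for n in range(1, max_pulses + 1):
        rise = rise_ss + (rise - rise_ss) * heat   # heating during the pulse
        if T_AMB + rise >= t_fire_k:               # hypothetical firing threshold
            return n
        rise *= cool                               # cooling during the gap
    return None


# Hypothetical drive: 60 uW pulses, 1 ns wide, separated by 0.5 ns gaps.
first = pulses_to_fire(60e-6, 1e-9, 0.5e-9, t_fire_k=1070.0)
print("threshold first reached on pulse", first)
```

With these assumed values, a single pulse leaves the device below the threshold, and the threshold is first crossed on the fourth pulse because the gaps are shorter than the thermal time constant. Halving the pulse power keeps the temperature below the threshold indefinitely, so the function returns None.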

[Figure S10, panels a-f: im (mA) versus time t at vext = 1.85 V (a), 1.82 V (b), 1.87 V (c), 1.89 V (d), 1.92 V (e) and 1.99 V (f).]

Figure S10: Extended experimental data on mixed-mode neuromorphic properties. Temporal dynamics of im, measured at
different external applied voltages, as annotated.

[Figure S11, panels a-c: im (mA) versus time t at vext = 2.09 V (a) and 1.92 V (b, c).]

Figure S11: Extended experimental data on random firing (a) and beyond-threshold damped oscillations (b,c)
neuromorphic properties. Temporal dynamics of im, measured at different external applied voltages, as annotated. (c) Data from
(b) within the green rectangle magnified.

[Figure S12, panels a-d: im (mA) versus time t at vext = 2.2 V (a, b), 2.06 V (c) and 1.97 V (d).]

Figure S12: Extended experimental data on beyond-threshold self-sustained oscillations (a,b), beyond-threshold frequency
adaptation (a), phasic spiking (c) and phasic bursting (d) neuromorphic properties. Temporal dynamics of im, measured at
different external applied voltages, as annotated. (b) Data from (a) within the green rectangle magnified. Black arrows in (c) and
(d) indicate the time at which vext was applied. The initial spike is a capacitive overshoot typical of most devices.

[Figure S13, panels a-b: im (mA) (a) and vext (V) (b) versus time t.]

Figure S13: Extended experimental data on integrate-and-fire property. (a) Temporal dynamics of im, for the applied external
voltage displayed in (b). Data in (b) was constructed from the settings of the arbitrary waveform generator programmed to
provide the external voltage and was not directly measured.

[Figure S14, panels a-b: im (mA) (a) and vext (V) (b) versus time t.]

Figure S14: Extended experimental data on refractory period adaptation. (a) Temporal dynamics of im, for the applied external
voltage displayed in (b). Data in (b) was constructed from the settings of the arbitrary waveform generator programmed to
provide the external voltage and was not directly measured.

[Figure S15, panels a-b: im (mA) versus time t; panel (a) carries the annotation "15 ns".]

Figure S15: Experimental data on the smallest and fastest device. (a) Temporal dynamics of im, on a nano circuit element of
radius rdev = 25 nm, which was the smallest measured device. Black arrow indicates the time at which vext was applied. (b) Data
magnified within the time window marked in (a). The energy in the spike is roughly 2 pJ.

5.3. Energy required to generate spikes and comparison to transistor-based neurons


In Figure S15, we display spike generation during self-sustained oscillations in the smallest and fastest devices. The
energy required to produce a spike is about 2 pJ. This compares well with transistor-based neurons that can produce
a range of neuromorphic properties. [Front. Neurosci. 5, 73 (2011)] Transistor-based neurons with much lower energy
(<10 fJ/spike) exist, [Solid State Elec. 168, 107717 (2020); Front. Neurosci. 11, 123 (2017)] but such neurons do not
produce the wide range of neuromorphic behaviours or any of the complex dynamics shown in this work. The obvious
advantages of our neuromorphic nano circuit element are its compact structure and its ability to emulate a wide range
of neuromorphic behaviours, obtained by engineering the electro-physico-chemical properties of the device.

[Figure S16, panels a-b: (a) im (mA) versus vm (V), with regions labelled periodic oscillations, chaos, beats, action potential and beyond-threshold activity; (b) im versus vext, with behaviour classes A-H spanning regions of inactivity, activity and inactivity.]

Figure S16: Bifurcation diagram of the different neuromorphic properties. (a) Classification of neuromorphic properties of
the circuit element mapped onto the quasistatic im-vm curve, reproduced from Figure 2a. (b) Bifurcation diagram of the circuit
element illustrated on the im-vext plane for dynamical response. Coloured bars represent the range of current oscillations in each
behaviour class, corresponding to the following properties. A: sub-threshold inactivity, B: periodic oscillations, C: beating
oscillations, D: chaos, E: action potential (spiking and bursting), F: super-threshold spiking, G: super-threshold active dynamics,
H: super-threshold inactivity.

5.4. Comments on the origins of the different dynamical properties


Figure S16b is a bifurcation diagram listing the different classes of active dynamics exhibited by the third order
elements. Although the static behaviour seen in Figure S16a has been observed previously, few of the dynamical
properties associated with the different regions within the static behaviour have been measured. The bifurcation diagram
is a mapping of the different dynamical behaviours and defines the regions of activity. Periodic oscillations from any
NDR element have been attributed to a thermal (or in some cases, electronic) runaway process that repeats periodically
when the element is incorporated into a Pearson-Anson-like oscillator circuit. In a prior work, we identified the interplay
between thermal fluctuations and the nonlinear transport as responsible for chaotic dynamics. [Nature 548, 318 (2017)] Here,
we measured beating oscillations and a range of other neuromorphic behaviours. The beating oscillations (which have
been observed in the past to a limited extent) occur at the operation region between periodic oscillations and the Mott
transition. Thus, the dynamics associated with the two processes likely contributed to closely-spaced frequencies, which
showed up as beating oscillations. When the frequencies merged, the oscillation amplitudes became chaotic. At higher
bias voltages, as detailed by the models here, the Mott transition dynamics enabled the various neuromorphic dynamics,
including super-threshold activity.

6. Details on performance of devices
6.1. The range of parameters enabling neuromorphic behaviours
Previous studies have observed some of the properties discussed here in individual devices, for instance, chaotic
dynamics, [Nature 548, 318 (2017)] whereas here we observed a wide range of neuromorphic behaviours from a single
device. One of the main reasons for this is the presence of a third state variable built into the circuit element, as described
in the main text. This involved carefully tuning the material properties, especially the stoichiometry. However, in order
for the three state equations to coexist without one dominating the behaviour of the device, other structural features
must also be carefully controlled.
One of them is the thermal mass of the system (or thermal capacitance Cth), which characterizes the physical size of the
circuit element. For the thermal state equation to have a proportional contribution to the dynamics relative to the
electrical state equation, especially in small-sized devices (where size also limits the electrical capacitance Cp), the
speed of the thermal dynamics must be comparable to that of the electrical dynamics. We performed simulations of our
model to identify the range of Cth for which there was full neuromorphic behaviour (periodic spiking and bursting),
leading to the observation that such behaviour occurred only below Cth ≈ 10⁻¹⁶ W s K⁻¹ (Figure S17a). In other words, the
thermal mass had to be less than a limiting quantity for the thermal dynamics to be fast enough. In Figure S17a, we also
indicate the approximate physical sizes of the devices corresponding to the various thermal capacitances, indicating that
neuromorphic properties are predicted only below rdev ≈ 35 nm. The property complementary to the thermal mass is the
electrical capacitance of the system. In an analysis similar to that for the thermal mass, we found that neuromorphic
properties occur only below Cp ≈ 200 pF (Figure S17b). Similarly, thermal resistance (Rth) also needs to be within
a certain range in order to exhibit neuromorphic properties (Figure S17c).
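
The correspondence between thermal capacitance and device radius can be illustrated with a short scaling sketch. It assumes that Cth is proportional to the volume of the active region (as stated for Figure S17a) and, as an additional assumption made here, that the film thickness is fixed so that the volume scales as the square of rdev. The reference point (Cth ≈ 10⁻¹⁶ W s K⁻¹ at rdev ≈ 35 nm) is taken from the simulated limit quoted above; the function names are hypothetical.

```python
import math

R_REF = 35e-9     # m, device radius at the simulated limit quoted in the text
CTH_REF = 1e-16   # W s / K, simulated upper limit of Cth quoted in the text


def cth_from_radius(r_dev):
    """Thermal capacitance for radius r_dev, assuming Cth scales as r_dev**2
    (cylindrical active region of fixed thickness; an assumption)."""
    return CTH_REF * (r_dev / R_REF) ** 2


def radius_from_cth(cth):
    """Inverse of cth_from_radius under the same assumption."""
    return R_REF * math.sqrt(cth / CTH_REF)


for r_nm in (25, 35, 60, 100):
    cth = cth_from_radius(r_nm * 1e-9)
    print(f"rdev = {r_nm:3d} nm -> Cth ~ {cth:.1e} W s/K")
```

Under this assumed quadratic scaling, doubling the radius quadruples the thermal mass, which is why the limit on Cth translates into a fairly sharp limit on device size.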

[Figure S17, panels a-d: frequency f (MHz) versus Cth (W s K⁻¹) (a), Cp (pF) (b) and Rth (K W⁻¹) (c) from simulation of the model, and versus rdev (nm) (d) from experimental data. The shaded regions are labelled "Neuronal properties"; panel (a) is marked with rdev ≈ 1 nm, 35 nm and 1 µm.]

Figure S17: The ranges of physical properties required for neuromorphic behaviour. (a) Simulated frequency of action
potential plotted against Cth, with the blue shaded region indicating the range of Cth for which neuromorphic properties were
observed. Approximate device radius rdev corresponding to the different abscissa values are marked, by assuming that Cth is
proportional to the volume of the active region. (b) Similar plot as (a) as a function of Cp. (c) Similar plot as (a) as a function of
Rth. (d) Experimental frequency plotted against device radius rdev, with the blue shaded region indicating the range of rdev for
which neuromorphic properties are observed. Each experimental data point is averaged over 30-60 devices.

In physical devices, the thermal mass and electrical capacitances are effectively coupled and difficult to tune separately.
We performed experimental measurements on our circuit elements of differing sizes, and found that neuromorphic
properties occur only below rdev ≈ 60 nm (Figure S17d). There is less than a factor of two difference between the
critical size predicted by the compact model (Figure S17a) and the experimentally measured results (Figure S17d),
which is reasonable agreement considering the approximations in the model. The essence of these analyses is that the circuit parameters
need to be set within certain limits to enable the underlying electro-physical dynamics to produce neuromorphic
behaviours. Our analysis indicates that smaller devices are favored in producing neuromorphic behaviour, while larger
ones may also exhibit such behaviour under special circumstances.

[Figure S18, panels a-b: Vsw (V) (a) and yield (%) (b) versus rdev (nm).]

Figure S18: Experimental data on variability and yield. (a) Variability in the switching voltage plotted against device radius
rdev, with each data point containing statistics on 30-60 devices. (b) Yield (in percentage of working devices) plotted against rdev.
Yield was low for devices with rdev = 25 nm and 30 nm, likely due to challenging lithography at the smaller scales.

6.2. Variability, yield and stability


We report on the variability in the operation of our circuit elements, via measurements of the switching voltage at
different sizes of the devices, for a specific stoichiometry of x ≈ 2.3 in NbOx (Figure S18a). While the quantities vary
significantly with stoichiometry, within a specific stoichiometry (which was uniform across all devices in a die), the
variability is minimal even for small devices. This is particularly encouraging because small devices are prone to
fluctuations and thermal noise, and stable operation at small sizes enables construction of highly reliable large-scale
arrays of such devices.
The yield of our devices (the fraction of devices that were operational) was high (~90%) for most device sizes, but
it was low (<40%) for the smallest devices, and there was a noticeable drop in yield for the largest devices (Figure
S18b). The smallest devices often failed because of reliability issues with the electron-beam lithography process. This
is a relatively common issue, possibly because the process parameters (resist, dose, exposure, etc.) were optimized for
the average of the various device sizes, but it is not a prohibitive or fundamental obstacle to building smaller devices.
The largest devices often failed by shorting of the electrodes, likely because the number of parasitic dielectric
breakdown pathways increases with size. This points to the possibility that material optimization may be required
specifically for larger devices. Combined with the fact that neuromorphic properties are found only in smaller devices,
this may lead to a natural choice of small length scales when constructing neuromorphic circuit elements.
In order to evaluate the endurance of our devices, we allowed a device to undergo self-oscillations for 1 hour, which
consisted of roughly 3×10¹¹ cycles. We obtained quasi-static current-voltage measurements before and after this
process, which exhibited no obvious signs of degradation (Figure S19). Further, we have noticed no signs of degradation
with time in any of our devices, likely because the active material was buried underneath multiple layers of passivation
and electrode materials.

[Figure S19, panels a-g: im (mA) versus time t at vext = 1.1 V (a, b), 1.2 V (c, d) and 1.95 V (e, f); (g) im (mA) versus vm (V).]

Figure S19: Experimental data on device stability. The nano circuit element displayed in the main text (Figure 2) was operated
continuously for 1 hour, undergoing 3×10¹¹ self-oscillations, referred to as the stability test. Temporal dynamics of im (a,c,e)
before and (b,d,f) after the stability test, at two different applied external voltages, as noted. (g) Quasi-static current-voltage
behaviours before and after the stability test overlaid on top of each other. The data here show no significant degradation in
behaviour.

7. Additional details on the compact model


7.1. A discussion on the order of complexity of circuit elements
The order of a dynamical circuit element is the dimension of the phase space, i.e. the number of state variables that
change with time when power is applied to the element. State variables represent the distinct physical properties that
determine the conductance of the element at any point in time. While nearly every electronic component theoretically
contains at least first-order complexity (since temperature is a universal state variable), many practical applications
of electronic components do not encounter the dynamical effects of Joule heating, thereby often making temperature
dynamics irrelevant. Thus, resistors that are operated at low power are essentially zeroth order elements. In the usual
formulation of compact models for nonlinear dynamical circuit elements, each state equation describing the temporal
rate of change of a state variable is expressed as a first order ordinary differential equation in time, for example Newton’s
law of cooling for temperature. Thus, there are the same number of defining equations as there are state variables. [IEEE
Circuits and Sys. Mag. 14, 12 (2014); Int. J. Bif. Chaos 22, 1230011 (2012)]
Neuron models such as the Izhikevich model (one of the simplest models) are in effect driven systems containing a
dynamically coupled signal, [IEEE Trans. Neural Net. 14, 1569 (2003)] whereas this work demonstrates a third order
system with a DC bias. A recent report studied multiple third order systems of equations simulated in software, in order
to produce neuron-like spiking, bursting, chaotic dynamics, etc., similar to the observations in this study. [Comms.
Nonlin. Sci. and Numerical Simulation, 69, 343 (2019)]
7.2. Representation of the Mott transition by using a parallel metallic conduction
Equation 4 includes a switchable resistor in parallel with the Schottky conduction, illustrated in Figure S20.

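[Figure S20, panels a-b: (a) RNbO2(T, iNbO2) drawn as RSch (current iSch) in parallel with Rmet (current imet), carrying the total current iox at voltage vm; (b) simulation circuit containing V, Rs, RTE, RNbO2(T, iNbO2), RBE, CP, ROsc and COsc.]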

Figure S20: Representation of the Mott transition and simulation circuit. (a) Representation of the Mott transition. The
parallel resistor Rmet represents a switchable high conductance pathway when NbO2 is in the conducting state. (b) Circuit used to
represent the experiments via modeling. RTE and RBE refer to top and bottom electrode interface resistances. Osc refers to the
parameters of the oscilloscope or the measuring setup. RS = Rint when no additional external series resistor is used.

7.3. Alternate representations of Equation 5


Equation 5 can be rewritten more transparently as a state equation using the differential form for the state variable.
For instance, Equation S1 is another possible representation of Equation 5.
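$$\frac{\mathrm{d}i_\mathrm{ox}}{\mathrm{d}t} = \left\{ i_\mathrm{ox} - \frac{1}{\alpha}\left(\tanh^{-1}\left(\frac{-R_\mathrm{met}}{R_0} + \beta\right)\right) + I_2 \right\} \frac{1}{I_1} \left|\frac{\mathrm{d}i_\mathrm{ox}}{\mathrm{d}t}\right|$$
(S1)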
Alternately, if $i_\mathrm{ox} = \gamma \sin(\omega t)$, i.e. we assume the current is a sinusoidal function of time, then Equation 5 can be
rewritten in a differential form as a state equation in Rmet (Equation S2). This would make Equation 5 the integral form
of a state equation for Rmet.
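$$\frac{\mathrm{d}R_\mathrm{met}}{\mathrm{d}t} = R_0\left[-\mathrm{sech}^2\left\{\alpha\left(i_\mathrm{ox} + \left[\mathrm{sgn}(\gamma\omega\cos(\omega t))\,I_1 - I_2\right]\right)\right\}\right] \cdot \left[\alpha\left(\gamma\omega\cos(\omega t) - \left[2\delta(\gamma\omega\cos(\omega t))\left(\gamma\omega^2\sin(\omega t)\right)I_1\right]\right)\right]$$
(S2)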
Figure S21a illustrates the hysteresis in Rmet (Equation 5) and Figure S21b is an illustration of the hysteresis in the
differential form of Rmet. As we mentioned in the main text, this formulation is not the most accurate representation of
the experimentally measured electrical data or the underlying physics. The conduction mechanism is likely better
modeled using a 3D Poole-Frenkel model. [Appl. Phys. Lett. 108, 023505, (2016)] Moreover, the dynamical behaviour
of the hysteresis is mostly unexplored and thus our formulation may contain quantitative or qualitative inaccuracies
(e.g., current values at which Rmet changes, direction of tracing the loop, etc.), and the dynamical behaviour may
also differ from the quasi-static measurements. Nonetheless, our model elegantly demonstrates that inclusion of
a third order of complexity enables reproduction of several of the experimental observations.
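The values of the constants used in the simulations presented in the main text are as follows: $T_\mathrm{amb} = 300\ \mathrm{K}$,
$A = 1.37 \times 10^{-9}\ \mathrm{A\,K^{-2}}$, $d = 15 \times 10^{-9}\ \mathrm{m}$, $R_\mathrm{th} = 2 \times 10^{7}\ \mathrm{K\,W^{-1}}$,
$C_\mathrm{th} = 1.2 \times 10^{-16}\ \mathrm{J\,K^{-1}}$, $\kappa = 6.9 \times 10^{-11}\ \mathrm{C\,m^{-0.5}\,eV^{-0.5}}$,
$k_\mathrm{B} = 1.38 \times 10^{-23}\ \mathrm{J\,K^{-1}}$, $\phi = 0.168\ \mathrm{eV}$, $R_0 = 5 \times 10^{8}\ \Omega$, $I_1 = 0.1\ \mathrm{mA}$,
$I_2 = 0.9\ \mathrm{mA}$, $\alpha = 10^{6}\ \mathrm{A^{-1}}$, $C_\mathrm{p} = 100\ \mathrm{pF}$, $R_\mathrm{s} = 2\ \mathrm{k\Omega}$, $\beta = 1$,
$R_\mathrm{TE} = 100\ \Omega$, $R_\mathrm{BE} = 50\ \Omega$, $C_\mathrm{Osc} = 100\ \mathrm{pF}$, $R_\mathrm{Osc} = 50\ \Omega$.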
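
As a quick arithmetic check on the time scales implied by these constants, the sketch below multiplies the listed resistances and capacitances. Treating the products Rth·Cth, Rs·Cp and ROsc·COsc as characteristic times is an assumption made here for illustration; the full model couples these elements, so the products are only order-of-magnitude guides.

```python
# Constants as listed for the main-text simulations.
R_TH = 2e7        # thermal resistance, K/W
C_TH = 1.2e-16    # thermal capacitance, J/K
R_S = 2e3         # series resistance, ohm
C_P = 100e-12     # parallel capacitance, F
R_OSC = 50.0      # oscilloscope resistance, ohm
C_OSC = 100e-12   # oscilloscope capacitance, F

# Simple RC-type products (illustrative estimates, not results of the model).
tau_thermal = R_TH * C_TH      # (K/W) * (J/K) = s
tau_series = R_S * C_P         # ohm * F = s
tau_scope = R_OSC * C_OSC      # ohm * F = s

for name, tau in (("thermal", tau_thermal),
                  ("series RC", tau_series),
                  ("oscilloscope RC", tau_scope)):
    print(f"{name:16s}: {tau * 1e9:8.2f} ns")
```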

[Figure S21, panels a-b: Rmet (a) and dRmet/diox (b) versus iox, with i2 and i1 marked on the iox axis.]

Figure S21: Schematic illustrations of Equation 5. (a) Illustration of Equation 5. (b) Illustration of the differential form of Rmet.

7.4. Physical insights via simulations and comparing to biological neurons
In Figure S22, we plot the temporal dependence of $T$, $i_\mathrm{m}$, $i_\mathrm{Sch}$, $i_\mathrm{Rs}$ and $\mathrm{d}i_\mathrm{m}/\mathrm{d}t$. The temperature dynamics confirms that the
current spikes are indeed at higher temperatures, triggering a Mott insulator-to-metal transition (and back). Two
currents ($i_\mathrm{Rs}$ and $i_\mathrm{Sch}$) increase sharply during the dip and the spike in $i_\mathrm{m}$, respectively. These are analogous to the Na
and K polarizations during a biological neuron’s action potential. Further, the change in sign of $\mathrm{d}i_\mathrm{m}/\mathrm{d}t$ marks the beginning
and end of every spike, indicating that the spikes are caused by the physical process that depends on $\mathrm{d}i_\mathrm{m}/\mathrm{d}t$, which is the
Mott-transition-driven hysteresis in the im-vm curve.

[Figure S22, panels a-j: T (K), im (mA), iRs (mA), iSch (mA) and dim/dt (A s⁻¹) versus time t (a-e), and the same quantities magnified in time (f-j), with zones t1-t4 and the Mott transition marked.]

Figure S22: Additional variables from the model. (a) Temporal dependence of additional variables from the model. (b) Data in
(a) magnified in time, and divided into four coloured zones labeled t1-t4, which correspond to the four events of a neuromorphic
action potential marked in Figures 2h-k.

7.5. A two-state-variable model cannot produce neuromorphic behaviour
In order to determine if a simpler two-state-variable model can produce neuromorphic behaviour, we introduced a
smooth transition in Rth at the Mott transition temperature, consistent with prior experimental measurements, and
omitted the parallel resistor that represents the Mott transition in the model presented in the main text. [Nature Comm.
8, 658 (2017)] Rth was modeled by Equation S3.
$$R_\mathrm{th} = 2 \times 10^{7}\left(\frac{1}{3}\left(\tanh\left((T - 2000) \times 0.1\right) + 1\right) + 1\right)$$
(S3)
Equation S3 implicitly introduces Rth as a state parameter, not a state variable, since it does not have a differential
representation. This reduces the system to second-order complexity. A simulation of this system produced a
quasi-static im-vm curve similar to the one produced by the model presented in the main text, and the dynamics were
oscillatory. However, the model was not able to produce periodic spiking or bursting (Figures
S23a-d). This reinforces that a system of higher-order complexity was required to generate neuromorphic behaviour.
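
Equation S3 can be evaluated directly to see the size and sharpness of the step it introduces. The sketch below is a plain transcription of Equation S3 as printed, with temperature in K and Rth in K W⁻¹; the function name and the sample temperatures are chosen here for illustration.

```python
import math


def r_th(temperature_k):
    """Thermal resistance of Equation S3 in K/W for a temperature in K."""
    step = math.tanh((temperature_k - 2000.0) * 0.1)   # smooth step from -1 to +1
    return 2e7 * ((step + 1.0) / 3.0 + 1.0)


for t in (300.0, 1990.0, 2000.0, 2010.0, 3000.0):
    print(f"T = {t:6.0f} K -> Rth = {r_th(t):.3e} K/W")
```

Because the function depends only on the instantaneous temperature, it adds no differential equation of its own, which is the sense in which Rth acts as a state parameter rather than a state variable.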

[Figure S23, panels a-d: im (mA) versus vm (V) (a); vext (V) versus time t (b); im (mA) versus time t (c, d).]

Figure S23: A two-state-variable model. (a) Quasi-static im-vm curve from the two-state-variable model. (b) Applied voltage. (c)
Simulated current dynamics. (d) Data from (c) within the blue rectangle magnified.

7.6. An alternate third-order model, illustrating generality


To further reinforce that a third-order model is required to capture some key neuromorphic properties, we present an
alternate model, with three state variables, which was able to reproduce some of the experimental measurements. Rmet
was represented by Equation S4, instead of Equation 5. The rest of the model remained identical to the one presented
in the main text.

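$$R_\mathrm{met} = \begin{cases} R_0 \ \text{if } i_\mathrm{ox} < 0.6\ \mathrm{mA}, \quad R_0\left(\dfrac{0.6\ \mathrm{mA}}{i_\mathrm{ox}}\right)^{30} \text{if } i_\mathrm{ox} \geq 0.6\ \mathrm{mA}, & \text{if } \dfrac{\mathrm{d}i_\mathrm{ox}}{\mathrm{d}t} < 0 \\[2ex] R_0 \ \text{if } i_\mathrm{ox} < 0.5\ \mathrm{mA}, \quad R_0\left(\dfrac{0.5\ \mathrm{mA}}{i_\mathrm{ox}}\right)^{20} \text{if } i_\mathrm{ox} \geq 0.5\ \mathrm{mA}, & \text{if } \dfrac{\mathrm{d}i_\mathrm{ox}}{\mathrm{d}t} \geq 0 \end{cases}$$
(S4)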
Equation S4 essentially implements a function similar to Equation 5 (illustrated in Figure S21a), using non-conventional
exponents of currents. Equation S4 describes the current dynamics (in the form of $\mathrm{d}i_\mathrm{ox}/\mathrm{d}t$) and is therefore a state equation.
The static and dynamical behaviours of this model are displayed in Figure S24. The model is able to capture both
periodic spiking and bursting, similar to the model presented in the main text. Thus, we are able to corroborate the
simulated results using alternate approaches. Apart from this model, there have been other recent third order
mathematical formulations that have been able to produce similar spiking and bursting behaviours. [Comms. Nonlin.
Sci. and Numerical Simulation, 69, 343 (2019)]
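
The direction-dependent resistance of Equation S4 can be written as a small function. The sketch below transcribes the piecewise form as printed; the value of R0 is taken from the list of constants for the main-text model, on the assumption that the alternate model reuses it, and the function and argument names are hypothetical.

```python
R0 = 5e8   # ohm, from the listed constants (assumed to apply to this model too)


def r_met(i_ox, di_ox_dt, r0=R0):
    """Equation S4: metallic-path resistance in ohm.

    i_ox is the current in A and di_ox_dt its time derivative in A/s; only
    the sign of the derivative selects the branch.
    """
    if di_ox_dt < 0:
        i_switch, exponent = 0.6e-3, 30   # branch for decreasing current
    else:
        i_switch, exponent = 0.5e-3, 20   # branch for increasing current
    if i_ox < i_switch:
        return r0
    return r0 * (i_switch / i_ox) ** exponent


# The same current gives different resistances on the two branches.
print(r_met(0.55e-3, +1.0))   # increasing current: already below R0
print(r_met(0.55e-3, -1.0))   # decreasing current: still R0
```

Evaluating the function at a current between the two switching values shows the hysteresis: the resistance depends on the sign of the current derivative and not on the current alone.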

[Figure S24, panels a-d: im (mA) versus vm (V) (a); vext (V) versus time t (b); im (mA) versus time t (c, d).]

Figure S24: An alternate model. (a) Quasi-static im-vm curve from the alternate third-order model. (b) Applied voltage. (c)
Simulated current dynamics. (d) Data from (c) within the blue rectangle magnified.

[Figure S25, panels a-b: im (mA) versus vm (V). (a) Nature 548, 318 (2017): a static region with no activity and no neuronal properties, and a dynamic region with active dynamics (periodic oscillations and chaos). (b) This work: active dynamics (periodic oscillations, beating oscillations and chaos in beats), together with explicit Mott dynamics and neuronal properties (action potential and beyond-threshold activity) enabled by chemical and structural engineering, marked as the focus of this work.]

Figure S25: Comparison to prior literature. (a) Data repeated from [Nature 548, 318 (2017)] to illustrate similarity in the static
behaviour compared to present work. (b) Figure 2a reproduced. The annotations highlight the active dynamics enabled by the
electro-physico-chemical engineering of the devices in the present work.

8. Comparison to prior literature


Prior literature has recorded observations of the box-shaped hysteresis in NbO2 devices, which appears similar to the
static behaviour of the basic device structure in this work. [Nature 548, 318 (2017), Nature Comm. 8, 658 (2017)]
However, the neuromorphic properties reported here were not observed in that work. The novelty in this work is
the distinct dynamics associated with the box-shaped hysteresis, which were missing from all prior studies (Figure S25).
The analyses presented above (both modeling and experiments) detail how it was important to carefully engineer the
device to achieve suitable electrical, thermal and chemical properties, which together enabled neuromorphic dynamics.
We have studied static and dynamical behaviours of other Mott insulators such as VO2, [Nano Lett. 19, 6751 (2019);
Adv. Mater. 25, 6128 (2014)] in prototypical devices that were not engineered towards exhibiting neuromorphic
dynamics. In such systems, we have observed static behaviour similar to our neuromorphic elements (especially the
box-shaped hysteresis), but we were unable to observe the range of neuromorphic behaviours demonstrated here.
9. Details on the experimental demonstration of analogue computing
In this section, we provide a brief background on solving optimization problems using coupled oscillators and an
overview of the construction and operation of the experimental system. A detailed description of the system and
performance analysis are deferred to a dedicated publication.
9.1. Background on solving optimization problems with coupled oscillators
Oscillators that are coupled typically synchronize in frequency and phase (by going in-phase or anti-phase, depending
on the type of coupling) if they have fundamental frequencies that are close to one another. Pearson-Anson-like
oscillators built using NDR devices typically synchronize in-phase under resistive coupling and are driven towards an
anti-phase synchronization under capacitive coupling. [arXiv:1709.08102 (2017); Sci. Rep. 7, 911 (2017)] If the
solution to a problem is encoded in the phase of coupled oscillators, the oscillators can be considered similar to neurons
in a neural network that are driven to a specific state (phase) depending on the inputs and feedback they receive. As an
illustration, resistive connections ‘excite’ neurons into state ‘+1’ (in-phase synchronization), while capacitive
(‘inhibitory’) connections drive neurons into state ‘-1’ (anti-phase synchronization). The connections that are defined
by a network of electrical devices (resistors and/or capacitors) form the weight matrix, which also defines the problem
to be solved. Although there is no explicit feedback controlled by a clocking signal, any change in the state of any
neuron can affect the rest of the coupled neurons, which will together be forced to find a new stable configuration. Such
a stable configuration is essentially an energy minimum for the system. In other words, the system performs
minimization of an energy or cost function, similar to Ising machines and Hopfield networks, though the latter systems
differ vastly in construction and utility from a coupled oscillator system. [arXiv:1709.08102 (2017); Phys. Rev. E 62,
4010 (2000)] Thus, coupled oscillator systems can be used as associative memories for identification of matching
patterns. [IEEE Trans. Neural Net. 10, 508 (1999)] More importantly, such systems can be used to solve optimization
problems by encoding the problem into the weight matrix, such that the energy minimum of the system also corresponds
to the optimal solution to the problem. As outlined in the main text, coupled oscillator systems exhibit many parallels
to thalamo-cortical computations in the brain. [SIAM J. Appl. Math. 59, 2193 (1999)]
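
The in-phase and anti-phase locking described above can be illustrated with a generic two-oscillator phase model. The sketch below substitutes a Kuramoto-type phase equation for the Pearson-Anson-like circuits, so it is an abstraction and not the authors' circuit model: positive coupling stands in for resistive connections, negative coupling stands in for capacitive connections, and all names and values are hypothetical.

```python
import math


def phase_difference(coupling, detuning=0.0, phi0=1.0, dt=1e-3, steps=20000):
    """Final phase difference phi = theta_1 - theta_2 of two coupled phase
    oscillators, wrapped to the interval [-pi, pi].

    For d(theta_1)/dt = w_1 + K sin(theta_2 - theta_1) and
    d(theta_2)/dt = w_2 + K sin(theta_1 - theta_2), the difference obeys
    d(phi)/dt = (w_1 - w_2) - 2 K sin(phi), integrated here by forward Euler.
    """
    phi = phi0
    for _ in range(steps):
        phi += dt * (detuning - 2.0 * coupling * math.sin(phi))
    return math.atan2(math.sin(phi), math.cos(phi))


in_phase = phase_difference(coupling=+1.0)     # stands in for resistive coupling
anti_phase = phase_difference(coupling=-1.0)   # stands in for capacitive coupling
print(f"positive coupling: |phi| = {abs(in_phase):.4f} rad")
print(f"negative coupling: |phi| = {abs(anti_phase):.4f} rad")
```

In this abstraction, identical oscillators settle at a phase difference of zero for positive coupling and pi for negative coupling. A small frequency mismatch shifts the locked phase slightly, and locking is lost once the mismatch exceeds twice the coupling strength, which reflects the requirement that the fundamental frequencies be close to one another.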
Consider for instance the viral quasispecies reconstruction problem, which searches for variants of a viral population
that share specific mutations. [Sci. American 269, 42 (1993)] The aim of the problem is to divide the conflict graph into
sets of reads by minimizing the intra-set conflicts. The viral quasispecies reconstruction problem can be solved using
several different approaches, including Markov models, probabilistic mixture models and graph partitioning (via
maximum cut, maximum clique, etc.). [PLoS Comp. Biol. 10, e1003515 (2014); PLoS ONE 14, e0225578 (2019)] Here
we formulate the problem as an energy function:
$$E = \min \frac{1}{2}\sum_{i<j} s_{ij}\left(a_i a_j - 1\right)$$
(S5)
$s_{ij}$ is the element of the adjacency matrix (with $N^2$ weights) at position $(i,j)$, connecting $N$ neurons, with each neuron
$a_i$ assuming state ‘+1’ or ‘-1’, indicating two sub-groups of the $N$ genomic reads. $s_{ij}$ is 1 for a conflict between the two reads
at positions $i$ and $j$, and 0 when there is no conflict. Equation S5 refers to creation of two sub-groups of genomic reads
with minimum intra-set conflict. The resulting sub-groups of reads are put through this process repeatedly until we are left with
zero conflicts in each resulting sub-group. Alternatively, neurons with multiple states (more than two) may allow
creation of multiple sub-groups. A typical Ising model for energy minimization is written as: [Front. Phys. 2, 5 (2014)]
$$E = -\min \frac{1}{2}\sum_{i<j} w_{ij}\left(a_i a_j\right)$$
(S6)
$w_{ij}$ is a weight within the weight matrix defining the network. Comparing Equations S5 and S6, [w] = [-s]. In other
words, every conflict is a weight of -1, or one that repels the corresponding reads into two different sub-groups. This is
a simple mapping between the adjacency and the weight matrix, making this problem (and variants of it) an accessible
metric for benchmarking the performance of new computing primitives. There are many variants of the problem that
have a similar or identical mapping between the adjacency and the weight matrix, including routing of lines in VLSI
design to maximize data flow between sub-circuits, [IEEE Trans Comp. 47, 1253 (1991)] the pure maximum-cut and
graph colouring problems, [Front. Phys. 2, 5 (2014)] social influence maximization, [Social Network Analysis and
Mining, 4 153 (2014)], contig orientation problem to determine the sequence of a genome, [J. Comput. Biol. 19, 1162
(2012)] etc. A randomly chosen connection matrix may represent a variety of the types of problems, though not any
possible type of problem (since some problems such as the traveling salesman problem have inherent sparsity). Recent
work on optical, quantum and electronic computing primitives to solve optimization problems have used variants of
such a problem to benchmark the performance of their systems. [Science Advances 5, 0823, (2019); Nature Elec. 3, 409
(2020); Sci. Rep. 9, 14786 (2019)]
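
The mapping from conflicts to Ising weights described above can be illustrated with a minimal Python sketch. It assigns a weight of -1 to every conflict and finds the two-way split of lowest energy by exhaustive search. The four-read conflict graph, the function names and the placement of the 1/2 prefactor are assumptions made for illustration only; exhaustive search is used because the example is tiny and does not represent the oscillator hardware.

```python
from itertools import product


def weights_from_conflicts(conflicts):
    """Map a symmetric 0/1 conflict matrix to weights: conflict -> -1, otherwise 0."""
    n = len(conflicts)
    return [[-conflicts[i][j] if i != j else 0 for j in range(n)] for i in range(n)]


def ising_energy(weights, states):
    """Assumed form: E = -(1/2) * sum over i<j of w_ij * a_i * a_j, with a_i in {+1, -1}."""
    n = len(states)
    total = sum(weights[i][j] * states[i] * states[j]
                for i in range(n) for j in range(i + 1, n))
    return -0.5 * total


def best_partition(weights):
    """Exhaustively search all state assignments for the minimum energy."""
    n = len(weights)
    best_states, best_energy = None, float("inf")
    # Fix a_0 = +1 so that mirror-image assignments are not counted twice.
    for rest in product((1, -1), repeat=n - 1):
        states = (1,) + rest
        energy = ising_energy(weights, states)
        if energy < best_energy:
            best_states, best_energy = states, energy
    return best_states, best_energy


# Hypothetical conflict graph: four reads with conflicts 0-1, 1-2, 2-3 and 3-0.
conflicts = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
weights = weights_from_conflicts(conflicts)
states, energy = best_partition(weights)
print(states, energy)
```

In this example every conflicting pair ends up with opposite states, which is the repulsion into two different sub-groups that the weight of -1 is meant to produce.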

Figure S26: Schematic of the analogue computing demonstration. (a) Schematic of the oscillator network, along with the
connections matrix. This represents the illustration in Figure 4. The green boxes represent weights (encoded in the impedance of
the connecting devices) and the light-brown coloured boxes represent shorting of the connection (to enforce a zero-diagonal
connection matrix). (b) Photographs of a part of the measurement setup. Scalebars are 2 mm (left) and 100 µm (right). (c)
Schematic illustration of the chip connection, along with the wire-bonded connections. In Chip 1, a common metal pad (green
circle) was used to short all the input nodes of the oscillators, which was biased with an external voltage. All the ground
electrodes were internally shorted to a metal pad (light-blue-coloured circle). In Chip 2, the yellow electrode pads were used to
program the individual pseudo-memcapacitors. Measurement circuits are not illustrated.

9.2. Construction and operation of the coupled oscillator system


Here we utilized programmable capacitive connections to achieve repulsion of phases, with the remaining connections
being resistive. Since a connection between neurons i and j is bidirectional, we naturally enforce a symmetric weight
matrix without having to achieve precise symmetric programming, while also reducing the number of hardware weights
to half the size of the weight matrix (Figure S26a).
To achieve programmable capacitive connections, we constructed nonvolatile memristors that exhibited pseudo-
memcapacitance, wherein the capacitance of the system switches with the resistance of the system. Such an effect has
been seen multiple times in multi-layer dielectrics sandwiched between metal layers. [Nanoscale Res. Lett. 9, 522
(2014); Sci. Rep. 3, 2482 (2014); Appl. Phys. Lett. 98, 093503 (2011); Nature Comm. 9, 3208 (2018)] The most likely
mechanism underlying switching in capacitance along with resistance changes is the formation of two capacitive
structures in series (for instance, via formation of a charge layer between two dielectric layers, or the presence of a
metallic electrode in between dielectric layers, which will effectively create two capacitors in series), with the resistance

switching in each structure shorting out the capacitance associated with it. We were able to replicate a similar effect
using a multi-layer stack of dielectrics (TaCuOx, HfOx) separated by a weakly conducting layer of TaNOx. The diagonal
elements were shorted via electrical breakdown, to ensure a zero-diagonal weight matrix. Each weight of -1 was
represented by a high capacitance, while weights of 0 were represented by a low capacitance (essentially a resistive
connection). Elements that were a part of the crossbar array but were not a part of the weight matrix were left in their
unoperated state with an extremely large resistance (>1 GΩ). The weight matrix was a randomly chosen instance with
a high degree of density.

[Figure S27. Panels a, b. Panel b axes: |ic| (A), logarithmic, 10^-7 to 10^-3, versus vc (V), -2 to 2.]

Figure S27: Schematic of the structure of the pseudo-memcapacitors. (a) Illustration of the structure of and the material stack
within the pseudo-memcapacitors used to create the connection matrix. (b) Current-voltage behaviour of the device illustrated in
(a) with the ordinate in a logarithmic scale.

Following the programming of the weight matrix, the nominally identical neurons connected to the network of pseudo-
memcapacitors were powered by a common voltage, and their oscillations were observed until their relative phases
were stable. In order to avoid the high degree of sensitivity of the oscillations to the input voltages, the neurons were
powered by a lower voltage (<1.8 V), wherein the sensitivity is lower (Figures 2, S9). Their phases were divided into
two halves and assigned to two sub-groups of neurons (Figure S29). The phases were measured using a combination of
oscilloscopes and digital logic, described elsewhere. [Sci. Rep. 7, 911 (2017)] While the phases assumed a range of
values, it is possible to snap the phases to a specified number of values using a synchronizing signal, which may enable
more accurate computations especially in larger scale systems with non-binary neurons. [arXiv:1709.08102 (2017);
Biosystems 48, 85 (1998)] Specifically, such synchronization may be necessary when solving problems that are not
merely partitioning of a set (e.g., identifying the number of colours required for the map colouring problem and
assigning those colours to specified regions will require more computations beyond merely partitioning the set of
regions into two, whereas for the maximum cut problem, such partitioning suffices). Furthermore, we note that the
performance of the system is especially good at higher densities of the weight matrix (graphs with a high density of
edges), a regime that is particularly hard for most other electronic and quantum approaches.
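
The read-out step described above, in which settled phases are divided into two halves and assigned to two sub-groups, can be sketched as follows. This is only an illustration under stated assumptions: the phases are taken to be available in radians, "two halves" is interpreted as splitting the phase circle a quarter-turn either side of a reference oscillator, and the function names and example phases are hypothetical rather than taken from the measurement.

```python
import math


def assign_groups(phases, reference=0):
    """Label each oscillator +1 if within a quarter-turn of the reference phase, else -1."""
    ref = phases[reference]
    groups = []
    for phase in phases:
        # Wrap the phase difference into the interval [-pi, pi].
        diff = math.atan2(math.sin(phase - ref), math.cos(phase - ref))
        groups.append(1 if abs(diff) <= math.pi / 2 else -1)
    return groups


def count_intra_group_conflicts(conflicts, groups):
    """Count conflicts between pairs of nodes that landed in the same sub-group."""
    n = len(groups)
    return sum(conflicts[i][j]
               for i in range(n) for j in range(i + 1, n)
               if groups[i] == groups[j])


# Hypothetical settled phases (radians) for four oscillators.
phases = [0.10, 3.05, 0.35, 2.90]
# Hypothetical conflict graph with conflicts 0-1, 1-2, 2-3 and 3-0.
conflicts = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
groups = assign_groups(phases)
remaining = count_intra_group_conflicts(conflicts, groups)
print(groups, remaining)
```

Thresholding relative to a reference oscillator makes the labels independent of the absolute phase, so only the relative phases matter.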

[Figure S28. Axes: vo (mV), 0 to 50, versus t (s), 0.0 to 1.6. Regions labelled "Settling time" and "Settled".]

Figure S28: Extended data of convergence of the oscillations. Extended data corresponding to Figure 4h. An external voltage
was applied at t = 0. Settling was defined as the relative phases of the oscillations being stable with time. Relative phases were
measured in the settled time.

Figure S29: Illustration of the iterative process of viral quasispecies reconstruction. The process of dividing the nodes
(reads) of a conflict graph into sub-groups continues until groups of reads with no intra-group conflicts are identified.
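
The iterative procedure in Figure S29 can be sketched as a recursion: split a group of reads in two, then split each resulting sub-group again until no intra-group conflicts remain. In the sketch below an exhaustive search stands in for the oscillator network as the two-way partitioner, purely for illustration; the function names and the example conflict graph are hypothetical.

```python
from itertools import product


def intra_conflicts(conflicts, nodes):
    """Number of conflicting pairs among the given nodes."""
    return sum(conflicts[i][j]
               for k, i in enumerate(nodes) for j in nodes[k + 1:])


def bipartition(conflicts, nodes):
    """Exhaustive stand-in for the hardware: split nodes to minimize intra-group conflicts."""
    best, best_cost = None, None
    # The first node is pinned to the left group to skip mirror-image splits.
    for rest in product((0, 1), repeat=len(nodes) - 1):
        labels = (0,) + rest
        left = [n for n, lab in zip(nodes, labels) if lab == 0]
        right = [n for n, lab in zip(nodes, labels) if lab == 1]
        cost = intra_conflicts(conflicts, left) + intra_conflicts(conflicts, right)
        if best_cost is None or cost < best_cost:
            best, best_cost = (left, right), cost
    return best


def reconstruct(conflicts, nodes):
    """Recursively split until every sub-group is free of internal conflicts."""
    if intra_conflicts(conflicts, nodes) == 0:
        return [nodes]
    left, right = bipartition(conflicts, nodes)
    return reconstruct(conflicts, left) + reconstruct(conflicts, right)


# Hypothetical conflict graph: reads 0, 1, 2 conflict pairwise; read 3 conflicts with read 0.
conflicts = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
groups = reconstruct(conflicts, [0, 1, 2, 3])
print(groups)
```

Because any group containing a conflict can always be improved by isolating one of the conflicting reads, the best split is never the trivial one, so both sub-groups shrink and the recursion terminates.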

[Figure S30. Panel a: circuit schematic labelled vext, Rint, Cint, RNbO2(T, iNbO2), vm, i1, i2, "Integrated circuit element" and "To connection matrix". Panel b axes: i (mA), 0.0 to 1.0, versus t (s), 0.0 to 0.4, with traces i1 and i2.]

Figure S30: Estimation of power dissipation outside the electronic neurons. (a) In order to estimate the power dissipation
outside the electronic neurons we measured the currents in two branches of the neuromorphic element (i1 and i2). (b) Using
Kirchhoff’s current law, we inferred that the current flowing out of the neuromorphic elements was negligible.

[Figure S31. Panel a axes: vo (mV), 0 to 50, versus t (s), 0 to 80. Panels b and c axes: vo (mV), 0 to 50, versus t (s), 20.0 to 21.0 and 70.0 to 71.0, respectively.]

Figure S31: Longer duration of the transients. (a) Current dynamics in Figure S28 displayed for a longer duration. (b,c) Data
in (a) magnified at two different time slices.

9.3. Performance evaluation


While there are several benchmarking metrics used to evaluate the performance of computing primitives, the total
energy required to obtain a solution or the number of solutions that can be obtained for a Joule of energy are metrics
that encompass several different quantities, including power consumption and speed. [Appl. Phys. Rev. 7, 011305,
(2020)] In order to obtain the neuron energy corresponding to this metric, we calculated the energy contained within a
current spike, which is ~40 pJ for the data presented in Figure 4, and about 2 pJ for the smallest and fastest device
(Figure S15). We further estimate the number of pulses required to reach a solution (a solution time of 2 µs from Figure

4 for N = 16, and 1 pulse every ~0.1 µs) as 20. Further, for the probability of reaching the best solution of ~25% for N
= 16, we calculated the number of times the experiment will have to be repeated to obtain an optimal solution probability
of 99% using the equation $0.99 = 1 - (1 - p_s)^n$, where $p_s$ is the optimal solution probability and $n$ is the number of
repetitions required. We obtained n ≈ 16 for N = 16. We obtain the total neuron energy to solution as the product of
energy per spike, number of pulses required to reach a solution, n and N. This was about 200 nJ of neuron energy per
solution for rdev ≈ 45 nm and 10 nJ of neuron energy per solution for rdev ≈ 25 nm. Other equally useful metrics involve
the number of solutions obtained per Joule of energy. As we mentioned earlier, larger scale systems may require use of
a synchronization signal to ensure alignment of phases to specified quantities in order to maintain a high accuracy of
solutions, while there may also be cooling overheads and losses due to power dissipation in dense wiring, all of which
may lead to corrections in these performance metrics. Here we do not include energy consumed by the connection
matrix, which may add to the total energy consumption. In recurrent networks with explicit feedback, such as Hopfield
networks, typical two terminal nonvolatile memory crossbars used for weighted feedback do not dominate the energy
consumption, whereas the energy consumption is dominated by the feedback processors and converters (such as digital-
analog converters), signal processing circuits, and neuromorphic blocks in some cases (most of which are absent in the
present implementation). [Nature Elec. 3, 409 (2020)] Another possible improvement is in truly randomizing the initial
conditions of the oscillators, wherein each oscillator is initialized at a specified time, and then the connection matrix is
gradually introduced. A potential issue in evaluating the performance is also in randomizing the weight matrix, which
involves a measure of the ease of programming the weight matrix. Here we chose the programmed matrix to be the
target matrix, which enabled ease of identifying the optimal solution corresponding to the hardware. Furthermore,
different schemes of solving problems may involve different techniques to read the solutions (e.g., reading of synaptic
solution weights in recurrent neural networks and phases of oscillations in coupled oscillator networks), which may also
introduce differences in performance comparisons. Our median accuracies drop off sharply at problem sizes of N > 16
(Figure 4), leading to degraded solution quality, indicating possible synchronization and circuit loading issues. This
suggests room for incorporation of external synchronization signals and buffered connectivity networks. [Sci. Rep. 9,
14786 (2019); arXiv:1903.07163 (2019)] Despite our system being crude, it demonstrates qualitative feasibility of the
idea. Our system is certain to produce significantly better performance benchmark quantities when optimized for energy.
The following are identified as the most promising hardware approaches to solving optimization problems: application-specific
CMOS-based chips (e.g., FPGA-based digital annealer and CMOS Ising machines), [Solid-State Circuits 51,
303 (2016); Front. Phys. 7, 48 (2019)] CPU-based solvers, [Quantum Sci. Technol. 3, 04LT01 (2018); Nature Elec. 3,
409 (2020)] GPU-based solvers, [arXiv:1806.08422 (2018)] quantum annealing, [‘The D-Wave 2X™ Quantum
Computer Technology Overview’; Science Advances 5, eaau0823 (2019)] optical Ising machines, [Science Advances 5,
eaau0823 (2019)] and non-CMOS (hybrid) chips such as memristor Hopfield networks and stochastic recurrent neural networks.
[Nature 573, 390 (2019); Nature Electron. 3, 409 (2020)] A quantitatively comprehensive benchmark is challenging at
present owing to the sparse reporting of energy efficiency in the literature, since most approaches are not focused on
optimizing energy efficiency but are instead focused on demonstration of capability and reasonable speed. This is true
especially with quantum approaches since the energy costs of cooling dominate the energy costs of computing, which
is unlikely to change unless scaled up to massive sizes. There is also lack of a common performance metric to report
(e.g.: solutions/s/W at size N, wall-plug energy to solution, etc.), and a common problem to evaluate optimization
primitives (as an example, for classification primitives, the MNIST or ImageNet databases serve as well-accepted
standards). A dedicated study is required to establish quantitatively accurate comparative benchmarks, including a
choice of a common problem that is optimized for difficulty, access to the different competing technologies in order to
evaluate them under similar conditions, and use of optimal algorithms supported by the respective hardware units.
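
The neuron energy-to-solution estimate given earlier in this section can be reproduced with a short Python sketch. The numerical inputs (about 40 pJ and 2 pJ per spike, a 2 µs solution time, one pulse every 0.1 µs, a 25% single-run success probability and N = 16) are taken from the text; the function and variable names are hypothetical, and rounding the number of repetitions to the nearest integer is an assumption made to match the quoted n ≈ 16.

```python
import math


def repetitions_for_target(p_single, p_target=0.99):
    """Real-valued n that solves p_target = 1 - (1 - p_single)**n."""
    return math.log(1.0 - p_target) / math.log(1.0 - p_single)


def neuron_energy_to_solution(energy_per_spike, pulses, repetitions, n_neurons):
    """Product of energy per spike, pulses per run, repetitions and number of neurons."""
    return energy_per_spike * pulses * repetitions * n_neurons


N = 16                               # number of neurons
pulses = round(2e-6 / 0.1e-6)        # 2 us solution time, one pulse every ~0.1 us
n_real = repetitions_for_target(0.25)
n = round(n_real)                    # assumption: nearest integer, matching n ~ 16

energy_45nm = neuron_energy_to_solution(40e-12, pulses, n, N)  # ~40 pJ per spike
energy_25nm = neuron_energy_to_solution(2e-12, pulses, n, N)   # ~2 pJ per spike

print(f"pulses = {pulses}, n = {n_real:.2f}")
print(f"energy (rdev ~ 45 nm) = {energy_45nm * 1e9:.1f} nJ")
print(f"energy (rdev ~ 25 nm) = {energy_25nm * 1e9:.2f} nJ")
```

The real-valued solution for n is slightly above 16, so a strict ceiling would give 17 repetitions; either choice leaves the estimates at roughly 200 nJ and 10 nJ.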
