
Japanese Journal of Applied Physics

PROGRESS REVIEW • OPEN ACCESS

Ta/HfO2 memristors: from device physics to neural networks

To cite this article: Hao Jiang et al 2022 Jpn. J. Appl. Phys. 61 SM0802


Japanese Journal of Applied Physics 61, SM0802 (2022) PROGRESS REVIEW
https://doi.org/10.35848/1347-4065/ac665d

Ta/HfO2 memristors: from device physics to neural networks


Hao Jiang1*, Can Li2*, and Qiangfei Xia3*
1 Frontier Institute of Chip and System, Fudan University, Shanghai, People’s Republic of China
2 Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, People’s Republic of China
3 Department of Electrical and Computer Engineering, University of Massachusetts Amherst, Amherst, MA 01003, United States of America
* E-mail: haoj@fudan.edu.cn; canl@hku.hk; qxia@umass.edu
Received January 26, 2022; revised March 13, 2022; accepted April 10, 2022; published online June 21, 2022

Hardware implementation of neural networks with memristors can break the “von-Neumann bottleneck,” offer massive parallelism, and hence
substantially boost computing throughput and energy efficiency. In this review, we first explain the design principles and switching mechanism of a
Ta/HfO2 memristor. We show that the device meets most key requirements on device properties for in-memory computing. We then introduce the
integration of the memristor with foundry-made metal-oxide-semiconductor transistors and the programming of the one-transistor-one-resistance
switch (1T1R) arrays. We demonstrate that the crossbar arrays can be used in various neural networks. Finally, we discuss the remaining
challenges of scaling up the memristive neural networks for larger scale real-world problems. © 2022 The Author(s). Published on behalf of The
Japan Society of Applied Physics by IOP Publishing Ltd

1. Introduction
Conventional digital computers based on the von-Neumann architecture require frequent data transmission between the physically separated processing and memory units. Their basic building block, the metal-oxide-semiconductor field-effect transistor, is approaching its scaling limit. As a result, they have limited energy efficiency and speed, especially for data-centric applications such as artificial intelligence. Hardware built with emerging devices that compute at the site where data is stored by using physical laws offers an attractive solution to these issues. A memristor, a two-terminal resistance switch whose behavior is based on ionic motion under an electric field, is one of the promising candidate devices.1–3) Built into large-scale arrays, memristors perform in-memory computing with their multilevel conductance states as the synaptic weights.4–21) In a crossbar array, a current is generated at each cell according to Ohm’s Law when voltages are applied at the row wires, implementing multiplication in the physical domain. The currents along each column wire sum up according to Kirchhoff’s Current Law, realizing the summation operation. In this way, the vector-matrix multiplication (VMM), an essential but resource-hungry computing operation in various artificial neural networks, can be performed efficiently by memristor crossbar arrays in one step.
Although computing hardware based on memristors is promising for a broad spectrum of applications such as computer vision, speech recognition, and autonomous vehicles, building such hardware is a challenging task.17) Going from a discrete device to a large-scale array and eventually to a memristive neural network capable of solving real-world problems (Fig. 1) requires multidisciplinary expertise and a deep understanding of device physics, circuits, architecture, and algorithms. From the device property perspective, a reasonably wide conductance range with multiple conductance states is required to represent analog synaptic weights in many neural network algorithms. Each conductance state should be stable against drift with time or temperature, an essential requirement for inference operations. The device must be able to switch enough times to provide adequate conductance programming cycles during training. The preferred operational conductance should be low to reduce the power consumption. At the same time, a linear current–voltage (IV) relationship is desired so that Ohm’s Law can be used directly for the VMM. Moreover, linear and symmetric weight updating is critical for efficient in situ training of neural networks, which means the device conductance should increase or decrease at the same pace when external stimuli of the same amplitude but opposite polarity are applied.
Herein we review our progress in developing a Ta/HfO2 memristor and its in-memory computing applications. We designed the device based on the requirements in critical metrics such as multilevel conductance, retention, and endurance. We proposed that the continuous modulation of the chemical composition in the conduction channel and the thermodynamic properties of the mobile species are responsible for the device’s behavior. We built a large-scale 1T1R array by back-end-of-the-line integration of the memristors with foundry-made transistor arrays. We introduced the programming and operation of the array from the perspectives of both semiconductor device physics and mixed-signal circuits. We implemented memristive neural networks with software- and hardware-based activation functions and demonstrated their applications in typical machine learning tasks. Finally, we discussed the challenges and opportunities of the emerging hardware and identified some key areas that need intensive research and development to push memristive neural networks forward as a low-power hardware platform for larger scale real-world machine learning applications.

2. Device design principles
To design a memristive device with multiple resistance states that can be tuned in an analog fashion, it is essential to understand the fundamental mechanism underpinning the switching behavior. Figure 2 schematically illustrates three

Content from this work may be used under the terms of the Creative Commons Attribution 4.0 license. Any further distribution of this
work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

Fig. 1. (Color online) The two-terminal memristor device, usually with a switching layer sandwiched between two electrodes, has multilevel conductance
modulated by an electric stimulus. Organized into a crossbar architecture, they can perform analog vector-matrix multiplication within the array using physical
laws. With necessary supporting peripheral circuits, they implemented a neural network that can solve real-world problems.17)
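The crossbar operation summarized in Fig. 1 can be illustrated with a minimal numerical sketch. It assumes an ideal array (no wire resistance, no sneak paths, perfectly linear cells); the function name `crossbar_vmm` and all voltage and conductance values are hypothetical and chosen only for illustration.

```python
# Toy model of an ideal memristor crossbar performing a vector-matrix
# multiplication. Row wires carry input voltages; each column wire collects
# the summed cell currents. All numbers are illustrative assumptions.

def crossbar_vmm(voltages, conductances):
    """Return column currents I_j = sum_i V_i * G_ij."""
    n_rows = len(conductances)
    n_cols = len(conductances[0])
    if len(voltages) != n_rows:
        raise ValueError("one input voltage is needed per row wire")
    currents = [0.0] * n_cols
    for i in range(n_rows):
        for j in range(n_cols):
            # Ohm's Law gives the cell current V_i * G_ij;
            # Kirchhoff's Current Law sums the cell currents along column j.
            currents[j] += voltages[i] * conductances[i][j]
    return currents

# Hypothetical 3 x 2 array, conductances in siemens.
G = [[100e-6, 900e-6],
     [500e-6, 300e-6],
     [200e-6, 400e-6]]
V = [0.1, 0.2, 0.0]  # hypothetical read voltages in volts

column_currents = crossbar_vmm(V, G)
print([f"{i * 1e6:.1f} uA" for i in column_currents])
```

In hardware all cell currents flow at the same time, so the nested loop above corresponds to a single physical read step.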

Fig. 2. (Color online) The schematics with exemplary TEM images showing three typical mechanisms responsible for conductance tuning in conduction-
channel based memristors, including the modulation of the width of a conduction-channel, the gap between the channel and an electrode, and the composition
of the conduction-channel. Reprinted from Ref. 17.

typical resistance switching mechanisms in conduction-channel-based memristors.17) The device conductance could be changed by modulating the diameter of a conduction channel, e.g. in the conduction-bridge type memristor by applying different levels of compliance.22) However, in such a case, the reset step is usually strong and abrupt due to the existence of a metallic bridge. Alternatively, efficient continuous conductance tuning could be achieved by modulating the width of a tunneling gap between the conduction channel and an electrode.23–25) Since the conduction mechanism is then tunneling-dominated, the IV relationship of each state is nonlinear. Consequently, the most promising approach to obtaining multiple conductance states with linear IV behavior is the modulation of the conduction-channel composition.26–28) In choosing the matrix material, a simple binary system with only a conductive and an insulating phase at the switching temperature is beneficial for higher endurance.26) Based on these principles, we designed a Ta/HfO2 memristor device that operates on the motion of mobile species (oxygen vacancies and tantalum ions) in an amorphous HfO2 matrix, with the analog switching behavior enabled by the modulation of the Ta:O ratio in the localized conduction channel.
The composition change during the switching is confirmed with electron energy loss spectroscopy (EELS) of a sub-10 nm Ta-rich and O-deficient conduction channel. As shown in Fig. 3, the composition analysis of the conduction channel suggests that Ta has moved into the HfO2 matrix and that oxygen vacancies also play a critical role. Consequently, the electronic conduction mechanisms at the high-resistance state (HRS) and low-resistance state (LRS) are distinctly different. As shown in Fig. 4(a), both LRS and HRS showed a linear I–V relationship at low voltages, indicating no tunneling gap between the electrodes and the conduction channel. This differs from the well-studied TiOx-based devices, where a tunneling gap was present and responsible for the switching behavior. Although a linear I–V relationship can also be achieved for a metal/insulator/metal (MIM) junction when the applied voltage is very small (≈0 V or $\ll \varphi/e$, where $\varphi$ is the barrier height and $e$ is the charge of the electron),29) this is highly unlikely in our case since the current is relatively high for our device even at HRS.25) We measured the device resistance as a function of temperature at both HRS and LRS. The device at LRS showed a typical metal-like behavior as its resistance linearly increased with temperature. In contrast, the device at HRS showed a typical behavior for non-metallic materials in that the resistance decreased with temperature [Fig. 4(b)]. The measured temperature coefficients of resistance (TCR) were $8.75 \times 10^{-4}$/K and $-4.37 \times 10^{-4}$/K for LRS and HRS, respectively. The different signs of the TCR at LRS and HRS suggest that the modulation of the chemical composition in the conduction

Fig. 3. (Color online) Direct observation of a Ta-rich and O-deficient conduction channel. (a) Comparison of core-loss EELS spectra collected at the pristine
HfO2 layer, conduction channel region and Ta electrode. It indicates the conduction channel is Ta-rich. (b) O-K edge EELS spectra taken at three areas, which
clearly show the conduction channel is also O-deficient.28)


Fig. 4. (Color online) (a) Linear IV curves for the device at both LRS and HRS. (b) The dependence of the normalized resistance change
(ΔR = (R−R(300 K))/R(300 K)) on temperature, from which the TCR is measured to be $8.75 \times 10^{-4}$/K for LRS and $-4.37 \times 10^{-4}$/K for HRS. Reprinted from Ref. 28.
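As a small worked example of the normalized resistance change defined in the Fig. 4 caption, the sketch below evaluates ΔR at a chosen temperature from the two reported TCR values. It assumes a strictly linear TCR model around 300 K for both states, which the text only states explicitly for LRS; the function name and the 350 K evaluation point are illustrative choices.

```python
# Normalized resistance change from a linear TCR model (an assumption):
#   dR = (R(T) - R(300 K)) / R(300 K) = TCR * (T - 300 K)

TCR_LRS = 8.75e-4    # 1/K, value reported for the low-resistance state
TCR_HRS = -4.37e-4   # 1/K, value reported for the high-resistance state
T_REF = 300.0        # K, reference temperature used in the caption

def normalized_resistance_change(tcr, temperature_k, t_ref=T_REF):
    """Return dR for a given TCR and temperature, assuming linearity."""
    return tcr * (temperature_k - t_ref)

for label, tcr in (("LRS", TCR_LRS), ("HRS", TCR_HRS)):
    delta = normalized_resistance_change(tcr, 350.0)
    trend = "metal-like, R rises" if delta > 0 else "non-metallic, R falls"
    print(f"{label}: dR at 350 K = {delta:+.4f} ({trend})")
```

The opposite signs of the two results mirror the argument in the text that the two states conduct through chemically different channels.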

channel rather than the size should be responsible for the switching in our Ta/HfO2 memristor. The continuous modulation of the conduction-channel composition leads to multiple resistance states. It should be noted that if the reset step is too strong, we may achieve an HRS with a much lower conductance but poor IV linearity, which is outside the conductance range of interest in our current studies.
Since the most suitable application for a memristor-based AI accelerator is inference, we chose mobile species with reasonably high activation energy in our device design. Table I lists the activation energy of a few ionic species in different transition metal oxides. Higher activation energy means the mobile species will not move around once set into a particular position, suggesting high stability of the resistance state. Although a relatively higher program voltage may be needed during programming, an inference system is not trained intensively, so the overall energy footprint is still low.

3. Electrical properties of the Ta/HfO2 memristors
Our Ta/HfO2 memristor has a simple MIM structure consisting of an inert metal Pt (or Pd/Ru) as the bottom electrode (BE) and Ta as the top electrode (TE). A 5 nm HfO2 blanket layer prepared by atomic layer deposition is sandwiched in between as the switching material. The device exhibits typical resistance switching behavior under quasi-DC voltage sweeps after an electrical forming step, turning on when a positive voltage is applied to the Ta electrode and off when the voltage polarity switches [Fig. 5(a)]. With a smaller device size, the operational current of the device is much reduced [Fig. 5(b)]. We first achieved multiple conductance states from the Ta/HfO2 memristor using different compliance currents and stop voltages during SET and RESET, respectively, as shown in Fig. 5(c). In addition, like the potentiation/depression behavior of a biological synapse, we can gradually increase and decrease the device conductance with a train of electrical pulses. For example, Fig. 5(d) plots the conductance change of the Ta/HfO2 memristor in response to 39 electric pulses (pulse width: 100 ns), including 26 consecutive positive pulses with the amplitude increased from 0.75 to 1 V, and 13 consecutive negative ones with the amplitude changed from −1.05 to −1.17 V (step size: 10 mV).
The different resistance states are all stable even at elevated temperatures. We programmed the device into eight states using different compliance currents to examine the stability of different conductance states. They showed no evident current fluctuation or drift for over $10^{4}$ s at 150 °C [Fig. 6(a)]. To fully evaluate the retention properties, temperature (T) dependent retention measurements were performed, and the device failure time (t) at HRS was

Table I. The activation energy of typical mobile species in different oxide matrices.

Mobile species          Ag     Ag     W       VO     VO        VO       VO/Ta2+
Host material           SiOx   a-Si   WOx     TaOx   HfO2      HfAlOx   HfO2
Activation energy (eV)  0.27   0.87   0.6–1   1.6    0.7–1.8   1.09     1.55
References              30     31     32      33     34        35       28

Fig. 5. (Color online) (a) Typical IV curves for the Ta/HfO2 device by quasi-DC voltage sweeps (step size: 50 mV) after an electrical forming step (at 2 V)
for a 10 μm device and (b) 100 nm device. The black arrows indicate the switching polarity. (c) Analog resistance tuning in the 10 μm device achieved by
using quasi-DC voltage sweeps with different compliance currents. (d) The gradual modulation of the device conductance can also be achieved using pulse
train consisting of 26 positive pulses (100 ns, 0.75 to 1 V, 10 mV step) and 13 negative pulses (100 ns, −1.05 to −1.17 V, 10 mV step), akin to the potentiation
and depression behavior of a biological synapse.28) The results of (c) and (d) are from the 10 μm devices.


Fig. 6. (Color online) (a) Retention of eight different conductance states at 150 °C for over $10^{4}$ s. (b) The fitting plot of measured HRS retention time at
250 °C, 275 °C, 300 °C, 325 °C and 350 °C with the Arrhenius equation (red line). The activation energy of mobile species (Ea) is extracted to be 1.55 eV and
the extrapolated retention time is 70258 years at 85 °C and 10 years at 162 °C. Reprinted from Ref. 28. The results here were measured from the 10 μm devices.
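The Arrhenius analysis behind Fig. 6(b) can be sketched as follows, using the five HRS failure times reported in this section at 250 to 350 °C. This is a plain unweighted least-squares fit of ln(t) against 1/(k_B T), which is an assumption about the fitting procedure; it should give an activation energy and extrapolated retention of the same order as the quoted values, but not necessarily the identical numbers. The function names are hypothetical.

```python
import math

K_B = 8.617333e-5  # Boltzmann constant in eV/K

# Temperatures and HRS failure times reported in this section.
temps_c = [250, 275, 300, 325, 350]
fail_s = [2.7e5, 7.5e4, 1.4e4, 2.7e3, 1.3e3]

def fit_arrhenius(temps_c, times_s):
    """Least-squares fit of ln(t) = ln(t0) + Ea / (k_B * T); returns (Ea, ln t0)."""
    xs = [1.0 / (K_B * (tc + 273.15)) for tc in temps_c]
    ys = [math.log(t) for t in times_s]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    ea = sxy / sxx                 # slope is the activation energy in eV
    ln_t0 = y_mean - ea * x_mean   # intercept
    return ea, ln_t0

def retention_years(temp_c, ea, ln_t0):
    """Extrapolate the fitted line to a given temperature."""
    seconds = math.exp(ln_t0 + ea / (K_B * (temp_c + 273.15)))
    return seconds / (365.25 * 24 * 3600)

ea, ln_t0 = fit_arrhenius(temps_c, fail_s)
print(f"Ea ~ {ea:.2f} eV")
print(f"retention at 85 C ~ {retention_years(85, ea, ln_t0):.1e} years")
```

Because the extrapolation is exponential in Ea, small changes in the fitted slope shift the 85 °C retention estimate considerably, so only the order of magnitude is meaningful.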

$2.7 \times 10^{5}$, $7.5 \times 10^{4}$, $1.4 \times 10^{4}$, $2.7 \times 10^{3}$, and $1.3 \times 10^{3}$ s at 250, 275, 300, 325, and 350 °C, respectively [Fig. 6(b)]. The Arrhenius equation fits the t–T relation well. The extrapolated retention time was $7 \times 10^{4}$ years at 85 °C, and beyond 10 years even at 162 °C. The exceptional stability was attributed to the relatively high activation energy of the mobile species (1.55 eV) extracted for our Ta/HfO2 device when compared with other material systems, as listed in Table I.

Fig. 7. (Color online) Over $10^{11}$ digital switching cycles achieved from the Ta/HfO2 memristor with pulses of 1.3 V/100 ns for SET and −3.05 V/100 ns for RESET. The device states were read at 0.1 V. Reprinted from Ref. 28. The results here were measured from a 10 × 10 μm$^2$ device.

The device also exhibits high endurance, ensuring sufficient programming cycles for training in many applications. As shown in Fig. 7, over $1.2 \times 10^{11}$ open-loop digital switching cycles were achieved from the device without any feedback or power-limiting circuits, which is the highest reported endurance for a single-oxide-layer memristive

device to the best of our knowledge. The high endurance likely results from the simple material system with only two stable solid phases (conductive and insulating) at the temperature at which switching occurs.36,37) We expect the endurance to be improved if measured with a closed-loop system equipped with a feedback mechanism for the applied pulses.
The IV linearity of the device depends on the resistance range. The memristor device has a nonlinear IV behavior, especially in the high-resistance range, and the linear IV usually occurs at high conductance states. To utilize Ohm’s Law directly for multiplication, we choose an operation range of 1–11 kiloohms for the proof-of-concept VMM demonstration. Our device exhibits IV linearity in this range (Fig. 8). It is worth pointing out that the current level is higher than we wanted but sufficient for operating the demo array. Future work is needed to reduce the current level by one or even two orders of magnitude to suit the array for much larger datasets.

Fig. 8. (Color online) Linear IV occurs at a relatively higher conductance range for the device. Future engineering will have to reduce the current but still maintain other properties for VMM.6) The results here were measured by quasi-DC sweeps with a step size of 50 mV. Device size: 4 × 4 μm$^2$.

4. Programming of 1T1R single cells and arrays
Access devices or selectors are necessary to precisely program individual memristors and avoid the sneak path problem in large-scale arrays. Among the reported options, transistors are the most mature ones to date. The targeted memristors can be selected by turning on/off the serially connected transistor. By controlling the applied gate voltage of the transistor, a precisely controlled voltage can be delivered to the memristor for high precision programming. During inference, all transistors are turned on, and the VMM of the input voltage vector and the conductance matrix stored in the memristor array can be performed in parallel in one step. On the other hand, a two-terminal selector is expected to yield a smaller cell footprint. However, it still requires further device engineering on performance metrics, including nonlinearity, endurance, speed, current density, and variation among devices and cycles. For example, the endurance requirement for selectors is even higher than that for memristors since selectors must be switched during both inference and training. Furthermore, the lack of a designated control port makes it inconvenient to keep all two-terminal selectors in a large array in the on state during computing, since each connected memristor could be at a different conductance state. As a result, the one-selector-one-resistance switch (1S1R) architecture is unsuitable, if at all possible, for computing purposes, in particular in the analog domain.
4.1. Programming of a single 1T1R cell
Figure 9(a) shows the schematics of a 1T1R single cell with the memristor BE connected to the source of an NMOS transistor and the body/substrate grounded. During SET, a positive voltage of different amplitudes (VTE) is applied to the memristor TE with the transistor drain grounded (VBE). As mentioned earlier, positive gate voltages (VGate) of different amplitudes can enable various current compliances from the access transistor. To RESET the memristor in the 1T1R cell, instead of applying a negative VTE as widely practiced in individual bipolar memristors, a positive VBE is applied to the drain, with the TE grounded. With this scheme, the PN junctions between the source or drain and the substrate are always reverse-biased, avoiding undesired current leakage to the silicon bulk [Fig. 9(b)]. Using only positive voltages for programming is also consistent with the availability of voltage polarity in an integrated circuit, in which the voltages are usually positive.

Fig. 9. (Color online) (a) The schematic illustration of a 1T1R single cell and (b) the band diagrams under different bias conditions. To avoid the transistor’s body effect, a proper voltage polarity needs to be chosen during the programming. In this case, a positive VTE sets the device while a positive VBE resets the device. The PN junctions between the source or drain and the silicon bulk are reverse-biased, avoiding leakage current to the bulk.

4.2. Programming scheme for ex situ training of a 1T1R array
For inference purposes, the conductance of each memristor in the 1T1R array must be programmed to the target value with high precision. A write-verify scheme [Fig. 10(a)] is adopted for this purpose. After each writing cycle, the device conductance is read out and compared with the targeted value to determine if a set or reset step is required next. If the previous pulse does not increase the device conductance enough towards the target, the VTE amplitude will be raised for the next pulse. On the other hand, if the previous pulse increases the conductance over the target, a reset pulse will be applied on the BE, and the transistor gate voltage is increased to allow for an efficient conductance reduction. Each cell in the array will be programmed with multiple read/write cycles until the difference between the final and target conductance (Gfinal – Gtarget) is within the defined tolerance. Figure 10(b) plots a histogram of Gfinal – Gtarget for the 8,192 devices in a 128 × 64 1T1R array. It exhibits a normal distribution centering at −4.7 μS with a standard deviation

Fig. 10. (Color online) (a) A typical example of programming our Ta/HfO2 memristor to a desired conductance with the feedback tuning algorithm.9) (b) The
histogram of the writing error (the difference between target conductance value and final value) with a standard deviation of 6 μS when the writing tolerance of
±10 μS was used. (c) Experimental conductance writing result of the discrete-cosine transform (DCT) pattern into a 64 × 64 array. Device size: 4 × 4 μm2.
Reprinted from Ref. 6.
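The write-verify (feedback tuning) loop described in Sect. 4.2 and Fig. 10(a) can be sketched as below. The class `ToyMemristor` is a hypothetical stand-in for a real 1T1R cell: its response coefficients, voltage limits, and ±20% pulse-to-pulse spread are assumptions made only so the loop can run, and the loop raises the pulse amplitude after every unsuccessful pulse, which is a simplification of the scheme in the text. Only the ±10 μS tolerance and the 100–900 μS window are taken from the source.

```python
import random

class ToyMemristor:
    """Hypothetical cell model; conductance is tracked in microsiemens."""

    def __init__(self, g_init=100.0, seed=0):
        self.g = g_init
        self._rng = random.Random(seed)

    def read(self):
        return self.g

    def set_pulse(self, v_te):
        # Assumed response: growth scales with the overdrive above 0.5 V.
        step = 40.0 * max(v_te - 0.5, 0.0) * self._rng.uniform(0.8, 1.2)
        self.g = min(self.g + step, 900.0)

    def reset_pulse(self, v_gate):
        # Assumed response: a larger gate voltage allows a stronger reset.
        step = 20.0 * max(v_gate - 0.5, 0.0) * self._rng.uniform(0.8, 1.2)
        self.g = max(self.g - step, 100.0)

def write_verify(device, g_target, tol=10.0, max_pulses=200):
    """Read, compare, and pulse until |G - G_target| <= tol.

    Returns (number of pulses applied, final conductance).
    """
    v_te, v_gate = 0.6, 0.6  # starting amplitudes in volts (assumed)
    for pulses in range(max_pulses):
        error = device.read() - g_target
        if abs(error) <= tol:
            return pulses, device.read()
        if error < 0:
            # Below target: apply a set pulse, then raise V_TE for next time.
            device.set_pulse(v_te)
            v_te = min(v_te + 0.05, 1.5)
            v_gate = 0.6
        else:
            # Above target: apply a reset pulse, then raise the gate voltage.
            device.reset_pulse(v_gate)
            v_gate = min(v_gate + 0.05, 1.5)
            v_te = 0.6
    return max_pulses, device.read()

device = ToyMemristor(seed=1)
pulses, g_final = write_verify(device, g_target=500.0)
print(f"{pulses} pulses, final G = {g_final:.1f} uS")
```

A narrower `tol` or a larger `max_pulses` trades programming time for precision, in line with the discussion of equivalent bit precision in this section.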

(σ) of 6 μS when the writing tolerance of ±10 μS was used to program the array. The small σ value suggests that more than 64 conductance levels or 6 bits of digital precision can be achieved from our Ta/HfO2 devices within a device conductance range of 100–900 μS, sufficient for many edge computing tasks. The equivalent bit-precision, primarily limited by the intrinsic device noise, could be further increased by defining a narrower tolerance during programming and/or using a larger number of closed-loop iteration cycles. With the write-verify scheme, arbitrary conductance maps corresponding to different algorithms can be written into the 1T1R array with high precision. For example, Fig. 10(c) shows the conductance map of a 64 × 64 array after the discrete-cosine transform (DCT) pattern was successfully written.
4.3. Programming scheme for in situ training of a 1T1R array
An inference system can afford multiple pulses at each cell during programming because, once trained, the system can be used for a long time, which keeps the overall energy cost reasonable. However, for a system that needs frequent training, one-shot programming is desired to reduce power consumption. Furthermore, to better map machine learning algorithms into the conductance map of the array, linear and symmetric weight updating is essential. This is difficult for most two-terminal memristive devices because of the intrinsic device nonlinearity and because the conductance change is history-dependent. With the 1T1R architecture, linear and symmetric weight updating with one-shot programming becomes feasible. As shown in Fig. 11(a), two synchronized electric pulses (VTE/VBE to memristors and Vgate to transistors) are applied, with the VTE staying the same while the gate voltage increases in each set cycle. For the reset operation, a large reset pulse is first applied on the BE with the transistor gate open, and the following conductance tuning is implemented with the VTE and a decreasing transistor gate voltage [Fig. 11(b)]. With this scheme, the device conductance can be linearly increased with the gate voltage for Vgate between 0.6 and 1.6 V [Fig. 11(c)]. The linear and symmetric weight updating with our developed scheme also reduces cycle-to-cycle and device-to-device variations [Figs. 11(d), 11(e)], demonstrating that it is a reliable scheme for in situ training.
A simple model can be built to understand the linear and symmetric weight updating.38) When the voltage across a memristor ($V_{mem} = \frac{R_{mem}}{R_{mem} + R_{trans}} V_{Appl}$) in the 1T1R cell drops below a threshold voltage ($V_{min}$), the switching process terminates. Here, $R_{mem}$, $R_{trans}$, and $V_{Appl}$ are the memristor resistance, the transistor channel resistance, and the total applied voltage, respectively. The assumption is similar to the theory proposed by Ulrich B. et al. recently.39) After the 1T1R


Fig. 11. (Color online) The developed two-pulse scheme to gradually (a) set and (b) reset the device. (c) 20 cycles of linear and symmetric weight updating
from a representative cell and each cycle contains 200 pulses. (d) Single such weight updating cycle collected from all responsive devices in the array and the
median conductance is indicated by the yellow line. (e) Over 20 such weight updating cycles from all responsive devices in the array. Device size: 4 × 4 μm2.
Reprinted from Ref. 7.

system stabilized (i.e. $V_{mem} = V_{min}$), we have $R_{mem} = R_{trans} / \left( \frac{V_{Appl}}{V_{min}} - 1 \right)$ or equivalently $G_{mem} = G_{trans} \left( \frac{V_{Appl}}{V_{min}} - 1 \right)$, where $G_{mem}$ and $G_{trans}$ are the memristor and transistor channel conductance. As $V_{Appl}/V_{min}$ is a constant, the final $G_{mem}$ should be linearly dependent on $G_{trans}$ and hence on the gate voltage when VGate > Vth (transistor threshold voltage).
It should be noted that with the one-shot programming, the conductance tuning may not be as accurate as in the write-verify approach. However, the advantage in energy consumption is evident. Furthermore, for machine learning algorithms based on stochastic gradient descent optimization, the error in programming will be compensated by the neighboring cells, as shown by our previous studies,7) suggesting the importance of device-circuit-architecture-algorithm co-design. The device requirements for in situ training are more relaxed than those for inference or a memory system, indicating that the memristor device is a better candidate for neural computing than memory applications, as the device variation is considered a roadblock for their wide adoption in memory products.
4.4. Parallel programming of the 1T1R array
It is critical to develop schemes that can program the array with high throughput. With the 1T1R configuration, it is possible to set or reset a crossbar array in a row-by-row or column-by-column fashion. Figure 12 illustrates how such parallel programming is implemented. During the reset process, a reset pulse is applied to the BEs of the entire column with all the transistors in the same column open. The voltage on the TEs dictates which memristors will be reset. For a set process, on the other hand, a voltage pulse is applied to all the TEs of memristors on the same row, and the gate voltages for the serial transistors vary, leading to the tuning of the conductance states of the memristors on the same row simultaneously.

5. Memristive neural networks
The superior energy efficiency and computing throughput enabled by physical computing in the crossbar make the 1T1R array an ideal module for neural networks, especially for the VMM operations that are usually resource-hungry in


Fig. 12. (Color online) Column or row-wise parallel programming of the 1T1R crossbar array. (a) During reset, a reset pulse is applied to the bottom
electrodes of an entire column simultaneously, while the top-electrode voltages control which devices are reset. (b) For the set process, a set pulse is applied on
the top electrodes of an entire row, and the different transistor gate voltages determine how much each memristor’s conductance is changed.7)
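The voltage-divider model of Sect. 4.3 and the row-wise set of Fig. 12(b) can be combined into a small sketch: given target conductances for the cells on one row, pick one gate voltage per cell. The steady-state relation G_mem = G_trans (V_Appl/V_min − 1) is taken from the text; the linear transistor model G_trans = k (V_gate − V_th) and every numerical value (`V_APPL`, `V_MIN`, `K_TRANS`, `V_TH`) are assumptions for illustration only.

```python
# One-shot set model: switching stops when the memristor voltage falls to
# V_min, which fixes G_mem = G_trans * (V_appl / V_min - 1).

V_APPL = 2.5      # V, total voltage across the 1T1R cell (assumed)
V_MIN = 1.0       # V, voltage below which switching stops (assumed)
K_TRANS = 600e-6  # S/V, slope of channel conductance vs. gate voltage (assumed)
V_TH = 0.5        # V, transistor threshold voltage (assumed)

def g_trans(v_gate):
    """Assumed linear-region channel conductance; zero below threshold."""
    return K_TRANS * max(v_gate - V_TH, 0.0)

def g_mem_final(v_gate):
    """Steady-state memristor conductance after a one-shot set pulse."""
    return g_trans(v_gate) * (V_APPL / V_MIN - 1.0)

def gate_voltage_for(g_target):
    """Invert the model to choose a gate voltage for a target conductance."""
    return V_TH + g_target / (K_TRANS * (V_APPL / V_MIN - 1.0))

# Row-wise parallel set: one shared TE pulse, one gate voltage per column.
row_targets = [200e-6, 450e-6, 800e-6]  # siemens, hypothetical targets
row_gates = [gate_voltage_for(g) for g in row_targets]
for g, vg in zip(row_targets, row_gates):
    print(f"target {g * 1e6:.0f} uS -> V_gate = {vg:.3f} V")
```

Under these assumptions equal gate-voltage increments produce equal conductance increments, which is the linearity that the one-shot scheme relies on.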

most typical neural network topologies. However, such neural networks are mixed-signal systems, which would not be possible without peripheral circuitry such as DACs (digital-analog converters) for generating driving pulses and ADCs (analog-digital converters) for collecting the measurement data. Sample and hold circuits may be used to synchronize all the output signals better, and trans-impedance amplifiers to convert the sensed current into a voltage signal. Figure 13 shows an exemplary system that also has a digital computer and a microcontroller, in which the computer is used to run scripts from different algorithms, and the microcontroller controls all the circuit components. These peripheral circuit components are critical to ensure the synchronization of the signals (such as the voltages on the memristor electrodes and transistor gates), and a professional

Fig. 13. (Color online) An exemplary system consisting of a 1T1R memristor crossbar array and peripheral circuitry that supports the operation. The proper
design of such a system calls for both device knowledge and mixed-signal circuit expertise. Reprinted from Ref. 7.

design is required to utilize the advantages of both the traditional and emerging technologies.

To implement neural networks with more than one layer, the data flow is a critical issue. Previously we have used different partitions of the same array as separate layers in a neural network. For example, by partitioning a 128 × 64 array into two layers, we built a two-layer perceptron that was used to classify MNIST handwritten digits with high accuracy.7) By partitioning the array into three fully connected layers, we implemented the reinforcement learning algorithm.12) Similar techniques were adopted to demonstrate recurrent neural networks with long short-term memory units for time-sequence data analysis8) and convolutional recurrent neural networks.40)

For the aforementioned multilayer networks, the output of one layer should be nonlinearly activated before being fed to the subsequent layer. This could be implemented in software, in which the analog signals from the earlier layer are converted to digital before they can be nonlinearly activated. This step is usually a bottleneck that limits the system performance. To address this issue, a two-layer perceptron with both hardware neurons and synapses was designed and built (Fig. 14).41) In this proof-of-concept system, two 128 × 64 1T1R arrays were used as the first and second fully connected layers in the perceptron, connected by a stack of printed circuit boards (PCBs) that contain the hardware neurons built with off-the-shelf electronic components. The fully hardware-based neural network was used to demonstrate high-accuracy object classification. By substantially reducing the resource-demanding data shuttling and analog-digital conversions, it delivers much-improved power and area efficiencies, as estimated based on the 65 nm CMOS node.

6. Summary and perspective

Guided by fundamental physical principles and the device properties required for computing, we have developed a Ta/HfO2 memristor that meets most of the requirements for in-memory computing in artificial neural networks. The device exhibits stable multilevel conductance states, analog tunability, high endurance, long retention, and decent I-V linearity within certain conductance ranges. Integrated with a transistor into a 1T1R crossbar array, the conductance of each cell can be precisely tuned for an inference system and modulated linearly and symmetrically with one-shot programming for an online training system. For resource-demanding tasks such as the vector-matrix operation, the 1T1R memristor crossbar array improves energy efficiency and computing throughput, owing to the capability of parallel analog computing in the physical domain.
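The parallel analog vector-matrix operation can be illustrated with a short numerical sketch. It assumes an ideal crossbar (no wire resistance, perfectly linear cells) in which each column current follows Ohm's law and Kirchhoff's current law, and it assumes that a signed weight is represented by the difference between two conductances on different rows of the same column, driven by voltages of opposite sign. The function names and all numerical values are hypothetical and chosen only for illustration; they are not taken from the measured devices.

```python
def column_currents(row_voltages, conductances):
    """Ideal crossbar readout: I_j = sum_i V_i * G[i][j] (Ohm + Kirchhoff)."""
    n_cols = len(conductances[0])
    return [
        sum(v * g_row[j] for v, g_row in zip(row_voltages, conductances))
        for j in range(n_cols)
    ]


def differential_vmm(inputs, g_pos, g_neg):
    """Signed vector-matrix product using two cells per weight.

    Assumption: each weight is proportional to g_pos[i][j] - g_neg[i][j],
    with the two cells placed on different rows of the same column.
    """
    # Rows holding g_pos are driven with +v, rows holding g_neg with -v.
    row_voltages = list(inputs) + [-v for v in inputs]
    # Stacking the two conductance blocks doubles the number of rows.
    return column_currents(row_voltages, g_pos + g_neg)


# Hypothetical values: read voltages in volts, conductances in siemens.
inputs = [0.1, 0.2]
g_pos = [[100e-6, 50e-6], [20e-6, 80e-6]]
g_neg = [[40e-6, 90e-6], [60e-6, 10e-6]]
currents = differential_vmm(inputs, g_pos, g_neg)  # one current per column, in amperes
```

In this idealized picture every multiply-accumulate of a column happens in a single read step, and the physical array has twice as many rows as logical inputs.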

Fig. 14. (Color online) A two-layer all hardware perceptron. The upper panel shows the circuit schematic of the perceptron. The lower panel shows the
optical images of a 1T1R crossbar for the first layer, a stack of PCBs of the hidden neurons (ReLUs), and a second crossbar for the second fully connected
layer. The hardware neurons contain a total of 64 channels of ReLU activation functions and 64 inverters to support 64 differential pairs for the second layer.
Scale bar, 1 mm. Reprinted from Ref. 41.
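The signal path in Fig. 14 can be sketched numerically as follows. This is a scaled-down illustration, not the authors' implementation: it assumes ideal crossbars, an ideal trans-impedance stage with a hypothetical gain `gain` (in V/A) in front of each ReLU, and ideal inverters that supply the negated copy of every hidden output to the second crossbar. All names and values are assumptions made for the example.

```python
def column_currents(row_voltages, conductances):
    """Ideal crossbar readout: I_j = sum_i V_i * G[i][j]."""
    n_cols = len(conductances[0])
    return [
        sum(v * g_row[j] for v, g_row in zip(row_voltages, conductances))
        for j in range(n_cols)
    ]


def hidden_stage(currents, gain):
    """Convert currents to voltages, apply ReLU, then append inverted copies.

    The inverted copies drive the second cell of each differential pair
    in the next layer, so the output has twice as many entries as the input.
    """
    activated = [max(0.0, gain * i) for i in currents]
    return activated + [-a for a in activated]


def two_layer_forward(row_voltages, g_layer1, g_layer2, gain):
    """Crossbar 1 -> hardware neurons (ReLU + inverters) -> crossbar 2."""
    hidden_currents = column_currents(row_voltages, g_layer1)
    hidden_voltages = hidden_stage(hidden_currents, gain)
    return column_currents(hidden_voltages, g_layer2)


# Hypothetical toy network: 2 rows x 2 columns, then 4 rows x 1 column.
g1 = [[100e-6, 20e-6], [50e-6, 80e-6]]
g2 = [[90e-6], [30e-6], [40e-6], [70e-6]]
outputs = two_layer_forward([0.2, -0.2], g1, g2, gain=1e4)
```

With 64 columns in the first array, the same structure yields 64 ReLU outputs plus 64 inverted copies, which matches the 128 rows of the second array described in the caption.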

Despite the demonstrated potential in implementing various neural networks, extensive future work spanning from fundamental device physics to algorithms is needed to fully unleash this technology's potential. Understanding the underlying mechanism of the nonlinear switching behavior in amorphous/crystalline heterogeneous systems is essential to achieving controllable memristive behavior. The defects in the switching layer dictate the device behavior, and how the ions and charges interplay at these sites remains unsolved. New characterization techniques are yet to be developed to better understand the device's physics, especially at the atomic scale. Analog systems are usually more susceptible to noise. In addition to identifying the noise sources and their characteristics, it is essential to understand the fundamental limits that such noise places on the computing capability of an array. The knowledge gained in this aspect will lead to a better-designed system with better-utilized analog properties.

On the device engineering side, a lower operational current is desired to reduce the power consumption. However, it remains an open question whether other critical metrics such as multilevel conductance, stability of resistance levels, and I-V linearity will be preserved for a low-current device. Other longstanding problems for the filamentary type of memristors are the requirement of an electrical forming step, which can be power-consuming and even destructive, and the cycle-to-cycle and device-to-device variability. Also lacking on the device level is a two-terminal selector with very high performance, especially endurance. Even when such a device becomes available, plenty of work needs to be done to properly use it, if possible, in computing applications.

Low-power peripheral circuits for memristive neural networks are vital on the circuit level. It was estimated that data conversion consumes most of the power in memristor-based deep learning accelerators. As a result, there is a great need to balance the bit-precision of the DACs/ADCs and the final computing requirements for a specific task. Given that we use two 1T1R cells at different rows (a differential pair) to represent a synaptic weight, the physical array is usually asymmetric (more rows than columns). As such, it is very challenging to implement the backpropagation algorithm that is critical in training. Creative circuit design with symmetric topography and negative weight capability is imperative.

From the algorithm perspective, analog error-correction42) will help mitigate and even compensate for the performance degradation resulting from the device variation and circuit defects. One example is the voltage drop from the left to the right of the array when the array size becomes huge and the wire resistance is significant. A compensation algorithm could improve the computing accuracy.43) Given that some other defects are not predictable, a more flexible correction or compensation algorithm should be developed.

Nevertheless, the device has shown great potential, although challenges remain. Experts across disciplines must work together, taking a co-design and co-optimization approach to take the technology to the market, with the promised energy efficiency and computing throughput.

1) L. Chua, IEEE Trans. Circuit Theory 18, 507 (1971).
2) D. B. Strukov, G. S. Snider, D. R. Stewart, and R. S. Williams, Nature 453, 80 (2008).
3) J. J. Yang, D. B. Strukov, and D. R. Stewart, Nat. Nanotechnol. 8, 13 (2013).
4) M. Hu, J. P. Strachan, Z. Li, and S. R. Williams, 17th Int. Symp. Quality Electronic Design (ISQED), p. 374 (2016).
5) P. Yao et al., Nat. Commun. 8, 15199 (2017).
6) C. Li et al., Nat. Electron. 1, 52 (2018).
7) C. Li et al., Nat. Commun. 9, 2385 (2018).
8) C. Li et al., Nat. Mach. Intell. 1, 49 (2019).
9) M. Hu et al., Adv. Mater. 30, 1705914 (2018).
10) M. Prezioso, F. Merrikh-Bayat, B. D. Hoskins, G. C. Adam, K. K. Likharev, and D. B. Strukov, Nature 521, 61 (2015).
11) P. M. Sheridan, F. Cai, C. Du, W. Ma, Z. Zhang, and W. D. Lu, Nat. Nanotechnol. 12, 784 (2017).
12) Z. Wang et al., Nat. Electron. 2, 115 (2019).
13) P. Lin et al., Nat. Electron. 3, 225 (2020).
14) P. Yao, H. Wu, B. Gao, J. Tang, Q. Zhang, W. Zhang, J. J. Yang, and H. Qian, Nature 577, 641 (2020).
15) C. Du, F. Cai, M. A. Zidan, W. Ma, S. H. Lee, and W. D. Lu, Nat. Commun. 8, 2204 (2017).
16) K. Berggren et al., Nanotechnology 32, 012002 (2020).
17) Q. Xia and J. J. Yang, Nat. Mater. 18, 309 (2019).
18) M. A. Zidan, J. P. Strachan, and W. D. Lu, Nat. Electron. 1, 22 (2018).
19) D. Ielmini and H.-S. P. Wong, Nat. Electron. 1, 333 (2018).
20) S. Yu, Proc. IEEE 106, 260 (2018).
21) A. Sebastian, M. L. Gallo, R. Khaddam-Aljameh, and E. Eleftheriou, Nat. Nanotechnol. 15, 529 (2020).
22) S. Pi, M. Ghadiri-Sadrabadi, J. C. Bardin, and Q. Xia, Nat. Commun. 6, 7519 (2015).
23) J. Chen, C. Hsin, C. Huang, C. Chiu, Y. Huang, S. Lin, W. Wu, and L. Chen, Nano Lett. 13, 3671 (2013).
24) J. J. Yang, M. D. Pickett, X. Li, D. A. A. Ohlberg, D. R. Stewart, and R. S. Williams, Nat. Nanotechnol. 3, 429 (2008).
25) S. Menzel, U. Böttger, and R. Waser, J. Appl. Phys. 111, 014501 (2012).
26) F. Miao, J. P. Strachan, J. J. Yang, M. Zhang, I. Goldfarb, A. C. Torrezan, P. Eschbach, R. D. Kelley, G. Medeiros-Ribeiro, and R. S. Williams, Adv. Mater. 23, 5633 (2011).
27) D. Ielmini, F. Nardi, and C. Cagli, IEEE Trans. Electron Dev. 58, 3246 (2011).
28) H. Jiang, L. Han, P. Lin, Z. Wang, M. H. Jang, Q. Wu, M. Barnell, J. J. Yang, H. L. Xin, and Q. Xia, Sci. Rep. 6, 28525 (2016).
29) J. G. Simmons, J. Appl. Phys. 34, 2581 (1963).
30) Z. Wang et al., Nat. Mater. 16, 101 (2017).
31) S. H. Jo, K. Kim, and W. Lu, Nano Lett. 9, 496 (2009).
32) S. Chakrabarti, S. Samanta, S. Maikap, S. Z. Rahaman, and H. M. Cheng, Nanoscale Res. Lett. 11, 389 (2016).
33) S. Choi, J. Lee, S. Kim, and W. D. Lu, Appl. Phys. Lett. 105, 113510 (2014).
34) Y. Chen, C. Lin, S. Hu, C. Lin, B. Fowler, and J. Lee, Sci. Rep. 9, 12420 (2019).
35) E. Perez, M. K. Mahadevaiah, C. Zambelli, P. Olivo, and C. Wenger, J. Vac. Sci. Technol. B 37, 012202 (2019).
36) H. Y. Lee et al., 2010 Int. Electron Devices Meeting - Techn. Dig. IEEE, New York, p. 460 (2010).
37) J. J. Yang, M. Zhang, J. P. Strachan, F. Miao, M. D. Pickett, R. D. Kelley, G. Medeiros-Ribeiro, and R. S. Williams, Appl. Phys. Lett. 97, 232102 (2010).
38) X. Sheng, C. E. Graves, S. Kumar, X. Li, B. Buchanan, L. Zheng, S. Lam, C. Li, and J. P. Strachan, Adv. Electron. Mater. 5, 1800876 (2019).
39) U. Böttger, M. von Witzleben, V. Havel, K. Fleck, V. Rana, R. Waser, and S. Menzel, Sci. Rep. 10, 16391 (2020).
40) Z. Wang et al., Nat. Mach. Intell. 1, 434 (2019).
41) F. Kiani, J. Yin, Z. Wang, J. J. Yang, and Q. Xia, Sci. Adv. 7, eabj4801 (2021).
42) C. Li, R. M. Roth, C. Graves, X. Sheng, and J. P. Strachan, 2021 Int. Electron Devices Meeting (IEDM), San Francisco, CA (2020).
43) M. Hu, J. P. Strachan, Z. Li, E. M. Grafals, N. Davila, C. Graves, S. Lam, N. Ge, J. J. Yang, and R. S. Williams, 53rd ACM/EDAC/IEEE Design Automation Conf. (DAC), p. 1 (2016).
