A Cost-Effective Embedded Nonvolatile Memory with Scalable LEE Flash®-G2 SONOS for Secure IoT and Computing-in-Memory (CiM) Applications


Koji Nii, Yasuhiro Taniguchi and Kosuke Okuyama
Floadia Corporation, Japan
{nii.om, taniguchi.ga}@floadia.com

ABSTRACT
We introduce a cost-effective, reliable, and energy-efficient embedded flash memory technology and its applications. A charge-trapping Silicon-Oxide-Nitride-Oxide-Silicon (SONOS) cell with a twin select-gate structure has been demonstrated on 55-nm bulk CMOS technology. It is potentially scalable to advanced fully depleted (FD)-SOI or 3D Fin-FET devices below the 28-nm node. This feasibility is shown by TCAD simulations and existing 55-nm planar bulk silicon data. Secure, low-power applications are introduced that use a nonvolatile (NV)-SRAM, which combines an SRAM cell with flash cells. In addition, analog computing-in-memory (CiM) based on flash is introduced for energy-efficient artificial intelligence (AI) applications in edge computing.

INTRODUCTION
Demand for small, smart edge devices for the internet of things (IoT) is expanding rapidly. These devices require very low-power operation as well as extremely low or zero standby power. At the same time, they are becoming more intelligent, with an embedded micro-controller unit (MCU) for smart applications. To meet these demands, an embedded non-volatile memory (NVM) is one of the key elements on a die. High-density, energy-efficient embedded NVMs are spreading rapidly in mobile, IoT, automotive, and artificial intelligence (AI) applications. Recently, several types of emerging NVM have appeared, such as ReRAM, MRAM, PCM, and FeRAM [1]-[3]. Some of them aim to replace legacy flash memories, but challenges remain in yield ramping, high-temperature retention, and manufacturing cost, including the installation of new equipment. In contrast, flash memories are based on legacy technology that needs no new materials or tools, and they are already mature, having been silicon-proven in high-volume mass production. There are two main types of embedded flash memory technology: the floating-gate (FG) type [4] and the charge-trap type [5], [6]. SONOS belongs to the latter type. Scaling of FG flash is limited beyond the 28-nm node by its device structure, whereas SONOS-based flash can potentially scale to 28 nm and below.

In this work, a unique twin select-gate SONOS [7]-[9] is first introduced to meet these targets. It is potentially scalable to advanced FD-SOI and Fin-FET devices below the 28-nm node [10]. This feasibility is shown by TCAD simulations and planar bulk silicon data on 55 nm. Lastly, some applications are introduced, such as a nonvolatile (NV)-SRAM that combines an SRAM cell with flash memory cells [10]. Analog computing-in-memory (CiM) based on the proposed LEE Flash®-G2 SONOS is also introduced for energy-efficient AI applications in edge computing [9], [11].

TWIN SELECT-GATES SONOS STRUCTURE
Fig. 1 illustrates a cross-section of the proposed twin select-gate Low-power, cost-Effective and Easily embedded (LEE) Flash®-G2 SONOS memory cell. There are three gates: a memory gate (MG) for controlling program/erase (P/E), and a source gate (SG) and drain gate (DG) that act as select transistors, all based on the bulk core NMOS structure with a p-well. As in other charge-trapping flash cells, an Oxide-Nitride-Oxide (ONO) stacked film is formed at the bottom of the MG region in the middle of the transistor. Fig. 2 shows two options for the process flow that embeds SONOS into a standard base CMOS logic process. In the 4-mask adder flow, the additional well/channel doping, the channel implantation and poly etching for the SG/DG select transistors, and the memory LDD formation require three low-resolution (rough) masks, and only one additional high-resolution mask is used for MG poly etching. In the alternative 2-mask adder flow, one rough mask covers both the poly etching that removes the logic gate and the well/channel doping, and one high-resolution mask is used for MG poly etching. The ONO stack is formed with a low thermal budget, so it has no impact on the existing SPICE models of the baseline logic CMOS platform.

Fig. 1. Illustration of proposed scalable LEE Flash®-G2 SONOS structure with twin select-gates [7], [8].

Fig. 2. Two types of process flows of LEE Flash®-G2 SONOS [8].

Table 1 shows the bias conditions of the G2 SONOS cell for program, erase, and readout operations. Both program and erase use the Fowler-Nordheim (FN) tunneling mechanism. Only the MG is biased to a high voltage, either +VMG for program or -VMG for erase. In the readout operation, the MG is biased to 0 V. The other nodes, namely SG and DG connected to the word lines (WL) and the bit line (BL), stay between 0 V and ~1.0 V (the core voltage). The source line (SL) and the P-well are usually biased to 0 V. These bias conditions make the peripheral column circuits and the readout row decoder scalable and low-power. Only the MG driver circuits need a high voltage, which is supplied by an internal charge-pump circuit. Figs. 3(a) and 3(b) show cross-section SEM photos of fabricated G2 SONOS cell structures on 90-nm bulk CMOS and 55-nm bulk CMOS, respectively [8], [9].
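The bias scheme described for Table 1 can be summarized as a small lookup structure. The sketch below is illustrative only: the table body is not reproduced in this text, so the MG bias is kept symbolic (+VMG, -VMG), the 1.0 V core voltage is taken from the "~1.0 V" quoted in the prose, and the names `bias_summary` and `high_voltage_nodes` are hypothetical helpers rather than anything defined by the authors.

```python
V_CORE = 1.0  # approximate core voltage in volts ("~1.0 V" in the text)

# MG bias is symbolic: +1 means +VMG, -1 means -VMG, 0 means 0 V.
MG_SIGN = {"program": +1, "erase": -1, "read": 0}

# Nodes that stay within the core-voltage range in every operation.
CORE_RANGE_NODES = ("SG", "DG", "BL")   # 0 V .. V_CORE
GROUNDED_NODES = ("SL", "PWELL")        # usually 0 V


def bias_summary(operation):
    """Return a dict describing each node's bias for one operation."""
    sign = MG_SIGN[operation]
    summary = {"MG": {+1: "+VMG", -1: "-VMG", 0: "0 V"}[sign]}
    for node in CORE_RANGE_NODES:
        summary[node] = f"0 V to ~{V_CORE} V"
    for node in GROUNDED_NODES:
        summary[node] = "0 V"
    return summary


def high_voltage_nodes(operation):
    """Nodes that need the charge pump: only MG, and only for program/erase."""
    return ["MG"] if MG_SIGN[operation] != 0 else []


for op in ("program", "erase", "read"):
    print(op, bias_summary(op), "high-voltage:", high_voltage_nodes(op))
```

Keeping the high voltage on a single node is what lets the column and row-decoder periphery be built from core-voltage devices in this scheme.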

978-1-7281-6083-2/20/$31.00 ©2020 IEEE


Authorized licensed use limited to: Consortium - Algeria (CERIST). Downloaded on December 19,2022 at 23:16:33 UTC from IEEE Xplore. Restrictions apply.
Table 1. Bias conditions for program, erase and read operations [8].

Fig. 3. Cross-section SEM photos of fabricated G2 SONOS cell structures. (a) 90-nm bulk CMOS, (b) 55-nm bulk CMOS [8], [9].

MEASURED SILICON DATA ON 55-NM
Test chips including 8k-Byte and 1M-Byte array structures of G2 SONOS were fabricated on a 55-nm low-power CMOS process. Figs. 4(a)-4(c) show the measured cell characteristics. The dependence of the Vt window on P/E time is shown in Fig. 4(a). A Vt window of over 2.5 V is obtained at a P/E pulse time of 10 ms for the median cell (at zero sigma) of the 8k-Byte array. The Vt distributions of the 1M-Byte array after P/E are plotted in Fig. 4(b), showing steep distributions without any tailing bits. Measured endurance data are plotted in Fig. 4(c), where a stable Vt window without significant Vt shift is observed up to 100k cycles. These P/E characteristics are obtained without any verification steps.

Fig. 4. Measured G2 SONOS cell characteristics on 55 nm. (a) Vt window vs. P/E pulse time at 25°C, (b) Distributions of Vt after P/E at 85°C, (c) Endurance up to 100k P/E cycles at 85°C.

SCALABILITY FOR ADVANCED TECHNOLOGY NODES
As shown in Fig. 1, the proposed G2 SONOS cell with twin select-gates is scalable to advanced technology nodes thanks to its core-MOS-based structure. The thickness of the ONO film does not change, but the gate lengths of SG, DG, and MG and the minimum feature sizes of the contact hole and diffusion can be scaled down. Fig. 5 illustrates how the G2 SONOS cell could be scaled to 28/22-nm high-k/metal gate (HKMG) and to 16/14-nm Fin-FET or below. It can also be scaled down to 28/22-nm and 18/12-nm FD-SOI structures. Fig. 6 shows an example TCAD simulation result for the G2 SONOS cell on 22-nm FD-SOI. It confirms that the channel potential in the SOI layer rises to around 8 V while the P/E pulse is applied, which prevents program disturb in unselected, program-inhibited cells. Estimated scaling factors are shown in Fig. 7. With the 55-nm G2 SONOS cell as the reference, the scaled cell sizes on 22-nm FD-SOI and 14-nm Fin-FET are 0.31x and 0.14x, respectively.

Fig. 5. Scalability of G2 SONOS structure [11].

Fig. 6. TCAD simulation result of channel potential vs. biasing stress time for G2 SONOS on 22 nm FD-SOI [11].

Fig. 7. Estimated area scaling of G2 SONOS memory cell [11].
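The estimated scaling factors quoted for Fig. 7 can be turned into a quick density comparison. The sketch below takes the 0.31x and 0.14x relative cell sizes from the text and computes the implied bit-density gain; treating density as the reciprocal of cell area (ignoring array overhead and peripheral circuits) is a simplifying assumption, and the function name `density_gain` is hypothetical.

```python
# Relative cell area versus the 55-nm G2 SONOS reference (values from the text).
RELATIVE_CELL_AREA = {
    "55-nm bulk": 1.00,
    "22-nm FD-SOI": 0.31,
    "14-nm Fin-FET": 0.14,
}


def density_gain(node):
    """Bits per unit area relative to 55 nm, assuming density = 1 / cell area."""
    return 1.0 / RELATIVE_CELL_AREA[node]


for node in RELATIVE_CELL_AREA:
    print(f"{node}: {density_gain(node):.1f}x density vs. 55 nm")
```

Under this assumption, the same array capacity would occupy roughly a third of the cell area on 22-nm FD-SOI and about a seventh on 14-nm Fin-FET.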

APPLICATIONS BASED ON LEE FLASH®-G2 SONOS

Low-power Secure MCU/SoCs for IoT
The logic-friendly G2 SONOS enables several applications. One possibility is a non-volatile (NV)-SRAM that combines a 6T SRAM cell with G2 SONOS cells, as shown in Fig. 8. Normal read/write access goes through the 6T SRAM, giving a fast access time with low-voltage core operation and no endurance limit. When the system powers off, the data stored in each SRAM cell are transferred to the complementary G2 SONOS cells. This can be done for the whole NV-SRAM cell array simultaneously because the program current per G2 SONOS cell is extremely low (on the order of pA), so the total peak program current stays on the order of μA even for a 1-MByte array (roughly 8 million cells). In standby mode, the standby current is zero because the power supply to the NV-SRAM is cut off. When the system powers up, the data stored in the G2 SONOS cells are restored to the SRAM cells simultaneously, enabling a quick wake-up. Fig. 9 shows an MCU application using NV-SRAM. In a conventional MCU, data must be imported (exported) sequentially between external or internal flash and the SRAM cache whenever the system powers on (off). In contrast, the proposed on-die NV-SRAMs, which are used for the I/D caches and user memory, do not require such sequential data transfer, so power and waiting time can be reduced significantly. From a security point of view, tamper resistance against malicious attacks is also much higher, because the data-transfer bus and interface ports, where key data are sometimes stolen by side-channel attacks, are removed.

Computing-In-Memory (CiM)
Another possibility for G2 SONOS is analog CiM in edge AI applications. Fig. 10 shows a CiM synapse array based on G2 SONOS for a simple 1-layer neural network (NN). In a digital calculation, many multiplications and accumulations of the data inputs (D_{n-1}) and synaptic weights (W_{ij}) are performed sequentially to obtain the data output (D_n). In contrast, the CiM synapse array stores each W_{ij} in a cell, and the multiplications and accumulations are done by reading out all lines simultaneously in parallel. The reduction in the number of calculation steps becomes more significant as the array size grows. Each W_{ij} must take at least a binary value (“1” or “-1”), which can be assigned as 1 bit per cell. For energy-efficient edge computing, 8-bit precision might be enough to maintain high accuracy. If a multi-bit weight can be stored in a cell, analog multiplication and accumulation in the synapse array becomes much more energy efficient. Fig. 11 shows the concept of a multi-bit weight per cell for analog CiM. A G2 SONOS cell can store a multi-bit weight as a resistance with highly precise linearity. This is done by precisely controlling the Vt position during the program operation with bit-by-bit verification loops.
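The difference between sequential digital multiply-accumulate and parallel in-array readout, as described for the CiM synapse array, can be sketched in a few lines. This is a behavioral model only, assuming ideal cells whose read current is proportional to input times weight; the function names and the tiny example values are hypothetical and are not taken from the paper.

```python
def digital_mac(inputs, weights):
    """Sequential MAC: one multiply and one add per (row, column) pair."""
    outputs, steps = [], 0
    for j in range(len(weights[0])):        # one output per column
        acc = 0
        for i, d in enumerate(inputs):      # walk every row in turn
            acc += d * weights[i][j]
            steps += 1
        outputs.append(acc)
    return outputs, steps


def cim_readout(inputs, weights):
    """Idealized CiM: every cell contributes at once, and each column
    line sums the contributions of its cells in a single readout step."""
    columns = zip(*weights)                 # cells sharing a column line
    outputs = [sum(d * w for d, w in zip(inputs, col)) for col in columns]
    return outputs, 1                       # one parallel readout


# Binary weights ("1", "-1"), one bit per cell, as in the simplest case.
d_prev = [1, 0, 1]
w = [[1, -1],
     [-1, 1],
     [1, 1]]
print(digital_mac(d_prev, w))   # outputs plus the count of sequential steps
print(cim_readout(d_prev, w))   # same outputs from one parallel readout
```

In this toy model the digital step count grows with rows times columns, while the idealized array readout stays at one step, which is the intuition behind the claim that the benefit increases with array size.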

Fig. 8. Layout and equivalent circuit of NV-SRAM combining a 6T SRAM cell and two G2 SONOS cells [8], [11].

Fig. 9. Application for Secure MCU with NV-SRAM [8], [11].

Fig. 10. CiM based on G2 SONOS for neural network (NN) [11].

Fig. 11. Concept of multi-bit weight storage per cell for CiM [11].

Fig. 12. 7-bit multi-level weight stored in a G2 SONOS cell [9].
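Setting a multi-bit weight with verification loops can be illustrated with a simple program-and-verify model. The sketch uses the 128 levels and 8 nA step reported for Fig. 12 and a half-step verify tolerance; the linear grid starting at 0 nA, the cell model (a fixed current decrement per pulse), the erased-state current, and all function names are hypothetical stand-ins, not the authors' programming algorithm.

```python
LEVELS = 128      # 7-bit weight
STEP_NA = 8.0     # target current spacing per level, in nA


def target_current(level):
    """Target read current for a weight level on a linear 8 nA grid."""
    if not 0 <= level < LEVELS:
        raise ValueError("level out of range")
    return level * STEP_NA


def program_with_verify(level, erased_na=1100.0, pulse_na=1.0, max_pulses=2000):
    """Apply small program pulses until the verify read lands within half
    a step of the target. Each pulse lowers the read current by pulse_na,
    which is a toy cell model assumed purely for illustration."""
    target = target_current(level)
    current = erased_na
    for pulses in range(max_pulses + 1):
        if abs(current - target) < STEP_NA / 2:    # verify read
            return current, pulses
        current -= pulse_na                        # one program pulse
    raise RuntimeError("verify did not converge")


def read_level(current_na):
    """Map a measured current back to the nearest weight level."""
    return max(0, min(LEVELS - 1, round(current_na / STEP_NA)))


current, pulses = program_with_verify(100)
print(f"level 100 -> {current:.0f} nA after {pulses} pulses, "
      f"read back as level {read_level(current)}")
```

The verify tolerance of half a step is what guarantees that each programmed current maps back to a unique level; any read noise larger than that margin would reduce the number of distinguishable levels.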

Fig. 12 demonstrates a 7-bit multi-level weight stored in a G2 SONOS cell. The simulation result shows that each of the 128 current levels is precisely programmed with a linear 8 nA resolution, starting from freshly erased 1M-Byte cells with a normal distribution. Fig. 13 shows actual silicon measurement data for a single cell on 55 nm. The difference between the target and measured current is within half of the 8 nA target resolution at each level. However, an unexpected current shift is observed when the readout operation is repeated, as shown in Fig. 14. Although this noise limits the measurement to 64-level precision, 128-level sensing becomes feasible if the noise can be mitigated. Fig. 15 plots the estimated operations per second (OPS) vs. OPS/W. Conventional approaches [12]-[16], as reported in [17], deliver less than several tera-OPS (TOPS)/W in terms of energy efficiency. In contrast, the proposed CiM with a 1M-bit G2 SONOS synapse array on 55 nm is estimated to achieve 40 TOPS and 5 peta-OPS (POPS)/W. A larger 100M-bit synapse array on 16-nm FinFET is estimated to reach 4 POPS and 50 POPS/W [9], [11].

Fig. 13. Measured silicon data of 7-bit multi-level precision per single cell on 55-nm [9].

Fig. 14. Measured repeating readout operation on 55-nm [9].

Fig. 15. Estimated performance and energy efficiency [9], [11].

CONCLUSION
A charge-trapping LEE Flash®-G2 SONOS memory with a twin select-gate structure has been successfully demonstrated on 55-nm bulk CMOS technology. Its scalability to advanced FD-SOI or Fin-FET devices is also discussed with TCAD simulations. Secure IoT applications with NV-SRAM are introduced, and analog CiM is demonstrated for energy-efficient AI applications in edge computing.

ACKNOWLEDGMENTS
The authors would like to thank the Floadia R&D team in Tokyo as well as the Hsinchu design team in Taiwan for their contributions.

REFERENCES
[1] Pulkit Jain et al., “A 3.6Mb 10.1Mb/mm² Embedded Non-Volatile ReRAM Macro in 22nm FinFET Technology with Adaptive Forming/Set/Reset Schemes Yielding Down to 0.5V with Sensing Time of 5ns at 0.7V,” ISSCC, pp. 212-213, 2019.
[2] L. Wei et al., “A 7Mb STT-MRAM in 22FFL FinFET Technology with 4ns Read Sensing Time at 0.9V Using Write-Verify-Write Scheme and Offset-Cancellation Sensing Technique,” ISSCC, pp. 214-215, 2019.
[3] Guido De Sandre et al., “A 90nm 4Mb Embedded Phase-Change Memory with 1.2V 12ns Read Access Time and 1MB/s Write Throughput,” ISSCC, pp. 268-269, 2010.
[4] http://www.sst.com/technology/superflash-technology.aspx
[5] Takashi Kono et al., “40-nm Embedded Split-Gate MONOS (SG-MONOS) Flash Macros for Automotive With 160-MHz Random Access for Code and Endurance Over 10M Cycles for Data at the Junction Temperature of 170°C,” JSSC, pp. 154-166, 2014.
[6] H. Mitani et al., “A 90nm embedded 1T-MONOS flash macro for automotive application with 0.07mJ/8kB rewrite energy and endurance over 100M cycles under Tj of 175°C,” ISSCC, pp. 140-141, 2016.
[7] Yutaka Shinagawa, Yasuhiro Taniguchi, Hideo Kasai, Ryotaro Sakurai, Yasuhiro Kawashima, Tatsuro Toya and Kosuke Okuyama, USP10038101B2, Jul. 2018.
[8] Yasuhiro Taniguchi, Shoji Yoshida et al., “A new core transistor equipped with NVM functionality without using any emerging memory materials,” Leading Edge Embedded NVM Workshop, Sep. 2017.
[9] Yasuhiro Taniguchi, Shoji Yoshida, T. Tamatsu, M. Hishiki, Y. Sasaki, Y. Kawashima et al., “Computing-In-Memory with Tri gate SONOS Nonvolatile Multi level Memory,” Flash Memory Summit, Aug. 2019.
[10] Koji Nii, “Embedded Flash Memory Technologies and Applications in Advanced Nodes,” IEEE S3S Conf., Short Course, Oct. 2019.
[11] Koji Nii and Yasuhiro Taniguchi, “An Energy Efficient Computing-In-Memory based on Tri-Gate SONOS Flash Technology,” IEEE S3S Conf., Oct. 2019.
[12] Paul A. Merolla, John Arthur et al., “A million spiking-neuron integrated circuit with a scalable communication network and interface,” Science, Vol. 345, no. 6197, pp. 668-673, Aug. 2014.
[13] Yunji Chen, Tao Luo, Shaoli Liu, Shijin Zhang, Liqiang He et al., “DaDianNao: A Machine-Learning Supercomputer,” in Proc. of the 47th IEEE/ACM Int’l Symposium on Microarchitecture (MICRO), 2014.
[14] Zidong Du, Robert Fasthuber, Tianshi Chen et al., “ShiDianNao: Shifting Vision Processing Closer to the Sensor,” in Proc. of the 42nd Int’l Symposium on Computer Architecture (ISCA), 2015.
[15] Ian Cutress, “Cambricon, Makers of Huawei’s Kirin NPU IP, Build a Big AI Chip and PCIe Card,” May 2018. [Online]. Available: https://www.anandtech.com/show/12815/cambricon-makers-of-huaweis-kirin-npu-ip-build-a-big-ai-chip-and-pcie-card
[16] “Edge TPU,” 2019. [Online]. Available: https://cloud.google.com/edge-tpu/
[17] Albert Reuther et al., “Survey and Benchmarking of Machine Learning Accelerators,” IEEE HPEC Conf., Sep. 2019.
