Ieee 2016 PDF

28nm Ultra-Low Power Near-/Sub-threshold First-In-First-Out (FIFO) Memory for Multi-Bio-Signal Sensing Platforms


Wei-Shen Hsu¹, Po-Tsang Huang², Shang-Lin Wu¹, Ching-Te Chuang¹, Wei Hwang¹, Ming-Hsien Tu³, and Ming-Yu Yin³

¹Department of Electronics Engineering, National Chiao Tung University, Hsinchu, Taiwan
²Department of Electrical and Computer Engineering, National Chiao Tung University, Hsinchu, Taiwan
³Faraday Technology Corporation, Hsinchu, Taiwan

ABSTRACT

In this paper, an ultra-low-power near-/sub-threshold first-in-first-out (FIFO) memory is proposed for energy-constrained bio-signal sensing applications. This FIFO memory is designed and implemented using folded bit-interleaved 10T near-/sub-threshold SRAM bit-cells, self-timed pointers and bank-level power control circuits. The 10T SRAM cell is proposed for the bit-interleaving structure with 2.4X write static noise margin (SNM) improvement. The folded bit-interleaving structure reduces the bit-line capacitance and avoids long routing wires for the circular self-timed pointers. Additionally, the event-driven self-timed pointers are designed to reduce the power consumption of clock buffers. To further decrease the overall power dissipation, bank-level column-based power control circuitry is proposed to switch the voltages for different banks, achieving 60.5% power saving. A 512x16 FIFO memory is implemented in UMC 28nm HKMG CMOS technology. Compared with the prior art, 47X power reduction and 2.7X area efficiency can be achieved by the proposed design techniques.

I. INTRODUCTION

FIFO memories are widely utilized for data buffers and flow control in many bio-signal sensing applications, and dominate the overall die area and power consumption [1]. For multi-bio-signal sensing applications, the sensing platforms have to be lightweight with a small form factor for wearable or implantable devices. Thus, limited energy consumption and small area are the two critical design challenges of multi-bio-signal sensing platforms. As such, ultra-low-power FIFO memories become significant building blocks for bio-sensing applications.

A conventional FIFO memory consists of three major components: storage elements, read/write pointers and timing control circuitry. For high density and low power, SRAM-based storage elements are usually adopted rather than register-based memories. Accordingly, near-/sub-threshold SRAMs are utilized to reduce the power consumption significantly by decreasing the operation voltage to the near-/sub-threshold region [2-4]. However, the degraded design margin and increased transistor variations are serious challenges for near-/sub-threshold SRAMs [1]. The read/write pointers of FIFO memories are typically implemented by shift registers [2] or counter-based pointers [2]. Both types of pointers consume a relatively large portion of the total power: shift-register-based pointers because of their large number of flip-flops, and counter-based pointers because of the overhead of the decoder circuit. Additionally, these two types of pointers also require a global clock for the timing control circuits, and the clock buffers consume considerable power.

To realize ultra-low-power FIFO memories, an 8kb near-/sub-threshold FIFO memory is presented in this paper for multi-bio-signal sensing platforms. This FIFO memory is designed using self-timed pointers and a folded bit-interleaving structure without a global clock signal. For the control of different operation voltages and frequencies of the FIFO memory, a bank-level dynamic voltage scaling technique is utilized to further reduce the total power consumption by providing adjustable voltages for different banks. The rest of this paper is organized as follows. The folded bit-interleaving structure for the 512x16 FIFO memory is presented in Section II. Section III describes the proposed 10-transistor (10T) SRAM bit-cell, which enhances read/write margin and performance. The design of the event-driven self-timed pointers is described in Section IV. Section V elucidates the bank-level power control circuitry for further reducing total power consumption. Section VI summarizes the post-simulation results and the conclusion is given in Section VII.

II. FOLDED BIT-INTERLEAVING STRUCTURE

For the 512x16 FIFO memory, the folded bit-interleaving structure is adopted to prevent long wires of control signals and to reduce the number of shift registers as shown in Fig. 1. In this structure, each column with 512 bit-cells is folded into four sub-columns controlled by circular column-based pointers. Therefore, the 512x16 memory array is transformed to a 128x64 array. Hence, the bit-line capacitance is reduced to improve the read/write access time. Based on the folded structure, the number of pointers is decreased from 512 to 192 (128+4x16). Moreover, the original 512 circular pointers require a long wire to transfer the token signal from the bottom register to the top register. In the folded interleaving structure, the pointers are implemented by 128-bit up/down scanning shift registers and 4-bit circular scanning shift registers. The up/down scanning requires only about 1/4 of the wire length of the original un-folded design for transferring the token from the bottom to the top. As such, the operation speed of the pointers is significantly improved.

Figure 1. Folded bit-interleaving structure for 512x16 FIFO memory. (The figure shows the 128-row by 64-column array of 16-bit words with word-lines WL0 to WL127. For each bit position, four sub-columns are selected through MUX15 to MUX0; row WL0 holds D0, D255, D256 and D511, row WL1 holds D1, D254, D257 and D510, and row WL127 holds D127, D128, D383 and D384.)
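As a rough illustration of the folding in Section II, the sketch below maps a FIFO word index to a row and sub-column. It assumes the serpentine placement suggested by the data labels in Fig. 1 (D0, D255, D256 and D511 on WL0; D127, D128, D383 and D384 on WL127). The names `fold` and `pointer_count` are hypothetical and only mirror the arithmetic quoted in the text; they are not taken from the design itself.

```python
ROWS = 128         # word-lines WL0..WL127 after folding
SUB_COLUMNS = 4    # each 512-cell column is folded into four sub-columns
WORD_BITS = 16     # bits per FIFO word


def fold(word_index):
    """Map a word index (0..511) to (row, sub_column).

    Assumption: even sub-columns are scanned downward (row 0 -> 127) and
    odd sub-columns upward (row 127 -> 0), as the labels in Fig. 1 suggest.
    """
    if not 0 <= word_index < ROWS * SUB_COLUMNS:
        raise ValueError("word index out of range")
    sub_column, offset = divmod(word_index, ROWS)
    row = offset if sub_column % 2 == 0 else ROWS - 1 - offset
    return row, sub_column


def pointer_count():
    """Pointer registers after folding, following the 128 + 4x16 count in the text."""
    return ROWS + SUB_COLUMNS * WORD_BITS


if __name__ == "__main__":
    for word in (0, 127, 128, 255, 256, 383, 384, 511):
        print(word, fold(word))
    print("pointers:", pointer_count())
```

Under this mapping, consecutive words at a sub-column boundary (for example 127 and 128) share the same row, which is consistent with the token reversing direction at the array boundary instead of travelling back across the whole column.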

978-1-4673-9498-7/16/$31.00 ©2016 IEEE


Figure 2. 10T bit-interleavable near-/sub-threshold SRAM bit-cell. (Schematic labels: write word-lines WWLA, WWLAb and WWLB; read word-line RWL; write bit-line WBL; read bit-line RBL; storage nodes Q and Qb; inverters A and B; transistors MN1 to MN4, MP1 and MP2; regular-Vt and low-Vt devices.)

Figure 3. Distribution of read SNM from Monte Carlo simulations. (28nm HKMG technology, TT, 25°C, VDD = 400mV, 16,000 samples; x-axis: read SNM (mV); y-axis: occurrence times. Curves: 10T bit-cell [5], 9T bit-cell [6] and this work, annotated with means μ = 108.4, 105.8 and 110.9 and standard deviations σ = 11.3, 14.0 and 13.8.)

Figure 4. Distribution of write-1 margin from Monte Carlo simulations. (28nm HKMG technology, TT, 25°C, VDD = 400mV, 16,000 samples; x-axis: write SNM (mV); y-axis: occurrence times. Curves: 10T bit-cell [5], 9T bit-cell [6] and this work, annotated with means μ = 166.9, 263.6 and 108.9 and standard deviations σ = 40.2, 21.3 and 28.3.)

Figure 5. Block diagram of event-driven self-timed pointers. (Blocks: a pointer column of flip-flop units driving WL0 to WL127, multiplexers (MUXs), return signal generators fed by the replica write pulse, and bank-level up/down indicators.)

III. NEAR-/SUB-THRESHOLD 10T SRAM BIT-CELL

A near-/sub-threshold two-port 10-transistor (10T) SRAM bit-cell is proposed as the storage element in the bit-interleaving FIFO memory as shown in Fig. 2. Compared with other 10T and 9T bit-interleavable SRAM cells [5, 6], the proposed 10T bit-interleavable cell can achieve stability and performance improvements. This 10T SRAM bit-cell consists of a cross-coupled inverter pair, three write access transistors (MN1, MN2, MP1), a pass transistor (MP2), and a decoupled read-out buffer (MN3, MN4). For realizing bit-interleavable bit-cells, two write word-lines (WWLs) are required, including the vertical WWL (WWLA) and the horizontal one (WWLB). If the write access transistors were composed of just two NMOS transistors, the write-1 ability would be severely degraded by two Vt-drops. Therefore, MP1 is utilized as a complementary pass gate with MN1 for enhancing the write-ability.

Reducing bit-line loading results in significant active power reduction [7]. A single-ended bit-line scheme is utilized with separate read/write ports to reduce the read/write power consumption. Additionally, large signal sensing is adopted for the read operation. The read buffer isolates the cell storage node from the read current to eliminate the read disturb.

In the proposed 10T bit-cell, a cut-off transistor MP2 is used to cut off the positive feedback loop of the cross-coupled inverter pair, thus eliminating the voltage dividing effect between the point Q and the output of inverter B during the write operations. Hence, the 10T bit-cell enlarges the write margin without any peripheral write-assist circuits, especially for operation in the near-/sub-threshold regions. Furthermore, the dual-Vt design technique is also utilized in this 10T cell to improve the hold stability and write-ability under ultra-low voltages.

In the read mode, the read word-line (RWL) is asserted to “High”. The read bit-line (RBL) is pre-charged to VDD before the cell is accessed. Then, MN1, MN2 and MP1 are turned off; MN3, MN4 and MP2 are turned on. Depending on the cell data, RBL is conditionally discharged to GND through MN3 and MN4. Therefore, the proposed SRAM bit-cell isolates the cell storage nodes from disturb noise and improves the read SNM. Fig. 3 presents the distribution of the read SNM at 0.4V, TT, 25°C from Monte Carlo simulations with 16,000 samples. Compared with other 9T and 10T bit-interleavable SRAM cells, the proposed 10T SRAM bit-cell achieves slightly larger nominal read SNM.

In the write mode, write word-line A (WWLA) and write word-line B (WWLB) are both asserted to “High”. The WBL is driven to VDD or GND to write “1” or write “0” before the cell is accessed. WWLB turns on MN2 and turns off MP2 simultaneously. Thus, the stored datum is transferred to the node VC through MN2, pass transistors MN1 and MP1, and inverters A and B. MP2 cuts off the positive feedback loop of the inverter pair during the write operation to avoid contention. Fig. 4 shows the distribution of write-1 margin at 0.4V, TT, 25°C from Monte Carlo simulations with 16,000 samples. The nominal write-1 margin of the proposed 10T SRAM bit-cell is 1.6X and 2.4X that of the previous 10T and 9T SRAM bit-cells, respectively.

Figure 6. Schematic and timing diagram of a self-timed pointer unit. (Signals: WLn-1, WLn, flip-flop output Qn and global reset.)

Figure 7. Power comparison of different pointers. (Normalized pointer power: the counter-based pointer is 34% lower and the self-timed pointer is 59% lower than the shift-register-based pointer.)

Figure 8. Bank-level column-based power control. (Blocks: Bank0 to Bank7, each with four PSC circuits (4PSC) fed from VDDL and VDDH and supplying VDD1 to VDD4, together with the read control circuitry, write control circuitry, write driver and SA.)
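The read and write modes described in Section III can be summarized with a small behavioral sketch. This is not a circuit model: it only captures that a write takes effect when both WWLA and WWLB are asserted, and that the pre-charged RBL is conditionally discharged during a read. The class name `TenTCellModel` is hypothetical, and the read polarity (RBL discharges when the stored bit is 0) is an assumption, since the text does not state which storage node drives the read buffer.

```python
from dataclasses import dataclass


@dataclass
class TenTCellModel:
    """Behavioral sketch of the bit-interleavable cell (logic level only)."""

    q: int = 0  # value held at storage node Q

    def write(self, wwla: bool, wwlb: bool, wbl: int) -> None:
        """Store the WBL value only if both write word-lines are high.

        A cell that sees only one of the two word-lines asserted keeps its
        datum, which is the property bit-interleaving relies on.
        """
        if wwla and wwlb:
            self.q = 1 if wbl else 0

    def read(self, rwl: bool) -> int:
        """Return the RBL level after an access.

        RBL starts pre-charged to 1. ASSUMPTION: the read buffer discharges
        RBL when the stored bit is 0; the opposite polarity is equally
        possible from the description in the text.
        """
        rbl = 1  # pre-charged to VDD before the access
        if rwl and self.q == 0:
            rbl = 0  # conditionally discharged through MN3 and MN4
        return rbl


if __name__ == "__main__":
    cell = TenTCellModel()
    cell.write(wwla=True, wwlb=False, wbl=1)  # only one word-line: no write
    print(cell.q)
    cell.write(wwla=True, wwlb=True, wbl=1)   # both word-lines: write-1
    print(cell.q, cell.read(rwl=True))
```

Because the read path never drives node Q in this sketch, reads leave the stored value untouched, mirroring the decoupled read buffer described above.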
IV. EVENT-DRIVEN SELF-TIMED POINTER

As the depth of the FIFO increases, the number of flip-flops and the lengths of clock/control wires increase. The access time of the FIFO memory is dominated by the propagation delay of these long wires. Therefore, the event-driven self-timed pointer is proposed to reduce both access time and power consumption by avoiding long wires and reducing clock buffers. The proposed self-timed pointer consists of four major blocks as shown in Fig. 5, including 128 self-timed pointer units in a pointer column, 128 multiplexers, 2 return signal generators and 8 bank-level up/down indicators. To provide up/down scan for the interleaving structure, two return signal generators are designed to reflect the token at the two boundaries of the pointer column. To avoid long control signals, the up/down control is designed at bank level. Thus, 8 bank-level up/down indicators are also implemented to indicate the direction of the token for each bank. The up/down state of each bank is determined by detecting the word-line signals at the boundaries of each bank. Additionally, the multiplexers are used to decide whether the token is transmitted upward or downward for the self-timed pointer units.

Each self-timed pointer unit generates the corresponding token signal by detecting the two adjacent word-lines and the up/down state of its bank. Thus, each word-line is asserted by this token signal and the write pulse generated from the write replica circuitry. Fig. 6 presents the schematic and timing diagram of a self-timed pointer unit composed of one flip-flop, one inverter and one NOR logic gate. This self-timed pointer unit is designed with edge-sensitive event-driven circuits. When the preceding word-line signal WLn-1 falls, the stored datum Qn rises to indicate that the token is transferred to this word-line. Similarly, after this word-line signal WLn falls, the stored datum Qn also falls. This function results in self-timed token passing through the flip-flops without a global clock signal.

Fig. 7 presents the power comparison of different pointers. The counter-based pointer has fewer registers and fewer long metal lines than the shift-register-based pointer. Therefore, 34% power saving can be realized for the counter-based pointer compared with the shift-register-based pointer. Moreover, the proposed event-driven self-timed pointer can achieve 59% power reduction compared with the shift-register-based pointer. The proposed pointer adopts the self-timed control scheme without long wires for control/clock signals.

Figure 9. Power switch control circuits and corresponding waveforms. (Circuit labels: VDDL, VDDH, VCS, PS_L, PS_H, DSCH, CEN, WWLA, WWLB, RWL and a delay line; waveform phases: power gating, write, hold, read and discharge.)

V. BANK-LEVEL COLUMN-BASED POWER CONTROL

In the FIFO memory, each storage element can be independently operated in read, write, data-retention and power-off modes with different voltages. Hence, the granularity of the dynamic voltage scaling for the FIFO memory can range from row-level, bank-level and sub-array-level to array-level. A trade-off between the overhead of control circuits and the voltage transition time exists under different granularities. To reduce the charge/discharge energy and eliminate the setup time during voltage conversion, a bank-level column-based power control circuitry is presented for the bit-interleaving structure as shown in Fig. 8. Each bank contains four power switch control (PSC) circuits for the 4 folded interleaved sub-columns. Thus, the PSC circuits provide different voltages according to the boundary R/W conditions of each bank based on the characteristics of the FIFO.

Fig. 9 presents the PSC circuits and the corresponding waveforms of the voltage transitions. Each PSC consists of two power PMOS transistors, one discharge NMOS transistor and control logic. Accordingly, the PSC is controlled by the self-timed read/write pointers. The signals of the read/write pointers (RWL/WWL) are decoded by the control logic to generate the signals of power switch high (PS_H), power switch low (PS_L), and discharge (DSCH). The two signals of PS_H and
PS_L are utilized to provide VDDH and VDDL for the cell voltage (VCS), respectively. DSCH is used to discharge VCS by turning on the NMOS transistor when the stored data may be destroyed. Fig. 10 shows the power consumption of the FIFO memory with/without the proposed bank-level column-based power control. The VDDH and VDDL are 0.6V and 0.43V, respectively. 60.5% power reduction is achieved by the power control circuits.

Figure 10. Power consumption with/without the bank-level power control. (Normalized power consumption: 60.5% lower with PSC than without PSC.)

VI. IMPLEMENTATION AND POST-SIM RESULTS

An 8kb ultra-low power FIFO memory is designed and implemented in UMC 28nm high-k metal-gate (HKMG) CMOS technology. Fig. 11 presents the layout view and floorplan of the proposed 8kb FIFO memory. The area of the FIFO memory is 317μm x 199μm, including the power rings. This FIFO memory can operate from 0°C to 80°C across all corners, and the maximum read and write frequencies are 22.7MHz and 740kHz at 0°C at SS corner.

Table I presents the comparison of different ultra-low-power FIFO memories. The proposed FIFO memory is implemented by the folded interleaving structure and self-timed pointers and operates at 10MHz/10MHz and 0.6V/0.4V for the read/write operations, respectively. Additionally, bank-level dynamic voltage scaling (DVS) is utilized to reduce the power consumption by self-controlled PSC. The power and area are normalized to μW/kb∙MHz and μm²/kbit, respectively. Based on the normalized power and area, the proposed FIFO memory can realize the best power and area efficiency.

Table I. Comparison of ultra-low-power FIFO memories.

                        [8]               [6]               [3]               [9]               This work
Technology              90nm              65nm              90nm              65nm              28nm
Bit-Cell                7T                8T                10T               9T                10T
Structure               Pointer           Pointer           Decoder           Decoder           Pointer + Interleaving
DVS                     N.A.              Array-Level       N.A.              Row-Level         Bank-Level
Voltage (V)             0.5V              0.3V/0.5V         0.5V              0.3V/0.5V         0.43V/0.6V
Size                    4kb               1kb               16kb              2kb               8kb
Read/Write Frequency    5MHz/200kHz       625kHz/20kHz      625kHz/50kHz      625kHz/33kHz      10MHz/10MHz
Average Power           2.21μW            0.17μW            1.646μW           0.606μW           9.07μW
Area                    384μm x 147μm     111μm x 82μm      666μm x 508μm     185μm x 130μm     317μm x 199μm
Normalized Power *      1.43              4.38              1.10              5.17              0.11
Normalized Area **      14112             9102              21145             12025             7885

*Normalized power: μW/kb∙MHz   **Normalized area: μm²/kbit

Figure 11. Layout view and floorplan of 8kb FIFO memory. (Dimensions: 316.86 μm x 199.16 μm. Labeled blocks: SRAM banks, PSC, WD, LEV, DIDO, write and read vertical pointers, read/write horizontal pointers, write and read tracks, and write and read replicas.)

VII. CONCLUSIONS

In this paper, an 8kb ultra-low-power near-/sub-threshold SRAM-based FIFO memory is presented for multi-bio-signal sensing platforms based on a folded bit-interleaving structure. A 10T bit-interleavable SRAM bit-cell is proposed to enhance the write-ability and read SNM. To eliminate long wires of control/clock signals, self-timed pointers are utilized to reduce both the access time and power consumption. Moreover, bank-level power control is implemented to provide dynamic voltage scaling based on the read/write conditions of different banks. This FIFO memory is implemented in UMC 28nm HKMG CMOS technology and achieves the best power efficiency compared with other FIFO memories. Based on the low-power design techniques, this FIFO memory enables energy-efficient and robust operations for multi-bio-signal sensing platforms.

ACKNOWLEDGEMENT

This work is supported by Ministry of Science and Technology in Taiwan under MOST 103-2218-E-009-007 and MOST 103-2221-E-009-202-MY3.

REFERENCES

[1] N. Verma, “Analysis towards minimization of total SRAM energy over active and idle operating modes,” IEEE Trans. on VLSI Systems, vol. 19, no. 9, pp. 1695-1703, Sept. 2011.
[2] K. Nii, et al., “A 65 nm ultra-high-density dual-port SRAM with 0.71um2 8T-cell for SoC,” IEEE Symp. on VLSI Circuits, pp. 130-131, 2006.
[3] W.-H. Du, et al., “An Energy-Efficient 10T SRAM-based FIFO Memory Operating in Near-/Sub-threshold Regions,” IEEE System-on-Chip Conference, pp. 19-23, Sept. 2011.
[4] D. Markovic, et al., “Ultralow-power design in near-threshold region,” IEEE Proceedings, vol. 98, no. 2, pp. 237-252, Feb. 2010.
[5] I.-J. Chang, et al., “A 32 kb 10T Sub-Threshold SRAM Array With Bit-Interleaving and Differential Read Scheme in 90 nm CMOS,” IEEE Journal of Solid-State Circuits, pp. 650-658, Feb. 2009.
[6] Y.-T. Chiu, et al., “Subthreshold Asynchronous FIFO Memory for Wireless Body Area Networks (WBANs),” International Symposium on Medical Information and Communication Technology (ISMICT), March 2010.
[7] M.-H. Tu, et al., “Single-ended Subthreshold SRAM with Asymmetrical Write/Read-Assist,” IEEE Trans. on Circuits and Systems, vol. 57, no. 12, pp. 3039-3047, Dec. 2010.
[8] M.-T. Chang, et al., “A Robust Ultra-Low Power Asynchronous FIFO Memory With Self-Adaptive Power Control,” IEEE System-on-Chip Conference, pp. 175-178, 2008.
[9] W.-H. Du, et al., “A 2kb built-in row-controlled dynamic voltage scaling near-/sub-threshold FIFO memory for WBANs,” IEEE International Symposium on VLSI Design, Automation, and Test (VLSI-DAT), pp. 1-4, 2012.
