2017 IEEE Computer Society Annual Symposium on VLSI

Ultra-Low Energy Data Driven Computing Using Asynchronous Micropipelines and Nano-Electro-Mechanical Relays
Haider Alrudainy, Andrey Mokhov, Fei Xia and Alex Yakovlev
School of Electrical and Electronic Engineering,
Newcastle University, Newcastle upon Tyne, NE1 7RU, England, UK
Emails: {h.m.a.alrudainy, andrey.mokhov, fei.xia, alex.yakovlev}@ncl.ac.uk

2159-3477/17 $31.00 © 2017 IEEE
DOI 10.1109/ISVLSI.2017.36

Abstract—Asynchronous logic and power gating are promising techniques for low duty cycle applications that need maximal energy efficiency, which are becoming more commonplace with the popularity of wireless and embedded systems. This paper investigates the potential use of Nano/Micro-Electro-Mechanical (N/MEM) switches as the means of power gating asynchronous computation. A systematic optimization of the N/MEMS parameters is performed using finite element analysis (FEA) in the COMSOL multiphysics tool. An asynchronous FIR filter with a 4-phase bundled-data handshake protocol is designed and implemented in the 90nm technology node. The N/MEMS switches are compared with conventional sleep transistors in a system where both the computation and the timing controls of the asynchronous FIR circuits are power gated. It is demonstrated that our N/MEMS solution offers a 69% energy improvement when a 32-tap FIR filter is power gated at a data rate of 1KHz, compared to a 39% saving realized by using sleep transistors in the same design.

I. INTRODUCTION
Ultra-low-power digital circuit design has become a key challenge given the growing demand for energy-constrained devices that must consume the minimum possible energy and place less emphasis on circuit performance. These devices include wireless sensor networks, biomedical devices, baseband processors, and autonomous sensors for the IoT [1]. Traditionally, supply voltage scaling has been used to reduce dynamic energy quadratically [2]. However, operating at low voltage significantly increases propagation delay, which in turn lengthens the task completion time and significantly increases leakage energy. This is further exacerbated as technology scales below the 90nm node. In the target applications of this paper, the idle time between operations is non-trivial, which leads to considerable idle energy consumption and calls for non-invasive power gating of the combinational circuit.
To alleviate idle energy consumption, power gating techniques [3] as well as body biasing approaches [4] have been used in both asynchronous and synchronous circuit designs. Of the two methods, power gating has been demonstrated to be an effective technique for mitigating leakage power during the idle period. Typically, sleep transistors are inserted between the main power supply rail and the combinational circuit (CC). However, these transistors themselves contribute high leakage current, making it more challenging to reduce the idle energy of energy-constrained applications. In most synchronous designs, switching circuits off during idle periods is performed by adopting shut-off instructions within the program code. In contrast, it is difficult to employ this approach in circuit systems that operate without instructions. Although a recently proposed approach performs power gating during active computation time within the sub-clock cycle [5], synchronous circuits still require proper timing analysis: it is essential to ensure that the hold and set-up time conditions are satisfied while the circuit is switched on and off. In asynchronous circuit designs, however, local handshake protocols can serve as control signals for power gating during the idle period, as these signals specify when the circuit starts and ends computation. A plethora of research has investigated power gating in asynchronous designs. In [6], the request signals of a four-phase bundled-data scheme are used as control signals to power on/off the idle circuits in each stage of a micropipeline. A further improvement has been proposed to power gate both the combinational circuit (CC) block and the latches within each stage of the pipeline [7]. The latter approach monitors the states of the adjacent blocks and shuts them off when required. Generally, depending on how the delay line (DL) block is implemented in asynchronous designs, it can be a major source of leakage current. Therefore, in the present work, zero-leakage N/MEMS-based devices are employed to power gate each combinational circuit (CC) block as well as the delay line (DL) block (i.e. N/MEMS provides free mechanical delay time) in micropipeline stages. Furthermore, the conditions under which N/MEMS-based power gating circuitry can achieve greater energy savings than sleep transistor counterparts are investigated, including questions of design architecture and behaviour of operation.
This work addresses the limitations and drawbacks of previous work utilizing N/MEMS for energy-constrained implementations, since these studies are either based on theoretical demonstrations [8] or lack a description of the model developed and the simulation environment used [9]. Therefore, a novel study based on 3D FEA performed in the COMSOL multiphysics simulation tool is presented, targeting applications that exhibit low duty cycles as well as bursty computation behaviour. It should be noted that, depending on the data
rate (target throughput) and design architecture, the total energy consumed by the power-gated circuits can be greater or smaller than in the case with no power gating. This work investigates the extent to which N/MEMS-based power gating of asynchronous micropipelines can be beneficial compared to sleep transistors at different data rates and levels of design architecture complexity.

II. BACKGROUND
Due to recent advances in planar fabrication processes, mechanical computing has been revived for energy-constrained applications [10]–[12]. N/MEMS are typically classified by actuation method into electrostatic, electrothermal, magnetostatic, and piezoelectric devices. Each actuation scheme has specific advantages and drawbacks, as listed in Table I. It can be deduced from Table I that the electrostatically actuated MEM relay is an attractive candidate for digital logic applications due to its low active power consumption, scalability, fast switching, and ease of manufacture using conventional planar processing techniques [10]. N/MEMS can also be classified according to the axis of deflection (lateral, vertical), contact interface (ohmic or capacitive), and geometric shape (see-saw beam (SS), cantilever beam (CB), dual bridge (DB), clamped-clamped beam (CC), parallel plate (PP), sidewall perimeter beam (SW)). Table II summarises the key features of various N/MEM relays: the number of terminals (No.), geometric shape (Geo.), actuation direction (A.D., vertical V or lateral L), structure material (S.M.), contact resistance (C.R.), and switching energy (Es). In the present work, for a coherent comparison, the relays in Table II are simulated using the COMSOL multiphysics tool with comparable footprint size (see Section III).

Table I: Key features of N/MEMS actuation characteristics, comparing piezoelectric, electrostatic, magnetic and thermal actuation in terms of fast switching, simple fabrication, low pull-in voltage, bias current [mA], low power, high force and scalability.

Table II: Key features of the fabricated N/MEM relays.
Ref.     | No. | Geo. | A.D. | S.M.      | C.R. (Ω) | Es (pJ)
[10]     | 4   | PP   | V    | Poly-SiGe | 1.4k     | 1.8
proposed | 4   | PP   | V    | Poly-SiGe | NR       | 1
[11]     | 3   | CB   | L    | SiO2      | 5k       | 0.082

The operating principle of an electrostatic relay can be summarised as follows. As Vgb is increased, as shown in Fig. 1, an electrostatic force is generated between the gate and body terminals. As a result, the gate bends toward the body terminal, generating an elastic spring force that counteracts the electrostatic force. Further increasing Vgb increases the gate deflection at a growing rate because of positive feedback: the deflection reduces the gap, which in turn increases the electrostatic force. Once Vgb exceeds the pull-in voltage (Vpi), the spring can no longer produce enough force to balance the electrostatic force, and the gate electrode collapses onto the drain-source terminal. To turn the N/MEM relay off, Vgb is lowered, which decreases the electrostatic force between the suspended gate and the body terminal. Eventually, as Vgb continues to decline, the elastic spring force becomes larger than the electrostatic force, and the dimples pull out of contact. This release voltage of the N/MEMS is denoted Vrl, as can be seen in Fig. 1.

Figure 1: Cross-section of an N/MEM relay (gate, gate dielectric, source, body and drain terminals; gate thickness h, actuation gap g0, dimple gap gd, load CL) with its Ids vs. Vgb curve showing Vpi and Vrl.

III. FINITE ELEMENT OPTIMIZATION
In the present study, 3D finite element analysis (FEA) in the COMSOL multiphysics tool is used to simulate and evaluate N/MEMS physical parameters accurately. Figs. 2 and 3 show the 3D FEA simulation results for various MEMS geometric shapes at a comparable footprint size (450 μm²) and the same material (poly-SiGe). The pull-in voltage, out-of-plane bending, and residual stress of these relays have been evaluated and analysed. Fig. 3(b) shows that 4-terminal relays with double springs, as expected, pull in at a lower voltage and switching energy than the relays of Fig. 3(a) and Fig. 2. However, the results also show that the 4-terminal relay with double springs suffers from high residual stress, which may affect its functionality. In the present work, the relay in Fig. 2 is used to shut off the idle circuits, owing to its higher recorded number of on/off switching cycles without failure (2.1 × 10^9) as well as its lower switching resistance. For example, if these relays switch on/off at an average of 1KHz for 1 second followed by 4 minutes of idle time, they can last roughly 17 years without failure.

Figure 2: (a) FEA-simulated pull-in voltage; (b) simplified sketch showing the channel, where L and W denote the spring length and width, and LA and WA the actuation-area length and width [10].
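The lifetime figure above follows from simple cycle counting. The sketch below (the function name is hypothetical) assumes one on/off cycle per period at 1KHz during each 1 s burst and no switching during the 4-minute idle interval. With the 2.1 × 10^9 cycle budget it gives about 16 years, close to the roughly 17 years quoted; the exact value depends on how the burst period and the year are counted.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year


def relay_lifetime_years(cycle_budget, switch_rate_hz, active_s, idle_s):
    """Years until a relay exhausts its on/off cycle budget under a bursty duty cycle.

    Assumes one on/off cycle per period of switch_rate_hz while active and no
    switching while idle (an illustrative assumption, not a measured profile).
    """
    cycles_per_burst = switch_rate_hz * active_s
    bursts = cycle_budget / cycles_per_burst
    return bursts * (active_s + idle_s) / SECONDS_PER_YEAR


# 2.1e9 failure-free cycles, 1KHz for 1 s, then 4 minutes idle
years = relay_lifetime_years(2.1e9, 1e3, 1.0, 4 * 60.0)
print(f"{years:.1f} years")  # prints "16.0 years"
```

Because the idle interval dominates each burst period, the lifetime scales almost linearly with the idle time, so duty cycle matters more than the switching rate itself.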

Table III: Current and scaled MEM relay physical parameters based on the COMSOL multiphysics tool.

Area (μm²) | Pull-in voltage (V) | Switching energy (pJ) | Mechanical delay (μs) | Stiffness (N/m) | Mass (pg) | Damping (μN·s/m) | Actuation gap (nm) | Actuation capacitance (fF)
450 (existing) | 11.3-2.6 | 0.1-3.2 | 0.15-0.69 | 10.14-192.6 | 1.1-2.9 | 50 | 200 | 40
45 (NEMS) | 0.19-1.97 | 0.049-0.003 | 0.06-0.28 | 5.51-68.2 | 0.15-0.25 | 0.07 | 40 | 17
4.5 (NEMS) | 0.1-0.31 | (0.037-0.36)×10^-3 | (24-85)×10^-3 | 0.15-1.51 | (3-3.77)×10^-3 | 0.007 | 20 | 3.54
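As a first-order cross-check on Table III, the textbook 1-DOF parallel-plate model gives Vpi = sqrt(8·k·g0³ / (27·ε0·A)) and, for a dimple gap gd, a release voltage Vrl = sqrt(2·k·gd·(g0 − gd)² / (ε0·A)); pull-in hysteresis (Vrl < Vpi) requires gd > g0/3. This model is our simplification for illustration, not the FEA model used in this work: it ignores fringing fields, residual stress and the real spring and plate geometry. Fed with the stiffness ranges and actuation gaps of Table III, it lands close to the FEA pull-in ranges for the 450 μm² and 4.5 μm² rows, but overestimates the low end of the 45 μm² row.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity (F/m)


def pull_in_voltage(k, g0, area):
    """1-DOF parallel-plate pull-in voltage: sqrt(8 k g0^3 / (27 eps0 A))."""
    return math.sqrt(8.0 * k * g0**3 / (27.0 * EPS0 * area))


def release_voltage(k, g0, gd, area):
    """Voltage at which the spring force k*gd balances the electrostatic force at gap g0 - gd."""
    return math.sqrt(2.0 * k * gd * (g0 - gd) ** 2 / (EPS0 * area))


# (area m^2, actuation gap m, stiffness range N/m) taken from Table III
TABLE_III = {
    "450 um^2": (450e-12, 200e-9, (10.14, 192.6)),
    "45 um^2": (45e-12, 40e-9, (5.51, 68.2)),
    "4.5 um^2": (4.5e-12, 20e-9, (0.15, 1.51)),
}

for name, (area, g0, (k_lo, k_hi)) in TABLE_III.items():
    v_lo = pull_in_voltage(k_lo, g0, area)
    v_hi = pull_in_voltage(k_hi, g0, area)
    print(f"{name}: Vpi ~ {v_lo:.2f}-{v_hi:.2f} V")

# Hysteresis check for gd/g0 = 0.5 on the 450 um^2 relay (k = 10.14 N/m)
vpi = pull_in_voltage(10.14, 200e-9, 450e-12)
vrl = release_voltage(10.14, 200e-9, 100e-9, 450e-12)
print(f"Vrl/Vpi = {vrl / vpi:.3f}")  # 0.919: below 1 because gd > g0/3
```

In normalized form Vrl/Vpi = sqrt((27/4)·δ·(1 − δ)²) with δ = gd/g0, which reaches 1 exactly at δ = 1/3; this is why the dimple gap has to exceed a third of the actuation gap for the relay to show the pull-in/release hysteresis of Fig. 1.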

Figure 3: 3D FEA of (a) the anchor MEMS [11]; (b) the 4-terminal MEMS with double springs.
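The stiffness, mass and damping values in Table III are the kind of lumped parameters used in an electromechanical relay model. A minimal sketch of such a model, assuming a 1-DOF parallel-plate actuator with linear damping (our assumption; the Verilog-AMS model used in this work is not reproduced here), is m·x'' + b·x' + k·x = ε0·A·V² / (2·(g0 − x)²). Normalizing by g0 and ω0 = sqrt(k/m) gives x'' + 2ζx' + x = (4/27)·(V/Vpi)² / (1 − x)², so Tmech = τc/ω0, where the dimensionless closing time τc depends only on V/Vpi, gd/g0 and the damping ratio ζ = b / (2·sqrt(k·m)). At fixed V/Vpi and ζ this makes Tmech inversely proportional to ω0 and increasing with gd/g0, the trends discussed in Section III-A. The function names and the example values are hypothetical.

```python
import math


def normalized_close_time(u, delta, zeta, dtau=1e-3, max_tau=2000.0, v_floor=1e-9):
    """Dimensionless closing time tau_c = omega0 * Tmech of a 1-DOF parallel-plate relay.

    With x = deflection / g0 and tau = omega0 * t the equation of motion is
        x'' + 2*zeta*x' + x = (4/27) * u**2 / (1 - x)**2,   u = V / Vpi.
    Returns None if the gate stalls (velocity ~ 0) before x reaches delta = gd / g0.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta = gd/g0 must lie in (0, 1)")
    if zeta > 0:
        dtau = min(dtau, 0.5 / zeta)  # keep the explicit damping update stable
    x = v = tau = 0.0
    while tau < max_tau:
        accel = (4.0 / 27.0) * u * u / (1.0 - x) ** 2 - x - 2.0 * zeta * v
        v += accel * dtau  # semi-implicit Euler: update velocity first,
        x += v * dtau      # then position with the new velocity
        tau += dtau
        if x >= delta:
            return tau     # dimples reach contact
        if v < v_floor:
            return None    # gate settled or turned back before contact
    return None


def mech_delay(k, m, b, u, delta):
    """Mechanical delay (s) from lumped stiffness k (N/m), mass m (kg), damping b (N*s/m)."""
    omega0 = math.sqrt(k / m)
    zeta = b / (2.0 * math.sqrt(k * m))
    tau_c = normalized_close_time(u, delta, zeta)
    return None if tau_c is None else tau_c / omega0


# Hypothetical lumped values, chosen to give zeta = 1 (not a row of Table III)
t_mech = mech_delay(k=10.0, m=1e-15, b=2e-7, u=1.5, delta=0.5)
print(f"Tmech ~ {t_mech * 1e9:.1f} ns")
```

The early exit on a vanishing velocity is valid because, with a constant drive voltage, damping only removes energy: once the gate stops short of the dimples it cannot later travel further. Heavily overdamped parameter sets need many more time steps, so max_tau may have to be raised for them.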

A. Energy-Latency Analysis of N/MEMS
An extensive parametric sweep is performed in this work using the COMSOL multiphysics tool to estimate the range of electromechanical parameters for the fabricated 450 μm² relay and the scaled 45 μm² and 4.5 μm² relays, so that the energy-latency trade-offs of N/MEMS can be optimised. These parameters are listed in Table III.
Fig. 4(a) shows the switching energy of the MEMS obtained by 3D FEA as a function of the dimple gap (gd) and resonant frequency (w). As can be seen, increasing gd causes an almost linear increase in switching energy at low w. Conversely, at high gd, the switching energy increases exponentially with the resonant frequency w, which is varied by sweeping the ratio L/W. Fig. 4(b) shows the simulated mechanical delay time as a function of the gap ratio (gd/g0) and resonant frequency (w). Tmech is inversely proportional to w and increases linearly with gd/g0, which is consistent with the analytical equation of the previous study [13].

B. Implications of Scaling on the Energy-Latency Trade-off
As with CMOS, scaling down MEMS parameters leads to greater energy savings. A 1-DOF optimization based on a variable scaling-factor methodology has been proposed in previous work [13]. However, single-DOF analysis can lead to inaccurate outcomes. Therefore, 3D FEA coupled with a sweep of the scaled parameters has been performed in the COMSOL tool. In our experiment, the area (A), gate thickness (h), and actuation gap (g0) are assumed to scale by factors of 0.1×, 0.5×, and 0.2×, respectively. Other parameters, such as gd, spring width (W), and spring length (L), are kept variable by sweeping L/W. Ultimately, some parameters will approach a lower limit and may not scale down as readily as others. For instance, g0 is limited by nano-gap formation technology, while the gate thickness (h) is set by process technology constraints.
Fig. 4(d) demonstrates a significant reduction in the switching energy of the scaled MEMS. Furthermore, these results indicate a better trade-off between switching energy and mechanical delay time than that of the fabricated MEMS (A = 450 μm²). For example, at gd/g0 = 0.5, every ∼5× increase in switching energy can be traded off for a ∼5× reduction in the scaled delay, as can be seen in Fig. 4(c-d).

Figure 4: (a) Switching energy from 3D FEA at g0 = 200nm and A = 450 μm² as a function of gd and resonant frequency (w); (b) Tmech as a function of gap ratio and resonant frequency from 3D FEA at A = 450 μm²; (c) Tmech of the scaled relay at g0 = 40nm and A = 45 μm² as a function of gd and resonant frequency; (d) switching energy of the scaled relay. Axes: gap ratio gd/g0 or gap (m), 1/w [f(L/W)], and switching energy (J) or mechanical delay time (s).

IV. ASYNCHRONOUS PIPELINE STAGE
The asynchronous micropipeline, as shown in Fig. 5, provides an event-driven scheme that uses localized handshaking signals to transfer data sequentially through the pipeline from one stage to the next [14]. When the input data is ready to be sent, a request signal req_in is generated, resulting in
enabling the latches to capture the input data through EN1 and raising the output request signal req_1. The sending stage then waits for an acknowledgement signal from the next stage, which is asserted once the data has been computed and is ready to be stored in the latches. While the latched data is being processed by the combinational circuit, the request signal req_1 passes through a matched delay element. The delay of the DL block is chosen to be no less than the worst-case delay of the combinational circuit, which guarantees that correct data values are captured by the latches.

Figure 6: (a) Timing diagram (Req_1, Ton, Teval, Toff, Ack_1, En2 = 1); (b) power-gated asynchronous micropipeline [6].
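The 4-phase (return-to-zero) bundled-data handshake described above, combined with the req_1-driven power gating of Section V, can be sketched as a small event model. All delays and the names StageTiming, four_phase_events and powered_time are hypothetical. The sketch encodes only the ordering req_1 rise, ack_1 rise, req_1 fall, ack_1 fall, and the bundling constraint that the matched delay must cover the worst-case CC delay; it additionally assumes that a gated CC block needs its wake-up time Ton before it starts evaluating.

```python
from dataclasses import dataclass


@dataclass
class StageTiming:
    """Illustrative delays (seconds) for one power-gated bundled-data stage."""
    t_on: float          # wake-up time of the power switch network (Ton)
    t_eval_worst: float  # worst-case evaluation delay of the CC block (Teval)
    t_dl: float          # matched delay of the DL block
    t_ctrl: float        # response delay of each handshake controller (hypothetical)


def bundling_ok(s: StageTiming) -> bool:
    """The delayed request must not arrive before the gated CC output is stable."""
    return s.t_dl >= s.t_on + s.t_eval_worst


def four_phase_events(s: StageTiming, t_req: float = 0.0):
    """(time, signal, level) events for one return-to-zero handshake on req_1/ack_1."""
    if not bundling_ok(s):
        raise ValueError("delay line shorter than wake-up plus worst-case evaluation")
    t_ack_up = t_req + s.t_dl + s.t_ctrl  # H/S2 enables EN2 and raises ack_1
    t_req_dn = t_ack_up + s.t_ctrl        # H/S1 lowers req_1: CC block powered off
    t_ack_dn = t_req_dn + s.t_ctrl        # ack_1 returns to zero
    return [
        (t_req, "req_1", 1),     # req_1 doubles as the sleep control: CC powered on
        (t_ack_up, "ack_1", 1),
        (t_req_dn, "req_1", 0),
        (t_ack_dn, "ack_1", 0),
    ]


def powered_time(events) -> float:
    """Time for which req_1 (and hence the gated CC block) is high."""
    rise = next(t for t, sig, lvl in events if sig == "req_1" and lvl == 1)
    fall = next(t for t, sig, lvl in events if sig == "req_1" and lvl == 0)
    return fall - rise


stage = StageTiming(t_on=0.2e-6, t_eval_worst=1.0e-6, t_dl=1.5e-6, t_ctrl=0.05e-6)
for t, sig, level in four_phase_events(stage):
    print(f"{t * 1e6:5.2f} us  {sig} -> {level}")
```

In this model the gated block stays powered for t_dl plus two controller delays per data item; everything beyond that, up to the data period, is idle time in which only the power switch itself can leak.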
V. POWER GATING IN ASYNCHRONOUS MICROPIPELINE
In this section we briefly discuss power gating techniques for asynchronous micropipelines using sleep transistors. Furthermore, using the optimized relay design (Section III), an N/MEMS-based power gating controller is proposed. In the following sections we briefly describe our approach, highlighting the amount of ...

A. Conventional power gating in asynchronous micropipeline
The req_1 signal, as shown in Fig. 6(b), is used as a sleep control signal to turn the CC block off during its idle state and on during its active period. This enables so-called "just-in-time" computation. Consequently, leakage current can be reduced during the inactive period, illustrated by Toff in the timing diagram of Fig. 6(a). When the data is asserted, the request signal req_1 is raised and passed simultaneously to both the sleep transistors and the DL block as a turn-on signal. This powers the CC block, and the incoming data is accepted from the data bus during Ton. The time required for the CC block to compute the incoming data and produce stable output values is denoted Teval. As req_1 passes through the DL block, handshaking controller 2 (H/S2 Ctrl) generates the latch enable signal EN2 and Ack_1, which is transmitted back to handshaking controller 1 (H/S1 Ctrl). H/S1 Ctrl then de-asserts req_1 once Ack_1 is received, which turns off the CC block. A reduction of about 70% in leakage energy dissipation has been reported for this approach compared with one without power gating [6]. It should be noted that this study only evaluated the leakage energy reduction, without accounting for the energy overhead of the power gating circuitry in the total energy dissipation. As a result, this approach can only be energy-efficient when the energy saved is much greater than the energy consumed in switching the power switch network (PSN) on and off. Moreover, further energy savings of between 5% and 12% can be obtained by including the DL blocks in the power gating domains [16], as shown in Fig. 7(a). However, sleep transistors have fundamental limitations in their effectiveness. Therefore, the present research investigates using N/MEMS for effective idle energy minimization.

B. N/MEMS power gating in asynchronous micropipeline
Further improvements to the previous paradigm can be achieved by employing N/MEMS to power gate both the DL and CC blocks, as shown in Fig. 7(b). In this study, an on-chip charge pump for MEMS implementations has been optimised to meet the pull-in voltage requirement of any chosen type of MEMS. For example, adopting the charge pump increases the delay overhead of the MEMS in Fig. 2 by 0.5 μs, while it has only a slight impact on the energy overhead (0.5 pJ). Two benefits can be observed from adopting N/MEMS in the PSN of Fig. 7(b). Firstly, unlike sleep transistors, N/MEMS switches exhibit zero leakage current; to avoid performance degradation, especially in ultra-low-power applications, sleep transistors have to be made wider, which in turn leads to significantly greater leakage current. Secondly, fewer N/MEMS than CMOS counterparts are required in both the DL block and the PSN, because of their lower Ron compared with sleep transistors. When req_1 is de-asserted, the output of the DL block would float during the disconnection period caused by the long Tmech delay in turning on the other N/MEMS in the buffer circuit. Therefore, the AND gate in Fig. 7(b) is essential to ensure that the output of the DL block is always driven to either logic low or logic high and never floats.
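Sections V-A and V-B reduce to an energy balance: gating pays off only while the leakage saved over the idle part of each computation period exceeds the energy spent switching the PSN (including, for N/MEMS, the charge pump). The sketch below is a first-order model of that balance, our simplification rather than the simulation flow used in this work. The only measured numbers it uses are the No-PG leakage energies of Table IV, which scale almost exactly as 1/f, i.e. a near-constant leakage power of about 1.3 μW; t_active, p_off and e_switch are hypothetical placeholders to be replaced by values for a given design.

```python
# No-PG leakage energy per computation of the 8-tap filter, from Table IV (pJ),
# keyed by data rate in Hz
TABLE_IV_LEAKAGE_PJ = {1e3: 1351.0, 1e4: 134.3, 1e5: 12.90,
                       4e5: 3.277, 8e5: 1.678, 1e6: 1.301}


def implied_leakage_power(table_pj):
    """Leakage power (W) = leakage energy per computation * computations per second."""
    return {rate: e_pj * 1e-12 * rate for rate, e_pj in table_pj.items()}


def energy_per_computation(rate_hz, e_dyn, p_leak, gated=False,
                           t_active=0.0, p_off=0.0, e_switch=0.0):
    """First-order energy (J) of one computation at a given data rate.

    Ungated: the block leaks p_leak for the whole period 1/rate.
    Gated: it leaks p_leak only while powered (t_active), p_off while gated
    (zero for an ideal N/MEMS switch), and pays e_switch per on/off cycle.
    """
    period = 1.0 / rate_hz
    if not gated:
        return e_dyn + p_leak * period
    on_time = min(t_active, period)
    return e_dyn + p_leak * on_time + p_off * (period - on_time) + e_switch


def break_even_rate(p_leak, p_off, t_active, e_switch):
    """Data rate (Hz) above which gating costs more energy than it saves."""
    if p_leak <= p_off:
        return 0.0  # the switch leaks as much as the block: gating never pays off
    return 1.0 / (t_active + e_switch / (p_leak - p_off))


leak = implied_leakage_power(TABLE_IV_LEAKAGE_PJ)
print({f"{r / 1e3:g} KHz": f"{p * 1e6:.2f} uW" for r, p in leak.items()})

# Placeholder design values (hypothetical, for illustration only)
P_LEAK = 1.3e-6  # roughly the Table IV figure above
print(break_even_rate(P_LEAK, p_off=0.0, t_active=2e-6, e_switch=1.5e-12))
```

Setting the saving (p_leak − p_off)·(1/f − t_active) − e_switch to zero gives the break-even period t_active + e_switch / (p_leak − p_off). A zero-leakage N/MEMS switch maximizes the denominator, while its larger switching energy and charge-pump overhead push the break-even rate down, which is consistent with the trend of shrinking savings at high data rates in Tables IV and V.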
Figure 5: Asynchronous 4-phase bundled-data micropipeline [6].

VI. EVALUATION SETUP
To validate the proposed approach and to find the conditions under which power gating in asynchronous micropipelines using N/MEM power switches becomes beneficial, 8- and 32-tap FIR filters were designed. These FIR filters are implemented in a 90nm CMOS technology node, as shown in Fig. 8. They consist of two combinational blocks, an accumulator and a multiplier. Each combinational block has its own power domain, so that the multiplier is
Figure 7: Schematic illustration of the power gating approach in an asynchronous micropipeline using: (a) sleep transistors (power switch transistors supplying Vdd PD to the combinational circuits and delay line); (b) the proposed N/MEMS (MEMS power switch, MEMS delay line and charge pump).

Figure 8: Asynchronous 32-tap FIR filter implemented with the proposed approach: N/MEMS power switches S1 and S2 supply Vdd PD1 (multiplier) and Vdd PD2 (accumulator); each stage has an N/MEMS-based delay line with a charge pump (CP) and handshake controllers (H/S Ctr1, Ctr2) driven by the pipeline controller.

powered by power domain1 (Vdd PD1) while the accumulator is powered by power domain2 (Vdd PD2). These two power domains are connected to the main supply voltage (Vdd) through an array of power switches (PSN) comprising S1 and S2, as indicated in Fig. 8. The delay line (DL) block in each stage of the micropipeline is now connected to the power domain of the corresponding combinational circuit (CC) block. The acknowledgement signal ack_out is generated only when the controller receives a request signal req_in and performs N handshake cycles with the micropipeline. When req_0 goes high, switch array S1 turns on, connecting the supply voltage Vdd to power domain1 (Vdd PD1). This powers up the multiplier and its corresponding delay line (DL) block. One branch of req_0 bypasses the DL block and is ANDed with its delayed version to generate the signal passed to handshaking controller 1 (H/S Ctrl). Vdd PD1 can be powered down when req_0 is de-asserted, which leaves one input of the AND gate floating. However, the output of the AND gate is held at logic low by req_0.
The flowchart of our evaluation process is presented in Fig. 9. First, the N/MEMS switch is modelled and designed using 3D FEA in the COMSOL multiphysics tool. Predefined characteristics, including dimensions, geometric shape, material, and actuation type, are used as inputs to COMSOL. The mechanical and electrical lumped parameters are then obtained by performing frequency, transient, and parametric sweep analyses. These lumped parameters are written in Verilog-AMS and co-simulated in the Cadence simulator, as illustrated in [15], [16]. In parallel, VHDL code is written for the 32-tap FIR filter, which consists of two combinational blocks, a 32-bit adder and a 32-bit multiplier, as can be seen in Fig. 8. Using the Synopsys simulator, the area and the dynamic and static energy have been optimized. Finally, the combined N/MEMS-CMOS design is simulated using the Cadence simulation tool.

Figure 9: Mixed electronics N/MEMS-CMOS evaluation process. COMSOL multiphysics 3D FEA (inputs: device dimensions, geometry, material; contact modelling; parametric sweep, transient and frequency analysis) yields the device parameters (stiffness k, damping b, mass m, pull-in voltage Vpi) for a Verilog-AMS electrical and mechanical lumped model; VHDL code (FIR filter, AES key generator) is synthesized and optimized with Synopsys (dynamic and static power, area); both are co-simulated in Cadence to obtain total energy, timing, mechanical time and switching energy.

VII. RESULTS
Our approach was evaluated and compared with various set-ups from previous work [6], [17]. All these set-ups are powered by a supply voltage Vdd = 0.6V. The PSN uses S1 = 20 and S2 = 15 PMOS sleep transistors, each 4 μm wide. The total energy dissipation was evaluated and the energy per computation was recorded. Furthermore, the total leakage energy (i.e. caused by the DL and CC blocks) of each set-up was recorded, and the overall energy overhead of the proposed approach due to the added N/MEMS and charge pump was evaluated. Note that this filter only works for data rates that do not exceed its natural throughput at Vdd = 0.6V. Table IV shows the simulation results of the 8-tap FIR filter for the various set-ups. It can be deduced from these results that decreasing the data rate from 1MHz to 1KHz will result in an increase in
Table IV: Total energy/computation for various asynchronous PG configurations (8-tap FIR filter).

Data rate (KHz) | No PG: leakage (pJ) | No PG: total (pJ) | PG [6]: total (pJ) | PG [6]: saving (%) | PG [17]: total (pJ) | PG [17]: saving (%) | Proposed PG (MEMS): overhead energy (pJ) | Proposed PG (MEMS): total (pJ)
1 | 1351 | 2812 | 1940 | 31.0 | 1601.9 | 43.0 | 3171 | 4640
10 | 134.3 | 283 | 217 | 24.9 | 172.6 | 39.0 | 336 | 464.9
100 | 12.90 | 41.0 | 33.0 | 18.14 | 30.9 | 24.0 | 52.5 | 81.70
400 | 3.277 | 19.8 | 20.6 | -2.95 | 18.6 | 7.0 | 28.8 | 45.90
800 | 1.678 | 17.0 | 17.7 | -7.56 | 16.8 | -2.43 | 24.9 | 40.60
1000 | 1.301 | 16.0 | 17.35 | -8.40 | 16.5 | -3.12 | 24.1 | 39.60

Table V: Energy consumption for 32-tap FIR filter at various asynchronous PG setups.

Data rate (KHz) | No PG: energy (nJ) | PG [17]: total (nJ) | PG [17]: saving (%) | Proposed PG (MEMS): total (nJ) | Proposed PG (MEMS): saving (%)
1 | 6.778 | 4.10 | 39.5 | 2.10 | 69.0
10 | 0.80 | 0.49 | 38.0 | 0.30 | 62.5
100 | 0.60 | 0.41 | 31.0 | 0.31 | 48.3
400 | 0.45 | 0.38 | 15.0 | 0.315 | 30.0
800 | 0.389 | 0.35 | 10.2 | 0.32 | 17.7
1000 | 0.34 | 0.345 | -1.4 | 0.32 | 5.80
10000 | 0.3 | 0.38 | -26.0 | 0.45 | -50.0

the dissipated energy/computation of the four set-ups. This is attributed to the fact that the leakage energy of the circuit increases as the time required to complete a single computation increases, leading to a longer circuit idle time. Although MEMS relays exhibit zero leakage current, decreasing the data rate (i.e. a longer time per computation) leads to a significant increase in their energy, as shown in Table IV.
Table V and Fig. 10 show the total energy consumption of the 32-tap FIR filter for the various power gating set-ups and data rates. All these set-ups are powered by a supply voltage Vdd = 1V. The PSN uses S1 = 30 and S2 = 20 PMOS sleep transistors, each 8 μm wide. The total energy consumption of the DL and CC blocks of each set-up, as well as the overall energy overhead of the proposed approach due to the added MEMS relays and charge pump, were evaluated. These results indicate that, at a low data rate, our approach achieves an energy saving of about 69% compared with the design without PG, i.e. 29.5 percentage points more than with PG [17]. This is attributed to the significant increase in the leakage energy of the DL and CC blocks; the leakage energy dissipation of the DL blocks is evaluated to be 0.68nJ at a data rate of about 1KHz. However, at higher data rates, the switching energy Es of the MEMS-based power gating outweighs its leakage savings. Owing to its lower Es, the 45 μm² NEMS achieves greater energy savings than the existing MEMS, as shown in Fig. 10.

Figure 10: Total energy consumption of the 32-tap FIR filter versus data rate (KHz) for different power gating configurations.

VIII. CONCLUSION
This paper presents an investigation into power gating techniques implemented on asynchronous micropipelines. This study demonstrated the threshold at which these techniques achieve greater energy savings in relation to the design architecture and the data rate of the input. Future work will develop a tool to automatically cluster the micropipeline stages with the best power gating techniques by determining the static power requirements and maximum allowable wake-up time.

REFERENCES
[1] L. Nazhandali et al., “SenseBench: toward an accurate evaluation of sensor network processors,” in IISWC, pp. 197–203, Oct 2005.
[2] S. Hanson et al., “Ultralow-voltage, minimum-energy CMOS,” IBM Journal of Research and Development, vol. 50, pp. 469–490, July 2006.
[3] K. Roy et al., “Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits,” Proceedings of the IEEE, vol. 91, pp. 305–327, Feb 2003.
[4] L. T. Clark et al., “Reverse-body bias and supply collapse for low effective standby power,” IEEE Transactions on VLSI Systems, vol. 12, pp. 947–956, Sept 2004.
[5] J. N. Mistry et al., “Sub-clock power-gating technique for minimising leakage power during active mode,” in DATE, pp. 1–6, March 2011.
[6] T. Lin et al., “Fine-grained power gating for leakage and short-circuit power reduction by using asynchronous-logic,” in ISCAS, pp. 3162–3165, IEEE, 2009.
[7] Kawano et al., “Adjacent-state monitoring based fine-grained power-gating scheme for a low-power asynchronous pipelined system,” in ISCAS, pp. 2067–2070, IEEE, 2011.
[8] H. Fariborzi et al., “Analysis and demonstration of MEM-relay power gating,” in CICC, IEEE, pp. 1–4, Sept 2010.
[9] M. Henry et al., “From transistors to MEMS: Throughput-aware power gating in CMOS circuits,” in DATE, pp. 130–135, March 2010.
[10] M. Spencer et al., “Demonstration of integrated micro-electro-mechanical relay circuits for VLSI applications,” JSSC, vol. 46, no. 1, pp. 308–320, 2011.
[11] S. Rana et al., “Energy and latency optimization in NEM relay-based digital circuits,” TCAS-I, vol. 61, pp. 2348–2359, Aug 2014.
[12] H. Alrudainy et al., “MEMS-based power delivery control for bursty applications,” in ISCAS, pp. 790–793, May 2016.
[13] H. Kam et al., “Design, optimization, and scaling of MEM relays for ultra-low-power digital logic,” TED, vol. 58, pp. 236–250, 2011.
[14] I. E. Sutherland, “Micropipelines,” Communications of the ACM, vol. 32, no. 6, pp. 720–738, 1989.
[15] H. Alrudainy et al., “A scalable physical model for nano-electro-mechanical relays,” in PATMOS, pp. 1–7, Sept 2014.
[16] A. Bazigos et al., “Analytical compact model in Verilog-A for electrostatically actuated ohmic switches,” TED, vol. 61, pp. 2186–2194, 2014.
[17] A. Ogweno et al., “Power gating in asynchronous micropipelines for low power data driven computing,” in PRIME, pp. 342–345, IEEE, 2015.