Power Gating Nems
Abstract—Asynchronous logic and power gating are promising techniques for low-duty-cycle applications that need maximal energy efficiency, which are becoming more commonplace with the popularity of wireless and embedded systems. This paper investigates the potential use of Nano/Micro-Electro-Mechanical (N/MEM) switches as the means of power gating asynchronous computation. A systematic optimization of the N/MEMS parameters is performed using finite element analysis (FEA) in the COMSOL multiphysics tool. An asynchronous FIR filter with a 4-phase bundled-data handshake protocol is designed and implemented in the 90nm technology node. The N/MEMS switches are compared with conventional sleep transistors in a system where both the computation and the timing controls of the asynchronous FIR circuits are power gated. It is demonstrated that our N/MEMS solution offers a 69% energy improvement when a 32-tap FIR filter is power gated at a data rate of 1 kHz, compared with a 39% saving realized by using sleep transistors in the same design.

I. INTRODUCTION
Ultra-low-power digital circuit design has become a key challenge given the growing demand for energy-constrained devices that must consume the minimum possible energy and place less emphasis on circuit performance. These devices include wireless sensor networks, biomedical devices, baseband processors, and autonomous sensors for the Internet of Things (IoT) [1]. Traditionally, supply-voltage scaling has been used to reduce the dynamic energy quadratically [2]. However, operating at low voltage significantly increases propagation delay, which lengthens task completion time and in turn significantly increases leakage energy. This effect is further exacerbated as the technology node is scaled below 90nm. In the target applications of this paper, the idle time between operations is non-trivial, which in turn leads to more energy consumption; these applications therefore demand non-invasive power gating of the combinational circuit.

To alleviate idle energy consumption, power gating techniques [3] and body biasing approaches [4] have been used in both asynchronous and synchronous circuit designs. Of the two methods, power gating has been demonstrated to be an effective technique for mitigating leakage power during the idle period. Typically, sleep transistors are inserted between the main power supply rail and the combinational circuit (CC). However, these transistors themselves contribute a high leakage current, which makes reducing the idle energy of energy-constrained applications more challenging. In most synchronous designs, switching circuits off during idle periods is usually performed by adopting shut-off instructions within the program code. In contrast, it is difficult to employ this approach in circuit systems that operate without instructions.

Although a recently proposed approach performs power gating during active computation time within the sub-clock cycle [5], synchronous circuits still require proper timing analysis: it is essential to ensure that the hold and set-up time conditions are satisfied while the circuit is switched on and off.

In asynchronous circuit designs, however, local handshake protocols can serve as the control signals for power gating during the idle period, since these signals specify when the circuit starts and ends computation. A considerable body of research has investigated power gating in asynchronous designs. In [6], the request signals of a four-phase bundled-data scheme are used as control signals to power the idle circuits in each stage of a micropipeline on and off. A further improvement has been proposed to power gate both the combinational circuit (CC) block and the latches within each stage of the pipeline [7]; this approach monitors the states of the adjacent blocks and shuts them off when required. Depending on how it is implemented, the delay line (DL) block in asynchronous designs can also be a major source of leakage current. Therefore, in the present work, zero-leakage N/MEMS-based devices are employed to power gate each combinational circuit (CC) block as well as the delay line (DL) block (i.e. the N/MEMS provide free mechanical delay time) in the micropipeline stages. Furthermore, the conditions under which N/MEMS-based power gating circuitry can achieve greater energy savings than its sleep-transistor counterpart are investigated, in terms of both design architecture and operating behaviour.

This work addresses the limitations and drawbacks of previous work utilizing N/MEMS for energy-constrained implementations, since those studies are either based on theoretical demonstrations [8] or lack a description of the model developed and the simulation environment used [9]. Therefore, a novel study based on 3D FEA performed in the COMSOL multiphysics simulation tool is presented, targeting applications that exhibit low duty cycles as well as bursty computation behaviour. It should be noted that, depending on the data rate (target throughput) and design architecture, the total energy consumed by the power-gated circuits can be greater or smaller than in the case with no power gating. This work investigates the extent to which power gating of an asynchronous micropipeline based on N/MEM switches can be beneficial compared with using sleep transistors, at different data rates and levels of design architecture complexity.

II. BACKGROUND
Due to recent advances in planar fabrication processes, mechanical computing has been revived for energy-constrained applications [10]–[12]. N/MEMS are typically classified, based on the method of actuation, as electrostatic, electrothermal, magnetostatic, or piezoelectric. Each actuation scheme has specific advantages and drawbacks, as listed in Table I. It can be deduced from Table I that the electrostatically actuated MEM relay is an attractive candidate for digital logic applications due to its low active power consumption, scalability, fast switching, and ease of manufacture using conventional planar processing techniques [10]. Alternatively, relays can be classified according to the axis of deflection (lateral, vertical), contact interface (ohmic or capacitive), and geometric shape (see-saw beam (SS), cantilever beam (CB), dual bridge (DB), clamped-clamped beam (CC), parallel plate (PP), sidewall perimeter beam (SW)). Table II summarises the key features of various N/MEM relays, namely the number of terminals (No.), geometric shape (Geo.), actuation direction (A. D.), structure material (S. M.), contact resistance (C. R.), and switching energy (Es). In the present work, for a consistent comparison, the relays in Table II are simulated using the COMSOL multiphysics tool with comparable footprint sizes (see Section III).

Table II: Key features of various N/MEM relays.
Ref.       No.  Geo.  A. D.  S. M.      C. R. (Ω)  Es (pJ)
[10]       4    PP    V      Poly-SiGe  1.4k       1.8
proposed   4    PP    V      Poly-SiGe  NR         1
[11]       3    CB    L      SiO2       5k         0.082

The operating principle of an electrostatic relay can be summarised as follows. As Vgb is increased, as shown in Fig. 1, an electrostatic force is generated between the gate and body terminals. As a result, the gate bends toward the body terminal, generating an elastic spring force that counteracts the electrostatic force. Increasing Vgb further increases the gate's deflection at a growing rate because of positive feedback: the deflection reduces the gap, which in turn increases the electrostatic force. Once Vgb exceeds the pull-in voltage (Vpi), the spring force can no longer balance the electrostatic force. Consequently, the gate electrode collapses onto the drain-source terminal. To turn the N/MEM relay off, Vgb is lowered, which decreases the electrostatic force between the suspended gate and the body terminal. Eventually, as Vgb continues to decline, the spring elastic force becomes larger than the electrostatic force, and the dimples pull out of contact. The voltage at which this happens is the release voltage of the N/MEMS, referred to as Vrl, as can be seen in Fig. 1.

Figure 2: (a) FEA-simulated pull-in voltage; (b) simplified sketch, in which the symbols L, W, LA, and WA denote, respectively, the spring length and width and the actuation-area length and width [10].

III. FINITE ELEMENT OPTIMIZATION
In the present study, 3D finite element analysis (FEA) is used to accurately simulate and evaluate the N/MEMS physical parameters. Figs. 2 and 3 show the 3D FEA simulation results for MEMS of various geometric shapes with a comparable footprint (450 µm²) and the same material (poly-SiGe). These experiments were carried out using 3D FEA in the COMSOL multiphysics tool. The pull-in voltage, out-of-plane bending, and residual stress of these relays have been evaluated and analysed. Fig. 3 (b) illustrates that, as expected, 4-terminal relays with double springs pull in at a lower voltage and switching energy than the relays of Fig. 3 (a) and Fig. 2. However, the results also show that the 4-terminal relay with double springs suffers from high residual stress, which may affect its functionality. In the present work, the relay in Fig. 2 is used to shut off the idle circuits, owing to its higher recorded number of on/off switching cycles without failure (2.1×10^9) and its lower switching resistance. For example, if these relays switch on/off at an average of 1 kHz for 1 second followed by 4 minutes of idle time, they can last roughly 17 years without any failure in operation.
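The lifetime estimate above can be reproduced with a few lines of arithmetic. The sketch below is an illustration, not part of the original evaluation; it assumes that each on/off switching event counts as one cycle against the 2.1×10^9 endurance figure, that the 1 kHz burst lasts 1 second and is followed by 240 seconds of idle time, and that a year has 365 days. The function name `relay_lifetime_years` is hypothetical.

```python
# Hypothetical helper: estimate relay lifetime for a bursty duty cycle.
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumes a 365-day year


def relay_lifetime_years(max_cycles, switch_rate_hz, burst_s, idle_s):
    """Years until max_cycles on/off events are used up.

    Assumes the relay switches at switch_rate_hz during each burst of
    burst_s seconds and does not switch during the following idle_s seconds.
    """
    cycles_per_period = switch_rate_hz * burst_s  # switching events per burst
    periods = max_cycles / cycles_per_period      # bursts the relay can survive
    return periods * (burst_s + idle_s) / SECONDS_PER_YEAR


# Values quoted in Section III: 2.1e9 cycles, 1 kHz for 1 s, then 4 min idle.
years = relay_lifetime_years(2.1e9, 1e3, 1.0, 240.0)
print(f"estimated lifetime: {years:.1f} years")
```

Under these assumptions the estimate comes to about 16 years, close to the roughly 17 years quoted; the exact value depends on how the duty cycle and the cycle count are defined.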
Table III: Current and scaled MEM relay physical parameters obtained with the COMSOL multiphysics tool.
Figure 3: 3D FEA of: (a) anchor MEMS [11]; (b) 4-terminal MEMS with double springs.
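As a first-order cross-check on FEA pull-in results such as those of Figs. 2 and 3, the standard one-degree-of-freedom (1-DOF) parallel-plate model relates lumped parameters (stiffness k, mass m) to the resonant frequency, the pull-in voltage Vpi, and the release voltage Vrl described in Section II. This is a textbook idealisation, not the 3D FEA used in this work, and Section III.B notes that 1-DOF analysis can be inaccurate. It assumes, as in the usual relay model, that gd is the travel needed for the dimples to make contact, so the remaining actuation gap in contact is g0 - gd. The spring constant below is a hypothetical value; g0=200nm and A=450 µm² follow the Fig. 4 caption.

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity (F/m)


def resonant_frequency(k, m):
    """Angular resonant frequency w0 = sqrt(k/m) of the lumped spring-mass model (rad/s)."""
    return math.sqrt(k / m)


def pull_in_voltage(k, g0, area):
    """Ideal parallel-plate pull-in voltage (V): sqrt(8*k*g0^3 / (27*eps0*A))."""
    return math.sqrt(8.0 * k * g0**3 / (27.0 * EPS0 * area))


def release_voltage(k, g0, gd, area):
    """Release voltage (V) once the dimples have touched down after travelling gd.

    In contact the spring force is k*gd and the remaining actuation gap is
    g0 - gd; the relay releases when k*gd > eps0*A*V^2 / (2*(g0 - gd)^2).
    """
    if not 0.0 < gd < g0:
        raise ValueError("dimple gap must satisfy 0 < gd < g0")
    return math.sqrt(2.0 * k * gd * (g0 - gd) ** 2 / (EPS0 * area))


k = 10.0        # spring constant (N/m), hypothetical
g0 = 200e-9     # actuation gap (m)
area = 450e-12  # actuation area: 450 um^2 expressed in m^2
gd = 0.5 * g0   # gap ratio gd/g0 = 0.5

v_pi = pull_in_voltage(k, g0, area)
v_rl = release_voltage(k, g0, gd, area)
print(f"Vpi = {v_pi:.2f} V, Vrl = {v_rl:.2f} V")
```

In this idealised model the relay shows hysteresis (Vrl < Vpi) only when gd > g0/3; at gd = g0/3 the two voltages coincide.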
A. Energy-Latency Analysis of N/MEMS
An extensive parametric sweep is performed in this work using the COMSOL multiphysics tool to estimate the range of electro-mechanical parameters for the fabricated 450 µm² relay and the scaled 45 µm² and 4.5 µm² relays, so that the energy-latency trade-offs of N/MEMS can be optimised. These parameters are listed in Table III.

The result in Fig. 4 (a) shows the switching energy of the MEMS, obtained by 3D FEA, as a function of the dimple gap (gd) and resonant frequency (w). As can be seen, increasing gd causes an almost linear increase in switching energy at low w. In contrast, at high gd, switching energy increases exponentially with the resonant frequency w, which is varied by sweeping the ratio L/W. Fig. 4 (b) shows the simulation results for the mechanical delay time as a function of the gap ratio (gd/g0) and resonant frequency (w). One […] increase in (gd/g0), which is consistent with the predictive analytical equation of the previous study [13].

B. Implications of Scaling on the Energy-Latency Trade-off
As with CMOS, scaling down the MEMS parameters leads to greater energy savings. A 1-DOF optimization based on a variable-scaling-factor methodology has been proposed in previous work [13]. However, single-DOF analysis can lead to inaccurate outcomes. As a result, 3D FEA coupled with the sweeping of scaled […] experiment, it is postulated that area (A), gate thickness (h), […] limit, and it may not be possible to scale them down as readily as other parameters. For instance, g0 is limited by nano-gap formation technology, while the gate thickness (h) will be set by process technology constraints. Fig. 4 (d) demonstrates a significant reduction in the switching energy of the scaled MEMS. Furthermore, these results clearly indicate a better trade-off between switching energy and mechanical delay time than that of the fabricated MEMS (A=450 µm²). For example, at gd/g0 = 0.5, every ∼5× increase in switching energy can be traded off for a ∼5× reduction in the scaled delay, as can be seen in Fig. 4 (c-d).

Figure 4: (a) Switching energy from 3D FEA at g0=200nm and A=450 µm² as a function of gd and resonant frequency (w); (b) Tmech as a function of gap ratio and resonant frequency, obtained from 3D FEA at A=450 µm²; (c) Tmech of the scaled relay at g0=40nm and A=45 µm² as a function of gd and resonant frequency; (d) switching energy of the scaled relay.

IV. ASYNCHRONOUS PIPELINE STAGE
The asynchronous micropipeline, shown in Fig. 5, provides an event-driven scheme that uses localized handshaking signals to transfer data sequentially through the pipeline from one stage to another [14]. When the input data is ready to be sent, a request signal req_in is generated, enabling the latches to capture the input data through EN1 and to raise the output request signal req_1. The previous stage then waits for an acknowledgement signal from the next stage, which is asserted once the data has been computed and is ready to be stored into the latches. While the latched data of the previous stage is being processed by the combinational circuits, the request signal req_1 also passes through a matched delay element. The delay of the DL block is set to be no less than the worst-case delay of the combinational circuits, which guarantees that correct data values are captured by the latches.

Figure 6: (a) timing diagram; (b) power-gated asynchronous micropipeline [6].
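The 4-phase bundled-data handshake described above can be summarised as a sequence of timestamped events. The sketch below is a behavioural illustration only: it assumes a symmetric delay line (both edges of req_1 are delayed by the DL delay), a single lumped controller response time, and hypothetical delay values; the class and function names are likewise hypothetical. The interval during which req_1 is high corresponds to the window in which req_1 would keep the CC block powered in the power-gated scheme of Fig. 6.

```python
from dataclasses import dataclass


@dataclass
class BundledStage:
    cc_worst_delay: float  # worst-case combinational (CC) delay, s
    dl_delay: float        # matched delay-line (DL) delay, s
    ctrl_delay: float      # lumped handshake-controller response time, s (assumed)


def four_phase_cycle(stage, t0=0.0):
    """Return (time, event) pairs for one return-to-zero handshake on req_1/ack_1."""
    # Bundling constraint: the DL delay must not be shorter than the worst-case CC delay.
    if stage.dl_delay < stage.cc_worst_delay:
        raise ValueError("delay line is shorter than the worst-case CC delay")
    req_up = t0                                              # data latched, req_1 rises
    ack_up = req_up + stage.dl_delay + stage.ctrl_delay      # delayed req reaches H/S2: EN2, ack_1 rise
    req_down = ack_up + stage.ctrl_delay                     # H/S1 sees ack_1 and withdraws req_1
    ack_down = req_down + stage.dl_delay + stage.ctrl_delay  # falling req propagates, ack_1 returns to zero
    return [(req_up, "req_1+"), (ack_up, "ack_1+"),
            (req_down, "req_1-"), (ack_down, "ack_1-")]


def req_high_time(events):
    """Time for which req_1 is high, i.e. the CC power-on window when req_1 drives the switch."""
    times = {name: t for t, name in events}
    return times["req_1-"] - times["req_1+"]


stage = BundledStage(cc_worst_delay=4e-9, dl_delay=5e-9, ctrl_delay=0.5e-9)
events = four_phase_cycle(stage)
for t, name in events:
    print(f"{t * 1e9:5.1f} ns  {name}")
print(f"req_1 high for {req_high_time(events) * 1e9:.1f} ns")
```

Setting dl_delay below cc_worst_delay raises an error, mirroring the requirement that the delay line never be faster than the logic it matches.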
V. POWER GATING IN ASYNCHRONOUS MICROPIPELINE
In this section, we briefly discuss power gating techniques in asynchronous micropipelines using sleep transistors. Furthermore, using the optimized relay design (Section III), an N/MEMS-based power gating controller is proposed. In the following sections we briefly describe our approach, highlighting the amount of […]

A. Conventional power gating in asynchronous micropipeline
The req_1 signal, as shown in Fig. 6 (b), is used as a sleep control signal to turn the CC block off during its idle state and on during its active period. This enables so-called "just-in-time" computation. Consequently, leakage current can be reduced during the inactive period, illustrated by Toff in the timing diagram of Fig. 6 (a). When data is asserted, the request signal req_1 is raised high and passed simultaneously to the sleep transistors and to the DL block as a turn-on signal. This powers the CC block, so that the incoming data is accepted from the data bus during Ton. The time required for the CC block to compute on the incoming data and produce stable output values is denoted Teval. As req_1 passes through the DL block, handshaking controller 2 (H/S2 Ctrl) generates the latch enable signal EN2 and Ack_1, which is transmitted back to handshaking controller 1 (H/S1 Ctrl). Handshaking controller 1 then de-asserts req_1 once Ack_1 is received, thereby turning off the CC block. A reduction of about 70% in leakage energy dissipation has been reported for this approach compared with one without power gating [6]. It should be noted that this study only evaluated the leakage energy reduction, without taking into account the energy overhead that the power gating circuitry adds to the total energy dissipation. As a result, this approach can only be energy-efficient when the energy saved is much greater than the energy consumed in switching the power switch network (PSN) on and off. Moreover, further energy savings of between 5% and 12% can be obtained by including the DL blocks in the power gating domains [16], as shown in Fig. 7 (a). However, sleep transistors have fundamental limitations in their effectiveness. Therefore, the present research investigates using N/MEMS for effective idle energy minimization.

B. N/MEMS power gating in asynchronous micropipeline
Further improvements to the previous paradigm can be achieved by employing N/MEMS to power gate both the DL and CC blocks, as can be seen in Fig. 7 (b). In this study, an on-chip charge pump for MEMS implementations has been optimised to meet the pull-in voltage requirement of any chosen type of MEMS. For example, adopting a charge pump is found to increase the delay overhead of the MEMS in Fig. 2 by 0.5 μs, while its impact on the energy overhead is slight (0.5 pJ). Two benefits can be observed from adopting N/MEMS in the PSN of Fig. 7 (b). First, unlike sleep transistors, N/MEMS switches exhibit zero leakage current; by contrast, to avoid performance degradation, especially in ultra-low-power applications, sleep transistors have to be made wider, which leads to significantly greater leakage current. Second, fewer N/MEMS than CMOS counterparts are required in both the DL block and the PSN, owing to their lower Ron compared with sleep transistors. When req_1 is de-asserted, the DL block output will be floating during the disconnection period caused by the long Tmech delay in turning on the other N/MEMS in the buffer circuit. Therefore, employing the AND gate shown in Fig. 7 (b) is essential to ensure that the output of the DL block is always driven to either logic low or logic high and is never floating.
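The condition stated above, that power gating pays off only when the energy saved exceeds the energy spent switching the PSN, can be made concrete with a simple per-computation energy model. The sketch below is an illustration under stated assumptions, not the evaluation method used in this work: it treats leakage as a constant power, charges the switching overhead once per computation, and uses hypothetical parameter values throughout; a zero off-state leakage stands in for the ideal N/MEMS switch. The names `PowerSwitch`, `energy_per_computation`, and `break_even_idle_time` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PowerSwitch:
    p_off: float     # leakage power of the gated block while switched off (W)
    e_switch: float  # energy to switch the PSN on and off once (J)


def energy_per_computation(rate_hz, e_dyn, p_leak, t_active, switch=None):
    """Energy (J) of one computation when a new input arrives every 1/rate_hz s."""
    period = 1.0 / rate_hz
    if t_active > period:
        raise ValueError("data rate exceeds the block's natural throughput")
    t_idle = period - t_active
    if switch is None:                 # no power gating: leaks for the whole period
        return e_dyn + p_leak * period
    return (e_dyn + p_leak * t_active  # powered only while computing
            + switch.p_off * t_idle    # residual leakage while gated off
            + switch.e_switch)         # one on/off transition per computation


def break_even_idle_time(p_leak, switch):
    """Idle time above which gating saves energy: (p_leak - p_off) * t_idle > e_switch."""
    return switch.e_switch / (p_leak - switch.p_off)


# Hypothetical parameters, for illustration only.
E_DYN, P_LEAK, T_ACTIVE = 10e-12, 1e-6, 1e-6
sleep = PowerSwitch(p_off=0.1e-6, e_switch=2e-12)  # sleep transistor: leaks when off
nems = PowerSwitch(p_off=0.0, e_switch=1.5e-12)    # N/MEMS: zero off-state leakage

for rate in (1e3, 1e6):
    base = energy_per_computation(rate, E_DYN, P_LEAK, T_ACTIVE)
    for name, sw in (("sleep", sleep), ("nems", nems)):
        e = energy_per_computation(rate, E_DYN, P_LEAK, T_ACTIVE, sw)
        print(f"{rate:9.0f} Hz  {name:5s}  saving = {100 * (1 - e / base):6.1f} %")
print("break-even idle time (s):",
      break_even_idle_time(P_LEAK, sleep), break_even_idle_time(P_LEAK, nems))
```

With these placeholder numbers both switches save energy at 1 kHz but cost energy at 1 MHz, where there is no idle time in which to recover the switching overhead; because the N/MEMS switch has no off-state leakage, its advantage over the sleep transistor grows with the idle time.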
Figure 5: Asynchronous 4-phase bundled-data micropipeline [6].
Figure 7: Schematic illustration of the power gating approach in an asynchronous micropipeline using: (a) sleep transistors; (b) the proposed N/MEMS.
Figure 8: Asynchronous 32-tap FIR filter implemented based on the proposed approach.

VI. EVALUATION SETUP
To validate the proposed approach and to find the conditions under which power gating in asynchronous micropipelines using N/MEM power switches becomes beneficial, 8- and 32-tap FIR filters were designed. These FIR filters are implemented in the 90nm CMOS technology node, as shown in Fig. 8. They consist of two combinational blocks, an accumulator and a multiplier. Each combinational block is powered by its own power domain, so that the multiplier is
powered by power domain 1 (Vdd PD1) while the accumulator is powered by power domain 2 (Vdd PD2). These two power domains are supplied from the main power supply voltage (Vdd) through an array of PSN switches, S1 and S2, as indicated in Fig. 8. The delay line (DL) block in each stage of the micropipeline is now connected to the power domain of the corresponding combinational circuit (CC) block. The acknowledgement signal ack_out is only generated when the controller receives a request signal req_in and performs N handshake cycles with the micropipeline. When the req_0 signal is set high, switch array S1 turns on, connecting the supply voltage Vdd to power domain 1 (Vdd PD1). This powers up the multiplier and its corresponding delay line (DL) block. One branch of the req_0 signal bypasses the DL block and is ANDed with its delayed version, generating the signal passed to handshaking controller 1 (H/S Ctrl). Vdd PD1 can be powered down when req_0 is de-asserted, which leaves one input of the AND gate floating. However, the output of the AND gate is still held at logic low by req_0.

The flowchart of our evaluation process is presented in Fig. 9. First, the N/MEMS switch is modelled and designed using 3D FEA performed in the COMSOL multiphysics tool. Pre-defined characteristics, including dimensions, geometric shape, material, and actuation type, are used as inputs to COMSOL. The mechanical and electrical lumped parameters are then obtained by performing frequency, transient, and parametric sweep analyses. These lumped parameters are written in Verilog-AMS and co-simulated in the Cadence simulator, as illustrated in [15] [16]. In parallel, VHDL code is written for the 32-tap FIR filter, which consists of two combinational blocks, 32-bit […] Synopsys simulator, the area, dynamic energy, and static energy have been optimized. Finally, the combined N/MEMS-CMOS design is simulated using the Cadence simulation tool.

Figure 9: Evaluation flow: COMSOL multiphysics 3D FEA simulator (device dimensions, geometry, material; parametric sweep, transient and frequency analysis; contact modelling) → extracted device parameters (stiffness k, damping b, mass m, pull-in voltage Vpi) → Verilog-AMS electrical and mechanical lumped model; VHDL code (FIR filter, AES key generator) → Synopsys synthesis (dynamic and static power, area) → Cadence simulator → results (mechanical time, switching energy, total energy, timing).

VII. RESULTS
Our approach was evaluated and compared with various set-ups from previous work [6] [17]. All set-ups are powered by a supply voltage of Vdd=0.6V. The PSN uses PMOS sleep transistors of width 4 µm: 20 in S1 and 15 in S2. The total energy dissipation was evaluated and the energy per computation was recorded. Furthermore, the total leakage energy (i.e. caused by the DL and CC blocks) of each set-up was recorded. Moreover, the overall energy overhead of the proposed approach caused by adding the N/MEMS and the charge pump was evaluated. It should be noted that this filter only works for data rates that do not exceed its natural throughput at Vdd=0.6V. Table IV shows the simulation results of the 8-tap FIR filter for the various set-ups.

Table IV: Total energy/computation for various asynchronous PG configurations.
Table V: Energy consumption for 32-tap FIR filter at various asynchronous PG setups.
Energy/computation ( J)
NEMSPG
1       1601.9   43.0    3171   4640
10      172.6    39.0    336    464.9
100     30.9     24.0    52.5   81.70
400     18.6     7.0     28.8   45.90
800     16.8     -2.43   24.9   40.60
1000    16.5     -3.12   24.1   39.60

It can be deduced from these results that decreasing the data rate from 1 MHz to 1 kHz results in an increase in the dissipated energy/computation of the four set-ups. This