An Energy-Efficient Resilient Flip-Flop Circuit With Built-In Timing-Error Detection and Correction

Che-Min Huang, Tsung-Te Liu, and Tzi-Dar Chiueh
Graduate Institute of Electronics Engineering and Department of Electrical Engineering,
National Taiwan University, Taipei, Taiwan 10617.

ABSTRACT
This paper presents a timing-error-resilient flip-flop (ERFF)
circuit with high energy efficiency. The proposed flip-flop
automatically corrects timing errors and therefore minimizes the
performance degradation due to variations. Simulation results
show that the proposed design achieves better energy efficiency
in ISCAS’89 benchmark circuits and a LEON3 integer-processing
unit than other state-of-the-art timing error detection and
correction methods.

Keywords: DVFS, timing error detection and correction, wide-operating-voltage processor, energy efficient, flip-flop

I. INTRODUCTION
Today’s mobile devices must support various applications and operating
scenarios while still maintaining reasonable operating and standby time.
However, mobile users seldom run computation-intensive applications, such
as playing games and watching movies, over a long period of time on
handheld devices. Instead, they spend much of their time sending and
receiving text messages, surfing the web, and running other less
computation-intensive applications. Therefore, processors for handheld
devices must be capable of operating over a wide dynamic range to support
an even wider spectrum of application scenarios. Energy-efficient processor
designs with a wide operating range will no doubt be one of the key
features of next-generation digital SoCs.

The speed performance and energy consumption of a digital IC implemented
in an advanced process are highly dependent on its operating voltage, as
shown in Fig. 1. At high supply voltages, the circuit operates at higher
speed but consumes more energy. On the other hand, the circuit slows down
significantly as the supply voltage is scaled down, while significant
energy can potentially be saved. The optimum energy operating point (0.4V
in this example) is around the device threshold voltage [1]. As a result,
circuits with dynamic voltage-frequency scaling (DVFS) capability [2] can
achieve higher energy-efficiency and would become indispensable in the
design of digital circuits, especially for energy-constrained applications.

FIGURE 1. CIRCUIT ENERGY CONSUMPTION AS A FUNCTION OF OPERATING SUPPLY
VOLTAGE IN A TYPICAL DIGITAL SYSTEM.

One of the major challenges of employing the DVFS technique is that
variations in process, temperature and supply voltage can significantly
impact circuit functionality and performance, especially at lower supply
voltages. Several timing error detection and correction methods have been
proposed in the literature. In [3], the Razor circuit employs a flip-flop
and a latch to determine whether the incoming data is late, and uses the
entire pipeline architecture to do error correction. The Razor II circuit
uses a latch as the main circuit to correct errors, and employs a signal
change detector to detect timing errors [4]. On the other hand, the Bubble
Razor technique tries to extend the equivalent operating cycle when timing
errors occur, and then performs the cycle extension in both data
propagation directions of the pipeline [5]. All the Razor series of
techniques require architectural effort to recover from or to avoid timing
errors. In [6], two similar time-borrowing methods, Double Sampling with
Time Borrowing (DSTB) and Transition Detector with Time Borrowing (TDTB),
were proposed. In [7], the Soft-Edge Flip-flop (SEF) uses two clock signals
to create a transparency window that can achieve timing error detection
and correction.

In this paper, we propose a new flip-flop circuit with built-in
timing-error detection and correction capability, called the error
resilient flip-flop (ERFF). We designed and implemented a LEON3
integer-processing unit with the proposed ERFF circuit. ERFF not only
lowers the impact of variations to enhance the speed performance, but also
reduces the power consumption to achieve higher energy-efficiency. Compared
to other state-of-the-art solutions, ERFF realizes timing error detection
and correction capability with the least power consumption.

II. ERROR RESILIENT FLIP-FLOP (ERFF)

Fig. 2 shows the proposed timing error resilient flip-flop (ERFF) circuit,
which is composed of four parts. The first two parts are similar to the
master latch and the slave latch in traditional CMOS transmission-gate
based D-type flip-flops. The third part is a 2-1 multiplexer, which
bypasses the master latch if a transition of the input signal “D” occurs
after the rising edge of the clock “CLK”. The last part is the late
detector, which consists of a transmission-gate based XOR and a dynamic
logic circuit. When D changes after CLK goes high, the “Late” signal is
asserted and remains asserted until CLK goes low. This causes D to bypass
the master latch of ERFF. As a result, the ERFF output signal “Q” simply
follows its input D for the remainder of the positive CLK phase.

Fig. 3 illustrates the timing diagram of the ERFF circuit for two different
operating scenarios. If the data “D” is stable before the rising edge of
the clock “CLK”, the output signal “Q” follows the data and ERFF acts just
like a conventional flip-flop. However, if D fails to become stable before
a CLK rising edge and continues to change after it, the “Late” signal is
asserted for that half CLK cycle and the ERFF output Q follows its input D.
As a result, the error detection window of ERFF approximately equals the
entire half clock cycle in which CLK is high, as shown in Fig. 3. The time
beyond the error detection window is reserved as setup time for generating
the Late signal and switching the multiplexer.
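
The behaviour described above can be illustrated with a minimal behavioural sketch in Python. It is not the transistor-level circuit of Fig. 2: it assumes ideal, zero-delay elements and a sampled-time view of D and CLK, and the name simulate_erff and its arguments are hypothetical, chosen only for illustration.

```python
def simulate_erff(d, clk, q_init=0):
    """Sampled-time behavioural sketch of an ERFF (idealized, zero delay).

    d, clk: equal-length sequences of 0/1 samples.
    Returns (q_trace, late_trace), one value per sample.
    """
    q = q_init
    late = 0
    sampled = q_init          # value captured at the last rising CLK edge
    prev_clk = 0
    prev_d = d[0]
    q_trace, late_trace = [], []

    for d_now, clk_now in zip(d, clk):
        if clk_now and not prev_clk:
            # Rising edge: behave like a conventional flip-flop and
            # capture the value D had just before the edge.
            sampled = prev_d
            q = sampled
        if clk_now:
            # Late detector: any change of D while CLK is high asserts
            # Late, which stays asserted until CLK goes low.
            if d_now != sampled:
                late = 1
            if late:
                # Multiplexer bypasses the master latch: Q follows D.
                q = d_now
        else:
            # CLK low: Late is cleared and Q holds its last value.
            late = 0
        q_trace.append(q)
        late_trace.append(late)
        prev_clk, prev_d = clk_now, d_now

    return q_trace, late_trace


# Scenario 1: D settles before the rising edge (conventional behaviour).
q_on_time, late_on_time = simulate_erff(
    d=[0, 1, 1, 1, 1, 1], clk=[0, 0, 1, 1, 0, 0])

# Scenario 2: D arrives one sample after the rising edge (late data).
q_late, late_late = simulate_erff(
    d=[0, 0, 0, 1, 1, 1, 1], clk=[0, 0, 1, 1, 1, 0, 0])

print(q_on_time, late_on_time)
print(q_late, late_late)
```

In the second scenario a conventional flip-flop would keep Q at 0 until the next rising edge, whereas in this sketch Q picks up the late value of D within the same high phase, which is the time-borrowing effect the text describes.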

978-1-4799-6275-4/15/$31.00 ©2015 IEEE


FIGURE 2. PROPOSED ERROR RESILIENT FLIP-FLOP (ERFF) CIRCUIT DIAGRAM.

FIGURE 3. ERFF TIMING DIAGRAM.

The proposed ERFF circuit with its built-in timing error detection and
correction mechanism can significantly improve the performance of pipelined
digital systems. This is illustrated in Fig. 4 by two operation scenarios
of a multi-stage pipeline operating at a nominal (1000MHz) and a
higher-than-nominal (1200MHz) clock frequency, respectively. We assume the
shortest path of the pipeline is 0.6ns, and the longest path exists in the
first stage and takes 0.95ns. The top plot represents the nominal operation
case at 1000MHz. In order to further enhance the pipeline throughput, a
1200MHz clock is employed instead, as shown in the bottom plot. Since the
clock period (about 0.83ns) is shorter than the critical path delay
(0.95ns), there would be timing errors in the traditional pipeline
architecture. On the other hand, for an ERFF based pipeline structure, the
computation result of the first pipeline stage can extend over the next
rising clock edge. The first stage and the next two stages can absorb the
longer-than-nominal path delay through the time-borrowing capability of the
ERFF based pipeline structure.

FIGURE 4. TIMING DIAGRAM OF A MULTI-STAGE PIPELINE STRUCTURE WITH TIMING
ERROR DETECTION AND CORRECTION.

The size of the error detection window directly impacts the effectiveness
of a timing error detection and correction method and is therefore an
important performance index. As a result, we define a new figure of merit
(FOM), which is the ratio of the detection window size to the power
consumption of the flip-flop circuit, to compare the effectiveness and
energy-efficiency of different error detection and correction methods:

FOM = (error detection window size) / (flip-flop power consumption)

TABLE 1. FOM COMPARISON OF DIFFERENT ERROR DETECTION AND CORRECTION
METHODS.

                    ERFF    TDTB [6]   DSTB [6]   SEF [7]
Power (μW)          4.10    11.6       8.69       5.41
Window size (ns)    1.84    1.92       1.92       1.84
FOM (ns/μW)         0.449   0.166      0.221      0.340

Table 1 compares the power, error detection window size and FOM of the
proposed ERFF circuit with other time-borrowing designs. The proposed ERFF
design consumes the least power and achieves the best FOM, while its error
detection window is shorter by less than 5% (1.84ns versus 1.92ns for TDTB
and DSTB).

III. CIRCUIT VERIFICATION

A. ISCAS’89 Benchmark Circuits
We verified our ERFF design and other time-borrowing error detection and
correction flip-flop designs in three digital circuits (s349, s386, s1196)
from the ISCAS’89 benchmark database. The verification flow is as follows.
First, we used the clock frequency of a traditional transmission-gate
master-slave flip-flop (TGFF) design at 1V in 90nm CMOS technology as our
baseline clock frequency (fTGFF). Second, we used this baseline frequency
as the input clock frequency for the different designs, and then lowered
the supply voltage until the test circuits started to show errors. Fig. 5
compares the normalized energy consumption of the different designs with
respect to TGFF-based circuits. ERFF consumes the least energy among all
designs, saving about 30% to 50% when compared to TGFF-based circuits.

FIGURE 5. NORMALIZED ENERGY CONSUMPTION OF DIFFERENT ERROR DETECTION AND
CORRECTION METHODS AT THE SAME OPERATING FREQUENCY (FTGFF @ 1V) FOR THREE
ISCAS’89 BENCHMARK CIRCUITS IN 90NM CMOS TECHNOLOGY.

We next performed Monte Carlo simulation to evaluate circuit performance
under process variations. A standard deviation of 28mV in device threshold
voltage is introduced in 90nm CMOS technology. Table 2 summarizes the
maximum operating frequencies of s1196 circuits under variations using
different error detection and correction methods for supply voltages
ranging from 1V down to 0.3V. All error detection and correction methods
can operate faster than the TGFF-based circuit. This further demonstrates
that circuits employing timing error detection and correction techniques
can significantly reduce the performance degradation caused by process
variations. To compare the energy-efficiency of different designs at low
supply voltages under variations, we use the clock frequency of TGFF-based
s1196 circuits at 0.4V as our baseline clock frequency (fTGFF = 60MHz) and
compare the power-delay products (PDP) of the different designs. The
simulation results show that the ERFF based implementation consumes the
least energy among all designs, saving about 15% when compared to
TGFF-based circuits.

TABLE 2. MAXIMUM OPERATING FREQUENCIES OF DIFFERENT ERROR DETECTION AND
CORRECTION METHODS FOR S1196 CIRCUIT UNDER VARIATIONS FROM 0.3V TO 1V.

VDD (V)       0.3   0.4   0.5   0.6   0.7    0.8    0.9    1.0
TGFF (MHz)    10    60    150   300   500    650    850    950
ERFF (MHz)    27    120   300   650   950    1350   1650   1950
DSTB (MHz)    29    130   350   650   1000   1350   1700   2000
TDTB (MHz)    29    130   350   650   1000   1350   1700   2000
SEF (MHz)     32    130   350   600   900    1250   1450   1700
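
To make the frequency gains in Table 2 easier to read, the following short Python sketch computes the ratio of each method's maximum operating frequency to that of the TGFF baseline at every supply voltage. The numbers are copied from Table 2; the names FMAX_MHZ and speedup_over_tgff are hypothetical and are introduced here only for illustration.

```python
# Supply voltages (V) and maximum operating frequencies (MHz) for the
# s1196 circuit under variations, copied from Table 2.
VDD = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
FMAX_MHZ = {
    "TGFF": [10, 60, 150, 300, 500, 650, 850, 950],
    "ERFF": [27, 120, 300, 650, 950, 1350, 1650, 1950],
    "DSTB": [29, 130, 350, 650, 1000, 1350, 1700, 2000],
    "TDTB": [29, 130, 350, 650, 1000, 1350, 1700, 2000],
    "SEF":  [32, 130, 350, 600, 900, 1250, 1450, 1700],
}


def speedup_over_tgff(fmax):
    """Return, per method, the frequency ratio to TGFF at each voltage."""
    base = fmax["TGFF"]
    return {
        name: [f / b for f, b in zip(freqs, base)]
        for name, freqs in fmax.items()
        if name != "TGFF"
    }


speedups = speedup_over_tgff(FMAX_MHZ)

# Print one row per method, one column per supply voltage.
print("VDD ", " ".join(f"{v:5.1f}" for v in VDD))
for name, ratios in speedups.items():
    print(f"{name:4s}", " ".join(f"{r:5.2f}" for r in ratios))
```

With the Table 2 values, the ERFF ratio ranges from 1.9 (at 0.7V) to 2.7 (at 0.3V), so the largest relative gain appears at the lowest supply voltage.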

B. LEON3 Integer-Processing Unit
LEON3 is an open-source 32-bit RISC processor that has been adopted in
several commercial SoC products [8]. We used the LEON3 integer-processing
unit (IU) with a seven-stage pipeline and implemented the processor in 90nm
CMOS technology using different timing error detection and correction
methods. We again used the maximum frequency at which the TGFF-based
implementation can operate at 0.4V (fTGFF = 90MHz) as our baseline clock
frequency and compared the processor power of the different designs. Fig. 6
shows the simulated processor power of the different designs at 90MHz and
the corresponding power breakdown. The ERFF based implementation is the
only design that consumes lower power than the TGFF-based implementation,
by 7%. Simulation results show that ERFF also reduces power by 12.5% and
18% relative to the baseline design at 550MHz and 1000MHz, respectively.

FIGURE 6. POWER CONSUMPTION OF LEON3 IU AT 90MHZ USING DIFFERENT TIMING
ERROR DETECTION AND CORRECTION METHODS.

We also evaluated the LEON3 IU performance under variations at low supply
voltages. Table 3 summarizes the processor performance with different
timing error detection and correction methods at 0.3V. Process variation
causes the maximum processor operating frequency to drop by about 12% to
16%. It should be noted that all the timing error detection and correction
methods help achieve a higher processor operating frequency than the
TGFF-based design at low supply voltages. Although the ERFF based design
realizes a slightly lower operating frequency (21MHz) than the DSTB and
TDTB implementations (22MHz), it consumes the least energy among all
designs, as shown in Fig. 7. The designs based on other error detection and
correction methods all consume more energy than the TGFF based design.

TABLE 3. MAXIMUM OPERATING FREQUENCY OF LEON3 IU AT 0.3V UNDER VARIATIONS.

                             TGFF   ERFF   DSTB   TDTB   SEF
No process variation (MHz)   20     25     25     25     25
Process variation (MHz)      17     21     22     22     21
Performance loss             15%    16%    12%    12%    16%

FIGURE 7. SIMULATED PDP RESULTS OF DIFFERENT LEON3 IU IMPLEMENTATIONS AT
0.3V UNDER VARIATIONS.

IV. DISCUSSIONS
When we implemented the LEON3 IU using the proposed ERFF circuit, we
replaced the flip-flops only in the most critical paths, which make up
about 20% of the signal paths of the processor. If the path delays between
adjacent stages in a pipeline are distributed more uniformly, more
flip-flops must be replaced with ones that provide timing error detection
and correction. For example, hundreds of flip-flops with timing error
detection and correction function are required even if we replace
flip-flops only in the top 10% of the most critical paths in an ARM
Cortex-M3 processor [9]. As a result, the power spent on the sequential
circuits of the processor would increase dramatically. Since the proposed
ERFF circuit exhibits high energy-efficiency over a wide range of supply
voltages, it is the most promising candidate for DVFS processors that
require timing error detection and correction to recover performance under
variations.

V. CONCLUSIONS
In this paper, we propose a timing error resilient flip-flop (ERFF)
circuit and demonstrate its performance advantages through
implementations in ISCAS’89 benchmark circuits and the LEON3
integer-processing unit. Compared to other state-of-the-art timing
error detection and correction methods, the proposed ERFF circuit
consumes the least power with a comparable error detection window
size. The LEON3 integer-processing unit implementation using the
proposed ERFF circuit can reduce its power by 18% and 7% at 1V and
0.3V, respectively, realizing high energy-efficiency across a wide
range of supply voltages.

REFERENCES
[1] A. Wang and A. Chandrakasan, “A 180-mV subthreshold FFT processor
using a minimum energy design methodology,” IEEE Journal of Solid-State
Circuits, vol. 40, no. 1, pp. 310–319, Jan. 2005.
[2] J. Howard, S. Dighe, S. Vangal, G. Ruhl, N. Borkar, S. Jain, V.
Erraguntla, M. Konow, M. Riepen, M. Gries, G. Droege, T. Lund-Larsen, S.
Steibl, S. Borkar, V. De, and R. Van Der Wijngaart, “A 48-Core IA-32
Processor in 45 nm CMOS Using On-Die Message-Passing and DVFS for
Performance and Power Scaling,” IEEE Journal of Solid-State Circuits,
vol. 46, no. 1, pp. 173–183, Jan. 2011.
[3] D. Ernst, N.-S. Kim, S. Das, S. Pant, R. Rao, T. Pham, C. Ziesler, D.
Blaauw, T. Austin, K. Flautner, and T. Mudge, “Razor: a low-power pipeline
based on circuit-level timing speculation,” Proc. of IEEE/ACM
International Symposium on Microarchitecture, Dec. 2003.
[4] S. Das, C. Tokunaga, S. Pant, W.-H. Ma, S. Kalaiselvan, K. Lai, D.
Bull, and D. Blaauw, “RazorII: In Situ Error Detection and Correction for
PVT and SER Tolerance,” IEEE Journal of Solid-State Circuits, vol. 44,
no. 1, pp. 32–48, Jan. 2009.
[5] M. Fojtik, D. Fick, Y. Kim, N. Pinckney, D. Harris, D. Blaauw, and D.
Sylvester, “Bubble Razor: An Architecture-Independent Approach to
Timing-Error Detection and Correction,” IEEE ISSCC Dig. Tech. Papers,
pp. 488–489, Feb. 2012.
[6] K. Bowman, J. Tschanz, N.-S. Kim, J. Lee, C. Wilkerson, S. Lu, T.
Karnik, and V. De, “Energy-Efficient and Metastability-Immune Resilient
Circuits for Dynamic Variation Tolerance,” IEEE Journal of Solid-State
Circuits, vol. 44, no. 1, pp. 49–63, Jan. 2009.
[7] S. Dillen, D. Priore, A. Horiuchi, and S. Naffziger, “Design and
implementation of soft-edge flip-flops for x86-64 AMD microprocessor
modules,” Proc. of IEEE Custom Integrated Circuits Conference, Sept. 2012.
[8] http://www.gaisler.com/index.php/products/processors/leon3
[9] S. Kim, I. Kwon, D. Fick, M. Kim, Y.-P. Chen, and D. Sylvester,
“Razor-lite: A side-channel error-detection register for timing-margin
recovery in 45nm SOI CMOS,” IEEE ISSCC Dig. Tech. Papers, pp. 264–265,
Feb. 2013.