Constant Delay Logic Style
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 21, NO. 3, MARCH 2013
I. INTRODUCTION
Fig. 1. Schematic of (a) dynamic domino logic with a footer transistor and
(b) FTL.
B. CD Logic
To mitigate the above-mentioned problems, CD logic is
proposed, with the schematic shown in Fig. 3(a). The timing block
(TB) creates an adjustable window period to reduce the static
power dissipation. The logic block (LB) reduces the
unwanted glitch and also makes cascading CD logic feasible.
A buffer implemented in CD logic, with schematics of the TB and
LB, is shown in Fig. 3(b).
1) CD Logic Operation: Fig. 4 depicts the corresponding
CD logic timing diagram and flowchart. For simplicity, we
assume that the inputs IN come from dynamic domino logic gates. When
CLK is high, CD logic predischarges both X and Y to GND.
When CLK is low, CD logic enters the evaluation period, and
one of three scenarios takes place: the contention, C-Q
delay, or D-Q delay mode. The contention mode happens
when CLK is low while IN remain at logic 1. In this case, X
is at a nonzero voltage level, which causes Out to experience
a temporary glitch. The duration of this glitch is set
by the local window width, which in turn is determined by the delay
between CLK and CLK_d. When CLK_d becomes high, and
if X remains low, then Y rises to logic 1 and turns off M1.
The contention period is then over, and the temporary glitch at
Out is eliminated. The C-Q delay mode takes place when IN make
a transition from high to low before CLK becomes low. When
CLK becomes low, X rises to logic 1 and Y remains at logic
0 for the entire evaluation cycle. The delay is measured from
the falling edge of CLK to the falling edge of Out; hence the name C-Q
delay. The D-Q delay mode utilizes the pre-evaluated characteristic
of CD logic to enable high-performance operation. In this
mode, CLK falls from high to low before IN transition; hence
X initially rises to a nonzero voltage level. As soon as IN
become logic 0 while Y is still low, X quickly rises
to logic 1. A race condition exists in this case between X
and Y. If CLK_d rises much earlier than X, then Y will go to
logic 1, turn off M1, and cause a false logic evaluation. If
CLK_d rises slightly slower than X, then Y will initially rise
(and thus slightly turn off M1) but eventually settle back to logic
0. CD logic can still perform the correct logic operation
in this case; however, its performance is degraded because of
M1's reduced current drivability. Therefore, it is important to
Fig. 3. [Schematic: (a) CD logic with timing block, logic block, and NMOS pull-down network; (b) buffer implemented in CD logic.]
Fig. 4. [CD logic timing diagram and flowchart: predischarge period, window width, contention mode with temporary glitch, C-Q delay, and D-Q delay (pre-evaluated).]
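Fig. 5. [Schematic: transistors P1, N1, P2, and N2 with node voltages V1 and VDD - V2.]

$$V_1 = (V_{DD} - V_{tn}) - \sqrt{(V_{DD} - V_{tn})^2 - \frac{W_{p1}}{2W_{n1}}\left(V_{DD} - |V_{tp}|\right)^2}. \tag{3}$$

By Taylor expansion, the square root term can be approximated in first order as
$$\sqrt{N^2 + d} = N + \frac{d}{2N} - \frac{d^2}{8N^3} + \frac{d^3}{16N^5} - \cdots \approx N + \frac{d}{2N}. \tag{4}$$

Assuming $|V_{tp}| \approx V_{tn}$, (3) can be approximated as
$$V_1 \approx (V_{DD} - V_{tn}) - (V_{DD} - V_{tn}) + \frac{W_{p1}\left(V_{DD} - |V_{tp}|\right)^2}{4W_{n1}(V_{DD} - V_{tn})} \approx \frac{W_{p1}\left(V_{DD} - |V_{tp}|\right)}{4W_{n1}}. \tag{5}$$

V2 can also be found through a similar approach. Consider
Fig. 5 again: transistor N2 operates in the subthreshold region
while P2 is working in the linear mode. Equating the two
current equations yields
$$\frac{W_{n2}}{L_{n2}} I_t\, e^{\frac{V_{gsn2} - V_{tn}}{\eta V_T}}\left(1 - e^{-\frac{V_{dsn2}}{V_T}}\right) = \mu_p C_{ox}\frac{W_{p2}}{L_{p2}}\left[\left(V_{gsp2} - |V_{tp}|\right)V_{dsp2} - \frac{(V_{dsp2})^2}{2}\right] \tag{6}$$
where [18], [19]
$$I_t = \mu_o C_{ox}(V_T)^2 e^{1.8}, \quad \eta = 1 + \frac{3T_{ox}}{W_{dm}}, \quad V_T = \frac{KT}{q}. \tag{7}$$

$$\frac{W_{n2} I_t\, e^{\frac{V_1 - V_{tn}}{\eta V_T}}}{\mu_p C_{ox} W_{p2}} = \left(V_{DD} - V_1 - |V_{tp}|\right)V_2 - \frac{(V_2)^2}{2} \tag{8}$$

$$V_2 = \left(V_{DD} - V_1 - |V_{tp}|\right) - \sqrt{\left(V_{DD} - V_1 - |V_{tp}|\right)^2 - A\,e^{\frac{V_1 - V_{tn}}{\eta V_T}}} \tag{9}$$

$$A = \frac{2W_{n2} I_t}{\mu_p C_{ox} W_{p2}} = \frac{4W_{n2}}{W_{p2}}(V_T)^2 e^{1.8}. \tag{10}$$

$$V_2 \approx \left(V_{DD} - V_1 - |V_{tp}|\right) - \left(V_{DD} - V_1 - |V_{tp}|\right) + \frac{A\,e^{\frac{V_1 - V_{tn}}{\eta V_T}}}{2\left(V_{DD} - V_1 - |V_{tp}|\right)}. \tag{11}$$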
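The quality of the first-order Taylor step can be checked numerically. The following Python sketch compares the closed-form glitch voltage V1 = (VDD - Vtn) - sqrt((VDD - Vtn)^2 - (Wp1 / (2 Wn1)) (VDD - |Vtp|)^2) with its first-order estimate Wp1 (VDD - |Vtp|)^2 / (4 Wn1 (VDD - Vtn)). The supply voltage, threshold voltages, and width ratios below are hypothetical values chosen only for illustration; they are not taken from the simulations in this work.

```python
import math


def v1_exact(vdd, vtn, vtp_abs, wp1_over_wn1):
    """Closed-form V1 with N = VDD - Vtn and d = -(Wp1 / (2 Wn1)) (VDD - |Vtp|)^2."""
    n = vdd - vtn
    d = -(wp1_over_wn1 / 2.0) * (vdd - vtp_abs) ** 2
    return n - math.sqrt(n * n + d)


def v1_first_order(vdd, vtn, vtp_abs, wp1_over_wn1):
    """First-order estimate using sqrt(N^2 + d) ~ N + d / (2 N)."""
    return wp1_over_wn1 * (vdd - vtp_abs) ** 2 / (4.0 * (vdd - vtn))


# Hypothetical operating point (assumption, for illustration only).
VDD, VTN, VTP_ABS = 1.0, 0.3, 0.3

for ratio in (0.1, 0.2, 0.5):
    exact = v1_exact(VDD, VTN, VTP_ABS, ratio)
    approx = v1_first_order(VDD, VTN, VTP_ABS, ratio)
    print(f"Wp1/Wn1 = {ratio}: exact = {exact * 1e3:.2f} mV, "
          f"first order = {approx * 1e3:.2f} mV")
```

Under these assumed values the first-order estimate slightly underestimates the exact result, and the gap widens as the ratio Wp1/Wn1 grows, which is the expected behavior when the dropped higher-order terms become significant.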
Finally
$$V_2 \approx \frac{A\,e^{\frac{V_1 - V_{tn}}{\eta V_T}}}{2\left(V_{DD} - V_1 - |V_{tp}|\right)}. \tag{12}$$

Fig. 6. Simulated output glitch of a five-stage two-input AND gate implemented with CD logic. [Plot: voltage (V) at nodes N1 to N5 versus time (ns).]

Fig. 7. Temporary glitch mean and standard deviation at the output of three-input CD AND and OR gates versus temperature in a Monte Carlo simulation with 7500 iterations. [Plot: voltage (mV) versus temperature (°C).]

D. CD Logic Family
[Figure: noise margins (NM) of dynamic and CD AND3 and OR3 gates, including the CD "0" and "1" NM with and without keeper, versus window duration (ps).]
Fig. 9. [Plot: normalized delay of AND2, AND3, OR2, OR3, and AOI22 implemented in static, dynamic, CD (C-Q), and CD (D-Q) logic.]
Fig. 10. Normalized average power of five logic expressions at various data activities implemented in static, dynamic, and CD logic.
B. CD Logic Performance
Figs. 9 and 10 illustrate the normalized delay and the average power consumption,
respectively, of static, dynamic, and CD logic in five logic expressions with various input data
activities. The average power is calculated by summing the
power consumption of every possible input vector and then
dividing the sum by the number of input vector combinations.
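The averaging procedure can be illustrated with a minimal Python sketch. The per-vector power function `toy_and2_power` and its values are hypothetical placeholders standing in for simulation results; only the enumerate-and-average structure reflects the description above.

```python
from itertools import product


def average_power(power_of_vector, num_inputs):
    """Average the power over every possible binary input vector."""
    vectors = list(product((0, 1), repeat=num_inputs))
    total = sum(power_of_vector(vec) for vec in vectors)
    return total / len(vectors)


def toy_and2_power(vec):
    """Hypothetical per-vector power (in microwatts) for a two-input gate."""
    a, b = vec
    return 10.0 if (a and b) else 2.0


print(average_power(toy_and2_power, 2))  # (2 + 2 + 2 + 10) / 4
```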
CD logic demonstrates superior performance, especially
for complicated logic expressions such as Y = AB + CD
(AOI22), in the D-Q mode because of the pre-evaluated characteristic. This is demonstrated in Fig. 11, where CD logic is
approximately two times faster than dynamic domino logic.
This is attributable to: 1) the pre-evaluated characteristic;
and 2) the smaller number of transistors in the critical path
(3N1P for dynamic, but only 2P1N for CD logic). On the
other hand, CD logic's performance is only approximately
the same as, or even worse than, that of dynamic domino
logic during the C-Q mode. Therefore, it is advantageous
to implement CD logic in a single-cycle multistage datapath,
because then the pre-evaluated feature (D-Q delay) of CD
logic can be fully utilized. The power consumption of CD
logic at 50% data activity is at least 3× and 5× higher than
that of static logic in AOI22 and the rest of the logic expressions,
respectively. This suggests that CD logic should be used only
to replace the critical path in a circuit block, since it is
not energy efficient to implement an entire system with CD logic
only. Table II summarizes the total transistor width of static,
dynamic, and CD logic. Despite CD logic's additional transistor overhead, the average area of CD logic is 13% smaller
than that of static logic and 4.5% larger than that of dynamic domino logic.
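The area percentages quoted above follow directly from the total transistor widths listed in Table II. The short Python sketch below recomputes them; treating total transistor width as the area proxy and the micrometer unit are assumptions made here for illustration.

```python
# Total transistor widths from Table II, in the order AND2, AND3, OR2, OR3, AOI22.
WIDTHS = {
    "static": [11, 18, 13, 24, 27],
    "dynamic": [13.6, 20.9, 10.6, 12.6, 19.6],
    "cd": [14.96, 19.96, 12.96, 13.96, 18.96],
}


def average(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


AVG = {style: average(vals) for style, vals in WIDTHS.items()}

# Relative differences of the CD average against the other two styles.
smaller_than_static = (1.0 - AVG["cd"] / AVG["static"]) * 100.0
larger_than_dynamic = (AVG["cd"] / AVG["dynamic"] - 1.0) * 100.0

print(f"averages: {AVG}")
print(f"CD vs static:  {smaller_than_static:.1f}% smaller")
print(f"CD vs dynamic: {larger_than_dynamic:.1f}% larger")
```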
Fig. 11. [Schematics and simulated waveforms (voltage in V versus time in ns) of CLK1, CLK2, OUT1, and OUT2, with annotated delays of 27.56 ps and 13.27 ps; OUT2 is marked as pre-evaluated.]
TABLE II
STATIC, DYNAMIC, AND CD LOGIC AREA COMPARISON

           Total transistor width (µm)      Number of transistors
           Static    Dynamic   CD           Static    Dynamic   CD
AND2       11        13.6      14.96                            17
AND3       18        20.9      19.96                            18
OR2        13        10.6      12.96                            17
OR3        24        12.6      13.96                            18
AOI22      27        19.6      18.96                  10        19
Average    18.6      15.46     16.16        7.2       7.6       17.8
V. PERFORMANCE ANALYSIS
A. 8-bit Ripple Carry Adders (RCAs)
The simulation setup in this section is similar to that of
Section IV. Three 8-bit RCAs using the static, dynamic, and CD
logic styles are simulated to compare their performance. An
RCA with FTL on the critical path is also implemented;
however, our analysis indicates that an FTL-based RCA would
generate false outputs at the later bits because of the false
evaluation phenomenon described earlier. NP-FTL (equivalent to NP-domino, where nMOS-FTL and pMOS-FTL alternate) is also difficult to realize because the output glitch
is significant and easily exceeds 500 mV under process
variations.
The basic static full adder (FA) is implemented with 28 transistors, with sizing strongly in favor of the Cout computation [6].
The main purpose of this 8-bit RCA is to demonstrate CD
logic's performance advantage and to discuss the design considerations that should be taken into account when using CD
logic. A more energy-efficient pass-transistor FA design [23]
will be implemented in the subsequent analysis to provide a
more realistic comparison.
Only the timing-critical carry generation is replaced with
dynamic and CD logic, while the noncritical sum computation
remains static in all three RCAs. Ten thousand random input
vectors are applied to the RCAs to compute the average power
consumption. The clock timing is designed in such a way that
all the CD logic gates except the first stage are operated in the
Fig. 12. RCA (a) block diagram and (b) timing diagram implemented with CD logic. [Diagram annotations: CLK1 drives FA0 to FA2, CLK2 drives FA3 and FA4, CLK3 drives FA5 and FA6, and CLK4 drives FA7; the window period is about 115 ps; CLK2 needs to arrive earlier than C3.]
TABLE III
RCA PERFORMANCE COMPARISON

                Static                       Dynamic                      CD
Data activity   10%      50%      100%       10%      50%      100%       10%      50%      100%
Delay (ps)      369.4 (1.00)                 292.6 (0.79)                 224.4 (0.61)
Power (µW)      50.95    254.77   509.54     142.70   336.05   401        231.85   533.96   602.82
PDP (fJ)        18.8     94.1     188.2      41.8     98.3     117.3      52       119.8    135
EDP (fJ·ps)     6944     34761    69521      12231    28763    34322      11669    26883    30294

2) The entire circuitry (i.e., the 8-bit RCA) is then simulated under the typical corner at 110 °C to determine the
glitch level. Extensive simulation results reveal that, if
this glitch level is approximately 65 mV, then the 6σ
glitch level will be less than 300 mV.
3) Iterative simulations are performed by sweeping this
variable until the glitch level is around 65 mV.
The equally weighted scheme clearly may not be the optimal solution. However, different sizing schemes have been
explored, and simulation results indicate that no apparent
performance improvement is achieved compared to the sizing
strategy described above.
3) RCA Performance: Table III compares the static, dynamic,
and CD logic RCAs with various figures of merit at different data activity factors. The CD-based RCA is approximately
39% and 23% faster than the static and dynamic counterparts,
respectively. On the other hand, the power consumption of CD
logic ranges from 4.55× to 1.18× that of static
logic. In terms of the PDP, CD logic is at 2.78× that of static logic
at 10% data activity and at 0.72× at 100% data activity.
CD logic provides a speed advantage that logic styles such as
static and dynamic find difficult to reach. Therefore, CD logic
is suitable in a system where performance is the most critical
factor.

$$P_{3:2} = P_3 P_2 \tag{16}$$
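The RCA comparison ratios can be rechecked from the Table III entries with a few lines of Python. This is only a sketch: the microwatt power unit is inferred from the magnitudes in the table, and the PDP ratio is computed as the power ratio multiplied by the delay ratio.

```python
# Delay (ps) and power (µW, assumed unit) taken from Table III.
DELAY_PS = {"static": 369.4, "dynamic": 292.6, "cd": 224.4}
POWER_UW = {
    "static": {10: 50.95, 100: 509.54},
    "cd": {10: 231.85, 100: 602.82},
}


def speedup_percent(reference, new):
    """Delay reduction of `new` relative to `reference`, in percent."""
    return (1.0 - DELAY_PS[new] / DELAY_PS[reference]) * 100.0


def power_ratio(activity):
    """CD power divided by static power at the given data activity (%)."""
    return POWER_UW["cd"][activity] / POWER_UW["static"][activity]


def pdp_ratio(activity):
    """CD power-delay product divided by the static one."""
    return power_ratio(activity) * DELAY_PS["cd"] / DELAY_PS["static"]


print(f"faster than static:  {speedup_percent('static', 'cd'):.1f}%")
print(f"faster than dynamic: {speedup_percent('dynamic', 'cd'):.1f}%")
for activity in (10, 100):
    print(f"{activity}% activity: power ratio {power_ratio(activity):.2f}, "
          f"PDP ratio {pdp_ratio(activity):.2f}")
```

With the rounded table entries, the 10% PDP ratio evaluates to roughly 2.76, close to the quoted 2.78; the other figures reproduce to the stated precision.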
Fig. 13. 32-bit CLA. [Schematic annotations: carry gates G1 + P1G0 and G3 + P3G2 forming G3:0, shown in dynamic, pseudo-nMOS, CD, and compound CD (CCD) logic; the noncritical path is implemented with static logic (not shown); the TB window is about 115 ps.]
* CCD logic replaces the inverter in the logic block with a complex logic gate to provide
better performance, similar to the idea of compound domino logic (CDL) over dynamic logic.
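The group generate and propagate terms that appear in the CLA (for example G3 + P3G2 and P3:2 = P3 P2) can be sketched behaviorally in Python. This is a logic-level illustration only, not a model of the CD or CCD circuits; the per-bit definitions g = a AND b and p = a XOR b are a common convention assumed here, and the function names are hypothetical.

```python
from functools import reduce
from itertools import product


def bit_gp(a, b, width):
    """Per-bit (generate, propagate) pairs, least significant bit first."""
    pairs = []
    for i in range(width):
        a_i = (a // 2 ** i) % 2
        b_i = (b // 2 ** i) % 2
        pairs.append((a_i & b_i, a_i ^ b_i))
    return pairs


def combine(hi, lo):
    """Merge a higher group with a lower one: G = Ghi + Phi*Glo, P = Phi*Plo."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def group_gp(pairs):
    """Fold LSB-first pairs into the (G, P) of the whole group."""
    return reduce(lambda lo, hi: combine(hi, lo), pairs)


def carry_out(a, b, cin, width=4):
    """Carry out of a `width`-bit group from its group G and P."""
    g, p = group_gp(bit_gp(a, b, width))
    return g | (p & cin)


def check_all_4bit():
    """Compare the group carry with integer addition for every 4-bit case."""
    return all(
        carry_out(a, b, cin) == (a + b + cin) // 16
        for a, b, cin in product(range(16), range(16), (0, 1))
    )


print(check_all_4bit())
```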
TABLE IV
32-BIT CLA PERFORMANCE COMPARISON

                              Static       Dynamic      CDL          Pseudo-nMOS  CD logic     CCD logic
Delay (ps)                    448 (1.00)   316 (0.71)   287 (0.64)   313 (0.70)   272 (0.61)   239 (0.53)
100% data activity
  Power (mW)                  5.13         5.27         5.29         5.34         5.02         5.09
  PDP (pJ)                    2.3          1.67         1.52         1.67         1.37         1.22
  EDP (pJ·ps)                 1030         526          437          523          371          291
50% data activity
  Power (mW)                  1.94         2.22         2.21         2.21         2.31         2.34
  PDP (pJ)                    0.87         0.7          0.63         0.69         0.63         0.56
  EDP (pJ·ps)                 390          222          182          216          171          134
25% data activity
  Power (mW)                  0.99         1.32         1.33         1.25         1.43         1.46
  PDP (pJ)                    0.45         0.42         0.38         0.39         0.39         0.35
  EDP (pJ·ps)                 199          132          110          122          106          84
10% data activity
  Power (mW)                  0.42         0.8          0.81         0.68         0.90         0.94
  PDP (pJ)                    0.19         0.25         0.23         0.21         0.25         0.22
  EDP (pJ·ps)                 85.1         79.8         66.7         66.3         66.8         53.5
Worst case leakage (µA)       32.9         32.1         32.7         551.2        33.5         33.4
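The PDP and EDP entries of Table IV are consistent with PDP = power × delay and EDP = PDP × delay. The following Python sketch recomputes two of the 100% data-activity entries from the listed power and delay values; the helper names are hypothetical.

```python
def pdp_pj(power_mw, delay_ps):
    """Power-delay product in pJ (1 mW x 1 ps = 1 fJ = 1e-3 pJ)."""
    return power_mw * delay_ps * 1e-3


def edp_pj_ps(power_mw, delay_ps):
    """Energy-delay product in pJ*ps."""
    return pdp_pj(power_mw, delay_ps) * delay_ps


# (power in mW at 100% data activity, delay in ps) from Table IV.
STATIC = (5.13, 448)
CCD = (5.09, 239)

for name, (power, delay) in (("static", STATIC), ("CCD", CCD)):
    print(f"{name}: PDP = {pdp_pj(power, delay):.2f} pJ, "
          f"EDP = {edp_pj_ps(power, delay):.0f} pJ*ps")

# EDP saving of CCD logic relative to static logic at 100% data activity.
saving = (1.0 - edp_pj_ps(*CCD) / edp_pj_ps(*STATIC)) * 100.0
print(f"CCD EDP saving vs static: {saving:.1f}%")
```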
[Figures, including Figs. 17 and 18: normalized power, PDP, and EDP versus data activity (100%, 50%, 25%, and 10%) for static, dynamic, CDL, pseudo-nMOS, CD, and CCD logic, and a Wallace tree multiplier block diagram with DFFs, 2:1 muxes, 3-bit and 4-bit FAs, and the critical path marked.]

             CD logic                                 CCD logic
Temp. (°C)   Mean (mV)   σ (mV)   Mean + 6σ (mV)      Mean (mV)   σ (mV)   Mean + 6σ (mV)
85           65.3        23.2     204.5               63.6        26.2     220.8
110          88.5        29.8     267.3               86.8        36.1     303.4
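The worst-case column of the glitch table is the Monte Carlo mean plus six standard deviations. A minimal Python sketch of that calculation follows; the 300 mV budget is the 6σ limit mentioned in the sizing procedure, and applying it to every row is an assumption made here for illustration.

```python
def six_sigma_level(mean_mv, sigma_mv):
    """Worst-case glitch estimate: mean plus six standard deviations (mV)."""
    return mean_mv + 6.0 * sigma_mv


# (temperature in degrees C, mean in mV, sigma in mV, reported mean + 6 sigma in mV)
ROWS = [
    (85, 65.3, 23.2, 204.5),
    (85, 63.6, 26.2, 220.8),
    (110, 88.5, 29.8, 267.3),
    (110, 86.8, 36.1, 303.4),
]

BUDGET_MV = 300.0  # assumed budget, taken from the 6-sigma target in the text

for temp, mean, sigma, reported in ROWS:
    worst = six_sigma_level(mean, sigma)
    status = "over budget" if worst > BUDGET_MV else "within budget"
    print(f"{temp} C: computed {worst:.1f} mV, reported {reported} mV, {status}")
```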
TABLE VI
8-BIT MULTIPLIER PERFORMANCE COMPARISON

                              Static       Dynamic      Pseudo-nMOS  CD logic
Delay (ps)                    404 (1.00)   294 (0.73)   292 (0.72)   243 (0.60)
100% data activity
  Power (mW)                  3.52         3.57         3.82         3.59
  PDP (pJ)                    1.42         1.05         1.11         0.87
  EDP (pJ·ps)                 575          309          325          212
50% data activity
  Power (mW)                  2.29         2.36         2.56         2.46
  PDP (pJ)                    0.93         0.7          0.75         0.60
  EDP (pJ·ps)                 375          205          218          145
25% data activity
  Power (mW)                  1.29         1.39         1.57         1.49
  PDP (pJ)                    0.52         0.41         0.46         0.36
  EDP (pJ·ps)                 211          121          134          88
10% data activity
  Power (mW)                  0.67         0.80         0.96         0.88
  PDP (pJ)                    0.27         0.23         0.28         0.22
  EDP (pJ·ps)                 109          69           82           52
Worst case leakage (µA)       88.78        89.09        619.6        89.99
TABLE VII
PVT AND MONTE CARLO PERFORMANCE ANALYSIS OF THE CD AND CCD LOGIC-BASED DESIGNS

Corner       FF                   FS                   SF                   SS                   Monte Carlo
Temp. (°C)   30        110        30        110        30        110        30        110
VDD (V)      1.1  0.9  1.1  0.9   1.1  0.9  1.1  0.9   1.1  0.9  1.1  0.9   1.1  0.9  1.1  0.9   mean  σ
             152  199  142  197   182  244  178  265   164  220  152  217   262  369  249  392   226   16.2
             189  238  172  229   222  288  206  289   205  259  186  248   321  439  294  437   274   17.8
             157  200  147  199   190  250  185  263   169  218  156  214   280  388  266  405   240   18.2
             211  229  208  228   223  247  219  249   216  236  212  235   259  327  251  347   244
[Figure: normalized PDP and normalized EDP of static, dynamic, pseudo-nMOS, and CD logic at 100%, 50%, 25%, and 10% data activity.]
worst case glitch for CD and CCD logic at 110 °C are 220.8
and 303.4 mV, respectively.
CD logic's advantages in terms of delay and EDP were also
demonstrated in 8-bit Wallace tree multipliers. Compared to
32-bit adders, CD logic achieves a similar delay improvement
but an even better EDP reduction, primarily because the
final adder, which makes up the critical path of the multiplier,
is a relatively small circuit block of the overall circuitry. At
10% data activity, CD logic is 52%, 25%, and 37% more EDP-efficient than
static, dynamic, and pseudo-nMOS logic, respectively.
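The EDP savings of the CD logic multiplier can be recomputed from the EDP rows of Table VI. The Python sketch below does this for the 25% and 10% data-activity columns; the 52%, 25%, and 37% figures correspond to the 10% column. The dictionary layout and function name are illustrative choices, not part of the original analysis.

```python
# EDP values (pJ*ps) from Table VI, keyed by data activity in percent.
EDP = {
    25: {"static": 211, "dynamic": 121, "pseudo-nMOS": 134, "cd": 88},
    10: {"static": 109, "dynamic": 69, "pseudo-nMOS": 82, "cd": 52},
}


def edp_saving_percent(activity, reference):
    """EDP reduction of CD logic relative to `reference`, in percent."""
    column = EDP[activity]
    return (1.0 - column["cd"] / column[reference]) * 100.0


for activity in (25, 10):
    savings = {
        ref: round(edp_saving_percent(activity, ref))
        for ref in ("static", "dynamic", "pseudo-nMOS")
    }
    print(f"{activity}% data activity: {savings}")
```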
REFERENCES
[1] R. Zimmermann and W. Fichtner, "Low-power logic styles: CMOS versus pass-transistor logic," IEEE J. Solid-State Circuits, vol. 32, no. 7, pp. 1079–1090, Jul. 1997.
[2] N. Goncalves and H. De Man, "NORA: A racefree dynamic CMOS technique for pipelined logic structures," IEEE J. Solid-State Circuits, vol. 18, no. 3, pp. 261–266, Jun. 1983.
[3] C. Lee and E. Szeto, "Zipper CMOS," IEEE Circuits Syst. Mag., vol. 2, no. 3, pp. 10–16, May 1986.
[4] R. Rafati, S. Fakhraie, and K. Smith, "A 16-bit barrel-shifter implemented in data-driven dynamic logic (D3L)," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 53, no. 10, pp. 2194–2202, Oct. 2006.
[5] F. Frustaci, M. Lanuzza, P. Zicari, S. Perri, and P. Corsonello, "Low-power split-path data-driven dynamic logic," Circuits Dev. Syst. IET, vol. 3, no. 6, pp. 303–312, Dec. 2009.
[6] N. Weste and D. Harris, CMOS VLSI Design: A Circuits and Systems Perspective, 4th ed. Reading, MA: Addison Wesley, Mar. 2010.
[7] K. Bernstein, High Speed CMOS Design Styles, 1st ed. New York: Springer-Verlag, Aug. 1998.
[8] S. Mathew, R. K. Krishnamurthy, M. A. Anders, R. Rios, K. R. Mistry, and K. Soumyanath, "Sub-500-ps 64-b ALUs in 0.18-µm SOI/bulk CMOS: Design and scaling trends," IEEE J. Solid-State Circuits, vol. 36, no. 11, pp. 318–319, Nov. 2001.
[9] S. Mathew, M. Anders, R. Krishnamurthy, and S. Borkar, "A 4 GHz 130 nm address generation unit with 32-bit sparse-tree adder core," in VLSI Circuits Dig. Tech. Papers Symp., 2002, pp. 126–127.
[10] S. K. Mathew, M. A. Anders, B. Bloechel, T. N. Krishnamurthy, and S. Borkar, "A 4-GHz 300-mW 64-bit integer execution ALU with dual supply voltages in 90-nm CMOS," IEEE J. Solid-State Circuits, vol. 40, no. 1, pp. 44–51, Jan. 2005.
[11] S. Wijeratne, N. Siddaiah, S. Mathew, M. Anders, R. Krishnamurthy, J. Anderson, S. Hwang, M. Ernest, and M. Nardin, "A 9 GHz 65 nm Intel Pentium 4 processor integer execution core," in IEEE Int. Solid-State Circuits Conf. ISSCC Dig. Tech. Papers, San Francisco, CA, Feb. 2006, pp. 353–365.
[12] I. Sutherland, R. F. Sproull, and D. Harris. (Feb. 1999). Logical Effort: Designing Fast CMOS Circuits [Online]. Available: http://amazon.com/o/ASIN/1558605576/
[13] S. Kiaei, S.-H. Chee, and D. Allstot, "CMOS source-coupled logic for mixed-mode VLSI," in Proc. IEEE Int. Circuits Syst. Symp., New Orleans, LA, May 1990, pp. 1608–1611.
[14] L. McMurchie, S. Kio, G. Yee, T. Thorp, and C. Sechen, "Output prediction logic: A high-performance CMOS design technique," in Proc. Comput. Des. Int. Conf., Austin, TX, 2000, pp. 247–254.
[15] K. H. Chong, L. McMurchie, and C. Sechen, "A 64 b adder using self-calibrating differential output prediction logic," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, San Francisco, CA, Feb. 2006, pp. 1745–1754.
[16] V. Navarro-Botello, J. A. Montiel-Nelson, and S. Nooshabadi, "Low power arithmetic circuit in feedthrough dynamic CMOS logic," in Proc. IEEE Int. 49th Midw. Symp. Circuits Syst., Aug. 2006, pp. 709–712.
[17] V. Navarro-Botello, J. A. Montiel-Nelson, and S. Nooshabadi, "Analysis of high-performance fast feedthrough logic families in CMOS," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 54, no. 6, pp. 489–493, Jun. 2007.
[18] Y. Taur and T. Ning, Fundamentals of Modern VLSI Devices. Cambridge, U.K.: Cambridge Univ. Press, 1998.
[19] K. Roy, S. Mukhopadhyay, and H. Mahmoodi-Meimand, "Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits," Proc. IEEE, vol. 91, no. 2, pp. 305–327, Feb. 2003.