Download as pdf or txt
Download as pdf or txt
You are on page 1of 19

6/8/2018

ECE4740:
Digital VLSI Design
Lecture 19: Dynamic latches/flip-flops

690

Recap

Timing, flip-flops, and latches

691

1
6/8/2018

Common flip-flop and latch symbols

D Q D Q D Q D Q

CLK CLK CLK CLK

rising-edge falling-edge positive latch negative latch


triggered FF triggered FF 1-transparent 0-transparent
0-hold 1-hold

• Real-world flip-flops (and latches) may have


more inputs and outputs, such as
– Reset in, enable in, scan in, and !Q out
692

Positive latch: transparent if CLK=1


CLK

!CLK

input sampled
D (transparent mode)

CLK
CLK
in D Q out !CLK

clk CLK
feedback (hold mode)
693

2
6/8/2018

Positive-edge triggered MS flip-flop


master slave

T2 I5 T4 I6 Q
I2 I3
QM

T1 I4 T3
D I1

0
CLK
1

CLK=0  master transparent; slave hold


694

Positive-edge triggered MS flip-flop


master slave

T2 I5 T4 I6 Q
I2 I3
QM

T1 I4 T3
D I1

1
CLK
0

CLK=1  master hold; slave transparent


695

3
6/8/2018

Setup and hold times

clock clock

tsetup thold time

In data must
be stable

tpd,ff time

Out output 9chncy3820v output


stable 58voq5n0521 stable

tcd,reg output time


undefined
696

Non-ideal clocks: clock skew

CLK CLK

!CLK !CLK

ideal clocks Non-ideal clocks


clock skew

clock skew can happen due to


1-1 overlap
uneven wire lengths, capacitances,
different fan-outs, etc. 0-0 overlap

697

4
6/8/2018

1-1 overlap is dangerous


CLK !CLK Q
on on
P1 P3 I3 I4 !Q
D I1 I2

P2 P4

!CLK CLK

• Direct path from D to Q during short time


when both CLK and !CLK are high
– Happens during 1-1 overlap
698

1-1 overlap is dangerous (cont’d)


X=? !CLK Q
CLK
on
P1 A P3 I3 I4 !Q
D I1 I2

B P4
P2
on
!CLK CLK

• Both B and D are driving A when CLK and


!CLK are both high (1-1 overlap)

699

5
6/8/2018

Generating a non-1-1-overlapping clock

• To avoid overlapping clocks 1-1 we need


– tools for accurate timing analysis OR
– non-1-1-overlapping clock signals
– One can use SR-latch to generate such clocks

CLK
tnon_overlap
CLK1

CLK2

700

Building sequential logic with fewer transistors

Dynamic latches and flip-flops

701

6
6/8/2018

Static vs. dynamic storage cells


• Static cells use bistable element with feedback
(regeneration)
– Preserve state as long as power is on
• Static storage is preferred when updates are
infrequent (clock gating etc.)
• Dynamic storage on parasitic capacitors
– Preserve state only for milliseconds
• Dynamic storage cells are usually smaller,
achieve higher speed and consume lower power
702

Dynamic edge-triggered flip-flop


master slave
!CLK CLK
QM Q gate cap of I2, and
T1 I1 T2 I2 junction cap & overlap
D
gate cap of T2
C1 !CLK C2
CLK

master transparent
slave hold
CLK

master hold
!CLK
slave transparent
703

7
6/8/2018

Dynamic ET flip-flop (cont’d)


master slave
!CLK CLK
QM Q tsu = tpd_tx
T1 I1 T2 I2 thold = zero
D
tpd = 2tpd_inv+tpd_tx
C1 !CLK C2
CLK

• Requires only 8 transistors; clock load = 4


• Dynamic nodes need periodical refresh

704

Issue 1: race conditions


output can change
at falling edge
!CLK CLK

D T1 I1 T2 I2 Q

C1 !CLK C2
CLK

0-0 overlap race condition


CLK toverlap0-0 < tT1 +tI1 + tT2
!CLK
1-1 overlap race condition
toverlap1-1 < thold
data must be stable
during high phase
705

8
6/8/2018

Solution: non-overlapping clocks

CLK1 CLK2

D T1 I1 T2 I2 Q

C1 !CLK2 C2
!CLK1
requires
master transparent routing of 4
slave hold clock signals

CLK1
tnon_overlap
CLK2
master hold
slave transparent
706

Issue 2: robustness
• Dynamic flip-flops suffer from
– Coupling between signal nets and internal
storage nodes (can destroy FF state)
– Leakage currents cause state to leak with time
• Solution: pseudostatic FF add weak
feedback inverter
to each latch
!CLK CLK

D Q

CLK !CLK

707

9
6/8/2018

A clock-skew insensitive approach

The C2MOS register

708

C2MOS (clocked CMOS) ET FF


Master Slave

M2 M6

CLK M4on !CLK Moff


8
off QM on
D Q
!CLK M3on C1 CLK M7off C2
off on
M1 M5

master transparent
slave hold
CLK

!CLK master hold


slave transparent
709

10
6/8/2018

C2MOS FF: 0-0 overlap


M2 M6 all clock inputs
are zero
0 M4 0 M8
QM
D Q
C1 C2

M1 M5

CLK CLK
!CLK !CLK

710

C2MOS FF: 1-1 overlap


M2 M6
all clock inputs
are VDD
QM
D Q
1 M3 C1 1 M7 C2

M1 M5

CLK CLK
!CLK !CLK

1-1 overlap constraint


toverlap1-1 < thold 711

11
6/8/2018

(Slope matters: transient response)


3
For a
2.5
QM(3) 0.1 ns clock
2 Q(3)

1.5
Q(0.1)
1 clk(0.1)
For a
0.5 CLK(3) 3 ns clock
0 (race condition
exists)
-0.5
0 2 4 6 8
Time (nsec)
712
Image adapted from: Digital Integrated Circuits (2nd Edition) by Rabaey, Chandrakasan, Nikolic

For high-throughput designs

Pipelining & retiming

713

12
6/8/2018

Consider the timing of this circuit

REG
a

REG
CLK log Out
REG

b CLK

CLK

• Critical path:
Tmin=tpd,ff+tpd,add+tpd,abs+tpd,log+tsu,ff

714
Image taken from: Digital Integrated Circuits (2nd Edition) by Rabaey, Chandrakasan, Nikolic

Pipelining reduces critical path


REG

+
REG

REG

REG

CLK log Out


REG

b CLK CLK CLK

CLK

• Insert pipeline registers (flip-flops)


• Shortens critical path!
Tpipe,min=tpd,ff+max{tpd,add,tpd,abs,tpd,log}+tsu,ff

715
Image taken from: Digital Integrated Circuits (2nd Edition) by Rabaey, Chandrakasan, Nikolic

13
6/8/2018

Pipelining
a1+b1
a REG

REG

REG

REG
CLK log Out
REG

b CLK CLK CLK

CLK Cycle Add Abs Log


1 a1+b1
2 a2+b2 |a1+b1|
3 a3+b3 |a2+b2| log|a1+b1|
4 a4+b4 |a3+b3| log|a2+b2|
… … … …

716

Pipelining (cont’d)
REG

a |a1+b1|

+
REG

REG

REG

CLK log Out


REG

b CLK CLK CLK

CLK Cycle Add Abs Log


1 a1+b1
2 a2+b2 |a1+b1|
new data item 3 a3+b3 |a2+b2| log|a1+b1|
inserted in pipeline
4 a4+b4 |a3+b3| log|a2+b2|
… … … …

717

14
6/8/2018

Pipelining (cont’d)
a REG
log|a1+b1|

REG

REG

REG
CLK log Out
REG

b CLK CLK CLK

CLK Cycle Add Abs Log


1 a1+b1
2 a2+b2 |a1+b1|
3 a3+b3 |a2+b2| log|a1+b1|
new data item 4 a4+b4 |a3+b3| log|a2+b2|
inserted in pipeline
… … … …

718

Pipelining improves throughput!


REG

+
REG

REG

REG

CLK log Out


REG

b CLK CLK CLK

CLK

• Processes 1 data item per clock cycle at higher fmax


 higher throughput (time per data item Tmin,pipe)
• Ideally: Tmin,pipe = tpd,ff+tpd,logic/N+tsu,ff with N stages
• Throughput limit: Tmin,pipe  tpd,ff+tsu,ff
719

15
6/8/2018

Pipelining introduces latency


a REG

REG

REG

REG
CLK log Out
REG

b CLK CLK CLK

CLK

• Latency = # of cycles for data to


propagate from input to output
• Latency = 4 (four rising clock edges)
720

The feedback problem


REG

+
REG

REG

REG

CLK log Out


REG

b CLK CLK CLK

CLK

• If feedback path is present, latency will reduce


throughput (circuit has to wait for data)
• Problem in processors and application specific
integrated circuits (data dependencies)
721

16
6/8/2018

Solution: Pipeline interleaving


reduces
throughput by 2x
A B

• Idea: Process independent problems in an


interleaved manner in the same hardware

even cycles odd cycles

A B A B
P1 P2 P2 P1

722

Pipelining using C2MOS


M2 M6 M2

CLK M4 !CLK M8 CLK M4


F G
In Out
M3 C1 CLK M7 C2 !CLK M3
!CLK
M1 M5 M1

• Circuit is race-condition free (NORA) if


functions F and G are non-inverting!
723

17
6/8/2018

Your turn: pipeline a MAC unit


multiply-accumulate (MAC) unit: D=B*C+A

A tpd,ff=0.5ns • What is the max.


DQ
tpd,add=2ns
clock frequency?
CLK
• Where is the
B
DQ + DQ
D critical path?
CLK CLK
• Insert a single
*
tsu,ff=0.5ns pipeline stage
C
DQ
tpd,mult=7ns
• What is the max.
CLK clock frequency
after pipelining?
724

Critical path and max. clock freq.


A tpd,ff=0.5ns
DQ
tpd,add=2ns
CLK

B D
DQ + DQ

CLK CLK
*
C tsu,ff=0.5ns
DQ
tpd,mult=7ns
CLK

• Tmin=tpd,ff+tpd,mult+tpd,add+tsu,ff=10ns
• fmax=100MHz
725

18
6/8/2018

Pipelining: max. clock freq. now?


A tpd,ff=0.5ns data must arrive in
DQ the same cycle at
tpd,add=2ns input of adder!
CLK

B D
DQ + DQ

CLK CLK
*
C tsu,ff=0.5ns
DQ
tpd,mult=7ns
CLK

• Tmin,pipe=tpd,ff+tpd,mult+tsu,ff=8ns
• fmax=125MHz
726

19

You might also like