Unit-III Sequential Logic Circuits

UNIT-III

SEQUENTIAL LOGIC CIRCUITS


Static and Dynamic Latches and
Registers, Timing issues, pipelines, clock
strategies, Memory architecture and
memory control circuits, Low power
memory circuits, Synchronous and
Asynchronous design
Static Latches and Registers
The Bistability Principle:

• Static memories use positive feedback to create a bistable circuit, a circuit having two stable states that represent 0 and 1.

[Figure: two cascaded inverters and their voltage transfer characteristics]
• The resulting circuit has only three possible operation
points (A, B, and C), as demonstrated on the combined
VTC.
• Under the condition that the gain of the inverter in the transition region is larger than 1, only A and B are stable operation points, and C is a metastable operation point.
• A bistable circuit has two stable states.
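The behaviour of points A, B and C can be illustrated numerically. The following is only a sketch: the tanh-shaped inverter curve, the 1 V supply and the gain of 5 are assumptions chosen for illustration, not values from the slides. It sends a voltage around the two-inverter loop repeatedly and shows that any small deviation from the metastable point is amplified until the loop rests at one of the two stable points.

```python
import math

VDD = 1.0   # assumed supply voltage in volts (illustrative)
GAIN = 5.0  # assumed magnitude of the inverter gain at the switching threshold

def inverter(v_in: float) -> float:
    """Idealized inverter VTC: smooth, falling, slope -GAIN at VDD/2."""
    half = 0.5 * VDD
    return half * (1.0 - math.tanh(GAIN * (v_in - half) / half))

def settle(v_start: float, trips: int = 50) -> float:
    """Send a voltage around the two-inverter loop `trips` times."""
    v = v_start
    for _ in range(trips):
        v = inverter(inverter(v))
    return v

for start in (0.49, 0.50, 0.51):
    print(f"start {start:.2f} V -> settles at {settle(start):.4f} V")
```

Starting exactly at VDD/2 the model stays at the metastable point; in a real circuit noise would push it to one side.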
Static Latches and Registers
• In absence of any triggering, the circuit remains in a
single state (assuming that the power supply remains
applied to the circuit), and hence remembers a value.

• A trigger pulse must be applied to change the state of the circuit.

• Another common name for a bistable circuit is flip-flop.


SR Flip-Flops

• The cross-coupled inverter pair provides an approach to store a binary variable in a stable way.
• However, extra circuitry must be added to enable
control of the memory states.

NOR-based SR flip-flop

• When both S and R are 0, the flip-flop is in a quiescent state and both outputs retain their value.
• If a positive (or 1) pulse is applied to the S input, the Q
output is forced into the 1 state.
• Vice versa, a 1 pulse on R resets the flip-flop and the Q
output goes to 0.
• The characteristic table is the truth table of the gate
and lists the output states as functions of all possible
input conditions.
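The characteristic table can be reproduced with a small gate-level model. This is a sketch under the assumption of ideal, zero-delay NOR gates; the function names are illustrative. It also shows the well-known forbidden input S = R = 1, for which both outputs of a NOR-based latch go to 0.

```python
from typing import Tuple

def nor(a: int, b: int) -> int:
    """Two-input NOR gate."""
    return int(not (a or b))

def sr_latch(s: int, r: int, q: int, q_bar: int) -> Tuple[int, int]:
    """Evaluate the cross-coupled NOR pair until the outputs stop changing."""
    for _ in range(4):
        new_q = nor(r, q_bar)
        new_q_bar = nor(s, new_q)
        if (new_q, new_q_bar) == (q, q_bar):
            break
        q, q_bar = new_q, new_q_bar
    return q, q_bar

# Print the characteristic table for a latch that currently stores 0.
for s, r in ((0, 0), (1, 0), (0, 1), (1, 1)):
    print(f"S={s} R={r} ->", sr_latch(s, r, q=0, q_bar=1))
```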
SR Flip-Flops

• Most systems operate in a synchronous fashion with transition events referenced to a clock.
• One possible realization of a clocked SR flip-flop is a level-sensitive positive latch.
• It consists of a cross-coupled inverter pair, plus 4 extra transistors to drive the flip-flop from one state to another and to provide clocked operation.

• The combination of transistors M4, M7, and M8 forms a ratioed inverter.
• In order to make the latch switch, we must succeed in bringing Q below the switching threshold of the inverter M1-M2.
• Once this is achieved, the positive feedback causes the
flip-flop to invert states.
• The presented flip-flop does not consume any static
power.
Multiplexer Based Latches

• Multiplexer-based latches provide similar functionality to the SR latch, but have the important added advantage that the sizing of devices only affects performance and is not critical to the functionality.
Multiplexer Based Latches

• For a negative latch, when the clock signal is low, the input 0 of the multiplexer is selected, and the D input is passed to the output.
• When the clock signal is high, the input 1 of the
multiplexer, which connects to the output of the latch, is
selected.
• The feedback holds the output stable while the clock
signal is high.
• Similarly in the positive latch, the D input is selected
when clock is high, and the output is held (using
feedback) when clock is low.
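The two latch polarities differ only in which multiplexer input carries the feedback. A minimal behavioural sketch (function names are illustrative, and the multiplexer is assumed ideal):

```python
def mux(sel: int, in0: int, in1: int) -> int:
    """2-to-1 multiplexer: returns in1 when sel is 1, otherwise in0."""
    return in1 if sel else in0

def negative_latch(clk: int, d: int, q: int) -> int:
    """Transparent while clk is low; input 1 feeds back the stored value."""
    return mux(clk, d, q)

def positive_latch(clk: int, d: int, q: int) -> int:
    """Transparent while clk is high; input 0 feeds back the stored value."""
    return mux(clk, q, d)

print(negative_latch(clk=0, d=1, q=0))  # transparent: output follows D
print(negative_latch(clk=1, d=1, q=0))  # hold: output keeps the old Q
```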
Multiplexer Based Latches

Transistor-level implementation of a positive latch built using transmission gates.

• When CLK is high, the bottom transmission gate is on and the latch is transparent, that is, the D input is copied to the Q output.
• During this phase, the feedback loop is open since the top transmission gate is off.
Master-Slave Based Edge Triggered
Register
• The most common approach for constructing an edge-
triggered register is to use a master-slave configuration.
• The register consists of cascading a negative latch
(master stage) with a positive latch (slave stage).
Master-Slave Based Edge Triggered
Register
• On the low phase of the clock, the master stage is
transparent and the D input is passed to the master
stage output, QM.
• During this period, the slave stage is in the hold mode,
keeping its previous value using feedback.
• On the rising edge of the clock, the master stage stops sampling the input, and the slave stage starts sampling.
• During the high phase of the clock, the slave stage
samples the output of the master stage (QM), while the
master stage remains in a hold mode.
Master-Slave Based Edge Triggered
Register

• When clock is low (CLK = 0, so CLK-bar = 1), T1 is on and T2 is off, and the D input is sampled onto node QM.
• When the clock goes high, the master stage stops sampling the input
and goes into a hold mode.
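The master-slave sequence can be followed step by step in a behavioural model. This sketch ignores all delays and treats each latch as ideal; the class and attribute names are illustrative.

```python
class MasterSlaveRegister:
    """Negative latch (master) cascaded with a positive latch (slave)."""

    def __init__(self) -> None:
        self.qm = 0  # master output QM
        self.q = 0   # slave output Q

    def step(self, clk: int, d: int) -> int:
        if clk == 0:
            self.qm = d        # master transparent, slave holds
        else:
            self.q = self.qm   # master holds, slave transparent
        return self.q

reg = MasterSlaveRegister()
# (clk, d) pairs: D changes while the clock is high and must be ignored.
waveform = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]
outputs = [reg.step(clk, d) for clk, d in waveform]
print(outputs)
```

The output only changes when the clock goes high, and it takes the value D had just before that edge, which is the positive edge-triggered behaviour described above.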
Low-Voltage Static Latches
• The scaling of supply voltages is critical for low power
operation.
• Unfortunately, certain latch structures don’t function at
reduced supply voltages.
• Scaling to low supply voltages hence requires the use of
reduced threshold devices.
• When the registers are constantly accessed, the
leakage energy is typically insignificant compared to the
switching power.
• However, with the use of conditional clocks, it is
possible that registers are idle for extended periods and
the leakage energy expended by registers can be quite
significant.
Low-Voltage Static Latches

• Many solutions are being explored to address the problem of high leakage during idle periods.
Dynamic Latches and Registers

• Storage in a static sequential circuit relies on the concept that a cross-coupled inverter pair produces a bistable element and can thus be used to memorize binary values.
• The major disadvantage of the static gate, however, is its complexity.
• Dynamic latches and registers instead store the value temporarily as charge on a parasitic capacitor.
• The principle is exactly identical to the one used in dynamic logic: charge stored on a capacitor can be used to represent a logic signal.
• The absence of charge denotes a 0, while its presence
stands for a stored 1.
Dynamic Transmission-Gate Based
Edge-triggered Registers

• When CLK = 0, the input data is sampled on storage node 1, which has an equivalent capacitance of C1 consisting of the gate capacitance of I1, the junction capacitance of T1, and the overlap gate capacitance of T1.
• During this period, the slave stage is in a hold mode,
with node 2 in a high-impedance (floating) state.
• On the rising edge of clock, the transmission gate T2
turns on, and the value sampled on node 1 right before
the rising edge propagates to the output Q (note that
node 1 is stable during the high phase of the clock since
the first transmission gate is turned off).
• Node 2 now stores the inverted version of node 1.
• This implementation of an edge-triggered register is
very efficient as it requires only 8 transistors.
C²MOS Dynamic Register: A Clock Skew
Insensitive Approach
[Figure: the C²MOS register]
• CLK = 0 (CLK-bar = 1):
• The first tri-state driver is turned on, and the master
stage acts as an inverter sampling the inverted version
of D on the internal node X.
• The master stage is in the evaluation mode.
• Meanwhile, the slave section is in a high-impedance
mode, or in a hold mode.
• The roles are reversed when CLK = 1.
True Single-Phase Clocked Register
(TSPCR)
• In the two-phase clocking schemes described above,
care must be taken in routing the two clock signals to
ensure that overlap is minimized.
• The True Single-Phase Clocked Register (TSPCR) uses a single clock (without an inverse clock).
True Single-Phase Clocked Register
(TSPCR)
• For the positive latch, when CLK is high, the latch is in
the transparent mode and corresponds to two cascaded
inverters; the latch is non-inverting, and propagates the
input to the output.
• When CLK = 0, both inverters are disabled, and the latch
is in hold-mode.
• Only the pull-up networks are still active, while the pull-
down circuits are deactivated.
• A register can be constructed by cascading positive and
negative latches.
True Single-Phase Clocked Register
(TSPCR)

• The main advantage is the use of a single clock phase.
• The disadvantage is the slight increase in the number of transistors: 12 transistors are required.
• TSPC offers an additional advantage: the possibility of embedding logic functionality into the latches.
• This reduces the delay overhead associated with the latches.
True Single-Phase Clocked Register
(TSPCR)

• When CLK = 0, the input inverter is sampling the inverted D input on node X.
• The second (dynamic) inverter is in the precharge mode.
• The third inverter is in the hold mode.
Pulse Registers
• A fundamentally different approach for constructing a
register uses pulse signals.
• The idea is to construct a short pulse around the rising
(or falling) edge of the clock.
• This pulse acts as the clock input to a latch, sampling the input only in a short window.
• Race conditions are thus avoided by keeping the opening time (i.e., the transparent period) of the latch very short.
• The combination of the glitch generation circuitry and the latch results in a positive edge-triggered register.
Pulse Registers

• This in turn activates MN, pulling X and eventually CLKG low.
• The length of the pulse is controlled by the delay of the
AND gate and the two inverters.
Pulse Registers

• The advantage of the approach is the reduced clock load and the small number of transistors required.
• The glitch-generation circuitry can be amortized over multiple register bits.
• The disadvantage is a substantial increase in verification complexity.
• This has prevented widespread use.


Sense-Amplifier Based Registers

• A sense-amplifier structure can also be used to implement an edge-triggered register.
• Sense amplifier circuits accept small input signals and amplify them to generate rail-to-rail swings.
• There are many techniques to construct these amplifiers, with the use of feedback (e.g., cross-coupled inverters) being common.
Sense-Amplifier Based Registers

Positive edge-triggered
register based on sense-amplifier
Sense-Amplifier Based Registers
• The circuit uses a precharged front-end amplifier that
samples the differential input signal on the rising edge of
the clock signal.

• The outputs of the front-end are fed into a NAND cross-coupled SR FF that holds the data and guarantees that the differential outputs switch only once per clock cycle.
• The differential inputs in this implementation don't have to have rail-to-rail swing and hence this register can be used as a receiver for a reduced swing differential bus.
Pipelining: An approach to optimize
sequential circuits
• Pipelining is a popular design technique often used to
accelerate the operation of the datapaths in digital
processors.
• The goal of the presented circuit is to compute log(|a - b|), where both a and b represent streams of numbers, that is, the computation must be performed on a large set of input values.
• The minimal clock period Tmin necessary to ensure correct evaluation is given as:
$$T_{min} = t_{c-q} + t_{pd,logic} + t_{su}$$
Pipelining: An approach to optimize
sequential circuits

• Where tc-q and tsu are the propagation delay and the set-
up time of the register, respectively.

• The term tpd,logic stands for the worst-case delay path through the combinatorial network, which consists of the adder, absolute value, and logarithm functions.
• Pipelining is a technique to improve the resource utilization, and increase the functional throughput.
Pipelining: An approach to optimize
sequential circuits

• The advantage of pipelined operation becomes apparent when examining the minimum clock period of the modified circuit.
• The combinational circuit block has been partitioned into three sections, each of which has a smaller propagation delay than the original function.
• This effectively reduces the value of the minimum allowable clock period:
$$T_{min,pipe} = t_{c-q} + \max(t_{pd,adder}, t_{pd,abs}, t_{pd,log}) + t_{su}$$
Pipelining: An approach to optimize
sequential circuits
• Suppose that all logic blocks have approximately the
same propagation delay, and that the register overhead
is small with respect to the logic delays.

• The pipelined network outperforms the original circuit by a factor of three under these assumptions, or Tmin,pipe = Tmin/3.
• The increased performance comes at the relatively small cost of two additional registers, and an increased latency.
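A short worked example makes the factor-of-three claim concrete. The delay values below are hypothetical (in picoseconds) and chosen only for illustration; the formula is the usual one, clock-to-Q delay plus the slowest logic stage plus setup time.

```python
def min_clock_period(t_cq: int, t_su: int, stage_delays: list) -> int:
    """Register overhead plus the slowest logic stage between two registers."""
    return t_cq + max(stage_delays) + t_su

# Hypothetical delays in picoseconds (assumptions, not taken from the slides).
T_CQ, T_SU = 100, 50
ADDER, ABS, LOG = 1000, 1000, 1000

# Unpipelined: one stage containing adder, absolute value and logarithm.
t_min = min_clock_period(T_CQ, T_SU, [ADDER + ABS + LOG])
# Pipelined: three stages separated by registers.
t_min_pipe = min_clock_period(T_CQ, T_SU, [ADDER, ABS, LOG])
speedup = t_min / t_min_pipe

print(t_min, t_min_pipe, round(speedup, 2))
```

With these numbers the speedup is a little below three, because the register overhead is paid in every stage; it approaches three as the overhead becomes small relative to the logic delay.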
Latch- vs. Register-Based Pipelines

• Pipelined circuits can be constructed using level-sensitive latches instead of edge-triggered registers.
• The pipeline system is implemented based on pass-transistor-based positive and negative latches instead of edge-triggered registers.
• That is, logic is introduced between the master and slave latches of a master-slave system.
• Latch-based systems give significantly more flexibility in implementing a pipelined system, and often offer higher performance.
Latch- vs. Register-Based Pipelines

Operation of two-phase
pipelined circuit using dynamic registers
NORA-CMOS— A Logic Style for
Pipelined Structures
• This topology has one important property:
• A C²MOS-based pipelined circuit is race-free as long as all the logic functions F (implemented using static logic) between the latches are noninverting.
• The only way a signal can race from stage to stage under this condition is when the logic function F is inverting, for example when F is replaced by a single, static CMOS inverter.
NORA-CMOS— A Logic Style for
Pipelined Structures

• Logic and latch are clocked in such a way that both are
simultaneously in either evaluation, or hold (precharge)
mode.
• A block that is in evaluation during CLK = 1 is called a CLK-module, while the inverse is called a CLK-bar-module.
• A NORA datapath consists of a chain of alternating CLK and CLK-bar modules.
• While one class of modules is precharging with its output
latch in hold mode, preserving the previous output value,
the other class is evaluating.
Memory architecture
Semiconductor Memory Classification

• Read-Write Memory
 – Random Access: SRAM, DRAM
 – Non-Random Access: FIFO, LIFO, Shift Register, CAM
• Non-Volatile Read-Write Memory
 – EPROM, E²PROM, FLASH
• Read-Only Memory
 – Mask-Programmed
 – Programmable (PROM)
Memory Timing: Definitions
Memory Architecture: Decoders
• Intuitive architecture for N x M memory: N words of M bits each, with one select signal (S0 to SN-1) per word
 – Too many select signals: N words == N select signals
• Decoder reduces the number of select signals: K address bits (A0 to AK-1) select one of the N words
 – K = log2 N
[Figure: both organizations drawn as a column of storage cells, Word 0 to Word N-1, with M-bit input-output]
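The saving from a decoder is easy to quantify. The sketch below uses illustrative function names and models the decoder purely behaviourally: it computes the number of address bits K for N words and expands an address into one-hot select signals.

```python
def address_bits(n_words: int) -> int:
    """Smallest K such that 2**K >= n_words, i.e. K = ceil(log2 N)."""
    return (n_words - 1).bit_length()

def decode(address: int, k: int) -> list:
    """One-hot word select signals S0..S(2**k - 1) for a K-bit address."""
    return [1 if i == address else 0 for i in range(2 ** k)]

print(address_bits(1024))   # address lines needed for 1024 words
print(decode(2, k=2))       # select signals for address 2 of a 4-word memory
```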
Contents-Addressable Memory
[Figure: block diagram of a CAM with 2^9 words x 64 bits: I/O buffers for data (64 bits), comparand and mask registers, commands and control logic, R/W address (9 bits), address decoder, CAM array, 2^9 validity bits, priority encoder]
Memory Timing: Approaches
• DRAM timing: multiplexed addressing
• SRAM timing: self-timed
Read-Only Memory Cells
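[Figure: three ROM cell types (diode ROM, MOS ROM 1, MOS ROM 2), each shown storing a 1 and a 0 at the crossing of word line WL and bit line BL]

MOS OR ROM
[Figure: 4 x 4 array with word lines WL[0] to WL[3], bit lines BL[0] to BL[3], cell transistors connected to VDD, and pull-down loads biased by Vbias]

MOS NOR ROM
[Figure: 4 x 4 array with word lines WL[0] to WL[3], bit lines BL[0] to BL[3], cell transistors connected to GND, and pull-up devices to VDD]

MOS NAND ROM
[Figure: 4 x 4 array with series-connected cell transistors on bit lines BL[0] to BL[3], word lines WL[0] to WL[3], and pull-up devices to VDD]
All word lines high by default with exception of selected row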


Equivalent Transient Model for MOS
NOR ROM
[Figure: model for NOR ROM with distributed word-line resistance r_word and capacitance c_word, and bit-line capacitance C_bit]
• Word line parasitics
 – Wire capacitance and gate capacitance
 – Wire resistance (polysilicon)
• Bit line parasitics
 – Resistance not dominant (metal)
 – Drain and gate-drain capacitance
Equivalent Transient Model for MOS
NAND ROM
[Figure: model for NAND ROM with distributed word-line r_word and c_word, distributed bit-line r_bit and c_bit, and load capacitance C_L]
• Word line parasitics
 – Similar to NOR ROM
• Bit line parasitics
 – Resistance of cascaded transistors dominates
 – Drain/source and complete gate capacitance
Non-Volatile Memories
The Floating-gate transistor
(FAMOS)
[Figure: device cross-section (n+ source and drain in a p substrate, floating gate and gate separated by oxide of thickness tox) and schematic symbol (G, D, S)]


Floating-Gate Transistor
Programming
[Figure: three steps, each annotated with gate, floating-gate, source and drain voltages]
• Avalanche injection
• Removing programming voltage leaves charge trapped
• Programming results in higher VT
Flash EEPROM

[Figure: flash EEPROM cell cross-section with control gate, floating gate, thin tunneling oxide, n+ source and n+ drain in a p-substrate; erasure and programming paths indicated]
Many other options …


Basic Operations in a NOR Flash Memory: Erase
Basic Operations in a NOR Flash Memory: Write
Basic Operations in a NOR Flash Memory: Read
NAND Flash Memory
[Figure: NAND flash layout showing word line (poly), unit cell and source line (diffusion layer). Courtesy Toshiba]
Read-Write Memories (RAM)
• STATIC (SRAM)
 – Data stored as long as supply is applied
 – Large (6 transistors/cell)
 – Fast
 – Differential
• DYNAMIC (DRAM)
 – Periodic refresh required
 – Small (1-3 transistors/cell)
 – Slower
 – Single ended
6-transistor CMOS SRAM Cell

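[Figure: 6T cell with cross-coupled inverters M1-M2 and M3-M4, access transistors M5 and M6 gated by WL, complementary storage nodes Q, and complementary bit lines BL]
CMOS SRAM Analysis (Read)
[Figure: read operation with one storage node at 0 and the other at 1; both bit lines precharged to VDD, each with capacitance Cbit; relevant devices M1, M4, M5, M6]
CMOS SRAM Analysis (Write)
[Figure: write operation with one bit line driven to 1 and the complementary bit line driven to 0; relevant devices M1, M4, M5, M6]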
3-Transistor DRAM Cell
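[Figure: 3T cell with write word line WWL, read word line RWL, bit lines BL1 and BL2, transistors M1 to M3 and storage capacitance CS at node X; waveforms show X reaching VDD - VT and BL2 dropping by ΔV from VDD - VT]
• No constraints on device ratios
• Reads are non-destructive
• Value stored at node X when writing a "1" = VWWL - VTn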
1-Transistor DRAM Cell

Write: CS is charged or discharged by asserting WL and BL.
Read: Charge redistribution takes place between bit line and storage capacitance:
$$\Delta V = V_{BL} - V_{PRE} = (V_{BIT} - V_{PRE}) \frac{C_S}{C_S + C_{BL}}$$
Voltage swing is small; typically around 250 mV.
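The charge-redistribution formula, ΔV = (VBIT - VPRE) · CS / (CS + CBL), can be evaluated directly. The capacitances and voltages below are hypothetical values picked so that the result lands at the 250 mV mentioned above; they are not taken from the slides.

```python
def bitline_swing(v_bit: float, v_pre: float, c_s: float, c_bl: float) -> float:
    """Bit-line voltage change after the cell shares charge with the bit line."""
    return (v_bit - v_pre) * c_s / (c_s + c_bl)

# Hypothetical values: 50 fF cell, 200 fF bit line, bit line precharged to 1.25 V.
C_S, C_BL, V_PRE = 50e-15, 200e-15, 1.25

dv_one = bitline_swing(v_bit=2.5, v_pre=V_PRE, c_s=C_S, c_bl=C_BL)   # stored 1
dv_zero = bitline_swing(v_bit=0.0, v_pre=V_PRE, c_s=C_S, c_bl=C_BL)  # stored 0
print(f"reading a 1: {dv_one * 1e3:+.0f} mV, reading a 0: {dv_zero * 1e3:+.0f} mV")
```

The larger the bit-line capacitance relative to the cell capacitance, the smaller the swing, which is why a sense amplifier is needed on every bit line.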


DRAM Cell Observations
• 1T DRAM requires a sense amplifier for each bit line, due to
charge redistribution read-out.
• DRAM memory cells are single ended in contrast to SRAM
cells.
• The read-out of the 1T DRAM cell is destructive; read and refresh operations are necessary for correct operation.
• Unlike 3T cell, 1T cell requires presence of an extra capacitance that must be explicitly included in the design.
• When writing a "1" into a DRAM cell, a threshold voltage is lost. This charge loss can be circumvented by bootstrapping the word lines to a higher value than VDD.
Static CAM Memory Cell
[Figure: array of CAM cells sharing complementary Bit lines, Word lines and a Match line; each cell is built from transistors M1 to M9 with complementary storage nodes S and an internal node int]
Wired-NOR Match Line

CAM in Cache Memory
[Figure: CAM array and SRAM array with address decoder, hit logic, input drivers and sense amps; signals Address, Tag, Hit, R/W, Data]


Row Decoders
• Collection of 2^M complex logic gates
• Organized in regular and dense fashion

(N)AND Decoder

NOR Decoder
Hierarchical Decoders
Multi-stage implementation improves performance
[Figure: NAND decoder using 2-input pre-decoders; true and complemented address pairs A0, A1 and A2, A3 are predecoded and combined to drive WL0, WL1, ...]
Dynamic Decoders
[Figure: 2-input NOR decoder and 2-input NAND decoder, each with precharge devices clocked by φ, true and complemented inputs A0 and A1, and outputs WL0 to WL3]


4-input pass-transistor based column decoder
[Figure: a 2-input NOR decoder turns A0 and A1 into select signals S0 to S3, each gating one pass transistor on bit lines BL0 to BL3]
• Advantages: speed (tpd does not add to overall memory access time)
 – Only one extra transistor in signal path
• Disadvantage: large transistor count
4-to-1 tree based column
decoder
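[Figure: bit lines BL0 to BL3 routed through a binary tree of pass transistors, controlled by true and complemented A0 and A1, to the data line D]
• Number of devices drastically reduced
• Delay increases quadratically with the number of sections; prohibitive for large decoders
• Solutions:
 – buffers
 – progressive sizing
 – combination of tree and pass transistor approaches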
Decoder for circular shift-
register

[Figure: circular shift register of clocked stages (φ and its complement) with reset inputs R, driving WL0, WL1, WL2, ...]
Sense Amplifiers
$$t_p = \frac{C \cdot \Delta V}{I_{av}}$$
• C is large and Iav is small, so make ΔV as small as possible
• Idea: use a sense amplifier, which turns a small input transition into a full output swing
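The benefit of a reduced swing follows directly from tp = C · ΔV / Iav. The numbers in this sketch are hypothetical (a 1 pF bit line and 50 µA of average cell current) and serve only to show the proportionality.

```python
def bitline_delay(c: float, delta_v: float, i_av: float) -> float:
    """Time for an average current i_av to move capacitance c by delta_v."""
    return c * delta_v / i_av

C_BL = 1e-12    # assumed bit-line capacitance: 1 pF
I_AV = 50e-6    # assumed average cell current: 50 uA

t_full = bitline_delay(C_BL, 2.5, I_AV)     # waiting for a full 2.5 V swing
t_sensed = bitline_delay(C_BL, 0.25, I_AV)  # sense amp fires after 250 mV
print(f"full swing: {t_full * 1e9:.1f} ns, with sense amp: {t_sensed * 1e9:.1f} ns")
```

Since C and Iav are set by the array and the cell, shrinking ΔV is the available lever, and the delay scales linearly with it.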
Differential Sense Amplifier
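[Figure: differential sense amplifier with input pair M1-M2 on the complementary bit lines, load devices M3-M4, current source M5 enabled by SE, and output Out]
Directly applicable to SRAMs
Differential Sensing: SRAM
[Figure: (a) SRAM sensing scheme with precharge (PC) and equalization (EQ) devices on the complementary bit lines, SRAM cell i on word line WLi, and a differential sense amp enabled by SE; (b) two stage differential amplifier]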
Latch-Based Sense Amplifier (DRAM)
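[Figure: cross-coupled inverter latch between the complementary bit lines, with equalization device EQ and enable signals SE]
• Initialized in its meta-stable point with EQ
• Once adequate voltage gap created, sense amp enabled with SE
• Positive feedback quickly forces output to a stable operating point.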
Sources of Power Dissipation in
Memories
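$$I_{DD} = \sum C_i \, \Delta V_i \, f + \sum I_{DCP}$$
[Figure: chip current components between VDD and VSS: array of n rows by m columns with m selected cells drawing i_act and m(n - 1) non-selected cells drawing i_hld; row decoder n C_DE V_INT f; column decoder m C_DE V_INT f; periphery C_PT V_INT f and static current I_DCP]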
Suppressing Leakage in
SRAM
[Figure: two schemes for a row of SRAM cells. Inserting extra resistance: sleep transistors between VDD and VDD,int and between VSS,int and ground. Reducing the supply voltage: sleep signal switches VDD,int between VDD and a lower VDDL]


Clocking
• Synchronous systems use a clock to keep operations in
sequence
– Distinguish this from previous or next
– Determine speed at which machine operates
• Clock must be distributed to all the sequencing elements
– Flip-flops and latches
• Also distribute clock to other elements
– Domino circuits and memories
Clock Distribution
• On a small chip, the clock distribution network is just a
wire
– And possibly an inverter for clkb
• On practical chips, the RC delay of the wire resistance
and gate load is very long
– Variations in this delay cause clock to get to different
elements at different times
– This is called clock skew
• Most chips use repeaters to buffer the clock and
equalize the delay
– Reduces but doesn’t eliminate skew
Review: Skew Impact
• Ideally full cycle is available for work
• Skew adds sequencing overhead
• Increases hold time too
$$t_{pd} \le T_c - \underbrace{(t_{pcq} + t_{setup} + t_{skew})}_{\text{sequencing overhead}}$$
$$t_{cd} \ge t_{hold} - t_{ccq} + t_{skew}$$
[Figure: flip-flops F1 and F2 with combinational logic between Q1 and D2, and timing diagrams showing tpcq, tpdq, tsetup, tskew for the setup case and tccq, tcd, thold, tskew for the hold case]
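The two flip-flop constraints, tpd ≤ Tc - (tpcq + tsetup + tskew) and tcd ≥ thold - tccq + tskew, can be checked numerically. The timing values below are hypothetical, in picoseconds, and the function names are illustrative.

```python
def max_logic_delay(t_c: int, t_pcq: int, t_setup: int, t_skew: int) -> int:
    """Longest allowed propagation delay between two flip-flops."""
    return t_c - (t_pcq + t_setup + t_skew)

def min_contamination_delay(t_hold: int, t_ccq: int, t_skew: int) -> int:
    """Shortest allowed contamination delay; a negative value means no constraint."""
    return t_hold - t_ccq + t_skew

# Hypothetical flip-flop parameters in picoseconds.
T_C, T_PCQ, T_CCQ, T_SETUP, T_HOLD = 1000, 80, 40, 50, 20

for skew in (0, 50):
    print(f"skew {skew} ps: t_pd <= {max_logic_delay(T_C, T_PCQ, T_SETUP, skew)} ps, "
          f"t_cd >= {min_contamination_delay(T_HOLD, T_CCQ, skew)} ps")
```

Skew hurts on both sides: it shortens the time available for logic and raises the minimum delay that short paths must have.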
• Reduce clock skew
– Careful clock distribution network design
– Plenty of metal wiring resources
• Analyze clock skew
– Only budget actual, not worst case skews
– Local vs. global skew budgets
• Tolerate clock skew
– Choose circuit structures insensitive to skew
Skew Tolerance
• Flip-flops are sensitive to skew because of hard edges
– Data launches at latest rising edge of clock
– Must setup before earliest next rising edge of clock
– Overhead would shrink if we can soften edge
• Latches tolerate moderate amounts of skew
– Data can arrive anytime latch is transparent
Skew: Latches
2-Phase Latches
$$t_{pd} \le T_c - \underbrace{2 t_{pdq}}_{\text{sequencing overhead}}$$
$$t_{cd1}, t_{cd2} \ge t_{hold} - t_{ccq} - t_{nonoverlap} + t_{skew}$$
$$t_{borrow} \le \frac{T_c}{2} - (t_{setup} + t_{nonoverlap} + t_{skew})$$
[Figure: latches L1, L2, L3 clocked by φ1, φ2, φ1 with Combinational Logic 1 and Combinational Logic 2 between them]
Pulsed Latches
$$t_{pd} \le T_c - \underbrace{\max(t_{pdq},\; t_{pcq} + t_{setup} - t_{pw} + t_{skew})}_{\text{sequencing overhead}}$$
$$t_{cd} \ge t_{hold} + t_{pw} - t_{ccq} + t_{skew}$$
$$t_{borrow} \le t_{pw} - (t_{setup} + t_{skew})$$
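The latch constraints can be compared side by side with a few lines of code. The formulas used are the standard two-phase and pulsed-latch relations (logic budget Tc - 2·tpdq and Tc - max(tpdq, tpcq + tsetup - tpw + tskew); borrowing Tc/2 - (tsetup + tnonoverlap + tskew) and tpw - (tsetup + tskew)). All numeric values are hypothetical, in picoseconds.

```python
def two_phase_budget(t_c, t_pdq, t_setup, t_nonoverlap, t_skew):
    """Return (max logic delay per cycle, max time borrowing) for 2-phase latches."""
    t_pd = t_c - 2 * t_pdq
    t_borrow = t_c / 2 - (t_setup + t_nonoverlap + t_skew)
    return t_pd, t_borrow

def pulsed_budget(t_c, t_pdq, t_pcq, t_setup, t_pw, t_skew):
    """Return (max logic delay per cycle, max time borrowing) for pulsed latches."""
    overhead = max(t_pdq, t_pcq + t_setup - t_pw + t_skew)
    t_borrow = t_pw - (t_setup + t_skew)
    return t_c - overhead, t_borrow

# Hypothetical parameters in picoseconds.
print(two_phase_budget(t_c=1000, t_pdq=60, t_setup=50, t_nonoverlap=0, t_skew=50))
print(pulsed_budget(t_c=1000, t_pdq=60, t_pcq=80, t_setup=50, t_pw=150, t_skew=50))
```

In both cases moderate skew does not appear in the logic budget as long as data arrives while the latch is transparent, which is the skew tolerance described for latches.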


Dynamic Circuit Review
• Static circuits are slow because fat pMOS transistors load the inputs
• Dynamic gates use precharge to remove pMOS transistors from the inputs
 – Precharge: φ = 0, output forced high
 – Evaluate: φ = 1, output may pull low
[Figure: static and dynamic versions of a four-input gate with inputs A, B, C, D and output Y]
Domino Circuits
• Dynamic inputs must monotonically rise during
evaluation
– Place inverting stage between each dynamic gate
– Dynamic / static pair called domino gate
• Domino gates can be safely cascaded
[Figure: domino AND gate: a dynamic NAND (inputs A and B, clock φ, output W) followed by a static inverter (output X)]
Clock Skew
• Skew increases sequencing overhead
– Traditional domino has hard edges
– Evaluate at latest rising edge
– Setup at latch by earliest falling edge

$$t_{pd} = T_c - 2 t_{setup} - 2 t_{skew}$$
[Figure: traditional domino pipeline: chains of dynamic and static gates on clk and its complement, each half-cycle ending in a latch; tsetup and tskew marked at the latch]
Time Borrowing
• Logic may not exactly fit half-cycle
– No flexibility to borrow time to balance logic between
half cycles
• Traditional domino sequencing overhead is about 25% of
cycle time in fast systems!
[Figure: traditional domino pipeline on clk and its complement, with dynamic/static gate pairs and a latch at the end of each half-cycle; tsetup and tskew marked at the latch]
Skew-Tolerant Domino
• Use overlapping clocks to eliminate latches at phase
boundaries.
– Second phase evaluates using results of first
[Figure: two dynamic/static gate pairs clocked by overlapping clocks φ1 and φ2 with no latch at the phase boundary; waveforms of φ1, φ2 and nodes a, b, c]
Full Keeper
• After second phase evaluates, first phase precharges
• Input to second phase falls
– Violates monotonicity?
• But we no longer need the value
• Now the second gate has a floating output
– Need full keeper to hold it either high or low
[Figure: dynamic gate clocked by φ with weak full-keeper transistors on its output node]
Time Borrowing
• Overlap can be used to
– Tolerate clock skew
– Permit time borrowing
• No sequencing overhead
$$t_{pd} = T_c$$
[Figure: overlapping clocks φ1 and φ2, with the overlap toverlap divided between tborrow and tskew; Phase 1 gates clocked by φ1 and Phase 2 gates clocked by φ2, each a chain of dynamic/static pairs]
Multiple Phases
• With more clock phases, each phase overlaps more
– Permits more skew tolerance and time borrowing
[Figure: four overlapping clock phases φ1 to φ4; Phase 1 to Phase 4 each contain dynamic/static gate pairs clocked by the corresponding phase]
Clock Generation
[Figure: local clock generator producing φ1, φ2, φ3 and φ4 from clk and an enable signal en]
Timing issues
• Set up and hold time:
• Every flip-flop has restrictive time regions around the
active clock edge in which input should not change
• We call them restrictive because if the input changes in these regions, the output may not be the expected one.
• It may be derived from either the old input, the new input, or even something in between the two.
Timing issues

• The setup time is the interval before the clock where the
data must be held stable.
• The hold time is the interval after the clock where the
data must be held stable.
• Hold time can be negative, which means the data can
change slightly before the clock edge and still be
properly captured.
• Most present-day flip-flops have zero or negative hold time.
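The setup and hold definitions amount to a forbidden window around the active clock edge. The sketch below (illustrative function name, hypothetical times in picoseconds) classifies a data transition against that window, and shows that with a negative hold time a change shortly before the edge is still acceptable.

```python
def check_capture(clock_edge: int, data_change: int, t_setup: int, t_hold: int) -> str:
    """Classify a data transition against the setup/hold window of one clock edge."""
    window_start = clock_edge - t_setup
    window_end = clock_edge + t_hold   # lies before the edge if t_hold is negative
    if window_start < data_change < window_end:
        return "setup violation" if data_change <= clock_edge else "hold violation"
    return "ok"

EDGE = 1000  # hypothetical active clock edge at t = 1000 ps
print(check_capture(EDGE, 900, t_setup=50, t_hold=20))    # well before the window
print(check_capture(EDGE, 980, t_setup=50, t_hold=20))    # inside the setup interval
print(check_capture(EDGE, 1010, t_setup=50, t_hold=20))   # inside the hold interval
print(check_capture(EDGE, 995, t_setup=50, t_hold=-10))   # negative hold time
```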
Timing issues
• To avoid setup time violations:
 – Optimize the combinational logic between the flip-flops to minimize its delay.
 – Redesign the flip-flops to reduce their setup time.
 – Tweak the launch flip-flop to get a better slew at its clock pin; this makes the launch flip-flop faster and thereby helps fix setup violations.
 – Exploit clock skew deliberately (useful skew).
• To avoid hold time violations:
 – Add delay to the data path (using buffers).
 – Add lockup latches (in cases where the hold time requirement is very large, basically to avoid data slip).
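As a rough illustration of the buffer fix, the hold slack of a path can be written as tccq + tcd - thold - tskew, and a negative slack can be covered by inserting delay buffers. The sketch assumes each buffer adds a fixed, hypothetical delay t_buf; real buffer delays depend on load and slew, and every added buffer also lengthens the path for the setup check.

```python
def hold_slack(t_ccq: int, t_cd: int, t_hold: int, t_skew: int) -> int:
    """Positive slack means the hold requirement is met."""
    return t_ccq + t_cd - t_hold - t_skew

def buffers_needed(slack: int, t_buf: int) -> int:
    """Smallest number of buffers (each adding t_buf) that removes a negative slack."""
    if slack >= 0:
        return 0
    return -(slack // t_buf)  # ceiling of |slack| / t_buf using integer arithmetic

# Hypothetical path in picoseconds: fast logic and a large clock skew.
slack = hold_slack(t_ccq=40, t_cd=10, t_hold=20, t_skew=50)
print(slack, buffers_needed(slack, t_buf=15))
```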
