Clock Network Synthesis For Synchronous Circuits

CLOCK NETWORK SYNTHESIS

FOR SYNCHRONOUS CIRCUITS

Ranganayakulu Konduri
29th August, 2015
Agenda
1. Introduction
2. Clock skew sources
3. Clock distribution methods
4. Low power CTS
5. Advanced CTS topics
Introduction - 1
• What is CTS?
• Why do we need to do CTS?

[Figure: two flip-flops with D, Q and CK pins]
Introduction - 2
1. Clock skew
2. Clock jitter
3. Clock slew
4. Pulse period degradation
5. Min pulse width check
Clock Skew - 1
• Given two sequentially adjacent registers R1 and R2
with clock arrival times T1 at the source register clock
pin and T2 at the destination register clock pin,
clock skew can be defined as: Tskew(1,2) = T2 − T1

[Figure: two flip-flops; clock arrival times T1 and T2 at their CK pins]
Positive and negative skew
• If the clock latency of the capture flop is greater than
that of the launch flop, the skew is termed positive;
otherwise it is termed negative.

[Figure: three flip-flops; clock arrival times T1, T2 and T3 at their CK pins]
Positive skew
• The launch flop receives the clock edge before the
capture flop.
• Good for setup timing: the effective clock period is
increased by the skew. When applied deliberately, this
is termed useful skew.
• Bad for hold timing.
[Figure: CLK1 and CLK2 waveforms; CLK2 lags CLK1 by δ, giving TCLK + δ for setup and Thold + δ for hold]
Negative skew
• The capture flop receives the clock edge before the
launch flop.
• Bad for setup timing, good for hold timing.
[Figure: CLK1 and CLK2 waveforms offset by δ, giving TCLK - δ for setup]
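The effect of skew on both checks can be sketched numerically. This is a minimal sketch, assuming the usual textbook setup and hold inequalities; the function names and all delay values (in picoseconds) are illustrative assumptions and do not come from the slides.

```python
def setup_slack(t_clk, skew, t_clk_to_q, t_logic_max, t_setup):
    """Setup slack: the capture edge arrives at t_clk + skew."""
    return (t_clk + skew) - (t_clk_to_q + t_logic_max + t_setup)


def hold_slack(skew, t_clk_to_q, t_logic_min, t_hold):
    """Hold slack: data must stay stable for t_hold after the skewed capture edge."""
    return (t_clk_to_q + t_logic_min) - (t_hold + skew)


# Hypothetical path, all values in picoseconds.
T_CLK, T_CQ, T_MAX, T_MIN, T_SU, T_H = 1000, 100, 850, 60, 80, 50

for skew in (100, 0, -100):  # positive, zero, negative skew
    print(skew,
          setup_slack(T_CLK, skew, T_CQ, T_MAX, T_SU),
          hold_slack(skew, T_CQ, T_MIN, T_H))
# prints:
# 100 70 10
# 0 -30 110
# -100 -130 210
```

With these assumed numbers, positive skew turns a failing setup check into a passing one while eating into hold margin, and negative skew does the opposite.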
Slew
• Slew is the time it takes for a signal to
transition between two specific levels.
[Figure: signal edge between GND and VDD with the 30% and 70% levels marked]
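As an illustration of the definition, the sketch below measures a rising-edge slew from sampled waveform points by linear interpolation. The 30% and 70% thresholds follow the figure; the function names and the sample waveform are assumptions made for this example.

```python
def crossing_time(times, volts, level):
    """Linearly interpolate the first time a rising waveform crosses `level`."""
    samples = list(zip(times, volts))
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if v1 > v0 and v0 <= level <= v1:
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def rise_slew(times, volts, vdd, low=0.3, high=0.7):
    """Time spent between low*vdd and high*vdd on a rising edge."""
    return crossing_time(times, volts, high * vdd) - crossing_time(times, volts, low * vdd)


# Hypothetical ideal ramp: 0 V to 1 V in 100 ps, so the 30%-70% slew is about 40 ps.
print(rise_slew([0, 100], [0.0, 1.0], vdd=1.0))
```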
Slew rates
• Sharp slew rates maintain signal integrity.
• Slow slew rates increase short circuit power.
• Fast slew rates lead to overdesign of clock
network and may lead to increase in clock
network dynamic power.
• So the designer needs to make a trade-off in choosing
the slew target. As a rule of thumb it is
chosen to be 10-15% of the clock period.
CTS constraints - 1
• Area
– Clock network repeater/interconnect overhead
increases chip power.
– Minimizing area reduces wire capacitance and power.
• Global skew – Advisable to keep the target at least
2 times the normalized inverter delay.
• Clock balancing constraints
• Clock slew limit, max fanout limit.
CTS constraints - 2
• Insertion Delay
– Not directly used as a constraint, but an important
metric to track in CTS.
– Keep it as small as possible.
• Minimizes area and therefore power.
• Softens the effect of OCV.
– The ideal target is to keep total insertion delay (ID) below
one clock period (might not be achievable in most cases).
Clock Routing problem
• Given a source and n sinks, clock distribution
is an interconnect optimization problem.
• Connect all sinks to the source by an
interconnect tree so as to minimize:
– Clock skew = max over i,j of |t_i − t_j|
– Delay = max over i of t_i
– Total wirelength
– Noise and coupling effect
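The first three objectives can be computed directly from sink arrival times and tree edges. The sketch below is only illustrative: the sink names, arrival times (in ps) and edge coordinates are made up, and Manhattan distance is assumed for wirelength.

```python
def clock_tree_metrics(arrival_times):
    """Skew is max |t_i - t_j| over all sink pairs, i.e. latest minus earliest arrival."""
    latest = max(arrival_times.values())
    earliest = min(arrival_times.values())
    return {"skew": latest - earliest, "delay": latest}


def manhattan_wirelength(edges):
    """Total rectilinear length of a list of ((x1, y1), (x2, y2)) tree edges."""
    return sum(abs(x1 - x2) + abs(y1 - y2) for (x1, y1), (x2, y2) in edges)


# Hypothetical sinks and arrival times in picoseconds.
arrivals = {"ff1/CK": 120, "ff2/CK": 135, "ff3/CK": 128}
print(clock_tree_metrics(arrivals))        # {'skew': 15, 'delay': 135}

# Hypothetical tree edges on a placement grid.
edges = [((0, 0), (3, 4)), ((3, 4), (3, 10))]
print(manhattan_wirelength(edges))         # 13
```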
Global skew
• Global skew is the temporal difference between the
clock arrival times of the closest and farthest sinks.
[Figure: distribution of number of sinks versus delay, marking the normal insertion delay, the temporally closest sink, the temporally farthest sink, and the global skew between them]
Clock skew sources
• Uneven spread of sinks.
– Placement optimization engine is designed to
place the cells such that data-path delay is
minimized.
• Timing driven placement.
• Wire-length driven placement.
• Congestion driven placement
• Process, voltage, temperature variations
Pulse width/duty cycle degradation

• If the clock passes through combo logic that
does not have uniform rise/fall times,
you might end up seeing degradation of the pulse
width.
[Figure: two flip-flops with combinational logic in the clock path]
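A rough way to see the effect: in a non-inverting cell the rising edge is delayed by the rise delay and the falling edge by the fall delay, so the high pulse changes by their difference at every stage. The sketch below assumes only non-inverting cells and uses made-up delays in picoseconds.

```python
def high_pulse_width(input_high, stages):
    """Propagate the high pulse width through non-inverting cells.

    Each stage is (rise_delay, fall_delay). The rising edge is delayed by
    rise_delay and the falling edge by fall_delay, so the high pulse grows
    or shrinks by fall_delay - rise_delay per stage.
    """
    width = input_high
    for rise_delay, fall_delay in stages:
        width += fall_delay - rise_delay
    return width


# Hypothetical 1000 ps clock with a 500 ps high phase.
print(high_pulse_width(500, [(50, 60), (40, 55)]))   # 525: high phase stretched
print(high_pulse_width(500, [(60, 50)] * 5))         # 450: high phase shrunk
```

The period is unchanged in this model; only the duty cycle drifts, which is what the min pulse width check guards against.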
Pulse period degradation
• If the clock passes through a clock network that is not built properly, you might
end up seeing a longer period than the actual clock period.
[Figure: two flip-flops driven through the clock network]
Clock design problem
• What are the main concerns for clock design?
• Skew
– No. 1 concern for clock networks
– For increased clock frequency, skew may contribute over 10% of the
system cycle time
• Power
– very important, as clock is a major power consumer
– It switches at every clock cycle
• Noise
– Clock is often a very strong aggressor
– May need shielding, double/triple spacing
• Delay
– Not really important
– But slew rate is important (sharp transition)
• EM
– May need double/triple width rules for interconnect.
Clock tree synthesis methods
• Multi-point CTS – Mesh/grid based design.
• Tree distribution
– Symmetric buffered trees (Eg : H-tree)
– Synthesized buffered trees
Multipoint CTS
• Pros
– Multipoint CTS is the best option when working with high-speed
designs at frequencies of 1 GHz and above.
– The lowest skew can be achieved by this method.
– Uneven distribution of sinks does not affect the skew.
– Tolerant to process variation.
• Cons
– Huge wire area and very large drivers, which result in huge
power consumption.
– The grid is in general over-designed; achieving an optimal solution is
very difficult.
Symmetric clock trees
• The H-tree is the best known of the symmetric clock trees.
• Pros
– Inherently balanced structure, can achieve low skew.
– Lower area and power compared to mesh CTS
– Easier to implement given an even sink distribution.
• Cons
– Difficult to achieve lower skew in case of non-uniform distribution of
sinks.
– Need to guide the tool/automatic solution when working with
macro-dominated designs.
– The designer needs to spend more time on clock gate placement and on
meeting timing at the enable pins of the clock gating logic.
H-tree algorithm
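A minimal sketch of recursive H-tree construction follows, assuming an ideal, unobstructed square region and ignoring buffering. The function names and the coordinate system are choices made for this example, not part of any tool.

```python
def h_tree(center, half_span, levels):
    """Return (segments, leaves) of a recursive H-tree; each level draws one 'H'."""
    cx, cy = center
    if levels == 0:
        return [], [center]
    left, right = (cx - half_span, cy), (cx + half_span, cy)
    segments = [(left, right)]                    # horizontal bar of the H
    leaves = []
    for x, _ in (left, right):
        top, bottom = (x, cy + half_span), (x, cy - half_span)
        segments.append((bottom, top))            # vertical bar of the H
        for corner in (top, bottom):              # recurse at each of the 4 corners
            sub_segments, sub_leaves = h_tree(corner, half_span / 2, levels - 1)
            segments += sub_segments
            leaves += sub_leaves
    return segments, leaves


def root_to_leaf_length(half_span, levels):
    """Wire length from the root to any leaf; identical for all leaves by symmetry."""
    return sum(2 * half_span / 2 ** k for k in range(levels))


segments, leaves = h_tree((0, 0), 4, 2)
print(len(segments), len(leaves))     # 15 16
print(root_to_leaf_length(4, 2))      # 12.0
```

Because every leaf sees the same wire length from the root, the structure is balanced by construction, provided the sinks actually sit on the resulting regular grid.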
Clock tree synthesis
• Different tools use a variety of algorithms to provide a
solution when the design has a non-uniform sink
distribution.
– But it is difficult to achieve a robust clock tree solution.
A clock tree designed for one particular corner
with minimal skew (say X) may have 2-3 times that skew (2X-3X)
in another corner.
– The skew achieved might not always be the lowest. User
intervention is needed to achieve better skew and
minimum insertion delay.
Clock tree synthesis - 2
• CTS engine’s approach for non-uniform clock
sink distribution.
– Clustering
• Sinks are grouped into clusters of almost equal load.
• The clustering algorithm then groups the repeaters inserted in the
previous level into equal-load clusters, level by level, until it reaches the
clock source.
– Clock tree routing
• Clock tree router ensures the delays are matched to have
uniform load at the same hierarchy.
Method of Means and Medians
• It is one of the algorithms used by CTS engines for
non-uniform distributions of sinks.
• Recursively partition the terminals into two sets
of equal size.
• Connect the center of mass of the whole circuit
to the centers of mass of the two sub-circuits.
• Clock skew is minimized only heuristically; the result may not
achieve minimal skew or zero skew.
MMM method
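The recursion described above can be sketched as follows. This is a simplified illustration, assuming the cut direction alternates between x and y at each level and that the split is made at the median of the sorted sinks; the function names are hypothetical.

```python
def center_of_mass(points):
    """Mean x and mean y of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def mmm(sinks, split_on_x=True):
    """Return tree edges (parent, child) built by means-and-medians partitioning."""
    if len(sinks) <= 1:
        return []
    # Median split: sort along the current axis and cut the list in half.
    key = (lambda p: (p[0], p[1])) if split_on_x else (lambda p: (p[1], p[0]))
    ordered = sorted(sinks, key=key)
    half = len(ordered) // 2
    parent = center_of_mass(ordered)
    edges = []
    for subset in (ordered[:half], ordered[half:]):
        # Connect this region's center of mass to each sub-region's center of mass.
        edges.append((parent, center_of_mass(subset)))
        edges += mmm(subset, not split_on_x)
    return edges


# Hypothetical sinks on the corners of a square.
for parent, child in mmm([(0, 0), (0, 2), (2, 0), (2, 2)]):
    print(parent, "->", child)
```

For this symmetric input the tree happens to be perfectly balanced; for an uneven sink spread the branch lengths differ, which is why the method only minimizes skew heuristically.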
Geometric matching algorithm
• Geometric matching of n end-points
– Construct a set of n/2 line segments that connect the n
sinks in pairs.
– The cost is the sum of the edge lengths.
GMA
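One level of the matching step can be sketched as below. This is not how a production CTS engine does it: the sketch uses exhaustive search, which is only practical for a handful of points, and assumes Manhattan distance as the edge length. All names are illustrative.

```python
def manhattan(p, q):
    """Rectilinear distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def min_cost_matching(points):
    """Exhaustive minimum-cost perfect matching over an even number of points."""
    if len(points) % 2:
        raise ValueError("need an even number of points")
    if not points:
        return 0, []
    first, rest = points[0], points[1:]
    best_cost, best_pairs = None, None
    for i, partner in enumerate(rest):
        # Pair `first` with `partner`, then match the remaining points optimally.
        cost, pairs = min_cost_matching(rest[:i] + rest[i + 1:])
        cost += manhattan(first, partner)
        if best_cost is None or cost < best_cost:
            best_cost, best_pairs = cost, [(first, partner)] + pairs
    return best_cost, best_pairs


# Hypothetical sinks: two tight pairs far apart.
cost, pairs = min_cost_matching([(0, 0), (10, 0), (1, 0), (11, 0)])
print(cost)    # 2
print(pairs)   # [((0, 0), (1, 0)), ((10, 0), (11, 0))]
```

In a full bottom-up flow, a tapping point on each matched segment would become an endpoint for the next level of matching, repeated until a single root remains.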
Clock gating
• A significant fraction of power is consumed by the clock network. Almost 50% of
total dynamic power is contributed by the clock network.
– This is intuitive, since clock buffers have the highest toggle rate (probability of transition a=2).
– There are lots of clock buffers in the design (~5%).
– All the clock buffers used have high drive strength (i.e. they are larger, so they have more gate
cap).
– Also, clocked registers dissipate some dynamic power even when the input data
is not changing.
• The simplest way to save this power is to turn off clocks when they are not needed: clock
gating.
• Earlier (2004), clock gates were inserted manually based on an understanding of the
circuitry. Now tools automatically insert clock gates whenever
they identify a circuit structure that can be replaced by a clock gate
(ICG).
Clock gating (synthesis)
always @ (posedge CLK)
    if (EN)
        Q <= D;
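[Figure: before gating, the register with enable EN is clocked directly by CLK (high activity ‘a’); after gating, an ICG driven by EN and CLK produces GCLK for the register (low activity ‘a’)]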
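The drop in activity can be illustrated by counting clock-pin transitions with and without gating. This is a simple cycle-based model with a hypothetical enable pattern; it ignores the power of the ICG cell itself.

```python
def clock_pin_transitions(enable_per_cycle, gated):
    """Count clock-pin transitions seen by a register over the given cycles.

    An ungated clock pin toggles twice every cycle (rise and fall). A gated
    clock pin (GCLK) toggles only in cycles where EN is high.
    """
    if not gated:
        return 2 * len(enable_per_cycle)
    return 2 * sum(1 for en in enable_per_cycle if en)


# Hypothetical enable pattern: the register is written in 2 of 8 cycles.
enable = [1, 0, 0, 1, 0, 0, 0, 0]
print(clock_pin_transitions(enable, gated=False))  # 16
print(clock_pin_transitions(enable, gated=True))   # 4
```

Since dynamic power scales with switching activity, the gated register's clock-pin power in this example would be roughly a quarter of the ungated case.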
Clock gating challenges
• Not all registers on a clock can be gated
– Synthesis tools try to cover as much as possible
– This causes imbalance in clock network resulting in
skew.
• Clock gating logic should be placed close to the
registers.
– Clock gating insertion is done during RTL
elaboration, with no or minimal knowledge of
placement.
Multi-mode clock balancing

[Figure: clock network driven by CLK1, SCAN_CLK and CLK2; legend: tree buffer, skew buffer]
Clocks between Multi-VDD domains
[Figure: two flip-flops across voltage domains; labels: 1.1V, 0.9/1.1V, 1.2V, CLKcontrol]
Cascaded clocks

[Figure: cascaded clocks; labels: CLK1, CLK2, CLK1_div2]
Path sharing

[Figure: clock trees without (W/O) and with (W/) path sharing]
update_clock_latency
• Why do we need to run UCL?
– In general we provide an estimated clock latency
during the pre-CTS phase, which may be off from the
insertion delay numbers calculated after clock
tree implementation.
– If we do not update the clock latency to the
average/median latency of all the sinks, we
might end up with optimistic or pessimistic
budgets for the interface.
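The arithmetic behind the update can be sketched as below. This does not reproduce the tool command; it only shows, with made-up latencies in picoseconds, how far a pre-CTS estimate can sit from the post-CTS median.

```python
from statistics import median


def updated_latency(sink_latencies):
    """Representative post-CTS clock latency: the median over all sinks."""
    return median(sink_latencies)


def estimate_error(pre_cts_estimate, sink_latencies):
    """How far the pre-CTS latency estimate is from the post-CTS median."""
    return updated_latency(sink_latencies) - pre_cts_estimate


# Hypothetical post-CTS insertion delays per sink, in picoseconds.
latencies = [410, 395, 430, 405, 420]
print(updated_latency(latencies))       # 410
print(estimate_error(300, latencies))   # 110
```

In this example the interface budgets would be off by 110 ps if the pre-CTS estimate of 300 ps were left in place.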
CTS – Tool perspective
1. Understand your clock
2. Fix issues in your clock
3. Route your clock (RRC)
4. Balance your clock (RGC)
5. Debug your clock
Understand your clock-1
• Before starting CTS, remove all the existing repeaters.
• Are there any pre-existing repeaters that are
marked as dont_touch (constraint buffers, guide
buffers, etc.)?
• Understand how many levels of clock gating, muxes,
combo logic, etc. exist (report_clock_tree -summary / -level_info).
• Investigate the macro placement, especially from a
clock tree perspective.
Understand your clock-2

[Figure: floorplan with macros L2ram0, L2ram1, L2ram2, L2ram3 and L2ram4]
Fix issues in your clock
• Check whether the clocks are defined on
non-hierarchical ports.
– Many tools may not support clock definitions
on hierarchical ports.

[Figure: ICG with EN and CLK inputs producing GCLK for a register (D, Q)]
Fix issues in your clock-2
• Check that each generated clock traces
back to the source named in its definition.
– If this is not fixed, the generated clock sinks
will not be balanced.

[Figure: divider circuit; labels: CLKA, CLKAdiv2, CLKB]

create_generated_clock -name CLKAdiv2 -source [get_ports CLKA] \
    [get_pins divreg/q] -divide_by 2
SDC check
• Understand your SDC constraints
– set_clock_sense -stop_propagation
– set_case_analysis (report_case_analysis)
– set_disable_timing (report_disable_timing)
Fix issues (congestion/cell density/illegal cells)
• Before starting CTS, check and fix issues
related to illegal cells. If you provide an illegal
database as the starting db, your CTS engine might
give you an illegal db post-CTS.
– Some tools might not even proceed if you start
with an illegally placed db.
• Fix/address congestion and cell density
issues.
CTS spec-1
• DRC constraints (clock trans/cap/fanout)
• CTS goals (skew/target insertion delay)
• CTS algos (ocv clustering, logic balancing,
top/block mode)
• Buffers to be used for CTS/CTO engines
– -reference (CTS)
– -reference
• -sizing_only
• -delay_insertion_only
CTS spec – 2 (NDRs)
• Target metal layers to be used
• Wide space rules
• Shielding rules
CTS spec-3 (exceptions)
• Non stop pins (skew anchor)
• Exclude pins (skew don’t care pins)
• Float pins (offset latency; note that a negative offset
will delay the endpoint)
• Stop pins
CTS exceptions
• Don’t size cells
• Don’t buffer nets
• Size only cells (do not clone or move)
• Preserve hierarchy (do not cluster with clock
sinks of other hierarchies; rarely used, and it
will affect CTS QoR)
Build your clock
• Build the most critical functional clock first
and keep the tree using don’t touch tree
options.
• Overlapping clocks should be synthesized
together
CTS modes
• Block mode (bottom-up)
– Designs which have fewer obstructions or a uniform core area
• Top mode (mixed mode bottom-up for lower level
and top-down for top level)
– As the name indicates use it for top level implementation
or blocks which have more macros. Use it when you have
tighter constraints on ID.
• Logic-level balancing (To minimize OCV)
– Minimizes skew, balances both levels and clock network
delay (robust clock tree)
CTS modes -2 (Logic level balancing)

[Figure: logic-level balancing across ICG cells with EN and CLK inputs]

- Use only when timing is very critical and power/area is not a concern
OCV clustering
CTS Issues
• Clock detouring near hard macros – use guide buffers
• Large skew due to unbalanced pad rise/fall delay
– Provide feedback to pinmux owner/RTL owner to use
balanced IO buffer
• Constraint conflict near overlapping clock, fix the
constraints.
• Large skew on certain branch due to sinks which are
spread across. Use tighter DRC constraint on that
branch.
CTS checklist - 1
• Pre-requisite to start CTS :
– Is the design legally placed?
– Is the pre-CTS timing snapshot reasonably good?
– Are pre-CTS congestion/cell density issues
addressed?
– Are all generated clocks able to trace back to
their source clock? Not meeting this might result in very
bad skew in the design.
CTS checklist – 2
• Do the cells specified in the clock tree
references list have equal
rise and fall delays?
• Is a wide range of buffers/inverters
specified in the list? The recommendation is to
provide 4X-16X buffers and inverters.
• Remove dont_use and dont_touch attributes from the cells
specified in the clock tree reference list.
CTS checklist - 3
• Clock tree options : (set_clock_tree_references)
– Specify all four options to get better CTS results
– reference (CTS)
– reference -sizing_only (CTO)
– reference -delay_insertion_only (CTO)
– reference -boundary_cell_only (CTO)

• If you issue the set_clock_tree_references command multiple times, the
new references you specify are added to the existing references.
• The recommendation is to issue reset_clock_tree_reference before you
set the references.
CTS checklist – 4
• Remove existing clock tree buffers/inverters
in the design.
– report_clock_tree -summary
• You can find how many existing CTS buffers/inverters
exist in the design.
– Issue remove_clock_tree to delete existing
buffers/inverters.
Notes
• A good CTS engine should
– Implement low latency and well balanced tree
– Optimize tree gate area
– Meet rise/fall time requirements
– Be capable of meeting skew within a skew
group and across synchronous skew groups
– Provide the ability to offset skew on certain endpoints
– Need to build robust clock network
– Minimize xtalk effects
Notes-2
• Use balanced CT repeaters
• Minimize OCV impact
• Gated logic should be clustered and placed
optimally.
• Need to support manual clock tree synthesis
along with automated tree implementation.
• Use of inverters will help to reduce the CT
area
