Professional Documents
Culture Documents
Clock Balance Ieee Seminar04
Clock Balance Ieee Seminar04
Clock Distribution
Distribution and
and Balancing
Balancing
Methodology
Methodology For
For Large
Large and
and
Complex
Complex ASIC
ASIC Designs
Designs
z Summaries
Classical clock tree synthesis
methods
Taping point
Clock tree tuning
logic cloud
z Group launch and capture clock nodes and cluster them in the
bottom up fashion in clock tree topology generation
Bad Good
Bad Good
Clock tree synthesis: Engineer
perspective
z Custom vs. automatic clock tree synthesis
z Clock structure in large and complex ASIC designs
z Clock phase delay control
z Clock skew control
z Clock duty cycle distortion control
z Clock gating efficiency
z Clock signal integrity
Custom clock tree distribution and
balancing
z Manually define top levels of clock tree to blocks
H-tree, wide/shield wires, differential buffers etc.
z Build local mesh or tree to distribute clock to leaf cells
z Extract clock tree and build SPICE deck.
z Analyze the extracted clock tree and tune it manually.
z Pros:
Low skew clock tree
z Cons:
~ Long and complex process
~ Large power dissipation in clock mesh
z Popular in high speed MPU design, but not suitable for ASIC
and low-power designs
Automatic clock tree synthesis (CTS)
z Pros:
~ Flexibleand relatively easy to use
~ Low skew, if clock structure is not complex
~ Clock phase delay and wire length are minimized
~ max_fanout, max_cap, max_slew targets are honored
~ Good correlation between pre- and post-route clock
skew
z Limitations:
~ Optimization is focused on expanded clock buffer trees.
~ Quality drops when clock structure becomes complex
and CTS constraints are not provided.
z Need CTS constraints and guidance.
Clock tree synthesis: Engineer
perspective
z Custom vs. automatic clock tree synthesis
z Clock structure in large and complex ASIC designs
z Clock phase delay control
z Clock skew control
z Clock duty cycle distortion control
z Clock gating efficiency
z Clock signal integrity
Clock structure in large and complex
ASIC designs
z For power saving
~ Multi-levelclock gating
~ Many gating domains
~ Multi-clock speed domains
Clock_Gating_
Cell FlipFlop
CTB
CTB
Clock CTB
CTB CTB
divider
CTB FlipFlop
CTB
Clock_Gating_
Cell
1 1
CTS
0 0
Gating Gating
z A practical solution
~ Find“non-CTS” output nets
~ Apply heavy net-weight to the nets in cell placement
CTB
CTB
CTB
CTB
CTB
Flop3 Flop3
Top-level
delay
CTB
CTB
Flop3
Top-level
delay
Solution 2: Isolate the local flops
CTB
Flop2 Flop2
CTB
CTB
CTB
CTB CTB
CTB
Flop
PLLs
programmable DLLs
Interface
Interfaces
MPU Core
CLK&RST
Mgr.
DMA
Controller
DPLL x 2
1/8
1/6
Clock 1/4 CTS
dividing FSM 1/2
bypass
Clock
select
Reset
CLR
Div lock-in D Q 1
0
Clock-out
Clock-in
Delay line
Div/bypass
select
1/8
1/6
Clock Dividing
dividing 1/4 clock delay CTS
FSM 1/2
equalizer
Div/bypass
Clock select
Dividing Clock
bypass select
Dividing
Clock clock delay CTS
Clock-in dividing equalizer
FSM
Clock select
z Pros.
~ Save one flop
~ Divided clocks and bypass clock are always in phase
Clock
Div_2
Div_4
Div_8
Cause 2: Operation mode dependent
multiple clock distributions
DOMAIN1 block1_func_enable no_transaction
k
1 test_enable
GATING CTS
Test_clk 0 1 GATING CTS
DPLL2 CTS GATING
0
test_enable
block2_func_enable
func_enable2
GATING
CTS
bsr_clock DOMAIN2
boundary_scan_enable
Functional
Four levels of Gating path func_enable
1/x
1. Chip level clock gating Asynchronous
1
2. Functional domain level clock gating Divider GATING CTS
module
3. Block level clock gating 0
CTS
4. Transaction level clock gating
Test mode path test_enable
Solution: A multi-mode clock balance
strategy
z Extracted a common clock distribution topology
z Mode dependent clock path selection criteria
~ Minimizeunbalanced leaf cells
~ Maximize balanced leaf cells in timing critical paths
k
1 test_enable
GATING CTS
Test_clk 0 1 GATING CTS
DPLL2 CTS GATING
0
test_enable
block2_func_enable
func_enable2
GATING
CTS
bsr_clock DOMAIN2
boundary_scan_enable
Functional
path func_enable
1/x
1
Asynchronous
GATING CTS
Divider module
0
CTS
Clock_in 1
Clock_out Clock_in
Clock_out
0 Test_mode
Test_mode
Issues and practical solutions in
automatic clock tree synthesis
z Clock phase delay control
z Clock skew control
z Clock duty cycle distortion control
z Clock gating efficiency
z Clock signal integrity
Clock duty cycle distortion control
INV
CTB FlipFlop
CTB
INV CTB
CTB
CTB FlipFlop
CTB
INV
Clock Td
Delay tree
line Td Output Clock
z Pros.
z Single insertion point and easy to implement
z Implemented before clock tree expansion
z Local delay tuning has little disturbance in layout
Solution2: Implement a clock phase
chopper – case 2: longer low phase
z A clock phase chopper for 50-Td/50+Td duty cycle
distortion
z An OR gate chopper to delay falling edge of the clock
Clock Td
Delay tree
line Td Output Clock
Issues and practical solutions in
automatic clock tree synthesis
z Clock phase delay control
z Clock skew control
z Clock duty cycle distortion control
z Clock gating efficiency
z Clock signal integrity
Clock gating efficiency
z Maximize gating effect to reduce idle power
z Minimize ungated part of the clock tree
~ Place clock gating cells close to clock sources
Clock_Gating_
Cell Flop Clock_Gating_
CTB Cell CTB Flop
CTB
CTB
CTB
Clock CTS
divider