Grid-to-Ports Clock Routing for High Performance Microprocessor Designs
Haitong Tian#, Wai-Chung Tang#, Evangeline F.Y. Young# and C.N. Sze*
# Department
Outline
Introduction
Problem Formulation
Routing Algorithm
Experimental Results
Conclusion
ISPD 2011
[Figure: clock distribution hierarchy: External Clock, PLL, Regional Clock Buffers, Clock Grid with Grid Buffers, Post-grid Clock Distribution, Local Clock Buffers, Local Clock Network]
[Figure: post-grid layout region: Global Grid, Grid Buffer, Blocks, Port, Sequential, Local Clock Buffer, Layout Region, Reserved Multilayer Tracks]
Motivations
Clock distribution of microprocessor:
Crucial importance
Major source of power dissipation
Manually designed in practice
Hard to satisfy delay/slew constraints
Time to market
See Shelar, ISPD09 and TCAD10
[1]: D. Pham, T. Aipperspach, D. Boerstler, M. Bolliger, R. Chaudhry, D. Cox, P. Harvey, P. Harvey, H. Hofstee, C. Johns, et al. Overview of the architecture, circuit design, and physical implementation of a first-generation cell processor. IEEE Journal of Solid-State Circuits, 41(1):179-196, Jan. 2006.
Problem Formulation
Input
A set of reserved tracks
Locations and capacitances of ports P
Different types of wires on each metal layer
Delay limit D, slew limit S
Output
A clock network (may be a non-tree structure)
Objective
Connecting every port to the source
Satisfying delay and slew constraints
Minimizing capacitance usage
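The formulation above can be written compactly as a constrained minimization. This is only a sketch: the symbols N (the constructed clock network), cap(N), delay_N(p), and slew_N(p) are introduced here for illustration and do not appear on the slides.

```latex
\begin{aligned}
\min_{N}\quad & \mathrm{cap}(N) \\
\text{s.t.}\quad & \mathrm{delay}_N(p) \le D, \quad \forall p \in P, \\
& \mathrm{slew}_N(p) \le S, \quad \forall p \in P, \\
& N \text{ uses only reserved tracks and connects every } p \in P \text{ to the source.}
\end{aligned}
```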
[Figure: layout plot for Layer 3]
Overall Algorithm
Critical ports
Ports with large capacitance or far away from the source
Path expansion algorithm
Elmore-delay driven
Expanding in some selected
directions
Post-processing
Wire replacement
Topology refinement
Iterations
The overall algorithm is
repeatedly invoked
May fail when the number of iterations exceeds K (user specified)
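The path expansion is described as Elmore-delay driven. As a reminder of what that metric computes, the following minimal Python sketch evaluates Elmore delays on an RC tree. The function name, the dictionary-based inputs, and the simplification of lumping all capacitance at the nodes are assumptions made for illustration; the slides do not show the authors' C++ implementation.

```python
from collections import defaultdict


def elmore_delays(parent, res, cap, root):
    """Return the Elmore delay from `root` to every node of an RC tree.

    parent[v] -> parent node of v (the root has no entry)
    res[v]    -> resistance of the wire from parent[v] to v
    cap[v]    -> capacitance lumped at node v (assumed model)
    """
    children = defaultdict(list)
    for node, par in parent.items():
        children[par].append(node)

    # Breadth-first order: every parent appears before its children.
    order = [root]
    i = 0
    while i < len(order):
        order.extend(children[order[i]])
        i += 1

    # Downstream capacitance, accumulated from the leaves upward.
    down = {u: cap.get(u, 0.0) for u in order}
    for u in reversed(order):
        for v in children[u]:
            down[u] += down[v]

    # Each wire adds (its resistance) x (all capacitance it drives).
    delay = {root: 0.0}
    for u in order:
        for v in children[u]:
            delay[v] = delay[u] + res[v] * down[v]
    return delay


# Toy tree (hypothetical values): S -> a, a -> b, a -> c
toy = elmore_delays(
    parent={"a": "S", "b": "a", "c": "a"},
    res={"a": 2.0, "b": 3.0, "c": 1.0},
    cap={"S": 0.0, "a": 1.0, "b": 4.0, "c": 2.0},
    root="S",
)
print(toy)  # delays: a = 14.0, b = 26.0, c = 16.0
```

Under this model, a port with a large capacitance raises the delay of every wire upstream of it, which is consistent with treating such ports as critical.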
A Routing Example
Initially, the heap is empty
First iteration (simultaneously expand from all ports)
Heap={(P1,P2);(P1,C1);(P2,P3);(P2,P1);(P3,P2);(P3,C2)}
Fourth iteration (identify chain paths)
Heap={(P1,P2,P3);(P1,C1);(P2,P3);(P2,P1)}
Chain path={(P1,P2,P3);(P2,P3)}
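The first iteration of the example above can be sketched with Python's `heapq` module. The adjacency table reproduces the six candidate paths listed on the slide; the cost values and the names `seed_heap`, `NEIGHBORS`, and `TOY_COST` are hypothetical, and ordering the heap by a per-path cost is an assumption (the slides say only that expansion is Elmore-delay driven).

```python
import heapq

# Expansion targets per port, taken from the heap contents on the slide.
NEIGHBORS = {
    "P1": ["P2", "C1"],
    "P2": ["P3", "P1"],
    "P3": ["P2", "C2"],
}

# Hypothetical path costs, standing in for an Elmore-delay estimate.
TOY_COST = {
    ("P1", "P2"): 3.0,
    ("P1", "C1"): 5.0,
    ("P2", "P3"): 2.0,
    ("P2", "P1"): 3.5,
    ("P3", "P2"): 2.5,
    ("P3", "C2"): 4.0,
}


def seed_heap(neighbors, cost):
    """Start from an empty heap and expand one step from every port."""
    heap = []
    for port in sorted(neighbors):
        for target in neighbors[port]:
            path = (port, target)
            heapq.heappush(heap, (cost[path], path))
    return heap


heap = seed_heap(NEIGHBORS, TOY_COST)
best_cost, best_path = heap[0]  # the smallest-cost entry sits at index 0
print(len(heap), best_path)
```

A min-heap keeps the cheapest candidate path available in constant time, so each later iteration can pop it and push its extensions.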
Post-processing Techniques
Wire replacement
Two types of wires
capacitance/resistance tradeoff
Procedures
Identify port Pl with the largest
Elmore delay
[Figure: wire replacement example with nodes P1, P3, P4, P5, C1, C2]
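The capacitance/resistance tradeoff between two wire types can be illustrated on a single segment. The sketch below uses the standard Elmore delay of a distributed RC wire driving a lumped load, r·L·(c·L/2 + C_load). The `WireType` class, the function names, and all numeric values are hypothetical; this is not the authors' replacement procedure, which the slides only outline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WireType:
    name: str
    r: float  # resistance per unit length
    c: float  # capacitance per unit length


def segment_delay(wire, length, load_cap):
    """Elmore delay of one distributed RC segment driving `load_cap`."""
    return wire.r * length * (wire.c * length / 2.0 + load_cap)


def cheapest_feasible(wires, length, load_cap, delay_limit):
    """Lowest-capacitance wire type that meets the delay limit, else None."""
    feasible = [
        w for w in wires if segment_delay(w, length, load_cap) <= delay_limit
    ]
    return min(feasible, key=lambda w: w.c, default=None)


# Hypothetical wire types: the wide wire trades capacitance for resistance.
narrow = WireType("narrow", r=2.0, c=1.0)
wide = WireType("wide", r=1.0, c=2.0)

print(segment_delay(narrow, 10.0, 5.0))  # 200.0
print(segment_delay(wide, 10.0, 5.0))    # 150.0
print(cheapest_feasible([narrow, wide], 10.0, 5.0, delay_limit=180.0).name)  # wide
```

With a loose delay limit the narrow wire is chosen because it costs less capacitance; the wide wire is needed only where the delay limit would otherwise be violated, as on the path to the port with the largest Elmore delay.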
Post-processing Techniques
Topology refinement
Procedures
Disconnect a port P
[Figure: topology refinement example with sources S1, S2, ports P1 to P7, and nodes C1 to C5]
Elmore delay: P5>P4>P6>P2>P1>P3>P7
Non-tree Extensions
A small number of ports have
exceptionally large capacitances
The delay of such a port's shortest path
exceeds the delay limit D
Procedures
Establish a shortest path for p
Non-tree extensions
Connect p to S1
Find a second source S2
Add crosslinks
Find a third source S3
Add crosslinks
Experiment Setup
Environment
Implemented in C++
Run on Linux server
Intel Pentium 4 3.2GHz
2GB RAM
Benchmarks
3 test cases are provided by industry
11 test cases are from the ISPD 2010 Clock Network Synthesis Contest
Comparisons
Compared with TG, which was proposed by Shelar in ISPD09 and TCAD10
[Figure: example network with sources S1, S2, nodes C1, C2, and ports P1 to P3]
Comparisons: capacitance
Without post-processing: 18.3% improvement
[Figure: capacitance (fF) per test case (1 to 14), TG vs. Ours, without post-processing techniques]
[Figure: second bar chart, TG vs. Ours, test cases 1 to 14]
Without post-processing: 16.8% improvement
[Figure: two bar charts, TG vs. Ours, test cases 1 to 14]
[Figure: runtime in seconds per test case (1 to 14), TG vs. Ours, two charts]
Non-tree extension
We also ran experiments to evaluate our non-tree extension
A 23.4% improvement on the minimum port delay
Minimum port delay is the lower bound one can achieve using tree structures
[Figure: delay (fs) per test case (1 to 14), non-tree delay vs. lower bound delay]
Simulation Results
Simulation tool: Hspice
Delay
Correlation coefficient
Tree: 99%
Non-tree: 99%
[Figure: tree delay and non-tree delay (fs) per test case (1 to 14), calculated vs. simulated]
Simulation Results
Slew
Correlation coefficient
Tree: 96%
Non-tree: 94%
[Figure: tree slew and non-tree slew (ps) per test case, calculated vs. simulated]
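The correlation coefficients reported for calculated versus simulated values can be computed as sketched below. This assumes the Pearson coefficient is the one meant, which the slides do not state; the function name and the sample numbers are hypothetical and are not the measured data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-test-case delays (fs): model estimate vs. simulator value.
calculated = [1200.0, 1850.0, 900.0, 2400.0, 1500.0]
simulated = [1180.0, 1900.0, 950.0, 2350.0, 1520.0]
print(f"correlation: {pearson(calculated, simulated):.3f}")
```

A coefficient near 1 indicates that the calculated values track the simulated ones closely across test cases, even if their absolute values differ.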
Conclusion
Proposed an efficient algorithm to construct a post-grid clock
network on reserved multi-layer metal tracks
Extended the algorithm to allow non-tree structures to further
bring down the delay
Verified our results using Hspice simulation
Expected to reduce energy consumption and improve grid-to-port
delay in real post-grid clock networks