Download as pdf or txt
Download as pdf or txt
You are on page 1of 29

Grid--to

Grid
to--Ports Clock Routing for
High Performance
Microprocessor Designs
Haitong Tian#, Wai-Chung Tang#, Evangeline F.Y. Young# and C.N. Sze*
# Department

of Computer Science and Engineering


The Chinese University of Hong Kong
* IBM

Austin Research Laboratory

ISPD 11, Santa Barbara , USA


March 28, 2011

Outline
Introduction
Problem Formulation
Routing Algorithm
Experimental
E
i
l Results
R l
Conclusion

ISPD 2011

Clock Distribution Categories


Clock distribution is an very important issue
Buffered and unbuffered trees
Used in various ASICs
Supported by many physical design tools
See Tsay TCAD93, Xi DAC95

Non-tree structure with crosslinks


Intended for reducing clock skews
aja a DAC04,
C 0 , TCAD06
C
06
See Rajaram

Grid and buffered trees


High performance processors
Sometimes manually design the clock structures
See Shelar ISPD09, TCAD10, Guru VLSI Circuits10

ISPD 2011

High Performance Clock Distribution


Clock network in high
performance
microprocessors
Distributed as global grid
followed by buffered trees
See Shelar ISPD
ISPD09
09,
TCAD10, Guru VLSI
Circuits10
This paper focuses on the
post-grid clock distribution
area

Grid buffers

External
Clock

PLL

...

Local
Clock
buffers

...

...

Grid Bufer

Regional
Clock
buffers

Post-grid Clock
Distribution

Regional
Clock
buffers

Clock Grid

Local
Clock
buffers

Local Clock
Network

Post grid clock distribution

ISPD 2011

Post-grid Clock Distribution


In our modeling
Entire chip divided into
several layout areas

Global grid
Grid Buffer

Each layout area contains


many blocks

Blocks
Reserved
R
d
Tracks
Port

Each block contains


standard cells and/or macros

Sequential
Global Grid
Local Clock
Buffer

Each layout area contains


100s-1000s clock pports
Grid wires reserved for
clock routing
Typically upper mental
layers

Layout
Region

Reserved
multilayer tracks

ISPD 2011

Motivations
Clock distribution of microprocessor:
Crucial importance
Major source of power dissipation

High capacitance usage


[1]
18.1%
18 1% off total
t t l clock
l k capacitance
it
See Pham Solid State Circuits06

Manuallyy design
g in ppractice
Hard to satisfy delay/slew constraints
Time to market
See
S Sh
Shelar
l ISPD09
ISPD09, TCAD10

[1]: D. Pham, T. Aipperspach, D. Boerstler, M. Bolliger, R. Chaudhry, D. Cox, P. Harvey, P. Harvey, H. Hofstee, C. Johns,
et al. Overview of the architecture, circuit design, and physical implementation of a first-generation cell processor. IEEE
Journal of Solid-State Circuits, 41(1):179196, Jan. 2006.
ISPD 2011
6

Outline

Introduction
Problem Formulation
Routing Algorithm
Experimental Results
Conclusion

ISPD 2011

Problem Formulation
Input
A set of reserved tracks
Locations and capacitances of ports P
Different types of wires on each metal layer
Delay limit D. Slew limit S

Output
A clock network (may be non-tree
non tree structures)

Objective
Connecting every port to the source
Satisfying delay and slew constraints
Minimizing capacitance usage
ISPD 2011

Post-grid Clock Routing


0
7

Layer

3
0

500

1000

1500

200

400

600

800

1000

1200

1400

1600

1800

ISPD 2011

Outline

Introduction
Problem Formulation
Routing Algorithm
Experimental Results
Conclusion

ISPD 2011

10

Overall Algorithm
Critical ports
Ports with large capacitance or
f away from
far
f
the
th source
Path expansion algorithm
Elmore-delay driven
Expanding in some selected
directions
Post-processing
Wire replacement
Topology refinement
Iterations
The overall algorithm is
repeatedly invoked
Mayy fail when number of
iterations > K (user specified)
ISPD 2011

11

Delay-driven Path Expansion Algorithm


Basic steps
Simultaneously expand from all ports
Select the path with the minimum Elmore delay to further expand
Connect the ports to the source once the path reaches the source
grid
Check delay/slew constraints

ISPD 2011

12

A Routing Example
Initially, the heap is empty
First iteration (simultaneously expand from all ports)
Heap={(P1,P2);(P1,C1);(P2,P3);(P2,P1);(P3,P2);(P3,C2)}

Second iteration (P1,P2)


Heap={(P
p {( 3,,C2);(
);(P1,,P2,,P3);(
);(P1,,C1);(
);(P2,,P3);(
);(P2,,P1);(
);(P3,,P2)}

Third iteration (P3,C


C2)
Heap ={(P3,C2,S2);(P1,P2,P3);(P1,C1);(P2,P3);(P2,P1);(P3,P2)}

ISPD 2011

13

A Routing Example
Fourth iteration (identify chain paths)
Heap ={(P1,P2,P3);(P1,C1);(P2,P3);(P2,P1)}
Chain path={(P1,P2,P3);(P2,P3)}

Fifth iteration (P2,P3)


Heap={(P1,P2,P3);(P1,C1)}
{( 1,,P2,,P3);(
);(P1,,P2)}
Chain ppath={(P

Sixth iteration (P1,P


P2)
Heap={}, chain path={}
Final result

ISPD 2011

14

Post-processing Techniques
Wire replacement
Two types of wires
capacitance/resistance
it
/ it
tradeoff
t d ff

Procedures
Identify port Pl with the largest
Elmore delay

Wire replacement

Port with largest delay: P5


Replace edge P1C1
Replace edge P4C2
Replace edge P2P3, P3C1
Replace
p
P5C3, C3C2, C2C1, C1S1

Replace wires in a bottom-up style


S1

Check delay/slew constrains


P2
C3

P3

C1
C2

P1
P4

P5

ISPD 2011

15

Post-processing Techniques
Topology refinement
Procedures
Disconnect a port P

Topology refinement
Elmore delay:
P5>P4>P6>P2>P1>P3>P7

Sequentially process all the ports


Expand P towards all
directions
Select paths with smaller
capacitance
i
Check delay/slew constraints

S1
P2

P3

C1

P1

C2

C3

P4

P5
P6

C4
C5

P7

S2

ISPD 2011

16

Non-tree Extensions
A small number of ports have
exceptionally large capacitances
The delay of its shortest path
exceeds the delay limit D
Procedures
Establish a shortest ppath for p

Non-tree extensions
Connect p to S1
Find a second source S2
Add crosslinks
Find a third source S3
Add crosslinks

Find a second shortest path


Target delay not met? Add all
useful corsslinks
Target delay not met? Do the
same thing for parent node of p

ISPD 2011

17

Outline

Introduction
Problem Formulation
Routing Algorithm
Experimental Results
Conclusion

ISPD 2011

18

Experiment Setup
Environment
Implemented in C++
Run on Linux server
Intel Pentium 4 3.2GHz
2GB RAM

Delay setup: 5ps


Slew setup: input: 10ps; output: 15 ps

Benchmarks
B h
k
3 test cases are provided by industry
11 test
es cases aaree from
o ISPD
S
2010
0 0C
Clock
oc Network
Ne wo Synthesis
Sy es s Co
Contest
es

Comparisons
Compared with TG, which was proposed by Shelar in ISPD09,
TCAD10
ISPD 2011

19

Tree Growing Algorithm


Proposed in R. Shelar ISPD09,
TCAD10
Delay/Slew
D l /Sl constraints
t i t
Greedy expansion from the
source
Edges
Ed
with
ith the
th smallest
ll t
capacitance will be added into
the network

Tree Growing Algorithm


Expand from the source
Add S1C1, S2C2
Add C2P3
Add C1P1
Add P3P2

S1
C1

S2
P1

P2

P3

ISPD 2011

C2

20

Comparisons: capacitance
Capacitance (without post-processing techniques)
25000
ccapacitance (fF)

Without post-processing:
18 3% improvement
18.3%
i
t

20000
15000

TG
Ours

10000
5000
0
1

10

11

12

13

14

Test cases

Capacitance (with topology refinement)


25000
capacitancce(fF)

With topology refinement:


24.6% improvement

20000
15000

TG
Ours

10000
5000
0
1

10

11

12

13

14

Test cases

ISPD 2011

21

Comparisons: wire length


Wire Length (without post-processing techniques)
50000
Wire length (um)
W

Without post-processing:
16 8% improvement
16.8%
i
t

40000
30000

TG
Ours

20000
10000
0
1

10

11

12

13

14

Test cases

With topology refinement:


23.6% improvement

Wire Length (with topology refinement)

Wire length ((um)

50000
40000
30000

TG
Ours

20000
10000
0
1

10

11

12

13

14

Test cases

ISPD 2011

22

Comparisons: run time


TG
Average
A
run ti
time 0.14s
0 14
Time (s)

Ours

Run time (without any post-processing techniques)

Without any post-processing


techniques: 1.49s on average
With all techniques: 1.78s on
average
Practical runtime

7
6
5
4
3
2
1
0

TG
Ours

10

11

12

13

14

Test cases

Run time (with all post-processing techniques)

Time ((s)

8
6
TG
Ours

4
2
0
1

10

11

12

13

14

T t cases
Test

ISPD 2011

23

Non-tree extension
We also did some experiments to see the results of our non-tree
extension
A 23.4%
23 4% improvement
i
t on the
th minimum
i i
portt delay
d l
Minimum port delay is the lower bound one can achieve using tree
structures
Non-tree delay
3000

Delay (fs)

2500
2000
Non-tree delay
Lower bound delay

1500
1000
500
0
1

10

11

12

13

14

Test cases

ISPD 2011

24

Simulation Results
Simulation tool: Hspice
Delay
Correlation coefficient

3000

Delay(fs)

2500
2000
Calculated delay
Simulated delay

1500
1000
500
0
1

10

11

12

13

14

Test case

Non-tree delay
2500
2000
Delay (fs)

Tree: 99%
Non-tree: 99%

Tree delay

1500

Calculated delay
Simulated delay

1000
500
0
1

10

11

12

13

14

Test cases

ISPD 2011

25

Simulation Results
Slew
Correlation coefficient

12

Slew (ps)

11.5
11
Simulated slew
Calculated slew

10.5
10
9.5
9
1

10

11

12

13

14

Test cases

Non-tree slew

Slew (ps)

Tree:
T
96%
Non-tree: 94%

Tree slew

11.2
11
10.8
10.6
10.4
10.2
10
9.8
9.6
9.4

Simulated slew
Calculated slew

10

11

12

13

14

Test cases

ISPD 2011

26

Outline

Introduction
Problem Formulation
Routing Algorithm
Experimental Results
Conclusion

ISPD 2011

27

Conclusion
Proposed an efficient algorithm to construct a post-grid clock
network on reserved multi-layer metal tracks
Extended the algorithm to allow non-tree structures to further
bi
brings
down
d
the
th delay
d l
Verified our results using Hspice simulation
Expected
p
to reduce energy
gy consumption,
p
, improve
p
ggrid to pport
delay in real post-grid clock networks

ISPD 2011

28

ISPD 2011

29

You might also like