Adaptive Clock Distribution for 3D Integrated Circuits
Xi Chen, W Rhett Davis, Paul D Franzon
Abstract—Clock distribution in three-dimensional integrated circuits (3D ICs) faces many challenges. In this work, we present new techniques for realizing highly adaptive and reliable clock distribution for 3D ICs. First, an efficient clock distribution topology that does not require a balanced H-tree is proposed. Second, a robust tunable-delay-buffer (TDB) circuit and a novel active de-skew method are developed to handle cross-die variations, thermal gradients, and wiring asymmetry. Moreover, a design optimization flow based on thermal profiles is constructed to improve the adaptive clock design. Experimental results show that the proposed techniques significantly reduce clock skew.

Keywords-3D IC, clock distribution, adaptive, de-skew

I. INTRODUCTION

Three-dimensional integrated circuit (3D IC) technology offers promising benefits for advanced digital system design. It can help overcome the interconnect wire-delay barrier by greatly shortening wire lengths compared with a 2D system [1]. It also provides a solution to the well-known memory wall problem [2] by stacking multiple logic and memory dies and connecting them with Through-Silicon-Vias (TSVs) [3]. In addition, the technology can significantly reduce memory access latency and input/output driver power consumption compared with a conventional multi-chip system design. All these features make 3D IC technology attractive.

Clock distribution is critical to digital system design. When a system is implemented in 3D technology, controlling clock skew becomes more challenging for the following reasons:

a) Cross-die process variations. In 3D integration, especially heterogeneous integration, cross-die process variations increase clock skew if the sequential elements in the same clock domain are located on different tiers.

b) High thermal gradients. 3D integration leads to higher heat density and moves some active devices farther away from the heat sink. The increased thermal gradients result in significant clock skew.

c) Non-idealities of Through-Silicon-Vias (TSVs). Because of their parasitics, TSVs can degrade clock signal quality and increase skew. TSVs can also pick up noise from the substrate [4]. In addition, TSVs make it difficult to design a highly symmetric clock distribution.

In previous 2D ICs, several techniques have been proposed to reduce clock skew. In the designs of [5][6], delay buffers are inserted at selected positions in the clock tree to match the delays of all clock paths to the longest routing path. This method introduces a large overhead if buffers must be inserted in multiple clock stages. In addition, the traditional delay buffer design is sensitive to PVT variations, and the phase error caused by these variations accumulates along the whole clock path. In high-performance systems such as microprocessors, active de-skew is used [7]. This method compares the clock phases at the loading points with the phase of a reference signal and adjusts the Tunable-Delay-Buffers (TDBs) in the clock path according to the phase errors measured by the phase comparators [8][9]. However, distributing an accurate reference signal is itself very challenging, and it is also difficult for this method to compensate for the asymmetry caused by TSVs in 3D ICs.

In recent years, some efforts have been made to design clock networks for 3D ICs. In [10], the clock tree of each tier is designed in the same way as in a 2D design. This method cannot handle cross-tier variations and results in a clock skew of up to 250ps in simulation. In [11], the clock network is routed freely in three-dimensional space using updated algorithms. However, because these routing algorithms oversimplify the effects of TSVs, they are too optimistic and of limited use [12]. Some researchers have extended TDB-based designs to 3D ICs [13], but they have not provided solutions for the non-idealities caused by 3D integration.

In this work, we propose new techniques to handle the challenges of 3D clock distribution. First, we propose a new 3D clock distribution topology that achieves high quality and good cost efficiency. Second, we design a phase-mixer-based TDB circuit that is tunable over 360 degrees and has good tolerance to PVT variations. Third, a novel de-skew method is developed to handle cross-tier variations and 3D wiring asymmetry. Moreover, a design optimization flow based on the thermal profile is developed to minimize the power and area overheads of TDB insertion and to further improve the adaptive clock network.

The paper is organized as follows. Section II discusses the details of the proposed techniques, including the clock distribution topology, the new active de-skew method, and the TDB circuit design. Section III presents the design optimization flow. The experimental results are presented in Section IV.
978-1-4244-9399-9/11/$26.00©2011 IEEE
II. ADAPTIVE 3D CLOCK TECHNOLOGIES
stage of the global distribution and therefore saves considerable design effort. Without the need for H-trees, the proposed clock distribution topology can greatly reduce power and routing complexity. The TDB design is discussed in detail in the following subsection.

B. Phase Mixer Based Tunable-Delay-Buffer (TDB)

In this work, we use multi-phase clocking to enhance the capability of locking the phases of the TDBs to the clock generator. A Phase-Mixer-based TDB (PM-TDB) circuit is designed. As shown in Figure 2(a), the PM-TDB consists of a phase multiplexer and a phase interpolator. By interpolating between the multi-phase clocks, the delay of the PM-TDB can be tuned precisely. This PM-TDB circuit provides multiple advantages. First, it can tune over 360 degrees and generate an arbitrary delay within a single stage, so the clock distribution needs neither a highly balanced H-tree nor a large number of TDB insertions. Second, the circuit has good tolerance to PVT variations. Moreover, the PM-TDB is convenient for regional clock gating and intentional skew editing, since each PM-TDB can be tuned individually without complicated delay analysis. A PM-TDB controlled by a 5-bit digital tuning code is designed in a 45nm CMOS process. Its loading structure is optimized for better slew rate and linearity. In simulation, the total power consumption of one mixer is 125μW at 1GHz, and its silicon area is 10μm². The nominal and worst-case delay values under all code settings are shown in Figure 2(b).

Figure 2. New tunable delay buffer (TDB) circuit design. (a) Simplified circuit schematic. (b) Simulated tuning delays at different process corners (axes: Delay (ps) versus Tuning Code D[4-0]).
Figure 4. Proposed clock tree topology. [Recoverable figure labels: clock source, global distribution, regional distribution, ΦSource, TDB_L, ΦLoading, P_ref, P_L, loading point, return signal path, TDB_ref, phase comparator, D.]
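Section I describes active de-skew as comparing the clock phase at a loading point with a reference phase and adjusting a TDB according to the phase error, and Figure 4 labels a phase comparator, a loading-point TDB (TDB_L), and a code D. The sketch below is a generic bang-bang de-skew loop of that kind, not the proposed de-skew method. Its assumptions are hypothetical: an ideal linear TDB with the 7.8ps step quoted in Section IV and a 5-bit code (0 to 31), a comparator that reports only "early" or "late", a dead zone of half a step, and the function name `deskew`.

```python
# Generic bang-bang active de-skew loop (illustrative; not the paper's exact method).
# Hypothetical assumptions: an ideal, linear TDB with a 7.8 ps step and a
# 5-bit code (0..31); a phase comparator that only reports "early" or "late";
# and a dead zone of half a step so the loop settles instead of dithering.

def deskew(base_delay_ps: float, target_ps: float, step_ps: float = 7.8,
           code: int = 0, max_code: int = 31, cycles: int = 40):
    """Step the TDB code toward the reference phase; return (final code, code history)."""
    history = []
    for _ in range(cycles):
        loading_phase = base_delay_ps + code * step_ps   # delay at the loading point
        error = target_ps - loading_phase                # > 0: loading clock is early
        if error > step_ps / 2 and code < max_code:
            code += 1        # early: add one TDB delay step
        elif error < -step_ps / 2 and code > 0:
            code -= 1        # late: remove one TDB delay step
        history.append(code)  # code after each comparison (a "transient" trace)
    return code, history


if __name__ == "__main__":
    final_code, trace = deskew(base_delay_ps=100.0, target_ps=150.0)
    residual = 150.0 - (100.0 + final_code * 7.8)
    print(final_code, round(residual, 2), trace[:8])
```

In this example the loop settles at code 6 after six comparisons, leaving a residual error of about 3.2ps, which is within half of the 7.8ps step. The dead zone is a design choice of the sketch: without it, a sign-only comparator would keep toggling the code by one step around the lock point.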
Figure 6. De-skew transient simulation results.
arrays (2KB/bank × 16 banks) on the top tier. Each tier is 1mm × 1mm in area. HSPICE simulations are used to extract the temperature coefficients of the clock buffers and metal wires in the clock tree. The accumulated delays of all clock routes on both tiers are also calculated.

Figure 7(a) shows the thermal profiles of both tiers. The background temperature is 25°C, and the highest temperature is 90°C. Figure 7(b) shows the delay performance of a traditional H-tree clock distribution that has neither TDB insertion nor de-skew. As the results show, the maximum in-tier skew is 85.2ps for the logic tier and 75ps for the memory tier. The cross-tier skew, 214.3ps, is even worse because of the thermal gradients and the TSVs between tiers. Figure 7(c) shows the results for the proposed topology with a 250μm × 250μm minimum clock region and a TDB with a 7.8ps tuning step in each region. The maximum in-tier skew is 17.8ps for the logic tier and 21ps for the memory tier. Because the adaptive de-skew compensates for the effects of both TSVs and thermal gradients across tiers, the worst-case clock skew in this case equals the memory-tier value. The results show that the clock region partitioning and the de-skew technique reduce the worst-case clock skew by more than 90% (from 214.3ps to 21ps).

Figure 7. Thermal profile and simulated clock skew distribution for two tiers (memory and logic; x and y axes in μm). (a) Thermal profiles (90°C hot spot). (b) Delays of H-tree clock distribution (214.3ps max skew). (c) Skews of new topology with de-skew (21ps max).

V. CONCLUSIONS

In this paper, we present novel techniques to realize high-performance clock distribution in 3D ICs. An efficient clock distribution topology, a reliable tunable-delay-buffer, and a highly adaptive de-skew technique are proposed to overcome the impacts of cross-tier process variations, large thermal gradients, and routing asymmetries in 3D ICs. In addition, an optimization flow is developed to improve the clock region design and reduce the overhead.

REFERENCES
[1] S. J. Souri, K. Banerjee, A. Mehrotra, and K. C. Saraswat, "Multiple Si layer ICs: motivation, performance analysis, and design implications," in Proc. Design Automation Conf., 2000, pp. 213-220.
[2] K. Banerjee, S. J. Souri, P. Kapur, and K. C. Saraswat, "3-D ICs: a novel chip design for improving deep-submicrometer interconnect performance and systems-on-chip integration," Proc. IEEE, vol. 89, pp. 602-633, May 2001.
[3] Wm. A. Wulf and S. A. McKee, "Hitting the memory wall: Implications of the obvious," ACM SIGARCH Computer Architecture News, vol. 23, pp. 20-24, March 1995.
[4] Jonghyun Cho et al., "Active Circuit to Through Silicon Via (TSV) Noise Coupling," in IEEE 18th Conf. Electrical Performance of Electronic Packaging and Systems, 2009, pp. 97-100.
[5] Mosin Mondal et al., "Mitigating Thermal Effects on Clock Skew with Dynamically Adaptive Drivers," in Int. Symp. Quality Electronic Design, 2007, pp. 67-72.
[6] Ashutosh Chakraborty et al., "Dynamic Thermal Clock Skew Compensation Using Tunable Delay Buffers," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 16, pp. 639-649, June 2008.
[7] Simon Tam et al., "Clock Generation and Distribution for the 130-nm Itanium® 2 Processor with 6-MB On-Die L3 Cache," IEEE J. Solid-State Circuits, vol. 39, pp. 636-642, April 2004.
[8] Simon Tam et al., "Clock generation and distribution for the first IA-64 microprocessor," IEEE J. Solid-State Circuits, vol. 35, pp. 1545-1552, 2000.
[9] Patrick Mahoney, Eric Fetzer, Bruce Doyle, and Sam Naffziger, "Clock Distribution on a Dual-Core, Multi-Threaded Itanium®-Family Processor," in IEEE Int. Solid-State Circuits Conf., 2005, pp. 292-599.
[10] Hao Hua, "Design and Verification Methodology for Complex Three-Dimensional Digital Integrated Circuit," Ph.D. dissertation, Dept. Elect. Comp. Eng., North Carolina State Univ., Raleigh, NC, 2006.
[11] Jacob Minz, Xin Zhao, and Sung Kyu Lim, "Buffered Clock Tree Synthesis for 3D ICs under Thermal Variations," in Asia and South Pacific Design Automation Conf., 2008, pp. 504-509.
[12] David Kung and Ruchir Puri, "CAD challenges for 3D ICs," in Asia and South Pacific Design Automation Conf., 2009, pp. 421-422.
[13] Mosin Mondal et al., "Thermally Robust Clocking Schemes for 3D Integrated Circuits," in Design, Automation & Test in Europe, 2007, pp. 1-6.