
Ultra Deep Submicron Design Challenges: An Overview

Jan M. Rabaey
BWRC
University of California @ Berkeley
http://bwrc.eecs.berkeley.edu

With contributions from D. Sylvester, K. Keutzer, and many others
The Deep Sub-Micron Challenge

“Microscopic Problems” (∝ DSM): Everything Looks a Little Different
• Wiring Load Management
• Noise, Crosstalk
• Reliability, Manufacturability
• Complexity: LRC, ERC
• Accurate Power Prediction
• Accurate Delay Prediction
• etc.

“Macroscopic Issues” (∝ 1/DSM): …and There’s a Lot of Them!
• Time-to-Market
• Millions of Gates
• High-Level Abstractions
• Reuse & IP: Portability
• Predictability
• etc.
Design at a Crossroad
Silicon technology tracking Moore’s Law
[Figure, left: number of bits per chip (log scale) vs. year, 1970 to 2010, from 64 Kbits (1.6-2.4 µm) to 64 Gbits (0.08 µm), annotated with reference capacities: page, book, encyclopedia, 2 hrs CD audio, 30 sec HDTV, human DNA, human memory.]
[Figure, right: microprocessor power (MIPS, log scale) vs. year, 1980 to 2000, for the 286, 386, 486, Pentium, P6, and P7.]

Courtesy of David Eaglesham, Lucent


Power Dissipation

[Figure: power density (Watts/cm²) and supply current Icc (amps) vs. year, 1985 to 2010, for the 386, 486, Pentium (R), and Pentium Pro (R). Annotations: "Due to 30% Vdd scaling"; Icc projected at 100-2,000 amps.]
Surpassed Hot-Plate Power Density in 0.6 µm CMOS

Courtesy Intel
Challenges in Deep-Submicron Design

● Device scaling
– Scaling of the voltage
– The leaky transistor
– Short- and long-term reliability
● Interconnect scaling
– Capacitance
– Resistance
– Inductance
DSM devices: Evolution of Idsat
[Figure: Idsat (µA/µm), axis 200 to 800, vs. drawn channel length (µm), 0.08 to 0.24, for NMOS and PMOS.]
Data taken from 16 papers (IBM,TI, Bell Labs, Motorola, Intel, AMD)
Demonstrates a relatively constant Idsat from Ldrawn of 0.25 to 0.09 µm
[Sylvester, Keutzer, 98]
Scaling the Supply Voltage
[Figure, left: supply voltage (V) vs. minimum feature size (micron, log scale). Scaling forced by reliability and power considerations.]
[Figure, right: Vout (V) vs. Vin (V), both from 0 to 0.2. Scaling limited by gain and thermal noise.]
Projected Evolution in Ioff
[Figure: off-current at 25 °C, axis 0 to 12, vs. technology node (250, 180, 130, 100, 70, 50).]
Silicon-on-Insulator

[Figure: SOI device cross-section with labels Gate, tOX, n+ source and drain in a silicon film of thickness tSi (tSi < 50 nm), Oxide on both sides, Buried Oxide (BOX) of thickness tBOX, and P Substrate.]
● Extension beyond Bulk CMOS Scaling Limit
● Performance Improvement
– Reduced Junction Capacitance
– Absence of Reverse-Body Effect
● Soft Error Rate (SER) Improvement
● Reduced leakage (low-power)
● Latch-Up Elimination
● Ease of Device Isolation
● Potentially Reduced Wafer Fabrication Cost
IBM 64b PowerPC
● Bulk CMOS Base Design
➔ 0.12 µm Leff, 6LM (Cu)
➔ 450 MHz, 22 W @ 1.8 V
● PD/SOI Technology
➔ 0.12 µm Leff, 6LM (Cu)
➔ 550 MHz, 24 W @ 1.8 V

                        CMOS6S2            CMOS7S             CMOS7S SOI
Core Clock Frequency    350 MHz            450 MHz            550 MHz
L1 Cache                64KB-I + 64KB-D    128KB-I + 128KB-D  128KB-I + 128KB-D
L2 Directory            N/A                104 x 16 K         104 x 16 K
Supply Voltage          2.5 V              1.8 V              1.8 V
Transistors             12 M               34 M               34 M
Die Size                162 mm²            139 mm²            139 mm²
Power                   34 W               22 W               24 W
Leff (nFET)             0.18 µm            0.12 µm            0.12 µm
TOX                     5.0 nm             3.5 nm             3.5 nm
Metalization            5 Layers Al        6 Layers Cu        6 Layers Cu
Contacted M2-M4 Pitch   1.26 µm            0.81 µm            0.81 µm

(D. H. Allen et al., ISSCC, 1999)


Alternatives to planar CMOS:
Vertical transistors

[Figure: vertical transistor, labeled from top to bottom: Drain, Gate (on both sides), Source.]
● Some early types of vertical structures have been discussed since the 1980s
● Separates performance scaling from packing scaling
● Requires a manufacturable process with low parasitics
Dual-Gate Berkeley FinFET (1999)

SiO2

• Closer to planar technology


• Proven operation for NMOS and PMOS down to 18 nm
• Can be scaled down to 10 nm
• Suppresses short-channel effects! [Huang, IEDM99]
Berkeley PMOS FINFET (Lg = 45 nm)

S = 69 mV/decade Highest reported PMOS Drive Current

[Huang, IEDM99]
Device Challenges — Summary
● Conventional planar CMOS continues as long as
possible
● Transistor gets (slightly) faster and (plenty) leakier
– Off-current and gate-current will both increase to meet
design limit
– Circuit design techniques needed to address standby power
dissipation
● Deep sub-micron effects (VT-variation, drain-induced
effects, hot-carrier) impact predictability
● Non-planar transistors separate shrinks from
performance improvements
– Dual-gate devices help to suppress DSM effects
The Interconnect Challenge
● With increases in performance and integration
density, wire parasitics gain dominance
● The wire combines capacitance, resistance, and
inductance
● Wire parasitics impact performance, energy
dissipation and reliability

transmitters receivers
Interconnect Distribution

[Figure: number of nets (log scale) vs. wire length (µm), 10 to 100,000, for the Pentium (R), Pentium (MMX), Pentium Pro (R), and Pentium (R) II. Source: Intel]
The Ideal Wire Scaling Model
“The RC Dilemma”

While transistor delay scales as 1/S!
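A first-order sketch of the dilemma, assuming a parallel-plate capacitance model in which width, height, and dielectric thickness all shrink by the same factor S. The function name and the model are illustrative assumptions, not taken from the slides; fringing and interwire capacitance are ignored.

```python
def ideal_wire_scaling(s, local=True):
    """Relative R, C, and RC of a wire after ideal scaling by factor s (> 1).

    Assumptions (illustrative, first order): parallel-plate capacitance only;
    width, height, and dielectric thickness shrink by 1/s; wire length shrinks
    by 1/s for a local wire and stays constant otherwise.
    """
    length = 1.0 / s if local else 1.0
    width = height = t_diel = 1.0 / s
    r = length / (width * height)      # R = rho * L / (W * H)
    c = width * length / t_diel        # C = eps * W * L / t
    return r, c, r * c


for s in (1.0, 2.0, 4.0):
    gate_delay = 1.0 / s               # transistor delay scales as 1/S
    _, _, rc_local = ideal_wire_scaling(s, local=True)
    _, _, rc_fixed = ideal_wire_scaling(s, local=False)
    print(f"S={s}: gate {gate_delay:.2f}, local-wire RC {rc_local:.2f}, "
          f"fixed-length RC {rc_fixed:.2f}")
```

Under these assumptions a local wire's RC stays flat while the gate gets faster, and a wire whose length does not shrink gets slower in absolute terms.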
“Constant Resistance” Scaling

– Scaling would increase R (∝ S³)
• Historically, aspect ratio has increased to compensate
“Constant” Resistance Scaling
Differential scaling of horizontal and vertical dimensions keeps
resistance in check

εc: horizontal/vertical capacitance scaling factor (including fringing)


Will Interconnect Dominate Delay?
● # logic levels decreasing (architecture)
● Min. gate size shrinking
● Parasitics increase due to scaling
● Increasing RC delay with chip size
[Figure: relative delay (axis 1, 0.5, 0.25) vs. year (88, 94, 00), with components labeled gate delay, delay due to sizing and buffering, and interconnect delay. From Aykut Dengi, 1996 ICCAD tutorial.]
Or Will Its Impact Decrease?
2-input NAND, FO = 2, W/L = 16
[Figure: gate delay and stage delay (ps), axis 0 to 120, vs. process generation (µm): 0.1, 0.13, 0.18, 0.25.] [Keutzer98]

Shorter local wire length, transistor sizing, and low-k dielectrics


Interconnect Projections
Low-k dielectrics
● Both delay and power are reduced by dropping
interconnect capacitance
● Types of low-k materials include: inorganic (SiO2),
organic (Polyimides) and aerogels (ultra low-k)
● The numbers below are on the
conservative side of the NTRS roadmap

Generation (µm)          0.25   0.18   0.13   0.1    0.07   0.05
Dielectric constant ε    3.3    2.7    2.3    2.0    1.8    1.5
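To put the table in perspective, here is a small sketch that treats wire capacitance as proportional to the dielectric constant, assuming geometry, supply voltage, and frequency are held fixed (an assumption made only for this illustration). The dictionary values are copied from the table; the function name is hypothetical.

```python
# Dielectric constants per generation (um), copied from the table above.
EPSILON = {0.25: 3.3, 0.18: 2.7, 0.13: 2.3, 0.10: 2.0, 0.07: 1.8, 0.05: 1.5}


def relative_capacitance(generation_um, reference_um=0.25):
    """Wire capacitance relative to the reference generation.

    Assumes C is proportional to epsilon with the wire geometry unchanged.
    """
    return EPSILON[generation_um] / EPSILON[reference_um]


for gen in sorted(EPSILON, reverse=True):
    ratio = relative_capacitance(gen)
    # Both RC delay and C*V^2*f switching power scale with C.
    print(f"{gen:.2f} um: wire C at {ratio:.0%} of the 0.25 um value")
```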
From Capacitance-to-GND to
Interwire Capacitance

fringing parallel
Crosstalk
[Figure: cross-section of neighboring wires above a ground plane, labeled W, S, T, H, Cc, Ca, and Cv.]
• Neighboring wires switch, coupling to a quiet line
• Quiet line sees an undesired voltage spike
• Crosstalk can lead to:
- Logic faults (especially in dynamic circuits)
- Voltage overshoot (stress, forward-bias PN junctions)
• Voltage spike, Vx ∝ Cc / Ctotal
• Vx is a complex function of
- Driver strength
- Fan-out capacitance
- Wiring resistance
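The proportionality Vx ∝ Cc / Ctotal can be made concrete for the simplest case: a floating victim, such as a precharged dynamic node, where the spike follows from a capacitive divider. This is a sketch under that assumption. A driven victim sees a smaller, decaying spike, which is where driver strength and wiring resistance enter. The capacitance values are hypothetical.

```python
def crosstalk_spike(c_coupling, c_ground, v_swing):
    """Voltage step on a floating victim when one aggressor swings by v_swing.

    Capacitive divider: Vx = Cc / (Cc + Cgnd) * v_swing.
    A driven victim is not modeled here.
    """
    return c_coupling / (c_coupling + c_ground) * v_swing


# Hypothetical values: 1.8 V aggressor swing, capacitances in fF.
for cc, cg in [(10.0, 90.0), (30.0, 70.0), (50.0, 50.0)]:
    vx = crosstalk_spike(cc, cg, 1.8)
    print(f"Cc = {cc} fF, Cgnd = {cg} fF -> Vx = {vx:.2f} V")
```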
Delay Degradation

- Impact of neighboring signal activity (coupled through Cc) on switching delay
- When neighboring lines switch in the opposite direction of the victim line, delay increases

Miller Effect
- Both terminals of capacitor are switched in opposite directions
(0 → Vdd, Vdd → 0)
- Effective voltage is doubled and additional charge is needed
(from Q=CV)
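The doubling described above is often folded into an effective load capacitance. The sketch below assumes a single neighbor and ideal, simultaneous transitions; the function and the numeric values are illustrative only.

```python
def effective_load(c_ground, c_coupling, neighbor):
    """Effective capacitance seen by a switching victim driver.

    neighbor: "same" (switches with the victim), "quiet", or "opposite".
    From Q = C * dV, the voltage change across Cc is 0, Vdd, or 2 * Vdd,
    so Cc counts 0x, 1x, or 2x.
    """
    factor = {"same": 0.0, "quiet": 1.0, "opposite": 2.0}[neighbor]
    return c_ground + factor * c_coupling


# Hypothetical values in fF.
for activity in ("same", "quiet", "opposite"):
    print(activity, effective_load(50.0, 25.0, activity), "fF")
```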
Structured and Predictable Interconnect

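[Figure: wiring grid in which every signal wire (S) runs between a Vdd wire (V) and a ground wire (G), in the pattern V S G S V S, shown along both axes of the figure.]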
Example: Dense Wire Fabric (DWF) [Khatri, DAC99]
Trade-off:
• Cross-coupling capacitance 40x lower, 2% delay variation
• Increase in area and overall capacitance
The Impact of Resistivity
The distributed rc-line
[Figure: ladder of N series resistors R1, R2, ..., RN-1, RN and shunt capacitors C1, C2, ..., CN-1, CN, driven by Vin.]
Diffused signal propagation
Delay ~ L²
[Figure: voltage (V), 0 to 2.5, vs. time (nsec), 0 to 5, observed at x = L/10, x = L/4, x = L/2, and x = L along the line.]
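The quadratic dependence on length can be checked with the Elmore delay of the N-segment ladder, used here as a stand-in for the full diffusion response. The per-millimeter values below are hypothetical.

```python
def elmore_delay(r_total, c_total, n_segments):
    """Elmore delay at the far end of an N-segment RC ladder.

    Each segment has r = R/N and c = C/N; capacitor i is charged through
    the i resistors between it and the source.
    """
    r = r_total / n_segments
    c = c_total / n_segments
    return sum((i * r) * c for i in range(1, n_segments + 1))


R_PER_MM, C_PER_MM = 100.0, 0.2e-12    # hypothetical: ohm/mm and F/mm
for length_mm in (1, 2, 4):
    d = elmore_delay(R_PER_MM * length_mm, C_PER_MM * length_mm, 100)
    print(f"L = {length_mm} mm: Elmore delay = {d * 1e12:.1f} ps")
```

Because both R and C grow linearly with L, doubling the length quadruples the result; for large N the sum approaches RC/2.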
Using Copper as Interconnect Material
• With cladding and other effects, Cu ~ 2.2 µΩ-cm vs. 3.5 for Al(Cu)
⇒ 40% reduction in resistance
• Yields 12% performance improvement over an aluminum process in a PowerPC design
• Electromigration improvement; 100X longer lifetime (IBM, IEDM97)
- Electromigration is a limiting factor beyond 0.18 µm if Al is used (HP, IEDM95)
[Figure: Transistor SEM]
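As a quick check of the resistance figure, assuming identical wire geometry so that resistance tracks resistivity, the two effective resistivities quoted above (2.2 vs. 3.5) give a reduction of roughly 37%, which the slide rounds to 40%.

```python
RHO_CU = 2.2       # effective resistivity of clad copper, from the slide
RHO_AL_CU = 3.5    # Al(Cu), from the slide


def resistance_reduction(rho_new, rho_old):
    """Fractional drop in wire resistance for identical geometry (R ~ rho)."""
    return 1.0 - rho_new / rho_old


print(f"Resistance reduction: {resistance_reduction(RHO_CU, RHO_AL_CU):.0%}")
```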
The Global Wire Problem
$T_d = 0.377\,R_w C_w + 0.693\,(R_d C_{out} + R_d C_w + R_w C_{out})$
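A sketch that evaluates this expression as a function of wire length. It reads Rw and Cw as the total wire resistance and capacitance, Rd as the driver's output resistance, and Cout as the load capacitance; this is the usual meaning of the symbols, but the slide does not define them. All numeric values are hypothetical.

```python
def wire_delay(r_driver, r_wire, c_wire, c_out):
    """Td = 0.377*Rw*Cw + 0.693*(Rd*Cout + Rd*Cw + Rw*Cout)."""
    return 0.377 * r_wire * c_wire + 0.693 * (
        r_driver * c_out + r_driver * c_wire + r_wire * c_out
    )


# Hypothetical wire and driver parameters.
R_PER_MM, C_PER_MM = 100.0, 0.2e-12    # ohm/mm, F/mm
R_DRIVER, C_OUT = 500.0, 20e-15        # ohm, F
for length_mm in (1, 5, 10, 20):
    td = wire_delay(R_DRIVER, R_PER_MM * length_mm, C_PER_MM * length_mm, C_OUT)
    print(f"L = {length_mm:2d} mm: Td = {td * 1e12:.0f} ps")
```

The Rw*Cw term grows with the square of the length, so it dominates for long global wires regardless of driver strength.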

Challenges
● No further improvements to be expected after the
introduction of Copper (superconducting, optical?)
● Design solutions
– Use of fat wires
– Insert repeaters — but might become prohibitive (power, area)
– Efficient chip floorplanning
● Towards “communication-based” design
– How to deal with latency?
– Is synchronicity an absolute necessity?
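The repeater option among the design solutions above can be sketched with the same delay expression: cutting the wire into k segments removes the quadratic term at the price of k repeater delays. The repeater resistance and input capacitance are hypothetical, and the model ignores the repeaters' intrinsic delay, power, and area.

```python
def wire_delay(r_driver, r_wire, c_wire, c_out):
    """Td = 0.377*Rw*Cw + 0.693*(Rd*Cout + Rd*Cw + Rw*Cout)."""
    return 0.377 * r_wire * c_wire + 0.693 * (
        r_driver * c_out + r_driver * c_wire + r_wire * c_out
    )


def repeated_wire_delay(r_total, c_total, k, r_rep, c_rep):
    """Delay of a wire cut into k equal segments, each driven by one repeater.

    Every segment is modeled with driver resistance r_rep and load c_rep
    (the input of the next repeater); both are hypothetical values.
    """
    return k * wire_delay(r_rep, r_total / k, c_total / k, c_rep)


R_WIRE, C_WIRE = 2000.0, 4e-12     # hypothetical long global wire
R_REP, C_REP = 200.0, 100e-15      # hypothetical repeater
delays = {k: repeated_wire_delay(R_WIRE, C_WIRE, k, R_REP, C_REP)
          for k in range(1, 41)}
best_k = min(delays, key=delays.get)
print(f"unrepeated: {delays[1] * 1e12:.0f} ps")
print(f"best of 1..40 segments: k = {best_k}, {delays[best_k] * 1e12:.0f} ps")
```

Past the optimum, each extra repeater adds more delay than it removes, on top of its power and area cost.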
Architecture Must Evolve to Fit the Landscape
[Figure: chip floorplan annotated "20 Clocks" and "90,000 tracks".]
Global operations: low bandwidth, high latency & high power
Local, parallel operations: high bandwidth, low latency & low power

Source: Bill Dally, Stanford


Interconnect: # of Wiring Layers
# of metal layers is steadily increasing due to:
• Increasing die size and device count: we need more wires and longer wires to connect everything
• Rising need for a hierarchical wiring network; local wires with high density and global wires with low RC
[Figure: 0.25 µm wiring stack, from substrate and poly up through M1 to M6, labeled with W, S, H, Tins, and ρ = 2.2 µΩ-cm.]
[Figure: Minimum Widths (Relative) and Minimum Spacing (Relative) of Poly and M1 to M5 across the 1.0µ, 0.8µ, 0.6µ, 0.35µ, and 0.25µ generations.]
Resistance and the Power
Distribution Problem
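[Figure: supply current Icc (amps), log scale, vs. year, 1985 to 2010, for the 386, 486, Pentium (R), and Pentium Pro (R); projected at 100-3,000 amps.]
[Figure: RI drop. A current I through a supply-wire resistance R produces a drop ∆V; a precharged (φpre) circuit fed through R' sees VDD - ∆V'.]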
Resistance and the Power
Distribution Problem
[Figure: Before / After]

• Requires fast and accurate peak current prediction


• Heavily influenced by packaging technology
Source: Simplex
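The RI drop translates directly into a resistance budget for the supply network through ∆V = R·I. The sketch below assumes a 1.8 V supply and a 5% drop budget, both hypothetical, and treats the whole distribution network as one lumped resistance.

```python
def max_grid_resistance(vdd, supply_current, drop_fraction):
    """Largest lumped supply resistance that keeps I*R within the budget."""
    return drop_fraction * vdd / supply_current


# Hypothetical budget: 5 % of a 1.8 V supply.
for amps in (100.0, 1000.0, 3000.0):
    r_max = max_grid_resistance(1.8, amps, 0.05)
    print(f"I = {amps:6.0f} A -> R <= {r_max * 1e6:.0f} micro-ohm")
```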
Inductance

● Transmission line effects
cause overshooting and non-monotonic behavior
● Wave propagation puts
minimum bound on delay
and may require termination
● Only to be considered when
the rise and fall times of the
signal are comparable to the
time-of-flight of the line, and
when the resistance of the
wire is small (< 5Z0)
Clock signals in 400 MHz IBM Microprocessor
(measured using e-beam prober) [Restle98]
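The two conditions in the last bullet can be turned into a rough screening check. This sketch uses the lossless-line relations Z0 = sqrt(L/C) and time of flight = length · sqrt(L·C). The threshold `rise_factor`, which decides when a rise time counts as "comparable" to the time of flight, is an assumption and is not given on the slide. The per-meter values in the example are hypothetical.

```python
import math


def needs_transmission_line_model(rise_time, length, r_total,
                                  l_per_m, c_per_m, rise_factor=2.5):
    """Screen a wire against the two conditions in the bullet above.

    rise_factor is an assumed threshold for "comparable to the time of
    flight"; it is not taken from the slide.
    """
    z0 = math.sqrt(l_per_m / c_per_m)              # lossless characteristic impedance
    t_flight = length * math.sqrt(l_per_m * c_per_m)
    fast_edge = rise_time < rise_factor * t_flight
    low_loss = r_total < 5.0 * z0
    return fast_edge and low_loss


# Hypothetical 10 mm wire: 0.4 uH/m and 0.2 nF/m.
print(needs_transmission_line_model(100e-12, 0.01, 20.0, 0.4e-6, 2e-10))
print(needs_transmission_line_model(100e-12, 0.01, 1000.0, 0.4e-6, 2e-10))
```

In the example, the wide low-resistance wire passes both tests, while the resistive one is screened out and can be treated as an RC line.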
Dealing with Inductance
• Inductance hard to analyze accurately
• Structural design approaches might be more appropriate
• DEC approach in Alpha 21264 — use entire planes of metal as
references (Vdd and GND) to reduce inductance by controlling the
return path
- Loss of routing density, added metal layers reduce yield &
raise costs
• Another industry approach uses shield wires every ~ 3 signal lines
in a dense array

Vdd GND
Bus lines
Inductive Noise - Ldi/dt
[Figure: di/dt in AU, log scale from 1.E+00 to 1.E+08, vs. process generation (1.5µ, 0.8µ, 0.35µ, 0.18µ, 0.1µ) for the 386, 486, Pentium, and Pentium Pro. di/dt noise increases.]

Source: Intel
Inductive Noise - Ldi/dt
● Decoupling capacitance problem becoming extreme
– DEC 21164: 128 nF of on-chip decoupling
– DEC 21264: add flip-chip decoupling capacitor chip
● Mostly solvable by advances in packaging technology and novel timing approaches
[Figure: supply (+/-) connected through board wiring and bonding wire to the chip, with decoupling capacitor Cd across the chip.]
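A back-of-the-envelope sketch of the two quantities on this slide: the supply bounce V = L·di/dt, and the decoupling capacitance needed to supply a current step locally, from Q = C·V. It assumes a linear current ramp and a single lumped supply inductance; all numeric values are hypothetical.

```python
def inductive_noise(l_supply, delta_i, delta_t):
    """Supply bounce V = L * di/dt for a linear current ramp."""
    return l_supply * delta_i / delta_t


def decoupling_needed(delta_i, delta_t, max_droop):
    """Capacitance that supplies delta_i for delta_t with at most max_droop.

    From Q = C * V: C = delta_i * delta_t / max_droop.
    """
    return delta_i * delta_t / max_droop


# Hypothetical: 0.1 nH effective supply inductance, 10 A step in 1 ns.
print(f"L di/dt = {inductive_noise(0.1e-9, 10.0, 1e-9):.2f} V")
print(f"Cd for 0.1 V droop = {decoupling_needed(10.0, 1e-9, 0.1) * 1e9:.0f} nF")
```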
Summary
● Deep-submicron effects impact reliability,
performance, and power dissipation
● The major device challenge: low-voltage, non-leaky
design
● Interconnect starts playing a dominant role
– capacitive: the increasing impact of interwire capacitance
– resistive: global wire delay and power distribution
– inductive: mostly supply noise, but transmission line effects
are emerging
● Requires a new generation of fast and accurate
analysis tools
● But most of all … novel design methodologies and
concepts producing predictable or insensitive design
