UCI Colloquium 121031 v7 Distributed
Andrew B. Kahng
UCSD CSE and ECE Departments
abk@ucsd.edu
http://vlsicad.ucsd.edu
[Figure: trends in # of transistors, clock frequency, power, and performance/clock (ILP) [Sutter09]]
ABK UCI ECE Colloquium 121031 4
Dimension and Transistor Density
ITRS = International Technology Roadmap for Semiconductors (http://www.itrs.net/)
Key metric of progress: Metal-1 (M1) half-pitch (F)
M1 half-pitch scales by 0.7x at each technology node (0.7 x 0.7 = 0.49, so density roughly doubles)
Rough equivalences:
Pitch of M1 (PM1) = 2F
Pitch of M2 (PM2) = 1.25 PM1
Pitch of polysilicon (Ppoly) = 1.5 PM1
Model scaling in both X, Y directions
[Figure: layout dimensions labeled 5 PM1 and 8 PM2]
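The scaling rules above reduce to simple arithmetic. The sketch below derives the other pitches from F and tracks density across nodes. The 0.7x shrink and the pitch ratios come from the slide; the starting half-pitch of 32 nm is a hypothetical example value.

```python
# Hypothetical starting half-pitch; the ratios and the 0.7x shrink are from the slide.
SHRINK_PER_NODE = 0.7  # linear scale factor per technology node


def pitches(f_nm: float) -> dict:
    """Modeled pitches (nm) for an M1 half-pitch of f_nm."""
    pm1 = 2.0 * f_nm                  # Pitch of M1 = 2F
    return {
        "F": f_nm,
        "PM1": pm1,
        "PM2": 1.25 * pm1,            # Pitch of M2 = 1.25 PM1
        "Ppoly": 1.5 * pm1,           # Pitch of poly = 1.5 PM1
    }


def scale_nodes(f0_nm: float, n_nodes: int) -> list:
    """(node index, half-pitch in nm, density relative to node 0) for each node."""
    rows = []
    f = f0_nm
    for node in range(n_nodes + 1):
        # X and Y both shrink by 0.7, so area per feature is 0.49x per node
        density = 1.0 / (SHRINK_PER_NODE ** 2) ** node
        rows.append((node, round(f, 2), round(density, 2)))
        f *= SHRINK_PER_NODE
    return rows


if __name__ == "__main__":
    print(pitches(32.0))
    for node, f, dens in scale_nodes(32.0, 3):
        print(f"node {node}: F = {f} nm, relative density = {dens}x")
```

Because 0.7 x 0.7 = 0.49 rather than exactly 0.5, "density doubles" is a rounded statement. The computed factor is about 2.04x per node.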
[Figure: MPU logic transistor density and SRAM transistor density (xtors/cm^2) vs. year, 2005-2025 [Tx/cm2, ITRS2007MPUmodel]; transistor density (xtors/cm^2) vs. year, 1970-2013 [Tx/cm2, StanfordCPUDB]]
Observation: the microarchitecture (pipelining) lever runs out of gas around 2004
[Figure: ITRS frequency (GHz) projections vs. year, 2001-2021, for before 2001, 2001 ITRS, 2007 ITRS, and 2011 ITRS, with labels "Device speed only", "Platform power limit", and "Device scaling limit" [ITRS 2007]; frequency (GHz) vs. year, 2001-2021 [Danowitz et al., Stanford CPU DB]]
TSMC 28nm to 20nm: 30% higher speed, 25% less power
TSMC 40nmLP to 28nmLP: 20% higher speed
UMC 40nmLP to 28nmLP: 20% higher speed
Samsung 45nm to 32nm: 30% higher speed, 30% less power
[Figure: "Guardbands" and "Lost benefits!" annotated across technology nodes]
What Can The Semiconductor Industry Do?
Surrender
Don't turn on the transistors: dark silicon
[Figure: values 0-30 vs. year (1998-2014), with a "Constant area region" marked for 1999-2004]
What Can The Semiconductor Industry Do?
Surrender
Don't turn on the transistors: dark silicon
Don't use the transistors as much: less activity
Fight
Design-based equivalent scaling!
(= the rest of this talk)
[Figure: Geometric Scaling, then Equivalent Scaling [Mistry07], then Design-Based Equivalent Scaling]
[Figure: timing corners. BEST corner: high voltage (e.g. 1.1V), fast/fast process, low temperature (e.g. -40 C), minimum circuit delay (maximum at the opposite corner). Annotation: cells can have smaller W, higher Vth, larger Lgate, more variation]
Analyze outcomes (area, wirelength, runtime, #violations, yield) to gain insight into the costs of guardband and the impact of guardband reduction
Impact on Yield
Guardband reduction in design process
(Actual guardband of fabrication is unchanged)
Parametric yield will decrease
Random defect yield will increase (smaller die area)
[Figure: # of good dice per wafer (138-158) vs. RGB (%) (0-60), for no clustering and clustering parameter alpha = 0.42, 0.43, 0.44, 0.45, 0.5, 1, 10, 1000]
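The yield chart varies a defect clustering parameter alpha. One common model with such a parameter is the negative binomial yield formula Y = (1 + A·D0/alpha)^(-alpha), which approaches the Poisson model exp(-A·D0) ("no clustering") as alpha grows. The sketch below makes several assumptions: that model, a standard gross-dice-per-wafer approximation, and hypothetical die areas and defect densities. The slide does not say how RGB maps to die area, so area is an input here. Parametric yield is a separate multiplier that guardband reduction would lower.

```python
import math
from typing import Optional


def gross_dice_per_wafer(die_area_mm2: float, wafer_diam_mm: float = 300.0) -> float:
    """Common approximation: wafer area / die area, minus an edge-loss term."""
    r = wafer_diam_mm / 2.0
    return (math.pi * r * r / die_area_mm2
            - math.pi * wafer_diam_mm / math.sqrt(2.0 * die_area_mm2))


def defect_yield(die_area_cm2: float, d0_per_cm2: float,
                 alpha: Optional[float] = None) -> float:
    """Negative binomial random-defect yield; alpha=None means Poisson (no clustering)."""
    ad = die_area_cm2 * d0_per_cm2
    if alpha is None:
        return math.exp(-ad)
    return (1.0 + ad / alpha) ** (-alpha)


def good_dice(die_area_mm2: float, d0_per_cm2: float,
              alpha: Optional[float] = None, parametric_yield: float = 1.0,
              wafer_diam_mm: float = 300.0) -> float:
    """Expected good dice per wafer = gross dice x defect yield x parametric yield."""
    gross = gross_dice_per_wafer(die_area_mm2, wafer_diam_mm)
    y_defect = defect_yield(die_area_mm2 / 100.0, d0_per_cm2, alpha)  # mm^2 -> cm^2
    return gross * y_defect * parametric_yield


if __name__ == "__main__":
    # Hypothetical: a 100 mm^2 die shrinks to 90 mm^2 when guardband is reduced
    for alpha in (None, 0.5, 1.0, 10.0):
        before = good_dice(100.0, 0.5, alpha)
        after = good_dice(90.0, 0.5, alpha, parametric_yield=0.98)
        print(alpha, round(before, 1), round(after, 1))
```

A smaller die raises both gross dice and random-defect yield, while a parametric_yield below 1 captures the opposing effect listed on the slide.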
TSMC R&D VP Cliff Hou: "At 20nm the challenge is double patterning," October 24, 2012
[Figure: delay (s) vs. CD mean difference (1-6 nm), best and worst cases for the small CD group, the large CD group, and pooled CD; annotation "Variation is compensated"; delay vs. CD mean difference (0-6 nm) for Case 1 and Case 2]
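The pooled-CD curves spread apart as the CD mean difference between the two double-patterning groups grows. Suppose CD is modeled as an equal-weight mixture of the two mask groups. This is a simplifying assumption, not something stated on the slide. The law of total variance then gives sigma_pooled^2 = (sigma1^2 + sigma2^2)/2 + (delta_mu/2)^2. The sketch below computes this and the k-sigma corners, using hypothetical per-group sigmas of 1 nm.

```python
import math


def pooled_sigma(sigma1: float, sigma2: float, mean_diff: float) -> float:
    """Std. dev. of an equal-weight mixture of two CD groups (law of total variance)."""
    within = (sigma1 ** 2 + sigma2 ** 2) / 2.0   # average within-group variance
    between = (mean_diff / 2.0) ** 2             # variance of the two group means
    return math.sqrt(within + between)


def corners(mu1: float, sigma1: float, mu2: float, sigma2: float, k: float = 3.0) -> dict:
    """k-sigma CD corners (nm) per group and for the pooled distribution."""
    mu_p = (mu1 + mu2) / 2.0
    s_p = pooled_sigma(sigma1, sigma2, mu2 - mu1)
    return {
        "group1": (mu1 - k * sigma1, mu1 + k * sigma1),
        "group2": (mu2 - k * sigma2, mu2 + k * sigma2),
        "pooled": (mu_p - k * s_p, mu_p + k * s_p),
    }


if __name__ == "__main__":
    for d in range(7):  # CD mean difference 0..6 nm, matching the plotted range
        print(d, round(pooled_sigma(1.0, 1.0, float(d)), 3))
```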
RTL to GDS
DPL Mask Coloring
Bimodal-Aware Timing Analysis
Optimization 1: Maximization of Alternate Coloring (Datapaths), using integer linear programming
Coloring conflict > minimum resolution
Optimization 2: Placement Perturbation for Color Conflict Removal (Clock and Datapaths), using dynamic programming
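Here is one simple way to see why alternate coloring can compensate variation. The model rests on three hypothetical assumptions. First, each gate's delay is linear in the CD error of its mask. Second, mask A prints delta_mu/2 larger than nominal and mask B prints delta_mu/2 smaller. Third, each mask adds a random CD error that is shared by all of its gates and independent between masks. Under these assumptions, a path with n_A and n_B gates has a systematic shift of s·(n_A - n_B)·delta_mu/2 and a random sigma of s·sigma·sqrt(n_A^2 + n_B^2). Balancing the colors minimizes both terms. The sketch evaluates this model only; it does not reproduce the ILP or dynamic-programming optimizations in the flow.

```python
import math
from typing import List, Sequence


def path_delay_shift(colors: Sequence[str], sens_ps_per_nm: float,
                     mean_diff_nm: float, sigma_nm: float, k: float = 3.0) -> float:
    """Worst-case (k-sigma) delay shift of a path under a hypothetical two-mask CD model.

    colors: one 'A' or 'B' per gate on the path.
    """
    n_a = sum(1 for c in colors if c == "A")
    n_b = len(colors) - n_a
    # systematic part: mask A gates slow down, mask B gates speed up
    systematic = sens_ps_per_nm * (n_a - n_b) * mean_diff_nm / 2.0
    # random part: one shared error per mask, independent between masks
    random_sigma = sens_ps_per_nm * sigma_nm * math.sqrt(n_a ** 2 + n_b ** 2)
    return abs(systematic) + k * random_sigma


def alternate(n: int) -> List[str]:
    """Alternate coloring A, B, A, B, ... along a path of n gates."""
    return ["A" if i % 2 == 0 else "B" for i in range(n)]


if __name__ == "__main__":
    same = ["A"] * 10
    print("single color:", path_delay_shift(same, 1.0, 4.0, 1.0))
    print("alternating :", round(path_delay_shift(alternate(10), 1.0, 4.0, 1.0), 2))
```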
Process- and temperature-aware AVS
  Closed-loop AVS
    Generic monitor: generic on-chip monitor [Burd00]
    Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
    In-situ monitor: in-situ performance monitor; measures actual critical paths [Hartman06, Fick10]
    Error detection system: error detection and correction system; Vdd scaling until an error occurs [Das06, Tschanz10]
Loading-aware AVS (software technique)
  Application-driven AVS: application-driven Vdd and frequency scaling [Lin09]
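The closed-loop schemes in this taxonomy share a basic control loop: read a monitor, compare it with the timing target, and step Vdd. The sketch below stands in a hypothetical alpha-power-law delay model (Vth = 0.35 V, alpha = 1.3) for a real on-chip monitor. All constants and the function names are illustrative.

```python
def monitor_delay_ns(vdd: float, vth: float = 0.35, alpha: float = 1.3,
                     k: float = 0.3) -> float:
    """Hypothetical monitor reading: alpha-power-law delay k*Vdd/(Vdd-Vth)^alpha."""
    return k * vdd / (vdd - vth) ** alpha


def avs_loop(target_ns: float, vdd: float = 1.1, step: float = 0.01,
             vmin: float = 0.6, vmax: float = 1.2, margin: float = 0.0,
             max_iter: int = 200) -> float:
    """Settle at the lowest Vdd on the step grid whose monitor delay meets the target.

    margin > 0 tightens the target (a guardband on top of the monitor reading).
    """
    limit = target_ns * (1.0 - margin)
    for _ in range(max_iter):
        if monitor_delay_ns(vdd) > limit:      # too slow: raise Vdd
            if vdd + step > vmax:
                break                          # cannot meet the target within range
            vdd += step
        else:                                  # passing: try one step lower
            nxt = vdd - step
            if nxt < vmin or monitor_delay_ns(nxt) > limit:
                break                          # one step lower would fail: settle here
            vdd = nxt
    return round(vdd, 3)


if __name__ == "__main__":
    print(avs_loop(0.55))            # start high and step down
    print(avs_loop(0.55, vdd=0.8))   # start low and step up
```

The margin parameter shows where a residual guardband would enter. A monitor that tracks the real critical paths more closely is what allows that margin to shrink.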
Design-Dependent RO [ISQED12]
[Figure: normalized delay sensitivities (1/Delay_nom)·(∂Delay/∂Vth) and (1/Delay_nom)·(∂Delay/∂Lgate) of a critical path (path A+B) and of the DDRO designed to match it]
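For stages in series, D = sum(d_i). The normalized sensitivity (1/D)·∂D/∂p is therefore the delay-weighted average of the stages' own normalized sensitivities. This is what lets a ring oscillator built from suitable cells mimic a critical path. The sketch below uses hypothetical per-stage numbers and an assumed sum-of-absolute-differences mismatch metric.

```python
from typing import Dict, List, Tuple

# Hypothetical stage data: (nominal delay in ps, {parameter: normalized sensitivity})
Stage = Tuple[float, Dict[str, float]]


def normalized_sensitivity(stages: List[Stage]) -> Dict[str, float]:
    """(1/D_nom) * dD/dp for a series chain: delay-weighted mean of stage sensitivities."""
    total = sum(d for d, _ in stages)
    params = stages[0][1].keys()
    return {p: sum(d * s[p] for d, s in stages) / total for p in params}


def mismatch(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Assumed matching metric: sum of absolute sensitivity differences."""
    return sum(abs(a[p] - b[p]) for p in a)


if __name__ == "__main__":
    path = [(10.0, {"vth": 0.8, "lgate": 1.2}), (30.0, {"vth": 0.4, "lgate": 0.9})]
    ro = [(20.0, {"vth": 0.5, "lgate": 1.0})] * 5
    print(normalized_sensitivity(path), mismatch(normalized_sensitivity(path),
                                                 normalized_sensitivity(ro)))
```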
DDRO Synthesis Flow [ISQED12]
Cluster critical paths
For each cluster, synthesize a DDRO using an integer linear program
[Figure: delay sensitivity to temperature (%) vs. delay sensitivity to Vdd (%) of critical paths, with cluster centroids marked X; sum-of-delay-sensitivities error (%) of each DDRO for Clusters 1-5 and on average; estimated vs. actual delay (ns), 0.9-1.2 ns, for the DDROs and for an hvt+rvt inverter RO]
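The two flow steps can be sketched end to end. First, plain k-means clusters the paths' sensitivity vectors. Second, for each centroid, a search picks how many stages of each candidate cell to put in the ring. As a substitution, an exhaustive search over small stage counts replaces the integer linear program. The cell library, the sensitivity values, and the "all stages inverting, odd stage count" constraint are all hypothetical.

```python
import itertools
import math
from typing import List, Optional, Sequence, Tuple

Vec = Tuple[float, ...]
Cell = Tuple[float, Vec]  # (nominal stage delay, normalized sensitivity vector)


def kmeans(points: List[Vec], k: int, iters: int = 20) -> List[Vec]:
    """Plain k-means on sensitivity vectors; the first k points seed the centroids."""
    cents = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        groups: List[List[Vec]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, cents[i]))
            groups[nearest].append(p)
        cents = [tuple(sum(col) / len(g) for col in zip(*g)) if g else cents[i]
                 for i, g in enumerate(groups)]
    return cents


def ro_sensitivity(counts: Sequence[int], cells: List[Cell]) -> Vec:
    """Delay-weighted mean sensitivity of an RO with counts[i] stages of cells[i]."""
    total = sum(n * d for n, (d, _) in zip(counts, cells))
    dims = len(cells[0][1])
    return tuple(sum(n * d * s[j] for n, (d, s) in zip(counts, cells)) / total
                 for j in range(dims))


def synthesize_ro(target: Vec, cells: List[Cell], max_per_cell: int = 6,
                  min_stages: int = 3) -> Tuple[Optional[Tuple[int, ...]], float]:
    """Exhaustive search (a stand-in for the ILP) over stage counts.

    Assumes every cell is used as an inverting stage, so the total stage
    count must be odd for the ring to oscillate.
    """
    best, best_err = None, float("inf")
    for counts in itertools.product(range(max_per_cell + 1), repeat=len(cells)):
        total = sum(counts)
        if total < min_stages or total % 2 == 0:
            continue
        sens = ro_sensitivity(counts, cells)
        err = sum(abs(x - t) for x, t in zip(sens, target))
        if err < best_err:
            best, best_err = counts, err
    return best, best_err


if __name__ == "__main__":
    # Hypothetical critical-path sensitivities (to Vdd, to temperature)
    paths = [(-3.4, 0.2), (-3.3, 0.3), (-2.6, 0.8), (-2.7, 0.9)]
    # Hypothetical cells: (delay in ps, (Vdd sensitivity, temp sensitivity))
    cells = [(10.0, (-3.6, 0.1)), (20.0, (-2.5, 0.9)), (15.0, (-3.0, 0.5))]
    for c in kmeans(paths, k=2):
        print(c, synthesize_ro(c, cells))
```

An ILP solver scales to realistic libraries and stage limits; the brute-force loop is only practical for a handful of cells.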
45nm test chip measurement
Each monitor has 3 copies per chip; 19 chips (no wafer split)
[Figure: standard deviation of Fmax (0-0.08) and Fmax correlation coefficient (0-1) for the DDRO, a critical path replica, and an hvt+rvt INV RO, for monitor copies 1-3]
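The correlation coefficient in this comparison is presumably Pearson's r, computed between each monitor's reading and the measured chip Fmax across the 19 chips. A minimal implementation follows; the sample values in the usage block are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    monitor_mhz = [510.0, 498.0, 530.0, 505.0, 521.0]   # hypothetical monitor readings
    chip_fmax_mhz = [1010.0, 990.0, 1052.0, 1003.0, 1040.0]  # hypothetical Fmax
    print(round(pearson(monitor_mhz, chip_fmax_mhz), 3))
```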
[Figure: values from 0.5 to 0.8 per cell type (INVX0, NAND2X0, NAND3X0, NAND4X0, NOR2X0, NOR3X0, NOR4X0) at the SS corner and target frequency (f target); labels "High resistance" and "Low resistance"]
Frequently exercised paths: upsize cells, so voltage can be scaled further
Rarely exercised paths: downsize cells
Voltage Scaling: reduce voltage until the error rate exceeds a target
Path Optimization: optimize frequently exercised, negative-slack paths
Power Reduction: reduce power without affecting the error rate
22% power savings
[Figure: power savings (0-12%) vs. allowable area overhead (0-30%) for R = 5% and R = 10%]
16% power reduction with 10% area overhead (R = 1%)
Multi-mode design: FFU module has 12% energy savings through selective replication
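The voltage-scaling step can be sketched as a loop that lowers Vdd while the predicted error rate stays at or below the target. The model behind it is hypothetical. Each path has a nominal delay and an activity, defined as the fraction of cycles that exercise it. A cycle errs when it exercises a path slower than the clock period, and failing activities simply add. Delay scales with an alpha-power law, normalized to Vdd = 1.0 V. The path-optimization and power-reduction steps are not modeled.

```python
from typing import List, Tuple

Path = Tuple[float, float]  # (delay at 1.0 V in ns, fraction of cycles that exercise it)


def delay_scale(v: float, v_nom: float = 1.0, vth: float = 0.35,
                alpha: float = 1.3) -> float:
    """Hypothetical alpha-power-law delay at Vdd=v relative to Vdd=v_nom."""
    def f(x: float) -> float:
        return x / (x - vth) ** alpha
    return f(v) / f(v_nom)


def error_rate(paths: List[Path], v: float, t_clk: float) -> float:
    """Assumed model: a cycle fails if it exercises a path slower than t_clk,
    and the activities of failing paths add (capped at 1)."""
    failing = sum(act for d, act in paths if d * delay_scale(v) > t_clk)
    return min(1.0, failing)


def scale_voltage(paths: List[Path], t_clk: float, target_rate: float,
                  v_start: float = 1.0, step: float = 0.01, v_min: float = 0.5) -> float:
    """Lower Vdd one step at a time; stop before the error rate would exceed target_rate."""
    v = v_start
    while v - step >= v_min and error_rate(paths, v - step, t_clk) <= target_rate:
        v -= step
    return round(v, 3)


if __name__ == "__main__":
    # Hypothetical paths: a rarely exercised near-critical path and two common ones
    paths = [(0.9, 0.001), (0.76, 0.04), (0.5, 0.5)]
    for target in (0.0, 0.01, 0.05):
        print(target, scale_voltage(paths, t_clk=1.0, target_rate=target))
```

Tolerating errors on the rarely exercised path lets Vdd drop further than a zero-error target would. Upsizing the frequently exercised paths would push the next failure point lower still.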
What If We Knew (accuracy requirements)
Approximate Design
Problem: What is the square root of 10?
"A little more than three" vs. 3.162278...
Approximation could be faster and more powerful
Accuracy requirement can change during runtime, so the benefits of approximation could be reduced
Adapt to changing requirements with runtime accuracy configuration [DAC2012]
Accuracy-configurable approximate adder: trades lower power consumption against higher accuracy
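The square-root example can be made concrete with an iterative method whose iteration count is chosen at runtime. Newton's method is used here only as an illustration; it is not the hardware discussed on the slide. One iteration from a starting guess of 5 gives 3.5, "a little more than three," and a few more iterations converge to 3.162278...

```python
import math


def approx_sqrt(x: float, iterations: int) -> float:
    """Newton's method for sqrt(x); `iterations` acts as a runtime accuracy knob."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if x == 0:
        return 0.0
    guess = x / 2.0 if x > 1 else 1.0   # crude starting point
    for _ in range(iterations):
        guess = 0.5 * (guess + x / guess)
    return guess


if __name__ == "__main__":
    for n in range(5):
        est = approx_sqrt(10.0, n)
        print(n, est, abs(est - math.sqrt(10.0)))
```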
Accuracy-Configurable Adder [DAC12]
[Figure: reduction in power consumption and size for the accuracy-configurable adder (axis up to 0.8)]
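The internals of the accuracy-configurable adder are not given here. As a generic illustration of runtime accuracy configuration, the sketch below is a segmented adder; it is not the [DAC12] architecture. The carry into each segment is speculated from only the `lookback` bits below that segment. A larger lookback gives fewer errors, at the cost in hardware of a longer carry chain. A lookback covering the full width makes the adder exact.

```python
def approx_add(a: int, b: int, width: int = 16, seg: int = 4,
               lookback: int = 4) -> int:
    """Segmented approximate adder (illustrative, not the [DAC12] design).

    Each seg-bit segment adds its own bits plus a speculative carry-in that is
    computed from only the `lookback` bits below the segment, assuming a carry
    of 0 into that window. lookback >= width gives the exact sum mod 2^width.
    """
    result = 0
    for start in range(0, width, seg):
        n = min(seg, width - start)
        seg_mask = (1 << n) - 1
        lo = max(0, start - lookback)
        win_mask = (1 << (start - lo)) - 1
        # speculative carry: carry out of the window [lo, start) only
        cin = (((a >> lo) & win_mask) + ((b >> lo) & win_mask)) >> (start - lo)
        s = ((a >> start) & seg_mask) + ((b >> start) & seg_mask) + cin
        result |= (s & seg_mask) << start
    return result


def mismatch_fraction(width: int = 8, seg: int = 2, lookback: int = 2) -> float:
    """Fraction of all input pairs for which the approximate sum is wrong."""
    n = 1 << width
    errs = sum(approx_add(a, b, width, seg, lookback) != (a + b) % n
               for a in range(n) for b in range(n))
    return errs / (n * n)


if __name__ == "__main__":
    for lb in (2, 4, 8):
        print(f"lookback={lb}: error fraction {mismatch_fraction(8, 2, lb):.4f}")
```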
[Figure: graph of relations among design and operating parameters (supply voltage, |Vthp|, |Vthn|, frequency, gate length, slew rate, load/fanout, junction resistance, AF, timing slack, temperature, Jrms) and reliability mechanisms (EM, TDDB, NBTI, HCI, MTTF). Legend: direct relation (if A increases, B increases); inverse relation (if A increases, B decreases); tunable at design or runtime; tunable at design]
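For the EM part of this graph, one well-known quantitative relation is Black's equation, MTTF = A·J^(-n)·exp(Ea/(k·T)). Under it, MTTF falls as current density J or temperature T rises. The sketch uses placeholder constants (A = 1, n = 2, Ea = 0.8 eV) that are not fitted to any process, so only ratios between results are meaningful.

```python
import math

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def black_mttf(j_rms: float, temp_k: float, a: float = 1.0, n: float = 2.0,
               ea_ev: float = 0.8) -> float:
    """Black's equation for EM lifetime: MTTF = A * J^-n * exp(Ea / (k*T)).

    a, n and ea_ev are placeholder values, not fitted to any process.
    """
    return a * j_rms ** (-n) * math.exp(ea_ev / (BOLTZMANN_EV * temp_k))


if __name__ == "__main__":
    base = black_mttf(1.0, 350.0)
    print("2x current density:", black_mttf(2.0, 350.0) / base)  # 1/4 when n = 2
    print("350 K -> 380 K    :", black_mttf(1.0, 380.0) / base)
```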