
Design-Based Equivalent Scaling

to the Rescue of Moore's Law

Andrew B. Kahng
UCSD CSE and ECE Departments

abk@ucsd.edu
http://vlsicad.ucsd.edu

ABK UCI ECE Colloquium 121031 1


Conclusions
A new technology node costs billions of dollars in
technology development and hundreds of millions of
dollars in design enablement
Leading-edge companies accept these costs to gain
20% advantages
Design-based equivalent scaling offers entire
technology nodes of improvements that are essential
to the continuation of Moore's Law
Recurring theme: "What if we knew ..."
Bridges between design and manufacturing
Bridges between system design and IC implementation



What is Moore's Law?
Moore, 1965: "The complexity for minimum component costs
has increased at a rate of roughly a factor of two per year"

Min cost per transistor

Moore's Law is a law of cost reduction


Proxy for cost reduction: scaling of value
Proxies for value: bits, hertz, density (= utility, integration)
What Is Scaling?

[Figure: trend curves for # of transistors, clock frequency, power, and performance/CLK (ILP)] [Sutter09]
Dimension and Transistor Density
ITRS = International Technology Roadmap for
Semiconductors (http://www.itrs.net/)
Key metric of progress: Metal-1 (M1) half-pitch (F)
M1 HP scales by 0.7x at each technology node
(note: 0.7 x 0.7 = 0.49 -> density doubles)
Rough equivalences (model scaling in both X, Y directions):
Pitch of M1 (PM1) = 2F
Pitch of M2 (PM2) = 1.25 PM1
Pitch of polysilicon (Ppoly) = 1.5 PM1



Basic SRAM, Logic Circuits and Layouts
Models of SRAM (USRAM) and NAND2 (UNAND2) area
based on canonical layouts [ISOCC09, ITRS 2009]
[Figure: canonical SRAM bitcell layout (2 Ppoly x 5 PM1) and NAND2 layout (3 Ppoly x 8 PM2)]

USRAM = 2 Ppoly x 5 PM1 = 60 F^2
Ulogic = 3 Ppoly x 8 PM2 = 180 F^2
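As a rough check on the area models above, the pitch equivalences can be turned into areas in units of F^2. This is a minimal sketch; the function names are illustrative and not from the talk.

```python
import math

def pitches(F):
    """Return (PM1, PM2, Ppoly) for M1 half-pitch F, using the rough equivalences."""
    pm1 = 2.0 * F        # Pitch of M1 = 2F
    pm2 = 1.25 * pm1     # Pitch of M2 = 1.25 PM1
    ppoly = 1.5 * pm1    # Pitch of polysilicon = 1.5 PM1
    return pm1, pm2, ppoly

def sram_area(F):
    """Canonical SRAM bitcell footprint: 2 Ppoly x 5 PM1."""
    pm1, _, ppoly = pitches(F)
    return (2 * ppoly) * (5 * pm1)

def logic_area(F):
    """Canonical NAND2 footprint: 3 Ppoly x 8 PM2."""
    _, pm2, ppoly = pitches(F)
    return (3 * ppoly) * (8 * pm2)

# With F = 1 the areas come out directly in units of F^2.
print(sram_area(1.0), logic_area(1.0))      # 60.0 180.0
# One node of 0.7x linear scaling roughly halves the area.
print(sram_area(0.7) / sram_area(1.0))
```

Because both areas are quadratic in F, a 0.7x shrink of F gives about 0.49x area, which is the "density doubles" note on the previous slide.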


Historical Data for MPU Products

[Figure, left: MPU logic transistor density and SRAM transistor density (xtors/cm^2, 1.00E+07 to 1.00E+12) vs. year, 2005-2025] [Tx/cm2, ITRS 2007 MPU model]
[Figure, right: transistor density (1.00E+00 to 1.00E+09) vs. year, 1970-2013, with "???" marking the trend going forward] [Tx/cm2, Stanford CPUDB]



Frequency
Figure from 2001 International Technology Roadmap for
Semiconductors (ITRS) System Drivers Chapter: FO4 INV
delays in clock period of Intel microprocessors

Observation: Microarchitecture
(pipelining) lever runs out of gas ~2004

Limit: 12-14 FO4 delays



Power
Static power density and active capacitance (= dynamic
power) density both continue to increase, modulo small resets
(high-k, FDSOI, FinFET, ...)
[Figure, left: logic and SRAM static power density (W/mm^2, 0.00-0.06) vs. year, 2005-2025]
[Figure, right: active capacitance density (nF/mm^2, 0.00-0.60) vs. year, 2005-2025]
[ITRS 2007]



ITRS MPU Frequency Roadmap

[Figure: MPU frequency (GHz, log scale, 1-100) vs. year, 2001-2021. Curves: before 2001, 2001 ITRS, 2007 ITRS, 2011 ITRS. Annotations: device speed only; platform power limit; device scaling limit]



ITRS MPU Frequency Roadmap
[Figure: MPU frequency (GHz, log scale, 1-100) vs. year, 2001-2021. Curves: before 2001, 2001 ITRS, 2007 ITRS, 2011 ITRS, shown together with data from [Danowitz et al., Stanford CPUDB]]



Seeing the Future, With 20-20-20 Vision

TSMC 28nm -> 20nm: 30% higher speed, 25% less power
TSMC 40nm LP -> 28nm LP: 20% higher speed
UMC 40nm LP -> 28nm LP: 20% higher speed
Samsung 45nm -> 32nm: 30% higher speed, 30% less power



Seeing the Future, With 20-20-20 Vision

Reality: In a new technology node, the best that designers can hope for
is 20% less power, 20% more speed, and 20% better density
Corollary: 10% = half of a technology node that costs many $B
Challenge: How to extract value from new technology ?!?
This Challenge is Due Largely to Margins
Margin -> lost benefits of technology
[Figure: design quality (e.g., frequency) vs. technology nodes; signoff with larger guardbands leads to lost benefits]
What Can The Semiconductor Industry Do?
Surrender
Don't turn on the transistors: dark silicon



Dark Silicon Analysis in 2001 ITRS
Power management gap -> amount of (switched) logic
content in an SOC goes to zero
Unfortunately, chip value also goes to zero
[Figure: % of area devoted to logic (0-50) vs. year, 1998-2014, under constant power (90W) and constant power density (90W/1.57cm^2); constant area region 1999-2004]
What Can The Semiconductor Industry Do?
Surrender
Don't turn on the transistors: dark silicon
Don't use the transistors as much: less activity



ITRS Magical Activity Factor Reduction
To reduce dynamic power: Do less work
MPU power limit is maintained by assuming a design-based
reduction of switching activity (-5% per year)
[Figure: total dynamic power (W, 0-450) vs. year, 2009-2025, without and with ("NEW") a 5% per year reduction of switching activity; with the reduction, power stays < 150W]
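The compounding effect of the assumed activity reduction can be sketched as follows. This is only an illustration: it assumes dynamic power is proportional to switching activity, and the input power values are placeholders rather than ITRS data.

```python
def relative_activity(years, annual_reduction=0.05):
    """Switching activity after `years`, relative to year 0, at a fixed annual reduction."""
    return (1.0 - annual_reduction) ** years

def with_activity_reduction(power_by_year, annual_reduction=0.05):
    """Scale a dynamic-power trajectory (one value per year) by the compounded reduction.

    Assumes dynamic power is proportional to switching activity.
    """
    return [p * relative_activity(n, annual_reduction)
            for n, p in enumerate(power_by_year)]

# Placeholder trajectory (W); not taken from the roadmap.
print(with_activity_reduction([100.0, 100.0, 100.0]))
# Over the 16 years from 2009 to 2025 the activity factor falls to well under half.
print(relative_activity(16))
```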
What Can The Semiconductor Industry Do?
Surrender
Don't turn on the transistors: dark silicon
Don't use the transistors as much: less activity

Fight
Design-based equivalent scaling !
= the rest of this talk

(There is a third choice)


Retire



Design-Based Equivalent Scaling
[Bohr08]
Geometric Scaling

Geometric scaling: Reduction of physical
dimensions to improve density (cost per function),
performance, reliability, etc.
Examples: scaling of Tox, Lgate, gate pitch



Design-Based Equivalent Scaling

Geometric Scaling
Equivalent Scaling

[Mistry07]

Equivalent scaling: Non-geometric enhancements
of process, devices or materials to improve
electrical performance
Examples: High-K metal gate, FinFET devices



Design-Based Equivalent Scaling

Geometric Scaling
Equivalent Scaling
Design-Based Equivalent
Scaling

Design-based equivalent scaling: Design
technologies that achieve power, performance and
cost tradeoffs to rescue Moore's-Law scaling of value
Examples: design for variability, low-power design,
heterogeneous multi-core architectures, ...
including some research at UCSD



Design-Based Equivalent Scaling

Design-based equivalent scaling: Design
technologies that achieve power, performance and
cost tradeoffs to rescue Moore's-Law scaling of value
Rest of this talk: 4 vignettes
The cost of margins
Mitigating bimodal variations
Adaptivity
"What if we knew ..."



On the Cost of Margin (a.k.a. Guardband)



Review: Concept of Timing Slack
Basic idea of power optimization: convert positive timing slack
into power reductions: smaller transistors, area, power, ... (but
this is not easy!)
[Figure: example timing graph between two CLK-driven registers, with gate delays and per-node Tarrival and Trequired values]
Slack = Trequired - Tarrival
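The slack definition can be made concrete with a small static-timing sketch: arrival times propagate forward as a max over fan-in, required times propagate backward as a min over fan-out, and slack is their difference. The graph, node names, and delay values below are made up for illustration; they are not the example drawn on the slide.

```python
from collections import defaultdict

def timing_slack(edges, sinks_required, sources_arrival=None):
    """Compute slack = Trequired - Tarrival for every node of a timing DAG.

    edges: list of (u, v, delay) tuples.
    sinks_required: required time at each node with no fan-out.
    sources_arrival: arrival time at nodes with no fan-in (default 0).
    """
    sources_arrival = sources_arrival or {}
    succ, pred, nodes = defaultdict(list), defaultdict(list), set()
    for u, v, d in edges:
        succ[u].append((v, d))
        pred[v].append((u, d))
        nodes.update((u, v))

    # Topological order (Kahn's algorithm).
    indeg = {n: len(pred[n]) for n in nodes}
    ready = [n for n in nodes if indeg[n] == 0]
    order = []
    while ready:
        n = ready.pop()
        order.append(n)
        for v, _ in succ[n]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)

    # Forward pass: latest arrival over all fan-in edges.
    arrival = {}
    for n in order:
        if pred[n]:
            arrival[n] = max(arrival[u] + d for u, d in pred[n])
        else:
            arrival[n] = sources_arrival.get(n, 0)

    # Backward pass: earliest required time over all fan-out edges.
    required = {}
    for n in reversed(order):
        if succ[n]:
            required[n] = min(required[v] - d for v, d in succ[n])
        else:
            required[n] = sinks_required[n]

    return {n: required[n] - arrival[n] for n in nodes}

# Hypothetical netlist: a and b drive c, c drives d, b also drives e.
edges = [("a", "c", 2), ("b", "c", 1), ("c", "d", 2), ("b", "e", 1)]
slack = timing_slack(edges, sinks_required={"d": 5, "e": 5})
print(sorted(slack.items()))
```

In this toy graph node e ends up with the most positive slack, so the gate driving it is the natural candidate for downsizing.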



Review: Concept of Timing Slack
Basic idea of power optimization: convert positive timing slack
into power reductions: smaller transistors, area, power, ... (but
this is not easy!)
[Figure: the same timing graph as on the previous slide]
Transistors in positive-slack cells can have smaller Wgate,
higher Vth, larger Lgate, more variation, ...
Slack = Trequired - Tarrival



Guardband for Variations
Guardband to cover uncertainties
Sources of uncertainty: defocus/dose variation, mask CD error,
non-rectangular shapes, misalignment, line-end shortening,
erosion/dishing in CMP, flare, line edge roughness, wafer flatness,
lens aberration, non-uniform CD, reliability, alpha-particle,
imperfect regulators, temperature variation, NBTI, electromigration,
IR-drop, hot-carrier injection, crosstalk

Traditional components of guardband

         Process (FEOL)    Process (BEOL)    Voltage             Temp.
         NMOS    PMOS      Cap.    Res.
WORST    Slow    Slow      Max.    Min.      Low (e.g. 0.9V)     High (e.g. 125 C)
BEST     Fast    Fast      Min.    Max.      High (e.g. 1.1V)    Low (e.g. -40 C)

[Figure: circuit delay distribution between BC (best case) and WC (worst case)]



Motivating Study: Guardband Reduction [ISQED08]
What is the true benefit of design/manufacturing
optimization techniques?
[Figure: 50% guardband reduction, shown as a narrower range between Value_best and Value_worst on a -100% to 100% scale]
From delay table analysis: worst-case delay 12.5% reduction
From capacitance table analysis: worst-case cap. 4% reduction
Expected impact of guardband reduction:
Delay reduction -> easier optimization -> smaller gate size -> smaller area (A), shorter wire
Smaller area -> smaller #defects (d: defect density): $Y_r = e^{-A d}$
Smaller area -> smaller cost: $N_{dies} = \frac{\pi r^2}{A} - \frac{2 \pi r}{\sqrt{2A}}$ (r: wafer radius)
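The two expressions on this slide can be combined into a rough count of good dies per wafer. This sketch assumes the Poisson random-defect yield form and the standard dies-per-wafer approximation, which is how the garbled slide formulas read; the wafer radius, die areas, and defect density are placeholder values.

```python
import math

def dies_per_wafer(r, area):
    """Gross dies per wafer: pi*r^2/A minus an edge-loss term 2*pi*r/sqrt(2A)."""
    return math.pi * r * r / area - 2.0 * math.pi * r / math.sqrt(2.0 * area)

def random_defect_yield(area, defect_density):
    """Poisson random-defect yield Y_r = exp(-A*d)."""
    return math.exp(-area * defect_density)

def good_dies(r, area, defect_density):
    """Gross dies times random-defect yield (parametric yield is ignored here)."""
    return dies_per_wafer(r, area) * random_defect_yield(area, defect_density)

# Placeholder numbers: 150 mm wafer radius, 0.001 defects/mm^2,
# and a die that shrinks from 100 mm^2 to 87 mm^2 (a 13% area reduction).
before = good_dies(150.0, 100.0, 0.001)
after = good_dies(150.0, 87.0, 0.001)
print(before, after, after / before)
```

A smaller die helps twice: more gross dies fit on the wafer, and each die is less likely to catch a random defect.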
Design Outcomes from Guardband Reduction
Experiments with industry chip implementation flow:
Technology (90nm, 65nm, 45nm) -> cell library guardband reduction, RC guardband reduction
RTL design (AES, JPEG, SOC1) -> synthesis -> placement -> clock tree synthesis -> routing
Analyze outcomes (area, wirelength, runtime, #violations, yield)

40% guardband reduction:
Area: 13% reduction
Dynamic power: 13% reduction
Leakage power: 19% reduction
Wirelength: 12% reduction
SP&R runtime: 28% reduction
#Timing viols.: 100% reduction
#Good dies (w/ process enhancement): 10% increase
#Good dies (w/o process enhancement): 4% increase

Impact of guardband reduction -> insight into costs of guardband
Impact on Yield
Guardband reduction in design process
(Actual guardband of fabrication is unchanged)
Parametric yield will decrease
Random defect yield will increase
[Figure: # of good dice per wafer (138-158) vs. RGB (%) (0-60); curves for no clustering and for alpha = 0.42, 0.43, 0.44, 0.45, 0.5, 1, 10, 1000]

20% guardband reduction results in 4% increase
in total number of good dies per wafer
On Taming Bimodality
(Double-Patterning Lithography)

TSMC R&D VP Cliff Hou: "At 20nm the
challenge is double patterning,"
October 24, 2012



CD Bimodality in Double-Patterning Litho
Two patterning steps -> two different CDs
CD = Critical Dimension
Green lines from 1st patterning, blue lines from 2nd patterning
Two different colorings -> two different timings
C12-type cell: odd polys in BLUE, even polys in GREEN
C21-type cell: odd polys in GREEN, even polys in BLUE
[Figure legend: gates from CD group 1; gates from CD group 2]



Bimodality Impact on Guardband [SPIE08, ASPDAC09]
Comparison of design guardband (Min-Max delay)
Unimodal representation is too pessimistic
[Figure: delay (s, 0 to 3.0E-11) vs. CD mean difference (1 nm to 6 nm); best-case and worst-case delay for the large CD group, the small CD group, and pooled CD]



Impact of Bimodality on Path Delay
Bimodality can help reduce path delay variation
Reduction of covariance when alternately colored
Uniform coloring (C12 C12 C12 C12): variation is accumulated
Alternate coloring (C12 C21 C12 C21): variation is compensated
[Figure: SPICE simulation results; sigma / mean (%) (0-25) vs. CD mean difference (0-6 nm) for uniform and alternate coloring]
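Why alternation helps can be seen in a deliberately simplified toy model, which is an assumption of this sketch and not the SPICE setup behind the plot: each cell's delay shifts linearly with the CD error of one patterning group, the error is fully correlated within a group, and the two groups are independent with equal sigma.

```python
import math

def path_delay_sigma(cell_types, sensitivity=1.0, sigma_cd=1.0):
    """Standard deviation of path delay under the toy two-population model.

    Cells of the same type share one CD error (perfect correlation), so their
    contributions add linearly; the two types are independent, so their
    variances add.
    """
    n12 = cell_types.count("C12")
    n21 = cell_types.count("C21")
    return sensitivity * sigma_cd * math.sqrt(n12 ** 2 + n21 ** 2)

uniform = path_delay_sigma(["C12", "C12", "C12", "C12"])
alternate = path_delay_sigma(["C12", "C21", "C12", "C21"])
print(uniform, alternate, uniform / alternate)
```

In this model a four-stage path with alternate coloring has 1/sqrt(2) of the delay sigma of the uniformly colored path, because half of the variation is moved onto an uncorrelated population.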



Impact of Bimodality on Clock Skew
Different coloring sequences in a clock network -> clock skew

Case   Source to Sink A          Source to Sink B
1      C12+C12+C12+...+C12       C12+C12+C12+...+C12
2      C12+C12+C12+...+C12       C21+C21+C21+...+C21

[Figure: clock skew (s, 0 to 6.00E-11) vs. CD mean difference (0nm to 6nm) for Case 1 and Case 2]

Same color on all clock buffers is better!


Bimodal CD Distribution: 3 Key Facts
1. Design requires bimodal-aware timing models
Unimodal representation is too pessimistic
2. Data paths benefit from alternate (mixed) coloring
Exploit existence of two uncorrelated CD populations
Minimize correlated variations in a given path
3. Clock paths benefit from uniform coloring
Correlated variation between launch and capture paths
minimizes bimodality-induced clock skew
Principle: Design can exploit both correlated, uncorrelated variations
DPL Layout-to-Mask Flow

RTL-to-GDS
DPL mask coloring
Bimodal-aware timing analysis
Optimization 1: maximization of alternate coloring (data paths)
Alternate coloring using integer linear programming
Optimization 2: placement perturbation for color conflict removal (clock and data paths)
Placement perturbation using dynamic programming
[Figure annotations: coloring conflict; > minimum resolution]



Overall Timing Improvement
Bimodal timing model -> reduce pessimism (margin)
Alternate coloring -> improve timing
Placement perturbation -> remove conflicts

Stage                          Timing      Mean CD Difference                #Conflict
                               Metric      2nm        4nm        6nm
Initial Coloring (Unimodal)    WNS (ns)    -1.113     -2.016     -2.902     0
                               TNS (ns)    -671.1     -1776.3    -3348.5
Initial Coloring (Bimodal)     WNS (ns)    -0.191     -0.354     -0.527     0
                               TNS (ns)    -8.17      -26.56     -64.64
Alternative Coloring           WNS (ns)    -0.090     -0.145     -0.267     219
                               TNS (ns)    -1.48      -3.85      -22.40
DPL-Corr (+ECO Routing)        WNS (ns)    -0.104     -0.183     -0.295     0
                               TNS (ns)    -3.43      -10.45     -28.42
Bimodality impact can be effectively mitigated!
On Adaptivity



Adaptive Voltage Scaling Approaches
[Side axis label: Power]
Open-loop AVS
  Freq. & Vdd LUT: pre-characterize LUT [Martin02]
  Post-silicon characterization: process-aware AVS, post-silicon characterization [Tschanz03]
Closed-loop AVS
  Generic monitor: process- and temperature-aware AVS, generic on-chip monitor [Burd00]
  Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  In-situ monitor: in-situ performance monitor, measure actual critical paths [Hartman06, Fick10]
Error detection system
  Error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]
Application-driven AVS
  Loading-aware AVS (software technique), application-driven Vdd and frequency scaling [Lin09]
Design-Dependent RO [ISQED12]

Timing variability is design-specific
-> why use generic monitor?
Idea: Select gates to form DDROs
with similar delay sensitivity to
variations (Lgate, Vth, V, T, ...) as
actual critical paths
Benefits: low area overhead,
automated flow, standard cells only
Can cluster critical paths having
similar sensitivities to reduce #ROs
[Figure: normalized delay sensitivities $\frac{1}{Delay_{nom}} \cdot \frac{\partial Delay}{\partial V_{th}}$ and $\frac{1}{Delay_{nom}} \cdot \frac{\partial Delay}{\partial L_{gate}}$ plotted for Gate A, Gate B, path (A+B), a critical path, and the DDRO]
DDRO Synthesis Flow [ISQED12]

Flow: gate sensitivities + critical path sensitivities
-> cluster critical paths
-> for each cluster, synthesize a DDRO using integer linear program
-> DDRO
-> off-line or on-chip delay estimation
[Figure: delay sensitivity to temp. (%) vs. delay sensitivity to Vdd (%) for critical paths; X: cluster centroids]
[Figure: sum of delay sensitivities error (%) of the DDRO for Cluster 1 to Cluster 5 and on average]
[Figure: 45nm SOI test chip with ARM Cortex M3 and DDRO]
Design-Dependent RO vs. Generic RO [ISQED12]
SPICE Monte Carlo simulation, 30 samples
[Figure: estimated delay (ns) vs. actual delay (ns), both 0.9-1.2. Left panel, DDRO: estimation error = 0.5% ~ 3.7%. Right panel, hvt+rvt Inv RO: estimation error = 1.7% ~ 5.1%]

45nm test chip measurement
Each monitor has 3 copies per chip, 19 chips (no wafer split)
[Figure, left: std. deviation of Fmax (0-0.08) for Copy 1, Copy 2, Copy 3 of the DDRO, critical path replica, and hvt+rvt INV RO]
[Figure, right: Fmax correlation coefficient (0-1) for Copy 1, Copy 2, Copy 3 of the DDRO, critical path replica, and hvt+rvt INV RO]



Process-aware Voltage Scaling (PVS) [ICCAD-2012]
Monitor design considerations
Critical path may be difficult to identify (IP from 3rd party)
Multiple modes/voltages -> Fmax calibration takes long test time
Proposal: tunable monitor
Design monitor to guardband for arbitrary circuit (overdesign)
Tune monitor based on Fmax of sample chips to recover design margin (calibrate only once)
Abstract voltage scaling property instead of matching critical path
Enable analysis of worst-case voltage scaling
[Flow: closed-loop AVS with PVS RO + SoC. Without Fmax of sample chips: configure RO for worst case. With Fmax of sample chips: configure RO so that all sample chips meet timing. Store target frequency and RO configurations in a ROM]



Voltage Scaling Properties
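Vmin = minimum Vdd to meet timing constraints
     = process distance / scaling rate
Process distance: process-induced frequency shift
relative to target frequency
Scaling rate: frequency shift for a unit voltage difference
[Figure: frequency vs. voltage lines for the FF, TT, and SS corners. For process condition k, the process distance is the frequency gap to f_target, the scaling rate is the slope $\Delta f / \Delta V$, and Vmin_path(k) is marked on the voltage axis relative to V_nom]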
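One way to read the two definitions is as a linear frequency-vs-voltage model around V_nom. The sketch below takes that reading, treating "process distance / scaling rate" as the voltage shift relative to V_nom; this is an interpretation, since the figure labels did not survive extraction, and the function name, parameter names, and all numbers are hypothetical.

```python
def vmin_linear(f_at_vnom, f_target, scaling_rate, v_nom):
    """Estimate Vmin under a linear frequency-vs-voltage model (an assumption).

    f_at_vnom: frequency of the circuit at V_nom for this process condition (Hz).
    f_target: target frequency (Hz).
    scaling_rate: frequency shift per volt (Hz/V), assumed constant.
    """
    process_distance = f_at_vnom - f_target       # > 0 for fast silicon
    return v_nom - process_distance / scaling_rate

# Hypothetical numbers: 1 GHz target, 1 GHz/V scaling rate, V_nom = 1.0 V.
print(vmin_linear(1.10e9, 1.0e9, 1.0e9, 1.0))   # fast chip: can run below V_nom
print(vmin_linear(0.95e9, 1.0e9, 1.0e9, 1.0))   # slow chip: needs more than V_nom
```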



PVS Monitor Design Concept
RO is used as a reference for voltage scaling
Design ROs with the worst case voltage scaling
properties
-> guardband for arbitrary circuits
A circuit meets its timing when, for each process condition k,
$\max_{i=1}^{m} V_{min\_ro}(i,k) \ge \max_{j=1}^{n} V_{min\_path}(j,k)$
(maximum of m ROs on the left, maximum of n paths on the right)
[Figure: frequency vs. voltage lines for the FF, TT, and SS corners, with f_target and Vmin(k) marked relative to V_nom]
Design challenges
Vmin_ro > Vmin of any data path across all process conditions
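The guardband condition above is easy to check once per-condition Vmin values are available. This is a minimal sketch; the dictionary layout, corner names, and voltages are invented for illustration.

```python
def ros_guardband_paths(vmin_ro, vmin_path):
    """Return True if, at every process condition, the worst RO Vmin covers the worst path Vmin.

    vmin_ro, vmin_path: dicts mapping a process condition name to a list of Vmin values (V).
    """
    return all(max(vmin_ro[k]) >= max(vmin_path[k]) for k in vmin_path)

# Hypothetical Vmin values (V) for m = 2 ROs and n = 3 paths at three corners.
vmin_ro = {"SS": [0.98, 1.01], "TT": [0.88, 0.90], "FF": [0.79, 0.80]}
vmin_path = {"SS": [0.95, 0.99, 1.00], "TT": [0.85, 0.87, 0.89], "FF": [0.75, 0.78, 0.80]}
print(ros_guardband_paths(vmin_ro, vmin_path))
```

If the check fails at any condition, scaling the supply down to the RO's Vmin there would leave some path short of timing.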



Vmin Analysis
Key observation: Vmin is bounded by NMOS or PMOS
dominated cells (e.g., NOR3 at FS corner)
-> Use NAND, NOR type ROs
[Figure: Vmin (V, 0.500-1.100) by cell type (INVX0, NAND2X0, NAND3X0, NAND4X0, NOR2X0, NOR3X0, NOR4X0) at the SS, TT, FF, SF, and FS corners]



Design RO with Tunable Vmin
Identified two circuit knobs to tune Vmin
Series resistance
Cell types (INV, NAND, NOR)
Example circuit strategy
Allow tuning of series resistance of each stage to high or low
Different cell types cover different process corners
[Figure: RO stages, each with a 1-bit control pin that selects high or low series resistance]



PVS Experiment Result
65nm, OpenSPARC T1 module
Monte Carlo SPICE simulation
[Figure annotations: min margin; more aggressive scaling]
Default setting: low resistance in all stages
Vmin_est - Vmin_chip = 13mV on average (guardband for
worst-case)
With Fmax information per die, can tune RO configuration
to drive Vmin_est - Vmin_chip -> 0
Better on-chip sensing and adaptation
-> more reduction of runtime power overheads (Vdd)
On What if We Knew



What If We Knew ... (switching activity from workload)
Error-tolerant design
[Cartoon: "CPU, heal thyself..."]
Errors are detected and corrected with redundancy technique
Problem:
Many paths have near-critical slack -> "wall" of (critical) slack
Scaling beyond the critical operating point causes massive
errors that cannot be corrected

Reshape slack distribution for gracefully increasing error rate

Frequently exercised paths: upsize cells
Rarely exercised paths: downsize cells
-> Scale voltage further



Recovery-Driven for Error-Tolerant Designs [TCAD12]
Minimize power for a target error rate
Slack redistribution based on functional information

Voltage scaling: reduce voltage until the error
rate exceeds a target
Path optimization: optimize frequently exercised,
negative slack paths
Power reduction: reduce power without affecting
error rate
22% power savings
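The voltage-scaling step of this loop can be sketched as a simple search. This is only an illustration of "reduce voltage until the error rate exceeds a target": the error model `toy_errors_per_million` is a made-up stand-in for error rates that would come from simulating real workloads, and all numbers are hypothetical.

```python
def lowest_safe_voltage(v_start_mv, v_floor_mv, step_mv, errors_per_million, target):
    """Step the supply down and return the last voltage (mV) whose error rate is within target.

    errors_per_million: callable mapping a voltage in mV to an error rate
    (errors per million cycles). Integers are used to avoid rounding issues.
    """
    v = v_start_mv
    while v - step_mv >= v_floor_mv and errors_per_million(v - step_mv) <= target:
        v -= step_mv
    return v

def toy_errors_per_million(v_mv):
    """Hypothetical error model: error-free at 900 mV and above, then a gradual rise."""
    return max(0, 900 - v_mv) * 100

# A graceful error curve lets the search continue below the error-free point.
print(lowest_safe_voltage(1000, 700, 10, toy_errors_per_million, target=5000))
```

With a "wall" of slack the error curve would jump almost vertically near 900 mV and the search would stop there, which is why the slack distribution is reshaped first.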



What If We Knew (scenarios, duty cycles)
Dynamic Voltage Freq. Scaling
DVFS allows adaptation to workloads &
operating conditions
Multi-Mode (or DVFS) design operates at
multiple power/performance points with
different lifetimes
Example modes: 1.0V, 1GHz (e.g., talk mode); 0.7V, 100MHz (e.g., standby mode)

Conventional EDA tool: requires constraints (freq., voltage)
before implementation
(which constraints will provide minimum energy?)
Replication: Create replicas that target each performance mode
(Replication incurs a large area overhead)
Use scenario/duty cycle information for multi-mode
optimization [TCAD12]
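Why duty cycles matter can be shown with a duty-cycle-weighted average. This is a toy sketch: the per-mode power numbers are invented, and this is not the optimization formulation from [TCAD12].

```python
def average_power(modes):
    """Duty-cycle-weighted average power over operating modes.

    modes: list of (duty_cycle, power_watts) pairs; duty cycles must sum to 1.
    """
    total_duty = sum(duty for duty, _ in modes)
    if abs(total_duty - 1.0) > 1e-9:
        raise ValueError("duty cycles must sum to 1")
    return sum(duty * power for duty, power in modes)

# Hypothetical device: active 1% of the time, in standby the other 99%.
talk_mode = (0.01, 1.0)      # e.g. 1.0V, 1GHz
standby_mode = (0.99, 0.01)  # e.g. 0.7V, 100MHz
print(average_power([talk_mode, standby_mode]))
```

With numbers like these, the rarely used fast mode and the always-on slow mode contribute about equally to average power, so an implementation tuned for only one of them leaves energy on the table.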
DVFS Design Implementations [TVLSI12]

Context-aware design shows up to 19.5%, 7.6% (avg.)
energy reduction over conventional multi-mode design
Replication-based design shows up to 25.4%, 9.1% (avg.)
energy reduction over conventional multi-mode design
Selective-replication design
Layout results (OpenSPARC/FFU)
[Figure: energy reduction (0%-16%) vs. allowable area overhead (0%-30%) for duty cycle R = 1%, R = 5%, and R = 10%; baseline labeled "multi-mode design"]
16% power reduction with 10% area overhead (R = 1%)
FFU module has 12% energy savings through selective-replication
What If We Knew (accuracy requirements)
Approximate design
[Cartoon: "What is the square root of 10?" Answers: "a little more than three" vs. "3.162278..."]
Approximation could be faster and more powerful
Problem:
Accuracy requirement can change during runtime
-> benefits of approximation could be reduced
Adapt to changing requirements
with runtime accuracy configuration

[DAC2012] accuracy-configurable approximate adder
[Figure axis: lower power consumption <-> higher accuracy]
Accuracy-Configurable Adder [DAC12]

Accuracy configuration with pipelined adder
Power reduction when accuracy requirement varies

Config.    Accuracy    Power reduction
mode 4     1.000       11.5%
mode 3     0.998       12.4%
mode 2     0.991       31.0%
mode 1     0.983       51.6%

[Figure: accuracy and normalized power consumption (0-1) for each configuration]
$Accuracy = \mathrm{Avg}\left(1 - \frac{|result - reference|}{reference}\right)$
Average 30% power savings vs. no accuracy configuration
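To make the accuracy metric concrete, here is a small sketch with a generic block-based approximate adder. It uses a simple carry-speculation scheme chosen for illustration and is not the circuit from [DAC12]; block size plays the role of the accuracy knob, and all names are hypothetical.

```python
def approx_add(a, b, width=16, block=4):
    """Add two unsigned integers block by block with a speculated carry.

    Each block's carry-in is predicted from the previous block alone (as if
    that block had no carry-in), so long carry chains are cut. With
    block == width the result is the exact sum modulo 2**width.
    """
    mask = (1 << block) - 1
    result = 0
    prev_a = prev_b = 0
    for pos in range(0, width, block):
        blk_a = (a >> pos) & mask
        blk_b = (b >> pos) & mask
        carry_in = ((prev_a + prev_b) >> block) & 1
        result |= ((blk_a + blk_b + carry_in) & mask) << pos
        prev_a, prev_b = blk_a, blk_b
    return result

def accuracy(results, references):
    """Average of 1 - |result - reference| / reference over all samples."""
    terms = [1.0 - abs(r - ref) / ref for r, ref in zip(results, references)]
    return sum(terms) / len(terms)

# Short carry chains are handled exactly; a long chain is not.
print(hex(approx_add(0x0F0F, 0x0101, block=8)))   # 0x1010, same as the exact sum
print(hex(approx_add(0x00FF, 0x0001, block=4)))   # 0x0, exact sum is 0x100
```

Larger blocks restore accuracy at the cost of longer carry chains, which is the kind of tradeoff the configuration modes in the table expose.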
What If We Knew (Lifetime (MTTF) Reqts)
[Figure: dependency graph relating design and runtime parameters (wire width, wire spacing, driver size, Jrms, temperature, AF, timing slack, supply voltage, |Vthp|, |Vthn|, frequency, gate length, slew rate, junction resistance, load/fanout) to lifetime (MTTF). Edges are labeled by mechanism: general, EM, TDDB, NBTI, HCI.
Legend: direct relation A -> B (if A increases then B increases); inverse relation A -> B (if A increases then B decreases); parameters tunable at design or runtime; parameters tunable at design]



Example: Electromigration MTTF vs. Fmax
[Figure: % increase of Fmax (0%-100%) for DMA, AES, and JPEG, with x-axis values running from 10 down to 1; 65nm technology, fixed area; annotation: -30% of MTTF_require = +60% of Fmax]

Fmax increases with relaxing MTTF_require
Up to +60% of Fmax for -30% of MTTF_require
Fmax improvement is determined by
Mix of cell sizes
Length and timing constraints of critical paths
Conclusions
A new technology node costs billions of dollars in
technology development and hundreds of millions of
dollars in design enablement
Leading-edge companies accept these costs to gain
20% advantages
Design-based equivalent scaling offers entire
technology nodes of improvements that are essential
to the continuation of Moore's Law
Recurring theme: "What if we knew ..."
Bridges between design and manufacturing
Bridges between system design and IC implementation



THANK YOU

