Spring 2018
Digital Design and
Integrated Circuits
Instructors:
Nick Weaver & John Wawrzynek
Lecture 11
EE141
Driving Large Loads
‣ Large fanout nets: clocks, resets, memory bit lines, off-chip
‣ Relatively small driver results in long rise time (and thus
large gate delay)
Staged Buffers
‣ Strategy:
Inverter Chain
[Figure: inverter chain from In to Out driving load capacitance CL]
Careful about Optimization Problems
❑ Get the fastest delay if you build one very big inverter
▪ So big that delay is set only by self-loading
[Figure: inverter driving load Cload]
Delay Optimization Problem #1
Apply to Inverter Chain
[Figure: chain of N inverters (stages 1, 2, ..., N) from In to Out driving load CL]
Optimal Sizing for Given N
$$\frac{dt_p}{dC_{in,j}} = t_{inv}\,\frac{1}{C_{in,j-1}} - t_{inv}\,\frac{C_{in,j+1}}{C_{in,j}^{2}} = 0$$
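Setting this derivative to zero gives C_in,j^2 = C_in,j-1 · C_in,j+1, so each stage is sized as the geometric mean of its neighbors. A minimal Python sketch that checks this numerically, assuming the per-stage delay model t_inv(γ + C_in,j+1 / C_in,j) implied by the derivative; the function names and the capacitance values 1 and 16 are illustrative assumptions, not from the lecture:

```python
def two_stage_delay(c_prev, c_mid, c_next, gamma=1.0, t_inv=1.0):
    """Delay of two cascaded stages: c_prev drives c_mid, c_mid drives c_next."""
    return t_inv * (gamma + c_mid / c_prev) + t_inv * (gamma + c_next / c_mid)


def best_middle_size(c_prev, c_next, steps=10000):
    """Brute-force search for the middle-stage size that minimizes delay."""
    candidates = (c_prev + (c_next - c_prev) * k / steps for k in range(1, steps + 1))
    return min(candidates, key=lambda c: two_stage_delay(c_prev, c, c_next))


# Hypothetical neighbors with input capacitances 1 and 16 (arbitrary units).
c_best = best_middle_size(1.0, 16.0)
geometric_mean = (1.0 * 16.0) ** 0.5
print(c_best, geometric_mean)  # the two values should nearly coincide
```

The brute-force search is deliberately naive; it is only there to confirm the closed-form result without relying on it.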
Optimal Sizing for Given N (cont’d)
Optimum Delay and Number of Stages
Example
[Figure: three-stage inverter chain from In to Out with stage sizes 1, f, f^2; input capacitance C1, load CL = 8 C1]
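The example can be worked through as a short Python sketch. It assumes equal fanout per stage, so f = (CL/C1)^(1/N) with N = 3, and the chain delay N·t_inv·(γ + f) with γ = 1 and t_inv = 1; the function names are illustrative:

```python
def effective_fanout(c_load, c_in, n_stages):
    """Per-stage fanout when every stage has the same fanout."""
    return (c_load / c_in) ** (1.0 / n_stages)


def stage_sizes(c_in, f, n_stages):
    """Input capacitance of each stage: C1, f*C1, f^2*C1, ..."""
    return [c_in * f ** j for j in range(n_stages)]


def chain_delay(f, n_stages, gamma=1.0, t_inv=1.0):
    """Total delay N * t_inv * (gamma + f) for equal-fanout stages."""
    return n_stages * t_inv * (gamma + f)


f = effective_fanout(8.0, 1.0, 3)   # CL = 8 C1, three stages
print(f, stage_sizes(1.0, f, 3), chain_delay(f, 3))
```

With CL = 8 C1 and three stages, the fanout is the cube root of 8, so the stages scale as 1, 2, 4.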
Delay Optimization Problem #2
Untangling the Optimization Problem
For γ = 0, f = e, N = ln (CL/Cin)
Optimum Effective Fanout f
[Plot: optimum effective fanout fopt (2.5 to 4.5) versus γ (0 to 3); fopt = e at γ = 0 and fopt = 3.6 for γ = 1]
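The curve can be reproduced with a short Python sketch. It assumes the chain delay t_inv·(γ + f)·ln(F)/ln(f), that is, N = ln F / ln f equal-fanout stages; minimizing over f gives the condition ln f = 1 + γ/f, which has no closed form for γ > 0 and is solved here by bisection. The function name and search bounds are illustrative assumptions:

```python
import math


def optimum_fanout(gamma, lo=1.0 + 1e-9, hi=50.0, iterations=200):
    """Solve ln(f) = 1 + gamma / f for f by bisection."""
    def residual(f):
        return math.log(f) - 1.0 - gamma / f

    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid      # f too small: ln(f) has not caught up yet
        else:
            hi = mid
    return 0.5 * (lo + hi)


for gamma in (0.0, 0.5, 1.0, 2.0, 3.0):
    print(gamma, round(optimum_fanout(gamma), 2))
```

The residual is increasing in f over the search interval, which is why plain bisection is sufficient here.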
In Practice: Plot of Total Delay
[Hodges, p.281]
Normalized Delay As a Function of F
$$t_p = N\,t_{inv}\left(\gamma + \sqrt[N]{F}\right), \qquad F = C_L / C_{in} \qquad (\gamma = 1)$$
[Rabaey: page 210]
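A minimal Python sketch that evaluates this expression, normalized to t_inv with γ = 1, for an unbuffered driver (N = 1), a two-stage driver (N = 2), and the best integer number of stages. The function names and the sample values of F are illustrative assumptions; restricting N to integers means the results can differ slightly from a continuous optimum:

```python
def normalized_delay(big_f, n_stages, gamma=1.0):
    """t_p / t_inv = N * (gamma + F^(1/N))."""
    return n_stages * (gamma + big_f ** (1.0 / n_stages))


def best_stage_count(big_f, gamma=1.0, max_stages=20):
    """Integer N in 1..max_stages that minimizes the normalized delay."""
    return min(range(1, max_stages + 1),
               key=lambda n: normalized_delay(big_f, n, gamma))


print("F, unbuffered, two-stage, N_opt, chain")
for big_f in (10, 100, 1000, 10000):
    n_opt = best_stage_count(big_f)
    print(big_f,
          round(normalized_delay(big_f, 1), 1),
          round(normalized_delay(big_f, 2), 1),
          n_opt,
          round(normalized_delay(big_f, n_opt), 1))
```

The gap between the unbuffered and chained columns grows quickly with F, which is the motivation for staged buffers.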
Conclusion
‣ Timing Optimization: You start with a target clock
period. What control do you have?
‣ Biggest effect is RTL manipulation.
‣ i.e., how much logic to put in each pipeline stage.
‣ We will be talking later about how to manipulate RTL
for better timing results.
‣ In most cases, the tools will do a good job at logic/circuit
level:
‣ Logic level manipulation
‣ Transistor sizing
‣ Buffer insertion
‣ But some cases may be difficult and you may need to
help
‣ The tools will need some help at the floorplan and layout
EECS151, UC Berkeley Sp18
Energy-delay trade-offs
Power, Energy, and Power Density
❑ Energy: The total energy consumed for
an operation
❑ Measured in joules
❑ Alternatively, in power × time (e.g. watt-hours for batteries)
❑ Power:
❑ The amount of energy used per unit
time
❑ Power density:
❑ The amount of energy used per unit time
per unit area
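The three quantities are related by simple ratios. A minimal Python sketch, with all numeric values (energy, time, die area, battery capacity) chosen as hypothetical examples:

```python
JOULES_PER_WATT_HOUR = 3600.0  # 1 W for 3600 s


def average_power(energy_j, time_s):
    """Power in watts: energy per unit time."""
    return energy_j / time_s


def power_density(power_w, area_mm2):
    """Power density in W/mm^2: power per unit area."""
    return power_w / area_mm2


def battery_life_hours(capacity_wh, power_w):
    """Hours a battery of the given capacity lasts at constant power."""
    return capacity_wh / power_w


# Hypothetical chip: 20 J consumed over 10 s on a 100 mm^2 die.
p = average_power(20.0, 10.0)
print(p, power_density(p, 100.0), battery_life_hours(10.0, p))
print(10.0 * JOULES_PER_WATT_HOUR)  # a 10 Wh battery expressed in joules
```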
When Optimizing for "Power"...
❑ Colloquially, we talk about power
optimization...
❑ But really we want to optimize for energy
consumption or power density
❑ Energy consumption -> battery life
❑ Power density -> ability to cool the chip
❑ Cooling is often steady-state
❑ We can temporarily use more power
but eventually the chip will either slow/
shut down or blow up
Where Does Power Go in CMOS?
❑ Switching power
▪ Charging/discharging capacitors
▪ Proportional to C·V²·f
▪ Switching energy is proportional only to C·V²
▪ Lowering voltage slows transistors
❑ Short-circuit power
▪ Both pull-up and pull-down on during transition
❑ Leakage power
▪ Transistors are imperfect switches
❑ Static currents
▪ Biasing currents, in e.g. analog, memory
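The switching term can be sketched in Python. The activity factor (the fraction of the capacitance switched per cycle) and all numeric values are assumptions added for illustration; the slide only states the proportionality to C, V squared, and frequency:

```python
def switching_energy(c_farads, v_dd):
    """Energy scale per switching event, proportional to C * V^2."""
    return c_farads * v_dd ** 2


def switching_power(c_farads, v_dd, f_hz, activity=1.0):
    """Switching power, proportional to activity * C * V^2 * f."""
    return activity * switching_energy(c_farads, v_dd) * f_hz


# Hypothetical block: 1 pF of switched capacitance clocked at 1 GHz.
p_full = switching_power(1e-12, 1.0, 1e9)
p_half_v = switching_power(1e-12, 0.5, 1e9)
print(p_full, p_half_v, p_half_v / p_full)
```

Halving the supply cuts the switching power to a quarter at the same frequency, while the energy per operation does not depend on frequency at all.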
The Knobs of the Circuit Designer
❑ Architectural choices
❑ Logic Structure
❑ Width of Transistors
❑ Supply Voltage
❑ (Threshold Voltage)
Transistor Resistance as a Function of Voltage
[Plot: equivalent resistance Req (Ohm, ×10^5) versus supply voltage VDD (V), for VDD from 0.5 to 2.5 V]
Principles for Power Reduction
Energy – Performance Space
[Plot: Energy/op versus Performance, with a solid line bounding the achievable designs]
❑ Plot all possible designs on a 2-D plane
▪ No matter what you do, can never get below/to the right of
the solid line
❑ This line is called “Pareto Optimal Curve”
▪ Usually (always) follows law of diminishing returns
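One way to extract the Pareto-optimal designs from a set of candidates is sketched below in Python. A design is kept if no other design offers at least the same performance at no more energy while being strictly better in one of the two. The design points are hypothetical:

```python
def pareto_front(designs):
    """Return the (performance, energy_per_op) points not dominated by any other."""
    front = []
    for perf, energy in designs:
        dominated = any(
            (p >= perf and e <= energy) and (p > perf or e < energy)
            for p, e in designs
        )
        if not dominated:
            front.append((perf, energy))
    return sorted(front)


# Hypothetical designs as (performance, energy/op) pairs in arbitrary units.
designs = [(1.0, 1.0), (2.0, 2.0), (2.0, 3.0), (3.0, 5.0), (1.5, 4.0)]
print(pareto_front(designs))
```

In this made-up data set each step up in performance costs more energy than the previous one, which is the diminishing-returns shape the slide describes.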
Optimization Perspective
[Plot: Energy/op versus Performance, with the Pareto-optimal curve (solid) and a new technique (dotted)]
❑ Instead of metrics like EDP (Energy-Delay Product),
this curve often provides information more directly
▪ Ex1: What is minimum energy for XX performance?
▪ Ex2: Over what range of performance is a new technique
(dotted line) actually beneficial?
Expanding the Playing Field
[Figure: two plots of energy (E) versus delay (D)]
Voltage scaling: Parallelism and Pipelining
Reducing the Supply Voltage (while maintaining
performance)
Concurrency:
trading off clock frequency versus area to reduce power
Consider the following reference design
[Figure: reference datapath with registers R and logic blocks F1 and F2, clocked at fref]
Cref: average switching capacitance
R: register
F1, F2: combinational logic blocks (adders, ALUs, etc.)
[A. Chandrakasan, JSSC’92]
A Parallel Implementation
[Figure: two parallel copies of the reference datapath (registers R, logic blocks F1 and F2), each clocked at fref/2; annotation: "Almost cancels"]
Running slower reduces required supply voltage
Yields quadratic reduction in power
Example: 90nm Technology
[Plot: normalized delay tp (1 to 5) versus normalized supply voltage VDD (0.5 to 1)]
Assuming ovpar = 7.5%
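A minimal Python sketch of the power comparison for a parallel design. It assumes that N copies switch N·Cref·(1 + ovpar) of capacitance at fref/N, so the capacitance and frequency factors almost cancel and the saving comes from the squared voltage ratio. The voltage ratio of 0.7 is a hypothetical value, not read from the plot:

```python
def parallel_power_ratio(n_copies, v_ratio, overhead=0.075):
    """P_par / P_ref for n_copies parallel units running at f_ref / n_copies.

    v_ratio is V_DD,par / V_DD,ref; overhead is the fractional capacitance
    overhead (ovpar) of the extra routing and multiplexing.
    """
    cap_ratio = n_copies * (1.0 + overhead)   # switched capacitance grows
    freq_ratio = 1.0 / n_copies               # each copy runs slower
    return cap_ratio * v_ratio ** 2 * freq_ratio


print(parallel_power_ratio(2, 1.0))   # no voltage scaling: only the overhead remains
print(parallel_power_ratio(2, 0.7))   # hypothetical 30% supply reduction
```

Without voltage scaling the parallel design is slightly worse than the reference, so the benefit depends entirely on how far the supply can be lowered at the relaxed clock rate.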
A Pipelined Implementation
[Figure: pipelined datapath with an extra register R between logic blocks F1 and F2, clocked at fref]
Leveraging Concurrency
Increasing use of Concurrency Saturates
▪ Can combine parallelism and pipelining to drive VDD down
▪ But close to the process threshold voltage, the overhead of excessive concurrency starts to dominate
[Plot: normalized Power (0.1 to 1) versus Concurrency (2 to 16)]
Assuming constant % overhead
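The saturation can be illustrated with a rough Python model. Everything in it is an assumption made for the sketch: gate delay follows an alpha-power law V/(V - VTH)^α with VTH = 0.3 and α = 1.5 (normalized to a reference supply of 1.0), each added unit of concurrency costs a constant 7.5% capacitance overhead, and leakage is ignored. It is not the model behind the plot:

```python
def gate_delay(v, v_t=0.3, alpha=1.5):
    """Alpha-power-law delay (arbitrary units); assumed model, valid for v > v_t."""
    return v / (v - v_t) ** alpha


def supply_for_slowdown(slowdown, v_ref=1.0, v_t=0.3, alpha=1.5):
    """Lowest supply at which the gate is at most `slowdown` times slower than at v_ref."""
    target = slowdown * gate_delay(v_ref, v_t, alpha)
    lo, hi = v_t + 1e-9, v_ref
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if gate_delay(mid, v_t, alpha) > target:
            lo = mid      # too slow at this voltage, need a higher supply
        else:
            hi = mid
    return hi


def normalized_power(n, overhead=0.075, v_ref=1.0):
    """Power relative to the reference at fixed throughput with concurrency n."""
    v = supply_for_slowdown(n, v_ref)
    return (1.0 + overhead * (n - 1)) * (v / v_ref) ** 2


powers = {n: normalized_power(n) for n in range(1, 17)}
best_n = min(powers, key=powers.get)
print(best_n, round(powers[best_n], 3), round(powers[16], 3))
```

As the supply approaches the threshold, each extra unit of concurrency buys very little voltage reduction while the overhead keeps growing, so the power curve flattens and then turns upward.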
Reducing VTH
[Plot: power P versus VDD at fixed throughput; power falls from the nominal design (no concurrency) to Pmin as concurrency increases, then is limited by overhead + leakage]
Only option: Reduce VTH as well!
But: Must consider Leakage …
Mapping into the Energy-Delay Space
[Plot: energy per operation EOp versus Delay = 1/Throughput, showing fixed throughput, increasing level of parallelism, and the optimum energy-delay point. © IEEE 2004]
▪ For each level of performance, there is an optimum amount of concurrency
▪ Concurrency is only energy-optimal if the requested throughput is larger
than that of the optimal operation point of the nominal function
[Ref: D. Markovic, JSSC’04]
What if the Required Throughput is Below Minimum?
(that is, at no concurrency)
Introduce Time-Multiplexing!
[Figure: reference design with two copies of block A clocked at f, versus time-mux design with one shared copy of A clocked at 2f]
Absorb unused time slack by increasing clock frequency
(and voltage …)
Again comes with some area and capacitance overhead!
Concurrency and Multiplexing Combined
Data for 64-b ALU
[Plot: curves labeled by parallelism level and time-mux level (1 to 5); annotations: Max EOp, Dtarget, A = 5 Aref, AREA from LARGE to SMALL]
Some Energy-Inspired Design Guidelines
Clock Gating and FPGAs
❑ We can't shut down unused blocks of
our logic...
❑ Although the FPGA designers do
optimize for reducing leakage...
❑ The FPGA clocks are distributed using
dedicated clock buffers
❑ These buffers have multiple inputs:
2 separate clocks and a clock enable
❑ Can use this to "power off" large design
blocks by turning off the clock
More On Xilinx Clock Buffers...
❑ Have 12 dedicated clock lines
❑ Driven by global buffers called BUFG
❑ Two interesting additions:
❑ Has two clock inputs:
Allows switching between the two
clocks
❑ Each clock also can have a global
enable:
Allows turning off large blocks of logic
by turning off the clock
The Leakage Challenge – Power in Standby
Standby Static Power Reduction Approaches
❑ Multi-Vth design
❑ Power gating
❑ Body biasing
❑ Supply voltage ramping
Power Gating
Very often called “MTCMOS” (when using high- and low-threshold devices)
Power Gating ─ Concept
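Leakage current reduces because
❑ Increased resistance in leakage path
❑ Stacking effect introduces source biasing
[Figure: gate with IN = 0 and output OUT between VDD and a sleep transistor (Sleep), modeled as extra resistance RS; leakage Ileak through M1 raises the source voltage VS = Ileak RS, which shifts VTH of M1]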
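The source-biasing effect can be sketched as a self-consistent calculation in Python. The model is an assumption made for illustration: subthreshold leakage falls exponentially with the source voltage, Ileak = I0·exp(-VS / (n·vT)), with VS = Ileak·RS, and the additional VTH shift from the body effect is ignored. The values of I0, RS, n, and vT are hypothetical:

```python
import math


def gated_leakage(i0, r_s, n=1.5, v_thermal=0.0259, iterations=200):
    """Solve I = i0 * exp(-I * r_s / (n * v_thermal)) for I by bisection.

    i0 is the ungated leakage current (A), r_s the sleep-device resistance (Ohm).
    Returns (i_leak, v_s) with v_s = i_leak * r_s.
    """
    lo, hi = 0.0, i0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mid - i0 * math.exp(-mid * r_s / (n * v_thermal)) < 0.0:
            lo = mid      # guessed current too small to be self-consistent
        else:
            hi = mid
    i_leak = 0.5 * (lo + hi)
    return i_leak, i_leak * r_s


# Hypothetical values: 1 uA ungated leakage, 100 kOhm sleep-device resistance.
i_leak, v_s = gated_leakage(1e-6, 1e5)
print(i_leak, v_s, i_leak / 1e-6)
```

Because the real device also sees a VTH shift, this sketch likely understates the reduction; it is only meant to show how VS = Ileak·RS feeds back into the leakage current.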
Power Gating Options
[Figure: power gating options, with sleep transistors controlled by sleep signals]
"Hurry Up And Wait"
❑ Often end up in a position where you
can't reduce the supply voltage
❑ FPGAs
❑ Constrained by some other voltage in
an ASIC
❑ So slowing down doesn't help your
energy consumption
❑ But we can pause and use less
energy that way if we can shut down
swaths of logic
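A minimal Python sketch of the comparison at a fixed supply voltage. It assumes one operation per clock cycle, a fixed switching energy per operation, a constant leakage power while the logic is powered, and a much smaller residual power while it is shut down. All names and numbers are hypothetical:

```python
def energy_run_slow(ops, e_op, p_leak, deadline_s):
    """Stretch the work over the whole deadline; logic leaks the entire time."""
    return ops * e_op + p_leak * deadline_s


def energy_hurry_and_wait(ops, e_op, p_leak, p_sleep, f_max_hz, deadline_s):
    """Finish at full speed, then shut the logic down until the deadline."""
    t_active = ops / f_max_hz          # assumes one operation per cycle
    if t_active > deadline_s:
        raise ValueError("work does not fit in the deadline even at f_max")
    return ops * e_op + p_leak * t_active + p_sleep * (deadline_s - t_active)


# Hypothetical task: 1e6 operations at 1 nJ each, 100 ms deadline.
slow = energy_run_slow(1e6, 1e-9, 0.1, 0.1)
fast = energy_hurry_and_wait(1e6, 1e-9, 0.1, 0.001, 1e8, 0.1)
print(slow, fast)
```

The switching energy is identical in both cases because the voltage is fixed; the difference comes from how long the logic sits powered and leaking.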
Wires
The importance of wires
Wires dominate in modern designs
Impact of Interconnect Parasitics
Interconnect Length Distribution
SLocal = STechnology
SGlobal = SDie