Download as pdf or txt
Download as pdf or txt
You are on page 1of 64

Architecture and System Level Optimization

of Power Consumption

Anantha Chandrakasan
Massachusetts Institute of Technology

Power and Throughput in CMOS

Throughput (1/delay)

1.5
1/delay, .6m ring oscillator
1.0

0.5

0.0
1.0

2.0

3.0
4.0
VDD

5.0

6.0

How to scale VDD keeping a fixed throughput?


Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

Architecture Trade-offs - Reference Datapath

1
T

LATCH C

COMPARATOR

A>B

ADDER

LATCH B

1
T

LATCH A

COMPARATOR

Area = 636 x 833 2


1
T

Critical path delay Tadder + Tcomparator (= 25ns)


fref = 40Mhz
Total capacitance being switched = Cref
Vdd = Vref = 5V
Power for reference datapath = Pref = Cref Vref2 fref
from [Chandrakasan92] (IEEE JSSC)
Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

Parallel Datapath

LATCH B

ADDER

COMPARATOR

LATCH C

ADDER

COMPARATOR

LATCH C

LATCH B

1
2T

COMPARATOR

MUX

1
2T

A>B

LATCH A

1
2T

LATCH A

COMPARATOR

1
2T

1
2T

A>B

1
T

1
2T

Area = 1476 x 1219 2

The clock rate can be reduced by half with the same


throughput fpar = fref / 2
Vpar = Vref / 1.7, Cpar = 2.15Cref
Ppar = (2.15Cref) (Vref/1.7)2 (fref/2) 0.36 Pref
Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

The Faster the Better??


NORMALIZED POWER

1.00

Fixed Throughput

0.90

Minimal Area

0.80
0.70
0.60
0.50
0.40
0.30
0.20
0.10
0.00

Minimal Power
1.00

2.00

3.00

4.00

5.00

Vdd (volts)

Capacitance overhead starts to dominate at high levels


of parallelism and results in an optimum voltage

THE ANSWER IS NO!


Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

Pipelined Datapath

LATCH C1

LATCH C2

LATCH P

LATCH B

COMPARATOR

1
T

A>B

ADDER

1
T

LATCH A

1
T

COMPARATOR

C
1
T

1
T

Area = 640 x 1081 2

Critical path delay is less max [Tadder , Tcomparator]


Keeping clock rate constant: fpipe = fref
Voltage can be dropped Vpipe = Vref / 1.7
Capacitance slightly higher: Cpipe = 1.15Cref
Ppipe = (1.15Cref) (Vref/1.7)2 fref 0.39 Pref
Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

Architecture Summary for a Simple Datapath

Architecture type

Voltage

Simple datapath
(no pipelining or
parallelism)

5V

Pipelined datapath

2.9V

1.3

0.39

Parallel datapath

2.9V

3.4

0.36

Pipeline-Parallel

2.0V

3.7

0.2

Architecture and System Level Optimization of Power Consumption

Area

Power

Anantha Chandrakasan 1997

Algorithmic Transformations
XN
XN

YN

YN

2D

Loop Unrolling

XN-1

Ceff = 1
Voltage = 5
Throughput = 1
Power = 25

*
*
+

A
YN-1

Ceff = 1
Voltage = 5
Throughput = 1
Power = 25

Loop-unrolling does not reduce power consumption


from [Chandrakasan95] (IEEE TCAD)
Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

Loop Unrolling Enables Other Transformations


XN

2D

*
A

2D

A
*
+

Pipelining
YN-1

XN-1

After
Algebraic Transformations,
&
Constant Propagation
Ceff = 1.5
Voltage = 3.7
Throughput = 1
Power = 20 (20% reduction)

Architecture and System Level Optimization of Power Consumption

YN

*
A2

XN-1

XN

YN

*
A2

*
+

YN-1

Ceff = 1.5
Voltage = 2.9
Throughput = 1
Power = 12.5 (x2 reduction)

Anantha Chandrakasan 1997

Speed vs. Power Optimization


25
21

POWER (Fixed Throughput)


17
13
9

SPEEDUP
VOLTAGE

5
1

CAPACITANCE

Unrolling Factor

Area can be traded for Higher Throughput or Lower Power


ARBITRARY SPEEDUP vs. FINITE POWER REDUCTION
Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

10

Multiple Supply Voltage Systems: Filter Example


1
2
3
4
5

* * * *

* *

+
+

+
+

2.4V

6
7

*
+

*
+

8
9
10

* *

3V

5V

+
Power (5V) / Power (5V,3V, 2.4V)= 1.5

from [Raje95]
Similar approach to logic design proposed in [Usami95]
Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

11

Self Regulating Voltage Reduction Technique


VDD_Ref
Equivalent
Critical
Path

VDD_Ref
Signal

+
-

Comparator
Equivalent
Critical
Path
Regulated Voltage to DSP
from [Macken90]

Feedback adjusts the regulated voltage to the point


where the equivalent critical path is about to fail
Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

12

Implementation Using PLL


Reference
Clock

Phase
Comparator
System Clock

Loop
Filter

Voltage
Regulator

Regulated VDD
Ring
Oscillator

from [Macken90], [Horowitz93]

Limited by bandwidth of the feedback loop


Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

13

Low Voltage Switching Regulator


PASS Vg1
DEVICE

Vx
Lf

M1

Vin

ILf

Cin

Vg2

M2

Cx

Cf

RL Vdd
-

SYNCHRONOUS RECTIFIER

Arbitrary Vdd (<Vin) generated using the Buck converter


Vdd = Vin Duty Cycle at Node X
Chief sources of inefficiencies:

Conduction loss (I2R)


Switching loss (Cx Vin2 fs and Ls I2 fs)
Gate-drive loss (Cg Vin2 fs)
from [Stratakos94]
(IEEE PESC)

Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

14

Soft-Switching Eliminates CxV2f Loss


PASS Vin
DEVICE
Vgp
M1

Vx
ILf

Iout

Vx
Vgn

Lf
M2

Cx

Cf

RL

ILf
Iout

RECTIFIER

Dead-time when neither


FET conducts
Current reverses

t
PASS DEVICE ON

|Vgsp|
Vgsn

Lf charges and discharges Cx

FETS ARE SWITCHED WITH VDS = 0


Architecture and System Level Optimization of Power Consumption

RECTIFIER ON

Anantha Chandrakasan 1997

15

What happens when Iout changes?


Iout Cx discharges slowly

Iout Cx discharges quickly

Vgn

Vgn

Vx

Vx

Rectifier Discharges Cx

Body Diode Conduction

Inverter node transition times depends on Iout


Typical schemes use fixed dead time set by gate delays

Adaptive Dead-time Control Needed for varying Iout


Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

16

Normalized FET Losses

ASIC Design: Power Transistor Sizing

Ptotal = Pgd + Pcl

Pcl = b/W

Pgd = afsW
0

Wopt

Gate-Width

Minimize Ptotal = Pgate-drive + Pconduction loss


W opt =

b
-------------a fs

Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

17

Low Voltage Support Circuitry: Level Converter


VddH
4/3
M4

OEN

4/3
M3
VddL

VIN

24/2
M2

8/2,
4/2

24/2
M1

VOUT

VddH

0 VddL
VddL
O

Tri-stateable output driver

Compatibility with 3.3V/5V standard components


(VOH)IN = 1.1V to 5V and (VOH)OUT = 1.1 to 5V
Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

18

Adaptive Power Supply Voltages

Processor

REG

Self-timed

FIFO

FIFO

REG

Control

Power
Supply
VDD(t)

Exploit Data Dependent Computation Times To Vary the Supply


from [Nielsen94]
(IEEE Transactions on VLSI Systems)
Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

19

But Self-timed Circuits Can be Expensive...


Vdd

Vdd

OUTB

OUT

IN
INB
I

Guaranteed transition for every operation!


0->1 = 1

Our Approach Uses Synchronous DSP


Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

20

Variable Supply Voltage Block Diagram

Rate
Controller

Ref_CLK
Workload
Filter

Sample
Workload
Data_In

Actively
Damped
Switching
Supply

Ring
Oscillator

Var_VDD

Var_CLK

Generic

Input
Buffer

DSP

Data_Out

from [Gutnik96]
(IEEE VLSI Circuits Symposium)

Note: cant exploit low-level data dependent computation times


Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

21

Variable Algorithmic Workload


Compare Current Image...

...to Previous Image

Receiver just updates

# of blocks to process depends


on the image activity
Scene change require a lot of
computation.
Fixed throughput, varying
computational requirements.

Exploit Time Varying Computation Requirements


to Vary the Power Supply Voltage
Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

22

Typical MPEG IDCT Histogram

Frequency of Occurrence

0.06

0.04

0.02

0.00
0

500
1000
1500
Number of IDCTs per Frame

Architecture and System Level Optimization of Power Consumption

2000

Anantha Chandrakasan 1997

23

Synchronous Variable Voltage System

Work Load Power


Supply
VDD

Compressed
Video Data

Synchronous

FRAME

Processor

BUFFER

For an MPEG Decoder, the Work Load is Known A Priori


NO BUFFERING NEEDED!
Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

24

Example of Variable Supply

Tsample
Block 1
Block 2

IDLE

Block 3
Block 4

IDLE

Tsample
Block 1

VDD = 5
VDD = 5

Block 2

VDD = 5

Energy Sample = C (5)2 + 1/2C (5) 2

VDD = 2.5

Block 3

VDD = 5

VDD = 5

VDD = 5
VDD = 2.5

Block 4

Energy Sample = C (5)2 + 1/2C (2.5) 2

2
= 18.75C
Architecture and System Level Optimization of Power Consumption

2
= 14C
Anantha Chandrakasan 1997

25

Power vs. Workload

Normalized Power

1.0

VDD = 2V
VT = .5V

0.8

Fixed Power Supply

0.6
0.4
Energy Savings

0.2
0

Variable Power Supply

0.2

0.4
0.6
0.8
Normalized Workload

Architecture and System Level Optimization of Power Consumption

1.0

Anantha Chandrakasan 1997

26

Reduce Power Further By Buffering


Averaged Workload

Xn+Xn+1
2

Power
Supply
VDD(y)

In, In+1

Synchronous

On, On+1

Processor
In+2, In+3

On-2, On-1

Two samples processed every two sample periods


Increased latency
Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

27

Example of Buffering

Tsample
Block 1
Block 2
Block 3
Block 4

Tsample
VDD = 5

Block 1, 2

VDD = 3.75

VDD = 2.5

Block 1, 2

VDD = 3.75

VDD = 5

Block 3,4

VDD = 3.75

VDD = 2.5

Block 3,4

VDD = 3.75

Energy Sample = C (5)2 + 1/2C (2.5) 2

Energy Sample = 2 (0.75) C (3.75)2

2
= 14C
Architecture and System Level Optimization of Power Consumption

2
= 10.5C
Anantha Chandrakasan 1997

28

Graphical Interpretation of Averaging

Normalized Power

1.0
0.8

Fixed Supply
0.6

Variable Supply
(no buffer)

0.4

Averaging

0.2

Average Workload
0

0.2

0.4

0.6

0.8

1.0

Normalized Workload

Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

29

In General, Use an FIR to Filter Workload


Averaged Workload
y[n]

Power
Supply

FIR
x[n]

Synchronous
Processor

y [n] =

i =1
L

i=1

FIFO

FIFO

VDD(y)

ai x [n i ]

ai = 1 ; ( ai > 0 )

Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

30

Buffering Example: MPEG Decoder

Supply Voltage, Volts

4V

Unbuffered

VDDMAX= 3.5V VT=0.7V

3V

2V

Buffered
1V

550

555

560
565
570
Video Frame Number

575

580

Averaging smooths effective workload and power


supply
Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

31

Quantizing the Number of Voltage Levels


1.0

Normalized Power

4 levels, dithered
0.8

4 levels, undithered
0.6

Fixed Supply
0.4

0.2

Arbitrary Levels

0
0

0.2

0.4

0.6

0.8

1.0

Normalized Workload

4 Voltage Levels are Sufficient!


Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

32

Power Supply Control Loop


Reference Clock

Compare

Variable Clock

Update

Power
Supply
VDD

Workload -> VDD


Lookup Table
Workload

Ring
Oscillator

Two Types of Changes:


1) Large, fast changes when workload changes (Open loop)
2) Small adjustments to compensate for temperature and
processvariations (Feedback)
Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

33

Data Driven Processing: IDCT Example


60
Number of Blocks, x103

Number of Blocks, x103

50
40
30
20
10
0

8
16 24 32 40 48 56 64
Number of Non-zero Coefficients

40

20

0
0

8
16 24 32 40 48 56
Number of Non-zero Coefficients

64

Few non-zero coefficients in a typical 8x8 DCT matrix


Conventional algorithms do not exploit distribution
Recast algorithm noting multiplication with zero = NOP!
from [Xanthopoulos96]

I = x0 C0 + x1 C1 + ... + x63 C63


64x1Pixel Data

DCT Data

64x1 Coefficient Kernels

Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

34

CHEN
(separable)

FMIDCT
(instantaneous)

0
0

FMIDCT
(average)

200
400
600
800
1000
Time (Block Sequence Number)

Number of Additions(x1000)

Number of Multiplications (x100)

FMIDCT vs. Conventional Algorithm


4
FMIDCT
(instantaneous)
3
FMIDCT
(average)
2
CHEN
(separable)
1

00

200
400
600
800
Time (Block Sequence Number)

1000

Time variable computational requirement using FMIDCT


-> Can use variable power supply voltage

Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

35

Relationship to Parallelism

Normalized Power

1.0
0.8

Reference System
2x Parallel
16x Parallel

0.6
0.4
0.2
0.0
0.0

0.2
0.4
0.6
0.8
Normalized Workload

1.0

If the workload is typically low, then variable voltage


processing achieves energy efficiency of parallel
solution without area penalty
Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

36

Algorithmic Transformations: VQ Compression


Blocked
Vectors

DISTORTION
CALCULATION

index of
best codeword
(8 bits)

CODEBOOK
(256 levels)

(4x4 pixels)

Algorithm
Full Search
Tree Search
Differential
Tree Search

# of
Memory
Accesses

# of
Multiplications

# of
Adds

# of
Subs

4096
256
136

4096
256
128

3840
240
128

4096
264
0

Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

37

Algorithmic Transformations
Comparing Two Code Vectors
15
D left right =

15

j=0

( X j C leftj )

j=0

( X j C rightj )

OR
15
D left right =
15
D left right =

j=0

2
2
(X C
)

(
X

C
)
leftj
j
rightj
j

2
2

2X j C leftj X 2j C rightj
+ 2X j C
X 2j + C leftj
rightj

j=0

15
D left right =

j=0

2
2
( C leftj
C rightj
)+

15

2X j ( C rightj C leftj )

j=0

from [Fang94]
(VLSI signal processing IV)
Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

38

A Uni-Processor Implementation

MEMORY (36K)

Data

REG
Control

8
INDEX

from [Lidsky94]
(VLSI Signal Processing, VII)

fclock: 8.3 MHz


VDD: 3.3V
Pav: 10.8 mW
Area: 72 mm2

Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

39

A Multi-Processor Implementation

Control

Control
INDEX

fclock: 1.04 MHz

REG

Data

MEM
REG

REG

MEM

MEMORY
(18K)

INDEX

Control
2

INDEX

Pipelined Implementation reduces VDD

Pav: 250W

Distrubuted Resources reduce Physical Capacitance


Memory Power Reduced by factor 8!

Area: 112 mm2

Smaller Clock Frequency reduces activity

VDD: 1.1 V

Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

40

Approximate Filtering Techniques


POWER DOWN

IN

fs

fs

fs

fs

fs

OUT

Dynamically vary the number of operations per sample.

Trade power consumption and filter quality


from [Ludwig95]
(CICC95)
Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

41

How Much Power Reduction is Possible?


Speech

x 10

Filter Length vs. Time


450

0
400

1
0

2000

4000

amplitude

x 10

8000

10000

12000

14000

16000

350

0
1
0

2000

4000

6000

8000

10000

12000

14000

16000

Output of Adaptive Filter

amplitude

6000

Speech Corrupted with Noise

Numberfilterof
Taps
length, N

amplitude

x 10

Average Filter Length = 212

250
200
150
100

0
1
0

Average length = 212

300

50

2000

4000

6000
8000
10000
discrete time index, n

12000

14000

0
0

16000

2000

4000

6000
8000
10000
discrete time index, n

12000

14000

16000

Sample Number

Switched Capacitance Reduction

Peak Number of Operations


Average Number of Operations

Strong Function of Signal Statistics


Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

42

Number Representation
Sign Magnitude

1.0
Rapidly Varying
0.8
0.6
0.4
0.2
0.0

Slowly Varying
0

10

12

14

Transition Probability

Transition Probability

Twos Complement
1.0
0.8

Rapidly Varying

0.6
0.4
0.2
0.0

Bit Number

10

12

14

Bit Number

Sign-extension activity significantly reduced using


sign-magnitude representation
Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

43

Number Representation: Accumulator Example


RST
3

14
SIGN-BIT
(To Control)

RST
4

IN

13

14

ACC

IN

OCLK
CLK
(64Mhz)(640Khz)

CLK
(64Mhz)

13
3

GATED
CLK

Twos Complement

13

GATED
GATED OCLK
CLK
CLK (640Khz)
3
IN
3-bit Magnitude
RST

CLK
(64Mhz)

RST

14

14
ACC

RST
13

OCLK
(640Khz)

GATED OCLK
CLK (640Khz)

Sign Magnitude

Sign magnitude datapath switches 30% less


capacitance for uniformly distributed inputs
Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

44

Twos Complement vs. Sign-Magnitude

Transition Activity

SUM
(Twos Complement)

1.0

SUMB

SUMA + SUMB
(Sign-Magnitude)

0.5

SUMA

0.0
0

10

12

Bit Number

Twos complement datapath has a significantly


higher glitching activity
Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

45

Conditional Inversion Coding


Simple
Control

DataIn

Register

invert

DataOut

Large Capacitance
Bus
from [Stan94]
(1994 International Workshop on Low-power Design)

Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

46

Reducing Activity by Reordering Inputs


SUM1

SUM2

SUM1

IN

IN

>> 7

SUM2

>> 8

>> 8

>> 7

IN

Associativity & Commutativity


IN

IN

Transition Probability

Transition Probability

IN

0.5
0.4

SUM1

0.3

SUM2

0.2
0.1
0.0 0

10

12

14

0.5
0.4
0.3
SUM2

0.2
SUM1
0.1
0.0 0

Bit Number

10

12

14

Bit Number

30% reduction in switching energy


Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

47

Counter 1

BUS1

SHARED BUS

OR
Counter 2

Counter 2

Counter 1

Resource Sharing Can Increase Activity

BUS2

Number of Bus Transitions Per Cycle


= 2 (1 + 1/2 + 1/4 + ...+1/128) 4

# of Bus Transitions Per Cycle

10.0

8.0

6.0

4.0

No Bus-sharing

2.0

0.0

50

100

150

200

250

Skew Between Counter Outputs


Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

48

SPICE Approach to Energy Estimation


Vdd

idd

IN0
IN1
CIN

+
-

Vc = Energy in pJ

COUT
Vdd idd

1pF

100M

SUM

4.0

Voltage, V

3.0

2.0

1.0

voltage across the integrating V


capacitor

IN0

IN1

Energy/ transition is 3.64pJ/8


= 0.44pJ

CIN

100 010
110 001 101 011 111 000

0.00

25

50

75
100
time, ns

125

150

Accurate but extremely slow for large circuits...


Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

49

Switch Level Simulation


Irsim-Cap
(from P. Landman and J. Rabaey)
100
90
80
70
60
50
40
30
20
10
0

Cap (fF/bit)

Irsim-Cap
SPICE

10

20

30

40

50

60

Sample

A
B
X
F

Signal
A

Cphysical N(01) Ceffective % Total


20 fF
1
20 fF
6.9%

20 fF

40 fF

13.8%

10 fF

30 fF

10.3%

100 fF

200 fF

69.0%

Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

50

Intermediate Level: Timing Simulation


Handles reduced signal swings
More accurate in leakage and dc-currents
Up to 2 orders of magnitude faster than SPICE
i(Vdd)
Vdd

in
in
out1

out2

out3
out1
Vdd-Vth
out2

out3

Example: PowerMill (Epic)


Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

51

Architecture Power Analysis

Too Little

Z = X*Y
if (Z<0) then Z=0
Power

Algorithm
Time

Mem

Fast & Accurate

>>

Ctrl

Architecture

Area
Too Late

A
B

Exploration

Gate/Circuit
Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

52

Architecture Level Power Analysis (SPA)


C
reg

reg

Mem

INPUT

RTL Netlist
Power Models
Chip Inputs

+
Ctrl

OUTPUT
Average Power:
Module
Bus
Clock
Control

reg
C

Tool from U.C. Berkeley (P. Landman and J. Rabaey)


from [Landman93] (Proc. EDAC-EUROASIC)
Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

53

Architectural Power Estimation (Sente)


Sentes Watt Watcher Tool
- Performs early [pre-synthesis], full-chip, power estimation
- WW operates at the module-level, not gate-level
- Provides a specific power estimation technique for each full
chip power component

Watt Watchers Approach to Power Estimation


- Synthesizes design into -architectural blocks
- Uses basic Process Technology parameters, not gate-level
models
- Determines circuit activity from functional simulation
- Back-annotates wiring capacitances from floor planners or
layout
- Analyzes Mode-Dependent power contributions
Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

54

Case Study: A Portable Multimedia I/O Terminal


Antenna

Radio Modem

Protocol Module
(2mW)

Video
Decompression
Module
(2mW)

Speech
Pen
Digitizer Codec

Text/Graphics
Frame-Buffer
Module
(1mW)

Protocol, ECC, Buffering, Video Decompression, and I/O


(InfoPad Terminal Developed at U.C. Berkeley)
from [Chandrakasan94]
Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

55

Activity Driven Placement for Low-power

Physical Capacitance, pF

Minimize Cswitched = Cphysical f


Decreasing Activity
1.8

1.2

Display
Data

Address

SRAM Data

f = 1/32
f = 1/16

f = 1/2
0.8

0.2

10

20

30

40

50

Distance from Core to Pads


Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

56

Self-timed Approach for Eliminating Glitching

Sense

Row Decode

Cells

Cells

INACTIVE

Sense

Block Selected
and Sense-amp
Output Valid
32

Enable tri-state drivers after sense-amp outputs are valid


to eliminate glitching on the data-bus.
Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

57

Glitch Free, Low Swing RAM Bitslice


Data Out

Output remains tri-stated


until senseamp/latch has
resolved data

Vdd

OEN
PRE

PRE

Vdd

Column select/
cascode amp

Bitlines precharged to
Vdd - Vtn

Vdd

PRE
SEL0
SEL1

PRE

Vdd
B0 B0

Architecture and System Level Optimization of Power Consumption

Vdd

PRE
Vdd
B1 B1

Anantha Chandrakasan 1997

58

Memory Architecture

MEMORY
CELL
ARRAY
4 4

f
f

Parallel Access

Addr

Row Decoding

Addr

Row Decoding

Serial Access

MEMORY
CELL
ARRAY
4 4

4 4
Mux
4
Latch

4 4
Latch

f/8
4 4

4 4
Mux

8-nibbles

4 bit display interface

Voltage = 3V
Architecture and System Level Optimization of Power Consumption

Voltage = 1.1V
Anantha Chandrakasan 1997

59

Digital YIQ -> Digital RGB


R
G
B

D11 D12 D13


D21 D22 D23
D31 D32 D33

Y
I
Q

Optimized matrix multiplication (6mults -> 8 adds)


Hardwired shift-add operations
Coefficient scaling to minimize shift-add operations
Exploit multiple coefficients multiplied with the
same input

Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

60

Optimizing Multiplications
A = IN * 0 0 1 1
B = IN * 0 1 1 1
A = (IN >>4 + IN >>3)
B = (A + IN >>2)

# of shift-add operations

A = (IN >>4 + IN >>3)


B = (IN >>4 + IN >>3 + IN >>2)
16

14

Only Scaling

12

10

6
1.10

1.15

1.20

1.25

Architecture and System Level Optimization of Power Consumption

Scaling &
Common
Sub-expression
Anantha Chandrakasan 1997

61

Time-multiplexed Architectures
Parallel busses for I,Q
I0

I1

I2

Q0

Q1

Q2

Time-shared bus for I,Q


I0

20

20

Signal Value

30

Signal Value

30

Q1

I1

10
0

-10

-10
-20

I1
T/2

10

Q0

10

20

30

40

Time, Sample Number

50

-20

20

40

60

80

Time, Sample Number

100

Can destroy signal correlations and increase


the switching activity
Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

62

Summary
Signal statistics can be exploited to minimize
the number of transitions required to perform
a given function
Architectural voltage scaling is a key technique for
low-voltage operation
Variable power supply reduces power and buffering
trades latency for power
Chipset for a portable terminal consumes
orders of magnitude lower power compared
to current design approaches

Architecture and System Level Optimization of Power Consumption

Anantha Chandrakasan 1997

63

References
[Chandrakasan92] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen, Low-power Digital CMOS Design, IEEE Journal of
Solid State Circuits, pp. 473-484, April 1992.
[Chandrakasan94] A. P. Chandrakasan, A. Burstein, and R. W. Brodersen, A Low-power Chipset for a Portable Multimedia I/O
Terminal, IEEE Journal of Solid-State Circuits, December 1994.
[Chandrakasan95] A. P. Chandrakasan, R. Mehra, M. Potkonjak, J. Rabaey, and R. W. Brodersen, Optimizing Power Using
Transformations, IEEE Transaction on CAD, vol. 14, pp. 13-32, January 1995.
[Fang94] W.-C. Fang, C.-Y. Chang, B. J. Sheu, O. T.-C. Chen, J. C. Curlander, VLSI systolic binary tree-searched vector quantizer for image compression, IEEE Trans. on VLSI Systems, vol. 2, no. 1, pp. 33-44, Mar. 1994.
[Gutnik96] V. Gutnik, A. Chandrakasan, An Efficient Controller for Variable Supply-Voltage Low Power Processing, 1996
VLSI Circuits Symposium, June 1996.
[Horowitz93] M. Horowitz, Low Power Processor Design Using Self-Clocking, Workshop on Low-power Electronics, 1993.
[Landman93] P. Landman and J. Rabaey, Power Estimation for High Level Synthesis, Prof. EDAC-EUROASIC 93, Paris,
France, pp. 361-366, Feb. 1993.
[Lidsky94] D. Lidsky, J. Rabaey Low-Power Design of Memory Intensive Functions: Case Study: Vector Quantization, VLSI
Signal Processing,VII October 1994 IEEE Press, NY NY.
[Ludwig95] J. Ludwig, S. Nawab, A.P. Chandrakasan,"Low Power Filtering Using Approximate Processing for DSP," IEEE 1995
Custom Integrated Circuits Conference, pp. 185-188, May 1995.
[Macken90] P. Maken, M. Degrauwe, M. Van Paemel, H. Oguey, A Voltage Reduction Technique for Digital Systems, ISSCC
February, 1990.
[Nielsen94] L Nielsen, C. Niessen, J. Sparso, K. van Berkel, Low-Power Operation Using Self-Timed Circuits and Adaptive
Scaling of Supply Voltage, IEEE Transactions on VLSI systems, December 1994, pp 391-397.
[Raje95] S. Raje and M. Sarrafzadeh, Variable Voltage Scheduling, pp. 9-13, International Symposium on Low Power Design,
April 1995.
[Stan94] M. Stan and W. Burleson, Limited-weight Codes for Low-power I/O, 1994 International Workshop on Low-power
Design, pp. 209-214, April 1994.
[Stratakos94] A. Stratakos, S. Sanders, and R. Brodersen, A Low-Voltage CMOS DC-DC Converter for a Portable Battery-Operated
System, IEEE Power Electronics Specialists Conference., pages 619-626, 1994.
[Usami95] K. Usami and M. Horowitz, Clustered Voltage Scaling Technique for Low Power Design, International Symposium on
Low Power Design, pp. 3-8, April 1995.
[Xanthopoulos96] T. Xanthopoulos, A. Chandrakasan, C. Sodini, B. Dally, A Data-Driven IDCT Architecture for Low Power Video
Applications, IEEE ESSIRC, September 1996.

You might also like