
2. Physics of Power Dissipation in CMOS FET Devices
• For an ideal MIS diode, the energy difference ψms between the metal work function ψm and the semiconductor work function ψs is zero:
• ψms ≡ ψm − (χ + Eg/2q + ΨB) = 0 (2.1)
• where χ is the semiconductor electron affinity (from the conduction band to the vacuum level), Eg the band gap (from valence band to conduction band), ψB the potential barrier between the metal and the insulator, and ΨB the potential difference between the Fermi level EF and the intrinsic Fermi level Ei.

The Fermi-Dirac Function
• fFD(E) = 1 / (1 + exp((E – EF) / kT))
• The Fermi-Dirac distribution function gives the probability that a given energy state will be occupied by an electron.
• As in a gas, the electrons in a solid are in constant motion and are consequently changing their energy and momentum.
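
As a quick numerical illustration of the Fermi-Dirac function, the following minimal sketch evaluates the occupation probability around EF. The function name `fermi_dirac`, the use of electron-volts, and the 300 K default are assumptions made for this example only.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def fermi_dirac(energy_ev, fermi_ev, temp_k=300.0):
    """Probability that a state at energy_ev (eV) is occupied by an electron."""
    x = (energy_ev - fermi_ev) / (K_B_EV * temp_k)
    # Avoid overflow in exp() for states far above the Fermi level.
    if x > 700.0:
        return 0.0
    return 1.0 / (1.0 + math.exp(x))


# A state exactly at E_F is occupied with probability 1/2; states a few kT
# above E_F are mostly empty and states a few kT below are mostly full.
for offset in (-0.1, 0.0, 0.1):
    print(offset, fermi_dirac(0.5 + offset, 0.5))
```

The function is symmetric about EF in the sense that f(EF + d) + f(EF − d) = 1, which is a handy sanity check.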

P-type

CMOS Gate Power equations
• $P = C_L V_{DD}^{2} f_{0\to 1} + t_{sc} V_{DD} I_{peak} f_{0\to 1} + V_{DD} I_{leakage}$
• Dynamic term: $C_L V_{DD}^{2} f_{0\to 1}$
• Short-circuit term: $t_{sc} V_{DD} I_{peak} f_{0\to 1}$
• Leakage term: $V_{DD} I_{leakage}$
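
The three terms of the gate power equation can be evaluated separately to see which one dominates. This is a minimal sketch; the function name and all numeric values are illustrative assumptions, not data from the slides.

```python
def cmos_gate_power(c_load, v_dd, f_switch, t_sc, i_peak, i_leak):
    """Return (dynamic, short_circuit, leakage) power in watts.

    c_load   : load capacitance C_L in farads
    v_dd     : supply voltage in volts
    f_switch : 0->1 transition frequency in hertz
    t_sc     : duration of the short-circuit current pulse in seconds
    i_peak   : peak short-circuit current in amperes
    i_leak   : leakage current in amperes
    """
    dynamic = c_load * v_dd ** 2 * f_switch
    short_circuit = t_sc * v_dd * i_peak * f_switch
    leakage = v_dd * i_leak
    return dynamic, short_circuit, leakage


# Hypothetical operating point, chosen only to exercise the formula.
dyn, sc, leak = cmos_gate_power(
    c_load=50e-15, v_dd=1.8, f_switch=100e6,
    t_sc=50e-12, i_peak=100e-6, i_leak=1e-9,
)
print(f"dynamic={dyn:.3e} W, short-circuit={sc:.3e} W, leakage={leak:.3e} W")
```

Only the dynamic term is quadratic in VDD, which is why supply-voltage reduction is the most effective lever on it.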

• Maxwell-Boltzmann statistics relate the equilibrium hole concentration to the intrinsic Fermi level:
• p0 = ni exp((Ei – EF)/kT) (2.2)
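
Equation (2.2) can be used in both directions: to get p0 from Ei – EF, or to get the bulk potential ΨB = (Ei – EF)/q from a known doping level. The sketch below assumes full ionization (p0 = NA) and an intrinsic concentration of 1e10 cm^-3 for silicon at 300 K; both are assumptions for illustration, as are the function names.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def hole_concentration(n_i, ei_minus_ef_ev, temp_k=300.0):
    """Equilibrium hole concentration p0 from equation (2.2)."""
    return n_i * math.exp(ei_minus_ef_ev / (K_B_EV * temp_k))


def bulk_potential(n_a, n_i, temp_k=300.0):
    """Psi_B in volts, obtained by inverting (2.2) with p0 = N_A (assumed)."""
    return K_B_EV * temp_k * math.log(n_a / n_i)


N_I = 1e10   # cm^-3, assumed intrinsic concentration of Si at 300 K
N_A = 1e17   # cm^-3, hypothetical acceptor doping
psi_b = bulk_potential(N_A, N_I)
print(f"Psi_B = {psi_b:.3f} V")
print(f"p0    = {hole_concentration(N_I, psi_b):.3e} cm^-3")
```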

P substrate (The Fermi level EF in the semiconductor is now –qV below the Fermi level in the metal gate.)

P substrate

• If the applied voltage is increased sufficiently, the bands bend far enough that level Ei at the surface crosses over to the other side of level EF.
• This is brought about by the tendency of carriers to occupy states with the lowest total energy.
• In this condition of inversion, Ei at the surface lies below EF, so EF is closer to Ec than to Ev there and electrons outnumber holes at the surface.

At the onset of strong inversion, the bands at the surface have bent by 2ΨB: Ei at the surface now lies as far below EF as it lies above EF in the bulk, where ΨB is the potential difference between the Fermi level EF and the intrinsic Fermi level Ei in the bulk.

• The value of V necessary to reach the onset
of strong inversion is called the threshold
voltage.

Surface Space Charge Region
and the Threshold Voltage
• Poisson equation
• ∇ · D = ρ(x, y, z) (2.3)
• where D, the electric displacement vector, is equal to εs E under low-frequency or static conditions; εs is the permittivity of Si; E is the electric field vector; and ρ(x, y, z) is the total electric charge density.

Threshold voltage
• $V_T = \frac{2d}{\varepsilon_i}\sqrt{q\,\varepsilon_s N_A \Psi_B \left(1 - e^{-2\beta\Psi_B}\right)} + 2\Psi_B$
• The total voltage needed to offset the effect of a nonzero work function difference and the presence of the charges is referred to as the flat-band voltage VFB:
• $V_{FB} = \psi_{ms} - Q_T\, d / \varepsilon_i$

Threshold voltage
• $V_T = \frac{2d}{\varepsilon_i}\sqrt{q\,\varepsilon_s N_A \Psi_B \left(1 - e^{-2\beta\Psi_B}\right)} + 2\Psi_B + V_{FB}$
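
To get a feel for the magnitudes in the threshold-voltage expression, the sketch below evaluates it in SI units. It assumes that d is the insulator thickness, εi and εs are the insulator and semiconductor permittivities, β = q/kT, and that the potential in the formula is the bulk potential ΨB defined earlier. The device values (1e17 cm^-3 doping, 10 nm SiO2) are hypothetical.

```python
import math

Q = 1.602176634e-19       # elementary charge, C
K_B = 1.380649e-23        # Boltzmann constant, J/K
EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m


def threshold_voltage(n_a, psi_b, d, eps_i, eps_s, v_fb=0.0, temp_k=300.0):
    """V_T in volts; n_a in m^-3, d in m, permittivities in F/m."""
    beta = Q / (K_B * temp_k)  # assumed meaning of beta: q / kT
    charge = math.sqrt(
        Q * eps_s * n_a * psi_b * (1.0 - math.exp(-2.0 * beta * psi_b))
    )
    return (2.0 * d / eps_i) * charge + 2.0 * psi_b + v_fb


# Hypothetical device: N_A = 1e17 cm^-3 = 1e23 m^-3, 10 nm SiO2 on Si.
vt_ideal = threshold_voltage(
    n_a=1e23, psi_b=0.417, d=10e-9,
    eps_i=3.9 * EPS0, eps_s=11.7 * EPS0,
)
print(f"V_T (V_FB = 0) = {vt_ideal:.3f} V")
```

A nonzero flat-band voltage simply shifts the result, which is why VFB appears as an additive term in the second form of the equation.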

2.2.3.1 Effects Influencing
Threshold Voltage
• VT decreases when L (length) is decreased,
varies with Z (width), and decreases when
the drain-source voltage VDS is increased.

• Drain-induced barrier lowering (DIBL) is
the basis for a number of more complex
models of the threshold voltage shift.
• It refers to the decrease in threshold voltage
due to the depletion region charges in the
potential barrier between the source and the
channel at the semiconductor surface.

• A recent model adopts a quasi-two-dimensional approach to solving the two-dimensional Poisson equation.
• dEx/dx at each point (x, y) can be replaced with the average of its value at (0, y) and at (W, y).

Short channel effect
• The minimum value of the surface potential
increases with decreasing channel length
and increasing VDS.

2.2.3.2 Subsurface Drain-Induced
Barrier Lowering (Punchthrough)
• The punchthrough voltage VPT is defined as the value of VDS at which ID,st reaches some specific magnitude with VGS = 0.
• The parameter VPT can be roughly approximated as the value of VDS for which the sum of the widths of the source and the drain depletion regions becomes equal to L.

• If the field in the oxide, Eox, is large enough, the voltage drop across the depletion layer suffices to enable tunneling in the drain via a near-surface trap.
• The minority carriers emitted to the incipient inversion layer are laterally removed to the substrate, completing a path for a gate-induced drain leakage (GIDL) current. In CMOS circuits this leakage current contributes to standby power.

2.3 Power Dissipation in CMOS
• The first MOS ICs were fabricated in a PMOS process, owing to the simplicity of fabricating a p-channel enhancement-mode MOS field-effect transistor (PMOST) with threshold voltage VTp < 0.
• The higher mobility of electrons compared with holes prompted the move to the NMOS process.
• The subsequent change to CMOS was driven by the power dissipation problem.

• This advantage of CMOS over NMOS has proven
to be important enough that the shortcomings of
CMOS are overlooked.
• The CMOS process is more complex than the NMOS process, CMOS requires guard rings to get around the latch-up problem, and CMOS circuits require more transistors than the equivalent NMOS circuits.

• The threshold voltages place a limit on the minimum supply voltage that can be used without incurring unreasonable delay penalties.
• If the threshold voltage is too low, the static component of the power due to subthreshold currents becomes significant.

2.3.1 Short-Circuit Dissipation
• The short-circuit dissipation of the gate varies with the output load and the input signal slope.
• The short-circuit dissipation decreases roughly linearly, both in absolute terms and as a fraction of the total dissipation, as the output load is increased up to a critical value, after which it increases again rapidly.
• For simplicity, a symmetrical inverter (i.e., βN = βP and VTn = −VTp) and a symmetrical input signal (rise time = fall time) are considered.
• $I = \frac{\beta}{2}\,(V_{in} - V_T)^2$ for $0 \le I \le I_{max}$
• $I_{mean} = \frac{1}{T}\int_0^T I(t)\,dt = 2 \cdot \frac{2}{T}\int_{t_1}^{t_2} \frac{\beta}{2}\,(V_{in}(t) - V_T)^2\,dt$
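• Assuming the rising and falling portions of the input voltage waveform to be linear ramps,
• $V_{in}(t) = \frac{V_{DD}}{\tau}\,t$
• $I_{mean} = 2 \cdot \frac{2}{T}\int_{(V_T/V_{DD})\tau}^{\tau/2} \frac{\beta}{2}\left(\frac{V_{DD}}{\tau}\,t - V_T\right)^2 dt$
• Let $\theta = \frac{V_{DD}}{\tau}\,t - V_T$, so that $dt = \frac{\tau}{V_{DD}}\,d\theta$

• $I_{mean} = \frac{2\beta}{T}\,\frac{\tau}{V_{DD}}\int_{0}^{V_{DD}/2 - V_T} \theta^2\,d\theta$
• $I_{mean} = \frac{1}{12}\,\frac{\beta}{V_{DD}}\,(V_{DD} - 2V_T)^3\,\frac{\tau}{T}$
• The short-circuit power dissipation of an unloaded inverter is
• $P_{SC} = \frac{\beta}{12}\,(V_{DD} - 2V_T)^3\,\frac{\tau}{T}$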
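
The closed form can be checked by integrating the saturation current numerically. With a linear input ramp Vin(t) = VDD·t/τ and integration limits (VT/VDD)τ to τ/2, the integral evaluates to a cubic in (VDD − 2VT). The sketch below compares that closed form against a midpoint-rule integration; the function names and parameter values are assumptions for illustration.

```python
def short_circuit_power(beta, v_dd, v_t, tau, period):
    """Closed form: P_SC = beta/12 * (V_DD - 2*V_T)^3 * tau / T."""
    return beta / 12.0 * (v_dd - 2.0 * v_t) ** 3 * tau / period


def short_circuit_power_numeric(beta, v_dd, v_t, tau, period, steps=10_000):
    """Midpoint-rule integration of I = beta/2 * (Vin - V_T)^2."""
    t1 = v_t / v_dd * tau      # input ramp reaches V_T
    t2 = tau / 2.0             # input ramp reaches V_DD / 2
    dt = (t2 - t1) / steps
    charge = 0.0
    for k in range(steps):
        t = t1 + (k + 0.5) * dt
        v_in = v_dd * t / tau  # linear input ramp
        charge += 0.5 * beta * (v_in - v_t) ** 2 * dt
    i_mean = 2.0 * (2.0 / period) * charge
    return v_dd * i_mean


# Hypothetical inverter: beta in A/V^2, 1 ns input edges, 100 MHz clock.
args = dict(beta=100e-6, v_dd=3.3, v_t=0.7, tau=1e-9, period=10e-9)
print(short_circuit_power(**args), short_circuit_power_numeric(**args))
```

The expression vanishes when VDD = 2VT, which matches the physical picture: with VTn = |VTp| = VDD/2 the two transistors never conduct simultaneously.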

• If the inverter is lightly loaded, so that the output rise and fall times are shorter than the input rise and fall times, the short-circuit dissipation increases to become comparable to the dynamic dissipation.
• To minimize dissipation, an inverter should be designed so that the input rise and fall times are about equal to the output rise and fall times.

2.3.2 Dynamic Dissipation
• Assuming that the input Vin is a square wave having a period T and that the rise and fall times of the input are much less than the repetition period, the dynamic dissipation is given by
• $P_D = C_L V_{DD}^2 / T$

• When V = VDD, the energy drawn from the supply for a 0→1 output transition is $E_{0\to 1} = C_L V_{DD}^2$.
• The energy stored in a capacitor with capacitance CL and voltage VDD across its plates is $C_L V_{DD}^2/2$; the rest of the energy, another $C_L V_{DD}^2/2$, is converted into heat.
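
The half-stored, half-dissipated split does not depend on the resistance of the charging path. The sketch below illustrates this with a forward-Euler simulation of a capacitor charged from VDD through a fixed resistor; modeling the pull-up network as a linear resistor is a simplifying assumption, and the names and values are hypothetical.

```python
def charge_capacitor(c_load, v_dd, resistance, steps=100_000, n_tau=20.0):
    """Simulate charging C_L from 0 V through a resistor tied to V_DD.

    Returns (energy drawn from supply, energy stored on C_L, heat in R),
    all in joules, after n_tau time constants.
    """
    dt = n_tau * resistance * c_load / steps
    v = 0.0
    e_supply = 0.0
    e_heat = 0.0
    for _ in range(steps):
        i = (v_dd - v) / resistance          # current through the resistor
        e_supply += v_dd * i * dt            # energy delivered by the supply
        e_heat += i * i * resistance * dt    # energy dissipated in the resistor
        v += i * dt / c_load                 # capacitor voltage update
    e_stored = 0.5 * c_load * v * v
    return e_supply, e_stored, e_heat


for r in (1e3, 1e5):
    e_sup, e_sto, e_heat = charge_capacitor(1e-12, 1.8, r)
    print(f"R={r:g}: supply={e_sup:.4e} stored={e_sto:.4e} heat={e_heat:.4e}")
```

Changing R changes how long the transition takes, not how much energy is lost per transition.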

Networks of pass transistors

2.3.3 The Load Capacitance

• The overall load capacitance is modeled as the parallel combination of four capacitances: the gate capacitance Cg, the overlap capacitance Cov, the diffusion capacitance Cdiff, and the interconnect capacitance Cint.

2.3.3.2 The Overlap Capacitance
• Cgd1 = Cgd2 = 2 Cox xd W
• Cgd3 = Cgd4 = Cgs3 = Cgs4 = Cox xd W
• The total overlap capacitance is simply the sum of all the above:
– Cov = Cgd1 + Cgd2 + Cgd3 + Cgd4 + Cgs3 + Cgs4
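
Summing the six contributions listed above gives Cov = 8 · Cox · xd · W when every transistor has the same width W, as the slide's formulas imply. A minimal sketch, with hypothetical parameter values and an assumed function name:

```python
def overlap_capacitance(c_ox, x_d, width):
    """Total overlap capacitance C_ov in farads.

    c_ox  : gate-oxide capacitance per unit area (F/m^2)
    x_d   : gate-to-diffusion overlap length (m)
    width : transistor width W (m), taken equal for all devices
    """
    c_unit = c_ox * x_d * width
    c_gd1 = c_gd2 = 2.0 * c_unit                  # C_gd1, C_gd2 counted twice
    c_gd3 = c_gd4 = c_gs3 = c_gs4 = c_unit        # remaining four terms
    return c_gd1 + c_gd2 + c_gd3 + c_gd4 + c_gs3 + c_gs4


# Hypothetical values: 3.45 fF/um^2 oxide, 0.05 um overlap, 1 um width.
print(overlap_capacitance(c_ox=3.45e-3, x_d=0.05e-6, width=1e-6))
```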

2.3.3.3 Diffusion Capacitance
• Two components: the bottom-wall area capacitance and the sidewall capacitance

2.4.1 Principles of Low-Power
Design
• Using the lowest possible supply voltage
• Using the smallest geometry, highest frequency devices
but operating them at the lowest possible frequency
• Using parallelism and pipelining to lower required
frequency of operation
• Power management by disconnecting the power source
when the system is idle
• Designing systems to have lowest requirements on
subsystem performance for the given user level
functionality

2.4.3 Fundamental Limits
• The limit from thermodynamic principles results from the need to have, at any node with an equivalent resistor R to ground, the signal power Ps exceed the available noise power Pavail.
• The quantum theoretic limit on low power comes from the Heisenberg uncertainty principle. In order to be able to measure the effect of a switching transition of duration Δt, it must involve an energy greater than h/Δt:
• $P \ge h / (\Delta t)^2$, where h is Planck's constant.
• Finally, the fundamental limit based on electromagnetic theory requires the velocity of propagation of a high-speed pulse on an interconnect to be always less than the speed of light in free space, c0:
• L/τ ≤ c0, where L is the length of the interconnect and τ is the interconnect transit time.
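
Both the quantum and the electromagnetic limit reduce to one-line bounds, so it is easy to see how far practical designs sit from them. The sketch below is illustrative only; the function names and the sample switching time and wire length are assumptions.

```python
PLANCK_H = 6.62607015e-34   # Planck's constant, J*s
C0 = 299_792_458.0          # speed of light in free space, m/s


def quantum_power_limit(delta_t):
    """Minimum power (W) for a switching transition of duration delta_t (s)."""
    return PLANCK_H / delta_t ** 2


def min_transit_time(length_m):
    """Lower bound (s) on the transit time of an interconnect of given length."""
    return length_m / C0


# Hypothetical numbers: a 1 ns transition and a 1 cm global wire.
print(f"P   >= {quantum_power_limit(1e-9):.3e} W")
print(f"tau >= {min_transit_time(0.01):.3e} s")
```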
2.4.4 Material Limits
• The attributes of a semiconductor material that determine the properties of a device built with the material are
• Carrier mobility μ
• Carrier saturation velocity σs
• Self-ionizing electric field strength Ec
• Thermal conductivity K
• Consider an SOI structure, modeled by surrounding the above generic device with a hemispherical shell of SiO2 of radius ri; this represents a two-order-of-magnitude reduction in thermal conductivity.

• The response time of the global interconnect circuit is
• τ = (2.3 Rtr + Rint) Cint, where Rtr is the output resistance of the driving transistor and Rint and Cint are the total resistance and capacitance, respectively, of the global interconnect.
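
A short numerical example of the interconnect response-time expression follows. The driver and wire values are hypothetical and the function name is an assumption; the point is only to show the relative weight of the driver and wire resistances.

```python
def interconnect_response_time(r_tr, r_int, c_int):
    """tau = (2.3 * R_tr + R_int) * C_int, in seconds.

    r_tr  : output resistance of the driving transistor (ohms)
    r_int : total resistance of the global interconnect (ohms)
    c_int : total capacitance of the global interconnect (farads)
    """
    return (2.3 * r_tr + r_int) * c_int


# Hypothetical case: 1 kOhm driver, 200 Ohm wire, 1 pF wire capacitance.
tau = interconnect_response_time(r_tr=1e3, r_int=200.0, c_int=1e-12)
print(f"tau = {tau:.2e} s")
```

With these numbers the driver term (2.3 kΩ) dominates the wire term (200 Ω), so upsizing the driver would help more than widening the wire.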

2.4.7 System Limits
• The architecture of the chip
• The power-delay product of the CMOS
technology used to implement the chip
• The heat removal capacity of the chip
package
• The clock frequency
• Its physical size
Energy characterization
• Transition-sensitive energy models
  – Single energy tables
    • Bit-independent modules, e.g., flip-flops
  – Multiple energy tables
    • Large bit-dependent modules, e.g., 32-b adders
    • Large multi-element modules, e.g., register files
  – Transition-sensitive energy equations
  – System-level interconnect capacitance values
• Analytical energy models
  – Cache and main memory

Transition-sensitive energy
model
• Must first design and lay out a functional unit and then simulate it to capture switch capacitances
  – Bit independent: bus lines, pipeline registers
    • One bit switching does not affect other bit slices’ operations
  – Bit dependent: ALU, decoders
• Once constructed, the models can be reused in
simulations of other architectures built with the
same technology

Switch Capacitance Table
Previous Input Vector   Current Input Vector   Switch Capacitance
0…00                    0…00                   cap 0→0
0…00                    0…01                   cap 0→1
…                       …                      …
1…11                    1…11                   cap (2^n − 1)→(2^n − 1)
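
A switch capacitance table of this shape maps naturally onto a dictionary keyed by (previous vector, current vector). The sketch below accumulates the switched capacitance over a sequence of input vectors for a hypothetical 1-bit module; the capacitance values are made up, and converting capacitance to energy as C·VDD² is an assumption of this example.

```python
from typing import Dict, Iterable, Tuple

SwitchCapTable = Dict[Tuple[str, str], float]


def switched_capacitance(table: SwitchCapTable, vectors: Iterable[str]) -> float:
    """Sum the table entries for each consecutive pair of input vectors."""
    total = 0.0
    prev = None
    for cur in vectors:
        if prev is not None:
            total += table[(prev, cur)]
        prev = cur
    return total


# Hypothetical 1-bit module; capacitances in farads.
TABLE: SwitchCapTable = {
    ("0", "0"): 0.0,
    ("0", "1"): 12e-15,
    ("1", "0"): 10e-15,
    ("1", "1"): 0.0,
}

trace = ["0", "1", "1", "0", "1"]
cap = switched_capacitance(TABLE, trace)
v_dd = 1.8
print(f"switched capacitance = {cap:.3e} F, energy ~ {cap * v_dd ** 2:.3e} J")
```

For an n-bit input the full table has 2^n × 2^n entries, which is what motivates compressing it.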

Table Compression
• Problem
– Results in a large uncompressed table (e.g., 16-bit adder → 2^32 rows)
– Excessive simulation (e.g., 2^32!)
• Solution
– Clustering algorithm. Reference: Huzefa Mehta, et al., “Module Energy Characterization using Clustering”, DAC’96
– For a 16-bit adder, keeping 12% average error → 1000 simulation points, 97 rows

2:1 Multiplexer Table

Uncompressed (64 rows)
000 000 0.00
000 001 0.00
000 010 0.00
000 011 0.00
000 100 0.04
000 101 0.05
000 110 0.04
000 111 0.05
001 000 0.00
001 001 0.00
… …

Compressed (32 rows)
000 0xx 0.00
000 100 0.04
000 101 0.05
000 110 0.04
000 111 0.05
001 0xx 0.00
… …

Reduced (11 rows)
000 0xx 0.00
000 1xx 0.045
001 0xx 0.00
… …
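
The compressed and reduced tables can be read as wildcard patterns, where x matches either bit value. The sketch below looks up the same transition in both tables using only the rows visible on the slide; it is not the clustering algorithm from the cited paper, just an illustration of how a compressed table is consulted and what accuracy is traded away.

```python
def matches(pattern, bits):
    """True if bits fits pattern, where 'x' in the pattern matches 0 or 1."""
    return len(pattern) == len(bits) and all(
        p in ("x", b) for p, b in zip(pattern, bits)
    )


def lookup(table, prev, cur):
    """Return the value of the first row whose two patterns match."""
    for prev_pat, cur_pat, value in table:
        if matches(prev_pat, prev) and matches(cur_pat, cur):
            return value
    raise KeyError((prev, cur))


# Only the rows shown on the slide; the elided rows are not reproduced.
COMPRESSED = [
    ("000", "0xx", 0.00), ("000", "100", 0.04), ("000", "101", 0.05),
    ("000", "110", 0.04), ("000", "111", 0.05), ("001", "0xx", 0.00),
]
REDUCED = [
    ("000", "0xx", 0.00), ("000", "1xx", 0.045), ("001", "0xx", 0.00),
]

exact = lookup(COMPRESSED, "000", "101")
approx = lookup(REDUCED, "000", "101")
print(exact, approx, abs(exact - approx))
```

The compressed table is lossless for the rows it merges (all merged values are equal), while the reduced table replaces 0.04 and 0.05 with their average and so introduces a small error per lookup.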


Memory System Energy Model
• Parameterizable analytical energy models for the on-chip memories that capture
– Energy dissipated by bitlines: precharge, read and write cycles
– Energy dissipated by wordlines: when a particular row is being read and written
– Energy dissipated by storage cell on access
– Energy dissipated by address decoders
– Energy dissipated by peripheral circuits: cache control logic, comparators, etc.
• Off-chip main memory energy is based on per-access cost

Cache energy model example
• On-chip cache
– Energy = Ebus + Ecell + Epad
– Ecell = β * (Wl_length) * (Bl_length + 4.8) * (Nhit + 2 * Nmiss)
– Wl_length = m * (T + 8L + St)
– Bl_length = C / (m * L)
– Nhit = number of hits; Nmiss = number of misses;
– C = cache size; L = cache line size in bytes;
– m = set associativity; T = tag size in bits;
– St = # of status bits per line;
– β = 1.44e-14 (technology based cell access cost of SRAM)
– Em = 4.95e-9 (technology based access cost of DRAM)
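
The Ecell expression is straightforward to evaluate for a concrete cache configuration. The sketch below computes only the cell term (Ebus and Epad are not defined on the slide, so they are left out). The cache geometry and hit/miss counts are hypothetical, the cache size is taken to be in bytes, and `cell_access_cost` is an assumed name for the 1.44e-14 technology constant.

```python
CELL_ACCESS_COST = 1.44e-14  # technology-based SRAM cell access cost (from the slide)


def cache_cell_energy(cache_size, line_size, assoc, tag_bits, status_bits,
                      n_hit, n_miss, cell_access_cost=CELL_ACCESS_COST):
    """Ecell for an on-chip cache.

    cache_size : C, cache size in bytes (assumed unit)
    line_size  : L, cache line size in bytes
    assoc      : m, set associativity
    tag_bits   : T, tag size in bits
    status_bits: St, status bits per line
    """
    wl_length = assoc * (tag_bits + 8 * line_size + status_bits)
    bl_length = cache_size / (assoc * line_size)
    return cell_access_cost * wl_length * (bl_length + 4.8) * (n_hit + 2 * n_miss)


# Hypothetical 8 KB, 2-way cache with 32-byte lines, 20-bit tags, 2 status bits.
e_cell = cache_cell_energy(
    cache_size=8192, line_size=32, assoc=2, tag_bits=20, status_bits=2,
    n_hit=900, n_miss=100,
)
print(f"Ecell = {e_cell:.3e}")
```

Each miss is weighted twice as heavily as a hit in this model, so the miss rate directly scales the cell energy.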

Architectural Level Analysis
Considerations
• Very computationally efficient
– Requires predefined analytical and transition-
sensitive energy characterization models
– Requires design only to RTL (with some idea
as to the kind of functional units planned)
– Coarse grain – use of gated clocks implicit
• Reasonably accurate (within 5% - 15% of
SPICE)
• Simulation based, so can be used to support architectural, compiler, OS, and application level experimentation
• WattWatcher (Sente), DesignPower and PowerCompiler (Synopsys), prototype academic tools (Wattch – Princeton, SimplePower – PSU)
