Professional Documents
Culture Documents
PD Lec03
PD Lec03
PD Lec03
“Gate Delay”
=88
FO4
Delays CPU Clock Periods
B8
1985-2005 '$,-/)7A?
A8 '$,-/)4A?
'$,-/)C-$,'3D
'$,-/)C-$,'3D)6
@8 '$,-/)C-$,'3D)7
MIPS 2000
'$,-/)C-$,'3D)4
5 stages
?8 '$,-/)',#$'3D
E/CF#)6=8?4
E/CF#)6==?4
>8 Pentium Pro E/CF#)6=6?4
10 stages 9C#"%
48 93C-"9C#"%
9C#"%?4
Historical 78
G'C(
HI)IE
limit: I&J-")IK
about 68 EGL)M?
EGL)M@
12 =8
EGL)NA?O?4
Pentium 4
20 stages
8
A> A? A@ AA AB B8 B= B6 B7 B4 B> B? B@ BA BB 88 8= 86 87 84 8>
!"#$%&'()*#+&$,-
./#+&$,-0(,#$.&"12-13 456756887 6 9,#$.&"1):$';-"(',<
∆ ∝C /I
L
ISAT ∝ W/L
‣ Transistor
I drains
‣ Interconnection
(wires/
contacts/vias)
‣ Transistor Gates
Wires
‣ So far, simple capacitors:
C ∝ Area = width ∗ length
‣ Wires have finite resistance, so have distributed R and C:
1. # of levels of logic
2. Internal cell delay
3. wire delay
4. cell input capacitance
5. cell fanout
6. cell output drive strength
Lecture 03, Timing 13 CS250, UC Berkeley Fall ‘10
sed at the and also relieves stall speed paths in the instruction fetch and
st
gn issues,
sures that
branch prediction units.
200
Deferred register dependency stalls. While register depen-
late
e is seven dencies are checked in the RF stage, stalls due to these hazards
eline, and are deferred until the X1 stage. All the necessary operands are
s inserted then captured from result-forwarding busses as the results are
nto ARM
16 b, two
returned to the register file. 150
One of the major goals of the design was to minimize the en-
humb in- ergy consumed to complete a given task. Conventional wisdom
ipeline is has been that shorter pipelines are more efficient due to re-
100
ng
50
rs 0
!40 !20 0 20 40 60 80 100 120 140 160 180 200 220 240 260 280
*
Timing slack (ps)
From “The circuit and physical design of the POWER4 microprocessor”, IBM J Res and Dev, 46:1, Jan 2002, J.D. Warnock et al.
g
as Figure 26
Histogram of the POWER4 processor path delays.
Timing Analysis Tools
‣ Static Timing Analysis: Tools use delay models for
gates and interconnect. Traces through circuit paths.
‣ Cell delay models capture
‣ For each input/output pair, internal delay (output load
independent)
delay
‣ output dependent delay
‣ Standalone tools (PrimeTime) and part of logic
synthesis.
output load
‣ Back-annotation takes information from results of
place and route to improve accuracy of timing
analysis.
‣ DC in “topographical mode” uses preliminary layout
information to model interconnect parasitics.
‣ Prior versions used a simple fan-out model of gate
loading.
Lecture 04, Timing 17 CS250, UC Berkeley Fall ‘09