Professional Documents
Culture Documents
Pipe 1 New
Pipe 1 New
Critério de Aprovação:
duas provas + prova final
para passar sem prova final, média >= 7
para passar com prova final, média >= 5
direito à 2a. chamada
para ir para a prova final, média >= 4
Conteúdo
Capítulo 6: Enhancing Performance with Pipelining
Capítulo 9: Multiprocessors
2
cs 152 L1 3 .2 DAP Fa97, U.CB
Recap: Microprogramming
3
cs 152 L1 3 .3 DAP Fa97, U.CB
Summary: Microprogramming one inspiration for RISC
5
cs 152 L1 3 .5 DAP Fa97, U.CB
Definitions of control variables
S <= AIf- AB = B
S <= A fun B S <= A op ZX S <= A + SX S <= A + SX then PC <= S
0100 0110 1000 1011 0010
overflow
Memory
Write-back
R[rd] <= S R[rt] <= S R[rt] <= M
0101 0111 1010
7
The Big Picture: Where are We Now?
Processor
Input
Control
Memory
Datapath Output
Today’s Topics:
Pipelining by Analogy
8
6.1 An overview of Pipelining. It’s Natural!
6 PM 7 8 9 10 11 12 1 2 AM
6 PM 7 8 9 10 11 12 1 2 AM
base addressing
Ex: lw $1, 100 ($0) $1 = Memory[$0 + 100] 13
cs 152 L1 3 .13
DAP Fa97, U.CB
We will limit our attention to 8 instrs
figure 6.3
Ideal speedup is number of stages in the pipeline. Do we
achieve this? 15
cs 152 L1 3 .15 DAP Fa97, U.CB
Performance Metrics
In the single-cycle design model (every instruction
takes 1 clock cycle), the clock cycle must be
stretched to accommodate the slowest instruction
(Load is slowest and takes 8ns in MIPS).
In the pipelined design, the pipeline stages take a
single clock cycle. The clock cycle must be long
enough to accommodate the slowest operation
(2ns).
Assume write to reg file occurs in first half of clock
cycle and read from reg file occurs in second half of
cycle.
Speedup = execution time without pipeline /
execution time with pipeline = ideally to number of
stages.
(definition in pg. 101)
cs 152 L1 3 .16
16
DAP Fa97, U.CB
Performance Improvement
19
cs 152 L1 3 .19
DAP Fa97, U.CB
Graphically Representing Pipelines
20
cs 152 L1 3 .20 DAP Fa97, U.CB
Conventional Pipelined Execution Representation
Time
21
cs 152 L1 3 .21 DAP Fa97, U.CB
Single Cycle, Multiple Cycle, vs. Pipeline
Cycle 1 Cycle 2
Clk
Single Cycle Implementation:
Load Store Waste
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10
Clk
Pipeline Implementation:
Load Ifetch Reg Exec Mem Wr
Multicycle Machine
10 ns/cycle x 4.6 CPI (due to inst mix) x 100 inst = 4600 ns
cs 152 L1 3 .23
23
DAP Fa97, U.CB
Why Pipeline? Because the resources are there!
ALU
I Im Reg Dm Reg
n Inst 0
s
ALU
t
r.
Inst 1 Im Reg Dm Reg
ALU
O Inst 2 Im Reg Dm Reg
r
d Inst 3
ALU
Im Reg Dm Reg
e
r
Inst 4
ALU
Im Reg Dm Reg
24
cs 152 L1 3 .24
DAP Fa97, U.CB
Can pipelining get us into trouble?
Yes: Pipeline Hazards (next instr can’t execute in next
cycle)
structural hazards: attempt to use the same resource two different
ways at the same time
- E.g., combined washer/dryer would be a structural hazard or
folder busy doing something else (watching TV)
data hazards: attempt to use item before it is ready
- E.g., one sock of pair in dryer and one in washer; can’t fold until
get sock from washer through dryer
- instruction depends on result of prior instruction still in the
pipeline
control hazards: attempt to make a decision before condition is
evaluated
- E.g., washing football uniforms and need to get proper
detergent level; need to see after dryer before next load in
- branch instructions
ALU
I Mem Reg Mem Reg
n Load
s
ALU
Mem Reg Mem Reg
t
r.
Instr 1
ALU
Mem Reg Mem Reg
O Instr 2
r
ALU
d Mem Reg Mem Reg
e
Instr 3
r
ALU
Mem Reg Mem Reg
Instr 4
Detection is easy in this case! (right half highlight means read, left half write)
26
cs 152 L1 3 .26
DAP Fa97, U.CB
Structural Hazards limit performance
27
cs 152 L1 3 .27
DAP Fa97, U.CB
Control Hazard Solutions
ALU
s Mem Reg Mem Reg
t Add
r.
ALU
Mem Reg Mem Reg
Beq
O
r
Load
ALU
Mem Reg Mem Reg
d
e
r
29
cs 152 L1 3 .29
DAP Fa97, U.CB
Control Hazard Solutions
ALU
s Mem Reg Mem Reg
t Add
r.
ALU
Mem Reg Mem Reg
Beq
O
r
Load
ALU
Mem Reg Mem Reg
d
e
r
31
cs 152 L1 3 .31
DAP Fa97, U.CB
Control Hazard Solutions
Redefine branch behavior (takes place after next
instruction) “delayed branch”. Place a “safe”instr after
the branch instr. The compiler rearranges instrs when
possible (doesn’t change program results).
I Time (clock cycles)
n
ALU
s Mem Reg Mem Reg
t Add
r.
ALU
Mem Reg Mem Reg
Beq
O
r
ALU
d Misc Mem Reg Mem Reg
ALU
r Load Mem Reg Mem Reg
Impact: 0 clock cycles per branch instruction if can find instruction to put
in “slot” ( 50% of time)
Longer pipes, more slots, harder to fill. 32
cs 152 L1 3 .32
DAP Fa97, U.CB
Data Hazard on r1
add r1 ,r2,r3
sub r4, r1 ,r3
and r6, r1 ,r7
or r8, r1 ,r9
33
cs 152 L1 3 .33
DAP Fa97, U.CB
Data Hazard on r1:
ALU
I add r1,r2,r3 Im Reg Dm Reg
ALU
s Im Reg Dm Reg
t
sub r4,r1,r3
r.
ALU
Im Reg Dm Reg
and r6,r1,r7
O
ALU
r Im Reg Dm Reg
d or r8,r1,r9
e
ALU
Im Reg Dm Reg
r xor r10,r1,r11
cs 152 L1 3 .34
34
DAP Fa97, U.CB
Data Hazard Solution:
ALU
I add r1,r2,r3 Im Reg Dm Reg
ALU
s Im Reg Dm Reg
t
sub r4,r1,r3
r.
ALU
Im Reg Dm Reg
and r6,r1,r7
O
ALU
r Im Reg Dm Reg
d or r8,r1,r9
e
ALU
r xor r10,r1,r11 Im Reg Dm Reg
ALU
lw r1,0(r2) Im Reg Dm Reg
ALU
Im Reg Dm Reg
sub r4,r1,r3
37
cs 152 L1 3 .37
DAP Fa97, U.CB
Pipelined Datapath (as in book); hard to read
IF ID EX MEM WB
Fig 6.12
Data flowing from right to left does not affect
the current instr, only later instrs in the pipe
38
cs 152 L1 3 .38
DAP Fa97, U.CB
Pipelined Processor (almost) for slides
WB Ctrl
Mem Ctrl
IRmem
IRwb
Dcd Ctrl
IRex
Ex Ctrl
IR
Equal
Reg.
File
Exec
Reg
A
Next PC
File
S
PC
Access
Mem
M
Mem
Data
39
cs 152 L1 3 .39
DAP Fa97, U.CB
Control and Datapath
Equal
R[rd] <– S; R[rt] <– S; R[rd] <– M;
Reg.
Inst. Mem
File
Exec
Reg
A
Next PC
File
S
PC
IR
M
B
Access
Mem
Mem
Data
D 40
cs 152 L1 3 .40
DAP Fa97, U.CB
Pipelining the Load Instruction
41
cs 152 L1 3.41
DAP Fa97, U.CB
The Four Stages of R-type
42
cs 152 L1 3 .42
DAP Fa97, U.CB
Pipelining the R-type and Load Instruction
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9
Clock
43
cs 152 L1 3 .43
DAP Fa97, U.CB
Important Observation
R-type uses Register File’s Write Port during its 4th stage
1 2 3 4
R-type Ifetch Reg/Dec Exec Wr
44
cs 152 L1 3 .44
DAP Fa97, U.CB
Solution 1: Insert “Bubble” into the Pipeline
46
cs 152 L1 3 .46
DAP Fa97, U.CB
Modified Control & Datapath
Equal
R[rd] <– M; R[rt] <– M; R[rd] <– M;
Reg.
Inst. Mem
File
Exec
Reg
A M
Next PC
File
S
PC
IR
Access
Mem
Mem
Data
D 47
cs 152 L1 3 .47
DAP Fa97, U.CB
The Four Stages of Store
48
cs 152 L1 3 .48
DAP Fa97, U.CB
The Three Stages of Beq
Reg/Dec:
Registers Fetch and Instruction Decode
Exec:
compares the two register operand,
select correct branch target address
latch into PC
49
cs 152 L1 3 .49 DAP Fa97, U.CB
Control
Diagram
IR <- Mem[PC]; PC < PC+4;
Equal
R[rd] <– S; R[rt] <– S; R[rd] <– M;
Reg.
Inst. Mem
File
Exec
Reg
A M
Next PC
File
S
PC
IR
Access
Mem
Mem
Data
D
50
cs 152 L1 3 .50 DAP Fa97, U.CB
Datapath + Data Stationary Control
IR v v v
fun rw rw
Inst. Mem
rw
Decode
wb wb wb
rt me me WB
rs ex Mem
op Ctrl
Ctrl
rs rt im
Reg.
File
Reg
File A M
Exec
S
Access
Mem
Mem
Data
D
Next PC
PC
cs 152 L1 3 .51
51
DAP Fa97, U.CB
Let’s Try it Out
10 lw r1, r2(35)
14 addI r2, r2, 3
20 sub r3, r4, r5
24 beq r6, r7, 100
these addresses are octal
30 ori r8, r9, 17
34 add r10, r11, r12
52
cs 152 L1 3 .52
DAP Fa97, U.CB
Start: Fetch 10
Inst. Mem n n n n
Decode
WB
Mem
Ctrl
Ctrl
IR im
rs rt
Reg.
File
Reg
A M
File
Exec
S
B 10 lw r1, r2(35)
Access
Mem
Mem
Data
D 14 addI r2, r2, 3
20 sub r3, r4, r5
Next PC
n n n
lw r1, r2(35)
Inst. Mem
Decode
WB
Mem
Ctrl
Ctrl
IR im
2 rt
Reg.
File
Reg
A M
File
Exec
S
B 10 lw r1, r2(35)
Access
Mem
Mem
Data
D 14 addI r2, r2, 3
20 sub r3, r4, r5
Next PC
Decode
lw r1
WB
Mem
Ctrl
Ctrl
IR
35
2 rt
Reg.
File
Reg
r2 M
File
Exec
S
B 10 lw r1, r2(35)
Access
Mem
Mem
14 addI r2, r2, 3
Data
D
20 sub r3, r4, r5
24 beq r6, r7, 100
Next PC
Decode
lw r1
WB
Mem
Ctrl
Ctrl
IR
4 5
3
Reg.
r2+35
File
Reg
r2 M
File
Exec
B
10 lw r1, r2(35)
Access
Mem
Mem
Data
D 14 addI r2, r2, 3
20 sub r3, r4, r5
Next PC
Decode
addI r2
sub r3
lw r1
WB
Mem
Ctrl
Ctrl
IR
6 7
M[r2+35]
Reg.
File
Reg
r2+3
r4
File
Exec
r5 10 lw r1, r2(35)
Access
Mem
Mem
Data
D 14 addI r2, r2, 3
20 sub r3, r4, r5
Next PC
ori r8, r9 17
Inst. Mem
Decode
addI r2
sub r3
WB
beq
Mem
Ctrl
Ctrl
r1=M[r2+35]
IR
100
9 xx
Reg.
r2+3
File
Reg
r4-r5
r6
File
Exec
r7 10 lw r1, r2(35)
Access
Mem
Mem
14 addI r2, r2, 3
Data
D
20 sub r3, r4, r5
24 beq r6, r7, 100
Next PC
Decode
ori r8
sub r3
beq
WB
Mem
Ctrl
Ctrl
17
11 12
Reg.
IR r1=M[r2+35]
r4-r5
File
Reg
r9
File
Exec
xxx
r2 = r2+3
x 10 lw r1, r2(35)
Access
Mem
Mem
14 addI r2, r2, 3
Data
D
20 sub r3, r4, r5
24 beq r6, r7, 100
Next PC
100
30 ori r8, r9, 17
34 add r10, r11, r12
PC
ooops, we should have only one delayed instruction 100 and r13, r14, 15
cs 152 L1 3 .59
59
DAP Fa97, U.CB
Fetch 104, Dcd 100, Ex 34, Mem 30, WB 24
n
add r10
Decode
ori r8
beq
WB
Mem
Ctrl
Ctrl
xx
14 15
r9 | 17
Reg.
IR r1=M[r2+35]
File
xxx
Reg
r11
File
Exec
r2 = r2+3
r3 = r4-r5
r12 10 lw r1, r2(35)
Access
Mem
Mem
14 addI r2, r2, 3
Data
D
20 sub r3, r4, r5
24 beq r6, r7, 100
Next PC
104
30 ori r8, r9, 17
34 add r10, r11, r12
PC
Inst. Mem n
add r10
Decode
and r13
ori r8
WB
Mem
Ctrl
Ctrl
xx
r11+r12
r9 | 17
Reg.
IR r1=M[r2+35]
File
Reg
r14
File
Exec
r2 = r2+3
r3 = r4-r5
r15 10 lw r1, r2(35)
Access
Mem
Mem
14 addI r2, r2, 3
Data
D
20 sub r3, r4, r5
24 beq r6, r7, 100
Next PC
110
30 ori r8, r9, 17
34 add r10, r11, r12
PC
and r13
add r10
Decode
NO Ovflow
WB
Mem
Ctrl
Ctrl
r11+r12
Reg.
IR r1=M[r2+35]
File
Reg
File
Exec
r2 = r2+3
r3 = r4-r5
r8 = r9 | 17
Access
10 lw r1, r2(35)
Mem
Mem
Data
D 14 addI r2, r2, 3
20 sub r3, r4, r5
Next PC
Squash the extra instruction in the branch shadow! 100 and r13, r14,
62 15
cs 152 L1 3 .62
DAP Fa97, U.CB
Summary: Pipelining
64
cs 152 L1 3 .64
DAP Fa97, U.CB