
dce

2013

COMPUTER ARCHITECTURE
CE2013
Faculty of Computer Science and Engineering
Department of Computer Engineering
BK TP.HCM

Vo Tan Phuong
http://www.cse.hcmut.edu.vn/~vtphuong

Chapter 4
Single-cycle & Pipeline Processor

Computer Architecture – Chapter 4.2 ©2013, CE 2


Single-Cycle Processor Overview
[Figure: single-cycle datapath. Labeled blocks: PC, Next PC (J, Beq, Bne; Imm26, Imm16; Jump or Branch Target Address), Instruction Memory, Registers (RA, RB, RW, BusA, BusB, BusW), ALU (zero, ALU result), ALU Ctrl (ALUop, func), Data Memory (Address, Data_in, Data_out), Main Control (Op). Control signals: PCSrc, RegDst, RegWrite, ExtOp, ALUSrc, MemRead, MemWrite, MemtoReg.]

Exercise 1
Fill in the values of the control signals for the following instructions:
a. slt $t0,$s0,$zero
RegDst  RegWrite  ExtOp  ALUSrc  Beq  Bne  J  MemRead  MemWrite  MemtoReg
1       1         x      0       0    0    0  0        0         0

b. bne $t0,$zero,exit_label
RegDst  RegWrite  ExtOp  ALUSrc  Beq  Bne  J  MemRead  MemWrite  MemtoReg
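One way to check answers for this exercise is to write the control signals down as a small table in code. The sketch below is only illustrative: the slt row is the one given on the slide, while the bne row is an assumed answer (it is not on the slide) for a datapath where the branch compares two registers in the ALU and writes nothing back. The names SIGNALS, CONTROL, and format_row are hypothetical.

```python
# Control-signal rows for the two instructions in Exercise 1.
# "x" marks a don't-care value.
SIGNALS = ["RegDst", "RegWrite", "ExtOp", "ALUSrc", "Beq", "Bne", "J",
           "MemRead", "MemWrite", "MemtoReg"]

CONTROL = {
    # Row given on the slide for: slt $t0,$s0,$zero
    "slt": dict(zip(SIGNALS, ["1", "1", "x", "0", "0", "0", "0", "0", "0", "0"])),
    # ASSUMED answer (not from the slide) for: bne $t0,$zero,exit_label
    # No register write, both ALU operands come from registers, Bne asserted.
    "bne": dict(zip(SIGNALS, ["x", "0", "x", "0", "0", "1", "0", "0", "0", "x"])),
}


def format_row(instr):
    """Return the control signals of one instruction as 'Name=value' pairs."""
    return " ".join(f"{name}={CONTROL[instr][name]}" for name in SIGNALS)


for instr in ("slt", "bne"):
    print(f"{instr}: {format_row(instr)}")
```

RegDst and MemtoReg are don't-cares for bne in this sketch because RegWrite = 0, so nothing reaches the register file regardless of what those multiplexers select.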

Exercise 2
• We wish to add the instruction jalr (jump and link
register) to the single-cycle datapath. Add any necessary
datapath elements and control signals and draw the resulting
datapath. Show the values of the control signals that
control the execution of the jalr instruction.
• The jump and link register instruction is described
below:

• One solution:
(Note: JReg means Jump Register; RA means Return Address.)

• The main control signals for the JALR instruction are the
same as for other R-type instructions, such as ADD and SUB.
These control signals are shown in the table below:

• The ALU control signals for the JALR instruction are shown
below. JReg = 1 and RA = 1. ALUCtrl is a don't care.
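As a rough behavioral reference for what the added JReg and RA paths have to accomplish, the sketch below models the architectural effect of jalr. It assumes a byte-addressed PC with no branch delay slot (the slide's datapath keeps a 30-bit word address and adds 1 instead of 4), and the function name jalr and the dictionary-based register file are hypothetical.

```python
def jalr(pc, regs, rs, rd):
    """Model 'jalr rd, rs': link the return address into rd, jump to regs[rs].

    Returns the next PC and updates regs in place.
    ASSUMPTION: byte-addressed PC, no delay slot, so the return address is pc + 4.
    """
    target = regs[rs]      # read the jump target first, in case rd == rs
    regs[rd] = pc + 4      # RA path: return address written to the register file
    return target          # JReg path: next PC comes from the register value


# Usage: call through $t0 (register 8), link into $ra (register 31).
regs = {8: 0x00400100, 31: 0}
next_pc = jalr(0x00400000, regs, rs=8, rd=31)
print(hex(next_pc), hex(regs[31]))
```

Reading the target before writing the link register mirrors the hardware, where the register file is read early in the cycle and written at the clock edge.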

Exercise 3
We want to compare the performance of a single-cycle CPU design
with a multi-cycle CPU. Suppose we add the multiply and divide
instructions. The operation times are as follows:
o Instruction memory access time = 190 ps, Data memory access time = 190 ps
o Register file read access time = 150 ps, Register file write access time = 150 ps
o ALU delay for basic instructions = 190 ps, ALU delay for multiply or divide = 550 ps
Ignore the other delays in the multiplexers, control unit, sign-extension, etc.
Assume the following instruction mix: 30% ALU, 15% multiply & divide, 15%
load, 15% store, 15% branch, and 10% jump.

a. What is the total delay for each instruction class, and what is the clock
cycle for the single-cycle CPU design?
b. Assume we fix the clock cycle to 200 ps for a multi-cycle CPU. What is the
CPI for each instruction class, and what is the speedup over the fixed-length
clock cycle of the single-cycle design?

a. Total delay for each instruction:

Clock cycle = max delay = 1040 ps (the multiply & divide class)
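The per-class delays can be recomputed from the component times given in the exercise. The sketch below is one way to do it; the list of units each class passes through is an assumption based on the usual single-cycle datapath (in particular, a jump is assumed to need only the instruction fetch), and the constant names are hypothetical.

```python
# Component delays in picoseconds, taken from the exercise statement.
IM, DM = 190, 190                    # instruction / data memory access
RF_READ, RF_WRITE = 150, 150         # register file read / write
ALU_BASIC, ALU_MULDIV = 190, 550     # ALU delay: basic vs. multiply/divide

# ASSUMED path through the datapath for each instruction class.
DELAY_PS = {
    "ALU":               IM + RF_READ + ALU_BASIC + RF_WRITE,
    "Multiply & divide": IM + RF_READ + ALU_MULDIV + RF_WRITE,
    "Load":              IM + RF_READ + ALU_BASIC + DM + RF_WRITE,
    "Store":             IM + RF_READ + ALU_BASIC + DM,
    "Branch":            IM + RF_READ + ALU_BASIC,
    "Jump":              IM,
}

# The single-cycle clock must accommodate the slowest instruction class.
clock_cycle_ps = max(DELAY_PS.values())

for name, delay in DELAY_PS.items():
    print(f"{name:18s} {delay:5d} ps")
print(f"Clock cycle = {clock_cycle_ps} ps")
```

Whatever path is assumed for the jump, the maximum is set by multiply & divide (190 + 150 + 550 + 150 = 1040 ps), which agrees with the clock cycle stated above.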

b. CPI for each instruction:
CPI for Basic ALU = 4 cycles
CPI for Multiply & Divide = 6 cycles (ALU takes 3 cycles)
CPI for Load = 5 cycles
CPI for Store = 4 cycles
CPI for Branch = 3 cycles
CPI for Jump = 2 cycles

Average CPI = 0.3 * 4 + 0.15 * 6 + 0.15 * 5 + 0.15 * 4 + 0.15 * 3 + 0.1 * 2 = 4.1

Speedup of multi-cycle over single-cycle = (1040 * 1) / (200 * 4.1) = 1.27
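The arithmetic above can be checked with a short script. This is only a sketch: the instruction mix, per-class CPI values, and clock periods are the ones from the exercise, and the variable names are hypothetical.

```python
# Instruction mix (fraction of executed instructions) from the exercise.
MIX = {"ALU": 0.30, "MulDiv": 0.15, "Load": 0.15,
       "Store": 0.15, "Branch": 0.15, "Jump": 0.10}

# Multi-cycle CPI per class at a 200 ps clock, as listed in the solution.
CPI = {"ALU": 4, "MulDiv": 6, "Load": 5, "Store": 4, "Branch": 3, "Jump": 2}

SINGLE_CYCLE_CLOCK_PS = 1040   # single-cycle design, CPI = 1
MULTI_CYCLE_CLOCK_PS = 200

# Weighted average CPI of the multi-cycle design.
avg_cpi = sum(MIX[c] * CPI[c] for c in MIX)

# Same instruction count on both designs, so the speedup is the ratio of
# average time per instruction (clock period * CPI).
speedup = (SINGLE_CYCLE_CLOCK_PS * 1) / (MULTI_CYCLE_CLOCK_PS * avg_cpi)

print(f"Average CPI = {avg_cpi:.2f}")
print(f"Speedup     = {speedup:.2f}")
```

The instruction count cancels out of the ratio, which is why only clock period and CPI appear in the speedup.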

Exercise 4
• Identify all the RAW data dependencies in the following
code. Which dependencies are data hazards that will be
resolved by forwarding? Which dependencies are data
hazards that will cause a stall? Using a graphical
representation of the pipeline, show the forwarding paths
and stalled cycles if any.
add $3, $4, $2
sub $5, $3, $1
lw $6, 200($3)
add $7, $3, $6

• RAW dependencies:
add $3, $4, $2 and sub $5, $3, $1 (forwarding)
add $3, $4, $2 and lw $6, 200($3) (forwarding)
lw $6, 200($3) and add $7, $3, $6 (stall 1 cycle, then forward)
add $3, $4, $2 and add $7, $3, $6 (no hazard: value read from the register file)
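The dependency list above can be reproduced mechanically. The sketch below scans the four instructions for RAW dependencies and classifies each one by the distance between producer and consumer. It assumes the classic 5-stage pipeline with forwarding, where the register file is written in the first half of a cycle and read in the second half, so a distance of three or more needs no forwarding. The tuple encoding and the function names are hypothetical.

```python
# Each instruction: (opcode, destination register, list of source registers).
PROGRAM = [
    ("add", "$3", ["$4", "$2"]),
    ("sub", "$5", ["$3", "$1"]),
    ("lw",  "$6", ["$3"]),          # lw $6, 200($3)
    ("add", "$7", ["$3", "$6"]),
]


def raw_dependencies(program):
    """Return (producer_index, consumer_index, register) for every RAW dependency."""
    deps = []
    for j, (_, _, sources) in enumerate(program):
        for src in sources:
            # Walk backwards to the most recent earlier instruction writing src.
            for i in range(j - 1, -1, -1):
                if program[i][1] == src:
                    deps.append((i, j, src))
                    break
    return deps


def classify(program, i, j):
    """Classify a dependency, ASSUMING a 5-stage pipeline with forwarding."""
    distance = j - i
    if program[i][0] == "lw" and distance == 1:
        return "stall 1, forward"    # load-use hazard
    if distance <= 2:
        return "forwarding"
    return "from register"           # write-back finishes before the read


for i, j, reg in raw_dependencies(PROGRAM):
    print(f"I{i + 1} -> I{j + 1} on {reg}: {classify(PROGRAM, i, j)}")
```

The classification uses program-order distance only; the stall inserted for the load-use hazard can only increase the real distance, so it does not change any of the labels here.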

Exercise 5
• We have a program of 10^6 instructions in the format of “lw, add,
lw, add,…”. The add instruction depends only on the lw instruction
right before it. The lw instruction also depends only on the add
instruction right before it. If this program is executed on the 5-stage
MIPS pipeline:
a. Without forwarding, what would be the actual CPI?
It takes 6 cycles on average to complete one LW and one ADD:
1 cycle (to complete LW) + 2 cycles (bubbles) + 1 cycle (to complete ADD) + 2
cycles (bubbles) = 6 cycles
So it takes 6 cycles to complete 2 instructions.
Average CPI = 6/2 = 3
b. With forwarding, what would be the actual CPI?
It takes only 3 cycles on average to complete one LW and one ADD:
1 cycle (to complete LW) + 1 cycle (bubble) + 1 cycle (to complete ADD) = 3
cycles
So it takes 3 cycles to complete 2 instructions.
Average CPI = 3/2 = 1.5
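Both answers follow the same pattern: count the cycles one lw/add pair occupies in steady state and divide by two. A minimal sketch of that calculation, ignoring the few cycles needed to fill the pipeline (negligible for 10^6 instructions), is shown below; the function name average_cpi is hypothetical.

```python
def average_cpi(bubbles_after_lw, bubbles_after_add):
    """Steady-state CPI of an alternating lw/add sequence.

    Each instruction contributes 1 cycle, plus the bubbles inserted before
    the dependent instruction that follows it.
    """
    cycles_per_pair = 1 + bubbles_after_lw + 1 + bubbles_after_add
    return cycles_per_pair / 2


# Without forwarding: 2 bubbles after each instruction.
cpi_no_forwarding = average_cpi(bubbles_after_lw=2, bubbles_after_add=2)

# With forwarding: only the load-use hazard costs 1 bubble; add -> lw costs none.
cpi_forwarding = average_cpi(bubbles_after_lw=1, bubbles_after_add=0)

print(f"CPI without forwarding = {cpi_no_forwarding}")
print(f"CPI with forwarding    = {cpi_forwarding}")
```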