Kien Truc May Tinh (Computer Architecture) - Vo Tan Phuong - Chapter 04 Exercises (cuuduongthancong.com)

dce
2013
COMPUTER ARCHITECTURE
CE2013
Faculty of Computer Science and Engineering
BK TP.HCM
Department of Computer Engineering
Vo Tan Phuong
http://www.cse.hcmut.edu.vn/~vtphuong
Chapter 4
Single-Cycle & Pipelined Processor
Computer Architecture – Chapter 4.2 ©2013, CE 2

Single-Cycle Processor Overview
[Figure: single-cycle datapath. Components: PC; Next PC logic (inputs J, Beq, Bne, zero, Imm16, Imm26), with PCSrc selecting PC + 1 or the jump/branch target address; Instruction Memory; Registers (RA, RB, RW, BusA, BusB, BusW); extender (ExtOp); ALU; Data Memory (Address, Data_in, Data_out); multiplexers; Main Control (input Op) and ALU Ctrl (input func). Control signals: RegDst, RegWrite, ExtOp, ALUSrc, ALUop, MemRead, MemWrite, MemtoReg.]

Exercise 1
Fill in the values of the control signals for the following instructions:
a. slt $t0,$s0,$zero
| RegDst | RegWrite | ExtOp | ALUSrc | Beq | Bne | J | MemRead | MemWrite | MemtoReg |
| 1 | 1 | x | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
b. bne $t0,$zero,exit_label
| RegDst | RegWrite | ExtOp | ALUSrc | Beq | Bne | J | MemRead | MemWrite | MemtoReg |
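The control table can be encoded as data so that each row is checked mechanically. The sketch below is a minimal Python model, not part of the original solution. The slt row comes from part (a). The bne row is an assumed answer derived from the datapath above: bne writes neither a register nor memory, the ALU compares BusA with BusB (so ALUSrc = 0), and the branch offset is assumed to be sign-extended inside the Next PC block, which makes ExtOp a don't care. The names `X`, `CONTROL`, `MUST_BE_DEFINED`, and `control_for` are hypothetical.

```python
# Signal order used in the Exercise 1 tables.
SIGNALS = ("RegDst", "RegWrite", "ExtOp", "ALUSrc", "Beq", "Bne", "J",
           "MemRead", "MemWrite", "MemtoReg")

X = None  # don't care

# Signals that change state or control flow must never be don't cares.
MUST_BE_DEFINED = {"RegWrite", "Beq", "Bne", "J", "MemRead", "MemWrite"}

CONTROL = {
    # slt $t0,$s0,$zero (R-type): values from part (a).
    "slt": (1, 1, X, 0, 0, 0, 0, 0, 0, 0),
    # bne $t0,$zero,exit_label: ASSUMED values, derived from the datapath.
    "bne": (X, 0, X, 0, 0, 1, 0, 0, 0, X),
}


def control_for(mnemonic):
    """Return one instruction's control signals as a name -> value dict."""
    row = dict(zip(SIGNALS, CONTROL[mnemonic]))
    undefined = [s for s in MUST_BE_DEFINED if row[s] is X]
    if undefined:
        raise ValueError(f"{mnemonic}: {undefined} must not be don't cares")
    return row


if __name__ == "__main__":
    for name in CONTROL:
        row = control_for(name)
        print(name, " ".join("x" if v is X else str(v) for v in row.values()))
```

The check in `control_for` reflects why only some signals may be left as x: a don't care on RegDst or MemtoReg is harmless when RegWrite = 0, but a don't care on RegWrite, MemWrite, or a branch signal could corrupt state.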

Exercise 2
• We wish to add the instruction jalr (jump and link register) to the single-cycle datapath. Add any necessary datapath and control signals and draw the resulting datapath. Show the values of the control signals that control the execution of the jalr instruction.
• The jump and link register instruction is described
below:

Exercise 2
• One solution:
(JReg means Jump Register; RA means Return Address.)

Exercise 2
• The main control signals for the JALR instruction are the same as for other R-type instructions, such as ADD and SUB. These control signals are shown in the table below:
• The ALU Control signals for the JALR instruction are shown below: JReg = 1 and RA = 1. ALUCtrl is a don't care.
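The effect of the two new signals can be sketched as a pair of multiplexers: JReg chooses where the next PC comes from, and RA chooses what is written back on BusW. This is a minimal Python model under stated assumptions: it follows the standard MIPS definition of `jalr rd, rs` (rd ← PC + 4, PC ← R[rs]) with byte addresses, whereas the 30-bit, word-addressed PC in this datapath increments by 1 instead. The function and argument names are hypothetical, and branch/jump target selection is left out.

```python
def writeback_and_next_pc(pc, bus_a, alu_result, jreg, ra):
    """Model the JReg and RA multiplexers added for jalr.

    pc         -- address of the current instruction (byte address)
    bus_a      -- value read from register Rs
    alu_result -- ALU output, written back by ordinary R-type instructions
    jreg, ra   -- the new control signals (0 or 1)
    Returns (value placed on BusW, next PC).
    """
    pc_plus_4 = pc + 4                        # sequential next-instruction address
    next_pc = bus_a if jreg else pc_plus_4    # JReg = 1: jump to the address in Rs
    bus_w = pc_plus_4 if ra else alu_result   # RA = 1: write the return address
    return bus_w, next_pc


if __name__ == "__main__":
    # jalr $ra, $t9 at address 0x00400000, with $t9 = 0x00400100 (illustrative values)
    bus_w, next_pc = writeback_and_next_pc(0x00400000, 0x00400100, 0, jreg=1, ra=1)
    print(hex(bus_w), hex(next_pc))  # 0x400004 0x400100
```

With JReg = 0 and RA = 0 the same function behaves like an ordinary R-type instruction (write the ALU result, continue at PC + 4), which is why the main control can treat JALR as R-type and leave the distinction to the ALU Control.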

Exercise 3
We want to compare the performance of a single-cycle CPU design with that of a multi-cycle CPU. Suppose we add multiply and divide instructions. The operation times are as follows:
o Instruction memory access time = 190 ps, data memory access time = 190 ps
o Register file read access time = 150 ps, register file write access time = 150 ps
o ALU delay for basic instructions = 190 ps, ALU delay for multiply or divide = 550 ps
Ignore the other delays in the multiplexers, control unit, sign-extension, etc.
Assume the following instruction mix: 30% ALU, 15% multiply & divide, 15% load, 15% store, 15% branch, and 10% jump.
a. What is the total delay for each instruction class, and what is the clock cycle of the single-cycle CPU design?
b. If we fix the clock cycle at 200 ps for a multi-cycle CPU, what is the CPI for each instruction class, and what is the speedup over the single-cycle (fixed-length clock) design?

Exercise 3
a. Total delay for each instruction class:
Clock cycle = max delay = 1040ps

Exercise 3
b. CPI for each instruction:
CPI for Basic ALU = 4 cycles
CPI for Multiply & Divide = 6 cycles (the 550 ps ALU operation takes 3 cycles)
CPI for Load = 5 cycles
CPI for Store = 4 cycles
CPI for Branch = 3 cycles
CPI for Jump = 2 cycles
Average CPI = 0.3 * 4 + 0.15 * 6 + 0.15 * 5 + 0.15 * 4 + 0.15 * 3 + 0.1 * 2 = 4.1
Speedup of multi-cycle over single-cycle = (1040 * 1) / (200 * 4.1) ≈ 1.27
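The arithmetic for both parts can be checked with a short script. This is a minimal Python sketch under stated assumptions: each class uses the functional units listed (standard MIPS usage); a jump needs only the instruction fetch in the single-cycle design but a separate decode cycle in the multi-cycle design, matching the 2-cycle jump CPI above; and every multi-cycle step takes a whole number of 200 ps cycles. The names `CLASSES`, `DECODE`, `MIX`, and `evaluate` are illustrative.

```python
import math

# Component delays in picoseconds, from the exercise statement.
IMEM = DMEM = 190
REG_READ = REG_WRITE = 150
ALU, ALU_MULDIV = 190, 550
DECODE = 0  # ASSUMED: a jump forms its target during decode, adding no unit delay

# Units used by each class, in execution order (assumed standard MIPS usage).
CLASSES = {
    "ALU":     [IMEM, REG_READ, ALU, REG_WRITE],
    "Mul/Div": [IMEM, REG_READ, ALU_MULDIV, REG_WRITE],
    "Load":    [IMEM, REG_READ, ALU, DMEM, REG_WRITE],
    "Store":   [IMEM, REG_READ, ALU, DMEM],
    "Branch":  [IMEM, REG_READ, ALU],
    "Jump":    [IMEM, DECODE],
}
MIX = {"ALU": 0.30, "Mul/Div": 0.15, "Load": 0.15,
       "Store": 0.15, "Branch": 0.15, "Jump": 0.10}
MC_CLOCK = 200  # multi-cycle clock period in ps


def evaluate():
    # (a) Single-cycle: each class takes the sum of its unit delays,
    # and the clock period must fit the slowest class.
    delays = {c: sum(units) for c, units in CLASSES.items()}
    sc_clock = max(delays.values())

    # (b) Multi-cycle: each step takes whole cycles (at least one),
    # so the 550 ps multiply/divide needs ceil(550 / 200) = 3 cycles.
    cpi = {c: sum(max(1, math.ceil(d / MC_CLOCK)) for d in units)
           for c, units in CLASSES.items()}
    avg_cpi = sum(MIX[c] * cpi[c] for c in CLASSES)

    # Single-cycle time per instruction = 1 cycle * sc_clock.
    speedup = sc_clock / (MC_CLOCK * avg_cpi)
    return delays, sc_clock, cpi, avg_cpi, speedup


if __name__ == "__main__":
    delays, sc_clock, cpi, avg_cpi, speedup = evaluate()
    for c in CLASSES:
        print(f"{c:8s} delay = {delays[c]:4d} ps   CPI = {cpi[c]}")
    print(f"single-cycle clock = {sc_clock} ps")
    print(f"average CPI = {avg_cpi:.2f}, speedup = {speedup:.2f}")
```

The multiply/divide class sets the 1040 ps single-cycle clock even though it is only 15% of the mix; the multi-cycle design pays for it only on those instructions, which is where the speedup comes from.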

Exercise 4
• Identify all the RAW data dependencies in the following
code. Which dependencies are data hazards that will be
resolved by forwarding? Which dependencies are data
hazards that will cause a stall? Using a graphical
representation of the pipeline, show the forwarding paths
and stalled cycles if any.
add $3, $4, $2
sub $5, $3, $1
lw $6, 200($3)
add $7, $3, $6

Exercise 4
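• RAW dependencies:
add $3, $4, $2 and sub $5, $3, $1 ($3: hazard, resolved by forwarding from EX/MEM)
add $3, $4, $2 and lw $6, 200($3) ($3: hazard, resolved by forwarding from MEM/WB)
lw $6, 200($3) and add $7, $3, $6 ($6: load-use hazard, 1 stall cycle, then forwarding from MEM/WB)
add $3, $4, $2 and add $7, $3, $6 ($3: no hazard; the value is read from the register file)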
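The classification above follows from the cycle in which each value is produced and the cycle in which it is needed. The Python sketch below automates it under stated assumptions: a classic in-order 5-stage pipeline (IF, ID, EX, MEM, WB), full forwarding into EX, a register file written in the first half of a cycle and read in the second, and hand-written lists of source and destination registers. `Instr`, `analyze`, and `PROGRAM` are hypothetical names, not part of the exercise.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instr:
    text: str
    dest: Optional[str]   # register written, or None
    srcs: tuple           # registers read
    is_load: bool = False


def _first_ex_cycle(program, issue, p):
    """Earliest EX cycle of a consumer of instruction p's result.

    An ALU result can be forwarded right after the producer's EX stage;
    a loaded value only after its MEM stage.
    """
    return issue[p] + (4 if program[p].is_load else 3)


def analyze(program):
    """Return (RAW dependencies, total cycles) for an in-order 5-stage pipeline.

    Instruction i is fetched in cycle issue[i]; its ID, EX, MEM and WB stages
    follow in cycles issue[i]+1 ... issue[i]+4.
    """
    issue, deps = [], []
    for i, ins in enumerate(program):
        earliest = issue[-1] + 1 if issue else 1
        # Most recent earlier writer of each source register.
        producers = []
        for reg in ins.srcs:
            for p in range(i - 1, -1, -1):
                if program[p].dest == reg:
                    producers.append((p, reg))
                    break
        # Delay the fetch until every operand can reach EX in time.
        start = max([earliest] +
                    [_first_ex_cycle(program, issue, p) - 2 for p, _ in producers])
        issue.append(start)
        for p, reg in producers:
            stalls = max(0, _first_ex_cycle(program, issue, p) - 2 - earliest)
            # If the consumer's ID is in or after the producer's WB,
            # the value is simply read from the register file.
            how = "register file" if start + 1 >= issue[p] + 4 else "forwarding"
            deps.append((program[p].text, ins.text, reg, how, stalls))
    return deps, issue[-1] + 4  # WB cycle of the last instruction


PROGRAM = [
    Instr("add $3, $4, $2", "$3", ("$4", "$2")),
    Instr("sub $5, $3, $1", "$5", ("$3", "$1")),
    Instr("lw $6, 200($3)", "$6", ("$3",), is_load=True),
    Instr("add $7, $3, $6", "$7", ("$3", "$6")),
]

if __name__ == "__main__":
    deps, cycles = analyze(PROGRAM)
    for producer, consumer, reg, how, stalls in deps:
        print(f"{producer:16s} -> {consumer:16s} {reg}: {how}, {stalls} stall(s)")
    print("total cycles:", cycles)
```

For this program the sketch reports the same four dependencies and a total of 9 cycles: 4 instructions, 4 cycles to fill the pipeline, and 1 stall for the load-use hazard.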

Exercise 5
• We have a program of 10^6 instructions in the format of “lw, add,
lw, add,…”. The add instruction depends only on the lw instruction
right before it. The lw instruction also depends only on the add
instruction right before it. If this program is executed on the 5-stage
MIPS pipeline:
a. Without forwarding, what would be the actual CPI?
Without forwarding, a dependent instruction's ID stage must wait for its producer's WB stage, which costs 2 bubbles per dependence. It therefore takes 6 cycles on average to complete one LW and one ADD:
1 cycle (to complete LW) + 2 cycles (bubbles) + 1 cycle (to complete ADD) + 2 cycles (bubbles) = 6 cycles
So it takes 6 cycles to complete 2 instructions.
Average CPI = 6/2 = 3
b. With forwarding, what would be the actual CPI?
With forwarding, the ADD result reaches the next LW without a bubble, but the LW → ADD load-use hazard still costs 1 bubble. It therefore takes only 3 cycles on average to complete one LW and one ADD:
1 cycle (to complete LW) + 1 cycle (bubble) + 1 cycle (to complete ADD) = 3 cycles
So it takes 3 cycles to complete 2 instructions.
Average CPI = 3/2 = 1.5
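Both CPI figures can be reproduced by counting bubbles per dependence and adding the 4-cycle pipeline fill. This is a minimal Python sketch, assuming the same 5-stage pipeline, a register file written in the first half of a cycle and read in the second, and (with forwarding) full EX/MEM and MEM/WB forwarding; `pipeline_cpi` is a hypothetical helper.

```python
def pipeline_cpi(n, forwarding):
    """CPI of an alternating lw, add, lw, add, ... stream of n instructions.

    Each instruction depends only on the one right before it.
    """
    bubbles = 0
    for i in range(1, n):
        producer_is_load = (i - 1) % 2 == 0  # even positions hold lw
        if not forwarding:
            bubbles += 2   # consumer's ID must wait for the producer's WB
        elif producer_is_load:
            bubbles += 1   # load-use: data is ready only after MEM
        # else: the add result is forwarded from EX/MEM, no bubble
    cycles = n + 4 + bubbles  # 4 extra cycles to fill the 5-stage pipeline
    return cycles / n


if __name__ == "__main__":
    n = 10**6
    print(f"without forwarding: CPI = {pipeline_cpi(n, False):.4f}")
    print(f"with forwarding:    CPI = {pipeline_cpi(n, True):.4f}")
```

With 10^6 instructions the fill cost is negligible, so the results approach the values derived above (3 and 1.5); for a short stream such as 4 instructions the fill cycles noticeably raise the CPI.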
