369W17 Slideset07

Slide Set 7 for Lecture Section 01
for ENCM 369 Winter 2017
Steve Norman, PhD, PEng
Electrical & Computer Engineering

Schulich School of Engineering
University of Calgary
February 2017
ENCM 369 Winter 2017 Slide Set 7 for Lecture Section 01 slide 2/86
Contents
The multicycle processor (textbook Section 7.4)
Introduction to Pipelining
5 pipeline stages for our MIPS subset
Pipeline Hazards
Making pipelining work in hardware
Hardware features to manage data hazards
Hardware changes to manage control hazards
Exceptions
The multicycle processor (textbook Section 7.4)
ENCM 369 will not cover Section 7.4 in detail, because terms
at Canadian universities are short!
That’s too bad, because the multicycle design has some
interesting aspects . . .
• It shows how a computer can use a single memory array for both instructions and data.
• It makes very efficient use of the ALU: the ALU gets used to compute three different results for every instruction.
• The control unit is sequential, so it is a really nice and practical example of a finite state machine (FSM).

Introduction to Pipelining
Before we start to learn about pipelining, let’s review a model
we will call the one-instruction-at-a-time model:
Step 1: Processor reads instruction from memory and updates PC.
Step 2: Processor executes the instruction.
The processor performs Step 1, Step 2, Step 1, Step 2, . . . , forever (or until the power is turned off).
This model correctly predicts the results produced by sequences of instructions in assembly language code.
Also, the model accurately describes the organization of the
processors of textbook Sections 7.3 and 7.4.
The one-instruction-at-a-time model and modern processors
The model DOES NOT accurately describe the organization of modern processors!
At a given moment in time, a modern processor will be
working on many different instructions—this allows much
greater speed than one-instruction-at-a-time processing.
However, the processor must produce results as if instructions
were being handled one at a time.
Remark: Your instructor thinks that “as if” is a very short and very useful summary of many of the important ideas related to modern computer system designs.
Modern processor chips often process instructions in ways that
are hard for humans to understand, but nevertheless do what
skilled coders want in time- and energy-efficient ways.
The Laundry Analogy

This analogy is taken from Computer Organization and
Design, by David Patterson and John Hennessy, which was the
ENCM 369 textbook for many years.
You have many loads of laundry to do, with these four
resources:
• a washing machine
• a dryer
• a “folding unit” (you)
• a “putting-away unit” (your roommate)
(In real life not very many students would ask their roommates
to put away laundry for them, but let’s just follow Patterson
and Hennessy here.)
The Laundry Analogy, continued
Let’s assume that each step in processing laundry takes 30 minutes. (In real life, this is close to correct for washers but unfortunately not at all correct for dryers.)
Suppose you have four loads of dirty laundry.
If you process each load completely before starting the next,
how long does it take to finish all four loads?
Processing four loads of laundry, one at a time . . .
(W = washer, D = dryer, F = folding, PA = putting away; each step takes 30 minutes.)
Time:  6:00pm    8:00pm    10:00pm   midnight  2:00am
1st    W D F PA
2nd              W D F PA
3rd                        W D F PA
4th                                  W D F PA
The work takes EIGHT HOURS in total! And each resource (washer, dryer, etc.) is IDLE for three-quarters of the time.
There is an obvious way to speed this up . . .
Processing four loads of laundry, making better use of resources . . .
As soon as one load is out of the washer, the washer is free for the next load. The same is true for all of the other resources.
So we can schedule the work this way . . .
Time:  6:00pm 6:30   7:00   7:30   8:00   8:30   9:00   9:30pm
1st    W      D      F      PA
2nd           W      D      F      PA
3rd                  W      D      F      PA
4th                         W      D      F      PA
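The two schedules can be checked with a little arithmetic. The C sketch below is only an illustration, and the function names are made up for it: with n loads and k steps of t minutes each, doing loads one at a time takes n·k·t minutes, while overlapping them takes (k + n - 1)·t minutes, because the first load needs k time slots and each later load finishes one slot after the load before it.

```c
/* Laundry arithmetic (hypothetical helper functions, for illustration). */

/* One load at a time: every load occupies all `steps` slots by itself. */
int sequential_minutes(int loads, int steps, int step_minutes)
{
    return loads * steps * step_minutes;
}

/* Overlapped (pipelined): the first load needs `steps` slots, and each
   later load finishes exactly one slot after the load before it. */
int pipelined_minutes(int loads, int steps, int step_minutes)
{
    return (steps + loads - 1) * step_minutes;
}
```

For four loads, four steps, and 30-minute steps, these give 480 minutes (8 hours, 6:00pm to 2:00am) and 210 minutes (3.5 hours, finishing at 9:30pm).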
The concept of pipelining in digital logic design
A pipelined system is a collection of stages, each with a simple role to perform.
When a stage is finished producing its current output, it can
pass that output to the next stage and receive new input.
In the laundry analogy, the washer stage receives a load of
dirty clothes as input, and produces a load of wet, clean
clothes as output, which gets passed as input to the dryer
stage.
In Harris and Harris, this year’s textbook, pipelining is
introduced in Section 3.6, along with an analogy to baking
cookies. That section is short and worth reading.
Pipelined execution of instructions
In a pipelined processor, an instruction is like a single load of laundry. Processing an instruction can start long before processing of the preceding instruction is finished.
To divide the work of processing an instruction across a
number of pipeline stages, that work has to be broken down
into simple steps that take roughly equal amounts of
time.
Each step must fit into a single clock cycle.

5 pipeline stages for our MIPS subset

The subset is: ADD, SUB, SLT, AND, OR, LW, SW, BEQ.
The stages are:
• Fetch: Read instruction from I-Mem and update PC.
• Decode: Determine outputs of Control Unit and read GPRs from R-File.
• Execute: Get a result from the ALU.
• Memory: D-Mem access for loads and stores.
• Writeback: Write to a GPR at the end of a load or an R-type instruction.
In some stages, for some instructions, nothing happens. What
are some examples of that?
Example sequence of instructions in our 5-stage MIPS subset pipeline
# This sequence is not practical code,
# but it makes for a simple example.
ADD $t2, $t0, $t1
LW $t4, ($t3)
SW $t5, ($t6)
SUB $t9, $t7, $t8
Let’s suppose we have a 1 GHz clock, so the clock period is
1 ns. How long will it take from the beginning of the ADD
instruction to the end of the SUB instruction?
Pipelined processing for example instruction sequence
time:   0 ns  1 ns  2 ns  3 ns  4 ns  5 ns  6 ns  7 ns  8 ns
ADD     IF    ID    EX    MEM   WB
LW            IF    ID    EX    MEM   WB
SW                  IF    ID    EX    MEM   WB
SUB                       IF    ID    EX    MEM   WB
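The same counting works for instructions. The C sketch below uses hypothetical names and assumes a 5-stage pipeline with no stalls, so instruction i (counting from 0) is in stage c - i during clock cycle c. With 4 instructions and a 1 ns clock period, the total is (5 + 4 - 1) × 1 ns = 8 ns, matching the table.

```c
#include <stdio.h>

enum { NUM_STAGES = 5 };
static const char *const STAGE_NAMES[NUM_STAGES] = { "IF", "ID", "EX", "MEM", "WB" };

/* Stage occupied by instruction i (0-based) during clock cycle c (0-based),
   assuming one instruction enters IF per cycle and there are no stalls.
   Returns NULL when instruction i is not in the pipeline during cycle c. */
const char *stage_in_cycle(int i, int c)
{
    int s = c - i;
    if (s < 0 || s >= NUM_STAGES)
        return NULL;
    return STAGE_NAMES[s];
}

/* Time from the start of the first instruction to the end of the last. */
double total_ns(int num_instructions, double clock_period_ns)
{
    return (NUM_STAGES + num_instructions - 1) * clock_period_ns;
}

/* Print a pipeline table in the same layout as the one above. */
void print_pipeline(const char *const names[], int n)
{
    for (int i = 0; i < n; i++) {
        printf("%-8s", names[i]);
        for (int c = 0; c < NUM_STAGES + n - 1; c++) {
            const char *s = stage_in_cycle(i, c);
            printf("%-6s", s ? s : "");
        }
        printf("\n");
    }
}
```

Calling print_pipeline with the names ADD, LW, SW, SUB reproduces the staggered rows of the table.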
The single-cycle processor starts one instruction per clock cycle.
A pipelined processor also starts one instruction per clock cycle.
Why will a pipelined design allow much greater instruction
throughput? (The diagram below provides a hint at the answer!)
(Timing diagram; signals from top to bottom: CLK, PC output, Instruction, main decoder outputs, R-File outputs, ALU decoder outputs, ALU result, D-Mem RD output, $s1 contents.)
An example 3-instruction sequence in a pipelined processor
The sequence is . . .
lw $t0, 20($t1)
or $t2, $t3, $t4
sw $t5, 40($t6)
Let’s use the “Pipeline Basics” handout to track all the steps
in processing these instructions.

Pipeline Hazards
These can be defined as situations that prevent throughput of one instruction per clock cycle.
There are three main kinds: structural hazards, data hazards,
and control hazards.
Structural Hazards
A structural hazard occurs when a unit within a computer is asked to do two (or more) incompatible things at the same time.
Example: In a computer with a single memory unit, the
processor can’t do the Fetch step of one instruction while also
doing the Memory step of an earlier LW or SW instruction.
Solution to Structural Hazards
Design the instruction set and hardware so that this kind of hazard does not occur.
Example: Have separate Instruction and Data Memories, so
Fetch can be simultaneous with Memory of an earlier
instruction.
(Note: When we get to textbook Chapter 8, we’ll see that for
modern processors, separation of instructions and data really
means having separate caches for instructions and data.)
Data Hazards: Are inputs to instructions up-to-date?
Example:
add $t0, $t1, $t2
sub $t4, $t3, $t0
The destination of ADD is a source for SUB.
The Writeback step of ADD will happen later than the
Decode step of SUB, so there is a risk that SUB will use old,
wrong data from $t0.
Remember, the processor must produce results as if one
instruction completes before the next instruction starts!
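To make the dependency concrete: it exists when a later instruction reads a register that an earlier one writes. The C sketch below uses a made-up `struct instr` holding register numbers ($t0 is GPR 8, $t1 is 9, and so on). Whether the dependency actually turns into a hazard depends on how far apart the two instructions are in the pipeline, which this check does not model.

```c
#include <stdbool.h>

/* Hypothetical instruction summary for hazard checking.
   dest: GPR number written (0 if none; writes to $zero have no effect).
   src1, src2: GPR numbers read (-1 if the field is unused). */
struct instr {
    int dest;
    int src1;
    int src2;
};

/* True if `later` reads a register that `earlier` writes: a
   read-after-write dependency. In a pipeline it becomes a data hazard
   when `later` would read the Register File before `earlier` has
   written it. */
bool raw_dependency(struct instr earlier, struct instr later)
{
    if (earlier.dest == 0)
        return false;
    return later.src1 == earlier.dest || later.src2 == earlier.dest;
}
```

For the example, ADD is {8, 9, 10} and SUB is {12, 11, 8}, so SUB depends on ADD through $t0, but not the other way around.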
Control Hazards: What instruction address should be used in the next Fetch?
Example . . .
beq $t0, $t1, L1
and $t4, $t2, $t3
. . . more instructions . . .
L1: lw $t5, ($t6)
Which instruction should be fetched after BEQ is fetched? AND or LW? The processor will not know until the $t0 == $t1 comparison is done!
Assumption about Register File in textbook Section 7.5, related to data hazards
Writes to the Register File occur in the first half of a clock cycle, and reads from the Register File occur in the second half.
To enable this behaviour, what choices can be made about
flip-flops, Data Memory, and other clocked components?
What are the consequences regarding GPR reads and writes
that happen within the same clock cycle?
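Here is a tiny C model of that convention. It is a hypothetical sketch, not textbook code: within one cycle, the write for the instruction in Writeback is applied first, and then the read for the instruction in Decode happens, so the read already sees the new value.

```c
#include <stdint.h>

/* Toy register file: 32 GPRs, with $zero always reading as 0. */
struct regfile {
    uint32_t gpr[32];
};

/* One clock cycle under the Section 7.5 convention (hypothetical model):
   the write for the instruction in Writeback happens in the first half of
   the cycle, and the read for the instruction in Decode happens in the
   second half, so a same-cycle read of the same GPR sees the new value. */
uint32_t cycle_write_then_read(struct regfile *rf,
                               int write_reg, uint32_t write_value,
                               int read_reg)
{
    if (write_reg != 0)          /* writes to $zero are discarded */
        rf->gpr[write_reg] = write_value;
    return rf->gpr[read_reg];
}
```

For example, if $t0 (GPR 8) holds 5 and the Writeback instruction writes 42 to it in the same cycle, the Decode-stage read returns 42, not the stale 5.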
Edge-triggering for pipelined computers in Section 7.5
Updates to GPRs in the Register File happen in response to negative clock edges.
(Sketch: the system clock waveform, alternating between 1 and 0.)
Updates to PC, Data Memory, and pipeline registers happen in response to positive clock edges.
This is NOT applicable to the single-cycle design of Section 7.3 and the multicycle design of Section 7.4!
Review: 5 pipeline stages for our MIPS subset
Fetch: Read instruction from Instruction Memory; do PC = PC + 4.
Decode: Determine Control Unit outputs appropriate for
instruction opcode; copy two GPR values out of Register File.
Execute: Do computation in ALU.
Memory: Read or write Data Memory.
Writeback: Update a GPR in the Register File.
Solutions to data hazards, first of three: stalling the pipeline
Example:
add $t0, $t1, $t2
sub $t4, $t3, $t0
Solutions to data hazards, second of three: forwarding
Example A, from previous slide:
add $t0, $t1, $t2
sub $t4, $t3, $t0
Example B:
lw $t0, ($t1)
add $t2, $t2, $t3
slt $t6, $t0, $t5
Solutions to data hazards, third of three: combine stalling and forwarding
Example:
lw $t0, ($t1)
add $t3, $t0, $t2
Can forwarding by itself solve this data hazard?

Control Hazards (repeat of earlier example)
What instruction address should be used in the next Fetch step after the Fetch step of a branch instruction? Example . . .
beq $t0, $t1, L1
and $t4, $t2, $t3
. . . more instructions . . .
L1: lw $t5, ($t6)
Which instruction should be fetched after BEQ is fetched? AND or LW? The processor will not know until the $t0 == $t1 comparison is done!
Control hazard illustration
BEQ                F   D   E   M   W
next instruction       F   D   E   M   W
Here “next” means “next in time”, not necessarily “next location in Instruction Memory”.
Why will it be difficult to do the Fetch step for the next
instruction just one clock cycle after the Fetch step for BEQ?
(There are multiple reasons.)
Four kinds of solutions for control hazards
1. Stall: Delay the Fetch step for the next instruction until
the address of the next instruction is known.
2. Predict: Guess what the address of the next instruction
will be, and act on the guess without delay. Check that
the guess was correct; if not, cancel instructions that
have incorrectly entered the pipeline.
3. Delayed branch and jump rules
4. Conditional instructions
Dynamic branch prediction
This is widely used in modern processors (but mostly not used in low-power embedded processors).
A large and complex branch prediction circuit is dedicated to
recording information about recently-encountered branch
instructions.
For each branch instruction, its target address is recorded
along with a prediction about whether the branch will be
taken.
Dynamic branch prediction, continued
When a branch instruction is encountered, the branch prediction circuit can quickly supply a guess for the next PC value, and instruction fetch can occur without delay.
If a guess is wrong, some instructions will have to be
cancelled, and clock cycles will be lost.
This system is called dynamic because a taken/not-taken
prediction will be changed if it has recently been more often
wrong than right.
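The slide does not say exactly how a prediction gets changed. One common scheme that behaves as described is a 2-bit saturating counter per branch; the C sketch below assumes that scheme, and real predictors are usually more elaborate. The names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical 2-bit saturating counter for one branch.
   Counter values 0 and 1 predict "not taken"; 2 and 3 predict "taken". */
struct predictor {
    unsigned counter;   /* always in the range 0..3 */
};

bool predict_taken(const struct predictor *p)
{
    return p->counter >= 2;
}

/* Called once the branch outcome is actually known. */
void predictor_update(struct predictor *p, bool taken)
{
    if (taken && p->counter < 3)
        p->counter++;
    else if (!taken && p->counter > 0)
        p->counter--;
}
```

Starting from 3 ("strongly taken"), a single not-taken outcome moves the counter to 2, which still predicts taken; a second consecutive not-taken outcome moves it to 1 and flips the prediction.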
Branch prediction code example
p and past_last are of type int*. count is an int.
do {
if (*p < 0)
count++;
p++;
} while (p != past_last);
p walks through an array of int elements, and count records how many of those elements are negative.
Branch prediction code example, continued
Let’s suppose that there are a lot of array elements, and most
of them are negative . . .
L1: lw $t0, ($a0)
slt $t1, $t0, $zero # $t1 = (*p < 0)
beq $t1, $zero, L2 # branch if !(*p < 0)
addiu $t9, $t9, 1 # count++
L2: addiu $a0, $a0, 4 # p++
bne $a0, $t8, L1 # branch if p != past_last
As the processor runs the loop, what predictions will it make about the BEQ and BNE instructions?
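To see the outcome pattern a predictor would observe, here is a hypothetical instrumented version of the C loop. It counts how often each branch in the MIPS translation is taken, assuming the register usage above: BEQ is taken when *p >= 0, and BNE is taken whenever another element remains.

```c
/* Taken / not-taken counts for the two branches (hypothetical struct). */
struct branch_stats {
    int beq_taken, beq_not_taken;  /* beq $t1, $zero, L2: taken if *p >= 0 */
    int bne_taken, bne_not_taken;  /* bne $a0, $t8, L1: taken if p != past_last */
};

/* Same loop as the slide (needs at least one element), with counters. */
int count_negatives(const int *p, const int *past_last, struct branch_stats *st)
{
    int count = 0;
    do {
        if (*p < 0) {              /* BEQ falls through to count++ */
            st->beq_not_taken++;
            count++;
        } else {                   /* BEQ jumps over count++ */
            st->beq_taken++;
        }
        p++;
        if (p != past_last)        /* BNE jumps back to L1 */
            st->bne_taken++;
        else
            st->bne_not_taken++;
    } while (p != past_last);
    return count;
}
```

With a long array of mostly negative values, both counters become lopsided: BEQ is almost never taken and BNE is almost always taken.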
Delayed branch and jump rules
This kind of solution to control hazards is older and less sophisticated than branch prediction.
This is a feature of the real MIPS instruction set, but is NOT
enabled by default in MARS and other MIPS simulators used
for education!
The idea is that one instruction of useful work can get started
in the clock cycle needed to make a branch decision and
compute a branch or jump target address.
Details are in the two paragraphs at the bottom of the page in
the “Control Hazard Solutions” document.
MIPS delayed branch example
What will the flow of instructions be if $t0 != 0? What will it be if $t0 == 0?
slt operands
beq $t0, $zero, L1
add operands
lw operands
L1: or operands
sub operands
Examples of MIPS delayed jumps
# C code: i = foo(17); . . . suppose i is in $s0.
jal foo
addiu $a0, $zero, 17 # Argument set up after call starts!
addu $s0, $v0, $zero # $ra points to this instruction.
# Example return from nonleaf procedure . . .
jr $ra
addiu $sp, $sp, 32 # Deallocate stack after return starts!
If you ever do assembly-language programming for real MIPS processors, or need to read MIPS compiler output, be aware of delayed branches and jumps!
Conditional instructions
Suppose this if-else code is inside a loop.
if (a < b)
    c = a;
else
    c = b;
Translating this with a branch and a jump could cause a lot of lost clock cycles, especially if branch prediction does a poor job.
Suppose that a, b, and c are ints in $s0, $s1, $s2. Let’s see how this can be coded with MIPS “move conditional” instructions movn and movz.
By the way, ARM instruction sets have very rich collections of
conditional instructions.
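The C sketch below models what movn and movz do and shows one possible branch-free sequence for the if-else; it is not necessarily the sequence developed in lecture, and the C function names are hypothetical. The idea is that slt computes the condition into a register, and then exactly one of the two conditional moves changes c.

```c
#include <stdint.h>

/* C models of the MIPS conditional moves (register values as int32_t):
     movn rd, rs, rt   # if (rt != 0) rd = rs
     movz rd, rs, rt   # if (rt == 0) rd = rs          */
static void movn(int32_t *rd, int32_t rs, int32_t rt) { if (rt != 0) *rd = rs; }
static void movz(int32_t *rd, int32_t rs, int32_t rt) { if (rt == 0) *rd = rs; }

/* One possible branch-free translation (a, b, c in $s0, $s1, $s2):
     slt  $t0, $s0, $s1    # $t0 = (a < b)
     movn $s2, $s0, $t0    # c = a if $t0 != 0
     movz $s2, $s1, $t0    # c = b if $t0 == 0
   Exactly one of the two moves changes c, and no branch is needed. */
int32_t select_c(int32_t a, int32_t b)
{
    int32_t c = 0;
    int32_t t0 = (a < b);   /* slt */
    movn(&c, a, t0);
    movz(&c, b, t0);
    return c;
}
```

Because every path executes the same three instructions, there is nothing for a branch predictor to guess wrong.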

Making pipelining work in hardware
The textbook presents a sequence of designs, from Figure 7.45 to Figure 7.58.
The earliest designs are incomplete and incorrect in many
ways.
Later designs get closer and closer to being complete and
correct.
Recommendation: Read Sections 7.5.1 through 7.5.3
carefully and observe how new features get added and
existing features get modified.
Remarks on Textbook Figure 7.47
This computer handles R-type, LW, and SW instructions correctly, except when there are data hazards.
It makes an attempt to handle BEQ, but doesn’t get it right.
This computer works as if three delay-slot instructions should
be processed before a branch is taken.
D flip-flops: What’s the point? (Repeat slide from Slide Set 6)
This is important! Knowing what a D flip-flop does is as important as knowing the truth tables for NOT, AND, and OR.
A clock cycle is a span of time from one active edge of a clock to the next active edge.
A D flip-flop captures the value of the input bit D at the end of a clock cycle, and makes that captured bit value available on Q throughout the next clock cycle.
Pipeline registers
Prominent in all of the Section 7.5 designs are pipeline registers made of D flip-flops.
The pipeline registers are not 32 bits wide—they’re much
wider than that.
They have clock inputs; the register outputs change only on
active clock edges.
At the end of each clock cycle, each pipeline register collects
information from one pipeline stage, and makes that
information available to the next stage throughout the next
clock cycle.
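The D flip-flop behaviour above is what makes pipeline registers work. Below is a toy C model with hypothetical names, where the stage logic just passes each value along: during a cycle, combinational logic computes every register's D input from the previous register's Q output, and then one active edge updates all the registers at once.

```c
#include <stdint.h>

/* Toy pipeline register: a bank of D flip-flops (hypothetical model).
   d: input, computed by combinational logic during the cycle.
   q: output, stable for the whole cycle; changes only at an active edge. */
struct pipe_reg {
    uint32_t d;
    uint32_t q;
};

/* Simulate n_cycles with trivial stage logic (each stage just passes its
   input along). inputs[c] is the value entering the first register in
   cycle c. */
void run_pipeline(struct pipe_reg regs[], int n,
                  const uint32_t inputs[], int n_cycles)
{
    for (int c = 0; c < n_cycles; c++) {
        /* During the cycle: every D input is computed from current Q values. */
        regs[0].d = inputs[c];
        for (int k = 1; k < n; k++)
            regs[k].d = regs[k - 1].q;
        /* End of the cycle: one active edge updates every register at once. */
        for (int k = 0; k < n; k++)
            regs[k].q = regs[k].d;
    }
}
```

Starting from four registers holding 0 and feeding in 1, 2, 3, the registers hold 3, 2, 1, 0 after three cycles: each value advances exactly one register per cycle. Updating Q inside the first loop instead would let a value race through several stages in one cycle, which is what edge-triggered flip-flops prevent.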
A sketch of a pipelined datapath
This is essentially textbook Figure 7.46 with the wiring removed to reduce clutter. Note the highlighted pipeline registers!
(Figure: the F, D, E, M, and W stages from left to right, separated by the F/D, D/E, E/M, and M/W pipeline registers, all clocked by CLK. Components shown: PC, I-Mem, a +4 adder, R-File, SignExt, <<2, a branch-target adder, ALU, and D-Mem.)
Review: Edge-triggering for pipelined computers in Section 7.5
Updates to GPRs in the Register File happen in response to negative clock edges.
(Sketch: the system clock waveform, alternating between 1 and 0.)
Updates to PC, Data Memory, and pipeline registers happen in response to positive clock edges.
Tracing an instruction through the datapath of Figure 7.46
Let’s trace an R-type instruction: SLT $2, $4, $5. We’ll assume that this instruction is located at address 0x0040_0030 in Instruction Memory.
For now, we’ll look at the datapath only.
We’ll consider control later, after we have seen the whole
datapath.
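As a starting point for the trace, the C sketch below assumes the standard MIPS R-type field layout (SLT has opcode 0 and funct 0x2a) and shows the instruction word and the PC + 4 value that the F stage produces. The struct and function names are hypothetical. For SLT $2, $4, $5 the word works out to 0x0085102A, and PC + 4 is 0x0040_0034.

```c
#include <stdint.h>

/* Fields of a MIPS R-type instruction word. */
struct rtype {
    unsigned opcode;  /* bits 31:26 */
    unsigned rs;      /* bits 25:21 */
    unsigned rt;      /* bits 20:16 */
    unsigned rd;      /* bits 15:11 */
    unsigned shamt;   /* bits 10:6  */
    unsigned funct;   /* bits 5:0   */
};

uint32_t encode_rtype(struct rtype f)
{
    return ((uint32_t)f.opcode << 26) | ((uint32_t)f.rs << 21) |
           ((uint32_t)f.rt << 16) | ((uint32_t)f.rd << 11) |
           ((uint32_t)f.shamt << 6) | (uint32_t)f.funct;
}

struct rtype decode_rtype(uint32_t instr)
{
    struct rtype f;
    f.opcode = instr >> 26;
    f.rs = (instr >> 21) & 0x1f;
    f.rt = (instr >> 16) & 0x1f;
    f.rd = (instr >> 11) & 0x1f;
    f.shamt = (instr >> 6) & 0x1f;
    f.funct = instr & 0x3f;
    return f;
}

/* What the F stage hands to the F/D register (hypothetical struct). */
struct fd_reg {
    uint32_t instr_d;     /* InstrD   */
    uint32_t pc_plus_4d;  /* PCPlus4D */
};

struct fd_reg fetch(uint32_t pc, uint32_t instr_from_imem)
{
    struct fd_reg r = { instr_from_imem, pc + 4 };
    return r;
}
```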
SLT $2, $4, $5 located at 0x0040_0030: F stage
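(Figure: F stage, ending at the F/D pipeline register. A mux chooses the next PC from PCPlus4F (input 0) or PCBranchM (input 1, from the M stage); the PC output PCF addresses I-Mem and feeds a +4 adder that produces PCPlus4F.)
How many DFFs are there in the F/D register?
What values get written into the F/D register at the end of the Fetch clock cycle of the SLT?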
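SLT $2, $4, $5 located at 0x0040_0030: D stage
(Figure: D stage, between the F/D and D/E pipeline registers. InstrD bits 25:21 and 20:16 select the two R-File registers to read; bits 20:16 and 15:11 are also passed to the D/E register; bits 15:0 go through SignExt; PCPlus4D is passed along to D/E. The R-File also has write-port inputs WE3, WriteRegW, and ResultW.)
How many DFFs are there in the D/E register?
What gets into the D/E register at the end of the Decode clock cycle?
What is going on with WriteRegW and ResultW?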
SLT $2, $4, $5 located at 0x0040_0030: E stage
(Figure: E stage, between the D/E and E/M pipeline registers. The ALU takes SrcAE and SrcBE; a mux chooses SrcBE from WriteDataE (input 0) or SignImmE (input 1). Another mux chooses WriteRegE from RtE (input 0) or RdE (input 1). SignImmE shifted left by 2 (<<2) is added to PCPlus4E to form the branch target.)
How many DFFs are there in the E/M register?
For the SLT instruction, what useful information gets written into the E/M register at the end of the Execute clock cycle?
What useful information gets written into the E/M register in the cases of LW, SW, and BEQ?
SLT $2, $4, $5 located at 0x0040_0030: M stage
(Figure: M stage, between the E/M and M/W pipeline registers. ALUOutM provides the D-Mem address and WriteDataM the data to be written; WE is the D-Mem write enable. ZeroM, WriteRegM, and PCBranchM also come out of the E/M register.)
How many DFFs are there in the M/W register?
For the SLT instruction, what useful information gets written into the M/W register at the end of the Memory clock cycle?
What happens in the M stage for LW, SW, and BEQ?
SLT $2, $4, $5 located at 0x0040_0030: W stage
(Figure: W stage. After the M/W pipeline register, a mux chooses ResultW from ALUOutW (input 0) or ReadDataW (input 1); WriteRegW also comes from the M/W register.)
For the SLT instruction, what happens in the Writeback stage? Let’s draw part of a schematic to help explain it.
What would be the same and what would be different for an LW instruction in the W stage?
Pipelined control for the Figure 7.46 datapath
Perhaps surprisingly, we can use exactly the same control unit that was designed for the single-cycle machine.
We can drop the Control Unit into the Decode stage.
However, now we must organize the control signals so that each one arrives at the correct time wherever it is needed on the datapath! For example . . .
• Q1: RegWrite = 1 is generated for LW. When should that value of RegWrite arrive at the R-File?
• Q2: MemWrite = 1 is generated for SW. When should that value of MemWrite arrive at D-Mem?
Q3: What general method can we use to get the timing correct for all of the control signals?
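Control circuit for pipelined datapath of Figure 7.46
(Figure: the Control Unit sits in the D stage and decodes Instr bits 31:26 (opcode) and 5:0 (funct). Its outputs RegWriteD, MemtoRegD, MemWriteD, BranchD, ALUControlD, ALUSrcD, and RegDstD enter the D/E pipeline register. ALUControlE, ALUSrcE, and RegDstE are used in the E stage, while RegWriteE, MemtoRegE, MemWriteE, and BranchE continue into the E/M register. In the M stage, BranchM is ANDed with ZeroM (from the ALU) to form PCSrcM, and MemWriteM controls D-Mem; RegWriteM and MemtoRegM continue into the M/W register. RegWriteW goes to the R-File, and MemtoRegW selects the W-stage result.)
Let’s make a few notes about how this circuit works.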
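One way to picture the circuit in software: each pipeline register carries only the control signals that some later stage still needs, so every signal travels in step with its own instruction. The C structs and function names below are a hypothetical sketch, not textbook code; ALUControl is stored as a small unsigned code.

```c
#include <stdbool.h>

/* Control part of the D/E register: the Control Unit's ...D outputs,
   captured at the end of Decode. */
struct ctrl_e {
    bool reg_write, mem_to_reg, mem_write, branch, alu_src, reg_dst;
    unsigned alu_control;
};

/* E/M: ALUControl, ALUSrc, and RegDst were used up in Execute. */
struct ctrl_m {
    bool reg_write, mem_to_reg, mem_write, branch;
};

/* M/W: only the Writeback controls remain. */
struct ctrl_w {
    bool reg_write, mem_to_reg;
};

/* At each positive clock edge, the still-needed signals move one
   register further down the pipeline along with their instruction. */
struct ctrl_m edge_e_to_m(struct ctrl_e e)
{
    struct ctrl_m m = { e.reg_write, e.mem_to_reg, e.mem_write, e.branch };
    return m;
}

struct ctrl_w edge_m_to_w(struct ctrl_m m)
{
    struct ctrl_w w = { m.reg_write, m.mem_to_reg };
    return w;
}
```

For LW, RegWrite = 1 therefore reaches the R-File only after passing through the D/E, E/M, and M/W registers, at the same time as the loaded data; for SW, MemWrite = 1 reaches D-Mem while SW itself is in the Memory stage.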

How much progress have we made so far?
Reminder: processor designs near the beginning of Section 7.5 are incomplete and partly incorrect.
Processor designs get better and better as corrections and
improvements are made.
The datapath and control system we have just looked at in
detail are combined in the textbook in the computer of
Figure 7.47. That computer can’t deal with data hazards and
handles BEQ incorrectly.

Hardware features to manage data hazards

Let’s start by reviewing two of the more complicated kinds of data hazard.
For example #2 of the “Hazard Examples” document . . .
first ADD    F   D   E   M   W
second ADD       F   D   E   M   W
SUB                  F   D   E   M   W
Let’s illustrate why forwarding by itself won’t work for example #4 in “Hazard Examples” . . .
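Hardware for forwarding: This incomplete sketch of an upgraded Execute stage allows a lot of choice for ALU A and B inputs!
(Sketch: each ALU input gets a new 3-input mux after the ID/EX pipeline register. For input A the choices are 00 = GPR value from the ID/EX register, 01 = ResultW, and 10 = ALUOutM, selected by the 2-bit signal ForwardAE. Input B has a similar mux selected by ForwardBE; its output is WriteDataE, and the ALUSrcE mux then chooses between that value (0) and the LW/SW offset (1). The Hazard Unit generates ForwardAE and ForwardBE.)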
Hardware for forwarding, continued:
Q1: What should the values of ForwardAE and ForwardBE be in the case where no forwarding is needed?
in the case where no forwarding is needed?
Consider this sequence:
LW R8, 0(R4)
AND R9, R10, R11
SUB R12, R8, R9
Q2: What should the values of ForwardAE and ForwardBE be
when SUB is in the EX stage?
Q3: What inputs does the Hazard Unit need in order to decide
correctly on the values of ForwardAE and ForwardBE?
Hazard Unit for computer of textbook Figure 7.50
(Figure: the Hazard Unit, with 5-bit inputs RsE, RtE, WriteRegM, and WriteRegW, 1-bit inputs RegWriteM and RegWriteW, and 2-bit outputs ForwardAE and ForwardBE.)
What are RsE and RtE, and how are they used by the Hazard
Unit?
A complete description of the logic in this version of the
Hazard Unit can be found on pages 416 and 418 in the
textbook.
Note: The computer of Figure 7.50 properly handles data
hazards that can be solved using forwarding only. It is not
capable of solving data hazards that require stalls.
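The forwarding decision can be sketched in C as below, for one ALU source register at a time (call it with RsE to get ForwardAE, or RtE to get ForwardBE). This is a sketch following the rule described in the textbook; the enum and function names are made up. The 2-bit codes follow the mux inputs in the schematic: 00 register file, 01 ResultW, 10 ALUOutM.

```c
#include <stdbool.h>

enum forward_sel {
    FWD_NONE = 0,      /* 00: value read from the Register File in Decode */
    FWD_RESULT_W = 1,  /* 01: ResultW from the Writeback stage */
    FWD_ALUOUT_M = 2   /* 10: ALUOutM from the Memory stage */
};

/* Forwarding select for one ALU source register in the E stage.
   The Memory stage is checked first because it holds the more recent
   result; register 0 ($zero) must never receive a forwarded value. */
enum forward_sel forward_select(unsigned src_e,
                                unsigned write_reg_m, bool reg_write_m,
                                unsigned write_reg_w, bool reg_write_w)
{
    if (src_e != 0 && src_e == write_reg_m && reg_write_m)
        return FWD_ALUOUT_M;
    if (src_e != 0 && src_e == write_reg_w && reg_write_w)
        return FWD_RESULT_W;
    return FWD_NONE;
}
```

The parameters of forward_select are exactly the Hazard Unit inputs listed in the figure, which is one way to think about Q3.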
Hardware for data hazard stalls
This is an example of what is called a “load-use” data hazard:
LW $8, 0($9)
ADD $16, $17, $8
SUB $18, $4, $5
We’ve already seen that a one-cycle stall is needed so that the
M stage result of LW can be forwarded to the E stage of ADD.
The need for a stall can be detected in the D stage of ADD.
Let’s draw a diagram to show how LW, ADD, and SUB will be
processed.
To make this work in hardware, we must enhance some of the registers in the system . . .
• Add an EN (enable) input to the PC. If EN is turned off, the PC is “frozen” and does not update on a positive clock edge.
• Add a similar EN input to the F/D pipeline register.
• Add a CLR (clear) input to the D/E pipeline register. If CLR is turned on, the instruction arriving in the register is converted to a harmless NOP.
These changes are sketched in an incomplete schematic on the
next slide . . .
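(Sketch: the PC and the F/D register each get an EN input, and the D/E register gets a CLR input. An extension to the Hazard Unit takes RsD, RtD, and RtE (5 bits each) plus MemtoRegE, and produces StallF (freezes the PC through its EN input), StallD (freezes the F/D register through its EN input), and FlushE (drives the CLR input of the D/E register).)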
For clarity, the schematic above only shows Hazard Unit inputs and
outputs that are used to effect the stall for LW instructions. See
textbook Figure 7.53 for a complete schematic.
For a complete description of all of the logic used to effect the stall for LW instructions, see pages 418–421 in the textbook.
In lecture, it’s really only possible to present a sketch of that
logic.
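Here is a sketch of that logic in C, using only the Hazard Unit inputs shown in the schematic. The names are hypothetical, and the textbook pages cited above remain the authoritative description. MemtoRegE = 1 identifies an LW in the Execute stage, whose result is not available until the end of its Memory stage; if the instruction in Decode reads that LW's destination register (RtE), everything in F and D is frozen for one cycle and a NOP is sent into Execute.

```c
#include <stdbool.h>

struct stall_signals {
    bool stall_f;   /* freeze the PC */
    bool stall_d;   /* freeze the F/D register */
    bool flush_e;   /* clear the D/E register (insert a NOP) */
};

/* Load-use stall detection (hypothetical sketch). */
struct stall_signals lw_stall(unsigned rs_d, unsigned rt_d,
                              unsigned rt_e, bool mem_to_reg_e)
{
    bool stall = (rs_d == rt_e || rt_d == rt_e) && mem_to_reg_e;
    struct stall_signals s = { stall, stall, stall };
    return s;
}
```

For the example, with LW $8, 0($9) in Execute (RtE = 8) and ADD $16, $17, $8 in Decode (RtD = 8), all three signals are 1; one cycle later, when SUB is in Decode and LW has moved on, they are all 0.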

Hardware changes to manage control hazards
ENCM 369 will NOT cover this material in depth, and there
will be NO lab exercises or midterm or final exam questions on
it!
The Figure 7.53 processor is excellent regarding data hazards,
but handles BEQ instructions poorly—three instructions follow
a BEQ into the pipeline before the branch decision gets made.
Why does that happen? The Figure 7.53 processor makes the
branch decision in the Memory stage. (Check the location of
the AND gate . . . )
Redesign to make branch instructions work better
The processor of Figure 7.56 moves the branch decision from the Memory stage to the Decode stage, and the branch target address generation from the Execute stage to the Decode stage.
So, only one instruction follows BEQ into the pipeline
before a branch is taken, which is better, but . . .
. . . making the decision in the Decode stage causes new data
hazards!
Redesign to make branch instructions work better, continued
Example #6 from the “Hazard Examples” document, with an
extra instruction . . .
LW $17, 0($4)
BEQ $17, $0, some_other_label
ADD $2, $5, $6
What is needed to get the LW result into the Decode step of BEQ?
If the branch is taken, what should happen to ADD? (Assume
that we’re designing a computer that does NOT have a
delayed branch rule.)
Redesign to make branch instructions work better: Remarks
It’s hard to process branches with perfect accuracy without losing lots of cycles due to hazards!
Therefore, dynamic branch prediction can save a lot of cycles
if most guesses are correct.
Also, conditional instructions such as MIPS movn and movz
are sometimes better choices than branch instructions.

Exceptions
Exceptions: General Concepts
An exception is an event that changes the flow of instructions in a way that is quite different from a branch or jump.
So, obviously, an exception causes a special kind of PC update.
But an exception can also cause a change in privilege—a
switch from a user program to operating system kernel
software.
Privilege: user program vs. kernel
A user program has rights to read and write memory allocated to that program and to read and write registers. That’s all it can do by itself, but it can also ask for help from the kernel.
The kernel controls hardware like disks and network interfaces.
The kernel has power over all memory in the computer and
can start and stop all other programs.
Two meanings for “exception”
The concept of an exception in discussion of hardware or assembly language code is NOT THE SAME as the concept of an exception in a high-level language like C++, Java, or Python!
Exception-related keywords in C++: try, catch, throw
Exception-related keywords in Java: try, catch, finally, throw, throws
Exception-related keywords in Python: try, except, raise
Two meanings for “exception”, continued
High-level language exception: a special kind of jump (possibly involving a return through one or more procedure calls) to code that is set up to handle an error condition.
Do NOT try to connect the above concept to hardware exceptions. If you do, your brain will hurt and your understanding of both kinds of exceptions will be damaged.
Exceptions in Hardware and Assembly Language: 3 Main Categories
1. The processor notices that a program has tried to do a bad thing.
2. A program intentionally generates the exception.
3. “Interrupts”: hardware external to the processor sends a signal to the processor asking for attention.
Examples of program-trying-a-bad-thing exceptions
Instruction fetched with an opcode that does not make sense to the processor (“undefined instruction”).
Addition or subtraction of integers resulted in overflow (e.g.,
MIPS ADD, SUB, ADDI, but not ADDU, SUBU, ADDIU).
Attempt to access memory a program is not permitted to
access.
Attempt to access memory with invalid address (e.g., LW data
address is not a multiple of 4).
(Note: Memory units in Chapter 7 computers don’t have the
capability to report memory access errors, but memory
systems in real computers usually do.)
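Two of these checks are simple enough to write out. The C sketch below uses hypothetical function names. It shows the condition under which a signed add such as ADD would raise an overflow exception (ADDU would simply keep the wrapped-around sum) and the word-alignment condition for an LW or SW address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Would a 32-bit signed add (MIPS ADD) overflow? Overflow happens exactly
   when both operands have the same sign and the sum's sign differs.
   The sum is computed with unsigned wrap-around, like ADDU. */
bool add_overflows(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t sum = ua + ub;
    /* Bit 31 of ~(ua ^ ub) is 1 if the signs agree;
       bit 31 of (ua ^ sum) is 1 if the sum's sign differs from a's. */
    return ((~(ua ^ ub) & (ua ^ sum)) >> 31) != 0;
}

/* LW and SW need a word-aligned address: the low two bits must be 0. */
bool lw_address_misaligned(uint32_t addr)
{
    return (addr & 3u) != 0;
}
```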
Programs intentionally causing exceptions
This mainly happens with system calls.
Examples: MIPS syscall instruction, similar instructions in
other instruction sets. A user program asks the operating
system kernel to provide a service.
Examples of Interrupts
Laptop user presses key on keyboard.
Desktop user moves a mouse.
Smartphone or tablet user taps finger on a touchscreen.
A data packet arrives on a network interface.
A disk controller reports that a write operation on a disk has
completed.
What happens when an exception occurs?
The processor will start executing instructions that form an exception handler (like a procedure, but not exactly the same).
Before starting the exception handler, the processor must
record some essential information in some special-purpose
registers . . .
Let’s make some notes about this essential information.
Exceptions and pipelines
Due to time limitations and lack of textbook support, we will not look in detail at this topic, just give a quick sketch.
Useful terms: exception victim and flushing.
The victim of an exception is either the instruction that
caused the exception or, when there is an interrupt, the first
instruction in the pipeline that will not be allowed to
complete.
To flush an instruction in a pipeline means ensuring that the
instruction does not update system state, such as register file
or memory contents.
Exceptions and pipelines: key challenges
Instructions that enter a pipeline before the victim must be allowed to complete.
The victim and the instructions that followed the victim in the
pipeline must be flushed.
The address of the victim must be identified—NOT easy,
because in a pipelined system, the PC probably will NOT be
pointing to the victim.
Example of MIPS exception processing
# Code running in a 5-stage pipeline, an
# actual MIPS computer,
# not a Ch. 7 machine!
andi $t2, $s4, 0xFF
sll $t3, $t2, 8
or $s2, $s2, $t3
lw $t1, ($t0)
addiu $t0, $t0, 4
sw $t1, ($s0)
addiu $s0, $s0, 4
slt $t4, $t0, $s7
Suppose there is an exception when the LW instruction (at address 0x0040_0090) is in the Memory stage. What should happen?
Scenario 1: Exception is caused by $t0 not being a multiple of 4.
Scenario 2: Exception is an interrupt, unrelated to this program.
