
Slide Set 7 for Lecture Section 01

for ENCM 369 Winter 2017

Steve Norman, PhD, PEng

Electrical & Computer Engineering


Schulich School of Engineering
University of Calgary

February 2017
ENCM 369 Winter 2017 Slide Set 7 for Lecture Section 01 slide 2/86

Contents
The multicycle processor (textbook Section 7.4)

Introduction to Pipelining

5 pipeline stages for our MIPS subset

Pipeline Hazards

Making pipelining work in hardware

Hardware features to manage data hazards

Hardware changes to manage control hazards

Exceptions

The multicycle processor (textbook Section 7.4)

ENCM 369 will not cover Section 7.4 in detail, because terms
at Canadian universities are short!
That’s too bad, because the multicycle design has some
interesting aspects . . .
• It shows how a computer can use a single memory array for both instructions and data.
• It makes very efficient use of the ALU: the ALU gets used to compute three different results for every instruction.
• The control unit is sequential: it’s a really nice and practical example of a finite state machine (FSM).
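To make the FSM remark concrete, here is a minimal C sketch of next-state logic for a multicycle control unit restricted to our subset (R-type, LW, SW, BEQ). The state names and the `InstrClass` type are hypothetical, chosen to resemble the textbook’s multicycle FSM rather than copied from it, and the sketch models only the state transitions, not the control signals asserted in each state.

```c
/* Hypothetical classes of instructions in the MIPS subset. */
typedef enum { CLASS_RTYPE, CLASS_LW, CLASS_SW, CLASS_BEQ } InstrClass;

/* Hypothetical FSM states; each state lasts one clock cycle. */
typedef enum {
    S_FETCH, S_DECODE,
    S_MEMADR, S_MEMREAD, S_MEMWB, S_MEMWRITE,   /* LW and SW */
    S_EXECUTE, S_ALUWB,                          /* R-type */
    S_BRANCH                                     /* BEQ */
} State;

/* Next-state logic: after Decode, the path depends on the instruction. */
static State next_state(State s, InstrClass c)
{
    switch (s) {
    case S_FETCH:
        return S_DECODE;
    case S_DECODE:
        if (c == CLASS_LW || c == CLASS_SW)
            return S_MEMADR;
        return (c == CLASS_RTYPE) ? S_EXECUTE : S_BRANCH;
    case S_MEMADR:
        return (c == CLASS_LW) ? S_MEMREAD : S_MEMWRITE;
    case S_MEMREAD:
        return S_MEMWB;
    case S_EXECUTE:
        return S_ALUWB;
    default:
        /* S_MEMWB, S_MEMWRITE, S_ALUWB, S_BRANCH: the instruction is done. */
        return S_FETCH;
    }
}

/* Count clock cycles from entering Fetch until the FSM returns to Fetch. */
static int cycles_for(InstrClass c)
{
    State s = S_FETCH;
    int cycles = 0;
    do {
        s = next_state(s, c);
        cycles++;
    } while (s != S_FETCH);
    return cycles;
}
```

Under these assumptions, `cycles_for` reports 5 clock cycles for LW, 4 for SW and R-type, and 3 for BEQ: unlike the single-cycle design, the number of cycles depends on the kind of instruction.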

Introduction to Pipelining
Before we start to learn about pipelining, let’s review a model
we will call the one-instruction-at-a-time model:

Step 1: Processor reads instruction from memory and updates PC.
Step 2: Processor executes the instruction.
The processor performs Step 1, Step 2, Step 1, Step 2, . . . , forever (or until the power is turned off).

This model correctly predicts the results produced by sequences of instructions in assembly language code.
Also, the model accurately describes the organization of the processors of textbook Sections 7.3 and 7.4.
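The model is easy to express as a loop. The C sketch below is a toy, not a MIPS simulator: the `Instr` struct, the `OP_HALT` opcode standing in for “power turned off”, and the restriction to ADD and SUB are all assumptions made to keep it short.

```c
#include <stdint.h>

typedef enum { OP_ADD, OP_SUB, OP_HALT } Op;   /* toy opcodes (assumption) */

typedef struct {
    Op op;
    int rd, rs, rt;   /* register numbers, as in ADD rd, rs, rt */
} Instr;

/* Run a program one instruction at a time, starting at address 0.
   imem is indexed by word, so the instruction at address pc is imem[pc / 4]. */
static void run(const Instr *imem, int32_t gpr[32])
{
    uint32_t pc = 0;
    for (;;) {
        /* Step 1: read the instruction from memory and update the PC. */
        Instr in = imem[pc / 4];
        pc += 4;

        /* Step 2: execute the instruction. */
        switch (in.op) {
        case OP_ADD:  gpr[in.rd] = gpr[in.rs] + gpr[in.rt]; break;
        case OP_SUB:  gpr[in.rd] = gpr[in.rs] - gpr[in.rt]; break;
        case OP_HALT: return;
        }
        gpr[0] = 0;   /* $zero always reads as 0 */
    }
}
```

Each trip around the loop finishes one instruction before the next one is fetched, which is exactly what a pipelined processor must appear to do.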

The one-instruction-at-a-time model and modern processors

The model DOES NOT accurately describe the organization of modern processors!
At a given moment in time, a modern processor will be working on many different instructions. This allows much greater speed than one-instruction-at-a-time processing.
However, the processor must produce results as if instructions were being handled one at a time.

Remark: Your instructor thinks that “as if” is a very short and very useful summary of many of the important ideas related to modern computer system designs.
Modern processor chips often process instructions in ways that
are hard for humans to understand, but nevertheless do what
skilled coders want in time- and energy-efficient ways.

The Laundry Analogy

This analogy is taken from Computer Organization and Design, by David Patterson and John Hennessy, which was the ENCM 369 textbook for many years.
You have many loads of laundry to do, with these four resources:
• a washing machine
• a dryer
• a “folding unit” (you)
• a “putting-away unit” (your roommate)

(In real life not very many students would ask their roommates
to put away laundry for them, but let’s just follow Patterson
and Hennessy here.)

The Laundry Analogy, continued

Let’s assume that each step in processing laundry takes 30 minutes. (In real life, this is close to correct for washers but unfortunately not at all correct for dryers.)
Suppose you have four loads of dirty laundry.
If you process each load completely before starting the next,
how long does it take to finish all four loads?

Processing four loads of laundry, one at a time . . .


Load
1st   W  D  F  PA
2nd               W  D  F  PA
3rd                           W  D  F  PA
4th                                       W  D  F  PA
      6:00pm      8:00pm      10:00pm     midnight    2:00am
Time

The work takes EIGHT HOURS in total! And each resource (washer, dryer, etc.) is IDLE for three-quarters of the time.
There is an obvious way to speed this up . . .

Processing four loads of laundry, making better use of resources . . .
As soon as one load is out of the washer, the washer is free for the next load. The same is true for all of the other resources.
So we can schedule the work this way . . .

Load
1st   W  D  F  PA
2nd      W  D  F  PA
3rd         W  D  F  PA
4th            W  D  F  PA
      6:00pm      8:00pm      10:00pm     midnight    2:00am
Time
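The arithmetic behind the two schedules, as a small C sketch assuming every step takes exactly the same time and no load ever waits for a busy resource (the function names are illustrative):

```c
/* One at a time: each load goes through every step before the next starts. */
static int sequential_minutes(int loads, int stages, int step_min)
{
    return loads * stages * step_min;
}

/* Pipelined: the first load needs `stages` steps; after that, one more
   load finishes at the end of every step. */
static int pipelined_minutes(int loads, int stages, int step_min)
{
    return (stages + loads - 1) * step_min;
}
```

For four loads and four 30-minute steps, `sequential_minutes(4, 4, 30)` is 480 minutes (eight hours) and `pipelined_minutes(4, 4, 30)` is 210 minutes, so the last load is put away at 9:30pm. With many loads, the pipelined schedule approaches one finished load every 30 minutes.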

The concept of pipelining in digital logic design

A pipelined system is a collection of stages, each with a simple role to perform.
When a stage is finished producing its current output, it can
pass that output to the next stage and receive new input.
In the laundry analogy, the washer stage receives a load of
dirty clothes as input, and produces a load of wet, clean
clothes as output, which gets passed as input to the dryer
stage.
In Harris and Harris, this year’s textbook, pipelining is
introduced in Section 3.6, along with an analogy to baking
cookies. That section is short and worth reading.

Pipelined execution of instructions

In a pipelined processor, an instruction is like a single load of laundry. Processing an instruction can start long before processing of the preceding instruction is finished.
To divide the work of processing an instruction across a number of pipeline stages, that work has to be broken down into simple steps that take roughly equal amounts of time.
Each step must fit into a single clock cycle.

5 pipeline stages for our MIPS subset


The subset is: ADD, SUB, SLT, AND, OR, LW, SW, BEQ.
The stages are:
• Fetch: Read instruction from I-Mem and update PC.
• Decode: Determine outputs of Control Unit and read GPRs from R-File.
• Execute: Get a result from the ALU.
• Memory: D-Mem access for loads and stores.
• Writeback: Write to a GPR at the end of a load or an R-type instruction.
In some stages, for some instructions, nothing happens. What
are some examples of that?

Example sequence of instructions in our 5-stage MIPS subset pipeline

# This sequence is not practical code,
# but it makes for a simple example.
ADD $t2, $t0, $t1
LW $t4, ($t3)
SW $t5, ($t6)
SUB $t9, $t7, $t8
Let’s suppose we have a 1 GHz clock, so the clock period is
1 ns. How long will it take from the beginning of the ADD
instruction to the end of the SUB instruction?

Pipelined processing for example instruction sequence

ADD    IF    ID    EX    MEM   WB
LW           IF    ID    EX    MEM   WB
SW                 IF    ID    EX    MEM   WB
SUB                      IF    ID    EX    MEM   WB
       0 ns  1 ns  2 ns  3 ns  4 ns  5 ns  6 ns  7 ns  8 ns
time
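The diagram follows a simple rule when nothing stalls: instruction i (counting from 0) is in stage s during cycle i + s. A C sketch of that rule, assuming an ideal pipeline with no hazards (the function names are illustrative):

```c
#include <stddef.h>

static const char *const STAGES[] = { "IF", "ID", "EX", "MEM", "WB" };
#define NSTAGES 5

/* Instruction i enters IF in cycle i. Return the stage it occupies during
   cycle t, or NULL if it is not in the pipeline during that cycle. */
static const char *stage_at(int i, int t)
{
    int s = t - i;
    return (s >= 0 && s < NSTAGES) ? STAGES[s] : NULL;
}

/* Cycles from the start of the first instruction to the end of the last. */
static int total_cycles(int n_instr)
{
    return NSTAGES + n_instr - 1;
}
```

With 4 instructions, `total_cycles(4)` is 8, which at 1 ns per cycle gives the 8 ns span shown in the diagram.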
The single-cycle processor starts one instruction per clock cycle.
A pipelined processor also starts one instruction per clock cycle.
Why will a pipelined design allow much greater instruction
throughput? (The diagram below provides a hint at the answer!)
[Figure: timing diagram with traces for CLK, PC output, Instruction, main decoder outputs, R-File outputs, ALU decoder outputs, ALU result, D-Mem RD output, and $s1 contents]

An example 3-instruction sequence in a pipelined processor

The sequence is . . .

lw $t0, 20($t1)
or $t2, $t3, $t4
sw $t5, 40($t6)

Let’s use the “Pipeline Basics” handout to track all the steps
in processing these instructions.

Pipeline Hazards

These can be defined as situations that prevent throughput of one instruction per clock cycle.
There are three main kinds: structural hazards, data hazards,
and control hazards.

Structural Hazards

A structural hazard occurs when a unit within a computer is asked to do two (or more) incompatible things at the same time.
Example: In a computer with a single memory unit, the
processor can’t do the Fetch step of one instruction while also
doing the Memory step of an earlier LW or SW instruction.

Solution to Structural Hazards

Design the instruction set and hardware so that this kind of hazard does not occur.
Example: Have separate Instruction and Data Memories, so
Fetch can be simultaneous with Memory of an earlier
instruction.
(Note: When we get to textbook Chapter 8, we’ll see that for
modern processors, separation of instructions and data really
means having separate caches for instructions and data.)

Data Hazards: Are inputs to instructions up-to-date?

Example:
add $t0, $t1, $t2
sub $t4, $t3, $t0
The destination of ADD is a source for SUB.
The Writeback step of ADD will happen later than the
Decode step of SUB, so there is a risk that SUB will use old,
wrong data from $t0.
Remember, the processor must produce results as if one
instruction completes before the next instruction starts!

Control Hazards: What instruction address should be used in the next Fetch?

Example . . .

beq $t0, $t1, L1
and $t4, $t2, $t3
. . . more instructions . . .
L1: lw $t5, ($t6)

Which instruction should be fetched after BEQ is fetched? AND or LW? The processor will not know until the $t0 == $t1 comparison is done!

Assumption about Register File in textbook Section 7.5, related to data hazards

Writes to the Register File occur in the first half of a clock cycle, and reads from the Register File occur in the second half.
To enable this behaviour, what choices can be made about
flip-flops, Data Memory, and other clocked components?
What are the consequences regarding GPR reads and writes
that happen within the same clock cycle?

Edge-triggering for pipelined computers in Section 7.5
Updates to GPRs in the Register File happen in response to negative clock edges.

[Figure: system clock waveform, alternating between 1 and 0]

Updates to PC, Data Memory, and pipeline registers happen in response to positive clock edges.

This is NOT applicable to the single-cycle design of Section 7.3 and the multicycle design of Section 7.4!
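The consequence of this timing for a GPR that is written and read in the same clock cycle can be modelled in C. The `RegFile` type and `rf_cycle` function below are hypothetical, and the sketch collapses one clock cycle into two ordered steps instead of modelling real clock edges.

```c
#include <stdint.h>

typedef struct { int32_t r[32]; } RegFile;

/* One clock cycle of a Section 7.5-style Register File:
   first half  -> the write completes at the negative clock edge,
   second half -> the read port returns the (possibly just-written) value. */
static int32_t rf_cycle(RegFile *rf, int we, int wreg, int32_t wdata, int rreg)
{
    if (we && wreg != 0)      /* $zero can never be changed */
        rf->r[wreg] = wdata;
    return rf->r[rreg];
}
```

Because the write happens first, a read of the same GPR in the same cycle returns the new value, and writes to $zero are ignored.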

Review: 5 pipeline stages for our MIPS subset

Fetch: Read instruction from Instruction Memory; do PC = PC + 4.
Decode: Determine Control Unit outputs appropriate for
instruction opcode; copy two GPR values out of Register File.
Execute: Do computation in ALU.
Memory: Read or write Data Memory.
Writeback: Update a GPR in the Register File.

Solutions to data hazards, first of three: stalling the pipeline

Example:
add $t0, $t1, $t2
sub $t4, $t3, $t0
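If stalling is the only tool, the number of bubbles depends on how far apart the two instructions are. This C sketch computes it, assuming the 5-stage pipeline above, no forwarding, and the Section 7.5 Register File that is written in the first half of a cycle and read in the second half (so the consumer’s Decode may share a clock cycle with the producer’s Writeback). The function name and the distance parameter are illustrative, not from the textbook.

```c
/* distance = how many instructions later the consumer is than the producer
   (1 means the consumer immediately follows the producer).

   The producer, fetched in cycle 0, is in Writeback in cycle 4.
   The consumer, fetched in cycle `distance`, would be in Decode in cycle
   distance + 1. Decode must not come before Writeback, so the consumer
   needs 4 - (distance + 1) = 3 - distance stall cycles, if that is positive. */
static int stalls_without_forwarding(int distance)
{
    int stalls = 3 - distance;
    return stalls > 0 ? stalls : 0;
}
```

For the ADD/SUB pair above the distance is 1, so under these assumptions SUB must stall for 2 cycles.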

Solutions to data hazards, second of three: forwarding

Example A, from previous slide:
add $t0, $t1, $t2
sub $t4, $t3, $t0

Example B:
lw $t0, ($t1)
add $t2, $t2, $t3
slt $t6, $t0, $t5

Solutions to data hazards, third of three: combine stalling and forwarding

Example:
lw $t0, ($t1)
add $t3, $t0, $t2

Can forwarding by itself solve this data hazard?



Control Hazards (repeat of earlier example)

What instruction address should be used in the next Fetch step after the Fetch step of a branch instruction? Example . . .

beq $t0, $t1, L1
and $t4, $t2, $t3
. . . more instructions . . .
L1: lw $t5, ($t6)

Which instruction should be fetched after BEQ is fetched? AND or LW? The processor will not know until the $t0 == $t1 comparison is done!

Control hazard illustration

BEQ               F  D  E  M  W
next instruction     F  D  E  M  W

Here “next” means “next in time”, not necessarily “next location in Instruction Memory”.
Why will it be difficult to do the Fetch step for the next instruction just one clock cycle after the Fetch step for BEQ? (There are multiple reasons.)

Four kinds of solutions for control hazards

1. Stall: Delay the Fetch step for the next instruction until
the address of the next instruction is known.
2. Predict: Guess what the address of the next instruction
will be, and act on the guess without delay. Check that
the guess was correct; if not, cancel instructions that
have incorrectly entered the pipeline.
3. Delayed branch and jump rules
4. Conditional instructions

Dynamic branch prediction

This is widely used in modern processors (but mostly not used in low-power embedded processors).
A large and complex branch prediction circuit is dedicated to
recording information about recently-encountered branch
instructions.
For each branch instruction, its target address is recorded
along with a prediction about whether the branch will be
taken.

Dynamic branch prediction, continued

When a branch instruction is encountered, the branch prediction circuit can quickly supply a guess for the next PC value, and instruction fetch can occur without delay.
If a guess is wrong, some instructions will have to be
cancelled, and clock cycles will be lost.
This system is called dynamic because a taken/not-taken
prediction will be changed if it has recently been more often
wrong than right.
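One common way to get this behaviour is a 2-bit saturating counter for each branch. The slide does not say which scheme a given processor uses, so treat this C sketch as an illustration of one possibility; it models a single counter only, not the table that also records target addresses, and the names are hypothetical.

```c
#include <stdbool.h>

/* 2-bit saturating counter: states 0 and 1 predict not taken,
   states 2 and 3 predict taken. */
typedef struct { unsigned state; } Counter2;

static bool counter_predict_taken(const Counter2 *c)
{
    return c->state >= 2;
}

/* After the branch resolves, move one step toward the actual outcome,
   saturating at 0 and 3. From a "strong" state (0 or 3), a single wrong
   guess does not flip the prediction; two in a row do. */
static void counter_update(Counter2 *c, bool taken)
{
    if (taken && c->state < 3)
        c->state++;
    else if (!taken && c->state > 0)
        c->state--;
}
```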

Branch prediction code example

p and past_last are of type int*. count is an int.

do {
    if (*p < 0)
        count++;
    p++;
} while (p != past_last);

p walks through an array of int elements, and count records how many of those elements are negative.
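A runnable version of the loop, wrapped in a hypothetical function `count_negatives`. Like the original do-while, it assumes the array has at least one element, since the body runs once before `p != past_last` is first tested.

```c
/* Count negative elements in [p, past_last). Requires p != past_last. */
static int count_negatives(const int *p, const int *past_last)
{
    int count = 0;
    do {
        if (*p < 0)
            count++;
        p++;
    } while (p != past_last);
    return count;
}
```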

Branch prediction code example, continued

Let’s suppose that there are a lot of array elements, and most of them are negative . . .

L1: lw    $t0, ($a0)
    slt   $t1, $t0, $zero    # $t1 = (*p < 0)
    beq   $t1, $zero, L2     # branch if !(*p < 0)
    addiu $t9, $t9, 1        # count++
L2: addiu $a0, $a0, 4        # p++
    bne   $a0, $t8, L1       # branch if p != past_last

As the processor runs the loop, what predictions will it make about the BEQ and BNE instructions?

Delayed branch and jump rules

This kind of solution to control hazards is older and less sophisticated than branch prediction.
This is a feature of the real MIPS instruction set, but is NOT
enabled by default in MARS and other MIPS simulators used
for education!
The idea is that one instruction of useful work can get started
in the clock cycle needed to make a branch decision and
compute a branch or jump target address.
Details are in the two paragraphs at the bottom of the page in
the “Control Hazard Solutions” document.

MIPS delayed branch example

What will the flow of instructions be if $t0 != 0? What will it be if $t0 == 0?

slt operands
beq $t0, $zero, L1
add operands
lw operands
L1: or operands
sub operands
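Under the MIPS delayed-branch rule, the instruction right after a branch (in its delay slot) always runs, whether or not the branch is taken. The C sketch below replays the example above and records the order in which instructions run. Instructions are represented only by their mnemonics, and the positions of BEQ and L1 are hard-coded; both are simplifications for illustration.

```c
/* The example in program order; index 4 is the instruction labelled L1. */
static const char *const PROGRAM[] = { "slt", "beq", "add", "lw", "or", "sub" };
enum { N_PROGRAM = 6, BEQ_INDEX = 1, L1_INDEX = 4 };

/* Record the dynamic instruction order for a given value of $t0.
   out must have room for N_PROGRAM entries. Returns the count executed. */
static int trace_delayed_branch(int t0, const char *out[])
{
    int n = 0;
    int pc = 0;
    while (pc < N_PROGRAM) {
        out[n++] = PROGRAM[pc];
        if (pc == BEQ_INDEX) {
            int taken = (t0 == 0);          /* beq $t0, $zero, L1 */
            out[n++] = PROGRAM[pc + 1];     /* delay slot always runs */
            pc = taken ? L1_INDEX : pc + 2;
        } else {
            pc++;
        }
    }
    return n;
}
```

With $t0 == 0 the trace is slt, beq, add, or, sub; with $t0 != 0 it is slt, beq, add, lw, or, sub. ADD runs in both cases.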

Examples of MIPS delayed jumps

# C code: i = foo(17); . . . suppose i is in $s0.
jal foo
addiu $a0, $zero, 17 # Argument set up after call starts!
addu $s0, $v0, $zero # $ra points to this instruction.

# Example return from nonleaf procedure . . .
jr $ra
addiu $sp, $sp, 32 # Deallocate stack after return starts!

If you ever do assembly-language programming for real MIPS processors, or need to read MIPS compiler output, be aware of delayed branches and jumps!

Conditional instructions

Suppose this if-else code is inside a loop.

if (a < b)
    c = a;
else
    c = b;

Translating this with a branch and a jump could cause a lot of lost clock cycles, especially if branch prediction does a poor job.

Suppose that a, b, and c are ints in $s0, $s1, $s2. Let’s see
how this can be coded with MIPS “move conditional”
instructions movn and movz.
By the way, ARM instruction sets have very rich collections of
conditional instructions.
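For reference, the MIPS semantics are: movn rd, rs, rt copies rs into rd if rt != 0, and movz rd, rs, rt copies rs into rd if rt == 0; otherwise rd is unchanged. The C sketch below models those semantics and one possible branch-free sequence (slt, movn, movz) for the if-else above. It models what the instructions do; it is not a claim about what a C compiler would emit, and the function names are hypothetical.

```c
#include <stdint.h>

/* movn rd, rs, rt : if rt != 0, rd = rs */
static void movn(int32_t *rd, int32_t rs, int32_t rt) { if (rt != 0) *rd = rs; }

/* movz rd, rs, rt : if rt == 0, rd = rs */
static void movz(int32_t *rd, int32_t rs, int32_t rt) { if (rt == 0) *rd = rs; }

/* c = (a < b) ? a : b, mirroring
     slt  $t0, $s0, $s1     # $t0 = (a < b)
     movn $s2, $s0, $t0     # if a < b,    c = a
     movz $s2, $s1, $t0     # if !(a < b), c = b   */
static int32_t select_min(int32_t a, int32_t b)
{
    int32_t c = 0;            /* stands in for the old contents of $s2 */
    int32_t t0 = (a < b);     /* slt */
    movn(&c, a, t0);
    movz(&c, b, t0);
    return c;
}
```

Exactly one of the two moves takes effect, so no branch decision is needed and no fetch has to wait for one.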

Making pipelining work in hardware

The textbook presents a sequence of designs, from Figure 7.45 to Figure 7.58.
The earliest designs are incomplete and incorrect in many
ways.
Later designs get closer and closer to being complete and
correct.
Recommendation: Read Sections 7.5.1 through 7.5.3
carefully and observe how new features get added and
existing features get modified.

Remarks on Textbook Figure 7.47

This computer handles R-type, LW, and SW instructions correctly, except when there are data hazards.
It makes an attempt to handle BEQ, but doesn’t get it right.
This computer works as if three delay-slot instructions should
be processed before a branch is taken.

D flip-flops: What’s the point? (Repeat slide from Slide Set 6)
This is important! Knowing what a D flip-flop does is as important as knowing the truth tables for NOT, AND, and OR.
A clock cycle is a span of time from one active edge of a clock to the next active edge.

A D flip-flop captures the value of the input bit D at the end of a clock cycle, and makes that captured bit value available on Q throughout the next clock cycle.

Pipeline registers

Prominent in all of the Section 7.5 designs are pipeline registers made of D flip-flops.
The pipeline registers are not 32 bits wide—they’re much
wider than that.
They have clock inputs; the register outputs change only on
active clock edges.
At the end of each clock cycle, each pipeline register collects
information from one pipeline stage, and makes that
information available to the next stage throughout the next
clock cycle.

A sketch of a pipelined datapath

This is essentially textbook Figure 7.46 with the wiring removed to reduce clutter. Note the highlighted pipeline registers!

[Figure: PC, I-Mem, R-File, SignExt, <<2, two adders (one with a constant input of 4), ALU, and D-Mem, divided into the F, D, E, M, and W stages by the F/D, D/E, E/M, and M/W pipeline registers, all clocked by CLK]

Review: Edge-triggering for pipelined computers in Section 7.5

Updates to GPRs in the Register File happen in response to negative clock edges.

[Figure: system clock waveform, alternating between 1 and 0]

Updates to PC, Data Memory, and pipeline registers happen in response to positive clock edges.

Tracing an instruction through the datapath of Figure 7.46

Let’s trace an R-type instruction: SLT $2, $4, $5. We’ll assume that this instruction is located at address 0x0040_0030 in Instruction Memory.
For now, we’ll look at the datapath only.
We’ll consider control later, after we have seen the whole datapath.

SLT $2, $4, $5 located at 0x0040_0030: F stage

[Figure: Fetch stage: a mux (inputs 0 and 1) selects the next PC from PCPlus4F and PCBranchM (from M stage); the PC output PCF addresses I-Mem; an adder computes PCPlus4F from the PC and 4; the instruction and PCPlus4F go to the F/D pipeline register]

How many DFFs are there in the F/D register?
What values get written into the F/D register at the end of the Fetch clock cycle of the SLT?

SLT $2, $4, $5 located at 0x0040_0030: D stage


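[Figure: Decode stage: InstrD from the F/D pipeline register supplies bits 25:21 and 20:16 to the R-File read ports, bits 20:16 and 15:11 as possible destination register numbers, and bits 15:0 to SignExt; PCPlus4D passes through; WriteRegW and ResultW arrive at the R-File write port (WE3); outputs go to the D/E pipeline register]

How many DFFs are there in the D/E register?
What gets into the D/E register at the end of the Decode clock cycle?
What is going on with WriteRegW and ResultW?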

SLT $2, $4, $5 located at 0x0040_0030: E stage


[Figure: Execute stage: from the D/E pipeline register, SrcAE goes to the ALU; a mux (0/1) selects SrcBE from WriteDataE and SignImmE; a mux (0/1) selects WriteRegE from RtE and RdE; SignImmE shifted left by 2 (<<2) is added to PCPlus4E; results go to the E/M pipeline register]

How many DFFs are there in the E/M register?
For the SLT instruction, what useful information gets written into the E/M register at the end of the Execute clock cycle?
What useful information gets written into the E/M register in the cases of LW, SW, and BEQ?

SLT $2, $4, $5 located at 0x0040_0030: M stage

[Figure: Memory stage: from the E/M pipeline register, ALUOutM addresses D-Mem and WriteDataM supplies the write data (WE input); ZeroM, WriteRegM, and PCBranchM are also available; results go to the M/W pipeline register]

How many DFFs are there in the M/W register?
For the SLT instruction, what useful information gets written into the M/W register at the end of the Memory clock cycle?
What happens in the M stage for LW, SW, and BEQ?

SLT $2, $4, $5 located at 0x0040_0030: W stage


[Figure: Writeback stage: a mux (0/1) selects ResultW from ALUOutW and ReadDataW, both taken from the M/W pipeline register, which also supplies WriteRegW]

For the SLT instruction, what happens in the Writeback stage? Let’s draw part of a schematic to help explain it.
What would be the same and what would be different for an LW instruction in the W stage?

Pipelined control for the Figure 7.46 datapath


Perhaps surprisingly, we can use exactly the same control unit that was designed for the single-cycle machine.
We can drop the Control Unit into the Decode stage.
However, now we must organize the control signals so that each one arrives at the correct time wherever it is needed on the datapath! For example . . .
• Q1: RegWrite = 1 is generated for LW. When should that value of RegWrite arrive at the R-File?
• Q2: MemWrite = 1 is generated for SW. When should that value of MemWrite arrive at D-Mem?
Q3: What general method can we use to get the timing correct for all of the control signals?
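One way to picture the answer to Q3 is to let each control bit ride through the pipeline registers alongside its instruction. The C sketch below assumes a simplified set of four signals stored in every register; in the textbook’s design, a signal is dropped once the stage that uses it has been passed (for example, ALUControl does not go past Execute). The type and function names are hypothetical.

```c
#include <stdbool.h>

/* A few of the control signals produced by the Control Unit in Decode. */
typedef struct {
    bool RegWrite, MemtoReg, MemWrite, Branch;
} Ctrl;

/* Control bits held in each pipeline register (simplified). */
typedef struct {
    Ctrl E;   /* D/E register: control for the instruction in Execute   */
    Ctrl M;   /* E/M register: control for the instruction in Memory    */
    Ctrl W;   /* M/W register: control for the instruction in Writeback */
} CtrlPipe;

/* One positive clock edge. Copying from the back of the pipeline forward
   mimics all registers capturing their inputs at the same instant, so each
   instruction's control bits advance one stage together with it. */
static void clock_edge(CtrlPipe *p, Ctrl from_decode)
{
    p->W = p->M;
    p->M = p->E;
    p->E = from_decode;
}
```

After an LW is decoded, three clock edges carry its RegWrite = 1 into the M/W register, so under this model it reaches the R-File during the Writeback stage of that same LW.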

Control circuit for pipelined datapath of Figure 7.46

[Figure: the Control Unit in the Decode stage takes the opcode (Instr bits 31:26) and funct (Instr bits 5:0) fields and produces RegWriteD, MemtoRegD, MemWriteD, BranchD, ALUControlD, ALUSrcD, and RegDstD. These are carried through the D/E, E/M, and M/W pipeline registers as RegWriteE/M/W, MemtoRegE/M/W, MemWriteE/M, BranchE/M, ALUControlE, ALUSrcE, and RegDstE. BranchM is combined with ZeroM (from ALU) to produce PCSrcM, and RegWriteW goes to the R-File.]

Let’s make a few notes about how this circuit works.



How much progress have we made so far?

Reminder: processor designs near the beginning of Section 7.5 are incomplete and partly incorrect.
Processor designs get better and better as corrections and
improvements are made.
The datapath and control system we have just looked at in
detail are combined in the textbook in the computer of
Figure 7.47. That computer can’t deal with data hazards and
handles BEQ incorrectly.

Hardware features to manage data hazards

Let’s start by reviewing two of the more complicated kinds of data hazard.

For example #2 of the “Hazard Examples” document . . .

first ADD    F  D  E  M  W
second ADD      F  D  E  M  W
SUB                F  D  E  M  W

Let’s illustrate why forwarding by itself won’t work for example #4 in “Hazard Examples” . . .

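Hardware for forwarding: This incomplete sketch of an upgraded Execute stage allows a lot of choice for ALU A and B inputs!

[Figure: Execute stage with a 3-input mux (select codes 00, 01, 10) in front of each ALU input. Each mux chooses among the GPR value from the ID/EX pipeline register (00) and the forwarded values ALUOutM and ResultW. The 2-bit selects ForwardAE and ForwardBE come from the Hazard Unit. On the B side, the forwarding mux output also becomes WriteDataE and feeds the ALUSrcE mux (0/1), which chooses between it and the LW/SW offset.]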

Hardware for forwarding, continued:

Q1: What should the values of ForwardAE and ForwardBE be in the case where no forwarding is needed?
Consider this sequence:
LW R8, 0(R4)
AND R9, R10, R11
SUB R12, R8, R9
Q2: What should the values of ForwardAE and ForwardBE be
when SUB is in the EX stage?
Q3: What inputs does the Hazard Unit need in order to decide
correctly on the values of ForwardAE and ForwardBE?

Hazard Unit for computer of textbook Figure 7.50

[Figure: Hazard Unit with 5-bit inputs RsE, RtE, WriteRegM, and WriteRegW, 1-bit inputs RegWriteM and RegWriteW, and 2-bit outputs ForwardAE and ForwardBE]

What are RsE and RtE, and how are they used by the Hazard
Unit?
A complete description of the logic in this version of the
Hazard Unit can be found on pages 416 and 418 in the
textbook.
Note: The computer of Figure 7.50 properly handles data
hazards that can be solved using forwarding only. It is not
capable of solving data hazards that require stalls.
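The forwarding decision for one ALU input can be written as a C function. This sketch assumes the select encoding 00 = Register File value, 01 = ResultW, 10 = ALUOutM, and gives priority to a match in the Memory stage because it holds the more recent result; register 0 is never forwarded because $zero is always 0. The function and constant names are hypothetical; see the textbook pages cited above for the authoritative logic.

```c
/* Hypothetical names for the 2-bit mux select codes. */
enum { FWD_NONE = 0, FWD_FROM_W = 1, FWD_FROM_M = 2 };

/* srcE is RsE (for ForwardAE) or RtE (for ForwardBE). */
static int forward_select(int srcE,
                          int WriteRegM, int RegWriteM,
                          int WriteRegW, int RegWriteW)
{
    if (srcE != 0 && RegWriteM && srcE == WriteRegM)
        return FWD_FROM_M;   /* newest value: ALUOutM */
    if (srcE != 0 && RegWriteW && srcE == WriteRegW)
        return FWD_FROM_W;   /* older value: ResultW */
    return FWD_NONE;         /* use the value read from the Register File */
}
```

For the LW/AND/SUB example with SUB in EX, RsE = 8 matches WriteRegW (LW in Writeback), giving 01, and RtE = 9 matches WriteRegM (AND in Memory), giving 10.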

Hardware for data hazard stalls

This is an example of what is called a “load-use” data hazard:
LW $8, 0($9)
ADD $16, $17, $8
SUB $18, $4, $5
We’ve already seen that a one-cycle stall is needed so that the
M stage result of LW can be forwarded to the E stage of ADD.
The need for a stall can be detected in the D stage of ADD.
Let’s draw a diagram to show how LW, ADD, and SUB will be
processed.

To make this work in hardware, we must enhance some of the registers in the system . . .
• Add an EN (enable) input to the PC. If EN is turned off, the PC is “frozen” and does not update on a positive clock edge.
• Add a similar EN input to the F/D pipeline register.
• Add a CLR (clear) input to the D/E pipeline register. If CLR is turned on, the instruction arriving in the register is converted to a harmless NOP.
These changes are sketched in the incomplete schematic below . . .

[Figure: the PC and the F/D register, each with an EN input, and the D/E register with a CLR input, all clocked by CLK. An extension to the Hazard Unit takes RsD, RtD, RtE (5 bits each) and MemtoRegE, and produces StallF, StallD, and FlushE.]
For clarity, the schematic above only shows Hazard Unit inputs and
outputs that are used to effect the stall for LW instructions. See
textbook Figure 7.53 for a complete schematic.

For a complete description of all of the logic used to effect the stall for LW instructions, see pages 418–421 in the textbook.
In lecture, it’s really only possible to present a sketch of that
logic.
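The core of the load-use check can be sketched in C using the signal names from the schematic: a stall is needed when the instruction in Execute is a load (MemtoRegE = 1) whose destination RtE is one of the Decode stage’s sources RsD or RtD. The struct and function names are hypothetical, and the sketch leaves out the extra stall logic the textbook adds later for branches.

```c
#include <stdbool.h>

typedef struct { bool StallF, StallD, FlushE; } StallSignals;

/* Evaluated while the possible consumer is in Decode and the possible
   load is in Execute. */
static StallSignals lw_stall(int RsD, int RtD, int RtE, bool MemtoRegE)
{
    bool stall = MemtoRegE && (RsD == RtE || RtD == RtE);
    /* Freeze the PC and F/D register, and turn the D/E contents into a NOP. */
    StallSignals s = { stall, stall, stall };
    return s;
}
```

For LW $8, 0($9) in Execute and ADD $16, $17, $8 in Decode, RtD = RtE = 8 and MemtoRegE = 1, so StallF, StallD, and FlushE are all 1 for one cycle.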

Hardware changes to manage control hazards

ENCM 369 will NOT cover this material in depth, and there
will be NO lab exercises or midterm or final exam questions on
it!
The Figure 7.53 processor is excellent regarding data hazards,
but handles BEQ instructions poorly—three instructions follow
a BEQ into the pipeline before the branch decision gets made.
Why does that happen? The Figure 7.53 processor makes the
branch decision in the Memory stage. (Check the location of
the AND gate . . . )

Redesign to make branch instructions work better

The processor of Figure 7.56 moves the branch decision from the Memory stage to the Decode stage, and the branch target address generation from the Execute stage to the Decode stage.
So, only one instruction follows BEQ into the pipeline
before a branch is taken, which is better, but . . .
. . . making the decision in the Decode stage causes new data
hazards!

Redesign to make branch instructions work better, continued
Example #6 from the “Hazard Examples” document, with an extra instruction . . .

LW $17, 0($4)
BEQ $17, $0, some_other_label
ADD $2, $5, $6

What is needed to get the LW result into the Decode step of BEQ?
If the branch is taken, what should happen to ADD? (Assume that we’re designing a computer that does NOT have a delayed branch rule.)

Redesign to make branch instructions work better: Remarks

It’s hard to process branches with perfect accuracy without losing lots of cycles due to hazards!
Therefore, dynamic branch prediction can save a lot of cycles
if most guesses are correct.
Also, conditional instructions such as MIPS movn and movz
are sometimes better choices than branch instructions.
ENCM 369 Winter 2017 Slide Set 7 for Lecture Section 01 slide 74/86

Exceptions

Exceptions: General Concepts

An exception is an event that changes the flow of instructions
in a way that is quite different from a branch or jump.
So, obviously, an exception causes a special kind of PC update.
But an exception can also cause a change in privilege: a
switch from a user program to operating system kernel
software.

Privilege: user program vs. kernel

A user program has rights to read and write memory allocated
to that program and to read and write registers. That’s all it
can do by itself, but it can also ask for help from the kernel.
The kernel controls hardware like disks and network interfaces.
The kernel has power over all memory in the computer and
can start and stop all other programs.

Two meanings for “exception”

The concept of an exception in discussion of hardware or
assembly language code is NOT THE SAME as the concept of
an exception in a high-level language like C++, Java, or
Python!

Exception-related keywords in C++: try, catch, throw
Exception-related keywords in Java: try, catch, finally,
throw, throws
Exception-related keywords in Python: try, except, finally, raise

Two meanings for “exception”, continued

High-level language exception: a special kind of jump (possibly
involving a return through one or more procedure calls) to
code that is set up to handle an error condition.

Do NOT try to connect the above concept to hardware
exceptions; if you do, your brain will hurt and your
understanding of both kinds of exceptions will be damaged.

Exceptions in Hardware and Assembly Language: 3 Main Categories

1. The processor notices that a program has tried to do a
bad thing.
2. A program intentionally generates the exception.
3. “Interrupts”: hardware external to the processor sends a
signal to the processor asking for attention.
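For a concrete link to real MIPS hardware, the sketch below tags a few MIPS32 exception codes (the ExcCode field, bits 6..2 of the Cause register) with the three categories above. The ExcCode values come from the MIPS32 architecture; the dictionary layout and helper names are made up for this illustration.

```python
# ExcCode values for a few MIPS32 exceptions (Cause register bits 6..2),
# paired with the category numbers used on this slide.
EXCEPTIONS = {
    # name: (ExcCode, category, description)
    "Int":  (0,  3, "interrupt from hardware outside the processor"),
    "AdEL": (4,  1, "address error on load or instruction fetch"),
    "AdES": (5,  1, "address error on store"),
    "Sys":  (8,  2, "syscall instruction"),
    "Bp":   (9,  2, "break instruction"),
    "RI":   (10, 1, "reserved (undefined) instruction"),
    "Ov":   (12, 1, "integer overflow, e.g., ADD, SUB, ADDI"),
}


def cause_value(name):
    """Cause register contents with only the ExcCode field set."""
    exc_code, _, _ = EXCEPTIONS[name]
    return exc_code << 2


def names_in_category(category):
    return sorted(n for n, (_, c, _) in EXCEPTIONS.items() if c == category)


for category in (1, 2, 3):
    print(category, names_in_category(category))
print(hex(cause_value("Sys")))
```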

Examples of program-trying-a-bad-thing exceptions


Instruction fetched with opcode that does not make sense to
processor (“undefined instruction”).
Addition or subtraction of integers resulted in overflow (e.g.,
MIPS ADD, SUB, ADDI, but not ADDU, SUBU, ADDIU).
Attempt to access memory a program is not permitted to
access.
Attempt to access memory with invalid address (e.g., LW data
address is not a multiple of 4).
(Note: Memory units in Chapter 7 computers don’t have the
capability to report memory access errors, but memory
systems in real computers usually do.)
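Two of these checks are easy to express in software. In the Python sketch below (standing in for hardware; function names are hypothetical), registers are modeled as 32-bit patterns: ADD traps on signed overflow while ADDU silently wraps around, and LW reports an address error when its address is not a multiple of 4. The overflow test uses the usual sign-bit rule: both operands have the same sign bit and the sum's sign bit differs from it.

```python
MASK32 = 0xFFFF_FFFF


def addu(a, b):
    """ADDU: 32-bit wraparound sum; never causes an exception."""
    return (a + b) & MASK32


def add_overflows(a, b):
    """Would ADD (signed) cause an overflow exception on these bit patterns?"""
    a &= MASK32
    b &= MASK32
    s = addu(a, b)
    # Sign bit of s differs from the sign bits of both a and b.
    return ((a ^ s) & (b ^ s) & 0x8000_0000) != 0


def lw_address_error(address):
    """LW needs a word-aligned data address (a multiple of 4)."""
    return address % 4 != 0


print(hex(addu(0x7FFF_FFFF, 1)), add_overflows(0x7FFF_FFFF, 1))
print(lw_address_error(0x1001_0002), lw_address_error(0x1001_0004))
```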

Programs intentionally causing exceptions

This mainly happens with system calls.


Examples: MIPS syscall instruction, similar instructions in
other instruction sets. A user program asks the operating
system kernel to provide a service.
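From the user program's side, a system call just loads a service number into $v0 and an argument into $a0, then executes syscall; the real work happens in the kernel's handler. The Python sketch below models only that dispatch step, using the service numbers of the SPIM and MARS simulators (1 = print_int, 11 = print_char, 10 = exit). A real operating system defines its own numbers and rules, and the function name here is hypothetical.

```python
def handle_syscall(regs, output):
    """Sketch of a kernel's syscall dispatch using SPIM/MARS service numbers.

    regs: dict of register values ('v0', 'a0', ...).
    Returns False for the exit service, True otherwise.
    """
    service = regs["v0"]
    if service == 1:        # print_int: integer to print is in $a0
        output.append(str(regs["a0"]))
    elif service == 11:     # print_char: character code is in $a0
        output.append(chr(regs["a0"]))
    elif service == 10:     # exit
        return False
    else:
        raise NotImplementedError(f"service {service} not modeled here")
    return True


output = []
handle_syscall({"v0": 1, "a0": 42}, output)
handle_syscall({"v0": 11, "a0": ord("\n")}, output)
print("".join(output), end="")
```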

Examples of Interrupts

Laptop user presses key on keyboard.


Desktop user moves a mouse.
Smartphone or tablet user taps finger on a touchscreen.
A data packet arrives on a network interface.
A disk controller reports that a write operation on a disk has
completed.

What happens when an exception occurs?

The processor will start executing instructions that form an
exception handler (like a procedure, but not exactly the
same).
Before starting the exception handler, the processor must
record some essential information in some special-purpose
registers . . .
Let’s make some notes about this essential information.
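One possible summary of that essential information, written as a Python sketch: the processor saves the address of the victim instruction in EPC, records a code for the cause in the Cause register, switches to kernel privilege, and loads the handler's address into the PC. The handler address (0x8000_0180, the general exception entry point commonly used in MIPS examples) and the single kernel_mode flag are simplifying assumptions; real MIPS processors track privilege with several Status register bits.

```python
from dataclasses import dataclass

# Assumed general exception handler address.
HANDLER_ADDRESS = 0x8000_0180


@dataclass
class CPUState:
    pc: int
    epc: int = 0                # Exception Program Counter
    cause: int = 0              # Cause register (ExcCode in bits 6..2)
    kernel_mode: bool = False   # simplified stand-in for MIPS Status bits


def take_exception(cpu, victim_address, exc_code):
    """Record the essential information, then jump to the handler."""
    cpu.epc = victim_address    # where the problem happened / where to resume
    cpu.cause = exc_code << 2   # why it happened
    cpu.kernel_mode = True      # switch privilege from user to kernel
    cpu.pc = HANDLER_ADDRESS    # start the exception handler


cpu = CPUState(pc=0x0040_0094)
take_exception(cpu, victim_address=0x0040_0090, exc_code=4)  # AdEL on an LW
print(hex(cpu.epc), hex(cpu.cause), hex(cpu.pc), cpu.kernel_mode)
```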

Exceptions and pipelines

Due to time limitations and lack of textbook support, we will
not look at this topic in detail; we will just give a quick sketch.
Useful terms: exception victim and flushing.
The victim of an exception is either the instruction that
caused the exception or, when there is an interrupt, the first
instruction in the pipeline that will not be allowed to
complete.
To flush an instruction in a pipeline means ensuring that the
instruction does not update system state, such as register file
or memory contents.

Exceptions and pipelines: key challenges

Instructions that entered the pipeline before the victim must be
allowed to complete.
The victim and the instructions that followed the victim into the
pipeline must be flushed.
The address of the victim must be identified. This is NOT easy,
because in a pipelined system, the PC probably will NOT be
pointing to the victim.

Example of MIPS exception processing

Suppose there is an exception when the LW instruction
(at address 0x0040_0090) is in the Memory stage.
What should happen?
Scenario 1: Exception is caused by $t0 not being a
multiple of 4.
Scenario 2: Exception is an interrupt, unrelated to this
program.

# Code running in a 5-stage pipeline, an actual MIPS
# computer, not a Ch. 7 machine!
andi  $t2, $s4, 0xFF
sll   $t3, $t2, 8
or    $s2, $s2, $t3
lw    $t1, ($t0)
addiu $t0, $t0, 4
sw    $t1, ($s0)
addiu $s0, $s0, 4
slt   $t4, $t0, $s7
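For Scenario 1, the victim is the LW at 0x0040_0090 itself. The Python sketch below assumes no stalls or bubbles, so the surrounding instructions occupy the other four stages. It splits the pipeline into instructions that must complete and instructions to flush, and shows why the PC is no help in finding the victim: it holds the address of the instruction being fetched, not the LW. For Scenario 2, which instruction becomes the victim is a design choice; the same hypothetical function can be called with a different victim_stage.

```python
STAGES = ["Fetch", "Decode", "Execute", "Memory", "Writeback"]

# Pipeline contents when LW (0x0040_0090) is in the Memory stage,
# assuming no stalls or bubbles: later stages hold older instructions.
pipeline = {
    "Writeback": (0x0040_008C, "or    $s2, $s2, $t3"),
    "Memory":    (0x0040_0090, "lw    $t1, ($t0)"),
    "Execute":   (0x0040_0094, "addiu $t0, $t0, 4"),
    "Decode":    (0x0040_0098, "sw    $t1, ($s0)"),
    "Fetch":     (0x0040_009C, "addiu $s0, $s0, 4"),
}


def split_at_victim(pipeline, victim_stage):
    """Return (allowed_to_complete, flushed, epc) for a victim in victim_stage.

    Instructions in stages after the victim's stage entered the pipeline
    earlier and must complete; the victim and everything behind it are flushed.
    """
    v = STAGES.index(victim_stage)
    complete = [pipeline[s] for s in STAGES[v + 1:] if s in pipeline]
    flushed = [pipeline[s] for s in STAGES[:v + 1] if s in pipeline]
    epc = pipeline[victim_stage][0]
    return complete, flushed, epc


complete, flushed, epc = split_at_victim(pipeline, "Memory")
print("complete:", [text for _, text in complete])
print("flush:   ", [text for _, text in flushed])
print("EPC =", hex(epc), " address in Fetch stage =", hex(pipeline["Fetch"][0]))
```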
