
MIPS Processor Design

Anthony J Souza
April 2017
Topics and Reading
• Topics
• Overview of components
• Simple MIPS implementation
• Performance
• Basic concepts of pipelining
• Reading:
• Patterson and Hennessy
• 4.1 – 4.5 (skim 4.6)
The Basics
• Data path:
• Passive part of circuit: storage elements, registers and RAM.
• Control:
• Active part; finite state machine with control signals.
• Controls data path components.
• Goal:
• Design a simple CPU to support a subset of MIPS instructions.
• MIPS subset:
• R-format: add, sub, and, or, slt
• Load/store: lw, sw
• Branches: beq, j
Simple Single-cycle Implementation
• Each instruction completes in one cycle.
• Cycle must be long enough for slowest operation to complete.
• Advantage: design is fairly simple
• Disadvantage: faster instructions have to wait for slower ones to
finish.
• Review MIPS instruction formats from CH5 Slides.
Major Components: Registers

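[Figure: register file and PC]
• Registers (r0 - r31)
• Inputs: Read Reg 1 (5 bits), Read Reg 2 (5 bits), Write Reg (5 bits), Write Data (32 bits)
• Outputs: Read Data 1 (32 bits), Read Data 2 (32 bits)
• PC: 32 bits in, 32 bits out
• Control input: write enable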
Major Components: Memory
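[Figure: instruction memory and data memory]
• Instruction Memory: Read Address (32 bits) in, Instruction (32 bits) out
• Data Memory: Address (32 bits) and Write data (32 bits) in, Read Data (32 bits) out
• Data Memory control inputs: write, read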
Major Components: Arithmetic/logic unit
ALU operation (4 bits)   ALU function
0000                     AND
0001                     OR
0010                     add
0110                     subtract
0111                     set less than
1100                     NOR

• Inputs: two 32-bit operands and the 4-bit ALU operation.
• Outputs: 32-bit ALU Result and 1-bit zero.
• Zero:
• 0 if ALUResult != 0
• 1 if ALUResult == 0
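A minimal Python sketch of the ALU function table above, assuming 32-bit two's-complement operands. The names `alu`, `to_signed` and `MASK32` are illustrative and do not come from the slides.

```python
MASK32 = 0xFFFFFFFF  # keep every value within 32 bits


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(control, a, b):
    """Return (result, zero) for a 4-bit ALU operation code and two 32-bit inputs."""
    a &= MASK32
    b &= MASK32
    if control == 0b0000:      # AND
        result = a & b
    elif control == 0b0001:    # OR
        result = a | b
    elif control == 0b0010:    # add
        result = (a + b) & MASK32
    elif control == 0b0110:    # subtract
        result = (a - b) & MASK32
    elif control == 0b0111:    # set less than (signed comparison)
        result = 1 if to_signed(a) < to_signed(b) else 0
    elif control == 0b1100:    # NOR
        result = ~(a | b) & MASK32
    else:
        raise ValueError(f"unsupported ALU operation: {control:04b}")
    zero = 1 if result == 0 else 0
    return result, zero


print(alu(0b0010, 5, 7))   # add
print(alu(0b0110, 9, 9))   # subtract equal values, so zero is set
```

The zero output is what beq later uses: subtracting two equal registers gives result 0 and zero = 1.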
Processing an arithmetic/logic instruction
• R-Format; all operands are in registers.
• Example: add $23, $13, $17
• rd = $23
• rs = $13
• rt = $17
• Addr = 0x400024
[Figure: PC supplies the address to the instruction memory, which outputs the instruction]
Processing an arithmetic/logic instruction
• Step 1: fetch instruction from memory
• PC = 0x400024 is the address sent to the instruction memory.
• Contents at addr 0x400024: 000000 01101 10001 10111 00000 100000
Processing an arithmetic/logic instruction
• Step 2: decode instruction, read rs and rt
• Read Reg1 = 01101 (rs = $13)
• Read Reg2 = 10001 (rt = $17)
• Write Reg = 10111 (rd = $23)
Processing an arithmetic/logic instruction
• Step 3: execute add operation
• ReadData1 = contents of $13, ReadData2 = contents of $17
• ALU operation = 0010 (add)
Processing an arithmetic/logic instruction
• Step 4: write rs + rt to rd
• rs = $13, Inst25-21 = 01101 (Read Reg1)
• rt = $17, Inst20-16 = 10001 (Read Reg2)
• rd = $23, Inst15-11 = 10111 (Write Reg)
• ALU result (0010, add) goes to Write Data
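The field layout used in the four steps above can be checked with a short Python sketch. It packs add $23, $13, $17 into a 32-bit word and pulls the fields back out, assuming the standard R-format layout (opcode 31:26, rs 25:21, rt 20:16, rd 15:11, shamt 10:6, funct 5:0). The helper names `encode_r` and `decode_fields` are illustrative.

```python
def encode_r(rs, rt, rd, shamt, funct):
    """Pack R-format fields into a 32-bit instruction word (opcode is 0)."""
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def decode_fields(word):
    """Split a 32-bit instruction word into its R-format fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,     # Inst25-21
        "rt": (word >> 16) & 0x1F,     # Inst20-16
        "rd": (word >> 11) & 0x1F,     # Inst15-11
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,          # Inst5-0
    }


# add $23, $13, $17: rs = 13, rt = 17, rd = 23, funct = 100000 (add)
word = encode_r(rs=13, rt=17, rd=23, shamt=0, funct=0b100000)
print(hex(word))
print(decode_fields(word))
```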
Fetching next instruction: PC = PC + 4

• An adder (operation 0010, add) takes PC and the constant 4 as inputs.
• Assume edge-triggered logic: write state elements on rising edge of clock.
Basic Idea of Edge Triggered Logic
• Basics of clocks.
• Needed in sequential logic to decide when an element that contains
state should be updated.
• A clock is simply a free-running signal with a fixed cycle time.
• The clock frequency is the inverse of the cycle time.
• The clock cycle time or clock period is divided into two portions:
when the clock is high and when the clock is low.

[Figure: clock waveform with one clock period marked]
Basic Idea of Edge Triggered Logic
• In edge-triggered logic, either the rising edge or the falling edge of the
clock is active, causing state changes to occur.
• Falling edge → high to low
• Rising edge → low to high

[Figure: clock waveform marking a rising edge, a falling edge and the clock period]
Processing a lw instruction
• Example:
• lw $23, -4($13)
• Step 1:
• Fetch instruction
• Step 2:
• Decode and read rs and rt
Processing a lw instruction
• Step 3: ADDR = rs + 16-bit constant sign-extended
• ReadData1 (rs) and the sign-extended Inst15-0 are the ALU inputs.
• The ALU result ADDR goes to the address input of the data memory.
Processing a lw instruction
• Step 4: read data at MEM[ADDR]
• The data memory read signal is set; read data = contents of MEM[ADDR].
Processing a lw instruction
• Step 5: write the data read from MEM[ADDR] to rt
• Inst20-16 (rt = 10111) goes to Write Reg.
• Contents of MEM[ADDR] go from the data memory read data output to Write Data.
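The lw steps above can be sketched in Python as follows. This is an illustration only: the register contents and the memory word are made-up values, memory is modelled as a dictionary keyed by address, and the names `sign_extend16` and `lw` are hypothetical.

```python
def sign_extend16(imm16):
    """Sign-extend a 16-bit immediate to a Python integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def lw(regs, mem, rt, rs, imm16):
    """Model lw rt, imm(rs): ADDR = rs + sign-extended imm, then rt = MEM[ADDR]."""
    addr = (regs[rs] + sign_extend16(imm16)) & 0xFFFFFFFF   # step 3: ALU add
    data = mem[addr]                                        # step 4: memory read
    regs[rt] = data                                         # step 5: write back to rt
    return addr


regs = [0] * 32
regs[13] = 0x10010008                 # assumed contents of $13
mem = {0x10010004: 0xDEADBEEF}        # assumed word stored at ADDR

# lw $23, -4($13): the 16-bit encoding of -4 is 0xFFFC
addr = lw(regs, mem, rt=23, rs=13, imm16=0xFFFC)
print(hex(addr), hex(regs[23]))
```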
Possible issue with R-format and load/store
instructions
• In R-format instructions rt goes into the lower ALU input.
• In load/store instructions the sign-extended 16-bit constant goes into the lower ALU input.
• Upper ALU input: rs. Lower ALU input: rt or sign-extended I.
• How to overcome this issue for 2 different types of instructions?
Possible issue with R-format and load/store
instructions
• Simple! Use a multiplexor (MUX) to select one of two inputs.
• The MUX in front of the lower ALU input selects between rt and the sign-extended I.
Possible issue with R-format and load/store instructions
• In R-format instructions:
• ALU result to write data of registers
• rd to write register of registers
• In load instructions (lw):
• data memory read data to write data of registers
• rt to write register of registers
• Write Reg: rt or rd. Write Data: ALU result or data memory read data.
• How to overcome this issue for 2 different types of instructions?


Possible issue with R-format and load/store instructions
• Simple! Use a multiplexor (MUX) to select one of two inputs.
• One MUX selects rt or rd for Write Reg.
• A second MUX selects the ALU result or the data memory read data for Write Data.
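A two-input multiplexor is easy to model in Python. The sketch below uses one `mux2` helper for the three selections discussed so far (lower ALU input, Write Reg, Write Data). The select values follow the control signals ALUSrc, RegDst and MemtoReg defined later in these slides; the function names themselves are illustrative.

```python
def mux2(select, in0, in1):
    """2-to-1 multiplexor: output in0 when select is 0, in1 when select is 1."""
    return in1 if select else in0


def lower_alu_input(alu_src, rt_value, imm_sign_ext):
    # ALUSrc = 0: contents of rt; ALUSrc = 1: sign-extended 16-bit I
    return mux2(alu_src, rt_value, imm_sign_ext)


def write_register(reg_dst, rt, rd):
    # RegDst = 0: write to rt (load); RegDst = 1: write to rd (R-format)
    return mux2(reg_dst, rt, rd)


def write_data(mem_to_reg, alu_result, mem_read_data):
    # MemtoReg = 0: ALU result; MemtoReg = 1: data read from memory
    return mux2(mem_to_reg, alu_result, mem_read_data)


# R-format add $23, $13, $17 writes the ALU result to rd = 23
print(write_register(1, rt=17, rd=23), write_data(0, alu_result=42, mem_read_data=7))
# lw $23, -4($13) writes the memory data to rt = 23
print(write_register(0, rt=23, rd=31), write_data(1, alu_result=42, mem_read_data=7))
```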
BEQ branch instruction
• Two operations:
1. BTA = PC + 4 + (16-bit I, sign-extended, then shifted left 2)
2. If (rs - rt == 0) PC = BTA
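The two beq operations can be sketched in Python as below, assuming 32-bit addresses. The names `branch_target` and `next_pc_beq` are illustrative, and the PC and immediate values in the example are made up.

```python
def sign_extend16(imm16):
    """Sign-extend a 16-bit immediate to a Python integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_target(pc, imm16):
    """BTA = PC + 4 + (sign-extended I shifted left by 2)."""
    return (pc + 4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF


def next_pc_beq(pc, rs_value, rt_value, imm16):
    """Return BTA when rs - rt == 0, otherwise PC + 4."""
    zero = ((rs_value - rt_value) & 0xFFFFFFFF) == 0   # ALU subtract, zero output
    return branch_target(pc, imm16) if zero else (pc + 4) & 0xFFFFFFFF


print(hex(next_pc_beq(0x400024, 5, 5, 3)))   # taken: 3 instructions past PC + 4
print(hex(next_pc_beq(0x400024, 5, 6, 3)))   # not taken: falls through to PC + 4
```

The shift by 2 converts an offset counted in instructions into an offset counted in bytes, since every MIPS instruction is 4 bytes.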
Simple datapath to support MIPS subset
List of Control Signals
• RegDst:
• 0 write to rt
• 1 write to rd
• RegWrite:
• 0 don’t write
• 1 write to register
• ALUSrc:
• 0 contents of rt to ALU lower input
• 1 16-bit I sign-ext to ALU lower input
• MemRead:
• 0 don’t read memory
• 1 read memory
• MemWrite:
• 0 don’t write to memory
• 1 write to memory
• MemtoReg:
• 0 ALUResult written to register
• 1 data from memory written to register
• Branch:
• 0 not BEQ instruction
• 1 BEQ instruction
• ALUOp (2-bits):
• 00 ALU performs add always
• 01 ALU performs subtract always
• 10 ALU follows the funct field for R-format instructions
What ALU has to do depends on instruction
opcode   ALUOp   Operation          funct    ALU function       ALU control
lw       00      load word          XXXXXX   add                0010
sw       00      store word         XXXXXX   add                0010
beq      01      branch equal       XXXXXX   subtract           0110
R-type   10      add                100000   add                0010
R-type   10      subtract           100010   subtract           0110
R-type   10      AND                100100   AND                0000
R-type   10      OR                 100101   OR                 0001
R-type   10      set-on-less-than   101010   set-on-less-than   0111
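The table above amounts to a small lookup, sketched here in Python. The name `alu_control` and the dictionary `FUNCT_TO_ALU` are illustrative; the bit patterns are the ones in the table.

```python
# funct field (instr[5-0]) -> 4-bit ALU control, used only when ALUOp = 10
FUNCT_TO_ALU = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # subtract
    0b100100: 0b0000,  # AND
    0b100101: 0b0001,  # OR
    0b101010: 0b0111,  # set-on-less-than
}


def alu_control(alu_op, funct):
    """Return the 4-bit ALU control value for a 2-bit ALUOp and 6-bit funct."""
    if alu_op == 0b00:          # lw / sw: force add, funct is don't care
        return 0b0010
    if alu_op == 0b01:          # beq: force subtract, funct is don't care
        return 0b0110
    if alu_op == 0b10:          # R-type: follow the funct field
        return FUNCT_TO_ALU[funct & 0x3F]
    raise ValueError(f"unsupported ALUOp: {alu_op:02b}")


print(format(alu_control(0b10, 0b101010), "04b"))   # R-type slt
print(format(alu_control(0b00, 0b111111), "04b"))   # lw ignores funct
```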


Different parts of instruction determine control signals.
• 2-bit ALUOp determined by opcode.
• For R-type, must also look at instr[5-0].

Format       31:26      25:21   20:16   15:11   10:6    5:0
R-type       0          rs      rt      rd      shamt   funct
Load/Store   35 or 43   rs      rt      address (15:0)
Branch       4          rs      rt      address (15:0)

• opcode (31:26): selects the control signals
• rs (25:21): always read
• rt (20:16): read, except for load
• rd (R-type) or rt (load): write
• address (15:0): sign-extend and add
ALU control signals in more detail
ALUOp definition:
ALUOp   ALU action
00      Force ALU to add
01      Force ALU to subtract
10      Follow instr[5-0]

• ALU control inputs: ALUOp (2 bits) and instr[5-0] (6 bits).
• ALU control output: 4 bits, to ALU.
• See p. 317 Fig. 4.12 and p. 318 Fig. 4.13
Determining control signals for each instruction
instr      RegDst  ALUSrc  MemToReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
R-format   1       0       0         1         X        0         0       1       0
lw         0       1       1         1         1        0         0       0       0
sw         X       1       X         0         0        1         0       0       0
beq        X       0       X         0         X        0         1       0       1
• X = don’t care.
• beq needs ALUSrc = 0: the ALU computes rs - rt, so the lower ALU input must be rt.
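The control table can be written as a lookup keyed by opcode, sketched here in Python. Opcodes 0, 35, 43 and 4 are the ones listed on the instruction format slide (35 is lw, 43 is sw). One assumption to flag: ALUSrc for beq is taken to be 0 here, because beq compares rs with rt and so needs rt on the lower ALU input. Don't-care entries are represented by `None`, and the names `Control` and `main_control` are illustrative.

```python
from collections import namedtuple

Control = namedtuple(
    "Control",
    "reg_dst alu_src mem_to_reg reg_write mem_read mem_write branch alu_op",
)

X = None  # don't care

CONTROL_TABLE = {
    0:  Control(1, 0, 0, 1, X, 0, 0, 0b10),   # R-format
    35: Control(0, 1, 1, 1, 1, 0, 0, 0b00),   # lw
    43: Control(X, 1, X, 0, 0, 1, 0, 0b00),   # sw
    4:  Control(X, 0, X, 0, X, 0, 1, 0b01),   # beq (ALUSrc = 0 is an assumption, see above)
}


def main_control(opcode):
    """Return the control signals for a 6-bit opcode from the supported subset."""
    try:
        return CONTROL_TABLE[opcode]
    except KeyError:
        raise ValueError(f"opcode {opcode} is not in the supported subset") from None


print(main_control(35))   # lw
print(main_control(4))    # beq
```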
R-Type Instruction (p. 266, Fig 4.19)
Lw instruction (p. 267 Fig. 4.20)
Beq instruction (p. 268 Fig. 4.21)
Adding Jump Instruction
• Format
opcode 26-bit I

• Necessary Operations:
1. Target address = top 4 bits of PC + 4 || 26-bit I || 00
2. PC = Target address
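The concatenation in step 1 can be sketched in Python. This sketch takes the upper 4 bits from PC + 4, as in the standard MIPS definition; that matches the top 4 bits of PC except when PC + 4 crosses a 256 MB boundary. The name `jump_target` and the example values are illustrative.

```python
def jump_target(pc, target26):
    """Target = upper 4 bits of (PC + 4) || 26-bit I || 00."""
    upper4 = (pc + 4) & 0xF0000000                 # keep bits 31:28
    lower28 = (target26 & 0x03FFFFFF) << 2         # 26-bit I followed by 00
    return upper4 | lower28


# A 26-bit field of 0x0100009 corresponds to byte address 0x0400024
print(hex(jump_target(0x400024, 0x0100009)))
```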
Jump Instruction (p.271, 4.24)
Performance of single-cycle datapath
• Given the following latencies:
• Memory units → 200ps,
• ALU/adder → 200ps
• register read → 100 ps
• register write →100 ps
• R-format: 200ps(instr fetch) + 100ps(reg read) + 200ps (ALU/adder) +
100ps(reg write) = 600ps
• LW: 200ps + 100ps + 200ps + 200ps + 100ps = 800ps
• SW: 200ps + 100ps + 200ps + 200ps = 700ps
• BEQ: 200ps + 100ps + 200ps = 500ps
• Single cycle implementation means one cycle = 800ps (slowest operation)
Cycle time versus clock rate
• Suppose your home computer has a 1 GHz clock.
• 1 GHz = $10^{9}$ cycles per second
• 1 cycle = $10^{-9}$ seconds (1 / cycles_per_second)

• Since 1 ns = $10^{-9}$ s
• And 1 ps = $10^{-12}$ s
• For our single-cycle implementation, clock rate = 1.25 GHz
• 800 ps → $800 \times 10^{-12}$ s → $8 \times 10^{-10}$ s is the time for 1 cycle.
• $1 / (8 \times 10^{-10})$ → $1.25 \times 10^{9}$ cycles per second → 1.25 GHz
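The latency sums and the clock-rate conversion can be reproduced with a few lines of Python, using the component latencies given on the performance slide (memory 200 ps, ALU/adder 200 ps, register read 100 ps, register write 100 ps). The variable names are illustrative.

```python
MEM_PS, ALU_PS, REG_READ_PS, REG_WRITE_PS = 200, 200, 100, 100

# Components used by each instruction class, in order
PATHS = {
    "R-format": [MEM_PS, REG_READ_PS, ALU_PS, REG_WRITE_PS],
    "lw":       [MEM_PS, REG_READ_PS, ALU_PS, MEM_PS, REG_WRITE_PS],
    "sw":       [MEM_PS, REG_READ_PS, ALU_PS, MEM_PS],
    "beq":      [MEM_PS, REG_READ_PS, ALU_PS],
}

latency_ps = {name: sum(steps) for name, steps in PATHS.items()}

# Single-cycle design: the cycle must fit the slowest instruction
cycle_time_ps = max(latency_ps.values())

# 1 / (t ps) = 10^12 / t Hz = 1000 / t GHz
clock_rate_ghz = 1000 / cycle_time_ps

print(latency_ps)
print(cycle_time_ps, "ps per cycle,", clock_rate_ghz, "GHz")
```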
Pipelining: Basic Concepts
• Doing Laundry in 4 steps (each 30 mins):
• Wash
• Dry
• Fold
• Put in closet
• Each load takes 2 hours, so 4 loads take 2 * 4 = 8 hours.
• But can overlap steps, like an assembly line.
• Wash load 1
• Dry load 1, wash load 2,
• Fold load 1, dry load 2, wash load 3
• And so on.
Pipelining: Basic Concepts
• 4 loads now take 3 ½ hours instead of the 8 hours without overlapping.
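The laundry numbers follow from a simple count: with overlapping, the first load needs all 4 steps, and every later load finishes one step after the previous one. A small Python sketch, with illustrative function names:

```python
def sequential_minutes(loads, steps, step_minutes):
    """Total time when each load finishes all steps before the next load starts."""
    return loads * steps * step_minutes


def pipelined_minutes(loads, steps, step_minutes):
    """Total time when loads overlap like an assembly line."""
    return (steps + loads - 1) * step_minutes


loads, steps, step_minutes = 4, 4, 30
print(sequential_minutes(loads, steps, step_minutes) / 60, "hours without overlapping")
print(pipelined_minutes(loads, steps, step_minutes) / 60, "hours with overlapping")
```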
Simple MIPS pipeline
• Consider lw instruction (slowest): 5 steps;
• Instruction Fetch (IF)
• Instruction Decode / register fetch (ID): generate control signals, get rs, rt
• Execute (EX): calculate memory address
• Memory Access (MEM): read data from memory
• Writeback (WB): write result back to rt

• Then somewhere else PC = PC + 4 happens.


Simple MIPS pipeline
• Each step uses a different part of the datapath
• IF uses instruction memory
• ID reads registers, generates control signals
• EX uses ALU
• MEM uses data memory
• WB uses registers
• Can overlap steps: pipelined MIPS implementation.
Simple MIPS pipeline
• What’s the speed up? Remember:
• Register read or write takes 100ps
• Memory access, ALU takes 200ps
• Without pipelining:
Instr      Instr fetch  Register read  ALU op  Memory access  Register write  Total time
lw         200ps        100ps          200ps   200ps          100ps           800ps
sw         200ps        100ps          200ps   200ps                          700ps
R-format   200ps        100ps          200ps                  100ps           600ps
beq        200ps        100ps          200ps                                  500ps
MIPS Pipelining

[Figure: instruction timing, single-cycle (Tc = 800ps) versus pipelined (Tc = 200ps)]


MIPS Pipelining
Cycle 1: lw $1… in IF
Cycle 2: lw $2… in IF
lw $1… in ID
Cycle 3: lw $3… in IF
lw $2… in ID
lw $1… in EX
Etc, etc

Cycle 5: lw $1… in WB (other lw’s in IF, ID etc)


Cycle 6: lw $2… in WB (other lw’s in IF, ID etc)
• If there are many instructions, ideally one instruction completes every cycle.
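A rough Python sketch of the speed-up, assuming an ideal 5-stage pipeline with no stalls, a single-cycle time of 800 ps and a pipelined cycle time of 200 ps as given in these slides. The function names are illustrative.

```python
SINGLE_CYCLE_PS = 800   # one long cycle per instruction
PIPELINED_PS = 200      # one short cycle per stage
STAGES = 5              # IF, ID, EX, MEM, WB


def single_cycle_time_ps(n):
    """Total time for n instructions on the single-cycle datapath."""
    return n * SINGLE_CYCLE_PS


def pipelined_time_ps(n):
    """Total time for n instructions: fill the pipeline, then finish one per cycle."""
    return (STAGES + n - 1) * PIPELINED_PS


for n in (3, 1_000_000):
    speedup = single_cycle_time_ps(n) / pipelined_time_ps(n)
    print(n, "instructions:", single_cycle_time_ps(n), "ps vs", pipelined_time_ps(n), "ps,",
          "speed-up", round(speedup, 3))
```

For long instruction streams the speed-up approaches 800 / 200 = 4 rather than 5, because the pipelined cycle must fit the slowest stage (200 ps) even though the register stages need only 100 ps.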
MIPS Pipelining
• MIPS ISA makes it easy to implement a pipeline!
• Each MIPS instruction is 32 bits.
• In IF, get 32 bits (MEM[PC]) from memory, one instruction.
(later: x86 instructions have different sizes)
• To get next instruction, PC = PC + 4
• Instruction format is fixed; easy to decode
• Load/store architecture:
• Lw/sw access data memory once per instruction
Pipeline Hazards
• Ideally, start/complete new instructions every cycle in pipeline.
• But sometimes this may not be possible: hazards.
• Hazards may result in stalls or bubbles in the pipeline (cycles where
nothing happens in a stage).
• Three types of hazards
1. Data hazards
2. Control hazards
3. Structural hazards
Data hazard
• An instruction uses the result from the previous instruction.
• add $s0, $t0, $t1
• sub $t2, $s0, $t3
• $s0 is only written back in cycle 5, so sub has to wait.
Data Hazards
• However, technically speaking, the result is ready after Cycle 3, output of
ALU.
• We can forward this result to the sub instruction.
• This results in either fewer stalls or no stalls at all.
• Would need extra logic and connections in hardware to perform
forwarding.
Load-Use Data hazards
• Load-use example:
cycle           1   2   3   4    5
lw $s0, ?       IF  ID  EX  MEM  WB
add ?, $s0, ?
• Result of lw is ready at the end of cycle 4.
• add needs the result at the beginning of its EX stage (its third stage), which would be cycle 4.
• This results in a load-use data hazard.
• A stall/bubble is necessary.
More likely timing:
cycle      1   2   3   4      5    6    7
lw $s0…    IF  ID  EX  MEM    WB
sub $t2…       IF  ID  stall  EX   MEM  WB
At cycle 4, sub is blocked in ID, lw continues to MEM.
Code scheduling to avoid stalls.

• Sometimes a compiler can re-order or reschedule instructions in order to avoid load-use stalls.

Original order (13 cycles):
lw $t1, 0($t0)
lw $t2, 4($t0)
add $t3, $t1, $t2     (stall before this instruction)
sw $t3, 12($t0)
lw $t4, 8($t0)
add $t5, $t1, $t4     (stall before this instruction)
sw $t5, 16($t0)

Rescheduled order (11 cycles):
lw $t1, 0($t0)
lw $t2, 4($t0)
lw $t4, 8($t0)
add $t3, $t1, $t2
sw $t3, 12($t0)
add $t5, $t1, $t4
sw $t5, 16($t0)
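The 13 and 11 cycle counts can be reproduced with a small Python sketch. It assumes a 5-stage pipeline with forwarding, so the only stall is one cycle when an instruction uses the result of the lw immediately before it. The tuple encoding of instructions and the name `count_cycles` are illustrative.

```python
def count_cycles(program, stages=5):
    """Cycles to run `program`, counting one stall per load-use pair.

    Each instruction is (op, dest, sources); dest is None for sw.
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_sources = cur
        if prev_op == "lw" and prev_dest in cur_sources:
            stalls += 1                       # load-use hazard: one bubble
    # First instruction takes `stages` cycles, each later one adds a cycle
    return len(program) + (stages - 1) + stalls


original = [
    ("lw",  "$t1", ["$t0"]),
    ("lw",  "$t2", ["$t0"]),
    ("add", "$t3", ["$t1", "$t2"]),
    ("sw",  None,  ["$t3", "$t0"]),
    ("lw",  "$t4", ["$t0"]),
    ("add", "$t5", ["$t1", "$t4"]),
    ("sw",  None,  ["$t5", "$t0"]),
]

# Same instructions with the third lw moved up, as in the rescheduled order
rescheduled = [original[0], original[1], original[4],
               original[2], original[3], original[5], original[6]]

print(count_cycles(original), "cycles for the original order")
print(count_cycles(rescheduled), "cycles for the rescheduled order")
```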
Control hazards
• Control hazards are pipeline hazards caused by instructions that
change the PC (conditional branches and jumps)
• beq $s0,$s1, T
A: [fall through]

T: [target]
• Cycle 1: beq in IF
• Cycle 2: what instruction to fetch next?
Control hazards
• Need to know what instruction to fetch next as soon as possible.
• Branch Target address for beq
• $s0 == $s1?

• Can try to calculate both in ID stage.


• But stalls are usually necessary.
• Can also try to predict branches.
• In general, branch/control instructions are difficult for efficient
pipeline operation.
Structural hazards
• A structural hazard occurs when a hardware resource is
needed by more than one instruction in the same cycle.
• This is a resource conflict, or resource contention.
• Current MIPS pipeline has no structural hazards.
• But suppose we decide to combine instruction and data memories
into one memory unit.
• Then on every lw/sw when we access data in MEM stage, there is a
conflict with the IF stage.
• This would be a structural hazard; the pipeline has to stall.
MIPS pipeline divided into stages. P. 287, Fig 4-33
MIPS Pipelining
• Need pipeline registers between each pair of stages.
• At rising edge of clock, new contents appear in register before each stage.
• New contents go through logic in stage, computations are performed.
• New contents propagate to input of next register
• Ex. add $16, $17, $18
• Register numbers 17 and 18 go into Registers, which outputs the contents of $17 and $18.
Full view of Pipeline, Fig 4-41 p. 296
Lw $16, 0($24), IF
Lw $16, 0($24), ID
Lw $16, 0($24), EX
Lw $16, 0($24), MEM
Lw $16, 0($24), WB
MIPS Pipelining
• Note that each instruction carries its control signals (and other necessary information) with it through the pipeline.
• Control signals, register numbers, memory addresses, etc. are pulled out of the pipeline registers and used in the appropriate stages.
• ALL instructions have the same operations in the IF and ID stages:

Stage   Operations
IF      Current instr = MEM[PC]; PC = PC + 4
ID      Read rs, rt; generate control signals
MIPS pipelining.
• Each instruction has its own operation after ID:
Stage   lw                 sw               ALU                 beq
EX      ADDR = rs + I      ADDR = rs + I    Result = rs op rt   Compute BTA; rs - rt
MEM     Data = MEM[ADDR]   MEM[ADDR] = rt                       If (rs-rt == 0) PC = BTA
WB      rt = Data                           rd = result
Simplified view of pipeline control (p. 301 Fig. 4.46):
Carrying information between stages (p. 303 Fig. 4.50):
