Datapath Control


The Processor: Datapath & Control

We're ready to look at an implementation of the MIPS.

Simplified to contain only:
memory-reference instructions: lw, sw
arithmetic-logical instructions: add, sub, and, or, slt
control flow instructions: beq, j
Generic Implementation:
use the program counter (PC) to supply instruction address
get the instruction from memory
read registers
use the instruction to decide exactly what to do
All instructions use the ALU after reading the registers
Why? memory-reference? arithmetic? control flow?

2004 Morgan Kaufmann Publishers 1

More Implementation Details

Abstract / Simplified View:

Two types of functional units:

elements that operate on data values (combinational)
elements that contain state (sequential)
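To make the distinction concrete, here is a minimal Python sketch (an illustration, not a hardware description): a combinational element is modeled as a pure function, and a state element as an object whose stored value changes only at a clock edge. The names `adder`, `Register`, `drive`, and `clock_edge` are hypothetical.

```python
MASK32 = 0xFFFF_FFFF  # model 32-bit values


def adder(a: int, b: int) -> int:
    """Combinational: the output depends only on the current inputs."""
    return (a + b) & MASK32


class Register:
    """Sequential: the stored value changes only at a clock edge."""

    def __init__(self, value: int = 0) -> None:
        self.value = value   # current state, visible to combinational logic
        self._d = value      # value currently presented on the D input

    def drive(self, d: int) -> None:
        self._d = d & MASK32  # changing the input does not change the state yet

    def clock_edge(self) -> None:
        self.value = self._d  # the new value is captured only here


pc = Register(0x0040_0000)
pc.drive(adder(pc.value, 4))
print(hex(pc.value))  # 0x400000 (not yet written)
pc.clock_edge()
print(hex(pc.value))  # 0x400004
```

Keeping "compute" and "capture" separate is what lets the same clock edge update every state element at once.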


Figure 5.2 The basic implementation of the MIPS subset
including the necessary multiplexers and control lines.


5.3 Building a Datapath



Datapath element A functional unit used to operate on or hold
data within a processor. In the MIPS implementation the datapath
elements include the instruction and data memories, the register
file, the arithmetic logic unit (ALU), and adders.

Program counter (PC) The register containing the address of
the instruction in the program being executed.

Register file A state element that consists of a set of registers
that can be read and written by supplying a register number to be
accessed.

Sign-extend To increase the size of a data item by replicating
the high-order sign bit of the original data item in the high-order
bits of the larger, destination data item.
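A minimal Python sketch of sign extension, assuming the 16-bit immediate is widened to a 32-bit word as in the MIPS datapath; `sign_extend` is a hypothetical helper name.

```python
def sign_extend(value: int, from_bits: int = 16, to_bits: int = 32) -> int:
    """Copy bit (from_bits - 1) into all higher bits of a to_bits-wide word."""
    value &= (1 << from_bits) - 1           # keep only the original field
    if value & (1 << (from_bits - 1)):      # sign bit set: value is negative
        high_ones = ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
        value |= high_ones                  # replicate the sign bit upward
    return value


print(hex(sign_extend(0x0004)))  # 0x4
print(hex(sign_extend(0xFFFC)))  # 0xfffffffc (-4 as a 32-bit word)
```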



Branch target address The address specified in a branch,
which becomes the new program counter (PC) if the branch is
taken. In the MIPS architecture the branch target is given by the
sum of the offset field of the instruction and the address of the
instruction following the branch.
Branch taken A branch where the branch condition is satisfied
and the program counter (PC) becomes the branch target. All
unconditional branches are taken branches.
Branch not taken A branch where the branch condition is false
and the program counter (PC) becomes the address of the
instruction that sequentially follows the branch.
Delayed branch A type of branch where the instruction
immediately following the branch is always executed, independent
of whether the branch condition is true or false.
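The sketch below computes the branch target and the next PC for beq in Python. It assumes, as in the MIPS datapath, that the 16-bit offset counts words, so it is sign-extended and shifted left 2 before being added to PC + 4; delayed branches are not modeled. The function names are hypothetical.

```python
MASK32 = 0xFFFF_FFFF


def sign_extend16(imm: int) -> int:
    imm &= 0xFFFF
    return (imm | 0xFFFF_0000) if imm & 0x8000 else imm


def branch_target(pc: int, offset16: int) -> int:
    """(PC + 4) + (sign-extended offset << 2), kept to 32 bits."""
    return (pc + 4 + (sign_extend16(offset16) << 2)) & MASK32


def next_pc_beq(pc: int, rs_value: int, rt_value: int, offset16: int) -> int:
    """Taken: the PC becomes the branch target. Not taken: the PC becomes PC + 4."""
    if rs_value == rt_value:
        return branch_target(pc, offset16)
    return (pc + 4) & MASK32


print(hex(next_pc_beq(0x1000, 7, 7, 3)))  # taken: 0x1010
print(hex(next_pc_beq(0x1000, 7, 8, 3)))  # not taken: 0x1004
```

An offset of 0xFFFF (-1) yields a target equal to the branch's own address, since PC + 4 - 4 = PC.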


Register File

Built using D flip-flops

Figure: the register file as a black box (inputs Read register number 1, Read register number 2, Write register, Write data; outputs Read data 1 and Read data 2) and the implementation of its two read ports, where each read port is a multiplexor that uses the read register number to select one of Register 0 through Register n - 1.

Do you understand? What is the Mux above?


Register File

Note: we still use the real clock to determine when to write.

Figure: the write port. The register number drives an n-to-2^n decoder that selects which of Register 0 through Register n - 1 loads the Register data on its D input.
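A behavioral Python sketch of the register file described above: each read port acts like a multiplexor indexed by the register number, and a write goes through a one-hot decode gated by a write enable and takes effect only at the clock edge. The class and method names and the `reg_write` flag are assumptions for illustration; MIPS's hardwired register 0 is not modeled.

```python
class RegisterFile:
    """Two read ports, one write port; writes take effect at the clock edge."""

    def __init__(self, num_regs: int = 32) -> None:
        self.regs = [0] * num_regs
        self._pending = None  # (one-hot enables, data) waiting for the next edge

    def read(self, read_reg1: int, read_reg2: int):
        # Each read port acts as a num_regs-to-1 multiplexor selected by register number.
        return self.regs[read_reg1], self.regs[read_reg2]

    def write(self, write_reg: int, write_data: int, reg_write: bool) -> None:
        # Decoder: at most one enable is true, and only if the write is enabled.
        enables = [reg_write and i == write_reg for i in range(len(self.regs))]
        self._pending = (enables, write_data & 0xFFFF_FFFF)

    def clock_edge(self) -> None:
        if self._pending is None:
            return
        enables, data = self._pending
        for i, enabled in enumerate(enables):
            if enabled:
                self.regs[i] = data   # only the selected register loads D
        self._pending = None


rf = RegisterFile()
rf.write(9, 42, reg_write=True)
print(rf.read(9, 0))   # (0, 0): the write has not happened yet
rf.clock_edge()
print(rf.read(9, 0))   # (42, 0)
```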


Simple Implementation

Include the functional units we need for each instruction


Figure: a. Instruction memory, b. Program counter (PC), c. Adder (Sum output).

Figure: a. Data memory unit (Address, Write data), b. Sign-extension unit (16-bit input, 32-bit output).

Figure: a. Registers (5-bit register numbers for Read register 1, Read register 2, and Write register; Write data; Read data 1 and Read data 2), b. ALU (4-bit ALU operation input; Zero and ALU result outputs).

Why do we need this stuff?

Figure 5.10 The datapath for the memory instructions and
the R-type instructions.


Building the Datapath
Use multiplexors to stitch them together

Figure: the single-cycle datapath assembled from these units: PC, instruction memory, register file (RegWrite), sign extension from 16 to 32 bits, ALU (4-bit ALU operation, Zero output), data memory (MemRead, MemWrite), an adder computing PC + 4, a second adder fed through a shift left 2 for branch targets, and multiplexors controlled by ALUSrc and MemtoReg, plus one selecting the next PC.


5.4 A Simple Implementation Scheme


Figure B.5.9 A 1-bit ALU that performs AND, OR, and
addition on a and b, or on a and the inverse of b.


FIGURE B.5.10 (Top) A 1-bit ALU that performs AND,
OR, and addition on a and b, or on a and the inverse of b.


FIGURE B.5.10 (bottom) a 1-bit ALU for the most
significant bit.


FIGURE B.5.11 A 32-bit ALU constructed from 31 copies of the 1-bit
ALU at the top of Figure B.5.10 and one 1-bit ALU at the bottom of that figure.


FIGURE B.5.12 The final 32-bit ALU. This adds a Zero
detector to Figure B.5.11.
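To see how the 1-bit slices combine, here is a Python sketch of a ripple-carry 32-bit ALU in the spirit of Figures B.5.9 through B.5.12. It assumes the 2-bit Operation encoding 0 = AND, 1 = OR, 2 = add, 3 = Less (for slt), with a single Bnegate line driving every Binvert and the first CarryIn; the function names are hypothetical.

```python
def alu_1bit(a, b, carry_in, less, binvert, operation):
    """One slice. operation: 0 = AND, 1 = OR, 2 = add, 3 = pass Less (for slt)."""
    b ^= binvert                                   # Binvert chooses b or its inverse
    s = a ^ b ^ carry_in                           # full-adder sum
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    result = (a & b, a | b, s, less)[operation]    # 4-input result multiplexor
    return result, carry_out, s                    # s doubles as the Set output


def alu_32(a, b, bnegate, operation):
    """Ripple 32 slices; Bnegate feeds every Binvert and the first CarryIn."""
    bits, carry, set_out = [], bnegate, 0
    for i in range(32):
        r, carry, set_out = alu_1bit((a >> i) & 1, (b >> i) & 1, carry, 0,
                                     bnegate, operation)
        bits.append(r)
    if operation == 3:
        # Less of bit 0 comes from the MSB's Set output; all other Less inputs are 0.
        bits[0] = set_out
    result = sum(bit << i for i, bit in enumerate(bits))
    zero = int(result == 0)   # the Zero detector NORs all result bits
    return result, zero


print(alu_32(7, 5, 1, 2))   # subtract: (2, 0)
print(alu_32(5, 7, 1, 3))   # slt 5 < 7: (1, 0)
print(alu_32(9, 9, 1, 2))   # beq-style compare: (0, 1)
```

As written, slt takes the sign of a - b from the most significant slice's sum, so it gives the wrong answer when that subtraction overflows.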


FIGURE B.5.14 The symbol commonly used to represent an
ALU, as shown in Figure B.5.12.


Figure 5.15 The datapath of Figure 5.12 with all necessary
multiplexors and all control lines identified



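Simple combinational logic (truth tables)

Figure: the ALU control block, which maps the funct field F(5-0) (bits F3 through F0 shown) to Operation2 through Operation0, and the main control block, which decodes R-format, lw, sw, and beq to produce RegDst, ALUSrc, MemtoReg, and the other control outputs.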
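A Python sketch of the ALU control function. The 2-bit ALUOp values (00 for lw/sw, 01 for beq, 10 for R-format) follow the ALUOp1/ALUOp0 columns of the control table in Figure 5.18; the 4-bit ALU control encoding (0000 AND, 0001 OR, 0010 add, 0110 subtract, 0111 set on less than) is an assumption taken from the textbook's ALU. `alu_control` and `FUNCT_TO_CONTROL` are hypothetical names.

```python
# Assumed 4-bit ALU control encoding (Ainvert, Bnegate, Operation1, Operation0).
ALU_AND, ALU_OR, ALU_ADD, ALU_SUB, ALU_SLT = 0b0000, 0b0001, 0b0010, 0b0110, 0b0111

FUNCT_TO_CONTROL = {
    0b100000: ALU_ADD,  # add
    0b100010: ALU_SUB,  # sub
    0b100100: ALU_AND,  # and
    0b100101: ALU_OR,   # or
    0b101010: ALU_SLT,  # slt
}


def alu_control(alu_op: int, funct: int) -> int:
    """Map ALUOp (from the main control) and the funct field to ALU control lines."""
    if alu_op == 0b00:   # lw, sw: add to form the memory address
        return ALU_ADD
    if alu_op == 0b01:   # beq: subtract so Zero reports equality
        return ALU_SUB
    if alu_op == 0b10:   # R-format: the funct field decides
        return FUNCT_TO_CONTROL[funct & 0b111111]
    raise ValueError(f"unused ALUOp value {alu_op:02b}")


print(f"{alu_control(0b10, 0b101010):04b}")  # slt -> 0111
```

For lw, sw, and beq the funct bits are ignored, which is why the main control only needs to supply two ALUOp bits.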




Figure 5.17 The simple datapath with the control unit.


Figure 5.18 The setting of the control lines is completely
determined by the opcode fields of the instruction.

Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
R-format     1       0       0         1         0        0         0       1       0
lw           0       1       1         1         1        0         0       0       0
sw           X       1       X         0         0        1         0       0       0
beq          X       0       X         0         0        0         1       0       1


Figure 5.19 The datapath in operation for an R-type instruction
such as add $t1, $t2, $t3.


Figure 5.20 The datapath in operation for a load instruction.


Figure 5.21 The datapath in operation for a branch-equal instruction.


Figure 5.22 The control function for the simple single-cycle
implementation is completely specified by this truth table.
Input or output  Signal name  R-format  lw  sw  beq
Inputs           Op5          0         1   1   0
                 Op4          0         0   0   0
                 Op3          0         0   1   0
                 Op2          0         0   0   1
                 Op1          0         1   1   0
                 Op0          0         1   1   0
Outputs          RegDst       1         0   X   X
                 ALUSrc       0         1   1   0
                 MemtoReg     0         1   X   X
                 RegWrite     1         1   0   0
                 MemRead      0         1   0   0
                 MemWrite     0         0   1   0
                 Branch       0         0   0   1
                 ALUOp1       1         0   0   0
                 ALUOp0       0         0   0   1

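The truth table maps directly onto two levels of logic. This Python sketch mirrors that structure: an AND plane with one product term per opcode column and an OR plane that builds each output. It assumes the don't-care (X) outputs are realized as 0 and that an unrecognized opcode asserts nothing; `decode_opcode` is a hypothetical name.

```python
def decode_opcode(opcode: int) -> dict:
    op = [(opcode >> i) & 1 for i in range(6)]   # op[0] is Op0, op[5] is Op5
    inv = [1 - bit for bit in op]                # complemented inputs

    # AND plane: one product term per recognized opcode.
    r_format = inv[5] & inv[4] & inv[3] & inv[2] & inv[1] & inv[0]  # 000000
    lw = op[5] & inv[4] & inv[3] & inv[2] & op[1] & op[0]           # 100011
    sw = op[5] & inv[4] & op[3] & inv[2] & op[1] & op[0]            # 101011
    beq = inv[5] & inv[4] & inv[3] & op[2] & inv[1] & inv[0]        # 000100

    # OR plane: each output ORs the product terms whose column holds a 1.
    return {
        "RegDst": r_format, "ALUSrc": lw | sw, "MemtoReg": lw,
        "RegWrite": r_format | lw, "MemRead": lw, "MemWrite": sw,
        "Branch": beq, "ALUOp1": r_format, "ALUOp0": beq,
    }


print(decode_opcode(0b100011))  # control outputs for lw
```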
Figure 5.23 Instruction format for the jump instruction
(opcode = 2).

Field 000010 address

Bit positions 31:26 25:0


Figure 5.24 The simple control and datapath are extended to
handle the jump instruction.
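A Python sketch of how the jump target is formed from the 26-bit address field: the field is shifted left 2 and concatenated with bits 31:28 of PC + 4 (the same rule given later for PCSource = 10). The function names are hypothetical.

```python
def is_jump(instruction: int) -> bool:
    return ((instruction >> 26) & 0x3F) == 0b000010   # opcode = 2


def jump_target(pc: int, instruction: int) -> int:
    """Concatenate PC + 4 bits 31:28 with the 26-bit address field shifted left 2."""
    address_field = instruction & 0x03FF_FFFF           # bits 25:0
    return ((pc + 4) & 0xF000_0000) | (address_field << 2)


j = (0b000010 << 26) | 0x0010_0000   # j with address field 0x100000
print(is_jump(j), hex(jump_target(0x0040_0018, j)))  # True 0x400000
```

Because only 28 bits come from the instruction, a jump cannot leave the 256 MB region selected by the upper 4 bits of PC + 4.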


Problem: Performance of Single-Cycle Machines (p.315)
Assume that the operation times for the major functional units in this implementation
are the following:

Memory units: 200 picoseconds (ps)

ALU and adders: 100 ps
Register file (read or write): 50 ps

Assuming that the multiplexors, control unit, PC accesses, sign extension unit, and
wires have no delay, which of the following implementations would be faster, and by
how much?

1. An implementation in which every instruction operates in 1 clock cycle of a
fixed length.
2. An implementation where every instruction executes in 1 clock cycle
using a variable-length clock, which for each instruction is only as long as it
needs to be.

To compare the performance, assume the following instruction mix: 25% loads, 10%
stores, 45% ALU instructions, 15% branches, and 5% jumps.


Let's start by comparing the CPU execution times.
$$\text{CPU execution time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}$$
Since CPI must be 1, we can simplify this to
$$\text{CPU execution time} = \text{Instruction count} \times \text{Clock cycle time}$$
The critical path for the different instruction classes is as follows:

Instruction class  Functional units used by the instruction class
R-type             Instruction fetch, Register access, ALU, Register access
Load word          Instruction fetch, Register access, ALU, Memory access, Register access
Store word         Instruction fetch, Register access, ALU, Memory access
Branch             Instruction fetch, Register access, ALU
Jump               Instruction fetch


Using these critical paths, we can compute the required length for
each instruction class:
Instruction class  Instruction memory  Register read  ALU operation  Data memory  Register write  Total
R-type             200                 50             100            0            50              400 ps
Load word          200                 50             100            200          50              600 ps
Store word         200                 50             100            200                          550 ps
Branch             200                 50             100            0                            350 ps
Jump               200                                                                            200 ps

Thus, the average time per instruction with a variable clock is

$$\text{CPU clock cycle} = 600 \times 25\% + 550 \times 10\% + 400 \times 45\% + 350 \times 15\% + 200 \times 5\% = 447.5\ \text{ps}$$


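Since the variable clock implementation has a shorter average clock
cycle, it is clearly faster. Let's find the performance ratio. The fixed
clock must be long enough for the slowest instruction (load word), so its
cycle is 600 ps:

$$\frac{\text{CPU performance}_{\text{variable clock}}}{\text{CPU performance}_{\text{single clock}}} = \frac{\text{CPU execution time}_{\text{single clock}}}{\text{CPU execution time}_{\text{variable clock}}} = \frac{IC \times \text{CPU clock cycle}_{\text{single clock}}}{IC \times \text{CPU clock cycle}_{\text{variable clock}}} = \frac{\text{CPU clock cycle}_{\text{single clock}}}{\text{CPU clock cycle}_{\text{variable clock}}} = \frac{600}{447.5} \approx 1.34$$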
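The arithmetic of this problem can be checked with a few lines of Python. The latencies and instruction mix come from the problem statement; percentages are kept as integers so the weighted average is exact. Variable names are illustrative.

```python
MEM, ALU, REG = 200, 100, 50   # ps; muxes, control, PC, sign extension, wires: 0 ps

class_time = {                  # units on each class's critical path
    "load":   MEM + REG + ALU + MEM + REG,   # 600 ps
    "store":  MEM + REG + ALU + MEM,         # 550 ps
    "alu":    MEM + REG + ALU + REG,         # 400 ps
    "branch": MEM + REG + ALU,               # 350 ps
    "jump":   MEM,                           # 200 ps
}
mix_percent = {"load": 25, "store": 10, "alu": 45, "branch": 15, "jump": 5}

fixed_cycle = max(class_time.values())  # a fixed clock must fit the slowest class
variable_avg = sum(class_time[c] * mix_percent[c] for c in mix_percent) / 100
speedup = fixed_cycle / variable_avg

print(fixed_cycle, variable_avg, round(speedup, 2))  # 600 447.5 1.34
```

Note that a truly variable-length clock is impractical to build; the calculation only quantifies how much the single-cycle design loses by sizing every cycle for the load.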


5.5 A Multicycle Implementation



Multicycle implementation Also called multiple clock cycle
implementation. An implementation in which an instruction is
executed in multiple clock cycles.

Microprogramming A symbolic representation of control in the
form of instructions, called microinstructions, that are executed on
a simple micromachine.

Finite state machine A sequential logic function consisting of a
set of inputs and outputs, a next-state function that maps the
current state and the inputs to a new state, and an output function
that maps the current state and possibly the input to a set of
asserted outputs.

Next-state function A combinational function that, given the
inputs and the current state, determines the next state of a finite
state machine.

Where we are headed

Single Cycle Problems:

what if we had a more complicated instruction like floating point?
wasteful of area
One Solution:
use a smaller cycle time
have different instructions take different numbers of cycles
a multicycle datapath:


Multicycle Approach

We will be reusing functional units

ALU used to compute address and to increment PC
Memory used for instruction and data
Our control signals will not be determined directly by the instruction
e.g., what should the ALU do for a subtract instruction?
We'll use a finite state machine for control


Multicycle Approach

Break up the instructions into steps, each step takes a cycle

balance the amount of work to be done
restrict each cycle to use only one major functional unit
At the end of a cycle
store values for use in later cycles (easiest thing to do)
introduce additional internal registers


Figure 5.27 The multicycle datapath from Figure 5.26 with the
control lines shown.


Figure 5.28 The complete datapath for the multicycle
implementation together with the necessary control lines.


Figure 5.29 The action caused by the setting of each control
signal in Figure 5.28 on page 323.

Actions of the 1-bit control signals

RegDst
  Deasserted: The register file destination number for the Write register comes from the rt field.
  Asserted: The register file destination number for the Write register comes from the rd field.
RegWrite
  Deasserted: None.
  Asserted: The general-purpose register selected by the Write register number is written with the value of the Write data input.
ALUSrcA
  Deasserted: The first ALU operand is the PC.
  Asserted: The first ALU operand comes from the A register.
MemRead
  Deasserted: None.
  Asserted: Content of memory at the location specified by the address input is put on the Memory data output.
MemWrite
  Deasserted: None.
  Asserted: Memory contents at the location specified by the address input are replaced by the value on the Write data input.
MemtoReg
  Deasserted: The value fed to the register file Write data input comes from ALUOut.
  Asserted: The value fed to the register file Write data input comes from the MDR.
IorD
  Deasserted: The PC is used to supply the address to the memory unit.
  Asserted: ALUOut is used to supply the address to the memory unit.
IRWrite
  Deasserted: None.
  Asserted: The output of the memory is written into the IR.
PCWrite
  Deasserted: None.
  Asserted: The PC is written; the source is controlled by PCSource.
PCWriteCond
  Deasserted: None.
  Asserted: The PC is written if the Zero output from the ALU is also active.



Actions of the 2-bit control signals

Signal name  Value (binary)  Effect
ALUOp 00 The ALU performs an add operation.

01 The ALU performs a subtract operation.

10 The funct field of the instruction determines the ALU operation.

ALUSrcB 00 The second input to the ALU comes from the B register.

01 The second input to the ALU is the constant 4.

10 The second input to the ALU is the sign-extended lower 16 bits of the IR.

11 The second input to the ALU is the sign-extended lower 16 bits of the IR shifted left 2 bits.

PCSource 00 Output of the ALU (PC+4) is sent to the PC for writing.

01 The contents of ALUOut (the branch target address) are sent to the PC for writing.

10 The jump target address (IR[25:0] shifted left 2 bits and concatenated with PC+4[31:28]) is
sent to the PC for writing.
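A Python sketch of the multiplexors these signals control, assuming 32-bit unsigned values and that the PC has already been incremented in step 1, so its upper 4 bits are those of PC + 4. `alu_inputs` and `next_pc` are hypothetical names.

```python
MASK32 = 0xFFFF_FFFF


def sign_extend16(imm: int) -> int:
    imm &= 0xFFFF
    return (imm | 0xFFFF_0000) if imm & 0x8000 else imm


def alu_inputs(alu_src_a: int, alu_src_b: int, pc: int, a_reg: int, b_reg: int, ir: int):
    """Operands chosen by ALUSrcA (1 bit) and ALUSrcB (2 bits)."""
    first = a_reg if alu_src_a else pc
    imm = sign_extend16(ir)                        # sign-extended IR[15:0]
    second = (b_reg, 4, imm, (imm << 2) & MASK32)[alu_src_b]
    return first, second


def next_pc(pc_source: int, alu_result: int, alu_out: int, pc: int, ir: int) -> int:
    """Value offered to the PC, chosen by PCSource (00, 01, or 10; 11 is unused)."""
    jump = (pc & 0xF000_0000) | ((ir & 0x03FF_FFFF) << 2)   # pc already holds PC + 4
    return (alu_result, alu_out, jump)[pc_source]
```

For example, ALUSrcA = 0 with ALUSrcB = 01 presents PC and 4 to the ALU, which is exactly the PC + 4 computation of instruction fetch.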


Instructions from ISA perspective

Consider each instruction from the perspective of the ISA.

The add instruction changes a register.
Register specified by bits 15:11 of instruction.
Instruction specified by the PC.
New value is the sum (op) of two registers.
Registers specified by bits 25:21 and 20:16 of the instruction
Reg[Memory[PC][15:11]] <= Reg[Memory[PC][25:21]] op Reg[Memory[PC][20:16]]

In order to accomplish this we must break up the instruction.

(kind of like introducing variables when programming)


Breaking down an instruction

ISA definition of arithmetic:

Reg[Memory[PC][15:11]] <= Reg[Memory[PC][25:21]] op Reg[Memory[PC][20:16]]


Could break down to:

IR <= Memory[PC]
A <= Reg[IR[25:21]]
B <= Reg[IR[20:16]]
ALUOut <= A op B
Reg[IR[15:11]] <= ALUOut

We forgot an important part of the definition of arithmetic!

PC <= PC + 4


Idea behind multicycle approach

We define each instruction from the ISA perspective (do this!)

Break it down into steps following our rule that data flows through
at most one major functional unit (e.g., balance work across steps)

Introduce new registers as needed (e.g., A, B, ALUOut, MDR, etc.)

Finally, try to pack as much work as possible into each step

(avoid unnecessary cycles)
while also trying to share steps where possible
(minimizes control, helps to simplify solution)

Result: our book's multicycle implementation!


Five Execution Steps

Instruction Fetch

Instruction Decode and Register Fetch

Execution, Memory Address Computation, or Branch Completion

Memory Access or R-type instruction completion

Write-back step
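One way to see how these steps become a finite state machine is to write its next-state function. The Python sketch below uses hypothetical state names and a string opcode for readability; each state takes one clock cycle, and counting the states each instruction class visits reproduces the per-class cycle counts used in the CPI problem (lw 5, sw 4, R-type 4, beq 3, j 3).

```python
def next_state(state: str, opcode: str) -> str:
    """Next-state function; state names and opcode strings are illustrative."""
    if state == "fetch":
        return "decode"
    if state == "decode":
        return {"lw": "mem_addr", "sw": "mem_addr", "R": "execute",
                "beq": "branch_complete", "j": "jump_complete"}[opcode]
    if state == "mem_addr":
        return "mem_read" if opcode == "lw" else "mem_write"
    if state == "mem_read":
        return "write_back"
    if state == "execute":
        return "r_complete"
    # mem_write, r_complete, write_back, branch_complete, jump_complete finish
    return "fetch"


def cycles_for(opcode: str) -> int:
    """Count states (one clock cycle each) from fetch until control returns to fetch."""
    state, count = "fetch", 0
    while True:
        state = next_state(state, opcode)
        count += 1
        if state == "fetch":
            return count


print({op: cycles_for(op) for op in ("lw", "sw", "R", "beq", "j")})
# {'lw': 5, 'sw': 4, 'R': 4, 'beq': 3, 'j': 3}
```

The opcode only matters in two places (after decode and after address computation); every other transition is fixed, which keeps the control logic small.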



Step 1: Instruction Fetch

Use PC to get instruction and put it in the Instruction Register.

Increment the PC by 4 and put the result back in the PC.
This can be described succinctly using RTL ("Register-Transfer Language"):

IR <= Memory[PC];
PC <= PC + 4;

Can we figure out the values of the control signals?

What is the advantage of updating the PC now?


Step 2: Instruction Decode and Register Fetch

Read registers rs and rt in case we need them

Compute the branch address in case the instruction is a branch

A <= Reg[IR[25:21]];
B <= Reg[IR[20:16]];
ALUOut <= PC + (sign-extend(IR[15:0]) << 2);

We aren't setting any control lines based on the instruction type

(we are busy "decoding" it in our control logic)


Step 3 (instruction dependent)

ALU is performing one of three functions, based on instruction type

Memory Reference:

ALUOut <= A + sign-extend(IR[15:0]);


R-type:

ALUOut <= A op B;

Branch:

if (A==B) PC <= ALUOut;


Step 4 (R-type or memory-access)

Loads and stores access memory

MDR <= Memory[ALUOut];   (load)

or

Memory[ALUOut] <= B;   (store)

R-type instructions finish

Reg[IR[15:11]] <= ALUOut;

The write actually takes place at the end of the cycle, on the clock edge.


Write-back step

Reg[IR[20:16]] <= MDR;

Which instruction needs this?
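Putting steps 1 through 5 together, here is a Python sketch that executes one instruction at a time following the RTL above, with the internal registers IR, A, B, ALUOut, and MDR as local variables. It handles only add, lw, sw, and beq, models memory as a dictionary keyed by byte address, and does not protect register 0; all names are hypothetical.

```python
MASK32 = 0xFFFF_FFFF


def sign_extend16(imm: int) -> int:
    imm &= 0xFFFF
    return (imm | 0xFFFF_0000) if imm & 0x8000 else imm


def execute_multicycle(pc: int, reg: list, mem: dict):
    """Run one add, lw, sw, or beq; return (new PC, clock cycles used)."""
    # Step 1: instruction fetch
    ir = mem[pc]
    pc = (pc + 4) & MASK32
    # Step 2: decode and register fetch, plus the speculative branch target
    a = reg[(ir >> 21) & 0x1F]
    b = reg[(ir >> 16) & 0x1F]
    alu_out = (pc + (sign_extend16(ir) << 2)) & MASK32
    op = ir >> 26
    # Step 3 for beq: branch completion
    if op == 0b000100:
        if a == b:
            pc = alu_out
        return pc, 3
    # Steps 3 and 4 for R-type (only add, funct 100000, is modeled)
    if op == 0:
        if (ir & 0x3F) != 0b100000:
            raise NotImplementedError("only add is modeled")
        alu_out = (a + b) & MASK32
        reg[(ir >> 11) & 0x1F] = alu_out
        return pc, 4
    if op not in (0b100011, 0b101011):
        raise NotImplementedError(f"opcode {op:06b} is not modeled")
    # Step 3 for lw/sw: memory address computation
    alu_out = (a + sign_extend16(ir)) & MASK32
    if op == 0b101011:            # Step 4 for sw: memory write
        mem[alu_out] = b
        return pc, 4
    mdr = mem[alu_out]            # Step 4 for lw: memory read
    reg[(ir >> 16) & 0x1F] = mdr  # Step 5: write-back
    return pc, 5
```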




Problem: CPI in a multicycle CPU

Using the SPECINT2000 instruction mix shown in Figure 3.26, what is
the CPI, assuming that each state in the multicycle CPU requires 1
clock cycle?

The mix is 25% loads (1% load byte+24% load word), 10% stores (1%
store byte+9% store word), 11% branches (6% beq, 5% bne), 2% jumps
(1% jal+1% jr), and 52% ALU (all the rest of the mix, which we assume to
be ALU instructions). From Figure 5.30 on page 329, the number of clock
cycles for each instruction class is the following:
Loads: 5; stores: 4; ALU instructions: 4; branches: 3; jumps: 3.

The CPI is given by the following:

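$$\text{CPI} = \frac{\text{CPU clock cycles}}{\text{Instruction count}} = \frac{\sum_i \text{Instruction count}_i \times \text{CPI}_i}{\text{Instruction count}} = \sum_i \frac{\text{Instruction count}_i}{\text{Instruction count}} \times \text{CPI}_i$$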
The ratio

$$\frac{\text{Instruction count}_i}{\text{Instruction count}}$$

is simply the instruction frequency for the instruction class i. We
can therefore substitute to obtain

$$\text{CPI} = 0.25 \times 5 + 0.10 \times 4 + 0.52 \times 4 + 0.11 \times 3 + 0.02 \times 3 = 4.12$$
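The same weighted sum in Python, using the mix and per-class cycle counts from the problem; the percentages are integers so the sum is exact before the final division.

```python
mix_percent = {"load": 25, "store": 10, "alu": 52, "branch": 11, "jump": 2}
cycles = {"load": 5, "store": 4, "alu": 4, "branch": 3, "jump": 3}

assert sum(mix_percent.values()) == 100   # the mix covers every instruction
cpi = sum(mix_percent[c] * cycles[c] for c in mix_percent) / 100
print(f"CPI = {cpi:.2f}")   # CPI = 4.12
```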

This CPI is better than the worst-case CPI of 5.0 when all the
instructions take the same number of clock cycles.


Figure 5.39 The multicycle datapath with the addition needed
to implement exceptions.

