Professional Documents
Culture Documents
Slides Week8 PDF
Slides Week8 PDF
Introduc)on
• We shall first look at the instruc>on set architecture of a popular Reduced
Instruc,on Set Architecture (RISC) processor, viz. MIPS32.
– It is a 32-bit processor, i.e. can operate on 32 bits of data at a >me.
• We shall look at the instruc>on types, and how instruc>ons are encoded.
• Then we can understand the process of instruc>on execu>on, and the steps
involved.
• We shall discuss the pipeline implementa>on of the processor.
– Only for a small subset of the instruc>ons (and some simplifying assump>ons).
• Finally, we shall present the Verilog design of the pipeline processor.
1
15/09/17
2
15/09/17
3
15/09/17
4
15/09/17
5
15/09/17
6
15/09/17
END OF LECTURE 37
7
15/09/17
8
15/09/17
9
15/09/17
10
15/09/17
Memory Reference:
Example: LW R3, 100(R8)
ALUOut ß A + Imm;
Branch:
ALUOut ß NPC + Imm ; Example: BEQZ R2, Label
[op is ==]
cond ß (A op 0);
11
15/09/17
Load instruc>on:
PC ß NPC;
LMD ß Mem [ALUOut];
Branch instruc>on:
if (cond) PC ß ALUOut;
else PC ß NPC;
31 26 25 21 20 16 15 0
I-type opcode rs rt Immediate Data
12
15/09/17
Load Instruc>on:
Reg [rt] ß LMD;
13
15/09/17
14
15/09/17
MEM PC ß NPC;
if (cond) PC ß ALUOut;
WB -
= 0 cond
+ NPC M
1 U
M X
rs A
A U
rt X L
Instruc>on Register A U
PC IR rd Bank L O L
Memory Data
U u M M
B M t Memory D
U U
X X
I
The MIPS32 Sign
M
Extend
Data Path (Non- M
Pipelined)
15
15/09/17
END OF LECTURE 38
16
15/09/17
Introduc)on
• Basic requirements for pipelining the MIPS32 data path:
– We should be able to start a new instruc>on every clock cycle.
– Each of the five steps men>oned before (IF, ID, EX, MEM and WB)
becomes a pipeline stage.
– Each stage must finish its execu>on within one clock cycle.
Δ T Δ T Δ T Δ T Δ T Δ
IF ID EX MEM WB
Clock Cycles
Instruc)on 1 2 3 4 5 6 7 8
i IF ID EX MEM WB
i + 1 IF ID EX MEM WB
i + 2 IF ID EX MEM WB
i + 3 IF ID EX MEM WB
Instr-i Instr-(i+2)
finishes finishes
Instr-(i+1) Instr-(i+3)
finishes finishes
17
15/09/17
Source Register 1
Read (5 bits)
Port 1 Register Data Des>na>on Register
(32 bits) (5 bits) Write
REGISTER Register Data Port
Source Register 1 BANK (32 bits)
(5 bits)
Read
Port 2 Register Data
(32 bits)
18
15/09/17
From EX_MEM_ALUOut
From EX_MEM_cond
M
1 U
+ X NPC
IR
Instruc>on
PC
Memory
IF_ID
19
15/09/17
NPC NPC
rs
A
rt
IR Register
Bank B
Imm
Sign
Extend
IR
From MEM_WB_ALUOut
or MEM_WB_LMD
From MEM_WB_IR[rd]
IF_ID ID_EX
20
15/09/17
EX_MEM_IR ß ID_EX_IR;
EX_MEM_ALUOut ß ID_EX_A + ID_EX_Imm;
LOAD / STORE
EX_MEM_B ß ID_EX_B;
NPC
cond
= 0
M
A U
X
A ALUOut
L
B
M U
Imm U
X B
IR IR
ID_EX EX_MEM
21
15/09/17
MEM_WB_IR ß EX_MEM_IR;
STORE
Mem [EX_MEM_ALUOut] ß EX_MEM_B;
To IF stage
To IF stage
cond
ALUOut
Data LMD
B Memory
ALUOut
IR IR
EX_MEM MEM_WB
22
15/09/17
LMD
M
ALUOut U
X
IR
MEM_WB
Register
Access
23
15/09/17
M NPC
1 = 0
U
+ X
rs M
A U
X A
Register
rt L LMD
IR Bank B
PC IM M U DM M
U U
X X
Sign B
Extend ALUOut
IR
24
15/09/17
END OF LECTURE 39
25
15/09/17
26
15/09/17
reg HALTED;
// Set after HLT instruction is completed (in WB stage)
reg TAKEN_BRANCH;
// Required to disable instructions after branch
27
15/09/17
case (IF_ID_IR[31:26])
ADD,SUB,AND,OR,SLT,MUL: ID_EX_type <= #2 RR_ALU;
ADDI,SUBI,SLTI: ID_EX_type <= #2 RM_ALU;
LW: ID_EX_type <= #2 LOAD;
SW: ID_EX_type <= #2 STORE;
BNEQZ,BEQZ: ID_EX_type <= #2 BRANCH;
HLT: ID_EX_type <= #2 HALT;
default: ID_EX_type <= #2 HALT;
// Invalid opcode
endcase
end
28
15/09/17
case (ID_EX_type)
RR_ALU: begin
case (ID_EX_IR[31:26]) // "opcode"
ADD: EX_MEM_ALUOut <= #2 ID_EX_A + ID_EX_B;
SUB: EX_MEM_ALUOut <= #2 ID_EX_A - ID_EX_B;
AND: EX_MEM_ALUOut <= #2 ID_EX_A & ID_EX_B;
OR: EX_MEM_ALUOut <= #2 ID_EX_A | ID_EX_B;
SLT: EX_MEM_ALUOut <= #2 ID_EX_A < ID_EX_B;
MUL: EX_MEM_ALUOut <= #2 ID_EX_A * ID_EX_B;
default: EX_MEM_ALUOut <= #2 32'hxxxxxxxx;
endcase
end
Hardware Modeling Using Verilog 57
RM_ALU: begin
case (ID_EX_IR[31:26]) // "opcode"
ADDI: EX_MEM_ALUOut <= #2 ID_EX_A + ID_EX_Imm;
SUBI: EX_MEM_ALUOut <= #2 ID_EX_A - ID_EX_Imm;
SLTI: EX_MEM_ALUOut <= #2 ID_EX_A < ID_EX_Imm;
default: EX_MEM_ALUOut <= #2 32'hxxxxxxxx;
endcase
end
29
15/09/17
LOAD, STORE:
begin
EX_MEM_ALUOut <= #2 ID_EX_A + ID_EX_Imm;
EX_MEM_B <= #2 ID_EX_B;
end
BRANCH: begin
EX_MEM_ALUOut <= #2 ID_EX_NPC + ID_EX_Imm;
EX_MEM_cond <= #2 (ID_EX_A == 0);
end
endcase
end
case (EX_MEM_type)
RR_ALU, RM_ALU:
MEM_WB_ALUOut <= #2 EX_MEM_ALUOut;
30
15/09/17
endmodule
END OF LECTURE 40
31
15/09/17
32
15/09/17
Example 1
• Add three numbers 10, 20 and 30 stored in processor registers.
• The steps:
– Ini>alize register R1 with 10.
– Ini>alize register R2 with 20.
– Ini>alize register R3 with 30.
– Add the three numbers and store the sum in R4.
33
15/09/17
module test_mips32;
initial
begin
clk1 = 0; clk2 = 0;
repeat (20) // Generating two-phase clock
begin
#5 clk1 = 1; #5 clk1 = 0;
#5 clk2 = 1; #5 clk2 = 0;
end
end
initial
begin
for (k=0; k<31; k++)
mips.Reg[k] = k;
34
15/09/17
mips.HALTED = 0;
mips.PC = 0;
mips.TAKEN_BRANCH = 0; SIMULATION
OUTPUT
#280
for (k=0; k<6; k++)
$display ("R%1d - %2d", k, mips.Reg[k]); R0 - 0
end R1 - 10
R2 - 20
initial R3 - 25
begin R4 - 30
$dumpfile ("mips.vcd"); R5 - 55
$dumpvars (0, test_mips32);
#300 $finish;
end
endmodule
35
15/09/17
Example 2
• Load a word stored in memory loca>on 120, add 45 to it, and store the result
in memory loca>on 121.
• The steps:
– Ini>alize register R1 with the memory address 120.
– Load the contents of memory loca>on 120 into register R2.
– Add 45 to register R2.
– Store the result in memory loca>on 121.
36
15/09/17
module test_mips32;
initial
begin
clk1 = 0; clk2 = 0;
repeat (50) // Generating two-phase clock
begin
#5 clk1 = 1; #5 clk1 = 0;
#5 clk2 = 1; #5 clk2 = 0;
end
end
initial
begin
for (k=0; k<31; k++)
mips.Reg[k] = k;
mips.Mem[120] = 85;
37
15/09/17
mips.PC = 0;
mips.HALTED = 0;
mips.TAKEN_BRANCH = 0;
endmodule
38
15/09/17
Example 3
• Compute the factorial of a number N stored in memory loca>on 200. The
result will be stored in memory loca>on 198.
• The steps:
– Ini>alize register R10 with the memory address 200.
– Load the contents of memory loca>on 200 into register R3.
– Ini>alize register R2 with the value 1.
– In a loop, mul>ply R2 and R3, and store the product in R2.
– Decrement R3 by 1; if not zero repeat the loop.
– Store the result (from R3) in memory loca>on 198.
39
15/09/17
module test_mips32;
initial
begin
clk1 = 0; clk2 = 0;
repeat (50) // Generating two-phase clock
begin
#5 clk1 = 1; #5 clk1 = 0;
#5 clk2 = 1; #5 clk2 = 0;
end
end
initial
begin
for (k=0; k<31; k++)
mips.Reg[k] = k;
40
15/09/17
mips.PC = 0;
mips.HALTED = 0;
mips.TAKEN_BRANCH = 0;
initial
begin
$dumpfile ("mips.vcd");
$dumpvars (0, test_mips32);
$monitor ("R2: %4d", mips.Reg[2]);
#3000 $finish;
end
endmodule
SIMULATION
OUTPUT
R2: 2
R2: 1
R2: 7
R2: 42
R2: 210
R2: 840
R2: 2520
R2: 5040
R2: 5040
Mem[200] = 7, Mem[198] = 5040
41
15/09/17
42
15/09/17
Point to Note
• We have not considered the methods for avoiding hazards in pipelines.
• For the examples shown, we have inserted dummy instruc>ons between
dependent pairs of instruc>ons.
– So that data hazard does not lead to incorrect results.
• Also, we have modeled the processor using behavioral code.
– In a real design where the target is to synthesize into hardware, structural design of
the pipeline stages is usually used.
– The Verilog code will be genera>ng the control signals for the pipeline data path in
the proper sequence.
END OF LECTURE 41
43
15/09/17
Topics Covered
• Basic introduc>on to Verilog language and features.
• Difference between behavioral and structural representa>ons.
• Using con>nuous dataflow assignments in Verilog code.
• Modeling combina>onal and sequen>al circuits.
• Procedural blocks, blocking and non-blocking assignments.
• Wri>ng test benches.
44
15/09/17
45
15/09/17
46