5 Stage Pipelined RISCV Processor in RTL
Conference Paper · December 2021
https://www.researchgate.net/publication/359861898
All content following this page was uploaded by Gaurav Srivastav on 10 April 2022.

5 Stage Pipelined RISCV Processor in RTL
Gaurav Srivastav, M.Tech, IIIT Delhi, Department of ECE, Delhi, India
Divya Sareen, M.Tech, IIIT Delhi, Department of ECE, Delhi, India
Hanan Hamid, B.Tech, IIIT Delhi, Department of ECE, Delhi, India
Abstract— This paper presents an implementation of the RISC-V (RV32I) processor architecture in Verilog. The processor has a 5-stage pipeline: fetch, decode, execute, memory and writeback. It is designed using the Harvard architecture, i.e., two separate memories are used for storing instructions and data. To achieve better performance, bypassing and stalling logic is implemented, along with 10 different instructions. The design implements the 32-bit RV32I architecture.
Keywords— RISC-V (RV32I), pipeline, RTL, Xilinx Vivado

Introduction
RISC is a design technique used to reduce the chip area, the complexity of the instruction set, the instruction cycle and the cost of implementing a design. With the rapid growth of silicon technology and the fall in the price of ICs, the use of RISC processors has increased rapidly in all areas of technology. RISC-V is an open standard instruction set architecture (ISA) based on established reduced instruction set computer (RISC) principles. RISC-V is provided under an open-source license, which means anyone can use it free of charge. The architecture comes in three variants: 32-bit, 64-bit and 128-bit. This project uses the 32-bit architecture. Like other RISC processors, it is a load-store architecture. Only load and store instructions access memory, and all other operations are performed register-to-register [1].

The main aim behind RISC-V was a processor architecture that is user friendly, that anyone can study and obtain, and that lets designers build their own processors for their own needs. Because ARM and Intel processors are very costly, the infrastructure required to learn about and test anything on them is expensive. Moreover, ARM does not provide enough transparency for a designer to know every detail of the design.

The rest of this paper is organized as follows. Section I describes the implementation of the whole code: how the code files work, how they are linked with each other, and the logic of every operation. Section II describes the formats of the different instructions. Section III describes the simulation and results of our project.

A. Fetch Stage
Fetch is the first stage of the processor pipeline. It uses a program counter to fetch the instruction code from the instruction memory. After every instruction, the program counter is advanced to the next instruction (by 4 in a byte-addressable memory).

B. Decode Stage
In the decode stage, the fetched instruction is decoded and the operand values are read from the registers specified in the instruction.

C. Execute Stage
This stage receives the values from the decode stage and performs the operation specified by the instruction.

D. Memory Stage
The memory stage performs memory accesses and forwards the signals that are not related to memory.

E. Writeback Stage
The writeback stage writes the new value into the destination register.

I. IMPLEMENTATION
We have implemented this project in Verilog HDL. Verilog is a hardware description language (HDL) widely used for describing electronic hardware. It lets us design and test abstract hardware circuits without having to worry about a specific technology. The RISC-V architecture is designed at the register transfer level (RTL). In RTL code, a circuit is defined in terms of operations on, and transfers of data between, registers. This code can then be synthesized to generate a schematic, a graphical representation of the modules used and the connections between them.

A. Code Modules & Working
Since the processor has a 5-stage pipeline, the program is divided into five stage modules (fetch, decode, ALU, memory and writeback), plus a Processor module. The Processor module contains all the connections and instantiations of all 5 stages of the
pipeline and the corresponding modules required for the functioning of the pipeline, namely the instruction memory, the data memory and the register file. A testbench instantiates this main module to test the outputs of the code. The clock that drives the processor is defined in the testbench. When the simulation starts, the testbench drives the main module with the clock, the program counter and a start pulse that starts the processor. First of all, the instruction memory is loaded with the 32-bit instructions to be executed on the processor.

The main file, Processor.v, first invokes the fetch stage with the value of the program counter. The fetch stage uses the program counter (PC) to fetch the corresponding instruction from the instruction memory. After every fetch, the program counter is automatically incremented. In a byte-addressable memory it becomes PC+4, and in a word-addressable memory it becomes PC+1. Our implementation uses the latter.

Processor.v then passes the fetched instruction from the fetch stage to the decode stage. In the decode stage, the instruction is broken down into fields as required by the RISC-V RV32I instruction set architecture. The decode stage prepares the information required by the next stage, the ALU stage. This includes the source and destination registers used by the instruction and the operation it performs. If an offset is used, the decode stage also supplies the offset value in sign-extended form.

The ALU stage performs the arithmetic or logical operation specified in the instruction. If the instruction is a branch, the ALU also calculates the next value of the program counter for the branch. The ALU then sends its output to the main file, which passes it to the memory stage.

The memory stage performs the memory access operations for load and store instructions. For all other instructions, it forwards the ALU output to the next stage, the writeback stage.

Finally, the output of the memory stage is sent to the writeback stage, where the result of the instruction is written to the register file.

Each instruction passes through these stages in order, while successive instructions occupy different stages in the same clock cycle. This overlapped execution is why the design is called a pipeline.

Implementation of Branch Module
If the decode stage finds that the instruction being decoded is a branch, the new program counter value is calculated and the branch condition is checked in the ALU stage. This result is continuously fed back to the fetch stage. If the branch condition is true, the fetch stage replaces the program counter with the new value and starts fetching the next instruction from that address. At the same time, the speculative instructions that were already fetched are flushed from the pipeline. To flush them, their memory and writeback operations are disabled when the branch is taken.

Implementation of Stall
Stall logic is required to overcome data hazards. It stalls the pipeline until the source register used by an instruction has been updated to its correct value by the previous instruction. For example, when a read-after-write (RAW) dependency arises, control logic is needed to freeze the earlier stages until the previous instruction has produced the required value.
We have implemented the stall logic by tracking the in-flight registers in the pipeline. The destination register is tracked after each stage. If a source register in the decode stage matches any in-flight destination register, the stall signal is raised. NOP instructions are then inserted from the fetch stage to flow through the pipeline, delaying the dependent instruction until the dependency is resolved.

Implementation of Bypass
Bypassing (forwarding) passes the result of one instruction directly to a following instruction that uses the updated register value. This happens before the first instruction has reached the writeback stage.
To implement it, we track the in-flight registers in the same way as for stalling. If an in-flight destination register matches a source register, the output of that stage is forwarded directly to the decode stage for the next instruction. Bypassing allows the program to complete in fewer clock cycles.
Processor Architecture [3]
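The stall and bypass logic described above make the same comparison. Each one checks the source registers of the instruction in decode against the destination registers of the in-flight instructions, and the two differ only in what happens on a match. The Python sketch below models that comparison behaviorally and is not the authors' Verilog. The following are assumptions made for illustration:

- the InFlight record;
- the stage names EX, MEM and WB;
- the youngest-first ordering;
- using None for a result that is not ready yet (for example, a load still in execute).

Register R0 (x0) is excluded because it is hardwired to zero in RISC-V.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class InFlight:
    """Hypothetical record of an instruction sitting in a later pipeline stage."""
    stage: str                   # "EX", "MEM" or "WB" (assumed stage names)
    rd: Optional[int]            # destination register, None if nothing is written
    value: Optional[int] = None  # result, if this stage has already computed it


def needs_stall(sources: Sequence[int], in_flight: Sequence[InFlight]) -> bool:
    """Stall-only policy: hold decode while any source register is still in flight."""
    return any(f.rd is not None and f.rd != 0 and f.rd in sources for f in in_flight)


def forward(src: int, reg_file: Sequence[int], in_flight: Sequence[InFlight]) -> Optional[int]:
    """Bypass policy: take the newest in-flight value for src, else read the register file.

    in_flight must be ordered youngest first (EX, then MEM, then WB) so that the
    most recent write wins. None means the producer has not computed its value
    yet (assumed case: a load still in EX), so the pipeline must still stall.
    """
    if src == 0:
        return 0  # x0 is hardwired to zero and never causes a hazard
    for f in in_flight:
        if f.rd == src:
            return f.value
    return reg_file[src]


# Registers initialized to their own index, as in the Section III set-up.
regs: List[int] = list(range(32))

# ADDI R2,R4,5 is in decode; OR R4,R1,R0 is in EX and ADD R2,R1,R3 is in MEM.
pipe = [InFlight("EX", rd=4, value=0), InFlight("MEM", rd=2, value=3)]

print(needs_stall([4], pipe))  # True: a stall-only design must wait for R4
print(forward(4, regs, pipe))  # 0: a bypassing design takes R4 from EX instead
```

A stall-only design needs just the comparison in needs_stall. Bypassing also needs the value itself and a priority order among stages, which is why forward scans the youngest stage first.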

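The field layouts in Section II and the hexadecimal words in Section III can be cross-checked with a small software model. It covers the instruction encoder and the decode stage's field extraction and sign extension. The Python sketch below is illustrative only and is not part of the authors' Verilog. It assumes the standard RV32I encodings of [2], including the standard B-type branch offset. The names OPCODES, R_FUNCT, encode_r, encode_i, encode_s, sign_extend and decode are hypothetical.

```python
# Standard RV32I opcodes (bits 6-0) for the formats in Section II (assumed per [2]).
OPCODES = {"OP": 0b0110011, "OP_IMM": 0b0010011, "LOAD": 0b0000011,
           "STORE": 0b0100011, "BRANCH": 0b1100011}

# (funct7, funct3) pairs for the R-type instructions.
R_FUNCT = {
    "ADD": (0b0000000, 0b000), "SUB": (0b0100000, 0b000),
    "SLL": (0b0000000, 0b001), "SRA": (0b0100000, 0b101),
    "OR": (0b0000000, 0b110), "AND": (0b0000000, 0b111),
}


def encode_r(name: str, rd: int, rs1: int, rs2: int) -> int:
    funct7, funct3 = R_FUNCT[name]
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | OPCODES["OP"]


def encode_i(opcode: int, funct3: int, rd: int, rs1: int, imm: int) -> int:
    # Mask the signed 12-bit immediate to its two's-complement bit pattern.
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def encode_s(funct3: int, rs1: int, rs2: int, imm: int) -> int:
    imm &= 0xFFF  # imm[11:5] goes to bits 31-25, imm[4:0] to bits 11-7
    return ((imm >> 5) << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) \
        | ((imm & 0x1F) << 7) | OPCODES["STORE"]


def sign_extend(value: int, bits: int) -> int:
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def decode(word: int) -> dict:
    """Split a 32-bit instruction word into its fields, as the decode stage does."""
    return {
        "opcode": word & 0x7F,
        "rd": (word >> 7) & 0x1F,
        "funct3": (word >> 12) & 0x7,
        "rs1": (word >> 15) & 0x1F,
        "rs2": (word >> 20) & 0x1F,
        "funct7": (word >> 25) & 0x7F,
        # Each immediate is only meaningful for its own format.
        "imm_i": sign_extend(word >> 20, 12),
        "imm_b": sign_extend((((word >> 31) & 0x1) << 12) | (((word >> 7) & 0x1) << 11)
                             | (((word >> 25) & 0x3F) << 5) | (((word >> 8) & 0xF) << 1), 13),
    }


# A few rows of the Section III table, rebuilt from their fields.
print(hex(encode_r("ADD", rd=2, rs1=1, rs2=3)))                     # 0x308133
print(hex(encode_i(OPCODES["OP_IMM"], 0b000, rd=2, rs1=4, imm=5)))  # 0x520113
print(hex(encode_s(0b010, rs1=20, rs2=20, imm=0)))                   # 0x14a2023
d = decode(0x01499AB3)
print(d["rd"], d["rs1"], d["rs2"])                                   # 21 19 20
```

For example, decoding the SLL word 01499ab3 gives rd = 21, rs1 = 19 and rs2 = 20. This agrees with the reported result R21 = 19 << 20 = 19922944.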
II. INSTRUCTION FORMAT
We use the standard format of each instruction, as given below [2].

a) AND
Performs a bitwise AND of registers rs1 and rs2 and puts the result in rd.
31-27  26-25  24-20  19-15  14-12  11-7   6-2    1-0
00000  00     rs2    rs1    111    rd     01100  11

b) OR
Performs a bitwise OR of registers rs1 and rs2 and puts the result in rd.
31-27  26-25  24-20  19-15  14-12  11-7   6-2    1-0
00000  00     rs2    rs1    110    rd     01100  11

c) ADD
Adds registers rs1 and rs2 and stores the result in rd.
31-27  26-25  24-20  19-15  14-12  11-7   6-2    1-0
00000  00     rs2    rs1    000    rd     01100  11

d) SUB
Subtracts register rs2 from register rs1 and puts the result in rd.
31-27  26-25  24-20  19-15  14-12  11-7   6-2    1-0
01000  00     rs2    rs1    000    rd     01100  11

e) ADDI
Adds the sign-extended 12-bit immediate to register rs1 and puts the result in rd.
31-20         19-15  14-12  11-7   6-2    1-0
imm[11:0]     rs1    000    rd     00100  11

f) BEQ
Takes the branch if rs1 == rs2.
31-25            24-20  19-15  14-12  11-7            6-2    1-0
offset[12|10:5]  rs2    rs1    000    offset[4:1|11]  11000  11

g) LW
Loads a 32-bit value from memory and stores it in register rd.
31-20         19-15  14-12  11-7   6-2    1-0
offset[11:0]  rs1    010    rd     00000  11

h) SW
Stores the 32-bit value in the low bits of register rs2 to memory.
31-25         24-20  19-15  14-12  11-7          6-2    1-0
offset[11:5]  rs2    rs1    010    offset[4:0]   01000  11

i) SLL
Performs a logical left shift on the value of register rs1 by the shift amount held in the lower 5 bits of register rs2.
31-27  26-25  24-20  19-15  14-12  11-7   6-2    1-0
00000  00     rs2    rs1    001    rd     01100  11

j) SRA
Performs an arithmetic right shift on the value of register rs1 by the shift amount held in the lower 5 bits of register rs2.
31-27  26-25  24-20  19-15  14-12  11-7   6-2    1-0
01000  00     rs2    rs1    101    rd     01100  11

k) MAC
rd <- rd + (rs1 * rs2)
31-27  26-25  24-20  19-15  14-12  11-7   6-2    1-0
00000  00     rs2    rs1    000    rd     11111  11

III. SIMULATION & RESULTS
The input assembly code and the corresponding instruction words are:

Assembly           Binary (hexadecimal)
LW R1,0(R0)        00002083
ADD R2,R1,R3       00308133
OR R4,R1,R0        0000e233
ADDI R2,R4,5       00520113
OR R6,R4,R5        00526333
ADD R7,R8,R9       009403b3
SUB R12,R10,R11    40b50633
SW R20,R20,0       014a2023
LW R16,R15,5       0057d403
SLL R19,R20,R21    01499ab3
SRA R22,R23,R25    417b5cb3

We initialized every register with its own number (R0=0, R1=1, R2=2, …) and every memory location with its own address. The register values before and after simulating the above code are:

Register  Initial  After simulation
R0        0        0
R1        1        0
R2        2        5
R4        4        0
R6        6        5
R7        7        17
R12       12       -1
R16       16       20
R21       21       19922944
R25       25       0

Conclusion
We have designed and implemented a 5-stage pipelined RISC-V processor in Verilog HDL using the Vivado tool. We used ISIM to perform the simulations of our project.

REFERENCES
[1] J. Jeemon, "Pipelined 8-bit RISC processor design using Verilog HDL on FPGA," in 2016 IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT), 2016, p. 1. [Online]. Available: http://ieeexplore.ieee.org/document/7808194/ [Accessed: 12-Dec-2021].
[2] "RV32I, RV64I Instructions — riscv-isa-pages documentation," msyksphinz-self.github.io, 2021. [Online]. Available: https://msyksphinz-self.github.io/riscv-isadoc/html/rvi.html [Accessed: 12-Dec-2021].
[3] D. Patterson and J. Hennessy, Computer Architecture, 6th ed. San Francisco: Morgan Kaufmann, 1996.
