
Implementation of RISC Processor on FPGA

Pravin S. Mane*, Indra Gupta**, M. K. Vasantha**


*Computer Science & Engineering Department, Mody Institute Of Technology & Science, Lakshmangarh-332311
**Electrical Engineering Department, Indian Institute of Technology Roorkee, Roorkee-247667
Email: *pravinm586, ,yahoo.com, indrafee(itrernetin, krisnfeeg iitrernetVin
Abstract - A true 16-bit RISC processor has been designed using VHDL. A hierarchical approach has been used so that basic units can be modeled using behavioral programming. These basic units are combined using structural programming. Four-stage pipelining (viz. instruction fetch stage, instruction decode stage, execution stage and memory/IO-writeback stage) is used to improve the overall CPI (Clock Cycles per Instruction). A hardwired control approach is used to design the control unit, as against the microprogrammed control approach of conventional CISC processors. The processor has one input port, one output port and six hardware vectored interrupts, along with a 16-bit address bus and a 16-bit data bus. Structural hazards are dealt with by the implementation of a prefetch unit, data hazards are dealt with by forwarding, and control hazards are dealt with by flushing and stalling. The design has been implemented on FPGA for verification purposes.

Index Terms - Architecture, FPGA, VHDL

I. INTRODUCTION

There is a trend towards designing RISC processors that are efficient for a specific application and meet the minimum requirements of the application in hand [1]. Performance is the main criterion for the design of such processors [2]. The RISC processor proposed here is an effort towards an efficient processor suitable for small applications.

CISC processors have gained the common marketplace over the years. They support various addressing modes and various data types. The instruction length varies from instruction to instruction. They frequently access data in external memory. They are generally implemented using microprogrammed control. There is little semantic gap between the instructions of a CISC processor and statements in higher-level languages. Although they may save memory space, the design is complicated and, as instructions are variable in length, special hardware for marking instruction boundaries is required. Studies over the years have shown that simple instructions are used 80% of the time and that many complex instructions can be replaced by a group of simple instructions [3].

A RISC processor operates on very few data types and performs simple operations. It supports very few addressing modes and is mostly register based. Most of the instructions operate on data present in internal registers. Only LOAD and STORE instructions access data in external memory. Also, the instruction length is fixed and hence decoding is easier.

Parallel execution of instructions through the pipelined stages of the processor improves the overall throughput of the processor but introduces some hazards in its working [4]. Data hazards are due to sharing of destination and source resources in succeeding instructions (the source for an instruction is the destination for the previous instruction) and they can be resolved by the method of forwarding. Structural hazards are due to common program and data memory. By implementing a prefetch queue in the processor, structural hazards can be handled. Control hazards are introduced by non-sequential execution of instructions, and the method of flushing is used to deal with them.

The proposed RISC processor has four stages in the pipeline and, in general, each stage takes one clock pulse to complete its operation. It has 16 address lines, 16 data lines, one 16-bit input port and one 16-bit output port. Its logical block is shown in fig. 1. All internal registers are 16 bits in length.

Fig. 1. RISC Processor (block diagram: 16 data lines, 16 address lines, 16-bit input port, 16-bit output port, Read, Write, RESET, clock, six interrupt lines and six interrupt acknowledge lines)

II. INSTRUCTION SET

As this is a true 16-bit RISC processor, all instructions are 16 bits in length and use the 3-operand notation (2 source operands and 1 destination operand). The design has 8 internal general purpose registers, all 16 bits in length. The general instruction format for the designed RISC processor is given in fig. 2. The proposed processor supports 37 instructions, as summarized in Table I. It supports the register, immediate and register indirect addressing modes.

The actual instruction format may vary from instruction to instruction. As the instructions have different formats, they follow different datapaths, which are combined together to form the final datapath.

Fig. 2. General Instruction Format (five fields occupying bits 15-12, 11-9, 8-6, 5-3 and 2-0)
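The field boundaries of the general instruction format (bits 15-12, 11-9, 8-6, 5-3 and 2-0) can be illustrated with a small behavioral model. The following Python sketch is not taken from the design itself: the field names `opcode`, `rd`, `rs1`, `rs2` and `funct`, and the order in which destination and source registers are assigned to the three 3-bit fields, are assumptions made for illustration only.

```python
from typing import NamedTuple


class Fields(NamedTuple):
    """Fields of a 16-bit instruction word (names and order are assumed)."""
    opcode: int  # bits 15-12
    rd: int      # bits 11-9, assumed destination register
    rs1: int     # bits 8-6, assumed first source register
    rs2: int     # bits 5-3, assumed second source register
    funct: int   # bits 2-0, assumed function code


def decode(word: int) -> Fields:
    """Split a 16-bit instruction word into its five fields."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("instruction word must fit in 16 bits")
    return Fields(
        opcode=(word >> 12) & 0xF,
        rd=(word >> 9) & 0x7,
        rs1=(word >> 6) & 0x7,
        rs2=(word >> 3) & 0x7,
        funct=word & 0x7,
    )


def encode(fields: Fields) -> int:
    """Pack the five fields back into a 16-bit instruction word."""
    return (
        (fields.opcode << 12)
        | (fields.rd << 9)
        | (fields.rs1 << 6)
        | (fields.rs2 << 3)
        | fields.funct
    )
```

Three-bit register fields are consistent with a register file of 8 general purpose registers; instructions with immediate data would reuse some of these bit positions for the immediate value.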

1-4244-0726-5/06/$20.00 ©2006 IEEE

Authorized licensed use limited to: GHULAM ISHAQ KHAN INST OF ENG SCI AND TECH. Downloaded on April 30,2022 at 17:17:51 UTC from IEEE Xplore. Restrictions apply.
TABLE I
INSTRUCTION SET SUMMARY

Instruction          Description
ADD RD, RS1, RS2     Signed addition, RD = RS1 + RS2
ADDu RD, RS1, RS2    Unsigned addition, RD = RS1 + RS2
SUB RD, RS1, RS2     Signed subtraction, RD = RS1 - RS2
SUBu RD, RS1, RS2    Unsigned subtraction, RD = RS1 - RS2
ADDI RD, RS1, D6     Signed addition, RD = RS1 + D6
SUBI RD, RS1, D6     Signed subtraction, RD = RS1 - D6
SLT RD, RS1, RS2     Signed set less than, RD = 0 if RS1 < RS2
SLTu RD, RS1, RS2    Unsigned set less than, RD = [illegible] if RS1 < RS2
NOT RD, RS           Logical NOT, RD = NOT (RS)
AND RD, RS1, RS2     Logical AND, RD = RS1 AND RS2
OR RD, RS1, RS2      Logical OR, RD = RS1 OR RS2
XOR RD, RS1, RS2     Logical XOR, RD = RS1 XOR RS2
NOR RD, RS1, RS2     Logical NOR, RD = RS1 NOR RS2
SLL RD, RS1, RS2     Logic shift left, RD = RS1 shifted by RS2
SRL RD, RS1, RS2     Logic shift right, RD = RS1 shifted by RS2
SRA RD, RS1, RS2     Arithmetic shift right, RD = RS1 shifted by RS2
ROR RD, RS1, RS2     Rotate right, RD = RS1 rotated by RS2
IN RD                RD = data on input port
BZ RD, RS            Branch on zero. If RS = 0 jump to loc RD
BNZ RD, RS           Branch not zero. If RS /= 0 jump to loc RD
EI D6                Enable interrupt whose value is 1 in D6
JAL RD, RS           Jump to loc RS and link back to loc RD
RJAL RD              Link back to loc RD
RETI                 Return from interrupt to loc in TRAP register
MVIL RD, D8          Move D8 in lower byte of RD
MVIH RD, D8          Move D8 in upper byte of RD
BZI RD, D8           If RD = 0, jump to loc (PC) + D8
BNZI RD, D8          If RD /= 0, jump to loc (PC) + D8
SLLI RD, RS, D5      RD = RS shifted left logically by D5
SRLI RD, RS, D5      RD = RS shifted right logically by D5
SRAI RD, RS, D5      RD = RS shifted right arithmetically by D5
RORI RD, RS, D5      RD = RS rotated right by D5
LW RD, RS, D6        RD = data from memory loc (RS + D6)
SW RD, RS, D6        Loc (RS + D6) in memory = RD
HLT                  Halt the program execution

III. PROCESSOR ARCHITECTURE

The proposed architecture consists of a pipelined unit, control unit, hazard detection unit, branch forwarding unit, execution forwarding unit, interrupt and exception unit and prefetch unit. Its block diagram is shown in fig. 3. The pipelined unit consists of four stages, viz. instruction fetch unit, instruction decode unit, instruction execution unit and memory/IO-write back unit.

1) Pipelined Unit. The pipelined unit designed for this processor consists of four stages, viz. instruction fetch stage, instruction decode stage, instruction execution stage and memory/IO-write back stage. Each stage is connected to the next stage through its output register to implement the just-in-time principle. The instruction fetch stage fetches instructions from memory through the prefetch unit and consists of a program counter and a selector that chooses the correct PC value based on the current instruction in the execution stage. For sequential execution of the program, it selects the incremented program counter value. For non-sequential execution, it selects either the branch target address or the vector address for interrupts and exceptions. It also has the program counter incrementer. The instruction decode stage separates the opcode, function and the source and destination codes for the control unit. The register file, which consists of 8 general purpose registers, is a part of this unit. If the instruction has immediate data, it is sign-extended to 16 bits by this stage. This stage is also responsible for differentiating between absolute branching and relative branching. The instruction execution stage performs all the arithmetic and logical operations, including shift and rotate. The arithmetic (addition, subtraction etc.) and logical (logical AND, OR, XOR etc.) operations are carried out by the basic ALU, shift and rotate operations are carried out by the shift unit, and the move immediate unit loads immediate data into the specified register. The arithmetic logic unit also sets the overflow flag in case of arithmetic instructions. The memory/IO-write back stage is responsible for accessing external memory for data and for writing the results back into the destination register. In case of an IN instruction, the data on the input port is sent directly to the destination register by this unit. It also places data on the output port for an OUT instruction.

2) Control Unit. The hardwired control approach is best suited for the control unit of a RISC processor because very few control signals need to be generated for the smaller set of instructions. Sequencing of the control signals for the various stages of the processor is an important issue and is achieved by passing them through clock sensitive units in succession. The block diagram of the hardwired control unit designed is shown in fig. 4.

3) Hazard Detection Unit. This unit primarily detects resource conflicts and generates the necessary signals for the execution forwarding unit and the branch forwarding unit. The source of the current instruction may be the destination of the previous instruction, and in such a case the data supplied to the execution stage is not correct until the result of the previous instruction is written back to its destination.

4) Branch Forwarding Unit. The branch target address for any branch instruction, or the conditional data for a conditional branch instruction, may be the destination of the previous instruction. When an interrupt occurs and the current instruction in execution is a branch instruction, the return address is not the next sequential address but the branch target address. This unit takes care of all such situations. It also flushes the instructions following the branch instruction in the pipeline if branching actually takes place. If a data hazard occurs for a branch instruction, the result from the execution unit is selected as a component for branch target address calculation for any branch instruction, or as the conditional data for a conditional branch instruction.

5) Execution Forwarding Unit. When a data hazard occurs, this unit directly forwards the result of the execution unit, or the result from the write-back stage, as a source to the instruction following the current instruction in the execution stage.

6) Interrupt and Exception Unit. RESET has the highest priority, followed by the overflow exception; the hardware interrupt lines 5 to 0 have priority in decreasing order. All of them are vectored. Although the zero flag, carry flag etc. are not designed, they can be simulated through the exception routine for the overflow exception and through branch instructions. The return address for exceptions and hardware interrupts is stored in the TRAP register, which is part of this unit.

7) Prefetch Unit. This unit makes use of idle bus time to prefetch instructions from memory and stores them in the prefetch queue.
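The operand selection performed by the execution forwarding unit can be sketched behaviorally. The Python model below is only an illustration under stated assumptions, not the VHDL of the design: the names `InFlight` and `select_operand` are hypothetical, and the rule that the execution-stage result takes precedence over the write-back result (because it belongs to the more recent instruction) is an assumption.

```python
from typing import List, NamedTuple, Optional


class InFlight(NamedTuple):
    """Result held in a pipeline stage but not yet written to the register file."""
    dest: Optional[int]  # destination register number, None if nothing is written
    value: int           # 16-bit result carried by that stage


def select_operand(src: int, regfile: List[int],
                   ex: InFlight, wb: InFlight) -> int:
    """Return the most recent value of source register `src`.

    Assumed priority: execution-stage result first (newest instruction),
    then write-back-stage result, then the register file itself.
    """
    if ex.dest == src:
        return ex.value
    if wb.dest == src:
        return wb.value
    return regfile[src]
```

For example, with `regfile[1] = 5`, a pending execution result `InFlight(1, 0x40)` is chosen over both the write-back value and the stale register file content, which is the case that would otherwise require a stall.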

Fig. 3. RISC Processor Architecture Block Diagram
Fig. 4. Block Diagram of Hardwired Control Unit
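The prefetch unit holds up to four instructions, each with a tag giving the memory location it was fetched from, and it is looked up by tag rather than drained in FIFO order. A minimal Python sketch of such a tagged buffer follows. The class name `PrefetchQueue`, the `read_memory` callback, the hit and miss counters, and the policy of freeing an entry once it has been supplied to the fetch stage are all assumptions made for illustration; the paper does not describe the replacement policy.

```python
from typing import Callable, Dict


class PrefetchQueue:
    """Tagged prefetch buffer: entries are found by address tag, not arrival order."""

    def __init__(self, read_memory: Callable[[int], int], capacity: int = 4) -> None:
        self.read_memory = read_memory
        self.capacity = capacity
        self.entries: Dict[int, int] = {}  # tag (memory address) -> instruction word
        self.hits = 0
        self.misses = 0

    def prefetch(self, address: int) -> bool:
        """Use an idle bus cycle to load one instruction; False if nothing was loaded."""
        if address in self.entries or len(self.entries) >= self.capacity:
            return False
        self.entries[address] = self.read_memory(address)
        return True

    def fetch(self, pc: int) -> int:
        """Supply the instruction at `pc` to the fetch stage."""
        if pc in self.entries:
            self.hits += 1
            # Assumed policy: an entry is released once it has been consumed.
            return self.entries.pop(pc)
        self.misses += 1
        return self.read_memory(pc)  # tag not present: go to memory
```

Because the lookup is by tag, a fetch from a branch target that happens to be in the buffer is a hit even if older entries are still present, which is the sense in which the buffer behaves like a small look-ahead cache rather than a FIFO.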

The prefetch queue has the capacity of storing four instructions at a time, along with a tag for each of them. The tag specifies from which memory location the corresponding instruction has been fetched. The prefetch unit works like a small look-ahead cache. It does not work on the FIFO principle. This unit supplies instructions to the instruction fetch stage.

IV. VHDL

VHDL is an acronym for VHSIC (Very High Speed Integrated Circuit) Hardware Description Language. It is designed to fill a number of needs in the design process. First, it allows description of the structure of a system, that is, how it is decomposed into subsystems and how they are interconnected. Second, it allows the specification of the function of a system using familiar programming language forms. Third, as a result, it allows the design of a system to be simulated before being manufactured, so that designers can quickly compare alternatives and test for correctness without the delay and expense of hardware prototyping. Fourth, it allows the detailed structure of a design to be synthesized from a more abstract specification, allowing designers to concentrate on more strategic design decisions and reducing time to market [6].

VHDL was established as the IEEE 1076 standard in 1987. In 1993, the IEEE 1076 standard was updated and an additional standard, IEEE 1164, was adopted. In 1996, IEEE 1076.3 became the VHDL synthesis standard.

V. IMPLEMENTATION ON FPGA

An FPGA (Field Programmable Gate Array) is a device used for the verification of a design. It works as a raw IC on which the user can implement a design and verify its correctness. This allows for cheaper prototyping and shorter time-to-market of hardware designs. It consists of LUTs (Look Up Tables), CLBs (Configurable Logic Blocks), LCs (Logic Cells) and IOBs (Input Output Blocks) [7]. The designed processor has been implemented on Xilinx's Spartan-II FPGA for verification purposes. The device utilization summary is given in Table II.

TABLE II
DEVICE UTILIZATION SUMMARY

Item                                              Used          Available     Utilization
Logic Utilization
  Total Number Slice Registers                    [illegible]   [illegible]   [illegible]
    Number used as Flip Flops                     [illegible]
    Number used as Latches                        [illegible]
  Number of 4 input LUTs                          [illegible]   [illegible]   [illegible]
Logic Distribution
  Number of occupied Slices                       [illegible]   [illegible]   [illegible]
  Number of Slices containing only related logic  [illegible]   [illegible]   [illegible]
  Number of Slices containing unrelated logic     0             1,137         0%
  Total Number of 4 input LUTs                    2,123         4,704         45%
    Number used as logic                          [illegible]
    Number used as route-thru                     [illegible]
  Number of GCLKs                                 [illegible]   [illegible]   [illegible]
  Number of GCLKIOBs                              [illegible]   [illegible]   [illegible]

VI. SIMULATION RESULTS

After the design is modeled in VHDL, it is simulated before being implemented on FPGA. Xilinx's ISE simulator is used for this purpose. The input test bench waveform is used as the source for applying the input values, and the output waveform from the simulator is used as a means to verify the correctness of the design [5]. Simulation is used as a debugging tool for correcting errors in the design. A sample waveform for the execution of a small program is given in fig. 5. This program adds the hexadecimal data 10H and 30H, which are loaded in registers R1 and R2, and places the sum in a register. A resource conflict occurs in this program because the destination of a move instruction is also a source for the succeeding ADD instruction. This resource conflict is handled successfully, as shown in the simulation result.
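The resource conflict in the sample program (10H and 30H loaded into two registers and then added) can be reproduced with a deliberately simplified behavioral model in Python. This is a sketch, not the simulated VHDL design. It assumes that a result reaches the register file only after the two following instructions have read their operands, that the loads are done with MVIL, and that the sum goes to R3; the destination register and the function name `run` are hypothetical.

```python
from typing import List, Optional, Tuple

Instruction = Tuple[str, int, int, Optional[int]]  # (mnemonic, rd, a, b)


def run(program: List[Instruction], forwarding: bool = True) -> List[int]:
    """Execute MVIL/ADD instructions with delayed register write-back."""
    regs = [0] * 8
    # Results of the two most recent instructions, newest first,
    # that have not yet been written to the register file.
    in_flight: List[Tuple[int, int]] = []

    def read(reg: int) -> int:
        if forwarding:
            for dest, value in in_flight:  # newest result wins
                if dest == reg:
                    return value
        return regs[reg]

    for op, rd, a, b in program:
        if op == "MVIL":    # move 8-bit immediate `a` into the lower byte of rd
            result = (read(rd) & 0xFF00) | (a & 0xFF)
        elif op == "ADD":   # rd = register a + register b, truncated to 16 bits
            result = (read(a) + read(b)) & 0xFFFF
        else:
            raise ValueError(f"unsupported instruction: {op}")
        in_flight.insert(0, (rd, result))
        if len(in_flight) > 2:              # oldest result is written back
            dest, value = in_flight.pop()
            regs[dest] = value

    while in_flight:                        # drain the pipeline, oldest first
        dest, value = in_flight.pop()
        regs[dest] = value
    return regs


PROGRAM: List[Instruction] = [
    ("MVIL", 1, 0x10, None),
    ("MVIL", 2, 0x30, None),
    ("ADD", 3, 1, 2),  # destination R3 is an assumption
]
```

With forwarding enabled, `run(PROGRAM)[3]` is 0x40; with `forwarding=False` the ADD reads the stale register file contents and the result is 0, which is the error that forwarding (or a stall) has to prevent.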
Fig. 5. Simulation result of the designed RISC processor connected with the memory

VII. CONCLUSION

Through pipelining, a maximum execution throughput of one instruction per clock pulse is achieved, provided that there are no stalls. This is possible due to the hardwired approach to the design of the control unit and the fixed-length instruction format. To resolve data hazards, result forwarding is more efficient than stalling, as it removes the time penalty in handling such conflicts. The prefetch unit is used for handling structural hazards, while flushing is used to handle control hazards. The prefetch buffer is like a small cache that stores a tag along with each instruction fetched. It does not work on the FIFO principle. The design is modeled and simulated using VHDL and then implemented on FPGA successfully. The maximum frequency of operation on Xilinx's Spartan-II FPGA is 26 MHz.

REFERENCES

[1] M. A. S. D. Poz, J. E. A. Cobo, W. A. M. V. Noije and M. K. Zuffo, "A Simple RISC Microprocessor Core Designed for Digital Set-Top-Box Applications," IEEE International Conference on Application-Specific Systems, Architectures, and Processors, 2000. Proceedings, pp. 35-44.
[2] Xiao Li, Longwei Ji, Bo Shen, Wenhong Li and Qianling Zhang, "VLSI Implementation of a High-performance 32-bit RISC Microprocessor," International Conference on Communications, Circuits and Systems and West Sino Expositions, IEEE 2002, Vol. 2, pp. 1458-1461.
[3] Sivarama P. Dandamudi, "Fundamentals of Computer Organization and Design," Springer-Verlag New York, Inc., 2003.
[4] John L. Hennessy and David A. Patterson, "Computer Architecture: A Quantitative Approach," 3rd Edition, Morgan Kaufmann Publishers, 2003.
[5] Zhenyu Gu, Zhiyi Yu, Bo Shen and Qianling Zhang, "Functional Verification Methodology of a 32-bit RISC Processor," International Conference on Communications, Circuits and Systems and West Sino Expositions, IEEE 2002, Vol. 2, pp. 1454-1457.
[6] Peter J. Ashenden, "The Designer's Guide to VHDL," 2nd Edition, Morgan Kaufmann Publishers, 2002.
[7] Spartan-II Platform FPGA Handbook, October 24, 2002.
