16-Bit RISC PROCESSOR
by
Michael Telcide
Nemanja Stojanovic
Cleiton Juffo
Project directed by
Dr. Onur Tigli
Asst. Professor of Electrical and Computer Engineering
TABLE OF CONTENTS
Abstract
Introduction
CPU Diagram
Simulations
Schematics
ABSTRACT
In this project, we design a 16-bit RISC processor implemented in Verilog. This report
presents the structural design and the functional characteristics of a general-purpose
RISC processor. The project followed a bottom-up design method: we started with the
basic sequential and combinational building blocks of a non-pipelined processor and
built up to more complex blocks. The 16-bit RISC processor architecture features 16-bit
instruction words, 16 internal general-purpose registers, each of which can hold a
16-bit data word, 6 external address lines to ROM, and 6 external address lines to an
external memory (RAM). Each module was designed, synthesized, and tested at each level
of implementation. Afterwards, the modules were interconnected and integrated in a
top-level simulation by appropriate port mapping.
INTRODUCTION
Among all kinds of CPUs in use today, the Reduced Instruction Set Computer (RISC) CPU
has the majority market share. It is most commonly used in embedded systems, which are
found in almost every consumer product on the market. RISC CPUs are simple in nature and
offer low power consumption and small size. They are sometimes referred to as load-store
processors because memory is accessed only through dedicated load and store instructions.
The idea of the RISC CPU is to reduce the complexity of the system and increase its speed.
Any complex operation can be split into smaller chunks that, in most cases, can be
calculated simultaneously. Other important features of the RISC CPU include uniform
instruction coding, which allows faster decoding. For example, the op-code is always in
the same bit position in each instruction, and each instruction is always one word long.
Another advantage is a homogeneous register set, which allows any register to be used in
any context and simplifies compiler design. Lastly, complex addressing modes are replaced
by sequences of simple arithmetic instructions. This convenience is a direct explanation
of why the RISC processor dominates the CPU market.
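To illustrate uniform instruction coding, the sketch below splits a 16-bit instruction word into fixed fields. The field layout (a 4-bit op-code in bits 15:12 followed by three 4-bit register fields) and the module name instr_fields are assumptions made for illustration; this report does not state the actual encoding. Four bits per register field follows from having 16 general-purpose registers.

```verilog
// Hypothetical fixed-field layout for a 16-bit instruction word.
// Assumed layout: [15:12] op-code, [11:8] destination register,
//                 [7:4] source register A, [3:0] source register B.
module instr_fields (
    input  wire [15:0] instr,
    output wire [3:0]  opcode,
    output wire [3:0]  rd,
    output wire [3:0]  ra,
    output wire [3:0]  rb
);
    // Every field sits at a fixed bit position, so decoding is pure wiring.
    assign opcode = instr[15:12];
    assign rd     = instr[11:8];
    assign ra     = instr[7:4];
    assign rb     = instr[3:0];
endmodule
```

Because no field moves between instruction types, the decoder needs no logic beyond bit selection, which is the speed advantage the paragraph above refers to.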
DESIGN and RESULTS
Below is the state machine that represents our design. It includes five states. Once the
machine has started or has completed all of its tasks, it enters the Idle state. At the
beginning of the program, the first state is Fetch; this state gets the instruction from
ROM and sends it to the instruction register (IR). The program counter (PC) is also
incremented at this stage. In the second state, Load, the instruction is decoded by the IR
and the addresses of the operands are sent to the register file (RF). In the next state,
Execute, the operands are loaded into the ALU from the RF; a signal is sent to enable the
ALU for a specific operation, and another signal may be sent to enable RAM if we have to
write to or read from it. In the Store state, the results of the operation are stored in
either the RF or RAM. If there was a jump instruction, the PC is incremented by the offset
value.
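The program counter behavior described above (increment during Fetch, add the offset on a jump) might be sketched as follows. The module name pc_sketch and the strobes in_fetch and take_jump are hypothetical names, and the active-high synchronous reset is an assumption; only the 6-bit width is taken from the 6 address lines to ROM.

```verilog
// Minimal sketch of a 6-bit program counter (assumed interface).
module pc_sketch (
    input  wire       clk,
    input  wire       reset,     // assumed: synchronous, active high
    input  wire       in_fetch,  // hypothetical strobe: high during Fetch
    input  wire       take_jump, // hypothetical strobe: high in Store for a jump
    input  wire [5:0] offset,    // jump offset from the instruction
    output reg  [5:0] pc
);
    always @(posedge clk) begin
        if (reset)
            pc <= 6'd0;
        else if (in_fetch)
            pc <= pc + 6'd1;      // point at the next instruction
        else if (take_jump)
            pc <= pc + offset;    // wraps modulo 64
    end
endmodule
```

Since the addition wraps modulo 64, a two's complement offset such as 6'b111110 moves the counter back by two, so the same adder covers forward and backward jumps.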
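State machine
[Figure: state machine diagram with a RESET input and the states Idle, Fetch, Load,
Execute, and Store. Annotations recovered from the diagram:
Fetch (5'b00010): retrieve instruction from ROM and load it into IR; increment PC
Store (5'b10000): send/receive data to/from RAM; store results into RF; add offset to PC
(state label not recoverable): send next instruction address to ROM]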
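The five-state sequence can be sketched as a one-hot state register. Only the Fetch (5'b00010) and Store (5'b10000) encodings appear in this report; the other three encodings, the start and done inputs, the reset polarity, and the module name cpu_fsm are assumptions for illustration, not the actual control unit.

```verilog
// Minimal sketch of the five-state controller (assumed interface).
module cpu_fsm (
    input  wire       clk,
    input  wire       reset,  // assumed: synchronous, active high
    input  wire       start,  // hypothetical: leave Idle and begin fetching
    input  wire       done,   // hypothetical: program finished, return to Idle
    output reg  [4:0] state
);
    localparam IDLE    = 5'b00001; // assumed encoding
    localparam FETCH   = 5'b00010; // encoding given in the report
    localparam LOAD    = 5'b00100; // assumed encoding
    localparam EXECUTE = 5'b01000; // assumed encoding
    localparam STORE   = 5'b10000; // encoding given in the report

    always @(posedge clk) begin
        if (reset)
            state <= IDLE;
        else begin
            case (state)
                IDLE:    state <= start ? FETCH : IDLE;
                FETCH:   state <= LOAD;
                LOAD:    state <= EXECUTE;
                EXECUTE: state <= STORE;
                STORE:   state <= done ? IDLE : FETCH;
                default: state <= IDLE; // recover from an invalid code
            endcase
        end
    end
endmodule
```

With one-hot codes, each state bit can directly enable the block that is active in that state, at the cost of five flip-flops instead of three.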
The 16-bit computer is able to perform the following operations. Based on this table, we
designed our computer to operate under three categories: arithmetic, logic, and branch
operations.
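A combinational ALU covering the arithmetic and logic categories might look like the sketch below. The op-code values, the choice of operations, and the module name alu_sketch are hypothetical and do not reproduce the op-code table of this report; the zero flag is included as one common way to support branch decisions.

```verilog
// Minimal ALU sketch with hypothetical op-codes (not the report's table).
module alu_sketch (
    input  wire [3:0]  op,
    input  wire [15:0] a,
    input  wire [15:0] b,
    output reg  [15:0] y,
    output wire        zero   // high when the result is all zeros
);
    localparam OP_ADD = 4'h0;
    localparam OP_SUB = 4'h1;
    localparam OP_AND = 4'h2;
    localparam OP_OR  = 4'h3;
    localparam OP_XOR = 4'h4;
    localparam OP_NOT = 4'h5;

    always @(*) begin
        case (op)
            OP_ADD:  y = a + b;
            OP_SUB:  y = a - b;
            OP_AND:  y = a & b;
            OP_OR:   y = a | b;
            OP_XOR:  y = a ^ b;
            OP_NOT:  y = ~a;
            default: y = 16'h0000; // undefined op-codes yield zero
        endcase
    end

    assign zero = (y == 16'h0000);
endmodule
```

A branch-if-equal instruction could, for example, subtract its two operands and test the zero flag.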
The most important part of our design was to figure out all the necessary components
needed to implement the 16-bit RISC computer and how they interact with one another. The
drawing below illustrates our logic, the components, and the datapath we thought would be
needed in this project.
After conceptually verifying that our logic works, our next big task was to figure out how
to implement it in Verilog. After a lot of thinking and a lot of debugging, we arrived at
the final code, which is given in the next section. The top module is the CPU, and all the
other modules inside it are what is needed to make it functional.
Simulation Results
Upon completion of our design, we implemented our CPU as specified in the guidelines. This
consisted of using several modules and files given to us to simulate the ROM, RAM, and UART
components. Synthesizing as specified allowed us to simulate a working CPU using Xilinx,
and we then generated a bit file. Upon loading the bit file onto the FPGA board, we found
that our program was working, but not optimally. It executed several instructions, ranging
from load immediate to ALU operations, and was also able to write a register value into
memory at a specified location. Even so, using the FPGA board, PuTTY, and the UART module,
we could only see the first memory value and not the next 64. We therefore decided to
remove the modules and files given to us and re-run the simulations in Xilinx with just
our CPU and the relevant components. That simulation follows:
Sample Program
We wrote the sample program above to test our CPU components using the OP-Codes given
to us in the table on page 5 of this report. Using the testbench code (which is provided in the
appendix) we were able to simulate our CPU.
Simulation
In this simulation, we see that btn_press has to be held high for a period of time and
then set low to mimic a de-bounce. After this stage, we begin to receive data from the ROM
unit. All the instructions listed in the sample program table are completed, and we send
data to be written to the RAM (highlighted by the yellow cursor line). Since we do not have
a RAM module attached, we cannot simulate reading from memory.
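One way to make memory reads observable in simulation would be a small behavioral RAM attached to the bidirectional data bus. The sketch below is an assumption-laden illustration, not the module supplied with the project: the module name ram_model, the active-high enables, the synchronous write, and the asynchronous read are all assumed. The 64 x 16-bit size follows from the 6 address lines and 16-bit data word.

```verilog
// Behavioral 64 x 16 RAM sketch for simulation only (assumed interface).
module ram_model (
    input  wire        clk,
    input  wire        write_enable, // assumed active high
    input  wire        read_enable,  // assumed active high
    input  wire [5:0]  address,
    inout  wire [15:0] data          // shared bidirectional bus
);
    reg [15:0] mem [0:63];

    // Drive the bus only while reading; otherwise release it (high
    // impedance) so that the CPU can drive write data onto it.
    assign data = (read_enable && !write_enable) ? mem[address] : 16'bz;

    // Capture whatever is on the bus on the clock edge during a write.
    always @(posedge clk) begin
        if (write_enable)
            mem[address] <= data;
    end
endmodule
```

In the testbench, such a model could be connected to data_ram, address_to_ram, write_enable_to_ram, and read_enable_to_ram, provided the polarity and timing of those signals match the assumptions above.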
RTL and Tech Schematic
Synthesizing the CPU then allowed us to generate an RTL/technology schematic of our
components. Due to the size of these components, the image was very detailed and complex.
CONCLUSION
Designing this 16-bit processor gave us a lot of insight into computer architecture in
general. Not only did we learn about each component and how to implement it in Verilog, we
also learned how each component relates to the others and how the overall operation should
work. Due to the scale of this project, we spent many hours coding and debugging. We ran
into multiple problems while designing the CPU; for example, we realized that we needed
additional states in order to execute all instructions. As we went step by step through
our implementation of the design, we realized how massive and intense this project was. It
gave us great insight into how much hard work and sacrifice must have been required to
make such processors possible prior to tools such as FPGAs and Xilinx. Although our CPU
did not function properly, we feel confident in saying that this was a tremendous learning
experience.
REFERENCES
R. Parihar and S. Reddy, "Design of 16 Bit RISC Processor," Birla Institute of Technology
and Science, May 2006.
APPENDIX
Verilog HDL Code
Due to space constraints, all code will be sent electronically, in its original .v
(Verilog) file format.
Testbench Code
`timescale 1ns / 1ps
//TOP LEVEL TESTBENCH FOR SIMULATION
module tb_cpu;
// Inputs
reg [15:0] data_from_rom;
reg reset;
reg clk;
reg btn_press;
// Outputs
wire [5:0] address_to_rom;
wire enable_to_rom;
wire write_enable_to_ram;
wire [5:0] address_to_ram;
wire read_enable_to_ram;
wire enable_ram_read;
// Bidirs
wire [15:0] data_ram;
// Instantiate the Unit Under Test (UUT)
CPU uut (
.data_from_rom(data_from_rom),
.reset(reset),
.clk(clk),
.btn_press(btn_press),
.address_to_rom(address_to_rom),
.enable_to_rom(enable_to_rom),
.data_ram(data_ram),
.write_enable_to_ram(write_enable_to_ram),
.address_to_ram(address_to_ram),
.read_enable_to_ram(read_enable_to_ram),
.enable_ram_read(enable_ram_read)
);
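// Clock generator: toggle every 10 ns (20 ns period)
always
begin
#10 clk = ~clk;
end
initial begin
// Initialize Inputs
reset = 0;
clk = 0;
btn_press = 1;
// Hold btn_press high for 200 ns, then set it low
#100;#100;
btn_press = 0;
// Stimulus for data_from_rom is not shown in this listing
end
endmodule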
Xilinx ISE Design Summary
Project File:     CPU.xise
Parser Errors:    No Errors
Module Name:      CPU
Target Device:    xa7a100t-2Icsg324
Product Version:  ISE 14.2
Design Goal:      Balanced
Environment:      System Settings

Device Utilization Summary (rows whose labels are recoverable)
Resource                                   Available   Utilization
RAMB36E1/FIFO36E1s                         135         0%
RAMB18E1/FIFO18E1s                         270         0%
BUFG/BUFGCTRLs                             32          3%
IDELAYE2/IDELAYE2_FINEDELAYs               300         0%
ILOGICE2/ILOGICE3/ISERDESE2s               300         0%
OLOGICE2/OLOGICE3/OSERDESE2s               300         0%
PHASER_IN/PHASER_IN_PHYs                   24          0%
PHASER_OUT/PHASER_OUT_PHYs                 24          0%
BUFHCEs                                    96          0%
BUFRs                                      24          0%
DSP48E1s                                   240         0%
IN_FIFOs                                   24          0%
OUT_FIFOs                                  24          0%

Detailed Reports
Report Name          Status    Warnings              Infos
Synthesis Report     Current   46 Warnings (1 new)   25 Infos (6 new)
Translation Report   Current
Map Report           Current