Microprocessor design


Assignment 2 Report

CE869: High Level Logic Design

Submitted to: Dr Xiaojun Zhai

Submitted by: Junaid Ahmad Khan

Reg #: 2003230

Total Word Count = 3575

Dated: 02/04/21
16-bit microprocessor design in VHDL with ISA implementation.

1 Introduction

This report covers the design, testing and implementation of a 16-bit microprocessor coded in
VHDL in the Xilinx Vivado Design Suite. Besides other functional blocks, a 16-bit microprocessor
comprises an ALU that can perform arithmetic and logical operations on 16-bit data. The register
file likewise has 16-bit registers that supply the ALU with the data it needs for its computations.
Generally, any microprocessor has two main parts: the data path and the control unit. The data
path is mainly responsible for the functionality of the microprocessor, i.e. how data flows inside
the CPU, and it contains the essential elements that carry out this functionality. The control unit,
on the other hand, controls the operation of the data path and of the entire CPU.
There are different variants of the microprocessor, based on the data bus width and the size of
the ALU. They are classified as 4-bit, 8-bit, 16-bit or 32-bit. 64-bit microprocessors are also
common nowadays, but a wider data path increases the complexity and design cost. Power
consumption is another major issue that needs to be taken into account. More complex designs
also include on-chip memory, as seen in ARM-based RISC microprocessors. The benefit of on-chip
RAM is that the microprocessor does not have to accommodate the wait cycles of external memory
read and write operations, which can be costly in time.
Microprocessors generally deal with two types of memory: flash and SRAM. Flash is typically
used to store program code and data that remain unchanged throughout the life cycle of a program,
whereas SRAM stores local variables and data that the microprocessor changes constantly. The
CPU needs to keep track of which instruction is currently being executed and, whenever the
program requires it, branch to another address in memory to continue execution. This is the case
for branch instructions, such as conditional or unconditional jumps.
Each microprocessor architecture differs slightly from the others; however, the basic
functionality remains the same. The CPU is given a set of instructions and processes them to
generate the required output, but the Instruction Set Architecture (ISA) of each microprocessor
depends on the underlying architecture. The programmer must therefore be aware of the
microprocessor architecture in order to write a correct set of instructions to operate the CPU. The
instruction length also varies depending on whether the microprocessor is 16-bit, 32-bit, etc.
VHDL is a powerful hardware description language that can be used to design and implement an
entire microprocessor on an FPGA-based platform. Because an FPGA contains thousands of logic
cells, these can be configured to form the basic building blocks of the CPU. The design can be
simulated in a design suite such as Vivado or ISE, or run on real hardware. In this project, we first
built the basic building blocks of the microprocessor, tested them, and finally integrated them into
an FPGA-based 16-bit microprocessor. We tested the design using two reference programs
provided in the manual and observed the output in a simulator, which showed the programs
executing correctly.

2 High Level View

Figure 1 gives a top-level view of the entire system.

Fig 1: Microprocessor Block Diagram

The design comprises two main system components:

1. Data Path.
2. Control Unit.
Within the data path, there are four design components, built up iteratively, as shown in
figure 1.
Arithmetic / Logic Instruction Data Path:
The arithmetic / logic instruction data path consists of an ALU that performs all the
arithmetic (add, sub, inc, dec) and logical (and, or, not) operations. The ALU can also pass
its input ‘b’ straight to the output, which is needed to output data from the register file.
Program Sequencing Data Path:
This part is mainly responsible for sequencing the execution of instructions. It comprises
an instruction register (IR) that holds the current instruction being executed, a program
counter (PC) that points to the next instruction in memory, and an address incrementor that
supplies the PC with the address of the next instruction.
Data Transfer Data Path:
The main elements are the ROM, which permanently stores the program instructions, and a
register file that supports both read and write operations. The ROM provides the program
sequencing data path with the addressed instruction, and the register file stores data to be
processed by the ALU, input from the user, and immediate data from the instruction.
Control Flow Data Path:
When the program flow needs to change, the control flow data path provides the PC with
the address of a new instruction from which execution should continue.

3 Functional Blocks

Apart from the control unit, the entire data path consists of the following sub-components:
• 2x1 mux.
• 3x1 mux.
• ALU.
• PC incrementor.
• Tristate Buffer.
• Program Counter.
• Instruction Register.
• 16x10 ROM.
• 4x16 Register File.
3.1 2x1 MUX

The 2x1 mux receives two 16-bit input data lines (in0 and in1), as shown in figure 2, and,
based on the select line ‘sel’ (0/1), produces either in0 or in1 at the output. In the VHDL
code, the multiplexor is implemented using the concurrent ‘when’ statement.

Fig 2: 2x1 mux
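A minimal sketch of such a 2x1 mux follows, assuming a generic width ‘W’ (default 16) so that the same entity could also serve as the 4-bit PC input mux. The entity name ‘mux2’ and the output name ‘y’ are hypothetical; ‘in0’, ‘in1’ and ‘sel’ follow the text.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity name; in0/in1/sel follow the report, y is assumed.
entity mux2 is
  generic (W : positive := 16);  -- data width (assumption: made generic for reuse)
  port (
    in0, in1 : in  std_logic_vector(W-1 downto 0);
    sel      : in  std_logic;
    y        : out std_logic_vector(W-1 downto 0)
  );
end entity mux2;

architecture rtl of mux2 is
begin
  -- Concurrent conditional ("when ... else") assignment: sel = '1' picks in1.
  y <= in1 when sel = '1' else in0;
end architecture rtl;
```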

3.2 3x1 MUX

The 3x1 mux receives three 16-bit input data lines (in0, in1, in2), as shown in figure 3, and,
based on the two-bit select line ‘sel’, produces in0, in1 or in2 at the output. In the VHDL code,
the multiplexor is implemented using the concurrent ‘when’ statement.
Fig 3: 3x1 mux
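The 3x1 mux can be sketched the same way with a chained ‘when … else’. The entity name ‘mux3’ and output ‘y’ are hypothetical, and routing the unused select code "11" to in2 is an assumption, since the text does not say how that code is handled.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 3x1 mux; W is the data width (16 for the register-file
-- input mux, 2 for the write-address mux driven by W_MUX).
entity mux3 is
  generic (W : positive := 16);
  port (
    in0, in1, in2 : in  std_logic_vector(W-1 downto 0);
    sel           : in  std_logic_vector(1 downto 0);
    y             : out std_logic_vector(W-1 downto 0)
  );
end entity mux3;

architecture rtl of mux3 is
begin
  -- sel = "10" and the unused code "11" both fall through to in2 (assumption).
  y <= in0 when sel = "00" else
       in1 when sel = "01" else
       in2;
end architecture rtl;
```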

3.3 PC incrementor

The PC incrementor is an adder that increments the PC by one address every fetch cycle.
This ensures that, during the decode cycle, the IR holds the appropriate next instruction in the
sequential flow of the program. The incrementor takes the current address in the PC as input and
outputs (PC + 1) to one input of a 2x1 mux. This newly generated address is then loaded into the
PC during the next fetch cycle if no branch instruction is executed.

Fig 4: PC incrementor
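A minimal sketch of the incrementor, assuming a 4-bit PC and ‘numeric_std’ unsigned arithmetic; the entity and port names are hypothetical. Because the sum is truncated to 4 bits, address 1111 wraps around to 0000.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity/port names for the PC incrementor.
entity pc_inc is
  port (
    pc      : in  std_logic_vector(3 downto 0);  -- current PC
    pc_next : out std_logic_vector(3 downto 0)   -- PC + 1, to the PC input mux
  );
end entity pc_inc;

architecture rtl of pc_inc is
begin
  -- Unsigned add; the result keeps the operand width, so 1111 + 1 = 0000.
  pc_next <= std_logic_vector(unsigned(pc) + 1);
end architecture rtl;
```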
3.4 Tristate Buffer

The tristate buffer is connected at the output of the ALU and forms the main output of the entire
CPU. Its state is controlled by the control signal ‘EN_OUT’ from the control unit. When this
signal is high, the ALU output is driven onto the output; otherwise, the output is in a
high-impedance state.

Fig 5: Tristate Buffer
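The buffer reduces to a single conditional assignment that drives all 16 bits to ‘Z’ when ‘EN_OUT’ is low. This is a minimal sketch; the entity name and the ‘din’/‘dout’ port names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity; din is the ALU output, dout is the CPU output.
entity tristate16 is
  port (
    din    : in  std_logic_vector(15 downto 0);
    EN_OUT : in  std_logic;                      -- from the control unit
    dout   : out std_logic_vector(15 downto 0)
  );
end entity tristate16;

architecture rtl of tristate16 is
begin
  -- Drive the ALU result when enabled, otherwise high impedance.
  dout <= din when EN_OUT = '1' else (others => 'Z');
end architecture rtl;
```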

3.5 Program Counter

The program counter points to the next instruction address in ROM. Because the ROM has only
16 locations, this register is only 4 bits wide. A 2x1 multiplexor connected at its input decides
which address is loaded into it. There are two candidate addresses: one from the PC incrementor
and the other from the 4 LSBs of the IR. The address to be loaded into the PC is selected by the
control signal ‘JNZ_MUX’. The PC also has a load signal which, when asserted, causes the new
address to be loaded into the PC on the next clock edge. In addition, it has ‘clk’ and ‘rst’ signals.
Figure 6 shows how the PC is arranged in the data path.
Fig 6: PC Arrangement
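The PC arrangement can be sketched as a 4-bit register with a load enable, fed by the ‘JNZ_MUX’-controlled 2x1 mux, which is folded into the same entity here for brevity. This is a minimal sketch: port names other than ‘clk’, ‘rst’, ‘PC_LOAD’ and ‘JNZ_MUX’ are hypothetical, and the asynchronous, active-high reset to address 0000 is an assumption.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical PC entity with its input mux.
entity pc_reg is
  port (
    clk, rst : in  std_logic;                     -- async active-high reset (assumption)
    PC_LOAD  : in  std_logic;                     -- load enable from the control unit
    JNZ_MUX  : in  std_logic;                     -- 0: PC + 1, 1: jump address
    pc_plus1 : in  std_logic_vector(3 downto 0);  -- from the PC incrementor
    jmp_addr : in  std_logic_vector(3 downto 0);  -- IR(3 downto 0)
    pc       : out std_logic_vector(3 downto 0)
  );
end entity pc_reg;

architecture rtl of pc_reg is
  signal pc_d : std_logic_vector(3 downto 0);
begin
  -- 2x1 input mux selecting the next address.
  pc_d <= jmp_addr when JNZ_MUX = '1' else pc_plus1;

  -- Register with load enable: holds its value while PC_LOAD = '0'.
  process (clk, rst)
  begin
    if rst = '1' then
      pc <= (others => '0');
    elsif rising_edge(clk) then
      if PC_LOAD = '1' then
        pc <= pc_d;
      end if;
    end if;
  end process;
end architecture rtl;
```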

3.6 Instruction Register

The instruction register is 10 bits wide to accommodate the program instructions. It has an
‘IR_LOAD’ signal from the control unit which, when asserted, loads the 10-bit instruction into it.
The control unit uses the 4 MSBs of the IR to determine which operation is to be performed. The
remaining bits serve as either read/write addresses for the register file or immediate data.

Fig 7: IR Register
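A minimal sketch of the IR, assuming a synchronous load on the rising clock edge and an asynchronous reset to zero; the entity name and the ‘instr_in’, ‘opcode’ and ‘operands’ port names are hypothetical. The opcode field IR(9 downto 6) goes to the control unit and IR(5 downto 0) to the register-file address muxes and the immediate path.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical IR entity; field split follows the report (4-bit opcode + 6 bits).
entity ir_reg is
  port (
    clk, rst : in  std_logic;                     -- async reset (assumption)
    IR_LOAD  : in  std_logic;
    instr_in : in  std_logic_vector(9 downto 0);  -- ROM output
    opcode   : out std_logic_vector(3 downto 0);  -- IR(9 downto 6)
    operands : out std_logic_vector(5 downto 0)   -- IR(5 downto 0)
  );
end entity ir_reg;

architecture rtl of ir_reg is
  signal ir : std_logic_vector(9 downto 0);
begin
  process (clk, rst)
  begin
    if rst = '1' then
      ir <= (others => '0');
    elsif rising_edge(clk) then
      if IR_LOAD = '1' then   -- load only when the control unit asks
        ir <= instr_in;
      end if;
    end if;
  end process;

  opcode   <= ir(9 downto 6);
  operands <= ir(5 downto 0);
end architecture rtl;
```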
3.7 16x10 ROM

The ROM used in this project has 16 memory locations, each 10 bits wide, to support the ISA
instruction encoding. It receives a read address from the PC as input and drives its output with
the contents stored at that address. Before any program execution begins, the ROM must be
preloaded with the set of instructions, as the ROM cannot be written. In VHDL, this is
accomplished by creating an array type of 16 elements of ‘std_logic_vector’, each N bits wide
(N = 10, the instruction width). A constant of type ‘rom_array’ is then created to store the
instructions, as shown in the code below.
subtype rom_width is std_logic_vector (N-1 downto 0);  -- N = 10

-- Create a ROM type with 16 locations, each 10 bits wide.
type rom_array is array (0 to 15) of rom_width;

-- Store ROM instructions that remain unchanged throughout the program.
constant rom : rom_array := (
    0 => (others => '0'),
    1 => (others => '0'),
    2 => (others => '0'),
    …………
);

The ROM performs its function by first converting the read address to an integer and then
using that integer to index into the constant ‘rom’ (of type ‘rom_array’), which outputs the content
stored at that address.
addr <= TO_INTEGER(unsigned(read_add));  -- convert read address to an integer to index into the ROM array.
read_data <= rom(addr);                  -- output ROM data at the specified read address.

This design was motivated by the lab on encoders and memories.
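Putting the fragments above together, a complete ROM entity might look like the following minimal sketch. The entity name is hypothetical, and loading Program 1 from Section 7 is purely illustrative; the unused locations are filled with zeros through ‘others’.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 16x10 ROM entity assembled from the report's fragments.
entity rom16x10 is
  generic (N : positive := 10);  -- instruction width; the literals below assume N = 10
  port (
    read_add  : in  std_logic_vector(3 downto 0);   -- address from the PC
    read_data : out std_logic_vector(N-1 downto 0)  -- instruction to the IR
  );
end entity rom16x10;

architecture rtl of rom16x10 is
  subtype rom_width is std_logic_vector(N-1 downto 0);
  type rom_array is array (0 to 15) of rom_width;
  -- Illustrative contents: Program 1 (sum of natural numbers below N).
  constant rom : rom_array := (
    0 => "0010000001",  -- IN  R1
    1 => "1010010001",  -- DEC R1
    2 => "0111000110",  -- JN  0110
    3 => "1011100110",  -- ADD R2,R1,R2
    4 => "1010010001",  -- DEC R1
    5 => "0110000011",  -- JNZ 0011
    6 => "0011000010",  -- OUT R2
    others => (others => '0')
  );
  signal addr : integer range 0 to 15;
begin
  addr      <= to_integer(unsigned(read_add));  -- address -> array index
  read_data <= rom(addr);                       -- asynchronous read
end architecture rtl;
```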

3.8 4x16 Register File

The register file is the main storage component of the processor’s data path, where operands are
read and results are written. It has two read ports that supply the ALU operands and one write
port. The register file used in the design has 4 registers (R0 to R3), each 16 bits wide. A read from
either port requires a read enable signal (RAE or RBE) and a read address (RAA or RBA).
Similarly, a write to a register requires a write enable ‘WE’ and a write address ‘WA’. The register
file can read from both ports at the same time asynchronously, but write operations are
synchronous. Figure 8 shows how the register file is integrated with the rest of the components.
Fig 8: Register File

The input data to the register file comes from 3 different sources through a 3x1 mux: In0
receives the immediate data from the instruction, In1 the user input, and In2 the ALU output. This
multiplexor is controlled by the ‘IN_MUX’ control signal.
Two additional multiplexors form the port A read address and the general write address,
respectively. The 2-bit write address can be taken from any of the three 2-bit fields of
IR (5 downto 0), so a 3x1 mux with the ‘W_MUX’ control signal is used. The port A read address
can be formed from IR bits (5 – 4) or (3 – 2), so a 2x1 mux with the ‘R_MUX’ control signal is
used for that purpose.
The VHDL code for the register file was inspired by the code provided in the lecture slides, with
one modification: when a read port is not enabled, that port outputs the register file’s input data,
as shown in the code below.
-- Read port A
aout <= rf(TO_INTEGER(unsigned(RAA))) when RAE = '1' else
        data;  -- Pass regFile input to port A when RAE = '0'.
-- Read port B
bout <= rf(TO_INTEGER(unsigned(RBA))) when RBE = '1' else
        data;  -- Pass regFile input to port B when RBE = '0'.
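The complete register file can be sketched as follows, reusing the signal names from the text (‘RAA’, ‘RBA’, ‘RAE’, ‘RBE’, ‘WE’, ‘WA’, ‘data’, ‘aout’, ‘bout’, ‘rf’). The entity name is hypothetical, and initialising all registers to zero is an assumption; Program 1, for instance, accumulates into R2 without clearing it first.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 4x16 register file: synchronous write, asynchronous reads.
entity regfile4x16 is
  port (
    clk      : in  std_logic;
    WE       : in  std_logic;
    WA       : in  std_logic_vector(1 downto 0);
    data     : in  std_logic_vector(15 downto 0);  -- output of the IN_MUX input mux
    RAE, RBE : in  std_logic;
    RAA, RBA : in  std_logic_vector(1 downto 0);
    aout     : out std_logic_vector(15 downto 0);
    bout     : out std_logic_vector(15 downto 0)
  );
end entity regfile4x16;

architecture rtl of regfile4x16 is
  type rf_t is array (0 to 3) of std_logic_vector(15 downto 0);
  signal rf : rf_t := (others => (others => '0'));  -- zero init (assumption)
begin
  -- Synchronous write on the rising edge.
  process (clk)
  begin
    if rising_edge(clk) then
      if WE = '1' then
        rf(to_integer(unsigned(WA))) <= data;
      end if;
    end if;
  end process;

  -- Asynchronous reads; a disabled port passes the register-file input through.
  aout <= rf(to_integer(unsigned(RAA))) when RAE = '1' else data;
  bout <= rf(to_integer(unsigned(RBA))) when RBE = '1' else data;
end architecture rtl;
```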

3.9 ALU

The ALU can perform 16-bit arithmetic and logical operations, based on the 3-bit opcode
generated by the control unit. The ALU implements the following operations:
OPERATION     DESCRIPTION                        ALU_OP
PASS          Pass the input ‘b’ to the output   000
AND           a and b                            001
OR            a or b                             010
NOT           not b                              011
Addition      signed(a) + signed(b)              100
Subtraction   signed(a) - signed(b)              101
Increment     signed(a) + 1                      110
Decrement     signed(a) - 1                      111

When ALU_OP is not one of the codes above, the ALU retains its previous output. This was done
to preserve the correctness of the sign and zero flags. Since all eight 3-bit codes are assigned, this
case only arises when some bit of ALU_OP holds a value other than ‘0’ or ‘1’ (for example ‘U’,
‘X’ or ‘-’).
In VHDL, arithmetic operations cannot be performed directly on ‘std_logic_vector’ types, so the
operands were converted to signed values using the ‘signed’ type from the ‘numeric_std’ library.
After the signed arithmetic, the result was converted back to ‘std_logic_vector’ to drive the ALU
output.
-- Convert to signed integers to perform signed arithmetic.
a_signed <= signed(a);
b_signed <= signed(b);
-- Convert from signed back to std_logic_vector (addition branch of the ALU_OP selection).
std_logic_vector(a_signed + b_signed) when "100",

Figure 9 shows the block diagram of the ALU.


Fig 9: ALU
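Combining the operation table with the signed conversions above, a complete ALU might be sketched as a selected signal assignment. This is a minimal sketch: the entity name and output ‘y’ are hypothetical, and, unlike the design described above (which holds its previous output for unlisted codes), it drives ‘X’ for non-binary ALU_OP values so that no latch is inferred.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 16-bit ALU following the ALU_OP table.
entity alu16 is
  port (
    a, b   : in  std_logic_vector(15 downto 0);
    ALU_OP : in  std_logic_vector(2 downto 0);
    y      : out std_logic_vector(15 downto 0)
  );
end entity alu16;

architecture rtl of alu16 is
  signal a_signed, b_signed : signed(15 downto 0);
begin
  a_signed <= signed(a);
  b_signed <= signed(b);

  with ALU_OP select y <=
    b                                     when "000",  -- PASS b
    a and b                               when "001",
    a or b                                when "010",
    not b                                 when "011",
    std_logic_vector(a_signed + b_signed) when "100",  -- wraps on overflow
    std_logic_vector(a_signed - b_signed) when "101",
    std_logic_vector(a_signed + 1)        when "110",
    std_logic_vector(a_signed - 1)        when "111",
    (others => 'X')                       when others; -- non-binary ALU_OP (deviation)
end architecture rtl;
```

With this structure, the sign flag is simply bit 15 of ‘y’, since the arithmetic is two's complement.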

4 Control Unit

The control unit is implemented as a finite state machine (FSM) that transitions through a
sequence of states. Each state corresponds to a control word, i.e. the set of control signals
asserted during one clock cycle. The control unit begins in the start state, in which no specific
operation is performed. It then moves to the fetch state, which loads the instruction at the output
of the ROM into the IR. An additional wait state is inserted between the fetch and decode states
so that the IR is fully loaded with the next instruction. In the decode state, the next state, one of
the ISA instructions, is chosen based on the IR opcode. In each state, the appropriate control
signals are asserted to synchronize the entire system. Once the current instruction has executed,
the controller returns to the fetch state to load the next instruction.
One exception is made for the branch instructions, i.e. JNZ, JN and JMP, where the controller
returns to the start state after loading the PC with the jump address (or, if the branch is not taken,
keeping the next sequential address). When the halt instruction is executed, the system stops
executing and must be reset to execute another set of instructions.
Table 1 shows the state transitions for the control unit implemented in the design and table 2
gives the control signals asserted for each of the states.
State Transition Table for the FSM

Curr_State   Next_State
start        fetch
fetch        wait
wait         decode
decode       selected by the opcode IR [9 – 6]:
               0000 Halt    0001 Mov_1   0010 In      0011 Out
               0100 Not     0101 Jmp     0110 Jnz     0111 Jn
               1000 Lt      1001 Inc     1010 Dec     1011 Add
               1100 Sub     1101 And     1110 Or      1111 Mov_2
Halt         Halt
Mov_1        fetch
In           fetch
Out          fetch
Not          fetch
Jmp          start
Jnz          start
Jn           start
Lt           fetch
Inc          fetch
Dec          fetch
Add          fetch
Sub          fetch
And          fetch
Or           fetch
Mov_2        fetch

Table 1: FSM State Transition

Curr_state  RA_MUX  WE  RAE  RBE  IR_LOAD  PC_LOAD  IN_MUX  W_MUX  JNZ_MUX  ALU_OP  EN_OUT  halt  Gen_Zero
start       0       0   0    0    0        0        --      --     0        ---     0       0     1 or 0
fetch       0       0   0    0    1        1        --      --     0        ---     0       0     1 or 0
wait        0       0   0    0    0        0        --      --     0        ---     0       0     1 or 0
decode      0       0   0    0    0        0        --      --     0        ---     0       0     1 or 0
Halt        0       0   0    0    0        0        --      --     0        ---     0       1     1 or 0
Mov_1       0       1   0    1    0        0        10      01     0        000     0       0     1 or 0
In          0       1   0    0    0        0        01      10     0        ---     0       0     1 or 0
Out         0       0   0    1    0        0        10      --     0        000     1       0     1 or 0
Not         0       1   0    1    0        0        10      01     0        011     0       0     1 or 0
Jmp         0       0   0    0    0        1        --      --     1        ---     0       0     1 or 0
Jnz         0       0   0    0    0        1 or 0   --      --     1 or 0   ---     0       0     1 or 0
Jn          0       0   0    0    0        1 or 0   --      --     1 or 0   ---     0       0     1 or 0
Lt          0       0   1    1    0        0        --      --     0        101     0       0     1 or 0
Inc         1       1   1    0    0        0        00      00     0        100     0       0     1 or 0
Dec         1       1   1    0    0        0        00      00     0        101     0       0     1 or 0
Add         0       1   1    1    0        0        10      00     0        100     0       0     1 or 0
Sub         0       1   1    1    0        0        10      00     0        101     0       0     1 or 0
And         0       1   1    1    0        0        10      00     0        001     0       0     1 or 0
Or          0       1   1    1    0        0        10      00     0        010     0       0     1 or 0
Mov_2       0       1   0    0    0        0        00      00     0        ---     0       0     1 or 0

Table 2: FSM Control Signals

* In Table 2, dashes (“--”, “---”) denote don’t-care values.
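The next-state logic of Table 1 can be sketched as a two-process FSM. This minimal sketch implements the full state sequence but only three of the Table 2 outputs (‘IR_LOAD’, ‘PC_LOAD’ and ‘halt’); the remaining control signals would be added per state in the same way, and the flag-dependent PC_LOAD of Jnz and Jn is left out. The entity name, the state names and the asynchronous reset are assumptions; the ‘s_’ prefix is needed because ‘wait’, ‘in’, ‘out’, ‘not’, ‘and’ and ‘or’ are reserved words in VHDL.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical control-unit sketch: Table 1 sequencing plus a few Table 2 outputs.
entity cu_sketch is
  port (
    clk, rst : in  std_logic;
    opcode   : in  std_logic_vector(3 downto 0);  -- IR(9 downto 6)
    IR_LOAD  : out std_logic;
    PC_LOAD  : out std_logic;
    halt     : out std_logic
  );
end entity cu_sketch;

architecture rtl of cu_sketch is
  -- 's_' prefix avoids reserved words such as wait, in, out, not, and, or.
  type state_t is (s_start, s_fetch, s_wait, s_decode, s_halt, s_mov1, s_in,
                   s_out, s_not, s_jmp, s_jnz, s_jn, s_lt, s_inc, s_dec,
                   s_add, s_sub, s_and, s_or, s_mov2);
  signal state, next_state : state_t;
begin
  -- State register (asynchronous reset is an assumption).
  process (clk, rst)
  begin
    if rst = '1' then
      state <= s_start;
    elsif rising_edge(clk) then
      state <= next_state;
    end if;
  end process;

  -- Next-state logic following Table 1.
  process (state, opcode)
  begin
    case state is
      when s_start  => next_state <= s_fetch;
      when s_fetch  => next_state <= s_wait;
      when s_wait   => next_state <= s_decode;
      when s_decode =>
        case opcode is
          when "0000" => next_state <= s_halt;
          when "0001" => next_state <= s_mov1;
          when "0010" => next_state <= s_in;
          when "0011" => next_state <= s_out;
          when "0100" => next_state <= s_not;
          when "0101" => next_state <= s_jmp;
          when "0110" => next_state <= s_jnz;
          when "0111" => next_state <= s_jn;
          when "1000" => next_state <= s_lt;
          when "1001" => next_state <= s_inc;
          when "1010" => next_state <= s_dec;
          when "1011" => next_state <= s_add;
          when "1100" => next_state <= s_sub;
          when "1101" => next_state <= s_and;
          when "1110" => next_state <= s_or;
          when "1111" => next_state <= s_mov2;
          when others => next_state <= s_start;  -- non-binary opcode (assumption)
        end case;
      when s_halt               => next_state <= s_halt;   -- stays until reset
      when s_jmp | s_jnz | s_jn => next_state <= s_start;
      when others               => next_state <= s_fetch;  -- all other instruction states
    end case;
  end process;

  -- Moore outputs taken from Table 2 (subset).
  IR_LOAD <= '1' when state = s_fetch else '0';
  PC_LOAD <= '1' when state = s_fetch or state = s_jmp else '0';
  halt    <= '1' when state = s_halt else '0';
end architecture rtl;
```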
5 Data Path Flags

The data path contains two flags: Sign and Zero. The sign flag indicates whether the result of an
arithmetic subtraction is negative; it is used by the comparison (LT) instruction, and the zero flag
is set to ‘0’ or ‘1’ based on it. The sign flag is taken from bit 15 of the ALU output. The zero flag is
set either when all the bits of the ALU output are ‘0’, or when a comparison operation is
performed and the result is positive. These two flags are implemented in VHDL as follows:
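-- Generating Zero flag.
-- ALU_OP and WE are read inside the process, so they must be in the sensitivity list too.
process (y_cmp, Gen_zero, ALU_OP, WE)
begin
    if (y_cmp = x"0000") then
        zero <= '1';
    elsif (ALU_OP = "101" and WE = '0') then  -- indicates LT state
        if (Gen_zero = '1') then
            zero <= '0';
        else
            zero <= '1';
        end if;
    else
        zero <= '0';
    end if;
end process;

-- Sign flag
S <= y_cmp(15);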

-- Sign flag

S <= y_cmp(15);

The zero flag was implemented as an ‘inout’ port so that it could be set by both the control logic
and the data path.
6 Final Design Hierarchy

Fig 10: VHDL Design Hierarchy

7 Test Programs

To test the system, we ran two programs. The first program calculates the sum of the natural
numbers less than N, where N is a user-provided input. The second program computes the
quotient of N/11, where N is again a user-provided input. We first developed assembly-language
versions of the two programs and then generated the corresponding machine-level instructions.
The two programs are presented below.

Program 1
0 => "0010000001", -- IN R1
1 => "1010010001", -- DEC R1
2 => "0111000110", -- JN 0110
3 => "1011100110", -- ADD R2,R1,R2
4 => "1010010001", -- DEC R1
5 => "0110000011", -- JNZ 0011
6 => "0011000010", -- OUT R2

Program 2
0 => "1111001011", -- MOV R0 # 1011
1 => "0010000001", -- IN R1
2 => "1000000100", -- LT R1,R0
3 => "0110000111", -- JNZ 0111
4 => "1100010100", -- SUB R1,R1,R0
5 => "1001100001", -- INC R2
6 => "0101000010", -- JMP 0010
7 => "0011000010", -- OUT R2

8 Results

Figure 11 shows the output of the first program. The system clock period was set to 20 ns and
the user input was the integer 3. The simulation was performed in the Vivado IDE, and the output
shows the sum of the natural numbers less than 3, i.e. 1 + 2 = 3.
Fig 11: Program 1 Output

For the second program, the same 20 ns clock period was used and the number 22 was provided
as input. The output is 2, which is the quotient of 22/11.

Fig 12: Program 2 Output

9 Conclusion

This project implements a 16-bit microprocessor in VHDL according to the provided specification
and ISA. The design successfully implements all the instructions in the ISA, and the outputs of the
test programs demonstrate that the design works correctly.
