
CALIFORNIA STATE UNIVERSITY, NORTHRIDGE

Implementing booth algorithm on FPGA

A graduate project submitted in partial fulfillment of the requirements for the


Degree of Master of Science in

Electrical Engineering

By

Rashed Hamad Alajmi

May 2019
The graduate project of Rashed Hamad Alajmi is approved:

Date
Dr. Xiyi, Hang

Dr. Xiaojun, (Ashley) Geng Date

Date
Dr. El Naga Nagi , Chair

CALIFORNIA STATE UNIVERSITY, NORTHRIDGE

ACKNOWLEDGEMENTS

I would like to express my special thanks and gratitude to my professor, Dr. Nagi El Naga, for all his advice on this project and throughout my master's degree.

Secondly, I would like to thank my parents and my whole family for all their support and patience until I finished this project.
Table of Contents

SIGNATURE PAGE
ACKNOWLEDGMENT
LIST OF FIGURES
ABSTRACT

Chapter 1: Project Overview
    1.1 Introduction
    1.2 Objective
    1.3 Project Outline

Chapter 2: High Performance Adders
    2.1 Carry Look-Ahead (CLA) Adders
        2.1.1 CLA Theory
        2.1.2 Carry Look-Ahead Adder Group
        2.1.3 Four-bit Carry Look-ahead Unit 1
        2.1.4 Four-bit Carry Look-ahead Unit 2
        2.1.5 32-bit One-Level Carry-Look Ahead Adder
        2.1.6 Two Level CLA Adder unit
    2.2 Carry Save Adders
    2.3 Conclusion

Chapter 3: Multiplication Algorithms
    3.1 Serial Multiplication Methods
        3.1.1 Simple Multiplication Method
        3.1.2 Booth's Algorithm
        3.1.3 Modified Booth Algorithm
    3.2 Recoding Multipliers
        3.2.1 Uniform Shift of One method
        3.3.2 Uniform Shift of Two method
    3.4 Implementing the Overlapped Scanning multiplier algorithm

Chapter 4: Logic Circuits and Modules
    4.1 Decoders
    4.2 Control Gates
    4.3 Complementing circuits
    4.4 Accumulator
    4.5 Left-bit shifter

Chapter 5: Designing the Multiplier
    5.1 Logical Circuits Schematic
    5.2 8-bit Multiplication Model
    5.3 Final Multiplier Design

Chapter 6: Details of Implementation
    6.1 Introduction
    6.2 List of components
    6.3 Description of hardware components
        6.3.1 Seven segments
        6.3.2 Transistors
        6.3.3 Arty Z7
    6.4 Schematic Diagram
    6.5 Functional description of the project
    6.6 Software Description
        6.6.1 Introduction
        6.6.2 Signed Multiplier
        6.6.3 Signed to SLV
        6.6.4 BCD Display
        6.6.5 Hex to 7 SEG
        6.6.6 Segment Controller

Chapter 7

References

Appendix: VHDL CODE
List of Figures:

2.2.1 Schematic of a Carry Look Ahead Adder
2.2.2 Schematic of a Carry Look-Ahead Group Unit
2.2.3 Schematic of a Four-bit Carry Look-ahead Unit 1
2.2.3 Block Diagram of a Four-bit Carry Look-ahead Unit 1
2.2.3 Schematic of a Four-bit Carry Look-ahead Unit 2
2.2.3 Block Diagram of a Four-bit Carry Look-ahead Unit 2
2.3.4 Schematic of a 32-bit One Level Carry Look-Ahead Adder Unit
2.2.4 Schematic of a 32-bit Two Level Carry Look-Ahead Adder Unit
2.4 The Similarities between a 1-bit Full Adder and a Carry Save Adder
2.4 Carry Save Adder Blocks for n = 8 bit numbers
2.4 Three-leveled Carry Save Adder Tree
3.1.2 Booth's Algorithm operations for different combinations of bi-1 and bi
3.1.2 Flow Chart for Booth's Algorithm
3.1.2 Booth's Algorithm example
3.1.3 Radix-4 Booth encoding table
3.1.3 Radix-4 example
3.2.1 Operation Rules when depending on yi
3.2.1 Operation Rules when depending on yi and carry variable is considered
3.2.1 Operation rules for fi0
3.2.1 Operation rules for fi1
3.2.1 Operation rules for Uniform Shift of One method
3.3.2 Uniform shift by one recoding bit multiplier
3.3.2.1 Non-overlapped multibit scanning multiplier
3.3.2.2 Operation Rules for recoding by pairs
3.3.2.2 Modified Operation Rules for recoding by pairs
3.4 Schematic for recoding by pairs multiplier
3.4 Decoding Operations Truth Table
4.1 Gate designs in a recoding by pairs decoder
4.3 34-bit Complementing Circuit
4.4 Accumulator Unit
4.5 Shift left register
4.6 Detailed circuit for recoding by multiple pairs
5.1 Internal resistor-transistor logic circuit schematic of the multiplier
5.1 Block diagram of the recoding logic component circuits
5.1 Outputs of each recoding logic component
5.3 8-bit multiplication unit using the recoding by pairs algorithm
5.3 Outputs for each level in the 2nd transition cycle
5.3 Outputs for the 3rd and 4th transition cycles
5.3 High speed multiplier of 32 bits which uses a recoding by pairs algorithm
6.2 Table of components
6.3.1 The common anode 7-segment Display
6.3.3 Arty Z7
6.4 Schematic of the circuit
6.4 Circuit before connecting to Arty Z7
6.5 Circuit after connecting to Arty Z7
6.6.2 Signed Multiplier
6.6.3 Signed to std logic vector converter
6.6.3 Signed to slv detailed circuit
6.6.4 BCD display components
6.6.5 Hex to 7 segments component
6.6.6 Segment controller

ABSTRACT

Implementing booth algorithm on FPGA

By

Rashed Hamad Alajmi

Master of Science in Electrical Engineering

The goal of this project is to design, model, simulate and ultimately build a high-speed 32-bit multiplication system that uses the recoding-by-pairs algorithm to speed up the process. In basic mathematical terms, multiplication is the process by which one number is scaled by another. To elaborate on the high-speed 32-bit multiplication process, I discuss several methodologies and frameworks in detail. Of the algorithms compared in this project, Booth's algorithm was found to be the most efficient in terms of speed and accuracy.

The logic circuits carry out a set of actions that depend on the input fed to the system. By function they can be grouped into shifting, complementing and control circuits. The specifics of these components are examined and designed in this project. A carry look-ahead (CLA) adder is used in the project because of its short carry propagation time. The CLA adder is used together with a carry save adder because the addition process involves multiple n-bit numbers. Two architectures that can be used to increase the speed of operation are discussed: the one-level and the two-level CLA adder. In this project a two-level CLA adder is used because it is much faster than a one-level CLA adder when dealing with 32-bit numbers.

Chapter 1: Project Overview

1.1 Introduction

Multiplication is of paramount importance in fields ranging from finance to engineering. It is a crucial mathematical operation in several areas of electrical engineering such as microchip development, telecommunication networks, graphics engine design, image rendering and, most importantly, digital signal processing (DSP). In a general-purpose multiplier, data arrives continuously, which makes the algorithm complex. The complexity of any algorithm can be classified into cost and time, so a better algorithm is one that is inexpensive and executes quickly. Multiplication is time-consuming because it involves many computations, which increases the execution time, so the algorithm must be optimized to reduce the time delay. VLSI designers have found that assigning a large area to the integer and floating-point multipliers helps speed up the multiplication process.

Optimally, real-time DSP needs on the order of one billion arithmetic operations per second, that is, one computation completed in a billionth of a second ($10^{-9}$ s). Advancements in Very Large Scale Integration (VLSI) technology have eased the heavy time consumption of real-time DSP computations by introducing new methodologies and design architectures that significantly increase the efficiency of the operation. Multiplication algorithms that seemed far-fetched in theory a few decades ago can now be implemented easily, thanks to advances in VLSI technology in both the production of devices and the design methodologies that have resolved fabrication issues and improved the efficiency of complex designs.

A multiplier requires intensive computation. Multiplication can be performed in three major steps. In the first step, the partial products are generated. In the second step, the partial products are reduced to one row of final sums and carries. In the third step, the final sums and carries are added to generate the result. A modified Booth multiplier should concentrate on the following: reducing the total number of partial products, reducing the number of 2's complement operations, and optimizing the adder structure.

Booth's algorithm is a crucial improvement in the design of signed binary multiplication. There has been progress in partial product reduction, adder structures and complementing methods, but there is still scope for modifying Booth's algorithm to optimize it further. The modified Booth multiplier is synthesized and implemented on an FPGA using the VHDL hardware description language.

1.2 Objective

The primary purpose of this project is the modelling, design, testing and implementation of a 32-bit high-speed multiplier on a field programmable gate array (FPGA) using the recoding-by-pairs algorithm. The multiplier should be both fast and accurate when solving a large number of complicated computations. Normally the speed of such a design would be compromised by its inherent complexity, but a carry look-ahead adder circuit can help reduce the speed issues.

1.3 Project Outline

The project is organized into the following chapters:

Chapter 1 introduces the project and gives the reader an overview of it.

Chapter 2- High Performance Adders: This chapter briefly touches upon some of the fast addition techniques that improve multiplier performance, such as carry save adders, and discusses carry look-ahead adders in greater detail since they are more important for this project.

Chapter 3- Multiplication Algorithms: This chapter introduces, classifies and discusses the multiplication methodologies and theories that can be implemented. These include recoding algorithms, direct multiplication and Booth's algorithm. Since the main focus of this project is implementing Booth's algorithm on an FPGA to design the multiplier, Booth's algorithm is discussed in greater detail.

Chapter 4- Logic Circuits Design: Many logic circuits are used to design, model, simulate and implement the multiplier, including decoders, accumulators, right and left shift registers and control circuits. This chapter describes how each of them plays a crucial part in the efficiency of the multiplier.

Chapter 5- Designing the Multiplier: With the groundwork on adders, multiplication algorithms and logic circuits laid in chapters 2, 3 and 4, the high-speed multiplier is studied and designed so that it is compatible with inflexible applications and very large scale integration applications.

Chapter 6 illustrates the modeling of the multiplier using the VHDL hardware description language.

Chapter 7 is the concluding chapter and summarizes the results of the project.

Chapter 2: High Performance Adders

Fast addition is an essential component of digital systems, especially in real-time digital signal processing. The efficiency and speed of the adders play a very important role in the overall speed and accuracy of any arithmetic circuit. In this chapter, some of the widely used fast addition methods are investigated, along with the necessary details of the carry look-ahead (CLA) adders that will be implemented in the final design of the multiplier. It is deduced that a multilevel CLA addition scheme makes the addition, and therefore the overall system, more effective.

2.1 Carry Look-Ahead (CLA) Adders

CLA adders are much more complicated than the slow, basic ripple-carry (RC) adder, but they provide a substantial gain in speed. RC adders work like conventional paper-and-pencil addition: corresponding digits are added to one another, starting from the least significant position, until all corresponding digits have been added and the final result is obtained. Whenever the sum of two corresponding digits exceeds the largest value a single digit can hold, a carry has to be passed to the next more significant position. Both RC and CLA adders start in the same manner, i.e. with propagation through a 4-bit segment. The main difference is that afterwards the CLA adder is 4 times faster, because the carry jumps from one carry unit to the adjacent one and then propagates inside each group that has accepted a carry in.

2.1.1 CLA Theory

Building on the concept of a 1-bit full adder, consider the full adder circuit shown in Figure 1, in which the operand bits Ai and Bi are added along with the carry-in bit from the previous column (Ci).

Figure.1: Schematic of a Carry Look Ahead Adder

As can be seen, two internal signals are generated, namely Pi and Gi, which are computed as follows:

$$P_i = A_i \oplus B_i \tag{1}$$

$$G_i = A_i \cdot B_i \tag{2}$$

Subsequently, the sum and carry-out functions can be defined as follows:

$$S_i = P_i \oplus C_i \tag{3}$$

$$C_{i+1} = P_i \cdot C_i + G_i \tag{4}$$

Here $\oplus$ denotes XOR, $\cdot$ denotes AND and $+$ denotes OR. Si, Ci+1 and Ci are the sum, carry out and carry in respectively, and Pi and Gi are known as the carry propagate and the carry generate respectively. The carry generate is so named because a carry out is generated whenever it is equal to 1, irrespective of the carry in. The carry propagate is so named because it propagates the carry from carry in to carry out whenever it is equal to 1. There are two CLA adder architectures, known as one-level and two-level CLA. A thorough investigation is needed to decide which of the two gives the better results. A few modules are used to analyze and design these units; they are discussed in the following sections.
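
The propagate and generate definitions can be checked with a short behavioral model. The Python sketch below is only an illustration and is not the project's VHDL; the function name `pg_full_adder` is hypothetical, and it assumes the propagate signal is the XOR of the operand bits, as the XOR gates of the first logic level suggest.

```python
def pg_full_adder(a, b, c_in):
    """One-bit full adder expressed through propagate/generate signals."""
    p = a ^ b                  # carry propagate
    g = a & b                  # carry generate
    s = p ^ c_in               # sum bit
    c_out = (p & c_in) | g     # carry out
    return s, c_out


# Exhaustive check against ordinary integer addition.
for a in (0, 1):
    for b in (0, 1):
        for c_in in (0, 1):
            s, c_out = pg_full_adder(a, b, c_in)
            assert 2 * c_out + s == a + b + c_in
```

All eight input combinations agree with integer addition, which is the property the look-ahead logic in the following sections relies on.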

2.1.2 Carry Look-Ahead Adder Group

The CLA theory in section 2.1.1 showed how the sum and carry-out signals are determined in a CLA adder. However, fan-in restrictions mean that the adder has to be split into 4-bit groups. Each 4-bit group consists of 3 levels of logic, as depicted in Figure 2.

Figure.2: Schematic of a Carry Look-Ahead Group Unit

The levels are as follows:

First Level: All the P and G signals are generated here by four sets of P and G logic (each set includes an AND gate and an XOR gate).

Second Level: This is the logic block of the CLA, which includes four different two-level implementation circuits. In the above figure, C1, C2, C3 and C4 are generated in this level.

Third Level: This consists of the four XOR gates that generate the sum signals S0, S1, S2 and S3.

2.1.3 Four-bit Carry Look-ahead Unit 1

Building on the carry look-ahead theory and group discussed in the previous sections, and observing the schematic and block diagram shown in Figures 3 and 4 respectively, it is possible to expand equation 4 recursively.

Figure.3: Schematic of a Four-bit Carry Look-ahead Unit 1

Figure.4: Block Diagram of a Four-bit Carry Look-ahead Unit 1

The Boolean expressions of the carry outputs at each stage can be determined and simplified by substituting the previous carry output expression, as described below:

$$C_1 = G_0 + P_0 C_0 \tag{5}$$

$$C_2 = G_1 + P_1 C_1 \tag{6}$$

$$C_2 = G_1 + P_1 G_0 + P_1 P_0 C_0 \tag{7}$$

$$C_3 = G_2 + P_2 C_2 \tag{8}$$

$$C_3 = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0 \tag{9}$$

$$C_4 = G_3 + P_3 C_3 \tag{10}$$

$$C_4 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0 \tag{11}$$

Thus the equations relevant for this project are equations 5, 7, 9 and 11. The carry output C4 is the final carry produced from the preceding carry generates and propagates and is fed into the next segment.
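
The expanded carry equations 5, 7, 9 and 11 can be exercised with a small behavioral model. The sketch below is illustrative Python, not the project's VHDL; the names `clau1_carries` and `add4` are hypothetical, and bit index 0 is assumed to be the least significant bit.

```python
def clau1_carries(p, g, c0):
    """Return [C1, C2, C3, C4] from 4 propagate bits, 4 generate bits and C0."""
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]


def add4(a, b, c0=0):
    """Add two 4-bit numbers; every carry is computed directly from P, G and C0."""
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(4)]   # first level
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(4)]
    c = [c0] + clau1_carries(p, g, c0)                        # second level
    s = sum((p[i] ^ c[i]) << i for i in range(4))             # third level
    return s, c[4]


print(add4(9, 7))   # (0, 1): 9 + 7 = 16, so the 4-bit sum is 0 with carry out 1
```

No carry in this model waits for the previous carry to be computed, which is the point of the look-ahead expansion.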

2.1.4 Four-bit Carry Look-ahead Unit 2

The schematic and block diagram shown in Figures 5 and 6 respectively indicate some of the differences between CLAU 1 and CLAU 2. Based on these differences, modified expressions for the group carry propagate (P1*) and the group carry generate (G1*) can be derived:

$$P_1^* = P_3 \cdot P_2 \cdot P_1 \cdot P_0 \tag{12}$$

$$G_1^* = G_3 + G_2 \cdot P_3 + G_1 \cdot P_3 \cdot P_2 + G_0 \cdot P_3 \cdot P_2 \cdot P_1 \tag{13}$$
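
As a quick consistency check, the group signals should reproduce the group carry out as C4 = G1* + P1*·C0, which is equation 11 regrouped. The Python sketch below assumes the standard group-generate form G3 + G2·P3 + G1·P3·P2 + G0·P3·P2·P1; the function names are hypothetical and the model is behavioral only.

```python
def group_pg(p, g):
    """Group propagate and generate of a 4-bit block (index 0 = LSB)."""
    p_star = p[3] & p[2] & p[1] & p[0]
    g_star = (g[3] | (g[2] & p[3]) | (g[1] & p[3] & p[2])
              | (g[0] & p[3] & p[2] & p[1]))
    return p_star, g_star


def group_carry_out(a, b, c0):
    """Carry out of a 4-bit block computed only from the group signals."""
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(4)]
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(4)]
    p_star, g_star = group_pg(p, g)
    return g_star | (p_star & c0)


# The group carry must equal the carry out of ordinary 4-bit addition.
for a in range(16):
    for b in range(16):
        for c0 in (0, 1):
            assert group_carry_out(a, b, c0) == (a + b + c0) >> 4
```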

A one-level carry look-ahead adder is used in this case, which consists of a Four-bit Carry Look-ahead Unit 1 (CLAU1) combined with a Carry Look-Ahead Adder Group (CLAAG) in order to enhance the speed of the multiplier. The CLAAG generates the 4-bit carry generate and propagate signals and uses the 4-bit carry supplied by the CLAU 1. The CLAU 1 in turn takes the carry generates and propagates from the CLAAG as its inputs and produces the 4-bit carry as its output. Connecting 8 such blocks gives a combined width of 32 bits, which leads to the concept of a 32-bit one-level CLA unit.

Figure.5: Schematic of a Four-bit Carry Look-ahead Unit 2

Figure.6: Block Diagram of a Four-bit Carry Look-ahead Unit 2

2.1.5 32-bit One-Level Carry-Look Ahead Adder

In this technique, as discussed briefly in the previous section, the adder is divided into groups and the carry look-ahead method is applied at the group level. The output carry of each group is allowed to propagate to the following group in order to reduce the time delay, which is a significant issue in conventional CLA adders. With the gate delay denoted by $\tau_g$, the total addition time for an n-bit adder is:

$$T_{total}(n\text{-bit adder}) = \tau_g + 2\tau_g \times \frac{n}{4} + \tau_g \tag{14}$$

$$T_{total}(n\text{-bit adder}) = (2 + 0.5n) \times \tau_g \tag{15}$$

$$T_{total}(32\text{-bit adder}) = 18\tau_g$$

*note: The first $\tau_g$ is the delay due to the carry propagate and carry generate in the CLAAGs, the last $\tau_g$ is the delay of the output sum generation in the final group, and the middle term is the delay of signal propagation through the CLAU1.

Figure 7 represents a 32-bit one-level CLA adder, where A and B are the 32-bit inputs and S is the corresponding output. C denotes the carry propagating from one stage to the next. The first input carry is C0 and the final output carry is C32.

Figure.7: Schematic of a 32-bit One Level Carry Look-Ahead Adder Unit

2.1.6 Two Level CLA Adder unit

A two-level CLA adder unit is much faster than a one-level CLA adder unit, which is why it is usually preferred. The key differences between the two can be inferred from Figure 8. The two-level adder forms a one-level CLA unit at the group level and adds a second level of look-ahead within each of its two sections. The carry output of each section ripples into the following section. In Figures 7 and 8 the terms A, B and S serve the same purpose: A and B are the 32-bit inputs and S is the corresponding output. The main difference lies in the bottom CLAU 1 sections, where Cin is the input carry to the CLAU 1 and C32 is the output carry, while C16 serves both purposes as it propagates from one CLAU1 stage to the other. The system also includes P1* and G1*, which are the modified carry propagate and generate connections.

Figure.8: Schematic of a 32-bit Two Level Carry Look-Ahead Adder Unit

The time delay is also calculated differently in the two-level CLA adder, as follows:

$$T_{total} = \tau_g + 2\tau_g + 2\tau_g \times S + 2\tau_g + \tau_g \tag{16}$$

$$T_{total} = (6 + 2S) \times \tau_g \tag{17}$$

Since in our case S = 2,

$$T_{total} = 10\tau_g$$

*note: The first $\tau_g$ is the delay due to the carry propagate and carry generate in the CLAAGs, the second term ($2\tau_g$) is the delay due to the carry propagate and carry generate in the CLAU2, the third ($2\tau_g \times S$) is the delay of the signals propagating through the CLAU2 sections, the fourth ($2\tau_g$) is the delay of signal propagation through the last section's CLAU2, and the last $\tau_g$ is the delay of the output sum produced in the last segment.
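
The two delay models can be compared numerically. The Python sketch below simply evaluates equation 14 and equation 16 in units of the gate delay; the function names are hypothetical, and the one-level model assumes n is a multiple of 4.

```python
def one_level_cla_delay(n_bits):
    """Equation 14 in units of tau_g: P/G generation, CLAU1 chain, final sum."""
    groups = n_bits // 4              # one 4-bit group per CLAU1 stage
    return 1 + 2 * groups + 1


def two_level_cla_delay(sections):
    """Equation 16 in units of tau_g, with S = number of sections."""
    return 1 + 2 + 2 * sections + 2 + 1


print(one_level_cla_delay(32))   # 18
print(two_level_cla_delay(2))    # 10
```

For the 32-bit case this reproduces the 18 and 10 gate delays quoted in the text.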

As we can see, the time delay is much smaller than that of the one-level CLA unit, which indicates better performance, at least in terms of speed. However, when n-bit numbers with a larger n have to be added, the CLAU2 might become inefficient, which is where carry save adders can replace it as a more efficient option.

2.2 Carry Save Adders

The main drawback of conventional adders is that they can only add two numbers at a time. For the 32-bit multiplier in this project, the multiples of the multiplicand must be added quickly and without any restriction on the number of operands, because of the size of the operands and the large number of partial additions required. Carry save adders (CSA) overcome this shortcoming of conventional adders, and even of CLAU2 adders, since they add three n-bit numbers simultaneously and produce a sum vector and a carry vector, which are then used as inputs by the following CSA units. A CSA unit has as many full adders as its bit width, i.e. a 32-bit CSA has 32 full adders. It takes three n-bit binary numbers, denoted x, y and z, as inputs and generates two n-bit output vectors: the sum (S) for the partial sum and the carry (C) for the partial carry, which is used later on. The order in which the inputs are added is irrelevant to the computation. A CSA is very similar to a 1-bit full adder, as can be seen in Figure 9; the difference is that the input carry is now denoted 'z', the sum output is denoted 's' and the output carry is denoted 'c'. Figure 10 depicts how the full adders in a CSA unit add three distinct n-bit numbers (x, y and z) and convert them into two output vectors (c and s). In a CSA unit, the carry vector is always shifted left by one bit.

Figure 9: The Similarities between a 1-bit Full Adder and a Carry Save Adder

Figure 10: Carry Save Adder Blocks for n = 8 bit numbers

The final sum, however, is calculated using a carry look-ahead (CLA) adder. Figure 11 shows how many steps it takes to add m different n-bit numbers. Before the numbers are added in the final CLA adder, they pass through m-2 blocks, each block consisting of multiple one-bit CSA units arranged in parallel. Each time a block is passed, the width of the numbers grows by 1 bit. If the time delay of each gate is assumed to be $\tau_g$, each CSA level contributes a delay of $2\tau_g$, which is the same as the delay of a full adder stage. For the three-level tree shown in Figure 11 below, the total time delay is therefore $6\tau_g$.

Figure.11: Carry Save Adder Tree to add m n-bit numbers
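
The carry save idea can be modelled in a few lines. The Python sketch below is a behavioral illustration for non-negative integers, with hypothetical function names; it chains the CSA blocks one after another, which is simpler than the tree arrangement of Figure 11 but uses the same m-2 blocks, and Python's `+` stands in for the final CLA adder.

```python
def carry_save_add(x, y, z):
    """3-to-2 compression: return (s, c) such that s + c == x + y + z."""
    s = x ^ y ^ z                               # bitwise sum, carries ignored
    c = ((x & y) | (x & z) | (y & z)) << 1      # carries, moved one bit left
    return s, c


def add_many(operands):
    """Add m >= 3 numbers using m - 2 CSA blocks and one final addition."""
    s, c = carry_save_add(*operands[:3])
    blocks = 1
    for z in operands[3:]:
        s, c = carry_save_add(s, c, z)
        blocks += 1
    return s + c, blocks


print(carry_save_add(5, 3, 6))          # (0, 14): 0 + 14 = 5 + 3 + 6
print(add_many([13, 7, 250, 99, 1]))    # (370, 3): five operands, three blocks
```

No carry propagates inside a CSA block; the only carry-propagating addition is the final one.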

2.3 Conclusion

In this chapter, the carry look-ahead technique, which is used by the multiplication techniques discussed later, was described in detail. The CLAAG units and their importance in addition were also discussed. Two adder implementations commonly applied to multipliers were then discussed, namely the one-level CLA unit and the two-level CLA unit. Comparing their time delays showed that the two-level CLA unit is much faster, with a delay of only $10\tau_g$ against $18\tau_g$ for the one-level CLA unit. It was then investigated how multiple n-bit numbers can be added, and the most feasible technique was found to be the carry save adder, in which a three-level CSA tree has a delay of only $6\tau_g$. The total time delay of the addition process is therefore $10\tau_g + 6\tau_g = 16\tau_g$.

Chapter 3: Multiplication Algorithms

In this chapter, several multiplication algorithms and techniques are discussed. The basic method of adding a number of partial products is common to all of them; however, they differ in complexity and speed, both of which are important in determining which is best suited to the task at hand. The primary objective is to balance speed and simplicity in order to design and implement the 32-bit multiplier.

Speed is measured by the reduction in time delay and simplicity by the reduction in gate complexity. These are in turn tied to the cost incurred and the overall performance of the system. It is therefore important to determine which broad category of multiplication techniques the required method falls under. There are three main categories, namely serial, parallel and serial-parallel multipliers. Serial multipliers include add-shift techniques and recoding techniques, while parallel techniques include ROM networks, reduction methods and iterative cellular arrays. The main focus of this project is serial multipliers, and the following section discusses several techniques that have been developed and used over the years.

3.1 Serial Multiplication Methods

Serial multiplication, and the architectures based on it, use the add-shift method. The number of bits n determines the multiplication time and complexity, as they are proportional to the square of n ($n^2$). The multiplication time and complexity therefore grow quadratically as n increases. In serial multiplication the multiplier bits are inspected sequentially, starting with the least significant bit. If the inspected bit is '1', the multiplicand is added to the most significant half of the double-length accumulator, which is initially zero. The accumulator is shifted one bit to the right after every inspection, and once all the bits have been inspected the product is held in the accumulator. The trade-off of this technique is that, although it is simple and uses few resources, the computation becomes significantly more demanding as the size of the multiplicand or multiplier increases. Improvements to this basic mechanism have been devised and are discussed in the following sections.
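
The add-shift procedure described above can be sketched as follows. This is a behavioral Python model for unsigned n-bit operands only; the function name and the register names `acc` and `q` are hypothetical, and the lower half of the double-length accumulator is assumed to hold the multiplier, as in a typical add-shift datapath.

```python
def shift_add_multiply(multiplicand, multiplier, n):
    """Unsigned add-shift multiplication of two n-bit numbers."""
    acc = 0             # most significant half of the accumulator, initially zero
    q = multiplier      # least significant half, initially the multiplier
    for _ in range(n):
        if q & 1:                   # inspect the current least significant bit
            acc += multiplicand     # add into the upper half
        # shift the combined (acc, q) register one bit to the right
        q = (q >> 1) | ((acc & 1) << (n - 1))
        acc >>= 1
    return (acc << n) | q           # the 2n-bit product


print(shift_add_multiply(13, 11, 4))   # 143
```

One addition at most and one shift are performed per multiplier bit, so an n-bit multiplication always takes n passes through the loop.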

3.1.1 Simple Multiplication Method

In the simple, or direct, multiplication method, the multiplicand is added to the partial product once for each multiplier bit with the value 1. The number of 1 digits therefore equals the number of addition operations required, i.e. a multiplier containing three 1s needs 3 addition operations. The entire premise of this simple method is the detection of 0s and 1s: for a 0 no action is taken, and for a 1 an addition is performed. The example below shows how the number of operations is determined:

X=101101010110

In the above example a total of 7 1s are detected, which means that the multiplicand is added 7 times.
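
The operation count for the direct method can be reproduced with a few lines of Python. This is only an illustrative sketch for unsigned values; the function name `direct_multiply` and the multiplicand value 3 are arbitrary choices made for the example.

```python
def direct_multiply(multiplicand, multiplier_bits):
    """Return (product, number of additions) for a multiplier given as a bit string."""
    product, additions = 0, 0
    for position, bit in enumerate(reversed(multiplier_bits)):
        if bit == "1":                          # a 1 triggers one addition
            product += multiplicand << position
            additions += 1
        # a 0 requires no action
    return product, additions


product, additions = direct_multiply(3, "101101010110")
print(additions)   # 7
```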

3.1.2 Booth's Algorithm

Booth's algorithm is one of the most widely used algorithms for multiplying two signed binary numbers in two's complement notation, and it is of great importance in computer architecture. Its importance lies in the fact that it preserves the sign of the result. The theory is based on the notion that a string of consecutive 1's in the multiplier only needs shifting, with one subtraction at the start of the string and one addition at its end, rather than one addition per bit. There are four main steps to be followed in Booth's algorithm, and the foundation of the method is that an extra 0 is appended to the right of the least significant bit of the multiplier (b). The multiplier bits bi and bi-1 are then checked sequentially, starting from the least significant bit, and the multiplicand (a) is added to or subtracted from the partial product (m) depending on their values. The multiplier is shifted one bit to the right at the end of each step until all of its bits have been examined. The actions and conditions are characterized in Figure 12:

Value of bi    Value of bi-1    Action

0              0                Do Nothing
0              1                Add a
1              0                Subtract a
1              1                Do Nothing

Figure.12: Booths Algorithm operations for different combinations of bi and bi-1

A more concise way of representing this table is the expression (bi-1 - bi): if it is 0 no action is required, if it is +1 a is added to the partial product, and if it is -1 a is subtracted from the partial product. Figure 13 depicts the flow chart of the general method by which Booth's algorithm is carried out, following which an example of its application is described.

Figure.13: Flow Chart for Booth’s Algorithm

Example:

Q. Multiply 2 (0010) by -3 (1101) using Booth's algorithm with 4-bit numbers. The answer should be 1111 1010 (-6); the final register contents, including the extra bit, are 1111 1010 1. The steps for this computation are shown below in Figure 14:

Figure.14: Booth’s Algorithm example

*note: The colored bits in each iteration are the bits used to determine what the next step

or action to be taken will be.
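The procedure can also be followed in a short software model. The sketch below is an illustration only, not the project's implementation: the function name is hypothetical, the accumulator is assumed to be n bits wide, and for that reason the multiplicand must not be the most negative n-bit value.

```python
def booth_multiply(a, b, n):
    """Multiply two signed n-bit integers with radix-2 Booth's algorithm.

    Assumes a is not the most negative n-bit value (-2**(n-1)), because the
    accumulator in this sketch is only n bits wide.
    """
    mask = (1 << n) - 1
    acc = 0          # upper half of the product register
    q = b & mask     # lower half starts as the multiplier
    q_prev = 0       # the extra 0 appended to the right of the LSB
    for _ in range(n):
        q0 = q & 1
        if (q0, q_prev) == (1, 0):
            acc = (acc - a) & mask   # subtract the multiplicand
        elif (q0, q_prev) == (0, 1):
            acc = (acc + a) & mask   # add the multiplicand
        # arithmetic right shift of the combined (acc, q, q_prev) register
        q_prev = q0
        q = (q >> 1) | ((acc & 1) << (n - 1))
        sign = acc >> (n - 1)
        acc = (acc >> 1) | (sign << (n - 1))
    product = (acc << n) | q
    if product >> (2 * n - 1):       # interpret the 2n-bit pattern as signed
        product -= 1 << (2 * n)
    return product


print(booth_multiply(2, -3, 4))  # -6
```

With a = 2 and b = -3 the register contents after the fourth step are 1111 1010 with the extra bit equal to 1, which matches the example above.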

A few examples which compare the number of operations when the multiplication is

carried out on the same number using booths and direct multiplication are shown below:

1. X = 1 1 1 0 1 1 1 1 0 1 1 0

When using direct multiplication, the number of operations required would be 9

however let’s have a look at booths algorithms application to this:

1 1 1 0 1 1 1 1 0 1 1 0

- + - + - +

From this we can see that it will only take 6 operations using the booths algorithm.

2. X = 1 0 1 0 0 1 0 1 0 1 0 1

When using direct multiplication, the number of operations required would be 6

however let’s have a look at booths algorithms application to this:

1 0 1 0 0 1 0 1 0 1 0 1

- + - + - + - + - + -

From this we can see that it will take 11 operations using the booths algorithm.

This shows that the pattern of the binary digits determines the number of operations and therefore the speed. Booth's algorithm is only beneficial when the multiplier contains long runs of consecutive 1's; when 0's and 1's alternate, as in the second example, it needs more operations than direct multiplication, which is a shortcoming of this technique. Improvements have been suggested, however, and are discussed in the following section.

3.1.3 Modified Booth Algorithm

The Modified Booth's algorithm is considerably faster than the normal Booth's algorithm, almost twice as fast. It groups consecutive bits of one of the two operands (the multiplier) to form signed multiples, which decreases the total number of partial products and ultimately increases the efficiency of the operation. The most popular modified Booth's algorithm is known as Radix-4, which scans the multiplier in strings of 3 bits as follows:

1. Firstly, in order to ensure that n is even, the sign bit should be extended by one position if need be.

2. Secondly, a '0' is appended to the right of the multiplier's least significant bit.

3. Thirdly, the value of each 3-bit group determines what the partial product will be. The partial product can only take the values 0, +X, -X, +2X, -2X, where X is the multiplicand. The bits are grouped in threes so that each group overlaps the previous group by one bit. This process starts from the least significant bit, and only 2 bits of the multiplier are used for the first group of 3, the third being the appended '0'. Figure 15 below shows the functional operations of the Booth encoder for Radix-4.

Figure.15: Radix-4 Booth encoding table

An example of Radix-4 Booths algorithm is shown below in Figure 16 for the

multiplication of -73 (10110111) and 90 (01011010):

Figure.16: Radix-4 example
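The Radix-4 recoding can be sketched in software as shown below. The sketch assumes the standard Radix-4 Booth encoding, in which each overlapping group (b2k+1, b2k, b2k-1) is mapped to the digit -2·b2k+1 + b2k + b2k-1; the function names are hypothetical and the code is a model of the arithmetic, not of the hardware in the figures.

```python
def radix4_booth_digits(y, n):
    """Recode a signed n-bit multiplier (n even) into n/2 digits in {-2..+2}."""
    bits = [(y >> i) & 1 for i in range(n)]   # two's complement bits, LSB first
    prev = 0                                  # the '0' appended right of the LSB
    digits = []
    for k in range(0, n, 2):
        digits.append(-2 * bits[k + 1] + bits[k] + prev)
        prev = bits[k + 1]                    # groups overlap by one bit
    return digits


def radix4_booth_multiply(x, y, n):
    """Sum the n/2 partial products 0, +-X or +-2X, each weighted by 4**k."""
    return sum(d * x * 4 ** k for k, d in enumerate(radix4_booth_digits(y, n)))


print(radix4_booth_digits(90, 8))
print(radix4_booth_multiply(-73, 90, 8))
```

Only n/2 partial products are formed, which is where the speed advantage over the radix-2 algorithm comes from.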

3.2 Recoding Multipliers

In direct multiplication, as discussed, the multiplicand must be added for each multiplier digit that has the value 1, so n operations are required for a multiplier that has n digits with the value 1. The primary objective of recoding multiplication algorithms is to simplify the system by reducing the number of operations required, regardless of how many digits have the value 1. The uniform shift of one and the uniform shift of two are the two methods by which recoding multipliers carry out their operations. The premise of recoding multipliers is similar to that of Booth's algorithm, i.e. both addition and subtraction operations are carried out, based on the following two fundamental rules:

1. The addition of a multiplicand that has been shifted i positions = the subtraction of that multiplicand plus the addition of a multiplicand that has been shifted i+1 positions, so the two can be interchanged.

2. The addition of a multiplicand that has been shifted i+1 positions = two additions of a multiplicand that has been shifted i positions, so the two can be interchanged.
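Written as identities, with X standing for the multiplicand (a symbol used here only for illustration), the two rules above read:

```latex
2^{i} X = 2^{i+1} X - 2^{i} X
\qquad\text{and}\qquad
2^{i+1} X = 2^{i} X + 2^{i} X
```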

3.2.1 Uniform Shift of One method

For a multiplier Y, the recoding of bit yi depends, as discussed before, on the values of yi and yi+1. The final recoding also depends on the result of recoding bit yi-1. The operation fi that is performed as a result of the recoding principles mentioned above is summarized in Figure 17:

Operation required    yi    yi+1    fi

None                  0     0       0
None                  0     1       0
Addition              1     0       1
Subtraction           1     1       -1

Figure .17: Operation Rules when depending on yi

*note: The above situations only occur when the recoding result of yi-1 has no bearing on the recoding of yi.

From the above table, in the last scenario, when both digits have the value 1, instead of carrying out the addition of the multiplicand shifted i positions, a subtraction of that multiplicand is carried out plus an addition of the multiplicand shifted i+1 positions. Together with the addition already required by yi+1, this gives two additions of the multiplicand shifted i+1 positions, which can be replaced by a single addition of the multiplicand shifted i+2 positions. If yi+2 has the value zero, the number of operations remains the same: the two additions for yi and yi+1 are replaced by an equal number of operations, namely an addition of the multiplicand shifted i+2 positions and a subtraction of the multiplicand shifted i positions. The number of operations is only reduced if yi+2 also has the value 1. In this case, the two additions that would be needed at position i+2 are replaced by a single addition of the multiplicand shifted i+3 positions, and the same applies again if yi+3 has the value 1 too. As long as this run of 1's continues, the total number of operations required decreases drastically, which enhances the simplicity of the multiplier.

The carry propagation theory captures the impact that the recoding of one bit has on the bits that follow. The pseudo carry (ci) is defined as the operation that is pushed forward to position yi as a result of the recoding applied to yi-1. It takes the value 1 if an addition operation is pushed forward and 0 otherwise. Figure 18 gives a detailed description of the operations required during recoding.

yi    yi+1    ci    ci+1    fi

0     0       0     0       0
0     0       1     0       1
1     0       0     0       1
1     0       1     1       0
0     1       0     0       0
0     1       1     1       -1
1     1       0     1       -1
1     1       1     1       0

Figure .18: Operation Rules when depending on yi and carry variable is considered
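The rows of Figure 18 all satisfy yi + ci = fi + 2·ci+1, so the recoding can be modelled with a few lines of code. The sketch below assumes an unsigned n-bit multiplier that is extended with two zero bits on the left; the function name is hypothetical.

```python
def recode_shift_of_one(y, n):
    """Recode an unsigned n-bit multiplier into digits f_i in {-1, 0, +1}."""
    bits = [(y >> i) & 1 for i in range(n)] + [0, 0]   # assume y_n = y_{n+1} = 0
    c = 0                                              # pseudo carry c_0
    digits = []
    for i in range(n + 1):
        yi, yi1 = bits[i], bits[i + 1]
        c_next = (yi & yi1) | (yi1 & c) | (yi & c)     # pseudo carry c_{i+1}
        digits.append(yi + c - 2 * c_next)             # f_i as in Figure 18
        c = c_next
    return digits


digits = recode_shift_of_one(0b0111, 4)
print(digits)
print(sum(f * 2 ** i for i, f in enumerate(digits)))
```

For the multiplier 0111 the three additions of direct multiplication become one subtraction and one addition, and the weighted sum of the recoded digits is still 7.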

For the uniform shift of one operation, it is important to introduce two additional binary variables, namely fi1, which indicates whether the operation is an addition or a subtraction, and fi0, which indicates whether or not an operation takes place. The value assignment for these variables is summarized in Figures 19 and 20 below:

fi0 value    Indication

1            Operation is required
0            No operation required

Figure .19: Operations rules for fi0

fi1 value    Indication

1            Subtraction operation required
0            Addition operation required

Figure .20: Operations rules for fi1

Incorporating these more elaborately defined operation variables into Figure 18, the following recoding table is obtained:

yi    yi+1    ci    ci+1    fi0    fi1

0     0       0     0       0      N/A
0     0       1     0       1      0
1     0       0     0       1      0
1     0       1     1       0      N/A
0     1       0     0       0      N/A
0     1       1     1       1      1
1     1       0     1       1      1
1     1       1     1       0      N/A

Figure .21: Operation rules for Uniform Shift of One method

A few equations and relationships can be derived on the basis of the above information. It can be seen that fi1 takes the same value as yi+1 whenever fi0 has the value 1. The others are as follows:

$$f_{i0} = c_i \oplus y_i \qquad (18)$$

$$c_{i+1} = y_i \cdot y_{i+1} + y_{i+1} \cdot c_i + y_i \cdot c_i \qquad (19)$$
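These relationships can be checked against the recoding table of Figure 21 with a short script. The sketch assumes that the operator in equation 18 is an exclusive OR, since that is the reading that reproduces every row of the table; the names used are hypothetical.

```python
# Rows of Figure 21: (y_i, y_i+1, c_i) -> (c_i+1, f_i0, f_i1); None marks N/A
FIGURE_21 = {
    (0, 0, 0): (0, 0, None),
    (0, 0, 1): (0, 1, 0),
    (1, 0, 0): (0, 1, 0),
    (1, 0, 1): (1, 0, None),
    (0, 1, 0): (0, 0, None),
    (0, 1, 1): (1, 1, 1),
    (1, 1, 0): (1, 1, 1),
    (1, 1, 1): (1, 0, None),
}


def recoding_logic(yi, yi1, ci):
    """Evaluate equations 18 and 19 for one bit position."""
    f0 = ci ^ yi                                   # equation 18 (XOR assumed)
    f1 = yi1 if f0 else None                       # f_i1 follows y_i+1 when f_i0 = 1
    c_next = (yi & yi1) | (yi1 & ci) | (yi & ci)   # equation 19
    return c_next, f0, f1


all_match = all(recoding_logic(*key) == row for key, row in FIGURE_21.items())
print(all_match)
```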

In conjunction with the above equations, Figure 22 shows the design of a uniform shift by

one recoding bit multiplier, the sequence in which it passes the logic gates, logic circuits and the

flow of operations from start to end.

Figure 22: Uniform shift by one recoding bit multiplier

3.3.2 Uniform Shift of Two method

As discussed previously, the uniform shift of one method improves the speed of the multiplier because it reduces the total number of operations in most situations. However, there is more room for speed improvement if multiple multiplier bits are scanned in each cycle. Scanning two multiplier bits simultaneously could reduce the number of operations by half, and scanning three multiplier bits could reduce it to a third. This is known as uniform shift of multiples. For this project, only the uniform shift of two is discussed, which includes non-overlapped scanning and overlapped scanning.

3.3.2.1 Non-Overlapped Scanning

In this method, the multiples selected by each group of multiplier bits are added to the partial product. A better way to visualize this is through an example where an even word length of n = 2M is assumed, where M is the number of bits examined at a time. For this example, M is 2, which means that there are a total of 4 bits. The two least significant bits, y1 and y0, are scanned, and there are four possible routes that could be taken depending on their values. The sums that result from these operations contain more than n bits, because the multiples are shifted copies of the multiplicand. The following rules determine those operations:

1. If both y1 and y0 have the value 0, then no multiples are added.

2. If y1 has the value 0 and y0 has the value 1, then the multiplicand X is added to the partial product.

3. If y1 has the value 1 and y0 has the value 0, then a multiple of the multiplicand X (in this case 2X) is added to the partial product. 2X means that the multiplicand X is shifted one place to the left.

4. If both y1 and y0 have the value 1, both X and 2X are added to the partial product.
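The four rules above can be modelled as follows. This is a minimal sketch assuming unsigned operands and an even word length n; the function name is hypothetical, and the weighting by 2^(2·cycle) stands in for shifting the partial product two positions to the right in every cycle.

```python
def multiply_two_bits_per_cycle(x, y, n):
    """Unsigned multiply that scans two multiplier bits per cycle (n even)."""
    acc = 0
    for cycle in range(n // 2):
        y0 = (y >> (2 * cycle)) & 1
        y1 = (y >> (2 * cycle + 1)) & 1
        # rules 1 to 4: add nothing, X, 2X, or both X and 2X
        multiple = (x if y0 else 0) + ((x << 1) if y1 else 0)
        acc += multiple << (2 * cycle)
    return acc


print(multiply_two_bits_per_cycle(13, 11, 4))  # 143
```

Only n/2 cycles are needed, but the case y1 = y0 = 1 still requires two multiples to be added in the same cycle.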

Broadly speaking, multiplying a number by a factor of 2^i is equivalent to shifting it to the left by i positions, with zero digits coming in from the right. The following equations capture this relationship, where j is the value represented by y1 and y0 and T ranges from 0 to k.

$$j \times B = \sum_{T=0}^{k} b_T \times 2^{T} \times B \qquad (20)$$

$$j \times B = 2^{1} \times B = 2B \qquad (21)$$

Following this addition operation, the accumulator and the multiplier are both shifted together, as if they were one unit, by 2 positions to the right as per the following equation:

$$\mathrm{Acc} \times y = \mathrm{Carry}_{out} + S_n \ldots S_0 \,.\, y_{n-1} \ldots y_2 \qquad (21)$$

A carry save adder and a carry propagate adder, which were discussed in Chapter 2, are implemented in a non-overlapped scanning multiplier as per Figure 23. In the flow of operations shown, two inputs of the carry save adder are reserved for the X and 2X multiples. These multiples are generated by gating X and 2X through AND gates with the multiplier bits y0 and y1 respectively. The third input of the carry save adder is the existing partial product from the accumulator. The carry propagate adder receives the two carry save adder outputs as its inputs and generates the final sum. The newly generated partial product is shifted two positions to the right after every iteration.

Figure .23: Non-overlapped multibit scanning multiplier

3.3.2.2 Overlapped Scanning

Through this technique it is possible to drastically reduce the number of multiples of the multiplicand that must be added, which in turn decreases the total number of operations. The main idea behind this algorithm is recoding by pairs, or overlapped scanning. The multiplier is split into groups of paired bits, and only one of these groups is scanned at a time. As with other multipliers, either no operation happens or an addition or subtraction takes place. The multiplicand in the addition or subtraction operations is a power-of-2 multiple (2 times X, 4 times X, etc.). The multiple is obtained by shifting the multiplicand from its position of entry in the adder to the left by either 1 position or 2 positions relative to the reference bit, which is the low-order bit of the group. After this, the partial product that is obtained is shifted 2 positions to the right, as is the multiplier.

yi+1    yi    Supposed to Add    Actually Added

0       0     No operation       No operation
0       1     +X                 +2X
1       0     +2X                +2X
1       1     +3X                +4X

Figure .24: Operation Rules for recoding by pairs

The above figure summarizes the rules for recoding by pairs. It implies that if the first bit has the value 1, an error of X is incurred in the partial product, which is corrected when the preceding pair is processed and 4 times X is subtracted from the partial product, where X is the multiplicand. The final set of modified rules is shown in the figure below:

yi+2    yi+1    yi    operation

0       0       0     No operation
0       0       1     +2X
0       1       0     +2X
0       1       1     +4X
1       0       0     -4X
1       0       1     -2X
1       1       0     -2X
1       1       1     No operation

Figure .25: Modified Operation Rules for recoding by pairs

The least significant bit is assumed to have a value of zero during the computation, so the initial partial product is zero. If the least significant bit has the value 1, then the initial partial product is equal to the multiplicand. The speed with which this algorithm carries out operations, especially for larger bit widths, is precisely why it is preferred for this project.
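The modified rules of Figure 25 can be tried out with the sketch below. It assumes that the reference bit y0 is 0, as stated above, that the multiplier is given as a signed (n+1)-bit value with n even, and that each multiple is weighted relative to the low-order bit of its group; the names are hypothetical.

```python
# Figure 25: (y_i+2, y_i+1, y_i) -> multiple of X, relative to the weight of y_i
PAIR_RECODING = {
    (0, 0, 0): 0, (0, 0, 1): +2, (0, 1, 0): +2, (0, 1, 1): +4,
    (1, 0, 0): -4, (1, 0, 1): -2, (1, 1, 0): -2, (1, 1, 1): 0,
}


def multiply_by_pairs(x, y, n):
    """Multiply x by a signed (n+1)-bit y (n even) whose bit y_0 is 0."""
    bits = [(y >> i) & 1 for i in range(n + 1)]
    if bits[0] != 0:
        raise ValueError("this sketch assumes the reference bit y_0 is 0")
    acc = 0
    for i in range(0, n, 2):                     # one overlapping group per pair
        multiple = PAIR_RECODING[(bits[i + 2], bits[i + 1], bits[i])]
        acc += multiple * x * (1 << i)
    return acc


print(multiply_by_pairs(234, 0b001101100, 8))  # 25272
```

For an 8-bit pass only four groups are examined, and each group contributes at most one addition or subtraction.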

3.4 Implementing the Overlapped Scanning multiplier algorithm

The main components of this system are the shifting and complementing circuits, the adder, the accumulator, the multiplier register and the decoder, as shown in Figure 26. Four control signals (S1 – S4) are generated when the three least significant bits of the multiplier are decoded.

The first of these control signals, S1, is the operation control signal that is input to the adder. It has the task of enabling or disabling the results from the shifting and complementing circuits. The S2 signal is the addition/subtraction control signal which, as the name suggests, indicates which of the two operations is to be performed. S3 is the signal that shifts the bits one position to the left (one-bit shift), while S4 is the signal that shifts the bits two positions to the left (two-bit shift). The decoding operations take place as per Figure 27.

Figure .26: Schematic for recoding by multiplier pairs

yi+2    yi+1    yi    operation    S1    S2     S3     S4

0       0       0     0            0     N/A    N/A    N/A
0       0       1     +2X          1     0      1      0
0       1       0     +2X          1     0      1      0
0       1       1     +4X          1     0      0      1
1       0       0     -4X          1     1      0      1
1       0       1     -2X          1     1      1      0
1       1       0     -2X          1     1      1      0
1       1       1     0            0     N/A    N/A    N/A

Figure .27: Decoding Operations Truth Table

From this we can infer that S1 takes the value 1 whenever the three bits are not all equal, and 0 if all three bits are 0 or all three are 1. S2 always takes the same value as yi+2, provided that the S1 signal has the value 1. S3 takes the value 1 if exactly one of the two bits yi+1 and yi is 1, and 0 if both bits are 0 or both are 1; S4 takes the opposite value to S3.
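The truth table of Figure 27 can be expressed as a small function. This is a software sketch of the decoding rules only, with a hypothetical function name; None is used here to represent the N/A entries.

```python
def decode(yi2, yi1, yi):
    """Return (S1, S2, S3, S4) for one 3-bit group; None marks N/A."""
    s1 = 0 if yi2 == yi1 == yi else 1   # 0 only when all three bits are equal
    if not s1:
        return 0, None, None, None
    s2 = yi2                            # 1 selects subtraction
    s3 = yi1 ^ yi                       # one-bit shift left (2X)
    s4 = 1 - s3                         # two-bit shift left (4X)
    return s1, s2, s3, s4


for group in [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0), (1, 1, 1)]:
    print(group, decode(*group))
```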

Chapter 4- Logic Circuits and Modules

Several modules and circuits have been used to carry out this project. These include decoders, control gates, left-bit shifters, right-bit shifters, an accumulator and complementing circuits.

4.1 Decoders

A decoder is a circuit that converts a code into a set of signals, which is essentially the reverse of encoding. It consists of logic gates such as AND, OR and XOR that take the inputs and generate a certain number of control signals. In this project, the inputs are yi+2, yi+1 and yi, which generate 4 signals (S1 – S4). A schematic of the decoder used for this project, with its components, inputs and outputs, is shown in Figure 28.

Figure .28: Gate designs in a recoding by pairs decoder

4.2 Control Gates

Control Gates are memoryless circuits which generate an output solely based on the

combination of their inputs which can be 0 or 1 at a given time. They have no feedback and any

change to the signals being fed to them will instantaneously alter the output signals too. The control

gate used in this project includes a chain of AND gates which works on the following principle: if

the two inputs to an AND gate are 1, only then will the output signal be 1 otherwise it will be zero.

In the circuit implemented for this project, there is a 34-bit control gates circuit which receives

input from the decoder and from the complementing circuit’s output.

4.3 Complementing circuits

The complementing circuit used in the project is a chain of XOR (Exclusive OR) gates that operate on the following principle: each gate receives two inputs and produces one output, their exclusive disjunction. If exactly one of the input signals has the value 1, the output signal is 1, but if both are 0 or both are 1, the output signal is 0. In this project, a 34-bit complementing circuit is used, as shown in Figure 29, which receives the signal S2 from the decoder and the output of the left-bit shifting circuit.

Figure .29: 34-bit Complementing Circuit

4.4 Accumulator

An accumulator is a register that serves as temporary, intermediate storage for the arithmetic and logic results of a computer's CPU. If accumulators did not exist, it would be necessary to copy each computation and result to main memory, which would be very time consuming, since accessing main memory over and over again to read the results is much slower than reading them from an accumulator because of the controller overhead involved in reading and writing memory elements.

The primary purpose of the accumulator register in this project is to accumulate the intermediate results. The value in the accumulator is initially zero and is updated as numbers enter it from the CLA unit. Once all the numbers have gone through the necessary operations, the result is held in the accumulator and the multiplier register. In Figure 30 below, it can be seen that the extension bits (E) and load_acc, the binary control input of the accumulator, are fed into the accumulator, whose contents ultimately exit to the 32-bit multiplier register Q. If the load_acc signal is 1, the incoming data is loaded into the accumulator; otherwise the value in the accumulator stays as it is.

[Figure 30 diagram: extension bits E and the Accumulator, with 2-bit paths, an output to the Q register and the Load_acc input]

Figure .30: Accumulator Unit

4.5 Shift Left Register

A shift left register is a circuit that moves the data towards the left by either one or two positions, so that the output is a power-of-2 multiple of the multiplicand. This register is enabled by the S3 (one-bit shift left) and S4 (two-bit shift left) control signals. Figure 31 below shows the shift left register circuit used in the 32-bit multiplier. The 32-bit input is denoted by A32. As discussed previously, the control signal S3 comes from the decoder; if it has the value 1 the shifting circuit shifts the input to the left by one bit position, and if it has the value 0 no one-bit shift occurs. Similarly, if control signal S4 has the value 1 the shifting circuit shifts the input to the left by two bit positions, and if it has the value 0 no two-bit shift occurs. In both cases the output is a total of 34 bits, which serves as the input for the complementing circuit discussed in section 4.3.

Figure .31: Shift left register

The very first least significant bit in the figure above is set at 0 while the others are

dependent on the combination of the inputs received and the control signals.
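The chain formed by the shift left register, the complementing circuit of section 4.3 and the control gates of section 4.2 can be modelled bit-wise as below. This is a behavioural sketch with hypothetical function names, not the gate-level design; it assumes that the complementing circuit produces a one's complement and that the extra 1 needed for a two's complement subtraction is supplied elsewhere, for example as a carry-in to the adder.

```python
WIDTH = 34
MASK = (1 << WIDTH) - 1


def shift_left(a32, s3, s4):
    """Shift the 32-bit input left by one (S3) or two (S4) positions."""
    if s3:
        return (a32 << 1) & MASK
    if s4:
        return (a32 << 2) & MASK
    return a32 & MASK


def complement(word, s2):
    """Chain of XOR gates: every bit is XORed with S2."""
    return word ^ (MASK if s2 else 0)


def control_gate(word, s1):
    """Chain of AND gates: every bit is ANDed with S1."""
    return word & (MASK if s1 else 0)


def recoded_operand(a32, s1, s2, s3, s4):
    return control_gate(complement(shift_left(a32, s3, s4), s2), s1)


print(format(recoded_operand(5, 1, 0, 0, 1), "034b"))   # +4X for X = 5
print(format(recoded_operand(5, 1, 1, 1, 0), "034b"))   # one's complement of 2X
```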

4.6 Shift Right Register

A shift register is required in order to carry out two fundamental tasks: storing the data and

moving it subsequently. It consists of a group of flip-flops that each stores a single binary bit and

then shifts that data from one flip-flop to the other within itself or outside it. A shift right register

moves the bits towards the right, one or multiple bits at a time in the direction of the control signal.

The multiplier register and the accumulators behave as the shift right registers in this project and

they move two bits in each transition.

Figure 32: Detailed circuit for recoding by multiplier pairs

Chapter 5: Designing the Multiplier

A hierarchical modeling methodology has been applied in this project in order to design a top-module multiplier. A carry propagate adder, three levels of carry save adders and 4 recoding logic modules are combined to form the high speed recoding multiplier, together with an accumulator and a multiplier register. This multiplier operates much faster than one based on a ripple-carry adder, due to the extensive CLA circuitry deployed along with considerations for propagation delays.

5.1 Logical Circuits Schematic

The gate-level logic circuit schematic of the multiplier and the block diagram of the recoding logic components are shown in Figures 33 and 34 respectively; they essentially consist of shifting, logic and complementing circuits. The figures assume that 8 bits are recoded at a time for a multiplicand of 8 bits, hence a total of 17 bits is generated, including the sign bit. This can be understood better if the 17 bits are seen as 10 output bits, one sign bit and sign extension bits for the rest. The results generated depend on the three least significant bits of the multiplier. The required number of concurrent recoding logic components is 4, since the multiplier is split into 8 bits in each component, which adds up to 32 bits in total. They also play an important part in the 32-bit multiplication process. Four control signals are generated by the decoder and become the inputs of the recoding logic components. The width of each of the output signals can be generalized by the following equation, where x is the module number, n is the number of bits and m is the number of operands:

$$\text{Width of } x^{\text{th}} \text{ module output} = (n + 2m)_x + 1 \text{ sign bit} \qquad (22)$$

$$\text{e.g. Width of } x^{\text{th}} \text{ module output with 2 operands} = (n + 4)_x + 1 \text{ sign bit}$$

Figure .33: Internal resistor-transistor logic circuit schematic of the multiplier

Figure .34: Block diagram of the recoding logic component circuits

From the above figures and details, it can be seen that 4 recoding logic components and 4 operands are necessary in order to recode 8 bits at a time. Each of the components generates a total of n+8 bits plus one sign bit. The output of each of the recoding logic components is characterized in the figure below:

Module    Output

1st       6 bit sign extension + 1 sign bit + (x+2) (exiting from the control gates)
2nd       4 bit sign extension + 1 sign bit + (x+2) (exiting from the control gates)
3rd       2 bit sign extension + 1 sign bit + (x+2) (exiting from the control gates)
4th       1 sign bit + (x+2) (exiting from the control gates) + 6 bits

Figure .35: Outputs of each recoding logic component

5.2 8-bit Multiplication Model

It has already been discussed how the recoding logic components each produce a total of 17 bits including the sign bit. The outputs of the first three are fed to the first carry save adder, after which the 17-bit sum and carry vectors are obtained and stored so that they can serve as inputs for the next carry save adder. Simultaneously, the 4th recoding logic module generates its 17 bits and feeds them to the next carry save adder together with the output from the accumulator. The result is then fed to the carry look-ahead adder, and its output is subsequently fed to the accumulator, which carries it to the third carry save adder. This entire process is depicted in Figure 36. Since 8 bits are recoded at a time, the 8 most significant bits of the output are in the accumulator and the 8 least significant bits are in the multiplier register at the end of the 4 transition cycles.

Figure .36: 8-bit multiplication unit using Recoding by pair's algorithm

From the figure, it can be seen that the ALOAD signal controls the 8 least significant bits

that go into the multiplier register. The first three of these bits are fed to the decoder which then

outputs the 4 control signals S1-S4. The recoding logic components generate the 17bits once they

obtain the 8 bits that are loaded in the multiplier register which then serve as the input for the adder

component in order to give the final output. In the figure above, S21 serves as the input for the

second carry save adder along with S22, S23 and S24. In order to better understand the workings of

the multiplier, an example will be discussed.

X (8 bit multiplicand) = 11101010 (234), Y (9 bit multiplier) = 001101100 (108)

1st Transition Cycle: The 1st decoder takes in the y0-y2 bits, the 2nd decoder takes in the y2-y4 bits, the 3rd decoder takes in the y4-y6 bits and the 4th decoder takes in the y6-y8 bits, and the control signals are generated accordingly.

1st Decoder: y0 = 0, y1 = 0 and y2 = 1, which produces the control signals S1 = 1, S2 = 1, S3 = 0 and S4 = 1. These control signals indicate that 4 times the multiplicand is to be subtracted from the partial product. The multiplicand is shifted to the left by two bit positions and then complemented, which gives the output 11111110001010111. The sign extension bits are the first 6 bits, which are a replication of the sign bit that results from gating the S1 and S2 control signals of the corresponding decoder through the AND gates.

2nd Decoder: y2 = 1, y3 = 1 and y4 = 0, which produces the control signals S1 = 1, S2 = 0, S3 = 0 and S4 = 1. These control signals indicate that 4 times the multiplicand is to be added to the partial product. The multiplicand is shifted to the left by two bit positions and passes through the complementing circuit unchanged (S2 = 0), which gives the output 00000111010100000. The sign extension bits are the last 2 bits and the first 4 bits, which are a replication of the sign bit.

3rd Decoder: y4 = 0, y5 = 1 and y6 = 1, which produces the control signals S1 = 1, S2 = 1, S3 = 1 and S4 = 0. These control signals indicate that 2 times the multiplicand is to be subtracted from the partial product. The multiplicand is shifted to the left by one bit position and then complemented, which gives the output 11110001010111111. The sign extension bits are the last 4 bits and the first 2 bits, which are a replication of the sign bit.

4th Decoder: y6 = 1, y7 = 0 and y8 = 0, which produces the control signals S1 = 1, S2 = 0, S3 = 1 and S4 = 0. These control signals indicate that 2 times the multiplicand is to be added to the partial product. The multiplicand is shifted to the left by one bit position and passes through the complementing circuit unchanged (S2 = 0), which gives the output 00111010100000000. The sign extension bits are the last 6 bits, which are a replication of the sign bit.
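The four 17-bit decoder outputs quoted above can be reproduced with the short script below. It is a behavioural sketch only: the names are hypothetical, and it assumes that a subtraction is formed as a one's complement of the whole 17-bit word with one carry-in per subtraction added afterwards. How the hardware actually supplies that carry-in is not stated in this section.

```python
X = 0b11101010        # 234, multiplicand from the example
Y = 0b001101100       # 108, 9-bit multiplier from the example
WIDTH = 17
MASK = (1 << WIDTH) - 1


def recoded_word(x, y, module):
    """Return (17-bit word, subtract flag) for recoding module 0..3."""
    yi, yi1, yi2 = ((y >> (2 * module + k)) & 1 for k in range(3))
    s1 = 0 if yi == yi1 == yi2 else 1
    if not s1:
        return 0, 0
    s2, s3 = yi2, yi1 ^ yi
    # shift by one (S3) or two (S4) positions, then move to the module's weight
    word = (x << (1 if s3 else 2)) << (2 * module)
    if s2:
        word ^= MASK                      # one's complement for subtraction
    return word & MASK, s2


words = [recoded_word(X, Y, m) for m in range(4)]
for word, _ in words:
    print(format(word, "017b"))

# assumed: one carry-in per subtraction completes the two's complement
total = (sum(w for w, _ in words) + sum(s for _, s in words)) & MASK
print(total == X * Y)
```

Under these assumptions the four words, taken together with the two carry-ins, add up to 234 × 108 = 25272, which is the final multiplication result 110001010111000 reported for this example.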

The results from the first three recoding logic components are fed to the first carry save adder, after which the sum and carry vectors are obtained. The sum and carry vectors are stored so that they can serve as inputs for the next carry save adder, along with the output from the 4th recoding logic module. The addition in the second carry save adder then generates a new sum and carry vector, which are fed to the third carry save adder together with the output from the accumulator. The result is then fed to the carry propagate adder to get the final output 00110001010101111. The two least significant bits of the accumulator are shifted right into the multiplier register, where they become its two most significant bits. The two most significant bits of the accumulator are compared in order to determine which bits move into the two-bit extension register, which marks the end of the 1st transition cycle.

In the 2nd transition cycle, the control signals are generated by the 4 decoders after the three least significant bits of the multiplier register are compared. The results from the decoders and the final result of the 2nd transition cycle are given in the figure below:

Stage                          Output

1st Decoder                    00000001110101000
2nd Decoder                    11111100010101111
3rd Decoder                    00001110101000000
4th Decoder                    11000101011111111
Final Carry Propagate Adder    10011110101000100

Figure .37: Outputs for each level in the 2nd transition cycle

This final result is loaded onto the accumulator and moved to the right by two bit positions

towards the multiplier register. The same process is repeated for the 3rd and 4th transition cycles

and their results are as per the following figure:

Cycle                   Output

3rd Transition Cycle    01010110010001100
4th Transition Cycle    11100001011100111

Figure .38: Outputs for the 3rd and 4th transition cycles

The output that was loaded into the accumulator at the end of the 4th transition cycle was

00111000011100010 while the output in the multiplier register was 10111000. The final output is

determined by taking the 7 least significant bits from the accumulator and the 8 bits from the

multiplier register which puts the final multiplication result at 110001010111000.

5.3 Final Multiplier Design

The final design follows the same method as the one discussed in section 5.2. It is depicted in Figure 39, which shows the final design of the 32-bit high speed multiplier that uses the recoding by pairs algorithm. 4 transition cycles are required for the 8 bits of input in the example in 5.2, but in this final design the total number of cycles required is 16, as the number of bits is 32.

Figure .39: High speed multiplier of 32 bits which uses a recoding by pairs algorithm (Final

Design)

Chapter 6: DETAILS OF IMPLEMENTATION

6.1 Introduction

This chapter describes the implementation of Booth's algorithm on an FPGA, including the circuit diagram and the hardware components used to build the project. It guides the reader step by step through building the project. The circuit is powered by a 3.3 V, 500 mA power supply. The FPGA board used for the project is the Arty Z7.

6.2 List of Components

Items                  Qty
Arty Z7                1
7-segment display      4
NPN transistor         16
Resistor, 470 ohms     16
Wires                  40
Breadboard             1

Figure 40: Table of components

6.3 Description of Hardware Components

6.3.1 Four-Digit Seven-Segment Display

The display used is a 4-digit 7-segment display. Pins 1, 2, 6 and 8 are the common anodes.
Pins 14, 16, 13, 3, 5, 11, 15 and 7 are the pins corresponding to the LED segments.

Figure 41: the common anode 7-segment

6.3.2 Transistors:

An NPN transistor is used in the project to connect each common anode of the 7-segment display to
the positive supply. An NPN transistor was chosen because it avoids some of the base-to-emitter
voltage drop, which can be as little as 100 mV.

The transistor is used here as a switch that controls the positive supply going to the display.
The base of the transistor is connected to the FPGA through a 470-ohm resistor, which is enough to
operate the transistor in both saturation and cut-off modes.

6.3.3 ArtyZ7

The Arty Z7 is a development kit designed around the Zynq-7000 from Xilinx. It combines a
dual-core, 650 MHz ARM Cortex-A9 processor with Xilinx 7-series Field Programmable Gate Array
(FPGA) logic. It is the core of the project, where the multiplication algorithm is implemented.

Figure 42: Arty Z7

6.4 Schematic Diagram

Figure 43: schematic of the circuit

Figure 44: circuit before connecting to the ArtyZ7

6.5 Functional Description of the Project:

The Arty Z7 is a programmable SoC (system on chip) board built around a dual-core ARM Cortex-A9
processor with a 650 MHz clock rate, which makes it a powerful platform. It also has four buttons
and two switches, which are used in this project to get user input. Switch 1 selects between input
one and input two; switch 2 selects the sign of the input chosen by switch 1. It sends a logic one
(high signal) to activate the digit on the seven-segment display. Switch 3 is assigned to change
the value of the input and perform the multiplication. Button 0 adds one to the first digit of the
four-digit seven-segment display. Each time it is pressed, it increments the value until it reaches
9 and then starts from zero again, by sending a high signal to the emitter of the transistor that
is connected to the digit. Button 1 controls the second digit and button 2 controls the third digit
of the input selected by switch 1. Button 3 is assigned to perform the multiplication.

Figure 45: circuit after connecting to the ArtyZ7

6.6 Software Description:

6.6.1 Introduction

After the circuit is designed, it is modeled in VHDL. Since the circuit does not require the use
of memory and the output depends only on the present input, the combinational circuit design
process is used for the implementation of the circuit. This section describes the implementation
of the various components and their use.

The code uses the standard packages and libraries from IEEE. The clock frequency is set to the
default of 100 MHz. The code consists of several components, including the signed multiplier, the
seven-segment display controller, the BCD display, the hex to seven-segment converters and the
signed to SLV converters. The following are the components in the code:

6.6.2 Signed Multiplier:

This component contains the implementation of Booth's algorithm. The signals input 1 and input 2
provide the two numbers to be multiplied. The clock signal synchronizes the component with the
other components in the circuit. The reset is always set to zero. When the start signal goes high,
the multiplication begins; when the multiplication is complete, the result is produced on the
product output.

Figure 46: Signed multiplier
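
The shift-and-add behaviour of this component can be illustrated in software. The Python sketch
below mirrors the register layout of the smult_1 listing in the appendix (A_reg, S_reg and P_reg,
each 2 * input_size + 1 bits wide), but it is a behavioural model only, not the synthesized
design. The function name booth_multiply is hypothetical. As in the classic form of the algorithm,
the model assumes that the multiplicand is not the most negative representable value.

```python
def booth_multiply(input_1, input_2, input_size=8):
    """Radix-2 Booth multiplication of two signed input_size-bit integers."""
    width = 2 * input_size + 1
    reg_mask = (1 << width) - 1
    in_mask = (1 << input_size) - 1

    # A holds +multiplicand and S holds -multiplicand in the upper bits.
    a_reg = (input_1 & in_mask) << (input_size + 1)
    s_reg = (-input_1 & in_mask) << (input_size + 1)
    # P holds the multiplier above one extra bit that is initially 0.
    p_reg = (input_2 & in_mask) << 1

    for _ in range(input_size):
        pair = p_reg & 0b11  # the two least significant bits of P
        if pair == 0b01:
            p_reg = (p_reg + a_reg) & reg_mask
        elif pair == 0b10:
            p_reg = (p_reg + s_reg) & reg_mask
        # Arithmetic right shift by one position (the sign bit is replicated).
        sign = p_reg >> (width - 1)
        p_reg = (p_reg >> 1) | (sign << (width - 1))

    product = p_reg >> 1  # drop the extra least significant bit
    if product >= 1 << (2 * input_size - 1):
        product -= 1 << (2 * input_size)  # reinterpret as a signed value
    return product
```

For example, booth_multiply(3, -4, 4) walks through four add/shift steps and returns -12.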

6.6.3 Signed_to_SLV:

Following the multiplier is the signed to SLV component. It generates the signal that the segment
converter uses to display the sign of a number. There are three signed to SLV converters, one each
for A, B and the result. Each one looks at its number and generates a 1 if the number is less than
zero. The output goes to a multiplexer, which generates either '0111111' or '1111111'. These
signals are stored in a register and passed on to the segment converter for display.

Figure 47: signed to std logic vector converter

Figure 48: signed to SLV details circuit
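
The sign handling described above can be sketched as follows. This Python fragment is an
illustration only: it assumes that the converter outputs the magnitude of the number alongside the
negative flag, which is not confirmed by the text, and the names signed_to_slv and sign_pattern
are hypothetical. The two 7-bit patterns are the ones quoted above.

```python
def signed_to_slv(value, bit_depth=12):
    """Return (magnitude, neg) for a signed integer; neg is 1 when value < 0."""
    neg = 1 if value < 0 else 0
    # Assumption: the unsigned output carries the absolute value of the input.
    magnitude = abs(value) & ((1 << bit_depth) - 1)
    return magnitude, neg


def sign_pattern(neg):
    """Multiplexer stage: choose the 7-bit pattern sent to the sign digit."""
    return "0111111" if neg else "1111111"
```

For an input of -5 the model yields a magnitude of 5 with the flag set, so the sign digit receives
'0111111'; for a non-negative input it receives '1111111'.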

6.6.4 BCD Display:

Next follows the BCD display component. This component is a binary to BCD converter, which
converts the 12-bit binary input from the signed to SLV converter into 4-bit BCD signals. It works
by shifting bits from one shift register to another, starting with the MSB. There are three binary
to BCD converters in the code, one each for input A, input B and the result.

Figure 49: BCD display
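
A common way to build such a shift-based converter is the shift-and-add-3 ("double dabble")
method. The Python sketch below assumes that this is the method used; the source of the
binary_bcd component is not reproduced in this section, so the add-3 correction step and the
function name binary_to_bcd are assumptions made for illustration.

```python
def binary_to_bcd(value, n_bits=12, n_digits=4):
    """Convert an unsigned integer to BCD digits; digits[0] is the ones digit."""
    digits = [0] * n_digits
    for i in range(n_bits - 1, -1, -1):  # feed the input bits in, MSB first
        # Correction step: any digit of 5 or more gets 3 added before the shift.
        digits = [d + 3 if d >= 5 else d for d in digits]
        carry = (value >> i) & 1
        # Shift the whole BCD register left by one bit, rippling the carry upward.
        for k in range(n_digits):
            shifted = (digits[k] << 1) | carry
            digits[k] = shifted & 0xF
            carry = shifted >> 4
    return digits
```

With the defaults, an input of 999 yields the digits 9, 9, 9 and 0 (ones digit first), which are
the values a set of 4-bit BCD outputs would carry.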

6.6.5 Hex_to_7_Seg:

The hex to seven-segment converter takes the hex input from the BCD display and converts it to
the seven-segment output, which is fed into the segment controller. There are a total of 13 hex to
7-segment converters in the code. Each one consists of predefined outputs for the set of inputs.

Figure 50: Hex to 7 segment component
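
A lookup of this kind can be sketched in Python as shown below. The bit order (segment g first,
segment a last) and the active-low polarity are assumptions inferred from the minus-sign pattern
'0111111' used for the sign digit; the actual table inside Hex_to_7_Seg is not reproduced in this
section and may differ. Only the decimal digits produced by the BCD converter are covered, and the
names SEGMENTS_LIT and hex_to_7_seg are hypothetical.

```python
# Segments that are lit for each decimal digit, using the usual a..g naming.
SEGMENTS_LIT = {
    0: "abcdef", 1: "bc", 2: "abdeg", 3: "abcdg", 4: "bcfg",
    5: "acdfg", 6: "acdefg", 7: "abc", 8: "abcdefg", 9: "abcdfg",
}


def hex_to_7_seg(digit):
    """Return a 7-character string in g..a order, where '0' means the segment is lit."""
    lit = SEGMENTS_LIT[digit]
    return "".join("0" if segment in lit else "1" for segment in "gfedcba")
```

Under these assumptions the digit 8 maps to '0000000' (every segment lit) and the digit 1 maps to
'1111001' (only segments b and c lit).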

6.6.6 Segment Controller:

This section of the code controls the output on the seven-segment display. The input signals are
the clock, the refresh rate and the digital inputs from the output of the multiplier; the output
connects directly to the pins of the seven-segment display. The segment refresh rate is set to
50 kHz. The segment controller works by toggling between the different pins of the seven-segment
display to display the digits. This section consists of several processes. They are:

The sign_proc process handles the sign to be displayed for A, B and the result, based upon the
value stored in the sign register. On every rising edge of the clock, it reads the value of the
sign register and produces the corresponding output to display the sign.

The add_sub_proc process accounts for the digits which are displayed for A and B. On every rising
edge of the clock, if one_edge_start is 1 then it increments the LSB of the output displays.
Similarly, if ten_edge_start is 1, it controls the center digit, that is, the digit in the tens
place. If hundred_edge_start is high, it controls the MSB.

The counter_proc process is a counter that creates the desired multiplexing rate and shifts the
toggle bits.

The toggle_proc process toggles between the various seven-segment displays by selecting the
appropriate display output based on the toggle bit.

Figure 51: segment controller
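
As a rough illustration of the multiplexing described above, the Python sketch below models a
divider counter and a rotating digit index. It assumes that the 50 kHz refresh rate is the rate at
which the controller steps from one digit to the next, which gives 100 MHz / 50 kHz = 2000 clock
cycles per digit. The source of the segment controller is not reproduced in this section, so this
interpretation and the names ticks_per_digit and scan_display are hypothetical.

```python
from itertools import islice


def ticks_per_digit(clock_hz=100_000_000, refresh_hz=50_000):
    """Number of clock cycles for which each digit stays selected."""
    return clock_hz // refresh_hz


def scan_display(patterns, ticks):
    """Yield (digit_index, pattern) once per clock cycle, cycling through the digits."""
    counter = 0
    toggle = 0
    while True:
        yield toggle, patterns[toggle]
        counter += 1
        if counter == ticks:  # the counter rolls over: select the next digit
            counter = 0
            toggle = (toggle + 1) % len(patterns)


# Example: two digits, each held for two clock cycles.
first_cycles = list(islice(scan_display(["1111001", "0000000"], 2), 5))
```

Only one digit is driven at any instant; the rotation is fast enough that all digits appear to be
lit at the same time.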

Chapter 7: SUMMARY AND CONCLUSION

In this project, a high-speed 32-bit multiplier is studied, designed, modeled, simulated and
implemented. The recoding-by-pairs algorithm is used to make the multiplier faster. CSAs are used
because the multiplication process involves the addition of multiple n-bit numbers. A two-level
CLA adder is modeled and designed to speed up the addition process. The two-level CLA adder unit
is used because it is approximately 50% faster than the one-level CLA adder unit.
The major goal of this project is to design a multiplier that operates much faster than a
regular multiplier. This multiplier performs high-speed 32-bit multiplication with moderate
complexity and high performance. However, various schemes and methodologies are presented; an
ideal selection would depend on the individual needs and requirements of the designer.
A few problems were encountered during the completion of the project, especially in the first
stages of simulation and synthesis. One by one, these problems were solved until the expected
results were obtained. Several root causes were identified along the way. The problems and their
corresponding solutions are described below.
First, conflicting multiplication results were obtained during simulation when certain
combinations of the multiplier triplets were examined. For instance, multiplier bits of the form
000 or 111 produced inaccurate results. It was observed that control signal S2 from the decoders
took the unknown value x, exactly as the decoder function table (Figure 27) indicated. This turned
out to be wrong: control signal S2 is as important as control signal S1 for the above-mentioned
combinations of multiplier triplets. To resolve this problem, the decoder function table was
modified and S2 was assigned a value of "0" for multiplier triplets such as 000 and 111. The
decoder was then re-designed and tested for functionality.

Second, also during simulation, the last 32 of the 64 bits of the final multiplication result
appeared in complemented form, that is, there were 1's in place of 0's and vice versa. The error
was caused by the right-shift input to the accumulator, which was always zero. Whenever the 2's
complement was performed on the multiplicand multiples, a one was needed as the right-shift input
to the accumulator. To resolve this issue, an AND gate was added to provide the correct shift
input. The accumulator MSB (acc[40]) and the constant "1" were used as inputs to the AND gate.
With this addition, the functionality of the multiplier was successfully verified.
Throughout synthesis, many difficulties were overcome. The most significant one was the
definition of constraints. Each step was performed a number of times until the right set of
constraints was found. The excellent debugging capabilities of both synthesis tools made it
possible to identify the critical paths in the design, and inspecting them gave the author a
better understanding of the way the synthesis tools use constraints.
Ultimately, the experience and knowledge gained while working on the project have been very
valuable. Although the design sacrifices some uniformity and cost, the recoding-by-pairs
multiplier was designed, modeled and simulated successfully, thus achieving the initial goals set
for the project.

References

Baugh, C.R. and Wooley, B.A., "A Two's Complement Parallel Array Multiplication Algorithm"

Bell Laboratories, 1973.

Booth, A.D., "A Signed Binary Multiplication Technique," 1951.

Brent, R.P. and Kung, H.T., "The Area time Complexity of Binary Multiplication," Journal of the

ACM, 1981.

Dr. Nagi, El naga. "ECE621 Lecture Notes", California state university, Northridge, 2009

Fenwick, P.M., "Binary Multiplication with Overlapped Addition Cycles," IEEE Trans. Comp.,

Vol.C-18, No.1, Jan. 1969.

Habiti, A and Wintz, P.A., "Fast Multipliers," IEEE Trans. Computers, Vol. C-19, No.4, Feb 1971.

Kai, Hwang, " Global Versus Modular Two's Complement Array Multipliers" IEEE Trans.

Computers, Vol. C- 28, No.4, Apr.2007.

Kai, Hwang, "Computer Arithmetic- Principles, Architecture and Design," New York: John wiley

and sons. inc., 1979.

Kamal, A.A and Ghanam, M., "High - Speed Multiplication Systems," IEEE Trans. Computers,

Vol. C-21, No.9, Sep. 1972.

Koren, Israel, "Computer Arithmetic Algorithms," second edition, 2005.

Lyon, R. F., "Two's Complement Pipeline Multipliers," IEEE Trans. Commun., com-24, Apr.

1976.

Mi, Lu, "Arithmetic and logic in computer systems" John Wiley and sons, Hoboken, NJ, c2004.

Morris, Mano. "Digital Design" Upper Saddle River, NJ: Prentice Hall, 2007.

Palnitkar, Samir, "Verilog HDL, a guide to Digital Design and Synthesis", Prentice Hall, NJ, 2008

Pezaris, S.D., "A 40ns 17-bit-by-17-bit Array Multiplier," IEEE Trans. Computers, Vol. C-20, No.4,

Apr. 1971.

Stenzel, W.J. et al., "A Compact High - Speed Multiplication Scheme," IEEE Trans. Computers,

Oct. 1977.

Appendix

1- VHDL

----------------------------------------------------------------------------------
-- Company:
-- Engineer: Rashed Alajmi
--
-- Create Date: 11/21/2018 11:35:06 AM
-- Design Name:
-- Module Name: top - Behavioral
-- Project Name:
-- Target Devices:
-- Tool Versions:
-- Description:
--
-- Dependencies:
--
-- Revision:
-- Revision 0.01 - File Created
-- Additional Comments:
--

-- ALU Top

-- Libraries
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

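-- numeric_std alone provides the signed type and the arithmetic used in this file;
-- the non-standard STD_LOGIC_UNSIGNED and std_logic_signed packages are not needed.
use IEEE.numeric_std.all;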

entity Booth_Top is
generic (
clock_frequency : integer := 100000000; -- Input clock rate in Hz (100 MHz default)
segment_refresh : integer := 50000); -- Refresh rate in Hz
port (
ja : out std_logic_vector(6 downto 0); -- seg out
dp : out std_logic;
digit_select : out std_logic_vector(15 downto 0);
one : in std_logic;
ten : in std_logic;
hundred : in std_logic;
start : in std_logic;
sel : in std_logic;
add_sub : in std_logic;
clk : in std_logic);
end Booth_Top;

architecture behavior of Booth_Top is

-------------------------------------------------------------------------------
-- COMPONENTS
-------------------------------------------------------------------------------
-- ALU

--component ALU_2
--generic (
-- bit_depth : integer := 8);
--port (
-- opcode : in std_logic_vector(2 downto 0);
-- A : in signed(2 * bit_depth - 1 downto 0);
-- B : in signed(2 * bit_depth - 1 downto 0);
-- execute : in std_logic;
-- result : out signed(2 * bit_depth - 1 downto 0));
--end component;

-- Signed Multiplier (Booths Algorithm)


component smult_1
generic (
input_size : integer := 8);
port (
product : out signed(2 * input_size - 1 downto 0);
data_ready : out std_logic;
input_1 : in signed(input_size - 1 downto 0);
input_2 : in signed(input_size - 1 downto 0);
start : in std_logic;
reset : in std_logic;
clk : in std_logic);
end component;

-- Seg Display
component Seg_Display_16

generic(
input_clk_freq : integer := 100000000; -- Input clock rate in Hz
refresh_rate : integer := 50000); -- Refresh rate in Hz
port(
-- 7 Segment Display Output
seg : out std_logic_vector(6 downto 0);
-- 7 Segment Display Decimal Point
dp : out std_logic;
-- Selects Digit
an : out std_logic_vector(15 downto 0);
-- Input segments 0 through 3
digit_1 : in std_logic_vector(6 downto 0);
digit_2 : in std_logic_vector(6 downto 0);
digit_3 : in std_logic_vector(6 downto 0);
digit_4 : in std_logic_vector(6 downto 0);
digit_5 : in std_logic_vector(6 downto 0);
digit_6 : in std_logic_vector(6 downto 0);
digit_7 : in std_logic_vector(6 downto 0);
digit_8 : in std_logic_vector(6 downto 0);
digit_9 : in std_logic_vector(6 downto 0);
digit_10 : in std_logic_vector(6 downto 0);
digit_11 : in std_logic_vector(6 downto 0);
digit_12 : in std_logic_vector(6 downto 0);
digit_13 : in std_logic_vector(6 downto 0);
digit_14 : in std_logic_vector(6 downto 0);
digit_15 : in std_logic_vector(6 downto 0);
digit_16 : in std_logic_vector(6 downto 0);

-- Input decimal points
-- in_dp : in std_logic_vector(15 downto 0);
-- Input Clock
clk : in std_logic);
end component;

-- BCD Display
component binary_bcd
generic(
N: integer := 16);
port(
clk, reset: in std_logic;
binary_in: in std_logic_vector(N-1 downto 0);
bcd0, bcd1, bcd2, bcd3,
bcd4, bcd5, bcd6 : out std_logic_vector(3 downto 0));
end component;

-- Hex to 7 seg
component Hex_to_7_Seg
port (
seven_seg : out std_logic_vector(6 downto 0);
hex : in std_logic_vector(3 downto 0));
end component;

-- Signed to SLV Converter


component Signed_to_SLV
generic(

bit_depth : integer := 12);
port(
Signed_in : in signed(bit_depth - 1 downto 0);
SLV_out : out std_logic_vector(bit_depth - 1 downto 0);
neg : out std_logic);
end component;

-------------------------------------------------------------------------------
-- SIGNALS & CONSTANTS
-------------------------------------------------------------------------------
signal A_input, B_input : signed(11 downto 0) := (others => '0');
signal Result_out : signed(23 downto 0) := (others => '0');

signal A_slv, B_slv : std_logic_vector(11 downto 0) := (others => '0');
signal Result_slv : std_logic_vector(23 downto 0) := (others => '0');

signal A_sign, B_sign, Result_sign : std_logic_vector(6 downto 0) := "0000000";

signal dig_1 : std_logic_vector(6 downto 0) := "0000000";
signal dig_2 : std_logic_vector(6 downto 0) := "0000000";
signal dig_3 : std_logic_vector(6 downto 0) := "0000000";
signal dig_4 : std_logic_vector(6 downto 0) := "0000000";
signal dig_5 : std_logic_vector(6 downto 0) := "0000000";
signal dig_6 : std_logic_vector(6 downto 0) := "0000000";
signal dig_7 : std_logic_vector(6 downto 0) := "0000000";
signal dig_8 : std_logic_vector(6 downto 0) := "0000000";
signal dig_9 : std_logic_vector(6 downto 0) := "0000000";
signal dig_10 : std_logic_vector(6 downto 0) := "0000000";
signal dig_11 : std_logic_vector(6 downto 0) := "0000000";
signal dig_12 : std_logic_vector(6 downto 0) := "0000000";
signal dig_13 : std_logic_vector(6 downto 0) := "0000000";
signal dig_14 : std_logic_vector(6 downto 0) := "0000000";
signal dig_15 : std_logic_vector(6 downto 0) := "0000000";
signal dig_16 : std_logic_vector(6 downto 0) := "0000000";

signal A_bcd0, A_bcd1, A_bcd2, A_bcd3, A_bcd4, A_bcd5, A_bcd6 : std_logic_vector(3 downto 0) := x"0";
signal B_bcd0, B_bcd1, B_bcd2, B_bcd3, B_bcd4, B_bcd5, B_bcd6 : std_logic_vector(3 downto 0) := x"0";
signal R_bcd0, R_bcd1, R_bcd2, R_bcd3, R_bcd4, R_bcd5, R_bcd6 : std_logic_vector(3 downto 0) := x"0";

signal one_lead, ten_lead, hundred_lead, start_lead : std_logic := '0';
signal one_follow, ten_follow, hundred_follow, start_follow : std_logic := '0';
signal one_edge_start, ten_edge_start, hundred_edge_start, start_start : std_logic := '0';

signal A_neg, B_neg, R_neg : std_logic := '0';

signal reset : std_logic := '0';
signal ready_data : std_logic;
-------------------------------------------------------------------------------
-- DESIGN
-------------------------------------------------------------------------------

begin

dig_1 <= Result_sign;


dig_10 <= A_sign;
dig_13 <= B_sign;

-- ALU
--ALU : ALU_2
--generic map(12)
--port map(opcode, A_input, B_input, start_start, Result_out);

-- Signed Multiplier
BOOTH: smult_1
generic map(12)
port map(Result_out, ready_data, A_input, B_input, start_start, reset, clk);

-- Binary BCD's
-- BCD Display A
A_BCD: binary_bcd
generic map(12)
port map(clk, reset, A_slv, A_bcd0, A_bcd1, A_bcd2, A_bcd3, A_bcd4, A_bcd5,
A_bcd6);

-- BCD Display B
B_BCD: binary_bcd
generic map(12)
port map(clk, reset, B_slv, B_bcd0, B_bcd1, B_bcd2, B_bcd3, B_bcd4, B_bcd5,
B_bcd6);

-- BCD Display Result
R_BCD: binary_bcd
generic map(24)
port map(clk, reset, Result_slv, R_bcd0, R_bcd1, R_bcd2, R_bcd3, R_bcd4, R_bcd5,
R_bcd6);

-- Hex to 7 Seg converters


--DIGIT_1: Hex_to_7_Seg
-- port map(dig_1, Result_sign);

-- A Converter
A_CONVERTER: Signed_to_SLV
generic map(12)
port map(A_input, A_slv, A_neg);

B_CONVERTER: Signed_to_SLV
generic map(12)
port map(B_input, B_slv, B_neg);

RESULT_CONVERTER: Signed_to_SLV
generic map(24)
port map(Result_out, Result_slv, R_neg);

DIGIT_2: Hex_to_7_Seg
port map(dig_2, R_bcd6);

DIGIT_3: Hex_to_7_Seg

port map(dig_3, R_bcd5);

DIGIT_4: Hex_to_7_Seg
port map(dig_4, R_bcd4);

DIGIT_5: Hex_to_7_Seg
port map(dig_5, R_bcd3);

DIGIT_6: Hex_to_7_Seg
port map(dig_6, R_bcd2);

DIGIT_7: Hex_to_7_Seg
port map(dig_7, R_bcd1);

DIGIT_8: Hex_to_7_Seg
port map(dig_8, R_bcd0);

DIGIT_9: Hex_to_7_Seg
port map(dig_9, A_bcd1);

--DIGIT_10: Hex_to_7_Seg
-- port map(dig_10, A_sign);

DIGIT_11: Hex_to_7_Seg
port map(dig_11, A_bcd2);

DIGIT_12: Hex_to_7_Seg

port map(dig_12, A_bcd0);

--DIGIT_13: Hex_to_7_Seg
-- port map(dig_13, B_sign);

DIGIT_14: Hex_to_7_Seg
port map(dig_14, B_bcd2);

DIGIT_15: Hex_to_7_Seg
port map(dig_15, B_bcd1);

DIGIT_16: Hex_to_7_Seg
port map(dig_16, B_bcd0);

-- 7 Segment Display Controller


SEG_CONTROLLER: Seg_Display_16
generic map(clock_frequency, segment_refresh)
port map(ja, dp, digit_select, dig_1, dig_2, dig_3, dig_4, dig_5, dig_6, dig_7, dig_8,
dig_9, dig_10, dig_11, dig_12, dig_13, dig_14, dig_15, dig_16, clk);

sign_proc : process(clk)
begin
if(rising_edge(clk)) then
if(A_neg = '1') then
A_sign <= "0111111";
else
A_sign <= "1111111";
end if;

if(B_neg = '1') then
B_sign <= "0111111";
else
B_sign <= "1111111";
end if;

if(R_neg = '1') then


Result_sign <= "0111111";
else
Result_sign <= "1111111";
end if;
end if;
end process sign_proc;

one_edge_start <= one_lead and (not one_follow);


ten_edge_start <= ten_lead and (not ten_follow);
hundred_edge_start <= hundred_lead and (not hundred_follow);
start_start <= start_lead and (not start_follow);

edge_detect_proc : process(clk)
begin
if(rising_edge(clk)) then
one_lead <= one;
one_follow <= one_lead;
ten_lead <= ten;
ten_follow <= ten_lead;

hundred_lead <= hundred;
hundred_follow <= hundred_lead;
start_lead <= start;
start_follow <= start_lead;
end if;
end process edge_detect_proc;

add_sub_proc : process(clk)
begin
if(rising_edge(clk)) then
if(one_edge_start = '1') then
if(add_sub = '1') then
if(sel = '1' and A_input < 999) then
A_input <= A_input + 1;
elsif(sel = '0' and B_input < 999) then
B_input <= B_input + 1;
end if;
else
if(sel = '1' and A_input > -999) then
A_input <= A_input - 1;
elsif(sel = '0' and B_input > -999) then
B_input <= B_input - 1;
end if;
end if;

elsif(ten_edge_start = '1') then


if(add_sub = '1') then

if(sel = '1' and A_input < 989) then
A_input <= A_input + 10;
elsif(sel = '0' and B_input < 989) then
B_input <= B_input + 10;
end if;
else
if(sel = '1' and A_input > -989) then
A_input <= A_input - 10;
elsif(sel = '0' and B_input > -989) then
B_input <= B_input - 10;
end if;
end if;

elsif(hundred_edge_start = '1') then


if(add_sub = '1') then
if(sel = '1' and A_input < 899) then
A_input <= A_input + 100;
elsif(sel = '0' and B_input < 899) then
B_input <= B_input + 100;
end if;
else
if(sel = '1' and A_input > -899) then
A_input <= A_input - 100;
elsif(sel = '0' and B_input > -899) then
B_input <= B_input - 100;
end if;
end if;

end if;

end if;
end process add_sub_proc;

end behavior;

2- Booth multiplier

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.numeric_std.all;

entity smult_1 is
generic (
input_size : integer := 8);
port (
product : out signed(2 * input_size - 1 downto 0);
data_ready : out std_logic;
input_1 : in signed(input_size - 1 downto 0);
input_2 : in signed(input_size - 1 downto 0);
start : in std_logic;
reset : in std_logic;
clk : in std_logic);
end smult_1;

----------------------------------------------------------------------------
--
-- BEHAVIOR
--
----------------------------------------------------------------------------

architecture behavior of smult_1 is
-- State Machine states
type state_type is(init, load_state, right_shift, done);
signal state, nxt_state : state_type;

-- Control signals
signal shift : std_logic;
signal add_A : std_logic;
signal add_S : std_logic;
signal load : std_logic;

-- Data Signals
constant maxcount : integer := input_size - 1;
signal A_reg : signed((2*input_size) downto 0) := (others => '0');
signal S_reg : signed((2*input_size) downto 0) := (others => '0');
signal P_reg : signed((2*input_size) downto 0) := (others => '0');
signal sum_S : signed((2*input_size) downto 0) := (others => '0');
signal sum_A : signed((2*input_size) downto 0) := (others => '0');
signal count : integer range 0 to maxcount + 1 := 0;

signal start_count_lead : std_logic := '0';


signal start_count_follow : std_logic := '0';
signal start_count : std_logic := '0';

begin
-----------------------------------------
-- STATE MACHINE
-- (Two Process)
--
-- This state machine is used to determine
-- what state smult_1 is in based on the
-- count value and the LSB's of the P
-- register
-----------------------------------------
state_proc: process(clk)
begin
if rising_edge(clk) then
if(reset = '1') then
state <= init;
else
state <= nxt_state;
end if;
end if;
end process state_proc;

state_machine: process(state, start, start_count, count, P_reg(1 downto 0))

begin
-- Initialize nxt_state and control signals
nxt_state <= state;
shift <= '0';
add_A <= '0';
add_S <= '0';
load <= '0';
data_ready <= '0';

case state is
-- Initialization State
when init =>
if(start_count = '1') then
nxt_state <= load_state;
else
nxt_state <= init;
end if;

-- Loading State
when load_state =>
load <= '1';
nxt_state <= right_shift;

-- Right Shift state (Multiplication is occurring)


when right_shift =>
shift <= '1';
if(count /= maxcount) then
nxt_state <= right_shift;
else
nxt_state <= done;
end if;

-- Read 2 LSB's of P_reg


if(P_reg(1 downto 0) = "01") then
add_A <= '1';
elsif(P_reg(1 downto 0) = "10") then
add_S <= '1';
end if;

-- Multiplication is complete (ready to receive new inputs)


when done =>
data_ready <= '1';
if(start = '0') then
nxt_state <= init;
else
nxt_state <= done;

end if;

-- All other states


when others =>
nxt_state <= init;

end case;
end process state_machine;

-----------------------------------------
-- EDGE DETECTION
--
-- This is used to detect a rising edge of
-- a signal
-----------------------------------------
start_count <= start_count_lead and (not start_count_follow);

start_count_proc: process(clk)
begin
if(rising_edge(clk)) then
if(reset = '1') then
start_count_lead <= '0';
start_count_follow <= '0';
else
start_count_lead <= start;
start_count_follow <= start_count_lead;
end if;
end if;
end process start_count_proc;

-----------------------------------------
-- COUNT PROCESS
--
-- This process is a counter that keeps
-- track of the number of cycles iterated
-- in the state machine
-----------------------------------------
count_proc: process(clk)
begin
if(rising_edge(clk)) then
if((start_count = '1') or (reset = '1')) then
count <= 0;
elsif(state = right_shift) then
count <= count + 1;
end if;
end if;

end process count_proc;

-----------------------------------------
-- MULTIPLIER PROCESS
--
-- This process is used to apply the
-- actual multiplication via shifts
-- and additions
-----------------------------------------
-- Determine the Sum of S_reg and A_reg
sum_S <= P_reg + S_reg;
sum_A <= P_reg + A_reg;

mult_proc: process(clk)
begin
if(rising_edge(clk)) then
if(reset = '1') then
P_reg <= (others => '0');
A_reg <= (others => '0');
S_reg <= (others => '0');

elsif(load = '1') then


-- A_reg
A_reg(2*input_size downto input_size + 1) <= input_1;
A_reg(input_size downto 0) <= (others => '0');

-- S_reg
S_reg(2*input_size downto input_size + 1) <= (not input_1) + 1;
S_reg(input_size downto 0) <= (others => '0');

-- P_reg
P_reg(2*input_size downto input_size + 1) <= (others => '0');
P_reg(input_size downto 1) <= input_2;
P_reg(0) <= '0';

elsif(add_A = '1') then


P_reg <= sum_A(2*input_size) & sum_A(2*input_size downto 1);

elsif(add_S = '1') then


P_reg <= sum_S(2*input_size) & sum_S(2*input_size downto 1);

elsif(shift = '1') then


P_reg <= P_reg(2*input_size) & P_reg(2*input_size downto 1);

end if;
end if;

end process mult_proc;

-- Defining the output


product <= P_reg(2*input_size downto 1);
end behavior;

3- XDC for arty7


## This file is a general .xdc for the ARTY Rev. B
## To use it in a project:
## - uncomment the lines corresponding to used pins
## - rename the used ports (in each line, after get_ports) according to the top level signal names in the project

## Clock signal

set_property -dict { PACKAGE_PIN E3 IOSTANDARD LVCMOS33 } [get_ports { clk }]; #IO_L12P_T1_MRCC_35 Sch=gclk[100]
#create_clock -add -name sys_clk_pin -period 10.00 -waveform {0 5} [get_ports { CLK100MHZ }];

##Switches

set_property -dict { PACKAGE_PIN A8 IOSTANDARD LVCMOS33 } [get_ports { add_sub }]; #IO_L12N_T1_MRCC_16 Sch=sw[0]
#set_property -dict { PACKAGE_PIN C11 IOSTANDARD LVCMOS33 } [get_ports { add_sub}]; #IO_L13P_T2_MRCC_16 Sch=sw[1]
#set_property -dict { PACKAGE_PIN C10 IOSTANDARD LVCMOS33 } [get_ports { opcode[2] }]; #IO_L13N_T2_MRCC_16 Sch=sw[2]
set_property -dict { PACKAGE_PIN A10 IOSTANDARD LVCMOS33 } [get_ports { sel }]; #IO_L14P_T2_SRCC_16 Sch=sw[3]

##RGB LEDs

#set_property -dict { PACKAGE_PIN E1 IOSTANDARD LVCMOS33 } [get_ports { led0_b }]; #IO_L18N_T2_35 Sch=led0_b
#set_property -dict { PACKAGE_PIN F6 IOSTANDARD LVCMOS33 } [get_ports { led0_g }]; #IO_L19N_T3_VREF_35 Sch=led0_g
#set_property -dict { PACKAGE_PIN G6 IOSTANDARD LVCMOS33 } [get_ports { led0_r }]; #IO_L19P_T3_35 Sch=led0_r
#set_property -dict { PACKAGE_PIN G4 IOSTANDARD LVCMOS33 } [get_ports { led1_b }]; #IO_L20P_T3_35 Sch=led1_b
#set_property -dict { PACKAGE_PIN J4 IOSTANDARD LVCMOS33 } [get_ports { led1_g }]; #IO_L21P_T3_DQS_35 Sch=led1_g
#set_property -dict { PACKAGE_PIN G3 IOSTANDARD LVCMOS33 } [get_ports { led1_r }]; #IO_L20N_T3_35 Sch=led1_r
#set_property -dict { PACKAGE_PIN H4 IOSTANDARD LVCMOS33 } [get_ports { led2_b }]; #IO_L21N_T3_DQS_35 Sch=led2_b
#set_property -dict { PACKAGE_PIN J2 IOSTANDARD LVCMOS33 } [get_ports { led2_g }]; #IO_L22N_T3_35 Sch=led2_g
#set_property -dict { PACKAGE_PIN J3 IOSTANDARD LVCMOS33 } [get_ports { led2_r }]; #IO_L22P_T3_35 Sch=led2_r
#set_property -dict { PACKAGE_PIN K2 IOSTANDARD LVCMOS33 } [get_ports { led3_b }]; #IO_L23P_T3_35 Sch=led3_b
#set_property -dict { PACKAGE_PIN H6 IOSTANDARD LVCMOS33 } [get_ports { led3_g }]; #IO_L24P_T3_35 Sch=led3_g
#set_property -dict { PACKAGE_PIN K1 IOSTANDARD LVCMOS33 } [get_ports { led3_r }]; #IO_L23N_T3_35 Sch=led3_r

##LEDs

#set_property -dict { PACKAGE_PIN H5 IOSTANDARD LVCMOS33 } [get_ports { led[0] }]; #IO_L24N_T3_35 Sch=led[4]
#set_property -dict { PACKAGE_PIN J5 IOSTANDARD LVCMOS33 } [get_ports { led[1] }]; #IO_25_35 Sch=led[5]
#set_property -dict { PACKAGE_PIN T9 IOSTANDARD LVCMOS33 } [get_ports { led[2] }]; #IO_L24P_T3_A01_D17_14 Sch=led[6]
#set_property -dict { PACKAGE_PIN T10 IOSTANDARD LVCMOS33 } [get_ports { led[3] }]; #IO_L24N_T3_A00_D16_14 Sch=led[7]

##Buttons

set_property -dict { PACKAGE_PIN D9 IOSTANDARD LVCMOS33 } [get_ports { one }]; #IO_L6N_T0_VREF_16 Sch=btn[0]
set_property -dict { PACKAGE_PIN C9 IOSTANDARD LVCMOS33 } [get_ports { ten }]; #IO_L11P_T1_SRCC_16 Sch=btn[1]
set_property -dict { PACKAGE_PIN B9 IOSTANDARD LVCMOS33 } [get_ports { hundred }]; #IO_L11N_T1_SRCC_16 Sch=btn[2]
set_property -dict { PACKAGE_PIN B8 IOSTANDARD LVCMOS33 } [get_ports { start }]; #IO_L12P_T1_MRCC_16 Sch=btn[3]

##Pmod Header JA

set_property -dict { PACKAGE_PIN G13 IOSTANDARD LVCMOS33 } [get_ports { ja[0] }]; #IO_0_15 Sch=ja[1]
set_property -dict { PACKAGE_PIN B11 IOSTANDARD LVCMOS33 } [get_ports { ja[1] }]; #IO_L4P_T0_15 Sch=ja[2]
set_property -dict { PACKAGE_PIN A11 IOSTANDARD LVCMOS33 } [get_ports { ja[2] }]; #IO_L4N_T0_15 Sch=ja[3]
set_property -dict { PACKAGE_PIN D12 IOSTANDARD LVCMOS33 } [get_ports { ja[3] }]; #IO_L6P_T0_15 Sch=ja[4]
set_property -dict { PACKAGE_PIN D13 IOSTANDARD LVCMOS33 } [get_ports { ja[4] }]; #IO_L6N_T0_VREF_15 Sch=ja[7]
set_property -dict { PACKAGE_PIN B18 IOSTANDARD LVCMOS33 } [get_ports { ja[5] }]; #IO_L10P_T1_AD11P_15 Sch=ja[8]
set_property -dict { PACKAGE_PIN A18 IOSTANDARD LVCMOS33 } [get_ports { ja[6] }]; #IO_L10N_T1_AD11N_15 Sch=ja[9]
set_property -dict { PACKAGE_PIN K16 IOSTANDARD LVCMOS33 } [get_ports { dp }]; #IO_25_15 Sch=ja[10]

##Pmod Header JB

#set_property -dict { PACKAGE_PIN E15 IOSTANDARD LVCMOS33 } [get_ports { jb[0] }]; #IO_L11P_T1_SRCC_15 Sch=jb_p[1]
#set_property -dict { PACKAGE_PIN E16 IOSTANDARD LVCMOS33 } [get_ports { jb[1] }]; #IO_L11N_T1_SRCC_15 Sch=jb_n[1]
#set_property -dict { PACKAGE_PIN D15 IOSTANDARD LVCMOS33 } [get_ports { jb[2] }]; #IO_L12P_T1_MRCC_15 Sch=jb_p[2]
#set_property -dict { PACKAGE_PIN C15 IOSTANDARD LVCMOS33 } [get_ports { jb[3] }]; #IO_L12N_T1_MRCC_15 Sch=jb_n[2]
#set_property -dict { PACKAGE_PIN J17 IOSTANDARD LVCMOS33 } [get_ports { jb[4] }]; #IO_L23P_T3_FOE_B_15 Sch=jb_p[3]
#set_property -dict { PACKAGE_PIN J18 IOSTANDARD LVCMOS33 } [get_ports { jb[5] }]; #IO_L23N_T3_FWE_B_15 Sch=jb_n[3]
#set_property -dict { PACKAGE_PIN K15 IOSTANDARD LVCMOS33 } [get_ports { jb[6] }]; #IO_L24P_T3_RS1_15 Sch=jb_p[4]
#set_property -dict { PACKAGE_PIN J15 IOSTANDARD LVCMOS33 } [get_ports { jb[7] }]; #IO_L24N_T3_RS0_15 Sch=jb_n[4]

##Pmod Header JC

set_property -dict { PACKAGE_PIN U12 IOSTANDARD LVCMOS33 } [get_ports { digit_select[0] }]; #IO_L20P_T3_A08_D24_14 Sch=jc_p[1]
set_property -dict { PACKAGE_PIN V12 IOSTANDARD LVCMOS33 } [get_ports { digit_select[1] }]; #IO_L20N_T3_A07_D23_14 Sch=jc_n[1]
set_property -dict { PACKAGE_PIN V10 IOSTANDARD LVCMOS33 } [get_ports { digit_select[2] }]; #IO_L21P_T3_DQS_14 Sch=jc_p[2]
set_property -dict { PACKAGE_PIN V11 IOSTANDARD LVCMOS33 } [get_ports { digit_select[3] }]; #IO_L21N_T3_DQS_A06_D22_14 Sch=jc_n[2]
set_property -dict { PACKAGE_PIN U14 IOSTANDARD LVCMOS33 } [get_ports { digit_select[4] }]; #IO_L22P_T3_A05_D21_14 Sch=jc_p[3]
set_property -dict { PACKAGE_PIN V14 IOSTANDARD LVCMOS33 } [get_ports { digit_select[5] }]; #IO_L22N_T3_A04_D20_14 Sch=jc_n[3]
set_property -dict { PACKAGE_PIN T13 IOSTANDARD LVCMOS33 } [get_ports { digit_select[6] }]; #IO_L23P_T3_A03_D19_14 Sch=jc_p[4]
set_property -dict { PACKAGE_PIN U13 IOSTANDARD LVCMOS33 } [get_ports { digit_select[7] }]; #IO_L23N_T3_A02_D18_14 Sch=jc_n[4]

##Pmod Header JD

set_property -dict { PACKAGE_PIN D4 IOSTANDARD LVCMOS33 } [get_ports { digit_select[8] }]; #IO_L11N_T1_SRCC_35 Sch=jd[1]
set_property -dict { PACKAGE_PIN D3 IOSTANDARD LVCMOS33 } [get_ports { digit_select[9] }]; #IO_L12N_T1_MRCC_35 Sch=jd[2]
set_property -dict { PACKAGE_PIN F4 IOSTANDARD LVCMOS33 } [get_ports { digit_select[10] }]; #IO_L13P_T2_MRCC_35 Sch=jd[3]
set_property -dict { PACKAGE_PIN F3 IOSTANDARD LVCMOS33 } [get_ports { digit_select[11] }]; #IO_L13N_T2_MRCC_35 Sch=jd[4]
set_property -dict { PACKAGE_PIN E2 IOSTANDARD LVCMOS33 } [get_ports { digit_select[12] }]; #IO_L14P_T2_SRCC_35 Sch=jd[7]
set_property -dict { PACKAGE_PIN D2 IOSTANDARD LVCMOS33 } [get_ports { digit_select[13] }]; #IO_L14N_T2_SRCC_35 Sch=jd[8]
set_property -dict { PACKAGE_PIN H2 IOSTANDARD LVCMOS33 } [get_ports { digit_select[14] }]; #IO_L15P_T2_DQS_35 Sch=jd[9]
set_property -dict { PACKAGE_PIN G2 IOSTANDARD LVCMOS33 } [get_ports { digit_select[15] }]; #IO_L15N_T2_DQS_35 Sch=jd[10]

##USB-UART Interface

#set_property -dict { PACKAGE_PIN D10 IOSTANDARD LVCMOS33 } [get_ports { uart_rxd_out }]; #IO_L19N_T3_VREF_16 Sch=uart_rxd_out
#set_property -dict { PACKAGE_PIN A9 IOSTANDARD LVCMOS33 } [get_ports { uart_txd_in }]; #IO_L14N_T2_SRCC_16 Sch=uart_txd_in

##ChipKit Single Ended Analog Inputs


##NOTE: The ck_an_p pins can be used as single ended analog inputs with voltages from 0-3.3V (Chipkit Analog pins A0-A5).
## These signals should only be connected to the XADC core. When using these pins as digital I/O, use pins ck_io[14-19].

#set_property -dict { PACKAGE_PIN C5 IOSTANDARD LVCMOS33 } [get_ports { ck_an_n[0] }]; #IO_L1N_T0_AD4N_35 Sch=ck_an_n[0]
#set_property -dict { PACKAGE_PIN C6 IOSTANDARD LVCMOS33 } [get_ports { ck_an_p[0] }]; #IO_L1P_T0_AD4P_35 Sch=ck_an_p[0]
#set_property -dict { PACKAGE_PIN A5 IOSTANDARD LVCMOS33 } [get_ports { ck_an_n[1] }]; #IO_L3N_T0_DQS_AD5N_35 Sch=ck_an_n[1]
#set_property -dict { PACKAGE_PIN A6 IOSTANDARD LVCMOS33 } [get_ports { ck_an_p[1] }]; #IO_L3P_T0_DQS_AD5P_35 Sch=ck_an_p[1]
#set_property -dict { PACKAGE_PIN B4 IOSTANDARD LVCMOS33 } [get_ports { ck_an_n[2] }]; #IO_L7N_T1_AD6N_35 Sch=ck_an_n[2]
#set_property -dict { PACKAGE_PIN C4 IOSTANDARD LVCMOS33 } [get_ports { ck_an_p[2] }]; #IO_L7P_T1_AD6P_35 Sch=ck_an_p[2]
#set_property -dict { PACKAGE_PIN A1 IOSTANDARD LVCMOS33 } [get_ports { ck_an_n[3] }]; #IO_L9N_T1_DQS_AD7N_35 Sch=ck_an_n[3]
#set_property -dict { PACKAGE_PIN B1 IOSTANDARD LVCMOS33 } [get_ports { ck_an_p[3] }]; #IO_L9P_T1_DQS_AD7P_35 Sch=ck_an_p[3]
#set_property -dict { PACKAGE_PIN B2 IOSTANDARD LVCMOS33 } [get_ports { ck_an_n[4] }]; #IO_L10N_T1_AD15N_35 Sch=ck_an_n[4]
#set_property -dict { PACKAGE_PIN B3 IOSTANDARD LVCMOS33 } [get_ports { ck_an_p[4] }]; #IO_L10P_T1_AD15P_35 Sch=ck_an_p[4]
#set_property -dict { PACKAGE_PIN C14 IOSTANDARD LVCMOS33 } [get_ports { ck_an_n[5] }]; #IO_L1N_T0_AD0N_15 Sch=ck_an_n[5]
#set_property -dict { PACKAGE_PIN D14 IOSTANDARD LVCMOS33 } [get_ports { ck_an_p[5] }]; #IO_L1P_T0_AD0P_15 Sch=ck_an_p[5]

##ChipKit Digital I/O Low

#set_property -dict { PACKAGE_PIN V15 IOSTANDARD LVCMOS33 } [get_ports { add_sub }]; #IO_L16P_T2_CSI_B_14 Sch=ck_io[0]
#set_property -dict { PACKAGE_PIN U16 IOSTANDARD LVCMOS33 } [get_ports { ck_io[1] }]; #IO_L18P_T2_A12_D28_14 Sch=ck_io[1]
#set_property -dict { PACKAGE_PIN P14 IOSTANDARD LVCMOS33 } [get_ports { ck_io[2] }]; #IO_L8N_T1_D12_14 Sch=ck_io[2]
#set_property -dict { PACKAGE_PIN T11 IOSTANDARD LVCMOS33 } [get_ports { ck_io[3] }]; #IO_L19P_T3_A10_D26_14 Sch=ck_io[3]
#set_property -dict { PACKAGE_PIN R12 IOSTANDARD LVCMOS33 } [get_ports { ck_io[4] }]; #IO_L5P_T0_D06_14 Sch=ck_io[4]
#set_property -dict { PACKAGE_PIN T14 IOSTANDARD LVCMOS33 } [get_ports { ck_io[5] }]; #IO_L14P_T2_SRCC_14 Sch=ck_io[5]
#set_property -dict { PACKAGE_PIN T15 IOSTANDARD LVCMOS33 } [get_ports { ck_io[6] }]; #IO_L14N_T2_SRCC_14 Sch=ck_io[6]
#set_property -dict { PACKAGE_PIN T16 IOSTANDARD LVCMOS33 } [get_ports { ck_io[7] }]; #IO_L15N_T2_DQS_DOUT_CSO_B_14 Sch=ck_io[7]
#set_property -dict { PACKAGE_PIN N15 IOSTANDARD LVCMOS33 } [get_ports { ck_io[8] }]; #IO_L11P_T1_SRCC_14 Sch=ck_io[8]
#set_property -dict { PACKAGE_PIN M16 IOSTANDARD LVCMOS33 } [get_ports { ck_io[9] }]; #IO_L10P_T1_D14_14 Sch=ck_io[9]
#set_property -dict { PACKAGE_PIN V17 IOSTANDARD LVCMOS33 } [get_ports { ck_io[10] }]; #IO_L18N_T2_A11_D27_14 Sch=ck_io[10]
#set_property -dict { PACKAGE_PIN U18 IOSTANDARD LVCMOS33 } [get_ports { ck_io[11] }]; #IO_L17N_T2_A13_D29_14 Sch=ck_io[11]
#set_property -dict { PACKAGE_PIN R17 IOSTANDARD LVCMOS33 } [get_ports { ck_io[12] }]; #IO_L12N_T1_MRCC_14 Sch=ck_io[12]
#set_property -dict { PACKAGE_PIN P17 IOSTANDARD LVCMOS33 } [get_ports { ck_io[13] }]; #IO_L12P_T1_MRCC_14 Sch=ck_io[13]

##ChipKit Digital I/O On Outer Analog Header

##NOTE: These pins should be used when using the analog header signals A0-A5 as digital I/O (Chipkit digital pins 14-19)

#set_property -dict { PACKAGE_PIN F5 IOSTANDARD LVCMOS33 } [get_ports { ck_io[14] }]; #IO_0_35 Sch=ck_a[0]
#set_property -dict { PACKAGE_PIN D8 IOSTANDARD LVCMOS33 } [get_ports { ck_io[15] }]; #IO_L4P_T0_35 Sch=ck_a[1]
#set_property -dict { PACKAGE_PIN C7 IOSTANDARD LVCMOS33 } [get_ports { ck_io[16] }]; #IO_L4N_T0_35 Sch=ck_a[2]
#set_property -dict { PACKAGE_PIN E7 IOSTANDARD LVCMOS33 } [get_ports { ck_io[17] }]; #IO_L6P_T0_35 Sch=ck_a[3]
#set_property -dict { PACKAGE_PIN D7 IOSTANDARD LVCMOS33 } [get_ports { ck_io[18] }]; #IO_L6N_T0_VREF_35 Sch=ck_a[4]
#set_property -dict { PACKAGE_PIN D5 IOSTANDARD LVCMOS33 } [get_ports { ck_io[19] }]; #IO_L11P_T1_SRCC_35 Sch=ck_a[5]

##ChipKit Digital I/O On Inner Analog Header


##NOTE: These pins will need to be connected to the XADC core when used as differential analog inputs (Chipkit analog pins A6-A11)

#set_property -dict { PACKAGE_PIN B7 IOSTANDARD LVCMOS33 } [get_ports { ck_io[20] }]; #IO_L2P_T0_AD12P_35 Sch=ad_p[12]
#set_property -dict { PACKAGE_PIN B6 IOSTANDARD LVCMOS33 } [get_ports { ck_io[21] }]; #IO_L2N_T0_AD12N_35 Sch=ad_n[12]
#set_property -dict { PACKAGE_PIN E6 IOSTANDARD LVCMOS33 } [get_ports { ck_io[22] }]; #IO_L5P_T0_AD13P_35 Sch=ad_p[13]
#set_property -dict { PACKAGE_PIN E5 IOSTANDARD LVCMOS33 } [get_ports { ck_io[23] }]; #IO_L5N_T0_AD13N_35 Sch=ad_n[13]
#set_property -dict { PACKAGE_PIN A4 IOSTANDARD LVCMOS33 } [get_ports { ck_io[24] }]; #IO_L8P_T1_AD14P_35 Sch=ad_p[14]
#set_property -dict { PACKAGE_PIN A3 IOSTANDARD LVCMOS33 } [get_ports { ck_io[25] }]; #IO_L8N_T1_AD14N_35 Sch=ad_n[14]

##ChipKit Digital I/O High

#set_property -dict { PACKAGE_PIN U11 IOSTANDARD LVCMOS33 } [get_ports { ck_io[26] }]; #IO_L19N_T3_A09_D25_VREF_14 Sch=ck_io[26]
#set_property -dict { PACKAGE_PIN V16 IOSTANDARD LVCMOS33 } [get_ports { ck_io[27] }]; #IO_L16N_T2_A15_D31_14 Sch=ck_io[27]
#set_property -dict { PACKAGE_PIN M13 IOSTANDARD LVCMOS33 } [get_ports { ck_io[28] }]; #IO_L6N_T0_D08_VREF_14 Sch=ck_io[28]
#set_property -dict { PACKAGE_PIN R10 IOSTANDARD LVCMOS33 } [get_ports { ck_io[29] }]; #IO_25_14 Sch=ck_io[29]
#set_property -dict { PACKAGE_PIN R11 IOSTANDARD LVCMOS33 } [get_ports { ck_io[30] }]; #IO_0_14 Sch=ck_io[30]
#set_property -dict { PACKAGE_PIN R13 IOSTANDARD LVCMOS33 } [get_ports { ck_io[31] }]; #IO_L5N_T0_D07_14 Sch=ck_io[31]
#set_property -dict { PACKAGE_PIN R15 IOSTANDARD LVCMOS33 } [get_ports { ck_io[32] }]; #IO_L13N_T2_MRCC_14 Sch=ck_io[32]
#set_property -dict { PACKAGE_PIN P15 IOSTANDARD LVCMOS33 } [get_ports { ck_io[33] }]; #IO_L13P_T2_MRCC_14 Sch=ck_io[33]
#set_property -dict { PACKAGE_PIN R16 IOSTANDARD LVCMOS33 } [get_ports { ck_io[34] }]; #IO_L15P_T2_DQS_RDWR_B_14 Sch=ck_io[34]
#set_property -dict { PACKAGE_PIN N16 IOSTANDARD LVCMOS33 } [get_ports { ck_io[35] }]; #IO_L11N_T1_SRCC_14 Sch=ck_io[35]
#set_property -dict { PACKAGE_PIN N14 IOSTANDARD LVCMOS33 } [get_ports { ck_io[36] }]; #IO_L8P_T1_D11_14 Sch=ck_io[36]
#set_property -dict { PACKAGE_PIN U17 IOSTANDARD LVCMOS33 } [get_ports { ck_io[37] }]; #IO_L17P_T2_A14_D30_14 Sch=ck_io[37]
#set_property -dict { PACKAGE_PIN T18 IOSTANDARD LVCMOS33 } [get_ports { ck_io[38] }]; #IO_L7N_T1_D10_14 Sch=ck_io[38]
#set_property -dict { PACKAGE_PIN R18 IOSTANDARD LVCMOS33 } [get_ports { ck_io[39] }]; #IO_L7P_T1_D09_14 Sch=ck_io[39]
#set_property -dict { PACKAGE_PIN P18 IOSTANDARD LVCMOS33 } [get_ports { ck_io[40] }]; #IO_L9N_T1_DQS_D13_14 Sch=ck_io[40]
#set_property -dict { PACKAGE_PIN N17 IOSTANDARD LVCMOS33 } [get_ports { ck_io[41] }]; #IO_L9P_T1_DQS_14 Sch=ck_io[41]

## ChipKit SPI

#set_property -dict { PACKAGE_PIN G1 IOSTANDARD LVCMOS33 } [get_ports { ck_miso }]; #IO_L17N_T2_35 Sch=ck_miso
#set_property -dict { PACKAGE_PIN H1 IOSTANDARD LVCMOS33 } [get_ports { ck_mosi }]; #IO_L17P_T2_35 Sch=ck_mosi
#set_property -dict { PACKAGE_PIN F1 IOSTANDARD LVCMOS33 } [get_ports { ck_sck }]; #IO_L18P_T2_35 Sch=ck_sck
#set_property -dict { PACKAGE_PIN C1 IOSTANDARD LVCMOS33 } [get_ports { ck_ss }]; #IO_L16N_T2_35 Sch=ck_ss

## ChipKit I2C

#set_property -dict { PACKAGE_PIN L18 IOSTANDARD LVCMOS33 } [get_ports { ck_scl }]; #IO_L4P_T0_D04_14 Sch=ck_scl
#set_property -dict { PACKAGE_PIN M18 IOSTANDARD LVCMOS33 } [get_ports { ck_sda }]; #IO_L4N_T0_D05_14 Sch=ck_sda
#set_property -dict { PACKAGE_PIN A14 IOSTANDARD LVCMOS33 } [get_ports { scl_pup }]; #IO_L9N_T1_DQS_AD3N_15 Sch=scl_pup
#set_property -dict { PACKAGE_PIN A13 IOSTANDARD LVCMOS33 } [get_ports { sda_pup }]; #IO_L9P_T1_DQS_AD3P_15 Sch=sda_pup

##Misc. ChipKit signals

#set_property -dict { PACKAGE_PIN M17 IOSTANDARD LVCMOS33 } [get_ports { ck_ioa }]; #IO_L10N_T1_D15_14 Sch=ck_ioa
#set_property -dict { PACKAGE_PIN C2 IOSTANDARD LVCMOS33 } [get_ports { ck_rst }]; #IO_L16P_T2_35 Sch=ck_rst

##SMSC Ethernet PHY

#set_property -dict { PACKAGE_PIN D17 IOSTANDARD LVCMOS33 } [get_ports { eth_col }]; #IO_L16N_T2_A27_15 Sch=eth_col
#set_property -dict { PACKAGE_PIN G14 IOSTANDARD LVCMOS33 } [get_ports { eth_crs }]; #IO_L15N_T2_DQS_ADV_B_15 Sch=eth_crs
#set_property -dict { PACKAGE_PIN F16 IOSTANDARD LVCMOS33 } [get_ports { eth_mdc }]; #IO_L14N_T2_SRCC_15 Sch=eth_mdc
#set_property -dict { PACKAGE_PIN K13 IOSTANDARD LVCMOS33 } [get_ports { eth_mdio }]; #IO_L17P_T2_A26_15 Sch=eth_mdio
#set_property -dict { PACKAGE_PIN G18 IOSTANDARD LVCMOS33 } [get_ports { eth_ref_clk }]; #IO_L22P_T3_A17_15 Sch=eth_ref_clk
#set_property -dict { PACKAGE_PIN C16 IOSTANDARD LVCMOS33 } [get_ports { eth_rstn }]; #IO_L20P_T3_A20_15 Sch=eth_rstn
#set_property -dict { PACKAGE_PIN F15 IOSTANDARD LVCMOS33 } [get_ports { eth_rx_clk }]; #IO_L14P_T2_SRCC_15 Sch=eth_rx_clk
#set_property -dict { PACKAGE_PIN G16 IOSTANDARD LVCMOS33 } [get_ports { eth_rx_dv }]; #IO_L13N_T2_MRCC_15 Sch=eth_rx_dv
#set_property -dict { PACKAGE_PIN D18 IOSTANDARD LVCMOS33 } [get_ports { eth_rxd[0] }]; #IO_L21N_T3_DQS_A18_15 Sch=eth_rxd[0]
#set_property -dict { PACKAGE_PIN E17 IOSTANDARD LVCMOS33 } [get_ports { eth_rxd[1] }]; #IO_L16P_T2_A28_15 Sch=eth_rxd[1]
#set_property -dict { PACKAGE_PIN E18 IOSTANDARD LVCMOS33 } [get_ports { eth_rxd[2] }]; #IO_L21P_T3_DQS_15 Sch=eth_rxd[2]
#set_property -dict { PACKAGE_PIN G17 IOSTANDARD LVCMOS33 } [get_ports { eth_rxd[3] }]; #IO_L18N_T2_A23_15 Sch=eth_rxd[3]
#set_property -dict { PACKAGE_PIN C17 IOSTANDARD LVCMOS33 } [get_ports { eth_rxerr }]; #IO_L20N_T3_A19_15 Sch=eth_rxerr
#set_property -dict { PACKAGE_PIN H16 IOSTANDARD LVCMOS33 } [get_ports { eth_tx_clk }]; #IO_L13P_T2_MRCC_15 Sch=eth_tx_clk
#set_property -dict { PACKAGE_PIN H15 IOSTANDARD LVCMOS33 } [get_ports { eth_tx_en }]; #IO_L19N_T3_A21_VREF_15 Sch=eth_tx_en
#set_property -dict { PACKAGE_PIN H14 IOSTANDARD LVCMOS33 } [get_ports { eth_txd[0] }]; #IO_L15P_T2_DQS_15 Sch=eth_txd[0]
#set_property -dict { PACKAGE_PIN J14 IOSTANDARD LVCMOS33 } [get_ports { eth_txd[1] }]; #IO_L19P_T3_A22_15 Sch=eth_txd[1]
#set_property -dict { PACKAGE_PIN J13 IOSTANDARD LVCMOS33 } [get_ports { eth_txd[2] }]; #IO_L17N_T2_A25_15 Sch=eth_txd[2]
#set_property -dict { PACKAGE_PIN H17 IOSTANDARD LVCMOS33 } [get_ports { eth_txd[3] }]; #IO_L18P_T2_A24_15 Sch=eth_txd[3]

##Quad SPI Flash

#set_property -dict { PACKAGE_PIN L13 IOSTANDARD LVCMOS33 } [get_ports { qspi_cs }]; #IO_L6P_T0_FCS_B_14 Sch=qspi_cs
#set_property -dict { PACKAGE_PIN K17 IOSTANDARD LVCMOS33 } [get_ports { qspi_dq[0] }]; #IO_L1P_T0_D00_MOSI_14 Sch=qspi_dq[0]
#set_property -dict { PACKAGE_PIN K18 IOSTANDARD LVCMOS33 } [get_ports { qspi_dq[1] }]; #IO_L1N_T0_D01_DIN_14 Sch=qspi_dq[1]
#set_property -dict { PACKAGE_PIN L14 IOSTANDARD LVCMOS33 } [get_ports { qspi_dq[2] }]; #IO_L2P_T0_D02_14 Sch=qspi_dq[2]
#set_property -dict { PACKAGE_PIN M14 IOSTANDARD LVCMOS33 } [get_ports { qspi_dq[3] }]; #IO_L2N_T0_D03_14 Sch=qspi_dq[3]

##Power Measurements

#set_property -dict { PACKAGE_PIN B17 IOSTANDARD LVCMOS33 } [get_ports { vsnsvu_n }]; #IO_L7N_T1_AD2N_15 Sch=ad_n[2]
#set_property -dict { PACKAGE_PIN B16 IOSTANDARD LVCMOS33 } [get_ports { vsnsvu_p }]; #IO_L7P_T1_AD2P_15 Sch=ad_p[2]
#set_property -dict { PACKAGE_PIN B12 IOSTANDARD LVCMOS33 } [get_ports { vsns5v0_n }]; #IO_L3N_T0_DQS_AD1N_15 Sch=ad_n[1]
#set_property -dict { PACKAGE_PIN C12 IOSTANDARD LVCMOS33 } [get_ports { vsns5v0_p }]; #IO_L3P_T0_DQS_AD1P_15 Sch=ad_p[1]
#set_property -dict { PACKAGE_PIN F14 IOSTANDARD LVCMOS33 } [get_ports { isns5v0_n }]; #IO_L5N_T0_AD9N_15 Sch=ad_n[9]
#set_property -dict { PACKAGE_PIN F13 IOSTANDARD LVCMOS33 } [get_ports { isns5v0_p }]; #IO_L5P_T0_AD9P_15 Sch=ad_p[9]
#set_property -dict { PACKAGE_PIN A16 IOSTANDARD LVCMOS33 } [get_ports { isns0v95_n }]; #IO_L8N_T1_AD10N_15 Sch=ad_n[10]
#set_property -dict { PACKAGE_PIN A15 IOSTANDARD LVCMOS33 } [get_ports { isns0v95_p }]; #IO_L8P_T1_AD10P_15 Sch=ad_p[10]
