
Design of Arithmetic Circuits
Basic principle of pipelining

Traditional approach

Input data -> Process (< 100 ns, clocked by clk) -> Output
Throughput: 10 MHz

Pipelining approach

Input data -> Proc. 1 (< 10 ns) -> Reg. 1 (clk) -> ... -> Proc. 10 (< 10 ns) -> Reg. 10 (clk) -> Output
Throughput: 100 MHz

• Throughput increases considerably.
• Chip area also increases.
• Latency comes into effect.

Processing order

Time (ns)   Input    Reg. 1      Reg. 2     ...   Reg. 10
0           Data1
10          Data2    Proc.1_1
20          Data3    Proc.1_2    Proc.2_1
...
100         Data11   Proc.1_10   Proc.2_9   ...   Proc.10_1

Latency: 100 ns.
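
The figures above can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the original slides; the function names `throughput_mhz`, `latency_ns` and `occupant` are illustrative, and it assumes ideal registers with no setup or clock-to-output overhead.

```python
def throughput_mhz(stage_delay_ns):
    """Highest clock rate in MHz when the slowest stage needs stage_delay_ns."""
    return 1000.0 / stage_delay_ns


def latency_ns(stages, stage_delay_ns):
    """Time for one data item to pass through every pipeline register."""
    return stages * stage_delay_ns


def occupant(time_ns, reg, period_ns=10):
    """Index of the data item held in register `reg` at time_ns, or None if empty."""
    item = time_ns // period_ns - reg + 1
    return item if item >= 1 else None


traditional = throughput_mhz(100)    # single 100 ns block
pipelined = throughput_mhz(10)       # ten stages of 10 ns each
pipe_latency = latency_ns(10, 10)    # first result leaves Reg. 10 after this time

print(traditional, pipelined, pipe_latency)
print(occupant(100, 10))             # data item sitting in Reg. 10 at 100 ns
```

The `occupant` helper reproduces the processing-order table: at 20 ns Reg. 1 holds Data2 and Reg. 2 holds Data1, and Data1 reaches Reg. 10 at 100 ns.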

Partitioning of a design

• Partition of data width
• Partition of functionality

Partition of data width

Consider the example of a signed adder:

• Eight signed input numbers, each 12 bits wide.
• The sum of these numbers is required.
• The conventional approach to addition/subtraction uses all 12 bits together.
• Since full adders are used for the implementation, the result is delayed by the carry rippling through all 12 bits.
• Even a 'carry look ahead' circuit does not help in speeding up the computation, since it requires a large number of gates and inputs in this case.
• The answer to this problem is to divide the data width into smaller chunks and introduce pipelining.
• In the data width partitioning approach, all sub blocks do the same function.

Partition of functionality

• In this method, the functional block is divided into smaller sub blocks.
• In this type of partitioning, each sub block does a different function, in general.
• In the signed adder example to be presented, the LSBs (7 bits) of the eight numbers are added concurrently, followed by the addition of the MSBs (5 bits, along with the carry from the LSB addition) in subsequent pipeline stages.
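
The LSB/MSB split described above can be checked in software before writing any hardware. The following Python sketch is an illustrative reference model, not the original design; `to_signed` and `split_add12` are hypothetical helper names. It adds two 12-bit two's complement words by first adding the 7 LSBs and then adding the sign-extended 5 MSBs together with the carry.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def split_add12(a, b):
    """Add two 12-bit two's complement words in two steps (LSB, then MSB + carry)."""
    a &= 0xFFF
    b &= 0xFFF
    lsb = (a & 0x7F) + (b & 0x7F)            # 8-bit result, bit 7 is the carry
    carry = lsb >> 7
    msb = (to_signed(a >> 7, 5) + to_signed(b >> 7, 5) + carry) & 0x3F  # 6 bits
    return to_signed((msb << 7) | (lsb & 0x7F), 13)


print(split_add12(0x7FF, 0x7FF))   # two large positive numbers
print(split_add12(0x800, 0x800))   # two large negative numbers
```

Because the second step needs only the carry from the first, the two steps can sit in consecutive pipeline stages, each with a short carry chain.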

________________________________
__________________________
ADDER CAN BE REALIZED IN TWO DIFFERENT WAYS:

• Feeding inputs serially
• Feeding inputs concurrently

SERIAL SIGNED ADDER DESIGN

[Block diagram: serial_adder12s with inputs n [11:0] (n0 – n7), enable and clk; output sum [14:0]]

// Pipelined Serial Signed Adder Design - Verilog Code
// Adds eight 12-bit, 2's complement nos. Feed inputs serially at 'n'.
// Eight accumulation steps, each registered @ posedge of clk.
// Result, sum, is 15 bits wide, in 2's complement
// (registered output).

module serial_adder12s (
clk,
enable,
n,
sum,
sum_valid,
result
);

input clk ;
input enable ;

input [11:0] n;

output [14:0] sum ;


output sum_valid ;
output [14:0] result ;

// Extend the result till it is overwritten by the new result.

wire [14:0] sum_next ;


wire [2:0] cnt_next ;
wire sum_val ;

reg [14:0] sum;


reg [2:0] cnt ;
reg sum_valid ;

reg [14:0] result ;

assign sum_next[14:0] = enable ?
                        ({{3{n[11]}},n[11:0]}+sum[14:0]) : 0 ;

// Sign extend & accumulate.

assign cnt_next[2:0] = enable ? (cnt+1) : 0 ;

// Pre-advance the counter.

assign sum_val = (cnt==7) ? 1 : 0 ;

// Pre-determine the validity of the sum.

always @ (posedge clk)

// Pipeline - Register the sum.

begin

sum[14:0] <= sum_next[14:0] ;

// Register the sum.

cnt[2:0] <= cnt_next[2:0] ;

// Advance the count.

sum_valid <= sum_val ;

// Register the signal.

end

always @ (posedge clk)

// Extend the result till it is overwritten by the new result.

begin

result[14:0] <= sum_valid ? sum[14:0] : result[14:0] ;

// Register the result (non-blocking, as in the other clocked block).

end
endmodule
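
The clock-by-clock behaviour of this module can be cross-checked against a software model. The following Python sketch is an illustrative reference model rather than part of the original design; `serial_adder` and `to_signed15` are hypothetical names, and all registers are assumed to start at zero (in simulation the Verilog registers start as unknown until `enable` has been low for one clock).

```python
def to_signed15(word):
    """Interpret a 15-bit word as a two's complement number."""
    word &= 0x7FFF
    return word - 0x8000 if word & 0x4000 else word


def serial_adder(samples, enables):
    """Return (sum, sum_valid, result) as seen after each positive clock edge."""
    total = cnt = valid = result = 0
    trace = []
    for n, en in zip(samples, enables):
        ext = (n & 0xFFF) | (0x7000 if n & 0x800 else 0)   # sign extend to 15 bits
        sum_next = (ext + total) & 0x7FFF if en else 0
        cnt_next = (cnt + 1) & 7 if en else 0
        sum_val = 1 if cnt == 7 else 0
        result_next = total if valid else result           # uses pre-edge values
        total, cnt, valid, result = sum_next, cnt_next, sum_val, result_next
        trace.append((total, valid, result))
    return trace


first_set = [0x000, 0xFFF, 0x7FF, 0x800, 0x001, 0x001, 0x7FF, 0xAAA]
trace = serial_adder(first_set + [0], [1] * 8 + [0])
for step in trace:
    print(to_signed15(step[0]), step[1], to_signed15(step[2]))
```

In this model `sum_valid` goes high on the eighth clock, when `sum` holds the total of all eight inputs, and `result` captures that total on the ninth clock.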

________________________________
__________________________

// Test Bench for Serial Adder Design

`define clkperiodby2 10

`include "serial_adder12s.v"

module serial_adder12s_test (
sum,
sum_valid,
result
);

output [14:0] sum;


output sum_valid ;

output [14:0] result;

reg clk ;

reg enable ;

reg [11:0] n;

serial_adder12s u1(

.clk(clk),
.enable(enable),
.n(n),
.sum(sum),
.sum_valid(sum_valid),
.result(result)
);

initial
begin
clk = 1'b0 ;

// Apply first set of inputs sequentially every 20 ns.

n = 12'h0 ; // n0 @ 0 ns.
enable =0;
#20 enable =1;

#17 n = 12'hfff ; // n1 @ 37 ns.


#20 n = 12'h7ff ; // n2 @ 57 ns, etc.
#20 n = 12'h800 ;
#20 n = 12'h001 ;
#20 n = 12'h001 ;
#20 n = 12'h7ff ;
#20 n = 12'haaa ; // n7 @ 157 ns.
#20 n = 12'h0 ;
enable = 0 ;          // Disable before applying
                      // the next set of inputs
                      // so that the accumulated
                      // 'sum' is cleared.

#20 enable = 1 ;
                      // Apply the next set of inputs.
n =100 ; // n0
#20 n = 200 ;
#20 n = 300 ;
#20 n = 400 ;
#20 n = 500 ;
#20 n = 100 ;
#20 n = 200 ;
#20 n = 247 ; // n7

#20 enable = 0 ;

#100

$stop ;

end
always

#`clkperiodby2 clk <= ~clk ;

// Run the clock at 50 MHz.

endmodule

________________________________
__________________________

Simulation results of serial signed adder


_______________________________________
____________________________

Synplify results

Max. frequency of operation: 138 MHz.

Mapping to part: xcv600ehq240-8


Cell usage:
MUXCY_L 14 uses
XORCY 14 uses
FDR 19 uses
FDE 15 uses
GND 1 use

I/O primitives:

IBUF 13 uses
OBUF 31 uses
BUFGP 1 use

I/O Register bits: 15


Register bits not including I/Os: 19 (0%)

Global Clock Buffers: 1 of 4 (25%)

Mapping Summary:
Total LUTs: 18 (0%)

Mapper successful!
Xilinx P&R Results

Design Summary:
Number of errors: 0
Number of warnings: 0
Number of Slices:              11 out of 6,912    1%
Number of Slices containing
  unrelated logic:              0 out of 11       0%
Number of Slice Flip Flops:    19 out of 13,824   1%
Number of 4 input LUTs:        18 out of 13,824   1%
Number of bonded IOBs:         44 out of 158     27%
IOB Flip Flops:                15
Number of GCLKs:                1 out of 4       25%
Number of GCLKIOBs:             1 out of 4       25%

Total equivalent gate count for design: 464


Additional JTAG gate count for IOBs: 2,160
Mapping completed.

Maximum frequency: 174.307MHz

_______________________________________
____________________________

PARALLEL SIGNED ADDER DESIGN

[Block diagram: adder12s with inputs n0 [11:0] to n7 [11:0]]

Complement evaluation (shortcut)

Step   [8].....[0]   Operation
        11110000     Data
1          10000     Retain first 1 followed by 0s
2       00010000     Invert other bits

• Sign can be extended by any number of bits without affecting the actual value.
• Sign extend means duplicate the MSB ([8]=[7]).

Extend
Sign
[8].....[0]

 111111111     -1        111111111     -1
 111111111     -1        000000001     +1
 _________   ____        _________   ____
 111111110     -2        000000000      0
 _________   ____        _________   ____

Ignore carry.

 001111111   +127        110000000   -128
 001111111   +127        110000000   -128
 _________   ____        _________   ____
 011111110   +254        100000000   -256
 _________   ____        _________   ____

• Without the sign extension, the MSB [7] of a large positive sum such as +254 would be mistaken for the sign bit of a negative number.
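
Both rules above are easy to try on bit strings. The following Python sketch is illustrative only; `sign_extend`, `to_int`, `negate_shortcut` and `add_extended` are hypothetical helper names. It works on strings of '0' and '1' so that the bit patterns match the slides.

```python
def sign_extend(bits, extra=1):
    """Duplicate the MSB of a bit string `extra` times."""
    return bits[0] * extra + bits


def to_int(bits):
    """Value of a two's complement bit string."""
    value = int(bits, 2)
    return value - (1 << len(bits)) if bits[0] == "1" else value


def negate_shortcut(bits):
    """Keep the lowest 1 and the 0s after it, invert every bit above it."""
    pos = bits.rfind("1")
    if pos < 0:
        return bits                      # zero stays zero
    flipped = "".join("1" if b == "0" else "0" for b in bits[:pos])
    return flipped + bits[pos:]


def add_extended(a, b):
    """Sign extend both operands by one bit, add, and ignore the carry out."""
    width = len(a) + 1
    total = int(sign_extend(a), 2) + int(sign_extend(b), 2)
    return format(total & ((1 << width) - 1), "0{}b".format(width))


print(negate_shortcut("11110000"))
print(add_extended("01111111", "01111111"), to_int(add_extended("01111111", "01111111")))
print(to_int("11111110"))                # the same sum kept to 8 bits is misread
```

The last line shows the failure mode named in the final bullet: without the extra sign bit, the 8-bit pattern of +254 reads as a negative number.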

Pipelined design partition

[Figure: inputs n0 [11:0] to n7 [11:0] are added in pairs over the first, second and third stages. In each stage the LSB sum is registered on clk, then the MSB sum is registered on clk, leading to the final result.]

Verilog code for signed adder

// Adds eight 12 bit, 2's complement nos.,
// n0 to n7.
// Five pipeline stages registered @ posedge clk.
// Result, sum, is in 15 bit, 2's complement
// (not registered).

module adder12s(
clk,

n0,n1,n2,n3,n4,n5,n6,n7,
sum
) ;
input clk ;

input [11:0] n0, n1, n2, n3,


n4, n5, n6, n7;

output [14:0] sum ;

wire [7:0] s00_lsb ;


wire [7:0] s01_lsb ;
wire [7:0] s02_lsb ;
wire [7:0] s03_lsb ;

wire [5:0] s00_msb ;


wire [5:0] s01_msb ;
wire [5:0] s02_msb ;
wire [5:0] s03_msb ;

wire [7:0] s10_lsb ;


wire [7:0] s11_lsb ;

wire [6:0] s10_msb ;


wire [6:0] s11_msb ;
wire [7:0] s20_lsb ;

reg [11:7] n0_reg1 ;


reg [11:7] n1_reg1 ;
reg [11:7] n2_reg1 ;
reg [11:7] n3_reg1 ;
reg [11:7] n4_reg1 ;
reg [11:7] n5_reg1 ;
reg [11:7] n6_reg1 ;
reg [11:7] n7_reg1 ;

reg [7:0] s00_lsbreg1 ;


reg [7:0] s01_lsbreg1 ;
reg [7:0] s02_lsbreg1 ;
reg [7:0] s03_lsbreg1 ;

reg [5:0] s00_msbreg2 ;


reg [5:0] s01_msbreg2 ;
reg [5:0] s02_msbreg2 ;
reg [5:0] s03_msbreg2 ;

reg [6:0] s00_lsbreg2 ;


reg [6:0] s01_lsbreg2 ;
reg [6:0] s02_lsbreg2 ;
reg [6:0] s03_lsbreg2 ;

reg [7:0] s10_lsbreg3 ;


reg [7:0] s11_lsbreg3 ;

reg [5:0] s00_msbreg3 ;


reg [5:0] s01_msbreg3 ;
reg [5:0] s02_msbreg3 ;
reg [5:0] s03_msbreg3 ;

reg [6:0] s10_lsbreg4 ;


reg [6:0] s11_lsbreg4 ;

reg [6:0] s10_msbreg4 ;


reg [6:0] s11_msbreg4 ;

reg [6:0] s10_msbreg5 ;


reg [6:0] s11_msbreg5 ;

reg s20_lsbreg5cy ;
reg [6:0] s20_lsbreg5 ;
// First stage addition

assign s00_lsb[7:0] = n0[6:0]+n1[6:0] ;
                      // Add lsb first - s00_lsb[7] is the carry.

assign s01_lsb[7:0] = n2[6:0]+n3[6:0] ;
                      // n0-n7 lsb need not be registered since
                      // addition is already carried out here.

assign s02_lsb[7:0] = n4[6:0]+n5[6:0] ;

assign s03_lsb[7:0] = n6[6:0]+n7[6:0] ;

always @ (posedge clk)

// Pipeline 1: clk (1). Register msb to
// continue addition of msb.

begin

n0_reg1[11:7] <= n0[11:7] ;    // Preserve all inputs for msb addition
                               // during the clk (2).
n1_reg1[11:7] <= n1[11:7] ;
n2_reg1[11:7] <= n2[11:7] ;
n3_reg1[11:7] <= n3[11:7] ;
n4_reg1[11:7] <= n4[11:7] ;
n5_reg1[11:7] <= n5[11:7] ;
n6_reg1[11:7] <= n6[11:7] ;
n7_reg1[11:7] <= n7[11:7] ;

s00_lsbreg1[7:0] <= s00_lsb[7:0] ;
                               // Preserve all lsb sum.
                               // s00_lsbreg1[7] is the registered carry
                               // from lsb addition.
s01_lsbreg1[7:0] <= s01_lsb[7:0] ;
s02_lsbreg1[7:0] <= s02_lsb[7:0] ;
s03_lsbreg1[7:0] <= s03_lsb[7:0] ;

end

// Sign extended & msb added with carry.

assign s00_msb[5:0] = {n0_reg1[11], n0_reg1[11:7]}+
                      {n1_reg1[11], n1_reg1[11:7]}+s00_lsbreg1[7];
                      // s00_msb[6] is ignored.

assign s01_msb[5:0] = {n2_reg1[11], n2_reg1[11:7]}+
                      {n3_reg1[11], n3_reg1[11:7]}+s01_lsbreg1[7];

assign s02_msb[5:0] = {n4_reg1[11], n4_reg1[11:7]}+
                      {n5_reg1[11], n5_reg1[11:7]}+s02_lsbreg1[7];

assign s03_msb[5:0] = {n6_reg1[11], n6_reg1[11:7]}+
                      {n7_reg1[11], n7_reg1[11:7]}+s03_lsbreg1[7];

always @ (posedge clk)

// Pipeline 2: clk (2). Register msb to
// continue addition of msb.

begin

s00_msbreg2[5:0] <= s00_msb[5:0] ;    // Preserve all msb sum.
s01_msbreg2[5:0] <= s01_msb[5:0] ;
s02_msbreg2[5:0] <= s02_msb[5:0] ;
s03_msbreg2[5:0] <= s03_msb[5:0] ;

s00_lsbreg2[6:0] <= s00_lsbreg1[6:0] ;    // Preserve all lsb sum.
s01_lsbreg2[6:0] <= s01_lsbreg1[6:0] ;
s02_lsbreg2[6:0] <= s02_lsbreg1[6:0] ;
s03_lsbreg2[6:0] <= s03_lsbreg1[6:0] ;

end
// Second stage addition

assign s10_lsb[7:0] = s00_lsbreg2[6:0] + s01_lsbreg2[6:0] ;
                      // Add lsb first - s10_lsb[7] is
                      // the carry.

assign s11_lsb[7:0] = s02_lsbreg2[6:0] + s03_lsbreg2[6:0] ;
                      // s00,s01 lsbs need not be registered
                      // since addition is already carried
                      // out here.

always @ (posedge clk)

// Pipeline 3: clk (3). Register msb to
// continue addition of msb.

begin

s10_lsbreg3[7:0] <= s10_lsb[7:0] ;    // Preserve all lsb sum.
s11_lsbreg3[7:0] <= s11_lsb[7:0] ;

s00_msbreg3[5:0] <= s00_msbreg2[5:0] ;    // Preserve all msb sum.
s01_msbreg3[5:0] <= s01_msbreg2[5:0] ;
s02_msbreg3[5:0] <= s02_msbreg2[5:0] ;
s03_msbreg3[5:0] <= s03_msbreg2[5:0] ;

end

assign s10_msb[6:0] = {s00_msbreg3[5], s00_msbreg3[5:0]}+
                      {s01_msbreg3[5], s01_msbreg3[5:0]}+
                      s10_lsbreg3[7] ;

// Add MSB of 2nd stage with sign extension
// and carry in from LSB.
// s10_msb[7] is ignored.

assign s11_msb[6:0] = {s02_msbreg3[5], s02_msbreg3[5:0]}+
                      {s03_msbreg3[5], s03_msbreg3[5:0]}+
                      s11_lsbreg3[7] ;

always @ (posedge clk)

// Pipeline 4: clk (4). Register msb to
// continue addition of msb.

begin

s10_lsbreg4[6:0] <= s10_lsbreg3[6:0] ;    // Preserve all lsb sum.
s11_lsbreg4[6:0] <= s11_lsbreg3[6:0] ;

s10_msbreg4[6:0] <= s10_msb[6:0] ;        // Preserve all msb sum.
s11_msbreg4[6:0] <= s11_msb[6:0] ;

end

// Third stage addition.

assign s20_lsb[7:0] = s10_lsbreg4[6:0]+ s11_lsbreg4[6:0] ;
                      // Add lsb first - s20_lsb[7] is
                      // the carry.

always @ (posedge clk)

// Pipeline 5: clk (5). Register msb to
// continue addition of msb.

begin

s10_msbreg5[6:0] <= s10_msbreg4[6:0] ;    // Preserve all msb sum.
s11_msbreg5[6:0] <= s11_msbreg4[6:0] ;

s20_lsbreg5cy <= s20_lsb[7];              // Preserve all lsb sum.
s20_lsbreg5[6:0] <= s20_lsb[6:0];

end

// Add third stage MSB result and concatenate
// with LSB result to get the final result.

assign sum[14:0] = {({s10_msbreg5[6], s10_msbreg5[6:0]}+
                     {s11_msbreg5[6], s11_msbreg5[6:0]}+
                     s20_lsbreg5cy),
                    s20_lsbreg5[6:0]};

endmodule
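
The arithmetic of the three addition stages can be verified independently of the clocking. The following Python sketch is an illustrative functional model, not a cycle-accurate one and not part of the original design; `sext`, `add_split` and `adder12s` (reused here as a Python function name) are hypothetical helpers. Each node adds the 7-bit LSB fields, then adds the sign-extended MSB fields plus the carry, with the MSB field growing by one bit per stage as in the Verilog (5, 6, 7, then 8 bits).

```python
def sext(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def add_split(a_lsb, a_msb, b_lsb, b_msb, msb_bits):
    """One adder node: LSB add first, then MSB add with the carry."""
    lsb = a_lsb + b_lsb                              # bit 7 is the carry
    msb = sext(a_msb, msb_bits) + sext(b_msb, msb_bits) + (lsb >> 7)
    return lsb & 0x7F, msb & ((1 << (msb_bits + 1)) - 1)


def adder12s(words):
    """Sum eight 12-bit two's complement words; returns a signed 15-bit value."""
    nodes = [(w & 0x7F, (w >> 7) & 0x1F) for w in words]
    msb_bits = 5
    while len(nodes) > 1:                            # three stages for eight inputs
        nodes = [add_split(*nodes[i], *nodes[i + 1], msb_bits)
                 for i in range(0, len(nodes), 2)]
        msb_bits += 1
    lsb, msb = nodes[0]
    return sext((msb << 7) | lsb, 15)


print(adder12s([0x7FF] * 8))
print(adder12s([0x800] * 8))
print(adder12s([0xAAA, 0x555] * 4))
```

The three calls use input patterns that also appear in the test bench that follows, so the printed values can be compared with the simulated `sum`.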

________________________________
__________________________
TEST BENCH FOR PARALLEL SIGNED ADDER DESIGN

`define clkperiodby2 10

`include "adder12s_banno.v"
// Use back annotated file.

module adder12s_test (
sum
);

output [14:0] sum;

reg clk ;

reg [11:0] n0 ;
reg [11:0] n1 ;

reg [11:0] n2 ;

reg [11:0] n3 ;

reg [11:0] n4 ;

reg [11:0] n5 ;

reg [11:0] n6 ;

reg [11:0] n7 ;

adder12s u1(

.clk(clk),
.n0(n0),
.n1(n1),
.n2(n2),
.n3(n3),
.n4(n4),
.n5(n5),
.n6(n6),
.n7(n7),
.sum(sum)

);

initial

begin

clk = 1'b0 ;

n0 = 12'h0 ;
n1 = 12'h0 ;
n2 = 12'h0 ;
n3 = 12'h0 ;
n4 = 12'h0 ;
n5 = 12'h0 ;
n6 = 12'h0 ;
n7 = 12'h0 ;
#17 n0 = 12'hfff ;
n1 = 12'hfff ;
n2 = 12'hfff ;
n3 = 12'hfff ;
n4 = 12'hfff ;
n5 = 12'hfff ;
n6 = 12'hfff ;
n7 = 12'hfff ;

#20 n0 = 12'h7ff ;
n1 = 12'h7ff ;
n2 = 12'h7ff ;
n3 = 12'h7ff ;
n4 = 12'h7ff ;
n5 = 12'h7ff ;
n6 = 12'h7ff ;
n7 = 12'h7ff ;

#20 n0 = 12'h800 ;
n1 = 12'h800 ;
n2 = 12'h800 ;
n3 = 12'h800 ;
n4 = 12'h800 ;
n5 = 12'h800 ;
n6 = 12'h800 ;
n7 = 12'h800 ;

#20 n0 = 12'h001 ;
n1 = 12'h001 ;
n2 = 12'h001 ;
n3 = 12'h001 ;
n4 = 12'h001 ;
n5 = 12'h001 ;
n6 = 12'h001 ;
n7 = 12'h001 ;

#20 n0 = 12'h001 ;
n1 = 12'hfff ;
n2 = 12'h001 ;
n3 = 12'hfff ;
n4 = 12'h001 ;
n5 = 12'hfff ;
n6 = 12'h001 ;
n7 = 12'hfff ;

#20 n0 = 12'h7ff ;
n1 = 12'h7ff ;
n2 = 12'h7ff ;
n3 = 12'h7ff ;
n4 = 12'h801 ;
n5 = 12'h801 ;
n6 = 12'h801 ;
n7 = 12'h801 ;

#20 n0 = 12'haaa ;
n1 = 12'h555 ;
n2 = 12'haaa ;
n3 = 12'h555 ;
n4 = 12'haaa ;
n5 = 12'h555 ;
n6 = 12'haaa ;
n7 = 12'h555 ;

#20 n0 = 12'h0 ;
n1 = 12'h0 ;
n2 = 12'h0 ;
n3 = 12'h0 ;
n4 = 12'h0 ;
n5 = 12'h0 ;
n6 = 12'h0 ;
n7 = 12'h0 ;

#400
$stop ;

end

always

#`clkperiodby2 clk <= ~clk ;

endmodule

________________________________
__________________________

Simulation results of eight input parallel signed adder
________________________________
__________________________
Synplify synthesis Results

@I::"D:\user\ram\verilog_latest\dvlsi_des_verilog\adder12s.v"
Verilog syntax check successful!
Selecting top level module adder12s
Synthesizing module adder12s

Performance Summary
*******************

Worst slack in design: 1.136

                  Requested    Estimated
Starting Clock    Frequency    Frequency
-------------------------------------------
clk               100.0 MHz    112.8 MHz
===========================================

Requested    Estimated               Clock
Period       Period       Slack      Type
-------------------------------------------
10.000       8.864        1.136      inferred
===========================================

Resource Usage Report for adder12s

Mapping to part: xcv600ehq240-8

Cell usage:
MUXCY_L 81 uses
XORCY 88 uses
MUXCY 7 uses
FD 214 uses
GND 1 use

I/O primitives:
IBUF 96 uses
OBUF 15 uses
BUFGP 1 use

I/O Register bits: 47
Register bits not including I/Os: 167 (1%)

Global Clock Buffers: 1 of 4 (25%)

Mapping Summary:
Total LUTs: 95 (0%)

Mapper successful!
________________________________
__________________________

Xilinx P&R Results

Design Summary:
Number of errors: 0
Number of warnings: 0
Number of Slices:              97 out of 6,912    1%
Number of Slices containing
  unrelated logic:              0 out of 97       0%
Number of Slice Flip Flops:   167 out of 13,824   1%
Number of 4 input LUTs:        95 out of 13,824   1%
Number of bonded IOBs:        111 out of 158     70%
IOB Flip Flops:                47
Number of GCLKs:                1 out of 4       25%
Number of GCLKIOBs:             1 out of 4       25%

Total equivalent gate count for design: 2,810
Additional JTAG gate count for IOBs: 5,376

Mapping completed.

Timing summary:
---------------

Design statistics:
Minimum period: 6.563ns (Maximum frequency: 152.369MHz)
Minimum input arrival time before clock: 4.259ns
Minimum output required time after clock: 11.083ns
Running DRC.
DRC detected 0 errors and 0 warnings.
Creating bit map...
Saving bit stream in "adder12s.bit".
Creating bit mask...
Saving mask bit stream in "adder12s.msk".
Bitstream generation is complete.

________________________________
__________________________

COMPARISON OF SERIAL ADDER AND PARALLEL ADDER
WITH EIGHT INPUTS

-----------------------------------------------
Type of Adder                 Serial   Parallel
-----------------------------------------------
No. of i/p clk cycles         8        1
-----------------------------------------------
No. of o/p clk cycles         9        1
-----------------------------------------------
Gate count                    464      2,810
JTAG gate count               2,160    5,376
-----------------------------------------------
Max. freq. of operation       174      152
in MHz
-----------------------------------------------
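
The table can be turned into a rough throughput and area comparison. The following Python sketch is illustrative only and rests on one assumption: the "o/p clk cycles" row is read as the number of clocks between successive results (9 for the serial adder, 1 for the parallel adder). The dictionary keys and the name `sums_per_us` are hypothetical.

```python
# Figures copied from the comparison table above.
serial = {"clocks_per_sum": 9, "fmax_mhz": 174, "gates": 464}
parallel = {"clocks_per_sum": 1, "fmax_mhz": 152, "gates": 2810}


def sums_per_us(design):
    """Completed eight-input sums per microsecond at the maximum clock rate."""
    return design["fmax_mhz"] / design["clocks_per_sum"]


speedup = sums_per_us(parallel) / sums_per_us(serial)
area_ratio = parallel["gates"] / serial["gates"]

print(round(sums_per_us(serial), 1), sums_per_us(parallel))
print(round(speedup, 2), round(area_ratio, 2))
```

Under that reading, the parallel adder delivers several times more sums per second than the serial one, at the cost of a correspondingly larger gate count.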

________________________________
__________________________

MULTIPLIER DESIGN – A NEW ALGORITHM

[Block diagram: mult11sx8s with inputs n1 [10:0], n2 [7:0] and clk; 8 pipeline stages]

Example:

Consider the evaluation of the product of two signed numbers:

1023 x -128 = -130944

Binary, signed representation (19-bit two's complement result):

01111111111 x 10000000 = 1100000000010000000

n1 (magnitude) x n2 (magnitude):

            01111111111
     x         10000000
     __________________

            00000000000   P1
           00000000000    P2
          00000000000     P3
         00000000000      P4
        00000000000       P5
       00000000000        P6
      00000000000         P7
     01111111111          P8
     __________________

     011111111110000000   (magnitude)
     __________________
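
The worked example can be reproduced with a short sign-and-magnitude model. The following Python sketch is illustrative, not the original design; `magnitude`, `partial_products` and `multiply` are hypothetical names. It takes the magnitudes of both operands, forms the eight partial products, sums them with their shifts, and converts the result back to a 19-bit two's complement word when the signs differ.

```python
def magnitude(word, bits):
    """Magnitude of a two's complement word (invert and add 1 if negative)."""
    mask = (1 << bits) - 1
    word &= mask
    return (~word + 1) & mask if word >> (bits - 1) else word


def partial_products(n1_mag, n2_mag):
    """P1..P8: n1 magnitude gated by each bit of n2 magnitude (not yet shifted)."""
    return [n1_mag if (n2_mag >> i) & 1 else 0 for i in range(8)]


def multiply(n1, n2):
    """n1 is an 11-bit word, n2 an 8-bit word; returns the 19-bit product word."""
    n1 &= 0x7FF
    n2 &= 0xFF
    if n1 == 0 or n2 == 0:
        return 0                                     # force +0
    pp = partial_products(magnitude(n1, 11), magnitude(n2, 8))
    mag = sum(p << i for i, p in enumerate(pp))      # 18-bit magnitude
    if ((n1 >> 10) ^ (n2 >> 7)) & 1:                 # signs differ
        return (1 << 18) | ((~mag + 1) & 0x3FFFF)
    return mag


product = multiply(0b01111111111, 0b10000000)        # 1023 x -128
print(format(product, "019b"), product - (1 << 19))
```

For 1023 x -128 only P8 is non-zero, which is why the magnitude is simply n1 shifted left by seven places.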

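Pipelined design partition

First stage:    S11 = P1 + (P2, left shifted by 1 bit)
                S12 = P3 + (P4, left shifted by 1 bit)
                S13 = P5 + (P6, left shifted by 1 bit)
                S14 = P7 + (P8, left shifted by 1 bit)

Second stage:   S21 = S11 + (S12, left shifted by 2 bits)
                S22 = S13 + (S14, left shifted by 2 bits)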

Verilog code for multiplier

// Signed multiplication of two numbers, n1
// (11-bit) & n2 (8-bit).
// n1 (Partial product, CX for example) is the
// multiplicand, and is signed.
// n2 (cos term, CT for example) is the signed
// multiplier.
// Result (CX)CT is in twos complement.
// CX, CT are used in DCTQ Processor.

// This module has eight pipeline stages to
// increase the speed - input is not
// registered.

module mult11sx8s(
clk,
n1,
n2,
result
) ;

input clk ;
input [10:0] n1 ;
input [7:0] n2 ;
output [18:0] result ;

wire n1orn2z ;

wire [10:0] p1, p2, p3, p4, p5, p6, p7, p8 ;

wire [6:0] s11a, s12a, s13a, s14a ;
wire [5:0] s11b, s12b, s13b, s14b ;
wire [12:0] s11, s12, s13, s14 ;

wire [7:0] s21a, s22a ;
wire [6:0] s21b, s22b ;
wire [14:0] s21, s22 ;

wire [8:0] s31a ;
wire [7:0] s31b ;
wire [17:0] s31 ;

wire res_sign ;
wire [18:0] res ;

reg [10:0] n1_mag ;
reg [7:0] n2_mag ;

reg [10:0] p1_reg1, p2_reg1, p3_reg1, p4_reg1 ;
reg [10:0] p5_reg1, p6_reg1, p7_reg1, p8_reg1 ;

reg [6:0] s11a_reg2, s12a_reg2, s13a_reg2, s14a_reg2 ;

reg n1_reg1, n1_reg2, n1_reg3, n1_reg4, n1_reg5, n1_reg6, n1_reg7 ;
reg n2_reg1, n2_reg2, n2_reg3, n2_reg4, n2_reg5, n2_reg6, n2_reg7 ;

reg n1orn2z_reg1, n1orn2z_reg2, n1orn2z_reg3, n1orn2z_reg4 ;
reg n1orn2z_reg5, n1orn2z_reg6, n1orn2z_reg7 ;

reg [10:0] p1_reg2, p2_reg2, p3_reg2, p4_reg2 ;
reg [10:0] p5_reg2, p6_reg2, p7_reg2, p8_reg2 ;

reg [12:0] s11_reg3, s12_reg3, s13_reg3, s14_reg3 ;
reg [12:0] s11_reg4, s12_reg4, s13_reg4, s14_reg4 ;

reg [7:0] s21a_reg4, s22a_reg4 ;

reg [14:0] s21_reg5, s22_reg5 ;
reg [14:0] s21_reg6, s22_reg6 ;

reg [8:0] s31a_reg6 ;

reg [17:0] s31_reg7 ;

reg [18:0] result ;

always @(n1)

begin
if(n1[10] == 1'b0)
  n1_mag = n1[10:0];
else
  n1_mag = ~n1[10:0] + 1;    // Evaluate twos complement.

end

always @(n2)
begin

if(n2[7] == 1'b0)
  n2_mag = n2[7:0];
else
  n2_mag = ~n2[7:0] + 1;     // Evaluate twos complement.

end

assign n1orn2z = ((n1 == 11'b0)||(n2 == 8'b0)) ? 1'b1:1'b0;

// If n1 or n2 is zero, make final
// result +0.

// Compute the partial products.

assign p1 = n1_mag[10:0] & {11{n2_mag[0]}};    // n1 multiplied by n2 bit '0', etc.
assign p2 = n1_mag[10:0] & {11{n2_mag[1]}};
assign p3 = n1_mag[10:0] & {11{n2_mag[2]}};
assign p4 = n1_mag[10:0] & {11{n2_mag[3]}};
assign p5 = n1_mag[10:0] & {11{n2_mag[4]}};
assign p6 = n1_mag[10:0] & {11{n2_mag[5]}};
assign p7 = n1_mag[10:0] & {11{n2_mag[6]}};
assign p8 = n1_mag[10:0] & {11{n2_mag[7]}};

always @ (posedge clk)

// This is the first pipeline register,
// clk (1).

begin

p1_reg1 <= p1;
p2_reg1 <= p2;
p3_reg1 <= p3;
p4_reg1 <= p4;
p5_reg1 <= p5;
p6_reg1 <= p6;
p7_reg1 <= p7;
p8_reg1 <= p8;

n1_reg1 <= n1[10];
n2_reg1 <= n2[7];
n1orn2z_reg1 <= n1orn2z;

end

// p1_reg1, etc. means p1, etc. are registered
// after positive edge of clk (1), clk (2),
// etc.

assign s11a[6:0] = p1_reg1[6:1] + p2_reg1[5:0];
                   // LSB is added here.

assign s12a[6:0] = p3_reg1[6:1] + p4_reg1[5:0];
                   // Note the left shifts are
                   // taken care of.

assign s13a[6:0] = p5_reg1[6:1] + p6_reg1[5:0];
                   // for p1, p3, p5 and p7.

assign s14a[6:0] = p7_reg1[6:1] + p8_reg1[5:0];
                   // p1_reg1[0], etc. will be
                   // processed at the clk (2).
                   // s11a[6], etc. are the
                   // carry bits.

always @ (posedge clk)

// This is the second pipeline register,
// clk (2).

begin

s11a_reg2 <= s11a;                 // Store LSB partial sums.
s12a_reg2 <= s12a;
s13a_reg2 <= s13a;
s14a_reg2 <= s14a;

p1_reg2[10:7] <= p1_reg1[10:7];    // Store MSB of partial products.
p2_reg2[10:6] <= p2_reg1[10:6];
p3_reg2[10:7] <= p3_reg1[10:7];
p4_reg2[10:6] <= p4_reg1[10:6];
p5_reg2[10:7] <= p5_reg1[10:7];
p6_reg2[10:6] <= p6_reg1[10:6];
p7_reg2[10:7] <= p7_reg1[10:7];
p8_reg2[10:6] <= p8_reg1[10:6];

p1_reg2[0] <= p1_reg1[0];          // Store '0' th bit since it is not
p3_reg2[0] <= p3_reg1[0];          // yet processed.
p5_reg2[0] <= p5_reg1[0];
p7_reg2[0] <= p7_reg1[0];

n1_reg2 <= n1_reg1;                // Also store sign bits and zero status.
n2_reg2 <= n2_reg1;
n1orn2z_reg2 <= n1orn2z_reg1;

end

// MSB is added here along with carry.

assign s11b[5:0] = {1'b0, p1_reg2[10:7]} +
                   p2_reg2[10:6] +
                   s11a_reg2[6];

assign s12b[5:0] = {1'b0, p3_reg2[10:7]} +
                   p4_reg2[10:6] +
                   s12a_reg2[6];

assign s13b[5:0] = {1'b0, p5_reg2[10:7]} +
                   p6_reg2[10:6] +
                   s13a_reg2[6];

assign s14b[5:0] = {1'b0, p7_reg2[10:7]} +
                   p8_reg2[10:6] +
                   s14a_reg2[6];

// MSBs & LSBs are concatenated here.

assign s11[12:0] = {s11b, s11a_reg2[5:0], p1_reg2[0]};
                   // MSB, LSB, '0' th bit respectively.

assign s12[12:0] = {s12b, s12a_reg2[5:0], p3_reg2[0]};

assign s13[12:0] = {s13b, s13a_reg2[5:0], p5_reg2[0]};

assign s14[12:0] = {s14b, s14a_reg2[5:0], p7_reg2[0]};
always @ (posedge clk)

// This is the third pipeline register,
// clk (3). First stage results.

begin

s11_reg3 <= s11;    // Store for further processing.
s12_reg3 <= s12;
s13_reg3 <= s13;
s14_reg3 <= s14;

n1_reg3 <= n1_reg2;
n2_reg3 <= n2_reg2;
n1orn2z_reg3 <= n1orn2z_reg2;

end

assign s21a[7:0] = s11_reg3[8:2] + s12_reg3[6:0];
                   // s21a[7] is the carry.

assign s22a[7:0] = s13_reg3[8:2] + s14_reg3[6:0];
                   // LSB sum, 2nd stage.

always @ (posedge clk)

// This is the fourth pipeline register,
// clk (4).

begin

s11_reg4[12:9] <= s11_reg3[12:9];    // Store bits not yet processed.
s11_reg4[1:0] <= s11_reg3[1:0];
s12_reg4[12:7] <= s12_reg3[12:7];
s13_reg4[12:9] <= s13_reg3[12:9];
s13_reg4[1:0] <= s13_reg3[1:0];
s14_reg4[12:7] <= s14_reg3[12:7];

s21a_reg4 <= s21a;                   // Store LSB, second stage partial sums.
s22a_reg4 <= s22a;

n1_reg4 <= n1_reg3;
n2_reg4 <= n2_reg3;
n1orn2z_reg4 <= n1orn2z_reg3;

end

// Add second stage MSBs with carry.

assign s21b[6:0] = {2'b0, s11_reg4[12:9]} +
                   s12_reg4[12:7] +
                   s21a_reg4[7];

assign s22b[6:0] = {2'b0, s13_reg4[12:9]} +
                   s14_reg4[12:7] +
                   s22a_reg4[7];

assign s21[14:0] = {s21b[5:0], s21a_reg4[6:0], s11_reg4[1:0]} ;
                   // {MSB, LSB, [1:0]}

// Result will never affect s21b[6],
// which is always 0.

assign s22[14:0] = {s22b[5:0], s22a_reg4[6:0], s13_reg4[1:0]} ;

always @ (posedge clk)

// This is the fifth pipeline register,
// clk (5).

begin

s21_reg5 <= s21;    // Store for further processing.
s22_reg5 <= s22;

n1_reg5 <= n1_reg4;
n2_reg5 <= n2_reg4;
n1orn2z_reg5 <= n1orn2z_reg4;

end

assign s31a[8:0] = s21_reg5[11:4] + s22_reg5[7:0];
                   // 3rd stage LSB computed here.

always @ (posedge clk)

// This is the sixth pipeline register,
// clk (6).

begin

s21_reg6[14:12] <= s21_reg5[14:12];    // Preserve MSB.
s22_reg6[14:8] <= s22_reg5[14:8];
s21_reg6[3:0] <= s21_reg5[3:0];
s31a_reg6 <= s31a;                     // 3rd stage LSB registered here.

n1_reg6 <= n1_reg5;
n2_reg6 <= n2_reg5;
n1orn2z_reg6 <= n1orn2z_reg5;

end

assign s31b[7:0] = {4'b0, s21_reg6[14:12]} +
                   s22_reg6[14:8] +
                   s31a_reg6[8];
                   // 3rd stage MSB computed here.

assign s31[17:0] = {s31b[5:0], s31a_reg6[7:0], s21_reg6[3:0]} ;

// Put MSB, LSB and [3:0] bits together.

// Note that the 3rd stage result will never affect
// s31b[7:6], which is always 0.

always @ (posedge clk)

// This is the seventh pipeline register,
// clk (7).

begin

n1_reg7 <= n1_reg6;    // Store intermediate results.
n2_reg7 <= n2_reg6;
s31_reg7 <= s31;
n1orn2z_reg7 <= n1orn2z_reg6;

end

assign res_sign = n1_reg7^n2_reg7;    // '1' means a -ve no.

assign res[18:0] = (res_sign) ?
                   {1'b1, (~s31_reg7 + 1'b1)} :
                   {1'b0, s31_reg7};

always @ (posedge clk)

// This is the eighth pipeline register,
// clk (8).

begin

if (n1orn2z_reg7 == 1'b1)
  result[18:0] <= 19'b0;
else
  result[18:0] <= res;    // This is the final result
                          // (product of two numbers)
                          // in twos complement.
end
endmodule
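
The bit slicing used in the three addition stages can be checked against ordinary multiplication. The following Python sketch is an illustrative functional model, not cycle-accurate and not part of the original design; `bits`, `stage1`, `stage2`, `stage3`, `signed` and the Python function `mult11sx8s` are hypothetical helpers that mirror the Verilog slices with the same names for the intermediate sums.

```python
def bits(value, hi, lo):
    """Extract value[hi:lo] as an unsigned integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def stage1(pa, pb):
    """s1x = pa + (pb << 1), as LSB sum, MSB sum with carry, and bit 0."""
    a = bits(pa, 6, 1) + bits(pb, 5, 0)                       # s1xa, 7 bits
    b = bits(pa, 10, 7) + bits(pb, 10, 6) + bits(a, 6, 6)     # s1xb, 6 bits
    return (b << 7) | (bits(a, 5, 0) << 1) | bits(pa, 0, 0)


def stage2(sa, sb):
    """s2x = sa + (sb << 2)."""
    a = bits(sa, 8, 2) + bits(sb, 6, 0)                       # s2xa, 8 bits
    b = bits(sa, 12, 9) + bits(sb, 12, 7) + bits(a, 7, 7)     # s2xb
    return (bits(b, 5, 0) << 9) | (bits(a, 6, 0) << 2) | bits(sa, 1, 0)


def stage3(sa, sb):
    """s31 = sa + (sb << 4)."""
    a = bits(sa, 11, 4) + bits(sb, 7, 0)                      # s31a, 9 bits
    b = bits(sa, 14, 12) + bits(sb, 14, 8) + bits(a, 8, 8)    # s31b
    return (bits(b, 5, 0) << 12) | (bits(a, 7, 0) << 4) | bits(sa, 3, 0)


def signed(word, width):
    """Interpret a `width`-bit word as a two's complement number."""
    return word - (1 << width) if word >> (width - 1) else word


def mult11sx8s(n1, n2):
    """Functional model of the multiplier; returns the 19-bit result word."""
    n1 &= 0x7FF
    n2 &= 0xFF
    if n1 == 0 or n2 == 0:
        return 0
    n1_mag = (-n1) & 0x7FF if n1 >> 10 else n1
    n2_mag = (-n2) & 0xFF if n2 >> 7 else n2
    p = [n1_mag if (n2_mag >> i) & 1 else 0 for i in range(8)]
    s1 = [stage1(p[i], p[i + 1]) for i in range(0, 8, 2)]
    s31 = stage3(stage2(s1[0], s1[1]), stage2(s1[2], s1[3]))
    if (n1 >> 10) ^ (n2 >> 7):
        return (1 << 18) | ((-s31) & 0x3FFFF)
    return s31


for n1, n2 in [(0x555, 0x55), (0x2AA, 0xAA), (0x7FF, 0x80), (0x7FF, 0x7F)]:
    print(hex(n1), hex(n2), signed(mult11sx8s(n1, n2), 19))
```

The operand pairs in the loop are taken from the test bench that follows, so the printed products can be compared with the simulated `result` eight clocks after each input is applied.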

________________________________
__________________________

TEST BENCH FOR MULTIPLIER

`define clkperiodby2 10

`include "mult11sx8s_banno.v"

module mult11sx8s_test (
result
);

output [18:0] result;


reg clk ;

reg [10:0] n1 ;

reg [7:0] n2 ;

mult11sx8s u1(

.clk(clk),
.n1(n1),
.n2(n2),
.result(result)

);

initial

begin

clk = 1'b0 ;
n1 = 11'h0 ;
n2 = 8'h0 ;

#17 n1 = 11'h555 ;
n2 = 8'h55;

#20 n1 = 11'h2aa ;
n2 = 8'haa;

#20 n1 = 11'h7ff ;
n2 = 8'h80;

#20 n1 = 11'h555 ;
n2 = 8'hff;

#20 n1 = 11'h7ff ;
n2 = 8'h81;

#20 n1 = 11'h555 ;
n2 = 8'h81;

#20 n1 = 11'h2aa ;
n2 = 8'h81;
#20 n1 = 11'h7ff ;
n2 = 8'h00;

#20 n1 = 11'h7ff ;
n2 = 8'h7f;

#20 n1 = 11'h000 ;
n2 = 8'hff;

#20 n1 = 11'h000 ;
n2 = 8'h7f;

#400

$stop ;

end

always

#`clkperiodby2 clk <= ~clk ;

endmodule
________________________________
__________________________

Simulation results of multiplier


________________________________
__________________________

Synplify results

@I::"D:\user\ram\verilog_latest\dvlsi_des_verilog\mult11sx8s.v"
Verilog syntax check successful!
Selecting top level module mult11sx8s
Synthesizing module mult11sx8s
@N:"D:\user\ram\verilog_latest\dvlsi_des_verilog\mult11sx8s.v":346:0:346:5|Found seqShift n1orn2z, depth=7, width=1
@N:"D:\user\ram\verilog_latest\dvlsi_des_verilog\mult11sx8s.v":346:0:346:5|Found seqShift n1, depth=6, width=1
@N:"D:\user\ram\verilog_latest\dvlsi_des_verilog\mult11sx8s.v":346:0:346:5|Found seqShift n2, depth=6, width=1
@W:"D:\user\ram\verilog_latest\dvlsi_des_verilog\mult11sx8s.v":202:0:202:5|Register bit s14a_reg2[6] is always 0, optimizing ...
@END

Performance Summary
*******************

Worst slack in design: 12.009

                  Requested    Estimated
Starting Clock    Frequency    Frequency
-------------------------------------------
clk               50.0 MHz     125.1 MHz
===========================================

Requested    Estimated               Clock
Period       Period       Slack      Type
-------------------------------------------
20.000       7.991        12.009     inferred
===========================================

Resource Usage Report for


mult11sx8s

Mapping to part: xcv600ehq240-8


Cell usage:
MUXCY_L 100 uses
XORCY 109 uses
MUXCY 9 uses
FDR 105 uses
FD 209 uses
GND 1 use
VCC 1 use

I/O primitives:
IBUF 19 uses
OBUF 19 uses

BUFGP 1 use
SRL primitives:
SRL16 9 uses

I/O Register bits: 22
Register bits not including I/Os: 292 (2%)

Global Clock Buffers: 1 of 4 (25%)

Mapping Summary:
Total LUTs: 181 (1%)

Mapper successful!

Xilinx P&R Results

Design Summary:
Number of errors: 0
Number of warnings: 0
Number of Slices:                  201 out of 6,912    2%
Number of Slices containing
  unrelated logic:                   0 out of 201      0%
Number of Slice Flip Flops:        292 out of 13,824   2%
Total Number 4 input LUTs:         178 out of 13,824   1%
  Number used as LUTs:             161
  Number used as a route-thru:       8
  Number used as Shift registers:    9
Number of bonded IOBs:              38 out of 158     24%
IOB Flip Flops:                     22
Number of GCLKs:                     1 out of 4       25%
Number of GCLKIOBs:                  1 out of 4       25%

Total equivalent gate count for design: 5,284
Additional JTAG gate count for IOBs: 1,872

Mapping completed.

Timing summary:
---------------

Timing errors: 0 Score: 0

Constraints cover 2328 paths, 0 nets, and 896 connections (100.0% coverage)

Design statistics:
Minimum period: 12.132ns (Maximum frequency: 82.427MHz)
Minimum input arrival time before clock: 10.150ns
Minimum output required time after clock: 5.617ns
Running DRC.
DRC detected 0 errors and 0 warnings.
Creating bit map...
Saving bit stream in "mult11sx8s.bit".
Creating bit mask...
Saving mask bit stream in "mult11sx8s.msk".
Bitstream generation is complete.
________________________________
__________________________
