Chapter 10 Arithmetic Ckts Presentation.V

Design of Arithmetic Circuits

Basic principle of pipelining

Traditional approach
[Figure: Input Data -> Process (< 100 ns, clocked by clk) -> Throughput 10 MHz]
Pipelining approach
• Throughput increases considerably.
• Chip area also increases.
• Latency comes into effect.
[Figure: Input Data -> Proc. 1 -> Reg. 1 -> ... -> Proc. 10 -> Reg. 10 -> Throughput 100 MHz; each Proc. stage takes < 10 ns and every register is clocked by clk]
Processing order
Time (ns)   Input    Reg. 1      Reg. 2     ...   Reg. 10
0           Data1
10          Data2    Proc.1_1
20          Data3    Proc.1_2    Proc.2_1
...
100         Data11   Proc.1_10   Proc.2_9   ...   Proc.10_1
Latency: 100 ns.
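The numbers on these slides can be reproduced with a small Python sketch. It assumes the 100 ns process is split into equal stages and uses the slide's upper bounds (100 ns, 10 ns) as exact delays; the function names are illustrative only.

```python
def pipeline_figures(total_delay_ns, stages):
    """Return (clock_mhz, latency_ns) for a process split into equal stages."""
    stage_delay_ns = total_delay_ns / stages   # delay of one pipeline stage
    clock_mhz = 1000.0 / stage_delay_ns        # one result per clock once the pipe is full
    latency_ns = stage_delay_ns * stages       # time until the first result appears
    return clock_mhz, latency_ns


def processing_order(stages, stage_delay_ns, time_ns):
    """List what each register holds at time_ns, written as Proc.<stage>_<data>."""
    tick = time_ns // stage_delay_ns
    return [
        f"Proc.{stage}_{tick - stage + 1}"
        for stage in range(1, stages + 1)
        if tick - stage + 1 >= 1               # stage has not received data yet otherwise
    ]


print(pipeline_figures(100, 1))    # traditional approach
print(pipeline_figures(100, 10))   # ten-stage pipeline
print(processing_order(10, 10, 20))
```

The throughput rises tenfold while the latency stays at 100 ns, which is the trade-off listed in the bullets.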
Partitioning of a design
• Partition of data width
• Partition of functionality

Partition of data width
Consider the example of a signed adder:
• Eight signed input numbers, each 12 bits wide.
• The sum of these numbers is required.
• The conventional approach to addition/subtraction uses all 12 bits together.
• Since full adders are used for the implementation, the result is delayed by the carry rippling through all 12 bits.
• Even a 'carry look ahead' circuit does not help speed up the computation, since it requires a large number of gates and inputs in this case.
• The answer to this problem is to divide the data width into smaller chunks and introduce pipelining.
• In the data width partitioning approach, all sub-blocks perform the same function.

Partition of functionality
• In this method, the functional block is divided into smaller sub-blocks.
• In this type of partitioning, each sub-block does a different function, in general.
• In the signed adder example to be presented, the LSBs (7 bits) of the eight numbers are added concurrently, followed by the addition of the MSBs (5 bits, along with the carry from the LSB addition) in subsequent pipeline stages.
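As a rough illustration of the 7-bit/5-bit split described above, the following Python sketch adds two 12-bit 2's complement words in two steps, the way two consecutive pipeline stages would. The helper names are hypothetical and the model ignores clocking.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a 2's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def split_add(a, b):
    """Add two 12-bit words as a 7-bit LSB add followed by a 5-bit MSB add."""
    lsb = (a & 0x7F) + (b & 0x7F)          # step 1: 8-bit result, bit 7 is the carry
    carry = lsb >> 7
    # step 2: sign-extend the 5 MSBs by one bit and add them with the carry
    msb = (to_signed(a >> 7, 5) + to_signed(b >> 7, 5) + carry) & 0x3F
    return to_signed((msb << 7) | (lsb & 0x7F), 13)


print(split_add(0x7FF, 0x7FF))   # 2047 + 2047
print(split_add(0x800, 0x800))   # -2048 + -2048
```

Each step only has to ripple a carry through 7 or 6 bits instead of 12, which is what allows the faster clock.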
________________________________
__________________________
THE ADDER CAN BE REALIZED IN TWO DIFFERENT WAYS:
• Feeding inputs serially
• Feeding inputs concurrently

SERIAL SIGNED ADDER DESIGN
[Block diagram: serial signed adder with inputs n[11:0] (n0 to n7, fed one per clock), enable and clk, and output sum[14:0]]
// Pipelined Serial Signed Adder Design - Verilog Code
// Adds eight 12-bit, 2's complement numbers.
// Feed the inputs serially at 'n'.
// Eight pipeline stages registered @ posedge of clk.
// Result, sum, is 15 bits wide, in 2's complement
// (registered output).
module serial_adder12s (
    clk,
    enable,
    n,
    sum,
    sum_valid,
    result
);
input         clk ;
input         enable ;
input  [11:0] n ;
output [14:0] sum ;
output        sum_valid ;
output [14:0] result ;   // Held till it is overwritten by the new result.

wire [14:0] sum_next ;
wire [2:0]  cnt_next ;
wire        sum_val ;
reg  [14:0] sum ;
reg  [2:0]  cnt ;
reg         sum_valid ;
reg  [14:0] result ;

// Sign extend & accumulate.
assign sum_next[14:0] = enable ? ({{3{n[11]}}, n[11:0]} + sum[14:0]) : 15'b0 ;
// Pre-advance the counter.
assign cnt_next[2:0] = enable ? (cnt + 1'b1) : 3'b0 ;
// Pre-determine the validity of the sum.
assign sum_val = (cnt == 3'd7) ? 1'b1 : 1'b0 ;

// Pipeline - register the sum, the count and the valid flag.
always @ (posedge clk)
begin
    sum[14:0] <= sum_next[14:0] ;   // Register the sum.
    cnt[2:0]  <= cnt_next[2:0] ;    // Advance the count.
    sum_valid <= sum_val ;          // Register the signal.
end

// Extend the result till it is overwritten by the new result.
always @ (posedge clk)
begin
    result[14:0] <= sum_valid ? sum[14:0] : result[14:0] ;   // Register the result.
end
endmodule
________________________________
__________________________
// Test Bench for Serial Adder Design
`define clkperiodby2 10
`include "serial_adder12s.v"
module serial_adder12s_test (
    sum,
    sum_valid,
    result
);
output [14:0] sum ;
output        sum_valid ;
output [14:0] result ;
reg        clk ;
reg        enable ;
reg [11:0] n ;

serial_adder12s u1(
    .clk(clk),
    .enable(enable),
    .n(n),
    .sum(sum),
    .sum_valid(sum_valid),
    .result(result)
);

initial
begin
    clk = 1'b0 ;
    // Apply first set of inputs sequentially every 20 ns.
    n = 12'h0 ;               // n0 @ 0 ns.
    enable = 0 ;
    #20 enable = 1 ;
    #17 n = 12'hfff ;         // n1 @ 37 ns.
    #20 n = 12'h7ff ;         // n2 @ 57 ns, etc.
    #20 n = 12'h800 ;
    #20 n = 12'h001 ;
    #20 n = 12'h001 ;
    #20 n = 12'h7ff ;
    #20 n = 12'haaa ;         // n7 @ 157 ns.
    #20 n = 12'h0 ;
        enable = 0 ;          // Disable before applying the next set of inputs
                              // so that the accumulated 'sum' is cleared.
    #20 enable = 1 ;
    // Apply the next set of inputs.
        n = 100 ;             // n0
    #20 n = 200 ;
    #20 n = 300 ;
    #20 n = 400 ;
    #20 n = 500 ;
    #20 n = 100 ;
    #20 n = 200 ;
    #20 n = 247 ;             // n7
    #20 enable = 0 ;
    #100
    $stop ;
end

always
    #`clkperiodby2 clk <= ~clk ;   // Run the clock at 50 MHz.
endmodule
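To check the simulated 'result' values, a reference sum for the two input sets of the test bench can be computed with a short Python model. This is only a behavioural sketch of "sign extend and accumulate"; it does not model the clock, enable or sum_valid timing.

```python
def to_signed12(word):
    """Interpret a 12-bit word as a 2's complement number."""
    word &= 0xFFF
    return word - 0x1000 if word & 0x800 else word


def serial_sum(words):
    """Accumulate sign-extended 12-bit inputs into a 15-bit 2's complement sum."""
    acc = 0
    for word in words:
        acc = (acc + to_signed12(word)) & 0x7FFF   # keep 15 bits, as sum[14:0] does
    return acc


first_set = [0x000, 0xFFF, 0x7FF, 0x800, 0x001, 0x001, 0x7FF, 0xAAA]
second_set = [100, 200, 300, 400, 500, 100, 200, 247]

print(hex(serial_sum(first_set)))    # 0x2a9 (decimal 681)
print(hex(serial_sum(second_set)))   # 0x7ff (decimal 2047)
```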
________________________________
__________________________
Simulation results of serial signed adder

_______________________________________
____________________________
Synplify results
Max. frequency of operation: 138 MHz.
Mapping to part: xcv600ehq240-8

Cell usage:
MUXCY_L 14 uses
XORCY 14 uses
FDR 19 uses
FDE 15 uses
GND 1 use
I/O primitives:
IBUF 13 uses
OBUF 31 uses
BUFGP 1 use
I/O Register bits: 15

Register bits not including I/Os: 19 (0%)
Global Clock Buffers: 1 of 4 (25%)
Mapping Summary:
Total LUTs: 18 (0%)
Mapper successful!
Xilinx P&R Results
Design Summary:
Number of errors: 0
Number of warnings: 0
Number of Slices: 11 out of 6,912 1%
Number of Slices containing unrelated logic: 0 out of 11 0%
Number of Slice Flip Flops: 19 out of 13,824 1%
Number of 4 input LUTs: 18 out of 13,824 1%
Number of bonded IOBs: 44 out of 158 27%
IOB Flip Flops: 15
Number of GCLKs: 1 out of 4 25%
Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 464
Additional JTAG gate count for IOBs: 2,160
Mapping completed.
Maximum frequency: 174.307MHz
_______________________________________
____________________________
PARALLEL SIGNED ADDER DESIGN
[Block diagram: adder12s with eight parallel inputs n0[11:0] to n7[11:0]]
Complement evaluation (shortcut)
Step   [8].....[0]
        11110000    Data
1          10000    Retain first 1 followed by 0s
2       00010000    Invert other bits

• Sign can be extended by any number of bits without affecting the actual value.
• Sign extend means duplicate the MSB ([8]=[7]).

Sign extension examples, [8].....[0]:

  111111111    -1        111111111   -1
  111111111    -1        000000001   +1
  _________   ___        _________   __
  111111110    -2        000000000    0
Ignore Carry.

  001111111  +127        110000000  -128
  001111111  +127        110000000  -128
  _________  ____        _________  ____
  011111110  +254        100000000  -256

• Without the sign extension, bit [7] of a large positive result such as +254 would be mistaken for the sign bit of a negative number.
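The two rules on this slide, sign extension and the shortcut complement, can be tried out with the following Python sketch. The function names are illustrative and are not part of the Verilog design.

```python
def sign_extend(value, from_bits, to_bits):
    """Duplicate the MSB of a from_bits-wide word up to to_bits."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):                          # negative: fill with 1s
        value |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return value


def shortcut_complement(value, bits):
    """2's complement: retain the first 1 and the 0s below it, invert the other bits."""
    mask = (1 << bits) - 1
    value &= mask
    if value == 0:
        return 0
    lowest_one = value & -value                           # first 1 seen from the LSB side
    keep_mask = (lowest_one << 1) - 1                     # that 1 and the 0s below it
    return (((~value) & ~keep_mask) | (value & keep_mask)) & mask


print(format(shortcut_complement(0b11110000, 8), "08b"))        # 00010000
a = sign_extend(0b01111111, 8, 9)                                # +127 in 9 bits
print(format((a + a) & 0x1FF, "09b"))                            # 011111110, i.e. +254
b = sign_extend(0b10000000, 8, 9)                                # -128 in 9 bits
print(format((b + b) & 0x1FF, "09b"))                            # 100000000, i.e. -256
```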
Pipelined design partition
[Figure: the inputs n0[11:0] to n7[11:0] are added in pairs. In each stage the LSBs are added and registered on one clk, then the MSBs are added together with the LSB carry and registered on the next clk. The first stage feeds the second stage, which feeds the third stage to give the result.]
Verilog code for signed adder
// Adds eight 12-bit, 2's complement numbers, n0 to n7.
// Five pipeline stages registered @ posedge clk.
// Result, sum, is 15 bits wide, in 2's complement (not registered).
module adder12s(
    clk,
    n0, n1, n2, n3, n4, n5, n6, n7,
    sum
) ;
input         clk ;
input  [11:0] n0, n1, n2, n3, n4, n5, n6, n7 ;
output [14:0] sum ;

wire [7:0] s00_lsb, s01_lsb, s02_lsb, s03_lsb ;   // First stage LSB sums; bit [7] is the carry.
wire [5:0] s00_msb, s01_msb, s02_msb, s03_msb ;   // First stage MSB sums.
wire [7:0] s10_lsb, s11_lsb ;                     // Second stage LSB sums.
wire [6:0] s10_msb, s11_msb ;                     // Second stage MSB sums.
wire [7:0] s20_lsb ;                              // Third stage LSB sum.

reg [11:7] n0_reg1, n1_reg1, n2_reg1, n3_reg1 ;
reg [11:7] n4_reg1, n5_reg1, n6_reg1, n7_reg1 ;
reg [7:0]  s00_lsbreg1, s01_lsbreg1, s02_lsbreg1, s03_lsbreg1 ;
reg [5:0]  s00_msbreg2, s01_msbreg2, s02_msbreg2, s03_msbreg2 ;
reg [6:0]  s00_lsbreg2, s01_lsbreg2, s02_lsbreg2, s03_lsbreg2 ;
reg [7:0]  s10_lsbreg3, s11_lsbreg3 ;
reg [5:0]  s00_msbreg3, s01_msbreg3, s02_msbreg3, s03_msbreg3 ;
reg [6:0]  s10_lsbreg4, s11_lsbreg4 ;
reg [6:0]  s10_msbreg4, s11_msbreg4 ;
reg [6:0]  s10_msbreg5, s11_msbreg5 ;
reg [6:0]  s20_lsbreg5 ;
reg        s20_lsbreg5cy ;

// First stage addition.
// Add lsb first - s00_lsb[7] is the carry.
// n0-n7 lsb need not be registered since addition is already carried out here.
assign s00_lsb[7:0] = n0[6:0] + n1[6:0] ;
assign s01_lsb[7:0] = n2[6:0] + n3[6:0] ;
assign s02_lsb[7:0] = n4[6:0] + n5[6:0] ;
assign s03_lsb[7:0] = n6[6:0] + n7[6:0] ;

// Pipeline 1: clk (1). Register msb to continue addition of msb.
always @ (posedge clk)
begin
    // Preserve all inputs for msb addition during the clk (2).
    n0_reg1[11:7] <= n0[11:7] ;
    n1_reg1[11:7] <= n1[11:7] ;
    n2_reg1[11:7] <= n2[11:7] ;
    n3_reg1[11:7] <= n3[11:7] ;
    n4_reg1[11:7] <= n4[11:7] ;
    n5_reg1[11:7] <= n5[11:7] ;
    n6_reg1[11:7] <= n6[11:7] ;
    n7_reg1[11:7] <= n7[11:7] ;
    // Preserve all lsb sums. s00_lsbreg1[7] is the registered carry from lsb addition.
    s00_lsbreg1[7:0] <= s00_lsb[7:0] ;
    s01_lsbreg1[7:0] <= s01_lsb[7:0] ;
    s02_lsbreg1[7:0] <= s02_lsb[7:0] ;
    s03_lsbreg1[7:0] <= s03_lsb[7:0] ;
end

// Sign extended & msb added with carry. The carry out, s00_msb[6], is ignored.
assign s00_msb[5:0] = {n0_reg1[11], n0_reg1[11:7]} + {n1_reg1[11], n1_reg1[11:7]} + s00_lsbreg1[7] ;
assign s01_msb[5:0] = {n2_reg1[11], n2_reg1[11:7]} + {n3_reg1[11], n3_reg1[11:7]} + s01_lsbreg1[7] ;
assign s02_msb[5:0] = {n4_reg1[11], n4_reg1[11:7]} + {n5_reg1[11], n5_reg1[11:7]} + s02_lsbreg1[7] ;
assign s03_msb[5:0] = {n6_reg1[11], n6_reg1[11:7]} + {n7_reg1[11], n7_reg1[11:7]} + s03_lsbreg1[7] ;

// Pipeline 2: clk (2).
always @ (posedge clk)
begin
    // Preserve all msb sums.
    s00_msbreg2[5:0] <= s00_msb[5:0] ;
    s01_msbreg2[5:0] <= s01_msb[5:0] ;
    s02_msbreg2[5:0] <= s02_msb[5:0] ;
    s03_msbreg2[5:0] <= s03_msb[5:0] ;
    // Preserve all lsb sums.
    s00_lsbreg2[6:0] <= s00_lsbreg1[6:0] ;
    s01_lsbreg2[6:0] <= s01_lsbreg1[6:0] ;
    s02_lsbreg2[6:0] <= s02_lsbreg1[6:0] ;
    s03_lsbreg2[6:0] <= s03_lsbreg1[6:0] ;
end

// Second stage addition.
// Add lsb first - s10_lsb[7] is the carry.
// s00, s01 lsbs need not be registered since addition is already carried out here.
assign s10_lsb[7:0] = s00_lsbreg2[6:0] + s01_lsbreg2[6:0] ;
assign s11_lsb[7:0] = s02_lsbreg2[6:0] + s03_lsbreg2[6:0] ;

// Pipeline 3: clk (3).
always @ (posedge clk)
begin
    // Preserve all lsb sums.
    s10_lsbreg3[7:0] <= s10_lsb[7:0] ;
    s11_lsbreg3[7:0] <= s11_lsb[7:0] ;
    // Preserve all msb sums.
    s00_msbreg3[5:0] <= s00_msbreg2[5:0] ;
    s01_msbreg3[5:0] <= s01_msbreg2[5:0] ;
    s02_msbreg3[5:0] <= s02_msbreg2[5:0] ;
    s03_msbreg3[5:0] <= s03_msbreg2[5:0] ;
end

// Add MSB of 2nd stage with sign extension and carry in from LSB.
// The carry out, s10_msb[7], is ignored.
assign s10_msb[6:0] = {s00_msbreg3[5], s00_msbreg3[5:0]} + {s01_msbreg3[5], s01_msbreg3[5:0]} + s10_lsbreg3[7] ;
assign s11_msb[6:0] = {s02_msbreg3[5], s02_msbreg3[5:0]} + {s03_msbreg3[5], s03_msbreg3[5:0]} + s11_lsbreg3[7] ;

// Pipeline 4: clk (4).
always @ (posedge clk)
begin
    // Preserve all lsb sums.
    s10_lsbreg4[6:0] <= s10_lsbreg3[6:0] ;
    s11_lsbreg4[6:0] <= s11_lsbreg3[6:0] ;
    // Preserve all msb sums.
    s10_msbreg4[6:0] <= s10_msb[6:0] ;
    s11_msbreg4[6:0] <= s11_msb[6:0] ;
end

// Third stage addition.
// Add lsb first - s20_lsb[7] is the carry.
assign s20_lsb[7:0] = s10_lsbreg4[6:0] + s11_lsbreg4[6:0] ;

// Pipeline 5: clk (5).
always @ (posedge clk)
begin
    // Preserve all msb sums.
    s10_msbreg5[6:0] <= s10_msbreg4[6:0] ;
    s11_msbreg5[6:0] <= s11_msbreg4[6:0] ;
    // Preserve the lsb sum and its carry.
    s20_lsbreg5cy    <= s20_lsb[7] ;
    s20_lsbreg5[6:0] <= s20_lsb[6:0] ;
end

// Add third stage MSB result and concatenate
// with LSB result to get the final result.
assign sum[14:0] = {({s10_msbreg5[6], s10_msbreg5[6:0]} +
                     {s11_msbreg5[6], s11_msbreg5[6:0]} +
                     s20_lsbreg5cy),
                    s20_lsbreg5[6:0]} ;
endmodule
________________________________
__________________________
TEST BENCH FOR PARALLEL SIGNED
ADDER DESIGN
`include "adder12s_banno.v"   // Use back annotated file.
module adder12s_test (
sum
);
output [14:0] sum;
reg clk ;
reg [11:0] n0 ;
reg [11:0] n1 ;
reg [11:0] n2 ;
reg [11:0] n3 ;
reg [11:0] n4 ;
reg [11:0] n5 ;
reg [11:0] n6 ;
reg [11:0] n7 ;
adder12s u1(
.clk(clk),
.n0(n0),
.n1(n1),
.n2(n2),
.n3(n3),
.n4(n4),
.n5(n5),
.n6(n6),
.n7(n7),
.sum(sum)
);
initial
begin
clk = 1'b0 ;
n0 = 12'h0 ;
n1 = 12'h0 ;
n2 = 12'h0 ;
n3 = 12'h0 ;
n4 = 12'h0 ;
n5 = 12'h0 ;
n6 = 12'h0 ;
n7 = 12'h0 ;
#17 n0 = 12'hfff ;
n1 = 12'hfff ;
n2 = 12'hfff ;
n3 = 12'hfff ;
n4 = 12'hfff ;
n5 = 12'hfff ;
n6 = 12'hfff ;
n7 = 12'hfff ;
#20 n0 = 12'h7ff ;
n1 = 12'h7ff ;
n2 = 12'h7ff ;
n3 = 12'h7ff ;
n4 = 12'h7ff ;
n5 = 12'h7ff ;
n6 = 12'h7ff ;
n7 = 12'h7ff ;
#20 n0 = 12'h800 ;
n1 = 12'h800 ;
n2 = 12'h800 ;
n3 = 12'h800 ;
n4 = 12'h800 ;
n5 = 12'h800 ;
n6 = 12'h800 ;
n7 = 12'h800 ;
#20 n0 = 12'h001 ;
n1 = 12'h001 ;
n2 = 12'h001 ;
n3 = 12'h001 ;
n4 = 12'h001 ;
n5 = 12'h001 ;
n6 = 12'h001 ;
n7 = 12'h001 ;
#20 n0 = 12'h001 ;
n1 = 12'hfff ;
n2 = 12'h001 ;
n3 = 12'hfff ;
n4 = 12'h001 ;
n5 = 12'hfff ;
n6 = 12'h001 ;
n7 = 12'hfff ;
#20 n0 = 12'h7ff ;
n1 = 12'h7ff ;
n2 = 12'h7ff ;
n3 = 12'h7ff ;
n4 = 12'h801 ;
n5 = 12'h801 ;
n6 = 12'h801 ;
n7 = 12'h801 ;
#20 n0 = 12'haaa ;
n1 = 12'h555 ;
n2 = 12'haaa ;
n3 = 12'h555 ;
n4 = 12'haaa ;
n5 = 12'h555 ;
n6 = 12'haaa ;
n7 = 12'h555 ;
#20 n0 = 12'h0 ;
n1 = 12'h0 ;
n2 = 12'h0 ;
n3 = 12'h0 ;
n4 = 12'h0 ;
n5 = 12'h0 ;
n6 = 12'h0 ;
n7 = 12'h0 ;
#400
$stop ;
end
always
    #10 clk <= ~clk ;   // Clock with an assumed 20 ns period, as in the serial adder test bench.
endmodule
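Expected values for the test vectors above can be generated with a Python model of the three-stage LSB/MSB adder tree. This is a sketch under the assumption that each stage adds pairs with a 7-bit LSB add followed by an MSB add that grows by one bit; it mirrors the arithmetic only, not the five clock cycles of latency.

```python
def sext(value, bits):
    """Interpret the low `bits` bits of value as a 2's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def add_split(lsb_a, msb_a, lsb_b, msb_b, msb_bits):
    """One adder node: 7-bit LSB add, then sign-extended MSB add with the carry."""
    lsb = (lsb_a & 0x7F) + (lsb_b & 0x7F)                 # bit 7 is the carry
    msb = sext(msb_a, msb_bits) + sext(msb_b, msb_bits) + (lsb >> 7)
    return lsb & 0x7F, msb & ((1 << (msb_bits + 1)) - 1)   # MSB field grows by one bit


def adder12s_model(inputs):
    """Return the 15-bit 2's complement sum of eight 12-bit words."""
    assert len(inputs) == 8
    nodes = [(n & 0x7F, (n >> 7) & 0x1F) for n in inputs]  # (7 LSBs, 5 MSBs)
    msb_bits = 5
    while len(nodes) > 1:                                   # three stages: 8 -> 4 -> 2 -> 1
        nodes = [
            add_split(*nodes[i], *nodes[i + 1], msb_bits)
            for i in range(0, len(nodes), 2)
        ]
        msb_bits += 1
    lsb, msb = nodes[0]
    return (msb << 7) | lsb


print(hex(adder12s_model([0x7FF] * 8)))   # 0x3ff8, i.e. 8 x 2047
print(hex(adder12s_model([0x800] * 8)))   # 0x4000, i.e. 8 x -2048 in 15 bits
```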
________________________________
__________________________
Simulation results of eight

input parallel signed adder
________________________________
__________________________
Synplify synthesis Results
@I::"D:\user\ram\verilog_latest\dvlsi_des_verilog\adder12s.v"
Verilog syntax check successful!
Selecting top level module adder12s
Synthesizing module adder12s
Performance Summary
*******************
Worst slack in design: 1.136

Starting   Requested   Estimated   Requested   Estimated           Clock
Clock      Frequency   Frequency   Period      Period      Slack   Type
-------------------------------------------------------------------------
clk        100.0 MHz   112.8 MHz   10.000      8.864       1.136   inferred
=========================================================================
Resource Usage Report for adder12s

Cell usage:
MUXCY_L 81 uses
XORCY 88 uses
MUXCY 7 uses
FD 214 uses
GND 1 use
I/O primitives:
IBUF 96 uses
OBUF 15 uses
BUFGP 1 use
I/O Register bits: 47
Register bits not including I/Os: 167 (1%)
Global Clock Buffers: 1 of 4 (25%)
Mapping Summary:
Total LUTs: 95 (0%)
Mapper successful!
________________________________
__________________________
Xilinx P&R Results
Design Summary:
Number of errors: 0
Number of Slices: 97 out of 6,912 1%
Number of Slices containing unrelated logic: 0 out of 97 0%
Number of Slice Flip Flops: 167 out of 13,824 1%
Number of 4 input LUTs: 95 out of 13,824 1%
Number of bonded IOBs: 111 out of 158 70%
IOB Flip Flops: 47
Number of GCLKs: 1 out of 4 25%
Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 2,810
Additional JTAG gate count for IOBs: 5,376
Mapping completed.
Timing summary:
---------------
Design statistics:
Minimum period: 6.563ns (Maximum frequency: 152.369MHz)
Minimum input arrival time before clock: 4.259ns
Minimum output required time after clock: 11.083ns
Running DRC.
DRC detected 0 errors and 0 warnings.
Creating bit map...
Saving bit stream in "adder12s.bit".
Creating bit mask...
Saving mask bit stream in "adder12s.msk".
Bitstream generation is complete.
________________________________
__________________________
COMPARISON OF SERIAL ADDER AND PARALLEL ADDER WITH EIGHT INPUTS
------------------------------------------------------------
Type of adder                     Serial     Parallel
------------------------------------------------------------
No. of i/p clk cycles             8          1
No. of o/p clk cycles             9          1
Gate count                        464        2,810
JTAG gate count                   2,160      5,376
Max. freq. of operation (MHz)     174        152
------------------------------------------------------------
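One way to read the table is in terms of results per unit time. The Python sketch below assumes the serial adder needs 8 clocks per eight-input sum (the "No. of i/p clk cycles" row) and the parallel adder delivers one sum per clock once its pipeline is full; both clock counts are parameters, so a different assumption (for example 9 clocks including the disable cycle) can be substituted.

```python
def results_per_microsecond(fmax_mhz, clocks_per_result):
    """Sums produced per microsecond at the maximum clock frequency."""
    return fmax_mhz / clocks_per_result


serial = results_per_microsecond(174, 8)     # serial adder, 8 clocks per sum (assumed)
parallel = results_per_microsecond(152, 1)   # parallel adder, 1 sum per clock
area_ratio = 2810 / 464                      # gate count, parallel vs serial

print(serial, parallel)                      # 21.75 152.0
print(round(parallel / serial, 1), round(area_ratio, 1))
```

Under these assumptions the parallel adder buys roughly seven times the throughput for roughly six times the gate count.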
________________________________
__________________________
MULTIPLIER DESIGN – A NEW ALGORITHM
[Block diagram: mult11sx8s with inputs n1[10:0], n2[7:0] and clk; eight pipeline stages]
Example:
Consider the evaluation of the product of two signed numbers:
  1023 x -128 = -130944
Binary, signed (2's complement) representation:
  01111111111 x 10000000 = 1100000000010000000

Multiplication of the magnitudes, n1 (magnitude) x n2 (magnitude):

              01111111111
         x       10000000
  _______________________
              00000000000    P1
             00000000000     P2
            00000000000      P3
           00000000000       P4
          00000000000        P5
         00000000000         P6
        00000000000          P7
       01111111111           P8
  _______________________
       011111111110000000    (magnitude)
  _______________________
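The worked example can be checked with a small Python model of the same idea: multiply the magnitudes through eight partial products, then restore the sign. This is a behavioural sketch with illustrative names; it is not a translation of the pipelined Verilog.

```python
def magnitude(value, bits):
    """Magnitude of a `bits`-wide 2's complement word (complement it if negative)."""
    mask = (1 << bits) - 1
    value &= mask
    return (-value) & mask if value >> (bits - 1) else value


def partial_products(n1, n2):
    """P1..P8: the n1 magnitude gated by each bit of the n2 magnitude."""
    n1_mag, n2_mag = magnitude(n1, 11), magnitude(n2, 8)
    return [n1_mag if (n2_mag >> i) & 1 else 0 for i in range(8)]


def multiply(n1, n2):
    """19-bit 2's complement product of an 11-bit and an 8-bit signed word."""
    if n1 & 0x7FF == 0 or n2 & 0xFF == 0:
        return 0                                   # a zero operand gives +0
    mag = sum(p << i for i, p in enumerate(partial_products(n1, n2)))
    negative = ((n1 >> 10) ^ (n2 >> 7)) & 1        # signs differ -> negative product
    return (-mag) & 0x7FFFF if negative else mag


print(partial_products(0x3FF, 0x80))               # only P8 is non-zero
print(format(multiply(0x3FF, 0x80), "019b"))       # 1100000000010000000
```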
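Pipelined design partition
[Figure: adder tree for the partial products. First stage: S11 = P1 + (P2 left-shifted by 1 bit), S12 = P3 + (P4 LS 1 bit), S13 = P5 + (P6 LS 1 bit), S14 = P7 + (P8 LS 1 bit). Second stage: S21 = S11 + (S12 LS 2 bits), S22 = S13 + (S14 LS 2 bits). The third stage combines S21 and S22.]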
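The adder tree of the partition can be sketched in Python as below. The third-stage shift of 4 bits is taken from the bit slices used in the Verilog listing (S21[11:4] is added to S22[7:0]); the function name is illustrative.

```python
def adder_tree(p):
    """Sum eight partial products P1..P8 (p[0]..p[7]) with the staged left shifts."""
    assert len(p) == 8
    # First stage: neighbouring partial products, the second one shifted by 1 bit.
    s11 = p[0] + (p[1] << 1)
    s12 = p[2] + (p[3] << 1)
    s13 = p[4] + (p[5] << 1)
    s14 = p[6] + (p[7] << 1)
    # Second stage: pairs of first-stage sums, the second one shifted by 2 bits.
    s21 = s11 + (s12 << 2)
    s22 = s13 + (s14 << 2)
    # Third stage: the second sum shifted by 4 bits.
    return s21 + (s22 << 4)


print(adder_tree([0, 0, 0, 0, 0, 0, 0, 1023]))   # 130944, the magnitude of 1023 x 128
```

The staged shifts of 1, 2 and 4 bits add up to the weight 2^i that each partial product carries in the ordinary shift-and-add scheme.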
Verilog code for multiplier
// Signed multiplication of two numbers, n1 (11-bit) & n2 (8-bit).
// n1 (partial product, CX for example) is the multiplicand, and is signed.
// n2 (cos term, CT for example) is the signed multiplier.
// Result (CX)CT is in twos complement.
// CX, CT are used in DCTQ Processor.
// This module has eight pipeline stages to increase the speed -
// input is not registered.
module mult11sx8s(
    clk,
    n1,
    n2,
    result
) ;
input         clk ;
input  [10:0] n1 ;
input  [7:0]  n2 ;
output [18:0] result ;

wire        n1orn2z ;
wire [10:0] p1, p2, p3, p4, p5, p6, p7, p8 ;
wire [6:0]  s11a, s12a, s13a, s14a ;
wire [5:0]  s11b, s12b, s13b, s14b ;
wire [12:0] s11, s12, s13, s14 ;
wire [7:0]  s21a, s22a ;
wire [6:0]  s21b, s22b ;
wire [14:0] s21, s22 ;
wire [8:0]  s31a ;
wire [7:0]  s31b ;
wire [17:0] s31 ;
wire        res_sign ;
wire [18:0] res ;

reg [10:0] n1_mag ;
reg [7:0]  n2_mag ;
reg [10:0] p1_reg1, p2_reg1, p3_reg1, p4_reg1, p5_reg1, p6_reg1, p7_reg1, p8_reg1 ;
reg [6:0]  s11a_reg2, s12a_reg2, s13a_reg2, s14a_reg2 ;
reg        n1_reg1, n1_reg2, n1_reg3, n1_reg4, n1_reg5, n1_reg6, n1_reg7 ;
reg        n2_reg1, n2_reg2, n2_reg3, n2_reg4, n2_reg5, n2_reg6, n2_reg7 ;
reg        n1orn2z_reg1, n1orn2z_reg2, n1orn2z_reg3, n1orn2z_reg4 ;
reg        n1orn2z_reg5, n1orn2z_reg6, n1orn2z_reg7 ;
reg [10:0] p1_reg2, p2_reg2, p3_reg2, p4_reg2, p5_reg2, p6_reg2, p7_reg2, p8_reg2 ;
reg [12:0] s11_reg3, s12_reg3, s13_reg3, s14_reg3 ;
reg [12:0] s11_reg4, s12_reg4, s13_reg4, s14_reg4 ;
reg [7:0]  s21a_reg4, s22a_reg4 ;
reg [14:0] s21_reg5, s22_reg5 ;
reg [14:0] s21_reg6, s22_reg6 ;
reg [8:0]  s31a_reg6 ;
reg [17:0] s31_reg7 ;
reg [18:0] result ;

always @(n1)
begin
    if (n1[10] == 1'b0)
        n1_mag = n1[10:0] ;
    else
        n1_mag = ~n1[10:0] + 1 ;   // Evaluate twos complement.
end

always @(n2)
begin
    if (n2[7] == 1'b0)
        n2_mag = n2[7:0] ;
    else
        n2_mag = ~n2[7:0] + 1 ;    // Evaluate twos complement.
end

// If n1 or n2 is zero, make final result +0.
assign n1orn2z = ((n1 == 11'b0) || (n2 == 8'b0)) ? 1'b1 : 1'b0 ;

// Compute the partial products: n1 multiplied by n2 bit '0', etc.
assign p1 = n1_mag[10:0] & {11{n2_mag[0]}} ;
assign p2 = n1_mag[10:0] & {11{n2_mag[1]}} ;
assign p3 = n1_mag[10:0] & {11{n2_mag[2]}} ;
assign p4 = n1_mag[10:0] & {11{n2_mag[3]}} ;
assign p5 = n1_mag[10:0] & {11{n2_mag[4]}} ;
assign p6 = n1_mag[10:0] & {11{n2_mag[5]}} ;
assign p7 = n1_mag[10:0] & {11{n2_mag[6]}} ;
assign p8 = n1_mag[10:0] & {11{n2_mag[7]}} ;

// This is the first pipeline register, clk (1).
// p1_reg1, etc. means p1, etc. are registered after the
// positive edge of clk (1), clk (2), etc.
always @ (posedge clk)
begin
    p1_reg1 <= p1 ;
    p2_reg1 <= p2 ;
    p3_reg1 <= p3 ;
    p4_reg1 <= p4 ;
    p5_reg1 <= p5 ;
    p6_reg1 <= p6 ;
    p7_reg1 <= p7 ;
    p8_reg1 <= p8 ;
    n1_reg1 <= n1[10] ;
    n2_reg1 <= n2[7] ;
    n1orn2z_reg1 <= n1orn2z ;
end

// LSB is added here. Note the left shifts are taken care of
// for p1, p3, p5 and p7: p1_reg1[0], etc. will be processed
// at the clk (2). s11a[6], etc. are the carry bits.
assign s11a[6:0] = p1_reg1[6:1] + p2_reg1[5:0] ;
assign s12a[6:0] = p3_reg1[6:1] + p4_reg1[5:0] ;
assign s13a[6:0] = p5_reg1[6:1] + p6_reg1[5:0] ;
assign s14a[6:0] = p7_reg1[6:1] + p8_reg1[5:0] ;

// This is the second pipeline register, clk (2).
always @ (posedge clk)
begin
    s11a_reg2 <= s11a ;                 // Store LSB partial sums.
    s12a_reg2 <= s12a ;
    s13a_reg2 <= s13a ;
    s14a_reg2 <= s14a ;
    p1_reg2[10:7] <= p1_reg1[10:7] ;    // Store MSB of partial products.
    p2_reg2[10:6] <= p2_reg1[10:6] ;
    p3_reg2[10:7] <= p3_reg1[10:7] ;
    p4_reg2[10:6] <= p4_reg1[10:6] ;
    p5_reg2[10:7] <= p5_reg1[10:7] ;
    p6_reg2[10:6] <= p6_reg1[10:6] ;
    p7_reg2[10:7] <= p7_reg1[10:7] ;
    p8_reg2[10:6] <= p8_reg1[10:6] ;
    p1_reg2[0] <= p1_reg1[0] ;          // Store '0' th bit since it is
    p3_reg2[0] <= p3_reg1[0] ;          // not yet processed.
    p5_reg2[0] <= p5_reg1[0] ;
    p7_reg2[0] <= p7_reg1[0] ;
    n1_reg2 <= n1_reg1 ;                // Also store sign bits and zero status.
    n2_reg2 <= n2_reg1 ;
    n1orn2z_reg2 <= n1orn2z_reg1 ;
end

// MSB is added here along with carry.
assign s11b[5:0] = {1'b0, p1_reg2[10:7]} + p2_reg2[10:6] + s11a_reg2[6] ;
assign s12b[5:0] = {1'b0, p3_reg2[10:7]} + p4_reg2[10:6] + s12a_reg2[6] ;
assign s13b[5:0] = {1'b0, p5_reg2[10:7]} + p6_reg2[10:6] + s13a_reg2[6] ;
assign s14b[5:0] = {1'b0, p7_reg2[10:7]} + p8_reg2[10:6] + s14a_reg2[6] ;

// MSBs & LSBs are concatenated here: MSB, LSB, '0' th bit respectively.
assign s11[12:0] = {s11b, s11a_reg2[5:0], p1_reg2[0]} ;
assign s12[12:0] = {s12b, s12a_reg2[5:0], p3_reg2[0]} ;
assign s13[12:0] = {s13b, s13a_reg2[5:0], p5_reg2[0]} ;
assign s14[12:0] = {s14b, s14a_reg2[5:0], p7_reg2[0]} ;

// This is the third pipeline register, clk (3). First stage results.
always @ (posedge clk)
begin
    s11_reg3 <= s11 ;                   // Store for further processing.
    s12_reg3 <= s12 ;
    s13_reg3 <= s13 ;
    s14_reg3 <= s14 ;
    n1_reg3 <= n1_reg2 ;
    n2_reg3 <= n2_reg2 ;
    n1orn2z_reg3 <= n1orn2z_reg2 ;
end

// LSB sum, 2nd stage. s21a[7] is the carry.
assign s21a[7:0] = s11_reg3[8:2] + s12_reg3[6:0] ;
assign s22a[7:0] = s13_reg3[8:2] + s14_reg3[6:0] ;

// This is the fourth pipeline register, clk (4).
always @ (posedge clk)
begin
    s11_reg4[12:9] <= s11_reg3[12:9] ;  // Store bits not yet processed.
    s11_reg4[1:0]  <= s11_reg3[1:0] ;
    s12_reg4[12:7] <= s12_reg3[12:7] ;
    s13_reg4[12:9] <= s13_reg3[12:9] ;
    s13_reg4[1:0]  <= s13_reg3[1:0] ;
    s14_reg4[12:7] <= s14_reg3[12:7] ;
    s21a_reg4 <= s21a ;                 // Store LSB, second stage partial sums.
    s22a_reg4 <= s22a ;
    n1_reg4 <= n1_reg3 ;
    n2_reg4 <= n2_reg3 ;
    n1orn2z_reg4 <= n1orn2z_reg3 ;
end

// Add second stage MSBs with carry.
assign s21b[6:0] = {2'b0, s11_reg4[12:9]} + s12_reg4[12:7] + s21a_reg4[7] ;
assign s22b[6:0] = {2'b0, s13_reg4[12:9]} + s14_reg4[12:7] + s22a_reg4[7] ;

// {MSB, LSB, [1:0]}. The result will never affect s21b[6], which is always 0.
assign s21[14:0] = {s21b[5:0], s21a_reg4[6:0], s11_reg4[1:0]} ;
assign s22[14:0] = {s22b[5:0], s22a_reg4[6:0], s13_reg4[1:0]} ;

// This is the fifth pipeline register, clk (5).
always @ (posedge clk)
begin
    s21_reg5 <= s21 ;                   // Store for further processing.
    s22_reg5 <= s22 ;
    n1_reg5 <= n1_reg4 ;
    n2_reg5 <= n2_reg4 ;
    n1orn2z_reg5 <= n1orn2z_reg4 ;
end

// 3rd stage LSB computed here.
assign s31a[8:0] = s21_reg5[11:4] + s22_reg5[7:0] ;

// This is the sixth pipeline register, clk (6).
always @ (posedge clk)
begin
    s21_reg6[14:12] <= s21_reg5[14:12] ;   // Preserve MSB.
    s22_reg6[14:8]  <= s22_reg5[14:8] ;
    s21_reg6[3:0]   <= s21_reg5[3:0] ;
    s31a_reg6 <= s31a ;                    // 3rd stage LSB registered here.
    n1_reg6 <= n1_reg5 ;
    n2_reg6 <= n2_reg5 ;
    n1orn2z_reg6 <= n1orn2z_reg5 ;
end

// 3rd stage MSB computed here.
assign s31b[7:0] = {4'b0, s21_reg6[14:12]} + s22_reg6[14:8] + s31a_reg6[8] ;

// Put MSB, LSB and [3:0] bits together.
// Note that the 3rd stage result will never affect s31b[6:5], which is always 0.
assign s31[17:0] = {s31b[5:0], s31a_reg6[7:0], s21_reg6[3:0]} ;

// This is the seventh pipeline register, clk (7).
always @ (posedge clk)
begin
    n1_reg7 <= n1_reg6 ;                // Store intermediate results.
    n2_reg7 <= n2_reg6 ;
    n1orn2z_reg7 <= n1orn2z_reg6 ;
    s31_reg7 <= s31 ;
end

assign res_sign = n1_reg7 ^ n2_reg7 ;   // '1' means a -ve no.
assign res[18:0] = (res_sign) ? {1'b1, (~s31_reg7 + 1'b1)} :
                                {1'b0, s31_reg7} ;

// This is the eighth pipeline register, clk (8).
always @ (posedge clk)
begin
    if (n1orn2z_reg7 == 1'b1)
        result[18:0] <= 19'b0 ;
    else
        result[18:0] <= res ;           // This is the final result (product of
                                        // two numbers) in twos complement.
end
endmodule
________________________________
__________________________
TEST BENCH FOR MULTIPLIER
`include "mult11sx8s_banno.v"
module mult11sx8s_test (
result
);
output [18:0] result;

reg clk ;
reg [10:0] n1 ;
reg [7:0] n2 ;
mult11sx8s u1(
.clk(clk),
.n1(n1),
.n2(n2),
.result(result)
);
initial
begin
clk = 1'b0 ;
n1 = 11'h0 ;
n2 = 8'h0 ;
#17 n1 = 11'h555 ;
n2 = 8'h55;
#20 n1 = 11'h2aa ;
n2 = 8'haa;
#20 n1 = 11'h7ff ;
n2 = 8'h80;
#20 n1 = 11'h555 ;
n2 = 8'hff;
#20 n1 = 11'h7ff ;
n2 = 8'h81;
#20 n1 = 11'h555 ;
n2 = 8'h81;
#20 n1 = 11'h2aa ;
n2 = 8'h81;
#20 n1 = 11'h7ff ;
n2 = 8'h00;
#20 n1 = 11'h7ff ;
n2 = 8'h7f;
#20 n1 = 11'h000 ;
n2 = 8'hff;
#20 n1 = 11'h000 ;
n2 = 8'h7f;
#400
$stop ;
end
always
    #10 clk <= ~clk ;   // Clock with an assumed 20 ns period, matching the 20 ns input spacing.
endmodule
________________________________
__________________________
Simulation results of multiplier

________________________________
__________________________
Synplify results
@I::"D:\user\ram\verilog_latest\dvlsi_des_verilog\mult11sx8s.v"
Verilog syntax check successful!
Selecting top level module mult11sx8s
Synthesizing module mult11sx8s
@N:"D:\user\ram\verilog_latest\d
vlsi_des_verilog\mult11sx8s.v":3
46:0:346:5|Found seqShift
n1orn2z, depth=7, width=1
46:0:346:5|Found seqShift n1,
depth=6, width=1
46:0:346:5|Found seqShift n2,
depth=6, width=1
@W:"D:\user\ram\verilog_latest\d
02:0:202:5|Register bit
s14a_reg2[6] is always 0,
optimizing ...
@END
Performance Summary
*******************
Worst slack in design: 12.009

Starting   Requested   Estimated   Requested   Estimated            Clock
Clock      Frequency   Frequency   Period      Period      Slack    Type
--------------------------------------------------------------------------
clk        50.0 MHz    125.1 MHz   20.000      7.991       12.009   inferred
==========================================================================
Resource Usage Report for mult11sx8s

Cell usage:
MUXCY_L 100 uses
XORCY 109 uses
MUXCY 9 uses
FDR 105 uses
FD 209 uses
GND 1 use
VCC 1 use
I/O primitives:
IBUF 19 uses
OBUF 19 uses
BUFGP 1 use
SRL primitives:
SRL16 9 uses
I/O Register bits: 22
Register bits not including I/Os: 292 (2%)
Global Clock Buffers: 1 of 4 (25%)
Mapping Summary:
Total LUTs: 181 (1%)
Mapper successful!
Xilinx P&R Results
Design Summary:
Number of errors: 0
Number of Slices: 201 out of 6,912 2%
Number of Slices containing unrelated logic: 0 out of 201 0%
Number of Slice Flip Flops: 292 out of 13,824 2%
Total Number 4 input LUTs: 178 out of 13,824 1%
Number used as LUTs: 161
Number used as a route-thru: 8
Number used as Shift registers: 9
Number of bonded IOBs: 38 out of 158 24%
IOB Flip Flops: 22
Number of GCLKs: 1 out of 4 25%
Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 5,284
Additional JTAG gate count for IOBs: 1,872
Mapping completed.
Timing summary:
---------------
Timing errors: 0 Score: 0
Constraints cover 2328 paths, 0 nets, and 896 connections (100.0% coverage)
Design statistics:
Minimum period: 12.132ns (Maximum frequency: 82.427MHz)
Minimum input arrival time before clock: 10.150ns
Minimum output required time after clock: 5.617ns
Running DRC.
DRC detected 0 errors and 0 warnings.
Creating bit map...
Saving bit stream in "mult11sx8s.bit".
Creating bit mask...
Saving mask bit stream in "mult11sx8s.msk".
Bitstream generation is complete.
________________________________
__________________________
