Design and Implementation of Arithmetic Based FIR Filter 27-28 2023

Design and Implementation of Arithmetic based FIR
Filters for DSP Application.

Gayathri S, Esha S, Challa Bhavya, Yasha Jyothi M Shirur
Dept. of Electronics and Communication
BNM Institute of Technology
Bangalore, India
sgayathritriveni@gmail.com, eshashivapuram@gmail.com, challabhavya12@gmail.com, yashajyothimshirur@bnmit.com
Abstract— In real-life applications, signals are continuously captured, monitored, processed and analysed. Processing and analysis are easier when the data is in digital form. Digital Signal Processing (DSP) is particularly important in biomedical and wearable devices. In a DSP system, the Finite Impulse Response (FIR) filter is a basic building block. Wearable applications that involve complex computation use higher-order filters to achieve high accuracy. The multiplier is the heart of any filter design; it occupies a major share of the chip area and requires extra computation time. Designers therefore concentrate mainly on optimizing the multiplier. This paper designs an FIR filter based on the Distributed Arithmetic (DA) algorithm, which relies on precomputed values stored in a Look-Up Table (LUT). It is a multiplier-less architecture. The distributed-arithmetic-based architecture is observed to be efficient for real signal computation. The paper highlights the advantages of the DA technique over the traditional MAC-based design. Both designs are coded in Verilog, verified for functionality, and compared. The proposed design gives area, power and timing advantages of 64%, 62% and 61% respectively.
Keywords— Distributed Arithmetic (DA), Look Up Table (LUT), Multiply and Accumulate (MAC) unit, Finite Impulse Response (FIR) filter

I. INTRODUCTION TO DESIGN

I. Design of TAP based FIR Filter:
Filter design plays an extremely important role in optimizing any signal-processing design. The FIR filter is the essential building block of Digital Signal Processing (DSP) systems, and its performance relies mainly on MAC operations. The traditional MAC is built around a multiplier, which consumes more energy and computation time and occupies a large area. While researchers and designers concentrate mostly on multiplier designs, other techniques that replace the multiplier yet still perform the required operations deserve exploration. One technique that has given promising results in DSP operations is distributed arithmetic. Conventionally, FIR filters are designed in two different ways:
1) Direct form: In the direct approach, delay units are placed between the multipliers. At a given time, the current input x(n) together with the (M-1) previous samples is fed to the multipliers, and the filter output y(n) is the sum of the outputs of all the multipliers.
2) Transposed form: In this approach, the delay units are placed between the adders, so the input is fed to all the multipliers simultaneously. The filter coefficients are convolved with the incoming input data at each tap and summed to obtain the filter output. Equation 1 represents the FIR filter, where M is the number of taps. The convolution involves a large number of additions and multiplications. An FIR filter with input x(n), coefficients b_k, delay (z^{-1}) and output y(n) is depicted in Figure 1.

Figure 1: Architecture of FIR Filter Design [1]

y(n) = \sum_{k=0}^{M-1} b_k \, x(n-k)    (1)

3) Multiplier and Accumulate Design
The MAC structure contains the following low-level modules, as depicted in Figure 2:
a. Multiplier
b. Adder
c. Accumulator.

Figure 2: Conventional MAC Architecture [2]
The two operands are multiplied and the product is delivered to the output of the multiplier. The adder adds two values: the multiplier output and the feedback value from the accumulator.
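The FIR convolution y(n) = b_0 x(n) + b_1 x(n-1) + ... + b_{M-1} x(n-M+1) and the multiply-accumulate loop described above can be illustrated with a short behavioural model. This is only a Python sketch, not the Verilog used in this work; the function names and the sample values are hypothetical, and samples before n = 0 are assumed to be zero.

```python
def fir_direct(x, b):
    """Direct form: each output is a multiply-accumulate over the last len(b) inputs."""
    y = []
    for n in range(len(x)):
        acc = 0  # accumulator cleared for each output sample
        for k, bk in enumerate(b):
            if n - k >= 0:  # samples before n = 0 are taken as zero
                acc += bk * x[n - k]  # multiply, then add to the accumulator
        y.append(acc)
    return y


def fir_transposed(x, b):
    """Transposed form: the input feeds every multiplier; delays sit between adders."""
    m = len(b)
    state = [0] * (m - 1)  # delay registers between the adders
    y = []
    for sample in x:
        products = [bk * sample for bk in b]  # all multipliers see the same input
        y.append(products[0] + (state[0] if state else 0))
        # each register takes its tap product plus the old value of the next register
        for i in range(m - 1):
            nxt = state[i + 1] if i + 1 < m - 1 else 0
            state[i] = products[i + 1] + nxt
    return y


if __name__ == "__main__":
    x = [1, 2, 3, 4]   # hypothetical input samples
    b = [1, 2, 3, 4]   # hypothetical coefficients
    print(fir_direct(x, b))      # [1, 4, 10, 20]
    print(fir_transposed(x, b))  # [1, 4, 10, 20]
```

Both forms produce the same output sequence; they differ only in where the delays are placed, which in hardware changes the critical path rather than the arithmetic.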
978-1-6654-9260-7/23/$31.00 ©2023 IEEE

a) Multiplier Module: The multiplier module computes the product of its inputs and passes the result to the next stage, the addition stage, for further computation. A fast Vedic multiplier is used in this design; it takes comparatively less computation time because it uses Vedic sutras to reduce the number of steps in a multiplication. The Vedic multiplier also consumes less energy and uses relatively less area [3]. It implements standard multiplication with the well-known Urdhva Tiryakbhyam (UT) sutra for all multiplication cases. This sutra, described in the ancient Vedas, is a good choice for fast multiplication: it gives higher speed and lower delay than other existing methods and is efficient in area utilisation and computation time.
b) Adder module: The adder module uses the simplest adder, the Ripple Carry Adder (RCA); other efficient adders available in the literature may also be used. The RCA is a combinational logic circuit that adds two m-bit numbers using m full-adder submodules. An m-bit parallel adder is generally used because it is easy to implement, and the number of bits can be changed to suit the application requirement. The block is readily available and can be easily integrated with other designs.
c) Accumulator Module: It takes the output of the adder and accumulates it with the previous outputs to generate the filter output on every clock cycle.

II. Introduction to Distributed Arithmetic:

In Very Large-Scale Integration (VLSI), the most popular bit-serial approach is Distributed Arithmetic (DA). In this approach, the values are calculated in advance based on the number of address bits and stored in a lookup table (LUT); these stored values are used in the computation in place of the products from a multiplier. A DA-based design is easier to implement on field programmable gate arrays (FPGAs) and takes less computation time, but it is hindered by excessive area utilization as the number of bits increases. This is one of the main factors in choosing between a multiplier-based MAC and a multiplier-less MAC. The input bits are taken concurrently and used as the address bits of the LUT; the LUT output is added to the accumulated partial products. Calculating the dot product takes N clock cycles, where N is the number of bits in each input and is independent of the number of input variables [4]. Figure 3 shows the LUT-based DA design architecture.

Figure 3: Architecture of LUT based DA design

II. DESIGN IMPLEMENTATION
The flow chart shown in Figure 4 gives the design implementation methodology followed to obtain the meaningful results discussed in the later part of the paper.
Any new design starts with detailed design specifications, which give clarity about the algorithm used, the number of submodules, the complexity of the design, the hardware and software requirements, and the number and description of the input and output signals. The specification also identifies the control and interfacing unit requirements, which decide how far the design can be enhanced or expanded on a need basis. The design is coded in Verilog, starting from the low-level modules and integrating them upward into the high-level module that gives the results required by the application. Verification is performed at every stage of the design flow to check functionality. Performance reports are generated using EDA tools, and the proposed design is compared with the conventional design to conclude which is better for the chosen application.

Figure 4: Step-by-Step Design Flow carried out for proposed design
International Conference on Intelligent and Innovative Technologies in Computing, Electrical and Electronics (IITCEE)
I. Design Details of 4 tap FIR Filter based on MAC
In the proposed design a four-tap filter is considered; it has four coefficients. The inputs X and Y are 8 bits wide and the output is 16 bits wide. The design also accommodates negative numbers in two's complement format, and the output changes according to the input applied to the design. The structural architecture of the FIR filter is depicted in Figure 5.
Table 3: Input and Output Signal Description
Figure 5: Architecture of 4 tap FIR Filter using MAC unit
The high-level module, the four-tap FIR filter, includes 3 low-level modules as shown in Figure 5. Table 1 gives the details of the operation performed by each module.
Table 1: Details of the Low-Level Module and Operation Performed
Table 4: Details of the Module Name and Operation Performed
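To make the roles of the three low-level modules (multiplier, adder, accumulator) concrete, the following Python sketch models a four-tap MAC filter at the bit level. It is a behavioural illustration under stated assumptions, not the Verilog of this work: the recursive construction of the 8x8 multiplier from 2x2 Urdhva Tiryakbhyam blocks, the sign handling by magnitude, and all function names are hypothetical choices made for the example.

```python
def ut_2x2(a, b):
    """Urdhva Tiryakbhyam on 2-bit operands: vertical and crosswise bit products."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    vertical_low = a0 & b0
    crosswise = (a1 & b0) + (a0 & b1)
    vertical_high = a1 & b1
    return vertical_low + (crosswise << 1) + (vertical_high << 2)


def ut_multiply(a, b, bits=8):
    """Unsigned bits x bits multiply built recursively from half-width blocks."""
    if bits == 2:
        return ut_2x2(a, b)
    half = bits // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    cross = ut_multiply(a_hi, b_lo, half) + ut_multiply(a_lo, b_hi, half)
    return (ut_multiply(a_lo, b_lo, half)
            + (cross << half)
            + (ut_multiply(a_hi, b_hi, half) << bits))


def ripple_carry_add(a, b, bits=16):
    """Add two words one full adder at a time; the final carry is dropped."""
    carry, total = 0, 0
    for i in range(bits):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return total


def signed_multiply(x, c):
    """Assumed sign handling: multiply magnitudes, then restore the sign."""
    product = ut_multiply(abs(x), abs(c), 8)
    if (x < 0) != (c < 0):
        product = -product
    return product & 0xFFFF  # 16-bit two's-complement pattern


def mac_fir_4tap(samples, coeffs):
    """Four-tap FIR: one multiply, add and accumulate per tap for each output."""
    window = [0, 0, 0, 0]  # most recent sample first
    outputs = []
    for s in samples:  # each sample must fit in 8-bit two's complement
        window = [s] + window[:3]
        acc = 0
        for x, c in zip(window, coeffs):
            acc = ripple_carry_add(acc, signed_multiply(x, c), 16)
        outputs.append(acc - 0x10000 if acc & 0x8000 else acc)
    return outputs


if __name__ == "__main__":
    print(mac_fir_4tap([1, -2, 3, 4], [1, 2, 3, 4]))  # [1, 0, 2, 8]
```

The adder is kept 16 bits wide so that it matches the stated output width; sums that exceed that range wrap around, as they would in a fixed-width register.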
II. Design Details of 4 tap FIR Filter based on DA:

DA uses Look-Up Tables (LUTs) to store the multiplication results. The LUT may be designed as per Table 2. The design is represented in Figure 6. It has low-level modules such as rom_rtl, adder, PIPO and PISO, which perform the operations listed in Table 4. Table 3 gives the details of the input/output signals of the design, with the number of bits, direction and description.
Table 2: Look Up Table (LUT)
Figure 6: Architecture of DA based FIR Filter
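The LUT-based computation can be illustrated with a small behavioural model. The Python sketch below assumes a four-tap filter with 8-bit two's-complement inputs, a 16-entry LUT addressed by one bit from each of the four samples, and LSB-first processing in which the sign-bit term is subtracted. The function names and coefficient values are hypothetical, and the LUT contents are not claimed to match Table 2.

```python
def build_lut(coeffs):
    """Precompute every sum of coefficients selected by an address.

    Bit k of the address decides whether coeffs[k] is included, so a
    four-tap filter needs 2**4 = 16 entries.
    """
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << len(coeffs))]


def da_dot(window, lut, bits=8):
    """Dot product without a multiplier: one LUT read and one add per input bit."""
    acc = 0
    for j in range(bits):  # one iteration per clock, LSB first
        addr = 0
        for k, x in enumerate(window):
            addr |= ((x >> j) & 1) << k  # bit j of sample k becomes address bit k
        term = lut[addr] << j  # weight the LUT output by the bit position
        # the most significant bit carries negative weight in two's complement
        acc += -term if j == bits - 1 else term
    return acc


def da_fir_4tap(samples, coeffs):
    """Four-tap FIR using the LUT in place of the multipliers."""
    lut = build_lut(coeffs)
    window = [0, 0, 0, 0]  # most recent sample first
    outputs = []
    for s in samples:  # each sample must fit in 8-bit two's complement
        window = [s] + window[:3]
        outputs.append(da_dot(window, lut))
    return outputs


if __name__ == "__main__":
    print(build_lut([1, 2, 3, 4])[15])               # 10, all four coefficients
    print(da_fir_4tap([1, -2, 3, 4], [1, 2, 3, 4]))  # [1, 0, 2, 8]
```

Loosely, the list returned by `build_lut` plays the part of the ROM and the loop over `j` stands in for the serialised input bits; this mapping onto rom_rtl, PISO and PIPO is an assumption made for illustration. The number of LUT entries doubles with each additional address bit, which is the area cost noted for DA when the bit count grows.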
III. RESULT AND DISCUSSIONS
The conventional MAC-based and the proposed DA-based FIR filter designs are coded in a Hardware Description Language (Verilog) and simulated to check their functionality. The design is synthesized by targeting 45nm technology libraries using the Cadence XC Simulator. Figure 7 shows the functional verification obtained for the MAC-based design.
The markers in the simulation represent the following:
Marker-1: 8-bit input x.
Marker-2: 8-bit input y.
Marker-3: 16-bit output q.
The simulated values are verified against the theoretical values. d and e, which are 16 bits wide, are the intermediate signals of the design.
Figure 7: Functional Verification for MAC based design
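When comparing waveform values with theoretical ones, the bus patterns have to be read as two's-complement numbers. The helper functions below are a minimal Python sketch for that conversion; the names are hypothetical, and the example assumes, purely for illustration, a single step in which the 16-bit result is the product of the two 8-bit inputs.

```python
def to_signed(value, bits):
    """Interpret an unsigned bus value as a two's-complement integer."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def to_bus(value, bits):
    """Encode a signed integer as the bit pattern seen on a bits-wide bus."""
    return value & ((1 << bits) - 1)


if __name__ == "__main__":
    x_bus, y_bus = 0xFE, 0x03  # hypothetical readings: -2 and 3
    expected = to_bus(to_signed(x_bus, 8) * to_signed(y_bus, 8), 16)
    print(hex(expected))  # 0xfffa, i.e. -6 in 16-bit two's complement
```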

Figure 8 shows the waveform obtained for the DA-based design. It helps to verify the functionality of the design before it is synthesized.
The markers in the simulation represent the following:
Marker-1: Filter coefficient.
Marker-2: 4-bit ADDR.
Marker-3: 16-bit output q.
The simulated results are verified against manual calculation. d and e, each 16 bits wide, are the intermediate signals of the design.
Figure 8: Simulation results for DA based FIR Filter

Synthesis reports are generated for both designs. The area report generated by the Cadence tool for both designs is shown in Figure 9. Clearly, the DA-based approach is more efficient: it uses fewer cells, which in turn reduces the area.
Figure 9: Comparison of Synthesized Area report for both the designs

The proposed method has an advantage in area, power and timing over the conventional approach. The results are shown in Figures 10, 11, and 12.
Figure 10: Comparison of Area based on synthesis report generated
Figure 11: Comparison of the Delay based on synthesis report generated
Figure 12: Comparison of Power Consumed based on synthesis report

Table 3 gives the consolidated comparison of area, power and timing for the conventional design, i.e., the MAC-based FIR filter, and the proposed design, i.e., the DA-based FIR filter, with clock signal generated.
Table 3: Consolidated Comparison of Area, Power and Timing for proposed design and conventional design

IV. CONCLUSION
Wearable devices require optimized designs to be embedded in them. They demand integrated chips that are more efficient in terms of computation time, since real signals are captured, monitored and analysed dynamically, 24x7. Reduced power consumption is also preferred for portable, wearable devices, since they cannot be charged frequently. Any area reduction acts as a bonus. The proposed design has given an advantage in all three important parameters that the VLSI industry looks for. It may be used in the IC design of smart watches and cellular processors, where speed, energy performance and portability play an essential role. Therefore, a Distributed Arithmetic based FIR filter with a specific LUT is a solution that can be easily adopted. DA-based FIR implementations tend to consume less area, computation time and energy, by approximately 64%, 62% and 63% respectively, when compared with the MAC-based filter. In future, the distributed arithmetic methodology may be applied to other blocks of a DSP processor to gain its full advantage. The only drawback of this technique is that it may consume more area as the number of bits increases, because the LUT size grows exponentially. The designer should choose the design to be incorporated carefully, based on the application.

ACKNOWLEDGEMENT
We would like to thank the management of B N M Institute of Technology, Bengaluru, for providing the resources required to carry out the project, and VTU, Belgaum, for motivating us to write this paper. Their constant encouragement and steady support will always encourage the research community to think in a different direction to serve society.

REFERENCES
[1] N. Pal, H. Singh, R. Sarin, and S. Singh, “Implementation of High-Speed FIR Filter using Serial and Parallel Distributed Arithmetic Algorithm,” International Journal of Computer Applications, vol. 25, 07 2011.
[2] M. Bharathi, Y. J. Shirur, and P. L. Lahari, “Performance evaluation of Distributed Arithmetic based MAC Structures for DSP Applications,” 2020 7th International Conference on Smart Structures and Systems (ICSSS), pp. 1-5, 2020.
[3] S. Dayanand, V. K R, R. T, Y. J. M. Shirur, and J. R. Munavalli, “Low Power High Speed Vedic Techniques in Recent VLSI Design – A Survey,” pices, vol. 4, no. 6, pp. 147-156, Oct. 2020.
[4] Ashish B. Kharate and P. R. Gumble, “VLSI Design and Implementation of Low Power MAC for Digital FIR Filter,” International Journal of Electronics Communication and Computer Engineering, vol. 4, issue 2, REACT-2013, ISSN 2249–071X, June 2013, pp. 604–605.
[5] Juthi Farhana Sayed, Bhuiyan Hasibul Hasan, Babul Muntasir, Mehedi Hasan, and Farhadur Arifin, “Design and Evaluation of a FIR Filter Using Hybrid Adders and Vedic Multipliers,” 2021 2nd International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST).
[6] D. Maskell, “Design of efficient multiplier less FIR filters,” IET Circuits, Devices & Systems, vol. 1, pp. 175–180(5), 2007.
[7] Haw-Jing Lo, “Distributed Arithmetic,” in Design of a Reusable Distributed Arithmetic Filter and Its Application to the Affine Projection Algorithm, Georgia, UMI Microform, 2009, pp. 3-12.
[8] Heejong Yoo and David V. Anderson, “Hardware-Efficient Distributed Arithmetic Architecture for High-Order Digital Filters,” IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), April 2005, pp. 125-128.
[9] Cui Guo-wei and Wang Feng-ying, “The Implementation of FIR Low-pass Filter Based on FPGA and DA,” Fourth International Conference on Intelligent Control and Information Processing (ICICI IP), 9 – 11.
