CHAPTER 2

LITERATURE REVIEW

2.1 INTRODUCTION

This chapter surveys the literature on multiplier based FIR filter design
and on multiplierless FIR filter design using the distributed arithmetic
method.

2.2 SURVEY BASED ON VLSI DESIGN OF FIR FILTER

Mirzaei et al (2006) presented a method for implementing high
speed finite impulse response (FIR) filters using only registered adders and
hardwired shifts. A modified common subexpression elimination algorithm
was used to reduce the number of adders. The design was implemented in
Xilinx Virtex II devices and compared with Xilinx Coregen™ using
distributed arithmetic. The results showed 50% reduction in the number of
slices and up to 75% reduction in the number of LUTs for fully parallel
implementations. The total dynamic power consumption of the filters was
reduced by 50%. The designs perform significantly faster than the MAC
filters, which utilize embedded multipliers.

Bougas et al (2005) presented a detailed design of a folded
finite impulse response (FIR) filter based on pipelined multiplier arrays. The
design was considered at the bit level and the internal delays of the pipelined
multiplier array were fully exploited in order to reduce hardware complexity.
Both direct and transposed FIR filter forms were considered. The carry-save
and the carry-propagate multiplier arrays were studied for the filter
implementations. The proposed schemes were compared, in terms of
hardware complexity, with a straightforward implementation of a
folded FIR filter based on the pipelined Wallace Tree multiplier. The
comparison reveals that the proposed schemes require 20%-30% less
hardware.

Maskell (2007) formulated an algorithm for reducing the hardware
complexity of linear phase finite impulse response digital filters that
minimised the adder depth in the multiplier block adders (MBAs). The
algorithm starts by aggressively reducing both the coefficient word length and
the number of non-zero bits in the filter coefficients. It reduces the number of
adders (and the adder depth) needed to construct the coefficient multiplier and
results in an increased operating frequency. The results showed that this
technique achieved a 67% reduction in the number of MBAs and a 70%
reduction in the number of multiplier block full adders (FAs).

Vinod et al (2007) presented a minimal-difference differential
coefficients method for low power and high speed realisation of differential
coefficients based finite impulse response filters. The minimal difference
differential coefficients can be coded using fewer bits, which in turn reduces
the number of full additions required for coefficient multiplication. By
employing a differential coefficient partitioning algorithm and a pseudo
floating point representation, the number of full adders and the net memory
needed to implement the coefficient multipliers can be significantly reduced.
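
The underlying idea, that each coefficient product can be built from the previous product plus a small difference term, can be illustrated with a minimal Python sketch. It assumes plain first-order differences between neighbouring coefficients, not the minimal-difference selection, partitioning or pseudo floating point coding of the cited work. The function names and the sample coefficient values are hypothetical.

```python
def first_differences(coeffs):
    """Differences between each coefficient and its predecessor."""
    return [cur - prev for prev, cur in zip(coeffs, coeffs[1:])]


def differential_products(coeffs, x):
    """Compute every coeffs[k] * x, multiplying only coeffs[0] and the differences.

    Each later product is the previous product plus (difference * x), so the
    multipliers after the first one only see the short difference words.
    """
    products = [coeffs[0] * x]
    for diff in first_differences(coeffs):
        products.append(products[-1] + diff * x)
    return products


def bits_needed(values):
    """Magnitude bits needed for the largest absolute value in `values`."""
    return max(abs(v) for v in values).bit_length()


# Hypothetical, slowly varying coefficient set (illustration only).
coeffs = [100, 104, 109, 111, 109]
print(differential_products(coeffs, 7) == [c * 7 for c in coeffs])
print(bits_needed(coeffs), bits_needed(first_differences(coeffs)))
```

For slowly varying coefficient sets such as this one, the differences need far fewer magnitude bits than the coefficients themselves, which is where the saving in full additions would come from.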

Park et al (2000) expressed the FIR filter operation as
multiplication of a vector by scalars. High speed implementations were
presented for adaptive and non-adaptive filters based on a computation
sharing multiplier which specifically targets computation reuse in vector
scalar products. The performance of the implementation was compared with
implementations based on carry save and Wallace tree multipliers in 0.6 µm
technology. The result showed that the sharing multiplier scheme gives
improvement in speed by approximately 30% and 21% with respect to the
Wallace tree multiplier based implementation for non-adaptive and adaptive
filters, respectively.

Badave & Bhalchandra (2012) pointed out that the area complexity
of finite impulse response (FIR) filters is mainly caused by
multipliers. Among the multiplierless techniques for FIR filters, Distributed
Arithmetic was the most preferred area efficient technique. In this technique,
precomputed values of the inner product are stored in a look-up table (LUT)
and are then added and shifted over a number of iterations equal to the
precision of the input samples. Design implementation and synthesis results
showed an improvement in speed of operation as well as a saving in area.
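
The look-up, shift and add procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: integer coefficients, two's complement input samples, and one LUT addressed by one bit from every sample. The function names are hypothetical and are not taken from the cited paper.

```python
def build_da_lut(coeffs):
    """Precompute all partial inner products.

    Entry `addr` holds the sum of the coefficients whose bit is set in `addr`.
    """
    taps = len(coeffs)
    return [
        sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
        for addr in range(1 << taps)
    ]


def da_inner_product(lut, samples, bits):
    """Inner product of the coefficients with `bits`-bit two's complement samples.

    One LUT read and one shift-add is performed per input bit, so the number
    of iterations equals the precision of the input samples.
    """
    mask = (1 << bits) - 1
    words = [s & mask for s in samples]  # two's complement bit patterns
    acc = 0
    for b in range(bits):
        # Gather bit b of every sample into a LUT address.
        addr = 0
        for k, w in enumerate(words):
            addr |= ((w >> b) & 1) << k
        term = lut[addr] << b
        # The sign bit carries weight -2^(bits-1), so its term is subtracted.
        acc += -term if b == bits - 1 else term
    return acc


lut = build_da_lut([3, -5, 7, 2])
print(da_inner_product(lut, [10, -3, 4, -8], bits=8))  # 3*10 + 5*3 + 7*4 - 2*8 = 57
```

The LUT grows as 2^N with the number of taps N, which is why practical designs usually partition a long filter into several smaller tables.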

Mirzaei et al (2010) proposed a method for implementation of high
speed finite impulse response (FIR) filters on field programmable gate arrays
(FPGAs). The algorithm was a multiplierless technique where fixed
coefficient multipliers were replaced with a series of add and shift operations.
The design performed up to 27% faster than the multiply accumulate (MAC)
filters implemented by the Xilinx Coregen tool using DSP blocks.

Yoo & Anderson (2005) introduced a new memory efficient
distributed arithmetic (DA) architecture for high order FIR filters. This
architecture was based on a memory reduction technique for DA Look Up
Tables (LUTs) and it requires fewer transistors for high-order filters than
the original LUT based DA, DA offset binary coding (DA-OBC), and the LUTless
DA-OBC. Recursive iteration of the memory reduction technique
significantly increases the filter order implementable on an FPGA
platform by not only saving transistor counts, but also balancing hardware
usage between logic elements (LEs) and memory.

Hwang et al (2004) proposed a new distributed arithmetic (DA)
algorithm for low power finite impulse response (FIR) filter implementation.
The characteristics of the algorithm are that the FIR filters do not need to
employ two's complement representation in lookup tables as well as
multiply-and-accumulation blocks. Thus, this new DA algorithm can minimize the
dynamic power consumption of the FIR filters. The experimental results
showed that the low pass FIR filter using the new DA algorithm achieved
29% and 26% power consumption reduction compared to that using the
conventional FIR algorithm for zero mean random inputs and speech inputs,
respectively.

Akshoy et al (2013) noted that in the last two decades, many
efficient algorithms and architectures have been introduced for the design of
low complexity bit parallel Multiple Constant Multiplications (MCM)
operation which dominates the complexity of many digital signal processing
systems. On the other hand, little attention has been given to the digit-serial
MCM design that offers alternative low complexity MCM operations at the
cost of an increased delay. The problem of optimizing the gate level area in
digit-serial MCM designs was addressed and high level synthesis algorithms,
design architectures, and a computer aided design tool were introduced.

Namin et al (2010) designed a high speed VLSI implementation of
a 233 bit serial in parallel out finite field multiplier. The multiplier was
realised in a 0.18µm CMOS technology employing multiples of a domino
logic block. The multiplier was simulated and functioned correctly up to a
clock rate of 1.587 GHz, achieving greater performance while occupying less
area compared to similar designs.

Baran et al (2011) proposed energy efficient 16×16 bit serial and
parallel multipliers and compared them in 45 nm CMOS technology. A
multiplier structure was proposed by optimizing the architecture, gate sizes
and the voltage supply. The proposed structure provides 15% more as
compared to two cycle parallel multiplier with the same energy consumption
for high speed applications.

Young & Jones (1991) observed that many signal processing
algorithms involve multiplications of data that can be organized as scalar
vector products. Their method computes scalar vector products
from shifted partial products of the vector elements. The method achieved
a several fold reduction in the computation required to implement a scalar
vector product. The reduction in computation translated into an overall
decrease in chip area required for a VLSI layout of the filter.

Soderstrand & Al-Marayati (1995) showed that the high order FIR
filters required for the new modulation schemes associated with wireless
computer networks and cellular telephones can be implemented in VLSI
circuitry using low power CMOS technology and a novel application of
Residue Number System (RNS) arithmetic.

Kollig et al (1997) described the design and implementation of high
performance, high speed linear phase FIR filters using FPGA technology.
Various well-known multiplier architectures were compared and an
appropriate structure for FPGA implementation was identified. The high data
throughput required to accommodate the input sample values was achieved
using an interleaved memory structure. The model operates on 2’s
complement numbers and is characterised by three user defined parameters:
the number of filter taps, the signal word length and the coefficient word
length.

Mehendale et al (1998) presented algorithmic and architectural
transforms for low power realization of Finite Impulse Response (FIR) filters
implemented both in software on programmable DSPs and as hardwired
macros. These transforms address power reduction in the program memory
address and data buses and also the multiplier. The transforms for hardwired
FIR filters aim at reducing the supply voltage while maintaining the
throughput. The transformations that reduced the computational complexity
of the FIR filter computation were also suggested in order to achieve power
reduction.

Tang et al (2002) devised a new high speed, programmable FIR
filter, which is a multiplierless filter with canonical signed digit (CSD)
encoded coefficients. With this encoding scheme, the speed of the filter was
improved and the area was optimized. In order to make this filter more widely
applicable, a new programmable CSD encoding structure was employed to make
the CSD coefficients programmable.
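
As a minimal illustration of why CSD coefficients suit multiplierless filters, the Python sketch below recodes a non-negative integer coefficient into digits from {-1, 0, 1} with no two adjacent non-zero digits, then multiplies using only shifts, additions and subtractions. The function names are hypothetical, and this is a generic textbook recoding, not the programmable encoding structure of the cited paper.

```python
def to_csd(value):
    """Canonical signed digit form of a non-negative integer.

    Returns digits in {-1, 0, 1}, least significant first.
    """
    digits = []
    while value != 0:
        if value & 1:
            # +1 when value % 4 == 1, -1 when value % 4 == 3; either choice
            # leaves a multiple of 4, so the next digit is guaranteed zero.
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def csd_multiply(x, digits):
    """Multiply x by the CSD-coded constant using shifts and add/subtract only."""
    acc = 0
    for shift, digit in enumerate(digits):
        if digit == 1:
            acc += x << shift
        elif digit == -1:
            acc -= x << shift
    return acc


print(to_csd(7))                    # [-1, 0, 0, 1], i.e. 8 - 1
print(csd_multiply(13, to_csd(7)))  # 91
```

The constant 7 needs three non-zero bits in plain binary (two adders) but only two non-zero CSD digits (one subtractor), which is the kind of saving a CSD based filter exploits.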

Chang & Jen (2001) proposed a hardware efficient pipelined FIR
architecture with programmable coefficients. FIR operations are first
reformulated into multi bit DA form at the algorithm level. Then, at the
architecture level, the (p, q) compressor, instead of Booth encoding or RAM
implementation, is applied for high speed operation. Due to the simple
architecture, one can easily pipeline the FIR filter to the adder level and save
up to half of the cost of previous designs without sacrificing performance.
The proposed design was implemented for bit parallel input, which can
save 36.7% of the area cost compared with previous approaches.

Park et al (2004) presented a programmable digital Finite Impulse
Response (FIR) filter for high performance and low power applications. The
architecture was based on a computation sharing multiplier (CSHM) which
specifically targets computation reuse in vector scalar products. It could be
effectively used in the low complexity programmable FIR filter design.
Efficient circuit-level techniques, namely a new carry select adder and
conditional capture flipflop (CCFF), were also used to further improve power
and performance. A 10-tap programmable FIR filter was implemented and
fabricated in CMOS 0.25 µm technology based on the proposed architectural
and circuit level techniques.
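
The computation reuse idea can be sketched in Python under some loudly stated assumptions: a shared bank holds the odd multiples 1x to 15x of the input sample, and each non-negative coefficient is split into 4-bit fields that select one bank entry and shift it. The field width, the bank contents and the function names are illustrative assumptions, since the text above does not give these details.

```python
# Assumed bank contents: the odd multiples that cover every 4-bit field.
ODD_MULTIPLES = (1, 3, 5, 7, 9, 11, 13, 15)


def precompute_bank(x):
    """Compute the shared multiples of x once; every tap reuses them."""
    return {m: m * x for m in ODD_MULTIPLES}


def split_field(field):
    """Write a 4-bit field as odd * 2**shift; returns (0, 0) for a zero field."""
    if field == 0:
        return 0, 0
    shift = 0
    while field % 2 == 0:
        field //= 2
        shift += 1
    return field, shift


def shared_multiply(coeff, bank):
    """coeff * x for a non-negative coeff, using only bank reads, shifts and adds."""
    acc = 0
    position = 0  # bit position of the current 4-bit field
    while coeff:
        odd, shift = split_field(coeff & 0xF)
        if odd:
            acc += bank[odd] << (shift + position)
        coeff >>= 4
        position += 4
    return acc


bank = precompute_bank(37)
print([shared_multiply(c, bank) for c in (6, 77, 255)] == [6 * 37, 77 * 37, 255 * 37])
```

Because the bank depends only on the input sample, its cost is paid once per sample and shared by all taps, while each tap needs only selection, shifting and a short addition.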

Demirsoy et al (2004) proposed Reconfigurable Multiplier Blocks
(ReMB) that offered significant complexity reductions in multiple constant
multiplications in time multiplexed digital filters. The ReMB technique is
employed in the implementation of a half band 32 tap FIR filter on both
Xilinx Virtex FPGA and UMC 0.18 µm CMOS technologies. Reference
designs have also been built by deploying standard time multiplexed
architectures and off-the-shelf Xilinx Core Generator system for the FPGA
design. All those designs are then compared for their area and delay figures. It
was shown that the ReMB technique could significantly reduce the area for
the multiplier circuitry and the coefficient store, as well as reduce the delay.

Srinivasan et al (2004) demonstrated the feasibility of low power
analog FIR filter design using current mode techniques. The design results
indicated a power consumption that was much lower than the alternate digital
implementations at 20 MSPS, making it attractive for portable applications.
The FIR filter comprises a current mode sample and hold configured as a
delay element followed by an analog multiplier.

Dawoud & Masupe (2004) presented the design of a digital
serial N tap FIR filter with programmable coefficients. The design introduced
a new digit serial multiplier that guaranteed minimum processing time and
reduced the hardware requirements. Sign-amplitude representation for the
coefficients and two’s complement for the input samples simplified the
circuit configuration and allowed the use of one common two’s complement
circuit for all the filter sections. A 100 tap, 8-bit word length version of the
filter was implemented using an Altera FPGA device. The filter can be used
in real time processing with sample rates ranging from 1.5 to 21 MHz.

Lindahl & Bengtsson (2005) presented an FIR filter combining
Residue Number System (RNS) and Radix-2 Signed Digit (SD) representation.
RNS offers parallelization of the computations and SD carry free additions.
The moduli set were used for reducing the complexity of the RNS arithmetic
units. The evaluated filters have 8, 12 and 16 taps, binary word lengths
between 16 and 64 bits and they have been synthesized using a UMC
0.13 µm CMOS cell library with 8 metal layers. Power, delay, and area
comparisons were made with equivalent 2’s complement designs. The area
delay and power delay products showed that reduction in both power and area
at the same filter throughput could be expected.
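
The parallelism that RNS offers can be illustrated with a minimal Python sketch: the multiply-accumulate of one filter output is carried out independently in each residue channel, and the result is recovered with the Chinese remainder theorem. The moduli set (15, 16, 17) is an assumed example of the common form {2^n - 1, 2^n, 2^n + 1}; the text above does not state which set was used, and the function names are hypothetical. Signed digit arithmetic is not modelled here.

```python
from math import prod

# Assumed example moduli, pairwise coprime: {2^n - 1, 2^n, 2^n + 1} with n = 4.
MODULI = (15, 16, 17)


def rns_fir_output(coeffs, samples):
    """One FIR output computed separately in each residue channel.

    The channels never exchange carries, so in hardware they could run in
    parallel on short word lengths.
    """
    residues = []
    for m in MODULI:
        acc = 0
        for c, s in zip(coeffs, samples):
            acc = (acc + (c % m) * (s % m)) % m
        residues.append(acc)
    return tuple(residues)


def from_rns(residues):
    """Chinese remainder reconstruction, mapped to a signed range."""
    dynamic_range = prod(MODULI)
    total = 0
    for r, m in zip(residues, MODULI):
        partial = dynamic_range // m
        total += r * partial * pow(partial, -1, m)  # modular inverse (Python 3.8+)
    value = total % dynamic_range
    # Upper half of the range represents negative numbers.
    return value - dynamic_range if value >= dynamic_range // 2 else value


residues = rns_fir_output([3, -5, 7, 2], [10, -3, 4, -8])
print(residues, from_rns(residues))  # (12, 9, 6) 57
```

The result is only correct while the true output stays inside the dynamic range of the moduli set (here -2040 to 2039), which is why the moduli are chosen to match the filter's word length.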

Yue et al (2005) introduced a low-power scheme for the VLSI
implementation of FIR filters based on a standard cell. The scheme was a
cross level solution in terms of design flow. A multi-hierarchy pipelined
scheme was introduced at the architecture level. Addition was integrated into
the Wallace tree structure, which made a minimum sized device solution
possible at the circuit level. The simulation showed that 20% of the
power was saved by means of this method.

Chen & Chiueh (2006) proposed a digit reconfigurable FIR filter
architecture with a very fine granularity. It provides a flexible yet compact
and low power solution to FIR filters with a wide range of precision and tap
length. Based on this design, an 8 digit reconfigurable FIR filter chip was
implemented in a single poly quadruple metal 0.35µm CMOS technology.
Measurement results showed that the fabricated chip operated up to 86 MHz
when the filter drew 16.5 mW of power from a 2.5V power supply.

Castellano et al (2008) presented a new method and an algorithm
for the synthesis of a high speed variable coefficient FIR filter. Timing
performance and reduced area were achieved by employing two techniques.
Firstly, a merged arithmetic architecture was used to synthesize the FIR
filter function directly. Secondly, an algorithm that looked for minimum delay
Partial Product Reduction Tree (PPRT) was developed. These results were
combined to create a program that furnished a speed optimized net list for the
filter. The performance of this method has been evaluated by comparing it
with the results achieved by cell based synthesis software.

Patronik et al (2011) presented a constant coefficient FIR filter design
using residue number system (RNS) arithmetic. A common
subexpression elimination (CSE) technique, based on synthesis technique,
was introduced for RNS-based MBs, along with a high-performance RNS
based FIR filter architecture. It employed RNS arithmetic principles but
implemented them mainly using more efficient 2’s complement hardware.
Several filters with a number of taps ranging from 25 to 326 and dynamic
ranges from 24 to 50 bits have been synthesized using the TSMC 90 nm LP
kit and Cadence RTL compiler. The results indicated up to 22% improvement
in performance (19% reduction in area) within a bounded power envelope, or up
to 14% reduction in power consumption (12% reduction in area) at the same
frequency.

Wyrzykowski & Ovramenko (1992) proposed a new systolic array
architecture with a spiral structure of interconnections for
very high throughput VLSI FIR filters. This architecture consists of L x K
cells, where K is the filter order and 1 < L < K. The architecture produces
multiple output samples in parallel and allows an increase in filter throughput
by L times, in comparison with the usual linear arrays consisting of K cells.
As a result, high flexibility in the realisation of FIR filters with desired
throughputs is achieved.

Wang (1994) considered the delayed least mean square (DLMS)
algorithm for adaptive finite impulse response (FIR) filtering applications
where high throughput rates are required. Here, a bit serial bit level systolic
array, based on new schemes for multiplication and inner-product
computation, was presented to implement DLMS adaptive N tap FIR filters.
The architecture was highly regular, modular, and thus well-suited for VLSI
implementation. It has an efficiency of 100% and a throughput rate of one
filter output per 2B cycles, where B is the word length of input data.
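
For readers unfamiliar with DLMS, the following minimal Python sketch shows the weight update at the algorithm level only; it does not model the bit serial systolic array. The update at time n uses the error and the input vector from time n - delay, which is what allows the hardware to be pipelined. The unknown system, step size, delay and function name are all assumptions chosen for illustration.

```python
import random


def dlms_identify(x, d, taps, mu, delay):
    """Adapt `taps` FIR weights so the filter output tracks the desired signal d.

    The weight update applied at time n uses the (error, input vector) pair
    produced `delay` samples earlier; delay = 0 gives the ordinary LMS update.
    """
    w = [0.0] * taps
    pending = []  # (error, input vector) pairs waiting for their delayed use
    for n in range(len(x)):
        u = [x[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        y = sum(wk * uk for wk, uk in zip(w, u))
        pending.append((d[n] - y, u))
        if len(pending) > delay:
            e_old, u_old = pending.pop(0)
            w = [wk + mu * e_old * uk for wk, uk in zip(w, u_old)]
    return w


# Hypothetical system identification experiment (noise free).
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
h = [0.5, -0.3, 0.2]  # assumed unknown system
d = [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0) for n in range(len(x))]

w = dlms_identify(x, d, taps=3, mu=0.05, delay=2)
print([round(wk, 3) for wk in w])
```

The delay tightens the stability bound on the step size compared with ordinary LMS, so a small `mu` is used here.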

Raita-aho et al (1994) presented a VLSI implementation of a linear
phase digital filter designed for ECG signal processing. With a
sampling rate of 100 Hz, the pass band ranged from 0.5 Hz to 49.5 Hz with
0.5 dB ripple. The filter architecture was based on the use of recursive running
sum blocks, resulting in a very low computational complexity. Module
generators have been used in the layout design for high integration density.
The circuit has been designed for a 2.0 µm double metal CMOS technology,
having about 34000 transistors and a 15.43 mm² chip area.
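
The low complexity of a recursive running sum block can be seen in a minimal Python sketch: a moving sum over N samples needs one addition and one subtraction per output, whatever the value of N. The function name is hypothetical, and how such blocks were combined into the ECG filter is not modelled here.

```python
def running_sum(x, n):
    """Moving sum y[i] = x[i] + x[i-1] + ... + x[i-n+1].

    Computed recursively as y[i] = y[i-1] + x[i] - x[i-n], with samples
    before the start of the sequence taken as zero.
    """
    y = []
    acc = 0
    for i, sample in enumerate(x):
        acc += sample          # newest sample enters the window
        if i >= n:
            acc -= x[i - n]    # oldest sample leaves the window
        y.append(acc)
    return y


print(running_sum([1, 2, 3, 4, 5], 3))  # [1, 3, 6, 9, 12]
```

With integer (fixed point) data the recursion is exact, so no error accumulates; no multipliers are needed at all.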

Mehendale et al (1996) studied the low power
realization of FIR filters using multirate architectures. The multirate
architectures enable computationally efficient implementation of FIR filters.
The computational complexity of these architectures was analysed and power
analysis was presented to show how the computational efficiency could be
exploited to reduce power dissipation. The results showed up to 73% power
reduction for dedicated ASIC implementation with no data path area
overhead. The paper also presented the implementation of the multirate
architecture on the TMS320C2x/C5x programmable DSPs and showed that it
resulted in up to 43% power reduction.

Nagendra & Irwin (1996) presented design techniques which help
in building very high speed and low power filters. In their design, the effects
of multiplier recoding were investigated. The power consumption in the
multipliers was reduced by exploiting gated clocks. Then, low power
techniques were introduced to deal with pipelining issues because a large
fraction of the total power was consumed by the clock circuitry.

Hsieh & Kim (1996) proposed a highly modular pipelined VLSI
architecture for two dimensional (2-D) finite impulse response (FIR) digital
filters. This approach decomposed the 2-D discrete convolutions into pipelined
summations of parallel 1-D discrete convolutions, so that a modular structure
was obtained. The advantages of this architecture were: (i) it was a regular
structure with high modularity, (ii) it involved fully pipelined operation with
maximum utilization of the hardware resources, (iii) it exhibited the real time
processing ability with low system latency.
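
The decomposition described above can be sketched in Python at the arithmetic level: each output row is the sum of 1-D convolutions of earlier image rows with the corresponding kernel rows. Pipelining and hardware structure are not modelled, zero padding before the first row and column is an assumption, and the function names are hypothetical.

```python
def conv1d(row, kernel):
    """Causal 1-D convolution of one row, same length as the input row."""
    out = [0] * len(row)
    for j in range(len(row)):
        for c, k in enumerate(kernel):
            if j - c >= 0:
                out[j] += k * row[j - c]
    return out


def fir2d_by_rows(image, h):
    """2-D FIR filter built from independent 1-D row convolutions.

    Output row i is the sum over r of conv1d(image[i - r], h[r]), so each
    kernel row can be served by its own 1-D filter module.
    """
    cols = len(image[0])
    out = []
    for i in range(len(image)):
        acc = [0] * cols
        for r, kernel_row in enumerate(h):
            if i - r >= 0:
                partial = conv1d(image[i - r], kernel_row)
                acc = [a + b for a, b in zip(acc, partial)]
        out.append(acc)
    return out


print(fir2d_by_rows([[1, 2], [3, 4]], [[1, 1], [1, 1]]))  # [[1, 3], [4, 10]]
```

Since the 1-D convolutions for different kernel rows do not depend on each other, they can run in parallel, and only their summation needs to be ordered.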

Jaccottet et al (2010) introduced a design methodology to
implement low complexity and high speed digital Finite Impulse Response
(FIR) filters. To achieve low complexity or high speed, the
addition/subtraction operations were implemented by making use of Ripple
Carry Adder (RCA) or Carry-Save Adder (CSA) architectures respectively.
Furthermore, high level algorithms, designed for the optimization of the
number of RCA and CSA blocks, were used to reduce the complexity of the
FIR filter. Thus, a Computer Aided Design (CAD) tool that synthesizes low
complexity and high speed FIR filters in a shift-adds architecture was
developed. It was observed from the experimental results on FIR filter
instances that the CAD tool developed could find better FIR filter designs in
terms of area and delay than those obtained using efficient general
multipliers.

Bartlett & Grass (2001) designed an asynchronous FIR filter, based
on a single bit plane architecture with a data dependent, dynamic and logic
implementation. Its energy consumption and sample computation delay were
shown to correlate approximately linearly with the total number of ones
in its coefficient set. The proposed architecture has the property that
coefficients in a sign magnitude representation can be handled at negligible
overhead which, for typical filter coefficient sets, was shown to offer
significant benefits to both energy consumption and throughput. Transistor
level simulations show energy consumption to be lower than in previously
reported designs.
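
Since the cost is reported to grow with the number of ones in the coefficient set, a minimal Python sketch can show why a sign magnitude representation might help for small negative coefficients. The 8-bit word length, the example coefficients and the choice to count only the magnitude bits are assumptions made for illustration; the function names are hypothetical.

```python
def ones_twos_complement(value, bits):
    """Number of ones in the `bits`-bit two's complement pattern of value."""
    return bin(value & ((1 << bits) - 1)).count("1")


def ones_sign_magnitude(value):
    """Number of ones in the magnitude field (sign bit assumed handled separately)."""
    return bin(abs(value)).count("1")


# Hypothetical coefficient set with small negative values.
coeffs = [-1, 3, -2, 5]
print(sum(ones_twos_complement(c, 8) for c in coeffs))  # 19
print(sum(ones_sign_magnitude(c) for c in coeffs))      # 6
```

Small negative numbers are almost all ones in two's complement (for example, -1 is 11111111 in 8 bits), whereas their magnitudes contain very few ones.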

Figueroa et al (2001) proposed a new approach for the design of high
performance low power linear filters. P channel transistors were used for the
design of analog memory cells and mixed signal circuits for fast low power
arithmetic. To demonstrate the effectiveness of this approach, a 16 tap 7b
200MHz mixed signal finite impulse response (FIR) filter was designed and
the design consumed 3 mW at 3.3 V.

Muhammad et al (2001) described area and power reduction
techniques for a low latency adaptive finite impulse response filter for
magnetic recording read channel applications. The proposed parallel
transposed direct form architecture operated on real time input data samples
and employed a fast, low area multiplier. The proposed filter has been
fabricated by means of a 0.18 µm CMOS technology and it operated at
550 MSamples/s.

Soudris et al (2003) described a design methodology for FIR filter
implementation based on Residue Number System (RNS). It aimed at power,
delay and hardware complexity reduction that necessitated a comparison with
conventional binary implementations. Second, a CAD tool was developed that
derives RNS full adder based DSP architectures consisting of FIR, scaling,
converter, multiplication and accumulation units.

Jamal et al (2003) presented a novel implementation of a low
power FIR filtering algorithm. The algorithm is distributed in such a way that
an input to a MAC, which is the coefficient of an FIR filter, remains
unchanged during four consecutive multiplication and accumulation
processes. It reduces switching activity and hence reduces power
consumption.

Stefatos et al (2005) presented a customized reconfigurable VLSI
architecture that was tailored for the implementation of low power,
medium/high order, digital FIR filters. These were realized within a
reconfigurable array that consisted of heterogeneous programmable and
arithmetic logic units.

Chen et al (2006) described a new design methodology for
the realization of the FIR digital filter in transposed direct form. The holistic
effects of operand length and adder structure on the area time
complexity of FIR filters were examined. Fine grained cost metrics based on the
number of full adders and the number of full adder delays were used to compare
the area and timing complexities of the multiplier blocks of FIR filters designed
by different algorithms.

FIR filter implementations using different technologies, and
their comparison, are shown in Table 2.1.

Table 2.1 The FIR filter implementation using different technologies and
their comparison

| S.No | Title | Author | Year | Features | Technology |
|------|-------|--------|------|----------|------------|
| 1 | FPGA implementation of high speed FIR filters using add and shift method | Shahnam Mirzaei, Anup Hosangadi, Ryan Kastner | 2006 | Adders are reduced, so the number of slices is reduced, leading to a reduction in total power on FPGA | Architectural level optimization implemented in Xilinx Virtex II devices using 250 nm technology |
| 2 | Pipelined array based FIR filter folding | Bougas P, Kalivas P, Tsirikos A, Pekmestzi K Z | 2005 | Folded FIR filter is designed based on carry save multipliers | Architecture level optimization using 250 nm technology |
| 3 | Design of efficient multiplierless FIR filters | D.L. Maskell | 2007 | Coefficient word length and number of non-zero bits in filter coefficients are reduced | Algorithm level optimization implemented in FPGA using 180 nm technology |
| 4 | Computation Sharing Programmable FIR Filter for Low-Power and High-Performance Applications | Jongsun Park, Woopyo Jeong, Hamid Mahmoodi-Meimand, Yongtao Wang, Hunsoo Choo, and Kaushik Roy | 2004 | Power consumption is reduced through computational sharing | Architecture level optimization implemented in Virtex II devices using 250 nm technology |
| 5 | Low power differential coefficients-based FIR filters using hardware optimised multipliers | A.P. Vinod, A. Singla and C.H. Chang | 2007 | Complexity and usage of full adders are reduced | Algorithm level optimization implemented in FPGA using 180 nm technology |
| 6 | Non-adaptive and adaptive filter implementation based on sharing multiplication | Park J., Choo H., Muhammad K., and Roy K | 2000 | Resources are shared during FIR filter operation | Algorithm level optimization implemented in FPGA using 350 nm technology |
| 7 | A New Common Subexpression Elimination Algorithm for Realizing Low-Complexity Higher Order Digital Filters (IEEE Transactions) | Mahesh R, Vinod A.P | 2008 | Complexity is reduced | Algorithm level optimization implemented in Spartan 3 FPGA kit using 180 nm technology |
| 8 | Multiplierless FIR filter implementation in FPGA | S. M. Badave and A. S. Bhalchandra | 2012 | Lookup table based FIR filter implementation in FPGA | Algorithm level optimization implementation in FPGA using 180 nm technology |
| 9 | Layout Aware optimization of high speed fixed coefficient FIR filter for FPGAs | Shahnam Mirzaei, Ryan Kastner, and Anup Hosangadi | 2010 | Optimized FIR filter design is realized | Physical level optimization implemented in 180 nm technology |
| 10 | Hardware-Efficient Distributed Arithmetic Architecture for High-Order Digital Filters | H. Yoo and D. Anderson | 2005 | FIR filter is designed based on look up table based distributed arithmetic unit | Algorithm level optimization implemented in 180 nm technology |
| 11 | New Distributed Arithmetic Algorithm for Low-Power FIR Filter Implementation | Sangyun Hwang, Gunhee Han, Sungho Kang, Jaeseok Kim | 2007 | FIR filter design based on two operand adder based DA unit | Algorithm level optimization implemented in 180 nm technology |
| 12 | Design of digit serial FIR filters: Algorithms, Architectures and a CAD tool | Levent Akshoy, Cristiano Lazzari, Eduardo Costa, Paulo Flores, José Monteiro | 2013 | Developed optimized algorithm to reduce the area at the cost of increase in delay in a digit serial FIR filter | Algorithm level optimization implemented in 180 nm technology |


2.3 SUMMARY

This chapter has surveyed the literature on multiplier based FIR filter
design and multiplierless FIR filter design.
