Multirate Multistage Digital FIR Filter /

Decimator / Interpolator Module Generator
Student : Hsiao-Yun Chen

Advisor : Shyh-Jye Jou, Ph.D.
Department of Electrical Engineering

National Central University
Jhongli 320, Taiwan, R.O.C.
July 2004
Abstract
In this thesis, a module generator that automates the design of high-speed,
low-complexity multirate multistage digital FIR filters, decimators, and
interpolators is presented. The generator exploits architectural symmetries in linear
phase filters and the multistage multirate interpolated FIR (IFIR) filter design
methodology for low complexity. In addition, the polyphase representation is used to
decompose the filter into subfilters. The resulting filters use canonic signed digit
(CSD) multipliers, a transposed direct form structure, and carry-save addition for
high speed. The generator requires only system-level specifications as input and can
provide three types of filter structure for different applications. Its output is
synthesizable Verilog code written at the behavioral level, which allows the synthesis
tool to select the appropriate architecture from the user's constraints. The tool
therefore eliminates the manual calculation, coding, simulation, and verification time
from the design cycle.
We have designed several filters with a TSMC 0.25 µm standard cell library. A 64-QAM
baseband design example shows that the area is reduced by a factor of about 1.64 and
the power dissipation by a factor of about 1.95 for low-complexity applications.
For high-speed applications, the chip can operate at 714 MHz. We also designed IFIR
filters to the specification of the first version of CDMA cellular; compared with the
direct form design, the area is reduced by a factor of about 1.72 and the power
dissipation by a factor of about 13.10. An example of a multistage decimator used in
CDMA cellular shows that the area is reduced by a factor of about 3.56 and the power
dissipation by a factor of about 1.96 compared with a conventional decimator.
Finally, a narrowband multistage interpolator example shows that the area is reduced
by a factor of about 3.06 and the power dissipation by a factor of about 1.36
compared with a conventional interpolator.
Contents
Chapter 1 Introduction
  1.1 Introduction
  1.2 Motivation and Goals
  1.3 Thesis Organization
Chapter 2 Digital FIR Filter Design
  2.1 Basic FIR Filter Design
    2.1.1 FIR Filter Structure
    2.1.2 Carry Save Addition
    2.1.3 Linear Phase FIR Filters
  2.2 Multiplierless Filter Design
    2.2.1 CSD Representation
    2.2.2 CSD Multipliers
  2.3 RTL Design Technologies for CSD Based Design
    2.3.1 Sign Extension Elimination
    2.3.2 Common Subexpression
    2.3.3 Pipelining
Chapter 3 Multirate Multistage Digital FIR Filter Design
  3.1 Basic Multirate Operations
    3.1.1 Decimation
    3.1.2 Interpolation
  3.2 The Noble Identities
  3.3 The Polyphase Representation
  3.4 Interpolated FIR Filter Design
  3.5 Multirate Multistage Filter Design
Chapter 4 Module Generator Implementation
  4.1 System Specifications
  4.2 Multistage Architecture Analysis
    4.2.1 Interpolated FIR Filter Decomposition
    4.2.2 Multirate Multistage FIR Filter Decomposition
  4.3 Coefficient Calculation
  4.4 Coefficient Optimization
    4.4.1 Scaling Strategy
    4.4.2 Local Search Strategy
  4.5 Word Length Estimation
    4.5.1 Overflow Prevention
    4.5.2 Internal Word Length Reduction
  4.6 Synthesizable Verilog Code Generation
    4.6.1 Hardware Estimation
    4.6.2 Design of the FIR Digital Filter
    4.6.3 Design of the Interpolated FIR Filter
    4.6.4 Design of the Multirate Multistage Filter
  4.7 Module Generator
Chapter 5 Experimental Results
  5.1 FIR Digital Filter Design
  5.2 Interpolated FIR Filter Design
  5.3 Multirate Multistage Filter Design
    5.3.1 Interpolator
    5.3.2 Decimator
Chapter 6 Conclusions
References
List of Figures
Fig. 2.1 Direct form of FIR filter.
Fig. 2.2 Transposed direct form structure of FIR filter.
Fig. 2.3 Spurious transitions of 12-bit CPA.
Fig. 2.4 Transposed direct form FIR filter with carry-save addition.
Fig. 2.5 Linear phase transposed direct form FIR filter.
Fig. 2.6 Distribution of CSD coefficient set (a) with 2, 3 and 4 nonzero digits for 8-bit word length and (b) for 6-, 8- and 10-bit word length with 2 nonzero digits.
Fig. 2.7 Transposed direct form architecture with CSD coefficients and CSAs.
Fig. 2.8 Compensation vector in MSB fix technique.
Fig. 2.9 Implementation with common subexpressions.
Fig. 2.10 Transposed direct form filter with 2- and 3-FA delay pipelining.
Fig. 2.11 Symmetric transposed direct form architecture using carry-save addition.
Fig. 3.1 (a) M-fold decimator. (b) Demonstration of decimation for M=2.
Fig. 3.2 Spectrum analysis of downsampling effect with M=2.
Fig. 3.3 (a) Block diagram of an M-to-1 decimator. (b) Typical magnitude response of the decimation filter.
Fig. 3.4 (a) 1-to-L upsampler. (b) Demonstration of upsampling for L=2.
Fig. 3.5 Spectrum analysis of upsampling effect with L=2.
Fig. 3.6 (a) Block diagram of a 1-to-L interpolator. (b) Typical magnitude response of the interpolation filter.
Fig. 3.7 The noble identities for multirate systems.
Fig. 3.8 Reconstruction of a decimator with M=2.
Fig. 3.9 Polyphase implementations of (a) M-fold decimator and (b) L-fold interpolator.
Fig. 3.10 Time and frequency domain behaviors of IFIR low-pass filter with L=3.
Fig. 3.11 IFIR filter performance versus transition region bandwidth for p = 0.5, p = 0.1 dB and p = 40 dB: (a) hardware reduction factor over conventional filter design; (b) optimum interpolation factors.
Fig. 3.12 Frequency domain behaviors of IFIR low-pass filter; the image suppressor I(z) is designed with a don't-care region.
Fig. 3.13 Multistage IFIR decimator design.
Fig. 3.14 Multistage IFIR interpolator design.
Fig. 3.15 Multistage IFIR decimator with three-stage decomposition.
Fig. 4.1 Design flow of the module generator: (a) the digital FIR filter design flow; (b) the multirate multistage digital FIR filter / decimator / interpolator design flow.
Fig. 4.2 Specification of a lowpass filter.
Fig. 4.3 The number of taps versus interpolation factor L for the periodic model filter G(z^L), the image suppressor I(z) and the overall filter H(z).
Fig. 4.4 In case (3), the decompositions of two-stage designs of I(z) for various values of L1 and L2.
Fig. 4.5 Specifications of multistage IFIR decimation filter design.
Fig. 4.6 The flowchart of the scaling strategy for filter coefficients.
Fig. 4.7 SNR simulation block.
Fig. 4.8 Internal word length estimation flow chart.
Fig. 4.9 Transposed direct form filter structure with (a) carry-save adders (CSAs) and (b) carry-save adders (CSAs) with pipelining.
Fig. 4.10 The Structure C with an input buffer.
Fig. 4.11 (a) The symmetric transposed direct form structure for G(z^L) with L-unit delays between the taps; (b) the impulse response of H(z); (c) the impulse response of G(z^L) with L=3.
Fig. 4.12 The symmetric transposed direct form structure for G(z^L) with dual clocks.
Fig. 4.13 (a) Direct form decimator with mirror symmetric filter pairs. (b) Transposed direct form decimator with memory-saving technique. (c) Direct form interpolator with memory-saving technique. (d) Transposed direct form interpolator with mirror symmetric filter pairs.
Fig. 4.14 The operation flow of the module generator.
Fig. 5.1 The frequency responses of Work #1, Work #2 and Work #3.
Fig. 5.2 The frequency response of 64-QAM baseband demodulator.
Fig. 5.3 Frequency responses of the IFIR filters with single-stage I(z) and L = 4: (a) I(z) of order 15 and G(z^4) of order 19; (b) the overall IFIR filter and the conventional filter.
Fig. 5.4 Frequency responses of the IFIR filters with two-stage I(z) and L = 4: (a) I1(z) of order 7, I2(z^2) of order 7 and G(z^4) of order 19; (b) the overall IFIR filter and the conventional filter.
Fig. 5.5 The frequency responses of the subfilters of the multirate decimators and conventional decimator: (a) the conventional decimator of order 69; the subfilters of the two-stage multirate decimator, (b) I(z) of order 15 and (c) G(z) of order 19; the subfilters of the three-stage multirate decimator, (d) I1(z) of order 7, (e) I2(z) of order 7 and (f) G(z) of order 19.
Fig. 5.6 The frequency responses of the conventional FIR and multirate IFIR filters.
Fig. 5.7 The frequency responses of the subfilters of the multirate interpolators and conventional interpolator: (a) the conventional interpolator of order 85; the subfilters of the two-stage multirate interpolator, (b) I(z) of order 10 and (c) G(z) of order 31.
Fig. 5.8 The frequency responses of the conventional FIR and multirate IFIR filters.
List of Tables
Table 2.1 Key features of the four linear phase FIR filter types.
Table 2.2 Common subexpressions of filter coefficients.
Table 3.1 Number of linear phase subfilters if prototype filter is linear phase.
Table 4.1 Input data of system specifications.
Table 4.2 Three cases for IFIR decompositions.
Table 4.3 Summaries of the optimal IFIR filters with single-stage and two-stage implementations of I(z).
Table 5.1 Minimum number of SPT terms required to attain -50 dB NPR.
Table 5.2 Synthesis results of Work #1.
Table 5.3 Synthesis results of Work #1 and Work #3.
Table 5.4 Specifications of 64-QAM baseband demodulator.
Table 5.5 Specifications after module generator of 64-QAM baseband demodulator.
Table 5.6 Synthesis results of the example for 64-QAM baseband demodulator.
Table 5.7 Specifications of the CDMA cellular proposed by Qualcomm.
Table 5.8 Design results by module generator with IFIR filter designs and the conventional filter.
Table 5.9 The synthesis results of the conventional filter and the IFIR filters.
Table 5.10 The synthesis results of the conventional decimator and the multirate decimator [40]. (Decimation Ratio, M=8)
Table 5.11 Specifications of the interpolator.
Table 5.12 Specifications after module generator of IFIR filter designs and the conventional filter.
Table 5.13 The synthesis results of the conventional interpolator and the multirate interpolator. (Interpolation Ratio, L=6)
Chapter 1
Introduction
1.1 Introduction
Digital signal processing is an area of science and engineering that has
developed rapidly over the past 30 years. The applications of digital finite impulse
response (FIR) filters and up / down sampling DSP techniques are found everywhere
in modern electronic products such as multimedia, modems, and mobile personal
communications. For every electronic product, lower circuit complexity is always an
important design target since it reduces the cost. For portable applications such as
notebook computers or wireless personal communication systems, whose power
consumption must be small, a low-power low-complexity implementation is very
important. This is evident from the recent trend toward integrating a whole system on a
single chip (SoC).
Digital FIR filters are widely used in DSP applications. The trend towards
increasing data rates in DSP systems has pushed the development and implementation
of high-speed and low-power digital FIR filters. High-speed and low-power
applications require both increased parallelism and reduced complexity in order to
meet both sampling rate and power dissipation goals. For many applications, reduced
complexity may be achieved by eliminating programmability of the coefficients, thus
allowing the hardware to be optimized for a particular fixed coefficient set.
Multirate signal processing [1] uses different sample rates within a system to
achieve computational efficiencies that are impossible to obtain with a system that
operates at a single fixed sample rate. Such systems are frequently used for audio and
video processing, communication systems, general digital filtering, transform
analysis, and more. The two key components in multirate systems are the decimator and
the interpolator. A multirate system uses a high-speed decimator / interpolator to
reduce the sampling rate so that complicated processing may be performed at a lower
data rate. The polyphase structure provides an efficient architecture for the
realization of multirate systems through a bank of filters operating in parallel [2].
Proper sampling rate conversion always requires filtering, and linear phase filters
used for sampling rate conversion can be implemented efficiently. The filtering may be
performed by finite impulse response (FIR) or infinite impulse response (IIR) filters.
In video and communication systems, however, linear phase filters are highly desirable
to avoid signal distortion, which precludes the use of IIR filters. Furthermore, FIR
filters have a very regular architecture, which makes them much more amenable to
synthesis tools, and they suffer less from the effects of finite word length than IIR
filters.
Here, we briefly sketch some related commercial tools:
MathWorks, Inc. [3]
• Filter Design & Analysis Tool Box (FDATool)
FDATool is a collection of tools built on top of the MATLAB computing
environment and the Signal Processing Toolbox. The toolbox includes a number of
advanced filter design techniques that support designing, simulating, and analyzing
fixed-point and custom floating-point filters for a wide range of precisions.
However, it can handle single-rate filter design only.
• Interpolated FIR Filter Design (IFIR)
This function can decompose the sharp narrow passband filter into a periodic
model filter and an image suppressor by using IFIR filter design methodology.
However, it cannot decompose the filter into three stages or more. Its capability for
filter decomposition is two stages only.
Synopsys, Inc. [4]
• System Studio Filter Design Tool (QED)
QED is a filter specification, design, and analysis tool. Users can use it to create
analog and digital filters for use in system simulation and implementation.
• Multirate Filter Design Tool (MRFD)
MRFD enables users to create filtering systems that exploit the benefits of
decimation, interpolation, windowed multirate design, and polyphase filtering. In
these systems, the input and output sampling rates are the same and sample rate
changes are used only to achieve computational efficiencies.
• Sample Rate Conversion Filter Design Tool (SRCFD)
SRCFD enables users to create decimation, interpolation, and rational L/M
sample rate conversion, using the polyphase, multistage technique for computational
savings. This tool is a companion to the MRFD but is for use in problems in which
the goal is to change the sampling rate using filtering, decimation, and interpolation.
By using the three tools mentioned above, the following system capabilities can
be obtained:
Design of FIR filters.
Analysis of possible decimator or interpolator structures.
Analysis of multistage options for each decimation or interpolation factor.
Calculation of computational requirements for multistage structures.
Recommended multirate structure including number of stages.
Automatic generation of all filter specifications.
Automatically calls Parks-McClellan design algorithm for each filter design.
C code generation with multistage polyphase filters.
Optional comb / halfband filter design in multistage implementations.
For fixed filter implementations, it is necessary to create custom silicon
solutions for each application. The large number of applications for such
application-specific integrated circuits (ASICs) suggests that a silicon compiler
solution would be desirable [5]-[9]. However, logic synthesis is already a mature
technology, and existing tools are generally accepted as producing satisfactory
circuits. Thus, we focus on the design process of multirate multistage digital FIR
filters / decimators / interpolators from system specifications to Verilog HDL code [10].
1.2 Motivation and Goals

Recent rapid progress in very large-scale integration (VLSI) technology has led to
an emerging theme, System-on-a-Chip (SoC). With the increase in the density and
complexity of VLSI circuits, the design cost of developing a VLSI chip has also
increased. This calls for rapid prototyping and design reuse of major silicon
intellectual property (SIP) modules to reduce the designer's effort and to speed up
the design process. Therefore, computer-aided design (CAD) tools play an important
role in decreasing the design cycle time and in accurately verifying the correctness
of the circuit design.
In this thesis, a general-purpose multirate multistage digital FIR filter /
decimator / interpolator module generator is proposed. It is based on the canonic
signed digit (CSD) code representation [11] and the multistage multirate interpolated
FIR (IFIR) filter design methodology [12][13]. Several design methodologies are
adopted to reduce the hardware complexity at the architectural level. The module
generator automates the design of FIR filters / multistage filters / decimators /
interpolators from the system specification to the corresponding synthesizable
Verilog hardware description language (HDL) code. Because the module generator
requires only the system-level specification, it allows system designers who are
inexperienced in VLSI design to design filters easily and to concentrate on system
design and performance evaluation. Therefore, by using this module generator, an
efficient chip design can be completed in a few minutes.
1.3 Thesis Organization
This thesis describes various techniques by which sufficient parallelism for
high-speed operation may be achieved while simultaneously constraining the
solution to a small hardware implementation. Most of these techniques are widely
known and are briefly summarized as an introductory tutorial. The module generator,
which utilizes these architectures, analysis techniques, and design tradeoffs, is
then demonstrated.
The organization of this thesis is as follows. In Chapter 2, an overview of basic
FIR filter design issues will be given first. Next, the issues of multiplierless filter
implementation are introduced. Finally, we will introduce the methodologies that we
use to reach a substantial reduction in hardware complexity when we design the
digital filter. In Chapter 3, some basic multirate DSP fundamentals are reviewed first.
Next, some useful techniques, which are widely used in multirate systems such as
noble identities, polyphase representation, and IFIR filter design, are discussed. In
addition, we will introduce efficient multirate filtering design. In Chapter 4, the
design flow of the module generator will be demonstrated. In Chapter 5, experimental
results of FIR filter / multistage filters / decimator / interpolator examples designed
with the module generator are presented. Finally, some conclusions will be given in
Chapter 6.
Chapter 2
Digital FIR Filter Design
Digital filters play very important roles in DSP systems. Analog filter circuits
with the desired characteristics are usually very difficult to design, and their
overall performance is very sensitive to nonidealities such as dc-offset voltage, dc
voltage drift, and parasitic components. Compared with analog filters, digital FIR
filters can have a truly linear phase response and very precise performance. The ease
of programming the hardware to accommodate different data rates, modulation formats,
and filter specifications makes the hardware requirements relatively simple and
compact in comparison with the equivalent analog circuitry.
2.1 Basic FIR Filter Design

A linear time-invariant system can be characterized by its impulse response. A
system is called finite impulse response (FIR) if its impulse response has finite
duration, so that its output decays to zero within a finite time after a
finite-duration input ends. The basic FIR filters are characterized by the following
two equations:

$$y(n) = \sum_{k=0}^{N-1} h(k)\, x(n-k) \tag{2.1}$$

Eqn. (2.1) is the FIR difference equation. It is a time-domain equation and describes
the FIR filter in its nonrecursive form: y(n), the current output sample, is a
function of the present and past values of the input x(n). N is the filter length,
that is, the number of filter coefficients. An alternative representation of the FIR
filter in the z-domain is given in Eqn. (2.2).

$$H(z) = \sum_{k=0}^{N-1} h(k)\, z^{-k} \tag{2.2}$$

where h(k), k = 0, 1, ..., N-1, are the impulse response coefficients of the filter
and H(z) is the transfer function of the filter. Several basic FIR filter structures
are discussed in detail in the following sections.
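As a small illustration of Eqn. (2.1), the following sketch evaluates the difference equation sample by sample. It is not taken from the module generator; the function name `fir_direct` and the zero initial condition (x(n) = 0 for n < 0) are assumptions made for the example.

```python
def fir_direct(h, x):
    """Evaluate y(n) = sum_{k=0}^{N-1} h(k) x(n-k), assuming x(n) = 0 for n < 0."""
    n_taps = len(h)
    y = []
    for n in range(len(x)):
        acc = 0
        for k in range(n_taps):
            if n - k >= 0:          # samples before the start of x are taken as zero
                acc += h[k] * x[n - k]
        y.append(acc)
    return y


if __name__ == "__main__":
    # Feeding a unit impulse returns the coefficients h(k), followed by zeros.
    print(fir_direct([1, 2, 3], [1, 0, 0, 0, 0]))
```

Because the output depends only on a finite window of input samples, the response to the impulse is exactly the coefficient list and then zero, which is the finite impulse response property described above.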
2.1.1 FIR Filter Structure

The choice of structure for FIR filter design depends on factors such as hardware
complexity and desired throughput. Fig. 2.1 depicts the direct form structure of an
FIR filter, also called a tapped delay line or transversal filter [19]. This structure
is a direct mapping of Eqn. (2.1) into hardware: a tapped delay line in which each
delayed version of the input is multiplied by the appropriate filter coefficient and
the results are summed to form the filter output. The structure needs delay elements,
multipliers, and a multi-input adder, and the multi-input adder dominates the speed of
the overall system. For linear accumulation, in which the sum unit uses two-input
carry-propagation adders (CPAs), the critical path of an N-tap direct form FIR filter
is

$$T_{direct} = T_{mul} + (N-1)\, T_{add} \tag{2.3}$$

where $T_{mul}$ is the delay of the multiplier, $T_{add}$ is the delay of a
$W_{int}$-bit CPA, and $W_{int}$ is the internal word length.
Fig. 2.1 Direct form of FIR filter.
A tree-structured adder as suggested by Reutz [20] can instead perform the
accumulation, and the critical path becomes

$$T_{mul} + (\log_{1.5} N)\, T_{add} \tag{2.4}$$

The delay of the filter then increases logarithmically with the filter tap length N.
Furthermore, the tree structures can use carry-save adder (CSA) trees, Wallace trees,
or Dadda trees to eliminate the delay due to carry propagation.
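To make the difference between Eqns. (2.3) and (2.4) concrete, the sketch below evaluates both expressions. The delay values (3 ns for the multiplier, 1 ns per adder) and the function names are purely illustrative assumptions, not figures from this work.

```python
import math


def t_direct_linear(n_taps, t_mul, t_add):
    """Critical path of Eqn. (2.3): one multiplier plus N-1 chained adders."""
    return t_mul + (n_taps - 1) * t_add


def t_direct_tree(n_taps, t_mul, t_add):
    """Critical path of Eqn. (2.4): one multiplier plus log_1.5(N) adder delays."""
    return t_mul + math.log(n_taps, 1.5) * t_add


if __name__ == "__main__":
    # Hypothetical delays in nanoseconds, chosen only to show the trend.
    for n in (8, 16, 64):
        print(n, t_direct_linear(n, 3.0, 1.0), round(t_direct_tree(n, 3.0, 1.0), 2))
```

The linear chain grows in proportion to N, while the tree grows only with the logarithm of N, so the gap widens quickly as the tap count increases.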
Fig. 2.2 depicts the transposed direct form structure of an FIR filter, which is
obtained by repositioning the delay elements of the direct form structure [19]. In
this structure, the input is fed to every tap and the results are accumulated over N
sample periods. As shown in the block diagram, the system throughput rate is
independent of the tap length. The structure retains the regularity of the
linear-accumulation direct form structure, and its critical path is only one
multiplication and one addition, as shown in Eqn. (2.5).

$$T_{transposed} = T_{mul} + T_{add} \tag{2.5}$$

Fig. 2.2 Transposed direct form structure of FIR filter.
We can therefore expect it to be faster than the tree structures used in the direct
form structure. Such a short critical path also allows the system to operate at a low
supply voltage, which makes this solution very suitable for low-power applications. In
addition, it is inherently suited to high-speed operation and pipelining.
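The behavior of the transposed direct form can be sketched in software as follows. This is a minimal model under stated assumptions, not the generated Verilog: `fir_transposed` and `fir_reference` are hypothetical names, registers start at zero, and arithmetic is exact integer arithmetic.

```python
def fir_reference(h, x):
    """Direct evaluation of y(n) = sum_k h(k) x(n-k), with x(n) = 0 for n < 0."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def fir_transposed(h, x):
    """Transposed direct form: the input is broadcast to all taps and partial
    sums move through N-1 registers toward the output."""
    regs = [0] * (len(h) - 1)       # regs[i] holds the partial sum feeding tap i
    y = []
    for sample in x:
        products = [coef * sample for coef in h]    # every tap sees the same input
        y.append(products[0] + (regs[0] if regs else 0))
        # Each register loads its tap product plus the old value of the next register.
        regs = [products[i + 1] + (regs[i + 1] if i + 1 < len(regs) else 0)
                for i in range(len(regs))]
    return y


if __name__ == "__main__":
    h = [2, 0, -1, 5]
    x = [3, -1, 4, 1, -5, 9]
    print(fir_transposed(h, x) == fir_reference(h, x))
```

Each output needs only one product and one addition after the registers, which mirrors the one-multiplier, one-adder critical path of Eqn. (2.5).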
One of the primary disadvantages of this structure is the large loading on the
input data-broadcasting bus, since all multipliers are fed in parallel. As the number
of taps increases, the input signal bus becomes longer and leads to larger load
capacitances. We can reduce this effect by using appropriate data buffers and by
distributing the input bus as a tree-like structure. Another disadvantage of this
structure is that the delay elements are larger, since they hold the accumulated sum
instead of the input signal. Furthermore, if we choose the CSA-based structures that
will be introduced in the next section, twice as many delay elements are required
within the filter core.
2.1.2 Carry Save Addition
The multiplier and adder delays dominate the system speed, as shown in Eqn. (2.5).
The carry-propagation adder (CPA) is not a good candidate for low-power or high-speed
design, because its delay depends linearly on the word length of the adder. It also
generates many glitches before the real carry propagates from the least significant
bit (LSB) to the most significant bit (MSB), as shown in Fig. 2.3 [21].
Fig. 2.3 Spurious transitions of 12-bit CPA.
In order to avoid the long critical path delay of the adder, the adder in each tap
is converted to a CSA, as shown in Fig. 2.4. In carry-save addition, both a sum bit
and a carry bit are produced at each bit position of the word, so the carry
propagation problem inside an adder is avoided. There are a few drawbacks to the
carry-save scheme, the most important being the requirement of doubling the number of
registers within the filter core. This increases the filter core area, but the system
can achieve a higher throughput rate or use a lower supply voltage.
Fig. 2.4 Transposed direct form FIR filter with carry-save addition.
At the final stage of the filter, a single high-speed CPA, a so-called vector merge
adder (VMA), is required to sum the two data path outputs to form the final output.
The critical path delay of the transposed direct form FIR filter is

$$T_{FIR} = \max\{T_{mul} + T_{CSA},\ T_{VMA}\} \tag{2.6}$$

where $T_{VMA}$ is the delay of the n-bit VMA. Obviously, the VMA delay will dominate
the system throughput rate, so a high-complexity high-speed adder such as a
carry-select adder or a carry-lookahead adder (CLA) may be used to reduce $T_{VMA}$.
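The carry-save idea can be illustrated with a bit-level sketch on non-negative integers. The function names are hypothetical, and the model ignores word length limits; it only shows that a sum word and a carry word are kept separate until a single final carry-propagating addition, which plays the role of the VMA.

```python
def carry_save_add(a, b, c):
    """3:2 compression of three non-negative integers.

    Returns (sum_word, carry_word) with sum_word + carry_word == a + b + c.
    Each bit position is handled independently, so no carry ripples here.
    """
    sum_word = a ^ b ^ c                                # per-bit sum without carries
    carry_word = ((a & b) | (b & c) | (a & c)) << 1     # per-bit majority, weighted by 2
    return sum_word, carry_word


def accumulate_carry_save(operands):
    """Accumulate operands in carry-save form, then merge once at the end."""
    sum_word, carry_word = 0, 0
    for value in operands:
        sum_word, carry_word = carry_save_add(sum_word, carry_word, value)
    return sum_word + carry_word                        # the only carry-propagating add


if __name__ == "__main__":
    print(carry_save_add(5, 6, 7))
    print(accumulate_carry_save([5, 9, 14, 3, 250]))
```

Keeping two words per accumulated value is the software counterpart of the doubled register count mentioned above.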
2.1.3 Linear Phase FIR Filters
In many filter applications, phase distortion cannot be tolerated, and thus the
filters are required to have a linear phase response. There are four types of linear
phase FIR filters, depending on whether N is even or odd and whether h(k) is
symmetric or anti-symmetric. Table 2.1 summarizes their key features.
Table 2.1 Key features of the four linear phase FIR filter types.
Type   Tap Length   Symmetry         H(0)        H(π)        Applications
I      odd          symmetric        arbitrary   arbitrary   LP, HP, BP, BS, multiband filters
II     even         symmetric        arbitrary   0           LP, BP
III    odd          anti-symmetric   0           0           differentiators, Hilbert transformers
IV     even         anti-symmetric   0           arbitrary   differentiators, Hilbert transformers
The symmetric structure can save about half of the coefficient multipliers by
sharing each multiplier between the two symmetric taps. This symmetry can be
exploited in both the direct form and the transposed direct form structures.
Fig. 2.5 shows the linear phase transposed direct form structure, also called the
symmetric transposed direct form structure, which is adopted in our module
generator. The drawback of this symmetric structure is a slight increase in data
path routing due to the sharing of multipliers.
Fig. 2.5 Linear phase transposed direct form FIR filter.
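As a minimal sketch of the multiplier sharing described above, the following Python model computes each product once per input sample and feeds it to both symmetric taps of a transposed direct form filter. The function name, the argument layout (`h_half` holds only the unique coefficients) and the test signal are assumptions made for illustration.

```python
def fir_transposed_symmetric(h_half, n_taps, x):
    """Transposed direct form FIR with symmetric coefficients.

    h_half holds the ceil(n_taps / 2) unique coefficients h(0), h(1), ...
    Each product h(k) * x(n) is computed once and shared by taps k and n_taps-1-k.
    """
    regs = [0] * (n_taps - 1)             # delay registers between the tap adders
    y = []
    for sample in x:
        products = [c * sample for c in h_half]                  # shared multipliers
        taps = [products[min(k, n_taps - 1 - k)] for k in range(n_taps)]
        y.append(taps[0] + (regs[0] if n_taps > 1 else 0))
        # Each register takes its tap product plus the previous value of the next register.
        regs = [taps[k + 1] + (regs[k + 1] if k + 1 < n_taps - 1 else 0)
                for k in range(n_taps - 1)]
    return y


x = [3, -1, 4, 1, -5, 9, 2, 6]
y_odd = fir_transposed_symmetric([1, 2, 3], 5, x)     # h = [1, 2, 3, 2, 1]
y_even = fir_transposed_symmetric([2, 5], 4, x)       # h = [2, 5, 5, 2]
```

For a 5-tap filter the sketch uses 3 multiplications per sample instead of 5, which matches the "about half" saving stated in the text.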
2.2 Multiplierless Filter Design

The area required by the multipliers is a well-known bottleneck for FIR filters, and
the multipliers also consume a great amount of power. Moreover, the maximum speed
of the filter is severely limited by the multiplier delay. The number of addition
operations required in a constant coefficient multiplication is one less than the
number of nonzero bits in the constant coefficient. To further reduce the area and
power consumption, the constant coefficient can be encoded so that it contains the
fewest nonzero digits, which can be accomplished with the canonic signed digit
(CSD) representation.
This section addresses the CSD number representation and its applications for
the design of the constant multipliers.
2.2.1 CSD Representation

An FIR filter coefficient expressed as a sum of signed powers-of-two (SPT) terms has
the general form
$$ h_{SPT}(n) = \sum_{k=1}^{D_n} s_{k,n}\, 2^{-p_{k,n}} $$ (2.7)
where $s_{k,n} \in \{-1, 0, 1\}$ and $p_{k,n} \in \{1, \ldots, W\}$. The coefficient
$h_{SPT}(n)$ has $D_n$ SPT terms and a W-bit word length. In general, there are
several equivalent SPT representations for a given number. The minimum
representation is a representation that requires the minimum number of SPT terms,
of which there may also be more than one choice.
The properties of the CSD number representation are summarized as follows.
- The CSD number representation is a ternary coded word with the minimum number of
  nonzero digits (SPT terms).
- No two consecutive digits in a CSD number are nonzero.
- The CSD representation of a number is unique [11], and there are at most n/2
  nonzero digits for an n-bit CSD word.
- CSD numbers cover the range (-4/3, 4/3), of which the values in the range [-1, 1)
  are of greatest interest.
- Among the n-bit CSD numbers in the range [-1, 1), the expected number of nonzero
  digits tends asymptotically to n/3 + 1/9 [22]. Hence, on average, CSD numbers
  contain about 33% fewer nonzero digits than two's complement numbers.
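The properties listed above can be checked with a small Python sketch. It uses the standard non-adjacent-form recoding of an integer and is not taken from the module generator; the names `to_csd` and `coefficient_terms` and the 8-bit example are assumptions for illustration.

```python
def to_csd(n):
    """Return the CSD digits of integer n, least significant digit first."""
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n % 4)     # +1 if n = 1 (mod 4), -1 if n = 3 (mod 4)
            n -= d              # the remaining value is now a multiple of 4
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits


def coefficient_terms(h, word_length):
    """Quantize h to word_length fractional bits and list its SPT terms as (s, p),
    meaning s * 2**(-p)."""
    digits = to_csd(round(h * 2 ** word_length))
    return [(d, word_length - i) for i, d in enumerate(digits) if d != 0]


terms = coefficient_terms(0.4375, 8)    # 0.4375 = 2^-1 - 2^-4
```

In two's complement, 0.4375 is 0.0111 with three nonzero bits, while the CSD form needs only two terms, so the constant multiplier needs one adder instead of two.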
The drawback of the CSD representation is that the distribution of CSD coefficient
values is not uniform [23], as shown in Fig. 2.6, which may cause serious
quantization error. For a fixed number of nonzero digits and a fixed word length,
the distribution has many gaps in the region where the CSD value is above 0.5.
Increasing the number of nonzero digits for the same word length, or increasing the
word length for the same number of nonzero digits, reduces the gaps of the
distribution for larger CSD values. Since the distribution of CSD coefficients is
not uniform, search strategies and optimization algorithms are required to find the
optimal CSD representation of the original coefficients while still fulfilling the
original specifications. These will be discussed later in Section 4.4.
Fig. 2.6 Distribution of CSD coefficient set (a) with 2, 3 and 4 nonzero digits for
8-bit word length and (b) for 6-, 8- and 10-bit word length with 2 nonzero
digits.
2.2.2 CSD Multipliers

As mentioned previously, constant multiplication can be carried out by adding or
subtracting a number of partial product terms corresponding to the nonzero digit
positions in the constant multiplier. A CSD-encoded multiplier is implemented simply
by combining bus shifts with a minimal number of two's complement adders, since
adders and subtracters are essentially identical in hardware. Because no multiplier
is used, such structures are often referred to as multiplierless filters. They
require less circuitry, dissipate less power, and have a shorter critical path
delay, which translates into a higher data throughput. These advantages are
especially important for FIR filter implementation, since such filters usually need
many more multiplications than their IIR counterparts.
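A hedged Python sketch of such a shift-and-add constant multiplier follows. The `(sign, shift)` term format, the function name and the 8 fractional bits are illustrative assumptions; the result is kept as a scaled integer so that the right shifts lose no bits.

```python
FRAC_BITS = 8   # assumed number of fractional bits kept in the product


def csd_multiply(x, terms, frac_bits=FRAC_BITS):
    """Multiply integer x by a CSD constant given as (sign, p) terms, each
    meaning sign * 2**(-p). Returns x * h scaled by 2**frac_bits."""
    acc = 0
    for sign, p in terms:
        partial = x << (frac_bits - p)      # a bus shift, free in hardware
        acc = acc + partial if sign > 0 else acc - partial   # one adder or subtracter
    return acc


h_terms = [(+1, 1), (-1, 4)]    # h = 2^-1 - 2^-4 = 0.4375
product = csd_multiply(48, h_terms)
```

Two SPT terms need a single subtracter here, in line with the rule that the adder count is one less than the number of nonzero digits.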
Many high-speed digital FIR filter chips and silicon compilers employ CSD
coefficients [15][16][24][25]. An architecture frequently adopted by these designs,
which uses a transposed direct form filter structure with CSAs, is shown in Fig. 2.7.
In this architecture, one CSA is required for each nonzero term of every
coefficient, and the final stage of the filter uses a single VMA to combine the
carry and sum vectors at the filter's output. The architecture provides excellent
layout regularity and a short critical path of
$$ T_{transposed} = \max\{ D_{max} T_{CSA},\ T_{VMA} \} $$ (2.8)
where $T_{CSA}$ and $T_{VMA}$ are the delay times of the CSA and the VMA,
respectively, and $D_{max} = \max_n \{D_n\}$. The delay of a CSA is only that of a
one-bit full adder. The use of carry-save arithmetic takes full advantage of the CSD
coefficients and reduces the delay of the coefficient multiplication-accumulation
to a few one-bit full adder delays without heavy pipelining.
Fig. 2.7 Transposed direct form architecture with CSD coefficients and CSAs.
It is intuitive that, for a given filter length, the total number of SPT terms used,
$\sum_n D_n$, determines the filter implementation complexity. In addition, the
maximum number of SPT terms per coefficient, $D_{max}$, generally determines the
throughput limit of the filter. Therefore, the objective of the filter design is to
optimize the filter's frequency response while keeping the total number of SPT
terms to a minimum and keeping the number of SPT terms per coefficient within a
specified bound.
2.3 RTL Design Technologies for CSD Based Design
2.3.1 Sign Extension Elimination

One drawback of the transposed direct form structure is the large load on the input
data bus, which cannot be easily eliminated. Some works [22][26][27] have exploited
the common factors in the coefficient set to produce a nested generation of the
data-coefficient products. While this may simplify the generation of products and
reduce the loading on the data broadcast lines, it is coefficient dependent and
extremely irregular.
For the two's complement data format, the implications of this problem are
especially important for the MSB driver. For example, suppose that a 5-bit input
data word
$$ x_0 x_1 x_2 x_3 x_4 $$ (2.9)
is multiplied by the CSD coefficient $2^{-4}$. The input data must then be shifted
4 bits to the right and sign-extended as
$$ x_0 x_0 x_0 x_0 x_0 x_1 x_2 x_3 x_4 $$ (2.10)
before it is applied to the appropriate adder columns. Obviously, the MSB $x_0$ must
be broadcast to five FAs, while each of the other bits needs to be broadcast to only
one FA. This leads to a far greater loading capacitance on the MSB than on the other
bits of the input data bus, and longer chains of buffers are needed to drive the
large load. Furthermore, power consumption and chip area also increase due to these
driving circuits and wiring buses.
To solve this problem of the large MSB load, a solution called the MSB fix was
applied [28]. To illustrate the principle, consider again the data word in (2.10).
It can be rewritten equivalently as
$$ x_0 x_0 x_0 x_0 x_0 x_1 x_2 x_3 x_4 = 0000\,\bar{x}_0 x_1 x_2 x_3 x_4 + 111110000 $$ (2.11)
where $\bar{x}_0$ is the inverted MSB and the addition is taken modulo $2^9$.
According to this representation, the multiplied data word can be obtained by
summing the shifted data word and a constant vector. The FA loading of the MSB
driver is then the same as that of the non-MSB drivers, and the inverted MSB is
broadcast. Since the MSB fix technique depends only on the value of the CSD shift,
the constant vectors of all taps of the filter can be summed together to form a
compensation vector (CV):
$$ CV = \sum_{n=0}^{N-1} CV_n $$ (2.12)
This CV can be added to the first tap of the filter, as shown in Fig. 2.8.
Fig. 2.8 Compensation vector in MSB fix technique.
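The MSB fix can be verified exhaustively for the 5-bit, 4-bit-shift example with a short Python sketch. The function names and default arguments are assumptions for illustration; the constant is derived from the requirement that both sides agree modulo $2^9$ for every input word.

```python
def sign_extended_shift(x, in_bits=5, shift=4):
    """Place a two's complement word in an (in_bits + shift)-bit field with the
    MSB replicated `shift` times in front of it."""
    msb = (x >> (in_bits - 1)) & 1
    extension = (((1 << shift) - 1) << in_bits) if msb else 0
    return extension | x


def msb_fix(x, in_bits=5, shift=4):
    """Same value, built from the zero-extended word with inverted MSB plus a constant."""
    total_bits = in_bits + shift
    flipped = x ^ (1 << (in_bits - 1))                       # invert the MSB only
    constant = ((1 << (shift + 1)) - 1) << (in_bits - 1)     # shift + 1 leading ones
    return (flipped + constant) & ((1 << total_bits) - 1), constant


all_match = all(sign_extended_shift(x) == msb_fix(x)[0] for x in range(32))
constant_vector = msb_fix(0)[1]
```

Because the constant depends only on the shift amount, the constants of all taps can be pre-added offline into one compensation vector, as Eqn. (2.12) states.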
2.3.2 Common Subexpressions

Another simplification can be performed for CSD coefficients that have common
subexpressions in their CSD representations [29]. Adders and shifters replace the
constant multipliers for an efficient implementation, and the area can be reduced
further by sharing the common subexpressions among those operations. For example,
when an input data word x is multiplied by the CSD coefficients
$h_1 = 2^{-1} + 2^{-3}$ and $h_2 = 2^{-5} + 2^{-7}$, the results can be expressed as
$$ x h_1 = (x \gg 1) + (x \gg 3), \qquad x h_2 = (x \gg 5) + (x \gg 7) $$ (2.13)
Defining the common subexpression $f = (x \gg 1) + (x \gg 3)$, the above can
alternatively be expressed as
$$ x h_1 = f, \qquad x h_2 = ((x \gg 1) + (x \gg 3)) \gg 4 = f \gg 4 $$ (2.14)
This method can be implemented directly, as shown in Fig. 2.9. The figure shows a
25% hardware reduction with this scheme, from 4 adders to 3 adders. As another
example, the common subexpressions (f1 and f2) of a set of filter coefficients are
shown in Table 2.2.
Fig. 2.9 Implementation with common subexpressions.
Table 2.2 Common subexpressions of filter coefficients.
Experimental results [30] show that a design with subexpression sharing has a longer
critical path than a design without sharing. Moreover, although subexpression
sharing can provide significant reductions in complexity and loading for some
filters, it is not suited to the polyphase structure that will be discussed in
Section 3.3. Subexpression sharing is therefore not employed in this module
generator; however, it may become an option in a future version.
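The sharing in Eqns. (2.13) and (2.14) can be reproduced with a few lines of Python. This is a sketch under the assumption that the input is pre-scaled by $2^7$ so that every right shift is exact; the function names are hypothetical.

```python
SCALE_BITS = 7    # assumed guard bits so that shifting right by up to 7 is exact


def products_separate(x):
    """x*h1 and x*h2 computed independently: one adder per coefficient."""
    xs = x << SCALE_BITS
    return (xs >> 1) + (xs >> 3), (xs >> 5) + (xs >> 7)


def products_shared(x):
    """x*h1 and x*h2 with the common subexpression f computed once."""
    xs = x << SCALE_BITS
    f = (xs >> 1) + (xs >> 3)     # the only adder
    return f, f >> 4              # x*h2 is a pure shift of f


same = all(products_separate(x) == products_shared(x) for x in range(-64, 64))
```

With x = 1 the scaled results are 80 and 5, that is 0.625 and 0.0390625 after dividing by $2^7$, which equal $h_1$ and $h_2$.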
2.3.3 Pipelining
Architectures that adopt CSD multipliers and carry-save addition greatly reduce the
critical path of the filter. However, the critical path can be reduced further by
pipelining the structure. Pipeline stages of 2 to 3 FA delays can be achieved by
placing pipeline registers between the CSAs and the adders, as shown in Fig. 2.10.
The register cost per filter tap for bit-level pipelining is
$$ N_{reg,n} = \min\{ D_n + 2,\ 4 \} \cdot W_{int} $$ (2.15)
where $W_{int}$ is the internal word length of the filter. Pipelining down to a
single FA delay would require substantially more pipeline register hardware. Thus,
only the two-stage pipeline is incorporated as an option in the module generator.
The final architecture for a four-digit CSD linear phase tap using carry-save
addition is shown in Fig. 2.11.
Fig. 2.10 Transposed direct form filter with 2- and 3-FA delay pipelining.
Fig. 2.11 Symmetric transposed direct form architecture using carry-save addition.
Chapter 3
Multirate Multistage Digital FIR Filter Design
3.1 Basic Multirate Operations

A multirate system is characterized by the presence of signals at different
sampling rates. Such systems are used for audio and video processing,
communication systems, general digital filtering, transform analysis, and more. The
two basic operations in a multirate system are decreasing and increasing the
sampling rate of a signal. The former is called decimation, or down-sampling. The
latter is called interpolation, or up-sampling.
3.1.1 Decimation
Fig. 3.1(a) shows the M-fold decimator, which takes an input sequence x(n) and
produces the output sequence
$$ y(n) = x(Mn) $$ (3.1)
where M is an integer. The output y(n) is obtained by keeping only every M-th sample
of the input signal x(n) and discarding all others. Fig. 3.1(b) demonstrates the
idea for M=2. As will be shown mathematically, decimation results in aliasing unless
x(n) is bandlimited in a certain way. In general, therefore, it may not be possible
to recover x(n) from y(n).
Fig. 3.1 (a) M-fold decimator. (b) Demonstration of decimation for M=2.
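A minimal Python sketch of Eqn. (3.1) also makes the aliasing risk concrete. The test tone at 0.9π rad/sample is an assumption chosen for illustration: after 2-fold decimation it is indistinguishable from a tone at 0.2π.

```python
import math


def decimate(x, M):
    """M-fold decimator: y(n) = x(Mn)."""
    return x[::M]


n_samples = 40
x_fast = [math.cos(0.9 * math.pi * n) for n in range(n_samples)]
y_alias = decimate(x_fast, 2)

# cos(0.9*pi*2n) = cos(1.8*pi*n) = cos(0.2*pi*n): the tone folds down to 0.2*pi.
x_slow = [math.cos(0.2 * math.pi * n) for n in range(n_samples // 2)]
max_diff = max(abs(a - b) for a, b in zip(y_alias, x_slow))
```

Since the decimated sequence matches the low-frequency tone to within rounding error, the original tone cannot be recovered from y(n) unless the input is bandlimited first.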
For the M-fold decimator of Eqn. (3.1), the output spectrum $Y(e^{j\omega})$ can be
expressed in terms of $X(e^{j\omega})$ as
$$ Y(e^{j\omega}) = \frac{1}{M} \sum_{k=0}^{M-1} X(e^{j(\omega - 2\pi k)/M}) $$ (3.2)
Fig. 3.2 demonstrates the spectrum analysis for M=2. As the figure shows, the
stretched version $X(e^{j\omega/M})$ may overlap with its shifted replicas. When
this happens, the input samples x(n) cannot be recovered from the decimated version
y(n). This overlap effect is called aliasing.
Fig. 3.2 Spectrum analysis of downsampling effect with M=2.
A lowpass digital filter, called the decimation filter, precedes the downsampler as
shown in Fig. 3.3(a). This filter ensures that the input signal being decimated is
bandlimited. The exact band edges of the decimation filter depend on how much
aliasing is permitted. The simplest form of lowpass decimation filter has the
magnitude response sketched in Fig. 3.3(b). Typically, the cutoff frequency is
placed at π/M.
Fig. 3.3 (a) Block diagram of an M-to-1 decimator. (b) Typical magnitude response
of the decimation filter.
3.1.2 Interpolation
Fig. 3.4(a) shows the building block of an L-fold interpolator (or expander). By
inserting L-1 equally spaced zeros between each pair of samples, the device takes an
input x(n) and produces the output sequence
$$ y(n) = \begin{cases} x(n/L), & \text{if } n \text{ is an integer multiple of } L \\ 0, & \text{otherwise} \end{cases} $$ (3.3)
where L is an integer. Fig. 3.4(b) demonstrates this operation for L=2. It is
evident that the interpolation operation does not cause any loss of input
information: the input x(n) can be recovered from y(n) by L-fold decimation.
Fig. 3.4 (a) 1-to-L upsampler. (b) Demonstration of upsampling for L=2.
Taking the z-transform of Eqn. (3.3), the output sequence y(n) of the interpolator
can be written as follows.
$$ Y(z) = \sum_{n=-\infty}^{\infty} y(n) z^{-n} = \sum_{n = \text{mul. of } L} y(n) z^{-n} = \sum_{k=-\infty}^{\infty} y(kL) z^{-kL} = \sum_{k=-\infty}^{\infty} x(k) z^{-kL} = X(z^L). $$ (3.4)
Fig. 3.5 Spectrum analysis of upsampling effect with L=2.
Fig. 3.6 (a) Block diagram of a 1-to-L interpolator. (b) Typical magnitude response
of the interpolation filter.
From Eqn. (3.4), we find that $Y(e^{j\omega}) = X(e^{j\omega L})$. This means that
$Y(e^{j\omega})$ is an L-fold compressed version of $X(e^{j\omega})$, as
demonstrated in Fig. 3.5 for L=2. The multiple copies of the compressed spectrum are
the images created by the interpolation process. An interpolation filter follows
the expander to suppress these unwanted images, as shown in Fig. 3.6.
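The expander of Eqn. (3.3) and the spectral relation derived from Eqn. (3.4) can be checked numerically. The sketch below uses assumed helper names (`expand`, `dtft`) and an arbitrary short sequence; it evaluates the discrete-time Fourier transform directly from its definition.

```python
import cmath


def expand(x, L):
    """L-fold expander: insert L-1 zeros after every input sample."""
    y = [0] * (len(x) * L)
    y[::L] = x
    return y


def dtft(x, w):
    """Discrete-time Fourier transform of a finite causal sequence at frequency w."""
    return sum(v * cmath.exp(-1j * w * n) for n, v in enumerate(x))


x = [1, 4, -2, 3, 5]
L = 2
y = expand(x, L)

recovered = y[::L]                          # L-fold decimation returns the input
w = 0.3                                     # arbitrary test frequency in rad/sample
gap = abs(dtft(y, w) - dtft(x, w * L))      # Y(e^{jw}) versus X(e^{jwL})
```

The zero gap at any test frequency reflects the L-fold compression of the spectrum, which is what creates the images that the interpolation filter must remove.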
3.2 The Noble Identities

A different type of cascade is shown in Fig. 3.7(a), where a filter H(z) follows a
decimator, and in Fig. 3.7(c), where a filter H(z) precedes an interpolator. Such
interconnections arise when we use the polyphase representation (Section 3.3) for
decimation and interpolation filters. If the function H(z) is rational (i.e., a
ratio of polynomials in z or $z^{-1}$), then we can redraw Fig. 3.7(a) as
Fig. 3.7(b) and Fig. 3.7(c) as Fig. 3.7(d). These are called the noble identities
[1]. Their proofs are shown below.
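Fig. 3.7 The noble identities for multirate systems. Identity 1: (a) M-fold
decimator followed by H(z), output y1(n); (b) H(z^M) followed by M-fold decimator,
output y2(n). Identity 2: (c) H(z) followed by L-fold expander, output y3(n);
(d) L-fold expander followed by H(z^L), output y4(n).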
$$
\begin{aligned}
Y_2(z) &= \frac{1}{M} \sum_{k=0}^{M-1} X(z^{1/M} W^k)\, H((z^{1/M} W^k)^M) \\
       &= \frac{1}{M} \sum_{k=0}^{M-1} X(z^{1/M} W^k)\, H(z^{M/M} e^{-j 2\pi k M / M}) \\
       &= \frac{1}{M} \sum_{k=0}^{M-1} X(z^{1/M} W^k)\, H(z e^{-j 2\pi k}) \\
       &= \frac{1}{M} \sum_{k=0}^{M-1} X(z^{1/M} W^k)\, H(z) \\
       &= Y_1(z), \qquad W = e^{-j 2\pi / M}.
\end{aligned}
$$ (3.5)
Eqn. (3.5) shows that $Y_2(z)$ is equal to $Y_1(z)$. Also, consider that
$$ Y_4(z) = H(z^L) X'(z) = H(z^L) X(z^L) = Y_3(z) $$ (3.6)
which proves that $Y_4(z)$ is the same as $Y_3(z)$.
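Both identities can also be confirmed in the time domain with integer arithmetic. The following Python sketch uses assumed helper names (`fir`, `stretch`, `expand`) and arbitrary test data; `stretch` builds the coefficients of H(z^M) by inserting zeros between the taps of H(z).

```python
def fir(h, x):
    """Causal FIR filtering with the output truncated to len(x)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def stretch(h, factor):
    """Coefficients of H(z^factor): factor-1 zeros between adjacent taps."""
    out = [0] * ((len(h) - 1) * factor + 1)
    out[::factor] = h
    return out


def expand(x, L):
    """L-fold expander."""
    y = [0] * (len(x) * L)
    y[::L] = x
    return y


x = list(range(1, 13))
h = [2, -1, 3]
M, L = 3, 2

y1 = fir(h, x[::M])                     # decimate, then H(z)
y2 = fir(stretch(h, M), x)[::M]         # H(z^M), then decimate
y3 = expand(fir(h, x), L)               # H(z), then expand
y4 = fir(stretch(h, L), expand(x, L))   # expand, then H(z^L)
```

In both cases the filter on the low-rate side has the short coefficient set, which is why the identities are used to move filters to the lower sampling rate.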
3.3 The Polyphase Representation

An important advancement in multirate signal processing is the invention of the
polyphase representation [31]. It permits great simplification of theoretical
results and leads to computationally efficient implementations of decimators and
interpolators. Consider a filter $H(z) = \sum_{n=-\infty}^{\infty} h(n) z^{-n}$. Its
coefficients can be separated into an even numbered part and an odd numbered part,
i.e., H(z) can be written as
$$ H(z) = \sum_{n=-\infty}^{\infty} h(n) z^{-n} = \sum_{n=-\infty}^{\infty} h(2n) z^{-2n} + z^{-1} \sum_{n=-\infty}^{\infty} h(2n+1) z^{-2n} $$ (3.7)
If we define
$$ H_0(z) = \sum_{n=-\infty}^{\infty} h(2n) z^{-n}, \qquad H_1(z) = \sum_{n=-\infty}^{\infty} h(2n+1) z^{-n} $$ (3.8)
the representation of H(z) can be rewritten as
$$ H(z) = H_0(z^2) + z^{-1} H_1(z^2). $$ (3.9)
Fig. 3.8 Reconstruction of a decimator with M=2.
This representation can be implemented directly. Fig. 3.8 shows an example of this
reconstruction for a decimator with M=2. The polyphase implementation (Fig. 3.8(c))
is much more efficient than the direct implementation shown in Fig. 3.8(a).
Although the downsamplers add some hardware overhead, H0(z) and H1(z) operate at the
lower rate. Together they require only N/2 multiplications and (N-1)/2 additions per
unit time, compared with the N multiplications and (N-1) additions per unit time
that the direct implementation needs. Here, N is the tap length of the decimation
filter H(z).
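The equivalence between the direct and the polyphase decimator can be checked with a short Python sketch. It is written for a general ratio M under assumed function names; branch k filters the input phase x(Mn - k) with the subfilter h(k), h(k+M), h(k+2M), and so on.

```python
def fir(h, x):
    """Causal FIR filtering with the output truncated to len(x)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def polyphase_decimate(h, x, M):
    """M-fold decimation filter built from M subfilters running at the low rate."""
    n_out = (len(x) + M - 1) // M
    y = [0] * n_out
    for k in range(M):
        h_k = h[k::M]                                   # k-th polyphase component
        x_k = [x[M * n - k] if 0 <= M * n - k < len(x) else 0
               for n in range(n_out)]                   # delayed, downsampled input
        y = [a + b for a, b in zip(y, fir(h_k, x_k))]
    return y


h = [1, -2, 3, 4, 3, -2, 1]
x = [5, 1, -3, 2, 7, 0, -1, 4, 6, -2, 3]

direct_2 = fir(h, x)[::2]              # filter at the high rate, then discard samples
poly_2 = polyphase_decimate(h, x, 2)
direct_3 = fir(h, x)[::3]
poly_3 = polyphase_decimate(h, x, 3)
```

The direct form computes every output of H(z) and throws most of them away, while the polyphase form only computes the outputs that survive the downsampler.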
The polyphase representation can also be used to implement an interpolator.
Fig. 3.9 shows the general form of the polyphase implementation of an M-fold
decimator and an L-fold interpolator.
Fig. 3.9 Polyphase implementations of (a) M-fold decimator and (b) L-fold
interpolator.
However, decomposing a linear phase filter, which has symmetric coefficients, into
polyphase subfilters usually destroys the symmetry of the subfilters. It can
therefore increase the hardware complexity compared with the original symmetric
filter without the polyphase representation. Since the decomposition into
subfilters is accomplished by sampling every M-th coefficient of the original
impulse response, the subfilters whose sampled coefficients are symmetric about the
center tap remain linear phase, while the other subfilters do not. At most two
subfilters are linear phase, as summarized in Table 3.1 [25]. The remaining
nonlinear phase subfilters cannot use the folded structure and require a large
number of multipliers. Therefore, when the sampling rate conversion ratio (SRCR) is
even, a filter with even tap length N can be redesigned with length N+1 to obtain
two more linear phase subfilters and reduce the hardware complexity.
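The counting argument can be explored with a small Python sketch. It builds a symmetric impulse response whose unique coefficients are all distinct (so that a subfilter is symmetric only when its tap positions mirror about the center) and counts the symmetric polyphase components. The helper names and the tap lengths 24 and 25 are assumptions for illustration.

```python
def make_symmetric(n_taps):
    """Symmetric impulse response with distinct values in its first half."""
    half = list(range(1, n_taps // 2 + 1))
    middle = [n_taps // 2 + 1] if n_taps % 2 else []
    return half + middle + half[::-1]


def count_linear_phase_subfilters(h, ratio):
    """Count the polyphase components h(k), h(k+M), h(k+2M), ... that are symmetric."""
    subfilters = [h[k::ratio] for k in range(ratio)]
    return sum(1 for sub in subfilters if sub == sub[::-1])


counts = {(n, m): count_linear_phase_subfilters(make_symmetric(n), m)
          for n in (24, 25) for m in (2, 3, 4)}
```

For these examples the even-length filter with an even ratio yields no linear phase subfilter, while lengthening it by one tap yields two, which is the redesign suggested in the text.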
Table 3.1 Number of linear phase subfilters if the prototype filter is linear phase.
Filter Length   Sampling Rate Conversion Ratio   Number of Linear Phase Subfilters
Even            Even                             0
Even            Odd                              1
Odd             Odd                              1
Odd             Even                             2

3.4 Interpolated FIR Filter Design

FIR digital filters are well known for desirable properties such as stability,
linear phase response and low quantization error. Their main drawback is the large
amount of arithmetic operations needed in implementation, especially for filters
with a narrow transition band. To cope with the computational complexity of sharp
narrowband FIR filters, the interpolated FIR (IFIR) filter technique was introduced
[12]. Its basic idea is to implement the filter H(z) as a cascade of two FIR
sections:
$$ H(z) = G(z^L)\, I(z) $$ (3.10)
where $G(z^L)$ is a periodic model filter, which has a sparse impulse response in
which only every L-th sample is nonzero, and I(z) is an image suppressor, which can
be implemented with only a few arithmetic operations.
In the frequency domain, $G(z^L)$ has a periodic frequency response with period
2π/L and is designed to perform the passband, transition band and stopband shaping
in the vicinity of the passband, while I(z) is designed to attenuate the unwanted
passbands created by $G(z^L)$. If $\delta_p$ denotes the passband deviation and
$\delta_s$ denotes the stopband deviation, the overall IFIR filter must meet the
requirements
$$ 1 - \delta_p \le |G(z^L)\, I(z)| \le 1 + \delta_p $$ in the passband (3.11)
and
$$ |G(z^L)\, I(z)| \le \delta_s $$ in the stopband. (3.12)
Time and frequency domain behaviors of the IFIR approach used on a low-pass filter
design with L=3 are illustrated in Fig. 3.10.
Fig. 3.10 Time and frequency domain behaviors of IFIR low-pass filter with L=3.
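The structure of Eqn. (3.10) can be sketched in Python with placeholder coefficients. The values of `g` and `i` below are hypothetical and are not designed filters; the sketch only shows that $G(z^L)$ is sparse, that its response repeats every 2π/L, and that the cascade response is the product of the two sections.

```python
import cmath
import math


def stretch(g, L):
    """Coefficients of G(z^L): L-1 zeros between adjacent taps of G(z)."""
    out = [0] * ((len(g) - 1) * L + 1)
    out[::L] = g
    return out


def convolve(a, b):
    """Linear convolution, i.e. the impulse response of a cascade."""
    out = [0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out


def freq_response(h, w):
    """Frequency response of an FIR filter at w rad/sample."""
    return sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(h))


g = [1, 3, 5, 3, 1]     # hypothetical model filter G(z)
i = [1, 2, 3, 2, 1]     # hypothetical image suppressor I(z)
L = 3

g_up = stretch(g, L)    # 13 taps, only 5 of them nonzero
h = convolve(g_up, i)   # overall H(z) = G(z^L) I(z), 17 taps

w = 0.4
period_gap = abs(freq_response(g_up, w) - freq_response(g_up, w + 2 * math.pi / L))
product_gap = abs(freq_response(h, w) - freq_response(g, L * w) * freq_response(i, w))
```

Only the 5 + 5 nonzero coefficients need multipliers, although the equivalent single filter has 17 taps; the zero taps of $G(z^L)$ cost only delay elements.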
The image suppressor I(z) can itself be implemented as a multistage structure and
expressed as [13]
$$ I(z) = I_1(z)\, I_2(z^{L_1})\, I_3(z^{L_1 L_2}) \cdots I_k(z^{L_1 L_2 \cdots L_{k-1}}) $$ (3.13)
$$ I_k(z) = \sum_{n=0}^{N_{I_k}} i_k(n) z^{-n} $$ (3.14)
where the $L_k$ are selected such that
$$ L_{k-1} = \frac{L}{L_1 L_2 \cdots L_{k-2}} $$ (3.15)
is an integer. If the stopband edge frequency of the low-pass IFIR filter is denoted
by $\omega_s$, the maximum value of the interpolation factor L is
$$ L_{max} = \left\lfloor \frac{\pi}{\omega_s} \right\rfloor $$ (3.16)
where the brackets denote truncation. As an example, consider the IFIR filter
performance versus transition region bandwidth for $\omega_p$ = 0.5,
$\delta_p$ = 0.1 dB and $\delta_s$ = 40 dB with a two-stage implementation.
Fig. 3.11(a) shows the reduction factor of the IFIR filter design over the
conventional filter design, which is
$$ SF = \frac{N_{CON}}{N_{IFIR}}, \qquad N_{IFIR} = N_I + N_G $$ (3.17)
where $N_{CON}$ is the order of the conventional filter, $N_I$ is the order of I(z)
and $N_G$ is the order of G(z). Fig. 3.11(b) shows the optimum interpolation factors
versus transition region bandwidth. In this case, for a filter with a narrow
transition band, a higher interpolation factor L gives a higher reduction factor
SF, and the maximum SF can reach about 6.
In [12], the design of IFIR filters was based on the use of simple interpolators.
In this simple case, the image suppressor I(z) is a simple lowpass filter, which
gives the most robust and the fastest design. In further optimized IFIR designs
[13], I(z) is designed with a don't-care region where the periodic model filter
$G(z^L)$ already provides the required attenuation, as shown in Fig. 3.12, so that
I(z) requires fewer coefficients. A more advanced and more involved IFIR design
jointly optimizes $G(z^L)$ and I(z) to achieve the required specification. The
result is a significant saving in the order of I(z) and a slight saving in the order
of $G(z^L)$. Matlab has a useful function called ifir, which provides the baseline
and the advanced design approaches. The advanced IFIR design method gives the fewest
coefficients, and therefore the fewest multipliers and the lowest hardware
complexity. However, the maximum value of the final coefficients may exceed 1, and
the coefficient range is larger than with the simplest design method. This becomes a
problem when the filter is realized with finite precision. Our module generator
assigns more nonzero digits or uses scaling to compress the range of the
coefficients to prevent the filter coefficients from overflowing.
Fig. 3.11 IFIR filter performance versus transition region bandwidth (×π
rad/sample) for $\omega_p$ = 0.5, $\delta_p$ = 0.1 dB and $\delta_s$ = 40 dB:
(a) hardware reduction factor SF over conventional filter design for L = 2 to 8;
(b) optimum interpolation factors.
Fig. 3.12 Frequency domain behaviors of IFIR low-pass filter, where the image
suppressor I(z) is designed with a don't-care region.
By carefully selecting the interpolation factor L and the number of stages, and by
choosing the best method to implement the interpolator, an optimum IFIR filter
design with minimum hardware complexity can be found. The price paid for these
reductions is only a slight increase in the number of delay elements compared with
the direct implementation. In addition, the IFIR implementation gives lower
coefficient sensitivity and better roundoff noise than the direct implementation
[13].
3.5 Multirate Multistage Filter Design

In many applications, it is necessary to design a decimator / interpolator with a
large decimation / interpolation ratio. Although this can be done by designing a
single filter directly and using the polyphase structure to save arithmetic
operations, it is more efficient to design in multiple stages [1][2], and the IFIR
technique is still applicable.
Consider the decimator shown in Fig. 3.13(a). The lowpass filter H(z) becomes a
narrowband filter as the decimation ratio M becomes large. The IFIR technique can
be used to reduce the hardware complexity of H(z).
Fig. 3.13 Multistage IFIR decimator design.
Fig. 3.14 Multistage IFIR interpolator design.
If we choose the interpolation factor L of the periodic model filter $G(z^L)$ to be
M1, as shown in Fig. 3.13(b), the decimator can be restructured into Fig. 3.13(c)
by the noble identity. With this structure, the decimator is divided into two
sections, and both of them can be implemented by the polyphase representation with
fewer filter coefficients, resulting from the image suppressor I(z) and the model
filter G(z), as shown in Fig. 3.13(d). The interpolator can be designed in the same
way, as shown in Fig. 3.14.
Furthermore, the multistage IFIR decimator / interpolator structure can also be
extended to three stages or more. Fig. 3.15 shows the derivation of the structure
with three-stage decomposition.
Fig. 3.15 Multistage IFIR decimator with three-stage decomposition.
Chapter 4
Module Generator Implementation
The design flow of the module generator and the program implementation issues are
discussed in this chapter. The system configuration and dataflow of the module
generator are shown in Fig. 4.1. The module generator consists of many sub-modules.
The main sub-modules are the multistage architecture analysis and synthesis, the
coefficient calculation, the coefficient optimization, the word length estimation
and the synthesizable Verilog code generation. All modules are written in C++, and
the operation of each module is described in the following sections.
4.1 Specifications
The inputs of our module generator are the system-level specifications, which are
listed in Table 4.1, where
$$ A_P = -20 \log(1 - \delta_P) $$ (4.1)
and
$$ A_S = -20 \log \delta_S $$ (4.2)
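The conversion between magnitude deviations and dB values can be sketched in Python. The sketch assumes base-10 logarithms and the sign convention in which both $A_P$ and $A_S$ are positive numbers of dB; the function names are hypothetical.

```python
import math


def passband_ripple_db(delta_p):
    """A_P in dB from the passband deviation delta_P (assumed sign convention)."""
    return -20 * math.log10(1 - delta_p)


def stopband_attenuation_db(delta_s):
    """A_S in dB from the stopband deviation delta_S (assumed sign convention)."""
    return -20 * math.log10(delta_s)


def passband_deviation(a_p_db):
    """Inverse of passband_ripple_db."""
    return 1 - 10 ** (-a_p_db / 20)


def stopband_deviation(a_s_db):
    """Inverse of stopband_attenuation_db."""
    return 10 ** (-a_s_db / 20)


a_s = stopband_attenuation_db(0.01)     # a deviation of 0.01 is 40 dB of attenuation
a_p = passband_ripple_db(0.01)          # a deviation of 0.01 is below 0.1 dB of ripple
```

Under this convention a user may enter either form of a specification and the other form follows directly.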
Fig. 4.1 Design flow of the module generator: (a) the digital FIR filter design flow;
(b) the multirate multistage digital FIR filter / decimator / interpolator
design flow.
Table 4.1 Input data of system specifications.
Parameter                                                               Symbol
Filter Type (LP, HP, BP, BS, Decimator, Interpolator, Multistage FIR)   Tf
Normalized Passband and Stopband Edge Frequencies                       ωP, ωS
Passband Ripple in Magnitude or dB                                      δP / AP
Stopband Attenuation in Magnitude or dB                                 δS / AS
Input Word Length (bit)                                                 Win
Signal to Noise Ratio (dB)                                              SNR
Up Conversion Ratio
Down Conversion Ratio
Fig. 4.2 Specification of a lowpass filter.
Fig. 4.2 provides a graphical description of the specifications of a lowpass filter.
Because the impulse response required to implement the ideal lowpass filter is
infinitely long, it is impossible to design an ideal FIR lowpass filter. Finite
length approximations to the ideal impulse response lead to ripples (δp and δs) in
both the passband (ω < ωp) and the stopband (ω > ωs) of the filter, as well as to a
nonzero transition width (ωs - ωp) between the passband and the stopband. Both the
passband / stopband ripples and the transition width are undesirable but
unavoidable deviations from the response of an ideal lowpass filter when it is
approximated with a finite impulse response. Practical FIR designs therefore
consist of filters that meet certain design specifications, i.e., that have a
transition width and maximum passband / stopband ripples that do not exceed
allowable values.
4.2 Multistage Architecture Analysis

In many applications, it is necessary to design a decimator / interpolator with a
large decimation / interpolation ratio. Although this can be done by designing a
single filter directly and using the polyphase structure to save arithmetic
operations, it is more efficient to design in multiple stages using the IFIR
technique [12]. In addition, the interpolated FIR filter can implement narrowband
FIR filter designs with significantly reduced hardware complexity relative to
conventional FIR filters.
4.2.1 Interpolated FIR Filter Decomposition

In this subsection, the optimal decomposition of IFIR filters is discussed for both
single-stage and multistage implementations of the image suppressor I(z). We
estimate the minimum subfilter orders accurately. With the estimated values, it is
possible to find a nearly optimum decomposition. The optimal filter decomposition
depends on the stopband edge as well as on the relative transition width of the
filter. In the following, we consider the three cases shown in Table 4.2 as design
examples [13].
Table 4.2 Three cases for IFIR decompositions.
Case   ωp      ωs      δp     δs
I      0.05π   0.1π    0.01   0.001
II     0.09π   0.1π    0.01   0.001
III    0.01π   0.02π   0.01   0.001
Fig. 4.3 shows the number of taps required in these three cases to implement I(z),
$G(z^L)$ and the overall filter as a function of the interpolation factor L for the
single-stage implementation of I(z). The interpolation factor L = 1 corresponds to
the conventional direct form FIR filter. As shown in these figures, the IFIR filters
provide significant reductions in the number of taps over conventional direct form
designs. As L increases, the number of taps of $G(z^L)$ decreases exponentially and
the number of taps of I(z) increases exponentially. We can increase L until the
decrease in the number of taps of $G(z^L)$ becomes smaller than the increase in the
number of taps of I(z), at which point the minimum total number of taps of the
overall filter is obtained. The maximum interpolation factor is limited to
$L_{max} = \lfloor \pi / \omega_s \rfloor$, so $L_{max}$ for cases I, II and III is
10, 10 and 50, respectively. Comparing the results for case I and case II shows
that, as the relative transition bandwidth is made smaller while the stopband edge
is kept the same, the optimum interpolation factor $L_{opt}$ becomes larger. As for
case II and case III, which have the same transition width, the one with the smaller
stopband edge has a relatively larger tap contribution from I(z), and
$L_{opt}/L_{max}$ decreases. However, as the absolute value of $L_{opt}$ increases,
the saving in the number of arithmetic operations becomes larger.
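The search over L can be sketched in Python. This sketch does not reproduce the estimator of the module generator or of [13]: it assumes Kaiser's well-known tap-count formula for lowpass FIR filters, splits the passband ripple equally between G(z) and a single-stage I(z), and lets I(z) place its stopband edge at 2π/L - ωs. All function names are hypothetical, and band edges are given as fractions of π.

```python
import math
from fractions import Fraction


def max_interpolation_factor(ws_over_pi):
    """L_max = floor(pi / w_s), with the stopband edge given as a fraction of pi."""
    return math.floor(1 / Fraction(ws_over_pi))


def kaiser_taps(wp, ws, dp, ds):
    """Kaiser's tap estimate for a lowpass FIR filter (edges as fractions of pi)."""
    df = (ws - wp) / 2.0          # transition width relative to the sampling rate
    order = (-20 * math.log10(math.sqrt(dp * ds)) - 13) / (14.6 * df)
    return math.ceil(order) + 1


def ifir_taps(wp, ws, dp, ds, L):
    """Estimated taps of G(z) and of a single-stage I(z) for L >= 2 (assumed split)."""
    taps_g = kaiser_taps(L * wp, L * ws, dp / 2, ds)      # edges stretched by L
    taps_i = kaiser_taps(wp, 2.0 / L - ws, dp / 2, ds)    # must reject the first image
    return taps_g, taps_i


cases = {"I": ("0.05", "0.1"), "II": ("0.09", "0.1"), "III": ("0.01", "0.02")}
results = {}
for name, (wp_text, ws_text) in cases.items():
    wp, ws = float(wp_text), float(ws_text)
    l_max = max_interpolation_factor(ws_text)
    conventional = kaiser_taps(wp, ws, 0.01, 0.001)
    best_l = min(range(2, l_max + 1),
                 key=lambda L: sum(ifir_taps(wp, ws, 0.01, 0.001, L)))
    results[name] = (l_max, conventional, best_l,
                     sum(ifir_taps(wp, ws, 0.01, 0.001, best_l)))
```

Each entry of `results` holds $L_{max}$, the estimated conventional tap count, the best L found and its estimated total taps. Because the estimator and the ripple split are assumptions, the numbers should be read as indicative only and need not match Table 4.3.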
Fig. 4.4 shows the total taps requirements in case III to implement I(z), G(zL)
and the overall filter as a function of the interpolated factor L(=L1L2) for the
two-stage implementation of I(z) (=I1(z)I2(zL1)). Comparing sing-stage and two-stage
implementations of I(z) in case III, the two-stage implementation significant saving
the number of the taps of the overall filter than the single-stage implementation. This
is because of the two-stage implementation of I(z) requires considerably fewer taps of
I(z) and the optimum decomposition occurs at a high value of L(=L1L2), thus it also
45
Case I
Case II
Case III
Fig. 4.3
The number of taps versus interpolated factor L for the periodic model filter
G(zL), the image suppressor I(z) and the overall filter H(z).
Fig. 4.4 In case III, the decompositions of two-stage designs of I(z) for various values of L1 and L2 (panels: G(z^L1L2), I1(z), I2(z) and H(z); axes: tap number versus interpolated factors).
decreasing the number of taps of G(z^L). When the single-stage implementation of I(z)
already has very few taps, a multistage implementation of I(z) provides only a
slight saving over the single-stage implementation. Table 4.3 summarizes the optimal
IFIR filters with single-stage and two-stage implementations of I(z) for these three
cases. We can observe that, for a filter with a narrow passband and a narrow transition
band, the IFIR method significantly reduces the total number of taps of
the overall filter, and that a two-stage implementation of I(z) saves further taps
compared with the one-stage implementation of I(z). In the analysis of the optimal IFIR
filters for cases I to III, our module generator costs 12% additional taps compared
with [13]. This is because our module generator decomposes the IFIR filters using more
stringent subfilter specifications to guarantee that the final design satisfies the
system specification, and therefore the overall design takes more taps.
Table 4.3 Summary of the optimal IFIR filters with single-stage and two-stage implementations of I(z).

Single-stage implementations of I(z)
Case   NCON   NH    NI    NG    L       R
I      103    43    15    28            2.53
II     510    126   39    87            4.09
III    510    88    41    47    11-13   5.79

Two-stage implementations of I(z)
Case   NCON   NH    NI                      NG    L                  R
III    510    60    31 (NI1=15; NI2=16)     29    20 (L1=4; L2=5)    8.5

Note:
NCON is the number of taps of the conventional filter.
NH is the minimum number of taps of the overall multistage filter.
NI is the number of taps of I(z).
NG is the number of taps of G(z^L).
L is the interpolated factor.
R is the reduction ratio of the conventional design taps over the multistage total taps.
The IFIR decomposition analysis of our module generator will select several
decomposition methods that have minimum taps of the overall filter H(z) to
implement the multistage designs.
4.2.2 Multirate Multistage Filter Decomposition

Following the above subsection, we analyze the optimal decomposition of
the IFIR designs and use the polyphase structure to save arithmetic operations.
The choice of the number of stages K and the sampling rate conversion ratio M is not
a trivial problem. In practice, however, the number of stages K is rarely larger than
four. Furthermore, for a given value of M, there is only a limited set of possible
integer factors. Thus, a feasible approach is to determine all the possible factors of M.
For a low-power approach, we use the multistage IFIR technique and follow the
relationship of the sampling rate conversion ratio (SRCR):

$$M = \prod_{k=1}^{K} M_k = M_1 M_2 \cdots M_K \qquad (4.3)$$

to decompose the decimator / interpolator into all the possible multistage sets. For a
K-stage decimator / interpolator, the filter specification for each stage is chosen
to ensure that the overall filter requirements are met, as shown in Fig. 4.5, where the
passband ripple of each stage is $\delta_p/K$ and the stopband ripple is $\delta_s$.

Fig. 4.5 Specifications of multistage IFIR decimation filter design.
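The enumeration of stage sets implied by Eqn. (4.3) can be sketched as follows. This is a minimal illustration, assuming ordered factorizations with every factor greater than one and at most four stages; the function names are hypothetical and not taken from the module generator.

```python
def stage_factorizations(m, max_stages=4):
    """Return every ordered tuple of integer factors (> 1) whose product is m."""
    found = []

    def extend(remaining, prefix):
        if prefix and remaining == 1:
            found.append(tuple(prefix))
            return
        if len(prefix) == max_stages:
            return
        for factor in range(2, remaining + 1):
            if remaining % factor == 0:
                extend(remaining // factor, prefix + [factor])

    extend(m, [])
    return found

def stage_ripples(delta_p, delta_s, stages):
    """Per-stage ripple budget: passband delta_p / K, stopband delta_s."""
    return delta_p / stages, delta_s

for factors in stage_factorizations(8):
    print(factors, stage_ripples(0.01, 0.001, len(factors)))
```

For M = 8 this lists the single-stage design (8), the two-stage designs (2, 4) and (4, 2), and the three-stage design (2, 2, 2), each of which would then be evaluated for its total tap count.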
Moreover, the polyphase decomposition is used to decompose the filter into
subfilters. In order to support both high-speed and low-speed applications, the
transposed direct form structure is chosen. However, the decomposition of a linear
phase filter into M subfilters usually destroys the symmetry of the subfilters
and results in nonlinear-phase subfilters. Therefore, in our module generator, when the
SRCR is even and the filter has an even tap length N, the filter can be redesigned with
N+1 taps to obtain two more linear-phase subfilters and reduce the hardware complexity.
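The effect of the tap length on subfilter symmetry can be checked with a small sketch. The coefficient values below are hypothetical placeholders chosen only to make the symmetry visible; a real N+1 design would be recomputed from the specification rather than obtained by inserting a tap.

```python
def polyphase(h, m):
    """Split h into m polyphase subfilters: E_k takes h[k], h[k+m], ..."""
    return [h[k::m] for k in range(m)]

def is_symmetric(taps):
    """True when the taps read the same forwards and backwards."""
    return taps == taps[::-1]

even_length = [1, 2, 3, 4, 4, 3, 2, 1]     # symmetric, N = 8
odd_length = [1, 2, 3, 4, 5, 4, 3, 2, 1]   # symmetric, N = 9

for h in (even_length, odd_length):
    subfilters = polyphase(h, 2)
    print(len(h), subfilters, [is_symmetric(e) for e in subfilters])
```

With M = 2 the eight-tap filter gives two subfilters that are mirror images of each other but not individually symmetric, while the nine-tap filter gives two subfilters that are both symmetric.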
4.3 Coefficient Calculation

The floating-point filter coefficients h(k) are generated by the Parks-McClellan
optimal equiripple method as provided by the MATLAB gremez.m function [32]. If the
coefficients do not satisfy the desired filter specifications, the filter order is increased
and the coefficients are calculated again. In addition, the user can input coefficients
derived from other filter analysis packages.
4.4 Coefficient Optimization

Simply rounding a filter's floating-point coefficients to their nearest CSD
values does not usually yield satisfactory performance in terms of implementation
complexity and frequency response. Many search algorithms for the design of
multiplierless filters with power-of-two and CSD coefficients have been published.
The two most popular techniques for CSD coefficient optimization are
mixed-integer linear programming (MILP) [33] and local search [23].
MILP is known to be the optimal technique for designing FIR filters employing
conventional fixed word length coefficients. A drawback of MILP is that its
computation time grows at least exponentially with filter length, which limits its
application to the design of filters of short to medium length. Moreover, for filters
with CSD coefficients, even though MILP optimally searches the SPT coefficient
space, it does not guarantee that the solution produced has the minimal total number
of adders. The two major goals of a CSD search algorithm are:
(1) to obtain a filter that can be implemented with minimum hardware, and
(2) to minimize the computation time of the design procedure.
Local search techniques have been found to perform nearly as well as the MILP
method while requiring substantially less computation time to converge.
Following the methods presented in [23][34], we adopt a two-step local search
algorithm to round and optimize our filter coefficients with CSD codes, as discussed in
the following two sections.
4.4.1 Scaling Strategy

At the beginning, according to Eqn. (2.7), the number of nonzero digits (Dn) and
the coefficient word length (W) are selected. A table of all possible CSD coefficients
between 0 and $2^{-1}$ is created and saved for later use.
The shape of the frequency response of an FIR filter is unaffected by
multiplying all the filter coefficients by a fixed scale factor (SF). The SF simply
inserts an additional gain or attenuation into the frequency response, which can be
easily compensated by a constant gain stage before or after the filter system. The set
of numbers represented by a CSD code with a fixed number of nonzero digits is not
uniformly distributed. Therefore, properly scaling the ideal filter coefficients prior to
rounding them to the nearest CSD code can usually significantly reduce the
magnitudes of the coefficient quantization errors, which means an improved
frequency response. Fig. 4.6 shows the flowchart of the scaling strategy for filter
coefficients.
Since the coefficient quantization process is highly nonlinear, there is no way to
predict in advance which SF will produce better results. Therefore, a brute-force
search over the SF must be performed. All the filter coefficients are assumed to be in the
range $[-0.5, 0.5]$. The choice of the SF can then be constrained to the range between
SF_min and SF_max. The limits are defined as follows: multiplying by SF_max
makes the absolute value of the largest coefficient equal to $2^{-1}$; multiplying by
SF_min makes the absolute value of the largest coefficient equal to $2^{-2}$. During the
search procedure the SF changes from SF_min to SF_max with a step size of $2^{-q}$,
where q is the coefficient word length.
For each SF, the frequency response is computed only if the quantized CSD
coefficient set is different from the previous one. Finally, we select the SF which results
in the minimum total number of SPT terms ($\sum_n D_n$) and fulfills the specification of
the filter.
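The scaling search can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the generator's code: coefficients are rounded to q-bit integers and then encoded in CSD, no limit is placed on the nonzero digits per coefficient, and the frequency response check is represented by a hypothetical `meets_spec` callback that accepts everything by default.

```python
def to_csd(value):
    """CSD digits of an integer, least significant first, each in {-1, 0, 1}."""
    digits = []
    while value != 0:
        if value & 1:
            digit = 2 - (value & 3)  # +1 when value mod 4 == 1, -1 when == 3
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits

def nonzero_digits(value):
    """Number of SPT terms needed for one integer coefficient."""
    return sum(1 for d in to_csd(value) if d)

def search_scale_factor(coeffs, q, meets_spec=lambda quantized: True):
    """Brute-force SF search; returns (total SPT terms, SF, quantized set)."""
    peak = max(abs(c) for c in coeffs)
    sf_min, sf_max = 0.25 / peak, 0.5 / peak
    step = 2.0 ** -q
    best, previous, sf = None, None, sf_min
    while sf <= sf_max:
        quantized = [round(c * sf * 2 ** q) for c in coeffs]
        # Evaluate only when the quantized set differs from the previous one.
        if quantized != previous and meets_spec(quantized):
            cost = sum(nonzero_digits(v) for v in quantized)
            if best is None or cost < best[0]:
                best = (cost, sf, quantized)
        previous = quantized
        sf += step
    return best

# Hypothetical symmetric coefficients within [-0.5, 0.5].
print(search_scale_factor([0.02, -0.11, 0.30, 0.45, 0.30, -0.11, 0.02], q=8))
```

In a real flow `meets_spec` would compute the frequency response of the quantized set and compare it with the ripple specification.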
Fig. 4.6 The flowchart of the scaling strategy for filter coefficients (loop bound shown in the flowchart: (SF_max - SF_min) / 2^{-q}).
4.4.2 Local Search Strategy

The second step in the optimization process is a bi-variate local search in the
neighborhood of the scaled and rounded coefficients. It was observed in [35] that a
bi-variate search yields substantially better results than a uni-variate local search
and that any higher-order search does not justify the exponential growth in CPU time.
Thus, all possible pairs of coefficients are varied by +/- one quantization step
and the resulting frequency response is computed. Let S represent half the total
number of coefficients in a symmetrical FIR digital filter. For each distinct coefficient
pair, four perturbations are performed by simultaneously rounding each number in
the pair up or down by one digit. Similarly, for each non-distinct coefficient pair, two
perturbations are performed by rounding the number up or down by one digit. This
results in a total of

$$2S + 4 \cdot \frac{S(S-1)}{2} = 2S^2 \qquad (4.4)$$

coefficient sets being searched. The local search proceeds in an iterative manner.
After a search cycle is completed, the coefficient sets whose frequency responses fit
the filter specification are selected and the bi-variate local search is repeated with the
new coefficient sets. This process continues until no further improvement is obtained.
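The count in Eqn. (4.4) can be confirmed by generating the neighborhood explicitly. The sketch below is illustrative only: it assumes integer coefficients and treats "one quantization step" as adding or subtracting a fixed integer, whereas the actual search would move to the adjacent entry of the CSD table. The function name is hypothetical.

```python
from itertools import combinations

def bivariate_neighbors(coeffs, step=1):
    """All coefficient sets reachable by one bi-variate perturbation."""
    neighbors = []
    size = len(coeffs)
    # Non-distinct pairs: a single coefficient moved up or down (2 each).
    for i in range(size):
        for delta in (step, -step):
            candidate = list(coeffs)
            candidate[i] += delta
            neighbors.append(candidate)
    # Distinct pairs: both coefficients moved up or down (4 each).
    for i, j in combinations(range(size), 2):
        for delta_i in (step, -step):
            for delta_j in (step, -step):
                candidate = list(coeffs)
                candidate[i] += delta_i
                candidate[j] += delta_j
                neighbors.append(candidate)
    return neighbors

half = [3, -14, 38, 58, 90]  # hypothetical half of a symmetric coefficient set
print(len(bivariate_neighbors(half)), 2 * len(half) ** 2)
```

For S = 5 both numbers printed are 50, matching 2S + 4·S(S−1)/2 = 2S².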
4.5 Word Length Estimation

4.5.1 Overflow Prevention

If the final output is within the range of the original word length, overflow in
partial sums is unimportant. This is a desirable property of 2's complement
arithmetic. However, if the final output exceeds the range of the signal, the value of
the output sample will be wrong and measures should be taken to prevent this. One
approach is to avoid or allow limited overflow by scaling the coefficients. The
coefficients may be scaled in the following way [36]:

$$h'(k) = h(k) \cdot 2^{-R} \qquad (4.5)$$

where

$$R = \left\lceil \log_2 \| h(k) \|_2 \right\rceil = \left\lceil \log_2 \left( \sum_{k=0}^{N-1} h^2(k) \right)^{1/2} \right\rceil \qquad (4.6a)$$

or

$$R = \left\lceil \log_2 \sum_{k=0}^{N-1} | h(k) | \right\rceil \qquad (4.6b)$$

where R denotes the number of right-shift bit(s). The method given in Eqn. (4.6a) will
probably lead to a shorter internal word length than Eqn. (4.6b), but this form of scaling
will occasionally overflow, which results in performance degradation. Therefore,
the method in Eqn. (4.6b) is adopted; it never causes overflow because it is based
on the worst-case conditions for overflow. Hence, the coefficient word length
increases by R bit(s) and the coefficients are then shifted right by R bit(s) to prevent
overflow.
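The two scaling rules can be compared with a short sketch. It is illustrative only; rounding R up to a whole number of bits and clamping it at zero are assumptions made here so that R can be used directly as a shift count, and the function names are hypothetical.

```python
import math

def shift_bits_l2(h):
    """R from the L2 norm of the coefficients, as in Eqn. (4.6a)."""
    norm = math.sqrt(sum(c * c for c in h))
    return max(0, math.ceil(math.log2(norm)))  # clamp at 0: assumption

def shift_bits_l1(h):
    """R from the sum of absolute values (worst case), as in Eqn. (4.6b)."""
    total = sum(abs(c) for c in h)
    return max(0, math.ceil(math.log2(total)))  # clamp at 0: assumption

def scale(h, r):
    """Apply Eqn. (4.5): shift every coefficient right by r bits."""
    return [c * 2.0 ** -r for c in h]

h = [1.0, 1.0, 1.0, 1.0]  # hypothetical coefficients
print(shift_bits_l2(h), shift_bits_l1(h), scale(h, shift_bits_l1(h)))
```

For this example the L2 rule gives R = 1 and the worst-case rule gives R = 2; only the latter bounds the sum of absolute coefficient values by one, which is why it cannot overflow for any bounded input.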
4.5.2 Internal Word Length Reduction

In digital signal processing, the finite word length has a strong effect on
system performance since it dominates the precision of the output signals. When the
internal word length increases, a better signal-to-noise ratio (SNR) is obtained.
However, the system then has higher hardware complexity, consumes more power,
and has a lower operating frequency. Therefore, the designer must make a
trade-off.
If the designer is willing to accept some deviation from the given specifications,
reducing the internal word length enables a reduction of hardware complexity. In
this subprogram, we evaluate the SNR by using Eqn. (4.7):

$$\mathrm{SNR} = 10 \log_{10} \frac{E\left[ y^2(n) \right]}{E\left[ e^2(n) \right]} = 10 \log_{10} \frac{E\left[ y^2(n) \right]}{E\left[ \left( y(n) - \hat{y}(n) \right)^2 \right]} \qquad (4.7)$$

The simulation block is shown in Fig. 4.7, and Fig. 4.8 shows the internal word
length estimation flow.

Fig. 4.7 SNR simulation block.

Fig. 4.8 Internal word length estimation flow chart.

The initial internal word length is first set to the value that does not
introduce any error. Then the internal word length is decreased as long as
the SNR still meets the specification. Finally, the minimum internal word
length that fulfills the specification is obtained.
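The estimation loop can be sketched as follows. This is a simplified model under stated assumptions: the finite word length is imitated by rounding the reference output samples to a given number of fractional bits, rather than by simulating the internal datapath, and the helper names and test signal are hypothetical.

```python
import math

def snr_db(y, y_hat):
    """SNR of Eqn. (4.7), with expectations replaced by sample means."""
    signal = sum(v * v for v in y) / len(y)
    noise = sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)
    return math.inf if noise == 0 else 10.0 * math.log10(signal / noise)

def quantize(value, bits):
    """Round to `bits` fractional bits (stand-in for a shorter word length)."""
    scale = 2 ** bits
    return round(value * scale) / scale

def min_word_length(y, snr_spec_db, start_bits):
    """Decrease the word length from start_bits while the SNR spec holds."""
    best = start_bits  # start_bits is assumed to meet the specification
    for bits in range(start_bits, 0, -1):
        y_hat = [quantize(v, bits) for v in y]
        if snr_db(y, y_hat) < snr_spec_db:
            break
        best = bits
    return best

reference = [0.9 * math.sin(0.1 * n) for n in range(200)]
bits = min_word_length(reference, snr_spec_db=40.0, start_bits=16)
print(bits, snr_db(reference, [quantize(v, bits) for v in reference]))
```

The loop stops at the first word length whose SNR drops below the specification, so the returned value is the shortest word length reached before that point.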
4.6 Synthesizable Verilog Code Generation

4.6.1 Hardware Estimation

Before generating the hardware of the FIR digital filters, interpolated FIR filters
and multirate multistage decimators / interpolators, the module generator performs a
hardware complexity estimation for each design. The indices used for comparison are the
total nonzero digits, the maximum nonzero digits, and the internal word length. The
priority of the indices is as follows.
• For low-complexity applications:
  Priority: Internal Word Length > Total Nonzero Digit > Max. Nonzero Digit
• For high-speed applications:
  Priority: Max. Nonzero Digit > Internal Word Length > Total Nonzero Digit
In addition, the module generator also estimates the computation APU and
storage SPU of each design. Finally, it generates a file, hardware.out, to record the
hardware estimation.
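The two priority orders amount to a lexicographic comparison, which can be sketched with tuple sort keys. The dictionary field names and the candidate values below are hypothetical and only illustrate the ordering; they are not the contents of hardware.out.

```python
def rank_designs(designs, application):
    """Sort candidate designs by the priority order of the application."""
    if application == "low-complexity":
        key = lambda d: (d["internal_word_length"], d["total_nonzero"], d["max_nonzero"])
    elif application == "high-speed":
        key = lambda d: (d["max_nonzero"], d["internal_word_length"], d["total_nonzero"])
    else:
        raise ValueError("application must be 'low-complexity' or 'high-speed'")
    return sorted(designs, key=key)

candidates = [
    {"name": "set A", "internal_word_length": 14, "total_nonzero": 49, "max_nonzero": 3},
    {"name": "set B", "internal_word_length": 14, "total_nonzero": 57, "max_nonzero": 2},
    {"name": "set C", "internal_word_length": 15, "total_nonzero": 45, "max_nonzero": 2},
]
print(rank_designs(candidates, "low-complexity")[0]["name"])
print(rank_designs(candidates, "high-speed")[0]["name"])
```

With these placeholder values the low-complexity order prefers set A (same word length as set B, fewer total digits), while the high-speed order prefers set B (fewest digits in the worst coefficient, then the shorter word length).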
4.6.2 Structure of the FIR Digital Filter

The module generator generates three types of symmetric transposed direct
form structure for the FIR filter of each stage.
Structure A: A transposed direct form filter structure is adopted and written in
behavior-level synthesizable Verilog-HDL code, which allows the
synthesis tool to select the appropriate architecture for the user's constraints.
Structure B: A transposed direct form filter structure with carry save adders (CSA),
written with the DesignWare components [37] provided by Synopsys, is
adopted, as shown in Fig. 4.9(a).
Structure C: Structure B is pipelined to achieve a critical path of at most two CSA
delays when the maximum allowed number of SPT terms per
coefficient is three or four, as shown in Fig. 4.9(b). Moreover, the
number of nonzero digits of most CSD coefficients is generally less than three,
so we can use a single input buffer rather than pipelining at each tap.
Referring to Fig. 4.10, the input x(n) feeds the taps whose nonzero digits
are more than two and x(n-1) feeds the taps with fewer than three.
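The symmetric transposed direct form common to the three structures can be modelled behaviourally. The sketch below is only a functional model under stated assumptions: ordinary integer multiplication stands in for the CSD shift-and-add network, carry-save arithmetic and pipelining are not modelled, and the function name is hypothetical.

```python
def fir_transposed(coeffs, samples):
    """Symmetric transposed direct form FIR: one product per coefficient pair."""
    n = len(coeffs)
    if coeffs != coeffs[::-1]:
        raise ValueError("coefficients must be symmetric")
    regs = [0] * (n - 1)  # delay registers holding partial sums
    outputs = []
    for x in samples:
        # Each distinct coefficient multiplies the input once; mirrored taps reuse it.
        half = [c * x for c in coeffs[: (n + 1) // 2]]
        products = [half[min(i, n - 1 - i)] for i in range(n)]
        outputs.append(products[0] + (regs[0] if regs else 0))
        # Register i takes its tap product plus the previous value of register i+1.
        for i in range(n - 2):
            regs[i] = products[i + 1] + regs[i + 1]
        if regs:
            regs[-1] = products[-1]
    return outputs

print(fir_transposed([1, -2, 5, -2, 1], [3, 1, 4, 1, 5, 9, 2, 6]))
```

Because the input is broadcast to every tap, the path between two registers contains one product and one addition, which is the property the CSA and pipelined variants exploit.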
Fig. 4.9 Transposed direct form filter structure with (a) carry save adders (CSA) and (b) carry save adders (CSA) with pipelining.

Fig. 4.10 The Structure C with an input buffer.
4.6.3 Structure of the Interpolated FIR Filter

The basic idea of the interpolated FIR (IFIR) filter is to implement the
prototype filter H(z) as a cascade of two FIR sections, a periodic model subfilter G(z^L)
and an image suppressor I(z), as shown in Eqn. (3.10).
The periodic model subfilter G(z^L) is based upon the behavior of an N-tap
nonrecursive linear-phase FIR filter when each of its unit delays is replaced with
L unit delays, with the interpolated factor L being an integer, as shown in Fig. 4.11(a).
If the impulse response of a nine-tap FIR filter H(z) is that shown in Fig. 4.11(b), the
impulse response of the periodic model filter G(z^L), for example with L = 3, is that
shown in Fig. 4.11(c). The module generator generates the symmetric transposed direct form
structure for the periodic model subfilter with expanded delays between the taps and
adopts the above FIR filter design to implement the image suppressor.
An important implementation issue is that when the passband
width and the transition band of the IFIR filter are narrow, a larger interpolated factor L
can be used. However, this requires more storage to be allocated in order to hold
a sufficient number of input samples for the periodic model subfilter. This is a
disadvantage of the periodic model subfilter G(z^L), because the size of the storage
must be equal to [L(N-1)-1], where N is the tap length of G(z). Although this increases
the storage area, it effectively reduces the hardware complexity relative to the
conventional FIR design when implementing a narrowband FIR filter. If we apply dual
clocks to G(z^L), the storage requirement of the periodic model
subfilter can be reduced effectively, as in the design shown in Fig. 4.12.
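The relationship between the impulse responses of G(z) and G(z^L) can be shown with a small sketch. The nine coefficient values are hypothetical placeholders and the function name is an assumption made for illustration.

```python
def expand_delays(taps, factor):
    """Impulse response of G(z^L): insert L-1 zeros between adjacent taps."""
    expanded = []
    for index, tap in enumerate(taps):
        expanded.append(tap)
        if index < len(taps) - 1:
            expanded.extend([0] * (factor - 1))
    return expanded

g = [-1, 0, 3, 5, 6, 5, 3, 0, -1]  # hypothetical nine-tap symmetric response
g_expanded = expand_delays(g, 3)
print(len(g_expanded), g_expanded)
```

For N = 9 and L = 3 the expanded response has L(N-1)+1 = 25 samples, of which only the original nine positions can be nonzero, so the number of multiplications is unchanged while the delay line grows with L.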
Fig. 4.11 (a) The symmetric transposed direct form structure for G(z^L) with L-unit delays between the taps; (b) the impulse response of H(z); (c) the impulse response of G(z^L) with L=3.

Fig. 4.12 The symmetric transposed direct form structure for G(z^L) with dual clocks.
4.6.4 Structure of the Multirate Multistage Filter

Both the decimator and the interpolator can be implemented in direct form or
transposed direct form. When the transposed direct form is used
for decimators and the direct form for interpolators, registers can be
shared between the subfilters, as shown in Fig. 4.13(b)(c) for the example of M=3,
N=9. This is the so-called memory-saving technique [25]. Another type of
implementation uses the direct form for decimators and the transposed direct form for
interpolators. This allows multipliers to be shared between the subfilters in each
mirror symmetric pair, as shown in Fig. 4.13(a)(d). This is the so-called mirror
symmetric filter pairs technique [25].
The registers in structures (b) and (d) store internal
signals, so their word length is longer than that of the registers in structures (a) and (c),
which store the input signal. With mirror symmetric filter pairs, structures (a) and (d)
have only about half of the multipliers in structures (b) and (c). However, structures
(b) and (c), which use the memory-saving technique, have approximately 1/M of the
registers in structures (a) and (d). Although no structure is absolutely better than the
others, the critical path of the transposed direct form is shorter than that of the direct
form. For high-speed applications, therefore, structures (b) and (d) are selected.
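All four structures compute the same polyphase decimation, which can be checked against filtering followed by downsampling. The sketch below is a functional model only, using the M=3, N=9 example size; register and multiplier sharing are not modelled, and the coefficient and input values are hypothetical.

```python
def decimate_direct(h, x, m):
    """Filter at the input rate, then keep every m-th output sample."""
    y = [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
         for n in range(len(x))]
    return y[::m]

def decimate_polyphase(h, x, m):
    """Compute only the kept outputs, one polyphase subfilter at a time."""
    subfilters = [h[p::m] for p in range(m)]
    outputs = []
    for j in range((len(x) + m - 1) // m):
        acc = 0
        for p, taps in enumerate(subfilters):
            for i, c in enumerate(taps):
                index = j * m - p - i * m  # input index n - k, n = j*m, k = p + i*m
                if index >= 0:
                    acc += c * x[index]
        outputs.append(acc)
    return outputs

h = [1, 2, 3, 4, 5, 4, 3, 2, 1]              # N = 9, hypothetical values
x = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3, 2, 3, 8, 4]
print(decimate_polyphase(h, x, 3) == decimate_direct(h, x, 3))
```

The polyphase version evaluates each subfilter at the output rate, so only one in M of the direct-form multiplications is performed.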
Fig. 4.13 (a) Direct form decimator with mirror symmetric filter pairs. (b) Transposed direct form decimator with memory-saving technique. (c) Direct form interpolator with memory-saving technique. (d) Transposed direct form interpolator with mirror symmetric filter pairs.
4.7 Module Generator

Our module generator is written in C++ and employs Matlab as a
computation engine. The Matlab engine library is a set of routines that allows us to
call Matlab from our own programs. Our module generator has about 72 subprograms
in the main program and consists of several sub-modules in the operation flow
shown in Fig. 4.14. The main sub-modules are the multistage architecture analysis
and synthesis, the coefficient calculation, the coefficient optimization, the word
length estimation and the synthesizable Verilog code generation.
Following the operation flow, the module generator first reads the system
specifications. According to the filter type defined in the specifications, it
determines whether the design is a digital FIR filter (LP, HP, BP or BS), a multistage
IFIR filter or a multirate multistage decimator / interpolator. When the design is a
multistage IFIR filter or a multirate multistage decimator / interpolator, it goes
through the multistage architecture analysis / synthesis sub-module to find
the optimal decomposition of the IFIR filter or decimator / interpolator. After the
analysis of the optimal architecture, the coefficient calculation sub-module employs
Matlab to compute the floating-point filter coefficients and at the same time generates
the file matlab.out to record the coefficient values. We adopt a two-step search,
scaling followed by local search, to round and optimize the filter coefficients with CSD
codes. For the high-speed approach, it selects the coefficient sets with the minimum
number of nonzero digits per coefficient. For the low-complexity approach, it selects the
coefficient sets with the minimum total nonzero digits. According to the optimal
coefficient sets, our module generator estimates the internal word length of the system
that satisfies the SNR requirement. The hardware estimation sub-module generates
the file hardware.out to record the hardware design costs. Finally, the synthesizable
Verilog code generation sub-module generates the synthesizable Verilog code of the
multirate multistage digital FIR filter / decimator / interpolator.

Fig. 4.14 The operation flow of the module generator.
Chapter 5
Experimental Results
In this chapter, the design examples of FIR filter, interpolated FIR filter and
multirate multistage filter generated by the module generator are presented. All
performance data presented in this chapter are pre-layout estimations.
5.1 FIR Filter Design

A linear-phase low-pass FIR filter is designed using our proposed method, the
mixed integer linear programming (MILP) algorithm [33], and Samueli's local search
algorithm [23]. The passband and stopband edge frequencies are 0.3 and 0.5,
respectively. The passband ripple is 0.05 dB and the stopband attenuation is 50 dB. The word
length of the input signal is assumed to be 14 bits. The minimum number of SPT terms
required by each of these methods is summarized in Table 5.1. When
the maximum allowed number of SPT terms per coefficient is limited to four, the filter
designed by our methods saves 22% (21%~24%) SPT terms and costs 5% (4%~7%)
Table 5.1 Minimum number of SPT terms required to attain -50 dB NPR.

Max. SPT per coeff. = 4
Algorithm       #SPT   Tap Length
MILP [33]       68     28
Samueli [23]    66     28
Our Work #1     64     28
Our Work #2     54     29
Our Work #3     52     30

Max. SPT per coeff. = 3
Algorithm       #SPT   Tap Length
MILP [33]       68     28
Samueli [23]    cannot reach -50 dB
Our Work #4     66     28
Our Work #5     57     29

Table 5.2 Synthesis results of Work #1.

Timing Constraint: 7.50 ns
Structure   Critical Path (ns)   Total Gate Count   Combinational Area   Noncombinational Area
A           7.46                 5069               2824                 2245
B           4.65                 8103               3613                 4490
C           4.65                 9119               3907                 5212

Timing Constraint: 1.25 ns
Structure   Critical Path (ns)   Total Gate Count   Combinational Area   Noncombinational Area
A           3.86                 8338               5799                 2539
B           1.57                 11520              5595                 5925
C           1.25                 12862              5999                 6863
additional tap length. If the application requires us to limit the maximum number of
SPT terms per coefficient to three in order to achieve a higher throughput rate, the filter
designed using Samueli's algorithm fails to reach -50 dB NPR, whereas our proposed
method saves 16% SPT terms at the cost of 4% additional tap length. The design results
are converted into the three structures mentioned in Section 4.6.2. We designed the filters
of Work #1 with the TSMC 0.25 µm process and summarized the results in Table 5.2. The
synthesis results show that Structure A is suitable for
low-speed (133 MHz), area-efficient applications; Structure B is suitable for
high-speed (637 MHz) applications; and Structure C is suitable for very high-speed
(800 MHz) applications. Therefore, our module generator can provide flexible hardware
implementations for various applications. The frequency responses of the filters designed
by our module generator are shown in Fig. 5.1. We also designed the filters of Work #1 and
Work #3 with the TSMC 0.25 µm process and summarized the results in Table 5.3. The
Work #3 design saves about 19% SPT terms and costs 6.6% additional tap length compared
with the Work #1 design. The area of the Work #1 design is about 1.1 times that of the
Work #3 design. The area of an FA is about 6.3 gates and that of a register is about 5.6
gates. The power dissipation of the Work #1 design is about 1.1 times that of the Work #3
design.
Fig. 5.1 The frequency responses of Work #1, Work #2 and Work #3 (legend: Work #1 to Work #5; vertical axis: normalized magnitude response (dB); inset: passband detail).
Table 5.3 Synthesis results of Work #1 and Work #3.

Technology: TSMC 0.25um
Input Frequency: 200 MHz
                          Work #1   Work #3
Total Gate Count          6527      6080
Combination Area          4314      3719
Noncombination Area       2213      2361
Power Dissipation (mW)    80.66     72.99
A filter design example for the baseband demodulator of a 64 quadrature
amplitude modulation (QAM) telecommunication system is carried out. The specification
is shown in Table 5.4 [38]. Since this filter requires a length of 35 taps, a
direct implementation clearly requires much hardware. However, with our module generator,
the whole filter can be implemented with reasonable chip area.
Table 5.4 Specifications of 64-QAM baseband demodulator.

Sampling Frequency                     21.52 MHz
Symbol Rate                            5.38 MHz
Normalized Passband Edge Frequency     0.2110
Normalized Stopband Edge Frequency     0.3204
Passband Ripple                        0.1 dB
Stopband Attenuation                   30 dB
Input Data Word Length                 11 bit
Output Data Word Length                14 bit
Fig. 5.2 shows the frequency responses of the original coefficients and of the
coefficients after coefficient optimization. In addition, the specifications after the
module generator are summarized in Table 5.5. By using the scaling strategy we obtain
fewer total nonzero digits than [38], so fewer adders are needed. Moreover, with
the local search strategy the number of total nonzero digits is further reduced. The
maximum number of nonzero digits, which determines the critical path of the filter, is
also reduced.
Fig. 5.2 The frequency response of 64-QAM baseband demodulator (curves: Original, Scaling Strategy, Local Search Strategy; inset: passband detail).
The chip is designed with the TSMC 0.25 µm process. Because the maximum nonzero digit
is only 2, the result of structure C is the same as that of structure B. For
low-complexity applications, the overall area of [38] is about 1.64 times and its power
dissipation is about 1.95 times that of structure A. Moreover, using structure B (or
structure C) for high-speed applications, the chip can operate at 714 MHz. The synthesis
results are summarized in Table 5.6.
Table 5.5 Specifications after module generator of 64-QAM baseband demodulator.

Columns: [38]; This Work (Scaling); This Work (Local Search). Values are listed in source order.
Tap Length (tap): 35, 35
Normalized Passband Edge Frequency: 0.2010, 0.2148
Normalized Stopband Edge Frequency: 0.3204, 0.3203
Passband Ripple (dB): 0.0558, 0.0531
Stopband Attenuation (dB): 31.1030, 31.5092
Coefficient Word Length (bit), Internal Word Length (bit): 14, 14, 14 (only one row of values survives)
Max Nonzero Digit (bit):
Total Nonzero Digit (bit): 63, 57, 49
SNR (dB): 46.8, 41.2

Table 5.6 Synthesis results of the example for 64-QAM baseband demodulator.

                            Work#1     Work#2     [38]
Technology (um)             0.25       0.25       0.25
Max. Operating Frequency    714 MHz    146 MHz    72 MHz
Total Gate Count            11117      5155       8477
Combination Area            5011       2496       5938
Noncombination Area         6106       2659       2539
Power Dissipation           520.05     6.83       13.31

Work#1: Specifications under high-speed constraint.
Work#2: Specifications under low-complexity constraint.
5.2 Interpolated FIR Filter Design

Using our module generator, we design IFIR filters whose specifications are those of the
first version of the CDMA cellular system proposed by Qualcomm [40]. The specifications are
shown in Table 5.7. A conventional filter design using the Parks-McClellan
algorithm would require an order N = 69. Based on the algorithm shown in Section 4.2.1,
we use the optimal interpolated factor L = 4 for the IFIR design with single-stage
implementation of I(z) and L1L2 = 2×2 for the IFIR design with two-stage implementation
of I(z). After the module generator, the specifications of the conventional filter, the
periodic model subfilters G(z^L) and the image suppressors I(z) are summarized in Table 5.8 [41].
Notice that the system G(z^4)I(z) has the linear phase property since G(z) and I(z) both
have this property.
Table 5.7 Specifications of the CDMA cellular proposed by Qualcomm.

Sampling Frequency             19.6608 MHz
Passband Edge Frequency        0.064087
Stopband Edge Frequency        0.125
Passband Ripple in dB          0.1 dB
Stopband Attenuation in dB     40 dB
Input Data Word Length         5 bits
Fig. 5.3 shows the conventional filter and the frequency responses for the IFIR
filters with single-stage I(z) and L = 4 as well as the frequency responses for the
subfilters, I(z) and G(z4). Fig. 5.4 shows the conventional filter and the frequency
Table 5.8 Design results by module generator with IFIR filter designs and the conventional filter.

Conventional FIR and multistage IFIR with two-stage (columns: Conventional FIR; I(z); G(z^4))
Tap Length (tap): 69, 15, 19
Normalized Passband Edge Frequency: 0.0645, 0.0683, 0.2598
Normalized Stopband Edge Frequency: 0.1230, 0.3730, 0.4980
Passband Ripple (dB): 0.0969, 0.0284, 0.0365
Stopband Attenuation (dB): 40.2410, 43.1643, 42.7119
(row labels lost): 13, 15, 11, 13, 204, 27, 39
Actual Nonzero Digit (bit): 104, 14, 21
SNR (dB): 41.0, 41.5, 40.0

Multistage IFIR with three-stage (columns: I1(z); I2(z^2); G(z^4))
Tap Length (tap): 19
Normalized Passband Edge Frequency: 0.0643, 0.1289, 0.2578
Normalized Stopband Edge Frequency: 0.8730, 0.7480, 0.4980
Passband Ripple (dB): 0.0232, 0.0283, 0.0280
Stopband Attenuation (dB): 40.7033, 40.6462, 40.4649
(row labels lost): 10, 11, 11, 12, 13, 41
Actual Nonzero Digit (bit): 22
SNR (dB): 41.2, 44.7, 42.4
Fig. 5.3 Frequency responses of the IFIR filters with single-stage I(z) and L = 4, (a) I(z) of order 15 and G(z^4) of order 19, (b) the overall IFIR filter and the conventional filter.

Fig. 5.4 Frequency responses of the IFIR filters with two-stage I(z) and L = 4, (a) I1(z) of order 7, I2(z^2) of order 7 and G(z^4) of order 19, (b) the overall IFIR filter and the conventional filter.
responses for the IFIR filters with two-stage I(z) and L1L2 = 2×2 as well as the
frequency responses for the subfilters, I1(z), I2(z^2) and G(z^4).
The synthesis results of the IFIR filter designs and the conventional filter design are
summarized in Table 5.9. The filter I(z) is very inexpensive, whereas the cost of G(z^4) is
a little less than half the cost of the conventional design. When the timing constraints of
the conventional filter and the IFIR filters are equal, the area of the conventional filter
is about 1.63 times that of the IFIR filter with single-stage I(z) and about 1.72 times that
of the IFIR filter with two-stage I(z). The power dissipation of the conventional filter is
about 12.46 times that of the IFIR filter with single-stage I(z) and about 13.10 times that
of the IFIR filter with two-stage I(z).
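The quoted ratios follow directly from the gate counts and power figures reported in Table 5.9, as the short check below shows. The variable names are chosen here for illustration; the numbers are the table entries, with the subfilters of each IFIR design summed.

```python
conventional = {"gates": 14839, "power": 1400.00}

# Single-stage I(z): I(z) plus G(z^4)
single = {"gates": 2049 + 7080, "power": 25.78 + 86.57}

# Two-stage I(z): I1(z) plus I2(z^2) plus G(z^4)
double = {"gates": 721 + 1369 + 6523, "power": 8.89 + 17.31 + 80.70}

def ratio(reference, design, field):
    """How many times larger the reference is than the design."""
    return round(reference[field] / design[field], 2)

print(ratio(conventional, single, "gates"), ratio(conventional, double, "gates"))
print(ratio(conventional, single, "power"), ratio(conventional, double, "power"))
```

The area ratios come out at 1.63 and 1.72 and the power ratios at 12.46 and 13.10, in agreement with the text.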
Table 5.9 The synthesis results of the conventional filter and the IFIR filters.

Technology: TSMC 0.25um
Input Frequency: 200 MHz
                        Conventional   IFIR Filter with single-stage I(z)
                        Filter         I(z)       G(z^4)
Total Gate Count        14839          2049       7080
Combination Area        9190           1118       1684
Noncombination Area     5649           931        5396
Power Dissipation       1400.00        25.78      86.57

Technology: TSMC 0.25um
Input Frequency: 200 MHz
                        IFIR Filter with two-stage I(z)
                        I1(z)      I2(z^2)    G(z^4)
Total Gate Count        721        1369       6523
Combination Area        275        504        1540
Noncombination Area     446        865        4983
Power Dissipation       8.89       17.31      80.70
5.3 Multirate Multistage FIR Design

5.3.1 Decimator

Following the design of the IFIR filters, we designed multirate decimators for use
in the CDMA cellular system [40] with a decimation factor (M) of eight. The synthesis
results are summarized in Table 5.10. The conventional decimator is a single-stage
decimator that uses the polyphase structure to save arithmetic operations, and the
multirate decimators are designed by our proposed method.
Table 5.10 The synthesis results of the conventional decimator and the multirate decimator [40]. (Decimation Ratio, M=8)

Technology: TSMC 0.25um
Input Frequency: 200 MHz
                        Conventional   Multirate Decimator with two-stage
                        Decimator      Stage#1    Stage#2
                        (M=8)          (M1=4)     (M2=2)
Total Gate Count        13742          1580       2554
Combination Area        12567          1162       1792
Noncombination Area     1174           418        761
Power Dissipation       21.30          6.59       4.29

Technology: TSMC 0.25um
Input Frequency: 200 MHz
                        Multirate Decimator with three-stage
                        Stage#1    Stage#2    Stage#3
                        (M1=2)     (M2=2)     (M3=2)
Total Gate Count        638        808        2417
Combination Area        336        496        1706
Noncombination Area     301        312        710
Power Dissipation       4.48       3.15       4.42

Fig. 5.5 The frequency responses of the subfilters of the multirate decimators and the conventional decimator: (a) the conventional decimator of order 69; the subfilters of the multirate decimator with two-stage, (b) I(z) of order 15 and (c) G(z) of order 19; the subfilters of the multirate decimator with three-stage, (d) I1(z) of order 7, (e) I2(z) of order 7 and (f) G(z) of order 19. (Curves: Original, Scaling Strategy, Local Search Strategy; axes: normalized frequency (rad/sample) versus normalized magnitude response (dB).)
Fig. 5.5 shows the frequency responses of the subfilters of the multirate
decimators and of the conventional decimator. The frequency responses of the
conventional FIR and the multirate IFIR filters are shown in Fig. 5.6. When the timing
constraints of the conventional decimator and the multirate multistage decimators are
equal, the area of the conventional decimator is about 3.32 to 3.56 times, and its power
dissipation about 1.78 to 1.96 times, that of the multirate multistage decimators.
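The quoted ratios can be retraced from Table 5.10. The sketch below assumes that the cost of a multistage decimator is the plain sum of its stages and that the last row of Table 5.10 is power dissipation (taken here as an assumption); the dictionary layout and the function name are illustrative only.

```python
# Total gate counts and last-row figures copied from Table 5.10.
conventional = {"gates": 13742, "power": 21.30}
two_stage = {"gates": [1580, 2554], "power": [6.59, 4.29]}
three_stage = {"gates": [638, 808, 2417], "power": [4.48, 3.15, 4.42]}


def ratios(reference, stages):
    """Return (area ratio, power ratio) of the reference over the summed stages."""
    area = reference["gates"] / sum(stages["gates"])
    power = reference["power"] / sum(stages["power"])
    return area, power


area2, power2 = ratios(conventional, two_stage)
area3, power3 = ratios(conventional, three_stage)
print(f"two-stage:   area x{area2:.2f}, power x{power2:.2f}")
print(f"three-stage: area x{area3:.2f}, power x{power3:.2f}")
```

Computed this way, the two-stage design gives the lower end of the area range and the upper end of the power range, and the three-stage design the reverse. The three-stage power ratio evaluates to roughly 1.77 from the rounded table entries, marginally below the quoted 1.78.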
Fig. 5.6  The frequency responses of the conventional FIR and multirate IFIR filters.
[Plot curves: Conventional Filter, Multirate IFIR with 2-stage.]
5.3.2 Interpolator
Using the module generator, we demonstrate an example of a sharp narrowband
interpolator whose specifications are summarized in Table 5.11. The specifications
that the module generator produces for the conventional filter, the periodic model
subfilter G(z^L), and the image suppressor I(z) are summarized in Table 5.12.
Table 5.11  Specifications of the interpolator.

Normalized Passband Edge Frequency      0.05
Normalized Stopband Edge Frequency      0.10
Normalized Transition Bandwidth         0.05
(row label missing)                     0.1
(row label missing)                     40.0
Interpolation Ratio                     6 (as in Table 5.13)

Table 5.12  Specifications after module generator of IFIR filter designs and the conventional filter.

                            Conventional    Multistage IFIR with two-stage
                            FIR             I(z)            G(z^3)
Tap Length (tap)            85              10              31
Normalized Passband
Edge Frequency              0.0507          0.0586          0.1523
Normalized Stopband
Edge Frequency              0.0996          0.5664          0.2988
(row label missing)         0.0712          0.0227          0.0376
(row label missing)         40.5573         45.6249         42.4263
(row labels missing)        15, 10, 18, 12, 15, 269, 16, 73 (column assignment not recoverable)
SNR (dB)                    40.2            40.7            40.4
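The band edges listed in Table 5.12 are close to what the usual interpolated FIR relations [12] predict for a stretch factor of 3. The sketch below applies those textbook relations; whether the module generator uses exactly these rules is an assumption, and the function name is hypothetical. Frequencies are normalized so that 1.0 corresponds to the Nyquist frequency.

```python
def ifir_band_edges(wp, ws, L):
    """Textbook IFIR band edges for passband edge wp, stopband edge ws, stretch L."""
    if L * ws >= 1.0:
        raise ValueError("stretch factor too large for this stopband edge")
    model = (L * wp, L * ws)            # edges of G(z) before stretching to G(z^L)
    suppressor = (wp, 2.0 / L - ws)     # I(z) keeps the baseband, stops the first image
    return model, suppressor


# Edges of the conventional filter from Table 5.12, stretch factor 3 as in G(z^3).
model, suppressor = ifir_band_edges(0.0507, 0.0996, 3)
print("G(z) passband/stopband edges:", [round(w, 4) for w in model])
print("I(z) passband/stopband edges:", [round(w, 4) for w in suppressor])
```

The model filter edges and the image suppressor stopband edge land within about 0.001 of the tabulated 0.1523, 0.2988, and 0.5664. The tabulated passband edge of I(z), 0.0586, is slightly wider than the baseband edge used in this sketch.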
Table 5.13  The synthesis results of the conventional interpolator and the multirate interpolator. (Interpolation Ratio, L=6)

Technology: TSMC 0.25um    Input Frequency: 200 MHz

                        Conventional    Multirate Interpolator with two-stage
                        Interpolator    Stage#1         Stage#2
                        (L=6)           (L1=3)          (L2=2)
Total Gate Count        14545           944             3810
Combination Area        14147           802             3737
Noncombination Area     398             142             73
Power Dissipation       566.20          26.93           390.52
Fig. 5.7  The frequency responses of the subfilters of the multirate interpolator and the conventional interpolator: (a) the conventional interpolator of order 85; the subfilters of the two-stage multirate interpolator, (b) I(z) of order 10 and (c) G(z) of order 31.
[Plots: normalized magnitude response (dB). Curves: Original, Scaling Strategy.]
The synthesis results are summarized in Table 5.13. In general, multistage designs
yield a very significant reduction in both computation (APU) and storage (SPU)
requirements compared with single-stage designs. The reduction comes from the wide
transition bands of the subfilters I(z) and G(z), which lead to short tap lengths.
The conventional interpolator is a single-stage interpolator that uses the polyphase
structure to save arithmetic operations; the multirate interpolator is designed by
our proposed method. Fig. 5.7 shows the frequency responses of the subfilters of the
multirate interpolator and of the conventional interpolator. The frequency responses
of the conventional FIR and the multirate IFIR filters are shown in Fig. 5.8. When
the timing constraints of the conventional interpolator and the multirate multistage
interpolator are equal, the area of the conventional interpolator is about 3.06 times,
and its power dissipation about 1.36 times, that of the multirate multistage interpolator.
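The link between transition bandwidth and tap length can be illustrated with Kaiser's window-design length estimate. This is only a rough sketch: the formula is a general rule of thumb rather than the design method of the module generator, the function name is hypothetical, and the 40 dB figure assumes that the 40.0 entry of Table 5.11 is a stopband attenuation in dB.

```python
import math


def kaiser_tap_estimate(transition_width, atten_db):
    """Kaiser's FIR length estimate; width is normalized so that 1.0 is Nyquist."""
    delta_omega = math.pi * transition_width          # transition width in rad/sample
    order = (atten_db - 7.95) / (2.285 * delta_omega)
    return math.ceil(order) + 1                       # taps = order + 1


# Band edges from Table 5.12; 40 dB is an assumed stopband attenuation.
for name, wp, ws in [("conventional", 0.0507, 0.0996),
                     ("G(z)", 0.1523, 0.2988),
                     ("I(z)", 0.0586, 0.5664)]:
    print(f"{name:12s} transition {ws - wp:.4f} -> about {kaiser_tap_estimate(ws - wp, 40.0)} taps")
```

The estimates fall in the same order as the tap lengths of Table 5.12 (85, 31, and 10): roughly tripling the transition width of the model filter cuts its length to about a third.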
Fig. 5.8  The frequency responses of the conventional FIR and multirate IFIR filters.
[Plot curves include: Conventional Filter.]
Chapter 6 Conclusions
In this thesis, we have surveyed several architectures of multistage multirate
FIR digital filters, decimators, and interpolators proposed in recent years and
discussed their advantages and disadvantages. A module generator written in C++
for multirate multistage FIR digital filters / decimators / interpolators has been
presented. Several design methodologies are adopted to reduce the hardware
complexity of the system. The generated modules are therefore suitable for low-power
applications, because both the hardware complexity and the operating frequency are
reduced. Moreover, the module generator can target high-speed applications owing to
the compact and parallel structures it uses.
The inputs of the module generator are the system specifications. First,
multistage architecture analysis and synthesis decomposes the system into the
optimum set of stages using the multistage multirate IFIR filter design methodology.
Second, coefficient calculation uses MATLAB to compute the filter coefficients.
Third, coefficient optimization converts the floating-point coefficients to CSD code
using the scaling strategy, which minimizes hardware complexity, and then reduces the
complexity further with the local search method. Next, word length estimation finds
the minimum internal word length with which the system meets the SNR requirement.
Finally, synthesizable Verilog code generation produces synthesizable Verilog HDL
code, written at the behavioral and RTL levels for flexibility.
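As an illustration of the CSD coding step, the sketch below converts a quantized coefficient to canonic signed digit form with the standard non-adjacent-form recurrence. It does not reproduce the scaling strategy or the local search of the generator; the function names and the 8-bit fractional quantization are assumptions made for the example.

```python
def to_csd(value):
    """Return the CSD digits (-1, 0 or +1) of an integer, least significant first."""
    digits = []
    while value != 0:
        if value % 2 == 0:
            digits.append(0)
        else:
            d = 2 - (value % 4)      # +1 when value = 1 (mod 4), -1 when value = 3 (mod 4)
            digits.append(d)
            value -= d               # the remainder is now divisible by 4
        value //= 2
    return digits


def csd_value(digits):
    """Rebuild the integer from its CSD digits."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


# Hypothetical coefficient 0.4375 quantized with 8 fractional bits.
frac_bits = 8
q = round(0.4375 * (1 << frac_bits))        # 112 = 0b1110000, three nonzero bits
digits = to_csd(q)
print(digits, "nonzero digits:", sum(1 for d in digits if d != 0))
```

Here 112 is coded as 128 - 16, so the multiplier needs two shifted terms instead of the three that the plain binary form requires. No two adjacent CSD digits are nonzero.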
We have designed several filters with TSMC 0.25um standard cells. A 64-QAM
baseband demodulator design shows that the area is reduced by a factor of about 1.64
and the power dissipation by a factor of about 1.95 for low-complexity applications.
Moreover, for high-speed applications, the chip can operate at 714 MHz. We also
designed IFIR filters whose specification is that of the first version of CDMA
cellular; the area is reduced by a factor of about 1.72 and the power dissipation by
a factor of about 13.10 compared with the direct form design. A multistage decimator
used in CDMA cellular shows an area reduction of about 3.56 times and a power saving
of about 1.96 times compared with the conventional decimator. Finally, in an example
of a narrowband multistage interpolator, the area is reduced by a factor of about 3.06
and the power dissipation by a factor of about 1.36 compared with the conventional
interpolator.
Because the generator requires only system-level specifications, system
designers who are inexperienced in VLSI design can use the module generator easily.
Furthermore, by using this module generator, an efficient design of a chip can be
successfully completed in a few minutes.
References
[1] P. P. Vaidyanathan, Multirate systems and filter banks, Englewood Cliffs, NJ: Prentice Hall, 1993.
[2] R. E. Crochiere and L. R. Rabiner, Multirate digital signal processing, Englewood Cliffs, NJ: Prentice Hall, 1983.
[3] Filter design toolbox user's guide, version 2, MathWorks, Inc., 2004.
[4] CoCentric system studio filter design tools user guide, Synopsys, Inc., May 2002.
[5] M. Ishikawa et al., "Automatic layout synthesis for FIR filters using a silicon compiler," IEEE Int. Symp. Circuits Syst., pp. 2588-2591, May 1990.
[6] R. Jain, P. T. Yang, and T. Yoshino, "FIRGEN: A computer-aided design system for high performance FIR filter integrated circuits," IEEE Trans. Signal Processing, vol. 39, pp. 1655-1668, Jul. 1991.
[7] R. Hawley, T.-J. Lin, and H. Samueli, "A silicon compiler for high-speed CMOS multirate FIR digital filters," IEEE Int. Symp. Circuits Syst., vol. 3, pp. 1348-1351, May 1992.
[8] E. Bidet, C. Joanblanq, and P. Senn, "GENRIF: An integrated VLSI FIR filter compiler," Eur. Conf. Design Automation, pp. 466-471, Feb. 1993.
[9] G. Wacey and D. R. Bull, "POFGEN: A design automation system for VLSI digital filters with invariant transfer function," IEEE Int. Symp. Circuits Syst., pp. 631-634, May 1993.
[10] K. Y. Cheng, "Multiplierless Multirate FIR Digital Filter / Decimator / Interpolator Module Generator," MS thesis, Dept. of EE, National Central Univ., Taiwan, Jun. 2003.
[11] G. W. Reitwiesner, "Binary arithmetic," Advances in Computers, vol. 1, NY: Academic, pp. 231-308, 1966.
[12] Y. Neuvo, C. Y. Dong, and S. K. Mitra, "Interpolated finite impulse response filters," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-32, pp. 563-570, Jun. 1984.
[13] T. Saramäki, Y. Neuvo, and S. K. Mitra, "Design of computationally efficient interpolated FIR filters," IEEE Trans. Circuits Syst., vol. 35, no. 1, Jan. 1988.
[14] M. Ishikawa et al., "Automatic layout synthesis for FIR filters using a silicon compiler," IEEE Int. Symp. Circuits Syst., pp. 2588-2591, May 1990.
[15] R. Jain, P. T. Yang, and T. Yoshino, "FIRGEN: A computer-aided design system for high performance FIR filter integrated circuits," IEEE Trans. Signal Processing, vol. 39, pp. 1655-1668, Jul. 1991.
[16] R. Hawley, T.-J. Lin, and H. Samueli, "A silicon compiler for high-speed CMOS multirate FIR digital filters," IEEE Int. Symp. Circuits Syst., vol. 3, pp. 1348-1351, May 1992.
[17] E. Bidet, C. Joanblanq, and P. Senn, "GENRIF: An integrated VLSI FIR filter compiler," Eur. Conf. Design Automation, pp. 466-471, Feb. 1993.
[18] G. Wacey and D. R. Bull, "POFGEN: A design automation system for VLSI digital filters with invariant transfer function," IEEE Int. Symp. Circuits Syst., pp. 631-634, May 1993.
[19] N. J. Fliege, Multirate digital signal processing: multirate systems, filter banks, wavelets, 1994.
[20] P. Ruetz, "The architectures and design of a 20-MHz real-time DSP chip set," IEEE JSSC, vol. 24, pp. 338-348, Apr. 1989.
[21] S.-Y. Wu, "Low-power multirate IF digital frequency down converter for wireless communication systems," MS thesis, Dept. of EE, National Central Univ., Taiwan, Jun. 1997.
[22] R. Hartley, "Subexpression sharing in filters using canonic signed digit multipliers," IEEE Trans. Circuits Syst. II, vol. 43, pp. 677-688, Oct. 1996.
[23] H. Samueli, "An improved search algorithm for the design of multiplierless FIR filters with powers-of-two coefficients," IEEE Trans. Circuits Syst., vol. 36, pp. 1044-1047, Jul. 1989.
[24] T.-J. Lin and H. Samueli, "A 200-MHz CMOS x/sin(x) digital filter for compensating D/A converter frequency response distortion in high-speed communication systems," IEEE GLOBECOM, vol. 3, pp. 1722-1726, Dec. 1990.
[25] R. A. Hawley, B. C. Wong, T.-J. Lin, J. Laskowski, and H. Samueli, "Design techniques for silicon compiler implementations of high-speed FIR digital filters," IEEE JSSC, vol. 31, pp. 656-667, May 1996.
[26] I.-C. Park and H.-J. Kang, "Digital filter synthesis based on an algorithm to generate all minimal signed digit representations," IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 21, pp. 1525-1529, Dec. 2002.
[27] A. P. Vinod, E. M.-K. Lai, A. B. Premkumar, and C. T. Lau, "FIR filter implementation by efficient sharing of horizontal and vertical common subexpressions," Electronics Letters, vol. 39, pp. 251-253, Jan. 2003.
[28] B. C. Wong and H. Samueli, "A 200-MHz all-digital QAM modulator and demodulator in 1.2-um CMOS for digital radio applications," IEEE JSSC, vol. 26, pp. 1970-1979, Dec. 1991.
[29] R. Hartley, "Optimization of canonic signed digit multipliers for filter design," IEEE ISCAS, pp. 1992-1995, 1992.
[30] R. W. Mehler and D. Zhou, "Architectural synthesis of finite impulse response digital filters," Symp. Integrated Circuits Syst. Design, pp. 20-25, Sep. 2002.
[31] M. Bellanger, G. Bonnerot, and M. Coudreuse, "Digital filtering by polyphase network: application to sample rate alteration and filter banks," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-24, pp. 109-114, Apr. 1976.
[32] D. J. Shpak and A. Antoniou, "A generalized Remez method for the design of FIR digital filters," IEEE Trans. Circuits Syst., pp. 161-174, Feb. 1990.
[33] Y. C. Lim and S. R. Parker, "FIR filter design over a discrete powers-of-two coefficient space," IEEE Trans. Acoust., Speech, Signal Processing, pp. 583-591, Jun. 1983.
[34] X. Hu, L. S. DeBrunner, and V. DeBrunner, "An efficient design for FIR filters with variable precision," IEEE Int. Symp. Circuits Syst., vol. 4, pp. IV-365-IV-368, May 2002.
[35] D. Kodek and K. Steiglitz, "Comparison of optimal and local search methods for designing finite wordlength FIR digital filters," IEEE Trans. Circuits Syst., vol. 28, pp. 28-32, Jan. 1981.
[36] E. C. Ifeachor and B. W. Jervis, Digital signal processing: a practical approach, Addison-Wesley, 1993.
[37] DesignWare foundation library databook, Synopsys, Inc., Jan. 2002.
[38] S. J. Jou, C. H. Kuo, M. T. Shiau, J. Y. Heh, and C. K. Wang, "VLSI implementation of timing recovery and carrier recovery for QAM/VSB dual mode," International Symp. on VLSI Technology, Systems and Applications, Taipei, R.O.C., pp. 159-162, June 1999.
[39] Design Compiler User Guide, Synopsys, Inc., May 2002.
[40] The CDMA network engineering handbook, volume 1: concepts in CDMA, Qualcomm Inc., Mar. 1993.
[41] S.-J. Jou, S.-Y. Wu, and C.-K. Wang, "Low-power multirate architecture for IF digital frequency down converter," IEEE Trans. Circuits Syst. II, vol. 45, pp. 1487-1494, Nov. 1998.
