91521026


Multirate Multistage Digital FIR Filter / Decimator / Interpolator Module Generator



91521026
93 6 23



Student : Hsiao-Yun Chen


Advisor : Shyh-Jye Jou, Ph.D.

Department of Electrical Engineering


National Central University
Jhongli 320, Taiwan, R.O.C.

July 2004

Abstract
In this thesis, a module generator that automates the design of high-speed, low-complexity multirate multistage digital FIR filters / decimators / interpolators is presented. To achieve low complexity, the generator exploits the architectural symmetry of linear phase filters and the multistage multirate interpolated FIR (IFIR) filter design methodology. In addition, the polyphase representation is used to decompose the filter into subfilters. For high speed, the resulting filters use canonic signed digit (CSD) multipliers, a transposed direct form structure, and carry-save addition. The generator requires only system-level specifications as input and can provide three types of filter structure for different applications. Its output is synthesizable behavioral-level Verilog hardware description language (HDL) code, which allows the synthesis tool to select an appropriate architecture under the user's constraints. The tool therefore eliminates the manual calculation, coding, simulation, and verification time of the design cycle. We have designed several filters with TSMC 0.25 µm standard cells. A 64-QAM baseband design example shows that, for low-complexity applications, the area is reduced by a factor of about 1.64 and the power dissipation by a factor of about 1.95. For high-speed applications, the chip can operate at 714 MHz. For IFIR filters designed to the specification of the first version of the CDMA cellular system, the area is reduced by a factor of about 1.72 and the power dissipation by a factor of about 13.10 compared with a direct form design. An example of a multistage decimator used in CDMA cellular systems shows that the area is reduced by a factor of about 3.56 and the power dissipation by a factor of about 1.96 compared with a conventional decimator. Finally, for a narrowband multistage interpolator example, the area is reduced by a factor of about 3.06 and the power dissipation by a factor of about 1.36 compared with a conventional interpolator.

Contents

Chapter 1 Introduction
1.1 Introduction
1.2 Motivation and Goals
1.3 Thesis Organization

Chapter 2 Digital FIR Filter Design
2.1 Basic FIR Filter Design
2.1.1 FIR Filter Structure
2.1.2 Carry Save Addition
2.1.3 Linear Phase FIR Filters
2.2 Multiplierless Filter Design
2.2.1 CSD Representation
2.2.2 CSD Multipliers
2.3 RTL Design Technologies for CSD Based Design
2.3.1 Sign Extension Elimination
2.3.2 Common Subexpression
2.3.3 Pipelining

Chapter 3 Multirate Multistage Digital FIR Filter Design
3.1 Basic Multirate Operations
3.1.1 Decimation
3.1.2 Interpolation
3.2 The Noble Identities
3.3 The Polyphase Representation
3.4 Interpolated FIR Filter Design
3.5 Multirate Multistage Filter Design

Chapter 4 Module Generator Implementation
4.1 System Specifications
4.2 Multistage Architecture Analysis
4.2.1 Interpolated FIR Filter Decomposition
4.2.2 Multirate Multistage FIR Filter Decomposition
4.3 Coefficient Calculation
4.4 Coefficient Optimization
4.4.1 Scaling Strategy
4.4.2 Local Search Strategy
4.5 Word Length Estimation
4.5.1 Overflow Prevention
4.5.2 Internal Word Length Reduction
4.6 Synthesizable Verilog Code Generation
4.6.1 Hardware Estimation
4.6.2 Design of the FIR Digital Filter
4.6.3 Design of the Interpolated FIR Filter
4.6.4 Design of the Multirate Multistage Filter
4.7 Module Generator

Chapter 5 Experimental Results
5.1 FIR Digital Filter Design
5.2 Interpolated FIR Filter Design
5.3 Multirate Multistage Filter Design
5.3.1 Interpolator
5.3.2 Decimator

Chapter 6 Conclusions

References

List of Figures
Fig. 2.1 Direct form of FIR filter.
Fig. 2.2 Transposed direct form structure of FIR filter.
Fig. 2.3 Spurious transitions of 12-bit CPA.
Fig. 2.4 Transposed direct form FIR filter with carry-save addition.
Fig. 2.5 Linear phase transposed direct form FIR filter.
Fig. 2.6 Distribution of CSD coefficient set (a) with 2, 3 and 4 nonzero digits for 8-bit word length and (b) for 6-, 8- and 10-bit word length with 2 nonzero digits.
Fig. 2.7 Transposed direct form architecture with CSD coefficients and CSAs.
Fig. 2.8 Compensation vector in MSB fix technique.
Fig. 2.9 Implementation with common subexpressions.
Fig. 2.10 Transposed direct form filter with 2- and 3-FA delay pipelining.
Fig. 2.11 Symmetric transposed direct form architecture using carry-save addition.

Fig. 3.1 (a) M-fold decimator. (b) Demonstration of decimation for M=2.
Fig. 3.2 Spectrum analysis of downsampling effect with M=2.
Fig. 3.3 (a) Block diagram of an M-to-1 decimator. (b) Typical magnitude response of the decimation filter.
Fig. 3.4 (a) 1-to-L upsampler. (b) Demonstration of upsampling for L=2.
Fig. 3.5 Spectrum analysis of upsampling effect with L=2.
Fig. 3.6 (a) Block diagram of a 1-to-L interpolator. (b) Typical magnitude response of the interpolation filter.
Fig. 3.7 The noble identities for multirate systems.
Fig. 3.8 Reconstruction of a decimator with M=2.
Fig. 3.9 Polyphase implementations of (a) M-fold decimator and (b) L-fold interpolator.
Fig. 3.10 Time and frequency domain behaviors of IFIR low-pass filter with L=3.
Fig. 3.11 IFIR filter performance versus transition region bandwidth for p = 0.5, p = 0.1 dB and p = 40 dB: (a) hardware reduction factor over conventional filter design; (b) optimum interpolation factors.
Fig. 3.12 Frequency domain behaviors of IFIR low-pass filter; the image suppressor I(z) is designed with a don't-care region.
Fig. 3.13 Multistage IFIR decimator design.
Fig. 3.14 Multistage IFIR interpolator design.
Fig. 3.15 Multistage IFIR decimator with three-stage decomposition.

Fig. 4.1 Design flow of the module generator: (a) the digital FIR filter design flow; (b) the multirate multistage digital FIR filter / decimator / interpolator design flow.
Fig. 4.2 Specification of a lowpass filter.
Fig. 4.3 The number of taps versus interpolation factor L for the periodic model filter G(z^L), the image suppressor I(z) and the overall filter H(z).
Fig. 4.4 In case (3), the decompositions of two-stage designs of I(z) for various values of L1 and L2.
Fig. 4.5 Specifications of multistage IFIR decimation filter design.
Fig. 4.6 The flowchart of the scaling strategy for filter coefficients.
Fig. 4.7 SNR simulation block.
Fig. 4.8 Internal word length estimation flow chart.
Fig. 4.9 Transposed direct form filter structure utilizing (a) carry-save adders (CSA) and (b) carry-save adders (CSA) with pipelining.
Fig. 4.10 The Structure C with an input buffer.
Fig. 4.11 (a) The symmetric transposed direct form structure for G(z^L) with L-unit delays between the taps; (b) the impulse response of H(z); (c) the impulse response of G(z^L) with L=3.
Fig. 4.12 The symmetric transposed direct form structure for G(z^L) with dual clocks.
Fig. 4.13 (a) Direct form decimator with mirror symmetric filter pairs. (b) Transposed direct form decimator with memory-saving technique. (c) Direct form interpolator with memory-saving technique. (d) Transposed direct form interpolator with mirror symmetric filter pairs.
Fig. 4.14 The operation flow of the module generator.

Fig. 5.1 The frequency responses of Work #1, Work #2 and Work #3.
Fig. 5.2 The frequency response of 64-QAM baseband demodulator.
Fig. 5.3 Frequency responses of the IFIR filters with single-stage I(z) and L = 4: (a) I(z) of order 15 and G(z^4) of order 19; (b) the overall IFIR filter and the conventional filter.
Fig. 5.4 Frequency responses of the IFIR filters with two-stage I(z) and L = 4: (a) I1(z) of order 7, I2(z^2) of order 7 and G(z^4) of order 19; (b) the overall IFIR filter and the conventional filter.
Fig. 5.5 The frequency responses of the subfilters of the multirate decimators and the conventional decimator: (a) the conventional decimator of order 69; the subfilters of the two-stage multirate decimator, (b) I(z) of order 15 and (c) G(z) of order 19; the subfilters of the three-stage multirate decimator, (d) I1(z) of order 7, (e) I2(z) of order 7 and (f) G(z) of order 19.
Fig. 5.6 The frequency responses of the conventional FIR and multirate IFIR filters.
Fig. 5.7 The frequency responses of the subfilters of the multirate interpolators and the conventional interpolator: (a) the conventional interpolator of order 85; the subfilters of the two-stage multirate interpolator, (b) I(z) of order 10 and (c) G(z) of order 31.
Fig. 5.8 The frequency responses of the conventional FIR and multirate IFIR filters.

List of Tables
Table 2.1 Key features of the four linear phase FIR filter types.
Table 2.2 Common subexpressions of filter coefficients.

Table 3.1 Number of linear phase subfilters if prototype filter is linear phase.

Table 4.1 Input data of system specifications.
Table 4.2 Three cases for IFIR decompositions.
Table 4.3 Summaries of the optimal IFIR filters with single-stage and two-stage implementations of I(z).

Table 5.1 Minimum number of SPT terms required to attain -50 dB NPR.
Table 5.2 Synthesis results of Work #1.
Table 5.3 Synthesis results of Work #1 and Work #3.
Table 5.4 Specifications of 64-QAM baseband demodulator.
Table 5.5 Specifications after module generator of 64-QAM baseband demodulator.
Table 5.6 Synthesis results of the example for 64-QAM baseband demodulator.
Table 5.7 Specifications of the CDMA cellular system proposed by Qualcomm.
Table 5.8 Design results by module generator with IFIR filter designs and the conventional filter.
Table 5.9 The synthesis results of the conventional filter and the IFIR filters.
Table 5.10 The synthesis results of the conventional decimator and the multirate decimator [40] (decimation ratio M = 8).
Table 5.11 Specifications of the interpolator.
Table 5.12 Specifications after module generator of IFIR filter designs and the conventional filter.
Table 5.13 The synthesis results of the conventional interpolator and the multirate interpolator (interpolation ratio L = 6).

Chapter 1 Introduction

Chapter 1
Introduction

1.1 Introduction
Digital signal processing is an area of science and engineering that has developed rapidly over the past 30 years. Digital finite impulse response (FIR) filters and up/down-sampling DSP techniques are found throughout modern electronic products such as multimedia devices, modems, and mobile personal communication systems. For every electronic product, lower circuit complexity is an important design target because it reduces cost. For portable applications such as notebook computers and wireless personal communication systems, whose power consumption must be small, a low-power, low-complexity implementation is especially important. The recent trend toward integrating a whole system on a single chip (SoC) makes this even more evident.
Digital FIR filters are widely used in DSP applications. The trend toward increasing data rates in DSP systems has pushed the development and implementation of high-speed, low-power digital FIR filters. High-speed, low-power applications require both increased parallelism and reduced complexity in order to meet sampling-rate and power-dissipation goals. For many applications, reduced complexity can be achieved by eliminating the programmability of the coefficients, which allows the hardware to be optimized for a particular fixed coefficient set.
Multirate signal processing [1] uses different sample rates within a system to achieve computational efficiencies that are impossible to obtain with a system operating at a single fixed sample rate. Such systems are frequently used for audio and video processing, communication systems, general digital filtering, transform analysis, and more. The two key components of multirate systems are the decimator and the interpolator: a high-speed decimator reduces the sampling rate so that complicated processing can be performed at a lower data rate, and an interpolator raises the rate again where needed. The polyphase structure provides an efficient architecture for realizing multirate systems through a bank of filters operating in parallel [2]. Proper sampling-rate conversion always requires filtering, and the linear phase filters used for sampling-rate conversion can be implemented efficiently. The filtering may be performed by either finite impulse response (FIR) or infinite impulse response (IIR) filters. In video and communication systems, however, linear phase filters are highly desirable to avoid signal distortion, which precludes the use of IIR filters. Furthermore, FIR filters have a very regular architecture, which makes them much more amenable to synthesis tools, and they suffer less from finite word length effects than IIR filters.
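As an illustration of the polyphase idea mentioned above, the following minimal Python sketch (our own behavioral example, not code from the module generator; the function names `decimate_direct` and `decimate_polyphase` are hypothetical) checks that filtering at the input rate and then keeping every M-th sample gives the same result as splitting h(k) into M polyphase components e_r(i) = h(iM + r), each of which runs only at the low output rate.

```python
def decimate_direct(h, x, M):
    """Filter at the high input rate, then keep every M-th output sample."""
    full = [
        sum(h[k] * x[n - k] for k in range(len(h)) if 0 <= n - k < len(x))
        for n in range(len(x))
    ]
    return full[::M]


def decimate_polyphase(h, x, M):
    """Same output, but each polyphase branch e_r(i) = h(iM + r) works only
    at the low output rate, so no product is computed and then discarded."""
    branches = [h[r::M] for r in range(M)]     # polyphase components E_r(z)
    n_out = (len(x) + M - 1) // M
    y = []
    for m in range(n_out):
        acc = 0
        for r, e in enumerate(branches):
            for i, coeff in enumerate(e):
                idx = (m - i) * M - r           # input sample x(mM - iM - r)
                if 0 <= idx < len(x):
                    acc += coeff * x[idx]
        y.append(acc)
    return y


h = [1, 2, 3, 4, 5]
x = list(range(1, 11))
print(decimate_polyphase(h, x, 2) == decimate_direct(h, x, 2))  # True
```

Across all branches the sketch performs N multiplications per output sample rather than N per input sample, which is the M-fold saving that motivates the polyphase structure.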


Here, we briefly sketch some related commercial tools:

MathWorks, Inc. [3]

• Filter Design & Analysis Tool (FDATool)
FDATool is a collection of tools built on top of the MATLAB computing environment and the Signal Processing Toolbox. The toolbox includes a number of advanced filter design techniques that support designing, simulating, and analyzing fixed-point and custom floating-point filters for a wide range of precisions. However, it handles only single-rate filter design.

• Interpolated FIR Filter Design (IFIR)
This function decomposes a sharp, narrow-passband filter into a periodic model filter and an image suppressor using the IFIR filter design methodology. However, it cannot decompose the filter into three or more stages; its decomposition is limited to two stages.
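An IFIR decomposition of this kind realizes the overall filter as a cascade H(z) = G(z^L) I(z) of the periodic model filter and the image suppressor. The minimal Python sketch below (hypothetical helper names; it only composes given coefficient sets and does not design them) forms G(z^L) by inserting L-1 zeros between the coefficients of G(z) and convolves the result with I(z) to obtain the impulse response of H(z).

```python
def stretch(g, L):
    """G(z) -> G(z^L): insert L-1 zeros between consecutive coefficients."""
    out = []
    for i, c in enumerate(g):
        out.append(c)
        if i < len(g) - 1:
            out.extend([0] * (L - 1))
    return out


def convolve(a, b):
    """Coefficients of the product of two polynomials in z^-1 (a filter cascade)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def ifir_response(g, i_coeffs, L):
    """Impulse response of H(z) = G(z^L) I(z)."""
    return convolve(stretch(g, L), i_coeffs)


print(stretch([1, 1], 3))                  # [1, 0, 0, 1]
print(ifir_response([1, 1], [1, 1], 3))    # [1, 1, 0, 1, 1]
```

The inserted zeros of G(z^L) cost only delay elements, so multipliers are needed only for the nonzero coefficients of G(z) and I(z).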

Synopsys, Inc. [4]

• System Studio Filter Design Tool (QED)
QED is a filter specification, design, and analysis tool with which users can create analog and digital filters for system simulation and implementation.

• Multirate Filter Design Tool (MRFD)
MRFD enables users to create filtering systems that exploit the benefits of decimation, interpolation, windowed multirate design, and polyphase filtering. In these systems, the input and output sampling rates are the same, and sample-rate changes are used only to achieve computational efficiency.

• Sample Rate Conversion Filter Design Tool (SRCFD)
SRCFD enables users to create decimation, interpolation, and rational L/M sample-rate conversion filters, using polyphase, multistage techniques for computational savings. This tool is a companion to MRFD, but it is intended for problems in which the goal is to change the sampling rate through filtering, decimation, and interpolation.
Using the three tools mentioned above, the following system capabilities can be obtained:
• Design of FIR filters.
• Analysis of possible decimator or interpolator structures.
• Analysis of multistage options for each decimation or interpolation factor.
• Calculation of computational requirements for multistage structures.
• Recommendation of a multirate structure, including the number of stages.
• Automatic generation of all filter specifications.
• Automatic invocation of the Parks-McClellan design algorithm for each filter design.
• C code generation with multistage polyphase filters.
• Optional comb / halfband filter design in multistage implementations.
For fixed-coefficient filter implementations, custom silicon must be created for each application. The large number of applications for such application-specific integrated circuits (ASICs) suggests that a silicon compiler solution would be desirable [5]-[9]. However, logic synthesis is already a mature technology, and results from existing tools are generally accepted as producing satisfactory circuits. We therefore focus on the design process of multirate multistage digital FIR filters / decimators / interpolators, from system specifications to Verilog HDL code [10].


1.2 Motivation and Goals


Recent rapid progress in very-large-scale integration (VLSI) technology has led to the emerging theme of System-on-a-Chip (SoC). As the density and complexity of VLSI integrated circuits increase, so does the cost of developing a VLSI chip. This calls for rapid prototyping and reuse of major silicon intellectual property (SIP) modules to reduce the designer's effort and speed up the design process. Computer-aided design (CAD) tools therefore play an important role in decreasing design cycle time and in accurately simulating the correctness of a circuit design.
In this thesis, a general-purpose multirate multistage digital FIR filter / decimator / interpolator module generator is proposed, based on the canonic signed digit (CSD) code representation [11] and the multistage multirate interpolated FIR (IFIR) filter design methodology [12][13]. Several design methodologies are adopted to reduce hardware complexity at the architectural level. The proposed module generator automates the design of FIR filters, multistage filters, decimators, and interpolators, from the system specification to the corresponding synthesizable Verilog hardware description language (HDL) code. Because the module generator requires only the system-level specification, it allows system designers who are inexperienced in VLSI design to design filters easily and to concentrate on system design and performance evaluation. With this module generator, an efficient design can be completed in a few minutes.


1.3 Thesis Organization
This thesis describes various techniques by which sufficient parallelism for high-speed operation can be achieved while simultaneously constraining the solution to a small hardware implementation. Most of these techniques are widely known and are briefly summarized as an introductory tutorial. The module generator that utilizes these architectures, analysis techniques, and design tradeoffs is then demonstrated.
The organization of this thesis is as follows. Chapter 2 first gives an overview of basic FIR filter design issues, then introduces multiplierless filter implementation, and finally presents the methodologies we use to substantially reduce hardware complexity in digital filter design. Chapter 3 first reviews basic multirate DSP fundamentals, then discusses techniques widely used in multirate systems, such as the noble identities, the polyphase representation, and IFIR filter design, and finally introduces efficient multirate filter design. Chapter 4 demonstrates the design flow of the module generator. Chapter 5 presents experimental results for FIR filter, multistage filter, decimator, and interpolator examples designed with the module generator. Finally, Chapter 6 gives some conclusions.

Chapter 2 Digital FIR Filter Design

Chapter 2
Digital FIR Filter Design

Digital filters play very important roles in DSP systems. Analog filter circuits are usually very difficult to design, and their overall performance is very sensitive to nonidealities such as dc-offset voltage, dc voltage drift, and parasitic components. Compared with analog filters, digital FIR filters can have a truly linear phase response and very precise performance. In addition, a digital filter can easily be programmed to accommodate different data rates, modulation formats, and filter specifications, which keeps its hardware relatively simple and compact in comparison with equivalent analog circuitry.

2.1 Basic FIR Filter Design


A linear time-invariant (LTI) system can be characterized by its impulse response. A finite impulse response (FIR) system is one whose impulse response has finite duration, so its output returns to zero a finite time after a finite-duration input ends. The basic FIR filter is characterized by the following two equations:

$$y(n) = \sum_{k=0}^{N-1} h(k)\,x(n-k) \qquad (2.1)$$

Eqn. (2.1) is the FIR difference equation. It is a time-domain equation that describes the FIR filter in its nonrecursive form: the current output sample y(n) is a function of the present and past values of the input x(n). N is the filter length, that is, the number of filter coefficients. An alternative representation of the FIR filter in the z-domain is given in Eqn. (2.2).
$$H(z) = \sum_{k=0}^{N-1} h(k)\,z^{-k} \qquad (2.2)$$

where h(k), k = 0, 1, ..., N-1, are the impulse response coefficients of the filter and H(z) is the transfer function of the filter. Several basic FIR filter structures are discussed in detail in the following sections.
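Eqn. (2.1) can be evaluated directly. The short Python sketch below is an illustration only: the function name `fir_direct` is hypothetical, and integer coefficients stand in for the fixed-point values used in hardware. It treats x(n) as zero for n < 0 and returns one output per input sample.

```python
def fir_direct(h, x):
    """y(n) = sum_{k=0}^{N-1} h(k) x(n-k), with x(n) = 0 for n < 0."""
    N = len(h)
    y = []
    for n in range(len(x)):
        acc = 0
        for k in range(N):            # one product per tap, as in Eqn. (2.1)
            if n - k >= 0:
                acc += h[k] * x[n - k]
        y.append(acc)
    return y


print(fir_direct([1, 2, 3], [1, 0, 0, 0]))   # [1, 2, 3, 0], the impulse response
```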

2.1.1 FIR Filter Structure


The choice of FIR filter structure depends on factors such as hardware complexity and desired throughput. Fig. 2.1 depicts the direct form structure of the FIR filter, also called a tapped delay line or transversal filter [19]. This structure maps Eqn. (2.1) directly into hardware: each delayed version of the input is multiplied by the appropriate filter coefficient, and the products are summed to form the filter output. The structure needs delay elements, multipliers, and a multi-input adder, and the multi-input adder dominates the speed of the overall system. If the accumulation is performed linearly with two-input carry-propagation adders (CPAs), the critical path of an N-tap direct form FIR filter is
$$T_{direct} = T_{mul} + (N-1)\,T_{add} \qquad (2.3)$$

Fig. 2.1 Direct form of FIR filter.

where $T_{mul}$ is the delay of the multiplier, $T_{add}$ is the delay of a $W_{int}$-bit CPA, and $W_{int}$ is the internal word length. The accumulation can instead be performed by a tree-structured adder, as suggested by Reutz [20], in which case the critical path is
$$T_{mul} + (\log_{1.5} N)\,T_{add} \qquad (2.4)$$

The delay of the filter then increases only logarithmically with the filter tap length N. Furthermore, the tree can be built as a carry-save adder (CSA) tree, such as a Wallace tree or a Dadda tree, to eliminate the delay due to carry propagation.
Fig. 2.2 depicts the transposed direct form structure of the FIR filter, which is obtained by repositioning the delay elements of the direct form structure [19]. In this structure, the input is fed to every tap simultaneously and the results are accumulated over N sample periods. As the block diagram shows, the system throughput rate is independent of the tap length. The structure retains the regularity of the linearly accumulating direct form, and its critical path consists of only one multiplication and one addition, as shown in Eqn. (2.5).
$$T_{transposed} = T_{mul} + T_{add} \qquad (2.5)$$

Fig. 2.2 Transposed direct form structure of FIR filter.
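The register behavior of the transposed direct form can be modeled in a few lines. In this minimal Python sketch (our own behavioral model; `transposed_fir` is a hypothetical name), each register z[k] holds the partial sum z_k(n) = h(k)x(n) + z_{k+1}(n-1), the input sample is broadcast to every tap in the same cycle, and each output needs only one multiplication and one addition, matching Eqn. (2.5).

```python
def transposed_fir(h, x):
    """Behavioral model of the transposed direct form FIR filter."""
    taps = len(h)
    z = [0] * (taps + 1)    # z[k]: registered partial sum of tap k; z[0], z[taps] stay 0
    y = []
    for xn in x:            # xn is broadcast to all taps in the same cycle
        y.append(h[0] * xn + z[1])     # critical path: one multiply, one add
        for k in range(1, taps):       # ascending k reads z[k + 1] before it is updated
            z[k] = h[k] * xn + z[k + 1]
    return y


print(transposed_fir([3, -5, 7], [2, -1, 4, 0, 0]))   # [6, -13, 31, -27, 28]
```

The outputs are identical to those of Eqn. (2.1); only the placement of the delays has changed, which is why the registers hold accumulated sums rather than input samples.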

The transposed direct form can therefore be expected to be faster than the tree structures used in the direct form FIR filter. Such a short critical path also allows the system to operate at a low supply voltage, which makes this solution very suitable for low-power applications. In addition, the structure is inherently well suited to high-speed operation and pipelining.
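To make the comparison among Eqns. (2.3), (2.4) and (2.5) concrete, the small Python helper below (hypothetical, using arbitrary unit delays rather than measured TSMC 0.25 µm values) evaluates the three critical-path expressions for a given tap count.

```python
import math


def critical_paths(n_taps, t_mul, t_add):
    """Critical-path estimates from Eqns. (2.3)-(2.5); delays in arbitrary units."""
    return {
        "direct_linear": t_mul + (n_taps - 1) * t_add,          # Eqn. (2.3)
        "direct_tree": t_mul + math.log(n_taps, 1.5) * t_add,   # Eqn. (2.4)
        "transposed": t_mul + t_add,                            # Eqn. (2.5)
    }


for name, t in critical_paths(64, t_mul=2.0, t_add=1.0).items():
    print(f"{name:14s} {t:6.2f}")
```

For 64 taps with these assumed delays, the estimates are about 65, 12.3, and 3 time units, respectively; only the transposed form is independent of N.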
One of the primary disadvantages of the transposed direct form is the large load on the input data-broadcasting bus, since all multipliers are fed in parallel. As the number of taps increases, the input bus becomes longer and its load capacitance grows. This effect can be reduced by using appropriate data buffers and by distributing the input bus as a tree-like structure. Another disadvantage of this structure is that the delay elements are wider, since they hold the accumulated sums instead of the input samples. Furthermore, if the CSA-based structure introduced in the next section is chosen, the number of delay elements within the filter core doubles.

2.1.2 Carry Save Addition

The delays of the multiplier and the adder dominate the system speed, as shown in Eqn. (2.5). A carry-propagation adder (CPA) is not a good candidate for low-power or high-speed designs, because the delay of a simple (ripple-carry) CPA grows linearly with its word length. It also generates many glitches before the real carry propagates from the least significant bit (LSB) to the most significant bit (MSB), as shown in Fig. 2.3 [21].

Fig. 2.3 Spurious transitions of a 12-bit CPA (traces for Cin and bits 11 to 0).

To avoid the long critical path delay of the adder, the adder in each tap is converted to a CSA, as shown in Fig. 2.4. In carry-save addition, both a sum bit and a carry bit are produced at each bit position of the word, so the carry propagation problem inside an adder is avoided. The carry-save scheme has a few drawbacks, the most important being that the number of registers within the filter core doubles. This increases the filter core area, but the system can achieve a higher throughput rate or use a lower supply voltage.

Fig. 2.4 Transposed direct form FIR filter with carry-save addition.

At the final stage of the filter, a single high-speed CPA, the so-called vector merge adder (VMA), sums the two data-path outputs to form the final output. The critical path delay of the transposed direct form FIR filter with carry-save addition is
TFIR = max{Tmul + TCSA , TVMA }

(2.6)

where TVMA means the n-bits VMA delay. Obviously, the VMA delay will dominate
the system throughput rate, so some high-complexity high-speed adder such as a
carry-select adder or a carry-lookahead adder (CLA) may be used to reduce TVMA.
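The carry-save principle can be illustrated with a small behavioral model. The Python sketch below is only an illustration under our own assumptions (unsigned words of a hypothetical width `width`; the helper names `csa`, `vector_merge` and `accumulate` are chosen for the example); it is not the Verilog emitted by the module generator. Each `csa` call models one row of full adders that produces separate sum and carry vectors without propagating carries, and a single carry-propagate addition at the end plays the role of the VMA.

```python
def csa(a: int, b: int, c: int, width: int) -> tuple[int, int]:
    """One carry-save adder row: a full adder per bit position, no carry chain.

    Returns (sum_vector, carry_vector) with a + b + c == sum + carry (mod 2**width).
    """
    mask = (1 << width) - 1
    s = (a ^ b ^ c) & mask                               # per-bit sum outputs
    carry = (((a & b) | (a & c) | (b & c)) << 1) & mask  # per-bit carries, moved one column up
    return s, carry


def vector_merge(s: int, carry: int, width: int) -> int:
    """Final carry-propagate addition (the VMA) that merges sum and carry vectors."""
    return (s + carry) & ((1 << width) - 1)


def accumulate(operands: list[int], width: int = 16) -> int:
    """Add many operands with a chain of CSA rows and a single VMA at the end."""
    s, carry = 0, 0
    for op in operands:
        s, carry = csa(s, carry, op, width)  # each new operand adds one FA delay
    return vector_merge(s, carry, width)


print(accumulate([3, 7, 100, 2000]))  # 2110
```

In hardware each CSA row costs one full-adder delay regardless of the word length, so only the final VMA depends on n in Eqn. (2.6).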

2.1.3 Linear Phase FIR Filters

In many filter applications, phase distortion cannot be tolerated, so the filters are required to have a linear phase response. There are four types of linear phase FIR filters, depending on whether the tap length N is even or odd and whether h(k) is symmetric or anti-symmetric. Table 2.1 summarizes their key features.


Table 2.1 Key features of the four linear phase FIR filter types.

| Type | Tap Length | Symmetry | H(0) | H(π) | Applications |
|---|---|---|---|---|---|
| I | odd | symmetric | arbitrary | arbitrary | LP, HP, BP, BS, multiband filters |
| II | even | symmetric | arbitrary | 0 | LP, BP |
| III | odd | anti-symmetric | 0 | 0 | differentiators, Hilbert transformers |
| IV | even | anti-symmetric | 0 | arbitrary | differentiators, Hilbert transformers |

The symmetric structure saves about half of the coefficient multipliers by sharing each multiplier between a pair of symmetric taps. This symmetry can be exploited in both the direct form and the transposed direct form structures. Fig. 2.5 shows the linear phase transposed direct form structure, also called the symmetric transposed direct form structure, which is adopted in our module generator. The drawback of this symmetric structure is a slight increase in data path routing due to the sharing of multipliers.
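The multiplier sharing can be checked numerically. The following Python sketch (a behavioral model with illustrative names `fir_direct` and `fir_symmetric`, not the generator's hardware description) pre-adds the two input samples that share a coefficient, so each output needs only ⌈N/2⌉ multiplications. For clarity it applies the folding in direct-form order; the generator applies the same sharing to the transposed structure.

```python
def fir_direct(x, h):
    """Reference FIR filter: y(n) = sum_k h(k) x(n - k), with x(n) = 0 for n < 0."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def fir_symmetric(x, h):
    """Linear phase FIR that exploits h(k) == h(N-1-k).

    The two input samples sharing a coefficient are pre-added, so each output
    needs only ceil(N/2) multiplications instead of N.
    """
    N = len(h)
    if any(h[k] != h[N - 1 - k] for k in range(N)):
        raise ValueError("coefficients must be symmetric")

    def sample(i):
        return x[i] if i >= 0 else 0  # zero initial conditions

    y = []
    for n in range(len(x)):
        acc = 0
        for k in range(N // 2):
            # one shared multiplier serves the symmetric tap pair (k, N-1-k)
            acc += h[k] * (sample(n - k) + sample(n - (N - 1 - k)))
        if N % 2:
            acc += h[N // 2] * sample(n - N // 2)  # unpaired centre tap (odd N)
        y.append(acc)
    return y
```

For N = 5 the loop uses three multiplications per output instead of five, at the cost of one extra pre-adder per tap pair.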

Fig. 2.5 Linear phase transposed direct form FIR filter.

2.2 Multiplierless Filter Design


The area of the multipliers is a well-known bottleneck in FIR filters, and the multipliers also consume a large amount of power. Moreover, the maximum speed of the filter is severely limited by the multiplier delay. The number of additions required for a constant-coefficient multiplication is one less than the number of nonzero bits in the coefficient. To further reduce area and power consumption, the constant coefficient can be encoded with the fewest nonzero bits, which can be accomplished using the canonic signed digit (CSD) representation.
This section addresses the CSD number representation and its applications for
the design of the constant multipliers.

2.2.1 CSD Representation


An FIR filter coefficient expressed as a sum of signed power-of-two (SPT) terms has the general form

$$h_{SPT}(n) = \sum_{k=1}^{D_n} s_{k,n}\, 2^{-p_{k,n}} \qquad (2.7)$$

where $s_{k,n} \in \{-1, 0, 1\}$ and $p_{k,n} \in \{1, \ldots, W\}$. The coefficient $h_{SPT}(n)$ has $D_n$ SPT terms and a W-bit word length. In general, a given number has several equivalent SPT representations. The minimum representation is one that requires the minimum number of SPT terms; there may be more than one such representation.


The properties of CSD number representations are summarized as follows.

- The CSD number representation is a ternary coded word with the minimum number of nonzero digits (SPT terms).
- No two consecutive digits in a CSD number are nonzero.
- The CSD representation of a number is unique [11], and an n-bit CSD word has at most ⌈n/2⌉ nonzero digits.
- CSD numbers cover the range (-4/3, 4/3), of which the values in the range [-1, 1) are of greatest interest.
- Among the n-bit CSD numbers in the range [-1, 1), the expected number of nonzero digits tends asymptotically to n/3 + 1/9 [22]. Hence, on average, CSD numbers contain about 33% fewer nonzero digits than two's complement numbers.
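A CSD representation can be computed with the standard non-adjacent-form recoding. The Python sketch below shows one possible way to do it; the function names and the rounding of a coefficient to `word_length` fractional bits are our illustrative assumptions, not necessarily the procedure used in Section 4.4. Whenever the remaining value is odd, it chooses the digit +1 or -1 that makes the remainder divisible by 4, which guarantees that the next digit is zero.

```python
def to_csd(value: int) -> list[int]:
    """CSD (non-adjacent form) digits of an integer, least significant digit first.

    Every digit is -1, 0 or +1, and no two adjacent digits are nonzero.
    """
    digits = []
    while value != 0:
        if value % 2 == 0:
            d = 0
        else:
            # choose +1 or -1 so that the remaining value is a multiple of 4,
            # which forces the next digit to be 0
            d = 2 - (value % 4)
        digits.append(d)
        value = (value - d) // 2
    return digits


def csd_terms(coeff: float, word_length: int) -> list[tuple[int, int]]:
    """Round coeff to word_length fractional bits and return its CSD terms as
    (s, p) pairs meaning s * 2**-p, following the form of Eqn. (2.7)."""
    q = round(coeff * (1 << word_length))
    return [(d, word_length - i) for i, d in enumerate(to_csd(q)) if d != 0]


print(to_csd(7))             # [-1, 0, 0, 1], i.e. 7 = 8 - 1
print(csd_terms(0.4375, 8))  # [(-1, 4), (1, 1)], i.e. 0.4375 = 2**-1 - 2**-4
```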
The drawback of the CSD representation is that the distribution of CSD coefficient values is not uniform [23], as shown in Fig. 2.6, which may cause serious quantization error. For a fixed number of nonzero digits and word length, the distribution has many gaps in the region where the CSD value is above 0.5. Increasing the number of nonzero digits for the same word length, or increasing the word length for the same number of nonzero digits, reduces the gaps for larger CSD values. Because the distribution is not uniform, search strategies and optimization algorithms are required to find the optimal CSD representation of the original coefficients while still fulfilling the original specifications. These are discussed in Section 4.4.


Fig. 2.6 Distribution of the CSD coefficient set (a) with 2, 3 and 4 nonzero digits for an 8-bit word length and (b) for 6-, 8- and 10-bit word lengths with 2 nonzero digits.


2.2.2 CSD Multipliers


As mentioned previously, constant multiplication can be carried out by adding or subtracting a number of partial product terms corresponding to the nonzero bit positions of the constant. A CSD-encoded multiplier is simply implemented by combining hard-wired bus shifts with a minimal number of two's complement adders, since adders and subtracters are essentially identical in hardware. Because no general multiplier is used, such filters are often referred to as multiplierless filters. They require less circuitry, dissipate less power, and have a shorter critical path delay, which translates into a higher data throughput. These advantages are especially important for FIR filter implementation, since FIR filters usually need many more multiplications than their IIR counterparts.
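The shift-and-add structure can be modeled as follows. In this Python sketch (the names are illustrative), a constant is given as a list of (sign, shift) terms; each shift corresponds to hard-wired bus routing, and every term after the first costs one adder or subtracter. The example constant 119 is our own choice, used to compare a binary and a CSD encoding of the same value.

```python
def shift_add_multiply(x: int, terms: list[tuple[int, int]]) -> tuple[int, int]:
    """Multiply x by c = sum(sign * 2**shift) using only shifts and adders/subtracters.

    Returns (product, number_of_adders).
    """
    acc = 0
    adders = 0
    for i, (sign, shift) in enumerate(terms):
        partial = x << shift  # a shift is only wiring in hardware
        if i == 0:
            acc = partial if sign > 0 else -partial  # first term initialises the sum
        else:
            acc = acc + partial if sign > 0 else acc - partial
            adders += 1  # one adder or subtracter per additional term
    return acc, adders


# 119 = 0b1110111 has six nonzero bits; in CSD, 119 = 2**7 - 2**3 - 2**0
binary_119 = [(1, 6), (1, 5), (1, 4), (1, 2), (1, 1), (1, 0)]
csd_119 = [(1, 7), (-1, 3), (-1, 0)]

print(shift_add_multiply(13, binary_119))  # (1547, 5)
print(shift_add_multiply(13, csd_119))     # (1547, 2)
```

For this constant the CSD encoding needs two adders instead of five, which is the saving that motivates CSD coefficients.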
Many high-speed digital FIR filter chips and silicon compilers employ CSD coefficients [15][16][24][25]. The architecture frequently adopted by these designs, a transposed direct form filter structure with CSAs, is shown in Fig. 2.7. In this architecture, one CSA is required for each nonzero term, and a single VMA at the final stage combines the carry and sum vectors at the filter's output. The architecture provides excellent layout regularity and a short critical path of
$$T_{transposed} = \max\{D_{max}\, T_{CSA},\ T_{VMA}\} \qquad (2.8)$$

where $T_{CSA}$ and $T_{VMA}$ are the delays of the CSA and the VMA, respectively, and $D_{max} = \max_n \{D_n\}$. The delay of a CSA is only that of a one-bit full adder. The use of carry-save arithmetic takes full advantage of the CSD coefficients and reduces the delay of the coefficient multiply-accumulate operation to a few one-bit full-adder delays without heavy pipelining. For a given filter length, the total number of SPT terms used, $\sum_n D_n$, determines the filter implementation complexity, while the maximum number of SPT terms per coefficient, $D_{max}$, generally determines the throughput limit of the filter. Therefore, the objective of the filter design is to optimize the filter's frequency response while keeping the total number of SPT terms to a minimum and keeping the number of SPT terms per coefficient within a specified bound.

Fig. 2.7 Transposed direct form architecture with CSD coefficients and CSAs.
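Both figures of merit named above are easy to compute once the CSD digits are known. The Python sketch below (a hypothetical helper `filter_cost`; delays expressed in full-adder delays, with the VMA delay supplied as an assumed value) returns the total SPT count $\sum_n D_n$, $D_{max}$, and the critical-path estimate of Eqn. (2.8).

```python
def filter_cost(coeff_digits: list[list[int]], t_csa: float, t_vma: float):
    """Complexity and speed figures of a CSD transposed direct form filter.

    coeff_digits holds one CSD digit list (-1/0/+1 entries) per coefficient.
    Returns (total SPT terms, D_max, critical-path estimate per Eqn. (2.8)).
    """
    d = [sum(1 for digit in digits if digit != 0) for digits in coeff_digits]  # D_n
    total_spt = sum(d)                  # sum of D_n: implementation complexity
    d_max = max(d)                      # D_max: throughput limit
    t_crit = max(d_max * t_csa, t_vma)  # Eqn. (2.8)
    return total_spt, d_max, t_crit


# delays in units of one full-adder delay; the VMA delay of 5 is an assumed value
print(filter_cost([[1, 0, -1], [0, 1], [1, 0, 1, 0, -1]], t_csa=1, t_vma=5))  # (6, 3, 5)
```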

2.3 RTL Design Technologies for CSD-Based Design

2.3.1 Sign Extension Elimination


One drawback of the transposed direct form structure is the large load on the input data bus, which cannot be easily eliminated. Some works [22][26][27] have exploited common factors in the coefficient set to generate the data-coefficient products in a nested manner. While this may simplify product generation and reduce the loading on the data broadcast lines, it is coefficient dependent and extremely irregular.
For the two's complement data format, this problem is especially important for the MSB driver. For example, suppose that a 5-bit input data word

$$x_0 x_1 x_2 x_3 x_4 \qquad (2.9)$$

is multiplied by the CSD coefficient $2^{-4}$. The input data must then be shifted 4 bits to the right, giving

$$x_0 x_0 x_0 x_0 x_0 x_1 x_2 x_3 x_4 \qquad (2.10)$$

in the appropriate adder columns. Clearly, the MSB $x_0$ must be broadcast to five FAs, while each of the other bits is broadcast to only one FA. This leads to a far greater load capacitance on the MSB than on the other bits of the input data bus, and longer buffer chains are needed to drive it. Furthermore, these driving circuits and wiring buses also increase power consumption and chip area.

To solve this MSB loading problem, a technique called MSB fix was applied [28]. To illustrate the principle, consider again the shifted data word in (2.10). It can be rewritten equivalently as

$$x_0 x_0 x_0 x_0 x_0 x_1 x_2 x_3 x_4 = 0000\,\overline{x_0}\,x_1 x_2 x_3 x_4 + 111110000 \qquad (2.11)$$

According to this representation, the shifted data word can be obtained as the sum of a word containing the inverted MSB and a constant vector. The FA loading of the MSB driver then becomes the same as that of the other bit drivers, with the inverted MSB being broadcast. Since the constant vector of the MSB fix technique depends only on the CSD shift, the constant vectors of all taps can be summed together to form a single compensation vector (CV):
$$CV = \sum_{n=0}^{N-1} CV_n \qquad (2.12)$$

This CV can be added to the first tap of the filter, as shown in Fig. 2.8.

Fig. 2.8 Compensation vector in MSB fix technique.
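The MSB fix identity can be verified exhaustively for the 5-bit example. In the Python sketch below (illustrative helper names; words are handled as unsigned bit patterns), adding the word with only its MSB inverted to a constant vector reproduces the sign-extended word for every input. For a 5-bit word shifted by four columns, the constant evaluates to 111110000, i.e. $-2^4$ as a 9-bit two's complement pattern.

```python
def sign_extended(x: int, in_bits: int, shift: int) -> int:
    """Bit pattern of the in_bits-wide word x shifted right by `shift` columns,
    i.e. x sign-extended to in_bits + shift bits (the word of Eqn. (2.10))."""
    return x % (1 << (in_bits + shift))


def msb_fixed_word(x: int, in_bits: int) -> int:
    """Zero-extended word with only its MSB x0 inverted."""
    return (x % (1 << in_bits)) ^ (1 << (in_bits - 1))


def msb_fix_constant(in_bits: int, shift: int) -> int:
    """Constant vector replacing the sign-extension bits: -2**(in_bits - 1)
    written as an (in_bits + shift)-bit two's complement pattern."""
    return (-(1 << (in_bits - 1))) % (1 << (in_bits + shift))


print(bin(msb_fix_constant(5, 4)))  # 0b111110000
```

Because the constant depends only on the input width and the shift, the constants of all taps can be added offline into the compensation vector.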

2.3.2 Common Subexpressions


Another simplification applies to CSD coefficients that share common subexpressions in their CSD representations [29]. Constant multipliers can be replaced by adders and shifters for efficient implementation, and the area can be reduced further by sharing common subexpressions among these operations. For example, if an input data word x is multiplied by the CSD coefficients $h_1 = 2^{-1} + 2^{-3}$ and $h_2 = 2^{-5} + 2^{-7}$, respectively, the results can be expressed as

$$\begin{aligned} x \cdot h_1 &= (x \gg 1) + (x \gg 3) \\ x \cdot h_2 &= (x \gg 5) + (x \gg 7) \end{aligned} \qquad (2.13)$$

Defining the common subexpression $f = (x \gg 1) + (x \gg 3)$, the above expressions can alternatively be written as


$$\begin{aligned} x \cdot h_1 &= f \\ x \cdot h_2 &= \big((x \gg 1) + (x \gg 3)\big) \gg 4 = f \gg 4 \end{aligned} \qquad (2.14)$$
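Eqns. (2.13) and (2.14) can be checked with a short Python sketch. To keep the identity exact, the sketch scales both coefficients by $2^7$ so that every shift is a left shift; with right shifts, truncation would make the two forms differ in the least significant bits. The function names are illustrative. In this scaled form the shared pattern is $2^2 + 2^0$, and $h_1$ is that pattern shifted by four places, the same relation as $x \cdot h_2 = f \gg 4$.

```python
def products_direct(x: int) -> tuple[int, int, int]:
    """x*h1*2**7 and x*h2*2**7, each computed with its own adder.

    Returns (p1, p2, adders_used).
    """
    p1 = (x << 6) + (x << 4)  # h1 = 2^-1 + 2^-3  ->  2^6 + 2^4 after scaling
    p2 = (x << 2) + x         # h2 = 2^-5 + 2^-7  ->  2^2 + 2^0 after scaling
    return p1, p2, 2


def products_shared(x: int) -> tuple[int, int, int]:
    """The same products, reusing one common subexpression."""
    f = (x << 2) + x  # shared pattern "101": the only adder
    p1 = f << 4       # h1 is the shared pattern shifted by four places
    p2 = f
    return p1, p2, 1


print(products_direct(3), products_shared(3))  # (240, 15, 2) (240, 15, 1)
```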

This method can be implemented directly, as shown in Fig. 2.9. The figure shows a 25% hardware reduction with this scheme: four adders are reduced to three. As another example, the common subexpressions ($f_1$ and $f_2$) of a set of filter coefficients are shown in Table 2.2.

Fig. 2.9 Implementation with common subexpressions.

Table 2.2 Common subexpressions of filter coefficients.
Although subexpression sharing can provide significant reductions in complexity and input loading for some filters, experimental results [30] show that a design with subexpression sharing has a longer critical path than one without sharing. Moreover, subexpression sharing is not well suited to the polyphase structure discussed later in Section 3.3. It is therefore not employed in this module generator, although it may become an option in a future version.

2.3.3 Pipelining

Architectures that adopt CSD multipliers and carry-save addition greatly reduce the critical path of the filter, and the critical path can be reduced further by pipelining the structure. A pipeline stage of 2 to 3 FA delays can be achieved by placing pipeline registers between the CSAs and the adders, as shown in Fig. 2.10. The register cost per filter tap for bit-level pipelining is

$$N_{reg,n} = \min\{D_n + 2,\ 4\} \cdot W_{int} \qquad (2.15)$$

where $W_{int}$ is the internal word length of the filter. Pipelining down to a single FA delay would require substantially more pipeline registers, so only two-stage pipelining is incorporated as an option in the module generator. The final architecture for a four-digit CSD linear phase tap using carry-save addition is shown in Fig. 2.11.
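The per-tap register cost can be summed over all taps to compare pipelining options. The Python sketch below is a direct transcription of the register-cost expression; the helper name and the example tap data are hypothetical.

```python
def pipeline_registers(spt_terms_per_tap: list[int], w_int: int) -> int:
    """Total register bits for bit-level pipelining of all taps, using
    N_reg,n = min(D_n + 2, 4) * W_int for each tap."""
    return sum(min(d + 2, 4) * w_int for d in spt_terms_per_tap)


# hypothetical filter: four taps with 1 to 4 SPT terms, 16-bit internal words
print(pipeline_registers([1, 2, 3, 4], w_int=16))  # 240
```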


Fig. 2.10 Transposed direct form filter with 2- and 3-FA delay pipelining.

Fig. 2.11 Symmetric transposed direct form architecture using carry-save addition.


Chapter 3
Multirate Multistage Digital FIR Filter Design

3.1 Basic Multirate Operations


A multirate system is characterized by the presence of signals at different sampling rates. Such systems are used in audio and video processing, communication systems, general digital filtering, transform analysis, and more. The two basic operations in a multirate system are decreasing and increasing the sampling rate of a signal. The former is called decimation, or down-sampling; the latter is called interpolation, or up-sampling.

3.1.1 Decimation
Fig. 3.1(a) shows the M-fold decimator, which takes an input sequence x(n) and produces the output sequence

$$y(n) = x(Mn) \qquad (3.1)$$

where M is an integer; y(n) is obtained by keeping only every M-th sample of the input signal x(n) and discarding all others. Fig. 3.1(b) demonstrates the idea for M=2. As shown mathematically below, decimation results in aliasing unless x(n) is suitably bandlimited. In general, therefore, x(n) cannot be recovered from y(n) if aliasing occurs.
Fig. 3.1 (a) M-fold decimator (input rate f = F, output rate f = F/M). (b) Demonstration of decimation for M=2.

For the M-fold decimator of Eqn. (3.1), the output spectrum $Y(e^{j\omega})$ can be expressed in terms of $X(e^{j\omega})$ as

$$Y(e^{j\omega}) = \frac{1}{M}\sum_{k=0}^{M-1} X\big(e^{j(\omega - 2\pi k)/M}\big) \qquad (3.2)$$

Fig. 3.2 demonstrates the spectrum analysis for M=2. As the figure shows, the stretched version $X(e^{j\omega/M})$ may overlap with its shifted replicas. When this happens, the input samples x(n) cannot be recovered from the decimated version y(n). This overlap effect is called aliasing.
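Aliasing can also be seen directly in the time domain. In this Python sketch (our own example with an arbitrarily chosen tone frequency), a cosine at 0.8π rad/sample is decimated by M = 2 without a decimation filter. Since 0.8π exceeds π/M, the decimated samples coincide with those of a 0.4π tone.

```python
import math

M = 2
N_IN = 40
# a tone at 0.8*pi rad/sample: above pi/M, so it is not bandlimited for M = 2
x = [math.cos(0.8 * math.pi * n) for n in range(N_IN)]
y = x[::M]  # y(n) = x(Mn), Eqn. (3.1)

# In y the tone would sit at M * 0.8*pi = 1.6*pi rad/sample, which cannot be told
# apart from 2*pi - 1.6*pi = 0.4*pi: the samples equal a 0.4*pi tone (aliasing).
alias = [math.cos(0.4 * math.pi * n) for n in range(len(y))]
max_error = max(abs(a - b) for a, b in zip(y, alias))
print(max_error < 1e-9)  # True
```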

Fig. 3.2 Spectrum analysis of the downsampling effect with M=2 ($X(e^{j\omega})$, and $Y(e^{j\omega})$ showing aliasing after decimation).

As shown in Fig. 3.3(a), a lowpass digital filter, called the decimation filter, precedes the downsampler. This filter ensures that the signal being decimated is bandlimited. The exact band edges of the decimation filter depend on how much aliasing is permitted. The simplest lowpass decimation filter has the magnitude response sketched in Fig. 3.3(b); typically, the cutoff frequency is placed at $\pi/M$.
Fig. 3.3 (a) Block diagram of an M-to-1 decimator (input rate F, output rate F/M). (b) Typical magnitude response $|H(e^{j\omega})|$ of the decimation filter.


3.1.2 Interpolation
Fig. 3.4(a) shows the building block of an L-fold interpolator (or expander). The device takes an input x(n) and, by inserting L-1 zeros between each pair of consecutive samples, produces the output sequence

$$y(n) = \begin{cases} x(n/L), & \text{if } n \text{ is an integer multiple of } L \\ 0, & \text{otherwise} \end{cases} \qquad (3.3)$$

where L is an integer. Fig. 3.4(b) demonstrates this operation for L=2. Clearly, the interpolation operation does not cause any loss of input information: the input x(n) can be recovered from y(n) by L-fold decimation.
Fig. 3.4 (a) 1-to-L upsampler (input rate f = F, output rate f = FL). (b) Demonstration of upsampling for L=2.

Taking the z-transform of Eqn. (3.3), the output of the interpolator can be written as

$$Y(z) = \sum_{n=-\infty}^{\infty} y(n) z^{-n} = \sum_{n\,=\,\text{mult. of } L} y(n) z^{-n} = \sum_{k=-\infty}^{\infty} y(kL) z^{-kL} = \sum_{k=-\infty}^{\infty} x(k) z^{-kL} = X(z^L). \qquad (3.4)$$

Fig. 3.5 Spectrum analysis of the upsampling effect with L=2 ($X(e^{j\omega})$, and $Y(e^{j\omega})$ with the images created by interpolation).

Fig. 3.6 (a) Block diagram of a 1-to-L interpolator (input rate F, output rate FL). (b) Typical magnitude response $|H(e^{j\omega})|$ of the interpolation filter.

From Eqn. (3.4), $Y(e^{j\omega}) = X(e^{j\omega L})$. This means that $Y(e^{j\omega})$ is an L-fold compressed version of $X(e^{j\omega})$, as demonstrated in Fig. 3.5 for L=2. The multiple copies of the compressed spectrum are the images created by the interpolation process. An interpolation filter following the upsampler suppresses these unwanted images, as shown in Fig. 3.6.
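The relation $Y(e^{j\omega}) = X(e^{j\omega L})$ can be checked numerically. The Python sketch below (illustrative sequence and frequency values) builds the L-fold expander output of Eqn. (3.3) and evaluates both DTFTs with a direct sum. It also confirms that the expanded spectrum repeats every 2π/L, which is where the images come from.

```python
import cmath
import math


def upsample(x, L):
    """L-fold expander, Eqn. (3.3): y(n) = x(n/L) when L divides n, else 0."""
    y = [0.0] * (len(x) * L)
    y[::L] = x
    return y


def dtft(seq, w):
    """DTFT of a finite sequence starting at n = 0, evaluated at w rad/sample."""
    return sum(v * cmath.exp(-1j * w * n) for n, v in enumerate(seq))


x = [1.0, 2.0, -1.0, 0.5]
L = 3
y = upsample(x, L)
w = 0.7
# Y(e^{jw}) = X(e^{jwL}): the spectrum is compressed by L ...
compressed_ok = abs(dtft(y, w) - dtft(x, w * L)) < 1e-9
# ... and repeats every 2*pi/L, which produces the images
image_ok = abs(dtft(y, w + 2 * math.pi / L) - dtft(y, w)) < 1e-9
print(compressed_ok, image_ok)  # True True
```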

3.2 The Noble Identities

A different type of cascade is shown in Fig. 3.7(a), where a filter H(z) follows a decimator, and in Fig. 3.7(c), where a filter H(z) precedes an interpolator. Such interconnections arise when the polyphase representation (Section 3.3) is used for decimation and interpolation filters. If the function H(z) is rational (i.e., a ratio of polynomials in z or $z^{-1}$), Fig. 3.7(a) can be redrawn as Fig. 3.7(b), and Fig. 3.7(c) as Fig. 3.7(d). These are called the noble identities [1]. Their proofs are given below.

Fig. 3.7 The noble identities for multirate systems. Identity 1: (a) decimator followed by H(z), output y1(n); (b) $H(z^M)$ followed by decimator, output y2(n). Identity 2: (c) H(z) followed by interpolator, output y3(n); (d) interpolator (output x'(n)) followed by $H(z^L)$, output y4(n).

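$$\begin{aligned} Y_2(z) &= \frac{1}{M}\sum_{k=0}^{M-1} X(z^{1/M}W^k)\,H\big((z^{1/M}W^k)^M\big) \\ &= \frac{1}{M}\sum_{k=0}^{M-1} X(z^{1/M}W^k)\,H\big(z\,(e^{-j2\pi k/M})^M\big) \\ &= \frac{1}{M}\sum_{k=0}^{M-1} X(z^{1/M}W^k)\,H(z\,e^{-j2\pi k}) \\ &= \frac{1}{M}\sum_{k=0}^{M-1} X(z^{1/M}W^k)\,H(z) \\ &= Y_1(z), \qquad W = e^{-j2\pi/M}. \end{aligned} \qquad (3.5)$$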

Eqn. (3.5) shows that $Y_2(z)$ is equal to $Y_1(z)$. Similarly, for Identity 2,

$$Y_4(z) = H(z^L)\,X'(z) = H(z^L)\,X(z^L) = Y_3(z) \qquad (3.6)$$

which proves that $Y_4(z)$ is the same as $Y_3(z)$.
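Noble identity 1 can also be confirmed on a finite example. The Python sketch below uses arbitrary test data, and `convolve` and `expand` are helpers written only for this illustration. It compares a decimator followed by H(z) with $H(z^M)$ followed by a decimator; the two outputs agree sample by sample.

```python
def convolve(x, h):
    """Full linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            y[i + j] += xv * hv
    return y


def expand(h, M):
    """Coefficients of H(z^M): insert M-1 zeros between taps."""
    g = [0.0] * ((len(h) - 1) * M + 1)
    g[::M] = h
    return g


x = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, -6.0, 5.0, 3.0, -5.0, 8.0]
h = [0.5, -1.0, 0.25]
M = 3

y1 = convolve(x[::M], h)             # decimator followed by H(z)
y2 = convolve(x, expand(h, M))[::M]  # H(z^M) followed by decimator
```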

3.3 The Polyphase Representation

An important advance in multirate signal processing is the invention of the polyphase representation [31]. It greatly simplifies theoretical results and leads to computationally efficient implementations of decimators and interpolators.
Consider a filter $H(z) = \sum_{n=-\infty}^{\infty} h(n) z^{-n}$. Its coefficients can be separated into an even-numbered part and an odd-numbered part, i.e., H(z) can be written as

$$H(z) = \sum_{n=-\infty}^{\infty} h(n) z^{-n} = \sum_{n=-\infty}^{\infty} h(2n) z^{-2n} + z^{-1} \sum_{n=-\infty}^{\infty} h(2n+1) z^{-2n} \qquad (3.7)$$

If we define

$$H_0(z) = \sum_{n=-\infty}^{\infty} h(2n) z^{-n}, \qquad H_1(z) = \sum_{n=-\infty}^{\infty} h(2n+1) z^{-n} \qquad (3.8)$$

the representation of H(z) can be rewritten as

$$H(z) = H_0(z^2) + z^{-1} H_1(z^2). \qquad (3.9)$$

Fig. 3.8 Reconstruction of a decimator with M=2 (the filter H(z); its polyphase components $H_0(z^2)$ and $H_1(z^2)$; and $H_0(z)$ and $H_1(z)$ operating at the low rate after the downsamplers).

This representation can be implemented directly. Fig. 3.8 shows an example of this reconstruction for a decimator with M=2. The polyphase implementation (Fig. 3.8(c)) is much more efficient than the direct implementation shown in Fig. 3.8(a). Although the downsamplers introduce some hardware overhead, $H_0(z)$ and $H_1(z)$ operate at the lower rate. Together they require only N/2 multiplications and (N-1)/2 additions per unit time, compared with the N multiplications and N-1 additions per unit time needed by the direct implementation, where N is the tap length of the decimation filter H(z).
The polyphase representation can also be used to implement interpolators. Fig. 3.9 shows the general form of the polyphase implementation of an M-fold decimator and an L-fold interpolator.
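The general M-branch decimator of Fig. 3.9(a) can be modeled behaviorally as follows. In this Python sketch (illustrative helper names; zero initial conditions assumed), branch m filters the low-rate sequence x(nM - m) with the polyphase component made of every M-th coefficient starting at h(m), and the branch outputs are summed. The result matches filtering at the high rate and then discarding M - 1 of every M outputs.

```python
def fir(x, h):
    """Direct-form FIR filter with zero initial conditions, output length len(x)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def decimate_direct(x, h, M):
    """Filter at the high rate, then keep every M-th output sample."""
    return fir(x, h)[::M]


def decimate_polyphase(x, h, M):
    """M-fold decimator using the polyphase components E_m(z) = sum_n h(nM + m) z^-n.

    Branch m filters the low-rate sequence u_m(n) = x(nM - m); the branch
    outputs are summed, so every multiplication runs at the low rate.
    """
    n_out = (len(x) + M - 1) // M
    y = [0] * n_out
    for m in range(M):
        e_m = h[m::M]  # m-th polyphase component
        u_m = [x[n * M - m] if n * M - m >= 0 else 0 for n in range(n_out)]
        for n, v in enumerate(fir(u_m, e_m)):
            y[n] += v
    return y
```

Each branch holds about N/M taps and runs at 1/M of the input rate, which is the saving described above.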


Fig. 3.9 Polyphase implementations of (a) an M-fold decimator with subfilters $H_0(z), \ldots, H_{M-1}(z)$ and (b) an L-fold interpolator with subfilters $H_0(z), \ldots, H_{L-1}(z)$.

However, the polyphase decomposition of a linear phase filter with symmetric coefficients usually destroys the symmetry of the subfilters. Thus, it can increase hardware complexity compared with the original symmetric filter implemented without the polyphase representation. Since each subfilter is obtained by taking every M-th coefficient of the original impulse response, only the subfilters whose sampled coefficients are symmetric about their center tap are linear phase; the other subfilters are not. At most two subfilters can be linear phase, as summarized in Table 3.1 [25]. The remaining nonlinear phase subfilters cannot use the folded structure and require a large number of multipliers. Therefore, when the sampling rate conversion ratio (SRCR) is even, a filter with even tap length N can be redesigned with length N+1 to obtain two linear phase subfilters and thus reduce the hardware complexity.
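The counts in Table 3.1 follow from a simple symmetry argument, which the Python sketch below checks by brute force. It labels each tap of a symmetric prototype with the index of its mirror pair (an illustrative device of ours, not part of the generator) and counts the polyphase components that read the same forwards and backwards.

```python
def linear_phase_subfilters(N: int, M: int) -> int:
    """Count the polyphase components h(nM + m) of a symmetric length-N
    prototype that are themselves symmetric (linear phase)."""
    # symbolic symmetric prototype: taps n and N-1-n share the label min(n, N-1-n)
    taps = [min(n, N - 1 - n) for n in range(N)]
    count = 0
    for m in range(M):
        sub = taps[m::M]
        if sub and sub == sub[::-1]:
            count += 1
    return count


# (length, SRCR) = (even, even), (even, odd), (odd, odd), (odd, even)
print([linear_phase_subfilters(N, M) for N, M in [(10, 2), (10, 3), (11, 3), (11, 2)]])  # [0, 1, 1, 2]
```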


Table 3.1 Number of linear phase subfilters if the prototype filter is linear phase.

| Filter Length | Sampling Rate Conversion Ratio | Number of Linear Phase Subfilters |
|---|---|---|
| Even | Even | 0 |
| Even | Odd | 1 |
| Odd | Odd | 1 |
| Odd | Even | 2 |

3.4 Interpolated FIR Filter Design


FIR digital filters are well known to have desirable properties such as stability, linear phase response and smaller quantization error effects. Their main drawback is the large number of arithmetic operations needed for implementation, especially for filters with narrow transition bands. To cope with the computational complexity of sharp narrowband FIR filters, the interpolated FIR (IFIR) filter technique was introduced [12]. Its basic idea is to implement the filter H(z) as a cascade of two FIR sections:

$$H(z) = G(z^L)\, I(z) \qquad (3.10)$$

where $G(z^L)$ is a periodic model filter whose sparse impulse response has only every L-th sample nonzero, and I(z) is an image suppressor that can be implemented with only a few arithmetic operations.
In the frequency domain, $G(z^L)$ has a periodic frequency response with period $2\pi/L$ and is designed to shape the passband, the transition band and the stopband in the vicinity of the passband, while I(z) is designed to attenuate the unwanted image passbands created by $G(z^L)$. If $\delta_p$ denotes the passband deviation and $\delta_s$ the stopband deviation, the overall IFIR filter must meet the requirements

$$1-\delta_p \le \left|G(e^{j\omega L})\, I(e^{j\omega})\right| \le 1+\delta_p \quad \text{in the passband} \qquad (3.11)$$

and

$$\left|G(e^{j\omega L})\, I(e^{j\omega})\right| \le \delta_s \quad \text{in the stopband.} \qquad (3.12)$$

The time and frequency domain behaviors of the IFIR approach applied to a lowpass filter design with L=3 are illustrated in Fig. 3.10.

Fig. 3.10 Time and frequency domain behaviors of IFIR low-pass filter with L=3.
The image suppressor I(z) can, in general, also be implemented as a multistage structure, expressed as [13]

$$I(z) = I_1(z)\, I_2(z^{L_1})\, I_3(z^{L_1L_2}) \cdots I_k(z^{L_1L_2\cdots L_{k-1}}) \qquad (3.13)$$

$$I_k(z) = \sum_{n=0}^{N_{I_k}} i_k(n)\, z^{-n} \qquad (3.14)$$

where the $L_k$'s are selected such that

$$L_{k-1} = \frac{L}{L_1 L_2 \cdots L_{k-2}} \qquad (3.15)$$

is an integer. If the stopband edge frequency of the lowpass IFIR filter is denoted by $\omega_s$, the maximum value of the interpolation factor L is

$$L_{max} = \left\lfloor \frac{\pi}{\omega_s} \right\rfloor \qquad (3.16)$$

where the brackets denote truncation. As an example of a two-stage implementation, consider the IFIR filter performance versus transition region bandwidth for p = 0.5, $A_p$ = 0.1 dB and $A_s$ = 40 dB. Fig. 3.11(a) shows the reduction factor of the IFIR filter design over the conventional filter design, defined as

$$SF = \frac{N_{CON}}{N_{IFIR}}, \qquad N_{IFIR} = N_I + N_G \qquad (3.17)$$

where $N_{CON}$ is the order of the conventional filter, $N_I$ is the order of I(z) and $N_G$ is the order of G(z). Fig. 3.11(b) shows the optimum interpolation factors versus transition region bandwidth. In this example, the narrower the transition band, the higher the interpolation factor L and the larger the reduction factor SF, which reaches up to about 6.
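The trade-off behind Fig. 3.11 can be reproduced roughly with a filter-order estimate. The Python sketch below substitutes Kaiser's order formula for an actual filter design. This estimate, the equal split of the passband ripple between G and I, and the helper names are our assumptions, not the generator's procedure. It evaluates SF of Eqn. (3.17) for every L up to $L_{max}$ of Eqn. (3.16), with frequencies normalized to units of π.

```python
import math


def kaiser_order(wp, ws, dp, ds):
    """Kaiser's estimate of the FIR order; band edges in units of pi rad/sample."""
    df = (ws - wp) / 2  # transition width in cycles/sample
    return math.ceil((-20 * math.log10(math.sqrt(dp * ds)) - 13) / (14.6 * df))


def ifir_estimate(wp, ws, dp, ds):
    """Estimated SF = N_CON / (N_G + N_I) for each admissible L of a
    two-section IFIR filter H(z) = G(z^L) I(z)."""
    n_con = kaiser_order(wp, ws, dp, ds)
    l_max = math.floor(1 / ws + 1e-9)  # Eqn. (3.16) with ws in units of pi
    sf = {}
    for L in range(2, l_max + 1):
        # assumption: the passband ripple is split equally between G and I
        n_g = kaiser_order(L * wp, L * ws, dp / 2, ds)  # model filter G(z)
        n_i = kaiser_order(wp, 2 / L - ws, dp / 2, ds)  # image suppressor I(z)
        sf[L] = n_con / (n_g + n_i)
    return l_max, sf


l_max, sf = ifir_estimate(wp=0.01, ws=0.02, dp=0.01, ds=0.001)
best = max(sf, key=sf.get)
print(l_max, best, round(sf[best], 2))
```

For this narrowband example the estimate peaks near L = 11 with SF of roughly 5.8; actual figures depend on the design method used for G(z) and I(z).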
In [12], IFIR filters were designed with simple interpolators; in this simplest case, the image suppressor I(z) is an ordinary lowpass filter, which gives the most robust and fastest design. In further optimized IFIR designs [13], I(z) is designed with a don't-care region where the periodic model filter $G(z^L)$ already provides the required attenuation, as illustrated in Fig. 3.12; this leads to fewer coefficients in I(z). A more advanced and much more involved IFIR design jointly optimizes $G(z^L)$ and I(z) to meet the required specification, yielding significant savings in the order of I(z) and slighter savings in the order of $G(z^L)$. MATLAB provides a useful function, ifir, which supports both the simple and the advanced design approaches. The advanced IFIR design method gives the fewest coefficients, and hence the fewest multipliers and the lowest hardware complexity. However, the maximum magnitude of its final coefficients may exceed 1, so its coefficient range is larger than that of the simplest design method. This can be a problem when the filter is realized with finite precision. To prevent coefficient overflow, our module generator allocates more nonzero digits or uses scaling to compress the coefficient range.

Fig. 3.11 IFIR filter performance versus transition region bandwidth (π rad/sample) for p = 0.5, $A_p$ = 0.1 dB and $A_s$ = 40 dB: (a) hardware reduction factor SF over conventional filter design for L = 2 to 8; (b) optimum interpolation factors.

Fig. 3.12 Frequency domain behavior of an IFIR low-pass filter whose image suppressor I(z) is designed with a don't-care region.
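The don't-care idea can be made concrete by listing the bands on which I(z) actually has to be specified. The Python sketch below (a hypothetical helper; lowpass case only, frequencies in units of π) returns the passband and the stopbands centred on the images at 2πk/L. Every other frequency is either the transition band or is already attenuated by $G(z^L)$, and is therefore left unconstrained.

```python
def suppressor_bands(wp, ws, L):
    """Band specification for the image suppressor I(z) of H(z) = G(z^L) I(z).

    Frequencies are in units of pi rad/sample. G(z^L) has images of its passband
    and transition band around 2*pi*k/L, so I(z) must attenuate
    [2k/L - ws, 2k/L + ws]. Frequencies in neither list are don't-care for I(z).
    """
    passband = (0.0, wp)
    stopbands = []
    for k in range(1, L // 2 + 1):
        centre = 2 * k / L  # image centre at 2*pi*k/L
        stopbands.append((centre - ws, min(centre + ws, 1.0)))
    return passband, stopbands


# example: wp = 0.05*pi, ws = 0.1*pi, L = 4 gives stopbands near [0.4, 0.6] and [0.9, 1.0]
passband, stopbands = suppressor_bands(0.05, 0.1, 4)
```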

By carefully selecting the interpolation factor L and the number of stages, and by choosing the best method to implement the interpolator I(z), an optimum IFIR filter design with minimum hardware complexity can be obtained. The price paid for these reductions is only a slight increase in the number of delay elements compared with the direct implementation. In addition, the IFIR implementation has smaller coefficient sensitivity and better roundoff noise performance than the direct implementation [13].


3.5 Multirate Multistage Filter Design

In many applications, a decimator / interpolator with a large decimation / interpolation ratio is required. Although this can be done by designing a single filter directly and using the polyphase structure to save arithmetic operations, it is more efficient to design it in multiple stages [1][2], and the IFIR technique is still applicable.
For the decimator shown in Fig. 3.13(a), the lowpass filter H(z) becomes a narrowband filter as the decimation ratio M becomes large, so the IFIR technique can be used to reduce the hardware complexity of H(z).

Fig. 3.13 Multistage IFIR decimator design.

Fig. 3.14 Multistage IFIR interpolator design: the IFIR technique replaces H(z) by $G(z^{L_1})\,I(z)$ with $L = L_1 L_2$, the noble identity moves G(z) ahead of the $L_1$-fold upsampler, and polyphase decomposition yields the subfilters $G_0(z), \ldots, G_{L_2-1}(z)$ and $I_0(z), \ldots, I_{L_1-1}(z)$.

If the interpolation factor L of the periodic model filter $G(z^L)$ is chosen to be $M_1$, as shown in Fig. 3.13(b), the decimator can be restructured into Fig. 3.13(c) using the noble identity. With this structure, the decimator is divided into two sections, and both can be implemented with the polyphase representation using the fewer coefficients of the image suppressor I(z) and the model filter G(z), as shown in Fig. 3.13(d). The interpolator can be designed in the same way, as shown in Fig. 3.14.
Furthermore, the multistage IFIR decimator / interpolator structure can be extended to three or more stages. Fig. 3.15 shows the derivation of the structure with a three-stage decomposition.
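The equivalence between the single-stage form and the two-stage form of Fig. 3.13 can be verified on test data. In this Python sketch (illustrative names; the integer coefficients are arbitrary and not a designed filter), $H(z) = I(z)\,G(z^{M_1})$ is applied at the input rate before decimating by $M_1 M_2$, and the result is compared with I(z) followed by decimation by $M_1$, then G(z) followed by decimation by $M_2$.

```python
def fir(x, h):
    """Direct-form FIR filter with zero initial conditions, output length len(x)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def expand(g, L):
    """Coefficients of G(z^L): insert L-1 zeros between taps."""
    out = [0] * ((len(g) - 1) * L + 1)
    out[::L] = g
    return out


def single_stage(x, i, g, m1, m2):
    """H(z) = I(z) G(z^M1) applied at the input rate, then decimation by M1*M2."""
    return fir(fir(x, i), expand(g, m1))[::m1 * m2]


def two_stage(x, i, g, m1, m2):
    """Noble identity: I(z), decimate by M1, then G(z) at the lower rate, decimate by M2."""
    v = fir(x, i)[::m1]
    return fir(v, g)[::m2]
```

In the two-stage form G(z) runs at $1/M_1$ of the input rate with its original, unexpanded taps, which is where the savings come from.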


Fig. 3.15 Multistage IFIR decimator with three-stage decomposition: $H(z) = I_1(z)\, I_2(z^{M_1})\, G(z^{M_1 M_2})$ with $M = M_1 M_2 M_3$ is rearranged by the noble identities into $I_1(z)$ followed by decimation by $M_1$, $I_2(z)$ followed by decimation by $M_2$, and G(z) followed by decimation by $M_3$.


Chapter 4
Module Generator Implementation

This chapter discusses the design flow of the module generator and its program implementation issues. The system configuration and dataflow of the module generator are shown in Fig. 4.1. The module generator consists of several sub-modules, the main ones being multistage architecture analysis and synthesis, coefficient calculation, coefficient optimization, word length estimation and synthesizable Verilog code generation. All modules are written in C++, and the operation of each module is described in the following sections.

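4.1 Specifications

The inputs of our module generator are the system-level specifications listed in Table 4.1, where the passband ripple $A_P$ and the stopband attenuation $A_S$ in dB are defined as

$$A_P = -20\log_{10}(1-\delta_P) \qquad (4.1)$$

$$A_S = -20\log_{10}\delta_S \qquad (4.2)$$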
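Conversions between the magnitude and dB forms of the ripple specifications are sketched below in Python, assuming the conventional definitions $A_P = -20\log_{10}(1-\delta_P)$ and $A_S = -20\log_{10}\delta_S$. The helper names are ours.

```python
import math


def passband_ripple_db(delta_p: float) -> float:
    """A_P = -20 log10(1 - delta_P)."""
    return -20 * math.log10(1 - delta_p)


def stopband_attenuation_db(delta_s: float) -> float:
    """A_S = -20 log10(delta_S)."""
    return -20 * math.log10(delta_s)


def delta_p_from_db(a_p: float) -> float:
    """Inverse of passband_ripple_db."""
    return 1 - 10 ** (-a_p / 20)


def delta_s_from_db(a_s: float) -> float:
    """Inverse of stopband_attenuation_db."""
    return 10 ** (-a_s / 20)


print(round(stopband_attenuation_db(0.001), 3), round(passband_ripple_db(0.01), 4))  # 60.0 0.0873
```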


Fig. 4.1 Design flow of the module generator: (a) the digital FIR filter design flow; (b) the multirate multistage digital FIR filter / decimator / interpolator design flow.
Table 4.1 Input data of system specifications.

| Specification | Symbol |
|---|---|
| Filter Type (LP, HP, BP, BS, Decimator, Interpolator, Multistage FIR) | Tf |
| Normalized Passband and Stopband Edge Frequencies | $\omega_P$, $\omega_S$ |
| Passband Ripple in Magnitude or dB | $\delta_P$ / $A_P$ |
| Stopband Attenuation in Magnitude or dB | $\delta_S$ / $A_S$ |
| Input Word Length (bit) | $W_{in}$ |
| Signal to Noise Ratio (dB) | SNR |
| Up Conversion Ratio | |
| Down Conversion Ratio | |

Fig. 4.2 Specification of a lowpass filter (amplitude versus normalized frequency in π rad/sample, showing the passband ripple between $1-\delta_P$ and $1+\delta_P$, the stopband ripple within $\pm\delta_S$, the transition width, and the ideal lowpass filter response).

Fig. 4.2 provides a graphical description of the specifications of a lowpass filter. Because the impulse response of the ideal lowpass filter is infinitely long, an ideal FIR lowpass filter cannot be designed. Finite-length approximations to the ideal impulse response lead to ripples ($\delta_p$ and $\delta_s$) in both the passband ($\omega < \omega_p$) and the stopband ($\omega > \omega_s$) of the filter, as well as to a nonzero transition width ($\omega_s - \omega_p$) between the passband and the stopband. Both the passband/stopband ripples and the transition width are undesirable but unavoidable deviations from the ideal lowpass response when it is approximated with a finite impulse response. Practical FIR designs therefore aim to meet given design specifications, i.e., a transition width and maximum passband/stopband ripples that do not exceed allowable values.


4.2 Multistage Architecture Analysis

In many applications, a decimator / interpolator with a large decimation / interpolation ratio is required. Although such a filter can be designed directly, with the polyphase structure used to save arithmetic operations, it is more efficient to design it in multiple stages using the IFIR technique [12]. In addition, the interpolated FIR filter can implement narrowband FIR filter designs with significantly reduced hardware complexity relative to conventional FIR filters.

4.2.1 Interpolated FIR Filter Decomposition

In this subsection, the optimal decomposition of IFIR filters is discussed for both single-stage and multistage implementations of the image suppressor I(z). The minimum subfilter orders are estimated accurately, and with these estimates a nearly optimum decomposition can be found. The optimal filter decomposition depends on the stopband edge as well as on the relative transition width of the filter. In the following, the three cases shown in Table 4.2 are considered as design examples [13].

Table 4.2 Three cases for IFIR decompositions.

Case   ωp/π   ωs/π   δp     δs
I      0.05   0.1    0.01   0.001
II     0.09   0.1    0.01   0.001
III    0.01   0.02   0.01   0.001


Fig. 4.3 shows the total number of taps required in these three cases to implement I(z), G(z^L), and the overall filter as a function of the interpolation factor L for the single-stage implementation of I(z). The interpolation factor L = 1 corresponds to the conventional direct-form FIR filter. As shown in these figures, IFIR filters provide significant reductions in the number of taps compared with conventional direct-form designs. As L increases, the number of taps of G(z^L) decreases rapidly while the number of taps of I(z) increases rapidly. L can be increased until the decrease in the number of taps of G(z^L) becomes smaller than the increase in the number of taps of I(z); at that point the total number of taps of the overall filter reaches its minimum. The maximum interpolation factor is limited to L_max = π/ωs, which gives L_max = 10, 10, and 50 for cases I, II, and III, respectively. Comparing cases I and II shows that, when the relative transition bandwidth is made smaller while the stopband edge is kept the same, the optimum interpolation factor L_opt becomes larger. Comparing cases II and III, which have the same transition width, the filter with the smaller stopband edge has a relatively larger tap contribution from I(z), so L_opt/L_max decreases. However, because the absolute value of L_opt increases, the saving in the number of arithmetic operations becomes larger.
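The trade-off between G(z^L) and I(z) can be explored with a rough estimate. The Python sketch below is not the module generator's estimator: it assumes Kaiser's order formula for both subfilters, splits the passband ripple equally between them, and assumes I(z) must suppress the first image of G(z^L), which begins at 2π/L − ωs. Band edges are normalized to π, so L_max = ⌊1/ωs⌋.

```python
import math

def kaiser_taps(wp, ws, dp, ds):
    """Kaiser's estimate of the tap count of a lowpass FIR filter.

    wp, ws: passband / stopband edges normalised to pi (0 < wp < ws <= 1).
    dp, ds: linear passband / stopband ripples.
    """
    df = (ws - wp) / 2.0  # transition width in cycles per sample
    order = (-20.0 * math.log10(math.sqrt(dp * ds)) - 13.0) / (14.6 * df)
    return math.ceil(order) + 1

def ifir_taps(wp, ws, dp, ds, L):
    """Estimated (N_G, N_I) for H(z) = G(z^L) I(z); L = 1 is the conventional filter."""
    if L == 1:
        return kaiser_taps(wp, ws, dp, ds), 0
    # Assumption: the passband ripple is split equally between G(z) and I(z).
    n_g = kaiser_taps(L * wp, L * ws, dp / 2, ds)
    # I(z) keeps [0, wp] and removes the first image of G(z^L), starting at 2/L - ws.
    n_i = kaiser_taps(wp, 2.0 / L - ws, dp / 2, ds)
    return n_g, n_i

def best_interpolation_factor(wp, ws, dp, ds):
    """Sweep L = 1 .. L_max with L_max = floor(pi / ws); return (L_opt, totals)."""
    l_max = math.floor(1.0 / ws)
    totals = {L: sum(ifir_taps(wp, ws, dp, ds, L)) for L in range(1, l_max + 1)}
    l_opt = min(totals, key=totals.get)
    return l_opt, totals
```

With the case I values of Table 4.2, `best_interpolation_factor(0.05, 0.1, 0.01, 0.001)` estimates 103 taps at L = 1 and finds a much smaller total at an intermediate L. Because the order formula and the ripple split are assumptions, the IFIR totals will not match Table 4.3 exactly.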
Fig. 4.4 shows the total number of taps required in case III to implement I(z), G(z^L), and the overall filter as a function of the interpolation factor L (= L1L2) for the two-stage implementation of I(z) (= I1(z)I2(z^{L1})). In case III, the two-stage implementation of I(z) saves a significant number of taps of the overall filter compared with the single-stage implementation. This is because the two-stage implementation of I(z) requires considerably fewer taps for I(z), and the optimum decomposition occurs at a higher value of L (= L1L2), which also decreases the number of taps of G(z^L).

[Panels: Case I, Case II, Case III.]
Fig. 4.3 The number of taps versus interpolated factor L for the periodic model filter G(z^L), the image suppressor I(z) and the overall filter H(z).

[Panels: tap number of G(z^{L1 L2}), I1(z), I2(z) and H(z) versus interpolated factors L1 and L2.]
Fig. 4.4 In case III, the decompositions of two-stage designs of I(z) for various values of L1 and L2.

When the single-stage implementation of I(z) already has very few taps, a multistage implementation of I(z) provides only a slight saving over the single-stage implementation. Table 4.3 summarizes the optimal IFIR filters with single-stage and two-stage implementations of I(z) for the three cases. It can be observed that, for a filter with a narrow passband and transition band, the IFIR method significantly reduces the total number of taps of the overall filter, and the two-stage implementation of I(z) reduces the total further compared with the single-stage implementation. The optimal IFIR filters obtained by our module generator for cases I to III require 12% more taps than those in [13]. This is because our module generator decomposes the IFIR filters with more stringent specifications to guarantee that the final design satisfies the system specification, so the overall design takes more taps.

Table 4.3 Summary of the optimal IFIR filters with single-stage and two-stage implementations of I(z).

Single-stage implementations of I(z):
Case   NCON   NH    NI    NG    L       R
I      103    43    15    28            2.53
II     510    126   39    87            4.09
III    510    88    41    47    11-13   5.79

Two-stage implementations of I(z):
Case   NCON   NH    NI                    NG    L                  R
III    510    60    31 (NI1=15; NI2=16)   29    20 (L1=4; L2=5)    8.5

Note:
NCON is the number of taps of the conventional filter.
NH is the minimum number of taps of the overall multistage filter.
NI is the number of taps of I(z).
NG is the number of taps of G(z^L).
L is the interpolation factor.
R is the reduction ratio of the conventional design taps relative to the multistage total taps.

The IFIR decomposition analysis in our module generator selects the several decompositions with the fewest taps in the overall filter H(z) to implement the multistage designs.


4.2.2 Multirate Multistage Filter Decomposition


Following the previous subsection, we analyze the optimal decomposition of the IFIR designs and use the polyphase structure to save arithmetic operations. Choosing the number of stages K and the factorization of the sampling rate conversion ratio M is not a trivial problem. In practice, however, the number of stages K is rarely larger than four, and for a given value of M there is only a limited set of possible integer factors. Thus, a feasible approach is to determine all the possible factorizations of M. For a low-power approach, we use the multistage IFIR technique and follow the relationship of the sampling rate conversion ratio (SRCR)

$$M = M_1 M_2 \cdots M_K \qquad (4.3)$$

to decompose the decimator / interpolator into all the possible multistage sets. For a K-stage decimator / interpolator, the filter specification for each stage shall be chosen to ensure that the overall filter requirements are met, as shown in Fig. 4.5, where the passband ripple of each stage is δP/K and the stopband ripple is δS.

Fig. 4.5 Specifications of multistage IFIR decimation filter design.
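Because K is small, all candidate decompositions of Eqn. (4.3) can simply be enumerated. The sketch below (hypothetical function names, not the generator's code) lists every ordered factorization of M into at most four factors greater than one and attaches the per-stage ripples δP/K and δS described for Fig. 4.5.

```python
def factorizations(m, max_stages=4):
    """All ordered tuples (M1, ..., MK) with M1 * ... * MK == m, each Mi >= 2, K <= max_stages."""
    if m == 1:
        return [()]
    if max_stages == 0:
        return []
    result = []
    for f in range(2, m + 1):
        if m % f == 0:
            # Stage ratio f first, then every decomposition of the remaining ratio.
            for rest in factorizations(m // f, max_stages - 1):
                result.append((f,) + rest)
    return result

def stage_specs(m, dp, ds, max_stages=4):
    """Pair each decomposition with its per-stage ripples (dp / K, ds)."""
    if m < 2:
        raise ValueError("the overall ratio M must be at least 2")
    return [(stages, dp / len(stages), ds) for stages in factorizations(m, max_stages)]
```

The order of the factors matters because each stage runs at a different rate, so (2, 6) and (6, 2) are kept as separate candidates for the later tap-count comparison.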
Moreover, polyphase decomposition is used to decompose the filter into subfilters. To serve both high-speed and low-speed applications, the transposed direct form structure is chosen. However, decomposing a linear-phase filter into M subfilters usually destroys the symmetry of the subfilters and results in nonlinear-phase subfilters. Therefore, when the SRCR is even, our module generator redesigns a filter with an even tap length N to have N+1 taps; this yields two linear-phase subfilters (an even-length filter yields none when the SRCR is even), which reduces the hardware complexity.
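The effect of the N to N+1 redesign can be checked numerically. This sketch splits a symmetric impulse response into M polyphase components e_k(n) = h(nM + k) and counts the components that are symmetric on their own, plus the mirror-symmetric pairs (e_j equal to e_k reversed). The tap values produced by `symmetric_taps` are placeholders chosen so that no symmetry appears by accident.

```python
def polyphase(h, m):
    """Polyphase components e_k(n) = h(n*m + k), k = 0 .. m-1."""
    return [h[k::m] for k in range(m)]

def symmetric_taps(n):
    """Placeholder symmetric (linear-phase) impulse response of length n."""
    half = list(range(1, n // 2 + 1))
    middle = [n // 2 + 1] if n % 2 else []
    return half + middle + half[::-1]

def count_linear_phase(h, m):
    """Number of polyphase components that are symmetric by themselves."""
    return sum(1 for e in polyphase(h, m) if e and e == e[::-1])

def mirror_pairs(h, m):
    """Index pairs (j, k), j < k, whose components satisfy e_k == reversed(e_j)."""
    comps = polyphase(h, m)
    return [(j, k) for j in range(m) for k in range(j + 1, m)
            if comps[j] and comps[k] == comps[j][::-1]]
```

With M = 4, the 8-tap placeholder has no symmetric component and two mirror pairs, (e0, e3) and (e1, e2); the 9-tap version has two symmetric components, e0 and e2, and one mirror pair, (e1, e3).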

4.3 Coefficient Calculation

The floating-point filter coefficients h(k) are generated by the Parks-McClellan optimal equiripple method, as implemented in the MATLAB gremez.m function [32]. If the coefficients do not satisfy the desired filter specifications, the filter order is increased and the coefficients are recalculated. In addition, the user can input coefficients derived from other filter analysis packages.
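For readers without MATLAB, the order-increment loop can be sketched with SciPy's `scipy.signal.remez` and `scipy.signal.freqz` in place of gremez.m. Assumptions: band edges are normalized so that 1.0 is the Nyquist frequency, the passband ripple specification is peak-to-peak in dB, and the stopband error is weighted by δp/δs.

```python
import numpy as np
from scipy.signal import freqz, remez

def design_lowpass(fpass, fstop, rp_db, as_db, n_start=5, n_max=301):
    """Shortest Parks-McClellan lowpass that meets the spec (edges: 1.0 = Nyquist)."""
    g = 10 ** (rp_db / 20)
    dp = (g - 1) / (g + 1)          # linear passband ripple (peak-to-peak spec)
    ds = 10 ** (-as_db / 20)        # linear stopband ripple
    for n in range(n_start, n_max + 1):
        try:
            h = remez(n, [0, fpass, fstop, 1], [1, 0], weight=[1, dp / ds], fs=2)
        except ValueError:          # remez can fail to converge for some lengths
            continue
        w, H = freqz(h, worN=4096, fs=2)
        mag = np.abs(H)
        pb, sb = mag[w <= fpass], mag[w >= fstop]
        ripple_db = 20 * np.log10(pb.max() / pb.min())
        atten_db = -20 * np.log10(sb.max())
        if ripple_db <= rp_db and atten_db >= as_db:
            return h                # stop at the first length that meets the spec
    raise ValueError("specification not met within n_max taps")
```

For example, `design_lowpass(0.3, 0.5, 0.05, 50)` searches for the shortest filter that meets a 0.05 dB / 50 dB lowpass specification with band edges at 0.3 and 0.5.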

4.4 Coefficient Optimization

Simply rounding a filter's floating-point coefficients to their nearest CSD values does not usually yield satisfactory performance in terms of implementation complexity and frequency response. Many search algorithms for the design of multiplierless filters with power-of-two and CSD coefficients have been published.


The two most popular techniques for CSD coefficient optimization are mixed-integer linear programming (MILP) [33] and local search [23]. MILP is known to be the optimal technique for designing FIR filters with conventional fixed-word-length coefficients. A drawback of MILP is that its computation time grows at least exponentially with filter length, which limits its application to filters of short to medium length. Moreover, for filters with CSD coefficients, even though MILP optimally searches the SPT coefficient space, it does not guarantee that the resulting solution has the minimum total number of adders. The two major goals of a CSD search algorithm are therefore:
(1) a filter that can be implemented with minimum hardware, and
(2) minimum computation time for the design procedure.
Local search techniques have been found to perform nearly as well as MILP while requiring substantially less computation time to converge. Following the methods presented in [23][34], we adopt a two-step local search algorithm to round and optimize our filter coefficients into CSD codes, as discussed in the following two subsections.

4.4.1 Scaling Strategy


First, according to Eqn. (2.7), the number of nonzero digits (Dn) and the coefficient word length (W) are selected. A table of all possible CSD coefficients between 0 and 2^{-1} is created and saved for later use.
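A minimal sketch of such a table, assuming a W-bit fractional format in which the integer code k represents k·2^{-W} (Eqn. (2.7) itself is not reproduced here). `csd_digits` uses the standard non-adjacent-form recoding, and `csd_table` keeps every value in [0, 2^{-1}] whose CSD form needs at most Dn nonzero digits; both names are illustrative.

```python
def csd_digits(k):
    """CSD (non-adjacent form) digits of integer k, least-significant digit first."""
    digits = []
    while k != 0:
        if k & 1:
            d = 2 - (k & 3)     # +1 if k = 1 (mod 4), -1 if k = 3 (mod 4)
            k -= d
        else:
            d = 0
        digits.append(d)
        k >>= 1                 # k is even here, so the shift is exact
    return digits

def nonzero_digits(k):
    """Number of nonzero CSD digits of integer k."""
    return sum(1 for d in csd_digits(k) if d != 0)

def csd_table(W, Dn):
    """Values k * 2**-W in [0, 2**-1] whose CSD form has at most Dn nonzero digits."""
    return [k / 2 ** W for k in range(0, 2 ** (W - 1) + 1) if nonzero_digits(k) <= Dn]
```

For instance, 7 is recoded as 8 − 1, i.e. digits [−1, 0, 0, 1], so it costs two nonzero digits instead of three.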
The shape of the frequency response of an FIR filter is unaffected by multiplying all the filter coefficients by a fixed scale factor (SF). The SF simply inserts an additional gain or attenuation into the frequency response, which can easily be compensated by a constant gain stage before or after the filter. The set of numbers represented by a CSD code with a fixed number of nonzero digits is not uniformly distributed. Therefore, properly scaling the ideal filter coefficients before rounding them to the nearest CSD code can usually reduce the magnitudes of the coefficient quantization errors significantly, which improves the frequency response. Fig. 4.6 shows the flowchart of the scaling strategy for filter coefficients.

Since the coefficient quantization process is highly nonlinear, there is no way to predict in advance which SF will produce the best results, so a brute-force search over SF must be performed. All the filter coefficients are assumed to be in the range [−0.5, 0.5]. The SF can then be constrained to lie between SF_min and SF_max, which are defined as follows: multiplying by SF_max makes the absolute value of the largest coefficient equal to 2^{-1}, and multiplying by SF_min makes it equal to 2^{-2}. During the search, SF is stepped from SF_min to SF_max with a step size of 2^{-q}, where q is the coefficient word length. For each SF, the frequency response is computed only if the quantized CSD coefficients differ from the previous ones. Finally, we select the SF that yields the minimum total number of SPT terms, $\sum_n D_n$, while fulfilling the filter specification.

[Flowchart: SF is swept from SF_min to SF_max in (SF_max − SF_min)/2^{-q} steps.]
Fig. 4.6 The flowchart of the scaling strategy for filter coefficients.
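The scaling search can be sketched as follows. Assumptions: the CSD table is rebuilt here as the set of values in [0, 2^{-1}] formed by at most Dn signed power-of-two terms 2^{-1} to 2^{-W}, the step size uses q = W, and `meets_spec` is a caller-supplied (hypothetical) check that would normally evaluate the frequency response of the quantized coefficients divided by SF.

```python
import bisect
from itertools import combinations, product

def spt_levels(W, Dn):
    """Map each value in [0, 0.5] built from at most Dn signed terms 2**-1 .. 2**-W
    to the minimum number of terms it needs."""
    terms = {0.0: 0}
    for n in range(1, Dn + 1):                 # fewer terms first, so counts are minimal
        for exps in combinations(range(1, W + 1), n):
            for signs in product((1, -1), repeat=n):
                v = sum(s * 2.0 ** -e for s, e in zip(signs, exps))
                if 0.0 <= v <= 0.5 and v not in terms:
                    terms[v] = n
    return terms

def quantize(c, levels):
    """Round c to the nearest table value, keeping its sign."""
    a = abs(c)
    i = bisect.bisect_left(levels, a)
    nearest = min(levels[max(i - 1, 0):i + 1], key=lambda v: abs(v - a))
    return nearest if c >= 0 else -nearest

def scale_search(h, W, Dn, meets_spec):
    """Sweep SF from SF_min to SF_max in steps of 2**-W and keep the feasible
    quantized set with the fewest SPT terms; returns (terms, sf, coeffs) or None."""
    terms = spt_levels(W, Dn)
    levels = sorted(terms)
    peak = max(abs(c) for c in h)
    sf_min, sf_max = 0.25 / peak, 0.5 / peak   # largest |coefficient| -> 2**-2 .. 2**-1
    step = 2.0 ** -W
    best, prev = None, None
    for i in range(int((sf_max - sf_min) / step) + 1):
        sf = sf_min + i * step
        q = [quantize(c * sf, levels) for c in h]
        if q == prev:                          # unchanged quantized set: skip re-evaluation
            continue
        prev = q
        total = sum(terms[abs(v)] for v in q)
        if meets_spec(q, sf) and (best is None or total < best[0]):
            best = (total, sf, q)
    return best
```

The integer loop index avoids accumulating rounding error in SF, and skipping repeated quantized sets mirrors the rule that the frequency response is recomputed only when the CSD coefficients change.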

4.4.2 Local Search Strategy


The second step of the optimization is a bi-variate local search in the neighborhood of the scaled and rounded coefficients. A bi-variate search has been found to yield substantially better results than a uni-variate local search, while any higher-order search does not justify the exponential growth in CPU time [35].

Thus, all possible pairs of coefficients are varied by ±1 quantization step, and the resulting frequency response is computed. Let S be half the total number of coefficients of the symmetric FIR filter. For each pair of distinct coefficients, four perturbations are performed by simultaneously rounding both numbers in the pair up or down by one digit. Similarly, for each non-distinct pair (a coefficient paired with itself), two perturbations are performed by rounding the number up or down by one digit. This results in a total of

$$2S + 4\cdot\frac{S(S-1)}{2} = 2S^2 \qquad (4.4)$$

coefficient sets being searched. The local search proceeds iteratively: after a search cycle is completed, the coefficient sets whose frequency responses fit the filter specification are selected, and the bi-variate local search is repeated with the new coefficient sets. This process continues until no further improvement is obtained.
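A compact sketch of the neighbourhood and the iteration, assuming integer coefficient codes (one quantization step = 1), a caller-supplied `feasible` check standing in for the frequency-response test, and a first-improvement update rather than the exact selection rule described above. The neighbourhood generator produces exactly the 2S² candidates of Eqn. (4.4).

```python
from itertools import combinations

def csd_weight(k):
    """Number of nonzero digits in the CSD (non-adjacent form) of integer k."""
    count = 0
    while k:
        if k & 1:
            k -= 2 - (k & 3)   # strip a +1 digit (k = 1 mod 4) or a -1 digit (k = 3 mod 4)
            count += 1
        k >>= 1
    return count

def bivariate_neighbours(q):
    """Yield the 2*S*S perturbations of the S half-coefficient codes in q."""
    S = len(q)
    for i in range(S):                          # pair (i, i): 2 perturbations
        for d in (1, -1):
            cand = list(q)
            cand[i] += d
            yield cand
    for i, j in combinations(range(S), 2):      # pair (i, j), i < j: 4 perturbations
        for di in (1, -1):
            for dj in (1, -1):
                cand = list(q)
                cand[i] += di
                cand[j] += dj
                yield cand

def total_digits(codes):
    return sum(csd_weight(v) for v in codes)

def local_search(q, feasible):
    """Repeat the bi-variate search until no feasible neighbour lowers the
    total number of nonzero CSD digits (first-improvement variant)."""
    best = list(q)
    improved = True
    while improved:
        improved = False
        for cand in bivariate_neighbours(best):
            if feasible(cand) and total_digits(cand) < total_digits(best):
                best = cand
                improved = True
                break
    return best
```

Each accepted move strictly lowers an integer cost, so the loop always terminates.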


4.5 Word Length Estimation

4.5.1 Overflow Prevention

If the final output is within the range of the original word length, overflows in the partial sums are unimportant. This is a desirable property of 2's complement arithmetic. However, if the final output exceeds the range of the signal, the output sample will be wrong, and measures must be taken to prevent this. One approach is to avoid overflow, or allow only limited overflow, by scaling the coefficients. The coefficients may be scaled as follows [36]:

$$h'(k) = h(k)\,2^{-R} \qquad (4.5)$$

where

$$R = \left\lceil \log_2 \sqrt{\sum_{k=0}^{N-1} h^2(k)} \right\rceil = \left\lceil \frac{1}{2}\log_2 \sum_{k=0}^{N-1} h^2(k) \right\rceil \qquad (4.6a)$$

or

$$R = \left\lceil \log_2 \sum_{k=0}^{N-1} \left|h(k)\right| \right\rceil \qquad (4.6b)$$

and R denotes the number of right-shift bits. The method given in Eqn. (4.6a) will probably lead to a shorter internal word length than Eqn. (4.6b), but this form of scaling occasionally allows overflow, which degrades performance. Therefore, the method in Eqn. (4.6b) is adopted; it never causes overflow because it is based on the worst-case condition for overflow. Hence, the coefficient word length increases by R bit(s), and the coefficients are then shifted right by R bit(s) to prevent overflow.
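The two rules can be compared on a small example. The sketch assumes that R is rounded up to an integer (a ceiling) in Eqns. (4.6a) and (4.6b), and that the input is bounded by |x(n)| ≤ 1; the function names are illustrative.

```python
import math

def shift_l2(h):
    """R from Eqn. (4.6a): based on the L2 norm of h (may occasionally overflow)."""
    return math.ceil(0.5 * math.log2(sum(c * c for c in h)))

def shift_l1(h):
    """R from Eqn. (4.6b): based on the L1 norm of h (worst case, never overflows)."""
    return math.ceil(math.log2(sum(abs(c) for c in h)))

def worst_case_gain(h):
    """Largest |y(n)| for |x(n)| <= 1, reached when x(n - k) = sign(h(k))."""
    return sum(abs(c) for c in h)
```

For h = (0.25, 0.5, 1, 0.5, 0.25), the L2 rule gives R = 1, yet a worst-case input still reaches 1.25 after the shift; the L1 rule gives R = 2 and bounds the output by 0.625.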


4.5.2 Internal Word Length Reduction


In digital signal processing, the finite word length strongly affects system performance because it determines the precision of the output signals. A longer internal word length gives a better signal-to-noise ratio (SNR), but the system then has higher hardware complexity, consumes more power, and operates at a lower frequency. Therefore, designers must make a trade-off.
If the designer is willing to accept some deviation from the given specifications, decreasing the internal word length can reduce the hardware complexity. In this subprogram, we evaluate the SNR using Eqn. (4.7):

$$\mathrm{SNR} = 10\log_{10}\frac{E\left[y^2(n)\right]}{E\left[e^2(n)\right]} = 10\log_{10}\frac{E\left[y^2(n)\right]}{E\left[\left(y(n)-\hat{y}(n)\right)^2\right]} \qquad (4.7)$$

where y(n) is the reference output and ŷ(n) is the output of the word-length-limited filter. The SNR simulation block is shown in Fig. 4.7, and the internal word length estimation flow is shown in Fig. 4.8.

Fig. 4.7 SNR simulation block.

Fig. 4.8 Internal word length estimation flow chart.

First, the initial internal word length is chosen so that the result introduces no error. The internal word length is then decreased as long as the resulting SNR still meets the specification. Finally, the minimum internal word length that fulfills the specification is obtained.
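A minimal sketch of this loop, assuming that only the fractional bits of each product are shortened (by truncation toward −∞), that the double-precision output serves as the reference y(n), and that the test input is supplied by the caller; the actual simulation block may quantize at other points.

```python
import math

def fir(x, h, frac_bits=None):
    """Direct convolution; if frac_bits is given, each product is truncated
    (floor) to that many fractional bits to mimic a shorter internal word."""
    scale = 2 ** frac_bits if frac_bits is not None else None
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, c in enumerate(h):
            if n - k < 0:
                break
            p = c * x[n - k]
            if scale is not None:
                p = math.floor(p * scale) / scale
            acc += p
        y.append(acc)
    return y

def snr_db(ref, out):
    """Eqn. (4.7): 10 log10( E[y^2] / E[(y - y_hat)^2] )."""
    signal = sum(v * v for v in ref)
    noise = sum((a - b) ** 2 for a, b in zip(ref, out))
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)

def min_fraction_bits(x, h, snr_spec, start_bits=24):
    """Start from a word length that introduces negligible error, then shorten
    it while the SNR still meets the specification."""
    ref = fir(x, h)
    bits = start_bits
    while bits > 1 and snr_db(ref, fir(x, h, bits - 1)) >= snr_spec:
        bits -= 1
    return bits
```

Because the loop stops at the first word length that violates the SNR target, the returned value meets the specification and one bit fewer does not.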

4.6 Synthesizable Verilog Code Generation

4.6.1 Hardware Estimation

Before generating the hardware for the FIR digital filters, interpolated FIR filters, and multirate multistage decimators / interpolators, the module generator estimates the hardware complexity of each design. The indices compared are the total number of nonzero digits, the maximum number of nonzero digits, and the internal word length. Their priority is as follows.
• For low-complexity applications:
  Priority: Internal Word Length > Total Nonzero Digit > Max. Nonzero Digit
• For high-speed applications:
  Priority: Max. Nonzero Digit > Internal Word Length > Total Nonzero Digit

In addition, the module generator estimates the computation (APU) and storage (SPU) requirements of each design. Finally, it generates a file, hardware.out, that records the hardware estimation.
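Each priority list amounts to a lexicographic comparison of the three indices. A minimal sketch with a hypothetical `DesignCost` record (not a data structure taken from the generator):

```python
from dataclasses import dataclass

@dataclass
class DesignCost:
    name: str
    internal_wl: int      # internal word length (bits)
    total_nonzero: int    # total nonzero CSD digits over all coefficients
    max_nonzero: int      # largest number of nonzero digits in one coefficient

# Tuple keys compare element by element, which encodes each priority order.
PRIORITY = {
    "low_complexity": lambda d: (d.internal_wl, d.total_nonzero, d.max_nonzero),
    "high_speed": lambda d: (d.max_nonzero, d.internal_wl, d.total_nonzero),
}

def rank_designs(designs, mode):
    """Sort candidate designs best-first according to the chosen priority."""
    return sorted(designs, key=PRIORITY[mode])
```

A lower-ranked index only matters when all higher-priority indices tie, which is exactly how the two priority chains are read.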

4.6.2 Structure of the FIR Digital Filter


The module generator generates three types of symmetric transposed direct-form structures for the FIR filter of each stage.

Structure A: A transposed direct form filter structure written in behavioral-level synthesizable Verilog HDL code, which allows the synthesis tool to select the appropriate architecture for the user's constraints.
Structure B: A transposed direct form filter structure with carry-save adders (CSA), written with DesignWare components [37] provided by Synopsys, as shown in Fig. 4.9(a).
Structure C: Structure B with pipelining, which achieves a critical path of at most two CSA delays when the maximum allowed number of SPT terms per coefficient is three or four, as shown in Fig. 4.9(b). Moreover, since most coefficients of a CSD coefficient set generally have fewer than three nonzero digits, a single input buffer can be used rather than pipelining at each tap. Referring to Fig. 4.10, the input x(n) feeds the taps with more than two nonzero digits, and x(n−1) feeds the taps with fewer than three.

Fig. 4.9 Transposed direct form filter structure with (a) carry-save adders (CSA) and (b) carry-save adders (CSA) with pipelining.

[Figure: input buffer (Z^-1) producing x(n) and x(n−1), shifters, CSA stages and Z^-1 pipelining registers.]
Fig. 4.10 The Structure C with an input buffer.
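To make Structure A concrete, the Python sketch below emits a behavioral symmetric transposed direct-form module for integer coefficients, computing one product per symmetric coefficient pair. The module, port and signal names, the synchronous reset, and the plain constant multiplications (left for the synthesis tool to optimize) are illustrative assumptions; this is not the generator's actual output format.

```python
def tdf_fir_verilog(name, coeffs, in_width, acc_width):
    """Emit a behavioral, symmetric transposed direct-form FIR module.

    y(n) = p0 + r1,  r_k <= p_k + r_(k+1),  r_(N-1) <= p_(N-1),
    where p_k = h(k) x(n) is shared with p_(N-1-k) because h is symmetric.
    """
    n = len(coeffs)
    if n < 2 or list(coeffs) != list(coeffs)[::-1]:
        raise ValueError("need a symmetric impulse response with at least 2 taps")

    def prod(k):
        return f"p{min(k, n - 1 - k)}"              # shared product for tap k

    v = [f"module {name} (",
         "  input  wire clk,",
         "  input  wire rst,",
         f"  input  wire signed [{in_width - 1}:0] x,",
         f"  output wire signed [{acc_width - 1}:0] y",
         ");"]
    for k in range((n + 1) // 2):                   # one multiplier per symmetric pair
        v.append(f"  wire signed [{acc_width - 1}:0] p{k} = x * ({coeffs[k]});")
    regs = ", ".join(f"r{k}" for k in range(1, n))
    v.append(f"  reg  signed [{acc_width - 1}:0] {regs};")
    v.append("  always @(posedge clk) begin")
    v.append("    if (rst) begin")
    v += [f"      r{k} <= 0;" for k in range(1, n)]
    v.append("    end else begin")
    v += [f"      r{k} <= {prod(k)} + r{k + 1};" for k in range(1, n - 1)]
    v.append(f"      r{n - 1} <= {prod(n - 1)};")
    v.append("    end")
    v.append("  end")
    v.append("  assign y = p0 + r1;")
    v.append("endmodule")
    return "\n".join(v)
```

For example, `tdf_fir_verilog("fir9", [1, -2, 3, 5, 8, 5, 3, -2, 1], 8, 20)` produces five shared products and eight delay registers.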

4.6.3 Structure of the Interpolated FIR Filter

The basic idea of interpolated FIR (IFIR) filters is to implement the prototype filter H(z) as a cascade of two FIR sections, a periodic model subfilter G(z^L) and an image suppressor I(z), as shown in Eqn. (3.10). The periodic model subfilter G(z^L) is based on the behavior of an N-tap nonrecursive linear-phase FIR filter when each of its unit delays is replaced with an L-unit delay, where the interpolation factor L is an integer, as shown in Fig. 4.11(a). If the impulse response of a nine-tap FIR filter H(z) is as shown in Fig. 4.11(b), the impulse response of the corresponding periodic model filter G(z^L) with, for example, L = 3 is as shown in Fig. 4.11(c). The module generator generates the symmetric transposed direct form structure for the periodic model subfilter, with expanded delays between the taps, and uses the FIR filter design described above to implement the image suppressor.
An important implementation issue arises when the IFIR filter has a narrow passband and transition band, because a larger interpolation factor L can then be used. However, more storage must then be allocated to hold a sufficient number of input samples for the periodic model subfilter. This is a disadvantage of the periodic model subfilter G(z^L), because the storage size must equal L(N−1)−1, where N is the tap length of G(z). Although this increases the storage area, it still reduces the hardware complexity effectively relative to a conventional FIR design when implementing a narrowband FIR filter. Using dual clocks for G(z^L) effectively reduces the storage requirement of the periodic model subfilter, as in the design shown in Fig. 4.12.

Fig. 4.11 (a) The symmetric transposed direct form structure for G(z^L) with L-unit delays between the taps; (b) the impulse response of H(z); (c) the impulse response of G(z^L) with L = 3.

Fig. 4.12 The symmetric transposed direct form structure for G(z^L) with dual clocks.
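The impulse response of the periodic model subfilter is simply g with L − 1 zeros inserted between consecutive taps, and the IFIR response is its convolution with the image suppressor. The sketch below (plain lists, hypothetical helper names) builds both and checks that symmetry, and therefore linear phase, survives the cascade.

```python
def expand(g, L):
    """Impulse response of G(z^L): L - 1 zeros between consecutive taps of g."""
    out = []
    for i, c in enumerate(g):
        out.append(c)
        if i < len(g) - 1:
            out.extend([0] * (L - 1))
    return out

def convolve(a, b):
    """Impulse response of the cascade A(z) B(z)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def is_symmetric(h):
    """True for a linear-phase (even-symmetric) impulse response."""
    return h == h[::-1]
```

For a nine-tap g and L = 3, the expanded response spans L(N−1)+1 = 25 samples, and cascading it with any symmetric I(z) again gives a symmetric overall response, which is the property used for G(z^4)I(z) in Section 5.2.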

4.6.4 Structure of the Multirate Multistage Filter

Both decimators and interpolators can be implemented in either the direct form or the transposed direct form. When the transposed direct form is used for decimators and the direct form for interpolators, registers can be shared between the subfilters, as shown in Fig. 4.13(b) and (c) for the example of M = 3, N = 9. This is the so-called memory-saving technique [25]. The alternative is to use the direct form for decimators and the transposed direct form for interpolators, which allows multipliers to be shared between the two subfilters of each mirror-symmetric pair, as shown in Fig. 4.13(a) and (d). This is the so-called mirror symmetric filter pairs technique [25].
The registers in structures (b) and (d) store internal signals, so their word length is longer than that of the registers in structures (a) and (c), which store the input signal. With mirror symmetric filter pairs, structures (a) and (d) need only about half of the multipliers of structures (b) and (c). However, structures (b) and (c), which use the memory-saving technique, need only about 1/M of the registers of structures (a) and (d). Although neither structure is absolutely better than the other, the critical path of the transposed direct form is shorter than that of the direct form. Therefore, for high-speed applications, structures (b) and (d) are selected.

Fig. 4.13 (a) Direct form decimator with mirror symmetric filter pairs. (b) Transposed direct form decimator with memory-saving technique. (c) Direct form interpolator with memory-saving technique. (d) Transposed direct form interpolator with mirror symmetric filter pairs.
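The polyphase decimator can be checked against the filter-then-downsample definition. This behavioral sketch works at the arithmetic level only: it does not model the register or multiplier sharing of Fig. 4.13, and the helper names are illustrative.

```python
def fir_filter(h, x):
    """y(n) = sum_k h(k) x(n - k), with x(n) = 0 for n < 0."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def decimate_reference(h, x, ratio):
    """Filter at the high rate, then keep every ratio-th output sample."""
    return fir_filter(h, x)[::ratio]

def decimate_polyphase(h, x, ratio):
    """Same output computed at the low rate with e_k(j) = h(j*ratio + k):
    y(i) = sum_k sum_j e_k(j) x((i - j)*ratio - k)."""
    out = []
    for i in range((len(x) + ratio - 1) // ratio):
        acc = 0
        for k in range(ratio):
            for j, c in enumerate(h[k::ratio]):     # component e_k
                idx = (i - j) * ratio - k
                if 0 <= idx < len(x):
                    acc += c * x[idx]
        out.append(acc)
    return out
```

With the M = 3, N = 9 example from the text, each of the three components has three taps, so every low-rate output costs nine multiplications instead of nine per high-rate input sample.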

4.7 Module Generator

Our module generator is written in C++ and employs MATLAB as a computation engine. The MATLAB engine library is a set of routines that allows us to call MATLAB from our own programs. The main program of the module generator has about 72 subprograms and consists of several sub-modules in the operation flow shown in Fig. 4.14. The main sub-modules are multistage architecture analysis and synthesis, coefficient calculation, coefficient optimization, word length estimation, and synthesizable Verilog code generation.
Following the operation flow, the module generator first reads the system specifications. From the filter type given in the specifications, it determines whether the design is a digital FIR filter (LP, HP, BP, or BS), a multistage IFIR filter, or a multirate multistage decimator / interpolator. For a multistage IFIR filter or a multirate multistage decimator / interpolator, the multistage architecture analysis / synthesis sub-module determines the optimal decomposition of the IFIR filter or decimator / interpolator. After this analysis, the coefficient calculation sub-module uses MATLAB to compute the floating-point filter coefficients and, at the same time, records the coefficient values in matlab.out. We adopt a two-step local search, scaling followed by local search, to round and optimize the filter coefficients into CSD codes. For the high-speed approach, the coefficient set with the smallest maximum number of nonzero digits per coefficient is selected; for the low-complexity approach, the coefficient set with the minimum total number of nonzero digits is selected. Based on the optimal coefficient set, the module generator estimates the internal word length that satisfies the SNR requirement. The hardware estimation sub-module then generates hardware.out to record the hardware design costs. Finally, the synthesizable Verilog code generation sub-module generates the synthesizable Verilog code of the multirate multistage digital FIR filter / decimator / interpolator.

Fig. 4.14 The operation flow of the module generator.


Chapter 5 Experimental Results

Chapter 5
Experimental Results

In this chapter, design examples of an FIR filter, an interpolated FIR filter, and a multirate multistage filter generated by the module generator are presented. All performance data presented in this chapter are pre-layout estimates.

5.1 FIR Filter Design


A linear-phase lowpass FIR filter is designed using our proposed method, the mixed-integer linear programming (MILP) algorithm [33], and Samueli's local search algorithm [23]. The passband and stopband edge frequencies are 0.3 and 0.5, respectively. The passband ripple is 0.05 dB and the stopband attenuation is 50 dB. The word length of the input signal is assumed to be 14 bits. The minimum number of SPT terms required by each of these methods is summarized in Table 5.1. When the maximum allowed number of SPT terms per coefficient is limited to four, the filter designed by our method saves 22% (21% to 24%) of the SPT terms at a cost of 5% (4% to 7%) additional tap length. If the application requires the maximum number of SPT terms per coefficient to be limited to three for a higher throughput rate, the filter designed using Samueli's algorithm fails to reach −50 dB NPR, whereas our proposed method saves 16% of the SPT terms at a cost of 4% additional tap length. The design results are converted into the three structures described in Section 4.6.2.

Table 5.1 Minimum number of SPT terms required to attain −50 dB NPR.

Max. SPT per coeff. = 4
Algorithm       #SPT                  Tap Length
MILP [33]       68                    28
Samueli [23]    66                    28
Our Work #1     64                    28
Our Work #2     54                    29
Our Work #3     52                    30

Max. SPT per coeff. = 3
Algorithm       #SPT                  Tap Length
MILP [33]       68                    28
Samueli [23]    cannot reach −50 dB
Our Work #4     66                    28
Our Work #5     57                    29

Table 5.2 Synthesis results of Work #1.

Timing Constraint: 7.50 ns
Structure   Critical Path (ns)   Total Gate Count   Combinational Area   Noncombinational Area
A           7.46                 5069               2824                 2245
B           4.65                 8103               3613                 4490
C           4.65                 9119               3907                 5212

Timing Constraint: 1.25 ns
Structure   Critical Path (ns)   Total Gate Count   Combinational Area   Noncombinational Area
A           3.86                 8338               5799                 2539
B           1.57                 11520              5595                 5925
C           1.25                 12862              5999                 6863

We synthesized the filter of Work #1 in a TSMC 0.25 μm process and summarize the results in Table 5.2. They show that structure A is suitable for low-speed (133 MHz), area-efficient applications; structure B is suitable for high-speed (637 MHz) applications; and structure C is suitable for very high-speed (800 MHz) applications. Therefore, our module generator can provide flexible hardware implementations for various applications. The frequency responses of the filters designed by our module generator are shown in Fig. 5.1. We also synthesized the filters of Work #1 and Work #3 in the TSMC 0.25 μm process and summarize the results in Table 5.3. The Work #3 design saves about 19% of the SPT terms and costs 6.6% additional tap length compared with the Work #1 design. The area of the Work #1 design is about 1.1 times that of the Work #3 design; the area of a full adder (FA) is about 6.3 gates and that of a register is about 5.6 gates. The power dissipation of the Work #1 design is also about 1.1 times that of the Work #3 design.

[Plot: normalized magnitude response (dB) versus normalized frequency (×π rad/sample), with a passband detail inset; legend: Work #1 to Work #5.]
Fig. 5.1 The frequency responses of Work #1, Work #2 and Work #3.

Table 5.3 Synthesis results of Work #1 and Work #3.

Technology: TSMC 0.25 μm
Input Frequency: 200 MHz

                          Work #1   Work #3
Total Gate Count          6527      6080
Combinational Area        4314      3719
Noncombinational Area     2213      2361
Power Dissipation (mW)    80.66     72.99

A filter design example for the baseband demodulator of a 64-QAM (quadrature amplitude modulation) telecommunication system is carried out. The specification is shown in Table 5.4 [38]. Since this filter requires 35 taps, a direct implementation clearly requires considerable hardware. With our module generator, however, the whole filter can be implemented with a reasonable chip area.

Table 5.4 Specifications of 64-QAM baseband demodulator.

Sampling Frequency                    21.52 MHz
Symbol Rate                           5.38 MHz
Normalized Passband Edge Frequency    0.2110
Normalized Stopband Edge Frequency    0.3204
Passband Ripple                       0.1 dB
Stopband Attenuation                  30 dB
Input Data Word Length                11 bit
Output Data Word Length               14 bit

Fig. 5.2 compares the frequency responses of the original coefficients and of the coefficients after coefficient optimization. The specifications obtained by the module generator are summarized in Table 5.5. With the scaling strategy, the total number of nonzero digits is smaller than in [38], so fewer adders are needed. With the local search strategy, the total number of nonzero digits is reduced further. The maximum number of nonzero digits per coefficient, which determines the critical path of the filter, is also reduced.

[Plot: normalized magnitude response (dB) versus normalized frequency (×π rad/sample), with a passband detail inset; curves: Original, Scaling Strategy, Local Search Strategy.]
Fig. 5.2 The frequency response of 64-QAM baseband demodulator.

The chip is designed in a TSMC 0.25 μm process. Because the maximum number of nonzero digits per coefficient is only 2, structure C gives the same result as structure B. For low-complexity applications, the overall area of [38] is about 1.64 times, and its power dissipation about 1.95 times, that of structure A. Moreover, with structure B (or structure C) for high-speed applications, the chip can operate at 714 MHz. The synthesis results are summarized in Table 5.6.


Table 5.5 Specifications after module generator of 64-QAM baseband demodulator.

                                      [38]       This Work
                                                 Scaling    Local Search
Tap Length (tap)                      35         35
Normalized Passband Edge Frequency    0.2010     0.2148
Normalized Stopband Edge Frequency    0.3204     0.3203
Passband Ripple (dB)                  0.0558     0.0531
Stopband Attenuation (dB)             31.1030    31.5092
Coefficient Word Length (bit)
Internal Word Length (bit)            14         14         14
Max Nonzero Digit (bit)
Total Nonzero Digit (bit)             63         57         49
SNR (dB)                              46.8       41.2

Table 5.6 Synthesis results of the example for 64-QAM baseband demodulator.

                              Work #1    Work #2    [38]
Technology (μm)               0.25       0.25       0.25
Max. Operating Frequency      714 MHz    146 MHz    72 MHz
Total Gate Count              11117      5155       8477
Combinational Area            5011       2496       5938
Noncombinational Area         6106       2659       2539
Power Dissipation (mW)        520.05     6.83       13.31

Work #1: specifications under the high-speed constraint.
Work #2: specifications under the low-complexity constraint.


5.2 Interpolated FIR Filter Design


Using our module generator, we design IFIR filters for the specifications of the first version of the CDMA cellular system proposed by Qualcomm [40], which are shown in Table 5.7. A conventional filter designed with the Parks-McClellan algorithm would require an order N = 69. Based on the algorithm described in Section 4.2.1, we use the optimal interpolation factor L = 4 for the IFIR design with a single-stage implementation of I(z), and L1 × L2 = 2 × 2 for the IFIR design with a two-stage implementation of I(z). The specifications of the conventional filter, the periodic model subfilters G(z^L), and the image suppressors I(z) obtained by the module generator are summarized in Table 5.8 [41]. Notice that the system G(z^4)I(z) has linear phase because both G(z) and I(z) have linear phase.

Table 5.7 Specifications of the CDMA cellular proposed by Qualcomm.

Sampling Frequency            19.6608 MHz
Passband Edge Frequency       0.064087
Stopband Edge Frequency       0.125
Passband Ripple               0.1 dB
Stopband Attenuation          40 dB
Input Data Word Length        5 bits

Fig. 5.3 shows the frequency responses of the conventional filter and the IFIR filter with single-stage I(z) and L = 4, as well as the frequency responses of the subfilters I(z) and G(z^4). Fig. 5.4 shows the conventional filter and the frequency


Table 5.8 Design results by module generator with IFIR filter designs and the conventional filter.

                                      Conventional   Multistage IFIR with two-stage
                                      FIR            I(z)         G(z^4)
Tap Length (tap)                      69             15           19
Normalized Passband Edge Frequency    0.0645         0.0683       0.2598
Normalized Stopband Edge Frequency    0.1230         0.3730       0.4980
Passband Ripple (dB)                  0.0969         0.0284       0.0365
Stopband Attenuation (dB)             40.2410        43.1643      42.7119
Coefficient Word Length (bit)         13
Internal Word Length (bit)            15             11           13
Max Nonzero Digit (bit)
Total Nonzero Digit (bit)             204            27           39
Actual Nonzero Digit (bit)            104            14           21
SNR (dB)                              41.0           41.5         40.0

                                      Multistage IFIR with three-stage
                                      I1(z)          I2(z^2)      G(z^4)
Tap Length (tap)                      7              7            19
Normalized Passband Edge Frequency    0.0643         0.1289       0.2578
Normalized Stopband Edge Frequency    0.8730         0.7480       0.4980
Passband Ripple (dB)                  0.0232         0.0283       0.0280
Stopband Attenuation (dB)             40.7033        40.6462      40.4649
Coefficient Word Length (bit)         10
Internal Word Length (bit)            11             11           12
Max Nonzero Digit (bit)
Total Nonzero Digit (bit)             13             41
Actual Nonzero Digit (bit)            22
SNR (dB)                              41.2           44.7         42.4


[Figure: normalized magnitude response (dB) versus normalized frequency (×π rad/sample); panel (a) shows I(z) and G(z^4), panel (b) shows the overall IFIR filter and the conventional filter with a passband detail inset.]

Fig. 5.3 Frequency responses of the IFIR filters with single-stage I(z) and L = 4: (a) I(z) of order 15 and G(z^4) of order 19; (b) the overall IFIR filter and the conventional filter.


[Figure: normalized magnitude response (dB) versus normalized frequency (×π rad/sample); panel (a) shows I1(z), I2(z^2) and G(z^4), panel (b) shows the overall IFIR filter and the conventional filter with a passband detail inset.]

Fig. 5.4 Frequency responses of the IFIR filters with two-stage I(z) and L = 4: (a) I1(z) of order 7, I2(z^2) of order 7 and G(z^4) of order 19; (b) the overall IFIR filter and the conventional filter.


responses for the IFIR filters with two-stage I(z) and L1 × L2 = 2 × 2, as well as the frequency responses of the subfilters I1(z), I2(z^2), and G(z^4).
The synthesis results of the IFIR filter designs and the conventional filter design are summarized in Table 5.9. The filter I(z) is very inexpensive, whereas G(z^4) requires slightly less than half the gate count of the conventional design (7080 versus 14839 gates in the single-stage I(z) case). When the timing constraints of the conventional filter and the IFIR filters are equal, the area of the conventional filter is about 1.63 times that of the IFIR filter with single-stage I(z) and about 1.72 times that of the IFIR filter with two-stage I(z). The power dissipation of the conventional filter is about 12.46 times that of the IFIR filter with single-stage I(z) and about 13.10 times that of the IFIR filter with two-stage I(z).
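The ratios quoted above follow from summing the subfilter entries of Table 5.9, since each IFIR filter is the cascade of its subfilters. A short check of that arithmetic (the dictionary layout and names are only illustrative):

```python
# Gate counts and power dissipation (mW) transcribed from Table 5.9.
TABLE_5_9 = {
    "conventional": {"gates": [14839], "power_mw": [1400.00]},
    # subfilters I(z), G(z^4)
    "ifir_single_stage_I": {"gates": [2049, 7080], "power_mw": [25.78, 86.57]},
    # subfilters I1(z), I2(z^2), G(z^4)
    "ifir_two_stage_I": {"gates": [721, 1369, 6523], "power_mw": [8.89, 17.31, 80.70]},
}

def reduction(baseline, design, metric):
    """Baseline total divided by the design total (subfilter entries summed)."""
    return sum(baseline[metric]) / sum(design[metric])

base = TABLE_5_9["conventional"]
results = {}
for name in ("ifir_single_stage_I", "ifir_two_stage_I"):
    design = TABLE_5_9[name]
    results[name] = (round(reduction(base, design, "gates"), 2),
                     round(reduction(base, design, "power_mw"), 2))
    print(name, *results[name])
```

Rounded to two decimals this reproduces the 1.63 / 12.46 and 1.72 / 13.10 factors stated in the text.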

Table 5.9 The synthesis results of the conventional filter and the IFIR filters.

Technology: TSMC 0.25 μm      Input Frequency: 200 MHz

                               Conventional    IFIR filter with single-stage I(z)
                               filter          I(z)         G(z^4)
Total Gate Count               14839           2049         7080
Combinational Area             9190            1118         1684
Noncombinational Area          5649            931          5396
Power Dissipation (mW)         1400.00         25.78        86.57

Technology: TSMC 0.25 μm      Input Frequency: 200 MHz

                               IFIR filter with two-stage I(z)
                               I1(z)        I2(z^2)      G(z^4)
Total Gate Count               721          1369         6523
Combinational Area             275          504          1540
Noncombinational Area          446          865          4983
Power Dissipation (mW)         8.89         17.31        80.70


5.3 Multirate Multistage FIR Design

5.3.1 Decimator

Following the IFIR filter designs, we designed multirate decimators for the CDMA cellular system [40] with a decimation factor of M = 8. The synthesis results are summarized in Table 5.10. The conventional decimator is a single-stage decimator that uses the polyphase structure to reduce arithmetic operations, whereas the multirate decimators are designed by our proposed method.
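As a sketch of why the polyphase structure saves arithmetic, assuming a generic FIR prototype h and hypothetical integer samples (not the designed decimator): splitting h into M polyphase components e_p[j] = h[jM + p] lets every multiplication run at the output rate, and the result matches filtering at the input rate followed by keeping every M-th sample.

```python
def fir(h, x):
    """Causal FIR filter with zero initial state: y[n] = sum_k h[k] x[n-k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def decimate_direct(h, x, M):
    """Reference: filter at the input rate, then keep every M-th sample."""
    return fir(h, x)[::M]

def decimate_polyphase(h, x, M):
    """Polyphase decimator: branch p has taps e_p[j] = h[j*M + p] and input
    x_p[m] = x[m*M - p], so every multiply runs at the output rate."""
    n_out = (len(x) + M - 1) // M
    y = [0] * n_out
    for p in range(M):
        e_p = h[p::M]                                  # p-th polyphase component
        x_p = [x[m * M - p] if m * M - p >= 0 else 0 for m in range(n_out)]
        for m in range(n_out):
            y[m] += sum(c * x_p[m - j] for j, c in enumerate(e_p) if m - j >= 0)
    return y

h = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # hypothetical prototype filter
x = list(range(1, 33))            # hypothetical input samples
print(decimate_polyphase(h, x, 8) == decimate_direct(h, x, 8))
```

The direct version computes M outputs for every one it keeps; the polyphase version performs only the len(h) multiplications per retained output sample.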

Table 5.10 The synthesis results of the conventional decimator and the multirate decimator [40] (decimation ratio M = 8).

Technology: TSMC 0.25 μm      Input Frequency: 200 MHz

                               Conventional    Multirate decimator with two-stage
                               decimator       Stage#1      Stage#2
                               (M=8)           (M1=4)       (M2=2)
Total Gate Count               13742           1580         2554
Combinational Area             12567           1162         1792
Noncombinational Area          1174            418          761
Power Dissipation (mW)         21.30           6.59         4.29

Technology: TSMC 0.25 μm      Input Frequency: 200 MHz

                               Multirate decimator with three-stage
                               Stage#1      Stage#2      Stage#3
                               (M1=2)       (M2=2)       (M3=2)
Total Gate Count               638          808          2417
Combinational Area             336          496          1706
Noncombinational Area          301          312          710
Power Dissipation (mW)         4.48         3.15         4.42



[Figure: six panels (a) to (f) of normalized magnitude response (dB) versus normalized frequency (×π rad/sample), each comparing the Original, Scaling Strategy, and Local Search Strategy coefficient sets, with passband detail insets.]

Fig. 5.5 The frequency responses of the subfilters of the multirate decimators and the conventional decimator: (a) the conventional decimator of order 69; the subfilters of the two-stage multirate decimator, (b) I(z) of order 15 and (c) G(z) of order 19; the subfilters of the three-stage multirate decimator, (d) I1(z) of order 7, (e) I2(z) of order 7 and (f) G(z) of order 19.


Fig. 5.5 shows the frequency responses of the subfilters of the multirate decimators and the conventional decimator, and Fig. 5.6 shows the frequency responses of the conventional FIR filter and the multirate IFIR filters. When the timing constraints of the conventional decimator and the multirate multistage decimators are equal, the area of the conventional decimator is about 3.32 to 3.56 times, and its power dissipation about 1.78 to 1.96 times, that of the multirate multistage decimators (the larger area ratio belongs to the three-stage design and the larger power ratio to the two-stage design).
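The multistage gains come from letting each stage work with a wide transition band at a progressively lower rate. A sketch of the underlying noble-identity bookkeeping, assuming arbitrary small integer filters rather than the designed subfilters: a cascade of (H_k(z), ↓M_k) stages is equivalent to the single filter H_1(z) H_2(z^{M_1}) H_3(z^{M_1 M_2}) followed by downsampling by the product of the factors, which is 8 for both the 4 × 2 and the 2 × 2 × 2 decompositions of Table 5.10.

```python
def fir(h, x):
    """Causal FIR filter with zero initial state: y[n] = sum_k h[k] x[n-k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def decimate(h, x, M):
    """One stage: filter, then keep every M-th sample."""
    return fir(h, x)[::M]

def upsample_taps(h, L):
    """Taps of H(z^L): L - 1 zeros between consecutive taps."""
    out = []
    for k, c in enumerate(h):
        out.append(c)
        if k < len(h) - 1:
            out.extend([0] * (L - 1))
    return out

def convolve(a, b):
    """Linear convolution (polynomial product in z^-1)."""
    y = [0] * (len(a) + len(b) - 1)
    for m, am in enumerate(a):
        for n, bn in enumerate(b):
            y[m + n] += am * bn
    return y

def multistage(stages, x):
    """Cascade of (h_k, M_k) stages, each running at the previous stage's output rate."""
    for h, M in stages:
        x = decimate(h, x, M)
    return x

def single_stage_equivalent(stages):
    """Noble identity: H_k(z) after downsampling by M_1...M_{k-1} equals
    H_k(z^(M_1...M_{k-1})) before it, so the cascade collapses to one filter."""
    h_eq, m_total = [1], 1
    for h, M in stages:
        h_eq = convolve(h_eq, upsample_taps(h, m_total))
        m_total *= M
    return h_eq, m_total

two_stage = [([1, 2, 3, 2, 1], 4), ([1, 3, 3, 1], 2)]          # hypothetical M1=4, M2=2
three_stage = [([1, 1], 2), ([1, 2, 1], 2), ([1, 3, 3, 1], 2)]  # hypothetical 2 x 2 x 2
x = list(range(1, 65))
for stages in (two_stage, three_stage):
    h_eq, m_total = single_stage_equivalent(stages)
    print(m_total, multistage(stages, x) == decimate(h_eq, x, m_total))
```

Read in the other direction, the equivalence shows that the long single-stage filter can be replaced by short filters that each run at a lower rate.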
[Figure: normalized magnitude response (dB) versus normalized frequency (×π rad/sample) for the conventional filter and the multirate IFIR filters with 2 and 3 stages, with a passband detail inset.]

Fig. 5.6 The frequency responses of the conventional FIR and multirate IFIR filters.

5.3.2 Interpolator
Using the module generator, we present an example of a sharp narrowband interpolator whose specifications are summarized in Table 5.11. The specifications produced by the module generator for the conventional filter, the periodic model subfilter G(z^L), and the image suppressor I(z) are summarized in Table 5.12.

Table 5.11 Specifications of the interpolator.

Normalized Passband Edge Frequency      0.05
Normalized Stopband Edge Frequency      0.10
Normalized Transition Bandwidth         0.05
Passband Ripple (dB)                    0.1
Stopband Attenuation (dB)               40.0
Interpolation Ratio                     6

Table 5.12 Specifications produced by the module generator for the IFIR filter design and the conventional filter.

                                      Conventional   Multistage IFIR with two-stage
                                      FIR            I(z)         G(z^3)
Tap Length (tap)                      85             10           31
Normalized Passband Edge Frequency    0.0507         0.0586       0.1523
Normalized Stopband Edge Frequency    0.0996         0.5664       0.2988
Passband Ripple (dB)                  0.0712         0.0227       0.0376
Stopband Attenuation (dB)             40.5573        45.6249      42.4263
Coefficient Word Length (bit)         15             10
Internal Word Length (bit)            18             12           15
Max Nonzero Digit (bit)
Total Nonzero Digit (bit)             269            16           73
SNR (dB)                              40.2           40.7         40.4


Table 5.13 The synthesis results of the conventional interpolator and the multirate interpolator (interpolation ratio L = 6).

Technology: TSMC 0.25 μm      Input Frequency: 200 MHz

                               Conventional    Multirate interpolator with two-stage
                               interpolator    Stage#1      Stage#2
                               (L=6)           (L1=3)       (L2=2)
Total Gate Count               14545           944          3810
Combinational Area             14147           802          3737
Noncombinational Area          398             142          73
Power Dissipation (mW)         566.20          26.93        390.52

[Figure: three panels (a) to (c) of normalized magnitude response (dB) versus normalized frequency (×π rad/sample), each comparing the Original, Scaling Strategy, and Local Search Strategy coefficient sets, with passband detail insets.]

Fig. 5.7 The frequency responses of the subfilters of the multirate interpolators and the conventional interpolator: (a) the conventional interpolator of order 85; the subfilters of the two-stage multirate interpolator, (b) I(z) of order 10 and (c) G(z) of order 31.


The synthesis results are summarized in Table 5.13. In general, multistage designs yield a significant reduction in both computation (APU) and storage (SPU) requirements compared with single-stage designs. The reduction comes from the wide transition bands of the subfilters I(z) and G(z), which lead to short tap lengths. The conventional interpolator is a single-stage interpolator that uses the polyphase structure to reduce arithmetic operations, whereas the multirate interpolators are designed by our proposed method. Fig. 5.7 shows the frequency responses of the subfilters of the multirate interpolator and the conventional interpolator, and Fig. 5.8 shows the frequency responses of the conventional FIR filter and the multirate IFIR filter. When the timing constraints of the conventional interpolator and the multirate multistage interpolator are equal, the area of the conventional interpolator is about 3.06 times, and its power dissipation about 1.36 times, that of the multirate multistage interpolator.
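By analogy with the decimator, a sketch of the polyphase interpolator, assuming a hypothetical prototype h and integer input samples: rather than filtering the zero-stuffed signal at L times the input rate, each of the L branches e_p[j] = h[jL + p] runs at the input rate and fills every L-th output sample.

```python
def fir(h, x):
    """Causal FIR filter with zero initial state: y[n] = sum_k h[k] x[n-k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def interpolate_direct(h, x, L):
    """Reference: insert L - 1 zeros between samples, then filter at the high rate."""
    up = [0] * (len(x) * L)
    up[::L] = x
    return fir(h, up)

def interpolate_polyphase(h, x, L):
    """Branch p (taps h[p::L]) runs at the input rate and produces y[m*L + p]."""
    y = [0] * (len(x) * L)
    for p in range(L):
        e_p = h[p::L]
        for m in range(len(x)):
            y[m * L + p] = sum(c * x[m - j] for j, c in enumerate(e_p) if m - j >= 0)
    return y

h = [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]   # hypothetical prototype filter
x = [3, 1, 4, 1, 5, 9, 2, 6]            # hypothetical input samples
print(interpolate_polyphase(h, x, 6) == interpolate_direct(h, x, 6))
```

The direct form multiplies by the L - 1 inserted zeros; the polyphase form skips them, so its multiplications per input sample equal len(h) rather than L × len(h).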

[Figure: normalized magnitude response (dB) versus normalized frequency (×π rad/sample) for the conventional filter and the two-stage multirate IFIR filter, with a passband detail inset.]

Fig. 5.8 The frequency responses of the conventional FIR and multirate IFIR filters.


Chapter 6
Conclusions

In this thesis, we have surveyed several architectures of multistage multirate FIR digital filters / decimators / interpolators proposed in recent years and discussed their advantages and disadvantages. A module generator written in C++ for multirate multistage FIR digital filters / decimators / interpolators has been presented. Several design methodologies are adopted to reduce the hardware complexity of the system. The generated modules are therefore suitable for low-power applications, owing to their reduced hardware complexity and operating frequency. Moreover, the module generator can also target high-speed applications because of the compact, parallel structures it uses.
The inputs of the module generator are the system specifications. First, multistage architecture analysis and synthesis decomposes the system into the optimum set of stages using the multistage multirate IFIR filter design methodology. Second, coefficient calculation uses MATLAB to compute the filter coefficients. Third, coefficient optimization converts the floating-point coefficients into CSD code using a scaling strategy that minimizes hardware complexity, and then reduces the complexity further with a local search method. Next, word length estimation finds the minimum internal word length that still meets the SNR requirement. Finally, Verilog code generation produces synthesizable Verilog HDL code written at both the behavioral and RT levels for flexibility.
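The coefficient-optimization step relies on CSD coding, in which each digit is -1, 0, or +1 and no two adjacent digits are nonzero, so a shift-and-add multiplier needs fewer adders than with plain binary. A minimal sketch of the conversion, assuming simple rounding to a fixed number of fractional bits; the scaling strategy and local search described above are not reproduced here, and the function names are hypothetical.

```python
def to_csd(value, frac_bits):
    """Round value to frac_bits fractional bits and return its canonic signed
    digit (non-adjacent form) digits, least significant first.
    Each digit is -1, 0 or +1 and no two adjacent digits are nonzero."""
    n = round(value * (1 << frac_bits))   # fixed-point integer code
    digits = []
    while n != 0:
        if n % 2:              # odd: pick +1 or -1 so that (n - d) is a multiple of 4
            d = 2 - (n % 4)    # n % 4 == 1 -> +1, n % 4 == 3 -> -1
            n -= d
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits

def csd_value(digits, frac_bits):
    """Value represented by LSB-first CSD digits with frac_bits fractional bits."""
    return sum(d * 2 ** k for k, d in enumerate(digits)) / 2 ** frac_bits

def nonzero_count(digits):
    """Each nonzero digit beyond the first costs one adder or subtractor
    in a shift-and-add multiplier."""
    return sum(1 for d in digits if d)

digits = to_csd(0.46875, 5)   # 15/32: binary 0.01111 has four nonzero bits
print(digits, csd_value(digits, 5), nonzero_count(digits))
```

Here 15/32 is coded as 2^-1 - 2^-5, two nonzero digits instead of four, so the multiplier needs one subtractor instead of three adders.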
We have designed several filters with TSMC 0.25 μm standard cells. A 64-QAM baseband demodulator design shows that, for low-complexity applications, the area is reduced by a factor of about 1.64 and the power dissipation by a factor of about 1.95; for high-speed applications, the design can operate at 714 MHz. For IFIR filters meeting the specifications of the first version of the CDMA cellular system, the area is reduced by a factor of about 1.72 and the power dissipation by a factor of about 13.10 compared with the direct-form design. A multistage decimator for the CDMA cellular system reduces the area by a factor of up to about 3.56 and the power dissipation by a factor of up to about 1.96 compared with the conventional decimator (for the three-stage and two-stage versions, respectively). Finally, for a narrowband multistage interpolator, the area is reduced by a factor of about 3.06 and the power dissipation by a factor of about 1.36 compared with the conventional interpolator.
Because the generator requires only system-level specifications, system
designers who are inexperienced in VLSI design can use the module generator easily.
Furthermore, by using this module generator, an efficient design of a chip can be
successfully completed in a few minutes.


References

[1] P. P. Vaidyanathan, Multirate systems and filter banks, Englewood Cliffs, NJ: Prentice Hall, 1993.
[2] R. E. Crochiere and L. R. Rabiner, Multirate digital signal processing, Englewood Cliffs, NJ: Prentice Hall, 1983.
[3] Filter design toolbox user's guide, version 2, The MathWorks, Inc., 2004.
[4] CoCentric system studio filter design tools user guide, Synopsys, Inc., May 2002.
[5] M. Ishikawa et al., "Automatic layout synthesis for FIR filters using a silicon compiler," IEEE Int. Symp. Circuits Syst., pp. 2588-2591, May 1990.
[6] R. Jain, P. T. Yang, and T. Yoshino, "FIRGEN: A computer-aided design system for high performance FIR filter integrated circuits," IEEE Trans. Signal Processing, vol. 39, pp. 1655-1668, Jul. 1991.
[7] R. Hawley, T.-J. Lin, and H. Samueli, "A silicon compiler for high-speed CMOS multirate FIR digital filters," IEEE Int. Symp. Circuits Syst., vol. 3, pp. 1348-1351, May 1992.
[8] E. Bidet, C. Joanblanq, and P. Senn, "GENRIF: An integrated VLSI FIR filter compiler," Eur. Conf. Design Automation, pp. 466-471, Feb. 1993.
[9] G. Wacey and D. R. Bull, "POFGEN: A design automation system for VLSI digital filters with invariant transfer function," IEEE Int. Symp. Circuits Syst., pp. 631-634, May 1993.
[10] K. Y. Cheng, "Multiplierless multirate FIR digital filter / decimator / interpolator module generator," M.S. thesis, Dept. of EE, National Central Univ., Taiwan, Jun. 2003.
[11] G. W. Reitwiesner, "Binary arithmetic," Advances in Computers, vol. 1, New York: Academic, pp. 231-308, 1966.
[12] Y. Neuvo, C. Y. Dong, and S. K. Mitra, "Interpolated finite impulse response filters," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-32, pp. 563-570, Jun. 1984.
[13] T. Saramäki, Y. Neuvo, and S. K. Mitra, "Design of computationally efficient interpolated FIR filters," IEEE Trans. Circuits Syst., vol. 35, no. 1, Jan. 1988.
[14] M. Ishikawa et al., "Automatic layout synthesis for FIR filters using a silicon compiler," IEEE Int. Symp. Circuits Syst., pp. 2588-2591, May 1990.
[15] R. Jain, P. T. Yang, and T. Yoshino, "FIRGEN: A computer-aided design system for high performance FIR filter integrated circuits," IEEE Trans. Signal Processing, vol. 39, pp. 1655-1668, Jul. 1991.
[16] R. Hawley, T.-J. Lin, and H. Samueli, "A silicon compiler for high-speed CMOS multirate FIR digital filters," IEEE Int. Symp. Circuits Syst., vol. 3, pp. 1348-1351, May 1992.
[17] E. Bidet, C. Joanblanq, and P. Senn, "GENRIF: An integrated VLSI FIR filter compiler," Eur. Conf. Design Automation, pp. 466-471, Feb. 1993.
[18] G. Wacey and D. R. Bull, "POFGEN: A design automation system for VLSI digital filters with invariant transfer function," IEEE Int. Symp. Circuits Syst., pp. 631-634, May 1993.
[19] N. J. Fliege, Multirate digital signal processing: multirate systems, filter banks, wavelets, 1994.
[20] P. Ruetz, "The architectures and design of a 20-MHz real-time DSP chip set," IEEE JSSC, vol. 24, pp. 338-348, Apr. 1989.
[21] S.-Y. Wu, "Low-power multirate IF digital frequency down converter for wireless communication systems," M.S. thesis, Dept. of EE, National Central Univ., Taiwan, Jun. 1997.
[22] R. Hartley, "Subexpression sharing in filters using canonic signed digit multipliers," IEEE Trans. Circuits Syst. II, vol. 43, pp. 677-688, Oct. 1996.
[23] H. Samueli, "An improved search algorithm for the design of multiplierless FIR filters with powers-of-two coefficients," IEEE Trans. Circuits Syst., vol. 36, pp. 1044-1047, Jul. 1989.
[24] T.-J. Lin and H. Samueli, "A 200-MHz CMOS x/sin(x) digital filter for compensating D/A converter frequency response distortion in high-speed communication systems," IEEE GLOBECOM, vol. 3, pp. 1722-1726, Dec. 1990.
[25] R. A. Hawley, B. C. Wong, T.-J. Lin, J. Laskowski, and H. Samueli, "Design techniques for silicon compiler implementations of high-speed FIR digital filters," IEEE JSSC, vol. 31, pp. 656-667, May 1996.
[26] I.-C. Park and H.-J. Kang, "Digital filter synthesis based on an algorithm to generate all minimal signed digit representations," IEEE Trans. CAD of IC and Syst., vol. 21, pp. 1525-1529, Dec. 2002.
[27] A. P. Vinod, E. M.-K. Lai, A. B. Premkumar, and C. T. Lau, "FIR filter implementation by efficient sharing of horizontal and vertical common subexpressions," Electronics Letters, vol. 39, pp. 251-253, Jan. 2003.
[28] B. C. Wong and H. Samueli, "A 200-MHz all-digital QAM modulator and demodulator in 1.2-μm CMOS for digital radio applications," IEEE JSSC, vol. 26, pp. 1970-1979, Dec. 1991.
[29] R. Hartley, "Optimization of canonic signed digit multipliers for filter design," IEEE ISCAS, pp. 1992-1995, 1992.
[30] R. W. Mehler and D. Zhou, "Architectural synthesis of finite impulse response digital filters," Symp. Integrated Circuits Syst. Design, pp. 20-25, Sep. 2002.
[31] M. Bellanger, G. Bonnerot, and M. Coudreuse, "Digital filtering by polyphase network: application to sample rate alteration and filter banks," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-24, pp. 109-114, Apr. 1976.
[32] D. J. Shpak and A. Antoniou, "A generalized Remez method for the design of FIR digital filters," IEEE Trans. Circuits Syst., pp. 161-174, Feb. 1990.
[33] Y. C. Lim and S. R. Parker, "FIR filter design over a discrete powers-of-two coefficient space," IEEE Trans. Acoust., Speech, Signal Processing, pp. 583-591, Jun. 1983.
[34] X. Hu, L. S. DeBrunner, and V. DeBrunner, "An efficient design for FIR filters with variable precision," IEEE Int. Symp. Circuits Syst., vol. 4, pp. IV-365-IV-368, May 2002.
[35] D. Kodek and K. Steiglitz, "Comparison of optimal and local search methods for designing finite wordlength FIR digital filters," IEEE Trans. Circuits Syst., vol. 28, pp. 28-32, Jan. 1981.
[36] E. C. Ifeachor and B. W. Jervis, Digital signal processing: a practical approach, Addison-Wesley, 1993.
[37] DesignWare foundation library databook, Synopsys, Inc., Jan. 2002.
[38] S. J. Jou, C. H. Kuo, M. T. Shiau, J. Y. Heh, and C. K. Wang, "VLSI implementation of timing recovery and carrier recovery for QAM/VSB dual mode," International Symp. on VLSI Technology, Systems and Applications, Taipei, R.O.C., pp. 159-162, Jun. 1999.
[39] Design Compiler user guide, Synopsys, Inc., May 2002.
[40] The CDMA network engineering handbook, volume 1: concepts in CDMA, Qualcomm Inc., Mar. 1993.
[41] S.-J. Jou, S.-Y. Wu, and C.-K. Wang, "Low-power multirate architecture for IF digital frequency down converter," IEEE Trans. Circuits Syst. II, vol. 45, pp. 1487-1494, Nov. 1998.
