Multirate Multistage Digital FIR Filter / Decimator / Interpolator Module Generator
July 2004
Abstract
In this thesis, we present a module generator that automates the design of
high-speed, low-complexity multirate multistage digital FIR filters, decimators, and
interpolators. The generator exploits architectural symmetries in linear phase filters
and the multistage multirate interpolated FIR (IFIR) filter design methodology for
low complexity. In addition, the polyphase representation is used to decompose the
filter into subfilters. The resulting filters use canonic signed digit (CSD)
multipliers, a transposed direct form structure, and carry-save addition for high speed.
The generator requires only system-level specifications as input and can provide
three types of filter structure for different applications. Its output is synthesizable
Verilog code written at the behavioral level of the hardware description language
(HDL), which allows the synthesis tool to select the appropriate architecture from the
user's constraints. This tool can therefore eliminate the manual calculation, coding,
simulation, and verification time of the design cycle.
We have designed several filters with a TSMC 0.25 μm standard cell library. A
64-QAM baseband design example shows that, for low-complexity applications, the
area is reduced by a factor of about 1.64 and the power dissipation by a factor of
about 1.95. For high-speed applications, the chip can operate at 714 MHz. For IFIR
filters designed to the specification of the first version of CDMA cellular, the area is
reduced by a factor of about 1.72 and the power dissipation by a factor of about 13.10
compared with a direct form design. An example of a multistage decimator used in
CDMA cellular shows that the area is reduced by a factor of about 3.56 and the power
dissipation by a factor of about 1.96 compared with a conventional decimator.
Finally, for an example narrowband multistage interpolator, the area is reduced by a
factor of about 3.06 and the power dissipation by a factor of about 1.36 compared
with a conventional interpolator.
Contents
Chapter 1 Introduction
1.1 Introduction
1.3 Thesis Organization
2.2.2 CSD Multipliers
2.3.2 Common Subexpression
2.3.3 Pipelining
3.1.1 Decimation
3.1.2 Interpolation
System Specifications
4.3 Coefficient Calculation
4.4 Coefficient Optimization
4.4.1 Scaling Strategy
4.5.1 Overflow Prevention
4.7 Module Generator
5.3.1 Interpolator
5.3.2 Decimator
Chapter 6 Conclusions
References
List of Figures
Fig. 2.1
Fig. 2.2
Fig. 2.3
Fig. 2.4
Fig. 2.5
Fig. 2.6
Fig. 2.7
Fig. 2.8
Fig. 2.9
Fig. 2.10
Fig. 2.11
Fig. 3.1 (a) M-fold decimator. (b) Demonstration of decimation for M=2.
Fig. 3.2 Spectrum analysis of downsampling effect with M=2.
Fig. 3.3 (a) Block diagram of an M-to-1 decimator. (b) Typical magnitude response
of the decimation filter.
Fig. 3.4 (a) 1-to-L upsampler. (b) Demonstration of upsampling for L=2.
Fig. 3.5 Spectrum analysis of upsampling effect with L=2.
Fig. 3.6 (a) Block diagram of a 1-to-L interpolator. (b) Typical magnitude response
of the interpolation filter.
Fig. 3.7 The noble identities for multirate systems.
Fig. 3.8 Reconstruction of a decimator with M=2.
Fig. 3.9 Polyphase implementations of (a) M-fold decimator and (b) L-fold
interpolator.
Fig. 3.10 Time and frequency domain behaviors of IFIR low-pass filter with L=3.
Fig. 3.11 IFIR filter performance versus transition region bandwidth for p = 0.5,
p = 0.1 dB and p = 40 dB: (a) hardware reduction factor over
Fig. 3.12
Fig. 3.13
Fig. 3.14
Fig. 3.15
Fig. 4.1 Design flow of the module generator: (a) the digital FIR filter design flow,
(b) the multirate multistage digital FIR filter / decimator / interpolator
design flow.
Fig. 4.2 Specification of a lowpass filter.
Fig. 4.3 The number of taps versus interpolation factor L for the periodic model filter
G(z^L), the image suppressor I(z) and the overall filter H(z).
Fig. 4.4 In case (3), the decompositions of two-stage designs of I(z) for various
values of L1 and L2.
Fig. 4.5 Specifications of multistage IFIR decimation filter design.
Fig. 4.6 The flowchart of the scaling strategy for filter coefficients.
Fig. 4.7 SNR simulation block.
Fig. 4.8 Internal word length estimation flow chart.
Fig. 4.9 Transposed direct form filter structure with (a) carry save adders
(CSA) and (b) carry save adders (CSA) with pipelining.
Fig. 4.10 The Structure C with an input buffer.
Fig. 4.11 (a) The symmetric transposed direct form structure for G(z^L) with L-unit
delays between the taps; (b) the impulse response of H(z); (c) the impulse
response of G(z^L) with L=3.
Fig. 4.12 The symmetric transposed direct form structure for G(z^L) with dual clocks.
Fig. 4.13 (a) Direct form decimator with mirror symmetric filter pairs.
(b) Transposed direct form decimator with memory-saving technique.
(c) Direct form interpolator with memory-saving technique.
(d) Transposed direct form interpolator with mirror symmetric filter pairs.
Fig. 4.14 The operation flow of the module generator.
Fig. 5.1 The frequency responses of Work #1, Work #2 and Work #3.
Fig. 5.2 The frequency response of 64-QAM baseband demodulator.
Fig. 5.3 Frequency responses of the IFIR filters with single-stage I(z) and L = 4,
(a) I(z) of order 15 and G(z^4) of order 19,
(b) the overall IFIR filter and the conventional filter.
Fig. 5.4 Frequency responses of the IFIR filters with two-stage I(z) and L = 4,
(a) I1(z) of order 7, I2(z^2) of order 7 and G(z^4) of order 19,
(b) the overall IFIR filter and the conventional filter.
Fig. 5.5 The frequency responses of the subfilters of the multirate decimators and
conventional decimator. (a) the conventional decimator of order 69, the
subfilters of the multirate decimator with two-stage (b) I(z) of order 15
(c) G(z) of order 19, the subfilters of the multirate decimator with
three-stage (d) I1(z) of order 7 (e) I2(z) of order 7 (f) G(z) of order 19.
Fig. 5.6 The frequency responses of the conventional FIR and multirate IFIR filters.
Fig. 5.7 The frequency responses of the subfilters of the multirate interpolators and
conventional interpolator. (a) the conventional interpolator of order 85, the
subfilters of the multirate interpolator with two-stage (b) I(z) of order 10 (c)
G(z) of order 31.
Fig. 5.8 The frequency responses of the conventional FIR and multirate IFIR filters.
List of Tables
Table 2.1 Key features of the four linear phase FIR filter types.
Table 2.2 Common subexpressions of filter coefficients.
Table 3.1 Number of linear phase subfilters if prototype filter is linear phase.
Table 5.1
Table 5.2
Table 5.3
Table 5.4
Table 5.5
Table 5.6
Table 5.7
Table 5.8
Table 5.9
Table 5.10
Table 5.11
Table 5.12
Table 5.13
Chapter 1 Introduction
Chapter 1
Introduction
1.1 Introduction
Digital signal processing is an area of science and engineering that has
developed rapidly over the past 30 years. The applications of digital finite impulse
response (FIR) filters and up / down sampling DSP techniques are found everywhere
in modern electronic products such as multimedia, modems, and mobile personal
communications. For every electronic product, lower circuit complexity is always an
important design target since it reduces the cost. For portable applications such as
notebook computers or wireless personal communication systems, whose power
consumption must be small, a low-power low-complexity implementation is very
important. This is evident from the recent trend toward integrating a whole system on
a single chip (SoC).
Digital FIR filters are widely used in DSP applications. The trend towards
increasing data rates in DSP systems has pushed the development and implementation
environment and the Signal Processing Toolbox. The toolbox includes a number of
advanced filter design techniques that support designing, simulating, and analyzing
fixed-point and custom floating-point filters for a wide range of precisions.
However, it can handle single-rate filter design only.
model filter and an image suppressor by using IFIR filter design methodology.
However, it can decompose the filter into at most two stages; decomposition into
three or more stages is not supported.
analog and digital filters for use in system simulation and implementation.
This tool is a companion to the MRFD but is for use in problems in which
the goal is to change the sampling rate using filtering, decimation, and interpolation.
By using the three tools mentioned above, the following system capabilities can
be obtained:
Design of FIR filters.
Analysis of possible decimator or interpolator structures.
Analysis of multistage options for each decimation or interpolation factor.
Calculation of computational requirements for multistage structures.
Recommended multirate structure including number of stages.
Automatic generation of all filter specifications.
Automatically calls Parks-McClellan design algorithm for each filter design.
C code generation with multistage polyphase filters.
Optional comb / halfband filter design in multistage implementations.
For fixed filter implementations, it is necessary to create custom silicon
solutions for each application. The large number of applications for such
application-specific integrated circuits (ASICs) suggests that a silicon compiler
solution would be desirable [5]-[9]. However, logic synthesis is already a mature
technology, and the results from existing tools are generally accepted as satisfactory
circuits. Thus, we focus on the design process of multirate multistage digital FIR
filters / decimators / interpolators from system specifications to Verilog HDL code [10].
1.3 Thesis Organization
This thesis will describe various techniques by which sufficient parallelism for
Chapter 2
Digital FIR Filter Design
Digital filters play very important roles in DSP systems. Analog filter circuits
are usually very difficult to design, and their overall performance is very sensitive
to nonidealities such as dc offset voltage, dc voltage drift, and parasitic
components. Compared with analog filters, digital FIR filters can have a truly
linear phase response and very precise performance. A digital filter is also easily
programmable: programming the hardware to accommodate different data rates,
modulation formats and filter specifications makes the hardware requirements
relatively simple and compact in comparison with the equivalent analog circuitry.
$$ y(n) = \sum_{k=0}^{N-1} h(k)\, x(n-k) \qquad (2.1) $$
Eqn. (2.1) is the FIR difference equation. It is a time domain equation and describes
the FIR filter in its nonrecursive form: y(n) is the current output sample, which is a
function of the present and past values of the input x(n). N is the filter length, that
is, the number of filter coefficients. An alternative representation of the FIR filter
in the z-domain is given in Eqn. (2.2).
$$ H(z) = \sum_{k=0}^{N-1} h(k)\, z^{-k} \qquad (2.2) $$
where h(k), k = 0, 1, ..., N-1, are the impulse response coefficients of the filter and
H(z) is the transfer function of the filter. Several basic FIR filter structures are
discussed in detail in the next sections.
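As a minimal sketch of Eqn. (2.1), the difference equation can be evaluated directly in Python. The function name `fir_filter` and the example coefficients are illustrative assumptions, not part of the module generator; samples before n = 0 are assumed to be zero.

```python
def fir_filter(h, x):
    """Evaluate y(n) = sum_{k=0}^{N-1} h(k) x(n-k), assuming x(n) = 0 for n < 0."""
    N = len(h)  # filter length = number of coefficients
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(N):
            if n - k >= 0:  # skip samples before the start of the input
                acc += h[k] * x[n - k]
        y.append(acc)
    return y


# Hypothetical 3-tap coefficient set; an impulse input returns h(k) itself.
h = [0.25, 0.5, 0.25]
impulse_response = fir_filter(h, [1.0, 0.0, 0.0, 0.0, 0.0])
```

Feeding a unit impulse reproduces the coefficients h(k), which is the time-domain counterpart of reading them off H(z) in Eqn. (2.2).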
(2.3)
Fig. 2.1
where T_mul is the delay of the multiplier, T_add is the delay of a W_int-bit CPA, and
W_int is the internal word length. A tree-structured adder, as suggested by Reutz [20],
can perform the accumulation instead, in which case the critical path is
$$ T_{mul} + (\log_{1.5} N)\, T_{add} \qquad (2.4) $$
The delay of the filter then increases logarithmically with the filter tap length N.
Furthermore, the tree structure can use a carry-save adder (CSA) tree, a Wallace tree,
or a Dadda tree to eliminate the delay due to carry propagation.
Fig. 2.2 depicts the transposed direct form structure of the FIR filter, which is
obtained by repositioning the delay elements of the direct form structure [19]. In this
structure, the input is fed to every tap and the results are accumulated over N sample
periods. As shown in the block diagram, the system throughput rate is independent of
the tap length. The structure retains the regularity of the linear accumulation direct
form structure, and its critical path is only one multiplication and one addition, as
shown in Eqn. (2.5).
$$ T_{transposed} = T_{mul} + T_{add} \qquad (2.5) $$
Fig. 2.2
We can expect it to be faster than the tree structures used in the direct form
structure of the FIR filter. Such a short critical path also allows the system to
operate at a low supply voltage, which makes this solution very suitable for low-power
applications. Besides, it lends itself naturally to high-speed operation and pipelining.
One of the primary disadvantages of this structure is the large loading on the
input data-broadcasting bus, since all multipliers are fed in parallel. As the number
of taps increases, the input signal bus becomes longer and the load capacitance
grows. We can reduce this effect by using appropriate data buffers and by
distributing the input bus as a tree-like structure. Another disadvantage of this
structure is that the delay elements are larger, since they hold the accumulated sum
instead of the input signal. Furthermore, if we choose the CSA-based structures
introduced in the next section, the number of delay elements within the filter core
must be doubled.
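The behavior of the transposed direct form can be sketched in software as follows. This is an illustrative model under simple assumptions (integer data, registers initialized to zero); the names `fir_transposed` and `fir_direct` are hypothetical and only serve to show that both structures compute the same output.

```python
def fir_direct(h, x):
    """Reference: direct evaluation of the FIR difference equation."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def fir_transposed(h, x):
    """Transposed direct form: the input is broadcast to every tap and the
    delay registers hold accumulated partial sums instead of input samples."""
    N = len(h)
    regs = [0] * (N - 1)  # one register per delay element
    y = []
    for sample in x:
        products = [hk * sample for hk in h]  # all multipliers fed in parallel
        out = products[0] + (regs[0] if N > 1 else 0)
        # Each register takes its tap product plus the old value of the next
        # register, so the per-sample critical path is one multiply and one add.
        for i in range(N - 2):
            regs[i] = products[i + 1] + regs[i + 1]
        if N > 1:
            regs[N - 2] = products[N - 1]
        y.append(out)
    return y


h = [1, 2, 3]
x = [1, 0, 0, 0, 2]
y_transposed = fir_transposed(h, x)
y_direct = fir_direct(h, x)
```

Note that every register stores a partial sum, which is why the delay elements need the internal word length rather than the input word length.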
2.1.2
The multiplier and adder delays play an important role in determining the system
speed, as shown in Eqn. (2.5). The carry-propagation adder (CPA) is not a good
candidate for low-power or high-speed designs because its delay depends linearly on
the word length of the adder. It also generates many glitches before the real carry
propagates from the least significant bit (LSB) to the most significant bit (MSB), as
shown in Fig. 2.3 [21].
Fig. 2.3 [Figure: carry propagation from carry-in Cin through bit positions 0 to 11 of a carry-propagation adder]
To avoid the long critical path delay of the adder, the adder in each tap is
converted to a CSA, as shown in Fig. 2.4. In carry-save addition, both a sum bit and a
carry bit are produced at each bit position of the word, so the carry propagation
problem inside the adder is avoided. The carry-save scheme has a few drawbacks, the
most important being that the number of registers within the filter core must be
doubled. This increases the filter core area, but the system can achieve a higher
throughput rate or use a lower supply voltage. At the final stage of the filter, a single
high-speed CPA, the so-called vector merge adder (VMA), is required to sum the two
data path outputs into the final output.
Fig. 2.4
The critical path delay of the transposed direct form FIR filter is
$$ T_{FIR} = \max\{ T_{mul} + T_{CSA},\; T_{VMA} \} \qquad (2.6) $$
where T_VMA is the delay of the n-bit VMA. Obviously, the VMA delay will dominate
the system throughput rate, so a high-complexity high-speed adder such as a
carry-select adder or a carry-lookahead adder (CLA) may be used to reduce T_VMA.
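The idea of carry-save accumulation followed by a single vector merge addition can be illustrated with a small bit-level model. This is a sketch assuming non-negative integers; `carry_save_add` and `accumulate_carry_save` are hypothetical names, and Python's `+` stands in for the VMA.

```python
def carry_save_add(a, b, c):
    """3:2 compression of three non-negative integers.

    Every bit position is handled by an independent full adder, so no carry
    ripples along the word. Returns (sum_vector, carry_vector) such that
    a + b + c == sum_vector + carry_vector.
    """
    sum_vector = a ^ b ^ c                            # full-adder sum bits
    carry_vector = ((a & b) | (a & c) | (b & c)) << 1  # carries, weighted x2
    return sum_vector, carry_vector


def accumulate_carry_save(values):
    """Accumulate in carry-save form; propagate carries only once at the end."""
    s, c = 0, 0
    for v in values:
        s, c = carry_save_add(s, c, v)  # two words are kept per partial result
    return s + c                         # vector merge adder (one CPA)


total = accumulate_carry_save([3, 5, 7, 11])
```

Keeping both the sum and carry words between steps is the software analogue of the doubled register count mentioned above.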
2.1.3
In many filter applications, phase distortion cannot be tolerated, and thus the
filters are required to have a linear phase response. There are four types of linear
phase FIR filters, depending on whether N is even or odd and whether h(k) is
symmetric or anti-symmetric. Table 2.1 summarizes their key features.
Table 2.1 Key features of the four linear phase FIR filter types.
Type   Tap Length   Symmetry         H(0)        H(π)        Applications
I      odd          symmetric        arbitrary   arbitrary
II     even         symmetric        arbitrary   0           LP, BP
III    odd          anti-symmetric   0           0           differentiators, Hilbert transformers
IV     even         anti-symmetric   0           arbitrary   differentiators, Hilbert transformers
The symmetric structure can save about half of the coefficient multipliers by
sharing each multiplier between the two symmetric taps. This symmetry can be
exploited in both the direct form and the transposed direct form structures. Fig. 2.5
shows the linear phase transposed direct form structure, also called the symmetric
transposed direct form structure, which is adopted in our module generator. The
drawback of this symmetric structure is a slight increase in data path routing due
to the sharing of multipliers.
Fig. 2.5
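The multiplier saving from coefficient symmetry can be sketched as follows. For brevity this illustration folds a direct form filter (adding the two mirrored input samples before the shared multiplication) rather than modeling the transposed structure of Fig. 2.5; the function name `fir_symmetric` is an assumption made for this example.

```python
def fir_symmetric(h, x):
    """Symmetric FIR filter (h[k] == h[N-1-k]) using about N/2 multiplications
    per output sample: mirrored samples are added first, then multiplied once."""
    N = len(h)
    half = N // 2

    def sample(i):
        return x[i] if i >= 0 else 0  # zero initial conditions

    y = []
    for n in range(len(x)):
        acc = 0
        for k in range(half):
            # One shared multiplier serves taps k and N-1-k.
            acc += h[k] * (sample(n - k) + sample(n - (N - 1 - k)))
        if N % 2 == 1:
            acc += h[half] * sample(n - half)  # unpaired center tap
        y.append(acc)
    return y


odd_length = fir_symmetric([1, 2, 3, 2, 1], [1, 0, 0, 0, 0, 0])
even_length = fir_symmetric([1, 2, 2, 1], [1, 1, 1, 1, 1])
```

A 5-tap symmetric filter needs three multiplications per output here instead of five, and a 4-tap filter needs two instead of four.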
$$ h_{SPT}(n) = \sum_{k=1}^{D_n} s_{k,n}\, 2^{-p_{k,n}} \qquad (2.7) $$
where $s_{k,n} \in \{-1, 0, 1\}$ and $p_{k,n} \in \{1, \ldots, W\}$. The coefficient h_SPT(n) has D_n SPT terms
and a W-bit word length. In general, there are several equivalent SPT representations
for a given number. The minimum representation refers to a representation requiring
the minimum number of SPT terms, of which there may also be more than one
choice.
The CSD number representation is a ternary coded word with the minimum
number of nonzero digits (SPT terms).
The CSD representation of a number is unique [11], and an n-bit CSD word has at
most n/2 nonzero digits.
CSD numbers cover the range (-4/3, 4/3), of which the values in the
range [-1,1) are of greatest interest.
Among the W-bit CSD numbers in the range [-1,1), the expected number
of nonzero digits tends asymptotically to W/3 + 1/9 [22]. Hence, on average,
CSD numbers contain about 33% fewer nonzero digits than two's complement
numbers.
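A small sketch of CSD encoding may help make these properties concrete. It uses the standard non-adjacent-form recoding on integers; a fractional coefficient is assumed to have been scaled by a power of two and rounded beforehand. The helper names `to_csd` and `from_csd` are illustrative, not taken from the generator.

```python
def to_csd(value):
    """Return the CSD digits of an integer, least significant digit first.

    Each digit is -1, 0 or +1, and no two adjacent digits are both nonzero.
    """
    digits = []
    n = value
    while n != 0:
        if n % 2 == 0:
            digits.append(0)
        else:
            d = 2 - (n % 4)  # +1 when n = 1 (mod 4), -1 when n = 3 (mod 4)
            digits.append(d)
            n -= d           # the remainder is now divisible by 4
        n //= 2
    return digits


def from_csd(digits):
    """Rebuild the integer from its signed digits."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


# 7 = 0111 in binary needs three nonzero bits, but only two CSD digits: 8 - 1.
seven = to_csd(7)
```

Each nonzero digit corresponds to one shifted add or subtract, so fewer nonzero digits translate directly into fewer adders per coefficient.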
The drawback of the CSD representation is that the distribution of CSD coefficients
is not uniform [23], as shown in Fig. 2.6, which may cause serious quantization
errors. For a fixed number of nonzero digits and a fixed word length, the distribution
has many gaps in the region where the CSD value is above 0.5. Increasing the number
of nonzero digits at the same word length, or increasing the word length at the same
number of nonzero digits, reduces the gaps in the distribution for larger CSD values.
Since the distribution of CSD coefficients is not uniform, search strategies and
optimization algorithms are required to find the optimal CSD representation of the
original coefficients while still fulfilling the original specifications. These
will be discussed later in Section 4.4.
Fig. 2.6 Distribution of CSD coefficient set (a) with 2, 3 and 4 nonzero digits for
8-bit word length and (b) for 6-, 8- and 10-bit word length with 2 nonzero
digits.
(2.8)
where T_CSA and T_VMA are the delay times of the CSA and the VMA, respectively,
and $D_{max} = \max_n \{D_n\}$. The delay time of a CSA is only that of a one-bit full adder. The use
of carry save arithmetic takes full advantage of the CSD coefficients and reduces the
Fig. 2.7
In addition, the maximum number of SPT terms per coefficient, D_max, generally
determines the throughput limit of the filter. Therefore, the objective of the filter
design is to optimize the filter's frequency response while keeping the total number of
SPT terms to a minimum and keeping the number of SPT terms per coefficient
within a specified bound.
2.3
the common factors in the coefficient set to produce a nested generation of the
data-coefficient products. While this may simplify the generation of products and
reduce the loading on the data broadcast lines, it is coefficient dependent and
extremely irregular.
For the two's complement data format, the implications of this problem are
especially important for the MSB driver. For example, suppose that a 5-bit input data
word
x0 x1 x2 x3 x4
(2.9)
is multiplied by the CSD coefficient 2^-4. The input data must then be sign-extended
and shifted 4 bits to the right as
x0 x0 x0 x0 x0 x1 x2 x3 x4
(2.10)
to align it with the appropriate adder columns. Obviously, the MSB x0 must be
broadcast to five FAs, while each of the other bits needs to be broadcast to only one
FA. This leads to a far greater load capacitance on the MSB than on the other bits of
the input data bus, and longer chains of buffers are needed to drive the large load.
Furthermore, power consumption and chip area also increase due to these
driving circuits and wiring buses.
To solve this problem of large MSB load, a solution called MSB Fix was
applied [28]. To illustrate this principle, consider again the data word in
(2.10). One can rewrite this data word equivalently as
$$ x_0 x_0 x_0 x_0 x_0 x_1 x_2 x_3 x_4 = 0000\,\bar{x}_0 x_1 x_2 x_3 x_4 + 111110000 \qquad (2.11) $$
where $\bar{x}_0$ denotes the inverted MSB and the addition is taken modulo the word
length. According to this representation, the multiplied data word can be obtained as
the sum of the shifted data word and a constant vector. The FA loading of the MSB
driver is then the same as that of the non-MSB drivers, and the inverted MSB is
broadcast. Since the MSB Fix technique depends only on the value of the CSD shift,
the constant vectors of all taps of the filter can be summed together to form a
compensation vector (CV).
$$ CV = \sum_{n=0}^{N-1} CV_n \qquad (2.12) $$
This CV can be added to the first tap of filter as shown in Fig. 2.8.
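The MSB Fix identity can be checked exhaustively for the 5-bit example. The sketch below assumes a 9-bit result word, a shift of 4 bits, and a constant vector with ones in the five sign-extension positions (binary 111110000); the function names are hypothetical.

```python
WIDTH = 9                       # 5-bit word shifted right by 4 bit positions
MASK = (1 << WIDTH) - 1
CONSTANT = 0b111110000          # assumed constant vector for a shift of 4


def sign_extended(word5):
    """x0 x0 x0 x0 x0 x1 x2 x3 x4: the MSB drives five bit positions."""
    x0 = (word5 >> 4) & 1
    low = word5 & 0b1111
    return ((0b11111 if x0 else 0) << 4) | low


def msb_fixed(word5):
    """0000 ~x0 x1 x2 x3 x4 plus the constant vector, modulo 2**WIDTH.
    The inverted MSB now drives a single bit position only."""
    x0 = (word5 >> 4) & 1
    low = word5 & 0b1111
    inverted_word = ((1 - x0) << 4) | low
    return (inverted_word + CONSTANT) & MASK


all_match = all(sign_extended(w) == msb_fixed(w) for w in range(32))
```

Because the constant does not depend on the data, the constants of all taps can be pre-added offline, which is what the compensation vector of Eqn. (2.12) expresses.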
Fig. 2.8
(2.13)
(2.14)
This method can be implemented directly, as shown in Fig. 2.9. The figure shows a
25% hardware reduction with this scheme: 4 adders are reduced to 3 adders. As
another example, the common subexpressions (f1 and f2) of filter coefficients are
shown in Table 2.2.
Fig. 2.9
Table 2.2 Common subexpressions of filter coefficients.
Experimental results [30] show that a design with subexpression sharing has a longer
critical path than a design without sharing. Moreover, although subexpression sharing
can provide significant reductions in complexity and loading for some filters, it is
not suited to the polyphase structure that will be discussed later in Section 3.3.
Subexpression sharing is therefore not employed in this module generator; however,
it may become an option in a future version.
2.3.3 Pipelining
Architectures that adopt CSD multipliers and carry-save addition greatly reduce
the critical path of the filter. However, the critical path can be reduced further by
pipelining the structure. A pipeline stage of 2 to 3 FA delays can be achieved
by placing pipeline registers between the CSAs and the adders, as shown in Fig. 2.10.
The register cost per filter tap for bit-level pipelining is
$$ N_{reg,n} = \min\{ D_n + 2,\; 4 \}\, W_{int} \qquad (2.9) $$
where W_int is the internal word length of the filter. Pipelining down to a single FA
delay would require substantially more pipeline registers. Thus, only two-stage
pipelining is incorporated as an option in the module generator. The final
architecture for a four-digit CSD linear phase tap using carry-save addition is shown
in Fig. 2.11.
Fig. 2.10
Fig. 2.11
Chapter 3
Multirate Multistage
Digital FIR Filter Design
3.1
sampling rates are present. Such systems are used for audio and video processing,
communication systems, general digital filtering, transform analysis, and more. The
two basic operations in a multirate system are decreasing and increasing the sampling
rate of signals. The former is called decimation, or down-sampling. The latter is
called interpolation, or up-sampling.
3.1.1 Decimation
Fig. 3.1(a) shows the M-fold decimator, which takes an input sequence x(n) and
produces the output sequence
$$ y(n) = x(Mn) \qquad (3.1) $$
where M is an integer; y(n) is obtained by keeping every M-th sample of the input
signal x(n) and discarding all others. Fig. 3.1(b) demonstrates the idea for M=2. As
will be shown mathematically, decimation results in aliasing unless x(n) is
bandlimited in a certain way. In general, therefore, it may not be possible to recover
x(n) from y(n) if aliasing occurs.
Fig. 3.1 (a) M-fold decimator. (b) Demonstration of decimation for M=2.
For the M-fold decimator of Eqn. (3.1), the output spectrum Y(e^{jω}) can be
expressed in terms of X(e^{jω}) as
$$ Y(e^{j\omega}) = \frac{1}{M} \sum_{k=0}^{M-1} X\!\left(e^{j(\omega - 2\pi k)/M}\right) \qquad (3.2) $$
Fig. 3.2 demonstrates the spectrum analysis for M=2. As the figure shows, the
stretched version X(e^{jω/M}) may overlap with its shifted replicas. When this happens,
the input samples x(n) cannot be recovered from the decimated version y(n). This
overlap effect is called aliasing.
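The relation in Eqn. (3.2) has a finite-length counterpart that is easy to check numerically: for a length-N sequence with N divisible by M, the DFT of the decimated sequence equals the average of M shifted segments of the original DFT. The sketch below uses only the standard library; the test signal and the function names are arbitrary choices for illustration.

```python
import cmath


def dft(x):
    """Plain O(N^2) discrete Fourier transform."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def decimate(x, M):
    """y(n) = x(Mn): keep every M-th sample."""
    return x[::M]


def aliased_spectrum(X, M):
    """Average the M spectral replicas, the DFT analogue of Eqn. (3.2)."""
    K = len(X) // M
    return [sum(X[k + m * K] for m in range(M)) / M for k in range(K)]


x = [1, 3, -2, 4, 0, 5, 2, -1]   # arbitrary test signal, length divisible by M
M = 2
Y_direct = dft(decimate(x, M))
Y_from_replicas = aliased_spectrum(dft(x), M)
max_error = max(abs(a - b) for a, b in zip(Y_direct, Y_from_replicas))
```

The summation over replicas is exactly where overlapping spectral content gets mixed together, which is the aliasing described above.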
25
X(ej)
-2
Decimation
Y(ej)
aliasing
-2
Fig. 3.2
A lowpass digital filter called the decimation filter precedes the downsampler, as
shown in Fig. 3.3(a). This filter ensures that the input signal being decimated is
bandlimited. The exact band edges of the decimation filter depend on how much
aliasing is permitted. The simplest form of lowpass decimation filter has the magnitude
response sketched in Fig. 3.3(b). Typically, the cutoff frequency is set to π/M.
Fig. 3.3 (a) Block diagram of an M-to-1 decimator. (b) Typical magnitude response
of the decimation filter.
3.1.2 Interpolation
Fig. 3.4(a) shows a building block of an L-fold interpolator (or expander). By
inserting L-1 equally spaced zeros between each pair of samples, the device takes an
input x(n) and produces an output sequence
$$ y(n) = \begin{cases} x(n/L), & \text{if } n \text{ is an integer multiple of } L \\ 0, & \text{otherwise.} \end{cases} \qquad (3.3) $$
Fig. 3.4 (a) 1-to-L upsampler. (b) Demonstration of upsampling for L=2.
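Taking the z-transform of Eqn. (3.3), the output sequence y(n) of the interpolator
can be written as follows.
$$
\begin{aligned}
Y(z) &= \sum_{n=-\infty}^{\infty} y(n)\, z^{-n} = \sum_{n = \text{mul. of } L} y(n)\, z^{-n} \\
     &= \sum_{k=-\infty}^{\infty} y(kL)\, z^{-kL} = \sum_{k=-\infty}^{\infty} x(k)\, z^{-kL} \\
     &= X(z^L). \qquad (3.4)
\end{aligned}
$$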
Fig. 3.5 Spectrum analysis of upsampling effect with L=2.
Fig. 3.6 (a) Block diagram of a 1-to-L interpolator. (b) Typical magnitude response
of the interpolation filter.
From Eqn. (3.4), we find that Y(e^{jω}) = X(e^{jωL}). This means that Y(e^{jω}) is an
L-fold compressed version of X(e^{jω}), as demonstrated in Fig. 3.5 for L=2. The multiple
copies of the compressed spectrum are the images created by the interpolation process.
An interpolation filter follows the interpolator to suppress these unwanted images,
as shown in Fig. 3.6.
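The image-creation property can also be checked numerically. In the sketch below (standard library only; names and test data are illustrative), the DFT of the zero-stuffed sequence is simply the original DFT repeated L times, which is the sampled form of Y(e^{jω}) = X(e^{jωL}).

```python
import cmath


def dft(x):
    """Plain O(N^2) discrete Fourier transform."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def upsample(x, L):
    """Insert L-1 zeros between consecutive samples, as in Eqn. (3.3)."""
    y = [0] * (len(x) * L)
    y[::L] = x
    return y


x = [1, 2, 3, 4]   # arbitrary test signal
L = 2
X = dft(x)
Y = dft(upsample(x, L))
# Every bin of Y should equal the bin of X at the same index modulo len(x).
max_error = max(abs(Y[k] - X[k % len(x)]) for k in range(len(Y)))
```

The repeated copies beyond the first are the images that the interpolation filter has to remove.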
3.2
decimator, and in Fig. 3.7(c), where a filter H(z) precedes an interpolator. Such
interconnections arise when we try to use the polyphase representation (Section 3.3)
for decimation and interpolation filters. If the function H(z) is rational (i.e., a ratio of
polynomials in z or z^-1), then we can redraw Fig. 3.7(a) as in Fig. 3.7(b) and Fig. 3.7(c)
as in Fig. 3.7(d). These are called the noble identities [1]. Their proofs are shown
below.
Fig. 3.7 The noble identities for multirate systems. Identity 1: panels (a) and (b),
with outputs y1(n) and y2(n). Identity 2: panels (c) and (d), with outputs y3(n) and
y4(n).
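$$
\begin{aligned}
Y_2(z) &= \frac{1}{M} \sum_{k=0}^{M-1} X(z^{1/M} W^k)\, H\!\left((z^{1/M} W^k)^M\right) \\
       &= \frac{1}{M} \sum_{k=0}^{M-1} X(z^{1/M} W^k)\, H\!\left(z^{M \cdot \frac{1}{M}}\, e^{-j 2\pi k M / M}\right) \\
       &= \frac{1}{M} \sum_{k=0}^{M-1} X(z^{1/M} W^k)\, H\!\left(z\, e^{-j 2\pi k}\right) \\
       &= \frac{1}{M} \sum_{k=0}^{M-1} X(z^{1/M} W^k)\, H(z) \\
       &= Y_1(z), \qquad W = e^{-j 2\pi / M}. \qquad (3.5)
\end{aligned}
$$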
Eqn. (3.5) shows that Y2(z) is equal to Y1(z). Also, consider that
$$ Y_4(z) = H(z^L)\, X'(z) = H(z^L)\, X(z^L) = Y_3(z) \qquad (3.6) $$
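Both identities can be confirmed in the time domain for FIR filters with a few lines of Python. This is only a numerical illustration with arbitrary integer data; `convolve`, `expand`, `downsample` and `upsample` are helper names introduced here.

```python
def convolve(a, b):
    """Full linear convolution of two finite sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out


def expand(h, M):
    """Coefficients of H(z^M): M-1 zeros inserted between the taps of H(z)."""
    out = [0] * ((len(h) - 1) * M + 1)
    out[::M] = h
    return out


def downsample(x, M):
    return x[::M]


def upsample(x, L):
    y = [0] * (len(x) * L)
    y[::L] = x
    return y


x = [1, -2, 3, 0, 4, 1, -1, 2, 5]
h = [1, 2, 1]

# Identity 1: decimate then H(z)  ==  H(z^M) then decimate.
y1 = convolve(downsample(x, 3), h)
y2 = downsample(convolve(x, expand(h, 3)), 3)

# Identity 2: H(z) then expand  ==  expand then H(z^L).
y3 = upsample(convolve(x, h), 2)
y4 = convolve(upsample(x, 2), expand(h, 2))
```

In both cases the left-hand arrangement runs the filter at the lower sampling rate, which is why the identities matter for efficient implementations.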
3.3
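Separating $\sum_{n=-\infty}^{\infty} h(n) z^{-n}$
into odd numbered part and even numbered part, i.e., H(z) can be written as
$$ H(z) = \sum_{n=-\infty}^{\infty} h(n)\, z^{-n} = \sum_{n=-\infty}^{\infty} h(2n)\, z^{-2n} + z^{-1} \sum_{n=-\infty}^{\infty} h(2n+1)\, z^{-2n} \qquad (3.7) $$
If we define
$$ H_0(z) = \sum_{n=-\infty}^{\infty} h(2n)\, z^{-n}, \qquad H_1(z) = \sum_{n=-\infty}^{\infty} h(2n+1)\, z^{-n} \qquad (3.8) $$
then H(z) can be expressed as
$$ H(z) = H_0(z^2) + z^{-1} H_1(z^2) \qquad (3.9) $$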
Fig. 3.8 Reconstruction of a decimator with M=2.
This representation can be implemented directly. Fig. 3.8 shows an
example of this reconstruction for a decimator with M=2. The polyphase
implementation (Fig. 3.8(c)) is much more efficient than the direct implementation
shown in Fig. 3.8(a). Although the additional downsampler introduces some hardware
overhead, H0(z) and H1(z) operate at the lower rate. Each of them requires only
N/2 multiplications and (N-1)/2 additions per unit time, compared with the N
multiplications and (N-1) additions per unit time that the direct implementation
needs. Here, N is the tap length of the decimation filter H(z).
The polyphase representation can also be used to implement interpolators.
Fig. 3.9 shows the general form of the polyphase implementation of an
M-fold decimator and an L-fold interpolator.
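A two-branch polyphase decimator can be sketched as follows and compared against filtering at the full rate followed by downsampling. The sketch assumes M = 2, integer data and zero initial conditions; `fir` and `polyphase_decimate_by_2` are hypothetical names.

```python
def fir(h, x):
    """Causal FIR filtering that returns len(x) output samples."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def polyphase_decimate_by_2(h, x):
    """Decimate by 2 with subfilters H0 (even taps) and H1 (odd taps),
    both running at the low output rate."""
    h0, h1 = h[0::2], h[1::2]
    x_even = x[0::2]                           # x(2n)
    x_odd = ([0] + x[1::2])[:len(x_even)]      # x(2n-1), with x(-1) = 0
    branch0 = fir(h0, x_even)
    branch1 = fir(h1, x_odd)
    return [a + b for a, b in zip(branch0, branch1)]


h = [1, 2, 3, 4, 5]
x = [1, -1, 2, 0, 3, 1, 4]
reference = fir(h, x)[::2]        # filter at the high rate, then discard half
polyphase = polyphase_decimate_by_2(h, x)
```

The reference computes every high-rate output and throws half of them away, whereas the polyphase form never computes the discarded samples.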
Fig. 3.9 Polyphase implementations of (a) M-fold decimator and (b) L-fold
interpolator.
However, decomposing a linear phase filter, which has symmetric coefficients,
into polyphase subfilters usually destroys the symmetry of the subfilters. It may
therefore increase hardware complexity compared with the original symmetric filter
implemented without the polyphase representation. Since the decomposition is
accomplished by taking every M-th coefficient of the original impulse response, only
those subfilters whose coefficients are sampled symmetrically about the center tap are
linear phase; the other subfilters are not. At most two subfilters are linear phase, as
summarized in Table 3.1 [25]. The remaining nonlinear phase subfilters cannot use the
folded structure and require a larger number of multipliers. Therefore, when the
sampling rate conversion ratio (SRCR) is even, a filter with even tap length N can be
redesigned with length N+1 to obtain two linear phase subfilters and reduce the
hardware complexity.
Table 3.1 Number of linear phase subfilters if prototype filter is linear phase
Filter Length   Sampling Rate Conversion Ratio   Number of Linear Phase Subfilters
Even            Even                             0
Even            Odd                              1
Odd             Odd                              1
Odd             Even                             2
3.4
linear phase response and less quantization error. Its main drawback is the large
number of arithmetic operations needed in implementation, especially for filters
with a narrow transition band. To cope with the computational complexity of
sharp narrowband FIR filters, the interpolated FIR (IFIR) filter technique was
introduced [12]. Its basic idea is to implement the filter H(z) as a cascade of
two FIR sections:
$$H(z) = G(z^L)\, I(z) \tag{3.10}$$
where $G(z^L)$ is a periodic model filter whose impulse response is sparse, with only
every L-th sample being nonzero, and I(z) is an image suppressor which can be
implemented with only a few arithmetic operations.
In the frequency domain, $G(z^L)$ has a periodic frequency response with period
$2\pi/L$ and is designed to perform passband, transition band and stopband shaping in
the vicinity of the passband, while I(z) is designed to attenuate the unwanted
passbands created by $G(z^L)$. If $\delta_p$ denotes the passband deviation and
$\delta_s$ denotes the stopband deviation, the overall IFIR filter must meet the
requirements of
$$1 - \delta_p \le |G(z^L)\, I(z)| \le 1 + \delta_p \quad \text{in the passband} \tag{3.11}$$
and
$$|G(z^L)\, I(z)| \le \delta_s \quad \text{in the stopband.} \tag{3.12}$$
Time and frequency domain behaviors of the IFIR approach used on a low-pass filter
design with L=3 are illustrated in Fig. 3.10.
Fig. 3.10 Time and frequency domain behaviors of IFIR low-pass filter with L=3.
Considering the image suppressor I(z), it can also generally be implemented
as a multistage structure and can be expressed as [13]:
$$I(z) = I_1(z)\, I_2(z^{L_1})\, I_3(z^{L_1 L_2}) \cdots I_k(z^{L_1 L_2 \cdots L_{k-1}}) \tag{3.13}$$
$$I_k(z) = \sum_{n=0}^{N_{I_k}} i_k(n)\, z^{-n} \tag{3.14}$$
where
$$L_{k-1} = \frac{L}{L_1 L_2 \cdots L_{k-2}} \tag{3.15}$$
is an integer. If the stopband edge frequency of the low-pass IFIR filter is denoted by
$\omega_s$, the maximum value for the interpolation factor L is
$$L_{max} = \left\lfloor \frac{\pi}{\omega_s} \right\rfloor \tag{3.16}$$
where the brackets denote truncation. As an example of a two-stage implementation,
we take the IFIR filter performance versus transition region bandwidth for p = 0.5,
a passband ripple of 0.1 dB and a stopband attenuation of 40 dB. Fig. 3.11(a) shows
the reduction factor of the IFIR filter design over the conventional filter design,
which is
$$SF = \frac{N_{CON}}{N_{IFIR}}, \quad N_{IFIR} = N_I + N_G \tag{3.17}$$
where $N_{CON}$ is the order of the conventional filter, $N_I$ is the order of I(z) and
$N_G$ is the order of G(z). Fig. 3.11(b) shows the optimum interpolation factors versus
transition region bandwidth. In this case, for a filter with a narrow transition band,
a higher interpolation factor L gives a higher reduction factor SF; the maximum SF can
reach 6.
In [12], the design of IFIR filters was based on the use of simple interpolators.
In this simple case, the image suppressor I(z) is a simple lowpass filter; this is
the most robust and the fastest design. In further optimized IFIR designs [13], I(z)
is designed with a don't-care region where the periodic model filter $G(z^L)$ already
provides the required attenuation, as shown in Fig. 3.12, which leads to fewer
coefficients for I(z). Another, more involved, advanced IFIR design jointly optimizes
$G(z^L)$ and I(z) in order to achieve the required specification. The result is a
significant saving in the order of I(z) and a slighter saving in the order of
$G(z^L)$. Matlab has a useful function called ifir which provides the baseline and the
advanced design approach. Obviously, the advanced IFIR design method gives the fewest
coefficients, leading to the fewest multipliers and the lowest hardware complexity.
However, the maximum value of the final coefficients may exceed 1, and the
coefficient range is larger than with the simplest design method. This would be a
problem when the filter is realized with finite precision. Our module generator will
give more nonzero digits or use scaling to compress the range of the coefficients to
prevent the filter coefficients from overflowing.
Fig. 3.11 (a) Reduction factor SF versus transition region bandwidth (×π rad/sample)
for interpolation factors L = 2 to 8; (b) optimum interpolation factors versus
transition region bandwidth.
Fig. 3.12
By carefully selecting the interpolation factor L and the number of stages, and by
choosing the best method to implement the interpolator, an optimum IFIR filter design
with minimum hardware complexity can be found. The price paid for these
reductions is only a slight increase in the number of delay elements compared with
the direct implementation. In addition, the IFIR implementation gives smaller
coefficient sensitivity and better roundoff noise than the direct implementation [13].
3.5
with a large decimation / interpolation ratio. Although this can be done by designing a
filter directly and using the polyphase structure to save the arithmetic operations, it is
more efficient to design in multiple stages [1][2], and the IFIR technique is still
applicable.
Considering a decimator shown in Fig. 3.13(a), the lowpass filter H(z) will be a
narrow band case as the decimation ratio M becomes large. The IFIR technique can
be used to reduce the hardware complexity of H(z).
Fig. 3.13
Fig. 3.14 [Derivation of the L-fold interpolator structure: (a) original interpolator
H(z); (b) IFIR technique; (c) noble identity; (d) polyphase decomposition.]
If we carefully choose the interpolation factor L of the periodic model filter $G(z^L)$
to be M1, as shown in Fig. 3.13(b), the structure of the decimator can be rearranged
into Fig. 3.13(c) using the noble identity. With this structure, the decimator is
divided into two sections, and both of them can be implemented by the polyphase
representation with fewer filter coefficients, resulting from the image suppressor
I(z) and the model filter G(z), as shown in Fig. 3.13(d). In addition, the
interpolator can be designed in the same way, as shown in Fig. 3.14.
Furthermore, the multistage IFIR decimator / interpolator structure can also be
extended to three stages or more. Fig. 3.15 shows the derivation of the structure with
three-stage decomposition.
Fig. 3.15 [Derivation of the three-stage IFIR decimator structure.]
Chapter 4
Module Generator Implementation
The design flow of the module generator and the program implementation
issues will be discussed in this chapter. The system configuration and dataflow of the
module generator are shown in Fig. 4.1. The module generator consists of many
sub-modules. The main sub-modules are the multistage architecture analysis and
synthesis, the coefficient calculation, the coefficient optimization, the word length
estimation and the synthesizable Verilog code generation. All modules are written in
C++ language and the operation of each module will be described in the following
sections.
4.1
Specifications
The inputs of our module generator are the system-level specifications, which
and
$$A_P = -20 \log_{10}(1 - \delta_P) \tag{4.1}$$
$$A_S = -20 \log_{10} \delta_S \tag{4.2}$$
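As a quick numerical illustration of Eqn. (4.1) and Eqn. (4.2), the sketch below converts linear ripple values into decibels. The function names are hypothetical, and the sign convention (positive dB values) is an assumption consistent with the dB figures quoted later in the text.

```python
import math


def passband_ripple_db(delta_p):
    """A_P in dB for a linear passband deviation delta_p (Eqn. 4.1)."""
    return -20.0 * math.log10(1.0 - delta_p)


def stopband_attenuation_db(delta_s):
    """A_S in dB for a linear stopband deviation delta_s (Eqn. 4.2)."""
    return -20.0 * math.log10(delta_s)
```

With these definitions a stopband deviation of 0.001 corresponds to about 60 dB of attenuation.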
Fig. 4.1
Design flow of the module generator (a) the digital FIR filter design flow
(b) the multirate multistage digital FIR filter / decimator / interpolator
design flow.
Table 4.1
Filter Type (LP, HP, BP, BS, Decimator, Interpolator, Multistage FIR)
Tf
P , S
P / AP
S / AS
Win
SNR
Up Conversion Ratio
Fig. 4.2 [Filter specification: amplitude versus normalized frequency (0 to Fs/2),
showing the passband ripple (1+δP, 1-δP), the transition width and the stopband
ripple (δS, -δS).]
4.2
Table 4.2
Case | ωp (×π) | ωs (×π) | δp | δs
I | 0.05 | 0.1 | 0.01 | 0.001
II | 0.09 | 0.1 | 0.01 | 0.001
III | 0.01 | 0.02 | 0.01 | 0.001
Fig. 4.3 shows the total tap requirements in these three cases to implement I(z),
$G(z^L)$ and the overall filter as a function of the interpolated factor L for the
single-stage implementation of I(z). The interpolated factor L = 1 corresponds to the
conventional direct form FIR filter. As shown in these figures, the IFIR filters
provide significant reductions in the number of taps over conventional direct form
designs. As L increases, the number of taps of $G(z^L)$ decreases exponentially and the
number of taps of I(z) increases exponentially. We can increase L until the decrease
in the number of taps of $G(z^L)$ becomes smaller than the increase in the number of
taps of I(z), at which point the minimum total number of taps of the overall filter
is obtained. The maximum interpolated factor is limited to
$L_{max} = \lfloor \pi / \omega_s \rfloor$, and Lmax for cases I, II and III is 10, 10
and 50, respectively. Comparing the results for case I and case II, it is observed
that as the relative transition bandwidth is made smaller while keeping the same
stopband edge, the optimum value Lopt of the interpolated factor L becomes larger. As
for case II and case III, if the transition width is the same, the one with the
smaller stopband edge will have a relatively large tap contribution from I(z), and
Lopt/Lmax will decrease. However, as the absolute value of Lopt increases, it results
in a larger saving in the number of arithmetic operations.
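The trade-off between the taps of $G(z^L)$ and I(z) can be explored with a rough estimate. The sketch below is not the generator's analysis: it assumes Kaiser's empirical length formula, assumes the passband ripple is split equally between the two sections, and assumes the first image of $G(z^L)$ fixes the stopband edge of I(z) at 2π/L - ωs. The function names are hypothetical and band edges are given in units of π rad/sample.

```python
import math


def kaiser_taps(delta_p, delta_s, wp, ws):
    """Kaiser's empirical FIR length estimate (edges in units of pi)."""
    delta_f = (ws - wp) / 2.0  # transition width divided by 2*pi
    atten = -20.0 * math.log10(math.sqrt(delta_p * delta_s))
    return (atten - 13.0) / (14.6 * delta_f) + 1.0


def ifir_estimate(delta_p, delta_s, wp, ws):
    """Return (L_max, {L: estimated total taps}) for single-stage I(z)."""
    l_max = math.floor(1.0 / ws + 1e-9)  # floor(pi / omega_s), guarded
    totals = {1: kaiser_taps(delta_p, delta_s, wp, ws)}  # L = 1: direct form
    for L in range(2, l_max + 1):
        image_edge = 2.0 / L - ws  # where the first image of G(z^L) begins
        if image_edge <= wp:
            continue
        n_g = kaiser_taps(delta_p / 2.0, delta_s, L * wp, L * ws)
        n_i = kaiser_taps(delta_p / 2.0, delta_s, wp, image_edge)
        totals[L] = n_g + n_i
    return l_max, totals
```

A call such as `ifir_estimate(0.01, 0.001, 0.05, 0.1)` uses values assumed to correspond to case I; the interpolated factor that minimizes the returned totals plays the role of Lopt in this approximation.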
Fig. 4.3 The number of taps versus interpolated factor L for the periodic model filter
$G(z^L)$, the image suppressor I(z) and the overall filter H(z), for Case I, Case II
and Case III.
Fig. 4.4 shows the total tap requirements in case III to implement I(z), $G(z^L)$
and the overall filter as a function of the interpolated factor L (= L1 L2) for the
two-stage implementation of I(z) (= $I_1(z) I_2(z^{L_1})$). Comparing the single-stage
and two-stage implementations of I(z) in case III, the two-stage implementation
significantly reduces the number of taps of the overall filter relative to the
single-stage implementation. This is because the two-stage implementation of I(z)
requires considerably fewer taps for I(z) and the optimum decomposition occurs at a
high value of L (= L1 L2), which also decreases the number of taps of $G(z^L)$. When
the single-stage implementation of I(z) has very few taps, the multistage
implementations of I(z) provide only a slight saving over the single-stage
implementation. Table 4.3 summarizes the optimal IFIR filters with single-stage and
two-stage implementations of I(z) in these three cases. We observe that implementing
a filter with a narrow passband and a narrow transition band using the IFIR method
significantly reduces the total number of taps of the overall filter. Using the
two-stage implementation of I(z) further reduces the total number of taps compared
with the one-stage implementation of I(z). The analysis of the optimal IFIR filters
for cases I to III by our module generator costs 12% additional taps compared with
[13]. This is because our module generator decomposes the IFIR filters using more
stringent specifications to guarantee that the final design satisfies the system
specification, so the overall design requires more taps.
Fig. 4.4 [Tap number versus interpolated factors L1 and L2 for $G(z^{L_1 L_2})$,
$I_1(z)$, $I_2(z)$ and H(z).]
Table 4.3
Case | NCON | NH | NI | NG | L | SF
I | 103 | 43 | 15 | 28 | | 2.53
II | 510 | 126 | 39 | 87 | | 4.09
III | 510 | 88 | 41 | 47 | 11-13 | 5.79
III (two-stage I(z)) | 510 | 60 | (NI1=15; NI2=16) | 29 | 20 (L1=4; L2=5) | 8.5
Note: The IFIR decomposition analysis of our module generator will select several
decomposition methods that have the minimum number of taps of the overall filter H(z)
to implement the multistage designs.
$$M = \prod_{k=1}^{K} M_k \tag{4.3}$$
to decompose the decimator / interpolator into all the possible multistage sets. For a
K-stage decimator / interpolator, the filter specification for each stage shall be
chosen to ensure that the overall filter requirements are met, as shown in Fig. 4.5,
where the passband ripple is $\delta_P / K$ and the stopband ripple is $\delta_S$.
Fig. 4.5
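The enumeration of multistage sets and the per-stage ripple allocation can be sketched as follows. This is only an illustration under the assumption that every stage ratio is an integer of at least 2; the function names are hypothetical.

```python
def stage_factorizations(M):
    """All ordered factorizations of M into integer stage ratios >= 2."""
    if M == 1:
        return [[]]
    out = []
    for f in range(2, M + 1):
        if M % f == 0:
            for rest in stage_factorizations(M // f):
                out.append([f] + rest)
    return out


def per_stage_ripple(delta_p, delta_s, K):
    """Each of the K stages gets delta_p / K; the stopband ripple is kept."""
    return delta_p / K, delta_s
```

For a decimation ratio of 8 this yields the single-stage set [8], the two-stage sets [2, 4] and [4, 2], and the three-stage set [2, 2, 2].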
Moreover, the polyphase decomposition is used to decompose the filter into
subfilters. In order to support both high-speed and low-speed applications, the
transposed direct form structure is chosen. However, the decomposition of the linear
phase filter into M subfilters will usually destroy the symmetry of the subfilters
and result in nonlinear phase subfilters. Therefore, in our module generator, when the
SRCR is even, a filter with even tap length N can be redesigned with N+1 taps to
obtain two more linear phase subfilters and reduce the hardware complexity.
4.3
Coefficient Calculation
The floating-point filter coefficient h(k) is generated by the Parks-McClellan
optimal equiripple method as given in the MATLAB gremez.m function [32]. If the
coefficients do not satisfy the desired filter specifications, the filter order is increased
and coefficients are calculated again. In addition, the user can input the coefficients
derived from other filter analysis packages.
4.4
Coefficient Optimization
The simple rounding of a filter's floating-point coefficients to their nearest CSD
The two most popular techniques for CSD coefficients optimization are
mixed-integer-linear-programming (MILP)[33] and local search [23].
MILP is known to be the optimal technique for designing FIR filters employing
conventional fixed word length coefficients. A drawback of MILP is that its
computation time grows at least exponentially with filter length and this limits its
application to the design of filters having short to medium length. However, for filters
with CSD coefficients, even though MILP optimally searches the SPT coefficient
space, it does not guarantee that the solution produced has the minimal total number
of adders. Obviously, the two major goals of a CSD search algorithm are:
(1) a filter that can be implemented with minimum hardware requirement, and
(2) to minimize the computation time in such a design procedure.
The local search techniques have been found to perform nearly as well as the MILP
method while requiring substantially less computational time for their convergence.
According to the methods presented in [23][34], we adopt a two-step local search
algorithm to round and optimize our filter coefficient with CSD codes as discussed in
the following two sections.
easily compensated by a constant gain stage before or after the filter system. The set
of numbers represented by a CSD code with a fixed number of nonzero digits is not
uniformly distributed. Therefore, properly scaling the ideal filter coefficients prior to
rounding them to the nearest CSD code can usually significantly reduce the
magnitudes of the coefficient quantization errors, which means an improved
frequency response. Fig. 4.6 shows the flowchart of the scaling strategy for filter
coefficients.
Since the coefficient quantization process is highly nonlinear, there is no way to
predict in advance which SF will produce better results. Therefore, a brute-force
search of SF must be performed. All the filter coefficients are assumed to be in the
range [-0.5, 0.5]. The choice of the SF can then be constrained to a range such that
the SF is not greater than the value SF_max and not less than the value SF_min.
The limits SF_max and SF_min are defined as follows: multiplying by SF_max
makes the absolute value of the largest coefficient equal to $2^{-1}$; multiplying by
SF_min makes the absolute value of the largest coefficient equal to $2^{-2}$. During
the search procedure the SF changes from SF_min to SF_max with a step size of
$2^{-q}$, where q is the coefficient wordlength.
For each SF, the frequency response is computed only if the quantized CSD
coefficient set is different from the previous one. Finally, we select the SF which
results in the minimum total number of SPT terms ($\sum_n D_n$) and fulfills the
filter specification.
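The scaling search described above can be sketched as follows. Several parts are assumptions rather than the generator's method: a greedy signed-power-of-two rounding stands in for nearest-CSD rounding, and a simple bound `tol` on the coefficient error replaces the frequency response check against the specification. The names `spt_round` and `search_scale` are hypothetical.

```python
import math


def spt_round(x, max_digits, q):
    """Greedily round x to at most max_digits signed powers of two >= 2**-q."""
    value, digits, residual = 0.0, 0, x
    for _ in range(max_digits):
        if abs(residual) < 2.0 ** -(q + 1):
            break  # remaining error is below half an LSB
        e = math.floor(math.log2(abs(residual)))
        low, high = 2.0 ** e, 2.0 ** (e + 1)
        term = low if abs(residual) - low <= high - abs(residual) else high
        term = math.copysign(max(term, 2.0 ** -q), residual)
        value += term
        residual -= term
        digits += 1
    return value, digits


def search_scale(h, max_digits, q, tol):
    """Brute-force SF search; returns (sf, total_digits) or None."""
    peak = max(abs(c) for c in h)
    sf_min, sf_max = 0.25 / peak, 0.5 / peak  # largest tap maps to 2^-2 .. 2^-1
    step = 2.0 ** -q
    best = None
    for i in range(int((sf_max - sf_min) / step) + 1):
        sf = sf_min + i * step
        rounded = [spt_round(c * sf, max_digits, q) for c in h]
        error = max(abs(v / sf - c) for (v, _), c in zip(rounded, h))
        total = sum(d for _, d in rounded)
        # Stand-in acceptance test: coefficient error instead of the spec check.
        if error <= tol and (best is None or total < best[1]):
            best = (sf, total)
    return best
```

In a fuller implementation the acceptance test would evaluate the passband and stopband ripple of the quantized filter instead of the coefficient error.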
Fig. 4.6 [Flowchart of the scaling strategy for filter coefficients; the number of
search steps is (SF_max - SF_min) / $2^{-q}$.]
$$2S + 4 \cdot \frac{S(S-1)}{2} = 2S^2 \tag{4.4}$$
coefficient sets are searched. The local search process proceeds in an iterative manner.
After the search cycle is completed, the coefficient sets whose frequency response fit
the filter specification are selected and the bi-variate local search is repeated with the
new coefficient sets. This process continues until no further improvement is obtained.
4.5
4.5.1
Overflow Prevention
If the final output is within the range of the original word length, overflow in
the partial sums is unimportant. This is a desirable property of 2's complement
arithmetic. However, if the final output exceeds the range of the signal, the value of
the output sample will be wrong and measures should be taken to prevent this.
An
(4.5)
where
$$R = \log_2 \Big( \sum_{k=0}^{N-1} |h(k)|^2 \Big)^{1/2} = \log_2 \sqrt{\sum_{k=0}^{N-1} h^2(k)} \tag{4.6a}$$
or
$$R = \log_2 \sum_{k=0}^{N-1} |h(k)| \tag{4.6b}$$
where R denotes the number of right-shift bits. The method given in Eqn. (4.6a) will
probably lead to a shorter internal word length than Eqn. (4.6b), but this form of
scaling will occasionally overflow, which results in performance degradation.
Therefore, the method in Eqn. (4.6b) is adopted; it never causes overflow because it
is based on the worst-case conditions for overflow.
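The two scaling rules can be compared numerically. The sketch below is illustrative only: rounding R up to a whole number of bits and clamping it at zero are assumptions, and the function names are hypothetical.

```python
import math


def right_shift_l2(h):
    """R from the L2 norm of the taps (Eqn. 4.6a), rounded up (assumed)."""
    return max(0, math.ceil(math.log2(math.sqrt(sum(c * c for c in h)))))


def right_shift_l1(h):
    """R from the worst-case sum of |h(k)| (Eqn. 4.6b), rounded up (assumed)."""
    return max(0, math.ceil(math.log2(sum(abs(c) for c in h))))
```

For four unit taps the L2 rule gives one bit while the worst-case rule gives two, which illustrates why the first rule yields a shorter word length but can overflow.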
increases R bit(s) and the coefficients are then shifted right R bit(s) to prevent
overflow.
$$SNR = 10 \log_{10} \frac{E\big(y^2(n)\big)}{E\big(e^2(n)\big)} = 10 \log_{10} \frac{E\big(y^2(n)\big)}{E\big((y(n) - \hat{y}(n))^2\big)} \tag{4.7}$$
The simulation block is shown in Fig. 4.7 and Fig. 4.8. They show the internal word
length estimation flow.
Fig. 4.7
Fig. 4.8
The internal word length is first set to a value that does not introduce any error.
It is then decreased as long as the resulting SNR still meets the specification.
Finally, the minimum internal word length that fulfills the specification is
obtained.
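The word length search can be sketched with a small simulation based on Eqn. (4.7). This is a simplified model, not the flow of Fig. 4.7 and Fig. 4.8: it assumes that every product is rounded to `frac_bits` fractional bits and that a floating-point run serves as the error-free reference. All names are hypothetical.

```python
import math
import random


def fir_output(x, h, frac_bits=None):
    """FIR output; products are rounded to frac_bits bits when given."""
    scale = None if frac_bits is None else 2.0 ** frac_bits
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:
                p = hk * x[n - k]
                if scale is not None:
                    p = round(p * scale) / scale
                acc += p
        y.append(acc)
    return y


def snr_db(ref, approx):
    """SNR of Eqn. (4.7) with sample averages in place of expectations."""
    signal = sum(r * r for r in ref)
    noise = sum((r - a) ** 2 for r, a in zip(ref, approx))
    return math.inf if noise == 0.0 else 10.0 * math.log10(signal / noise)


def minimum_word_length(x, h, target_db, start_bits):
    """Decrease the fractional word length while the SNR target is met."""
    ref = fir_output(x, h)
    best = None
    for bits in range(start_bits, 0, -1):
        if snr_db(ref, fir_output(x, h, bits)) >= target_db:
            best = bits
        else:
            break
    return best
```

The search stops at the first word length that violates the SNR target, so the returned value is the shortest length reached without ever dropping below the specification.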
4.6
4.6.1
Hardware Estimation
Before generating the hardware of the FIR digital filters, interpolated FIR filters
and multirate multistage decimator / interpolator, the module generator estimates
the hardware complexity of each design. The indices used for comparison are the total
nonzero digit, the maximum nonzero digit, and the internal word length. The priority
of the indices is as follows.
- For low-complexity application:
Priority: Internal Word Length > Total Nonzero Digit > Max. Nonzero Digit
- For high-speed application:
Priority: Max. Nonzero Digit > Internal Word Length > Total Nonzero Digit
In addition, the module generator will also estimate the computation APU and
storage SPU of each design. Finally, it will generate a file, hardware.out, to record
the hardware estimation.
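The priority ordering above maps naturally onto a lexicographic comparison. The following sketch assumes that each candidate design is described by a dictionary with the hypothetical keys `word_length`, `total_nonzero` and `max_nonzero`, and that smaller is better for every index.

```python
def select_design(candidates, application):
    """Pick the best candidate by the priority order of the application."""
    if application == "low-complexity":
        key = lambda d: (d["word_length"], d["total_nonzero"], d["max_nonzero"])
    elif application == "high-speed":
        key = lambda d: (d["max_nonzero"], d["word_length"], d["total_nonzero"])
    else:
        raise ValueError("unknown application: " + application)
    return min(candidates, key=key)
```

Tuples compare element by element, so a lower-priority index only matters when all higher-priority indices are tied.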
Fig. 4.9 Transposed direct form filter structure utilizing (a) carry save adders
(CSA) and (b) carry save adders (CSA) with pipelining.
Fig. 4.10 [Block diagram with the labels: buffer, shift, CSA, pipelining; signals
x(n), x(n-1).]
4.6.3
The basic idea of the interpolated FIR (IFIR) filter is to implement the
prototype filter H(z) as a cascade of two FIR sections, a periodic model subfilter
$G(z^L)$ and an image suppressor I(z), as shown in Eqn. (3.10).
The periodic model subfilter $G(z^L)$ is based upon the behavior of an N-tap
nonrecursive linear-phase FIR filter when each of its unit delays is replaced with
L unit delays, with the interpolated factor L being an integer, as shown in
Fig. 4.11(a). If the impulse response of a nine-tap FIR filter H(z) is that shown in
Fig. 4.11(b), the impulse response of the periodic model filter for, say, L = 3 is the
$G(z^L)$ in Fig. 4.11(c). The module generator generates the symmetric transposed direct
form structure for the periodic model subfilter with expanded delays between the taps
and adopts the above FIR filter design to implement the image suppressor.
An important implementation issue arises when the IFIR filter has a narrow passband
and transition band, so that a larger interpolated factor L can be used: a larger
amount of storage must be allocated in order to hold a sufficient number of input
samples for the periodic model subfilter. This is a disadvantage of the periodic model
subfilter $G(z^L)$, because the amount of storage must be equal to [L(N-1)-1], where N
is the tap length of G(z). Although this increases the storage area, it effectively
reduces the hardware complexity relative to the conventional FIR design when
implementing a narrowband FIR filter. If dual clocks are used for $G(z^L)$, the storage
requirement of the periodic model subfilter is effectively reduced, as in the design
shown in Fig. 4.12.
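The delay expansion that turns G(z) into $G(z^L)$, and the cascade with I(z) from Eqn. (3.10), can be sketched in a few lines. The helper names `expand_taps` and `convolve` are hypothetical, and the coefficient values in the usage note are illustrative only.

```python
def expand_taps(g, L):
    """Impulse response of G(z^L): insert L-1 zeros between the taps of g."""
    expanded = [0] * (L * (len(g) - 1) + 1)
    expanded[::L] = g
    return expanded


def convolve(a, b):
    """Impulse response of the cascade of two FIR sections."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out
```

For a nine-tap model filter and L = 3, `expand_taps` returns 25 samples of which only every third one is nonzero, and `convolve(expand_taps(g, L), i)` gives the overall impulse response of $G(z^L) I(z)$.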
Fig. 4.11 (a) The symmetric transposed direct form structure for $G(z^L)$ with L-unit
delays between the taps; (b) the impulse response of H(z); (c) the impulse
response of $G(z^L)$ with L=3.
Fig. 4.12 The symmetric transposed direct form structure for $G(z^L)$ with dual clocks.
4.6.4
Both the decimator and the interpolator can be built in two structures, the direct
form or the transposed direct form. When the transposed direct form is used for
decimators and the direct form for interpolators, registers can be shared between
the subfilters, as shown in Fig. 4.13(b)(c) for the example of M=3, N=9. This is the
so-called memory-saving technique [25]. Another type of implementation is to use the
direct form for the decimator and the transposed direct form for the interpolator.
mirror symmetric pair, as shown in Fig. 4.13(a)(d). This is the so-called mirror
symmetric filter pairs technique [25].
The registers in structures (b) and (d) need to store internal signals, so their
word length is longer than that of the registers in structures (a) and (c), which
store the input signal. With mirror symmetric filter pairs, structures (a) and (d)
have only about half of the multipliers in structures (b) and (c). However, structures
(b) and (c), which use the memory-saving technique, have approximately 1/M of the
registers of structures (a) and (d). Although no structure is absolutely better than
the others, the critical path of the transposed direct form is shorter than that of
the direct form. For high-speed applications, therefore, structures (b) and (d) will
be selected.
Fig. 4.13 (a) (b) (c) (d)
4.7
Module Generator
Our module generator is written in C++ and employs Matlab as a computation engine.
The Matlab engine library is a set of routines that allows us to call Matlab from our
own programs. Our module generator has about 72 subprograms in the main program and
consists of many sub-modules in the operation flow, as shown in Fig. 4.14. The main
sub-modules are the multistage architecture analysis and synthesis, the coefficient
calculation, the coefficient optimization, the word length estimation and the
synthesizable Verilog code generation.
Following the operation flow, the module generator first reads the system
specifications. According to the filter type defined in the specifications, it
determines whether the design is a digital FIR filter (LP, HP, BP or BS), a
multistage IFIR filter or a multirate multistage decimator / interpolator. When the
design is a multistage IFIR filter or a multirate multistage decimator / interpolator,
it passes through the multistage architecture analysis / synthesis sub-module, which
decomposes it into the optimal architectures of the IFIR filter and decimator /
interpolator. After the analysis of the optimal architecture, the coefficient
calculation sub-module employs Matlab to estimate the floating-point filter
coefficients and at the same time generates the file matlab.out to record the
coefficient values. We adopt a two-step local search, scaling and local search, to
round and optimize our filter coefficients with CSD codes. For the high-speed
approach, it selects the coefficient sets with the minimum number of nonzero digits
per coefficient. For the low-complexity approach, it selects the coefficient sets
with the minimum total number of nonzero digits. According to the optimal coefficient
sets, our module generator estimates the internal word length of the system that
satisfies the SNR requirement. The hardware estimation sub-module generates the file
hardware.out to record the hardware design costs. Finally, the synthesizable
Verilog code generation sub-module generates the synthesizable Verilog code of the
multirate multistage digital FIR filter / decimator / interpolator.
Fig. 4.14
Chapter 5
Experimental Results
In this chapter, the design examples of FIR filter, interpolated FIR filter and
multirate multistage filter generated by the module generator are presented. All
performance data presented in this chapter are pre-layout estimations.
5.1
mixed integer linear programming (MILP) algorithm [33], and Samueli's local search
algorithm [23]. The passband and stopband edge frequencies are 0.3π and 0.5π,
respectively. The passband ripple is 0.05 dB and the stopband ripple is 50 dB. The word
length of the input signal is assumed to be 14 bits. The minimum number of SPT terms
required by the various methods mentioned above is summarized in Table 5.1. When
the maximum allowed number of SPT terms per coefficient is limited to four, the filter
designed by our methods saves 22% (21%~24%) SPT terms and costs 5% (4%~7%)
Table 5.1 Minimum number of SPT terms required to attain -50dB NPR.
Algorithm
#SPT
68
66
64
54
52
28
28
28
29
30
68
28
cannot reach -50 dB
66
57
Table 5.2
28
29
A | 7.46 | 5069 | 2824 | 2245
B | 4.65 | 8103 | 3613 | 4490
C | 4.65 | 9119 | 3907 | 5212
B | 1.57 | 11520 | 5595 | 5925
C | 1.25 | 12862 | 5999 | 6863
A | 3.86 | 8338 | 5799 | 2539
additional tap length. If the application requires limiting the maximum number of
SPT terms per coefficient to three to achieve a higher throughput rate, the filter
designed using Samueli's algorithm fails to reach -50 dB NPR. However, our proposed
method can save 16% SPT terms at the cost of 4% additional tap length. The design
results are converted into the three structures mentioned in Section 4.6.2. We designed
the filters of Work #1 with the TSMC 0.25 µm process and summarized the results in
Table 5.2. The synthesis results show that structure A is suitable for low-speed
(133 MHz) and area-efficient applications, structure B is suitable for high-speed
(637 MHz) applications, and structure C is suitable for very high-speed (800 MHz)
applications. Therefore, our module generator can provide flexible hardware
implementations for various applications. The frequency responses of the filters
designed by our module generator are shown in Fig. 5.1. We designed the filters of
Work #1 and Work #3 with the TSMC 0.25 µm process and summarized the results in
Table 5.3. The Work #3 design saves about 19% SPT terms and costs 6.6% additional tap
length compared with the Work #1 design. The area of the Work #1 design is about 1.1
times that of the Work #3 design. The area of an FA is about 6.3 gates and that of a
register is about 5.6 gates. The power dissipation of the Work #1 design is about 1.1
times that of the Work #3 design.
Fig. 5.1 [Normalized magnitude response (dB) versus normalized frequency
(×π rad/sample) of the filters Work #1 to Work #5, with a passband detail inset.]
Table 5.3
Technology: TSMC 0.25um
Input Frequency: 200 MHz
 | Work #1 | Work #3
Total Area | 6527 | 6080
Combination Area | 4314 | 3719
Noncombination Area | 2213 | 2361
Power | 80.66 | 72.99
21.52 MHz
5.38 MHz
0.2110
0.3204
0.1 dB
30 dB
11 bit
14 bit
Fig. 5.2 compares the frequency responses of the original coefficients and of the
coefficients after coefficient optimization. In addition, the specifications obtained
from the module generator are summarized in Table 5.5. By using the scaling strategy
we obtain a smaller total number of nonzero digits than [38], so fewer adders are
needed. Moreover, with the local search strategy the total number of nonzero digits
is further reduced. The maximum number of nonzero digits, which determines the
critical path of the filter, is also reduced.
Fig. 5.2 [Magnitude response (dB) versus normalized frequency (×π rad/sample) for the
original coefficients, the scaling strategy and the local search strategy, with a
passband detail inset.]
The chip is designed with the TSMC 0.25 µm process. Because the maximum number of
nonzero digits is only 2, the result of structure C is the same as that of
structure B. For low-complexity applications, the overall area of [38] is about 1.64
times and the power dissipation is about 1.95 times that of structure A. Moreover,
using structure B (or structure C) for high-speed applications, the chip can operate
at 714 MHz. The synthesis results are summarized in Table 5.6.
Table 5.5
[38]
This Work
Scaling
Local Search
35
35
Normalized Passband
Edge Frequency
0.2010
0.2148
Normalized Stopband
Edge Frequency
0.3204
0.3203
0.0558
0.0531
31.1030
31.5092
14
14
14
63
57
49
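SNR (dB) | 46.8 | 41.2
Table 5.6
 | Work#2 (structure B) | Work#2 (structure A) | [38]
Technology (um) | 0.25 | 0.25 | 0.25
Frequency | 714 MHz | 146 MHz | 72 MHz
Total Area | 11117 | 5155 | 8477
Combination Area | 5011 | 2496 | 5938
Noncombination Area | 6106 | 2659 | 2539
Power | 520.05 | 6.83 | 13.31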
5.2
cellular proposed by Qualcomm [40], by our module generator. The specifications are
shown in Table 5.7. A conventional filter design using the Parks-McClellan
algorithm would require an order N = 69. Based on the algorithm shown in Section 4.2.1,
we use the optimal interpolated factor L = 4 for the IFIR design with single-stage
implementation of I(z) and L1 L2 = 2 × 2 for the IFIR design with two-stage
implementation of I(z). The specifications obtained from the module generator for the
conventional filter, the periodic model subfilters $G(z^L)$ and the image suppressors
I(z) are summarized in Table 5.8 [41]. Notice that the system $G(z^4) I(z)$ has the
linear phase property since G(z) and I(z) both have this property.
19.6608 MHz
0.064087
0.125
Passband Ripple in dB
0.1 dB
Stopband Attenuation in dB
40 dB
5 bits
Fig. 5.3 shows the frequency responses of the conventional filter and of the IFIR
filter with single-stage I(z) and L = 4, as well as the frequency responses of the
subfilters I(z) and $G(z^4)$. Fig. 5.4 shows the conventional filter and the frequency
Table 5.8
Design results by module generator with IFIR filter designs and the
conventional filter.
Conventional
FIR
G(z4)
69
15
19
Normalized Passband
Edge Frequency
0.0645
0.0683
0.2598
Normalized Stopband
Edge Frequency
0.1230
0.3730
0.4980
0.0969
0.0284
0.0365
40.2410
43.1643
42.7119
13
15
11
13
204
27
39
104
14
21
SNR (dB)
41.0
41.5
40.0
I2(z2)
G(z4)
19
Normalized Passband
Edge Frequency
0.0643
0.1289
0.2578
Normalized Stopband
Edge Frequency
0.8730
0.7480
0.4980
0.0232
0.0283
0.0280
40.7033
40.6462
40.4649
10
11
11
12
13
41
22
SNR (dB)
41.2
44.7
42.4
Fig. 5.3 [Magnitude response (dB) versus normalized frequency (×π rad/sample):
(a) subfilters I(z) and $G(z^4)$; (b) overall IFIR filter and conventional filter.]
Fig. 5.4 [Magnitude response (dB) versus normalized frequency (×π rad/sample):
(a) subfilters $I_1(z)$, $I_2(z^2)$ and $G(z^4)$; (b) overall IFIR filter and
conventional filter.]
responses for the IFIR filters with two-stage I(z) and L1 L2 = 2 × 2, as well as the
frequency responses for the subfilters $I_1(z)$, $I_2(z^2)$ and $G(z^4)$.
The synthesis results of the IFIR filter designs and the conventional filter design
are summarized in Table 5.9. The filter I(z) is very inexpensive, whereas the cost of
$G(z^4)$ is little more than half the cost of the conventional design. When the timing
constraints of the conventional filter and the IFIR filters are equal, the area of the
conventional filter is about 1.63 times that of the IFIR filter with single-stage I(z)
and about 1.72 times that of the IFIR filter with two-stage I(z). The power
dissipation of the conventional filter is about 12.46 times that of the IFIR filter
with single-stage I(z) and about 13.10 times that of the IFIR filter with two-stage
I(z).
Table 5.9 The synthesis results of the conventional filter and the IFIR filters.
Technology: TSMC 0.25um
Input Frequency: 200 MHz
 | Conventional Filter | I(z) | G(z4)
Total Area | 14839 | 2049 | 7080
Combination Area | 9190 | 1118 | 1684
Noncombination Area | 5649 | 931 | 5396
Power | 1400.00 | 25.78 | 86.57
Technology: TSMC 0.25um
Input Frequency: 200 MHz
 | I1(z) | I2(z2) | G(z4)
Total Area | 721 | 1369 | 6523
Combination Area | 275 | 504 | 1540
Noncombination Area | 446 | 865 | 4983
Power | 8.89 | 17.31 | 80.70
5.3
5.3.1
Decimator
Following the design of the IFIR filters, we designed multirate decimators for use
in CDMA cellular [40] with a decimation factor (M) of eight. The synthesis results are
summarized in Table 5.10. The conventional decimator is a single-stage
decimator that uses the polyphase structure to save arithmetic operations, and the
multirate decimators are designed by our proposed method.
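The saving offered by the polyphase structure can be sketched as follows: filtering at the input rate and discarding M-1 of every M outputs gives the same result as M short subfilters that each run at the output rate. The taps and the input below are hypothetical integers chosen so the comparison is exact; they are not the CDMA decimator coefficients.

```python
def fir(x, h):
    """Direct-form FIR, y[n] = sum_k h[k] x[n-k], assuming x[n] = 0 for n < 0."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def decimate_direct(x, h, M):
    """Filter at the input rate, then keep every M-th output sample."""
    return fir(x, h)[::M]

def decimate_polyphase(x, h, M):
    """Polyphase form: branch k uses taps h[k], h[k+M], ... on inputs x[mM - k]."""
    n_out = (len(x) + M - 1) // M
    y = [0] * n_out
    for k in range(M):
        e_k = h[k::M]                     # k-th polyphase component of H(z)
        for m in range(n_out):
            for j, c in enumerate(e_k):
                idx = (m - j) * M - k     # input sample feeding this tap
                if 0 <= idx < len(x):
                    y[m] += c * x[idx]
    return y

# Hypothetical test data (assumption): 11 integer taps, 32 input samples, M = 8.
M = 8
h = [1, 3, 5, 7, 9, 11, 9, 7, 5, 3, 1]
x = list(range(1, 33))

print(decimate_direct(x, h, M) == decimate_polyphase(x, h, M))
```

Both functions return the same samples, but the polyphase form evaluates each tap once per output sample rather than once per input sample, so the multiplication rate drops by the factor M.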
Table 5.10

Technology: TSMC 0.25um    Input Frequency: 200 MHz

Two-stage multirate decimator:
                       Conventional Decimator (M=8)   Stage#1   Stage#2 (M2=2)
Total Area             13742                          1580      2554
Combination Area       12567                          1162      1792
Noncombination Area    1174                           418       761
Power                  21.30                          6.59      4.29

Three-stage multirate decimator:
                       Stage#1   Stage#2 (M2=2)   Stage#3 (M3=2)
Total Area             638       808              2417
Combination Area       336       496              1706
Noncombination Area    301       312              710
Power                  4.48      3.15             4.42
Fig. 5.5 Panels (a)-(f). Curves in each panel: Original, Scaling Strategy, Local Search Strategy. Axes: Normalized Frequency (×π rad/sample) versus Normalized Magnitude Response (dB).
Fig. 5.5 shows the frequency responses of the subfilters of the multirate
decimators and the conventional decimator. The frequency responses of the
conventional FIR and the multirate IFIR filters are shown in Fig. 5.6. When the timing
constraints of the conventional decimator and the multirate multistage decimators are
equal, the area of the conventional decimator is about 3.32 to 3.56 times and the power
dissipation is about 1.78 to 1.96 times that of the multirate multistage decimators.
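A multistage decimator replaces one decimation by M = 8 with a chain of smaller decimations. The sketch below assumes a two-stage split M1 × M2 = 4 × 2 (M1 = 4 is inferred from M = 8 and M2 = 2) and uses hypothetical integer taps. It checks the standard identity that the chain is equivalent to a single filter H(z) = H1(z) H2(z^4) followed by decimation by 8.

```python
def fir(x, h):
    """Direct-form FIR, y[n] = sum_k h[k] x[n-k], assuming x[n] = 0 for n < 0."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def stretch(h, L):
    """Return the taps of H(z^L): insert L-1 zeros between the taps of H(z)."""
    out = [0] * ((len(h) - 1) * L + 1)
    out[::L] = h
    return out

def convolve(a, b):
    """Coefficient list of the product A(z)B(z)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, u in enumerate(a):
        for j, v in enumerate(b):
            out[i + j] += u * v
    return out

# Hypothetical stage filters and input (assumption, not the thesis designs).
h1 = [1, 2, 3, 2, 1]      # stage 1, followed by decimation by M1 = 4
h2 = [1, 4, 1]            # stage 2, followed by decimation by M2 = 2
x = list(range(1, 33))

two_stage = fir(fir(x, h1)[::4], h2)[::2]

# Equivalent single-stage filter: H(z) = H1(z) * H2(z^4), then decimate by 8.
h_eq = convolve(h1, stretch(h2, 4))
single_stage = fir(x, h_eq)[::8]

print(len(h_eq), two_stage == single_stage)
```

The second stage runs at one quarter of the input rate, so its taps cost far fewer operations per input sample than the same taps would in the equivalent single-stage filter.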
Fig. 5.6 The frequency responses of the conventional FIR and multirate IFIR filters. Curves: Conventional Filter, Multirate IFIR with 2-stage, Multirate IFIR with 3-stage. x-axis: Normalized Frequency (×π rad/sample).
5.3.2 Interpolator
Using the module generator, we demonstrate an example of a sharp narrowband
interpolator whose specifications are summarized in Table 5.11. After running the module
generator, the specifications of the conventional filter, the periodic model subfilter G(z^L), and the
image suppressor I(z) are summarized in Table 5.12.
Table 5.11
0.05    0.10    0.05    0.1    40.0
Interpolation Ratio

Table 5.12
                                      Conventional   I(z)      G(z^3)
Filter Order                          85             10        31
Normalized Passband Edge Frequency    0.0507         0.0586    0.1523
Normalized Stopband Edge Frequency    0.0996         0.5664    0.2988
                                      0.0712         0.0227    0.0376
                                      40.5573        45.6249   42.4263
15 10 18 12 15
                                      269            16        73
SNR (dB)                              40.2           40.7      40.4
Table 5.13

Technology: TSMC 0.25um    Input Frequency: 200 MHz

                       Conventional Interpolator (L=6)   Stage#1 (L1=3)   Stage#2 (L2=2)
Total Area             14545                             944              3810
Combination Area       14147                             802              3737
Noncombination Area    398                               142              73
Power                  566.20                            26.93            390.52
Fig. 5.7 The frequency responses of the subfilters of the multirate interpolator and the
conventional interpolator. (a) The conventional interpolator of order 85; the subfilters of
the two-stage multirate interpolator: (b) I(z) of order 10, (c) G(z) of order 31.
Curves in each panel: Original, Scaling Strategy, Local Search Strategy. x-axis: Normalized Frequency (×π rad/sample).
The synthesis results are summarized in Table 5.13. It is evident that, in general,
multistage designs yield a very significant reduction in both computation (APU) and
storage (SPU) requirements compared with single-stage designs. The reduction is due
to the wide transition bands of the subfilters, I(z) and G(z), which lead to a small number
of taps. The conventional interpolator is a single-stage interpolator that uses the
polyphase structure to save arithmetic operations, and the multirate interpolator is
designed by our proposed method. Fig. 5.7 shows the frequency responses of the
subfilters of the multirate interpolator and the conventional interpolator. The frequency
responses of the conventional FIR and the multirate IFIR filters are shown in Fig. 5.8.
When the timing constraints of the conventional interpolator and the multirate multistage
interpolator are equal, the area of the conventional interpolator is about 3.06 times and
the power dissipation is about 1.36 times that of the multirate multistage interpolator.
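The link between transition width and tap count can be checked against Table 5.12. For a fixed ripple specification, the order of an equiripple FIR filter is roughly inversely proportional to its transition width (as in Kaiser's order estimate), so the product of order and width should stay in the same range for all three filters. The sketch assumes the three table columns are the conventional filter, I(z), and G(z^3), and it ignores the differences in ripple between them.

```python
# (order, normalized passband edge, normalized stopband edge) read from Table 5.12.
# The column assignment is an assumption of this sketch.
filters = {
    "conventional": (85, 0.0507, 0.0996),
    "I(z)":         (10, 0.0586, 0.5664),
    "G(z^3)":       (31, 0.1523, 0.2988),
}

products = {}
for name, (order, wp, ws) in filters.items():
    width = ws - wp                      # transition width, in units of pi rad/sample
    products[name] = order * width
    print(f"{name:13s} width = {width:.4f}   order x width = {products[name]:.2f}")

# Total order of the two-stage design versus the single-stage design.
multistage_order = filters["I(z)"][0] + filters["G(z^3)"][0]
print("sum of subfilter orders:", multistage_order, "vs conventional:", filters["conventional"][0])
```

The subfilters have transition bands three to ten times wider than the conventional filter, and their orders shrink accordingly, which is consistent with the area reduction reported above.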
Fig. 5.8 The frequency responses of the conventional FIR and multirate IFIR filters. Curves: Conventional Filter, Multirate IFIR with 2-stage. x-axis: Normalized Frequency (×π rad/sample).
Chapter 6 Conclusions
will reduce the hardware complexity further by the local search method. Next, word
length estimation lets the system achieve the SNR requirement with the minimum
internal word length. Finally, the code generation step produces
synthesizable Verilog HDL code, written at the behavioral and RTL levels for
flexibility.
We have designed several filters with the TSMC 0.25 µm standard cell library. A
64-QAM baseband demodulator design shows that the area is reduced by a factor of about
1.64 and the power dissipation by a factor of about 1.95 for low-complexity
applications. Moreover, for high-speed applications, the chip can operate at 714 MHz.
We also designed IFIR filters to the specification of the first version of
CDMA cellular; the area is reduced by a factor of about 1.72 and the power dissipation
by a factor of about 13.10 compared with the direct form design. A multistage
decimator used in CDMA cellular shows that the area is reduced by a factor of about 3.56
and the power dissipation by a factor of about 1.96 compared with the
conventional decimator. Finally, for an example narrowband multistage
interpolator, the area is reduced by a factor of about 3.06 and the power dissipation
by a factor of about 1.36 compared with the conventional interpolator.
Because the generator requires only system-level specifications, system
designers who are inexperienced in VLSI design can use the module generator easily.
Furthermore, by using this module generator, an efficient design of a chip can be
successfully completed in a few minutes.
References
[1]
[2]
[3]
[4]
CoCentric system studio filter design tools user guide, Synopsys, Inc., May
2002.
[5]
M. Ishikawa et al., Automatic layout synthesis for FIR filters using a silicon
compiler, IEEE Int. Symp. Circuits Syst., pp. 2588-2591, May 1990.
[6]
[7]
[8]
[9]
digital filters with invariant transfer function, IEEE Int. Symp. Circuits Syst.,
pp. 631-634, May 1993.
[10] K. Y. Cheng, Multiplierless Multirate FIR Digital Filter / Decimator /
Interpolator Module Generator, MS thesis, Dept. of EE, National Central Univ.,
Taiwan, Jun. 2003.
[11] G. W. Reitwiesner, Binary arithmetic, Advances in Computers, vol. 1, NY:
Academic, pp. 231-308, 1966.
[12] Y. Neuvo, C. Y. Dong, and S. K. Mitra, Interpolated finite impulse response
filters, IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-32, pp.
563-570, Jun. 1984.
[13] T. Saramäki, Y. Neuvo, and S. K. Mitra, Design of computationally efficient
interpolated FIR filters, IEEE Trans. Circuits Syst., vol. 35, no. 1, Jan. 1988.
[14] M. Ishikawa et al., Automatic layout synthesis for FIR filters using a silicon
compiler, IEEE Int. Symp. Circuits Syst., pp. 2588-2591, May 1990.
[15] R. Jain, P. T. Yang, and T. Yoshino, FIRGEN: A computer-aided design system
for high performance FIR filter integrated circuits, IEEE Trans. Signal
Processing, vol. 39, pp. 1655-1668, Jul. 1991.
[16] R. Hawley, T.-J. Lin, and H. Samueli, A silicon compiler for high-speed
CMOS multirate FIR digital filters, IEEE Int. Symp. Circuits Syst., vol. 3, pp.
1348-1351, May 1992.
[17] E. Bidet, C. Joanblanq, and P. Senn, GENRIF: An integrated VLSI FIR filter
compiler, Eur. Conf. Design Automation, pp. 466-471, Feb. 1993.
[18] G. Wacey and D. R. Bull, POFGEN: A design automation system for VLSI
digital filters with invariant transfer function, IEEE Int. Symp. Circuits Syst.,
pp. 631-634, May 1993.
[19] N. J. Fliege, Multirate digital signal processing: multirate systems, filter banks,
wavelets, 1994.
[20] P. Ruetz, The architectures and design of a 20-MHz real-time DSP chip set,
IEEE JSSC, vol. 24, pp. 338-348, Apr. 1989.
[21] S.-Y. Wu, Low-power multirate IF digital frequency down converter for
wireless communication systems, MS thesis, Dept. of EE, National Central
Univ., Taiwan, Jun. 1997.
[22] R. Hartley, Subexpression sharing in filters using canonic signed digit
multipliers, IEEE Trans. Circuits Syst. II, vol. 43, pp. 677-688, Oct. 1996.
[23] H. Samueli, An improved search algorithm for the design of multiplierless FIR
filters with powers-of-two coefficients, IEEE Trans. Circuits Syst., vol. 36, pp.
1044-1047, Jul. 1989.
[24] T.-J. Lin and H. Samueli, A 200-MHz CMOS x/sin(x) digital filter for
compensating D/A converter frequency response distortion in high-speed
communication systems, IEEE GLOBECOM, vol. 3, pp. 1722-1726, Dec.
1990.
[25] R. A. Hawley, B. C. Wong, T.-J. Lin, J. Laskowski, and H. Samueli, Design
techniques for silicon compiler implementations of high-speed FIR digital
filters, IEEE JSSC, vol. 31, pp. 656-667, May 1996.
[26] I.-C. Park and H.-J. Kang, Digital filter synthesis based on an algorithm to
generate all minimal signed digit representations, IEEE Trans. CAD of IC and
Syst., vol. 21, pp. 1525-1529, Dec. 2002.
[27] A. P. Vinod, E. M.-K. Lai, A. B. Premkumar, and C. T. Lau, FIR filter
implementation by efficient sharing of horizontal and vertical common
subexpressions, Electronics Letters, vol. 39, pp. 251-253, Jan. 2003.
[28] B. C. Wong and H. Samueli, A 200-MHz all-digital QAM modulator and
demodulator in 1.2-µm CMOS for digital radio applications, IEEE JSSC, vol.
26, pp. 1970-1979, Dec. 1991.
[29] R. Hartley, Optimization of canonic signed digit multipliers for filter design,
IEEE ISCAS, pp. 1992-1995, 1992.
[30] R. W. Mehler and D. Zhou, Architectural synthesis of finite impulse response
digital filters, Symp. Integrated Circuits Syst. Design, pp. 20-25, Sep. 2002.
[31] M. Bellanger, G. Bonnerot, and M. Coudreuse, Digital filtering by polyphase
network: application to sample rate alteration and filter banks, IEEE Trans.
Acoust., Speech, Signal Processing, vol. ASSP-24, pp. 109-114, Apr. 1976.
[32] D. J. Shpak and A. Antoniou, A generalized Remez method for the design of
FIR digital filters, IEEE Trans. Circuits Syst., pp. 161-174, Feb. 1990.
[33] Y. C. Lim and S. R. Parker, FIR filter design over a discrete powers-of-two
coefficient space, IEEE Trans. Acoust., Speech, Signal Processing, pp. 583-591,
Jun. 1983.
[34] X. Hu, L. S. DeBrunner, and V. DeBrunner, An efficient design for FIR filters
with variable precision, IEEE Int. Symp. Circuits Syst., vol. 4, pp.
IV-365-IV-368, May 2002.
[35] D. Kodek and K. Steiglitz, Comparison of optimal and local search methods
for designing finite wordlength FIR digital filters, IEEE Trans. Circuits Syst.,
vol. 28, pp. 28-32, Jan. 1981.
[36] E. C. Ifeachor and B. W. Jervis, Digital signal processing: a practical
approach, Addison-Wesley, 1993.
[37] DesignWare foundation library databook, Synopsys, Inc., Jan. 2002.
[38] S. J. Jou, C. H. Kuo, M. T. Shiau, J. Y. Heh, and C. K. Wang, VLSI
implementation of timing recovery and carrier recovery for QAM/VSB dual
mode, International Symp. on VLSI Technology, Systems and Applications,
Taipei, R.O.C., pp. 159-162, Jun. 1999.