
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-I: REGULAR PAPERS, VOL. 58, NO. 7, JULY 2011, p. 1627

Design of Discrete-Valued Linear Phase FIR Filters in Cascade Form


Dong Shi, Student Member, IEEE, and Ya Jun Yu, Senior Member, IEEE
Abstract: Digital filters in cascade form enjoy many advantages over their equivalent single-stage realizations in that lower coefficient sensitivity, higher throughput, and lower computational and implementation cost can be achieved. However, the numerical design and optimization of such structures is considerably more difficult than in the single-stage case if the filter coefficients are restricted to discrete values. This is mainly due to the non-convexity of the constraints, which rules out sophisticated convex optimization techniques as well as guaranteed global optimality. In this work, a general-purpose algorithm is proposed for the design of linear phase finite impulse response (FIR) filters in cascade form with discrete coefficients. The proposed algorithm decomposes the overall filter into subfilters during the traversal of a tree search of the overall filter. Discrete-valued linear phase FIR filters can be searched and decomposed into both symmetric and non-symmetric subfilters. The optimization complexity is of the same order as that of single-stage filter optimization. Design examples show that the proposed algorithm is capable of achieving a notable reduction in both implementation cost and adder depth compared with the single-stage optimum designs.

Index Terms: Cascade form, digital filters, linear phase, linear programming, subexpression space.

I. INTRODUCTION

Digital filters have found numerous applications in most digital signal processing systems. Among all the types of digital filters, linear phase finite impulse response (FIR) filters are commonly used because of their stability, linear phase and the availability of relatively efficient optimization algorithms. However, the possibility of further reducing the implementation cost of a single-stage FIR filter seems rather limited. This is especially true now that effective multiple constant multiplication (MCM) techniques have been proposed and explored for the single-stage direct or transposed direct form [1]–[12]. An attractive approach to achieving a further reduction in implementation cost is to realize FIR filters in multi-stage or cascade form. Early literature on cascaded digital filters mainly focused on the design of sampling rate conversion systems [13]–[20], where efficient realizations of antialiasing

Manuscript received October 05, 2010; revised December 31, 2010; accepted March 08, 2011. Date of publication May 19, 2011; date of current version June 29, 2011. This work was supported in part by Nanyang Technological University (NTU), in part by Temasek Laboratory, NTU, and in part by Academic Research Fund Tier 1 of NTU. This paper was recommended by Associate Editor Yong Lian. The authors are with the School of Electrical and Electronic Engineering, Nanyang Technological University, 639798 Singapore (e-mail: shid0001@e.ntu.edu.sg; eleyuyj@pmail.ntu.edu.sg). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TCSI.2011.2143250

filters are required for down-sampling (decimation). A popular structure for such filters consists of a cascade of multiple stages of simple infinite impulse response (IIR) filters and FIR filters, where each stage is built from the simplest multiplierless block (serving as the denominator of the IIR filters and the numerator of the FIR filters). This realization is often referred to as cascade integrator-comb filters because of the comb-like magnitude frequency response of the FIR filters. Another common use of the cascade structure lies in the design of interpolated FIR filters [21]–[23], where narrowband or wideband filters with extremely sharp transition bandwidths are realized by cascading an interpolated prototype filter with a masking filter. A generalized version of this technique is frequency response masking [24]–[26], in which the complementary component of the interpolated prototype filter is also generated so that the resultant overall filter is not restricted to being narrowband or wideband.

In spite of the substantial amount of work mentioned above, algorithms for the design of general discrete-valued cascaded FIR filters have not been intensively studied. In the simplest case of cascade form realization, the overall filter is the cascade of two subfilters. The optimization problem is usually stated as follows: given a set of design specifications for the overall filter, find the coefficients of the two subfilters in discrete values so that the overall implementation complexity is minimized (usually in terms of the total number of adders required to realize the entire circuit). Most existing techniques fix one subfilter and optimize the other, so that the problem is simplified to the optimization of a single-stage filter [27]–[31]. However, for general FIR filter design, there are no specific requirements for any subfilter. Fixing one subfilter reduces the optimization freedom significantly. The techniques in [30], [31] iteratively fix one subfilter and optimize the other one. However, in most cases, no further improvement in filter performance is achievable after 2 or 3 iterations. A joint optimization of the two subfilters using heuristic algorithms is proposed in [32]. The above-mentioned algorithms are, in general, not able to beat the recently developed algorithms [10], [11], [33] for the design of single-stage filters in terms of using the minimum number of adders. One reason is that these algorithms are not oriented to subfilters with MCM implementation; the other is that they may become stuck at local optima.

To develop a general-purpose algorithm for the design of linear phase FIR filters in cascade form with MCM implementation, an algorithm is proposed in [34] to simultaneously optimize the subfilter coefficients in discrete space. The basic idea stems from the decomposition of an overall filter (whose coefficient values are discrete) into two subfilters whose coefficient

1549-8328/$26.00 © 2011 IEEE


values remain discrete. This technique is then incorporated into the optimization process of the overall filter using mixed integer linear programming (MILP). As more and more coefficients are fixed to integers, the possible choices for the decomposed subfilter coefficients become fewer and fewer. Eventually, either a successful decomposition into two subfilters is obtained, or the single discrete-coefficient filter cannot be decomposed any further. However, the algorithm in [34] is rather inefficient in that: 1) there is a large amount of redundancy in the optimization process; 2) wiser cutting mechanisms could have been adopted to further improve the design efficiency; and 3) only symmetric subfilters are considered. In practical linear phase FIR filter design, however, the two subfilters do not necessarily have linear phase individually.

Therefore, in this paper, a generalized algorithm capable of handling both symmetric and non-symmetric subfilters is presented. The efficiency of the algorithm is also improved significantly by removing the possible redundancy. The optimization complexity is reduced to the same order as that of optimizing single-stage FIR filters with discrete coefficients, and hence relatively longer filters can be designed. Design results show that the proposed algorithm is able to design filters in cascade form using fewer adders and lower adder depth than their single-stage counterparts.

The rest of this paper is organized as follows. Section II presents the decomposition procedure of a discrete-valued overall filter into two subfilters without symmetry constraints. An efficient algorithm is proposed in Section III for the design of discrete-valued linear phase FIR filters in cascade form, also without symmetry constraints. The proposed algorithm results in significant time savings compared with that in [34]. In Section IV, important details and observations of the proposed algorithm are discussed. Benchmark filters are designed in Section V to demonstrate the superiority of the proposed algorithm. The design results are compared with the single-stage optimum ones, mainly in terms of implementation complexity and adder depth. Section VI concludes the paper.

II. DECOMPOSITION OF DISCRETE-VALUED FILTERS

This section explains the basic strategy for decomposing a linear phase FIR filter into two subfilters in cascade form. It is assumed that the overall filter and subfilter coefficients are all integers.

Consider a linear phase FIR filter of a given order. Its z-transform transfer function is given by (1), where the coefficients are symmetric. Let the overall filter be decomposed into two subfilters, each with its own order. For the overall filter, two relations hold: its order is the sum of the orders of the two subfilters, and its transfer function is the product of the two subfilter transfer functions. In addition, the impulse responses of the overall filter and of the two subfilters are zero outside their respective index ranges. It should be noted that the overall filter given in (1) is assumed to be of even order and symmetric, while the subfilters are also assumed to be of even orders. These assumptions are made for expository convenience in the following derivation. For other types of filters (odd order, antisymmetric) and order combinations of the

subfilters, similar equations and decomposition procedures can be derived.

Fig. 1. Calculation of two overall filter coefficients as the convolution of two non-symmetric subfilters.

A. Basic Principle

Under an assumption on the relative orders of the two subfilters, the overall filter coefficients can be expressed as the convolution of the coefficients of the two subfilters, as in (2) and (3). Fig. 1 shows this calculation for two overall filter coefficients, using the impulse responses of the two subfilters. In this particular case, assume that the outermost coefficient pairs of the subfilters have been fixed to certain integers and that the value of the corresponding overall filter coefficient is known. Then the possible pairs of the next subfilter coefficients that satisfy (2) and (3) can be enumerated (for a given wordlength range). The basic idea of the decomposition procedure stems from this observation. In the following, three specific cases along the decomposition process are illustrated by using a simple example. Note that the two subfilters are not necessarily symmetric, i.e., a coefficient of either subfilter might not be equal to its mirror-image coefficient.

B. Decomposition for the First Coefficient

Starting from the first coefficient of the overall filter, assume that it is fixed to an integer. According to (2) and (3), we have (4). Thus, for given coefficient effective wordlengths (EWLs) of the subfilters, the possible pairs of outermost subfilter coefficients are enumerable. For example, if the EWL of both subfilters equals 3, these coefficients can take integral values ranging from -8 to 8. Define, for each fixed overall filter coefficient, a set that contains all the possible pairs of subfilter coefficients needed in calculating the convolution and making it equal to the fixed value. For the first overall filter coefficient, only the first coefficients of the two subfilters are needed in the calculation. Therefore, this set contains the pairs whose products are equal to the fixed value. Similarly, a second set contains the pairs of last subfilter coefficients whose product is equal to the last overall filter coefficient. In the particular example considered, fixing the first coefficient narrows the choices of these pairs down to 2 pairs. All the other subfilter coefficients are not specified yet.

C. Decomposition for the Subsequent Coefficients

Note that for the subsequent coefficients, (2) and (3) can be further written as (5) and (6), respectively. Equation (5) implies that if the earlier subfilter coefficient pairs have been determined by fixing the values of the preceding overall filter coefficients, the possible values of the next pair can also be enumerated when the next overall filter coefficient is known. Thus, the set at each level can be obtained recursively based on the set at the previous level and the availability of the corresponding overall filter coefficient. Similarly, (6) implies that the corresponding set for the trailing coefficients can be obtained once its predecessor and the matching overall filter coefficient are available. Note that since the overall filter is symmetric, the same overall coefficient value will always be used in both cases.

Therefore, following the above example, once the initial set and the value of the next overall filter coefficient (say 11) are available, selecting the first pair in the initial set and applying (5) gives a relation from which the next set of leading pairs can be generated. Similarly, possible pairs of trailing coefficients can be obtained by using (6); in the example, the second pair in the corresponding initial set is selected, and the next set of trailing pairs is generated from it. Now we have two subfilters whose outermost coefficients are fixed to the selected pairs, while the next leading pair can be any pair in the first generated set and the next trailing pair can be any pair in the second generated set.

It should be noted that the decomposition procedure described in [34] takes the leading and trailing pairs to be equal by default at every level. Obviously, doing so results in a symmetric subfilter scenario. Therefore, the decomposition procedure in this paper is a generalization of the one in [34].

The recursive generation of coefficient pairs described above continues as more and more filter coefficients become known. A tree of these sets, starting from the initial set, is thus formed, as shown in Part A of Fig. 2. In Fig. 2, each set represents a node at its level and is linked by branches (paths) from the previous node to the next node. It is very likely that for some paths in Fig. 2, no feasible pair satisfying (5) is available, because no integral values of the subfilter coefficients satisfy the required relation at that level. Such paths are terminated. Each non-terminated path corresponds to a successful decomposition of the overall filter coefficients fixed so far, and each node holds the possible pairs for its level. In the same way, the tree for the trailing coefficients is also formed. Each node of this second tree contains exactly the same possible pair values as the corresponding node of the first tree (due to the symmetry of the overall filter).

D. Decomposition for the Remaining Coefficients

Since the orders of the two subfilters differ, one subfilter still has some coefficients to be determined. Therefore, given the fact that the other subfilter is completely known at this point, (5) and (6) can be further written as (7) and (8). It is clear from (7) and (8) that the values of the remaining subfilter coefficients can be uniquely determined when the corresponding overall filter coefficient is known. Each of the two trees is thus extended to Part B as shown in Fig. 2. It should be noted that since not all the paths in Part B lead to integer coefficient values in either tree, more and more paths will be terminated as the level increases.
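The enumeration steps of Sections II-B and II-C can be illustrated with a small sketch. It is only a sketch under stated assumptions: the function names `leading_pairs` and `extend_front`, the list-based representation, and the first-coefficient value 5 are hypothetical choices made here (the EWL of 3 and the next coefficient value 11 are taken from the text), and only the leading (front) tree is shown. Pairs that differ only in the sign of both entries are collapsed by keeping a positive first entry.

```python
def leading_pairs(h0, ewl):
    """Enumerate (a0, b0) with a0 * b0 == h0 and |a0|, |b0| <= 2**ewl.

    Assumes h0 != 0. Only a0 > 0 is kept, so sign-flipped duplicates
    such as (-a0, -b0) are not listed.
    """
    bound = 2 ** ewl
    return [(a0, h0 // a0) for a0 in range(1, bound + 1)
            if h0 % a0 == 0 and abs(h0 // a0) <= bound]


def extend_front(a, b, h_n, ewl):
    """Enumerate the next pair (a_n, b_n) given a[0..n-1] and b[0..n-1].

    Uses the convolution relation h(n) = sum_{k=0}^{n} a[k] * b[n-k],
    valid while n does not exceed either subfilter order.
    """
    bound = 2 ** ewl
    n = len(a)
    # Cross terms that involve only already-fixed coefficients.
    known = sum(a[k] * b[n - k] for k in range(1, n))
    rest = h_n - known  # must equal a[0] * b_n + a_n * b[0]
    pairs = []
    for a_n in range(-bound, bound + 1):
        num = rest - a_n * b[0]
        if num % a[0] == 0 and abs(num // a[0]) <= bound:
            pairs.append((a_n, num // a[0]))
    return pairs


# First overall coefficient fixed to 5 (assumed value), EWL = 3.
print(leading_pairs(5, 3))
# Select the first pair (1, 5); next overall coefficient is 11.
print(extend_front([1], [5], 11, 3))
```

A path is terminated in this sketch whenever `extend_front` returns an empty list, which mirrors the termination of infeasible paths in Fig. 2.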


Fig. 2. Decomposition procedure of a single discrete solution.

When the decomposition of Part B (as shown in Fig. 2) is completed, each surviving node in the two trees holds the possible coefficient pairs accumulated along its path. Again, due to the symmetry of the overall filter, each node in one tree has a corresponding node in the other tree that holds exactly the same possible pair values.

E. Successful Decompositions

At this stage, the two trees are combined by combining all possible pairs in their final nodes, as indicated in Fig. 2. The combination of any pair in a node of one tree with a pair in the corresponding node of the other tree forms a possible pair of subfilters. If the same pairs of the corresponding nodes are selected, a symmetric pair of subfilters is obtained (which is the case in [34]). Therefore, in each combined pair, all coefficients of the two subfilters are completely determined. The remaining coefficients of the overall filter can be checked against the convolution of the two subfilters. Only the combined pairs that lead to exactly the same coefficients as the given overall filter are successful decompositions. Thus, the decomposition of the single discrete-valued filter (for this particular combination of subfilter orders) is completed.

III. PROPOSED ALGORITHM

In this section, the decomposition procedure described in the previous section is incorporated into the optimization of linear phase FIR filters using MILP. First, the optimization of the overall filter is formulated. Then, the details of the proposed algorithm are described.
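As a brief illustration of the check in Section II-E, the sketch below convolves each candidate pair of subfilters and keeps only those that reproduce the given overall filter exactly. The function names and the toy coefficient values are hypothetical and chosen here for illustration; the toy subfilters are of first order, unlike the even-order case assumed in the derivation.

```python
def convolve(a, b):
    """Integer convolution of two coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def successful_decompositions(h, cands_a, cands_b):
    """Keep the candidate pairs whose convolution equals h exactly."""
    target = list(h)
    return [(a, b) for a in cands_a for b in cands_b
            if convolve(a, b) == target]


# Symmetric overall filter [2, 5, 2] and a few candidate subfilters.
h = [2, 5, 2]
found = successful_decompositions(h, [[1, 2], [1, 1]], [[2, 1], [1, 2]])
print(found)
```

In this toy case the surviving pair, `[1, 2]` and `[2, 1]`, consists of two non-symmetric subfilters that are mirror images of each other, yet their cascade is symmetric and hence has linear phase.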

A. Problem Formulation

The frequency response of the linear phase FIR filter given in (1) can be expressed as (9). Based on (9), a linear programming (LP) problem can be formulated as (10) and (11). The quantities involved are the objective function, the ripple, and the given passband ripple, stopband ripple, passband edge and stopband edge; the frequency response is given in (9). The formulation also involves a lower bound and an upper bound on the passband gain, which could be chosen as 0.7 and 1.4, respectively, as discussed in [35]. A detailed description of the optimization process for this problem can be found in [33], where the single-stage discrete-valued linear phase FIR filter is optimally designed in the sense that the total number of adders required to realize the filter is minimized. For a complete understanding, [10], [11], [33], [36] are good references.

B. Tree Search

In a traditional MILP tree search, along a path, the coefficients are fixed one by one at each level and the unfixed coefficients are re-optimized, until the last coefficient is fixed to an integer; thus, a discrete solution is obtained.
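Whether a discrete solution reached by such a search meets the specifications can be checked directly from its zero-phase response. The sketch below is an illustration only: it uses the standard zero-phase response of an even-order symmetric FIR filter (centre coefficient plus twice the cosine-weighted remaining coefficients), evaluated on a dense frequency grid. The function names, the grid size and the explicit `gain` argument (standing in for the passband gain) are assumptions made here and are not taken from the formulation in (9) to (11).

```python
import math


def zero_phase_response(h, w):
    """Zero-phase response of an even-order symmetric FIR filter at w (rad)."""
    mid = (len(h) - 1) // 2  # index of the centre coefficient
    return h[mid] + 2 * sum(h[mid - n] * math.cos(n * w)
                            for n in range(1, mid + 1))


def meets_spec(h, wp, ws, dp, ds, gain, grid=512):
    """Check passband and stopband ripple on a uniform grid.

    The passband is [0, wp] and the stopband is [ws, pi]; the response
    is normalized by the assumed passband gain before comparison.
    """
    for i in range(grid):
        w_pass = wp * i / (grid - 1)
        if abs(zero_phase_response(h, w_pass) / gain - 1.0) > dp:
            return False
        w_stop = ws + (math.pi - ws) * i / (grid - 1)
        if abs(zero_phase_response(h, w_stop) / gain) > ds:
            return False
    return True


# Toy lowpass filter [1, 2, 1] with passband gain 4.
print(meets_spec([1, 2, 1], 0.2 * math.pi, 0.9 * math.pi, 0.1, 0.03, 4.0))
```

A grid check of this kind only samples the specification; a finer grid, or constraints evaluated at the same frequency points as the LP, would be needed for a design-grade verification.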


Fig. 3. Searching process of the proposed algorithm.

In the proposed technique, the search along a path is separated into three segments, as shown in Fig. 3, which also indicates the continuous (not yet fixed) coefficients of the overall filter. Here, the orders of the overall filter and subfilters are the same as those assumed in Section II.

In the first segment, the search is similar to the traditional one. The coefficients are fixed one by one (each accompanied by re-optimization). In addition, each node (where a coefficient is fixed) is accompanied by an update of two sub trees, one for each of the two trees described in Section II. The details are as follows. When the first coefficient is fixed to an integer at a node, as shown in Fig. 3, both sub trees are initialized to contain only their initial nodes, as presented in Section II. These sub trees are attached to that node. As more and more coefficients are fixed along the path, as shown in Fig. 3, at each node the two sub trees are inherited from its parent and grown by one more level, as shown in Part A of Fig. 2. During this procedure, the two sub trees are growing, i.e., the number of possible pairs at a level is larger than that at the previous level. The node at a given level of the MILP search is thus attached to two sets containing all possible pairs of subfilter coefficients whose convolutions equal the overall filter coefficients fixed along the MILP search path up to this node.

The MILP search is continued in the second segment in Fig. 3. Once a coefficient is fixed at a node, the sub trees of its parent node are inherited and updated according to the decomposition procedure depicted in Part B of Fig. 2. This procedure continues until the last level of Part B, when the final sets are obtained. During this procedure, the two sub trees of the nodes along an MILP search path are non-growing, and in general shrinking, i.e., the number of possible pairs at a level is generally smaller than that at the previous level. In the second segment, besides the traditional MILP mechanism for terminating a path, the MILP search path is also terminated if all the nodes in the sub trees at a certain level become empty.

In the last segment of the search, since all the coefficients of the subfilters are fixed, there is no need to fix the remaining coefficients one by one as in the traditional MILP search. Instead, a pool is generated to store all the feasible pairs of subfilters by combining any pair in the two final sets. The convolution of each pair of subfilters in the pool is calculated. The pairs that lead to a symmetric overall filter meeting the specifications are kept in the pool. Following the above 3-segment search, decomposed subfilters with discrete coefficients are obtained if the pool is not empty. To find the pair of subfilters that uses the minimum number of adders, other paths as shown in Fig. 3 are searched recursively in the same 3-segment manner, until all the possible discrete-valued overall filters have been searched.

To monitor the numbers of adders used in the subfilters during the search procedure, two dynamically expanding subexpression basis sets [11], [33] are adopted for each pair of subfilter coefficient sets in the two sub trees. As the decomposition proceeds, the subexpression basis sets of the two subfilters grow accordingly, while the numbers of adders used to realize their coefficients are monitored. The basic mechanism is the same as that in [33], except that here the sum of the orders of the two basis sets is compared with the best result achieved so far. Consider the orders of the corresponding basis sets of the two subfilters, and the total number of multiplier block adders of the best solution so far.
For each pair combination formed from the two sub trees, the sum of its basis set orders is compared with the best total. If the sum exceeds the best total, there is no need to investigate this pair, as it results in more adders than the current best one. Thus, the number of pairs in the sub trees at each level can be further reduced. As a result, the overall computational complexity of the optimization process can be greatly reduced.

The above search is based on traversing the tree of the overall filter coefficients, with two sub trees attached to each node, rather than on the tree of subfilter coefficient pairs as proposed in [34]. As a result, with the proposed algorithm, the total number of linear programming (LP) iterations (which consume the major part of the computation time) is kept at the same order as that of the single-stage optimization algorithm in [33]. The main computational overhead of the proposed algorithm lies in the sub tree updates at each node. Compared with the time it takes to do LP iterations, this overhead is negligible.

IV. SOME REMARKS

In this section, some important explanations and observations of the proposed algorithm are discussed.

First, the EWL of the subfilters is predefined so as to confine the magnitudes of the subfilter coefficients when generating all the possible coefficient pairs in the two trees. It should be noted that a larger EWL results in more possible pairs in each tree. However, this also results in heavier computational complexity.

Second, when the first coefficient of the overall filter is fixed at the very beginning of the optimization, the corresponding pairs of subfilter coefficients are to be generated. It should be noted that the signs of these coefficients do not really affect the final design of the subfilters. In other words, two pairs that differ only in the signs of both entries lead to the same design. The proposed algorithm can be accelerated by simply discarding either of such pairs.

Third, from our design experience, the orders of the two subfilters are of much importance. They have a significant impact on the computational complexity of the algorithm as well as on the number of feasible designs. Specifically, it is observed that as the difference between the two subfilter orders decreases: 1) the computation time increases dramatically and 2) the number of feasible designs tends to drop.

The first observation is due to the generation of the two trees. Apparently, according to (5), for a larger subfilter order, more filter coefficients are used in generating the nodes of one tree (and the same holds for the other tree). For the second observation, it is further noted that for every specific design, there exists a subfilter order beyond which increasing the order does not result in any feasible design. It was speculated in [34] that this limit is the order of the subfilter that first yields no feasible decomposition solution. For instance, if there are no feasible decompositions for a 2-order and 5-order combination, there will be no feasible decompositions for a 3-order and 4-order combination. However, this speculation is false, as shown by an example in Section V.

V. DESIGN EXAMPLES

In this section, the capability of the proposed algorithm in designing linear phase FIR filters in cascade form is illustrated. Nine benchmark filters taken from the literature are designed. Most of the filters are the same as those in design example A in [33]. For convenience, the specifications are re-listed in Tables I and II. It should be noted that one of the filters is the second example in [11]. In Table I, the listed quantities are the passband edge, stopband edge, passband ripple and stopband ripple, respectively. In Table II, the listed bounds are the upper bound and lower bound of the passband gain, respectively.

TABLE I SPECIFICATIONS OF DESIGN EXAMPLES. THE LISTED QUANTITIES ARE THE PASSBAND EDGE, STOPBAND EDGE, PASSBAND RIPPLE AND STOPBAND RIPPLE, RESPECTIVELY

TABLE II SPECIFICATIONS OF ONE BENCHMARK FILTER, INCLUDING THE UPPER BOUND AND LOWER BOUND OF THE PASSBAND. THE FILTER LENGTH IS 36

A. Designs With Minimum Filter Length/Coefficient Wordlength Combination

For the filter lengths given in Tables I and II, each filter has a minimum coefficient EWL, below which no discrete solution is available. The minimum EWLs of the 9 filters for the given filter lengths are listed in Table III. In this subsection, for all 9 filters, every possible order combination of the subfilters is optimized under this minimum filter length/coefficient wordlength constraint. It should be noted that for three of the longer filters, the total runtime is limited to 24 hours for each order combination. Therefore, there might exist feasible solutions for other combinations for these three filters. The presented ones are the best solutions obtained within the limited optimization time.

The best design results for each order combination are listed in Table III, together with the length, number of multiplier block adders and number of structural adders of each subfilter. MAD and AAD are the maximum adder depth and average adder depth, respectively. The total numbers of adders are compared with those of the single-stage optimum designs. It can be seen that for six of the filters, the cascade form outperforms the single-stage optimum design. The maximum reductions in the total number of adders for those filters designed in cascade form are 1, 1, 1, 6, 7 and 6 adders, respectively.

It should also be noted that, for one order combination of the two subfilters, no feasible discrete solutions were found for one of the filters. However, for another order combination, feasible solutions do exist. Nevertheless, for every specific filter, there does exist a combination of filter orders beyond which there are no feasible solutions.

As important criteria for high speed and low power implementation, the maximum adder depth (MAD) and average adder depth (AAD) are also listed in Table III for comparison. It can be seen that for most of the filters, cascade form designs achieve lower MAD and AAD. In particular, for several filters, the cascade designs outperform their single-stage designs in both MAD and AAD; others have the same MAD as their single-stage design but a lower AAD. Only one design results in a larger MAD. Also, it can be seen that for all the filters (except one), the cascade design can achieve a lower AAD than the single-stage design. Since lower adder depth is preferable for high speed and low power circuit design, it is clear that the proposed algorithm is capable of producing better designs in this sense.

In the cascade implementation, the wordlength of the signal into the second subfilter is increased if no truncation is applied. One might suspect that the decrease in the number of adders may result in an increase in the number of full adders (FAs). To examine this, the total number of FAs for all the design examples is also listed in Table III. It can be seen that the FA count is consistent with the total number of adders. Also, it should be noted


TABLE III SUMMARY OF DESIGN RESULTS. THE LENGTH, NUMBER OF MULTIPLIER BLOCK ADDERS AND NUMBER OF STRUCTURAL ADDERS OF EACH SUBFILTER ARE LISTED. MAD AND AAD ARE THE MAXIMUM ADDER DEPTH AND AVERAGE ADDER DEPTH, RESPECTIVELY. FA IS THE TOTAL NUMBER OF FULL ADDER CELLS REQUIRED, GIVEN THE INPUT SIGNAL WORDLENGTH. ALL THE SUBFILTERS ARE SYMMETRIC

that the calculation of the FA count is based on the assumption that subfilter 1 is realized before subfilter 2. Changing the positions of the two might result in a different FA count. This problem is analyzed in Appendix A.

It is noted that in all the cases, the overall filter can always be decomposed into a 1st-order filter cascaded with another one. It is well known that an odd-order symmetric filter always has a zero at z = -1, and thus a term of (1 + z^{-1}) can certainly be extracted. As for even-order symmetric filters, [37] shows a special case of half-band filters; as long as the half-band filter meets certain mild constraints, the simple block can always be factored out. However, for general even-order symmetric FIR filters, the constraints in [37] cannot always be met. Therefore, it is not likely that this block can always be factored out for any design specifications, though it is true for the particular benchmark filters listed in Table III.

B. Designs With Relaxed Filter Length/Coefficient Wordlength Combination

When the minimum filter length/coefficient wordlength combination is applied, it has been found that the number of possible combinations of subfilter orders is rather limited, especially for filters with longer lengths. In this subsection, three of the filters are designed with a relaxed filter length/wordlength combination. The order of each overall filter is increased by 1, while the EWL is increased by 1 as well, as shown in Table IV. It can be seen that relaxing the filter length/coefficient wordlength constraints results not only in more possible subfilter order combinations but also in even better results in MAD, AAD, total number of adders and FA count. For the first of these filters, the best solution requires only 12 adders to implement the whole filter, which is a 25% reduction compared with the single-stage optimum design with the same length. For the second, the total number of adders of the best result is also reduced from 23 to 21. For the third, only 27 adders are required in the best case to implement the

cascade form realization, which is 5 fewer than the single-stage design and 10 fewer than that in Table III. Furthermore, even lower MAD and AAD are achieved compared with the results shown in Table III. The impulse responses of the subfilters for these three filters (the best result of each one, as indicated in bold letters in Table III) are listed in Table VI in Appendix B for reference.

C. Non-Symmetric Design

It should be pointed out that all the results obtained in Sections V-A and V-B are designs with symmetric subfilters. No non-symmetric decomposition was found for the given filter length and EWL constraints. Only when the EWL of the overall filter is much more relaxed than that in Table III can non-symmetric subfilter decompositions be found. Some of the design results are listed in Table V for reference. Compared with the symmetric cases in Tables III and IV, it is clear that the non-symmetric designs require much higher implementation cost.

TABLE IV SUMMARY OF DESIGN RESULTS WITH INCREASED FILTER LENGTHS. THE LENGTH, NUMBER OF MULTIPLIER BLOCK ADDERS AND NUMBER OF STRUCTURAL ADDERS OF EACH SUBFILTER ARE LISTED. MAD AND AAD ARE THE MAXIMUM ADDER DEPTH AND AVERAGE ADDER DEPTH, RESPECTIVELY. FA IS THE TOTAL NUMBER OF FULL ADDER CELLS REQUIRED, GIVEN THE INPUT SIGNAL WORDLENGTH. ALL THE SUBFILTERS ARE SYMMETRIC

TABLE V SUMMARY OF DESIGN RESULTS WITH NON-SYMMETRIC SUBFILTERS. THE LENGTH, NUMBER OF MULTIPLIER ADDERS AND NUMBER OF STRUCTURAL ADDERS OF EACH SUBFILTER ARE LISTED

VI. CONCLUSION

In this paper, a general-purpose algorithm is proposed for the design of linear phase FIR filters in cascade form with discrete coefficients. The proposed algorithm decomposes the overall filter into subfilters during the traversal of the MILP search of the overall filter. The conventional recursive search process is modified to accommodate the decomposition procedures. Design examples have illustrated that the proposed algorithm is able to achieve a significant reduction in both implementation cost and adder depth compared with the single-stage optimum designs. In addition, although it is not necessary for the cascaded subfilters to be symmetric, our designs show that the results with symmetric subfilters significantly outperform those with non-symmetric ones in the number of adders. Furthermore, increasing the filter length and/or coefficient wordlength beyond the minimum requirement may result in fewer adders and lower adder depth for a given specification.

APPENDIX A
ORDERING OF THE TWO SUBFILTERS

At the system level, there is no difference in the ordering of the two subfilters, as long as the intermediate output is not quantized. However, at the bit level, the total number of full adder cells might increase or decrease due to the change of the signal wordlength. Therefore, in this appendix, the total number of full adders required to implement an FIR filter is analyzed. Based on this analysis, the effect of different orderings of subfilters in cascade form is discussed.

For a single-stage FIR filter realized in transposed direct form, consider the total number of non-zero coefficients and the corresponding coefficient values. For instance, if the length of the filter is 4 and its coefficient values are 3, 5, 0 and a fourth non-zero value, then the number of non-zero coefficients is 3, and the non-zero coefficient values are 3, 5 and that fourth value, respectively. This assumption simplifies the analysis because a zero coefficient has no contribution to the cost of adders and therefore should not be taken into account. In addition, consider the total number of full adders (FAs) used to realize the filter, the number of FAs for the multiplier block and the number of FAs for the structural adders. It is also assumed that there is no truncation of the intermediate results. Hence, the total number of FAs is clearly given by (12), i.e., the sum of the multiplier block FAs and the structural adder FAs.

It is apparent that the number of multiplier block FAs is related to the specific MCM algorithm used to synthesize the filter coefficients. Suppose that the MCM algorithm results in a set of non-one positive odd numbers and that this set achieves its lower bound in the number of MCM adders [38]. Then the number of multiplier block FAs can be approximately expressed as (13), in terms of the elements of this set and the wordlength of the input signal. It should be pointed out that (13) overestimates the extra bits needed for the synthesis of coefficients. For example, realizing a particular constant multiple of the input signal does not require a full adder as long as (13) suggests (for simplicity, the extra logic for computing the sign bit in the two's complement case is not considered); from (13), 2 extra bits are estimated for this realization. We shall keep this pessimistic estimate because, as shown later, the dominant part of the overall complexity is not due to (13) and it has no impact on the ordering of the subfilters. For more detailed complexity models of bit-level adders, readers are referred to [39], [40].

On the other hand, the number of structural adder FAs can be expressed as the sum of the FAs for each structural adder. Consider the number of FAs used for each structural adder, which corresponds to the addition

SHI AND YU: DISCRETE-VALUED LINEAR PHASE FIR FILTERS

1635

TABLE VI IMPULSE RESPONSES OF THE SUBFILTERS DESIGN EXAMPLES , AND

FOR

The rst term in (16) corresponds to the total number of adders calculated at the system level (without going down to the bit level). The second and third terms in (16) correspond to the extra FAs needed for the bit extension. It is obvious that in the system level optimization (which is done in this work and most other works), the second and third terms are not taken into account. And the value of the second term is usually much less than the third one. This is because both the number and magnitude of are often much smaller than . Hence, in calculating extra FAs for bit extension, the third term is dominant. Furthermore, it is worth mentioning that for a more accurate and practical algorithm, optimization problems should be formulated in this bit level and (16) can be the objective function to be minimized. The implementation cost for a cascade design can thus be easily obtained by extending the results shown above into the two-lter scenario. For the combination of and (where is placed before , the total number of FAs is (17) where the superscript in denotes the order of before . Using (16) and note that the input signal wordlength for the second lter is , instead of , we can obtain the condition under which the cascade of before requires less number of FAs than the cascade of before , as given by

of the tap delay line signal and the product of the input signal with the coefcient . Then, can be computed as (14) Therefore, can be obtained by using (14) for all the structural adders and we have (18) From (18) it can be seen that if a sublter has longer lter length but smaller sum of coefcient magnitude, this lter should be placed before the other one. However, longer lters often have larger sum of coefcient magnitude, hence the placement of the sublters needs to be checked case by case. Our design experience shows that the ordering of the two sublters does not result in signicant difference in FA count. APPENDIX B IMPULSE RESPONSES OF DESIGN EXAMPLES See Table VI. REFERENCES
[1] R. I. Hartley, Subexpression sharing in lters using canonic signed digit multipliers, IEEE Trans. Circuits Syst. II, Analog. Digit. Signal Process., vol. 43, no. 10, pp. 677688, Oct. 1996. [2] A. G. Dempster and M. D. Macleod, Use of minimum-adder multiplier blocks in FIR digital lters, IEEE Trans. Circuits Syst. II, Analog. Digit. Signal Process., vol. 42, no. 9, pp. 569577, Sep. 1995. [3] F. Xu, C.-H. Chang, and C.-C. Jong, Contention resolution algorithm for common subexpression elimination in digital lter design, IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 52, no. 10, pp. 695700, Oct. 2005.

(15) It should be noted that there are only structural adders because the output of is directly wired to the register. Combining (15) and (13), we have

AND

(16)


[4] N. Sankarayya and K. Roy, "Algorithms for lower power and high speed FIR filter realization using differential coefficients," IEEE Trans. Circuits Syst. II, Analog. Digit. Signal Process., vol. 43, no. 6, pp. 488–497, Jun. 1997.
[5] O. Gustafsson, "A difference based adder graph heuristic for multiple constant multiplication problems," in Proc. IEEE ISCAS'07, New Orleans, LA, 2007, pp. 1097–1100.
[6] Y. Voronenko and M. Püschel, "Multiplierless multiple constant multiplication," ACM Trans. Algorithms, vol. 3, pp. 1–38, May 2007.
[7] J. Yli-Kaakinen and T. Saramäki, "A systematic algorithm for the design of multiplierless FIR filters," in Proc. IEEE ISCAS'01, Sydney, Australia, 2001, pp. 185–188.
[8] O. Gustafsson and L. Wanhammar, "Design of linear-phase FIR filters combining subexpression sharing with MILP," in Proc. MWSCAS'02, 2002, pp. 9–12.
[9] F. Xu, C.-H. Chang, and C.-C. Jong, "Design of low-complexity FIR filters based on signed-powers-of-two coefficients with reusable common subexpressions," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 26, no. 10, pp. 1898–1907, Oct. 2007.
[10] Y. J. Yu and Y. C. Lim, "Design of linear phase FIR filters in subexpression space using mixed integer linear programming," IEEE Trans. Circuits Syst. I, vol. 54, pp. 2330–2338, Oct. 2007.
[11] Y. J. Yu and Y. C. Lim, "Optimization of linear phase FIR filters in dynamically expanding subexpression space," Circuit Syst. Signal Process, vol. 29, pp. 65–80, Feb. 2010.
[12] M. Aktan, A. Yurdakul, and G. Dündar, "An algorithm for the design of low-power hardware-efficient FIR filters," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 7, pp. 1536–1545, Jul. 2008.
[13] D. S. K. Chan and L. R. Rabiner, "Theory of roundoff noise in cascade realizations of finite impulse response digital filters," Bell Syst. Tech. J., vol. 52, pp. 229–245, Mar. 1973.
[14] D. S. K. Chan and L. R. Rabiner, "An algorithm for minimizing roundoff noise in cascade realizations of finite impulse response digital filters," Bell Syst. Tech. J., vol. 52, pp. 347–385, Mar. 1973.
[15] R. Shively, "On multistage finite impulse response (FIR) filters with decimation," IEEE Trans. Acoust., Speech Signal Process., vol. 23, no. 8, pp. 353–357, Aug. 1975.
[16] R. Crochiere and L. Rabiner, "Optimum FIR digital filter implementations for decimation, interpolation, and narrowband filtering," IEEE Trans. Acoust., Speech Signal Process., vol. 23, no. 10, pp. 444–456, Oct. 1975.
[17] D. J. Goodman and M. J. Carey, "Nine digital filters for decimation and interpolation," IEEE Trans. Acoust., Speech, Signal Process., vol. 25, no. 4, pp. 121–126, Apr. 1977.
[18] E. B. Hogenauer, "An economical class of digital filters for decimation and interpolation," IEEE Trans. Acoust., Speech, Signal Process., vol. 29, no. 4, pp. 155–162, Apr. 1981.
[19] M. Smith and D. Farden, "Statistical design of cascade finite wordlength FIR digital filters," in Proc. IEEE ICASSP'84, Mar. 1984, pp. 583–585.
[20] W. Sung and S. Mitra, "Efficient FIR filter design using differential coding of filter coefficients," in Proc. IEEE ICASSP'85, Apr. 1985, pp. 45–48.
[21] Y. Neuvo, D. Cheng-Yu, and S. K. Mitra, "Interpolated finite impulse response filters," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-32, no. 6, pp. 563–570, Jun. 1984.
[22] T. Saramäki, Y. Neuvo, and S. K. Mitra, "Design of computationally efficient interpolated FIR filters," IEEE Trans. Circuits Syst., vol. 35, no. 1, pp. 70–88, Jan. 1988.
[23] S. K. Mitra, Digital Signal Processing, A Computer-Based Approach. New York: McGraw-Hill, 1998, ch. 10.
[24] Y. C. Lim, "Frequency-response masking approach for the synthesis of sharp linear phase digital filters," IEEE Trans. Circuits Syst., vol. CAS-33, no. 5, pp. 357–364, Apr. 1986.
[25] Y. C. Lim and Y. Lian, "The optimum design of one- and two-dimensional FIR filters using the frequency-response masking technique," IEEE Trans. Circuits Syst. II, Analog. Digit. Signal Process., vol. 40, no. 2, pp. 88–95, Feb. 1993.
[26] Y. Lian and Y. C. Lim, "Reducing the complexity of frequency response masking filters using half band filters," Signal Process., vol. 42, no. 3, pp. 227–230, Mar. 1995.
[27] R. J. Hartnett and G. F. Boudreaux-Bartels, "On the use of cyclotomic polynomial prefilters for efficient FIR filter design," IEEE Trans. Signal Process., vol. 41, no. 5, pp. 1766–1779, May 1993.
[28] Y. Lian, "Complexity reduction for frequency-response masking based FIR filters via prefilter-equalizer technique," Circuits Syst. Signal Process., vol. 22, no. 3, pp. 137–155, Mar. 2003.
[29] H. J. Oh and Y. H. Lee, "Design of discrete coefficient FIR and IIR digital filters with prefilter-equalizer structure using linear programming," IEEE Trans. Circuits Syst. II, Analog. Digit. Signal Process., vol. 47, no. 6, pp. 562–565, Jun. 2000.
[30] Y. C. Lim and B. Liu, "Design of cascade form FIR filters with discrete valued coefficients," IEEE Trans. Signal Process., vol. 36, no. 11, pp. 1735–1739, Nov. 1988.
[31] Y. C. Lim, R. Yang, and B. Liu, "The design of cascaded FIR filters," in Proc. IEEE ISCAS'96, May 1996, vol. 2, pp. 181–184.
[32] S. U. Ahmad and A. Antoniou, "Cascade-form multiplierless FIR filter design using orthogonal genetic algorithm," in Proc. 2006 IEEE Symp. Signal Process. Inf. Technol., Aug. 2006, pp. 932–937.
[33] D. Shi and Y. J. Yu, "Design of linear phase FIR filters with high probability of achieving minimum number of adders," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 58, no. 1, pp. 126–136, Jan. 2011.
[34] D. Shi and Y. J. Yu, "Low-complexity linear phase FIR filters in cascade form," in Proc. IEEE ISCAS'10, 2010, pp. 177–180.
[35] Y. C. Lim, "Design of discrete-coefficient-value linear phase FIR filters with optimum normalized peak ripple magnitude," IEEE Trans. Circuits Syst., vol. 37, no. 12, pp. 1480–1486, Dec. 1990.
[36] Y. C. Lim and Y. J. Yu, "A width-recursive depth-first tree search approach for the design of discrete coefficient perfect reconstruction lattice filter bank," IEEE Trans. Circuits Syst. II, Analog. Digit. Signal Process., vol. 50, no. 6, pp. 257–266, Jun. 2003.
[37] A. N. Willson Jr., "Desensitized half-band filters," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 1, pp. 152–165, Jan. 2010.
[38] O. Gustafsson, "Lower bounds for constant multiplication problems," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 54, no. 11, pp. 974–978, Nov. 2007.
[39] K. Johansson, O. Gustafsson, and L. Wanhammar, "A detailed complexity model for multiple constant multiplication and an algorithm to minimize the complexity," in Proc. IEEE ECCTD 2005, 2005, vol. 3, pp. 465–468.
[40] K. Johansson, O. Gustafsson, and L. Wanhammar, "Bit-level optimization of shift-and-add based FIR filters," in Proc. IEEE ICECS 2007, 2007, pp. 713–716.

Dong Shi (S'07) received the B.Eng. degree from the Department of Electronic Engineering, Shanghai Jiaotong University, Shanghai, China, in 2006 and the Ph.D. degree from the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, in 2011. Since 2010, he has been with Creative Technology, Ltd., Singapore. His research interests include low power and low complexity digital filters design, acoustic and array signal processing.

Ya Jun Yu (S'99–M'05–SM'09) received the B.Sc. and M.Eng. degrees in biomedical engineering from Zhejiang University, Hangzhou, China, in 1994 and 1997, respectively, and the Ph.D. degree in electrical and computer engineering from the National University of Singapore, Singapore, in 2004. From 1997 to 1998, she was a Teaching Assistant with Zhejiang University. She joined the Department of Electrical and Computer Engineering, National University of Singapore as a Post Master Fellow in 1998 and remained in the same department as a Research Engineer until 2004. She joined the Temasek Laboratories at Nanyang Technological University as a Research Fellow in 2004. Since 2005, she has been with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, where she is currently an Assistant Professor. Her research interests include digital signal processing and VLSI circuits and systems design.

Dr. Yu has served as an Associate Editor for Circuits Systems and Signal Processing and IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS since 2009 and 2010, respectively.
