1986 Thesis Application of Multiplicative Complexity Theory To Convolution & DFT


RICE UNIVERSITY

APPLICATIONS OF MULTIPLICATIVE COMPLEXITY THEORY TO CONVOLUTION AND THE DISCRETE FOURIER TRANSFORM

by

MICHAEL THOMAS HEIDEMAN

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE DOCTOR OF PHILOSOPHY

APPROVED, THESIS COMMITTEE:

C. Sidney Burrus, Professor of Electrical and Computer Engineering, Director
Thomas W. Parks, Professor of Electrical and Computer Engineering
John E. Dennis, Professor of Mathematical Sciences

Houston, Texas
April, 1986

Abstract

A review of the theory of multiplicative complexity and its application to common signal processing operations is presented. This review collects results on the multiplicative complexity of polynomial multiplication, products in extension fields, multivariate polynomial multiplication, and multiple products in the same extension field with one or more fixed polynomials. In addition, new results are obtained for the multiplicative complexity of multidimensional cyclic convolution, the one-dimensional discrete Fourier transform (DFT), and convolutions and DFTs with input constraints or output restrictions.

The multiplicative complexity of multidimensional cyclic convolution is determined for any possible combination of lengths in any number of dimensions, extending a result of Winograd for one- and two-dimensional cyclic convolution. This result is shown to be applicable in determining the multiplicative complexity of the one-dimensional DFT.
The multiplicative complexity of the DFT for all possible lengths is determined, starting with Winograd's result for odd prime lengths and then extending it to power-of-prime lengths, power-of-two lengths, and finally to arbitrary lengths.

The multiplicative complexity of systems of polynomial multiplication with constrained inputs is considered. If the multiplicative complexity is to be reduced below that of unconstrained polynomial multiplication, an input constraint must imply a nontrivial factorization of one input polynomial in which one factor has coefficients only in the ground field. This result is applied to symmetric polynomial multiplication.

The multiplicative complexity of polynomial products for which only selected outputs are needed is analyzed. Complexities are derived for polynomial products with decimated and truncated outputs, but no general rule is apparent for arbitrary output restrictions.

The effects of input constraints and output restrictions on the multiplicative complexity of the discrete Fourier transform are considered. Specifically, restrictions to one input or output are analyzed, as are even- or odd-symmetric inputs.

Acknowledgements

I would like to thank Dr. C. S. Burrus for his support and encouragement in this research. The research presented in this thesis was funded by the National Science Foundation and NASA through research grants that are hereby acknowledged. Finally, I must express my gratitude to my wife, Edith Heideman, and my children, Courtney, Adrian, and Brittany, who endured these years of financial hardship.

Table of Contents

Chapter 1. Introduction
1.1. An Overview of Multiplicative Complexity
1.2. Why Count Multiplications and Divisions Only?
1.3. Organization
Chapter 2. Multiplicative Complexity of Linear and Bilinear Systems
2.1. Historical Perspective
2.2. Definitions and Basic Results
2.3. Semilinear Systems
2.4. Quadratic and Bilinear Systems
2.4.1. Properties of Quadratic Systems
2.4.2. Bilinear Systems and Noncommutative Algorithms
2.4.3. Direct Products and Direct Sums of Systems
2.5. Summary of Chapter 2
Chapter 3. Convolution and Polynomial Multiplication
3.1. Aperiodic Convolution / Polynomial Multiplication
3.2. Polynomial Multiplication Modulo an Irreducible Polynomial
3.3. Polynomial Multiplication Modulo a General Polynomial
3.4. Products of a Fixed Polynomial with Several Polynomials
3.4.1. Equivalent Systems and Row Reduced Forms
3.4.2. Reduction and Inflation Mappings
3.4.3. Equivalence of Products with a Fixed Polynomial
3.4.4. Multiplicative Complexity Results
3.5. Products with Several Fixed Polynomials in the Same Ring
3.6. Products with Several Fixed Polynomials in Different Rings
3.7. Multivariate Polynomial Multiplication
3.7.1. Polynomial Products in Two Variables
3.7.2. Multidimensional Cyclic Convolution
3.8. Summary of Chapter 3
Chapter 4. Constrained Polynomial Multiplication
4.1. General Input Constraints
4.2. Multiplication by a Symmetric Polynomial
4.3. Multiplication by an Antisymmetric Polynomial
4.4. Products of Two Symmetric Polynomials
4.5. Polynomial Multiplication with Restricted Outputs
4.5.1. Decimation of Outputs
4.5.2. Other Output Restrictions
4.6. Summary of Chapter 4
Chapter 5. Multiplicative Complexity of the Discrete Fourier Transform
5.1. The Discrete Fourier Transform
5.2. Prime Lengths
5.2.1. Rader's Permutation
5.2.2. Multiplicative Complexity
5.3. Powers of Prime Lengths
5.4. Powers of Two Lengths
5.5. Arbitrary Lengths
5.6. DFTs with Complex-Valued Inputs
5.7. Summary of Chapter 5
Chapter 6. Restricted and Constrained DFTs
6.1. Restricting DFT Outputs to One Point
6.2. Constraining DFT Inputs to One Point
6.3. DFTs with Symmetric Inputs
6.4. Summary of Chapter 6
Appendix A. Cyclotomic Polynomials and Their Properties
Appendix B. Complexities of Multidimensional Cyclic Convolutions
Appendix C. Programs for Computing Multiplicative Complexity
Appendix D. Tabulated Complexities of the One-Dimensional DFT
Bibliography

CHAPTER 1
Introduction

1.1. An Overview of Multiplicative Complexity

Ever since man first conceived the abstract notions of numbers and arithmetic he has attempted to simplify the numeric computations required to extract desired information from numerical measurements. The abacus provides an example of a device invented long ago whose primary purpose was to simplify numerical computation. The numerical analysis techniques developed between the 16th century and the beginning of the 20th century were intended to simplify hand calculations, although most are equally applicable to machine calculations. After the advent of digital computers, many new areas of numeric calculation were investigated. The previous lack of interest in many of these problems was apparently because of their complexity and seeming intractability when all calculations were done manually.

This thesis continues research begun by others in the early 1950's, attempting to quantify the number of multiplication or division operations that are necessary to compute certain functions. The first problem that was investigated was the number of multiplications necessary to evaluate a polynomial at a given point. Later, in the 1960's, some results were obtained for the multiplication of polynomials, certain special types of matrix multiplication, and the discrete Fourier transform. Most of these results were upper bounds obtained by heuristically inventing an algorithm, which sometimes resulted in the development of a theoretical basis for the algorithm, and in only a few cases resulted in a derivation of a lower bound for the same set of computations. In the early 1970's, the theory of multiplicative complexity emerged as a new field of study, unifying some of the earlier results into a common framework. In several cases the number of multiplications/divisions necessary and sufficient to compute certain systems was established. In other words, rather than bounding the complexity with what were often loose bounds, an exact number of multiplication/division operations is determined to be the minimum required to compute a system. For instance, it has been proven that 7 multiplications are necessary and sufficient to multiply two arbitrary 2x2 matrices.

The systems under consideration in this thesis are those commonly encountered in digital signal processing. These include aperiodic and cyclic convolution and the discrete Fourier transform. Winograd has derived some multiplicative complexity results for these types of systems; in particular, exact expressions for the minimum number of required multiplications are known for aperiodic and cyclic convolution and the discrete Fourier transform of a prime-length sequence if the inputs are all indeterminate.
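The 2x2 matrix result quoted above is realized by Strassen's construction [31]. The sketch below is an illustration added to this edition rather than part of the original text; the seven products p1 through p7 are its only multiplications of matrix entries.

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices with 7 multiplications (Strassen).

    A and B are 2x2 nested lists; p1..p7 are the only products of
    matrix entries, everything else is addition and subtraction.
    """
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    p1 = (a11 + a22) * (b11 + b22)
    p2 = (a21 + a22) * b11
    p3 = a11 * (b12 - b22)
    p4 = a22 * (b21 - b11)
    p5 = (a11 + a12) * b22
    p6 = (a21 - a11) * (b11 + b12)
    p7 = (a12 - a22) * (b21 + b22)
    return [[p1 + p4 - p5 + p7, p3 + p5],
            [p2 + p4, p1 - p2 + p3 + p6]]

# agrees with the straightforward 8-multiplication product
assert strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]) == [[19, 22], [43, 50]]
```

Iterating this construction on matrix blocks is what yields the asymptotically faster algorithms for larger matrices mentioned in §2.1.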
Lower bounds have been obtained for the multiplicative complexity of cyclic convolution when only one input sequence is completely indeterminate. Proofs of all these results are included in this thesis for completeness, since these proofs have never before been collected in one place.

This thesis extends the knowledge of the multiplicative complexity of the discrete Fourier transform from sequence lengths that are prime numbers to arbitrary sequence lengths. Previous work in this area had demonstrated the possibility of evaluating the multiplicative complexity for sequence lengths that are powers of prime numbers, but explicit formulas for the minimum number of required multiplications had not been derived. The multiplicative complexity of multidimensional cyclic convolution is derived for all possible lengths.

Another contribution of this thesis is the analysis of the multiplicative complexity of convolution and the discrete Fourier transform when either the inputs have been constrained or the outputs have been restricted. It is shown that only certain types of input constraints reduce the multiplicative complexity of a system of aperiodic convolution. The discrete Fourier transform with symmetric or antisymmetric inputs is investigated. The multiplicative complexity of computing certain subsets of the outputs of the aforementioned systems is analyzed, but general results for this type of restriction are not as easily obtained as for the input constraints.

1.2. Why Count Multiplications and Divisions Only?

One question that must be addressed is the reason for examining multiplicative complexity in the first place. The types of systems to which the theory of multiplicative complexity is applied are those in which a desired set of outputs can be expressed exactly in terms of a given set of inputs using only the field operations of addition, subtraction, multiplication, and division. Of these four operations, multiplication and division are intrinsically more difficult, generally requiring many additions/subtractions to be approximated to a desired precision in most fields of interest. Much of the computer hardware built before the mid-1970's, and most fixed-point processors today, require significantly more time to multiply or divide two numbers than to add or subtract them.

Analysis of the multiplicative complexity of a system of equations provides a definitive minimum number of required multiplications/divisions, which can be useful in evaluating known algorithms for computing the system. Frequently the techniques used in evaluating the multiplicative complexity of a system also provide a means of deriving algorithms that will realize the minimal multiplicative complexity, or provide insights into the development of suboptimal algorithms that may be more efficient than pre-existing algorithms.

A reduction of the number of multiplications or divisions used in computing a given system is possible through the application of the distributive, commutative, and associative laws in the field of interest, and in particular through the use of these laws in exploiting algebraic properties of the system. Frequently, when a system is large, the algorithms that realize the minimum multiplicative complexity require excessive numbers of additions, and can be extremely susceptible to numerical errors when implemented on standard computing hardware.
Even in these cases it is almost always possible to decompose a large system into smaller systems, each lacking the problem of excessive additions when individually implemented with the minimum number of multiplications/divisions, to obtain suboptimal algorithms that are practical and less likely to exhibit unacceptable computational errors.

1.3. Organization

The following chapters describe the various types of systems encountered, using an abstract algebraic approach originally due to Winograd. The placement of these systems into the algebraic setting requires the use of notation that is probably much more familiar to mathematicians than engineers. An attempt has been made to remove some of the abstraction by including frequent examples showing the application of the abstract principles.

Chapter 2 presents a basis for the theory of multiplicative complexity and shows how this theory can be applied to bilinear systems. Semilinear systems are defined and their multiplicative complexity is analyzed for several specific semilinear systems. The concept of a substitution is introduced. The most important results of this chapter are the row-rank theorem and the column-rank theorem, originally proven by Winograd.

Chapter 3 describes the application of the theory of multiplicative complexity to several types of systems of polynomial multiplication, including systems equivalent to aperiodic convolution and cyclic convolution. The work on polynomial multiplication modulo irreducible polynomials is applicable to the analysis of the multiplicative complexity of the discrete Fourier transform, as are the extensions to general polynomials, polynomials in several variables, and multiplications by several polynomials. A large part of this chapter is devoted to the presentation of some general results of Auslander and Winograd on semilinear systems defined by polynomials. A new result is presented for the multiplicative complexity of multidimensional cyclic convolution that generalizes Winograd's two-dimensional analysis to an arbitrary number of dimensions.

Chapter 4 contains results on polynomial multiplication with constraints. An important theorem is proven that gives exact conditions an input constraint must satisfy if the multiplicative complexity of a system is to be reduced. This theorem is then applied to the analysis of the multiplicative complexity of polynomial products with various symmetries. A theory is also developed for some types of restrictions on the outputs of systems of polynomial multiplication, but the results are less general because of the different structure of this problem.

Chapter 5 applies the results of Chapter 3 to determine the multiplicative complexity of the one-dimensional discrete Fourier transform for all possible lengths. Some of these results are obtained for the first time in this thesis.

Chapter 6 extends the analysis of the multiplicative complexity of the discrete Fourier transform to include constraints on the inputs and computation of subsets of the outputs. These special situations include both input and output pruning and transforms of symmetric and antisymmetric sequences.

CHAPTER 2
Multiplicative Complexity of Linear and Bilinear Systems

Multiplicative complexity theory is a field that has been in existence for only about thirty years. This chapter begins by highlighting some of the major work in this field that has influenced more recent research into the multiplicative complexity of convolution and the discrete Fourier transform.
A framework is then established for the study of arithmetic algorithms and their multiplicative complexity, with emphasis on applications to bilinear systems. Semilinear systems are introduced for studying bilinear systems with one fixed set of inputs, such as a digital filter or the discrete Fourier transform. The important concepts of a direct product and a direct sum of systems are also introduced.

2.1. Historical Perspective

The rigorous development of the theory of multiplicative complexity of arithmetic algorithms has a short history, beginning with the development of some heuristic algorithms for polynomial evaluation in the mid-1950's [20, 24-26]. Schemes for multiplication of large integers [9, 17, 34], which were later extended to the multiplication of polynomials, soon followed. In 1969, Strassen [31] proposed an algorithm for multiplication of 2x2 matrices that could be iterated to generate new algorithms for matrix multiplication of arbitrary size with fewer multiplications than the straightforward method. These methods and the fast Fourier transform (FFT) of Cooley and Tukey [10] provided new upper bounds for the number of multiplications necessary to compute certain systems. These successes spurred the development of methods, beginning in the early 1970's, for determining realizable lower bounds on the number of multiplications necessary to compute certain systems [5, 12, 27, 35, 36, 38, 40]. In this chapter, some of these methods are presented as theorems that will then be used extensively in the following chapters.

We must first establish a notation to simplify the discussion of both multiplicative complexity and bilinear systems. Most of this notation is borrowed from Winograd [41] since he has been the main contributor to this theory. The following sections of this chapter attempt to present the main ideas that will be necessary in analyzing the multiplicative complexity of convolution and the discrete Fourier transform.

2.2. Definitions and Basic Results

An algorithm that computes an arithmetic function is a sequence of steps that computes exactly a set of desired outputs from the given set of inputs. This definition excludes algorithms for approximately computing an output or set of outputs, such as procedures for minimizing the value of a function, root-finding procedures, and the like.

A ground set G must be specified in order that trivial multiplications and divisions are not counted. G is typically chosen to be Q, the field of rational numbers, since multiplication by an integer can be carried out with a finite number of additions and/or subtractions, and division by an integer can be avoided by scaling the outputs such that each has the same integer denominator.

The base set B is defined to be the set of inputs to an algorithm, including all elements of G as a subset, in addition to a set of indeterminate values that are not in G. Occasionally it is desirable to define a field F containing G, and then let B = F ∪ {y_1, y_2, ..., y_n}, where {y_1, y_2, ..., y_n} is a set of indeterminates. The field F is typically F = G(x_1, x_2, ..., x_m), where {x_1, x_2, ..., x_m} is a set of indeterminates distinct from {y_1, y_2, ..., y_n} in that any operations involving only elements of {x_1, x_2, ..., x_m} yield results in the field F and thus need not be included in an algorithm, since these results are part of the base set B.
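As a concrete instance of these definitions (an illustration added here, with symbols chosen for the example rather than taken from the thesis), one may take

$$G = \mathbb{Q}, \qquad F = \mathbb{Q}(x_1, x_2), \qquad B = F \cup \{y_1, y_2\},$$

and the single output $z = x_1 y_1 + x_2 y_2$. Any expression built entirely from $x_1$, $x_2$, and rationals lies in $F$ and is free, and multiplications by elements of $G$ are not counted; only the products mixing elements of $F$ with the indeterminates $y_1, y_2$ can count as multiplication/division steps.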
Let H be the field H = F(y_1, y_2, ..., y_n), that is, the extension field of F by the set of indeterminates {y_1, y_2, ..., y_n}. H contains all possible results of (repeated) application of the field operations to the base set B. The values to be computed are the set Z = {z_1, z_2, ..., z_t}, where z_i ∈ H, i = 1, 2, ..., t, and the computation of these output values from a given B will be called the system Z.

Definition 2.1. Let G be a field, and let F ⊇ G be a field. Let H = F(y_1, y_2, ..., y_n) be a purely transcendental extension field of F. An algorithm A (over the set B) is defined to be a finite sequence h_1, h_2, ..., h_N of elements of H such that either h_i ∈ B or there exist j, k < i such that h_i = h_j ∘ h_k, where ∘ is one of the field operations +, −, ×, ÷.

[...]

Theorem 2.2. μ_B(Z;G) ≥ dim L_G(r(z_1), r(z_2), ..., r(z_t)), where L_G(·) denotes the G-linear span and r is the natural projection of H onto H′ = H/L_G(B).

Corollary 2.1. If t output values are being computed that are linearly independent (over G) in H′, then at least t multiplications are required to compute the set of outputs.

An algorithm that computes the outputs of a system using the minimum number of m/d steps is called a minimal algorithm. Occasionally some of the outputs can be computed directly from the inputs using a single m/d step, independent of the other outputs. The next theorem shows that a minimal algorithm exists that computes each element of this subset directly.

Theorem 2.3. [41] Let Z = {z_1, z_2, ..., z_t} ⊆ H and let Z′ = {z_1, z_2, ..., z_k} ⊆ Z be such that r(z_1), r(z_2), ..., r(z_k) are linearly independent over G and each z_i ∈ Z′ satisfies μ_B(z_i;G) = 1. There exists an algorithm A over B computing Z such that μ_B(A;G) = μ_B(Z;G) and the first k m/d steps of A are the elements of the set Z′.

Proof. Let A′ = (h_1, h_2, ..., h_N) be a minimal algorithm for computing Z = {z_1, z_2, ..., z_t} and let h(1), h(2), ..., h(s) be the m/d steps of A′. Assume, without loss of generality, that Z′ = {z_1, z_2, ..., z_k}. By Theorem 2.1, r(z_k) ∈ L_G(r(h(1)), r(h(2)), ..., r(h(s))). Let q be the smallest integer such that r(z_k) ∈ L_G(r(h(1)), r(h(2)), ..., r(h(q))). This output value can be expressed as

$$z_k = \sum_{i=1}^{q} g_i\, h(i) + \sum_i g_i'\, b_i,$$

where g_i, g_i′ ∈ G, b_i ∈ B, and, by the minimality of q, g_q ≠ 0. The algorithm A′ can be modified by inserting an initial sequence of steps that computes z_k using only one m/d step, and then replacing the m/d step h(q) by a sequence of non-m/d steps using the relation h(q) = g_q^{-1}(z_k − Σ_{i=1}^{q−1} g_i h(i) − Σ_i g_i′ b_i). The new algorithm has the same number of m/d steps as A′ and contains all steps in A′, and hence must also compute Z. If this process is repeated for z_{k−1}, z_{k−2}, ..., z_1, then A′ will have been converted into the algorithm A satisfying the conditions of the theorem. ∎

Example 2.2. Consider the system Z = {xy_0, cy_1} introduced in Example 2.1. In both A and A′ the first four steps are in L_G(B), and thus for both algorithms r(h_i) = 0, i = 1, 2, 3, 4. The remaining steps have no linear component over B for either algorithm, and consequently for these steps r(h_i) = h_i. In the algorithm A, the space L_G(r(h_5), r(h_6)) is composed of all elements expressible as g_0 xy_0 + g_1 cy_1 for g_0, g_1 ∈ G. Clearly the mappings r(h_i) of the first four steps are contained in this space by letting g_0 = g_1 = 0, and since r(h_5) and r(h_6) are basis vectors of the space, they are naturally included also, validating Theorem 2.1 for this algorithm. The algorithm A′ has three m/d steps, and the relevant space is L_G(r(h_5), r(h_6), r(h_8)).
This space has elements of the form g_0 xy_0 + g_1 (x+c)y_1 + g_2 xy_1, and since the basis vectors are independent over G the dimension of the space is three. The validity of Theorem 2.1 can be demonstrated for this algorithm, as for A, since the first four steps are the zero vector, the next three steps are the basis vectors, and the final step is the vector [0 1 −1], where the natural basis is used.

Theorem 2.2 states that μ_B(Z) ≥ dim L_G(xy_0, cy_1), and for indeterminates y_0 and y_1 no nonzero g_0, g_1 ∈ G exist such that g_0 xy_0 + g_1 cy_1 = 0; thus dim L_G(xy_0, cy_1) = 2 and μ_B(Z) ≥ 2. Corollary 2.1 also applies to this example since 2 independent outputs are being computed. Both outputs require only one m/d step each and are independent; thus Theorem 2.3 applies, and A is an example of an algorithm that does indeed compute both outputs directly.

2.3. Semilinear Systems

The preceding development has given a general framework for the evaluation of the multiplicative complexity of systems. This analysis will now be specialized to the class of semilinear systems. A semilinear system is defined to be the computation of Z = {z_1, z_2, ..., z_t}, where

$$z_i = \sum_{j=1}^{n} \phi_{ij}\, y_j, \qquad i = 1, 2, \ldots, t.$$

The set of inputs is B = F ∪ Y, where Y = {y_1, y_2, ..., y_n} are indeterminate elements of H, and F is the extension field of G by the φ_ij's. In matrix form this system is z = Φy, where z is the vector z = [z_1 z_2 ... z_t]^T, y is the vector y = [y_1 y_2 ... y_n]^T, and Φ is the t×n matrix with the entry φ_ij in the i-th row and j-th column.

The space over G whose elements are n-tuples of elements of F will be called V, and the corresponding space of t-tuples of elements of F will be denoted by W. Each row of Φ is thus an element of V and each column of Φ is an element of W. Let V′ be the quotient space V′ = V/G^n, in which each row of Φ represents an element. Define ρ_r(Φ) to be the dimension of the linear space over G spanned by the rows of Φ as elements of V′, and define ρ_c(Φ) to be the dimension of the linear space over G spanned by the columns of Φ as elements of the quotient space W′ = W/G^t. The following theorem is simply a rewording of Theorem 2.2 using this notation.

Theorem 2.4. [41] Let z = Φy be a semilinear system and ρ_r(Φ) the row rank of Φ over G; then μ_B(Φy) ≥ ρ_r(Φ).

Proof. Let Z in Theorem 2.2 be the set composed of the entries in the vector z = Φy. The entries of y are independent over G; thus each of the ρ_r(Φ) independent rows of Φ (in V′) will generate an independent output (over L_G(B)). Therefore dim L_G(r(z_1), r(z_2), ..., r(z_t)) ≥ ρ_r(Φ), which by the transitivity of inequality and Theorem 2.2 yields μ_B(Φy) ≥ ρ_r(Φ). ∎

A similar result can be proven for the column rank of Φ, but before presenting this theorem it will be useful to define the concept of a substitution. Intuitively, a substitution replaces one of {y_1, y_2, ..., y_n} with a linear combination of the other inputs to the system. The resulting system may require fewer m/d steps than the original system. The partial mapping α* is included in the definition to eliminate the possibility of division by zero when a substitution is made in an algorithm.

Definition 2.6. A substitution is a mapping α: {y_1, y_2, ..., y_n} → L_G(B). This mapping can be extended uniquely to a homomorphism ᾱ: F[y_1, y_2, ..., y_n] → F[y_1, y_2, ..., y_n] such that every f ∈ F remains fixed. A partial mapping α* can be defined as α*: F(y_1, y_2, ..., y_n) → F(y_1, y_2, ..., y_n) such that α*(a/b) = ᾱ(a)/ᾱ(b) when ᾱ(b) ≠ 0, for a, b ∈ F[y_1, y_2, ..., y_n]. A substitution is called a specialization of y_j when α(y_i) = y_i for i ≠ j and α(y_j) = f + Σ_{i≠j} g_i y_i for f ∈ F and g_i ∈ G.

Definition 2.7. A substitution α is compatible with an algorithm A if every step of A is expressible as a/b with ᾱ(b) ≠ 0 and a, b ∈ F[y_1, y_2, ..., y_n].
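A small illustration of a specialization (added here; the system is hypothetical): let $F = \mathbb{Q}(x)$, $B = F \cup \{y_1, y_2\}$, and consider $z_1 = x y_1$, $z_2 = x y_2$. The substitution

$$\alpha(y_1) = y_1, \qquad \alpha(y_2) = 1 + 2y_1$$

is a specialization of $y_2$, and $\alpha^*(z_2) = x + 2xy_1 = x + 2z_1$. Once $z_1$ has been computed, $\alpha^*(z_2)$ requires no further m/d steps; this trade of one input for one m/d step is exactly the mechanism used in the proof of the column-rank theorem that follows.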
Theorem 2.5. [35] Let z = Φy be a semilinear system and ρ_c(Φ) the column rank of Φ over G; then μ_B(Φy) ≥ ρ_c(Φ).

Proof. Let Z of Theorem 2.2 be as in Theorem 2.4. Assume initially that ρ_c(Φ) = 1, so that at least one column of Φ exists that is not in G^t. Let w_i denote the i-th column of Φ, and without loss of generality assume that this column is not in G^t. Since the entries of y are indeterminate elements of H and at least one entry in w_i is not in G, the system w_i y_i, which computes the contribution of the i-th input y_i to each of the entries in z, requires at least one m/d step. The assertion has now been proven for ρ_c(Φ) = 1.

Suppose that the assertion holds for ρ_c(Φ) = u, and let A be a minimal algorithm computing Φy, where Φ has u+1 columns that are G-linearly independent in W′. Let h(1) be the first m/d step of A. All steps preceding h(1) must be of the form h = Σ_{i=1}^{n} g_i y_i + f, g_i ∈ G, f ∈ F. Either h(1) = h·h′ or h(1) = h/h′ for two steps h and h′ in A preceding h(1). Let h = Σ_{i=1}^{n} g_i y_i + f and h′ = Σ_{i=1}^{n} g_i′ y_i + f′, where g_i, g_i′ ∈ G, i = 1, 2, ..., n, and f, f′ ∈ F. At least one g_i or g_i′ is not zero, since if they were all zero then h(1) ∈ F ⊆ L_G(B), contradicting the assumption that h(1) is an m/d step. Assume, with no loss of generality, that g_n ≠ 0.

Choose g ∈ G such that if g_n^{-1}(g − f − Σ_{i=1}^{n−1} g_i y_i) is substituted for y_n, then no steps in A will require division by zero. Such a g can always be selected since each step in A can be expressed as a ratio of polynomials in y_n, each denominator has a finite number of roots, and the number of steps in A is finite. Therefore only a finite number of possible substitutions for y_n will cause a denominator polynomial to be zero, and since G has an infinite number of elements, g can always be chosen such that all denominators are nonzero.

After making the above substitution, an algorithm A′ results that computes the system Φ′y′ + v, where

$$\Phi'_{ij} = \Phi_{ij} - g_n^{-1} g_j\, \Phi_{in} \quad (1 \le i \le t,\ 1 \le j \le n-1), \qquad y' = [y_1\ y_2\ \cdots\ y_{n-1}]^T,$$

and v = (g − f) g_n^{-1} [Φ_{1n} Φ_{2n} ... Φ_{tn}]^T. The m/d step h(1) in A has been converted to a non-m/d step in A′, and since the rest of the algorithm is the same, the system Φy must require one more m/d step than Φ′y′ + v. The number of linearly independent columns of Φ′ in W′ must be at least u; consequently, by the induction hypothesis, at least u m/d steps are necessary to compute Φ′y′ + v. Therefore at least u+1 m/d steps are necessary to compute Φy, and by induction at least ρ_c(Φ) m/d steps are necessary to compute Φy in general. ∎

Theorem 2.4 and Theorem 2.5 are extremely valuable in the study of systems such as the discrete Fourier transform (DFT) that are linear in a set of indeterminates, and in which the matrix multiplying this vector of indeterminates consists of elements of H that are known exactly for all applications of this system. This contrasts with the general case where the elements of the matrix are also indeterminates, such as general matrix-vector multiplication.

In the proof of Theorem 2.5, a substitution was performed that eliminated one m/d step from an algorithm. Some care had to be taken to ensure that this substitution did not cause a division by zero anywhere in the algorithm. The following theorem guarantees the existence of a specialization that is compatible with a given system and such that the new system requires fewer m/d steps than the original system.
Theorem 2.6. [40] Let G be an infinite field and {z_1, z_2, ..., z_t} a semilinear system such that μ_B(z_1, z_2, ..., z_t) ≥ 1. A compatible specialization α of y_j exists such that

$$\mu_B(\alpha^*(z_1), \alpha^*(z_2), \ldots, \alpha^*(z_t)) \le \mu_B(z_1, z_2, \ldots, z_t) - 1.$$

Proof. Use the substitution from Theorem 2.5, α(y_n) = g_n^{-1}(g − f − Σ_{i=1}^{n−1} g_i y_i). This substitution is compatible with the algorithm A since G is infinite and there are only a finite number of roots to each denominator polynomial in any step of A. The proof of Theorem 2.5 shows that this specialization also reduces the multiplicative complexity of the system by at least one m/d step. ∎

The specialization of Theorem 2.6 can be repeated for several of the y_i, yielding the following corollary.

Corollary 2.2. [40] For every algorithm A over B computing {z_1, z_2, ..., z_t} there exists an algorithm A′ over B′ = B ∪ {z_1, z_2, ..., z_l} such that μ(A′) ≤ μ(A) − d(l), where d(l) = dim L_G(r(z_1), r(z_2), ..., r(z_l)).

Theorem 2.7. [40] Let G be an infinite field, and {z_1, z_2, ..., z_t} a semilinear system with z_i = f_i y_j, f_i ∈ F, i = 1, 2, ..., t. There exists a compatible specialization α of y_j such that

$$\mu_B(\alpha^*(z_1), \alpha^*(z_2), \ldots, \alpha^*(z_t)) \le \mu_B(z_1, z_2, \ldots, z_t) - d(t),$$

where d(t) = dim L_G(r(z_1), r(z_2), ..., r(z_t)) = dim L_G(r(f_1), r(f_2), ..., r(f_t)).

Proof. Let A be a minimal algorithm computing {z_1, z_2, ..., z_t}. Assume without loss of generality that j = 1, and thus y_j = y_1. Every step of A can be expressed as h_i = a_i/b_i, where a_i and b_i are multivariate polynomials in y_1, y_2, ..., y_n. Since A has a finite number of steps and each denominator polynomial b_i has finitely many roots when viewed as a polynomial in y_1, there is only a finite set of elements of F[y_2, y_3, ..., y_n] (the ring of polynomials with coefficients in F and indeterminates y_2, ..., y_n) that are roots of some denominator polynomial in A. Since G has infinitely many elements, choose g that is not a root of any polynomial b_i(y_1) in A and let α be the specialization of y_1 defined by α(y_1) = g and α(y_i) = y_i for i ≠ 1. Since no denominators become zero, α is compatible with {z_1, z_2, ..., z_t}, and clearly α*(z_i) = g f_i ∈ L_G(B), i = 1, 2, ..., t. Corollary 2.2 states that for every algorithm A over B an algorithm A′ over B ∪ {z_1, z_2, ..., z_t} exists for which μ(A′) ≤ μ(A) − d(t). If A had been a minimal algorithm for computing {z_1, z_2, ..., z_t}, then Corollary 2.2 implies that μ_B(α*(z_1), α*(z_2), ..., α*(z_t)) ≤ μ_B(z_1, z_2, ..., z_t) − d(t). ∎

2.4. Quadratic and Bilinear Systems

Semilinear systems are actually a special type of a more general class of systems called bilinear systems, which are in turn a special type of quadratic system. Some general properties of bilinear and quadratic systems are presented in this section. The first important property is that a minimal algorithm using no division steps always exists for these systems. Commutative and noncommutative algorithms are then discussed. Lastly, direct products and direct sums are introduced as a means of creating large systems from smaller systems, or, more importantly, for decomposing large systems into smaller systems.

2.4.1. Properties of Quadratic Systems

A quadratic system is a system of the form

$$z_k = \sum_{i} \sum_{j} g_{ijk}\, s_i s_j, \qquad k = 1, 2, \ldots, t,$$

where the g_ijk's are elements of G and the s_i's are indeterminate elements of H. A bilinear system is a special case of a quadratic system in which the indeterminates have been partitioned into the two sets {x_i, i = 1, 2, ..., r} and {y_j, j = 1, 2, ..., s}, and each of the outputs z_k, k = 1, 2, ..., t, is a linear function of the elements of each of these two input sets. A general system of bilinear forms is then

$$z_k = \sum_{i=1}^{r} \sum_{j=1}^{s} g_{ijk}\, x_i y_j, \qquad k = 1, 2, \ldots, t,$$

where the g_ijk's are elements of G and the x_i's and y_j's are indeterminate elements of H.

The following theorem shows two important features of minimal algorithms for bilinear systems. The first is that for any bilinear system there exists a minimal algorithm in which no m/d step depends on a previous m/d step. The second feature is that there exists a minimal division-free algorithm for computing any bilinear system. This suggests that in the study of the multiplicative complexity of bilinear systems we need only consider division-free algorithms in which no m/d step depends on a previous m/d step. This does not, in general, exclude the possibility that minimal algorithms exist in which one or more of the m/d steps involves division ([11] contains a counterexample), or in which an m/d step depends on one or more previous m/d steps. A more general theorem about the conversion of essential division steps into multiplication steps, due to Strassen [32], states that a system of degree d requiring t m/d steps may be computed using at most t_0 = d(d−1)t/2 essential multiplication steps.
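As a small illustration of a bilinear system (an example added here, not drawn from the thesis), consider the product of two complex numbers (x_1 + ix_2)(y_1 + iy_2): a system of t = 2 bilinear forms in r = s = 2 indeterminates each, computable division-free with three multiplications rather than the obvious four.

```python
def complex_mult_3(x1, x2, y1, y2):
    """Compute (x1 + i*x2)*(y1 + i*y2) with 3 multiplications.

    m1, m2, m3 are the only m/d steps; all other operations are
    additions/subtractions, which are not counted.
    """
    m1 = y1 * (x1 + x2)
    m2 = x1 * (y2 - y1)
    m3 = x2 * (y1 + y2)
    real = m1 - m3          # x1*y1 - x2*y2
    imag = m1 + m2          # x1*y2 + x2*y1
    return real, imag

assert complex_mult_3(1, 2, 3, 4) == (1*3 - 2*4, 1*4 + 2*3)
```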
Theorem 2.8. [41] Let S be a system of quadratic forms. If G has infinitely many elements then a minimal algorithm A computing S exists such that each m/d step of A is of the form h(i) = M_i(x) N_i(x), where M_i(x) and N_i(x) are both linear in x.

Proof. Let S be the system z_k = Σ_i Σ_j g_ijk x_i x_j, k = 1, 2, ..., t. Let A′ = (h_1′, h_2′, ..., h_N′) be a minimal algorithm computing S. Each step of A′ is a rational multivariate polynomial with coefficients in G. The algorithm A′ will be modified to the algorithm A satisfying the conditions of the theorem by representing each step of A′ as a power series in the x_i's and then truncating the power series for each step such that the maximum degree is quadratic. If a step h_i′ has a denominator whose constant term is zero, and therefore cannot be expanded into a power series, then one or more of the x_i will be replaced by x_i′ − g_i, g_i ∈ G, until all denominators have nonzero constant terms. This substitution is always possible since G has an infinite number of elements. The system z_k = Σ_i Σ_j g_ijk (x_i′ − g_i)(x_j′ − g_j), k = 1, 2, ..., t, would then be computed, from which the original system S can be computed without additional m/d steps.

Let L_0, L_1, and L_2 be three linear operators over power series that extract the constant, linear, and quadratic components of the power series, and let L = L_0 + L_1 + L_2. Each step h_i′ of A′ will be replaced by a step h_i of A such that L(h_i′) = L(h_i) = h_i. Since the system S is quadratic, each of the outputs satisfies z_k = L(z_k), showing that the algorithm A will indeed compute the system S. The modifications will replace each m/d step with only one m/d step and non-m/d steps with non-m/d steps, hence μ_B(A;G) = μ_B(S;G). Each of the inputs obviously satisfies L(x_i′ − g_i) = x_i′ − g_i. We will assume that L(h_i′) = h_i (1 ≤ i ≤ j), and show how to construct the next element h_{j+1} of A from h_{j+1}′ in A′. If h_{j+1}′ = c_1 h_i′ + c_2 h_k′ for i, k ≤ j, [...]

CHAPTER 3
Convolution and Polynomial Multiplication

[...]

3.1. Aperiodic Convolution / Polynomial Multiplication

The aperiodic convolution of the sequences {x_0, x_1, ..., x_m} and {y_0, y_1, ..., y_n} is the sequence

$$z_k = \sum_{i} x_i\, y_{k-i}, \qquad k = 0, 1, \ldots, m+n, \qquad (3.1)$$

where x_i and y_j are taken to be zero outside the ranges 0 ≤ i ≤ m and 0 ≤ j ≤ n, in which case the summation over i in (3.1) could be from 0 to m. This system is equivalent to a product of polynomials, since the product of the polynomials x(u) = Σ_{i=0}^{m} x_i u^i and y(u) = Σ_{j=0}^{n} y_j u^j is equal to z(u) = Σ_{k=0}^{m+n} z_k u^k, and it will be denoted PM(m,n), that is, the polynomial multiplication of polynomials of degree m and n. The base set for this system is B = G ∪ {x_0, x_1, ..., x_m} ∪ {y_0, y_1, ..., y_n}. It can be proven that if G is a field containing at least m+n distinct elements, then μ_B(PM(m,n);G) = m+n+1. The results of Chapter 2 can be applied to this system to obtain a lower bound on the number of required m/d steps.
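For the smallest nontrivial case PM(1,1), the count m+n+1 = 3 is achieved by the familiar Karatsuba product of two linear polynomials. The sketch below is an added illustration using exactly three multiplications of indeterminates:

```python
def pm_1_1(x0, x1, y0, y1):
    """Multiply (x0 + x1*u)(y0 + y1*u) with 3 multiplications (Karatsuba).

    Returns the coefficients (z0, z1, z2) of z(u) = z0 + z1*u + z2*u^2.
    """
    m0 = x0 * y0
    m2 = x1 * y1
    m1 = (x0 + x1) * (y0 + y1)   # = x0*y0 + x0*y1 + x1*y0 + x1*y1
    return m0, m1 - m0 - m2, m2

assert pm_1_1(1, 2, 3, 4) == (3, 10, 8)   # (1+2u)(3+4u) = 3 + 10u + 8u^2
```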
Lemma 3.1. μ_B(PM(m,n);G) ≥ m+n+1.

Proof. The system PM(m,n) can be represented as the product Φy, where Φ is an (m+n+1)×(n+1) matrix with entries Φ_ij = x_{i−j} for 0 ≤ i−j ≤ m and zero otherwise, and y = [y_0 y_1 ... y_n]^T is a vector of n+1 indeterminates. Since the nonzero entries of Φ are indeterminate elements of H, no nontrivial linear combination (over G) of the m+n+1 distinct rows of Φ can yield a row whose entries are all in G. Therefore ρ_r(Φ) = m+n+1, and by Theorem 2.4 we obtain μ_B(PM(m,n);G) ≥ m+n+1. ∎

This lower bound is equal to μ_B(PM(m,n);G), since an algorithm can be demonstrated that computes PM(m,n) using m+n+1 m/d steps. This algorithm is sometimes called the Toom-Cook or Karatsuba-Toom algorithm and is based on the application of the Lagrange interpolation formula using m+n+1 distinct elements of G. The Toom-Cook algorithm can also be formulated as an application of the polynomial Chinese remainder theorem (CRT), in which multiplications are carried out by reducing each factor modulo m+n+1 distinct linear polynomials with coefficients in G, performing the resulting m+n+1 scalar multiplications of the residues, and then reconstructing the resulting product polynomial from the residues. This second approach is easily generalized to the case where the modulus polynomials are irreducible nonlinear polynomials, providing one method for deriving nonminimal algorithms that may have other desirable features.

The Chinese remainder theorem for polynomials states that a polynomial X(u) of degree n can be uniquely reconstructed from its residues modulo a set of relatively prime polynomials whose product is of degree n+1 or greater. We will assume that each of the modulus polynomials is monic; if not, we could divide through by the leading coefficient without affecting the statement of the theorem. Let the modulus polynomials be P_i(u), i = 1, 2, ..., s, where (P_i(u), P_j(u)) = 1 for i ≠ j, n_i = deg P_i(u), and Σ_{i=1}^{s} n_i > n. Let X_i(u) denote the residue X(u) (mod P_i(u)); then the polynomial X(u) may be reconstructed from its residues as

$$X(u) = \sum_{i=1}^{s} X_i(u)\, Q_i(u)\, \frac{P(u)}{P_i(u)} \pmod{P(u)}, \qquad (3.2)$$

where each Q_i(u) satisfies Q_i(u) P(u)/P_i(u) ≡ 1 (mod P_i(u)) and P(u) = Π_{i=1}^{s} P_i(u).

We will now apply the CRT to develop a minimal algorithm for multiplication of polynomials. Let P(u) = Π_{i=1}^{m+n+1} P_i(u), where P_i(u) = u − g_i, g_i ∈ G, and g_i ≠ g_j for i ≠ j. The polynomials to be multiplied are x(u) = Σ_{i=0}^{m} x_i u^i and y(u) = Σ_{j=0}^{n} y_j u^j. Let x_i(u) denote the residue x(u) (mod P_i(u)) and y_i(u) the residue y(u) (mod P_i(u)). In this context a residue reduction determines the polynomial of lowest degree congruent to the given polynomial modulo the modulus polynomial (this polynomial will always have degree less than the modulus polynomial). A residue reduction modulo a monic linear polynomial is simply the evaluation of the input polynomial at the root of the modulus polynomial; therefore we have

$$x_i(u) = x(g_i) = \sum_{j=0}^{m} x_j\, g_i^{\,j} \qquad \text{and} \qquad y_i(u) = y(g_i) = \sum_{j=0}^{n} y_j\, g_i^{\,j}, \qquad i = 1, 2, \ldots, m+n+1.$$

These computations require no m/d steps since all multiplications are by elements of G. This count accurately reflects that of an actual implementation when the g_i's are 0, 1, or −1. Now m+n+1 m/d steps must be executed to obtain the residues z_i(u) = x_i(u) y_i(u), i = 1, 2, ..., m+n+1. In the reconstruction, Q_i(u) = 1/Π_{j≠i}(g_i − g_j) is a constant and an element of G, and P(u)/P_i(u) is an (m+n)th-degree polynomial with coefficients in G. The coefficients of the z_i's in the reconstruction are therefore all elements of G; hence the reconstruction requires no m/d steps. The system PM(m,n) has been computed using m+n+1 m/d steps, which, together with Lemma 3.1, proves the following theorem.

Theorem 3.1. If G contains at least m+n distinct elements then μ_B(PM(m,n);G) = m+n+1.

The requirement on the number of distinct elements of G is one less than expected because a modulus polynomial could have been u = ∞, yielding the m/d step x_m y_n. The CRT reconstruction is then

$$z(u) = z'(u) + x_m y_n\, P(u), \qquad z'(u) = \sum_{i=1}^{m+n} z_i(u)\, Q_i(u)\, \frac{P(u)}{P_i(u)} \pmod{P(u)},$$

where the polynomial P(u) is composed of m+n distinct monic linear polynomial factors with coefficients in G and excludes the factor u = ∞. The reason for labeling this extra factor as u = ∞ becomes clear if the polynomial equation z(u) = x(u)y(u) is converted to the equivalent equation u^{m+n} z(1/u) = u^m x(1/u) · u^n y(1/u). The modulus polynomial u in the second representation is 1/u in the first, implying 1/u = 0 or u = ∞. The designation of this identity as use of the modulus u = ∞ is used by Winograd [41], and it is also referred to as interpolation at 1/u [22] for obvious reasons.
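The evaluation–interpolation structure of the Toom-Cook algorithm can be sketched directly. The code below is an added illustration, using exact rational arithmetic and the evaluation points g_i = 0, 1, ..., m+n rather than the more practical choices 0, ±1, and ∞ discussed above:

```python
from fractions import Fraction

def toom_cook_pm(x, y):
    """Multiply polynomials x(u), y(u) (coefficient lists, degrees m and n)
    with exactly m+n+1 general multiplications, via the CRT with linear
    moduli u - g_i at m+n+1 distinct rational points g_i.
    """
    m, n = len(x) - 1, len(y) - 1
    pts = [Fraction(g) for g in range(m + n + 1)]      # distinct g_i in G = Q
    xv = [sum(c * g**j for j, c in enumerate(x)) for g in pts]  # residues x(g_i)
    yv = [sum(c * g**j for j, c in enumerate(y)) for g in pts]  # residues y(g_i)
    zv = [a * b for a, b in zip(xv, yv)]               # the m+n+1 m/d steps
    # CRT (Lagrange) reconstruction: z(u) = sum_i zv[i] * prod_{j!=i}(u-g_j)/(g_i-g_j)
    z = [Fraction(0)] * (m + n + 1)
    for i, gi in enumerate(pts):
        num = [Fraction(1)]                            # coefficients of prod (u - g_j)
        den = Fraction(1)
        for j, gj in enumerate(pts):
            if j != i:
                num = [s - gj * t for s, t in zip([Fraction(0)] + num,
                                                  num + [Fraction(0)])]
                den *= gi - gj
        for k, c in enumerate(num):
            z[k] += zv[i] * c / den
    return z

assert toom_cook_pm([1, 2], [3, 4]) == [3, 10, 8]      # (1+2u)(3+4u)
```

Only the line forming zv performs general multiplications; the evaluations and the reconstruction involve only fixed rational constants, mirroring the m/d-step count in the proof above.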
3.2. Polynomial Multiplication Modulo an Irreducible Polynomial

The results of §3.1 can be extended to the case of polynomial multiplication modulo polynomials with coefficients in G. The first case to be considered is when the modulus polynomial is irreducible over the field G. Irreducible polynomials cannot be factored into polynomials of lesser degree over the specified field and are analogous to prime numbers in the ring of integers. We define the system

$$z(u) = x(u)\, y(u) \pmod{P(u)}, \qquad (3.3)$$

where P(u) = u^n + Σ_{i=0}^{n−1} g_i u^i is a monic irreducible polynomial over G of degree n, x(u) = Σ_{i=0}^{n−1} x_i u^i, y(u) = Σ_{i=0}^{n−1} y_i u^i, and z(u) = Σ_{i=0}^{n−1} z_i u^i, as the modular polynomial multiplication of x(u) and y(u), denoted by MPM(P), where the x_i's and y_i's are indeterminates.

In this definition we always assume that the polynomials to be multiplied are of degree one less than the modulus polynomial. If they were of larger degree then they could be reduced modulo P(u) with no m/d steps, resulting in an equivalent system. If they were of smaller degree then the resulting system could be analyzed in the same way to obtain a multiplicative complexity that could be less than that of the complete system. Clearly any system of the type (3.3), even those in which P(u) is not irreducible, satisfies μ_B(MPM(P);G) ≤ 2n−1, since the input polynomials could be multiplied using the Toom-Cook algorithm and then reduced modulo P(u), using only 2n−1 m/d steps. We will now prove that when P(u) is an irreducible polynomial or a power of an irreducible polynomial (over G), exactly 2n−1 m/d steps are necessary to compute MPM(P). The original proof is due to Winograd [38].

Theorem 3.2. Let Q(u) be a monic irreducible polynomial over the field G and let n be the degree of P(u) = Q^k(u). If G contains at least 2n−2 distinct elements then μ_B(MPM(P);G) = μ̄_B(MPM(P);G) = 2n−1.

Proof. Let

$$C_P = \begin{bmatrix} 0 & 0 & \cdots & 0 & -g_0 \\ 1 & 0 & \cdots & 0 & -g_1 \\ 0 & 1 & \cdots & 0 & -g_2 \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & -g_{n-1} \end{bmatrix}$$

be the (right) companion matrix of P, and let V′ = {v ∈ G^n | ∃ q ≠ 0, deg q ... }. [...]
At Teast m multiplications are necessary to compute the bilinear form BWA(x)y since the j“* column of Ax; is equal to C4.x; and any nontrivial linear combination of the columns of BIYA() is komt Lu Tachs i iz0 nt Again, as in Theorem 3.2, this combination can be zero only if ¥; 5 04,C; + 0 for k, which is impossible waiess o;=0, Wij, since y,€Vj for k. Therefore BWA(x) has m linearly independent columns and by ‘Theorem 2.5 the computation of BWA(x)y requires at least n m/d steps. Since B is nonzero only in k-s+1 positions, that many multiplications are contributed by the identity matrix partition of (|U’], and the remaining -n entries in Bl/|U'] are nonzero thus B{J|U"lm can be computed using only k-s+1++-n multiplications. Therefore k-stltr-n2n and converting to an inequality for + we obtain 12 2n-k-L+s. Since an algorithm has been developed that uses 2n-k multiplica- tions to compute A(x)y and s21, we obtain ¢=U,(MPM(P);G)= Fig(MPM(P);G) = 2n-k. is the evaluation of ‘A common computation to which Theorem 3.3 app! cyclic (or circular) convolutions. The cyclic convolution of two sequences is defined by seyN-1, GA) where each of the indices is reduced to the principal residue modulo N. The cyclic convolution of (3.4) is equivalent to the polynomial multiplication 49 2(u) = x(w)y(u) (mod w—-1) Na wa Na where x(u) = Sx! yu) = Syl, and 20a) = Fay! 0 io ‘Appendix A shows that the polynomial «’-1 factors over Q into irreducible cyclotomic polynomials. This factorization is wh = TCA, ain where C,(u) is the d** cyclotomic polynomial. The number of irreducible factors of uN-1 over Q is therefore equal to the number of positive divisors of N, denoted by (N). Corollary 3.1. jig(MPM(u-1);Q) = 2N-r(N).. Proof. Theorem 3.3 states that j1p(MPM(P);Q) = 2deg Pk, where k is the number of distinct irreducible factors of P over G, given that G has enough elements. The field G=@Q has an infinite number of elements, deg u-1 =N, and the number of distinct irreducible factors of «V1 over Q is t(NV), thus hg(MPM@N—1);0) =2N-2(N). 3.4. Products of a Fixed Polynomial with Several Polynomials ‘Sometimes one polynomial needs to be multiplied by several other distinct poly- nomials, The discrete Fourier transform of a sequence whose length is a power of a prime number is equivalent to a direct sum of systems of polynomial multiplication modulo irreducible polynomials, which may be subdivided into groups of systems 50 that are identical except for the vector of indeterminates formed from the sequence samples. The principal result of this section is that if each identical system has dis- tinct indeterminates, independent (over G) of the other identical systems, then a lower bound can be established for the direct sum of these systems that is equal to the sum of the lower bounds of each of the components. When the lower bound is realiz~ able, then the multiplicative complexity can be assigned a definite value, These sys- tems are analyzed as semilinear systems, providing a specialization of the results of the previous several sections. Equivalent Systems and Row Reduced Forms It is first necessary to have a good definition for the equivalence of two systems. ‘Two systems are considered equivalent if an algorithm for computing one system can be used to compute the other system and vice versa, such that no additional m/d steps are required to convert either system to the other. Definition 3.1. 
[4] Let B =FUQpJq0---1Yg} U L2p2q) «+ Eqh The wo systems M(foy and N(f)z are equivalent if wo mappings, 0:9 Yoy +++ +Ym} 2 Lg(B) and Bi{2p,2.+-52q}—Lg(B) exist such that Le(r(M( flay) =Lg(rN(fa) and Lg AB@) =LeM(fly)- Frequently the row rank, p,(M(f)), is less than the actual number of rows of M(f). It will be convenient for such a system to be reduced to a form in which all rows are G-linearly independent, since the original outputs could all be computed from such a form. A system for which all rows are G-linearly independent is said to be in row reduced form. 51 3.4.2. Reduction and Inflation Mappings tis now necessary to extend some of the concepts presented in §3.2 to develop analysis techniques for direct sums of systems of polynomial multiplication modulo identical polynomials, and additionally for direct sums of such systems in which two cor more of the polynomial multiplicands are identical, This analysis will begin with an investigation into the algebraic structure of such systems. Let G be the ground field and u be an indeterminate for all fields that occur. ‘The ring Glu] contains all polynomials with coefficients in G. Let K be the quotient ring K = GluJ(P(w)) where P(u) isan irreducible polynomial over G of degree n and (P(u)) is the ideal generated by P(u). Elements of K are polynomials in Glu}, and each element of K is equivalent to a polynomial in G[u] of degree less than n. Multi- plication in K is the normal convolution product used for polynomial multiplication, except that it is common to reduce elements of K to polynomials of degree less than ‘n through reduction modulo P(u). Each nonzero element of K has a multiplicative inverse, thus Kis a field and the dimension of K over G is [K:G]=n. The usual mt basis chosen for K is 1,1,u2,... 51! Let F > G bea field and let 5 be the ring N= Flu}(P(w)). K will generally be =n, and a field unless P(u) factors over F. The dimension of over F is [%:F] 1,u,u2,...,u%1 is a basis of R. K will sometimes be denoted by R=FOK to show that it ean be constructed as a direct product of algebras. In more generality K>G could be any finite extension field of G such that [K:G] =n. Let ugyuy,.-- sty bea basis of K over G, then 52 a bayy= Sgiptp 1=0,1,.--0-1, io where g;;€ G, for i,j=0,1,...,n-1. Therefore each ke K can be mapped into an nxn G-matrix. This mapping, called the regular representation of K over G rela- tive to the basis yy... ,tq_ty Will be denoted by p(A) = (g,;) where (g,,) is a nota- tion for an nxn G-matrix with entries gi, When K = G[u)/(P(w)), and the basis "lis used, then p(u) = Cp, where Cp is the companion matrix of P(u), Lue as presented in the proof of Theorem 3.2. The representation p can be extended into a representation pp of elements of R as nxn F -matrices. Let Vx be a K-vector space with dimyVy=1. The mapping p allows vectors in Vx to be represented as vectors in a G-vector space that will be denoted by Vg. If Vo Vpr ++ «1 ¥iey 18 a basis of Vy and tp, iy, «5 tty-1 18. basis of K over G, then vi, i=0,1,...,1,7=0,1,...,n isa basis of Vg and dimVg = In. Let Vy and Wy be vector spaces over K with bases vqy¥y,-+-5¥,.1 and Woy Wjr- ++ +Wy_y Fespectively. The space Hom(Vx, Wx) contains all linear transfor- mations from Vx to Wy relative to the given bases and is the space of all sxr matrices over K. Every K-linear transformation 1: Vq— Wg is also a G-linear transformation Vg -> Wg. 
This mapping of an sxr matrix in K to an smxrn matrix in G is denoted by R:Hom(Vg, Wx) > Hom(Vg, Wq) and is called the reduction mapping since it reduces the field of definition. If p is the 33 regular representation of K relative to the chosen basis of K over G, then the reduc- tion mapping R replaces each entry k,; of an sxr K-matrix A = (k;) with the nx G -matrix p(k,,) in forming R(A). ‘An inverse mapping can be defined to inflate the field of definition, but is only applicable when each nxn submatrix of the saxrn G-matrix is equal to p(k) for some kj €K. The mapping 3: (G3) > i) is called the inflation mapping. Clearly each element of K‘ can be viewed as a vector in G™, yielding the iso- ‘morphism S336" 4K". Let Bg € RHom(Vg,W)), and let ae G™, then 38,8 gq) = SBq)5,(@). S, is called the pseudo-inflation mapping. This mapping can be extended to a map- ping from mx! G-matrices to x1 K-matrices by applying the mapping to each of the columns of the G-matrix. A pseudo-reduction mapping R,: Vx —> Vg can be defined that is the inverse of 3, and can be similarly extended to a mapping from 1x! K-matrices to nx! G- matrices. 54 ‘The concepts of reduction and inflation mappings can be applied to the ring =F ®K. The reduction mapping replaces each entry rj; of an sxr St-matrix with p(ryp. yielding an snxrn F-matrix. The inflation mapping from snxrn F -matrices to.sxr K-matrices exists under conditions similar to those specified for the inflation mapping from G-matrices to K-matrices. 3.4.3. Equivalence of Products with a Fixed Polynomial The previous two sections have provided an algebraic framework and valuable tools for the analysis of systems consisting of a direct sum of products of a single polynomial by several polynomials. Let FG be a field and let H=FOppy20 +++ sYqhs WHEE YY ++ pg BE indeterminates. This system description is identical to those proposed in Chapter 2, where F is typically F = G(x)... .x,) for some indeterminates 1,2)... %p AS in the previous section, let K>G be a finite extension of G with up, y, = Mpa @ basis of K over G. Let R=F@K and Ny) =H @K. Each re K can be expressed nm as Sf where fre F, i=0,1,...,n-1. Similarly, each element of K(y) can be ml expressed as 5 hay = The system under consideration is eG Sah mel ned CE fan % fats where fe F, ly €Lg0) CHG). Auslander and Winograd [4] have chosen to 55 met fue K denote the system {EqqEats++-»Eacn-1y} bY CORifitys {u}), where f= ¥ fi 0 rot and [y= ¥ lait € Lg) @K SRY). When K=GluJ(PW)), where Plu) is an = irreducible polynomial of degree n, then C(K;f,1,;{u}) may be denoted by CUP ifslq) where the basis u; =u! is assumed. 1 ‘The system under consideration is © C(Kifilgs {u)). Sometimes the direct sum symbol (@) is replaced by a union symbol (U) to show that the union of the set of outputs is being computed. Let p be the regular representation of K over G relative to the basis = golly, ++ sty y- Let A(f) be the nxn F-matix ACf)=pp(f)= D/p(u). Let T= Ula far ** lagna)” and Eq= [Ego Sar *** Sogael”> Then By=APY, and the matrix representation of the system under consideration is El) fay 0 ml [0 ayn r= [|= zg) [o 0 = ally When K = G[u)(P(u)) and the basis 1,u, 1 companion matrix of P(u), and A(f) = SACP. Since fy; € Lg(), then fay 56 then Av) t The system 1= © C(Kifilaj {u}) will be denoted by I= (CACP))My to emphasize that each of the A(f) blocks are identical. If another basis vp¥j.--+¥,-1 for K over G were selected, then c @ GS Ly tv)= CAM’. 
Let kK K be the G-linear mapping that changes basis from v; to u,, defined by Iv) = seynel. IEL is the matrix representing this mapping, then A’(f) =L“A(A)L and Mi, =L~'M,. Consequently 37 af) 0 + 0 ref 2 Do 8 May 0 0 ay G3) fx? 0 -- O]fagy o = 0 : My Ont. oj] 0 apy: 0 0 oe Lo 0° ag) and therefore J and /’ are equivalent systems. Therefore the basis of K over G is irelevant in evaluating the multiplicative complexity of these systems, and CK flaj {u,}) will now be denoted by C(K:f,1,) to reflect this equivalence of ident- jal systems with different bases. Let K’ > K be a finite extension of K with vo, ¥, -.-5¥p.1 a basis of K” over K and assume vo . The injection mapping i:K —> K” allows elements of K to be represented in K’. This mapping can be extended to represent elements of Ry) as nt elements of R’(y)=H@K’ where each element 5 hu; of Sy) is mapped into the = element Fh; (vqu,) of RQ). = Define the system J” by I= BCR Gla) where f and I, are now the images of the original fand I, in (9). Let p be the regu- lar representation of K over G relative to the basis up, 14},. 4, and p” be the reg- ular representation of K’ over X relative to the basis vq.¥),- Ypay- Let p” be the 58 regular representation of K” over G relative to the lexicographically ordered set met {yu}. If fe RQ) is * fou) =(Efiudvye then since vp=1, p= met (Sfp where 1, is the pxp identity matrix. p"(f) is obtained by reducing the 0 field of definition, therefore ° en + AY), and C(K'sfily) is lag 0 o | fa) fr 9 AD 8110 = Places. 66) 0 o anllo} fo Therefore C(K’sf,l,) is equivalent to C(K;filg) and also the systems J and J” are equivalent. ‘The multiplicative complexity of C(K:f,,) does not depend on whether f or ly are viewed as elements of %(y) or 5). Since the system J computes the coordi- nates of fl, then the system C(K;f,1,) will be referred to.as C(fl,) unless the field K is to be emphasized. The following example will clarify some of the concepts presented in the past few sections. 39 Example 3.1. Let G = Q, the field of rational numbers, K = G[u)(u2+1) with natural basis 1,u, and F=G(V2). Consider the system C(f1), where f= fk, f=\2eF, x k=1-ueK. Thus f= he is a representation of fas a vector in F?. Let y have dimension 1 and let the single entry of y be y= "V3. Let the 2x1 G-matrix M be 1 u- fh Elements of the field K can be represented as an ordered pair of elements of G, where multiplication of two pairs follows the rules of complex multiplication. The ‘oni fo -1 regular representation is p(u)=Cp= |, 4 |,and thus Af) =[f Cpfl. Clearly += 1, and the complete system is AUMy vz 2] 1 [3v2| Seabee This system is not row reduced since the two rows of ACf)M are scalar multi- ples of each other in G. For this system p,(A(()M)=p,(A(/)M) =1, thus M(A(f)My) 2 1 and since one multiplication, ¥2-V3, suffices to compute the system: then WAC/My) = 1. ‘The ability to reduce the system C(K’:f,1,) to an equivalent system over K is one example of reducing a system to a minimal equivalent system, in the sense that ¢ is minimal, Given two systems, © CUflg) and © CLT, itcan be shown that they 60 are equivalent if Lo(M,) = Lg(Mp). Thus a system eel) may be reduced to the equivalent problem with the minimum value of r, that is, r= dimL.g(M,). From the previous example we see that a system may satisfy this condition yet not be row reduced. ‘A more general result will now be proven regarding the equivalence of systems of the form under consideration, Theorem 3.4, Let (‘A(f))My and (sA(f))M'y be two systems. 
The ability to reduce the system $C(K'; f, l_a)$ to an equivalent system over $K$ is one example of reducing a system to a minimal equivalent system, in the sense that $t$ is minimal. Given two systems, $\bigoplus_{a=1}^{t} C(fl_a)$ and $\bigoplus_{b=1}^{t'} C(fl'_b)$, it can be shown that they are equivalent if $L_G(M_a) = L_G(M'_b)$. Thus a system $\bigoplus_{a=1}^{t} C(fl_a)$ may be reduced to the equivalent problem with the minimum value of $t$, that is, $t = \dim L_G(M_a)$. From the previous example we see that a system may satisfy this condition yet not be row reduced.

A more general result will now be proven regarding the equivalence of systems of the form under consideration.

Theorem 3.4. Let $(tA(f))My$ and $(sA(f))M'y$ be two systems. Let $L_K^v(M)$ be the $K$-linear span of the rows of $S_v(M)$ and $L_K^v(M')$ be the $K$-linear span of the rows of $S_v(M')$. If $L_K^v(M) = L_K^v(M')$, then $(tA(f))My$ and $(sA(f))M'y$ are equivalent.

Proof. Let $B$ be the $s \times t$ $K$-matrix satisfying $S_v(M') = BS_v(M)$ and let $B_G = R(B)$ be the $sn \times tn$ $G$-matrix obtained by reducing the field of definition. Clearly $B_G M = M'$, and therefore $B_G(tA(f))My = (sA(f))B_G My = (sA(f))M'y$; and if $S_v(M) = B'S_v(M')$, then $B'_G(sA(f))M'y = (tA(f))My$.

In the proof of this theorem the assumption was made that $F$ is a commutative field and thus that $\mathcal{R}$ is a commutative ring. If not, then the identity $B_G(tA(f)) = (sA(f))B_G$ would not necessarily be true. The identity can be easily verified in the ring $\mathcal{R}$, since $S((tA(f))) = fI_t$ and $S((sA(f))) = fI_s$, where $I_t$ is the $t \times t$ identity matrix and $I_s$ is the $s \times s$ identity matrix over $\mathcal{R}$, and $f$ is the element of $\mathcal{R}$ inflated from the $A(f)$ blocks. Therefore $BfI_t = fB$, or $Bf = fB$, which holds by the commutativity of multiplication in $F$ and $\mathcal{R}$. Application of the inflation mapping $S$ gives the desired identity. ∎

The following definition formalizes this concept of row reduction over $K$.

Definition 3.2. [4] A system $\bigoplus_{a=1}^{t} C(fl_a)$ is said to be quasi-row reduced if the $t$ rows of $S_v(M)$ are linearly independent over $K$.

The concept of column reduction can also be applied here. The system $(tA(f))My$ is column reduced if all the columns of $M$ are $G$-linearly independent. Since $S_v(MC) = S_v(M)C$ for any $m \times l$ $G$-matrix $C$, then $(tA(f))My$ is column reduced if all the columns of $S_v(M)$ are $G$-linearly independent.

Definition 3.3. [4] A system $\bigoplus_{a=1}^{t} C(fl_a)$ is said to be quasi-row-column reduced (qrc reduced) if the $t$ rows of $S_v(M)$ are linearly independent over $K$ and the $m$ columns of $S_v(M)$ are linearly independent over $G$.

A system $(tA(f))My$ can always be converted to an equivalent system $(t'A(f))M'y'$ that is qrc reduced. The system $(t'A(f))M'y'$ can be obtained by selecting a $t' \times m'$ submatrix of $S_v(M)$ whose rows are $K$-linearly independent and whose columns are $G$-linearly independent, where $t' = \dim L_K^v(M)$ and $m'$ is the dimension of the $G$-linear span of the columns of the quasi-row reduced $S_v(M)$. The resulting $t' \times m'$ matrix is $S_v(M')$, and thus $M' = R_v(S_v(M'))$.

In this section it has been shown that a semilinear system of the form $(tA(f))My$ is equivalent to the qrc reduced system $(sA(f))M'y$, where $M' = B_G MC$, $B_G = R(B)$ is an $sn \times tn$ $G$-matrix, $B$ is an $s \times t$ $K$-matrix, and $C$ is an $m \times m$ $G$-matrix. A simple example will show this concept.

Example 3.2. Let $G = \mathbb{Q}$, $K = G[u]/(u^2+1)$, and $F = G(x_1, x_2)$, where $x_1$ and $x_2$ are indeterminates. Let $y_1, y_2, y_3$ be three indeterminates. Define the system $S$ as

$$S = \begin{bmatrix} \varphi_1 \\ \varphi_2 \end{bmatrix} = A(f)My = \begin{bmatrix} x_1 & -x_2 \\ x_2 & x_1 \end{bmatrix}\begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}.$$

For this example $t = 1$ and $M$ is obviously not qrc reduced. $M$ is quasi-row reduced, since $\dim L_K^v(M) = t = 1$, but it is not quasi-column reduced since there are more columns than rows. The column rank of $M$ is 2, thus $S$ is equivalent to a system $A(f)M'y'$, where $M'$ is $2 \times 2$ and $y'$ has only two distinct indeterminates. It is instructive to construct the equivalent system by finding an invertible $G$-matrix $C$ such that $MC$ has a zero vector as a column. Suppose that it is convenient to have $M' = I_2$, the $2 \times 2$ identity matrix; then

$$C = \begin{bmatrix} 1 & -2 & 1 \\ 0 & 1 & -2 \\ 0 & 0 & 1 \end{bmatrix}$$

yields $MC = [M' \mid 0]$. This system will compute $S$ by simply redefining the indeterminates in $y$ as $y' = NC^{-1}y$, where $N = [I_2 \mid 0]$. Inverting $C$ yields

$$C^{-1} = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}, \qquad y' = \begin{bmatrix} y_1 + 2y_2 + 3y_3 \\ y_2 + 2y_3 \end{bmatrix}.$$

Thus

$$S = \begin{bmatrix} \varphi_1 \\ \varphi_2 \end{bmatrix} = A(f)M'y' = \begin{bmatrix} x_1 & -x_2 \\ x_2 & x_1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} y_1 + 2y_2 + 3y_3 \\ y_2 + 2y_3 \end{bmatrix}$$

is a representation of the original system with one less indeterminate, and it has the same multiplicative complexity as the original system.
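Example 3.2 can be checked symbolically; a sketch assuming the third-party sympy library (not used in the thesis):

```python
import sympy as sp

x1, x2, y1, y2, y3 = sp.symbols('x1 x2 y1 y2 y3')
A = sp.Matrix([[x1, -x2], [x2, x1]])            # A(f) for f = x1 + x2 u
M = sp.Matrix([[1, 2, 3], [0, 1, 2]])
y = sp.Matrix([y1, y2, y3])
C = sp.Matrix([[1, -2, 1], [0, 1, -2], [0, 0, 1]])

assert M * C == sp.Matrix([[1, 0, 0], [0, 1, 0]])   # MC = [I_2 | 0]

N = sp.Matrix([[1, 0, 0], [0, 1, 0]])               # N = [I_2 | 0]
yprime = N * C.inv() * y                            # [y1 + 2 y2 + 3 y3, y2 + 2 y3]

# the reduced system A(f) I_2 y' computes the same outputs as A(f) M y
assert sp.simplify(A * M * y - A * sp.eye(2) * yprime) == sp.zeros(2, 1)
```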
3.4.4. Multiplicative Complexity Results

Based on the previous several sections, any system of the form $(tA(f))My$ can be reduced to an equivalent system with minimal row rank over $K$ and minimal column rank over $G$. In the following analysis it will be assumed that this reduction has been performed and that any system under consideration is thus qrc reduced.

The number of m/d steps necessary to compute $(tA(f))My$ can be determined by induction on $t$ and $m$, where $m$ is the number of distinct indeterminates in $y$. It can be shown that for $m > 1$, the multiplicative complexity of the system $(tA(f))My$ is related to the multiplicative complexity of systems of the form $((t-1)A(f))My$ or $(tA(f))My'$, where $y' = [y_2\ y_3\ \cdots\ y_m]^T$. A system $(tA(f))My$ can be computed from one of the two systems mentioned by adding a known number of additional m/d steps. The following lemma provides a means of identifying which of the two types of systems is obtained in the reduction of a given $(tA(f))My$.

Lemma 3.2. [4] Let $A$ be a $t \times s$ $K$-matrix of rank $t$ and let $B$ be an $s \times (s-1)$ $G$-matrix of rank $s-1$. If $v \in G^s$ satisfies $vB = 0$, then $\mathrm{rank}\,AB = t-1$ if and only if $K \otimes v$ is in the $K$-linear span of the rows of $A$; otherwise $\mathrm{rank}\,AB = t$.

Proof. Since $\mathrm{rank}\,B = s-1$, then either $\mathrm{rank}\,AB = t$ or $\mathrm{rank}\,AB = t-1$. Assume $\mathrm{rank}\,AB = t-1$; then a nonzero row vector $v_K \in K^t$ exists such that $v_K AB = 0$. Let $w_K = v_K A$, which is nonzero since $\mathrm{rank}\,A = t$. As a $K$-matrix the rank of $B$ is also $s-1$, therefore $\{u_K \in K^s \mid u_K B = 0\}$ is a one-dimensional subspace of $K^s$ that must be $K \otimes v$, since $vB = 0$. Since $w_K B = 0$, then $w_K \in K \otimes v$; but $w_K = v_K A$, so $w_K$ is in the $K$-linear span of the rows of $A$, and therefore $K \otimes v$ is in the $K$-linear span of the rows of $A$.

To prove the other half of the relation, assume $K \otimes v$ is in the $K$-linear span of the rows of $A$. Let $C$ be a $t \times t$ invertible $K$-matrix such that the first row of $CA$ is $v$. The first row of $CAB$ will be 0, and therefore the rank of $CAB$ is less than $t$. Since $C$ is nonsingular, the rank of $CAB$ is equal to the rank of $C^{-1}CAB = AB$ and must be $t-1$. ∎
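A small instance of Lemma 3.2 with $G = \mathbb{Q}$ and $K = \mathbb{Q}(i)$, checked with the third-party sympy library (an illustration, not thesis code):

```python
import sympy as sp

# B: 3 x 2 rational matrix of rank 2; v spans the left null space of B over G
B = sp.Matrix([[1, 0], [0, 1], [-1, -1]])
v = sp.Matrix([[1, 1, 1]])
assert v * B == sp.zeros(1, 2) and B.rank() == 2

# Case 1: K (x) v lies in the K-linear row span of A (its first row is v),
# so the rank of AB drops to t - 1 = 1
A1 = sp.Matrix([[1, 1, 1], [sp.I, 0, 0]])
assert A1.rank() == 2 and (A1 * B).rank() == 1

# Case 2: K (x) v is not in the row span of A, so the rank stays at t = 2
A2 = sp.Matrix([[1, 0, 0], [0, 1, 0]])
assert A2.rank() == 2 and (A2 * B).rank() == 2
```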
Let $G$ be an infinite field, $F \supset G$ an extension field of $G$, and $K \supset G$ a finite extension of $G$ unrelated to $F$. Let $u_0, u_1, \ldots, u_{n-1}$ be a basis of $K$ over $G$ and let $f = \sum_{i=0}^{n-1} f_i u_i$. Let $s = \dim L_G(r(f_0), r(f_1), \ldots, r(f_{n-1}))$, where $r: F \to F/G$ is the natural homomorphism; $s$ will be referred to as the row rank of $f$. Let $y_1, y_2, \ldots, y_m$ be a set of distinct indeterminates comprising the entries of $y$. The cardinality of $y$ is said to be $m$, indicating that $y$ consists of $m$ distinct indeterminates.

Theorem 3.5. [4] If $(tA(f))My$ is a qrc reduced system with the cardinality of $y$ equal to $m$ and the row rank of $f$ equal to $s$, $s \ge 1$, then $\mu_B((tA(f))My; G) \ge t(s-1)+m$, where $B = F \cup \{y_1, y_2, \ldots, y_m\}$.

Proof. Consider first the case $t = 1$, $m = 1$, that is, the system $A(f)My$. For this system $A(f) = \sum_{i=0}^{n-1} f_i C_i$, where $C_i = \rho(u_i)$ is the regular representation of $u_i$, $M$ is an $n \times 1$ $G$-matrix, and $y$ is a single indeterminate. Let $k = S_v(M)$ and let $B = \rho(k^{-1})$; $A(f)My$ is equivalent to $BA(f)My = A(f)BMy$. We can assume without loss of generality that $u_0 = 1$, in which case $BM = [1\ 0\ \cdots\ 0]^T$ and $A(f)BM = [f_0\ f_1\ \cdots\ f_{n-1}]^T$. Thus $\rho_r(A(f)BM) = s$, and Theorem 2.4 yields $\mu_B(A(f)My) \ge s = 1\cdot(s-1)+1$.

The theorem has now been proven for $(t,m) = (1,1)$, and we will assume that it is true for all $(t',m') < (t,m)$, where the set of ordered pairs $(t,m)$ is ordered lexicographically. We will show that for any $(t,m)$ the system $(tA(f))My$ can be modified to yield a smaller system of the same form requiring fewer m/d steps than $(tA(f))My$. The difference between the number of m/d steps required for this smaller system and the original system can be bounded, and the lower bound is always greater than zero; hence this type of reduction of a system can be recursively applied to obtain a lower bound on the number of m/d steps required to compute $(tA(f))My$.

Two types of reduction will be performed on $(tA(f))My$. The first reduction is used when a nonsingular $K$-matrix $T$ exists such that $TS_v(M)$ has a row of the form $k[g_1\ g_2\ \cdots\ g_m]$, $g_i \in G$, $k \in K$. The system $(tA(f))My$ may be replaced by the equivalent system $(tA(f))M'y$, where $S_v(M') = TS_v(M)$. Assume, without loss of generality, that the first row of $S_v(M')$ is $k[g_1\ g_2\ \cdots\ g_m] = k \otimes v$, $k \ne 0$, $v \in G^m$. Let $D = \mathrm{diag}(k^{-1}, 1, 1, \ldots, 1)$ be a $t \times t$ $K$-matrix, and let $E$ be a nonsingular $m \times m$ $G$-matrix satisfying $vE = [1\ 0\ \cdots\ 0]$. If $M'' = R_v(S_v(M''))$, where $S_v(M'') = DS_v(M')E$, then the systems $(tA(f))M''y$ and $(tA(f))My$ are equivalent. Since the first row of $S_v(M'')$ is $[1\ 0\ \cdots\ 0]$, the first $n$ outputs of $(tA(f))M''y$ are $f_i y_1$, $i = 0, 1, \ldots, n-1$. This is exactly the structure of the system analyzed in Theorem 2.6, from which we conclude that

$$\mu_B((tA(f))My) = \mu_B((tA(f))M''y) \ge \mu_B(((t-1)A(f))\tilde{M}y') + s,$$

where $S_v(\tilde{M})$ is the submatrix of $S_v(M'')$ resulting from the omission of the first row and first column, and $y' = [y_2\ y_3\ \cdots\ y_m]^T$. By assumption, $((t-1)A(f))\tilde{M}y'$ is qrc reduced and thus must satisfy the induction hypothesis; hence

$$\mu_B((tA(f))My) \ge (t-1)(s-1)+m-1+s = t(s-1)+m,$$

which agrees with the hypothesis.

The only remaining possibility is that no nonsingular $K$-matrix $T$ exists such that $TS_v(M)$ has a row of the form $k[g_1\ g_2\ \cdots\ g_m]$, $g_i \in G$, $k \in K$. For this case Theorem 2.5 guarantees the existence of the specialization

$$\sigma(y_1) = \sum_{i=2}^{m} g_i y_i, \qquad \sigma(y_i) = y_i,\ i = 2, 3, \ldots, m.$$

Let $E$ be the $m \times (m-1)$ $G$-matrix whose first row is $[g_2\ g_3\ \cdots\ g_m]$ and whose last $m-1$ rows are the $(m-1) \times (m-1)$ identity matrix. Clearly $\sigma((tA(f))My) = (tA(f))MEy'$, where $y' = [y_2\ y_3\ \cdots\ y_m]^T$, and therefore by Theorem 2.5, $\mu_B((tA(f))My) \ge \mu_B((tA(f))MEy')+1$. Since for this case no matrix $T$ exists satisfying the assumptions, there can be no vector $v \in G^m$ such that $K \otimes v$ is in the $K$-linear span of the rows of $S_v(M)$; by Lemma 3.2 the rank of $S_v(ME)$ is $t$. The columns of $S_v(ME)$ are $G$-linearly independent, and thus the system $(tA(f))MEy'$ is qrc reduced. Therefore, since $(tA(f))MEy'$ must satisfy the induction hypothesis for $(t, m-1)$,

$$\mu_B((tA(f))My) \ge t(s-1)+m-1+1 = t(s-1)+m. \qquad ∎$$

This theorem is powerful and can be used to provide many of the lower bounds derived earlier in this chapter. Before demonstrating these applications of the theorem, it will be useful to generalize first to the case where more than one element $f \in \mathcal{R}$ is included in the direct sum, and then to the case where more than one modulus polynomial is used.
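The classic three-multiplication algorithm for complex multiplication illustrates the bound of Theorem 3.5: take $K = \mathbb{Q}(i)$, $t = 1$, $f = a+bi$ with $a, b$ indeterminate (so $s = 2$), and $y = [y_1\ y_2]^T$ (so $m = 2$); the theorem gives $\mu \ge 1\cdot(2-1)+2 = 3$. A sketch (not from the thesis) verifying the standard algorithm that attains this bound:

```python
from fractions import Fraction

def complex_mult_3(a, b, y1, y2):
    # (a + b i)(y1 + y2 i) with three general multiplications;
    # the additions and subtractions are free in the m/d count
    m1 = a * (y1 + y2)
    m2 = y2 * (a + b)
    m3 = y1 * (b - a)
    return m1 - m2, m1 + m3          # real part, imaginary part

a, b, y1, y2 = map(Fraction, (3, 5, 7, 11))
re, im = complex_mult_3(a, b, y1, y2)
assert re == a * y1 - b * y2 and im == a * y2 + b * y1
```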
3.5. Products with Several Fixed Polynomials in the Same Ring

One question that has not been addressed in the previous sections of this chapter is the multiplicative complexity of several unrelated polynomial products modulo a single irreducible polynomial. The difference between this type of system and that discussed in the previous section is that there are now $f_1, f_2, \ldots, f_J \in \mathcal{R}$ rather than a single $f \in \mathcal{R}$. A system of this type will be denoted by $\bigoplus_{j=1}^{J}(t_jA(f_j))M^{(j)}y$.

Just as for $(tA(f))My$, the notion of a qrc reduced system will be useful. The analogue of Theorem 3.4 can be proven for this type of system, resulting in the following theorem.

Theorem 3.6. The two systems $\bigoplus_{j=1}^{J}(t_jA(f_j))M^{(j)}y$ and $\bigoplus_{j=1}^{J}(s_jA(f_j))N^{(j)}y$ are equivalent if and only if the $K$-linear spans of the rows of $S_v(M^{(j)})$ and of $S_v(N^{(j)})$ are identical for $j = 1, 2, \ldots, J$.

Proof. The proof of this theorem follows exactly the same reasoning as Theorem 3.4 and will thus be omitted. ∎

Theorem 3.7. The two systems $\bigoplus_{j=1}^{J}(t_jA(f_j))M^{(j)}y$ and $\bigoplus_{j=1}^{J}(t_jA(f_j))N^{(j)}z$ are equivalent if and only if two $G$-matrices $S$ and $T$ exist such that for each $j = 1, 2, \ldots, J$, $M^{(j)} = N^{(j)}S$ and $N^{(j)} = M^{(j)}T$.

Proof. The proof of this theorem follows the same line of reasoning used in the proof of Theorem 3.4 and will therefore be omitted. ∎

Definition 3.4. A system $\bigoplus_{j=1}^{J}(t_jA(f_j))M^{(j)}y$ is said to be quasi-row-column reduced if for each $j = 1, 2, \ldots, J$ the rows of $S_v(M^{(j)})$ are $K$-linearly independent and if all the columns of the stacked matrix $[M^{(1)T}\ M^{(2)T}\ \cdots\ M^{(J)T}]^T$ are $G$-linearly independent.

Based on this definition and the two theorems preceding it, two qrc reduced systems $\bigoplus_{j=1}^{J}(t_jA(f_j))M^{(j)}y$ and $\bigoplus_{j=1}^{J}(t_jA(f_j))N^{(j)}y$ are equivalent if a nonsingular $m \times m$ $G$-matrix $S$ exists and if for each $j = 1, 2, \ldots, J$ a nonsingular $t_j \times t_j$ $K$-matrix $D_j$ exists such that $S_v(N^{(j)}) = D_jS_v(M^{(j)})S$, $j = 1, 2, \ldots, J$.

We are now prepared to determine a lower bound on the multiplicative complexity of $\bigoplus_{j=1}^{J}(t_jA(f_j))M^{(j)}y$. As for the system $(tA(f))My$, let $G$ be an infinite field, $F \supset G$ an extension field of $G$, and $K \supset G$ a finite extension of $G$ unrelated to $F$ with basis $u_0, u_1, \ldots, u_{n-1}$. Let $s_j = \dim L_G(r(f_{j,0}), r(f_{j,1}), \ldots, r(f_{j,n-1}))$ be the row rank of $f_j = \sum_{i=0}^{n-1} f_{j,i}u_i \in \mathcal{R}$ over $G$, where $r: F \to F/G$ is the natural homomorphism.

Theorem 3.8. [4] If $\bigoplus_{j=1}^{J}(t_jA(f_j))M^{(j)}y$ is a qrc reduced system with the cardinality of $y$ equal to $m$ and the row rank of $f_j$ equal to $s_j \ge 1$, $j = 1, 2, \ldots, J$, and if for every $L \subseteq \{1, 2, \ldots, J\}$, $\dim L_G(\bigcup_{l \in L} r(f_l)) \ge \sum_{l \in L} s_l - |L| + 1$, then

$$\mu_B\Big(\bigoplus_{j=1}^{J}(t_jA(f_j))M^{(j)}y;\ G\Big) \ge \sum_{j=1}^{J} t_j(s_j-1)+m,$$

where $r(f_j)$ denotes the set $\{r(f_{j,0}), r(f_{j,1}), \ldots, r(f_{j,n-1})\}$ and $B = F \cup \{y_1, y_2, \ldots, y_m\}$.

Proof. Let $t = \sum_{j=1}^{J} t_j$ be the total number of blocks in the direct sum. As for Theorem 3.5, the proof will be by induction on the set of ordered pairs $(t,m)$ ordered lexicographically. When $t = 1$ then $J$ must be 1, and the result of Theorem 3.5 provides a starting point for the induction. As in the proof of Theorem 3.5, assume the theorem true for all $(t',m') < (t,m)$.

Two reductions will be used that are analogous to those used in the proof of Theorem 3.5. The first reduction is used when a nonsingular $K$-matrix $T_j$ exists such that $T_jS_v(M^{(j)})$ has a row of the form $k[g_1\ g_2\ \cdots\ g_m]$, $g_i \in G$, $k \in K$, for some $j \in \{1, 2, \ldots, J\}$. For this case, as for Theorem 3.5, assume without loss of generality that the first row of $T_jS_v(M^{(j)})$ is $k[g_1\ g_2\ \cdots\ g_m] = k \otimes v \in K \otimes G^m$. Define $L$ as the subset of $\{1, 2, \ldots, J\}$ consisting of all $l$ for which there exists $T_l$ such that the first row of $T_lS_v(M^{(l)})$ is $k_l \otimes v$, where $k_l \in K$ for $l \in L$. The original system $\bigoplus_{j=1}^{J}(t_jA(f_j))M^{(j)}y$ is equivalent to the system $\bigoplus_{j=1}^{J}(t_jA(f_j))M'^{(j)}y$, where $S_v(M'^{(j)}) = T_jS_v(M^{(j)})$ for $j \in L$ and $S_v(M'^{(j)}) = S_v(M^{(j)})$ for $j \notin L$. Let $D_j$ be the diagonal $t_j \times t_j$ $K$-matrix whose first diagonal entry is $k_j^{-1}$ and whose remaining diagonal entries are 1 for $j \in L$, with $D_j$ the identity for $j \notin L$. Let $E$ be a nonsingular $m \times m$ $G$-matrix satisfying $vE = [1\ 0\ \cdots\ 0]$, and let $S_v(M''^{(j)}) = D_jS_v(M'^{(j)})E$, $j = 1, 2, \ldots, J$. The original system $\bigoplus_{j=1}^{J}(t_jA(f_j))M^{(j)}y$ is equivalent to $\bigoplus_{j=1}^{J}(t_jA(f_j))M''^{(j)}y$.

Since the first row of $S_v(M''^{(j)})$ is $[1\ 0\ \cdots\ 0]$ for $j \in L$, the first outputs of these blocks are of the form $f_{j,i}y_1$. Therefore by Theorem 2.6,

$$\mu_B\Big(\bigoplus_{j=1}^{J}(t_jA(f_j))M''^{(j)}y\Big) \ge \mu_B\Big(\sigma\Big(\bigoplus_{j=1}^{J}(t_jA(f_j))M''^{(j)}y\Big)\Big)+d,$$

where $d = \dim L_G(\bigcup_{l \in L} r(f_l))$ and $\sigma$ is the specialization of $y$ defined by $\sigma(y_1) = 0$, $\sigma(y_i) = y_i$, $i = 2, 3, \ldots, m$. Each subsystem $\sigma((t_jA(f_j))M''^{(j)}y)$ is identical to $(t_jA(f_j))N^{(j)}y'$, where $y' = [y_2\ y_3\ \cdots\ y_m]^T$, $N^{(j)} = M''^{(j)}\hat{E}$, and $\hat{E} = \begin{bmatrix} 0 \\ I_{m-1} \end{bmatrix}$ is an $m \times (m-1)$ $G$-matrix. Lemma 3.2 states that the rank of $S_v(N^{(j)})$ is $t_j-1$ for every $j \in L$ and is $t_j$ for every $j \notin L$. Let $\tilde{N}^{(j)}$ be such that $S_v(\tilde{N}^{(j)})$ is $S_v(N^{(j)})$ without the first row for every $j \in L$, and $S_v(\tilde{N}^{(j)})$ is $S_v(N^{(j)})$ for every $j \notin L$. Also let $t'_j = t_j-1$ for $j \in L$ and $t'_j = t_j$ for $j \notin L$.
From Lemma 3.2 we know that $\bigoplus_{j=1}^{J}(t'_jA(f_j))\tilde{N}^{(j)}y'$ is qrc reduced and is equivalent to $\sigma(\bigoplus_{j=1}^{J}(t_jA(f_j))M''^{(j)}y)$. The total number of blocks in the direct sum for this system is $t' = \sum_{j=1}^{J} t'_j = t - |L|$ and the cardinality of $y'$ is $m' = m-1$, thus $(t',m') < (t,m)$ and the induction hypothesis applies. Therefore

$$\mu_B\Big(\sigma\Big(\bigoplus_{j=1}^{J}(t_jA(f_j))M''^{(j)}y\Big)\Big) \ge \sum_{j=1}^{J} t'_j(s_j-1)+m-1 = \sum_{j=1}^{J} t_j(s_j-1)+m-1-\sum_{l \in L}(s_l-1).$$

In the statement of the theorem it was assumed that

$$d = \dim L_G\Big(\bigcup_{l \in L} r(f_l)\Big) \ge \sum_{l \in L}(s_l-1)+1,$$

yielding

$$\mu_B\Big(\bigoplus_{j=1}^{J}(t_jA(f_j))M^{(j)}y\Big) \ge \mu_B\Big(\sigma\Big(\bigoplus_{j=1}^{J}(t_jA(f_j))M''^{(j)}y\Big)\Big)+d \ge \sum_{j=1}^{J} t_j(s_j-1)+m,$$

proving the validity of the induction hypothesis for this case.

The remaining case is when no nonsingular $K$-matrix $T_j$ exists such that $T_jS_v(M^{(j)})$ has a row of the form $k[g_1\ g_2\ \cdots\ g_m]$, $g_i \in G$, $k \in K$, for $1 \le j \le J$. …

… a polynomial $x(u)$ of degree $m$ with a factor of degree $p \ge 1$ in $G[u]$ can be reduced to an equivalent polynomial of degree $m-p$ in analyzing the multiplicative complexity of products with $x(u)$. There are other types of constraints for which the multiplicative complexity is reduced. If some polynomial in $G[u]$ can be added to $x(u)$ so that the resulting polynomial has a non-constant factor in $G[u]$, then the multiplicative complexity is reduced by the degree of this factor. This is simply the application of the homomorphism $r$ that was defined in Chapter 3 to take care of this possibility. In this chapter we will not use the mapping $r$, but will account for such additive terms explicitly. The following lemma will be useful in converting inhomogeneous systems of equations caused by these additive terms into homogeneous systems of equations.

Lemma 4.1. Let $\Gamma x = d$, where $\Gamma$ is an $(n+1) \times (m+1)$ nonzero $G$-matrix, $x = [x_0\ x_1\ \cdots\ x_m]^T \notin G^{m+1}$, and $d = [d_0\ d_1\ \cdots\ d_n]^T \in G^{n+1}$. The vector $x$ can always be replaced by $\hat{x}+e$, $e \in G^{m+1}$, satisfying $\Gamma\hat{x} = 0$ and $\Gamma e = d$.

Proof. Assume first that $m = n$ and $\mathrm{rank}\,\Gamma = m+1$. Then $\Gamma$ must be invertible and $x = \Gamma^{-1}d$, implying that $x \in G^{m+1}$, contradicting the assumption of the lemma. In the remaining cases a vector $e \in G^{m+1}$ satisfying $\Gamma e = d$ exists, in which some entries, such as $e_0$, may be selected arbitrarily. Now $\hat{x} = x-e$ must satisfy $\Gamma\hat{x} = 0$, and the polynomial $x(u)$ may be replaced by the polynomial with coefficient vector $\hat{x}$ without affecting $\rho_r(\Phi)$. ∎

The basic idea used in the proof of Lemma 3.1 is that $\rho_r(\Phi)$, the row rank over $G$ of the matrix $\Phi$ containing the coefficients of $x(u)$, is equal to $m+n+1$, the number of rows in $\Phi$, since each of the coefficients is a distinct indeterminate. The following theorem states exactly when $\rho_r(\Phi)$ is less than $m+n+1$ for $n$ equal to or larger than $m$.

Theorem 4.1. Let $y(u) = \sum_{i=0}^{n} y_i u^i$ be a polynomial with distinct coefficients in $B$ that are algebraically independent over $G$, and let $x(u) = \sum_{i=0}^{m} x_i u^i$ be a polynomial whose coefficients are not necessarily in $G$. Let $n \ge m$. If $Z$ is the system defined by the product $x(u)y(u)$, then $\mu_B(Z; G) = m+n+1$, unless $x(u)$ can be expressed as $x(u) = Q(u) + Q'(u)x'(u)$, where $Q(u), Q'(u) \in G[u]$ and $x'(u)$ has degree $m' < m$, in which case $\mu_B(Z; G) = m'+n+1$.

If $m > n$, then $\Gamma$ has only $n+1$ rows and it is possible for the dimension of the null space of $\Phi^T$ over $G$ to be greater than the number of independent rows of $\Gamma$. The case $m > n$ is thus more complicated; an upper bound on the multiplicative complexity for $m > n$ can be stated as a simple extension of the result for $m \le n$. …

Theorem 4.2. If multiplication of a symmetric polynomial of degree $m$ by a polynomial of degree $n$ with indeterminate coefficients is denoted by the system $PM_s(m,n)$, then if $n > 0$, $\mu_G(PM_s(m,n)) = m+n+1$ when $m$ is even, and $\mu_G(PM_s(m,n)) = m+n$ when $m$ is odd.

Proof. The remarks of the previous paragraph show that the row rank over $G$ of the matrix formed from $x(u)$ is $m+n+1$ when $m$ is even and is $m+n$ when $m$ is odd. Using Theorem 2.4, these constitute lower bounds on $\mu_G(PM_s(m,n))$ for $m$ even or odd. ∎
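The parity conditions in Theorems 4.2 and 4.3 come from $G$-linear relations forced on the outputs: a symmetric $x(u)$ of odd degree has $x(-1) = 0$, and an antisymmetric $x(u)$ has $x(1) = 0$ (and also $x(-1) = 0$ when its degree is even), so the corresponding evaluations of $z(u) = x(u)y(u)$ vanish identically. A sketch (not thesis code) checking these relations with exact rational arithmetic:

```python
from fractions import Fraction
import random

def poly_product(x, y):
    z = [Fraction(0)] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            z[i + j] += xi * yj
    return z

random.seed(1)
y = [Fraction(random.randint(-9, 9)) for _ in range(5)]       # degree n = 4

# symmetric x of odd degree m = 5 (x_i = x_{m-i}): x(-1) = 0 forces one
# G-linear relation among the outputs, z(-1) = sum_k (-1)^k z_k = 0,
# dropping the row rank (and the complexity) to m + n
half = [Fraction(random.randint(-9, 9)) for _ in range(3)]
x_sym = half + half[::-1]
z = poly_product(x_sym, y)
assert sum((-1) ** k * zk for k, zk in enumerate(z)) == 0

# antisymmetric x of even degree m = 4 (x_i = -x_{m-i}, middle entry 0):
# x(1) = x(-1) = 0 gives two relations, z(1) = z(-1) = 0, matching
# the value m + n - 1 in Theorem 4.3
half = [Fraction(random.randint(-9, 9)) for _ in range(2)]
x_asym = half + [Fraction(0)] + [-c for c in half[::-1]]
z = poly_product(x_asym, y)
assert sum(z) == 0 and sum((-1) ** k * zk for k, zk in enumerate(z)) == 0
```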
4.3. Multiplication by an Antisymmetric Polynomial

When the polynomial $x(u)$ exhibits odd symmetry it is called antisymmetric or skew-symmetric. As for symmetric polynomials, there is a simple relation between the value of an antisymmetric polynomial at any nonzero point and the value at the reciprocal of this point. This relation is $x(u) = -u^m x(1/u)$, implying again that all roots (except for 1 and $-1$) must occur in reciprocal pairs. It is easily seen that 1 is a root of all antisymmetric polynomials, and when a single factor of $u-1$ is factored out of an antisymmetric polynomial, the resulting polynomial must be a symmetric polynomial.

Theorem 4.3. If a polynomial multiplication by an antisymmetric polynomial is denoted as the system $PM_a(m,n)$, where $m$ is the degree of the antisymmetric polynomial and $n$ is the degree of another polynomial with indeterminate coefficients, then if $n > 0$, $\mu_G(PM_a(m,n)) = m+n-1$ when $m$ is even, and $\mu_G(PM_a(m,n)) = m+n$ when $m$ is odd.

Proof. The factor of $u-1$ can be eliminated from the antisymmetric polynomial, leaving a system equivalent to $PM_s(m-1,n)$, and the result of Theorem 4.2 is applied to obtain the suggested result. ∎

Looking back at the proof of Theorem 4.2, this implies that an antisymmetric polynomial of even degree has a factor of $u^2-1$ and an antisymmetric polynomial of odd degree has a factor of $u-1$. Another important point to realize is that if the degree of an antisymmetric polynomial is even, then the middle coefficient must be zero. This is because each coefficient must satisfy $x_i = -x_{m-i}$, and when $i = m/2$ this forces $x_{m/2} = -x_{m/2}$, implying that $x_{m/2} = 0$.

4.4. Products of Two Symmetric Polynomials

Sometimes it happens that both of the polynomials to be multiplied are symmetric. When this is the case, the coefficients of $y(u)$ are no longer indeterminate, and the row rank of $y$ is $(n+1)/2$ for $n$ odd, and $(n+2)/2$ for $n$ even. The redundant rows can be eliminated by replacing each of the first $(n+1)/2$ or $n/2$ columns of $\Phi$ by the sum of the original column with its symmetric counterpart and eliminating the symmetric columns. Since $x(u)$ is symmetric, this "folding over" of the columns of $\Phi$ causes the rows of the resulting matrix to be symmetric, resulting in $(m+n+1)/2$ distinct rows when $m+n$ is odd, and $(m+n+2)/2$ distinct rows when $m+n$ is even.

Because the columns were folded over, this system is, in general, no longer equivalent to an unrestricted polynomial multiplication, and therefore the result of Theorem 4.1 is not applicable. An upper bound on the number of m/d steps necessary to compute this system can be easily established by application of the Karatsuba-Toom algorithm and use of the following identities. Assume that $m$ and $n$ are both even, since if either or both were odd then factors of $u+1$ could be divided out where necessary. For any $g \in G$, $x(g) = g^m x(1/g)$, $y(g) = g^n y(1/g)$, and $z(g) = x(g)y(g) = g^{m+n}z(1/g)$. Therefore, for each $g \in G$, except for 1 and $-1$, two of the residues necessary for the Chinese remainder theorem reconstruction can be determined with only one m/d step. Since $m$ and $n$ were assumed to be even, the product is evaluated for a total of $m+n+1$ distinct elements of $G$, using a total of $\frac{m+n}{2}+1$ m/d steps. If either $m$ or $n$ is odd and the other even, then $\frac{m+n-1}{2}+1$ m/d steps are sufficient, and if both $m$ and $n$ are odd, then $\frac{m+n}{2}$ m/d steps are sufficient.
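A sketch of this evaluation-interpolation scheme (an illustration under the stated assumptions, not thesis code): the only counted m/d steps are the pointwise products $x(g)y(g)$; the evaluations and the Lagrange reconstruction multiply only by fixed rationals, which are free, and each reciprocal point comes from the free scaling $z(1/g) = g^{-(m+n)}z(g)$.

```python
from fractions import Fraction

def poly_eval(c, g):
    acc = Fraction(0)
    for coef in reversed(c):         # Horner; multiplies only by the constant g
        acc = acc * g + coef
    return acc

def mul_by_linear(c, r):
    # multiply the polynomial c by (u - r)
    out = [Fraction(0)] * (len(c) + 1)
    for k, ck in enumerate(c):
        out[k + 1] += ck
        out[k] -= r * ck
    return out

def interpolate(points, values):
    # Lagrange interpolation; all arithmetic is by constants in G (free)
    n = len(points)
    coeffs = [Fraction(0)] * n
    for i in range(n):
        basis, denom = [Fraction(1)], Fraction(1)
        for j in range(n):
            if j != i:
                basis = mul_by_linear(basis, points[j])
                denom *= points[i] - points[j]
        w = values[i] / denom
        for k, bk in enumerate(basis):
            coeffs[k] += w * bk
    return coeffs

def symmetric_product(x, y):
    # x, y symmetric with even degrees m, n; returns (coefficients, m/d count)
    m, n = len(x) - 1, len(y) - 1
    assert x == x[::-1] and y == y[::-1] and m % 2 == 0 and n % 2 == 0
    deg, mults = m + n, 0
    points = [Fraction(1)]                       # the single unpaired point u = 1
    values = [poly_eval(x, 1) * poly_eval(y, 1)]
    mults += 1
    k = 2
    while len(points) < deg + 1:                 # (m+n)/2 reciprocal pairs g, 1/g
        g = Fraction(k)
        zg = poly_eval(x, g) * poly_eval(y, g)   # one m/d step per pair
        points += [g, 1 / g]
        values += [zg, zg / g ** deg]            # z(1/g) = g^-(m+n) z(g), free
        k += 1
    return interpolate(points, values), mults

x = [Fraction(c) for c in (1, 2, 3, 2, 1)]       # symmetric, m = 4
y = [Fraction(c) for c in (2, 5, 2)]             # symmetric, n = 2
z, mults = symmetric_product(x, y)
assert mults == (4 + 2) // 2 + 1                 # (m+n)/2 + 1 = 4 m/d steps
ref = [Fraction(0)] * 7                          # schoolbook cross-check
for i in range(5):
    for j in range(3):
        ref[i + j] += x[i] * y[j]
assert z == ref
```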
The number of m/d steps sufficient to compute the product $x(u)y(u)$ will now be shown to be the number of necessary m/d steps also. First, note that any general polynomial can be expressed as the sum of a symmetric and an antisymmetric polynomial of the same degree as the original polynomial. That is, for any $m$-th degree polynomial $x(u)$, the symmetric part is $x_s(u) = \frac{1}{2}[x(u)+u^m x(1/u)]$, and the antisymmetric part is $x_a(u) = \frac{1}{2}[x(u)-u^m x(1/u)]$, yielding $x(u) = x_s(u)+x_a(u)$. The product of two symmetric polynomials will be denoted by $PM_{ss}(m,n)$, and the product of a symmetric polynomial of degree $m$ with an antisymmetric polynomial of degree $n$ will be denoted by $PM_{sa}(m,n)$. The inequality

$$\mu_B(PM_s(m,n); G) \le \mu_B(PM_{ss}(m,n); G)+\mu_B(PM_{sa}(m,n); G)$$

arises because an algorithm for computing $PM_s(m,n)$ could first decompose $y(u)$ into symmetric and antisymmetric components with no m/d steps, then algorithms for $PM_{ss}(m,n)$ and $PM_{sa}(m,n)$ could be implemented using the sum of the number of m/d steps required by each component, and finally the two component outputs could be added to obtain the symmetric polynomial multiplication.

When $m$ and $n$ are both even, then $\mu_B(PM_s(m,n); G) = m+n+1$, $\mu_B(PM_{ss}(m,n); G) \le \frac{m+n}{2}+1$, and $\mu_B(PM_{sa}(m,n); G) \le \frac{m+n}{2}$. Adding the two inequalities yields $\mu_B(PM_{ss}(m,n); G)+\mu_B(PM_{sa}(m,n); G) \le m+n+1$, but also, from the previous paragraph, $\mu_B(PM_{ss}(m,n); G)+\mu_B(PM_{sa}(m,n); G) \ge \mu_B(PM_s(m,n); G) = m+n+1$; therefore, for the inequality to hold both ways, $\mu_B(PM_{ss}(m,n); G)+\mu_B(PM_{sa}(m,n); G) = m+n+1$. If either $\mu_B(PM_{ss}(m,n); G) < \frac{m+n}{2}+1$ or $\mu_B(PM_{sa}(m,n); G) < \frac{m+n}{2}$, then this equality could not be true; thus $\mu_B(PM_{ss}(m,n); G) = \frac{m+n}{2}+1$ and $\mu_B(PM_{sa}(m,n); G) = \frac{m+n}{2}$.

The results for other values of $m$ and $n$ reduce to the case of $m$ and $n$ both even, and therefore a derivation of these results is omitted. Table 4.1 summarizes the multiplicative complexity of polynomial multiplication with symmetric and antisymmetric polynomials.

    x(u)           | y(u)          | parity of m | parity of n | mu_B(PM(m,n))
    ---------------|---------------|-------------|-------------|--------------
    General        | General       | either      | either      | m+n+1
    Symmetric      | General       | even        | either      | m+n+1
    Symmetric      | General       | odd         | either      | m+n
    Antisymmetric  | General       | even        | either      | m+n-1
    Antisymmetric  | General       | odd         | either      | m+n
    Symmetric      | Symmetric     | even        | even        | (m+n+2)/2
    Symmetric      | Symmetric     | even        | odd         | (m+n+1)/2
    Symmetric      | Symmetric     | odd         | odd         | (m+n)/2
    Symmetric      | Antisymmetric | even        | even        | (m+n)/2
    Symmetric      | Antisymmetric | even        | odd         | (m+n+1)/2
    Symmetric      | Antisymmetric | odd         | even        | (m+n-1)/2
    Symmetric      | Antisymmetric | odd         | odd         | (m+n)/2
    Antisymmetric  | Antisymmetric | even        | even        | (m+n-2)/2
    Antisymmetric  | Antisymmetric | even        | odd         | (m+n-1)/2
    Antisymmetric  | Antisymmetric | odd         | odd         | (m+n)/2

Table 4.1. Multiplicative complexity of computing $x(u)y(u)$ for symmetric and antisymmetric input polynomials.
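A sketch of the symmetric/antisymmetric decomposition used above (illustrative code with exact rational arithmetic; reversing the coefficient list gives the coefficients of $u^m x(1/u)$):

```python
from fractions import Fraction

def sym_antisym_parts(x):
    # x(u) = x_s(u) + x_a(u), with x_s = (x(u) + u^m x(1/u))/2
    # and x_a = (x(u) - u^m x(1/u))/2
    rev = x[::-1]
    xs = [(a + b) / 2 for a, b in zip(x, rev)]
    xa = [(a - b) / 2 for a, b in zip(x, rev)]
    return xs, xa

x = [Fraction(c) for c in (1, 2, 3, 4)]
xs, xa = sym_antisym_parts(x)
assert xs == xs[::-1]                           # symmetric part
assert xa == [-c for c in xa[::-1]]             # antisymmetric part
assert [a + b for a, b in zip(xs, xa)] == x     # x = x_s + x_a
```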
4.5. Polynomial Multiplication with Restricted Outputs

4.5.1. Decimation of Outputs

Frequently, only certain coefficients resulting from a product of polynomials are desired. It is clear, for instance, that if the coefficient of the constant term is the only required output of a system, then the system is equivalent to a product of two zero-order polynomials, and one multiplication is necessary to compute the system. Computation of only the highest-order term can be similarly shown to require just a single multiplication. We have seen in the previous section that, for linear constraints on the inputs to a system of polynomial multiplication, it is possible to determine exactly what conditions must be satisfied for a constraint of this type to have an effect on the multiplicative complexity of the system. Can a similar statement be made regarding the effect of restricting the set of desired outputs to a subset of the coefficients resulting from a polynomial product?

As a first example, consider the computation of every $d$-th coefficient of a polynomial product. In digital filtering this selection of equally spaced output samples is called decimation, and the computation of these decimated outputs for a single polynomial product is a basic operation in the overlap-add method of digital filtering applied to a decimating filter. A similar problem has been analyzed by Winograd [41], but his analysis was confined to a technique more closely resembling overlap-save than overlap-add.

Given a polynomial $x(u)$ of degree $m$ and a polynomial $y(u)$ of degree $n$, the computation of every $d$-th coefficient will be denoted by $PM(m,n,d)$. Another parameter could be added to show which coefficient begins the decimated output, but this starting point does not affect the multiplicative complexity, so the same notation will be used to represent a class of systems with arbitrary starting points (less than $d$). It will be assumed that $d \le n$. …

A second type of output restriction keeps only the first $r+1$ coefficients $z_0, z_1, \ldots, z_r$ of the product $z(u) = x(u)y(u)$; this is equivalent to computing the product modulo $u^{r+1}$.

Theorem 4.5. If $z(u) = \sum_{i=0}^{m+n} z_i u^i = x(u)y(u)$, where $x(u)$ and $y(u)$ are polynomials with indeterminate coefficients of degree $m$ and $n$ respectively, and $0 \le r \le m+n$, then, taking $m \le n$ without loss of generality,

$$\mu_B(z_0, z_1, \ldots, z_r; G) = \min(m,r)+\min(n,r)+1 = \begin{cases} 2r+1, & r \le m \\ m+r+1, & m < r \le n \\ m+n+1, & r > n. \end{cases}$$

The outputs of the system can be reversed to obtain a system equivalent to a polynomial product modulo $u^{m+n-r+1}$. An analysis similar to that of Theorem 4.5 produces

$$\mu_B(z_r, z_{r+1}, \ldots, z_{m+n}; G) = 2m+2n-\max(m,r)-\max(n,r)+1.$$
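A reference sketch (naive schoolbook computation, illustrative only) defining the restricted output sets; the theorem values bound the minimal number of m/d steps, which this direct computation makes no attempt to attain:

```python
def poly_product(x, y):
    z = [0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            z[i + j] += xi * yj
    return z

def decimated(x, y, d, start=0):
    # PM(m, n, d): every d-th coefficient of x(u)y(u); the starting point
    # (taken less than d) does not affect the multiplicative complexity
    return poly_product(x, y)[start::d]

def truncated(x, y, r):
    # first r + 1 coefficients, i.e. x(u)y(u) mod u^{r+1};
    # mu = min(m, r) + min(n, r) + 1 by Theorem 4.5
    return poly_product(x, y)[:r + 1]

def high_order(x, y, r):
    # coefficients z_r, ..., z_{m+n}; reversing the outputs reduces this to a
    # truncated product, with mu = 2m + 2n - max(m, r) - max(n, r) + 1
    return poly_product(x, y)[r:]

x, y = [1, 2, 3], [4, 5]              # m = 2, n = 1
assert poly_product(x, y) == [4, 13, 22, 15]
assert decimated(x, y, 2) == [4, 22]
assert truncated(x, y, 1) == [4, 13]
assert high_order(x, y, 2) == [22, 15]
```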
4.6. Summary of Chapter 4

In this chapter, various input constraints and output restrictions have been imposed on systems of polynomial multiplication to determine their effects on the multiplicative complexity. It was shown that, for a constrained polynomial of degree equal to or less than that of the unconstrained polynomial, the multiplicative complexity is less than that of a general system of polynomial multiplication only if a polynomial in $G[u]$ divides the constrained polynomial (or the sum of the constrained polynomial and another polynomial in $G[u]$). When this is the case, the multiplicative complexity is reduced by the degree of the polynomial factor from $G[u]$.

When the degree of the constrained polynomial is greater than the degree of the unconstrained polynomial, an upper bound on the multiplicative complexity can be determined by finding a representation of the constrained polynomial as a sum of products of polynomials in $G[u]$ with polynomials having indeterminate coefficients. The result is a direct sum of systems of polynomial products, and the upper bound on the multiplicative complexity is simply the sum of the complexities of the components. Clearly there is a "minimal" representation in the sense that no other decomposition of this sort uses fewer m/d steps. This minimal representation is not necessarily unique.

The results on constrained inputs were applied to the computation of the products of symmetric or antisymmetric polynomials with general polynomials. The multiplicative complexity of these systems is bounded above by the complexity of a system of general polynomial multiplication and can be less than this bound by only one or two, depending on whether the degrees of the polynomials are even or odd. When both of the polynomials are either symmetric or antisymmetric, the number of required m/d steps is between $\frac{m+n-2}{2}$ and $\frac{m+n+2}{2}$, or almost exactly half that required by general polynomial multiplication.

Two types of restrictions on the outputs of a polynomial product were considered. Decimating the output by some integer factor $d$ results in a reduction of the multiplicative complexity by $d-1$ over the computation of the complete set of outputs. Truncating the set of outputs to the first $r$ is equivalent to computing a polynomial product modulo $u^{r+1}$, and the multiplicative complexity is equal to $\min(m,r)+\min(n,r)+1$. A similar result is obtained for computing the higher-order coefficients only.

Several remaining questions must be addressed to complete a study of constraints on the inputs or restrictions of the outputs of polynomial products. Is the upper bound on the number of m/d steps obtained when the degree of the constrained polynomial is greater than the degree of the unconstrained polynomial equal to the lower bound? Are there general results for when both polynomials are constrained? An upper bound can obviously be established by removing factors with coefficients in $G$ from each polynomial, multiplying the remaining polynomials, and finally multiplying by the removed factors. Is this also a lower bound? Can anything be stated about arbitrary restrictions on the output? What if linear combinations (over $G$) of the outputs are desired rather than the coefficients of the polynomial product themselves? Finally, how do these results apply to polynomial products modulo other polynomials?

CHAPTER 5

Multiplicative Complexity of the Discrete Fourier Transform

In this chapter the multiplicative complexity of the discrete Fourier transform (DFT) is analyzed. The next several sections define the DFT and then show how the complexity of the DFT is determined when the number of inputs is prime, a power of an odd prime, a power of two, and finally any positive integer.

5.1. The Discrete Fourier Transform

The discrete Fourier transform is a linear transformation that, when applied to a given vector, describes the vector in terms of an orthogonal set of basis vectors whose entries are uniformly spaced samples of complex exponential functions with unit magnitude and linear phase, periodic with a period that is an integer divisor of the product of the vector length and the interval between adjacent samples. Mathematically, for a vector whose entries are $x_0, x_1, \ldots, x_{N-1}$, the samples of the DFT are

$$X_k = \sum_{n=0}^{N-1} x_n e^{-j2\pi nk/N}, \qquad k = 0, 1, \ldots, N-1. \tag{5.1}$$

Using Euler's relation, (5.1) can be expressed as

$$X_k = \sum_{n=0}^{N-1} x_n(\cos 2\pi nk/N - j\sin 2\pi nk/N),$$

emphasizing the trigonometric nature of the transformation.
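A direct implementation of (5.1), as an illustrative sketch ($O(N^2)$ arithmetic operations):

```python
import cmath

def dft(x):
    # direct evaluation of X_k = sum_n x_n exp(-j 2 pi n k / N)
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

X = dft([1.0, 2.0, 3.0, 4.0])
assert abs(X[0] - 10.0) < 1e-12        # X_0 is the sum of the inputs
```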
The DFT, or a formula closely resembling it, was apparently originally discovered by Euler [16] in 1750 in his investigations of sound propagation in elastic media. Shortly after Euler's discovery, formulas similar to the DFT were independently discovered by D. Bernoulli, Clairaut, and Lagrange. Their applications included the analysis of the motion of a vibrating string, sound propagation, and modeling the apparent motion of the sun relative to the earth.

Several decades passed before further effort was expended on trigonometric series. New research in this area was discouraged because it was not known whether these trigonometric series converged when the number of terms was infinite. In addition to the technical arguments against the use of trigonometric series, the late 18th century was also a bad time to be associated with scientific or mathematical research because of the political situation, particularly in France during the French Revolution.

Fourier began investigating the use of trigonometric series in developing an analytic theory of heat, possibly as early as 1802. His ideas were rejected by the French Academy of Sciences when he presented them in 1807, because he had no proof of the convergence of the series. Fourier persisted and ultimately convinced his critics that arbitrary functions defined over finite intervals could be expressed as infinite trigonometric series. Dirichlet investigated the convergence properties of trigonometric series and showed that Fourier's claims were mathematically sound. Fourier's research on trigonometric series led to them being called Fourier series and to their extension to the Fourier transform.

In 1805, Gauss discovered a method for computing the DFT that he claimed "greatly reduces the tediousness of mechanical calculation" [16]. Gauss had derived a technique for computing the DFT that is now known as the fast Fourier transform (FFT), which was rediscovered by Cooley and Tukey in 1965. Gauss applied the DFT to the interpolation of orbits of asteroids, but later discovered a better analytical method for solving this interpolation problem and did not publish his notes on the FFT during his lifetime. The notes did appear in Gauss' collected works, but were never referenced by later researchers except for a footnote in a mathematical encyclopedia in 1904. In 1977, Goldstine [14] realized the significance of Gauss' discovery and pointed out the connection with the algorithm of Cooley and Tukey.

The algorithm known as the fast Fourier transform (FFT) is only efficient when the length of the sequence (vector) to be transformed is a composite number. The next section presents results of Winograd and Rader that show how efficient algorithms can be derived for computing the DFT of sequences whose length is a prime number. The later sections show how the theory of multiplicative complexity can be applied to the computation of the DFT for other sequence lengths.

5.2. Prime Lengths

When the length of the sequence to be transformed is prime, Rader [28] has shown that the DFT may be reformulated as a cyclic convolution of length one less than the DFT length, plus some auxiliary additions. Winograd [40] has analyzed the multiplicative complexity of the prime-length DFT, using the conversion to cyclic convolution.

5.2.1. Rader's Permutation

The DFT of prime length is simply (5.1) with $N = p$, a prime. Rader showed that this computation may be reorganized by first separating the terms in this set of sums for which either $n$ or $k$ is zero (and $e^{-j2\pi nk/p} = 1$), yielding

$$X_0 = \sum_{n=0}^{p-1} x_n,$$
$$X_k = x_0 + \sum_{n=1}^{p-1} x_n e^{-j2\pi nk/p}, \qquad k = 1, 2, \ldots, p-1. \tag{5.2}$$

Let $W_p = e^{-j2\pi/p}$ be a $p$-th root of unity. By definition, $W_p^p = 1$, and therefore $W_p^{nk} = W_p^{n'k'}$ if and only if $nk \equiv n'k' \pmod{p}$. The set of reduced residues (all residues excluding zero) modulo a prime $p$ forms a group under multiplication. This group has $\phi(p-1)$ distinct primitive roots [21], where $\phi(\cdot)$ is Euler's totient function, and it is isomorphic to the additive group $Z_{p-1}$ of integers modulo $p-1$. The existence of a primitive root $g$ in this multiplicative group implies that all elements of the group may be expressed as $\langle g^i \rangle_p$, $i = 0, 1, \ldots, p-2$. Rader suggests replacing $n$ and $k$ in (5.2) with $\langle g^i \rangle_p$ and $\langle g^j \rangle_p$, where the notation $\langle \cdot \rangle_p$ is a shorthand for reduction to the principal residue modulo $p$ (i.e., $\langle i \rangle_p \equiv i \pmod{p}$ and $0 \le \langle i \rangle_p < p$).
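Under this substitution, $X_{\langle g^j \rangle_p} - x_0 = \sum_{i=0}^{p-2} x_{\langle g^{-i} \rangle_p} W_p^{\langle g^{j-i} \rangle_p}$ is a cyclic convolution of length $p-1$. A sketch of the resulting algorithm (illustrative code, not from the thesis; the brute-force primitive-root search is adequate only for small $p$):

```python
import cmath

def naive_dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def primitive_root(p):
    # smallest primitive root of the multiplicative group modulo prime p
    for g in range(2, p):
        if len({pow(g, i, p) for i in range(p - 1)}) == p - 1:
            return g

def cyclic_convolution(a, b):
    L = len(a)
    return [sum(a[i] * b[(j - i) % L] for i in range(L)) for j in range(L)]

def rader_dft(x):
    p = len(x)                        # assumed prime
    g = primitive_root(p)
    ginv = pow(g, p - 2, p)           # g^{-1} mod p, by Fermat's little theorem
    # a_i = x_{<g^{-i}>_p} and c_i = W_p^{<g^i>_p}
    a = [x[pow(ginv, i, p)] for i in range(p - 1)]
    c = [cmath.exp(-2j * cmath.pi * pow(g, i, p) / p) for i in range(p - 1)]
    conv = cyclic_convolution(a, c)
    X = [sum(x)] + [0.0] * (p - 1)    # X_0 needs no multiplications
    for j in range(p - 1):
        X[pow(g, j, p)] = x[0] + conv[j]
    return X

x = [1.0, 2.0, -1.0, 0.5, 3.0]        # length 5, a prime
assert all(abs(u - v) < 1e-9 for u, v in zip(rader_dft(x), naive_dft(x)))
```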
