Final
CHAPTER 1: INTRODUCTION
1.1 Introduction
Convolution provides the mathematical framework for DSP. It is the single most important technique in Digital Signal Processing. Convolution is a mathematical way of combining two signals to form a third signal. Using the strategy of impulse decomposition, systems are described by a signal called the impulse response. In signal processing, the impulse response, or impulse response function (IRF), of a dynamic system is its output when presented with a brief input signal, called an impulse. More generally, an impulse response refers to the reaction of any dynamic system in response to some external change. Convolution has applications that include statistics, computer vision, image and signal processing, electrical engineering, and differential equations.
Introduction to Convolution
One of the most important concepts in Fourier theory, and in crystallography, is that of a convolution. Convolutions arise in many guises, as will be shown below. Because of a mathematical property of the Fourier transform, referred to as the convolution theorem, it is convenient to carry out calculations involving convolutions.
Convolution Definition
The convolution of f and g is written f * g, using an asterisk or star. It is defined as the integral of the product of the two functions after one is reversed and shifted. As such, it is a particular kind of integral transform:
$$(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau$$
While the symbol t is used above, it need not represent the time domain. But in that context, the convolution formula can be described as a weighted average of the function f(τ) at the moment t, where the weighting is given by g(−τ) simply shifted by amount t. As t changes, the weighting function emphasizes different parts of the input function. More generally, if f and g are complex-valued functions on R^d, then their convolution may be defined as the integral:
$$(f * g)(x) = \int_{\mathbb{R}^d} f(y)\, g(x - y)\, dy$$
1.2 Types of Convolution
There are two types of convolution:
Linear convolution
Circular convolution
1.2.1 Linear convolution
Convolution is an integral operation that combines two signals, and it has many applications in numerous areas of signal processing. The convolution described above is linear convolution. Its most popular application is the determination of the output signal of a linear time-invariant system by convolving the input signal with the impulse response of the system. Convolving two signals in the time domain is equivalent to multiplying their Fourier transforms.
Mathematical formula: The linear convolution of two continuous-time signals x(t) and h(t) is defined by
$$y(t) = x(t) * h(t) = \int_{-\infty}^{\infty} x(\tau)\, h(t - \tau)\, d\tau$$
For discrete-time signals x(n) and h(n), the integration is replaced by a summation:
$$y(n) = x(n) * h(n) = \sum_{k=-\infty}^{\infty} x(k)\, h(n - k)$$
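The discrete summation can be sketched directly in code. The following is a minimal illustration, assuming finite-length sequences that start at n = 0 and are zero elsewhere; the function name linear_convolve is hypothetical and not part of the project.

```python
def linear_convolve(x, h):
    """Discrete linear convolution y(n) = sum over k of x(k) * h(n - k).

    Assumes x and h are finite lists starting at index 0 (zero elsewhere).
    """
    if not x or not h:
        return []
    y = [0] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        for k in range(len(x)):
            # h(n - k) only contributes where its index is inside h
            if 0 <= n - k < len(h):
                y[n] += x[k] * h[n - k]
    return y


print(linear_convolve([1, 2, 3], [1, 1]))  # [1, 3, 5, 3]
```

The output has len(x) + len(h) - 1 samples, which is the length of a linear convolution of two finite sequences.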
1.2.2
Circular convolution
The circular convolution of two aperiodic functions occurs when one of them is convolved in the normal way with a periodic summation of the other function. It occurs naturally in digital signal processing when DTFTs and inverse DTFTs are replaced by DFTs and inverse DFTs. Equivalently, the continuous frequency domain is replaced by a discrete one. For a periodic function xT(t), with period T, the convolution with another function, h(t), is also periodic, and can be expressed in terms of integration over a finite interval as follows:
$$(x_T * h)(t) = \int_{-\infty}^{\infty} h(\tau)\, x_T(t - \tau)\, d\tau = \int_{t_0}^{t_0 + T} h_T(\tau)\, x_T(t - \tau)\, d\tau$$
where t_0 is an arbitrary parameter and h_T is the periodic summation of h:
$$h_T(t) = \sum_{k=-\infty}^{\infty} h(t - kT)$$
When xT(t) is expressed as the periodic summation of another function, x, this convolution is sometimes referred to as a circular convolution of functions h and x.
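For sampled signals the same idea can be sketched with modular indexing. The example below is an illustration only, assuming two equal-length sequences of N samples; it also shows that folding the linear convolution back onto N points (a discrete periodic summation) gives the same result. The function names are hypothetical.

```python
def circular_convolve(x, h):
    """N-point circular convolution of two equal-length sequences."""
    n_points = len(x)
    if len(h) != n_points:
        raise ValueError("sequences must have the same length")
    return [
        sum(x[k] * h[(n - k) % n_points] for k in range(n_points))
        for n in range(n_points)
    ]


def wrap_linear(x, h):
    """Linear convolution with every output index folded modulo len(x)."""
    n_points = len(x)
    y = [0] * n_points
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[(i + j) % n_points] += xi * hj
    return y


x = [1, 2, 3, 4]
h = [1, 1, 0, 0]
print(circular_convolve(x, h))  # [5, 3, 5, 7]
print(wrap_linear(x, h))        # [5, 3, 5, 7]
```

The first output sample differs from the linear result (which would be 1) because the tail of the linear convolution wraps around and adds to the start.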
1.3 Properties of convolution
This section describes the properties of convolution. The properties of convolution are:
Commutative
Associative
Distributive
1.3.1 Commutative property:
The commutative property for convolution is expressed in mathematical form:
a[n] * b[n] = b[n] * a[n]
In words, the order in which two signals are convolved makes no difference; the results are identical.
1.3.2 Associative property:
The associative property describes the way to convolve more than two signals. Convolve two of the signals to produce an intermediate signal, then convolve the intermediate signal with the third signal. The associative property provides that the order of the convolutions doesn't matter. As an equation:
(a[n] * b[n]) * c[n] = a[n] * (b[n] * c[n])
The associative property is used in system theory to describe how cascaded systems behave. Two or more systems are said to be in a cascade if the output of one system is used as the input for the next system. From the associative property, the order of the systems can be rearranged without changing the overall response of the cascade. Further, any number of cascaded systems can be replaced with a single system. The impulse response of the replacement system is found by convolving the impulse responses of all of the original systems.
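The properties listed in this section can be checked numerically on small sequences. The sketch below is only a spot check on hypothetical integer sequences a, b and c, not a proof; conv and add are helper names introduced here for illustration.

```python
def conv(a, b):
    """Linear convolution of two finite sequences starting at n = 0."""
    y = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            y[i + j] += ai * bj
    return y


def add(a, b):
    """Sample-by-sample sum, zero-padding the shorter sequence."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [p + q for p, q in zip(a, b)]


a, b, c = [1, -2, 3], [0, 4, 1], [2, 5]

commutative = conv(a, b) == conv(b, a)
associative = conv(conv(a, b), c) == conv(a, conv(b, c))
distributive = conv(a, add(b, c)) == add(conv(a, b), conv(a, c))
print(commutative, associative, distributive)  # True True True
```

The associative check is the cascade result in miniature: conv(b, c) plays the role of the single replacement system for two cascaded systems with impulse responses b and c.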
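1.3.3 Distributive property:
The distributive property for convolution is expressed in mathematical form:
a[n] * b[n] + a[n] * c[n] = a[n] * (b[n] + c[n])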
In artificial reverberation (digital signal processing, pro audio), convolution is used to map the impulse response of a real room onto a digital audio signal.
In electrical engineering and other disciplines, the output (response) of a (stationary, or time- or space-invariant) linear system is the convolution of the input (excitation) with the system's response to an impulse or Dirac delta function.
In time-resolved fluorescence spectroscopy, the excitation signal can be treated as a chain of delta pulses, and the measured fluorescence is a sum of exponential decays from each delta pulse.
In physics, wherever there is a linear system with a "superposition principle", a convolution operation makes an appearance.
In digital signal processing, frequency filtering can be simplified by convolving two functions (data with a filter) in the time domain, which is analogous to multiplying the data with a filter in the frequency domain.
The idea of discrete-time convolution is exactly the same as that of continuous-time convolution. For this reason, it may be useful to look at both versions to help your understanding of this extremely important concept. Convolution is a very powerful tool for determining a system's output from knowledge of an arbitrary input and the system's impulse response. We know that any discrete-time signal can be represented by a summation of scaled and shifted discrete-time impulses. Since we are assuming the system to be linear and time-invariant, it stands to reason that an input signal comprised of the sum of scaled and shifted impulses gives rise to an output comprised of a sum of scaled and shifted impulse responses. This is exactly what occurs in convolution. For discrete-time signals the convolution equation is given by:
$$y[n] = \sum_{k=-\infty}^{\infty} x[k]\, h[n - k]$$
Graphical Interpretation:
Reflection of h[k], resulting in h[-k]
Shifting of h[-k], resulting in h[n - k]
2.2.1
Fig 2.2.1.1: A single impulse input yields the system's impulse response.
Fig 2.2.1.2: A scaled impulse input yields a scaled response, due to the scaling property of the system's linearity.
Fig 2.2.1.3: We now use the time-invariance property of the system to show that a delayed input results in an output of the same shape, only delayed by the same amount as the input.
Fig 2.2.1.4: We now use the additivity portion of the linearity property of the system to complete the picture. Since any discrete-time signal is just a sum of scaled and shifted discrete-time impulses, we can find the output from knowing the input and the impulse response.
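The four figures above can be mirrored in code: each input sample produces one scaled, shifted copy of the impulse response, and the output is the sum of those copies. This is a minimal sketch assuming non-empty, finite-length x and h starting at n = 0; impulse_superposition is a hypothetical name.

```python
def impulse_superposition(x, h):
    """Build the output as a sum of scaled, shifted impulse responses.

    Returns the output together with the individual contributions,
    one per input sample.
    """
    length = len(x) + len(h) - 1
    y = [0] * length
    contributions = []
    for k, xk in enumerate(x):
        # x[k] * delta[n - k] at the input gives x[k] * h[n - k] at the output
        shifted = [0] * k + [xk * v for v in h] + [0] * (len(x) - 1 - k)
        contributions.append(shifted)
        y = [p + q for p, q in zip(y, shifted)]
    return y, contributions


y, parts = impulse_superposition([2, 0, -1], [1, 2, 1])
for row in parts:
    print(row)
print(y)  # [2, 4, 1, -2, -1]
```

Each printed row is the response to a single scaled and delayed impulse; the last line is their sum, which equals the convolution of the two sequences.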
2.3 Convolution Analog
In this module we examine convolution for continuous-time signals. This will result in the convolution integral and its properties. These concepts are very important in engineering and will make any engineer's life a lot easier if the time is spent now to truly understand what is going on.
To begin this, it is necessary to state the assumptions we will be making. In this instance, the only constraints on our system are that it be linear and time-invariant.
Brief Overview of Derivation Steps:
1. An impulse input leads to an impulse response output.
2. A shifted impulse input leads to a shifted impulse response output. This is due to the time-invariance of the system.
3. We now scale the impulse input to get a scaled impulse output. This is using the scalar multiplication property of linearity.
4. We can now "sum up" an infinite number of these scaled impulses to get a sum of an infinite number of scaled impulse responses. This is using the additivity attribute of linearity.
5. Now we recognize that this infinite sum is nothing more than an integral, so we convert both sides into integrals.
6. Recognizing that the input is the function f(t), we also recognize that the output is exactly the convolution integral.
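The six steps can be written as a chain of input/output pairs. This is a sketch in notation assumed here rather than taken from the text: δ(t) is the unit impulse, the arrow means "produces at the system output", and y(t) denotes the output.

```latex
\begin{aligned}
\delta(t) &\;\longrightarrow\; h(t) && \text{(step 1: impulse response)} \\
\delta(t-\tau) &\;\longrightarrow\; h(t-\tau) && \text{(step 2: time invariance)} \\
f(\tau)\,\delta(t-\tau) &\;\longrightarrow\; f(\tau)\,h(t-\tau) && \text{(step 3: scaling)} \\
\int_{-\infty}^{\infty} f(\tau)\,\delta(t-\tau)\,d\tau &\;\longrightarrow\; \int_{-\infty}^{\infty} f(\tau)\,h(t-\tau)\,d\tau && \text{(steps 4 and 5: additivity)} \\
f(t) &\;\longrightarrow\; y(t) = \int_{-\infty}^{\infty} f(\tau)\,h(t-\tau)\,d\tau && \text{(step 6: sampling property)}
\end{aligned}
```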
Fig 2.3.1.1: We begin with a system defined by its impulse response, h(t).
Fig 2.3.1.2: We then consider a shifted version of the input impulse. Due to the time invariance of the system, we obtain a shifted version of the output impulse response.
Fig 2.3.1.3: Now we use the scaling part of linearity by scaling the system by a value, f(τ), that is constant with respect to the system variable, t.
Fig 2.3.1.4: We can now use the additivity aspect of linearity to add an infinite number of these, one for each possible τ. Since the limit of such an infinite sum is exactly an integral, we end up with the integration known as the Convolution Integral. Using the sampling property, we recognize the left-hand side simply as the input f(t).
2.3.2 Convolution Integral
As mentioned above, the convolution integral provides an easy mathematical way to express the output of an LTI system based on an arbitrary signal, x(t), and the system's impulse response, h(t). The convolution integral is expressed as
$$y(t) = \int_{-\infty}^{\infty} x(\tau)\, h(t - \tau)\, d\tau$$
Convolution is such an important tool that it is represented by the symbol *, and can be written as
$$y(t) = x(t) * h(t)$$
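The convolution integral can be approximated numerically by a Riemann sum. The sketch below is an illustration under assumptions: x and h are ordinary Python functions, the integration limits are chosen by the caller to cover the region where the integrand is non-zero, and the midpoint rule is used. The names convolve_integral and rect are hypothetical.

```python
def convolve_integral(x, h, t, tau_min, tau_max, steps=10000):
    """Approximate y(t) = integral of x(tau) * h(t - tau) d tau (midpoint rule)."""
    d_tau = (tau_max - tau_min) / steps
    total = 0.0
    for i in range(steps):
        tau = tau_min + (i + 0.5) * d_tau  # midpoint of the i-th slice
        total += x(tau) * h(t - tau)
    return total * d_tau


def rect(t):
    """Unit-height pulse on the interval [0, 1)."""
    return 1.0 if 0.0 <= t < 1.0 else 0.0


# Convolving two unit pulses gives a triangle that peaks at t = 1.
for t in (0.5, 1.0, 1.5):
    print(t, round(convolve_integral(rect, rect, t, -1.0, 3.0), 3))
```

The printed values should be close to 0.5, 1.0 and 0.5, the heights of the triangle at those times; a smaller step (larger steps argument) tightens the approximation.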
down in the same column. For example, let's say that we are given two discrete finite-length sequences x[n] and h[n], where x[n] = {a1 a2 a3} and h[n] = {b1 b2 b3 b4}. They are convolved, y[n] = x[n] * h[n], in a way that is similar to regular multiplication, as shown below in Table 2.3.3.
Table 2.3.3
As we were evaluating possible design approaches to achieve low speed, our research took us through the following progression. Figure 2.3.3 shows the convolution flow of two 16-bit numbers, in 4-bit segments. The letters A, B, C, D, E, F, G, and H each represent 4 bits of a 16-bit number. We sum the partial products along each column; HD0 is the LS 4 bits of the product while HD1 is the MS 4 bits of the product.
The digital convolution is summarized as:
1. Flip (reverse) one of the digital functions.
2. Shift it along the time axis by one sample.
3. Multiply the corresponding values of the two digital functions.
4. Sum the products from step 3 to get one point of the digital convolution.
5. Repeat steps 2-4 to obtain the digital convolution at all times that the functions overlap.
For example, let X = [1 2 3 4 5] and v = [-1 5 3 -2 1].
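The flip, shift, multiply and sum procedure can be sketched as follows, applied to the example sequences X and v given above. This is an illustrative software model, not the hardware implementation; flip_and_slide is a hypothetical name.

```python
def flip_and_slide(x, v):
    """Convolve x and v by flipping v and sliding it across x."""
    flipped = v[::-1]                      # flip (reverse) one sequence
    n_out = len(x) + len(v) - 1
    result = []
    for shift in range(n_out):             # shift by one sample each time
        total = 0
        for j, fv in enumerate(flipped):
            # index of x that lines up with flipped[j] at this shift
            i = shift - (len(v) - 1) + j
            if 0 <= i < len(x):
                total += x[i] * fv         # multiply overlapping samples and sum
        result.append(total)               # one point of the convolution
    return result


X = [1, 2, 3, 4, 5]
v = [-1, 5, 3, -2, 1]
print(flip_and_slide(X, v))  # [-1, 3, 10, 15, 21, 33, 10, -6, 5]
```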
Figure 2.3.3.1 convolution results
The discrete convolution of these two discrete signals equals: -1 3 10 15 21 33 10 -6 5. We used Matlab to check the results, which are shown in figure 2.3.3.1. For continuous functions, y(t) = x(t) * h(t), where the input, x(t), and the impulse response, h(t), are sampled with a sufficiently small delta for the result to be accurate. The results are shown in figure 2.3.3.2.
    x = [-2*ones(1,400) zeros(1,1000) 3*ones(1,100)]
    h = ones(1,300);
    conv(x,-3,h,-2,0.01)
High-performance Digital Signal Processing chips have been widely employed to solve signal processing problems. Many of these signal processing solutions can be implemented in a Field Programmable Gate Array (FPGA) instead of a DSP chip. This is possible because the gate densities available in FPGAs have increased rapidly within the last few years and now allow fairly sophisticated DSP algorithms to be implemented within a single chip.
One reported design implements the convolution in an FPGA. Calculating a finite number L of convolution samples with that approach requires approximately 3L+L(L+1)/2 clock cycles, as well as addressing of the two data memories, which costs a lot of access time. In that design the result of the multiplication is extended by six more overflow bits before it is added to the previous sum of products. This is done to prevent overflow, which is costly.
Depending on the application and desired quality (i.e. the width of the filter kernel), computing this weighted sum of neighboring pixels can require significant amounts of computation, thus suggesting a highly parallel implementation in special-purpose hardware. Another work discusses parameterized program generation of convolution filters in an FPGA for applications in image processing, including real-time video and desktop publishing. It shows an example of a 2-D filter pipeline assembled from a set of multipliers and adders, which are in turn generated from a canonical serial-parallel multiplier stage, and a 3x3 convolution filter for video applications. The drawback of that research is the high fan-in; in addition, because of the pipeline delay, output pixels may be rewritten directly into the source image memory. It is important to point out the emerging field of algorithm derivation and implementation, which could be used as a basis for future work.
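The two figures quoted above, the cycle estimate 3L+L(L+1)/2 and the six overflow bits, can be made concrete with a short calculation. This is only a sketch: the 8-bit unsigned product width is an assumption chosen for illustration, not a figure from the cited design, and the function names are hypothetical.

```python
def clock_cycles(num_samples):
    """Approximate cycle count 3L + L(L+1)/2 for L convolution samples."""
    return 3 * num_samples + num_samples * (num_samples + 1) // 2


def fits_accumulator(num_terms, product_bits, guard_bits):
    """True if num_terms maximal unsigned products fit in the widened accumulator."""
    max_product = (1 << product_bits) - 1
    accumulator_max = (1 << (product_bits + guard_bits)) - 1
    return num_terms * max_product <= accumulator_max


for L in (8, 16, 32):
    print(L, clock_cycles(L))  # 8 60, 16 184, 32 624

# Assumed 8-bit products with six guard bits: 2**6 = 64 worst-case terms fit.
print(fits_accumulator(64, 8, 6))  # True
print(fits_accumulator(65, 8, 6))  # False
```

The quadratic term dominates the cycle count as L grows, and each extra guard bit doubles the number of worst-case products that can be accumulated without overflow.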
Another work shows that no restrictions are imposed on the convolution length other than that it be composite, but points out that the FPGA implementation is left as future work. Breitzman shows the automatic derivation and implementation of fast convolution algorithms, and Arce-Nazario presents an automated methodology designed for the high-level partitioning of discrete signal transforms onto distributed hardware architectures.
To efficiently control the number of required multipliers, at the cost of a reasonable number of adders, a study was done on a hardware-efficient fast cyclic convolution algorithm. It shows that the I/O cost can be kept low and the throughput rate high, making it much more efficient than previous cyclic convolution implementation methods. However, independently applying this algorithm to prime-length DFTs still incurs a huge hardware cost. Some specific DFT designs remove the multiplication operations, but they require a large number of adders and RAM/ROM resources.
Another approach is to go through Matlab, which is used to automatically generate Verilog code for the hardware implementation of convolution algorithms. This automation is very efficient when the coefficients change. In one reported FIR filter implementation, some inputs go through two consecutive subtraction operators; this optimization can be done when the Verilog code is being automatically generated. That implementation used carry-save adders to accumulate consecutive additions, which are slow compared to other adders, as will be discussed in the next section. Note that the number of required additions depends on the order of iterations. The iteration order for short convolutions should be 4x4, 3x3 and 2x2, as this leads to the lowest implementation cost.
Another research paper shows a substitute algorithm for calculating the convolution that requires less computation time. It shows that CDMA receivers require a long time to acquire signals, mostly due to the use of expensive FFT-based convolvers in the acquisition process. The permutations can usually be stored in lookup tables. This type of implementation is not efficient, since it costs additional hardware for storage and additional time for retrieval.
2.4 Symmetric convolution
In mathematics, symmetric convolution is a special subset of convolution operations in which the convolution kernel is symmetric across its zero point. Many common convolution-based processes, such as Gaussian blur and taking the derivative of a signal in frequency space, are symmetric, and this property can be exploited to make these convolutions easier to evaluate.
The convolution theorem states that a convolution in the real domain can be represented as a pointwise multiplication across the frequency domain of a Fourier transform. Since sine and cosine transforms are related transforms, a modified version of the convolution theorem can be applied, in which the concept of circular convolution is replaced with symmetric convolution. Using these transforms to compute discrete symmetric convolutions is non-trivial, since discrete sine transforms (DSTs) and discrete cosine transforms (DCTs) can be counter-intuitively incompatible for computing symmetric convolution, i.e. symmetric convolution can only be computed between a fixed set of compatible transforms.
2.4.1 Advantages of symmetric convolution
There are a number of advantages to computing symmetric convolutions in DSTs and DCTs in comparison with the more common circular convolution with the Fourier transform. Most notably, the implicit symmetry of the transforms involved is such that only data unable to be inferred through symmetry is required. For instance, using a DCT-II, a symmetric signal need only have the positive half DCT-II transformed, since the frequency domain will implicitly construct the mirrored data comprising the other half. This enables larger convolution kernels to be used with the same cost as smaller kernels circularly convolved on the DFT. Also, the boundary conditions implicit in DSTs and DCTs create edge effects that are often more in keeping with neighboring data than the periodic effects introduced by using the Fourier transform.
When the convolution of two signals is carried out in the time domain, it is referred to as convolution in the time domain, which is what this project deals with. In the time domain the convolution can be continuous or discrete: when it is performed on discrete-time signals it is called convolution in discrete time, and when it is performed with respect to continuous time it is called convolution in continuous time. Convolution in discrete and continuous time is described in the previous chapter.
3.3 Convolution in frequency domain
When two signals are convolved in the frequency domain, it is called convolution in the frequency domain. It can be proved that convolution in the time domain is equivalent to multiplication in the frequency domain.
Proof: Let f, g belong to L1(R^n). Let F be the Fourier transform of f and G be the Fourier transform of g:
$$F(\nu) = \int_{\mathbb{R}^n} f(x)\, e^{-2\pi i\, x \cdot \nu}\, dx$$
$$G(\nu) = \int_{\mathbb{R}^n} g(x)\, e^{-2\pi i\, x \cdot \nu}\, dx$$
where the dot between x and ν indicates the inner product of R^n. Let h be the convolution of f and g:
$$h(z) = \int_{\mathbb{R}^n} f(x)\, g(z - x)\, dx$$
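The step from the definition of h to a product of two integrals can be sketched as follows. This is the standard argument, stated under the assumption that the order of integration may be exchanged (Fubini's theorem, which applies because f and g are in L1); H denotes the Fourier transform of h, and y = z − x is a substitution variable introduced here.

```latex
\begin{aligned}
H(\nu) &= \int_{\mathbb{R}^n} h(z)\, e^{-2\pi i\, z \cdot \nu}\, dz \\
       &= \int_{\mathbb{R}^n} \int_{\mathbb{R}^n} f(x)\, g(z - x)\, dx\; e^{-2\pi i\, z \cdot \nu}\, dz \\
       &= \int_{\mathbb{R}^n} f(x) \left( \int_{\mathbb{R}^n} g(z - x)\, e^{-2\pi i\, z \cdot \nu}\, dz \right) dx \\
       &= \int_{\mathbb{R}^n} f(x) \left( \int_{\mathbb{R}^n} g(y)\, e^{-2\pi i\, (y + x) \cdot \nu}\, dy \right) dx
          \qquad (y = z - x,\ dy = dz) \\
       &= \int_{\mathbb{R}^n} f(x)\, e^{-2\pi i\, x \cdot \nu}\, dx \int_{\mathbb{R}^n} g(y)\, e^{-2\pi i\, y \cdot \nu}\, dy
\end{aligned}
```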
These two integrals are the definitions of F(ν) and G(ν), so the Fourier transform H of h satisfies:
$$H(\nu) = F(\nu) \cdot G(\nu)$$
Hence, it is proved that the convolution in time domain is equivalent to multiplication in frequency domain.
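A discrete counterpart of this result can be checked numerically. The sketch below uses a plain O(N²) DFT written out by hand instead of the continuous Fourier transform of the proof, and zero-pads both sequences so that the circular convolution implied by the DFT equals the linear one. All function names are hypothetical.

```python
import cmath


def dft(x):
    """Direct discrete Fourier transform, O(N^2)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * m / n) for k in range(n))
            for m in range(n)]


def idft(spectrum):
    """Direct inverse discrete Fourier transform, O(N^2)."""
    n = len(spectrum)
    return [sum(spectrum[m] * cmath.exp(2j * cmath.pi * k * m / n) for m in range(n)) / n
            for k in range(n)]


def conv_via_dft(f, g):
    """Multiply zero-padded spectra, then transform back to the time domain."""
    n = len(f) + len(g) - 1
    f_pad = f + [0] * (n - len(f))
    g_pad = g + [0] * (n - len(g))
    product = [a * b for a, b in zip(dft(f_pad), dft(g_pad))]
    return [value.real for value in idft(product)]


approx = conv_via_dft([1, 2, 3], [1, 1])
print([round(v, 6) for v in approx])  # close to [1, 3, 5, 3]
```

The result agrees with the time-domain convolution [1, 3, 5, 3] up to floating-point rounding.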
3.4 General implementation flow
The generalized implementation flow diagram of the project is represented as follows.
Initially, market research should be carried out, covering the previous version of the design and the current requirements on the design. Based on this survey, the specification and the architecture must be identified. Then the RTL modelling should be carried out in Verilog HDL with respect to the identified architecture. Once the RTL modelling is done, it should be simulated and verified for all cases. The functional verification should confirm that the design meets the intended architecture and passes all the test cases. Once the functional verification is complete, the RTL model is taken to the synthesis process. Three operations are carried out in the synthesis process:
Translate
Map
Place and Route
The developed RTL model is first translated into a mathematical equation format that the tool can understand. These translated equations are then mapped to the library, that is, mapped to the hardware. Once the mapping is done, the gates are placed and routed. Before these processes, constraints can be given in order to optimize the design. Finally, the bit file (bitstream) is generated, which holds the design information in binary format and is downloaded to the FPGA board.
3.5 Implementation
In this project, the implementation is carried out by first designing the individual blocks and then combining them into the final architecture. The individual blocks are shown in the block diagram given below:
3.5.1 Block diagram of proposed architecture
The block diagram of the proposed architecture is shown below:
Figure 3.5.1 block diagram of the proposed architecture
3.5.1.1 Multiplexer 4*1 and 8*1:
A multiplexer, sometimes referred to as a "multiplexor" or simply "mux", is a device that selects between a number of input signals. In its simplest form, a multiplexer has two signal inputs, one control input, and one output. In general, a multiplexer selects one of 2^n inputs and directs it to the output, depending on n select lines.
Figure 3.5.1.1.2 8*1 multiplexer
Higher-order multiplexers can be implemented using lower-order multiplexers. A 4*1 multiplexer can be implemented using three 2*1 multiplexers: two select within each pair of inputs and a third selects between their outputs. Similarly, an 8*1 multiplexer can be implemented using two 4*1 multiplexers followed by one 2*1 multiplexer.
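The hierarchical construction can be sketched as a behavioural model. This is a Python illustration rather than the project's Verilog, and the function names mux2, mux4 and mux8 are hypothetical. Note the final 2*1 stage that chooses between the two halves at each level.

```python
def mux2(d0, d1, s):
    """2*1 multiplexer: output d1 when the select line is 1, else d0."""
    return d1 if s else d0


def mux4(d, s1, s0):
    """4*1 multiplexer built from three 2*1 multiplexers."""
    low = mux2(d[0], d[1], s0)
    high = mux2(d[2], d[3], s0)
    return mux2(low, high, s1)


def mux8(d, s2, s1, s0):
    """8*1 multiplexer built from two 4*1 multiplexers and one 2*1 multiplexer."""
    return mux2(mux4(d[0:4], s1, s0), mux4(d[4:8], s1, s0), s2)


inputs = [10, 11, 12, 13, 14, 15, 16, 17]
for sel in range(8):
    s2, s1, s0 = (sel >> 2) & 1, (sel >> 1) & 1, sel & 1
    print(sel, mux8(inputs, s2, s1, s0))  # prints the input at index sel
```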
3.5.1.2 Serial in parallel out block:
A serial-in/parallel-out shift register is similar to the serial-in/serial-out shift register in that it shifts data into internal storage elements and shifts data out at the serial-out (data-out) pin. It is different in that it makes all the internal stages available as outputs. Therefore, a serial-in/parallel-out shift register converts data from serial format to parallel format. If four data bits are shifted in by four clock pulses via a single wire at data-in, the data becomes available simultaneously on the four outputs QA to QD after the fourth clock pulse.
Figure 3.5.1.2.1 Serial in parallel out
The practical application of the serial-in/parallel-out shift register is to convert data from serial format on a single wire to parallel format on multiple wires. For example, the four outputs (QA QB QC QD) could be used to illuminate four LEDs (Light Emitting Diodes).
The above details of the serial-in/parallel-out shift register are fairly simple. It looks like a serial-in/serial-out shift register with taps added to each stage output. Serial data shifts in at SI (Serial Input). After a number of clocks equal to the number of stages, the first data bit in appears at SO (QD) in the above figure. In general, there is no SO pin. The last stage (QD above) serves as SO and is cascaded to the next package if it exists.
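The shifting behaviour can be sketched with a small behavioural model. This is a Python illustration of a cleared 4-stage register (QA to QD) clocked once per input bit, using the serial pattern 1011 as an example input; sipo_shift is a hypothetical name, not the project's Verilog module.

```python
def sipo_shift(serial_bits, stages=4):
    """Shift bits into a cleared register; return (QA, QB, ...) after each clock."""
    q = [0] * stages                # all stages cleared before any data arrives
    history = []
    for bit in serial_bits:
        q = [bit] + q[:-1]          # new bit enters QA, the others move one stage on
        history.append(tuple(q))
    return history


for clock, state in enumerate(sipo_shift([1, 0, 1, 1]), start=1):
    print("after clock", clock, "QA QB QC QD =", state)
# after the fourth clock: QA QB QC QD = (1, 1, 0, 1), i.e. QD QC QB QA = 1 0 1 1
```

After four clocks the first bit shifted in sits in QD and the last in QA, so the four outputs hold the whole serial word at once.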
Figure 3.5.1.2.3 Serial in parallel out wave forms
The shift register has been cleared prior to any data by CLR', an active-low signal, which clears all type D flip-flops within the shift register. Note the serial data pattern 1011 presented at the SI input. This data is synchronized with the clock CLK, as would be the case if it were being shifted in from something like another shift register, for example a parallel-in/serial-out shift register (not shown here). On the first clock at t1, the data 1 at SI is shifted from D to Q of the first shift register stage. After t2 this first data bit is at QB. After t3 it is at QC. After t4 it is at QD. Four clock pulses have shifted the first data bit all the way to the last stage, QD. The second data bit, a 0, is at QC after the fourth clock. The third data bit, a 1, is at QB. The fourth data bit, another 1, is at QA. Thus, the serial data input pattern 1011 is contained in (QD QC QB QA). It is available on the four outputs from just after clock t4 to just before t5. This parallel data must be used or stored between these two times, or it will be lost due to shifting out of the QD stage on the following clocks t5 to t8, as shown above.
3.5.1.3 Binary multiplier:
The binary multiplier used here is a 4-bit multiplier which takes two four-bit inputs and gives an 8-bit output.
[Figure: binary multiplier block with outputs S0 to S7]
Figure 3.5.1.3 binary multiplier
The binary multiplier employed for convolution in the present project has a special characteristic: the internal carry is not forwarded to the next stage. As a result, only seven outputs are obtained. In an ordinary binary multiplier the MSB is nothing but the carry out of the second MSB, and since the carry is not forwarded here, that eighth bit is not produced.
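One way to read "carry not forwarded" is that each output is simply the sum of the partial products in its column, which is exactly the convolution of the two bit sequences. The sketch below illustrates that reading only; it is an assumption about the intended behaviour, not the project's Verilog, and the names to_bits and column_sums are hypothetical.

```python
def to_bits(value, width=4):
    """Bits of value, least significant first."""
    return [(value >> i) & 1 for i in range(width)]


def column_sums(a, b, width=4):
    """Sum the partial products in each column without propagating carries."""
    a_bits, b_bits = to_bits(a, width), to_bits(b, width)
    columns = [0] * (2 * width - 1)          # seven columns for 4-bit inputs
    for i, a_bit in enumerate(a_bits):
        for j, b_bit in enumerate(b_bits):
            columns[i + j] += a_bit & b_bit  # partial product lands in column i + j
    return columns


cols = column_sums(0b1011, 0b1101)
print(cols)                                        # [1, 1, 1, 3, 1, 1, 1]
print(sum(c << k for k, c in enumerate(cols)))     # 143, i.e. 11 * 13
```

Two 4-bit inputs give seven columns, matching the seven outputs described above; weighting each column by its power of two and adding (which is what carry propagation would do) recovers the ordinary product.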
3.5.1.4 Register:
A circuit with flip-flops is considered a sequential circuit even in the absence of combinational logic. Circuits that include flip-flops are usually classified by the function they perform. Two such circuits are registers and counters. A register is a group of flip-flops. Its basic function is to hold information within a digital system so as to make it available to the logic units during the computing process. However, a register may also have additional capabilities associated with it. It may have combinational gates that perform certain data-processing tasks.
Figure 3.5.1.4.1 4 bit register
Various types of registers are available on the market. A simple 4-bit register is shown in Figure 3.5.1.4.1. The common clock input triggers all flip-flops, and the binary data available at the four inputs are transferred into the register. The clear input is useful for clearing the register to an all-0s output. Registers capable of shifting their binary contents in one or both directions are called shift registers. A unidirectional 4-bit shift register that uses only flip-flops is as follows:
In general, a multiplexer has 2^n inputs, n selection lines and one output. Here we are using a 4:1 multiplexer, so it has 4 inputs, 2 selection lines and one output. Based on the selection lines, one input is selected and passed to the output. For the convolution, two 2:1 multiplexer blocks are used. The above figure shows the simulation results of the 4:1 multiplexer.
SIPO
In this block the input is the output of the multiplexer. The serial-in/parallel-out (SIPO) block takes the data from the multiplexer as its input, holds the values for up to four clock cycles, and converts the data from serial to parallel. The above figure shows the simulation results of the serial-in/parallel-out conversion.
BINARY MULTIPLIER
The binary multiplier performs the multiplication operation. Its inputs are the data obtained from the serial-in/parallel-out blocks.
Multiplexer
The data from the binary multiplier is applied to the multiplexer. The multiplexer converts the parallel data into serial data, which is then stored in the register.
Top module
The top module shows the process of convolution. The inputs are applied to the multiplexers. Based on the selection lines, the data is selected and an output is produced in each clock cycle. The output data from the multiplexer is applied to the serial-in/parallel-out block, which converts the data from serial to parallel. The output of the serial-in/parallel-out block is connected to the binary multiplier, which performs the multiplication; its output is then converted from parallel to serial and stored in the register.
4.3 Introduction to FPGA
FPGA stands for Field Programmable Gate Array, a device consisting of an array of logic modules, I/O modules and routing tracks (programmable interconnect). An FPGA can be configured by the end user to implement specific circuitry. Speeds were once up to 100 MHz, but at present they are in the GHz range. Main applications are DSP, FPGA-based computers, logic emulation, ASIC and ASSP. FPGAs are mainly programmed using SRAM (Static Random Access Memory). SRAM is volatile, and the main advantage of SRAM programming technology is re-configurability. Issues in FPGA technology are the complexity of the logic element, clock support, I/O support and interconnections (routing).
In this work, the design of a DWT and IDWT is made using Verilog HDL and is synthesized on an FPGA of the Spartan 3E family through the Xilinx ISE tool. This process includes the following:
Translate
Map
Place and Route
4.3.1 FPGA Flow
The basic implementation of a design on an FPGA has the following steps:
Design Entry
Logic Optimization
Technology Mapping
Placement
Routing
Programming Unit
Configured FPGA
The above shows the basic steps involved in implementation. The initial design entry may be in Verilog HDL, a schematic or a Boolean expression. The Boolean expression is then optimized for area or speed.
In technology mapping, the optimized Boolean expression is transformed into FPGA logic blocks, referred to as slices; area and delay optimization take place here. During placement, algorithms are used to place each block in the FPGA array. Routing assigns the programmable FPGA wire segments to establish connections among the FPGA blocks. The configuration of the final chip is made in the programming unit.
4.4 Synthesis Result
The developed convolution design is simulated and its functionality verified. Once the functional verification is done, the RTL model is taken to the synthesis process using the Xilinx ISE tool. In the synthesis process, the RTL model is converted to a gate-level netlist mapped to a specific technology library. Many different devices of the Spartan 3E family are available in the Xilinx ISE tool. To synthesize this design, the device XC3S500E was chosen, with package FG320 and speed grade -4. The design was synthesized and its results were analyzed as follows.
Synthesis Report:
Figure 4.4.2
Figure 4.4.4
A Hardware Description Language (HDL) is a language used to describe a digital system, for example, a computer or a component of a computer. One may describe a digital system at several levels. For example, an HDL might describe the layout of the wires, resistors and transistors on an Integrated Circuit (IC) chip, i.e., the switch level. Or, it might describe the logical gates and flip-flops in a digital system, i.e., the gate level. An even higher level describes the registers and the transfers of vectors of information between registers. This is called the Register Transfer Level (RTL). Verilog supports all of these levels. The industry is currently split on whether Verilog or VHDL is better. Many feel that Verilog is easier to learn and use than VHDL. Verilog was introduced in 1985 by Gateway Design System Corporation, now a part of Cadence Design Systems, Inc.'s Systems Division. Verilog HDL allows a hardware designer to describe designs at a high level of abstraction, such as the architectural or behavioral level, as well as at the lower implementation levels (i.e., gate and switch levels) leading to Very Large Scale Integration (VLSI) Integrated Circuit (IC) layouts and chip fabrication. A primary use of HDLs is the simulation of designs before the designer must commit to fabrication.
5.2 Overview of VHDL:
As the size and complexity of digital systems increase, more computer-aided design tools are introduced into the hardware design process. The early paper-and-pencil design methods have given way to sophisticated design entry, verification and automatic hardware generation tools. The newest addition to these design methodologies is the introduction of hardware description languages (HDLs). Actually, the use of such languages is not new: languages such as CDL, ISP and AHPL have been used for some years. However, their primary application has been the verification of a design's architecture.
They do not have the capability to model designs with a high degree of accuracy; that is, their timing model is not precise and/or their language constructs imply a certain hardware structure. Newer languages such as VHDL have more universal timing models and imply no particular hardware structure.
Hardware description languages have two main applications: documenting a design and modeling it. Good documentation of a design helps to ensure design accuracy and design portability. Since a simulator supports them, the model inherent in an HDL description can be used to validate a design. Prototyping of complicated systems is extremely expensive, and the goal of those concerned with the development of hardware languages is to replace this prototyping process with validation through simulation and silicon compilation.
Once an entity has been modeled, it needs to be validated by the VHDL system. A typical VHDL system consists of an analyzer and a simulator. The analyzer reads in one or more design units contained in a single file and compiles them into a design library after validating the syntax and performing some static semantic checks. The design library is a place in the host environment where compiled design units are stored. The simulator simulates an entity, represented by an entity-architecture pair or by a configuration, by reading in its compiled description from the design library and then performing the following steps:
1. Elaboration
2. Initialization
3. Simulation
VHDL is an acronym for VHSIC Hardware Description Language (VHSIC is an acronym for Very High Speed Integrated Circuits). It is a hardware description language that can be used to model a digital system at many levels of abstraction, ranging from the algorithmic level to the gate level. The complexity of the digital system being modeled could vary from that of a simple gate to a complete digital electronic system, or anything in between. The digital system can also be described hierarchically. Timing can also be explicitly modeled in the same description. The VHDL language can be regarded as an integrated amalgamation of the following languages:
Sequential language.
- Concurrent language
- Net list language
- Timing specifications
- Waveform generation language

Therefore, the language has constructs that enable you to express the concurrent or sequential behavior of a digital system, or to model it as an interconnection of components. All the above constructs may be combined to provide a comprehensive description of the system in a single model. The language not only defines the syntax but also defines very clear simulation semantics for each language construct. Therefore, models written in this language can be verified using a VHDL simulator. It inherits many of its features, especially the sequential part, from the Ada programming language.

Because VHDL provides an extensive range of modeling capabilities, it is often difficult to understand. Fortunately, it is possible to quickly assimilate a core subset of the language that is both easy and simple to understand without learning the more complex features. The complete language, however, has sufficient power to capture descriptions ranging from the most complex chips to complete electronic systems.

5.2.1 Features of VHDL:
The following are the major capabilities that the language provides, along with the features that differentiate it from other hardware description languages.
- The language can be used as an exchange medium between chip vendors and CAD tool users. Different chip vendors can provide VHDL descriptions of their components to system designers. CAD tool users can use it to capture the behavior of the design at a high level of abstraction for functional simulation.
- The language supports hierarchy; that is, a digital system can be modeled as a set of interconnected components, and each component, in turn, can be modeled as a set of interconnected subcomponents.
- The language is not technology specific, but is capable of supporting technology-specific features. It can also support various hardware technologies; for example, you may define new logic
types and new components, and also specify technology-specific attributes. By being technology independent, the same model can be synthesized into different vendor libraries.
- It supports both synchronous and asynchronous timing models.
- Various digital modeling techniques such as finite state machine descriptions, algorithmic descriptions and Boolean equations can be modeled using the language.
- Test benches can be written using the same language to test other VHDL models.

5.3 ModelSim
ModelSim is a verification and simulation tool for VHDL, Verilog, SystemVerilog, and mixed-language designs.

5.3.1 Basic Simulation Flow
The following diagram shows the basic steps for simulating a design in ModelSim.
Figure 5.3.1 Basic Simulation Flow - Overview Lab

In ModelSim, all designs are compiled into a library. You typically start a new simulation in ModelSim by creating a working library called "work". "Work" is the library name used by the compiler as the default destination for compiled design units.

Compiling Your Design
After creating the working library, you compile your design units into it. The ModelSim library format is compatible across all supported platforms. You can simulate your design on any platform without having to recompile your design.

Loading the Simulator with Your Design and Running the Simulation
With the design compiled, you load the simulator with your design by invoking the simulator on a top-level module (Verilog) or a configuration or entity/architecture pair (VHDL). Assuming the design loads successfully, the simulation time is set to zero, and you enter a run command to begin simulation.

Debugging Your Results
If you don't get the results you expect, you can use ModelSim's robust debugging environment to track down the cause of the problem.

5.3.2 Project Flow
A project is a collection mechanism for an HDL design under specification or test. Even though you don't have to use projects in ModelSim, they may ease interaction with the tool and are useful for organizing files and specifying simulation settings. The following diagram shows the basic steps for simulating a design within a ModelSim project.
As you can see, the flow is similar to the basic simulation flow. However, there are two important differences: You do not have to create a working library in the project flow; it is done for you
automatically. Projects are persistent. In other words, they will open every time you invoke ModelSim unless you specifically close them.

5.3.3 Multiple Library Flow
ModelSim uses libraries in two ways: 1) as a local working library that contains the compiled version of your design; 2) as a resource library. The contents of your working library will change as you update your design and recompile. A resource library is typically static and serves as a parts source for your design. You can create your own resource libraries, or they may be supplied by another design team or a third party (e.g., a silicon vendor). You specify which resource libraries will be used when the design is compiled, and there are rules to specify in which order they are searched. A common example of using both a working library and a resource library is one where your gate-level design and testbench are compiled into the working library, and the design references gate-level models in a separate resource library. The diagram below shows the basic steps for simulating with multiple libraries.
Figure 5.3.3. Multiple Library Flow

5.4 Debugging Tools
ModelSim offers numerous tools for debugging and analyzing your design. Several of these tools are covered in subsequent lessons, including:
- Using projects
- Working with multiple libraries
- Setting breakpoints and stepping through the source code
- Viewing waveforms and measuring time
- Viewing and initializing memories
- Creating stimulus with the Waveform Editor
- Automating simulation

5.5 Basic Simulation
Figure 5.5. Basic Simulation Flow - Simulation Lab
5.5.1 Design Files for this Lesson
The sample design for this lesson is a simple 8-bit, binary up-counter with an associated testbench. The pathnames are as follows:
Verilog: <install_dir>/examples/tutorials/verilog/basicSimulation/counter.v and tcounter.v
VHDL: <install_dir>/examples/tutorials/vhdl/basicSimulation/counter.vhd and tcounter.vhd
This lesson uses the Verilog files counter.v and tcounter.v. If you have a VHDL license, use counter.vhd and tcounter.vhd instead. Or, if you have a mixed license, feel free to use the Verilog testbench with the VHDL counter or vice versa.

5.5.2 Create the Working Design Library
Before you can simulate a design, you must first create a library and compile the source code into that library.
1. Create a new directory and copy the design files for this lesson into it.
   Start by creating a new directory for this exercise (in case other users will be working with these lessons).
   Verilog: Copy counter.v and tcounter.v files from /<install_dir>/examples/tutorials/verilog/basicSimulation to the new directory.
   VHDL: Copy counter.vhd and tcounter.vhd files from /<install_dir>/examples/tutorials/vhdl/basicSimulation to the new directory.
2. Start ModelSim if necessary.
   a. Type vsim at a UNIX shell prompt or use the ModelSim icon in Windows. Upon opening ModelSim for the first time, you will see the Welcome to ModelSim dialog. Click Close.
   b. Select File > Change Directory and change to the directory you created in step 1.
3. Create the working library.
   a. Select File > New > Library. This opens a dialog where you specify physical and logical names for the library (Figure 3-2). You can create a new library or map to an existing library. We'll be doing the former.
   b. Type work in the Library Name field (if it isn't already entered automatically).
   c. Click OK.
ModelSim creates a directory called work and writes a specially-formatted file named _info into that directory. The _info file must remain in the directory to distinguish it as a ModelSim library. Do not edit the folder contents from your operating system; all changes should be made from within ModelSim. ModelSim also adds the library to the list in the Workspace (Figure 5.5.2.2) and records the library mapping for future reference in the ModelSim initialization file (modelsim.ini).
Figure 5.5.2.2 work library in work space

When you pressed OK in step 3c above, the following was printed to the Transcript:
vlib work
vmap work work
These two lines are the command-line equivalents of the menu selections you made. Many command-line equivalents will echo their menu-driven functions in this fashion.

5.5.3 Compile the Design
With the working library created, you are ready to compile your source files. You can compile by using the menus and dialogs of the graphic interface, as in the Verilog
example below, or by entering a command at the ModelSim> prompt.
1. Compile counter.v and tcounter.v.
   a. Select Compile > Compile. This opens the Compile Source Files dialog (Figure 5.5.3.1). If the Compile menu option is not available, you probably have a project open. If so, close the project by making the Workspace pane active and selecting File > Close from the menus.
   b. Select both counter.v and tcounter.v modules from the Compile Source Files dialog and click Compile. The files are compiled into the work library.
   c. When compile is finished, click Done.
Figure 5.5.3.1 Compile Source Files Dialog

2. View the compiled design units.
   a. On the Library tab, click the + icon next to the work library and you will see two design units (Figure 5.3.3.3). You can also see their types (Modules, Entities, etc.) and the path to the underlying source files (scroll to the right if necessary).
   b. Double-click test_counter to load the design.
You can also load the design by selecting Simulate > Start Simulation in the menu bar. This opens the Start Simulation dialog. With the Design tab selected, click the + sign next to the work library to see the counter and test_counter modules. Select the test_counter module and click OK (Figure 5.5.3.2).
Figure 5.5.3.2 Loading Design with Start Simulation Dialog

When the design is loaded, you will see a new tab in the Workspace named sim that displays the hierarchical structure of the design (Figure 3-7). You can navigate within the hierarchy by clicking on any line with a + (expand) or - (contract) icon. You will also see a tab named Files that displays all files included in the design.
Figure 5.3.3.3 Verilog Modules Compiled into work Library

5.3.4 Load the Design
1. Load the test_counter module into the simulator.
   a. In the Workspace, click the + sign next to the work library to show the files contained there.
2. View design objects in the Objects pane.
   a. Open the View menu and select Objects. The command line equivalent is:
      view objects
   The Objects pane (Figure 5.3.4.2) shows the names and current values of data objects in the current region (selected in the Workspace). Data objects include signals, nets, registers, constants and variables not declared in a process, generics, and parameters.
Figure 5.3.4.2 Object Pane Displays Design Objects

You may open other windows and panes with the View menu or with the view command. See Navigating the Interface.

5.3.5 Run the Simulation
Now you will open the Wave window, add signals to it, then run the simulation.
1. Open the Wave debugging window.
   a. Enter view wave at the command line.
   You can also use the View > Wave menu selection to open a Wave window. The Wave window is one of several windows available for debugging. To see a list of the other debugging windows, select the View menu. You may need to move or resize the windows to your liking. Window panes within the Main window can be
zoomed to occupy the entire Main window or undocked to stand alone. For details, see Navigating the Interface.
2. Add signals to the Wave window.
   a. In the Workspace pane, select the sim tab.
   b. Right-click test_counter to open a popup context menu.
   c. Select Add > To Wave > All items in region (Figure 5.3.5.1). All signals in the design are added to the Wave window.
Figure 5.3.5.1 Using the Popup Menu to Add Signals to Wave Window

3. Run the simulation.
   a. Click the Run icon in the Main or Wave window toolbar. The simulation runs for 100 ns (the default simulation length) and waves are drawn in the Wave window.
   b. Enter run 500 at the VSIM> prompt in the Main window.
The simulation advances another 500 ns for a total of 600 ns (Figure 5.3.5.2).
Figure 5.3.5.2 Waves Drawn in Wave Window

   c. Click the Run -All icon on the Main or Wave window toolbar. The simulation continues running until you execute a break command or it hits a statement in your code (e.g., a Verilog $stop statement) that halts the simulation.
   d. Click the Break icon. The simulation stops running.

5.4 Xilinx design flow
The first step in implementing a design on an FPGA is the system specification. The specifications describe the kinds of inputs and outputs and the range of values that the kit can accept. After system specification, the next step is the architecture. The architecture describes the interconnections between all the blocks involved in our design. Each block in the architecture, along with its interconnections, is modeled in
either VHDL or Verilog, whichever is more convenient. All these blocks are then simulated and the outputs are verified for correct functioning.
Figure 5.4 Xilinx Implementation Design Flow-Chart.

After the simulation step comes synthesis. This is a very important step in determining whether our design can be implemented on an FPGA kit. Synthesis converts our VHDL code into its functional components, which are vendor specific. After synthesis, the RTL schematic and the technology schematic are generated, along with the timing delays. These timing delays will be present in the FPGA if the design is implemented on it.

Place & Route is the next step, in which the tool places all the components on an FPGA die for optimum performance in terms of both area and speed. The interconnections that will be made can also be seen in this part of the implementation flow. In the post place and route simulation step, the tool performs the simulation taking into account the delays that will be present when the design is implemented on the kit. Delays here mean electrical loading effects, wiring delays, and stray capacitances.

After post place and route comes generating the bit-map file, which means converting the VHDL code into a bit stream that is used to configure the FPGA kit. A bit file is generated when this step is performed. After this comes the final step of downloading the bit
map file onto the FPGA board, which is done by connecting the computer to the FPGA board with a JTAG (Joint Test Action Group) cable; JTAG is an IEEE standard. The bit-map file contains the whole design that is placed on the FPGA die, and the outputs can now be observed on the FPGA LEDs. This step completes the whole process of implementing our design on an FPGA.

5.4.1 Xilinx ISE 10.1 software
Xilinx ISE (Integrated Software Environment) 9.2i software, from Xilinx, is used to design digital circuits and implement them on a Spartan-3E FPGA device. It is used to design the application, verify the functionality, and finally download the design onto a Spartan-3E FPGA device.

5.4.2 Xilinx ISE 10.1 software tools
SIMULATION: ISE (Integrated Software Environment) Simulator
SYNTHESIS, PLACE & ROUTE: XST (Xilinx Synthesis Technology) Synthesizer
5.4.3 Design steps using Xilinx ISE 10.1
1. Create an ISE project for the particular embedded system application.
2. Write the assembly code in Notepad or WordPad and generate the Verilog or VHDL module by making use of the assembler.
3. Check syntax for the design.
4. Create a Verilog test fixture for the design.
5. Simulate the test bench waveform (behavioral simulation) for functional verification of the design using the ISE simulator.
6. Synthesize and implement the top-level module using the XST synthesizer.
CHAPTER 6: CONCLUSION
In this paper, we presented an optimized implementation of convolution. The model is fine-tuned for signal processing, and the implementation is optimized for operation, power and area. To accurately analyze our proposed system, we coded our design in the Verilog hardware description language and synthesized it using Xilinx tools. We also implemented an illustrative 4x4 convolver; the presented concept can similarly be extended to the NxN case. The functionality of the convolver was tested and verified successfully on a Xilinx Spartan-3E FPGA and design compiler. The proposed circuit consumes only 5 mW, saves almost 35% of the area, and takes 20 ns to complete. This corresponds to a power reduction of more than 50%.

As FPGA technology matures and much larger arrays become practical, techniques that allow the automatic generation of highly parallel architectures will become central to high performance computing. We have described some simple techniques for generation of convolution pipelines for image processing and other applications. Higher level techniques and approaches are also needed. FPGAs permit restructurable processing, and restructurable interconnects are also becoming available.
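The extension to the NxN case mentioned above can be sketched as a parameterized combinational module. This is only an illustration under stated assumptions, not the design that was synthesized: the module name conv_n, the flattened port layout, and the parameters N, W and OW are all hypothetical. The output width default assumes that a sum of up to four products of 4-bit signed samples needs 10 bits.

```verilog
// Hypothetical N-point linear convolution sketch (not the synthesized design).
// Inputs are packed: sample i occupies bits [i*W +: W] of a_flat / b_flat.
// Output k occupies bits [k*OW +: OW] of y_flat, for k = 0 .. 2N-2.
module conv_n #(
    parameter N  = 4,    // samples per input sequence (assumed)
    parameter W  = 4,    // bits per signed input sample (assumed)
    parameter OW = 10    // bits per output sum; 2*W + log2(N) for N = 4
) (
    input  wire [N*W-1:0]        a_flat,
    input  wire [N*W-1:0]        b_flat,
    output reg  [(2*N-1)*OW-1:0] y_flat
);
    integer k, i;
    reg signed [W-1:0]  ai, bi;   // current pair of samples
    reg signed [OW-1:0] acc;      // running sum for output k

    always @* begin
        // Defaults so that no latches are inferred for the temporaries.
        ai  = 0;
        bi  = 0;
        acc = 0;
        for (k = 0; k < 2*N-1; k = k + 1) begin
            acc = 0;
            for (i = 0; i < N; i = i + 1) begin
                // y[k] = sum over i of a[i] * b[k-i], where k-i is a valid index
                if ((k - i >= 0) && (k - i < N)) begin
                    ai  = a_flat[i*W +: W];
                    bi  = b_flat[(k-i)*W +: W];
                    acc = acc + ai * bi;
                end
            end
            y_flat[k*OW +: OW] = acc;
        end
    end
endmodule
```

With the default parameters this computes the same seven sums as the 4x4 case. For larger N, OW would have to be widened accordingly, and a registered or pipelined structure would probably be needed to meet timing.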
memory_SIPO sp1 (.CLOCK(CLK), .load(load), .data_in(r11),
                 .DATA_OUT_1(DATA_OUT_1), .DATA_OUT_2(DATA_OUT_2),
                 .DATA_OUT_3(DATA_OUT_3), .DATA_OUT_4(DATA_OUT_4));
memory_SIPO sp2 (.CLOCK(CLK), .load(load1), .data_in(r12),
                 .DATA_OUT_1(DATA_OUT_5), .DATA_OUT_2(DATA_OUT_6),
                 .DATA_OUT_3(DATA_OUT_7), .DATA_OUT_4(DATA_OUT_8));
//memory sp1 (.CLK(CLK),.RST(RST),.serial_in(r11),.parallel_out0(parallel_out10),.parallel_out1(parallel_out11),.parallel_out2(parallel_out12),.parallel_out3(parallel_out13));
//memory sp2 (.CLK(CLK),.RST(RST),.serial_in(r12),.parallel_out0(parallel_out20),.parallel_out1(parallel_out21),.parallel_out2(parallel_out22),.parallel_out3(parallel_out23));
bm bm1 (.CLK(CLK), .RST(RST),
        .a0(DATA_OUT_1), .a1(DATA_OUT_2), .a2(DATA_OUT_3), .a3(DATA_OUT_4),
        .b0(DATA_OUT_5), .b1(DATA_OUT_6), .b2(DATA_OUT_7), .b3(DATA_OUT_8),
        .s0(r0), .s1(r1), .s2(r2), .s3(r3), .s4(r4), .s5(r5), .s6(r6));
mux81 m3 (.CLK(CLK), .RST(RST),
          .a0(r0), .a1(r1), .a2(r2), .a3(r3),
          .a4(r4), .a5(r5), .a6(r6), .a7(r7),
          .s(l1), .o1(r));
register ro1 (.CLK(CLK), .RST(RST), .r(r), .out(outfinal));
endmodule
reg
output reg load;
always @(posedge CLK)
begin
    if (RST == 1'b0)
    begin
        o1 = 4'bzzzz;
        load = 1'b0;
    end
    else
    begin
        case (s)
            2'b00 : o1 = a0;
            2'b01 : o1 = a1;
            2'b10 : o1 = a2;
            2'b11 : o1 = a3;
        endcase
        load = 1'b1;
    end
end
endmodule
8.3 SIPO:
module memory_SIPO(CLOCK, load, data_in, DATA_OUT_1, DATA_OUT_2, DATA_OUT_3, DATA_OUT_4);
//INPUTS
input CLOCK;
input load;
input signed [3:0] data_in;
//OUTPUTS
output signed [3:0] DATA_OUT_1, DATA_OUT_2, DATA_OUT_3, DATA_OUT_4;
//REGISTERS
reg [2:0] cntr;
integer i;
reg [2:0] cntr1;
reg signed [3:0] DATA_OUT_1, DATA_OUT_2, DATA_OUT_3, DATA_OUT_4;
//MEMORY
reg signed [3:0] m [3:0];
//WRITING INTO memory
always @(posedge CLOCK)
begin
    if (!load)
    begin
        cntr <= 3'b0;
        cntr1 <= 3'b0;
        DATA_OUT_1 <= 4'b0;
        DATA_OUT_2 <= 4'b0;
        DATA_OUT_3 <= 4'b0;
        DATA_OUT_4 <= 4'b0;
        for (i = 0; i <= 3; i = i + 1)
            m[i] <= 4'b0;
    end
    else if (cntr <= 2'd3 && load)
    begin
        m[cntr] <= data_in;
        cntr <= cntr + 1;
    end
    else
    begin
        DATA_OUT_1 <= m[0];
        DATA_OUT_2 <= m[1];
        DATA_OUT_3 <= m[2];
        DATA_OUT_4 <= m[3];
        cntr1 <= cntr1 + 1;
    end
end
endmodule
wire signed [7:0] s11,s12,s21,s22,s23,s31,s32,s33,s34,s41,s42,s43,s51,s52,s61;
//wire signed [15:0] st,x1,x2;
always @(posedge CLK)
begin
    if (RST == 1'b0)
    begin
        s0 = 8'bzzzzzzzz;
        s1 = 8'bzzzzzzzz;
        s2 = 8'bzzzzzzzz;
        s3 = 8'bzzzzzzzz;
        s4 = 8'bzzzzzzzz;
        s5 = 8'bzzzzzzzz;
        s6 = 8'bzzzzzzzz;
        // s7=8'b00000000;
    end
    else
    begin
        s0 = so0;
        s1 = so1;
        s2 = so2;
        s3 = so3;
        s4 = so4;
        s5 = so5;
        s6 = so6;
        // s7=so7;
    end
end

assign so0 = a0*b0;

assign s11 = a1*b0;
assign s12 = a0*b1;
assign so1 = s11 + s12;

assign s21 = a0*b2;
assign s22 = a1*b1;
assign s23 = a2*b0;
assign so2 = s21 + s22 + s23;

assign s31 = a0*b3;
assign s32 = a1*b2;
assign s33 = a2*b1;
assign s34 = a3*b0;
assign so3 = s31 + s32 + s33 + s34;

assign s41 = a1*b3;
assign s42 = a2*b2;
assign s43 = a3*b1;
assign so4 = s41 + s42 + s43;

assign s51 = a2*b3;
assign s52 = a3*b2;
assign so5 = s51 + s52;
//assign x1<={8'b00000000,s51};
//assign x2<={8'b00000000,s52};
//assign st = x1 + x2;

assign s61 = a3*b3;
assign so6 = s61;
//assign so6 = s61 [7:0];
//assign so7= so6[15:8];
endmodule
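The seven sums formed by the assign statements above can be checked in isolation with a small self-checking test fixture. The sketch below is a hypothetical stand-alone model, not part of the submitted design: the names conv4_ref and conv4_ref_tb are assumptions, and the outputs are widened to 10 bits because, with 4-bit signed inputs, the sums can exceed the 8-bit signed range that the listing appears to use (for example, all inputs equal to -8 give a centre sum of 256).

```verilog
// Hypothetical combinational reference model of the seven convolution sums.
module conv4_ref (
    input  wire signed [3:0] a0, a1, a2, a3,
    input  wire signed [3:0] b0, b1, b2, b3,
    output wire signed [9:0] y0, y1, y2, y3, y4, y5, y6
);
    assign y0 = a0*b0;
    assign y1 = a1*b0 + a0*b1;
    assign y2 = a0*b2 + a1*b1 + a2*b0;
    assign y3 = a0*b3 + a1*b2 + a2*b1 + a3*b0;
    assign y4 = a1*b3 + a2*b2 + a3*b1;
    assign y5 = a2*b3 + a3*b2;
    assign y6 = a3*b3;
endmodule

// Hypothetical self-checking test fixture for the model above.
module conv4_ref_tb;
    reg  signed [3:0] a0, a1, a2, a3, b0, b1, b2, b3;
    wire signed [9:0] y0, y1, y2, y3, y4, y5, y6;
    integer errors;

    conv4_ref dut (.a0(a0), .a1(a1), .a2(a2), .a3(a3),
                   .b0(b0), .b1(b1), .b2(b2), .b3(b3),
                   .y0(y0), .y1(y1), .y2(y2), .y3(y3),
                   .y4(y4), .y5(y5), .y6(y6));

    // Compare one output against its expected value and count mismatches.
    task check;
        input signed [9:0] got;
        input integer      exp;
        begin
            if (got !== exp) begin
                errors = errors + 1;
                $display("MISMATCH: got %0d, expected %0d", got, exp);
            end
        end
    endtask

    initial begin
        errors = 0;

        // {1,2,3,4} convolved with {1,1,1,1} gives {1,3,6,10,9,7,4}.
        a0 = 1; a1 = 2; a2 = 3; a3 = 4;
        b0 = 1; b1 = 1; b2 = 1; b3 = 1;
        #1;
        check(y0, 1); check(y1, 3); check(y2, 6); check(y3, 10);
        check(y4, 9); check(y5, 7); check(y6, 4);

        // Worst case: every sample is -8, so each product is 64.
        a0 = -8; a1 = -8; a2 = -8; a3 = -8;
        b0 = -8; b1 = -8; b2 = -8; b3 = -8;
        #1;
        check(y0, 64);  check(y1, 128); check(y2, 192); check(y3, 256);
        check(y4, 192); check(y5, 128); check(y6, 64);

        if (errors == 0)
            $display("all checks passed");
        $finish;
    end
endmodule
```

The second stimulus is chosen to exercise the widest intermediate results; the same vectors could be applied to the clocked design to see whether its narrower outputs wrap around.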
: o1 = a7;
8.6 Register:
module register(CLK, RST, r, out);
input CLK, RST;
input [7:0] r;
output [7:0] out;
reg [7:0] out;
always @(posedge CLK)
begin
    if (RST == 1'b0)
    begin
        out <= 8'bzzzzzzzz;
    end
    else
    begin
        out <= r;
    end
end
endmodule
RESULTS: