Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 6

Design of parallel compination of FIR filter for fast processing Data

1
Deepak Shankhala JECRC, JECRC Foundation, Jaipur (Rajasthan)
1(
Lecturer, Electronics & Communication Engineering raksh.ece@jecrc.ac.in)
Department
3
JECRC, JECRC Foundation, Jaipur (Rajasthan) Mangilal
(
Deepak.ece@jecrc.ac.in) Lecturer, Electronics & Communication Engineering
Department
2
Rakesh kumar kardam JECRC, JECRC Foundation, Jaipur (Rajasthan)
(
Lecturer, Electronics & Communication Engineering mangilal.ece@jecrc.ac.in)
Department

Abstract— A matched filter is a filter used in communications to “match” a


particular transit waveform. It passes all the signal frequency
For laser beam images, centralizing is an acceptable technique for components while suppressing any frequency components where
determining the beam position. However, some beam images there is only noise and allows to pass the maximum amount of signal
may exhibit significant intensity variation or other distortions power. The purpose of the matched filter is to maximize the signal to
which makes such an approach susceptible to high position noise ratio at the sampling point of a bit stream and to minimize the
uncertainty; in these cases, correlation or matched filtering results probability of undetected errors received from a signal.To achieve the
in excellent stability even in the presence of fluctuating intensity maximum SNR, we want to allow through all the signal frequency
. Matched filtering using simple templates can achieve fairly components,but to Emphasize more on signal frequency components
stable position detection despite a wide range of intensity and that are large and so contribute more to improving the overall SNR.
beam quality variations. However, simple templates may not
always lead to sufficiently accurate results. This work Model for matched filtering
demonstrates a template design that yields more accurate results Matched filtering is a process for detecting a known piece of signal
for good beam quality than could be obtained using the simple or wavelet that is embedded in Noise The filter will maximize the
template, although at the expense of extra processing time required signal to noise ratio (SNR) of the sign being detected with aspect
for template creation. to the noise.
This paper describe the development of FIR filter on Field Consider the model in Figure 2.1 where the input signal is s(t) and
programmable gate array (FPGAs) using Xilinx. FIR filter has been the noise, n(t). The objective is to design a filter, h(t), that maximizes
designed and realized by FPGA for filtering the digital signal t
because the term digital filter arises because these filters operate on the SNR of the output, y(t).
discrete-time signals. So to designing The FIR filter the coefficient
are generate by using MAT Lab FDATOOL and making to delay
with signal Xilinx XC3S400FPGA(ISE 13.1) is considered. Process
is done by taking the 16 Bit signed data for input and 38 bit for signal s(t) Filter
output (6 Bit extra). . Matched filtering using simple templates h(t)
can achieve fairly stable position detection despite a wide range noise n(t)
of intensity and beam quality variations. However, simple Y(t)=Sh(t)+nh(t)
templates may not always lead to sufficiently accurate results.
This work demonstrates a template design that yields more
accurate results for good beam quality than could be obtained
using the simple template, although at the expense of extra Figure 2.1 Model for matched filtering
processing time required for template creation.
If the input signal, s(t), is a wavelet, w(t), and n(t) is white noise,
then matched filter theory states the maximum SNR at the output
LITERATURE SURVEY will occur when the filter has an impulse response that is the time-
However, many relevant image features are characterized by two or reverse of the input wavelet. Note that the convolution of the time-
more orientations. Examples of such multi- oriented patterns are reversed wavelet is identical to cross-correlation of the wavelet
corners, junctions, or the above mentioned checkerboard patterns. with the wavelet (autocorrelation) in the input signal. When the
To estimate multiple orientations, the structure tensor approach for Wavelet is of length, T, then the matched filter is defined by:
estimating a single orientation in images has already been extended
toward double or multiple orientations. In these eigen system-based h(t)= w[T-t]
approaches,
orientation is defined as a vanishing directional This result is derived in many signal processing texts such as
derivative, and the minimum number of directional Ziemer and Tranter (1988) and Lathi (1968), and will not be
derivatives needed to make the observed signal vanish derived in this paper. Essentially, the least squares principle is used
corresponds to the number of orientations. Correspondingly, to maximize the output signal energy with respect to the output
these approaches for orientation estimation cannot distinguish noise energy. The matched filter improves the SNR by reducing the
between the patterns .They detect and estimate the two noises spectral bandwidth to that of the wavelet, and in addition,
orientations, but provide no further information on the reduces the noise within the wavelet’s bandwidth by the shape of
specific underlying pattern. the wavelet’s spectrum. The duration of the wavelet can be small, as
used in radar or sonar, and it can be much larger when used with the
MATCH FILTER BASED POSITION DETECTOR vibroseis source with the acquisition of seismic data. Other
What is a matched filter? applications may involve the detection of weak signals from
satellite transmissions, or the detection of military equipment from resources. Using DA method, the filter can be implemented
visual images. In addition, Kirchhoff migration is a form of 2D or either in bit serial or fully parallel mode to trade bandwidth
3D matched filtering that estimates the location, size, and shape of for area utilization. Assuming coefficients c[n] are known
scattered or diffraction energy. constants, equation (I) can be rewritten as follows:
2 FPGA Implementation of High Speed FIR Filters
y[n] = ∑ c[n] · x[n] n = 0, 1,
a method for implementing high speed Finite Impulse …, N-1
Response (FIR) filters using just registered adders and
hardwired shifts. We extensively use a modified
common sub expression elimination algorithm to reduce (II) Variable x[n] can be represented by:
the number of adders. We target our optimizations to x [n] = ∑ xb [n] · 2b b=0, 1, …,
Xilinx Virtex III devices where we compare our
B-1 (III)
implementations with those produced by Xilinx ISE 3.1
x[b] [n] € [0, 1]
using Distributed Arithmetic.

3 Methamatical analysis of FIR filter : where xb [n] is the bth bit of x[n] and B is the input
DSP functions such as FIR filters and transforms are used in width. Finally, the inner product can be rewritten as follows:
a number of applications such as communication and
multimedia. These functions are major determinants of the
performance and Power consumption of the whole y = ∑ c[n] ∑ xb [k] · y= c[0] (xB-1 [0]2B-1 + xB-2 [0]2B-2
B-1 + x B-2 +
system. Therefore it is important to have good tools for + … + x0 [0]20 ) + c[1] (xB-1 [1]2 B-2 [1]2
Optimizing these functions.
… + x0 [1]20 ) + ……. + c[N-1] (xB-1 [N-1]2B-1 + xB-2
Equation (I) represents the output of an L tap FIR filter, 0
[0]2B-2 + … + x0 [N-1]2 )
which is the convolution of the latest L input samples.
List the number of coefficients h(k) of the filter, and y= (c[0] xB-1 [0] + c[1] xB-1 [1] + … + c[N-1] xB-1 [N-
x(n) represents the input time series. 1])2B-1 +(c[0] xB-2 [0] + c[1] xB-2 [1] + … + c[N-1] xB-2
y[n] = ∑ h[k] x[n-k] k= 0, 1, ..., L-1 B-2
[N- 1])2 + ….. + (c[0] x0 [0] + c[1] x0 [1] + … + c[N-1]
The conventional tapped delay line realization of this x0 [N-1])20
inner product is shown in Figure 5 . 1. This implementation
r a n s l a t e s to L multiplications and L-1 additions per y = ∑ 2b ∑ c[n] · xb [k]
sample to compute the result. This can be implemented using (IV)
a single Multiply Accumulate (MAC) engine, but it would where n=0, 1, …, N-1 and b=0, 1, …, B-1
require L MAC cycles, before the next input sample can be
processed. Using a parallel implementation with L MACs can The coefficients in most of DSP applications for the multiply
speed up the performance L times. accumulate operation are constants. The partial products are
A general purpose multiplier occupies a large area on obtained by multiplying the coefficients ci by multiplying
FPGAs. Since all the multiplications are with constants, the one bit of data xi at a time in AND operation. These
full flexibility of a general purpose multiplier is not required, partial products should be added and the result depend only
and the area can be vastly reduced using techniques develop on the outputs of the input shift registers.
for constant multiplication.Though most of the current The AND functions and adders can be replaced by Look
generation FPGAs such as Virtex II have embedded Up Tables (LUTs) that gives the partial product. This is
multipliers to handle these multiplications, the number of shown in Figure 5.2. Input sequence is fed into the shift
these multipliers is typically limited. register at the input sample rate. The serial output is
Furthermore, the size of these multipliers is limited presented to the RAM based shift registers (registers are not
to only 18 bits, which limits the precision of the shown in Figure for simplicity) at the bit clock rate which is
computations for high speed requirements. The ideal n+1 times (n is number of bits in a data input sample) the
implementation would involve a sharing of the sample rate.
Combinational Logic Blocks (CLBs) and these multipliers. The RAM based shift register stores the data in a particular
In this p a p e r , w e present a t e c h n i q u e that i s address. The outputs of registered LUTs are added and
better than conventional techniques for implementation on loaded to the scaling accumulator from LSB to MSB
the CLBs. and the result which is the filter output will be
accumulated over the time.
For an n bit input, n+1 clock cycles are needed for a
symmetrical filter to generate the output

4 Step of Coding designing and parameter of FIR filter

Step 1 : Coefficient generate through the MATLAB


Step 2 : Decide the level (Till 36) ,product and sum 0 to 31
Step 3 : Port mapping (Input and output )
Figure5.1. A FIR filter block filter input 16 bit
diagram filter output 32 bit
Step 4 : Coefficient declaration
An alternative to the above approach is Distributed Step 5 : Signal declaration (for the product )
Arithmetic (DA) which is a well known method to save Step 6 : Addition and subtraction
Step 7 : Delay pipe line process traces in Figure 2.4. Note the improved SNR of the traces, and that
0 to 80 coefficient even the bottom trace with an initial SNR of 0.2 demonstrates a
Step 8 : Multiply output save as product higher probability of detection.
Step 9 : Resize (match with previous value)

4.1 Coefficient selection step by using MATLAB

The window, optimal and frequency sampling method are the


most commonly used for designing filter coefficient. In this
paper the frequency sampling method is used for designing
the FIR filter.. Flow chart in the Figure 4.1 shows the Figure 2.3 Ten noisy traces and wavelets with a SNR that varies
generation of the coefficient of the FIR filter generated from 2.0 to .2
through MATLAB software. As seen in the flow chart,
coefficients generation requires three inputs, first is the
starting and ends of the band, second is the input sampling
frequency and finally the order of the filter. The coefficient
generated for the eight stages FIR filter so the order of the
filter is sixteen

Figure 2.4 Results of a matched filter from cross-correlating


the wavelet in Figure 2 with the noisy signal
Figure 2.4 illustrated the attenuation of the noise and the
preservation of the wavelet .energy. The shape of the wavelet,
however, has been modified to zero-phase, corresponding to the
cross-correlation of the wavelets being an auto-correlation. The
peak of the zero-phase wavelet identifies the beginning of the initial
4.1 Filter Coefficient Design. wavelet.

4.1.1 Frequency Sampling Method 2.4 2D data


The frequency sampling method allows us to The same principles apply to detecting a 2D wavelet in a 2D signal.
design nonrecursive FIR filter for both standard Such applications are used to detect potential military equipment in
frequency filters (low pass, high pass & band pass filter) video or optical images. In geophysics, it can be used to detect
and filter with arbitrary y frequency response. A unique seismic diffractions in a process we know as seismic migration.
attraction of the frequency sampling method is that it also Note that the matched filter concept tells us that we need to cross-
allows recursive implementation of FIR filters. correlate with the same size and amplitude of the .object. Being
The first example is a short wavelet added to noisy traces. In this detected. Consider the noisy section, the 2D wavelet, and the
situation, the signal to- noise ratio (SNR) is defined by taking the matched filter result in Figure 2.5.The noisy section in (a) has 100
peak amplitude of the wavelet and dividing it by the root-mean- by 100 samples, and contains one barely detectable wavelet. The
squared (RMS) measure of the noise (standard deviation). wavelet in (b) has 21 by 21 samples and is much narrower when
Various amplitudes of the wavelet in Figure 2.2 are added to ten inserted into the grid of the noisy section. The cross-correlation in
noisy traces and are illustrated in Figure 2.2. The SNR of these (c) shows a significant improvement in the matched filter result.
traces varies from 2.0 at the top to 0.2 at the bottom as indicated on
the left side of the figure.

figure 2.5 with a) the noisy section containing the wavelet

FIGURE. 2.2: Wavelet, only first 100 ms (50 points)


used.

figure 2.5 b
The cross-correlation of the known wavelet in Figure 2.2 with the
traces in Figure 2.3 produces the corresponding matched-filter
of frequency samples are zero valued. The transfer function
of an FIR filter, H (z) can be expressed in a recursive form:
𝑛−1
𝑗2𝜋𝑘
⁡⁡
H(z) = 1 − 𝑍 −𝑁 /𝑁 ∑ H(k)/⁡𝑍 −1⁡ 𝑒 𝑁

𝑘=0
figure. 2.6 C: Example Of 2d Matched Filtering

In the cross-correlation process, the amplitudes of the input Thus in recursive form, H(z ) can be viewed as a
wavelets or events are squared and will represent some form of cascade of two filters: a comb filter,
energy. H( z) zeros uniformly distributed around the unit circle,
and a sum of N which has N single all-pole filters,H(z)
4.1.2 No recursive frequency sampling The zero of comb filter and the poles of the single
To obtain the FIR coefficients of the filter pole filters are coincide on the unit circle at points .Thus
whose frequency response is depicted in Figure 2.3 By the zero cancel the pole, making H( z) an FIR as it
taking N samples the frequency response at intervals of effectively has no poles.
Kfs/N, k=0,1,……N-1. In practice, due to finite word length effects the poles of
The filter coefficients can be obtained as inverse DFT of H2 (z) not to be located exactly on unit circle so that
frequency samples. they are not cancelled by the zeros, making H(z) IIR
and potentially unstable. Stability problems can be
avoided by sampling H(z) at a radius, r, slightly less than
𝑛−1 unity. Thus the transfer function in this case becomes
h(n) = 1/𝑁 ∑ H(k)𝑒 𝑗(2𝜋/𝑁)𝑛𝑘 𝑛−1
𝑘=0 𝑗2𝜋𝑘
⁡⁡
Where H(k) k = 0, 1, 2,"""".., N-1, are sample of the ideal H(z) = 1 − 𝑟 𝑁 𝑍 −𝑁 /𝑁 ∑ H(k)/⁡⁡1 − ⁡𝑟𝑍 −1⁡ 𝑒 𝑁
frequency response. The impulse response coefficients of
linear phase FIR filter with positive symmetry, for N even, 𝑘=0

can be expressed as:


𝑁 In general, the frequency samples, H(k) are complex. Thus
−1
2 direct implementation requires complex arithmetic. To
h(n) = 1/𝑁 ∑ 2|H(k)|cos⁡[2πk(n − α) /⁡N] + H(0) avoid this, the symmetry inherent use in frequency
𝑘=1 response of any FIR filters with real impulse, . So above
equation can expressed as

Where " = (N-1)/2, and H(k) are the samples of the 5 FPGA implementation step
frequency response of the filter taken at intervals of k
Fs/N. For N odd, the upper limit in the summation is Input form Image or Video
(N-1)/2.The resulting filter will have exactly the same
frequency response as the original response at the
sampling instants. To obtain a good approximation to the Optical trainer Kit
desired frequency response, a sufficient number of frequency
samples must be taken.
An alternative frequency sampling filter, know
as type 2, results if frequency Digital Data (16 Bit )
Sample taken at intervals of
FPGA Devices
fk = (k+1/2)Fs/N. k = 0,1,2,3…………..,N-1

To improve the amplitude response of frequency samples Digital Data 32 Bit + 6 bit O/P
in the wider transition, introducing frequency samples in Buffer
the transition band. For a low pass filter the stop band Optical trainer kit
attenuation increases, approximately, by 20 dB for each
transition band frequency sample, with a corresponding
increase in the transition width:
Output form Image or Video
Approximate stopband attenuation (25+20M) dB
Approximate transition width (M+1)Fs/N .

where M is the number of transition band frequency samples


and N is the filter length.

4.1.3 Recursive frequency sampling Recursive forms of


the frequency sampling offer significant computational
advantages over the nonrecursive forms if a large number
DSPs 0.00 118 192 61
Quiescent 439.52 - - -
Total 476.06 - - -

8.3.2 B Thermal summary


Table 8.5
Thermal NEEDED

Effective TJA (C/W) 8.31.0


Junction Temp (C 54
Max Ambient (C) 83

Comment on flow chart of coding


A multiplier less technique, based on the add and shift
method and common sub expression elimination for low area,
low power and high speed implementations of FIR filters.
We validated our techniques on Virtex IITM devices where
we observed significant area and power reductions over
traditional Distributed Arithmetic based techniques. In 8 .3.3 C Power Supply Summary
future, we would like to modify our algorithm to make use Power Supply Summary
of the limited number of embedded multipliers available on Table 8.6
the FPGA devices. Total Dynamic Quiescent
7. Hardware performance Total power (mw) 476.06 36.54 439.52
The system above was implemented on a Xilinx Virtex II Pro 8.3.4 D Power Supply Currents
FPGA (part number XCVP50) on a Cray XD1. The FPGA Table 8.7
synthesized system ran at 160 MHz. FPGAs contain a certain Supply source
Supply voltage
Total current Dynamic Quiescent
amount of logic (AND, OR, etc.) and memory (block RAM) (mA) Current(mA) Current
on chip. Any design is converted to a circuit that is Vcc int 1.200 248.28 30.45 217.83
programmed on the FPGA. Our circuit used 69% of the logic Vcc aux 2.500 70.00 0.00 70.00
and 75% of the memory on the FPGA. The algorithm was Vcco 25 2.500 1.25 0.00 1.25
also implemented in Matlab and in C.
8 Results Conclusion
8.1 Device This paper has discussed an effective method for designing
Table 8.1 FIR filter of isolated less power consume and less delay time.
Name Type It presents a parallel designing of filter for image recognition
Family Virtex4 recent years there has been a steady movement towards the
| Part xc4vsx35 development of image recognigation technologies to replace
| Package ff668 or enhance text input called as have Mobile, video Search
Grade Commercial Applications. Recently NASA is working search
Process Typical applications. Future work can include improving the
Speed Grade -10 recognition filter design of the individual image
reorganization by combining the multiple classifiers.
8.2 Default Activity Rates Matched filters are designed to extract the maximum SNR of
Table 8.2 a signal that is buried in noise.

FF Toggle Rate (%) 12.5 References


| I/O Toggle Rate (%) 12.5 1. Burrus C S, #Digital Filters Structures described
| Output Load (pF) (%) 5 by Distributed arithmetic$, IEEE Transactions on
| I/O Enable Rate (%) 100 Circuits and Systems, vol. CAS-24, page: 12,
| BRAM Write Rate (%) 50 December 1977.
| BRAM Enable Rate (%) 25
| DSP Toggle Rate (%) 12.5 2. Parhi K K., #A Systematic Approach for Design
of Digit-serial Signal Processing Architectures$,
Circuits and Systems, 1991.
8.3 Summary
8 .3.1 An On-Chip Power Summary 3. Chapman S. J., # Matlab Programming for
Table 8.4 Engineers$, 3rd Edition, Thomson learning 2005.
On chipPower(mw) used available Utilization
Clocks 36.54 1 --- ---
Logic 0 87 2506 30720
Signals 0.00 11309 --- ---
IOs 0.00 | 124 448 28

You might also like