Frequency Domain Analysis I Me cp7004 Unit2 Notes
Spatial domain
In the spatial domain, we deal directly with the image matrix, whereas in the frequency domain we deal with a
transformed version of the image, as described below.
Frequency Domain
We first transform the image to its frequency distribution. Our black-box system then performs whatever
processing it has to perform, and the output of the black box in this case is not an image but a
transform. After the inverse transformation is applied, the result is converted back into an image, which is then
viewed in the spatial domain.
Here we have used the word transformation. What does it actually mean?
Transformation.
A signal can be converted from the time domain into the frequency domain using mathematical operators called
transforms. There are many kinds of transformation that do this. Some of them are given below.
Fourier Series
Fourier transformation
Laplace transform
Z transform
Of these, we will discuss the Fourier series and the Fourier transform in detail in our next tutorial.
Frequency components
Any image in the spatial domain can be represented in the frequency domain. But what do these frequencies
actually mean?
We will divide frequency components into two major components.
Fourier Series
The Fourier series states that a periodic signal can be represented as a sum of sines and cosines, each
multiplied by a certain weight. It further states that periodic signals can be broken down into further
signals with the following properties.
In the above signal, the last signal is actually the sum of all the signals above it. This was Fourier's
idea.
How it is calculated.
As we have seen, in order to process an image in the frequency domain we first need to convert it into the
frequency domain, and then take the inverse of the output to convert it back into the spatial domain. That is
why both the Fourier series and the Fourier transform have two formulas: one for the conversion and one for
converting back to the spatial domain.
Fourier series
The Fourier series can be denoted by this formula.
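Now consider a discrete function obtained by letting $f_k = f(t_k)$, where $t_k = k\Delta$, with $k = 0, \ldots, N-1$. The discrete Fourier transform $F_n$ can then be written as
$$F_n = \sum_{k=0}^{N-1} f_k \, e^{-2\pi i n k / N} \qquad (3)$$
The inverse transform is then
$$f_k = \frac{1}{N} \sum_{n=0}^{N-1} F_n \, e^{2\pi i k n / N} \qquad (4)$$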
Discrete Fourier transforms are extremely useful because they reveal periodicities in input data as well as
the relative strengths of any periodic components. There are a few subtleties in the interpretation of
discrete Fourier transforms, however. In general, the discrete Fourier transform of a real sequence of
numbers will be a sequence of complex numbers of the same length. In particular, if $f_k$ are real, then
$F_{N-n}$ and $F_n$ are related by
$$F_{N-n} = \overline{F_n} \qquad (5)$$
for $n = 0, 1, \ldots, N-1$, where $\overline{z}$ denotes the complex conjugate. This means that the component
$F_0$ is always real for real data.
As a result of the above relation, a periodic function will contain transformed peaks in not one, but two
places. This happens because the periods of the input data become split into "positive" and "negative"
frequency complex components.
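The two-peak behaviour and the conjugate symmetry can be checked numerically. The following is a minimal sketch using NumPy's `np.fft.fft`; the test signal (one sine wave sampled 50 times over two periods) is chosen here for illustration, and the variable names are not from the original notes.

```python
import numpy as np

N = 50                                   # number of samples
k = np.arange(N)
f = np.sin(2 * np.pi * 2 * k / N)        # two full periods across the N samples

F = np.fft.fft(f)                        # forward DFT (NumPy applies no 1/N factor here)
modulus = np.abs(F)

# The two largest values of the modulus: one "positive" and one "negative" frequency.
peaks = sorted(int(i) for i in np.argsort(modulus)[-2:])

# Conjugate symmetry for real input: F[N - n] equals conj(F[n]) for n = 1 .. N-1.
symmetric = bool(np.allclose(F[1:][::-1], np.conj(F[1:])))

print(peaks, symmetric)
```

The peaks land at n = 2 and n = N − 2 = 48, i.e. the single sine wave shows up in two places.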
The plots above show the real part (red), imaginary part (blue), and complex modulus (green) of the
discrete Fourier transforms of a single sine wave (left figure) and of a sum of two sine waves of different
frequency and strength (right figure), each
sampled 50 times over two periods. In the left figure, the symmetrical spikes on the left and right side are
the "positive" and "negative" frequency components of the single sine wave. Similarly, in the right figure,
there are two pairs of spikes, with the larger green spikes corresponding to the lower-frequency, stronger
component and the smaller green spikes corresponding to the higher-frequency, weaker component. A
suitably scaled plot of the complex modulus of a discrete Fourier transform is commonly known as a
power spectrum.
Mathematica implements the discrete Fourier transform for a list of complex numbers as Fourier[list].
The discrete Fourier transform is a special case of the Z-transform.
The discrete Fourier transform can be computed efficiently using a fast Fourier transform.
Adding an additional factor in the exponent of the discrete Fourier transform gives the so-called
(linear) fractional Fourier transform.
The discrete Fourier transform can also be generalized to two and more dimensions. For example, the plot
above shows the complex modulus of the 2-dimensional discrete Fourier transform of a function of two variables.
Fourier transform
The Fourier transform states that non-periodic signals whose area under the curve is finite
can also be represented as integrals of sines and cosines, each multiplied by a certain weight.
The Fourier transform has many applications, including image compression (e.g. JPEG
compression), filtering, and image analysis.
Each sinusoidal component of an image is described by three quantities:
Spatial Frequency
Magnitude
Phase
The spatial frequency describes how rapidly the brightness varies across the image. The magnitude of the sinusoid
relates directly to the contrast, where contrast is the difference between the maximum and minimum pixel
intensity. The phase describes how the sinusoid is shifted relative to the origin.
The formula for 2 dimensional discrete Fourier transform is given below.
The discrete Fourier transform is the sampled Fourier transform, so it contains a set of samples that
represents the image. In the above formula, f(x,y) denotes the image and F(u,v) denotes its discrete Fourier
transform. The formula for the 2-dimensional inverse discrete Fourier transform is given below.
The inverse discrete Fourier transform converts the Fourier transform back to the image.
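As an illustration of the forward and inverse 2-dimensional transforms, the sketch below evaluates the double sum directly and compares it with NumPy. It assumes one common convention, which may differ from the formula in the original notes by a constant factor: F(u,v) is the sum over x and y of f(x,y)·exp(−j2π(ux/M + vy/N)) with no scaling on the forward pass, and a factor 1/(MN) on the inverse. The function name `dft2_direct` is made up for this example.

```python
import numpy as np

def dft2_direct(f):
    """2D DFT by the direct double sum (assumed convention: no forward scaling)."""
    M, N = f.shape
    x = np.arange(M).reshape(M, 1)
    y = np.arange(N).reshape(N, 1)
    # Separable kernels: W_M[u, x] = exp(-2j*pi*u*x/M), W_N[v, y] = exp(-2j*pi*v*y/N)
    W_M = np.exp(-2j * np.pi * (x @ x.T) / M)
    W_N = np.exp(-2j * np.pi * (y @ y.T) / N)
    # F(u, v) = sum_x sum_y W_M[u, x] * f(x, y) * W_N[v, y]
    return W_M @ f @ W_N.T

rng = np.random.default_rng(0)
image = rng.random((6, 8))               # small random test "image"

F = dft2_direct(image)
recovered = np.fft.ifft2(F).real         # inverse DFT brings the image back

print(np.allclose(F, np.fft.fft2(image)), np.allclose(recovered, image))
```

The direct sum costs on the order of (MN)² operations, which is why fast Fourier transform routines are used in practice.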
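A discrete Fourier transform of length N can be rewritten as the sum of two discrete Fourier transforms of
length N/2, one formed from the even-numbered points and the other from the odd-numbered points. This result is
sometimes called the Danielson-Lanczos lemma. The easiest way to visualize this procedure is perhaps
via the Fourier matrix.
The Sande-Tukey algorithm (Stoer and Bulirsch 1980) first transforms, then rearranges the output values
(decimation in frequency).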
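The even/odd splitting behind the Danielson-Lanczos lemma can be sketched as a short recursive routine. Note that this is a decimation-in-time sketch (the input is split into even and odd samples first), not the Sande-Tukey decimation-in-frequency variant mentioned above; the function name `fft_radix2` is made up for this example and the input length is assumed to be a power of two.

```python
import numpy as np

def fft_radix2(f):
    """Recursive radix-2 FFT (decimation in time); length must be a power of two."""
    f = np.asarray(f, dtype=complex)
    N = f.shape[0]
    if N == 0 or N & (N - 1):
        raise ValueError("length must be a non-zero power of two")
    if N == 1:
        return f
    even = fft_radix2(f[0::2])           # DFT of the even-numbered points
    odd = fft_radix2(f[1::2])            # DFT of the odd-numbered points
    twiddle = np.exp(-2j * np.pi * np.arange(N // 2) / N)
    # Combine the two half-length transforms into the full-length transform.
    return np.concatenate([even + twiddle * odd, even - twiddle * odd])

rng = np.random.default_rng(1)
samples = rng.random(16)
result = fft_radix2(samples)
print(np.allclose(result, np.fft.fft(samples)))
```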
Spatial frequency
Images are 2D functions f(x,y) in spatial coordinates (x,y) in an image plane. Each function describes how
colours or grey values (intensities, or brightness) vary in space:
An alternative image representation is based on spatial frequencies of grey value or colour variations over
the image plane. This dual representation by a spectrum of different frequency components is completely
equivalent to the conventional spatial representation: the direct conversion of a 2D spatial function f(x,y)
into the 2D spectrum F(u,v) of spatial frequencies and the reverse conversion of the latter into a spatial
representation f(x,y) are lossless, i.e. involve no loss of information. Such spectral representation
sometimes simplifies image processing.
To formally define the term "spatial frequency", let us consider a simple 1D periodic function such as the sine
function φ(x) = sin x:
It consists of a fixed pattern or cycle (from x = 0 to x = 2π ≈ 6.28; in particular, φ(0) = 0.0; φ(π/2) = 1.0;
φ(π) = 0.0; φ(3π/2) = −1.0; and φ(2π) = 0.0) that repeats endlessly in both directions. The length of this
cycle, L (in the above example L = 2π), is called the period, or cycle, of the function, and the frequency of
variation is the reciprocal of the period. For a spatial variation where L is measured in distance units, the
spatial frequency of the variation is 1/L. Generally, a sinusoidal curve f(x) = A sin(ωx + θ) is similar to
the above pure sine but may differ in phase θ, period L = 2π/ω (i.e. angular frequency ω), and/or
amplitude A. The pure sine function has unit amplitude A = 1, unit angular frequency ω = 1 (i.e. spatial
frequency 1/L = 1/(2π)), and zero phase θ = 0.
Images below show what x-directed sinusoidal variations of grey values in a synthetic greyscale image
f(x,y) = fmean + A sin((2π/N)ux + θ) look like:
Column 1: the bottom image is half the amplitude/contrast of the top image.
Column 2: the bottom image is twice the spatial frequency of the top image.
Column 3: the bottom image is 90° (θ = π/2) out of phase w.r.t. the top image.
(All the images - from www.luc.edu/faculty/asutter/sinewv2.gif)
Here, u is a dimensionless spatial frequency corresponding to the number of complete cycles of the
sinusoid per image width N, measured in pixels. The ratio u/N gives the spatial
frequency in units of cycles per pixel. To relate the dimensionless spatial frequency parameter u to the
number of complete cycles of the sinusoid that fit into the width of the image from the starting pixel
position x = 0 to the ending position x = N − 1, the function should be specified as f(x,y) = fmean + A
sin((2π/(N−1))ux + θ), so that u corresponds to the number of complete cycles that fit into the width of the
image.
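A synthetic grating of this kind is easy to generate, and its dominant spatial frequency can be read back from the DFT of one row. The sketch below uses the (2π/N) form of the formula; the values of N, u, fmean, A and θ are arbitrary choices for illustration.

```python
import numpy as np

N = 64                                   # image width and height in pixels (assumed)
u = 4                                    # complete cycles per image width (assumed)
f_mean, A, theta = 128.0, 100.0, 0.0     # mean grey value, amplitude, phase (assumed)

x = np.arange(N)
row = f_mean + A * np.sin(2 * np.pi * u * x / N + theta)
image = np.tile(row, (N, 1))             # every row identical: variation along x only

spectrum = np.abs(np.fft.fft(row))
spectrum[0] = 0.0                        # ignore the mean (zero-frequency) term
dominant = int(np.argmax(spectrum[: N // 2]))

print(dominant, dominant / N)            # cycles per image width, cycles per pixel
```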
A few examples of more general 2D sinusoidal functions (products of x- and y-oriented sinusoids):
Images from:
www.canyonmaterials.com/grate4.html and www.xahlee.org/SpecialPlaneCurves_dir/Sinusoid_dir/sinusoid.html
In such artificial images, one can measure spatial frequency by simply counting peaks and troughs. Most
real images lack any strong periodicity, and the Fourier transform is used to obtain and analyse the
frequencies.
Fourier transform
The key idea of Fourier's theory is that any periodic function, however complex it is within the period, can
be exactly (i.e. with no information loss) represented as a weighted sum of simple sinusoids.
However irregular an image may be, it can be decomposed into a set of sinusoidal
components, each having a well-defined frequency. The sine and cosine functions for the decomposition
are called the basis functions of the decomposition. The weighted sum of these basis functions is called a
Fourier series
where the weighting factors for each sine (an) and cosine (bn) function are the Fourier coefficients, and
the index n, specifying the number of cycles of the sinusoid that fit within one period L of the function f(x),
is a dimensionless frequency of a basis function. A 1D function with period L is uniquely represented by
two infinite sequences of coefficients.
The computation of the (usual) Fourier series is based on the integral identities (see on-line Math
Reference Data for more detail):
for m, n ≠ 0, where δmn = 1 if m = n and 0 otherwise is the Kronecker delta function. Since the cosine and
sine functions form a complete orthogonal basis over the interval [−L/2, L/2], the Fourier coefficients are
as follows:
Figures below illustrate steps of summation of sine waves to approach a square wave:
http://www.brad.ac.uk/acad/lifesci/optometry/resources/modules/stage1//pvp1/CSF.html
With only one term, it is a simple sine wave, and adding the next terms brings the sum closer and closer to
a square wave.
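The convergence towards a square wave can be reproduced numerically. The sketch below assumes a square wave of period 2π that takes the values ±1, whose Fourier series contains only odd sine harmonics with weights 4/(πn); the function name and the numbers of terms are chosen for illustration.

```python
import numpy as np

def square_wave_partial_sum(x, n_terms):
    """Sum of the first n_terms odd sine harmonics of a +/-1 square wave (period 2*pi)."""
    total = np.zeros_like(x, dtype=float)
    for k in range(1, n_terms + 1):
        n = 2 * k - 1                    # only odd harmonics contribute
        total += (4 / np.pi) * np.sin(n * x) / n
    return total

x = np.linspace(0, 2 * np.pi, 1000, endpoint=False)
target = np.sign(np.sin(x))              # the square wave being approximated

# Mean squared error of the partial sums for an increasing number of terms.
errors = [float(np.mean((square_wave_partial_sum(x, m) - target) ** 2))
          for m in (1, 3, 10, 50)]
print(errors)
```

The mean squared error shrinks as terms are added, although the overshoot next to each jump (the Gibbs phenomenon) does not disappear.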
The Fourier series decomposition equally holds for 2D images, and the basis consists in this case of 2D
sine and cosine functions. A Fourier series representation of a 2D function, f(x,y), having a period L in
both the x and y directions is:
where u and v are the numbers of cycles fitting into one horizontal and vertical period, respectively, of
f(x,y). In the general case, when both the periods are different (Lx and Ly, respectively) the Fourier series is
quite similar:
The Fourier series representation of f(x,y) can be considered as a pair of 2D arrays of coefficients, each of
infinite extent.
due to e^(jθ) ≡ exp(jθ) = cos θ + j sin θ. The (forward) DFT results in a set of complex-valued Fourier
coefficients F(u,v) specifying the contribution of the corresponding pair of basis images to a Fourier
representation of the image. The term "Fourier transform" is applied either to the process of calculating all
the values of F(u,v) or to the values themselves.
The inverse Fourier transform converting a set of Fourier coefficients into an image is very similar to
the forward transform (except for the sign of the exponent):
The forward transform of an N×N image yields an N×N array of Fourier coefficients that completely
represent the original image (because the latter is reconstructed from them by the inverse transform).
Manipulations with pixel values f(x,y) or Fourier coefficients F(u,v) are called processing in the spatial
domain or frequency (spectral) domain, respectively. The transformation from one domain to another via
a forward or inverse Fourier transform does not, in itself, cause any information loss.
If an array of complex coefficients is decomposed into an array of magnitudes and an array of phases, the
magnitudes correspond to the amplitudes of the basis images in the Fourier representation. The array of
magnitudes is called the amplitude spectrum of the image, and the array of phases is called the
phase spectrum. The power spectrum, or spectral density, of an image is the squared amplitude
spectrum: P(u,v) = |F(u,v)|² = R²(u,v) + I²(u,v), where R and I are the real and imaginary parts of F. The power,
amplitude, and phase spectra can all be
rendered as images themselves for visualisation and interpretation. While the amplitude spectrum reveals
the presence of particular basis images in an image, the phase spectrum encodes their relative shifts. Thus,
without phase information, the spatial coherence of the image is destroyed to such an extent that it is
impossible to recognise the depicted objects. Without amplitude information, the relative brightnesses of
these objects cannot be restored, although the boundaries between them can be found. Because phase is so
important to the overall visual appearance of an image, most image processing operations in the
frequency domain do not alter the phase spectrum and manipulate only the amplitude spectrum.
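The decomposition into amplitude, phase and power spectra can be sketched with NumPy as follows. The random test image and the variable names are assumptions made for this example; with a real photograph, `phase_only` and `amplitude_only` are the reconstructions one would display to compare the roles of phase and amplitude.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((32, 32))             # stand-in for a greyscale image

F = np.fft.fft2(image)
amplitude = np.abs(F)                    # amplitude spectrum |F(u,v)|
phase = np.angle(F)                      # phase spectrum
power = amplitude ** 2                   # power spectrum R^2 + I^2

# Amplitude and phase together reproduce the image exactly.
rebuilt = np.fft.ifft2(amplitude * np.exp(1j * phase)).real

# Reconstructions that discard one of the two spectra.
phase_only = np.fft.ifft2(np.exp(1j * phase)).real      # all amplitudes set to 1
amplitude_only = np.fft.ifft2(amplitude).real           # all phases set to 0

print(np.allclose(rebuilt, image))
```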
The interpretation of spectra is made much easier if the results of the DFT are centred on the point (u = 0,
v = 0), such that frequency increases in any direction away from the origin. This can be done by circular
shifting of the four quadrants of the array or by computing the DFT sums from −N/2 to N/2 rather than from
0 to N. Alternatively, by the shift theorem of the Fourier transform (see Wikipedia for a brief description
of the shift theorem), the same result can be achieved by multiplying f(x,y) by (−1)^(x+y), i.e. by keeping
the input values unchanged for even sums x+y and negating them for odd sums. Typically, all the spectra are
represented with the centre point as the
origin to see and analyse dominant image frequencies. Because the lower frequency amplitudes mostly
dominate over the mid-range and high-frequency ones, the fine structure of the amplitude spectrum can be
perceived only after a non-linear mapping to the greyscale range [0, 255]. Among a host of possible ways,
commonly used methods include a truncated linear amplitude mapping with a scale factor for the
amplitude (the scaled amplitudes are cut off above the level 255; it is a conventional linear mapping if
the scaled maximum amplitude does not exceed 255) or a similar truncated linear mapping after some continuous
non-linear, e.g. logarithmic, amplitude transformation. The examples below show amplitude spectra of the
same image obtained with the linear or truncated linear mapping of the initial amplitudes and the
logarithms of amplitudes:
Original image
Phase spectrum
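Both ways of centring the spectrum, and a logarithmic mapping to the range [0, 255], can be sketched as follows. The sketch assumes an even image size N, for which the quadrant shift and the (−1)^(x+y) trick give identical results; `np.log1p` is one possible choice of logarithmic mapping, not necessarily the one used for the figures.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 16                                   # even image size (assumed)
image = rng.random((N, N))

# Option 1: circularly shift the four quadrants of the spectrum.
centred_a = np.fft.fftshift(np.fft.fft2(image))

# Option 2: multiply f(x, y) by (-1)^(x + y) before the transform.
x, y = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
centred_b = np.fft.fft2(image * (-1.0) ** (x + y))

# Logarithmic mapping of the amplitudes to the greyscale range [0, 255].
log_amplitude = np.log1p(np.abs(centred_a))
display = np.round(255 * log_amplitude / log_amplitude.max()).astype(np.uint8)

print(np.allclose(centred_a, centred_b), display.min(), display.max())
```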
Spectra of simple periodic patterns, e.g. of pure 2D sinusoidal patterns, are the simplest possible because
they correspond to a single basis image:
2D sinusoid
Amplitude spectrum
Other pairs of simple images below (left) and their amplitude Fourier spectra (right) are taken from the
webpage ket.dyndns.org/~durandal/public/:
Image pairs shown: circle, square, periodic circles, periodic squares.
Images below demonstrate a natural digital photo, its power, amplitude, and phase spectra, and the
images reconstructed with the inverse DFT from the spectrum restricted to only higher or only lower
frequencies (images www.aitech.ac.jp/~iiyoshi/ugstudy/ugst.html):
Panels shown: original image, power spectrum, amplitude spectrum, phase spectrum, high-pass filtering, low-pass filtering.
For a more detailed analysis of Fourier transform and other examples of 2D image spectra and filtering,
see introductory materials prepared by Dr. John M. Brayer ( Professor Emeritus, Department of Computer Science,
University of New Mexico, Albuquerque, New Mexico, USA).
Windowing
Fourier theory assumes that not only is the Fourier spectrum periodic, but the input DFT data array is also
a single period of an infinite image repeating itself infinitely in both directions:
Equivalently, the image may be considered on a torus, i.e. wrapped around itself, so that the left and right
sides and the top and bottom coincide. Any mismatch along these coinciding lines, i.e. any discontinuity
in the periodic image, distorts the Fourier spectrum. This problem is minimised by windowing the
images prior to the DFT in such a way as to gradually reduce the pixel values to zero at the edges of the
image. The reduction is performed with a windowing function depending on the distance r of each image
pixel (x,y) from the centre (xc,yc) of the image, r² = (x − xc)² + (y − yc)², and a given maximum distance
rmax. There are several standard windows such as:
the cone-shaped 2D Bartlett window: w(r) = 1 − (r/rmax) if r ≤ rmax and w(r) = 0 otherwise;
the slightly smoother Hanning window: w(r) = 0.5 − 0.5cos[π(1 − r/rmax)] if r ≤ rmax and w(r) = 0
otherwise; or
the similar but narrower Blackman window: w(r) = 0.42 − 0.5cos[π(1 − r/rmax)] + 0.08cos[2π(1 −
r/rmax)] if r ≤ rmax and w(r) = 0 otherwise.
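The three radial windows can be sketched in a few lines. The choice of the centre as the middle of the pixel grid and of rmax as the distance from the centre to the nearest image border are assumptions made for this example; the function name `radial_window` is not from the original notes.

```python
import numpy as np

def radial_window(shape, kind="hanning"):
    """Radial window w(r): 1 at the image centre, falling to 0 at r_max and beyond."""
    rows, cols = shape
    yc, xc = (rows - 1) / 2, (cols - 1) / 2      # centre of the pixel grid (assumed)
    y, x = np.mgrid[0:rows, 0:cols]
    r = np.hypot(x - xc, y - yc)
    r_max = min(xc, yc)                          # reach zero at the nearest border (assumed)
    t = 1 - r / r_max                            # 1 at the centre, 0 at r = r_max
    if kind == "bartlett":
        w = t
    elif kind == "hanning":
        w = 0.5 - 0.5 * np.cos(np.pi * t)
    elif kind == "blackman":
        w = 0.42 - 0.5 * np.cos(np.pi * t) + 0.08 * np.cos(2 * np.pi * t)
    else:
        raise ValueError(f"unknown window: {kind}")
    return np.where(r <= r_max, w, 0.0)

rng = np.random.default_rng(2)
image = rng.random((33, 33))
window = radial_window(image.shape, "hanning")
windowed_spectrum = np.fft.fft2(image * window)  # window the image before the DFT
print(window[16, 16], window[0, 0])
```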
Filtering of images
Filtering in the spatial domain is performed by convolution with an appropriate kernel. This operation
affects spatial frequencies, but it is difficult to quantify this effect in the spatial domain. The spectral
(frequency) domain is more natural for specifying these effects; filtering in the spectral domain is also
computationally simpler because convolution in the spatial domain is replaced with the point-to-point
multiplication of the complex image spectrum by a filter transfer function.
Let F, H, and G denote the spectrum of the image f, the filter transfer function (i.e. the spectrum of the
convolution kernel h), and the spectrum of the filtered image g, respectively. Then the convolution
theorem states that f∗h ⇔ G(u,v) = F(u,v)H(u,v) (the derivation takes into account that the image is
infinite and periodic with the period N in both directions; the sign "⇔" indicates a Fourier transform
pair such that each of its sides is converted into the other by a Fourier transform, i.e. the left-to-right
forward and right-to-left inverse transforms):
The filtered image g can be computed using the inverse DFT. Generally, filtering in the spectral domain
impacts both the amplitude and phase of the image spectrum F(u,v). In practice, most filters do not affect
phases and change only magnitudes, i.e. they are zero-phase-shift filters.
Convolving an image with a certain kernel has the same effect on that image as multiplying the spectrum
of that image by the Fourier transform of the kernel. Therefore, linear filtering can always be performed
either in the spatial or the spectral domains.
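The equivalence can be verified on a small example. The sketch below computes a circular (periodic) convolution directly, which matches the periodicity assumption of the DFT, and compares it with the product of spectra; the kernel is zero-padded to the image size, and the helper name `circular_convolve2d` is made up for this example.

```python
import numpy as np

def circular_convolve2d(f, h):
    """Periodic 2D convolution: g(x, y) = sum over (a, b) of h(a, b) * f(x - a, y - b)."""
    g = np.zeros_like(f, dtype=float)
    for a in range(h.shape[0]):
        for b in range(h.shape[1]):
            if h[a, b] != 0.0:
                # np.roll shifts f so that position (x, y) holds f(x - a, y - b)
                g += h[a, b] * np.roll(f, shift=(a, b), axis=(0, 1))
    return g

rng = np.random.default_rng(3)
f = rng.random((8, 8))                   # image
h = np.zeros((8, 8))
h[:3, :3] = 1 / 9                        # 3x3 averaging kernel, zero-padded to 8x8

g_spatial = circular_convolve2d(f, h)
g_spectral = np.fft.ifft2(np.fft.fft2(f) * np.fft.fft2(h)).real   # G = F * H

print(np.allclose(g_spatial, g_spectral))
```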
Low pass filtering suppresses high frequency components and produces smoothed images. An ideal
sharp cut-off filter simply blocks all frequencies at distances larger than a fixed filter radius r0 from the
centre (0,0) of the spectrum:
H(u,v) = 1 if r(u,v) ≤ r0 and H(u,v) = 0 if r(u,v) > r0
where r(u,v) = [u² + v²]^(1/2) is the distance from the centre of the spectrum. But such a filter produces a
rippled (ringing) effect around the image edges because the inverse DFT of such a filter is a "sinc function",
sin(r)/r. To avoid ringing, a low pass transfer function should fall smoothly to zero. One of the best known
filters of this type is the Butterworth low pass filter of order n: H(u,v) = 1/(1 + [r(u,v)/r0]^(2n)), where the
value r0 defines the distance at which H(u,v) = 0.5 rather than the cut-off radius. As the order increases,
the Butterworth filter approaches the ideal low pass filter. Many other transfer functions perform low pass
filtering. An important example of a smooth and well-behaved spectral filter is a Gaussian transfer
function (its Fourier transform results in another Gaussian).
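The ideal and Butterworth low pass transfer functions can be built directly from the distance r(u,v). In the sketch below the spectrum is kept in NumPy's unshifted layout, so the zero-frequency term sits at index (0, 0) rather than at the centre of the array; the helper names and the test values are assumptions made for illustration.

```python
import numpy as np

def radius_grid(shape):
    """Distance r(u, v) from the zero-frequency term, in the unshifted FFT layout."""
    u = np.fft.fftfreq(shape[0]) * shape[0]      # 0, 1, ..., -2, -1 (integer frequencies)
    v = np.fft.fftfreq(shape[1]) * shape[1]
    return np.hypot(u[:, None], v[None, :])

def ideal_lowpass(shape, r0):
    return (radius_grid(shape) <= r0).astype(float)

def butterworth_lowpass(shape, r0, n):
    return 1.0 / (1.0 + (radius_grid(shape) / r0) ** (2 * n))

def apply_filter(image, H):
    """Multiply the spectrum by the transfer function and transform back."""
    return np.fft.ifft2(np.fft.fft2(image) * H).real

rng = np.random.default_rng(4)
image = rng.random((32, 32))
H = butterworth_lowpass(image.shape, r0=8.0, n=2)
smoothed = apply_filter(image, H)
print(H[0, 0], H[8, 0], smoothed.var() < image.var())
```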
High pass filtering suppresses low frequency components and produces images with enhanced edges.
An ideal high pass filter suppresses all frequencies up to the cutoff one and does not change frequencies
beyond this border:
H(u,v) = 0 if r(u,v) ≤ r0 and H(u,v) = 1 if r(u,v) > r0
Just like the ideal low pass filter, it leads to ringing in the filtered image. This effect is avoided by using a
smoother filter, e.g. the Butterworth high pass filter of order n with the transfer function H(u,v) = 1/(1 +
[r0/r(u,v)]^(2n)). Both the high pass and the low pass Butterworth filters approach the ideal cutoff ones as the
filter order increases.
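A matching sketch for the Butterworth high pass filter is given below. It also checks a property that follows from the two formulas: for the same r0 and n, the high pass and low pass Butterworth transfer functions sum to one at every frequency. The helper names are assumptions made for this example, and the spectrum is again kept in NumPy's unshifted layout.

```python
import numpy as np

def radius_grid(shape):
    """Distance r(u, v) from the zero-frequency term, in the unshifted FFT layout."""
    u = np.fft.fftfreq(shape[0]) * shape[0]
    v = np.fft.fftfreq(shape[1]) * shape[1]
    return np.hypot(u[:, None], v[None, :])

def butterworth_lowpass(shape, r0, n):
    return 1.0 / (1.0 + (radius_grid(shape) / r0) ** (2 * n))

def butterworth_highpass(shape, r0, n):
    r = radius_grid(shape)
    with np.errstate(divide="ignore"):           # r = 0 gives r0 / r = inf, hence H = 0
        return 1.0 / (1.0 + (r0 / r) ** (2 * n))

rng = np.random.default_rng(5)
image = rng.random((32, 32))
H_high = butterworth_highpass(image.shape, r0=8.0, n=2)
H_low = butterworth_lowpass(image.shape, r0=8.0, n=2)

edges = np.fft.ifft2(np.fft.fft2(image) * H_high).real
print(H_high[0, 0], np.allclose(H_high + H_low, 1.0), abs(edges.mean()) < 1e-12)
```

Because H(0,0) = 0, the mean grey value is removed, which is why high pass filtered images are usually rescaled before display.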
Band pass filtering preserves a certain range of frequencies and suppresses all others. Conversely,
band stop filtering suppresses a range of frequencies and preserves all other frequencies. For example, a
band stop filter may combine a low pass filter of radius rlow and a high pass filter of radius rhigh, with rlow <
rhigh. The transfer function of a Butterworth band stop filter with radius r0 = (rlow + rhigh)/2 and the band
width Δ = rhigh − rlow is specified as Hs = 1 / (1 + [Δ r(u,v)/(r²(u,v) − r0²)]^(2n)) where r(u,v) = [u² + v²]^(1/2). The
corresponding band pass filter is Hp = 1 − Hs. Even more versatile filtering is produced by selective
editing of specific frequencies, in particular, the removal of periodic sinusoidal noise by suppressing
narrow spikes in the spectrum.
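The band stop and band pass transfer functions can be sketched in the same way. This assumes the standard Butterworth band stop form, in which the band width multiplies r(u,v) in the numerator; the function name and the radii are chosen for illustration.

```python
import numpy as np

def butterworth_bandstop(shape, r_low, r_high, n):
    """Butterworth band stop filter centred on r0 = (r_low + r_high) / 2."""
    u = np.fft.fftfreq(shape[0]) * shape[0]
    v = np.fft.fftfreq(shape[1]) * shape[1]
    r = np.hypot(u[:, None], v[None, :])         # unshifted FFT layout
    r0 = (r_low + r_high) / 2
    width = r_high - r_low
    with np.errstate(divide="ignore"):           # r = r0 gives an infinite ratio, H = 0
        ratio = width * r / (r ** 2 - r0 ** 2)
    return 1.0 / (1.0 + ratio ** (2 * n))

H_stop = butterworth_bandstop((32, 32), r_low=8.0, r_high=12.0, n=2)
H_pass = 1.0 - H_stop                            # corresponding band pass filter

print(H_stop[0, 0], H_stop[10, 0], H_pass[10, 0])
```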
The convolution kernel h, which models the blurring caused by all degradation sources in the scene and
imaging device, is called the point spread function (PSF). Generally, both h and the noise may vary spatially, but
usually the model is simplified by assuming that the degradation h is spatially invariant. The PSF specifies
how a point source is affected by the imaging process. A perfect process forms, for a point source, an
image with a single bright point and all other pixels zero-valued, while a real process produces an area of
non-zero pixels such that the grey level profile across this area follows the PSF. To deconvolve an image, its
PSF has to be known or estimated from measurements of the image.
The way to invert the convolution is clearer in the spectral domain: in accord with the convolution
theorem, the spectra of the perfect and degraded images, the noise spectrum E, and the DFT of the PSF, called the
modulation transfer function (MTF), relate as follows:
Assuming the noise is negligible, the simplest deconvolution to restore an initial (perfect) image from the
degraded one is based on inverse filtering:
where 1/H(u,v) is the inverse filter to remove the degradation. But in practice this approach encounters
numerous problems, first, with zeros of the MTF (that result in indeterminate or infinite ratios), and
secondly, with noisy data when the term E(u,v)/H(u,v) may become large and dominate F(u,v) which
should be recovered. Empirical solutions to these problems set a threshold on H(u,v) below which the
corresponding values of F(u,v) are set to zero and limit inverse filtering to a certain distance from the
origin of the spectrum. In some cases these heuristics are sufficient to clearly improve the output
(restored) image over the input (degraded) one.
A more theoretically justified solution is the Wiener filtering that minimises the expected squared error
between the restored and perfect images. A simplified Wiener filter is as follows:
where K is a constant value directly proportional to the variance of the noise present in the image and
inversely proportional to the variance of the image with respect to the average grey value. If K = 0 (no
noise), the Wiener filter reduces to a simple inverse filter. If K is large compared with |H(u,v)|, then the
large value of the inverse filtering term 1/H(u,v) is balanced out with the small value of the second term
inside the brackets.
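The inverse and Wiener filters can be compared on a synthetic example. The sketch assumes the usual forms of the relations described above: degraded spectrum G = F·H + E, inverse-filter estimate G/H, and simplified Wiener estimate [(1/H)·|H|²/(|H|² + K)]·G. The test image, the Gaussian PSF, the noise level and the value of K are all arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 64
perfect = np.zeros((N, N))
perfect[24:40, 24:40] = 1.0                      # a bright square on a dark background

# Gaussian PSF centred on pixel (0, 0) with wrap-around, normalised to unit sum.
d = np.fft.fftfreq(N) * N
psf = np.exp(-(d[:, None] ** 2 + d[None, :] ** 2) / 2.0)
psf /= psf.sum()

H = np.fft.fft2(psf)                             # transfer function of the blur
noise = 0.01 * rng.standard_normal((N, N))
degraded = np.fft.ifft2(np.fft.fft2(perfect) * H).real + noise
G = np.fft.fft2(degraded)

def wiener_estimate(G, H, K):
    # conj(H) / (|H|^2 + K) equals (1/H) * |H|^2 / (|H|^2 + K)
    return np.conj(H) / (np.abs(H) ** 2 + K) * G

restored_inverse = np.fft.ifft2(G / H).real
restored_wiener = np.fft.ifft2(wiener_estimate(G, H, K=1e-3)).real

err_inverse = float(np.mean((restored_inverse - perfect) ** 2))
err_wiener = float(np.mean((restored_wiener - perfect) ** 2))
print(err_inverse, err_wiener)
```

In this setup the plain inverse filter amplifies the noise at frequencies where |H| is small, so its error is far larger than that of the Wiener estimate.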
Image smoothing and Sharpening
SMOOTHING/AVERAGING FILTERS
To smooth an image, one might compute an 'NxN pixel moving window average' of the image, e.g. with the 3x3 filter below. Place the centre pixel of the
window over a given pixel, multiply the pixels of the image by the pixels in the window, sum the results, and copy the sum as the value of the output pixel. Then shift
the window one place to the right or down and repeat.
$$S = \begin{bmatrix} 1/9 & 1/9 & 1/9 \\ 1/9 & 1/9 & 1/9 \\ 1/9 & 1/9 & 1/9 \end{bmatrix}$$
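The moving window average described above can be sketched with NumPy's `sliding_window_view`. The sketch only produces output pixels for which the whole 3x3 window lies inside the image, so an N x N input gives an (N−2) x (N−2) output; the function name `moving_average` is made up for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

S = np.full((3, 3), 1 / 9)                       # 3x3 averaging filter

def moving_average(image, kernel=S):
    """Weighted sum of each full window; the window never leaves the image."""
    windows = sliding_window_view(image, kernel.shape)   # shape (N-2, N-2, 3, 3)
    return np.einsum("ijkl,kl->ij", windows, kernel)

image = np.arange(25, dtype=float).reshape(5, 5)
smoothed = moving_average(image)
print(smoothed.shape)
print(smoothed)
```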
Local Histogram Equalisation: Like histogram equalisation, but uses some region about the pixel, not the whole image, to compute the transfer function for each
pixel. Better than global equalisation if the image contains big variations in brightness.
Image Smoothing: Often done to reduce the effect of pixel noise in images. Box average, median or Wiener filtering.
'same': Create an output as long as the central pixel of the filter function lies within the input image; the output is the same size as the input. Default
option.
'valid': All pixels within the filter must be inside the image. The output size of the image is (N-M+1) x (N-M+1).
For images on a black background (e.g. astronomy images) there is no edge effect in the output image. For other images, though, the first 2 options give a dark
border in the output image.
If there is only smooth structure in the true image, smoothing is the correct thing to do: it reduces noise without affecting the underlying image.
In the general case, if the image has fine details then we 'win' something by reducing noise but we 'lose' by removing fine details. What is the optimum size and
form of the filter? One can use adaptive filters (see the next lecture, or the Wiener filter command in MATLAB), where the amount of smoothing changes based on the contents
of the filter window, to find the best compromise between smoothing too little (gives high noise) and too much (loses detail).
SHARPENING FILTERS
Obviously useful to bring out fine details of an image. We want to estimate details on top of a smooth background: find a way to estimate the
background and then remove it from the original. The background can be estimated by smoothing the image; if we remove this smoothed image from the original
we get a sharpened image.
This was implemented before computers using photographic techniques, known as unsharp masking. Take in-focus and out-of-focus
images, make a negative of the second image, overlay it on top of the first, and re-photograph.
I'(m, n) = I(m, n) ∗ (δ − S)
where δ is a filter which is 1 in the centre and zero elsewhere, and the smoothing or blurring matrix S is given by
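$$S = \begin{bmatrix} 1/9 & 1/9 & 1/9 \\ 1/9 & 1/9 & 1/9 \\ 1/9 & 1/9 & 1/9 \end{bmatrix}, \qquad \delta - S = \begin{bmatrix} -1/9 & -1/9 & -1/9 \\ -1/9 & 8/9 & -1/9 \\ -1/9 & -1/9 & -1/9 \end{bmatrix}$$
$$H = \begin{bmatrix} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{bmatrix}, \qquad L = \begin{bmatrix} 0 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 0 \end{bmatrix}$$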
A pure highpass filter often overdoes it: with too much high frequency information, the image cannot be recognised. It is better to add the original image to the
high frequency filtered image.
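$$\delta = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$
This is equivalent to I' = I ∗ (δ + aL) or I' = I ∗ L', where
$$L' = \begin{bmatrix} 0 & -a & 0 \\ -a & 1+4a & -a \\ 0 & -a & 0 \end{bmatrix}$$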
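The sharpening kernels can be tried out with a small 3x3 filtering helper. The sketch assumes δ is the 3x3 identity kernel (1 in the centre, 0 elsewhere), S the 3x3 averaging kernel, and L the 4-neighbour Laplacian with 4 in the centre; the value a = 0.5 and the edge-replicating border handling are arbitrary choices for illustration. All kernels here are symmetric, so correlation and convolution give the same result.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

delta = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=float)
S = np.full((3, 3), 1 / 9)
L = np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]], dtype=float)

def filter3x3(image, kernel):
    """Apply a 3x3 kernel; the border is handled by replicating edge pixels (assumed)."""
    padded = np.pad(image, 1, mode="edge")
    windows = sliding_window_view(padded, (3, 3))
    return np.einsum("ijkl,kl->ij", windows, kernel)

a = 0.5                                          # sharpening strength (assumed)
L_prime = delta + a * L                          # centre 1 + 4a, four neighbours -a

rng = np.random.default_rng(7)
image = rng.random((16, 16))
unsharp = filter3x3(image, delta - S)            # original minus smoothed background
sharpened = filter3x3(image, L_prime)            # original plus a times the Laplacian

print(L_prime)
```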
Selective filters
Low-pass : to extract short-term average or to eliminate high-frequency fluctuations
(e.g. noise filtering, demodulation, etc.)
High-pass : to follow small-amplitude high-frequency perturbations in presence
of much larger slowly-varying component (e.g. recording the electrocardiogram
in the presence of a strong breathing signal)
Band-pass : to select a required modulated carrier frequency out of many (e.g.
radio)
Band-stop : to eliminate single-frequency (e.g. mains) interference (also known
as notch filtering)
If several copies of an image have been obtained from the source, some static image, then
it may be possible to sum the values for each pixel from each image and compute an
average. This is not possible, however, if the image is from a moving source or there are
other time or size restrictions.
If such averaging is not possible, or if it is insufficient, some form of low pass spatial
filtering may be required. There are two main types:
Neighborhood-averaging filters These replace the value of each pixel, a[i,j] say, by a
weighted-average of the pixels in some neighborhood around it, i.e. a weighted sum of
a[i+p,j+q], with p = -k to k, q = -k to k for some positive k; the weights are non-negative
with the highest weight on the p = q = 0 term. If all the weights are equal then this is a
mean filter. Filters of this type are linear.
Median filters This replaces each pixel value by the median of its neighbors, i.e. the
value such that 50% of the values in the neighborhood are above, and 50% are below.
This can be difficult and costly to implement due to the need for sorting of the values.
However, this method is generally very good at preserving edges.
Mode filters Each pixel value is replaced by its most common neighbor. This is a
particularly useful filter for classification procedures where each pixel corresponds to an
object which must be placed into a class; in remote sensing, for example, each class could
be some type of terrain, crop type, water, etc..
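The difference between a mean filter and a median filter shows up clearly on an image with a sharp edge and a few isolated noisy pixels. The sketch below uses a (2k+1) x (2k+1) neighbourhood with k = 1 and replicates edge pixels at the border; the test image and the helper name `neighbourhood_filter` are assumptions made for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def neighbourhood_filter(image, k, reducer):
    """Replace each pixel by reducer(...) over its (2k+1) x (2k+1) neighbourhood."""
    padded = np.pad(image, k, mode="edge")       # border handling (assumed)
    windows = sliding_window_view(padded, (2 * k + 1, 2 * k + 1))
    return reducer(windows, axis=(-2, -1))

clean = np.zeros((20, 20))
clean[:, 10:] = 1.0                              # vertical step edge

noisy = clean.copy()
noisy[5, 3] = 1.0                                # isolated bright pixel
noisy[12, 15] = 0.0                              # isolated dark pixel

mean_out = neighbourhood_filter(noisy, 1, np.mean)
median_out = neighbourhood_filter(noisy, 1, np.median)

print(np.array_equal(median_out, clean), np.array_equal(mean_out, clean))
```

Here the median filter removes both isolated pixels and leaves the edge intact, while the mean filter smears both the noise and the edge.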
The above filters are all space invariant in that the same operation is applied to each pixel
location. A non-space invariant filtering, using the above filters, can be obtained by changing the
type of filter or the weightings used for the pixels for different parts of the image. Non-linear
filters also exist which are not space invariant; these attempt to locate edges in the noisy image
before applying smoothing, a difficult task at best, in order to reduce the blurring of edges due to
smoothing. These filters are not discussed in this tutorial.
In contrast to the wavelet transform, the Fourier transform takes a signal in the time domain (e.g.,
a signal sampled at some frequency) and transforms it into the frequency domain, where the
Fourier transform result represents the frequency components of the signal. Once the signal is
transformed into the frequency domain, we lose all information about time, only frequency
remains.
What would be nice is a signal analysis tool that has the frequency resolution power of the
Fourier transform and the time resolution power of the wavelet transform. To some extent the
wavelet packet transform gives us this tool. I write "to some extent" because, as this web page
shows, the wavelet packet transform does not produce as exact a result as the Fourier transform
and the wavelet packet result is more difficult to interpret. The wavelet packet transform can be
applied to time varying signals, where the simple Fourier transform does not produce a useful
result.
Acknowledgment
Wavelets and digital signal processing are topics that are too complex to cover easily in a set of
web pages, or even in a single book (as my growing library of wavelet and DSP references
attests). The contribution that I see these web pages making is the publication of C++ and Java
code that implements these algorithms. I also have tried to provide some material that explains
these algorithms. But it is obvious to me that my explanation is incomplete and is best read along
with a more detailed explanation, which covers some of the mathematical background. I
recommend Ripples in Mathematics: the Discrete Wavelet Transform by Jensen and la Cour-Harbo, Springer Verlag, 2001. The material on this web page relies heavily on Chapter 9 of
Ripples. Any mistakes on this web page are mine.
Software Download
The C++ code that builds the frequency ordered wavelet packet tree extends the standard wavelet
packet code and is published with this code in a single tar file (see The Wavelet Packet
Transform).
In the standard wavelet packet transform the result of the scaling function (the low pass filter) is
placed in the lower half of the array and the result of the wavelet function (the high pass filter) is
placed in the upper half of the array. The wavelet packet algorithm recursively applies the
wavelet transform to the high and low pass result at each level, generating two new filter results
which have half the number of elements. The standard transform is shown in figure 1. The result
of the wavelet (high pass) filter is shaded.
Figure 1
If we think of the lower and upper halves of the array that results from the wavelet transform as
two children in the wavelet packet tree, a frequency ordered wavelet packet result can be
calculated by inverting the location of the filter results in the right hand child. This is shown in Figure
2. In this diagram the low pass filter is shown as H and the high pass filter is shown as G. The
high pass filter results are shaded as well.
The first wavelet transform produces the result
left child = H1 = {21, 29, 32.5, 36, 21, 13.5, 23.5, 31}
right child = G1 = {11, -9, 4.5, 2, -3, 4.5, -0.5, -3}
The standard wavelet transform is applied to the left child (i.e., the result of the low pass filter H
is placed in the lower half of the result array and the result of the high pass filter G is placed in
the upper half of the array).
A modified wavelet transform is applied to the right child. Here the result of the high pass filter
G is placed in the lower half of the array and the result of the low pass filter H is placed in the
upper half of the array.
Each recursive step generates two more children, where the standard transform is applied to the
left child and the modified transform is applied to the right child.
Figure 2
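The recursion described above can be sketched in Python. This is a minimal illustration, not the C++ code published with this page. It assumes the Haar wavelet in its average and half difference form, which is consistent with the H1 and G1 values listed above, and under that assumption it recovers a 16 sample input from those values. The names `haar_step` and `frequency_ordered_level` are invented for the example.

```python
H1 = [21, 29, 32.5, 36, 21, 13.5, 23.5, 31]
G1 = [11, -9, 4.5, 2, -3, 4.5, -0.5, -3]


def haar_step(x):
    """One Haar step: averages (low pass, H) and half differences (high pass, G)."""
    low = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return low, high


# Assumption: with averages and half differences, each input pair (a, b)
# satisfies a = h + g and b = h - g, so the input can be rebuilt from H1, G1.
signal = []
for h, g in zip(H1, G1):
    signal.extend([h + g, h - g])


def frequency_ordered_level(samples, level):
    """Return the arrays at the given tree level, in frequency order."""
    nodes = [(list(samples), False)]  # (data, node is a right child)
    for _ in range(level):
        children = []
        for data, is_right in nodes:
            low, high = haar_step(data)
            if is_right:
                # Modified transform: G result in the lower half, H in the upper.
                children.append((high, False))
                children.append((low, True))
            else:
                # Standard transform: H result in the lower half, G in the upper.
                children.append((low, False))
                children.append((high, True))
        nodes = children
    return [data for data, _ in nodes]


level1 = frequency_ordered_level(signal, 1)
level2 = frequency_ordered_level(signal, 2)
for band in level2:
    print(band)
```

The only difference from the standard wavelet packet transform is the `is_right` flag: a right child swaps the order in which its two filter results are stored.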
The best time/frequency resolution is obtained by taking a level basis through the tree which
results in a square matrix. For the wavelet packet tree in Figure 2 this is a matrix composed
of the four element arrays that result from the second application of the wavelet
transform. The tree is walked from left to right. If the maximum frequency is f, the first array
will contain the frequency range {0 .. (f/4)-1}, the second array will contain frequencies {f/4 ..
(f/2)-1}, the third array will contain frequencies {f/2 .. (3/4*f)-1} and the fourth array will contain
{3/4*f .. f}. On the time axis there are four time steps.
As Figure 3 shows, these arrays are stacked to form a matrix, where the x-axis is time and the y-axis is frequency.
Figure 3
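The stacking step and the frequency range of each row can be written down directly. The sketch below is an illustration only: the function names are invented, the array values are placeholders rather than transform results, and the ranges are half open intervals instead of the inclusive integer ranges used in the text.

```python
def band_ranges(max_freq, level):
    """Frequency range (low, high) covered by each array of a level basis."""
    count = 2 ** level
    width = max_freq / count
    return [(k * width, (k + 1) * width) for k in range(count)]


def stack_level_basis(arrays):
    """Stack equal length arrays, lowest band first, into a matrix m[y][x]."""
    length = len(arrays[0])
    if any(len(a) != length for a in arrays):
        raise ValueError("all arrays in a level basis must have the same length")
    return [list(a) for a in arrays]


# Placeholder values standing in for four frequency ordered arrays.
arrays = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
m = stack_level_basis(arrays)
ranges = band_ranges(512, 2)

for (low, high), row in zip(ranges, m):
    print(f"{low:6.1f} .. {high:6.1f}: {row}")
```

Here `m[y][x]` holds the value for frequency band `y` at time step `x`, matching the axis assignment described above.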
A square matrix constructed from a level basis can be used to build a three dimensional plot,
where the z-axis is a function of the value at m[y][x]. Many authors use a grey scale plot to
represent the z-axis values (i.e., an x-y plot where the z-dimension is the shade of grey). Math
and statistics environments like Matlab and S+ support grey scale plots. Although I have access
to these packages at work, this material is developed with my own computer resources, so I have
used 3-D surface plots rendered with gnuplot.
A magnitude plot of the result of a Fourier transform of the sampled signal is shown in Figure 5
(here only the relevant part of the plot is shown). This shows a signal of about 51 cycles, which
is what I get when I count the cycles by hand. The data for this discrete Fourier transform (DFT)
plot was calculated using Java code which can be found here.
Figure 5
The Fourier transform plots on this web page do not show adjusted magnitude (where adjMag =
2*Mag/N), so the plotted magnitudes do not directly represent the signal amplitude.
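The adjusted magnitude can be illustrated with a small DFT computed from the definition. This sketch is not the Java code referenced above. It assumes 1024 equally spaced samples of sin(4Pix) starting at 0 with the end point 8Pi excluded, and the function name `dft_magnitudes` is invented for the example. Since 8Pi spans 16Pi (about 50.3) cycles of this sine, the peak falls between two bins and the adjusted magnitude comes out somewhat below the true amplitude of 1.

```python
import cmath
import math

N = 1024
# Assumption: N samples on {0..8Pi}, end point excluded.
signal = [math.sin(4 * math.pi * (8 * math.pi * i / N)) for i in range(N)]


def dft_magnitudes(samples, bins):
    """Magnitudes of the first `bins` DFT coefficients, straight from the definition."""
    n = len(samples)
    mags = []
    for k in range(bins):
        total = sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))
        mags.append(abs(total))
    return mags


mags = dft_magnitudes(signal, 128)  # the signal sits near bin 50
peak_bin = max(range(len(mags)), key=mags.__getitem__)
adj_mag = 2 * mags[peak_bin] / N  # adjMag = 2*Mag/N

print("peak bin:", peak_bin)
print("raw magnitude:", round(mags[peak_bin], 1))
print("adjusted magnitude:", round(adj_mag, 3))
```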
Figure 6(a) shows a frequency/time plot using wavelet packet frequency analysis. As with the
examples above, this plot samples the signal sin(4Pix) in the range {0..8Pi} at 1024 equally
spaced points. A square 32x32 matrix is constructed from the 32 element arrays from the fifth level
of the modified wavelet packet transform tree (where we count from zero, starting at the top
level of the tree, which is the original signal). Frequency is plotted on the x-axis and time on the
y-axis. The z-axis plots the function log(1+s[i]^2). A gradient map is also shown on the x-y plane.
Figure 6(a)
Why are there peaks in this plot? The peaks are formed by the filtered signal at the resolution of
the level basis. Since the z-axis plots log(1+s[i]^2), the part of the sine wave that would be below
the plane is flipped up above the plane.
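The flipping effect follows from the squaring inside the logarithm, as this small sketch shows. The sample values are arbitrary and the function name `z_value` is invented for the example.

```python
import math


def z_value(s):
    """z-axis value for a coefficient s: log(1 + s^2), never negative."""
    return math.log(1 + s * s)


samples = [-2.0, -0.5, 0.0, 0.5, 2.0]
heights = [z_value(s) for s in samples]

for s, z in zip(samples, heights):
    print(f"s = {s:5.1f}  z = {z:.4f}")
```

A coefficient and its negative map to the same height, and a zero coefficient maps to zero, so the plotted surface never dips below the x-y plane.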
The wavelet frequency/time plot in 6(a) is not as easy to interpret as the Fourier transform
magnitude plot. The signal region is shown in Figure 6(b), scaled to show the signal region in
more detail.
Figure 6(b)
The wavelet signal is spread out through a range of about 80 cycles, centered at slightly over
100. The Fourier transform of sin(4Pix) shows that there are 51 cycles in the sample. Is the wavelet
packet transform reporting a value that is double the value reported by the Fourier transform? I
don't know the answer. The wavelet packet transform has been developed in the last decade.
Where books like Richard Lyons' Understanding Digital Signal Processing cover Fourier based
frequency analysis in detail, this depth is lacking in the literature I've seen on the wavelet packet
transform.
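One way to explore the question above is to check which band of a frequency ordered level basis holds most of the signal energy. The sketch below is an illustration under stated assumptions, not the code behind Figure 6. It uses Haar averages and half differences, samples sin(4Pix) at 1024 points with the end point 8Pi excluded, and the function names are invented for the example. With N real samples the highest representable frequency is N/2 cycles, so on that scale each of the 32 bands at level five is 16 cycles wide. If a plot instead labels the 32 bands across 0..1024, every position appears doubled. Whether that accounts for Figure 6(b) is a guess, since the plotting code is not shown here.

```python
import math

N = 1024
LEVEL = 5
signal = [math.sin(4 * math.pi * (8 * math.pi * i / N)) for i in range(N)]


def haar_step(x):
    """One Haar step: averages (low pass) and half differences (high pass)."""
    low = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return low, high


def frequency_ordered_level(samples, level):
    """Arrays at the given level, with right children using the swapped order."""
    nodes = [(list(samples), False)]
    for _ in range(level):
        children = []
        for data, is_right in nodes:
            low, high = haar_step(data)
            first, second = (high, low) if is_right else (low, high)
            children.append((first, False))
            children.append((second, True))
        nodes = children
    return [data for data, _ in nodes]


bands = frequency_ordered_level(signal, LEVEL)
energy = [sum(v * v for v in band) for band in bands]
dominant = max(range(len(energy)), key=energy.__getitem__)

nyquist_width = (N / 2) / len(bands)  # band width if the axis spans 0..N/2
full_width = N / len(bands)           # band width if the axis spans 0..N

print("dominant band:", dominant)
print("range on a 0..N/2 axis:", dominant * nyquist_width, (dominant + 1) * nyquist_width)
print("range on a 0..N axis:  ", dominant * full_width, (dominant + 1) * full_width)
```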
Time Frequency Analysis of a Signal Composed of the Sum of Two Sine Waves
The Fourier transform is a powerful tool for decomposing a signal that is composed of the sum
of sine (or cosine) waves. The plot below superimposes two sine waves, sin(16Pix) and
sin(4Pix).
Figure 7
When these two signals are added together we get the signal shown below in Figure 8 (shown in
detail).
Figure 8
The same signal, plotted through a range of {0..32Pi} and sampled at 1024 equally spaced points
is shown below.
Figure 9
The Fourier transform result in Figure 10 shows that this signal is composed of the 51 cycle
sin(4Pix) signal and another signal of about 200 cycles (sin(16Pix)). This is a case where the
Fourier transform really shines as a signal analysis tool. The two signal components are widely
spaced, allowing clear resolution.
Figure 10
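The separation of the two components can be reproduced with a DFT computed from the definition. This is a sketch under assumptions, not the code that produced Figure 10: it samples the sum at 1024 points on {0..8Pi} with the end point excluded (the range that gives roughly 50 cycles for sin(4Pix)), and the function name is invented for the example.

```python
import cmath
import math

N = 1024
xs = [8 * math.pi * i / N for i in range(N)]
signal = [math.sin(4 * math.pi * x) + math.sin(16 * math.pi * x) for x in xs]


def dft_magnitudes(samples, bins):
    """Magnitudes of the first `bins` DFT coefficients, straight from the definition."""
    n = len(samples)
    return [abs(sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(bins)]


mags = dft_magnitudes(signal, 256)

# Keep bins that are larger than their left neighbour and not smaller than
# their right neighbour, then take the two strongest of those.
local_maxima = [k for k in range(1, len(mags) - 1)
                if mags[k - 1] < mags[k] >= mags[k + 1]]
peaks = sorted(sorted(local_maxima, key=lambda k: mags[k], reverse=True)[:2])

print("peak bins:", peaks)
print("frequency ratio:", round(peaks[1] / peaks[0], 2))
```

The two peaks land about a factor of four apart, which is the relationship the Fourier plot shows between the sin(4Pix) and sin(16Pix) components.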
The wavelet packet transform plotted in Figure 11 shows two signal components, the sin(4Pix)
component we saw in Figures 6(a) and 6(b) and the higher frequency component from
sin(16Pix). Again, the wavelet packet transform result is not entirely clear. As the Fourier
transform result shows, the higher frequency signal component is about four times the frequency
of the lower frequency component. This is not quite the case with the wavelet packet transform,
where the second frequency component appears to be slightly less than four times the frequency
of the sin(4Pix) component. The surface plot also shows two echo artifacts at higher frequencies.
Figure 11
Figure 12
Figure 13 (a) shows a surface plot of the modified wavelet packet transform applied to this signal
(using the Haar wavelet). The surface ridge shows the increasing frequency, although the steps
cannot be clearly isolated, perhaps because the frequency difference between the steps is not
sufficiently large. The ridges above 512 on the frequency spectrum are artifacts.
Figure 13 (a)
Figure 13 (b) shows a gradient plot, using the same data as Figure 13 (a). As with the surface plot
representation, we can see the frequency increase, but the step wise nature of this increase cannot
be seen.
Figure 13 (b)
The modified wavelet packet transform is frequently demonstrated using a "linear chirp" signal.
This is a signal with an exponentially increasing frequency, calculated from the equation:
Figure 14 shows a plot of the linear chirp signal in the region {0..2}, sampled at 1024 points. As
the linear chirp frequency increases, the signal becomes undersampled, which accounts for the
jagged arrowhead shape of the signal around 0 as xi gets closer to 2.
Figure 14
Figure 15 shows the result of the modified wavelet packet transform, using the Haar wavelet,
applied to the linear chirp. The peaks exist because the signal is sampled at a particular resolution
and the absolute value of the signal is plotted. Note that as the frequency increases the peaks
seem to disappear as the signal cycles get close together.
The ridge along the diagonal shows that the signal frequency increases through time. In theory
the linear chirp frequency increases exponentially, not linearly as this plot suggests. However, the
signal is sampled at a finite number of points, so the exponential nature of the signal disappears
as the signal becomes under sampled. The ridges that are perpendicular to the main diagonal line
are artifacts.
Figure 15
In theory the Daubechies D4 wavelet transform (i.e., four scaling (H) and four wavelet
coefficients (G)) is closer than the Haar wavelet transform to a perfect filter that exactly divides
the frequency spectrum. The closer the (H, G) filters are to an ideal filter, the fewer the artifacts
in the wavelet packet transform result. The result of applying the modified wavelet packet
transform, using Daubechies D4 filters, to the linear chirp is shown in Figure 16.
Figure 16
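The claim that D4 is closer to an ideal filter than Haar can be checked numerically by comparing the low pass frequency responses. The sketch below uses the standard D4 scaling coefficients and one common sign convention for the wavelet coefficients. The function name `gain` and the two test frequencies are choices made for the example. An ideal half band low pass filter would have gain 1 at Pi/4 and gain 0 at 3Pi/4.

```python
import cmath
import math

SQRT2 = math.sqrt(2)
SQRT3 = math.sqrt(3)

HAAR_H = [1 / SQRT2, 1 / SQRT2]
D4_H = [(1 + SQRT3) / (4 * SQRT2), (3 + SQRT3) / (4 * SQRT2),
        (3 - SQRT3) / (4 * SQRT2), (1 - SQRT3) / (4 * SQRT2)]
# Wavelet (G) coefficients: reversed scaling coefficients with alternating signs.
D4_G = [D4_H[3], -D4_H[2], D4_H[1], -D4_H[0]]


def gain(coeffs, omega):
    """Frequency response magnitude at omega, scaled so a low pass has gain 1 at 0."""
    response = sum(c * cmath.exp(-1j * omega * k) for k, c in enumerate(coeffs))
    return abs(response) / SQRT2


passband = math.pi / 4      # inside the band a low pass filter should keep
stopband = 3 * math.pi / 4  # inside the band a low pass filter should reject

for name, h in (("Haar", HAAR_H), ("D4", D4_H)):
    print(f"{name:4s} gain at Pi/4: {gain(h, passband):.3f}  "
          f"gain at 3Pi/4: {gain(h, stopband):.3f}")
```

Neither filter is close to a sharp cutoff, which is in line with the artifacts visible in both the Haar and D4 plots.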
The result in Figure 16 is certainly not better than that obtained using the Haar transform and, in
fact, may be worse. In Ripples in Mathematics, the authors give an example of wavelet packet
transform results using Daubechies D12 filters. There are notably fewer artifacts in this case.
Jensen and la Cour-Harbo mention that as the filter length approaches the signal size, the filter
approaches an ideal filter.
The plots in Figures 15 and 16 come from 32x32 matrices (where the original sample consisted
of 1024 points). Time is divided up into 32 regions (as is frequency). Can we get better
time/frequency resolution by decreasing the range of the time regions and increasing the number
of frequency regions?
Figure 17 shows a surface plot of a 16x64 matrix generated from the next "level basis" (i.e., a
horizontal slice through the wavelet packet tree at the next level). As the gradient plot on the x-y
plane shows, the time/frequency localization is not improved.
Figure 17
The plot in Figure 18 is generated from an 8x128 matrix. By further reducing the time regions,
all the frequency bands become compressed into a smaller time region. Multiple frequency bands
become associated with a given time region.
Figure 18
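The matrix sizes used in Figures 15 through 18 follow from the tree level alone. The sketch below is a small illustration with an invented function name. It shows that each step down the tree doubles the number of frequency bands and halves the number of time regions, so the product stays at 1024.

```python
def level_basis_shape(n_samples, level):
    """(time regions, frequency bands) of the level basis at the given tree level."""
    bands = 2 ** level
    return n_samples // bands, bands


shapes = {level: level_basis_shape(1024, level) for level in (5, 6, 7)}

for level, (time_regions, bands) in shapes.items():
    print(f"level {level}: {time_regions} time regions x {bands} frequency bands")
```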
Looking Backward
The modified wavelet packet transform gives us a tool that can be used to analyze time varying
signals. Although this tool can be used in cases where the simple Fourier transform does not
provide good results (either because a time/frequency answer is needed or because the signal
varies through time), the wavelet packet transform is more difficult to use and understand. On
this web page, the wavelet packet transform has been examined using only two wavelets: Haar
and Daubechies D4. Other wavelets can produce better answers. All the examples here use a
"level basis". A non-level basis can be chosen using a cost function (Shannon entropy, as
discussed on this related web page). A non-level basis is even more difficult to interpret (not to
mention plot) since it is composed of several time/frequency regions from the wavelet packet
tree.
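Choosing a non-level basis with a cost function can be sketched as follows. This is a simplified illustration, not the code from the related web page. It assumes an orthonormal Haar step (scaled by 1/sqrt(2)), a unit energy input, the additive Shannon entropy cost, and the standard rather than frequency ordered child order. All function names are invented for the example.

```python
import math


def haar_step(x):
    """Orthonormal Haar step, scaled by 1/sqrt(2) so that energy is preserved."""
    s = math.sqrt(2)
    low = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return low, high


def shannon_cost(values):
    """Additive Shannon entropy cost: -sum(v^2 * log(v^2)), skipping zeros."""
    return -sum(v * v * math.log(v * v) for v in values if v != 0)


def best_basis(node):
    """Return (cost, leaves): keep a node whole unless its children are cheaper."""
    own = shannon_cost(node)
    if len(node) < 2:
        return own, [node]
    low, high = haar_step(node)
    low_cost, low_leaves = best_basis(low)
    high_cost, high_leaves = best_basis(high)
    if low_cost + high_cost < own:
        return low_cost + high_cost, low_leaves + high_leaves
    return own, [node]


signal = [0.5, 0.5, 0.5, 0.5]  # unit energy, constant signal
cost, leaves = best_basis(signal)

print("cost of the untransformed signal:", round(shannon_cost(signal), 4))
print("cost of the chosen basis:", round(cost, 4))
print("leaf lengths:", [len(leaf) for leaf in leaves])
```

The leaves returned for this input have different lengths, which is what makes a non-level basis harder to plot than the square matrices used elsewhere on this page.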
In short, this web page is hardly the last word on using the wavelet packet transform for
time/frequency analysis. At most this web page provides some examples drawn from a complex
topic. I suspect that if I understood wavelet filter bank theory better some of the issues of
time/frequency localization would become clearer.
into successive
(4.7.1)
(4.7.2)
This can be interpreted as the action, on the sequence of N input samples, of two (noncausal)
filters, each described by its impulse response and its transfer function. In the time domain,
the operation is illustrated for an input sequence consisting of eight samples. One of the two
filters is a high-pass filter and the other a low-pass filter. Herein lies the
importance of the filter bank interpretation of the Haar transform. The input sequence is
first split into two versions of lower resolution with respect to the original one: a low-pass
(average) coarser resolution version and a high-pass (difference) detailed resolution one. In the
sequel the coarser resolution version is further split into two versions, and so on. This leads to a
number of versions with a hierarchy of resolutions. This decomposition is known as
multiresolution decomposition.
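The repeated splitting described above can be sketched for an eight sample input. The equations (4.7.1) and (4.7.2) are not reproduced here, so this example assumes the common orthonormal form of the Haar step, a sum and a difference of neighbouring samples scaled by 1/sqrt(2). The function names and the sample values are invented for the example.

```python
import math


def haar_split(x):
    """Split into a low-pass (scaled average) and a high-pass (scaled difference) half."""
    s = math.sqrt(2)
    coarse = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return coarse, detail


def multiresolution(x):
    """Keep splitting the coarse version; return (final coarse, details fine to coarse)."""
    details = []
    coarse = list(x)
    while len(coarse) > 1:
        coarse, detail = haar_split(coarse)
        details.append(detail)
    return coarse, details


samples = [4, 6, 10, 12, 8, 6, 5, 5]
coarse, details = multiresolution(samples)

print("coarsest version:", [round(v, 3) for v in coarse])
for depth, detail in enumerate(details, start=1):
    print(f"detail at split {depth}:", [round(v, 3) for v in detail])
```

Only the coarse half is split again at each step, which is what distinguishes this decomposition from the wavelet packet tree, where both halves are split.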
Multiresolution expansion
Laplacian pyramids
Some applications of Laplacian pyramids
Discrete Wavelet Transform (DWT)
Wavelet theory
Cross correlation, mutual information, template matching, Chamfer matching, etc. are a few
examples of feature correspondence.
4. Transformation Function
aligns the sensed image to the reference image by means of a mapping function.
A few examples of transformation functions are affine, projective, piecewise linear,
thin-plate spline, etc.
5. Resampling
takes the coordinates of the discrete points and transforms them into a
new coordinate system, because the sensed image is a uniformly spaced sample of
a continuous image.
Some examples are nearest neighbor, bilinear, cubic spline, etc.
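The last two steps can be sketched with an affine mapping followed by resampling. This is a minimal illustration with invented function names, a 2x2 placeholder image stored as a list of rows, and no handling of coordinates that fall outside the image.

```python
import math


def affine(point, matrix, offset):
    """Map (x, y) through a 2x2 matrix plus a translation."""
    x, y = point
    (a, b), (c, d) = matrix
    return a * x + b * y + offset[0], c * x + d * y + offset[1]


def nearest_neighbor(image, x, y):
    """Value of the pixel closest to the (possibly fractional) position (x, y)."""
    return image[int(math.floor(y + 0.5))][int(math.floor(x + 0.5))]


def bilinear(image, x, y):
    """Weighted average of the four pixels surrounding (x, y)."""
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1 = min(x0 + 1, len(image[0]) - 1)
    y1 = min(y0 + 1, len(image) - 1)
    fx, fy = x - x0, y - y0
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


image = [[0, 10],
         [20, 30]]  # placeholder pixel values, indexed as image[y][x]

rotated = affine((1, 0), ((0, -1), (1, 0)), (0, 0))  # 90 degree rotation
print("mapped point:", rotated)
print("nearest neighbor at (0.6, 0.2):", nearest_neighbor(image, 0.6, 0.2))
print("bilinear at (0.5, 0.5):", bilinear(image, 0.5, 0.5))
```

Nearest neighbor copies a single pixel, while bilinear blends the four neighbours, which gives smoother results at the price of a little more arithmetic per output pixel.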