IMAGE PROCESSING FOR ENGINEERS
Andrew E. Yagle
The University of Michigan
Fawwaz T. Ulaby
The University of Michigan
Copyright 2018 Andrew E. Yagle and Fawwaz T. Ulaby
This book is published by Michigan Publishing under an agreement with the authors.
It is made available free of charge in electronic form to any student or instructor
interested in the subject matter.
[Figure 1-1 diagram: sensor → image formation → raw image → image processing → improved image → image display, image storage/transmission, or image analysis.]
Figure 1-1 After an image is formed by a sensor, image processing tools are applied for many purposes, including changing its scale and
orientation, improving its information content, or reducing its digital size.
1-1 OPTICAL IMAGERS 3
[Figure: frequency (Hz) at which an object's radiation is most intense, as a function of the object's temperature.]

[Figure: a converging lens of focal length f, with object distance do and image distance di.]
V[n1, m1] = {Vred[n1, m1], Vgreen[n1, m1], Vblue[n1, m1]} distribution: discrete 2-D array of the voltage outputs of the CCD or photodetector array.

B[n2, m2] = {Bred[n2, m2], Bgreen[n2, m2], Bblue[n2, m2]} distribution: discrete 2-D array of the brightness across the LCD array.

◮ Our notation uses parentheses ( ) with continuous-space signals and images, as in Io(x′, y′), and square brackets [ ] with discrete-space images, as in V[n, m]. ◭

Figure 1-5 Spectral sensitivity plots for photodetectors, showing the blue, green, and red response curves versus wavelength λ from 0.40 to 0.70 µm. (Courtesy Nikon Corporation.)
1 µm, it is necessary to use a filter to block the IR spectrum and to place red (R), green (G), or blue (B) filters over each pixel so as to separate the visible spectrum of the incident light into the three primary colors. Thus, the array elements depicted in Fig. 1-4 in red respond to red light, and a similar correspondence applies to those depicted in green and blue. Typical examples of color sensitivity spectra are shown in Fig. 1-5 for a Nikon camera.

Regardless of the specific detection mechanism (CCD or APS), the array output is transferred to a digital storage device with specific markers denoting the location of each element of the array and its color code (R, G, or B). Each array consists of three subarrays, one for red, another for green, and a third for blue. This information is then used to synchronize the output of the 2-D detector array with the 2-D pixel arrangement on an LCD (liquid crystal display) or other electronic displays.

Io(x′, y′; λ): continuous intensity (brightness) in the object plane, with (x′, y′) denoting the coordinates of the object plane.

The three associated transformations are:

(1) Optical Transformation: from Io(x′, y′; λ) to Ii(x, y; λ).

(2) Detection Transformation: from Ii(x, y; λ) to V[n1, m1].

(3) Display Transformation: from V[n1, m1] to B[n2, m2].

Indices [n1, m1] and [n2, m2] vary over certain ranges of discrete values, depending on the chosen notation. For a discrete image, the two common formats are:

(2) Corner Coordinate System: In Fig. 1-7(b), indices n and m of V[n, m] start at 1 (rather than zero). Image size is M × N.
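As a concrete illustration (not from the book), the corner coordinate system maps directly onto 0-based array indexing. The helper below, with hypothetical names, converts the book's 1-based (n, m) indices into NumPy's 0-based ones:

```python
import numpy as np

# A discrete image V[n, m] stored as an M-row by N-column array.
M, N = 4, 6
V = np.arange(M * N).reshape(M, N)

# Corner coordinate system: the book's indices (n, m) start at 1,
# while NumPy arrays are 0-based, so V[n, m] maps to V_arr[n - 1, m - 1].
def pixel(V_arr, n, m):
    """Return V[n, m] using 1-based corner coordinates."""
    return V_arr[n - 1, m - 1]

assert pixel(V, 1, 1) == V[0, 0]          # top-left pixel
assert pixel(V, M, N) == V[M - 1, N - 1]  # bottom-right pixel
```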
6 CHAPTER 1 IMAGING SENSORS
Figure 1-6 Io (x′ , y′ ; λ ) and Ii (x, y; λ ) are continuous scene brightness and image intensities, whereas V [n1 , m1 ] and B[n2 , m2 ] are discrete
images of the detected voltage and displayed brightness, respectively.
where J1(γ) is the first-order Bessel function of the first kind, and

γ = (πD/λ) sin θ. (1.3)

Here, λ is the wavelength of the light (assumed to be monochromatic for simplicity) and D is the diameter of the converging lens. The normalized form of Eq. (1.2) represents the impulse response h(θ) of the imaging system,

h(θ) = Ii(θ)/Io = [2J1(γ)/γ]². (1.4)

In the image plane,

sin θ = √(x² + y²) / √(x² + y² + di²), (1.5)

and Eq. (1.4) can be rewritten as

h(x, y) = Ii(x, y)/I0 = [2J1(γ)/γ]², (1.6)

with

γ = (πD/λ) √(x² + y²) / √(x² + y² + di²). (1.7)
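The PSF above can be evaluated numerically. The sketch below (an illustration, not part of the book) uses SciPy's first-order Bessel function to confirm that h falls to its first null near γ = 3.832, the value used for the Rayleigh criterion:

```python
import numpy as np
from scipy.special import j1  # first-order Bessel function of the first kind

def psf(gamma):
    """Normalized diffraction PSF h = [2 J1(gamma)/gamma]^2, with h -> 1 as gamma -> 0."""
    g = np.asarray(gamma, dtype=float)
    out = np.ones_like(g)
    nz = g != 0
    out[nz] = (2 * j1(g[nz]) / g[nz]) ** 2
    return out

gamma = np.linspace(0.01, 6.0, 10000)
h = psf(gamma)
first_null = gamma[np.argmin(h)]  # location of the minimum on this interval
print(round(first_null, 2))       # ≈ 3.83, matching gamma = 3.832 in the text
```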
For a 2-D image, the impulse response is called the point spread function (PSF). Detector arrays are arranged in rectangular grids; for a pixel at (x, y) in the image plane (Fig. 1-9), the angle θ is as given by Eq. (1.5).

The expressions given by Eqs. (1.2) through (1.7) pertain to coherent monochromatic light. Unless the light source is a laser, the light source usually is panchromatic, in which case the diffraction pattern that would be detected by each of the three-color detector arrays becomes averaged over the wavelength range of that array. The resultant diffraction pattern maintains the general shape of the pattern in Fig. 1-8(b), but it exhibits a gentler variation with θ (with no distinct minima). Here, h(x, y) denotes the PSF in rectangular coordinates relative to the center of the image plane.
[Figure 1-8(b): 1-D profile of the imaged response versus γ.]

[Figure 1-9: imaging geometry for a source s and a lens of diameter D, with γ = (πD/λ) sin θ. Along the y axis: sin θ = y/√(y² + di²). For an image pixel at (x, y): sin θ = √(x² + y²)/√(x² + y² + di²).]
C. Spatial Resolution

Each of the two coherent, monochromatic sources shown in Fig. 1-10 produces a diffraction pattern. If the two sources are sufficiently far apart so that their patterns are essentially distinct, then we should be able to distinguish them from one another. But as we bring them closer together, their diffraction patterns in the image plane start to overlap, making it more difficult to discern their images as those of two distinct sources.

One definition of the spatial resolution capability of the imaging system along the y′ direction is the separation ∆y′min between the two point sources (Fig. 1-10) such that the peak of the diffraction pattern of one of them occurs at the location of the first null of the diffraction pattern of the other one, and vice versa. Along the y direction in the image plane, the first null occurs when [2J1(γ)/γ]² = 0 or, equivalently, γ = 3.832. Use of γ = 3.832 in Eq. (1.3) leads to the angular resolution ∆θmin ≈ 1.22 λ/D, and hence

∆y′min = 1.22 do ∆θmin = 1.22 do λ/D (scene spatial resolution). (1.9b)

This is known as the Rayleigh resolution criterion. Because the lens diameter D is in the denominator, using a larger lens improves spatial resolution. Thus telescopes are made with very large lenses and/or mirrors.

These expressions apply to the y and y′ directions at wavelength λ. Since the three-color detector arrays operate over different wavelength ranges, the associated angular and spatial resolutions are the smallest at λblue ≈ 0.48 µm and the largest at λred ≈ 0.6 µm (Fig. 1-5). Expressions with identical form apply along the x and x′ direction (i.e., upon replacing y and y′ with x and x′, respectively).
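The Rayleigh criterion is a one-line computation. The values chosen below (green light, a 25 mm lens, a 100 m object distance) are illustrative assumptions, not values from the text:

```python
# Rayleigh criterion: delta_theta_min ≈ 1.22 * lam / D (radians).
lam = 0.55e-6   # wavelength (m): green light (assumed value)
D = 25e-3       # lens diameter (m): assumed value
do = 100.0      # object distance (m): assumed value

dtheta_min = 1.22 * lam / D   # angular resolution (rad)
dy_min = do * dtheta_min      # scene resolution, Eq. (1.9b)
print(f"{dtheta_min:.2e} rad, {dy_min * 1e3:.2f} mm")  # 2.68e-05 rad, 2.68 mm
```

Doubling the lens diameter D halves both numbers, which is why larger apertures resolve finer detail.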
D. Detector Resolution

The inherent spatial resolution in the image plane is ∆ymin = di λ/D, but the detector array used to record the image has its own detector resolution ∆p, which is the pixel size of the active pixel sensor. For a black and white imaging camera, to fully capture the image details made possible by the imaging system, the pixel size ∆p should be, at most, equal to ∆ymin. In a color camera, however, the detector pixels of an individual color are not adjacent to one another (see Fig. 1-4), so ∆p should be several times smaller than ∆ymin.

Figure 1-10 The separation between s1 and s2 is such that the peak of the diffraction pattern due to s1 is coincident with the first null of the diffraction pattern of s2, and vice versa.

1-1.2 Thermal IR Imagers

Density slicing is a technique used to convert a parameter of interest from amplitude to pseudocolor so as to enhance the visual display of that parameter. An example is shown in Fig. 1-11, wherein color represents the infrared (IR) temperature of a hot-air balloon measured by a thermal infrared imager. The vertical scale on the right-hand side provides the color-to-temperature correspondence.
Figure 1-11 IR image of a hot-air balloon (courtesy of Ing.-Büro für Thermografie). The spatial pattern is consistent with the fact that warm air rises.
[Figure: the electromagnetic spectrum, spanning gamma rays, X-rays, ultraviolet, infrared (near IR from λ = 0.76 µm to 2 µm, medium-wave IR from 2 µm to 4 µm, long-wave IR from 4 µm to 10³ µm), microwaves, and radio waves.]

[Figure 1-15 diagram: IR lens, detector array, cooler, optics, data processing unit, and IR signal path.]

[Figure 1-14 plot: spectral emissivity (0.75 to 1.0) versus wavelength (3.3 µm to 50 µm) for ocean, vegetation, desert, and snow/ice.]
Figure 1-15 Thermal IR imaging systems often use cryogenic
cooling to improve detection sensitivity.
Figure 1-14 Emissivity spectra for four types of terrain.
(Courtesy the National Academy Press.)
B. Imaging System

The basic configuration of a thermal IR imaging system (Fig. 1-15) is similar to that of a visible-light camera, but the lenses and detectors are designed to operate over the intended IR wavelength range of the system. Two types of detectors are used, namely uncooled detectors and cooled detectors. By cooling a semiconductor detector to very low temperatures, typically in the 50–100 K range, its self-generated thermal noise is reduced considerably, thereby improving the signal-to-noise ratio of the detected IR signal emitted by the observed scene. Cooled detectors exhibit superior sensitivity in comparison with uncooled detectors, but the cooling arrangement requires the availability and use of a cryogenic agent, such as liquid nitrogen, as well as placing the detectors in a vacuum-sealed container. Consequently, cooled IR imagers are significantly more expensive to construct and operate than uncooled imagers.

We close this section with two image examples. Figure 1-16 compares the image of a scene recorded by a visible-light black-and-white camera with a thermal IR image of the same scene. The IR image is in pseudocolor, with red representing high IR emission and blue representing (comparatively) low IR emission. The two images convey different types of information, but they also have significantly different spatial resolutions. Today, digital cameras with 16 megapixel detector arrays are readily available and fairly inexpensive. In contrast, most standard thermal IR imagers have detector arrays with far fewer pixels.
1-2 RADAR IMAGERS 13
Concept Question 1-4: Why is an IR imager called a thermal imager?

It is important to note that λ of visible light is much shorter than λ in the microwave region. In the middle of the visible spectrum, λvis ≈ 0.5 µm, whereas at a typical microwave radar
Figure 1-18 Radar imaging of a scene by raster scanning the antenna beam.
frequency of 6 GHz, λmic ≈ 5 cm. The ratio is

λmic/λvis = (5 × 10⁻²)/(0.5 × 10⁻⁶) = 10⁵!

This means that the angular resolution capability of an optical system is on the order of 100,000 times better than the angular resolution of a radar, if the lens diameter is the same size as the antenna diameter. To fully compensate for the large wavelength ratio, a radar antenna would need a diameter on the order of 1 km to produce an image with the same resolution as a camera with a lens 1 cm in diameter. Clearly, that is totally impractical. In practice, most radar antennas are on the order of centimeters to meters in size, but certainly not kilometers. Yet, radar can image the Earth surface from satellite altitudes with spatial resolutions on the order of 1 m, equivalent to antenna sizes several kilometers in extent! How is that possible?

A. Synthetic-Aperture Radar

As we will see shortly, a synthetic-aperture radar (SAR) uses a synthesized aperture to achieve good resolution in one dimension and transmits very short pulses to achieve fine resolution in the orthogonal dimension. The predecessor to SAR is the real-aperture side-looking airborne radar (SLAR). A SLAR uses a rectangular- or cylindrical-shaped antenna that gets mounted along the longitudinal direction of an airplane, and pointed partially to the side (Fig. 1-19).

Even though the antenna beam in the elevation direction is very wide, fine discrimination can be realized along the x direction in Fig. 1-19 by transmitting a sequence of very short pulses. At any instant in time, the extent of the pulse along x is

∆x′min = cτ/(2 sin θ) (scene range resolution), (1.11)

where c is the velocity of light, τ is the pulse width, and θ is the incidence angle relative to nadir-looking. This represents the scene spatial resolution capability along the x′ direction. At a typical angle of θ = 45°, the spatial resolution attainable when transmitting pulses each 5 ns in width is

∆x′min = (3 × 10⁸ × 5 × 10⁻⁹)/(2 sin 45°) ≈ 1.06 m.
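Eq. (1.11) is easy to verify numerically; this short check (an illustration, not from the book) reproduces the roughly 1 m range resolution for a 5 ns pulse at θ = 45°:

```python
import math

c = 3e8          # speed of light (m/s)
tau = 5e-9       # pulse width: 5 ns
theta = math.radians(45)  # incidence angle relative to nadir

dx_min = c * tau / (2 * math.sin(theta))  # Eq. (1.11): scene range resolution
print(round(dx_min, 2))  # 1.06
```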
Figure 1-19 Real-aperture SLAR imaging technique. The antenna is mounted along the belly of the aircraft.
Not only is this an excellent spatial resolution along the x′ direction, but it is also independent of range R (distance between the radar and the surface), which means it is equally applicable to a satellite-borne radar.

As the aircraft flies along the y direction, the radar beam sweeps across the terrain, while constantly transmitting pulses, receiving their echoes, and recording them on an appropriate medium. The sequential echoes are then stitched together to form an image.

By designing the antenna to be as long as practicable along the airplane velocity direction, the antenna pattern exhibits a relatively narrow beam along that direction (y′ direction in Fig. 1-19). The shape of the beam of the cylindrical antenna is illustrated in Fig. 1-20. From range R, the extent of the beam along the y direction is

∆y′min ≈ (λ/ly) R = λh/(ly cos θ) (real-aperture azimuth resolution), (1.12)

where h is the aircraft altitude. This is the spatial resolution capability of the radar along the flight direction. For a 3 m long antenna operating at λ = 3 cm from an altitude of 1 km, the azimuth resolution is on the order of λh/ly = (0.03 × 1000)/3 = 10 m at near-nadir incidence, and coarser at oblique angles.
Figure 1-22 SAR image collected over Washington, D.C. Right of center is the Washington Monument, though only the shadow of the obelisk
is readily apparent in the image. [Courtesy of Sandia National Laboratories.]
given by

h(x, y) = hx(x) hy(y), (1.14)

with hx(x) describing the shape of the transmitted pulse and hy(y) describing the shape of the synthetic antenna-array pattern. Typically, the pulse shape is like a Gaussian:

hx(x) = e^(−2.77(x/τ)²), (1.15a)

where τ is the effective width of the pulse (width between half-peak points). The synthetic array pattern is sinc-like in shape, but the sidelobes may be suppressed further by assigning different weights to the processed pulses. For the equally weighted case,

hy(y) = sinc²(1.8y/l), (1.15b)

where l is the length of the real antenna, and the sinc function is defined such that sinc(z) = sin(πz)/(πz) for any variable z.

Concept Question 1-5: Why is a SAR called a "synthetic"-aperture radar?

Concept Question 1-6: What system parameters determine the PSF of a SAR?

Exercise 1-3: With reference to the diagram in Fig. 1-21, suppose the length of the real aperture were to be increased from 2 m to 8 m. What would happen to (a) the antenna beamwidth, (b) the length of the synthetic aperture, and (c) the SAR azimuth resolution?

Answer: (a) Beamwidth is reduced by a factor of 4, (b) synthetic aperture length is reduced from 8 km to 2 km, and (c) SAR resolution changes from 1 m to 4 m.

Figure 1-23 High-resolution image of an airport runway with a plane and helicopter. [Courtesy of Sandia National Laboratories.]

1-3 X-Ray Computed Tomography (CT)

Computed tomography, also known as CT scan, is a technique capable of generating 3-D images of the X-ray attenuation (absorption) properties of an object, such as the human body. The X-ray absorption coefficient of a material is strongly dependent on the density of that material. CT has the sensitivity necessary to image body parts across a wide range of densities, from soft tissue to blood vessels and bones.

As depicted in Fig. 1-24(a), a CT scanner uses an X-ray source, with a narrow slit to generate a fan-beam, wide enough to encompass the extent of the body, but only about 1 mm thick. The attenuated X-ray beam is captured by an array of ∼ 900 detectors. The X-ray source and the detector array are mounted on a circular frame that rotates in steps of a fraction of a degree over a full 360° circle around the object or patient, each time recording an X-ray attenuation profile from a different angular direction. Typically, on the order of 1000 such profiles are recorded, each composed of measurements by 900 detectors. For each horizontal slice of the body, the process is completed in less than 1 second. CT uses image reconstruction algorithms to generate a 2-D image of the absorption coefficient of that horizontal slice. To image an entire part of the body, such as the chest or head, the process is repeated over multiple slices (layers).

For each anatomical slice, the CT scanner generates on the order of 9 × 10⁵ measurements (1000 angular orientations × 900 detectors). In terms of the coordinate system shown in Fig. 1-24(b), we define α(ξ, η) as the absorption coefficient of the object under test at location (ξ, η). The X-ray beam is directed along the ξ direction at η = η0. The X-ray intensity received by the detector located at ξ = ξ0 and η = η0 is given by

I(ξ0, η0) = I0 exp(− ∫ from 0 to ξ0 of α(ξ, η0) dξ). (1.16)
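Eq. (1.16) can be sketched as a numerical line integral. The absorption profile below is hypothetical (made-up tissue and bone coefficients), chosen only to illustrate the exponential attenuation along one beam:

```python
import numpy as np

def detected_intensity(alpha, xi, I0=1.0):
    """Eq. (1.16): I = I0 * exp(-integral of alpha along the beam path),
    with the integral evaluated by the trapezoidal rule on a uniform grid."""
    d = xi[1] - xi[0]
    path_integral = np.sum((alpha[1:] + alpha[:-1]) / 2) * d
    return I0 * np.exp(-path_integral)

# Hypothetical absorption profile along one beam: 10 cm of soft tissue
# (alpha = 0.2 cm^-1) containing a 2 cm thick bone (alpha = 0.5 cm^-1).
xi = np.linspace(0.0, 10.0, 1001)   # path coordinate (cm)
alpha = np.full_like(xi, 0.2)
alpha[(xi >= 4.0) & (xi <= 6.0)] = 0.5

I = detected_intensity(alpha, xi)
print(round(I, 4))  # strong attenuation along this path
```

Each of the ~900 detectors at each of the ~1000 angular orientations records one such line integral; the reconstruction algorithms invert this collection of measurements to recover α(ξ, η).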
1-4 MAGNETIC RESONANCE IMAGING 19
Figure 1-25 Basic diagram of an MRI system.

Figure 1-26 B0 is static and approximately uniform within the cavity. Inside the cavity, B0 ≈ 1.5 T (teslas), compared with only 0.1 to 0.5 milliteslas (∼1 × 10⁻⁴ T) outside.

Figure 1-27 Nuclei with spin magnetic number of ±1/2 precessing about B0 at the Larmor angular frequency ω0.

The total magnetic field is B = B0 + BG + BRF.

A. Static Field B0

Field B0 is a strong, static (non–time varying) magnetic field created by a magnet designed to generate a uniform (constant) distribution throughout the magnetic core (Fig. 1-26). Usually, a superconducting magnet is used for this purpose because it can generate magnetic fields with much higher magnitudes than can be realized with resistive and permanent magnets. The direction of B0 is longitudinal (ẑ direction in Fig. 1-26) and its magnitude is typically on the order of 1.5 teslas (T). The conversion factor between teslas and gauss is 1 T = 10⁴ gauss. Earth's magnetic field is on the order of 0.5 gauss, so B0 inside the MRI core is on the order of 30,000 times that of Earth's magnetic field.

Biological tissue is composed of chemical compounds, and each compound is organized around the nuclei (protons) of the atoms comprising that compound. Some, but not all, nuclei become magnetized when exposed to a magnetic field. Among the substances found in a biological material, the hydrogen nucleus has a strong susceptibility to magnetization, and hydrogen is highly abundant in biological tissue. For these reasons, a typical MR image is related to the concentration of hydrogen nuclei. The strong magnetic field B0 causes the nuclei of the material inside the core space to temporarily magnetize and to spin (precess) like a top about the direction of B0. The precession orientation angle θ, shown in Fig. 1-27, is determined by the spin quantum number I of the spinning nucleus and the magnetic quantum number mI.
the vertical direction, the total core volume can be discretized into horizontal layers called slices, each corresponding to a different value of f0 (Fig. 1-29). This way, the RF signal can communicate with each slice separately by selecting the RF frequency to match f0 of that slice. In practice, instead of sending a sequence of RF signals at different frequencies, the RF transmitter sends out a short pulse whose frequency spectrum covers the frequency range of interest for all the slices in the volume, and then a Fourier transformation is applied to the response from the biological tissue to separate the responses from the individual slices.

The gradient magnetic field along the ẑ direction allows discretization of the volume into x–y slices. A similar process can be applied to generate x–z and y–z slices, and the combination is used to divide the total volume into a three-dimensional matrix of voxels (volume pixels). The voxel size defines the spatial resolution capability of the MRI system.

The point spread function along x is

hx(x) = ∆k sin(πN∆k x)/sin(π∆k x), (1.20)

where x is one of the two MR image coordinates, k is a spatial frequency, ∆k is the sampling interval in k space, and N is the total number of Fourier samples. A similar expression applies to hy(y). The spatial resolution of the MR image is equal to the equivalent width of hx(x), which can be computed as follows:

∆xmin = (1/hx(0)) ∫ from −1/(2∆k) to 1/(2∆k) of hx(x) dx = 1/(N∆k). (1.21)

The integration was performed over one period (1/∆k) of hx(x). According to Eq. (1.21), the image resolution is inversely proportional to the product N∆k. The choices of values for N and ∆k are associated with signal-to-noise ratio and scan time considerations.

C. RF System

The combination of the strong static field B0 and the gradient field BG (whose amplitude is on the order of less than 1% of B0) defines a specific Larmor frequency for the nuclei of every isotope within each voxel. As we noted earlier through Table 1-1, at B0 intensities in the 1 T range, the Larmor frequencies of common isotopes are in the MHz range. The RF system consists of a transmitter and a receiver connected to separate coils, or the same coil can be used for both functions. The transmitter generates a burst of narrow RF pulses. In practice, many different pulse configurations are used, depending on the application.

1-4.3 MRI-Derived Information

Generally speaking, MRI can provide three types of information about the imaged tissue:

(a) The magnetic characteristics of tissues, which are related to biological attributes and blood vessel conditions.

(b) Blood flow, made possible through special time-dependent gradient excitations.

(c) Chemical properties discerned from measurements of small shifts in the Larmor frequency.
1-5 ULTRASOUND IMAGER 23
[Figure: block diagram of an ultrasound imager. A transmitter drives a transmit beamforming unit (time-delay generator) that feeds the transducer array through a T/R switch; received echoes pass through a receive beamforming unit (time-delay generator) to a data acquisition unit, system processor, and display. The transmit pulse has width τ.]

[Figure 1-33 diagram: array axis with focal points Focus 1 at range Rf1 and Focus 2 at range Rf2.]
Figure 1-33 Changing the inter-element time delay across a symmetrical time-delay distribution shifts the location of the focal point in
the range direction.
1-5.3 Spatial Resolution

For a 2-D transducer array of size (Lx × Ly) and focused at range Rf, as shown in Fig. 1-36 (side Ly is not shown in the figure), the size of the resolution voxel is given by an axial resolution ∆Rmin along the range direction and by lateral resolutions ∆xmin and ∆ymin along the two lateral directions. The axial resolution is given by

∆Rmin = λN/2 (axial resolution), (1.22)

where λ is the wavelength of the pulse in the material in which the acoustic waves are propagating and N is the number of cycles in the pulse. The wavelength is related to the signal frequency by

λ = v/f, (1.23)

where v is the wave velocity and f is the frequency. In biological tissue, v ≈ 1540 m/s. For an ultrasound system operating at f = 5 MHz and generating pulses with N = 2 cycles per pulse,

∆Rmin = vN/(2f) = (1540 × 2)/(2 × 5 × 10⁶) ≈ 0.3 mm.
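Eqs. (1.22) and (1.23) combine into a one-line computation; the sketch below simply reproduces the worked example:

```python
v = 1540.0       # speed of sound in tissue (m/s)
f = 5e6          # operating frequency (Hz)
N_cycles = 2     # cycles per transmitted pulse

lam = v / f                        # Eq. (1.23): wavelength
dR_min = lam * N_cycles / 2        # Eq. (1.22): axial resolution
print(f"{dR_min * 1e3:.2f} mm")    # 0.31 mm (the text rounds to 0.3 mm)
```

Raising the frequency improves axial resolution, but higher-frequency ultrasound is also attenuated more rapidly in tissue, so f is a design trade-off.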
Figure 1-34 Simulations of acoustic energy distribution for (a) a beam focused at Rf = 40 mm by a 96-element array (no steering) and (b) a beam focused and steered by 45°; lateral distance (mm) versus axial distance (mm).

Figure 1-36 Axial resolution ∆Rmin and lateral resolution ∆xmin for a transducer array of length Lx focused at range Rf.

The lateral resolution ∆xmin is given by

∆xmin = (λ/Lx) Rf = Rf v/(Lx f) (lateral resolution), (1.24)

where Rf is the focal length (range at which the beam is focused). If the beam is focused at Rf = 5 cm and the array length Lx = 4 cm and f = 5 MHz, then

∆xmin = (5 × 10⁻² × 1540)/(4 × 10⁻² × 5 × 10⁶) ≈ 0.4 mm,

Figure 1-35 Beam focusing and steering are realized by shaping the time-delay distribution: (a) uniform distribution, (b) linear shift, (c) non-uniform symmetrical distribution, and (d) nonlinear shift with non-uniform distribution.
which is comparable with the magnitude of the axial resolution ∆Rmin. The resolution along the orthogonal lateral direction, ∆ymin, is given by Eq. (1.24) with Lx replaced with Ly. The size of the resolvable voxel is

∆V = ∆Rmin × ∆xmin × ∆ymin. (1.25)

1-6 Coming Attractions

Through examples of image processing products, this section presents images extracted from various sections in the book. In each case, we present a transformed image, along with a reference to its location within the text.
Concept Question 1-10: How does an ultrasound imager focus its beam in the range direction and in the lateral direction?

Answer: ∆V = ∆Rmin × ∆xmin × ∆ymin = 0.26 mm × 0.41 mm × 0.41 mm.

Figure 1-38 Original clown image and nonlinearly warped product. [Extracted from Figs. 4-14 and 4-17.]
[Figure: (a) dark clown image; (b) brightened clown image.]
Figure 1-42 The horizontal lines in (a) the original Mariner image are due to sinusoidal interference in the recorded image. The lines were removed in (b) by applying notch filtering. [Extracted from Fig. 6-7.]

Figure 1-43 Motion blurring is removed: (a) original motion-blurred image caused by taking a photograph in a moving car; (b) motion-deblurred image. [Extracted from Fig. 6-11.]

Figure 1-44 Image denoising: (a) a noisy clown image; (b) wavelet-denoised clown image. [Extracted from Fig. 7-21.]
[Figure: (a) unfocused MRI image.]

[Figure: (a) image with missing pixels; (b) inpainted image.]
1-6.10 Markov Random Fields for Image Segmentation

In a Markov random field (MRF) image model, the value of each pixel is stochastically related to its surrounding values. This is useful in segmenting images, as presented in Section 9-11. Figure 1-47 illustrates how an MRF image model can improve the segmentation of an X-ray image of a foot into tissue and bone.

1-6.11 Motion-Deblurring of a Color Image

Chapters 1–9 consider grayscale (black-and-white) images, since color images consist of three (red, green, blue) images. Motion deblurring of a color image is presented in Section 10-3, an example of which is shown in Fig. 1-48.
Summary

Concepts

• Images may be formed using any of these imaging modalities: optical, infrared, radar, x-rays, ultrasound, and magnetic resonance imaging.

• Image processing is needed to process a raw image, formed directly from data, into a final image, which has been deblurred, denoised, interpolated, or enhanced, all of which are subjects of this book.

• Color images are actually triplets of red, green, and blue images, displayed together.

• The effect of an image acquisition system on an image can usually be modelled as 2-D convolution with the point spread function of the system (see below).

• The resolution of an image acquisition system can be computed using various formulae (see below).
Mathematical Formulae

Lens law: 1/do + 1/di = 1/f

Optical point spread function: h(θ) = [2J1(γ)/γ]², with γ = (πD/λ) sin θ

SAR point spread function: h(x, y) = e^(−2.77(x/τ)²) sinc²(1.8y/l)

MRI point spread function: hx(x) = ∆k sin(πN∆k x)/sin(π∆k x)

X-ray tomography path attenuation: p(r, θ) = ∫∫ a(ξ, η) δ(r − ξ cos θ − η sin θ) dξ dη (integrals from −∞ to ∞)

Optical resolution: ∆θmin ≈ 1.22 λ/D

Radar resolution: ∆y′min ≈ (λ/D) R

Ultrasound resolution: ∆Rmin = λN/2

2-D convolution: Ii(x, y) = Io(x, y) ∗∗ h(x, y) = ∫∫ Io(x − x′, y − y′) h(x′, y′) dx′ dy′ (integrals from −∞ to ∞)
Important Terms Provide definitions or explain the meaning of the following terms: active pixel sensor, beamforming, charge-coupled device, infrared imaging, liquid crystal display, magnetic resonance imaging (MRI), optical imaging, point spread function, radar, resolution, synthetic-aperture radar, ultrasound imaging, X-ray computed tomography.
Objectives

Learn to:
40 CHAPTER 2 REVIEW OF 1-D SIGNALS AND SYSTEMS
1-D Signals

Continuous time: x(t) ↔ (FT) ↔ X(f): signal in the time domain, spectrum in the frequency domain.

Discrete time: x[n] (signal at discrete times t = n∆) ↔ (DTFT) ↔ X(Ω) (spectrum at continuous frequency Ω) ↔ (DFT) ↔ X[k] (spectrum at discrete frequencies Ω = 2πk/N, 0 ≤ k ≤ N − 1).

2-D Images

Continuous space: f(x, y) ↔ (CSFT) ↔ F(µ, ν): image in the spatial domain, spectrum in the frequency domain.

Discrete space: f[n, m] (image in discrete space) ↔ (DSFT) ↔ F(Ω1, Ω2) (spectrum in the continuous frequency domain) ↔ (2-D DFT of order N) ↔ F[k1, k2] (spectrum in the discrete frequency domain: Ω1 = 2πk1/N, Ω2 = 2πk2/N).
2-1 REVIEW OF 1-D CONTINUOUS-TIME SIGNALS 41
Types of Signals

Eternal sinusoid: x(t) = A cos(2π f0 t + θ), −∞ < t < ∞.

Pulse (rectangle): x(t) = rect((t − t0)/T) = 1 for (t0 − T/2) < t < (t0 + T/2), and 0 otherwise.

Impulse: δ(t), with the sifting property ∫ from −∞ to ∞ of x(t) δ(t − t0) dt = x(t0).
Properties
x(t)
y1(t) = x(2t)
10 x(t) y2(t) = x(t / 2)
The scaling property can be interpreted using Eq. (2.5) as A nonzero signal x(t) that is zero-valued outside the interval
follows. For a > 1 the width of the pulse in Eq. (2.5) is
compressed by |a|, reducing its area by a factor of |a|, but its [a, b] = {t : a ≤ t ≤ b},
height is unaltered. Hence the area under the pulse is reduced to
1/|a|. (i.e., x(t) = 0 for t ∈
/ [a, b]), has support [a, b] and duration b − a.
Impulses are important tools used in defining the impulse
responses of 1-D systems and the point-spread functions of 2-D Concept Question 2-1: Why does scaling time in an im-
spatial systems (such as a camera or an ultrasound), as well as pulse also scale its area?
for deriving the sampling theorem.
R∞
Exercise 2-1: Compute the value of −∞ δ (3t − 6) t 2 dt.
2-1.2 Properties of 1-D Signals
Answer: 43 . (See IP )
A. Time Delay
Delaying signal x(t) by t0 generates signal x(t −t0 ). If t0 > 0, the Exercise 2-2: Compute
the energy of the pulse defined by
waveform of x(t) is shifted to the right by t0 , and if t0 < 0, the t−2
x(t) = 5 rect 6 .
waveform of x(t) is shifted to the left by |t0 |. This is illustrated
by the time-delay figure in Table 2-1. Answer: 150. (See IP )
B. Time Scaling

A signal x(t) time-scaled by a becomes x(at). If a > 1, the waveform of x(t) is compressed in time by a factor of a. If 0 < a < 1, the waveform of x(t) is expanded in time by a factor of 1/a, as illustrated by the scaling figure in Table 2-1. If a < 0, the waveform of x(t) is compressed by |a| or expanded by 1/|a|, and then time-reversed.

C. Signal Energy

The energy E of a signal x(t) is
E = ∫_{−∞}^{∞} |x(t)|² dt.   (2.7)

2-2 Review of 1-D Continuous-Time Systems

A continuous-time system is a device or mathematical model that accepts a signal x(t) as its input and produces another signal y(t) at its output. Symbolically, the input-output relationship is expressed as
x(t) → SYSTEM → y(t).
Table 2-2 provides a list of important system types and properties.
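The energy computation of Exercise 2-2 can be checked numerically by approximating the integral in Eq. (2.7) with a Riemann sum; a minimal numpy sketch (the grid range and step are arbitrary choices):

```python
import numpy as np

# Numerically check Eq. (2.7) for the pulse of Exercise 2-2:
# x(t) = 5 rect((t - 2)/6), whose energy is 5^2 * 6 = 150.
t = np.linspace(-10, 10, 2_000_001)   # fine grid covering the pulse support
dt = t[1] - t[0]
x = np.where(np.abs((t - 2) / 6) <= 0.5, 5.0, 0.0)
E = np.sum(np.abs(x) ** 2) * dt       # Riemann-sum approximation of the integral
print(round(E, 2))                    # ~150
```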
Table 2-2 (excerpt): Property — Definition
Linear (L): If xi(t) → L → yi(t), then Σ_{i=1}^{N} ci xi(t) → L → Σ_{i=1}^{N} ci yi(t)
Linear Time-Invariant (LTI): If xi(t) → LTI → yi(t), then Σ_{i=1}^{N} ci xi(t − τi) → LTI → Σ_{i=1}^{N} ci yi(t − τi)
of the above four classes. If a system is moderately nonlinear, it can be approximated by a linear model, and if it is highly nonlinear, it may be possible to divide its input-output response into a series of quasi-linear regions. In this book, we limit our treatment to linear (L) and linear time-invariant (LTI) systems.

For a time-invariant (TI) system, shifting the input shifts the output by exactly the same amount and in exactly the same direction. That is, if
x(t) → TI → y(t),
then it follows that
x(t − τ) → TI → y(t − τ).
in the following two equations:
δ(t − τ) → SYSTEM → h(t; τ),   (2.10a)
and for a time-invariant system,
δ(t − τ) → TI → h(t − τ).   (2.10b)

x(t) → L → y(t) = ∫_{−∞}^{∞} x(τ) h(t; τ) dτ.   (2.13)

This integral is called the superposition integral. For an LTI system, h(t; τ) = h(t − τ), and the superposition integral becomes
y(t) = ∫_{−∞}^{∞} x(τ) h(t − τ) dτ,
which is known as the convolution integral. Often, the convolution integral is represented symbolically by
x(t) ∗ h(t) = ∫_{−∞}^{∞} x(τ) h(t − τ) dτ.   (2.15)
Combining the previous results leads to the symbolic form
x(t) → LTI → y(t) = x(t) ∗ h(t).   (2.16)

Concept Question 2-2: What is the significance of a system being linear time-invariant?

Final steps of the derivation (Fig. 2-2):
4. ∫_{−∞}^{∞} x(τ) δ(t − τ) dτ → LTI → y(t) = ∫_{−∞}^{∞} x(τ) h(t − τ) dτ
5. x(t) → LTI → y(t) = ∫_{−∞}^{∞} x(τ) h(t − τ) dτ = x(t) ∗ h(t)
Figure 2-2 Derivation of the convolution integral for a linear time-invariant system.
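On a grid, the convolution integral of Eq. (2.15) reduces to a scaled discrete convolution; the sketch below (a hypothetical test case: two unit rectangles, whose convolution is a unit triangle) uses np.convolve scaled by the grid step:

```python
import numpy as np

# Approximate the convolution integral (Eq. (2.15)) on a grid:
# (x * h)(t) ~ sum_k x(k dt) h(t - k dt) dt, i.e., np.convolve scaled by dt.
dt = 0.001
t = np.arange(-2, 2, dt)
x = np.where(np.abs(t) <= 0.5, 1.0, 0.0)   # unit rect
h = np.where(np.abs(t) <= 0.5, 1.0, 0.0)   # unit rect
y = np.convolve(x, h, mode="full") * dt    # rect * rect = unit triangle
print(round(y.max(), 3))                   # peak of the triangle, ~1.0
```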
Table 2-3 (excerpt): convolution properties.
3. Distributive: x(t) ∗ [h1(t) + ··· + hN(t)] = x(t) ∗ h1(t) + ··· + x(t) ∗ hN(t)   (2.19c)
4. Causal ∗ Causal = Causal: y(t) = u(t) ∫_{0}^{t} h(τ) x(t − τ) dτ   (2.19d)
[Figure: two systems h1(t) and h2(t) connected (a) in series, with input x(t) and output y(t).]

2-3 1-D Fourier Transforms

The continuous-time Fourier transform (CTFT) is a powerful tool for
• computing the spectra of signals, and
• analyzing the frequency responses of LTI systems.

x(t) = (1/2π) ∫_{−∞}^{∞} X(ω) e^{jωt} dω.   (2.22b)

Geophysicists use different sign conventions for time and space! In addition, some computer programs, such as Mathematica, split the 1/(2π) factor into factors of 1/√(2π) in both the forward and inverse transforms.

◮ In this book, we use the definition of the Fourier transform given by Eq. (2.20) exclusively. ◭

B. Fourier Transform Notation

Throughout this book, we use Eq. (2.20) as the definition of the Fourier transform, and we denote the individual transformations by
F{x(t)} = X(f)
and
F⁻¹{X(f)} = x(t).

Modulation:
e^{j2πf0t} x(t) = ∫_{−∞}^{∞} X(f) e^{j2πft} e^{j2πf0t} df = ∫_{−∞}^{∞} X(f) e^{j2π(f + f0)t} df = F⁻¹{X(f − f0)}.   (2.25)

Derivative:
dx(t)/dt = (d/dt) ∫_{−∞}^{∞} X(f) e^{j2πft} df = ∫_{−∞}^{∞} X(f) (j2πf) e^{j2πft} df = F⁻¹{(j2πf) X(f)}.   (2.26)

Zero frequency:
Setting f = 0 in Eq. (2.20a) leads to
X(0) = ∫_{−∞}^{∞} x(t) dt.
Table 2-4 (excerpt): properties of the Fourier transform.
1. Linearity: Σ ci xi(t) ↔ Σ ci Xi(f)
2. Time scaling: x(at) ↔ (1/|a|) X(f/a)
3. Time shift: x(t − τ) ↔ e^{−j2πfτ} X(f)
5. Time derivative: x′ = dx/dt ↔ j2πf X(f)
6. Reversal: x(−t) ↔ X(−f)
7. Conjugation: x∗(t) ↔ X∗(−f)

Special FT Relationships
11. Zero frequency: X(0) = ∫_{−∞}^{∞} x(t) dt
12. Zero time: x(0) = ∫_{−∞}^{∞} X(f) df
13. Parseval's theorem: ∫_{−∞}^{∞} x(t) y∗(t) dt = ∫_{−∞}^{∞} X(f) Y∗(f) df
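The discrete analogue of Parseval's theorem (entry #13, with y = x) can be verified with the FFT; a small numpy check on an arbitrary random test vector:

```python
import numpy as np

# Discrete analogue of Parseval's theorem: with X = fft(x),
# sum |x[n]|^2 equals (1/N) sum |X[k]|^2.
rng = np.random.default_rng(0)
x = rng.standard_normal(64) + 1j * rng.standard_normal(64)
X = np.fft.fft(x)
lhs = np.sum(np.abs(x) ** 2)
rhs = np.sum(np.abs(X) ** 2) / len(x)
print(np.isclose(lhs, rhs))   # True
```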
x(t) and X(f) are equal:
E = ∫_{−∞}^{∞} |x(t)|² dt = ∫_{−∞}^{∞} |X(f)|² df.   (2.28)

A. Even and Odd Parts of Signals

A signal x(t) can be decomposed into even xe(t) and odd xo(t) components:
x(t) = xe(t) + xo(t),   (2.29)
where the even component xe(t) and the odd component xo(t) are formed from their parent signal x(t) as follows:
xe(t) = [x(t) + x∗(−t)]/2   (2.30a)
and
xo(t) = [x(t) − x∗(−t)]/2.   (2.30b)
A signal is said to have even symmetry if x(t) = x∗(−t), in which case x(t) = xe(t) and xo(t) = 0. Similarly, a signal has odd symmetry if x(t) = −x∗(−t), in which case x(t) = xo(t) and xe(t) = 0.
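The decomposition of Eqs. (2.29)-(2.30) is easy to verify numerically on a time grid symmetric about t = 0, where x∗(−t) is just the reversed, conjugated sample array; a short numpy sketch with an arbitrary test signal:

```python
import numpy as np

# Even/odd decomposition (Eqs. (2.29)-(2.30)) on a grid symmetric about t = 0.
t = np.linspace(-1, 1, 201)
x = np.exp(t) * np.cos(3 * t)              # arbitrary test signal
xr = np.conj(x[::-1])                      # samples of x*(-t)
xe = (x + xr) / 2                          # even part, Eq. (2.30a)
xo = (x - xr) / 2                          # odd part,  Eq. (2.30b)
print(np.allclose(xe + xo, x))             # True: x = xe + xo
print(np.allclose(xe, np.conj(xe[::-1])))  # True: xe has even symmetry
```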
50 CHAPTER 2 REVIEW OF 1-D SIGNALS AND SYSTEMS
[Figure: magnitude spectrum of the sound "oo" as in "cool"; horizontal axis f (kHz), 0 to 3.]

Exercise 2-6: Compute the Fourier transform of (d/dt)[sinc(t)].
Answer:
F{(d/dt) sinc(t)} = j2πf for |f| < 0.5, and 0 for |f| > 0.5.
(See IP)
2-4 The Sampling Theorem

The sampling theorem is an operational cornerstone of both discrete-time 1-D signal processing and discrete-space 2-D image processing.

If x(t) is bandlimited to a maximum frequency of B Hz, and if x(t) is sampled at a sampling rate of S samples/s, then x(t) can be reconstructed exactly from its samples {x(n∆), n = ..., −2, −1, 0, 1, 2, ...}, provided S > 2B.

Given
x(t) = sin(t) + (1/3) sin(3t) + (1/5) sin(5t) + (1/7) sin(7t) + ···,
compute the output y(t) if
x(t) → h(t) = 0.4 sinc(0.4t) → y(t).
mum) sampling rate 2B samples/second is called the Nyquist rate, and the frequency 2B is called the Nyquist frequency.

Dividing by ∆ and recalling that S = 1/∆ gives
xs(t) ↔ S Σ_{k=−∞}^{∞} X(f − kS).   (2.47)
2-4.2 Sampling Theorem Derivation

A. The Sampled Signal xs(t)

Given a signal x(t), we construct the sampled signal xs(t) by multiplying x(t) by the impulse train
δs(t) = Σ_{n=−∞}^{∞} δ(t − n∆).   (2.42)
That is,
xs(t) = x(t) δs(t) = Σ_{n=−∞}^{∞} x(t) δ(t − n∆) = Σ_{n=−∞}^{∞} x(n∆) δ(t − n∆).   (2.43)

B. Spectrum of the Sampled Signal xs(t)

Using Fourier series, it can be shown that the Fourier transform of the impulse train δs(t) is itself an impulse train in frequency:
∆ Σ_{n=−∞}^{∞} δ(t − n∆) ↔ Σ_{k=−∞}^{∞} δ(f − k/∆).   (2.44)

The spectrum Xs(f) of xs(t) consists of a superposition of copies of the spectrum X(f) of x(t), repeated every S = 1/∆ and multiplied by S. If these copies do not overlap in frequency, we may then recover X(f) from Xs(f) using a lowpass filter, provided S > 2B [see Fig. 2-6(a)].

Figure 2-6 Sampling a signal x(t) with maximum frequency B at a rate of S makes X(f) change amplitude to S X(f) and repeat in f with period S. These copies (a) do not overlap if S > 2B, but (b) they do if S < 2B.
This result can be interpreted as follows. A periodic signal has a discrete spectrum (zero except at specific frequencies) given by the signal's Fourier series expansion. By Fourier duality, a discrete signal (zero except at specific times) such as xs(t) has a periodic spectrum. So a discrete and periodic signal such as δs(t) has a spectrum that is both discrete and periodic.

Multiplying Eq. (2.44) by x(t), using the definition for xs(t) given by Eq. (2.43), and applying property #9 in Table 2-4 leads to
∆ xs(t) ↔ X(f) ∗ Σ_{k=−∞}^{∞} δ(f − k/∆).   (2.45)

2-4.3 Aliasing

If the sampling rate S does not exceed 2B, the copies of X(f) will overlap one another, as shown in Fig. 2-6(b). This is called an aliased condition, the consequence of which is that the reconstructed signal will no longer match the original signal x(t).

Example 2-1: Two Sinusoids and Aliasing

Two signals, a 2 Hz sinusoid and a 12 Hz sinusoid:
Figure 2-7 Plots of (a) x1(t) and x1s(t) and (b) x2(t) and x2s(t), for t = 0 to 1 s. Sampling rate S = 20 samples/s.
2-4 THE SAMPLING THEOREM 57
Figure 2-8 Spectra (a) X1(f) and (b) X1s(f) of the 2 Hz sinusoid and its sampled version, respectively. The spectrum X1s(f) consists of X(f) scaled by S = 20, plus copies thereof at integer multiples of ±20 Hz; note there is no aliasing. The vertical axes denote areas under the impulses.
Figure 2-9 Spectra (a) X2(f) and (b) X2s(f) of the 12 Hz sinusoid and its sampled version. Note the overlap in (b) between the spectrum of X2s(f) and its neighboring copies. The vertical axes denote areas under the impulses.
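The aliasing in Fig. 2-9 can be confirmed directly from the samples: at S = 20 samples/s, the 12 Hz sinusoid is sample-for-sample identical to an 8 Hz one, since 12 − 20 = −8 and cosine is even. A minimal numpy check:

```python
import numpy as np

# A 12 Hz sinusoid sampled at S = 20 Hz is indistinguishable from an 8 Hz one:
# cos(2*pi*12*n/20) = cos(2*pi*8*n/20) for every integer n.
n = np.arange(20)                 # one second of samples at S = 20 samples/s
x2s = np.cos(2 * np.pi * 12 * n / 20)
alias = np.cos(2 * np.pi * 8 * n / 20)
print(np.allclose(x2s, alias))    # True: the samples are identical
```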
2-5 REVIEW OF 1-D DISCRETE-TIME SIGNALS AND SYSTEMS 59
Note that Xs(f) is periodic in f with period 1/∆, as it should be. In the absence of aliasing,
X(f) = (1/S) Xs(f) for |f| < S/2.   (2.54)
The relationship given by Eq. (2.53) still requires an infinite number of samples {x(n∆)} to reconstruct X(f) at each frequency f.

D. Nearest-Neighbor (NN) Interpolation

A common procedure for computing an approximation to x(t) from its samples {x(n∆)} is nearest-neighbor interpolation. The signal x(t) is approximated by x̂(t):
x̂(t) = x(n∆) for (n − 0.5)∆ < t < (n + 0.5)∆,
        x((n + 1)∆) for (n + 0.5)∆ < t < (n + 1.5)∆,
        and so on.   (2.55)
So x̂(t) is a piecewise-constant approximation to x(t), and it is related to the sampled signal xs(t) by
x̂(t) = xs(t) ∗ rect(t/∆).   (2.56)
Using the Fourier transform of a rectangle function (entry #4 in Table 2-5), the spectrum X̂(f) of x̂(t) is
X̂(f) = Xs(f) ∆ sinc(∆f),   (2.57)
where Xs(f) is the spectrum of the sampled signal. The zero-crossings of the sinc function occur at frequencies f = k/∆ = kS for integers k. These are also the centers of the copies of the original spectrum X(f) induced by sampling. So these copies are attenuated if the maximum frequency B of X(f) is such that B ≪ S. The factor ∆ in Eq. (2.57) cancels the factor S = 1/∆ in Eq. (2.47).

Example 2-2: Reconstruction of 2 Hz Sinusoid

(a) To obtain the spectrum of the reconstruction x̂1(t), we apply Eq. (2.57) with ∆ = 1/S = 0.05 s:
X̂1(f) = X1s(f) ∆ sinc(∆f).
The sinc function is displayed in Fig. 2-10(a) in red and uses the vertical scale on the right-hand side, and the spectrum X̂1(f) is displayed in blue using the vertical scale on the left-hand side. The sinc function preserves the spectral components at ±2 Hz, but attenuates the components centered at ±20 Hz by a factor of 10 (approximately).
(b) Application of Eq. (2.56) to x1(n∆) = cos(4πn∆) with ∆ = 1/20 s yields the plot of x̂1(t) shown in Fig. 2-10(b).

Concept Question 2-6: Why must the sampling rate of a signal exceed double its maximum frequency, if it is to be reconstructed from its samples?

Concept Question 2-7: Why does nearest-neighbor interpolation work as well as it does?

Exercise 2-7: What is the Nyquist sampling rate for a signal bandlimited to 4 kHz?
Answer: 8000 samples/s. (See IP)

Exercise 2-8: A 500 Hz sinusoid is sampled at 900 samples/s. No anti-alias filter is being used. What is the frequency of the reconstructed continuous-time sinusoid?
Answer: 400 Hz. (See IP)

2-5 Review of 1-D Discrete-Time Signals and Systems

Through direct generalizations of the 1-D continuous-time definitions and properties of signals and systems presented earlier, we now extend our review to their discrete counterparts.
[Figure 2-10: (a) spectrum X̂1(f) (blue, left-hand scale) together with sinc(0.05f) (red, right-hand scale); (b) the nearest-neighbor reconstruction x̂1(t) over t = 0 to 1 s.]
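The nearest-neighbor (piecewise-constant) reconstruction of Eq. (2.55) amounts to holding each sample over one sampling interval; the sketch below (grid sizes are arbitrary choices) builds a staircase x̂1(t) for the 2 Hz example and bounds its error:

```python
import numpy as np

# Nearest-neighbor (zero-order-hold) reconstruction sketch for Eq. (2.55):
# hold each sample x1(n*delta) over an interval of width delta.
delta = 1 / 20                            # sampling interval, S = 20 samples/s
n = np.arange(20)
samples = np.cos(4 * np.pi * n * delta)   # x1(n*delta), the 2 Hz sinusoid
up = 50                                   # fine-grid points per sampling interval
xhat = np.repeat(samples, up)             # piecewise-constant approximation
t = (np.arange(xhat.size) + 0.5) / up * delta  # fine time grid
err = np.max(np.abs(xhat - np.cos(4 * np.pi * t)))
# the staircase error is nonzero but bounded by |x1'|_max * delta = 4*pi*0.05
print(err < 0.7)                          # True
```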
Figure 2-11 Stem plot representation of x[n] (n = −2 to 3).

can be depicted using either the bracket notation
x[n] = {3, 2, 0, 4},   (2.59)

2-5.2 Discrete-Time Eternal Sinusoids

A discrete-time eternal sinusoid is defined as
x[n] = A cos(Ω0 n + θ), −∞ < n < ∞,   (2.62)
with N/D being a rational number. In such a case, the fundamental period of the sinusoid is N, provided N/D has been reduced to lowest terms.

Another important property of discrete-time eternal sinusoids is that the discrete-time frequency Ω0 is periodic, which is not true for continuous-time sinusoids. For any integer k, Eq. (2.62) can be rewritten as
x[n] = A cos(Ω0 n + θ), −∞ < n < ∞
     = A cos((Ω0 + 2πk)n + θ)
     = A cos(Ω′0 n + θ), −∞ < n < ∞,   (2.66)
with
Ω′0 = Ω0 + 2πk.   (2.67)

x[n] = cos(2πn) = 1 at (Ω0 = 2π).   (2.70)
Beyond Ω0 = 2π, the oscillatory behavior starts to increase again, and so on. This behavior has no equivalence in the world of continuous-time sinusoids.

2-5.3 1-D Discrete-Time Systems

A 1-D discrete-time system accepts an input x[n] and produces an output y[n]:
x[n] → SYSTEM → y[n].
The definition of LTI for discrete-time systems is identical to the definition of LTI for continuous-time systems. If a discrete-time system has impulse response h[n], then the output y[n] can be computed from the input x[n] using the discrete-time convolution
y[n] = h[n] ∗ x[n] = Σ_{i=−∞}^{∞} h[i] x[n − i].   (2.71a)

Most of the continuous-time properties of convolution also apply in discrete time. Real-world signals and filters are defined over specified ranges of n (and set to zero outside those ranges). If h[n] has support in the interval [n1, n2], Eq. (2.71a) becomes
y[n] = Σ_{i=n1}^{n2} h[i] x[n − i].
For causal signals (x[n] and h[n] equal to zero for n < 0), y[n] assumes the form
y[n] = Σ_{i=0}^{n} x[i] h[n − i], n ≥ 0.   (2.71d)
For example,
{1, 2} ∗ {3, 4} = {3, 10, 8}.   (2.72)
The duration of the output is 2 + 2 − 1 = 3.

2-5.4 Discrete-Time Convolution Properties

With one notable difference, the properties of the discrete-time convolution are the same as those for continuous time. If (t)
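The convolution sum and the duration rule above can be confirmed with np.convolve, which implements Eq. (2.71a) for finite-length signals:

```python
import numpy as np

# The convolution sum of Eq. (2.71a) for the finite-length example of Eq. (2.72):
y = np.convolve([1, 2], [3, 4])
print(y.tolist())              # [3, 10, 8]
print(len(y) == 2 + 2 - 1)     # True: output duration = 2 + 2 - 1 = 3
```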
Table 2-6 Comparison of convolution properties for continuous-time and discrete-time signals.
2. Associative: [g(t) ∗ h(t)] ∗ x(t) = g(t) ∗ [h(t) ∗ x(t)]   |   [g[n] ∗ h[n]] ∗ x[n] = g[n] ∗ [h[n] ∗ x[n]]
is replaced with [n] and integrals are replaced with sums, the convolution properties listed in Table 2-3 lead to those listed in Table 2-6.

The notable difference is associated with property #7. In discrete time, the width (duration) of a signal that is zero-valued outside the interval [a, b] is b − a + 1, not b − a. Consider two signals, h[n] and x[n], defined as follows:

Signal   From    To      Duration
h[n]     a       b       b − a + 1
x[n]     c       d       d − c + 1
y[n]     a + c   b + d   (b + d) − (a + c) + 1

where y[n] = h[n] ∗ x[n]. Note that the duration of y[n] is
(b + d) − (a + c) + 1 = (b − a + 1) + (d − c + 1) − 1 = duration of h[n] + duration of x[n] − 1.

2-5.5 Delayed-Impulses Computation Method

For finite-duration signals, computation of the convolution sum can be facilitated by expressing one of the signals as a linear combination of delayed impulses. The process is enabled by the sampling property (#6 in Table 2-6).

Consider, for example, the convolution sum of the two signals x[n] = {2, 3, 4} and h[n] = {5, 6, 7}, namely
y[n] = x[n] ∗ h[n] = {2, 3, 4} ∗ {5, 6, 7}.
The sampling property allows us to express x[n] in terms of impulses,
x[n] = 2δ[n] + 3δ[n − 1] + 4δ[n − 2],
which leads to
y[n] = (2δ[n] + 3δ[n − 1] + 4δ[n − 2]) ∗ h[n] = 2h[n] + 3h[n − 1] + 4h[n − 2].
Given that both x[n] and h[n] are of duration 3, the duration of their convolution is 3 + 3 − 1 = 5, and it extends from n = 0 to n = 4. Computing y[0] using the delayed-impulses method (while keeping in mind that h[i] has a non-zero value for only i = 0, 1, and 2) leads to
y[0] = 2h[0] + 3h[−1] + 4h[−2] = 2 × 5 = 10.
The process can then be repeated to obtain the values of y[n] for n = 1, 2, 3, and 4.

Example 2-4: Discrete-Time Convolution

Given x[n] = {2, 3, 4} and h[n] = {5, 6, 7}, compute
y[n] = x[n] ∗ h[n]
by (a) applying the sum definition and (b) graphically.

Solution: (a) Both signals have a length of 3 and start at time zero. That is, x[0] = 2, x[1] = 3, x[2] = 4, and x[i] = 0 for all other values of i. Similarly, h[0] = 5, h[1] = 6, h[2] = 7, and h[i] = 0 for all other values of i.
By Eq. (2.71d), the convolution sum of x[n] and h[n] is
y[n] = x[n] ∗ h[n] = Σ_{i=0}^{n} x[i] h[n − i].
Since h[i] = 0 for all values of i except i = 0, 1, and 2, it follows that h[n − i] = 0 for all values of i except i = n, n − 1, and n − 2. With this constraint in mind, we can apply Eq. (2.71d) at discrete values of n, starting at n = 0:
y[0] = Σ_{i=0}^{0} x[i] h[0 − i] = x[0] h[0] = 2 × 5 = 10,
y[1] = Σ_{i=0}^{1} x[i] h[1 − i] = x[0] h[1] + x[1] h[0] = 2 × 6 + 3 × 5 = 27,
y[2] = Σ_{i=0}^{2} x[i] h[2 − i] = x[0] h[2] + x[1] h[1] + x[2] h[0] = 2 × 7 + 3 × 6 + 4 × 5 = 52,
y[3] = Σ_{i=1}^{2} x[i] h[3 − i] = x[1] h[2] + x[2] h[1] = 3 × 7 + 4 × 6 = 45,
y[4] = Σ_{i=2}^{2} x[i] h[4 − i] = x[2] h[2] = 4 × 7 = 28.
Hence,
y[n] = {10, 27, 52, 45, 28}.

(b) The convolution sum can be computed graphically through a four-step process.
Step 1: Replace index n with index i and plot x[i] and h[−i], as shown in Fig. 2-12(a). Signal h[−i] is obtained from h[i] by reflecting it about the vertical axis.
Step 2: Superimpose x[i] and h[−i], as in Fig. 2-12(b), and multiply and sum them. Their product is 10.
Step 3: Shift h[−i] to the right by 1 to obtain h[1 − i], as shown in Fig. 2-12(c). Multiplication and summation of x[i] by h[1 − i] generates y[1] = 27. Shift h[1 − i] by one more unit to the right to obtain h[2 − i], and then repeat the multiplication and summation process to obtain y[2]. Continue the shifting, multiplication, and summation processes until the two signals no longer overlap.
Step 4: Use the values of y[n] obtained in Step 3 to generate a plot of y[n], as shown in Fig. 2-12(g):
y[n] = {10, 27, 52, 45, 28}.

Concept Question 2-8: Why are most discrete-time sinusoids not periodic?

Concept Question 2-9: Why is the length of the convolution of two discrete-time signals not equal to the sum of the lengths of the two signals?

Exercise 2-9: A 28 Hz sinusoid is sampled at 100 samples/s. What is Ω0 for the resulting discrete-time sinusoid? What is the period of the resulting discrete-time sinusoid?
Answer: Ω0 = 0.56π; N = 25. (See IP)
[Figure 2-12: graphical computation of the convolution. (a) x[i] and h[−i]; (b)–(f) h[n − i] superimposed on x[i] for n = 0 through 4, yielding y[0] = 2 × 5 = 10, y[1] = 2 × 6 + 3 × 5 = 27, y[2] = 2 × 7 + 3 × 6 + 4 × 5 = 52, y[3] = 3 × 7 + 4 × 6 = 45, and y[4] = 4 × 7 = 28; (g) stem plot of the resulting output y[n].]
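The delayed-impulses method of Example 2-4 translates directly into code: each term coeff · h[n − shift] is a zero-padded, shifted copy of h[n]. A minimal numpy sketch:

```python
import numpy as np

# Delayed-impulses computation of {2,3,4} * {5,6,7} (Example 2-4):
# y[n] = 2 h[n] + 3 h[n-1] + 4 h[n-2], each shift implemented by zero-padding.
h = np.array([5, 6, 7])
x = np.array([2, 3, 4])
N = len(x) + len(h) - 1                    # output duration = 5
y = np.zeros(N, dtype=int)
for shift, coeff in enumerate(x):          # x[n] = 2d[n] + 3d[n-1] + 4d[n-2]
    y[shift:shift + len(h)] += coeff * h   # add coeff * h[n - shift]
print(y.tolist())                          # [10, 27, 52, 45, 28]
```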
Exercise 2-10: Compute the output y[n] of a discrete-time LTI system with impulse response h[n] and input x[n], where h[n] = {3, 1} and x[n] = {1, 2, 3, 4}.
Answer: {3, 7, 11, 15, 4}. (See IP)

2-6 Discrete-Time Fourier Transform (DTFT)

The discrete-time Fourier transform (DTFT) is the discrete-time counterpart to the Fourier transform. It has the same two functions: (1) to compute spectra of signals and (2) to analyze the frequency responses of LTI systems.

2-6.1 Definition of the DTFT

The DTFT of x[n], denoted X(Ω), and its inverse are defined as
X(Ω) = Σ_{n=−∞}^{∞} x[n] e^{−jΩn}   (2.73a)
and
x[n] = (1/2π) ∫_{−π}^{π} X(Ω) e^{jΩn} dΩ.   (2.73b)

Readers familiar with the Fourier series will recognize that the DTFT X(Ω) is a Fourier series expansion with x[n] as the coefficients of the Fourier series. The inverse DTFT is simply the formula used for computing the coefficients x[n] of the Fourier series expansion of the periodic function X(Ω).

We note that the DTFT definition given by Eq. (2.73a) is the same as the formula given by Eq. (2.53) for computing the spectrum Xs(f) of a continuous-time signal x(t) directly from its samples {x(n∆)}, with Ω = 2πf∆.

The inverse DTFT given by Eq. (2.73b) can be derived as follows. First, we introduce the orthogonality property
(1/2π) ∫_{−π}^{π} e^{jΩ(m−n)} dΩ = δ[m − n].   (2.74)
To establish the validity of this property, we consider two cases, namely: (1) when m ≠ n and (2) when m = n.

(a) m ≠ n
Evaluation of the integral in Eq. (2.74) leads to
(1/2π) ∫_{−π}^{π} e^{jΩ(m−n)} dΩ = e^{jΩ(m−n)} / (j2π(m − n)) evaluated from −π to π
= (e^{jπ(m−n)} − e^{−jπ(m−n)}) / (j2π(m − n))
= ((−1)^{m−n} − (−1)^{m−n}) / (j2π(m − n))
= 0, (m ≠ n).   (2.75)

(b) m = n
If m = n, the integral reduces to
(1/2π) ∫_{−π}^{π} e^{jΩ(n−n)} dΩ = (1/2π) ∫_{−π}^{π} 1 dΩ = 1.   (2.76)

The results given by Eqs. (2.75) and (2.76) can be combined into the definition of the orthogonality property given by Eq. (2.74).

Having verified the validity of the orthogonality property, we now use it to derive Eq. (2.73b). Upon multiplying the definition of the DTFT given by Eq. (2.73a) by (1/2π) e^{jΩm} and integrating over Ω, we have
(1/2π) ∫_{−π}^{π} X(Ω) e^{jΩm} dΩ = (1/2π) ∫_{−π}^{π} Σ_{n=−∞}^{∞} x[n] e^{jΩ(m−n)} dΩ
= (1/2π) Σ_{n=−∞}^{∞} x[n] ∫_{−π}^{π} e^{jΩ(m−n)} dΩ
= Σ_{n=−∞}^{∞} x[n] δ[m − n] = x[m].   (2.77)

Equation (2.74) was used in the final step leading to Eq. (2.77). Exchanging the order of integration and summation in Eq. (2.77) is acceptable if the summand is absolutely summable; i.e., if the DTFT is defined. Finally, replacing the index m with n in the top left-hand side and bottom right-hand side of Eq. (2.77) yields the inverse DTFT expression given by Eq. (2.73b).
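The DTFT definition in Eq. (2.73a) can be evaluated directly for a finite-length signal; the helper below (a hypothetical function name) checks the zero-frequency property X(0) = Σ x[n] and the 2π-periodicity at Ω = ±π:

```python
import numpy as np

# Direct evaluation of the DTFT sum, Eq. (2.73a), on a frequency grid.
def dtft(x, n0, Omega):
    """X(Omega) = sum_n x[n] e^{-j Omega n}, with x starting at index n0."""
    n = n0 + np.arange(len(x))
    return np.array([np.sum(x * np.exp(-1j * W * n)) for W in Omega])

x = np.array([3.0, 2.0, 0.0, 4.0])   # the bracket-notation signal of Eq. (2.59)
Omega = np.linspace(-np.pi, np.pi, 5)
X = dtft(x, 0, Omega)
print(np.isclose(X[2], x.sum()))     # True: X(0) = 3 + 2 + 0 + 4 = 9
print(np.isclose(X[0], X[4]))        # True: X(-pi) = X(pi), 2*pi-periodicity
```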
2-6 DISCRETE-TIME FOURIER TRANSFORM (DTFT) 67
Table 2-7 Properties of the DTFT.
1. Linearity: Σ ci xi[n] ↔ Σ ci Xi(Ω)
2. Time shift: x[n − n0] ↔ X(Ω) e^{−jn0Ω}
3. Modulation: x[n] e^{jΩ0n} ↔ X(Ω − Ω0)
4. Time reversal: x[−n] ↔ X(−Ω)
5. Conjugation: x∗[n] ↔ X∗(−Ω)
6. Time convolution: h[n] ∗ x[n] ↔ H(Ω) X(Ω)

Special DTFT Relationships
7. Conjugate symmetry: X∗(Ω) = X(−Ω)
8. Zero frequency: X(0) = Σ_{n=−∞}^{∞} x[n]
9. Zero time: x[0] = (1/2π) ∫_{−π}^{π} X(Ω) dΩ
10. Ω = ±π: X(±π) = Σ_{n=−∞}^{∞} (−1)^n x[n]
11. Rayleigh's (often called Parseval's) theorem: Σ_{n=−∞}^{∞} |x[n]|² = (1/2π) ∫_{−π}^{π} |X(Ω)|² dΩ

If x[n] is real-valued, then conjugate symmetry holds: X∗(Ω) = X(−Ω).

Parseval's theorem for the DTFT states that the energy of x[n] is identical, whether computed in the discrete-time domain n or in the frequency domain Ω:
Σ_{n=−∞}^{∞} |x[n]|² = (1/2π) ∫_{−π}^{π} |X(Ω)|² dΩ.   (2.80)
The energy spectral density is now (1/2π)|X(Ω)|².

Finally, by analogy to continuous time, a discrete-time ideal lowpass filter with cutoff frequency Ω0 has the frequency response, for |Ω| ≤ π (recall that H(Ω) is periodic with period 2π),
H(Ω) = 1 for |Ω| < Ω0, and 0 for Ω0 < |Ω| ≤ π,   (2.81)
which eliminates frequency components of x[n] that lie in the range Ω0 < |Ω| ≤ π.

2-6.3 Important DTFT Pairs

For easy access, several DTFT pairs are provided in Table 2-8. In all cases, the expressions for X(Ω) are periodic with period 2π, as they should be. Entries #7 and #8 of Table 2-8 deserve more discussion, which we now present.
Table 2-8, entry #5: sin(Ω0 n) ↔ (π/j) Σ_{k=−∞}^{∞} [δ(Ω − Ω0 − 2πk) − δ(Ω + Ω0 − 2πk)]
The impulse response of an ideal lowpass filter is
h[n] = (1/2π) ∫_{−Ω0}^{Ω0} 1 · e^{jΩn} dΩ = (Ω0/π) sinc((Ω0/π) n).   (2.82)
This is called a discrete-time sinc function. A discrete-time sinc function h[n] with Ω0 = π/4 is displayed in Fig. 2-13(a), along with its frequency response H(Ω) in Fig. 2-13(b). Such a filter is impractical for real-world applications, because it is unstable and it has infinite duration. To override these limitations, we can multiply h[n] by a window function, such as a Hamming window. The modified impulse response is then hFIR[n] = h[n] w[n], where w[n] denotes the window.

As can be seen in Fig. 2-13(c) and (d), hFIR[n] with N = 10 provides a good approximation to an ideal lowpass filter, with a finite duration. The Hamming-windowed filter belongs to a group of filters called finite-impulse response (FIR) filters. FIR filters can also be designed using a minimax criterion, resulting in an equiripple filter. This and other FIR filter design procedures are discussed in discrete-time signal processing textbooks.
Figure 2-13 Parts (a) and (b) show the impulse response h[n] and frequency response H(Ω) of an ideal lowpass filter with Ω0 = π/4, and parts (c) and (d) show hFIR[n] and HFIR(Ω) for the same filter after multiplying its impulse response with a Hamming window of length N = 10.
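The Hamming-windowed lowpass design described above can be sketched in a few lines of numpy (Ω0 = π/4 and N = 10 as in Fig. 2-13; note np.sinc(x) = sin(πx)/(πx), matching the sinc convention of Eq. (2.82)):

```python
import numpy as np

# Windowed-sinc lowpass sketch: truncate the ideal impulse response
# h[n] = (Omega0/pi) sinc((Omega0/pi) n) to |n| <= N, then apply a Hamming window.
Omega0 = np.pi / 4
N = 10
n = np.arange(-N, N + 1)
h = (Omega0 / np.pi) * np.sinc(Omega0 * n / np.pi)
h_fir = h * np.hamming(len(n))            # Hamming-windowed FIR taps

# Frequency response at Omega = 0 (passband) and Omega = pi (stopband):
H0 = np.sum(h_fir * np.exp(-1j * 0 * n))
Hpi = np.sum(h_fir * np.exp(-1j * np.pi * n))
print(abs(H0))    # close to 1 (passband gain)
print(abs(Hpi))   # close to 0 (stopband attenuation)
```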
rect[n/N] = 1 for |n| ≤ N, and 0 for |n| > N.   (2.84)

We note that rect[n/N] has duration 2N + 1. This differs from the continuous-time rect function rect(t/T), which has duration T.
The DTFT of rect[n/N] is obtained from Eq. (2.73a) by setting x[n] = 1 and limiting the summation to the range (−N, N):
DTFT{rect[n/N]} = Σ_{n=−N}^{N} e^{−jΩn}.   (2.85)
Using the formula
Σ_{k=0}^{N} r^k = (1 − r^{N+1})/(1 − r),   (2.86)
we obtain
DTFT{rect[n/N]} = e^{−jΩN} (1 − e^{jΩ(2N+1)})/(1 − e^{jΩ}) = sin((2N + 1)Ω/2)/sin(Ω/2).   (2.87)

This is called a discrete (or periodic) sinc function. A rectangular pulse with N = 10 is shown in Fig. 2-14 along with its DTFT.

Concept Question 2-10: Why does the DTFT share so many properties with the CTFT?

Concept Question 2-11: Why is the DTFT periodic in frequency?
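Eq. (2.87) can be checked by comparing the direct summation of Eq. (2.85) against the closed form, over an arbitrary frequency grid that avoids the removable singularity at Ω = 0:

```python
import numpy as np

# Check Eq. (2.87): DTFT{rect[n/N]} = sin((2N+1) Omega/2) / sin(Omega/2).
N = 10
n = np.arange(-N, N + 1)
Omega = np.linspace(0.1, 3.0, 50)   # avoid Omega = 0 (0/0 in the closed form)
direct = np.array([np.sum(np.exp(-1j * W * n)) for W in Omega])
closed = np.sin((2 * N + 1) * Omega / 2) / np.sin(Omega / 2)
print(np.allclose(direct, closed))  # True
```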
◮ To avoid confusion between the DTFT and the DFT, the DFT, being a discrete function of integer k, uses square brackets, as in X[k], while the DTFT, being a continuous and periodic function of real numbers Ω, uses round parentheses, as in X(Ω). ◭

Multiplying by (1/N) e^{j2πmk/N} and summing over k gives
(1/N) Σ_{k=0}^{N−1} X[k] e^{j2πmk/N} = (1/N) Σ_{k=0}^{N−1} Σ_{n=0}^{M−1} x[n] e^{j2π(m−n)k/N}
= (1/N) Σ_{n=0}^{M−1} x[n] Σ_{k=0}^{N−1} e^{j2π(m−n)k/N}
= Σ_{n=0}^{M−1} x[n] δ[m − n]
= x[m] for 0 ≤ m ≤ M − 1, and 0 for M ≤ m ≤ N − 1.   (2.91)
Similarly,
X∗[2] = X[4 − 2] = X[2] = −2,
which is real-valued. This conjugate-symmetry property follows from the definition of the DFT given by Eq. (2.89a):
X∗[k] = Σ_{n=0}^{N−1} x[n] e^{j2πnk/N}   (2.99)
and
X[N − k] = Σ_{n=0}^{N−1} x[n] e^{−j2πn(N−k)/N} = Σ_{n=0}^{N−1} x[n] e^{−j2πn} e^{j2πnk/N}.   (2.100)
Since n is an integer, e^{−j2πn} = 1 and Eq. (2.100) reduces to
X[N − k] = Σ_{n=0}^{N−1} x[n] e^{j2πnk/N} = X∗[k].

B. Use of DFT for Convolution

The convolution property of the DTFT extends to the DFT after some modifications. Consider two signals, x1[n] and x2[n], with N-point DFTs X1[k] and X2[k]. From Eq. (2.89b), the inverse DFT of their product is
DFT⁻¹(X1[k] X2[k]) = (1/N) Σ_{k=0}^{N−1} (X1[k] X2[k]) e^{jk(2π/N)n}
= (1/N) Σ_{k=0}^{N−1} e^{jk(2π/N)n} [Σ_{n1=0}^{N−1} x1[n1] e^{−jk(2π/N)n1}] · [Σ_{n2=0}^{N−1} x2[n2] e^{−jk(2π/N)n2}].   (2.101)
Rearranging the order of the summations gives
DFT⁻¹(X1[k] X2[k]) = Σ_{n1=0}^{N−1} x1[n1] x2[(n − n1)N],   (2.103)
where (n − n1)N means (n − n1) reduced mod N (i.e., reduced by the largest integer multiple of N without (n − n1) becoming negative).

C. DFT and Cyclic Convolution

Because of the mod N reduction cycle, the expression on the right-hand side of Eq. (2.103) is called the cyclic or circular convolution of signals x1[n] and x2[n]. The terminology helps distinguish it from the traditional linear convolution of two nonperiodic signals. The symbol commonly used to denote cyclic convolution is ⊛.
Combining Eqs. (2.101) and (2.103) leads to
yc[n] = x1[n] ⊛ x2[n] = Σ_{n1=0}^{N−1} x1[n1] x2[(n − n1)N] = DFT⁻¹(X1[k] X2[k]) = (1/N) Σ_{k=0}^{N−1} X1[k] X2[k] e^{jk(2π/N)n}.   (2.104)

The cyclic convolution yc[n] can certainly be computed by applying Eq. (2.104), but it can also be computed from the linear convolution x1[n] ∗ x2[n] by aliasing the latter. To illustrate, suppose x1[n] and x2[n] are both of duration N. The linear convolution of the two signals
y[n] = x1[n] ∗ x2[n]   (2.105)
is of duration 2N − 1, extending from n = 0 to n = 2N − 2. Aliasing y[n] means defining z[n], the aliased version of y[n], as
z[n] = y[n] + y[n + N], n = 0, 1, ..., N − 1,   (2.106)
with y[n + N] taken as zero when n + N > 2N − 2.
Next, we zero-pad x1[n] and x2[n] so that their durations are equal to or greater than Nc. As we will see in Section 2-8 on how the fast Fourier transform (FFT) is used to compute the DFT, it is advantageous to choose the total length of the zero-padded signals to be M such that M ≥ Nc, and simultaneously M is a power of 2.
The zero-padded signals are defined as
x′1[n] = {x1[n], 0, ..., 0} (the N1 values of x1[n] followed by M − N1 zeros),   (2.109a)
x′2[n] = {x2[n], 0, ..., 0} (the N2 values of x2[n] followed by M − N2 zeros),   (2.109b)
and their M-point DFTs are X′1[k] and X′2[k], respectively. The linear convolution y[n] can now be computed by a modified version of Eq. (2.104), namely
y[n] = x′1[n] ∗ x′2[n] = DFT⁻¹{X′1[k] X′2[k]} = (1/M) Σ_{k=0}^{M−1} X′1[k] X′2[k] e^{j2πnk/M}.   (2.110)
Note that the DFTs can be computed using M-point DFTs of x′1[n] and x′2[n], since using an M-point DFT performs the zero-padding automatically.

X1[k] X2[k] = {10 × 11, (−2 + j2)(3 − j2), 2 × 3, (−2 − j2)(3 + j2)} = {110, −2 + j10, 6, −2 − j10}.
Application of Eq. (2.104) leads to
x1[n] ⊛ x2[n] = {28, 21, 30, 31}.

(b) Per Eq. (2.71d), the linear convolution of
x1[n] = {2, 1, 4, 3}
and
x2[n] = {5, 3, 2, 1}
is
y[n] = x1[n] ∗ x2[n] = Σ_{i=0}^{3} x1[i] x2[n − i] = {10, 11, 27, 31, 18, 10, 3}.
Per Eq. (2.106),
z[n] = {y[0] + y[4], y[1] + y[5], y[2] + y[6], y[3]} = {10 + 18, 11 + 10, 27 + 3, 31} = {28, 21, 30, 31}.
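Both routes to the cyclic convolution — the DFT product of Eq. (2.104) and the aliased linear convolution of Eq. (2.106) — can be verified with numpy for the signals of this example:

```python
import numpy as np

# Cyclic convolution of x1 = {2,1,4,3} and x2 = {5,3,2,1}, two ways.
x1 = np.array([2, 1, 4, 3])
x2 = np.array([5, 3, 2, 1])
N = len(x1)

yc = np.fft.ifft(np.fft.fft(x1) * np.fft.fft(x2)).real  # DFT route, Eq. (2.104)
y = np.convolve(x1, x2)                                 # linear convolution, length 2N-1
z = y[:N].astype(float)
z[:N - 1] += y[N:]                                      # alias: z[n] = y[n] + y[n+N]
print(np.round(yc).astype(int).tolist())                # [28, 21, 30, 31]
print(z.astype(int).tolist())                           # [28, 21, 30, 31]
```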
Example 2-7: DFT Convolution

From Eq. (2.89a) with N = Nc = 4, the 4-point DFT of x′1[n] = {4, 5, 0, 0} is
X′1[k] = Σ_{n=0}^{3} x′1[n] e^{−jkπn/2}, k = 0, 1, 2, 3,
which gives
X′1[0] = 9, X′1[1] = 4 − j5, X′1[2] = −1, X′1[3] = 4 + j5,
and
X′2[0] = 6, X′2[1] = −2 − j2, X′2[2] = 2, X′2[3] = −2 + j2.

The result is identical to the answer obtained earlier in part (a). For simple signals like those in this example, the DFT method involves many more steps than does the straightforward convolution method of part (a), but for the type of signals used in practice, the DFT method is computationally superior.
Exercise 2-13: Compute the 4-point DFT of {4, 3, 2, 1}.
Answer: {10, 2 − j2, 2, 2 + j2}. (See IP)

The DFT computations use the shorthand notation
WN = e^{−j2π/N},   (2.111a)
WN^{nk} = e^{−j2πnk/N},   (2.111b)
and
WN^{−nk} = e^{j2πnk/N}.   (2.111c)
2-8 Fast Fourier Transform (FFT)

◮ The fast Fourier transform (FFT) is a computational algorithm used to compute the discrete Fourier transform (DFT) of discrete signals. Strictly speaking, the FFT is not a transform, but rather an algorithm for computing the transform. ◭

As was mentioned earlier, the fast Fourier transform (FFT) is a highly efficient algorithm for computing the DFT of discrete-time signals. An N-point DFT performs a linear transformation from an N-long discrete-time vector x[n] into an N-long frequency-domain vector X[k] for k = 0, 1, ..., N − 1. Computation of each X[k] involves N complex multiplications, so the total number of multiplications required to perform the DFT for all X[k] is N². This is in addition to N(N − 1) complex additions. For N = 512, for example, direct implementation of the DFT operation requires 262,144 multiplications and 261,632 complex additions. For small N, these numbers are smaller, since multiplication by any of {1, −1, j, −j} does not count as a true multiplication.

Using the shorthand notation of Eq. (2.111), the summations for the DFT and its inverse given by Eq. (2.89) assume the form
X[k] = Σ_{n=0}^{N−1} x[n] WN^{nk}, k = 0, 1, ..., N − 1,   (2.112a)
and
x[n] = (1/N) Σ_{k=0}^{N−1} X[k] WN^{−nk}, n = 0, 1, ..., N − 1.   (2.112b)
In this form, the N-long vector X[k] is given in terms of the N-long vector x[n], and vice versa, with WN^{nk} and WN^{−nk} acting as weighting coefficients.

For a 2-point DFT,
N = 2,
W2^{0k} = e^{−j0} = 1,
and
W2^{1k} = e^{−jkπ} = (−1)^k.
2-8 FAST FOURIER TRANSFORM (FFT) 77
Table 2-10 Comparison of number of complex computations required by a standard DFT and an FFT, using the formulas in the bottom row.

N        Multiplications                Additions
         Standard DFT    FFT            Standard DFT    FFT
2        4               1              2               2
4        16              4              12              8
8        64              12             56              24
16       256             32             240             64
...      ...             ...            ...             ...
512      262,144         2,304          261,632         4,608
1,024    1,048,576       5,120          1,047,552       10,240
2,048    4,194,304       11,264         4,192,256       22,528

N        N²              (N/2) log2 N   N(N − 1)        N log2 N
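The bottom-row formulas of Table 2-10 can be reproduced with a small helper (a hypothetical function; it assumes N is a power of 2):

```python
import math

# Reproduce the bottom-row formulas of Table 2-10: a standard N-point DFT
# costs N^2 multiplications and N(N-1) additions, versus (N/2) log2(N)
# multiplications and N log2(N) additions for the FFT.
def op_counts(N):
    logN = int(math.log2(N))
    return N * N, (N // 2) * logN, N * (N - 1), N * logN

for N in (2, 4, 8, 16, 512, 1024, 2048):
    print(N, op_counts(N))   # matches the rows of Table 2-10
```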
divided into four 4-point DFTs, which are just additions and subtractions. This conquers the 16-point DFT by dividing it into 4-point DFTs and additional MADs (multiply-adds).

[Butterfly diagram: a 4-point DFT computed from the 2-point DFTs of xe[n] = {x[0], x[2]} and xo[n] = {x[1], x[3]}, followed by a recomposition stage with twiddle factor W4^{1k}.]

X[k] = [xe[0] + (−1)^k xe[1]] + W4^{1k} [xo[0] + (−1)^k xo[1]],   (2.119)
where the first bracketed term is the 2-point DFT of xe[n] and the second is the 2-point DFT of xo[n].

A. Dividing a 16-Point DFT

We now show that the 16-point DFT can be computed for even values of k using an 8-point DFT of (x[n] + x[n + 8]) and for odd values of k using an 8-point DFT of the modulated signal (x[n] − x[n + 8]) e^{−j2πn/16}. Thus, the 16-point DFT can be computed as an 8-point DFT (for even values of k) and as a modulated 8-point DFT (for odd values of k).
For even indices k = 2k′,
X[2k′] = Σ_{n=0}^{15} x[n] e^{−j2π(2k′/16)n} = Σ_{n=0}^{7} x[n] e^{−j2π(2k′/16)n} + Σ_{n=8}^{15} x[n] e^{−j2π(2k′/16)n}.   (2.120)
= DFT({e− j2π (1/16)n (x[n] − x[n + 8]), n = 0, . . . , 7}). Note the conjugate symmetry in the second and third lines:
(2.123) X[7] = X∗ [1], X[6] = X∗ [2], and X[5] = X∗ [3].
So for odd values of k, the 16-point DFT of x[n] is the 8-point
• This result agrees with direct MATLAB computation using
DFT of { e− j(2π /16)n(x[n] − x[n + 8]), n = 0, . . . , 7 }. The signal
{ x[n] − x[n + 8], n = 0, . . . , 7 } has been modulated through
multiplication by e− j(2π /16)n. The multiplications by e− j(2π /16)n fft([7 1 4 2 8 5 3 6]).
are known as twiddle multiplications (mults) by the twiddle
factors e− j(2π /16)n.
2-8.4 Dividing Up a 2N-Point DFT

We now generalize the procedure to a 2N-point DFT by dividing it into two N-point DFTs and N twiddle mults.

(1) For even indices k = 2k′ we have:

X[2k′] = ∑_{n=0}^{N−1} (x[n] + x[n + N]) e^{−j2πnk′/N}.
avoid one or more zeros in H̃[k], but it may also introduce new ones. It may be necessary to try multiple values of N to satisfy the condition that H̃[k] ≠ 0 for all k.

Y[0] = 6(1) + 19(1) + 32(1) + 21(1) = 78,
Y[1] = 6(1) + 19(−j) + 32(−1) + 21(j) = −26 + j2,
Y[2] = 6(1) + 19(−1) + 32(1) + 21(−1) = −2,
Y[3] = 6(1) + 19(j) + 32(−1) + 21(−j) = −26 − j2.

Solution: The two-trumpets signal time-waveform is shown in Fig. 2-18(a), and the corresponding spectrum is shown in Fig. 2-18(b). We note that the spectral lines occur in pairs of harmonics, with the lower harmonic of each pair associated with
82 CHAPTER 2 REVIEW OF 1-D SIGNALS AND SYSTEMS
{ X(k∆f) } if its sampling rate Sf = 1/∆f > 2(T/2) = T.

In the sequel, we use the minimum sampling intervals ∆t = 1/F and ∆f = 1/T. Finer discretization can be achieved by simply increasing F and/or T. In practice, F and/or T is (are) increased slightly so that N = FT is an odd integer, which makes the factor M = (N − 1)/2 also an integer (but not necessarily an odd integer). The factor M is related to the order of the DFT, which has to be an integer.

The Fourier transform of the synthetic sampled signal xs(t), defined in Eq. (2.43) and repeated here as

xs(t) = ∑_{n=−∞}^{∞} x(n∆t) δ(t − n∆t),  (2.134)

was computed in Eq. (2.53), and also repeated here as

Xs(f) = ∑_{n=−∞}^{∞} x(n∆t) e^{−j2πfn∆t}.  (2.135)

Setting f = k∆f gives

Xs(k∆f) = ∑_{n=−∞}^{∞} x(n∆t) e^{−j2π(k∆f)(n∆t)}.  (2.136)

Noting that x(t) = 0 for |t| > T/2 and X(f) = 0 for |f| > F/2, we restrict the ranges of n and k to

|n| ≤ (T/2)/∆t = FT/2 = N/2  (2.137a)

and

|k| ≤ (F/2)/∆f = FT/2 = N/2.  (2.137b)

Next, we introduce factor M defined as

M = (N − 1)/2,  (2.138)

and we note that if N is an odd integer, M is guaranteed to be an integer. In view of Eq. (2.137), the ranges of n and k become n, k = −M, …, M. Upon substituting

∆t ∆f = 1/(FT) = 1/N = 1/(2M + 1)  (2.139)

in the exponent of Eq. (2.136), the expression becomes

Xs(k∆f) = ∑_{n=−M}^{M} x(n∆t) e^{−j2π(k∆f)(n∆t)} = ∑_{n=−M}^{M} x(n∆t) e^{−j2πnk/(2M+1)},  |k| ≤ M.  (2.140)

This expression looks like a DFT of order 2M + 1. Recall from the statement in connection with Eq. (2.47) that the spectrum Xs(f) of the sampled signal includes the spectrum X(f) of the continuous-time signal (multiplied by the sampling rate St), plus additional copies repeated every ±St along the frequency axis. With St = 1/∆t = F in the present case,

Xs(f) = F X(f),  for |f| < F,  (2.141)

from which we deduce that

X(k∆f) = Xs(f) ∆t = ∑_{n=−M}^{M} x(n∆t) e^{−j2πnk/(2M+1)} ∆t,  |k| ≤ M.  (2.142)

Ironically, this is the same result that would be obtained by simply discretizing the definition of the continuous-time Fourier transform! But this derivation shows that discretization gives the exact result if x(t) is time- and bandlimited.

Example 2-11: Computing CTFT by DFT

Use the DFT to compute the Fourier transform of the continuous Gaussian signal

x(t) = (1/√(2π)) e^{−t²/2}.

Solution: Our first task is to assign realistic values for the signal duration T and the width of its spectrum F. It is an "educated" trial-and-error process. At t = 4, x(t) = 0.00013, so we will assume that x(t) ≈ 0 for |t| > 4. Since x(t) is symmetrical with respect to the vertical axis, we assign

T = 2 × 4 = 8 s.

The Fourier transform of x(t) is X(f) = e^{−2π²f²}. By trial and error, we determine that F = 1.2 Hz is sufficient to characterize X(f). The combination gives

N = TF = 8 × 1.2 = 9.6.
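Example 2-11 can be reproduced numerically. In the sketch below (Python/NumPy, illustrative), N is bumped up from 9.6 to the nearest odd integer, N = 11, by keeping T = 8 s and letting F = N/T = 1.375 Hz, so that M = (N − 1)/2 = 5 is an integer as the derivation requires; Eq. (2.142) is then compared against the exact transform X(f) = e^{−2π²f²}:

```python
import numpy as np

T, N = 8.0, 11               # N = FT rounded up to an odd integer
F = N / T                    # 1.375 Hz
dt, df = 1 / F, 1 / T        # minimum sampling intervals
M = (N - 1) // 2             # M = 5

n = np.arange(-M, M + 1)
k = np.arange(-M, M + 1)
x = np.exp(-(n * dt) ** 2 / 2) / np.sqrt(2 * np.pi)   # samples of x(t)

# Eq. (2.142): X(k*df) ~ dt * sum_n x(n*dt) exp(-j 2 pi n k / (2M+1))
X_dft = dt * np.exp(-2j * np.pi * np.outer(k, n) / N) @ x
X_exact = np.exp(-2 * np.pi**2 * (k * df) ** 2)       # X(f) = e^{-2 pi^2 f^2}

print(np.max(np.abs(X_dft - X_exact)))   # small truncation error (~1e-3 or less)
```

The residual error comes only from truncating the Gaussian at |t| > 4 s; it shrinks further if T and F are increased.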
Figure 2-19 Comparison of the exact values (blue circles) and the DFT-computed values (red crosses) of the continuous-time Fourier transform of a Gaussian signal, plotted as X[k] versus k for k = −4, …, 4.
Summary

Concepts

• Many 2-D concepts can be understood more easily by reviewing their 1-D counterparts. These include: LTI systems, convolution, sampling, and continuous-time, discrete-time, and discrete Fourier transforms.

• The DTFT is periodic with period 2π.

• Continuous-time signals can be sampled to discrete-time signals, on which discrete-time signal processing can be performed.

• The response of an LTI system with impulse response h(t) to input x(t) is output y(t) = h(t) ∗ x(t), and similarly in discrete time.

• The response of an LTI system with impulse response h(t) to input A cos(2πf0 t + θ) is

A |H(f0)| cos(2πf0 t + θ + ∠H(f0)),

where H(f) is the Fourier transform of h(t), and similarly in discrete time.
Mathematical Formulae

Impulse:  δ(t) = lim_{ε→0} (1/(2ε)) rect(t/(2ε))

Energy of x(t):  E = ∫_{−∞}^{∞} |x(t)|² dt

Convolution:  y(t) = h(t) ∗ x(t) = ∫_{−∞}^{∞} h(τ) x(t − τ) dτ

Convolution (discrete time):  y[n] = h[n] ∗ x[n] = ∑_{i=−∞}^{∞} h[i] x[n − i]

Fourier transform:  X(f) = ∫_{−∞}^{∞} x(t) e^{−j2πft} dt

Inverse Fourier transform:  x(t) = ∫_{−∞}^{∞} X(f) e^{j2πft} df

Sinc function:  sinc(x) = sin(πx)/(πx)

Ideal lowpass filter impulse response:  h(t) = 2fc sinc(2fc t)

Sampling theorem:  Sampling rate S = 1/∆ > 2B if X(f) = 0 for |f| > B

Sinc interpolation formula:  x(t) = ∑_{n=−∞}^{∞} x(n∆) sinc(S(t − n∆))

Discrete-time Fourier transform (DTFT):  X(Ω) = ∑_{n=−∞}^{∞} x[n] e^{−jΩn}

Inverse DTFT:  x[n] = (1/(2π)) ∫_{−π}^{π} X(Ω) e^{jΩn} dΩ

Discrete-time sinc:  h[n] = (Ω0/π) sinc(Ω0 n/π)

Discrete sinc:  X(Ω) = sin((2N + 1)Ω/2) / sin(Ω/2)

Discrete Fourier transform (DFT):  X[k] = ∑_{n=0}^{N−1} x[n] e^{−j2πnk/N}

Inverse DFT:  x[n] = (1/N) ∑_{k=0}^{N−1} X[k] e^{j2πnk/N}

Cyclic convolution:  yc[n] = x1[n] ⊛ x2[n] = ∑_{n1=0}^{N−1} x1[n1] x2[(n − n1)_N]
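The cyclic convolution above pairs with the DFT: the N-point DFT of a cyclic convolution equals the product of the N-point DFTs of the two signals. A short check of this property (Python/NumPy, illustrative):

```python
import numpy as np

def cyclic_conv(x1, x2):
    """yc[n] = sum_{n1} x1[n1] * x2[(n - n1) mod N], the cyclic convolution."""
    N = len(x1)
    return np.array([sum(x1[n1] * x2[(n - n1) % N] for n1 in range(N))
                     for n in range(N)])

x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([5.0, 6.0, 7.0, 8.0])
yc = cyclic_conv(x1, x2)                                   # direct double sum
via_dft = np.fft.ifft(np.fft.fft(x1) * np.fft.fft(x2)).real  # DFT-domain product
print(np.allclose(yc, via_dft))  # True
```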
Important Terms  Provide definitions or explain the meaning of the following terms:

aliasing, convolution, cyclic convolution, deconvolution, DFT, DTFT, FFT, Fourier transform, frequency response, impulse response, linear time-invariant (LTI), Parseval's theorem, Rayleigh's theorem, sampled signal, sampling theorem, sinc function, spectrum, zero padding.
2.4 If x(t) = sin(2t)/(πt), compute the energy of d²x/dt².

2.5 Compute the energy of e^{−t} u(t) ∗ sin(t)/(πt).

2.6 Show that

∫_{−∞}^{∞} sin²(at)/(πt)² dt = a/π

if a > 0.

Section 2-4: The Sampling Theorem

2.7 The spectrum of the trumpet signal for note G (784 Hz) is negligible above its ninth harmonic. What is the Nyquist sampling rate required for reconstructing the trumpet signal from its samples?

Section 2-5: Review of Discrete-Time Signals and Systems

2.12 Compute the following convolutions:
(a) {1, 2} ∗ {3, 4, 5}
(b) {1, 2, 3} ∗ {4, 5, 6}
(c) {2, 1, 4} ∗ {3, 6, 5}

2.13 If {1, 2, 3} ∗ x[n] = {5, 16, 34, 32, 21}, compute x[n].

2.14 Given the two systems connected in series as

x[n] → h1[n] → w[n] = 3x[n] − 2x[n − 1],

PROBLEMS 87

(a) x[n] ∗ {3, 1, 4, 2} = {6, 23, 18, 57, 35, 37, 28, 6}.
(b) x[n] ∗ {1, 7, 3, 2} = {2, 20, 53, 60, 53, 54, 21, 10}.
(c) x[n] ∗ {2, 2, 3, 6} = {12, 30, 42, 71, 73, 43, 32, 45, 42}.
This chapter extends the 1-D definitions, properties, and transformations covered in the previous chapter into their 2-D equivalents. It also presents certain 2-D properties that have no counterparts in 1-D.

3-1 Displaying Images

In 1-D, a continuous-time signal x(t) is displayed by plotting x(t) versus t. A discrete-time signal x[n] is displayed using a stem plot of x[n] versus n. Clearly such plots are not applicable for 2-D images.

Image intensity f(x, y) of a 2-D image can be displayed either as a 3-D mesh plot (Fig. 3-1(a)), which hides some features of the image and is difficult to create and interpret, or as a grayscale image (Fig. 3-1(b)). In a grayscale image, the image intensity is scaled so that the minimum value of f(x, y) is depicted in black and the maximum value of f(x, y) is depicted in white. If the image is non-negative (f(x, y) ≥ 0), as is often the case, black in the grayscale image denotes zero values of f(x, y). If the image is not non-negative, zero values of f(x, y) appear as a shade of gray.

Figure 3-1 An image displayed in (a) mesh plot format and (b) grayscale format, with the x and y axes marked from 0 to 19.

MATLAB's imagesc(X), colormap(gray) displays the 2-D array X as a grayscale image in which black depicts the minimum value of X and white depicts the maximum value of X.

It is also possible to display an image f(x, y) as a false-color image, in which case different colors denote different values of f(x, y). The relation between color and values of f(x, y) is denoted using a colorbar, to the side or bottom of the image. An example of a false-color display was shown earlier in Fig. 1-11, depicting the infrared intensity emitted by a hot air balloon. We should not confuse a false-color image with a true-color image. Whereas a false-color image is a single grayscale image (1 channel) displayed in color, a true-color image actually is a set of three images (3 channels):

{ fred(x, y), fgreen(x, y), fblue(x, y) },

representing (here) the three primary colors: red, green, and blue. Other triplets of colors, such as yellow, cyan, and magenta, can also be used. Hence, image processing of color images
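The grayscale mapping just described (minimum value → black, maximum value → white) is a linear rescaling of pixel values; it is what MATLAB's imagesc performs before display. A minimal sketch of the same normalization in Python/NumPy (illustrative; the display call itself is omitted):

```python
import numpy as np

def to_gray(f):
    """Linearly rescale image f so min(f) -> 0.0 (black) and max(f) -> 1.0 (white)."""
    f = np.asarray(f, dtype=float)
    return (f - f.min()) / (f.max() - f.min())

f = np.array([[-1.0, 0.0],
              [0.5, 1.0]])      # image that is not non-negative
g = to_gray(f)
print(g.min(), g.max())         # 0.0 1.0
```

Note that the zero pixel of f maps to g = 0.5, i.e., mid-gray, exactly as the text says happens for images that are not non-negative.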
3-2 2-D CONTINUOUS-SPACE IMAGES 91
3-2.1 Fundamental 2-D Images

A. Impulse

A 2-D impulse δ(x, y) is simply

δ(x − ξ, y − η) = δ(x − ξ) δ(y − η).

The sifting property generalizes directly from 1-D to 2-D. In 2-D,

∫_{−∞}^{∞} ∫_{−∞}^{∞} f(ξ, η) δ(x − ξ, y − η) dξ dη = f(x, y).  (3.1)

Figure 3-2 (a) Box image fBox((x − x0)/ℓx, (y − y0)/ℓy) = rect((x − x0)/ℓx) rect((y − y0)/ℓy), of widths (ℓx, ℓy) and centered at (x0, y0), and (b) disk image fDisk((x − x0)/a, (y − y0)/a) of radius a/2 and centered at (x0, y0).
92 CHAPTER 3 2-D IMAGES AND SYSTEMS
C. Disk Image

Being rectangular in shape, the box-image function is suitable for applications involving Cartesian coordinates, such as shifting the box sideways or up and down across the image. Some applications, however, require the use of polar coordinates, in which case the disk-image function is more suitable. The disk image fDisk(x, y) of radius 1/2 is defined as

fDisk(x, y) = 1 for √(x² + y²) < 1/2,  and 0 for √(x² + y²) > 1/2.  (3.3)

Figure 3-3 (a) Origin (0, 0) at top left, with the y axis pointing downward.
3. Image energy

Extending the expression for signal energy from 1-D to 2-D leads to

E = ∫_{−∞}^{∞} ∫_{−∞}^{∞} |f(x, y)|² dx dy.  (3.5)

4. Even-odd decomposition

A real-valued image f(x, y) can be decomposed into its even fe(x, y) and odd fo(x, y) components.

To rotate an image by an angle θ, we define rectangular coordinates (x′, y′) as the rectangular coordinates (x, y) rotated by angle θ. Using the sine and cosine addition formulae, the rotated coordinates (x′, y′) are related to coordinates (x, y) by (Fig. 3-4)

x′ = x cos θ + y sin θ  (3.9)

and

y′ = −x sin θ + y cos θ.  (3.10)

Figure 3-4 Rotation of coordinate system (x, y) by angle θ to coordinate system (x′, y′).

Answer: The 2-D impulse and the disk are invariant to rotation; the box is not invariant to rotation.

3-3 Continuous-Space Systems

A continuous-space system is a device or mathematical model that accepts as an input an image f(x, y) and produces as an
output an image g(x, y):

f(x, y) → SYSTEM → g(x, y).

The image rotation transformation described by Eq. (3.12) is a good example of such a 2-D system.

3-3.1 Linear and Shift-Invariant (LSI) Systems

The definition of the linearity property of 1-D systems (Section 2-2.1) extends directly to 2-D spatial systems, as does the definition of invariance, except that time invariance in 1-D systems becomes shift invariance in 2-D systems. Systems that are both linear and shift-invariant are termed linear shift-invariant (LSI). If the system is linear, and if the system also is shift-invariant, the 2-D superposition integral simplifies to the 2-D convolution given by

g(x, y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(ξ, η) h(x − ξ, y − η) dξ dη = f(x, y) ∗∗ h(x, y),  (3.15a)

where the "double star" in f(x, y) ∗∗ h(x, y) denotes the 2-D convolution of the PSF h(x, y) with the input image f(x, y). In symbolic form, the 2-D convolution is written as

f(x, y) → LSI → g(x, y) = f(x, y) ∗∗ h(x, y).  (3.15b)

◮ A 2-D convolution consists of a convolution in the x direction, followed by a convolution in the y direction, or vice versa. Consequently, the 1-D convolution properties listed in Table 2-3 generalize to 2-D. ◭
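For a separable PSF h(x, y) = h1(x) h2(y), the reduction to 1-D convolutions is particularly direct in the discrete setting: convolve every row with h2, then every column of the result with h1. The sketch below (Python/NumPy, illustrative) checks this against a direct double-sum 2-D convolution:

```python
import numpy as np

def conv2_direct(f, h):
    """Direct 2-D discrete convolution (full output), by the shift-and-add double sum."""
    Fm, Fn = f.shape
    Hm, Hn = h.shape
    g = np.zeros((Fm + Hm - 1, Fn + Hn - 1))
    for i in range(Hm):
        for j in range(Hn):
            g[i:i + Fm, j:j + Fn] += h[i, j] * f
    return g

rng = np.random.default_rng(1)
f = rng.standard_normal((5, 6))
h1 = np.array([1.0, 2.0, 1.0])     # column (y-direction) factor
h2 = np.array([1.0, -1.0])         # row (x-direction) factor
h = np.outer(h1, h2)               # separable PSF: h[i, j] = h1[i] * h2[j]

rows = np.apply_along_axis(np.convolve, 1, f, h2)    # 1-D convolution of each row
g_sep = np.apply_along_axis(np.convolve, 0, rows, h1)  # then of each column
print(np.allclose(conv2_direct(f, h), g_sep))  # True
```

The two results agree exactly; for an L × L separable PSF, the two-pass route costs O(L) operations per output pixel instead of O(L²).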
The 2-D continuous-space Fourier transform (CSFT) of an image f(x, y) is defined by

F(µ, ν) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) e^{−j2π(µx + νy)} dx dy.  (3.16a)

In 1-D, we call the Fourier transform of h(t) the frequency response of the system, H(f):

h(t) ↔ H(f).

The analogous relationships for a 2-D LSI system are

δ(x, y) → LSI → h(x, y),  (3.17a)

h(x, y) ↔ H(µ, ν).  (3.17b)

The CSFT H(µ, ν) is called the spatial frequency response of the LSI system.

◮ The spectrum F(µ, ν) of an image f(x, y) is its 2-D CSFT. The spatial frequency response H(µ, ν) of an LSI 2-D system is the 2-D CSFT of its PSF h(x, y). ◭

◮ As in 1-D, the 2-D Fourier transform of a convolution of two functions is equal to the product of their Fourier transforms:

f(x, y) → LSI → g(x, y) = h(x, y) ∗∗ f(x, y)

implies that

G(µ, ν) = H(µ, ν) F(µ, ν). ◭

B. Separable Images

◮ The CSFT of a separable image f(x, y) = f1(x) f2(y) is itself separable in the spatial frequency domain:

f1(x) f2(y) ↔ F1(µ) F2(ν).  (3.19) ◭

This assertion follows directly from the definition of the CSFT given by Eq. (3.16a):

F(µ, ν) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) e^{−j2π(µx + νy)} dx dy = [∫_{−∞}^{∞} f1(x) e^{−j2πµx} dx] [∫_{−∞}^{∞} f2(y) e^{−j2πνy} dy] = F1(µ) F2(ν).  (3.20)

◮ The CSFT pairs listed in Table 3-1 are all separable functions, and can be obtained by applying Eq. (3.20) to the 1-D Fourier transform pairs listed in Table 2-5. CSFT pairs for non-separable functions are listed later in Table 3-2. ◭
Selected Properties

1. Linearity:  ∑ ci fi(x, y) ↔ ∑ ci Fi(µ, ν)

2. Spatial scaling:  f(ax x, ay y) ↔ (1/|ax ay|) F(µ/ax, ν/ay)

3. Spatial shift:  f(x − x0, y − y0) ↔ e^{−j2πµx0} e^{−j2πνy0} F(µ, ν)

CSFT Pairs

8. δ(x, y) ↔ 1
A grayscale image-display of f(x, y) is shown in Fig. 3-5(a), with pure black representing f(x, y) = −1 and pure white representing f(x, y) = +1. As expected, the image exhibits a repetitive pattern along both x and y, with 19 cycles in 10 cm in the x direction and 9 cycles in 10 cm in the y direction, corresponding to spatial frequencies of µ = 1.9 cycles/cm and ν = 0.9 cycles/cm, respectively. By Eq. (3.20), the CSFT of f(x, y) is

F(µ, ν) = F1(µ) F2(ν) = F{cos(2πµ0 x)} F{cos(2πν0 y)} = (1/4) [δ(µ − µ0) + δ(µ + µ0)] [δ(ν − ν0) + δ(ν + ν0)].  (3.21)
Figure 3-5 (a) Sinusoidal image f(x, y) = cos(2πµ0 x) cos(2πν0 y) with µ0 = 1.9 cycles/cm and ν0 = 0.9 cycles/cm, displayed over 10 cm × 10 cm, and (b) the corresponding Fourier transform F(µ, ν), with axes in cycles/cm spanning −13 to 13, which consists of four impulses (4 white dots) at { ±µ0, ±ν0 }.

Figure 3-6 (a) Rectangular pulse f(t) of unit amplitude extending from −T/2 to T/2, and corresponding (b) magnitude spectrum |F(µ)|, with peak value T and zeros at multiples of 1/T, and (c) phase spectrum φ(µ), which alternates between 0 and 180°.
D. Box Image

The CSFT of a box image centered at (x0, y0) is

F(µ, ν) = ℓx e^{−j2πµx0} sinc(µℓx) ℓy e^{−j2πνy0} sinc(νℓy),  (3.28)

and its phase is

φ(µ, ν) = tan^{−1}( Im[F(µ, ν)] / Re[F(µ, ν)] ).  (3.29b)

A visual example is shown in Fig. 3-8(a) for a square box of sides ℓx = ℓy = ℓ, shifted to the right by L and also downward by L; Fig. 3-8(c) shows the corresponding phase image φ(µ, ν). Inserting x0 = L, y0 = −L, and ℓx = ℓy = ℓ in Eq. (3.28) leads to
Figure 3-9 Lowpass filtering the clown image in (a) to generate the image in (e): (a) clown face image f(x, y); (b) magnitude spectrum of the clown image, F(µ, ν); (c) magnified PSF h(x, y) of the 2-D LPF; (d) spatial frequency response of the 2-D LPF, HLP(µ, ν), with µ0 = 0.44 cycles/mm; (e) lowpass-filtered clown image g(x, y); (f) magnitude spectrum of the filtered image, G(µ, ν). Image f(x, y) is 40 mm × 40 mm and the magnitude spectra extend between −2.5 cycles/mm and +2.5 cycles/mm in both directions.
3-4 2-D CONTINUOUS-SPACE FOURIER TRANSFORM (CSFT) 101
G(µ, ν) = F(µ, ν) HLP(µ, ν).  (3.34)

The magnitude of the result is displayed in Fig. 3-9(f). Upon performing an inverse Fourier transform on G(µ, ν), we obtain g(x, y), the lowpass-filtered image of the clown face shown in Fig. 3-9(e). Image g(x, y) looks like a blurred version of the original image f(x, y) because the lowpass filtering smooths out rapid variations in the image.

Alternatively, we could have obtained g(x, y) directly by performing a convolution in the spatial domain:

g(x, y) = f(x, y) ∗∗ hLP(x, y).  (3.35)

Even though the convolution approach is direct and conceptually straightforward, it is computationally much easier to perform the filtering by transforming to the spatial frequency domain, multiplying the two spectra, and then inverse transforming back to the spatial domain. The actual computation was performed using discretized (pixelated) images, and the Fourier transformations were realized using the 2-D DFT introduced later in Section 3-8.

3-4.2 Image Rotation

◮ Rotating an image by angle θ (Fig. 3-10) in the 2-D spatial domain (x, y) causes its Fourier transform to also rotate by the same angle in the frequency domain (µ, ν). ◭

Figure 3-10 Rotation of axes by θ in the (a) spatial domain causes rotation by the same angle in the (b) spatial frequency domain.

Let (x′, y′) denote the coordinates of the rotated image g(x, y) = f(x′, y′), and let Rθ be the rotation matrix relating (x, y) to (x′, y′):

Rθ = [  cos θ   sin θ
       −sin θ   cos θ ].  (3.38)

The inverse relationship between (x′, y′) and (x, y) is given in terms of the inverse of matrix Rθ:

[ x ]   =  Rθ^{−1} [ x′ ]  =  [ cos θ  −sin θ ] [ x′ ]
[ y ]              [ y′ ]     [ sin θ   cos θ ] [ y′ ].  (3.39)

The 2-D Fourier transform of g(x, y) is given by

G(µ, ν) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) e^{−j2π(µx + νy)} dx dy = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x′, y′) e^{−j2π(µx + νy)} dx dy.  (3.40)
where we define

[ µ′ ]  =  [  cos θ   sin θ ] [ µ ]  =  Rθ [ µ ]
[ ν′ ]     [ −sin θ   cos θ ] [ ν ]        [ ν ].  (3.42)

The newly defined spatial-frequency coordinates (µ′, ν′) are related to the original frequency coordinates (µ, ν) by exactly the same rotation matrix Rθ that was used to rotate image f(x, y) to g(x, y). The consequence of using Eq. (3.42) is that Eq. (3.40) now assumes the standard form for the definition of the Fourier transform of f(x′, y′).

In the spatial frequency domain we define polar coordinates (ρ, φ), with

µ = ρ cos φ,  ν = ρ sin φ,  ρ = √(µ² + ν²),  φ = tan^{−1}(ν/µ).  (3.45)

Figure 3-11 Relationships between Cartesian and polar coordinates in (a) the spatial domain and (b) the spatial frequency domain.

The Fourier transform of f(x, y) is given by Eq. (3.16a) as

F(µ, ν) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) e^{−j2π(µx + νy)} dx dy.  (3.46)

We wish to transform F(µ, ν) into polar coordinates so we may apply it to circularly symmetric images or use it in filtering applications where the filter's frequency response is defined in terms of polar coordinates. To that end, we convert the differential area dx dy in Eq. (3.46) to r dr dθ, and we use the relations given by Eqs. (3.44) and (3.45) to transform the exponent in Eq. (3.46):

µx + νy = (ρ cos φ)(r cos θ) + (ρ sin φ)(r sin θ) = ρr [cos φ cos θ + sin φ sin θ] = ρr cos(φ − θ).  (3.47)

The cosine addition formula was used in the last step. Conversion to polar coordinates leads to

F(ρ, φ) = ∫_{r=0}^{∞} ∫_{θ=0}^{2π} f(r, θ) e^{−j2πρr cos(φ−θ)} r dr dθ.  (3.48a)
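The angular integral that appears when Eq. (3.48a) is specialized to circularly symmetric images, ∫₀^{2π} e^{−j2πρr cos θ} dθ = 2π J0(2πρr), can be verified numerically. In the sketch below (Python/NumPy, illustrative), J0 is evaluated from its power series rather than a library call, and the integral is approximated by a Riemann sum over one full period (the values ρ = 0.7 and r = 0.5 are arbitrary test choices):

```python
import numpy as np
from math import factorial

def J0_series(z, terms=30):
    """Bessel J0 via its power series: sum_k (-1)^k (z/2)^(2k) / (k!)^2."""
    return sum((-1) ** k * (z / 2) ** (2 * k) / factorial(k) ** 2
               for k in range(terms))

z = 2 * np.pi * 0.7 * 0.5                       # 2*pi*rho*r with rho = 0.7, r = 0.5
theta = np.linspace(0.0, 2 * np.pi, 4096, endpoint=False)
integral = np.exp(-1j * z * np.cos(theta)).mean() * 2 * np.pi  # Riemann sum
print(abs(integral - 2 * np.pi * J0_series(z)) < 1e-6)  # True
```

The imaginary part of the integral vanishes by symmetry, which is why the result is (real) 2πJ0(z).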
Figure 3-12 Plot of the Bessel function J0(z) for 0 ≤ z ≤ 20.

Because the integration over θ extends over the range (0, 2π), the integrated value is the same for any fixed value of φ. Hence, for simplicity we set φ = 0, in which case Eq. (3.49) simplifies to

F(ρ) = ∫_{r=0}^{∞} r f(r) [ ∫_{θ=0}^{2π} e^{−j2πρr cos θ} dθ ] dr = 2π ∫_{r=0}^{∞} r f(r) J0(2πρr) dr,  (3.50)

Transform pairs f(r) ↔ F(ρ):

δ(r)/(πr) ↔ 1
rect(r) ↔ J1(πρ)/(2ρ)
J1(πr)/(2r) ↔ rect(ρ)
1/r ↔ 1/ρ
e^{−πr²} ↔ e^{−πρ²}
δ(r − r0) ↔ 2πr0 J0(2πr0 ρ)

For a ring image δ(r − a), the transform is

F(ρ) = 2π ∫_{r=0}^{∞} r δ(r − a) J0(2πρr) dr = 2πa J0(2πρa).  (3.52b)

The image in Fig. 3-13(b) displays the variation of F(ρ) as a function of ρ in the spatial frequency domain for a ring with a = 1 cm (image size is 6 cm × 6 cm).

Table 3-2 provides a list of Fourier transform pairs of rotationally symmetric images.

Figure 3-14 (a) Letters image f(x, y), and (b) letters image f(x′, y′) with x′ = ax and y′ = ay, spatially scaled by a = 4.
3-4.5 Image Examples

A. Scaling

Figure 3-15 (a) Sinusoidal image and (b) its 2-D spectrum; (c) the sinusoidal image rotated by 45° and (d) its rotated spectrum.

C. Gaussian Image

A 2-D Gaussian image is characterized by f(x, y) = e^{−πr²}, where r² = x² + y². From standard tables of integrals, we borrow the following identity for any real variable t:

∫_0^∞ t e^{−a²t²} J0(bt) dt = (1/(2a²)) e^{−b²/(4a²)},  for a² > 0.  (3.55)

The integrals in Eq. (3.54) and Eq. (3.55) become identical if we set t = r, a² = π, and b = 2πρ, which leads to

F(ρ) = e^{−πρ²}  (Gaussian spectrum).  (3.56)
[Plot: sinc(ρ) = sin(πρ)/(πρ) and jinc(ρ) = J1(πρ)/(2ρ).]

Concept Question 3-3: Why do so many 1-D Fourier transform properties generalize directly to 2-D?

Exercise 3-7: Compute the 2-D CSFT of f(x, y) = e^{−πr²}, where r² = x² + y², without using Bessel functions. Hint: f(x, y) is separable.

Answer: f(x, y) = e^{−πr²} = e^{−πx²} e^{−πy²} is separable, so Eq. (3.19) and entry #5 of Table 2-5 (see also entry #13 of Table 3-1) give

F(µ, ν) = e^{−πµ²} e^{−πν²} = e^{−π(µ² + ν²)} = e^{−πρ²}.
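The self-transform property in Exercise 3-7 can also be confirmed by brute-force numerical integration of the CSFT definition, Eq. (3.16a). A sketch (Python/NumPy, illustrative; the grid extent and the test frequencies µ = 0.6, ν = −0.3 are arbitrary choices):

```python
import numpy as np

# Riemann-sum approximation of F(mu, nu) = integral of f(x, y) e^{-j2pi(mu x + nu y)} dx dy
dx = 0.01
x = np.arange(-5, 5, dx)
X, Y = np.meshgrid(x, x)
f = np.exp(-np.pi * (X**2 + Y**2))          # f(x, y) = e^{-pi r^2}

mu, nu = 0.6, -0.3
F_num = np.sum(f * np.exp(-2j * np.pi * (mu * X + nu * Y))) * dx * dx
F_exact = np.exp(-np.pi * (mu**2 + nu**2))  # expected: e^{-pi rho^2}
print(abs(F_num - F_exact) < 1e-6)          # True
```

Because the Gaussian is smooth and decays extremely fast, both the truncation to |x|, |y| ≤ 5 and the discretization contribute negligible error.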
where ∆ is the sampling length (instead of interval) and Sx = 1/∆ is the sampling rate in samples/meter.

If the spectrum of image f(x, y) is bandlimited to B—that is, F(µ, ν) = 0 outside the square region defined by

{ (µ, ν) : 0 ≤ |µ|, |ν| ≤ B },

then the image f(x, y) can be reconstructed from its samples f[n, m], provided the sampling rate is such that S > 2B. As in 1-D, 2B is called the Nyquist sampling rate, although the units are now samples/meter instead of samples/second.

The sampled signal xs(t) defined by Eq. (2.43) generalizes directly to the sampled image:

fs(x, y) = ∑_{n=−∞}^{∞} ∑_{m=−∞}^{∞} f(n∆, m∆) [δ(x − n∆) δ(y − m∆)].  (3.61)

The term inside the square brackets (product of two impulse trains) is called the bed of nails function, because it consists of a 2-D array of impulses, as shown in Fig. 3-17. The 1-D sinc interpolation formula likewise generalizes to

f(x, y) = ∑_{n=−∞}^{∞} ∑_{m=−∞}^{∞} f(n∆, m∆) × [sin(πS(x − n∆)) / (πS(x − n∆))] × [sin(πS(y − m∆)) / (πS(y − m∆))].  (3.62)

As noted earlier in Section 2-4.4 in connection with Eq. (2.51), accurate reconstruction using the sinc interpolation formula is not practical because it requires summations over an infinite number of samples.

3-5.1 Sampling/Reconstruction Examples

The following image examples are designed to illustrate the important role of the Nyquist rate when sampling an image f(x, y) (for storage or digital transmission) and then reconstructing it from its sampled version fs(x, y). We will use the term image reconstruction fidelity as a qualitative measure of how well the reconstructed image frec(x, y) resembles the original image f(x, y).

Reconstruction of frec(x, y) from the sampled image fs(x, y) can be accomplished through either of two approaches:

(a) Application of nearest-neighbor (NN) interpolation (which is a 2-D version of the 1-D nearest-neighbor interpolation), implemented directly on image fs(x, y).

(b) Transforming image fs(x, y) to the frequency domain, applying 2-D lowpass filtering (LPF) to simultaneously preserve the central spectrum of f(x, y) and remove all copies thereof (generated by the sampling process), and then inverse transforming to the spatial domain.

Both approaches will be demonstrated in the examples that follow, and in each case we will compare an image reconstructed from an image sampled at the Nyquist rate with an aliased image reconstructed from an image sampled at a rate well below the Nyquist rate. In all cases, the following parameters apply:

• Size of original (clown) image f(x, y) and reconstructed image frec(x, y): 40 mm × 40 mm

• Sampling interval ∆ (and corresponding sampling rate S = 1/∆) and number of samples N:
  – Nyquist-sampled version: ∆ = 0.2 mm, S = 5 samples/mm, N = 200 × 200
  – Sub-Nyquist-sampled version: ∆ = 0.4 mm, S = 2.5 samples/mm, N = 100 × 100

• Spectrum of original image f(x, y) is bandlimited to B = 2.5 cycles/mm

• Display:
  – Images f(x, y), fs(x, y), frec(x, y): linear scale
  – Image magnitude spectra: logarithmic scale (for easier viewing; magnitude spectra extend over a wide range)

Reconstruction Example 1: Image Sampled at Nyquist Rate

Our first step is to create a bandlimited image f(x, y). This was done by transforming an available clown image to the spatial frequency domain and then applying a lowpass filter with a cutoff frequency of 2.5 cycles/mm. The resultant image and its corresponding spectrum are displayed in Figs. 3-18(a) and (b), respectively.

A. LPF Reconstruction

Given that image f(x, y) is bandlimited to B = 2.5 cycles/mm, the Nyquist rate is 2B = 5 samples/mm. Figure 3-18(c) displays fs(x, y), a version of f(x, y) sampled at the Nyquist rate, so it should be possible to reconstruct the original image with good fidelity. The spectrum of fs(x, y) is displayed in part (d). The spectrum of the sampled image contains the spectrum of the original image (namely, the spectrum in Fig. 3-18(b)), plus periodic copies spaced at an interval S along both directions in the spatial frequency domain. To preserve the central spectrum and simultaneously remove all of the copies, a lowpass filter is applied in step (f) of Fig. 3-18. Finally, application of the 2-D inverse Fourier transform to the spectrum in part (f) leads to the reconstructed image frec(x, y) in part (e). We note that the process yields a reconstructed image with high-fidelity resemblance to the original image f(x, y).

B. NN Reconstruction

Figure 3-19 displays image f(x, y), sampled image fs(x, y), and the NN-reconstructed image f̂(x, y). The last step was realized using a 2-D version of the nearest-neighbor interpolation technique described in Section 2-4.4D. NN reconstruction provides image

f̂(x, y) = fs(x, y) ∗∗ rect(x/∆) rect(y/∆),  (3.63)

which is a 2-D convolution of the 2-D sampled image fs(x, y) with a box function. The spectrum of the NN-interpolated signal is

F̂(µ, ν) = Fs(µ, ν) · [sin(π∆µ)/(πµ)] · [sin(π∆ν)/(πν)].  (3.64)

As in 1-D, the zero crossings of the 2-D sinc functions coincide with the centers of the copies of F(µ, ν) induced by sampling. Consequently, the 2-D sinc functions act like lowpass filters along µ and ν, serving to eliminate the copies almost completely. Comparison of the NN-interpolated image in Fig. 3-19(c) with the original image in part (a) of the figure leads to the conclusion that the NN technique works quite well for images sampled at or above the Nyquist rate.

Reconstruction Example 2: Image Sampled below the Nyquist Rate

A. LPF Reconstruction

The sequence in this example (Fig. 3-20) is identical with that described earlier in Example 1A, except for one very important difference: in the present case the sampling rate is S = 2.5 samples/mm, which is one-half of the Nyquist rate. Consequently, the final reconstructed image in Fig. 3-20(e) bears a poor resemblance to the original image in part (a) of the figure.
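The NN reconstruction of Eq. (3.63) replicates each sample over a ∆ × ∆ cell. On a discrete grid this is a zero-order hold, which reduces to simple block replication. A minimal sketch (Python/NumPy, illustrative; the upsampling factor here plays the role of ∆ expressed in output pixels):

```python
import numpy as np

def nn_reconstruct(fs, factor):
    """Zero-order-hold (nearest-neighbor) upsampling:
    each sample of fs fills a factor x factor block of the output."""
    return np.repeat(np.repeat(fs, factor, axis=0), factor, axis=1)

fs = np.array([[1, 2],
               [3, 4]])
print(nn_reconstruct(fs, 2))
# [[1 1 2 2]
#  [1 1 2 2]
#  [3 3 4 4]
#  [3 3 4 4]]
```

This piecewise-constant output is exactly what convolution of the bed-of-nails samples with the box rect(x/∆) rect(y/∆) produces, and it explains the blocky appearance of NN-reconstructed images.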
3-5 2-D SAMPLING THEOREM 109
Figure 3-18 Reconstruction Example 1A: After sampling image f(x, y) in (a) to generate fs(x, y) in (c), the sampled image is Fourier transformed [(c) to (d)], then lowpass-filtered [(d) to (f)] to remove the copies of the central spectrum, and finally inverse Fourier transformed [(f) to (e)] to generate the reconstructed image frec(x, y). All spectra are displayed in log scale.
B. NN Reconstruction

The sequence in Fig. 3-21 parallels the sequence in Fig. 3-19, except that in the present case we are working with the sub-Nyquist sampled image. As expected, the NN interpolation technique generates a poor-fidelity reconstruction, just like the LPF-reconstructed version.

Figure 3-20 Reconstruction Example 2A: Image f(x, y) is sampled at half the Nyquist rate (S = 2.5 samples/mm compared with 2B = 5 samples/mm). Consequently, the reconstructed image in (e) bears a poor resemblance to the original image in (a). All spectra are displayed in log scale.
3-6 2-D Discrete Space

3-6.1 Discrete-Space Images

A discrete-space image represents a physical quantity that varies with discrete space [n, m], where n and m are dimensionless integers. Such an image usually is generated by sampling a continuous-space image f(x, y) at a spatial interval ∆s along the x and y directions. The sampled image is defined by

f[n, m] = f(n∆s, m∆s).  (3.65)

Figure 3-23 (a) Hexagonal tiling and (b) corresponding spectrum Fs(ρ, φ).

A. Image Axes

As noted earlier in connection with continuous-space images, multiple different formats are used in both continuous- and discrete-space to define image coordinates. We illustrate the most common of these formats in Fig. 3-25. In the top of the figure, we show pixel values for a 10 × 10 array.
Figure 3-24 (a) Radially bandlimited image f(r, θ); (b) spectrum F(ρ, φ) of image f(r, θ); (c) hexagonally sampled image fs(r, θ), sampled at 2ρ0; (d) spectrum Fs(ρ, φ) of the sampled image, which is then lowpass-filtered and inverse Fourier transformed.
0 0 0 0 0 0 0 0 0 0
0 3 5 7 9 10 11 12 13 14
0 5 10 14 17 20 23 25 27 29
0 8 15 21 26 30 34 37 40 43
0 10 20 27 34 40 45 50 54 57
0 10 20 27 34 40 45 50 54 57
0 8 15 21 26 30 34 37 40 43
0 5 10 14 17 20 23 25 27 29
0 3 5 7 9 10 11 12 13 14
0 0 0 0 0 0 0 0 0 0
Pixel values
Figure 3-25 The four color images are identical in pixel values, but they use different formats for the location of the origin and for the coordinate directions of image f[n, m]: (a) top-left corner format, with the origin at the top left, n running 0–9 to the right and m running 0–9 downward; (b) bottom-left corner format, with m running upward; (c) center-of-image format, with n and m running from −5 to 4; and (d) MATLAB format, which represents X(m′, n′) with indices m′ and n′ running 1–10.
through (d), we display the same color maps corresponding to the pixel array, except that the [n, m] coordinates are defined differently, namely:

Top-left corner format

Figure 3-25(a): [n, m] starts at [0, 0] and both integers extend to 9, the origin is located at the upper left-hand corner, m increases downward, and n increases to the right. For an (M × N) image, f[n, m] is defined as

{ f[n, m], 0 ≤ n ≤ N − 1, 0 ≤ m ≤ M − 1 }    (3.66a)

or, equivalently,

          | f[0, 0]       f[1, 0]       ...  f[N − 1, 0]     |
f[n, m] = | f[0, 1]       f[1, 1]       ...  f[N − 1, 1]     |    (3.66b)
          | ...           ...                ...             |
          | f[0, M − 1]   f[1, M − 1]   ...  f[N − 1, M − 1] |

MATLAB format

In MATLAB, an (M × N) image is defined as

{ X(m′, n′), 1 ≤ m′ ≤ M, 1 ≤ n′ ≤ N }.    (3.68)

MATLAB uses the top-left corner format, except that its indices n′ and m′ start at 1 instead of 0. Thus, the top-left corner is (1, 1) instead of [0, 0]. Also, m′, the first index in X(m′, n′), represents the vertical axis and the second index, n′, represents the horizontal axis, which is the reverse of the index notation represented in f[n, m]. The two notations are related as follows:

f[n, m] = X(m′, n′),    (3.69a)

with

m′ = m + 1,    (3.69b)
n′ = n + 1.    (3.69c)
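The index mapping of Eq. (3.69) is easy to mirror in code. The sketch below uses Python, where a row-major array plays the role of MATLAB's X(m′, n′) but with 0-based indices; the pixel values are made up for illustration.

```python
# A row-major array stores the image as X[row][col] = X[m][n], while the
# common-image format f[n, m] lists the horizontal index n first.
X = [[11, 12, 13],   # row m = 0
     [21, 22, 23]]   # row m = 1   (M = 2 rows, N = 3 columns)

def f(n, m):
    """Common-image-format accessor: first index horizontal, second vertical."""
    return X[m][n]

# Same pixel, two notations: f[2, 1] is row 1, column 2 of X.
print(f(2, 1))   # 23
# In 1-based MATLAB indexing this pixel is X(m' = 2, n' = 3),
# since m' = m + 1 and n' = n + 1.
```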
Figure 3-26 Image f[n − 2, m − 1] is image f[n, m] shifted down by 1 and to the right by 2. (The panels highlight row 5 and column 3 of f[n, m].)
3-7 2-D Discrete-Space Fourier Transform (DSFT)

The 2-D discrete-space Fourier transform (DSFT) is obtained via direct generalization of the 1-D DTFT (Section 2-6) to 2-D. The DSFT consists of a DTFT applied first along m and then along n, or vice versa. By extending the 1-D DTFT definition given by Eq. (2.73a) (as well as the properties listed in Table 2-7) to 2-D, we obtain the following definition for the DSFT F(Ω1, Ω2) and its inverse f[n, m]:

F(Ω1, Ω2) = Σ_{n=−∞}^{∞} Σ_{m=−∞}^{∞} f[n, m] e^{−j(Ω1 n + Ω2 m)},    (3.73a)

f[n, m] = (1/4π²) ∫_{−π}^{π} ∫_{−π}^{π} F(Ω1, Ω2) e^{j(Ω1 n + Ω2 m)} dΩ1 dΩ2.    (3.73b)

The properties of the DSFT are direct 2-D generalizations of the properties of the DTFT, and discrete-time generalizations of the properties of the 2-D continuous-space Fourier transform.

◮ The spectrum of an image f[n, m] is its DSFT F(Ω1, Ω2). The discrete-space frequency response H(Ω1, Ω2) of an LSI system is the DSFT of its point spread function (PSF) h[n, m]. ◭

Example 3-2: DSFT of Clown Image

Use MATLAB to obtain the magnitude image of the DSFT of the clown image.

Solution: The magnitude part of the DSFT is displayed in Fig. 3-27. As expected, the spectrum is periodic with period 2π along both Ω1 and Ω2.

Concept Question 3-7: What is the DSFT used for? Give three applications.
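The 2π-periodicity of the DSFT can be checked numerically. A sketch in Python with NumPy (rather than the book's MATLAB; the small test image values are arbitrary) evaluates Eq. (3.73a) directly:

```python
import numpy as np

def dsft(f, w1, w2):
    """Evaluate F(Omega1, Omega2) = sum_n sum_m f[n, m] e^{-j(Omega1 n + Omega2 m)}
    for a finite image by direct summation (Eq. (3.73a))."""
    F = 0j
    for n in range(f.shape[1]):        # n: horizontal index (column)
        for m in range(f.shape[0]):    # m: vertical index (row)
            F += f[m, n] * np.exp(-1j * (w1 * n + w2 * m))
    return F

f = np.array([[3.0, 1.0, 4.0],
              [1.0, 5.0, 9.0]])        # arbitrary 2 x 3 test image

# Periodicity: shifting either frequency by 2*pi leaves the DSFT unchanged.
F1 = dsft(f, 0.7, 1.1)
F2 = dsft(f, 0.7 + 2 * np.pi, 1.1 - 2 * np.pi)
assert abs(F1 - F2) < 1e-9

# At (0, 0) the DSFT reduces to the sum of all pixel values.
assert abs(dsft(f, 0.0, 0.0) - f.sum()) < 1e-9
```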
3-8 2-D DISCRETE FOURIER TRANSFORM (2-D DFT) 119
Table 3-3 Properties of the (K2 × K1) 2-D DFT. In the time-shift and modulation properties, (k1 − k1′) and (n − n0) must be reduced mod(K1), and (k2 − k2′) and (m − m0) must be reduced mod(K2).

Selected Properties
linear 2-D convolutions h[n, m] ∗∗ f[n, m] can be zero-padded to cyclic convolutions, just as in 1-D.

3-8.2 Conjugate Symmetry for the 2-D DFT

Replacing k1 and k2 in Eq. (3.75) with (K1 − k1) and (K2 − k2), respectively:

F[K1 − k1, K2 − k2]
= Σ_{n=0}^{N−1} Σ_{m=0}^{M−1} f[n, m] exp(−j2π( n(K1 − k1)/K1 + m(K2 − k2)/K2 ))
= Σ_{n=0}^{N−1} Σ_{m=0}^{M−1} f[n, m] exp( j2π( nk1/K1 + mk2/K2 )) × exp(−j2π( nK1/K1 + mK2/K2 ))
= Σ_{n=0}^{N−1} Σ_{m=0}^{M−1} f[n, m] exp( j2π( nk1/K1 + mk2/K2 )) × e^{−j2π(n+m)}
= Σ_{n=0}^{N−1} Σ_{m=0}^{M−1} f[n, m] exp( j2π( nk1/K1 + mk2/K2 )),    (3.79)

where we used e^{−j2π(n+m)} = 1 because n and m are integers. The expression on the right-hand side of Eq. (3.79) is identical to the expression for F[k1, k2] given by Eq. (3.75) except for the minus sign ahead of j. Hence, for a real-valued image f[n, m],

F∗[k1, k2] = F[K1 − k1, K2 − k2],    (3.80)
1 ≤ k1 ≤ K1 − 1;  1 ≤ k2 ≤ K2 − 1,

where (K2 × K1) is the order of the 2-D DFT.

3-8.3 Special Cases

A. f[n, m] is real

If f[n, m] is a real-valued image, the following special cases hold:

(1) F[0, 0] = Σ_{n=0}^{N−1} Σ_{m=0}^{M−1} f[n, m] is real-valued.

(2) F[0, k2] = Σ_{m=0}^{M−1} Σ_{n=0}^{N−1} f[n, m] e^{−j(2π/K2)mk2}, which is the K2-point 1-D DFT of Σ_{n=0}^{N−1} f[n, m].

(3) F[k1, 0] = Σ_{n=0}^{N−1} Σ_{m=0}^{M−1} f[n, m] e^{−j(2π/K1)nk1}, which is the K1-point 1-D DFT of Σ_{m=0}^{M−1} f[n, m].

B. f[n, m] is real and K1 and K2 are even

If also K1 and K2 are even, then the following relations apply:

(4) F[K1/2, k2] = Σ_{m=0}^{M−1} Σ_{n=0}^{N−1} f[n, m] (−1)^n e^{−j(2π/K2)mk2}, which is the K2-point 1-D DFT of Σ_{n=0}^{N−1} f[n, m] (−1)^n, because e^{−j(2π/K1)n(K1/2)} = e^{−jπn} = (−1)^n.

(5) F[k1, K2/2] = Σ_{n=0}^{N−1} Σ_{m=0}^{M−1} f[n, m] (−1)^m e^{−j(2π/K1)nk1}, which is the K1-point 1-D DFT of Σ_{m=0}^{M−1} f[n, m] (−1)^m, because e^{−j(2π/K2)m(K2/2)} = e^{−jπm} = (−1)^m.

(6) F[K1/2, K2/2] = Σ_{n=0}^{N−1} Σ_{m=0}^{M−1} f[n, m] (−1)^{m+n}.

3-9 Computation of the 2-D DFT Using MATLAB

We remind the reader that the notation used in this book represents images f[n, m] defined in Cartesian coordinates, with the origin at the upper left corner, the first element n of the coordinates [n, m] increasing horizontally rightward from the origin, and the second element m of the coordinates [n, m] increasing vertically downward from the origin. To illustrate with an example, let us consider the (3 × 3) image given by

          | f[0, 0]  f[1, 0]  f[2, 0] |   | 3  1  4 |
f[n, m] = | f[0, 1]  f[1, 1]  f[2, 1] | = | 1  5  9 |.    (3.81)
          | f[0, 2]  f[1, 2]  f[2, 2] |   | 2  6  5 |

When stored in MATLAB as array X(m′, n′), the content remains the same, but the indices swap roles and their values start at (1, 1):

            | X(1, 1)  X(1, 2)  X(1, 3) |   | 3  1  4 |
X(m′, n′) = | X(2, 1)  X(2, 2)  X(2, 3) | = | 1  5  9 |.    (3.82)
            | X(3, 1)  X(3, 2)  X(3, 3) |   | 2  6  5 |

Arrays f[n, m] and X(m′, n′) are displayed in Fig. 3-28. Application of Eq. (3.75) with N = M = 3 and K1 = K2 = 3 to the 3 × 3 image defined by Eq. (3.81) leads to

            |  36          −9 + j5.2    −9 − j5.2  |
F[k1, k2] = | −6 − j1.7     9 + j3.5    1.5 + j0.9 |.    (3.83)
            | −6 + j1.7    1.5 − j0.9    9 − j3.5  |

◮ In MATLAB, the command FX=fft2(X,M,N) computes the (M × N) 2-D DFT of array X and stores it in array FX. ◭
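The numbers in Eq. (3.83) and the conjugate-symmetry relation of Eq. (3.80) can be verified numerically. A sketch in Python with NumPy, where np.fft.fft2 plays the role of MATLAB's fft2 and the array is stored row-major like X(m′, n′) but with 0-based indices:

```python
import numpy as np

# The (3 x 3) image of Eq. (3.81).
X = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]], dtype=float)

F = np.fft.fft2(X)     # 2-D DFT, laid out like the matrix in Eq. (3.83)

# F[0, 0] is the sum of all pixel values (special case (1)): 36.
assert abs(F[0, 0] - 36) < 1e-9

# Entries match Eq. (3.83) to the book's two-digit rounding,
# e.g. the exact value -9 + j5.196 is printed as -9 + j5.2.
assert abs(F[0, 1] - (-9 + 5.2j)) < 0.05

# Conjugate symmetry (Eq. (3.80)): F*[k1, k2] = F[K1 - k1, K2 - k2].
K = 3
for a in range(K):
    for b in range(K):
        assert abs(np.conj(F[a, b]) - F[(-a) % K, (-b) % K]) < 1e-9
```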
3 × 3 Image

| 3  1  4 |
| 1  5  9 |
| 2  6  5 |

DFT: fft2(X)

                |  9 − j3.5    −6 + j1.7   1.5 − j0.9 |
Fc[k1c, k2c] =  | −9 − j5.2     36         −9 + j5.2  | = FXC(k2c′, k1c′)
                | 1.5 + j0.9   −6 − j1.7    9 + j3.5  |

Figure 3-28 In common-image format, application of the 2-D DFT to image f[n, m] generates F[k1, k2]. Upon shifting F[k1, k2] along k1 and k2 to center the image, we obtain the center-of-image format represented by Fc[k1c, k2c]. The corresponding sequence in MATLAB starts with X(m′, n′) and concludes with FXC(k2c′, k1c′).
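The center-of-image arrays in Fig. 3-28 are what fftshift produces. A NumPy sketch (np.fft.fftshift mirrors MATLAB's fftshift), continuing with the (3 × 3) image of Eq. (3.81):

```python
import numpy as np

X = np.array([[3, 1, 4],
              [1, 5, 9],
              [2, 6, 5]], dtype=float)

FX  = np.fft.fft2(X)         # top-left corner format: DC term at [0, 0]
FXC = np.fft.fftshift(FX)    # center-of-image format: DC term at the center

assert abs(FX[0, 0] - 36) < 1e-9     # DC term before shifting
assert abs(FXC[1, 1] - 36) < 1e-9    # ...moved to the center of the 3 x 3 array

# Top-left entry of the centered array matches Fig. 3-28: 9 - j3.5 (rounded).
assert abs(FXC[0, 0] - (9 - 3.5j)) < 0.05
```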
The corresponding array in MATLAB, designated FX(k2′, k1′) and displayed in Fig. 3-28, has the same content but with MATLAB indices (k2′, k1′). Also, k2′ increases downward and k1′ increases horizontally. The relationships between MATLAB indices (k2′, k1′) and common-image format indices (k1, k2) are identical in form to those given by Eq. (3.69), namely

k2′ = k2 + 1,    (3.84)
k1′ = k1 + 1.    (3.85)

3-9.1 Center-of-Image Format

In some applications, it is more convenient to work with the 2-D DFT array when arranged in a center-of-image format (Fig. 3-25(c) but with the vertical axis pointing downward) than in the top-left corner format. To convert array F[k1, k2] to a center-of-image format, we need to shift the array elements to the right and downward by an appropriate number of steps so as to locate F[0, 0] in the center of the array.
If we denote the 2-D DFT in the center-of-image format as Fc[k1c, k2c], then its index k1c extends over the range

−(Ki − 1)/2 ≤ k1c ≤ (Ki − 1)/2,  for Ki = odd,    (3.86)

and over −Ki/2 ≤ k1c ≤ Ki/2 − 1 for Ki = even; the same ranges apply to k2c.

◮ In MATLAB, the command FXC=fftshift(FX) shifts array FX to center-of-image format and stores it in array FXC. ◭

In the general case for any integers K1 and K2, transforming the 2-D DFT F[k1, k2] into the center-of-image format Fc[k1c, k2c] entails the following recipe. For

K′i = { Ki/2 − 1     if Ki is even,
      { (Ki − 1)/2   if Ki is odd,    (3.89)

and 0 ≤ k1c, k2c ≤ K′i:

(a) First Quadrant
Fc[k1c, k2c] = F[k1, k2],    (3.90a)
with k1 = k1c and k2 = k2c,

(b) Second Quadrant
Fc[−k1c, k2c] = F[k1, k2],    (3.90b)
with k1 = K1 − k1c, k2 = k2c,

(c) Third Quadrant
Fc[−k1c, −k2c] = F[k1, k2],    (3.90c)
with k1 = K1 − k1c, k2 = K2 − k2c,

(d) Fourth Quadrant
Fc[k1c, −k2c] = F[k1, k2],    (3.90d)
with k1 = k1c, k2 = K2 − k2c.

A. N = M and K1 = K2 = odd

The (3 × 3) image shown in Fig. 3-28 provides an example of an (M × M) image with M being an odd integer. As noted earlier, when the 2-D DFT is displayed in the center-of-image format, the conjugate symmetry about the center of the array becomes readily apparent.

B. N = M and K1 = K2 = even

Let us consider the (4 × 4) image

          | 1  2  3  4 |
f[n, m] = | 2  4  5  3 |.    (3.91)
          | 3  4  6  2 |
          | 4  3  2  1 |

The (4 × 4) 2-D DFT F[k1, k2] of f[n, m], displayed in the upper-left corner format, is

    | F[0, 0]  F[1, 0]  F[2, 0]  F[3, 0] |
F = | F[0, 1]  F[1, 1]  F[2, 1]  F[3, 1] |
    | F[0, 2]  F[1, 2]  F[2, 2]  F[3, 2] |
    | F[0, 3]  F[1, 3]  F[2, 3]  F[3, 3] |

    |  49        −6 − j3    3        −6 + j3 |
  = | −5 − j4     2 + j9   −5 + j2    j      |.    (3.92)
    |  1         −4 + j3   −1        −4 − j3 |
    | −5 + j4    −j        −5 − j2    2 − j9 |
The corresponding center-of-image format is

               | F′[−2, −2]   F′[−1, −2]   F′[0, −2]   F′[1, −2] |
Fc[k1c, k2c] = | F′[−2, −1]   F′[−1, −1]   F′[0, −1]   F′[1, −1] |
               | F′[−2, 0]    F′[−1, 0]    F′[0, 0]    F′[1, 0]  |
               | F′[−2, 1]    F′[−1, 1]    F′[0, 1]    F′[1, 1]  |

               | −1         −4 − j3     1        −4 + j3 |
             = | −5 − j2     2 − j9    −5 + j4   −j      |.    (3.93)
               |  3         −6 + j3    49        −6 − j3 |
               | −5 + j2     j         −5 − j4    2 + j9 |

Exercise 3-15: Why did this book not spend more space on computing the DSFT?

Answer: Because in practice, the DSFT is computed using the 2-D DFT.
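For the even case, fftshift places the DC term at index (K/2, K/2) rather than at an exact geometric center. A NumPy check of Eqs. (3.92) and (3.93), assuming the same 0-based row-major layout as X(m′, n′):

```python
import numpy as np

X = np.array([[1, 2, 3, 4],
              [2, 4, 5, 3],
              [3, 4, 6, 2],
              [4, 3, 2, 1]], dtype=float)

F = np.fft.fft2(X)

# Eq. (3.92): F[0, 0] = 49 (sum of all pixels) and F[0, 1] = -6 - j3.
assert abs(F[0, 0] - 49) < 1e-9
assert abs(F[0, 1] - (-6 - 3j)) < 1e-9

# Eq. (3.93): in center-of-image format the DC term lands at index [2, 2].
Fc = np.fft.fftshift(F)
assert abs(Fc[2, 2] - 49) < 1e-9
assert abs(Fc[0, 0] - (-1)) < 1e-9   # top-left entry of Eq. (3.93)
```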
Summary

Concepts

• Many 2-D concepts are generalizations of 1-D counterparts. These include: LSI systems, convolution, sampling, 2-D continuous-space, 2-D discrete-space, and 2-D discrete Fourier transforms.
• The 2-D DSFT is doubly periodic in Ω1 and Ω2 with periods 2π.
• Rotating an image rotates its 2-D continuous-space Fourier transform (CSFT). The CSFT of a radially symmetric image is radially symmetric.
• Continuous-space images can be sampled to discrete-space images, on which discrete-space image processing can be performed.
• Nearest-neighbor interpolation often works well for interpolating sampled images to continuous space. Hexagonal sampling can also be used.
• Discrete-space images can be displayed in several different formats (the location of the origin differs).
• The response of an LSI system with point spread function h(x, y) to image f(x, y) is output g(x, y) = h(x, y) ∗∗ f(x, y), and similarly in discrete space.

Mathematical Formulae

Impulse: δ(x, y) = δ(x) δ(y) = δ(r)/(πr)

Ideal radial lowpass filter PSF: h(r) = 4ρ0² jinc(2ρ0 r)

Energy of f(x, y): E = ∫_{−∞}^{∞} ∫_{−∞}^{∞} |f(x, y)|² dx dy

2-D Sampling: sampling rate 1/∆ > 2B if F(µ, ν) = 0 for |µ|, |ν| > B

Important Terms  Provide definitions or explain the meaning of the following terms:

aliasing, convolution, CSFT, DFT, DSFT, FFT, linear shift-invariant (LSI), nearest-neighbor interpolation, point spread function, sampled image, sampling theorem, sinc function
PROBLEMS

3.9 Nearest-Neighbor interpolation: Run MATLAB program P39.m. This samples the clown image and reconstructs from

3.15 Compute the spatial frequency response of the system in Problem 3.12.

3.16 Show that the 2-D spatial frequency response of the PSF

          | 1  2  1 |
h[m, n] = | 2  4  2 |
          | 1  2  1 |

Compute, using (2 × 2) 2-D DFTs, the cyclic convolution

y[n, m] = x1[n, m] ⓒⓒ x2[n, m],

where

           | 1  2 |                  | 5  6 |
x1[n, m] = | 3  4 |  and  x2[n, m] = | 7  8 |.
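A 2-D cyclic convolution can be computed by pointwise multiplication of 2-D DFTs. A sketch in Python with NumPy (the exercises themselves use MATLAB) for the (2 × 2) arrays above:

```python
import numpy as np

x1 = np.array([[1, 2],
               [3, 4]], dtype=float)
x2 = np.array([[5, 6],
               [7, 8]], dtype=float)

# Cyclic convolution theorem: the (2 x 2) 2-D DFT of the cyclic convolution
# equals the elementwise product of the two (2 x 2) 2-D DFTs.
y = np.real(np.fft.ifft2(np.fft.fft2(x1) * np.fft.fft2(x2)))

assert np.allclose(y, [[70, 68],
                       [62, 60]])
```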
Chapter 4
Image Interpolation

Contents
Overview, 129
4-1 Interpolation Using Sinc Functions, 129
4-2 Upsampling and Downsampling Modalities, 130
4-3 Upsampling and Interpolation, 133
4-4 Implementation of Upsampling Using 2-D DFT in MATLAB, 137
4-5 Downsampling, 140
4-6 Antialias Lowpass Filtering, 141
4-7 B-Splines Interpolation, 143
4-8 2-D Spline Interpolation, 149
4-9 Comparison of 2-D Interpolation Methods, 150
4-10 Examples of Image Interpolation Applications, 152
Problems, 156
130 CHAPTER 4 IMAGE INTERPOLATION
In reality, an image is finite in size, and so is the number of samples { f(n∆, m∆) }. Consequently, for a square image, the ranges of indices n and m are limited to finite lengths M, in which case the infinite sums in Eq. (4.1) become finite sums:

fsinc(x, y) = Σ_{n=0}^{M−1} Σ_{m=0}^{M−1} f(n∆, m∆) sinc(x/∆ − n) sinc(y/∆ − m).    (4.3)

The summations start at n = 0 and m = 0, consistent with the image display format shown in Fig. 3-3(a), wherein location (0, 0) is at the top-left corner of the image.

The sampled image consists of M × M values—denoted here by f(n∆, m∆)—each of which is multiplied by the product of two sinc functions, one along x and another along y. The value fsinc(x, y) at a specified location (x, y) on the image consists of the sum of M × M terms. In the limit for an image of infinite size, and correspondingly an infinite number of samples along x and y, the infinite summations lead to fsinc(x, y) = f(x, y), where f(x, y) is the original image. That is, the interpolated image is identical to the original image, assuming all along that the sampled image is in compliance with the Nyquist criterion of the sampling theorem. When applying the sinc interpolation formula to a finite-size image, the interpolated image should be a good match to the original image, but it is computationally inefficient when compared with the spatial-frequency domain interpolation technique described later in Section 4-3.2.

The rectangle function, defined earlier in Eq. (2.2), is given by

rect(x/2a) = { 1   for −a < x < a,
             { 0   otherwise.

Parameter a usually is assigned a value of 2 or 3. In MATLAB, the Lanczos interpolation formula can be exercised using the command imresize and selecting Lanczos from the menu. In addition to the plot for sinc(x), Fig. 4-2 also contains plots of the sinc function multiplied by the windowed sinc function, with a = 2 and also with a = 3. The rectangle function is zero beyond |x| = a, so the windowed function stops at |x| = 2 for a = 2 and at |x| = 3 for a = 3. The Lanczos interpolation method provides significant computational improvement over the simple sinc interpolation formula.

Concept Question 4-1: What is the advantage of Lanczos interpolation over sinc interpolation?

Exercise 4-1: If sinc interpolation is applied to the samples { x(0.1n) = cos(2π(6)0.1n) }, what will be the result?

Answer: { x(0.1n) } are samples of a 6 Hz cosine sampled at a rate of 10 samples/second, which is below the Nyquist rate of 12 samples/second. The sinc interpolation will produce a cosine with frequency aliased to (10 − 6) = 4 Hz (see Section 2-4.3).
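The aliasing claim in Exercise 4-1 can be confirmed by checking that the two cosines agree at every sample point; sinc interpolation, which returns the baseband signal consistent with the samples, therefore reconstructs the 4 Hz cosine. A quick NumPy check:

```python
import numpy as np

n = np.arange(50)                        # sample indices, Ts = 0.1 s
x6 = np.cos(2 * np.pi * 6 * 0.1 * n)     # 6 Hz cosine sampled at 10 samples/s
x4 = np.cos(2 * np.pi * 4 * 0.1 * n)     # 4 Hz cosine sampled at the same rate

# The two sample sequences are indistinguishable: 6 Hz aliases to 10 - 6 = 4 Hz.
assert np.allclose(x6, x4)
```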
Figure 4-2 Sinc function sinc(x) (in black) and Lanczos windowed sinc functions sinc(x) sinc(x/2) rect(x/4) (a = 2, in blue) and sinc(x) sinc(x/3) rect(x/6) (a = 3, in red), plotted for −3 ≤ x ≤ 3.
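A Lanczos kernel of the form plotted in Fig. 4-2 is straightforward to implement. A sketch in Python with NumPy (np.sinc computes sin(πx)/(πx), matching the sinc used here; the helper name lanczos is our own):

```python
import numpy as np

def lanczos(x, a=2):
    """Lanczos kernel sinc(x) sinc(x/a) rect(x/2a): the sinc function
    windowed by a stretched sinc, identically zero for |x| >= a."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < a, np.sinc(x) * np.sinc(x / a), 0.0)

# Interpolating properties: unity at x = 0, zero at all other integers,
# and zero beyond the window edge |x| = a.
assert lanczos(0.0) == 1.0
assert abs(lanczos(1.0)) < 1e-12
assert lanczos(2.5) == 0.0
```

Because the kernel vanishes outside |x| < a, each interpolated value needs only 2a neighboring samples per axis, which is the source of the computational saving over full sinc interpolation.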
image is characterized by four parameters:

w × w = w² = image pixel area,
∆o × ∆o = ∆o² = true resolution area.

If the displayed image bears a one-to-one correspondence to the array f[n, m], then the brightness of a given pixel corresponds to the magnitude f[n, m] of the corresponding element in the array, and the pixel dimensions w × w are directly proportional to ∆o × ∆o. For simplicity, we set w = ∆o, which means that the image displayed on the computer screen has the same physical dimensions as the original image f(x, y). The area ∆o² is the true resolution area of the image.

4-2.1 Upsampling Modalities

A. Enlarging Physical Size While Keeping Pixel Size Unchanged

The image configuration in part (a) of Fig. 4-3 depicts what happens when the central image f[n, m] is enlarged in size from (T × T) to (T′ × T′) while keeping the pixel size the same. The process, accomplished by an upsampling operation, leads to an (Mu × Mu) image g[n, m], with Mu > Mo. Since the sampling interval ∆u in the enlarged upsampled image is shorter than the sampling interval in the initial image, ∆o, the Nyquist requirement continues to be satisfied.

B. Increasing Image Array Size While Keeping Physical Size Unchanged

The image displayed in Fig. 4-3(b) is an upsampled version of f[n, m] in which the array size was increased from (M × M) to (Mu × Mu) and the physical size of the image remained the same, but the pixel size is smaller. In the image g[n, m],

T = T′,  w′ = w Mo/Mu,  and  ∆u = ∆o Mo/Mu.    (4.7)

◮ The two images in Fig. 4-3(a) and (b) have identical arrays g[n, m], but they are artificially displayed on computer screens with different pixel sizes. ◭

4-2.2 Downsampling Modalities

[Figure 4-3: an original (Mo × Mo) image and four resampled versions: (a) upsampled image with w′ = w, Mu = 2Mo, T′ = 2T, and ∆u = ∆o/2; (b) upsampled image with Mu = 2Mo, T′ = T, w′ = w/2, and ∆u = ∆o/2; (c) downsampled thumbnail image with w′ = w, Md = Mo/2, T′ = T/2, and ∆d = 2∆o; (d) downsampled image with w′ = 2w, T′ = T, Md = Mo/2, and ∆d = 2∆o.]
4-3 Upsampling and Interpolation

Let f(x, y) be a continuous-space image of size T (meters) by T (meters) whose 2-D Fourier transform F(µ, ν) is bandlimited to B (cycles/m). That is,

F(µ, ν) = 0 for |µ|, |ν| > B.    (4.10)

Image f(x, y) is not available to us, but an (Mo × Mo) sampled version of f(x, y) is available. We define it as the original sampled image

f[n, m] = { f(n∆o, m∆o), 0 ≤ n, m ≤ Mo − 1 },    (4.11)

where ∆o is the associated sampling interval. Moreover, the sampling had been performed at a rate exceeding the Nyquist rate, which requires the choice of ∆o to satisfy the condition

∆o < 1/(2B).    (4.12)

Given that the sampled image is T × T in size, the number of samples Mo along each direction is

Mo = T/∆o.    (4.13)

Next, we introduce a new (yet to be created) higher-density sampled image g[n, m], also T × T in physical dimensions but containing Mu × Mu samples—instead of Mo × Mo samples—with Mu > Mo (which corresponds to the scenario depicted in Fig. 4-3(b)). We call g[n′, m′] the upsampled version of f[n, m]. The goal of upsampling and interpolation, which usually is abbreviated to just “upsampling,” is to compute g[n′, m′] from f[n, m]. Since Mu > Mo, g[n′, m′] is more finely discretized than f[n, m], and the narrower sampling interval ∆u of the upsampled image is

∆u = T/Mu = (Mo/Mu) ∆o.    (4.14)

Since ∆u < ∆o, it follows that the sampling rate associated with g[n′, m′] also satisfies the Nyquist rate. The upsampled image is given by

g[n′, m′] = { f(n′∆u, m′∆u), 0 ≤ n′, m′ ≤ Mu − 1 }.    (4.15)

In a later section of this chapter (Section 4-6) we demonstrate how the finer discretization provided by upsampling is used to compute a rotated or warped version of image f[n, m]. Another application is image magnification, but in that case the primary goal is to increase the image size (from T1 × T1 to T2 × T2), as illustrated in Fig. 4-3(a), rather than to decrease the sampling interval.

Upsampling image f[n, m] to image g[n′, m′] can be accomplished either directly in the discrete spatial domain or indirectly in the spatial frequency domain. We examine both approaches in the subsections that follow.

4-3.1 Upsampling in the Spatial Domain

In practice, image upsampling is performed using the 2-D DFT in the spatial frequency domain because it is much faster and easier computationally than performing the upsampling directly in the spatial domain. Nevertheless, for the sake of completeness, we now provide a succinct presentation of how upsampling is performed in the spatial domain using the sinc interpolation formula.

We start by repeating Eq. (4.3) after replacing ∆ with ∆o and M with Mo:

f(x, y) = Σ_{n=0}^{Mo−1} Σ_{m=0}^{Mo−1} f[n, m] sinc(x/∆o − n) sinc(y/∆o − m).    (4.16)

Here, f[n, m] is the original (Mo × Mo) sampled image available to us, and the goal is to upsample it to an (Mu × Mu) image g[n′, m′], with Mu = LMo, where L is an upsampling factor. In the upsampled image, the sampling interval ∆u is related to the sampling interval ∆o of the original sampled image by

∆u = (Mo/Mu) ∆o = ∆o/L.    (4.17)

To obtain g[n′, m′], we sample f(x, y) at x = n′∆u and y = m′∆u:

g[n′, m′] = f(n′∆u, m′∆u),  0 ≤ n′, m′ ≤ Mu − 1
          = Σ_{n=0}^{Mo−1} Σ_{m=0}^{Mo−1} f[n, m] sinc(n′∆u/∆o − n) sinc(m′∆u/∆o − m)
          = Σ_{n=0}^{Mo−1} Σ_{m=0}^{Mo−1} f[n, m] sinc(n′/L − n) sinc(m′/L − m).    (4.18)

If greater truncation is desired, we can replace the product of sinc functions with the product of the Lanczos functions defined in Eq. (4.4). In either case, application of Eq. (4.18) generates
the upsampled version g[n′, m′] directly from the original sampled version f[n, m]. If ∆u = ∆o/L and L is an integer, then the process preserves the values of f[n, m] while adding new ones in between them. To demonstrate that the upsampling process does indeed preserve f[n, m], let us consider the expression given by Eq. (4.15) for the specific case where n′ = Ln and m′ = Lm:

g[Ln, Lm] = { f(n′∆u, m′∆u), 0 ≤ n′, m′ ≤ Mu − 1 }
          = { f(Ln∆u, Lm∆u), 0 ≤ n, m ≤ Mo − 1 }
          = { f(n∆o, m∆o), 0 ≤ n, m ≤ Mo − 1 } = f[n, m],

where we used the relationships given by Eqs. (4.11) and (4.14). Hence, upsampling by an integer L using the sinc interpolation formula does indeed preserve the existing values of f[n, m], in addition to adding interpolated values between them.

4-3.2 Upsampling in the Spatial Frequency Domain

Instead of using the sinc interpolation formula given by Eq. (4.18), upsampling can be performed much more easily, and with less computation, using the 2-D DFT in the spatial frequency domain. From Eqs. (4.11) and (4.18), the (Mo × Mo) original image f[n, m] and the (Mu × Mu) upsampled image g[n′, m′] are defined as

f[n, m] = { f(n∆o, m∆o), 0 ≤ n, m ≤ Mo − 1 },    (4.19a)
g[n′, m′] = { g(n′∆u, m′∆u), 0 ≤ n′, m′ ≤ Mu − 1 }.    (4.19b)

Note that whereas in the earlier section it proved convenient to distinguish the indices of the upsampled image from those of the original image—so we used [n, m] for the original image and [n′, m′] for the upsampled image—the distinction is no longer needed in the present section, so we will now use indices [n, m] for both images.

From Eq. (3.75), the 2-D DFT of f[n, m] of order (Mo × Mo) and the 2-D DFT of g[n, m] of order (Mu × Mu) are given by

F[k1, k2] = Σ_{n=0}^{Mo−1} Σ_{m=0}^{Mo−1} f[n, m] e^{−j(2π/Mo)(nk1 + mk2)},  0 ≤ k1, k2 ≤ Mo − 1,    (4.20a)

G[k1, k2] = Σ_{n=0}^{Mu−1} Σ_{m=0}^{Mu−1} g[n, m] e^{−j(2π/Mu)(nk1 + mk2)},  0 ≤ k1, k2 ≤ Mu − 1.    (4.20b)

The summations are identical in form, except that the summation for F[k1, k2] extends to (Mo − 1) whereas the summation for G[k1, k2] extends to (Mu − 1).

The goal is to compute g[n, m] by (1) transforming f[n, m] to obtain F[k1, k2], (2) transforming F[k1, k2] to G[k1, k2], and (3) then transforming G[k1, k2] back to the spatial domain to form g[n, m]. Despite the seeming complexity of having to execute a three-step process, the process is computationally more efficient than performing upsampling entirely in the spatial domain.

Upsampling in the discrete frequency domain [k1, k2] entails increasing the number of discrete frequency components from (Mo × Mo) for F[k1, k2] to (Mu × Mu) for G[k1, k2], with Mu > Mo. As we will demonstrate shortly, G[k1, k2] includes all of the elements of F[k1, k2], but it also includes some additional rows and columns filled with zeros.

A. Original Image

Let us start with the sampled image fo(x, y) of continuous image f(x, y) sampled at a sampling interval ∆o and resulting in (Mo × Mo) samples. Per Eq. (3.61), adapted to a finite sum that starts at (0, 0) and ends at (Mo − 1, Mo − 1),

fo(x, y) = Σ_{n=0}^{Mo−1} Σ_{m=0}^{Mo−1} f(n∆o, m∆o) δ(x − n∆o) δ(y − m∆o)
         = Σ_{n=0}^{Mo−1} Σ_{m=0}^{Mo−1} f[n, m] δ(x − n∆o) δ(y − m∆o),    (4.21)

where we used the definition for f[n, m] given by Eq. (4.19a).

Using entry #9 in Table 3-1, the 2-D CSFT Fo(µ, ν) of fo(x, y) can be written as

Fo(µ, ν) = F{ fo(x, y) }
         = Σ_{n=0}^{Mo−1} Σ_{m=0}^{Mo−1} f[n, m] F{ δ(x − n∆o) δ(y − m∆o) }
         = Σ_{n=0}^{Mo−1} Σ_{m=0}^{Mo−1} f[n, m] e^{−j2πµn∆o} e^{−j2πνm∆o}.    (4.22)

The spectrum Fo(µ, ν) of sampled image fo(x, y) is doubly periodic in µ and ν with period 1/∆o, as expected.

Extending the relations expressed by Eqs. (2.47) and (2.54) from 1-D to 2-D, the spectrum Fo(µ, ν) of the sampled image is related to the spectrum F(µ, ν) of continuous image f(x, y) by

Fo(µ, ν) = (1/∆o²) Σ_{k1=−∞}^{∞} Σ_{k2=−∞}^{∞} F(µ − k1/∆o, ν − k2/∆o).    (4.23)
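The sample-preserving property shown above for Eq. (4.18) is easy to verify numerically. A 1-D NumPy sketch (the 2-D sum factors into a product of two such 1-D sums; the sample values are arbitrary):

```python
import numpy as np

f = np.array([1.0, 4.0, 2.0, 7.0, 3.0])   # original samples f[n], Mo = 5
L = 3                                     # upsampling factor, Mu = L * Mo
Mo, Mu = len(f), L * len(f)

# 1-D analog of Eq. (4.18): g[n'] = sum_n f[n] sinc(n'/L - n).
n = np.arange(Mo)
g = np.array([np.sum(f * np.sinc(k / L - n)) for k in range(Mu)])

# Every L-th output sample reproduces an original sample exactly,
# because sinc is 1 at zero and 0 at every other integer.
assert np.allclose(g[::L], f)
```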
The spectrum Fo(µ, ν) of the sampled image consists of copies of the spectrum F(µ, ν) of f(x, y) repeated every 1/∆o in both µ and ν, and also scaled by 1/∆o².

In (µ, ν) space, µ and ν can be both positive or negative. The relation between the spectrum Fo(µ, ν) of the sampled image fo(x, y) and the spectrum F(µ, ν) of the original continuous image f(x, y) assumes different forms for the four quadrants of (µ, ν) space.

For ease of presentation, let Mo be odd. If Mo is even, then simply replace (Mo − 1)/2 with Mo/2 (see Section 4-4.2). Next, we sample µ and ν by setting them to

µ = k1/(Mo∆o)  and  ν = k2/(Mo∆o),  0 ≤ |k1|, |k2| ≤ (Mo − 1)/2.    (4.24)

1. Quadrant 1: µ ≥ 0 and ν ≥ 0

At these values of µ and ν, Fo(µ, ν) becomes

Fo(k1/(Mo∆o), k2/(Mo∆o)) = Σ_{n=0}^{Mo−1} Σ_{m=0}^{Mo−1} f[n, m] e^{−j2πnk1/Mo} e^{−j2πmk2/Mo}
                         = F[k1, k2],  0 ≤ k1, k2 ≤ (Mo − 1)/2.    (4.25)

2. Quadrant 2: µ ≤ 0 and ν ≥ 0

In quadrants 2–4, we make use of the relation

e^{−j2πn(−k1)/Mo} = e^{−j2πnMo/Mo} e^{−j2πn(−k1)/Mo} = e^{−j2πn(Mo−k1)/Mo},

where we used e^{−j2πnMo/Mo} = 1. A similar relation applies to k2. In quadrant 2, µ is negative and ν is positive, so keeping k1, k2 ≥ 0 and redefining µ as µ = −k1/(Mo∆o) leads to

Fo(−k1/(Mo∆o), k2/(Mo∆o)) = Σ_{n=0}^{Mo−1} Σ_{m=0}^{Mo−1} f[n, m] e^{−j2πn(−k1)/Mo} e^{−j2πmk2/Mo}
                          = Σ_{n=0}^{Mo−1} Σ_{m=0}^{Mo−1} f[n, m] e^{−j2πn(Mo−k1)/Mo} e^{−j2πmk2/Mo}
                          = F[Mo − k1, k2],  0 ≤ k1, k2 ≤ (Mo − 1)/2.    (4.26)

3. Quadrant 3: µ ≤ 0 and ν ≤ 0

Redefining µ as µ = −k1/(Mo∆o) and ν as ν = −k2/(Mo∆o) leads to

Fo(−k1/(Mo∆o), −k2/(Mo∆o)) = Σ_{n=0}^{Mo−1} Σ_{m=0}^{Mo−1} f[n, m] e^{−j2πn(−k1)/Mo} e^{−j2πm(−k2)/Mo}
                           = Σ_{n=0}^{Mo−1} Σ_{m=0}^{Mo−1} f[n, m] e^{−j2πn(Mo−k1)/Mo} e^{−j2πm(Mo−k2)/Mo}
                           = F[Mo − k1, Mo − k2],  0 ≤ k1, k2 ≤ (Mo − 1)/2.    (4.27)

4. Quadrant 4: µ ≥ 0 and ν ≤ 0

Upon defining µ as in Eq. (4.24) and redefining ν as ν = −k2/(Mo∆o),

Fo(k1/(Mo∆o), −k2/(Mo∆o)) = Σ_{n=0}^{Mo−1} Σ_{m=0}^{Mo−1} f[n, m] e^{−j2πnk1/Mo} e^{−j2πm(−k2)/Mo}
                          = Σ_{n=0}^{Mo−1} Σ_{m=0}^{Mo−1} f[n, m] e^{−j2πnk1/Mo} e^{−j2πm(Mo−k2)/Mo}
                          = F[k1, Mo − k2],  0 ≤ k1, k2 ≤ (Mo − 1)/2.    (4.28)

The result given by Eqs. (4.25)–(4.28) states that the 2-D CSFT Fo(µ, ν) of the sampled image fo(x, y)—when sampled at the discrete spatial frequency values defined by Eq. (4.24)—is the 2-D DFT of the sampled image f[n, m]. Also, the spectrum Fo(µ, ν) of the sampled image consists of copies of the spectrum F(µ, ν) repeated every 1/∆o in both µ and ν, and also scaled by 1/∆o². Hence, generalizing Eq. (2.54) from 1-D to 2-D gives

Fo(k1/(Mo∆o), k2/(Mo∆o)) = (1/∆o²) F(k1/(Mo∆o), k2/(Mo∆o)),    (4.29)

where F(µ, ν) is the 2-D CSFT of the continuous-space image f(x, y) and 0 ≤ |k1|, |k2| ≤ (Mo − 1)/2.

A parallel analysis applies to the spectrum Fu(µ, ν) of the upsampled image. For quadrant 1 (µ ≥ 0 and ν ≥ 0), in view of the relationship (from Eq. (4.14))

∆u/(Mo∆o) = ∆u/(Mu∆u) = 1/Mu,

upon sampling Fu(µ, ν) at the rates defined by Eq. (4.24), we obtain

Fu(k1/(Mo∆o), k2/(Mo∆o)) = Σ_{n=0}^{Mu−1} Σ_{m=0}^{Mu−1} g[n, m] e^{−j2πnk1/Mu} e^{−j2πmk2/Mu}
                         = G[k1, k2],  0 ≤ k1, k2 ≤ (Mo − 1)/2,    (4.32)

and, for quadrant 3, the same steps lead to

Fu(−k1/(Mo∆o), −k2/(Mo∆o)) = G[Mu − k1, Mu − k2],  0 ≤ k1, k2 ≤ (Mo − 1)/2.    (4.34)
4. Quadrant 4: µ ≥ 0 and ν ≤ 0

Fu(k1/(Mo∆o), −k2/(Mo∆o)) = Σ_{n=0}^{Mu−1} Σ_{m=0}^{Mu−1} g[n, m] e^{−j2πnk1/Mu} e^{−j2πm(−k2)/Mu}
                          = Σ_{n=0}^{Mu−1} Σ_{m=0}^{Mu−1} g[n, m] e^{−j2πnk1/Mu} e^{−j2πm(Mu−k2)/Mu}
                          = G[k1, Mu − k2],  0 ≤ k1, k2 ≤ (Mo − 1)/2.    (4.35)

The result given by Eqs. (4.32)–(4.35) states that the 2-D CSFT Fu(µ, ν) of the upsampled image fu(x, y)—when sampled at the discrete spatial frequency values defined by Eq. (4.24)—is the 2-D DFT of the sampled image g[n, m]. Also, the spectrum Fu(µ, ν) of the sampled image consists of copies of the spectrum F(µ, ν) of f(x, y) repeated every 1/∆u in both µ and ν, and also scaled by 1/∆u². Thus,

Fu(k1/(Mo∆o), k2/(Mo∆o)) = (1/∆u²) F(k1/(Mo∆o), k2/(Mo∆o)).    (4.36)

From Eq. (4.14), Mo∆o = Mu∆u. Combining Eq. (4.29) and Eq. (4.36) shows that

Fu(k1/(Mo∆o), k2/(Mo∆o)) = (Mu²/Mo²) Fo(k1/(Mo∆o), k2/(Mo∆o)).    (4.37)

Hence, the (Mo × Mo) 2-D DFT F[k1, k2] of f[n, m] and the (Mu × Mu) 2-D DFT G[k1, k2] of g[n, m] are related by

G[k1, k2] = (Mu²/Mo²) F[k1, k2],
G[Mu − k1, k2] = (Mu²/Mo²) F[Mo − k1, k2],
G[k1, Mu − k2] = (Mu²/Mo²) F[k1, Mo − k2],
G[Mu − k1, Mu − k2] = (Mu²/Mo²) F[Mo − k1, Mo − k2],
0 ≤ k1, k2 ≤ (Mo − 1)/2.    (4.38)

This leaves G[k1, k2] for Mo ≤ k1, k2 ≤ Mu − 1 to be determined. But Eq. (4.38) shows that these values of G[k1, k2] are samples of Fu(µ, ν) at values of µ, ν for which this spectrum of the sampled signal is zero, since sampling the original signal f(x, y) at above its Nyquist rate separates the copies of its spectrum, leaving bands of zero between copies. Thus

G[k1, k2] = 0,  Mo ≤ k1, k2 ≤ Mu − 1.    (4.39)

4-4 Implementation of Upsampling Using 2-D DFT in MATLAB

In MATLAB, both the image f[n, m] and its 2-D DFT are stored and displayed using the format shown in Fig. 3-25(d), wherein the origin is at the upper left-hand corner of the image, and the indices of the corner pixel are (1, 1).

Image and 2-D DFT Notation

To avoid confusion between the common-image format (CIF) and the MATLAB format, we provide the following list of symbols and definitions:

                          CIF           MATLAB
Original image            f[n, m]       X(m′, n′)
Upsampled image           g[n, m]       Y(m′, n′)
2-D DFT of f[n, m]        F[k1, k2]     FX(k2′, k1′)
2-D DFT of g[n, m]        G[k1, k2]     FY(k2′, k1′)

As noted earlier in Section 3-9, when an image f[n, m] is stored in MATLAB as array X(m′, n′), the two sets of indices are related by

m′ = m + 1,    (4.40a)
n′ = n + 1.    (4.40b)

The indices get interchanged in orientation (the first index of f[n, m] is the horizontal coordinate n, whereas the first index of X(m′, n′) is the row number m′) and are shifted by 1. For example, f[0, 0] = X(1, 1), and f[0, 1] = X(2, 1). While the indices of the two formats are different, the content of array X(m′, n′) is identical with that of f[n, m]. That is, for an
138 CHAPTER 4 IMAGE INTERPOLATION
(Mo × Mo ) image zeros in the “middle” of the array FX. The result is
Mu2
X(m′ , n′ ) = f [n, m] = FY = ×
Mo2
f [0, 0] f [1, 0] ··· f [Mo − 1, 0] (Mu −Mo ) columns
h i z }| { h i
f [0, 1] f [1, 1] ··· f [Mo − 1, 1] Mo −1 Mo +1
. F[0,0]... F 2 ,0 0 ... 0 ... 0 F 2 ,0 ... F[Mo −1,0]
.
.. .. .. ..
.. .. .. .. ..
. . .
.h . . . .
h i i h i h i
f [0, Mo − 1] f [1, Mo − 1] · · · f [Mo − 1, Mo − 1] Mo −1
F 0, Mo2−1 ... F Mo2−1 , Mo2−1 F Mo +1 Mo −1
2 , 2 ... F Mo −1, 2
(4.41)
0. 0.
.. ..
.. .
The MATLAB command FX = fft(X, Mo , Mo ) computes the {
0. . 0 .. 0. }
2-D DFT F[k1 , k2 ] and stores it in array FX(k2′ , k1′ ):
.
.
.
.
h i 0 h i h i 0 h i
FX(k2′ , k1′ ) = F[k1 , k2 ] =
F 0, Mo +1 ... F Mo −1 , Mo +1 Mo +1 Mo +1
Mo +1
2 2 2 F 2 , 2 ... F Mo −1, 2
F[0, 0] F[1, 0] ··· F[Mo − 1, 0] .. .. .. .. ..
.h . . . .
F[0, 1] F[1, 1] ··· F[Mo − 1, 1] i h i
.. .. .. .. . F [0,Mo −1]... F Mo2−1 ,Mo −1 0| 0 0} F Mo −1, 2Mo +1
... F [Mo −1,Mo −1]
{z
. . . . (4.43)
F[0, Mo − 1] F[1, Mo − 1] · · · F[Mo − 1, Mo − 1]
(4.42)
◮ Note that the (Mu − Mo ) columns of zeros start after entry
The goal of upsampling using the spatial-frequency domain F[(Mo − 1)/2, 0], and similarly the (Mu − Mo ) rows of zeros
is to compute g[n, m] from f [n, m] by computing G[k1 , k2 ] from start after F[0, (Mo − 1)/2]. ◭
F[k1 , k2 ] and then applying the inverse DFT to obtain g[n, m].
The details of the procedure are somewhat different depending
on whether the array size parameter M is an odd integer or an Once array FY has been established, the corresponding up-
even integer. Hence, we consider the two cases separately. sampled image g[n, m] is obtained by applying the MATLAB
command Y=real(ifft2(FY,N,N)), where N = Mu and
the “real” is needed to eliminate the imaginary part of Y, which
4-4.1 Mo = Odd Integer may exist because of round-off error in the ifft2.
As a simple example, consider the (3 × 3) array
The recipe for upsampling using the 2-D DFT is as follows:
F[0, 0] F[1, 0] F[2, 0]
1. Given: image { f [n, m], 0 ≤ n, m ≤ Mo −1 }, as represented FX = F[0, 1] F[1, 1] F[2, 1] . (4.44a)
by Eq. (4.41). F[0, 2] F[1, 2] F[2, 2]
In MATLAB, the array FY containing the 2-D DFT G[k1 , k2 ] Application of the inverse 2-D DFT to FY generates array Y in
is obtained from the array FX given by Eq. (4.42) by inserting MATLAB, which is equivalent in content to image g[n, m] in
(Mu − Mo ) rows of zeros and an equal number of columns of common-image format.
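The odd-Mo zero-insertion recipe is compact enough to sketch in a few lines of code. The following is an illustrative numpy sketch, not the book's MATLAB code; the function name and the toy image are ours. It also checks the sample-preservation property of Problem 4.9: when Mu = L·Mo, the upsampled image passes through the original samples, g[nL, mL] = f[n, m].

```python
import numpy as np

def upsample_dft_odd(f, Mu):
    """Upsample an (Mo x Mo) image, Mo odd, to (Mu x Mu) by inserting
    (Mu - Mo) rows and columns of zeros in the middle of its 2-D DFT."""
    Mo = f.shape[0]
    h = (Mo - 1) // 2                  # highest positive frequency index
    F = np.fft.fft2(f)
    G = np.zeros((Mu, Mu), dtype=complex)
    G[:h + 1, :h + 1] = F[:h + 1, :h + 1]   # copy the four low-frequency corners
    G[:h + 1, -h:] = F[:h + 1, -h:]
    G[-h:, :h + 1] = F[-h:, :h + 1]
    G[-h:, -h:] = F[-h:, -h:]
    return np.real((Mu**2 / Mo**2) * np.fft.ifft2(G))

f = np.arange(9.0).reshape(3, 3)       # toy 3x3 image (Mo = 3)
g = upsample_dft_odd(f, 9)             # upsample to 9x9 (L = 3)
```

Because Mo is odd, conjugate symmetry survives the zero insertion, so the result is real up to round-off, and its mean equals the mean of the original image (both equal F[0, 0]/Mo²).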
4-4.2 Mo = Even Integer

For a real-valued (Mo × Mo) image f[n, m] with Mo an odd integer, conjugate symmetry is automatically satisfied for both F[k1, k2], the 2-D DFT of the original image, and G[k1, k2], the 2-D DFT of the upsampled image. However, if Mo is an even integer, application of the recipe outlined in the preceding subsection will violate conjugate symmetry, so we need to modify it. Recall from Eq. (3.80) that for a real-valued image f[n, m], conjugate symmetry requires that

F*[k1, k2] = F[Mo − k1, Mo − k2],   1 ≤ k1, k2 ≤ Mo − 1.   (4.45)

Additionally, in view of the definition for F[k1, k2] given by Eq. (4.20a), the following two conditions should be satisfied:

F[0, 0] = Σ_{n=0}^{Mo−1} Σ_{m=0}^{Mo−1} f[n, m] = real-valued,   (4.46a)

F[Mo/2, Mo/2] = Σ_{n=0}^{Mo−1} Σ_{m=0}^{Mo−1} (−1)^{n+m} f[n, m] = real-valued.   (4.46b)

In the preceding subsection, we inserted the appropriate number of rows and columns of zeros to obtain G[k1, k2] from F[k1, k2]. For Mo equal to an odd integer, the conditions represented by Eqs. (4.45) and (4.46) are satisfied for both F[k1, k2] and G[k1, k2], but they are not satisfied for G[k1, k2] when Mo is an even integer. To demonstrate why the simple zero-insertion procedure is problematic, let us consider the (4 × 4) image

f[n, m] = [ 1  2  3  4
            2  4  5  3
            3  4  6  2
            4  3  2  1 ].   (4.47a)

The (4 × 4) 2-D DFT F[k1, k2] of f[n, m] is

F[k1, k2] = [ 49        −6 − j3    3         −6 + j3
              −5 − j4    2 + j9   −5 + j2     j
               1        −4 + j3   −1         −4 − j3
              −5 + j4   −j        −5 − j2     2 − j9 ].   (4.47b)

This F[k1, k2] has conjugate symmetry.

Let us suppose that we wish to upsample the (4 × 4) image f[n, m] to a (5 × 5) image g[n, m]. Inserting one row of zeros and one column of zeros and multiplying by (5²/4²) would generate

G[k1, k2] = (5²/4²) ×
  [ 49        −6 − j3    0    3         −6 + j3
    −5 − j4    2 + j9    0   −5 + j2     j
     0         0         0    0          0
     1        −4 + j3    0   −1         −4 − j3
    −5 + j4   −j         0   −5 − j2     2 − j9 ].   (4.48)

This G[k1, k2] array does not satisfy conjugate symmetry. Applying the inverse 2-D DFT to G[k1, k2] generates the upsampled image g[n, m]:

g[n, m] = (5²/4²) ×
  [ 0.64           1.05 + j0.19    1.64 − j0.30   2.39 + j0.30   2.27 − j0.19
    1.05 + j0.19   2.01 + j0.22    2.85 − j0.20   2.8  − j0.13   1.82 − j0.19
    1.64 − j0.3    2.38 − j0.44    3.9  + j0.46   3.16 + j0.08   1.31 + j0.39
    2.39 + j0.30   2.37 − j0.066   2.88 + j0.36   2.04 − j0.9    0.84 + j0.12
    2.27 − j0.19   1.89 − j0.25    1.33 + j0.26   0.83 + j0.08   1.18 + j0.22 ],

which is clearly incorrect; all of its elements should be real-valued because the original image f[n, m] is real-valued. Obviously, the upsampling recipe needs to be modified.

A simple solution is to split row F[k1, Mo/2] into 2 rows and to split column F[Mo/2, k2] into 2 columns, which also means that F[Mo/2, Mo/2] gets split into 4 entries. This recipe preserves conjugate symmetry in G[k1, k2]. When applied to F[k1, k2], the recipe yields the (6 × 6) array

G[k1, k2] = (6²/4²) ×
  [ F[0, 0]     F[1, 0]     F[2, 0]/2   0   F[2, 0]/2   F[3, 0]
    F[0, 1]     F[1, 1]     F[2, 1]/2   0   F[2, 1]/2   F[3, 1]
    F[0, 2]/2   F[1, 2]/2   F[2, 2]/4   0   F[2, 2]/4   F[3, 2]/2
    0           0           0           0   0           0
    F[0, 2]/2   F[1, 2]/2   F[2, 2]/4   0   F[2, 2]/4   F[3, 2]/2
    F[0, 3]     F[1, 3]     F[2, 3]/2   0   F[2, 3]/2   F[3, 3] ]

= (6²/4²) ×
  [ 49        −6 − j3     1.5         0    1.5        −6 + j3
    −5 − j4    2 + j9    −2.5 + j1    0   −2.5 + j1    j
     0.5      −2 + j1.5  −0.25        0   −0.25       −2 − j1.5
     0         0          0           0    0           0
     0.5      −2 + j1.5  −0.25        0   −0.25       −2 − j1.5
    −5 + j4   −j         −2.5 − j1    0   −2.5 − j1    2 − j9 ].   (4.49)
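The Nyquist-splitting recipe can be checked numerically. Below is an illustrative numpy sketch (not the book's MATLAB code; the function name is ours). Applied to the (4 × 4) image of Eq. (4.47a), the resulting (6 × 6) image is real-valued, confirming that conjugate symmetry is preserved.

```python
import numpy as np

def upsample_dft_even(f, Mu):
    """Upsample an (Mo x Mo) image, Mo even, to (Mu x Mu), Mu > Mo.
    The Nyquist row and column of the 2-D DFT are split in half before
    the zeros are inserted, which preserves conjugate symmetry."""
    Mo = f.shape[0]
    h = Mo // 2                        # Nyquist index
    F = np.fft.fft2(f)
    # Duplicate the Nyquist row/column, then halve both copies; the corner
    # entry F[h, h] ends up split into 4 quarter-amplitude entries.
    idx = list(range(h + 1)) + list(range(h, Mo))
    Fs = F[np.ix_(idx, idx)]           # fancy indexing copies the array
    Fs[h, :] /= 2
    Fs[h + 1, :] /= 2
    Fs[:, h] /= 2
    Fs[:, h + 1] /= 2
    # Insert the zeros between the two half-Nyquist copies.
    G = np.zeros((Mu, Mu), dtype=complex)
    lo, hi = h + 1, Mo - h             # sizes of the low/high blocks
    G[:lo, :lo] = Fs[:lo, :lo]
    G[:lo, -hi:] = Fs[:lo, -hi:]
    G[-hi:, :lo] = Fs[-hi:, :lo]
    G[-hi:, -hi:] = Fs[-hi:, -hi:]
    return (Mu**2 / Mo**2) * np.fft.ifft2(G)

f = np.array([[1.0, 2, 3, 4], [2, 4, 5, 3], [3, 4, 6, 2], [4, 3, 2, 1]])
g = upsample_dft_even(f, 6)            # the (4 x 4) -> (6 x 6) example above
```

The imaginary part of g is zero to round-off, and g[0, 0] = f[0, 0] = 1 because the split conserves the total of the DFT coefficients.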
Application of the inverse 2-D DFT to G[k1, k2] yields the upsampled image

g[n, m] = (6²/4²) ×
  [ 0.44  0.61  1.06  1.33  1.83  1.38
    0.61  1.09  1.73  1.91  1.85  1.21
    1.06  1.55  2.31  2.58  1.66  0.90
    1.33  1.55  2.22  2.67  1.45  0.78
    1.83  1.72  1.52  1.42  0.54  0.73
    1.38  1.25  0.94  0.76  0.72  1.04 ]

  = [ 1.0   1.37  2.39  3.0   4.12  3.11
      1.37  2.46  3.90  4.30  4.16  2.72
      2.39  3.49  5.20  5.81  3.74  2.03
      3.0   3.49  5.00  6.01  3.26  1.76
      4.12  3.87  3.42  3.20  1.22  1.64
      3.11  2.81  2.12  1.71  1.62  2.34 ],   (4.50)

which is entirely real-valued, as it should be. The original (4 × 4) image given by Eq. (4.47a) and the upsampled (6 × 6) image given by Eq. (4.50) are displayed in Fig. 4-4. The two images, which bear a close resemblance, have the same physical size but different-sized pixels.

Exercise 4-2: Upsample the length-2 signal x[n] = {8, 4} to a length-4 signal y[n].

4-5 Downsampling

Downsampling is the inverse operation to upsampling. The objective of downsampling is to reduce the array size of an image f[n, m] from (Mo × Mo) samples down to (Md × Md), with Md < Mo. The original image f[n, m] and the downsampled image g[n, m] are defined as:

Original image:      f[n, m] = { f(n∆o, m∆o),  0 ≤ n, m ≤ Mo − 1 },
Downsampled image:   g[n, m] = { f(n∆d, m∆d),  0 ≤ n, m ≤ Md − 1 }.

Both images are sampled versions of some continuous-space image f(x, y). Image f[n, m] is sampled at a sampling interval ∆o that satisfies the Nyquist rate, whereas the downsampled image g[n, m] is sampled at ∆d, with ∆d > ∆o, so it is unlikely that g[n, m] satisfies the Nyquist rate.

The goal of downsampling is to compute g[n, m] from f[n, m]. That is, we compute a coarser-discretized (Md × Md) image g[n, m] from the finer-discretized (Mo × Mo) image f[n, m]. Applications of downsampling include computation of "thumbnail" versions of images, as demonstrated in Example 4-1, and shrinking images to fit into a prescribed space, such as columns in a textbook.

4-5.1 Aliasing

It might seem that downsampling by, say, two, meaning that Md = Mo/2 (assuming Mo is even), could be easily accomplished by simply deleting every even-indexed (or odd-indexed) row and column of f[n, m].
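In array terms, deleting every other row and column is just slicing. A minimal sketch (the toy array is ours):

```python
import numpy as np

f = np.arange(36.0).reshape(6, 6)   # toy 6x6 "image" f[n, m]
g = f[::2, ::2]                     # decimation by two: keep every other row and column
```

The result is a 3 × 3 array whose entries are the even-indexed samples of f, e.g. g[1, 1] = f[2, 2].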
Deleting every other row and column of f[n, m] is called decimation by two. Decimation by two would give the result of sampling f(x, y) every 2∆o instead of every ∆o. But if sampling at S = 1/(2∆o) is below the Nyquist rate, the decimated image g[n, m] is aliased. The effect of aliasing on the spectrum of g[n, m] can be understood in 1-D from Fig. 2-6(b): the copies of the spectrum of f(x, y) produced by sampling overlap one another, so the high-frequency parts of the signal become distorted. Example 4-1 gives an illustration of aliasing in 2-D.

Example 4-1: Aliasing

The 200 × 200 clown image shown in Fig. 4-5(a) was decimated to the 25 × 25 image shown in Fig. 4-5(b). The decimated image is a poor replica of the original image and could not function as a thumbnail image.

4-6 Antialias Lowpass Filtering

Clearly decimation alone is not sufficient to obtain a downsampled image that looks like a demagnified original image. To avoid aliasing, it is necessary to lowpass filter the image before decimation, eliminating the high-spatial-frequency components of the image, so that when the filtered image is decimated, the copies of the spectra do not overlap. In 1-D, in Fig. 2-6(b), had the spectrum been previously lowpass filtered with cutoff frequency S/2 Hz, the high-frequency parts of the spectrum would no longer overlap after sampling. This is called antialias filtering.

The same concept applies in discrete space (and time). The periodicity of spectra induced by sampling becomes the periodicity of the DSFT and DTFT, with periods of 2π in (Ω1, Ω2) and Ω, respectively. Lowpass filtering can be accomplished by setting certain high-spatial-frequency portions of the 2-D DFT to zero. The purpose of this lowpass filtering is to eliminate the high-spatial-frequency parts of the discrete-space spectrum prior to decimation. This eliminates aliasing, as demonstrated in Example 4-2.

Example 4-2

The 200 × 200 clown image shown in Fig. 4-5(a) was first lowpass-filtered to the image shown in Fig. 4-6(a), with the spectrum shown in Fig. 4-6(b), then decimated to the 25 × 25 image shown in Fig. 4-6(c). The decimated image is now a good replica of the original image and could function as a thumbnail image. The decimated image pixel values in Fig. 4-6(c) are all equal to certain pixel values in the lowpass-filtered image in Fig. 4-6(a).
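Both routes to a downsampled image (lowpass filtering then decimating in the spatial domain, or deleting high-frequency rows and columns of the 2-D DFT directly) can be compared numerically. The following numpy sketch is illustrative, not the book's code; the function names and the random stand-in image are ours.

```python
import numpy as np

def ideal_lowpass(f, keep):
    """Zero all 2-D DFT coefficients except the `keep` lowest positive
    and negative frequencies along each axis (an ideal lowpass filter)."""
    F = np.fft.fft2(f)
    Fl = np.zeros_like(F)
    Fl[:keep + 1, :keep + 1] = F[:keep + 1, :keep + 1]
    Fl[:keep + 1, -keep:] = F[:keep + 1, -keep:]
    Fl[-keep:, :keep + 1] = F[-keep:, :keep + 1]
    Fl[-keep:, -keep:] = F[-keep:, -keep:]
    return np.real(np.fft.ifft2(Fl))

def downsample_dft(f, Md):
    """Downsample entirely in the DFT domain: keep the lowest frequencies,
    scale by Md^2/Mo^2, and inverse transform (Md assumed odd here)."""
    Mo = f.shape[0]
    h = (Md - 1) // 2
    F = np.fft.fft2(f)
    G = np.zeros((Md, Md), dtype=complex)
    G[:h + 1, :h + 1] = F[:h + 1, :h + 1]
    G[:h + 1, -h:] = F[:h + 1, -h:]
    G[-h:, :h + 1] = F[-h:, :h + 1]
    G[-h:, -h:] = F[-h:, -h:]
    return np.real((Md**2 / Mo**2) * np.fft.ifft2(G))

rng = np.random.default_rng(0)
f = rng.random((6, 6))          # stand-in for an image
fl = ideal_lowpass(f, 1)        # antialias lowpass filter
g1 = fl[::2, ::2]               # then decimate by two
g2 = downsample_dft(f, 3)       # same result, computed purely in the DFT domain
```

The two outputs agree to round-off, which is the equivalence the DFT-domain example in the next section relies on.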
|FX(k1, k2)| =
  [ 173.00  23.81  20.66  15.00  20.66  23.81
      6.93  25.00   7.55  14.00   7.94  21.93
     11.14  19.98  16.09  15.10  14.93   4.58
      9.00  16.09  19.98   1.00  19.98  16.09
     11.14   4.58  14.93  15.10  16.09  19.98
      6.93  21.93   7.94  14.00   7.55  25.00 ].   (4.52)

Deleting the zero-valued rows and columns of the 2-D DFT magnitude array gives the magnitudes

|FG(k1, k2)| = [ 173.00  23.82  23.82
                   6.93  25.00  21.93
                   6.93  21.93  25.00 ].   (4.54)

Multiplying by Md²/Mo² = 3²/6² = 1/4 and taking the inverse 2-D DFT gives

g[n, m] = [ 5.83  1.58  6.00
            6.08  6.33  3.00
            6.25  3.5   4.67 ].   (4.55)

This is the result we would have gotten by decimating the antialiased lowpass-filtered original image, but it was performed entirely in the 2-D DFT domain.

◮ A MATLAB code for this example is available on the book website. ◭

Concept Question 4-2: Why must we delete rows and columns of the 2-D DFT array to perform downsampling?

4-7 B-Splines Interpolation

In the preceding sections we examined several different image interpolation methods, some of which perform the interpolation directly in the spatial domain, and others that perform the interpolation in the spatial-frequency domain. Now, we introduce yet another method, known as the B-splines interpolation method, with the distinguishing feature that it is the method most commonly used for image interpolation. Unlike with downsampling, B-spline interpolation has no aliasing issues even when the sampling interval ∆ is large. Moreover, unlike with upsampling, B-spline interpolation need not result in blurred images.

B-splines are a family of piecewise-polynomial functions, with each polynomial piece having a degree N, where N is a non-negative integer. As we will observe later on in this section, a B-spline of order zero is equivalent to the nearest-neighbor interpolation method of Section 3-5.1, but it is simpler to implement than the sinc interpolation formula. Interpolation with B-splines of order N = 1 generates linear interpolation, which is used in computer graphics. Another popular member of the B-spline interpolation family is cubic interpolation, corresponding to N = 3. Cubic spline interpolation is used in Adobe Photoshop® and through the MATLAB command imresize.

4-7.1 B-Splines

Splines are piecewise-polynomial functions whose polynomial coefficients change at half-integer or integer values of the independent variable, called knots, so that the function and some of its derivatives are continuous at each knot. In 1-D, a B-spline βN(t) of order N is a piecewise polynomial of degree N, centered at t = 0. The support of βN(t), which is the interval outside of which βN(t) = 0, extends between −(N + 1)/2 and +(N + 1)/2.

◮ Hence, the duration of βN(t) is (N + 1). ◭

Formally, the B-spline function βN(t) is defined as

βN(t) = ∫_{−∞}^{∞} [ sin(πf)/(πf) ]^{N+1} e^{j2πft} df,   (4.56)

which is equivalent to the inverse Fourier transform of sinc^{N+1}(f). Recognizing that (a) the inverse Fourier transform of sinc(f) is a rectangle function and (b) multiplication in the frequency domain is equivalent to convolution in the time domain, it follows that

βN(t) = rect(t) ∗ ··· ∗ rect(t)   [(N + 1) times],   (4.57)

with

rect(t) = { 1 for |t| < 1/2,
            0 for |t| > 1/2.   (4.58)

Application of Eq. (4.57) for N = 0, 1, 2, and 3 leads to:

β0(t) = rect(t) = { 1 for |t| < 1/2,
                    0 for |t| > 1/2,   (4.59)

β1(t) = β0(t) ∗ β0(t) = { 1 − |t| for |t| < 1,
                          0       for |t| > 1,   (4.60)

β2(t) = β1(t) ∗ β0(t) = { 3/4 − t²           for 0 ≤ |t| ≤ 1/2,
                          (1/2)(3/2 − |t|)²   for 1/2 ≤ |t| ≤ 3/2,
                          0                   for |t| > 3/2.   (4.61)
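The repeated-convolution construction of Eq. (4.57) is easy to verify numerically by approximating the continuous convolutions on a fine grid. The discretization below is ours (step dt, with each discrete convolution scaled by dt to approximate the continuous one):

```python
import numpy as np

n = 1000
dt = 1.0 / n
rect = np.ones(n)                      # rect(t) sampled on |t| < 1/2

# beta_1 = rect * rect: a triangle on (-1, 1)
beta1 = dt * np.convolve(rect, rect)
# beta_2 = beta_1 * rect: a quadratic spline on (-3/2, 3/2)
beta2 = dt * np.convolve(beta1, rect)

peak1 = beta1.max()                    # should approximate beta_1(0) = 1
peak2 = beta2.max()                    # should approximate beta_2(0) = 3/4
```

The computed peaks match β1(0) = 1 and β2(0) = 3/4 from Eqs. (4.60) and (4.61) to within the O(dt) discretization error.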
Note that in all cases, βN(t) is continuous over its full duration. For N ≥ 1, the B-spline function βN(t) is continuous and differentiable (N − 1) times at all times t. For β2(t), the function is continuous across its full interval (−3/2, 3/2), including at t = 1/2. Similarly, β3(t) is continuous over its interval (−2, 2), including at t = 1.

Plots of the B-splines of order N = 0, 1, 2, and 3 are displayed in Fig. 4-7. From the central limit theorem in the field of probability, we know that convolving a function with itself repeatedly makes the function resemble a Gaussian. This is evident in the present case as well.

Figure 4-7 Plots of βN(t) for N = 0, 1, 2, and 3: (a) β0(t); (b) β1(t); (c) β2(t); (d) β3(t).

From the standpoint of 1-D and 2-D interpolation of signals and images, the significance of B-splines is in how we can use them to express a signal or image. To guide us through the process, let us assume we have 6 samples x(n∆), as shown in Fig. 4-8, extending between t = 0 and t = 5∆. Our objective is to interpolate between these 6 points so as to obtain a continuous function x(t). An important constraint is to ensure that x(t) = x(n∆) at the 6 discrete times n∆.

For a B-spline of a specified order N, the interpolation is realized by expressing the desired interpolated signal x(t) as a linear combination of time-shifted B-splines, all of order N:

x(t) = Σ_{m=−∞}^{∞} c[m] βN(t/∆ − m).   (4.63)

Here, βN(t/∆ − m) is the B-spline function βN(t), with t scaled by the sampling interval ∆ and delayed by m sampling intervals.

◮ The support of βN(t/∆ − m) is

m − (N + 1)/2 < t/∆ < m + (N + 1)/2.   (4.64)

That is, βN(t/∆ − m) = 0 outside that interval. ◭

Associated with each value of m is a constant coefficient c[m] whose value is related to the sampled values x(n∆) and the order N of the B-spline. More specifically, the values of c[m] have to be chosen such that the aforementioned constraint requiring that x(t) = x(n∆) at discrete times t = n∆ is satisfied. The process is illustrated below for several values of N.
Figure 4-8 Samples x(n∆) to be interpolated into x(t).

Figure 4-9 B-spline interpolation for N = 0.

Figure 4-10 B-spline interpolation for N = 1: (a) samples x(n∆); (b) shifted B-splines β1(t/∆ − m); (c) interpolated function x(t).
Consequently, for any integer n, x(t) at time t between n∆ and (n + 1)∆ is a weighted average given by

x(t) = x(n∆) [(n + 1) − t/∆] + x((n + 1)∆) [t/∆ − n].   (4.68a)

The associated duration is

n∆ ≤ t ≤ (n + 1)∆.   (4.68b)

Application of B-spline linear interpolation to the given samples x(n∆) leads to the continuous function x(t) shown in Fig. 4-10(c). The linear interpolation amounts to setting { x(t), n∆ ≤ t ≤ (n + 1)∆ } to lie on the straight line connecting x(n∆) to x((n + 1)∆).

Exercise 4-6: Given the samples

{ x(0), x(∆), x(2∆), x(3∆) } = { 7, 4, 3, 2 },

compute x(∆/3) by interpolation using: (a) nearest neighbor; (b) linear.

Answer: (a) ∆/3 is closer to 0 than to ∆, so x(∆/3) = x(0) = 7.
(b) x(∆/3) = (2/3)x(0) + (1/3)x(∆) = (2/3)(7) + (1/3)(4) = 6.

The relation between the coefficients { c[m] } and the samples { x(n∆) } can be derived by starting with Eq. (4.63),

x(t) = Σ_{m=−∞}^{∞} c[m] βN(t/∆ − m),   (4.69)

and then setting t = n∆, which gives

x(n∆) = Σ_{m=−∞}^{∞} c[m] βN(n − m) = c[n] ∗ βN(n),   (4.70)

where use was made of the discrete-time convolution relation given by Eq. (2.71a).

For N = 2, Eq. (4.61) indicates that β2(n) ≠ 0 only for integers n = { −1, 0, 1 }. Hence, the discrete-time convolution given by Eq. (4.70) simplifies to

x(n∆) = c[n − 1] β2(1) + c[n] β2(0) + c[n + 1] β2(−1)
      = (1/8) c[n − 1] + (3/4) c[n] + (1/8) c[n + 1]   (N = 2).   (4.71)

In the second step, the constant coefficients were computed using Eq. (4.61) for β2(t). The sum truncates because β2(t) = 0 for |t| ≥ 3/2, so only three basis functions overlap at any specific time t, as is evident in Fig. 4-11.

Figure 4-11 B-splines β2(t/∆ − m) overlap in time.

Similarly, for N = 3, β3(t) = 0 for |t| ≥ 2, which also leads to a sum of three terms.

As noted earlier,

β2(n) = { β2(−1), β2(0), β2(1) } = { 1/8, 3/4, 1/8 }.

Inserting x(n) and β2(n) in Eq. (4.70) establishes the convolution problem

{ 3, 19, 11, 17, 26, 4 } = { 1/8, 3/4, 1/8 } ∗ c[n].

Following the solution recipe outlined earlier (and demonstrated in Section 2-9), the coefficients c[n] are obtained by deconvolution.

Figure 4-12 (a) Original samples x(n) and (b) x(t) interpolated using quadratic splines.

◮ Note: The MATLAB code for solving Example 4-4 is available on the book website. ◭

Concept Question 4-4: Why use cubic interpolation, when quadratic interpolation produces smooth curves?
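The deconvolution can be carried out by solving a small tridiagonal linear system. The sketch below is ours, not the book's recipe from Section 2-9; it assumes c[n] = 0 outside the six-sample range (a boundary assumption) and then verifies that the resulting interpolant passes through the samples.

```python
import numpy as np

def beta2(t):
    """Quadratic B-spline of Eq. (4.61), evaluated at a scalar t."""
    a = abs(t)
    if a <= 0.5:
        return 0.75 - t * t
    if a <= 1.5:
        return 0.5 * (1.5 - a) ** 2
    return 0.0

x = np.array([3.0, 19.0, 11.0, 17.0, 26.0, 4.0])   # samples x(n*Delta)
N = len(x)

# Tridiagonal system x[n] = (1/8)c[n-1] + (3/4)c[n] + (1/8)c[n+1].
A = np.zeros((N, N))
for n in range(N):
    A[n, n] = 0.75
    if n > 0:
        A[n, n - 1] = 0.125
    if n < N - 1:
        A[n, n + 1] = 0.125
c = np.linalg.solve(A, x)

# Verify the interpolation constraint x(n*Delta) = sum_m c[m] beta2(n - m).
x_check = np.array([sum(c[m] * beta2(n - m) for m in range(N)) for n in range(N)])
```

Solving the system directly, rather than dividing DFTs, sidesteps the circular-convolution assumption of the DFT route; both approaches enforce the same constraint of Eq. (4.70).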
Concept Question 4-5: Why do quadratic and cubic interpolation require computation of coefficients, while linear interpolation does not?

four neighbors:

{ f(n∆, m∆), f((n + 1)∆, m∆), f(n∆, (m + 1)∆), f((n + 1)∆, (m + 1)∆) }.   (4.77)

The values of β3(t) at the integer points t = (−1, 0, 1) are { β3(−1), β3(0), β3(1) } = { 1/6, 4/6, 1/6 }, so the separable 2-D kernel is

[ β3(−1) ]
[ β3(0)  ] [ β3(−1)  β3(0)  β3(1) ] = (1/36) × [ 1   4   1
[ β3(1)  ]                                       4  16   4
                                                 1   4   1 ].   (4.81)

Exercise 4-8: The "image" [ 1 2; 3 4 ] is interpolated using bilinear interpolation. What is the interpolated value at the center of the image?

Answer: (1/4)(1 + 2 + 3 + 4) = 2.5.
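Exercise 4-8 can be checked with a direct implementation of bilinear interpolation; a minimal numpy sketch (the function name is ours):

```python
import numpy as np

def bilinear(f, x, y):
    """Bilinear interpolation of image f at fractional coordinates (x, y),
    where integer (x, y) index f as f[y, x]."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - x0, y - y0
    return ((1 - dx) * (1 - dy) * f[y0, x0] + dx * (1 - dy) * f[y0, x0 + 1]
            + (1 - dx) * dy * f[y0 + 1, x0] + dx * dy * f[y0 + 1, x0 + 1])

f = np.array([[1.0, 2.0], [3.0, 4.0]])
val = bilinear(f, 0.5, 0.5)     # center of the 2x2 "image"
```

At the center, all four weights equal 1/4, reproducing the answer (1 + 2 + 3 + 4)/4 = 2.5.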
Figure 4-15 Comparison of three interpolation methods: (a) sinc interpolation; (b) and (c) Lanczos interpolation with a = 2 and a = 3, respectively; and (d) to (f) B-spline interpolation with N = 0 (nearest neighbor), N = 1 (linear), and N = 3 (cubic spline), respectively.
g(n∆, m∆) = f(n∆ cos θ + m∆ sin θ, m∆ cos θ − n∆ sin θ),   (4.83)

which clearly requires interpolation of f(x, y) at the required points from its given samples f(n∆, m∆). In practice, nearest-neighbor (NN) interpolation is usually sufficient to realize the necessary interpolation.

Figure 4-16(a) displays a zero-padded clown image, and part (b) displays the image after rotation by 45° using NN interpolation. The rotated image bears a very good resemblance to the rotated original. The MATLAB code for this figure is on the book website.

Figure 4-16 Clown image before and after rotation by 45°: (a) zero-padded clown image; (b) clown image rotated by 45°.

Example 4-7: Square-Root and Inverse Image Warping

Another form of image warping is realized by applying a square-root function of the form

Tn(n) = n√|n| / 25   and   Tm(m) = m√|m| / 25.

Repetition of the steps described in the previous example, but using the square-root transformation instead, leads to the images in Fig. 4-18(a). The MATLAB code for this figure is available on the book website.

Another transformation is the inverse function given by

Tn(n) = n|n| / a   and   Tm(m) = m|m| / a.

The result of warping the clown image with a = 300 is shown in Fig. 4-18(b). The MATLAB code for this figure is available on the book website.

Exercise 4-9: The "image" [ 1 2; 3 4 ] is rotated counterclockwise 90°. What is the result?

Answer: [ 2 4; 1 3 ]

Exercise 4-10: The "image" [ 4 8; 12 16 ] is magnified by a factor of three. What is the result, using: (a) NN; (b) bilinear interpolation?
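Rotation by NN interpolation amounts to inverse-mapping each output pixel into the input image and rounding to the nearest sample. An illustrative numpy sketch (ours, not the book's code; the sign conventions are chosen so that θ = π/2 reproduces the counterclockwise rotation of Exercise 4-9):

```python
import numpy as np

def rotate_nn(f, theta):
    """Rotate a square image by theta radians about its center using
    nearest-neighbor interpolation; pixels that map from outside the
    input image remain 0."""
    M = f.shape[0]
    c = (M - 1) / 2.0                      # center of rotation
    ct, st = np.cos(theta), np.sin(theta)
    g = np.zeros_like(f)
    for n in range(M):                     # output row
        for m in range(M):                 # output column
            # inverse-map the output pixel back into the input image
            x = c + ct * (m - c) - st * (n - c)   # input column
            y = c + st * (m - c) + ct * (n - c)   # input row
            i, j = int(round(y)), int(round(x))
            if 0 <= i < M and 0 <= j < M:
                g[n, m] = f[i, j]
    return g

f = np.array([[1.0, 2.0], [3.0, 4.0]])
g = rotate_nn(f, np.pi / 2)                # 90 degrees counterclockwise
```

Inverse mapping (rather than forward mapping input pixels to output locations) guarantees that every output pixel receives a value, with no holes.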
Figure 4-17 Nonlinear image warping with Tn(n) = n e^{−|n|/a} and Tm(m) = m e^{−|m|/a}, for space constants of (a) a = 300 and (b) a = 200.

Figure 4-18 Clown image warped with (a) the square-root transformation Tn(n) = n√|n|/25 and Tm(m) = m√|m|/25, and (b) the inverse transformation Tn(n) = n|n|/300 and Tm(m) = m|m|/300.
Answer:
(a)
4 4 4 8 8 8
4 4 4 8 8 8
4 4 4 8 8 8
12 12 12 16 16 16
12 12 12 16 16 16
12 12 12 16 16 16
(b)
1 2 1 2 4 2
2 4 2 4 8 4
1 2 1 2 4 2
3 6 3 4 8 4
6 12 6 8 16 8
3 6 3 4 8 4
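Answer (a) can be reproduced by replicating each pixel into a 3 × 3 block; a two-line numpy sketch (ours):

```python
import numpy as np

f = np.array([[4, 8], [12, 16]])
# NN magnification by three: replicate each pixel into a 3x3 block.
g = np.repeat(np.repeat(f, 3, axis=0), 3, axis=1)
```

Each original pixel value fills a 3 × 3 block of the 6 × 6 output, matching the grid shown in answer (a).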
Summary

Concepts

• Interpolation is "connecting the dots" of 1-D samples, and "filling in the gaps" of 2-D samples.
• Interpolation can be used to rotate and to warp or "morph" images.
• Upsampling an image can be performed by inserting rows and columns of zeros in the 2-D DFT of the image. Care must be taken to preserve conjugate symmetry in the 2-D DFT.
• Downsampling an image can be performed by deleting rows and columns in the 2-D DFT of the image. Care must be taken to preserve conjugate symmetry in the 2-D DFT. Deleting rows and columns performs lowpass filtering so that the downsampled image is not aliased.
• B-splines are piecewise-polynomial functions that can be used to interpolate samples in 1-D and 2-D.
• For N ≥ 2, computation of the coefficients {c[m]} from samples {x(m∆)} can be formulated as a deconvolution problem.
• 2-D interpolation using B-splines is a generalization of 1-D interpolation using B-splines.

Mathematical Formulae

B-spline:
βN(t) = rect(t) ∗ ··· ∗ rect(t)   [(N + 1) times]

B-spline:
β0(t) = rect(t) = { 1 for |t| < 1/2;  0 for |t| > 1/2 }

B-spline:
β1(t) = { 1 − |t| for |t| ≤ 1;  0 for |t| ≥ 1 }

B-spline:
β2(t) = { 3/4 − t² for 0 ≤ |t| ≤ 1/2;  (3/2 − |t|)²/2 for 1/2 ≤ |t| ≤ 3/2;  0 for |t| ≥ 3/2 }

B-spline:
β3(t) = { 2/3 − t² + |t|³/2 for |t| ≤ 1;  (2 − |t|)³/6 for 1 ≤ |t| ≤ 2;  0 for |t| ≥ 2 }

B-spline 1-D interpolation:
x(t) = Σ_{m=−∞}^{∞} c[m] βN(t/∆ − m)

Nearest-neighbor 1-D interpolation:
x(t) = x(n∆) for |t − n∆| < ∆/2

Linear 1-D interpolation:
x(t) = x(n∆)[(n + 1) − t/∆] + x((n + 1)∆)[t/∆ − n] for n∆ ≤ t ≤ (n + 1)∆

Important Terms (provide definitions or explain the meaning of the following terms): B-spline, downsampling, interpolation, Lanczos function, nearest-neighbor, thumbnail image, upsampling
… upsampling. Note that 64 is an even number.

4.4 Write a MATLAB program that loads the 64 × 64 image in tinyletters.mat, deletes the last row and column to make it 63 × 63, and magnifies it by four using upsampling. This is easier than Problem 4.3 since 63 is an odd number.

Section 4-5: Downsampling

4.5 Write a MATLAB program that loads the 200 × 200 image in clown.mat, antialias lowpass filters it, and demagnifies it by four using downsampling. (This is how the image in tinyclown.mat was created.)

4.6 Repeat Problem 4.5, but skip the antialias lowpass filter.

4.7 Write a MATLAB program that loads the 256 × 256 image in letters.mat, antialias lowpass filters it, and demagnifies it by four using downsampling. (This is how the image in tinyletters.mat was created.)

4.8 Repeat Problem 4.7, but skip the antialias lowpass filter.

4.9 Show that if the sinc interpolation formula is used to upsample an M × M image f[n, m] to an N × N image g[n, m] by an integer factor L (so that N = ML), then g[nL, mL] = f[n, m], so that the values of f[n, m] are preserved after upsampling.

Section 4-8: 2-D Spline Interpolation

4.10 Write a MATLAB program that loads the 50 × 50 image in tinyclown.mat and magnifies it by four using nearest-neighbor interpolation.

4.11 Write a MATLAB program that loads the 64 × 64 image in tinyletters.mat and magnifies it by four using nearest-neighbor interpolation.

4.14 Another way to derive the formula for linear interpolation is as follows: The goal is to interpolate the four points { f(0, 0), f(1, 0), f(0, 1), f(1, 1) } using a formula f(x, y) = f0 + f1 x + f2 y + f3 xy, where { f0, f1, f2, f3 } are found from the given points. This extends to

{ f(n, m), f(n + 1, m), f(n, m + 1), f(n + 1, m + 1) }

for any integers n, m.

(a) Set up a linear system of equations with unknowns { f0, f1, f2, f3 } and knowns { f(0, 0), f(1, 0), f(0, 1), f(1, 1) }.

(b) Solve the system to obtain a closed-form expression for f(x, y) as a function of { f(0, 0), f(0, 1), f(1, 0), f(1, 1) }.

4.15 The image [ a b c; d e f; g h i ] is rotated 90° clockwise. What is the result?

4.16 The image [ a b c; d e f; g h i ] is rotated 45° clockwise and magnified by √2 using linear interpolation. What is the result?

4.17 Recall from Eq. (3.12) that rotating an image f(x, y) by θ to get g(x, y) is implemented by

g(x, y) = f(x′, y′) = f(x cos θ + y sin θ, y cos θ − x sin θ), …

… (Eq. (4.64)) for { c[n] } from { x(n∆) }. In Section 4-7.4 this was solved using the DFT, which requires (N/2) log2 N multiplications. This problem gives a faster method, requiring only 2N < (N/2) log2 N multiplications. Let …

(c) Show that for each of these systems, xi[n] can be computed recursively and stably from yi[n] using …
Contents

Overview, 160
5-1 Pixel-Value Transformation, 160
5-2 Unsharp Masking, 163
5-3 Histogram Equalization, 167
5-4 Edge Detection, 171
5-5 Summary of Image Enhancement

Objectives

Learn to:

■ Use linear or gamma transformation to alter pixel values to bring out image features.
■ Use unsharp masking or the Laplacian to sharpen an image.
■ Use histogram equalization to brighten an image.
■ Use Sobel or Canny edge detection to produce an edge image of a given image.

This chapter covers various types of image enhancement, in which the goal is to deliberately alter the image to brighten it, increase its contrast, sharpen it, or enhance features such as edges. Unsharp masking sharpens an image using a high-pass filter, but this also makes the image noisier. Histogram equalization nonlinearly alters pixel values to spread them out more evenly over the display range of the image. Edge enhancement produces an edge image of just the edges of the image, which can be useful in image recognition in computer vision.
Overview

Image enhancement is an operation that transforms an image f[n, m] to another image g[n, m] in which features of f[n, m], such as edges or contrasts between different pixel values, are emphasized. It is not the same as image restoration, which includes denoising (removing noise from an image), deblurring (refocusing an out-of-focus image), and the more general case of deconvolution (undoing the effect of a PSF on an image). In these three types of operations, the goal is to recover the true image f[n, m] from its noisy or blurred version g[n, m]. Image restoration is covered in Chapter 6.

Image enhancement techniques covered in this chapter include: linear and nonlinear transformations of pixel values for displaying images more clearly; unsharp masking, a technique originally developed for sharpening images in film-based photography; histogram equalization for brightening images; and edge detection for identifying edges in images.

5-1 Pixel-Value Transformation

The image dynamic range Ri of an image f[n, m] is defined as the range of pixel values contained in the image, extending between a minimum value fmin and a maximum value fmax. For a display device, such as a printed page or a computer monitor, the display dynamic range Rd of the display intensity g[n, m] is the full range available for displaying an image, extending from a minimum of zero to a maximum gmax.

Ideally, the display device should display an image such that the information content of the image is conveyed to the user optimally. This is accomplished by applying a preprocessing transformation of pixel values. If Ri extends over a narrow portion of Rd, a transformation can be used to expand Ri to take full advantage of the available extent of Rd. Conversely, if Ri extends over several orders of magnitude, displaying the image over the limited linear range Rd would lead to pixel-value truncation. To avoid the truncation issue, a nonlinear transformation is needed so as to convert the dynamic range Ri into a range that is more compatible with the dynamic range Rd of the display device. We now explore both types of transformations.

5-1.1 Linear Transformation of Pixel Values

In general, image f[n, m] may have both positive and negative pixel values. A linear transformation linearly transforms the individual pixel values from f[n, m] to g[n, m], with 0 displayed as pure black and gmax displayed as pure white. Functionally, the linear transformation is given by

g[n, m] = gmax (f[n, m] − fmin) / (fmax − fmin).   (5.1)

Usually, g[n, m] is normalized so that gmax = 1.

Without the linear transformation, a display device would display all negative values of f[n, m] as black and would display all values larger than gmax as white. In MATLAB, the command imagesc(X),colormap(gray) applies the linear transformation given by Eq. (5.1), with gmin = 0 and gmax = 1, prior to displaying an image, thereby ensuring that the full range of values of array X is displayed properly, including negative values.

5-1.2 Logarithmic Transformation of Pixel Values

If coherent light is used to illuminate a circular opening, as depicted by the diagram in Fig. 5-1, the light diffracted by the opening generates an interference pattern in the image plane, consisting of a "ring-like" structure. The 1-D image intensity along any direction in the image plane is given by

I(θ) = I0 sinc²(aθ),   (5.2)

Figure 5-1 Image generated by coherent light diffracted by a circular opening.
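Returning to the linear transformation of Eq. (5.1): it translates directly into array code. A minimal numpy sketch (the function name and toy image are ours):

```python
import numpy as np

def linear_transform(f, gmax=1.0):
    """Linear pixel-value transformation of Eq. (5.1):
    fmin maps to 0 (pure black) and fmax maps to gmax (pure white)."""
    fmin, fmax = f.min(), f.max()
    return gmax * (f - fmin) / (fmax - fmin)

f = np.array([[-3.0, 0.0], [5.0, 7.0]])   # image with negative pixel values
g = linear_transform(f)
```

The negative value −3 maps to 0 and the maximum 7 maps to 1, so the full image range is usable by the display.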
5-1 PIXEL-VALUE TRANSFORMATION 161
−20 dB
• Acoustics:
−25 dB pressure
PdB = 20 log10 , (5.5)
−30 dB
20 µ pascals
−35 dB
−40 −20 0 20 40 where 20 µ pascals is the smallest acoustic pressure that can
θ (degrees) create an audible sound. On this PdB scale, a whisper is about
(b) Decibel scale, AdB(θ) = 10 log10 A(θ) 30 dB and the sound intensity of a jet plane taking off is about
130 dB, making the latter 100 dB (or, equivalently, 100,000)
times louder than a whisper.
Figure 5-2 Plot of the normalized intensity as a function of
angle θ . • Richter Scale for Earthquakes:
displacement
DdB = log10 , (5.6)
1 µm
where I0 and a are constants related to the wavelength λ the
diameter of the opening, and the overall imaging geometry. where “displacement” is defined as the horizontal ground dis-
A plot of the normalized intensity placement at a location 100 km from the earthquake’s epicenter.
For an earthquake of Richter magnitude 6 the associated ground
I(θ ) motion is 1 m. In contrast, the ground displacement associated
A(θ ) = = sinc2 (aθ ) (5.3) with a Richter magnitude 3 earthquake is only 1 mm.
I0
with a = 0.1 is displayed in Fig. 5-2(a) as a function of angle θ , • Stellar Magnitude (of stars viewed from Earth):
with θ expressed in degrees. The peak value of A(θ ) is 1,
and the sinc2 (aθ ) function exhibits sidelobes that decrease in star brightness
SdB = −2.512 log10 . (5.7)
intensity with increasing value of |θ |. The plot provides a good brightness of Vega
CHAPTER 5 IMAGE ENHANCEMENT

A first-magnitude star, such as Spica in the constellation Virgo, has a stellar magnitude of approximately 1. Stars of magnitude 6 are barely visible to the naked eye (depending on viewing conditions) and are 100 times less bright than a first-magnitude star. The factor of −2.512 was chosen so that a first-magnitude star has a brightness equal to 40% of that of the star Vega, and Vega was chosen as the star with a reference brightness of zero magnitude.

The dB scale also is used in voltage and power ratios, and in defining signal-to-noise ratio.

When applied to images, the logarithmic transformation of

1.8 ≤ γ ≤ 2.5.

One role of the gamma transformation is to correct for the power-law relationship given by Eq. (5.9), thereby generating an image display of the true signal f[n, m]. Here, f[n, m] is the true pixel value and g[n, m] is the output of the preprocessing step, which makes it the input to the display device. That is, g[n, m] = V[n, m] and

I[n, m] = a V^b[n, m] = a g^b[n, m] = a gmax^b [(f[n, m] − fmin)/(fmax − fmin)]^(γb), (5.11)
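The gamma transformation itself is a one-line operation. A minimal NumPy sketch (the function name and sample array are ours):

```python
import numpy as np

def gamma_transform(f, gamma, g_max=255.0):
    """Gamma transformation: normalize pixel values to [0, 1],
    raise to the power gamma, then rescale to [0, g_max]."""
    f = f.astype(float)
    f_min, f_max = f.min(), f.max()
    return g_max * ((f - f_min) / (f_max - f_min)) ** gamma

f = np.array([[0.0, 64.0], [128.0, 255.0]])
# gamma > 1 darkens mid-tones; gamma < 1 lightens them.
print(gamma_transform(f, 3.0))
print(gamma_transform(f, 1.0 / 3.0))
```

The extreme pixel values (fmin and fmax) map to 0 and g_max for any γ; only the intermediate values move, which is why γ acts as a contrast control.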
where a′ = a gmax^b.

Figure 5-4(a) is an image of the planet Saturn, displayed with no preprocessing. By comparison, the image in part (b) of the figure was subjected to a preprocessing step using a gamma transformation with γ = 3 (the value that seemed to best enhance the image). The cloud bands are much more apparent in the transformed image than in the original.

Concept Question 5-1: Why do we not simply use pixel values directly as numerical measures of image intensities?

Exercise 5-1: A star of magnitude 6 is barely visible to the naked eye in a dark sky. How much fainter is a star of magnitude 6 than Vega, whose magnitude is 0?

Answer: From Eq. (5.7), 10^(6/2.512) = 244.6, so a sixth-magnitude star is 245 times fainter than Vega.

5-2 Unsharp Masking
◮ Thus, unsharp masking enhances edges and fast-varying parts of the image, and is a standard tool available in Adobe® Photoshop®. ◭

The “unsharp” and “masking” parts of the name are associated with a technique in which a blurred (or unsharp) image is used to create a corrective “mask” for removing the blurriness from the image. The audio equivalent of unsharp masking is turning
Image fmask(x, y) can be formed in a darkroom by adding a “negative” version of fblur(x, y) to f(x, y). The blurring process caused by the imperfect printing process is, in effect, a lowpass-filtering process. Hence, fblur(x, y) represents a lowpass-filtered version of f(x, y), and the “mask” image

fmask(x, y) = f(x, y) − fblur(x, y)

represents a highpass-filtered version of f(x, y). A highpass spatial filter emphasizes the presence of edges; hence the name “mask.”

By photographically adding the mask image to the original image, we obtain a sharpened image fsh(x, y) in which high spatial-frequency components of f(x, y) are boosted relative to low spatial-frequency components:

fsh(x, y) = f(x, y) + fmask(x, y) = f(x, y) + [f(x, y) − fblur(x, y)]. (5.14)

In digital image processing, high spatial-frequency components can also be boosted by applying the discrete form of the Laplacian operator to f(x, y).

The spatial frequency response of the continuous-space Laplacian is

HLaplace(µ, ν) = −4π²(µ² + ν²). (5.18a)

Similarly, in polar coordinates,

HLaplace(ρ, φ) = −4π²ρ². (5.18b)

It is evident from the definitions given by Eqs. (5.18a and b) that the Laplacian emphasizes high spatial-frequency components (proportional to ρ²) of the input image f(x, y). It is equally evident that all frequency components of HLaplace(µ, ν) and HLaplace(ρ, φ) have negative values.

5-2.3 Laplacian in Discrete Space

Derivatives in continuous space are approximated as differences in discrete space. Hence, in discrete space, the Laplacian g[n, m] of a 2-D image f[n, m] is defined as

g[n, m] = f[n + 1, m] + f[n − 1, m] + f[n, m + 1] + f[n, m − 1] − 4 f[n, m]. (5.19)

This operation is equivalent to the convolution
Figure 5-5 |HLaplace(Ω1, Ω2)| versus Ω1 at Ω2 = 0 (in red) and the approximation for small values of Ω1 and Ω2 (in blue).

HLaplace(Ω1, Ω2) ≈ −Ω1² − Ω2² = −R², for R ≪ 1. (5.27)
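The exact response 2[cos Ω1 + cos Ω2 − 2] of Eq. (5.26) and the small-frequency approximation −R² of Eq. (5.27) are easy to compare numerically. A short Python sketch (function names are ours):

```python
import numpy as np

def h_laplace(w1, w2):
    """Exact frequency response of the discrete Laplacian, Eq. (5.26)."""
    return 2.0 * (np.cos(w1) + np.cos(w2) - 2.0)

def h_approx(w1, w2):
    """Small-frequency approximation of Eq. (5.27): -R^2."""
    return -(w1 ** 2 + w2 ** 2)

w = 0.1
print(h_laplace(w, w), h_approx(w, w))  # nearly equal for R << 1
print(h_laplace(np.pi, np.pi))          # -8.0: strongest response at the band corner
```

Evaluating both functions over a grid of (Ω1, Ω2) values reproduces the close agreement near the origin and the growing deviation beyond |Ω| ≈ 1 discussed in the text.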
extended to 2-D, leads to

f[n − n0, m − m0] ↔ F(Ω1, Ω2) e^(−j(n0 Ω1 + m0 Ω2)). (5.23)

Application of Eq. (5.23) to Eq. (5.19) leads to

G(Ω1, Ω2) = F(Ω1, Ω2) e^(−jΩ1) + F(Ω1, Ω2) e^(jΩ1) + F(Ω1, Ω2) e^(−jΩ2) + F(Ω1, Ω2) e^(jΩ2) − 4F(Ω1, Ω2)
= F(Ω1, Ω2) [(e^(−jΩ1) + e^(jΩ1)) + (e^(−jΩ2) + e^(jΩ2)) − 4]
= F(Ω1, Ω2) [2 cos Ω1 + 2 cos Ω2 − 4]. (5.24)

The spectrum G(Ω1, Ω2) is the product of the spectrum of the original image, F(Ω1, Ω2), and the Laplacian’s spatial frequency response HLaplace(Ω1, Ω2):

G(Ω1, Ω2) = F(Ω1, Ω2) HLaplace(Ω1, Ω2). (5.25)

Equating Eqs. (5.24) and (5.25) leads to

HLaplace(Ω1, Ω2) = 2[cos Ω1 + cos Ω2 − 2]. (5.26)

This frequency dependence of the discrete-space Laplacian is analogous to the response given by Eq. (5.18b) for the frequency response of the continuous-space Laplacian; both have negative signs and both vary as the square of the spatial frequency (ρ and R).

The blue plot in Fig. 5-5 represents the approximate expression for |HLaplace(Ω1, Ω2)| given by Eq. (5.27). It confirms that the approximation is valid not only for |Ω1|, |Ω2| ≪ 1, but also up to |Ω1|, |Ω2| ≈ 1.

In Fig. 5-5, the plots for the exact and approximate expressions of |HLaplace(Ω1, 0)| are displayed over the range −π < Ω1 < π. They are in close agreement over approximately the central one-third of the spectral range, and they deviate significantly as |Ω1| exceeds 1 (or |Ω2| exceeds 1), or more generally, as the radial frequency R exceeds 1. In most images, the bulk of the image “energy” is contained within this central region.

This last statement deserves further elaboration. To do so, we refer the reader to Eq. (2.64), which relates the frequency Ω0 in discrete time to the frequency f0 in continuous time, namely

Ω0 = 2π f0 ∆, (5.28)

where ∆ is the sampling interval in seconds. Extending the relationship to 2-D provides the connections
gsharp[n, m] = f[n, m] − f[n, m] ∗∗ hLaplace[n, m] = f[n, m] ∗∗ hsharp[n, m], (5.31)

where hsharp[n, m] is an image-sharpening filter with PSF

hsharp[n, m] = δ[n] δ[m] − hLaplace[n, m]. (5.32)

This operation is analogous to Eq. (5.14) for film photography, except that in the present case we used a minus sign (rather than a plus sign) in the first step of Eq. (5.31) because

f[n, m] ∗∗ hLaplace[n, m] ↔ F(Ω1, Ω2) HLaplace(Ω1, Ω2), (5.33)

and HLaplace(Ω1, Ω2) is always negative.

Use of Eq. (5.21) in Eq. (5.32) leads to

hsharp[n, m] = [0 −1 0; −1 5 −1; 0 −1 0]. (5.34)

Since hsharp[n, m] is only 3 × 3, it is faster to compute the 2-D convolution given by Eq. (5.31) in the spatial [n, m] domain than by multiplying zero-padded 2-D DFTs.

In a later part of this section, we will compare sharpened images to their original versions, but we should note that:

where Nvalid = M − L + 1. To illustrate with an example, let us consider the image given by

f[n, m] = [4 8 12; 16 20 24; 28 32 36], (5.37a)

and let us assume that we wish to perform local averaging by sliding a 2 × 2 window across the image, both horizontally and vertically. Such a filter has a PSF given by

h[n, m] = (1/4)[1 1; 1 1]. (5.37b)

Upon performing the convolution given by Eq. (5.35) on the arrays given in Eqs. (5.37a and b), we obtain

y[n, m] = [1 3 5 3; 5 12 16 9; 11 24 28 15; 7 15 17 9]. (5.38)

The border rows and columns are not the average values of 4 neighboring pixels, but of only 1 neighboring pixel and 3 zeros or 2 neighboring pixels and 2 zeros. These are invalid entries. The more realistic valid output is yvalid[n, m], obtained via Eq. (5.33) or, equivalently, by removing the top and bottom rows and the columns at the far left and far right. Either approach leads to

yvalid[n, m] = [12 16; 24 28]. (5.39)

Since f[n, m] is M × M = 3 × 3 and h[n, m] is L × L = 2 × 2, yvalid[n, m] is Nvalid × Nvalid with Nvalid = M − L + 1 = 3 − 2 + 1 = 2.

For a constant image f[n, m] = c, the Laplacian output is zero because the entries of the PSF sum to zero:

f[n, m] ∗∗ hLaplace[n, m] = c Σ(n=−1 to 1) Σ(m=−1 to 1) [0 1 0; 1 −4 1; 0 1 0] = 0.
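The sliding-average example of Eqs. (5.37)–(5.39) can be reproduced in a few lines of Python. The helper function below is our own (it builds the full 2-D convolution by shifted accumulation, so no signal-processing library is needed); the valid output is then just a slice:

```python
import numpy as np

def conv2_full(f, h):
    """Full 2-D convolution of image f with PSF h by direct summation."""
    M, N = f.shape
    P, Q = h.shape
    y = np.zeros((M + P - 1, N + Q - 1))
    for i in range(P):
        for j in range(Q):
            y[i:i + M, j:j + N] += h[i, j] * f
    return y

f = np.array([[4., 8., 12.], [16., 20., 24.], [28., 32., 36.]])
h = np.ones((2, 2)) / 4.0            # 2x2 local-averaging PSF of Eq. (5.37b)

y = conv2_full(f, h)                 # full output, Eq. (5.38)
y_valid = y[1:-1, 1:-1]              # strip the border entries -> Eq. (5.39)
print(y)
print(y_valid)                       # [[12. 16.] [24. 28.]]
```

As in the text, the full output is 4 × 4, while the valid output is Nvalid × Nvalid = 2 × 2 with Nvalid = M − L + 1 = 2.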
5-3 Histogram Equalization

Figure 5-8 Clown image, before and after application of histogram equalization, and associated histograms: (a) original image f[n,m]; (b) histogram-equalized g[n,m]; (c) histogram pf[f0] of original image; (d) histogram pg[g0] of g[n,m]. Histogram pg(g0) was generated by nonlinearly transforming (redistributing) pf(f0), but the total number of pixels at each new value (the g0 axis is a nonlinear transformation of the f0 axis) remains the same.
evaluated at f0 = f[n, m]. That is,

g[n, m] = Pf[f0] |(f0 = f[n, m]). (5.42)

Such a transformation leads to a histogram pg[g0] that is more uniformly spread out over the range 0 to 255 than the histogram of the original image, pf[f0]. The associated CDF, Pg[g0], approximates a straight line that starts at coordinates (0, 0) and concludes at (255, M²), where M² is the total number of pixels. These attributes are evident in Fig. 5-10 for the histogram-equalized clown image.
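Histogram equalization per Eq. (5.42) amounts to replacing each pixel by the CDF of its gray level. A minimal NumPy sketch (the function name is ours; we also rescale the CDF to the display range, a common convention, whereas Eq. (5.42) leaves g equal to the raw CDF). For concreteness we use the tiny 3 × 3 image of Problem 5.11:

```python
import numpy as np

def equalize(f, levels=256):
    """Histogram-equalize integer image f: each pixel is replaced by the
    CDF of its gray level, rescaled to the range [0, levels - 1]."""
    hist = np.bincount(f.ravel(), minlength=levels)  # histogram p_f[f0]
    cdf = np.cumsum(hist)                            # CDF P_f[f0]
    return np.round((levels - 1) * cdf[f] / f.size).astype(int)

f = np.array([[1, 2, 1], [2, 3, 9], [3, 2, 9]])
print(equalize(f, levels=10))   # gray levels are spread toward the full range
```

The output pixel values are non-decreasing functions of the input values, so the relative ordering of gray levels (and hence the image content) is preserved; only the spacing changes.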
(a) Histogram pf[f0] of f[n,m]; (b) CDF Pf[f0], which rises to M² at gray level 255; and CDF Pg[g0] of the equalized image, which is approximately linear from (0, 0) to (255, M²).
Computing dH[n, m] for every pixel is equivalent to validly convolving (see Section 5-2.5) image f[n, m] with the window’s point spread function hH[n, m] along the horizontal direction. That is,

dH[n, m] = f[n, m] ∗∗ hH[n, m], (5.48)

where, again, ∆ is a prescribed gradient threshold. In the image, pixels for which z[n, m] = 1 are shown in white, and those with z[n, m] = 0 are shown in black. Usually, the value of ∆ is selected empirically by examining a histogram of g[n, m] or through repeated trials.
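The Sobel gradient-and-threshold procedure can be sketched in plain NumPy (function names and the test image are ours; we use correlation rather than convolution, which for these antisymmetric kernels only flips the sign of dH and dV and leaves the gradient magnitude unchanged):

```python
import numpy as np

H_H = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])   # horizontal-difference PSF
H_V = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])   # vertical-difference PSF

def valid_corr(f, h):
    """Valid-region correlation of image f with a 3x3 kernel h."""
    M, N = f.shape
    out = np.zeros((M - 2, N - 2))
    for i in range(3):
        for j in range(3):
            out += h[i, j] * f[i:i + M - 2, j:j + N - 2]
    return out

def sobel_edges(f, delta):
    """Edge map z[n,m]: 1 where the gradient magnitude exceeds delta."""
    dh = valid_corr(f, H_H)
    dv = valid_corr(f, H_V)
    return (np.hypot(dh, dv) > delta).astype(int)

# A vertical step edge: the gradient is large only near the jump.
f = np.zeros((5, 5))
f[:, 3:] = 100.0
print(sobel_edges(f, delta=200))
```

Columns of the output that straddle the step are marked 1; columns over the flat region are marked 0, mirroring the white/black rendering described in the text.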
Figure 5-14 Application of the Sobel edge detector to the letters image in (a), with ∆ = 200, led to the edge-detected image in (b).

Figure 5-15 Application of the Sobel edge detector to the clown image in (a) captures some of the edges in the Sobel edge-detected image in (b), but also misses others.

5 × 5 PSF given by

hG[n, m] = (1/159)[2 4 5 4 2; 4 9 12 9 4; 5 12 15 12 5; 4 9 12 9 4; 2 4 5 4 2]. (5.54)
and

θ[n, m] = tan⁻¹( dV[n, m] / dH[n, m] ). (5.56b)

Step 4: At each pixel [n, m], round θ[n, m] to the nearest of {0°, 45°, 90°, 135°}. Next, determine whether to keep the value g[n, m] of pixel [n, m] as is or to replace it with zero. The decision logic is as follows:

(a) For a pixel [n, m] with θ[n, m] = 0°, compare the value of g[n, m] to the values of g[n + 1, m] and g[n − 1, m], corresponding to the pixels at the immediate right and left of pixel [n, m]. If g[n, m] is the largest of the three gradients, keep its value as is; otherwise, set it to zero.

(b) For a pixel [n, m] with θ = 45°, compare the value of g[n, m] to the values of g[n − 1, m + 1] and g[n + 1, m − 1], corresponding to the pixel neighbors along the 45° diagonal. If g[n, m] is the largest of the three gradients, keep its value as is; otherwise, set it to zero.

(c) For a pixel [n, m] with θ = 90°, compare the value of
algorithm to the clown image with ∆1 = 0.05 and ∆2 = 0.125. This particular combination provides an edge image that successfully captures the contours that segment the clown image.

◮ Edge detection can be implemented in MATLAB’s Image Processing Toolbox using the commands E=edge(X,'sobel',T1) for Sobel and E=edge(X,'canny',T1,T2) for Canny. The image is stored in array X, the edge image is stored in array E, and T1 and T2 are the thresholds. MATLAB assigns default values to the thresholds, computed from the image, if they are not specified. ◭

Concept Question 5-7: Why does edge detection not work well on noisy images?

5-5 Summary of Image Enhancement Techniques

• To increase contrast in an image by altering the range of pixel values, use a gamma transformation. Simply try different values of γ and choose the one that gives the best visual result.

• To compress the range of values, e.g., in a spectrum, use a log transformation.

• To sharpen an image, use unsharp masking, bearing in mind that this also makes the image noisier.

• To make an image brighter, use histogram equalization.

• To detect edges, or create an image of edges, use Canny edge detection (if available); otherwise use Sobel edge detection.
Summary

Concepts

• Image enhancement transforms a given image into another image in which image features such as edges or contrast have been enhanced to make them more apparent.

• Linear, logarithmic, and gamma transformations alter the range of pixel values so that they fit the range of the display.

• Unsharp masking and Laplacians sharpen an image, but increase noise.

• Histogram equalization nonlinearly alters pixel values to brighten images.

• Edge detection produces an image consisting entirely of edges of the image.
Mathematical Formulae

Linear transformation:
g[n, m] = gmax (f[n, m] − fmin)/(fmax − fmin)

Logarithmic transformation:
g[n, m] = a log10(f[n, m] + b)

Gamma transformation:
g[n, m] = gmax [(f[n, m] − fmin)/(fmax − fmin)]^γ

Unsharp masking (continuous space):
g(x, y) = f(x, y) + [f(x, y) − fblur(x, y)], where the bracketed term is fmask(x, y)

Unsharp masking (discrete space):
g[n, m] = f[n, m] − f[n, m] ∗∗ hLaplace[n, m]

Laplacian (continuous space):
g(x, y) = ∇² f(x, y) = ∂²f/∂x² + ∂²f/∂y²

Laplacian (discrete space):
g[n, m] = f[n + 1, m] + f[n − 1, m] + f[n, m + 1] + f[n, m − 1] − 4 f[n, m]

Cumulative distribution:
Pf[f0] = Σ(f′ = 1 to f0) pf[f′]

Histogram equalization:
g[n, m] = Pf[f0] |(f0 = f[n, m])

Horizontal and vertical edge detectors:
dH[n, m] = f[n, m] ∗∗ [−1 0 1; −2 0 2; −1 0 1]
dV[n, m] = f[n, m] ∗∗ [−1 −2 −1; 0 0 0; 1 2 1]

Sobel edge detector:
z[n, m] = 1 if √(dH[n, m]² + dV[n, m]²) > ∆, and 0 if √(dH[n, m]² + dV[n, m]²) < ∆
Important Terms Provide definitions or explain the meaning of the following terms: Canny edge detector, gamma transformation, histogram equalization, Laplacian, logarithmic transformation, Sobel edge detector, unsharp masking.
5.1 Explain why, in the gamma transformation (Eq. (5.10)), γ > 1 tends to darken images, while γ < 1 tends to lighten images.

5.2 Use gamma transformation with γ = 3 to darken the image in coins1.mat.

5.3 Use gamma transformation with γ = 3 to darken the image in coins2.mat.

Section 5-2: Unsharp Masking

5.4 Use the sharpening filter Eq. (5.32) to sharpen the two images in the files (a) plane.mat and (b) coins2.mat.

5.5 Use the sharpening filter Eq. (5.32) to sharpen the two images in the files (a) quarter.mat and (b) rice.mat.

5.6 Use the sharpening filter Eq. (5.32) to sharpen the two images in the files (a) moon.mat and (b) unsharp.mat.

5.7 Unsharp masking was originally based on Eq. (5.14), which in discrete space is

g[n, m] = f[n, m] + (f[n, m] − fblur[n, m]).

fblur[n, m] is a lowpass version of f[n, m]. If fblur[n, m] is the average of {f[n + 1, m], f[n − 1, m], f[n, m + 1], f[n, m − 1]}, show that

g[n, m] = f[n, m] − (1/4) f[n, m] ∗∗ hLaplacian[n, m],

similar to Eq. (5.32).

5.8 Unsharp masking was originally based on Eq. (5.14), which is

g(x, y) = f(x, y) + (f(x, y) − fblur(x, y)).

fblur(x, y) is a lowpass version of f(x, y). Adobe® Photoshop® uses the following form of unsharp masking:

fblur(x, y) = f(x, y) ∗∗ fg(√(x² + y²)),

where

fg(√(x² + y²)) = e^(−(x² + y²)/(2σ²)) / (2πσ²),

and

hsharpen(x, y) = δ(x) δ(y) − ∇² f(x, y).

Hint: Use the result of Problem 3.4, which is Fg(ρ) = e^(−2σ²π²ρ²). Use σ² = 2.

5.9 Use unsharp masking as defined in Problem 5.8 to sharpen the two images in the files (a) circuit.mat and (b) quarter.mat. Use unsharp.m.

5.10 Use unsharp masking as defined in Problem 5.8 to sharpen the two images in the files (a) tire.mat and (b) coins2.mat. Use unsharp.m.

Section 5-3: Histogram Equalization

5.11 This problem applies histogram equalization to a tiny (3 × 3) image. The goal is for the reader to work the problem entirely by hand, thereby aiding understanding. The (3 × 3) image is

f[n, m] = [1 2 1; 2 3 9; 3 2 9].

(a) Plot the histogram of the image.
(b) List its distribution and CDF in a table.
(c) List values of f[n, m] and values of the histogram-equalized image g[n, m] in a table.
(d) Depict g[n, m] as a 3 × 3 matrix, similar to the depiction of f[n, m].
(e) Depict f[n, m] and g[n, m] as images, and plot their respective histograms and CDFs.

5.12 Use the program hist.m to apply histogram equalization to the image in circuit.mat. Print out the images, histograms, and CDFs of the original and equalized images.

5.13 Use the program hist.m to apply histogram equalization to the image in pout.mat. Print out the images, histograms, and CDFs of the original and equalized images.

5.14 Use the program hist.m to apply histogram equalization to the image in tire.mat. Print out the images, histograms, and CDFs of the original and equalized images.

5.15 Use the program hist.m to apply histogram equalization to the image in coins.mat. Print out the images, histograms, and CDFs of the original and equalized images.
and

hV[n, m] = [1 2 1; 0 0 0; −1 −2 −1].

Compute the discrete-space frequency response H(Ω1, Ω2) of each of these PSFs.
5.17 Sobel edge detection works by convolving the image f[n, m] with the two PSFs

hH[n, m] = [1 0 −1; 2 0 −2; 1 0 −1]

and

hV[n, m] = [1 2 1; 0 0 0; −1 −2 −1].

Show how to implement these two convolutions using
(a) 16N² additions and subtractions, since doubling is two additions;
(b) 10N² additions and subtractions, since hH[n, m] and hV[n, m] are separable.
5.18 Apply (a) Sobel edge detection and (b) Canny edge
detection to the image in plane.mat using the programs
sobel.m and canny.m. Compare results.
5.19 Apply (a) Sobel edge detection and (b) Canny edge
detection to the image in quarter.mat using the programs
sobel.m and canny.m. Compare results.
5.20 Apply (a) Sobel edge detection and (b) Canny edge detec-
tion to the image in moon.mat using the programs sobel.m
and canny.m. Compare results.
5.21 Apply (a) Sobel edge detection and (b) Canny edge
detection to the image in saturn.mat using the programs
sobel.m and canny.m. Compare results.
Chapter 6
Deterministic Approach to Image Restoration

Contents
Overview, 181
6-1 Direct and Inverse Problems, 181
6-2 Denoising by Lowpass Filtering, 183
6-3 Notch Filtering, 188

(a) Image g[n,m]: motion-blurred highway sign
(b) Image g′[n,m]: motion-blurred highway sign with additive noise

Objectives
Learn to:
■ Denoise a noisy image using the 2-D DFT and a Hamming-windowed lowpass filter.
Figure 6-1 Simulation of image blurring: the original noise-free image f(x, y) at the top is convolved with Gaussian-shaped point spread functions of different effective widths. The variable r is the radial distance r = √(x² + y²).
Setting f(x, y) = δ(x, y) in Eq. (6.1) gives

g(x, y) = h(x, y) ∗∗ δ(x, y) = h(x, y). (6.4)

In most imaging systems, the noise υ(x, y) is random in nature and usually modeled as a zero-mean Gaussian random variable (Chapter 8). Accordingly, υ(x, y) is described by a probability density function (pdf) that contains a single parameter, the noise variance σv². The pdf and associated variance can be measured experimentally by recording the output g(x, y) for many locations (x, y), while having no signal as input (f(x, y) = 0). For a camera, this is equivalent to imaging a perfectly dark object. The recorded image in that case is the noise added by the camera.

Once h(x, y) has been characterized and υ(x, y) has been modeled appropriately, image g(x, y) can be readily computed using Eq. (6.1), thereby providing a possible solution of the direct problem. Because υ(x, y) is random in nature, each simulation of Eq. (6.1) will result in a statistically different, but comparable image g(x, y).

6-1.2 The Inverse Problem

Whereas the solution of the direct problem seeks to generate image g(x, y) from image f(x, y), the solution of the inverse problem seeks to do the exact opposite, namely to extract the true image f(x, y)—or a close facsimile of f(x, y)—from the blurred and noisy image g(x, y). The process involves (a) denoising g(x, y) by filtering out υ(x, y)—or at least most of it—and (b) deconvolution of g(x, y) to generate a close approximation of the true image f(x, y). The denoising and deconvolution steps of the inversion algorithm are performed using deterministic
methods, as demonstrated in later sections of the present chapter, or they are performed using stochastic (probabilistic) methods, which we cover in Chapter 9. As a “heads up,” we note that the stochastic approach usually outperforms the deterministic approach.

Figure 6-2 The image in the upper center, g1(x, y), had been convolved with the narrow filter h1(x, y) before noise was added to it. SNR = 20 dB corresponds to (average signal power)/(average noise power) = 100, so g11(x, y) is essentially noise-free. In contrast, SNR = 0 dB in g12(x, y), which means that the average signal and noise powers are equal, and in g13(x, y) the noise power is 10× the signal power.

6-2 Denoising by Lowpass Filtering

In Section 3-7, we introduced and defined the 2-D discrete-space Fourier transform (DSFT) F(Ω1, Ω2) of discrete-space image f[n, m]. Here, Ω1 and Ω2 are continuous spatial frequencies, one period of which is over the range

−π ≤ Ω1, Ω2 ≤ π.

We will refer to this continuous-frequency domain as the discrete-space spatial frequency (DSSF) domain.

In the DSSF domain, most of the energy in the spectra of typical images is concentrated in a small central region surrounding the origin (Ω1 = 0, Ω2 = 0). In contrast, additive noise may be distributed over a wide range of frequencies Ω1 and Ω2. If we denote G(Ω1, Ω2) as the spectrum of noisy image g[n, m], the rationale behind lowpass filtering is to remove high-frequency noise from G(Ω1, Ω2) while preserving (as much as possible) the spectrum of the original image F(Ω1, Ω2). The disadvantage of lowpass filtering is that the high-DSSF regions of an image may represent features of interest, such as edges.

Application of the lowpass filter to the spectrum G(Ω1, Ω2) of noisy image g[n, m] generates spectrum

Gbrick(Ω1, Ω2) = Hbrick(Ω1, Ω2) G(Ω1, Ω2). (6.8)

The operation given by Eq. (6.8) can be performed in the (N × N) 2-D DFT domain (Section 3-8) by defining a 2-D DFT cutoff index K such that

K = Ωc N / (2π), (6.9)

and then setting to zero those elements of the 2-D DFT G[k1, k2] that fall in the range

K ≤ k1, k2 ≤ N + 2 − K. (6.10)
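This DFT-domain brickwall filter can be sketched in a few lines of NumPy. The function name is ours; note that we use 0-based DFT indices and zero the band K ≤ k ≤ N − K, whereas Eq. (6.10) states the band in 1-based (MATLAB-style) indices:

```python
import numpy as np

def dft_brickwall(g, K):
    """Brickwall lowpass in the 2-D DFT domain: zero the high-frequency
    band of DFT indices (0-based: K <= k <= N - K) along each axis."""
    N = g.shape[0]
    G = np.fft.fft2(g)
    G[K:N + 1 - K, :] = 0.0   # zero the high-k1 band
    G[:, K:N + 1 - K] = 0.0   # zero the high-k2 band
    return np.real(np.fft.ifft2(G))

N = 8
n, m = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
low = np.cos(2 * np.pi * n / N)        # k1 = 1: inside the passband for K = 2
high = np.cos(2 * np.pi * 3 * n / N)   # k1 = 3: inside the stopband for K = 2
print(np.allclose(dft_brickwall(low, 2), low))    # True
print(np.allclose(dft_brickwall(high, 2), 0.0))   # True
```

A constant (DC-only) image passes through unchanged, a low-frequency cosine survives, and a cosine above the cutoff is annihilated, which is exactly the behavior Eq. (6.10) prescribes.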
Figure 6-3 Image denoising by three lowpass filters with different cutoff wavenumbers: (e) filtered image gbrick[n,m] with Ωc2 = 50π/128; (f) its spectrum.
lowpass filter with Ωc = 75π/128 leads to the spectrum in Fig. 6-3(d). The two spectra are identical within the square defined by |Ω1|, |Ω2| ≤ Ωc1. The fractional size of the square is (75/128)² = 34% of the spectrum of the original image.

Inverse transforming the spectrum in Fig. 6-3(d) produces the lowpass-filtered image in Fig. 6-3(c). We observe that the noise is reduced, albeit only slightly, but the letters are hardly distorted.

• Narrowing the filtered spectrum down to a box with Ωc2 = 50π/128 leads to the spectrum in Fig. 6-3(f). The associated filtered image gbrick[n, m] is shown in Fig. 6-3(e). In this case, only (50/128)² = 15% of the spectrum of the original noisy image is retained. The filtered image contains less noise, but the letters are distorted slightly.

• Repeating the process, but limiting the spectrum to Ωc3 = 25π/128—in which case, only (25/128)² = 4% of the spectrum is retained—leads to the image in Fig. 6-3(g). The noise is greatly reduced, but the edges of the letters are fuzzy.

◮ This example illustrates the trade-off inherent in Fourier-based lowpass filtering: noise can be reduced, but at the expense of distorting the high-frequency content of the image. As noted earlier, in Chapter 7 we show how to avoid this trade-off using wavelets instead of Fourier transforms. ◭

6-2.2 Tapered Lowpass Filtering

Even though it is not apparent in the filtered images of Fig. 6-3, at lower cutoff DSSFs, some of what appears to be noise is actually “ringing” caused by the abrupt “brickwall” filtering of the spectrum of the noisy image. Fortunately, the ringing effect can be reduced significantly by modifying the brickwall filter into a tapered filter. We will examine both the problem and the proposed solution for 2-D images, but before we do so, it will be instructive to consider the case of a periodic 1-D signal.

A. 1-D brickwall lowpass-filtered signal

Signal x(t), shown in Fig. 6-4(a), is a square wave with period T = 2π and amplitude A = 1. Its Fourier series expansion is given by

x(t) = Σ(k = 1, k odd, to ∞) (4/(kπ)) sin(kt). (6.11)

The fundamental frequency of x(t) is f0 = 1/T = 1/2π. We can apply the equivalent of a brickwall lowpass filter with cutoff frequency kc f0 by truncating the Fourier series at k = kc. For example, if we select kc = 21, we obtain a brickwall lowpass-filtered version of x(t) given by

ybrick(t) = Σ(k = 1, k odd, to 21) (4/(kπ)) sin(kt). (6.12)

The truncated summation contains 11 nonzero terms. The plot of ybrick(t) displayed in Fig. 6-4(b) resembles the original square wave, except that it also exhibits small oscillations; i.e., the ringing effect we referred to earlier.

B. 1-D tapered lowpass-filtered signal

The ringing in the lowpass-filtered signal, which is associated with the sharp cutoff characteristic of the brickwall filter, can be reduced significantly by multiplying the terms in Eq. (6.12) by a decreasing sequence of weights, thereby tapering those terms gradually to zero. Several tapering formats are available, one of which is the Hamming window defined by Eq. (2.83). Adapting the expression for the Hamming window to the square wave leads to

yHam(t) = Σ(k = 1, k odd, to 21) (4/(kπ)) sin(kt) [0.54 + 0.46 cos(π(k − 1)/20)]. (6.13)

A plot of the tapered signal yHam(t) is shown in Fig. 6-4(c). Even though the tapered signal includes the same number of frequency harmonics as before, the oscillations have disappeared and the transitions at t = integer values of T/2 = π are relatively smooth.

C. 2-D brickwall lowpass-filtered image

The ringing observed in 1-D signals also manifests itself in 2-D images whenever the image spectrum is lowpass-filtered by a sharp filter. The DSSF response Hbrick(Ω1, Ω2) of a brickwall lowpass filter with cutoff frequency Ωc along both Ω1 and Ω2 is given by Eq. (6.7). The corresponding inverse DSFT is

hbrick[n, m] = hbrick[n] hbrick[m] = (Ωc/π)² sinc(Ωc n/π) sinc(Ωc m/π). (6.14)
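The contrast between Eqs. (6.12) and (6.13) is easy to see numerically: the brickwall partial sum overshoots the unit square wave (Gibbs ringing), while the Hamming-tapered sum stays close to it. A short Python sketch (variable names are ours):

```python
import numpy as np

t = np.linspace(0, 2 * np.pi, 2001)
ks = np.arange(1, 22, 2)   # odd harmonics 1, 3, ..., 21 (11 terms)

# Brickwall truncation of the Fourier series, Eq. (6.12)
y_brick = sum(4 / (k * np.pi) * np.sin(k * t) for k in ks)

# Hamming-tapered truncation, Eq. (6.13)
y_ham = sum(4 / (k * np.pi) * np.sin(k * t) *
            (0.54 + 0.46 * np.cos(np.pi * (k - 1) / 20)) for k in ks)

# The brickwall sum overshoots the amplitude-1 square wave near the jumps;
# the tapered sum remains close to 1.
print(y_brick.max(), y_ham.max())
```

Both sums use the same 11 harmonics; only the weighting differs, which is precisely the point of the tapered filter.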
6-2 DENOISING BY LOWPASS FILTERING 187
0.5
−0.5
−1
0 1 2 3 4 5 6 7 8 9
t
π 2π
(a) Square wave x(t)
0.5
−0.5
−1
t
0 1 2 3 4 5 6 7 8 9
(b) Brickwall lowpass-filtered signal ybrick(t)
0.5
−0.5
−1
t
0 1 2 3 4 5 6 7 8 9
(c) Tapered Fourier series signal yHam(t)
Figure 6-4 (a) Square wave x(t), (b) brickwall lowpass-filtered version, and (c) Hamming-windowed version.
D. 2-D tapered lowpass-filtered image

Figure 6-5 Hamming window of length N = 10: (a) impulse response, and (b) spectrum.

Figure 6-5 displays the spatial and frequency domain responses of a Hamming window with N = 10, adapted from Fig. 2-13. For a 2-D image g[n, m], lowpass filtering its spectrum with a Hamming window of length N is equivalent to performing the convolution

gHam[n, m] = hHam[n, m] ∗∗ g[n, m] (6.17a)

with

hHam[n, m] = hHam[n] hHam[m] = (Ωc/π)² sinc(Ωc n/π) [0.54 + 0.46 cos(πn/N)] sinc(Ωc m/π) [0.54 + 0.46 cos(πm/N)] for |n|, |m| ≤ N, and 0 for |n|, |m| > N. (6.17b)

To illustrate the presence of “ringing” when a sharp-edged filter like a brickwall is used, and its absence when a Hamming-windowed filter is used instead, we refer the reader to Fig. 6-6. In part (a), we show a noiseless letters image, and its corresponding spectrum is displayed in part (b). Application of a brickwall lowpass filter (with the impulse response given by Eq. (6.14)) leads to the image in Fig. 6-6(c). The “ringing” in the image is visible in the form of whorls that resemble a giant thumbprint. In contrast, the image in Fig. 6-6(e)—which was generated by applying a Hamming-windowed filter to the original image—exhibits no “ringing.” Note that the Hamming-windowed spectrum in Fig. 6-6(f) tapers gradually from the center outward. It is this tapering profile that eliminates the ringing effect.

Concept Question 6-1: For lowpass filtering, why would we use a Hamming-windowed filter instead of a brick-wall filter?

6-3 Notch Filtering

Occasionally, an image may contain a 2-D sinusoidal interference contributed by an electromagnetic source, such as the ac power cable in a camera. In 1-D discrete-time signals, sinusoidal interference can be eliminated by subjecting the signal’s spectrum to a notch filter, which amounts to setting the spectrum at that specific frequency to zero. A similar process can be applied to a 2-D image. To illustrate, let us consider the example portrayed in Fig. 6-7. In part (a) of the figure, we have a 660 × 800 image of the planet Mars recorded by a Mariner space
Figure 6-6 Letters image and its spectrum in (a) and (b); brickwall lowpass-filtered version in (c) and (d); and Hamming-windowed lowpass-filtered version in (e) and (f). The logarithmic scale enhances small values of the spectrum.
Figure 6-7 Process for notch-filtering horizontal scan lines: (a) original image with scan lines; (b) log of magnitude of spectrum of original image; (c) image of vertically periodic horizontal lines; (d) magnitude of spectrum of vertically periodic horizontal lines; (e) notch-filtered image; (f) log of magnitude of spectrum of notch-filtered image. Fig. 6-7(a) courtesy of NASA.
6-4 Image Deconvolution
to use the fast radix-2 2-D FFT to compute 2-D DFTs of order
(N × N). Alternatively, the Cooley-Tukey FFT can be used, in
which case N should be an integer with a large number of small
factors.
Sampling the DSFT at Ω1 = 2π k1/N and Ω2 = 2π k2 /N for
k1 = 0, 1, . . . , N − 1 and k2 = 0, 1, . . . , N − 1 provides the DFT
complex coefficients G[k1 , k2 ].
A similar procedure can be applied to h[n, m] to obtain
coefficients H[k1 , k2 ], after zero-padding h[n, m] so that it also
is of size (N × N). The DFT equivalent of Eq. (6.23) is then
given by
G[k1 , k2 ]
F[k1 , k2 ] = . (6.25)
H[k1 , k2 ]
Exercising the process for all possible values of k1 and k2 leads (b) Blurred image g[n,m]
to an (N × N) 2-D DFT for F[k1 , k2 ], whereupon application of
an inverse 2-D DFT process yields a zero-padded version of
f [n, m]. Upon discarding the zeros, we obtain the true image
f [n, m]. The deconvolution procedure is straightforward, but
it hinges on a critical assumption, namely that none of the
DFT coefficients of the imaging system’s transfer function is
zero. Otherwise, division by zero in Eq. (6.25) would lead to
undeterminable values for F[k1 , k2 ].
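The procedure is easy to verify numerically. The following is a hedged NumPy sketch (the image and PSF are made up, chosen so that no DFT coefficient of the PSF is zero):

```python
import numpy as np

# Zero-padded DFT deconvolution, Eq. (6.25): F = G / H, inverse DFT,
# then discard the padding. Works only because H has no zero coefficients.
rng = np.random.default_rng(0)
f = rng.random((8, 8))                  # "true" image f[n, m]
h = np.array([[1.0, 0.3],
              [0.2, 0.1]])              # PSF; |H| >= 1 - 0.6 > 0 everywhere

N = 16                                  # radix-2 size >= 8 + 2 - 1
H = np.fft.fft2(h, s=(N, N))
g = np.fft.ifft2(np.fft.fft2(f, s=(N, N)) * H).real   # blurred image

F = np.fft.fft2(g, s=(N, N)) / H        # Eq. (6.25)
f_rec = np.fft.ifft2(F).real[:8, :8]    # discard the zero padding
print(np.allclose(f_rec, f))            # True: exact recovery without noise
```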
Tikhonov regularization seeks to minimize the regularized cost function, leading to the Wiener filter of Eq. (6.31), with

W[k1 , k2 ] = H∗[k1 , k2 ] / ( |H[k1 , k2 ]|² + λ² ).  (6.32b)

Concept Question 6-4: What does the Wiener filter given by Eq. (6.31) reduce to when λ = 0?

Concept Question 6-5: Why is Tikhonov regularization needed in deconvolution?
The operation of the Wiener filter is summarized as follows:

(a) For values of (k1 , k2 ) such that |H[k1 , k2 ]| ≫ λ, the Wiener filter implementation leads to

F̂[k1 , k2 ] ≈ G[k1 , k2 ] H∗[k1 , k2 ] / |H[k1 , k2 ]|² = G[k1 , k2 ] / H[k1 , k2 ],  (6.33a)

which is the same as Eq. (6.25).

(b) For values of (k1 , k2 ) such that |H[k1 , k2 ]| ≪ λ, the Wiener filter implementation leads to

F̂[k1 , k2 ] ≈ G[k1 , k2 ] H∗[k1 , k2 ] / λ².  (6.33b)

In this case, the Wiener filter avoids the noise amplification problem that would have occurred with the use of the unregularized deconvolution given by Eq. (6.25).

Exercise 6-1: Apply Tikhonov regularization with λ = 0.01 to the 1-D deconvolution problem

{x[0], x[1]} ∗ {h[0], h[1], h[2]} = {2, −5, 4, −1},

where h[0] = h[2] = 1 and h[1] = −2.

Answer: H(0) = 1 − 2 + 1 = 0, so X(Ω) = Y(Ω)/H(Ω) will not work at Ω = 0. But

X(Ω) = Y(Ω) H∗(Ω) / ( |H(Ω)|² + λ² )

does work. Using 4-point DFTs (computable by hand) gives x[n] = {1.75, −1.25, −0.25, −0.25}, which is close to the actual x[n] = {2, −1}. MATLAB code:

h=[1 -2 1];x=[2 -1];y=conv(x,h);
H=fft(h,4);Y=fft(y);
Z=conj(H).*Y./(abs(H).*abs(H)+0.0001);
z=real(ifft(Z))

provides the estimated x[n].

6-4.5 Wiener Filter Deconvolution Example

To demonstrate the capabilities of the Wiener filter, we compare image deconvolution performed with and without regularization. The demonstration process involves images at various stages, namely:
• f [n, m]: true letters image (Fig. 6-9(a)).

• g[n, m] = h[n, m] ∗∗ f [n, m] + v[n, m]: the imaging process not only distorts the image (through the PSF), but also adds random noise v[n, m]. The result, displayed in Fig. 6-9(b), is an image with a signal-to-noise ratio of 10.8 dB, which means that the random noise energy is only about 8% of that of the signal.

• f̂1 [n, m]: estimate of f [n, m] obtained without regularization (i.e., using Eq. (6.25)). Image f̂1 [n, m], displayed in Fig. 6-9(c), does not show any of the letters present in the original image, despite the fact that the noise level is small relative to the signal.

• f̂2 [n, m]: estimate of f [n, m] obtained using the Wiener filter of Eq. (6.31) with λ² = 5. The deconvolved image (Fig. 6-9(d)) displays all of the letters contained in the original image, but some high-wavenumber noise also is present.

6-5 Median Filtering

Median filtering is used to remove salt-and-pepper noise, often due to bit errors or shot noise associated with electronic devices. The concept of median filtering is very straightforward:

◮ A median filter of order L replaces each pixel with the median value of the L² pixels in the L × L block centered on that pixel. ◭

For example, a median filter of order L = 3 replaces each pixel [n, m] with the median value of the 3 × 3 = 9 pixels centered at [n, m]. Figure 6-10(a) shows an image corrupted with salt-and-pepper noise, and part (b) of the same figure shows the image after the application of a median filter of order L = 5.

Concept Question 6-6: When is median filtering useful?
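The definition above translates directly into code. A minimal sketch (the function name and the edge-replication border handling are our choices, not the book's):

```python
import numpy as np

# Order-L median filter: replace each pixel with the median of the
# L x L block centered on it (borders handled by edge replication).
def median_filter(img, L=3):
    r = L // 2
    padded = np.pad(img, r, mode='edge')
    out = np.empty_like(img)
    for n in range(img.shape[0]):
        for m in range(img.shape[1]):
            out[n, m] = np.median(padded[n:n + L, m:m + L])
    return out

img = np.full((5, 5), 10.0)
img[2, 2] = 255.0                        # a single "salt" pixel
print(median_filter(img)[2, 2])          # 10.0: the outlier is removed
```

Because the median of the 9-pixel block ignores the single outlier, the salt pixel is replaced by the background value, something no linear lowpass filter can do without blurring.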
6-6 MOTION-BLUR DECONVOLUTION 195
Figure 6-9 (a) Original noise-free undistorted letters image f [n, m], (b) blurred image due to imaging system PSF and addition of random
noise v[n, m], (c) deconvolution using Eq. (6.25), and (d) deconvolution using Eq. (6.31) with λ 2 = 5.
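The comparison of Fig. 6-9 can be reproduced in miniature. This is a hedged NumPy sketch with a toy image, PSF, and noise level of our own choosing, not the book's letters image:

```python
import numpy as np

# Deconvolution of a blurred, noisy image: unregularized division by H
# (Eq. (6.25)) amplifies noise at bins where |H| is tiny; adding lambda^2
# to |H|^2 (Wiener filter) suppresses that amplification.
rng = np.random.default_rng(1)
f = np.zeros((32, 32)); f[8:24, 8:24] = 100.0   # "true" image
h = np.ones((4, 4)) / 16.0                      # blurring PSF

H = np.fft.fft2(h, s=(35, 35))
g = np.fft.ifft2(np.fft.fft2(f, s=(35, 35)) * H).real
g += rng.normal(scale=1.0, size=g.shape)        # random noise v[n, m]

G = np.fft.fft2(g)
f1 = np.fft.ifft2(G / H).real[:32, :32]         # unregularized estimate
lam2 = 0.01
f2 = np.fft.ifft2(G * np.conj(H) / (np.abs(H)**2 + lam2)).real[:32, :32]

err1 = np.abs(f1 - f).max()
err2 = np.abs(f2 - f).max()
print(err1 > err2)                              # Wiener estimate is closer
```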
6-6 Motion-Blur Deconvolution

6-6.1 Continuous Space

If, during the recording time for generating a still image of an object or scene, the imaged object or scene is in motion relative to the imaging system, the recorded image will exhibit a streaking pattern known as motion blur. An example is the simulated photograph of the highway sign shown in Fig. 6-11(a), taken from a moving car. Often, the direction and duration of the blur can be discerned from the blurred image.

To describe motion blur mathematically, we start by making the following assumptions:

(1) The blurred image has been appropriately rotated so that
The spatial shift occurs along x only, and its length is D, which is equivalent to defining g(x, y) as the convolution of f (x, y) with a point spread function h(x, y) composed of a rectangle function of spatial duration D = sT and centered at D/2:

h(x, y) = (1/s) rect( (x − D/2)/D ) δ(y),  (6.37)

where s is the relative speed of the imager (along the x axis). Using entries #1a and #4 in Table 2-5 gives

H(µ, ν) = F x→µ { (1/s) rect( (x − D/2)/D ) } F y→ν { δ(y) } = (D/s) sinc(µD) e^−jπµD.  (6.38)

The convolution in the spatial domain given by Eq. (6.36) becomes a product in the spatial frequency domain:

G(µ, ν) = F(µ, ν) H(µ, ν).  (6.39)

To recover the unblurred image f (x, y), we need to:

(a) divide Eq. (6.39) by H(µ, ν) to obtain

F(µ, ν) = (1 / H(µ, ν)) G(µ, ν),  (6.40)

where G(µ, ν) is the spatial frequency spectrum of the blurred image, and then

(b) perform an inverse transform on F(µ, ν).

However, in view of the definition of the sinc function,

sinc(µD) = sin(πµD) / (πµD),  (6.41)

it follows that the spatial frequency response H(µ, ν) = 0 for nonzero integer values of µD. Consequently, the inverse filter 1/H(µ, ν) is undefined for nonzero integer values of µD, thereby requiring the use of regularization (Section 6-4.3).

6-6.2 Motion Blur after Sampling

To convert image representation from the continuous-space case of the previous subsection to the sampled-space case, we start by sampling unblurred (still) image f (x, y) and the motion-blurred image g(x, y) at x = n∆ and y = m∆:

f [n, m] = f (x = n∆, y = m∆),  (6.42a)
g[n, m] = g(x = n∆, y = m∆).  (6.42b)

We also discretize time t in steps of ∆t. The total number of time shifts N that occur during the total recording time T is

N = T / ∆t.  (6.45)

In terms of these new quantities, the discrete-case analogues to Eqs. (6.36) and (6.37) are

g[n, m] = ∑_{i=0}^{N} f [n − i, m] ∆t = f [n, m] ∗∗ h[n, m],  (6.46)

where the discrete-space PSF h[n, m] is

h[n, m] = rect( (n − N/2)/(N/2) ) δ[m] ∆t.  (6.47)

The rectangle function is of duration (N + 1), extending from n = 0 to n = N, and centered at N/2. We assume that N is an even integer. As with the continuous-space case, the deblurring operation (to retrieve f [n, m] from g[n, m]) is performed in the spatial frequency domain, wherein the assumption that N is an even integer is not relevant, so the assumption is mathematically convenient, but not critical.

The spatial frequency domain analogue of Eq. (6.46) is

G(Ω1 , Ω2 ) = F(Ω1 , Ω2 ) H(Ω1 , Ω2 ).  (6.48)

Here, G(Ω1 , Ω2 ) is the 2-D spectrum of the recorded blurred image, and H(Ω1 , Ω2 ) is the discrete-space spatial frequency (DSSF) response of h[n, m]. From entry #7 in Table 2-8, and noting that ∆t = T /N, the DSSF response function corresponding to Eq. (6.47) is given by

H(Ω1 , Ω2 ) = DSFT{h[n, m]} = DTFT n→Ω1 { rect( (n − N/2)/(N/2) ) } × DTFT m→Ω2 { δ[m] } ∆t = (T /N) [ sin( Ω1 (N + 1)/2 ) / sin(Ω1 /2) ] e^−jΩ1 N/2.  (6.49)
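The periodic zeros of this Dirichlet-like spectrum are what make the blur length identifiable from the blurred image's spectrum. A NumPy sketch with illustrative sizes chosen so that the zeros fall exactly on DFT bins:

```python
import numpy as np

# The rect PSF of Eq. (6.47) has duration N + 1 samples, so its spectrum
# has zeros every M/(N + 1) bins of an M-point DFT. Measuring the spacing
# of those zero bands recovers N.
N, T, M = 8, 1.0, 63                     # M chosen divisible by N + 1
h = np.zeros(M)
h[:N + 1] = T / N                        # h[n] = rect * dt, with dt = T/N

H = np.abs(np.fft.fft(h))
zero_bins = np.where(np.isclose(H, 0.0, atol=1e-12))[0]
spacing = zero_bins[0]                   # = M/(N + 1) = 7
print(M // spacing - 1)                  # recovers N = 8
```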
Summary
Concepts
• Image restoration is about reconstructing an image from its blurred, noisy, or interference-corrupted version.
• Lowpass filtering reduces noise, but it blurs edges and fine-scale image features. Wavelet-based denoising (in Chapter 7) reduces noise while preserving edges.
• A Hamming-windowed PSF reduces “ringing” in the filtered image.
• Notch filtering reduces sinusoidal interference caused by AC interference.
• Motion blur deconvolution undoes the blur caused by camera motion.
PROBLEMS 199
Mathematical Formulae
1-D Hamming-windowed lowpass filter:
hFIR [n] = (Ωc /π) sinc(Ωc n/π) [ 0.54 + 0.46 cos(πn/N) ]  for |n| ≤ N,
hFIR [n] = 0  for |n| > N

2-D Hamming-windowed lowpass filter:
hFIR [n, m] = hFIR [n] hFIR [m]

Deconvolution formulation:
g[n, m] = h[n, m] ∗∗ f [n, m] + v[n, m]

Tikhonov regularization criterion:
e = ∑_{n=0}^{N−1} ∑_{m=0}^{N−1} [ (g[n, m] − h[n, m] ∗∗ f̂[n, m])² + λ² f̂[n, m]² ]

Wiener filter:
F̂[k1 , k2 ] = G[k1 , k2 ] H∗[k1 , k2 ] / ( |H[k1 , k2 ]|² + λ² )

Motion blur PSF:
h(x, y) = (1/s) rect( (x − D/2)/D ) δ(y)
Important Terms Provide definitions or explain the meaning of the following terms:
deconvolution Hamming window motion blur notch filter Tikhonov criterion Wiener filter
(a) Compute the spatial frequency response H(Ω1 , Ω2 ) of this system.

(b) Show that it does indeed eliminate f [n, m].

(c) Specify a problem with using this LSI system as a notch filter.

6.6 Download file P66.mat. Use notch filtering to eliminate stripes in the clown image. Print out the striped image, its spectrum, and the notch-filtered image and its spectrum. Hint: Let x[n] have length N and its N-point DFT X[k] have a peak at k = k0. From Eq. (2.88), the peak represents a sinusoid with frequency Ω0 = 2πk0/N. The sinusoid will repeat about Ω0 N/(2π) = k0 times over the length N of x[n].

6.7 Download file P67.mat. Use notch filtering to eliminate stripes in the head. Print out the striped image, its spectrum, and the notch-filtered image and its spectrum. Hint: Let x[n] have length N and its N-point DFT X[k] have a peak at k = k0. From Eq. (2.88), the peak represents a sinusoid with frequency Ω0 = 2πk0/N. The sinusoid will repeat about Ω0 N/(2π) = k0 times over the length N of x[n].

6.8 Download file P68.mat. Use notch filtering to eliminate two sets of lines. Note that there are horizontal lines on top of the image, and vertical lines in the image. Use the procedure used in Section 6-2, but in both horizontal and vertical directions. Print out the original image, its spectrum, and the notch-filtered image and its spectrum.

6.9 Download file P69.mat. Use notch filtering to eliminate two sets of lines. Note that there are horizontal lines on top of the image, and vertical lines in the image. Use the procedure used in Section 6-2, but in both horizontal and vertical directions. Print out the original image, its spectrum, and the notch-filtered image and its spectrum.

Section 6-4: Image Deconvolution

6.10 Derive the Wiener filter by showing that the f̂[n, m] minimizing the Tikhonov functional

T = ∑n ∑m [ (g[n, m] − h[n, m] ∗∗ f̂[n, m])² + λ² ( f̂[n, m])² ]

has 2-D DFT

F̂[k1 , k2 ] = G[k1 , k2 ] H∗[k1 , k2 ] / ( |H[k1 , k2 ]|² + λ² ).

Hints: Use Parseval’s theorem and

|a + b|² = aa∗ + ab∗ + ba∗ + bb∗.

Add and subtract |GH∗|², divide by (HH∗ + λ²), and complete the square.

6.11 This is an introductory image deconvolution problem using a Wiener filter. A crude lowpass filter is equivalent to convolution with PSF

h[n, m] = 1/L² for 0 ≤ n, m ≤ L − 1, and h[n, m] = 0 otherwise.

This problem undoes this crude lowpass filter using a Wiener filter with λ = 0.01.

(a) Blur the clown image with h[n, m] for L = 11 using:
clear;load clown;
H=ones(11,11)/121;Y=conv2(X,H);

(b) Deblur the blurred image using a Wiener filter using:
imagesc(Y),colormap(gray);
FY=fft2(Y);FH=fft2(H,210,210);
FZ=FY.*conj(FH)./(abs(FH).*abs(FH)+.0001);
Z=real(ifft2(FZ));
figure,imagesc(Z),colormap(gray)

6.12 This is an introductory image deconvolution problem using a Wiener filter. A crude lowpass filter is equivalent to convolution with PSF

h[n, m] = 1/L² for 0 ≤ n, m ≤ L − 1, and h[n, m] = 0 otherwise.

This problem undoes this crude lowpass filter using a Wiener filter with λ = 0.01.

(a) Blur the letters image with h[n, m] for L = 15 using:
clear;load letters;H=ones(15,15)/225;
Y=conv2(X,H);

(b) Deblur the blurred image using a Wiener filter using:
imagesc(Y),colormap(gray);FY=fft2(Y);
FH=fft2(H,270,270);
FZ=FY.*conj(FH)./(abs(FH).*abs(FH)+.0001);Z=real(ifft2(FZ));
figure,imagesc(Z),colormap(gray)

6.13 Deblurring due to an out-of-focus camera can be modelled crudely as a 2-D convolution with a disk-shaped point-spread function

h[n, m] = 1 for n² + m² < R², and h[n, m] = 0 for n² + m² > R².
This problem deblurs an out-of-focus image in the (unrealistic) absence of noise.

(a) Blur the letter image with an (approximate) disk PSF using
H(25,25)=0;for I=1:25;for J=1:25;
if((I-13)*(I-13)+(J-13)*(J-13)<145);
H(I,J)=1;end;end;end;
load letters;Y=conv2(X,H);
subplot(221),imagesc(Y),colormap(gray)

(b) Deblur this out-of-focus image using the command
Z=real(ifft2(fft2(Y)./fft2(H,280,280)));
subplot(222),imagesc(Z),colormap(gray)
Note that the size of the blurred image is 256 + 25 − 1 = 280.

(c) Explain why this approach will not work in the real world (i.e., in the presence of noise).

6.14 Repeat Problem 6.13, only now add noise to the blurred image:

(a) Add noise to the blurred image using
Y=Y+100*randn(280,280);

(b) Deblur the image as in Problem 6.13. You should get noise!

(c) Deblur the image using a Wiener filter, using
FH=fft2(H,280,280);
W=real(ifft2(fft2(Y).*conj(FH)./(abs(FH).*abs(FH)+10)));
subplot(221),imagesc(Z),colormap(gray)
subplot(222),imagesc(W),colormap(gray)

6.15 Repeat Problem 6.13 using the clown image. Note that the size of the blurred image is now 200 + 25 − 1 = 224.

6.16 Repeat Problem 6.14 using the clown image. Note that the size of the blurred image is now 200 + 25 − 1 = 224. Add noise using Y=Y+randn(224,224); and use λ² = 100, since the clown pixel values have a maximum value of only 1, while the letters pixel values have a maximum value of 255.

(b) followed by, for an additional Ty − Tx s (for Tx < t < Ty),
(c) vertical, in increasing y, at speed ry cm/s for Ty − Tx s.

Compute the spatial frequency response of the camera motion.

6.19 Download file P619.mat. The goal is to deconvolve the motion blur.
(a) Compute and display the spectrum of the blurred image. What causes the vertical bands of zeros (VBZ)? Hint: See the numerator of Eq. (6.45).
(b) From the spacing between the VBZ, compute N in Eq. (6.45).
(c) Deconvolve the image using the Wiener filter Eq. (6.46). Let T = 1 and λ = 0.01.

6.20 Download file P620.mat. The goal is to deconvolve the motion blur.
(a) Compute and display the spectrum of the blurred image. What causes the vertical bands of zeros (VBZ)? Hint: See the numerator of Eq. (6.45).
(b) From the spacing between the VBZ, compute N in Eq. (6.45).
(c) Deconvolve the image using the Wiener filter Eq. (6.46). Let T = 1 and λ = 0.01.

6.21 Download file P621.mat. The goal is to deconvolve the motion blur. The blurred image in this problem is the SAR image from Chapter 4.
(a) Compute and display the spectrum of the blurred image. What causes the vertical bands of zeros (VBZ)? Hint: See the numerator of Eq. (6.45).
(b) From the spacing between the VBZ, compute N in Eq. (6.45).
(c) Deconvolve the image using the Wiener filter Eq. (6.46). Let T = 1 and λ = 0.01.
Overview, 203
7-1 Tree-Structured Filter Banks, 203
7-2 Expansion of Signals in Orthogonal Basis Equations, 238

Objectives

• Filtering of signals and images: since the signal or image in the wavelet transform domain requires many fewer numbers to represent it, thresholding small values of the wavelet transform of a noisy signal or image to zero reduces the noise in the original signal or image. We will show that the combination of thresholding and shrinkage gives results far superior to using the 2-D DFT for noise reduction.

After this Overview section, we present the Haar wavelet transform, which is the simplest wavelet transform, and yet illustrates many features of the family of wavelet transforms. We then present quadrature mirror filters (QMFs) and derive the Smith-Barnwell condition for perfect reconstruction of the original signal from its wavelet transform. We conclude our treatment of wavelets by deriving the Daubechies wavelet function, which is the most commonly used wavelet function because it sparsifies many real-world signals. Finally, examples of image compression, denoising, and compressed sensing are provided.

(2) Addition:

z[n] = y1[n] + y2[n]

(3) Downsampling (decimation):

x[n] → [↓2] → yd [n] = x[2n]

Discarding every other sample in x[n].

(4) Upsampling (zero-stuffing):

x[n] → [↑2] → yu [n] = { x[n/2] for n even; 0 for n odd }
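These building-block operations map directly onto array slicing. A brief NumPy sketch:

```python
import numpy as np

# Downsampling (decimation) by 2 and upsampling (zero-stuffing) by 2.
x = np.array([1, 2, 3, 4, 5, 6])

yd = x[::2]                              # yd[n] = x[2n]
yu = np.zeros(2 * len(x), dtype=x.dtype)
yu[::2] = x                              # zeros between successive samples

print(yd)                                # [1 3 5]
print(yu)                                # [1 0 2 0 3 0 4 0 5 0 6 0]
```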
204 CHAPTER 7 WAVELETS AND COMPRESSED SENSING
Figure 7-1 Tree-like filter structure for subband decomposition. The green boxes denote lowpass and highpass frequency filters, realized through cyclic convolution.
Inserting a zero between successive values of x[n].

(5) Convolution:

x[n] → [h[n]] → y[n] = h[n] ∗ x[n]

In Fig. 7-1, the input signal x[n] with spectrum (DTFT) X(Ω) is separated into a low-frequency-band signal xL [n] whose spectrum is roughly

XL (Ω) = X(Ω) for 0 ≤ |Ω| < π/2, and XL (Ω) = 0 for π/2 < |Ω| ≤ π,  (7.1)

and a high-frequency-band signal xH [n] whose spectrum is roughly

XH (Ω) = 0 for 0 ≤ |Ω| < π/2, and XH (Ω) = X(Ω) for π/2 < |Ω| ≤ π.  (7.2)

Each signal can be downsampled by 2 without aliasing, resulting in xLD [n] = xL [2n] and xHD [n] = xH [2n]. There are now two different signals, each of which is sampled only half as often as x[n], so the total number of samples is unaltered, and each represents a different frequency band of the original signal.

This same decomposition can then be applied to each of the two downsampled signals xLD and xHD , which results in four signals, each of which is sampled only one fourth as often as x[n], so the total number of samples is the same, and each represents a different frequency band of bandwidth π/4. Repeating this decomposition N times, x[n] can be decomposed into 2^N signals, each of which represents a different frequency band of bandwidth π/2^N and is sampled only 1/2^N as often as x[n]. Use of N = 5, resulting in 2^5 = 32 subbands, is a common choice.

In Fig. 7-1:

• xLD [n] is the lowpass part, 0 ≤ |Ω| ≤ π/2, of x[n].

• xHD [n] is the highpass part, π/2 ≤ |Ω| ≤ π, of x[n].

The signals at the second stage of the filter bank have spectra that are roughly as follows:

• xLDLD [n] is the lowpass part, 0 ≤ |Ω| ≤ π/2, of xLD [n], which is equivalent to the lowpass part, 0 ≤ |Ω| ≤ π/4, of x[n].

• xLDHD [n] is the highpass part, π/2 ≤ |Ω| ≤ π, of xLD [n], which is equivalent to the bandpass part π/4 ≤ |Ω| ≤ π/2 of x[n].

If we were to extend the filter bank in Fig. 7-1 to another stage, the signals at the third stage would have spectra that are roughly:

xLDLDLD [n] is the lowpass part, 0 ≤ |Ω| ≤ π/2, of xLDLD [n], which is equivalent to the lowpass part, 0 ≤ |Ω| ≤ π/8, of x[n].

xLDLDHD [n] is the highpass part, π/2 ≤ |Ω| ≤ π, of xLDLD [n], which is equivalent to the bandpass part π/8 ≤ |Ω| ≤ π/4 of x[n].

At each stage, decimation (halving the sampling rate) expands the spectrum of each signal to the full range 0 ≤ |Ω| < π (see Section 7-1.1).

7-1.1 Octave-Based Filter Banks

The tree structure in Fig. 7-1 can be replaced with the simpler structure shown in Fig. 7-2. As we show later in Section 7-5,
7-1 TREE-STRUCTURED FILTER BANKS 205
Figure 7-2 Octave-based filter bank structure for subband decomposition. Note that only the upper half (lowpass) of each stage is decomposed further.

(Companion synthesis bank: the subband signals are upsampled by 2 and filtered with the time-reversed filters g[−n] and h[−n].)
◮ Real-world signals and images do tend to consist of mostly slowly-varying regions, containing a few localized regions in which they are fast-varying. Wavelets are good at representing such signals with wavelet transforms that are mostly zero-valued. ◭

Exercise 7-1: (a) An input signal of duration N is fed into a tree-based filter bank of five stages. What is the combined total duration of the output signals? Repeat for (b) an octave-based filter bank.

Answer: (a) 2^5 N = 32N. (b) N, because lower-wavenumber bands can be sampled less often.
An important consequence of the orthogonality property is that coefficients xk can be computed from x[n] using

xk = (1/C) ∑_{n=−∞}^{∞} x[n] φ∗k [n].  (7.6)

The other part of the significance is Rayleigh’s theorem (Section 7-2.3).

7-2.2 Expansion Coefficients

Equation (7.6) can be derived as follows:

1. In Eq. (7.4), change index k to index k2 , giving

x[n] = ∑_{k2 =1}^{∞} xk2 φ k2 [n].  (7.7)

2. Upon multiplying both sides of Eq. (7.7) by φ ∗k1 [n] and summing over index n, we have

∑_{n=−∞}^{∞} x[n] φ ∗k1 [n] = ∑_{n=−∞}^{∞} ∑_{k2 =1}^{∞} xk2 φ k2 [n] φ ∗k1 [n].  (7.8)

3. Interchanging the order of summations in the right side of Eq. (7.8) leads to

∑_{n=−∞}^{∞} x[n] φ ∗k1 [n] = ∑_{k2 =1}^{∞} xk2 ∑_{n=−∞}^{∞} φ k2 [n] φ ∗k1 [n].  (7.9)

4. Using the orthogonality property given by Eq. (7.5) gives

∑_{n=−∞}^{∞} x[n] φ ∗k1 [n] = ∑_{k2 =1}^{∞} xk2 C δ[k2 − k1 ] = C xk1 .  (7.10)

Dividing by C and replacing k1 with k gives Eq. (7.6).

A good example of an orthogonal expansion is the 1-D DFT and its inverse defined in Eq. (2.89) for a finite-length signal x[n]:

X[k] = ∑_{n=0}^{M−1} x[n] e^−j2πnk/N ,  k = 0, . . . , N − 1,  (7.11a)

x[n] = (1/N) ∑_{k=0}^{N−1} X[k] e^j2πnk/N ,  n = 0, . . . , M − 1.  (7.11b)

The ranges in the summations in Eqs. (7.11) are particular to the DFT and differ from those in the generic definition of the basis function given by Eq. (7.4) and its inverse in Eq. (7.6). Table 7-1 compares attributes of the generic orthogonal expansion basis function with those of the DFT and the continuous-time Fourier series.

◮ If C = 1 in Eq. (7.6), the orthogonal basis functions are said to be orthonormal. The wavelet transform (Section 7-5) uses orthonormal basis functions. ◭

7-2.3 Parseval’s Theorem and its Significance

For two continuous-time signals x(t) and y(t) and their associated Fourier transforms X( f ) and Y( f ), Parseval’s theorem is stated in Eq. (2.27) as

∫_{−∞}^{∞} x(t) y∗ (t) dt = ∫_{−∞}^{∞} X( f ) Y∗ ( f ) d f .  (7.12a)

The special case wherein x(t) = y(t) is known as Rayleigh’s theorem:

E = ∫_{−∞}^{∞} |x(t)|² dt = ∫_{−∞}^{∞} |X( f )|² d f ,  (7.12b)

which states that the energies of x(t) and X( f ) are equal.

Similarly, Rayleigh’s theorem for a discrete-time signal x[n] is given by Eq. (2.80) as

∑_{n=−∞}^{∞} |x[n]|² = (1/2π) ∫_{−π}^{π} |X(Ω)|² dΩ,  (7.13)

where X(Ω) is the DTFT of x[n].

The statements given by Eqs. (7.12) and (7.13) can be generalized to the generic orthogonal basis function expressed in Eq. (7.4):

∑_{n=−∞}^{∞} |x[n]|² = C ∑_{k=−∞}^{∞} |xk |²  (7.14a)  (Rayleigh’s theorem),

∑_{n=−∞}^{∞} x[n] y[n]∗ = C ∑_{k=−∞}^{∞} xk y∗k  (7.14b)  (Parseval’s theorem),

where x[n] and y[n] are any two discrete-time functions. Our interest in this book is in real-valued 2-D images, so the complex conjugation on x[n] and y[n] in Eq. (7.14b) is irrelevant, but we have decided to retain it for the sake of completeness.
Table 7-1 1-D DFT and Fourier series compared with generic orthogonal expansion function.
For an orthonormal set of basis functions with C = 1, Rayleigh’s theorem states that the energy of x[n], summed over all n, is equal to the energy of coefficients { xk }, summed over all k. The statement is equally applicable to small perturbations in total energy. Consider, for example, a small perturbation ε[n] from signal x[n]: the energy of the perturbation is equal to the energy of the transform coefficients { εk }.

Answer: By Rayleigh’s theorem, the average power of x(t) is

∑_{k=1}^{∞} 1/k² = π²/6,

and the average power of the output signal is

∑_{k=1}^{2} 1/k² = 1.25.

This is because the lowpass filter sets the Fourier series coefficients for k ≥ 3 to zero. The ratio of the two powers is 1.25/(π²/6) = 0.76.

7-3.1 Why Use Cyclic Convolutions?

In Section 2-7.2, we introduced the concept of cyclic convolution x1 [n] ⊛ x2 [n] between two signals x1 [n] and x2 [n], where we showed how it can be computed from the traditional linear convolution x1 [n] ∗ x2 [n], as demonstrated in Example 2-6, or by applying the DFT method.

The wavelet transform—the prime topic of this chapter—employs convolution, decimation, and zero-stuffing. If we use linear convolutions in computing the wavelet transform, the total length of the decimated signal will be longer than that of the original signal. At each stage in Fig. 7-1 or Fig. 7-2, for example, the linear convolution with h[n] or g[n] would generate a new signal longer than the input signal by the length of h[n] or g[n], respectively. The advantage of cyclic convolution is that the new signal remains at the same length as that of the input signal. This property limits the computational storage to the same storage required for the original signal.

As noted, the cyclic convolution can be computed directly from the linear convolution, or indirectly by applying the DFT method. In preparation for the material presented in forthcoming sections, we present reviews of both computational approaches.

7-3.2 Computing Linear Convolution

Suppose we are given a causal signal x[n] and a causal filter (or another signal) h[n] defined as

x[n] = { x[0], x[1], . . . , x[N1 ] },  (7.19a)

and

h[n] = { h[0], h[1], . . . , h[N2 ] }.  (7.19b)

Their linear convolution is

y[n] = h[n] ∗ x[n] = ∑_{i=−∞}^{∞} h[i] x[n − i].  (7.20)

If x[n] has

support: Nxℓ ≤ n ≤ Nxu , and
duration: Nx = Nxu − Nxℓ + 1,

where second subscripts ℓ and u refer to the lower and upper values of Nx , and if h[n] has

support: Nhℓ ≤ n ≤ Nhu , and  (7.21a)
duration: Nh = Nhu − Nhℓ + 1,  (7.21b)

then y[n] has

support: Nyℓ ≤ n ≤ Nyu , and  (7.21c)
duration: Nyu − Nyℓ + 1 = Nx + Nh − 1,  (7.21d)

where

Nyℓ = Nxℓ + Nhℓ ,  (7.21e)
Nyu = Nxu + Nhu .  (7.21f)

Graphically, these supports and associated durations are:

x[n]: support from Nxℓ to Nxu , duration Nx = Nxu − Nxℓ + 1
h[n]: support from Nhℓ to Nhu , duration Nh = Nhu − Nhℓ + 1
y[n]: support from Nyℓ = Nxℓ + Nhℓ to Nyu = Nxu + Nhu , duration Ny = Nx + Nh − 1

A. Causal ∗ Causal

If x[n] and h[n] are both causal signals, with Nxℓ = 0 and Nhℓ = 0, then Eq. (7.20) simplifies to

y[n] = ∑_{i=0}^{n} h[i] x[n − i],  0 ≤ n ≤ Nyu .  (7.22)
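The duration rule of Eq. (7.21d) is easy to confirm with a small made-up example (sketch):

```python
import numpy as np

# A linear convolution of durations Nx and Nh has duration Nx + Nh - 1.
x = np.array([2.0, -1.0, 3.0])           # Nx = 3
h = np.array([1.0, 1.0])                 # Nh = 2
y = np.convolve(x, h)
print(len(y))                            # 4 = 3 + 2 - 1
print(y)                                 # [2. 1. 2. 3.]
```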
Figure 7-5 Graphical representation of obtaining cyclic convolution of order N from linear convolution.
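The wrap-around ("aliasing") construction depicted in Fig. 7-5, and the equivalent DFT method, can be sketched in NumPy with small made-up signals:

```python
import numpy as np

# Cyclic convolution of order N from the linear convolution (wrap the
# tail onto the front), and the same result via N-point DFTs.
x = np.array([1.0, 2.0])
h = np.array([3.0, 4.0])
N = 2

lin = np.convolve(x, h)                  # [3, 10, 8]
cyc = np.zeros(N)
for i, v in enumerate(lin):
    cyc[i % N] += v                      # alias index i onto i mod N

dft = np.fft.ifft(np.fft.fft(x, N) * np.fft.fft(h, N)).real
print(cyc, np.round(dft))                # [11. 10.] both ways
```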
Again, the same result can be obtained by computing the cyclic convolution with the DFT method:

ifft(fft([3,4,5,6,7,8]).*fft([1,0,0,0,3,2]))

7-3.4 Decimating and Zero-Stuffing

Decimating 1-D signals by 2, and zero-stuffing 1-D signals by 2, are essential parts of computing discrete-time wavelet transforms. Hence, we present this quick review of these two concepts.

Decimating a signal x[n] by two means deleting every other value of x[n]:

x[n] → [↓2] → yd [n] = x[2n] = { . . . , x[0], x[2], x[4], . . . }.  (7.28)

Zero-stuffing a signal x[n] by two means inserting zeros between successive values of x[n]:

x[n] → [↑2] → yu [n] = { x[n/2] for n even; 0 for n odd } = { . . . , x[0], 0, x[1], 0, x[2], 0, x[3], . . . }.  (7.29)

Decimating by 2, followed by zero-stuffing by 2, replaces x[n] with zeros for odd times n:

x[n] → [↓2] → [↑2] → { . . . , x[0], 0, x[2], 0, x[4], 0, x[6], . . . }.

The linear convolution of the modified functions is

y′[n] = h′[n] ∗ x′[n] = ∑_{i=−∞}^{∞} h′[i] x′[n − i] = ∑_{i=−∞}^{∞} (−1)^i h[i] (−1)^(n−i) x[n − i] = ∑_{i=−∞}^{∞} (−1)^n h[i] x[n − i] = (−1)^n ∑_{i=−∞}^{∞} h[i] x[n − i] = (−1)^n y[n].  (7.33)

Combining the result given by Eq. (7.33) with the definition of convolution given in Eq. (2.71a) leads to the conclusion:

(−1)^n { h[n] ∗ x[n] } = { (−1)^n h[n] } ∗ { (−1)^n x[n] }.  (7.34)

7-3.6 Wavelet Applications

For applications involving wavelets, the following conditions apply:

• Batch processing is used almost exclusively. This is because the entire original signal is known before processing begins, so the use of non-causal filters is not a problem.

• The order N of the cyclic convolution is the same as the duration of the signal x[n], which usually is very large.
7-4 HAAR WAVELET TRANSFORM 213
• Filtering x[n] with a filter h[n] will henceforth mean computing the cyclic convolution h[n] ⊛ x[n]. The duration L of filter h[n] is much smaller than N (L ≪ N), so h[n] gets zero-padded (see Section 2-7.3) with (N − L) zeros. The result of the cyclic convolution is the same as h[n] ∗ x[n], except for the first (L − 1) values, which are aliased, and the final (L − 1) values, which are no longer present, but added to the first (L − 1) values.

• Filtering x[n] with the non-causal filter h[−n] gives the same result as the linear convolution h[−n] ∗ x[n], except that the non-causal part of the latter will alias the final (L − 1) places of the cyclic convolution.

• Zero-padding does not increase the computation, since multiplication by zero is known to give zero, so it need not be computed.

• For two filters g[n] and h[n], both of length L, g[n] ⊛ h[n] consists of g[n] ∗ h[n] followed by N − (2L − 1) zeros.

• As long as the final result has length N, linear convolutions may be replaced with cyclic convolutions and the final result will be the same.

Exercise 7-3: Compute the cyclic convolution of {1, 2} and {3, 4} for N = 2.

Answer: {1, 2} ∗ {3, 4} = {3, 10, 8}. Aliasing the output gives {8 + 3, 10} = {11, 10}.

Exercise 7-4: x[n] → [↓2] → [↑2] → y[n]. Express y[n] in terms of x[n].

Answer: y[n] = x[n] for n even, and y[n] = 0 for n odd.

7-4 Haar Wavelet Transform

The Haar transform is by far the simplest wavelet transform, and yet it illustrates many of the concepts of how the wavelet transform works.

7-4.1 Single-Stage Decomposition

Consider the finite-duration signal x[n]

x[n] = { a, b, c, d, e, f , g, h }.  (7.35)

Define the lowpass and highpass filters with impulse responses ghaar [n] and hhaar [n], respectively, as

ghaar [n] = (1/√2) { 1, 1 },  (7.36a)
hhaar [n] = (1/√2) { 1, −1 }.  (7.36b)

The frequency responses of these filters are the DTFTs given by

Ghaar (Ω) = (1/√2)(1 + e^−jΩ ) = √2 cos(Ω/2) e^−jΩ/2 ,  (7.37a)
Hhaar (Ω) = (1/√2)(1 − e^−jΩ ) = √2 sin(Ω/2) j e^−jΩ/2 ,  (7.37b)

which have lowpass and highpass frequency responses, respectively (Fig. 7-6).

Define the average (lowpass) signal xL [n] as

xL [n] = x[n] ⊛ ghaar [n] = (1/√2) { a + h, b + a, c + b, d + c, e + d, . . . }  (7.38a)
and the detail (highpass) signal xH [n] Note that downsampling by 2 followed by upsampling by 2
replaces values of x[n] with zeros for odd times n.
xH [n] = x[n]
xH[n] = x[n] ⊛ hhaar[n] = (1/√2) {a − h, b − a, c − b, d − c, e − d, . . .},   (7.38b)

where ⊛ denotes cyclic convolution. Next, define the downsampled average signal xLD[n] as

xLD[n] = xL[2n] = (1/√2) {a + h, c + b, e + d, g + f}   (7.39a)

and the downsampled detail signal xHD[n] as

xHD[n] = xH[2n] = (1/√2) {a − h, c − b, e − d, g − f}.   (7.39b)

The signal x[n] of duration 8 has been replaced by the two signals xLD[n] and xHD[n], each of duration 4, so no information about x[n] has been lost. We use cyclic convolutions instead of linear convolutions so that the cumulative length of the downsampled signals equals the length of the original signal. Using linear convolutions, each convolution with ghaar[n] or hhaar[n] would lengthen the signal unnecessarily. As we shall see, using cyclic convolutions instead of linear convolutions is sufficient to recover the original signal from its Haar wavelet transform.

2. Next, filter xLDU[n] and xHDU[n] with filters ghaar[−n] and hhaar[−n], respectively. As noted earlier in Section 7-1.3, the term "filter" in the context of the wavelet transform means "cyclic convolution." Filters ghaar[n] and hhaar[n] are called analysis filters, because they are used to compute the Haar wavelet transform. Their time reversals ghaar[−n] and hhaar[−n] are called synthesis filters, because they are used to compute the inverse Haar wavelet transform (that is, to reconstruct the signal from its Haar wavelet transform). The reason for using time reversals here is explained below. The cyclic convolutions of xLDU[n] with ghaar[−n] and of xHDU[n] with hhaar[−n] yield

xLDU[n] ⊛ ghaar[−n] = (1/2) {a + h, c + b, c + b, . . . , a + h},   (7.41a)
xHDU[n] ⊛ hhaar[−n] = (1/2) {a − h, b − c, c − b, . . . , h − a}.   (7.41b)

3. Adding the outcomes of the two cyclic convolutions gives x[n].

[Figure 7-7: single-stage Haar decomposition of x[n] into xLD[n] and xHD[n], and reconstruction of x[n] from them.]

For the signal x[n] = {4, 4, 4, 4, 4, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 4} of duration 16, these steps give

xL[n] = x[n] ⊛ (1/√2){1, 1} = (1/√2) {8, 8, 8, 8, 8, 5, 2, 2, 2, 2, 4, 6, 6, 6, 6, 7},
xH[n] = x[n] ⊛ (1/√2){1, −1} = (1/√2) {0, 0, 0, 0, 0, −3, 0, 0, 0, 0, 2, 0, 0, 0, 0, 1},
xLD[n] = xL[2n] = (1/√2) {8, 8, 8, 2, 2, 4, 6, 6},
xHD[n] = xH[2n] = (1/√2) {0, 0, 0, 0, 0, 2, 0, 0}.

The original signal x[n] can be recovered from xLD[n] and xHD[n] by

xLDU[n] = (1/√2) {8, 0, 8, 0, 8, 0, 2, 0, 2, 0, 4, 0, 6, 0, 6, 0},
xHDU[n] = (1/√2) {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0},
xLDU[n] ⊛ (1/√2){1, 1} = {4, 4, 4, 4, 4, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4},
xHDU[n] ⊛ (1/√2){−1, 1} = {0, 0, 0, 0, 0, 0, 0, 0, 0, −1, 1, 0, 0, 0, 0, 0},

and adding the outcomes of the two cyclic convolutions gives

xLDU[n] ⊛ (1/√2){1, 1} + xHDU[n] ⊛ (1/√2){−1, 1} = {4, 4, 4, 4, 4, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 4} = x[n].

We observe that the outcome of the second cyclic convolution is sparse (mostly zero-valued). The Haar transform allows x[n], which has duration 16, to be represented using the eight values of xLD[n] and the single nonzero value (and its location n = 5) of xHD[n]. This saves almost half of the storage required for x[n]. Hence, x[n] has been compressed by 43%.

Even though x[n] is not sparse, it was transformed, using the Haar transform, into a sparse representation with the same number of samples, meaning that most of the values of the Haar-transformed signal are zero-valued. This reduces the amount of memory required to store x[n], because only the times at which nonzero values occur (as well as the values themselves) need be stored. The few bits (0 or 1) required to store locations of nonzero values are considered to be negligible in number compared with the many bits required to store the actual nonzero values. Since the Haar transform is orthogonal, x[n] can be recovered perfectly from its Haar-transformed values.

7-4.3 Multistage Decomposition and Reconstruction

In the simple example used in the preceding subsection, only 1 element of the Haar-transformed signal xHD[n] is nonzero, but all 8 elements of xLD[n] are nonzero. We can reduce the number of nonzero elements of xLD[n] by applying a second Haar transform stage to it. That is, xLD[n] can be transformed into the two signals xLDLD[n] and xLDHD[n] by applying the steps outlined in Fig. 7-8.
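The single-stage analysis/synthesis round trip above can be verified numerically. The following is a minimal NumPy sketch (the function names are ours; the cyclic-convolution conventions follow the equations above, with time-reversed filters used for synthesis):

```python
import numpy as np

g = np.array([1, 1]) / np.sqrt(2)   # ghaar[n], average (lowpass) filter
h = np.array([1, -1]) / np.sqrt(2)  # hhaar[n], detail (highpass) filter

def cconv(x, f):
    # cyclic convolution: y[n] = sum_i f[i] x[(n - i) mod N]
    N = len(x)
    return np.array([sum(f[i] * x[(n - i) % N] for i in range(len(f)))
                     for n in range(N)])

def cconv_rev(x, f):
    # cyclic convolution with the time-reversed filter f[-n]:
    # y[n] = sum_i f[i] x[(n + i) mod N]
    N = len(x)
    return np.array([sum(f[i] * x[(n + i) % N] for i in range(len(f)))
                     for n in range(N)])

x = np.array([4, 4, 4, 4, 4, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 4], dtype=float)

# Analysis: filter, then downsample by 2
xLD = cconv(x, g)[::2]
xHD = cconv(x, h)[::2]

# Synthesis: upsample (zero-stuff), then filter with time-reversed filters
xLDU = np.zeros(len(x)); xLDU[::2] = xLD
xHDU = np.zeros(len(x)); xHDU[::2] = xHD
x_rec = cconv_rev(xLDU, g) + cconv_rev(xHDU, h)

print(np.allclose(x_rec, x))  # → True: perfect reconstruction
```

Running this reproduces the numbers in the example: √2·xLD = {8, 8, 8, 2, 2, 4, 6, 6} and √2·xHD = {0, 0, 0, 0, 0, 2, 0, 0}.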
216 CHAPTER 7 WAVELETS AND COMPRESSED SENSING
Figure 7-8 Two-stage Haar analysis filter bank: x[n] is filtered by ghaar[n] and hhaar[n] (each followed by ↓2) to produce xLD[n] and xHD[n]; xLD[n] is then filtered again by ghaar[n] and hhaar[n] (each followed by ↓2) to produce xLDLD[n] and xLDHD[n]. Note that only the upper half of the first stage is decomposed further.
Thus,

xLD[n] = (1/√2) {8, 8, 8, 2, 2, 4, 6, 6},
xLDL[n] = xLD[n] ⊛ (1/√2){1, 1} = {7, 8, 8, 5, 2, 3, 5, 6},
xLDH[n] = xLD[n] ⊛ (1/√2){1, −1} = {1, 0, 0, −3, 0, 1, 1, 0},   (7.45)
xLDLD[n] = xLDL[2n] = {7, 8, 2, 5},
xLDHD[n] = xLDH[2n] = {1, 0, 0, 1}.

Signal xLDHD[n] is again sparse: only two of its four values are nonzero. So x[n] can now be represented by the four values of xLDLD[n], the two nonzero values of xLDHD[n], and the one nonzero value of xHD[n]. This reduces the storage required for x[n] by 57%.

The average signal xLDLD[n] can in turn be decomposed even further. The result is an analysis filter bank that computes the Haar wavelet transform of x[n]. This analysis filter bank consists of a series of sections like the left half of Fig. 7-7, connected as in Fig. 7-8, except that each average signal is decomposed further. The signals computed at the right end of this analysis filter bank constitute the Haar wavelet transform of x[n]. Reconstruction of x[n] is shown in Fig. 7-9.

[Figure 7-9: multistage Haar synthesis filter bank; each stage upsamples by 2 and filters with ghaar[−n] (average branch) and hhaar[−n] (detail branch).]

Decomposition

A signal x[n] of duration N = 2^K (with K an integer) can be represented by the Haar wavelet transform through a K-stage decomposition process involving cyclic convolutions with filters ghaar[n] and hhaar[n], as defined by Eq. (7.36). The signal x[n] can be zero-padded so that its length is a power of 2, if that is not already the case, just as is done for the FFT. The sequential process is:

Stage 1:
x[n] → hhaar[n] → ↓2 → x̃1[n] = xHD[n],
x[n] → ghaar[n] → ↓2 → X̃1[n] = xLD[n].

Stage 2:
X̃1[n] → hhaar[n] → ↓2 → x̃2[n] = xLDHD[n],
X̃1[n] → ghaar[n] → ↓2 → X̃2[n] = xLDLD[n].
⋮
Stage K:
X̃K−1[n] → hhaar[n] → ↓2 → x̃K[n],
X̃K−1[n] → ghaar[n] → ↓2 → X̃K[n].

The Haar wavelet transform of x[n] is then the set of signals

{x̃1[n], x̃2[n], x̃3[n], . . . , x̃K[n], X̃K[n]},   (7.46)

whose durations are N/2, N/4, N/8, . . . , N/2^K, and N/2^K, respectively. To represent x[n], we need to retain the "high-frequency" outputs of all K stages (i.e., {x̃1[n], x̃2[n], . . . , x̃K[n]}), but only the final output of the "low-frequency" sequence, namely X̃K[n]. The total duration of all of the (K + 1) Haar transform signals is

N/2 + N/4 + N/8 + · · · + N/2^K + N/2^K = N,   (7.47)

which equals the duration N of x[n]. We use cyclic convolutions instead of linear convolutions so that the total lengths of the downsampled signals equal the length of the original signal. Were we to use linear convolutions, each convolution with ghaar[n] or hhaar[n] would lengthen the signal unnecessarily.

• The x̃k[n] for k = 1, 2, . . . , K are called detail signals.

Reconstruction

Stage 1:
X̃K[n] → ↑2 → ghaar[−n] → AK−1[n],
x̃K[n] → ↑2 → hhaar[−n] → BK−1[n],
X̃K−1[n] = AK−1[n] + BK−1[n].

Stage 2:
X̃K−1[n] → ↑2 → ghaar[−n] → AK−2[n],
x̃K−1[n] → ↑2 → hhaar[−n] → BK−2[n],
X̃K−2[n] = AK−2[n] + BK−2[n].
⋮
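The K-stage decomposition and reconstruction just described can be sketched by iterating the single-stage operations (a minimal NumPy sketch; function names are ours, and the cyclic-convolution conventions follow the Haar equations of the previous subsection):

```python
import numpy as np

g = np.array([1, 1]) / np.sqrt(2)   # ghaar[n]
h = np.array([1, -1]) / np.sqrt(2)  # hhaar[n]

def cconv(x, f, rev=False):
    # cyclic convolution with f[n] (rev=False) or its time reversal f[-n]
    N = len(x)
    idx = (lambda n, i: (n + i) % N) if rev else (lambda n, i: (n - i) % N)
    return np.array([sum(f[i] * x[idx(n, i)] for i in range(len(f)))
                     for n in range(N)])

def haar_analysis(x, K):
    details, avg = [], x
    for _ in range(K):
        details.append(cconv(avg, h)[::2])   # detail signal x~k
        avg = cconv(avg, g)[::2]             # average signal X~k
    return details, avg

def haar_synthesis(details, avg):
    for d in reversed(details):              # undo stages in reverse order
        up_a = np.zeros(2 * len(avg)); up_a[::2] = avg
        up_d = np.zeros(2 * len(d));   up_d[::2] = d
        avg = cconv(up_a, g, rev=True) + cconv(up_d, h, rev=True)
    return avg

x = np.array([4, 4, 4, 4, 4, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 4], dtype=float)
details, avg = haar_analysis(x, K=4)   # detail durations 8, 4, 2, 1
x_rec = haar_synthesis(details, avg)
print(np.allclose(x_rec, x))  # → True
```

Note that the detail durations 8, 4, 2, 1 plus the final average duration 1 sum to 16, the duration of x[n], as stated in Eq. (7.47).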
… bank?

(2) How should h[n] be chosen so that the outputs of all of the octave-based decomposition filter banks are sparse, except for the output of the lowest frequency band?

and

(−1)^n g[n] ∗ g[−n] + (−1)^n h[n] ∗ h[−n] = 0.   (7.51b)
Figure 7-11 Single-stage decomposition of x[n] to xLD[n] and xHD[n], and reconstruction of x[n] from xLD[n] and xHD[n].
h[n] ∗ h[−n] = δ[n], for n even.   (7.62)

Writing out Eq. (7.62) for n even gives

n = 0:      Σ_{i=0}^{L} h²[i] = 1,   (7.63a)
n = 2:      Σ_{i=2}^{L} h[i] h[i − 2] = 0,   (7.63b)
n = 4:      Σ_{i=4}^{L} h[i] h[i − 4] = 0,   (7.63c)
⋮
n = L − 1:  Σ_{i=L−1}^{L} h[i] h[i − (L − 1)] = 0.   (7.63d)

Recall that since L is odd, L − 1 is even.

◮ The Smith-Barnwell condition is equivalent to stating that the autocorrelation rh[n] = h[n] ∗ h[−n] of h[n] is zero for even, nonzero n and 1 for n = 0. This means that h[n] is orthonormal to even-valued translations of itself. ◭

Computation of X̃1[n] and x̃1[n] in Eq. (7.65) is implemented using the wavelet analysis filter bank shown in Fig. 7-12(a). Computation of x[n] from X̃1[n] and x̃1[n] in Eq. (7.64) is implemented using the wavelet synthesis filter bank shown in Fig. 7-12(b). The decimation in Fig. 7-12(a) manifests itself in Eq. (7.65) as the 2n in g[2n − i] and h[2n − i]. The zero-stuffing and time reversals in Fig. 7-12(b) manifest themselves in Eq. (7.64) as the 2i in g[2i − n] and h[2i − n].

The average signal X̃1[n] can in turn be decomposed similarly, as in the wavelet analysis filter bank shown in Fig. 7-12(a). So the first term in Eq. (7.64) can be decomposed further, resulting in the K-stage decomposition

x[n] = Σ_{i=−∞}^{∞} X̃K[i] g^(K)[2^K i − n] + Σ_{k=1}^{K} Σ_{i=−∞}^{∞} x̃k[i] h^(k)[2^k i − n],   (7.66)

where average signal X̃K[n] and detail signals {x̃k[n], k = 1, . . . , K} are computed using

X̃k[n] = Σ_{i=−∞}^{∞} g^(k)[2^k n − i] x[i],
x̃k[n] = Σ_{i=−∞}^{∞} h^(k)[2^k n − i] x[i],   (7.67)
Figure 7-12 (a) Wavelet analysis filter bank: x[n] is filtered by h[n] and g[n] (each followed by ↓2) to produce x̃1[n] and X̃1[n], and X̃1[n] is decomposed again into x̃2[n] and X̃2[n]. (b) Wavelet synthesis filter bank: the reverse structure, with ↑2 followed by the time-reversed filters g[−n] and h[−n].
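The Smith-Barnwell condition of Eqs. (7.62)–(7.63) is easy to check numerically. The sketch below (function names are ours) tests whether a filter's autocorrelation is 1 at lag 0 and 0 at all even nonzero lags, using the db2 scaling coefficients tabulated (to four decimal places) later in this chapter:

```python
import numpy as np

def autocorr(h, lag):
    """r_h[lag] = sum over n of h[n] h[n - lag] (h is zero outside its support)."""
    h = np.asarray(h, dtype=float)
    return float(sum(h[n] * h[n - lag] for n in range(lag, len(h))))

def satisfies_smith_barnwell(h, tol=1e-3):
    """True if r_h[0] = 1 and r_h[lag] = 0 for every even nonzero lag."""
    if abs(autocorr(h, 0) - 1.0) > tol:
        return False
    return all(abs(autocorr(h, lag)) <= tol for lag in range(2, len(h), 2))

# db2 scaling coefficients, as listed in Table 7-3 (rounded values)
print(satisfies_smith_barnwell([0.4830, 0.8365, 0.2241, -0.1294]))  # → True

# a filter that fails: unit energy, but nonzero autocorrelation at lag 2
print(satisfies_smith_barnwell([0.5, 0.5, 0.5, 0.5]))  # → False
```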
and the basis functions g^(k)[n] and h^(k)[n] can be computed recursively offline. We will not bother with explicit formulae for these, since the decomposition and reconstruction are performed much more easily using filter banks. The wavelet analysis and synthesis filter banks shown respectively in Fig. 7-12(a) and (b) are identical in form to the octave-band filter banks shown in Figs. 7-2 and 7-4, except for the following nomenclature changes:

x̃1[n] = xHD[n],
x̃2[n] = xLDHD[n],
X̃1[n] = xLD[n],   (7.68)
X̃2[n] = xLDLD[n].

If x[n] has duration N = 2^K, the components of the wavelet transform of x[n],

{x̃1[n], x̃2[n], x̃3[n], . . . , x̃K[n], X̃K[n]},   (7.69)

have durations N/2, N/4, N/8, . . . , N/2^K, and N/2^K, respectively.

7-5.5 Amount of Computation

The total duration of all of the wavelet transform signals together is

N/2 + N/4 + N/8 + · · · + N/2^K + N/2^K = N,   (7.70)

which equals the duration N of x[n]. Again, we use cyclic convolutions instead of linear convolutions so that the total lengths of the decimated signals equal the length of the original signal.

The total amount of computation required to compute the wavelet transform of a signal x[n] of duration N can be computed as follows. Let the durations of g[n] and h[n] be the even integer
(L + 1), where L is the odd integer in Eq. (7.52). Convolving x[n] with both g[n] and h[n] requires 2(2(L + 1))N = 4(L + 1)N multiplications-and-additions (MADs). But since the results will be decimated by two, only half of the convolution outputs must be computed, halving this to 2(L + 1)N.

At each successive decomposition, g[n] and h[n] are convolved with the average signal from the previous stage, and the result is decimated by two. So g[n] and h[n] are each convolved with the signals

{x[n], X̃1[n], X̃2[n], . . . , X̃K[n]},

of durations N, N/2, N/4, . . . , N/2^K, respectively. The total number of MADs required is thus

2(L + 1) (N + N/2 + N/4 + · · · + N/2^K) < 4(L + 1)N.   (7.71)

The additional computation for computing more decompositions (i.e., increasing K) is thus minimal.

Since L is small (usually 1 ≤ L ≤ 5), this is comparable to the amount of computation (N/2) log2(N) required to compute the DFT, using the FFT, of a signal x[n] of duration N. But the DFT requires complex-valued multiplications and additions, while the wavelet transform uses only real-valued multiplications and additions.

Concept Question 7-4: What is the Smith-Barnwell condition?

Exercise 7-7: If h[n] = (1/(5√2)) {6, 2, h[2], 3}, find h[2] so that h[n] satisfies the Smith-Barnwell condition.

Answer: According to Eq. (7.63), the Smith-Barnwell condition requires the autocorrelation of h[n] to be 1 for n = 0 and to be 0 for even n ≠ 0. For h[n] = {a, b, c, d}, these conditions give a² + b² + c² + d² = 1 and ac + bd = 0, which yield h[2] = −1.

7-6 Sparsification Using Wavelets of Piecewise-Polynomial Signals

The wavelet filters g[n] and h[n] are respectively called scaling and wavelet functions. In this section, we show how to design these functions such that they satisfy the Smith-Barnwell condition for perfect reconstruction given by Eq. (7.56), form a QMF pair, and have the property that the detail signals x̃k[n] defined in Eq. (7.67) and computed using the analysis filter bank shown in Fig. 7-12(a) are sparse (mostly zero-valued). In particular, we design g[n] and h[n] so that the wavelet transform detail signals are sparse when the input signal x[n] is piecewise polynomial, which we define next. The g[n] and h[n] filters are then used in the Daubechies wavelet transform.

7-6.1 Definition of Piecewise-Polynomial Signals

A signal x[n] is defined to be piecewise-Mth-degree polynomial if it has the form

x[n] = { Σ_{k=0}^{M} a0,k n^k  for −∞ < n ≤ N0,
         Σ_{k=0}^{M} a1,k n^k  for N0 < n ≤ N1,
         Σ_{k=0}^{M} a2,k n^k  for N1 < n ≤ N2,
         ⋮                                          (7.72)

This x[n] can be segmented into intervals, and in each interval x[n] is a polynomial in time n of degree M. The times Ni at which the coefficients {ai,k} change values are sparse, meaning that they are scattered over time n. In continuous time, such a signal would be a spline (see Section 4-7), except that in the case of a spline, the derivatives of the signal must match at the knots (the times where coefficients {ai,k} change values). The idea here is that the coefficients {ai,k} can change completely at the times Ni; there is no "smoothness" requirement. Indeed, these times Ni constitute the edges of x[n].

A. Piecewise-Constant Signals

First, let x[n] be piecewise constant (M = 0 in Eq. (7.72)), so that x[n] is of the form

x[n] = { a0 for −∞ < n ≤ N0,
         a1 for N0 < n ≤ N1,
         a2 for N1 < n ≤ N2,
         ⋮                     (7.73)

The value of x[n] changes only at a few scattered times. The amount by which x[n] changes at n = Ni is the jump ai+1 − ai:

x[n + 1] − x[n] = { 0 for n ≠ Ni,
                    ai+1 − ai for n = Ni.   (7.74)
This can be restated as

x[n + 1] − x[n] = Σ_i (ai+1 − ai) δ[n − Ni].   (7.75)

Taking differences sparsifies a piecewise-constant signal, as illustrated in Fig. 7-13.

Now let the wavelet function h[n] have the form, for some signal q[n] that is yet to be determined,

h[n] = q[n] ∗ {1, −1}.   (7.76)

Since from Fig. 2-3 the overall impulse response of two systems connected in series is the convolution of their impulse responses, h[n] can be implemented by two systems connected in series:

x[n] → {1, −1} → q[n] → x[n] ∗ h[n].

Convolution with h[n] sparsifies a piecewise-constant input x[n], since (using the time-shift property of convolution) … In practice, q[n] = q[0] δ[n] has duration = 1, so x[n] ∗ h[n] is still mostly zero-valued.

B. Piecewise-Linear Signals

Next, let x[n] be piecewise linear (M = 1 in Eq. (7.72)), so that x[n] is of the form

x[n] = { a0,1 n + a0,0 for −∞ < n ≤ N0,
         a1,1 n + a1,0 for N0 < n ≤ N1,
         a2,1 n + a2,0 for N1 < n ≤ N2,
         ⋮                                 (7.78)

Proceeding as in the case of piecewise-constant x[n], taking differences, and then taking differences of the differences, will sparsify a piecewise-linear signal. The process is illustrated in Fig. 7-14. The bottom signal in Fig. 7-14 is in turn convolved with q[n], …

Figure 7-13 A piecewise-constant signal is compressed by taking differences: (a) a piecewise-constant signal, (b) differences of the signal, w1[n] = x[n + 1] − x[n].

Figure 7-14 A piecewise-linear signal is compressed by taking successive differences: (a) a piecewise-linear signal, (b) differences of the top signal, w1[n] = x[n + 1] − x[n], (c) differences of the middle signal, w2[n] = w1[n + 1] − w1[n].
A. Piecewise-Constant Signals

To compress piecewise-constant signals, we set M = 0; a constant is a polynomial of degree zero. We wish to design a db1 = D2 order Daubechies wavelet, which has only 1 difference because M + 1 = 0 + 1 = 1, and a duration = 2(1) = 2, so L = 1. The expression in Eq. (7.84) becomes

h[n] = (q[0] δ[n]) ∗ {1, −1} = {q[0], −q[0]},   (7.85)

where q[0] δ[n] has duration 1, {1, −1} implements 1 difference, and h[n] has duration 2. Inserting this result into the Smith-Barnwell condition of Eq. (7.63a) gives

q[0]² + q[0]² = 1.   (7.86)

The requirement given in Eq. (7.63b) is satisfied automatically, since h[n] has duration = 2. Hence, q[0] = 1/√2 and h[n] = (1/√2) {1, −1}, which is the Haar wavelet filter hhaar[n].

1 = Σ_{n=0}^{3} h[n]².   (7.90b)

The first equation states that h[n] must be orthogonal to its even-valued translations, and the second equation simply states that the energy of h[n] must be one, which is a normalization requirement that should be imposed after q[0] and q[1] have been determined. Substituting the elements of h[n] given in Eq. (7.89) into Eq. (7.90a) leads to

0 = (q[0] − 2q[1]) q[0] + (q[1] − 2q[0]) q[1] = q[0]² − 4q[0] q[1] + q[1]².   (7.91)

Since the scale factor will be set by Σ h[n]² = 1, we can, without loss of generality, set q[0] = 1.

Table 7-3 Daubechies scaling functions g[n]:

        K=1      K=2      K=3      K=4
g[n]    db1      db2      db3      db4
g[0]    .7071    .4830    .3327    .2304
g[1]    .7071    .8365    .8069    .7148
g[2]    0        .2241    .4599    .6309
g[3]    0        −.1294   −.1350   −.0280
g[4]    0        0        −.0854   −.1870
g[5]    0        0        .0352    .0308
g[6]    0        0        0        .0329
g[7]    0        0        0        −.0106

[Figure 7-15(c), (d): detail signal x̃1[n] and average signal X̃2[n] of the example signal.]
… x̃2[n], using Eqs. (7.67), (7.94), and (7.95). The results are displayed in parts (b) to (e) of Fig. 7-15. We note that:

• Average signals X̃1[n] and X̃2[n] are low-resolution versions of x[n].

• Detail signals x̃1[n] and x̃2[n] are sparse (mostly zero), and their nonzero values are small in magnitude.

• The db2 wavelet transform of the given x[n] consists of x̃1[n], x̃2[n], and X̃2[n].

These patterns explain the terms "average" and "detail."

Concept Question 7-5: How is it possible that the wavelet transform requires less computation than the FFT?

Exercise 7-8: What are the finest-detail signals of the db3 wavelet transform of x[n] = 3n²?

Answer: Zero, because the db3 wavelet basis function h[n] eliminates quadratic signals by construction.

Exercise 7-9: Show by direct computation that the db2 scaling function listed in Table 7-3 satisfies the Smith-Barnwell condition.

Answer: The Smith-Barnwell condition requires the autocorrelation of g[n] to be 1 for n = 0 and to be 0 for even n ≠ 0. For g[n] = {a, b, c, d}, these conditions give a² + b² + c² + d² = 1 and ac + bd = 0. The db2 g[n] listed in Table 7-3 is g[n] = {.4830, .8365, .2241, −.1294}. It is easily verified that the sum of the squares of these numbers is 1, and that (.4830) × (.2241) + (.8365) × (−.1294) = 0.

The 2-D wavelet transform extends these concepts from 1-D signals to 2-D images. Downsampling in 2-D involves downsampling in both directions. For example:

x[n, m] → ↓(3,2) → x[3n, 2m].

The downsampling factor in this case is 3 along the horizontal direction and 2 along the vertical direction. To illustrate the process, we apply it to a 5 × 7 image:

[ 1  2  3  4  5  6  7 ]
[ 8  9 10 11 12 13 14 ]    ↓(3,2)    [ 1  4  7 ]
[15 16 17 18 19 20 21 ]   ------→    [15 18 21 ]
[22 23 24 25 26 27 28 ]              [29 32 35 ]
[29 30 31 32 33 34 35 ]

Upsampling in 2-D involves upsampling in both directions. Upsampling a signal x[n, m] by a factor 3 along the horizontal and by a factor 2 along the vertical, for example, is denoted symbolically as

x[n, m] → ↑(3,2) → { x[n/3, m/2]  for n an integer multiple of 3 and m an integer multiple of 2,
                     0            otherwise.

Applying this upsampling operation to a simple 2 × 2 image yields

[ 1  2 ]    ↑(3,2)    [ 1  0  0  2  0  0 ]
[ 3  4 ]   ------→    [ 0  0  0  0  0  0 ]
                      [ 3  0  0  4  0  0 ]
                      [ 0  0  0  0  0  0 ]
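Both 2-D operations map directly onto NumPy array slicing; the sketch below reproduces the two examples above (the row index is taken as vertical and the column index as horizontal):

```python
import numpy as np

# (3,2) downsampling of the 5 x 7 image with entries 1..35:
# keep every 2nd row (vertical factor 2) and every 3rd column (horizontal factor 3)
X = np.arange(1, 36).reshape(5, 7)
Xd = X[::2, ::3]
print(Xd)  # rows [1 4 7], [15 18 21], [29 32 35]

# (3,2) upsampling (zero-stuffing) of a 2 x 2 image into a 4 x 6 image
A = np.array([[1, 2], [3, 4]])
Au = np.zeros((2 * A.shape[0], 3 * A.shape[1]), dtype=int)
Au[::2, ::3] = A
print(Au)
```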
(1) Stage-1 decomposition:

x[n, m] → g[n] → g[m] → ↓(2,2) → x̃LL^(1)[n, m],   (7.99a)
x[n, m] → g[n] → h[m] → ↓(2,2) → x̃LH^(1)[n, m],   (7.99b)
x[n, m] → h[n] → g[m] → ↓(2,2) → x̃HL^(1)[n, m],   (7.99c)
x[n, m] → h[n] → h[m] → ↓(2,2) → x̃HH^(1)[n, m].   (7.99d)

(2) Stage-2 to stage-K decomposition:

x̃LL^(1)[n, m] → g[n] → g[m] → ↓(2,2) → x̃LL^(2)[n, m],

… up to the largest (in size) three detail images:

{ x̃LH^(1)[n, m], x̃HL^(1)[n, m], x̃HH^(1)[n, m] }.

The average images x̃LL^(k)[n, m] are analogous to the average signals X̃k[m] of a signal, except that they are low-resolution versions of a 2-D image x[n, m] instead of a 1-D signal x[m]. In 2-D, there are now three detail images, while in 1-D there is only one detail signal.

In 1-D, the detail signals are zero except near edges, representing abrupt changes in the signal or in its slope. In 2-D, the three detail images play the following roles:

(a) x̃LH^(k)[n, m] picks up vertical edges,
(b) x̃HL^(k)[n, m] picks up horizontal edges,
(c) x̃HH^(k)[n, m] picks up diagonal edges.
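One separable decomposition stage as in Eqs. (7.99a)–(7.99d) can be sketched by filtering along each axis and then downsampling by (2, 2). The Haar filters are used below for concreteness (a minimal sketch; function names are ours, and cyclic convolution is assumed as in the 1-D case):

```python
import numpy as np

g = np.array([1, 1]) / np.sqrt(2)   # scaling (lowpass) filter
h = np.array([1, -1]) / np.sqrt(2)  # wavelet (highpass) filter

def cconv1d(x, f):
    # 1-D cyclic convolution with a short filter f
    N = len(x)
    return np.array([sum(f[i] * x[(n - i) % N] for i in range(len(f)))
                     for n in range(N)])

def stage(X, fn, fm):
    # filter along axis 0 with fn, along axis 1 with fm, then downsample (2,2)
    Y = np.apply_along_axis(cconv1d, 0, X, fn)
    Y = np.apply_along_axis(cconv1d, 1, Y, fm)
    return Y[::2, ::2]

X = np.arange(64, dtype=float).reshape(8, 8)
LL, LH = stage(X, g, g), stage(X, g, h)
HL, HH = stage(X, h, g), stage(X, h, h)
print(LL.shape)  # → (4, 4): each subimage is quarter-size
```

Because the Haar transform is orthogonal, the four quarter-size subimages together conserve the energy of X, which is a convenient correctness check.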
7-7 2-D WAVELET TRANSFORM

(1) Stage-4 images: 16 × 16

The coarsest average image, x̃LL^(4)[n, m], is used as a thumbnail image, and placed at the upper left-hand corner in Fig. 7-16(c). The three stage-4 detail images are arranged clockwise around image x̃LL^(4)[n, m].

(3) Stage-2 images: 64 × 64

Figure 7-16 (a) 256 × 256 test image, (b) arrangement of images generated by a 3-stage Haar wavelet transform, and (c) the images represented in (b). A logarithmic scale is used to display the values.

Stage 2 images: The 3 (64 × 64) detail images contain a total of 3(64)² = 12288 pixels. The fourth 64 × 64 image is the average image, which is decomposed into the stage 3 images.

Stage 1 images: The 3 (128 × 128) detail images contain a total of 3(128)² = 49152 pixels. The fourth 128 × 128 image is the average image, which is decomposed into the stage 2 images.

The total number of pixels in the wavelet transform of the Shepp-Logan phantom is then

49152 + 12288 + 3072 + 768 + 256 = 65536.

This equals the number of pixels in the Shepp-Logan phantom, which is 256² = 65536.

[Figure 7-18(a): 200 × 200 clown image.]
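The pixel bookkeeping above generalizes to any number of stages; a short sketch of the arithmetic for a 4-stage transform of a 256 × 256 image:

```python
# Each stage keeps 3 detail images and passes its average image on to the
# next stage; only the final (16 x 16) average image is retained.
N, stages = 256, 4
detail_pixels = sum(3 * (N // 2**k) ** 2 for k in range(1, stages + 1))
average_pixels = (N // 2**stages) ** 2
print(detail_pixels + average_pixels)  # → 65536 = 256**2
```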
7-8 Denoising by Thresholding and Shrinking

A noisy image is given by

y[n, m] = x[n, m] + v[n, m],   (7.104)

where x[n, m] is the desired image and v[n, m] is the noise that had been added to it. The goal of denoising is to recover the original image x[n, m], or a close approximation thereof, from the noisy image y[n, m]. We now show that the obvious approach of simply thresholding the wavelet transform of the image does not work well. Then we show that the combination of thresholding and shrinking the wavelet transform of the image does work well.

Figure 7-18 (a) Original clown image, and (b) image reconstructed from thresholded db3 Daubechies wavelet transform images, requiring only 6% as much storage capacity as the original image.

7-8.1 Denoising by Thresholding Alone

One approach to denoising is to threshold the wavelet transform of y[n, m]. For small wavelet transform values, the signal-to-noise ratio is low, so little of value is lost by thresholding these small values to zero. For large wavelet transform values, the signal-to-noise ratio is large, so these large values should be kept. This approach works poorly on wavelet transforms of noisy images, as the following example shows.

Zero-mean 2-D white Gaussian noise with standard deviation σ = 0.1 was added to the clown image of Fig. 7-19(a). The noisy
Figure 7-19 (a) Noise-free clown image, (b) noisy image with SNR = 11.5, and (c) image reconstructed from thresholded wavelet transform. Thresholding without shrinkage does not reduce noise.
image, shown in Fig. 7-19(b), has a signal-to-noise ratio (SNR) of 11.5, which means that the noise level is, on average, only about 8.7% of that of the signal.

The db3 Daubechies wavelet transform was computed for the noisy image, then thresholded with λ = 0.11, which appeared to provide the best results. Finally, the image was reconstructed from the thresholded wavelet transform, and it now appears in Fig. 7-19(c). Upon comparing the images in parts (b) and (c) of the figure, we conclude that the thresholding operation failed to reduce the noise by any appreciable amount.

7-8.2 Denoising by Thresholding and Shrinkage

We now show that a combination of thresholding small wavelet-transform values to zero and shrinking other wavelet-transform values by a small number λ performs much better in denoising images. First we show that shrinkage comes from minimizing a cost functional, just as Wiener filtering given by Eq. (6.31)
x̂[n] = y[n] − λ, for y[n] ≥ 0 and y[n] ≥ λ.   (7.108b)

Case 3: y[n] ≤ 0 and |y[n]| ≤ λ

This case, which is identical to case 1 except that now y[n] is negative, leads to the same result, namely x̂[n] = 0.

Case 4: y[n] ≤ 0 and |y[n]| ≥ λ

Repetition of the analysis of case 2, but with y[n] negative, leads to x̂[n] = y[n] + λ.

Values of y[n] smaller in absolute value than the threshold λ are thresholded (set) to zero. Values of y[n] larger in absolute value than the threshold λ are shrunk by λ, making their absolute values smaller. So x̂[n] is computed by thresholding and shrinking y[n]. This is usually called (ungrammatically) "thresholding and shrinkage," or "soft thresholding."

Figure 7-21 Denoising the clown image: (a) denoising by thresholding alone, (b) denoising by thresholding and shrinkage in combination.
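The four cases combine into the single expression x̂ = sign(y) · max(|y| − λ, 0), which can be sketched and contrasted with thresholding alone as follows (function names and the sample values are ours):

```python
import numpy as np

def soft_threshold(y, lam):
    """Thresholding and shrinkage: x_hat = sign(y) * max(|y| - lam, 0)."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def hard_threshold(y, lam):
    """Thresholding alone: values with |y| <= lam are zeroed, the rest kept."""
    return np.where(np.abs(y) > lam, y, 0.0)

y = np.array([-0.30, -0.05, 0.02, 0.15, 0.50])
x_soft = soft_threshold(y, 0.11)   # [-0.19, 0, 0, 0.04, 0.39]
x_hard = hard_threshold(y, 0.11)   # [-0.30, 0, 0, 0.15, 0.50]
```

Note that soft thresholding moves every surviving value toward zero by λ, which is what suppresses the noise contribution that hard thresholding leaves untouched.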
The next example shows that denoising images works much better with thresholding and shrinkage than with thresholding alone. When we applied thresholding alone to the noisy image of Fig. 7-19(b), we obtained the image shown in Fig. 7-19(c), which we repeat here in Fig. 7-21(a). Application of thresholding and shrinkage in combination, with λ = 0.11, leads to the image in Fig. 7-21(b), which provides superior rendition of the clown image by filtering much more of the noise, while preserving the real features of the image.

Concept Question 7-7: Why is the combination of shrinkage and thresholding needed for noise reduction?

Concept Question 7-8: Why does wavelet-based denoising work so much better than lowpass filtering?
7-9 Compressed Sensing

The solution of an inverse problem in signal and image processing is the reconstruction of an unknown signal or image from measurements (known linear combinations) of the values of the signal or image. Such inverse problems arise in medical imaging, radar imaging, optics, and many other fields. For example, in tomography and magnetic resonance imaging (MRI), the inverse problem is to reconstruct an image from measurements of some (but not all) of its 2-D Fourier transform values.

If the number of measurements equals or exceeds the size (duration in 1-D, number of pixels in 2-D) of the unknown signal or image, solution of the inverse problem in the absence of noise becomes a solution of a linear system of equations. In practice, there is always noise in the measurements, so some sort of regularization is required. In Section 6-4.3, the deconvolution problem required Tikhonov regularization to produce a recognizable solution when noise was added to the data. Furthermore, often the number of observations is less than the size of the unknown signal or image. For example, in tomography, the 2-D Fourier transform values of the image at very high wavenumbers are usually unknown. In this case, the inverse problem is underdetermined; consequently, even in the absence of noise there is an infinite number of possible solutions. Hence, regularization is needed, not only to deal with the underdetermined formulation, but also to manage the presence of noise in the measurements.

We have seen that many real-world signals and images can be compressed, using the wavelet transform, into a sparse representation in which most of the values are zero. This suggests that the number of measurements needed to reconstruct the signal or image can be less than the size of the signal or image, because in the wavelet-transform domain, most of the values to be reconstructed are known to be zero. Had the locations of the nonzero values been known, the problem would have been reduced to a solution of a linear system of equations smaller in size than that of the original linear system of equations. In practice, however, neither the locations of the nonzero values nor their values are known.

Compressed sensing refers to a set of signal processing techniques used for reconstructing wavelet-compressible signals and images from measurements that are much fewer in number than the size of the signal or image, but much larger than the number of nonzero values in the wavelet transform of the signal or image. The general formulation of the problem is introduced in the next subsection.

There are many advantages to reducing the number of measurements needed to reconstruct the signal or image. In tomography, for example, the acquisition of a fewer number of measurements reduces patient exposure to radiation. In MRI, this reduces acquisition time inside the MRI machine, and in smartphone cameras, it reduces the exposure time and energy required to acquire an image.

This section presents the basic concepts behind compressed sensing and applies these concepts to a few signal and image inverse problems. Compressed sensing is an active area of research and development, and will experience significant growth in applications in the future.

7-9.1 Problem Formulation

To cast the compressed sensing problem into an appropriate form, we define the following quantities:

(a) {x[n], n = 0, . . . , N − 1} is an unknown signal of length N.

(b) The corresponding (unknown) wavelet transform of x[n] is

{x̃1[n], x̃2[n], . . . , x̃L[n], X̃L[n]},

and the wavelet transform of x[n] is sparse: only K values of all of the {x̃k[n]} are nonzero, with K ≪ N.

(c) {y[n], n = 0, 1, . . . , M − 1} are M known measurements:

y[0] = a0,0 x[0] + a0,1 x[1] + · · · + a0,N−1 x[N − 1],
y[1] = a1,0 x[0] + a1,1 x[1] + · · · + a1,N−1 x[N − 1],
⋮
y[M − 1] = aM−1,0 x[0] + aM−1,1 x[1] + · · · + aM−1,N−1 x[N − 1],

where {an,i, n = 0, 1, . . . , M − 1 and i = 0, 1, . . . , N − 1} are known.

(d) K is unknown, but we know that K ≪ M < N.

The goal of compressed sensing is to compute signal {x[n], n = 0, 1, . . . , N − 1} from the M known measurements {y[n], n = 0, 1, . . . , M − 1}.

The compressed sensing problem can be divided into two components, a direct problem and an inverse problem. In the direct problem, the independent variable (input) is x[n] and the dependent variable (output) is the measurement y[n]. The roles are reversed in the inverse problem: the measurements become the independent variables (input) and the unknown signal x[n] becomes the output. The relationships between x[n] and y[n] involve vectors and matrices:
A. Signal vector For the orthogonal wavelet transforms, such as the Haar
and Daubechies transforms covered in Sections 7-4 and 7-6,
x = [x[0], x[1], . . . , x[N − 1]]T, (7.110) W−1 = WT , so the inverse wavelet transform can be computed
where T denotes the transpose operator, which converts a row as easily as the wavelet transform. In practice, both are com-
vector into a column vector. puted using analysis and synthesis filter banks, as discussed in
earlier sections.
y = [y[0], y[1], . . ., y[M − 1]]T . (7.111) The crux of the compressed sensing problem reduces to
finding z, given y. An additional factor to keep in mind is that
only K values of the elements of z are nonzero, with K ≪ M.
B. Wavelet transform vector Algorithms for computing z from y rely on iterative approaches,
as discussed in future sections.
Because z is of length N, y of length M, and M < N
z1 xe1 [n]
z2 xe [n] (fewer measurements than unknowns), Eq. (7.115) represents an
2 underdetermined system of linear equations, whose solution is
.. .
z=
. = .. ,
(7.112) commonly called an ill-posed problem.
.. xe [n]
. L
zN XeL [n]
z is of length N, x̃1[n] through x̃L[n] are the detail signals of the wavelet transform, and X̃L[n] is the coarse signal.

C. Wavelet transform matrix

z = W x,    (7.113)

where W is a known N × N wavelet transform matrix that implements the wavelet transform of x[n] to obtain z.

D. Direct-problem formulation

y = A x,    (7.114)

where A is an M × N matrix. Usually, A is a known matrix based on a physical model or direct measurement of y for a known x. Combining Eqs. (7.113) and (7.114) gives

y = A W^{-1} z = A_w z,    (7.115a)

where

A_w = A W^{-1}.    (7.115b)

Since A and W are both known matrices, A_w also is known.

E. Inverse-problem formulation

If, somehow, z can be determined from the measurement vector y, then x can be computed by inverting Eq. (7.113):

x = W^{-1} z.    (7.116)

7-9.2 Inducing Sparsity into Solutions

In seismic signal processing, explosions are set off on the Earth's surface, and echoes of the seismic waves created by the explosion are measured by seismometers. In the 1960s, sedimentary media (such as the bottom of the Gulf of Mexico) were modeled as a stack of layers, so the seismometers would record occasional sharp pulses reflected off of the interfaces between the layers. The amplitudes and times of the pulses would allow the layered medium to be reconstructed. However, the occasional pulses had to be deconvolved from the source pulse created by the explosions. The deconvolution problem was modeled as an underdetermined linear system of equations.

A common approach to finding a sparse solution to a system of equations is to choose the solution that minimizes the sum of the absolute values of the solution. This is known as the minimum ℓ1-norm solution. The ℓ1 norm is denoted by the symbol ||z||_1 and defined as

||z||_1 = ∑_{i=1}^{N} |z_i|.    (7.117a)

The goal is to find the solution to the system of equations that minimizes ||z||_1.

A second approach, called the squared ℓ2 norm, which does not provide a sparse solution, finds the solution that minimizes the sum of squares of the solution. The squared ℓ2 norm is denoted by the symbol ||z||_2^2 and is defined as

||z||_2^2 = ∑_{i=1}^{N} |z_i|^2.    (7.117b)
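The contrast between the two norms can be seen on a toy underdetermined system; the sketch below (a hypothetical 1 × 2 example, not from the text) compares the minimum-ℓ2 (pseudo-inverse) solution, which spreads energy over all unknowns, with a sparse feasible solution of smaller ℓ1 norm.

```python
import numpy as np

# Underdetermined system with M = 1 equation and N = 2 unknowns:
#   z1 + 2*z2 = 2   (A and y are illustrative, not from the text)
A = np.array([[1.0, 2.0]])
y = np.array([2.0])

# Minimum-l2 (pseudo-inverse) solution: z = A^T (A A^T)^-1 y
z_l2 = A.T @ np.linalg.solve(A @ A.T, y)

# A sparse feasible solution: put all the weight on the column with
# the largest coefficient.
z_sparse = np.array([0.0, 1.0])

print("min-l2 solution:", z_l2)                 # [0.4, 0.8] -- no zeros
print("its l1 norm:", np.abs(z_l2).sum())       # 1.2
print("sparse solution l1 norm:", np.abs(z_sparse).sum())  # 1.0 (smaller)
```

The minimum-ℓ2 solution is nonzero everywhere, while the sparse solution satisfies the same equation with a strictly smaller ℓ1 norm; minimizing ||z||_1 therefore favors the sparse answer.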
238 CHAPTER 7 WAVELETS AND COMPRESSED SENSING
In terms of z⁺ and z⁻, the basis pursuit problem becomes a linear program in the nonnegative split variables z⁺ and z⁻ (with z = z⁺ − z⁻).

ẑ = (A_w^T A_w)^{-1} A_w^T y.    (7.124a)

In practice, ẑ is computed by solving the corresponding linear system, rather than by matrix inversion, using the LU decomposition method or similar techniques. Here, LU stands for lower upper, in reference to the lower triangular submatrix and the upper triangular submatrix multiplying the unknown vector ẑ.

(b) Underdetermined system

Now consider the underdetermined system characterized by M < N (fewer measurements than unknowns). In this case, there is an infinite number of possible solutions. The vector ẑ that minimizes ||z||_2^2 among this infinite number of solutions also is called the pseudo-inverse solution, and is given by the estimate

ẑ = A_w^T (A_w A_w^T)^{-1} y.    (7.125)

In the present case, A_w A_w^T is an M × M matrix with full rank. Solution ẑ should be computed not by inverting A_w A_w^T, but by initially solving the linear system

(A_w A_w^T) r̂ = y,    (7.126a)

to compute an intermediate estimate r̂, and then computing ẑ by applying

ẑ = A_w^T r̂.    (7.126b)

We now rewrite Eq. (7.128) in the expanded form

T = \frac{1}{2} \left\| \begin{bmatrix} y[0] \\ y[1] \\ \vdots \\ y[M-1] \end{bmatrix} - \begin{bmatrix} a_{0,0} & a_{0,1} & \cdots & a_{0,N-1} \\ a_{1,0} & a_{1,1} & \cdots & a_{1,N-1} \\ \vdots & & & \vdots \\ a_{M-1,0} & a_{M-1,1} & \cdots & a_{M-1,N-1} \end{bmatrix} \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_N \end{bmatrix} \right\|_2^2 + \lambda \left\| \begin{bmatrix} D_{11} & & 0 \\ & \ddots & \\ 0 & & D_{NN} \end{bmatrix} \begin{bmatrix} z_1 \\ \vdots \\ z_N \end{bmatrix} \right\|_2^2,    (7.129)

where the dimensions of the five arrays are M × 1, M × N, N × 1, N × N, and N × 1, respectively.
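The two-step computation of Eqs. (7.126a) and (7.126b) can be sketched as follows (a minimal NumPy sketch; the small random A_w stands in for a real measurement matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 4, 10                      # underdetermined: fewer measurements than unknowns
Aw = rng.standard_normal((M, N))  # stand-in for the matrix of Eq. (7.115b)
y = rng.standard_normal(M)

# Eq. (7.126a): solve (Aw Aw^T) r = y instead of inverting Aw Aw^T
r = np.linalg.solve(Aw @ Aw.T, y)

# Eq. (7.126b): z = Aw^T r is the minimum-l2-norm (pseudo-inverse) solution
z = Aw.T @ r

# z reproduces the measurements exactly ...
assert np.allclose(Aw @ z, y)
# ... and matches the pseudo-inverse solution of Eq. (7.125)
assert np.allclose(z, np.linalg.pinv(Aw) @ y)
print("minimum-norm solution found")
```

Solving the M × M system is both cheaper and numerically better behaved than forming the explicit inverse of A_w A_w^T.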
Next, we introduce the new cost function T_1 as

T_1 = \frac{1}{2} \| y' - B_w z \|_2^2
    = \frac{1}{2} \left\| \begin{bmatrix} y \\ 0 \end{bmatrix} - \begin{bmatrix} A_w \\ \sqrt{2\lambda}\, D \end{bmatrix} z \right\|_2^2
    = \frac{1}{2} \| y - A_w z \|_2^2 + \lambda \| 0 - D z \|_2^2
    = \frac{1}{2} \| y - A_w z \|_2^2 + \lambda \| D z \|_2^2 = T.    (7.131)

Hence, Eq. (7.128) can be rewritten in the form

T = \frac{1}{2} \| y' - B_w z \|_2^2.    (7.132)

The vector z minimizing T is the pseudo-inverse given by

ẑ = (B_w^T B_w)^{-1} B_w^T y'
  = \left( \begin{bmatrix} A_w^T & \sqrt{2\lambda}\, D^T \end{bmatrix} \begin{bmatrix} A_w \\ \sqrt{2\lambda}\, D \end{bmatrix} \right)^{-1} \begin{bmatrix} A_w^T & \sqrt{2\lambda}\, D^T \end{bmatrix} \begin{bmatrix} y \\ 0 \end{bmatrix}
  = (A_w^T A_w + 2\lambda D^T D)^{-1} A_w^T y.    (7.133)

As always, instead of performing matrix inversion (which is susceptible to noise amplification), vector ẑ should be computed by solving

(A_w^T A_w + 2λ D^T D) ẑ = A_w^T y.    (7.134)

Once ẑ has been determined, the unknown vector x can be computed by solving Eq. (7.113).

To solve Eq. (7.134) for ẑ, however, we need to know A_w, λ, D, and y. From Eq. (7.115b), A_w = A W^{-1}, where A is a known matrix based on a physical model or calibration data, and W is a known wavelet transform matrix. The parameter λ is specified by the user to adjust the intended balance between data fidelity and storage size (as noted in connection with Eq. (7.128)), and y is the measurement vector. The only remaining quantity is the diagonal matrix D, whose function is to assign weights to z_1 through z_N so as to minimize storage size by having many elements of z → 0. Initially, D is unknown, but it is possible to propose an initial function for D and then iterate to obtain a solution for z that minimizes the number of nonzero elements, while still satisfying Eq. (7.134).

The Tikhonov function given by Eq. (7.128) reduces to the LASSO functional given by Eq. (7.122) if

D = diag( 1 / √|z_n| ).    (7.135)

This is because the second terms in the two equations become identical:

||D z||_2^2 = ∑_{n=1}^{N} z_n^2 / |z_n| = ∑_{n=1}^{N} |z_n| = ||z||_1.    (7.136)

Given this correspondence between the two cost functionals, the IRLS algorithm uses the following iterative procedure to find z:

(a) Initial solution: Set D = I and then compute z^(1), the initial iteration of z, by solving Eq. (7.134).

(b) Initial D: Use z^(1) to compute D^(1), the initial iteration of D:

D^(1) = diag( 1 / √(|z_n^(1)| + ε) ).    (7.137)

(c) Second iteration: Use D^(1) to compute z^(2) by solving Eq. (7.134) again.

(d) Recursion: Continue to iterate by computing D^(k) from z^(k) using

D^(k) = diag( 1 / √(|z_n^(k)| + ε) ),    (7.138)

for a small deviation ε inserted in the expression to keep D^(k) finite when elements of z^(k) → 0.

The iterative process ends when no significant change occurs between successive iterations. The algorithm, also called focal underdetermined system solver (FOCUSS), is guaranteed to converge under mild assumptions. However, because the method requires a solution of a large system of equations at each iteration, the algorithm is considered unsuitable for most signal and image processing applications. Superior-performance algorithms are introduced in succeeding sections.

Concept Question 7-10: How can we get a useful solution to an underdetermined system of equations?
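The IRLS/FOCUSS procedure of Eqs. (7.134)–(7.138) can be sketched as follows (a minimal NumPy sketch; the small random A_w, the stopping tolerance, and the demo parameters are illustrative choices, not from the text):

```python
import numpy as np

def irls(Aw, y, lam=0.01, eps=1e-8, n_iter=50):
    """IRLS/FOCUSS iteration of Eqs. (7.134)-(7.138)."""
    N = Aw.shape[1]
    D = np.eye(N)                      # step (a): initial D = I
    z = np.zeros(N)
    for _ in range(n_iter):
        # Eq. (7.134): solve (Aw^T Aw + 2*lam*D^T D) z = Aw^T y
        z_new = np.linalg.solve(Aw.T @ Aw + 2.0 * lam * (D.T @ D), Aw.T @ y)
        if np.allclose(z_new, z, atol=1e-12):
            break                      # no significant change between iterations
        z = z_new
        # Eq. (7.138): reweight D; eps keeps D finite as entries of z -> 0
        D = np.diag(1.0 / np.sqrt(np.abs(z) + eps))
    return z

# Tiny demo: y has a 1-sparse representation in the columns of Aw
rng = np.random.default_rng(1)
Aw = rng.standard_normal((8, 20))
z_true = np.zeros(20)
z_true[3] = 2.0
y = Aw @ z_true
z_hat = irls(Aw, y, lam=1e-4)
print("dominant coefficient index:", np.argmax(np.abs(z_hat)))
```

The reweighting drives small coefficients toward zero while leaving large ones nearly unpenalized, so the iterate concentrates on the true support; note that each pass requires solving an N × N system, which is exactly the cost that makes FOCUSS impractical for large images.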
7-11 Landweber Algorithm
The Landweber algorithm is a recursive algorithm for solving linear systems of equations y = Ax. The iterative shrinkage and thresholding algorithm (ISTA) consists of the Landweber algorithm, with thresholding and shrinkage applied at each recursion. Thresholding and shrinkage were used in Section 7-8 to minimize the LASSO functional.

7-11.1 Underdetermined System

For an underdetermined system y = Ax with M < N, the solution x̂ that minimizes the sum of squares of the elements of x is, by analogy with Eq. (7.125), given by

x̂ = A^T (A A^T)^{-1} y.    (7.139)

A useful relationship in matrix algebra states that if all of the eigenvalues λ_i of (A A^T) lie in the interval 0 < λ_i < 2, then the coefficient of y in Eq. (7.139) can be written as

A^T (A A^T)^{-1} = ∑_{k=0}^{∞} (I − A^T A)^k A^T.    (7.140)

Using Eq. (7.140) in Eq. (7.139) leads to

x̂ = ∑_{k=0}^{∞} (I − A^T A)^k A^T y.    (7.141)

A recursive implementation of Eq. (7.141) assumes the form

x^(K+1) = ∑_{k=0}^{K} (I − A^T A)^k A^T y,    (7.142)

where the upper limit in the summation is now K (instead of ∞). For K = 0 and K = 1, we obtain the expressions

x^(1) = A^T y,    (7.143a)
x^(2) = A^T y + (I − A^T A) A^T y = (I − A^T A) x^(1) + A^T y.    (7.143b)

Extending the process to K = 2 gives

x^(3) = A^T y + (I − A^T A) A^T y + (I − A^T A)^2 A^T y,
        (k = 0)    (k = 1)             (k = 2)

and, in general,

x^(k+1) = x^(k) + A^T (y − A x^(k)).    (7.144)

The process can be initialized by x^(0) = 0, which makes x^(1) = A^T y, as it should.

The recursion process is called the Landweber iteration, which in optics is known as the van Cittert iteration. It is guaranteed to converge to the solution of y = Ax that minimizes the sum of squares of the elements of x, provided that the eigenvalues λ_i of A A^T are within the range 0 < λ_i < 2. If this condition is not satisfied, the formulation may be scaled to

y/c = (A/c) x,

or, equivalently,

u = B x,    (7.145)

where u = y/c and B = A/c. The constant c is chosen so that the eigenvalue condition is satisfied. For example, if c is chosen to be equal to the sum of the squares of the magnitudes of all of the elements of A, then the eigenvalues λ_i′ of B B^T will be in the range 0 < λ_i′ < 1.

The symbol λ_i for eigenvalue is unrelated to the trade-off parameter λ in Eq. (7.128).

7-11.2 Overdetermined System

In analogy with Eq. (7.124a), the solution for an overdetermined system y = Ax is given by

x̂ = (A^T A)^{-1} A^T y.    (7.146)

Using the equality

(A^T A)^{-1} = ∑_{k=0}^{∞} (I − A^T A)^k,    (7.147)

we can rewrite Eq. (7.146) in the same form as Eq. (7.141), namely

x̂ = ∑_{k=0}^{∞} (I − A^T A)^k A^T y.    (7.148)

Hence, the Landweber algorithm is equally applicable to solving overdetermined systems of linear equations.
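The Landweber recursion of Eq. (7.144) can be sketched as follows (a minimal NumPy sketch; the test matrix is illustrative, and for speed the sketch scales A by √c rather than by c, which also bounds the eigenvalues as required):

```python
import numpy as np

def landweber(A, y, n_iter=20000):
    """Landweber iteration x_{k+1} = x_k + B^T (u - B x_k), Eq. (7.144)."""
    # The text scales by c = sum of squared magnitudes of the elements of A
    # (Eq. (7.145)); here we scale by sqrt(c) instead, so that
    # trace(B B^T) = 1 and every eigenvalue of B B^T lies in (0, 1].
    c = np.sum(np.abs(A) ** 2)
    B, u = A / np.sqrt(c), y / np.sqrt(c)
    x = np.zeros(A.shape[1])          # x^(0) = 0, so x^(1) = B^T u
    for _ in range(n_iter):
        x = x + B.T @ (u - B @ x)
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 12))      # underdetermined: M = 5 < N = 12
y = rng.standard_normal(5)
x_hat = landweber(A, y)

# The iteration converges to the minimum-norm solution of Eq. (7.139)
assert np.allclose(x_hat, np.linalg.pinv(A) @ y, atol=1e-6)
print("Landweber iterate matches the pseudo-inverse solution")
```

Since u = Bx and y = Ax have the same solution set, the scaled iteration converges to the same minimum-norm solution; only each matrix-vector product and the geometric series in Eq. (7.141) are affected by the scaling.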
7-11.3 Iterative Shrinkage and Thresholding Algorithm (ISTA)

For a linear system given by

y = A x,    (7.149)

where x is the unknown signal of length N and y is the (possibly noisy) observation of length M, the LASSO cost functional is

Λ = \frac{1}{2} ∑_{n=0}^{M−1} (y[n] − (Ax)[n])^2 + λ ∑_{n=0}^{N−1} |x[n]|,    (7.150)

where matrix A is M × N and (A x)[n] is the nth element of A x.

◮ In the system described by Eq. (7.149), x and y are generic input and output vectors. The Landweber algorithm provides a good estimate of x, given y. The estimation algorithm is equally applicable to any other linear system, including the system y = A_w z, where A_w is the matrix given by Eq. (7.115b) and z is the wavelet transform vector. ◭

The ISTA algorithm combines the Landweber algorithm with the thresholding and shrinkage operation outlined earlier in Section 7-8.3, and summarized by Eq. (7.109). After each iteration, elements x^(k)[n] of vector x^(k) whose absolute values are smaller than the trade-off parameter λ are thresholded to zero, and those whose absolute values are larger than λ are shrunk by λ. Hence, the ISTA algorithm combines Eq. (7.144) with Eq. (7.109):

x^(0) = 0,    (7.151a)
x^(k+1) = x^(k) + A^T (y − A x^(k)),    (7.151b)

with

x_i^(k+1) = x_i^(k+1) − λ   if x_i^(k+1) > λ,
x_i^(k+1) = x_i^(k+1) + λ   if x_i^(k+1) < −λ,
x_i^(k+1) = 0               if |x_i^(k+1)| < λ,    (7.151c)

where x_i^(k+1) is the ith component of x^(k+1).

The ISTA algorithm converges to the value of x that minimizes the LASSO functional given by Eq. (7.150), provided all of the eigenvalues λ_i of A A^T obey |λ_i| < 1.

The combination of thresholding and shrinking is often called soft thresholding, while thresholding small values to zero without shrinking is called hard thresholding. There are many variations on ISTA, with names like SpaRSA (sparse reconstruction by separable approximation), FISTA (fast iterative shrinkage and thresholding algorithm), and TwIST (two-step iterative shrinkage and thresholding algorithm).

Concept Question 7-11: Why do we need an iterative algorithm to find the LASSO-minimizing solution?

7-12 Compressed Sensing Examples

To illustrate the utility of the ISTA described in the preceding section, we present four examples of compressed sensing:

• Reconstruction of an image from some, but not all, of its 2-D DFT values.

• Image inpainting, which entails filling in holes (missing pixel values) in an image.

• Valid deconvolution of an image from only part of its convolution with a known point spread function (PSF).

• Tomography, which involves reconstruction of a 2-D image from slices of its 2-D DFT.

These are only a few of many more possible types of applications of compressed sensing.

The ISTA was used in all four cases, and the maximum number of iterations was set at 1000, or fewer if the algorithm converged to where no apparent change was observed in the reconstructed images. The LASSO functional parameter was set at λ = 0.01, as this value seemed to provide the best results.

7-12.1 Image Reconstruction from Subset of DFT Values

Suppose that after computing the 2-D DFT of an image x[n, m], some of the DFT values X[k_1, k_2] were lost or no longer available. The goal is to reconstruct image x[n, m] from the partial subset of its DFT values. Since the available DFT values are fewer than those of the unknown signal, the system is underdetermined and the application is a good illustration of compressed sensing.

For a 1-D signal { x[n], n = 0, 1, …, N − 1 }, its DFT can be implemented by the matrix-vector product

y = A x,    (7.152)

where the (k, n)th element of A is A_{k,n} = e^{−j2πnk/N}, the nth element of vector x is x[n], and the kth element of vector y is
X[k]. Multiplication of both sides by A^H implements an inverse 1-D DFT within a factor 1/N.

The 2-D DFT of an N × N image can be implemented by multiplication by an N^2 × N^2 block matrix B whose (k_2, n_2)th block is the N × N matrix A multiplied by the scalar e^{−j2πn_2k_2/N}. So the element B_{k_1+Nk_2, n_1+Nn_2} of B is e^{−j2πn_1k_1/N} e^{−j2πn_2k_2/N}, where 0 ≤ n_1, n_2, k_1, k_2 ≤ N − 1.

The absence of some of the DFT values is equivalent to deleting some of the rows of B and y, thereby establishing an underdetermined linear system of equations. To illustrate the reconstruction process for this underdetermined system, we consider two different scenarios applied to the same set of data.

◮ For convenience, we call the values X[k_1, k_2] of the 2-D DFT the pixels of the DFT image. ◭

(a) Least-squares reconstruction

Starting with the (256 × 256) image shown in Fig. 7-22(a), we compute its 2-D DFT and then we randomly select a subset of the pixels in the DFT image and label them as unknown. The complete 2-D DFT image consists of 256^2 = 65536 pixel values of X[k_1, k_2]. Of those, 35755 are unaltered (and therefore have known values), and the other 29781 pixels have unknown values. Figure 7-22(b) displays the locations of pixels with known DFT values as white dots and those with unknown values as black dots.

In the least-squares reconstruction method, all of the pixels with unknown values are set to zero, and then the inverse 2-D DFT is computed. The resulting image, shown in Fig. 7-22(c), is a poor rendition of the original image in part (a) of the figure.

(b) ISTA reconstruction

The ISTA reconstruction process consists of two steps:

(1) Measurement vector y, representing the 35755 DFT pixels with known values, is used to estimate the (entire 65536-element) wavelet transform vector z by applying the recipe outlined in Section 7-11.3 with λ = 0.01 and 1000 iterations. The relationship between y and z is given by y = A_w z, with A_w = B W^{-1} and some rows of B deleted.

(2) Vector z is then used to reconstruct x by applying the relation x = W^{-1} z.

The reconstructed image, displayed in Fig. 7-22(d), is an excellent rendition of the original image.

It is important to note that while B and W^{-1} are each N^2 × N^2, with N^2 = 65536, neither matrix is ever computed or stored during the implementation of the reconstruction process. Multiplication by W^{-1} is implemented by a 2-D filter bank, and multiplication by B is implemented by a 2-D FFT. Consequently, ISTA is a very fast algorithm.

7-12.2 Image Inpainting

In an image inpainting problem, some of the pixel values of an image are unknown, either because those pixel values have been corrupted, or because they represent some unwanted feature of the image that we wish to remove. The goal is to restore the image to its original version, in which the unknown pixel values are replaced with the hitherto unknown pixel values of the original image. This can be viewed as a kind of interpolation problem.

It is not at all evident that this can be done at all — how can we restore unknown pixel values? But under the assumption that the Daubechies wavelet transform of the image is sparse (mostly zero-valued), image inpainting can be formulated as a compressed sensing problem. Let y be the vector of the known pixel values and x be the vector of the wavelet transform of the image. Note that all elements of x are unknown, even though some of the pixel values are actually known. Then the problem can be formulated as an underdetermined linear system y = Ax. For example, if x is a column vector of length five, and only the first, third, and fourth elements of x are known, the problem can be formulated as

\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}.

One application of image inpainting is to restore a painting in which the paint in some regions of the painting has been chipped off, scraped off, damaged by water, or simply faded, but most of the painting is unaffected. Another application is to remove unwanted letters or numbers from an image. Still another application is "wire removal" in movies, the elimination of wires used to suspend actors or objects used for an action stunt in a movie scene.

In all of these cases, damage to the painting, or presence of unwanted objects in the image, has made some small regions of the painting or image unknown. The goal is to fill in the unknown values to restore the (digitized) painting or image to its original version.

Using a (200 × 200) = 40000-pixel clown image, 19723 pixels were randomly selected and their true values were deleted.
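The ISTA recursion of Eqs. (7.151a)–(7.151c), used throughout the examples of this section, can be sketched as follows (a minimal NumPy sketch with a small random system standing in for the DFT/wavelet operators; the scaling and demo parameters are illustrative choices):

```python
import numpy as np

def soft_threshold(v, lam):
    """Eq. (7.151c): zero entries with |v_i| < lam, shrink the rest by lam."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def ista(A, y, lam=0.01, n_iter=2000):
    """ISTA: Landweber step (7.151b) followed by soft thresholding (7.151c)."""
    s = 1.1 * np.linalg.norm(A, 2)   # scale so eigenvalues of (A/s)(A/s)^T < 1
    B, u = A / s, y / s
    x = np.zeros(A.shape[1])         # Eq. (7.151a): x^(0) = 0
    for _ in range(n_iter):
        x = soft_threshold(x + B.T @ (u - B @ x), lam)
    return x

# Tiny demo: recover a 3-sparse x from M = 30 < N = 100 random measurements
rng = np.random.default_rng(3)
A = rng.standard_normal((30, 100))
x_true = np.zeros(100)
x_true[[5, 40, 77]] = [3.0, -2.0, 4.0]
y = A @ x_true
x_hat = ista(A, y)
print("recovered support:", np.flatnonzero(np.abs(x_hat) > 0.5))
```

Each pass costs only one multiplication by B and one by B^T; in the image examples those products are FFTs and filter-bank passes, which is what makes ISTA fast.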
Figure 7-22 (a) Original Shepp-Logan phantom image, (b) 2-D DFT image with locations of pixels of known values displayed in white and those of unknown values displayed in black, (c) reconstructed image using available DFT pixels only (without ISTA), and (d) reconstructed image after filling in missing DFT pixel values with estimates provided by ISTA.
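The selection-matrix formulation of the inpainting problem (the length-five example above) can be sketched as:

```python
import numpy as np

# Hypothetical length-5 image vector; elements 1, 3, 4 (1-based) are known.
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
known = [0, 2, 3]                 # 0-based indices of the known pixels

# A keeps only the rows of the 5x5 identity that correspond to known pixels,
# so y = A x collects exactly the known pixel values.
A = np.eye(5)[known]
y = A @ x

print(A.astype(int))
print("y =", y)                   # [10. 30. 40.]
```

The same construction scales directly to the clown image: A is the identity with the rows of the 19723 deleted pixels removed, giving the underdetermined system y = A W^T z.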
In the image shown in Fig. 7-23(a), the locations of pixels with unknown values are painted black, while the remaining half (approximately) have their correct values. The goal is to reconstruct the clown image from the remaining half. In terms of the formulation y = A W^T z, M = 20277 and N = 40000, so that just over half of the clown image pixel values are known. The ISTA is a good algorithm to solve this compressed sensing problem, since the matrix-vector multiplication y = A W^T z can be implemented quickly by taking the inverse wavelet transform of the current iteration (multiplication by W^T), and then selecting a subset of the pixel values (multiplication by A). The result, after 1000 iterations, is shown in Fig. 7-23(b).
7-12.3 Valid 2-D Deconvolution

(a) Definition of valid convolution

Given an M × M image x[n, m] and an L × L point spread function (PSF) h[n, m], their 2-D convolution generates an (M + L − 1) × (M + L − 1) image y[n, m]. For example, for the 3 × 3 image (M = 3)

x[n, m] = [ 1 2 3 ; 4 5 6 ; 7 8 9 ]

and the 2 × 2 PSF (L = 2)

h[n, m] = [ 11 12 ; 13 14 ],

the size of the valid 2-D convolution is, in contrast,

(M − L + 1) × (M − L + 1) = 2 × 2,

and y_V[n, m] is given by

y_V[n, m] = [ 143 193 ; 293 343 ].

The valid convolution y_V[n, m] is the central part of y[n, m], obtained by deleting the edge rows and columns from y[n, m]. In MATLAB, the valid 2-D convolution of X and H can be computed using the command

Y=conv2(X,H,'valid').

(b) Reconstruction from y_V[n, m]

The valid 2-D deconvolution problem is to reconstruct an unknown image from its valid 2-D convolution with a known PSF. The 2-D DFT and Wiener filter cannot be used here, since not all of the blurred image y[n, m] is known. It may seem that we could simply ignore, or set to zero, the unknown parts of y[n, m] and still obtain a decent reconstructed image using a Wiener filter, but as we will demonstrate with an example, such an approach does not yield fruitful results.

The valid 2-D deconvolution problem is clearly underdetermined, since the (M − L + 1) × (M − L + 1) portion of the blurred image is smaller than the M × M unknown image. But if x[n, m] is sparsifiable, then valid 2-D deconvolution can be formulated as a compressed sensing problem and solved using the ISTA.

The matrix A turns out to be a block Toeplitz with Toeplitz blocks matrix, but multiplication by A is implemented as a valid 2-D convolution; multiplication by A^T is likewise implemented as a valid 2-D convolution.

The valid 2-D convolution can be implemented as y_V = Ax, where

x = [1 2 3 4 5 6 7 8 9]^T,
y_V = [143 193 293 343]^T,

and the matrix A is composed of the elements of h[n, m] as follows:

A = [ 14 13 0 12 11 0 0 0 0 ;
      0 14 13 0 12 11 0 0 0 ;
      0 0 0 14 13 0 12 11 0 ;
      0 0 0 0 14 13 0 12 11 ].

Note that A is a 2 × 3 block matrix of 2 × 3 blocks. Each block is constant along its diagonals, and the blocks are constant along block diagonals. This is the block Toeplitz with Toeplitz blocks structure. Also note that images x[n, m] and y_V[n, m] have been unwrapped row by row, starting with the top row, and the transposes of the rows are stacked into a column vector.

Finally, note that multiplication by A^T can be implemented as a valid 2-D convolution with the doubly reversed version of h[n, m]. For example, if

z[n, m] = [ 1 2 3 4 ; 5 6 7 8 ; 9 10 11 12 ; 13 14 15 16 ]

and

g[n, m] = [ 14 13 ; 12 11 ] = h[1 − n, 1 − m], with n, m = 0, 1,

then the valid 2-D convolution of z[n, m] and g[n, m] is

w_v[n, m] = [ 184 234 284 ; 384 434 484 ; 584 634 684 ].

This valid 2-D convolution can also be implemented as w = A^T z, where

z = [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16]^T

and

w = [184 234 284 384 434 484 584 634 684]^T.

To illustrate the process with an image, we computed the valid 2-D convolution of the (200 × 200) clown image with a (20 × 20) PSF. The goal is to reconstruct the clown image from the (181 × 181) blurred image shown in Fig. 7-24(a). The db3 Daubechies wavelet function was used to sparsify the image. Here, M = 200 and L = 20, so the valid 2-D convolution has size (M − L + 1) × (M − L + 1) = 181 × 181. In terms of y_V = Ax, A is (181^2 × 200^2) = 32761 × 40000.

Parts (b) and (c) of Fig. 7-24 show reconstructed versions of the clown image, using a Wiener filter and ISTA, respectively. Both images involve deconvolution using the restricted valid convolution data y_V[n, m]. In the Wiener-filter approach, the unknown parts of the blurred image (beyond the edges of y_V[n, m]) were ignored, and the resultant image bears no real resemblance to the original clown image. In contrast, the ISTA approach provides excellent reconstruction of the original image. This is because ISTA is perfectly suited for solving underdetermined systems of linear equations with sparse solutions.
Figure 7-24 (a) Valid 2-D convolution y_V[n, m] of clown image, (b) deconvolution using Wiener filter, and (c) deconvolution using ISTA.
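The small valid-convolution example above can be checked numerically; the sketch below uses SciPy's convolve2d, whose 'valid' mode mirrors MATLAB's conv2(X,H,'valid'):

```python
import numpy as np
from scipy.signal import convolve2d

x = np.arange(1, 10).reshape(3, 3)     # the 3x3 image of the example
h = np.array([[11, 12], [13, 14]])     # the 2x2 PSF
yV = convolve2d(x, h, mode='valid')    # MATLAB: conv2(X, H, 'valid')
print(yV)                              # [[143 193] [293 343]]

# Adjoint-style example: valid convolution of a 4x4 array with the
# doubly reversed PSF g[n, m] = h[1-n, 1-m]
z = np.arange(1, 17).reshape(4, 4)
g = h[::-1, ::-1]                      # [[14 13] [12 11]]
wV = convolve2d(z, g, mode='valid')
print(wV)                              # [[184 234 284] [384 434 484] [584 634 684]]
```

This is exactly why neither A nor A^T is ever formed explicitly in the clown-image reconstruction: both products reduce to 2-D convolutions.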
7-12.4 Computed Axial Tomography (CAT)

The basic operation of the CAT scanner was described in the opening chapter of this book using Fig. 1-24, which we reproduce here as Fig. 7-25. For a source-to-detector path through a body at a radius r and at orientation θ, the path attenuation is given by Eq. (1.18) as

p(r, θ) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} α(ξ, η) δ(r − ξ cos θ − η sin θ) dξ dη,    (7.154)

where α(ξ, η) is the absorption coefficient of the body under test at location (ξ, η) in Cartesian coordinates or (r, θ) in polar coordinates. The impulse function δ(r − ξ cos θ − η sin θ) dictates that only those points in the (ξ, η) plane that fall along the path specified by fixed values of (r, θ) are included in the integration.

The relation between p(r, θ) and α(ξ, η) is known as the 2-D Radon transform of α(ξ, η). The goal of CAT is to reconstruct α(ξ, η) from the measured path attenuations p(r, θ), by inverting the Radon transform given by Eq. (7.154). We do so with the help of the Fourier transform.
Figure 7-25 (a) CAT scanner, (b) X-ray path along ξ, and (c) X-ray path along an arbitrary direction (a path at radius r and orientation θ through the absorption profile α(ξ, η), with intensity I(r, θ) recorded at the detector).

The 1-D Fourier transform of p(r, θ) with respect to r is

P(f, θ) = ∫_{0}^{∞} p(r, θ) e^{−j2πfr} dr.    (7.158)

By reversing the order of integration, we have

P(f, θ) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} α(ξ, η) [ ∫_{0}^{∞} δ(r − ξ cos θ − η sin θ) e^{−j2πfr} dr ] dξ dη.    (7.159)
Carrying out the r integration using the sifting property of the impulse function, and expressing the result in terms of the wavenumbers (µ, ν), leads to

A(µ, ν) = P(f, θ),    (7.161)

where A(µ, ν) is the 2-D Fourier transform of α(ξ, η), and P is the 1-D Fourier transform (with respect to r) of p(r, θ). The variables (µ, ν) and (f, θ) are related by Eq. (7.156).

If p(r, θ) is measured for all r across the body of interest and for all directions θ, then its 1-D Fourier transform P(f, θ) can be computed, and then converted to A(µ, ν) using Eq. (7.156). The conversion is called the projection-slice theorem. In practice, however, p(r, θ) is measured for only a finite number of angles θ, so A(µ, ν) is known only along radial slices in the 2-D wavenumber domain (µ, ν). Reconstruction to find α(ξ, η) from a subset of its 2-D Fourier transform values is a perfect example of compressed sensing.

[Figure: tomographic reconstruction example — (a) locations of known values of X[k_1, k_2] along radial slices; (c) ISTA reconstruction.]
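The projection-slice theorem has an exact discrete counterpart, which can be checked numerically; the sketch below (an illustrative example, not from the text) verifies that the 1-D DFT of the θ = 0 projection equals the corresponding slice of the 2-D DFT.

```python
import numpy as np

rng = np.random.default_rng(4)
alpha = rng.random((8, 8))            # discrete absorption profile alpha[xi, eta]

# Projection at theta = 0: integrate (sum) along eta for each xi
p0 = alpha.sum(axis=1)

# Projection-slice theorem: the 1-D DFT of the projection equals the
# k2 = 0 slice of the 2-D DFT of alpha.
slice_from_projection = np.fft.fft(p0)
slice_from_2d_dft = np.fft.fft2(alpha)[:, 0]

assert np.allclose(slice_from_projection, slice_from_2d_dft)
print("projection-slice theorem verified for theta = 0")
```

Each measured angle θ thus fills in one radial slice of A(µ, ν); with only a few angles, most of the 2-D wavenumber plane remains unknown, which is precisely the compressed sensing setting.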
Summary

Concepts

• The wavelet transform of an image is an orthonormal expansion of the image using basis functions that are localized in wavenumber or space.

• The 1-D wavelet transform of a piecewise-polynomial signal is sparse (mostly zero-valued). This is why it is useful.

• The wavelet and inverse wavelet transforms are implemented using filter banks and cyclic convolutions. This makes their computation very fast.

• The filters in the tree-structured filter banks used to implement wavelet and inverse wavelet transforms must satisfy the Smith-Barnwell condition for perfect reconstruction, and also form a quadrature-mirror pair.

• An image can be compressed by thresholding its 2-D wavelet transform.

• An image can be denoised by thresholding and shrinking its 2-D wavelet transform. This preserves edges while reducing noise. Thresholding and shrinkage minimize the LASSO cost functional, which favors sparsity.

• Compressed sensing allows an image to be reconstructed from fewer linear combinations of its pixel values than the number of pixel values, using the ISTA algorithm or (rarely) basis pursuit.

• The ISTA algorithm applies thresholding and shrinkage at each iteration of the Landweber algorithm.

• Applications of compressed sensing include tomography, image inpainting, and valid deconvolution.
Mathematical Formulae

Zero-stuffing:
y[n] = x[n/2] for n even; y[n] = 0 for n odd

Decimation:
y[n] = x[2n]

Smith-Barnwell condition:
(h[n] ∗ h[−n]) + (−1)^n (h[n] ∗ h[−n]) = 2δ[n]

Haar functions:
g[n] = (1/√2){ 1, 1 } and h[n] = (1/√2){ 1, −1 }

QMF relation:
g[n] = −(−1)^n h[L − n]
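The Haar highpass filter above can be checked against the Smith-Barnwell condition numerically (a minimal NumPy sketch):

```python
import numpy as np

h = np.array([1.0, -1.0]) / np.sqrt(2.0)      # Haar highpass filter

# Autocorrelation r[n] = (h[n] * h[-n]); convolving with the time-reversed
# filter computes it. Lags run from -(L-1) to L-1; here r = [-1/2, 1, -1/2].
r = np.convolve(h, h[::-1])
lags = np.arange(-(len(h) - 1), len(h))

# Smith-Barnwell condition: r[n] + (-1)^n r[n] = 2*delta[n]
signs = np.where(lags % 2 == 0, 1.0, -1.0)    # (-1)^n over the lag range
lhs = r + signs * r
delta = (lags == 0).astype(float)
assert np.allclose(lhs, 2 * delta)
print("Haar filter satisfies the Smith-Barnwell condition")
```

Equivalently, the condition says the autocorrelation of h[n] must vanish at all nonzero even lags and equal 1 at lag zero, which the Haar pair satisfies exactly.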
Important Terms Provide definitions or explain the meaning of the following terms:

average image, basis pursuit, compressed sensing, cyclic convolution, Daubechies, dbK, decimation, detail image, Haar, ISTA algorithm, Landweber algorithm, LASSO functional, orthonormal basis, quadrature-mirror filter pair, Shepp-Logan phantom, shrinkage, Smith-Barnwell condition, sparse, subband decomposition, thresholding, tree-structured filter banks, zero-stuffing
PROBLEMS

Section 7-2: Expansions of Signals in Orthogonal Basis Functions

7.1 The continuous-time Haar functions are defined as

φ(t) = 1 for 0 < t < 1; 0 otherwise,

ψ(t) = 1 for 0 < t < 1/2; −1 for 1/2 < t < 1; 0 otherwise,

ψ_{m,n}(t) = 2^{m/2} ψ(2^m t − n).

Let B = {φ(t), ψ_{m,n}(t), m, n integers} and let F be the set of piecewise-constant functions with support (nonzero region) 0 ≤ t ≤ 1 whose values change at t = m/2^N.

(a) Show that any member of F is a linear combination of elements of B.

(b) Show that B is an orthonormal basis for F. Hint: Draw pictures.

7.2 Let B = {e^{j2πkt}, k integer, 0 ≤ t ≤ 1} and F be the set of continuous functions with support (nonzero region) 0 ≤ t ≤ 1. Show that B is an orthonormal basis for F.

Section 7-4: Haar Wavelet Transforms

7.3 Let x[n] = {4, 4, 4, 1, 1, 1, 1, 7, 7, 7, 7, 5, 5, 5, 5, 4}.

(a) Compute all of the signals in the Haar analysis filter bank of Fig. 7-8.

(b) Check your answers using Rayleigh's (Parseval's) theorem. You may use MATLAB.

7.4 Let x[n] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}.

(a) Compute all of the signals in the Haar analysis filter bank of Fig. 7-8.

(b) Check your answers using Rayleigh's (Parseval's) theorem. You may use MATLAB.

Section 7-5: Discrete-Time Wavelet Transforms

7.5 Let h[n] = {a, b, 1/2, −1/2}. Find a, b such that h[n] satisfies the Smith-Barnwell condition given by Eq. (7.57).

7.6 If h[n] = {a, b, c, d}, find g[n] such that g[n] and h[n] are a QMF pair given by Eq. (7.52a).

7.7 Why are time-reversals used in the synthesis filters? Show that using g[n] and h[n] instead of g[−n] and h[−n] for the synthesis filters, and then h[n] = (−1)^n g[n] for the QMF, perfect reconstruction is possible only if h[n] and g[n] have DTFTs

G(Ω) = 1 for 0 ≤ Ω < π/2; 0 for π/2 < Ω < π,

H(Ω) = 0 for 0 ≤ Ω < π/2; 1 for π/2 < Ω < π,

which constitutes an octave-band filter bank. Note that h[−n] = h[n] and g[−n] = g[n]. Hint: Replace g[−n] with g[n] and h[−n] with h[n] in Eq. (7.48).

7.8 Repeat Problem 7.7, except now change the synthesis filters from g[−n] and h[−n] to g[n] and −h[n], and then use h[n] = (−1)^n g[n] for the new QMF.

(a) Show that one equation for perfect reconstruction is now automatically satisfied.

(b) Show that the other equation still cannot be satisfied by any g[n].

Hint: Replace g[−n] with g[n] and h[−n] with −h[n] in Eq. (7.48) and use Eq. (7.34).

Section 7-6: Sparsification Using Wavelets of Piecewise Polynomial Signals

7.9 Use the Smith-Barnwell condition given by Eq. (7.62) to design the db2 Daubechies wavelet function. Confirm that your answer matches the coefficients listed in Table 7-3. Do this by equating coefficients of time n. You should get a large linear system of equations and a small nonlinear system of equations.

7.10 Use the Smith-Barnwell condition given by Eq. (7.62) to design the db2 Daubechies wavelet function. Confirm that your answer matches the coefficients listed in Table 7-3. Use q[n] = {q[0], q[1]} = q[0]{1, b}, where b = q[1]/q[0]. This avoids the large linear systems of equations and the simultaneous quadratic equations of the previous problem.

Section 7-7: 2-D Wavelet Transform

7.11 Use haar.m to compute the 2-D Haar transform of the image in letters.mat. Set sigma=0 and lambda=0 in the first line of haar.m. Also depict the image reconstructed from the wavelet transform.
7.12 Use daub.m to compute the 2-D db3 transform of the (a) Download and run the program daub.m. This adds noise
SAR image in sar.mat. Change the first line to to the SAR image, computes its 2-D db3 transform,
thresholds and shrinks this wavelet transform, computes
load sar.mat;sigma=0;lambda=0; the inverse 2-D db3 wavelet transform of the result, and
Also depict the image reconstructed from the wavelet transform. displays images. Change the first line to
load sar.mat;sigma=50;lambda=100;
Section 7-8: Wavelet-Based Denoising by The threshold and shrinkage uses λ = 100, and the signal-
to-noise ratio is about 6.1.
Thresholding and Shrinkage
(b) Why does this work better than the 2-D DFT or convolution
7.13 This problem investigates denoising the letters image with a lowpass filter?
using the wavelet transform, by thresholding and shrinking the
2-D Haar transform of the noisy image. Section 7-9: Compressed Sensing
(a) Run haar.m. This adds noise to the image in
letters.mat, computes its 2-D Haar transform, 7.17 Even if a compressed sensing problem is only slighlty
thresholds and shrinks this wavelet transform, computes underdetermined, and it has a mostly sparse solution, there is no
the inverse 2-D Haar wavelet transform of the result, and guarantee that the sparse solution is unique. The worst case for
displays images. The threshold and shrinkage uses λ = 70.

(b) Why does this work better than the 2-D DFT or convolution with a lowpass filter?

7.14 This problem investigates denoising an MRI head image using the wavelet transform, by thresholding and shrinking the 2-D Haar transform of the noisy image.

(a) Run the program haar.m. This adds noise to the MRI image, computes its 2-D Haar transform, thresholds and shrinks this wavelet transform, computes the inverse 2-D Haar wavelet transform of the result, and displays images. Change load letters.mat to load mri.mat;. The threshold and shrinkage uses λ = 150.

(b) Why does this work better than the 2-D DFT or convolution with a lowpass filter?

7.15 This problem investigates denoising an MRI head image using the wavelet transform, by thresholding and shrinking the 2-D db3 transform of the noisy image.

(a) Download and run the program daub.m. This adds noise to the MRI image, computes its 2-D db3 transform, thresholds and shrinks this wavelet transform, computes the inverse 2-D db3 wavelet transform of the result, and displays images. Change the first line to load mri.mat;sigma=50;lambda=100; The threshold and shrinkage uses λ = 100.

(b) Why does this work better than the 2-D DFT or convolution with a lowpass filter?

7.16 This problem investigates denoising an SAR image using wavelet transforms, by thresholding and shrinking the 2-D db3 transform of the noisy image.

... compressed sensing is as follows. Let:

(a) a_{m,n} = e^{−j2πmn/N}, n = 0, ..., N − 1, m = 0, ..., N − 1, skipping every m that is a multiple of N/L.

(b) x_n = 1 if n is a multiple of L, and x_n = 0 if n is not a multiple of L.

(c) For N = 12 and L = 4: m = {1, 2, 4, 5, 7, 8, 10, 11} and {n = 0, 4, 8}, so that in this case A is 8 × 12 and the sparsity (the number of nonzero x_n) is K = 3.

Show that y_m = 0! {x_n} is called the Dirac comb. Its significance: let x_n be any signal of length 12 with nonzero elements only at {n = 0, 4, 8}, and let y_m = ∑_{n=0}^{11} a_{m,n} x_n. Then adding the Dirac comb to {x_n} won't alter {y_m}, so {x_n} plus any multiple of the Dirac comb is also a K-sparse solution.

Section 7-10: Computing Solutions to Underdetermined Systems

7.18 Derive the pseudoinverse given by Eq. (7.124), which is the solution x̂ to the overdetermined linear system of equations y = Ax that minimizes E = ||y − Ax||₂². Perform the derivation by letting x = x̂ + δ for an arbitrary δ and showing that the coefficient of δ must be zero.

7.19 Derive Eq. (7.125), which is the solution x̂ to the underdetermined linear system of equations y = Ax that minimizes E = ||x̂||₂². Perform the derivation by minimizing the Tikhonov criterion given by Eq. (7.128), letting parameter λ → 0, applying the pseudoinverse given by Eq. (7.124), and using the matrix identity (A^T A + λ²I)^{−1} A^T = A^T (AA^T + λ²I)^{−1}.
PROBLEMS 253
7.20 Free the clown from his cage. Run the program P720.m. This sets horizontal and vertical bands of the clown image to zero, making it appear that the clown is confined to a cage. The program then frees the clown by using inpainting to fill in the bands of zeros, regarding them as unknown pixel values of the clown image. Change the widths of the bands of zeros and see how this affects the reconstruction.
7.21 De-square the clown image. Run program P721.m. This sets 81 small squares of the clown image to zero, desecrating it. The program then uses inpainting to fill in the 81 small squares, regarding them as unknown pixel values of the clown image. Change the sizes of the squares and see how this affects the reconstruction.
7.22 The tomography example at the end of Chapter 7 used 24
rays. Download and run the program P722.m, which generated
this example. Change the number of rays (M) from 24 to 32, and
then 16. Compare the compressed sensing reconstruction results
to the least-squares reconstructions (all unknown DFT values set
to zero).
Chapter 8
Random Variables, Processes, and Fields

Contents
Overview, 255
8-1 Introduction to Probability, 255
8-2 Conditional Probability, 259
8-3 Random Variables, 261
8-4 Effects of Shifts on Pdfs and Pmfs, 263
8-5 Joint Pdfs and Pmfs, 265
8-6 Functions of Random Variables, 269
8-7 Random Vectors, 272
8-8 Gaussian Random Vectors, 275
8-9 Random Processes, 278
8-10 LTI Filtering of Random Processes, 282
8-11 Random Fields, 285
Problems, 288

Objectives

Learn to:

■ Compute conditional probabilities and density functions.

■ Compute means, variances, and covariances for random variables and vectors.

■ Compute autocorrelations and power spectral densities of wide-sense-stationary random processes in continuous and discrete time.

■ Use thresholding and shrinkage of an image's wavelet transform to denoise the image.

(Chapter-opener figure: probability tree for two rolls of a tetrahedral die, with branch probabilities a, b, c, d for faces n = 1 to 4 and outcome probabilities P[n1 n2] ranging over a², ab, ac, ad, ba, b², ..., d².)

This chapter supplies a quick review of probability, random variables, vectors, processes, and fields for use in Chapter 9, which reviews estimation theory and applies it to image estimation. Readers already familiar with these topics may skip this chapter.
S = { E1, E2, E3, E4 }, (8.1)

where E1 to E4 are outcomes (events) defined as:

E1 = { H1 H2 },  E2 = { H1 T2 },
E3 = { T1 H2 },  E4 = { T1 T2 },

where H1 and H2 denote that the results of the first and second flips are heads, and similarly for tails T1 and T2.

Event Probability P: Each element of S, such as E1 through E4, is called an event. Events may also include combinations of elements, such as

Ea = { E2, E4 } = { H1 T2, T1 T2 },

which in this case represents the outcome that the second coin flip is a "tail," regardless of the outcome of the first coin flip. Associated with each event is a certain probability determined by the conditions and constraints of the experiment. If each time

◮ The union of two events Ei and Ej is an OR statement: Ei occurring, Ej occurring, or both, as illustrated by Fig. 8-1(a). ◭

Figure 8-1 (a) The event of the union of two events Ei and Ej encompasses the combined elements of both events, whereas (b) the event of their intersection includes only the elements common to both of them.
256 CHAPTER 8 RANDOM VARIABLES, PROCESSES, AND FIELDS
◮ The intersection of events Ei and Ej is an AND statement (Fig. 8-1(b)); it is denoted Ei ∩ Ej, and its associated probability P[Ei ∩ Ej] represents the condition that both Ei and Ej occur. ◭

◮ Probabilities are assigned by the user for all elements of the sample space S, consistent with the axioms of probability. Using these axioms, the probability of any element of the event space A can then be computed. ◭
For the two-time coin-flip experiment, the occurrence of any one of the four events, E1 through E4, negates the possibility of occurrence of the other three. Hence, if E1 = H1 H2 and E2 = H1 T2,

8-1.3 Properties of Unions and Intersections

Commutative property

Because (E1 ∩ E2) and (E1′ ∩ E2) are disjoint, the probability relationship corresponding to Eq. (8.13a) is

P[E1 ∩ E2] + P[E1′ ∩ E2] = P[E2]. (8.14)

P[Ei ∪ Ej] = P[Ei] + P[Ej] − P[Ei ∩ Ej]. (8.15)

The relationship given by Eq. (8.15) can be derived by using Fig. 8-1(a) to write (Ei ∪ Ej) as the union of three intersections:

Ei ∪ Ej = (Ei ∩ Ej′) ∪ (Ei′ ∩ Ej) ∪ (Ei ∩ Ej), (8.16)

and the corresponding probability is given by

P[Ei ∪ Ej] = P[Ei ∩ Ej′] + P[Ei′ ∩ Ej] + P[Ei ∩ Ej]. (8.17)

Upon setting E1 and E2 in Eq. (8.14) as Ei and Ej, respectively, we obtain

P[Ei ∩ Ej] + P[Ei′ ∩ Ej] = P[Ej]. (8.18a)

Repeating the process but with E1 and E2 set in Eq. (8.14) as Ej and Ei (instead of as Ei and Ej), respectively, leads to

P[Ej ∩ Ei] + P[Ej′ ∩ Ei] = P[Ei]. (8.18b)

Using Eqs. (8.18a and b) in Eq. (8.17) leads to Eq. (8.15).

To illustrate the meaning of Eq. (8.14), let us consider the example of a coin tossed N = 2 times. If we use E1 to denote that the first toss is a head and E2 to denote that the second toss is a tail, then Eq. (8.14) becomes

P[H1 ∩ T2] + P[H1′ ∩ T2] = P[T2]. (8.19)

The first term is the probability that the first toss resulted in a head and the second toss resulted in a tail, and the second term is the probability that the first toss did not result in a head, but the second one did result in a tail. The sum of the two probabilities is equal to the probability that the second toss is a tail, P[T2], regardless of the outcome of the first toss.

8-1.4 Probability Tree for Coin-Flip Experiment

Suppose a coin is flipped N times, and the result of the kth flip, with 1 ≤ k ≤ N, is either a head, designated Hk, or a tail, designated Tk:

P[Hk] = a, (8.20a)
P[Tk] = 1 − a, (8.20b)

for 1 ≤ k ≤ N. For an unbiased coin, a = 0.5. A set designates a specific event, such as H1T2, which can be written as

H1 T2 = H1 ∩ T2.

For N = 2, the sample space S has 2^N = 2² = 4 elements:

S = { H1 ∩ H2, H1 ∩ T2, T1 ∩ H2, T1 ∩ T2 },

and the event space A has 2^(2^N) = 2⁴ = 16 elements:

A = { S, H2, T2, T1H2, T1T2, (T1T2)′, (T1H2)′, H1T2 ∪ T1H2 } ∪ { ∅, H1, T1, H1H2, H1T2, (H1H2)′, (H1T2)′, H1H2 ∪ T1T2 }.

The probability tree of S is shown in Fig. 8-2. Each element of S and its assigned probability are denoted at the end of each branch of the tree. Examples of computing probabilities of union events are given in Example 8-1.

Figure 8-2 Probability tree for the N = 2 coin-flip experiment. The four branches end in H1H2, H1T2, T1H2, and T1T2, with probabilities a², a(1 − a), (1 − a)a, and (1 − a)², respectively.

Example 8-1: Probability Tree for Coin-Flip Experiment

For the coin-flip experiment with N = 2 and P[Hk] = a, compute the probabilities of the following events: (a) H1 ∪ H2, (b) H1H2 ∪ T1T2, and (c) H1T2 ∪ T1H2.

Solution: (a) According to Eq. (8.15),

P[H1 ∪ H2] = P[H1] + P[H2] − P[H1 ∩ H2].

From the tree in Fig. 8-2,

P[H1] = P[H2] = a

and

P[H1 ∩ H2] = a².

Hence,

P[H1 ∪ H2] = a + a − a² = a(2 − a).

(b) Since H1H2 and T1T2 are disjoint,

P[H1H2 ∪ T1T2] = a² + (1 − a)².

(c)

P[H1T2 ∪ T1H2] = P[H1T2] + P[T1H2] − P[H1T2 ∩ T1H2].

Using the tree in Fig. 8-2 and noting that H1T2 and T1H2 are disjoint events, we have

P[H1T2 ∪ T1H2] = a(1 − a) + (1 − a)a = 2a(1 − a).

8-1.5 Probability Tree for Tetrahedral Die Experiment

A tetrahedral die is a four-sided object with one of the four numbers 1, 2, 3, 4 printed on each of its sides (Fig. 8-3). When the die is rolled, the outcome is the number printed on the bottom side.

Condition 1: The result of any die roll has no effect on the result of any other roll.

Condition 2: The die is not necessarily a "fair" die. If we denote n as the number that appears on the bottom of the die after it has been rolled, the probabilities that n = 1, 2, 3, or 4 are

P[n = 1] = a,
P[n = 2] = b,
P[n = 3] = c,
P[n = 4] = d,

with the constraints that 0 ≤ a, b, c, d ≤ 1 and

a + b + c + d = 1.

For a fair die, a = b = c = d = 1/4.

Figure 8-3 Tetrahedral die with four sides displaying numerals 1 to 4. When the die is rolled, the outcome is the numeral on the bottom side.
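Example 8-1's closed-form answers can be confirmed by brute-force enumeration of the four outcomes. Below is an illustrative Python sketch (not from the book; the bias value a = 0.4 is our own choice, and the identities hold for any 0 ≤ a ≤ 1):

```python
from itertools import product

a = 0.4  # assumed coin bias P[Hk] = a
# Joint probability of each two-flip outcome, e.g. "HT" -> a*(1-a)
outcomes = {}
for f1, f2 in product("HT", repeat=2):
    p1 = a if f1 == "H" else 1 - a
    p2 = a if f2 == "H" else 1 - a
    outcomes[f1 + f2] = p1 * p2

def prob(event):
    """Probability of a union of disjoint elementary outcomes."""
    return sum(outcomes[o] for o in event)

pa = prob({"HH", "HT", "TH"})   # H1 ∪ H2       -> a(2 - a)
pb = prob({"HH", "TT"})         # H1H2 ∪ T1T2   -> a^2 + (1-a)^2
pc = prob({"HT", "TH"})         # H1T2 ∪ T1H2   -> 2a(1 - a)
print(pa, pb, pc)
```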
8-2 CONDITIONAL PROBABILITY 259
Figure 8-5 Conditional probability tree for a coin-flip experiment with N = 2, with branch probabilities P[H1] = a, P[T1] = 1 − a, P[H2|H1] = b, P[T2|H1] = 1 − b, and P[H2|T1] = b, leading to outcome probabilities such as P[H1H2] = ab, P[H1T2] = a(1 − b), and P[T1H2] = (1 − a)b. The top red path represents the outcome H1H2 and the bottom blue path represents the outcome T1T2.
that

P[n1 = 1 | n1 + n2 = 3] = P[n1 = 1 ∩ n2 = 2] / P[n1 + n2 = 3]
= P[n1 = 1 ∩ n2 = 2] / (P[n1n2 = 12] + P[n1n2 = 21])
= ab / (ab + ba) = 0.5.

The last entries were obtained from the probability tree in Fig. 8-4. Note that because of the a priori knowledge that n1 + n2 = 3, the probability of n1 = 1 increased from 0.1 to 0.5.

Table 8-1 Probability symbols and terminology.

Sample space: S
Event (examples): E1, E2, ...
Outcome (examples): H, T; x1x2
Empty set (impossible event): ∅
Complement of E ("not E"): E′
Union of Ei and Ej ("Ei or Ej"): Ei ∪ Ej
Intersection of Ei and Ej ("Ei and Ej"): Ei ∩ Ej
Ei and Ej are independent: P[Ei ∩ Ej] = P[Ei] P[Ej]
Ei and Ej are mutually exclusive: P[Ei ∪ Ej] = P[Ei] + P[Ej]

Concept Question 8-2: Why does the conditional probability formula require division by P[B]?

Exercise 8-2: Given that the result of the coin-flip experiment for N = 2 was one head and one tail, compute P[H1].

Answer: P[H1 | H1T2 ∪ T1H2] = 0.5. (See IP).
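The conditional-probability computation ab/(ab + ba) = 0.5 can be reproduced by enumerating the 16 outcomes of two independent rolls. A short Python sketch (ours, not the book's), with the face probabilities taken from Example 8-4:

```python
from itertools import product

# Face probabilities from Example 8-4: P[n = 1..4] = a, b, c, d
a, b, c, d = 0.1, 0.2, 0.3, 0.4
p = {1: a, 2: b, 3: c, 4: d}

# Joint pmf of two independent rolls (n1, n2)
joint = {(n1, n2): p[n1] * p[n2] for n1, n2 in product(p, repeat=2)}

p_sum3 = sum(pr for (n1, n2), pr in joint.items() if n1 + n2 == 3)  # ab + ba
cond = joint[(1, 2)] / p_sum3                                       # ab/(ab + ba)
print(cond)   # 0.5
```

The answer is 0.5 regardless of the individual values of a and b, since ab/(ab + ba) always reduces to 1/2.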
where we used the relation (H1H2) ∩ (T1T2) = ∅. Using the probability tree in Fig. 8-5 gives

P[E1|E2] = a² / (a² + (1 − a)²) = 0.4² / (0.4² + (1 − 0.4)²) = 0.31.

Example 8-4: Tetrahedral Die Conditional Probability

Given that the result of the tetrahedral die experiment of Example 8-2 was n1 + n2 = 3, compute the probability that n1 = 1. Use a = 0.1, b = 0.2, c = 0.3, and d = 0.4.

Solution: From Fig. 8-4, there are two ways to obtain n1 + n2 = 3, namely

8-3 Random Variables

The number of heads n among N flips is a random variable. A random variable is a number assigned to each possible outcome of a random experiment. The range of values that n can assume is from zero (no heads) to N (all heads). Another possible random variable is the number of consecutive pairs of heads among the N flips.

For the tetrahedral die experiment, our random variable might be the number of times among N tosses that the outcome is the number 3, or the number of times the number 3 is followed by the number 2, or many others. In all of these cases, the random variables have real discrete values, so we refer to them as discrete random variables. This is in contrast to continuous random variables, in which the random variable may assume any value over a certain continuous range. We will examine the properties of both types of variables.
p(x′) = lim_{δ→0} (1/δ) P[x′ ≤ x < x′ + δ]. (8.24)

Thus, the probability of x lying within a narrow interval [x′, x′ + δ) is p(x′) δ. The interval probability that x lies between values a and b is

P[a ≤ x < b] = ∫_a^b p(x′) dx′, (8.25a)

and the total probability over all possible values of x is

∫_{−∞}^{∞} p(x′) dx′ = 1. (8.25b)

Figure 8-6 (a) Measured height profile x(y) and (b) pdf p(x) of the digitized height profile, plotted for surface heights x from −20 mm to +20 mm about the mean surface.
8-3.2 Probability Distributions for Discrete Random Variables

Answer: P[x < 1/2] = ∫_0^{1/2} 2x′ dx′ = 1/4.

Exercise 8-4: Random variable n has the pmf p[n] = (1/2)^n for integers n ≥ 1. Compute P[n ≤ 5].

Answer: P[n ≤ 5] = ∑_{n=1}^{5} (1/2)^n = 31/32.

(Figure: example pmf p[n], with values near 0.4, 0.3, 0.2, and 0.1, plotted versus n for n = −1 to 6.)
Table 8-2 Notation and properties of continuous and discrete random variables.

variance of x: σx² = E[x²] − x̄²
interval probability over (a, b): P[a ≤ x < b] = ∫_a^b p(x′) dx′
variance of n: σn² = E[n²] − n̄²
interval probability over [na ≤ n < nb]: P[na ≤ n < nb] = ∑_{n′=na}^{nb−1} p[n′]
compute (a) constant C, (b) the marginal pdf p(x), and (c) the conditional pdf p(y|x) at x = 3/4.

Solution: (a) We note from the definition of p(x, y) that x can extend between 0 and 1, but y extends only between whatever

8-5.2 Discrete Random Variables

For two discrete random variables n and m, their joint probability mass function is p[n, m]. The probability that n is in the range between na and nb − 1, inclusive of those limits, and m is in the
8-5 JOINT PDFS AND PMFS 267
(Figure: a Gaussian pdf p(x) with mean x̄ = 2 and standard deviation σx = 0.45, plotted for x from −3 to 7.)

Figure 8-10 The domain of joint pdf p(x, y) of Example 8-6; for each value of variable x, variable y extends from that value to 1.

Given that

p[n, m] = C for 0 ≤ n ≤ m ≤ 2, and p[n, m] = 0 otherwise,

compute (a) constant C, (b) the marginal pmf p[n], and (c) the conditional pmf p[m|n] at n = 1.

Solution: (a) The joint pmf, depicted in Fig. 8-11, is the discrete equivalent of the joint pdf displayed in Fig. 8-10.
To evaluate p[m|n] at n = 1, we should note that since n ≤ m, the range of m becomes limited to [1, 2]. Hence,

p[m|1] = p[1, m] / p[n = 1] = (1/6)/(2/6) = 1/2 for m = 1,
p[m|1] = (1/6)/(2/6) = 1/2 for m = 2,
p[m|1] = 0 otherwise.

We note that the total conditional probability adds up to 1:

∑_{m=1}^{2} p[m | n = 1] = 1/2 + 1/2 = 1.

8-6 Functions of Random Variables

8-6.1 Mean Value

The mean value, or simply the mean or expectation, of a continuous random variable x characterized by a pdf p(x) is defined as

x̄ = E[x] = ∫_{−∞}^{∞} x′ p(x′) dx′. (8.48a)
where E[xy] is the mean value of the product xy.

◮ The mean value of the weighted sum of two random variables is equal to the sum of their weighted means. ◭

A similar relationship applies to two discrete random variables n and m:

E[an + bm] = a E[n] + b E[m]. (8.52)

8-6.2 Conditional Mean

The conditional expectation, also known as the conditional mean, of random variable x, given that y = y′, uses the conditional pdf p(x|y = y′):

E[x | y = y′] = ∫_{−∞}^{∞} x′ p(x′|y′) dx′. (8.53a)

Note that the conditional mean is defined at a specific value of the second random variable, namely y = y′. For the discrete case,

E[n | m = m′] = ∑_{n′=−∞}^{∞} n′ p[n′|m′]. (8.53b)

8-6.3 Variance and Standard Deviation

The variance of a random variable x with mean value x̄ is the mean value of (x − x̄)², where (x − x̄) is the deviation of x from its mean x̄. Denoted σx², the variance is defined as

σx² = E[(x − x̄)²] = E[x² − 2x̄x + x̄²] = E[x²] − 2x̄ E[x] + x̄² = E[x²] − x̄², (8.54)

where E[x²] is the mean of x². Also,

σ²_{ax} = a² σx².

The square root of the variance is the standard deviation:

σx = (E[x²] − x̄²)^{1/2}. (8.55)

Unlike the mean, the variance of the linear sum of two random variables is not a linear operator. Consider the variable x + y:

σ²_{x+y} = E[((x + y) − E[x + y])²]
= E[(x + y)² − 2(x + y) E[x + y] + (E[x + y])²]. (8.57)

The expectation of (x + y) is linear; i.e.,

E[x + y] = E[x] + E[y]. (8.58)

Use of Eq. (8.58) in Eq. (8.57) leads to

σ²_{x+y} = E[x²] + 2E[xy] + E[y²] − 2E[x](E[x] + E[y]) − 2E[y](E[x] + E[y]) + (E[x] + E[y])², (8.59)

which simplifies to

σ²_{x+y} = (E[x²] − x̄²) + (E[y²] − ȳ²) + 2(E[xy] − x̄ȳ). (8.60)

In view of the definitions given by Eqs. (8.54) and (8.56), Eq. (8.60) can be rewritten as

σ²_{x+y} = σx² + σy² + 2λx,y. (8.61)

8-6.4 Properties of the Covariance

(1) The covariance between a random variable and itself is its variance:

λx,x = σx². (8.62)

(2) The degree of correlation between two random variables is defined by the correlation coefficient ρxy:
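The variance-of-a-sum relation is easy to verify empirically: for any finite sample, the sample variance of x + y equals the sum of the sample variances plus twice the sample covariance, as an algebraic identity. A Python sketch (ours; the particular correlated pair below is an arbitrary construction):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)     # correlated with x by construction

var_sum = np.var(x + y)
lam_xy = np.mean(x * y) - np.mean(x) * np.mean(y)   # sample covariance λx,y
rhs = np.var(x) + np.var(y) + 2 * lam_xy
print(var_sum, rhs)   # identical up to floating-point rounding
```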
Figure 8-12 Three Gaussian distributions, all with mean x̄ = 0 but different variances: σx² = 0.2, 1.0, and 5.0.
Consider two independent random variables x and y. The mean of their product is

E[xy] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x′ y′ p(x′, y′) dx′ dy′
= ∫_{−∞}^{∞} ∫_{−∞}^{∞} x′ y′ px(x′) py(y′) dx′ dy′
= ∫_{−∞}^{∞} x′ p(x′) dx′ ∫_{−∞}^{∞} y′ p(y′) dy′
= x̄ ȳ (x and y independent), (8.67)

which leads to λx,y = E[xy] − x̄ȳ = 0; hence x and y are uncorrelated.

For readers interested in mechanical systems, we note the following correspondences:

compute (a) x̄, (b) σx², (c) E[y | x = 3/4], and (d) σ²_{y | x=3/4}.

Solution: (a) From Eq. (8.42),

p(x) = 2(1 − x) for 0 ≤ x ≤ 1, and p(x) = 0 otherwise.

Hence, the mean value of x is

x̄ = E[x] = ∫_0^1 x′ p(x′) dx′ = ∫_0^1 2x′(1 − x′) dx′ = 1/3.

(b) To compute σx², we start by computing E[x²]:

E[x²] = ∫_0^1 (x′)² p(x′) dx′ = ∫_0^1 2(x′)²(1 − x′) dx′ = 1/6.
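The integrals in parts (a) and (b) can be checked by numerical integration of p(x) = 2(1 − x) over [0, 1]. A Python sketch (ours) using a midpoint Riemann sum:

```python
import numpy as np

M = 200_000
x = (np.arange(M) + 0.5) / M         # midpoints of M subintervals of [0, 1]
dx = 1.0 / M
p = 2.0 * (1.0 - x)                  # the marginal pdf of Example 8-8

total = np.sum(p) * dx               # -> 1 (valid pdf)
mean = np.sum(x * p) * dx            # -> 1/3
second = np.sum(x**2 * p) * dx       # -> 1/6
var = second - mean**2               # -> 1/6 - 1/9 = 1/18
print(total, mean, second, var)
```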
Hence,

σ²_{y | x=3/4} = 37/48 − (7/8)² = 1/192.

Exercise 8-8: Show that for the joint pdf given in Example 8-8, λx,y = 1/36.

Answer: (See IP).
example, the classes would be water, urban, forest, etc. And if the multiple images are ultrasound medical images of the same scene but acquired under different conditions or different times, the classes would be bone, tissue, etc.

When two or more random variables are associated with an event of interest, we form a random vector (RV). A random vector x of length N is a column vector comprising N random variables { x1, x2, ..., xN }:

x = [x1, x2, ..., xN]^T. (8.71)

We write x in bold to denote that it is a vector, and here the superscript "T" denotes the transpose operation, which in this case converts a horizontal vector into a column vector.

◮ The notation and properties of random vectors are summarized in Table 8-3. ◭

8-7.1 Mean Vector and Covariance Matrix

The mean vector x̄ of random vector x is the vector of the mean values of the components of x:

x̄ = E[x] = [x̄1, x̄2, ..., x̄N]^T. (8.73)

In analogy with Eq. (8.56), the covariance matrix Kx of random vector x comprises the covariances between xi and xj:

Kx = E[(x − x̄)(x − x̄)^T] = E[xx^T] − x̄ x̄^T =
[ λx1,x1  λx1,x2  ...  λx1,xN
  λx2,x1  λx2,x2  ...  λx2,xN
  ...
  λxN,x1  λxN,x2  ...  λxN,xN ], (8.74)

where λxi,xj = E[xi xj] − x̄i x̄j. Similarly, the cross-covariance matrix Kx,y between random vector x of length N and random vector y of length M is given by the (N × M) matrix

Kx,y = E[(x − x̄)(y − ȳ)^T] = E[xy^T] − x̄ ȳ^T =
[ λx1,y1  λx1,y2  ...  λx1,yM
  λx2,y1  λx2,y2  ...  λx2,yM
  ...
  λxN,y1  λxN,y2  ...  λxN,yM ]. (8.75)

We note that Kx,x = Kx = Kx^T and Ky,x = Kx,y^T. Additionally, if x and y are uncorrelated, Kx,y = 0.

y = E[y] = E[Ax] = A x̄. (8.78)

To demonstrate the validity of Eq. (8.78), we rewrite y in terms of its individual elements:

yi = ∑_{j=1}^{N} Ai,j xj, 1 ≤ i ≤ M, (8.79)

where Ai,j is the (i, j)th element of A. Taking the expectation of yi, while recalling from Eq. (8.51) that the expectation is a linear operator, gives

E[yi] = ∑_{j=1}^{N} Ai,j E[xj], 1 ≤ i ≤ M. (8.80)

Since this result is equally applicable to all random variables yi of random vector y, it follows that Eq. (8.78) is true.

Using the matrix algebra property

(Ax)^T = x^T A^T, (8.81)
Similarly, the cross-covariance matrix between y and x is obtained by applying the basic definition for Ky,x (as given in the second step of Eq. (8.75)), after interchanging x and y.

Random Vectors x and y: Covariance matrix Kx = E[xx^T] − x̄x̄^T (Table 8-3 excerpt)

The quantities defining the expression for p(x) given by Eq. (8.85) are the length N of the random vector, the mean-

(Footnote: the mathematical definition of the Fourier transform uses a "+" sign in the exponent in Eq. (8.91). For consistency with the engineering definition of the Fourier transform used throughout this book, we use a "−" sign instead.)
any function of x, such as g(x), is by definition

E[g(x)] = ∫_{−∞}^{∞} g(x′) p(x′) dx′. (8.92)

By extension, Eq. (8.91) reduces to

Φx(µ) = E[e^{−j2π µ^T x}]. (8.93)

Since x is a Gaussian random vector, use of its pdf, as defined by Eq. (8.85), in Eq. (8.91) can be shown to lead to

Φx(µ) = e^{−j2π µ^T x̄ − 2π² µ^T Kx µ}. (8.94)

We note that the first term in Eq. (8.94) is the N-dimensional generalization of entry #3 in Table 3-1, and the second term is the generalization of entry #13 in the same table.

Now, we wish to derive the characteristic function Φy(µ) of vector y = Ax. We start by introducing the spatial frequency vector µ̃ and its transpose:

µ̃ = A^T µ, (8.95a)

8-8.2 Properties of Gaussian Random Vectors

A. Marginals of Gaussian random vectors are Gaussian

Let x be a Gaussian random vector partitioned into two vectors y and z:

x = [y^T, z^T]^T. (8.100)

Next, if we define matrix A as

A = [I 0], (8.101)

it follows that y is related to x by

y = Ax = [I 0] [y^T, z^T]^T. (8.102)

According to the result given by Eq. (8.99), if x is a Gaussian random vector, so is y, which proves that the marginal of a Gaussian random vector is itself a Gaussian random vector.
The joint pdf of z has the form of Eq. (8.85) with x replaced by z:

p(z) = (1 / ((2π)^{N/2} (det Kz)^{1/2})) e^{−(1/2)(z − z̄)^T Kz^{−1} (z − z̄)}. (8.104)

The mean vector and covariance matrix of z are given by

z̄ = E[z] = [x̄; ȳ] (8.105a)

and

Kz = [ Kx      Kx,y
       Kx,y^T  Ky  ]. (8.105b)

Since x and y are uncorrelated, Kx,y = 0, in which case Kz becomes a block-diagonal matrix:

Kz = [ Kx   0
       0^T  Ky ], (8.106a)

det Kz = (det Kx)(det Ky), (8.106b)

and

Kz^{−1} = [ Kx^{−1}  0
            0^T      Ky^{−1} ]. (8.106c)

Moreover, the exponent of Eq. (8.104) becomes

−(1/2) [x − x̄; y − ȳ]^T [ Kx^{−1}  0; 0^T  Ky^{−1} ] [x − x̄; y − ȳ]
= −(1/2)(x − x̄)^T Kx^{−1}(x − x̄) − (1/2)(y − ȳ)^T Ky^{−1}(y − ȳ). (8.107)

In view of the results represented by Eqs. (8.106b) and (8.107), the pdf of z simplifies to

p(z) = (1 / ((2π)^{Nx/2} (2π)^{Ny/2} (det Kx)^{1/2} (det Ky)^{1/2}))
× e^{−(1/2)(x − x̄)^T Kx^{−1}(x − x̄)} e^{−(1/2)(y − ȳ)^T Ky^{−1}(y − ȳ)}
= p(x) p(y), (8.108)

C. Conditional Gaussian random vectors

If z is a Gaussian random vector partitioned into vectors x and y as defined by Eq. (8.103), namely

z = [x; y], (8.109)

then the conditional pdf p(x | y = y′) also is Gaussian:

p(x | y = y′) ∼ N(x̄_{|y=y′}, Kx|y), (8.110)

with

x̄_{|y=y′} = E[x | y = y′] = x̄ + Kx,y Ky^{−1} (y′ − ȳ) (8.111a)

and

Kx|y = Kx − Kx,y Ky^{−1} Kx,y^T. (8.111b)

Interchanging x and y everywhere in Eqs. (8.110) and (8.111) provides expressions for the conditional pdf p(y | x = x′). Deriving these relationships involves a rather lengthy mathematical process, which we do not include here (see Problem 8-13).

Example 8-9: Gaussian Random Vectors

Random vector x = [x1; x2] has a joint pdf

p(x) ∼ N(0, Kx), (8.112)

with a covariance matrix

Kx = [ 5  2
       2  1 ]. (8.113)

Random vector y = [y1; y2] is related to x by

Determine: (a) σ²_{x1} and λ_{x1,x2}, (b) σz², (c) Ky, (d) Ky,x, and
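Eqs. (8.111a) and (8.111b) can be checked by sampling a jointly Gaussian pair and conditioning on a narrow window around y = y′. A Python sketch (ours); the scalar blocks Kx = 2, Ky = 1, Kx,y = 0.8 are assumed values for illustration, not the Kx of Example 8-9:

```python
import numpy as np

# Assumed joint covariance for z = [x; y] with scalar blocks
Kx, Ky, Kxy = 2.0, 1.0, 0.8
Kz = np.array([[Kx, Kxy], [Kxy, Ky]])

rng = np.random.default_rng(2)
z = rng.multivariate_normal([0.0, 0.0], Kz, size=1_000_000)
x, y = z[:, 0], z[:, 1]

yp = 1.5
sel = np.abs(y - yp) < 0.05              # keep samples with y ≈ y'
cond_mean = x[sel].mean()                # Eq. (8.111a): (Kxy/Ky) y' = 1.2
cond_var = x[sel].var()                  # Eq. (8.111b): Kx - Kxy^2/Ky = 1.36
print(cond_mean, cond_var)
```

Note that the conditional variance does not depend on y′, a special property of the Gaussian case.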
Figure 8-13 x[n] is the air temperature at discrete time n. At time n = 1, atmospheric conditions lead to a probability model for random variable x[1], given by a pdf p(x[1]) and a mean value x̄[1]. Similar models characterize x[n] at each n (the figure shows the pdfs p(x[1]), p(x[3]), and p(x[5]) for n = 1 to 6). The sequence of random variables { ..., x[−2], x[−1], x[0], x[1], ... } constitutes a discrete-time random process.
Mean value

x̄[n] = E[x[n]]. (8.120)

A zero-mean random process is a process with x̄[n] = 0.

B. Gaussian random process

A random process is Gaussian if each finite subset of the infinite set of random variables { x[n] } is a jointly Gaussian set of random variables, and therefore they can be stacked into a
Cross-covariance function

Kxy[i, j] = λ_{x[i],y[j]} = E[(x[i] − x̄[i])(y[j] − ȳ[j])] = E[x[i] y[j]] − x̄[i] ȳ[j]. (8.122)

Random processes x[n] and y[n] are uncorrelated if

Kxy[n, m] = 0 for all n and m (uncorrelated).

Autocorrelation function

Rx[i, j] = E[x[i] x[j]] = Kx[i, j] + x̄[i] x̄[j]. (8.123)

For a zero-mean random process with x̄[i] = x̄[j] = 0,

Rx[i, j] = Kx[i, j] (zero-mean process). (8.124)

Cross-correlation function

Rxy[i, j] = E[x[i] y[j]] = Kxy[i, j] + x̄[i] ȳ[j]. (8.125a)

For zero-mean random processes,

Rxy[i, j] = Kxy[i, j] (zero-mean process). (8.125b)

8-9.3 Wide-Sense Stationary (WSS) Random Process

A. Discrete time

A random process is considered wide-sense stationary (WSS), also known as weak-sense stationary, if it has the following three properties:

(a) The mean x̄[n] is constant for all values of n.

(b) The autocovariance function Kx[i, j] is a function of the difference (i − j), rather than of i or j explicitly. That is,

Kx[i, j] = Kx[i − j]. (8.126)

◮ For the sake of simplicity, we will henceforth assume that all WSS random processes have zero mean. ◭

For a zero-mean random process, with x̄[i] = ȳ[j] = 0 for all i and j, the combination of Eqs. (8.121) through (8.127) leads to

Rx[i − j] = Kx[i − j] = E[x[i] x[j]] (8.128a)

and

Rxy[i − j] = Kxy[i − j] = E[x[i] y[j]]. (8.128b)

Changing variables to n = i − j gives, for any j,

Rx[n] = E[x[n + j] x[j]], (8.129a)
Rxy[n] = E[x[n + j] y[j]]. (8.129b)

If the process is IID, x[n + j] and x[j] are independent random variables, except for n = 0. Hence, Rx[n] becomes

Rx[n] = E[x²[j]] for n = 0, and Rx[n] = E[x[n + j]] E[x[j]] for n ≠ 0. (8.130)

Since the process is presumed to be zero-mean, which means that x̄[i] = 0 for any i, Rx[n] simplifies to

Rx[n] = σ² δ[n], (8.131)

where the variance is

σ² = E[x²[j]]. (8.132)

B. Continuous-time processes

All of the definitions and relationships introduced earlier for discrete-time random processes generalize directly to continuous-time random processes. For example, x(t) is a Gaussian random process if
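The result Rx[n] = σ²δ[n] for a zero-mean IID process is easy to see empirically: the sample autocorrelation of white noise is σ² at lag 0 and near zero at every other lag. A Python sketch (ours):

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = 2.0
x = rng.normal(0.0, sigma, size=1_000_000)   # zero-mean IID (white) process

# Sample estimate of Rx[n] = E[x[k+n] x[k]] at a few lags
Rx = {n: np.mean(x[n:] * x[:x.size - n]) for n in [0, 1, 2, 5]}
for n, val in Rx.items():
    print(n, val)   # ≈ sigma^2 = 4 at n = 0, ≈ 0 otherwise
```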
WSS random process, we defined the autocorrelation and cross-correlation functions in terms of the discrete-time difference n = i − j. By analogy, we define τ = ti − tj for the continuous-time case and then we generalize by replacing tj with simply t, which leads to

Rx(τ) = Kx(τ) = E[x(τ + t) x(t)] (zero-mean WSS) (8.134a)

and

Rxy(τ) = Kxy(τ) = E[x(τ + t) y(t)] (zero-mean WSS). (8.134b)

These expressions are for a zero-mean WSS random process. Furthermore, if the process is also IID,

Rx(τ) = σ² δ(τ). (8.135)

In general, Rx(τ) is related to the power spectral density of the signal, Sx(f), by the Fourier transform:

Sx(f) = F{Rx(τ)} = ∫_{−∞}^{∞} Rx(τ′) e^{−j2πfτ′} dτ′ (8.139a)

and

Rx(τ) = F⁻¹{Sx(f)} = ∫_{−∞}^{∞} Sx(f′) e^{j2πf′τ} df′. (8.139b)

Setting τ = 0 leads to

E[x²(t)] = Rx(0) = ∫_{−∞}^{∞} Sx(f′) df′. (8.140)

For the special case where Sx(f) is zero at all frequencies except over an infinitesimally narrow band of width B centered at f′ = f0, the expression given by Eq. (8.140) reduces to

E[x²(t)] = Sx(f0) B. (8.141)
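The Fourier pair in Eqs. (8.139a, b) can be exercised numerically. The Python sketch below (ours) evaluates Eq. (8.139b) by direct numerical integration for the Lorentzian spectrum Sx(f) = 4/((2πf)² + 4), whose known inverse transform is e^{−2|τ|}:

```python
import numpy as np

# Evaluate Eq. (8.139b) numerically for Sx(f) = 4/((2πf)^2 + 4)
f = np.linspace(-200.0, 200.0, 2_000_001)
Sx = 4.0 / ((2 * np.pi * f) ** 2 + 4.0)
df = f[1] - f[0]

R = {}
for tau in [0.0, 0.5, 1.0]:
    # Sx is even in f, so the inverse transform reduces to a cosine integral
    R[tau] = np.sum(Sx * np.cos(2 * np.pi * f * tau)) * df
    print(tau, R[tau], np.exp(-2 * abs(tau)))
```

The small residual discrepancy comes from truncating the integration at |f| = 200.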
Concept Question 8-10: Why can we assume that a WSS random process has zero mean?

Exercise 8-10: A zero-mean WSS random process has power spectral density

Sx(f) = 4 / ((2πf)² + 4).

What is its autocorrelation function?

Answer: R(τ) = F⁻¹{Sx(f)}. From entry #3 in Table 2-5, R(τ) = e^{−2|τ|}.

where we assumed that E[x(t − α)] = 0, based on our earlier assumption that x(t) is a zero-mean WSS random process. Hence, y(t) is zero-mean.

A. Autocorrelation of output

Let us consider y(t) at times t1 and t2; upon replacing t with t1 and dummy variable α with α1, and then repeating the process at t2, we have

y(t1) = h(t1) ∗ x(t1) = ∫_{−∞}^{∞} h(α1) x(t1 − α1) dα1 (8.148a)

and
8-10 LTI FILTERING OF RANDOM PROCESSES 283
(Figure: sample functions of a white process plotted over 0 ≤ t ≤ 1; panel (a) shows an incorrect plot of a white process, and panel (b) a correct plot.)
y(t2) = h(t2) ∗ x(t2) = ∫_{−∞}^{∞} h(α2) x(t2 − α2) dα2. (8.148b)

Multiplication of y(t1) by y(t2) gives

y(t1) y(t2) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} h(α1) h(α2) x(t1 − α1) x(t2 − α2) dα1 dα2. (8.149)

Taking the expectation E[·] of both sides gives

Ry(t1, t2) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} h(α1) h(α2) Rx(t1 − t2 − α1 + α2) dα1 dα2. (8.150)

Since we have already shown that E[y(t)] = 0, it follows that Ry(t1, t2) = Ry(t1 − t2) and y(t) is WSS. Upon defining
Next, we multiply both sides by x(t − α2):

y(t) x(t − α2) = ∫_{−∞}^{∞} h(α1) x(t − α1) x(t − α2) dα1, (8.154)

which states that x(t) and y(t) are jointly WSS. Noting that

Ryx(α2) = h(α2) ∗ Rx(α2). (8.156)

These relationships presume our earlier characterization that x[n] and y[n] are zero-mean, jointly WSS processes. The DTFT maps convolutions in the discrete-time domain to products in the frequency domain:

The input x(t) to an LTI system defined by frequency response

H(f) = Y(f)/X(f) = 5/(j2πf + 7).

From Eq. (8.144), Sx(f) = σ² when x(t) is a white WSS process with Rx(τ) = σ²δ(τ). Hence, in the present case, Sx(f) = 3, and the power spectral density of y(t) is, from Eq. (8.152),

Sy(f) = |H(f)|² Sx(f) = |5/(j2πf + 7)|² × 3 = 75/(4π²f² + 49).

Using entry #3 in Table 2-5, the inverse Fourier transform of Sy(f) is

Ry(τ) = (75/14) e^{−7|τ|}.

Exercise 8-12: x(t) is a white process with Rx(τ) = 5δ(τ) and

y(t) = ∫_{t−1}^{t+1} x(τ) dτ.

Compute the power spectral density of y(t).

Answer: Sy(f) = 20 sinc²(2f). (See IP).

8-11 Random Fields

The preceding sections highlighted the properties and relations for 1-D random processes, in both continuous and discrete time. Now, we extend those results to 2-D space; to distinguish between 1-D and 2-D processes, we refer to the latter as a random field instead of a random process, and we also change our symbols to match the notation we used in earlier chapters to represent 2-D images.
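The worked result above can be sanity-checked through Eq. (8.140): integrating Sy(f) = 75/(4π²f² + 49) over all f must return Ry(0) = 75/14. A Python sketch (ours):

```python
import numpy as np

f = np.linspace(-400.0, 400.0, 2_000_001)
H = 5.0 / (1j * 2 * np.pi * f + 7.0)     # H(f) = 5/(j2πf + 7)
Sy = (np.abs(H) ** 2) * 3.0              # Sy = |H|^2 Sx with Sx(f) = 3
df = f[1] - f[0]

total_power = np.sum(Sy) * df
print(total_power, 75.0 / 14.0)          # both ≈ 5.357
```

The slight shortfall in the numerical value comes from truncating the integral at |f| = 400.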
(1) E[f(x, y)] = constant for all (x, y), and

(2) Rf(x, y) = E[f(x′ + x, y′ + y) f(x′, y′)].

The corresponding relations for discrete-space random fields are

(1) E[f[n, m]] = constant for all [n, m], and

(2) Rf[n, m] = E[f[n′ + n, m′ + m] f[n′, m′]].

The power spectral density Sf(µ, ν) in the continuous-space frequency domain (µ, ν) is related to Rf(x, y) by

Sf(µ, ν) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} Rf(x, y) e^{−j2π(µx + νy)} dx dy (continuous-space zero-mean WSS), (8.164a)

and the 2-D equivalent of Eq. (8.140) is

E[f²(x, y)] = Rf(0, 0) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} Sf(µ, ν) dµ dν (continuous-space zero-mean WSS). (8.164b)

For a discrete-space WSS random field with zero mean,

Sf(Ω1, Ω2) = ∑_{n=−∞}^{∞} ∑_{m=−∞}^{∞} Rf[n, m] e^{−j(Ω1 n + Ω2 m)} (discrete-space zero-mean WSS). (8.165a)

Rg(x, y) = h(x, y) ∗∗ h(−x, −y) ∗∗ Rf(x, y), (8.167a)
Rg[n, m] = h[n, m] ∗∗ h[−n, −m] ∗∗ Rf[n, m], (8.167b)
Rgf(x, y) = h(x, y) ∗∗ Rf(x, y), (8.167c)
Rgf[n, m] = h[n, m] ∗∗ Rf[n, m], (8.167d)
Sg(µ, ν) = |H(µ, ν)|² Sf(µ, ν), (8.167e)
Sg(Ω1, Ω2) = |H(Ω1, Ω2)|² Sf(Ω1, Ω2), (8.167f)
Sgf(µ, ν) = H(µ, ν) Sf(µ, ν), (8.167g)
Sgf(Ω1, Ω2) = H(Ω1, Ω2) Sf(Ω1, Ω2). (8.167h)

Exercise 8-13: A white random field f(x, y) with autocorrelation function Rf(x, y) = 2δ(x) δ(y) is filtered by an LSI system with PSF h(r) = 1/r, where r is the radius in the (x, y) plane. Compute the power spectral density of the output random field g(x, y).

Answer: Sg(µ, ν) = 2/(µ² + ν²). (See IP).
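Eq. (8.167f) can be illustrated for a discrete white field: the averaged periodogram of filtered white noise approaches |H(Ω1, Ω2)|² Sf. A Python sketch (ours), with an assumed 3 × 3 averaging PSF and circular (FFT-based) convolution:

```python
import numpy as np

rng = np.random.default_rng(4)
N, trials = 64, 800
h = np.zeros((N, N))
h[:3, :3] = 1.0 / 9.0                    # assumed 3x3 averaging PSF
H = np.fft.fft2(h)

acc = np.zeros((N, N))
for _ in range(trials):
    f_field = rng.normal(size=(N, N))    # white field with Sf = 1
    g = np.real(np.fft.ifft2(np.fft.fft2(f_field) * H))  # circular convolution
    acc += np.abs(np.fft.fft2(g)) ** 2 / N**2            # periodogram of g
Sg_est = acc / trials

err = np.max(np.abs(Sg_est - np.abs(H) ** 2))
print(err)   # shrinks as the number of trials grows
```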
Summary

Concepts

• A random variable is a number assigned to each random outcome.
• A probability density function (pdf) p(x) is defined so that p(x) δ is the probability that random variable x lies in an interval of length δ about x.
• A Gaussian random variable is described by its mean and variance.
• A random vector is a vector of random variables. Mean and variance generalize to mean vector and covariance matrix.
• A random process is a set of random variables indexed by time.
• A wide-sense stationary (WSS) random process has constant mean and autocorrelation Rx[i, j] = Rx[i − j]. The constant mean is subtracted off.
• A random field is a 2-D random process in 2-D space.
Mathematical Formulae
Conditional probability Covariance matrix
P[E1 ∩ E2 ] Kx = E[xxT] − xxT
P[E1 |E2 ] =
P[E2 ]
Covariance matrix
Interval probability
Z b KAx = AKx AT
P[a ≤ x < b] = p(x′ ) dx′
a Gaussian random vector
Interval probability x ∼ N (x, Kx )
Z bx Z by T −1
e−(1/2)(x−x) Kx (x−x)
P[ax ≤ x < bx , ay ≤ y < by ] = p(x′ , y′ ) dx′ dy′ p(x) =
ax ay (2π )N/2 (det Kx )1/2
Conditional pdf Conditional Gaussian expectation
p(x|y) =
p(x, y) E[x|y] = E[x] + Kx,yK−1
y (y − E[y])
p(y)
Expectation Autocovariance function
Z ∞ Kx [i, j] = E[x[i] x[ j]] − E[x[i]] E[x[ j]]
E[ f (x)] = f (x) = f (x′ ) p(x′ ) dx′
−∞
Autocorrelation function
Variance Rx [i, j] = E[x[i] x[ j]]
σx2 = E[x2 ] − x2
Rx (t1 ,t2 ) = E[x(t1 ) x(t2 )]
Covariance
Wide-sense stationary random process
λx,y = E[xy] − xy
Rx (t1 ,t2 ) = Rx (t1 − t2 ); x(t)] = m
Gaussian pdf
1 2 2
Power spectral density
p(x) = p e−(x−x) /(2σx ) Sy ( f ) = F {Ry (t)} = |H( f )|2 Sx ( f )
2πσx2
Mean vector Cross-spectral density
Syx ( f ) = F {Rx,y (t)} = H( f ) Sx ( f )
E[x] = x = [x1 , . . . , xN ]T
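Several of the formulae above can be checked by simulation. The sketch below (illustrative numbers only) estimates Kx = E[xx^T] − x̄x̄^T from samples of a Gaussian random vector and verifies the linear-transformation rule K_Ax = A Kx A^T:

```python
import numpy as np

rng = np.random.default_rng(1)

# Sample a Gaussian random vector x with a known covariance (illustrative numbers).
K_true = np.array([[2.0, -1.0],
                   [-1.0, 2.0]])
mean = np.array([1.0, -2.0])
L = np.linalg.cholesky(K_true)
x = mean + rng.standard_normal((100_000, 2)) @ L.T   # rows are realizations

# Kx = E[x x^T] - xbar xbar^T, estimated from the samples.
xbar = x.mean(axis=0)
Kx = (x.T @ x) / len(x) - np.outer(xbar, xbar)

# K_Ax = A Kx A^T for the linear transformation y = A x.
A = np.array([[1.0, 1.0],
              [0.0, 2.0]])
y = x @ A.T
ybar = y.mean(axis=0)
Ky = (y.T @ y) / len(y) - np.outer(ybar, ybar)

print(np.round(Kx, 2))                  # ≈ K_true
print(np.round(Ky - A @ Kx @ A.T, 2))   # ≈ zero matrix
```

The sample covariance of y = Ax equals A Kx A^T exactly (up to floating-point error), since the transformation is deterministic and linear.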
288 CHAPTER 8 RANDOM VARIABLES, PROCESSES, AND FIELDS
Important Terms: Provide definitions or explain the meaning of the following terms: autocovariance function, axioms of probability, conditional probability, covariance matrix, cross-covariance function, disjoint, iid random process, independent, mutually exclusive, pdf, pmf, power spectral density, probability and trees, random field, realization, sample and event spaces, sample function, vector random variable, white random process, wide-sense stationary.
PROBLEMS

Section 8-2: Conditional Probability

8.1 Three-Card Monte (a game hucksters play with chumps—don't be a chump!). There are three cards. Card #1 is red on both sides. Card #2 is black on both sides. Card #3 is red on one side and black on the other side. The cards are shuffled and one chosen at random. The top of the chosen card is red. What is P[bottom of the chosen card is also red]? (The chump bets even money.)

8.2 A Tale of Two Tosses. Two coins have P[heads] as follows: Coin A has P[heads] = 1/3. Coin B has P[heads] = 3/4. The results of all flips are independent. We choose a coin at random and flip it. Given that it came up heads, compute P[it was coin A]. Hint: Draw a probability tree.

8.3 Three Coins in the Fountain (an old movie). Three coins have P[heads]: Coin A has P[heads] = 2/3. Coin B has P[heads] = 3/4. Coin C has P[heads] = 4/5. The results of all flips are independent. Coin A is flipped. Then:
• If coin A lands heads, we flip coin B.
• If coin A lands tails, we flip coin C.
Let H2 denote that the second coin flipped (whatever it is) lands heads.
(a) Compute P[H2].
(b) Compute P[Coin A landed heads, given H2]. Now the second coin is flipped n − 1 more times (for a total of n flips). Let H2n denote that all n flips of the second coin land heads.
(c) Compute P[H2n].
(d) Compute P[Coin A landed heads, given H2n].
(e) What happens to your answer to (d) as n → ∞? Explain this. Hint: Draw a probability tree.

8.4 Bayes's Rule. Widgets are made by two factories. Factory A makes 6000 widgets per year, 2% of which are defective. Factory B makes 4000 widgets per year, 3% of which are defective. A widget is chosen at random from the 10,000 widgets made in a year.
(a) Compute P[the chosen widget is defective].
(b) Compute P[the chosen widget came from factory A, given that it is defective].

8.5 Bayes's Rule. Widgets are made by two factories. Factory A makes 7000 widgets per year, 1% of which are defective. Factory B makes 3000 widgets per year, 2% of which are defective. A widget is chosen at random from the 10,000 widgets made in a year.
(a) Compute P[the chosen widget is defective].
(b) Compute P[the chosen widget came from factory A, given that it is defective].

Section 8-3: Random Variables

8.6 Random variables x and y have the joint pdf
p(x, y) = cx if 1 < x < y < 2, and 0 otherwise,
where c is a constant to be determined.
(a) Compute the constant c in the pdf p(x, y).
(b) Are x and y independent? Explain your answer.
(c) Compute the marginal pdf p(y).
(d) Compute the conditional pdf p(x|y) at y = 3/2.

8.7 Random variables x and y have the joint pdf
p(x, y) = cxy if 0 < y < x < 1, and 0 otherwise,
where c is a constant to be determined.
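Problems like 8.1 are intended for hand solution with a probability tree, but a quick Monte Carlo sketch is a handy way to check a hand answer:

```python
import random

random.seed(0)
# Cards: (face-1 color, face-2 color); orientation is randomized per draw.
cards = [("red", "red"), ("black", "black"), ("red", "black")]

red_top = 0
red_bottom_too = 0
for _ in range(200_000):
    a, b = random.choice(cards)
    if random.random() < 0.5:       # randomize which face is up
        a, b = b, a
    if a == "red":
        red_top += 1
        if b == "red":
            red_bottom_too += 1

p_est = red_bottom_too / red_top
print(p_est)    # ≈ 2/3, not the 1/2 the chump expects
```

Of the three red faces that could be showing, two belong to the red-red card, which is why the estimate concentrates near 2/3.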
(a) Compute the constant c in the pdf p(x, y).
(b) Are x and y independent? Explain your answer.
(c) Compute the marginal pdf p(x).
(d) Compute the conditional pdf p(y|x) at x = 1/2.

8.8 Random variable x has the exponential pdf
p(x) = λe^{−λx} for x > 0, and p(x) = 0 for x < 0.
(a) Confirm that ∫_{−∞}^{∞} p(x) dx = 1.
(b) Compute the expectation x̄.
(c) Compute the variance σx².

Section 8-5: Joint Pdfs and Pmfs

8.9 Random variables m and n have the joint pmf
p[m, n] = c if 0 ≤ n ≤ m ≤ 4, and 0 otherwise,
where c is a constant to be determined and m and n are integers.
(a) Compute the constant c in the pmf p[m, n].

Section 8-7: Random Vectors

8.11 Random variables x and y are zero-mean and jointly Gaussian, with variances σx² = σy² = 1 and covariance λx,y = ρ for some constant ρ ≠ ±1.
(a) Write out the formula for the joint pdf p(x, y).
(b) Show that changing variables from (x, y) to (z, w), where
[z; w] = (1/√2) [1 1; 1 −1] [x; y],
yields two decorrelated (λz,w = 0) random variables z and w.

8.12 Random variables {x1, x2, x3} are all zero-mean and jointly Gaussian, with variances σ²_{xi} = 2 and covariances λ_{xi,xj} = −1 for i ≠ j.
(a) Write out the joint pdf p(x1, x2, x3). Use Eq. (8.85) notation.
(b) Write out the joint marginal pdf p(x1, x3). Use Eq. (8.85) notation.
(c) Let y = x1 + 2x2 + 3x3. Compute σy².
(d) Show that the conditional pdf p(x1|x2 = 2, x3 = 3) = δ(x1 + 5). Hint: Compute the eigenvalues and eigenvectors of the covariance matrix.

8.13 Prove Eq. (8.111a) and Eq. (8.111b), which are:
E[x|y = y′] = E[x] + K_{x,y} K_y^{−1} (y′ − E[y]),

8.15 x(t) is a zero-mean WSS random process with power spectral density Sx(f) = 1. x(t) is passed through the LTI system dy/dt + 4y(t) = 3x(t).
(a) Compute the power spectral density Sy(f).
(b) Compute the cross-spectral density Syx(f).
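A simulation sketch for Problem 8.11(b), with an illustrative ρ = 0.7, shows the 45° change of variables decorrelating x and y:

```python
import numpy as np

rng = np.random.default_rng(2)
rho = 0.7                       # illustrative correlation, |rho| < 1

# Jointly Gaussian x, y with unit variances and covariance rho.
K = np.array([[1.0, rho], [rho, 1.0]])
xy = rng.standard_normal((200_000, 2)) @ np.linalg.cholesky(K).T

# Rotate: z = (x + y)/sqrt(2), w = (x - y)/sqrt(2).
T = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
zw = xy @ T.T

cov_zw = np.cov(zw.T)
print(np.round(cov_zw, 2))   # diagonal ≈ [1 + rho, 1 - rho], off-diagonal ≈ 0
```

The off-diagonal entry λz,w estimates to zero, consistent with K_Ax = A Kx A^T applied to the rotation matrix.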
Filtering, 307
9-10 Markov Random Fields, 322
9-11 Application of MRF to Image Segmentation, 327
Problems, 331

Chapter-opener figure: (b) plot of log10(Sf(0, Ω2)) versus log10(Ω2) for π/240 ≤ Ω2 ≤ π — the average slope is −2; (c) blurred image; (e) reconstructed image using a stochastic Wiener filter; (f) reconstructed image using a deterministic Wiener filter.

Objectives

Learn to:

■ Compute MLE, MAP, and LS estimates for small analytic problems.

■ Estimate parameters of a fractal power spectral density from a signal or image.

■ Denoise and deconvolve fractal signals and images using stochastic filters.

■ Denoise an image by thresholding and shrinkage of its wavelet transform.

■ Use the ICM algorithm to segment an image.

This chapter provides a quick review of estimation theory, including MLE, MAP, LS, and LLSE estimators. It then derives stochastic versions of the deterministic denoising and deconvolution filters presented in earlier chapters. Incorporating a priori information, in the form of power spectral density, is shown to greatly improve the performance of denoising and deconvolution filters on 1-D and 2-D problems. A very quick presentation of Markov random fields and the ICM algorithm for image segmentation, with examples, is also provided.
Overview

In a deterministic (non-random) inverse problem, the goal is to compute an unknown 1-D signal x(t) or x[n], or a 2-D image f(x, y) or f[n, m], from observation of a corresponding signal y(t) or y[n], or corresponding image g(x, y) or g[n, m], wherein the unknown quantity and its observed counterpart are linked by a known model. The inversion process may also take into consideration side information about the unknown quantity, such as non-negativity, or a priori information, in the form of a pdf or pmf for it. The solution of an inverse problem may be difficult, but a well-posed inverse problem always has a unique solution.

An estimation problem is a stochastic (random) version of the inverse problem. The observation is a 1-D random variable or random process, or a 2-D random field. The random character of the observation may be associated with the inherent nature of the observing system itself—examples include laser imagers and certain radar systems—or due to additive noise introduced in the system's receiver. An image generated by a monochromatic laser imager or synthetic-aperture radar usually exhibits a speckle-like randomness superimposed on the true image intensity that would have been measured by a wide-bandwidth sensor.

Based on knowledge of the sources and mechanisms responsible for the randomness associated with the observed signal or image, we can incorporate randomness (stochasticity) in the model that relates the unknown quantity to the observation by modeling the unknown quantity as a random variable with a characteristic pdf or pmf. This pdf or pmf represents a priori information about the unknown quantities. Because of the stochastic nature of the inverse problem, we call it an estimation problem. The formulation of an estimation problem usually leads to a likelihood function, and the goal of the estimation problem becomes to maximize the likelihood function; the value of the unknown signal or image that maximizes the likelihood function is deemed the solution of the estimation problem.

In Section 9-1, we introduce three common approaches to estimation:

(a) Maximum Likelihood Estimation (MLE),
(b) Maximum A Posteriori Probability (MAP) Estimation,
(c) Least-Squares Estimation (LSE).

In all three methods, the observation is presumed to be random in nature. The unknown quantity is also presumed to be random in MAP and LSE, but is presumed to be non-random in MLE.

All three estimation methods are applied in Section 9-2 to the coin-flip experiment presented earlier in Section 8-1, obtaining three different estimators of P[heads]. This is then followed in Sections 9-3 to 9-5 with applications of the estimation methods to four 1-D estimation problems: (1) estimating the mean of a wide-sense stationary (WSS) random process, in the context of polling, (2) denoising a signal containing additive noise, (3) denoising a signal known to be sparse, and (4) deconvolving a signal from its noisy convolution with a known impulse response.

The treatment is extended to 2-D in Section 9-6, and in Section 9-7 we review the periodogram method for estimating power spectral densities of random processes and random fields. The periodogram can be used to obtain parametric forms of power spectral densities for classes of signals and images, such as fractals, which can then be used in the formulations developed in earlier sections. The procedures are outlined in Sections 9-8 for signals and 9-9 for images. The last two sections of the chapter are focused on an introduction to Markov random fields (MRF) and their application to image segmentation. An MRF incorporates a priori information (prior knowledge) about the relationships between the value of a given pixel and those of its neighbors.

9-1 Estimation Methods

This section introduces three estimation methods: MLE, MAP, and LSE. Even though our ultimate goal is to apply these methods to 2-D images (which we do in later sections), we will, for the present, limit the presentation to estimating a scalar unknown x from a scalar observation yobs of a random variable y. In later sections we generalize the formulations to random vectors, processes, and fields.

An estimation problem consists of the following ingredients:

(a) x: the unknown quantity to be estimated, which may have a constant value or may be a random variable with a known pdf p(x).

(b) yobs: the observed value of random variable y.

(c) p(y|x = x′): the conditional pdf of y, given that x = x′, provided by a model.

In a typical situation, y and x are related by a model of the form

y(t) = h(t) ∗ x(t) + υ(t) (continuous time),

or

y[n] = h[n] ∗ x[n] + υ[n] (discrete time),
where h(t) and h[n] are the continuous-time and discrete-time impulse responses of the observing system and υ(t) and υ[n] represent random noise added by the measurement process. To obtain a good estimate of x(t) (or x[n]), we need to filter out the noise and to deconvolve y(t) (or y[n]).

We now introduce the basic structure of each of the three estimation methods.

9-1.1 Maximum Likelihood Estimation (MLE)

Among the three estimation methods, the maximum likelihood estimation (MLE) method is the one usually used when the unknown quantity x has a constant value, as opposed to being a random variable. The basic idea behind MLE is to choose the value of x that makes what actually happened, namely the observation y = yobs, the most likely outcome. MLE maximizes the likelihood of y = yobs (hence the name MLE) by applying the following recipe:

(1) Set y = yobs in the conditional pdf p(y|x) provided by the model to obtain p(yobs|x).

(2) Choose the value of x that maximizes the likelihood function p(yobs|x) and denote it x̂MLE.

(3) If side information about x is available, such as x is non-negative or x is bounded within a specified interval, then incorporate that information in the maximization process.

In practice it is often easier to maximize the natural logarithm, ln(p(yobs|x)), rather than to maximize p(yobs|x) itself. Henceforth, ln(p(yobs|x)) will be referred to as the log-likelihood function.

◮ p(yobs|x) = likelihood function
p(x) = a priori pdf
p(x|yobs) = a posteriori pdf
ln(p(yobs|x)) = log-likelihood function ◭

9-1.2 Maximum A Posteriori (MAP) Estimation

If the unknown quantity x is a random variable, the two methods most commonly used to estimate x, given an observation y = yobs, are the maximum a posteriori (MAP) estimator and the least-squares estimator (LSE). This subsection covers MAP and the next one covers LSE.

The observation y = yobs is called a posteriori information because it is about the outcome. This is in contrast to a priori information about the unknown (input) random variable x in the form of its pdf p(x). For example, x may be known to have a Gaussian pdf (Fig. 9-1(a))

p(x) = e^{−(x−x̄)²/(2σx²)} / √(2πσx²) (Gaussian), (9.1)

where x̄ = E[x] is the mean value of x and σx² is its variance. Or x might be known to be a value of a sparse signal with Laplacian pdf (Fig. 9-1(b))

p(x) = e^{−√2 |x|/σx} / (√2 σx) (Laplacian). (9.2)

Gaussian and Laplacian a priori pdfs will be used later in Sections 9-3 and 9-4.

The idea behind MAP estimation is to determine the most probable value of x, given y = yobs, which requires maximizing the a posteriori pdf p(x|yobs). This is in contrast to the MLE method introduced earlier, which sought to maximize the likelihood function p(yobs|x). As we see shortly, in order to maximize p(x|yobs), we need to know not only p(yobs|x), but also the a priori pdf p(x).

As noted earlier, usually we know or have expressions for p(x) and the likelihood function p(yobs|x), but not for the a posteriori pdf p(x|y). To obtain an expression for the latter (so we may maximize it), we use the conditional pdf relation given by Eq. (8.36b) to relate the joint pdf p(x, y) to each of the two conditional pdfs:

p(x, y) = p(y|x) p(x) (9.3a)

and

p(x, y) = p(x|y) p(y). (9.3b)

Combining the two relations gives Bayes's rule for pdfs:

p(x|y) = p(y|x) p(x)/p(y). (9.4)

Bayes's rule relates the a posteriori pdf p(x|yobs) to the likelihood function p(yobs|x).

The goal of the MAP estimator is to choose the value of x that maximizes the a posteriori pdf p(x|yobs), given the a priori pdf p(x) and the likelihood function p(yobs|x). The third pdf in Eq. (9.4), namely p(y), has no influence on the maximization process because it is not a function of x. Hence, p(y) can be set equal to any arbitrary constant value C.
294 CHAPTER 9 STOCHASTIC DENOISING AND DECONVOLUTION
Figure 9-1 (a) Gaussian pdf p(x) with x̄ = 0, plotted for σx = 1 and σx = 2; (b) Laplacian pdf p(x), plotted for σx = 1 and σx = 2.

The MAP estimator recipe consists of the following steps:

(1) Set y = yobs in the expression for the a posteriori pdf p(x|y) given by Eq. (9.4) and set p(y) = C:

p(x|yobs) = p(yobs|x) p(x)/C. (9.5)
(2) Choose the value of x that maximizes the a posteriori pdf p(x|yobs) and denote it x̂MAP.

(3) If side information about x is available, incorporate that information in the estimation process.

As noted with the MLE method in the preceding subsection, it is often easier to maximize the natural logarithm of the a posteriori pdf:

ln(p(x|yobs)) = ln p(yobs|x) + ln p(x) − lnC, (9.6)

which is the sum of the logarithms of the likelihood function p(yobs|x) and the a priori pdf p(x). Subtracting lnC does not affect the value of x that maximizes the expression.

9-1.3 Least-Squares Estimator (LSE)

Whereas the MAP estimator sought to estimate x by maximizing the a posteriori pdf p(x|yobs) for a given observed value yobs, the least-squares estimator (LSE) estimates the most probable value of x by minimizing the mean square error (MSE) between the estimated value x̂ and the random variable x. The MSE is defined as

MSE = E[(x − x̂(y))²] = ∫∫ (x′ − x̂(y′))² p(x′, y′) dx′ dy′. (9.7)

Of course, x̂ is a function of y, and as we will show shortly, the value of x estimated by the LSE method, for a given value y = yobs, is given by

x̂LS = E[x | y = yobs] = ∫ x′ p(x′|yobs) dx′. (9.8)

It is apparent from the integration that both x′ and y′ are treated as random variables, so we need to keep that in mind in what follows.

Since expectation is a linear operator,

E[(x − x̂(y))²] = E[(x − E[x|y] + ε(y))²]
= E[(x − E[x|y])²] + E[ε²(y)]
+ 2E[x ε(y)] − 2E[E[x|y] ε(y)]. (9.11)

We now show that the final two terms cancel each other. Using Eq. (8.36a), namely

p(x′, y′) = p(x′|y′) p(y′), (9.12)

the third term in Eq. (9.11) becomes

2E[x ε(y)] = 2 ∫∫ x′ ε(y′) p(x′, y′) dx′ dy′
= 2 ∫∫ x′ ε(y′) p(x′|y′) p(y′) dx′ dy′
= 2 ∫ p(y′) ε(y′) [∫ x′ p(x′|y′) dx′] dy′
= 2 ∫ p(y′) ε(y′) E[x | y = y′] dy′
= 2E[ε(y) E[x|y]]. (9.13)

This result shows that the third and fourth terms in Eq. (9.11) are identical in magnitude but opposite in sign, thereby canceling one another out. Hence, the MSE is

MSE = E[(x − x̂(y))²] = E[(x − E[x|y])²] + E[ε²(y)]. (9.14)
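The conditional-mean result x̂LS = E[x|y] can be checked by simulation. For zero-mean jointly Gaussian (x, y) with unit variances and correlation ρ, E[x|y] = ρy (a standard fact, consistent with the scalar form in Exercise 9-1); the sketch below (illustrative ρ) confirms this estimator has smaller MSE than other linear guesses:

```python
import numpy as np

rng = np.random.default_rng(6)
rho = 0.8   # illustrative correlation between x and y, unit variances

xy = rng.standard_normal((500_000, 2)) @ np.linalg.cholesky(
    np.array([[1.0, rho], [rho, 1.0]])).T
x, y = xy[:, 0], xy[:, 1]

# For zero-mean jointly Gaussian (x, y) with unit variances, E[x|y] = rho * y.
mse_cond_mean = np.mean((x - rho * y)**2)   # conditional-mean estimator
mse_raw = np.mean((x - y)**2)               # use y itself as the estimate
mse_scaled = np.mean((x - 0.5 * y)**2)      # some other linear estimate

print(mse_cond_mean, mse_raw, mse_scaled)   # conditional mean is smallest
```

The first MSE concentrates near the theoretical minimum 1 − ρ² = 0.36, below the 0.40 and 0.45 attained by the other two estimators.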
leading to the final expression

x̂LS = ∫ x′ p(yobs|x′) p(x′) dx′ / ∫ p(yobs|x′) p(x′) dx′. (9.17)

As noted at the outset of Section 9-1, the pdfs p(x) and p(y|x) are made available by the model describing the random variable x and its relationship to random variable y. Hence, computing x̂LS using Eq. (9.17) should be a straightforward task.

Generalizing to the case where x and y are random vectors and jointly Gaussian, the LS vector-equivalent of Eq. (9.8) is

Exercise 9-1: How does Eq. (9.18) simplify when random vectors x and y are replaced with random variables x and y?

Answer:
x̂LS = x̄ + (λx,y/σy²)(yobs − ȳ).

(b) n: the observed value of a discrete random variable n.

Also available from the model relating m to n is:

(c) p[n|m]: the conditional pmf.

Occasionally, we may encounter scenarios in which the observation n is a discrete random variable, but the unknown is a continuous-value quantity. In such cases, we continue to use n as the discrete-value observation, but we use x and p(x) instead of m and p[m] for the continuous-value random variable. Transforming from (x, y) notation to [m, n] notation and from integrals to sums leads to the formulations that follow.

p[m|nobs] = p[nobs|m] p[m] / p[nobs]. (9.20a)
(both n and m are discrete)

As noted in connection with Eq. (9.5), p[nobs] exercises no effect on the maximization process, so it can be set to any arbitrary constant value C. Also, if the unknown quantity is continuous, m and p[m] should be replaced with x and p(x):

p(x|nobs) = p[nobs|x] p(x) / p[nobs]. (9.20b)
(x continuous and n discrete)

C. LSE

m̂LS = ∑_{m′=−∞}^{∞} m′ p[nobs|m′] p[m′] / ∑_{m′=−∞}^{∞} p[nobs|m′] p[m′]. (9.21a)
(both m and n are discrete)

This expression, which is the discrete counterpart of the expression given by Eq. (9.17), is applicable only if both m and n are discrete random variables. If the unknown quantity is a continuous random variable and n is a discrete random variable, then m and p[m] should be replaced with x and p(x), respectively, and the summations in Eq. (9.21a) should be converted to integrals, namely

x̂LS = ∫ x′ p[nobs|x′] p(x′) dx′ / ∫ p[nobs|x′] p(x′) dx′. (9.21b)
(x continuous and n discrete)

Example 9-1: MLE

We are given N independent observations of a random variable y with uniform pdf:

p(y|x) = 1/x for 0 ≤ y ≤ x, and 0 otherwise.

[Figure: p(y|x) versus y — a rectangle of height 1/x extending from y = 0 to y = x.]

Compute x̂MLE for (a) N = 1 and (b) arbitrary N.

Solution: (a) For N = 1, we are given a single observation y1 of y. Since p(y|x) = 0 for y > x, we require x̂MLE ≥ y1. The height of the pdf for 0 ≤ y ≤ x is 1/x. This height is maximized by choosing x̂MLE to be as small as possible. The smallest possible value of x̂MLE for which p(y1|x) ≠ 0 is x̂MLE = y1.

(b) Repeating this argument for each of N independent observations {y1, ..., yN}, we require the smallest value of x̂MLE such that x̂MLE ≥ yn for every n. This is x̂MLE = max{y1, ..., yN}. The likelihood function is nonzero in the N-dimensional hypercube 0 ≤ yn ≤ x:

p(y1, ..., yN|x) = 1/x^N for 0 ≤ yn ≤ x, and 0 otherwise,

which is maximized by minimizing x^N subject to 0 ≤ yn ≤ x. Hence x̂MLE = max{y1, ..., yN}.

Concept Question 9-1: What is the essential difference between MLE and MAP estimation?

Exercise 9-2: Determine x̂LS and x̂MAP, given that only the a priori pdf p(x) is known.

Answer: x̂LS = x̄, and x̂MAP = the value of x at which p(x) is largest.

Exercise 9-3: We observe random variable y whose pdf is p(y|λ) = λe^{−λy} for y > 0. Compute the maximum likelihood estimate of λ.

Answer: The log-likelihood function is

ln p(yobs|λ) = ln λ − λ yobs.

Setting its derivative to zero gives 1/λ − yobs = 0. Solving gives

λ̂MLE(yobs) = 1/yobs.

Exercise 9-4: We observe random variable y whose pdf is p(y|λ) = λe^{−λy} for y > 0, and λ has the a priori pdf p(λ) = 3λ² for 0 ≤ λ ≤ 1. Compute the maximum a posteriori estimate of λ.
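Example 9-1 can be verified numerically. The sketch below (the true x is an illustrative value) confirms that a grid-maximized likelihood lands on max{y1, ..., yN}:

```python
import numpy as np

rng = np.random.default_rng(3)
x_true = 2.5          # illustrative true parameter
N = 50

y = rng.uniform(0.0, x_true, size=N)

# Grid-search the likelihood p(y_1..y_N | x) = x^(-N) on 0 <= y_n <= x.
x_grid = np.linspace(0.01, 5.0, 5000)
like = np.where(x_grid >= y.max(), x_grid**(-float(N)), 0.0)
x_mle_grid = x_grid[np.argmax(like)]

print(y.max())        # closed-form MLE from Example 9-1
print(x_mle_grid)     # grid MLE: smallest grid point >= max(y)
```

Because x^(−N) is strictly decreasing, the argmax is the smallest admissible grid point, matching the closed-form answer up to the grid spacing.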
P[Hk] = a,
P[Tk] = 1 − a,

and 0 ≤ a ≤ 1. For a fair coin, a = 0.5, but we wish to consider the more general case where the coin can be such that a can assume any desired value between 0 (all tails) and 1 (all heads).

We now use the coin-flip experiment as a vehicle to test the three estimation methods of the preceding section. In the context of the coin-flip experiment, our unknown and observation quantities are:

(1) x = unknown probability of heads, which we wish to estimate on the basis of a finite number of observations, N. If we were to flip the coin an infinite number of times, the estimated value of x should be a, the true probability of heads, but in our experiment N is finite. For the sake of the present exercise, we set N = 10.

(2) nobs = number of heads observed in N = 10 flips.

As noted earlier, the MLE method does not use an a priori pdf for x, but the MAP and LS methods do. Consequently, the three estimation methods result in three different estimates x̂(nobs).

∂(ln x)/∂x = 1/x

and

∂(ln(1 − x))/∂x = −1/(1 − x).

Solving Eq. (9.24) for x gives

x̂MLE = x = nobs/N, (9.25)

and for the coin-flip experiment with N = 10, x̂MLE = nobs/10. This result not only satisfies the condition 0 ≤ x ≤ 1, but it is also intuitively obvious. It says that if we flip the coin 10 times and 6 of those flips turn out to be heads, then the maximum-likelihood estimate of the probability that any individual flip is a head is 0.6. As we will see shortly, the MAP and LS estimators provide slightly different estimates.

9-2.2 MAP Coin Estimate

Whereas the MLE method does not use a priori probabilistic information about the unknown quantity x, the MAP method uses the pmf p[m] if m is a discrete-value random variable or
9-2 COIN-FLIP EXPERIMENT 299
where we used Eqs. (9.22) and (9.26), and p[n] has been set to a constant value because it has no effect on the maximization process. The natural logarithm of the pdf is

ln(p(x|nobs)) = ln[2N!/(C nobs! (N − nobs)!)] + ln(x^{nobs+1}) + (N − nobs) ln(1 − x)
= ln[2N!/(C nobs! (N − nobs)!)] + (nobs + 1) ln x + (N − nobs) ln(1 − x). (9.29)

Only the last two terms are functions of x, so when we take the partial derivative with respect to x and set it equal to zero, we get

0 = (nobs + 1)/x − (N − nobs)/(1 − x),

whose solution for x yields the MAP estimate

x̂MAP = x = (nobs + 1)/(N + 1). (9.30)

For the specific case where N = 10, x̂MAP = (nobs + 1)/11, which is different from the MLE estimate x̂MLE = nobs/10. Figure 9-2 displays a plot of x̂MLE, x̂MAP, and x̂LS as a function of nobs, all for N = 10. We observe that even when none of the 10 coin flips is a head (i.e., nobs = 0), the MAP estimator predicts x̂MAP = 1/11 ≈ 0.09, but as nobs approaches N = 10, both estimators approach the same limit of 1.

9-2.3 LS Coin Estimate

Since x is a continuous random variable and n a discrete random variable, the applicable expression for x̂LS is the one given by Eq. (9.21b):

x̂LS = ∫ x′ p[nobs|x′] p(x′) dx′ / ∫ p[nobs|x′] p(x′) dx′. (9.31)

After inserting the expression for p[nobs|x] given by Eq. (9.22) and the expression for p(x) given by Eq. (9.26) into Eq. (9.31), and then canceling terms that are common to the numerator and
a pollster is to estimate µ from a subset of M polled voters, each of whom tells the pollster for which candidate he or she will vote. If the subset of voters is representative of the entire voting population, and if the choice made by each voter is independent of the choices made by other voters, the number of voters in the subset of M voters who say they will vote for candidate A is a discrete random variable m and its pmf is the binomial pmf defined in Eq. (8.31), namely

p[m] = (M choose m) µ^m (1 − µ)^{M−m} = µ^m (1 − µ)^{M−m} M!/(m! (M − m)!), m = 0, ..., M. (9.34)

For large values of M (where M is the number of polled voters), p[m] can be approximated as a Gaussian pdf with a mean Mµ and a variance σm² = Mµ(1 − µ). In shorthand notation, m ∼ N(Mµ, Mµ(1 − µ)).

…voters who indicated they plan to vote for candidate A and M is the total number of polled voters.

For a single poll, the estimation process is straightforward, but what if we have N different polls, each comprised of M voters, resulting in N estimates of µ? Instead of only one random variable z with only one estimate µ̂, we now have N random variables, z[1] through z[N], and N estimates of µ, namely

µ̂[i] = zobs[i] = m[i]/M, 1 ≤ i ≤ N. (9.41)

How do we combine the information provided by the N polls to generate a single estimate of µ? As we see next, the answer to the question depends on which estimation method we use.
ln(p(zobs|µ̂)) = −(N/2) ln(2πσz²) − (1/(2σz²)) ∑_{i=1}^{N} (zobs[i] − µ̂)². (9.45)

The log-likelihood function is maximized by setting its partial derivative with respect to µ̂ to zero:

0 = ∂/∂µ̂ ln(p(zobs|µ̂)) = (1/σz²) ∑_{i=1}^{N} (zobs[i] − µ̂). (9.46)

Solving for µ̂ gives the MLE estimate of µ:

µ̂MLE = (1/N) ∑_{i=1}^{N} zobs[i] = (1/N) ∑_{i=1}^{N} m[i]/M = (1/(NM)) ∑_{i=1}^{N} m[i], (9.47)

which is equal to the total number of voters (among all N polls) who selected candidate A, divided by the total number of polled voters, NM. The result given by Eq. (9.47) is called the sample mean of {zobs[i]}. As with the MLE estimator in Eq. (9.25) for the coin-flip problem, this is the "obvious" estimator of the mean µ.

Note that Eq. (9.42) can be interpreted as a set of N observations of a white Gaussian random process with unknown mean µ. So the sample mean can be used to estimate the unknown mean of a white Gaussian random process.

9-3.3 MAP Estimate of Sample Mean

We again have N polls, represented by {z[i]}, but we also have some additional information generated in earlier polls of candidate A versus candidate B. The data generated in earlier polls show that the fraction µ of voters who preferred candidate A varied from one poll to another, and that collectively µ behaves like a Gaussian random variable with a mean µp and a variance σp²:

µ ∼ N(µp, σp²). (9.48)

Since the information about the statistics of µ is available ahead of the N polls, we refer to the pdf of µ as an a priori pdf:

p(µ) = (1/√(2πσp²)) e^{−(µ−µp)²/(2σp²)}. (9.49)

Our plan in the present subsection is to apply the MAP method outlined in Section 9-1.2, wherein the goal is to maximize the a posteriori pdf p(unknown mean µ | observation vector zobs) = p(µ|zobs). Using the form of Eq. (9.5), p(µ|zobs) can be written as

p(µ|zobs) = p(zobs|µ) p(µ)/p(zobs) = p(zobs|µ) p(µ)/C, (9.50)

where we set p(zobs) = C because it is not a function of the unknown mean µ.

Inserting Eqs. (9.44), with z = zobs, and (9.49) into Eq. (9.50) leads to

p(µ|zobs) = [1/(C(2πσz²)^{N/2})] × [1/√(2πσp²)] e^{−(µ−µp)²/(2σp²)} × ∏_{i=1}^{N} e^{−(zobs[i]−µ)²/(2σz²)}. (9.51)

The log-likelihood function is

ln(p(µ|zobs)) = −lnC − (N/2) ln(2πσz²) − (1/2) ln(2πσp²) − (1/(2σp²))(µ − µp)² − (1/(2σz²)) ∑_{i=1}^{N} (zobs[i] − µ)². (9.52)

The a posteriori estimate µ̂MAP is the value of µ that maximizes the log-likelihood function, which is obtained by setting its partial derivative with respect to µ to zero:

0 = ∂/∂µ [−(1/(2σp²))(µ − µp)²] + ∂/∂µ [−(1/(2σz²)) ∑_{i=1}^{N} (zobs[i] − µ)²].

The solution leads to

µ̂MAP = µ = [(σz²/σp²) µp + ∑_{i=1}^{N} zobs[i]] / (N + σz²/σp²). (9.53)

The MAP estimate is the sample mean of the following: {z[i]}, augmented with (σz²/σp²) copies of µp, for a total of (N + σz²/σp²) "observations." For the polling problem, if M is the number of voters polled in each of the N polls and M′ is the number of voters polled in each of the earlier polls, then (σz²/σp²) = M′/M, so if the previous polls polled more voters, their estimate µp of µ is weighed more heavily.
9-4 LEAST-SQUARES ESTIMATION

◮ The result given by Eq. (9.56) is a statement of orthogonality: The estimation error e = (x − x̂LS) is uncorrelated with the observation y used to produce the LS estimate x̂LS. ◭

We now consider a scenario in which the estimator is constrained to be a linear function of the observation. As we shall see shortly, the derived linear least-squares estimate x̂LLSE is equal to the LS estimate for Gaussian random vectors of the preceding subsection.

Given that x[i] and y[i] are zero-mean, jointly wide-sense-stationary (WSS, defined in Section 8-9.3) random processes, our present goal is to compute the linear least-squares estimate x̂LLSE[n] of x[n] at discrete time n from the infinite set of observations { y[i], −∞ < i < ∞ }. The estimator is constrained to be a linear function of the { y[i] }, which means that x̂LLSE[n] and { y[i] } are related by a linear sum of the form

x̂LLSE[n] = Σ_{i=−∞}^{∞} h[i] y[n − i] = h[n] ∗ y[n]. (9.57)

Here, h[n] is an unknown function (filter) that is yet to be determined.

Let us consider the error e[n] between x[n] and the estimate x̂LLSE:

e[n] = x[n] − x̂LLSE[n] = x[n] − Σ_{i=−∞}^{∞} h[i] y[n − i]. (9.58)

Next, let us square the error e[n], take the derivative of its expectation with respect to h[j], and then set it equal to zero:

0 = (∂/∂h[j]) E[e[n]²]
  = 2E[ e[n] (∂e[n]/∂h[j]) ]
  = 2E[ e[n] (∂/∂h[j]) ( x[n] − Σ_{j=−∞}^{∞} h[j] y[n − j] ) ]
  = 2E[ e[n] (−y[n − j]) ]
  = −2E[ e[n] y[n − j] ]. (9.59)

◮ The result given by Eq. (9.59) is another statement of orthogonality: When the expectation of the product of two random variables (in this case, the error e[n] and the observation y[n − j]) is zero, it means that those two random variables are uncorrelated. We conclude from this observation that if x[n] and y[n] are jointly Gaussian WSS random processes, then x̂LLSE[n] = x̂LS[n]. ◭

9-4.3 1-D Stochastic Wiener Smoothing Filter

We now use the orthogonality relationship to derive 1-D stochastic versions of the deterministic Wiener deconvolution filter presented in Chapter 6 and the deterministic sparsifying denoising filter presented in Chapter 7. The stochastic versions of these filters assume that the signals are random processes instead of just functions. This allows a priori information about the signals, in the form of their power spectral densities, to be incorporated into the filters. If all of the random processes are white, the stochastic filters reduce to the deterministic filters of Chapters 6 and 7, as we show later.

Another advantage of the stochastic forms of these filters is that the trade-off parameter λ in the Tikhonov and LASSO criteria can now be interpreted as an inverse signal-to-noise ratio, as we also show later. We derive the 1-D forms of the stochastic filters so we may generalize them to 2-D later.

Our task in the present subsection is to determine the filter h[i] introduced in Eq. (9.57). To that end, we insert Eq. (9.58) into Eq. (9.59) and apply the distributive property of the expectation:

0 = E[ e[n] y[n − j] ]
  = E[ ( x[n] − Σ_{i=−∞}^{∞} h[i] y[n − i] ) y[n − j] ]
  = E[ x[n] y[n − j] ] − Σ_{i=−∞}^{∞} h[i] E[ y[n − i] y[n − j] ]
  = Rxy[j] − Σ_{i=−∞}^{∞} h[i] Ry[j − i]
  = Rxy[j] − h[j] ∗ Ry[j], (9.60)

where, in the last step, we used the standard definition of convolution of two discrete-time 1-D signals, Eq. (2.71a). Taking the DTFT of Eq. (9.60) gives

0 = Sxy(Ω) − H(Ω) Sy(Ω).
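On a finite filter support, the normal equations implied by Eq. (9.60), Rxy[j] = Σᵢ h[i] Ry[j − i], become a solvable linear system. A minimal Python/NumPy sketch follows; the autocorrelation model Rx[k] = 0.9^|k| and the noise variance 0.5 are our own illustrative assumptions, not values from the text.

```python
import numpy as np

L = 41                                   # filter support: i = -20 ... 20
lags = np.arange(-(L - 1), L)            # lags -(L-1) ... L-1
Rx = 0.9 ** np.abs(lags)                 # assumed model R_x[k] = 0.9**|k|
Ry = Rx + 0.5 * (lags == 0)              # R_y[k] = R_x[k] + sigma_v^2 * delta[k]
Rxy = 0.9 ** np.abs(np.arange(L) - 20)   # R_xy[j] = R_x[j] for j = -20 ... 20

# Truncated normal equations of Eq. (9.60): A[j, i] = R_y[j - i]
idx = np.arange(L)
A = Ry[(idx[:, None] - idx[None, :]) + (L - 1)]
h = np.linalg.solve(A, Rxy)

# Orthogonality check: the solved filter zeroes the residual R_xy - h * R_y
residual = A @ h - Rxy
```

Because Rxy is symmetric and A is a symmetric Toeplitz matrix, the solved filter h is symmetric about its center, as expected of a smoothing filter.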
Similarly,

Ry[i, j] = E[ (x[i] + ν[i]) (x[j] + ν[j]) ]
         = E[x[i] x[j]] + E[ν[i] x[j]] + E[x[i] ν[j]] + E[ν[i] ν[j]]. (9.64)

Since x[i] and ν[j] are uncorrelated, the second and third terms are zero. Hence,

Ry[i, j] = Rx[i − j] + σv² δ[i − j], (9.65)

where σv² is the variance of the noise ν[n]. Equations (9.63) and (9.65) show that Rxy[i, j] = Rxy[i − j] and Ry[i, j] = Ry[i − j].

Taking the DTFTs of Eqs. (9.63) and (9.65) gives Sxy(Ω) = Sx(Ω) and Sy(Ω) = Sx(Ω) + σv². The solution for H(Ω) is labeled HSDN(Ω):

HSDN(Ω) = Sx(Ω) / (Sx(Ω) + σv²). (9.67)

9-4.5 Stochastic Wiener Deconvolution Example

Now let y[n] be noisy observations of hblur[n] ∗ x[n]:

y[n] = hblur[n] ∗ x[n] + v[n], −∞ < n < ∞, (9.69)

where hblur[n] is a known impulse response. The goal is to compute the linear least-squares estimate x̂LLSE[n] at time n from the observations { y[i], −∞ < i < ∞ }. The noise v[n] is a zero-mean IID random process, uncorrelated and jointly WSS with x[n], with autocorrelation Rv[n] = σv² δ[n].

Replacing x[n] with hblur[n] ∗ x[n] in the derivation leading to Eq. (9.67) for the smoothing filter for noisy observations and
CHAPTER 9 STOCHASTIC DENOISING AND DECONVOLUTION
using Eq. (8.161)(a and b) gives

Sy(Ω) = |Hblur(Ω)|² Sx(Ω) + σv², (9.70a)

Sxy(Ω) = H*blur(Ω) Sx(Ω), (9.70b)

where Yobs(Ω) is the DTFT of the noisy, convolved observation yobs[n].

9-4.6 Stochastic MAP Sparsifying Denoising Estimator

We return to the denoising problem:

y[n] = x[n] + v[n], 1 ≤ n ≤ N, (9.74)

and we make the following assumptions:

(1) ν[n] is a zero-mean white Gaussian random process with Rν[n] = σν² δ[n].

(2) x[n] and ν[n] are IID and jointly WSS random processes.

(3) x[n] at each time n has a Laplacian a priori pdf given by

p(x[n]) = (1/(√2 σx)) e^{−√2 |x[n]|/σx}. (9.75)

The MAP estimate x̂MAP[n] at time n is related to the a posteriori pdf p(x|yobs), which is related to p(yobs|x) and p(x) by the vector equivalent of Eq. (9.5):

p(x|yobs) = p(yobs|x) p(x)/C. (9.79)

Inserting Eqs. (9.77) and (9.78) into Eq. (9.79) and then taking the natural log leads to the MAP log-likelihood function:

ln(p(x|yobs)) = −ln C − (N/2) ln(2πσv²) − (1/(2σv²)) Σ_{n=1}^{N} (yobs[n] − x[n])² − N ln(√2 σx) − (√2/σx) Σ_{n=1}^{N} |x[n]|, (9.80)
The usual procedure for obtaining the MAP estimate x̂MAP[n] involves taking the partial derivative of the log-likelihood function with respect to x[n], equating the result to zero, and then solving for x[n]. In the present case, the procedure is not so straightforward, because one of the terms includes the absolute value of x[n]. Upon ignoring the three terms in Eq. (9.80) that do not involve x[n] (because they exercise no impact on minimizing the log-likelihood function), and then multiplying the two remaining terms by (−σv²), we obtain a new cost functional

Λ = (1/2) Σ_{n=1}^{N} (yobs[n] − x[n])² + √2 (σv²/σx) Σ_{n=1}^{N} |x[n]|. (9.81)

The functional form of Λ is identical to that of the LASSO cost functional given in Eq. (7.106), and so is the form of the solution given by Eq. (7.109):

x̂MAP[n] = { yobs[n] − λ  for yobs[n] > λ,
             yobs[n] + λ  for yobs[n] < −λ,
             0            for |yobs[n]| < λ, (9.82)

where λ is the noise-to-signal ratio

λ = √2 σv²/σx. (9.83)

Concept Question 9-4: How did we use the orthogonality principle of linear prediction in Section 9-4?

Concept Question 9-5: How is the Tikhonov parameter λ interpreted in Section 9-4?

Exercise 9-9: What is the MAP sparsifying estimator when the noise variance σv² is much smaller than σx?

Answer: When σv² ≪ σx, λ → 0, and x̂[n] = yobs[n].

Exercise 9-10: What is the MAP sparsifying estimator when the noise variance σv² is much greater than σx?

Answer: When σv² ≫ σx, λ → ∞, and x̂MAP = 0. This makes sense: the a priori information that x is sparse dominates the noisy observation.

9-5 Deterministic versus Stochastic Wiener Filtering

Through a speedometer example, we now compare the results of speed estimates based on two Wiener filtering approaches, one using a deterministic filter and the other using a stochastic filter. The speedometer accepts as input y(t), a noisy measurement of the distance traveled at time t, measured by an odometer, and converts y(t) into an output speed r(t) = dy/dt. [We use the symbol r (for rate) instead of s to denote speed, to avoid notational confusion in later sections.] Sampling y(t) and r(t) at a sampling interval ∆ converts them into discrete-time signals y[n] and r[n]:

y[n] = y(t = n∆), (9.84a)

r[n] = (y[n] − y[n − 1])/∆. (9.84b)

The observed distance y[n] contains a noise component ν[n],

y[n] = s[n] + ν[n], (9.85)

where s[n] is the actual distance that the odometer would have measured had it been noise-free, and ν[n] is a zero-mean white Gaussian random process with variance σv². An example of a slightly noisy odometer signal y[n] with σv² = 2 is shown in Fig. 9-3(a), and the application of the differentiator given by Eq. (9.84b) with ∆ = 1 s results in the unfiltered speed ru[n] shown in Fig. 9-3(b). The expression for ru[n] is, from Eq. (9.84b),

ru[n] = (y[n] − y[n − 1])/∆
      = (s[n] − s[n − 1])/∆ + (ν[n] − ν[n − 1])/∆
      = rtrue[n] + rnoise[n]. (9.86)

The goal of the Wiener filter is to estimate the true speed rtrue[n], so for the purpose of the present example, we show in Fig. 9-3(c) a plot of the true speed rtrue[n], against which we will shortly compare the Wiener-filter results.

9-5.1 Deterministic Wiener Filter

For ∆ = 1 s, the true speed rtrue[n] is related to the true distance s[n] by

rtrue[n] = s[n] − s[n − 1]. (9.87)
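The soft-threshold solution of Eqs. (9.82) and (9.83) is a one-line operation per sample. A minimal Python/NumPy sketch follows; the function name and test values are ours, not the book's.

```python
import numpy as np

def map_sparsify(y_obs, sigma_v2, sigma_x):
    """MAP sparsifying estimator, Eqs. (9.82)-(9.83):
    soft threshold with lambda = sqrt(2) * sigma_v^2 / sigma_x."""
    lam = np.sqrt(2.0) * sigma_v2 / sigma_x
    return np.sign(y_obs) * np.maximum(np.abs(y_obs) - lam, 0.0)

y = np.array([3.0, -0.2, 0.5, -4.0, 0.0])
x_hat = map_sparsify(y, sigma_v2=1.0, sigma_x=2.0)   # lambda ~ 0.707
```

In the limits σv² ≪ σx and σv² ≫ σx, this function reproduces the answers to Exercises 9-9 and 9-10: it returns yobs unchanged or the all-zero signal, respectively.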
Figure 9-3 Speedometer example: (a) noisy odometer signal y[n], (b) unfiltered speed ru[n] = (y[n] − y[n − 1])/∆, with ∆ = 1 s, (c) true speed rtrue[n] = (s[n] − s[n − 1])/∆, with ∆ = 1 s, (d) deterministically estimated speed r̂D[n], and (e) stochastically estimated speed r̂S[n].
For a segment of length N, with { n = 1, . . . , N }, the N-point DFT of Eq. (9.87) gives

R[k] = S[k] − e^{−j2πk/N} S[k] = (1 − e^{−j2πk/N}) S[k], (9.88)

where we used property #2 in Table 2-9 to compute the second term of Eq. (9.88). The frequency response function H[k] of the noise-free system described by Eq. (9.87) is

H[k] = S[k]/R[k] = 1/(1 − e^{−j2πk/N}). (9.89)

The deterministic Wiener denoising/deconvolution filter was presented earlier in Section 6-4.4 for 2-D images. Converting the notation from 2-D to 1-D, as well as replacing the symbols in Eqs. (6.32a) and (6.32b) to match our current speedometer
problem, we obtain the following expression for the estimated DFT of the true speed rtrue[n]:

R̂D[k] = Y[k] WDDC[k] (9.90a)

with

WDDC[k] = H*[k] / (|H[k]|² + λ²), (9.90b)

where Y[k] is the DFT of the noisy observation y[n], WDDC[k] is the deterministic deconvolution Wiener filter, and λ is the trade-off parameter in Tikhonov regularization. Combining Eqs. (9.89), (9.90a), and (9.90b), and multiplying the numerator and denominator by |1 − e^{j2πk/N}|², leads to the deterministic estimate

R̂D[k] = Y[k] (1 − e^{−j2πk/N}) / (1 + |1 − e^{−j2πk/N}|² λ²). (9.91)

To obtain the "best" estimate of the speed rD[n] using the deterministic Wiener filter, we need to go through the estimation process multiple times using different values of λ. The process entails the following steps:

(1) Compute the DFT Y[k] of the noisy observations y[n].

(2) Compute R̂D[k] using Eq. (9.91) for various values of λ.

(3) Perform an inverse N-point DFT to compute r̂D[n].

For the speedometer example, the outcome with the "seemingly" best result is the one with λ = 0.1, and its plot is shown in Fig. 9-3(d). It is an improvement over the unfiltered speed ru[n], but it still contains a noticeable noisy component.

9-5.2 Stochastic Wiener Filter

Based on the treatment given in Section 9-4.5, the stochastic Wiener approach to filtering the noisy signal y[n] uses a stochastic denoising/deconvolution Wiener filter WSDC(Ω) as follows:

R̂S(Ω) = Y(Ω) WSDC(Ω), (9.92a)

with

WSDC(Ω) = H*(Ω) Ss(Ω) / (|H(Ω)|² Ss(Ω) + σv²), (9.92b)

where H(Ω) is H[k] of Eq. (9.89) with Ω = 2πk/N, Ss(Ω) is the power spectral density of s[n], and σv² is the noise variance. Since the true distance s[n] is an unknown quantity, its power spectral density Ss(Ω) is unknown. In practice, Ss(Ω) is assigned a functional form based on experience with similar random processes. As noted later in Sections 9-7 and 9-8, a practical model for Ss(Ω) is

Ss(Ω) = C/Ω², (9.92c)

where C is a constant. Using Eqs. (9.89) and (9.92c) leads to the stochastic estimate

R̂S(Ω) = Y(Ω) (1 − e^{−jΩ}) / (1 + |1 − e^{−jΩ}|² Ω² σv²/C). (9.93)

Implementation of the stochastic Wiener filter entails the following steps:

(1) For the given observed distance signal y[n], compute its DFT Y[k].

(2) Convert Eq. (9.93) into a numerical format by replacing Ω with 2πk/N everywhere.

(3) Using C′ = σv²/C as a parameter, compute R̂S[k] for various values of C′.

(4) Perform an inverse DFT to obtain r̂S[n] for each value of C′.

(5) Select the value of C′ that appears to provide the best result.

The result for the seemingly best value of C′, which turned out to be C′ = 0.4, is displayed in Fig. 9-3(e). Comparison of the plots for r̂D[n] and r̂S[n] reveals that the stochastic approach provides a much better rendition of the true speed rtrue[n] than does the deterministic approach.

Concept Question 9-6: Why did the stochastic Wiener filter produce a much better estimate than the deterministic Wiener filter in Section 9-5?

Exercise 9-11: What happens to the stochastic Wiener filter when the observation noise strength σv² → 0?

Answer: From Eq. (9.93), letting σv² → 0 makes R̂S(Ω) = Y(Ω)(1 − e^{−jΩ}), whose inverse DTFT is r̂S[n] = y[n] − y[n − 1], which is Eq. (9.84b).

9-6 2-D Estimation

We now extend the various methods that were introduced in earlier sections for estimating 1-D random processes to estimating 2-D random fields. The derivations of these estimators are direct generalizations of their 1-D counterparts.
Sfg(Ω1, Ω2) = Sf(Ω1, Ω2), (9.97a)

and then we use them to obtain the stochastic deconvolution Wiener filter

WSDC(Ω1, Ω2) = Sfg(Ω1, Ω2) / Sg(Ω1, Ω2)
             = H*blur(Ω1, Ω2) Sf(Ω1, Ω2) / (|Hblur(Ω1, Ω2)|² Sf(Ω1, Ω2) + σv²). (9.103)

As before, we assume that we know the power spectral density Sf(Ω1, Ω2), the noise variance σv², and the blur filter hblur[n, m], in which case we follow the recipe:

(1) Obtain Hblur(Ω1, Ω2) from

Hblur(Ω1, Ω2) = DSFT{ hblur[n, m] }. (9.104)

(2) Compute WSDC(Ω1, Ω2) using Eq. (9.103).

(3) Compute wSDC[n, m] = DSFT⁻¹{ WSDC(Ω1, Ω2) } and then obtain the estimate

f̂S[n, m] = wSDC[n, m] ∗∗ gobs[n, m]. (9.106)

Does the stochastic Wiener deconvolution filter provide better results than the deterministic Wiener deconvolution filter described in Section 6-4.4? The answer is an emphatic yes, as demonstrated by the following example.

9-6.5 Deterministic versus Stochastic Deconvolution Example

This is a 2-D "replica" of the 1-D speedometer example of Section 9-5. The image in Fig. 9-4(a) is a high-resolution, low-noise MRI image of a human head. We will treat it as an "original" image f[n, m]. To illustrate (a) the effects of blurring caused by the convolution of an original image with the PSF of the imaging system, and (b) the deblurring realized by Wiener deconvolution, we use a disk-shaped PSF with a uniform distribution given by

hblur[n, m] = { 1 for m² + n² < 145,
                0 for m² + n² ≥ 145, (9.107)

which provides a good model of out-of-focus imaging. The circular PSF has a radius of 12 pixels. We also add noise in the form of a zero-mean white Gaussian random field ν[n, m]:

g[n, m] = hblur[n, m] ∗∗ f[n, m] + ν[n, m]. (9.108)

The outcome of the blurring and noise-addition processes is displayed in Fig. 9-4(b).

A. Deterministic Wiener Deconvolution

By extending the 1-D speedometer recipe of Section 9-5.1 to the 2-D MRI image, we obtain the deterministic Wiener estimate f̂D[n, m] as follows:

(1) Hblur[k1, k2] is computed from hblur[n, m] by taking the 2-D DFT of Eq. (9.107).

(2) The deterministic deconvolution Wiener filter WDDC[k1, k2] is computed from Hblur[k1, k2], as in Eq. (9.90b).

(3) F̂D[k1, k2] is estimated using

F̂D[k1, k2] = G[k1, k2] WDDC[k1, k2], (9.110)

where G[k1, k2] is the 2-D DFT of the blurred image g[n, m].

(4) Application of the inverse DFT gives f̂D[n, m]. The "best" resultant image, shown in Fig. 9-4(c), used λ = 0.1.

B. Stochastic Wiener Deconvolution

The 1-D stochastic Wiener solution to the speedometer problem is given in Section 9-5.2. The 2-D image deconvolution procedure using the stochastic Wiener approach is identical to the 1-D procedure outlined earlier in part A of Section 9-6.5, except for the form of the Wiener filter. For the stochastic case, the 2-D equivalent of the expression given by Eq. (9.92c) for the power spectral density is

Sf(Ω1, Ω2) = C / (Ω1² + Ω2²)². (9.111)

Upon setting Ω1 = 2πk1/N and Ω2 = 2πk2/N in Eqs. (9.111) and (9.103), we obtain the stochastic Wiener deconvolution
Figure 9-4 MRI images: (a) original "true" image, (b) noisy blurred MRI image g[n, m] produced by the imaging system, (c) image deconvolved by the deterministic Wiener filter, and (d) deconvolved image f̂S[n, m] produced by the stochastic Wiener filter.
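The 2-D stochastic recipe of Eqs. (9.103)-(9.106) can be sketched on a small synthetic scene. In the Python/NumPy example below, the MRI image and the radius-12 PSF of the text are replaced by toy stand-ins (a bright disk and a radius-3 averaging PSF), and the DC bin of the power-law PSD model is clamped to avoid division by zero; all sizes and parameter values are our own assumptions.

```python
import numpy as np

N = 64
yy, xx = np.mgrid[0:N, 0:N]
f = ((xx - 32) ** 2 + (yy - 32) ** 2 < 100).astype(float)    # toy "image"

# Small disk PSF centered at [0, 0] with circular wraparound, normalized
r2 = np.minimum(xx, N - xx) ** 2 + np.minimum(yy, N - yy) ** 2
h = (r2 < 9).astype(float)
h /= h.sum()
H = np.fft.fft2(h)                                 # Eq. (9.104) analogue
g = np.fft.ifft2(np.fft.fft2(f) * H).real          # blurred image (noise-free)

# Power-law PSD model of the Eq. (9.111) form, DC bin clamped
O = 2 * np.pi * np.fft.fftfreq(N)
W2 = O[:, None] ** 2 + O[None, :] ** 2
Sf = 1.0 / np.maximum(W2, (2 * np.pi / N) ** 2) ** 2
sigma_v2 = 1e-6

W_SDC = np.conj(H) * Sf / (np.abs(H) ** 2 * Sf + sigma_v2)   # Eq. (9.103)
f_hat = np.fft.ifft2(W_SDC * np.fft.fft2(g)).real            # Eq. (9.106)
```

The reconstruction f_hat recovers most of the sharp edges that the blur removed, so its error against f is smaller than that of the blurred image g.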
9-6.6 Stochastic 2-D MAP Sparsifying Estimator

The 2-D analogue of the 1-D sparsifying estimator of Section 9-4.6 can be summarized as follows:

(1)

gobs[n, m] = f[n, m] + ν[n, m], (9.113)

with gobs[n, m] representing the observation, f[n, m] representing the unknown random field, and ν[n, m] representing a white Gaussian random field with known variance Sv(Ω1, Ω2) = σv².

(2) f[n, m] and ν[n, m] are IID, jointly WSS random fields.

(3) Each f[n, m] has a 2-D Laplacian a priori pdf given by

p(f[n, m]) = (1/(√2 σf)) e^{−√2 |f[n, m]|/σf}. (9.114)

The 2-D LASSO cost-functional equivalent to the 1-D expression given by Eq. (9.81) is

Λ = (1/2) Σ_{n=1}^{N} Σ_{m=1}^{N} (gobs[n, m] − f[n, m])² + √2 (σv²/σf) Σ_{n=1}^{N} Σ_{m=1}^{N} |f[n, m]|. (9.115)

Answer: As σf → ∞, the parameter λ → 0, in which case f̂MAP[n, m] → gobs[n, m].

9-7 Spectral Estimation

To apply the 1-D and 2-D stochastic denoising and deconvolution operations outlined in Sections 9-4 to 9-6, we need to know the power spectral densities Sx(Ω) of the 1-D unknown random process x[n] and Sf(Ω1, Ω2) of the 2-D random field f[n, m]. Had x[n] been available, we could have applied the 1-D N-order DFT to estimate Sx(Ω) using

Ŝx(Ω = 2πk/N) = (1/N) | Σ_{n=0}^{N−1} x[n] e^{−j2πkn/N} |², (9.118)

for k = 0, 1, . . . , N − 1. The division by N converts energy spectral density to power spectral density. Similarly, application of the 2-D Nth-order DFT to f[n, m] leads to

Ŝf(Ω1 = 2πk1/N, Ω2 = 2πk2/N) = (1/N²) | Σ_{n=0}^{N−1} Σ_{m=0}^{N−1} f[n, m] e^{−j2π(nk1+mk2)/N} |², (9.119)

for k1 = 0, 1, . . . , N − 1 and k2 = 0, 1, . . . , N − 1. This estimation method is known as the periodogram spectral
estimator. Other spectral estimation methods exist as well, but the central problem is that x[n] and f[n, m] are the unknown quantities we wish to estimate, so we have no direct way to determine their power spectral densities. However, we can use parametric models to describe Sx(Ω) and Sf(Ω1, Ω2), which can then be used in the stochastic Wiener estimation recipes of Sections 9-4 to 9-6 to obtain the seemingly best outcome. As we shall see in the next section, many images and signals exhibit a fractal-like behavior with corresponding 1-D and 2-D power spectral densities of the form

Sx(Ω) = C/|Ω|^a, for Ωmin < |Ω| < π, (9.120a)

Sf(Ω1, Ω2) = C/(Ω1² + Ω2²)^b, for Ωmin < |Ω1|, |Ω2| < π, (9.120b)

where a, b, C, and Ωmin are adjustable constant parameters. The expressions given by Eq. (9.120) are known as power laws, an example of which, with b = 2, was used in Section 9-6.5 to deconvolve a blurred MRI image (Fig. 9-4).

Exercise 9-13: Suppose we estimate the autocorrelation function Rx[n] of data { x[n], n = 0, . . . , N − 1 } using the sample mean over i of { x[i] x[i − n] }, zero-padding { x[n] } as needed. Show that the DFT of R̂x[n] is the periodogram (this is one reason why the periodogram works).

Answer:

R̂x[n] = (1/N) Σ_{i=0}^{N−1} x[i] x[i − n]

definition applies to an image in 2-D. Fractals are quite common in nature; examples include certain classes of trees, river deltas, and coastlines (Fig. 9-5). In a perfect fractal, self-similarity exists over an infinite number of scales, but for real objects the similarity is exhibited over a finite number of scales.

An example of a fractal signal is shown in Fig. 9-6; note the statistical resemblance between (a) the pattern of the entire signal, extending over the range between 0 and 1 in part (a) of the figure, and (b) the pattern in part (b) of only a narrow segment of the original, extending between 0.4 and 0.5 of the original scale.

9-8.1 Continuous-Time Fractals

Perfect self-similarity implies that a signal x(t) is identically equal to a scaled version of itself, x(at), where a is a positive scaling constant, but statistical self-similarity implies that if x(t) is a zero-mean wide-sense stationary (WSS) random process, then its autocorrelation is self-similar. That is,

Rx(τ) = C Rx(aτ), (9.121)

where τ is the time shift between x(t) and x(t − τ) in

Rx(τ) = E[x(t) x(t − τ)]. (9.122)

According to Eq. (9.121), within a multiplicative constant C, the variation of the autocorrelation Rx(τ) with the time shift τ is the same as the variation of the autocorrelation of the time-scaled version Rx(aτ) with the scaled time (aτ).

The self-similarity property extends to the power spectral density. According to Eq. (8.139a), the power spectral density has the form

Sx(f) = C/|f|^a. (9.125)

We should note that a > 0 and S(f) = S(−f).

Fractal random processes are often characterized using colors:

White Fractal (a = 0): a random process whose power spectral density, as given by Eq. (9.125), has a frequency exponent a = 0. Hence, Sx(f) = C, which means that all frequencies are present and weighted equally, just like white light.
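The periodogram estimator of Eq. (9.118) is a few lines of Python/NumPy. The sketch below applies it to an on-grid sinusoid (our own illustrative test signal): the estimate concentrates all power in the two bins of the sinusoid, and the bin average recovers the signal's mean power.

```python
import numpy as np

def periodogram(x):
    """1-D periodogram of Eq. (9.118): |DFT|^2 / N."""
    return np.abs(np.fft.fft(x)) ** 2 / len(x)

N = 256
n = np.arange(N)
x = np.cos(2 * np.pi * 8 * n / N)    # one on-grid sinusoid, bin k = 8
S = periodogram(x)
```

For a unit-amplitude cosine, the periodogram places N/4 = 64 in each of bins k = 8 and k = N − 8, and the average of S over all bins equals the mean power 1/2.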
Figure 9-6 Fractal signal w(t): note the self-similarity between (a) the entire signal and (b) a 10× expanded-scale version of the segment between t = 0.4 s and 0.5 s.
9-8.3 Wiener Process

A Wiener process w(t) is a zero-mean non-WSS random process generated by integrating a white Gaussian random process z(t):

w(t) = ∫_{−∞}^{t} z(τ) dτ. (9.131)

The Wiener process is a 1-D version of Brownian motion, which describes the motion of particles or molecules in a solution. Often w(t) is initialized with w(0) = 0.

Even though the Wiener process is not WSS (and therefore does not have a definable power spectral density), we will nonetheless proceed heuristically to obtain an expression for Sw(f). To that end, we start by converting Eq. (9.131) into the differential form

dw/dt = z(t). (9.132)

Utilizing entry #5 in Table 2-4, the Fourier transform of the system described by Eq. (9.132) is

(j2πf) W(f) = Z(f). (9.133)

We treat w(t) as a continuous-time signal, even though in reality it is a discrete-time version of w(t). The power spectral density Ŝw(f) was estimated using the periodogram

Ŝw(f) = |W(f)|² = | ∫_{−∞}^{∞} w(t) e^{−j2πft} dt |². (9.137)

Figure 9-7(b) displays a plot of Sw(f) as a function of f on a log-log scale over the range 1 < f < 400 Hz. Superimposed onto the actual spectrum (in blue) is a straight line in red whose slope in log-log scale is equivalent to 1/f², confirming the applicability of the brown fractal model described by Eq. (9.136). From the intercept in Fig. 9-7(b), it was determined that σz² = 0.25. Hence, Eq. (9.136) becomes

Ŝw(f) = 0.25/(4π²|f|²), (1 < |f| < 400).
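The integral/derivative pair of Eqs. (9.131) and (9.132) has an exact discrete counterpart: a sampled Wiener process is the running sum of white Gaussian noise, and differencing recovers the noise. A minimal Python/NumPy sketch (the noise level and length are our own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(0.0, 0.5, size=1000)   # white Gaussian noise, sigma_z = 0.5
w = np.cumsum(z)                      # discrete Wiener process, w[0] = z[0]
recovered = np.diff(w, prepend=0.0)   # discrete version of dw/dt = z(t)
```

Plotting w produces the characteristic Brownian-motion wander shown in Fig. 9-7(a), while recovered matches z to within floating-point rounding.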
Figure 9-7 (a) Wiener random process w(t), (b) spectrum Sw(f) of w(t) plotted on a log-log scale, (c) noisy signal y[n] = w[n] + ν[n], (d) denoised estimate ŵLS[n] using stochastic filtering, and (e) deterministic estimate ŵD[n], which resembles the noisy Wiener process in (c).
where we used σz² = 0.25. With the subscript x in Eq. (9.67) changed to subscript w, and σv² = 0.01, use of Eq. (9.140) in Eq. (9.67) gives

HSDN(Ω) = Sw(Ω) / (Sw(Ω) + σv²)
        = (1/(16π²Ω²)) / (1/(16π²Ω²) + 0.01)
        = 1/(1 + 1.58Ω²). (9.141)
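The filter of Eq. (9.141) can be evaluated directly to see how it shapes the spectrum, in contrast with the constant-gain deterministic filter derived below in Eq. (9.144). A short Python/NumPy sketch (the σw² and σv² values are illustrative assumptions):

```python
import numpy as np

Omega = np.linspace(0.0, np.pi, 512)

# Stochastic denoising filter, Eq. (9.141): unity at DC, rolls off with Omega
H_sdn = 1.0 / (1.0 + 1.58 * Omega ** 2)

# Deterministic denoising filter, Eq. (9.144): a constant gain, no shaping
sigma_w2, sigma_v2 = 1.0, 0.01
H_ddn = np.full_like(Omega, sigma_w2 / (sigma_w2 + sigma_v2))
```

H_sdn passes low frequencies (where the 1/Ω² signal model puts the power) and attenuates high frequencies (where the white noise dominates), which is precisely why the stochastic estimate in Fig. 9-7(d) is denoised while the deterministic estimate in Fig. 9-7(e) is not.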
9-8 1-D FRACTALS
ŴLS(Ω) = HSDN(Ω) Y(Ω), (9.142)

and

ŵLS[n] = DTFT⁻¹{ ŴLS(Ω) }, (9.143)

where Y(Ω) is the DTFT of y[n].

Application of the recipe outlined in Section 9-4.4 led to the plot shown in Fig. 9-7(d). The estimated signal ŵLS[n] bears very close resemblance to the true signal w(t) shown in Fig. 9-7(a), demonstrating the capability of the stochastic Wiener filter as a powerful tool for removing noise from noisy signals.

B. Deterministic Wiener Denoising

In the deterministic approach to noise filtering, the signal w(t) is treated as a white random process with Sw(f) = σw². Consequently, the expression for the deterministic denoising filter transfer function becomes

HDDN(Ω) = Sw(Ω)/(Sw(Ω) + σv²) = σw²/(σw² + σv²). (9.144)

Since HDDN(Ω) is no longer a function of Ω, implementation of the two steps in Eqs. (9.142) and (9.143) leads to the deterministic estimate

ŵD[n] = (σw²/(σw² + σv²)) y[n]. (9.145)

Clearly, ŵD[n] is just a scaled version of y[n], and therefore no filtering is performed. The plot shown in Fig. 9-7(e) is identical to that in Fig. 9-7(c) for y[n]. For display purposes, we chose to set the quantity σw²/(σw² + σv²) = 1.

tion is equivalent to integration in continuous time or summation in discrete time. That is, at a sampling rate of 1000 samples/s,

z[n] = 1000 Σ_{i=n−100}^{n} w[i], (9.147a)

or, equivalently,

z(t) = 1000 ∫_{t−0.1}^{t} w(τ) dτ. (9.147b)

To perform the summation (or integration), it is necessary to append w[n] with zeros for −100 ≤ n < 0 (or, equivalently, −0.1 ≤ t < 0 for w(t)). The integrated signal z[n] in Fig. 9-8(b) is a smoothed version of the original signal w[n].

Next, white Gaussian noise ν[n] was added to the system output z[n] to produce the noisy observation

y[n] = z[n] + ν[n] = hblur[n] ∗ w[n] + ν[n]. (9.148)

The noisy signal, plotted in Fig. 9-8(c), is only slightly different from z[n] of Fig. 9-8(b), because the signal-to-noise ratio is 36.9 dB, corresponding to an average signal power of about 4900 times that of the noise.

The stochastic Wiener deconvolution method uses the filter given by Eq. (9.71a), with the subscript x changed to w:

WSDC(Ω) = H*blur(Ω) Sw(Ω) / (|Hblur(Ω)|² Sw(Ω) + σv²). (9.149)

Here, Hblur(Ω) is the DTFT of hblur[n] defined by Eq. (9.146), Sw(Ω) is given by Eq. (9.140), and, from knowledge of the noise added to z[n], σv² = 0.01. Once WSDC(Ω) has been computed, w[n] can be estimated using the form of Eq. (9.73), namely
Figure 9-8 Wiener deconvolution: (a) original Wiener process w[n], (b) convolved signal z[n] = h[n] ∗ w[n], where h[n] is a rect function of duration 100 and amplitude 1000, (c) y[n] = z[n] + ν[n], where ν[n] is white Gaussian noise with S/N = 36.9 dB, (d) deconvolved reconstruction ŵSW[n] using a stochastic Wiener filter, and (e) deconvolved reconstruction ŵDW[n] using a deterministic filter.
[Figure 9-9(b): plot of log10(Ŝf(0, Ω2)) versus log10(Ω2) for π/240 ≤ Ω2 ≤ π; the average slope is −2. Figure 9-9(e): reconstructed image using a stochastic Wiener filter.]
(2) Figure 9-9(b): To ascertain the fractal character of the tree, the 2-D power spectral density Sf(Ω1, Ω2) was estimated using

Ŝf(Ω1, Ω2) = | Σ_{n=0}^{959} Σ_{m=0}^{1279} f[n, m] e^{−j(Ω1 n + Ω2 m)} |², (9.151)

and then Ŝf(0, Ω2) was plotted in Fig. 9-9(b) on a log-log scale. The average slope is −2, which means that Ŝf(0, Ω2) varies as 1/Ω2². Generalizing to 2-D,

Sf(Ω1, Ω2) = C / (Ω1² + Ω2²). (9.152)

(3) Figure 9-9(c): To simulate motion blur in the horizontal direction, the image in Fig. 9-9(a) is convolved with a (1 × 151) 2-D PSF given by

hblur[n, m] = { 1 for 0 ≤ n ≤ 150 and m = 1,
                0 otherwise. (9.153)

The resultant 960 × 1430 blurred image, given by

z[n, m] = hblur[n, m] ∗∗ f[n, m], (9.154)

is shown in Fig. 9-9(c).

(4) Figure 9-9(d): Addition of a slight amount of noise ν[n, m] to z[n, m] gives

g[n, m] = z[n, m] + ν[n, m] = hblur[n, m] ∗∗ f[n, m] + ν[n, m]. (9.155)

The slightly noisy convolved image g[n, m] is shown in Fig. 9-9(d).

(5) Figure 9-9(e): Generalizing Eq. (9.149) to 2-D gives

WSDC(Ω1, Ω2) = H*blur(Ω1, Ω2) Sf(Ω1, Ω2) / (|Hblur(Ω1, Ω2)|² Sf(Ω1, Ω2) + σv²), (9.156)

where Hblur(Ω1, Ω2) is the 2-D DSFT of hblur[n, m] and σv² is the noise variance. Application of the stochastic Wiener reconstruction recipe in 2-D gives

f̂S[n, m] = DSFT⁻¹[ WSDC(Ω1, Ω2) G(Ω1, Ω2) ], (9.157)

where G(Ω1, Ω2) is the 2-D DSFT of the observed image g[n, m]. The reconstructed image is shown in Fig. 9-9(e).

(6) Figure 9-9(f): Deterministic deconvolution involves the same steps, represented by Eqs. (9.156) and (9.157), as the stochastic deconvolution process, except for one single difference, namely that instead of using Eq. (9.152) for Sf(Ω1, Ω2), the expression used is Sf(Ω1, Ω2) = σf². The absence of the inverse frequency dependency leads to a fuzzier reconstruction than realized by the stochastic deconvolution filter.

9-10 Markov Random Fields

Medical applications, such as ultrasound imaging, MRI, and others, rely on the use of image processing tools to segment the images produced by those imaging sensors into regions of common features. The different regions may belong to different organs or different types of tissue, and the goal of the segmentation process is to facilitate the interpretation of the information contained in the images. Another name for segmentation is classification: assigning each pixel to one of a set of predefined classes on the basis of its own value as well as those of its neighbors. Similar tools are used to segment an image captured by a video camera, an infrared temperature sensor, or a weather radar system.

An important ingredient of the image segmentation process is a parameter estimation technique that models an image as a Markov random field (MRF). The MRF model assigns each image pixel f[n, m] a conditional probability density based, in part, on the values of the pixels in its immediate neighborhood. The purpose of the present section is to introduce the concept and attributes of MRFs and to demonstrate their applications through image examples.

9-10.1 1-D Markov Process

Before we delve into the 2-D case, let us first consider the 1-D case of a Markov random process x[n]. The value of x[n] is continuous, but it is sampled in time in discrete steps, generating the random vector { . . . , x[0], x[1], . . . , x[N], . . . }. In Markov language, x[n] at time n is regarded as the present, x[n + 1] is regarded as the future, and the values x[n − 1], x[n − 2], . . . , x[0] are regarded as the past. The Markov model assigns a conditional pdf to "the future value x[n + 1] based on the present value x[n] but independent of past values," which is equivalent to the mathematical statement

p(x[n + 1] | { x[0], x[1], . . . , x[n] }) = p(x[n + 1] | x[n]). (9.158)

In 2-D, the "present value of x[n]" becomes the values of the pixels in the neighborhood of pixel f[n, m], and "past values" become pixels outside that neighborhood.

Using the Markov condition encapsulated by Eq. (9.158), and
after much algebra, it can be shown that

p(x[n] | { . . . , x[N], x[N − 1], . . . , x[n + 1], x[n − 1], . . . , x[0], . . . })
  = p(x[n + 1] | x[n]) p(x[n] | x[n − 1]) / ∫ p(x′[n + 1] | x′[n]) p(x′[n] | x′[n − 1]) dx′[n], (9.159)

which states that the conditional pdf of x[n], given all other values { . . . , x[0], . . . , x[N], . . . }, is governed by the product of two conditional pdfs, one relating x[n] to its immediate past x[n − 1], and another relating x[n] to its immediate future x[n + 1]. An MRF generalizes this relationship from 1-D to 2-D using the concepts of neighborhoods and cliques.

9-10.2 Neighborhoods and Cliques

The neighborhood of a pixel at location [n, m] is denoted ∆[n, m] and consists of a set of pixels surrounding pixel [n, m], but excluding it. Figure 9-10 displays the pixel locations included in 3-, 8-, and 15-neighbor systems. Neighborhoods may also be defined in terms of cliques.

A clique is a set of locations such that any two members of the clique adjoin each other, either horizontally, vertically, or diagonally. Figure 9-11 shows 10 cliques, one of which is a self-adjoining clique. A neighborhood can be represented as a disjoint union of cliques of various types.

In a typical image, pixel neighbors are more likely to have similar intensities than distant pixels. Different organs in a medical image, or different objects in an imaged scene, tend to have smooth features, and the boundaries between them tend to have sharp transitions across them. Image texture, defined

the classes pertain to different types of tissue, while in a video image of terrain, the classes might be roads, cars, trees, etc. The segmentation procedure involves several elements, the first of which is the likelihood function of the estimation method used to implement the segmentation.

Given the general image model

g[n, m] = f[n, m] + ν[n, m], (9.160)

where g[n, m] is the observed intensity of pixel [n, m], f[n, m] is the noise-free intensity, and ν[n, m] is the additive noise, the estimated value f̂[n, m] is obtained from g[n, m] by maximizing the likelihood function. The likelihood functions for the maximum likelihood estimation (MLE) method and the maximum a posteriori probability (MAP) method are given by

MLE: p(g[n, m] | f[n, m]), (9.161a)

MAP: p(f[n, m] | g[n, m]) = p(g[n, m] | f[n, m]) p(f[n, m]) / p(g[n, m]). (9.161b)

A. MLE Estimate f̂MLE[n, m]

As shown earlier in Sections 9-2 and 9-3 for the 1-D case, the MLE maximization process entails: (a) introducing an appropriate model for the conditional probability of Eq. (9.161a), (b) taking the logarithm of the model expression, (c) computing the derivative of the likelihood function with respect to the unknown quantity f[n, m] and equating it to zero, and (d) solving for f[n, m]. The resultant value of f[n, m] is labeled f̂MLE[n, m]. The
as the spatial variability within a given “homogeneous” region, computed estimate is then used to assign pixel [n, m] to one of
might exhibit different statistical variations in different types the K classes, in accordance with a class-assignment algorithm
of regions (classes). Image segmentation (classification) algo- (introduced shortly).
rithms differ by the type of sensor used to produce the image,
the type of scene, and the intended application. However, most
of these algorithms share a common strategy, which we outline B. MAP Estimate fˆMAP [n, m]
in this and the next section. Part of the strategy is to define
the “relevant” neighborhood for a particular application, which MAP requires models for two pdfs, namely the same con-
entails defining the combination of cliques comprising such a ditional pdf used in MLE, p(g[n, m]| f [n, m]), as well as the
neighborhood. pdf p( f [n, m]). The third pdf, p(g[n, m]) in the denominator
of Eq. (9.161b), is not needed and may be set equal to a
constant. This is because it disappears in the maximization
9-10.3 Likelihood Functions procedure (taking the log, differentiating with respect to f [n, m],
and equating to zero).
The goal of image segmentation is to classify each pixel into The type of models commonly used to describe the two pdfs
one of K predefined classes. In a medical ultrasound image, in the numerator of Eq. (9.161b) are introduced shortly.
∗ For purposes of clarity, we use the symbol ∆, even though the Markov field
literature use ∂ .
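The 2-D stochastic Wiener recipe of Eqs. (9.156) and (9.157) can be sketched in Python (the book's own programs are MATLAB). The blur kernel, noise level, test image, and flat S_f used below are illustrative assumptions, not values from the text:

```python
import numpy as np

def stochastic_wiener_deconvolve(g, h_blur, S_f, sigma_v2):
    """Apply Eqs. (9.156)-(9.157): W = H* Sf / (|H|^2 Sf + sigma_v^2)."""
    N, M = g.shape
    H = np.fft.fft2(h_blur, s=(N, M))    # 2-D DSFT samples of the blur PSF
    G = np.fft.fft2(g)                   # 2-D DSFT of the observed image
    W = np.conj(H) * S_f / (np.abs(H)**2 * S_f + sigma_v2)
    return np.real(np.fft.ifft2(W * G))  # inverse DSFT, Eq. (9.157)

# Illustrative test: 3x3 boxcar blur plus mild noise on a random image
rng = np.random.default_rng(0)
f = rng.uniform(size=(64, 64))
h = np.ones((3, 3)) / 9.0
g = np.real(np.fft.ifft2(np.fft.fft2(f) * np.fft.fft2(h, s=f.shape)))
g += 0.01 * rng.standard_normal(f.shape)

# A flat (white) signal spectrum reduces this to the deterministic Wiener filter
f_hat = stochastic_wiener_deconvolve(g, h, S_f=1.0, sigma_v2=1e-3)
print(np.mean((f_hat - f)**2) < np.mean((g - f)**2))  # deblurring reduces error
```

The same function covers both filters in the text: passing a flat S_f gives the deterministic filter, while passing a fractal S_f(Ω1, Ω2) gives the stochastic one.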
324 CHAPTER 9 STOCHASTIC DENOISING AND DECONVOLUTION
Figure 9-10 Pixel locations included in the 3-, 8-, and 15-neighbor systems; panel (c) shows the 15-neighbor system within the 5 × 5 block of pixels, from [n − 2, m − 2] to [n + 2, m + 2], surrounding (and excluding) pixel [n, m].
Figure 9-13 Noisy image of four squares and associated histogram (with peaks near f1 and f2). The noise-free image had only positive values, but the addition of random noise expands the range to negative values and to larger positive values.
9-11 Application of MRF to Image Segmentation

9-11.1 Image Histogram

Consider the 320 × 320 noisy image shown in Fig. 9-14(a). Our goal is to segment the image into three classes: bone, other tissue, and background. To that end, we start by generating the histogram of the noisy image g[n, m]. The histogram displayed in Fig. 9-14(b) encompasses three clusters of pixels: a group concentrated around f[n, m] = f1 = 0, another around f[n, m] = f2 = 150, and a third around f[n, m] = f3 = 300. The three clusters correspond to dark pixels in the background part of the image, pixels of non-bone tissue, and pixels of the bones in the five toes. In the present case, the values of f1 to f3 were extracted from the image histogram, but in practice more precise values are available from calibration experiments performed with the imaging sensor.

[Figure 9-14(b): histogram of g[n, m], with clusters centered at f1 = 0, f2 = 150, and f3 = 300.]

A pixel with g[n, m] = 50 is closer to f1 = 0 than to f2 = 150, and therefore it gets classified as class 1 (black). Similarly, a pixel with g[n, m] = 100 is closest to f2 = 150, and therefore it gets classified as class 2 (tissue). The MLE segmentation process leads to the image shown in Fig. 9-14(c).

9-11.3 MAP Segmentation

Inserting Eqs. (9.167) and (9.168) into Eq. (9.161b) provides the MAP likelihood function

p(f[n, m] | g[n, m]) = p(g[n, m] | f[n, m]) p(f[n, m]) / p(g[n, m])
    = { ∏_{n=1}^{N} ∏_{m=1}^{N} (1/√(2πσ²)) e^{−(1/(2σ²))(g[n,m] − f[n,m])²} } × { (1/Z) e^{−β s[n,m]} } × 1/p(g[n, m]),   (9.170)
but the presence of the term β s[n, m] introduces the degree of similarity/dissimilarity of pixel [n, m] relative to its neighbors into the segmentation decision.

For each pixel [n, m], g[n, m] is the observed value of that pixel in the noisy image, β is a trial-and-error parameter, σ² is the image variance (usually determined through calibration tests), and s[n, m] is the dissimilarity index obtained from the MLE segmentation image. By computing z[n, m] three times, once with f[n, m] in Eq. (9.172) set equal to f1, another time with f[n, m] set equal to f2, and a third time with f[n, m] set equal to f3, we obtain three values for z[n, m]. MAP segmentation selects the smallest of the three absolute values of z[n, m] and assigns that pixel to the corresponding class. The outcome is shown in Fig. 9-14(d).

Computationally, a commonly used algorithm for realizing the MAP segmentation is the iterated conditional modes (ICM) algorithm. The algorithm repeats the segmentation process iteratively until it reaches a defined threshold.

Example 9-3: Four-Square Image

Apply the ICM algorithm with β = 300 to segment the 64 × 64 noisy binary image shown in Fig. 9-15(a). The noise level is characterized by a variance σ² = 0.25.

Solution: We start by computing and displaying the image histogram shown in Fig. 9-15(b). Based on the histogram, we select f1 = 1.5 for the darker class and f2 = 2.5 for the brighter class.

Next, we apply MLE segmentation. We assign each pixel [n, m] the value f1 or f2, depending on which one of them is closest in value to the pixel value g[n, m]. The result is displayed in Fig. 9-15(c).

[Figure 9-15: (a) noisy image; (b) histogram p(g[n, m]); (c) MLE segmentation; (d) MAP segmentation; (e) MAP segmentation after 9 iterations.]

Finally, we apply MAP segmentation using the ICM algorithm to obtain the image in Fig. 9-15(d). The segmentation can be improved through iterative repetition of the process. The first iteration generates an MLE image, computes s[n, m] for each pixel, and then computes a MAP image. Each subsequent iteration computes a new set of values for s[n, m] and a new MAP image. The image in Fig. 9-15(e) is the result of 9 iterations. Except for a few misidentified pixels, mostly along the boundaries of the four squares, the MAP segmentation procedure provides very good discrimination between the white and black squares.

Concept Question 9-11: What application of Markov random fields was covered in Section 9-11?
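Example 9-3's recipe (MLE labeling followed by iterated conditional modes) can be sketched in Python (the book's programs are MATLAB). The cost form, the 4-neighbor dissimilarity count s[n, m], and the value β = 0.5 below are illustrative assumptions chosen to fit this cost scaling; they are not the book's exact Eq. (9.172) or its β = 300:

```python
import numpy as np

def mle_segment(g, means):
    """Assign each pixel to the class mean nearest its observed value."""
    d = np.abs(g[..., None] - np.asarray(means))  # distance to each class mean
    return np.argmin(d, axis=-1)

def icm_segment(g, means, beta, sigma2, n_iter=9):
    """Iterated conditional modes: start from MLE labels, then relabel each
    pixel by minimizing a data term plus beta times a neighbor-disagreement term."""
    labels = mle_segment(g, means)
    for _ in range(n_iter):
        cost = []
        for k, fk in enumerate(means):
            s = np.zeros_like(g)            # count of disagreeing 4-neighbors
            for axis, shift in [(0, 1), (0, -1), (1, 1), (1, -1)]:
                s += np.roll(labels, shift, axis=axis) != k   # wraps at edges
            cost.append((g - fk) ** 2 / (2 * sigma2) + beta * s)
        labels = np.argmin(np.stack(cost), axis=0)
    return labels

# Illustrative test image: two bright squares on a dark background, plus noise
rng = np.random.default_rng(1)
truth = np.zeros((64, 64), dtype=int)
truth[8:28, 8:28] = truth[36:56, 36:56] = 1
g = np.where(truth == 1, 2.5, 1.5) + np.sqrt(0.25) * rng.standard_normal((64, 64))

labels = icm_segment(g, means=[1.5, 2.5], beta=0.5, sigma2=0.25)
print((labels == truth).mean())             # fraction of correctly labeled pixels
```

As in the example, the first pass is pure MLE, and each ICM sweep then pulls isolated misclassified pixels toward the class of their neighbors.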
Summary

Concepts

• The goal of estimation is to estimate an unknown x from observation yobs of a random variable or process y, using a conditional pdf p(y|x) which comes from a model, and possibly an a priori pdf p(x) for x.
• The maximum likelihood estimate x̂MLE(yobs) is the value of x that maximizes the likelihood function p(yobs|x), or equivalently its logarithm.
• The maximum a posteriori estimate x̂MAP(yobs) is the value of x that maximizes the a posteriori pdf p(x|yobs), or equivalently its logarithm. p(x|yobs) is computed from the likelihood function p(yobs|x) and the a priori pdf p(x) using Bayes's rule (see below).
• The least-squares estimate x̂LS(yobs) is the value of x that minimizes the mean square error E[(x − x̂LS(y))²].
• The 2-D versions of these estimators are generalizations of the 1-D ones.
• Using a white power spectral density makes the stochastic Wiener filter reduce to the deterministic Wiener filter.
• A fractal model of power spectral density greatly improves performance.
• A Markov random field (MRF) models each image pixel as a random variable conditionally dependent on its neighboring pixels.
• An MRF model and the ICM algorithm can be used to segment an image.

Mathematical Formulae

Bayes's rule
p(x|y) = p(y|x) p(x) / p(y)

MAP log-likelihood
log p(x|yobs) = log p(yobs|x) + log p(x) − log p(yobs)

Least-squares estimate
x̂LS(yobs) = E[x | y = yobs] = ∫ x′ p(yobs|x′) p(x′) dx′ / ∫ p(yobs|x′) p(x′) dx′

Gaussian least-squares estimate
x̂LS(yobs) = x̄ + K_{x,y} K_y⁻¹ (yobs − ȳ)

Sample mean
x̂LS(yobs) = (1/N) Σ_{n=1}^{N} yobs[n]

Least-squares estimation orthogonality principle
E[(x − x̂LS(y)) yᵀ] = 0

1-D Linear least-squares estimator
x̂LLSE[n] = h[n] ∗ yobs[n]

1-D Deterministic Wiener deconvolution filter
x̂(Ω) = H*_blur(Ω) yobs(Ω) / ( |H_blur(Ω)|² + σv² )

1-D Stochastic Wiener deconvolution filter
x̂(Ω) = H*_blur(Ω) Sx(Ω) yobs(Ω) / ( |H_blur(Ω)|² Sx(Ω) + λ² )

Fractal model
Sx(f) = c / |f|^a, where a = 1 or 2

Gibbs distribution of a Markov random field
p({f[n, m]}) = (1/Z) e^{−U[n,m]}

Important Terms  Provide definitions or explain the meaning of the following terms: Bayes's rule, deterministic Wiener filter, fractal power spectral density, Gibbs distribution, ICM algorithm, least-squares, Markov random field, maximum likelihood, maximum a posteriori, sample mean, stochastic Wiener filter.
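The bullet stating that a white power spectral density reduces the stochastic Wiener filter to the deterministic one follows directly from the two 1-D formulas above: set Sx(Ω) = 1 and identify λ² with σv². A quick numeric check, using an arbitrary blur response of my own choosing:

```python
import numpy as np

Omega = np.linspace(-np.pi, np.pi, 257)
H = np.exp(-1j * Omega) * (1 + 0.5 * np.cos(Omega))  # an arbitrary blur response
sigma_v2 = 0.1

# Deterministic filter: H* / (|H|^2 + sigma_v^2)
W_det = np.conj(H) / (np.abs(H)**2 + sigma_v2)

# Stochastic filter with white (flat) Sx = 1 and lambda^2 = sigma_v^2
Sx = np.ones_like(Omega)
W_sto = np.conj(H) * Sx / (np.abs(H)**2 * Sx + sigma_v2)

print(np.allclose(W_det, W_sto))  # True: the two filters coincide
```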
PROBLEMS 331
so that P[x = 1/4] = 1/3 and P[x = 1/2] = 2/3, compute an expression for the least-squares estimator x̂LS(n0).
(b) If N = 3 and n0 = 2, compute the least-squares estimate x̂LS(2).

9.3 An exponential random variable y has pdf

p(y|x) = x e^{−xy} for y > 0, and 0 for y < 0.

We observe five independent values y0 = {y1, y2, y3, y4, y5} of random variable y. x is an unknown constant. Compute the MLE x̂MLE({y1, y2, y3, y4, y5}) of x.

9.4 An exponential random variable y has pdf

p(y|x) = x e^{−xy} for y > 0, and 0 for y < 0.

(a) If x is an unknown constant, compute the MLE estimate x̂MLE(y0).

9.6 y(t) is a zero-mean WSS Gaussian random process with autocorrelation function Ry(τ) = e^{−|τ|}.
(a) Let y = {y(1), y(2), y(3)}. Determine the joint pdf p(y).
(b) Compute the least-squares estimate ŷ(3)LS(y(2) = 6).
(c) Compute the least-squares estimate ŷ(3)LS(y(2) = 6, y(1) = 4).

9.7 x(t) is a zero-mean WSS white Gaussian random process with autocorrelation Rx(τ) = 4δ(τ). x(t) is input into an LTI system with impulse response

h(t) = 3e^{−2t} for t > 0, and 0 for t < 0.

(a) Compute the autocorrelation Ry(τ) of the output random process y(t).
(b) Let y = {y(3), y(7), y(9)}. Determine the joint pdf p(y).
(c) Compute the least-squares estimate ŷ(7)LS(y(5) = 6).
(d) Compute the least-squares estimate ŷ(7)LS(y(5) = 6, y(3) = 4).

9.8 x[n] is a zero-mean non-WSS random process with autocorrelation Rx[i, j] = min[i, j]. Let i > j > k. Show that x̂[i]LS(x[j], x[k]) = x̂[i]LS(x[j]), so x[k] is irrelevant.

9.9 x[n] is an IID Gaussian random process with x[n] ∼ N(m, s), where m = E[x[n]] and s = σ²_{x[n]}. We are given observations x0 = {x0[1], x0[2], …, x0[N]} of {x[1], x[2], …, x[N]}. The goal is to compute the MLE estimates m̂(x0) and ŝ(x0) of mean m and variance s.

Section 9-7: Spectral Estimation

9.10 Section 9-7 showed that discrete-space fractal images have power spectral densities

Sf(Ω1, Ω2) = C / (Ω1² + Ω2²),   0 < Ωmin < |Ω1|, |Ω2| < Ωmax < π,

for some Ωmin and Ωmax. This problem suggests many real-world images also have such power spectral densities. The following program uses a periodogram to estimate the power spectral density of the image in ????.mat and fits a line to the log-log plot of Sf(Ω1, 0) versus Ω1 for Ωmin = 0.1π and Ωmax = 0.9π. [Here "????" refers to the .mat files listed below.]

clear; load ????.mat; M = size(X,1);
K = round(M/20);
FX = abs(fft2(X))/M; FX = log10(FX.*FX);
Omega1 = log10(2*pi*[K:M/2-K]/M);
P = polyfit(Omega1, FX(1,K:M/2-K), 1);
Y = polyval(P, Omega1);
subplot(211), plot(Omega1, FX(1,K:M/2-K), Omega1, Y, 'r'), axis tight, grid on

Run this program for the images contained in the following .mat files: (a) clown.mat; (b) letters.mat; (c) sar.mat; (d) mri.mat. What do these plots suggest about the power spectral densities of the images?

Section 9-6: 2-D Estimation Problems

Deblurring due to an out-of-focus camera can be modeled crudely as 2-D convolution with a disk-shaped PSF

h[n, m] = 1 for m² + n² < R², and 0 for m² + n² > R²,

for some radius of R pixels. The program srefocus.m convolves the image f[n, m] in the file ????.mat with h[n, m], adds zero-mean white Gaussian noise with variance σv² to the blurred image, and deconvolves the blurred image using each of the following two Wiener filters:

• the deterministic Wiener filter uses power spectral density Sf(Ω1, Ω2) = C;
• the stochastic Wiener filter uses power spectral density Sf(Ω1, Ω2) = C/(Ω1² + Ω2²).

Both filters depend only on the reciprocal signal-to-noise-type ratio σv²/C. In practice, neither C nor σv² is known, so different values of σv²/C would be tried. Use σv²/C = 1 here. The program srefocus.m will be used in the next eight problems. [Here "????" refers to the .mat files used in Problems 9.11 through 9.18.]

9.11 Edit program srefocus.m to deconvolve the image in clown.mat. Use σv²/C = 1.

9.12 Edit program srefocus.m to deconvolve the image in letters.mat. Use σv²/C = 1.

9.13 Edit program srefocus.m to deconvolve the image in xray.mat. Use σv²/C = 1.

9.14 Edit program srefocus.m to deconvolve the image in mri.mat. Use σv²/C = 1. Change the power spectral density to Sf(Ω1, Ω2) = C/(Ω1² + Ω2²)² (uncomment a line).

9.15 Edit program srefocus.m so it merely denoises the noisy clown image. Use σv²/C = 1 and h[n, m] = δ[n, m] (to make a denoising, not a deconvolution, problem) and

• the deterministic Wiener filter uses the power spectral density Sf(Ω1, Ω2) = C;
• the stochastic Wiener filter uses the power spectral density Sf(Ω1, Ω2) = C/(Ω1² + Ω2²).

9.16 Edit program srefocus.m so it merely denoises the noisy letters image. Use σv²/C = 1 and h[n, m] = δ[n, m] (to make a denoising, not a deconvolution, problem) and
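The processing that srefocus.m is described as performing (disk-PSF blur, additive noise, then the two Wiener filters) can be mimicked in a short Python sketch. The program structure, test image, and grid size below are my own assumptions; only the two power spectral densities and the ratio σv²/C = 1 come from the text:

```python
import numpy as np

def disk_psf(R, shape):
    """Disk-shaped PSF: 1 inside radius R, 0 outside, normalized to unit sum."""
    n = np.arange(shape[0])[:, None] - shape[0] // 2
    m = np.arange(shape[1])[None, :] - shape[1] // 2
    h = (n**2 + m**2 < R**2).astype(float)
    return np.fft.ifftshift(h / h.sum())   # origin-centered layout for the FFT

def wiener_refocus(g, H, S_f, noise_over_C):
    """Wiener deconvolution with signal PSD S_f and ratio sigma_v^2 / C."""
    G = np.fft.fft2(g)
    W = np.conj(H) * S_f / (np.abs(H)**2 * S_f + noise_over_C)
    return np.real(np.fft.ifft2(W * G))

shape = (64, 64)
rng = np.random.default_rng(2)
f = rng.uniform(size=shape)
H = np.fft.fft2(disk_psf(R=4, shape=shape))
g = np.real(np.fft.ifft2(np.fft.fft2(f) * H)) + 0.01 * rng.standard_normal(shape)

# Deterministic filter: flat S_f.  Stochastic filter: S_f = 1/(O1^2 + O2^2).
O1 = 2 * np.pi * np.fft.fftfreq(shape[0])[:, None]
O2 = 2 * np.pi * np.fft.fftfreq(shape[1])[None, :]
S_frac = 1.0 / np.maximum(O1**2 + O2**2, 1e-6)   # avoid divide-by-zero at DC
f_det = wiener_refocus(g, H, 1.0, 1.0)
f_sto = wiener_refocus(g, H, S_frac, 1.0)
```

Swapping S_f between the flat and fractal forms is the only difference between the two filters, mirroring the single line to uncomment in srefocus.m.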
Contents
Overview, 335
10-1 Color Systems, 335
10-2 Histogram Equalization and Edge Detection, 340
10-3 Color-Image Deblurring, 343
10-4 Denoising Color Images, 346
Problems, 351

Objectives
Learn to:
■ Apply image enhancement (histogram equalization and edge detection) to color images.

[Chapter-opener figure: (a) blurred Christmas tree; (b) 1-D spectrum log(|GR[k1, 0]| + 1) (MATLAB).]
336 CHAPTER 10 COLOR IMAGE PROCESSING
[Figure: block diagram of a color television camera. The lens image is split by a semi-mirror and a mirror; a luminance matrix forms the luminance signal, a video encoder forms the chrominance signal, and a signal combiner feeds the UHF transmitter.]
identical procedures (2-D DFT and Wiener filter) for each RGB component. Image enhancement, in particular edge detection and histogram equalization, is applied to the YIQ representation rather than the RGB representation (the two representations are related by the linear transformation of Eq. (10.4)), in order to ensure that different colors are not enhanced differently. Denoising can use either the RGB or the YIQ representation. All of these tools are illustrated in the remainder of this chapter.

To acquire a color image, most digital cameras and scanners acquire red, green, and blue images separately. Scanners include an (M × 3) array of CCD sensors with an (M × 3) array of color filters in front of them, where M is the width of the scanner in pixels. One row of filters is red, another row of filters is green, and another row of filters is blue (see Fig. 1-6). The image is scanned line by line, and the mth line generates fR[n, m], fG[n, m], and fB[n, m].

B. Displaying Color Images

As noted above, color photography was developed by filtering the lens image using red, green, and blue filters, and then developing in a photo lab three separate images in these colors. Color television with cathode-ray tubes (CRTs) uses three electron guns to "paint" a phosphor screen with red, green, and blue images. A mask behind the screen separates the outputs of the guns, so that the "red" gun illuminates phosphors that glow red when irradiated with electrons, and similarly for green and blue. The phosphors are in groups of three, called pixels, one for each color.

Color televisions and computer monitors with LED displays use three sets of LEDs to depict the red, green, and blue images. Each pixel in the display consists of three closely spaced LEDs, one for each color.

C. RGB

The RGB color system represents color images using Eq. (10.1). It is called an additive color system because non-RGB colors are created by taking a linear combination of { fR[n, m], fG[n, m], fB[n, m] }. The additive process is illustrated by the color cube shown in Fig. 10-3. The coordinates are the weights applied to { fR[n, m], fG[n, m], fB[n, m] } to produce the color shown in the cube. White, which has coordinates (1, 1, 1), is obtained by combining red, green, and blue in equal strengths. The non-RGB colors are discussed in Subsection 10-1.2, where we discuss the CMYK color scheme.

Since TV screens and monitors are dark when they are not displaying an image, it is natural to create color images on them by displaying weighted sums (i.e., linear combinations) of { fR[n, m], fG[n, m], fB[n, m] }. The lighting used in a dark theater for a play is another example of an additive color system. Web browsers display web pages using an additive
[Figure: color-mixing diagram with the red, green, and blue primaries and their pairwise combinations magenta, yellow, and cyan.]
have to deposit both red and green ink onto the paper. When illuminated by white light, the red ink will absorb the green and blue components of the light and the green ink will absorb the red and blue components, leaving no light to be reflected by the printed page. Hence, the yellow tomato will appear black instead of yellow. Because the ink used in inkjet printers and the powder used in laser toners operate by subtracting their colors from white, they are more compatible with the subtractive CMYK color system than with the additive RGB system.

B. CMYK Printers

In the subtractive CMYK color system (Fig. 10-5), red can be created by a mixture of yellow and magenta inks, green by a mixture of yellow and cyan inks, and so on. Using CMY ink colors solves the problem we encountered earlier with the yellow tomato. We can now print many colors, whether pure or not. Obviously, a yellow tomato can be printed using yellow ink, but what about a perfectly red tomato? Since yellow consists of red and green, and magenta consists of red and blue, we can use the combination of yellow and magenta inks to print it, since both colors include red. When illuminated by white light, the yellow ink absorbs blue and the magenta ink absorbs green, leaving only the red component of the white light to be reflected.

In CMYK, a color image is depicted as

f[n, m] = { fC[n, m], fM[n, m], fY[n, m], fK[n, m] }.   (10.2)

Even though the symbol fY[n, m] is used in both the YIQ and CMYK color systems, this is the standard notation used in color schemes. The meaning of fY[n, m] (yellow or luminance) should be obvious from the context.

[Figure 10-5 Relation of RGB colors to CMY colors: cyan, magenta, and yellow positioned opposite red, green, and blue.]

Black (K) is included in CMYK because, while Fig. 10-4 shows that cyan + magenta + yellow = black, such an addition of inks requires much ink to produce black, which is a common color in images. Hence, black is included as a separate color in printing. Because CMYK consists of three subtractive colors (yellow, cyan, and magenta) plus black, it is often referred to as a four-color printing process. In CMYK, silver is represented using the weights {0, 0, 0, 0.25}, and gold by {0, 0, 0.88, 0.15}. The exact relation between the RGB and CMYK color systems is more complicated than we have presented, since the inks used in printing are not exactly cyan-, magenta-, and yellow-colored. We should note that CMYK is not used in image processing, only in image printing.

10-1.4 YIQ Color System

The Y (luminance), I (in-phase), Q (quadrature) color system was developed for television by the NTSC (National Television System Committee) in 1953. It differs from the RGB and CMYK color systems in that intensity is decoupled from color. The reason for decoupling intensity from color was so that images transmitted in color could easily be displayed on black-and-white televisions, which were still in common use throughout the 1950s. This feature also makes the YIQ color system useful for histogram equalization and edge detection, because image intensity (luminance) can be varied without distorting colors, as we show later in Section 10-2.

The YIQ depiction of a color image f[n, m] is given by the matrix transformation of Eq. (10.4). The luminance component fY[n, m] is the intensity of f[n, m] at location [n, m]. The chrominance (color) components { fI[n, m], fQ[n, m] } depict the color of f[n, m] at location [n, m] using Fig. 10-6.

The horizontal axis (blue to orange) ranges over the colors to which the eye is most sensitive. The vertical axis (purple to green) ranges over colors to which the eye is less sensitive. Accordingly, for analog transmission of television signals, less bandwidth is needed to transmit the quadrature-modulated signal fQ[n, m] than the in-phase (unmodulated) signal fI[n, m], with most of the transmission bandwidth reserved for the luminance signal fY[n, m].

The chrominance components { fI[n, m], fQ[n, m] } can be regarded as rectangular color coordinates. In polar coordinates,
◮ In this book we will transform from the RGB color system to the YIQ color system using Eq. (10.4), then perform image enhancement using histogram equalization on fY[n, m], and convert the result back to the RGB color system using the inverse of Eq. (10.4). ◭

Answer: From Eq. (10.4), fY[n, m] = 1 and fI[n, m] = fQ[n, m] = 0 for all [n, m], as discussed below Eq. (10.4).

The two edge images are then combined to generate the gradient image g[n, m]:

g[n, m] = √( dH[n, m]² + dV[n, m]² ).   (10.7c)
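The recipe in the highlighted note above (RGB to YIQ via Eq. (10.4), equalize fY only, then back to RGB) can be sketched in Python; the minimal histeq routine and the random test image are illustrative stand-ins for the book's MATLAB tools:

```python
import numpy as np

# NTSC RGB <-> YIQ matrices; the forward matrix is Eq. (10.4)
A = np.array([[0.299,  0.587,  0.114],
              [0.596, -0.275, -0.321],
              [0.212, -0.523,  0.311]])
A_inv = np.linalg.inv(A)

def histeq(y, levels=256):
    """Minimal histogram equalization of a [0, 1] luminance image."""
    q = np.clip((y * (levels - 1)).astype(int), 0, levels - 1)
    cdf = np.cumsum(np.bincount(q.ravel(), minlength=levels))
    return (cdf / cdf[-1])[q]

def equalize_color(rgb):
    """Equalize luminance only, so different colors are not enhanced differently."""
    yiq = np.einsum('ij,nmj->nmi', A, rgb)        # RGB -> YIQ per pixel
    yiq[..., 0] = histeq(np.clip(yiq[..., 0], 0, 1))
    return np.einsum('ij,nmj->nmi', A_inv, yiq)   # back to RGB

rng = np.random.default_rng(3)
out = equalize_color(rng.uniform(size=(32, 32, 3)))
```

Because only the Y channel is modified, the I and Q chrominance coordinates of every pixel are preserved exactly, which is the point of the note.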
Concept Question 10-2: Why does applying histogram equalization using the RGB color system alter colors?

Exercise 10-3: Should gamma transformation be applied to images using the RGB or YIQ format?
Answer: YIQ format, because gamma transformation is nonlinear; applying it to an RGB image would distort its colors.

Exercise 10-4: Should linear transformation be applied to images using the RGB or YIQ color system?
Answer: RGB, since this is the color system used to display the transformed image. Applying a linear transformation to YIQ images may result in gR[n, m] ≥ gmax, gG[n, m] ≥ gmax, or gB[n, m] ≥ gmax in Eq. (5.1).

Hence, all three of the YIQ components are also blurred, so there is no particular advantage to transforming the image from RGB to YIQ format. In practice, deblurring (deconvolution) of a color image is performed separately on each of gR[n, m], gG[n, m], and gB[n, m] to obtain the original image components fR[n, m], fG[n, m], and fB[n, m]. The deblurring process is illustrated through two examples.

Example 10-2: Deblurring Checkerboard Image

Consider the 8 × 8 color checkerboard image shown in Fig. 10-10(a), which we label f[n, m], as it represents an original unblurred image of the checkerboard. If the original scene was imaged by a sensor characterized by a circular PSF given by

h[n, m] = 1 for √(n² + m²) ≤ 4, and 0 for √(n² + m²) > 4,   (10.11)

(a) generate the convolved image

g[n, m] = f[n, m] ∗∗ h[n, m],   (10.12)

and (b) apply deconvolution to reconstruct the original image.

(2) Since convolution in the spatial domain is equivalent to multiplication in the spatial frequency domain using (16 × 16) 2-D DFTs:

GR[k1, k2] = FR[k1, k2] H[k1, k2],   (10.14)

and similar expressions apply to the green and blue images.
Figure 10-10 Simulation of blurring and deblurring processes: (a) (8 × 8) original checkerboard RGB image, (b) extent of the PSF h[n, m] when centered at pixel (5, 5), (c) (16 × 16) checkerboard image blurred by the system PSF, and (d) (16 × 16) reconstructed image after removal of the blurring.
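The blur-and-divide steps of Example 10-2 (Eqs. (10.14) and (10.15)) can be sketched for one channel in Python. The single-channel checkerboard and the guard against division by zero are my own illustrative choices; the text reports that the blurring is completely removed:

```python
import numpy as np

# One channel of an (8 x 8) checkerboard, and the circular PSF of Eq. (10.11)
# laid out on a (16 x 16) grid with wrap-around (FFT-origin) coordinates
f = (np.indices((8, 8)).sum(axis=0) % 2).astype(float)
n = np.fft.fftfreq(16, d=1/16)[:, None]     # 0..7, -8..-1
m = np.fft.fftfreq(16, d=1/16)[None, :]
h = (n**2 + m**2 <= 16).astype(float)

# Eq. (10.14): blurring as multiplication of (16 x 16) DFTs
F = np.fft.fft2(f, s=(16, 16))              # zero-pads f to 16 x 16
H = np.fft.fft2(h)
G = F * H

# Eq. (10.15): per-bin division recovers the padded original where H != 0
H_safe = np.where(np.abs(H) > 1e-9, H, 1.0)
F_rec = np.where(np.abs(H) > 1e-9, G / H_safe, 0.0)
f_rec = np.real(np.fft.ifft2(F_rec))

err = np.abs(f_rec[:8, :8] - f).max()
print(err)   # near zero when H has no nulls, matching the reported exact removal
```

The reconstructed array is the (16 × 16) zero-padded version of the (8 × 8) original, as the example notes for Fig. 10-10(d).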
Hence, for the red channel we use the computed results to obtain

FR[k1, k2] = GR[k1, k2] / H[k1, k2],   (10.15)

and similar expressions apply to the other two colors.

(3) Compute fR[n, m] for all pixels by inverse DFT. The deconvolved image is shown in Fig. 10-10(d). The blurring caused by the sensor has been completely removed. The reconstructed image is a (16 × 16) zero-padded version of the original (8 × 8) image.

and

Ω2 = 2π k2 / 691,   (10.18b)

to obtain GR[k1, k2]. Next, we plot the horizontal profile of |GR[k1, k2]| as a function of k1 at k2 = 0. For display purposes, we offset |GR[k1, k2]| by 1 and plot the logarithm of the sum:

log(|GR[k1, 0]| + 1).   (10.19)

The MATLAB-generated plot of Eq. (10.19) is shown in Fig. 10-11(b). It exhibits a periodic sequence of sharp nulls corresponding to the zeros of sin[Ω1(N + 1)/2] in the spatial frequency response H(Ω1, Ω2) of the PSF that caused the blurring of the image. The first null is at MATLAB index 15, which corresponds to

k1 = 15 − 1 = 14.

Since the first null of the sine function occurs when its argument is ±π, it follows that

π = Ω1 (N + 1)/2 = (2π k1 / 1162)(N + 1)/2 = (2π × 14 / 1162)(N + 1)/2,

which leads to N = 82.

(3) With H(Ω1, Ω2) fully specified, we now apply the Wiener filter of Eq. (6.50) to the red-color spectrum to obtain the deblurred version

FR(Ω1, Ω2) = GR(Ω1, Ω2) H*(Ω1, Ω2) / ( |H(Ω1, Ω2)|² + λ² )
    = GR(Ω1, Ω2) [ T sin(Ω1(N + 1)/2) / (N sin(Ω1/2)) ] e^{jΩ1 N/2} / ( [ T sin(Ω1(N + 1)/2) / (N sin(Ω1/2)) ]² + λ² ).   (10.20)

Similar procedures are applied to the two other channels.

(4) Application of the inverse DFT to each of the three color channels yields fR[n, m], fG[n, m], and fB[n, m], the combination of which yields the deblurred color image in Fig. 10-11(c). The motion-caused blurring has essentially been removed. The only artifact is the black band at the right-hand side of the image, which is due to the zero-padding used in the DFT (compare with Fig. 10-10(d)).

Exercise 10-7: If the length of the blur were N + 1 = 6, at what Ω1 values would H(Ω1, Ω2) = 0?
Answer: The numerator of Eq. (10.17) is zero when sin(Ω1(N + 1)/2) = 0, which occurs when Ω1(N + 1)/2 is a multiple of π, or equivalently when Ω1 = ±kπ/3 for integers k. This is independent of the size of the image.

10-4 Denoising Color Images

In Chapter 7, we demonstrated how the discrete wavelet transform can be used to denoise an image by applying the combination of thresholding and shrinkage to the noisy image. The same approach is equally applicable to color images. The recipe outlined in Section 7-8.2 can be applied to each of the three RGB channels separately, and then the three modified images are combined to form the denoised color image. Since noise had been added to each RGB component separately, the denoising procedure also is applied to each RGB component separately. If preserving color is paramount, denoising can be applied instead to the Y component of YIQ.

We illustrate the effectiveness of the denoising technique through the following two examples.

Example 10-4: Denoising American Flag Image

To the original image shown in Fig. 10-12(a), a zero-mean white Gaussian noise random field with σv = 0.5 was added to each of the RGB components. The resultant noisy image is shown in Fig. 10-12(b), and the associated signal-to-noise ratios are 3.32 dB for the red channel, 1.71 dB for the green, and 2.36 dB for the blue. Application of the 2-D Haar transform method outlined in Section 7-8 with a threshold/shrinkage factor λ = 3 led to the denoised image shown in Fig. 10-12(c). Much of the noise has been removed.
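The per-channel threshold/shrinkage recipe applied in Example 10-4 can be sketched with a single-level 2-D Haar transform in Python. The soft-threshold rule, test image, and λ = 1 below are illustrative assumptions rather than the exact method of Section 7-8:

```python
import numpy as np

def haar2(x):
    """One level of the orthonormal 2-D Haar transform: average + 3 detail bands."""
    a = (x[0::2, 0::2] + x[1::2, 0::2] + x[0::2, 1::2] + x[1::2, 1::2]) / 2
    h = (x[0::2, 0::2] - x[1::2, 0::2] + x[0::2, 1::2] - x[1::2, 1::2]) / 2
    v = (x[0::2, 0::2] + x[1::2, 0::2] - x[0::2, 1::2] - x[1::2, 1::2]) / 2
    d = (x[0::2, 0::2] - x[1::2, 0::2] - x[0::2, 1::2] + x[1::2, 1::2]) / 2
    return a, h, v, d

def ihaar2(a, h, v, d):
    """Exact inverse of haar2."""
    x = np.empty((2 * a.shape[0], 2 * a.shape[1]))
    x[0::2, 0::2] = (a + h + v + d) / 2
    x[1::2, 0::2] = (a - h + v - d) / 2
    x[0::2, 1::2] = (a + h - v - d) / 2
    x[1::2, 1::2] = (a - h - v - d) / 2
    return x

def soft(w, lam):
    """Threshold/shrinkage: zero small coefficients, shrink the rest."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def denoise_channel(g, lam):
    """Shrink only the detail subbands; apply this to each RGB channel."""
    a, h, v, d = haar2(g)
    return ihaar2(a, soft(h, lam), soft(v, lam), soft(d, lam))

# Blocky test channel plus sigma_v = 0.5 noise, as in the example
rng = np.random.default_rng(4)
f = np.kron(rng.integers(0, 2, (8, 8)).astype(float), np.ones((8, 8)))
g = f + 0.5 * rng.standard_normal(f.shape)
f_hat = denoise_channel(g, lam=1.0)
print(np.mean((f_hat - f)**2) < np.mean((g - f)**2))  # denoising reduces MSE
```

Running denoise_channel on each of the R, G, and B channels and recombining them reproduces the structure of the example's per-channel procedure.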
Figure 10-13 Original image and associated histograms of its R, G, and B channels: (a) original toucan image; (b) histogram of the red (R) channel; (c) histogram of the green (G) channel; (d) histogram of the blue (B) channel.
Figure 10-14 The RGB-equalized method generates a brighter image, but does not preserve color balance: (a) RGB-equalized image; (b) histogram of the red (R) channel; (c) histogram of the green (G) channel; (d) histogram of the blue (B) channel.
Summary

Concepts

• Color images have three components: red, green, and blue, each of which is a separate 2-D function. This RGB color system represents how images are acquired and displayed.
• Other color systems include the CMYK system, used for color printing, and the YIQ system, used for image enhancement, which is applied only to the Y (luminance) component.
• Other colors can be represented as a linear combination of red, green, and blue, as depicted in the color cube.
• Image restoration, such as denoising and deconvolution, uses the RGB color system, since each component is blurred or has noise added to it directly. Image enhancement, such as histogram equalization and edge detection, uses the YIQ system, since use of the RGB system can result in distortion of colors.
Figure 10-15 The YIQ-equalized method generates a partially brighter image, but does preserve color balance: (a) YIQ-equalized image; (b) histogram of the red (R) channel; (c) histogram of the green (G) channel; (d) histogram of the blue (B) channel.
Mathematical Formulae
Components of a color image:
f[n, m] = { fR[n, m], fG[n, m], fB[n, m] }

RGB to grayscale:
fgray[n, m] = 0.299 fR[n, m] + 0.587 fG[n, m] + 0.114 fB[n, m]

Relation between components of the RGB and YIQ color systems:
[ fY[n, m] ]   [ 0.299   0.587   0.114 ] [ fR[n, m] ]
[ fI[n, m] ] = [ 0.596  −0.275  −0.321 ] [ fG[n, m] ]
[ fQ[n, m] ]   [ 0.212  −0.523   0.311 ] [ fB[n, m] ]

MATLAB commands for color images: imread, imagesc
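The grayscale and RGB-to-YIQ formulas above can be sketched in a few lines. The book's programs are in MATLAB; this is an illustrative NumPy translation, and the function names are ours:

```python
import numpy as np

# RGB -> YIQ matrix from the table above; its first row holds the
# grayscale (luminance) weights 0.299, 0.587, 0.114.
A = np.array([[0.299, 0.587, 0.114],
              [0.596, -0.275, -0.321],
              [0.212, -0.523, 0.311]])

def rgb_to_yiq(f_rgb):
    """Apply the 3x3 transformation to an (M, N, 3) RGB image."""
    return f_rgb @ A.T

def rgb_to_gray(f_rgb):
    """f_gray = 0.299 fR + 0.587 fG + 0.114 fB."""
    return f_rgb @ A[0]

white = np.ones((2, 2, 3))     # pure white image
gray = rgb_to_gray(white)      # every pixel -> 1.0
yiq = rgb_to_yiq(white)        # Y = 1, I = 0, Q = 0 at every pixel
```

Note that the luminance weights sum to exactly 1, and the I and Q rows each sum to 0, so a pure white image maps to Y = 1 with zero chrominance.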
Important Terms Provide definitions or explain the meaning of the following terms:
additive color scheme CMYK color scheme luminance subtractive color scheme
chrominance color cube RGB color scheme YIQ color scheme
PROBLEMS 351
... blur and then deblur the (512 × 512) image in flag.mat. Uncomment a line in refocuscolor.m and use λ = 0.01 and disk radius R = 29.

10.12 Refocus an out-of-focus image. An out-of-focus image can be modeled as the image convolved with a disk-shaped PSF. The program refocuscolor.m convolves an image with a disk PSF and then deconvolves. Use refocuscolor.m to blur and then deblur the plumage image in plumage.mat. Uncomment a line in refocuscolor.m and use λ = 0.01 and disk radius R = 18.

Section 10-4: Denoising Color Images

10.13 Use Haar transform threshold and shrinkage program haardenoisecolor.m to denoise each RGB component of the (8 × 8) checkerboard image in checker.mat. Zero-mean white Gaussian noise with variance σ² is added to each RGB component. Uncomment a line in haardenoisecolor.m and use λ = 1 and σ = 0.2.

10.14 Use Haar transform threshold and shrinkage program haardenoisecolor.m to denoise each RGB component of the Christmas tree image in xmastree.mat. Zero-mean white Gaussian noise with variance σ² is added to each RGB component. The image is clipped to (704 × 704) since haardenoisecolor.m requires square images. Uncomment a line in haardenoisecolor.m and use λ = 0.2 and σ = 0.1.

10.15 Use Haar transform threshold and shrinkage program haardenoisecolor.m to denoise each RGB component of the toucan image in toucan.mat. Zero-mean white Gaussian noise with variance σ² is added to each RGB component. The image is zero-padded to (480 × 480) since haardenoisecolor.m requires square images. Uncomment a line in haardenoisecolor.m and use λ = 0.1 and σ = 0.1.

10.16 Use Haar transform threshold and shrinkage program haardenoisecolor.m to denoise each RGB component of the plumage image in plumage.mat. Zero-mean white Gaussian noise with variance σ² is added to each RGB component. The image is zero-padded to (512 × 512) since haardenoisecolor.m requires square images. Uncomment a line in haardenoisecolor.m and use λ = 0.2 and σ = 0.1.

10.17 Use db3 transform threshold and shrinkage program daubdenoisecolor.m to denoise each RGB component of the Christmas tree image in xmastree.mat. Zero-mean white Gaussian noise with variance σ² is added to each RGB component. The image is clipped to (704 × 704) since daubdenoisecolor.m requires square images. Uncomment a line in daubdenoisecolor.m and use λ = 0.05 and σ = 0.05.

10.18 Use db3 transform threshold and shrinkage program daubdenoisecolor.m to denoise each RGB component of the toucan image in toucan.mat. Zero-mean white Gaussian noise with variance σ² is added to each RGB component. The image is zero-padded to (480 × 480) since daubdenoisecolor.m requires square images. Uncomment a line in daubdenoisecolor.m and use λ = 0.05 and σ = 0.05.

10.19 Use db3 transform threshold and shrinkage program daubdenoisecolor.m to denoise each RGB component of the plumage image in plumage.mat. Zero-mean white Gaussian noise with variance σ² is added to each RGB component. The image is zero-padded to (512 × 512) since daubdenoisecolor.m requires square images. Uncomment a line in daubdenoisecolor.m and use λ = 0.05 and σ = 0.05.
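The Haar and db3 denoising problems all rely on the same threshold-and-shrinkage (soft-thresholding) rule applied to the wavelet coefficients of each RGB channel. The wavelet transforms and the .m programs are not reproduced here; this is a minimal sketch of the shrinkage step itself, with an illustrative function name:

```python
import numpy as np

def soft_threshold(w, lam):
    """Threshold and shrinkage: zero coefficients with |w| < lam,
    and shrink the remaining ones toward zero by lam."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

w = np.array([-0.30, -0.04, 0.02, 0.50])   # illustrative wavelet coefficients
w_denoised = soft_threshold(w, 0.05)       # -> [-0.25, 0, 0, 0.45]
```

Small coefficients, which are dominated by noise, are zeroed; large coefficients, which carry signal, are only slightly shrunk.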
Chapter 11
Image Recognition

Contents
Overview, 354
11-1 Image Classification by Correlation, 354
11-2 Classification by MLE, 357
11-3 Classification by MAP, 358
11-4 Classification of Spatially Shifted Images, 360
11-5 Classification of Spatially Scaled Images, 361
11-6 Classification of Rotated Images, 366
11-7 Color Image Classification, 367
11-8 Unsupervised Learning and Classification, 373
11-9 Unsupervised Learning Examples, 377
11-10 K-Means Clustering Algorithm, 380
Problems, 384

[Chapter-opener figure: (a) image of text; (b) thresholded cross-correlation ρ5(n0, m0); (c) image of text (green) and thresholded cross-correlation (red).]

Image recognition amounts to classification of a given image as one of several given possible candidate images. Classification can be regarded as a discrete estimation problem in which the candidate image that maximizes a likelihood function is chosen. If there are no candidate images, unsupervised learning can be used to determine image classes from a set of training images.

Objectives
Learn to:
■ Use correlation, MLE, and MAP to classify an image.
■ Use cross-correlation to classify a shifted image.
11-1 IMAGE CLASSIFICATION BY CORRELATION 355
[Figure panels: (a) numerals 1, 2, ..., 9, 0 in Bank font; (b) noisy image gobs[n, m] of numeral 3; (c) correlation ρk versus k; (d) MLE criterion versus k.]

Figure 11-2 (a) Numerals 1–9 and 0 in Bank font, (b) noisy (30 × 18) image gobs[n, m] of numeral 3 with SNR = −1.832 dB, (c) correlation ρk, and (d) MLE criterion Λ2[k].
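The correlation and MLE computations behind panels (c) and (d) of Fig. 11-2 can be sketched as follows. Random binary templates stand in for the bank-font numerals, and the noise level is illustrative; this is not the book's code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Three random binary (30 x 18) templates stand in for reference numerals.
templates = [rng.integers(0, 2, size=(30, 18)).astype(float) for _ in range(3)]

true_k = 1
g_obs = templates[true_k] + 0.8 * rng.standard_normal((30, 18))  # noisy observation

rho = np.array([np.sum(g_obs * f) for f in templates])   # correlations rho_k
E = np.array([np.sum(f ** 2) for f in templates])        # template energies E_fk
Lambda = 2 * rho - E                                     # MLE criterion 2*rho_k - E_fk
k_hat = int(np.argmax(Lambda))
```

Subtracting the template energy from twice the correlation removes the bias toward high-energy templates that raw correlation would exhibit.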
356 CHAPTER 11 IMAGE RECOGNITION
11-2 Classification by MLE

For L classes, the observed (M × N) image gobs[n, m] is one of:

gobs[n, m] = f1[n, m] + υ[n, m]   (class #1),
             f2[n, m] + υ[n, m]   (class #2),
             ⋮
             fL[n, m] + υ[n, m]   (class #L),      (11.3)

where υ[n, m] is an (M × N) zero-mean white Gaussian noise random field with variance σv². Following Section 9-3, the likelihood function p({gobs[n, m]}) is one of the following joint pdfs:

... of noise added to other pixels, image values {gobs[n, m]} can be regarded as independent if the amount of noise added to { fk[n, m]} is significant. This is certainly the case for the image shown in Fig. 11-2(b). [If the noise is insignificant in comparison with the signal, the classification task becomes rather trivial; the correlation method of the preceding subsection should be able to correctly classify the observed image with no error.] Hence, the joint pdf p({gobs[n, m]}) is equal to the product over all locations [n, m] of the marginal pdfs p(gobs[n, m]) given by Eq. (11.5):

p({gobs[n, m]}) = Π_{n=0}^{N−1} Π_{m=0}^{M−1} p(gobs[n, m])
               = (1 / (2πσv²)^{NM/2}) Π_{n=0}^{N−1} Π_{m=0}^{M−1} e^{−(gobs[n,m] − fk[n,m])² / (2σv²)}
               = (1 / (2πσv²)^{NM/2}) e^{−Σ_{n=0}^{N−1} Σ_{m=0}^{M−1} (gobs[n,m] − fk[n,m])² / (2σv²)}.   (11.6)

... multiplies all terms, k̂MLE is the value of k that maximizes Λ1[k] defined as

Λ1[k] = − Σ_{n=0}^{N−1} Σ_{m=0}^{M−1} (gobs[n, m] − fk[n, m])².   (11.9)

This expression has an evident interpretation: choose the value of k such that fk[n, m] is closest to gobs[n, m] in the ℓ2 (sum of
[Figure 11-3 bar chart: frequencies of the letters a–z, on a scale from 0 to 0.14.]

Figure 11-3 Frequencies of appearances of letters, which can be regarded as a priori probabilities p[k].

... function in Eq. (11.7) by p[k]. The modification adds ln[p[k]] to the log-likelihood function in Eq. (11.8). Repeating the above derivation, k̂MAP is computed by choosing the k that maximizes ΛMAP[k]:

ΛMAP[k] = 2ρk − Efk + 2σv² ln p[k].   (11.13)

The MAP (maximum a posteriori) classifier is also known as the minimum error probability (MEP) classifier, since it minimizes the probability of an incorrect choice of k. Note that if p[k] = 1/L, so that each class is equally likely, the MAP classifier reduces to the MLE classifier. Also note that σv² functions as a trade-off parameter between the a priori information p[k] and the a posteriori information ρk: the noisier the observations, the larger is σv², and the greater the weight given to the a priori information p[k]. The smaller the noise, the heavier is the weight given to the a posteriori (from the noisy data) information ρk.

... into one of the two classes defined by the images

[ 2  1 ]        [ 8  6 ]
[ 5  7 ]  and   [ 4  3 ].

Obtain a classification rule in terms of the values of the four elements of gobs[n, m] using (a) correlation and (b) MLE.

Solution:
(a) Classification by correlation: In terms of the notation introduced earlier,

f1[n, m] = [ 2  1 ]
           [ 5  7 ]   (11.14a)

and

f2[n, m] = [ 8  6 ]
           [ 4  3 ].   (11.14b)

The energies of f1[n, m] and f2[n, m] are

Ef1 = 2² + 1² + 5² + 7² = 79,   (11.15a)
Ef2 = 8² + 6² + 4² + 3² = 125.   (11.15b)

The correlations between gobs[n, m] and each of f1[n, m] and f2[n, m] are

ρ1 = 2g0,0 + g1,0 + 5g0,1 + 7g1,1,   (11.16a)
ρ2 = 8g0,0 + 6g1,0 + 4g0,1 + 3g1,1.   (11.16b)

For a given image g[n, m], we classify it as

f1[n, m] if ρ1 > ρ2,   (11.17a)

or as

f2[n, m] if ρ1 < ρ2.   (11.17b)

(b) Classification by MLE: Using Eq. (11.11), the MLE parameter Λ2[k] is given by
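The rule of Eqs. (11.15)–(11.17), together with the MLE criterion Λ2[k] = 2ρk − Efk from Eq. (11.11), can be checked numerically; a quick sketch (not the book's code):

```python
import numpy as np

f1 = np.array([[2.0, 1.0], [5.0, 7.0]])
f2 = np.array([[8.0, 6.0], [4.0, 3.0]])

def classify(g):
    """MLE rule: pick the class maximizing 2*rho_k - E_fk."""
    scores = [2.0 * np.sum(g * f) - np.sum(f ** 2) for f in (f1, f2)]
    return 1 + int(np.argmax(scores))

# Noise-free sanity checks: each template is classified as itself.
k1 = classify(f1)   # -> 1  (2*79 - 79 = 79 beats 2*63 - 125 = 1)
k2 = classify(f2)   # -> 2  (2*125 - 125 = 125 beats 2*63 - 79 = 47)
```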
3. Label that specific pixel location as [n, m] = [n0, m0] and label the identified value of k as the unknown class K.

1. Given observation image gobs(x, y) and reference images fk(x, y), compute the cross-correlation ρk(x0, y0) for all values of k: {k = 1, 2, ..., L}, and for all spatial shifts (x0, y0) that offer nonzero overlap between gobs(x, y) and fk(x − x0, y − y0).

2. Identify the combination of shift (x0, y0) and class k that exhibits the largest value of ρk(x0, y0).

Example 11-2: Identifying Letter "e" in Text Image
gobs(x, y) = fk(x/ax, y/ay),   (11.27b)

where the true class is class #K and all images are nonzero only for x, y > 0. Class K and variables ax and ay are all unknown.

The logarithmically transformed scale factors (ax′, ay′) are

ax′ = ln(ax),  ax = e^{ax′},   (11.31a)
ay′ = ln(ay),  ay = e^{ay′}.   (11.31b)

Using these transformations, the spatially warped observation image g′obs(x′, y′) can be related to the spatially warped reference image fk′(x′, y′) as follows:

g′obs(x′, y′) = gobs(e^{x′}, e^{y′}) = fk(e^{x′}/ax, e^{y′}/ay)
             = fk(e^{x′}/e^{ax′}, e^{y′}/e^{ay′})
             = fk(e^{(x′ − ax′)}, e^{(y′ − ay′)})
             = fk′(x′ − ax′, y′ − ay′).   (11.32)

... that its coordinates may have an unknown spatial scaling relative to those of the reference images fk(x, y), consists of the following steps:

1. Use Eqs. (11.29) and (11.30) to transform all images g(x, y) and fk(x, y) to logarithmic format g′(x′, y′) and fk′(x′, y′).

2. Use Eq. (11.34) to compute ρk(x0, y0) for each value of k: {k = 1, 2, ..., L}, and all spatial shifts (x0, y0) that offer nonzero overlap between g′(x′, y′) and fk′(x′ − x0, y′ − y0), using the 2-D CSFT.

3. Identify the combination of shift (x0, y0) and class k that yields the largest value of ρk(x0, y0). Label that combination as (ax′, ay′) and k = class K.

4. With (ax′, ay′) known, the scaling factors (ax, ay) can then be determined using Eq. (11.31).
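The identity in Eq. (11.32), that spatial scaling becomes a spatial shift in logarithmic coordinates, can be verified numerically in a 1-D analogue. The test function and scale factor below are illustrative:

```python
import numpy as np

# 1-D analogue of Eq. (11.32): the scaling g(x) = f(x/a) becomes the
# shift g'(x') = f'(x' - ln a) in logarithmic coordinates x' = ln x.
f = lambda x: np.exp(-(x - 2.0) ** 2)
a = 2.0
g = lambda x: f(x / a)                      # spatially scaled observation

xp = np.linspace(-1.0, 2.0, 50)             # log-domain coordinates x'
g_log = g(np.exp(xp))                       # g'(x') = g(e^{x'})
f_log_shift = f(np.exp(xp - np.log(a)))     # f'(x' - ln a)
err = np.max(np.abs(g_log - f_log_shift))   # agreement to machine precision
```

The two curves agree because g(e^{x′}) = f(e^{x′}/a) = f(e^{x′ − ln a}), exactly the chain of substitutions in Eq. (11.32).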
[Figure 11-5 panels: (a) images f3[n, m] and gobs[n, m] in linear coordinates [n, m]; (b) the images f3′[n′, m′] and g′obs[n′, m′] in logarithmic coordinates [n′, m′]; (c) relationship between [n, m] and [n′, m′]:
n and m:   1  2  4  8  16  32
n′ and m′: 0  1  2  3  4   5 ]

Figure 11-5 (a) (32 × 20) image f3[n, m] and (16 × 10) image gobs[n, m], (b) the same images in logarithmic format in base 2 (instead of base e); note that g′obs[n′, m′] = f3′[n′ + 1, m′ + 1], and (c) the relation between [n, m] and [n′, m′]. [Note that 32 pixels in [n, m] space transform into 6 (not 5) pixels in [n′, m′] space.]
11-5 CLASSIFICATION OF SPATIALLY SCALED IMAGES 365
[Figure panels: reference images f1′[n′, m′] through f9′[n′, m′] and f0′[n′, m′]; (a) reference images fk′[n′, m′] for the ten numerals in logarithmic format.]
Upon using Eq. (11.38) in Eq. (11.37) and then incorporating the scaling relationship into Eq. (11.36), we have

g′obs[n′, m′] = fk[2^i 2^{n′}, 2^i 2^{m′}]
             = fk[2^{n′+i}, 2^{m′+i}]
             = fk′[n′ + i, m′ + i].   (11.39)

ΛMLE[k] = 2ρk′ − Efk′,   (11.40)

where ρk′ is the cross-correlation between g′obs[n′, m′] and fk′[n′, m′], and Efk′ is the energy of image fk′[n′, m′]. Because the energies of the 10 digits in [n′, m′] space vary widely (as evidenced by the wide range in the number of white pixels among the 10 digits in Fig. 11-7(a)), use of the MLE criterion
Figure 11-8 ΛMLE[k] based on the correlation between the noisy digit g′obs[n, m] and each of the ten digits fk′[n, m], for k = 0, 1, ..., 9.
[Figure panels: (a) reference images of the numerals (1)–(9) and (0) displayed in polar coordinates, r versus θ; (b) observed image in polar coordinates, shifted by 90°.]

Figure 11-10 In polar coordinates, the observed image in (b) matches the reference image of "3", shifted by 90°.
11-7 COLOR IMAGE CLASSIFICATION 369
printed in different fonts, as well as in many object recognition applications.

With these two considerations (3-color channels and within-class variability) in mind, we introduce the following notation:

Number of classes: L, with class index k = 1, 2, ..., L.

Image size: M × N.

Number of training images per class: I, with training image index i = 1, 2, ..., I.

Red channel ith training image for class k: fk,R(i)[n, m].

ith training image vector of 2-D functions of [n, m] for class k:

fk(i)[n, m] = [ fk,R(i)[n, m], fk,G(i)[n, m], fk,B(i)[n, m] ]^T.   (11.44)

Mean red channel training image for class k:

f̄k,R[n, m] = (1/I) Σ_{i=1}^{I} fk,R(i)[n, m].   (11.45)

Mean training image vector for class k:

f̄k[n, m] = [ f̄k,R[n, m], f̄k,G[n, m], f̄k,B[n, m] ]^T.   (11.46)

Covariance matrix estimate for class k at location [n, m]:

Kk[n, m] = (1/I) Σ_{i=1}^{I} (fk(i)[n, m] − f̄k[n, m])(fk(i)[n, m] − f̄k[n, m])^T
         = (1/I) Σ_{i=1}^{I} [ fk(i)[n, m](fk(i)[n, m])^T ] − f̄k[n, m](f̄k[n, m])^T.   (11.47)

The estimated covariance matrix Kk[n, m] accounts for the variability at location [n, m] among the I training images, relative to their mean f̄k[n, m], and the correlation between different colors in an image.

Location-independent covariance matrix for class k:

Kk = (1/NM) Σ_{n=0}^{N−1} Σ_{m=0}^{M−1} Kk[n, m].   (11.48)

The location-independent covariance matrix Kk is the sample mean over [n, m] of Kk[n, m]. It is used to represent the in-class variability when Kk[n, m] is statistically independent of location [n, m], which is a valid assumption in many applications.

Observed-image model for class k:

gobs[n, m] = fk[n, m] + v[n, m],   (11.49)

where v[n, m] is a vector of length 3 (one for each color), modeled as a zero-mean white Gaussian noise random field with variance σv² for each color component.

Gaussian model for gobs[n, m]:

gobs[n, m] = [ gk,R[n, m], gk,G[n, m], gk,B[n, m] ]^T ∼ N( f̄k[n, m], Kk0 ),   (11.50)

where Kk0 is a (3 × 3) joint covariance matrix that accounts for both Kk, the in-class variability of the noise-free training images, and the added noise represented by σv²:

Kk0 = Kk + σv² I3,   (11.51)

where I3 is the (3 × 3) identity matrix.

11-7.2 Likelihood Functions

Even though for class k, vector f̄k[n, m] and covariance matrix Kk are both estimated from the vectors of the I training images { fk(i)[n, m] }, we use them as the "true" values, not just estimates, in the classification algorithms that follow. The goal of the classification algorithm is to determine the value of the class index k from among the L possible classes.

Given an observation image vector gobs[n, m] and an average reference image vector f̄k[n, m] for each class index k, we define the difference image vector Δk[n, m] as

Δk[n, m] = gobs[n, m] − f̄k[n, m]
         = [ gk,R[n, m], gk,G[n, m], gk,B[n, m] ]^T − [ f̄k,R[n, m], f̄k,G[n, m], f̄k,B[n, m] ]^T.   (11.52)

By treating f̄k[n, m] as the mean value of gobs[n, m], and in view of the model given by Eq. (11.50), the location-specific marginal pdf p(gobs[n, m]) of the length-3 Gaussian random vector gobs[n, m] for class k is given by

p(gobs[n, m]) = (1 / ((2π)^{3/2} √det(Kk0))) e^{−(1/2)(Δk[n, m])^T Kk0^{−1} Δk[n, m]}.   (11.53)

The {gobs[n, m]} values at different locations [n, m] are independent random vectors because the noise vectors v[n, m] are location-
independent. Accordingly, their joint pdf is

p({gobs[n, m]}) = Π_{n=0}^{N−1} Π_{m=0}^{M−1} p(gobs[n, m])
= (1 / ((2π)^{3NM/2} (det(Kk0))^{NM/2})) Π_{n=0}^{N−1} Π_{m=0}^{M−1} e^{−(1/2)(Δk[n,m])^T Kk0^{−1} Δk[n,m]}
= (1 / ((2π)^{3NM/2} (det(Kk0))^{NM/2})) e^{−(1/2) Σ_{n=0}^{N−1} Σ_{m=0}^{M−1} (Δk[n,m])^T Kk0^{−1} Δk[n,m]}.   (11.54)

The joint pdf given by Eq. (11.54) is the likelihood function of the observed image vector gobs[n, m]. An expanded version for the individual classes is given by

p({gobs[n, m]}) = (1/(2π)^{3NM/2}) ×
  (1/(det(K10))^{NM/2}) Π_{n=0}^{N−1} Π_{m=0}^{M−1} e^{−(1/2)(Δ1[n,m])^T K10^{−1} Δ1[n,m]}   (class 1),
  (1/(det(K20))^{NM/2}) Π_{n=0}^{N−1} Π_{m=0}^{M−1} e^{−(1/2)(Δ2[n,m])^T K20^{−1} Δ2[n,m]}   (class 2),
  ⋮
  (1/(det(KL0))^{NM/2}) Π_{n=0}^{N−1} Π_{m=0}^{M−1} e^{−(1/2)(ΔL[n,m])^T KL0^{−1} ΔL[n,m]}   (class L).   (11.55)

The corresponding natural log-likelihood function is given by

ln(p({gobs[n, m]})) = −(3NM/2) ln(2π) +
  −(NM/2) ln(det(K10)) − (1/2) Σ_{n=0}^{N−1} Σ_{m=0}^{M−1} (Δ1[n,m])^T K10^{−1} Δ1[n,m]   (class 1),
  −(NM/2) ln(det(K20)) − (1/2) Σ_{n=0}^{N−1} Σ_{m=0}^{M−1} (Δ2[n,m])^T K20^{−1} Δ2[n,m]   (class 2),
  ⋮
  −(NM/2) ln(det(KL0)) − (1/2) Σ_{n=0}^{N−1} Σ_{m=0}^{M−1} (ΔL[n,m])^T KL0^{−1} ΔL[n,m]   (class L).   (11.56)

11-7.3 Classification by MLE

The maximum likelihood estimate k̂MLE of class k is the value of k that maximizes the log-likelihood function given by Eq. (11.56). Since the first term [−(3NM/2) ln(2π)] is common to all terms, it has no impact on the choice of k. Hence, k̂MLE is the value of k that maximizes the MLE criterion Λ1[k] given by

Λ1[k] = −NM ln(det(Kk0)) − Σ_{n=0}^{N−1} Σ_{m=0}^{M−1} (Δk[n, m])^T Kk0^{−1} Δk[n, m].   (11.57)

According to Eq. (11.51), the joint covariance matrix Kk0 incorporates two statistical variations: Kk, due to variations among training images for class k, and σv², due to the added noise. If the classification application is such that Kk does not vary with class k (or the variation from class to class is relatively minor), we then can treat Kk as a class-independent covariance matrix K, with a corresponding joint covariance matrix K0 = K + σv² I3. As a consequence, the first term in Eq. (11.57) becomes class-independent and can be removed from the maximization process, which leads us to adjust the definition of the MLE criterion to

Λ1[k] = − Σ_{n=0}^{N−1} Σ_{m=0}^{M−1} (Δk[n, m])^T K0^{−1} Δk[n, m].   (11.58)
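A minimal NumPy sketch of the class-independent-covariance criterion in Eq. (11.58). The class means, covariance values, image size, and noise level below are illustrative, not from the book:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative class means (a 3-vector per pixel) and a shared 3x3 joint
# covariance K0 = K + sigma_v^2 * I3, as in Eq. (11.51).
means = [rng.uniform(0.0, 1.0, size=(4, 4, 3)) for _ in range(3)]
K0 = np.array([[0.10, 0.02, 0.00],
               [0.02, 0.10, 0.02],
               [0.00, 0.02, 0.10]])
K0_inv = np.linalg.inv(K0)

def mle_class(g_obs):
    """Eq. (11.58): maximize -sum_{n,m} Delta_k^T K0^{-1} Delta_k over k."""
    scores = [-np.einsum('nmi,ij,nmj->', g_obs - fk, K0_inv, g_obs - fk)
              for fk in means]
    return int(np.argmax(scores))

g = means[2] + 0.05 * rng.standard_normal((4, 4, 3))   # noisy class-2 image
k_hat = mle_class(g)
```

The criterion is simply the (negated) Mahalanobis distance between the observation and each class mean, summed over all pixels.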
Upon replacing Δk[n, m] with its defining expression given by Eq. (11.52), Eq. (11.58) becomes

Λ1[k] = − Σ_{n=0}^{N−1} Σ_{m=0}^{M−1} (gobs[n, m] − f̄k[n, m])^T K0^{−1} (gobs[n, m] − f̄k[n, m])
      = − Σ_{n=0}^{N−1} Σ_{m=0}^{M−1} (gobs[n, m])^T K0^{−1} gobs[n, m]
        − Σ_{n=0}^{N−1} Σ_{m=0}^{M−1} (f̄k[n, m])^T K0^{−1} f̄k[n, m]
        + 2 Σ_{n=0}^{N−1} Σ_{m=0}^{M−1} (gobs[n, m])^T K0^{−1} f̄k[n, m].   (11.59)

The last term comes from noting that a scalar equals its own transpose, which is useful in matrix algebra.

The first term in Eq. (11.59) is independent of k. Hence, computation of k̂MLE simplifies to choosing the value of k that maximizes the modified MLE criterion

ΛMLE[k] = 2 Σ_{n=0}^{N−1} Σ_{m=0}^{M−1} (gobs[n, m])^T K0^{−1} f̄k[n, m]
          − Σ_{n=0}^{N−1} Σ_{m=0}^{M−1} (f̄k[n, m])^T K0^{−1} f̄k[n, m].   (11.60)

The expression given by Eq. (11.60) is the vector version of the expression given earlier in Eq. (11.11) for the scalar case.

11-7.4 Classification by MAP

As noted earlier in connection with Fig. 11-3, the probability of occurrence p[k] of the letters of the alphabet varies widely among the 26 letters of the English language. The same may be true for other classification applications. The maximum a posteriori (MAP) classifier takes advantage of this a priori information by multiplying the likelihood function given by Eq. (11.54) by p[k]. This modification leads to a MAP classification criterion ΛMAP[k] given by the same expression as ΛMLE[k], but with an additional term:

ΛMAP[k] = 2 Σ_{n=0}^{N−1} Σ_{m=0}^{M−1} (gobs[n, m])^T K0^{−1} f̄k[n, m]
          − Σ_{n=0}^{N−1} Σ_{m=0}^{M−1} (f̄k[n, m])^T K0^{−1} f̄k[n, m] + 2 ln(p[k]).   (11.61)

The MAP classifier (estimator of k) selects the value of k that maximizes ΛMAP[k].

Example 11-5: Color Image Classification

Develop a classification rule for a two-class, (1 × 1) color image classifier for a (1 × 1) image [gR, gG, gB] with no additive noise and equal class probabilities (p[k = 1] = p[k = 2]), given the following 4 training images per class:

Class k = 1:
f1(1) = (0, 0, 0)^T,  f1(2) = (4, 0, 0)^T,  f1(3) = (4, 4, 0)^T,  f1(4) = (4, 0, 4)^T,   (11.62)

Class k = 2:
f2(1) = (0, 0, 4)^T,  f2(2) = (0, 4, 0)^T,  f2(3) = (0, 4, 4)^T,  f2(4) = (4, 4, 4)^T.   (11.63)

Solution: The sample means of the four training sets are

f̄1 = (1/4)[(0, 0, 0)^T + (4, 0, 0)^T + (4, 4, 0)^T + (4, 0, 4)^T] = (3, 1, 1)^T   (11.64)

and

f̄2 = (1/4)[(0, 0, 4)^T + (0, 4, 0)^T + (0, 4, 4)^T + (4, 4, 4)^T] = (1, 3, 3)^T.   (11.65)
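The sample means of Eqs. (11.64) and (11.65) can be checked directly (a sketch; variable names are ours):

```python
import numpy as np

# Training vectors of Eqs. (11.62) and (11.63); each column is one
# [R, G, B]^T training image.
F1 = np.array([[0, 4, 4, 4],
               [0, 0, 4, 0],
               [0, 0, 0, 4]], dtype=float)
F2 = np.array([[0, 0, 0, 4],
               [0, 4, 4, 4],
               [4, 0, 4, 4]], dtype=float)

f1_mean = F1.mean(axis=1)   # Eq. (11.64) -> [3, 1, 1]
f2_mean = F2.mean(axis=1)   # Eq. (11.65) -> [1, 3, 3]
```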
f̄2,R = [ 3  2 ],   f̄2,G = [ 6  5 ],   f̄2,B = [ 9  8 ].   (11.71)
       [ 1  0 ]           [ 4  0 ]           [ 7  0 ]

(3) To provide a simple rule for classifying any new image into one of the classes identified by the training images.
following simple approach to compute the SVD of F, by noting that

FF^T = (USV^T)(VS^T U^T) = (US)(V^T V)(S^T U^T) = U diag[σj²] U^T   (11.76a)

and

F^T F = (VS^T U^T)(USV^T) = (VS^T)(U^T U)(SV^T) = V diag[σj²] V^T.   (11.76b)

These relationships rely on the fact that U and V are orthogonal matrices: VV^T = U^T U = I. As demonstrated shortly in the SVD example, the eigenvectors of FF^T constitute the columns of U, the eigenvectors of F^T F constitute the columns of V, and the {σj²} are the nonzero eigenvalues of FF^T and of F^T F. Given V and {σj²}, or U and {σj²}, we can compute U or V as follows:

U = FV diag[σj^{−1}],   (11.77a)
V^T = diag[σj^{−1}] U^T F,   (11.77b)

both of which follow directly from F = USV^T.

... where Ṽ^T is the first M rows of V^T. This is as expected: A = USV^T uses only the first M rows of V^T. Since V^T must be orthogonal, the remaining rows of V^T must be chosen to be orthogonal to its first M rows. The remaining rows of V^T can be computed from its first M rows using, say, Gram-Schmidt orthonormalization.

The SVD of F can therefore be computed as follows:

1. Compute the eigenvalues and eigenvectors of FF^T.
2. The eigenvalues are σi² and the eigenvectors are ui.
3. Compute the first M rows Ṽ^T of V^T using Ṽ^T = diag[σi^{−1}] U^T F.
4. Compute the remaining rows of V^T using Gram-Schmidt orthonormalization.

Another way to compute the SVD is from

F^T F = (VS^T U^T)(USV^T) = (VS^T)(U^T U)(SV^T) = V diag[σj², 0, ..., 0] V^T,   (11.78i)

with N − M trailing zeros. Repeating the earlier argument, V is the matrix of eigenvectors, and {σj²} are the nonzero eigenvalues, of the (N × N) matrix F^T F. Note that F^T F has N − M zero eigenvalues, since the rank of F^T F is M.

(b) Using Eq. (11.78a), we compute

FF^T = [ 96   39  −40 ] [ 96  −72 ]   [ 12337  −6084 ]
       [ −72  52   30 ] [ 39   52 ] = [ −6084   8788 ].
                        [ −40  30 ]

Postmultiplying FF^T in Eq. (11.76a) by U and using U^T U = I gives

(FF^T)U = U diag[σj²].   (11.78e)

Next, let uj be the jth column of the (M × M) matrix U. Then the jth column of Eq. (11.78e) is

(FF^T)uj = σj² uj.   (11.78f)

Inserting σ1 into Eq. (11.78f) gives

[ 12337 − 16900    −6084         ] u1 = [ 0 ]
[ −6084            8788 − 16900  ]      [ 0 ],

which has the solution (after normalization so that u1^T u1 = 1): u1 = [4/5, −3/5]^T. Similarly, for σ2,

[ 12337 − 4225    −6084        ] u2 = [ 0 ]
[ −6084           8788 − 4225  ]      [ 0 ]

has the solution (after normalization so that u2^T u2 = 1): u2 = [3/5, 4/5]^T.

V = [ 12/13   0   5/13  ]
    [ 0       1   0     ]
    [ −5/13   0   12/13 ].   (11.78d)

The two singular values are σ1 = 130 and σ2 = 65. The singular values are usually put in decreasing order σ1 > σ2 > ··· by reordering the columns of U and the rows of V^T. The third row of V^T is computed to be orthogonal to the two rows of Ṽ^T using Gram-Schmidt and is

v3^T = [ 5/13  0  12/13 ].

11-8.3 Interpretations of SVD

A. Rotations, Reflections, and Scalings

An orthogonal matrix can be interpreted as a rotation and/or a reflection of coordinate axes. A (2 × 2) orthogonal matrix can often be put into the form of Rθ defined in Eq. (3.38), which we repeat here as

Rθ = [ cos θ   sin θ ]
     [ −sin θ  cos θ ].   (11.79)

For example, by comparing the entries of U in Eq. (11.78b) with the entries of Rθ, we ascertain that U is a rotation matrix with θ = 36.87°. Similarly, V in Eq. (11.78d) represents a rotation matrix with θ = 22.62°.

Hence, the SVD of a matrix F can be interpreted as: (1) a rotation and/or reflection of axes, followed by (2) scaling of the rotated axes, followed by (3) another rotation and/or reflection of axes.

B. Expansion in Orthonormal Vectors

Now we introduce another interpretation of the SVD, which will prove particularly useful in unsupervised learning. In the sequel, we assume that M < N, so F is a reclining matrix. Let us define:

(1) fi as the ith column of F, for i = 1, 2, ..., N:

fi = [ fi,1, fi,2, ..., fi,M ]^T;   (11.80)

(2) ui as the ith column of U, for i = 1, 2, ..., M:

ui = [ ui,1, ui,2, ..., ui,M ]^T.   (11.81)
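The eigen-relationships of Eqs. (11.76a)–(11.76b) can be verified on the example matrix; F below is recovered from the displayed FF^T product (an assumption from the extraction, consistent with FF^T = [[12337, −6084], [−6084, 8788]]):

```python
import numpy as np

# The 2x3 matrix of the SVD example.
F = np.array([[96.0, 39.0, -40.0],
              [-72.0, 52.0, 30.0]])

U, s, Vt = np.linalg.svd(F)   # s is returned in decreasing order
# s should be [130, 65]: the square roots of the eigenvalues 16900
# and 4225 of FF^T.

# Eigen-relationship (11.76a): FF^T = U diag(sigma_j^2) U^T
check = np.allclose(F @ F.T, U @ np.diag(s ** 2) @ U.T)
```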
f(4)[n, m] = [ 0.0  0.9 ],   (11.89d)
             [ 1.1  0.1 ]

f(5)[n, m] = [ 1.1  1.0 ],   (11.89e)
             [ 0.9  1.0 ]

gobs[n, m] = [ 1  1 ].   (11.89f)
             [ 1  1 ]

A. Training Matrix

We unwrap the f(i)[n, m] by columns, as in MATLAB's F(:), and assemble the unwrapped images into a (4 × 5) training matrix F. The ith column of the training matrix F is the unwrapped vector image f(i):

F = [ f1 f2 f3 f4 f5 ] = [ 1.1  0.1  1.0  0.0  1.1 ]
                         [ 0.1  1.0  0.0  1.1  0.9 ]
                         [ 0.1  1.0  0.1  0.9  1.0 ]
                         [ 1.0  0.1  0.9  0.1  1.0 ].   (11.90)

Using the recipe given in Section 11-8.1(B) or the MATLAB command [U,S,V]=svd(F), we obtain the following matrices:

U = [ 0.54  −0.52  −0.20  −0.63 ]
    [ 0.47   0.56   0.63  −0.26 ]
    [ 0.49   0.48  −0.69   0.25 ]
    [ 0.50  −0.44   0.29   0.69 ],   (11.91a)

S = [ 2.94  0     0     0     0 ]
    [ 0     1.86  0     0     0 ]
    [ 0     0     0.14  0     0 ]
    [ 0     0     0     0.02  0 ],   (11.91b)

V = [ 0.40  −0.49   0.45  −0.63   0.01 ]
    [ 0.36   0.51  −0.38  −0.44  −0.53 ]
    [ 0.35  −0.46  −0.06   0.54  −0.61 ]
    [ 0.34   0.54   0.70   0.31  −0.01 ]
    [ 0.68  −0.01  −0.39   0.17   0.59 ].   (11.91c)

It is apparent from matrix S that both σ1 = 2.94 and σ2 = 1.86 are much larger than σ3 = 0.14 and σ4 = 0.02. Hence, we can truncate the SVD to T = 2 in Eq. (11.88), which gives

f(1) ≈ u1(σ1 υ1,1) + u2(σ2 υ1,2) = (2.94 × 0.40)u1 + (1.86 × (−0.49))u2 = 1.19u1 − 0.90u2,   (11.92a)

where u1 and u2 are the first and second columns of U. Similarly, application of Eq. (11.88) for i = 2, 3, 4, and 5 yields

f(2) ≈ (σ1 υ2,1)u1 + (σ2 υ2,2)u2 = 1.06u1 + 0.94u2,   (11.92b)
f(3) ≈ (σ1 υ3,1)u1 + (σ2 υ3,2)u2 = 1.04u1 − 0.86u2,   (11.92c)
f(4) ≈ (σ1 υ4,1)u1 + (σ2 υ4,2)u2 = 1.01u1 + 1.00u2,   (11.92d)
f(5) ≈ (σ1 υ5,1)u1 + (σ2 υ5,2)u2 = 2.00u1 − 0.23u2.   (11.92e)

All five images assume the form of Eq. (11.88): f(i) ≈ ci,1 u1 + ci,2 u2. With u1 and u2 as orthogonal dimensions, ci,1 and ci,2 are the coordinates of image f(i) in (u1, u2) space.

B. Subspace Representation

If we regard u1 and u2 in Eq. (11.92) as orthogonal axes, then f(1), the dimensionally reduced representation of training image 1, has coordinates (1.19, −0.90), as depicted in Fig. 11-12. Similar assignments are made to the other four training images. The five symbols appear to be clustered into three image classes centered approximately at coordinates {(1, 1), (1, −1), (2, 0)}. We assign:

Class #1: to the cluster centered at (1, 1),
Class #2: to the cluster centered at (1, −1),
Class #3: to the cluster centered at (2, 0).

Given these three clusters, we divide the (u1, u2) domain into the three regions shown in Fig. 11-12.

C. Classification of Observation Image

For the observation image defined by Eq. (11.89f), we need to determine its equivalent coordinates (g1, g2) in (u1, u2) space. We do so by applying the following recipe:

1. Unwrap gobs[n, m] by columns to form vector g, which in the present case gives

g = [1, 1, 1, 1]^T.   (11.93)
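The SVD truncation on the training matrix F of Eq. (11.90) can be reproduced directly (a NumPy sketch of the MATLAB [U,S,V]=svd(F) step; signs of U and V columns may differ between implementations):

```python
import numpy as np

# Training matrix F of Eq. (11.90); each column is an unwrapped 2x2 image.
F = np.array([[1.1, 0.1, 1.0, 0.0, 1.1],
              [0.1, 1.0, 0.0, 1.1, 0.9],
              [0.1, 1.0, 0.1, 0.9, 1.0],
              [1.0, 0.1, 0.9, 0.1, 1.0]])

U, s, Vt = np.linalg.svd(F, full_matrices=False)
# Leading singular values ~2.94 and ~1.86 dominate ~0.14 and ~0.02,
# so a rank-2 (T = 2) truncation loses almost nothing.
F2 = U[:, :2] @ np.diag(s[:2]) @ Vt[:2, :]
err = np.linalg.norm(F - F2)   # = sqrt(s3^2 + s4^2), about 0.14
```

The Frobenius error of the rank-2 approximation equals the root-sum-square of the discarded singular values, which is why the two small values can safely be set to zero.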
11-9 UNSUPERVISED LEARNING EXAMPLES 379
P[xn] = ||xn − x0||₂² / Σ_{i=1}^{N} ||xi − x0||₂² .
Thus, points far away from the first centroid x0 are more
likely to be chosen. Point xn is assigned as the centroid for
the second cluster.
Results of different iterations of the K-Means algorithm are
shown in parts (b) through (e) of Fig. 11-15, after which the
algorithm seems to have converged.
The K-Means algorithm was run using the MATLAB com-
mand idx=kmeans(X,k) available in MATLAB’s Machine
Learning toolbox. Here X is the (N × 2) array of data point
coordinates, k is the number of clusters (which must be set
ahead of time; here k=2), and idx is the (N × 1) column
vector of indices of clusters to which the data points whose
coordinates are in each row of X are assigned. Each element
of idx is an integer between 1 and k. For example, the data
point whose coordinates are [X(3,1),X(3,2)] is assigned
to cluster #idx(3).
kmeans has many options. For information, use help
kmeans.
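The assignment/update alternation that kmeans performs can be sketched in a few lines. This is a minimal NumPy sketch, not MATLAB's implementation, and it initializes centroids by plain random selection rather than the distance-weighted rule described above:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Minimal K-means: alternate nearest-centroid assignment and
    centroid (mean) update."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    idx = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # assign each data point to its nearest centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        idx = np.argmin(d2, axis=1)
        # move each centroid to the mean of the points assigned to it
        for j in range(k):
            if np.any(idx == j):
                centroids[j] = X[idx == j].mean(axis=0)
    return idx, centroids

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 0.3, (50, 2)),   # cluster near (-2, -2)
               rng.normal(2.0, 0.3, (50, 2))])   # cluster near (2, 2)
idx, centroids = kmeans(X, 2)
```

For two well-separated clusters such as these, the algorithm converges in a few iterations, as in Fig. 11-15.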
[Figure panels: (a) random data points to be classified; (b)–(e) results of the first through fourth iterations, each showing Cluster 1, Cluster 2, and the centroids.]

Figure 11-15 (a) Random data points and results of K-means clustering after (b) 1 iteration, (c) 2 iterations, (d) 3 iterations, and (e) 4 iterations.
Summary
Concepts
• Classification is often performed by choosing the image with the largest correlation with the observed image.
• MLE classification is performed by subtracting each image energy from double the correlation with the observed image.
• MAP classification is performed by subtracting each image energy from double the correlation with the observed image, and adding the logarithm of the a priori probability of the image times double the noise variance.
• Classification of an image with an unknown shift can be performed using cross-correlation instead of correlation. Classification of an image with an unknown scaling can be performed by cross-correlation on logarithmic spatial scaling.
• Jointly Gaussian color images can be classified using labelled training images to estimate the mean and covariance matrix of each image class, and using these estimates in vector versions of the MLE and MAP classifiers.
• Unsupervised learning can be performed using unlabeled training images, by unwrapping each image into a column vector, assembling these into a training matrix, computing the SVD of the training matrix, setting small singular values to zero, and regarding the orthonormal vectors associated with the remaining singular values as subspace bases. Each training image is a point in this subspace. The points can be clustered using the K-Means algorithm, if necessary.
Mathematical Formulae
Correlation:
ρk = Σ_{n=0}^{N−1} Σ_{m=0}^{M−1} gobs[n, m] fk[n, m]

MAP classifier:
ΛMAP[k] = 2ρk − Efk + 2σv² ln p[k]

Singular value decomposition (SVD):
F = USV^T

Expansion in orthonormal vectors:
fi = Σ_{j=1}^{T} uj (σj vi,j)
Important Terms Provide definitions or explain the meaning of the following terms:
correlation dimensionality reduction orthonormal vectors subspace training matrix
cross-correlation K-Means algorithm recognition SVD unsupervised learning
(a) Expand 0 ≤ Σ Σ (g̃obs[n, m] ± f̃k[n, m])² and show that |ρ̃k| ≤ 1.
(b) If gobs[n, m] = a fk[n, m] for any constant a > 0, show that ρ̃k = 1.

11.2 In Section 11-2, let the reference images {fk[n, m]} all be energy-normalized to

f̃k[n, m] = fk[n, m] / √(Σ Σ fk²[n, m]),

and the observed image be g[n, m] = f̃k[n, m] + v[n, m], where v[n, m] is a zero-mean white Gaussian noise random field with variance σv². This is the problem considered in Section 11-2, except with energy-normalized reference images. Show that the MLE classifier is now the value of k that maximizes the correlation coefficient ρ̃k = Σ Σ g̃obs[n, m] f̃k[n, m] between g[n, m] and fk[n, m].

Section 11-3: Classification by MAP

11.3 Determine a rule for classifying the image

gobs[n, m] = [g0,0 g1,0; g0,1 g1,1]

into one of the three classes

f1[n, m] = [1 2; 3 4],  f2[n, m] = [4 3; 2 1],  f3[n, m] = [3 1; 2 4],

using
(a) MLE classification;
(b) MAP classification with a priori probabilities p[1] = 0.1, p[2] = 0.3, p[3] = 0.6. The noise variance is σv² = 10.

11.4 Determine a rule for classifying the image

gobs[n, m] = [g0,0 g1,0; g0,1 g1,1]

into one of the three classes

f1[n, m] = [7 1; 5 2],  f2[n, m] = [2 5; 1 7],  f3[n, m] = [1 2; 5 7],

using
(a) MLE classification;
(b) MAP classification with a priori probabilities p[1] = 0.1, p[2] = 0.3, p[3] = 0.6. The noise variance is σv² = 10.

Section 11-4: Classification of Spatially Shifted Images

11.5 Download text.jpg and text.m from the book website. These were used in Example 11-2 to find all occurrences of "e" in text.jpg. Modify text.m to find all occurrences of "a" in text.jpg. You will need to (i) find a subset of the image containing "a" instead of "e" and (ii) change the threshold using trial-and-error. Produce a figure like Fig. 11-11(c).

11.6 Download text.jpg and text.m from the book website. These were used in Example 11-2 to find all occurrences of "e" in text.jpg. Modify text.m to find all occurrences of "m" in text.jpg. You will need to (i) find a subset of the image containing "m" instead of "e" and (ii) change the threshold using trial and error. Produce a figure like Fig. 11-11(c).

11.7 To machine-read numerals, such as the routing and account numbers on the check in Fig. 11-1, it is necessary to segment (chop up) the number into individual numerals, and then use one of the methods of Sections 11-1 through 11-3 to identify each individual numeral. Segmentation is often performed by the hardware (camera) used to read the number. The program bank1.m reads the image of the bank font numerals in Fig. 11-2(a), segments the image into (30 × 18) images of each numeral, reassembles the images of numerals back into the sequence of numerals in Fig. 11-2(a), and machine-reads the result, using correlation, into a sequence of numbers that represent the numerals in Fig. 11-2(a).
Modify bank1.m so that it machine-reads the account number in Fig. 11-1. This requires assembling the images of numerals into the sequence of numerals in the account number in Fig. 11-1. Plot the computed numerals in a stem plot. Figure 11-2(a) is stored in bank.png.
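Problems 11.5 and 11.6 modify text.m, a MATLAB program from the book website that is not reproduced here. The same find-all-occurrences workflow — zero-mean the template, cross-correlate, threshold by trial and error — can be sketched in Python on a small synthetic image (the arrays below are invented for illustration):

```python
import numpy as np

def cross_correlate(image, template):
    """Cross-correlate a zero-mean template with an image (valid region only)."""
    H, W = image.shape
    h, w = template.shape
    t = template - template.mean()
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            out[i, j] = np.sum(image[i:i+h, j:j+w] * t)
    return out

# Synthetic binary "page" containing two copies of a small "+"-shaped glyph:
template = np.array([[0., 1., 0.],
                     [1., 1., 1.],
                     [0., 1., 0.]])
page = np.zeros((10, 12))
page[1:4, 2:5] = template
page[5:8, 7:10] = template

rho = cross_correlate(page, template)
matches = np.argwhere(rho > 0.9 * rho.max())   # threshold set by trial and error
```

Both planted glyph locations (upper-left corners) show up as correlation peaks, mirroring the way text.m marks every occurrence of the chosen letter.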
386 CHAPTER 11 IMAGE RECOGNITION
11.8 To machine-read numerals, such as the routing and account numbers on the check in Fig. 11-1, it is necessary to segment (chop up) the number into individual numerals, and then use one of the methods of Sections 11-1 through 11-3 to identify each individual numeral. Segmentation is often performed by the hardware (camera) used to read the number. The program bank1.m reads the image of the bank font numerals in Fig. 11-2(a), segments the image into (30 × 18) images of each numeral, reassembles the images of numerals back into the sequence of numerals in Fig. 11-2(a), and machine-reads the result, using correlation, into a sequence of numbers that represent the numerals in Fig. 11-2(a).
Modify bank1.m so that it machine-reads the routing number in Fig. 11-1. This requires assembling the images of numerals into the sequence of numerals in the routing number in Fig. 11-1. Plot the computed numerals in a stem plot. Figure 11-2(a) is stored in bank.png.

Section 11-5: Classification of Spatially Scaled Images

11.9 To machine-read numerals, such as the routing and account numbers on the check in Fig. 11-1, it is necessary to segment (chop up) the number into individual numerals, and then use one of the methods of Sections 11-1 through 11-3 to identify each individual numeral. Segmentation is often performed by the hardware (camera) used to read the number. But the numbers may have a different size from the stored numerals, in which case the numbers are spatially scaled. The program scale1.m reads the image of the bank font numerals in Fig. 11-2(a), segments the image into (30 × 18) images of each numeral, decimates each numeral by (2 × 2), reassembles the images of decimated numerals into the sequence of numerals in Fig. 11-2(a), uses logarithmic spatial scaling to transform the problem into a shifted images problem, and machine-reads the result, using correlation, into a sequence of numbers that represent the numerals in Fig. 11-2(a).
Modify scale1.m so that it machine-reads the account number in Fig. 11-1. This requires assembling the images of numerals into the sequence of numerals in the account number in Fig. 11-1. Plot the computed numerals in a stem plot. Figure 11-2(a) is stored in bank.png.

11.10 To machine-read numerals, such as the routing and account numbers on the check in Fig. 11-1, it is necessary to segment (chop up) the number into individual numerals, and then use one of the methods of Sections 11-1 through 11-3 to identify each individual numeral. Segmentation is often performed by the hardware (camera) used to read the number. But the numbers may have a different size from the stored numerals, in which case the numbers are spatially scaled. The program scale1.m reads the image of the bank font numerals in Fig. 11-2(a), segments the image into (30 × 18) images of each numeral, decimates each numeral by (2 × 2), reassembles the images of decimated numerals into the sequence of numerals in Fig. 11-2(a), uses logarithmic spatial scaling to transform the problem into a shifted images problem, and machine-reads the result, using correlation, into a sequence of numbers that represent the numerals in Fig. 11-2(a).
Modify scale1.m so that it machine-reads the routing number in Fig. 11-1. This requires assembling the images of numerals into the sequence of numerals in the routing number in Fig. 11-1. Plot the computed numerals in a stem plot. Figure 11-2(a) is stored in bank.png.

Section 11-7: Classification of Color Images

11.11 Show that if there is no randomness in image color components, i.e., Kk = 0, then Eq. (11.61) for MAP classification of color images reduces to Eq. (11.13) for MAP classification of grayscale images summed over all three color components.

11.12 Determine a rule for classifying the (2 × 2) color image

gobs,R = [g0,0,R g0,1,R; g1,0,R g1,1,R],
gobs,G = [g0,0,G g0,1,G; g1,0,G g1,1,G],
gobs,B = [g0,0,B g0,1,B; g1,0,B g1,1,B],

into one of the two (2 × 2) color image classes

f1,R = [1 2; 3 4],  f1,G = [5 6; 7 8],  f1,B = [9 1; 2 3],
f2,R = [4 3; 2 1],  f2,G = [8 7; 6 5],  f2,B = [3 2; 1 9],

using (assume K1 = K2 = I throughout)
(a) MLE classification;
(b) MAP classification with a priori probabilities p[1] = 0.2, p[2] = 0.8, and σv² = 0.
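Problem 11.11's reduction gives a direct recipe for the color-classification problems: with Kk = 0, the color discriminant is just the grayscale discriminant summed over the three color components. A minimal Python sketch (using the class matrices of Problem 11.12; the noiseless test observation is an illustrative assumption):

```python
import numpy as np

# Class images from Problem 11.12, stored as (R, G, B) planes.
f1 = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 1], [2, 3]]], dtype=float)
f2 = np.array([[[4, 3], [2, 1]], [[8, 7], [6, 5]], [[3, 2], [1, 9]]], dtype=float)

def mle_color(g, classes):
    """MLE with K_k = 0: sum the grayscale discriminant 2*rho_k - E_fk
    over the three color components, then pick the largest."""
    scores = [sum(2 * np.sum(g[c] * f[c]) - np.sum(f[c] ** 2) for c in range(3))
              for f in classes]
    return int(np.argmax(scores))

g_obs = f2.copy()                      # noiseless observation of class 2
k_hat = mle_color(g_obs, [f1, f2])     # 0-based: expect index 1
```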
11.13 Determine a rule for classifying the (2 × 2) color image

gobs,R = [g0,0,R g0,1,R; g1,0,R g1,1,R],
gobs,G = [g0,0,G g0,1,G; g1,0,G g1,1,G],
gobs,B = [g0,0,B g0,1,B; g1,0,B g1,1,B],

into one of the two (2 × 2) color image classes

f1,R = [7 1; 2 4],  f1,G = [8 6; 4 2],  f1,B = [9 7; 5 3],
f2,R = [4 2; 1 7],  f2,G = [2 4; 6 8],  f2,B = [3 5; 7 9],

using (assume K1 = K2 = I throughout)
(a) MLE classification;
(b) MAP classification with a priori probabilities p[1] = 0.2, p[2] = 0.8, and σv² = 0.

11.14 Following Example 11-5, determine an MLE rule for classifying a (1 × 1) color image [gR, gG, gB]^T into two classes, determined from the following training images:
For class #1: f11 = [0; 0; 8], f21 = [0; 8; 8], f31 = [8; 8; 0], f41 = [8; 8; 8].
For class #2: f12 = [4; 4; 12], f22 = [4; 12; 12], f32 = [12; 12; 4], f42 = [12; 12; 12].
Assume no additive noise and equal a priori probabilities p[1] = p[2] = 0.5.

11.15 Following Example 11-5, determine an MLE rule for classifying a (1 × 1) color image [gR, gG, gB]^T into two classes, determined from the following training images:
For class #1: f11 = [1; 3; 5], f21 = [5; 3; 1], f31 = [3; 5; 7], f41 = [3; 1; 5].
For class #2: f12 = [2; 4; 6], f22 = [6; 4; 2], f32 = [4; 6; 8], f42 = [4; 2; 6].
Assume no additive noise and equal a priori probabilities p[1] = p[2] = 0.5.

Section 11-8: Unsupervised Learning and Classification

11.16 Show that the approximations in the truncated expansions of Eq. (11.92)(a-e) do a good job of approximating the columns of the training matrix F in Eq. (11.90).

Section 11-9: Unsupervised Learning Examples

11.17 We are given the eight (2 × 2) training images

f(1) = [1.1 0.1; 0.1 1.1],  f(2) = [0.9 0.0; 0.1 0.9],
f(3) = [1.1 0.1; 0.0 0.9],  f(4) = [0.9 0.0; 0.0 0.1],
f(5) = [0.1 1.1; 1.1 0.1],  f(6) = [0.0 0.9; 0.9 0.1],
f(7) = [0.1 1.1; 0.9 0.0],  f(8) = [0.0 0.9; 1.1 0.0].

Use unsupervised learning to determine classes for these eight training images, and a rule for classifying the observed image

gobs = [0.0 1.1; 1.1 0.0].

11.18 We are given the eight (2 × 2) training images

f(1) = [0.9 0.1; 0.9 0.1],  f(2) = [1.1 0.0; 1.1 0.1],
f(3) = [0.9 0.1; 1.1 0.0],  f(4) = [1.1 0.0; 0.9 0.0],
f(5) = [0.1 0.9; 0.1 0.9],  f(6) = [0.0 1.1; 0.1 1.1],
f(7) = [0.1 0.9; 0.0 1.1],  f(8) = [0.0 1.1; 0.0 0.9].

Use unsupervised learning to determine classes for these eight training images, and a rule for classifying the observed image

gobs = [0.0 1.1; 0.0 1.1].
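The clustering step in Problems 11.17 and 11.18 can be sketched with a few lines of K-Means (Python; seeding the two centroids with the first and fifth training images is an assumption — the book does not specify an initialization):

```python
import numpy as np

# The eight 2x2 training images of Problem 11.17, flattened row-wise to 4-vectors.
F = np.array([[1.1, 0.1, 0.1, 1.1], [0.9, 0.0, 0.1, 0.9],
              [1.1, 0.1, 0.0, 0.9], [0.9, 0.0, 0.0, 0.1],
              [0.1, 1.1, 1.1, 0.1], [0.0, 0.9, 0.9, 0.1],
              [0.1, 1.1, 0.9, 0.0], [0.0, 0.9, 1.1, 0.0]])

def kmeans(X, seeds, iters=20):
    """Plain K-Means: assign each point to its nearest centroid, then move
    each centroid to the mean (centroid) of its assigned points."""
    C = X[list(seeds)].copy()
    for _ in range(iters):
        d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Seeds chosen so that neither cluster empties out.
        C = np.array([X[labels == k].mean(axis=0) for k in range(len(C))])
    return labels, C

labels, C = kmeans(F, seeds=(0, 4))
g_obs = np.array([0.0, 1.1, 1.1, 0.0])            # observed image of Problem 11.17
k_obs = ((C - g_obs) ** 2).sum(axis=1).argmin()   # classify by nearest centroid
```

The diagonal-dominant images (f(1)-f(4)) and anti-diagonal images (f(5)-f(8)) fall into separate clusters, and the observed image lands in the anti-diagonal cluster.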
… is the centroid of the N points {(f1^(i), f2^(i)), i = 1, . . . , N}. This is how centroids are determined in the K-Means algorithm. Note that the quantity to be minimized is the sum of squared distances from the points to the centroid, not the sum of distances.
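That closing remark is easy to verify numerically: the sum of squared distances from a point set to a candidate point c is minimized when c is the centroid (mean), and moving c away by a displacement d increases the sum by exactly N·||d||². A small Python check (the random points are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
pts = rng.random((50, 2))              # N = 50 random 2-D points
centroid = pts.mean(axis=0)

def ssd(c):
    """Sum of squared distances from the points to c."""
    return float(np.sum((pts - c) ** 2))

base = ssd(centroid)
d = np.array([0.05, -0.02])            # any nonzero displacement
shifted = ssd(centroid + d)            # exceeds base by exactly N*||d||^2
```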
Chapter 12
Supervised Learning and Classification

Contents
Overview, 390
12-1 Overview of Neural Networks, 390
12-2 Training Neural Networks, 396
12-3 Derivation of Backpropagation, 403
12-4 Neural Network Training Examples, 404
Problems, 408

Objectives
Learn to: …

[Chapter-opener figure: a neural network with 784 input terminals classifying images of the handwritten digits 0 through 9.]
12-1 OVERVIEW OF NEURAL NETWORKS 391
[Figure 12-1 Perceptrons: (a) a perceptron whose inputs x1 and x2 are multiplied by weights w1 and w2, combined with a bias weight w0 in a weighted sum Σ, and passed through an activation function φ(·) to produce the output; (b) the same structure with six inputs x1 through x6 and weights w1 through w6.]

Then, in Section 12-2, we discuss how neural networks are trained, using the labeled training images. An example follows in Section 12-4.

A. Components of a Perceptron
… (for some activation constant a > 0) the sigmoid function

φ(x) = 1/(1 + e^(−ax)) ≈ { 1 if x > 0; 0 if x < 0 }.   (12.2)

This choice of activation function is plotted in Fig. 12-2 for a = 1 and a = 4. The similarity to a step function is evident.

Figure 12-2 Activation function φ(x) for a = 1 (blue) and a = 4 (red). The larger the value of a, the more closely φ(x) resembles a step function.

The rationale for choosing this particular mathematical representation for the activation function is supported by two attributes: (1) its resemblance to a step function, and (2) its derivative has the simple form

dφ/dx = a φ(x)(1 − φ(x)).   (12.3)

The derivative of the activation function is used in Section 12-2 for training the neural network.
Note that to implement a simple linear combination of the perceptron's inputs, the activation function should be chosen such that φ(x) = x.

B. Classification Using a Single Perceptron

Another reason for using perceptrons is that single-stage classification algorithms can each be implemented using a single perceptron.
By way of example, let us consider the classification rules we derived earlier in Examples 11-1 and 11-5. The classification rule given by Eq. (11.19) is:

Choose f1: if 12g0,0 + 10g1,0 − 2g0,1 − 8g1,1 − 46 < 0,
Choose f2: if 12g0,0 + 10g1,0 − 2g0,1 − 8g1,1 − 46 > 0.

This rule can be implemented using the perceptron in Fig. 12-3(a) with φ(x) set equal to the step function u(x). If output y = 1, choose f2, and if y = 0, choose f1.
Similarly, consider the classification rule given in Example 11-5, which stated:

Choose f1: if 4gG + 4gB − 4gR − 8 < 0,
Choose f2: if 4gG + 4gB − 4gR − 8 > 0.

The rule can be implemented using the perceptron in Fig. 12-3(b). If output y = 1, choose f2, and if y = 0, choose f1.

Figure 12-3 Perceptrons for implementing the classification rules of Examples 11-1 and 11-5: (a) perceptron for Example 11-1 (inputs g0,0, g1,0, g0,1, g1,1 with weights 12, 10, −2, −8 and bias −46); (b) perceptron for Example 11-5 (inputs gG, gB, gR with weights 4, 4, −4 and bias −8).

Exercise 12-1: Use a perceptron to implement a digital OR logic gate (y = x1 + x2, except 1 + 1 = 1).
Answer: Use Fig. 12-1(a) with φ(x) ≈ u(x), weights w1 = w2 = 1, and any w0 such that −1 < w0 < 0.
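Both rules, and the Exercise 12-1 answer, can be checked with a step-activation perceptron in a few lines (a Python sketch; taking u(0) = 1 is a convention choice not fixed by the text):

```python
import numpy as np

def perceptron(x, w, w0):
    """Step-activation perceptron: u(w0 + w . x), with u(v) = 1 for v >= 0."""
    return 1 if w0 + np.dot(w, x) >= 0 else 0

# Example 11-1 rule: choose f2 if 12*g00 + 10*g10 - 2*g01 - 8*g11 - 46 > 0.
def classify_11_1(g00, g10, g01, g11):
    y = perceptron([g00, g10, g01, g11], [12, 10, -2, -8], -46)
    return 2 if y == 1 else 1          # y = 1 -> choose f2, y = 0 -> choose f1

# Exercise 12-1 answer: OR gate with w1 = w2 = 1 and -1 < w0 < 0.
def or_gate(x1, x2):
    return perceptron([x1, x2], [1, 1], -0.5)
```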
… neural networks for deep learning.
Deep-learning neural networks became practical around 2009 when it became possible to construct them using graphical processing units (GPUs). A GPU is a parallel-processing chip capable of performing simple computations (like those in a perceptron) in parallel in thousands of parallel computing cores. They were developed for creating images for computer games and similar applications that required computations for rotation of vertices in 3-D images. An example of a GPU is the NVIDIA GeForce GTX1080 video card. The applicability of GPUs to neural networks should be evident.

C. Examples of Tiny Neural Networks

For practical applications, even the smallest neural networks have thousands of perceptrons. For example, a neural network for reading handwritten zip codes reads each digit using a camera that generates a (28 × 28) image of each digit. Hence, the input layer of the neural network has N′ = 28² = 784 terminals, which is too large to display as a figure. The output layer has L = 10 perceptrons (one for each possible output {0, 1, . . . , 9}). There is usually one hidden layer with about 15 perceptrons. Hence, in the forthcoming examples, we limit our discussion to tiny networks designed to demonstrate how neural networks operate, even though their sizes are very small.

D. Classification with a Single Separating Hyperplane

◮ A single perceptron can be used to classify images in binary (L = 2 classes) classification problems in which the regions are segregated by a separating hyperplane. ◭

a1x1 + a2x2 + · · · + aNxN = b,   (12.5)

where (x1, x2, . . . , xN) are the coordinates of a point in R^N and {a1, a2, . . . , aN} and b are constants. For N = 2, Eq. (12.5) becomes

x2 = b/a2 − (a1/a2) x1,

which is a line with slope −a1/a2. For N = 3, Eq. (12.5) becomes a plane like the one in Fig. 11-11. A separating hyperplane divides R^N into two classification regions, one for each image class.

Examples 11-1 and 11-5 are examples of binary classification problems with a separating hyperplane. Classification was accomplished using the single perceptrons in Figs. 12-3(a) and (b), respectively. The AND and OR gates in Exercises 12-1 and 12-2 are two other examples of binary classification.
Many classification problems do not have separating hyperplanes separating classification regions. The classification regions are Voronoi sets separated by multiple hyperplanes, or more complicated curved boundaries.

E. Classification Using Multiple Hyperplanes

Let us consider the XOR (exclusive OR) logic gate defined by

x1: 0 0 1 1
x2: 0 1 0 1
y:  0 1 1 0

The classification regions for the XOR gate are shown in Fig. 12-5, which include two hyperplanes.
The y = 1 region is not contiguous, so a single perceptron will not be able to classify the input pair (x1, x2) correctly. Instead, we use three perceptrons, connected as shown in Fig. 12-5(b).
With the activation functions φ(·) of all three perceptrons set to act like step functions u(x), the lower perceptron in the hidden layer of Fig. 12-5(b) computes φ(x2 − x1 − 1), whose value is 1 in the region above the line x2 − x1 = 1 (hyperplane 1 in R²). Similarly, the upper perceptron in the hidden layer computes φ(x1 − x2 − 1), whose value is 1 in the lower right region below the line x1 − x2 = 1 (hyperplane 2 in R²).
The rightmost perceptron (in the output layer) implements an OR logic gate operation. The output is y = 1 if either of its inputs is 1, and 0 if neither of its inputs is 1. The −0.5 could be replaced by any number b such that −1 < b < 0.

Implement a neural network classifier for the classification rule developed in Section 11-9.1.
Solution: In Section 11-9.1, we illustrated how to develop an unsupervised classification rule for an unknown (2 × 2) image

g[n, m] = [g[0,0] g[1,0]; g[0,1] g[1,1]],   (12.6)

given five (2 × 2) images {f(i)[n, m], i = 1, 2, . . . , 5} with unknown classes. The procedure led to coordinates g1 and g2 given …
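The three-perceptron XOR network of subsection E can be written out directly (a Python sketch; taking u(0) = 1 — so that points lying on a hyperplane fire — is a convention choice):

```python
def u(v):
    """Step activation: u(v) = 1 for v >= 0, else 0."""
    return 1 if v >= 0 else 0

def xor_net(x1, x2):
    h_lower = u(x2 - x1 - 1)              # hidden: 1 above the line x2 - x1 = 1
    h_upper = u(x1 - x2 - 1)              # hidden: 1 below the line x1 - x2 = 1
    return u(h_lower + h_upper - 0.5)     # output: OR of the two hidden outputs
```

Running all four input pairs through xor_net reproduces the XOR truth table.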
[Figure: neural network implementing the Section 11-9.1 classifier. The four input pixels g[0,0], g[0,1], g[1,0], g[1,1] are combined (weights approximately 0.54, 0.47, 0.49, 0.50, −0.52, 0.56, 0.48, −0.44) to form the coordinates g1 and g2; two hidden perceptrons compute g2 − g1 + 1.5 and g2 + g1 − 1.5 (biases ±1.5), and an output stage with bias −0.5 produces the class outputs y1 and y2.]
12-2.1 Gradient (Steepest Descent) (SD) Minimization Algorithm

A steepest-descent (SD) algorithm, also known as a gradient or (misleadingly) a gradient-descent algorithm (the gradient does not descend), is an iterative algorithm for finding the minimum value of a differentiable function f(x1, x2, . . . , xN) of N spatial variables x = [x1, x2, . . . , xN]^T. The function to be minimized must be real-valued and scalar (not vector-valued), although SD can also be applied to minimize scalar functions of vector-valued or complex-valued functions, such as ||f(x)||². To maximize f(x), we apply SD to −f(x). The minimum occurs at x = x∗, where ∗ denotes the minimizing value, not complex conjugate (this is standard notation for optimization).
SD relies on the mathematical fact that the gradient ∇f(x) specifies the (vector) direction in which f(x) increases fastest. Recall that, even though f is a scalar, ∇f is a vector defined as

∇f = [∂f/∂x1, ∂f/∂x2, . . . , ∂f/∂xN]^T.   (12.15)

The SD iteration is

x^(k+1) = x^(k) − µ ∇f(x^(k)),   (12.16)

where x^(k) is the estimate of the minimizing x at the kth iteration, and µ is the step-size, a small discretization length. Vector x^(k) is perturbed by a distance µ in the direction that decreases f(x^(k)) the fastest.
The iterative process stops when x^(k+1) ≈ x^(k), corresponding to ∇f(x) ≈ 0. This may be the location x∗ of the global minimum of f(x), or it may be only a local minimum of f(x). These are both illustrated in Example 12-2 below.
A useful interpretation of SD is to regard f(x) as the elevation at a location x in a bowl-shaped surface. The minimizing value x∗ is at the bottom of the bowl. By taking a short step in the direction in which f(x) is decreasing, we presumably get closer to the bottom x∗ of the bowl. However, we may get caught in a local minimum, at which ∇f(x) = 0, so taking another step in any direction would increase f(x).
Figure 12-7 illustrates how taking short steps in the direction in which f(x) is decreasing may eventually take us to the minimizing value x∗. The curves are contours (as in a topographical map) of equal "elevations" of f(x) values. This is merely illustrative; backpropagation uses a 1-D SD algorithm.
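A minimal numerical sketch of this iteration (Python), applied to the 1-D function f(x) = cos(3x) − sin(x) + 2 that Example 12-2 below uses:

```python
import math

def sd_1d(df, x0, mu=0.05, iters=100):
    """1-D steepest descent: x <- x - mu * f'(x)."""
    x = x0
    for _ in range(iters):
        x = x - mu * df(x)
    return x

# f(x) = cos(3x) - sin(x) + 2, so f'(x) = -3*sin(3x) - cos(x).
df = lambda x: -3.0 * math.sin(3.0 * x) - math.cos(x)
x_global = sd_1d(df, 0.1)      # x(0) = 0.1  -> converges near the global minimum x ~ 1
x_local = sd_1d(df, -0.2)      # x(0) = -0.2 -> converges near the local minimum x ~ -1
```

The two starting points land in different minima, illustrating that SD stops wherever the derivative vanishes, global or local.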
The 1-D version of Eq. (12.16) for minimizing a function f(x) is

x^(k+1) = x^(k) − µ (df/dx)(x^(k)),   (12.17)

where k is the iteration index. Apply the algorithm given by Eq. (12.17) to compute the minimizing value x∗ for

f(x) = cos(3x) − sin(x) + 2,   −2 < x < 2.   (12.18)

Solution: The function f(x) is plotted in blue in Fig. 12-8. Over the range −2 < x < 2, f(x) has its global minimum x∗ near x = 1, as well as a local minimum near x = −1. Since

f′(x) = df/dx = −3 sin(3x) − cos(x),   |x| < 2,

the SD algorithm of Eq. (12.17) becomes

x^(k+1) = x^(k) + µ (3 sin(3x^(k)) + cos(x^(k))).   (12.19)

When initialized using x^(0) = 0.1 and µ = 0.05, the SD algorithm converged to the (desired) global minimum at x ≈ 1 after 15 iterations. Ordered pairs {(x^(k), f(x^(k))), k = 1, . . . , 15} are plotted using red "+" symbols in Fig. 12-8. But when initialized using x^(0) = −0.2, the SD algorithm converged to the (undesired) local minimum at x ≈ −1. The ordered pairs {(x^(k), f(x^(k))), k = 1, . . . , 15} are plotted using green "+" symbols in Fig. 12-8. In both cases the algorithm had smaller updates at locations x where df/dx was small, as expected from Eq. (12.17). At locations where df/dx = 0, the algorithm stopped; these were the global and local minima.

12-2.2 Supervised-Training Scenario: Classifying Numerals 0 to 9

Let us consider how a neural network should perform to correctly classify images of numerals 0 to 9. In Fig. 12-9, the input layer consists of 784 terminals, corresponding to the intensity values of (28 × 28) image pixels. The true identity of the test input image is the numeral 4. The output layer consists of 10 terminals designed to correspond to the numerals 0 to 9. Once the neural network has been properly trained using the backpropagation algorithm described later, the fifth output terminal (corresponding to numeral 4) should report an output of 1, and all of the other output terminals should report outputs of zero. If the input image is replaced with an image of a …

12-2.3 Backpropagation

A. Overview of Backpropagation

Backpropagation is an algorithm for "training" (computing the weights of) a neural network. To perform backpropagation, we are given a set of I labeled training images {f1, f2, . . . , fI}, where each fi is an unwrapped training image. That is, the original 2-D (M × N) ith image fi[n, m] is unwrapped by column into a vector fi of length N′ = MN:

fi = [f1[i], f2[i], . . . , fj[i], . . . , fN′[i]]^T.   (12.20)

Element fj[i] is the value of the jth pixel of the unwrapped training image i. The neural network has an input vector and an output vector. Image fi has N′ elements, so the input layer of the neural network should have N′ terminals. The output of the last layer consists of L neurons, corresponding to the number of image classes, and the combination of the L values constitutes an output vector. The desired variable di is the number denoting perfect classification of image i with input vector fi, and it is called the label of fi. For the zip-code example in which each training image is (28 × 28) in size, the input vector is of length N′ = 28 × 28 = 784, but the output vector is only of length L = 10, corresponding to the digits 0 through 9.
The goal of backpropagation is to determine the weights in the neural network that minimize the mean-square error E over all I training images.

B. Notation

The neural-network notation can be both complicated and confusing, so we devote this subsection to nomenclature, supported by the configuration shown in Fig. 12-10. With this notation in hand, we now define the mean-square error E over all I training images as

E = (1/I) Σ_{i=1}^{I} E[i],   (12.22)

where

E[i] = (1/2) Σ_{p=1}^{L} (eL,p[i])² = (1/2) Σ_{p=1}^{L} (dp[i] − yL,p[i])².   (12.23)
The quantity inside the summation is the difference between the desired output of neuron p of the output layer and the actual output of that neuron. The summation is over all L neurons in the output layer.

Figure 12-8 f(x) (in blue) and {(x^(k), f(x^(k))), k = 1, . . . , 15}, in red when initialized using x^(0) = 0.1, and in green when initialized using x^(0) = −0.2.

Figure 12-9 When the image of numeral 4 is introduced at the input of a properly trained neural network, the output terminals should all be zeros, except for the terminal corresponding to numeral 4.

12-2.4 Backpropagation Training Procedure

The classification accuracy of the neural network depends on the relationship between the N′ inputs associated with the input image and the L outputs associated with the L classes. That relationship depends, in turn, on the assignment of weights wℓ,p,q, representing the gain between the output of the qth neuron in layer (ℓ − 1) and the input to the pth neuron in layer ℓ (Fig. 12-11). Backpropagation is the tool used to determine those weights.
The process involves the use of I training images and one or more iterations. We will outline three backpropagation
Figure 12-10 Neural network with 4 terminals at input layer, corresponding to image vector f = [f1 f2 f3 f4]^T, 2 hidden layers of 2 neurons each, and an output layer with three terminals, corresponding to three image classes.
procedures with different approaches. The procedures utilize relationships extracted from the derivation given later in Section 12-3.

A. Procedure 1: Iterate over k, One Image at a Time

Image i = 1, Iteration k = 0

1. Initialize: For test image i = 1 (chosen at random and labeled image 1), initialize the weights wℓ,p,q^(0)[1] with randomly chosen values. The superscript (0) denotes that wℓ,p,q^(0)[1] are the initial assignments.

Weight Initialization

The initial values of the weights w in the neural network should be small enough such that the linear combination of inputs is between 0 and 1. In addition, even when the activation function (sigmoid function) has a steep slope, the neuron output will be roughly proportional to the linear combination (instead of just being 1 or 0) for a wide range of input values.
… defining the total mean-square error (MSE) in Eq. (12.22) can be rewritten as a sum of the error over the I2 batches:

E = (1/I2) Σ_{i2=1}^{I2} E[i2].   (12.29a)

Here, E[i2] is the MSE of batch i2, which is given by …

Concept Question 12-5: What is backpropagation and what is it used for?

Concept Question 12-6: What does it mean to train a neural network?

Concept Question 12-7: Does a gradient (steepest descent) algorithm always find the global minimum?

This relationship, which is used to update the weight wℓ,p,q^(k)[i] of iteration k to weight wℓ,p,q^(k+1)[i] in iteration (k + 1), is based on the steepest descent (SD) algorithm, which uses the iterative form

wℓ,p,q^(k+1)[i] = wℓ,p,q^(k)[i] − µ ∂E[i]/∂wℓ,p,q^(k)[i].   (12.31)

and Eq. (12.33) simplifies to

∂E[i]/∂wℓ,p,q^(k)[i] = δℓ,p yℓ−1,q[i].   (12.35)

A second use of the chain rule gives

δℓ,p[i] = ∂E[i]/∂υℓ,p[i] = Σ_{s=1}^{nℓ+1} (∂E[i]/∂υℓ+1,s[i]) (∂υℓ+1,s[i]/∂υℓ,p[i]) = Σ_{s=1}^{nℓ+1} δℓ+1,s[i] ∂υℓ+1,s[i]/∂υℓ,p[i].   (12.36)
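For a single neuron at the output layer, the update of Eq. (12.31) with the gradient of Eq. (12.35) — where δ uses the sigmoid derivative of Eq. (12.3) — reduces to a short loop. A Python sketch (the AND-gate training data, step size, and iteration count are invented for illustration; the setting mirrors Problem 12.12):

```python
import math

def sigmoid(v, a=1.0):
    return 1.0 / (1.0 + math.exp(-a * v))

def train_neuron(X, d, mu=0.5, a=1.0, iters=2000):
    """SD on E[i] = (1/2)(d[i]-y[i])^2 for a single neuron y = phi(v),
    v = sum_j w_j x_j with bias input x_0 = 1. The factor a*y*(1-y) is
    dphi/dv (Eq. 12.3); each weight step is Eq. (12.31) with gradient
    delta * x_j as in Eq. (12.35)."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(iters):
        for x, di in zip(X, d):
            xi = [1.0] + list(x)                      # prepend bias input x_0 = 1
            v = sum(wj * xj for wj, xj in zip(w, xi))
            y = sigmoid(v, a)
            delta = -(di - y) * a * y * (1.0 - y)     # delta = dE/dv
            w = [wj - mu * delta * xj for wj, xj in zip(w, xi)]
    return w

# Invented AND-gate data in the single-neuron setting of Problem 12.12:
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
d = [0, 0, 0, 1]
w = train_neuron(X, d)
preds = [round(sigmoid(w[0] + w[1]*x1 + w[2]*x2)) for (x1, x2) in X]
```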
12-3 Derivation of Backpropagation
[Figure: a small network whose nine inputs f1 through f9 feed outputs c1 through c10; panel (a) shows an image of noise plus a "+" sign.]

Figure 12-14 Performance of neural network for detection of "+." Correct in blue, output in red.
y[i] = [y1[i], y2[i], . . . , y10[i]]^T,   (12.47b)

and

d[i] = [d1[i], d2[i], . . . , d10[i]]^T,   (12.47c)

dj[i] = { 1 for j = correct digit; 0 for j = incorrect digit }.   (12.47d)

A training set of I = 60,000 images of handwritten digits is available at the U.S. National Institute of Standards and Technology (NIST). The zip code digits were handwritten by 250 different people. The neural network, comprising 784 input terminals, 10 output terminals, and many hidden layers, is illustrated in Fig. 12-15.
Summary

Concepts
• The output of a perceptron is the application of an activation function to a weighted sum of its inputs. A perceptron mimics the biological action of a neuron, and perceptrons are often called neurons.
• A common choice for activation function is the sigmoid function (below).
• Classification rules defined by a separating hyperplane can be implemented using a single perceptron.
• A neural network is a network of perceptrons, connected in a way that mimics the connections of neurons in the brain.
• The weights in a neural network are computed using an algorithm called backpropagation, using a set of labelled training images. Backpropagation uses one iteration of a gradient or steepest descent algorithm.
• Computing the weights in a neural network by applying backpropagation using a set of labelled training images is called training the neural network. There are three different ways of performing training.
Mathematical Formulae

Perceptron:
y = φ(w0 + w1x1 + · · · + wNxN)

Sigmoid function:
φ(x) = 1/(1 + e^(−ax))
Important Terms Provide definitions or explain the meaning of the following terms: activation function, backpropagation, gradient, hidden layers, input layer, neural network, neuron, output layer, perceptron, sigmoid function, steepest descent, supervised learning, training.
PROBLEMS

Section 12-1: Overview of Neural Networks

12.1 In Exercise 12-1, you had to determine by inspection the weights of a perceptron so that it implemented an OR gate. Write out a set of nonlinear equations whose solution is the weights in Fig. 12-1(a).

12.2 In Exercise 12-2, you had to determine by inspection the weights of a perceptron so that it implemented an AND gate. Write out a set of nonlinear equations whose solution is the weights in Fig. 12-1(a).

12.3 An image [g0,0 g1,0; g0,1 g1,1] is to be classified as either [1 0; 0 1] or [0 1; 1 0]. Write a set of equations whose solution is the weights replacing those in Fig. 12-3(a).

12.4 An image [g0,0 g1,0; g0,1 g1,1] is to be classified as either [1 0; 0 1] or [0 1; 1 0]. Specify the weights in a perceptron like Fig. 12-3(a) that classifies the image.

12.5 A binary adder implements binary addition (with carry). It has the truth table

x1:    0 0 1 1
x2:    0 1 0 1
sum:   0 1 1 0
carry: 0 0 0 1

where sum = x1 + x2 (mod 2) and carry is the carry bit (1 + 1 = 10 in base 2). Modify the neural network in Fig. 12-5(b) to implement a binary adder.

12.6 A binary-to-decimal converter accepts as input a 2-bit binary number (x1x2)2 and converts it to a decimal number 0,
1, 2 or 3. Design a neural network that accepts as inputs {x1, x2} and outputs {y0, y1, y2, y3}, where if (x1x2)2 = K, then yK = 1 and {yk, k ≠ K} are all zero.

12.7 (This problem may require review of Chapter 2.) A 1-D signal x(t) of duration 1 s is sampled at 44100 samples/s, resulting in the discrete-time signal x[n] = x(n/44100) of duration 44100 samples. Design a neural network for determining the presence or absence of a trumpet playing note A (fundamental frequency 440 Hz) by examining the first four harmonics of the trumpet signal. Assume the real parts of their DFT are positive.

12.8 (This problem may require review of Chapter 5.) Design a neural network for edge detection on a 1-D signal {x[n]} of duration N. An edge is at n = n0 if |x[n0] − x[n0 − 1]| > T for some threshold T. Let x[−1] = x[0]. The extension to 2-D edge detection of horizontal or vertical edges is straightforward.

Section 12-2: Training Neural Networks

12.9 The program neuron.m trains a neuron using labeled training vectors. Run the program to train (determine 2 weights of) a neuron to implement an OR gate. This is the same problem as Problem 12.1. The neuron has the form of Fig. 12-1(a). Use 1000 iterations with step size µ = 0.01, a = 7, and initialize all weights with 0.1. Compare the neuron outputs y[i] with the desired outputs d[j] for j = 1, 2, 3, 4 in a table.

12.10 The program neuron.m trains a neuron using labelled training vectors. Run the program to train (determine 2 weights of) a neuron to implement an AND gate. This is the same problem as Problem 12.2. The neuron has the form of Fig. 12-1(a). Use 1000 iterations with step size µ = 0.01, a = 7, and initialize all weights with 0.1. Compare the neuron outputs y[i] with the desired outputs d[j] for j = 1, 2, 3, 4 in a table.

12.11 The program neuron.m trains a neuron using labelled training vectors. Run the program to train a neuron to classify a (2 × 2) image as [1 0; 0 1] or [0 1; 1 0]. This is the same problem as Problem 12.3. The neuron has the form of Fig. 12-1(b). Use 1000 iterations with step size µ = 0.01, a = 7, and initialize all weights with 0.1. Compare the neuron outputs y[i] with the desired outputs d[j] for j = 1, 2 in a table.

Section 12-3: Derivation of Backpropagation

12.12 We clarify the derivation of backpropagation by applying it to a single neuron. Single neurons are discussed in … training input M-vectors x[i] = [x1[i], x2[i], . . . , xM[i]]^T, where i = 1 . . . I, and I labels {d[i], i = 1, . . . , I}, where d[i] is the desired output for training vector x[i]. The goal is to compute weights {wj, j = 0, . . . , M} that minimize E[i] = (1/2)(d[i] − y[i])², where y[i] = φ(Σ_{j=0}^{M} wj xj[i]) and x0 = 1 implements the single neuron.
(a) Derive a steepest descent algorithm to compute {wj, j = 0, . . . , M} minimizing E[i].
(b) Show that this is the output layer ℓ = L in the backpropagation derivation.

Section 12-4: Neural Network Training Examples

12.13 Program P1213.m creates 100 random points in the square −10 < x1, x2 < 10 and labels them as being inside or outside a circle of radius 8 centered at the origin (so the areas inside and outside the circle are roughly equal: π(8)² ≈ 201 ≈ 200 = (1/2)(20)²). It then trains a neural network with 2 inputs (the x1 and x2 coordinates of each point), 1 output neuron (for inside or outside the circle), and a hidden layer of 10 neurons. It uses 1000 iterations, each running over 100 training points (2 coordinates each), µ = 0.01, and a = 7. Run P1213.m using different initializations until it successfully assigns each training point as being in category #1 (inside) or #2 (outside) the circle. Of course, the neural network doesn't "know" circles; it "learns" this from training.

12.14 Program P1214.m creates 100 random points in the square −10 < x1, x2 < 10 and labels them as being inside or outside a parabola x2 = x1²/10. It then trains a neural network with 2 inputs (the x1 and x2 coordinates of each point), 1 output neuron (for inside or outside the parabola), and a hidden layer of 10 neurons. It uses 1000 iterations, each running over 100 training points (2 coordinates each), µ = 0.01, and a = 7. Run P1214.m using different initializations until it successfully assigns each training point as being in category #1 (inside) or #2 (outside) the parabola. Of course, the neural network doesn't "know" parabolas; it "learns" this from training.

12.15 Program P1215.m creates 100 random points in the square −10 < x1, x2 < 10 and labels them as being inside or outside 4 quarter circles centered on the corners. (The areas inside and outside the quarter circles are roughly equal: π(8)² ≈ 201 ≈ 200 = (1/2)(20)².) It then trains a neural network with 2 inputs (the x1 and x2 coordinates of each point), 1 output neuron (for inside or outside the quarter circles), and a hidden layer of 10 neurons. It uses 1000 iterations, each running over 100 training points (2 coordinates each), µ = 0.01, and a = 7. Run
Section 12-1.1 and illustrated in Fig. 12-1. We are given I P1215.m using different initializations until it successfully
410 CHAPTER 12 SUPERVISED LEARNING AND CLASSIFICATION
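Problems 12.9–12.11 rely on the program neuron.m, which is not listed here. The following NumPy sketch shows one way such a single-neuron trainer could look, assuming a sigmoid activation σ(v) = 1/(1 + e^(−av)) and squared-error gradient descent; the function name train_neuron and the gradient details are our assumptions, not the book's code.

```python
import numpy as np

# Illustrative sketch (not the book's neuron.m): train a single neuron
# with sigmoid activation sigma(v) = 1/(1 + exp(-a*v)) to implement an
# OR gate, as in Problem 12.9. The slope a, step size mu, iteration
# count, and 0.1 initialization follow the problem statement; the
# gradient-descent details are our assumption.
def train_neuron(X, d, a=7.0, mu=0.01, iters=1000):
    # X: (I, M) inputs; d: (I,) desired outputs; bias handled via w0.
    I, M = X.shape
    Xb = np.hstack([np.ones((I, 1)), X])      # prepend bias input
    w = 0.1 * np.ones(M + 1)                  # initialize all weights to 0.1
    for _ in range(iters):
        for x, dj in zip(Xb, d):
            v = w @ x
            y = 1.0 / (1.0 + np.exp(-a * v))  # sigmoid output
            # gradient of (y - d)^2 w.r.t. w, using sigma' = a*y*(1-y)
            w -= mu * 2.0 * (y - dj) * a * y * (1.0 - y) * x
    return w

# OR-gate training set (same as Problem 12.1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
d = np.array([0, 1, 1, 1], dtype=float)
w = train_neuron(X, d)
y = 1.0 / (1.0 + np.exp(-7.0 * (np.hstack([np.ones((4, 1)), X]) @ w)))
print(np.round(y))   # should round to the OR targets [0, 1, 1, 1]
```

With the stated parameters, the bias weight is driven negative and the two input weights positive, so the rounded outputs should reproduce the OR-gate targets.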
|z| = √(z z*).   (A.9)
412 APPENDIX A REVIEW OF COMPLEX NUMBERS
Equality: If two complex numbers z1 and z2 are given by

z1 = x1 + jy1 = |z1| e^jθ1,   (A.10a)
z2 = x2 + jy2 = |z2| e^jθ2,   (A.10b)

then z1 = z2 if and only if (iff) x1 = x2 and y1 = y2 or, equivalently, |z1| = |z2| and θ1 = θ2.

Division: For z2 ≠ 0,

z1/z2 = (x1 + jy1)/(x2 + jy2)
      = [(x1 + jy1)(x2 − jy2)] / [(x2 + jy2)(x2 − jy2)]
      = [(x1 x2 + y1 y2) + j(x2 y1 − x1 y2)] / (x2² + y2²),   (A.13a)

or

z1/z2 = |z1| e^jθ1 / (|z2| e^jθ2)
      = (|z1|/|z2|) e^j(θ1−θ2)
      = (|z1|/|z2|) [cos(θ1 − θ2) + j sin(θ1 − θ2)].   (A.13b)

z^n = (|z| e^jθ)^n = |z|^n e^jnθ = |z|^n (cos nθ + j sin nθ),   (A.14)

z^(1/2) = ±|z|^(1/2) e^jθ/2 = ±|z|^(1/2) [cos(θ/2) + j sin(θ/2)].   (A.15)

Figure A-2 Complex numbers z1 = 2 + j3, z2 = −2 + j3, z3 = −2 − j3, and z4 = 2 − j3 have the same magnitude |z| = √(2² + 3²) = 3.61, but their polar angles depend on the polarities of their real and imaginary components: θ1 = tan⁻¹(3/2) = 56.3°, θ2 = 180° − θ1, θ3 = −θ2, and θ4 = −θ1.
Summary of relations for z = x + jy = |z| e^jθ:

x = Re(z) = |z| cos θ          |z| = √(zz*) = √(x² + y²)
y = Im(z) = |z| sin θ          θ = tan⁻¹(y/x)
z^n = |z|^n e^jnθ              z^(1/2) = ±|z|^(1/2) e^jθ/2
z1 = x1 + jy1                  z2 = x2 + jy2
z1 = z2 iff x1 = x2 and y1 = y2
z1 + z2 = (x1 + x2) + j(y1 + y2)
z1 z2 = |z1||z2| e^j(θ1+θ2)    z1/z2 = (|z1|/|z2|) e^j(θ1−θ2)
−1 = e^jπ = e^−jπ = 1∠±180°
j = e^jπ/2 = 1∠90°             −j = e^−jπ/2 = 1∠−90°
√j = ±e^jπ/4 = ±(1 + j)/√2     √(−j) = ±e^−jπ/4 = ±(1 − j)/√2
Example A-1: Working with Complex Numbers

V = |V| e^jθV = 5 e^−j53.1° = 5∠−53.1°,
|I| = √(2² + 3²) = √13 = 3.61.

(Figure: V = 3 − j4 and I = −2 − j3 plotted in the complex plane, showing |V|, θV, |I|, and θI.)

(c) VI = (5∠−53.1°)(3.61∠−123.7°)
       = (5 × 3.61)∠(−53.1° − 123.7°) = 18.05∠−176.8°.

(d) VI* = 5 e^−j53.1° × 3.61 e^j123.7° = 18.05 e^j70.6°.

(e) V/I = 5 e^−j53.1° / (3.61 e^−j123.7°) = 1.39 e^j70.6°.

√I = √(3.61 e^−j123.7°) = ±√3.61 e^−j123.7°/2 = ±1.90 e^−j61.85°.

Exercise A-2: Show that √(2j) = ±(1 + j). (See IP)

Exercise: Evaluate z1 = (4 − j3)² and z2 = (4 − j3)^(1/2).
Answer: z1 = 25∠−73.7°, z2 = ±√5∠−18.4°. (See IP)
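The arithmetic in Example A-1 and Exercise A-2 can be spot-checked with Python's cmath module; V = 3 − j4 and I = −2 − j3 are the rectangular forms implied by the magnitudes and angles above (values rounded as in the text).

```python
import cmath

V = 3 - 4j          # 5 at -53.1 degrees
I = -2 - 3j         # 3.61 at -123.7 degrees

print(abs(V), cmath.phase(V) * 180 / cmath.pi)   # about 5.0, -53.13
print(abs(I), cmath.phase(I) * 180 / cmath.pi)   # about 3.606, -123.69

# (c) VI, (d) VI*, (e) V/I: magnitudes multiply/divide, angles add/subtract
print(abs(V * I))                  # about 18.03 (18.05 in the rounded text)
print(abs(V * I.conjugate()))      # same magnitude as VI
print(abs(V / I))                  # about 1.387 (1.39 in the rounded text)

# cmath.sqrt returns the principal root; the second root is its negative,
# matching z^(1/2) = +/- |z|^(1/2) e^(j theta/2).
r = cmath.sqrt(2j)
print(r)                           # approximately 1 + 1j, so sqrt(2j) = +/-(1 + j)
```

Note that cmath reports only the principal square root; the ± in (A.15) is recovered by negating it.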
Appendix B
MATLAB® and MathScript
A Short Introduction for Use in Image Processing

B-1 Background

“A computer will always do exactly what you tell it to do. But that may not be what you had in mind.”—a quote from the 1950's.

This Appendix is a short introduction to MATLAB and MathScript for this book. It is not comprehensive; only commands directly applicable to signal and image processing are covered. No commands in any of MATLAB's Toolboxes are included, since these commands are not included in basic MATLAB or MathScript. Programming concepts and techniques are not included, since they are not used anywhere in this book.

MATLAB

MATLAB is a computer program developed and sold by The Mathworks, Inc. It is the most commonly used program in signal processing, but it is used in all fields of engineering. “MATLAB” (matrix laboratory) was originally based on a set of numerical linear algebra programs, written in FORTRAN, called LINPACK. So MATLAB tends to formulate problems in terms of vectors and arrays of numbers, and often solves problems by formulating them as linear algebra problems.

The student edition of MATLAB is much cheaper than the professional version. It is licensed for use by all undergraduate and graduate students. Every program on the website for this book will run on it.

MathScript

MathScript is a computer program developed and sold by National Instruments, as a module in LabVIEW. The basic commands used by MATLAB also work in MathScript, but higher-level MATLAB commands, and those in Toolboxes, usually do not work in MathScript. Unless otherwise noted, all MATLAB commands used in this book and website also work in MathScript.

One important exception is colormap(gray). To make this work in MathScript, G=[0:64]/64;gray=G'*[1 1 1]; must be inserted at the beginning of the program.

Instructions on how to acquire a student version of MathScript are included on the website accompanying the book, as part of the student edition of LabVIEW. In the sequel, we use “M/M” to designate “MATLAB or MathScript.”

Freemat and GNU Octave are freeware programs that are mostly compatible with MATLAB.

Getting Started

To install the student version of MathScript included on the website, follow the instructions. When you run M/M, a prompt >> will appear when it is ready. Then you can type commands. Your first command should be >>cd mydirectory, to change directory to your working directory, which we call “mydirectory” here.

We will use this font to represent typed commands and generated output. You can get help for any command, such as plot, by typing help plot at the prompt.

Some basic things to know about M/M:

• Inserting a semicolon “;” at the end of a command suppresses output; without it, M/M will type the results of the computation. This is harmless, but it is irritating to have numbers flying by on your screen.

• Inserting ellipses “...” at the end of a command means it is continued on the next line. This is useful for long commands.

• Inserting “%” at the beginning of a line makes the line a comment; it will not be executed. Comments are used to explain what the program is doing at that point.

• clear eliminates all present variables. Programs should always start with a clear.

• whos shows all variables and their sizes.
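The colormap(gray) workaround above is just an outer product. As an aside for readers working outside M/M, the same construction in NumPy makes the shape explicit (the variable names mirror the MathScript line):

```python
import numpy as np

# The MathScript workaround G=[0:64]/64; gray=G'*[1 1 1]; builds a
# 65 x 3 grayscale colormap whose rows run from black (0,0,0) to
# white (1,1,1). The same outer-product construction in NumPy:
G = np.arange(65) / 64.0             # [0:64]/64 -> 65 levels in [0, 1]
gray = np.outer(G, [1.0, 1.0, 1.0])  # G' * [1 1 1] -> one RGB row per level

print(gray.shape)   # (65, 3)
print(gray[0])      # [0. 0. 0.]  (black)
print(gray[-1])     # [1. 1. 1.]  (white)
```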
• M/M variables are case-sensitive: t and T are different variables.

• save myfile X,Y saves the variables X and Y in the file myfile.mat for use in another session of M/M at another time.

• load myfile loads all variables saved in myfile.mat, so they can now be used in the present session of M/M.

• quit ends the present session of M/M.

.m Files

An M/M program is a list of commands executed in succession. Programs are called “m-files,” since their extension is “.m,” or “scripts.”

To write an .m file, at the upper left, click File→New→m-file. This opens a window with a text editor. Type in your commands and then do File→Save as→myname.m. Make sure you save it with an .m extension. Then you can run the file by typing its name at the prompt: >>myname. Make sure the file name is not the same as a MATLAB command! Using your own name is a good idea.

You can access previously-typed commands using the up-arrow and down-arrow keys on your keyboard.

To download a file from a website, right-click on it, select save target as, and use the menu to select the proper file type (specified by its file extension).

B-2 Basic Computation

Both i and j represent √−1; answers use i. pi represents π. e does not represent 2.71828.

B-2.2 Entering Vectors and Arrays

To enter the row vector [1 2 3] and store it in A, type at the prompt A=[1 2 3]; or A=[1,2,3]; To enter the same numbers as a column vector and store it in A, type either A=[1;2;3]; or A=[1 2 3];A=A'; Note A=A' replaces A with its transpose. “Transpose” means “convert rows to columns, and vice-versa.”

To enter a vector of consecutive or equally-spaced numbers, follow these examples:

• [2:6] gives ans=2 3 4 5 6

• [3:2:9] gives ans=3 5 7 9

• [4:-1:1] gives ans=4 3 2 1

To enter an array or matrix of numbers, type, for example, B=[3 1 4;1 5 9;2 6 5]; This gives the array B and its transpose B':

B = 3 1 4     B' = 3 1 2
    1 5 9          1 5 6
    2 6 5          4 9 5

Other basics of arrays:

• ones(M,N) is an M × N array of “1”

• zeros(M,N) is an M × N array of “0”

• length(X) gives the length of vector X

• A(A<2)=0 sets to 0 all elements of vector A less than 2. For example, A=[3 1 4 1 5];A(A<2)=0 gives A=3 0 4 0 5

• To find the inner product of X and Y, which is (3)(2)+(1)(7)+(4)(3)=25, use X*Y'. This gives ans=25

• To find the outer product of X and Y, which is

(3)(2) (3)(7) (3)(3)
(1)(2) (1)(7) (1)(3)
(4)(2) (4)(7) (4)(3)

use X'*Y. This gives the above matrix.

M/M indexing of arrays starts with 1, while signal and image indexing starts with 0. For example, the DFT is defined using index n = 0, 1, . . . , N − 1, for k = 0, 1, . . . , N − 1. fft(X), which computes the DFT of X, performs
fft(X)=X*exp(-j*2*pi*[0:N-1]'*[0:N-1]/N);
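The fft identity just stated is easy to verify; here is the same check transcribed into NumPy (the test vector X and size N = 8 are arbitrary choices for illustration):

```python
import numpy as np

# Check that multiplying a row vector by the DFT matrix
# exp(-j 2 pi n k / N) reproduces fft(X).
N = 8
X = np.arange(N, dtype=float)                 # any test vector
n = np.arange(N)
W = np.exp(-2j * np.pi * np.outer(n, n) / N)  # [0:N-1]' * [0:N-1] / N
print(np.allclose(X @ W, np.fft.fft(X)))      # True
```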
• Plot each computed value of X against its corresponding value of T using plot(T,X).

• If you are making several different plots, put them all on one page using subplot. subplot(324),plot(T,X) divides a figure into a 3-by-2 array of plots, and puts the X vs. T plot into the 4th place in the array (the middle of the rightmost column).

One problem with MathScript that does not arise with MATLAB is that in MathScript subplot(324) opens 6 figures, even if only one or two of them will actually be used for plots. This is inelegant but harmless.

Print out the current figure (the one in the foreground; click on a figure to bring it to the foreground) by typing print. Print the current figure to an encapsulated postscript file myname.eps by typing print -deps2 myname.eps. Type help print for a list of printing options for your computer. For example, use -depsc2 to save a figure in color.

To make separate plots of cos(4t) and sin(4t) for 0 ≤ t ≤ 5 in a single figure, use the following:

T=linspace(0,5,100);X=cos(4*T);Y=sin(4*T);
subplot(211),plot(T,X)
subplot(212),plot(T,Y)

(Figure: the resulting pair of plots, cos(4t) on top and sin(4t) below, each for 0 ≤ t ≤ 5.)

The default is that plot(X,Y) plots each of the 100 ordered pairs (X(I),Y(I)) for I = 1, . . . , 100, and connects the points with straight lines. If there are only a few data points to be plotted, they should be plotted as individual ordered pairs, not connected by lines. This can be done using plot(X,Y,'+').

B-3.2 Plotting Problems

Common problems encountered using plot: T and X must have the same lengths, and neither T nor X should be complex; use plot(T,real(X)) if necessary.

The above linspace command generates 100 equally spaced numbers between a and b, including a and b. This is not the same as sampling x(t) with a sampling interval of (b − a)/100. To see why:

• linspace(0,1,10) gives 10 numbers between 0 and 1 inclusive, spaced by 0.111;

• [0:.1:1] gives 11 numbers spaced by 0.1.

Try the following yourself on M/M:

• T=[0:10];X=3*cos(T);plot(T,X) This should be a very jagged-looking plot, since it is only sampled at 11 integers and the samples are connected by lines.

• T=[0:0.1:10];X=3*cos(T);plot(T,X) This should be a much smoother plot, since there are now 101 (not 11) samples.

• T=[1:4000];X=cos(2*pi*440*T/8192); sound(X,8192) This is musical note “A.” sound(X,Fs) plays the vector X as sound at a sampling rate of Fs samples/second.

• plot(X) This should be a blue smear! It is about 200 cycles squished together.

• plot(X(1:100)) This “zooms in” on the first 100 samples of X to see the sinusoid. It is also possible to zoom in by clicking on the figure.

B-3.3 More Advanced Plotting

Plots should be labelled and annotated. Use:

• title('Myplot') adds the title “Myplot”

• xlabel('t') labels the x axis with “t”

• ylabel('x') labels the y axis with “x”

• $\omega$ produces ω in title, xlabel and ylabel. Similarly for other Greek letters. Note ', not `, should be used everywhere.
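The linspace-versus-colon distinction in Section B-3.2 can be checked numerically; the following NumPy transcription uses np.arange to stand in for the colon operator:

```python
import numpy as np

# linspace(0,1,10) gives 10 points including both endpoints, spaced by
# 1/9 = 0.111..., while [0:.1:1] gives 11 points spaced by exactly 0.1.
a = np.linspace(0, 1, 10)       # like linspace(0,1,10)
b = np.arange(0, 1.05, 0.1)     # like [0:.1:1]
print(len(a), a[1] - a[0])      # 10 points, spacing 0.111...
print(len(b), b[1] - b[0])      # 11 points, spacing 0.1
```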
B-5 MISCELLANEOUS COMMANDS

• axis tight contracts the plot borders to the limits of the plot itself.

…from 0 to 1 for color images; this must be done separately for each component:
Index
warping, 152
wavelength, 25
wavelet, 218, 223
wavelet transform, 203
wavelet transform matrix, 237
wavenumbers, 94
weak-sense stationary, 280
weighting coefficients, 76
white, 282
white random process, 282
wide-sense stationary, 280
Wiener filter, 193, 310
Wiener process, 317
windowed sinc functions, 130
within-class, 367
WSS, 280, 304
zero-padded functions, 80
zero-padding, 74
zero-stuffing, 203, 205, 212
Andrew E. Yagle Fawwaz T. Ulaby
University of Michigan