
Image Processing

For Engineers

by Andrew E. Yagle and Fawwaz T. Ulaby


Book companion website: ip.eecs.umich.edu

IMAGE
PROCESSING
FOR ENGINEERS

Andrew E. Yagle
The University of Michigan

Fawwaz T. Ulaby
The University of Michigan
Copyright © 2018 Andrew E. Yagle and Fawwaz T. Ulaby

This book is published by Michigan Publishing under an agreement with the authors.
It is made available free of charge in electronic form to any student or instructor
interested in the subject matter.

Published in the United States of America by


Michigan Publishing
Manufactured in the United States of America

ISBN 978-1-60785-488-3 (hardcover)


ISBN 978-1-60785-489-0 (electronic)

This book is dedicated to the memories of


Professor Raymond A. Yagle and Mrs. Anne Yagle
Contents

Preface

Chapter 1  Imaging Sensors  1
1-1  Optical Imagers  3
1-2  Radar Imagers  13
1-3  X-Ray Computed Tomography (CT)  18
1-4  Magnetic Resonance Imaging  19
1-5  Ultrasound Imager  23
1-6  Coming Attractions  27

Chapter 2  Review of 1-D Signals and Systems  38
2-1  Review of 1-D Continuous-Time Signals  41
2-2  Review of 1-D Continuous-Time Systems  43
2-3  1-D Fourier Transforms  47
2-4  The Sampling Theorem  53
2-5  Review of 1-D Discrete-Time Signals and Systems  59
2-6  Discrete-Time Fourier Transform (DTFT)  66
2-7  Discrete Fourier Transform (DFT)  70
2-8  Fast Fourier Transform (FFT)  76
2-9  Deconvolution Using the DFT  80
2-10 Computation of Continuous-Time Fourier Transform (CTFT) Using the DFT  82

Chapter 3  2-D Images and Systems  89
3-1  Displaying Images  90
3-2  2-D Continuous-Space Images  91
3-3  Continuous-Space Systems  93
3-4  2-D Continuous-Space Fourier Transform (CSFT)  94
3-5  2-D Sampling Theorem  107
3-6  2-D Discrete Space  113
3-7  2-D Discrete-Space Fourier Transform (DSFT)  118
3-8  2-D Discrete Fourier Transform (2-D DFT)  119
3-9  Computation of the 2-D DFT Using MATLAB  121

Chapter 4  Image Interpolation  128
4-1  Interpolation Using Sinc Functions  129
4-2  Upsampling and Downsampling Modalities  130
4-3  Upsampling and Interpolation  133
4-4  Implementation of Upsampling Using 2-D DFT in MATLAB  137
4-5  Downsampling  140
4-6  Antialias Lowpass Filtering  141
4-7  B-Splines Interpolation  143
4-8  2-D Spline Interpolation  149
4-9  Comparison of 2-D Interpolation Methods  150
4-10 Examples of Image Interpolation Applications  152

Chapter 5  Image Enhancement  159
5-1  Pixel-Value Transformation  160
5-2  Unsharp Masking  163
5-3  Histogram Equalization  167
5-4  Edge Detection  171
5-5  Summary of Image Enhancement Techniques  176

Chapter 6  Deterministic Approach to Image Restoration  180
6-1  Direct and Inverse Problems  181
6-2  Denoising by Lowpass Filtering  183
6-3  Notch Filtering  188
6-4  Image Deconvolution  191
6-5  Median Filtering  194
6-6  Motion-Blur Deconvolution  195

Chapter 7  Wavelets and Compressed Sensing  202
7-1  Tree-Structured Filter Banks  203
7-2  Expansion of Signals in Orthogonal Basis Functions  206
7-3  Cyclic Convolution  209
7-4  Haar Wavelet Transform  213
7-5  Discrete-Time Wavelet Transforms  218
7-6  Sparsification Using Wavelets of Piecewise-Polynomial Signals  223
7-7  2-D Wavelet Transform  228
7-8  Denoising by Thresholding and Shrinking  232
7-9  Compressed Sensing  236
7-10 Computing Solutions to Underdetermined Equations  238
7-11 Landweber Algorithm  241
7-12 Compressed Sensing Examples  242

Chapter 8  Random Variables, Processes, and Fields  254
8-1  Introduction to Probability  255
8-2  Conditional Probability  259
8-3  Random Variables  261
8-4  Effects of Shifts on Pdfs and Pmfs  263
8-5  Joint Pdfs and Pmfs  265
8-6  Functions of Random Variables  269
8-7  Random Vectors  272
8-8  Gaussian Random Vectors  275
8-9  Random Processes  278
8-10 LTI Filtering of Random Processes  282
8-11 Random Fields  285

Chapter 9  Stochastic Denoising and Deconvolution  291
9-1  Estimation Methods  292
9-2  Coin-Flip Experiment  298
9-3  1-D Estimation Examples  300
9-4  Least-Squares Estimation  303
9-5  Deterministic versus Stochastic Wiener Filtering  307
9-6  2-D Estimation  309
9-7  Spectral Estimation  313
9-8  1-D Fractals  314
9-9  2-D Fractals  320
9-10 Markov Random Fields  322
9-11 Application of MRF to Image Segmentation  327

Chapter 10  Color Image Processing  334
10-1 Color Systems  335
10-2 Histogram Equalization and Edge Detection  340
10-3 Color-Image Deblurring  343
10-4 Denoising Color Images  346

Chapter 11  Image Recognition  353
11-1 Image Classification by Correlation  354
11-2 Classification by MLE  357
11-3 Classification by MAP  358
11-4 Classification of Spatially Shifted Images  360
11-5 Classification of Spatially Scaled Images  361
11-6 Classification of Rotated Images  366
11-7 Color Image Classification  367
11-8 Unsupervised Learning and Classification  373
11-9 Unsupervised Learning Examples  377
11-10 K-Means Clustering Algorithm  380

Chapter 12  Supervised Learning and Classification  389
12-1 Overview of Neural Networks  390
12-2 Training Neural Networks  396
12-3 Derivation of Backpropagation  403
12-4 Neural Network Training Examples  404

Appendix A  Review of Complex Numbers  411
Appendix B  MATLAB® and MathScript  415

Index  421
Preface

"A picture is worth a thousand words."

This is an image processing textbook with a difference. Instead of just a picture gallery of before-and-after images, we provide (on the accompanying website) MATLAB programs (.m files) and images (.mat files) for each of the examples. These allow the reader to experiment with various parameters, such as noise strength, and see their effect on the image processing procedure. We also provide general MATLAB programs, and Javascript versions of them, for many of the image processing procedures presented in this book. We believe studying image processing without actually performing it is like studying cooking without turning on an oven.

Designed for a course on image processing (IP) aimed at both graduate students and undergraduates in their senior year, in any field of engineering, this book starts with an overview in Chapter 1 of how imaging sensors—from cameras to radars to MRIs and CAT—form images, and then proceeds to cover a wide array of image processing topics. The IP topics include: image interpolation, magnification, thumbnails, sharpening, edge detection, noise filtering, de-blurring of blurred images, supervised and unsupervised learning, and image segmentation, among many others. As a prelude to the chapters focused on image processing (Chapters 3–12), the book offers in Chapter 2 a review of 1-D signals and systems, borrowed from our 2018 book Signals and Systems: Theory and Applications, by Ulaby and Yagle.

Book highlights:

• A section in Chapter 1 called "Coming Attractions," offering a sampling of the image processing applications covered in the book.

• MATLAB programs and images (.m and .mat files) on the book's website for all examples and problems. All of these also run on NI LabVIEW MathScript.

• Coverage of standard image processing techniques, including upsampling and downsampling, rotation and scaling, histogram equalization, lowpass filtering, classification, edge detection, and an introduction to color image processing.

• An introduction to discrete wavelets, and application of wavelet-based denoising algorithms using thresholding and shrinkage, including examples and problems.

• An introduction to compressed sensing, including examples and problems.

• An introduction to Markov random fields and the ICM algorithm.

• An introduction to supervised and unsupervised learning and neural networks.

• Coverage of both deterministic (least-squares) and stochastic (a priori power spectral density) image deconvolution, and how the latter gives better results.

• Interpolation using B-splines.

• A review of probability, random processes, and MLE, MAP, and LS estimation.

Book Companion Website: ip.eecs.umich.edu

The book website is a rich resource developed to extend the educational experience of the student beyond the material covered in the textbook. It contains MATLAB programs, standard images to which the reader can apply the image processing tools outlined in the book, and Javascript image processing modules with selectable parameters. It also contains solutions to "concept questions" and "exercises," and, for instructors, solutions to homework problems.

Acknowledgments: Mr. Richard Carnes—our friend, our piano performance teacher, and our LaTeX super compositor—deserves singular thanks and praise for the execution of this book. We are truly indebted to him for his meticulous care and attention. We also thank Ms. Rose Anderson for the elegant design of the cover and for creating the printable Adobe InDesign version of the book.

Andrew Yagle and Fawwaz Ulaby, 2018
Chapter 1  Imaging Sensors

Contents
Overview, 2
1-1 Optical Imagers, 3
1-2 Radar Imagers, 13
1-3 X-Ray Computed Tomography (CT), 18
1-4 Magnetic Resonance Imaging, 19
1-5 Ultrasound Imager, 23
1-6 Coming Attractions, 27
Problems, 36

Image processing has applications in medicine, robotics, human-computer interfaces, and manufacturing, among many others. This book is about the mathematical methods and computational algorithms used in processing an image from its raw form—whether generated by a digital camera, an ultrasound monitor, a high-resolution radar, or any other 2-D imaging system—into an improved form suitable for the intended application. As a prelude, this chapter provides overviews of the image formation processes associated with several sensors.

Objectives

Learn about:

■ How a digital camera forms an image, and what determines the angular resolution of the camera.

■ How a thermal infrared imager records the distribution of energy emitted by the scene.

■ How a radar can create images with very high resolution from satellite altitudes.

■ How an X-ray system uses computed tomography (CT) to generate 3-D images.

■ How magnetic resonance is used to generate 3-D MRI images.

■ How an ultrasound instrument generates an image of acoustic reflectivity, much like an imaging radar.

■ The history of image processing.

■ The types of image-processing operations examined in detail in follow-up chapters.

Overview

In today's world we use two-dimensional (2-D) images generated by a variety of different sensors, from optical cameras and ultrasound monitors to high-resolution radars and others. A camera uses light rays and lenses to form an image of the brightness distribution across the scene observed by the lens, ultrasound imagers use sound waves and transducers to measure the reflectivity of the scene or medium exposed to the sound waves, and radar uses antennas to illuminate a scene with microwaves and then detect the fraction of energy scattered back toward the radar. The three image formation processes are markedly different, yet their output product is similar: a 2-D analog or digital image. An X-ray computed tomography (CT) scanner measures the attenuation of X-rays along many directions through a 3-D object, such as a human head, and then processes the data to generate one or more 2-D cross-sectional images (called slices) of the attenuation for specific areas of interest. A rather different process occurs in magnetic resonance imaging (MRI).

For these and many other sensing processes, the formation of the 2-D image is only the first step. As depicted in Fig. 1-1, we call such an image the raw image, because often we subject the raw image to a sequence of image processing steps designed to transform the image into a product more suitable for the intended application (Table 1-1). These steps may serve to filter out (most of) the noise that may have accompanied the (desired) signal in the image detection process, rotate or interpolate the image if called for by the intended application, enhance certain image features to accentuate recognition of objects of interest, or compress the number of pixels representing the image so as to reduce data storage (number of bits), as well as other related actions.

Figure 1-1 After an image is formed by a sensor, image processing tools are applied for many purposes, including changing its scale and orientation, improving its information content, or reducing its digital size. (Block diagram: sensor → image formation → raw image → image processing → improved image → image display, image storage/transmission, and image analysis.)
Table 1-1 Examples of image-processing applications.

• Medicine (radiological diagnoses, microscopy)
• Defense (radar, sonar, infrared, satellites, etc.)
• Robotics / machine vision (e.g., "intelligent" vehicles)
• Human-computer interfaces (face/fingerprint "recognition" for security, character recognition)
• Compression for storage, transmission from space probes, etc.
• Entertainment industry
• Manufacturing (e.g., part inspection)

◮ This book covers the mathematical bases and computational techniques used to realize these image-processing transformations. ◭

To set the stage for the material covered in future chapters, this introductory chapter offers the reader overviews of the image formation processes associated with several different types of sensors. Each of Sections 1-1 through 1-5 sketches the fundamental physical principles and the terminology commonly used in connection with that particular imaging sensor. The chapter concludes with Section 1-6, which provides visual demonstrations of the various image operations that the image processing techniques covered in future chapters can accomplish.

1-1 Optical Imagers

Even though the prime objective of this book is to examine the various image processing techniques commonly applied to a raw image (Fig. 1-1) to transform it into an improved image of specific utility, it will prove helpful to the reader to have a fundamental understanding of the image formation process that led to the raw image in the first place. We consider five types of imaging sensors in this introductory chapter, of which four are electromagnetic (EM), and the fifth is acoustic. Figure 1-2 depicts the EM spectrum, extending from the gamma-ray region to the radio region. Optical imagers encompass imaging systems that operate in the visible, ultraviolet, and infrared segments of the EM spectrum. In the present section we feature digital cameras, which record reflected energy in the visible part of the spectrum, and infrared imagers, which sense thermal radiation self-emitted by the observed scene.

Figure 1-2 Electromagnetic spectrum, from the gamma-ray region (wavelengths near 10⁻¹² m) to the radio region (wavelengths near 10³ m), indicating the bands used by X-ray imagers, optical imagers, radar, and MRI.

1-1.1 Digital Cameras

In June of 2000, Samsung introduced the first mobile phone with a built-in digital camera. Since then, cameras have become integral to most mobile phones, computer tablets, and laptop computers. And even though cameras may vary widely in terms of their capabilities, they all share the same imaging process. As
an optical imager, the camera records the spatial distribution of visible light reflected by a scene due to illumination by the sun or an artificial light source. In the simplified diagram shown in Fig. 1-3, the converging lens of the camera serves to focus the light reflected by the apple to form a sharp image in the image plane of the camera. To "focus" the image, it is necessary to adjust the location of the lens so as to satisfy the lens law

$$\frac{1}{d_o} + \frac{1}{d_i} = \frac{1}{f} \qquad \text{(lens law)}, \tag{1.1}$$

where do and di are the distances between the lens and the object and image planes, respectively, and f is the focal length of the lens.

Figure 1-3 Camera imaging system.
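The lens law is easy to experiment with numerically. The following MATLAB sketch is illustrative only (it is not one of the book's website programs, and the parameter values are assumed); it solves Eq. (1.1) for the image distance di given an object distance do and focal length f:

    % Lens law (Eq. 1.1): solve 1/do + 1/di = 1/f for the image distance di.
    f  = 50e-3;            % focal length (m); illustrative value
    do = 10;               % object distance (m); illustrative value
    di = 1/(1/f - 1/do);   % image distance (m)
    fprintf('di = %.4f mm\n', di*1e3);   % about 50.25 mm for this do and f

Note that as do grows large, di approaches f, which is why distant scenes are in focus with the lens near its focal length from the detector.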
In a traditional analog camera, the image is captured by a film containing light-sensitive silver halide crystals. The crystals undergo a chemical change—and an associated darkening—in proportion to the amount of light absorbed by the crystals.

Modern cameras use arrays of charge-coupled devices (CCDs) or active pixel sensors (APSs), placed in the image plane, to capture the image and then transfer the intensity readings to a data storage device (Fig. 1-4). The CCD relies on charge transfer in response to incident photons, whereas an APS uses a photodetector and an amplifier. CCD arrays were the sensors of choice in the 1970–2000 era, but they have been replaced with APS arrays over the past 20 years (Moynihan, 2015) because APS arrays consume less power to operate and are less expensive to fabricate (but they are more susceptible to noise than CCDs).

Figure 1-4 An active pixel sensor uses a 2-D array of photodetectors, usually made of CMOS, to detect incident light in the red, green, and blue bands.

A photodetector uses CMOS (complementary metal-oxide semiconductor) technology to convert incident photons into an output voltage. Because both CCD and CMOS are sensitive to the entire visible spectrum, from about 0.4 µm to 0.7 µm, as well as part of the near-infrared (NIR) spectrum from 0.7 µm to
1 µm, it is necessary to use a filter to block the IR spectrum and to place red (R), green (G), or blue (B) filters over each pixel so as to separate the visible spectrum of the incident light into the three primary colors. Thus, the array elements depicted in Fig. 1-4 in red respond to red light, and a similar correspondence applies to those depicted in green and blue. Typical examples of color sensitivity spectra are shown in Fig. 1-5 for a Nikon camera.

Figure 1-5 Spectral sensitivity plots for photodetectors in the blue, green, and red bands. (Courtesy Nikon Corporation.)

Regardless of the specific detection mechanism (CCD or APS), the array output is transferred to a digital storage device with specific markers denoting the location of each element of the array and its color code (R, G, or B). Each array consists of three subarrays, one for red, another for green, and a third for blue. This information is then used to synchronize the output of the 2-D detector array with the 2-D pixel arrangement on an LCD (liquid crystal display) or other electronic displays.

A. Continuous and Discrete Images

By the time an image appears on an LCD screen, it will have undergone a minimum of three transformations, involving a minimum of three additional images. With λ denoting the light wavelength and using Fig. 1-6 as a guide, we define the following images:

Io(x′, y′; λ): continuous intensity brightness in the object plane, with (x′, y′) denoting the coordinates of the object plane.

Ii(x, y; λ): continuous intensity image in the image plane of the camera, with (x, y) denoting the coordinates of the image plane.

V[n1, m1] = {Vred[n1, m1], Vgreen[n1, m1], Vblue[n1, m1]} distribution: discrete 2-D array of the voltage outputs of the CCD or photodetector array.

B[n2, m2] = {Bred[n2, m2], Bgreen[n2, m2], Bblue[n2, m2]} distribution: discrete 2-D array of the brightness across the LCD array.

◮ Our notation uses parentheses ( ) with continuous-space signals and images, as in Io(x′, y′), and square brackets [ ] with discrete-space images, as in V[n, m]. ◭

The three associated transformations are:

(1) Optical Transformation: from Io(x′, y′; λ) to Ii(x, y; λ).

(2) Detection Transformation: from Ii(x, y; λ) to V[n1, m1].

(3) Display Transformation: from V[n1, m1] to B[n2, m2].

Indices [n1, m1] and [n2, m2] vary over certain ranges of discrete values, depending on the chosen notation. For a discrete image, the two common formats are:

(1) Centered Coordinate System: The central pixel of V[n, m] is at (n = 0, m = 0), as shown in Fig. 1-7(a), and the image extends to ±N for n and to ±M for m. The total image size is (2M + 1) × (2N + 1) pixels. Note that index n varies horizontally.

(2) Corner Coordinate System: In Fig. 1-7(b), indices n and m of V[n, m] start at 1 (rather than zero). Image size is M × N.
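To make the two conventions concrete, the short MATLAB fragment below converts centered indices (n, m) into the 1-based row/column indices that MATLAB matrices use. It is an illustrative sketch of ours, not a program from the book's website; in particular, the assumption that m increases upward (as in Fig. 1-7(a)) while matrix rows count downward is spelled out in the comments.

    % Map centered indices (n = -N..N, m = -M..M) to MATLAB row/column.
    N = 3; M = 2;                                  % half-widths: (2M+1) x (2N+1) image
    V = reshape(1:(2*M+1)*(2*N+1), 2*M+1, 2*N+1);  % dummy image (rows x columns)
    n = -1; m = 2;                                 % centered indices (n horizontal, m vertical)
    row = M + 1 - m;                               % assumed: m increases upward, rows downward
    col = N + 1 + n;
    value = V(row, col);                           % pixel V[n, m] in centered notation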
Figure 1-6 Io(x′, y′; λ) and Ii(x, y; λ) are continuous scene brightness and image intensities, whereas V[n1, m1] and B[n2, m2] are discrete images of the detected voltage and displayed brightness, respectively. (The figure traces the optical, detection, and display transformations from the object plane, through the red, green, and blue detector arrays, to the display array.)
Figure 1-7 (a) In the centered coordinate system, index m extends between −M and +M and index n varies between −N and +N, whereas (b) in the corner coordinate system, indices n and m start at 1 and conclude at N and M, respectively.

Image Notation

• Index n varies horizontally and index m varies vertically.
• Image notation is the same as matrix notation.
• Image size = # of rows × # of columns = M × N.

The detection and display images may or may not be of the same size. For example, if image compression is used to generate a "thumbnail," then far fewer pixels are used to represent the imaged object in the display image than in the detected image. Conversely, if the detected image is to be enlarged through interpolation, then more pixels are used to display the object than in the detected image.

B. Point Spread Function

Consider the scenario depicted in Fig. 1-8(a). An infinitesimally small source of monochromatic (single wavelength) light, denoted s, is located in the center of the object plane, and the lens location is adjusted to satisfy Eq. (1.1), thereby producing in the image plane the best-possible image of the source. We assume that the lens has no aberrations due to shape or material imperfections. We observe that even though the source is infinitesimal in spatial extent—essentially like a spatial impulse—its image is definitely not impulse-like. The image exhibits a circularly symmetric diffraction pattern caused by the phase interference of the various rays of light that had emanated from the source and traveled to the image plane through the lens. The pattern is called an Airy disc.

Figure 1-8(b) shows a 1-D plot of the image pattern in terms of the intensity Ii(θ) as a function of θ, where θ is the angular deviation from the central horizontal axis (Fig. 1-8(a)). The expression for Ii(θ) is

$$I_i(\theta) = I_o \left[ \frac{2 J_1(\gamma)}{\gamma} \right]^2, \tag{1.2}$$
where J1(γ) is the first-order Bessel function of the first kind, and

$$\gamma = \frac{\pi D}{\lambda} \sin\theta. \tag{1.3}$$

Here, λ is the wavelength of the light (assumed to be monochromatic for simplicity) and D is the diameter of the converging lens. The normalized form of Eq. (1.2) represents the impulse response h(θ) of the imaging system,

$$h(\theta) = \frac{I_i(\theta)}{I_o} = \left[ \frac{2 J_1(\gamma)}{\gamma} \right]^2. \tag{1.4}$$

For a 2-D image, the impulse response is called the point spread function (PSF).

Detector arrays are arranged in rectangular grids. For a pixel at (x, y) in the image plane (Fig. 1-9),

$$\sin\theta = \frac{\sqrt{x^2 + y^2}}{\sqrt{x^2 + y^2 + d_i^2}}, \tag{1.5}$$

and Eq. (1.4) can be rewritten as

$$h(x, y) = \frac{I_i(x, y)}{I_o} = \left[ \frac{2 J_1(\gamma)}{\gamma} \right]^2, \tag{1.6}$$

with

$$\gamma = \frac{\pi D}{\lambda}\, \frac{\sqrt{x^2 + y^2}}{\sqrt{x^2 + y^2 + d_i^2}}. \tag{1.7}$$

The expressions given by Eqs. (1.2) through (1.7) pertain to coherent monochromatic light. Unless the light source is a laser, the light source usually is panchromatic, in which case the diffraction pattern that would be detected by each of the three-color detector arrays becomes averaged over the wavelength range of that array. The resultant diffraction pattern maintains the general shape of the pattern in Fig. 1-8(b), but it exhibits a gentler variation with θ (with no distinct minima). Here, h(x, y) denotes the PSF in rectangular coordinates relative to the center of the image plane.

Figure 1-8 For an aberration-free lens, the image of a point source is a diffraction pattern called an Airy disc. (Part (b) plots the 1-D profile of the imaged response versus γ = (πD/λ) sinθ.)

Figure 1-9 Relating angle θ to pixel at (x, y) in image plane. Along the y axis, sinθ = y/√(y² + di²); for an image pixel at (x, y), sinθ = √(x² + y²)/√(x² + y² + di²).
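Equations (1.2)–(1.4) can be evaluated directly with MATLAB's built-in Bessel function besselj. The sketch below is illustrative (the wavelength, lens diameter, and angular range are assumed values) and plots the normalized Airy pattern h(θ):

    % Normalized Airy PSF h(theta) of Eqs. (1.2)-(1.4).
    lambda = 0.5e-6;                          % wavelength (m); illustrative
    D      = 25e-3;                           % lens diameter (m); illustrative
    theta  = linspace(-1e-4, 1e-4, 1001);     % angle off the optical axis (rad)
    gam    = pi*D/lambda .* sin(theta);       % Eq. (1.3)
    h      = (2*besselj(1, gam) ./ gam).^2;   % Eq. (1.4); 0/0 at theta = 0
    h(gam == 0) = 1;                          % limit of [2*J1(g)/g]^2 as g -> 0
    plot(theta, h), xlabel('\theta (rad)'), ylabel('h(\theta)')

The first null of the plotted pattern occurs at γ = 3.832, the value used below in defining the Rayleigh resolution criterion.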
◮ The implication of this PSF is that when the optical system is used to image a scene, the image it forms in the image plane is the result of a 2-D convolution (as defined later in Section 3-3) of the brightness distribution of the scene in the object plane, Io(x, y), with the PSF given by Eq. (1.7):

$$I_i(x, y) = I_o(x, y) \ast\ast\, h(x, y). \tag{1.8}$$

The convolution effect is then embedded in the discrete 2-D detected image as well as in all subsequent manifestations. ◭

C. Spatial Resolution

Each of the two coherent, monochromatic sources shown in Fig. 1-10 produces a diffraction pattern. If the two sources are sufficiently far apart so that their patterns are essentially distinct, then we should be able to distinguish them from one another. But as we bring them closer together, their diffraction patterns in the image plane start to overlap, making it more difficult to discern their images as those of two distinct sources.

One definition of the spatial resolution capability of the imaging system along the y′ direction is the separation ∆y′min between the two point sources (Fig. 1-10) such that the peak of the diffraction pattern of one of them occurs at the location of the first null of the diffraction pattern of the other one, and vice versa. Along the y direction in the image plane, the first null occurs when [2J1(γ)/γ]² = 0 or, equivalently, γ = 3.832. Use of the geometry in Fig. 1-10 with sinθ ≈ θ leads to

$$\Delta\theta_{\min} \approx \frac{\Delta y_{\min}}{d_i} \approx 1.22\, \frac{\lambda}{D} \qquad \text{(angular resolution)}. \tag{1.9a}$$

The angular width ∆θmin is the angular resolution of the imaging system and ∆ymin is the image spatial resolution. On the object side of the lens, the scene spatial resolution is

$$\Delta y'_{\min} = d_o\, \Delta\theta_{\min} = 1.22\, d_o\, \frac{\lambda}{D} \qquad \text{(scene spatial resolution)}. \tag{1.9b}$$

This is known as the Rayleigh resolution criterion. Because the lens diameter D is in the denominator, using a larger lens improves spatial resolution. Thus telescopes are made with very large lenses and/or mirrors.

These expressions apply to the y and y′ directions at wavelength λ. Since the three-color detector arrays operate over different wavelength ranges, the associated angular and spatial resolutions are the smallest at λblue ≈ 0.48 µm and the largest at λred ≈ 0.6 µm (Fig. 1-5). Expressions with identical form apply along the x and x′ directions (i.e., upon replacing y and y′ with x and x′, respectively).

Figure 1-10 The separation between s1 and s2 is such that the peak of the diffraction pattern due to s1 is coincident with the first null of the diffraction pattern of s2, and vice versa.
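The convolutional model of Eq. (1.8) and the Rayleigh criterion can be illustrated together: sample the PSF of Eqs. (1.6)–(1.7) on a pixel grid, then convolve it with a scene containing two impulses separated by ∆ymin. The MATLAB sketch below is ours and purely illustrative (the grid size, pixel pitch, and optical parameters are all assumed values):

    % Eq. (1.8) in discrete form: image = scene ** PSF (2-D convolution).
    lambda = 0.5e-6;  D = 25e-3;  di = 50e-3;     % illustrative optics
    dymin  = 1.22*lambda*di/D;                    % image-plane resolution (m)
    dp     = dymin/8;                             % pixel pitch well below dymin
    [x, y] = meshgrid((-64:64)*dp);               % image-plane grid (m)
    gam = (pi*D/lambda)*sqrt(x.^2+y.^2)./sqrt(x.^2+y.^2+di^2);   % Eq. (1.7)
    h   = (2*besselj(1, gam)./gam).^2;  h(gam == 0) = 1;         % Eq. (1.6)
    Io  = zeros(size(h));                         % scene: two point sources
    Io(65, 65-4) = 1;  Io(65, 65+4) = 1;          % 8 pixels apart = dymin
    Ii  = conv2(Io, h, 'same');                   % Eq. (1.8)
    imagesc(Ii), axis image, title('Two sources at the Rayleigh limit')

Displaying Ii shows the two main lobes just barely separated; moving the impulses closer than 8 pixels merges them into a single blob.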
D. Detector Resolution

The inherent spatial resolution in the image plane is ∆ymin = di λ/D, but the detector array used to record the image has its own detector resolution ∆p, which is the pixel size of the active pixel sensor. For a black and white imaging camera, to fully capture the image details made possible by the imaging system, the pixel size ∆p should be, at most, equal to ∆ymin. In a color camera, however, the detector pixels of an individual color are not adjacent to one another (see Fig. 1-4), so ∆p should be several times smaller than ∆ymin.

1-1.2 Thermal IR Imagers

Density slicing is a technique used to convert a parameter of interest from amplitude to pseudocolor so as to enhance the visual display of that parameter. An example is shown in Fig. 1-11, wherein color represents the infrared (IR) temperature of a hot-air balloon measured by a thermal infrared imager. The vertical scale on the right-hand side provides the color-
temperature conversion. Unlike the traditional camera—which measures the reflectance of the observed scene—a thermal IR imager measures the emission by the scene, without an external source of illumination. IR imagers are used in many applications including night vision, surveillance, fire detection, and thermal insulation in building construction.

Figure 1-11 IR image of a hot-air balloon (courtesy of Ing.-Büro für Thermografie). The spatial pattern is consistent with the fact that warm air rises.

A. IR Spectrum

The wavelength range of the infrared spectrum extends from the end of the red part of the visible spectrum at about 0.76 µm to the edge of the millimeter-wave band at 1000 µm (or, equivalently, λ = 1 mm). For historical reasons, the IR band has been subdivided into multiple subbands, but these subbands do not have a standard nomenclature, nor standard definitions for their wavelength extents. The prevailing practice assigns the following names and wavelength ranges (Fig. 1-12):

(a) The near IR (NIR) extends from λ = 0.76 µm to λ = 2 µm.

(b) The middle-wave IR (MWIR) extends from λ = 2 µm to λ = 4 µm.

(c) The long-wave IR (LWIR) extends from λ = 4 µm to λ = 1000 µm.

Figure 1-12 Infrared subbands within the EM spectrum.

Most sensors operating in the NIR subband are similar to visible-light cameras in that they record light reflectance, but only in the 0.76–2 µm range, whereas IR sensors operating at the longer wavelengths rely on measuring energy self-emitted by the observed object, which depends, in part, on the temperature of the object. Hence, such IR sensors are called thermal imagers.

The basis for the self-emission is the blackbody radiation law, which states that all material objects radiate EM energy, and the spectrum of the radiated energy depends on the physical temperature of the object, its material composition, and its surface properties. A blackbody is a perfect emitter and perfect absorber, and its radiation spectrum is governed by Planck's law. Figure 1-13 displays plots of spectral emittance for the sun (at an effective radiating temperature of 5800 K) and a terrestrial blackbody at 300 K (27 °C).

Figure 1-13 The peak of the blackbody radiation spectrum of the sun is in the visible part of the EM spectrum, whereas the peak for a terrestrial object is in the IR (at ≈ 10 µm).

We observe from Fig. 1-13 that the peak of the terrestrial blackbody is at approximately
10 µm, which is towards the short-wavelength end of the LWIR subband. This means that the wavelength range around 10 µm is particularly well suited for measuring radiation self-emitted by objects at temperatures in the range commonly encountered on Earth. The amount of energy emitted at any specific wavelength depends not only on the temperature of the object, but also on its material properties. The emissivity of an object is defined as the ratio of the amount of energy radiated by that object to the amount of energy that would have been radiated by the object had it been an ideal blackbody at the same physical temperature. By way of an example, Fig. 1-14 displays spectral plots of the emissivity for four types of terrain: an ocean surface, a desert surface, a surface covered with snow or ice, and a vegetation-covered surface.

Figure 1-14 Emissivity spectra for four types of terrain: ocean, vegetation, desert, and snow/ice. (Courtesy the National Academy Press.)
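Readers who wish to reproduce curves like those in Fig. 1-13 can plot Planck's law for the spectral exitance of a blackbody in a few lines of MATLAB. The sketch below is ours (not from the book's website); it uses standard SI physical constants, and the factor of 10⁻⁶ converts from per-meter to the per-micrometer units of the figure:

    % Blackbody spectral exitance, M = 2*pi*h*c^2 ./ (lam.^5 .* (exp(h*c./(lam*k*T))-1)).
    h = 6.626e-34; c = 3e8; k = 1.381e-23;       % Planck, light speed, Boltzmann (SI)
    lam = logspace(-7, -4, 400);                 % wavelength, 0.1 to 100 um (m)
    M = @(T) 2*pi*h*c^2 ./ (lam.^5 .* (exp(h*c./(lam*k*T)) - 1));
    loglog(lam*1e6, M(5800)*1e-6, lam*1e6, M(300)*1e-6)   % sun vs. 300 K object
    xlabel('Wavelength (\mum)'), ylabel('Spectral emittance (W m^{-2} \mum^{-1})')

As a sanity check, Wien's displacement law (peak near 2898/T µm) places the 300 K peak near 9.7 µm, consistent with the ≈ 10 µm peak cited above.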

B. Imaging System

The basic configuration of a thermal IR imaging system (Fig. 1-15) is similar to that of a visible-light camera, but the lenses and detectors are designed to operate over the intended IR wavelength range of the system. Two types of detectors are used, namely uncooled detectors and cooled detectors. By cooling a semiconductor detector to very low temperatures, typically in the 50–100 K range, its self-generated thermal noise is reduced considerably, thereby improving the signal-to-noise ratio of the detected IR signal emitted by the observed scene. Cooled detectors exhibit superior sensitivity in comparison with uncooled detectors, but the cooling arrangement requires the availability and use of a cryogenic agent, such as liquid nitrogen, as well as placing the detectors in a vacuum-sealed container. Consequently, cooled IR imagers are significantly more expensive to construct and operate than uncooled imagers.

Figure 1-15 Thermal IR imaging systems often use cryogenic cooling to improve detection sensitivity.

Figure 1-16 Comparison of black-and-white visible-light photography with an IR thermal image of the same scene.

We close this section with two image examples. Figure 1-16 compares the image of a scene recorded by a visible-light black-and-white camera with a thermal IR image of the same scene. The IR image is in pseudocolor, with red representing high IR emission and blue representing (comparatively) low IR emission. The two images convey different types of information, but they also have significantly different spatial resolutions. Today, digital cameras with 16 megapixel detector arrays are readily available and fairly inexpensive. In contrast, most standard
detector arrays of thermal IR imagers are under 1 megapixel in size. Consequently, IR images appear "blurry" when compared with their photographic counterparts.

Our second image, shown in Fig. 1-17, is an IR thermal image of a person's head and neck. Such images are finding increased use in medical diagnostics, particularly for organs close to the surface [Ring and Ammer, 2012].

Figure 1-17 Thermal IR image of a person's head and neck.

Concept Question 1-1: What is a camera's point spread function? What role does it play in the image formation process?

Concept Question 1-2: How are the image and scene spatial resolutions related to one another?

Concept Question 1-3: What is the emissivity of an object?

Concept Question 1-4: Why is an IR imager called a thermal imager?

Exercise 1-1: An imaging lens used in a digital camera has a diameter of 25 mm and a focal length of 50 mm. Considering only the photodetectors responsive to the red band centered at λ = 0.6 µm, what is the camera's spatial resolution in the image plane, given that the image distance from the lens is di = 50.25 mm? What is the corresponding resolution in the object plane?

Answer: ∆ymin = 1.47 µm; ∆y′min = 0.3 mm.

Exercise 1-2: At λ = 10 µm, what is the ratio of the emissivity of a snow-covered surface relative to that of a sand-covered surface? (See Fig. 1-14.)

Answer: esnow/esand ≈ 0.985/0.9 = 1.09.
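Exercise 1-1 can be checked numerically by combining the lens law of Eq. (1.1) with the Rayleigh expressions of Eq. (1.9). A minimal MATLAB sketch of ours, for illustration:

    % Numerical check of Exercise 1-1: lambda = 0.6 um, D = 25 mm, f = 50 mm,
    % di = 50.25 mm.
    lambda = 0.6e-6;  D = 25e-3;  f = 50e-3;  di = 50.25e-3;
    do     = 1/(1/f - 1/di);           % lens law, Eq. (1.1): about 10.05 m
    dymin  = 1.22*lambda*di/D;         % image-plane resolution: about 1.47 um
    dypmin = 1.22*lambda*do/D;         % object-plane resolution: about 0.3 mm
    fprintf('do = %.2f m, dymin = %.2f um, dy''min = %.2f mm\n', ...
            do, dymin*1e6, dypmin*1e3);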

1-2 Radar Imagers

Conceptually, a radar can generate an image of the reflectivity of a scene by scanning its antenna beam across the scene in a raster-like format, as depicted in Fig. 1-18. Even though the imaging process is very different from the process used by a lens in a camera, the radar and the camera share the same fundamental relationship for angular resolution. In Section 1-1.1, we stated in Eq. (1.9a) that the angular resolution of a converging lens is approximately ∆θmin = 1.22 λ/D, and the corresponding spatial resolution is

$$\Delta y'_{\min} = d_o\, \Delta\theta_{\min} = 1.22\, d_o\, \frac{\lambda}{D} \qquad \text{(camera)}. \tag{1.10a}$$

Here, λ is the wavelength of the light and D is the diameter of the lens.

Equation (1.10a) is approximately applicable to a microwave radar with a dish antenna of diameter D (in the camera case, the scene illumination is external to the camera, so the lens gets involved in only the receiving process, whereas in the radar case the antenna is involved in both the transmitting and receiving processes). In the radar literature, the symbol usually used to denote the range between the radar antenna and the target is the symbol R. Hence, upon replacing do with R, we have

$$\Delta y'_{\min} \approx R\, \frac{\lambda}{D} \qquad \text{(radar)}. \tag{1.10b}$$

It is important to note that λ of visible light is much shorter than λ in the microwave region. In the middle of the visible spectrum, λvis ≈ 0.5 µm, whereas at a typical microwave radar
frequency of 6 GHz, λmic ≈ 5 cm. The ratio is

$$\frac{\lambda_{\text{mic}}}{\lambda_{\text{vis}}} = \frac{5 \times 10^{-2}}{0.5 \times 10^{-6}} = 10^5 !$$

This means that the angular resolution capability of an optical system is on the order of 100,000 times better than the angular resolution of a radar, if the lens diameter is the same size as the antenna diameter.

To fully compensate for the large wavelength ratio, a radar antenna would need a diameter on the order of 1 km to produce an image with the same resolution as a camera with a lens 1 cm in diameter. Clearly, that is totally impractical. In practice, most radar antennas are on the order of centimeters to meters in size, but certainly not kilometers. Yet, radar can image the Earth surface from satellite altitudes with spatial resolutions on the order of 1 m—equivalent to antenna sizes several kilometers in extent! How is that possible?

Figure 1-18 Radar imaging of a scene by raster scanning the antenna beam.

A. Synthetic-Aperture Radar

As we will see shortly, a synthetic-aperture radar (SAR) uses a synthesized aperture to achieve good resolution in one dimension and transmits very short pulses to achieve fine resolution in the orthogonal dimension. The predecessor to SAR is the real-aperture side-looking airborne radar (SLAR). A SLAR uses a rectangular- or cylindrical-shaped antenna that gets mounted along the longitudinal direction of an airplane, and pointed partially to the side (Fig. 1-19).

Even though the antenna beam in the elevation direction is very wide, fine discrimination can be realized along the x direction in Fig. 1-19 by transmitting a sequence of very short pulses. At any instant in time, the extent of the pulse along x is

$$\Delta x'_{\min} = \frac{c\tau}{2\sin\theta} \qquad \text{(scene range resolution)}, \tag{1.11}$$

where c is the velocity of light, τ is the pulse width, and θ is the incidence angle relative to nadir-looking. This represents the scene spatial resolution capability along the x′ direction. At a typical angle of θ = 45°, the spatial resolution attainable when transmitting pulses each 5 ns in width is

$$\Delta x'_{\min} = \frac{3 \times 10^8 \times 5 \times 10^{-9}}{2 \sin 45^\circ} \approx 1.05 \text{ m}.$$
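In MATLAB, the range-resolution computation of Eq. (1.11) is a one-liner; the illustrative sketch below reproduces the example above:

    % Scene range resolution of Eq. (1.11) for a short-pulse radar.
    c     = 3e8;                    % speed of light (m/s)
    tau   = 5e-9;                   % pulse width (s), as in the example
    theta = 45*pi/180;              % incidence angle (rad)
    dxmin = c*tau/(2*sin(theta));   % approximately 1 m, matching the text

Halving the pulse width halves ∆x′min, which is why fine range resolution calls for very short (or, in practice, pulse-compressed) transmissions.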
Figure 1-19 Real-aperture SLAR imaging technique. The antenna is mounted along the belly of the aircraft. (The airborne transmitter-receiver sweeps a short pulse across the terrain; scene features such as trees, a bank edge, water, a truck, scrub growth, and radar shadow appear in the A-scope display of video amplitude versus time (range), and the echoes are stored in a digital recorder.)

Not only is this an excellent spatial resolution along the x′ direction, but it is also independent of range R (distance between the radar and the surface), which means it is equally applicable to a satellite-borne radar.

As the aircraft flies along the y direction, the radar beam sweeps across the terrain, while constantly transmitting pulses, receiving their echoes, and recording them on an appropriate medium. The sequential echoes are then stitched together to form an image.

By designing the antenna to be as long as practicable along the airplane velocity direction, the antenna pattern exhibits a relatively narrow beam along that direction (y′ direction in Fig. 1-19). The shape of the beam of the cylindrical antenna is illustrated in Fig. 1-20. From range R, the extent of the beam along the y direction is

$$\Delta y'_{\min} \approx \frac{\lambda}{l_y}\, R = \frac{\lambda h}{l_y \cos\theta} \qquad \text{(real-aperture azimuth resolution)}, \tag{1.12}$$

where h is the aircraft altitude. This is the spatial resolution capability of the radar along the flight direction. For a 3 m long antenna operating at λ = 3 cm from an altitude of 1 km, the
resolution ∆y′min at θ = 45° is

$$\Delta y'_{\min} \approx \frac{3 \times 10^{-2}}{3 \cos 45^\circ} \times 10^3 \approx 14 \text{ m}.$$

Ideally, an imaging system should have similar resolution capabilities along both directions of the imaged scene. In the present case, ∆x′min ≈ 1.05 m, which is highly desirable, but ∆y′min ≈ 14 m, which for most imaging applications is not so desirable, particularly if the altitude h is much higher than 1 km. Furthermore, since ∆y′min is directly proportional to the altitude h of the flying vehicle, whereas ∆x′min is independent of h, the disparity between ∆x′min and ∆y′min will get even greater when we consider radars flown at satellite altitudes.

To improve the resolution ∆y′min and simultaneously remove its dependence on the range R, we artificially create an array of antennas as depicted in Fig. 1-21. In the example shown in Fig. 1-21, the real satellite-borne radar antenna is 2 m long and the synthetic aperture is 8 km long! The latter consists of pulse returns recorded as the real antenna travels over a distance of 8 km, and then processed later as if they had been received by an 8 km long array of antennas, each 2 m long, simultaneously. The net result of the processing is an image with a resolution along the y direction given by

$$\Delta y'_{\min} = \frac{l_y}{2} \qquad \text{(SAR azimuth resolution)}, \tag{1.13}$$

where ly is the length of the real antenna. For the present example, ly = 2 m and ∆y′min = 1 m, which is approximately the same as ∆x′min. Shortening the antenna length would improve the azimuth resolution, but considerations of signal-to-noise ratio would require the transmission of higher power levels.

Figure 1-20 Radiation pattern of a cylindrical reflector. (Beamwidths βxz ≈ λ/lx and βyz ≈ λ/ly produce a fan beam.)

Figure 1-21 An illustration of how synthetic aperture works. (Along-track resolution: real aperture ∆y′min = λR/ly; synthetic aperture ∆y′min = ly/2. Example: a λ = 4 cm spacecraft radar at R = 400 km with a real aperture ly = 2 m has an 8 km real-aperture resolution; processing the resulting 8 km synthetic aperture yields a 1 m resolution.)
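The numbers in Fig. 1-21 follow directly from Eqs. (1.12) and (1.13), as the following small illustrative MATLAB sketch shows:

    % Real-aperture vs. synthetic-aperture along-track resolution
    % (Eqs. (1.12) and (1.13)) for the spaceborne example of Fig. 1-21.
    lambda = 4e-2;  R = 400e3;  ly = 2;   % wavelength (m), range (m), antenna (m)
    dy_real = lambda*R/ly;                % real aperture: 8000 m
    L_syn   = dy_real;                    % synthetic aperture = beam footprint, 8 km
    dy_sar  = ly/2;                       % SAR azimuth resolution: 1 m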
B. Point Spread Function

The first of our two SAR-image examples displays a large part of Washington, D.C. (Fig. 1-22). The location information of a particular pixel in the observed scene is computed, in part, from the round-trip travel time of the transmitted pulse. Consequently, a target that obscures the ground beneath it, such as the Washington Monument in Fig. 1-22, ends up generating a radar shadow because no signal is received from the obscured area. The radar shadow of the obelisk of the Washington Monument appears as a dark line projected from the top onto the ground surface.

Radar shadow is also apparent in the SAR image of the plane and helicopter of Fig. 1-23.

In Section 1-1.1, we stated that the image formed by the lens of an optical camera represents the convolution of the reflectivity of the scene (or the emission distribution in the case of the IR imager) with the point spread function (PSF) of the imaging system. The concept applies equally well to the imaging radar case. For an x–y SAR image with x denoting the side-looking direction and y denoting the flight direction, the SAR PSF is
Figure 1-22 SAR image collected over Washington, D.C. Right of center is the Washington Monument, though only the shadow of the obelisk is readily apparent in the image. (Landmarks visible include the NRL, the Capitol, the Pentagon, and the White House.) [Courtesy of Sandia National Laboratories.]
given by

$$h(x, y) = h_x(x)\, h_y(y), \tag{1.14}$$

with hx(x) describing the shape of the transmitted pulse and hy(y) describing the shape of the synthetic antenna-array pattern. Typically, the pulse shape is like a Gaussian:

$$h_x(x) = e^{-2.77\,(x/\tau)^2}, \tag{1.15a}$$

where τ is the effective width of the pulse (width between half-peak points). The synthetic array pattern is sinc-like in shape, but the sidelobes may be suppressed further by assigning different weights to the processed pulses. For the equally weighted case,

$$h_y(y) = \text{sinc}^2\!\left(\frac{1.8\, y}{l}\right), \tag{1.15b}$$

where l is the length of the real antenna, and the sinc function is defined such that sinc(z) = sin(πz)/(πz) for any variable z.
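Because the PSF of Eq. (1.14) is separable, it can be synthesized as an outer product of its two 1-D factors. The MATLAB sketch below is illustrative (τ and l are assumed values, and the sinc is implemented directly as sin(πz)/(πz) to avoid any toolbox dependence):

    % Separable SAR PSF of Eqs. (1.14)-(1.15): Gaussian in range (x),
    % sinc^2 in azimuth (y).
    tau = 1;  l = 2;                             % illustrative pulse width, antenna length
    x = linspace(-3, 3, 301);  y = linspace(-3, 3, 301);
    hx = exp(-2.77*(x/tau).^2);                  % Eq. (1.15a)
    z  = 1.8*y/l;
    hy = (sin(pi*z)./(pi*z)).^2;  hy(z == 0) = 1;   % Eq. (1.15b); limit at y = 0
    h  = hx' * hy;                               % h(x,y) = hx(x) hy(y), Eq. (1.14)
    mesh(y, x, h)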
Concept Question 1-5: Why is a SAR called a "synthetic"-aperture radar?

Concept Question 1-6: What system parameters determine the PSF of a SAR?

Exercise 1-3: With reference to the diagram in Fig. 1-21, suppose the length of the real aperture were to be increased from 2 m to 8 m. What would happen to (a) the antenna beamwidth, (b) the length of the synthetic aperture, and (c) the SAR azimuth resolution?

Answer: (a) Beamwidth is reduced by a factor of 4, (b) synthetic aperture length is reduced from 8 km to 2 km, and (c) SAR resolution changes from 1 m to 4 m.

Figure 1-23 High-resolution image of an airport runway with a plane and helicopter. [Courtesy of Sandia National Laboratories.]

1-3 X-Ray Computed Tomography (CT)

Computed tomography, also known as CT scan, is a technique capable of generating 3-D images of the X-ray attenuation (absorption) properties of an object, such as the human body. The X-ray absorption coefficient of a material is strongly dependent on the density of that material. CT has the sensitivity necessary to image body parts across a wide range of densities, from soft tissue to blood vessels and bones.

As depicted in Fig. 1-24(a), a CT scanner uses an X-ray source, with a narrow slit to generate a fan-beam, wide enough to encompass the extent of the body, but only about 1 mm thick. The attenuated X-ray beam is captured by an array of ∼900 detectors. The X-ray source and the detector array are mounted on a circular frame that rotates in steps of a fraction of a degree over a full 360° circle around the object or patient, each time recording an X-ray attenuation profile from a different angular direction. Typically, on the order of 1000 such profiles are recorded, each composed of measurements by 900 detectors. For each horizontal slice of the body, the process is completed in less than 1 second. CT uses image reconstruction algorithms to generate a 2-D image of the absorption coefficient of that horizontal slice. To image an entire part of the body, such as the chest or head, the process is repeated over multiple slices (layers).

For each anatomical slice, the CT scanner generates on the order of 9 × 10⁵ measurements (1000 angular orientations × 900 detectors). In terms of the coordinate system shown in Fig. 1-24(b), we define α(ξ, η) as the absorption coefficient of the object under test at location (ξ, η). The X-ray beam is directed along the ξ direction at η = η0. The X-ray intensity received by the detector located at ξ = ξ0 and η = η0 is given by

$$I(\xi_0, \eta_0) = I_0 \exp\left( -\int_0^{\xi_0} \alpha(\xi, \eta_0)\, d\xi \right), \tag{1.16}$$
where I0 is the X-ray intensity radiated by the source. Outside the body, α(ξ, η) = 0. The corresponding logarithmic path attenuation p(ξ0, η0) is defined as

$$p(\xi_0, \eta_0) = -\log \frac{I(\xi_0, \eta_0)}{I_0} = \int_0^{\xi_0} \alpha(\xi, \eta_0)\, d\xi. \tag{1.17}$$

The path attenuation p(ξ0, η0) is the integrated absorption coefficient across the X-ray path.

In the general case, the path traversed by the X-ray source is at a range r and angle θ in a polar coordinate system, as depicted in Fig. 1-24(c). The direction of the path is orthogonal to the direction of r. For a path corresponding to a specific set (r, θ), Eq. (1.17) becomes

$$p(r, \theta) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \alpha(\xi, \eta)\, \delta(r - \xi\cos\theta - \eta\sin\theta)\, d\xi\, d\eta, \tag{1.18}$$

where the Dirac impulse δ(r − ξ cosθ − η sinθ) dictates that only those points in the (ξ, η) plane that fall along the path specified by fixed values of (r, θ) are included in the integration. The relation between p(r, θ) and α(ξ, η) is known as the 2-D Radon transform of α(ξ, η). The goal of CT is to reconstruct α(ξ, η) from the measured path attenuations p(r, θ), by inverting the Radon transform given by Eq. (1.18), which is accomplished with the help of the Fourier transform.

Figure 1-24 (a) CT scanner, (b) X-ray path along ξ at η = η0, and (c) X-ray path at radius r and orientation θ.

Concept Question 1-7: What physical attribute of the imaged body is computed and displayed by a CT scanner?
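Readers with access to MATLAB's Image Processing Toolbox can experiment with the Radon transform of Eq. (1.18) directly: the toolbox functions radon and iradon compute forward projections and a filtered-backprojection inverse. The sketch below is illustrative (it uses the toolbox's built-in Shepp-Logan phantom as a convenient stand-in for α(ξ, η); the angle spacing and image size are assumed values):

    % Projections p(r, theta) of Eq. (1.18) and a filtered-backprojection
    % reconstruction (requires the Image Processing Toolbox).
    alpha  = phantom(256);                 % Shepp-Logan head phantom
    theta  = 0:0.5:179.5;                  % projection angles (degrees)
    p      = radon(alpha, theta);          % path attenuations p(r, theta)
    alphaR = iradon(p, theta, 256);        % reconstructed absorption map
    imshowpair(alpha, alphaR, 'montage')

Reducing the number of angles in theta visibly degrades the reconstruction, illustrating why CT scanners record on the order of 1000 profiles per slice.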
1-4 Magnetic Resonance Imaging

Since its early demonstration in the 1970s, magnetic resonance imaging (MRI) has become a highly valuable tool in diagnostic radiology, primarily because it can generate high-resolution anatomical images of the human body, without exposing the patient to ionizing radiation. Like X-ray CT scanners, magnetic resonance (MR) imagers can generate 3-D images of the body part of interest, from which 2-D slices can be extracted along any orientation of interest. The name MRI derives from the fact that the MRI scanner measures nuclear magnetic resonance (NMR) signals emitted by the body's tissues and blood vessels in response to excitation by a magnetic field introduced by a radio frequency (RF) system.

1-4.1 Basic System Configuration

The MRI system shown in Fig. 1-25 depicts a human body lying
inside a magnetic core. The magnetic field at a given location (x, y, z) within the core and at a given instant in time t may consist of up to three magnetic field contributions:

$$\mathbf{B} = \mathbf{B}_0 + \mathbf{B}_G + \mathbf{B}_{RF},$$

where B0 is a static field, BG is the field gradient, and BRF is the radio frequency (RF) excitation used to solicit a response from the biological material placed inside the core volume. Each of these three components plays a critical role in making MRI possible, so we will discuss them individually.

Figure 1-25 Basic diagram of an MRI system: a superconducting magnet generates the static field B0, gradient coils generate the field BG, and an RF coil excites the nuclei and "listens" to the response, with the transmitter, gradient amplifiers, receiver, and computer completing the system.

A. Static Field B0

Field B0 is a strong, static (non–time-varying) magnetic field created by a magnet designed to generate a uniform (constant) distribution throughout the magnetic core (Fig. 1-26). Usually, a superconducting magnet is used for this purpose because it can generate magnetic fields with much higher magnitudes than can be realized with resistive and permanent magnets. The direction of B0 is longitudinal (ẑ direction in Fig. 1-26) and its magnitude is typically on the order of 1.5 teslas (T). The conversion factor between teslas and gauss is 1 T = 10⁴ gauss. Earth's magnetic field is on the order of 0.5 gauss, so B0 inside the MRI core is on the order of 30,000 times that of Earth's magnetic field.

Figure 1-26 B0 is static and approximately uniform within the cavity. Inside the cavity, B0 ≈ 1.5 T (teslas), compared with only 0.1 to 0.5 milliteslas outside.

Biological tissue is composed of chemical compounds, and each compound is organized around the nuclei (protons) of the atoms comprising that compound. Some, but not all, nuclei become magnetized when exposed to a magnetic field. Among the substances found in a biological material, the hydrogen nucleus has a strong susceptibility to magnetization, and hydrogen is highly abundant in biological tissue. For these reasons, a typical MR image is related to the concentration of hydrogen nuclei.

Figure 1-27 Nuclei with spin magnetic number of ±1/2 precessing about B0 at the Larmor angular frequency ω0.

The strong magnetic field B0 causes the nuclei of the material inside the core space to temporarily magnetize and to spin (precess) like a top about the direction of B0. The precession orientation angle θ, shown in Fig. 1-27, is determined by the spin quantum number I of the spinning nucleus and the magnetic
quantum number mI. A material, such as hydrogen, with a spin system of I = 1/2 has magnetic quantum numbers mI = ±1/2. Hence, the nucleus may spin along two possible directions defined by cosθ = mI/√(I(I+1)), which yields θ = ±54°44′ [Liang and Lauterbur, 2000].

The associated angular frequency of the nuclear precession is called the Larmor frequency and is given by

$$\omega_0 = \gamma B_0 \qquad \text{(Larmor angular frequency)}, \tag{1.19a}$$

with ω0 in rad/s, B0 in teslas (T), and γ, the gyromagnetic ratio of the material, in (rad/s)/T. Alternatively, we can express the precession in terms of the frequency f0 = ω0/2π, in which case Eq. (1.19a) assumes the equivalent form

$$f_0 = \bar{\gamma} B_0 \qquad \text{(Larmor frequency)}, \tag{1.19b}$$

where γ̄ = γ/2π. This fundamental relationship between f0 and B0 is at the heart of what makes magnetic resonance imaging possible. Table 1-2 provides a list of nuclei of biological interest that have nonzero spin quantum numbers, along with their corresponding gyromagnetic ratios. For the hydrogen isotope ¹H, γ̄ = 42.575 MHz/T, so the Larmor frequency for hydrogen at B0 = 1.5 T is f0 = 63.8625 MHz, which places it in the RF part of the EM spectrum. Since the human body is made up primarily of water, the most commonly imaged nucleus is hydrogen.

Table 1-2 Gyromagnetic ratio γ̄ for biological nuclei.

Isotope   Spin I   % Abundance   γ̄ (MHz/T)
¹H        1/2      99.985        42.575
¹³C       1/2      1.108         10.71
¹⁴N       1        99.63         3.078
¹⁷O       5/2      0.037         5.77
¹⁹F       1/2      100           40.08
²³Na      3/2      100           11.27
³¹P       1/2      100           17.25

B. Gradient Field BG

The MRI system includes three current-activated gradient coils (Fig. 1-28) designed to generate magnetic fields pointed along the ẑ direction—the same as B0—but whose magnitudes exhibit linear spatial variations along the x̂, ŷ, and ẑ directions. That is why they are called gradient fields. The three coils can be activated singly or in combination. The primary purpose of the gradient magnetic field is localization (in addition to other information that can be extracted about the tissue material contained in the core volume during the activation and deactivation cycles of the gradient fields). Ideally, the gradient fields assume the following spatial variation inside the core volume:

$$\mathbf{B}_G = (G_x x + G_y y + G_z z)\,\hat{\mathbf{z}},$$

with the center of the (x, y, z) coordinate system placed at the center of the core volume. The gradient coefficients Gx, Gy, and Gz are on the order of 10 mT/m, and they are controlled individually by the three gradient coils.

Figure 1-28 Magnetic coils used to generate magnetic fields along three orthogonal directions. All three gradient fields BG point along ẑ, but their intensities vary linearly along x, y, and z.

Let us consider an example in which Gx = Gz = 0 and Gy = 10 mT/m, and let us assume that the vertical dimension of the core volume is 1 m. If B0 = 1.5 T, the combined field will vary from

$$B = B_0 + G_y y = \begin{cases} 1.495 \text{ T} & @\ y = -\tfrac{1}{2} \text{ m, to} \\ 1.505 \text{ T} & @\ y = \tfrac{1}{2} \text{ m}, \end{cases}$$

as depicted in Fig. 1-29. By Eq. (1.19b), the corresponding Larmor frequency for hydrogen will vary from 63.650 MHz for hydrogen nuclei residing in the plane at y = −0.5 m to 64.075 MHz for nuclei residing in the plane at y = +0.5 m. As we will explain shortly, when an RF signal at a particular frequency fRF is introduced inside the core volume, those nuclei whose Larmor frequency f0 is the same as fRF will resonate by absorbing part of the RF energy and then reemitting it at the same frequency (or slightly shifted in the case of certain chemical reactions). The strength of the emitted response is proportional to the density of nuclei. By varying the total magnetic field B linearly along
22 CHAPTER 1 IMAGING SENSORS

intended application [Liang and Lauterbur, 2000]. The magnetic


y field of the transmitted energy causes the exposed biological
tissue to resonate at its Larmor frequency. With the transmitter
Gradient field intensity

off, the receiver picks up the resonant signals emitted by the


biological tissue. The received signals are Fourier transformed
so as to establish a one-to-one correspondence to the locations
of the voxels responsible for the emission. For each voxel, the
strength of the associated emission is related to the density of 1 H
nuclei in that voxel as well as to other parameters that depend
on the tissue properties and pulse timing.
Image slice
Volume
1-4.2 Point Spread Function
Figure 1-29 Imposing a gradient field that varies linearly with Generating the MR image involves applying the discrete form
y allows stratification into thin slices, each characterized by its of the Fourier transform. Accordingly, the point spread function
own Larmor frequency. of the MR image is given by a discrete form of the sinc function,
namely [Liang and Lauterbur, 2000]:

sin(π N ∆k x)
the vertical direction, the total core volume can be discretized hx (x) = ∆k , (1.20)
sin(π ∆k x)
into horizontal layers called slices, each corresponding to a
different value of f0 (Fig. 1-29). This way, the RF signal can where x is one of the two MR image coordinates, k is a spatial
communicate with each slice separately by selecting the RF frequency, ∆k is the sampling interval in k space, and N is the
frequency to match f0 of that slice. In practice, instead of total number of Fourier samples. A similar expression applies
sending a sequence of RF signals at different frequencies, the RF to hy (y). The spatial resolution of the MR image is equal to the
transmitter sends out a short pulse whose frequency spectrum equivalent width of hx (x), which can be computed as follows:
covers the frequency range of interest for all the slices in the
Z 1/(1 ∆k)
volume, and then a Fourier transformation is applied to the 1 1
response from the biological tissue to separate the responses ∆xmin = hx (x) dx = . (1.21)
h(0) −1/(1 ∆k) N ∆k
from the individual slices.
The gradient magnetic field along the ŷ direction allows dis- The integration was performed over one period (1/∆k) of hx (x).
cretization of the volume into x–y slices. A similar process can According to Eq. (1.21), the image resolution is inversely
be applied to generate x–z and y–z slices, and the combination is proportional to the product N ∆k. The choices of values for N
used to divide the total volume into a three-dimensional matrix and ∆k are associated with signal-to-noise ratio and scan time
of voxels (volume pixels). The voxel size defines the spatial considerations.
resolution capability of the MRI system.
1-4.3 MRI-Derived Information
C. RF System
Generally speaking, MRI can provide three types of information
The combination of the strong static field B0 and the gradient about the imaged tissue:
field BG (whose amplitude is on the order of less than 1%
of B0 ) defines a specific Larmor frequency for the nuclei of (a) The magnetic characteristics of tissues, which are related
every isotope within each voxel. As we noted earlier through to biological attributes and blood vessel conditions.
Table 1-1, at B0 intensities in the 1 T range, the Larmor frequen-
cies of common isotopes are in the MHz range. The RF system (b) Blood flow, made possible through special time-dependent
consists of a transmitter and a receiver connected to separate gradient excitations.
coils, or the same coil can be used for both functions. The
transmitter generates a burst of narrow RF pulses. In practice, (c) Chemical properties discerned from measurements of
many different pulse configurations are used, depending on the small shifts in the Larmor frequency.
Figure 1-30 MR image.

An example of an MR image is shown in Fig. 1-30.

Concept Question 1-8: An MRI system uses three different types of magnetic fields. For what purpose?

Concept Question 1-9: What determines the Larmor frequency of a particular biological material?

1-5 Ultrasound Imager

Human hearing extends up to 20 kHz. Ultrasound is defined as sound at frequencies above that range. Ultrasound imaging systems, which operate in the 2 to 20 MHz range, have numerous industrial and medical applications, and the latter include both diagnosis and therapy. Fundamentally, ultrasound imagers are similar to radar imagers in that both sensors employ phase shifting (or, equivalently, time delaying) to focus and steer their beams at the desired distances and along the desired directions. Imaging radars use 1-D or 2-D arrays of antennas, and likewise, ultrasound imagers use 1-D or 2-D arrays of transducers. However, electromagnetic waves and sound waves have different propagation properties, so the focusing and steering techniques are not quite identical.

1-5.1 Ultrasound System Architecture

Ultrasound imagers use both 1-D and 2-D transducer arrays (with some as large as 2000 × 8000 elements and each on the order of 5 µm × 5 µm in size), but for the sake of simplicity, we show in Fig. 1-31 only a 1-D array with four elements. The system has a transmitting unit and a receiving unit, with the transducer array connected to the two units through a transmit/receive switch. Thus, the array serves both to launch acoustic waves in response to electrical excitation and to receive the consequent acoustic echoes and convert them back into electrical signals. The echoes are reflections from organs and tissue underneath the skin of the body part being imaged by the ultrasound imager (Fig. 1-32).

The transmitter unit in Fig. 1-31, often called the pulser, generates a high-voltage short-duration pulse (on the order of a few microseconds in duration) and sends it to the transmit beamforming unit, which applies individual time delays to the pulse before passing it on to the transducers through the transmit/receive switch. The choice of time delays determines the range at which the acoustic waves emitted by the four transducers interfere constructively, as well as the direction of that location relative to the axis of the array. The range operation is called focusing and the directional operation is called steering. Reciprocal operations are performed by the receive beamforming unit; it applies the necessary time delays to the individual signals made available by the transducers and then combines them coherently to generate the receive echo. The focusing and steering operations are the subject of the next subsection.

1-5.2 Beam Focusing and Steering

The focusing operation is illustrated by the diagram in Fig. 1-33 using eight transducers. In response to the electrical stimulations introduced by the beamforming unit, all of the transducers generate outward-going acoustic waves that are identical in every respect except for their phases (time delays). The specific distribution of the time delays shown in Fig. 1-33(a) causes the eight acoustic waves to interfere constructively at the point labeled Focus 1 at range Rf1. The time-delay distribution is symmetrical relative to the center of the array, so the direction of Focus 1 is broadside to the array axis. Changing the delay shifts between adjacent elements, while keeping the distribution symmetrical, as in Fig. 1-33(b), causes the focal point to move to Focus 2 at range Rf2. If no time delay is applied to any of the eight transducer signals, the focal point moves to infinity.

◮ The combined beam of the transducer array can be focused as a function of depth by varying the incremental time delay between adjacent elements in a symmetrical time-delay distribution. ◭
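The required delays follow from simple geometry: the wave from an element at position xn must travel a distance √(Rf² + xn²) to reach a broadside focal point at range Rf, so elements farther from the array center must fire earlier. A minimal MATLAB sketch of this calculation (our own illustration; the element pitch is an assumed value, not taken from the text):

v  = 1540;                         % acoustic wave speed in tissue (m/s)
d  = 0.5e-3;                       % element pitch (m), assumed for illustration
xn = ((0:7) - 3.5)*d;              % 8 element positions about the array center
Rf = 40e-3;                        % focal range (m)
tn = (sqrt(Rf^2 + xn.^2) - Rf)/v;  % extra travel time from each element (s)
tau = max(tn) - tn                 % applied delays: center element fires last

The resulting delay distribution is symmetrical about the array center, consistent with Fig. 1-33(a); reducing Rf increases the spread of the delays, which moves the focal point closer to the array.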
Figure 1-31 Block diagram of an ultrasound system with a 4-transducer array. (Blocks: transmitter, transmit beamforming unit (time-delay generator), T/R switch, transducers, receive beamforming unit (time-delay generator), data-acquisition unit, system processor, and display.)

Figure 1-32 Ultrasound imaging of the thyroid gland.

The image displayed in Fig. 1-34 is a simulation of acoustic energy across two dimensions, the lateral dimension parallel to the array axis and the axial dimension along the range direction. The array consists of 96 elements extending over a length of 20 mm, and the beam is focused at a range R = 40 mm. The delay-time distribution shown in Fig. 1-35 is symmetrical relative to the broadside direction. By shifting the axis of symmetry to another direction, the focal point moves to a new direction. This is called steering the beam; it is illustrated in Fig. 1-35 and simulated in part (b) of Fig. 1-34.

With a 2-D array of transducers, the steering can be realized along two orthogonal directions, so the combination of focusing and steering ends up concentrating the acoustic energy radiated by the transducer array into a small voxel within the body being imaged by the ultrasound probe. Similar operations are performed by the receive beamforming unit so as to focus and steer the beam of the transducer array to receive the echo from the same voxel.
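Steering can be sketched in the same way. For a focal point far from the array at angle θ from broadside, the delay progression across the elements is approximately linear, as suggested by Fig. 1-35(b). A minimal MATLAB illustration (ours; the pitch is again an assumed value):

v = 1540; d = 0.5e-3;          % wave speed (m/s) and element pitch (m), assumed
theta = 45*pi/180;             % steering angle from broadside (rad)
n = 0:7;                       % element indices
tau = n*d*sin(theta)/v         % linear delay progression across the array

Combining this linear shift with the symmetrical focusing delays of the previous sketch approximates the focused-and-steered case simulated in Fig. 1-34(b).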
Figure 1-33 Changing the inter-element time delay across a symmetrical time-delay distribution shifts the location of the focal point in the range direction: (a) focal point at Rf1 (Focus 1); (b) focal point at Rf2 (Focus 2), both along the broadside direction.

1-5.3 Spatial Resolution

For a 2-D transducer array of size (Lx × Ly) and focused at range Rf, as shown in Fig. 1-36 (side Ly is not shown in the figure), the size of the resolution voxel is given by an axial resolution ΔRmin along the range direction and by lateral resolutions Δxmin and Δymin along the two lateral directions. The axial resolution is given by

ΔRmin = λN/2    (axial resolution),    (1.22)

where λ is the wavelength of the pulse in the material in which the acoustic waves are propagating and N is the number of cycles in the pulse. The wavelength is related to the signal frequency by

λ = v/f,    (1.23)

where v is the wave velocity and f is the frequency. In biological tissue, v ≈ 1540 m/s. For an ultrasound system operating at f = 5 MHz and generating pulses with N = 2 cycles per pulse,

ΔRmin = vN/(2f) = (1540 × 2)/(2 × 5 × 10⁶) ≈ 0.3 mm.

Figure 1-34 Simulations of acoustic energy distribution for (a) a beam focused at Rf = 40 mm by a 96-element array and (b) a beam focused and steered by 45°.

Figure 1-35 Beam focusing and steering are realized by shaping the time-delay distribution: (a) uniform distribution, (b) linear shift, (c) non-uniform symmetrical distribution, and (d) nonlinear shift and non-uniform distribution.

Figure 1-36 Axial resolution ΔRmin and lateral resolution Δxmin for a transducer array of length Lx focused at range Rf.

The lateral resolution Δxmin is given by

Δxmin = (λ/Lx) Rf = Rf v/(Lx f)    (lateral resolution),    (1.24)

where Rf is the focal length (range at which the beam is focused). If the beam is focused at Rf = 5 cm, the array length is Lx = 4 cm, and f = 5 MHz, then

Δxmin = (5 × 10⁻² × 1540)/(4 × 10⁻² × 5 × 10⁶) ≈ 0.4 mm,

which is comparable with the magnitude of the axial resolution ΔRmin. The resolution along the orthogonal lateral direction, Δymin, is given by Eq. (1.24) with Lx replaced with Ly. The size of the resolvable voxel is

ΔV = ΔRmin × Δxmin × Δymin.    (1.25)
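These resolution formulas are one-line computations. The following MATLAB sketch (ours) evaluates Eqs. (1.22)–(1.25) for the numbers used above, assuming a square array (Ly = Lx):

v = 1540; f = 5e6; N = 2;     % tissue wave speed, frequency, cycles per pulse
Lx = 4e-2; Ly = Lx;           % array dimensions (m), square array assumed
Rf = 5e-2;                    % focal range (m)
dR = v*N/(2*f)                % axial resolution, Eq. (1.22): ~0.31 mm
dx = Rf*v/(Lx*f)              % lateral resolution, Eq. (1.24): ~0.39 mm
dV = dR*dx*(Rf*v/(Ly*f))      % resolvable voxel volume, Eq. (1.25)

Rerunning the sketch with f = 6 MHz, Lx = Ly = 5 cm, and Rf = 8 cm reproduces the 0.26 mm × 0.41 mm × 0.41 mm voxel of Exercise 1-4 below.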
Figure 1-37 displays an ultrasound image of a fetus.

Figure 1-37 Ultrasound image of a fetus.

Concept Question 1-10: How does an ultrasound imager focus its beam in the range direction and in the lateral direction? There are two orthogonal lateral directions, so how is that managed?

Exercise 1-4: A 6 MHz ultrasound system generates pulses with 2 cycles per pulse using a 5 cm × 5 cm 2-D transducer array. What are the dimensions of its resolvable voxel when focused at a range of 8 cm in a biological material?

Answer: ΔV = ΔRmin × Δxmin × Δymin = 0.26 mm × 0.41 mm × 0.41 mm.

1-6 Coming Attractions

Through examples of image processing products, this section presents images extracted from various sections in the book. In each case, we present a transformed image, along with a reference to its location within the text.

1-6.1 Image Warping by Interpolation

Section 4-10 demonstrates how an image can be warped by nonlinear shifts of its pixel locations and then interpolating the results to generate a smooth image. An example is shown in Fig. 1-38.

Figure 1-38 (a) Original clown image and (b) nonlinearly warped product. [Extracted from Figs. 4-14 and 4-17.]
1-6.2 Image Sharpening by Highpass Filtering

Section 5-2 illustrates how an image can be sharpened by spatial highpass filtering, and how this amplifies noise in the image. The original image of an electronic circuit and its highpass-filtered version are displayed in Fig. 1-39.

Figure 1-39 Image of an electronic circuit (a) before and (b) after application of a highpass sharpening filter. [Extracted from Fig. 5-6.]

1-6.3 Brightening by Histogram Equalization

Histogram equalization (nonlinear transformation of pixel values) can be used to brighten an image, as illustrated by the pair of images in Fig. 1-40.

Figure 1-40 Application of histogram equalization to the dark clown image in (a) leads to the brighter image in (b). [Extracted from Fig. 5-8.]
1-6.4 Edge Detection

Edges can be enhanced in an image by applying edge detection algorithms, such as the Canny edge detector that was applied to the image in Fig. 1-41(a).

Figure 1-41 (a) Original clown image and (b) its Canny edge-detected version. [Extracted from Fig. 5-16.]

1-6.5 Notch Filtering

Notch filtering can be used to remove sinusoidal interference from an image. The original Mariner space probe image and its notch-filtered version are shown in Fig. 1-42.

Figure 1-42 The horizontal lines in the original Mariner image (a) are due to sinusoidal interference in the recorded image. The lines were removed by applying notch filtering, as shown in (b). [Extracted from Fig. 6-7.]
1-6.6 Motion-Blur Deconvolution

Section 6-6 shows how to deblur a motion-blurred image. A blurred image and its motion-deblurred version are shown in Fig. 1-43.

Figure 1-43 Motion blurring is removed: (a) original motion-blurred image caused by taking a photograph in a moving car; (b) motion-deblurred image. [Extracted from Fig. 6-11.]

1-6.7 Denoising of Images Using Wavelets

One method used to denoise an image is by thresholding and shrinking its wavelet transform. An example is shown in Fig. 1-44.

Figure 1-44 Image denoising: (a) a noisy clown image; (b) wavelet-denoised clown image. [Extracted from Fig. 7-21.]
1-6.8 Image Inpainting

The wavelet transform of an image can be used to "inpaint" (restore missing pixels in) an image. An image with deleted pixels and its inpainted version are shown in Fig. 1-45.

Figure 1-45 The image in (b) was created by "filling in" values for the missing pixels in the image in (a). [Extracted from Fig. 7-23.]

1-6.9 Deconvolution Using Deterministic and Stochastic Image Models

Chapters 8 and 9 provide reviews of probability and estimation. The reason for these reviews is that using a stochastic image model, in which the 2-D power spectral density is modeled as that of a fractal image, can yield much better results in refocusing an image than is possible with deterministic models. An example is shown in Fig. 1-46.

Figure 1-46 The images in (b) and (c) demonstrate two methods used for refocusing the unfocused MRI image in (a): (b) deterministically refocused MRI image; (c) stochastically refocused MRI image. [Extracted from Fig. 9-4.]
1-6.10 Markov Random Fields for Image Segmentation

In a Markov random field (MRF) image model, the value of each pixel is stochastically related to its surrounding values. This is useful in segmenting images, as presented in Section 9-11. Figure 1-47 illustrates how an MRF image model can improve the segmentation of an X-ray image of a foot into tissue and bone.

Figure 1-47 Segmenting a noisy image (a) into two distinct classes, bone and tissue (b). [Extracted from Fig. 9-14.]

1-6.11 Motion-Deblurring of a Color Image

Chapters 1–9 consider grayscale (black-and-white) images, since color images consist of three (red, green, blue) images. Motion deblurring of a color image is presented in Section 10-3, an example of which is shown in Fig. 1-48.

Figure 1-48 Deblurring a motion-blurred color image: (a) motion-blurred Christmas tree; (b) deblurred Christmas tree. [Extracted from Fig. 10-11.]
1-6.12 Wavelet-Based Denoising of a Color Image

Wavelet-based denoising can be used on each color component of a color image, as presented in Section 10-4. An illustration is given in Fig. 1-49.

Figure 1-49 Denoising an image of the American flag using wavelet-based denoising: (a) noisy flag image; (b) denoised image. [Extracted from Fig. 10-12.]

1-6.13 Histogram Equalization (Brightening) of a Color Image

Color images can be brightened using histogram equalization, as presented in Section 10-5. Figure 1-50(a) displays a dark image of a toucan, and part (b) of the same figure displays the result of applying histogram equalization to each color.

Figure 1-50 Application of histogram equalization to a color image: (a) original dark toucan image; (b) brightened image. [Extracted from Figs. 10-13 and 10-14.]
1-6.14 Unsupervised Learning

In unsupervised learning, a set of training images is used to determine a set of reference images, which are then used to classify an observed image. The training images are mapped to a subspace spanned by the most significant singular vectors of the singular value decomposition of a training matrix. In Fig. 1-51, the training images, depicted by blue symbols, cluster into different image classes.

Figure 1-51 Depiction of training images in a 2-D subspace spanned by singular vectors u1 and u2; the training images cluster into Class 1, Class 2, and Class 3. [Extracted from Fig. 11-13.]

1-6.15 Supervised Learning

Supervised learning by neural networks is presented in Chapter 12. An example of a neural network is shown in Fig. 1-52.

Figure 1-52 A multilayer neural network, with 784 input terminals, a hidden layer, and an output layer for the ten classes 0–9.
Summary

Concepts

• Images may be formed using any of these imaging modalities: optical, infrared, radar, x-rays, ultrasound, and magnetic resonance imaging.
• Image processing is needed to process a raw image, formed directly from data, into a final image, which has been deblurred, denoised, interpolated, or enhanced, all of which are subjects of this book.
• Color images are actually triplets of red, green, and blue images, displayed together.
• The effect of an image acquisition system on an image can usually be modelled as 2-D convolution with the point spread function of the system (see below).
• The resolution of an image acquisition system can be computed using various formulae (see below).

Mathematical Formulae

Lens law: 1/d0 + 1/di = 1/f

X-ray tomography path attenuation: p(r, θ) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} α(ξ, η) δ(r − ξ cos θ − η sin θ) dξ dη

Optical point spread function: h(θ) = [2J1(γ)/γ]², with γ = (πD/λ) sin θ

Optical resolution: Δθmin ≈ 1.22 λ/D

SAR point spread function: h(x, y) = e^{−2.77(x/τ)²} sinc²(1.8y/l)

Radar resolution: Δy′min ≈ (λ/D) R

Ultrasound resolution: ΔRmin = λN/2

MRI point spread function: hx(x) = Δk sin(πN Δk x)/sin(π Δk x)

2-D convolution: Ii(x, y) = Io(x, y) ∗∗ h(x, y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} Io(x − x′, y − y′) h(x′, y′) dx′ dy′

Important Terms: Provide definitions or explain the meaning of the following terms: active pixel sensor, beamforming, charge-coupled device, infrared imaging, liquid crystal display, magnetic resonance imaging (MRI), optical imaging, point spread function, radar, resolution, synthetic-aperture radar, ultrasound imaging, X-ray computed tomography.
PROBLEMS

Section 1-1: Optical Imagers

1.1 An imaging lens in a digital camera has a focal length of 6 cm. How far should the lens be from the camera's CCD array to focus on an object
(a) 12 cm in front of the lens?
(b) 15 cm in front of the lens?

1.2 An imaging lens in a digital camera has a focal length of 4 cm. How far should the lens be from the camera's CCD array to focus on an object
(a) 12 cm in front of the lens?
(b) 8 cm in front of the lens?

1.3 The following program loads an image stored in clown.mat as Io(x, y), passes it through an imaging system with the PSF given by Eq. (1.6), and displays Io(x, y) and Ii(x, y). Parameters Δ, D, di, and λ (all in mm) are specified in the program's first line.

clear;Delta=0.0002;D=0.03;
lambda=0.0000005;di=0.003;
T=round(0.01/Delta);
for I=1:T;for J=1:T;
x2y2(I,J)=(I-T/2).*(I-T/2)+(J-T/2).*(J-T/2);end;end;
gamma=pi*D/lambda*sqrt(x2y2./(x2y2+di*di/Delta/Delta));
h=2*besselj(1,gamma)./gamma;
h(T/2,T/2)=(h(T/2+1,T/2)+h(T/2-1,T/2)+h(T/2,T/2+1)+h(T/2,T/2-1))/4;
h=h.*h;H=h(T/2-5:T/2+5,T/2-5:T/2+5);
load clown.mat;Y=conv2(X,H);
figure,imagesc(X),axis off,colormap(gray),
figure,imagesc(Y),axis off,colormap(gray)

Run the program and display Io(x, y) (input) and Ii(x, y) (output).

Section 1-2: Radar Imagers

1.4 Compare the azimuth resolution of a real-aperture radar with that of a synthetic-aperture radar, with both pointed at the ground from an aircraft at a range R = 5 km. Both systems operate at λ = 3 cm and utilize a 2-m-long antenna.

1.5 A 2-m-long antenna is used to form a synthetic-aperture radar from a range of 100 km. What is the length of the synthetic aperture?

1.6 The following program loads an image stored in sar.mat as Io(x, y), passes it through an imaging system with the PSF given by Eq. (1.15), and displays Io(x, y) and Ii(x, y). Parameters Δ, τ, and l are specified in the program's first line.

clear;Delta=0.1;l=5;tau=1;I=[-15:15];
z=pi*1.8*Delta*I/l;load sar.mat;
hy=sin(pi*z)./(pi*z);hy(16)=1;hy=hy.*hy;
hx=exp(-2.77*Delta*Delta*I.*I/tau/tau);
H=hy'*hx;Y=conv2(X,H);
figure,imagesc(X),axis off,colormap(gray),
figure,imagesc(Y),axis off,colormap(gray)

Run the program and display Io(x, y) (input) and Ii(x, y) (output).

Section 1-3: X-Ray Computed Tomography (CT)

1.7 (This problem assumes prior knowledge of the 1-D Fourier transform (FT).) The basic CT problem is to reconstruct α(ξ, η) in Eq. (1.18) from p(r, θ). One way to do this is as follows:
(a) Take the FT of Eq. (1.18), transforming r to f. Define p(−r, θ) = p(r, θ + π).
(b) Define and substitute µ = f cos θ and ν = f sin θ in this FT.
(c) Show that the result defines 2 FTs, transforming ξ to µ and η to ν, and that A(µ, ν) = P(f, θ). Hence, α(ξ, η) is the inverse FT of P(f, θ).

Section 1-4: Magnetic Resonance Imaging

1.8 The following program loads an image stored in mri.mat as Io(x, y), passes it through an imaging system with the PSF given by Eq. (1.20), and displays Io(x, y) and Ii(x, y). Parameters Δ, N, and dk are specified in the program's first line.

clear;N=16;Delta=0.01;dk=1;
I=[-60:60];load mri.mat;
h=dk*sin(pi*N*dk*I*Delta)./sin(pi*dk*I*Delta);
h(61)=N;H=h'*h;Y=conv2(X,H);
figure,imagesc(X),axis off,colormap(gray),
figure,imagesc(Y),axis off,colormap(gray)

Run the program and display Io(x, y) (input) and Ii(x, y) (output).
Section 1-5: Ultrasound Imager

1.9 This problem shows how beamforming works on a linear array of transducers, as illustrated in Fig. 1-35, in a medium with a wave speed of 1540 m/s. We are given a linear array of transducers located 1.54 cm apart along the x axis, with the nth transducer located at x = 1.54n cm. Outputs {yn(t)} from the transducers are delayed and summed to produce the signal y(t) = ∑n yn(t − 0.05n). In what direction (angle from perpendicular to the array) is the array focused?
Chapter 2
Review of 1-D Signals and Systems

Contents
Overview, 39
2-1 Review of 1-D Continuous-Time Signals, 41
2-2 Review of 1-D Continuous-Time Systems, 43
2-3 1-D Fourier Transforms, 47
2-4 The Sampling Theorem, 53
2-5 Review of 1-D Discrete-Time Signals and Systems, 59
2-6 Discrete-Time Fourier Transform (DTFT), 66
2-7 Discrete Fourier Transform (DFT), 70
2-8 Fast Fourier Transform (FFT), 76
2-9 Deconvolution Using the DFT, 80
2-10 Computation of Continuous-Time Fourier Transform (CTFT) Using the DFT, 82
Problems, 86

Objectives

Learn to:
■ Compute the response of an LTI system to a given input using convolution.
■ Compute the frequency response (response to a sinusoidal input) of an LTI system.
■ Compute the continuous-time Fourier transform of a signal or impulse response.
■ Use the sampling theorem to convert a continuous-time signal to a discrete-time signal.
■ Perform the three tasks listed above for continuous-time signals on discrete-time signals.
■ Use the discrete Fourier transform (DFT) to denoise, filter, and deconvolve signals.

[Chapter-opener figure: trumpet signal x(t) and its magnitude spectrum |X(f)|.]

Many techniques and transforms in image processing are direct generalizations of techniques and transforms in 1-D signal processing. Reviewing these 1-D concepts enhances the understanding of their 2-D counterparts. These include: linear time-invariant (LTI) systems, convolution, frequency response, filtering, Fourier transforms for continuous and discrete-time signals, and the sampling theorem. This chapter reviews these 1-D concepts for generalization to their 2-D counterparts in Chapter 3.
Overview
Some topics and concepts in 1-D signals and systems generalize
directly to 2-D. This chapter provides quick reviews of those
topics and concepts in 1-D so as to simplify their repeat presen-
tation in 2-D in future chapters. We assume the reader is already
familiar with 1-D signals and systems∗ —in both continuous and
discrete time, so the presentation in this chapter is more in the
form of a refresher than an extensive treatment. Moreover, we
limit the coverage to topics that generalize directly from 1-D
to 2-D. These topics include those listed in the box below.
Some topics that do not generalize readily from 1-D to 2-D
include: causality; differential and difference equations; transfer
functions; poles and zeros; Laplace and z-transforms. Hence,
these topics will not be covered in this book.
1-D Signals and Systems → 2-D Signals and Systems
(1) Linear time-invariant (LTI) 1-D systems → Linear shift-invariant (LSI) 2-D systems
(2) Frequency response of LTI systems → Spatial frequency response of LSI systems
(3) Impulse response of 1-D systems → Point-spread function of LSI systems
(4) 1-D filtering and convolution → 2-D filtering and convolution
(5) Sampling theorem in 1-D → Sampling theorem in 2-D
(6) Discrete-time Fourier transform (DTFT) → Discrete-space Fourier transform (DSFT)

∗ For a review, see Engineering Signals and Systems in Continuous and Discrete Time, Ulaby and Yagle, NTS Press, 2016.
For the sake of clarity, we start this chapter with a synopsis of the terminology and associated symbols used to represent 1-D continuous-time and discrete-time signals and 2-D continuous-space and discrete-space signals, and their associated spectra.

1-D Signals

Continuous time: a signal x(t) in the time domain has spectrum X(f) in the frequency domain, obtained through the Fourier transform (FT).

Discrete time: a signal x[n], defined at discrete times t = nΔ, has spectrum X(Ω) at continuous frequency Ω through the discrete-time Fourier transform (DTFT), and spectrum X[k] at discrete frequencies Ω = 2πk/N, 0 ≤ k ≤ N − 1, through the discrete Fourier transform (DFT).

2-D Images

Continuous space: an image f(x, y) in the spatial domain has spectrum F(µ, ν) in the frequency domain, obtained through the continuous-space Fourier transform (CSFT).

Discrete space: an image f[n, m] in discrete space has spectrum F(Ω1, Ω2) in the continuous frequency domain through the discrete-space Fourier transform (DSFT), and spectrum F[k1, k2] in the discrete frequency domain, with Ω1 = 2πk1/N and Ω2 = 2πk2/N, through the order-N 2-D DFT.
2-1 Review of 1-D Continuous-Time Signals

A continuous-time signal is a physical quantity, such as voltage or acoustic pressure, that varies with time t, where t is a real number having units of time (usually seconds). Mathematically, a continuous-time signal is a function x(t) of t. Although t can also be a spatial variable, we will refer to t as time, to avoid confusion with the spatial variables used for images.

2-1.1 Fundamental 1-D Signals

In Table 2-1, we consider three types of fundamental, continuous-time signals.

A. Eternal Sinusoids

An (eternal) sinusoid with amplitude A, phase angle θ (radians), and frequency f0 (Hz), is described by the function

x(t) = A cos(2π f0 t + θ),  −∞ < t < ∞.    (2.1)

The period of x(t) is T = 1/f0.

Even though an eternal sinusoid cannot exist physically (since it would extend from before the beginning of the universe until after its end), it is used nevertheless to mathematically describe periodic signals in terms of their Fourier series. Another useful aspect of eternal sinusoids is that the response of a linear time-invariant (LTI) system (defined in Section 2-2.1) to an eternal sinusoid is another eternal sinusoid at the same frequency as that of the input sinusoid, but with possibly different amplitude and phase. Despite the fact that the sinusoids are eternal (and therefore unrealistic), they can be used to compute the response of real systems to real signals. Consider, for example, a sinusoid that starts at t = 0 as the input to a stable and causal LTI system. The output consists of a transient response that decays to zero, plus a sinusoid that starts at t = 0. The output sinusoid has the same amplitude, phase, and frequency as the response that the system would have had to an eternal input sinusoid.

B. Pulses

A (rectangular) pulse of duration T centered at time t0 (Table 2-1) is defined as

x(t) = { 1 for (t0 − T/2) < t < (t0 + T/2),  0 otherwise.    (2.2)

The rectangle function rect(t) is defined as

rect(t) = { 1 for −1/2 < t < 1/2,  0 otherwise.    (2.3a)

The pulse x(t) defined in Eq. (2.2) can be written in terms of rect(t) as

x(t) = rect((t − t0)/T)    (rectangle pulse).    (2.3b)

C. Impulses

An impulse δ(t) is defined as a function that has the sifting property

∫_{−∞}^{∞} x(t) δ(t − t0) dt = x(t0)    (sifting property).    (2.4)

◮ Multiplying a function x(t) that is continuous at t = t0 by a delayed impulse δ(t − t0) and integrating over t "sifts out" the value x(t0). ◭

Setting x(t) = 1 in Eq. (2.4) shows that an impulse has an area of unity.

An impulse can be thought of (non-rigorously) as the limiting case of a pulse of width T = 2ε multiplied by an amplitude A = 1/(2ε) as ε → 0:

δ(t) = lim_{ε→0} (1/(2ε)) rect(t/(2ε)).    (2.5)

Because the width of δ(t) is the reciprocal of its amplitude (Fig. 2-1), the area of δ(t) remains 1 as ε → 0. The limit itself is undefined, but it is useful to think of an impulse as the limiting case of a short-duration, high-amplitude pulse with unit area. Also, the pulse shape need not be rectangular; a Gaussian or sinc function can also be used.

Changing variables from t to t′ = at yields the time-scaling property (Table 2-1) of impulses:

δ(at) = δ(t)/|a|.    (2.6)
Table 2-1 Types of signals and signal properties.

Types of Signals
• Eternal sinusoid: x(t) = A cos(2π f0 t + θ), −∞ < t < ∞
• Pulse (rectangle): x(t) = rect((t − t0)/T) = { 1 for (t0 − T/2) < t < (t0 + T/2), 0 otherwise }
• Impulse δ(t): ∫_{−∞}^{∞} x(t) δ(t − t0) dt = x(t0)

Properties
• Causal: x(t) = 0 for t < 0 (starts at or after t = 0)
• Time delay by t0: x(t) → x(t − t0)
• Time scaling by a: x(t) → x(at)
• Signal energy: E = ∫_{−∞}^{∞} |x(t)|² dt
The scaling property can be interpreted using Eq. (2.5) as follows. For a > 1, the width of the pulse in Eq. (2.5) is compressed by |a|, reducing its area by a factor of |a|, but its height is unaltered. Hence the area under the pulse is reduced to 1/|a|.

Impulses are important tools used in defining the impulse responses of 1-D systems and the point-spread functions of 2-D spatial systems (such as a camera or an ultrasound imager), as well as for deriving the sampling theorem.

Figure 2-1 Rectangular pulse model for δ(t).

2-1.2 Properties of 1-D Signals

A. Time Delay

Delaying signal x(t) by t0 generates signal x(t − t0). If t0 > 0, the waveform of x(t) is shifted to the right by t0, and if t0 < 0, the waveform of x(t) is shifted to the left by |t0|. This is illustrated by the time-delay figure in Table 2-1.

B. Time Scaling

A signal x(t) time-scaled by a becomes x(at). If a > 1, the waveform of x(t) is compressed in time by a factor of a. If 0 < a < 1, the waveform of x(t) is expanded in time by a factor of 1/a, as illustrated by the scaling figure in Table 2-1. If a < 0, the waveform of x(t) is compressed by |a| or expanded by 1/|a|, and then time-reversed.

C. Signal Energy

The energy E of a signal x(t) is

E = ∫_{−∞}^{∞} |x(t)|² dt.    (2.7)

A nonzero signal x(t) that is zero-valued outside the interval [a, b] = {t : a ≤ t ≤ b} (i.e., x(t) = 0 for t ∉ [a, b]) has support [a, b] and duration b − a.

Concept Question 2-1: Why does scaling time in an impulse also scale its area?

Exercise 2-1: Compute the value of ∫_{−∞}^{∞} δ(3t − 6) t² dt.

Answer: 4/3. (See IP)

Exercise 2-2: Compute the energy of the pulse defined by x(t) = 5 rect((t − 2)/6).

Answer: 150. (See IP)
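Exercise 2-2 can also be checked numerically. A minimal MATLAB sketch (ours; the grid resolution is an arbitrary choice):

dt = 1e-4; t = -10:dt:10;        % fine grid spanning the pulse support [-1, 5]
x = 5*(abs((t-2)/6) < 0.5);      % amplitude 5, width 6, centered at t = 2
E = trapz(t, x.^2)               % Eq. (2.7); returns ~150 = 5^2 x 6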
2-2 Review of 1-D Continuous-Time Systems

A continuous-time system is a device or mathematical model that accepts a signal x(t) as its input and produces another signal y(t) at its output. Symbolically, the input–output relationship is expressed as

x(t) → SYSTEM → y(t).

Table 2-2 provides a list of important system types and properties.

2-2.1 Linear and Time-Invariant Systems

Systems are classified on the basis of two independent properties: (a) linearity and (b) time invariance, which leads to four possible classes:

(1) Linear (L), but not time-invariant.
(2) Linear and time-invariant (LTI).
(3) Nonlinear, but time-invariant (TI).
(4) Nonlinear and not time-invariant.

Most practical systems (including 2-D imaging systems such as a camera, ultrasound, radar, etc.) belong to one of the first two
of the above four classes. If a system is moderately nonlinear, it can be approximated by a linear model, and if it is highly nonlinear, it may be possible to divide its input–output response into a series of quasi-linear regions. In this book, we limit our treatment to linear (L) and linear time-invariant (LTI) systems.

Table 2-2 Types of systems and associated properties.
• Linear System (L): if xi(t) → L → yi(t), then ∑_{i=1}^{N} ci xi(t) → L → ∑_{i=1}^{N} ci yi(t)
• Time-Invariant (TI): if x(t) → TI → y(t), then x(t − τ) → TI → y(t − τ)
• Linear Time-Invariant (LTI): if xi(t) → LTI → yi(t), then ∑_{i=1}^{N} ci xi(t − τi) → LTI → ∑_{i=1}^{N} ci yi(t − τi)
• Impulse Response of LTI System: δ(t − τ) → LTI → y(t) = h(t − τ)

A. Linear Systems

A system is linear (L) if its response to a linear combination of input signals acting simultaneously is the same as the linear combination of the responses to each of the input signals acting alone. That is, if

xi(t) → L → yi(t),    (2.8a)

then for any N inputs {xi(t), i = 1 . . . N} and any N constants {ci, i = 1 . . . N},

∑_{i=1}^{N} ci xi(t) → L → ∑_{i=1}^{N} ci yi(t).    (2.8b)

Under mild assumptions (regularity) about the system, the finite sum can be extended to infinite sums and integrals. Linearity is also called the superposition property.

B. Time-Invariant Systems

A system is time-invariant (TI) if time shifting (delaying) the input time shifts the output by exactly the same amount and in exactly the same direction. That is, if

x(t) → TI → y(t),

then it follows that

x(t − τ) → TI → y(t − τ)    (2.9)

for any input signal x(t) and constant time shift τ.

◮ Systems that are both linear and time-invariant are termed linear time-invariant (LTI). ◭

2-2.2 Impulse Response

In general, we use the symbols x(t) and y(t) to denote, respectively, the input signal into a system and the resultant output response. The term impulse response is used to denote the output response for the specific case when the input is an impulse. For non-time-invariant systems, the impulse response depends on the time at which the impulse is nonzero. The response to the impulse δ(t) delayed by τ, which is δ(t − τ), is denoted h(t; τ). If the system is time-invariant, then delaying the impulse merely delays the impulse response, so h(t; τ) = h(t − τ), where h(t) is the response to the impulse δ(t). This can be summarized
in the following two equations:

δ(t − τ) → SYSTEM → h(t; τ),    (2.10a)

and for a time-invariant system,

δ(t − τ) → TI → h(t − τ).    (2.10b)

2-2.3 Convolution

A. Linear System

Upon interchanging t and t0 in the sifting property given by Eq. (2.4) and replacing t0 with τ, we obtain the relationship

∫_{−∞}^{∞} x(τ) δ(t − τ) dτ = x(t).    (2.11)

Next, if we multiply both sides of Eq. (2.10a) by x(τ) and then integrate τ over the limits (−∞, ∞), we obtain

∫_{−∞}^{∞} x(τ) δ(t − τ) dτ → L → y(t) = ∫_{−∞}^{∞} x(τ) h(t; τ) dτ.    (2.12)

Upon using Eq. (2.11) to replace the left-hand side of Eq. (2.12) with x(t), Eq. (2.12) becomes

x(t) → L → y(t) = ∫_{−∞}^{∞} x(τ) h(t; τ) dτ.    (2.13)

This integral is called the superposition integral.

B. LTI System

For an LTI system, h(t; τ) = h(t − τ), in which case the expression for y(t) in Eq. (2.13) becomes

y(t) = ∫_{−∞}^{∞} x(τ) h(t − τ) dτ,    (2.14)

which is known as the convolution integral. Often, the convolution integral is represented symbolically by

x(t) ∗ h(t) = ∫_{−∞}^{∞} x(τ) h(t − τ) dτ.    (2.15)

Combining the previous results leads to the symbolic form

x(t) → LTI → y(t) = x(t) ∗ h(t).    (2.16)

The steps leading to Eq. (2.16) are summarized in Fig. 2-2.

The convolution in Eq. (2.15) is realized by time-shifting the impulse response h(t). Changing variables from τ to t − τ shows that convolution has the commutative property:

x(t) ∗ h(t) = h(t) ∗ x(t).    (2.17)

The expression for y(t) can also be derived by time-shifting the input signal x(t) instead, in which case the result would be

y(t) = h(t) ∗ x(t) = ∫_{−∞}^{∞} x(t − τ) h(τ) dτ.    (2.18)

Table 2-3 provides a summary of key properties of convolution that are extendable to 2-D, and Fig. 2-3 offers a graphical representation of how two of those properties—the associative and distributive properties—are used to characterize the overall impulse responses of systems composed of multiple systems, when connected in series or in parallel, in terms of the impulse responses of the individual systems.

Concept Question 2-2: What is the significance of a system being linear time-invariant?

Concept Question 2-3: Why does delaying either of two signals delay their convolution?

Exercise 2-3: Is the following system linear, time-invariant, both, or neither?

dy/dt = 2x(t − 1) + 3t x(t + 1)

Answer: The system is linear but not time-invariant. (See IP)
LTI System with Zero Initial Conditions

1. δ(t) → LTI → y(t) = h(t)
2. δ(t − τ) → LTI → y(t) = h(t − τ)
3. x(τ) δ(t − τ) → LTI → y(t) = x(τ) h(t − τ)
4. ∫_{−∞}^{∞} x(τ) δ(t − τ) dτ → LTI → y(t) = ∫_{−∞}^{∞} x(τ) h(t − τ) dτ
5. x(t) → LTI → y(t) = ∫_{−∞}^{∞} x(τ) h(t − τ) dτ = x(t) ∗ h(t)

Figure 2-2 Derivation of the convolution integral for a linear time-invariant system.

Table 2-3 Convolution properties.

Convolution integral: y(t) = h(t) ∗ x(t) = ∫_{−∞}^{∞} h(τ) x(t − τ) dτ
• Causal systems and signals: y(t) = h(t) ∗ x(t) = u(t) ∫_{0}^{t} h(τ) x(t − τ) dτ

1. Commutative: x(t) ∗ h(t) = h(t) ∗ x(t)    (2.19a)
2. Associative: [g(t) ∗ h(t)] ∗ x(t) = g(t) ∗ [h(t) ∗ x(t)]    (2.19b)
3. Distributive: x(t) ∗ [h1(t) + · · · + hN(t)] = x(t) ∗ h1(t) + · · · + x(t) ∗ hN(t)    (2.19c)
4. Causal ∗ Causal = Causal: y(t) = u(t) ∫_{0}^{t} h(τ) x(t − τ) dτ    (2.19d)
5. Time-shift: h(t − T1) ∗ x(t − T2) = y(t − T1 − T2)    (2.19e)
6. Convolution with impulse: x(t) ∗ δ(t − T) = x(t − T)    (2.19f)
Figure 2-3 (a) The overall impulse response of a system composed of multiple LTI systems connected in series, x(t) → h1(t) → h2(t) → · · · → hN(t) → y(t), is equivalent to the cumulative convolution of the impulse responses of the individual systems: x(t) → h1(t) ∗ h2(t) ∗ · · · ∗ hN(t) → y(t). (b) For LTI systems connected in parallel, the overall impulse response is equal to the sum of the impulse responses of the individual systems: x(t) → h1(t) + h2(t) + · · · + hN(t) → y(t).

Exercise 2-4: Compute the output y(t) of an LTI system with impulse response h(t) to input x(t), where

h(t) = { e^{−3t} for t > 0,  0 for t < 0,

and

x(t) = { e^{−2t} for t > 0,  0 for t < 0.

Answer:

y(t) = { e^{−2t} − e^{−3t} for t > 0,  0 for t < 0.

(See IP)
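The closed-form answer to Exercise 2-4 is easy to verify numerically: sampling the convolution integral of Eq. (2.15) on a fine grid and scaling the output of conv by the step size approximates y(t). A minimal MATLAB sketch (ours):

dt = 1e-3; t = 0:dt:5;                    % fine grid over the causal support
x = exp(-2*t); h = exp(-3*t);             % sampled input and impulse response
y = conv(x, h)*dt;                        % Riemann-sum approximation of Eq. (2.15)
y = y(1:length(t));                       % retain y(t) on the grid of t
plot(t, y, t, exp(-2*t)-exp(-3*t), '--')  % overlays the closed-form answer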
2-3 1-D Fourier Transforms

The continuous-time Fourier transform (CTFT) is a powerful tool for
• computing the spectra of signals, and
• analyzing the frequency responses of LTI systems.

2-3.1 Definition of Fourier Transform

The 1-D Fourier transform X(f) of x(t) and the inverse 1-D Fourier transform x(t) of X(f) are defined by the transformations

X(f) = F{x(t)} = ∫_{−∞}^{∞} x(t) e^{−j2πft} dt    (2.20a)

and

x(t) = F⁻¹{X(f)} = ∫_{−∞}^{∞} X(f) e^{j2πft} df.    (2.20b)

Throughout this book, variables written in boldface (e.g., X(f)) denote vectors or complex-valued quantities.

A. Alternative Definitions of the Fourier Transform

Note that Eq. (2.20) differs slightly from the usual electrical engineering definition of the Fourier-transform pair:

Xω(ω) = ∫_{−∞}^{∞} x(t) e^{−jωt} dt    (2.21a)

and

x(t) = (1/2π) ∫_{−∞}^{∞} Xω(ω) e^{jωt} dω.    (2.21b)

Whereas Eq. (2.20) uses the oscillation frequency f (in Hz), the definition given by Eq. (2.21) uses ω (in rad/s) instead, where ω = 2πf. Using Hz makes interpretation of the Fourier
transform as a spectrum—as well as the presentation of the sampling theorem in Section 2-4—easier.

The definition of the Fourier transform used by mathematicians has a different sign for ω than the definition used by electrical engineers:

X₋ω(ω) = ∫_{−∞}^{∞} x(t) e^{jωt} dt    (2.22a)

and

x(t) = (1/2π) ∫_{−∞}^{∞} X₋ω(ω) e^{−jωt} dω.    (2.22b)

Geophysicists use different sign conventions for time and space! In addition, some computer programs, such as Mathematica, split the 1/(2π) factor into factors of 1/√(2π) in both the forward and inverse transforms.

◮ In this book, we use the definition of the Fourier transform given by Eq. (2.20) exclusively. ◭

B. Fourier Transform Notation

Throughout this book, we use Eq. (2.20) as the definition of the Fourier transform, we denote the individual transformations by

F{x(t)} = X(f)

and

F⁻¹{X(f)} = x(t),

and we denote the combined bilateral pair by

x(t) ↔ X(f).

2-3.2 Fourier Transform Properties

The major properties of the Fourier transform are summarized in Table 2-4, and derived next.

Scaling: For a ≠ 0,

x(at) = ∫_{−∞}^{∞} X(f) e^{j2πf(at)} df = ∫_{−∞}^{∞} X(f) e^{j2π(fa)t} d(fa)/a = F⁻¹{(1/|a|) X(f/a)}.    (2.23)

Shifting:

x(t − τ) = ∫_{−∞}^{∞} X(f) e^{j2πf(t−τ)} df = ∫_{−∞}^{∞} X(f) e^{j2πft} e^{−j2πfτ} df = F⁻¹{X(f) e^{−j2πfτ}}.    (2.24)

Modulation:

e^{j2πf0t} x(t) = ∫_{−∞}^{∞} X(f) e^{j2πft} e^{j2πf0t} df = ∫_{−∞}^{∞} X(f) e^{j2π(f+f0)t} df = F⁻¹{X(f − f0)}.    (2.25)

Derivative:

dx(t)/dt = (d/dt) ∫_{−∞}^{∞} X(f) e^{j2πft} df = ∫_{−∞}^{∞} X(f) (j2πf) e^{j2πft} df = F⁻¹{(j2πf) X(f)}.    (2.26)

Zero frequency: Setting f = 0 in Eq. (2.20a) leads to

X(0) = ∫_{−∞}^{∞} x(t) dt.

Zero time: Similarly, setting t = 0 in Eq. (2.20b) leads to

x(0) = ∫_{−∞}^{∞} X(f) df.

The other properties in Table 2-4 follow readily from the definition of the Fourier transform given by Eq. (2.20), except for the convolution property, which requires a few extra steps of algebra.

Parseval's theorem states that

E = ∫_{−∞}^{∞} x(t) y∗(t) dt = ∫_{−∞}^{∞} X(f) Y∗(f) df.    (2.27)

Setting y(t) = x(t) gives Rayleigh's theorem (also commonly known as Parseval's theorem), which states that the energies of
x(t) and X(f) are equal:

E = ∫_{−∞}^{∞} |x(t)|² dt = ∫_{−∞}^{∞} |X(f)|² df.    (2.28)

Table 2-4 Major properties of the Fourier transform.

Property: x(t) ↔ X(f) = F[x(t)] = ∫_{−∞}^{∞} x(t) e^{−j2πft} dt
1. Linearity: ∑ ci xi(t) ↔ ∑ ci Xi(f)
2. Time scaling: x(at) ↔ (1/|a|) X(f/a)
3. Time shift: x(t − τ) ↔ e^{−j2πfτ} X(f)
4. Frequency shift (modulation): e^{j2πf0t} x(t) ↔ X(f − f0)
5. Time derivative: x′ = dx/dt ↔ j2πf X(f)
6. Reversal: x(−t) ↔ X(−f)
7. Conjugation: x∗(t) ↔ X∗(−f)
8. Convolution in t: x(t) ∗ y(t) ↔ X(f) Y(f)
9. Convolution in f (multiplication in t): x(t) y(t) ↔ X(f) ∗ Y(f)
10. Duality: X(t) ↔ x(−f)

Special FT relationships:
11. Zero frequency: X(0) = ∫_{−∞}^{∞} x(t) dt
12. Zero time: x(0) = ∫_{−∞}^{∞} X(f) df
13. Parseval's theorem: ∫_{−∞}^{∞} x(t) y∗(t) dt = ∫_{−∞}^{∞} X(f) Y∗(f) df

A. Even and Odd Parts of Signals

A signal x(t) can be decomposed into even xe(t) and odd xo(t) components:

x(t) = xe(t) + xo(t),    (2.29)

where the even component xe(t) and the odd component xo(t) are formed from their parent signal x(t) as follows:

xe(t) = [x(t) + x∗(−t)]/2    (2.30a)

and

xo(t) = [x(t) − x∗(−t)]/2.    (2.30b)

A signal is said to have even symmetry if x(t) = x∗(−t), in which case x(t) = xe(t) and xo(t) = 0. Similarly, a signal has odd symmetry if x(t) = −x∗(−t), in which case x(t) = xo(t) and xe(t) = 0.
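A brief MATLAB sketch (ours) illustrating Eq. (2.30) on a one-sided exponential sampled over a symmetric grid:

dt = 1e-3; t = -5:dt:5;            % symmetric time grid
x  = exp(-t).*(t >= 0);            % causal one-sided exponential
xm = fliplr(x);                    % samples of x(-t) on the same grid
xe = (x + xm)/2; xo = (x - xm)/2;  % Eqs. (2.30a) and (2.30b) for real x
max(abs(xe + xo - x))              % ~0: the two parts reassemble x(t)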
B. Conjugate Symmetry

If x(t) is real-valued, then the following conjugate symmetry relations hold:

X(−f) = X∗(f),    (2.31a)

where X∗(f) is the complex conjugate of X(f),

∠X(f) = −∠X(−f)    (phase is an odd function),    (2.31b)

|X(f)| = |X(−f)|    (magnitude is even),    (2.31c)

real(X(f)) = real(X(−f))    (real part is even),    (2.31d)

imag(X(f)) = −imag(X(−f))    (imaginary part is odd).    (2.31e)

The real and imaginary parts of the Fourier transform of a real-valued signal x(t) are the Fourier transforms of the even and odd parts of x(t), respectively. So the Fourier transform of a real-valued and even function is real-valued, and the Fourier transform of a real-valued and odd function is purely imaginary:

x(t) is even ↔ X(f) is real,
x(t) is odd ↔ X(f) is imaginary,
x(t) is real and even ↔ X(f) is real and even,
x(t) is real and odd ↔ X(f) is imaginary and odd.
C. Filtering and Frequency Response

The following is a very important property of the Fourier transform:

◮ The Fourier transform of a convolution of two functions is equal to the product of their Fourier transforms:

x(t) → LTI → y(t) = h(t) ∗ x(t)

implies that

Y(f) = H(f) X(f).    (2.32) ◭

The function H(f) = F{h(t)} is called the frequency response of the system. The relationship described by Eq. (2.32) defines the frequency filtering process performed by the system. At a given frequency f0, frequency component X(f0) of the input is multiplied by H(f0) to obtain the frequency component Y(f0) of the output.

For example, an ideal lowpass filter with cutoff frequency fc and a frequency response

HLP(f) = { 1 for |f| < fc,  0 for |f| > fc,    (2.33)

eliminates all frequency components of X(f) above fc.

D. Sinc Functions

The impulse response of the ideal lowpass filter characterized by Eq. (2.33) is

hLP(t) = F⁻¹{HLP(f)} = ∫_{−fc}^{fc} 1 · e^{j2πft} df = { sin(2πfc t)/(πt) for t ≠ 0,  2fc for t = 0.    (2.34)

The scientific literature contains two different, but both commonly used, definitions for the sinc function:

(1) sinc(x) = sin(x)/x, and
(2) sinc(x) = sin(πx)/(πx).

With either definition, sinc(0) = 1 since sin(x) ≈ x for x ≪ 1.

◮ Throughout this book, we use the sinc function definition

sinc(x) = { sin(πx)/(πx) for x ≠ 0,  1 for x = 0.    (2.35) ◭

Hence, per the definition given by Eq. (2.35), the impulse response of the ideal lowpass filter is given by

hLP(t) = 2fc sinc(2fc t).    (2.36)

2-3.3 Fourier Transform Pairs

Commonly encountered Fourier transform pairs are listed in Table 2-5. Note the duality between entries #1 and #2, #4 and #5, and #6 and itself.
Table 2-5 Examples of Fourier transform pairs. Note that constant a ≥ 0.

Basic functions:
1a. δ(t) ↔ 1
1b. δ(t − τ) ↔ e^{−j2πfτ}
2. 1 ↔ δ(f)
3. e^{−a|t|}, a > 0 ↔ 2a/((2πf)² + a²)
4. rect(t/T) ↔ T sinc(fT)
5. f0 sinc(f0 t) ↔ rect(f/f0)
6. e^{−πt²} ↔ e^{−πf²}
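Any of these pairs can be spot-checked by discretizing Eq. (2.20a). The following MATLAB sketch (ours; it assumes the Signal Processing Toolbox function sinc, which matches the definition in Eq. (2.35)) verifies pair #4:

T = 2; dt = 1e-3; t = -8:dt:8;
x = double(abs(t/T) < 0.5);                % rect(t/T)
f = -3:0.01:3;
X = arrayfun(@(fk) trapz(t, x.*exp(-1j*2*pi*fk*t)), f);   % Eq. (2.20a)
plot(f, real(X), f, T*sinc(f*T), '--')     % the two curves coincide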
2-3.4 Interpretation of the Fourier Transform

A Fourier transform can be interpreted in three ways:

(1) as the frequency response of an LTI system,
(2) as the spectrum of a signal, and
(3) as the energy spectral density of a signal.

A. Frequency Response

The frequency response H(f) of an LTI system is the frequency domain equivalent of the system's impulse response h(t):

H(f) ↔ h(t).
B. Spectrum

The spectrum of x(t) is X(f). Consider, for example, the eternal sinusoid defined by Eq. (2.1):

x(t) = A cos(2πf0t + θ) = (A/2) e^{jθ} e^{j2πf0t} + (A/2) e^{−jθ} e^{−j2πf0t},    (2.37)

where we used the relation cos(x) = (e^{jx} + e^{−jx})/2. From the properties and pairs listed in Tables 2-4 and 2-5, we have

X(f) = (A/2) e^{jθ} δ(f − f0) + (A/2) e^{−jθ} δ(f + f0).    (2.38)

Strictly speaking, the Fourier transform of an eternal sinusoid is undefined, since an eternal sinusoid is not absolutely integrable. Nevertheless, this example provides a convenient illustration that the spectrum of a sinusoid at f0 is concentrated entirely at ±f0. By extension, the spectrum of a constant signal is concentrated entirely at f = 0.

Real signals are more complicated than simple sinusoids, as are their corresponding spectra. Figure 2-4(a) displays 7 ms of a trumpet playing note B, and part (b) of the same figure displays the corresponding spectrum. The spectrum is concentrated around narrow spectral lines located at 491 Hz and the next 6 harmonics. In contrast, speech exhibits a much broader spectrum, as illustrated by the examples in Fig. 2-5.

Figure 2-4 Trumpet signal (note B) and its magnitude spectrum: (a) x(t); (b) |X(f)|.

C. Energy Spectral Density

A more rigorous interpretation of X(f) that avoids impulses in frequency uses the concept of energy spectral density. Let the spectrum of a signal x(t) be

X(f) = { constant for f0 < f < f0 + ε,  0 otherwise } ≈ { X(f0) for f0 < f < f0 + ε,  0 otherwise }.    (2.39)

The approximation becomes exact in the limit as ε → 0, provided X(f) is continuous at f = f0.

Using Rayleigh's theorem, the energy of x(t) is

E = ∫_{−∞}^{∞} |X(f)|² df ≈ |X(f0)|² ε.    (2.40)

The energy of x(t) in the interval f0 < f < f0 + ε is |X(f0)|² ε. The energy spectral density at frequency f = f0 (analogous to probability density or to mass density of a physical object) is |X(f0)|².

Note that a real-valued x(t) will, by conjugate symmetry, also have nonzero X(f) in the interval −f0 > f > −f0 − ε. So the bilateral energy spectral density of x(t) at f0 is 2|X(f0)|².

Concept Question 2-4: Provide three applications of the Fourier transform.

Concept Question 2-5: Provide an application of the sinc function.
Figure 2-5 Spectra of two vowel sounds: (a) "oo" as in "cool"; (b) "ah" as in "Bach".

Exercise 2-5: A square wave x(t) has the Fourier series expansion

x(t) = sin(t) + (1/3) sin(3t) + (1/5) sin(5t) + (1/7) sin(7t) + · · ·

Compute output y(t) if

x(t) → h(t) = 0.4 sinc(0.4t) → y(t).

Answer: y(t) = sin(t). (See IP)

Exercise 2-6: Compute the Fourier transform of (d/dt)[sinc(t)].

Answer:

F{(d/dt) sinc(t)} = { j2πf for |f| < 0.5,  0 for |f| > 0.5.

(See IP)
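Exercise 2-5 is a direct application of Eqs. (2.32) and (2.36): h(t) = 0.4 sinc(0.4t) is an ideal lowpass filter with cutoff fc = 0.2 Hz, which passes only the sin(t) term (frequency 1/(2π) ≈ 0.16 Hz). A MATLAB sketch of this check (ours; truncating the grid introduces small edge effects, hence the long record):

dt = 1e-2; t = -100:dt:100;                  % long grid to limit edge effects
x = sin(t) + sin(3*t)/3 + sin(5*t)/5 + sin(7*t)/7;
h = 0.4*sinc(0.4*t);                         % ideal LPF, cutoff fc = 0.2 Hz
y = conv(x, h, 'same')*dt;                   % y(t) = x(t)*h(t), Eq. (2.15)
plot(t, y, t, sin(t), '--'), xlim([-20 20])  % y(t) is essentially sin(t)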
2-4 The Sampling Theorem

The sampling theorem is an operational cornerstone of both discrete-time 1-D signal processing and discrete-space 2-D image processing.

2-4.1 Sampling Theorem Statement

The samples {x(nΔ)} of a signal x(t) sampled every Δ seconds are

{x(nΔ), n = . . . , −2, −1, 0, 1, 2, . . .}.    (2.41)

The inverse of the sampling interval Δ is the sampling rate S = 1/Δ samples per second. The sampling rate has the same dimension as Hz, and is often expressed in "Hz." For example, the standard sampling rate for CDs is 44100 samples/second, often stated as 44100 Hz. The corresponding sampling interval is Δ = 1/44100 s = 22.676 µs.

A signal x(t) is bandlimited to a maximum frequency of B (extending from −B to B), measured in Hz, if its Fourier transform X(f) = 0 for |f| > B. Although real-world signals are seldom truly bandlimited, their spectra are often negligible above some frequency B.

◮ The sampling theorem states that if

X(f) = 0 for |f| > B,

and if

x(t) is sampled at a sampling rate of S samples/s,

then

x(t) can be reconstructed exactly from {x(nΔ), n = . . . , −2, −1, 0, 1, 2, . . .}, provided S > 2B. ◭

The sampling rate must exceed double the maximum frequency in the spectrum X(f) of x(t). The minimum (actually an infimum) sampling rate 2B samples/second is called the Nyquist rate, and the frequency 2B is called the Nyquist frequency.
2-4.2 Sampling Theorem Derivation

A. The Sampled Signal xs(t)

Given a signal x(t), we construct the sampled signal xs(t) by multiplying x(t) by the impulse train

δs(t) = ∑_{n=−∞}^{∞} δ(t − nΔ).    (2.42)

That is,

xs(t) = x(t) δs(t) = ∑_{n=−∞}^{∞} x(t) δ(t − nΔ) = ∑_{n=−∞}^{∞} x(nΔ) δ(t − nΔ).    (2.43)

B. Spectrum of the sampled signal xs(t)

Using Fourier series, it can be shown that the Fourier transform of the impulse train δs(t) is itself an impulse train in frequency:

Δ ∑_{n=−∞}^{∞} δ(t − nΔ) ↔ ∑_{k=−∞}^{∞} δ(f − k/Δ).    (2.44)

This result can be interpreted as follows. A periodic signal has a discrete spectrum (zero except at specific frequencies) given by the signal's Fourier series expansion. By Fourier duality, a discrete signal (zero except at specific times) such as xs(t) has a periodic spectrum. So a discrete and periodic signal such as δs(t) has a spectrum that is both discrete and periodic.

Multiplying Eq. (2.44) by x(t), using the definition for xs(t) given by Eq. (2.43), and applying property #9 in Table 2-4 leads to

Δ xs(t) ↔ X(f) ∗ ∑_{k=−∞}^{∞} δ(f − k/Δ),    (2.45)

which, using property #6 of Table 2-3, simplifies to

Δ xs(t) ↔ ∑_{k=−∞}^{∞} X(f − k/Δ).    (2.46)

Dividing by Δ and recalling that S = 1/Δ gives

xs(t) ↔ S ∑_{k=−∞}^{∞} X(f − kS).    (2.47)

The spectrum Xs(f) of xs(t) consists of a superposition of copies of the spectrum X(f) of x(t), repeated every S = 1/Δ and multiplied by S. If these copies do not overlap in frequency, we may then recover X(f) from Xs(f) using a lowpass filter, provided S > 2B [see Fig. 2-6(a)].

Figure 2-6 Sampling a signal x(t) with maximum frequency B at a rate of S makes X(f) change amplitude to S X(f) and repeat in f with period S. These copies (a) do not overlap if S > 2B, but (b) they do if S < 2B.

2-4.3 Aliasing

If the sampling rate S does not exceed 2B, the copies of X(f) will overlap one another, as shown in Fig. 2-6(b). This is called an aliased condition, the consequence of which is that the reconstructed signal will no longer match the original signal x(t).

were sampled at 20 samples/s. Generate plots for

(a) x1(t) and its sampled version x1s(t),

(b) x2(t) and its sampled version x2s(t),

(c) spectra X1( f ) and X1s( f ), and

(d) spectra X2( f ) and X2s( f ).

Solution: (a) Figure 2-7(a) displays x1(t) and x1s(t), with the latter generated by sampling x1(t) at S = 20 samples/s. The applicable bandwidth of x1(t) is B1 = 2 Hz. Since S > 2B1, it should be possible to reconstruct x1(t) from x1s(t), which we demonstrate in a later subsection.

Similar plots are displayed in Fig. 2-7(b) for the 12 Hz sinusoid. In this latter case, B = 12 Hz and S = 20 samples/s. Hence, S < 2B.

(b) Spectrum X1( f ) of x1(t) consists of two impulses at ±2 Hz, as shown in Fig. 2-8(a). The spectrum of the sampled version consists of the same spectrum X1( f ) of x1(t), scaled by the factor S, plus additional copies repeated every ±S = 20 Hz (Fig. 2-8(b)). Note that the central spectrum in Fig. 2-8(b), corresponding to S X1( f ), does not overlap with the neighboring copies.

(c) Spectra X2( f ) and X2s( f ) are shown in Fig. 2-9. Because S < 2B, the central spectrum overlaps with its two neighbors.

2-4.4 Sampling Theorem Implementation

A. Physical Lowpass Filter

If S > 2B, the original signal x(t) can be recovered from the sampled signal xs(t) by subjecting the latter to a lowpass filter that passes frequencies below B with a gain of 1/S (to compensate for the factor of S induced by sampling, as noted in Eq. (2.47)) and rejects frequencies greater than (S − B) Hz:

H( f ) = { 1/S for | f | < B,
        { 0   for | f | > S − B.    (2.48)

This type of filter must be implemented using a physical circuit. For example, a Butterworth filter can be constructed by connecting op-amps, capacitors, and resistors in a series of Sallen-Key configurations.† This is clearly impractical for image processing.

† Ulaby and Yagle, Signals and Systems: Theory and Applications, pp. 296–297.

B. Sinc Interpolation Formula

Mathematically, we can use an ideal lowpass filter with a cutoff frequency anywhere between B and S − B. It is customary to use S/2 as the cutoff frequency, since it is halfway between B and S − B, so as to provide a safety margin for avoiding aliasing if the actual maximum frequency of X( f ) exceeds B but is less than S/2. The frequency response of this ideal lowpass filter is, from Eq. (2.48),

H( f ) = { 1/S for | f | < S/2,
        { 0   for | f | > S/2.    (2.49)

Setting f0 = S in entry #4 of Table 2-5, the impulse response is found to be

h(t) = (1/S)[S sinc(St)] = sinc(St).   (2.50)

Using the convolution property x(t) ∗ δ(t − τ) = x(t − τ) [see property #6 in Table 2-3], we can derive the following sinc interpolation formula:

x(t) = xs(t) ∗ h(t) = ∑_{n=−∞}^{∞} x(n∆) δ(t − n∆) ∗ h(t) = ∑_{n=−∞}^{∞} x(n∆) sinc(S(t − n∆)).   (2.51)

In principle, this formula can be used to reconstruct x(t) for any time t from its samples {x(n∆)}. But since it requires an infinite number of samples {x(n∆)} to reconstruct x(t), it is of theoretical interest only.

C. Reconstruction of X( f ) from Samples {x(n∆)}

According to Eq. (2.43), the sampled signal xs(t) is given by

xs(t) = ∑_{n=−∞}^{∞} x(n∆) δ(t − n∆).   (2.52)

Application of property #6 in Table 2-5 yields

Xs( f ) = ∑_{n=−∞}^{∞} x(n∆) e^{−j2π f n∆}.   (2.53)
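The sinc interpolation formula of Eq. (2.51) is easy to test numerically by truncating the infinite sum to a finite window of samples. The following MATLAB sketch (an illustration, not part of the original text; it assumes the Signal Processing Toolbox function sinc(x) = sin(πx)/(πx)) reconstructs the 2 Hz sinusoid of Example 2-1 from its samples:

    % Truncated sinc interpolation, Eq. (2.51), for x(t) = cos(4*pi*t):
    S = 20; Dt = 1/S;                  % sampling rate and interval
    n = -100:100;                      % finite window of sample indices
    xn = cos(4*pi*n*Dt);               % samples x(n*Dt) of the 2 Hz sinusoid
    t = 0:0.001:1;                     % dense time grid for reconstruction
    xhat = zeros(size(t));
    for k = 1:length(n)
        xhat = xhat + xn(k)*sinc(S*(t - n(k)*Dt));
    end
    max(abs(xhat - cos(4*pi*t)))       % small; limited only by truncating the sum

The residual error shrinks as the window of samples grows, consistent with the observation that exact reconstruction requires infinitely many samples.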

[Figure 2-7 (not reproduced): (a) the 2 Hz sinusoid x1(t) and its sampled version x1s(t); (b) the 12 Hz sinusoid x2(t) and its sampled version x2s(t). Sampling rate S = 20 samples/s.]

[Figure 2-8 (not reproduced): spectra (a) X1( f ) and (b) X1s( f ) of the 2 Hz sinusoid and its sampled version, respectively. X1s( f ) consists of X1( f ) scaled by S = 20, plus copies thereof at integer multiples of ±20 Hz; the copies do not overlap, so there is no aliasing. The vertical axes denote areas under the impulses.]

[Figure 2-9 (not reproduced): spectra (a) X2( f ) and (b) X2s( f ) of the 12 Hz sinusoid and its sampled version. The central spectrum S X2( f ) overlaps with its neighboring copies at ±20 Hz, indicating aliasing. The vertical axes denote areas under the impulses.]
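The overlap in Fig. 2-9 has a simple time-domain counterpart: at S = 20 samples/s, the samples of the 12 Hz sinusoid are indistinguishable from those of an 8 Hz sinusoid, since 12 − 20 = −8 Hz. A quick MATLAB check (an illustrative sketch, not part of the original text):

    % Aliasing: a 12 Hz sinusoid sampled at S = 20 samples/s yields the
    % same samples as an 8 Hz sinusoid (12 Hz folds to |12 - 20| = 8 Hz).
    S = 20; n = 0:19;                % one second of samples
    x12 = cos(2*pi*12*n/S);          % samples of the 12 Hz sinusoid
    x8  = cos(2*pi*8*n/S);           % samples of an 8 Hz sinusoid
    max(abs(x12 - x8))               % = 0 (to machine precision)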

Note that Xs( f ) is periodic in f with period 1/∆, as it should be. In the absence of aliasing,

X( f ) = (1/S) Xs( f ) for | f | < S/2.   (2.54)

The relationship given by Eq. (2.53) still requires an infinite number of samples {x(n∆)} to reconstruct X( f ) at each frequency f.

D. Nearest-Neighbor (NN) Interpolation

A common procedure for computing an approximation to x(t) from its samples {x(n∆)} is nearest-neighbor interpolation. The signal x(t) is approximated by x̂(t):

x̂(t) = { x(n∆)        for (n − 0.5)∆ < t < (n + 0.5)∆,
       { x((n + 1)∆)   for (n + 0.5)∆ < t < (n + 1.5)∆,
       { ...                                              (2.55)

So x̂(t) is a piecewise-constant approximation to x(t), and it is related to the sampled signal xs(t) by

x̂(t) = xs(t) ∗ rect(t/∆).   (2.56)

Using the Fourier transform of a rectangle function (entry #4 in Table 2-5), the spectrum X̂( f ) of x̂(t) is

X̂( f ) = Xs( f ) ∆ sinc(∆ f ),   (2.57)

where Xs( f ) is the spectrum of the sampled signal. The zero-crossings of the sinc function occur at frequencies f = k/∆ = kS for integers k. These are also the centers of the copies of the original spectrum X( f ) induced by sampling. So these copies are attenuated if the maximum frequency B of X( f ) is such that B ≪ S. The factor ∆ in Eq. (2.57) cancels the factor S = 1/∆ in Eq. (2.47).

Example 2-2: Reconstruction of 2 Hz Sinusoid

For the 2 Hz sinusoid of Example 2-1: (a) plot the spectrum X̂1( f ) of the approximated reconstruction x̂1(t), and (b) apply nearest-neighbor interpolation to generate x̂1(t).

Solution: (a) Spectrum X1s( f ) of the sampled version of the 2 Hz sinusoid was generated earlier in Example 2-1 and displayed in Fig. 2-8(b). To obtain the spectrum of the approximate reconstruction x̂1(t), we apply Eq. (2.57) with ∆ = 1/S = 0.05 s:

X̂1( f ) = X1s( f ) ∆ sinc(∆ f ).

The sinc function is displayed in Fig. 2-10(a) in red and uses the vertical scale on the right-hand side, and the spectrum X̂1( f ) is displayed in blue using the vertical scale on the left-hand side. The sinc function preserves the spectral components at ±2 Hz, but attenuates the components centered at ±20 Hz by a factor of 10 (approximately).

(b) Application of Eq. (2.56) to x1(n∆) = cos(4πn∆) with ∆ = 1/20 s yields the plot of x̂1(t) shown in Fig. 2-10(b).

Concept Question 2-6: Why must the sampling rate of a signal exceed double its maximum frequency, if it is to be reconstructed from its samples?

Concept Question 2-7: Why does nearest-neighbor interpolation work as well as it does?

Exercise 2-7: What is the Nyquist sampling rate for a signal bandlimited to 4 kHz?

Answer: 8000 samples/s. (See IP)

Exercise 2-8: A 500 Hz sinusoid is sampled at 900 samples/s. No anti-alias filter is being used. What is the frequency of the reconstructed continuous-time sinusoid?

Answer: 400 Hz. (See IP)

2-5 Review of 1-D Discrete-Time Signals and Systems

Through direct generalizations of the 1-D continuous-time definitions and properties of signals and systems presented earlier, we now extend our review to their discrete counterparts.

2-5.1 Discrete-Time Notation

A discrete-time signal is a physical quantity—such as voltage or acoustic pressure—that varies with discrete time n, where n is a dimensionless integer. Mathematically, a discrete-time signal is a function x[n] of discrete time n.

[Figure 2-10 (not reproduced): plots of Example 2-2. (a) Spectrum X̂1( f ) of the 2 Hz signal x1(t), with the sinc function sinc(0.05 f ) shown in red (right-hand scale). (b) The 2 Hz sinusoid reconstructed from its samples at 20 Hz using nearest-neighbor interpolation, with the original sinusoid shown in red.]
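A nearest-neighbor reconstruction like that of Fig. 2-10(b) can be generated in a few lines of MATLAB (an illustrative sketch, not part of the original text):

    % Nearest-neighbor (piecewise-constant) reconstruction, Eqs. (2.55)-(2.56):
    S = 20; Dt = 1/S; n = 0:S;                 % one second of samples
    xn = cos(4*pi*n*Dt);                       % samples of the 2 Hz sinusoid
    t = 0:0.001:1;                             % dense time grid
    xhat = interp1(n*Dt, xn, t, 'nearest');    % piecewise-constant approximation
    plot(t, xhat, t, cos(4*pi*t));             % compare with Fig. 2-10(b)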



• Discrete-time signals x[n] use square brackets, whereas continuous-time signals x(t) use parentheses.

• t has units of seconds, while n is dimensionless.

Discrete-time signals x[n] usually result from sampling a continuous-time signal x(t) at integer multiples of a sampling interval of ∆ seconds. That is,

• x[n] = x(n∆) for n = {. . . , −2, −1, 0, 1, 2, . . .}.

Discrete-time signals are often represented using bracket notation, and plotted using stem plots. For example, the discrete-time signal x[n] defined by

x[n] = { 3 for n = −1,
       { 2 for n = 0,
       { 4 for n = 2,
       { 0 for all other n,    (2.58)

can be depicted using either the bracket notation

x[n] = {3, 2, 0, 4},   (2.59)

where the underlined value is the value at time n = 0, or in the form of the stem plot shown in Fig. 2-11.

[Figure 2-11 (not reproduced): stem plot representation of x[n].]

The support of this x[n] is the interval [−1, 2], and its duration is 2 − (−1) + 1 = 4. In general, a discrete-time signal with support [a, b] has duration b − a + 1.

A discrete-time (Kronecker) impulse δ[n] is defined as

δ[n] = {1} = { 1 for n = 0,
             { 0 for n ≠ 0.    (2.60)

Unlike the continuous-time impulse, the discrete-time impulse has no issues about infinite height and zero width. The sifting property of impulses still holds, with a summation replacing the integral:

∑_{i=−∞}^{∞} x[i] δ[n − i] = x[n].   (2.61)

2-5.2 Discrete-Time Eternal Sinusoids

A discrete-time eternal sinusoid is defined as

x[n] = A cos(Ω0 n + θ), −∞ < n < ∞,   (2.62)

where Ω0 is the discrete-time frequency with units of radians per sample, so it is dimensionless.

Comparing the discrete-time eternal sinusoid to the continuous-time eternal sinusoid given by Eq. (2.1), which we repeat here as

x(t) = A cos(2π f0 t + θ), −∞ < t < ∞,   (2.63)

it is apparent that a discrete-time sinusoid can be viewed as a continuous-time sinusoid sampled every ∆ seconds, at a sampling rate of S = 1/∆ samples/s. Thus,

Ω0 = 2π f0 ∆ = 2π f0 /S,   (2.64)

which confirms that Ω0, like n, is dimensionless. However, almost all discrete-time eternal sinusoids are nonperiodic! In fact, x[n] is periodic only if

2π/Ω0 = N/D,   (2.65)

with N/D being a rational number. In such a case, the fundamental period of the sinusoid is N, provided N/D has been reduced to lowest terms.

Example 2-3: Discrete Sinusoid

Compute the fundamental period of

x[n] = 3 cos(0.3πn + 2).

Solution: From the expression for x[n], we deduce that Ω0 = 0.3π. Hence,

2π/Ω0 = 2π/(0.3π) = 20/3,

reduced to lowest terms. Therefore, the period is N = 20.
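The result of Example 2-3 is easy to verify numerically (an illustrative sketch, not part of the original text):

    % Period check for Example 2-3: x[n] = 3 cos(0.3*pi*n + 2) has period N = 20.
    n = 0:59;
    x = 3*cos(0.3*pi*n + 2);
    max(abs(x(1:20) - x(21:40)))    % = 0 (to machine precision): x[n+20] = x[n]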

Another important property of discrete-time eternal sinusoids is that the discrete-time frequency Ω0 is periodic, which is not true for continuous-time sinusoids. For any integer k, Eq. (2.62) can be rewritten as

x[n] = A cos(Ω0 n + θ), −∞ < n < ∞
     = A cos((Ω0 + 2πk)n + θ)
     = A cos(Ω′0 n + θ), −∞ < n < ∞,   (2.66)

with

Ω′0 = Ω0 + 2πk.   (2.67)

Also, the nature of the variation of x[n] with n has a peculiar dependence on Ω0. Consider, for example, the sinusoid

x[n] = cos(Ω0 n), −∞ < n < ∞,   (2.68)

where, for simplicity, we assigned it an amplitude of 1 and a phase angle θ = 0. Next, let us examine what happens as we increase Ω0 from a value slightly greater than zero to 2π. Initially, as Ω0 is increased, x[n] oscillates faster and faster, until x[n] reaches a maximum rate of oscillation at Ω0 = π, namely

x[n] = cos(πn) = (−1)^n at (Ω0 = π).   (2.69)

At Ω0 = π, x[n] oscillates as a function of n between (−1) and (+1). As Ω0 is increased beyond π, oscillation slows down and then stops altogether when Ω0 reaches 2π:

x[n] = cos(2πn) = 1 at (Ω0 = 2π).   (2.70)

Beyond Ω0 = 2π, the oscillatory behavior starts to increase again, and so on. This behavior has no equivalence in the world of continuous-time sinusoids.

2-5.3 1-D Discrete-Time Systems

A 1-D discrete-time system accepts an input x[n] and produces an output y[n]:

x[n] → SYSTEM → y[n].

The definition of LTI for discrete-time systems is identical to the definition of LTI for continuous-time systems. If a discrete-time system has impulse response h[n], then the output y[n] can be computed from the input x[n] using the discrete-time convolution

y[n] = h[n] ∗ x[n] = ∑_{i=−∞}^{∞} h[i] x[n − i].   (2.71a)
(discrete-time convolution)

Most of the continuous-time properties of convolution also apply in discrete time.

Real-world signals and filters are defined over specified ranges of n (and set to zero outside those ranges). If h[n] has support in the interval [n1, n2], Eq. (2.71a) becomes

y[n] = h[n] ∗ x[n] = ∑_{i=n1}^{n2} h[i] x[n − i].   (2.71b)

Reversing the sequence of h[n] and x[n] leads to the same outcome. That is, if x[n] has support in the interval [n3, n4], then

y[n] = h[n] ∗ x[n] = ∑_{i=n3}^{n4} x[i] h[n − i].   (2.71c)

◮ The duration of the convolution of two signals of durations N1 and N2 is Nc = N1 + N2 − 1, not N1 + N2. Since h[n] is of length N1 = n2 − n1 + 1 and x[n] is of length N2 = n4 − n3 + 1, the length of the convolution y[n] is Nc = N1 + N2 − 1 = (n2 − n1) + (n4 − n3) + 1. ◭

For causal signals (x[n] and h[n] equal to zero for n < 0), y[n] assumes the form

y[n] = ∑_{i=0}^{n} x[i] h[n − i], n ≥ 0.   (2.71d)
(causal)

For example,

{1, 2} ∗ {3, 4} = {3, 10, 8}.   (2.72)

The duration of the output is 2 + 2 − 1 = 3.

2-5.4 Discrete-Time Convolution Properties

With one notable difference, the properties of the discrete-time convolution are the same as those for continuous time.

If (t) is replaced with [n] and integrals are replaced with sums, the convolution properties listed in Table 2-3 lead to those listed in Table 2-6.

Table 2-6 Comparison of convolution properties for continuous-time and discrete-time signals.

Property                     Continuous Time                                                  Discrete Time
Definition                   y(t) = h(t) ∗ x(t) = ∫_{−∞}^{∞} h(τ) x(t − τ) dτ                  y[n] = h[n] ∗ x[n] = ∑_{i=−∞}^{∞} h[i] x[n − i]
1. Commutative               x(t) ∗ h(t) = h(t) ∗ x(t)                                        x[n] ∗ h[n] = h[n] ∗ x[n]
2. Associative               [g(t) ∗ h(t)] ∗ x(t) = g(t) ∗ [h(t) ∗ x(t)]                      [g[n] ∗ h[n]] ∗ x[n] = g[n] ∗ [h[n] ∗ x[n]]
3. Distributive              x(t) ∗ [h1(t) + · · · + hN(t)] = x(t) ∗ h1(t) + · · · + x(t) ∗ hN(t)    x[n] ∗ [h1[n] + · · · + hN[n]] = x[n] ∗ h1[n] + · · · + x[n] ∗ hN[n]
4. Causal ∗ Causal = Causal  y(t) = u(t) ∫_{0}^{t} h(τ) x(t − τ) dτ                            y[n] = u[n] ∑_{i=0}^{n} h[i] x[n − i]
5. Time-Shift                h(t − T1) ∗ x(t − T2) = y(t − T1 − T2)                           h[n − a] ∗ x[n − b] = y[n − a − b]
6. Sampling                  x(t) ∗ δ(t − T) = x(t − T)                                       x[n] ∗ δ[n − a] = x[n − a]
7. Width                     width of y(t) = width of x(t) + width of h(t)                    width of y[n] = width of x[n] + width of h[n] − 1
8. Area                      area of y(t) = area of x(t) × area of h(t)                       ∑_{n=−∞}^{∞} y[n] = (∑_{n=−∞}^{∞} h[n])(∑_{n=−∞}^{∞} x[n])
9. Convolution with Step     y(t) = x(t) ∗ u(t) = ∫_{−∞}^{t} x(τ) dτ                          x[n] ∗ u[n] = ∑_{i=−∞}^{n} x[i]

The notable difference is associated with property #7. In discrete time, the width (duration) of a signal that is zero-valued outside interval [a, b] is b − a + 1, not b − a. Consider two signals, h[n] and x[n], defined as follows:

Signal   From    To      Duration
h[n]     a       b       b − a + 1
x[n]     c       d       d − c + 1
y[n]     a + c   b + d   (b + d) − (a + c) + 1

where y[n] = h[n] ∗ x[n]. Note that the duration of y[n] is

(b + d) − (a + c) + 1 = (b − a + 1) + (d − c + 1) − 1
                      = duration of h[n] + duration of x[n] − 1.

2-5.5 Delayed-Impulses Computation Method

For finite-duration signals, computation of the convolution sum can be facilitated by expressing one of the signals as a linear combination of delayed impulses. The process is enabled by the sampling property (#6 in Table 2-6).

Consider, for example, the convolution sum of the two signals x[n] = {2, 3, 4} and h[n] = {5, 6, 7}, namely

y[n] = x[n] ∗ h[n] = {2, 3, 4} ∗ {5, 6, 7}.

The sampling property allows us to express x[n] in terms of impulses,

x[n] = 2δ[n] + 3δ[n − 1] + 4δ[n − 2],

which leads to

y[n] = (2δ[n] + 3δ[n − 1] + 4δ[n − 2]) ∗ h[n] = 2h[n] + 3h[n − 1] + 4h[n − 2].

Given that both x[n] and h[n] are of duration 3, the duration of their convolution is 3 + 3 − 1 = 5, and it extends from n = 0 to n = 4. Computing y[0] using the delayed-impulses method (while keeping in mind that h[i] has a non-zero value for only i = 0, 1, and 2) leads to

y[0] = 2h[0] + 3h[−1] + 4h[−2] = 2 × 5 + 3 × 0 + 4 × 0 = 10.

The process can then be repeated to obtain the values of y[n] for n = 1, 2, 3, and 4.

Example 2-4: Discrete-Time Convolution

Given x[n] = {2, 3, 4} and h[n] = {5, 6, 7}, compute

y[n] = x[n] ∗ h[n]

by (a) applying the sum definition and (b) graphically.

Solution: (a) Both signals have a length of 3 and start at time zero. That is, x[0] = 2, x[1] = 3, x[2] = 4, and x[i] = 0 for all other values of i. Similarly, h[0] = 5, h[1] = 6, h[2] = 7, and h[i] = 0 for all other values of i.

By Eq. (2.71d), the convolution sum of x[n] and h[n] is

y[n] = x[n] ∗ h[n] = ∑_{i=0}^{n} x[i] h[n − i].

Since h[i] = 0 for all values of i except i = 0, 1, and 2, it follows that h[n − i] = 0 for all values of i except for i = n, n − 1, and n − 2. With this constraint in mind, we can apply Eq. (2.71d) at discrete values of n, starting at n = 0:

y[0] = ∑_{i=0}^{0} x[i] h[0 − i] = x[0] h[0] = 2 × 5 = 10,

y[1] = ∑_{i=0}^{1} x[i] h[1 − i] = x[0] h[1] + x[1] h[0] = 2 × 6 + 3 × 5 = 27,

y[2] = ∑_{i=0}^{2} x[i] h[2 − i] = x[0] h[2] + x[1] h[1] + x[2] h[0] = 2 × 7 + 3 × 6 + 4 × 5 = 52,

y[3] = ∑_{i=1}^{2} x[i] h[3 − i] = x[1] h[2] + x[2] h[1] = 3 × 7 + 4 × 6 = 45,

y[4] = ∑_{i=2}^{2} x[i] h[4 − i] = x[2] h[2] = 4 × 7 = 28,

y[n] = 0, otherwise.

Hence,

y[n] = {10, 27, 52, 45, 28}.

(b) The convolution sum can be computed graphically through a four-step process.

Step 1: Replace index n with index i and plot x[i] and h[−i], as shown in Fig. 2-12(a). Signal h[−i] is obtained from h[i] by reflecting it about the vertical axis.

Step 2: Superimpose x[i] and h[−i], as in Fig. 2-12(b), then multiply and sum them. The sum of their products is 10, so y[0] = 10.

Step 3: Shift h[−i] to the right by 1 to obtain h[1 − i], as shown in Fig. 2-12(c). Multiplication and summation of x[i] by h[1 − i] generates y[1] = 27. Shift h[1 − i] by one more unit to the right to obtain h[2 − i], and then repeat the multiplication and summation process to obtain y[2]. Continue the shifting and multiplication and summation processes until the two signals no longer overlap.

Step 4: Use the values of y[n] obtained in step 3 to generate a plot of y[n], as shown in Fig. 2-12(g):

y[n] = {10, 27, 52, 45, 28}.

Concept Question 2-8: Why are most discrete-time sinusoids not periodic?

Concept Question 2-9: Why is the length of the convolution of two discrete-time signals not equal to the sum of the lengths of the two signals?

Exercise 2-9: A 28 Hz sinusoid is sampled at 100 samples/s. What is Ω0 for the resulting discrete-time sinusoid? What is the period of the resulting discrete-time sinusoid?

Answer: Ω0 = 0.56π; N = 25. (See IP)
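The linear convolution of Example 2-4 can be confirmed in one line of MATLAB, since conv implements the convolution sum of Eq. (2.71a) for finite-duration signals (an illustrative sketch):

    % Linear convolution of x[n] = {2, 3, 4} and h[n] = {5, 6, 7}:
    y = conv([2 3 4], [5 6 7])      % = [10 27 52 45 28], as in Example 2-4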

[Figure 2-12 (not reproduced): graphical computation of the convolution sum. Panels (a)–(f) superimpose x[i] on the successively shifted reflections h[−i], h[1 − i], . . . , h[4 − i], yielding y[0] = 2 × 5 = 10, y[1] = 2 × 6 + 3 × 5 = 27, y[2] = 2 × 7 + 3 × 6 + 4 × 5 = 52, y[3] = 3 × 7 + 4 × 6 = 45, and y[4] = 4 × 7 = 28. Panel (g) is the stem plot of y[n].]



Exercise 2-10: Compute the output y[n] of a discrete-time LTI system with impulse response h[n] and input x[n], where h[n] = {3, 1} and x[n] = {1, 2, 3, 4}.

Answer: {3, 7, 11, 15, 4}. (See IP)

2-6 Discrete-Time Fourier Transform (DTFT)

The discrete-time Fourier transform (DTFT) is the discrete-time counterpart to the Fourier transform. It has the same two functions: (1) to compute spectra of signals and (2) to analyze the frequency responses of LTI systems.

2-6.1 Definition of the DTFT

The DTFT of x[n], denoted X(Ω), and its inverse are defined as

X(Ω) = ∑_{n=−∞}^{∞} x[n] e^{−jΩn}   (2.73a)

and

x[n] = (1/2π) ∫_{−π}^{π} X(Ω) e^{jΩn} dΩ.   (2.73b)

Readers familiar with the Fourier series will recognize that the DTFT X(Ω) is a Fourier series expansion with x[n] as the coefficients of the Fourier series. The inverse DTFT is simply the formula used for computing the coefficients x[n] of the Fourier series expansion of the periodic function X(Ω).

We note that the DTFT definition given by Eq. (2.73a) is the same as the formula given by Eq. (2.53) for computing the spectrum Xs( f ) of a continuous-time signal x(t) directly from its samples {x(n∆)}, with Ω = 2π f ∆.

The inverse DTFT given by Eq. (2.73b) can be derived as follows. First, we introduce the orthogonality property

(1/2π) ∫_{−π}^{π} e^{jΩ(m−n)} dΩ = δ[m − n].   (2.74)

To establish the validity of this property, we consider two cases, namely: (1) when m ≠ n and (2) when m = n.

(a) m ≠ n

Evaluation of the integral in Eq. (2.74) leads to

(1/2π) ∫_{−π}^{π} e^{jΩ(m−n)} dΩ = [e^{jΩ(m−n)} / (2πj(m − n))] evaluated from Ω = −π to Ω = π
                                = (e^{jπ(m−n)} − e^{−jπ(m−n)}) / (j2π(m − n))
                                = ((−1)^{m−n} − (−1)^{m−n}) / (j2π(m − n))
                                = 0, (m ≠ n).   (2.75)

(b) m = n

If m = n, the integral reduces to

(1/2π) ∫_{−π}^{π} e^{jΩ(n−n)} dΩ = (1/2π) ∫_{−π}^{π} 1 dΩ = 1.   (2.76)

The results given by Eqs. (2.75) and (2.76) can be combined into the definition of the orthogonality property given by Eq. (2.74).

Having verified the validity of the orthogonality property, we now use it to derive Eq. (2.73b). Upon multiplying the definition for the DTFT given by Eq. (2.73a) by (1/2π) e^{jΩm} and integrating over Ω, we have

(1/2π) ∫_{−π}^{π} X(Ω) e^{jΩm} dΩ = (1/2π) ∫_{−π}^{π} ∑_{n=−∞}^{∞} x[n] e^{jΩ(m−n)} dΩ
                                  = ∑_{n=−∞}^{∞} x[n] [(1/2π) ∫_{−π}^{π} e^{jΩ(m−n)} dΩ]
                                  = ∑_{n=−∞}^{∞} x[n] δ[m − n] = x[m].   (2.77)

Equation (2.74) was used in the final step leading to Eq. (2.77). Exchanging the order of integration and summation in Eq. (2.77) is acceptable if the summand is absolutely summable; i.e., if the DTFT is defined. Finally, replacing the index m with n on the left-hand side and in the final right-hand side of Eq. (2.77) yields the inverse DTFT expression given by Eq. (2.73b).
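The DTFT pair of Eq. (2.73) can also be checked numerically by evaluating X(Ω) on a dense grid and approximating the inverse integral by a trapezoidal sum. A minimal MATLAB sketch (illustrative only, not part of the original text; the short signal {3, 1, 4} is the same example used later in Section 2-7):

    % DTFT of x[n] = {3, 1, 4} (n = 0, 1, 2) and numerical inverse at n = 1:
    x = [3 1 4];  n = 0:2;
    W = linspace(-pi, pi, 20001);            % dense grid over one period
    X = x * exp(-1j * n' * W);               % X(Omega) = sum_n x[n] e^{-j*Omega*n}
    x1 = trapz(W, X .* exp(1j*W)) / (2*pi)   % ~ x[1] = 1, per Eq. (2.73b)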

2-6.2 Properties of the DTFT

The DTFT can be regarded as the Fourier transform of the sampled signal xs(t) with a sampling interval ∆ = 1:

X(Ω) = F{ ∑_{n=−∞}^{∞} x[n] δ(t − n) }.   (2.78)

This statement can be verified by subjecting Eq. (2.73a) to the time-shift property of the Fourier transform (#3 in Table 2-4). Consequently, most (but not all) of the Fourier transform properties listed in Table 2-4 extend directly to the DTFT with 2π f replaced with Ω, which we list here in Table 2-7. The exceptions mostly involve the following property of the DTFT:

◮ The DTFT X(Ω) is periodic with period 2π. ◭

Table 2-7 Properties of the DTFT.

Property                x[n]              X(Ω)
1. Linearity            ∑ ci xi[n]        ∑ ci Xi(Ω)
2. Time shift           x[n − n0]         X(Ω) e^{−jn0Ω}
3. Modulation           x[n] e^{jΩ0n}     X(Ω − Ω0)
4. Time reversal        x[−n]             X(−Ω)
5. Conjugation          x∗[n]             X∗(−Ω)
6. Time convolution     h[n] ∗ x[n]       H(Ω) X(Ω)

Special DTFT Relationships
7. Conjugate symmetry      X∗(Ω) = X(−Ω)
8. Zero frequency          X(0) = ∑_{n=−∞}^{∞} x[n]
9. Zero time               x[0] = (1/2π) ∫_{−π}^{π} X(Ω) dΩ
10. Ω = ±π                 X(±π) = ∑_{n=−∞}^{∞} (−1)^n x[n]
11. Rayleigh's (often called Parseval's) theorem   ∑_{n=−∞}^{∞} |x[n]|² = (1/2π) ∫_{−π}^{π} |X(Ω)|² dΩ

We also note the following special relationships between x[n] and X(Ω):

X(0) = ∑_{n=−∞}^{∞} x[n],   (2.79a)

x[0] = (1/2π) ∫_{−π}^{π} X(Ω) dΩ,   (2.79b)

and

X(±π) = ∑_{n=−∞}^{∞} (−1)^n x[n].   (2.79c)

If x[n] is real-valued, then conjugate symmetry holds:

X(Ω)∗ = X(−Ω).

Parseval's theorem for the DTFT states that the energy of x[n] is identical, whether computed in the discrete-time domain n or in the frequency domain Ω:

∑_{n=−∞}^{∞} |x[n]|² = (1/2π) ∫_{−π}^{π} |X(Ω)|² dΩ.   (2.80)

The energy spectral density is now (1/2π)|X(Ω)|².

Finally, by analogy to continuous time, a discrete-time ideal lowpass filter with cutoff frequency Ω0 has the frequency response for |Ω| < π (recall that H(Ω) is periodic with period 2π)

H(Ω) = { 1 for |Ω| < Ω0,
       { 0 for Ω0 < |Ω| ≤ π,    (2.81)

which eliminates frequency components of x[n] that lie in the range Ω0 < |Ω| ≤ π.

2-6.3 Important DTFT Pairs

For easy access, several DTFT pairs are provided in Table 2-8. In all cases, the expressions for X(Ω) are periodic with period 2π, as they should be.

Entries #7 and #8 of Table 2-8 deserve more discussion, which we now present.

Table 2-8 Discrete-time Fourier transform (DTFT) pairs.

x[n]                                         X(Ω)                                                                    Condition
1. δ[n]                                      1
1a. δ[n − m]                                 e^{−jmΩ}                                                                m = integer
2. 1                                         2π ∑_{k=−∞}^{∞} δ(Ω − 2πk)
3. e^{jΩ0n}                                  2π ∑_{k=−∞}^{∞} δ(Ω − Ω0 − 2πk)
4. cos(Ω0 n)                                 π ∑_{k=−∞}^{∞} [δ(Ω − Ω0 − 2πk) + δ(Ω + Ω0 − 2πk)]
5. sin(Ω0 n)                                 (π/j) ∑_{k=−∞}^{∞} [δ(Ω − Ω0 − 2πk) − δ(Ω + Ω0 − 2πk)]
6. a^n cos(Ω0 n + θ) u[n]                    [e^{j2Ω} cos θ − a e^{jΩ} cos(Ω0 − θ)] / [e^{j2Ω} − 2a e^{jΩ} cos Ω0 + a²]   |a| < 1
7. rect[n/N]                                 sin(Ω(N + 1/2)) / sin(Ω/2)                                              Ω ≠ 2πk
8. (Ω0/π) sinc((Ω0/π) n) = sin(Ω0 n)/(πn)    ∑_{k=−∞}^{∞} rect((Ω − 2πk)/(2Ω0))

A. Discrete-Time Sinc Functions

The impulse response of an ideal lowpass filter is

h[n] = (1/2π) ∫_{−Ω0}^{Ω0} 1 · e^{jΩn} dΩ = (Ω0/π) sinc((Ω0/π) n).   (2.82)

This is called a discrete-time sinc function. A discrete-time sinc function h[n] with Ω0 = π/4 is displayed in Fig. 2-13(a), along with its frequency response H(Ω) in Fig. 2-13(b). Such a filter is impractical for real-world applications, because it is unstable and it has infinite duration. To override these limitations, we can multiply h[n] by a window function, such as a Hamming window. The modified impulse response hFIR[n] is then given by

hFIR[n] = h[n] hHam[n]
        = { (Ω0/π) sinc((Ω0/π) n) [0.54 + 0.46 cos(πn/N)] for |n| ≤ N,
          { 0 for |n| > N,    (2.83)

where the first bracketed factor is h[n] and the second is hHam[n]. As can be seen in Fig. 2-13(c) and (d), hFIR[n] with N = 10 provides a good approximation to an ideal lowpass filter, with a finite duration. The Hamming-windowed filter belongs to a group of filters called finite-impulse response (FIR) filters. FIR filters can also be designed using a minimax criterion, resulting in an equiripple filter. This and other FIR filter design procedures are discussed in discrete-time signal processing textbooks.
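A filter like that of Eq. (2.83) can be sketched in a few lines of MATLAB (an illustration, not part of the original text; sinc here denotes the Signal Processing Toolbox function sin(πx)/(πx)):

    % Hamming-windowed FIR lowpass filter, Eq. (2.83), Omega_0 = pi/4, N = 10:
    W0 = pi/4;  N = 10;  n = -N:N;
    h    = (W0/pi) * sinc((W0/pi) * n);      % ideal lowpass impulse response
    hFIR = h .* (0.54 + 0.46*cos(pi*n/N));   % apply the Hamming window
    W = linspace(-pi, pi, 1001);             % evaluate the DTFT on a dense grid
    H = hFIR * exp(-1j * n' * W);
    plot(W, abs(H));                         % compare with Fig. 2-13(d)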

[Figure 2-13 (not reproduced): parts (a) and (b) show the impulse response h[n] and frequency response H(Ω) of an ideal lowpass filter with Ω0 = π/4; parts (c) and (d) show the impulse response hFIR[n] and spectrum HFIR(Ω) of the same filter after multiplying its impulse response with a Hamming window of length N = 10.]

B. Discrete Sinc Functions

A discrete-time rectangle function rect[n/N] is defined as

rect[n/N] = { 1 for |n| ≤ N,
            { 0 for |n| > N.    (2.84)

We note that rect[n/N] has duration 2N + 1. This differs from the continuous-time rect function rect(t/T), which has duration T. The DTFT of rect[n/N] is obtained from Eq. (2.73a) by setting x[n] = 1 and limiting the summation to the range (−N, N):

DTFT{rect[n/N]} = ∑_{n=−N}^{N} e^{−jΩn}.   (2.85)

Using the formula

∑_{k=0}^{N} r^k = (1 − r^{N+1})/(1 − r)   (2.86)

with r = e^{jΩ}, the summation in Eq. (2.85) becomes

∑_{n=−N}^{N} e^{−jΩn} = e^{−jΩN} ∑_{n=0}^{2N} e^{jΩn}
                      = e^{−jΩN} (1 − e^{jΩ(2N+1)})/(1 − e^{jΩ})
                      = sin((2N + 1)Ω/2)/sin(Ω/2).   (2.87)

This is called a discrete (or periodic) sinc function. A rectangular pulse with N = 10 is shown in Fig. 2-14 along with its DTFT.

Concept Question 2-10: Why does the DTFT share so many properties with the CTFT?

Concept Question 2-11: Why is the DTFT periodic in frequency?

[Figure 2-14 (not reproduced): the discrete-time rectangle function x[n] = rect(n/10) and its DTFT X(Ω), a discrete (periodic) sinc function with peak value 21.]

Exercise 2-11: Compute the DTFT of 4 cos(0.15πn + 1).

Answer:

DTFT[4 cos(0.15πn + 1)] = ∑_{k=−∞}^{∞} 4π e^{j1} δ(Ω − 0.15π − 2kπ) + ∑_{k=−∞}^{∞} 4π e^{−j1} δ(Ω + 0.15π − 2kπ).

(See IP)

Exercise 2-12: Compute the inverse DTFT of

4 cos(2Ω) + 6 cos(Ω) + j8 sin(2Ω) + j2 sin(Ω).

Answer:

DTFT⁻¹[4 cos(2Ω) + 6 cos(Ω) + j8 sin(2Ω) + j2 sin(Ω)] = {6, 4, 0, 2, −2}.

(See IP)

[Figure 2-15 (not reproduced): the DFT X[k] is a sampled version of the DTFT X(Ω).]

2-7 Discrete Fourier Transform (DFT)

The discrete Fourier transform (DFT) is the numerical bridge between the DTFT and the fast Fourier transform (FFT). For a signal {x[n], n = 0, . . . , M − 1} of duration M, its DFT of order N is X[k], where X[k] is X(Ω) sampled at the N frequencies Ω = {2πk/N, k = 0, . . . , N − 1}:

X[k] = X(Ω = 2πk/N), k = 0, . . . , N − 1.   (2.88)

An example is shown in Fig. 2-15. Usually N = M; i.e., the order N of the DFT is equal to the duration M of x[n]. However, in some situations, it is desirable to select N to be larger than M. For example, to compute and plot the DTFT of a short-duration signal such as x[n] = {3, 1, 4}, for which M = 3, we may choose N to be 256 or 512 so as to produce a smooth plot, such as the blue plot in Fig. 2-15. Choosing N > M will also allow the use of the FFT to compute convolutions quickly (see Section 2-7.2C). As a result, the properties of the DFT follow those of the DTFT, with some exceptions, such as time reversal and cyclic convolution (discussed later in Section 2-7.2C).
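In MATLAB, sampling the DTFT densely as in Fig. 2-15 amounts to computing a zero-padded FFT (an illustrative sketch, not part of the original text):

    % A 512-point DFT of x[n] = {3, 1, 4} samples its DTFT at 512 frequencies:
    x = [3 1 4];
    X = fft(x, 512);             % fft(x, N) zero-pads x[n] to length N
    k = 0:511;
    plot(2*pi*k/512, abs(X));    % |X(Omega)| at Omega = 2*pi*k/512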

◮ To avoid confusion between the DTFT and the DFT, the DFT, being a discrete function of integer k, uses square brackets, as in X[k], while the DTFT, being a continuous and periodic function of real numbers Ω, uses round parentheses, as in X(Ω). ◭

2-7.1 Definition of the DFT

The N-point (or Nth-order) DFT of {x[n], n = 0, . . . , M − 1}, denoted X[k], and the inverse DFT of X[k], namely x[n], are defined as

X[k] = ∑_{n=0}^{M−1} x[n] e^{−j2πnk/N}, k = 0, . . . , N − 1,   (2.89a)

and

x[n] = (1/N) ∑_{k=0}^{N−1} X[k] e^{j2πnk/N}, n = 0, . . . , M − 1.   (2.89b)

The definition for X[k] given by Eq. (2.89a) is obtained by applying Eq. (2.88) to Eq. (2.73a), namely by replacing Ω with 2πk/N and limiting the range of summation over n to [0, M − 1]. For the inverse DFT, the definition given by Eq. (2.89b) can be derived by following a process similar to that we presented earlier in Section 2-6.1 in connection with the inverse DTFT. Specifically, we start with the discrete equivalent of the orthogonality property given by Eq. (2.74):

(1/N) ∑_{k=0}^{N−1} e^{j2π(m−n)k/N} = δ[m − n].   (2.90)

Next, multiplying the definition of the DFT given by Eq. (2.89a) by (1/N) e^{j2πmk/N} and summing over k gives

(1/N) ∑_{k=0}^{N−1} X[k] e^{j2πmk/N} = (1/N) ∑_{k=0}^{N−1} ∑_{n=0}^{M−1} x[n] e^{j2π(m−n)k/N}
   = (1/N) ∑_{n=0}^{M−1} x[n] ∑_{k=0}^{N−1} e^{j2π(m−n)k/N}
   = ∑_{n=0}^{M−1} x[n] δ[m − n]
   = { x[m] for 0 ≤ m ≤ M − 1,
     { 0    for M ≤ m ≤ N − 1.    (2.91)

Upon changing index m to n on the left-hand side of Eq. (2.91) and in x[m] on the right-hand side, we obtain the formal definition of the inverse DFT given by Eq. (2.89b).

The main use of the DFT is to compute spectra of signals and frequency responses of LTI systems. All plots of spectra in this book (and all other books on signal and image processing) were made by computing them using the DFT and plotting the results.

Example 2-5: DFT of Periodic Sinusoids

Compute the N-point DFT of the segment of a periodic discrete-time sinusoid

x[n] = A cos(2π(k0/N)n + θ), 0 ≤ n ≤ N − 1,   (2.92)

with k0 a fixed integer.

Solution: We start by rewriting x[n] as the sum of two exponentials:

x[n] = (A/2) e^{jθ} e^{j2πk0n/N} + (A/2) e^{−jθ} e^{−j2πk0n/N}
     = (A/2) e^{jθ} e^{j2πk0n/N} + (A/2) e^{−jθ} e^{j2π(N−k0)n/N},   (2.93)

where we have multiplied the second term in the first step by e^{j2πNn/N} = 1 (for integer n).

Inserting Eq. (2.93) into Eq. (2.89a) with M = N, we have

X[k] = ∑_{n=0}^{N−1} (A/2) e^{jθ} e^{j2πk0n/N} e^{−j2πnk/N} + ∑_{n=0}^{N−1} (A/2) e^{−jθ} e^{j2π(N−k0)n/N} e^{−j2πnk/N}
     = (A/2) e^{jθ} ∑_{n=0}^{N−1} e^{j2πn(k0−k)/N} + (A/2) e^{−jθ} ∑_{n=0}^{N−1} e^{j2πn(N−k0−k)/N}.   (2.94a)

In view of the orthogonality property given by Eq. (2.90), the summations simplify to impulses, resulting in

X[k] = (A/2) e^{jθ} N δ[k − k0] + (A/2) e^{−jθ} N δ[N − k − k0],   (2.94b)

which can be restated as

X[k] = { (N/2) A e^{jθ}   for k = k0,
       { (N/2) A e^{−jθ}  for k = N − k0,
       { 0                otherwise.    (2.94c)

Thus, the DFT of a segment of a periodic sinusoid with Ω0 = 2πk0/N consists of two discrete-time impulses, at indices k = k0 and k = N − k0.

2-7.2 Properties of the DFT

Table 2-9 provides a summary of the salient properties of the DFT, as well as some of the special relationships between x[n] and X[k]. Of particular note are the three relationships

X[0] = ∑_{n=0}^{N−1} x[n],   (2.95)

x[0] = (1/N) ∑_{k=0}^{N−1} X[k],   (2.96)

and

X[N/2] = ∑_{n=0}^{N−1} (−1)^n x[n] for N even.   (2.97)

Table 2-9 Properties of the DFT. In the time-shift and modulation properties, (n − n0) and (k − k0) must be reduced mod(N).

Property             x[n]                X[k]
1. Linearity         ∑ ci xi[n]          ∑ ci Xi[k]
2. Time shift        x[n − n0]           e^{−j2πn0k/N} X[k]
3. Modulation        e^{j2πk0n/N} x[n]   X[k − k0]
4. Time reversal     x[N − n]            X[N − k]
5. Conjugation       x∗[n]               X∗[N − k]
6. Convolution       h[n] ⊛ x[n]         H[k] X[k]

Special DFT Relationships
7. Conjugate symmetry   X∗[k] = X[N − k]
8. Zero frequency       X[0] = ∑_{n=0}^{N−1} x[n]
9. Zero time            x[0] = (1/N) ∑_{k=0}^{N−1} X[k]
10. k = N/2             X[N/2] = ∑_{n=0}^{N−1} (−1)^n x[n]
11. Parseval's theorem  ∑_{n=0}^{N−1} |x[n]|² = (1/N) ∑_{k=0}^{N−1} |X[k]|²

A. Conjugate Symmetry Property of the DFT

If x[n] is real-valued, then conjugate symmetry holds for the DFT, which takes the form

X∗[k] = X[N − k], k = 1, . . . , N − 1.   (2.98)

For example, the 4-point DFT of x[n] = {1, 2, 3, 4} is

X[k] = {10, −2 + j2, −2, −2 − j2}

and

X∗[1] = −2 − j2 = X[4 − 1] = X[3] = −2 − j2.

Similarly,

X∗[2] = X[4 − 2] = X[2] = −2,

which is real-valued.
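These relationships are easy to verify in MATLAB (an illustrative sketch; recall that MATLAB indices are 1-based, so X(k+1) holds X[k]):

    % Conjugate symmetry, Eq. (2.98), for a real-valued x[n]:
    X = fft([1 2 3 4])            % = [10, -2+2j, -2, -2-2j]
    abs(conj(X(2)) - X(4))        % = 0: X*[1] = X[4-1] = X[3]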

This conjugate-symmetry property follows from the definition of the DFT given by Eq. (2.89a):

X∗[k] = ∑_{n=0}^{N−1} x[n] e^{j2πnk/N}   (2.99)

and

X[N − k] = ∑_{n=0}^{N−1} x[n] e^{−j2πn(N−k)/N} = ∑_{n=0}^{N−1} x[n] e^{−j2πn} e^{j2πnk/N}.   (2.100)

Since n is an integer, e^{−j2πn} = 1 and Eq. (2.100) reduces to

X[N − k] = ∑_{n=0}^{N−1} x[n] e^{j2πnk/N} = X∗[k].

B. Use of DFT for Convolution

The convolution property of the DTFT extends to the DFT after some modifications. Consider two signals, x1[n] and x2[n], with N-point DFTs X1[k] and X2[k]. From Eq. (2.89b), the inverse DFT of their product is

DFT⁻¹(X1[k] X2[k]) = (1/N) ∑_{k=0}^{N−1} (X1[k] X2[k]) e^{jk(2π/N)n}
   = (1/N) ∑_{k=0}^{N−1} e^{jk(2π/N)n} [∑_{n1=0}^{N−1} x1[n1] e^{−jk(2π/N)n1}] · [∑_{n2=0}^{N−1} x2[n2] e^{−jk(2π/N)n2}].   (2.101)

Rearranging the order of the summations gives

DFT⁻¹(X1[k] X2[k]) = (1/N) ∑_{n1=0}^{N−1} ∑_{n2=0}^{N−1} x1[n1] x2[n2] ∑_{k=0}^{N−1} e^{jk(2π/N)(n−n1−n2)}.   (2.102)

In view of the orthogonality property given by Eq. (2.90), Eq. (2.102) reduces to

DFT⁻¹(X1[k] X2[k]) = (1/N) ∑_{n1=0}^{N−1} ∑_{n2=0}^{N−1} x1[n1] x2[n2] N δ[(n − n1 − n2)N]
   = ∑_{n1=0}^{N−1} x1[n1] x2[(n − n1)N],   (2.103)

where (n − n1)N means (n − n1) reduced mod N (i.e., reduced by the largest integer multiple of N without (n − n1) becoming negative).

C. DFT and Cyclic Convolution

Because of the mod N reduction in Eq. (2.103), the expression on its right-hand side is called the cyclic or circular convolution of signals x1[n] and x2[n]. The terminology helps distinguish it from the traditional linear convolution of two nonperiodic signals.

The symbol commonly used to denote cyclic convolution is ⊛. Combining Eqs. (2.101) and (2.103) leads to

yc[n] = x1[n] ⊛ x2[n] = ∑_{n1=0}^{N−1} x1[n1] x2[(n − n1)N]
      = DFT⁻¹(X1[k] X2[k])
      = (1/N) ∑_{k=0}^{N−1} X1[k] X2[k] e^{jk(2π/N)n}.   (2.104)

The cyclic convolution yc[n] can certainly be computed by applying Eq. (2.104), but it can also be computed from the linear convolution x1[n] ∗ x2[n] by aliasing the latter. To illustrate, suppose x1[n] and x2[n] are both of duration N. The linear convolution of the two signals

y[n] = x1[n] ∗ x2[n]   (2.105)

is of duration 2N − 1, extending from n = 0 to n = 2N − 2. Aliasing y[n] means defining z[n], the aliased version of y[n], as

z[0] = y[0] + y[0 + N]
z[1] = y[1] + y[1 + N]
...
z[N − 2] = y[N − 2] + y[2N − 2]
z[N − 1] = y[N − 1].   (2.106)

The aliasing process leads to the result that z[n] is the cyclic convolution of x1[n] and x2[n]:

yc[n] = z[n] = x1[n] ⊛ x2[n].   (2.107)
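In MATLAB, Eq. (2.104) is a one-liner, since fft and ifft implement the DFT and its inverse; the result below matches Example 2-6, which follows (an illustrative sketch):

    % Cyclic convolution via the DFT, Eq. (2.104):
    x1 = [2 1 4 3];  x2 = [5 3 2 1];
    yc = ifft(fft(x1) .* fft(x2))     % = [28 21 30 31]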

Example 2-6: Cyclic Convolution

Given the two signals

x1[n] = {2, 1, 4, 3},
x2[n] = {5, 3, 2, 1},

compute the cyclic convolution of the two signals by

(a) applying the DFT method;

(b) applying the aliasing of the linear convolution method.

Solution:
(a) With N = 4, application of Eq. (2.89a) to x1[n] and x2[n] leads to

X1[k] = {10, −2 + j2, 2, −2 − j2},
X2[k] = {11, 3 − j2, 3, 3 + j2}.

The point-by-point product of X1[k] and X2[k] is

X1[k] X2[k] = {10 × 11, (−2 + j2)(3 − j2), 2 × 3, (−2 − j2)(3 + j2)}
            = {110, −2 + j10, 6, −2 − j10}.

Application of Eq. (2.104) leads to

x1[n] ⊛ x2[n] = {28, 21, 30, 31}.

(b) Per Eq. (2.71d), the linear convolution of

x1[n] = {2, 1, 4, 3}

and

x2[n] = {5, 3, 2, 1}

is

y[n] = x1[n] ∗ x2[n] = ∑_{i=0}^{3} x1[i] x2[n − i] = {10, 11, 27, 31, 18, 10, 3}.

Per Eq. (2.106),

z[n] = {y[0] + y[4], y[1] + y[5], y[2] + y[6], y[3]}
     = {10 + 18, 11 + 10, 27 + 3, 31} = {28, 21, 30, 31}.

Hence, by Eq. (2.107),

x1[n] ⊛ x2[n] = z[n] = {28, 21, 30, 31},

which is the same answer obtained in part (a).

2-7.3 DFT and Linear Convolution

In the preceding subsection, we examined how the DFT can be used to compute the cyclic convolution of two discrete-time signals (Eq. (2.104)). The same method can be applied to compute the linear convolution of the two signals, provided a preparatory step of zero-padding the two signals is applied first.

Let us suppose that signal x1[n] is of duration N1 and signal x2[n] is of duration N2, and we are interested in computing their linear convolution

y[n] = x1[n] ∗ x2[n].

The duration of y[n] is

Nc = N1 + N2 − 1.   (2.108)

Next, we zero-pad x1[n] and x2[n] so that their durations are equal to or greater than Nc. As we will see in Section 2-8 on how the fast Fourier transform (FFT) is used to compute the DFT, it is advantageous to choose the total length of the zero-padded signals to be M such that M ≥ Nc, and simultaneously M is a power of 2.

The zero-padded signals are defined as

x′1[n] = {x1[n], 0, . . . , 0},   (2.109a)

with M − N1 appended zeros, and

x′2[n] = {x2[n], 0, . . . , 0},   (2.109b)

with M − N2 appended zeros, and their M-point DFTs are X′1[k] and X′2[k], respectively. The linear convolution y[n] can now be computed by a modified version of Eq. (2.104), namely

y[n] = x′1[n] ∗ x′2[n]
     = DFT⁻¹{X′1[k] X′2[k]}
     = (1/M) ∑_{k=0}^{M−1} X′1[k] X′2[k] e^{j2πnk/M}.   (2.110)

Note that the DFTs can be computed using M-point DFTs of x′1[n] and x′2[n], since using an M-point DFT performs the zero-padding automatically.
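The zero-padded DFT route of Eq. (2.110) looks as follows in MATLAB, reusing the signals of Example 2-6 (an illustrative sketch; fft(x, M) performs the zero-padding automatically):

    % Linear convolution via zero-padded DFTs, Eq. (2.110):
    x1 = [2 1 4 3];  x2 = [5 3 2 1];
    M  = length(x1) + length(x2) - 1;      % Nc = 7 (a power of 2 also works)
    y  = ifft(fft(x1, M) .* fft(x2, M))    % = [10 11 27 31 18 10 3] = conv(x1, x2)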

Example 2-7: DFT Convolution

Given signals x1[n] = {4, 5} and x2[n] = {1, 2, 3}, (a) compute their convolution in discrete time, and (b) compare the result with the DFT relation given by Eq. (2.110).

Solution:
(a) Application of Eq. (2.71a) gives

x1[n] ∗ x2[n] = ∑_{i=0}^{3} x1[i] x2[n − i] = {4, 13, 22, 15}.

(b) Since x1[n] is of length N1 = 2 and x2[n] is of length N2 = 3, their convolution is of length

Nc = N1 + N2 − 1 = 2 + 3 − 1 = 4.

Hence, we need to zero-pad x1[n] and x2[n] as

x′1[n] = {4, 5, 0, 0}

and

x′2[n] = {1, 2, 3, 0}.

From Eq. (2.89a) with N = Nc = 4, the 4-point DFT of x′1[n] = {4, 5, 0, 0} is

X′1[k] = ∑_{n=0}^{3} x′1[n] e^{−jkπn/2}, k = 0, 1, 2, 3,

which gives

X′1[0] = 4(1) + 5(1) + 0(1) + 0(1) = 9,
X′1[1] = 4(1) + 5(−j) + 0(−1) + 0(j) = 4 − j5,
X′1[2] = 4(1) + 5(−1) + 0(1) + 0(−1) = −1,

and

X′1[3] = 4(1) + 5(j) + 0(−1) + 0(−j) = 4 + j5.

Similarly, the 4-point DFT of x′2[n] = {1, 2, 3, 0} gives

X′2[0] = 6,
X′2[1] = −2 − j2,
X′2[2] = 2,

and

X′2[3] = −2 + j2.

Multiplication of corresponding pairs gives

X′1[0] X′2[0] = 9 × 6 = 54,
X′1[1] X′2[1] = (4 − j5)(−2 − j2) = −18 + j2,
X′1[2] X′2[2] = −1 × 2 = −2,

and

X′1[3] X′2[3] = (4 + j5)(−2 + j2) = −18 − j2.

Application of Eq. (2.110) gives

y[n] = x′1[n] ∗ x′2[n] = (1/Nc) ∑_{k=0}^{Nc−1} X′1[k] X′2[k] e^{j2πnk/Nc} = (1/4) ∑_{k=0}^{3} X′1[k] X′2[k] e^{jkπn/2}.

Evaluating the summation for n = 0, 1, 2, and 3 leads to

y[n] = x′1[n] ∗ x′2[n] = {4, 13, 22, 15},

which is identical to the answer obtained earlier in part (a). For simple signals like those in this example, the DFT method involves many more steps than does the straightforward convolution method of part (a), but for the type of signals used in practice, the DFT method is computationally superior.

◮ To summarize, the linear convolution y[n] = h[n] ∗ x[n] defined in Eq. (2.71) computes the response (output) y[n] to the input x[n] for an LTI system with impulse response h[n]. The cyclic convolution yc[n] = h[n] ⊛ x[n] defined in Eq. (2.104) is what the DFT maps to products, so a cyclic convolution can be computed very quickly using the FFT algorithm to compute the DFTs H[k] of h[n] and X[k] of x[n], and the inverse DFT of H[k] X[k]. Fortunately, a linear convolution can be zero-padded to a cyclic convolution, as presented in Subsection 2-7.3, so linear convolutions can also be computed quickly using the FFT algorithm. Cyclic convolutions will also be used in Chapter 7 for computing wavelet transforms, because the cyclic convolution of a signal x[n] of duration N with an impulse response h[n] (that has been zero-padded to length N) gives an output yc[n] of the same length as that of the input x[n]. For wavelets, the cyclic convolution approach is superior to the linear convolution approach because the latter results in an output longer than the input. ◭

Exercise 2-13: Compute the 4-point DFT of {4, 3, 2, 1}.

Answer: {10, (2 − j2), 2, (2 + j2)}. (See IP)

2-8 Fast Fourier Transform (FFT)

◮ The fast Fourier transform (FFT) is a computational algorithm used to compute the discrete Fourier transforms (DFT) of discrete signals. Strictly speaking, the FFT is not a transform, but rather an algorithm for computing the transform. ◭

As was mentioned earlier, the fast Fourier transform (FFT) is a highly efficient algorithm for computing the DFT of discrete-time signals. An N-point DFT performs a linear transformation from an N-long discrete-time vector, namely x[n], into an N-long frequency domain vector X[k] for k = 0, 1, . . . , N − 1. Computation of each X[k] involves N complex multiplications, so the total number of multiplications required to perform the DFT for all X[k] is N². This is in addition to N(N − 1) complex additions. For N = 512, for example, direct implementation of the DFT operation requires 262,144 multiplications and 261,632 complex additions. For small N, these numbers are smaller, since multiplication by any of {1, −1, j, −j} does not count as a true multiplication.

Contrast these large numbers of multiplications and additions (MADs) with the number required using the FFT algorithm: for N large, the number of complex multiplications is reduced from N² to approximately (N/2) log2 N, which is only 2304 complex multiplications for N = 512. For complex additions, the number is reduced from N(N − 1) to N log2 N, or 4608 for N = 512. These reductions, thanks to the efficiency of the FFT algorithm, are on the order of 100 for multiplications and on the order of 50 for additions. The reduction ratios become increasingly more impressive at larger values of N (Table 2-10).

The computational efficiency of the FFT algorithm relies on a "divide and conquer" concept. An N-point DFT is decomposed (divided) into two (N/2)-point DFTs. Each of the (N/2)-point DFTs is decomposed further into two (N/4)-point DFTs. The decomposition process, which is continued until it reaches the 2-point DFT level, is illustrated in the next subsections.

2-8.1 2-Point DFT

For notational efficiency, we introduce the symbols

WN = e^{−j2π/N},   (2.111a)
WN^{nk} = e^{−j2πnk/N},   (2.111b)

and

WN^{−nk} = e^{j2πnk/N}.   (2.111c)

Using this shorthand notation, the summations for the DFT and its inverse given by Eq. (2.89) assume the form

X[k] = ∑_{n=0}^{N−1} x[n] WN^{nk}, k = 0, 1, . . . , N − 1,   (2.112a)

and

x[n] = (1/N) ∑_{k=0}^{N−1} X[k] WN^{−nk}, n = 0, 1, . . . , N − 1.   (2.112b)

In this form, the N-long vector X[k] is given in terms of the N-long vector x[n], and vice versa, with WN^{nk} and WN^{−nk} acting as weighting coefficients.

For a 2-point DFT,

N = 2,
W2^{0k} = e^{−j0} = 1,

and

W2^{1k} = e^{−jkπ} = (−1)^k.

Table 2-10 Comparison of the number of complex computations required by a standard DFT and an FFT, using the formulas in the bottom row.

N        Multiplications                    Additions
         Standard DFT    FFT               Standard DFT    FFT
2        4               1                 2               2
4        16              4                 12              8
8        64              12                56              24
16       256             32                240             64
...      ...             ...               ...             ...
512      262,144         2,304             261,632         4,608
1,024    1,048,576       5,120             1,047,552       10,240
2,048    4,194,304       11,264            4,192,256       22,528
N        N²              (N/2) log2 N      N(N − 1)        N log2 N

Hence, Eq. (2.112a) yields the following expressions for X[0] and X[1]:

X[0] = x[0] + x[1]   (2.113a)

and

X[1] = x[0] − x[1],   (2.113b)

which can be combined into the compact form

X[k] = x[0] + (−1)^k x[1], k = 0, 1.   (2.114)

The equations for X[0] and X[1] can be represented by the signal flow graph shown in Fig. 2-16, which is often called a butterfly diagram.

[Figure 2-16 (not reproduced): signal flow graph for a 2-point DFT.]

2-8.2 4-Point DFT

For a 4-point DFT, N = 4 and

WN^{nk} = W4^{nk} = e^{−jnkπ/2} = (−j)^{nk}.   (2.115)

From Eq. (2.112a), we have

X[k] = ∑_{n=0}^{3} x[n] W4^{nk} = x[0] + x[1] W4^{1k} + x[2] W4^{2k} + x[3] W4^{3k}, k = 0, 1, 2, 3.   (2.116)

Upon evaluating W4^{1k}, W4^{2k}, and W4^{3k} and the relationships between them, Eq. (2.116) can be cast in the form

X[k] = [x[0] + (−1)^k x[2]] + W4^{1k} [x[1] + (−1)^k x[3]],   (2.117)

which consists of two 2-point DFTs: one that includes values of x[n] for even values of n, and another for odd values of n. At this point, it is convenient to define xe[n] and xo[n] as x[n] at even and odd times:

xe[n] = x[2n], n = 0, 1,   (2.118a)
xo[n] = x[2n + 1], n = 0, 1.   (2.118b)

Thus, xe[0] = x[0] and xe[1] = x[2] and, similarly, xo[0] = x[1] and xo[1] = x[3]. When expressed in terms of xe[n] and xo[n], Eq. (2.117) becomes

X[k] = [xe[0] + (−1)^k xe[1]] + W4^{1k} [xo[0] + (−1)^k xo[1]], k = 0, 1, 2, 3,   (2.119)

in which the first bracketed term is the 2-point DFT of xe[n] and the second is the 2-point DFT of xo[n]. The FFT computes the 4-point DFT by computing the two 2-point DFTs, followed by a recomposition step that involves multiplying the odd 2-point DFT by W4^{1k} and then adding it to the even 2-point DFT. The entire process is depicted by the signal flow graph shown in Fig. 2-17. In the graph, Fourier coefficients Xe[0] and Xe[1] represent the outputs of the even 2-point DFT, and similarly, Xo[0] and Xo[1] represent the outputs of the odd 2-point DFT.

[Figure 2-17 (not reproduced): signal flow graph for a 4-point DFT, with weighting coefficient W4^1 = −j. Summations occur only at the red intersection points.]

2-8.3 16-Point DFT

We now show how to compute a 16-point DFT using two 8-point DFTs and 8 multiplications and additions (MADs). This divides the 16-point DFT into two 8-point DFTs, which in turn can be divided into four 4-point DFTs, which are just additions and subtractions. This conquers the 16-point DFT by dividing it into 4-point DFTs and additional MADs.

A. Dividing a 16-Point DFT

We now show that the 16-point DFT can be computed for even values of k using an 8-point DFT of (x[n] + x[n + 8]) and for odd values of k using an 8-point DFT of the modulated signal (x[n] − x[n + 8]) e^{−j2πn/16}. Thus, the 16-point DFT can be computed as an 8-point DFT (for even values of k) and as a modulated 8-point DFT (for odd values of k).

B. Computation at Even Indices

We consider even and odd indices k separately. For even values of k, we can write k = 2k′ and split the 16-point DFT summation into two summations:

X[2k′] = ∑_{n=0}^{15} x[n] e^{−j2π(2k′/16)n}
       = ∑_{n=0}^{7} x[n] e^{−j2π(2k′/16)n} + ∑_{n=8}^{15} x[n] e^{−j2π(2k′/16)n}.   (2.120)

Changing variables from n to n′ = n − 8 in the second summation, and recognizing that 2k′/16 = k′/8 and e^{−j2π(k′/8)8} = e^{−j2πk′} = 1, gives

X[2k′] = ∑_{n=0}^{7} x[n] e^{−j2π(k′/8)n} + ∑_{n′=0}^{7} x[n′ + 8] e^{−j2π(k′/8)(n′+8)}
       = ∑_{n=0}^{7} (x[n] + x[n + 8]) e^{−j2π(k′/8)n}
       = DFT({x[n] + x[n + 8], n = 0, . . . , 7}).   (2.121)

So for even values of k, the 16-point DFT of x[n] is the 8-point DFT of {x[n] + x[n + 8], n = 0, . . . , 7}.
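Equation (2.121) can be verified directly in MATLAB (an illustrative sketch; under MATLAB's 1-based indexing, X(1:2:16) extracts the even-indexed values k = 0, 2, . . . , 14):

    % Even-indexed 16-point DFT values from an 8-point DFT, Eq. (2.121):
    x  = randn(1, 16);
    X  = fft(x);
    Xe = fft(x(1:8) + x(9:16));
    max(abs(X(1:2:16) - Xe))     % = 0 (to machine precision)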

C. Computation at Odd Indices

For odd values of k, we can write k = 2k′ + 1 and split the 16-point DFT summation into two summations:

X[2k′ + 1] = ∑_{n=0}^{15} x[n] e^{−j2π((2k′+1)/16)n}
           = ∑_{n=0}^{7} x[n] e^{−j2π((2k′+1)/16)n} + ∑_{n=8}^{15} x[n] e^{−j2π((2k′+1)/16)n}.   (2.122)

Changing variables from n to n′ = n − 8 in the second summation, and recognizing that e^{−j2π(8/16)} = −1 and

(2k′ + 1)/16 = k′/8 + 1/16,

gives

X[2k′ + 1] = ∑_{n=0}^{7} (x[n] e^{−j2π(1/16)n}) e^{−j2π(k′/8)n} + ∑_{n′=0}^{7} (x[n′ + 8] e^{−j2π(1/16)(n′+8)}) e^{−j2π(k′/8)(n′+8)}
           = ∑_{n=0}^{7} e^{−j2π(1/16)n} (x[n] − x[n + 8]) e^{−j2π(k′/8)n}
           = DFT({e^{−j2π(1/16)n} (x[n] − x[n + 8]), n = 0, . . . , 7}).   (2.123)

So for odd values of k, the 16-point DFT of x[n] is the 8-point DFT of {e^{−j(2π/16)n} (x[n] − x[n + 8]), n = 0, . . . , 7}. The signal {x[n] − x[n + 8], n = 0, . . . , 7} has been modulated through multiplication by e^{−j(2π/16)n}. The multiplications by e^{−j(2π/16)n} are known as twiddle multiplications (mults) by the twiddle factors e^{−j(2π/16)n}.

Example 2-8: Dividing an 8-Point DFT into Two 4-Point DFTs

Divide the 8-point DFT of {7, 1, 4, 2, 8, 5, 3, 6} into two 4-point DFTs and twiddle mults.

Solution:

• For even values of index k, we have

X[0, 2, 4, 6] = DFT({7 + 8, 1 + 5, 4 + 3, 2 + 6}) = {36, 8 + j2, 8, 8 − j2}.

• For odd index values, we need twiddle mults. The twiddle factors are given by {e^{−j2πn/8}} for n = 0, 1, 2, and 3, which reduce to {1, (√2/2)(1 − j), −j, (√2/2)(−1 − j)}.

• Implementing the twiddle mults gives

{7 − 8, 1 − 5, 4 − 3, 2 − 6} × {1, (√2/2)(1 − j), −j, (√2/2)(−1 − j)} = {−1, 2√2(−1 + j), −j, 2√2(1 + j)}.

• For odd values of index k, we have

X[1, 3, 5, 7] = DFT({−1, 2√2(−1 + j), −j, 2√2(1 + j)})
             = {−1 + j4.66, −1 + j6.66, −1 − j6.66, −1 − j4.66}.

• Combining these results for even and odd k gives

DFT({7, 1, 4, 2, 8, 5, 3, 6}) = {36, −1 + j4.7, 8 + j2, −1 + j6.7, 8, −1 − j6.7, 8 − j2, −1 − j4.7}.

Note the conjugate symmetry: X[7] = X∗[1], X[6] = X∗[2], and X[5] = X∗[3].

• This result agrees with direct MATLAB computation using

fft([7 1 4 2 8 5 3 6]).
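The odd-index half of Example 2-8 can be checked the same way (an illustrative sketch; X(2:2:8) holds the values at k = 1, 3, 5, 7):

    % Odd-indexed 8-point DFT values via twiddle mults and a 4-point DFT:
    x  = [7 1 4 2 8 5 3 6];  n = 0:3;
    X  = fft(x);
    Xo = fft((x(1:4) - x(5:8)) .* exp(-1j*2*pi*n/8));
    max(abs(X(2:2:8) - Xo))      % = 0: matches X[1], X[3], X[5], X[7]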

2-8.4 Dividing Up a 2N-Point DFT

We now generalize the procedure to a 2N-point DFT by dividing it into two N-point DFTs and N twiddle mults.

(1) For even indices k = 2k′ we have

X[2k′] = ∑_{n=0}^{N−1} (x[n] + x[n + N]) e^{−j2π(k′/N)n}
       = DFT{x[n] + x[n + N], n = 0, 1, . . . , N − 1}.   (2.124)

(2) For odd indices k = 2k′ + 1 we have

X[2k′ + 1] = ∑_{n=0}^{N−1} e^{−j2π(1/(2N))n} (x[n] − x[n + N]) e^{−j2π(k′/N)n}
           = DFT{e^{−j2π(1/(2N))n} (x[n] − x[n + N])}.   (2.125)

Thus, a 2N-point DFT can be divided into

• two N-point DFTs,

• N multiplications by twiddle factors e^{−j2π(1/(2N))n}, and

• 2N additions and subtractions.

2-8.5 Dividing and Conquering

Now suppose N is a power of two; e.g., N = 1024 = 2^10. In that case, we can apply the algorithm of the previous subsection recursively to divide an N-point DFT into two N/2-point DFTs, then into four N/4-point DFTs, then into eight N/8-point DFTs, and so on until we reach the following 4-point DFTs:

X[0] = x[0] + x[1] + x[2] + x[3],
X[1] = x[0] − jx[1] − x[2] + jx[3],
X[2] = x[0] − x[1] + x[2] − x[3],
X[3] = x[0] + jx[1] − x[2] − jx[3].   (2.126)

At each stage, half of the DFTs are modulated, requiring N/2 multiplications. So if N is a power of 2, then an N-point DFT computed using the FFT will require approximately (N/2) log2(N) multiplications and N log2(N) additions. These can be reduced slightly by recognizing that some multiplications are simply multiplications by ±1 and ±j.

To illustrate the computational significance of the FFT, suppose we wish to compute a 32768-point DFT. Direct computation using Eq. (2.89a) would require (32768)² ≈ 1.1 × 10⁹ MADs. In contrast, computation using the FFT would require fewer than (32768/2) log2(32768) ≈ 250,000 MADs, representing a computational saving of a factor of 4000!

2-9 Deconvolution Using the DFT

Recall that the objective of deconvolution is to reconstruct the input x[n] of a system from measurements of its output y[n] and knowledge of its impulse response h[n]. That is, we seek to solve y[n] = h[n] ∗ x[n] for x[n], given y[n] and h[n].

2-9.1 Deconvolution Procedure

If x[n] has duration M and h[n] has duration L, then y[n] has duration N = L + M − 1. Let us define the zero-padded functions

h̃[n] = {h[n], 0, . . . , 0},   (2.127a)

with N − L appended zeros, and

x̃[n] = {x[n], 0, . . . , 0},   (2.127b)

with N − M appended zeros.

(1) With x̃[n], h̃[n], and y[n] all now of duration N, we can obtain their respective N-point DFTs, X̃[k], H̃[k], and Y[k], which are interrelated by

Y[k] = H̃[k] X̃[k].   (2.128)

Upon dividing by H̃[k] and taking an N-point inverse DFT, we have

x̃[n] = DFT⁻¹{X̃[k]} = DFT⁻¹{Y[k]/H̃[k]} = DFT⁻¹{DFT{y[n]}/DFT{h̃[n]}}.   (2.129)

(2) Discarding the (N − M) final zeros in x̃[n] gives x[n]. The zero-padding and unpadding processes allow us to perform the deconvolution for any system, provided H̃[k] is nonzero for all 0 ≤ k ≤ N − 1.

2-9.2 FFT Implementation Issues

(a) To use the FFT algorithm (Section 2-8) to compute the three DFTs, N should be rounded up to the next power of 2, because the FFT can then be computed more rapidly.

(b) In some cases, some of the values of H̃[k] may be zero, which is problematic because the computation of Y[k]/H̃[k] would involve dividing by zero. A possible solution to the division-by-zero problem is to change the value of N. Suppose H̃[k] = 0 for some value of index k, such as k = 3. This corresponds to H(Ω) having a zero at 2πk/N for k = 3, because by definition, the DFT is the DTFT sampled at Ω = 2πk/N for integers k. Changing N to, say, N + 1 (or some other suitable integer) means that the DFT is now the DTFT H(Ω) sampled at Ω = 2πk/(N + 1), so the zero at k = 3 when the order was N may now get missed with the sampling at the new order N + 1. Changing N to N + 1 may avoid one or more zeros in H̃[k], but it may also introduce new ones. It may be necessary to try multiple values of N to satisfy the condition that H̃[k] ≠ 0 for all k.
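The procedure is compact in MATLAB; the sketch below (illustrative only, not part of the original text) uses the data of Example 2-9, which follows:

    % DFT deconvolution, Eq. (2.129), for h[n] = {1,2,3}, y[n] = {6,19,32,21}:
    h = [1 2 3];  y = [6 19 32 21];
    N = length(y);                   % duration of y[n]
    xt = ifft(fft(y) ./ fft(h, N))   % = [6 7 0 0]; fft(h, N) zero-pads h[n]
    x = xt(1:2)                      % discard the N-M trailing zeros: x = [6 7]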

e
avoid one or more zeros in H̃[k], but it may also introduce new ones. It may be necessary to try multiple values of N to satisfy the condition that H̃[k] ≠ 0 for all k.

Example 2-9: DFT Deconvolution

In response to an input x[n], an LTI system with an impulse response h[n] = {1, 2, 3} generated an output

y[n] = {6, 19, 32, 21}.

Determine x[n], given that it is of finite duration.

Solution: The output is of duration N = 4, so we should zero-pad h[n] to the same duration by defining

h̃[n] = {1, 2, 3, 0}.        (2.130)

From Eq. (2.89a), the 4-point DFT of h̃[n] is

H̃[k] = Σ_{n=0}^{3} h̃[n] e^{−j2πkn/4},   k = 0, 1, 2, 3,        (2.131)

which yields

H̃[0] = 1(1) + 2(1) + 3(1) + 0(1) = 6,
H̃[1] = 1(1) + 2(−j) + 3(−1) + 0(j) = −2 − j2,
H̃[2] = 1(1) + 2(−1) + 3(1) + 0(−1) = 2,
and
H̃[3] = 1(1) + 2(j) + 3(−1) + 0(−j) = −2 + j2.

Similarly, the 4-point DFT of y[n] = {6, 19, 32, 21} is

Y[k] = Σ_{n=0}^{3} y[n] e^{−j2πkn/4},   k = 0, 1, 2, 3,

which yields

Y[0] = 6(1) + 19(1) + 32(1) + 21(1) = 78,
Y[1] = 6(1) + 19(−j) + 32(−1) + 21(j) = −26 + j2,
Y[2] = 6(1) + 19(−1) + 32(1) + 21(−1) = −2,
and
Y[3] = 6(1) + 19(j) + 32(−1) + 21(−j) = −26 − j2.

The 4-point DFT of x̃[n] is, therefore,

X̃[0] = Y[0]/H̃[0] = 78/6 = 13,
X̃[1] = Y[1]/H̃[1] = (−26 + j2)/(−2 − j2) = 6 − j7,
X̃[2] = Y[2]/H̃[2] = −2/2 = −1,
and
X̃[3] = Y[3]/H̃[3] = (−26 − j2)/(−2 + j2) = 6 + j7.

By Eq. (2.89b), the inverse DFT of X̃[k] is

x̃[n] = (1/4) Σ_{k=0}^{3} X̃[k] e^{j2πkn/4},   n = 0, 1, 2, 3,        (2.132)

which yields

x̃[n] = {6, 7, 0, 0}.

Given that y[n] is of duration N = 4 and h[n] is of duration L = 3, it follows that x[n] must be of duration M = N − L + 1 = 4 − 3 + 1 = 2, if its duration is finite. Deletion of the zero-pads from x̃[n] leads to

x[n] = {6, 7},        (2.133)

whose duration is indeed 2.
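Example 2-9 can be reproduced in MATLAB in a few lines. This is our own sketch of the procedure of Section 2-9.1, not code from the book:

% DFT deconvolution of Example 2-9
h = [1 2 3 0];            % h[n] zero-padded to N = 4
y = [6 19 32 21];         % measured output y[n]
X = fft(y)./fft(h);       % Xtilde[k] = Y[k]/Htilde[k], per Eq. (2.129)
x = real(ifft(X));        % xtilde[n] = {6, 7, 0, 0}
x = x(1:2)                % discard the N - M zero-pads: x[n] = {6, 7}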
Example 2-10: Removal of Periodic Interference

We are given the signal of two actual trumpets simultaneously playing notes A and B. The goal is to use the DFT to eliminate the trumpet playing note A, while preserving the trumpet playing note B. We only need to know that note B is at a higher frequency than note A.

Solution: The two-trumpet signal time waveform is shown in Fig. 2-18(a), and the corresponding spectrum is shown in Fig. 2-18(b). We note that the spectral lines occur in pairs of harmonics, with the lower harmonic of each pair associated with
note A and the higher harmonic of each pair associated with note B.
Since we wish to eliminate note A, we set the lower component of each pair of spectral lines to zero. The modified spectrum is shown in Fig. 2-18(c). The inverse DFT of this spectrum, followed by reconstruction to continuous time, is shown in Fig. 2-18(d).
The filtering process eliminated the signal due to the trumpet playing note A, while preserving the signal due to note B, almost completely. This can be confirmed by listening to the signals before and after filtering.
Whereas it is easy to distinguish between the harmonics of note A and those of note B at lower frequencies, this is not the case at higher frequencies, particularly when they overlap. Hence, neither note can be eliminated without affecting the other slightly. Fortunately, the overlapping high-frequency harmonics contain very little power compared with the non-overlapping, low-frequency harmonics, and therefore, their role is quite insignificant.

[Figure 2-18 Removing the spectrum of note A: (a) waveform x(t) of the two-trumpet signal; (b) spectrum X[k] of the two-trumpet signal, showing harmonics of note A (440 Hz) and note B (491 Hz); (c) spectrum X[k] after filtering; (d) waveform xf(t) of the filtered two-trumpet signal.]
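The bin-zeroing operation takes only a few lines of MATLAB. The sketch below is our own illustration, not the book's code: the trumpet recording is not reproduced here, so a synthetic two-tone stand-in is used, and the sampling rate and bin tolerance are assumptions:

% zero the DFT bins near the note-A harmonics (toy stand-in signal)
fs = 8000; t = (0:fs-1)/fs;              % assumed 8 kHz rate, 1 s of samples
x = cos(2*pi*440*t) + cos(2*pi*491*t);   % stand-in for the two-trumpet signal
N = length(x); X = fft(x);
f = (0:N-1)*fs/N;                        % frequency of each DFT bin
for m = 1:9                              % first several harmonics of note A
    fA = 440*m;
    X(abs(f - fA) < 5 | abs(f - (fs - fA)) < 5) = 0;  % bin and conjugate bin
end
xf = real(ifft(X));                      % filtered signal: note B preserved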
2-10 Computation of Continuous-Time Fourier Transform (CTFT) Using the DFT

Let us consider a continuous-time signal x(t) with support [−T/2, T/2], which means that

x(t) = 0 for |t| > T/2.

Also, let us assume (for the time being) that the signal spectrum X(f) is bandwidth-limited to a maximum frequency F/2 in Hz, which means that

X(f) = 0 for |f| > F/2.

Our goal is to compute samples {X(k∆f)} of X(f) at a frequency spacing ∆f from signal samples {x(n∆t)} of x(t) recorded at time interval ∆t, and to do so using the DFT.

2-10.1 Derivation Using Sampling Theorem Twice

Whereas it is impossible for a signal to be simultaneously bandlimited and time-limited, as presumed earlier, many real-world signals are approximately band- and time-limited.
According to the sampling theorem (Section 2-4), x(t) can be reconstructed from its samples {x(n∆t)} if the sampling rate St = 1/∆t > 2(F/2) = F. Applying the sampling theorem with t and f exchanged, X(f) can be reconstructed from its samples {X(k∆f)} if its sampling rate Sf = 1/∆f > 2(T/2) = T.
In the sequel, we use the minimum sampling intervals ∆t = 1/F and ∆f = 1/T. Finer discretization can be achieved by simply increasing F and/or T. In practice, F and/or T is (are) increased slightly so that N = FT is an odd integer, which makes the factor M = (N − 1)/2 also an integer (but not necessarily an odd integer). The factor M is related to the order of the DFT, which has to be an integer.
The Fourier transform of the synthetic sampled signal xs(t), defined in Eq. (2.43) and repeated here as

xs(t) = Σ_{n=−∞}^{∞} x(n∆t) δ(t − n∆t),        (2.134)

was computed in Eq. (2.53), and also repeated here as

Xs(f) = Σ_{n=−∞}^{∞} x(n∆t) e^{−j2πfn∆t}.        (2.135)

Setting f = k∆f gives

Xs(k∆f) = Σ_{n=−∞}^{∞} x(n∆t) e^{−j2π(k∆f)(n∆t)}.        (2.136)

Noting that x(t) = 0 for |t| > T/2 and X(f) = 0 for |f| > F/2, we restrict the ranges of n and k to

|n| ≤ (T/2)/∆t = FT/2 = N/2        (2.137a)
and
|k| ≤ (F/2)/∆f = FT/2 = N/2.        (2.137b)

Next, we introduce factor M defined as

M = (N − 1)/2,        (2.138)

and we note that if N is an odd integer, M is guaranteed to be an integer. In view of Eq. (2.137), the ranges of n and k become n, k = −M, ..., M. Upon substituting

∆t ∆f = 1/(FT) = 1/N = 1/(2M + 1)        (2.139)

in the exponent of Eq. (2.136), the expression becomes

Xs(k∆f) = Σ_{n=−M}^{M} x(n∆t) e^{−j2π(k∆f)(n∆t)}
        = Σ_{n=−M}^{M} x(n∆t) e^{−j2πnk/(2M+1)},   |k| ≤ M.        (2.140)

This expression looks like a DFT of order 2M + 1. Recall from the statement in connection with Eq. (2.47) that the spectrum Xs(f) of the sampled signal includes the spectrum X(f) of the continuous-time signal (multiplied by the sampling rate St) plus additional copies repeated every ±St along the frequency axis. With St = 1/∆t = F in the present case,

Xs(f) = F X(f),   for |f| < F/2,        (2.141)

from which we deduce that

X(k∆f) = Xs(k∆f) ∆t = Σ_{n=−M}^{M} x(n∆t) e^{−j2πnk/(2M+1)} ∆t,   |k| ≤ M.        (2.142)

Ironically, this is the same result that would be obtained by simply discretizing the definition of the continuous-time Fourier transform! But this derivation shows that discretization gives the exact result if x(t) is time- and bandlimited.

Example 2-11: Computing CTFT by DFT

Use the DFT to compute the Fourier transform of the continuous Gaussian signal

x(t) = (1/√(2π)) e^{−t²/2}.

Solution: Our first task is to assign realistic values for the signal duration T and the width of its spectrum F. It is an "educated" trial-and-error process. At t = 4, x(t) = 0.00013, so we will assume that x(t) ≈ 0 for |t| > 4. Since x(t) is symmetrical with respect to the vertical axis, we assign

T = 2 × 4 = 8 s.

The Fourier transform of x(t) is X(f) = e^{−2π²f²}. By trial and error, we determine that F = 1.2 Hz is sufficient to characterize X(f). The combination gives

N = TF = 8 × 1.2 = 9.6.
[Figure 2-19 Comparison of exact (blue circles) and DFT-computed (red crosses) values of the continuous-time Fourier transform of a Gaussian signal.]

To increase the value of N to an odd integer, we increase F to 1.375 Hz, which results in N = 11 and M = 5. In Fig. 2-19, computed values of the discretized spectrum of x(t) are compared with exact values based on evaluating the analytical expression X(f) = e^{−2π²f²}. The comparison provides an excellent demonstration of the power of the sampling theorem: representing x(t) by only 11 equally spaced samples is sufficient to capture its information content and generate its Fourier transform with high fidelity.
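The numbers in Fig. 2-19 are easy to regenerate. The following MATLAB sketch (ours, with our own variable names) evaluates Eq. (2.142) directly:

% samples of the CTFT of the Gaussian via a DFT of order 2M+1
T = 8; F = 1.375;                   % chosen so that N = T*F = 11 is odd
N = round(T*F); M = (N - 1)/2;      % N = 11, M = 5
dt = 1/F; df = 1/T;                 % time and frequency sampling intervals
n = -M:M; k = -M:M;
x = exp(-(n*dt).^2/2)/sqrt(2*pi);   % samples x(n dt) of the Gaussian
Xdft = zeros(1, N);
for i = 1:N                         % Eq. (2.142)
    Xdft(i) = dt*sum(x.*exp(-1j*2*pi*n*k(i)/N));
end
Xexact = exp(-2*pi^2*(k*df).^2);    % analytical X(f) at f = k df
max(abs(Xdft - Xexact))             % small, as Fig. 2-19 confirms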

2-10.2 Practical Computation of X(f) Using the DFT

Example 2-11 shows that simple discretization of the continuous-time Fourier transform works very well, provided that the discretization lengths in time and frequency are chosen properly. The integer N was odd for clarity of derivation. In practice, we would increase T and F so that N = TF is a power of two, permitting the fast Fourier transform (FFT) algorithm to be used to compute the DFT quickly.

Concept Question 2-12: Why do we need a discrete Fourier transform (DFT)?

Concept Question 2-13: The DFT is often used to compute the CTFT numerically. Why does this often work as well as it does?
Summary

Concepts

• Many 2-D concepts can be understood more easily by reviewing their 1-D counterparts. These include: LTI systems, convolution, sampling, and the continuous-time, discrete-time, and discrete Fourier transforms.
• The DTFT is periodic with period 2π.
• Continuous-time signals can be sampled to discrete-time signals, on which discrete-time signal processing can be performed.
• The response of an LTI system with impulse response h(t) to input x(t) is output y(t) = h(t) ∗ x(t), and similarly in discrete time.
• The response of an LTI system with impulse response h(t) to input A cos(2πf0t + θ) is A|H(f0)| cos(2πf0t + θ + ∠H(f0)), where H(f) is the Fourier transform of h(t), and similarly in discrete time.

Mathematical Formulae

Impulse: δ(x) = lim_{ε→0} (1/(2ε)) rect(x/(2ε))

Energy of x(t): E = ∫_{−∞}^{∞} |x(t)|² dt

Convolution: y(t) = h(t) ∗ x(t) = ∫_{−∞}^{∞} h(τ) x(t − τ) dτ

Convolution (discrete time): y[n] = h[n] ∗ x[n] = Σ_{i=−∞}^{∞} h[i] x[n − i]

Fourier transform: X(f) = ∫_{−∞}^{∞} x(t) e^{−j2πft} dt

Inverse Fourier transform: x(t) = ∫_{−∞}^{∞} X(f) e^{j2πft} df

Sinc function: sinc(x) = sin(πx)/(πx)

Ideal lowpass filter impulse response: h(t) = 2fc sinc(2fc t)

Sampling theorem: sampling rate S = 1/∆ > 2B if X(f) = 0 for |f| > B

Sinc interpolation formula: x(t) = Σ_{n=−∞}^{∞} x(n∆) sinc(S(t − n∆))

Discrete-time Fourier transform (DTFT): X(Ω) = Σ_{n=−∞}^{∞} x[n] e^{−jΩn}

Inverse DTFT: x[n] = (1/(2π)) ∫_{−π}^{π} X(Ω) e^{jΩn} dΩ

Discrete-time sinc: h[n] = (Ω0/π) sinc(Ω0 n/π)

Discrete sinc: X(Ω) = sin((2N + 1)Ω/2) / sin(Ω/2)

Discrete Fourier transform (DFT): X[k] = Σ_{n=0}^{N−1} x[n] e^{−j2πnk/N}

Inverse DFT: x[n] = (1/N) Σ_{k=0}^{N−1} X[k] e^{j2πnk/N}

Cyclic convolution: yc[n] = x1[n] ⊛ x2[n] = Σ_{n1=0}^{N−1} x1[n1] x2[(n − n1) mod N]
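As a quick check of the last formula, the following MATLAB sketch (our own, not the book's) confirms that multiplying DFTs implements cyclic convolution:

% cyclic convolution two ways: DFT product and direct summation
x1 = [1 2 3 4]; x2 = [5 6 7 8]; N = 4;
yc = ifft(fft(x1).*fft(x2));        % DFT route
yd = zeros(1, N);                   % direct evaluation of the summation
for n = 0:N-1
    for n1 = 0:N-1
        yd(n+1) = yd(n+1) + x1(n1+1)*x2(mod(n - n1, N) + 1);
    end
end
max(abs(yc - yd))                   % zero to within roundoff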
Important Terms: Provide definitions or explain the meaning of the following terms: aliasing, convolution, cyclic convolution, deconvolution, DFT, DTFT, FFT, Fourier transform, frequency response, impulse response, linear time-invariant (LTI), Parseval's theorem, Rayleigh's theorem, sampled signal, sampling theorem, sinc function, spectrum, zero padding.

PROBLEMS

Section 2-2: Review of Continuous-Time Systems

2.1 Compute the following convolutions:
(a) e^{−t}u(t) ∗ e^{−2t}u(t)
(b) e^{−2t}u(t) ∗ e^{−3t}u(t)
(c) e^{−3t}u(t) ∗ e^{−3t}u(t)

Section 2-3: 1-D Fourier Transforms

2.2 Show that the spectrum of
[sin(20πt)/(πt)] [sin(10πt)/(πt)]
is zero for |f| > 15 Hz.

2.3 Using only Fourier transform properties, show that
[sin(10πt)/(πt)] [1 + 2 cos(20πt)] = sin(30πt)/(πt).

2.4 If x(t) = sin(2t)/(πt), compute the energy of d²x/dt².

2.5 Compute the energy of e^{−t}u(t) ∗ sin(t)/(πt).

2.6 Show that
∫_{−∞}^{∞} sin²(at)/(πt)² dt = a/π
if a > 0.

Section 2-4: The Sampling Theorem

2.7 The spectrum of the trumpet signal for note G (784 Hz) is negligible above its ninth harmonic. What is the Nyquist sampling rate required for reconstructing the trumpet signal from its samples?

2.8 Compute a Nyquist sampling rate for reconstructing signal
x(t) = sin(40πt) sin(60πt)/(π²t²)
from its samples.

2.9 Signal
x(t) = [1 + 2 cos(4πt)] sin(2πt)/(πt)
is sampled every 1/6 second. What is the spectrum of the sampled signal?

2.10 Signal x(t) = cos(14πt) − cos(18πt) is sampled at 16 sample/s. The result is passed through an ideal brick-wall lowpass filter with a cutoff frequency of 8 Hz. What is the spectrum of the output signal?

2.11 Signal x(t) = sin(30πt) + sin(70πt) is sampled at 50 sample/s. The result is passed through an ideal brick-wall lowpass filter with a cutoff frequency of 25 Hz. What is the spectrum of the output signal?

Section 2-5: Review of Discrete-Time Signals and Systems

2.12 Compute the following convolutions:
(a) {1, 2} ∗ {3, 4, 5}
(b) {1, 2, 3} ∗ {4, 5, 6}
(c) {2, 1, 4} ∗ {3, 6, 5}

2.13 If {1, 2, 3} ∗ x[n] = {5, 16, 34, 32, 21}, compute x[n].

2.14 Given the two systems connected in series as
x[n] → [ h1[n] ] → w[n] = 3x[n] − 2x[n − 1]
and
w[n] → [ h2[n] ] → y[n] = 5w[n] − 4w[n − 1],
compute the overall impulse response.

2.15 The two systems
y[n] = 3x[n] − 2x[n − 1]
and
y[n] = 5x[n] − 4x[n − 1]
are connected in parallel. Compute the overall impulse response.

Section 2-6: Discrete-Time Frequency Response

2.16 Given
cos((π/2)n) → [ y[n] = x[n] + 0.5x[n − 1] + x[n − 2] ] → y[n],
(a) Compute the frequency response H(Ω).
(b) Compute the output y[n].

2.17 Given
cos((π/2)n) → [ y[n] = 8x[n] + 3x[n − 1] + 4x[n − 2] ] → y[n],
(a) Compute the frequency response H(Ω).
(b) Compute the output y[n].

2.18 If input x[n] = cos((π/2)n) + cos(πn), and
x[n] → [ y[n] = x[n] + x[n − 1] + x[n − 2] + x[n − 3] ] → y[n],
(a) Compute the frequency response H(Ω).
(b) Compute the output y[n].

2.19 If input x[n] = 1 + 2 cos((π/2)n) + 3 cos(πn), and
x[n] → [ y[n] = x[n] + 4x[n − 1] + 3x[n − 3] ] → y[n],
(a) Compute the frequency response H(Ω).
(b) Compute the output y[n].

Section 2-6: Discrete-Time Fourier Transform (DTFT)

2.20 Compute the DTFTs of the following signals (simplify answers to sums of sines and cosines).
(a) {1, 1, 1, 1, 1}
(b) {3, 2, 1}

2.21 Compute the inverse DTFT of
X(Ω) = [3 + 2 cos(Ω) + 4 cos(2Ω)] + j[6 sin(Ω) + 8 sin(2Ω)].

2.22 Compute the inverse DTFT of
X(Ω) = [7 + 5 cos(Ω) + 3 cos(2Ω)] + j[sin(Ω) + sin(2Ω)].

Section 2-7: Discrete Fourier Transform (DFT)

2.23 Compute the DFTs of each of the following signals:
(a) {12, 8, 4, 8}
(b) {16, 8, 12, 4}

2.24 Determine the DFT of a single period of each of the following signals:
(a) cos((π/4)n)
(b) (1/4) sin((3π/4)n)

2.25 Compute the inverse DFTs of the following:
(a) {0, 0, 3, 0, 4, 0, 3, 0}
(b) {0, 3 + j4, 0, 0, 0, 0, 0, 3 − j4}

Section 2-9: Deconvolution Using the DFT

2.26 Use DFTs to compute the convolution
{1, 3, 5} ∗ {7, 9}
by hand.

2.27 Solve each of the following deconvolution problems for input x[n]. Use MATLAB.
(a) x[n] ∗ {1, 2, 3} = {7, 15, 27, 13, 24, 27, 34, 15}.
(b) x[n] ∗ {1, 3, 5} = {3, 10, 22, 18, 28, 29, 52, 45}.
(c) x[n] ∗ {1, 4, 2, 6, 5, 3} = {2, 9, 11, 31, 48, 67, 76, 78, 69, 38, 12}.

2.28 Solve each of the following deconvolution problems for input x[n]. Use MATLAB.
(a) x[n] ∗ {3, 1, 4, 2} = {6, 23, 18, 57, 35, 37, 28, 6}.
(b) x[n] ∗ {1, 7, 3, 2} = {2, 20, 53, 60, 53, 54, 21, 10}.
(c) x[n] ∗ {2, 2, 3, 6} = {12, 30, 42, 71, 73, 43, 32, 45, 42}.

Section 2-10: Computation of CTFT Using the DFT

2.29 Use a 40-point DFT to compute the inverse Fourier transform of
X(f) = [sin(πf)/(πf)]².
Assume that X(f) ≈ 0 for |f| > 10 Hz and x(t) ≈ 0 for |t| > 1 s. Plot the actual and computed inverse Fourier transforms on the same plot to show the close agreement between them.

2.30 Use an 80-point DFT to compute the inverse Fourier transform of
X(f) H(f) = [sin(πf)/(πf)]² (1 + e^{−j2πf}).
Assume that X(f) ≈ 0 for |f| > 10 Hz and x(t) ≈ 0 for |t| > 2 s. Plot the actual and computed inverse Fourier transforms on the same plot to show the close agreement between them.
Chapter 3
2-D Images and Systems

Contents
Overview, 90
3-1 Displaying Images, 90
3-2 2-D Continuous-Space Images, 91
3-3 Continuous-Space Systems, 93
3-4 2-D Continuous-Space Fourier Transform (CSFT), 94
3-5 2-D Sampling Theorem, 107
3-6 2-D Discrete Space, 113
3-7 2-D Discrete-Space Fourier Transform (DSFT), 118
3-8 2-D Discrete Fourier Transform (2-D DFT), 119
3-9 Computation of the 2-D DFT Using MATLAB, 121
Problems, 126

Objectives

Learn to:
■ Compute the output image of an LSI system in response to a given input image using convolution.
■ Compute the continuous-space Fourier transform of an image or point-spread function.
■ Use the 2-D sampling theorem to convert a continuous-space image to a discrete-space image.
■ Perform the two tasks listed above for continuous-space images on discrete-space images.

This chapter presents the 2-D versions, suitable for image processing, of the 1-D concepts presented in Chapter 2. These include: linear shift-invariant (LSI) systems, 2-D convolution, spatial frequency response, filtering, Fourier transforms for continuous- and discrete-space images, and the 2-D sampling theorem. It also covers concepts that do not arise in 1-D, such as separability, image scaling and rotation, and representation of discrete-space images using various coordinate systems.
Overview

A 2-D image is a signal that varies as a function of the two spatial dimensions, x and y, instead of time t. A 3-D image, such as a CT scan (Fig. 1-24), is a signal that varies as a function of (x, y, z).
In 1-D, it is common practice to assign the symbol x(t) to represent the signal at the input to a system, and to assign y(t) to the output signal [or x[n] and y[n] if the signals are discrete]. Because x, y, and z are the standard symbols of the Cartesian coordinate system, a different symbolic representation is used with 2-D images:

◮ Image intensity is represented by f(x, y), where (x, y) are two orthogonal spatial dimensions. ◭

This chapter extends the 1-D definitions, properties, and transformations covered in the previous chapter into their 2-D equivalents. It also presents certain 2-D properties that have no counterparts in 1-D.

3-1 Displaying Images

In 1-D, a continuous-time signal x(t) is displayed by plotting x(t) versus t. A discrete-time signal x[n] is displayed using a stem plot of x[n] versus n. Clearly such plots are not applicable for 2-D images.
Image intensity f(x, y) of a 2-D image can be displayed either as a 3-D mesh plot (Fig. 3-1(a)), which hides some features of the image and is difficult to create and interpret, or as a grayscale image (Fig. 3-1(b)). In a grayscale image, the image intensity is scaled so that the minimum value of f(x, y) is depicted in black and the maximum value of f(x, y) is depicted in white. If the image is non-negative (f(x, y) ≥ 0), as is often the case, black in the grayscale image denotes zero values of f(x, y). If the image is not non-negative, zero values of f(x, y) appear as a shade of gray.

[Figure 3-1 An image displayed in (a) mesh plot format and (b) grayscale format.]

MATLAB's imagesc(X), colormap(gray) displays the 2-D array X as a grayscale image in which black depicts the minimum value of X and white depicts the maximum value of X.
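For instance, both display formats of Fig. 3-1 can be generated with a few lines. This is a minimal sketch of our own; the test array is an arbitrary stand-in for f(x, y):

% display a 2-D array in mesh-plot and grayscale formats
X = peaks(50);                       % any 2-D array stands in for f(x,y)
figure; mesh(X);                     % mesh-plot format, as in Fig. 3-1(a)
figure; imagesc(X); colormap(gray);  % grayscale format, as in Fig. 3-1(b)
axis image;                          % square pixels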
It is also possible to display an image f(x, y) as a false-color image, in which case different colors denote different values of f(x, y). The relation between color and values of f(x, y) is denoted using a colorbar, to the side or bottom of the image. An example of a false-color display was shown earlier in Fig. 1-11, depicting the infrared intensity emitted by a hot air balloon.
We should not confuse a false-color image with a true-color image. Whereas a false-color image is a single grayscale image (1 channel) displayed in color, a true-color image actually is a set of three images (3 channels):

{ fred(x, y), fgreen(x, y), fblue(x, y) },

representing (here) the three primary colors: red, green, and blue. Other triplets of colors, such as yellow, cyan, and magenta, can also be used. Hence, image processing of color images
encompasses 3-channel image processing (see Chapter 10).
A grayscale image can be regarded as a still of a black-and-white TV image, while a color image can be regarded as a still of a color TV image. Before flat-panel displays were invented, color images on TVs and monitors were created from three separate signals using three different electron guns in picture tubes, while black-and-white TV images were created using a single electron gun in a picture tube. Modern solid-state color displays, such as liquid crystal displays (LCDs), are composed of three interleaved displays (similar to the APS display in Fig. 1-4), each driven by one of the three signal channels.

Concept Question 3-1: What is the difference between a true-color image and a false-color image?

Exercise 3-1: How can you tell whether an image displayed in color is a color image or a false-color image?
Answer: False-color images should have colorbars to identify the numerical value associated with each color.

3-2 2-D Continuous-Space Images

A continuous-space image is a physical quantity, such as temperature or pressure, that varies with spatial position in 2-D. Mathematically, a continuous-space image is a function f(x, y) of spatial position (x, y), where x and y have units of length (meters).

3-2.1 Fundamental 2-D Images

A. Impulse

A 2-D impulse δ(x, y) is simply

δ(x, y) = δ(x) δ(y),

and a 2-D impulse shifted by ξ along x and by η along y is

δ(x − ξ, y − η) = δ(x − ξ) δ(y − η).

The sifting property generalizes directly from 1-D to 2-D. In 2-D,

∫_{−∞}^{∞} ∫_{−∞}^{∞} f(ξ, η) δ(x − ξ, y − η) dξ dη = f(x, y).        (3.1)

B. Box Image

The box image fBox(x, y) is the 2-D pulse-equivalent of the rectangular function rect(t) and is defined as

fBox(x, y) = rect(x) rect(y) = { 1 for |x| < 1/2 and |y| < 1/2,
                                 0 otherwise.

By extension, a box image of widths (ℓx, ℓy) and centered at (x0, y0) is defined as

fBox((x − x0)/ℓx, (y − y0)/ℓy) = rect((x − x0)/ℓx) rect((y − y0)/ℓy)
  = { 1 for |x − x0| < ℓx/2 and |y − y0| < ℓy/2,
      0 otherwise,        (3.2)

and shown in Fig. 3-2(a).

[Figure 3-2 (a) Box image of widths (ℓx, ℓy) and centered at (x0, y0), and (b) disk image of radius a/2 and centered at (x0, y0).]
C. Disk Image

Being rectangular in shape, the box-image function is suitable for applications involving Cartesian coordinates, such as shifting the box sideways or up and down across the image. Some applications, however, require the use of polar coordinates, in which case the disk-image function is more suitable. The disk image fDisk(x, y) of radius 1/2 is defined as

fDisk(x, y) = { 1 for √(x² + y²) < 1/2,
                0 for √(x² + y²) > 1/2.        (3.3)

The expression given by Eq. (3.3) pertains to a circular disk centered at the origin and of radius 1/2. For the more general case of a circular disk of radius a/2 and centered at (x0, y0),

fDisk((x − x0)/a, (y − y0)/a) =
  { 1 for √((x − x0)² + (y − y0)²) < a/2,
    0 for √((x − x0)² + (y − y0)²) > a/2.        (3.4)

An example is displayed in Fig. 3-2(b).

3-2.2 Properties of Images

A. Generalizations of 1-D Properties

1. Spatial shift

When shifted spatially by (x0, y0), image f(x, y) becomes f(x − x0, y − y0). Assuming the x axis is pointing to the right, image f(x − x0, y − y0) is shifted to the right by x0, relative to f(x, y), if x0 is positive, and to the left by the same amount if x0 is negative.
Whereas it is customary to define the direction of the x axis as pointing to the right, there is less uniformity with regard to the definition of the y axis; sometimes the y axis is defined along the upward direction, and in other cases it is defined to be along the downward direction. Hence, if y0 is positive, f(x − x0, y − y0) is shifted by y0 either upwards or downwards, depending on the direction of the y axis.
As to the location of the origin (x, y) = (0, 0), it is usually defined to be at any one of the following three locations: upper left corner, lower left corner, or the center of the image (Fig. 3-3).

[Figure 3-3 Three commonly used image coordinate systems: (a) origin at top left and y axis downward; (b) origin at bottom left and y axis upward; (c) origin at center of image.]

In this book:
• Images f(x, y) are usually displayed with the origin (0, 0) at the upper-left corner, as in Fig. 3-3(a).
• Point spread functions (PSFs), introduced in Section 3-3.2, are usually displayed with the origin at the center, as in Fig. 3-3(c).
• Image spectra, defined in Section 3-4, are usually displayed as in Fig. 3-3(c).

2. Spatial scaling

When spatially scaled by (ax, ay), image f(x, y) becomes f(ax x, ay y). If ax > 1, the image is shrunk in the x direction by a factor ax, and if 0 < ax < 1, the image is magnified in size by 1/ax. So ax represents a shrinkage factor. If ax < 0, the image is reversed in the x direction, in addition to being shrunk by a factor of |ax|. The same comments apply to ay.
3. Image energy

Extending the 1-D expression for signal energy to 2-D leads to

E = ∫_{−∞}^{∞} ∫_{−∞}^{∞} |f(x, y)|² dx dy.        (3.5)

4. Even-odd decomposition

A real-valued image f(x, y) can be decomposed into its even fe(x, y) and odd fo(x, y) components:

f(x, y) = fe(x, y) + fo(x, y),        (3.6)

where

fe(x, y) = [f(x, y) + f(−x, −y)]/2,        (3.7a)
fo(x, y) = [f(x, y) − f(−x, −y)]/2.        (3.7b)

B. Non-Generalizations of 1-D Properties

We now introduce two new properties of images, separability and rotation, neither one of which has a 1-D counterpart.

1. Separability

An image f(x, y) is separable if it can be written as a product of separate functions f1(x) and f2(y):

f(x, y) = f1(x) f2(y).        (3.8)

As we will see later, the 2-D Fourier transform of a separable image can be computed as a product of the 1-D Fourier transforms of f1(x) and f2(y). 2-D impulses and box images are separable, whereas disk images are not.

2. Rotation

To rotate an image by an angle θ, we define rectangular coordinates (x′, y′) as the rectangular coordinates (x, y) rotated by angle θ. Using the sine and cosine addition formulae, the rotated coordinates (x′, y′) are related to coordinates (x, y) by (Fig. 3-4)

x′ = x cos θ + y sin θ        (3.9)
and
y′ = −x sin θ + y cos θ.        (3.10)

[Figure 3-4 Rotation of coordinate system (x, y) by angle θ to coordinate system (x′, y′).]

These can be combined into

[x′; y′] = [cos θ, sin θ; −sin θ, cos θ] [x; y].        (3.11)

Hence, after rotation by angle θ, image f(x, y) becomes transformed into a new image g(x, y) given by

g(x, y) = f(x′, y′) = f(x cos θ + y sin θ, y cos θ − x sin θ).        (3.12)

Note that rotating an image usually assumes that the image has been shifted so that the center of the image is at the origin (0, 0), as in Fig. 3-3(c).

Exercise 3-2: Which of the following images is separable: (a) 2-D impulse; (b) box; (c) disk?
Answer: 2-D impulse and box are separable; disk is not separable.

Exercise 3-3: Which of the following images is invariant to rotation: (a) 2-D impulse; (b) box; (c) disk with center at the origin (0, 0)?
Answer: 2-D impulse and disk are invariant to rotation; box is not invariant to rotation.
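On a sampled grid, Eq. (3.12) can be applied by interpolation. The following is a minimal sketch of our own (test image and angle chosen arbitrarily; MATLAB's imrotate performs a similar operation):

% rotate a discretized image by evaluating g(x,y) = f(x',y')
[x, y] = meshgrid(-64:63, -64:63);        % origin at image center, Fig. 3-3(c)
f = double(abs(x) < 20 & abs(y) < 10);    % a box image as the test input
th = pi/6;                                % rotation angle of 30 degrees
xp = x*cos(th) + y*sin(th);               % Eq. (3.9)
yp = -x*sin(th) + y*cos(th);              % Eq. (3.10)
g = interp2(x, y, f, xp, yp, 'linear', 0);  % sample f at rotated coordinates
imagesc(g); colormap(gray); axis image;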
3-3 Continuous-Space Systems

A continuous-space system is a device or mathematical model that accepts as an input an image f(x, y) and produces as an output an image g(x, y):

f(x, y) → SYSTEM → g(x, y).

The image rotation transformation described by Eq. (3.12) is a good example of such a 2-D system.

3-3.1 Linear and Shift-Invariant (LSI) Systems

The definition of the linearity property of 1-D systems (Section 2-2.1) extends directly to 2-D spatial systems, as does the definition for the invariance, except that time invariance in 1-D systems becomes shift invariance in 2-D systems. Systems that are both linear and shift-invariant are termed linear shift-invariant (LSI).

3-3.2 Point Spread Function (PSF)

The point spread function (PSF) of a 2-D system is essentially its 2-D impulse response. The PSF h(x, y; x0, y0) of an image system is its response to a 2-D impulse δ(x, y) shifted by (x0, y0):

δ(x − x0, y − y0) → SYSTEM → h(x, y; x0, y0).        (3.13a)

If the system is also shift-invariant (SI), Eq. (3.13a) becomes

δ(x − x0, y − y0) → SI → h(x − x0, y − y0).        (3.13b)

3-3.3 2-D Convolution

For a linear system, extending the 1-D superposition integral expression given by Eq. (2.13) to 2-D gives

f(x, y) → L → g(x, y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(ξ, η) h(x, y; ξ, η) dξ dη,        (3.14)

and if the system also is shift-invariant, the 2-D superposition integral simplifies to the 2-D convolution given by

g(x, y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(ξ, η) h(x − ξ, y − η) dξ dη = f(x, y) ∗∗ h(x, y),        (3.15a)

where the "double star" in f(x, y) ∗∗ h(x, y) denotes the 2-D convolution of the PSF h(x, y) with the input image f(x, y). In symbolic form, the 2-D convolution is written as

f(x, y) → LSI → g(x, y) = f(x, y) ∗∗ h(x, y).        (3.15b)

◮ A 2-D convolution consists of a convolution in the x direction, followed by a convolution in the y direction, or vice versa. Consequently, the 1-D convolution properties listed in Table 2-3 generalize to 2-D. ◭

Concept Question 3-2: Why do so many 1-D convolution properties generalize directly to 2-D?

Exercise 3-4: The operation of a 2-D mirror is described by g(x, y) = f(−x, −y). Find the PSF of the mirror.
Answer: h(x, y; ξ, η) = δ(x + ξ, y + η).

Exercise 3-5: If g(x, y) = h(x, y) ∗∗ f(x, y), what is 4h(x, y) ∗∗ f(x − 3, y − 2) in terms of g(x, y)?
Answer: 4g(x − 3, y − 2), using the shift and scaling properties of 1-D convolutions.
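On discretized images, 2-D convolution is a built-in operation: MATLAB's conv2 implements the sampled counterpart of Eq. (3.15a). A minimal sketch of our own, with an arbitrary test image and PSF:

% 2-D convolution of a box image with a small averaging PSF
f = zeros(64); f(28:36, 28:36) = 1;      % input image: a small box
h = ones(5)/25;                          % 5-by-5 uniform PSF
g = conv2(f, h, 'same');                 % g = f ** h, cropped to input size
imagesc(g); colormap(gray); axis image;  % the box edges are now blurred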
3-4 2-D Continuous-Space Fourier Transform (CSFT)

The 2-D continuous-space Fourier transform (CSFT) operates between the spatial domain (x, y) and the spatial frequency domain (µ, v), where µ and v are called spatial frequencies or wavenumbers, with units of cycles/meter, analogous to the units of cycles/s (i.e., Hz) for the time frequency f.
The 2-D CSFT is denoted F(µ, v) and it is related to f(x, y)
by

F(µ, v) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) e^{−j2π(µx+vy)} dx dy.        (3.16a)

The inverse operation is given by

f(x, y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} F(µ, v) e^{j2π(µx+vy)} dµ dv,        (3.16b)

and the combination of the two operations is represented symbolically by

f(x, y) ↔ F(µ, v).

In the 1-D continuous-time domain, we call the output of an LTI system its impulse response h(t) when the input is an impulse:

δ(t) → LTI → h(t),

and we call the Fourier transform of h(t) the frequency response of the system, H(f):

h(t) ↔ H(f).

The analogous relationships for a 2-D LSI system are

δ(x, y) → LSI → h(x, y),        (3.17a)
h(x, y) ↔ H(µ, v).        (3.17b)

The CSFT H(µ, v) is called the spatial frequency response of the LSI system.

◮ As in 1-D, the 2-D Fourier transform of a convolution of two functions is equal to the product of their Fourier transforms:

f(x, y) → LSI → g(x, y) = h(x, y) ∗∗ f(x, y)

implies that

G(µ, v) = H(µ, v) F(µ, v). ◭

All of the 1-D Fourier transform properties listed in Table 2-4 and all of the 1-D transform pairs listed in Table 2-5 generalize readily to 2-D. The 2-D versions of the two tables are available in Table 3-1.

◮ The spectrum F(µ, v) of an image f(x, y) is its 2-D CSFT. The spatial frequency response H(µ, v) of an LSI 2-D system is the 2-D CSFT of its PSF h(x, y). ◭

3-4.1 Notable 2-D CSFT Pairs and Properties

A. Conjugate Symmetry

The 2-D CSFT of a real-valued image f(x, y) obeys the conjugate symmetry property

F∗(µ, v) = F(−µ, −v),        (3.18)

which states that the 2-D Fourier transform F(µ, v) must be reflected across both the µ and v axes to produce its complex conjugate.

B. Separable Images

◮ The CSFT of a separable image f(x, y) = f1(x) f2(y) is itself separable in the spatial frequency domain:

f1(x) f2(y) ↔ F1(µ) F2(v).        (3.19) ◭

This assertion follows directly from the definition of the CSFT given by Eq. (3.16a):

F(µ, v) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) e^{−j2π(µx+vy)} dx dy
        = ∫_{−∞}^{∞} f1(x) e^{−j2πµx} dx ∫_{−∞}^{∞} f2(y) e^{−j2πvy} dy
        = F1(µ) F2(v).        (3.20)

◮ The CSFT pairs listed in Table 3-1 are all separable functions, and can be obtained by applying Eq. (3.20) to the 1-D Fourier transform pairs listed in Table 2-5. CSFT pairs for non-separable functions are listed later in Table 3-2. ◭

C. Sinusoidal Image

Consider the sinusoidal image described by

f(x, y) = cos(2πµ0x) cos(2πv0y),

where µ0 = 1.9 cycles/cm and v0 = 0.9 cycles/cm are the frequencies of the spatial variations along x and y in the spatial
Table 3-1 2-D continuous-space Fourier transform (CSFT).

Selected Properties
1. Linearity:                 Σ ci fi(x, y)  ↔  Σ ci Fi(µ, v)
2. Spatial scaling:           f(ax x, ay y)  ↔  (1/|ax ay|) F(µ/ax, v/ay)
3. Spatial shift:             f(x − x0, y − y0)  ↔  e^{−j2πµx0} e^{−j2πvy0} F(µ, v)
4. Reversal:                  f(−x, −y)  ↔  F(−µ, −v)
5. Conjugation:               f∗(x, y)  ↔  F∗(−µ, −v)
6. Convolution in space:      f(x, y) ∗∗ h(x, y)  ↔  F(µ, v) H(µ, v)
7. Convolution in frequency:  f(x, y) h(x, y)  ↔  F(µ, v) ∗∗ H(µ, v)

CSFT Pairs
8.  δ(x, y)  ↔  1
9.  δ(x − x0, y − y0)  ↔  e^{−j2πµx0} e^{−j2πvy0}
10. e^{j2πµ0x} e^{j2πv0y}  ↔  δ(µ − µ0, v − v0)
11. rect(x/ℓx) rect(y/ℓy)  ↔  ℓx ℓy sinc(ℓx µ) sinc(ℓy v)
12. µ0 v0 sinc(µ0 x) sinc(v0 y)  ↔  rect(µ/µ0) rect(v/v0)
13. e^{−πx²} e^{−πy²}  ↔  e^{−πµ²} e^{−πv²}
14. cos(2πµ0x) cos(2πv0y)  ↔  (1/4)[δ(µ ± µ0) δ(v ± v0)]
domain, respectively. A grayscale image-display of f(x, y) is shown in Fig. 3-5(a), with pure black representing f(x, y) = −1 and pure white representing f(x, y) = +1. As expected, the image exhibits a repetitive pattern along both x and y, with 19 cycles in 10 cm in the x direction and 9 cycles in 10 cm in the y direction, corresponding to spatial frequencies of µ = 1.9 cycles/cm and v = 0.9 cycles/cm, respectively.
By Eq. (3.20), the CSFT of f(x, y) is

F(µ, v) = F1(µ) F2(v)
        = F{cos(2πµ0x)} F{cos(2πv0y)}
        = (1/4)[δ(µ − µ0) + δ(µ + µ0)] × [δ(v − v0) + δ(v + v0)],        (3.21)

which establishes entry #14 in Table 3-1. The CSFT consists of four impulses at spatial frequency locations {µ, v} = {±µ0, ±v0}, as is also shown in Fig. 3-5(b).
We should note that the images displayed in Fig. 3-5 are
not truly continuous functions; function f(x, y) was discretized into 256 × 256 pixels, and then a discrete form of the Fourier transform called the DFT (Section 3-8) was used to generate F(µ, v), also in the form of a 256 × 256 image.

[Figure 3-5 (a) Sinusoidal image f(x, y) = cos(2πµ0x) cos(2πv0y) with µ0 = 1.9 cycles/cm and v0 = 0.9 cycles/cm, and (b) the corresponding Fourier transform F(µ, v), which consists of four impulses (4 white dots) at {±µ0, ±v0}.]

D. Box Image

As a prelude to presenting the CSFT of a 2-D box image, let us examine the 1-D case of the rectangular pulse f(t) = rect(t/T) shown in Fig. 3-6(a). The pulse is centered at the origin and extends between −T/2 and +T/2, and the corresponding Fourier transform is, from entry #3 in Table 2-5,

F(µ) = T sinc(µT),        (3.22)

[Figure 3-6 (a) Rectangular pulse, and corresponding (b) magnitude spectrum and (c) phase spectrum.]
where sinc(θ) is the sinc function defined by Eq. (2.35) as sinc θ = [sin(πθ)]/(πθ). By defining F(µ) as

F(µ) = |F(µ)| e^{jφ(µ)},        (3.23)

we determine that the phase spectrum φ(µ) can be ascertained from

e^{jφ(µ)} = F(µ)/|F(µ)| = sinc(µT)/|sinc(µT)|.        (3.24)

The quantity on the right-hand side of Eq. (3.24) is always equal to +1 or −1. Hence, φ(µ) = 0° when sinc(µT) is positive and 180° when sinc(µT) is negative. The magnitude and phase spectra of the rectangular pulse are displayed in Figs. 3-6(b) and (c), respectively.
Next, let us consider the white square shown in Fig. 3-7(a). If we assign an amplitude of 1 to the white part of the image and 0 to the black part, the variation across the image along the x direction is analogous to that representing the time-domain pulse of Fig. 3-6(a), and the same is true along y. Hence, the white square represents the product of two pulses, one along x and another along y, and is given by

f(x, y) = rect(x/ℓ) rect(y/ℓ),        (3.25)

where ℓ is the length of the square sides. In analogy with Eq. (3.22),

F(µ, v) = ℓ² sinc(µℓ) sinc(vℓ).        (3.26)

The magnitude and phase spectra associated with the expression given by Eq. (3.26) are displayed in grayscale format in Fig. 3-7(b) and (c), respectively. For the magnitude spectrum, white represents the peak value of |FBox(µ, v)| and black represents |FBox(µ, v)| = 0. The phase spectrum φ(µ, v) varies between 0° and 180°, so the grayscale was defined such that white corresponds to +180° and black to 0°. The tonal variations along µ and v are equivalent to the patterns depicted in Figs. 3-6(b) and (c) for the rectangular pulse.

[Figure 3-7 (a) Grayscale image of a white square in a black background, (b) magnitude spectrum, and (c) phase spectrum.]

In the general case of a box image of widths ℓx along x and ℓy along y, and centered at (x0, y0),

f(x, y) = rect((x − x0)/ℓx) rect((y − y0)/ℓy).        (3.27)

In view of properties #3 and 11 in Table 3-1, the corresponding CSFT is

F(µ, v) = ℓx e^{−j2πµx0} sinc(µℓx) ℓy e^{−j2πvy0} sinc(vℓy),        (3.28)
and the associated magnitude and phase spectra are given by

|F(µ, v)|        (3.29a)
and
φ(µ, v) = tan⁻¹( Im[F(µ, v)] / Re[F(µ, v)] ).        (3.29b)

A visual example is shown in Fig. 3-8(a) for a square box of sides ℓx = ℓy = ℓ, shifted to the right by L and also downward by L. Inserting x0 = L, y0 = −L, and ℓx = ℓy = ℓ in Eq. (3.28) leads to

F(µ, v) = ℓ² e^{−j2πµL} sinc(µℓ) e^{j2πvL} sinc(vℓ).        (3.30)

The magnitude and phase spectra associated with the CSFT of the box image defined by Eq. (3.30) are displayed in Fig. 3-8(b) and (c), respectively. The magnitude spectrum of the shifted box is similar to that of the unshifted box (Fig. 3-7), but the phase spectra of the two boxes are considerably different.

[Figure 3-8 (a) Box image of dimension ℓ centered at (L, −L), (b) magnitude spectrum, and (c) phase spectrum.]

E. 2-D Ideal Brickwall Lowpass Filter

An ideal lowpass filter is characterized by a spatial frequency response HLP(µ, v) with a specified cutoff frequency µ0 along both the µ and v axes:

HLP(µ, v) = { 1 for 0 ≤ |µ|, |v| ≤ µ0,
              0 otherwise.        (3.31)

Mathematically, HLP(µ, v) can be expressed in terms of rectangle functions centered at the origin and of width 2µ0:

HLP(µ, v) = rect(µ/(2µ0)) rect(v/(2µ0)).        (3.32)

The inverse 2-D Fourier transform of HLP(µ, v) is the PSF hLP(x, y). Application of property #12 in Table 3-1 yields

hLP(x, y) = [sin(2πµ0x)/(πx)] [sin(2πµ0y)/(πy)] = 4µ0² sinc(2µ0x) sinc(2µ0y).        (3.33)

F. Example: Lowpass-Filtering of Clown Image

Figure 3-9(a) displays an image of a clown's face. Our goal is to lowpass-filter the clown image using an ideal lowpass filter with a cutoff frequency µ0 = 0.44 cycles/mm.
[Figure 3-9 Lowpass filtering the clown image in (a) to generate the image in (e): (a) clown face image f(x, y); (b) magnitude spectrum F(µ, v) of the clown image; (c) magnified PSF h(x, y) of the 2-D LPF; (d) spatial frequency response HLP(µ, v) of the 2-D LPF with µ0 = 0.44 cycles/mm; (e) lowpass-filtered clown image g(x, y); (f) magnitude spectrum G(µ, v) of the filtered image. Image f(x, y) is 40 mm × 40 mm and the magnitude spectra extend between −2.5 cycles/mm and +2.5 cycles/mm in both directions.]
To do so, we perform the following steps:

(1) We denote the intensity distribution of the clown image as f(x, y), and we apply the 2-D Fourier transform to obtain the spectrum F(µ, v), whose magnitude is displayed in Fig. 3-9(b).

(2) The spatial frequency response of the lowpass filter, shown in Fig. 3-9(d), consists of a white square representing the passband of the filter. Its functional form is given by Eq. (3.32) with µ0 = 0.44 cycles/mm. The corresponding PSF given by Eq. (3.33) is displayed in Fig. 3-9(c).

(3) Multiplication of F(µ, v) by HLP(µ, v) yields the spectrum of the filtered image, G(µ, v):

G(µ, v) = F(µ, v) HLP(µ, v).        (3.34)

The magnitude of the result is displayed in Fig. 3-9(f). Upon performing an inverse Fourier transform on G(µ, v), we obtain g(x, y), the lowpass-filtered image of the clown face shown in Fig. 3-9(e). Image g(x, y) looks like a blurred version of the original image f(x, y) because the lowpass filtering smooths out rapid variations in the image.
Alternatively, we could have obtained g(x, y) directly by performing a convolution in the spatial domain:

g(x, y) = f(x, y) ∗∗ hLP(x, y).        (3.35)

Even though the convolution approach is direct and conceptually straightforward, it is computationally much easier to perform the filtering by transforming to the spatial frequency domain, multiplying the two spectra, and then inverse transforming back to the spatial domain. The actual computation was performed using discretized (pixelated) images, and the Fourier transformations were realized using the 2-D DFT introduced later in Section 3-8.
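The three steps take only a few lines in MATLAB. The sketch below is our own stand-in, not the book's code: the clown image is not included, so a built-in test surface is used, and the cutoff of 22 frequency samples is our assumption (roughly 0.44 of 2.5 cycles/mm over a 256-sample spectrum):

% frequency-domain lowpass filtering, steps (1)-(3) of the example
f = peaks(256);                           % any 256-by-256 array stands in for f(x,y)
F = fftshift(fft2(f));                    % step (1): spectrum, origin centered
[v, u] = meshgrid(-128:127, -128:127);    % spatial frequency indices
H = double(max(abs(u), abs(v)) <= 22);    % step (2): square brickwall passband
G = F.*H;                                 % step (3): Eq. (3.34)
g = real(ifft2(ifftshift(G)));            % inverse transform: blurred image
imagesc(g); colormap(gray); axis image;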
3-4.2 Image Rotation

◮ Rotating an image by angle θ (Fig. 3-10) in the 2-D spatial domain (x, y) causes its Fourier transform to also rotate by the same angle in the frequency domain (µ, v). ◭

[Figure 3-10 Rotation of axes by θ in (a) the spatial domain causes rotation by the same angle in (b) the spatial frequency domain.]

To demonstrate the validity of the assertion, we start with the relationships given by Eqs. (3.11) and (3.12):

g(x, y) = f(x′, y′)        (3.36)
and
[x′; y′] = Rθ [x; y],        (3.37)

where f(x, y) is the original image, (x′, y′) are the coordinates of the rotated image g(x, y) = f(x′, y′), and Rθ is the rotation matrix relating (x, y) to (x′, y′):

Rθ = [cos θ, sin θ; −sin θ, cos θ].        (3.38)

The inverse relationship between (x′, y′) and (x, y) is given in terms of the inverse of matrix Rθ:

[x; y] = Rθ⁻¹ [x′; y′] = [cos θ, −sin θ; sin θ, cos θ] [x′; y′].        (3.39)

The 2-D Fourier transform of g(x, y) is given by

G(µ, v) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) e^{−j2π(µx+vy)} dx dy
        = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x′, y′) e^{−j2π(µx+vy)} dx dy.        (3.40)

Using the relationships between (x, y) and (x′, y′) defined by Eq. (3.39), while also recognizing that dx dy = dx′ dy′ because a differential element of area is the same in either coordinate system, Eq. (3.40) becomes

G(µ, v) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x′, y′) e^{−j2π[µ(x′ cos θ − y′ sin θ) + v(x′ sin θ + y′ cos θ)]} dx′ dy′
        = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x′, y′) e^{−j2π[µ′x′ + v′y′]} dx′ dy′,        (3.41)

where we define

[µ′; v′] = [cos θ, sin θ; −sin θ, cos θ] [µ; v] = Rθ [µ; v].        (3.42)

The newly defined spatial-frequency coordinates (µ′, v′) are related to the original frequency coordinates (µ, v) by exactly the same rotation matrix Rθ that was used to rotate image f(x, y) to g(x, y). The consequence of using Eq. (3.42) is that Eq. (3.41) now assumes the standard form for the definition of the Fourier transform of f(x′, y′):

G(µ, v) = F(µ′, v′).        (3.43)

In conclusion, we have demonstrated that rotation of image f(x, y) by angle θ in the (x, y) plane leads to rotation of F(µ, v) by exactly the same angle in the spatial frequency domain.

3-4.3 2-D Fourier Transform in Polar Coordinates

In the spatial domain, the location of a point can be specified by its (x, y) coordinates in a Cartesian coordinate system or by its (r, θ) coordinates in the corresponding polar coordinate system (Fig. 3-11(a)). The two pairs of variables are related by

x = r cos θ,  y = r sin θ;   r = √(x² + y²),  θ = tan⁻¹(y/x).        (3.44)

Similarly, in the spatial frequency domain (Fig. 3-11(b)), we can use Cartesian coordinates (µ, v) or their corresponding polar coordinates (ρ, φ), with

µ = ρ cos φ,  v = ρ sin φ;   ρ = √(µ² + v²),  φ = tan⁻¹(v/µ).        (3.45)

[Figure 3-11 Relationships between Cartesian and polar coordinates in (a) the spatial domain and (b) the spatial frequency domain.]

The Fourier transform of f(x, y) is given by Eq. (3.16a) as

F(µ, v) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) e^{−j2π(µx+vy)} dx dy.        (3.46)

We wish to transform F(µ, v) into polar coordinates so we may apply it to circularly symmetric images or use it in filtering applications where the filter's frequency response is defined in terms of polar coordinates. To that end, we convert the differential area dx dy in Eq. (3.46) to r dr dθ, and we use the relations given by Eqs. (3.44) and (3.45) to transform the exponent in Eq. (3.46):

µx + vy = (ρ cos φ)(r cos θ) + (ρ sin φ)(r sin θ)
        = ρr[cos φ cos θ + sin φ sin θ]
        = ρr cos(φ − θ).        (3.47)

The cosine addition formula was used in the last step. Conversion to polar coordinates leads to

F(ρ, φ) = ∫_{r=0}^{∞} ∫_{θ=0}^{2π} f(r, θ) e^{−j2πρr cos(φ−θ)} r dr dθ.        (3.48a)

The inverse transform is given by

f(r, θ) = ∫_{ρ=0}^{∞} ∫_{φ=0}^{2π} F(ρ, φ) e^{j2πρr cos(φ−θ)} ρ dρ dφ.        (3.48b)

3-4.4 Rotationally Invariant Images

A rotationally invariant image is a circularly symmetric image, which means that f(r, θ) is a function of r only. According to Section 3-4.2, rotation of f(r, θ) by a fixed angle θ0 causes the transform F(ρ, φ) to rotate by exactly the same angle θ0. Hence, if f(r, θ) is independent of θ, it follows that F(ρ, φ) is independent of φ, in which case Eq. (3.48a) can be rewritten as

F(ρ) = ∫_{r=0}^{∞} ∫_{θ=0}^{2π} f(r) e^{−j2πρr cos(φ−θ)} r dr dθ
     = ∫_{r=0}^{∞} r f(r) [ ∫_{θ=0}^{2π} e^{−j2πρr cos(φ−θ)} dθ ] dr.        (3.49)
Because the integration over θ extends over the range (0, 2π), the integrated value is the same for any fixed value of φ. Hence, for simplicity we set φ = 0, in which case Eq. (3.49) simplifies to

F(ρ) = ∫_{r=0}^{∞} r f(r) [ ∫_{θ=0}^{2π} e^{−j2πρr cos θ} dθ ] dr
     = 2π ∫_{r=0}^{∞} r f(r) J0(2πρr) dr,        (3.50)

where J0(z) is the Bessel function of order zero:

J0(z) = (1/(2π)) ∫_0^{2π} e^{−jz cos θ} dθ.        (3.51)

A plot of J0(z) versus z is shown in Fig. 3-12.

[Figure 3-12 Plot of J0(z), the Bessel function of order zero, as a function of z.]

The integral expression on the right-hand side of Eq. (3.50) is known as the Hankel transform of order zero. Hence, the Fourier transform of a circularly symmetric image f(r) is given by its Hankel transform of order zero. An example is the ring impulse

f(r) = δ(r − a),        (3.52a)

which defines a unit-intensity circle of radius a (Fig. 3-13(a)) in the spatial coordinate system (r, θ).

[Figure 3-13 (a) Image of a ring impulse of radius a = 1 cm and (b) the logarithm of its 2-D CSFT. In display mode, the images are 256 × 256 pixels.]

The corresponding Fourier
transform is

F(ρ) = 2π ∫_{r=0}^{∞} r δ(r − a) J0(2πρr) dr = 2πa J0(2πρa).        (3.52b)

The image in Fig. 3-13(b) displays the variation of F(ρ) as a function of ρ in the spatial frequency domain for a ring with a = 1 cm (image size is 6 cm × 6 cm).
Table 3-2 provides a list of Fourier transform pairs of rotationally symmetric images.

Table 3-2 2-D Fourier transforms of rotationally invariant images.
f(r)  ↔  F(ρ)
δ(r)/(πr)  ↔  1
rect(r)  ↔  J1(πρ)/(2ρ)
J1(πr)/(2r)  ↔  rect(ρ)
1/r  ↔  1/ρ
e^{−πr²}  ↔  e^{−πρ²}
δ(r − r0)  ↔  2πr0 J0(2πr0ρ)

3-4.5 Image Examples

A. Scaling

Figure 3-14 compares image f(x, y), representing an image of letters, to a scaled-down version f(x′, y′) with x′ = ax, y′ = ay, and a = 4. The area of the scaled-down image is 1/16 of the area of the original. To enlarge the image, the value of a should be smaller than 1.

[Figure 3-14 (a) Letters image f(x, y) and (b) letters image f(x′, y′) with x′ = ax and y′ = ay, spatially scaled by a = 4.]

B. Image Rotation

The image displayed in Fig. 3-15(a) is a sinusoidal image that oscillates along only the y direction. Its 2-D spectrum consists of two impulse functions along the v direction, as shown in Fig. 3-15(b). Rotating the sinusoidal image by 45° to the image in Fig. 3-15(c) leads to a corresponding rotation of the spectrum, as shown in Fig. 3-15(d).

C. Gaussian Image

A 2-D Gaussian image is characterized by

f(x, y) = e^{−π(x²+y²)}.        (3.53a)  (Gaussian image)

Since x² + y² = r², f(x, y) is rotationally invariant, so we can rewrite it as

f(r) = e^{−πr²}.        (3.53b)

To obtain the Fourier transform F(ρ), we can apply Eq. (3.50), the Fourier transform for a rotationally invariant image:

F(ρ) = 2π ∫_0^{∞} r f(r) J0(2πρr) dr = 2π ∫_0^{∞} r e^{−πr²} J0(2πρr) dr.        (3.54)
[Figure 3-15 (a) Sinusoidal image and (b) its 2-D spectrum; (c) the sinusoidal image rotated by 45° and (d) its rotated spectrum.]

From standard tables of integrals, we borrow the following identity for any real variable t:

∫_0^{∞} t e^{−a²t²} J0(bt) dt = (1/(2a²)) e^{−b²/(4a²)},   for a² > 0.        (3.55)

The integrals in Eq. (3.54) and Eq. (3.55) become identical if we set t = r, a² = π, and b = 2πρ, which leads to

F(ρ) = e^{−πρ²}.        (3.56)  (Gaussian spectrum)
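This result is easy to check numerically. The sketch below (ours, in base MATLAB) evaluates the Hankel transform of Eq. (3.50) for the Gaussian by direct numerical integration:

% numerical check that the 2-D Gaussian maps to a Gaussian spectrum
rho = 0.7;                                % any test radial frequency
F = 2*pi*integral(@(r) r.*exp(-pi*r.^2).*besselj(0, 2*pi*rho*r), 0, Inf);
abs(F - exp(-pi*rho^2))                   % approximately 0, confirming Eq. (3.56)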
◮ Hence, the Fourier transform of a 2-D Gaussian image is itself 2-D Gaussian. ◭

D. Disk Image

A disk image has a value of 1 inside the disk area and zero outside it. A disk image centered at the origin and of radius 1/2 is characterized by Eq. (3.3) in (x, y) coordinates. Conversion to polar coordinates gives

fDisk(r) = rect(r) = { 1 for 0 ≤ r < 1/2,
                       0 otherwise.        (3.57)

After much algebra, it can be shown that the corresponding Fourier transform is given by

FDisk(ρ) = J1(πρ)/(2ρ) = jinc(ρ),        (3.58)

where J1(x) is the Bessel function of order 1, and jinc(x) = J1(πx)/(2x) is called the jinc function, which comes from its resemblance in both waveform and purpose to the sinc function defined by Eq. (2.35), except that the numerator changes from a sine to a Bessel function. Figure 3-16 displays a plot of jinc(ρ), as well as a plot of the sinc function sinc(ρ), included here for comparison. In one dimension, the Fourier transform of rect(x) is sinc(µ); in two dimensions, the Fourier transform of a disk image fDisk(r) = rect(r) is given by jinc(ρ), which resembles the variation exhibited by the sinc function.

[Figure 3-16 Sinc function sinc(ρ) = sin(πρ)/(πρ) (blue) and jinc function jinc(ρ) = J1(πρ)/(2ρ) (red).]

E. PSF of Radial Brickwall Lowpass Filter

In the spatial frequency domain, the frequency response of a radial brickwall lowpass filter with cutoff spatial frequency ρ0 extends between 0 and ρ0, and is given by

HLP(ρ) = rect(ρ/(2ρ0)) = { 1 for 0 < |ρ| < ρ0,
                           0 otherwise.

By Fourier duality, or equivalently by application of the inverse transformation given by Eq. (3.48b), we obtain the PSF of the lowpass filter as

hLP(r) = (2ρ0) J1(2πρ0r)/(2r) = (2ρ0)² jinc(2ρ0r).        (3.59)

◮ We should always remember the scaling property of the Fourier transform, namely if

f(r) ↔ F(ρ),

then for any real-valued scaling factor a,

f(ar) ↔ (1/|a|²) F(ρ/a).

The scaling property allows us to use available expressions, such as those in Eqs. (3.58) and (3.59), and to easily convert them into the expressions appropriate (for example) to disks of different sizes or filters with different cutoff frequencies. ◭

Concept Question 3-3: Why do so many 1-D Fourier transform properties generalize directly to 2-D?

Concept Question 3-4: Where does the jinc function get its name?

Exercise 3-6: Why do f(x, y) and f(x − x0, y − y0) have the same magnitude spectrum |F(µ, v)|?
Answer: Let g(x, y) = f(x − x0, y − y0). Then, from entry #3 in Table 3-1,
|G(µ, v)| = |e^{−j2πµx0} e^{−j2πvy0} F(µ, v)| = |e^{−j2πµx0}| |e^{−j2πvy0}| |F(µ, v)| = |F(µ, v)|.
Exercise 3-7: Compute the 2-D CSFT of f(x, y) = e^{−πr²}, where r² = x² + y², without using Bessel functions. Hint: f(x, y) is separable.
Answer: f(x, y) = e^{−πr²} = e^{−πx²} e^{−πy²} is separable, so Eq. (3.19) and entry #5 of Table 2-5 (see also entry #13 of Table 3-1) give
F(µ, v) = e^{−πµ²} e^{−πv²} = e^{−π(µ²+v²)} = e^{−πρ²}.

Exercise 3-8: The 1-D phase spectrum φ(µ) in Fig. 3-6(c) is either 0 or 180° for all µ. Yet the phase of the 1-D CTFT of a real-valued function must be an odd function of frequency. How can these two statements be reconciled?
Answer: A phase of 180° is equivalent to a phase of −180°. Replacing 180° with −180° for µ < 0 in Fig. 3-6(c) makes the phase φ(µ) an odd function of µ.

3-5 2-D Sampling Theorem

The sampling theorem generalizes directly from 1-D to 2-D using rectangular sampling:

f[n, m] = f(n∆, m∆) = f(n/S, m/S),        (3.60)

where ∆ is the sampling length (instead of interval) and S = 1/∆ is the sampling rate in samples/meter.
If the spectrum of image f(x, y) is bandlimited to B, that is, F(µ, v) = 0 outside the square region defined by

{ (µ, v) : 0 ≤ |µ|, |v| ≤ B },

then the image f(x, y) can be reconstructed from its samples f[n, m], provided the sampling rate is such that S > 2B. As in 1-D, 2B is called the Nyquist sampling rate, although the units are now samples/meter instead of samples/second.
The sampled signal xs(t) defined by Eq. (2.43) generalizes directly to the sampled image:

fs(x, y) = Σ_{n=−∞}^{∞} Σ_{m=−∞}^{∞} f(n∆, m∆) [δ(x − n∆) δ(y − m∆)].        (3.61)

The term inside the square brackets (product of two impulse trains) is called the bed of nails function, because it consists of a 2-D array of impulses, as shown in Fig. 3-17.

[Figure 3-17 The "bed of nails" function Σ_n Σ_m δ(x − n∆) δ(y − m∆).]

Conceptually, image f(x, y) can be reconstructed from its discretized version f(n∆, m∆) by applying the 2-D version of the sinc interpolation formula. Generalizing Eq. (2.51) to 2-D gives

f(x, y) = Σ_{n=−∞}^{∞} Σ_{m=−∞}^{∞} f(n∆, m∆) × [sin(πS(x − n∆))/(πS(x − n∆))] [sin(πS(y − m∆))/(πS(y − m∆))].        (3.62)

As noted earlier in Section 2-4.4 in connection with Eq. (2.51), accurate reconstruction using the sinc interpolation formula is not practical because it requires summations over an infinite number of samples.

3-5.1 Sampling/Reconstruction Examples

The following image examples are designed to illustrate the important role of the Nyquist rate when sampling an image f(x, y) (for storage or digital transmission) and then reconstructing it from its sampled version fs(x, y). We will use the term image reconstruction fidelity as a qualitative measure of how well the reconstructed image frec(x, y) resembles the original image f(x, y).
Reconstruction of frec(x, y) from the sampled image fs(x, y) can be accomplished through either of two approaches:
(a) Application of nearest-neighbor (NN) interpolation (which is a 2-D version of the 1-D nearest-neighbor interpola-
tion), implemented directly on image fs(x, y).
(b) Transforming image fs(x, y) to the frequency domain, applying 2-D lowpass filtering (LPF) to simultaneously preserve the central spectrum of f(x, y) and remove all copies thereof (generated by the sampling process), and then inverse transforming to the spatial domain.
Both approaches will be demonstrated in the examples that follow, and in each case we will compare an image reconstructed from an image sampled at the Nyquist rate with an aliased image reconstructed from an image sampled at a rate well below the Nyquist rate. In all cases, the following parameters apply:

• Size of original (clown) image f(x, y) and reconstructed image frec(x, y): 40 mm × 40 mm
• Sampling interval ∆ (and corresponding sampling rate S = 1/∆) and number of samples N:
  - Nyquist-sampled version: ∆ = 0.2 mm, S = 5 samples/mm, N = 200 × 200
  - Sub-Nyquist-sampled version: ∆ = 0.4 mm, S = 2.5 samples/mm, N = 100 × 100
• Spectrum of original image f(x, y) is bandlimited to B = 2.5 cycles/mm
• Display:
  - Images f(x, y), fs(x, y), frec(x, y): linear scale
  - Image magnitude spectra: logarithmic scale (for easier viewing; magnitude spectra extend over a wide range)

Reconstruction Example 1: Image Sampled at Nyquist Rate

Our first step is to create a bandlimited image f(x, y). This was done by transforming an available clown image to the spatial frequency domain and then applying a lowpass filter with a cutoff frequency of 2.5 cycles/mm. The resultant image and its corresponding spectrum are displayed in Figs. 3-18(a) and (b), respectively.

A. LPF Reconstruction

Given that image f(x, y) is bandlimited to B = 2.5 cycles/mm, the Nyquist rate is 2B = 5 samples/mm. Figure 3-18(c) displays fs(x, y), a sampled version of f(x, y), sampled at the Nyquist rate, so it should be possible to reconstruct the original image with good fidelity. The spectrum of fs(x, y) is displayed in part (d). The spectrum of the sampled image contains the spectrum of the original image (namely, the spectrum in Fig. 3-18(b)), plus periodic copies spaced at an interval S along both directions in the spatial frequency domain. To preserve the central spectrum and simultaneously remove all of the copies, a lowpass filter is applied in step (f) of Fig. 3-18. Finally, application of the 2-D inverse Fourier transform to the spectrum in part (f) leads to the reconstructed image frec(x, y) in part (e). We note that the process yields a reconstructed image with high-fidelity resemblance to the original image f(x, y).

B. NN Reconstruction

Figure 3-19 displays image f(x, y), sampled image fs(x, y), and the NN reconstructed image f̂(x, y). The last step was realized using a 2-D version of the nearest-neighbor interpolation technique described in Section 2-4.4D. NN reconstruction provides image

f̂(x, y) = fs(x, y) ∗∗ rect(x/∆) rect(y/∆),        (3.63)

which is a 2-D convolution of the 2-D sampled image fs(x, y) with a box function. The spectrum of the NN-interpolated signal is

F̂(µ, v) = Fs(µ, v) [sin(π∆µ)/(πµ)] [sin(π∆v)/(πv)].        (3.64)

As in 1-D, the zero crossings of the 2-D sinc functions coincide with the centers of the copies of F(µ, v) induced by sampling. Consequently, the 2-D sinc functions act like lowpass filters along µ and v, serving to eliminate the copies almost completely.
Comparison of the NN-interpolated image in Fig. 3-19(c) with the original image in part (a) of the figure leads to the conclusion that the NN technique works quite well for images sampled at or above the Nyquist rate.

Reconstruction Example 2: Image Sampled below the Nyquist Rate

A. LPF Reconstruction

The sequence in this example (Fig. 3-20) is identical with that described earlier in Example 1A, except for one very important difference: in the present case the sampling rate is S = 2.5 samples/mm, which is one-half of the Nyquist rate. Consequently, the final reconstructed image in Fig. 3-20(e) bears a poor resemblance to the original image in part (a) of the figure.
age with good fidelity. The spectrum of fs (x, y) is displayed
3-5 2-D SAMPLING THEOREM 109

(a) Bandlimited image f (x,y) (b) Spectrum F( μ,v) of image f (x,y)


Sampling at S = 5 samples/mm
v

FT
μ

(c) Nyquist-sampled image fs(x,y) (d) Spectrum Fs( μ,v) of sampled image
Filtering

IFT
μ

(e) LPF reconstructed image frec(x,y) (f ) Lowpass-filtered spectrum

Figure 3-18 Reconstruction Example 1A: After sampling image f (x, y) in (a) to generate fs (x, y) in (c), the sampled image is Fourier
transformed [(c) to (d)], then lowpass-filtered [(d) to (f)] to remove copies of the central spectrum) and inverse Fourier transform [(f) to (e)]
to generate the reconstructed image frec (x, y). All spectra are displayed in log scale.
110 CHAPTER 3 2-D IMAGES AND SYSTEMS

B. NN Reconstruction
The sequence in Fig. 3-21 parallels the sequence in Fig. 3-19,
except that in the present case we are working with the sub-
Nyquist sampled image. As expected, the NN interpolation
technique generates a poor-fidelity reconstruction, just like the
LPF reconstructed version.

3-5.2 Hexagonal Sampling


The transformation defined by Eq. (3.48a) converts image
f (r, θ )—expressed in terms of spatial polar coordinates (r, θ )—
(a) Bandlimited image f (x,y) to its spectrum F(ρ , φ )—expressed in terms of radial frequency
ρ and associated azimuth angle φ . Spectrum F(ρ , φ ) is said to
Sampled at be radially bandlimited to radial frequency ρ0 if:
S = 5 samples/mm
F(ρ , φ ) = 0 for ρ > ρ0 .

If f (r, θ ) is sampled along a rectangular grid—the same as


when sampling f (x, y) in rectangular coordinates—at a sam-
pling spacing ∆rect (Fig. 3-22(a)) and corresponding sampling
rate S = 1/∆rec such that S ≥ 2ρ0 (to satisfy the Nyquist rate),
then the spectrum Fs (ρ , φ ) of the sampled image fs (r, θ ) would
consist of a central disk of radius ρ0 , as shown in Fig. 3-22(b),
plus additional copies at a spacing S along both the µ and v
directions. The term commonly used to describe the sampling in
(x, y) space is tiling; the image space in Fig. 3-22(a) is tiled with
(b) Nyquist-sampled image fs(x,y) square pixels.
Square tiling is not the only type of tiling used to sample
NN 2-D images. A more efficient arrangement in terms of data rate
(or total number of samples per image) is to tile the image
space using hexagons instead of squares. Such an arrangement
is shown in Fig. 3-23(a) and is called hexagonal sampling.
The image space is tiled with hexagons. The spacing along y is
unchanged (Fig. 3-23(a)), but the spacing along x has changed
to
2
∆hex = √ ∆rect = 1.15∆rect .
3
The modest wider spacing along x translates into fewer samples
needed to tile the image, and more efficient utilization of the
spatial frequency space (Fig. 3-23(b)).
Hexagonal sampling is integral to how the human vision sys-
(c) NN reconstructed image frec(x,y)
tem functions, in part because our photoreceptors are arranged
along a hexagonal lattice. The same is true for other mammals
Figure 3-19 Reconstruction Example 1B: Nearest-neighbor as well.
(NN) interpolation for Nyquist-sampled image. Sampling rate is Reconstruction of f (r, θ ) from its hexagonal samples entails
S = 5 samples/mm. the application of a radial lowpass filter with cutoff frequency
ρ0 to Fs (ρ , φ ), followed by an inverse Fourier transformation
(using Eq. (3.48b)) to the (r, θ ) domain. A clown image ex-
3-5 2-D SAMPLING THEOREM 111

(a) Bandlimited image f (x,y) (b) Spectrum F( μ,v) of image f (x,y)


Sampling at S = 2.5 sample/mm
v

FT
μ

(c) Sub-Nyquist sampled image fs(x,y) (d) Spectrum Fs( μ,v) of sampled image
Filtering

IFT
μ

(e) LPF reconstructed image frec(x,y) (f ) Lowpass-filtered spectrum

Figure 3-20 Reconstruction Example 2A: Image f (x, y) is sampled at half the Nyquist rate (S = 2.5 sample/mm compared with 2B = 5
samples/mm). Consequently, the reconstructed image in (e) bears a poor resemblance to the original image in (a). All spectra are displayed in
log scale.
112 CHAPTER 3 2-D IMAGES AND SYSTEMS

∆rec

∆rec

(a) Bandlimited image f (x,y)


Sampled at
S = 2.5 samples/mm
x

(a) Square tiling of f ( x,y)

(b) Sub-Nyquist sampled image fs(x,y)


NN

ρ0
μ
0

(c) NN reconstructed image frec(x,y)


(b) Spectrum Fs( ρ,ϕ)
Figure 3-21 Reconstruction Example 2B: Nearest-neighbor
interpolation for sub-Nyquist-sampled image.
Figure 3-22 (a) Square tiling at a spacing ∆rect and a sampling
rate S = 1/∆rect ≥ 2ρ0 , and (b) corresponding spectrum Fs (ρ , φ )
for an image radially bandlimited to spatial frequency ρ0 .
3-6 2-D DISCRETE SPACE 113

ample with hexagonal sampling at the Nyquist rate is shown


in Fig. 3-24. Note that the clown image has been antialiased
(lowpass-filtered prior to hexagonal sampling) so that the copies
of the spectrum created by hexagonal sampling do not overlap in
Fig. 3-24(d). This is why the clown image looks blurred, but the
reconstructed clown image matches the original blurred image.
∆rec
Concept Question 3-5: Why is the sampling theorem
important in image processing?

Exercise 3-9: An image is spatially bandlimited to 10


cycles/mm in both the x and y directions. What is the
minimum sampling length ∆s required in order to avoid
aliasing?
1 1
Answer: ∆s < 2B = 20 mm.
x
∆hex
y
Exercise 3-10: The 2-D CSFT of a 2-D impulse is 1, so it
is not spatially bandlimited. Why is it possible to sample an
impulse?
(a) Hexagonal tiling of f ( x,y)
Answer: In general, it isn’t. Using a sampling interval of
∆s , the impulse δ (x − x0 , y − y0 ) will be missed unless x0
and y0 are both integer multiples of ∆s .

ν
3-6 2-D Discrete Space
3-6.1 Discrete-Space Images
A discrete-space image represents a physical quantity that
varies with discrete space [n, m], where n and m are dimension-
ρ0 less integers. Such an image usually is generated by sampling a
continuous-space image f (x, y) at a spatial interval ∆s along the
μ x and y directions. The sampled image is defined by
0
f [n, m] = f (n∆s , m∆s ). (3.65)

The spatial sampling rate is Ss = 1/∆s , [n, m] denotes the


location of a pixel (picture element), and f [n, m] denotes the
value (such as image intensity) of that pixel.

A. Image Axes
(b) Spectrum Fs( ρ,ϕ) As noted earlier in connection with continuous-space images,
multiple different formats are used in both continuous- and
discrete-space to define image coordinates. We illustrate the
Figure 3-23 (a) Hexagonal tiling and (b) corresponding spec- most common of these formats in Fig. 3-25. In the top of the
trum Fs (ρ , φ ). figure, we show pixel values for a 10 × 10 array. In parts (a)
114 CHAPTER 3 2-D IMAGES AND SYSTEMS

(a) Radially bandlimited image f (r,θ) (b) Spectrum F( ρ,ϕ) of image f (r,θ)
Sampling at 2ρ0
v

FT
μ

(c) Hexagonally sampled image fs(r,θ) (d) Spectrum Fs( ρ,ϕ) of sampled image
Filtering

IFT
μ

(e) Radial LPF reconstructed image frec(r,θ) (f ) Radial lowpass-filtered spectrum

Figure 3-24 Hexagonal sampling and reconstruction example.


3-6 2-D DISCRETE SPACE 115

0 0 0 0 0 0 0 0 0 0
0 3 5 7 9 10 11 12 13 14
0 5 10 14 17 20 23 25 27 29
0 8 15 21 26 30 34 37 40 43
0 10 20 27 34 40 45 50 54 57
0 10 20 27 34 40 45 50 54 57
0 8 15 21 26 30 34 37 40 43
0 5 10 14 17 20 23 25 27 29
0 3 5 7 9 10 11 12 13 14
0 0 0 0 0 0 0 0 0 0
Pixel values
Origin
m
0 9
1 8
2 7
3 6
4 5
5 4
6 3
7 2
8 1
9 0
n n
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
m
(a) Top-left corner format (b) Bottom-left corner format
Origin

m
4 1
3 2
2 3
1 4
0 5
−1
n 6
−2 7
−3 8
−4 9
−5 10
−5 −4 −3 −2 −1 0 1 2 3 4 1 2 3 4 5 6 7 8 9 10
n′
m′
(c) Center-of-image format (d) MATLAB format

Figure 3-25 The four color images are identical in pixel values, but they use different formats for the location of the origin and for coordinate
directions for image f [n, m]. The MATLAB format in (d) represents X(m′ , n′ ).
116 CHAPTER 3 2-D IMAGES AND SYSTEMS

through (d), we display the same color maps corresponding to MATLAB format
the pixel array, except that the [n, m] coordinates are defined
differently, namely: In MATLAB, an (M × N) image is defined as

{ X(m′ , n′ ), 1 ≤ m′ ≤ M, 1 ≤ n′ ≤ N }. (3.68)
Top-left corner format
MATLAB uses the top-left corner format, except that its
Figure 3-25(a): [n, m] starts at [0, 0] and both integers extend indices n′ and m′ start at 1 instead of 0. Thus, the top-left corner
to 9, the origin is located at the upper left-hand corner, m in- is (1, 1) instead of [0, 0]. Also, m′ , the first index in X(m′ , n′ ),
creases downward, and n increases to the right. For a (M × N) represents the vertical axis and the second index, n′ , represents
image, f [n, m] is defined as the horizontal axis, which is the reverse of the index notation
represented in f [n, m]. The two notations are related as follows:
{ f [n, m], 0 ≤ n ≤ N − 1, 0 ≤ m ≤ M − 1 } (3.66a)
f [n, m] = X(m′ , n′ ), (3.69a)
or, equivalently,
with
f [n, m] =
  m′ = m + 1, (3.69b)
f [0, 0] f [1, 0] . . . f [N − 1, 0]
 f [0, 1] f [1, 1] . . . f [N − 1, 1]  ′
n = n + 1. (3.69c)
 .. .. . (3.66b)
 
. .
f [0, M − 1] f [1, M − 1] . . . f [N − 1, M − 1]

Bottom-left corner format


Common-Image vs. MATLAB Format
Figure 3-25(b): [n, m] starts at [0, 0] and both integers extend
to 9, the origin is located at the bottom left-hand corner, m in-
creases upward, and n increases to the right. This is a vertically The format used by MATLAB to store images is different
flipped version of the image in Fig. 3-25(a), and f [n, m] has the from the conventional matrix format given in Eq. (3.66b) in
same definition given by Eq. (3.66a). two important ways:

(1) Whereas the pixel at the upper left-hand corner of image


Center-of-image format f [n, m] is f [0, 0], at location (0, 0), that pixel is denoted
X(1, 1) in MATLAB.
Figure 3-25(c): The axes directions are the same as in the image
of Fig. 3-25(b), except that the origin of the coordinate system (2) In f [n, m], the value of n denotes the column that f [n, m]
is now located at the center of the image. The ranges of n and resides within (relative to the most-left column, which is
m depend on whether M and N are odd or even integers. If pixel denoted by n = 0), and m denotes the row that f [n, m]
[0, 0] is to be located in the center in the [n, m] coordinate system, resides within (relative to the top row, which is denoted by
then the range of m is m = 0). In MATLAB, the notation is reversed: the first index
of X(m′ , n′ ) denotes the row and the second index denotes
M−1 M−1
− ≤m≤ , if M is odd, (3.67a) the column. Hence,
2 2
M M f [n, m] = X(m′ , n′ )
− ≤ m ≤ −1 , if M is even. (3.67b)
2 2
with m′ and n′ related to n and m by Eq. (3.69). To
A similar definition applies to index n. distinguish between the two formats, f [n, m] uses square
brackets whereas X(m′ , n′ ) uses curved brackets.
3-6 2-D DISCRETE SPACE 117

Symbolically, the 2-D convolution is represented by


◮ From here on forward, the top-left-corner format will be
used for images, the center-of-image format will be used
for image spectra and PSFs, and the MATLAB format will f [n, m] h[n, m] g[n, m].
be used in MATLAB arrays. ◭
The properties of 1-D convolution are equally applicable in 2-D
discrete-space. Convolution of images of sizes (L1 × L2 ) and
B. Impulses and Shifts (M1 × M2 ) yields an image of size (N1 × N2 ), where

In 2-D discrete-space, impulse δ [n, m] is defined as N1 = L1 + M1 − 1, (3.72a)


( N2 = L2 + M2 − 1. (3.72b)
1 if n = m = 0,
δ [n, m] = δ [n] δ [m] = (3.70)
0 otherwise. The 2-D convolution process is illustrated next through a simple
example.
In upper-left-corner image format, shifting an image
f [n, m] by m0 downward and n0 rightward generates image
f [n − n0, m − m0]. An example is shown in Fig. 3-26.

Example 3-1: 2-D Convolution


Image Size
N columns Compute the 2-D convolution
   
1 2 5 6
∗∗ .
M rows 3 4 7 8

n Solution: By using entries in the first image as weights, we


m (M × N) image have
       
As noted earlier in Chapter 1, the size of an image is denoted 5 6 0 0 5 6 0 0 0 0 0 0
by (# of rows × # of columns) = M × N. 1 7 8 0 + 2 0 7 8 + 3 5 6 0 + 4 0 5 6
0 0 0 0 0 0 7 8 0 0 7 8
 
5 16 12
= 22 60 40 .
3-6.2 Discrete-Space Systems 21 52 32
For discrete-space systems, linearity and shift invariance follow
analogously from their continuous-space counterparts. When a Concept Question 3-6: Why do we bother studying
linear shift-invariant (LSI) system characterized by a point discrete-space images and systems, when almost all real-
spread function h[n, m] is subjected to an input f [n, m], it world systems and images are defined in continuous space?
generates an output g[n, m] given by the 2-D convolution of
f [n, m] and h[n, m]:
Exercise 3-11: If g[n, m] = h[n, m] ∗ ∗ f [n, m], what is
g[n, m] = h[n, m] ∗ ∗ f [n, m] 4h[n, m] ∗ ∗ f [n − 3, m − 2] in terms of g[n, m]?
∞ ∞
Answer: 4g[n − 3, m − 2], using the shift and scaling
= ∑ ∑ h[i, j] f [n − i, m − j]. (3.71)
properties of 1-D convolutions.
i=−∞ j=−∞
118 CHAPTER 3 2-D IMAGES AND SYSTEMS

Column 3 of f [n,m]

Image f [n,m]
Image f [n − 2, m − 1]
Row 5 of f [n,m]

Figure 3-26 Image f [n − 2, m − 1] is image f [n, m] shifted down by 1 and to the right by 2.

    Conjugate symmetry for real-valued images f [m, n] implies that


1 1 1 1
Exercise 3-12: Compute ∗∗ .
1 1 1 1 F∗ (Ω1 , Ω2 ) = F(−Ω1 , −Ω2 ). (3.74)
 
1 2 1 As in the 2-D continuous-space Fourier transform, F(Ω1 , Ω2 )
Answer: 2 4 2 . must be reflected across both spatial frequency axes to produce
1 2 1 its complex conjugate.
The DSFT is doubly periodic in (Ω1 , Ω2 ) with periods 2π
along each axis, as demonstrated by Example 3-2.

3-7 2-D Discrete-Space Fourier ◮ The spectrum of an image f [n, m] is its DSFT F(Ω1 , Ω2 ).
Transform (DSFT) The discrete-space frequency response H(Ω1 , Ω2 ) of an
LSI system is the DSFT of its point spread function (PSF)
The 2-D discrete-space Fourier transform (DSFT) is obtained h[n, m]. ◭
via direct generalization of the 1-D DTFT (Section 2-6) to
2-D. The DSFT consists of a DTFT applied first along m
and then along n, or vice versa. By extending the 1-D DTFT
definition given by Eq. (2.73a) (as well as the properties listed Example 3-2: DSFT of Clown Image
in Table 2-7) to 2-D, we obtain the following definition for the
DSFT F(Ω1 , Ω2 ) and its inverse f [n, m]:
∞ ∞ Use MATLAB to obtain the magnitude image of the DSFT of
F(Ω1 , Ω2 ) = ∑ ∑ f [n, m] e− j(Ω1 n+Ω2 m) , (3.73a) the clown image.
n=−∞ m=−∞
Z π Z π
1 Solution: The magnitude part of the DSFT is displayed in
f [n, m] = F(Ω1 , Ω2 )
4π 2 −π −π Fig. 3-27. As expected, the spectrum is periodic with period 2π
×e j(Ω1 n+Ω2 m)
dΩ1 dΩ2 . (3.73b) along both Ω1 and Ω2 .

The properties of the DSFT are direct 2-D generalizations of Concept Question 3-7: What is the DSFT used for? Give
the properties of the DTFT, and discrete-time generalizations of three applications.
the properties of the 2-D continuous-space Fourier transform.
3-8 2-D DISCRETE FOURIER TRANSFORM (2-D DFT) 119

format. That conversion is called the discrete Fourier transform


Ω2 (DFT). With the DFT, both f [n, m] and its 2-D Fourier transform
4π operate in discrete domains. For an (M × N) image f [n, m],
generalizing Eq. (2.89) to 2-D leads to the 2-D DFT of order
(K2 × K1 ):
N−1 M−1   
nk1 mk2
F[k1 , k2 ] = ∑ ∑ f [n, m] exp − j2π
K1
+
K2
,
n=0 m=0
Ω1 k1 = { 0, . . . , K1 − 1 }, k2 = { 0, . . . , K2 − 1 }, (3.75)

where we have converted (Ω1 , Ω2 ) into discrete indices (k1 , k2 )


by setting

Ω1 = k1
K1
−4π and
−4π 4π 2π
Ω2 = k2 .
K2
Figure 3-27 DSFT magnitude of clown image (log scale). Array F[k1 , k2 ] is given by the (K2 × K1 ) array
 
F[0, 0] F[1, 0] . . . F[K1 − 1, 0]
 F[0, 1] F[1, 1] . . . F[K1 − 1, 1] 
Exercise 3-13: An LSI system has a PSF given by F[k1 , k2 ] = 
 .. .. .

. .
  F[0, K2 − 1] F[1, K2 − 1] . . .F[K1 − 1, K2 − 1]
1 2 1
(3.76)
h[n, m] = 2 4 2 .
Note that the indexing is the same as that used for f [n, m]. The
1 2 1
inverse DFT is
Compute its spatial frequency response H(Ω1 , Ω2 ). Hint:   
1 K1 −1 K2 −1 nk1 mk2
K1 K2 k∑ ∑ F[k1 , k2 ] exp j2π K1 + K2 ,
h[n, m] is separable. f [n, m] =
=0 k =0
1 2
Answer: The DTFT was defined in Eq. (2.73). Recog-
nizing that h[n, m] = h1 [n] h1 [m] with h1 [n] = {1, 2, 1}, the n = { 0, . . . , N − 1 }, m = { 0, . . . , M − 1 }. (3.77)
DSFT is the product of two 1-D DTFTs, each of the form
◮ Note that if N < K1 and M < K2 , then the re-
H1 (Ω) = e jΩ + 2 + e− jΩ = 2 + 2 cos(Ω),
constructed image f [n, m] = 0 for N ≤ n ≤ K1 − 1 and
and therefore M ≤ m ≤ K2 − 1. ◭

H(Ω1 , Ω2 ) = [2 + 2 cos(Ω1 )][2 + 2 cos(Ω2 )].


3-8.1 Properties of the 2-D DFT
The 2-D DFT, like the 2-D CSFT and 2-D DSFT, consists of a
3-8 2-D Discrete Fourier Transform 1-D transform along either the horizontal or vertical direction,
(2-D DFT) followed by another 1-D transform along the other direction.
Accordingly, the 2-D DFT has the following properties listed
According to Eq. (3.73), while f [n, m] is a discrete function, in Table 3-3, which are direct generalizations of the 1-D DFT
its DSFT F(Ω1 , Ω2 ) is a continuous function of Ω1 and Ω2 . properties listed earlier in Table 2-9.
Numerical computation using the fast Fourier transform (FFT) The cyclic convolution h[n] c x[n] was defined in Eq. (2.104).
requires an initial step of converting F(Ω1 , Ω2 ) into discrete The 2-D DFT maps 2-D cyclic convolutions to products, and
120 CHAPTER 3 2-D IMAGES AND SYSTEMS

Table 3-3 Properties of the (K2 × K1 ) 2-D DFT. In the time-shift and modulation properties, (k1 − k1′ ) and (n − n0 ) must be reduced
mod(K 1 ), and (k2 − k2′ ) and (m − m0 ) must be reduced mod(K 2 ).

Selected Properties

1. Linearity ∑ ci fi [n, m] ∑ ci F[k1 , k2 ]


2. Shift f [(n − n0 ), (m − m0 )] e− j2π k1 n0 /K1 e− j2π k2 m0 /K2 F[k1 , k2 ]
3. Modulation e j2π k1′ n/K1 e j2π k2′ m/K2 f [n, m] F[(k1 − k1′ ), (k2 − k2′ )]
4. Reversal f [(N − n), (M − m)] F[(K 1 − k1 ), (K 2 − k2 )]
5. Convolution h[n, m]
c c f [n, m] H[k1 , k2 ] F[k1 , k2 ]

Special DFT Relationships


6. Conjugate Symmetry for f [n, m] real F∗ [k1 , k2 ] = F[(K 1 − k1 ), (K 2 − k2 )]
N−1 M−1
7. Zero spatial frequency F[0, 0] = ∑ ∑ f [n, m]
n=0 m=0
K1 −1 K2 −1
1
8. Spatial origin f [0, 0] =
K1 K2 ∑ ∑ F[k1 , k2 ]
k1 =0 k2 =0
N−1 M−1 K1 −1 K2 −1
1
9. Rayleigh’s theorem ∑ ∑ | f [n, m]|2 = ∑ ∑ |F[k1 , k2 ]|2
n=0 m=0 K1 K2 k1 =0 k2 =0

linear 2-D convolutions h[n, m] ∗ ∗ f [n, m] can be zero-padded to 3-8.2 Conjugate Symmetry for the 2-D DFT
cyclic convolutions, just as in 1-D.

Concept Question 3-8: Why is the 2-D DFT defined


only over a finite region, while the DSFT is defined over
all spatial frequency space?

Given an (M × N) image f [n, m], the expression given by


Exercise 3-14:  an expression for the (256 × 256)
 Compute Eq. (3.75) allows us to compute the 2-D DFT of f [n, m] for any
1 2 1 order (K2 × K1 ). If f [n, m] is real, then conjugate symmetry
2-D DFT of 2 4 2. Use the result of Exercise 3-11. holds:
1 2 1
Answer: The 2-D DFT is the DSFT sampled at F∗ [k1 , k2 ] = F[K1 − k1 , K2 − k2 ], (3.78)
Ωi = 2π ki /256 for i = 1, 2. Substituting in the answer to 1 ≤ k1 ≤ K1 − 1; 1 ≤ k2 ≤ K2 − 1.
Exercise 3-11 gives
[Compare this statement with the conjugate symmetry of the
     
k1 k2 1-D DFT X[k] of a real-valued signal x[n], as given by
X[k1 , k2 ] = 2 + 2 cos 2π 2 + 2 cos 2π , Eq. (2.98).]
256 256
The conjugate-symmetry relation given by Eq. (3.78) states
0 ≤ k1 , k2 ≤ 255. that an array element F[k1 , k2 ] is equal to the complex conjugate
of array element F[K1 − k1 , K2 − k2 ], and vice versa. To
demonstrate the validity of conjugate symmetry, we start by
rewriting Eq. (3.75) with k1 and k2 replaced with (K1 − k1 ) and
3-9 COMPUTATION OF THE 2-D DFT USING MATLAB 121


n − j(2π /K2 )mk2 ,
(K2 − k2), respectively: (4) F[K1 /2, k2 ] = ∑M−1 N−1
m=0 ∑n=0 f [n, m] (−1) e
which is the K2 -point 1-D DFT of ∑N−1 n=0 f [n, m] (−1)
n
F[K1 − k1 , K2 − k2 ] − j(2 π / K1 )n(K1 /2) j π n n
   because e = e = (−1) .
N−1 M−1
n(K1 − k1 ) m(K2 − k2)
= ∑ ∑ f [n, m] exp − j2π + 
m − j(2π /K1 )nk1
n=0 m=0 K1 K2 (5) F[k1 , K2 /2] = ∑N−1 M−1
n=0 ∑m=0 f [n, m] (−1) e ,
N−1 M−1    M−1
which is the K1 -point 1-D DFT of ∑m=0 f [n, m] (−1)m
nk1 mk2
= ∑ ∑ f [n, m] exp j2π + because e− j(2π /K2)m(K2 /2) = e jπ m = (−1)m .
n=0 m=0 K1 K2
  
nK1 mK2 (6) F[K1 /2, K2 /2] = ∑N−1 M−1 m+n .
× exp − j2π + n=0 ∑m=0 f [n, m] (−1)
K1 K2
N−1 M−1   
nk1 mk2
= ∑ ∑ f [n, m] exp j2π +
n=0 m=0 K1 K2
3-9 Computation of the 2-D DFT Using
× e− j2π (n+m)
N−1 M−1    MATLAB
nk1 mk2
= ∑ ∑ f [n, m] exp j2π
K1
+
K2
, (3.79)
We remind the reader that the notation used in this book
n=0 m=0
represents images f [n, m] defined in Cartesian coordinates, with
where we used e− j2π (n+m) = 1 because n and m are integers. The the origin at the upper left corner, the first element n of the
expression on the right-hand side of Eq. (3.79) is identical to the coordinates [n, m] increasing horizontally rightward from the
expression for F[k1 , k2 ] given by Eq. (3.75) except for the minus origin, and the second element m of the coordinates [n, m]
sign ahead of j. Hence, for a real-valued image f [n, m], increasing vertically downward from the origin. To illustrate
with an example, let us consider the (3 × 3) image given by
   
F∗ [k1 , k2 ] = F[K1 − k1 , K2 − k2 ], (3.80) f [0, 0] f [1, 0] f [2, 0] 3 1 4
1 ≤ k1 ≤ K1 − 1; 1 ≤ k2 ≤ K2 − 1, f [n, m] =  f [0, 1] f [1, 1] f [2, 1] = 1 5 9 . (3.81)
f [0, 2] f [1, 2] f [2, 2] 2 6 5
where (K2 × K1 ) is the order of the 2-D DFT. When stored in MATLAB as array X(m′ , n′ ), the content remains
the same, but the indices swap roles and their values start at
3-8.3 Special Cases (1, 1):
   
A. f [n, m] is real X(1, 1) X(1, 2) X(1, 3) 3 1 4
X(m′ , n′ ) = X(2, 1) X(2, 2) X(2, 3) = 1 5 9 .
If f [n, m] is a real-valued image, the following special cases X(3, 1) X(3, 2) X(3, 3) 2 6 5
hold: (3.82)
Arrays f [n, m] and X(m′ , n′ ) are displayed in Fig. 3-28.
(1) F[0, 0] = ∑N−1 M−1
n=0 ∑m=0 f [n, m] is real-valued. Application of Eq. (3.75) with N = M = 3 and K1 = K2 = 3
 − j(2π /K )mk to the 3 × 3 image defined by Eq. (3.81) leads to
(2) F[0, k2 ] = ∑M−1 N−1
m=0 ∑n=0 f [n, m] e
2 2,
N−1  
which is the K2 -point 1-D DFT of ∑n=0 f [n, m]. 36 −9 + j5.2 −9 − j5.2
 − j(2π /K )nk F[k1 , k2 ] = −6 − j1.7 9 + j3.5 1.5 + j0.9 . (3.83)
(3) F[k1 , 0] = ∑N−1 M−1
n=0 ∑m=0 f [n, m] e
1 1,
−6 + j1.7 1.5 − j0.9 9 − j3.5
M−1
which is the K1 -point 1-D DFT of ∑m=0 f [n, m].
◮ In MATLAB, the command FX=fft2(X,M,N) com-
B. f [n, m] is real and K1 and K2 are even putes the (M × N) 2-D DFT of array X and stores it in array
If also K1 and K2 are even, then the following relations apply: FX. ◭
122 CHAPTER 3 2-D IMAGES AND SYSTEMS

3 × 3 Image
 
3 1 4
1 5 9
2 6 5

Common Image Format MATLAB Format


n n′
   
f [0, 0] = 3 f [1, 0] = 1 f [2, 0] = 4 X(1, 1) = 3 X(1, 2) = 1 X(1, 3) = 4
m f [n, m] =  f [0, 1] = 1 f [1, 1] = 5 f [2, 1] = 9 m′ X(m′ , n′ ) = X(2, 1) = 1 X(2, 2) = 5 X(2, 3) = 9
f [0, 2] = 2 f [1, 2] = 6 f [2, 2] = 5 X(3, 1) = 2 X(3, 2) = 6 X(3, 3) = 5

DFT fft2(X)

F[k1 , k2 ] = FX(k2′ , k1′ ) =


   
F[0, 0] = 36 F[1, 0] = −9 + j5.2 F[2, 0] = −9 − j5.2 FX(1, 1) = 36 FX(1, 2) = −9 + j5.2 FX(1, 3) = −9 − j5.2
 F[0, 1] = −6 − j1.7 F[1, 1] = 9 + j3.5 F[2, 1] = 1.5 + j0.9   FX(2, 1) = −6 − j1.7 FX(2, 2) = 9 + j3.5 FX(2, 3) = 1.5 + j0.9 
F[0, 2] = −6 + j1.7 F[1, 2] = 1.5 − j0.9 F[2, 2] = 9 − j3.5 FX(3, 1) = −6 + j1.7 FX(3, 2) = 1.5 − j0.9 FX(3, 3) = 9 − j3.5

Shift to Center fftshift(fft2(X))

   
9 − j3.5 −6 + j1.7 1.5 − j0.9 9 − j3.5 −6 + j1.7 1.5 − j0.9
′ ′
Fc [k1c , k2c ] = −9 − j5.2 36 −9 + j5.2 FXC(k2c , k1c ) = −9 − j5.2 36 −9 + j5.2
1.5 + j0.9 −6 − j1.7 9 + j3.5 1.5 + j0.9 −6 − j1.7 9 + j3.5

Figure 3-28 In common-image format, application of the 2-D DFT to image f [n, m] generates F[k1 , k2 ]. Upon shifting F[k1 , k2 ] along k1 and k2 to
center the image, we obtain the center-of-image format represented by Fc [k1c , k2c ]. The corresponding sequence in MATLAB starts with X(m′ , n′ ) and
′ , k′ ).
concludes with FXC(k2c 1c

The corresponding array in MATLAB, designated FX(k2′ , k1′ ) 3-9.1 Center-of-Image Format
and displayed in Fig. 3-28, has the same content but with
MATLAB indices (k2′ , k1′ ). Also, k2′ increases downward and In some applications, it is more convenient to work with the
k1′ increases horizontally. The relationships between MATLAB 2-D DFT array when arranged in a center-of-image format
indices (k2′ , k1′ ) and common-image format indices (k1 , k2 ) are (Fig. 3-25(c) but with the vertical axis pointing downward) than
identical in form to those given by Eq. (3.69), namely in the top-left corner format. To convert array F[k1 , k2 ] to a
center-of-image format, we need to shift the array elements to
k2′ = k2 + 1, (3.84)
the right and downward by an appropriate number of steps so as
k1′ = k1 + 1. (3.85) to locate F[0, 0] in the center of the array. If we denote the 2-D
3-9 COMPUTATION OF THE 2-D DFT USING MATLAB 123

DFT in the center-of-image format as Fc [k1c , k2c ], then its index with k1 = K1 − k1c , k2 = K2 − k2c ,
k1c extends over the range
   
Ki − 1 Ki − 1
− ≤ k1c ≤ , for Ki = odd, (3.86) (d) Fourth Quadrant
2 2
and Fc [k1c , −k2c ] = F[k1 , k2 ], (3.90d)

Ki Ki with k1 = k1c , k2 = K2 − k2c .


− ≤ k1c ≤ − 1, for Ki = even. (3.87)
2 2
To obtain Fc [k1c , k2c ] from F[k1 , k2 ] for the array given by To demonstrate the recipe, we use a numerical example with
Eq. (3.83), we circularly shift the array by one unit to the right K1 = K2 = 3, and again for K1 = K2 = 4 because the recipe is
and one unit downward, which yields different for odd and even integers.
Fc [k1c ,k2c ] =
 
Fc [−1,1] = 9 − j3.5 Fc [0,1] = −6 + j1.7 Fc [1,1] = 1.5 − j0.9
 Fc [−1,0] = −9 − j5.2 Fc [0,0] = 36 Fc [1,0] = −9 + j5.2  .
Fc [−1,−1] = 1.5 + j0.9 Fc [0,−1] = −6 − j1.7 Fc [1,−1] = 9 + j3.5
3-9.2 Odd and Even Image Examples
(3.88)

A. N = M and K1 = K2 = odd
◮ In MATLAB, the command
FXC=fftshift(fft2(FX)) The (3 × 3) image shown in Fig. 3-28 provides an example of an
(M × M) image with M being an odd integer. As noted earlier,
shifts array FX to center-image format and stores it in array when the 2-D DFT is displayed in the center-of-image format,
FXC. ◭ the conjugate symmetry about the center of the array becomes
readily apparent.
In the general case for any integers K1 and K2 , transform-
ing the 2-D DFT F[k1 , k2 ] into the center-of-image format
Fc [k1c , k2c ] entails the following recipe:
For A. N = M and K1 = K2 = even
(
′ Ki /2 − 1 if Ki is even, Let us consider the (4 × 4) image
Ki = (3.89)
(Ki − 1)/2 if Ki is odd,  
1 2 3 4
and 0 ≤ k1c , k2c ≤ K′i : 2 4 5 3
f [n, m] =  . (3.91)
3 4 6 2
(a) First Quadrant 4 3 2 1

Fc [k1c , k2c ] = F[k1 , k2 ], (3.90a) The (4 × 4) 2-D DFT F[k1 , k2 ] of f [n, m], displayed in the upper-
left corner format, is
with k1 = k1c and k2 = k2c ,
 
F[0, 0] F[1, 0] F[2, 0] F[3, 0]
(b) Second Quadrant
F[0, 1] F[1, 1] F[2, 1] F[3, 1]
F=
Fc [−k1c , k2c ] = F[k1 , k2 ], (3.90b) F[0, 2] F[1, 2] F[2, 2] F[3, 2]
F[0, 3] F[1, 3] F[2, 3] F[3, 3]
with k1 = K1 − k1c , k2 = k2c ,  
49 −6 − j3 3 −6 + j3
−5 − j4 2 + j9 −5 + j2 j 
(c) Third Quadrant = . (3.92)
1 −4 + j3 −1 −4 − j3
Fc [−k1c , −k2c ] = F[k1 , k2 ], (3.90c) −5 + j4 −j −5 − j2 2 − j9
124 CHAPTER 3 2-D IMAGES AND SYSTEMS

Application of the recipe given by Eq. (3.90) leads to

Fc (k1c , k2c )
 ′ 
F [−2, −2] F′ [−1, −2] F′ [0, −2] F′ [1, −2]
′ ′ ′ ′
 F [−2, −1] F [−1, −1] F [0, −1] F [1, −1] 
= ′
F [−2, 0] F′ [−1, 0] F′ [0, 0] F′ [1, 0] 
′ ′
F [−2, −1] F [−1, 1] ′
F [0, 1] F′ [1, 1]
 
−1 −4 − j3 1 −4 + j3
−5 − j2 2 − j9 −5 + j4 −j 
= . (3.93)
3 −6 + j3 49 −6 − j3
−5 + j2 j −5 − j4 2 + j9

Now conjugate symmetry Fc [−k1c , −k2c ] = F∗c [k1c , k2c ] applies,


but only after omitting the first row and column of Fc [k1c , k2c ].
This is because the dc value (49) is not at the center of the array.
Indeed, there is no center pixel in an M × M array if M is even.
It is customary in center-of-image depictions to place the origin
at array coordinates [ M2 + 1, M2 + 1]. The first row and column
are not part of the mirror symmetry about the origin. This is not
noticeable in M × M arrays if M is even and large.
Conjugate symmetry applies within the first row and within
the first column, since these are 1-D DFTs with M2 = 2.

Exercise 3-15: Why did this book not spend more space on
computing the DSFT?
Answer: Because in practice, the DSFT is computed using
the 2-D DFT.
3-9 COMPUTATION OF THE 2-D DFT USING MATLAB 125

Summary
Concepts
• Many 2-D concepts are generalizations of 1-D coun- space images, on which discrete-space image processing
terparts. These include: LSI systems, convolution, sam- can be performed.
pling, 2-D continuous-space; 2-D discrete-space; and • Nearest-neighbor interpolation often works well for in-
2-D discrete Fourier transforms. terpolating sampled images to continuous space. Hexag-
• The 2-D DSFT is doubly periodic in Ω1 and Ω2 with onal sampling can also be used.
periods 2π . • Discrete-space images can be displayed in several differ-
• Rotating an image rotates its 2-D continuous-space ent formats (the location of the origin differs).
Fourier transform (CSFT). The CSFT of a radially sym- • The response of an LSI system with point spread
metric image is radially symmetric. function h(x, y) to image f (x, y) is output g(x, y) =
• Continuous-space images can be sampled to discrete- h(x, y) ∗ ∗ f (x, y), and similarly in discrete space.

Mathematical Formulae
Impulse Ideal radial lowpass filter PSF
δ (r) h(r) = 4ρ02 jinc(2ρ0 r)
δ (x, y) = δ (x) δ (y) =
πr
Energy of f (x, y) 2-D Sampling
Z ∞Z ∞ 1
E= | f (x, y)|2 dx dy Sampling interval > 2B if F(µ , ν ) = 0 for |µ |, |ν | > B

−∞ −∞

Convolution 2-D Sinc interpolation formula


h(x, y) ∗ f (x, y) = f (x, y) =    
∞ ∞
Z ∞Z ∞ x − n∆ y − m∆
f (ξ , η ) h(x − ξ , y − η ) d ξ d η
∑ ∑ f (n∆, m∆) sinc ∆ sinc

n=−∞ m=−∞
−∞ −∞
Discrete-space Fourier transform (DSFT)
Convolution ∞ ∞
∞ ∞ F(Ω1 , Ω2 ) = ∑ ∑ f [n, m] e− j(Ω1 n+Ω2 m)
h[n, m] ∗ ∗ f [n, m] = ∑ ∑ h[i, j] f [n − i, m − j] n=−∞ m=−∞
i=−∞ j=−∞
(K2 × K1 ) 2-D DFT of (M × N) image
Fourier transform
Z Z (CSFT) N−1 M−1
∞ ∞
F(µ , ν ) = f (x, y) e − j2π ( µ x+ν y)
dx dy F[k1 , k2 ] = ∑ ∑ f [n, m] e− j2π (nk1/K1 +mk2 /K2 )
−∞ −∞ n=0 m=0

Inverse CSFT Inverse 2-D DFT


Z ∞Z ∞
f (x, y) = F(µ , ν ) e j2π (µ x+ν y) d µ d ν 1 K1 −1 K2 −1
−∞ −∞
f [n, m] =
K1 K2 k∑ ∑ F[k1 , k2 ] e j2π (nk1/K1 +mk2 /K2)
=0 k =0
1 2
Ideal square lowpass filter PSF
h(x, y) = 4µ02 sinc(2 µ0 x) sinc(2ν0 y)

Important Terms Provide definitions or explain the meaning of the following terms:
aliasing CSFT DSFT linear shift-invariant (LSI) point spread function sampling theorem
convolution DFT FFT nearest-neighbor interpolation sampled image sinc function
126 CHAPTER 3 2-D IMAGES AND SYSTEMS

PROBLEMS its samples using NN interpolation, but displays spectra at each


stage. Explain why the image reconstructed from its samples
Section 3-2: 2-D Continuous-Space Images matches the clown image.

3.1 Let f (x, y) be an annulus (ring) with inner radius 3 and


outer radius 5, with center at the point (2,4). Express f (x, y) in Section 3-6: 2-D Discrete Space
terms of fDisk (x, y).
3.10 Compute the 2-D convolution
Section 3-3: Continuous-Space Systems
   
3 1 5 9
3.2 Compute the autocorrelation ∗∗
4 1 2 6
r(x, y) = fBox (x, y) ∗ ∗ fBox(−x, −y) by hand. Check your answer using MATLAB’s conv2.
of fBox (x, y). 3.11 An LTI system is described by the equation

Section 3-4: 2-D Continuous-Space Fourier 1 1 1


g[n, m] = f [n − 1, m − 1] + f [n − 1, m] + f [n − 1, m + 1]
Transform (CSFT) 9 9 9
1 1 1
+ f [n, m − 1] + f [n, m] + f [n, m + 1]
3.3 Compute the 2-D CSFT F(µ , ν ) of a 10 × 6 ellipse f (x, y). 9 9 9
1 1 1
3.4 A 2-D Gaussian function has the form + f [n + 1, m − 1] + f [n + 1, m] + f [n + 1, m + 1].
9 9 9
2 2)
e−r /(2σ What is the PSF h[n, m] of the system? Describe in words what
fg (r) = .
2πσ 2 it does to its input.
Compute the 2-D CSFT fg (ρ ) of fg (r) using the scaling property 3.12 An LTI system is described by the equation
of the 2-D CSFT.
3.5 Compute the CSFT of an annulus (ring) f (x, y) with inner g[n, m] = 9 f [n − 1, m − 1] + 8 f [n − 1, m] + 7 f [n − 1, m + 1]
radius 3 and outer radius 5, with center at: + 6 f [n, m − 1] + 5 f [n, m] + 4 f [n, m + 1]
(a) the origin (0,0); + 3 f [n + 1, m − 1] + 2 f [n + 1, m] + f [n + 1, m + 1].
(b) the point (2,4).
What is the PSF h[n, m] of the system? Use center-of-image
notation (Fig. 3-25(c)).
Section 3-5: 2-D Sampling Theorem
3.6 f (x, y) = cos(2π 3x) cos(2π 4y) is sampled every ∆ = 0.2. Section 3-7: Discrete-Space Fourier Transform
What is reconstructed by a brick-wall lowpass filter with cutoff (DSFT)
= 5 in µ and ν and passband gain ∆ = 0.2?
3.7 f (x, y) = cos(2π 12x) cos(2π 15y) is sampled every
3.13 Prove the shift property of the DSFT:
∆ = 0.1. What is reconstructed by a brick-wall lowpass filter
with cutoff = 10 in µ and ν and passband gain ∆ = 0.1?
f [n − a, m − b] e− j(aΩ1 +bΩ2 ) F(Ω1 , Ω2 ).
3.8 Antialias filtering: Run MATLAB program P38.m. This
lowpass-filters the clown image before sampling it below its
Nyquist rate. Explain why the image reconstructed from its 3.14 Compute the spatial frequency response of the system in
samples has no aliasing. Problem 3.11.

3.9 Nearest-Neighbor interpolation: Run MATLAB program 3.15 Compute the spatial frequency response of the system in
P39.m. This samples the clown image and reconstructs from Problem 3.12.
PROBLEMS 127

3.16 Show that the 2-D spatial frequency response of the PSF using (2 × 2) 2-D DFTs.
 
1 2 1
h[m, n] = 2 4 2
1 2 1

is close to circularly symmetric, making it a circularly symmet-


ric lowpass filter, by:
(a) displaying the spatial frequency response as an image with
dc at the center;
(b) using
Ω2 Ω4
cos(Ω) = 1 − + − ···
2! 4!
and neglecting all terms of degree four or higher.

Section 3-8: 2-D Discrete Fourier Transform


(DFT)

3.17 Compute by hand the (2 × 2) 2-D DFT of


 
1 2
F = f [n, m] = .
3 4

Check your answer using MATLAB using FX=fft(F,2,2).


3.18 Prove that the (M × M) 2-D DFT of a separable image
f [n, m] = f1 [n] f2 [m] is the product of the 1-D M-point DFTs of
f1 [n] and f2 [n]: F[k1 , k2 ] = f1 [k1 ] f2 [k2 ].
3.19 Show that we can extend the definition of the (M × M)
2-D DFT to negative values of k1 and k2 using

F[−k1 , −k2 ] = F[M − k1 , M − k2 ]

for indices 0 < k1 , k2 < M. Conjugate symmetry for real f [n, m]


is then

F[−k1 , −k2 ] = F[M − k1 , M − k2 ] = F∗ [k1 , k2 ].

3.20 Compute the 2-D cyclic convolution

y[n, m] = x1 [n, m]
c c x2 [n, m],

where  
1 2
x1 [n, m] =
3 4
and  
5 6
x2 [n, m] =
7 8
Chapter 4
4 Image Interpolation
0
Contents
50
Overview, 129
4-1 Interpolation Using Sinc Functions, 129 100
4-2 Upsampling and Downsampling Modalities, 130
4-3 Upsampling and Interpolation, 133 150
4-4 Implementation of Upsampling Using 2-D DFT
in MATLAB, 137 200
4-5 Downsampling, 140
4-6 Antialias Lowpass Filtering, 141 250
4-7 B-Splines Interpolation, 143
4-8 2-D Spline Interpolation, 149 300
4-9 Comparison of 2-D Interpolation Methods, 150
4-10 Examples of Image Interpolation 350
Applications, 152
Problems, 156 399
0 50 100 150 200 250 300 350 399

Objectives In many image processing applications, such as


Learn to: magnification, thumbnails, rotation, morphing, and
reconstruction from samples, it is necessary to
■ Use sinc and Lanczos functions to interpolate a interpolate (roughly, fill in gaps between given
bandlimited image. samples). This chapter presents three approaches to
interpolation: using sinc or Lanczos functions;
■ Perform upsampling and interpolation using the 2-D upsampling using the 2-D DFT; and use of
DFT and MATLAB. B-splines. These methods are compared and used on
each of the applications listed above.
■ Perform downsampling for thumbnails using the 2-D
DFT and MATLAB.

■ Use B-spline functions of various orders to interpo-


late non-bandlimited images.

■ Rotate, magnify, and morph images using interpola-


tion.
Overview applying the sinc interpolation formula may be an aliased
version of x(t). In such cases, different interpolation methods
Suppose some unknown signal x(t) had been sampled at a should be used, several of which are examined in this chapter.
sampling rate S to generate a sampled signal x[n] = { x(n∆), In 2-D, interpolation seeks to fill in the gaps between the
n = . . . , −1, 0, 1, . . . }, sampled at times t = n∆, where ∆ = 1/S is image values f [n, m] = { f (n∆, m∆) } defined at discrete lo-
the sampling interval. Interpolation entails the use of a recipe cations (x = n∆, y = m∆) governed by the spatial sampling
or formula to compute a continuous-time interpolated version rate S = Sx = Sy = 1/∆. In analogy with the 1-D case, if the
xint (t) that takes on the given values { x(n∆) } and interpolates spectrum F(µ , v) of the original image f (x, y) is bandlimited
between them. In 1-D, interpolation is akin to connecting to Bx = By = B and if the sampling rate satisfies the Nyquist
the dots, represented here by { x(n∆) }, to obtain xint (t). The criterion (i.e., S > 2B), then it should be possible to interpolate
degree to which the interpolated signal is identical to or a close f [n, m] = f (n∆, m∆) to generate f (x, y) exactly.
rendition of the original signal x(t) depends on two factors: This chapter explores several types of recipes for interpolating
(1) Whether or not the sampling rate S used to generate xint (t) a sampled image f [n, m] = { f (n∆, m∆) } into a continuous-
from x(t) satisfies the Nyquist criterion, namely S > 2B, space image fint (x, y). Some of these recipes are extensions of
where B is the maximum frequency in the spectrum of the sinc interpolation formula, while others rely on the use of
signal x(t), and B-spline functions. As discussed later in Section 4-7, B-splines
are polynomial functions that can be designed to perform
(2) the specific interpolation method used to obtain xint (t). nearest-neighbor, linear, quadratic, and cubic interpolation of
images and signals. We will also explore how interpolation is
If the spectrum X( f ) of x(t) is bandlimited to B and S > 2B,
used to realize image zooming (magnification), rotation, and
it should be possible to use the sinc interpolation formula given
warping.
by Eq. (2.51) to reconstruct x(t) exactly. A simple example is
illustrated by the finite-duration sinusoid shown in Fig. 4-1.
In practice, however, it is computationally more efficient to
perform the interpolation in the frequency domain by lowpass- 4-1 Interpolation Using Sinc Functions
filtering the spectrum of the sampled signal.
Oftentimes, the sampling rate does not satisfy the Nyquist 4-1.1 Sinc Interpolation Formula
criterion, as a consequence of which the signal obtained by
An image f (x, y) is said to be bandlimited to a maximum spatial
frequency B if its spectrum F(µ , v) is such that

F(µ , ν ) = 0 for |µ |, |v| ≥ B.


1
If such an image is sampled uniformly along x and y at sampling
rates Sx = Sy = S = 1/∆, and the sampling interval ∆ satisfies the
0.5 Nyquist rate, namely
1
∆< ,
2B
0 t (ms) then we know from the 2-D sampling theorem (Section
3-5) that f (x, y) can be reconstructed from its samples
f [n, m] = { f (n∆, m∆) } using the sinc interpolation formula
−0.5 given by Eq. (3.62), which we repeat here as
∞ ∞ x y 
−1 f (x, y) =∑ ∑ f (n∆, m∆) sinc

− n sinc

− m ,
n=−∞ m=−∞
0 0.5 1 1.5 2 2.5 3 3.5
(4.1)
where for any argument z, the sinc function is defined as
Figure 4-1 1-D interpolation of samples of a sinusoid using the
sinc interpolation formula. sin(π z)
sinc(z) = . (4.2)
πz

129
130 CHAPTER 4 IMAGE INTERPOLATION

In reality, an image is finite in size, and so is the number of where the rectangle function, defined earlier in Eq. (2.2), is given
samples { f (n∆, m∆) }. Consequently, for a square image, the by (
ranges of indices n and m are limited to finite lengths M, in x 1 for − a < x < a,
which case the infinite sums in Eq. (4.1) become finite sums: rect =
2a 0 otherwise.
M−1 M−1 x  y 
Parameter a usually is assigned a value of 2 or 3. In MATLAB,
fsinc (x, y) = ∑ ∑ f (n∆, m∆) sinc

− n sinc

−m .
the Lanczos interpolation formula can be exercised using the
n=0 m=0
(4.3) command imresize and selecting Lanczos from the menu.
The summations start at n = 0 and m = 0, consistent with the In addition to the plot for sinc(x), Fig. 4-2 also contains plots
image display format shown in Fig. 3-3(a), wherein location of the sinc function multiplied by the windowed sinc function,
(0, 0) is at the top-left corner of the image. with a = 2 and also with a = 3. The rectangle function is zero
The sampled image consists of M × M values—denoted here beyond |x| = a, so the windowed function stops at |x| = 2 for
by f (n∆, m∆)—each of which is multiplied by the product of a = 2 and at |x| = 3 for a = 3. The Lanczos interpolation method
two sinc functions, one along x and another along y. The value provides significant computational improvement over the simple
fsinc (x, y) at a specified location (x, y) on the image consists of sinc interpolation formula.
the sum of M × M terms. In the limit for an image of infinite
size, and correspondingly an infinite number of samples along Concept Question 4-1: What is the advantage of Lanc-
x and y, the infinite summations lead to fsinc (x, y) = f (x, y), zos interpolation over sinc interpolation?
where f (x, y) is the original image. That is, the interpolated
image is identical to the original image, assuming all along that
the sampled image is in compliance with the Nyquist criterion Exercise 4-1: If sinc interpolation is applied to the samples
of the sampling theorem. When applying the sinc interpolation {x(0.1n) = cos(2π (6)0.1n)}, what will be the result?
formula to a finite-size image, the interpolated image should be Answer: {x(0.1n)} are samples of a 6 Hz cosine sam-
a good match to the original image, but it is computationally pled at a rate of 10 samples/second, which is below the
inefficient when compared with the spatial-frequency domain Nyquist frequency of 12 Hz. The sinc interpolation will
interpolation technique described later in Section 4-3.2. be a cosine with frequency aliased to (10 − 6) = 4 Hz (see
Section 2-4.3).

4-1.2 Lanczos Interpolation 4-2 Upsampling and Downsampling


To reduce the number of terms involved in the computation of Modalities
the interpolated image, the sinc function in Eq. (4.3) can be
modified by truncating the sinc pattern along x and y so that As a prelude to the material presented in forthcoming sections,
it is zero beyond a certain multiple of ∆, such as |x| = 2∆ we present here four examples of image upsampling and down-
and |y| = 2∆. Such a truncation is offered by the Lanczos sampling applications. Figure 4-3 depicts five image configu-
interpolation formula which replaces each of the sinc functions rations, each consisting of a square array of square pixels. The
in Eq. (4.3) by windowed sinc functions, thereby assuming the central image is the initial image from which the other four were
form generated. We define this initial image as the discrete version of
a continuous image f (x, y), sampled at (M × M) locations at a
M−1 M−1 sampling interval ∆o along both dimensions:
fLanczos (x, y) = ∑ ∑ f (n∆, m∆)
n=0 m=0 f [n, m] = { f (n∆o , m∆o ), 0 ≤ n, m ≤ M − 1 }. (4.5)
x   x  x
× sinc − n sinc − n rect
∆
y  a∆y  2ay  We assume that the sampling rate S is such that ∆o = 1/S
× sinc − m sinc − m rect , satisfies the Nyquist rate S > 2B or, equivalently, ∆o < 1/2B,
∆ a∆ 2a where B is the maximum spatial frequency of f (x, y).
(4.4) When displayed on a computer screen, the initial (central
4-2 UPSAMPLING AND DOWNSAMPLING MODALITIES 131

1
sinc(x)
0.8
sinc(x) sinc(x/2) rect(x/4)
0.6 sinc(x) sinc(x/3) rect(x/6)

0.4

0.2

−0.2 x
−3 −2 −1 0 1 2 3

Figure 4-2 Sinc function (in black) and Lanczos windowed sinc functions (a = 2 in blue and a = 3 in red).

image is characterized by four parameters: (Mu × Mu ) image g[n, m], with Mu > Mo and

Mo × Mo = M 2 = image array size, Mu Mo


T′ = T, w′ = w, and ∆u = ∆o . (4.6)
2
T × T = T = image physical size on computer screen, Mo Mu

w × w = w2 = image pixel area, Since the sampling interval ∆u in the enlarged upsampled image
2 is shorter than the sampling interval in the initial image, ∆o , the
∆o × ∆o = ∆o = true resolution area. Nyquist requirement continues to be satisfied.
If the displayed image bears a one-to-one correspondence to the
array f [n, m], then the brightness of a given pixel corresponds B. Increasing Image Array Size While Keeping
to the magnitude f [n, m] of the corresponding element in the Physical Size Unchanged
array, and the pixel dimensions w × w are directly proportional
to ∆o × ∆o . For simplicity, we set w = ∆o , which means that the The image displayed in Fig. 4-3(b) is an upsampled version of
image displayed on the computer screen has the same physical f [n, m] in which the array size was increased from (M × M) to
dimensions as the original image f (x, y). The area ∆2o is the true (Mu × Mu ) and the physical size of the image remained the same,
resolution area of the image. but the pixel size is smaller. In the image g[n, m],
Mo Mo
T = T ′, w′ = w , and ∆u = ∆o . (4.7)
Mu Mu
4-2.1 Upsampling Modalities
A. Enlarging Physical Size While Keeping Pixel ◮ The two images in Fig. 4-3(a) and (b) have identical ar-
Size Unchanged rays g[n, m], but they are artificially displayed on computer
screens with different pixel sizes. ◭
The image configuration in part (a) of Fig. 4-3 depicts what
happens when the central image f [n, m] is enlarged in size from
(T × T ) to (T ′ × T ′ ) while keeping the pixel size the same. The 4-2.2 Downsampling Modalities
process, accomplished by an upsampling operation, leads to an
132 CHAPTER 4 IMAGE INTERPOLATION

T′ g[n,m]
T′ = T

Upsampling
Upsampling
Δu = Δo /2
f [n,m] Δu = Δo /2 w′
T
(b) Image with Mu = 2Mo,
T ′ = T, and w′ = w /2

w′
(a) Image with w′ = w, Mu = 2Mo,
and T ′ = 2T Downsampling
Downsampling T′ = T
w
Δd = 2Δo
T′ Δd = 2Δo Original image
Mo × Mo

w′ w′
(c) Thumbnail image with w′ = w, (d) Image with w′ = 2w, T ′ = T,
Md = Mo /2 and T ′ = T/2 and Md = Mo /2

Figure 4-3 Examples of image upsampling and downsampling.

A. Thumbnail Image B. Reduce Array Size While Keeping Physical


Size Unchanged
If we apply downsampling to reduce the array size from
(Mo × Mo ) to (Md × Md ), we end up with the downsampled The final of the transformed images, shown in Fig. 4-3(d), has
images depicted in Figs. 4-3(c) and (d). In the thumbnail image the same identical content of the thumbnail image, but the pixel
shown in Fig. 4-3(c), the pixel size of the computer display is size on the computer screen has been enlarged so that
the same as that of the original image. Hence,
Mo Mo
Md Mo T = T ′, ∆d = ∆o , and w′ = w . (4.9)

T = T, ′
w = w, and ∆d = ∆o . (4.8) Md Md
Mo Md
4-3 UPSAMPLING AND INTERPOLATION 133

4-3 Upsampling and Interpolation illustrated in Fig. 4-3(a), rather than to decrease the sampling
interval.
Upsampling image f [n, m] to image g[n′ , m′ ] can be accom-
Let f (x, y) be a continuous-space image of size T (meters) by T
plished either directly in the discrete spatial domain or indirectly
(meters) whose 2-D Fourier transform F(µ , v) is bandlimited to
in the spatial frequency domain. We examine both approaches in
B (cycles/m). That is,
the subsections that follow.
F(µ , v) = 0 for |µ |, |ν | > B. (4.10)

Image f (x, y) is not available to us, but an (Mo × Mo ) sampled 4-3.1 Upsampling in the Spatial Domain
version of f (x, y) is available. We define it as the original
In practice, image upsampling is performed using the 2-D
sampled image
DFT in the spatial frequency domain because it is much faster
f [n, m] = { f (n∆o , m∆o ), 0 ≤ n, m ≤ Mo − 1 }, (4.11) and easier computationally than performing the upsampling
directly in the spatial domain. Nevertheless, for the sake of
where ∆o is the associated sampling interval. Moreover, the completeness, we now provide a succinct presentation of how
sampling had been performed at a rate exceeding the Nyquist upsampling is performed in the spatial domain using the sinc
rate, which requires the choice of ∆o to satisfy the condition interpolation formula.
We start by repeating Eq. (4.3) after replacing ∆ with ∆o and
1 M with Mo :
∆o < . (4.12)
2B    
Mo −1 Mo −1
x y
Given that the sampled image is T × T in size, the number of f (x, y) = ∑ ∑ f [n, m] sinc − n sinc −m .
samples Mo along each direction is n=0 m=0 ∆o ∆o
(4.16)
T Here, f [n, m] is the original (Mo × Mo ) sampled image available
Mo = . (4.13) to us, and the goal is to upsample it to an (Mu × Mu ) image
∆o
g[n′ , m′ ], with Mu = LMo , where L is an upsampling factor. In
Next, we introduce a new (yet to be created) higher-density the upsampled image, the sampling interval ∆u is related to the
sampled image g[n, m], also T × T in physical dimensions but sampling interval ∆o of the original sampled image by
containing Mu × Mu samples—instead of Mo × Mo samples—
with Mu > Mo (which corresponds to the scenario depicted in Mo ∆o
∆u = ∆o = . (4.17)
Fig. 4-3(b)). We call g[n′ , m′ ] the upsampled version of f [n, m]. Mu L
The goal of upsampling and interpolation, which usually is
abbreviated to just “upsampling,” is to compute g[n′ , m′ ] from To obtain g[n′ , m′ ], we sample f (x, y) at x = n′ ∆u and y = m′ ∆u :
f [n, m]. Since Mu > Mo , g[n′ , m′ ] is more finely discretized than
f [n, m], and the narrower sampling interval ∆u of the upsampled g[n′ , m′ ] = f (n′ ∆u , m′ ∆u ) (0 ≤ n′ , m′ ≤ Mu − 1)
image is Mu −1 Mu −1

∆u =
T
=
Mo
∆o . (4.14)
= ∑ ∑ f [n, m]
n=0 m=0
Mu Mu    
′ ∆u ′ ∆u
Since ∆u < ∆o , it follows that the sampling rate associated with × sinc n − n sinc m −m
g[n′ , m′ ] also satisfies the Nyquist rate. ∆o ∆o
The upsampled image is given by Mu −1 Mu −1

′ ′ ′ ′ ′ ′
= ∑ ∑ f [n, m]
g[n , m ] = { f (n ∆u , m ∆u ), 0 ≤ n , m ≤ Mu − 1 }. (4.15) n=0 m=0
   ′ 
n′ m
In a later section of this chapter (Section 4-6) we demonstrate × sinc − n sinc −m . (4.18)
L L
how the finer discretization provided by upsampling is used to
compute a rotated or warped version of image f [n, m]. Another If greater truncation is desired, we can replace the product of
application is image magnification, but in that case the primary sinc functions with the product of the Lanczos functions defined
goal is to increase the image size (from T1 × T1 to T2 × T2 ), as in Eq. (4.4). In either case, application of Eq. (4.18) generates
134 CHAPTER 4 IMAGE INTERPOLATION

the upsampled version g[n′ , m′ ] directly from the original sam- tion for F[k1 , k2 ] extends to (Mo − 1) whereas the summation for
pled version f[n, m]. If ∆u = ∆o/L and L is an integer, then the process preserves the values of f[n, m] while adding new ones in between them. To demonstrate that the upsampling process does indeed preserve f[n, m], let us consider the expression given by Eq. (4.15) for the specific case where n′ = Ln and m′ = Lm:

g[Ln, Lm] = { f(n′∆u, m′∆u), 0 ≤ n′, m′ ≤ Mu − 1 }
          = { f(Ln∆u, Lm∆u), 0 ≤ n, m ≤ Mo − 1 }
          = { f(n∆o, m∆o), 0 ≤ n, m ≤ Mo − 1 } = f[n, m],

where we used the relationships given by Eqs. (4.11) and (4.14). Hence, upsampling by an integer L using the sinc interpolation formula does indeed preserve the existing values of f[n, m], in addition to adding interpolated values between them.

4-3.2 Upsampling in the Spatial Frequency Domain

Instead of using the sinc interpolation formula given by Eq. (4.18), upsampling can be performed much more easily, and with less computation, using the 2-D DFT in the spatial frequency domain. From Eqs. (4.11) and (4.18), the (Mo × Mo) original image f[n, m] and the (Mu × Mu) upsampled image g[n′, m′] are defined as

f[n, m] = { f(n∆o, m∆o), 0 ≤ n, m ≤ Mo − 1 },          (4.19a)
g[n′, m′] = { g(n′∆u, m′∆u), 0 ≤ n′, m′ ≤ Mu − 1 }.    (4.19b)

Note that whereas in the earlier section it proved convenient to distinguish the indices of the upsampled image from those of the original image—so we used [n, m] for the original image and [n′, m′] for the upsampled image—the distinction is no longer needed in the present section, so we will now use indices [n, m] for both images.

From Eq. (3.75), the 2-D DFT of f[n, m] of order (Mo × Mo) and the 2-D DFT of g[n, m] of order (Mu × Mu) are given by

F[k1, k2] = Σ_{n=0}^{Mo−1} Σ_{m=0}^{Mo−1} f[n, m] e^{−j(2π/Mo)(nk1+mk2)},   0 ≤ k1, k2 ≤ Mo − 1,   (4.20a)

G[k1, k2] = Σ_{n=0}^{Mu−1} Σ_{m=0}^{Mu−1} g[n, m] e^{−j(2π/Mu)(nk1+mk2)},   0 ≤ k1, k2 ≤ Mu − 1.   (4.20b)

The summations are identical in form, except that the summation over F[k1, k2] extends to (Mo − 1), whereas that over G[k1, k2] extends to (Mu − 1).

The goal is to compute g[n, m] by (1) transforming f[n, m] to obtain F[k1, k2], (2) transforming F[k1, k2] to G[k1, k2], and (3) then transforming G[k1, k2] back to the spatial domain to form g[n, m]. Despite the seeming complexity of having to execute a three-step process, the process is computationally more efficient than performing upsampling entirely in the spatial domain.

Upsampling in the discrete frequency domain [k1, k2] entails increasing the number of discrete frequency components from (Mo × Mo) for F[k1, k2] to (Mu × Mu) for G[k1, k2], with Mu > Mo. As we will demonstrate shortly, G[k1, k2] includes all of the elements of F[k1, k2], but it also includes some additional rows and columns filled with zeros.

A. Original Image

Let us start with the sampled image fo(x, y) of continuous image f(x, y) sampled at a sampling interval ∆o and resulting in (Mo × Mo) samples. Per Eq. (3.61), adapted to a finite sum that starts at (0, 0) and ends at (Mo − 1, Mo − 1),

fo(x, y) = Σ_{n=0}^{Mo−1} Σ_{m=0}^{Mo−1} f(n∆o, m∆o) δ(x − n∆o) δ(y − m∆o)
         = Σ_{n=0}^{Mo−1} Σ_{m=0}^{Mo−1} f[n, m] δ(x − n∆o) δ(y − m∆o),   (4.21)

where we used the definition for f[n, m] given by Eq. (4.19a). Using entry #9 in Table 3-1, the 2-D CSFT Fo(µ, ν) of fo(x, y) can be written as

Fo(µ, ν) = F{ fo(x, y) }
         = Σ_{n=0}^{Mo−1} Σ_{m=0}^{Mo−1} f[n, m] F{ δ(x − n∆o) δ(y − m∆o) }
         = Σ_{n=0}^{Mo−1} Σ_{m=0}^{Mo−1} f[n, m] e^{−j2πµn∆o} e^{−j2πνm∆o}.   (4.22)

The spectrum Fo(µ, ν) of sampled image fo(x, y) is doubly periodic in µ and ν with period 1/∆o, as expected.

Extending the relations expressed by Eqs. (2.47) and (2.54) from 1-D to 2-D, the spectrum Fo(µ, ν) of the sampled image is related to the spectrum F(µ, ν) of continuous image f(x, y) by

Fo(µ, ν) = (1/∆o²) Σ_{k1=−∞}^{∞} Σ_{k2=−∞}^{∞} F(µ − k1/∆o, ν − k2/∆o).   (4.23)

The spectrum Fo(µ, ν) of the sampled image consists of copies of the spectrum F(µ, ν) of f(x, y) repeated every 1/∆o in both µ and ν, and also scaled by 1/∆o².

In (µ, ν) space, µ and ν can be both positive or negative. The relation between the spectrum Fo(µ, ν) of the sampled image fo(x, y) and the spectrum F(µ, ν) of the original continuous image f(x, y) assumes different forms for the four quadrants of (µ, ν) space.

For ease of presentation, let Mo be odd. If Mo is even, then simply replace (Mo − 1)/2 with Mo/2 (see Section 4-4.2). Next, we sample µ and ν by setting them to

µ = k1/(Mo∆o)   and   ν = k2/(Mo∆o),   0 ≤ |k1|, |k2| ≤ (Mo − 1)/2.   (4.24)

1. Quadrant 1: µ ≥ 0 and ν ≥ 0

At these values of µ and ν, Fo(µ, ν) becomes

Fo(k1/(Mo∆o), k2/(Mo∆o)) = Σ_{n=0}^{Mo−1} Σ_{m=0}^{Mo−1} f[n, m] e^{−j2πnk1/Mo} e^{−j2πmk2/Mo}
                          = F[k1, k2],   0 ≤ k1, k2 ≤ (Mo − 1)/2.   (4.25)

2. Quadrant 2: µ ≤ 0 and ν ≥ 0

In quadrants 2–4, we make use of the relation

e^{−j2πn(−k1)/Mo} = e^{−j2πnMo/Mo} e^{−j2πn(−k1)/Mo} = e^{−j2πn(Mo−k1)/Mo},

where we used e^{−j2πnMo/Mo} = 1. A similar relation applies to k2.

In quadrant 2, µ is negative and ν is positive, so keeping k1, k2 ≥ 0 and redefining µ as µ = −k1/(Mo∆o) leads to

Fo(−k1/(Mo∆o), k2/(Mo∆o)) = Σ_{n=0}^{Mo−1} Σ_{m=0}^{Mo−1} f[n, m] e^{−j2πn(−k1)/Mo} e^{−j2πmk2/Mo}
                           = Σ_{n=0}^{Mo−1} Σ_{m=0}^{Mo−1} f[n, m] e^{−j2πn(Mo−k1)/Mo} e^{−j2πmk2/Mo}
                           = F[Mo − k1, k2],   0 ≤ k1, k2 ≤ (Mo − 1)/2.   (4.26)

3. Quadrant 3: µ ≤ 0 and ν ≤ 0

Redefining µ as µ = −k1/(Mo∆o) and ν as ν = −k2/(Mo∆o) leads to

Fo(−k1/(Mo∆o), −k2/(Mo∆o)) = Σ_{n=0}^{Mo−1} Σ_{m=0}^{Mo−1} f[n, m] e^{−j2πn(−k1)/Mo} e^{−j2πm(−k2)/Mo}
                            = Σ_{n=0}^{Mo−1} Σ_{m=0}^{Mo−1} f[n, m] e^{−j2πn(Mo−k1)/Mo} e^{−j2πm(Mo−k2)/Mo}
                            = F[Mo − k1, Mo − k2],   0 ≤ k1, k2 ≤ (Mo − 1)/2.   (4.27)

4. Quadrant 4: µ ≥ 0 and ν ≤ 0

Upon defining µ as in Eq. (4.24) and redefining ν as ν = −k2/(Mo∆o),

Fo(k1/(Mo∆o), −k2/(Mo∆o)) = Σ_{n=0}^{Mo−1} Σ_{m=0}^{Mo−1} f[n, m] e^{−j2πnk1/Mo} e^{−j2πm(−k2)/Mo}
                           = Σ_{n=0}^{Mo−1} Σ_{m=0}^{Mo−1} f[n, m] e^{−j2πnk1/Mo} e^{−j2πm(Mo−k2)/Mo}
                           = F[k1, Mo − k2],   0 ≤ k1, k2 ≤ (Mo − 1)/2.   (4.28)

The result given by Eqs. (4.25)–(4.28) states that the 2-D CSFT Fo(µ, ν) of the sampled image fo(x, y)—when sampled at the discrete spatial frequency values defined by Eq. (4.24)—is the 2-D DFT of the sampled image f[n, m]. Also, the spectrum Fo(µ, ν) of the sampled image consists of copies of the spectrum F(µ, ν) repeated every 1/∆o in both µ and ν, and also scaled by 1/∆o².

Hence, generalizing Eq. (2.54) from 1-D to 2-D gives

Fo(k1/(Mo∆o), k2/(Mo∆o)) = (1/∆o²) F(k1/(Mo∆o), k2/(Mo∆o)),   (4.29)

where F(µ, ν) is the 2-D CSFT of the continuous-space image f(x, y) and 0 ≤ |k1|, |k2| ≤ (Mo − 1)/2.

B. Upsampled Image

Now we repeat this entire derivation using a sampling interval ∆u instead of ∆o, and replacing Mo with Mu, but keeping the form of the relations given by Eq. (4.24) the same. Since ∆u < ∆o, the sampled image is now (Mu × Mu) instead of (Mo × Mo). Hence, the (Mu × Mu) sampled image is

fu(x, y) = Σ_{n=0}^{Mu−1} Σ_{m=0}^{Mu−1} f(n∆u, m∆u) δ(x − n∆u) δ(y − m∆u)
         = Σ_{n=0}^{Mu−1} Σ_{m=0}^{Mu−1} g[n, m] δ(x − n∆u) δ(y − m∆u),   (4.30)

with g[n, m] as defined in Eq. (4.19b). The associated 2-D CSFT of the sampled image fu(x, y) is

Fu(µ, ν) = F{ fu(x, y) }
         = Σ_{n=0}^{Mu−1} Σ_{m=0}^{Mu−1} g[n, m] F{ δ(x − n∆u) δ(y − m∆u) }
         = Σ_{n=0}^{Mu−1} Σ_{m=0}^{Mu−1} g[n, m] e^{−j2πµn∆u} e^{−j2πνm∆u}.   (4.31)

1. Quadrant 1: µ ≥ 0 and ν ≥ 0

In view of the relationship (from Eq. (4.14))

∆u/(Mo∆o) = ∆u/(Mu∆u) = 1/Mu,

upon sampling Fu(µ, ν) at the rates defined by Eq. (4.24), we obtain

Fu(k1/(Mo∆o), k2/(Mo∆o)) = Σ_{n=0}^{Mu−1} Σ_{m=0}^{Mu−1} g[n, m] e^{−j2πnk1/Mu} e^{−j2πmk2/Mu}
                          = G[k1, k2],   0 ≤ k1, k2 ≤ (Mo − 1)/2.   (4.32)

2. Quadrant 2: µ ≤ 0 and ν ≥ 0

Fu(−k1/(Mo∆o), k2/(Mo∆o)) = Σ_{n=0}^{Mu−1} Σ_{m=0}^{Mu−1} g[n, m] e^{−j2πn(−k1)/Mu} e^{−j2πmk2/Mu}
                           = Σ_{n=0}^{Mu−1} Σ_{m=0}^{Mu−1} g[n, m] e^{−j2πn(Mu−k1)/Mu} e^{−j2πmk2/Mu}
                           = G[Mu − k1, k2],   0 ≤ k1, k2 ≤ (Mo − 1)/2.   (4.33)

3. Quadrant 3: µ ≤ 0 and ν ≤ 0

Fu(−k1/(Mo∆o), −k2/(Mo∆o)) = Σ_{n=0}^{Mu−1} Σ_{m=0}^{Mu−1} g[n, m] e^{−j2πn(−k1)/Mu} e^{−j2πm(−k2)/Mu}
                            = Σ_{n=0}^{Mu−1} Σ_{m=0}^{Mu−1} g[n, m] e^{−j2πn(Mu−k1)/Mu} e^{−j2πm(Mu−k2)/Mu}
                            = G[Mu − k1, Mu − k2],   0 ≤ k1, k2 ≤ (Mo − 1)/2.   (4.34)

4. Quadrant 4: µ ≥ 0 and ν ≤ 0

Fu(k1/(Mo∆o), −k2/(Mo∆o)) = Σ_{n=0}^{Mu−1} Σ_{m=0}^{Mu−1} g[n, m] e^{−j2πnk1/Mu} e^{−j2πm(−k2)/Mu}
                           = Σ_{n=0}^{Mu−1} Σ_{m=0}^{Mu−1} g[n, m] e^{−j2πnk1/Mu} e^{−j2πm(Mu−k2)/Mu}
                           = G[k1, Mu − k2],   0 ≤ k1, k2 ≤ (Mo − 1)/2.   (4.35)

The result given by Eqs. (4.32)–(4.35) states that the 2-D CSFT Fu(µ, ν) of the upsampled image fu(x, y)—when sampled at the discrete spatial frequency values defined by Eq. (4.24)—is the 2-D DFT of the sampled image g[n, m]. Also, the spectrum Fu(µ, ν) of the sampled image consists of copies of the spectrum F(µ, ν) of f(x, y) repeated every 1/∆u in both µ and ν, and also scaled by 1/∆u². Thus,

Fu(k1/(Mo∆o), k2/(Mo∆o)) = (1/∆u²) F(k1/(Mo∆o), k2/(Mo∆o)).   (4.36)

From Eq. (4.14), Mo∆o = Mu∆u. Combining Eq. (4.29) and Eq. (4.36) shows that

Fu(k1/(Mo∆o), k2/(Mo∆o)) = (Mu²/Mo²) Fo(k1/(Mo∆o), k2/(Mo∆o)).   (4.37)

Hence, the (Mo × Mo) 2-D DFT F[k1, k2] of f[n, m] and the (Mu × Mu) 2-D DFT G[k1, k2] of g[n, m] are related by

G[k1, k2] = (Mu²/Mo²) F[k1, k2],
G[Mu − k1, k2] = (Mu²/Mo²) F[Mo − k1, k2],
G[k1, Mu − k2] = (Mu²/Mo²) F[k1, Mo − k2],
G[Mu − k1, Mu − k2] = (Mu²/Mo²) F[Mo − k1, Mo − k2],
   0 ≤ k1, k2 ≤ (Mo − 1)/2.   (4.38)

This leaves G[k1, k2] for Mo ≤ k1, k2 ≤ Mu − 1 to be determined. But Eq. (4.38) shows that these values of G[k1, k2] are samples of Fu(µ, ν) at values of µ, ν for which this spectrum of the sampled signal is zero, since sampling the original signal f(x, y) at above its Nyquist rate separates the copies of its spectrum, leaving bands of zero between copies. Thus

G[k1, k2] = 0,   Mo ≤ k1, k2 ≤ Mu − 1.   (4.39)

4-4 Implementation of Upsampling Using 2-D DFT in MATLAB

In MATLAB, both the image f[n, m] and its 2-D DFT are stored and displayed using the format shown in Fig. 3-25(d), wherein the origin is at the upper left-hand corner of the image, and the indices of the corner pixel are (1, 1).

Image and 2-D DFT Notation

To avoid confusion between the common-image format (CIF) and the MATLAB format, we provide the following list of symbols and definitions:

                          CIF          MATLAB
Original image            f[n, m]      X(m′, n′)
Upsampled image           g[n, m]      Y(m′, n′)
2-D DFT of f[n, m]        F[k1, k2]    FX(k2′, k1′)
2-D DFT of g[n, m]        G[k1, k2]    FY(k2′, k1′)

As noted earlier in Section 3-9, when an image f[n, m] is stored in MATLAB as array X(m′, n′), the two sets of indices are related by

m′ = m + 1,   (4.40a)
n′ = n + 1.   (4.40b)

The indices get interchanged in orientation (n represents row number, whereas n′ represents column number) and are shifted by 1. For example, f[0, 0] = X(1, 1), and f[0, 1] = X(2, 1). While the indices of the two formats are different, the contents of array X(m′, n′) are identical with those of f[n, m]. That is, for an (Mo × Mo) image,

X(m′, n′) = f[n, m] = [ f[0, 0]        f[1, 0]        ···  f[Mo − 1, 0]
                        f[0, 1]        f[1, 1]        ···  f[Mo − 1, 1]
                        ⋮              ⋮              ⋱    ⋮
                        f[0, Mo − 1]   f[1, Mo − 1]   ···  f[Mo − 1, Mo − 1] ].   (4.41)

The MATLAB command FX = fft2(X, Mo, Mo) computes the 2-D DFT F[k1, k2] and stores it in array FX(k2′, k1′):

FX(k2′, k1′) = F[k1, k2] = [ F[0, 0]        F[1, 0]        ···  F[Mo − 1, 0]
                             F[0, 1]        F[1, 1]        ···  F[Mo − 1, 1]
                             ⋮              ⋮              ⋱    ⋮
                             F[0, Mo − 1]   F[1, Mo − 1]   ···  F[Mo − 1, Mo − 1] ].   (4.42)

The goal of upsampling using the spatial-frequency domain is to compute g[n, m] from f[n, m] by computing G[k1, k2] from F[k1, k2] and then applying the inverse DFT to obtain g[n, m]. The details of the procedure are somewhat different depending on whether the array size parameter Mo is an odd integer or an even integer. Hence, we consider the two cases separately.

4-4.1 Mo = Odd Integer

The recipe for upsampling using the 2-D DFT is as follows:

1. Given: image { f[n, m], 0 ≤ n, m ≤ Mo − 1 }, as represented by Eq. (4.41).

2. Compute: the 2-D DFT F[k1, k2] of f[n, m] using Eq. (4.20a) to obtain the array represented by Eq. (4.42).

3. Create: an upsampled (Mu × Mu) array G[k1, k2], and then set its entries per the rules of Eq. (4.38).

4. Compute: the (Mu × Mu) 2-D inverse DFT of G[k1, k2] to obtain g[n, m].

In MATLAB, the array FY containing the 2-D DFT G[k1, k2] is obtained from the array FX given by Eq. (4.42) by inserting (Mu − Mo) rows of zeros and an equal number of columns of zeros in the "middle" of the array FX. The result, in block form, is

FY = (Mu²/Mo²) ×
[ F[k1, k2], 0 ≤ k1, k2 ≤ (Mo−1)/2        |  (Mu − Mo) zero columns  |  F[k1, k2], (Mo+1)/2 ≤ k1 ≤ Mo−1, 0 ≤ k2 ≤ (Mo−1)/2
  (Mu − Mo) zero rows                      |  zeros                   |  (Mu − Mo) zero rows
  F[k1, k2], 0 ≤ k1 ≤ (Mo−1)/2, (Mo+1)/2 ≤ k2 ≤ Mo−1  |  zeros     |  F[k1, k2], (Mo+1)/2 ≤ k1, k2 ≤ Mo−1 ].   (4.43)

◮ Note that the (Mu − Mo) columns of zeros start after entry F[(Mo − 1)/2, 0], and similarly the (Mu − Mo) rows of zeros start after F[0, (Mo − 1)/2]. ◭

Once array FY has been established, the corresponding upsampled image g[n, m] is obtained by applying the MATLAB command Y = real(ifft2(FY,N,N)), where N = Mu and the "real" is needed to eliminate the imaginary part of Y, which may exist because of round-off error in the ifft2.

As a simple example, consider the (3 × 3) array

FX = [ F[0, 0]  F[1, 0]  F[2, 0]
       F[0, 1]  F[1, 1]  F[2, 1]
       F[0, 2]  F[1, 2]  F[2, 2] ].   (4.44a)

To generate a (5 × 5) array FY we insert Mu − Mo = 5 − 3 = 2 columns of zeros after element F[(Mo − 1)/2, 0] = F[1, 0], and also 2 rows of zeros after F[0, (Mo − 1)/2] = F[0, 1]. The result is

FY = (5²/3²) × [ F[0, 0]  F[1, 0]  0  0  F[2, 0]
                 F[0, 1]  F[1, 1]  0  0  F[2, 1]
                 0        0        0  0  0
                 0        0        0  0  0
                 F[0, 2]  F[1, 2]  0  0  F[2, 2] ].   (4.44b)

Application of the inverse 2-D DFT to FY generates array Y in MATLAB, which is equivalent in content to image g[n, m] in common-image format.

4-4.2 Mo = Even Integer

For a real-valued (Mo × Mo) image f[n, m] with Mo an odd integer, conjugate symmetry is automatically satisfied for both F[k1, k2], the 2-D DFT of the original image, as well as for G[k1, k2], the 2-D DFT of the upsampled image. However, if Mo is an even integer, application of the recipe outlined in the preceding subsection will violate conjugate symmetry, so we need to modify it. Recall from Eq. (3.80) that for a real-valued image f[n, m], conjugate symmetry requires that

F∗[k1, k2] = F[Mo − k1, Mo − k2],   1 ≤ k1, k2 ≤ Mo − 1.   (4.45)

Additionally, in view of the definition for F[k1, k2] given by Eq. (4.20a), the following two conditions should be satisfied:

F[0, 0] = Σ_{n=0}^{Mo−1} Σ_{m=0}^{Mo−1} f[n, m] = real-valued,   (4.46a)

F[Mo/2, Mo/2] = Σ_{n=0}^{Mo−1} Σ_{m=0}^{Mo−1} (−1)^{n+m} f[n, m] = real-valued.   (4.46b)

In the preceding subsection, we inserted the appropriate number of rows and columns of zeros to obtain G[k1, k2] from F[k1, k2]. For Mo equal to an odd integer, the conditions represented by Eqs. (4.45) and (4.46) are satisfied for both F[k1, k2] and G[k1, k2], but they are not satisfied for G[k1, k2] when Mo is an even integer. To demonstrate why the simple zero-insertion procedure is problematic, let us consider the (4 × 4) image

f[n, m] = [ 1 2 3 4
            2 4 5 3
            3 4 6 2
            4 3 2 1 ].   (4.47a)

The (4 × 4) 2-D DFT F[k1, k2] of f[n, m] is

F[k1, k2] = [ 49       −6 − j3   3        −6 + j3
              −5 − j4   2 + j9   −5 + j2   j
               1       −4 + j3   −1       −4 − j3
              −5 + j4   −j       −5 − j2   2 − j9 ].   (4.47b)

This F[k1, k2] has conjugate symmetry.

Let us suppose that we wish to upsample the (4 × 4) image f[n, m] to a (5 × 5) image g[n, m]. Inserting one row of zeros and one column of zeros and multiplying by (5²/4²) would generate

G[k1, k2] = (5²/4²) × [ 49       −6 − j3   0   3        −6 + j3
                        −5 − j4   2 + j9   0   −5 + j2   j
                         0        0        0   0         0
                         1       −4 + j3   0   −1       −4 − j3
                        −5 + j4   −j       0   −5 − j2   2 − j9 ].   (4.48)

This G[k1, k2] array does not satisfy conjugate symmetry. Applying the inverse 2-D DFT to G[k1, k2] generates the upsampled image g[n, m]:

g[n, m] = (5²/4²) ×
[ 0.64          1.05 + j0.19   1.64 − j0.30   2.39 + j0.30   2.27 − j0.19
  1.05 + j0.19  2.01 + j0.22   2.85 − j0.20   2.80 − j0.13   1.82 − j0.19
  1.64 − j0.30  2.38 − j0.44   3.90 + j0.46   3.16 + j0.08   1.31 + j0.39
  2.39 + j0.30  2.37 − j0.066  2.88 + j0.36   2.04 − j0.90   0.84 + j0.12
  2.27 − j0.19  1.89 − j0.25   1.33 + j0.26   0.83 + j0.08   1.18 + j0.22 ],

which is clearly incorrect; all of its elements should be real-valued because the original image f[n, m] is real-valued. Obviously, the upsampling recipe needs to be modified.

A simple solution is to split row F[k1, Mo/2] into 2 rows and to split column F[Mo/2, k2] into 2 columns, which also means that F[Mo/2, Mo/2] gets split into 4 entries. The recipe preserves conjugate symmetry in G[k1, k2].

When applied to F[k1, k2], the recipe yields the (6 × 6) array

G[k1, k2] = (6²/4²) ×
[ F[0, 0]     F[1, 0]     F[2, 0]/2   0   F[2, 0]/2   F[3, 0]
  F[0, 1]     F[1, 1]     F[2, 1]/2   0   F[2, 1]/2   F[3, 1]
  F[0, 2]/2   F[1, 2]/2   F[2, 2]/4   0   F[2, 2]/4   F[3, 2]/2
  0           0           0           0   0           0
  F[0, 2]/2   F[1, 2]/2   F[2, 2]/4   0   F[2, 2]/4   F[3, 2]/2
  F[0, 3]     F[1, 3]     F[2, 3]/2   0   F[2, 3]/2   F[3, 3] ]

= (6²/4²) ×
[ 49        −6 − j3     1.5         0   1.5         −6 + j3
  −5 − j4    2 + j9     −2.5 + j1   0   −2.5 + j1    j
   0.5      −2 + j1.5   −0.25       0   −0.25       −2 − j1.5
   0         0           0          0    0            0
   0.5      −2 + j1.5   −0.25       0   −0.25       −2 − j1.5
  −5 + j4   −j          −2.5 − j1   0   −2.5 − j1    2 − j9 ].   (4.49)

Application of the inverse 2-D DFT to G[k1, k2] yields the upsampled image

g[n, m] = (6²/4²) × [ 0.44  0.61  1.06  1.33  1.83  1.38
                      0.61  1.09  1.73  1.91  1.85  1.21
                      1.06  1.55  2.31  2.58  1.66  0.90
                      1.33  1.55  2.22  2.67  1.45  0.78
                      1.83  1.72  1.52  1.42  0.54  0.73
                      1.38  1.25  0.94  0.76  0.72  1.04 ]

        = [ 1.00  1.37  2.39  3.00  4.12  3.11
            1.37  2.46  3.90  4.30  4.16  2.72
            2.39  3.49  5.20  5.81  3.74  2.03
            3.00  3.49  5.00  6.01  3.26  1.76
            4.12  3.87  3.42  3.20  1.22  1.64
            3.11  2.81  2.12  1.71  1.62  2.34 ],   (4.50)

which is entirely real-valued, as it should be. The original (4 × 4) image given by Eq. (4.47a) and the upsampled (6 × 6) image given by Eq. (4.50) are displayed in Fig. 4-4. The two images, which bear a close resemblance, have the same physical size but different-sized pixels.

[Figure 4-4 Comparison of original and upsampled images: (a) original 4 × 4 image; (b) upsampled 6 × 6 image.]
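The row/column-splitting recipe can also be sketched in MATLAB. Again this is a sketch under our own assumptions (square, real-valued image with Mo even, and Mu > Mo), not the book's posted code:

    % Minimal sketch of the Mo-even upsampling recipe of Section 4-4.2.
    % The Nyquist row and column are halved and placed in BOTH halves of
    % FY, so that F[Mo/2, Mo/2] ends up split into four quarter entries.
    function Y = upsample_dft_even(X, Mu)
      Mo = size(X,1);                    % original size (square, even)
      FX = fft2(X);
      h  = Mo/2 + 1;                     % MATLAB index of the Nyquist bin
      FX(h,:) = FX(h,:)/2;               % halve the Nyquist row ...
      FX(:,h) = FX(:,h)/2;               % ... and the Nyquist column
      FY = zeros(Mu,Mu);                 % zeros occupy the middle
      lo = 1:h;  hi = h:Mo;              % Nyquist bin goes to both blocks
      LO = 1:h;  HI = Mu-Mo/2+1:Mu;      % destination index ranges in FY
      FY(LO,LO) = FX(lo,lo);
      FY(LO,HI) = FX(lo,hi);
      FY(HI,LO) = FX(hi,lo);
      FY(HI,HI) = FX(hi,hi);
      FY = (Mu^2/Mo^2)*FY;               % scale per Eq. (4.38)
      Y  = real(ifft2(FY));
    end

For the (4 × 4) image of Eq. (4.47a) with Mu = 6, this construction reproduces the array of Eq. (4.49).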

Exercise 4-2: Upsample the length-2 signal x[n] = {8, 4} to a length-4 signal y[n].

Answer: The 2-point DFT X[k] = {8 + 4, 8 − 4} = {12, 4}. Since Mo = 2 is even, we split 4 and insert a zero in the middle, and multiply by 4/2 to get Y[k] = {12, 2, 0, 2}. The inverse 4-point DFT is

y[0] = (2/4)(12 + 2 + 0 + 2) = 8,
y[1] = (2/4)(12 + 2j − 0 − 2j) = 6,
y[2] = (2/4)(12 − 2 + 0 − 2) = 4,
y[3] = (2/4)(12 − 2j + 0 + 2j) = 6.

y[n] = {8, 6, 4, 6}.

4-5 Downsampling

Downsampling is the inverse operation to upsampling. The objective of downsampling is to reduce the array size of an image f[n, m] from (Mo × Mo) samples down to (Md × Md), with Md < Mo. The original image f[n, m] and the downsampled image g[n, m] are defined as:

Original image f[n, m] = { f(n∆o, m∆o), 0 ≤ n, m ≤ Mo − 1 },
Downsampled image g[n, m] = { f(n∆d, m∆d), 0 ≤ n, m ≤ Md − 1 }.

Both images are sampled versions of some continuous-space image f(x, y), with image f[n, m] sampled at a sampling interval ∆o that satisfies the Nyquist rate, and the downsampled image g[n, m] sampled at ∆d, with ∆d > ∆o, so it is unlikely that g[n, m] satisfies the Nyquist rate.

The goal of downsampling is to compute g[n, m] from f[n, m]. That is, we compute a coarser-discretized (Md × Md) image g[n, m] from the finer-discretized (Mo × Mo) image f[n, m]. Applications of downsampling include computation of "thumbnail" versions of images, as demonstrated in Example 4-1, and shrinking images to fit into a prescribed space, such as columns in a textbook.

4-5.1 Aliasing

It might seem that downsampling by, say, two, meaning that Md = Mo/2 (assuming Mo is even), could be easily accomplished by simply deleting every even-indexed (or odd-indexed) row and column of f[n, m]. Deleting every other row and column of

f[n, m] is called decimation by two. Decimation by two would give the result of sampling f(x, y) every 2∆o instead of every ∆o. But if sampling at S = 1/(2∆o) is below the Nyquist rate, the decimated image g[n, m] is aliased. The effect of aliasing on the spectrum of g[n, m] can be understood in 1-D from Fig. 2-6(b). The copies of the spectrum of f(x, y) produced by sampling overlap one another, so the high-frequency parts of the signal become distorted. Example 4-1 gives an illustration of aliasing in 2-D.

Example 4-1: Aliasing

The 200 × 200 clown image shown in Fig. 4-5(a) was decimated to the 25 × 25 image shown in Fig. 4-5(b). The decimated image is a poor replica of the original image and could not function as a thumbnail image.

[Figure 4-5 Clown image: (a) original 200 × 200 and (b) decimated 25 × 25 version.]

4-6 Antialias Lowpass Filtering

Clearly decimation alone is not sufficient to obtain a downsampled image that looks like a demagnified original image. To avoid aliasing, it is necessary to lowpass filter the image before decimation, eliminating the high spatial frequency components of the image, so that when the filtered image is decimated, the copies of the spectra do not overlap. In 1-D, in Fig. 2-6(b), had the spectrum been previously lowpass filtered with cutoff frequency S/2 Hz, the high-frequency parts of the spectrum would no longer overlap after sampling. This is called antialias filtering.

The same concept applies in discrete space (and time). The periodicity of spectra induced by sampling becomes the periodicity of the DSFT and DTFT with periods of 2π in (Ω1, Ω2) and Ω, respectively. Lowpass filtering can be accomplished by setting certain high-spatial-frequency portions of the 2-D DFT to zero. The purpose of this lowpass filtering is now to eliminate the high-spatial-frequency parts of the discrete-space spectrum, prior to decimation. This eliminates aliasing, as demonstrated in Example 4-2.

Example 4-2: Antialiasing

The 200 × 200 clown image shown in Fig. 4-5(a) was first lowpass-filtered to the image shown in Fig. 4-6(a), with the spectrum shown in Fig. 4-6(b), then decimated to the 25 × 25 image shown in Fig. 4-6(c). The decimated image is now a good replica of the original image and could function as a thumbnail image. The decimated image pixel values in Fig. 4-6(c) are all equal to certain pixel values in the lowpass-filtered image in Fig. 4-6(a).

[Figure 4-6 Clown image: (a) lowpass-filtered 200 × 200 image, (b) spectrum of the lowpass-filtered image, and (c) unaliased, decimated 25 × 25 version of (a).]

◮ A MATLAB code for this example is available on the book website. ◭
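As a hedged illustration of Example 4-2, the following MATLAB fragment lowpass filters and then decimates an image by an integer factor. The variable names, the assumption that X already holds the 200 × 200 clown image, and the flooring of the retained half-band are ours, so this is a sketch rather than the code posted on the book website:

    L  = 8;                                % downsampling factor (200 -> 25)
    M  = size(X,1);                        % X: the 200 x 200 image
    FX = fftshift(fft2(X));                % center the zero frequency
    c  = M/2 + 1;                          % DC index after fftshift (M even)
    keep = floor(M/(2*L));                 % half-width of retained band
    mask = zeros(M);
    mask(c-keep:c+keep, c-keep:c+keep) = 1;% ideal lowpass: zero the rest
    Xf = real(ifft2(ifftshift(FX.*mask))); % antialias-filtered image
    Y  = Xf(1:L:end, 1:L:end);             % decimate: keep every Lth sample

Skipping the mask step and decimating X directly reproduces the aliased result of Example 4-1.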

4-6.1 Downsampling in the 2-D DFT Domain

The antialiasing approach works well when downsampling by an integer factor L, since decimation can be easily performed by keeping only every Lth row and column of the antialias-filtered image. But downsampling by a non-integer factor Md/Mo must be performed entirely in the 2-D DFT domain, as follows.

Antialias lowpass filtering is performed by setting to zero the (Mo − Md) center rows and columns of the 2-D DFT, in MATLAB depiction.

Decimation is then performed by deleting those (Mo − Md) center rows and columns of the 2-D DFT. Of course, there is no reason to set to zero rows and columns that will be deleted anyway, so in practice the first step need not be performed. By "center rows and columns" we mean those rows and columns with DFT indices k (horizontal or vertical; add one to all DFT indices to get MATLAB indices) in the (Mo × Mo) 2-D DFT of the (Mo × Mo) original image f[n, m]:

• For Md odd: (Md + 1)/2 ≤ k ≤ Mo − (Md + 1)/2.

• For Md even: Md/2 ≤ k ≤ Mo − Md/2, then insert a row or column of zeros at index k = Md/2.

Note that there is no need to subdivide some values of F[k1, k2] to preserve conjugate symmetry.

The procedure is best illustrated by an example.

Example 4-3: Downsampling

The goal is to downsample the MATLAB (6 × 6) image array:

X(m, n) = [ 3 1 4 1 5 9
            2 6 5 3 5 8
            9 7 9 3 2 3
            8 4 6 2 6 4
            3 3 8 3 2 7
            9 5 0 2 8 8 ].   (4.51)

The corresponding magnitudes of the (6 × 6) 2-D DFT of this array in MATLAB are

|FX(k1, k2)| = [ 173.00  23.81  20.66  15.00  20.66  23.81
                   6.93  25.00   7.55  14.00   7.94  21.93
                  11.14  19.98  16.09  15.10  14.93   4.58
                   9.00  16.09  19.98   1.00  19.98  16.09
                  11.14   4.58  14.93  15.10  16.09  19.98
                   6.93  21.93   7.94  14.00   7.55  25.00 ].   (4.52)

Setting to zero the middle three rows and columns of the 2-D DFT magnitudes in MATLAB depiction gives the magnitudes

|FG(k1, k2)| = [ 173.00  23.81  0  0  0  23.81
                   6.93  25.00  0  0  0  21.93
                   0       0    0  0  0   0
                   0       0    0  0  0   0
                   0       0    0  0  0   0
                   6.93  21.93  0  0  0  25.00 ].   (4.53)

Deleting the zero-valued rows and columns in the 2-D DFT magnitudes in MATLAB depiction gives the magnitudes

|FG(k1, k2)| = [ 173.00  23.82  23.82
                   6.93  25.00  21.93
                   6.93  21.93  25.00 ].   (4.54)

Multiplying by Md²/Mo² = 3²/6² = 1/4 and taking the inverse 2-D DFT gives

g[m, n] = [ 5.83  1.58  6.00
            6.08  6.33  3.00
            6.25  3.50  4.67 ].   (4.55)

This is the result we would have gotten by decimating the antialiased lowpass-filtered original image, but it was performed entirely in the 2-D DFT domain.

◮ A MATLAB code for this example is available on the book website. ◭

Concept Question 4-2: Why must we delete rows and columns of the 2-D DFT array to perform downsampling?
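For concreteness, Example 4-3 can be reproduced with a few MATLAB lines. This is a sketch of the procedure described above (our own variable names), not the code posted on the book website:

    % Minimal sketch of Example 4-3: downsample a 6x6 array to 3x3
    % entirely in the 2-D DFT domain (Md odd). DFT index k maps to
    % MATLAB index k+1.
    X  = [3 1 4 1 5 9; 2 6 5 3 5 8; 9 7 9 3 2 3;
          8 4 6 2 6 4; 3 3 8 3 2 7; 9 5 0 2 8 8];
    Mo = size(X,1);  Md = 3;
    FX = fft2(X);
    keep = [1:(Md+1)/2, Mo-(Md-1)/2+1:Mo];   % retained DFT rows/columns
    FG = FX(keep, keep);                     % delete the center rows/columns
    g  = real(ifft2((Md^2/Mo^2)*FG));        % scale by Md^2/Mo^2, then invert

Here keep = [1 2 6], so DFT indices k = 2, 3, 4 are deleted, exactly the range (Md + 1)/2 ≤ k ≤ Mo − (Md + 1)/2 given above.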
4-7 B-Splines Interpolation

In the preceding sections we examined several different image interpolation methods, some of which perform the interpolation directly in the spatial domain, and others that perform the interpolation in the spatial frequency domain. Now, we introduce yet another method, known as the B-splines interpolation method, with the distinguishing feature that it is the method most commonly used for image interpolation. Unlike with downsampling, B-spline interpolation has no aliasing issues when the sampling interval ∆ is too large. Moreover, unlike with upsampling, B-spline interpolation need not result in blurred images.

B-splines are a family of piecewise polynomial functions, with each polynomial piece having a degree N, where N is a non-negative integer. As we will observe later on in this section, a B-spline of order zero is equivalent to the nearest-neighbor interpolation method of Section 3-5.1, but it is simpler to implement than the sinc interpolation formula. Interpolation with B-splines of order N = 1 generates linear interpolation, which is used in computer graphics. Another popular member of the B-spline interpolation family is cubic interpolation, corresponding to N = 3. Cubic spline interpolation is used in Adobe® Photoshop® and through the command imresize.

4-7.1 B-Splines

Splines are piecewise-polynomial functions whose polynomial coefficients change at half-integer or integer values of the independent variable, called knots, so that the function and some of its derivatives are continuous at each knot. In 1-D, a B-spline βN(t) of order N is a piecewise polynomial of degree N, centered at t = 0. The support of βN(t), which is the interval outside of which βN(t) = 0, extends between −(N + 1)/2 and +(N + 1)/2.

◮ Hence, the duration of βN(t) is (N + 1). ◭

Formally, the B-spline function βN(t) is defined as

βN(t) = ∫_{−∞}^{∞} [sin(πf)/(πf)]^{N+1} e^{j2πft} df,   (4.56)

which is equivalent to the inverse Fourier transform of sinc^{N+1}(f). Recognizing that (a) the inverse Fourier transform of sinc(f) is a rectangle function and (b) multiplication in the frequency domain is equivalent to convolution in the time domain, it follows that

βN(t) = rect(t) ∗ ··· ∗ rect(t)   ((N + 1) times),   (4.57)

with

rect(t) = { 1 for |t| < 1/2,   0 for |t| > 1/2 }.   (4.58)

Application of Eq. (4.57) for N = 0, 1, 2, and 3 leads to:

β0(t) = rect(t) = { 1 for |t| < 1/2,   0 for |t| > 1/2 },   (4.59)

β1(t) = β0(t) ∗ β0(t) = { 1 − |t| for |t| < 1,   0 for |t| > 1 },   (4.60)

β2(t) = β1(t) ∗ β0(t) = { 3/4 − t² for 0 ≤ |t| ≤ 1/2,   (1/2)(3/2 − |t|)² for 1/2 ≤ |t| ≤ 3/2,   0 for |t| > 3/2 },   (4.61)

β3(t) = β2(t) ∗ β0(t) = { 2/3 − t² + |t|³/2 for |t| ≤ 1,   (2 − |t|)³/6 for 1 ≤ |t| ≤ 2,   0 for |t| > 2 }.   (4.62)

Note that in all cases, βN(t) is continuous over its full duration. For N ≥ 1, the B-spline function βN(t) is continuous and differentiable (N − 1) times at all times t. For β2(t), the function is continuous across its full interval (−3/2, 3/2), including at t = 1/2. Similarly, β3(t) is continuous over its interval (−2, 2), including at t = 1.
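For reference, Eqs. (4.59)–(4.62) can be evaluated numerically with a short MATLAB helper. The function name bspline is ours, a minimal sketch rather than a library routine:

    % Evaluate the B-spline of order N (0..3) at the points in t,
    % directly from the piecewise definitions of Eqs. (4.59)-(4.62).
    function b = bspline(N, t)
      a = abs(t);
      switch N
        case 0, b = double(a < 1/2);                       % rect(t)
        case 1, b = max(1 - a, 0);                         % triangle
        case 2, b = (3/4 - t.^2).*(a <= 1/2) + ...
                    0.5*(3/2 - a).^2.*(a > 1/2 & a <= 3/2);
        case 3, b = (2/3 - t.^2 + a.^3/2).*(a <= 1) + ...
                    ((2 - a).^3/6).*(a > 1 & a <= 2);
      end
    end

For example, bspline(2, [-1 0 1]) returns the weights {1/8, 3/4, 1/8} used later in Eq. (4.71).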
[Figure 4-7 Plots of βN(t) for N = 0, 1, 2, and 3: (a) β0(t); (b) β1(t); (c) β2(t); (d) β3(t).]

Plots of the B-splines of order N = 0, 1, 2, and 3 are displayed in Fig. 4-7. From the central limit theorem in the field of probability, we know that convolving a function with itself repeatedly makes the function resemble a Gaussian. This is evident in the present case as well.

From the standpoint of 1-D and 2-D interpolation of signals and images, the significance of B-splines is in how we can use them to express a signal or image. To guide us through the process, let us assume we have 6 samples x(n∆), as shown in Fig. 4-8, extending between t = 0 and t = 5∆. Our objective is to interpolate between these 6 points so as to obtain a continuous function x(t). An important constraint is to ensure that x(t) = x(n∆) at the 6 discrete times n∆.

For a B-spline of a specified order N, the interpolation is realized by expressing the desired interpolated signal x(t) as a linear combination of time-shifted B-splines, all of order N:

x(t) = Σ_{m=−∞}^{∞} c[m] βN(t/∆ − m).   (4.63)

Here, βN(t/∆ − m) is the B-spline function βN(t), with t scaled by the sampling interval ∆ and delayed by a scaled time integer m.

◮ The support of βN(t/∆ − m) is

m − (N + 1)/2 < t/∆ < m + (N + 1)/2.   (4.64)

That is, βN(t/∆ − m) = 0 outside that interval. ◭

Associated with each value of m is a constant coefficient c[m] whose value is related to the sampled values x(n∆) and the order N of the B-spline. More specifically, the values of c[m] have to be chosen such that the aforementioned constraint requiring that x(t) = x(n∆) at discrete times t = n∆ is satisfied. The process is

described in forthcoming subsections.

[Figure 4-8 Samples x(n∆) to be interpolated into x(t).]   [Figure 4-9 B-spline interpolation for N = 0.]

Since each B-spline is a piecewise polynomial of order N, continuous and differentiable (N − 1) times, any linear combination of time-shifted B-splines also constitutes a piecewise polynomial of order N, and will also be continuous and differentiable. Thus, the B-splines form a basis—hence, the "B" in their name—and where the basis is used to express x(t), as in Eq. (4.63), x(t) is continuous and differentiable (N − 1) times at the knots t = m∆ if N is odd and at the knots t = (m + 1/2)∆ if N is even. This feature of B-splines makes them suitable for interpolation, as well as for general representation of signals and images.

Exercise 4-3: Show that the area under βN(t) is 1.
Answer: ∫_{−∞}^{∞} βN(t) dt = B(0) by entry #11 in Table 2-4. Set f = 0 in sinc^{N+1}(f) and use sinc(0) = 1.

Exercise 4-4: Why does βN(t) look like a Gaussian function for N ≥ 3?
Answer: Because βN(t) is rect(t) convolved with itself (N + 1) times. By the central limit theorem of probability, convolving any square-integrable function with itself results in a function resembling a Gaussian.

Exercise 4-5: Show that the support of βN(t) is −(N + 1)/2 < t < (N + 1)/2.
Answer: The support of βN(t) is the interval outside of which βN(t) = 0. The support of rect(t) is −1/2 < t < 1/2. βN(t) is rect(t) convolved with itself N + 1 times, which has duration N + 1 centered at t = 0.

4-7.2 N = 0: Nearest-Neighbor Interpolation

For N = 0, Eqs. (4.63) and (4.64) lead to

x(t) = Σ_{m=−∞}^{∞} c[m] β0(t/∆ − m) = Σ_{m=−∞}^{∞} c[m] rect(t/∆ − m)
     = Σ_{m=−∞}^{∞} c[m] × { 1 for m − 1/2 < t/∆ < m + 1/2,   0 otherwise }.   (4.65)

The expression given in Eq. (4.65) consists of a series of adjoining, but not overlapping, rectangle functions. For m = 0, rect(t/∆) is centered at t/∆ = 0 and extends over the range −1/2 < t/∆ < 1/2. Similarly, for m = 1, rect(t/∆ − 1) is centered at t/∆ = 1 and extends over 1/2 < t/∆ < 3/2. Hence, to satisfy the constraint that x(t) = x(n∆) at the sampled locations t = n∆, we need to set m = n and c[m] = x(n∆):

x(t) = x(n∆) rect(t/∆ − n),   n − 1/2 < t/∆ < n + 1/2.   (4.66)

The interpolated function x(t) is shown in Fig. 4-9, along with the 6 samples { x(n∆) }. The B-spline representation given by Eq. (4.66) is a nearest-neighbor (NN) interpolation: x(t) in the interval { n∆ ≤ t ≤ (n + 1)∆ } is set to the closer (in time) of x(n∆) and x((n + 1)∆). The resulting x(t) is a piecewise constant, as shown in Fig. 4-9.

4-7.3 Linear Interpolation

For N = 1, the B-spline β1(t/∆ − m) assumes the shape of a triangle (Fig. 4-7(b)) centered at t/∆ = m. Figure 4-10(b) displays the triangles centered at t/∆ = m for m = 0 through 5. Also displayed in the same figure are the values of x(n∆).

[Figure 4-10 B-spline linear interpolation (N = 1): (a) x(n∆); (b) β1(t/∆ − m), m = 0, …, 5; (c) x(t).]



To satisfy Eq. (4.63) for N = 1, namely

x(t) = Σ_{m=−∞}^{∞} c[m] β1(t/∆ − m),   (4.67)

as well as meet the condition that x(t) = x(n∆) at the 6 given points, we should set m = n and select

c[m] = c[n] = x(n∆)   (N = 1).

Consequently, for any integer n, x(t) at time t between n∆ and (n + 1)∆ is a weighted average given by

x(t) = x(n∆) ((n + 1) − t/∆) + x((n + 1)∆) (t/∆ − n).   (4.68a)

The associated duration is

n∆ ≤ t ≤ (n + 1)∆.   (4.68b)

Application of the B-spline linear interpolation to the given samples x(n∆) leads to the continuous function x(t) shown in Fig. 4-10(c). The linear interpolation amounts to setting { x(t), n∆ ≤ t ≤ (n + 1)∆ } to lie on the straight line connecting x(n∆) to x((n + 1)∆).

Exercise 4-6: Given the samples { x(0), x(∆), x(2∆), x(3∆) } = { 7, 4, 3, 2 }, compute x(∆/3) by interpolation using: (a) nearest neighbor; (b) linear.
Answer: (a) ∆/3 is closer to 0 than to ∆, so x(∆/3) = x(0) = 7.
(b) x(∆/3) = (2/3)x(0) + (1/3)x(∆) = (2/3)(7) + (1/3)(4) = 6.

4-7.4 Quadratic Interpolation

For N ≥ 2, interpolation using B-splines becomes more complicated than for N = 0 and 1 because the supports of the basis functions { βN(t/∆ − m) } overlap in time for different values of m, as shown in Fig. 4-11 for N = 2. Unlike the cases N = 0 and N = 1, wherein we set c[m] = x(m∆), now c[m] is related to values of more than one of the discrete samples { x(n∆) }.

[Figure 4-11 B-splines β2(t/∆ − m) overlap in time.]

For N ≥ 2, the relationships between coefficients { c[m] } and samples { x(n∆) } can be derived by starting with Eq. (4.63),

x(t) = Σ_{m=−∞}^{∞} c[m] βN(t/∆ − m),   (4.69)

and then setting t = n∆, which gives

x(n∆) = Σ_{m=−∞}^{∞} c[m] βN(n − m) = c[n] ∗ βN(n),   (4.70)

where use was made of the discrete-time convolution relation given by Eq. (2.71a).

For N = 2, Eq. (4.61) indicates that β2(n) ≠ 0 only for integers n = { −1, 0, 1 }. Hence, the discrete-time convolution given by Eq. (4.70) simplifies to

x(n∆) = c[n − 1] β2(1) + c[n] β2(0) + c[n + 1] β2(−1)
      = (1/8) c[n − 1] + (3/4) c[n] + (1/8) c[n + 1]   (N = 2).   (4.71)

In the second step, the constant coefficients were computed using Eq. (4.61) for β2(t). The sum truncates because β2(t) = 0 for |t| ≥ 3/2, so only three basis functions overlap at any specific time t, as is evident in Fig. 4-11.

Similarly, for N = 3, β3(t) = 0 for |t| ≥ 2, which also leads to the sum of three terms:

x(n∆) = c[n − 1] β3(1) + c[n] β3(0) + c[n + 1] β3(−1)
      = (1/6) c[n − 1] + (4/6) c[n] + (1/6) c[n + 1]   (N = 3).   (4.72)

If N is increased beyond 3, the number of terms increases, but the proposed method of solution to determine the values of c[n] remains the same. Specifically, we offer the following recipe:

(1) Delay { x(n∆) } to { x̃(n∆) } to make it causal. Compute the No-th-order DFT X[k] of { x̃(n∆) }, where No is the number of samples { x(n∆) }.

(2) Delay { β2(−1), β2(0), β2(1) } by 1 to { β̃2(t) } to make it causal. Compute the No-th-order DFT B[k] of

{ β2(−1), β2(0), β2(1) } = { 1/8, 3/4, 1/8 }.

(3) Compute the No-th-order inverse DFT to determine c[n]:

c[n] = DFT⁻¹ { X[k]/B[k] }.   (4.73)

Example 4-4: Quadratic Spline Interpolation

Figure 4-12(a) displays samples { x(n∆) } with ∆ = 1. Obtain an interpolated version x(t) using quadratic splines.

Solution: From Fig. 4-12(a), we deduce that x(n) has 6 nonzero samples and is given by

x(n) = { x(−3), x(−2), x(−1), x(0), x(1), x(2) } = { 3, 19, 11, 17, 26, 4 }.

As noted earlier,

β2(n) = { β2(−1), β2(0), β2(1) } = { 1/8, 3/4, 1/8 }.

Inserting x(n) and β2(n) in Eq. (4.70) establishes the convolution problem

{ 3, 19, 11, 17, 26, 4 } = { 1/8, 3/4, 1/8 } ∗ c[n].

Following the solution recipe outlined earlier—and demonstrated in Section 2-9—the deconvolution solution is

c[n] = { 24, 8, 16, 32 },

and, therefore, the interpolated continuous function is

x(t) = 24 β2(t + 2) + 8 β2(t + 1) + 16 β2(t) + 32 β2(t − 1).

A plot of x(t) is shown in Fig. 4-12(b). It is evident that x(t) has the same values as x(n) at t = { −3, −2, −1, 0, 1, 2 }.

[Figure 4-12 (a) Original samples x(n) and (b) interpolated function x(t) using quadratic splines.]

◮ Note: The MATLAB code for solving Example 4-4 is available on the book website. ◭

Concept Question 4-3: What is the difference between zero-order-spline interpolation and nearest-neighbor interpolation?

Concept Question 4-4: Why use cubic interpolation, when quadratic interpolation produces smooth curves?
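The DFT-based recipe of Example 4-4 is three lines in MATLAB. This is a minimal sketch assuming the cyclic deconvolution of Section 2-9; it is not the code posted on the book website:

    % Quadratic-spline coefficients for Example 4-4 via the DFT recipe.
    x = [3 19 11 17 26 4];            % x(n), n = -3..2, delayed to be causal
    b = [1/8 3/4 1/8 0 0 0];          % beta2 delayed by 1, zero-padded to No = 6
    c = real(ifft(fft(x)./fft(b)));   % Eq. (4.73): c = DFT^-1 { X[k]/B[k] }
    % c = [24 8 16 32 0 0]: the nonzero entries are the {24, 8, 16, 32}
    % of Example 4-4, up to the causal delay.

The division never blows up here because B[k] = 3/4 + (1/4) cos(πk/3) ≥ 1/2 for all k.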

Concept Question 4-5: Why do quadratic and cubic interpolation require computation of coefficients, while linear interpolation does not?

Exercise 4-7: In Eq. (4.63), why isn't c[n] = x(n∆) for N ≥ 2?
Answer: Because three of the basis functions βN(t/∆ − m) overlap for any t.

4-8 2-D Spline Interpolation

1-D interpolation using splines generalizes directly to 2-D. The task now is to obtain a continuous function f(x, y) from samples { f(n∆, m∆) }.

The 2-D spline functions are separable products of the 1-D spline functions:

βN(x, y) = βN(x) βN(y).   (4.74)

For example, for N = 1 we have

β1(x, y) = { (1 − |x|)(1 − |y|) for 0 ≤ |x|, |y| ≤ 1,   0 otherwise },   (4.75)

which is pyramidal in shape. Interpolation using β1(x, y) is called bilinear interpolation, which is a misnomer because β1(x, y) includes a product term, |x||y|.

In 2-D, Eq. (4.63) becomes

f(x, y) = Σ_{m=−∞}^{∞} Σ_{n=−∞}^{∞} c[n, m] βN(x/∆ − n) βN(y/∆ − m).   (4.76)

For N = 0 and N = 1, c[n, m] = f(n∆, m∆), but for N ≥ 2, c[n, m] is computed from { f(n∆, m∆) } using a 2-D version of the DFT recipe outlined in Section 4-7.4.

4-8.1 Nearest-Neighbor (NN) Interpolation

NN interpolation of images works in the same way as NN interpolation of 1-D signals. After locating the four samples surrounding a location (x0, y0)—thereby specifying the applicable values of n and m—the value assigned to f(x, y) at location (x0, y0) is the value of the nearest-location neighbor among those four neighbors:

{ f(n∆, m∆), f((n + 1)∆, m∆), f(n∆, (m + 1)∆), f((n + 1)∆, (m + 1)∆) }.   (4.77)

4-8.2 Bilinear Image Interpolation

Linear interpolation is performed as follows:

(1) Each image location (x0, y0) has four nearest sampled values given by Eq. (4.77), with a unique set of values for n and m.

(2) Compute:

f(x0, m∆) = f(n∆, m∆) (n + 1 − x0/∆) + f((n + 1)∆, m∆) (x0/∆ − n),   (4.78a)
f(x0, (m + 1)∆) = f(n∆, (m + 1)∆) (n + 1 − x0/∆) + f((n + 1)∆, (m + 1)∆) (x0/∆ − n),   (4.78b)

and then combine them to find

f(x0, y0) = f(x0, m∆) (m + 1 − y0/∆) + f(x0, (m + 1)∆) (y0/∆ − m).   (4.79)

The preceding computation linearly interpolates in x for y = m∆, and again for y = (m + 1)∆, and then linearly interpolates in y for x = x0.

4-8.3 Cubic Spline Interpolation

We describe the cubic spline interpolation procedure through an example. Figure 4-13(a) shows a synthetic-aperture radar (SAR) image of a metropolitan area, and part (b) of the figure shows a magnified version of the central part of the original image using cubic-spline interpolation. The magnification factor is 3 along each direction. The interpolation uses

x[n, m] = c[n, m] ∗∗ (β3[n] β3[m])   (4.80)

to compute x[n, m] at x = n/3 and y = m/3 for integers n and m. The values of c[n, m] are determined using the DFT recipe outlined in Section 4-7.4 in combination with the product of cubic spline functions given by Eq. (4.62) evaluated at

(−1, 0, 1):

[ β3(−1)                            [ 1/6                       [ 1  4  1
  β3(0)   [β3(−1)  β3(0)  β3(1)] =    4/6  [1/6 4/6 1/6] = (1/36)  4 16  4
  β3(1) ]                             1/6 ]                        1  4  1 ].   (4.81)

Exercise 4-8: The "image" [1 2; 3 4] is interpolated using bilinear interpolation. What is the interpolated value at the center of the image?
Answer: (1/4)(1 + 2 + 3 + 4) = 2.5
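A sketch of Eqs. (4.78)–(4.79) in MATLAB, assuming ∆ = 1 and the MATLAB index convention of Eq. (4.40); the function name bilinear is ours:

    % Bilinear interpolation at an arbitrary point (x0, y0) inside one
    % pixel cell; f stores samples f(n, m) at MATLAB indices (m+1, n+1).
    function v = bilinear(f, x0, y0)
      n = floor(x0);  m = floor(y0);                       % lower sample indices
      fx_lo = f(m+1, n+1)*(n+1-x0) + f(m+1, n+2)*(x0-n);   % Eq. (4.78a), row y = m
      fx_hi = f(m+2, n+1)*(n+1-x0) + f(m+2, n+2)*(x0-n);   % Eq. (4.78b), row y = m+1
      v = fx_lo*(m+1-y0) + fx_hi*(y0-m);                   % Eq. (4.79)
    end

For example, bilinear([1 2; 3 4], 0.5, 0.5) returns 2.5, matching Exercise 4-8.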
[Figure 4-13 The image in (b) is the (200 × 200) central part of the synthetic-aperture radar (SAR) image in (a), magnified by a factor of 3 along each direction using cubic-spline interpolation: (a) original SAR image to be magnified; (b) SAR image magnified by 3 using cubic splines.]

4-9 Comparison of 2-D Interpolation Methods

In this chapter, we have discussed three image interpolation methods: the sinc interpolation formula in Section 4-1.1, the Lanczos interpolation formula in Section 4-1.2, and the B-spline interpolation method in Section 4-8. To compare the effectiveness of the different methods, we chose the original clown image shown in Fig. 4-14(a) and then downsampled it by a factor of 9 by retaining only 1/9 of the original pixels (1/3 along each direction). The locations of the downsampled pixels and the resulting (67 × 67) downsampled image are shown in parts (b) and (c) of Fig. 4-14, respectively.

Next, we applied the various interpolation methods listed earlier to the downsampled clown image in Fig. 4-14(c) so as to generate an interpolated version of the original clown image. This is equivalent to magnifying the (67 × 67) downsampled clown image in Fig. 4-14(c) by a factor of 3 in each dimension. The results, displayed in Fig. 4-15, deserve the following commentary:

(a) Relative to the original clown image, of the first three interpolated images, the Lanczos with a = 3 is slightly better than that with a = 2, and both are better than the sinc-interpolated image.

(b) Among the B-spline images, significant improvement is realized in image quality as N is increased from N = 0 (nearest neighbor) to N = 1 (bilinear) and then to N = 3 (cubic). MATLAB code for this example is available on the book website.

[Figure 4-15 Comparison of three interpolation methods: (a) sinc interpolation; (b) and (c) Lanczos interpolation with a = 2 and a = 3, respectively; and (d) to (f) B-spline with N = 0 (NN), N = 1 (linear), and N = 3 (cubic spline), respectively.]

[Figure 4-14 The original clown image in (a) was downsampled to the image in (c) by sampling only 1/9 of the pixels of the original image: (a) original clown image; (b) samples to be interpolated; (c) downsampled image.]

4-10 Examples of Image Interpolation Applications

Example 4-5: Image Rotation

Recall from Eq. (3.12) that rotating an image f(x, y) by an angle θ leads to a rotated image g(x, y) given by

g(x, y) = f(x cos θ + y sin θ, y cos θ − x sin θ).   (4.82)

Sampling g(x, y) at x = n∆ and y = m∆ gives

g(n∆, m∆) = f(n∆ cos θ + m∆ sin θ, m∆ cos θ − n∆ sin θ),   (4.83)

which clearly requires interpolation of f(x, y) at the required points from its given samples f(n∆, m∆). In practice, nearest-neighbor (NN) interpolation is usually sufficient to realize the necessary interpolation.

Figure 4-16(a) displays a zero-padded clown image, and part (b) displays the image after rotation by 45° using NN interpolation. The rotated image bears a very good resemblance to the rotated original. The MATLAB code for this figure is on the book website.
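A minimal sketch of Eq. (4.83) with NN interpolation follows. It is not the book's posted code: the function name rotate_nn is ours, the origin is taken at the image center, the angle is in radians, and the row/column bookkeeping is deliberately simplified:

    % Rotate a square, zero-padded image X by theta using NN interpolation.
    function G = rotate_nn(X, theta)
      M = size(X,1);  c = (M+1)/2;                  % image center
      G = zeros(M);
      for n = 1:M
        for m = 1:M
          x = (n-c)*cos(theta) + (m-c)*sin(theta);  % Eq. (4.83) mapping
          y = (m-c)*cos(theta) - (n-c)*sin(theta);
          ns = round(x + c);  ms = round(y + c);    % nearest neighbor
          if ns >= 1 && ns <= M && ms >= 1 && ms <= M
            G(n,m) = X(ns,ms);                      % sample the original
          end
        end
      end
    end

The zero padding matters: points that rotate outside the original support simply remain zero.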

Example 4-6: Exponential Image Warping

Image warping or morphing entails creating a new image g(x, y) from the original image f(x, y) where

g(x, y) = f(Tx(x), Ty(y))   (4.84)

for some 1-D transformations Tx(x) and Ty(y). Image shifting by (xo, yo) can be implemented using

Tx(x) = x − xo   and   Ty(y) = y − yo.

Magnification by a factor of a can be implemented using

Tx(x) = x/a   and   Ty(y) = y/a.

More interesting warping of images can be performed using nonlinear transformations, as demonstrated by the following illustrations.

(a) Warped Clown Image

After shifting the clown image so that the origin [n, m] = [0, 0] is at the center of the image, the image was warped using Tn(n) = n e^{−|n|/300} and Tm(m) = m e^{−|m|/300} and NN interpolation. The warped image is shown in Fig. 4-17(a). Repeating the process with the space constant 300 replaced with 200 leads to greater warping, as shown in Fig. 4-17(b).

Example 4-7: Square-Root and Inverse Image Warping

Another form of image warping is realized by applying a square-root function of the form

Tn(n) = n √|n| / 25   and   Tm(m) = m √|m| / 25.

Repetition of the steps described in the previous example, but using the square-root transformation instead, leads to the images in Fig. 4-18(a). The MATLAB code for this figure is available on the book website.

Another transformation is the inverse function given by

Tn(n) = n|n|/a   and   Tm(m) = m|m|/a.

The result of warping the clown image with a = 300 is shown in Fig. 4-18(b). The MATLAB code for this figure is available on the book website.

Concept Question 4-6: Provide four applications of in-


250 terpolation.

300 
1 2
Exercise 4-9: The “image” is rotated counter-
3 4
350 ◦
clockwise 90 . What is the result?
 
399 2 4
0 50 100 150 200 250 300 350 399 Answer:
1 3
(b) Clown image rotated by 45˚
 
4 8
Figure 4-16 Clown image before and after rotation by 45◦ . Exercise 4-10: The “image” is magnified by a
12 16
factor of three. What is the result, using: (a) NN; (b) bilinear
interpolation?

[Figure 4-17 Nonlinear image warping with space constants of (a) 300 and (b) 200: (a) Tn(n) = n e^{−|n|/300} and Tm(m) = m e^{−|m|/300}; (b) Tn(n) = n e^{−|n|/200} and Tm(m) = m e^{−|m|/200}.]

[Figure 4-18 Clown image warped with (a) square-root transformation and (b) inverse transformation: (a) Tn(n) = n √|n| / 25 and Tm(m) = m √|m| / 25; (b) Tn(n) = n|n|/300 and Tm(m) = m|m|/300.]

Answer:
(a)
[  4  4  4  8  8  8
   4  4  4  8  8  8
   4  4  4  8  8  8
  12 12 12 16 16 16
  12 12 12 16 16 16
  12 12 12 16 16 16 ]
(b)
[ 1  2  1  2  4  2
  2  4  2  4  8  4
  1  2  1  2  4  2
  3  6  3  4  8  4
  6 12  6  8 16  8
  3  6  3  4  8  4 ]

Summary

Concepts

• Interpolation is "connecting the dots" of 1-D samples, and "filling in the gaps" of 2-D samples.
• Interpolation can be used to rotate and to warp or "morph" images.
• Upsampling an image can be performed by inserting rows and columns of zeros in the 2-D DFT of the image. Care must be taken to preserve conjugate symmetry in the 2-D DFT.
• Downsampling an image can be performed by deleting rows and columns in the 2-D DFT of the image. Care must be taken to preserve conjugate symmetry in the 2-D DFT. Deleting rows and columns performs lowpass filtering so that the downsampled image is not aliased.
• B-splines are piecewise polynomial functions that can be used to interpolate samples in 1-D and 2-D.
• For N ≥ 2, computation of the coefficients {c[m]} from samples {x(m∆)} can be formulated as a deconvolution problem.
• 2-D interpolation using B-splines is a generalization of 1-D interpolation using B-splines.

Mathematical Formulae

B-splines: βN(t) = rect(t) ∗ ··· ∗ rect(t)   ((N + 1) times)

B-spline: β0(t) = rect(t) = { 1 for |t| < 1/2,   0 for |t| > 1/2 }

B-spline: β1(t) = { 1 − |t| for |t| ≤ 1,   0 for |t| ≥ 1 }

B-spline: β2(t) = { 3/4 − t² for 0 ≤ |t| ≤ 1/2,   (3/2 − |t|)²/2 for 1/2 ≤ |t| ≤ 3/2,   0 for |t| ≥ 3/2 }

B-spline: β3(t) = { 2/3 − t² + |t|³/2 for |t| ≤ 1,   (2 − |t|)³/6 for 1 ≤ |t| ≤ 2,   0 for |t| ≥ 2 }

B-spline 1-D interpolation: x(t) = Σ_{m=−∞}^{∞} c[m] βN(t/∆ − m)

Nearest-neighbor 1-D interpolation: x(t) = x(n∆) for |t − n∆| < ∆/2

Linear 1-D interpolation: x(t) = x(n∆)((n + 1) − t/∆) + x((n + 1)∆)(t/∆ − n) for n∆ ≤ t ≤ (n + 1)∆

Important Terms   Provide definitions or explain the meaning of the following terms: B-spline, downsampling, interpolation, Lanczos function, nearest-neighbor, thumbnail image, upsampling

PROBLEMS

Section 4-3: Upsampling and Interpolation

4.1 Write a MATLAB program that loads the 50 × 50 image in tinyclown.mat and magnifies it by four using upsampling. Note that 50 is an even number.

4.2 Write a MATLAB program that loads the 50 × 50 image in tinyclown.mat, deletes the last row and column to make it 49 × 49, and magnifies it by four using upsampling. This is easier than Problem 4.1 since 49 is an odd number.

4.3 Write a MATLAB program that loads the 64 × 64 image in tinyletters.mat and magnifies it by four using upsampling.

Note that 64 is an even number.

4.4 Write a MATLAB program that loads the 64 × 64 image in tinyletters.mat, deletes the last row and column to make it 63 × 63, and magnifies it by four using upsampling. This is easier than Problem 4.3 since 63 is an odd number.

Section 4-5: Downsampling

4.5 Write a MATLAB program that loads the 200 × 200 image in clown.mat, antialias lowpass filters it, and demagnifies it by four using downsampling. (This is how the image in tinyclown.mat was created.)

4.6 Repeat Problem 4.5, but skip the antialias lowpass filter.

4.7 Write a MATLAB program that loads the 256 × 256 image in letters.mat, antialias lowpass filters it, and demagnifies it by four using downsampling. (This is how the image in tinyletters.mat was created.)

4.8 Repeat Problem 4.7, but skip the antialias lowpass filter.

4.9 Show that if the sinc interpolation formula is used to upsample an M × M image f[n, m] to an N × N image g[n, m] by an integer factor L (so that N = ML), then g[nL, mL] = f[n, m], so that the values of f[n, m] are preserved after upsampling.

Section 4-8: 2-D Spline Interpolation

4.10 Write a MATLAB program that loads the 50 × 50 image in tinyclown.mat and magnifies it by four using nearest-neighbor interpolation.

4.11 Write a MATLAB program that loads the 64 × 64 image in tinyletters.mat and magnifies it by four using nearest-neighbor interpolation.

4.12 NN interpolation is used to magnify

[a b c
 d e f
 g h i]

by three. What is the result?

4.13 Linear interpolation is used to magnify

[a b c
 d e f
 g h i]

by two. What is the result?

4.14 Another way to derive the formula for linear interpolation is as follows: The goal is to interpolate the four points { f(0,0), f(1,0), f(0,1), f(1,1) } using a formula f(x, y) = f0 + f1 x + f2 y + f3 xy, where { f0, f1, f2, f3 } are found from the given points. This extends to

{ f(n, m), f(n + 1, m), f(n, m + 1), f(n + 1, m + 1) }

for any integers n, m.

(a) Set up a linear system of equations with unknowns { f0, f1, f2, f3 } and knowns { f(0,0), f(1,0), f(0,1), f(1,1) }.

(b) Solve the system to obtain a closed-form expression for f(x, y) as a function of { f(0,0), f(0,1), f(1,0), f(1,1) }.

4.15 The image

[a b c
 d e f
 g h i]

is rotated 90° clockwise. What is the result?

4.16 The image

[a b c
 d e f
 g h i]

is rotated 45° clockwise and magnified by 2 using linear interpolation. What is the result?

4.17 Recall from Eq. (3.12) that rotating an image f(x, y) by θ to get g(x, y) is implemented by

g(x, y) = f(x′, y′) = f(x cos θ + y sin θ, y cos θ − x sin θ),

where from Eq. (3.11)

[x′; y′] = [cos θ  sin θ; −sin θ  cos θ] [x; y].

This is point-by-point, and so it is very slow. A faster way to rotate an image by transforming it first in y and then in x is as follows: (1) Let h(x, y) = f(x, (y/cos θ − x tan θ)); (2) then g(x, y) = h(x cos θ + y sin θ, y).

(a) Show that this computes g(x, y) from f(x, y).

(b) Show that this amounts to the matrix factorization

[cos θ  sin θ; −sin θ  cos θ] = [1  0; ∗  ∗] [∗  ∗; 0  1]

for some elements ∗.

4.18 Write a MATLAB program that loads the 50 × 50 image in tinyclown.mat and magnifies it by four using cubic spline interpolation.

4.19 Write a MATLAB program that loads the 64 × 64 image in tinyletters.mat and magnifies it by four using cubic spline interpolation.

4.20 This problem is for readers who have some familiarity with 1-D DSP. Use of 1-D quadratic splines on N interpolation points { x(n∆) } requires solving

x(n∆) = c[n − 1] β2(1) + c[n] β2(0) + c[n + 1] β2(1)

(Eq. (4.71)) for { c[n] } from { x(n∆) }. In Section 4-7.4 this was solved using the DFT, which requires (N/2) log2 N multiplications. This problem gives a faster method, requiring only 2N < (N/2) log2 N multiplications.

Let

h[n] = { β2(−1), β2(0), β2(1) } = β2(1) { 1, r, 1 },

where r = β2(0)/β2(1).

(a) Show that h[n] can be written as

h[n] = β2(1) { 1, 1/ρ } ∗ { 1, ρ }

for some constant ρ. Determine ρ.

(b) Show that h[n] can be implemented by the following two systems connected in series:

y1[n] = h1[n] ∗ x1[n] = x1[n] + ρ x1[n − 1],
y2[n] = h2[n] ∗ x2[n] = x2[n + 1] + (1/ρ) x2[n].

(c) Show that for each of these systems, xi[n] can be computed recursively and stably from yi[n] using

x1[n] + ρ x1[n − 1] = y1[n];   x2[n] + ρ x2[n + 1] = ρ y2[n].

The latter system must be run backwards in time n.

(d) Determine a recipe for computing { c[n] } from { x(n∆) } using these concepts.

4.21 Repeat Problem 4.20 for cubic splines.
Chapter 5
Image Enhancement

Contents
Overview, 160
5-1 Pixel-Value Transformation, 160
5-2 Unsharp Masking, 163
5-3 Histogram Equalization, 167
5-4 Edge Detection, 171
5-5 Summary of Image Enhancement Techniques, 176
Problems, 178

Objectives
Learn to:
■ Use linear or gamma transformation to alter pixel values to bring out image features.
■ Use unsharp masking or the Laplacian to sharpen an image.
■ Use histogram equalization to brighten an image.
■ Use Sobel or Canny edge detection to produce an edge image of a given image.

This chapter covers various types of image enhancement, in which the goal is to deliberately alter the image to brighten it, increase its contrast, sharpen it, or enhance features such as edges. Unsharp masking sharpens an image using a high-pass filter, but this also makes the image noisier. Histogram equalization nonlinearly alters pixel values to spread them out more evenly over the display range of the image. Edge enhancement produces an edge image of just the edges of the image, which can be useful in image recognition in computer vision.
Overview

Image enhancement is an operation that transforms an image f[n, m] to another image g[n, m] in which features of f[n, m], such as edges or contrasts between different pixel values, are emphasized. It is not the same as image restoration, which includes denoising (removing noise from an image), deblurring (refocusing an out-of-focus image), and the more general case of deconvolution (undoing the effect of a PSF on an image). In these three types of operations, the goal is to recover the true image f[n, m] from its noisy or blurred version g[n, m]. Image restoration is covered in Chapter 6.

Image enhancement techniques covered in this chapter include: linear and nonlinear transformations of pixel values for displaying images more clearly; unsharp masking, a technique originally developed for sharpening images in film-based photography; histogram equalization for brightening images; and edge detection for identifying edges in images.

5-1 Pixel-Value Transformation

The image dynamic range Ri of an image f[n, m] is defined as the range of pixel values contained in the image, extending between a minimum value fmin and a maximum value fmax. For a display device, such as a printed page or a computer monitor, the display dynamic range Rd of the display intensity g[n, m] is the full range available for displaying an image, extending from a minimum of zero to a maximum gmax.

Ideally, the display device should display an image such that the information content of the image is conveyed to the user most optimally. This is accomplished by applying a preprocessing transformation of pixel values. If Ri extends over a narrow range of Rd, a transformation can be used to expand Ri to take full advantage of the available extent of Rd. Conversely, if Ri extends over several orders of magnitude, displaying the image over the limited linear range Rd would lead to pixel-value truncation. To avoid the truncation issue, a nonlinear transformation is needed so as to convert the dynamic range Ri into a range that is more compatible with the dynamic range Rd of the display device. We now explore both types of transformations.

5-1.1 Linear Transformation of Pixel Values

In general, image f[n, m] may have both positive and negative pixel values. A linear transformation linearly transforms the individual pixel values from f[n, m] to g[n, m], with 0 displayed as pure black and gmax displayed as pure white. Functionally, the linear transformation is given by

g[n, m] = gmax (f[n, m] − fmin) / (fmax − fmin).   (5.1)

Usually, g[n, m] is normalized so that gmax = 1.

Without the linear transformation, a display device would display all negative values of f[n, m] as black and would display all values larger than gmax as white. In MATLAB, the command imagesc(X),colormap(gray) applies the linear transformation given by Eq. (5.1), with gmin = 0 and gmax = 1, prior to displaying an image, thereby ensuring that the full range of values of array X is displayed properly, including negative values.
where I0 and a are constants related to the wavelength λ, the diameter of the opening, and the overall imaging geometry.

A plot of the normalized intensity

A(θ) = I(θ)/I0 = sinc²(aθ)   (5.3)

with a = 0.1 is displayed in Fig. 5-2(a) as a function of angle θ, with θ expressed in degrees. The peak value of A(θ) is 1, and the sinc²(aθ) function exhibits sidelobes that decrease in intensity with increasing value of |θ|. The plot provides a good representation of the normalized intensity in the range between the first pair of nulls, but it is difficult to examine the plot quantitatively outside the range between the two nulls.

Figure 5-2(b) provides an alternative format for displaying the normalized intensity, namely as

AdB(θ) = 10 log10 A(θ) = 10 log10[sinc²(aθ)].   (5.4)

[Figure 5-2 Plot of the normalized intensity as a function of angle θ: (a) A(θ) in natural units; (b) decibel scale, AdB(θ) = 10 log10 A(θ).]

The use of the logarithm in the decibel (dB) scale serves to compress the dynamic range; the range between 1 and 10⁻³, for example, gets converted to between 0 and −30 dB. Consequently, very small values of A(θ) become more prominent, while values close to 1 get compressed. This is evident in the plot shown in Fig. 5-2(b), where it is now much easier to "read" the values of the second and third lobes, compared with doing so using the linear scale shown in Fig. 5-2(a). We should note that at angles θ where A(θ) = 0, the corresponding value of AdB(θ) is −∞. In plots like the one in Fig. 5-2(b), the lower limit along the vertical axis is truncated to some finite value, in this case −35 dB.
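A short MATLAB sketch reproduces the two display formats; the plotting details are ours, and MATLAB's sinc(x) = sin(πx)/(πx) requires the Signal Processing Toolbox:

    theta = -45:0.1:45;              % angle in degrees
    a = 0.1;
    A = sinc(a*theta).^2;            % Eq. (5.3)
    AdB = max(10*log10(A), -35);     % Eq. (5.4), truncated at -35 dB
    subplot(2,1,1), plot(theta, A),   xlabel('\theta (degrees)')
    subplot(2,1,2), plot(theta, AdB), xlabel('\theta (degrees)')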
The decibel scale also is used in displaying image spectra, an example of which is shown in Fig. 5-3, as well as in multiple fields of science and engineering, including:

• Acoustics:

PdB = 20 log10(pressure / 20 µpascals),   (5.5)

where 20 µpascals is the smallest acoustic pressure that can create an audible sound. On this PdB scale, a whisper is about 30 dB and the sound intensity of a jet plane taking off is about 130 dB, making the latter 100 dB (or, equivalently, 100,000) times louder than a whisper.

• Richter Scale for Earthquakes:

DdB = log10(displacement / 1 µm),   (5.6)

where "displacement" is defined as the horizontal ground displacement at a location 100 km from the earthquake's epicenter. For an earthquake of Richter magnitude 6 the associated ground motion is 1 m. In contrast, the ground displacement associated with a Richter magnitude 3 earthquake is only 1 mm.

• Stellar Magnitude (of stars viewed from Earth):

SdB = −2.512 log10(star brightness / brightness of Vega).   (5.7)
A first-magnitude star, such as Spica in the constellation Virgo, has a stellar magnitude of approximately 1. Stars of magnitude 6 are barely visible to the naked eye (depending on viewing conditions) and are 100 times less bright than a first-magnitude star. The factor of −2.512 was chosen so that a first-magnitude star has a brightness equal to 40% of that of the star Vega, and Vega was chosen as the star with a reference brightness of zero magnitude.

The dB scale also is used in voltage and power ratios, and in defining signal-to-noise ratio.

[Figure 5-3 Spectrum F(µ, ν) displayed in (a) linear scale and (b) logarithmic scale.]

When applied to images, the logarithmic transformation of pixel values is given by the functional form

g[n, m] = a log10(f[n, m] + b),   (5.8)

where constant a is chosen so that the maximum value of g[n, m] is equal to gmax, and constant b is chosen so that the smallest value of g[n, m] is zero, which corresponds to setting the smallest value of (f[n, m] + b) equal to 1. The logarithmic transformation of pixel values also is used when displaying image spectra F(Ω1, Ω2), because the range of values of |F(Ω1, Ω2)| usually is very large.

5-1.3 Gamma Transformation of Pixel Values

For most display devices—including cathode-ray tubes, printers, and scanners—the intensity I of the displayed image is related to the signal voltage V by a power-law relation of the form

I[n, m] = a V^b[n, m],   (5.9)

where a and b are constants, with b having a value in the range 0.4 ≤ b ≤ 0.55. Because b is on the order of 0.5, the dynamic range of the intensity I is much smaller than that of V. Hence, for a display device with a fixed dynamic range, the displayed intensity is a "compressed" version of what a display of V would have looked like. To correct for this compression, the image can be preprocessed prior to displaying it by applying a gamma transformation of pixel values given by

g[n, m] = gmax ((f[n, m] − fmin)/(fmax − fmin))^γ,   (5.10)

where γ is an application-dependent constant with the same range as 1/b of Eq. (5.9), namely

1.8 ≤ γ ≤ 2.5.

One role of the gamma transformation is to correct for the power-law relationship given by Eq. (5.9), thereby generating an image display of the true signal f[n, m]. Here, f[n, m] is the true pixel value, and g[n, m] is the output of the preprocessing step, which makes it the input to the display device. That is, g[n, m] = V[n, m] and

I[n, m] = a V^b[n, m] = a g^b[n, m]
        = a (gmax ((f[n, m] − fmin)/(fmax − fmin))^γ)^b.   (5.11)
If γ of the preprocessor is chosen such that it is approximately equal to 1/b of the display device, then Eq. (5.11) simplifies to

I[n, m] = a′ (f[n, m] − fmin)/(fmax − fmin),   (5.12)

where a′ = a gmax^b.

Figure 5-4(a) is an image of the planet Saturn, displayed with no preprocessing. By comparison, the image in part (b) of the figure had been subjected to a preprocessing step using a gamma transformation with γ = 3 (the value that seemed to best enhance the image). The cloud bands are much more apparent in the transformed image than in the original.

[Figure 5-4 Image of Saturn (a) before and (b) after gamma transformation with γ = 3.]
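The transformation itself is one line of MATLAB. The sketch below is ours; it assumes the image has already been loaded into an array X (the variable name stored inside the book's saturn.mat file is an assumption):

    f = double(X);                 % image array, e.g., from load saturn.mat
    gamma = 3;  gmax = 1;          % gamma value used for Fig. 5-4(b)
    g = gmax*((f - min(f(:)))/(max(f(:)) - min(f(:)))).^gamma;  % Eq. (5.10)
    imagesc(g), colormap(gray)

With γ > 1 the mapping darkens midtones while stretching the bright end, which is what brings out Saturn's cloud bands; γ < 1 does the opposite (see Problem 5.1).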
Concept Question 5-1: Why do we not simply use pixel values directly as numerical measures of image intensities?

Concept Question 5-2: Why are logarithmic scales so useful in so many fields?

Exercise 5-1: A star of magnitude 6 is barely visible to the naked eye in a dark sky. How much fainter is a star of magnitude 6 than Vega, whose magnitude is 0?

Answer: From Eq. (5.7), 10^(6/2.512) = 244.6, so a sixth-magnitude star is 245 times fainter than Vega.

Exercise 5-2: What is another name for gamma transformation with γ = 1?

Answer: Linear transformation. Compare Eq. (5.10) and Eq. (5.1).

5-2 Unsharp Masking

5-2.1 Film Photography Version of Unsharp Masking

Unsharp masking is a contrast-enhancement procedure commonly used in both photographic and electronic displays. The procedure acts like a high-pass spatial filter that enhances high spatial-frequency (fast-varying) image components.

◮ Thus, unsharp masking enhances edges and fast-varying parts of the image, and is a standard tool available in Adobe® Photoshop®. ◭

The "unsharp" and "masking" parts of the name are associated with a technique in which a blurred (or unsharp) image is used to create a corrective "mask" for removing the blurriness from the image. The audio equivalent of unsharp masking is turning
up the treble.

In its original form, unsharp masking was developed for film photography to deal with blurring caused by the printing process, which involves the passage of the light image through a sheet of glass. If the original image is f(x, y) and the blurred printed version is fblur(x, y), the difference is called fmask(x, y),

fmask(x, y) = f(x, y) − fblur(x, y).   (5.13)

Image fmask(x, y) can be formed in a darkroom by adding a "negative" version of fblur(x, y) to f(x, y). The blurring process caused by the imperfect printing process is, in effect, a lowpass-filtering process. Hence, fblur(x, y) represents a lowpass-filtered version of f(x, y), and the "mask" image

fmask(x, y) = f(x, y) − fblur(x, y)

represents a highpass-filtered version of f(x, y). A highpass spatial filter emphasizes the presence of edges; hence the name "mask."

By photographically adding the mask image to the original image, we obtain a sharpened image fsh(x, y) in which high spatial-frequency components of f(x, y) are boosted relative to low spatial-frequency components:

fsh(x, y) = f(x, y) + fmask(x, y)
          = f(x, y) + [f(x, y) − fblur(x, y)].   (5.14)

In digital image processing, high spatial-frequency components can also be boosted by applying the discrete form of the Laplacian operator to f(x, y).

5-2.2 Laplacian Operator in Continuous Space

In continuous space, the Laplacian g(x, y) of a 2-D image f(x, y) is defined as

g(x, y) = ∇² f(x, y) = ∂²f/∂x² + ∂²f/∂y².   (5.15)

The spatial frequency response of the Laplacian can be obtained by computing the 2-D CSFT of g(x, y). Application of property #5 in Table 2-4 leads to

G(µ, ν) = −4π²(µ² + ν²) F(µ, ν).   (5.16a)

Conversion to polar coordinates (ρ, φ) in the spatial frequency domain leads to

G(ρ, φ) = −4π²ρ² F(ρ, φ).   (5.16b)

In the spatial frequency domain, G(µ, ν) is the product of the spatial frequency response of the Laplacian operator, HLaplace(µ, ν), and the spectrum F(µ, ν):

G(µ, ν) = HLaplace(µ, ν) F(µ, ν).   (5.17)

Hence,

HLaplace(µ, ν) = −4π²(µ² + ν²).   (5.18a)

Similarly, in polar coordinates,

HLaplace(ρ, φ) = −4π²ρ².   (5.18b)

It is evident from the definitions given by Eqs. (5.18a and b) that the Laplacian emphasizes high spatial-frequency components (proportional to ρ²) of the input image f(x, y). It is equally evident that all frequency components of HLaplace(µ, ν) and HLaplace(ρ, φ) have negative values.

5-2.3 Laplacian in Discrete Space

Derivatives in continuous space are approximated as differences in discrete space. Hence, in discrete space, the Laplacian g[n, m] of a 2-D image f[n, m] is defined as

g[n, m] = f[n + 1, m] + f[n − 1, m] + f[n, m + 1] + f[n, m − 1] − 4 f[n, m].   (5.19)

This operation is equivalent to the convolution

g[n, m] = f[n, m] ∗∗ hLaplace[n, m],   (5.20)

where hLaplace[n, m] is the point-spread-function (PSF) of the Laplacian operator, and is given by

hLaplace[n, m] = [ 0  1  0
                   1 −4  1
                   0  1  0 ].   (5.21)
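The discrete Laplacian is a single conv2 call in MATLAB. A minimal sketch follows, with a synthetic test image of our own, chosen so the result is easy to predict:

    f = zeros(64);  f(17:48,17:48) = 1;   % bright square on a dark background
    hLap = [0 1 0; 1 -4 1; 0 1 0];        % Eq. (5.21)
    g = conv2(f, hLap, 'valid');          % Eqs. (5.19)-(5.20)
    imagesc(abs(g)), colormap(gray)       % nonzero only along the square's edges

Because hLap sums to zero, g[n, m] vanishes wherever f[n, m] is locally constant, which is why only the edges of the square survive (compare Exercise 5-3).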
The DSFT F(Ω1, Ω2) of image f[n, m] was defined by Eq. (3.73a) as

F(Ω1, Ω2) = Σₙ₌₋∞^∞ Σₘ₌₋∞^∞ f[n, m] e^(−j(Ω1 n + Ω2 m)),   (5.22)

and the properties of the DSFT are direct 2-D generalizations of the properties of the 1-D DTFT given in Table 2-7. Of particular interest is the time-shift property in Table 2-7, which when
extended to 2-D, leads to

f[n − n0, m − m0]  ↔  F(Ω1, Ω2) e^(−j(n0 Ω1 + m0 Ω2)).   (5.23)

Application of Eq. (5.23) to Eq. (5.19) leads to

G(Ω1, Ω2) = F(Ω1, Ω2) e^(−jΩ1) + F(Ω1, Ω2) e^(jΩ1) + F(Ω1, Ω2) e^(−jΩ2) + F(Ω1, Ω2) e^(jΩ2) − 4F(Ω1, Ω2)
          = F(Ω1, Ω2) [(e^(−jΩ1) + e^(jΩ1)) + (e^(−jΩ2) + e^(jΩ2)) − 4]
          = F(Ω1, Ω2) [2 cos Ω1 + 2 cos Ω2 − 4].   (5.24)

The spectrum G(Ω1, Ω2) is the product of the spectrum of the original image, F(Ω1, Ω2), and the Laplacian's spatial frequency response HLaplace(Ω1, Ω2):

G(Ω1, Ω2) = F(Ω1, Ω2) HLaplace(Ω1, Ω2).   (5.25)

Equating Eqs. (5.24) and (5.25) leads to

HLaplace(Ω1, Ω2) = 2[cos Ω1 + cos Ω2 − 2].   (5.26)

In Fig. 5-5, we display a plot of |HLaplace(Ω1, Ω2)| as a function of Ω1, with (for simplicity) Ω2 set equal to zero. The plot would look the same as a function of the radial discrete-space frequency

R = [Ω1² + Ω2²]^(1/2).

[Figure 5-5 |HLaplace(Ω1, Ω2)| versus Ω1 at Ω2 = 0: the exact expression of Eq. (5.26) (in red) and the approximation of Eq. (5.27) for small values of Ω1 and Ω2 (in blue).]

The frequency response |HLaplace(Ω1, Ω2)| exhibits a shape similar to that of a high-frequency filter, with |HLaplace(Ω1, Ω2)| = 0 at the origin and then increasing rapidly with increasing R. In fact, in close proximity to the origin, such that |Ω1|, |Ω2| ≪ 1, expansion of the cosine functions in Eq. (5.26) in a Taylor series gives

cos Ω1 ≈ 1 − Ω1²/2 + ···,
cos Ω2 ≈ 1 − Ω2²/2 + ···,

which, when used in Eq. (5.26), simplify it to

HLaplace(Ω1, Ω2) ≈ −Ω1² − Ω2² = −R²,  for R ≪ 1.   (5.27)

This frequency dependence of the discrete-space Laplacian is analogous to the response given by Eq. (5.18b) for the frequency response of the continuous-space Laplacian; both have negative signs and both vary as the square of the spatial frequency (ρ and R).

The blue plot in Fig. 5-5 represents the approximate expression for |HLaplace(Ω1, Ω2)| given by Eq. (5.27). It confirms that the approximation is valid not only for |Ω1|, |Ω2| ≪ 1, but also up to |Ω1|, |Ω2| ≈ 1.

In Fig. 5-5, the plots for the exact and approximate expressions of |HLaplace(Ω1, 0)| are displayed over the range −π < Ω1 < π. They are in close agreement over approximately the central one-third of the spectral range, and they deviate significantly as |Ω1| exceeds 1 (or |Ω2| exceeds 1), or more generally, as the radial frequency R exceeds 1. In most images, the bulk of the image "energy" is contained within this central region.

This last statement deserves further elaboration. To do so, we refer the reader to Eq. (2.64), which relates the frequency Ω0 in discrete time to the frequency f0 in continuous time, namely

Ω0 = 2π f0 ∆,   (5.28)

where ∆ is the sampling interval in seconds. Extending the relationship to 2-D provides the connections

Ω1 = 2πµ∆   (5.29a)

and

Ω2 = 2πν∆,   (5.29b)

where now Ω1 and Ω2 are continuous spatial frequencies associated with the discrete image f[n, m], µ and ν are spatial frequencies (in cycles/m) associated with the continuous image f(x, y), and ∆ is the sampling length in meters/sample. Since 2π
represents the amount in radians per a single complete cycle, the units of Ω1 and Ω2 are radians/sample.

If an image f(x, y) is sampled at the Nyquist rate such that ∆ = 1/2B, where B is the maximum spatial frequency of the image spectrum F(µ, ν), then the maximum discrete-space frequency is

Ω1(max) = 2π × B × (1/2B) = π,   (5.30)

and the same conclusion applies to Ω2 and R. This is why the plots in Fig. 5-5 extend over the range −π ≤ Ω1 ≤ π.

Most images are sampled at rates greater than the Nyquist rate. If an image is sampled at three times the Nyquist rate (i.e., at ∆ = 1/6B), then Ω1(max) = π/3 ≈ 1. In such a case, the approximation given by Eq. (5.27) becomes valid over the complete relevant ranges of Ω1, Ω2, and R.

5-2.4 Image Sharpening

An image f[n, m] can be sharpened into image gsharp[n, m] by subtracting g[n, m] of Eq. (5.20) from the original image:

gsharp[n, m] = f[n, m] − f[n, m] ∗∗ hLaplace[n, m]
             = f[n, m] ∗∗ hsharp[n, m],   (5.31)

where hsharp[n, m] is an image sharpening filter with PSF

hsharp[n, m] = δ[n] δ[m] − hLaplace[n, m].   (5.32)

This operation is analogous to Eq. (5.14) for film photography, except that in the present case we used a minus sign (rather than a plus sign) in the first step of Eq. (5.31) because

f[n, m] ∗∗ hLaplace[n, m]  ↔  F(Ω1, Ω2) HLaplace(Ω1, Ω2),   (5.33)

and HLaplace(Ω1, Ω2) is always negative.

Use of Eq. (5.21) in Eq. (5.32) leads to

hsharp[n, m] = [ 0 −1  0
                −1  5 −1
                 0 −1  0 ].   (5.34)

Since hsharp[n, m] is only 3 × 3, it is faster to compute the 2-D convolution given by Eq. (5.31) in the spatial [n, m] domain than by multiplying zero-padded 2-D DFTs.

In a later part of this section, we will compare sharpened images to their original versions, but we should note that:

◮ A common detractor of all image sharpening algorithms is that because they emphasize high spatial frequencies, they also tend to emphasize high spatial-frequency noise. So in general, sharpened images tend to be noisy. ◭

5-2.5 Valid Convolution

When convolving an image f[n, m] with a filter characterized by a PSF h[n, m], it is important that edge effects are dealt with appropriately. If the image size is (M × M) and the filter size is (L × L), the convolved image

y[n, m] = f[n, m] ∗∗ h[n, m]   (5.35)

is (N × N), with N = M + L − 1. However, some parts of y[n, m] are not "valid" because they include pixels that are a result of convolving h[n, m] with pixels outside of the boundaries of f[n, m]. The "valid" part of y[n, m], which we call yvalid[n, m], is given by

yvalid[n, m] = { y[n, m], 0 ≤ n, m ≤ Nvalid },   (5.36)

where Nvalid = M − L + 1. To illustrate with an example, let us consider the image given by

f[n, m] = [  4   8  12
            16  20  24
            28  32  36 ],   (5.37a)

and let us assume that we wish to perform local averaging by sliding a 2 × 2 window across the image, both horizontally and vertically. Such a filter has a PSF given by

h[n, m] = (1/4) [ 1  1
                  1  1 ].   (5.37b)

Upon performing the convolution given by Eq. (5.35) onto the arrays given in Eqs. (5.37a and b), we obtain

y[n, m] = [  1   3   5   3
             5  12  16   9
            11  24  28  15
             7  15  17   9 ].   (5.38)

The border rows and columns are not the average values of 4 neighboring pixels, but of only 1 neighboring pixel and 3 zeros or 2 neighboring pixels and 2 zeros. These are invalid entries. The more realistic valid output is yvalid[n, m], obtained via Eq. (5.36) or, equivalently, by removing the top and bottom
rows and the columns at the far left and far right. Either approach leads to

yvalid[n, m] = [ 12  16
                 24  28 ].   (5.39)

Since f[n, m] is M × M = 3 × 3 and h[n, m] is L × L = 2 × 2, yvalid[n, m] is Nvalid × Nvalid with

Nvalid = M − L + 1 = 3 − 2 + 1 = 2.

We now use yvalid[n, m] as the filtered image in the following two examples (a short MATLAB sketch of the computation follows the list):

(1) Image of an electronic circuit, before and after applying the sharpening algorithm: Fig. 5-6.

(2) Image of US coins, before and after sharpening: Fig. 5-7.

[Figure 5-6 Image of electronic circuit (a) before and (b) after application of the sharpening filter.]
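Both the sharpening filter and the valid convolution are available through conv2 with the 'valid' option. The sketch below is ours, with a synthetic stand-in for the circuit image:

    % valid-convolution check of Eqs. (5.37)-(5.39)
    fex = [4 8 12; 16 20 24; 28 32 36];
    h   = ones(2)/4;
    yvalid = conv2(fex, h, 'valid')       % returns [12 16; 24 28], Eq. (5.39)

    % sharpening with the PSF of Eq. (5.34), valid part only
    f = double(rand(128) > 0.5);          % synthetic test image
    hsharp = [0 -1 0; -1 5 -1; 0 -1 0];   % Eq. (5.34)
    gsharp = conv2(f, hsharp, 'valid');   % Eq. (5.31)

Note that conv2 flips the PSF before sliding it, but h and hsharp are symmetric, so the flip has no effect here.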

Concept Question 5-3: Why does image sharpening not work well on noisy images?

Concept Question 5-4: Why is the Laplacian a common choice for a high-pass filter?

Exercise 5-3: Show that the valid convolution of a Laplacian with a constant image f[n, m] = c is zero.

Answer: The valid convolution (defined in Section 5-2.5) is the usual convolution with edge effects omitted. Each valid output pixel is c times the sum of the entries of the Laplacian PSF:

f[n, m] ∗∗ hLaplace[n, m] = c Σₙ₌₋₁¹ Σₘ₌₋₁¹ hLaplace[n, m] = c (1 + 1 + 1 + 1 − 4) = 0.

5-3 Histogram Equalization

Consider the two clown images shown in parts (a) and (b) of Fig. 5-8. The first one, labeled "original image," is rather dark, making it difficult to discern some of the features in the clown's face. In contrast, the image labeled "histogram-equalized" exhibits a broader range of intensities, thereby allowing the viewer to see details that are difficult to see in the original image. What is a histogram-equalized image? That is the topic of the present section.

Given an image f[n, m] with pixel values that extend over the input dynamic range Ri, with

Ri = { fmin ≤ f[n, m] ≤ fmax },   (5.40a)
the objective of histogram equalization is to convert the pixel values f[n, m] into a new set g[n, m] so that they become more evenly distributed across the dynamic range of the display device Rd, with

Rd = { 0 ≤ g[n, m] ≤ gmax }.   (5.40b)

[Figure 5-7 Coins image, (a) before and (b) after sharpening.]

In Section 5-1, the conversion from f[n, m] to g[n, m] was accomplished by "stretching" Ri to fit Rd, but now we explore a different approach that relies on converting the histogram of f[n, m] into a new histogram associated with the converted image g[n, m]. The histogram of an image (or signal) is simply a bar graph of the number of times that each pixel value occurs in the image. The histograms associated with the original and histogram-equalized images shown in Figs. 5-8(a) and (b) are displayed in Figs. 5-8(c) and (d). In both histograms, the horizontal axis represents pixel value and the vertical axis represents the number of times that a specific pixel value occurs in the image. For image f[n, m], its continuous original f(x, y) had been sampled to 200 × 200 discrete locations [n, m], and its non-negative values had been quantized using 8-bit precision, which means that the pixel value f0 can take on any integer value between 0 and 255:

f[n, m] ∈ { 0, 1, 2, . . ., 255 },  for { 0 ≤ n, m ≤ 199 }.

The horizontal axis in a histogram of f[n, m] represents the pixel value f0, which may range between 0 and 255, and the vertical axis is p_f[f0], which is the number of times that pixel value f0 occurs in the image. We refer to p_f[f0] as the distribution function, or histogram, of f[n, m].

A related quantity is the cumulative distribution function (CDF), P_f[f0], which is the cumulative sum of p_f[f0]:

P_f[f0] = Σ_{f′=0}^{f0} p_f[f′].   (5.41)

The CDF P_f[f0] is a non-decreasing function that jumps upward in value at the values of f0 for which p_f[f0] ≠ 0. Figure 5-9(a) displays the histogram of the original clown image, p_f[f0], and a plot of its associated CDF, P_f[f0]. We observe that P_f[f0] increases rapidly with f0 between f0 = 0 and f0 = 100 and at a lower rate at higher values of f0. Thus, the majority of the image pixels are concentrated in the range corresponding to low image intensity, thereby giving the image its dark appearance.

The pixel-value transformation converts each pixel value f[n, m] into a new pixel value g[n, m], based on the CDF P_f[f0],
evaluated at f0 = f[n, m]. That is,

g[n, m] = P_f[f0] |_(f0 = f[n, m]).   (5.42)

Such a transformation leads to a histogram p_g[g0] that is more uniformly spread out over the range 0 to 255 than the histogram of the original image, p_f[f0]. The associated CDF, P_g[g0], approximates a straight line that starts at coordinates (0, 0) and concludes at (255, M²), where M² is the total number of pixels. These attributes are evident in Fig. 5-10 for the histogram-equalized clown image.

[Figure 5-8 Clown image, (a) before and (b) after application of histogram equalization, and (c), (d) the associated histograms. Histogram p_g(g0) was generated by nonlinearly transforming (redistributing) p_f(f0), but the total number of pixels at each new value (the g0 axis is a nonlinear transformation of the f0 axis) remains the same.]
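A compact MATLAB sketch of Eqs. (5.41)-(5.42) for an 8-bit image follows. It is our own minimal version; the book's companion program hist.m may differ in detail:

    f  = uint8(60*randn(200) + 70);     % synthetic dark test image, values 0..255
    pf = histc(double(f(:)), 0:255);    % histogram p_f[f0]
    Pf = cumsum(pf);                    % CDF P_f[f0], Eq. (5.41)
    g  = Pf(double(f) + 1);             % Eq. (5.42), indexed at f0 = f[n,m]
    g  = g/max(g(:));                   % rescale to [0, 1] for display
    imagesc(g), colormap(gray)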
[Figure 5-9 (a) Histogram p_f[f0] and (b) associated CDF P_f[f0] of the original clown image f[n, m]. Image size is (M × M).]

[Figure 5-10 CDFs P_f[f0] and P_g[g0] for an (M × M) image before and after application of the histogram-equalization algorithm given by Eq. (5.42).]

Concept Question 5-5: What is the purpose for using histogram equalization?

Concept Question 5-6: Why is histogram equalization often described as a nonlinear warping transformation of pixel values?

Exercise 5-4: Perform histogram equalization on the (2 × 2) "image"

[ 0.3  0.2
  0.2  0.3 ].

Answer: The histogram of the image is: 0.2 occurring twice and 0.3 occurring twice. The CDF is

P_f(f0) = 0 for 0 ≤ f0 < 0.2,
          2 for 0.2 ≤ f0 < 0.3,
          4 for 0.3 ≤ f0 ≤ 1.

Pixel value 0.2 is mapped to P_f(0.2) = 2 and pixel value 0.3 is mapped to P_f(0.3) = 4. The histogram-equalized "image" is

[ 4  2
  2  4 ],

which has a wider range of values than the original "image."
5-4 Edge Detection

An edge in an image is a sharp boundary between two different regions of an image. Here "sharp" means a width of at most a few pixels, and a "boundary" means that significant differences exist in the pixel values between the two sides of the edge. "Significant" is not clearly defined; its definition depends on the characteristics of the image and the reason why edges are of interest. This is a nebulous definition, but there is no uniform definition for an edge.

The goal of edge detection is to determine the locations [n, m] of edges in an image f[n, m]. Edge detection is used to segment an image into different regions, or to determine the boundaries of a region of interest. For example, a medical image may consist of different human organs. Interpretation of the image is easier if (say) the region of the image corresponding to the pancreas is identified separately from the rest of the image. Identification of a face is easier if the eyes in an image of the face are identified as a region separate from the rest of the image. Ideally, an edge is a contour that encloses a region of the image whose values differ significantly from the values around it. Edge detection also is important in computer vision.

5-4.1 1-D Edge Detection

We start by examining a simple 1-D edge-detection method, as it forms the basis for a commonly used 2-D edge-detection algorithm.

An obvious approach to detecting the locations of sharp changes in a 1-D signal x(t) is to compute its derivative x′(t) = dx/dt. Rapid changes of x(t) with t generate derivatives with large magnitudes, and slow changes generate derivatives with small magnitudes. The times t0 at which |x′(t0)| is large represent potential edges of x(t). The threshold for "large" has to be defined in the context of the signal x(t) itself.

For a 1-D discrete-time signal x[n], the discrete-time counterpart to the derivative is the difference operator

d[n] = x[n + 1] − x[n].   (5.43)

The difference d[n] is large when x[n] changes rapidly with n, making it possible to easily pinpoint the time n0 of an edge. As simple as it is, computing the difference d[n] and thresholding |d[n]| is a very effective method for detecting 1-D edges. If the threshold is set at a value ∆, the edge-detection algorithm can be cast as

z[n] = 1 for |d[n]| > ∆,
       0 for |d[n]| < ∆.   (5.44)

The times ni at which z[ni] = 1 denote the edges of x[n]. Specification of the threshold level ∆ depends on the character of x[n]. In practice, for a particular class of signals, the algorithm is tested for several values of ∆ so as to determine the value that provides the best results for the intended application.

For the signal x[n] displayed in Fig. 5-11(a), the difference operator d[n] was computed using Eq. (5.43) and then plotted in Fig. 5-11(b). It is evident that |d[n]| exhibits significant values at n = 5, 14, and 21. Setting ∆ = 1 would detect all three edges, as shown in part (c) of the figure, but had we chosen ∆ to be 2, for example, only the edge at n = 14 would have been detected. The choice depends on the intended application.

[Figure 5-11 Edge detection by thresholding absolute values of differences: (a) original signal x[n], (b) d[n] = x[n + 1] − x[n], (c) z[n] with ∆ = 1.]
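In MATLAB the entire 1-D algorithm is three lines. The test signal below is our own synthetic example, not the signal plotted in Fig. 5-11:

    x = [zeros(1,5) 3*ones(1,9) ones(1,7) 4*ones(1,10)];  % synthetic x[n]
    d = diff(x);                  % d[n] = x[n+1] - x[n], Eq. (5.43)
    z = abs(d) > 1;               % Eq. (5.44) with threshold Delta = 1
    find(z)                       % indices n at which edges are declared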
5-4.2 2-D Edge Detection

The 1-D edge detection method can be extended to edge detection in 2-D images. Let us define a vertical edge (VE) as a vertical line at n = n0, extending from m = m1 to m = m2, as
shown in Fig. 5-12. That is,

VE = { [n, m]: n = n0; m1 ≤ m ≤ m2 }.   (5.45)

The total length of VE is (m2 − m1 + 1).

[Figure 5-12 Vertical edge VE at n = n0 extends from m = m1 to m = m2, and its length is (m2 − m1 + 1).]

One way to detect a vertical edge is to apply the difference operator given by Eq. (5.43) to each row of the image. In 2-D, the difference operator for row m is given by

d[n, m] = f[n + 1, m] − f[n, m].   (5.46)

If d[n, m] satisfies a specified threshold for a group of continuous pixels (all at n = n0) extending between m = m1 and m2, then we call the group a vertical edge. In real images, we may encounter situations where d[n, m] may exhibit a large magnitude, but it is associated with a local variation in tone, not an edge. A vertical edge at n = n0 requires that not only d[n0, m] at row m be large, but also that d[n0, m + 1] at the row above m and d[n0, m − 1] at the row below row m be large as well. All three differences should be large and of the same polarity in order for the three pixels to qualify as a vertical edge.

This requirement suggests that a vertical edge detector should not only compute horizontal differences, but also vertical sums of the differences. The magnitude of a vertical sum becomes an indicator of the presence of a true vertical edge. A relatively simple edge operator is illustrated in Fig. 5-13 for both a horizontal-direction vertical-edge detector dH[n, m] and a vertical-direction horizontal-edge detector dV[n, m]. Each detector consists of a 3 × 3 window centered at the pixel of interest. Detector dH[n, m] computes the difference between the values of pixel [n + 1, m] and pixel [n − 1, m], whose positions are to the right and left of pixel [n, m], respectively. Similar differences are performed for the row above and the row below row m. Then, the three differences are added up together, with the middle difference assigned twice the weight of the two other differences. The net result is

dH[n, m] = f[n + 1, m + 1] − f[n − 1, m + 1]
         + 2 f[n + 1, m] − 2 f[n − 1, m]
         + f[n + 1, m − 1] − f[n − 1, m − 1].   (5.47)

The coefficients of the six terms of Eq. (5.47) are the nonzero weights shown in Fig. 5-13(a).

[Figure 5-13 Point spread functions dH[n, m] (in red, weights [−1 0 1; −2 0 2; −1 0 1]) and dV[n, m] (in blue, weights [−1 −2 −1; 0 0 0; 1 2 1]), displayed in center-of-image format.]
Computing dH[n, m] for every pixel is equivalent to validly convolving (see Section 5-2.5) image f[n, m] with the window's point spread function hH[n, m] along the horizontal direction. That is,

dH[n, m] = f[n, m] ∗∗ hH[n, m],   (5.48)

with

hH[n, m] = [ −1  0  1
             −2  0  2
             −1  0  1 ].   (5.49)

To compute dH[n, m] for all pixels [n, m] in the image, it is necessary to add an extra row above the image, identical to its top row, and a similar add-on is needed at the bottom end of the image. The decision as to whether or not a given pixel is part of a vertical edge is made by comparing the magnitude of dH[n, m] with a predefined gradient threshold ∆ whose value is selected heuristically (based on practical experience for the class of images under consideration).

Horizontal edges can be detected by a vertical-direction edge detector dV[n, m] given by

dV[n, m] = f[n, m] ∗∗ hV[n, m],   (5.50)

where hV[n, m] is the corresponding point spread function (PSF). By exchanging the roles of the rows and columns in Eq. (5.49), we have

hV[n, m] = [ −1 −2 −1
              0  0  0
              1  2  1 ].   (5.51)

Of course, most edges are neither purely horizontal nor purely vertical, so an edge at an angle different from 0° or 90° (with 0° denoting the horizontal dimension of the image) should have edge components along both the horizontal and vertical directions. Hence, the following edge-detection gradient is often used:

g[n, m] = [dH²[n, m] + dV²[n, m]]^(1/2).   (5.52)

For each pixel [n, m], we define the edge indicator z[n, m] as

z[n, m] = 1 if g[n, m] > ∆,
          0 if g[n, m] < ∆,   (5.53)

where, again, ∆ is a prescribed gradient threshold. In the image, pixels for which z[n, m] = 1 are shown in white, and those with z[n, m] = 0 are shown in black. Usually, the value of ∆ is selected empirically by examining a histogram of g[n, m] or through repeated trials.
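A direct MATLAB transcription of Eqs. (5.48)-(5.53) is given below as a sketch, using a synthetic test image of our own:

    f  = zeros(100);  f(30:70,30:70) = 255;   % synthetic test image
    hH = [-1 0 1; -2 0 2; -1 0 1];            % Eq. (5.49)
    hV = [-1 -2 -1; 0 0 0; 1 2 1];            % Eq. (5.51)
    dH = conv2(f, hH, 'valid');               % Eq. (5.48)
    dV = conv2(f, hV, 'valid');               % Eq. (5.50)
    g  = sqrt(dH.^2 + dV.^2);                 % Eq. (5.52)
    z  = g > 200;                             % Eq. (5.53) with Delta = 200
    imagesc(z), colormap(gray)

(conv2 flips each PSF before sliding it, which only reverses the signs of dH and dV; the squaring in Eq. (5.52) makes the flip irrelevant.)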
5-4.3 Sobel Edge Detector Examples

The gradient algorithm given by Eq. (5.53) is known as the Sobel edge detector, named after Irwin Sobel, who developed it in 1968, when computer-based image processing was in its infancy and only simple algorithms could be used. Application of the Sobel edge detector to the letters image in part (a) of Fig. 5-14 leads to the image in part (b). Through repeated applications using different values of ∆, it was determined that ∆ = 200 provided an image with clear edges, including diagonal and curved edges. The value specified for ∆ depends in part on the values assigned to black and white tones in the image.

[Figure 5-14 Application of the Sobel edge detector to the letters image in (a) with ∆ = 200 led to the edge-detected image in (b).]

The Sobel edge detector does not always capture all of the major edges contained in an image. When applied to the clown image of Fig. 5-15(a), the edge detector identified some parts of continuous edges, but failed to identify others, which suggests the need for a detector that can track edges and complete edge contours as needed. Such a capability is provided by the Canny edge detector, the subject of the next subsection.

[Figure 5-15 Application of the Sobel edge detector to the clown image in (a) captures some of the edges in the image, but also misses others.]

5-4.4 Canny Edge Detector

The Canny edge detector is a commonly used algorithm that extends the capabilities of the Sobel detector by applying preprocessing and postprocessing steps. The preprocessing step involves the use of a 2-D Gaussian PSF to reduce image noise and to filter out isolated image features that are not edges. After computing the Sobel operator given by Eq. (5.52), the Canny algorithm performs an edge thinning step, separating detected edges into different candidate categories, and then applies certain criteria to decide whether or not the candidate edges should be connected together. The five steps of the Canny detection algorithm are:

Step 1: Image f[n, m] is blurred (filtered) by convolving it with a truncated Gaussian point spread function. An example of a practical function that can perform the desired operation is the
5 × 5 PSF given by

hG[n, m] = (1/159) [ 2   4   5   4   2
                     4   9  12   9   4
                     5  12  15  12   5
                     4   9  12   9   4
                     2   4   5   4   2 ].   (5.54)

The standard deviation of the truncated Gaussian function is 1.4. Application of hG[n, m] to image f[n, m] generates a filtered image f1[n, m] given by

f1[n, m] = hG[n, m] ∗∗ f[n, m].   (5.55)

Step 2: For image f1[n, m], compute the horizontal and vertical edge detectors given by Eqs. (5.48) and (5.50).
Step 3: Compute the gradient magnitude and orientation:

g[n, m] = [dH²[n, m] + dV²[n, m]]^(1/2)   (5.56a)

and

θ[n, m] = tan⁻¹(dV[n, m]/dH[n, m]).   (5.56b)

For a vertical edge, dV[n, m] = 0, and therefore θ[n, m] = 0. Similarly, for a horizontal edge, dH[n, m] = 0 and θ = 90°.

Step 4: At each pixel [n, m], round θ[n, m] to the nearest of { 0°, 45°, 90°, 135° }. Next, determine whether to keep the value of g[n, m] of pixel [n, m] as is or to replace it with zero. The decision logic is as follows:

(a) For a pixel [n, m] with θ[n, m] = 0°, compare the value of g[n, m] to the values of g[n + 1, m] and g[n − 1, m], corresponding to the pixels at the immediate right and left of pixel [n, m]. If g[n, m] is the largest of the three gradients, keep its value as is; otherwise, set it to zero.

(b) For a pixel [n, m] with θ = 45°, compare the value of g[n, m] to the values of g[n − 1, m + 1] and g[n + 1, m − 1], corresponding to the pixel neighbors along the 45° diagonal. If g[n, m] is the largest of the three gradients, keep its value as is; otherwise, set it to zero.

(c) For a pixel [n, m] with θ = 90°, compare the value of g[n, m] to the values of g[n, m − 1] and g[n, m + 1], corresponding to pixels immediately above and below pixel [n, m]. If g[n, m] is the largest of the three gradients, keep its value as is; otherwise, set it to zero.

(d) For a pixel [n, m] with θ = 135°, compare the value of g[n, m] to the values of g[n − 1, m − 1] and g[n + 1, m + 1]. If g[n, m] is the largest of the three gradients, keep its value as is; otherwise, set it to zero.

The foregoing operation is called edge thinning, as it avoids making an edge wider than necessary in order to indicate its presence.

Step 5: Replace the edge indicator algorithm given by Eq. (5.53) with a double-threshold algorithm given by

z[n, m] = 2 if g[n, m] > ∆2,
          1 if ∆1 < g[n, m] < ∆2,
          0 if g[n, m] < ∆1.   (5.57)

The edge indicator z[n, m] may assume one of three values, indicating the presence of an edge (z[n, m] = 2), the possible presence of an edge (z[n, m] = 1), and the absence of an edge (z[n, m] = 0). The middle category requires resolution into one of the other two categories. This is accomplished by converting pixel [n, m] with z[n, m] = 1 into an edge if any one of its nearest 8 neighbors is a confirmed edge. That is, pixel [n, m] is an edge location only if it adjoins another edge location.

The values assigned to thresholds ∆1 and ∆2 are selected through multiple trials.

[Figure 5-16 The Canny edge detector applied to the clown image in (a) provides better edge-detection performance in (b) than the Sobel detector in Fig. 5-15.]
For example, the clown edge-image shown in Fig. 5-16 was obtained by applying the Canny algorithm to the clown image with ∆1 = 0.05 and ∆2 = 0.125. This particular combination provides an edge-image that successfully captures the contours that segment the clown image.

◮ Edge detection can be implemented in MATLAB's Image Processing Toolbox using the commands

E=edge(X,'sobel',T1) for Sobel and
E=edge(X,'canny',T1,T2) for Canny.

The image is stored in array X, the edge image is stored in array E, and T1 and T2 are the thresholds. MATLAB assigns default values to the thresholds, computed from the image, if they are not specified. ◭
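As a usage sketch (ours, with the thresholds quoted above for Fig. 5-16; note that recent Image Processing Toolbox releases accept the two Canny thresholds as a two-element vector, as written below):

    X  = double(imread('cameraman.tif'));   % any grayscale image array
    Es = edge(X, 'sobel');                  % default threshold chosen by MATLAB
    Ec = edge(X, 'canny', [0.05 0.125]);    % Delta1 = 0.05, Delta2 = 0.125
    subplot(1,2,1), imagesc(Es), colormap(gray)
    subplot(1,2,2), imagesc(Ec), colormap(gray)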
Concept Question 5-7: Why does edge detection not work well on noisy images?

Concept Question 5-8: In edge detection, why do we not simply take differences, as in Eq. (5.46), instead of taking differences in one direction and sums in the other, as in Eq. (5.47)?

Exercise 5-5: Show that edge detection applied to a constant image f[n, m] = c gives no edges.

Answer: The valid convolution (defined in Section 5-2.5) is the usual convolution with edge effects omitted. The two valid convolutions (hH[n, m] and hV[n, m] are defined in Eq. (5.49) and Eq. (5.51)) are each c times the sum of the PSF entries:

f[n, m] ∗∗ hH[n, m] = c Σₙ₌₋₁¹ Σₘ₌₋₁¹ hH[n, m] = 0,
f[n, m] ∗∗ hV[n, m] = c Σₙ₌₋₁¹ Σₘ₌₋₁¹ hV[n, m] = 0.

Hence, the edge-detection gradient g[n, m] defined in Eq. (5.52) is zero.

5-5 Summary of Image Enhancement Techniques

• To increase contrast in an image by altering the range of pixel values, use a gamma transformation. Simply try different values of γ and choose the one that gives the best visual result.

• To compress the range of values, e.g., in a spectrum, use a log transformation.

• To sharpen an image, use unsharp masking, bearing in mind that this also makes the image noisier.

• To make an image brighter, use histogram equalization.

• To detect edges, or create an image of edges, use Canny edge detection (if available); otherwise use Sobel edge detection.
Summary

Concepts

• Image enhancement transforms a given image into another image in which image features such as edges or contrast have been enhanced to make them more apparent.

• Linear, logarithmic, and gamma transformations alter the range of pixel values so that they fit the range of the display.

• Unsharp masking and Laplacians sharpen an image, but increase noise.

• Histogram equalization nonlinearly alters pixel values to brighten images.

• Edge detection produces an image consisting entirely of edges of the image.

Mathematical Formulae

Linear transformation:
g[n, m] = gmax (f[n, m] − fmin)/(fmax − fmin)

Logarithmic transformation:
g[n, m] = a log10(f[n, m] + b)

Gamma transformation:
g[n, m] = gmax ((f[n, m] − fmin)/(fmax − fmin))^γ

Unsharp masking (film version):
g(x, y) = f(x, y) + [f(x, y) − fblur(x, y)], where the bracketed term is fmask(x, y)

Unsharp masking (discrete version):
g[n, m] = f[n, m] − f[n, m] ∗∗ hLaplace[n, m]

Laplacian (continuous space):
g(x, y) = ∇² f(x, y) = ∂²f/∂x² + ∂²f/∂y²

Laplacian (discrete space):
g[n, m] = f[n + 1, m] + f[n − 1, m] + f[n, m + 1] + f[n, m − 1] − 4 f[n, m]

Cumulative distribution:
P_f[f0] = Σ_{f′=0}^{f0} p_f[f′]

Histogram equalization:
g[n, m] = P_f[f0] |_(f0 = f[n, m])

Horizontal and vertical edge detectors:
dH[n, m] = f[n, m] ∗∗ [−1 0 1; −2 0 2; −1 0 1]
dV[n, m] = f[n, m] ∗∗ [−1 −2 −1; 0 0 0; 1 2 1]

Sobel edge detector:
z[n, m] = 1 if [dH²[n, m] + dV²[n, m]]^(1/2) > ∆,
          0 if [dH²[n, m] + dV²[n, m]]^(1/2) < ∆

Important Terms: Provide definitions or explain the meaning of the following terms: Canny edge detector, gamma transformation, histogram equalization, Laplacian, logarithmic transformation, Sobel edge detector, unsharp masking.
PROBLEMS

Section 5-1: Pixel Value Transformations

5.1 Explain why, in the gamma transformation (Eq. (5.10)), γ > 1 tends to darken images, while γ < 1 tends to lighten images.

5.2 Use gamma transformation with γ = 3 to darken the image in coins1.mat.

5.3 Use gamma transformation with γ = 3 to darken the image in coins2.mat.

Section 5-2: Unsharp Masking

5.4 Use the sharpening filter Eq. (5.32) to sharpen the two images in the files (a) plane.mat and (b) coins2.mat.

5.5 Use the sharpening filter Eq. (5.32) to sharpen the two images in the files (a) quarter.mat and (b) rice.mat.

5.6 Use the sharpening filter Eq. (5.32) to sharpen the two images in the files (a) moon.mat and (b) unsharp.mat.

5.7 Unsharp masking was originally based on Eq. (5.14), which in discrete space is

g[n, m] = f[n, m] + (f[n, m] − fblur[n, m]).

fblur[n, m] is a lowpass version of f[n, m]. If fblur[n, m] is the average of { f[n + 1, m], f[n − 1, m], f[n, m + 1], f[n, m − 1] }, show that

g[n, m] = f[n, m] − (1/4) f[n, m] ∗∗ hLaplacian[n, m],

similar to Eq. (5.32).

5.8 Unsharp masking was originally based on Eq. (5.14), which is

g(x, y) = f(x, y) + (f(x, y) − fblur(x, y)).

fblur(x, y) is a lowpass version of f(x, y). Adobe® Photoshop® uses the following form of unsharp masking:

fblur(x, y) = f(x, y) ∗∗ fg(√(x² + y²)),

where

fg(√(x² + y²)) = e^(−(x² + y²)/(2σ²)) / (2πσ²).

Plot a cross-section (with ν = 0) of the spatial frequency response of this form of unsharp masking with that of the continuous-space version of Eq. (5.32), which is

hsharpen(x, y) = δ(x) δ(y) − ∇² f(x, y).

Hint: Use the result of Problem 3.4, which is Fg(ρ) = e^(−2σ²π²ρ²). Use σ² = 2.

5.9 Use unsharp masking as defined in Problem 5.8 to sharpen the two images in the files (a) circuit.mat and (b) quarter.mat. Use unsharp.m.

5.10 Use unsharp masking as defined in Problem 5.8 to sharpen the two images in the files (a) tire.mat and (b) coins2.mat. Use unsharp.m.

Section 5-3: Histogram Equalization

5.11 This problem applies histogram equalization to a tiny (3 × 3) image. The goal is for the reader to work the problem entirely by hand, thereby aiding understanding. The (3 × 3) image is

f[n, m] = [ 1  2  1
            2  3  9
            3  2  9 ].

(a) Plot the histogram of the image.
(b) List its distribution and CDF in a table.
(c) List values of f[n, m] and values of the histogram-equalized image g[n, m] in a table.
(d) Depict g[n, m] as a 3 × 3 matrix, similar to the depiction of f[n, m].
(e) Depict f[n, m] and g[n, m] as images, and plot their respective histograms and CDFs.

5.12 Use the program hist.m to apply histogram equalization to the image in circuit.mat. Print out the images, histograms, and CDFs of the original and equalized images.

5.13 Use the program hist.m to apply histogram equalization to the image in pout.mat. Print out the images, histograms, and CDFs of the original and equalized images.

5.14 Use the program hist.m to apply histogram equalization to the image in tire.mat. Print out the images, histograms, and CDFs of the original and equalized images.

5.15 Use the program hist.m to apply histogram equalization to the image in coins.mat. Print out the images, histograms, and CDFs of the original and equalized images.
Section 5-4: Edge Detection

5.16 Sobel edge detection works by convolving the image f[n, m] with the two PSFs

hH[n, m] = [ 1  0 −1
             2  0 −2
             1  0 −1 ]

and

hV[n, m] = [  1  2  1
              0  0  0
             −1 −2 −1 ].

Compute the discrete-space frequency response H(Ω1, Ω2) of each of these PSFs.

5.17 Sobel edge detection works by convolving the image f[n, m] with the two PSFs

hH[n, m] = [ 1  0 −1
             2  0 −2
             1  0 −1 ]

and

hV[n, m] = [  1  2  1
              0  0  0
             −1 −2 −1 ].

Show how to implement these two convolutions using
(a) 16N² additions and subtractions, since doubling is two additions;
(b) 10N² additions and subtractions, since hH[n, m] and hV[n, m] are separable.

5.18 Apply (a) Sobel edge detection and (b) Canny edge detection to the image in plane.mat using the programs sobel.m and canny.m. Compare results.

5.19 Apply (a) Sobel edge detection and (b) Canny edge detection to the image in quarter.mat using the programs sobel.m and canny.m. Compare results.

5.20 Apply (a) Sobel edge detection and (b) Canny edge detection to the image in moon.mat using the programs sobel.m and canny.m. Compare results.

5.21 Apply (a) Sobel edge detection and (b) Canny edge detection to the image in saturn.mat using the programs sobel.m and canny.m. Compare results.
Chapter 6
Deterministic Approach to Image Restoration

Contents
Overview, 181
6-1 Direct and Inverse Problems, 181
6-2 Denoising by Lowpass Filtering, 183
6-3 Notch Filtering, 188
6-4 Image Deconvolution, 191
6-5 Median Filtering, 194
6-6 Motion-Blur Deconvolution, 195
Problems, 199

Objectives

Learn to:

■ Denoise a noisy image using the 2-D DFT and a Hamming-windowed lowpass filter.

■ Notch-filter an image with sinusoidal interference added to it.

■ Use median filtering to denoise an image with salt-and-pepper noise added to it.

■ Use Tikhonov regularization in a deconvolution problem.

■ Deconvolve an image blurred with a known point-spread function.

■ Deblur a motion-blurred image.

[Chapter-opener figure: (a) image g[n, m], a motion-blurred highway sign; (b) image g′[n, m], the motion-blurred highway sign with additive noise; (c) the reconstructed highway sign.]

This chapter covers image restoration from a noisy or blurred version of it, where the blur and noise were introduced by the imaging system. Denoising can be performed by lowpass-filtering the noisy image using the 2-D DFT (a Hamming-windowed filter works better than a brick-wall filter). Deblurring (deconvolution) can be performed using the 2-D DFT, although Tikhonov regularization is usually required. Motion-blur deblurring is a common application; refocusing an out-of-focus image is also common.
Overview

The goal in image restoration is to recover the true image f[n, m]—or a close version of it—from its noisy or blurred version g[n, m]. Examples include:

(1) Denoising: removing noise that had been added by the imaging system.

(2) Removing interference: subtracting an unwanted image that had been added to f[n, m].

(3) Deblurring: undoing the effect of the convolution of a system PSF h[n, m] with image f[n, m], where h[n, m] is the system response due to motion blur or defocusing.

Image restoration methods are categorized into two groups: deterministic and probabilistic. Deterministic methods apply algorithms—such as lowpass filtering—that do not incorporate knowledge of probability and random processes associated with the image or the image formation process. The present chapter deals exclusively with deterministic restoration methods applied to discrete-space images. Image restoration methods using the probabilistic approach are treated in Chapter 9.

6-1 Direct and Inverse Problems

In image processing, we encounter two interrelated operations commonly called the direct and inverse problems, and our goal is to obtain solutions to both problems.

6-1.1 The Direct Problem

The solution to the direct problem consists of a mathematical model that correctly accounts for the two forms of distortions commonly encountered in the generation of an image: (1) blurring by the imaging system and (2) the introduction of additive noise. As noted in Chapter 1, an imaging sensor—be it our eye's pupil, a camera, an MRI system, or any other 2-D image-forming configuration—has a non-zero "beam" described by a point spread function h(x, y). The image formation process is equivalent to convolving a filter h(x, y) with the true image f(x, y), which results in blurring. The second form of distortion involves the addition of random noise, which may be contributed entirely by the electronics of the imaging and recording systems, or it may also include a component due to peripheral sources in the imaged scene.

Mathematically, the direct problem is modeled as

g(x, y) = h(x, y) ∗∗ f(x, y) + υ(x, y),   (6.1)

where

g(x, y) = recorded image,
f(x, y) = true image of the scene,
h(x, y) = PSF of the imaging system,
υ(x, y) = additive noise.

To illustrate the impact of each type of distortion separately, let us start with a noise-free imaging process by setting υ(x, y) = 0. In Fig. 6-1, we show a high-quality MRI image labeled f(x, y). We will treat it as if it were a true image, and then through simulations, we will subject f(x, y) to convolution with hi(x, y) to generate image gi(x, y):

gi(x, y) = hi(x, y) ∗∗ f(x, y),   (6.2)

with index i = 1, 2, or 3 referring to Gaussian-shaped filters hi(x, y) of different effective radii. Image g1(x, y) is the result of convolving f(x, y) with a narrow PSF h1(x, y). Because h1(x, y) is narrow, the blurring is visually unnoticeable. Images g2(x, y) and g3(x, y) are the result of convolving the same original image with a medium-wide PSF h2(x, y) and wide PSF h3(x, y), respectively. Not surprisingly, increasing the filter's effective width leads to more image blurring.

Next we simulate the impact that additive random noise imparts onto the appearance of an image. The noise is added to f(x, y) after convolving it with the narrow filter h1(x, y), so the applicable model is

g1j(x, y) = h1(x, y) ∗∗ f(x, y) + υj(x, y),   (6.3)

where j = 1, 2, or 3 refers to three noise-addition simulations, characterized by different signal-to-noise ratios. All three output images in Fig. 6-2 were generated by adding noise randomly to different segments of the convolved image, but in g11(x, y), the average power content of the added noise is much smaller than the average power of the noise-free image, whereas the signal (image) and noise powers are comparable to one another in g12(x, y), and the noise power is much larger than the signal power in g13(x, y). Noise distorts an image by changing its amplitude, whereas the PSF distorts it through spatial averaging. Most images include both types of distortions.
is equivalent to convolving a filter h(x, y) with the true image amplitude, whereas the PSF distorts it through spatial averaging.
f (x, y), which results in blurring. The second form of distortion Most images include both types of distortions.
involves the addition of random noise, which may be contributed The solution to the direct problem entails computing g(x, y)
entirely by the electronics of the imaging and recording systems, from f (x, y) using Eq. (6.1). Doing so requires knowledge of the
or it may also include a component due to peripheral sources in PSF of the imaging system, h(x, y), and the statistical nature of
the imaged scene. the added noise υ (x, y). The PSF usually is determined through
Mathematically, the direct problem is modeled as calibration tests in which the output g(x, y) is measured in
response to a strong point-like target placed at the center of the
g(x, y) = h(x, y) ∗ ∗ f (x, y) + υ (x, y), (6.1) scene, as illustrated in Fig. 1-8. The strong target allows us to
ignore the noise υ (x, y), and its point-like spatial extent makes
where it equivalent to a 2-D impulse δ (x, y). Setting υ (x, y) = 0 and

181
[Figure 6-1 Simulation of image blurring: the original noise-free image f(x, y) at the top is convolved with Gaussian-shaped point spread functions h1(x, y), h2(x, y), and h3(x, y) of different effective widths to produce g1(x, y), g2(x, y), and g3(x, y). The variable r is the radial distance r = (x² + y²)^(1/2).]

f(x, y) = δ(x, y) in Eq. (6.1) gives

g(x, y) = h(x, y) ∗∗ δ(x, y) = h(x, y).   (6.4)

In most imaging systems, the noise υ(x, y) is random in nature and usually modeled as a zero-mean Gaussian random variable (Chapter 8). Accordingly, υ(x, y) is described by a probability density function (pdf) that contains a single parameter, the noise variance σv². The pdf and associated variance can be measured experimentally by recording the output g(x, y) for many locations (x, y), while having no signal as input (f(x, y) = 0). For a camera, this is equivalent to imaging a perfectly dark object. The recorded image in that case is the noise added by the camera.

Once h(x, y) has been characterized and υ(x, y) has been modeled appropriately, image g(x, y) can be readily computed using Eq. (6.1), thereby providing a possible solution of the direct problem. Because υ(x, y) is random in nature, each simulation of Eq. (6.1) will result in a statistically different, but comparable image g(x, y).

6-1.2 The Inverse Problem

Whereas the solution of the direct problem seeks to generate image g(x, y) from image f(x, y), the solution of the inverse problem seeks to do the exact opposite, namely to extract the true image f(x, y)—or a close facsimile of f(x, y)—from the blurred and noisy image g(x, y). The process involves (a) denoising g(x, y) by filtering out υ(x, y)—or at least most of it—and (b) deconvolution of g(x, y) to generate a close approximation of the true image f(x, y). The denoising and deconvolution steps of the inversion algorithm are performed using deterministic
[Figure 6-2 The image in the upper center, g1(x, y), had been convolved with the narrow filter before noise was added to it. SNR = 20 dB corresponds to average signal power/average noise power = 100, so g11(x, y) is essentially noise-free. In contrast, SNR = 0 dB in g12(x, y), which means that the average signal and noise powers are equal, and in g13(x, y) the noise power is 10× the signal power.]

methods, as demonstrated in later sections of the present chapter, or they are performed using stochastic (probabilistic) methods, which we cover later in Chapter 9. As a "heads up," we note that the stochastic approach usually outperforms the deterministic approach.

6-2 Denoising by Lowpass Filtering

In Section 3-7, we introduced and defined the 2-D discrete-space Fourier transform (DSFT) F(Ω1, Ω2) of discrete-space image f[n, m]. Here, Ω1 and Ω2 are continuous spatial frequen-
cies, one period of which is over the range

−π ≤ Ω1, Ω2 ≤ π.

We will refer to this continuous-frequency domain as the discrete-space spatial frequency (DSSF) domain.

In the DSSF domain, most of the energy in the spectra of typical images is concentrated in a small central region surrounding the origin (Ω1 = 0, Ω2 = 0). In contrast, additive noise may be distributed over a wide range of frequencies Ω1 and Ω2. If we denote G(Ω1, Ω2) as the spectrum of noisy image g[n, m], the rationale behind lowpass filtering is to remove high-frequency noise from G(Ω1, Ω2) while preserving (as much as possible) the spectrum of the original image F(Ω1, Ω2). The disadvantage of lowpass filtering is that the high-DSSF regions of an image may represent features of interest, such as edges.

◮ Straightforward lowpass filtering—the subject of the present section—eliminates high-frequency noise, but also eliminates edges and sharp variations in the image. If preserving edges is important, alternative methods should be used, such as the wavelet-denoising approach described in Chapter 7. ◭

6-2.1 Brickwall Lowpass Filtering

Signal-to-noise ratio (SNR) is a measure of how significant (or insignificant) the presence of noise is and the degree to which it is likely to distort the image. If the noisy image g[n, m] is composed of the true image f[n, m] plus additive noise υ[n, m],

g[n, m] = f[n, m] + υ[n, m],   (6.5)

then the SNR in dB is defined as

SNR = 10 log10 ( ΣΣ f²[n, m] / ΣΣ υ²[n, m] ),   (6.6)

where the summations are performed over all image pixels.

A brickwall lowpass filter passes all frequency components below a specified cutoff frequency Ωc (along both Ω1 and Ω2) and removes all components at frequencies above Ωc. The DSSF response of the brickwall lowpass filter over 1 period of (Ω1, Ω2) is

Hbrick(Ω1, Ω2) = 1 for 0 ≤ Ω1, Ω2 ≤ Ωc,
                 0 for Ωc < Ω1, Ω2 ≤ π.   (6.7)

Application of the lowpass filter to the spectrum G(Ω1, Ω2) of noisy image g[n, m] generates spectrum

Gbrick(Ω1, Ω2) = Hbrick(Ω1, Ω2) G(Ω1, Ω2).   (6.8)

The operation given by Eq. (6.8) can be performed in the (N × N) 2-D DFT domain (Section 3-8) by defining a 2-D DFT cutoff index K such that

K = Ωc N / (2π),   (6.9)

and then setting to zero those elements of the 2-D DFT G[k1, k2] that fall in the range

K ≤ k1, k2 ≤ N + 2 − K.   (6.10)

We note that the operation preserves conjugate symmetry in the filtered spectrum Gbrick[k1, k2]. Consequently, the inverse 2-D DFT gbrick[n, m] of Gbrick[k1, k2] is real-valued.
eliminates edges and sharp variations in the image. If To illustrate the trade-off between noise reduction and preser-
preserving edges is important, alternative methods should vation of fast-varying (with position) image features, we con-
be used, such as the wavelet-denoising approach described sider a noisy letters image, which characteristically has many
in Chapter 7. ◭ edges. Starting with a noise-free image f [n, m], we synthesized
a noisy image g[n, m] by adding random amounts of noise to the
image pixels. The noisy image, consisting of (256 × 256) pixels,
6-2.1 Brickwall Lowpass Filtering is shown in Fig. 6-3(a). The values f [n, m] of the original image
ranged between 0 (representing black) and 255 (representing
Signal-to-noise (SNR) is a measure of how significant (or white). Thus,
insignificant) the presence of noise is and the degree to which 0 ≤ f [n, m] ≤ 255.
it is likely to distort the image. If the noisy image g[n, m] is
Intentionally, the amount of noise that was added to the original
composed of the true image f [n, m] plus additive noise υ [n, m],
image was much greater in total energy than the energy of the
g[n, m] = f [n, m] + υ [n, m], (6.5) original image itself, thereby producing a very noisy looking
image. The associated SNR is −12.8 dB, which means that the
then the SNR in dB is defined as total energy of f [n, m] is only 5.25% of that of the total noise
  energy. To preserve non-negativity of the noisy image g[n, m],
∑ ∑ f 2 [n, m] the noise image υ [n, m] was assigned the range 0 to 500:
SNR = 10 log10 , (6.6)
∑ ∑ υ 2 [n, m]
0 ≤ υ [n, m] ≤ 500.
where the summations are performed over all image pixels.
A brickwall lowpass filter passes all frequency components Consequently, the range of pixel values in noisy image g[n, m] is
below a specified cutoff frequency Ωc (along both Ω1 and Ω2 )
and removes all components at frequencies above Ωc . The 0 ≤ g[n, m] ≤ 755.
DSSF response of the brickwall lowpass filter over 1 period of
• Figure 6-3(b) displays spectrum 10 log10 [G(Ω1 , Ω2 )] of
(Ω1 , Ω2 ) is
g[n, m], which extends between −π and π along both Ω1
( and Ω2 .
1 for 0 ≤ Ω1 , Ω2 ≤ Ωc ,
Hbrick (Ω1 , Ω2 ) = (6.7)
0 for Ωc < Ω1 , Ω2 ≤ π . • Multiplication of the spectrum in Fig. 6-3(b) by a brickwall
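For concreteness, the DFT-domain zeroing of Eqs. (6.9) and (6.10) can be sketched in MATLAB as follows. This is a minimal sketch, not the book's implementation; it assumes the clown.mat image file used in Problem 6.3 and an arbitrarily chosen cutoff index K.

% A minimal sketch of Eqs. (6.8)-(6.10): brickwall lowpass filtering
% in the 2-D DFT domain. Assumes clown.mat (a 200 x 200 image X).
load clown.mat;              % provides X, as in Problem 6.3 (assumed)
N = 200;
K = 40;                      % cutoff index K = Omega_c*N/(2*pi); value chosen for illustration
G = fft2(X);                 % 2-D DFT of the (noisy) image
G(K:N+2-K, K:N+2-K) = 0;     % zero the high-frequency block of Eq. (6.10)
gbrick = real(ifft2(G));     % conjugate symmetry is preserved, so the result is real
imagesc(gbrick), colormap(gray)

Because the zeroed block maps onto itself under (k1, k2) → (N + 2 − k1, N + 2 − k2), conjugate symmetry is preserved and the inverse 2-D DFT is real, consistent with the remark above.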
To illustrate the trade-off between noise reduction and preservation of fast-varying (with position) image features, we consider a noisy letters image, which characteristically has many edges. Starting with a noise-free image f[n, m], we synthesized a noisy image g[n, m] by adding random amounts of noise to the image pixels. The noisy image, consisting of (256 × 256) pixels, is shown in Fig. 6-3(a). The values f[n, m] of the original image ranged between 0 (representing black) and 255 (representing white). Thus,

0 ≤ f[n, m] ≤ 255.

Intentionally, the amount of noise that was added to the original image was much greater in total energy than the energy of the original image itself, thereby producing a very noisy looking image. The associated SNR is −12.8 dB, which means that the total energy of f[n, m] is only 5.25% of the total noise energy. To preserve non-negativity of the noisy image g[n, m], the noise image υ[n, m] was assigned the range 0 to 500:

0 ≤ υ[n, m] ≤ 500.

Consequently, the range of pixel values in noisy image g[n, m] is

0 ≤ g[n, m] ≤ 755.

Figure 6-3 (image panels omitted) Image denoising by three lowpass filters with different cutoff wavenumbers: (a) noisy image g[n, m]; (b) spectrum 10 log10[G(Ω1, Ω2)] of the noisy image; (c) and (d) filtered image gbrick[n, m] and its spectrum for Ωc1 = 75π/128; (e) and (f) the same for Ωc2 = 50π/128; (g) and (h) the same for Ωc3 = 25π/128.

• Figure 6-3(b) displays spectrum 10 log10[G(Ω1, Ω2)] of g[n, m], which extends between −π and π along both Ω1 and Ω2.

• Multiplication of the spectrum in Fig. 6-3(b) by a brickwall
lowpass filter with Ωc1 = 75π/128 leads to the spectrum in Fig. 6-3(d). The two spectra are identical within the square defined by |Ω1|, |Ω2| ≤ Ωc1. The fractional size of the square is (75/128)² ≈ 34% of the spectrum of the original image. Inverse transforming the spectrum in Fig. 6-3(d) produces the lowpass-filtered image in Fig. 6-3(c). We observe that the noise is reduced, albeit only slightly, but the letters are hardly distorted.

• Narrowing the filtered spectrum down to a box with Ωc2 = 50π/128 leads to the spectrum in Fig. 6-3(f). The associated filtered image gbrick[n, m] is shown in Fig. 6-3(e). In this case, only (50/128)² ≈ 15% of the spectrum of the original noisy image is retained. The filtered image contains less noise, but the letters are distorted slightly.

• Repeating the process, but limiting the spectrum to Ωc3 = 25π/128—in which case, only (25/128)² ≈ 4% of the spectrum is retained—leads to the image in Fig. 6-3(g). The noise is greatly reduced, but the edges of the letters are fuzzy.

◮ This example illustrates the trade-off inherent in Fourier-based lowpass filtering: noise can be reduced, but at the expense of distorting the high-frequency content of the image. As noted earlier, in Chapter 7 we show how to avoid this trade-off using wavelets instead of Fourier transforms. ◭

6-2.2 Tapered Lowpass Filtering

Even though it is not apparent in the filtered images of Fig. 6-3, at lower cutoff DSSFs, some of what appears to be noise is actually "ringing" caused by the abrupt "brickwall" filtering of the spectrum of the noisy image. Fortunately, the ringing effect can be reduced significantly by modifying the brickwall filter into a tapered filter. We will examine both the problem and the proposed solution for 2-D images, but before we do so, it will be instructive to consider the case of a periodic 1-D signal.

A. 1-D brickwall lowpass-filtered signal

Signal x(t), shown in Fig. 6-4(a), is a square wave with period T = 2π and amplitude A = 1. Its Fourier series expansion is given by

$$x(t) = \sum_{\substack{k=1 \\ k\ \text{odd}}}^{\infty} \frac{4}{k\pi} \sin(kt). \qquad (6.11)$$

The fundamental frequency of x(t) is f0 = 1/T = 1/(2π). We can apply the equivalent of a brickwall lowpass filter with cutoff frequency kc f0 by truncating the Fourier series at k = kc. For example, if we select kc = 21, we obtain a brickwall lowpass-filtered version of x(t) given by

$$y_{\text{brick}}(t) = \sum_{\substack{k=1 \\ k\ \text{odd}}}^{21} \frac{4}{k\pi} \sin(kt). \qquad (6.12)$$

The truncated summation contains 11 nonzero terms. The plot of ybrick(t) displayed in Fig. 6-4(b) resembles the original square wave, except that it also exhibits small oscillations; i.e., the ringing effect we referred to earlier.

B. 1-D tapered lowpass-filtered signal

The ringing in the lowpass-filtered signal, which is associated with the sharp cutoff characteristic of the brickwall filter, can be reduced significantly by multiplying the terms in Eq. (6.12) by a decreasing sequence of weights, thereby tapering those terms gradually to zero. Several tapering formats are available, one of which is the Hamming window defined by Eq. (2.83). Adapting the expression for the Hamming window to the square wave leads to

$$y_{\text{Ham}}(t) = \sum_{\substack{k=1 \\ k\ \text{odd}}}^{21} \frac{4}{k\pi} \sin(kt) \left[ 0.54 + 0.46 \cos\left(\frac{\pi(k-1)}{20}\right) \right]. \qquad (6.13)$$

A plot of the tapered signal yHam(t) is shown in Fig. 6-4(c). Even though the tapered signal includes the same number of frequency harmonics as before, the oscillations have disappeared and the transitions at t = integer values of T/2 = π are relatively smooth.
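The ringing trade-off described above is easy to reproduce numerically. The following minimal sketch evaluates Eqs. (6.12) and (6.13) on an assumed time grid and plots the two truncated series side by side.

% Compare the brickwall-truncated series of Eq. (6.12) with the
% Hamming-tapered series of Eq. (6.13) for the square wave of Eq. (6.11).
t = linspace(0, 3*pi, 1000);
ybrick = zeros(size(t)); yHam = zeros(size(t));
for k = 1:2:21                                        % odd harmonics only
    term = (4/(k*pi)) * sin(k*t);
    ybrick = ybrick + term;                           % Eq. (6.12)
    yHam = yHam + term * (0.54 + 0.46*cos(pi*(k-1)/20)); % Eq. (6.13)
end
plot(t, ybrick, t, yHam)
legend('ybrick(t): visible ringing', 'yHam(t): ringing suppressed')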
Figure 6-4 (plots omitted) (a) Square wave x(t), (b) brickwall lowpass-filtered version ybrick(t), and (c) Hamming-windowed version yHam(t).

C. 2-D brickwall lowpass-filtered image

The ringing observed in 1-D signals also manifests itself in 2-D images whenever the image spectrum is lowpass-filtered by a sharp filter. The DSSF response Hbrick(Ω1, Ω2) of a brickwall lowpass filter with cutoff frequency Ωc along both Ω1 and Ω2 is given by Eq. (6.7). The corresponding inverse DSFT is

$$h_{\text{brick}}[n, m] = h_{\text{brick}}[n]\, h_{\text{brick}}[m] = \left(\frac{\Omega_c}{\pi}\right)^2 \operatorname{sinc}\left(\frac{\Omega_c n}{\pi}\right) \operatorname{sinc}\left(\frac{\Omega_c m}{\pi}\right). \qquad (6.14)$$
Per the definition of the sinc function given by Eq. (2.35), namely sinc(x) = [sin(πx)]/(πx),

$$\operatorname{sinc}\left(\frac{\Omega_c n}{\pi}\right) = \frac{\sin\!\left(\pi\, \frac{\Omega_c n}{\pi}\right)}{\pi\, \frac{\Omega_c n}{\pi}} = \frac{\sin(\Omega_c n)}{\Omega_c n}, \qquad (6.15)$$

and a similar definition applies to sinc(Ωc m/π).

For an image g[n, m] with a corresponding DSFT G(Ω1, Ω2), the filtered image spectrum is

Gbrick(Ω1, Ω2) = Hbrick(Ω1, Ω2) G(Ω1, Ω2),     (6.16a)

and the corresponding spatial-domain relationship is

gbrick[n, m] = hbrick[n, m] ∗∗ g[n, m].     (6.16b)

The impact of the brickwall lowpass-filtering process on a 2-D image was illustrated earlier through Fig. 6-3.

D. 2-D tapered lowpass-filtered image

Figure 6-5 displays the spatial and frequency domain responses of a Hamming window with N = 10, adapted from Fig. 2-13. For a 2-D image g[n, m], lowpass filtering its spectrum with a Hamming window of length N is equivalent to performing the convolution

gHam[n, m] = hHam[n, m] ∗∗ g[n, m]     (6.17a)

with

$$h_{\text{Ham}}[n, m] = h_{\text{Ham}}[n]\, h_{\text{Ham}}[m] = \begin{cases} \left(\dfrac{\Omega_c}{\pi}\right)^2 \operatorname{sinc}\!\left(\dfrac{\Omega_c n}{\pi}\right)\!\left[0.54 + 0.46\cos\!\left(\dfrac{\pi n}{N}\right)\right] \operatorname{sinc}\!\left(\dfrac{\Omega_c m}{\pi}\right)\!\left[0.54 + 0.46\cos\!\left(\dfrac{\pi m}{N}\right)\right] & \text{for } |n|, |m| \le N, \\[1ex] 0 & \text{for } |n|, |m| > N. \end{cases} \qquad (6.17b)$$

Figure 6-5 (plots omitted) Hamming window of length N = 10: (a) impulse response hHam[n], and (b) spectrum HHam(Ω).

To illustrate the presence of "ringing" when a sharp-edged filter like a brickwall is used, and its absence when a Hamming-windowed filter is used instead, we refer the reader to Fig. 6-6. In part (a), we show a noiseless letters image, and its corresponding spectrum is displayed in part (b). Application of a brickwall lowpass filter (with the impulse response given by Eq. (6.14)) leads to the image in Fig. 6-6(c). The "ringing" in the image is visible in the form of whorls that resemble a giant thumbprint. In contrast, the image in Fig. 6-6(e)—which was generated by applying a Hamming-windowed filter to the original image—exhibits no "ringing." Note that the Hamming-windowed spectrum in Fig. 6-6(f) tapers gradually from the center outward. It is this tapering profile that eliminates the ringing effect.

Concept Question 6-1: For lowpass filtering, why would we use a Hamming-windowed filter instead of a brickwall filter?
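As an illustration, the filtering operation of Eq. (6.17) can be sketched in MATLAB as follows. The image file, window length N, and cutoff Ωc are assumptions; sinc here is the Signal Processing Toolbox function, which implements the definition of Eq. (2.35).

% A sketch of 2-D Hamming-windowed lowpass filtering, Eq. (6.17).
load clown.mat;                        % provides X, as in Problem 6.3 (assumed)
N = 10; Oc = pi/3;                     % window half-length and cutoff (assumed)
n = -N:N;
h1 = (Oc/pi) * sinc(Oc*n/pi) .* (0.54 + 0.46*cos(pi*n/N)); % 1-D windowed filter
hHam = h1' * h1;                       % separable 2-D PSF, Eq. (6.17b)
gHam = conv2(X, hHam, 'same');         % spatial-domain filtering, Eq. (6.17a)
imagesc(gHam), colormap(gray)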
Figure 6-6 (image panels omitted) Letters image and its spectrum in (a) and (b); brickwall lowpass-filtered version in (c) and (d), and brickwall lowpass-filtered version with a Hamming window in (e) and (f). The logarithmic scale enhances small values of the spectrum.
Figure 6-7 (image panels omitted) Process for notch-filtering horizontal scan lines: (a) original image with scan lines; (b) log of magnitude of spectrum of original image; (c) image of vertically periodic horizontal lines; (d) magnitude of spectrum of the vertically periodic horizontal lines; (e) notch-filtered image; (f) log of magnitude of spectrum of notch-filtered image. Fig. 6-7(a) courtesy of NASA.

6-3 Notch Filtering

Occasionally, an image may contain a 2-D sinusoidal interference contributed by an electromagnetic source, such as the ac power cable in a camera. In 1-D discrete-time signals, sinusoidal interference can be eliminated by subjecting the signal's spectrum to a notch filter, which amounts to setting the spectrum at that specific frequency to zero. A similar process can be applied to a 2-D image. To illustrate, let us consider the example portrayed in Fig. 6-7. In part (a) of the figure, we have a 660 × 800 image of the planet Mars recorded by a Mariner space

probe. In addition to the image of the planet, the image contains near-horizontal scan lines that we wish to eliminate by applying notch filtering.

Let us designate the image containing only the near-horizontal lines as f′(x, y), which we display in Fig. 6-7(c), along with its spectrum in Fig. 6-7(d). Along the vertical direction, the image contains (2M + 1) lines, with one line passing through the center and M lines each above and below the center line. The separation between adjacent lines is ∆. For the present, we will treat the near-horizontal lines as if they were perfectly horizontal (correction for skewness will be applied later). Accordingly, f′(x, y) is given by

$$f'(x, y) = \sum_{m=-M}^{M} \delta(y - m\Delta), \qquad (6.18)$$

and (using entry #3 in Table 2-4 and property #2 in Table 2-5) the associated spectrum of f′(x, y) is

$$F'(\mu, \nu) = \sum_{m=-M}^{M} e^{-j2\pi\nu m\Delta}\, \delta(\mu). \qquad (6.19)$$

The sum in Eq. (6.19) is similar in form to the sum in Eq. (2.87),

$$\sum_{m=-M}^{M} e^{-j\Omega m} = \frac{\sin((2M + 1)\Omega/2)}{\sin(\Omega/2)}, \qquad (6.20)$$

thereby allowing us to rewrite Eq. (6.19) as

$$F'(\mu, \nu) = \frac{\sin((2M + 1)\pi\nu\Delta)}{\sin(\pi\nu\Delta)}\, \delta(\mu). \qquad (6.21)$$

The spectrum, which is a discrete sinc function along the ν (vertical spatial frequency) direction and an impulse function along the µ (horizontal spatial frequency) direction, is displayed in Fig. 6-7(d). Only the peaks of the discrete sinc are visible in the image.

Recall from Section 3-4.2 that rotating an image causes its spectrum to rotate by the same angle. Hence, the interfering vertically periodic near-horizontal lines in Fig. 6-7(a) should appear in its spectrum (Fig. 6-7(b)) as a near-vertical line rotated slightly counterclockwise.

The spectrum shown in Fig. 6-7(f) is the spectrum of the original image after setting the spectrum associated with the interfering lines to zero. Application of the inverse transform leads to the filtered image shown in Fig. 6-7(e). The interfering horizontal lines have been eliminated, with minor degradation to the rest of the image.

Concept Question 6-2: From where does a notch filter get its name?

Concept Question 6-3: What is notch filtering used for?
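In practice, the notch operation amounts to zeroing selected DFT coefficients and inverse transforming. The sketch below is illustrative only: the data file is the one assumed in Problem 6.6 (taken to contain the striped image in X), and the peak indices are placeholders that must be read off the displayed spectrum.

% A minimal notch-filtering sketch: zero the spectral components
% associated with the periodic interference, then inverse transform.
load P66.mat;                          % striped clown image X (assumed, per Problem 6.6)
F = fft2(X);
imagesc(log(1 + abs(fftshift(F))))     % locate the interference peaks visually
F(1, 51) = 0;                          % notch one interference peak (placeholder index)
F(1, end-49) = 0;                      % and its conjugate-symmetric partner
Z = real(ifft2(F));                    % image with the interference removed
imagesc(Z), colormap(gray)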
6-4 Image Deconvolution

When our eyes view a scene, they form an approximate image of the scene, because the optical imaging process performed by the eyes distorts the true scene, with the degree of distortion being dependent on the imaging properties of the eyes' lenses. The same is true when imaging with a camera, a medical imaging system, and an optical or radio telescope. Distortion can also be caused by the intervening medium between the imaged scene and the imaging sensor. Examples include the atmosphere when a telescope is used to image a distant object, or body tissue when a medical ultrasound sensor is used to image body organs. In all cases, the imaging process involves the convolution of a true image scene f[n, m] with a point spread function h[n, m] representing the imaging sensor (and possibly the intervening medium). The recorded (sensed) image g[n, m] is, therefore, given by

g[n, m] = h[n, m] ∗∗ f[n, m].     (6.22)

The goal of image deconvolution is to deconvolve the recorded image so as to extract the true image f[n, m], or a close approximation of it. Doing so requires knowledge of the PSF h[n, m]. In terms of size:

• f[n, m] is the unknown true image, with size (M × M).
• h[n, m] is the known PSF, with size (L × L).
• g[n, m] is the known recorded image, with size (L + M − 1) × (L + M − 1).

As noted earlier in Chapter 1, the PSF of the imaging sensor can be established by imaging a small object representing a 2-D impulse.

Since convolution in the discrete-space domain translates into multiplication in the frequency domain, the DSFT-equivalent of Eq. (6.22) is given by

G(Ω1, Ω2) = H(Ω1, Ω2) F(Ω1, Ω2).     (6.23)

Before we perform the frequency transformation of g[n, m], we should round up (L + M − 1) to N, where N is the smallest power of 2 greater than (L + M − 1). The rounding-up step allows us to use the fast radix-2 2-D FFT to compute 2-D DFTs of order (N × N). Alternatively, the Cooley-Tukey FFT can be used, in which case N should be an integer with a large number of small factors.

Sampling the DSFT at Ω1 = 2πk1/N and Ω2 = 2πk2/N for k1 = 0, 1, . . . , N − 1 and k2 = 0, 1, . . . , N − 1 provides the DFT complex coefficients G[k1, k2]. A similar procedure can be applied to h[n, m] to obtain coefficients H[k1, k2], after zero-padding h[n, m] so that it also is of size (N × N). The DFT equivalent of Eq. (6.23) is then given by

G[k1, k2] = H[k1, k2] F[k1, k2].     (6.24)

The objective of deconvolution is to compute the DFT coefficients F[k1, k2], given the DFT coefficients G[k1, k2] and H[k1, k2].

6-4.1 Nonzero H[k1, k2] Coefficients

In the ideal case where none of the DFT coefficients H[k1, k2] are zero, the DFT coefficients F[k1, k2] of the unknown image can be obtained through simple division,

$$F[k_1, k_2] = \frac{G[k_1, k_2]}{H[k_1, k_2]}. \qquad (6.25)$$

Exercising the process for all possible values of k1 and k2 leads to an (N × N) 2-D DFT for F[k1, k2], whereupon application of an inverse 2-D DFT process yields a zero-padded version of f[n, m]. Upon discarding the zeros, we obtain the true image f[n, m]. The deconvolution procedure is straightforward, but it hinges on a critical assumption, namely that none of the DFT coefficients of the imaging system's transfer function is zero. Otherwise, division by zero in Eq. (6.25) would lead to undeterminable values for F[k1, k2].
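The division of Eq. (6.25), together with the zero-padding described above, can be sketched in MATLAB as follows. This is a minimal sketch assuming the noise-free letters image (loaded here as in Problem 6.12) and the Gaussian PSF of the example that follows.

% A sketch of noise-free deconvolution by DFT division, Eq. (6.25).
load letters;                          % provides X (256 x 256), assumed available
[n1, m1] = meshgrid(-10:10);
h = exp(-(n1.^2 + m1.^2)/20);          % truncated Gaussian PSF, Eq. (6.26)
g = conv2(X, h);                       % blurred image, 276 x 276
N = 280;                               % zero-pad size, >= 256 + 21 - 1
F = fft2(g, N, N) ./ fft2(h, N, N);    % Eq. (6.25); valid only if no H[k1,k2] is zero
f = real(ifft2(F));
f = f(1:256, 1:256);                   % discard the zero-padding
imagesc(f), colormap(gray)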
6-4.2 Image Deconvolution Example

Figure 6-8 (image panels omitted) (a) Letters image f[n, m]; (b) blurred image g[n, m]; (c) deconvolved image f[n, m]. The blurred image in (b) was generated by convolving f[n, m] with a Gaussian PSF, and the image in (c) was recovered through deconvolution of g[n, m].

To demonstrate the performance of the deconvolution process, we used a noise-free version of the letters image, shown in Fig. 6-8(a), which we denote f[n, m]. Then, we convolved it with a truncated 2-D Gaussian PSF (a common test PSF) given by

$$h[n, m] = e^{-(m^2 + n^2)/20}, \qquad -10 \le n, m \le 10. \qquad (6.26)$$

The PSF represents the imaging system. The convolution process generated the blurred image shown in Fig. 6-8(b), which we label g[n, m]. The image sizes are:

• Original letters image f[n, m]: 256 × 256
• Gaussian PSF h[n, m]: 21 × 21
• Blurred image g[n, m]: 276 × 276, where 276 = 256 + 21 − 1.

After zero-padding all three images to 280 × 280, the 2-D FFT was applied to all three images. Then, Eq. (6.25) was applied to find coefficients F[k1, k2], which ultimately led to the deconvolved image f[n, m] displayed in Fig. 6-8(c). The deconvolved image matches the original image shown in Fig. 6-8(a). The process was successful because none of the H[k1, k2] coefficients had zero values and G[k1, k2] was noise-free. To avoid division by zero-valued (or near-zero) coefficients H[k1, k2] when computing F[k1, k2], we use image regularization and Wiener filtering, as discussed in the next subsections.

6-4.3 Tikhonov Image Regularization

All electronic imaging systems generate some noise of their own. The same is true for the eye-brain system. Hence, Eq. (6.24) should be modified to

G[k1, k2] = H[k1, k2] F[k1, k2] + V[k1, k2],     (6.27)

where V[k1, k2] represents the spectrum of the additive noise contributed by the imaging system. The known quantities are the measured image G[k1, k2] and the PSF of the system, H[k1, k2], and the sought-out quantity is the true image F[k1, k2]. Dividing both sides of Eq. (6.27) by H[k1, k2] and solving for F[k1, k2] gives

$$F[k_1, k_2] = \frac{G[k_1, k_2]}{H[k_1, k_2]} - \frac{V[k_1, k_2]}{H[k_1, k_2]}. \qquad (6.28)$$

In many practical applications, H[k1, k2] may assume very small values for large values of [k1, k2]. Consequently, the second term in Eq. (6.28) may end up amplifying the noise component and may drown out the first term. To avoid the noise-amplification problem, the deconvolution can be converted into a regularized estimation process. Regularization involves the use of a cost function that trades off estimation accuracy (of f[n, m]) against measurement precision. The process generates an estimate f̂[n, m] of the true image f[n, m]. Accuracy refers to a bias associated with all pixel values of the reconstructed image f̂[n, m] relative to f[n, m]. Precision refers to the ± uncertainty associated with each individual pixel value due to noise.

A commonly used regularization model is the Tikhonov regularization, which seeks to minimize the cost function

$$e = \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} \left[ (g[n, m] - h[n, m] \ast\ast \hat{f}[n, m])^2 + (\lambda \hat{f}[n, m])^2 \right], \qquad (6.29)$$

where zero-padding to size N × N has been implemented so that all quantities in Eq. (6.29), except for λ, are of the same order. The parameter λ is non-negative and it is called a regularization parameter. The second term on the right-hand side of Eq. (6.29) represents the bias error associated with f̂[n, m] and the first term represents the variance. Setting λ = 0 reduces Eq. (6.29) to the unregularized state we dealt with earlier in Section 6-4.1, wherein the measurement process was assumed to be noise-free. For realistic imaging processes, λ should be greater than zero, but there is no simple method for specifying its value, so usually its value is selected heuristically (by trial and error).

The estimation process may be performed iteratively in the discrete-space domain by selecting an initial estimate f̂[n, m] and then recursively iterating the estimate until the error e approaches a minimum level. Alternatively, the process can be performed in the frequency domain using a Wiener filter, as discussed next.

6-4.4 Wiener Filter

Using Rayleigh's theorem (entry #9 in Table 3-3), the frequency domain DFT equivalent of the Tikhonov cost function given by Eq. (6.29) is

$$E = \frac{1}{N^2} \sum_{k_1=0}^{N-1} \sum_{k_2=0}^{N-1} \left[\, |G[k_1, k_2] - H[k_1, k_2]\, \hat{F}[k_1, k_2]|^2 + \lambda^2 |\hat{F}[k_1, k_2]|^2 \,\right]. \qquad (6.30)$$

The error can be minimized separately for each (k1, k2) combination. The process can be shown (see Problem 6.10) to lead to the solution

$$\hat{F}[k_1, k_2] = G[k_1, k_2]\, \frac{H^*[k_1, k_2]}{|H[k_1, k_2]|^2 + \lambda^2}, \qquad (6.31)$$

where H∗[k1, k2] is the complex conjugate of H[k1, k2]. The quantity multiplying G[k1, k2] is called a Wiener filter W[k1, k2]. That is,

F̂[k1, k2] = G[k1, k2] W[k1, k2],     (6.32a)
with

$$W[k_1, k_2] = \frac{H^*[k_1, k_2]}{|H[k_1, k_2]|^2 + \lambda^2}. \qquad (6.32b)$$

The operation of the Wiener filter is summarized as follows:

(a) For values of (k1, k2) such that |H[k1, k2]| ≫ λ, the Wiener filter implementation leads to

$$\hat{F}[k_1, k_2] \approx G[k_1, k_2]\, \frac{H^*[k_1, k_2]}{|H[k_1, k_2]|^2} = \frac{G[k_1, k_2]}{H[k_1, k_2]}, \qquad (6.33a)$$

which is the same as Eq. (6.25).

(b) For values of (k1, k2) such that |H[k1, k2]| ≪ λ, the Wiener filter implementation leads to

$$\hat{F}[k_1, k_2] \approx G[k_1, k_2]\, \frac{H^*[k_1, k_2]}{\lambda^2}. \qquad (6.33b)$$

In this case, the Wiener filter avoids the noise amplification problem that would have occurred with the use of the unregularized deconvolution given by Eq. (6.25).

Concept Question 6-4: What does the Wiener filter given by Eq. (6.31) reduce to when λ = 0?

Concept Question 6-5: Why is Tikhonov regularization needed in deconvolution?

Exercise 6-1: Apply Tikhonov regularization with λ = 0.01 to the 1-D deconvolution problem

{x[0], x[1]} ∗ {h[0], h[1], h[2]} = {2, −5, 4, −1},

where h[0] = h[2] = 1 and h[1] = −2.

Answer: H(0) = 1 − 2 + 1 = 0, so X(Ω) = Y(Ω)/H(Ω) will not work at Ω = 0. But

$$X(\Omega) = \frac{H^*(\Omega)}{|H(\Omega)|^2 + \lambda^2}\, Y(\Omega)$$

does work. Using 4-point DFTs (computable by hand) gives x[n] = {1.75, −1.25, −0.25, −0.25}, which is close to the actual x[n] = {2, −1}. MATLAB code:

h=[1 -2 1];x=[2 -1];y=conv(x,h);
H=fft(h,4);Y=fft(y);
Z=conj(H).*Y./(abs(H).*abs(H)+0.0001);
z=real(ifft(Z))

The last line (note ifft rather than ifft2, since the problem is 1-D) provides the estimated x[n].
6-4.5 Wiener Filter Deconvolution Example

To demonstrate the capabilities of the Wiener filter, we compare image deconvolution performed with and without regularization. The demonstration process involves images at various stages, namely:

• f[n, m]: true letters image (Fig. 6-9(a)).

• g[n, m] = h[n, m] ∗∗ f[n, m] + v[n, m]: the imaging process not only distorts the image (through the PSF), but also adds random noise v[n, m]. The result, displayed in Fig. 6-9(b), is an image with signal-to-noise ratio of 10.8 dB, which means that the random noise energy is only about 8% of that of the signal.

• f̂1[n, m]: estimate of f[n, m] obtained without regularization (i.e., using Eq. (6.25)). Image f̂1[n, m], displayed in Fig. 6-9(c), does not show any of the letters present in the original image, despite the fact that the noise level is small relative to the signal.

• f̂2[n, m]: estimate of f[n, m] obtained using the Wiener filter of Eq. (6.31) with λ² = 5. The deconvolved image (Fig. 6-9(d)) displays all of the letters contained in the original image, but some high wavenumber noise also is present.
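A minimal MATLAB sketch of the Wiener-filter stage f̂2[n, m] follows; the noise level is an assumption chosen only to make the demonstration visible, while the PSF and λ² = 5 follow the example above.

% A sketch of Wiener-filter deconvolution, Eqs. (6.31)-(6.32).
load letters;                              % provides X, assumed available
[n1, m1] = meshgrid(-10:10);
h = exp(-(n1.^2 + m1.^2)/20);              % Gaussian PSF, Eq. (6.26)
g = conv2(X, h) + 20*randn(276, 276);      % blurred image plus random noise (assumed level)
N = 280; lambda2 = 5;                      % lambda^2 = 5, as in Fig. 6-9(d)
H = fft2(h, N, N); G = fft2(g, N, N);
Fhat = G .* conj(H) ./ (abs(H).^2 + lambda2);  % Eq. (6.31)
fhat = real(ifft2(Fhat));
imagesc(fhat(1:256, 1:256)), colormap(gray)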
Figure 6-9 (image panels omitted) (a) Original noise-free undistorted letters image f[n, m], (b) blurred image g[n, m] = h[n, m] ∗∗ f[n, m] + v[n, m] due to imaging system PSF and addition of random noise v[n, m], (c) deconvolution f̂1[n, m] using Eq. (6.25) without regularization, and (d) deconvolution f̂2[n, m] using Eq. (6.31) with λ² = 5.

6-5 Median Filtering

Median filtering is used to remove salt-and-pepper noise, often due to bit errors or shot noise associated with electronic devices. The concept of median filtering is very straightforward:

◮ A median filter of order L replaces each pixel with the median value of the L² pixels in the L × L block centered on that pixel. ◭

For example, a median filter of order L = 3 replaces each pixel [n, m] with the median value of the 3 × 3 = 9 pixels centered at [n, m]. Figure 6-10(a) shows an image corrupted with salt-and-pepper noise, and part (b) of the same figure shows the image after the application of a median filter of order L = 5.

Figure 6-10 (image panels omitted) Median filtering example: (a) letters image corrupted by salt-and-pepper noise, and (b) image after application of median filtering using a 5 × 5 window.

Concept Question 6-6: When is median filtering useful?
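A minimal sketch of the median-filtering operation follows. The corruption model is an assumption, and medfilt2 requires the Image Processing Toolbox.

% A sketch of order-L median filtering of a salt-and-pepper-corrupted image.
load letters;                              % provides X, assumed available
Y = X;
idx = rand(size(X)) < 0.05;                % corrupt 5% of the pixels (assumed rate)
Y(idx) = 255 * (rand(nnz(idx), 1) > 0.5);  % salt (255) and pepper (0)
Z = medfilt2(Y, [5 5]);                    % order L = 5: median of each 5 x 5 block
imagesc(Z), colormap(gray)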

6-6 Motion-Blur Deconvolution

6-6.1 Continuous Space

If, during the recording time for generating a still image of an object or scene, the imaged object or scene is in motion relative to the imaging system, the recorded image will exhibit a streaking pattern known as motion blur. An example is the simulated photograph of the highway sign shown in Fig. 6-11(a), taken from a moving car. Often, the direction and duration of the blur can be discerned from the blurred image.

Figure 6-11 (image panels omitted) Highway sign: (a) motion-blurred image g[n, m], (b) image g′[n, m] with combination of blurring and additive noise, and (c) after reconstruction.

To describe motion blur mathematically, we start by making the following assumptions:

(1) The blurred image has been appropriately rotated so that
the direction of motion is aligned along the x axis.

(2) Image recording is T (seconds) in duration, with instantaneous start and stop actions, thereby allowing us to represent the process in terms of a rectangle function.

(3) The motion of the imager, relative to the scene, is linear (along the x axis) and at a constant speed s (meters/s).

In terms of the unblurred image f(x, y), corresponding to no-motion conditions, the blurred image g(x, y) consists of a superposition of copies of f(x, y) shifted along the x axis by distance x′ = st:

$$g(x, y) = \int_0^T f(x - x', y)\, dt. \qquad (6.34)$$

Upon replacing dt with dx′/s and changing the upper integration limit to D = sT, we have

$$g(x, y) = \frac{1}{s} \int_0^D f(x - x', y)\, dx'. \qquad (6.35)$$

The spatial shift occurs along x only, and its length is D, which is equivalent to defining g(x, y) as the convolution of f(x, y) with a point spread function h(x, y) composed of a rectangle function of spatial duration D = sT and centered at D/2:

g(x, y) = f(x, y) ∗∗ h(x, y),     (6.36)

with

$$h(x, y) = \frac{1}{s}\, \operatorname{rect}\left(\frac{x - D/2}{D}\right) \delta(y). \qquad (6.37)$$

The spatial frequency response H(µ, ν) is the 2-D Fourier transform of h(x, y), which separates into two 1-D Fourier
transforms. Using entries #1a and #4 in Table 2-5 gives

$$H(\mu, \nu) = \mathcal{F}_{x \to \mu}\left\{\operatorname{rect}\left(\frac{x - D/2}{D}\right)\right\} \mathcal{F}_{y \to \nu}\left\{\frac{\delta(y)}{s}\right\} = \frac{D}{s}\, \operatorname{sinc}(\mu D)\, e^{-j\pi\mu D}. \qquad (6.38)$$

The convolution in the spatial domain given by Eq. (6.36) becomes a product in the spatial frequency domain:

G(µ, ν) = F(µ, ν) H(µ, ν).     (6.39)

To recover the unblurred image f(x, y), we need to:

(a) divide Eq. (6.39) by H(µ, ν) to obtain

$$F(\mu, \nu) = \frac{1}{H(\mu, \nu)}\, G(\mu, \nu), \qquad (6.40)$$

where G(µ, ν) is the spatial frequency spectrum of the blurred image, and then

(b) perform an inverse transform on F(µ, ν).

However, in view of the definition of the sinc function,

$$\operatorname{sinc}(\mu D) = \frac{\sin(\pi\mu D)}{\pi\mu D}, \qquad (6.41)$$

it follows that the spatial frequency response H(µ, ν) = 0 for integer values of µD. Consequently, the inverse filter 1/H(µ, ν) is undefined for nonzero integer values of µD, thereby requiring the use of regularization (Section 6-4.3).

6-6.2 Motion Blur after Sampling

To convert image representation from the continuous-space case of the previous subsection to the sampled-space case, we start by sampling unblurred (still) image f(x, y) and the motion-blurred image g(x, y) at x = n∆ and y = m∆:

f[n, m] = f(x = n∆, y = m∆),     (6.42a)
g[n, m] = g(x = n∆, y = m∆).     (6.42b)

We also discretize time t as

t = i∆t,     (6.43)

where ∆t is the time interval associated with the movement of the imager (relative to the scene) by a distance ∆:

$$\Delta_t = \frac{\Delta}{s}, \qquad (6.44)$$

where s is the relative speed of the imager (along the x axis). The total number of time shifts N that occur during the total recording time T is

$$N = \frac{T}{\Delta_t}. \qquad (6.45)$$

In terms of these new quantities, the discrete-case analogues to Eqs. (6.36) and (6.37) are

$$g[n, m] = \sum_{i=0}^{N} f[n - i, m]\, \Delta_t = f[n, m] \ast\ast h[n, m], \qquad (6.46)$$

where the discrete-space PSF h[n, m] is

$$h[n, m] = \operatorname{rect}\left(\frac{n - N/2}{N/2}\right) \delta[m]\, \Delta_t. \qquad (6.47)$$

The rectangle function is of duration (N + 1), extending from n = 0 to n = N, and centered at N/2. We assume that N is an even integer. As with the continuous-space case, the deblurring operation (to retrieve f[n, m] from g[n, m]) is performed in the spatial frequency domain, wherein the assumption that N is an even integer is not relevant, so the assumption is mathematically convenient, but not critical.

The spatial frequency domain analogue of Eq. (6.46) is

G(Ω1, Ω2) = F(Ω1, Ω2) H(Ω1, Ω2).     (6.48)

Here, G(Ω1, Ω2) is the 2-D spectrum of the recorded blurred image, and H(Ω1, Ω2) is the discrete-space spatial frequency (DSSF) response of h[n, m]. From entry #7 in Table 2-8, and noting that ∆t = T/N, the DSSF response function corresponding to Eq. (6.47) is given by

$$H(\Omega_1, \Omega_2) = \text{DSFT}\{h[n, m]\} = \frac{T}{N}\, \text{DTFT}_{n \to \Omega_1}\!\left\{\operatorname{rect}\left(\frac{n - N/2}{N/2}\right)\right\} \times \text{DTFT}_{m \to \Omega_2}\{\delta[m]\} = \frac{T}{N}\, \frac{\sin\!\left(\Omega_1 \frac{N+1}{2}\right)}{\sin(\Omega_1/2)}\, e^{-j\Omega_1 N/2}. \qquad (6.49)$$

The sinc function dictates that H(Ω1, Ω2) = 0 for Ω1(N + 1)/2 = kπ for nonzero integer values of k. Consequently, the inverse filter 1/H(Ω1, Ω2) is undefined at these values of Ω1, thereby requiring regularization. Regularization can be accomplished using the Wiener filter
(Section 6-4.4), which leads to

$$F(\Omega_1, \Omega_2) = \frac{G(\Omega_1, \Omega_2)\, H^*(\Omega_1, \Omega_2)}{|H(\Omega_1, \Omega_2)|^2 + \lambda^2} = \frac{G(\Omega_1, \Omega_2)\, \dfrac{T}{N}\, \dfrac{\sin\!\left(\Omega_1 \frac{N+1}{2}\right)}{\sin(\Omega_1/2)}\, e^{j\Omega_1 N/2}}{\left[\dfrac{T}{N}\, \dfrac{\sin\!\left(\Omega_1 \frac{N+1}{2}\right)}{\sin(\Omega_1/2)}\right]^2 + \lambda^2}. \qquad (6.50)$$

The image deblurring process is illustrated by the three images shown in Fig. 6-11.

(a) A (still) (225 × 225) image f[n, m] has been motion-blurred into a (225 × 275) image g[n, m]. The image is a simulated highway sign taken from a moving vehicle. The length (N + 1) of the blur is 51.

(b) To further distort the blurred image, noise was added to image g[n, m] to produce

g′[n, m] = g[n, m] + υ[n, m],     (6.51)

with υ[n, m] being a zero-mean random variable with a variance of 100. The consequent signal-to-noise ratio is 5.35 dB.

(c) Application of the Wiener filter recipe given by Eq. (6.50) and then inverting to the spatial domain to obtain f[n, m] leads to the reconstructed image shown in Fig. 6-11(c). The implementation involved the use of 2-D DFTs of order (275 × 275). Reconstructed images were generated for different values of λ (in Eq. (6.50)); the value of λ = 1 provided the best result visually.

Concept Question 6-7: How does one determine the length N + 1 of the PSF for motion blur from the spectrum of the blurred image?

Concept Question 6-8: How does one determine the value of λ to use in Tikhonov regularization?

Exercise 6-2: For motion blur with a PSF of length N + 1 = 60 in the direction of motion, at what spatial frequencies Ω will the spatial frequency response be zero?

Answer: From Eq. (6.50), the numerator of the spatial frequency response is zero when Ω = ±kπ/30 for any nonzero integer k.
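The complete simulate-then-deblur cycle can be sketched as follows. The image, blur length, DFT size, and λ are assumptions patterned on the example above rather than its actual implementation.

% A sketch of motion-blur simulation and Wiener deblurring, Eq. (6.50).
load letters;                              % provides X (256 x 256), assumed available
Nblur = 50; T = 1;                         % blur length N+1 = 51, as in Fig. 6-11
h = ones(1, Nblur+1) * (T/Nblur);          % horizontal rectangle PSF, Eq. (6.47)
g = conv2(X, h);                           % motion-blurred image, 256 x 306
M = 320; lambda = 1;                       % DFT size and regularization parameter (assumed)
H = fft2(h, M, M); G = fft2(g, M, M);
Fhat = G .* conj(H) ./ (abs(H).^2 + lambda^2); % Wiener filter, Eq. (6.50)
fhat = real(ifft2(Fhat));
imagesc(fhat(1:256, 1:256)), colormap(gray)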

Summary
Concepts
• Image restoration is about reconstructing an image from its blurred, noisy, or interference-corrupted version.
• Lowpass filtering reduces noise, but it blurs edges and fine-scale image features. Wavelet-based denoising (in Chapter 7) reduces noise while preserving edges.
• A Hamming-windowed PSF reduces "ringing" in the filtered image.
• Notch filtering reduces sinusoidal interference caused by AC interference.
• Motion blur deconvolution undoes the blur caused by camera motion.

Mathematical Formulae

1-D Hamming-windowed lowpass filter:
$$h_{\text{FIR}}[n] = \begin{cases} \dfrac{\Omega_c}{\pi}\, \operatorname{sinc}\!\left(\dfrac{\Omega_c n}{\pi}\right) \left[0.54 + 0.46\cos\!\left(\dfrac{\pi n}{N}\right)\right] & |n| \le N, \\ 0 & |n| > N \end{cases}$$

2-D Hamming-windowed lowpass filter:
hFIR[n, m] = hFIR[n] hFIR[m]

Deconvolution formulation:
g[n, m] = h[n, m] ∗∗ f[n, m] + v[n, m]

Deconvolution implementation by 2-D DFT:
$$F[k_1, k_2] = \frac{G[k_1, k_2]}{H[k_1, k_2]}$$

Tikhonov regularization criterion:
$$e = \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} \left[ (g[n, m] - h[n, m] \ast\ast \hat{f}[n, m])^2 + \lambda^2 \hat{f}[n, m]^2 \right]$$

Wiener filter:
$$\hat{F}[k_1, k_2] = G[k_1, k_2]\, \frac{H^*[k_1, k_2]}{|H[k_1, k_2]|^2 + \lambda^2}$$

Motion blur PSF (continuous space):
$$h(x, y) = \frac{1}{s}\, \operatorname{rect}\left(\frac{x - D/2}{D}\right) \delta(y)$$

Motion blur PSF (discrete space):
$$h[n, m] = \operatorname{rect}\left(\frac{n - N/2}{N/2}\right) \delta[m]\, \Delta_t$$

Important Terms: Provide definitions or explain the meaning of the following terms: deconvolution, Hamming window, motion blur, notch filter, Tikhonov criterion, Wiener filter.

PROBLEMS

Section 6-2: Denoising by Lowpass Filtering

6.1 The usual procedure for lowpass filtering an (N × N) image f[n, m] is to set its 2-D DFT F[k1, k2] = 0 for K ≤ k1, k2 ≤ N + 2 − K for some index K. Specify two problems with using this brickwall filtering approach.

6.2 Explain how a Hamming-windowed filter solves the problems in Problem 6.1.

6.3 This problem investigates denoising images by 2-D brickwall lowpass filtering. The program adds noise to the "clown" image, then 2-D brickwall lowpass filters it:

load clown.mat;
Y=X+0.2*randn(200,200);FY=fft2(Y);
FZ=FY;L=??;FZ(L:202-L,L:202-L)=0;
Z=real(ifft2(FZ));imagesc(Z),colormap(gray)

(a) Run this for L = 101, 20, and 10. Display the filtered images.
(b) Discuss the tradeoffs involved in varying the cutoff frequency.

6.4 This problem denoises by 2-D lowpass filtering with a separable 2-D lowpass filter h[m, n] = h[m] h[n], where h[n] is an FIR lowpass filter designed by windowing the impulse response of a brickwall lowpass filter, which suppresses "ringing."

(a) Design a 1-D lowpass filter h[n] of duration 31 by using a Hamming window on the impulse response of a brickwall lowpass filter with cutoff frequency Ω0 = π/3.
(b) Filter the "letters" image by 2-D convolution with h[m] h[n]. Display the result.
(c) Try varying the filter duration and cutoff frequency. See if this improves the result.

Section 6-3: Notch Filtering

6.5 We derive a system for performing notch filtering of the 2-D sinusoidal signal f[n, m] = cos((Ω1)0 n) cos((Ω2)0 m), which is to be eliminated from an image. Let the PSF be

$$h[n, m] = \begin{bmatrix} 1 & 0 & 1 \\ 0 & h[0, 0] & 0 \\ 1 & 0 & 1 \end{bmatrix}$$

where h[0, 0] = −4 cos((Ω1)0) cos((Ω2)0).
(a) Compute the spatial frequency response H(Ω1, Ω2) of this system.
(b) Show that it does indeed eliminate f[n, m].
(c) Specify a problem with using this LSI system as a notch filter.

6.6 Download file P66.mat. Use notch filtering to eliminate stripes in the clown image. Print out the striped image, its spectrum, and the notch-filtered image and its spectrum. Hint: Let x[n] have length N and its N-point DFT X[k] have a peak at k = k0. From Eq. (2.88), the peak represents a sinusoid with frequency Ω0 = 2πk0/N. The sinusoid will repeat about Ω/(2π)N = k0 times over the length N of x[n].

6.7 Download file P67.mat. Use notch filtering to eliminate stripes in the head. Print out the striped image, its spectrum, and the notch-filtered image and its spectrum. Hint: Let x[n] have length N and its N-point DFT X[k] have a peak at k = k0. From Eq. (2.88), the peak represents a sinusoid with frequency Ω0 = 2πk0/N. The sinusoid will repeat about Ω/(2π)N = k0 times over the length N of x[n].

6.8 Download file P68.mat. Use notch filtering to eliminate two sets of lines. Note that there are horizontal lines on top of the image, and vertical lines in the image. Use the procedure used in Section 6-2, but in both horizontal and vertical directions. Print out the original image, its spectrum, and the notch-filtered image and its spectrum.

6.9 Download file P69.mat. Use notch filtering to eliminate two sets of lines. Note that there are horizontal lines on top of the image, and vertical lines in the image. Use the procedure used in Section 6-2, but in both horizontal and vertical directions. Print out the original image, its spectrum, and the notch-filtered image and its spectrum.

Section 6-4: Image Deconvolution

6.10 Derive the Wiener filter by showing that the f̂[n, m] minimizing the Tikhonov functional

$$T = \sum_n \sum_m \left[ (g[n, m] - h[n, m] \ast\ast \hat{f}[n, m])^2 + \lambda^2 (\hat{f}[n, m])^2 \right]$$

has 2-D DFT

$$\hat{F}[k_1, k_2] = G[k_1, k_2]\, \frac{H[k_1, k_2]^*}{|H[k_1, k_2]|^2 + \lambda^2}.$$

Hints: Use Parseval's theorem and |a + b|² = aa∗ + ab∗ + ba∗ + bb∗. Add and subtract |GH∗|², divide by (HH∗ + λ²), and complete the square.

6.11 This is an introductory image deconvolution problem using a Wiener filter. A crude lowpass filter is equivalent to convolution with PSF

$$h[n, m] = \begin{cases} 1/L^2 & \text{for } 0 \le n, m \le L - 1, \\ 0 & \text{otherwise.} \end{cases}$$

This problem undoes this crude lowpass filter using a Wiener filter with λ = 0.01.

(a) Blur the clown image with h[n, m] for L = 11 using:
clear;load clown;
H=ones(11,11)/121;Y=conv2(X,H);

(b) Deblur the blurred image using a Wiener filter using:
imagesc(Y),colormap(gray);
FY=fft2(Y);FH=fft2(H,210,210);
FZ=FY.*conj(FH)./(abs(FH).*abs(FH)+.0001);
Z=real(ifft2(FZ));
figure,imagesc(Z),colormap(gray)

6.12 This is an introductory image deconvolution problem using a Wiener filter. A crude lowpass filter is equivalent to convolution with PSF

$$h[n, m] = \begin{cases} 1/L^2 & \text{for } 0 \le n, m \le L - 1, \\ 0 & \text{otherwise.} \end{cases}$$

This problem undoes this crude lowpass filter using a Wiener filter with λ = 0.01.

(a) Blur the letters image with h[n, m] for L = 15 using:
clear;load letters;H=ones(15,15)/225;
Y=conv2(X,H);

(b) Deblur the blurred image using a Wiener filter using:
imagesc(Y),colormap(gray);FY=fft2(Y);
FH=fft2(H,270,270);
FZ=FY.*conj(FH)./(abs(FH).*abs(FH)+.0001);
Z=real(ifft2(FZ));
figure,imagesc(Z),colormap(gray)

6.13 Deblurring due to an out-of-focus camera can be modelled crudely as a 2-D convolution with a disk-shaped point-spread function

$$h[n, m] = \begin{cases} 1 & \text{for } n^2 + m^2 < R^2, \\ 0 & \text{for } n^2 + m^2 > R^2. \end{cases}$$
This problem deblurs an out-of-focus image in the (unrealistic) absence of noise.

(a) Blur the letters image with an (approximate) disk PSF using:
H(25,25)=0;for I=1:25;for J=1:25;
if((I-13)*(I-13)+(J-13)*(J-13)<145);
H(I,J)=1;end;end;end;
load letters;Y=conv2(X,H);
subplot(221),imagesc(Y),colormap(gray)

(b) Deblur this out-of-focus image using the command:
Z=real(ifft2(fft2(Y)./fft2(H,280,280)));
subplot(222),imagesc(Z),colormap(gray)
Note that the size of the blurred image is 256 + 25 − 1 = 280.

(c) Explain why this approach will not work in the real world (i.e., in the presence of noise).

6.14 Repeat Problem 6.13, only now add noise to the blurred image:

(a) Add noise to the blurred image using:
Y=Y+100*randn(280,280);

(b) Deblur the image as in Problem 6.13. You should get noise!

(c) Deblur the image using a Wiener filter, using:
FH=fft2(H,280,280);
W=real(ifft2(fft2(Y).*conj(FH)./(abs(FH).*abs(FH)+10)));
subplot(221),imagesc(Z),colormap(gray)
subplot(222),imagesc(W),colormap(gray)
(Note the use of fft2, not fft, since Y is a 2-D image.)

6.15 Repeat Problem 6.13 using the clown image. Note that the size of the blurred image is now 200 + 25 − 1 = 224.

6.16 Repeat Problem 6.14 using the clown image. Note that the size of the blurred image is now 200 + 25 − 1 = 224. Add noise using Y=Y+randn(224,224); and use λ² = 100, since the clown pixel values have a maximum value of only 1, while the letters pixel values have a maximum value of 255.

Section 6-6: Motion Blur Deconvolution

6.17 The motion sensor in a Steadicam camera records its motion as being in increasing x and y at an angle θ from the horizontal (x) at speed r cm/s for T s for a diagonal distance of D = rT cm. Compute the spatial frequency response of the camera motion.

6.18 The motion sensor in a Steadicam camera records its motion as being:
(a) horizontal, in increasing x, at speed rx cm/s for Tx s,
(b) followed by, for an additional Ty − Tx s (for Tx < t < Ty),
(c) vertical, in increasing y, at speed ry cm/s for Ty − Tx s.
Compute the spatial frequency response of the camera motion.

6.19 Download file P619.mat. The goal is to deconvolve the motion blur.
(a) Compute and display the spectrum of the blurred image. What causes the vertical bands of zeros (VBZ)? Hint: See the numerator of Eq. (6.49).
(b) From the spacing between the VBZ, compute N in Eq. (6.45).
(c) Deconvolve the image using the Wiener filter Eq. (6.50). Let T = 1 and λ = 0.01.

6.20 Download file P620.mat. The goal is to deconvolve the motion blur.
(a) Compute and display the spectrum of the blurred image. What causes the vertical bands of zeros (VBZ)? Hint: See the numerator of Eq. (6.49).
(b) From the spacing between the VBZ, compute N in Eq. (6.45).
(c) Deconvolve the image using the Wiener filter Eq. (6.50). Let T = 1 and λ = 0.01.

6.21 Download file P621.mat. The goal is to deconvolve the motion blur. The blurred image in this problem is the SAR image from Chapter 4.
(a) Compute and display the spectrum of the blurred image. What causes the vertical bands of zeros (VBZ)? Hint: See the numerator of Eq. (6.49).
(b) From the spacing between the VBZ, compute N in Eq. (6.45).
(c) Deconvolve the image using the Wiener filter Eq. (6.50). Let T = 1 and λ = 0.01.
Chapter 7 Wavelets and Compressed Sensing

Contents
Overview, 203
7-1 Tree-Structured Filter Banks, 203
7-2 Expansion of Signals in Orthogonal Basis Functions, 206
7-3 Cyclic Convolution, 209
7-4 Haar Wavelet Transform, 213
7-5 Discrete-Time Wavelet Transforms, 218
7-6 Sparsification Using Wavelets of Piecewise-Polynomial Signals, 223
7-7 2-D Wavelet Transform, 228
7-8 Denoising by Thresholding and Shrinking, 232
7-9 Compressed Sensing, 236
7-10 Computing Solutions to Underdetermined Equations, 238
7-11 Landweber Algorithm, 241
7-12 Compressed Sensing Examples, 242
Problems, 251

Objectives
Learn to:
■ Compute wavelet transforms of 1-D signals and 2-D images.
■ Design discrete-time Daubechies wavelets of various orders.
■ Use wavelet transforms to compress signals and images.
■ Use thresholding and shrinkage of its wavelet transform to denoise an image.
■ Use compressed sensing to reconstruct an image from a reduced set of measurements.

(Chapter-opener figure omitted: (a) 200 × 200 clown image; (b) 2-D D3 Daubechies wavelet transform of the clown image.)

The wavelet transform is an important tool. Its applications in image processing include: denoising while preserving edges, compression, and compressed sensing, which is reconstruction of an image from a reduced set of linear measurements of it. After a review of the 1-D discrete-time Haar and Daubechies wavelet transforms, we present applications of them to denoising, compression, and compressed sensing, including image inpainting and X-ray tomography (CAT).
from a reduced set of measurements.
Overview 7-1 Tree-Structured Filter Banks
The wavelet transform is an important signal processing tool Wavelet transforms can be viewed as a generalization of tree-
for representing signals or images consisting mostly of slowly structured filter banks (TSFBs) and subband coding. A filter
varying regions, but containing a few fast-varying regions. Like bank is a set of bandpass filters connected in parallel; each
the discrete Fourier transform (DFT), it represents signals or bandpass filter passes a different range of frequencies. So the
images as a linear combination of basis functions. signal input into the filter bank is separated into different
One characteristic of the DFT is that images, even with only components, each of which consists of a different part of the
a few fast-varying segments such as edges, require high spatial spectrum of the input signal.
frequency complex exponentials to represent them. Hence, most Filter banks are used in audio signal processing. In human
or all of the DFT values X[k1 , k2 ] are nonzero. In contrast, the ba- hearing, some frequencies cannot be heard as well as others.
sis functions used in wavelet transforms are localized in time and Also a large component at one frequency can mask a component
frequency. This means that a signal that is mostly slowly-varying at another frequency. So it makes sense to keep only the
but has a few localized fast-varying regions requires only a few frequency bands that humans can hear. The basic idea behind
low-resolution basis functions to represent the slowly-varying coding of signals is to omit the frequency bands that contribute
regions, and a few high-resolution basis functions to represent little to the perception of the signal, and the process is called
just the localized fast-varying regions. Many of the wavelet subband coding. The mp3 coding of music uses this idea (and
transform values are thus zero (or near zero). This feature leads many others).
to the following three major applications of wavelet transforms: An efficient tree-like filter structure for separating a 1-D
• Compression of signals and images: they are represented sampled signal into different frequency bands (subband decom-
in the wavelet transform domain by many fewer numbers position) is shown in Fig. 7-1, in which g[n] is a lowpass filter
than in the original signal or image. The JPEG-2000 image with cutoff frequency Ωc = π /2 and h[n] is a highpass filter with
compression standard uses the wavelet transform. the same cutoff frequency. The concept is easily extendable to
2-D images.
• Compressed sensing of signals and images: since the signal Filter-bank diagrams may involve five types of operations:
or image in the wavelet transform domain requires many
fewer numbers to represent it, it can be reconstructed (1) Duplication:
from many fewer observations than would be required to
x[n]
reconstruct the original signal or image. An introduction to x[n]
compressed sensing is presented later in this chapter. x[n]

• Filtering of signals and images: since the signal or image (2) Addition:
in the wavelet transform domain requires many fewer num-
bers to represent it, thresholding small values of the wavelet y1[n]
transform of a noisy signal or image to zero reduces the
noise in the original signal or image. We will show that the z[n] = y1[n] + y2[n]
combination of thresholding and shrinkage gives results
far superior to using the 2-D DFT for noise reduction. y2[n]
After this Overview section, we present the Haar wavelet
transform, which is the simplest wavelet transform, and yet (3) Downsampling (decimation):
illustrates many features of the family of wavelet transforms.
We then present quadrature mirror filters (QMFs) and derive x[n] 2 yd [n] = x[2n]
the Smith-Barnwell condition for perfect reconstruction of
the original signal from its wavelet transform. We conclude Discarding every other sample in x[n].
our treatment of wavelets by deriving the Daubechies wavelet
function, which is the most commonly used wavelet function (4) Upsampling (zero-stuffing):
because it sparsifies many real-world signals. Finally, examples (
of image compression, denoising, and compressed sensing are x[n/2] for n even
x[n] 2 yu [n] =
provided. 0 for n odd

203
Figure 7-1 (diagram omitted) Tree-like filter structure for subband decomposition: the input x[n] is filtered by g[n] and h[n] and downsampled at each stage, producing xLDLD[n], xLDHD[n], xHDLD[n], and xHDHD[n]. The green boxes denote lowpass and highpass frequency filters, realized through cyclic convolution.

Inserting a zero between successive values of x[n].

(5) Convolution: y[n] = h[n] ∗ x[n].

In Fig. 7-1, the input signal x[n] with spectrum (DTFT) X(Ω) is separated into a low-frequency-band signal xL[n] whose spectrum is roughly

$$X_L(\Omega) = \begin{cases} X(\Omega) & \text{for } 0 \le |\Omega| < \pi/2, \\ 0 & \text{for } \pi/2 < |\Omega| \le \pi, \end{cases} \qquad (7.1)$$

and a high-frequency-band signal xH[n] whose spectrum is roughly

$$X_H(\Omega) = \begin{cases} 0 & \text{for } 0 \le |\Omega| < \pi/2, \\ X(\Omega) & \text{for } \pi/2 < |\Omega| \le \pi. \end{cases} \qquad (7.2)$$

Each signal can be downsampled by 2 without aliasing, resulting in xLD[n] = xL[2n] and xHD[n] = xH[2n]. There are now two different signals, each of which is sampled only half as often as x[n], so the total number of samples is unaltered, and each represents a different frequency band of the original signal.

This same decomposition can then be applied to each of the two downsampled signals xLD and xHD, which results in four signals, each of which is sampled only one fourth as often as x[n], so the total number of samples is the same, and each represents a different frequency band of bandwidth π/4. Repeating this decomposition N times, x[n] can be decomposed into 2^N signals, each of which represents a different frequency band of bandwidth π/2^N and is sampled only 1/2^N as often as x[n]. Use of N = 5, resulting in 2^5 = 32 subbands, is a common choice.

In Fig. 7-1:

• xLD[n] is the lowpass part, 0 ≤ |Ω| ≤ π/2, of x[n].
• xHD[n] is the highpass part, π/2 ≤ |Ω| ≤ π, of x[n].

The signals at the second stage of the filter bank have spectra that are roughly as follows:

• xLDLD[n] is the lowpass part, 0 ≤ |Ω| ≤ π/2, of xLD[n], which is equivalent to the lowpass part, 0 ≤ |Ω| ≤ π/4, of x[n].
• xLDHD[n] is the highpass part, π/2 ≤ |Ω| ≤ π, of xLD[n], which is equivalent to the bandpass part π/4 ≤ |Ω| ≤ π/2 of x[n].

If we were to extend the filter bank in Fig. 7-1 to another stage, the signals at the third stage would have spectra that are roughly as follows: xLDLDLD[n] is the lowpass part, 0 ≤ |Ω| ≤ π/2, of xLDLD[n], which is equivalent to the lowpass part, 0 ≤ |Ω| ≤ π/8, of x[n]; and xLDLDHD[n] is the highpass part, π/2 ≤ |Ω| ≤ π, of xLDLD[n], which is equivalent to the bandpass part π/8 ≤ |Ω| ≤ π/4 of x[n].

At each stage, decimation (halving the sampling rate) expands the spectrum of each signal to the full range 0 ≤ |Ω| < π (see Section 7-1.1).
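One stage of the decomposition in Fig. 7-1 can be sketched in a few lines of MATLAB. The Haar-like filter pair used here is an assumption for illustration; the g[n] and h[n] used by wavelet transforms are designed in Sections 7-4 and 7-5.

% A minimal sketch of one stage of subband decomposition (Fig. 7-1).
x = randn(1, 64);                 % any input signal
g = [1 1]/sqrt(2);                % lowpass filter g[n] (assumed Haar-like pair)
h = [1 -1]/sqrt(2);               % highpass filter h[n]
xL = conv(x, g); xH = conv(x, h); % filter into low and high frequency bands
xLD = xL(1:2:end);                % downsample by 2 (decimation)
xHD = xH(1:2:end);
% xLD and xHD each retain about half the samples, so the total
% sample count is essentially unaltered, as noted in the text.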
7-1.1 Octave-Based Filter Banks

The tree structure in Fig. 7-1 can be replaced with the simpler structure shown in Fig. 7-2. As we show later in Section 7-5, the wavelet transform is implemented using this simpler tree structure.

Figure 7-2 (diagram omitted) Octave-based filter bank structure for subband decomposition. Note that only the upper half (lowpass) of each stage is decomposed further.

This form of filter bank decomposes the spectrum X(Ω) of x[n] into octaves. An octave is a frequency band fmin < f < fmax in which fmax = 2 fmin, using either continuous-time frequency f in Hz or discrete-time frequency Ω. We use Ω in the sequel, since the signals are all in discrete time. The octave π/2^K ≤ Ω ≤ π/2^(K−1) is represented by a single signal. Since the width of this band is π/2^K, the sampling rate can be reduced by a factor of 2^K using decimation by 2 at each of K stages. The lowest band 0 ≤ Ω ≤ π/2^K is represented by a single signal, as shown in Fig. 7-3 for K = 2.

Figure 7-3 (diagram omitted) Partitioning of the signal spectrum by an octave-based filter bank for K = 2: the xLDLD[n] band occupies 0 to π/4, the xLDHD[n] band π/4 to π/2, and the xHD[n] band π/2 to π.

7-1.2 Reconstruction Using Octave-Based Filter Banks

The original signal x[n] can be recovered from its octave-based frequency decomposition using the reconstruction filter bank shown in Fig. 7-4. The inverse wavelet transform in Section 7-5 is implemented using this same structure. Each filter bank stage consists of zero-stuffing (inserting zeros between samples), denoted as

$$y_u[n] = \begin{cases} x[n/2] & \text{for } n \text{ even}, \\ 0 & \text{for } n \text{ odd}, \end{cases} \qquad (7.3)$$

which gives

yu[n] = { . . . , x[0], 0, x[1], 0, x[2], 0, x[3], . . . },

followed by interpolation using a filter g[−n] or h[−n] to replace the zeros introduced by zero-stuffing with the correct values of the signal input into that filter-bank stage. For octave-based filter banks, the time reversals are unnecessary, but time reversals are necessary when using this same structure for the inverse wavelet transform introduced later in Section 7-5. Zero-stuffing is presented in more detail in Subsection 7-4.3.
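The zero-stuffing and interpolation steps of a reconstruction stage can be sketched as follows; the lowpass filter is again an assumed Haar-like pair, chosen only for illustration.

% A sketch of zero-stuffing (Eq. (7.3)) followed by interpolation.
x = [1 2 3 4];
yu = zeros(1, 2*length(x));
yu(1:2:end) = x;                  % yu = {x[0], 0, x[1], 0, x[2], 0, x[3], 0}
g = [1 1]/sqrt(2);                % same (assumed) lowpass filter as the analysis stage
xr = conv(yu, fliplr(g));         % interpolation by g[-n] replaces the stuffed zeros
% (For this symmetric two-tap g, fliplr(g) = g, but the time reversal
% is written out because it matters for longer, asymmetric filters.)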
Figure 7-4 (diagram omitted) Octave-based filter bank structure for subband reconstruction: each band is upsampled by 2 and filtered by g[−n] or h[−n], and the results are summed to recover x[n].

7-1.3 Significance of Wavelets

The wavelet transform differs from this subband coding in that x[n] is not recursively decomposed explicitly into lower and higher frequency bands, although the decomposition is still roughly into lower and higher frequency bands. Instead, the lower frequency bands are replaced with signals that represent slowly-varying parts of x[n], and the higher frequency bands are replaced with signals that represent fast-varying parts of x[n]. The latter signals are mostly zero-valued if x[n] is slowly-varying most of the time.

◮ Real-world signals and images do tend to consist of mostly slowly-varying regions, containing a few localized regions in which they are fast-varying. Wavelets are good at representing such signals with wavelet transforms that are mostly zero-valued. ◭

To see why representing a signal using only a few wavelet transform components is useful, consider periodic signals. A periodic signal x(t) with period T0 and maximum frequency B (in Hz) can be represented in the frequency domain using only BT0 frequencies, since its spectrum consists of harmonics at frequencies k/T0 Hz for integers k. The maximum frequency B must equal N/T0 for some integer N, or equivalently, N = BT0 frequencies. Instead of storing x(t), we can generate it using BT0 sinusoidal generators, each of which requires only an amplitude and phase. So x(t) can be compressed into 2BT0 (plus a dc term, if present) numbers.

If noise had been added to x(t), most of the noise can be eliminated because any part of the spectrum of the noisy x(t) that is not at a harmonic k/T0 in Hz is noise and can be filtered out. We will perform similar actions on signals and images that are not periodic, but which have wavelet transforms that are mostly zero-valued. This includes many real-world signals and images.

Concept Question 7-1: Why is it useful to represent a signal or image using a few wavelet coefficients?

Exercise 7-1: (a) An input signal of duration N is fed into a tree-based filter bank of five stages. What is the combined total duration of the output signals? Repeat for (b) an octave-based filter bank.

Answer: (a) 2^5 N = 32N. (b) N, because lower-wavenumber bands can be sampled less often.

7-2 Expansion of Signals in Orthogonal Basis Functions

7-2.1 Signal Expansion

The general form of the expansion of a signal x[n] into a linear combination of basis functions {φk[n]} with coefficients {xk} (often called an orthogonal expansion of x[n]) is

$$x[n] = \sum_{k=1}^{\infty} x_k\, \phi_k[n]. \qquad (7.4)$$

The coefficients {xk} constitute the transform of x[n] using the basis functions {φk[n]}. For wavelet transforms, the basis functions {φk[n]} and coefficients {xk} are real-valued, but for many other orthogonal expansions, such as the DFT and Fourier series, they are complex-valued. We assume here that the signal x[n] to be expanded is real-valued.

Basis functions {φk[n]} are chosen to be orthogonal, which means that, for some constant C,

$$\sum_{n=-\infty}^{\infty} \phi_{k_1}[n]\, \phi^*_{k_2}[n] = C\, \delta[k_1 - k_2]. \qquad (7.5)$$
An important consequence of the orthogonality property is that coefficients xk can be computed from x[n] using

$$x_k = \frac{1}{C} \sum_{n=-\infty}^{\infty} x[n]\, \phi^*_k[n]. \qquad (7.6)$$

The other part of the significance is Rayleigh's theorem (Section 7-2.3).

7-2.2 Expansion Coefficients

Equation (7.6) can be derived as follows:

1. In Eq. (7.4), change index k to index k2, giving

$$x[n] = \sum_{k_2=1}^{\infty} x_{k_2}\, \phi_{k_2}[n]. \qquad (7.7)$$

2. Upon multiplying both sides of Eq. (7.7) by φ∗k1[n] and summing over index n, we have

$$\sum_{n=-\infty}^{\infty} x[n]\, \phi^*_{k_1}[n] = \sum_{n=-\infty}^{\infty} \sum_{k_2=1}^{\infty} x_{k_2}\, \phi_{k_2}[n]\, \phi^*_{k_1}[n]. \qquad (7.8)$$

3. Interchanging the order of summations in the right side of Eq. (7.8) leads to

$$\sum_{n=-\infty}^{\infty} x[n]\, \phi^*_{k_1}[n] = \sum_{k_2=1}^{\infty} x_{k_2} \sum_{n=-\infty}^{\infty} \phi_{k_2}[n]\, \phi^*_{k_1}[n]. \qquad (7.9)$$

4. Using the orthogonality property given by Eq. (7.5) gives

$$\sum_{n=-\infty}^{\infty} x[n]\, \phi^*_{k_1}[n] = \sum_{k_2=1}^{\infty} x_{k_2}\, C\, \delta[k_2 - k_1] = C\, x_{k_1}. \qquad (7.10)$$

Dividing by C and replacing k1 with k gives Eq. (7.6).

A good example of an orthogonal expansion is the 1-D DFT and its inverse defined in Eq. (2.89) for a finite-length signal x[n]:

$$X[k] = \sum_{n=0}^{M-1} x[n]\, e^{-j2\pi nk/N}, \qquad k = 0, \ldots, N-1, \qquad (7.11a)$$

$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, e^{j2\pi nk/N}, \qquad n = 0, \ldots, M-1. \qquad (7.11b)$$

The ranges in the summations in Eqs. (7.11) are particular to the DFT and differ from those in the generic definition of the basis function given by Eq. (7.4) and its inverse in Eq. (7.6). Table 7-1 compares attributes of the generic orthogonal expansion basis function with those of the DFT and the continuous-time Fourier series.

◮ If C = 1 in Eq. (7.6), the orthogonal basis functions are said to be orthonormal. The wavelet transform (Section 7-5) uses orthonormal basis functions. ◭

7-2.3 Parseval's Theorem and its Significance

For two continuous-time signals x(t) and y(t) and their associated Fourier transforms X(f) and Y(f), Parseval's theorem is stated in Eq. (2.27) as

$$\int_{-\infty}^{\infty} x(t)\, y^*(t)\, dt = \int_{-\infty}^{\infty} X(f)\, Y^*(f)\, df. \qquad (7.12a)$$

The special case wherein x(t) = y(t) is known as Rayleigh's theorem:

$$E = \int_{-\infty}^{\infty} |x(t)|^2\, dt = \int_{-\infty}^{\infty} |X(f)|^2\, df, \qquad (7.12b)$$

which states that the energies of x(t) and X(f) are equal.

Similarly, Rayleigh's theorem for a discrete-time signal x[n] is given by Eq. (2.80) as

$$\sum_{n=-\infty}^{\infty} |x[n]|^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} |X(\Omega)|^2\, d\Omega, \qquad (7.13)$$

where X(Ω) is the DTFT of x[n].

The statements given by Eqs. (7.12) and (7.13) can be generalized to the generic orthogonal basis function expressed in Eq. (7.4):

$$\sum_{n=-\infty}^{\infty} |x[n]|^2 = C \sum_{k=-\infty}^{\infty} |x_k|^2 \qquad \text{(Rayleigh's theorem)}, \qquad (7.14a)$$

$$\sum_{n=-\infty}^{\infty} x[n]\, y[n]^* = C \sum_{k=-\infty}^{\infty} x_k\, y_k^* \qquad \text{(Parseval's theorem)}, \qquad (7.14b)$$

where x[n] and y[n] are any two discrete-time functions. Our interest in this book is in real-valued 2-D images, so the complex conjugation in Eq. (7.14b) is irrelevant, but we have decided to retain it for the sake of completeness.
208 CHAPTER 7 WAVELETS AND COMPRESSED SENSING

Table 7-1 1-D DFT and Fourier series compared with generic orthogonal expansion function.

Generic Orthogonal Function DFT Fourier Series


Basis function φ k [n] e j2π kn/N e jnω0 t
∞ 1 N−1 ∞
Expansion x[n] = ∑ xk φk [n] x[n] = ∑ X[k] e j2π nk/N ,
N k=0
x(t) = ∑ xn e jnω0 t
k=1 n=−∞
n = 0, . . . , M − 1
∞ M−1 Z T0
1 1
Coefficients xk = ∑ x[n] φ ∗k [n] X[k] = ∑ x[n] e− j2π nk/N , xn = x(t) e− jnω0 t dt
C n=−∞ n=0 T0 0
k = 0, . . . , N − 1
∞ N−1 Z T0
1
Orthogonality property ∑ φ k1 [n] φ ∗k2 [n] = Cδ [k1 − k2 ]
N ∑ e j2π (m−n)k/N = δ [m − n] 0
e jmω0 t e− jnω0 t dt = T0 δ [m − n]
n=−∞ k=0

For an orthonormal set of basis functions with C = 1, Ray- equal to the energy of the transform coefficients { ε k }.
leigh’s theorem states that the energy of x[n], summed over all n,
is equal to the energy of coefficients { xk }, summed over all k.
The statement is equally applicable to small perturbations in
total energy. Consider, for example, a small perturbation ε [n]
from signal x[n]. The perturbed signal is

y[n] = x[n] + ε [n]. (7.15)

Signals x[n] and y[n] can each be expanded as:


∞ ◮ An orthonormal transformation does not amplify pertur-
x[n] = ∑ xk φk [n], (7.16a) bations, a property that will prove highly significant to the
k=1 application of wavelets for denoising images. ◭

y[n] = ∑ yk φk [n]. (7.16b)
k=1

Use of Eqs. (7.16a) and (7.16b) in Eq. (7.15) leads to



ε [n] = y[n] − x[n] = ∑ (yk − xk ) φk [n] = ∑ ε k φk [n], (7.17) Concept Question 7-2: What is the significance of Ray-
k=1 k=1 leigh’s theorem?
where
ε k = yk − xk . Exercise 7-2: A square wave x(t) has the Fourier series
expansion x(t) = ∑∞ 1
k=1 k sin(2π kt). If x(t) is passed through
Application of Rayleigh’s theorem, as stated by Eq. (7.14b), to a brick-wall lowpass filter with a cutoff frequency of 2.5 Hz.
sampled signal ε [n] leads to What is the ratio of the average power of the output signal
∞ ∞ to the average power of x(t)? Hint:
∑ |ε [n]|2 = ∑ |ε k |2 , (7.18)

n=∞ k=∞ 1 π2
∑ k2 = 6
.
which confirms that the energy of the perturbation ε [n] of x[n] is k=1
7-3 CYCLIC CONVOLUTION 209

and
Answer: By Rayleigh’s theorem, the average power of x(t) h[n] = { h[0], h[1], . . . , h[N2 ] }. (7.19b)
is Their linear convolution is

1 π2
∑ k2 = 6
, ∞
k=1 y[n] = h[n] ∗ x[n] = ∑ h[n] x[n − i]. (7.20)
and the average power of the output signal is i=∞

2 If x[n] has
1
∑ k2 = 1.25. support: Nxℓ ≤ n ≤ Nxu , and
k=1
duration: Nx = Nxu − Nxℓ + 1,
This is because the lowpass filter sets the Fourier series
coefficients for k ≥ 3 to zero. where second subscripts ℓ and u refer to the lower and upper
1.25 values of Nx , and if h[n] has
= 0.76.
π 2 /6 support: Nhℓ ≤ n ≤ Nhu , and (7.21a)
duration: Nh = Nhu − Nhℓ + 1, (7.21b)

7-3 Cyclic Convolution then their linear convolution y[n] has

7-3.1 Why Use Cyclic Convolutions? support: Nyℓ ≤ n ≤ Nyu , and (7.21c)
duration: Nyu − Nyℓ + 1 = Nx + Nh − 1, (7.21d)
In Section 2-7.2, we introduced the concept of cyclic convo-
lution x1 [n]
c x2 [n] between two signals x1 [n] and x2 [n], and where
we showed how it can be computed from the traditional linear
convolution x1 [n] ∗ x2 [n], as demonstrated in Example 2-6, or by Nyℓ = Nxℓ + Nhℓ, (7.21e)
applying the DFT method. Nyu = Nxu + Nhu . (7.21f)
The wavelet transform—the prime topic of this chapter—
employs convolution, decimation, and zero-stuffing. If we use Graphically, these supports and associated durations are:
linear convolutions in computing the wavelet transform, the
total length of the decimated signal will be longer than that of
Nx = Nxu − Nxl + 1
the original signal. At each stage in Fig. 7-1 or Fig. 7-2, for x[n]:
example, the linear convolution with h[n] or g[n] would generate Nxl Nxu
a new signal longer than that of the input signal by the length Nh = Nhu − Nhl + 1
of h[n] or g[n], respectively. The advantage of cyclic convolution h[n]:
is that the new signal remains at the same length as that of the Nhl Nhu
input signal. This property limits the computational storage to Ny = Nx + Nh − 1
the same storage required for the original signal. y[n]:
As noted, the cyclic convolution can be computed directly Nyl = Nxl + Nhl Nyu = Nxu + Nhu
from the linear convolution, or indirectly by applying the DFT
method. In preparation for the material presented in forthcoming
sections, we present reviews of both computational approaches.
A. Causal * Causal
7-3.2 Computing Linear Convolution If x[n] and h[n] are both causal signals, with Nxℓ = 0 and Nhℓ = 0,
Suppose we are given a causal signal x[n] and a causal filter (or then Eq. (7.20) simplifies to
another signal) h[n] defined as n
y[n] = ∑ h[i] x[n − i], 0 ≤ n ≤ Nyu . (7.22)
x[n] = { x[0], x[1], . . . , x[N1 ] }, (7.19a) i=0
210 CHAPTER 7 WAVELETS AND COMPRESSED SENSING

Nx = Nxu + 1 (2.106) and (2.107) as


x[n]:
0 Nxu yc [n] = h[n]
c x[n] = y[n] + y[(n)N ], (7.25)
Nh = Nhu + 1 n = 0, 1 . . . , N − 1,
h[n]:
0 Nhu
where y[n] is the linear convolution of x[n] and h[n], and y[(n)N ]
Ny = Nx + Nh − 1 is y[n] at values of n outside the range n = 0, 1, . . . , N − 1,
y[n]: with those values of n reduced mod(N); i.e., the remainders
0 Nxu + Nhu
after reducing n by the largest multiple of N without becoming
negative.
B. Anticausal * Causal
A. Causal * Causal
If x[n] is causal with Nxℓ = 0 and h[n] is anticausal with its upper
limit Nhu = 0, then Eq. (7.20) simplifies to If both h[n] and x[n] are causal, so that
n Nxd = Nhd = 0,
y[n] = ∑ h[i] x[n − i], Nhℓ ≤ n ≤ Nxu . (7.23)
i=Nhℓ then Eq. (7.25) simplifies to
Nx = Nxu + 1 yc [n] = y[n] + y[n + N], n = 0, 1, . . . , N − 1, (7.26)
x[n]:
0 Nxu where, by definition, y[n + N] = 0 for n > Nyu − N.
Nh = |Nhl| + 1 The values {y[N], y[N + 1], . . . , y[Nyu ]} of y[n] for n ≥ N
h[n]: get added point-by-point to the given values {y[0], y[1], . . .,
Nhl 0 y[Nyu − N]}. The process is illustrated graphically in Fig. 7-5(a).
Ny = Nx + Nh − 1 = Nxu + |Nhl| + 1 If N > Nyu , the cyclic convolution equals the linear convolu-
y[n]: tion.
Nyl = Nhl Nyu = Nxu
An easy way to compute the convolution given by Eq. (7.23) is B. Anticausal * Causal
to use the time-shift property of convolution (# 5 in Table 2-6):
If x[n] is causal and h[n] is anticausal, so that Nxℓ = 0 and
h[n] ∗ x[n] = y[n] Nhu = 0, then Eq. (7.25) simplifies to

yc [n] = y[n] + y[n − N], n = 0, 1, . . . , N − 1, (7.27)


h[n − n1] ∗ x[n − n2] = h[n − n1 − n2], (7.24)
where we set y[n − N] = 0 for n < Nyℓ + N and we assume
for any two integers n1 and n2 . The procedure involves the N ≥ Nyℓ . If N < Nyℓ , we must also add the values of {y[n],
following steps: N ≤ n ≤ Nyℓ } to y[n] as described in the causal*causal subsec-
(1) Keeping in mind that Nhℓ is a negative integer, delay tion. This situation seldom happens.
the anticausal signal h[n] by |Nhℓ | to obtain the causal signal The anticausal values {y[Nyℓ ], . . . , y[−1]} of y[n] for n < 0
h[n − |Nhℓ|]. get added point-by-point to the given values {y[N − |Nyℓ |], . . . ,
(2) Compute h[n − |Nhℓ |] ∗ x[n] using Eq. (7.22), the convolu- y[N − 1]}. The process is illustrated graphically in Fig. 7-5(b).
tion expression for two causal signals.
(3) Advance h[n − |Nhℓ|] ∗ x[n] by |Nhℓ |.
Example 7-1: Cyclic Convolution

7-3.3 Computing Cyclic Convolution


Given
The cyclic convolution yc [n] of order N ≥ Nx , Nh of signals x[n]
and h[n] as specified by Eq. (7.19), was defined in Eqs. (2.104), x[n] = {3, 4, 5, 6, 7, 8},
7-3 CYCLIC CONVOLUTION 211

First N terms Additional terms if N < Nyu

Linear y[n] = y[0] y[1] ... y[ ] ... y[N − 1] y[N] y[N + 1] ... y[Nyu]
+ + +

Cyclic yc[n] = y[0] + y[N] y[1] + y[N + 1] ... y[N − 1]

(a) Causal * causal

Anticausal terms Causal terms

Linear y[n] = y[Nyl] ... y[−2] y[−1] y[0] ... y[ ] ... y[Nyu − 1] y[Nyu]
+ + +

Causal yc[n] = y[0] y[1] ... y[Nyu − 1] + y[−2] y[Nyu] + y[−1]

(b) Anticausal * causal

Figure 7-5 Graphical representation of obtaining cyclic convolution of order N from linear convolution.

h1 [n] = {1, 2, 3}, yc1 [n] = h1 [n] c x[n]


h2 [n] = h1 [−n] = {3, 2, 1}, = y1 [n] + y1[n + N]
= {3 + 37, 10 + 24, 22, 28, 34, 40}
and N = 6, compute (a) yc1 = h1 [n]
c x[n] and (b)
yc2 = h2 [n]
c x[n]. = {40, 34, 22, 28, 34, 40}.

The same result can be obtained by computing the cyclic


convolution with the DFT (see Eq. (2.104)):
Solution: (a)

y1 [n] = h1 [n] ∗ x[n]




 1×3 = 3 for n = 0,

 ifft(fft([3,4,5,6,7,8]).*fft([1,2,3],6))
1 × 4 + 2 × 3 = 10 for n = 1,
= 1 × 5 + 2 × 4 + 3 × 3 = 22 for n = 2




 ...

= {3, 10, 22, 28, 34, 40, 37, 24},


212 CHAPTER 7 WAVELETS AND COMPRESSED SENSING

(b) which is equivalent to multiplying x[n] by 21 (1 + (−1)n):

y2 [n] = h2 [n] ∗ x[n] 1


 x[n] 2 2 (1 + (−1)n) x[n]. (7.30)
 3×3 = 9 for n = −2, 2



3 × 4 + 2 × 3 = 18 for n = −1,
= 3 × 5 + 2 × 4 + 1 × 3 = 26 for n = 0,
7-3.5 A Useful Convolution Relation




 ... Recall from Eq. (7.20) that the discrete-time linear convolution
is defined as
= {9, 18, 26, 32, 38, 44, 23, 8}. ∞

This is {3, 2, 1} ∗ {3, 4, 5, 6, 7, 8} advanced in time by 2.


y[n] = h[n] ∗ x[n] = ∑ h[i] x[n − i]. (7.31)
i=−∞

yc2 [n] = h2 [n] x[n]


c Let us consider the modified functions:
= y1 [n] + y1[n − N]
h′ [n] = (−1)n h[n], (7.32a)
= {26, 32, 38, 44, 23 + 9, 8 + 18}
′ n
= {26, 32, 38, 44, 32, 26}. x [n] = (−1) x[n]. (7.32b)

Again, the same result can be obtained by computing the cyclic The linear convolution of the modified functions is
convolution with the DFT method: ∞
y′ [n] = h′ [n] ∗ x′ [n] = ∑ h′ [i] x′ [n − i]
ifft(fft([3,4,5,6,7,8]).*fft([1,0,0,0,3,2])) i=−∞

= ∑ (−1)i h[i] (−1)n−i x[n − i]
i=−∞
7-3.4 Decimating and Zero-Stuffing ∞
= ∑ (−1)n h[i] x[n − i]
Decimating 1-D signals by 2, and zero-stuffing 1-D signals i=−∞
by 2, are essential parts of computing discrete-time wavelet ∞
transforms. Hence, we present this quick review of these two = (−1)n ∑ h[i] x[n − i]
i=−∞
concepts.
Decimating a signal x[n] by two means deleting every other = (−1)n y[n]. (7.33)
value of x[n]:
Combining the result given by Eq. (7.33) with the definition of
convolution given in Eq. (2.71a) leads to the conclusion:
x[n] 2 yd [n] = x[2n] = { . . . , x[0], x[2], x[4], . . . }.
(7.28)
Zero-stuffing a signal x[n] by two means inserting zeros between (−1)n { h[n] ∗ x[n] } = { (−1)n h[n] } ∗ { (−1)n x[n] }. (7.34)
successive values of x[n]:
(
x[n/2] for n even, 7-3.6 Wavelet Applications
x[n] 2 yu [n] =
0 for n odd, For applications involving wavelets, the following conditions
= { . . . , x[0], 0, x[1], 0, x[2], 0, x[3], . . . }. apply:
(7.29) • Batch processing is used almost exclusively. This is be-
cause the entire original signal is known before processing
Decimating by 2, followed by zero-stuffing by 2, replaces x[n] begins, so the use of non-causal filters is not a problem.
with zeros for odd times n:
• The order N of the cyclic convolution is the same as the
x[n] 2 2 { . . . , x[0], 0, x[2], 0, x[4], 0, x[6], . . . }, duration of the signal x[n], which usually is very large.
7-4 HAAR WAVELET TRANSFORM 213

• Filtering x[n] with a filter h[n] will henceforth mean com- 7-4 Haar Wavelet Transform
puting the cyclic convolution h[n] c x[n]. The duration L
of filter h[n] is much smaller than N (L ≪ N), so h[n] gets The Haar transform is by far the simplest wavelet transform,
zero-padded (see Section 2-7.3) with (N − L) zeros. The and yet it illustrates many of the concepts of how the wavelet
result of the cyclic convolution is the same as h[n] ∗ x[n], transform works.
except for the first (L − 1) values, which are aliased, and
the final (L − 1) values, which are no longer present, but
added to the first (L − 1) values.
7-4.1 Single-Stage Decomposition
Consider the finite-duration signal x[n]
• Filtering x[n] with the non-causal filter h[−n] gives the
same result as the linear convolution h[−n] ∗ x[n], except x[n] = { a, b, c, d, e, f , g, h }. (7.35)
that the non-causal part of the latter will alias the final
(L − 1) places of the cyclic convolution. Define the lowpass and highpass filters with impulse responses
ghaar [n] and hhaar [n], respectively, as
• Zero-padding does not increase the computation, since
multiplication by zero is known to give zero, so it need not 1
ghaar [n] = √ { 1, 1 }, (7.36a)
be computed. 2
1
• For two filters g[n] and h[n], both of length L, g[n]
c h[n] hhaar [n] = √ { 1, −1 }. (7.36b)
2
consists of g[n] ∗ h[n] followed by N − (2L − 1) zeros.
The frequency responses of these filters are the DTFTs given by
• As long as the final result has length N, linear convolutions  
may be replaced with cyclic convolutions and the final 1 − jΩ
√ Ω − jΩ/2
Ghaar (Ω) = √ (1 + e ) = 2 cos e , (7.37a)
result will be the same. 2 2
 
1 √ Ω
Hhaar (Ω) = √ (1 − e− jΩ) = 2 sin je− jΩ/2 , (7.37b)
2 2
Exercise 7-3: Compute the cyclic convolution of {1, 2} and
{3, 4} for N = 2. which have lowpass and highpass frequency responses, respec-
Answer: {1, 2} ∗ {3, 4} = {3, 10, 8}. Aliasing the output tively (Fig. 7-6).
gives {8 + 3, 10} = {11, 10}. Define the average (lowpass) signal xL [n] as

xL [n] = x[n]
c ghaar [n] (7.38a)
Exercise 7-4: x[n] 2 2 y[n]. 1
= √ {a + h, b + a, c + b, d + c, e + d . . .}
Express y[n] in terms of x[n]. 2

Answer: y[n] = x[n]. Zero stuffing inserts zeros between


consecutive values of x[n] and decimation removes those 1.414
zeros, thereby restoring the original x[n]. 1.2
1.0
|H(Ω)|
0.8
Exercise 7-5: If y[n] = h[n] ∗ x[n], what is 0.6 |G(Ω)|
0.4
h[n](−1)n ∗ x[n](−1)n 0.2
0
0 0.5 1 1.5 2 2.5 3
Ω
in terms of y[n]?
Answer: y[n](−1)n . See Eq. (7.34). Figure 7-6 |G(Ω)| (in blue) and |H(Ω)| (in red) for the Haar
wavelet transform. This (quadrature-mirror filters) QMF pair
has symmetry about the Ω = π /2 axis.
214 CHAPTER 7 WAVELETS AND COMPRESSED SENSING

and the detail (highpass) signal xH [n] Note that downsampling by 2 followed by upsampling by 2
replaces values of x[n] with zeros for odd times n.
xH [n] = x[n]
c hhaar [n] (7.38b) 2. Next, filter xLDU [n] and xHDU [n] with filters ghaar [−n] and
1 hhaar [−n], respectively.
= √ {a − h, b − a, c − b, d − c, e − d . . .}. As noted earlier in Section 7-1.3, the term “filter” in the
2
context of the wavelet transform means “cyclic convolution.”
Next, define the downsampled average signal xLD [n] as Filters ghaar [n] and hhaar [n] are called analysis filters, because
they are used to compute the Haar wavelet transform. Their time
1 reversals ghaar [−n] and hhaar [−n] are called synthesis filters,
xLD [n] = xL [2n] = √ {a + h, c + b, e + d, g + f } (7.39a)
2 because they are used to compute the inverse Haar wavelet
transform (that is, to reconstruct the signal from its Haar wavelet
and the downsampled detail signal xHD [n] as transform). The reason for using time reversals here is explained
below.
1
xHD [n] = xH [2n] = √ {a − h, c − b, e − d, g − f }. (7.39b) The cyclic convolutions of xLDU [n] with ghaar [−n] and
2 xHDU [n] with hhaar [−n] yield
The signal x[n] of duration 8 has been replaced by the two 1
signals xLD [n] and xHD [n], each of durations 4, so no information xLDU [n]
c ghaar [−n] = {a+h, c+b, c+b, . . . , a+h},
2
about x[n] has been lost. We use cyclic convolutions instead (7.41a)
of linear convolutions so that the cumulative length of the
1
downsampled signals equals the length of the original signal. c hhaar [−n] = {a − h, b − c, c − b, . . ., h − a}.
xHDU [n]
Using linear convolutions, each convolution with ghaar [n] or 2
(7.41b)
hhaar [n] would lengthen the signal unnecessarily. As we shall
see, using cyclic convolutions instead of linear convolutions is
sufficient to recover the original signal from its Haar wavelet 3. Adding the outcomes of the two cyclic convolutions gives
transform. x[n]:

x[n] = xLDU [n]


c ghaar [−n] + xHDU [n]
c hhaar [−n]. (7.42)
7-4.2 Single-Stage Reconstruction
It is still not evident why this is worth doing. The following
The single-stage Haar decomposition and reconstruction pro- example provides a partial answer.
cesses are depicted in Fig. 7-7. The signal x[n] can be recon- Consider the finite-duration (N = 16) signal x[n]:
structed from xLD [n] and xHD [n] as follows: 

 4 for 0 ≤ n ≤ 4
1. Define the upsampled (zero-stuffed) signal xLDU [n] as 
1
( for 5 ≤ n ≤ 9
x[n] = (7.43)
xLD [n/2] for n even 
 3 for 10 ≤ n ≤ 14
xLDU [n] = 

0 for n odd 4 for n = 15.
1 The Haar-transformed signals are
= √ {a + h, 0, c + b, 0, e + d, 0, g + f , 0},
2
(7.40a) 1
xLD [n] = xL [2n] = √ {8, 8, 8, 2, 2, 4, 6, 6},
2
and the upsampled (zero-stuffed) signal xHDU [n] as (7.44)
1
( xHD [n] = xH [2n] = √ {0, 0, 0, 0, 0, 2, 0, 0}.
2
xHD [n/2] for n even
xHDU [n] =
0 for n odd These can be derived as follows. We have
1 x[n] = {4, 4, 4, 4, 4, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 4},
= √ {a − h, 0, c − b, 0, e − d, 0, g − f , 0}.
2
(7.40b)
7-4 HAAR WAVELET TRANSFORM 215

xL[n] xLD[n] xLDU[n]


ghaar[n] 2 2 ghaar[−n]

x[n] x[n]

xH[n] xHD[n] xHDU[n]


hhaar[n] 2 2 hhaar[−n]

Decomposition Reconstruction

Figure 7-7 Single-stage Haar decomposition and reconstruction of x[n].

1 1 1
xL [n] = x[n] c √ {1, 1} c √ {1, 1} + xHDU [n]
xLDU [n] c √ {−1, 1}
2 2 2
1 = {4, 4, 4, 4, 4, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 4} = x[n].
= √ {8, 8, 8, 8, 8, 5, 2, 2, 2, 2, 4, 6, 6, 6, 6, 7},
2
1 We observe that the outcome of the second cyclic convolution
xH [n] = x[n] c √ {1, −1} is sparse (mostly zero-valued). The Haar transform allows x[n],
2 which has duration 16, to be represented using the eight values
1 of xLD [n] and the single nonzero value (and its location n = 5)
= √ {0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 2, 0, 0, 0, 0, 1},
2 of xHD [n]. This saves almost half of the storage required for x[n].
1 Hence, x[n] has been compressed by 43%.
xLD [n] = xL [2n] = √ {8, 8, 8, 2, 2, 4, 6, 6}, Even though x[n] is not sparse, it was transformed, using
2
the Haar transform, into a sparse representation with the same
1
xHD [n] = xH [2n] = √ {0, 0, 0, 0, 0, 2, 0, 0}. number of samples, meaning that most of the values of the Haar-
2 transformed signal are zero-valued. This reduces the amount of
The original signal x[n] can be recovered from xLD [n] and xHD [n] memory required to store x[n], because only the times at which
by nonzero values occur (as well as the values themselves) need
be stored. The few bits (0 or 1) required to store locations
1 of nonzero values are considered to be negligible in number
xLDU [n] = √ {8, 0, 8, 0, 8, 0, 2, 0, 2, 0, 4, 0, 6, 0, 6, 0}, compared with the many bits required to store the actual nonzero
2
values. Since the Haar transform is orthogonal, x[n] can be
1
xHDU [n] = √ {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0}, recovered perfectly from its Haar-transformed values.
2
1
c √ {1, 1}
xLDU [n]
2 7-4.3 Multistage Decomposition and
= {4, 4, 4, 4, 4, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4}, Reconstruction
1
c √ {−1, 1}
xHDU [n] In the simple example used in the preceding subsection, only 1
2 element of the Haar-transformed signal xHD [n] is nonzero, but all
= {0, 0, 0, 0, 0, 0, 0, 0, 0, −1, 1, 0, 0, 0, 0, 0}, 8 elements of xLD [n] are nonzero. We can reduce the number of
nonzero elements of xLD [n] by applying a second Haar transform
stage to it. That is, xLD [n] can be transformed into the two signals
xLDLD [n] and xLDHD [n] by applying the steps outlined in Fig. 7-8.
216 CHAPTER 7 WAVELETS AND COMPRESSED SENSING

xLDL[n]
ghaar[n] 2 xLDLD[n]
xL[n] xLD[n]
ghaar[n] 2
xLDH[n]
hhaar[n] 2 xLDHD[n]
x[n]

xH[n] xHD[n]
hhaar[n] 2

Figure 7-8 Two-stage Haar analysis filter bank. Note that only the upper half of the first stage is decomposed further.

Thus, Decomposition
1 A signal x[n] of duration N = 2K (with K an integer) can be
xLD [n] = √ {8, 8, 8, 2, 2, 4, 6, 6},
2 represented by the Haar wavelet transform through a K-stage
1 decomposition process involving cyclic convolutions with filters
xLDL [n] = xLD [n] c √ {1, 1} ghaar [n] and hhaar [n], as defined by Eq. (7.36). The signal x[n]
2 can be zero-padded so that its length is a power of 2, if that is
= {7, 8, 8, 5, 2, 3, 5, 6}, not already the case, just as is done for the FFT. The sequential
(7.45)
1 process is:
xLDH [n] = xLD [n] c √ {1, −1}
2
= {1, 0, 0, −3, 0, 1, 1, 0}, Stage 1:
xLDLD [n] = xLDL [2n] = {7, 8, 2, 5},
xLDHD [n] = xLDH [2n] = {1, 0, 0, 1}. x[n] hhaar[n] 2 xe1 [n] = xHD [n],
Signal xLDHD [n] is again sparse: only two of its four values are
x[n] ghaar[n] 2 Xe1 [n] = xLD [n].
nonzero. So x[n] can now be represented by the four values
of xLDLD [n], the two nonzero values of xLDHD [n], and the one
nonzero value of xHD [n]. This reduces the storage required for Stage 2:
x[n] by 57%.
The average signal xLDLD [n] can in turn be decomposed even
further. The result is an analysis filter bank that computes Xe1 [n] hhaar[n] 2 xe2 [n] = xLDHD [n],
the Haar wavelet transform of x[n]. This analysis filter bank
consists of a series of sections like the left half of Fig. 7-7, Xe1 [n] ghaar[n] 2 Xe2 [n] = xLDLD [n].
connected as in Fig. 7-8, except that each average signal is
decomposed further. The signals computed at the right end of ..
.
this analysis filter bank constitute the Haar wavelet transform of
x[n]. Reconstruction of x[n] is shown in Fig. 7-9.
Stage K:

XeK−1 [n] hhaar[n] 2 xeK [n],


7-4.4 Haar Wavelet Transform Filter Banks
XeK−1 [n] ghaar[n] 2 XeK [n].
7-4 HAAR WAVELET TRANSFORM 217

xLDLD[n] 2 ghaar[−n]

xLD[n]
2 ghaar[−n]

xLDHD[n] 2 hhaar[−n] x[n]

xHD[n] 2 hhaar[−n]

Figure 7-9 Reconstruction by a two-stage Haar synthesis filter bank.

The Haar transform of x[n] consists of the combination of Stage 1:


K + 1 signals:

{ xe1 [n], xe2 [n], xe3 [n], . . . , xeK [n], XeK [n] }. (7.46) XeK [n] 2 ghaar [−n] AK−1 [n],
|{z} |{z} |{z} | {z } | {z }
Duration N/2 N/4 N/8 N/2K N/2K

To represent x[n], we need to retain the “high-frequency” outputs xeK [n] 2 hhaar [−n] BK−1 [n],
of all K stages (i.e., { xe1 [n], xe2 [n], . . . , xeK [n] }), but only the final
output of the “low-frequency” sequence, namely XeK [n]. The XeK−1 [n] = AK−1 [n] + BK−1[n].
total duration of all of the (K + 1) Haar transform signals is
N N N N N Stage 2:
+ + + · · · + K + K = N, (7.47)
2 4 8 2 2
which equals the duration N of x[n]. We use cyclic convolutions XeK−1 [n] 2 ghaar [−n] AK−2 [n],
instead of linear convolutions so that the total lengths of the
downsampled signals equals the length of the original signal.
Were we to use linear convolutions, each convolution with xeK−1 [n] 2 hhaar [−n] BK−2 [n],
ghaar [n] or hhaar [n] would lengthen the signal unnecessarily.
XeK−2 [n] = AK−2 [n] + BK−2[n].
• The xek [n] for k = 1, 2, . . . , K are called detail signals. ..
.

• The XeK [n] is called the average signal.


Stage K:

Reconstruction Xe1 [n] 2 ghaar [−n] A0 [n],


The inverse Haar wavelet transform can be computed in reverse
order, starting with { XeK [n], xeK [n], xeK−1 [n], . . . }. xe1 [n] 2 hhaar [−n] B0 [n],

x[n] = A0 [n] + B0[n].


218 CHAPTER 7 WAVELETS AND COMPRESSED SENSING

7-4.5 Haar Wavelet Transform in the Frequency


Domain ~
X3[n] x˜ 3[n] x˜ 2[n] x˜ 1[n]
As noted in Section 7-1, the original objective of subband band band band band

coding was to decompose a signal into different frequency 0 π/8 π/4 π/2 π
bands. Wavelets are more general than simple subbanding in
that decomposition of the spectrum of the input into different Figure 7-10 Approximate frequency-band coverage by com-
bands is not the explicit purpose of the wavelet transform. ponents of the Haar wavelet transform for K = 3.
Nevertheless, the decomposition does approximately allocate
the signal into different frequency bands, so it is helpful to track
and understand the approximate decomposition.
Recall from Eq. (7.37) that Ghaar (Ω) is approximately a Concept Question 7-3: Why do we use cyclic convolu-
lowpass filter and Hhaar (Ω) is approximately a highpass filter. tions instead of linear convolutions?
Hence, at the output of the first stage of the Haar wavelet
transform:
√ 7-6: Compute the one-stage Haar transform of
Exercise
• Xe1 [n] is the lowpass-frequency part of x[n], covering the x[n] = 2{4, 3, 3, 4}.
approximate range 0 ≤ |Ω| ≤ π /2, and
Answer:
• xe1 [n] is the highpass-frequency part of x[n], covering the x[n]
c √1 [1, 1] = {8, 7, 6, 7}. Hence, xLD [n] = {8, 6}.
2
approximate range π /2 ≤ |Ω| ≤ π . x[n]
c √1 [1, −1] = {0, −1, 0, 1}. Hence, xHD [n] = {0, 0}.
2
At each stage, downsampling expands the spectrum of each
signal to the full range 0 ≤ |Ω| ≤ π . Hence, at the output of
the second stage,
7-5 Discrete-Time Wavelet Transforms
• Xe2 [n] covers the frequency range 0 ≤ |Ω| ≤ π /2 of Xe1 [n],
which corresponds to the range 0 ≤ |Ω| ≤ π /4 of the The two Haar wavelet filters hhaar [n] and ghaar [n] introduced
spectrum of x[n]. in the preceding section and defined by Eq. (7.36) are very
simple in structure and short in duration, and yet they provide a
• xe2 [n] covers the frequency range π /2 ≤ |Ω| ≤ π of Xe1 [n], powerful tool for decomposing signals (so they may be stored or
which corresponds to the range π /4 ≤ |Ω| ≤ π /2 of the transmitted more efficiently) and reconstructing them later. We
spectrum of x[n]. now extend our presentation to the general case of wavelet trans-
The Haar wavelet transform decomposes the spectrum of X(Ω) form filters h[n] and g[n], which also use the octave-based filter-
into octaves, with bank decomposition structure shown in Fig. 7-2 and the octave-
based filter-bank reconstruction structure shown in Fig. 7-4,
π π but the main difference is that g[n] and h[n] are no longer
Octave k
≤ |Ω| ≤ k−1 represented by xek [n].
2 2 explicitly lowpass and highpass filters, although their frequency
responses are still approximately lowpass and highpass. Now
Since the width of this band is π /2k , the sampling rate can be h[n] is designed to sparsify (make mostly zero-valued) all of
reduced by a factor of 2k using downsampling. the octave-based filter bank outputs except the lowest frequency
For a signal of duration N = 2K , the Haar wavelet transform band output. Filters g[n] and h[n] are now called the scaling and
decomposes x[n] into (K + 1) components, namely XeK [n] and wavelet functions, respectively.
{ xe1 [n], xe2 [n], . . . , xeK [n] }, with each component representing a In this and future sections of this chapter, we examine the
frequency octave. The spectrum decomposition is illustrated structure, the properties, and some of the applications of the
in Fig. 7-10 for K = 3. Because the different components wavelet transform. We start by addressing the following two
cover different octaves, they can be sampled at different rates. questions:
Furthermore, if the signal or image consists of slowly varying
segments with occasional fast-varying features, as many real- (1) What conditions must g[n] and h[n] satisfy so that the
world signals and images do, then xek [n] for small k will be output of the octave-based reconstruction filter bank is the
sparse, requiring few samples to represent it. same as the input to the octave-based decomposition filter
7-5 DISCRETE-TIME WAVELET TRANSFORMS 219

bank? and

(2) How should h[n] be chosen so that the outputs of all of the (2)
octave-based decomposition filter banks are sparse, except
for the output of the lowest frequency band? (−1)n g[n] ∗ g[−n] + (−1)n h[n] ∗ h[−n] = 0. (7.51b)

Next, we examine how to relate g[n] to h[n] so as to satisfy the


7-5.1 Conditions for Filter Bank Perfect two parts of Eq. (7.51).
Reconstruction
We now derive more general conditions on g[n] and h[n] so 7-5.2 Quadrature Mirror Filters
that the output of the octave-based reconstruction filter bank is
identical to the input of the octave-based decomposition filter There are many possible solutions that satisfy Eq. (7.51). A
bank. If this occurs at each stage, then it will occur at all stages. sensible approach is to find a pair of filters g[n] and h[n] that au-
Accordingly we consider the single-stage decomposition and tomatically satisfy Eq. (7.51b), and then use them in Eq. (7.51a).
reconstruction shown in Fig. 7-11. One such pair is the quadrature mirror filter (QMF), which is
To determine the necessary conditions that g[n] and h[n] based on the QMF relation (where L is an odd integer):
should satisfy in order for the input/output relationship in
Fig. 7-11 to be true, we replicate Fig. 7-11 in equation form, g[n] = −(−1)n h[L − n] = (−1)n−L h[L − n]. (7.52a)
but we also replace the combined downsampling/upsampling
operations in the figure with the equivalent functional form Replacing n with −n implies:
given by Eq. (7.30):
g[−n] = −(−1)−n h[L + n] = (−1)−(n+L) h[L + n], (7.52b)
  
1 + (−1)n
(x[n] ∗ g[n]) ∗ g[−n] where h[L − n] is h[−n] shifted in time by L, with L chosen to
2
   be an odd integer and its length is such as to make g[n] causal.
1 + (−1)n Thus, if h[n] and g[n] are each of length N, L should be equal to
+ (x[n] ∗ h[n]) ∗ h[−n] = x[n]. (7.48)
2 or greater than (N − 1).
Using Eq. (7.52), the first term in Eq. (7.51b) becomes
Expanding Eq. (7.48) and using the convolution relation given
by Eq. (7.34) gives (−1)n g[n] ∗ g[−n] = (−1)2n−L h[L − n] ∗ (−1)−(n+L) h[L + n].
x[n] ∗ g[n] ∗ g[−n] In view of the time-shift property (#5 in Table 2-6)
+ {(−1)n x[n]} ∗ {(−1)n g[n]} ∗ g[−n]
+ x[n] ∗ h[n] ∗ h[−n] x1 [n − L] ∗ x2[n + L] = x1 [n] ∗ x2[n], (7.53)
+ {(−1)n x[n]} ∗ {(−1)n h[n]} ∗ h[−n] = 2x[n]. (7.49) it follows that
Collecting terms convolved with x[n] and (−1)n x[n] separately (−1)n g[n] ∗ g[−n] = −(−1)n h[n] ∗ h[−n],
leads to
which satisfies the condition stated by Eq. (7.51b).
x[n] ∗ {g[n] ∗ g[−n] + h[n] ∗ h[−n]}
+ (−1)n x[n] ∗ {(−1)n g[n] ∗ g[−n]
+ (−1)n h[n] ∗ h[−n]} = 2x[n]. (7.50) ◮ We conclude that the QMF relations between g[n] and
h[n], as specified in Eq. (7.52), do indeed satisfy one of the
The expression given by Eq. (7.50) is satisfied for any input two conditions required for perfect reconstruction. ◭
x[n] if and only if both of the following perfect-reconstruction
conditions are satisfied:
(1) The significance of the odd-valued time shift L is best illustrated
g[n] ∗ g[−n] + h[n] ∗ h[−n] = 2δ [n] (7.51a) by a simple example.
220 CHAPTER 7 WAVELETS AND COMPRESSED SENSING

xH[n] xHD[n] xHDU[n]


h[n] 2 2 h[−n]

xin[n] xout[n]

xL[n] xLD[n] xLDU[n]


g[n] 2 2 g[−n]

Decomposition Reconstruction

Figure 7-11 Single-stage decomposition of x[n] to xLD [n] and xHD [n], and reconstruction of x[n] from xLD [n] and xHD [n].

The two sequences in Eqs. (7.54b) and (7.54d) are identical in


Example 7-2: QMF value, but opposite in sign. Hence, their sum adds up to zero.

Show that for h[n] = { 1, 2, 3, 4 }, the QMF relations given


7-5.3 Smith-Barnwell Condition
by Eq. (7.52) satisfy the reconstruction condition given by
Eq. (7.51b). We now turn our attention to the first of the two perfect-
reconstruction conditions, namely the condition defined by
Solution: Since h[n] is of length 4, we select the time shift L to Eq. (7.51a), and we do so in concert with the QMF relations
be 3. The second term in Eq. (7.51b) is given by Eq. (7.52).
(−1)n h[n] ∗ h[−n] = { 1, −2, 3, −4 } ∗ { 4, 3, 2, 1} (7.54a) Using the QMF relations, the first term in Eq. (7.51a) be-
comes
= { 4, −5, 8, −10, −8, −5, −4 }. (7.54b)
g[n] ∗ g[−n] = {(−1)n−L h[−(n − L)]} ∗ {(−1)−(n+L) h[n + L]}
For L = 3 and the QMF relation defined by Eq. (7.52a), g[n] is
= {(−1)n h[−n]} ∗ {(−1)n h[n]}, (7.55)
g[n] = −(−1)n h[3 − n],
where we used the time-shift property given by Eq. (7.53).
which yields Replacing the first term in Eq. (7.51a) with Eq. (7.55) and
employing the commutativity property of convolution yields
g[0] = −h[3] = −4, the Smith-Barnwell condition for perfect reconstruction using
g[1] = h[2] = 3, QMF filters:
g[2] = −h[1] = −2, {(−1)n h[n]} ∗ {(−1)n h[−n]} + h[n] ∗ h[−n] = 2δ [n]. (7.56)
g[3] = h[0] = 1.
Finally, upon taking advantage of the “useful convolution rela-
Hence, tion” encapsulated by Eq. (7.34), we obtain the result
g[n] = { −4, 3, −2, 1 }.
The first term of Eq. (7.51b) is then given by (h[n] ∗ h[−n])(−1)n + h[n] ∗ h[−n] = 2δ [n]. (7.57)

(−1)n g[n] ∗ g[−n] = { −4, −3, −2, −1 } ∗ { 1, −2, 3, −4}


(7.54c) The usual form of the Smith-Barnwell condition uses the DTFT.
Recall that the DTFT maps convolutions to products (entry #6 of
= −{ 4, −5, 8, −10, −8, −5, −4 }. (7.54d) Table 2-7) and modulation (entry #3 of Table 2-7) to frequency
7-5 DISCRETE-TIME WAVELET TRANSFORMS 221

shift: relation in Eq. (7.52a), does also. Consequently, g[n] also is


e jΩ0 n h[n] H(Ω − Ω0). (7.58) orthonormal to even translations of itself. Using Eq. (7.52a),
it can be shown that g[n] and h[n] are also orthogonal to even
Setting Ω0 = π and recognizing that e jπ n = (−1)n gives
translations of each other.
(−1)n h[n] H(Ω − π ). (7.59) So the wavelet transform constitutes an orthonormal expan-
sion of x[n] as
Using the properties of the DTFT, the DTFT of the Smith- ∞ ∞
Barnwell condition given by Eq. (7.57) is g[2i − n] Xe1 [i] +
x[n] = ∑ ∑ h[2i − n] xe1 [i], (7.64)
2 2 i=−∞ i=−∞
|H(Ω)| + |H(Ω − π )| = 2. (7.60)
where average signal Xe1 [n] and detail signal xe1 [n] are defined as
Recognizing that
( ∞
0 for odd n, Xe1 [n] = ∑ g[2n − i] x[i] (7.65a)
1 + (−1)n = (7.61) i=−∞
2 for even n,
and

Eq. (7.57) holds for odd n for any h[n], and for n even, it
simplifies to
xe1 [n] = ∑ h[2n − i] x[i]. (7.65b)
i=−∞

h[n] ∗ h[−n] = δ [n], for n even. (7.62) Computation of Xe1 [n] and xe1 [n] in Eq. (7.65) is implemented
using the wavelet analysis filter bank shown in Fig. 7-12(a).
Writing out Eq. (7.62) for n even gives Computation of x[n] from Xe1 [n] and xe1 [n] in Eq. (7.64) is
L
implemented using the wavelet synthesis filter bank shown in
Fig. 7-12(b).
n=0 ∑ h2[i] = 1, (7.63a)
The decimation in Fig. 7-12(a) manifests itself in Eq. (7.65)
i=0
L as the 2n in g[2n − i] and h[2n − i]. The zero-stuffing and time
n=2 ∑ h[i] h[i − 2] = 0, (7.63b) reversals in Fig. 7-12(b) manifest themselves in Eq. (7.64) as
i=2 the 2i in g[2i − n] and h[2i − n].
L The average signal Xe1 [n] can in turn be decomposed similarly,
n=4 ∑ h[i] h[i − 4] = 0, (7.63c) as in the wavelet analysis filter bank shown in Fig. 7-12(a). So
i=4
the first term in Eq. (7.64) can be decomposed further, resulting
.. in the K-stage decomposition
.
L ∞
n = L−1 ∑ h[i] h[i − (L − 1)] = 0. (7.63d) x[n] = ∑ XeK [i] g(K) [2K i − n]
i=L−1 i=−∞
K ∞
Recall that since L is odd, L − 1 is even. +∑ ∑ xek [i] h(k) [2k i − n], (7.66)
k=1 i=−∞

◮ The Smith-Barnwell condition is equivalent to stating where average signal XeK [n] and detail signal {e
xk [n],
that the autocorrelation rh [n] = h[n] ∗ h[−n] of h[n] is zero k = 1, . . . , K} are computed using
for even, nonzero, n and 1 for n = 0. This means that h[n] is

orthonormal to even-valued translations of itself. ◭
Xek [n] = ∑ g(k) [2k n − i] x[i]
i=−∞

7-5.4 Wavelets as Basis Functions and



It is easy to show that if h[n] satisfies the Smith-Barnwell xek [n] = ∑ h(k) [2k n − i] x[i], (7.67)
condition given by Eq. (7.57), then g[n], as defined by the QMF i=−∞
222 CHAPTER 7 WAVELETS AND COMPRESSED SENSING

h[n] 2 x˜ 1[n]

x[n]
h[n] 2 x˜ 2[n]
X˜ 1[n]
g[n] 2
g[n] 2 X˜ 2[n]

(a) Wavelet analysis filter bank

X˜ 2[n] 2 g[−n]

2 g[−n]

x˜ 2[n] 2 h[−n] x[n]

x˜1[n] 2 h[−n]

(b) Wavelet synthesis filter bank

Figure 7-12 (a) Wavelet analysis filter bank and (b) wavelet synthesis filter bank.

and the basis functions g(k) [n] and h(k) [n] can be computed transform of x[n] have durations
recursively offline. We will not bother with explicit formulae for
these, since the decomposition and reconstruction are performed x1 [n], xe2 [n], xe3 [n], . . . , xeK [n], XeK [n]}.
{e (7.69)
|{z} |{z} |{z} | {z } | {z }
much more easily using filter banks. The wavelet analysis and N/2 N/4 N/8 N/2K N/2K
synthesis filter banks shown respectively in Fig. 7-12(a) and
(b) are identical in form to the octave-band filter banks shown
in Figs. 7-2 and 7-4, except for the following nomenclature 7-5.5 Amount of Computation
changes:
The total duration of all of the wavelet transform signals together
xe1 [n] = xHD [n], is
N N N N N
xe2 [n] = xLDHD [n], + + + · · · + K + K = N, (7.70)
2 4 8 2 2
(7.68)
Xe1 [n] = xLD [n], which equals the duration N of x[n].
Xe2 [n] = xLDLD [n]. Again, we use cyclic convolutions instead of linear convolu-
tions so that the total lengths of the decimated signals equals the
If x[n] has duration N = 2K , the components of the wavelet length of the original signal.
The total amount of computation required to compute the
wavelet transform of a signal x[n] of duration N can be computed
as follows. Let the durations of g[n] and h[n] be the even integer
7-6 SPARSIFICATION USING WAVELETS OF PIECEWISE-POLYNOMIAL SIGNALS 223

(L + 1) where L is the odd integer in Eq. (7.52). Convolving in Eq. (7.67) and computed using the analysis filter bank shown
x[n] with both g[n] and h[n] requires 2(2(L + 1))N = 4(L + 1)N in Fig. 7-12(a) are sparse (mostly zero-valued). In particular, we
multiplications-and-additions (MADs). But since the results will design g[n] and h[n] so that the wavelet transform detail signals
be decimated by two, only half of the convolution outputs must are sparse when the input signal x[n] is piecewise polynomial,
be computed, halving this to 2(L + 1)N. which we define next. The g[n] and h[n] filters are then used in
At each successive decomposition, g[n] and h[n] are con- the Daubechies wavelet transform.
volved with the average signal from the previous stage, and the
result is decimated by two. So g[n] and h[n] are each convolved
7-6.1 Definition of Piecewise-Polynomial Signals
with the signals
A signal x[n] is defined to be piecewise-Mth-degree polynomial
{ x[n] , Xe1 [n], Xe2 [n], . . . , XeK [n]}. if it has the form
|{z} | {z } | {z } | {z }
N N/2 N/4 N/2K

M


 ∑ a0,k nk
 for − ∞ < n ≤ N0 ,
The total number of MADs required is thus 

k=0

  
 M
 a nk

N N N
2(L + 1) N + + + · · · + K < 4(L + 1)N. (7.71)
∑ 1,k for N0 < n ≤ N1 ,
2 4 2 x[n] = k=0 (7.72)

 M


The additional computation for computing more decomposi-




∑ a2,k nk for N1 < n ≤ N2 .

 k=0
tions (i.e., increasing K) is thus minimal. 
 ..
Since L is small (usually 1 ≤ L ≤ 5), this is comparable to .
the amount of computation N2 log2 (N) required to compute the
DFT, using the FFT, of a signal x[n] of duration N. But the DFT This x[n] can be segmented into intervals, and in each interval
requires complex-valued multiplications and additions, while x[n] is a polynomial in time n of degree M. The times Ni at which
the wavelet transform uses only real-valued multiplications and the coefficients {ai,k } change values are sparse, meaning that
additions. they are scattered over time n. In continuous time, such a signal
would be a spline (see Section 4-7), except that in the case of a
Concept Question 7-4: What is the Smith-Barnwell con- spline, the derivatives of the signal must match at the knots (the
dition? times where coefficients {ai,k } change values). The idea here is
that the coefficients {ai,k } can change completely at the times Ni ;
1
there is no “smoothness” requirement. Indeed, these times Ni
Exercise 7-7: If h[n] = √ {6, 2, h[2], 3}, find h[2] so that constitute the edges of x[n].
5 2
h[n] satisfies the Smith-Barnwell condition.
Answer: According to Eq. (7.63), the Smith-Barnwell A. Piecewise-Constant Signals
condition requires the autocorrelation of h[n] to be 1 for First, let x[n] be piecewise constant (M = 0 in Eq. (7.72)), so that
n = 0 and to be 0 for even n 6= 0. For h[n] = {a, b, c, d}, x[n] is of the form
these conditions give a2 + b2 + c2 + d 2 = 1, ac + bd = 0,

and h[2] = −1.  a0 for − ∞ < n ≤ N0 ,



a 1 for N0 < n ≤ N1 ,
x[n] = a for N1 < n ≤ N2 . (7.73)
7-6 Sparsification Using Wavelets of 
 2

 .
 ..
Piecewise-Polynomial Signals
The wavelet filters g[n] and h[n] are respectively called scaling The value of x[n] changes only at a few scattered times. The
and wavelet functions. In this section, we show how to design amount by which x[n] changes at n = Ni is the jump ai+1 − ai :
these functions such that they satisfy the Smith-Barnwell condi- (
tion for perfect reconstruction given by Eq. (7.56), form a QMF 0 for n 6= Ni ,
x[n + 1] − x[n] = (7.74)
pair, and have the property that the detail signals x̃k [n] defined ai+1 − ai for n = Ni .
224 CHAPTER 7 WAVELETS AND COMPRESSED SENSING

This can be restated as In practice, q[n] = q[0] δ [n] has duration = 1, so x[n] ∗ h[n] is still
mostly zero-valued.
x[n + 1] − x[n] = ∑(ai+1 − ai ) δ [n − Ni ]. (7.75)
i
B. Piecewise-Linear Signals
Taking differences sparsifies a piecewise-constant signal, as
illustrated in Fig. 7-13. Next, let x[n] be piecewise linear (M = 1 in Eq. (7.72)), so that
Now let the wavelet function h[n] have the form, for some x[n] is of the form
signal q[n] that is yet to be determined, 

 a0,1 n + a0,0, −∞ < n ≤ N0 ,
h[n] = q[n] ∗ { 1, −1}. (7.76) 

a1,1 n + a1,0, N0 < n ≤ N1 ,
x[n] = a n + a , (7.78)
Since from Fig. 2-3 the overall impulse response of two systems 
 2,1 2,0 N1 < n ≤ N2 ,


connected in series is the convolution of their impulse responses,  ...
h[n] can be implemented by two systems connected in series:
Proceeding as in the case of piecewise-constant x[n], taking
x[n] { 1, −1 } q[n] x[n] ∗ h[n]. differences, and then taking differences of the differences, will
sparsify a piecewise-linear signal. The process is illustrated in
Fig. 7-14.
Convolution with h[n] sparsifies a piecewise-constant input x[n], The bottom signal in Fig. 7-14 is in turn convolved with q[n],
since (using the time-shift property of convolution)

x[n] ∗ h[n] = x[n] ∗ {1, −1} ∗ q[n]


= (x[n + 1] − x[n]) ∗ q[n] 4
2
= ∑(ai+1 − ai) δ [n − Ni ] ∗ q[n]
i 0
= ∑(ai+1 − ai) q[n − Ni]. (7.77) −2
0 5 10 15 20 25 30 35
i
(a) x[n]

1
6 0.5
4 0
2 −0.5
0 5 10 15 20 25 30 35
0
0 5 10 15 20 25 30 (b) w1[n] = x[n + 1] − x[n]
(a) x[n]
3 1
2 0
1
−1
0
−1 −2
0 5 10 15 20 25 30 0 5 10 15 20 25 30
(b) x[n + 1] − x[n] (c) w2[n] = w1[n + 1] − w1[n]

Figure 7-13 A piecewise-constant signal is compressed by Figure 7-14 A piecewise-linear signal is compressed by
taking differences. (a) a piecewise-constant signal, (b) differ- taking successive differences. (a) A piecewise-linear signal, (b)
ences of the signal. differences of the top signal, (c) differences of the middle signal.
7-6 SPARSIFICATION USING WAVELETS OF PIECEWISE-POLYNOMIAL SIGNALS 225

resulting in a series of scaled and delayed versions of q[n]. In


Table 7-2 Daubechies wavelet notation.
practice, q[n] has length two, so x[n] ∗ h[n] is still mostly zero-
valued. K 1 2 3
Now let the wavelet function h[n] have the form
dbK 1 2 3
h[n] = q[n] ∗ {1, −1} ∗ {1, −1}. (7.79) D(2K) 2 4 6
Number of differences 1 2 3
Convolution with h[n] sparsifies a piecewise-linear input x[n],
since Duration = 2K 2 4 6
Degree sparsified = M 0 1 2
x[n] ∗ h[n] = x[n] ∗ {1, −1} ∗ {1, −1} ∗ q[n]
= (x[n + 1] − x[n]) ∗ {1, −1} ∗ q[n]
= (x[n + 2] − 2x[n + 1] + x[n]) ∗ q[n]. (7.80)
7-6.2 Design of Wavelet Functions h[n] of
The output consists of a set of scaled and delayed versions of Desired Form
q[n]. In practice, q[n] is very short (length two), so x[n] ∗ h[n] is
still mostly zero-valued. We now show how to design wavelet functions of the form given
by Eq. (7.83). These wavelet functions compress piecewise-
Mth-degree polynomials to sparse signals xe[n]. Since many real-
world signals can be so modeled, their wavelet transforms will
C. Piecewise-Polynomial Signals always be sparse.
The functions h[n] given by Eq. (7.83) are called Daubechies
Finally, let x[n] be piecewise polynomial, meaning that wavelet functions. The order of Daubechies wavelet function
 has several different definitions. The Daubechies wavelet func-


 ∑M k
k=0 a0,k n , −∞ < n ≤ N0 , tion that compresses piecewise-(K − 1)th-degree polynomials

∑M k to sparse signals is termed “dbK Daubechies wavelet func-
k=0 a1,k n , N0 < n ≤ N1 ,
x[n] = M k, (7.81) tions,” where “K” is a reference to the number of differences
∑ a
 k=0 2,k

n N1 < n ≤ N2 ,

 .. that need to be taken, “db” is short for Daubechies, and
 . K = M + 1. Daubechies wavelet functions having duration N
are termed “DN Daubechies wavelet functions.” We shall see
Taking (M + 1) sets of successive differences will sparsify x[n], that dbK Daubechies wavelet functions have duration 2K, so a
since each difference will reduce the degrees of the polynomial dbK Daubechies wavelet function is also a D(2K) Daubechies
segments by one. The continuous-time analogue of this is wavelet function. The DN notation is inefficient because N is
always even. For that reason, we will use “dbK” to refer to
d M+1 x the order of a Daubechies wavelet function. The Haar functions
y(t) = =0 (7.82)
dt M+1 introduced in Section 7-4 are db1 and D2 wavelets. To confuse
matters further, DN was used in the past to mean both DN and
for any Mth-degree polynomial x(t). db(N/2). The notation is summarized in Table 7-2.
Now let the wavelet function h[n] have the form To make h[n] causal, we delay each of the differences in
Eq. (7.83) to get
h[n] = q[n] ∗ {1, −1} ∗ {1, −1} . . . {1, −1} . (7.83)
| {z }
M+1 differences h[n] = q[n] ∗ {1, −1} ∗ {1, −1} ∗ · · · ∗ {1, −1} . (7.84)
| {z }
M+1 differences
The output consists of a set of scaled and delayed versions of
q[n]. In practice, q[n] is still relatively short (length M + 1), so A degrees-of-freedom analysis shows that q[n] must have dura-
x[n] ∗ h[n] is still mostly zero-valued. tion M + 1, so h[n] has duration (2M + 2) = 2K. Function q[n]
is determined by inserting the form Eq. (7.84) into the Smith-
Barnwell condition given by Eq. (7.56). The scaling function
g[n] is then determined by the QMF formula in Eq. (7.52a).
226 CHAPTER 7 WAVELETS AND COMPRESSED SENSING

3
A. Piecewise-Constant Signals
1= ∑ h[n]2. (7.90b)
To compress piecewise-constant signals, we set M = 0; a con- n=0
stant is a polynomial of degree zero. We wish to design a The first equation states that h[n] must be orthogonal to its
db1 = D2 order Daubechies wavelet, which has only 1 differ- even-valued translations, and the second equation simply states
ence because M + 1 = 0 + 1 = 1, and a duration = 2(1) = 2, so that the energy of h[n] must be one, which is a normalization
L = 1. The expression in Eq. (7.84) becomes requirement that should be imposed after q[0] and q[1] have been
h[n] = (q[0] δ [n]) ∗ {1, −1} = {q[0], −q[0]}. (7.85) determined.
| {z } | {z } | {z } Substituting the elements of h[n] given in Eq. (7.89) into
duration=1 1 difference duration=2 Eq. (7.90a) leads to
Inserting this result into the Smith-Barnwell condition of 0 = (q[0] − 2q[1]) q[0] + (q[1] − 2q[0]) q[1]
Eq. (7.63a) gives
q[0]2 + q[0]2 = 1. (7.86) = q[0]2 − 4q[0] q[1] + q[1]2. (7.91)

The requirement given in Eq. (7.63b) is satisfied automatically, Since the scale factor will be set by ∑ h[n]2 = 1, we can, without
since h[n] has duration = 2. Hence, q[0] = √12 and loss of generality, set q[0]=1. This gives

1 q[1]2 − 4q[1] + 1 = 0, (7.92)


h[n] = √ {1, −1}, (7.87)
2 √
√ has the two roots q[1] = {2 ± 3}. Using the smaller root,
which
and the scaling function g[n] is, from Eq. (7.52a), given by 2 − 3, in Eq. (7.89) gives
1 √ √ √
g[n] = −(−1)n h[1 − n] = √ {1, 1}. (7.88) h1 [n] = q1 [0]{1, − 3, 2 3 − 3, 2 − 3}. (7.93)
2
Finally, the energy normalization ∑ h1 [n]2 = 1 requires
We recognize this as the Haar transform. It compresses piece- q1 [0] = ±0.4830. Hence,
wise constant signals. Making the alternate choice in the solu-
−1
tion of Eq. (7.86), namely q[0] = √ 2
, simply changes the signs h1 [n] = ±{0.4830, −0.8365, 0.2241, 0.1294}. (7.94)
of g[n] and h[n].
The length of h[n] is N = 2(M + 1) = 2K = 4. To satisfy
Eq. (7.52), the time shift L should be an odd integer and equal to
B. Piecewise-Linear Signals or greater than (N − 1). Since N − 1 = 4 − 1 = 3, we set L = 3.
Accordingly, the scaling function g1 [n] is, from Eq. (7.52a),
To compress piecewise-linear signals, we set M = 1. We wish given by
to design a db2 = D4 order Daubechies wavelet, which has
M + 1 = 2 differences and a duration = 2(2) = 4. Function h[n] g1 [n] = −(−1)n h[3 − n]
in Eq. (7.84) becomes a filter of duration 4: = ±{0.1294, −0.2241, −0.8365, −0.4830}. (7.95)

h[n] = {q[0], q[1]} ∗ {1, −1} ∗ {1, −1} Using the larger root of Eq. (7.92), namely (2 + 3), in
| {z } | {z }
duration=2 M+1=2 differences Eq. (7.89) gives
= {q[0], q[1]} ∗ {1, −2, 1} √ √ √
h2 [n] = q2 [0] {1, 3, −2 3 − 3, 2 + 3}, (7.96)
= {q[0], q[1] − 2q[0], q[0] − 2q[1], q[1]}. (7.89)
and the energy normalization ∑ h[n]2 = 1 leads to
The Smith-Barnwell condition for n = 0 and n = 2, as stated in
Eqs. (7.63b) and (7.63a), gives two equations: q2 [0] = ±0.1294.
3
0= ∑ h[n] h[n − 2], (7.90a)
n=2
7-6 SPARSIFICATION USING WAVELETS OF PIECEWISE-POLYNOMIAL SIGNALS 227

Hence,
3

h2 [n] = ±{0.1294, 0.2241, −0.8365, 0.4830}, (7.97) 2


1
and the scaling function g[n] is then 0
−1
g2 [n] = −(−1)n h[3 − n]
−2
5 10 15 20 25 30
n
= ±{0.4830, 0.8365, 0.2241, −0.1294}. (7.98)
(a) x[n]
The two choices of h[n] are time reversals of each other, and so 6
are the choices of g[n]. The signs of g[n] and h[n] can be changed 4
due to the (plus/minus) signs in Eqs. (7.94), (7.95), (7.97), and 2
(7.98).
0
These g[n] and h[n] functions are called the Daubechies db2
−2
or D4 scaling and wavelet functions. The Daubechies db3 or
D6 wavelet transform sparsifies piecewise-quadratic signals, as −4
0 2 4 6 8 10 12 14 16
n
there are three differences (M = 2 and, hence, K = 3) in h[n].
(b) X1[n]
~

◮ The Daubechies scaling functions g[n], with g[0] > 0, are


listed in Table 7-3 for various values of K. ◭ 0
−0.2
−0.4
Table 7-3 Daubechies scaling functions. −0.6

2 4 6 8 10 12 14 16
n
K=1 K=2 K=3 K=4 (c) x˜ 1[n]
g[n] db1 db2 db3 db4 6
g[0] .7071 .4830 .3327 .2304 4
g[1] .7071 .8365 .8069 .7148 2
g[2] 0 .2241 .4599 .6309 0
g[3] 0 –.1294 –.1350 –.0280 −2
g[4] 0 0 –.0854 –.1870 −4
1 2 3 4 5 6 7 8
n
g[5] 0 0 .0352 .0308 ~
(d) X [n] 2
g[6] 0 0 0 .0329
g[7] 0 0 0 –.0106
0.5

−0.5

Example 7-3: db2 Wavelet Transform −1


1 2 3 4 5 6 7 8
n
(e) x˜2[n]
Compute the db2 (or equivalently, D4) Daubechies wavelet
transform of the piecewise-linear signal x[n] shown in
Fig. 7-15(a). Figure 7-15 (a) Piecewise linear signal x[n], (b) stage-1
average signal Xe1 [n], (c) stage-1 detail signal xe1 [n], (d) stage-
Solution: The two-stage procedure involves the computation 2 average signal Xe2 [n], and (e) stage-2 detail signal xe2 [n].
of average signals Xe1 [n] and Xe2 [n], and detail signals xe1 [n] and
228 CHAPTER 7 WAVELETS AND COMPRESSED SENSING

xe2 [n], using Eqs. (7.67), (7.94), and (7.95). The results are signals to 2-D images. Downsampling in 2-D involves down-
displayed in parts (b) to (e) of Fig. 7-15. We note that sampling in both directions. For example:

• Average signals Xe1 [n] and Xe2 [n] are low-resolution versions (3,2)
x[n, m] x[3n, 2m].
of x[n].
• Detail signals xe1 [n] and xe2 [n] are sparse (mostly zero), and The downsampling factor in this case is 3 along the horizontal
their nonzero values are small in magnitude. direction and 2 along the vertical direction. To illustrate the
process, we apply it to a 5 × 7 image:
• The db2 wavelet transform of the given x[n] consists of  
xe1 [n], xe2 [n], and Xe2 [n]. 1 2 3 4 5 6 7  
 8 9 10 11 12 13 14 1 4 7
  (3,2) 15 18 21 .
These patterns explain the terms “average” and “detail.” 15 16 17 18 19 20 21
22 23 24 25 26 27 28 29 32 35
Concept Question 7-5: How is it possible that the 29 30 31 32 33 34 35
wavelet transform requires less computation than the FFT?
Upsampling in 2-D involves upsampling in both directions.
Upsampling a signal x[n, m] by a factor 3 along the horizontal
Exercise 7-8: What are the finest-detail signals of the db3 and by a factor 2 along the vertical, for example, is denoted
wavelet transform of x[n] = 3n2 ? symbolically as
Answer: Zero, because db3 wavelet basis function h[n]
eliminates quadratic signals by construction. x[n, m] (3,2)
  
 n m
x 3 , 2 for n = integer multiple of 3
Exercise 7-9: Show by direct computation that the db2 and m = integer multiple of 2,
scaling function listed in Table 7-3 satisfies the Smith- 
0 otherwise.
Barnwell condition.
Answer: The Smith-Barnwell condition requires the au- Applying this upsampling operation to a simple 2 × 2 image
tocorrelation of g[n] to be 1 for n = 0 and to be 0 for yields
even n 6= 0. For g[n] = {a, b, c, d}, these conditions give  
a2 + b2 + c2 + d 2 = 1 and ac + bd = 0. The db2 g[n] listed   1 0 0 2
in Table 7-3 is g[n] = {.4830, .8365, .2241, −.1294}. It is 1 2 (3,2) 0 0 0 0
3 4 3 0 0 4 .
easily verified that the sum of the squares of these numbers
is 1, and that (.4830) × (.2241) + (.8365) × (−.1294) = 0. 0 0 0 0

Downsampling and upsampling are used extensively in image


processing operations.
7-7 2-D Wavelet Transform
The real power of the wavelet transform becomes apparent when 7-7.2 Image Analysis Filter Bank
it is applied to 2-D images. A 512 × 512 image has more than
a quarter-million pixel values. Storing a sparse representation The generalization of the wavelet transform from signals to
of an image, rather than the image itself, saves a huge amount images is straightforward if separable scaling and wavelet
of memory. Compressed sensing (covered later in Section 7-9) functions are used. In this book we restrict our attention to
becomes very powerful when applied to images. separable functions; i.e., the 2-D filter g2 [n, m] = g[n] g[m]. With
g[n] and h[n] denoting scaling and wavelet functions, such as
7-7.1 Decimation and Zero-Stuffing of Images the Haar or Daubechies functions, the image analysis filter bank
performs the following operations:
The concepts of downsampling (decimation) and upsampling
(zero-stuffing) and interpolation generalize directly from 1-D
7-7 2-D WAVELET TRANSFORM 229

(1) Stage-1 decomposition

x[n, m] → g[n] → g[m] → ↓(2,2) → x̃_LL^(1)[n, m],    (7.99a)
x[n, m] → g[n] → h[m] → ↓(2,2) → x̃_LH^(1)[n, m],    (7.99b)
x[n, m] → h[n] → g[m] → ↓(2,2) → x̃_HL^(1)[n, m],    (7.99c)
x[n, m] → h[n] → h[m] → ↓(2,2) → x̃_HH^(1)[n, m].    (7.99d)

(2) Stage-2 to stage-K decomposition

x̃_LL^(1)[n, m] → g[n] → g[m] → ↓(2,2) → x̃_LL^(2)[n, m],    (7.100a)
x̃_LL^(1)[n, m] → g[n] → h[m] → ↓(2,2) → x̃_LH^(2)[n, m],    (7.100b)
x̃_LL^(1)[n, m] → h[n] → g[m] → ↓(2,2) → x̃_HL^(2)[n, m],    (7.100c)
x̃_LL^(1)[n, m] → h[n] → h[m] → ↓(2,2) → x̃_HH^(2)[n, m].    (7.100d)

The decomposition process is continued as above, until output x̃_LL^(K)[n, m] is reached, where K is the total number of stages.

(3) Final wavelet transform

At the conclusion of the decomposition process, x[n, m] consists of:

(a) The coarsest average image x̃_LL^(K)[n, m].

(b) Three detail images at each stage:

{ x̃_LH^(K)[n, m], x̃_HL^(K)[n, m], x̃_HH^(K)[n, m] },
{ x̃_LH^(K−1)[n, m], x̃_HL^(K−1)[n, m], x̃_HH^(K−1)[n, m] },
{ x̃_LH^(K−2)[n, m], x̃_HL^(K−2)[n, m], x̃_HH^(K−2)[n, m] },

up to the largest (in size) three detail images:

{ x̃_LH^(1)[n, m], x̃_HL^(1)[n, m], x̃_HH^(1)[n, m] }.

The average images x̃_LL^(k)[n, m] are analogous to the average signals X̃_k[m] of a signal, except that they are low-resolution versions of a 2-D image x[n, m] instead of a 1-D signal x[m]. In 2-D, there are now three detail images, while in 1-D there is only one detail signal.

In 1-D, the detail signals are zero except near edges, representing abrupt changes in the signal or in its slope. In 2-D, the three detail images play the following roles:

(a) x̃_LH^(k)[n, m] picks up vertical edges,
(b) x̃_HL^(k)[n, m] picks up horizontal edges,
(c) x̃_HH^(k)[n, m] picks up diagonal edges.

7-7.3 Image Synthesis Filter Bank

The image synthesis filter bank combines all of the detail images, and the coarsest average image, x̃_LL^(K)[n, m], into the original image x[n, m], as follows:

x̃_LL^(K)[n, m] → ↑(2,2) → g[−n] → g[−m] → A_LL^(K)[n, m],
x̃_LH^(K)[n, m] → ↑(2,2) → g[−n] → h[−m] → A_LH^(K)[n, m],
x̃_HL^(K)[n, m] → ↑(2,2) → h[−n] → g[−m] → A_HL^(K)[n, m],
x̃_HH^(K)[n, m] → ↑(2,2) → h[−n] → h[−m] → A_HH^(K)[n, m].

The average image x̃_LL^(K−1)[n, m] is the sum of the above four outputs:

x̃_LL^(K−1)[n, m] = A_LL^(K)[n, m] + A_LH^(K)[n, m] + A_HL^(K)[n, m] + A_HH^(K)[n, m].    (7.101)

Here { A_LL^(K)[n, m], A_LH^(K)[n, m], A_HL^(K)[n, m], A_HH^(K)[n, m] } are just four temporary quantities to be added. The analogy to signal analysis and synthesis filter banks is evident, except that at each stage there are three detail images instead of one detail signal.
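A minimal sketch of one analysis stage (Eqs. (7.99a) through (7.99d)) for an even-sized image x, using separable 2-D filters of the form f1[n] f2[m] (Haar filters shown; all variable names are hypothetical):

    g = [1  1] / sqrt(2);                                  % scaling (lowpass) filter
    h = [1 -1] / sqrt(2);                                  % wavelet (highpass) filter
    filt = @(x, f1, f2) conv2(x, f1(:) * f2(:)', 'same');  % filter with f1[n] f2[m]
    down = @(x) x(1:2:end, 1:2:end);                       % downsample by (2,2)
    xLL = down(filt(x, g, g));                             % average image
    xLH = down(filt(x, g, h));                             % detail image (vertical edges)
    xHL = down(filt(x, h, g));                             % detail image (horizontal edges)
    xHH = down(filt(x, h, h));                             % detail image (diagonal edges)

Feeding xLL back into the same four branches implements the stage-2 decomposition of Eq. (7.100), and so on up to stage K. (The book's filter banks use cyclic convolution; conv2 with 'same' is used here only to keep the sketch short.)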
Repetition of the reconstruction step represented by Eq. (7.101) through an additional K − 1 stages leads to

x[n, m] = x̃_LL^(0)[n, m].

The condition for perfect reconstruction is the 2-D version of the Smith-Barnwell condition defined by Eq. (7.60), namely

|H2(Ω1, Ω2)|² + |H2(Ω1 − π, Ω2 − π)|² + |H2(Ω1 − π, Ω2)|² + |H2(Ω1, Ω2 − π)|² = 4,    (7.102)

where H2(Ω1, Ω2) is the DSFT of the 2-D wavelet function h2[n, m] = h[n] h[m]. Since the 2-D wavelet function h2[n, m] is separable, its DSFT is also separable:

H2(Ω1, Ω2) = H(Ω1) H(Ω2),    (7.103)

where H(Ω1) is the DTFT of h[n]. The 2-D Smith-Barnwell condition is satisfied if the 1-D Smith-Barnwell condition given by Eq. (7.60) is satisfied.
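The separability argument can be checked numerically. The following MATLAB fragment (a sketch; names are hypothetical) evaluates the left side of Eq. (7.102) for the Haar wavelet on a grid of (Ω1, Ω2) values and confirms that it equals 4:

    h = [1 -1] / sqrt(2);                       % 1-D Haar wavelet filter
    H = @(w) h(1) + h(2) * exp(-1j*w);          % DTFT H(Omega) of h[n]
    H2 = @(w1, w2) H(w1) .* H(w2);              % separable DSFT, Eq. (7.103)
    [O1, O2] = meshgrid(linspace(0, 2*pi, 64));
    S = abs(H2(O1, O2)).^2 + abs(H2(O1-pi, O2-pi)).^2 ...
      + abs(H2(O1-pi, O2)).^2 + abs(H2(O1, O2-pi)).^2;
    max(abs(S(:) - 4))                          % ~ 0 to machine precision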
7-7.4 2-D Haar Wavelet Transform of Shepp-Logan Phantom

The Shepp-Logan phantom is a piecewise-constant image that has been a test image for tomography algorithms since the 1970s. A 256 × 256 image of the Shepp-Logan phantom is displayed in Fig. 7-16(a). To illustrate what the coarse and detail images generated by the application of the 2-D wavelet transform look like, we applied a 4-stage 2-D Haar wavelet transform to the image in Fig. 7-16(a). The results are displayed per the pattern in Fig. 7-16(b), with:

(1) Stage-4 images: 16 × 16

The coarsest average image, x_LL^(4)[n, m], is used as a thumbnail image, and placed at the upper left-hand corner in Fig. 7-16(c). The three stage-4 detail images are arranged clockwise around image x_LL^(4)[n, m].

(2) Stage-3 images: 32 × 32

The three stage-3 detail images are four times as large, and are arranged clockwise around the stage-4 images.

(3) Stage-2 images: 64 × 64

Figure 7-16 (a) 256 × 256 Shepp-Logan phantom test image, (b) arrangement of the images generated by a 4-stage Haar wavelet transform, and (c) the images represented in (b). A logarithmic scale is used to display the values.
(4) Stage-1 images: 128 × 128

The largest images in Fig. 7-16(c) are the three stage-1 images. The number of pixels in the 2-D Haar transform can be computed similarly to the 1-D Haar transform durations in Eq. (7.46):

Stage 4 images: The 4 (16 × 16) images (1 average and 3 detail) contain a total of 4(16)² = 1024 pixels.

Stage 3 images: The 3 (32 × 32) detail images contain a total of 3(32)² = 3072 pixels. The fourth 32 × 32 image is the average image, which is decomposed into the stage-4 images.

Stage 2 images: The 3 (64 × 64) detail images contain a total of 3(64)² = 12288 pixels. The fourth 64 × 64 image is the average image, which is decomposed into the stage-3 images.

Stage 1 images: The 3 (128 × 128) detail images contain a total of 3(128)² = 49152 pixels. The fourth 128 × 128 image is the average image, which is decomposed into the stage-2 images.

The total number of pixels in the wavelet transform of the Shepp-Logan phantom is then:

1024 + 3072 + 12288 + 49152 = 65536.

This equals the number of pixels in the Shepp-Logan phantom, which is 256² = 65536.

Even though the coarsest average image is only 16 × 16, it contains almost all of the large numbers in the 2-D wavelet transform of the original image, and captures most of its primary features. That is why it is used as a thumbnail image. The pixel values of the stage-4 detail images are almost entirely zeros. The 4-stage decomposition process preserves the information content of the original image (composed of 256² = 65536 pixels) while compressing it down to images containing only 3619 nonzero pixels. This is a 94.5% reduction in the number of nonzero pixels.

Figure 7-17 (a) 200 × 200 clown image and (b) its 3-stage 2-D db3 Daubechies wavelet-transform images. A logarithmic scale is used to display the values.

7-7.5 2-D db3 Daubechies Wavelet Transform of Clown Image

For a second example, we repeated the steps outlined in the preceding subsection, but this time we used a 2-D db3 Daubechies wavelet transform on the 200 × 200 clown image shown in Fig. 7-17(a). The images generated by the 3-stage decomposition process are displayed in part (b) of the figure, using the same arrangement as shown earlier in Fig. 7-16(b).

7-7.6 Image Compression by Thresholding Its Wavelet Transform

Image compression is an important feature of the wavelet transform. Not only is the original image represented by fewer pixels, but also many of the pixels of the wavelet-transform images are zero-valued. The compression ratio can be improved further by thresholding the output images of the final stage of
the wavelet-transform decomposition process. Thresholding a pixel means replacing it with zero if its absolute value is below a given threshold level λ. As noted earlier in connection with 1-D signals and 2-D images, most of the wavelet-transform detail signals and images have very small values, so little information is lost by setting their values to zero, which means that they no longer need to be stored, thereby reducing the storage capacity needed to store the wavelet transform of the image. Furthermore, since the wavelet transform is composed of orthogonal basis functions, a small change in the wavelet transform of an image will produce only a small change in the image reconstructed from these values (see Section 7-2.3).

To illustrate with an example, we compare in Fig. 7-18 two images:

(a) In part (a), we show the original 200 × 200 clown image, and

(b) in part (b) we show a reconstructed clown image, generated from the db3 Daubechies wavelet-transform images after thresholding the images with λ = 0.11.

The reconstructed image looks almost identical to the original image, even though only 6% of the pixels in the wavelet-transform images are nonzero.

Concept Question 7-6: Why is the 2-D wavelet transform useful for generating thumbnail images?
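A sketch of this compression experiment (assuming the Wavelet Toolbox functions wavedec2 and waverec2, with the image x scaled to [0, 1]; the threshold value is the one quoted in the text):

    lambda = 0.11;
    [C, S] = wavedec2(x, 3, 'db3');      % 3-stage 2-D db3 wavelet transform
    Cth = C .* (abs(C) >= lambda);       % zero out small coefficients
    xrec = waverec2(Cth, S, 'db3');      % reconstruct from thresholded transform
    nnz(Cth) / numel(Cth)                % fraction of nonzero coefficients kept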

Figure 7-18 (a) Original clown image, and (b) image reconstructed from thresholded db3 Daubechies wavelet-transform images, requiring only 6% as much storage capacity as the original image.

7-8 Denoising by Thresholding and Shrinking

A noisy image is given by

y[n, m] = x[n, m] + v[n, m],    (7.104)

where x[n, m] is the desired image and v[n, m] is the noise that had been added to it. The goal of denoising is to recover the original image x[n, m], or a close approximation thereof, from the noisy image y[n, m]. We now show that the obvious approach of simply thresholding the wavelet transform of the image does not work well. Then we show that the combination of thresholding and shrinking the wavelet transform of the image does work well.

7-8.1 Denoising by Thresholding Alone

One approach to denoising is to threshold the wavelet transform of y[n, m]. For small wavelet-transform values, the signal-to-noise ratio is low, so little of value is lost by thresholding these small values to zero. For large wavelet-transform values, the signal-to-noise ratio is large, so these large values should be kept. This approach works poorly on wavelet transforms of noisy images, as the following example shows.

Zero-mean 2-D white Gaussian noise with standard deviation σ = 0.1 was added to the clown image of Fig. 7-19(a). The noisy
image, shown in Fig. 7-19(b), has a signal-to-noise ratio (SNR) of 11.5, which means that the noise level is, on average, only about 8.7% of that of the signal.

The db3 Daubechies wavelet transform was computed for the noisy image, then thresholded with λ = 0.11, which appeared to provide the best results. Finally, the image was reconstructed from the thresholded wavelet transform, and it now appears in Fig. 7-19(c). Upon comparing the images in parts (b) and (c) of the figure, we conclude that the thresholding operation failed to reduce the noise by any appreciable amount.

Figure 7-19 (a) Noise-free clown image, (b) noisy image with SNR = 11.5, and (c) image reconstructed from thresholded wavelet transform. Thresholding without shrinkage does not reduce noise.

7-8.2 Denoising by Thresholding and Shrinkage

We now show that a combination of thresholding small wavelet-transform values to zero and shrinking other wavelet-transform values by a small number λ performs much better in denoising images. First we show that shrinkage comes from minimizing a cost functional, just as Wiener filtering given by Eq. (6.31)
came from minimizing the Tikhonov cost functional given by Eq. (6.29).

In the material that follows, we limit our treatment to 1-D signals. The results are readily extendable to 2-D images.

Suppose we are given noisy observations y[n] of a signal x[n] that is known to be sparse (mostly zero),

y[n] = x[n] + v[n].    (7.105)

All three signals have duration N and the noise v[n] is known to have zero mean. The goal is to estimate x[n] from the noisy observations y[n]; i.e., to denoise y[n] with the additional information that x[n] is mostly zero-valued.

The prior knowledge that x[n] is sparse can be incorporated by estimating x[n] from y[n], not by simply using y[n] as an estimator for x[n], but by minimizing over x[n] the LASSO cost functional (discussed in more detail in Section 7-9.2)

Λ = (1/2) Σ_{n=0}^{N−1} (y[n] − x[n])² + λ Σ_{n=0}^{N−1} |x[n]|,    (7.106)

where the first term represents fidelity to the data y[n] and the second term promotes sparsity. Readers familiar with basic estimation theory will note that Λ is the negative log-likelihood function for zero-mean white Gaussian noise v[n] with independent Laplacian a priori distributions for each x[n]. The coefficient λ is a trade-off parameter between fidelity to the data y[n] and imposition of sparsity. If λ = 0, then the estimator x̂[n] of x[n] is just x̂[n] = y[n]. Nonzero λ emphasizes sparsity, while allowing some difference between x̂[n] and y[n], which takes into account the noise v[n].

LASSO is an acronym for least absolute shrinkage and selection operator.

7-8.3 Minimization of LASSO Cost Functional

The minimization of Λ decouples in time n, so each term can be minimized separately. The goal then is to find an estimate of x[n], which we denote x̂[n], that minimizes over x[n] the nth term Λn of Eq. (7.106), namely

Λn = (1/2)(y[n] − x[n])² + λ|x[n]|.    (7.107)

The expression given by Eq. (7.107) is the sum of two terms. The value of λ is selected to suit the specific application; if fidelity to the measured observations y[n] is highly prized, then λ is assigned a small value, but if sparsity is an important attribute, then λ may be assigned a large value.

Keeping in mind that λ ≥ 0, the minimization problem consists of four possible scenarios:

(1) y[n] ≥ 0 and y[n] ≤ λ
(2) y[n] ≥ 0 and y[n] ≥ λ
(3) y[n] ≤ 0 and |y[n]| ≤ λ
(4) y[n] ≤ 0 and |y[n]| ≥ λ

Case 1: y[n] ≥ 0 and y[n] ≤ λ

Let us consider the following example for a particular value of n:

Measurement y[n] = 1
Trade-off parameter λ = 2
Signal x[n]: unknown

The estimated value x̂[n] of x[n] is found by minimizing Λn. Figure 7-20(a) displays three plots, corresponding to the first and second terms in Eq. (7.107), and their sum. It is evident from the plot of Λn that Λn is minimized at x[n] = 0. Hence,

x̂[n] = 0 for y[n] ≥ 0 and y[n] ≤ λ.    (7.108a)

Figure 7-20 Plots of the first term (1/2)(y[n] − x[n])², the second term λ|x[n]|, and their sum Λn, for: (a) y[n] = 1 and λ = 2, and (b) y[n] = 1 and λ = 1/3.

Case 2: y[n] ≥ 0 and y[n] ≥ λ

Repetition of the scenario described by case 1, but with λ changed from 2 to 1/3, leads to the plots shown in Fig. 7-20(b).
In this case, Λn is parabolic-like in shape, and its minimum occurs at a positive value of x[n] at which the slope of Λn is zero. That is,

dΛn/dx[n] = d/dx[n] [ (1/2)(y[n] − x[n])² + λ x[n] ] = 0,

which leads to

x̂[n] = y[n] − λ, for y[n] ≥ 0 and y[n] ≥ λ.    (7.108b)

Case 3: y[n] ≤ 0 and |y[n]| ≤ λ

This case, which is identical to case 1 except that now y[n] is negative, leads to the same result, namely

x̂[n] = 0, for y[n] ≤ 0 and |y[n]| ≤ λ.    (7.108c)

Case 4: y[n] ≤ 0 and |y[n]| ≥ λ

Repetition of the analysis of case 2, but with y[n] negative, leads to

x̂[n] = y[n] + λ, for y[n] ≤ 0 and |y[n]| ≥ λ.    (7.108d)

The four cases can be combined into

x̂[n] = { y[n] − λ   for y[n] > +λ,
        { y[n] + λ   for y[n] < −λ,    (7.109)
        { 0          for |y[n]| < λ.

Values of y[n] smaller in absolute value than the threshold λ are thresholded (set) to zero. Values of y[n] larger in absolute value than the threshold λ are shrunk by λ, making their absolute values smaller. So x̂[n] is computed by thresholding and shrinking y[n]. This is usually called (ungrammatically) "thresholding and shrinkage," or "soft thresholding."

The next example shows that denoising images works much better with thresholding and shrinkage than with thresholding alone. When we applied thresholding alone to the noisy image of Fig. 7-19(b), we obtained the image shown in Fig. 7-19(c), which we repeat here in Fig. 7-21(a). Application of thresholding and shrinkage in combination, with λ = 0.11, leads to the image in Fig. 7-21(b), which provides superior rendition of the clown image by filtering much more of the noise, while preserving the real features of the image.

Figure 7-21 Denoising the clown image: (a) denoising by thresholding alone, (b) denoising by thresholding and shrinkage in combination.

Concept Question 7-7: Why is the combination of shrinkage and thresholding needed for noise reduction?

Concept Question 7-8: Why does wavelet-based denoising work so much better than lowpass filtering?
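The rule of Eq. (7.109) is a one-line vectorized operation in MATLAB (a sketch with hypothetical example values):

    % Soft thresholding (Eq. (7.109)): threshold to zero, then shrink by lambda.
    soft = @(y, lambda) sign(y) .* max(abs(y) - lambda, 0);

    y = [-0.3 -0.05 0.02 0.2];
    xhat = soft(y, 0.11)        % yields [-0.19 0 0 0.09]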

7-9 Compressed Sensing

The solution of an inverse problem in signal and image processing is the reconstruction of an unknown signal or image from measurements (known linear combinations) of the values of the signal or image. Such inverse problems arise in medical imaging, radar imaging, optics, and many other fields. For example, in tomography and magnetic resonance imaging (MRI), the inverse problem is to reconstruct an image from measurements of some (but not all) of its 2-D Fourier transform values.

If the number of measurements equals or exceeds the size (duration in 1-D, number of pixels in 2-D) of the unknown signal or image, solution of the inverse problem in the absence of noise becomes a solution of a linear system of equations. In practice, there is always noise in the measurements, so some sort of regularization is required. In Section 6-4.3, the deconvolution problem required Tikhonov regularization to produce a recognizable solution when noise was added to the data. Furthermore, often the number of observations is less than the size of the unknown signal or image. For example, in tomography, the 2-D Fourier transform values of the image at very high wavenumbers are usually unknown. In this case, the inverse problem is underdetermined; consequently, even in the absence of noise there is an infinite number of possible solutions. Hence, regularization is needed, not only to deal with the underdetermined formulation, but also to manage the presence of noise in the measurements.

We have seen that many real-world signals and images can be compressed, using the wavelet transform, into a sparse representation in which most of the values are zero. This suggests that the number of measurements needed to reconstruct the signal or image can be less than the size of the signal or image, because in the wavelet-transform domain, most of the values to be reconstructed are known to be zero. Had the locations of the nonzero values been known, the problem would have been reduced to a solution of a linear system of equations smaller in size than that of the original linear system of equations. In practice, however, neither the locations of the nonzero values nor their values are known.

Compressed sensing refers to a set of signal processing techniques used for reconstructing wavelet-compressible signals and images from measurements that are much fewer in number than the size of the signal or image, but much larger than the number of nonzero values in the wavelet transform of the signal or image. The general formulation of the problem is introduced in the next subsection. There are many advantages to reducing the number of measurements needed to reconstruct the signal or image. In tomography, for example, acquiring fewer measurements reduces patient exposure to radiation. In MRI, it reduces acquisition time inside the MRI machine, and in smartphone cameras, it reduces the exposure time and energy required to acquire an image.

This section presents the basic concepts behind compressed sensing and applies these concepts to a few signal and image inverse problems. Compressed sensing is an active area of research and development, and will experience significant growth in applications in the future.

7-9.1 Problem Formulation

To cast the compressed sensing problem into an appropriate form, we define the following quantities:

(a) {x[n], n = 0, 1, . . . , N − 1} is an unknown signal of length N.

(b) The corresponding (unknown) wavelet transform of x[n] is

{ x̃_1[n], x̃_2[n], . . . , x̃_L[n], X̃_L[n] },

and the wavelet transform of x[n] is sparse: only K values of all of the {x̃_k[n]} are nonzero, with K ≪ N.

(c) { y[n], n = 0, 1, . . . , M − 1 } are M known measurements:

y[0] = a_{0,0} x[0] + a_{0,1} x[1] + · · · + a_{0,N−1} x[N − 1],
y[1] = a_{1,0} x[0] + a_{1,1} x[1] + · · · + a_{1,N−1} x[N − 1],
. . .
y[M − 1] = a_{M−1,0} x[0] + a_{M−1,1} x[1] + · · · + a_{M−1,N−1} x[N − 1],

where {a_{n,i}, n = 0, 1, . . . , M − 1 and i = 0, 1, . . . , N − 1} are known.

(d) K is unknown, but we know that K ≪ M < N.

The goal of compressed sensing is to compute signal {x[n], n = 0, 1, . . . , N − 1} from the M known measurements {y[n], n = 0, 1, . . . , M − 1}.

The compressed sensing problem can be divided into two components, a direct problem and an inverse problem. In the direct problem, the independent variable (input) is x[n] and the dependent variable (output) is the measurement y[n]. The roles are reversed in the inverse problem: the measurements become the independent variables (input) and the unknown signal x[n] becomes the output. The relationships between x[n] and y[n] involve vectors and matrices:
A. Signal vector

x = [x[0], x[1], . . . , x[N − 1]]^T,    (7.110)

where T denotes the transpose operator, which converts a row vector into a column vector. Similarly,

y = [y[0], y[1], . . . , y[M − 1]]^T.    (7.111)

B. Wavelet transform vector

z = [z_1, z_2, . . . , z_N]^T = [x̃_1[n], x̃_2[n], . . . , x̃_L[n], X̃_L[n]]^T.    (7.112)

z is of length N, and x̃_1[n] to x̃_L[n] are the detail signals of the wavelet transform and X̃_L[n] is the coarse signal.

C. Wavelet transform matrix

z = W x,    (7.113)

where W is a known N × N wavelet transform matrix that implements the wavelet transform of x[n] to obtain z.

D. Direct-problem formulation

y = A x,    (7.114)

where A is an M × N matrix. Usually, A is a known matrix based on a physical model or direct measurement of y for a known x. Combining Eqs. (7.113) and (7.114) gives

y = A W^{−1} z = A_w z,    (7.115a)

where

A_w = A W^{−1}.    (7.115b)

Since A and W are both known matrices, A_w also is known.

E. Inverse-problem formulation

If, somehow, z can be determined from the measurement vector y, then x can be computed by inverting Eq. (7.113):

x = W^{−1} z.    (7.116)

For the orthogonal wavelet transforms, such as the Haar and Daubechies transforms covered in Sections 7-4 and 7-6, W^{−1} = W^T, so the inverse wavelet transform can be computed as easily as the wavelet transform. In practice, both are computed using analysis and synthesis filter banks, as discussed in earlier sections.

The crux of the compressed sensing problem reduces to finding z, given y. An additional factor to keep in mind is that only K values of the elements of z are nonzero, with K ≪ M. Algorithms for computing z from y rely on iterative approaches, as discussed in future sections.

Because z is of length N, y of length M, and M < N (fewer measurements than unknowns), Eq. (7.115) represents an underdetermined system of linear equations, whose solution is commonly called an ill-posed problem.

7-9.2 Inducing Sparsity into Solutions

In seismic signal processing, explosions are set off on the Earth's surface, and echoes of the seismic waves created by the explosion are measured by seismometers. In the 1960s, sedimentary media (such as the bottom of the Gulf of Mexico) were modeled as a stack of layers, so the seismometers would record occasional sharp pulses reflected off of the interfaces between the layers. The amplitudes and times of the pulses would allow the layered medium to be reconstructed. However, the occasional pulses had to be deconvolved from the source pulse created by the explosions. The deconvolution problem was modeled as an underdetermined linear system of equations.

A common approach to finding a sparse solution to a system of equations is to choose the solution that minimizes the sum of absolute values of the solution. This is known as the minimum ℓ1-norm solution. The ℓ1 norm is denoted by the symbol ||z||_1 and defined as

||z||_1 = Σ_{i=1}^{N} |z_i|.    (7.117a)

The goal is to find the solution to the system of equations that minimizes ||z||_1.

A second approach, based on the squared ℓ2 norm, does not provide a sparse solution; it finds the solution that minimizes the sum of squares of the solution. The squared ℓ2 norm is denoted by the symbol ||z||_2^2 and is defined as

||z||_2^2 = Σ_{i=1}^{N} |z_i|².    (7.117b)
Concept Question 7-9: What is compressed sensing?

7-10 Computing Solutions to Underdetermined Equations

7-10.1 Basis Pursuit

As noted earlier, the unknown signal vector x can be determined from Eq. (7.116), provided we have a solution for z (because W is a known matrix). One possible approach to finding z is to implement the minimum ℓ1 norm given by Eq. (7.117a), subject to the constraint given by Eq. (7.115a). That is, the goal is to find vector z from measurement vector y such that

Σ_{i=1}^{N} |z_i| is minimum, and y = A_w z.

The solution method is known as basis pursuit, and it can be formulated as a linear programming problem by defining the positive z^+ and negative z^− parts of z as:

z_i^+ = { +z_i   if z_i ≥ 0,
        { 0      if z_i < 0,    (7.118a)

z_i^− = { −z_i   if z_i ≤ 0,
        { 0      if z_i > 0.    (7.118b)

Vector z is then given by

z = z^+ − z^−,    (7.119)

and its ℓ1 norm is

||z||_1 = Σ_{i=1}^{N} (z_i^+ + z_i^−).    (7.120)

In terms of z^+ and z^−, the basis pursuit problem becomes:

Minimize Σ_{i=1}^{N} (z_i^+ + z_i^−)
subject to y = A_w z^+ − A_w z^−,   (z_i^+ ≥ 0, z_i^− ≥ 0),    (7.121)

which is a straightforward linear programming problem that can be solved using linprog in MATLAB's Optimization Toolbox. The basis pursuit method is limited to noise-free signal and image problems, so in the general case, more sophisticated approaches are called for.
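Equation (7.121) maps directly onto linprog. A minimal sketch with a hypothetical 2 × 3 system, stacking the variables as [z+; z−]:

    Aw = [1 0 1; 0 1 1];            % hypothetical measurement matrix
    y  = [2; 3];
    [M, N] = size(Aw);

    f   = ones(2*N, 1);             % cost: sum(z+) + sum(z-) = ||z||_1
    Aeq = [Aw, -Aw];                % constraint: y = Aw z+ - Aw z-
    lb  = zeros(2*N, 1);            % z+ >= 0 and z- >= 0

    zpm = linprog(f, [], [], Aeq, y, lb, []);
    z   = zpm(1:N) - zpm(N+1:end)   % minimum l1-norm solution; here z = [0; 1; 2]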
7-10.2 LASSO Cost Functional

A serious shortcoming of the basis pursuit solution method is that it does not account for the presence of noise in the observations y. As noted earlier in Section 7-8.2, the least absolute shrinkage and selection operator (LASSO) provides an effective approach for estimating the true signal from noisy measurements. In the present context, the LASSO functional is defined as

Λ = (1/2) ||y − A_w z||_2^2 + λ ||z||_1,    (7.122)

where the first term represents fidelity and the second term sparsity, and λ is a trade-off parameter between sparsity of z and fidelity to the measurement y. Choosing the solution z that minimizes Eq. (7.122), for a specified value of λ, is called basis pursuit denoising. The solution requires the use of an iterative algorithm. Two such algorithms are presented in later subsections, preceded by a short review of pseudo-inverses.

7-10.3 Review of Pseudo-Inverses

(a) Overdetermined system

Consider the linear system of equations given by

y = A_w z,    (7.123)

with z of length N, y of length M, and A_w of size M × N with full rank. If M > N, then the system is overdetermined (more measurements than unknowns) and, in general, it has no solution. The vector ẑ that minimizes ||y − A_w z||_2^2 is called the pseudo-inverse solution, and is given by the estimate

ẑ = (A_w^T A_w)^{−1} A_w^T y.    (7.124a)

Note that A_w^T A_w is an N × N matrix with full rank.

To avoid matrix-inversion problems, ẑ should be computed not by inverting A_w^T A_w, but by solving the linear system of equations

(A_w^T A_w) ẑ = A_w^T y    (7.124b)
using the LU decomposition method or similar techniques. Here, LU stands for lower-upper, in reference to factoring the matrix multiplying the unknown vector ẑ into a lower triangular matrix and an upper triangular matrix.

(b) Underdetermined system

Now consider the underdetermined system characterized by M < N (fewer measurements than unknowns). In this case, there is an infinite number of possible solutions. The vector ẑ that minimizes ||z||_2^2 among this infinite number of solutions also is called the pseudo-inverse solution, and is given by the estimate

ẑ = A_w^T (A_w A_w^T)^{−1} y.    (7.125)

In the present case, A_w A_w^T is an M × M matrix with full rank. Solution ẑ should be computed not by inverting A_w A_w^T, but by initially solving the linear system

(A_w A_w^T) r̂ = y,    (7.126a)

to compute an intermediate estimate r̂, and then computing ẑ by applying

ẑ = A_w^T r̂.    (7.126b)

7-10.4 Iterative Reweighted Least Squares (IRLS) Algorithm

According to Eq. (7.115a), measurement vector y and wavelet transform vector z are related by

y = A_w z.    (7.127)

The iterative reweighted least squares (IRLS) algorithm uses the Tikhonov regularization functional given by Eq. (6.29), together with a diagonal weighting matrix D, to minimize the cost function

T = (1/2) ||y − A_w z||_2^2 + λ ||D z||_2^2,    (7.128)

where the first term represents fidelity to the data y and the second term penalizes the size of z, with λ the trade-off parameter between the two. The goal is to trade off small differences between y and A_w z so as to keep z small. To compute z, we first rewrite Eq. (7.128) in the expanded form

T = (1/2) || [y[0], y[1], . . . , y[M − 1]]^T − [a_{i,j}] [z_1, z_2, . . . , z_N]^T ||_2^2
    + λ || diag(D_{11}, . . . , D_{NN}) [z_1, z_2, . . . , z_N]^T ||_2^2,    (7.129)

where a_{i,j} is the (i, j)th element of A_w (of size M × N), not of A; y is M × 1; z is N × 1; and D = diag(D_{11}, . . . , D_{NN}) is N × N.

Both the unknown vector x and its wavelet vector z are of size N × 1. This is in contrast with the much shorter measurement vector y, which is of size M × 1, with M < N.

We now introduce vector y′ and matrix B_w as

y′ = [y; 0]   of size (M + N) × 1,    (7.130a)
B_w = [A_w; √(2λ) D]   of size (M + N) × N.    (7.130b)

Vector y′ is vector y of length M stacked on top of vector 0, which is a column vector of zeros of length N. Similarly, matrix B_w is matrix A_w (of size M × N) stacked on top of matrix D (of size N × N), multiplied by the scalar factor √(2λ).
Next, we introduce the new cost function T1 as

T1 = (1/2) ||y′ − B_w z||_2^2
   = (1/2) || [y; 0] − [A_w; √(2λ) D] z ||_2^2
   = (1/2) ||y − A_w z||_2^2 + λ ||0 − D z||_2^2
   = (1/2) ||y − A_w z||_2^2 + λ ||D z||_2^2 = T.    (7.131)

Hence, Eq. (7.128) can be rewritten in the form

T = (1/2) ||y′ − B_w z||_2^2.    (7.132)

The vector z minimizing T is the pseudo-inverse given by

ẑ = (B_w^T B_w)^{−1} B_w^T y′
  = ( [A_w^T  √(2λ) D^T] [A_w; √(2λ) D] )^{−1} [A_w^T  √(2λ) D^T] [y; 0]
  = (A_w^T A_w + 2λ D^T D)^{−1} A_w^T y.    (7.133)

As always, instead of performing matrix inversion (which is susceptible to noise amplification), vector ẑ should be computed by solving

(A_w^T A_w + 2λ D^T D) ẑ = A_w^T y.    (7.134)

Once ẑ has been determined, the unknown vector x can be computed by solving Eq. (7.113).

To solve Eq. (7.134) for ẑ, however, we need to know A_w, λ, D, and y. From Eq. (7.115b), A_w = A W^{−1}, where A is a known matrix based on a physical model or calibration data, and W is a known wavelet transform matrix. The parameter λ is specified by the user to adjust the intended balance between data fidelity and storage size (as noted in connection with Eq. (7.128)), and y is the measurement vector. The only remaining quantity is the diagonal matrix D, whose function is to assign weights to z_1 through z_N so as to minimize storage size by having many elements of z → 0. Initially, D is unknown, but it is possible to propose an initial function for D and then iterate to obtain a solution for z that minimizes the number of nonzero elements, while still satisfying Eq. (7.134).

The Tikhonov function given by Eq. (7.128) reduces to the LASSO functional given by Eq. (7.122) if

D = diag( 1/√|z_n| ).    (7.135)

This is because the second terms in the two equations become identical:

||D z||_2^2 = Σ_{n=1}^{N} z_n²/|z_n| = Σ_{n=1}^{N} |z_n| = ||z||_1.    (7.136)

Given this correspondence between the two cost functionals, the IRLS algorithm uses the following iterative procedure to find z:

(a) Initial solution: Set D = I and then compute z^(1), the initial iteration of z, by solving Eq. (7.134).

(b) Initial D: Use z^(1) to compute D^(1), the initial iteration of D:

D^(1) = diag( 1/√(|z_n^(1)| + ε) ).    (7.137)

(c) Second iteration: Use D^(1) to compute z^(2) by solving Eq. (7.134) again.

(d) Recursion: Continue to iterate by computing D^(k) from z^(k) using

D^(k) = diag( 1/√(|z_n^(k)| + ε) ),    (7.138)

for a small deviation ε inserted in the expression to keep D^(k) finite when elements of z^(k) → 0.

The iterative process ends when no significant change occurs between successive iterations. The algorithm, also called the focal underdetermined system solver (FOCUSS), is guaranteed to converge under mild assumptions. However, because the method requires a solution of a large system of equations at each iteration, the algorithm is considered unsuitable for most signal and image processing applications. Superior-performance algorithms are introduced in succeeding sections.

Concept Question 7-10: How can we get a useful solution to an underdetermined system of equations?
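A compact sketch of the IRLS/FOCUSS recursion (Eqs. (7.134), (7.137), and (7.138)); A_w and y are assumed given, a fixed iteration count stands in for the stopping test, and all names are hypothetical:

    lambda = 0.01;  eps0 = 1e-8;
    N = size(Aw, 2);
    d = ones(N, 1);                              % D = I for the initial solution
    for k = 1:50
        DtD = diag(d.^2);                        % D'*D
        z = (Aw'*Aw + 2*lambda*DtD) \ (Aw'*y);   % solve Eq. (7.134)
        d = 1 ./ sqrt(abs(z) + eps0);            % update D per Eq. (7.138)
    end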
7-11 Landweber Algorithm

The Landweber algorithm is a recursive algorithm for solving linear systems of equations y = Ax. The iterative shrinkage and thresholding algorithm (ISTA) consists of the Landweber algorithm, with thresholding and shrinkage applied at each recursion. Thresholding and shrinkage were used in Section 7-8 to minimize the LASSO functional.

7-11.1 Underdetermined System

For an underdetermined system y = Ax with M < N, the solution x̂ that minimizes the sum of squares of the elements of x is, by analogy with Eq. (7.125), given by

x̂ = A^T (A A^T)^{−1} y.    (7.139)

A useful relationship in matrix algebra states that if all of the eigenvalues λi of (A A^T) lie in the interval 0 < λi < 2, then the coefficient of y in Eq. (7.139) can be written as

A^T (A A^T)^{−1} = Σ_{k=0}^{∞} (I − A^T A)^k A^T.    (7.140)

Using Eq. (7.140) in Eq. (7.139) leads to

x̂ = Σ_{k=0}^{∞} (I − A^T A)^k A^T y.    (7.141)

A recursive implementation of Eq. (7.141) assumes the form

x^(K+1) = Σ_{k=0}^{K} (I − A^T A)^k A^T y,    (7.142)

where the upper limit in the summation is now K (instead of ∞). For K = 0 and K = 1, we obtain the expressions

x^(1) = A^T y,    (7.143a)
x^(2) = A^T y + (I − A^T A) A^T y = (I − A^T A) x^(1) + A^T y.    (7.143b)

Extending the process to K = 2 gives

x^(3) = A^T y + (I − A^T A) A^T y + (I − A^T A)² A^T y
      = x^(2) + A^T (y − A x^(2)).    (7.143c)

Continuing the pattern leads to

x^(k+1) = x^(k) + A^T (y − A x^(k)).    (7.144)

The process can be initialized by x^(0) = 0, which makes x^(1) = A^T y, as it should.

The recursion process is called the Landweber iteration, which in optics is known as the van Cittert iteration. It is guaranteed to converge to the solution of y = Ax that minimizes the sum of squares of the elements of x, provided that the eigenvalues λi of A A^T are within the range 0 < λi < 2. If this condition is not satisfied, the formulation may be scaled to

y/c = (A/c) x

or, equivalently,

u = B x,    (7.145)

where u = y/c and B = A/c. The constant c is chosen so that the eigenvalue condition is satisfied. For example, if c is chosen to be equal to the sum of the squares of the magnitudes of all of the elements of A, then the eigenvalues λi′ of B B^T will be in the range 0 < λi′ < 1.

The symbol λi for eigenvalue is unrelated to the trade-off parameter λ in Eq. (7.128).

7-11.2 Overdetermined System

In analogy with Eq. (7.124a), the solution for an overdetermined system y = Ax is given by

x̂ = (A^T A)^{−1} A^T y.    (7.146)

Using the equality

(A^T A)^{−1} = Σ_{k=0}^{∞} (I − A^T A)^k,    (7.147)

we can rewrite Eq. (7.146) in the same form as Eq. (7.141), namely

x̂ = Σ_{k=0}^{∞} (I − A^T A)^k A^T y.    (7.148)

Hence, the Landweber algorithm is equally applicable to solving overdetermined systems of linear equations.
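The Landweber recursion is a few lines of MATLAB. A sketch using the scaling suggested above (A and y are assumed given; names are hypothetical, and a fixed iteration count stands in for a convergence test):

    c = sum(abs(A(:)).^2);            % scaling constant from Eq. (7.145)
    B = A / c;   u = y / c;           % eigenvalue condition now satisfied
    x = zeros(size(A, 2), 1);         % x(0) = 0, so the first iterate is B'*u
    for k = 1:1000
        x = x + B' * (u - B * x);     % Landweber update, Eq. (7.144)
    end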


7-11.3 Iterative Shrinkage and Thresholding Algorithm (ISTA)

For a linear system given by

y = A x,    (7.149)

where x is the unknown signal of length N and y is the (possibly noisy) observation of length M, the LASSO cost functional is

Λ = (1/2) Σ_{n=0}^{M−1} (y[n] − (Ax)[n])² + λ Σ_{n=0}^{N−1} |x[n]|,    (7.150)

where matrix A is M × N and (Ax)[n] is the nth element of Ax.

◮ In the system described by Eq. (7.149), x and y are generic input and output vectors. The Landweber algorithm provides a good estimate of x, given y. The estimation algorithm is equally applicable to any other linear system, including the system y = A_w z, where A_w is the matrix given by Eq. (7.115b) and z is the wavelet transform vector. ◭

The ISTA combines the Landweber algorithm with the thresholding and shrinkage operation outlined earlier in Section 7-8.3, and summarized by Eq. (7.109). After each iteration, elements x^(k)[n] of vector x^(k) whose absolute values are smaller than the trade-off parameter λ are thresholded to zero, and those whose absolute values are larger than λ are shrunk by λ. Hence, the ISTA combines Eq. (7.144) with Eq. (7.109):

x^(0) = 0,    (7.151a)
x^(k+1) = x^(k) + A^T (y − A x^(k)),    (7.151b)

with

x_i^(k+1) = { x_i^(k+1) − λ   if x_i^(k+1) > λ,
            { x_i^(k+1) + λ   if x_i^(k+1) < −λ,    (7.151c)
            { 0               if |x_i^(k+1)| < λ,

where x_i^(k+1) is the ith component of x^(k+1).

The ISTA converges to the value of x that minimizes the LASSO functional given by Eq. (7.150), provided all of the eigenvalues λi of A A^T obey |λi| < 1.

The combination of thresholding and shrinking is often called soft thresholding, while thresholding small values to zero without shrinking is called hard thresholding. There are many variations on ISTA, with names like SPARSA (sparse reconstruction by separable approximation), FISTA (fast iterative shrinkage and thresholding algorithm), and TWISTA (two-step iterative shrinkage and thresholding algorithm).

Concept Question 7-11: Why do we need an iterative algorithm to find the LASSO-minimizing solution?
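ISTA adds one soft-thresholding line to the Landweber loop. A sketch (A is assumed already scaled so that the eigenvalues of A*A' have magnitude below 1; names are hypothetical):

    lambda = 0.01;
    x = zeros(size(A, 2), 1);                    % Eq. (7.151a)
    for k = 1:1000
        x = x + A' * (y - A * x);                % Landweber step, Eq. (7.151b)
        x = sign(x) .* max(abs(x) - lambda, 0);  % soft threshold, Eq. (7.151c)
    end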
7-12 Compressed Sensing Examples

To illustrate the utility of the ISTA described in the preceding section, we present four examples of compressed sensing:

• Reconstruction of an image from some, but not all, of its 2-D DFT values.

• Image inpainting, which entails filling in holes (missing pixel values) in an image.

• Valid deconvolution of an image from only part of its convolution with a known point spread function (PSF).

• Tomography, which involves reconstruction of a 2-D image from radial slices of its 2-D DFT.

These are only a few of many more possible types of applications of compressed sensing.

The ISTA was used in all four cases, and the maximum number of possible iterations was set at 1000, or fewer if the algorithm converges to where no apparent change is observed in the reconstructed images. The LASSO functional parameter was set at λ = 0.01, as this value seemed to provide the best results.

7-12.1 Image Reconstruction from Subset of DFT Values

Suppose that after computing the 2-D DFT of an image x[n, m], some of the DFT values X[k1, k2] were lost or no longer available. The goal is to reconstruct image x[n, m] from the partial subset of its DFT values. Since the available DFT values are fewer than those of the unknown signal, the system is underdetermined and the application is a good illustration of compressed sensing.

For a 1-D signal { x[n], n = 0, 1, . . . , N − 1 }, its DFT can be implemented by the matrix-vector product

y = A x,    (7.152)

where the (k, n)th element of A is A_{k,n} = e^{−j2πnk/N}, the nth element of vector x is x[n], and the kth element of vector y is
X[k]. Multiplication of both sides by A^H implements an inverse 1-D DFT within a factor 1/N.

The 2-D DFT of an N × N image can be implemented by multiplication by an N² × N² block matrix B whose (k2, n2)th block is the N × N matrix A multiplied by the scalar e^{−j2πn2k2/N}. So the element B_{k1+Nk2, n1+Nn2} of B is e^{−j2πn1k1/N} e^{−j2πn2k2/N}, where 0 ≤ n1, n2, k1, k2 ≤ N − 1.

The absence of some of the DFT values is equivalent to deleting some of the rows of B and y, thereby establishing an underdetermined linear system of equations. To illustrate the reconstruction process for this underdetermined system, we consider two different scenarios applied to the same set of data.

◮ For convenience, we call the values X[k1, k2] of the 2-D DFT the pixels of the DFT image. ◭

(a) Least-squares reconstruction

Starting with the (256 × 256) image shown in Fig. 7-22(a), we compute its 2-D DFT and then we randomly select a subset of the pixels in the DFT image and label them as unknown. The complete 2-D DFT image consists of 256² = 65536 pixel values of X[k1, k2]. Of those, 35755 are unaltered (and therefore have known values), and the other 29781 pixels have unknown values. Figure 7-22(b) displays the locations of pixels with known DFT values as white dots and those with unknown values as black dots.

In the least-squares reconstruction method, all of the pixels with unknown values are set to zero, and then the inverse 2-D DFT is computed. The resulting image, shown in Fig. 7-22(c), is a poor rendition of the original image in part (a) of the figure.

(b) ISTA reconstruction

The ISTA reconstruction process consists of two steps:

(1) Measurement vector y, representing the 35755 DFT pixels with known values, is used to estimate the (entire 65536-element) wavelet transform vector z by applying the recipe outlined in Section 7-11.3 with λ = 0.01 and 1000 iterations. The relationship between y and z is given by y = A_w z, with A_w = B W^{−1} and some rows of B deleted.

(2) Vector z is then used to reconstruct x by applying the relation x = W^{−1} z.

The reconstructed image, displayed in Fig. 7-22(d), is an excellent rendition of the original image.

It is important to note that while B and W^{−1} are each (N² × N²), with N² = 65536, neither matrix is ever computed or stored during the implementation of the reconstruction process. Multiplication by W^{−1} is implemented by a 2-D filter bank, and multiplication by B is implemented by a 2-D FFT. Consequently, ISTA is a very fast algorithm.
wavelet transform vector z by applying the recipe outlined in another application is “wire removal” in movies, the elimination
Section 7-11.3 with λ = 0.01 and 1000 iterations. The relation- of wires used to suspend actors or objects used for an action
ship between y and z is given by y = Aw z, with Aw = BW and stunt in a movie scene.
some rows of B deleted. In all of these cases, damage to the painting, or presence of
(2) Vector z is then used to reconstruct x by applying the unwanted objects in the image, has made some small regions
relation x = W−1 z. of the painting or image unknown. The goal is to fill in the
The reconstructed image, displayed in Fig. 7-22(d), is an unknown values to restore the (digitized) painting or image to
excellent rendition of the original image. its original version.
It is important to note that while B and W−1 are each Using a (200 × 200) = 40000-pixel clown image, 19723 pix-
(N 2 × N 2 ), with N = 65536, neither matrix is ever computed or els were randomly selected and their true values were deleted.
Figure 7-22 (a) Original Shepp-Logan phantom image, (b) 2-D DFT image with locations of pixels of known values displayed in white and those of unknown values displayed in black, (c) reconstructed image using available DFT pixels, and (d) reconstructed image after filling in missing DFT pixel values with estimates provided by ISTA.

In the image shown in Fig. 7-23(a), the locations of pixels with unknown values are painted black, while the remaining half (approximately) have their correct values. The goal is to reconstruct the clown image from the remaining half.

In terms of the formulation y = A W^T z, M = 20277 and N = 40000, so that just over half of the clown image pixel values are known. The ISTA is a good algorithm to solve this compressed sensing problem, since the matrix-vector multiplication y = A W^T z can be implemented quickly by taking the inverse wavelet transform of the current iteration (multiplication by W^T), and then selecting a subset of the pixel values (multiplication by A). The result, after 1000 iterations, is shown in
Fig. 7-23(b). The image has been reconstructed quite well, but not perfectly. The db3 Daubechies wavelet transform was used to sparsify the image.

Figure 7-23 (a) Locations of known values of the clown image shown in white and those of unknown values in black; (b) restored (inpainted) image.

7-12.3 Valid 2-D Deconvolution

(a) Definition of valid convolution

Given an M × M image x[n, m] and an L × L point spread function (PSF) h[n, m], their 2-D convolution generates an (M + L − 1) × (M + L − 1) blurred image y[n, m]. The process is reversible: the blurred image can be deconvolved to reconstruct x[n, m] by subjecting y[n, m] to a Wiener filter, as described earlier in Section 6-4.3. To do so, however, requires that all of y[n, m] be known.

Often, we encounter deconvolution applications where only a fraction of y[n, m] is known, specifically, the part of y[n, m] called the valid convolution. This is the part whose convolution computation does not require the image x[n, m] to be zero-valued outside the square 0 ≤ n, m ≤ M − 1.

For L < M (image larger than PSF), the valid 2-D convolution of h[n, m] and x[n, m] is defined as

yV[n, m] = Σ_{i=0}^{M−1} Σ_{j=0}^{M−1} x[i, j] h[n − i, m − j]
         = h[n, m] ∗ x[n, m], restricted to { L − 1 ≤ n, m ≤ M − 1 }.    (7.153)

A valid convolution omits all end effects in 1-D convolution and all edge effects in 2-D convolution. Consequently, the size of yV[n, m] is (M − L + 1) × (M − L + 1), instead of (M + L − 1) × (M + L − 1) for the complete convolution y[n, m].

To further illustrate the difference between y[n, m] and yV[n, m], let us consider the following example:

x[n, m] = [1 2 3; 4 5 6; 7 8 9]   and   h[n, m] = [11 12; 13 14].

Since M = 3 and L = 2, the 2-D convolution is (M + L − 1) × (M + L − 1) = 4 × 4, and y[n, m] is given by

y[n, m] = [  11  34  57  36 ;
             57 143 193 114 ;
            129 293 343 192 ;
             91 202 229 126 ].

In contrast, the size of the valid 2-D convolution is (M − L + 1) × (M − L + 1) = 2 × 2,
and yV[n, m] is given by

yV[n, m] = [143 193; 293 343].

The valid convolution yV[n, m] is the central part of y[n, m], obtained by deleting the edge rows and columns from y[n, m]. In MATLAB, the valid 2-D convolution of X and H can be computed using the command

Y = conv2(X,H,'valid').

(b) Reconstruction from yV[n, m]

The valid 2-D deconvolution problem is to reconstruct an unknown image from its valid 2-D convolution with a known PSF. The 2-D DFT and Wiener filter cannot be used here, since not all of the blurred image y[n, m] is known. It may seem that we may simply ignore, or set to zero, the unknown parts of y[n, m] and still obtain a decent reconstructed image using a Wiener filter, but as we will demonstrate with an example, such an approach does not yield fruitful results.

The valid 2-D deconvolution problem is clearly underdetermined, since the (M − L + 1) × (M − L + 1) portion of the blurred image is smaller than the M × M unknown image. But if x[n, m] is sparsifiable, then valid 2-D deconvolution can be formulated as a compressed sensing problem and solved using the ISTA. The matrix A turns out to be a block Toeplitz with Toeplitz blocks matrix, but multiplication by A is implemented as a valid 2-D convolution, and multiplication by A^T is implemented as another valid 2-D convolution.

The valid 2-D convolution can be implemented as yV = Ax, where

x = [1 2 3 4 5 6 7 8 9]^T,
yV = [143 193 293 343]^T,

and the matrix A is composed of the elements of h[n, m] as follows:

A = [ 14 13  0 12 11  0  0  0  0 ;
       0 14 13  0 12 11  0  0  0 ;
       0  0  0 14 13  0 12 11  0 ;
       0  0  0  0 14 13  0 12 11 ].

Note that A is a 2 × 3 block matrix of 2 × 3 blocks. Each block is constant along its diagonals, and the blocks are constant along block diagonals. This is the block Toeplitz with Toeplitz blocks structure. Also note that images x[n, m] and yV[n, m] have been unwrapped row by row, starting with the top row, and the transposes of the rows are stacked into a column vector. Finally, note that multiplication by A^T can be implemented as a valid 2-D convolution with the doubly reversed version of h[n, m]. For example, if

z[n, m] = [1 2 3 4; 5 6 7 8; 9 10 11 12; 13 14 15 16]

and

g[n, m] = [14 13; 12 11] = h[1 − n, 1 − m],   with n, m = 0, 1,

then the valid 2-D convolution of z[n, m] and g[n, m] is

wV[n, m] = [184 234 284; 384 434 484; 584 634 684].

This valid 2-D convolution can also be implemented as w = A^T z, where

z = [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16]^T

and

w = [184 234 284 384 434 484 584 634 684]^T.

To illustrate the process with an image, we computed the valid 2-D convolution of the (200 × 200) clown image with a (20 × 20) PSF. The goal is to reconstruct the clown image from the (181 × 181) blurred image shown in Fig. 7-24(a). The db3 Daubechies wavelet function was used to sparsify the image. Here, M = 200 and L = 20, so the valid 2-D convolution has size (M − L + 1) × (M − L + 1) = 181 × 181. In terms of yV = Ax, A is (181² × 200²) = 32761 × 40000.

Parts (b) and (c) of Fig. 7-24 show reconstructed versions of the clown image, using a Wiener filter and ISTA, respectively. Both images involve deconvolution using the restricted valid convolution data yV[n, m]. In the Wiener-filter approach, the unknown parts of the blurred image (beyond the edges of yV[n, m]) were ignored, and the resultant image bears no real resemblance to the original clown image. In contrast, the ISTA approach provides excellent reconstruction of the original image. This is because ISTA is perfectly suited for solving underdetermined systems of linear equations with sparse solutions.
Figure 7-24 (a) Valid 2-D convolution yV[n, m] of clown image, (b) deconvolution using Wiener filter, and (c) deconvolution using ISTA.

7-12.4 Computed Axial Tomography (CAT)

The basic operation of the CAT scanner was described in the opening chapter of this book using Fig. 1-24, which we reproduce here as Fig. 7-25. For a source-to-detection path through a body at a radius r and at orientation θ, the path attenuation is given by Eq. (1.18) as

p(r, θ) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} α(ξ, η) δ(r − ξ cos θ − η sin θ) dξ dη,    (7.154)

where α(ξ, η) is the absorption coefficient of the body under test at location (ξ, η) in Cartesian coordinates or (r, θ) in polar coordinates. The impulse function δ(r − ξ cos θ − η sin θ) dictates that only those points in the (ξ, η) plane that fall along the path specified by fixed values of (r, θ) are included in the integration.

The relation between p(r, θ) and α(ξ, η) is known as the 2-D Radon transform of α(ξ, η). The goal of CAT is to reconstruct α(ξ, η) from the measured path attenuations p(r, θ), by inverting the Radon transform given by Eq. (7.154). We do so with the help of the Fourier transform.
Figure 7-25 (a) CAT scanner, with an X-ray source producing a fan beam of X-rays measured by a detector array connected to a computer and monitor; (b) X-ray path along the ξ direction through an object with absorption coefficient α(ξ, η), for which I(ξ0, η0) = I0 exp(−∫_0^{ξ0} α(ξ, η0) dξ); and (c) X-ray path at radius r and orientation θ.

Recall from entry #1 in Table 2-5 that for variable r, F{δ(r)} = 1, and from entry #3 in Table 2-4 that the shift property is

F{x(r − r0)} = X(f) e^{−j2πf r0}.

The combination of the two properties leads to

F{δ(r − ξ cos θ − η sin θ)}
  = ∫_0^∞ δ(r − ξ cos θ − η sin θ) e^{−j2πf r} dr
  = e^{−j2πf(ξ cos θ + η sin θ)} = e^{−j2π(μξ + νη)},    (7.155)

where we define spatial frequencies μ and ν as

μ = f cos θ,    (7.156a)
ν = f sin θ.    (7.156b)

Next, let us define A(μ, ν) as the 2-D Fourier transform of the absorption coefficient α(ξ, η) using the relationship given by Eq. (3.16a):

A(μ, ν) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} α(ξ, η) e^{−j2πμξ} e^{−j2πνη} dξ dη.    (7.157)

If we know A(μ, ν), we can perform an inverse 2-D Fourier transform to retrieve α(ξ, η). To do so, we need to relate A(μ, ν) to the measured path attenuation profiles p(r, θ). To that end, we use Eq. (7.154) to compute P(f, θ), the 1-D Fourier transform of p(r, θ):

P(f, θ) = ∫_0^∞ p(r, θ) e^{−j2πf r} dr
        = ∫_0^∞ [ ∫_{−∞}^{∞} ∫_{−∞}^{∞} α(ξ, η) δ(r − ξ cos θ − η sin θ) dξ dη ] e^{−j2πf r} dr.    (7.158)

By reversing the order of integration, we have

P(f, θ) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} α(ξ, η) [ ∫_0^∞ δ(r − ξ cos θ − η sin θ) e^{−j2πf r} dr ] dξ dη.    (7.159)

We recognize the integral inside the square bracket as the
7-12 COMPRESSED SENSING EXAMPLES 249

We recognize the integral inside the square bracket as the Fourier transform of the shifted impulse function, as given by Eq. (7.155). Hence, Eq. (7.159) simplifies to

P( f , θ) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} α(ξ, η) e^{−j(2πµξ + 2πνη)} dξ dη,  (7.160)

which is identical to Eq. (7.157). Hence,

A(µ, ν) = P( f , θ),  (7.161)

where A(µ, ν) is the 2-D Fourier transform of α(ξ, η), and P is the 1-D Fourier transform (with respect to r) of p(r, θ). The variables (µ, ν) and ( f , θ) are related by Eq. (7.156).

If p(r, θ) is measured for all r across the body of interest and for all directions θ, then its 1-D Fourier transform P( f , θ) can be computed, and then converted to A(µ, ν) using Eq. (7.156). The conversion is called the projection-slice theorem. In practice, however, p(r, θ) is measured for only a finite number of angles θ, so A(µ, ν) is known only along radial slices in the 2-D wavenumber domain (µ, ν). Reconstruction to find α(ξ, η) from a subset of its 2-D Fourier transform values is a perfect example of compressed sensing.

Image reconstruction from partial radial slices

To demonstrate the CAT reconstruction process, we computed the 2-D DFT X[k1, k2] of a 256 × 256 Shepp-Logan phantom image, and then retained the data values corresponding to only 12 radial slices, as shown in Fig. 7-26(a). These radial slices simulate P( f , θ), corresponding to 12 radial measurements p(r, θ). In terms of y = Ax, the number of pixels in the frequency-domain image is N = 65536, and the number of values contained in the 12 radial slices is M = 11177. The Haar transform was used to sparsify the image.

(a) Least-squares reconstruction: Unknown values of X[k1, k2] were set to zero, and then the inverse 2-D DFT was computed. The resulting image is displayed in Fig. 7-26(b).

(b) ISTA reconstruction: Application of ISTA with λ = 0.01 for 1000 iterations led to the image in Fig. 7-26(c), which bears very good resemblance to the original image.

Figure 7-26 Shepp-Logan phantom image reconstruction from partial radial slices of its 2-D DFT: (a) locations of the known values of X[k1, k2], (b) least-squares reconstruction, and (c) ISTA reconstruction.
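For readers who wish to reproduce the least-squares step, the following minimal MATLAB sketch is our own illustration, not the book's P722.m; the exact slice geometry (nearest-pixel rounding, 12 equally spaced angles) and the use of the Image Processing Toolbox function phantom are assumptions:

    N = 256;
    f = phantom(N);                        % Shepp-Logan phantom (IP Toolbox)
    F = fftshift(fft2(f));                 % centered 2-D DFT X[k1,k2]
    mask = false(N);                       % locations of known DFT values
    c = N/2 + 1;                           % center (DC) pixel
    t = -N/2:N/2-1;
    for theta = (0:11)*pi/12               % 12 equally spaced slice angles
        k1 = round(c + t*cos(theta));
        k2 = round(c + t*sin(theta));
        ok = k1 >= 1 & k1 <= N & k2 >= 1 & k2 <= N;
        mask(sub2ind([N N], k1(ok), k2(ok))) = true;
    end
    Fls = zeros(N);
    Fls(mask) = F(mask);                   % unknown DFT values set to zero
    f_ls = real(ifft2(ifftshift(Fls)));    % least-squares reconstruction
    imshow(f_ls, []);                      % compare with Fig. 7-26(b)

An ISTA loop would then alternate a Landweber update that re-imposes the known DFT values with thresholding and shrinking of the Haar wavelet coefficients.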

Summary

Concepts

• The wavelet transform of an image is an orthonormal expansion of the image using basis functions that are localized in wavenumber or space.
• The 1-D wavelet transform of a piecewise-polynomial signal is sparse (mostly zero-valued). This is why it is useful.
• The wavelet and inverse wavelet transforms are implemented using filter banks and cyclic convolutions. This makes their computation very fast.
• The filters in the tree-structured filter banks used to implement wavelet and inverse wavelet transforms must satisfy the Smith-Barnwell condition for perfect reconstruction, and also form a quadrature-mirror pair.
• An image can be compressed by thresholding its 2-D wavelet transform.
• An image can be denoised by thresholding and shrinking its 2-D wavelet transform. This preserves edges while reducing noise. Thresholding and shrinkage minimizes the LASSO cost functional, which favors sparsity.
• Compressed sensing allows an image to be reconstructed from fewer linear combinations of its pixel values than the number of pixel values, using the ISTA algorithm or (rarely) basis pursuit.
• The ISTA algorithm applies thresholding and shrinkage at each iteration of the Landweber algorithm.
• Applications of compressed sensing include: tomography, image inpainting, and valid deconvolution.
Mathematical Formulae

Zero-stuffing:  y[n] = x[n/2] for n even;  y[n] = 0 for n odd

Decimation:  y[n] = x[2n]

Smith-Barnwell condition:  (h[n] ∗ h[−n])(−1)^n + (h[n] ∗ h[−n]) = 2δ[n]

Haar functions:  g[n] = (1/√2){ 1, 1 }  and  h[n] = (1/√2){ 1, −1 }

QMF relation:  g[n] = −(−1)^n h[L − n]
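As a quick numerical illustration (a sketch of ours, not one of the chapter's programs), the following MATLAB lines verify that the Haar pair satisfies both the Smith-Barnwell condition and the QMF relation with L = 1:

    g = [1 1]/sqrt(2);                 % Haar lowpass filter
    h = [1 -1]/sqrt(2);                % Haar highpass filter
    r = conv(h, fliplr(h));            % h[n] * h[-n], supported on n = -1..1
    n = -1:1;                          % index vector for r
    r.*(-1).^n + r                     % Smith-Barnwell: returns [0 2 0] = 2*delta[n]
    L = 1;  m = 0:1;
    -(-1).^m .* h(L - m + 1)           % QMF relation: returns g = [1 1]/sqrt(2)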

Important Terms  Provide definitions or explain the meaning of the following terms:

average image, basis pursuit, compressed sensing, cyclic convolution, Daubechies, dbK, decimation, detail image, Haar, ISTA algorithm, Landweber algorithm, LASSO functional, orthonormal basis, quadrature mirror filter pair, Shepp-Logan phantom, shrinkage, Smith-Barnwell condition, sparse, subband decomposition, thresholding, tree-structured filter banks, zero-stuffing

PROBLEMS

Section 7-2: Expansions of Signals in Orthogonal Basis Functions

7.1 The continuous-time Haar functions are defined as

φ(t) = 1 for 0 < t < 1, and 0 otherwise,

ψ(t) = 1 for 0 < t < 1/2; −1 for 1/2 < t < 1; and 0 otherwise,

ψ_{m,n}(t) = 2^{m/2} ψ(2^m t − n).

Let B = {φ(t), ψ_{m,n}(t), m, n integers} and let F be the set of piecewise-constant functions with support (nonzero region) 0 ≤ t ≤ 1 whose values change at t = m/2^N.
(a) Show that any member of F is a linear combination of elements of B.
(b) Show that B is an orthonormal basis for F. Hint: Draw pictures.

7.2 Let B = {e^{j2πkt}, k integer, 0 ≤ t ≤ 1} and F be the set of continuous functions with support (nonzero region) 0 ≤ t ≤ 1. Show that B is an orthonormal basis for F.

Section 7-4: Haar Wavelet Transforms

7.3 Let x[n] = {4, 4, 4, 1, 1, 1, 1, 7, 7, 7, 7, 5, 5, 5, 5, 4}.
(a) Compute all of the signals in the Haar analysis filter bank of Fig. 7-8.
(b) Check your answers using Rayleigh's (Parseval's) theorem. You may use MATLAB.

7.4 Let x[n] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}.
(a) Compute all of the signals in the Haar analysis filter bank of Fig. 7-8.
(b) Check your answers using Rayleigh's (Parseval's) theorem. You may use MATLAB.

Section 7-5: Discrete-Time Wavelet Transforms

7.5 Let h[n] = {a, b, 1/2, −1/2}. Find a and b such that h[n] satisfies the Smith-Barnwell condition given by Eq. (7.57).

7.6 If h[n] = {a, b, c, d}, find g[n] such that g[n] and h[n] are a QMF pair given by Eq. (7.52a).

7.7 Why are time-reversals used in the synthesis filters? Show that using g[n] and h[n] instead of g[−n] and h[−n] for the synthesis filters, and then h[n] = (−1)^n g[n] for the QMF, perfect reconstruction is possible only if h[n] and g[n] have DTFTs

G(Ω) = 1 for 0 ≤ Ω < π/2, and 0 for π/2 < Ω < π,

and

H(Ω) = 0 for 0 ≤ Ω < π/2, and 1 for π/2 < Ω < π,

which constitutes an octave-band filter bank. Note that h[−n] = h[n] and g[−n] = g[n]. Hint: Replace g[−n] with g[n] and h[−n] with h[n] in Eq. (7.48).

7.8 Repeat Problem 7.7, except now change the synthesis filters from g[−n] and h[−n] to g[n] and −h[n], and then use h[n] = (−1)^n g[n] for the new QMF.
(a) Show that one equation for perfect reconstruction is now automatically satisfied.
(b) Show that the other equation still cannot be satisfied by any g[n].
Hint: Replace g[−n] with g[n] and h[−n] with −h[n] in Eq. (7.48) and use Eq. (7.34).

Section 7-6: Sparsification Using Wavelets of Piecewise-Polynomial Signals

7.9 Use the Smith-Barnwell condition given by Eq. (7.62) to design the db2 Daubechies wavelet function. Confirm that your answer matches the coefficients listed in Table 7-3. Do this by equating coefficients of time n. You should get a large linear system of equations and a small nonlinear system of equations.

7.10 Use the Smith-Barnwell condition given by Eq. (7.62) to design the db2 Daubechies wavelet function. Confirm that your answer matches the coefficients listed in Table 7-3. Use q[n] = {q[0], q[1]} = q[0]{1, b}, where b = q[1]/q[0]. This avoids the large linear system of equations and the simultaneous quadratic equations of the previous problem.

Section 7-7: 2-D Wavelet Transform

7.11 Use haar.m to compute the 2-D Haar transform of the image in letters.mat. Set sigma=0 and lambda=0 in the first line of haar.m. Also depict the image reconstructed from the wavelet transform.

7.12 Use daub.m to compute the 2-D db3 transform of the SAR image in sar.mat. Change the first line to

load sar.mat;sigma=0;lambda=0;

Also depict the image reconstructed from the wavelet transform.

Section 7-8: Wavelet-Based Denoising by Thresholding and Shrinkage

7.13 This problem investigates denoising the letters image using the wavelet transform, by thresholding and shrinking the 2-D Haar transform of the noisy image.
(a) Run haar.m. This adds noise to the image in letters.mat, computes its 2-D Haar transform, thresholds and shrinks this wavelet transform, computes the inverse 2-D Haar wavelet transform of the result, and displays images. The threshold and shrinkage uses λ = 70.
(b) Why does this work better than the 2-D DFT or convolution with a lowpass filter?

7.14 This problem investigates denoising an MRI head image using the wavelet transform, by thresholding and shrinking the 2-D Haar transform of the noisy image.
(a) Run the program haar.m. This adds noise to the MRI image, computes its 2-D Haar transform, thresholds and shrinks this wavelet transform, computes the inverse 2-D Haar wavelet transform of the result, and displays images. Change load letters.mat to load mri.mat;. The threshold and shrinkage uses λ = 150.
(b) Why does this work better than the 2-D DFT or convolution with a lowpass filter?

7.15 This problem investigates denoising an MRI head image using the wavelet transform, by thresholding and shrinking the 2-D db3 transform of the noisy image.
(a) Download and run the program daub.m. This adds noise to the MRI image, computes its 2-D db3 transform, thresholds and shrinks this wavelet transform, computes the inverse 2-D db3 wavelet transform of the result, and displays images. Change the first line to

load mri.mat;sigma=50;lambda=100;

The threshold and shrinkage uses λ = 100.
(b) Why does this work better than the 2-D DFT or convolution with a lowpass filter?

7.16 This problem investigates denoising an SAR image using wavelet transforms, by thresholding and shrinking the 2-D db3 transform of the noisy image.
(a) Download and run the program daub.m. This adds noise to the SAR image, computes its 2-D db3 transform, thresholds and shrinks this wavelet transform, computes the inverse 2-D db3 wavelet transform of the result, and displays images. Change the first line to

load sar.mat;sigma=50;lambda=100;

The threshold and shrinkage uses λ = 100, and the signal-to-noise ratio is about 6.1.
(b) Why does this work better than the 2-D DFT or convolution with a lowpass filter?

Section 7-9: Compressed Sensing

7.17 Even if a compressed sensing problem is only slightly underdetermined, and it has a mostly sparse solution, there is no guarantee that the sparse solution is unique. The worst case for compressed sensing is as follows. Let:
(a) a_{m,n} = e^{−j2πmn/N}, n = 0, . . . , N − 1, m = 0, . . . , N − 1; skip every m that is a multiple of N/L.
(b) x_n = 1 for n a multiple of L, and x_n = 0 for n not a multiple of L.
(c) For N = 12 and L = 4: m = {1, 2, 4, 5, 7, 8, 10, 11} and {n = 0, 4, 8}, so that in this case A is 8 × 12 and the sparsity (number of nonzero x_n) is K = 3.
Show that y_m = 0! {x_n} is called the Dirac comb. Its significance: Let x_n be any signal of length 12 with nonzero elements only at {n = 0, 4, 8}. Let y_m = ∑_{n=0}^{11} a_{m,n} x_n. Then adding the Dirac comb to {x_n} won't alter {y_m}, so {x_n} plus any multiple of the Dirac comb is a K-sparse solution.

Section 7-10: Computing Solutions to Underdetermined Systems

7.18 Derive the pseudoinverse given by Eq. (7.124), which is the solution x̂ to the overdetermined linear system of equations y = Ax that minimizes E = ||y − Ax||₂². Perform the derivation by letting x = x̂ + δ for an arbitrary δ and showing that the coefficient of δ must be zero.

7.19 Derive Eq. (7.125), which is the solution x̂ to the underdetermined linear system of equations y = Ax that minimizes E = ||x̂||₂². Perform the derivation by minimizing the Tikhonov criterion given by Eq. (7.128), letting parameter λ → 0, applying the pseudoinverse given by Eq. (7.124), and using the matrix identity (A^T A + λ²I)^{−1} A^T = A^T (AA^T + λ²I)^{−1}.

Section 7-12: Compressed Sensing Examples

7.20 Free the clown from his cage. Run the program P720.m.
This sets horizontal and vertical bands of the clown image to
zero, making it appear that the clown is confined to a cage.
Free the clown: The program then uses inpainting to replace the
bands of zeros with pixels by regarding the bands of zeros as
unknown pixel values of the clown. Change the widths of the
bands of zeros and see how this affects the reconstruction.
7.21 De-square the clown image. Run program P721.m. This
sets 81 small squares of the clown image to zero, desecrating it.
The program then uses inpainting to replace the small squares
with pixels by regarding the 81 small squares as unknown pixel
values of the clown. Change the sizes of the squares and see how
this affects the reconstruction.
7.22 The tomography example at the end of Chapter 7 used 24
rays. Download and run the program P722.m, which generated
this example. Change the number of rays (M) from 24 to 32, and
then 16. Compare the compressed sensing reconstruction results
to the least-squares reconstructions (all unknown DFT values set
to zero).
Chapter 8
Random Variables, Processes, and Fields

Contents
Overview, 255
8-1 Introduction to Probability, 255
8-2 Conditional Probability, 259
8-3 Random Variables, 261
8-4 Effects of Shifts on Pdfs and Pmfs, 263
8-5 Joint Pdfs and Pmfs, 265
8-6 Functions of Random Variables, 269
8-7 Random Vectors, 272
8-8 Gaussian Random Vectors, 275
8-9 Random Processes, 278
8-10 LTI Filtering of Random Processes, 282
8-11 Random Fields, 285
Problems, 288

Objectives
Learn to:

■ Compute conditional probabilities and density functions.

■ Compute means, variances, and covariances for random variables and vectors.

■ Compute autocorrelations and power spectral densities of wide-sense-stationary random processes in continuous and discrete time.

■ Use thresholding and shrinkage of an image's wavelet transform to denoise the image.

■ Compute autocorrelations and power spectral densities of wide-sense-stationary random fields.

This chapter supplies a quick review of probability, random variables, vectors, processes, and fields for use in Chapter 9, which reviews estimation theory and applies it to image estimation. Readers already familiar with these topics may skip this chapter.
Overview

In this chapter, we offer a brief review of random variables, random processes, and random fields. A random field is a random process in 2-D. These topics will all be used in Chapter 9, which covers estimation and Markov random fields. To begin our review, Section 8-1 provides a primer on the nomenclature used in the language of probability, illustrated with several examples. Section 8-2 presents conditional probability, which will play an important role in Chapter 9. In Section 8-3, we introduce probability density functions (pdf) and probability mass functions (pmf) for describing the distributions of 1-D random variables, and then these tools are extended to 2-D in Sections 8-5 and 8-6. Sections 8-7 and 8-8 treat random vectors (vectors of random variables), which are then extended to 1-D random processes and 2-D random fields in later sections. A particular emphasis is placed on wide-sense stationary (WSS) random processes and fields.

8-1 Introduction to Probability

A probability experiment is an experiment in which the outcome is uncertain (random), but each possible outcome (event) has a given or computable probability (likelihood) of occurring.

Sample Space S: The set of all distinguishable outcomes. For a coin flipped once, S = { H, T }, where H denotes a "head" outcome and T denotes a "tail" outcome. In this case, S consists of M = 2 elements. For a coin flipped twice in a row, M = 4 and

S = { E1, E2, E3, E4 },  (8.1)

where E1 to E4 are outcomes (events) defined as:

E1 = { H1 H2 },  E2 = { H1 T2 },  E3 = { T1 H2 },  E4 = { T1 T2 },

where H1 and H2 denote that the results of the first and second flips are heads, and similarly for tails T1 and T2.

Event Probability P: Each element of S, such as E1 through E4, is called an event. Events may also include combinations of elements, such as

Ea = { E2, E4 } = { H1 T2, T1 T2 },

which in this case represents the outcome that the second coin flip is a "tail," regardless of the outcome of the first coin flip. Associated with each event is a certain probability determined by the conditions and constraints of the experiment. If each time a coin is flipped, it is equally likely to produce a head or a tail, then the probabilities of all four events are the same, namely

P[E1] = P[E2] = P[E3] = P[E4] = 1/4.

Event Space A: The set of all subsets of sample space S for which probabilities can be computed on the basis of the probabilities assigned to the elements of S. If S has a finite number of elements M, then A has 2^M elements, each of which is a subset of S. Included in A are both S itself and the null set ∅; that is, S ∈ A and ∅ ∈ A.

8-1.1 Complement, Union, and Intersection

Event E2 = { H1 T2 } represents the outcome that the first coin flip is a head and the second one is a tail, and its associated probability of occurrence is P[E2]. The complement of event E2 is E2′, and it denotes the condition that the outcome is not H1 T2. Thus, the complement of an event includes all possible outcomes of S except for that particular event:

P[E2′] = P[S] − P[E2] = 1 − P[E2],  (8.2)

where we used the probability axiom (Section 8-1.2) that the total probability of all events constituting S is 1. Probability P[E2′] equals P[E1] + P[E3] + P[E4], since only one of { E1, E3, E4 } can occur.

◮ The union of two events Ei and Ej is an OR statement: Ei occurring, Ej occurring, or both, as illustrated by Fig. 8-1(a). ◭

Figure 8-1 (a) The event of the union of two events Ei and Ej encompasses the combined elements of both events, whereas (b) the event of their intersection includes only the elements common to both of them.
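The event probabilities for the two-flip experiment are easy to enumerate numerically. A brief MATLAB sketch (ours; the bias value a = 0.6 is an arbitrary choice):

    a = 0.6;                                   % P[head] on any flip
    P = [a*a, a*(1-a), (1-a)*a, (1-a)*(1-a)];  % P[E1], P[E2], P[E3], P[E4]
    sum(P)                                     % total probability = 1
    P(2) + P(4)                                % P[Ea] = P[second flip tail] = 1 - a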

◮ The intersection of events Ei and Ej is an AND statement (Fig. 8-1(b)); it is denoted Ei ∩ Ej, and its associated probability P[Ei ∩ Ej] represents the condition that both Ei and Ej occur. ◭

For the two-time coin-flip experiment, the occurrence of any one of the four events, E1 through E4, negates the possibility of occurrence of the other three. Hence, if E1 = H1 H2 and E2 = H1 T2,

P[E1 ∩ E2] = P[H1 H2 ∩ H1 T2] = 0.

When the intersection Ei ∩ Ej = ∅, the associated probability is

P[Ei ∩ Ej] = 0  (disjoint),  (8.3)

in which case events Ei and Ej are said to be disjoint.

8-1.2 Axioms of Probability

The axioms of probability are rules for computing P[E] for any member E of event space A. The user must first assign probabilities to the members of the sample space S in a manner that satisfies those axioms.

Axiom 1: The probability of any event Ei is bounded between 0 and 1:

0 ≤ P[Ei] ≤ 1.  (8.4)

Axiom 2: The total probability for all distinguishable events comprising S is 1:

P[S] = ∑_{i=1}^{M} P[Ei] = 1,  (8.5)

where M is the total number of elements in S.

Axiom 3: If Ei and Ej are disjoint, then P[Ei ∩ Ej] = 0 and

P[Ei ∪ Ej] = P[Ei] + P[Ej]  (Ei and Ej disjoint).  (8.6)

Also, the probability of the union of an event Ei with its complement Ei′ is 1:

P[Ei ∪ Ei′] = 1,  (8.7a)

thus stating that the total probability of the presence and absence of an event is 1. Similarly, the probability of the intersection of an event Ei and its complement Ei′ is zero:

P[Ei ∩ Ei′] = 0.  (8.7b)

◮ Probabilities are assigned by the user for all elements of the sample space S, consistent with the axioms of probability. Using these axioms, the probability of any element of the event space A can then be computed. ◭

8-1.3 Properties of Unions and Intersections

Commutative property

Ei ∪ Ej = Ej ∪ Ei,  (8.8a)
Ei ∩ Ej = Ej ∩ Ei.  (8.8b)

Associative property

(Ei ∪ Ej) ∪ Ek = Ei ∪ (Ej ∪ Ek),  (8.9a)
(Ei ∩ Ej) ∩ Ek = Ei ∩ (Ej ∩ Ek).  (8.9b)

Distributive property

(Ei ∪ Ej) ∩ Ek = (Ei ∩ Ek) ∪ (Ej ∩ Ek),  (8.10a)
(Ei ∩ Ej) ∪ Ek = (Ei ∪ Ek) ∩ (Ej ∪ Ek).  (8.10b)

De Morgan's law

For two events E1 and E2, De Morgan's law states:

(E1 ∪ E2)′ = E1′ ∩ E2′,  (8.11a)
(E1 ∩ E2)′ = E1′ ∪ E2′,  (8.11b)

and hence,

P[(E1 ∪ E2)′] = P[E1′ ∩ E2′],  (8.12a)
P[(E1 ∩ E2)′] = P[E1′ ∪ E2′],  (8.12b)

where, as noted earlier, the prime denotes the complement (absence) of the specified event. De Morgan's law also leads to these useful relationships:

(E1 ∩ E2) ∪ (E1′ ∩ E2) = (E1 ∪ E1′) ∩ E2 = E2,  (8.13a)
(E1 ∩ E2) ∩ (E1′ ∩ E2) = (E1 ∩ E1′) ∩ E2 = ∅.  (8.13b)

Because (E1 ∩ E2) and (E1′ ∩ E2) are disjoint, the probability relationship corresponding to Eq. (8.13a) is

P[E1 ∩ E2] + P[E1′ ∩ E2] = P[E2].  (8.14)

Probability of Union and Intersection Events

For two events, the probability of their union is equal to the sum of their individual probabilities minus the probability of their intersection:

P[Ei ∪ Ej] = P[Ei] + P[Ej] − P[Ei ∩ Ej].  (8.15)

The relationship given by Eq. (8.15) can be derived by using Fig. 8-1(a) to write (Ei ∪ Ej) as the union of three intersections:

Ei ∪ Ej = (Ei ∩ Ej′) ∪ (Ei′ ∩ Ej) ∪ (Ei ∩ Ej),  (8.16)

and the corresponding probability is given by

P[Ei ∪ Ej] = P[Ei ∩ Ej′] + P[Ei′ ∩ Ej] + P[Ei ∩ Ej].  (8.17)

Upon setting E1 and E2 in Eq. (8.14) as Ei and Ej, respectively, we obtain

P[Ei ∩ Ej] + P[Ei′ ∩ Ej] = P[Ej].  (8.18a)

Repeating the process but with E1 and E2 set in Eq. (8.14) as Ej and Ei (instead of as Ei and Ej), respectively, leads to

P[Ej ∩ Ei] + P[Ej′ ∩ Ei] = P[Ei].  (8.18b)

Using Eqs. (8.18a and b) in Eq. (8.17) leads to Eq. (8.15).

To illustrate the meaning of Eq. (8.14), let us consider the example of a coin tossed N = 2 times. If we use E1 to denote that the first toss is a head and E2 to denote that the second toss is a tail, then Eq. (8.14) becomes

P[H1 ∩ T2] + P[H1′ ∩ T2] = P[T2].  (8.19)

The first term is the probability that the first toss resulted in a head and the second toss resulted in a tail, and the second term is the probability that the first toss did not result in a head, but the second one did result in a tail. The sum of the two probabilities is equal to the probability that the second toss is a tail, P[T2], regardless of the outcome of the first toss.

8-1.4 Probability Tree for Coin-Flip Experiment

Suppose a coin is flipped N times, and the result of the kth flip (with 1 ≤ k ≤ N) is either a head and designated Hk, or a tail and designated Tk.

Independent flips: The result of any coin flip has no effect on the result of any other flip.

Biased coin: The coin is not necessarily a "fair" coin, meaning that the probability of heads is not necessarily equal to the probability of tails. Suppose the probability of heads for any coin flip is a known value a, where 0 ≤ a ≤ 1. Thus,

P[Hk] = a,  (8.20a)
P[Tk] = 1 − a,  (8.20b)

for 1 ≤ k ≤ N. For an unbiased coin, a = 0.5. A set designates a specific event, such as H1T2, which can be written as

H1T2 = H1 ∩ T2.

For N = 2, the sample space S has 2^N = 2² = 4 elements:

S = { H1 ∩ H2, H1 ∩ T2, T1 ∩ H2, T1 ∩ T2 },

and the event space A has 2^{2^N} = 2⁴ = 16 elements:

{ S, H2, T2, T1H2, T1T2, (T1T2)′, (T1H2)′, H1T2 ∪ T1H2 } ∪ { ∅, H1, T1, H1H2, H1T2, (H1H2)′, (H1T2)′, H1H2 ∪ T1T2 }.

The probability tree of S is shown in Fig. 8-2. Each element of S and its assigned probability are denoted at the end of each branch of the tree. Examples of computing probabilities of union events are given in Example 8-1.

Example 8-1: Probability Tree for Coin-Flip Experiment

For the coin-flip experiment with N = 2 and P[Hk] = a, compute the probabilities of the following events: (a) H1 ∪ H2, (b) H1H2 ∪ T1T2, and (c) H1T2 ∪ T1H2.

Solution: (a) According to Eq. (8.15),

P[H1 ∪ H2] = P[H1] + P[H2] − P[H1 ∩ H2].

Figure 8-2 Probability tree for the coin-flip experiment with N = 2. The four leaves are H1H2 (probability a²), H1T2 (probability a(1 − a)), T1H2 (probability (1 − a)a), and T1T2 (probability (1 − a)²).

From the tree in Fig. 8-2,

P[H1] = P[H2] = a

and

P[H1 ∩ H2] = a².

Hence,

P[H1 ∪ H2] = a + a − a² = a(2 − a).

(b)

P[H1H2 ∪ T1T2] = P[H1H2] + P[T1T2] − P[H1H2 ∩ T1T2].

From the tree in Fig. 8-2, P[H1H2] = a² and P[T1T2] = (1 − a)². Furthermore, H1H2 and T1T2 are disjoint events, since they cannot both occur. Hence, the probability of their intersection is zero,

P[H1H2 ∩ T1T2] = 0,

and therefore

P[H1H2 ∪ T1T2] = a² + (1 − a)².

(c)

P[H1T2 ∪ T1H2] = P[H1T2] + P[T1H2] − P[H1T2 ∩ T1H2].

Using the tree in Fig. 8-2 and noting that H1T2 and T1H2 are disjoint events, we have

P[H1T2 ∪ T1H2] = a(1 − a) + (1 − a)a = 2a(1 − a).

8-1.5 Probability Tree for Tetrahedral Die Experiment

A tetrahedral die is a four-sided object with one of the following four numbers: 1, 2, 3, 4, printed on each of its four sides (Fig. 8-3). When the die is rolled, the outcome is the number printed on the bottom side.

Condition 1: The result of any die roll has no effect on the result of any other roll.

Condition 2: The die is not necessarily a "fair" die. If we denote n as the number that appears on the bottom of the die after it has been rolled, the probabilities that n = 1, 2, 3, or 4 are

P[n = 1] = a,  P[n = 2] = b,  P[n = 3] = c,  P[n = 4] = d,

with the constraints that 0 ≤ a, b, c, d ≤ 1 and

a + b + c + d = 1.

For a fair die, a = b = c = d = 1/4.

Figure 8-3 Tetrahedral die with 4 sides displaying numerals 1 to 4. When the die is rolled, the outcome is the numeral on the bottom side.
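A biased tetrahedral die is easy to simulate. The sketch below (ours; the probabilities are illustrative and happen to match those used later in Example 8-2) draws one million rolls and confirms that the relative frequencies approach a, b, c, and d:

    p = [0.1 0.2 0.3 0.4];                        % P[n = 1], ..., P[n = 4]
    n1 = discretize(rand(1e6,1), [0 cumsum(p)]);  % one million rolls
    n2 = discretize(rand(1e6,1), [0 cumsum(p)]);  % an independent second roll
    histcounts(n1, 0.5:1:4.5)/1e6                 % relative frequencies -> p
    mean(mod(n1,2) == 0)                          % ~0.6 (compare Example 8-2(a))
    mean(n1 + n2 == 3)                            % ~0.04 (compare Example 8-2(b))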

Figure 8-4 Probability tree for the tetrahedral die experiment with N = 2 rolls. Symbols a, b, c, and d are the probabilities that the outcome of the tetrahedral roll is a 1, 2, 3, or 4, respectively; the leaf for outcome n1n2 has probability equal to the product of its branch probabilities (e.g., P[11] = a², P[12] = ab, . . . , P[44] = d²).

When rolled N times, the sample space S of the tetrahedral die has 4^N elements, each of which has the form { n1, n2, . . . , nN }, where each n is one of the four numbers { 1, 2, 3, 4 }. For N = 2, S has 4² = 16 elements:

{ n1n2 } = { 11, 12, 13, 14, 21, 22, 23, 24, 31, 32, 33, 34, 41, 42, 43, 44 }.

The probability tree for the tetrahedral die experiment is shown in Fig. 8-4 for N = 2.

Example 8-2: Tetrahedral Die Probabilities

For the tetrahedral die experiment with N = 2, a = 0.1, b = 0.2, c = 0.3, and d = 0.4, compute the probabilities of (a) n1 is an even number and (b) n1 + n2 = 3.

Solution: (a) The probability of n1 = even number is the sum of the probabilities for n1 = 2 and n1 = 4:

P[n1 = even] = P[n1 = 2] + P[n1 = 4] = 0.2 + 0.4 = 0.6.

(b) Using the tree in Fig. 8-4,

P[n1 + n2 = 3] = P[n1 = 1 ∩ n2 = 2] + P[n1 = 2 ∩ n2 = 1] = ab + ba = 2ab = 2 × 0.1 × 0.2 = 0.04.

Concept Question 8-1: What, exactly, is a probability tree?

Exercise 8-1: For the coin flip experiment, what is the pmf for n = # flips of all tails followed by the first head?

Answer: p[n] = a(1 − a)^{n−1}, since we require n − 1 consecutive tails before the first head. This is called a geometric pmf.

8-2 Conditional Probability

For a given event, say E1, its probability of occurrence is denoted P[E1]. The value of P[E1] is calculated assuming no foreknowledge that any other event has already occurred. Sometimes, the prior occurrence of another event, say E3, may impact the probability of occurrence of the current event under consideration. Such a probability is called a conditional probability, designated

P[E1|E3],

and is read as "probability of E1 given E3." Figure 8-5 displays the conditional probability tree for the coin flip experiment with N = 2, and through Examples 8-3 and 8-4 we illustrate how to compute the conditional probability for specific scenarios.

8-2.1 Conditional Probability Formula

The general form of the conditional probability formula for events E1 and E2 is

P[E1|E2] = P[E1 ∩ E2] / P[E2] = P[E1 ∩ E2] / (P[E1 ∩ E2] + P[E1′ ∩ E2]),  (8.21)

where we used Eq. (8.14) in the second step.

Figure 8-5 Conditional probability tree for a coin-flip experiment with N = 2, in which P[H1] = a and P[H2|H1] = P[H2|T1] = b. The top red path represents the outcome H1H2, with P[H1H2] = ab, and the bottom blue path represents the outcome T1T2, with P[T1T2] = (1 − a)(1 − b); the middle paths give P[H1T2] = a(1 − b) and P[T1H2] = (1 − a)b.

8-2.2 Independent and Mutually Exclusive Events

Two events Ei and Ej are said to be independent if the probability of occurrence of either has no effect on the occurrence of the other:

P[Ei|Ej] = P[Ei],  (8.22a)
P[Ej|Ei] = P[Ej],  (8.22b)

and

P[Ei ∩ Ej] = P[Ei] P[Ej]  (independent events).  (8.22c)

In the coin-flip experiment, it was assumed that the outcome of each flip was independent of the outcomes of other flips. Hence, P[H1] is independent of P[T2].

Two events are mutually exclusive if the occurrence of one of them negates the possibility of the occurrence of the other. Consequently,

P[Ei ∩ Ej] = 0  (mutually exclusive)  (8.23a)

and

P[Ei ∪ Ej] = P[Ei] + P[Ej]  (mutually exclusive).  (8.23b)

Table 8-1 provides a summary of the major terms and symbols used in this section.

Example 8-3: Coin-Flip Conditional Probability

Given that the result of the coin-flip experiment for N = 2 was two heads or two tails, compute the conditional probability that the result was two heads. Use a = b = 0.4.

Solution:

E1 = H1H2  (both heads),
E2 = H1H2 ∪ T1T2  (both heads or both tails).

The conditional probability that the result is both heads is

P[E1|E2] = P[E1 ∩ E2] / P[E2] = P[H1H2] / P[H1H2 ∪ T1T2] = P[H1H2] / (P[H1H2] + P[T1T2]),

where we used the relation (H1H2) ∩ (T1T2) = ∅. Using the probability tree in Fig. 8-5 gives

P[E1|E2] = a² / (a² + (1 − a)²) = 0.4² / (0.4² + (1 − 0.4)²) = 0.31.

Table 8-1 Probability symbols and terminology.

Sample space:  S
Event (examples):  E1, E2, . . .
Outcome (examples):  H, T; x1x2
Empty set (impossible event):  ∅
Complement of E ("not E"):  E′
Union of Ei and Ej ("Ei or Ej"):  Ei ∪ Ej
Intersection of Ei and Ej ("Ei and Ej"):  Ei ∩ Ej
Ei and Ej are independent:  P[Ei ∩ Ej] = P[Ei] P[Ej]
Ei and Ej are mutually exclusive:  P[Ei ∪ Ej] = P[Ei] + P[Ej]

Example 8-4: Tetrahedral Die Conditional Probability

Given that the result of the tetrahedral die experiment of Example 8-2 was n1 + n2 = 3, compute the probability that n1 = 1. Use a = 0.1, b = 0.2, c = 0.3, and d = 0.4.

Solution: From Fig. 8-4, there are two ways to obtain n1 + n2 = 3, namely

E1 = n1n2 = 12,
E2 = n1n2 = 21.

We need to compute the conditional probability

P[n1 = 1 | n1 + n2 = 3].

Since satisfying both conditions requires that n2 = 2, it follows that

P[n1 = 1 | n1 + n2 = 3] = P[n1 = 1 ∩ n2 = 2] / P[n1 + n2 = 3] = P[n1 = 1 ∩ n2 = 2] / (P[n1n2 = 12] + P[n1n2 = 21]) = ab / (ab + ba) = 0.5.

The last entries were obtained from the probability tree in Fig. 8-4. Note that because of the a priori knowledge that n1 + n2 = 3, the probability of n1 = 1 increased from 0.1 to 0.5.

Concept Question 8-2: Why does the conditional probability formula require division by P[B]?

Exercise 8-2: Given that the result of the coin flip experiment for N = 2 was one head and one tail, compute P[H1].

Answer: P[H1 | H1T2 ∪ T1H2] = 0.5. (See IP.)

8-3 Random Variables

The number of heads n among N flips is a random variable. A random variable is a number assigned to each possible outcome of a random experiment. The range of values that n can assume is from zero (no heads) to N (all heads). Another possible random variable is the number of consecutive pairs of heads among the N flips.

For the tetrahedral die experiment, our random variable might be the number of times among N tosses that the outcome is the number 3, or the number of times the number 3 is followed by the number 2, or many others. In all of these cases, the random variables have real discrete values, so we refer to them as discrete random variables. This is in contrast to continuous random variables, in which the random variable may assume any value over a certain continuous range. We will examine the properties of both types of variables.

8-3.1 Probability Distributions for Continuous Random Variables

Since a continuous random variable can take on a continuum of values, such as the set of real numbers in an interval, the probability of a continuous random variable taking on a specific value is zero. To describe the distribution of a continuous random variable, we must use a density function, called a probability

density function (pdf). A pdf is not a probability; its units are the reciprocal of those of the random variable (e.g., 1/distance). Instead, it describes the relative likelihood of the random variable taking on a specific value. It is analogous to mass density, rather than pure mass.

Let us consider the example shown in Fig. 8-6. Part (a) of the figure displays the height profile measured by a laser ranger as the beam was moved across 1 m of a ground surface. Relative to the mean surface, the height varies between about −20 mm and +20 mm, which means that the probability that the ground surface height may exceed this range is zero. If we count the number of times that each height occurs, we end up with the pdf shown in Fig. 8-6(b). The measured pdf is discrete, but if the discretization is made infinitesimally small, the pdf becomes continuous. In the present case, the pdf is approximately Gaussian in shape. The horizontal axis denotes the surface height x (relative to the mean surface) and the vertical axis denotes p(x), the probability density function. If a point on the surface is selected randomly, the probability that it has a height between 5 mm and 5.001 mm is 0.001 times the value of the pdf at 5 mm, which is 0.0004.

Formally, p(x) for a continuous random variable is defined as

p(x′) = lim_{δ→0} (1/δ) P[x′ ≤ x < x′ + δ].  (8.24)

Thus, the probability of x lying within a narrow interval [x′, x′ + δ) is p(x′) δ. The interval probability that x lies between values a and b is

P[a ≤ x < b] = ∫_a^b p(x′) dx′,  (8.25a)

and the total probability over all possible values of x is

∫_{−∞}^{∞} p(x′) dx′ = 1.  (8.25b)

Figure 8-6 (a) Measured height profile x(y) across 1000 mm of ground surface (surface height in mm versus horizontal location y in mm), and (b) pdf of the digitized height profile, p(x) in 1/mm, which is approximately Gaussian.

8-3.2 Probability Distributions for Discrete Random Variables

The term pdf is used to describe the probability function associated with a continuous random variable. The analogous function for a discrete random variable n is the probability mass function (pmf), p[n].

◮ Note that we use curved brackets for a pdf p(x) and square brackets for a pmf p[n]. ◭
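The counting procedure used to obtain Fig. 8-6(b) amounts to a normalized histogram. A minimal MATLAB sketch (ours; Gaussian synthetic heights with σ = 7 mm stand in for the laser-ranger data):

    x = 7*randn(1e5,1);                     % synthetic surface heights (mm)
    binw = 1;                               % bin width (mm)
    edges = -25:binw:25;
    counts = histcounts(x, edges);
    p_est = counts/(sum(counts)*binw);      % normalize so the pdf integrates to 1
    bar(edges(1:end-1)+binw/2, p_est);
    xlabel('x (mm)');  ylabel('p(x) (1/mm)');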

Figure 8-7 pmf p[n] for the tetrahedral die experiment (masses at n = 1, 2, 3, 4).

For a discrete random variable n,

p[n′] = P[n = n′],  (8.26)

where P[n = n′] is the probability that n has the value n′. By way of an example, Fig. 8-7 displays p[n] for a single toss of the tetrahedral die, with the random variable n representing the outcome of the toss, namely the number 1, 2, 3, or 4.

The interval probability that the value of n is between n′ = na and n′ = nb − 1, inclusive of those limits, is

P[na ≤ n < nb] = ∑_{n′=na}^{nb−1} p[n′],  (8.27a)

and the total probability over all possible values of n′ is

∑_{n′=−∞}^{∞} p[n′] = 1.  (8.27b)

◮ The notation and properties of continuous and discrete random variables are summarized in Table 8-2. ◭

Concept Question 8-3: If x is a continuous random variable, what is P[x = c] for any constant c?

Exercise 8-3: Random variable x has the pdf

p(x) = 2x for 0 ≤ x ≤ 1, and p(x) = 0 otherwise.

Compute P[x < 1/2].

Answer: P[x < 1/2] = ∫_0^{1/2} 2x dx = 1/4.

Exercise 8-4: Random variable n has the pmf p[n] = (1/2)^n for integers n ≥ 1. Compute P[n ≤ 5].

Answer: P[n ≤ 5] = ∑_{n=1}^{5} (1/2)^n = 31/32.

8-4 Effects of Shifts on Pdfs and Pmfs

8-4.1 Continuous Random Variable

If a random variable x characterized by a pdf p(x) is shifted by a constant amount τ to form a new random variable y, with

y = x − τ,  (8.28a)

then according to the rules of probability, the pdf of y is the pdf of x shifted by τ:

p(y) = p(x − τ).  (8.28b)

When computing mean values and other moments, the dummy variables used in integrations should be related by

y′ = x′ − τ.  (8.28c)

8-4.2 Discrete Random Variable

For an integer k and two discrete random variables n and m related by

m = n − k,  (8.29a)

the pmf of m is related to the pmf of n by

p[m = m′] = P[m = m′] = P[n − k = m′] = P[n = m′ + k] = p[n = n′] = p[n′],  (8.29b)

where dummy variables n′ and m′ are related by

m′ = n′ − k.  (8.29c)
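Equation (8.29) simply slides the probability masses without changing them, as this short sketch illustrates for the tetrahedral die pmf shifted by k = 2 (our illustration):

    p_n = [0.1 0.2 0.3 0.4];  n = 1:4;      % pmf of n at n = 1..4
    k = 2;  m = n - k;                      % shifted variable m = n - k
    stem(n, p_n);  hold on;
    stem(m, p_n, '--');                     % same masses, located k units lower
    hold off;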

Table 8-2 Notation and properties of continuous and discrete random variables.

A. Continuous Random Variable x

pdf of x:  p(x)
mean value of x:  x̄ = E[x] = ∫_{−∞}^{∞} x′ p(x′) dx′
mean value of x²:  E[x²] = ∫_{−∞}^{∞} (x′)² p(x′) dx′
standard deviation of x:  σx = √(E[x²] − x̄²)
variance of x:  σx² = E[x²] − x̄²
interval probability over (a, b):  P[a ≤ x < b] = ∫_a^b p(x′) dx′
shift property (y = x − τ, with τ = constant):  p(y) = p(x), with y = x − τ
covariance of x and y:  λx,y = E[xy] − x̄ ȳ

B. Discrete Random Variable n

pmf of n:  p[n]
mean value of n:  n̄ = E[n] = ∑_{n′=−∞}^{∞} n′ p[n′]
mean value of n²:  E[n²] = ∑_{n′=−∞}^{∞} (n′)² p[n′]
standard deviation of n:  σn = √(E[n²] − n̄²)
variance of n:  σn² = E[n²] − n̄²
interval probability over [na ≤ n < nb]:  P[na ≤ n < nb] = ∑_{n′=na}^{nb−1} p[n′]
shift property (m = n − k):  p[m] = p[n], with m = n − k
covariance of n and m:  λn,m = E[nm] − n̄ m̄
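The continuous-variable entries of Table 8-2 can be checked numerically for any specific pdf. A sketch (ours) using the pdf p(x) = 2x of Exercise 8-3:

    p  = @(x) 2*x;                          % pdf on [0,1]
    m1 = integral(@(x) x.*p(x), 0, 1)       % mean: 2/3
    m2 = integral(@(x) x.^2.*p(x), 0, 1)    % mean of x^2: 1/2
    varx = m2 - m1^2                        % variance: 1/2 - 4/9 = 1/18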

Example 8-5: Binomial Pmf

In the coin-flip experiment, define random variable n as the number of heads in N flips. Compute the pmf p[n], given that for any coin flip, the probability of a head outcome is a.

Solution: Since the results of different coin flips are independent, the probability of any given sequence of results is the product of the probabilities of the individual results. The probability for a head is a and the probability for a tail is (1 − a), and if the number of heads is n, the number of tails is (N − n). Since multiplication of probabilities is commutative, the probability of any specific sequence of n heads and (N − n)

tails, in any order, is

P[n] = aⁿ(1 − a)^{N−n}  (single sequence).  (8.30a)

The number of such possible sequences, denoted (N n) (read "N choose n"), is

(N n) = N! / (n! (N − n)!)  (number of sequences).  (8.30b)

The pmf p[n] is the product of the probability for an individual sequence and the number of sequences:

p[n] = (N n) P[n] = aⁿ(1 − a)^{N−n} N! / (n! (N − n)!).  (8.31)

This is known as the binomial probability mass function. A plot of p[n] is shown in Fig. 8-8 for N = 10 and a = 0.6.

Figure 8-8 Binomial pmf p[n] for N = 10 and a = 0.6.

The expression given by Eq. (8.31) is the pmf for a specific value of n. If we were to add the probabilities for all possible values of n (between 0 (no heads) and N (all heads)) among N tosses, then the sum should be 1. Indeed, a bit of algebra leads to the conclusion

∑_{n′=0}^{N} p[n′] = ∑_{n′=0}^{N} a^{n′}(1 − a)^{N−n′} N! / (n′! (N − n′)!)
 = (1 − a)^N + a(1 − a)^{N−1} N + a²(1 − a)^{N−2} N(N − 1)/2 + · · · + a^N
 = (a + (1 − a))^N = 1.

In the last step, we used the binomial expansion of (a + (1 − a))^N.

8-5 Joint Pdfs and Pmfs

This section introduces the joint pdf and joint pmf for two random variables, in continuous and discrete format. Atmospheric temperature and pressure are each a random variable that varies with time and the prevailing atmospheric conditions. Hence, each has its own pdf or pmf. The study of atmospheric phenomena requires knowledge of the statistics of both variables, including their joint pdf or pmf.

8-5.1 Continuous Random Variables

The joint pdf of two continuous random variables x and y is defined as

p(x = x′, y = y′) = lim_{δ→0} (1/δ²) P[x′ ≤ x < x′ + δ, y′ ≤ y < y′ + δ].  (8.32)

The definition is extendable to any number of random variables. The interval probability over the range (ax ≤ x < bx) and (ay ≤ y < by) is

P[ax ≤ x < bx, ay ≤ y < by] = ∫_{ax}^{bx} ∫_{ay}^{by} p(x′, y′) dx′ dy′.  (8.33)

Additionally, if we extend the limits to ±∞, we have

∫_{−∞}^{∞} ∫_{−∞}^{∞} p(x′, y′) dx′ dy′ = 1.  (8.34)

The marginal pdfs p(x) and p(y) are related to the joint pdf through

p(x) = ∫_{−∞}^{∞} p(x, y′) dy′  (marginal pdf for x),  (8.35a)
p(y) = ∫_{−∞}^{∞} p(x′, y) dx′  (marginal pdf for y).  (8.35b)

The conditional pdf for random variable x, given random variable y, is given by

p(x = x′ | y = y′) = lim_{δ→0} { (1/δ²) P[x′ ≤ x < x′ + δ, y′ ≤ y < y′ + δ] } / { (1/δ) P[y′ ≤ y < y′ + δ] }
 = p(x, y)/p(y)  (conditional pdf).  (8.36a)

Similarly,

p(y|x) = p(x, y)/p(x).  (8.36b)

If variables x and y are statistically independent, then

p(x, y) = p(x) p(y),  (8.37a)

in which case

p(x|y) = p(x)  (independent variables).  (8.37b)

The Gaussian (or normal) pdf is an important density function because it is applicable to the probabilistic behavior of many physical variables. If random variables x and y are each characterized by a Gaussian pdf, then

p(x) = (1/√(2πσx²)) e^{−(x−x̄)²/(2σx²)}  (8.38a)

and

p(y) = (1/√(2πσy²)) e^{−(y−ȳ)²/(2σy²)},  (8.38b)

where x̄ and ȳ are the mean values of x and y, respectively, and σx and σy are their associated standard deviations. Moreover, if x and y are independent, then their joint pdf is

p(x, y) = p(x) p(y) = (1/(2πσxσy)) e^{−(x−x̄)²/(2σx²)} e^{−(y−ȳ)²/(2σy²)}.  (8.39)

Part (a) of Fig. 8-9 displays a 1-D plot of the Gaussian pdf with x̄ = 2 and σx = 0.45, and part (b) displays a 2-D Gaussian with x̄ = ȳ = 0 and σx = σy = 1.2.

Figure 8-9 Gaussian pdfs in (a) 1-D, with x̄ = 2 and σx = 0.45, and (b) 2-D, with x̄ = ȳ = 0 and σx = σy = 1.2.

Example 8-6: Triangle-Like Joint Pdf

Given that

p(x, y) = C for 0 ≤ x ≤ y ≤ 1, and p(x, y) = 0 otherwise,  (8.40)

compute (a) constant C, (b) the marginal pdf p(x), and (c) the conditional pdf p(y|x) at x = 3/4.

Solution: (a) We note from the definition of p(x, y) that x can extend between 0 and 1, but y extends only between whatever value x is and up to 1. The domain of p(x, y) is the shaded triangle in Fig. 8-10.

Since the total probability is 1, it follows that

1 = ∫_0^1 [ ∫_{x′}^1 p(x′, y′) dy′ ] dx′ = ∫_0^1 [ ∫_{x′}^1 C dy′ ] dx′ = ∫_0^1 C(1 − x′) dx′ = C/2.  (8.41)

Hence, C = 2.

(b) With C = 2, application of Eq. (8.35a) leads to

p(x) = ∫_x^1 p(x, y′) dy′ = ∫_x^1 2 dy′ = 2(1 − x) for 0 ≤ x ≤ 1, and p(x) = 0 otherwise.  (8.42)

(c) Using the results of parts (a) and (b) in Eq. (8.36b), the conditional pdf p(y|x) is

p(y|x) = p(x, y)/p(x) = 2/(2(1 − x)) for 0 ≤ x ≤ y ≤ 1, and p(y|x) = 0 otherwise.

For x = 3/4,

p(y | x = 3/4) = 4 for 3/4 ≤ y ≤ 1, and 0 otherwise.  (8.43)

Note that both p(x) and p(y | x = 3/4) integrate to 1, as required by Eq. (8.5):

∫_0^1 p(x′) dx′ = ∫_0^1 2(1 − x′) dx′ = 1,
∫_{3/4}^1 p(y′ | x = 3/4) dy′ = ∫_{3/4}^1 4 dy′ = 1.

8-5.2 Discrete Random Variables

For two discrete random variables n and m, their joint probability mass function is p[n, m]. The probability that n is in the range between na and nb − 1, inclusive of those limits, and m is in the


268 CHAPTER 8 RANDOM VARIABLES, PROCESSES, AND FIELDS

range between ma and mb − 1, also inclusive of those limits, is

P[na ≤ n < nb, ma ≤ m < mb] = ∑_{n′=na}^{nb−1} ∑_{m′=ma}^{mb−1} p[n′, m′].  (8.44a)

As required by the axioms of probability, when the limits on the double sum are extended to ±∞, the total probability adds up to 1:

∑_{n′=−∞}^{∞} ∑_{m′=−∞}^{∞} p[n′, m′] = 1.  (8.44b)

In analogy with the expressions given by Eq. (8.35) for the marginal pdfs in the continuous case, the marginal pmfs p[n] and p[m] in the discrete case are related to the joint pmf by

p[n] = ∑_{m′=−∞}^{∞} p[n, m′],  (8.45a)
p[m] = ∑_{n′=−∞}^{∞} p[n′, m].  (8.45b)

Finally, the conditional pmf for random variable n, given random variable m, is

p[n|m] = p[n, m]/p[m].  (8.46)

Of course, if n and m are independent random variables, then p[n, m] = p[n] p[m], in which case

p[n|m] = p[n]  (n and m independent).  (8.47)

Figure 8-10 The domain of joint pdf p(x, y) of Example 8-6; for each value of variable x, variable y extends from that value to 1.

Example 8-7: Discrete Version of Example 8-6

Given that

p[n, m] = C for 0 ≤ n ≤ m ≤ 2, and p[n, m] = 0 otherwise,

compute (a) constant C, (b) the marginal pmf p[n], and (c) the conditional pmf p[m|n] at n = 1.

Solution: (a) The joint pmf, depicted in Fig. 8-11, is the discrete equivalent of the joint pdf displayed in Fig. 8-10.

Figure 8-11 Depiction of joint pmf p[n, m] in Example 8-7: a mass of C at each of the six points (n, m) with 0 ≤ n ≤ m ≤ 2.

Since the total probability is 1, it follows that

1 = ∑_{n′=0}^{2} ∑_{m′=n′}^{2} p[n′, m′] = ∑_{m′=0}^{2} C + ∑_{m′=1}^{2} C + ∑_{m′=2}^{2} C  (for n′ = 0, 1, 2, respectively)
 = 3C + 2C + C = 6C.

Hence, C = 1/6.

(b) With C = 1/6, application of Eq. (8.45a) leads to

p[n] = ∑_{m′=n}^{2} p[n, m′] = ∑_{m′=n}^{2} (1/6) = 3/6 for n = 0, 2/6 for n = 1, and 1/6 for n = 2.

(c) Using Eq. (8.46), but with n and m interchanged, the conditional probability p[m|n] is

p[m|n] = p[m, n]/p[n] = (1/6)/p[n].

To evaluate p[m|n] at n = 1, we should note that since n ≤ m, the range of m becomes limited to [1, 2]. Hence,

p[m|1] = (1/6)/p[n = 1] = (1/6)/(2/6) = 1/2 for m = 1,
p[m|1] = (1/6)/(2/6) = 1/2 for m = 2,
and p[m|1] = 0 otherwise.

We note that the total conditional probability adds up to 1:

∑_{m=1}^{2} p[m | n = 1] = 1/2 + 1/2 = 1.

Concept Question 8-4: Since P[y = c] = 0 for any continuous random variable y and constant c, how can p(x|y) make sense?

Exercise 8-5: In Example 8-6, compute the marginal pdf p(y) and the conditional pdf p(x|y).

Answer:

p(y) = ∫_0^y 2 dx = 2y for 0 ≤ y ≤ 1, and 0 otherwise,

p(x|y) = p(x, y)/p(y) = 2/(2y) for 0 ≤ x < y ≤ 1, and 0 otherwise.

As expected, this becomes an impulse if y = 0.

Exercise 8-6: In Example 8-7, compute the marginal pmf p[m] and the conditional pmf p[n|m].

Answer:

p[m] = ∑_{n′=0}^{m} (1/6) = 1/6 for m = 0, 2/6 for m = 1, and 3/6 for m = 2.

Hence,

p[m] = (m + 1)/6 for m = 0, 1, 2,
p[n|m] = p[n, m]/p[m] = (1/6)/((m + 1)/6) = 1/(m + 1) for m = 0, 1, 2,

which sums to 1, as required for a pmf.

8-6 Functions of Random Variables

8-6.1 Mean Value

The mean value, or simply the mean or expectation, of a continuous random variable x characterized by a pdf p(x) is defined as

x̄ = E[x] = ∫_{−∞}^{∞} x′ p(x′) dx′.  (8.48a)

Here, we use E[x] to denote the "expected value" of x, synonymous with the abbreviated notation x̄.

Similarly, for a discrete random variable n characterized by a pmf p[n], the mean value of n is

n̄ = E[n] = ∑_{n′=−∞}^{∞} n′ p[n′].  (8.48b)

For a function f(x, y) of two continuous random variables x and y, the expectation (mean value) of f(x, y) is computed using the joint pdf of (x, y), namely p(x, y), as follows:

E[f(x, y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x′, y′) p(x′, y′) dx′ dy′.  (8.49a)

Similarly, for the discrete case,

E[f[n, m]] = ∑_{n′=−∞}^{∞} ∑_{m′=−∞}^{∞} f[n′, m′] p[n′, m′].  (8.49b)

Expectation is a linear operator: for any two continuous random variables x and y, and any two constants a and b,

E[ax + by] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (ax′ + by′) p(x′, y′) dx′ dy′
 = a ∫_{−∞}^{∞} x′ [ ∫_{−∞}^{∞} p(x′, y′) dy′ ] dx′ + b ∫_{−∞}^{∞} y′ [ ∫_{−∞}^{∞} p(x′, y′) dx′ ] dy′.  (8.50)

In view of the relations given by Eq. (8.35) for the marginal pdfs, Eq. (8.50) can be rewritten as

E[ax + by] = a ∫_{−∞}^{∞} x′ p(x′) dx′ + b ∫_{−∞}^{∞} y′ p(y′) dy′ = aE[x] + bE[y] = ax̄ + bȳ.  (8.51)

◮ The mean value of the weighted sum of two random variables is equal to the sum of their weighted means. ◭

A similar relationship applies to two discrete random variables n and m:

E[an + bm] = an̄ + bm̄.  (8.52)

8-6.2 Conditional Mean

The conditional expectation, also known as the conditional mean, of random variable x, given that y = y′, uses the conditional pdf p(x|y = y′):

E[x | y = y′] = ∫_{−∞}^{∞} x′ p(x′|y′) dx′.  (8.53a)

Note that the conditional mean is defined at a specific value of the second random variable, namely y = y′.

For the discrete case,

E[n | m = m′] = ∑_{n′=−∞}^{∞} n′ p[n′|m′].  (8.53b)

8-6.3 Variance and Standard Deviation

The variance of a random variable x with mean value x̄ is the mean value of (x − x̄)², where (x − x̄) is the deviation of x from its mean x̄. Denoted σx², the variance is defined as

σx² = E[(x − x̄)²] = E[x² − 2xx̄ + x̄²] = E[x²] − 2x̄ E[x] + x̄² = E[x²] − x̄²,  (8.54)

where E[x²] is the mean value of x². Also, σ²_{ax} = a²σx².

The square root of the variance is the standard deviation:

σx = √(E[x²] − x̄²).  (8.55)

A pdf with a small standard deviation is relatively narrow in shape, whereas one with a large value for σx is relatively broad (Fig. 8-12).

Figure 8-12 Three Gaussian distributions, all with x̄ = 0 but with different variances: σx² = 0.2, 1.0, and 5.0.

The covariance λx,y of two random variables x and y is defined as the mean value of the product of the deviation of x from x̄ and the deviation of y from ȳ:

λx,y = E[(x − x̄)(y − ȳ)] = E[xy] − x̄ E[y] − ȳ E[x] + x̄ȳ = E[xy] − x̄ȳ,  (8.56)

where E[xy] is the mean value of the product xy.

Unlike the mean, the variance of the linear sum of two random variables is not a linear operation. Consider the variable x + y:

σ²_{x+y} = E[((x + y) − E[x + y])²]
 = E[(x + y)² − 2(x + y) E[x + y] + (E[x + y])²]
 = E[x²] + 2E[xy] + E[y²] − 2E[x] E[x + y] − 2E[y] E[x + y] + (E[x + y])².  (8.57)

The expectation of (x + y) is linear; i.e.,

E[x + y] = x̄ + ȳ.  (8.58)

Use of Eq. (8.58) in Eq. (8.57) leads to

σ²_{x+y} = E[x²] + 2E[xy] + E[y²] − 2x̄(x̄ + ȳ) − 2ȳ(x̄ + ȳ) + (x̄ + ȳ)²,  (8.59)

which simplifies to

σ²_{x+y} = (E[x²] − x̄²) + (E[y²] − ȳ²) + 2(E[xy] − x̄ȳ).  (8.60)

In view of the definitions given by Eqs. (8.54) and (8.56), Eq. (8.60) can be rewritten as

σ²_{x+y} = σx² + σy² + 2λx,y.  (8.61)

8-6.4 Properties of the Covariance

(1) The covariance between a random variable and itself is its variance:

λx,x = σx².  (8.62)

(2) The degree of correlation between two random variables is defined by the correlation coefficient ρx,y:

ρx,y = λx,y/(σx σy).  (8.63)

If x and y are uncorrelated, then

E[xy] = E[x] E[y] = x̄ȳ,  (8.64)

in which case Eq. (8.56) yields the result

λx,y = 0  (x and y uncorrelated),  (8.65)

and, consequently, ρx,y = 0.

(3) Two random variables x and y are uncorrelated if their covariance is zero, and they are independent if their joint pdf is separable into the product of their individual pdfs:

p(x, y) = p(x) p(y)  (x and y independent).  (8.66)

◮ Uncorrelated random variables may or may not be independent, but independent random variables are uncorrelated. ◭

Consider two independent random variables x and y. The mean of their product is

E[xy] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x′y′ p(x′, y′) dx′ dy′
 = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x′y′ px(x′) py(y′) dx′ dy′
 = ∫_{−∞}^{∞} x′ p(x′) dx′ ∫_{−∞}^{∞} y′ p(y′) dy′
 = x̄ ȳ  (x and y independent),  (8.67)

which leads to λx,y = E[xy] − x̄ȳ = 0, and hence x and y are uncorrelated.

For readers interested in mechanical systems, we note the following correspondences between probabilistic and mechanical terms:

Expectation (mean):  Center of mass
Variance σx²:  Moment of inertia
Standard deviation σx:  Radius of gyration
Covariance λx,y:  Cross moment of inertia

Example 8-8: Triangle-Like Pdf II

Given the joint pdf introduced earlier in Example 8-6, namely

p(x, y) = C for 0 ≤ x ≤ y ≤ 1, and p(x, y) = 0 otherwise,

compute (a) x̄, (b) σx², (c) E[y | x = 3/4], and (d) σ²_{y | x=3/4}.

Solution: (a) From Eq. (8.42),

p(x) = 2(1 − x) for 0 ≤ x ≤ 1, and p(x) = 0 otherwise.

Hence, the mean value of x is

x̄ = E[x] = ∫_0^1 x′ p(x′) dx′ = ∫_0^1 2x′(1 − x′) dx′ = 1/3.

(b) To compute σx², we start by computing E[x²]:

E[x²] = ∫_0^1 (x′)² p(x′) dx′ = ∫_0^1 2(x′)²(1 − x′) dx′ = 1/6.

Hence, by Eq. (8.54),

σx² = E[x²] − x̄² = 1/6 − (1/3)² = 1/18.

(c) With x and y interchanged, the expression for the conditional expectation given by Eq. (8.53a) is

E[y | x = x′] = ∫_{−∞}^{∞} y′ p(y′|x′) dy′.

Using the expression given by Eq. (8.43) and setting x′ = 3/4 leads to

E[y | x = 3/4] = ∫_{3/4}^{1} 4y′ dy′ = 7/8,

which is the midpoint between 3/4 and 1.

(d) By Eq. (8.54), the variance σ²_{y | x=3/4} is given by

σ²_{y | x=3/4} = E[y² | x = 3/4] − (E[y | x = 3/4])².

The first term computes to

E[y² | x = 3/4] = ∫_{3/4}^{1} 4(y′)² dy′ = 37/48.

Hence,

σ²_{y | x=3/4} = 37/48 − (7/8)² = 1/192.

8-6.5 Uniform Pdf

A pdf is uniform if it has a constant value over a specified range, such as a to b, and is zero otherwise:

p(x) = 1/(b − a) for a < x < b, and p(x) = 0 otherwise.  (8.68)

The mean and variance of a random variable characterized by a uniform pdf are

x̄ = ∫_a^b x′ p(x′) dx′ = ∫_a^b x′/(b − a) dx′ = (a + b)/2  (8.69)

and

σx² = E[x²] − x̄² = ∫_a^b (x′)²/(b − a) dx′ − ((a + b)/2)² = (b − a)²/12.  (8.70)

Concept Question 8-5: When is the variance of the sum of two random variables the sum of the variances?

Concept Question 8-6: Does "x and y are uncorrelated" imply "x and y are independent," or is it the other way around?

Exercise 8-7: In Example 8-6, compute (a) the mean E[y] and (b) the variance σy².

Answer: (a)

p(y) = ∫_0^y 2 dx = 2y for 0 ≤ y ≤ 1, and 0 otherwise,

E[y] = ∫_0^1 y(2y) dy = 2/3.

(b)

E[y²] = ∫_0^1 y²(2y) dy = 1/2.

Hence σy² = 1/2 − (2/3)² = 1/18.

Exercise 8-8: Show that for the joint pdf given in Example 8-8, λx,y = 1/36.

Answer: (See IP.)

8-7 Random Vectors

Suppose we wish to quantify the probability of an event (specifically, the time at which students at a certain university wake up in the morning) in terms of three random variables, namely the age of an individual student, the temperature of the room in which the student was sleeping, and the number of credit hours that the student is enrolled in. Many other factors impact the time that a student wakes up in the morning, but let us assume we are focused on only these three random variables.

From the perspective of the prime subject of this book, namely image processing, our multiple random variables might be the image intensities of multiple images of a scene acquired at different wavelengths, at different times, or under different illumination conditions. The objective might be to assign each image pixel to one of a list of possible classes on the basis of the intensities of the multiple images for that pixel. If we are dealing with multi-wavelength satellite images of Earth's surface, for

example, the classes would be water, urban, forest, etc. And if the multiple images are ultrasound medical images of the same scene but acquired under different conditions or at different times, the classes would be bone, tissue, etc.

When two or more random variables are associated with an event of interest, we form a random vector (RV). A random vector x of length N is a column vector comprising N random variables { x1, x2, . . . , xN }:

x = [x1, x2, . . . , xN]^T.  (8.71)

We write x in bold to denote that it is a vector, and here the superscript "T" denotes the transpose operation, which in this case converts a horizontal vector into a column vector.

◮ Throughout this chapter, we assume that all random variables represent real quantities. ◭

The distribution of a random vector of continuous or discrete random variables is described, respectively, by a joint pdf or joint pmf of the random variables. The dummy variable vector associated with the random vector x is

x′ = [x′1, x′2, . . . , x′N]^T,  (8.72)

and the joint pdf of vector x is p(x).

◮ The notation and properties of random vectors are summarized in Table 8-3. ◭

8-7.1 Mean Vector and Covariance Matrix

The mean vector x̄ of random vector x is the vector of the mean values of the components of x:

x̄ = E[x] = [x̄1, x̄2, . . . , x̄N]^T.  (8.73)

In analogy with Eq. (8.56), the covariance matrix Kx of random vector x comprises the covariances between xi and xj:

Kx = E[(x − x̄)(x − x̄)^T] = E[xx^T] − x̄ x̄^T,  (8.74)

which is the N × N matrix whose (i, j)th entry is λxi,xj = E[xi xj] − x̄i x̄j. Similarly, the cross-covariance matrix Kx,y between random vector x of length N and random vector y of length M is given by the (N × M) matrix

Kx,y = E[(x − x̄)(y − ȳ)^T] = E[xy^T] − x̄ ȳ^T,  (8.75)

whose (i, j)th entry is λxi,yj. We note that Kx,x = Kx = Kx^T and Ky,x = K^T_{x,y}. Additionally, if x and y are uncorrelated,

Ky,x = [0]  (x and y uncorrelated).  (8.76)

8-7.2 Random Vector Ax

Given a random vector x of length N and a constant (M × N) matrix A, we now examine how to relate the attributes of x to those of the random vector y given by

y = Ax.  (8.77)

The mean value of y is simply

ȳ = E[y] = E[Ax] = Ax̄.  (8.78)

To demonstrate the validity of Eq. (8.78), we rewrite y in terms of its individual elements:

yi = ∑_{j=1}^{N} Ai,j xj,  1 ≤ i ≤ M,  (8.79)

where Ai,j is the (i, j)th element of A. Taking the expectation of yi, while recalling from Eq. (8.51) that the expectation is a linear operator, gives

E[yi] = ∑_{j=1}^{N} Ai,j E[xj],  1 ≤ i ≤ M.  (8.80)

Since this result is equally applicable to all random variables yi of random vector y, it follows that Eq. (8.78) is true.

Using the matrix algebra property

(Ax)^T = x^T A^T,  (8.81)

Table 8-3 Notation and properties of random vectors.

A. Continuous Random Vector x = [x1, x2, . . . , xN]^T

pdf of x:  p(x)
mean value of x:  x̄ = E[x] = [x̄1, x̄2, . . . , x̄N]^T
covariance matrix of x:  Kx = E[(x − x̄)(x − x̄)^T] = E[xx^T] − x̄ x̄^T
cross-covariance matrix:  Kx,y = E[(x − x̄)(y − ȳ)^T] = E[xy^T] − x̄ ȳ^T

B. Discrete Random Vector n = [n1, n2, . . . , nN]^T

pmf of n:  p[n]
mean value of n:  n̄ = E[n] = [n̄1, n̄2, . . . , n̄N]^T
covariance matrix of n:  Kn = E[(n − n̄)(n − n̄)^T] = E[nn^T] − n̄ n̄^T
cross-covariance matrix:  Kn,m = E[(n − n̄)(m − m̄)^T] = E[nm^T] − n̄ m̄^T

we derive the following relation for the covariance matrix of y:

Ky = E[(Ax)(Ax)^T] − E[Ax] E[Ax]^T = A E[xx^T] A^T − A E[x] E[x]^T A^T = A Kx A^T.  (8.82)

Similarly, the cross-covariance matrix between y and x is obtained by applying the basic definition for Ky,x (as given in the second step of Eq. (8.75), after interchanging x and y):

Ky,x = E[yx^T] − ȳ x̄^T = E[Axx^T] − Ax̄ x̄^T = A Kx,  (8.83)

where we used Eq. (8.74) in the last step.

Noting that (a + b)(c + d) = ac + ad + bc + bd, we can show that the covariance matrix of the sum of two random vectors x and y is

K_{x+y} = Kx + Kx,y + Ky,x + Ky.  (8.84)

Table 8-4 provides a summary of the covariance relationships between random vectors x and y.

Concept Question 8-7: If x is a random vector and y = Ax, how is Ky related to Kx? Is it Ky = A Kx A^T or Ky = A^T Kx A?

Table 8-4 Properties of two real-valued random vectors.

Random Vectors x and y
Covariance matrix:  Kx = E[xx^T] − x̄ x̄^T
Cross-covariance matrix:  Kx,y = E[xy^T] − x̄ ȳ^T, for any two random vectors x and y
Cross-covariance matrix:  Ky,x = K^T_{x,y}, for any two random vectors x and y
Cross-covariance matrix:  Kx,x = Kx = Kx^T, for x = y
Cross-covariance matrix:  Kx,y = 0, if x and y are uncorrelated
Covariance of sum:  K_{x+y} = Kx + Kx,y + Ky,x + Ky

Random Vectors x and y = Ax
Mean value:  ȳ = Ax̄
Covariance matrix:  Ky = A Kx A^T
Cross-covariance matrix:  Ky,x = A Kx
Cross-covariance matrix:  Kx,y = Kx A^T
8-8 GAUSSIAN RANDOM VECTORS 275

value vector x, and the covariance matrix Kx . Hence, a jointly


Exercise  Random
 8-9:  x has the covariance matrix
 vector Gaussian random vector x often is described by the shorthand
4 1 4 3
Kx = . If y = x, find Ky and Kx,y . notation:
1 3 2 1
x ∼ N (x, Kx ), (8.86)
Answer:
where “N ” stands for “Normal Distribution,” another name
   T   for the Gaussian distribution. Next, we examine some of the
4 3 4 1 4 3 115 51
Ky = = . important properties of the jointly Gaussian random vectors.
2 1 1 3 2 1 51 23
    
4 3 4 1 19 13
Kx,y = = .
2 1 1 3 9 5
8-8.1 If x Is Gaussian, Then y = Ax Is Gaussian
Per the notation given by Eq. (8.86), if x is a Gaussian random
8-8 Gaussian Random Vectors vector, then random vector y = Ax also is Gaussian and given
by
The expression given by Eq. (8.39) describes the joint pdf for a y ∼ N (Ax, AKx AT ). (8.87)
Gaussian random vector consisting of two independent random To demonstrate the validity of Eq. (8.87), we resort to the
variables. For the general case, if x is a random vector of use of the N-dimensional continuous-space Fourier transform
length N, as defined by Eq. (8.71), and if its N random variables (N-D CSFT). Consider an N-dimensional image f (x), with x a
are jointly Gaussian, then it is considered a Gaussian random random vector comprising N random variables (corresponding
vector and its joint pdf is given by to N different pixels). The N-D CSFT of f (x) is F(µ ), where µ
is an N-dimensional spatial frequency vector
1 1 T K−1 (x−x)
p(x) = e− 2 (x−x) x , µ = [µ1 , µ2 , . . . , µN ]T , (8.88)
(2π )N/2 (detK x )1/2
and Z ∞ Z ∞
Tx
(8.85) F(µ ) = ··· f (x) e− j2πµµ dx, (8.89)
where (det Kx ) is the determinant of the covariance matrix Kx , | −∞ {z −∞
}
N integrals
and K−1
x is the inverse of matrix Kx .
The random variables { x1 , x2 , . . . , xN } are said to be jointly with
N
Gaussian random variables. Often, the label gets abbreviated
µ Tx = ∑ µ n xn . (8.90)
to Gaussian random variables, but this can be misleading be- n=1
cause even though each individual random variable may have
a Gaussian pdf, the combined set of random variables need not For N = 2, F(µ ) reduces to the 2-D CSFT given by Eq. (3.16a),
be jointly Gaussian, nor form a Gaussian random vector. Hence, with µ = [µ1 , µ2 ] = [µ , ν ].
to avoid ambiguity, we always include the term “jointly,” when The form of the transformation represented by Eq. (8.89) is
applicable, to random variables equally applicable for computing the N-D CSFT of the joint pdf
p(x), which is called the characteristic function∗ Φ x (µ ):
Z ∞ Z ∞
T x′
◮ The adjective “jointly” is not attached to a Gaussian Φ x (µ ) = F { p(x) } = ··· p(x′ ) e− j2πµµ dx′ , (8.91)
random vector because, by definition, the random vari- | −∞ {z −∞
}
ables constituting the Gaussian random vector are jointly N integrals
Gaussian. However, two or more Gaussian random vectors
can be jointly Gaussian, in which case we refer to their For a single random variable x with pdf p(x), the mean value of
combination as jointly Gaussian vectors. ◭
∗ The term “characteristic function” is usually associated with the mathemat-

ical definition of the Fourier transform, which uses a “+” sign in the exponent
The quantities defining the expression for p(x) given by in Eq. (8.91). For consistency with the engineering definition of the Fourier
Eq. (8.85) are the length N of the random vector, the mean- transform used throughout this book, we use a “−” sign instead.
276 CHAPTER 8 RANDOM VARIABLES, PROCESSES, AND FIELDS

any function of x, such as g(x), is by definition 8-8.2 Properties of Gaussian Random Vectors
Z ∞
E[g(x)] = g(x′ ) p(x′ ) dx′ . (8.92)
A. Marginals of Gaussian random vectors are
−∞ Gaussian
By extension, Eq. (8.91) reduces to Let x be a Gaussian random vector partitioned into two vectors
y and z:  
T
Φ x (µ ) = E[e− j2πµµ x ]. (8.93) y
x = [yT , zT ]T = . (8.100)
z
Since x is a Gaussian random vector, use of its pdf, as defined
by Eq. (8.85), in Eq. (8.91) can be shown to lead to Next, if we define matrix A as
 
µ T x−2π 2 µ T Kx µ
− j2πµ A= I 0 , (8.101)
Φ x (µ ) = e . (8.94)
it follows that y is related to x by
We note that the first term is Eq. (8.94) is the N-dimensional
generalization of entry #3 in Table 3-1 and the second term is  
  y
the generalization of entry #13 in the same table. y = Ax = I 0 . (8.102)
z
Now, we wish to derive the characteristic function Φ y (µ ) of
vector y = Ax. We start by introducing the spatial frequency According to the result given by Eq. (8.99), if x is a Gaussian
vector µe and its transpose: random vector, so is y, which proves that the marginal of a
Gaussian vector is itself a Gaussian random vector.
µe = ATµ (8.95a)

and B. Uncorrelated jointly Gaussian random vectors


T
µe = µ T A. (8.95b) are independent
Next, rewriting Eq. (8.93) for y instead of x, we have As demonstrated by the result of Eq. (8.67), independent random
variables are uncorrelated:
T T (Ax) T
Φ y (µ ) = E[e− j2πµµ y ] = E[e− j2πµµ ] = E[e− j2π µe x ]. (8.96) always

In analogy to the correspondence between the two forms of independent uncorrelated.


Φ x (µ ) given by Eqs. (8.93) and (8.94), the form of Eq. (8.94) The converse, however, is not necessarily true; in general,
applicable to Eq. (8.96) is uncorrelated random variables may or may not be independent:
T T
x−2π 2 µe Kx µ
Φ y (µ ) = e− j2π µe e sometimes
(8.97)
T (Ax)−2π 2 µ T (AK T
uncorrelated independent.
= e− j2πµµ x A )µ
, (8.98)
But, for Gaussian random vectors, the relationships are always
which is the characteristic function for a Gaussian random vec- bidirectional:
tor with mean vector (Ax) and covariance matrix Ky = AKx AT .
uncorrelated independent (Gaussian).
Hence,
y ∼ N (Ax, AKx AT ). (8.99) To demonstrate the validity of this assertion, let us define a
Gaussian random vector z, partitioned into two vectors x and y,
This result is consistent with the form of Eq. (8.86); upon
replacing x with y, and using Eqs. (8.78) and (8.82), we have  
T T T x
z = [x y ] = , (8.103)
y
y ∼ N (y, Ky ) ∼ N (Ax, AKx AT ).
with x of length Nx , y of length Ny , and z of length N = Nx + Ny .
Our task is to demonstrate that if x and y are uncorrelated, then
they are independent. Uncorrelation means that Kx,y = [0] and
independence means that p(x, y) = p(x) p(y).
8-8 GAUSSIAN RANDOM VECTORS 277

The joint pdf of z has the form of Eq. (8.85) with x replaced C. Conditional Gaussian random vectors
with z:
If z is a Gaussian random vector partitioned into vectors x and y
1 1 T −1 as defined by Eq. (8.103), namely
p(z) = e− 2 (z−z) Kz (z−z) . (8.104)
(2π )N/2 (det Kz )1/2  
x
z= , (8.109)
The mean vector and covariance matrix of z are given by y
 
z = E[z] =
x
(8.105a) then the conditional pdf p(x | y = y′ ) also is Gaussian:
y
p(x | y = y′ ) ∼ N (x | y = y′ , Kx|y ), (8.110)
and  
Kx Kx,y with
Kz = . (8.105b)
KTx,y Ky
x | y = y′ = E[x | y = y′ ] = x + Kx,yK−1 ′
y (y − y) (8.111a)
Since x and y are uncorrelated, Kx,y = 0, in which case Kz
becomes a block diagonal matrix: and
  Kx|y = Kx − Kx,y K−1 T
y Kx,y . (8.111b)
Kx 0
Kz = T , (8.106a)
0 Ky
Interchanging x and y everywhere in Eqs. (8.110) and (8.111)
detKz = (det Kx )(det Ky ), (8.106b)
provides expressions for the conditional pdf p(y | x = x′ ).
and  −1  Deriving these relationships involves a rather lengthy mathe-
Kx 0 matical process, which we do not include in here (see Problem
K−1
z = . (8.106c)
0T K−1
y 8-13).
Moreover, the exponent of Eq. (8.104) becomes
     Example 8-9: Gaussian Random Vectors
1 x − x T K−1
x 0 x−x

2 y−y 0T K−1 y y−y
 
1 1 x1
= − (x − x)T K−1 T −1
x (x − x) − (y − y) Ky (y − y). Random vector x =
x2
has a joint pdf
2 2
(8.107)
p(x) ∼ N (0, Kx ), (8.112)
In view of the results represented by Eqs. (8.106b) and (8.107),
the pdf of z simplifies to with a covariance matrix
 
1 5 2
p(z) = Kx = . (8.113)
(2π )Nx /2 (2π )Ny /2 (det Kx )1/2 (det Ky )1/2 2 1
1 T K−1 (x−x) 1 T K−1 (y−y)  
× e− 2 (x−x) x e− 2 (y−y) y
y
Random vector y = 1 is related to x by
= p(x) p(y), (8.108) y2

where we used the relation N = Nx + Ny . y1 = 2x1 + 3x2, (8.114a)


y2 = 4x1 + 5x2. (8.114b)
◮ The result given by Eq. (8.108) confirms that if x and Also, random variable z is related to x1 and x2 by
y are jointly Gaussian and uncorrelated, then they are
independent. ◭ z = x1 + 2x2 . (8.115)

Determine: (a) σx21 and λx1 ,x2 , (b) σz2 , (c) Ky , (d) Ky,x , and
278 CHAPTER 8 RANDOM VARIABLES, PROCESSES, AND FIELDS

(e) Ky|x . (e) Interchanging x and y in Eq. (8.106b) gives

Ky|x = Ky − Ky,x K−1 T


x Ky,x
     
Solution: (a) The covariance matrix given by Eq. (8.113) 53 99 16 7 1 −2 16 30
= −
represents 99 185 30 13 −2 5 7 13
 
    0 0
λx1 ,x1 λx1 ,x2 5 2 = .
Kx = = . (8.116) 0 0
λx2 ,x1 λx2 ,x2 2 1
This result makes perfect sense; if x is known then y = Ax also
The variance σx21 of x1 is the same as the covariance λx1 ,x1 . is known, and hence the covariance of y given x, is zero.
Hence,
σx21 = 5,
Concept Question 8-8: If {x1 , x2 , . . . , xN } are all Gaus-
and from Eq. (8.116), sian random variables, is [x1 , x2 , . . . , xN ]T necessarily a
Gaussian random vector?
λx1 ,x2 = 2.

(b) Random variable z = x1 + 2x2 is related to x by



 
 x
8-9 Random Processes
z = 1 2 1 = Bx,
x2 Figure 8-13 displays pdfs for the air temperature x[n] at discrete
times n = 1, 3, and 5. The air temperatures x[1], x[3], and
with   x[5]—as well as x[n] for all other values of n—are each a
B= 1 2 . random variable characterized by its own pdf, generated from a
Application of Eq. (8.77) leads to probabilistic model that involves several atmospheric variables.
   For example, the pdf of x[3] is p(x[3]) and its mean value is
  5 2 1 x[3], and similar attributes apply to x[n] for every n. The set
σz2 = Kz = BKx B = 1 T
2 = 17. encompassing all of these random variables for all times n is
2 1 2
called a random process, and since in the present example they
Alternatively, using Eq. (8.61), are indexed by discrete time, the process is called a discrete-time
random process. Otherwise, had the indexing been continuous
σz2 = σx21 + 4σx22 + 2λx1x2 = 5 + 4 × 2 + 2 × 2 = 17. in time, we would have called it a continuous-time random
process.
(c) In view of the coefficients in Eq. (8.114), we can relate y The random process x[n] consists of the infinitely long ran-
to x by dom vector:
y = Ax,
with   x = [. . . , x[−2], x[−1], x[0], x[1], . . . ]T , (8.117)
2 3
A= . and it is characterized by the joint infinite-dimensional pdf p(x).
4 5
As we will see shortly, different members of x may or may not be
By Eq. (8.77), correlated with each other, or their correlation may be a function
      of the time separation between them.
T2 3 5 2 2 4 53 99
Ky = AKx A = = .
4 5 2 1 3 5 99 185
◮ A sample function or realization of a random process
(d) From Eq. (8.83)
is a deterministic function that results from a specific out-
     come of the probabilistic experiment generating the random
2 3 5 2 16 7
Ky,x = AKx = = . process. Random processes are also known as stochastic
4 5 2 1 30 13
processes. ◭
8-9 RANDOM PROCESSES 279

p(x[3])

p(x[1])

x[3]
x[3]
p(x[5])
x[1]
x[1]

x[5]
x[5]

n
1 2 3 4 5 6

Figure 8-13 x[n] is the air temperature at discrete time n. At time n = 1, atmospheric conditions lead to a probabilities model for random
variable x[1] given by a pdf p(x[1]) and a mean value x[1]. Similar models characterize x[n] at each n. The sequence of random variables
{ . . . , x[−2], x[−1], x[0], x[1], . . . } constitutes a discrete-time random process.

8-9.1 Examples of Discrete-Time Random Gaussian random vector. That is,


Processes
x = [x[n1 ], x[n2 ], . . . , x[nN ]]T (8.119)
A. Independent and identically distributed (IID) is a Gaussian random vector for any N integer-valued times
The joint pdf or pmf of two independent random variables { n1 , n2 , . . . , nN }, and for any N. The joint pdf has the form given
is equal to the product of their individual pdfs or pmfs: by Eq. (8.85).
p(x, y) = p(x) p(y). By extension, if all of the elements of vector
x are statistically independent of each other, and if in addition,
they all have the same generic pdf p(x) then x is said to be an 8-9.2 Functions of Random Processes
independent and identically distributed (IID) random process
characterized by The following definitions pertain to discrete-time random pro-
cesses. Analogous expressions apply to continuous-time random

processes.
p(x) = ∏ p(x[n′ ]). (8.118)

n =−∞

Mean value
B. Gaussian random process x[n] = E[x[n]]. (8.120)
A random process is Gaussian if each finite subset of the A zero-mean random process is a process with x[n] = 0.
infinite set of random variables { x[n] } is a jointly Gaussian set
of random variables, and therefore they can be stacked into a
280 CHAPTER 8 RANDOM VARIABLES, PROCESSES, AND FIELDS

Autocovariance function Since the constant mean is known or can be estimated


(Section 9-1), it can be subtracted off from the random process,
Kx [i, j] = λx[i],x[ j] = E[(x[i] − x[i])(x[ j] − x[ j])] thereby producing a zero-mean random process, in which case
Eqs. (8.124) and (8.125b) become applicable.
= x[i] x[ j] − x[i] x[ j]. (8.121)

Cross-covariance function ◮ For the sake of simplicity, we will henceforth assume that
all WSS random processes have zero mean. ◭
Kxy [i, j] = λx[i],y[ j] = E[(x[i] − x[i])(y[ j] − y[ j])]
= x[i] y[ j] − x[i] y[ j]. (8.122) For a zero-mean random process, with x[i] = y[ j] = 0 for all i
and j, the combination of Eqs. (8.121) through (8.127) leads to
Random processes x[n] and y[n] are uncorrelated if
Rx [i − j] = Kx [i − j] = E[x[i] x[ j]] (8.128a)
Kxy [n, m] = 0 for all n and m (uncorrelated).
and
Rxy [i − j] = Kxy [i − j] = E[x[i] y[ j]]. (8.128b)
Autocorrelation function
Rx [i, j] = E[x[i] x[ j]] = Kx [i, j] + x[i] x[ j]. (8.123) Changing variables to n = i − j gives, for any j,
For a zero-mean random process with x[i] = x[ j] = 0, Rx [n] = E[x[n + j] x[ j]], (8.129a)
Rxy [n] = E[x[n + j] y[ j]]. (8.129b)
Rx [i, j] = Kx [i, j] (zero-mean process). (8.124)
Cross-correlation function If the process is IID, x[n + j] and x[ j] are independent random
variables, except for n = 0. Hence, Rx [n] becomes
Rxy [i, j] = E[x[i] y[ j]] = Kxy [i, j] + x[i] y[ j]. (8.125a) (
For zero-mean random processes E[x2 [ j]] for n = 0,
Rx [n] = (8.130)
E[x[n + j]] E[x[ j]] for n 6= 0.
Rxy [i, j] = Kxy [i, j] (zero-mean process). (8.125b)
Since the process is presumed to be zero-mean, which means
that x[i] = 0 for any i, Rx [n] simplifies to
8-9.3 Wide-Sense Stationary (WSS) Random
Process Rx [n] = σ 2 δ [n], (8.131)
A. Discrete time
where the variance is
A random process is considered wide-sense stationary (WSS),
σ 2 = x2 [ j]. (8.132)
also known as weak-sense stationary, if it has the following
three properties:
B. Continuous-time processes
(a) The mean x[n] is constant for all values of n.
All of the definitions and relationships introduced earlier
(b) The autocovariance function Kx [i, j] is a function of the for discrete-time random processes generalize directly to
difference (i − j), rather than i or j explicitly. That is, continuous-time random processes. For example, x(t) is a Gaus-
Kx [i, j] = Kx [i − j]. (8.126) sian random process if

{ x(t1 ), x(t2 ), . . . , x(tN ) } (8.133)


(c) The autocorrelation function also is a function of (i − j):
are jointly Gaussian random variables for any N real-valued
Rx [i, j] = Rx [i − j]. (8.127) times {t1 ,t2 , . . . ,tN }, and any integer N. For the discrete-time
8-9 RANDOM PROCESSES 281

WSS random process, we defined the autocorrelation and cross- In general, Rx (τ ) is related to the power spectral density of the
correlation functions in terms of the discrete-time difference signal, Sx ( f ), by the Fourier transform:
n = i − j. By analogy, we define τ = ti − t j for the continuous- Z ∞

time case and then we generalize by replacing t j with simply t, Sx ( f ) = F { Rx (τ ) } = Rx (τ ′ ) e− j2π f τ d τ ′ (8.139a)
which leads to −∞
and Z ∞
Rx (τ ) = Kx (τ ) = E[x(τ + t) x(t)] (zero-mean WSS) ′
(8.134a) Rx (τ ) = F −1 { Sx ( f ) } = Sx ( f ′ ) e j2π f τ d f ′ . (8.139b)
−∞
and
Setting τ = 0 leads to
Rxy (τ ) = Kxy (τ ) = E[x(τ + t) y(t)] (zero-mean WSS).
Z ∞
(8.134b)
E[x2 (t)] = Rx (0) = Sx ( f ′ ) d f ′ . (8.140)
−∞
These expressions are for a zero-mean WSS random process.
Furthermore, if the process is also IID, For the special case where Sx ( f ) is zero at all frequencies except
over an infinitesimally narrow band of width B centered at
f ′ = f0 , the expression given by Eq. (8.140) reduces to
Rx (τ ) = σ 2 δ (τ ). (8.135)
E[x2 (t)] = Sx ( f0 ) B. (8.141)

Since x(t) is real-valued, Rx (τ ) = Rx (−τ ) and Sx ( f ) = Sx (− f ),


8-9.4 Power Spectral Density so the bilateral power spectral density of x(t) at f0 is 2Sx ( f0 ).
Finally, for two zero-mean jointly WSS random processes
A. Continuous-time deterministic signal x and y, the cross-correlation function Rxy (τ ) and the cross-
spectral density Sxy ( f ) are related by
For a deterministic (non-random) continuous-time signal x(t), Z ∞
the signal power at time t is simply |x(t)|2 , and the total energy Sxy ( f ) = F { Rxy (τ ) } = Rxy (τ ′ ) e− j2π f τ d τ ′

(8.142a)
of the signal is, from Eq. (2.7), −∞
Z ∞ and Z ∞
E= |x(t)|2 dt. (8.136) Rxy (τ ) = F −1 { Sxy ( f ) } =

Sxy ( f ′ ) e j2π f τ d f ′ . (8.142b)
−∞
−∞
In image processing, the image intensity is real-valued, so we
will continue to treat 1-D signals and 2-D images as real-valued, C. Discrete-time random process
in which case |x(t)|2 = x2 (t).
Using the expressions given in Eq. (8.129) for the autocorrela-
tion and cross-correlation functions Rx [n] and Rxy [n] for zero-
mean WSS random processes, application of the DTFT (instead
B. Continuous-time random signal of the Fourier transform) leads to
The power at time t of a continuous-time random process x(t) ∞
is defined not only in terms of x2 (t) but also in terms of the Sx (Ω) = ∑ Rx [n] e− jΩn (8.143a)
probability of that specific value of x(t). That is, n=−∞
Z ∞ and
2 ′ 2 ′ ′ ∞
P(t) = E[x (t)] = (x (t)) p(x (t)) dx (t). (8.137) Rxy [n] e− jΩn .
x′ =−∞ Sxy (Ω) = ∑ (8.143b)
n=−∞
If x(t) is a zero-mean WSS random process, we can express the
power in terms of the autocorrelation function Rx (τ ), as defined D. White random process
by Eq. (8.134a), for τ = 0:
According to Eq. (8.135), the autocorrelation function for a
E[x2 (t)] = Rx (0). (8.138) zero-mean WSS random process is given by Rx (τ ) = σ 2 δ (τ ).
282 CHAPTER 8 RANDOM VARIABLES, PROCESSES, AND FIELDS

From Eq. (8.139a), the corresponding power spectral density is


Exercise 8-11: A zero-mean WSS random process has the
Z ∞ Z ∞ 2
− j2π f τ ′ − j2π f τ ′ Gaussian autocorrelation function R(τ ) = e−πτ . What is its
Sx ( f ) = Rx (τ ′ ) e dτ ′ = σ 2 δ (τ ′ ) e dτ ′
−∞ −∞ power spectral density?
2
=σ . (8.144) Answer: Sx ( f ) = F {R(τ )}. From entry #6 in Table 2-5,
2
Sx ( f ) = e−π f .
Hence, Sx ( f ) is constant across all frequencies f . By analogy
to white light, which consists of all colors of the spectrum, a
frequency independent power spectral density is called white
and x(t) is called a white random process. One of the properties
of such a process is that x(t1 ) and x(t2 ) are uncorrelated random
8-10 LTI Filtering of Random Processes
variables (and also independent if x(t) is Gaussian), which
means that the value of x(t1 ) at time t1 carries no information 8-10.1 Continuous-Time Random Process
about the value of x(t2 ) at time t2 no matter how small |t2 − t1 | Consider an LTI system with a real-valued impulse response
is, so long as t2 6= t1 . h(t), a continuous-time random process x(t) at its input, and an
Often, a white random process is illustrated by a plot like the output y(t):
one depicted in Fig. 8-14(a). However, such a plot is incorrect
because it displays time continuity of x(t). The correct plot x(t) h(t) y(t). (8.146)
should look more like the one in part (b) of the figure.
Similarly, for a discrete-time WSS white random processes Since x(t) is a random process, it can exhibit many sample
with Rx [n] = σ 2 δ [n], use of Eq. (8.131) in Eq. (8.143a) leads to functions (manifestations), and each sample function x(t) is
filtered by h(t) to produce a sample function y(t). Hence, y(t)
Sx (Ω) = σ 2 (8.145)
also is a random process:
for all Ω. Furthermore, x[n1 ] and x[n2 ] are uncorrelated for any Z ∞
times n1 6= n2 , and also independent if x[n] is a Gaussian random y(t) = h(t) ∗ x(t) = h(α ) x(t − α ) d α . (8.147a)
−∞
process.
Now we examine the relation between the autocorrelations of
x(t) and y(t), as well as the cross-correlation between x(t) and
y(t). We assume x(t) is a zero-mean WSS random process,
which implies that Rx (t1 ,t2 ) = Rx (t1 − t2 ). Taking the expecta-
tion E[·] of Eq. (8.147a) gives
Concept Question 8-9: Are IID random processes wide- Z ∞
sense stationary? E[y(t)] = h(α ) E[x(t − α )] d α = 0, (8.147b)
−∞

Concept Question 8-10: Why can we assume that a where we assumed that E[x(t − α )] = 0, based on our earlier
WSS random process has zero mean? assumption that x(t) is a zero-mean WSS random process.
Hence, y(t) is zero-mean.
Exercise 8-10: A zero-mean WSS random process has
power spectral density A. Autocorrelation of output
4 Let us consider y(t) at times t1 and t2 ; upon replacing t with t1
Sx ( f ) = .
(2π f )2 + 4 and dummy variable α with α1 , and then repeating the process
at t2 , we have
What is its autocorrelation function?
Z ∞
Answer: R(τ ) = F −1 {Sx ( f )}. From entry #3 in Table 2-5, y(t1 ) = h(t1 ) ∗ x(t1 ) = h(α1 ) x(t1 − α1 ) d α1 (8.148a)
−∞
R(τ ) = e−2|τ | .
and
8-10 LTI FILTERING OF RANDOM PROCESSES 283

−1

−2

−3
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
(a) Incorrect plot of white process

−1

−2

−3
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
(b) Correct plot of white process

Figure 8-14 A white process cannot be continuous in time.

Z ∞
y(t2 ) = h(t2 ) ∗ x(t2 ) = h(α2 ) x(t2 − α2 ) d α2 . (8.148b) Taking the expectation E[·] of both sides gives
−∞ Z Z ∞
Multiplication of y(t1 ) by y(t2 ) gives Ry (t1 ,t2 ) = h(α1 ) h(α2 ) Rx (t1 − t2 − α1 + α2 ) d α1 d α2 .
−∞
Z ∞Z ∞ (8.150)
y(t1 ) y(t2 ) = h(α1 ) h(α2 ) x(t1 − α1 ) x(t2 − α2 ) d α1 d α2 . Sunce we have already shown that E[y(t)] = 0, it follows
−∞ −∞ that Ry (t1 ,t2 ) = Ry (t1 − t2 ) and y(t) is WSS. Upon defining
(8.149)
284 CHAPTER 8 RANDOM VARIABLES, PROCESSES, AND FIELDS

τ = t1 − t2 , we have filtering process is a convolution depicted as


Z Z ∞
Ry (τ ) = h(α1 ) h(α2 ) Rx (τ − α1 + α2 ) d α1 d α2 x[n] h[n] y[n], (8.158)
−∞
= h(τ ) ∗ h(−τ ) ∗ Rx(τ ). (8.151)
and equivalent to
Taking the Fourier transform of both sides leads to the following
expression for the power spectral density of y(t): ∞
y[n] = h[n] ∗ x[n] = ∑ h[i] x[n − i]. (8.159)
Sy ( f ) = H( f ) H(− f ) Sx ( f ) = |H( f )|2 Sx ( f ). (8.152) i=−∞

The discrete-time counterparts of the autocorrelation and cross-


B. Cross-correlation between input and output correlation relations given by Eqs. (8.151) and (8.156) are
derived similarly and given by
We begin by rewriting Eq. (8.147a) with the dummy variable α
changed to α1 : Ry [n] = h[n] ∗ h[−n] ∗ Rx[n] (8.160a)
Z ∞ and
y(t) = h(α1 ) x(t − α1 ) d α1 . (8.153) Ryx [n] = h[n] ∗ Rx[n]. (8.160b)
−∞

Next, we multiply both sides by x(t − α2 ): These relationships presume our earlier characterization that
x[n] and y[n] are zero-mean, jointly WSS processes.
Z ∞
The DTFT maps convolutions in the discrete-time domain to
y(t) x(t − α2 ) = h(α1 ) x(t − α1 ) x(t − α2 ) d α1 . (8.154)
−∞ products in the frequency domain:

Taking the expectation E[·] of both sides—while keeping in


mind that x(t) is a zero-mean WSS process—leads to Sy (Ω) = |H(Ω)|2 Sx (Ω) (8.161a)
Z ∞ and
Ryx (t, t − α2 ) = h(α1 ) Rx (α2 − α1 ) d α1 , (8.155) Syx (Ω) = H(Ω) Sx (Ω) (8.161b)
−∞

which states that x(t) and y(t) are jointly WSS. Noting that

Ryx (t, t − α2 ) = Ryx (t − (t − α2 )) = Ryx (α2 ),


Example 8-10: Continuous-Time Random
and the integral in Eq. (8.155) is in the form of a convolution Process
leads to the result

Ryx (α2 ) = h(α2 ) ∗ Rx (α2 ). (8.156) The input x(t) to an LTI system defined by

The frequency domain equivalent of Eq. (8.156) is dy


+ 7y(t) = 5x(t)
dt
Syx ( f ) = H( f ) Sx ( f ). (8.157) is a zero-mean, white random process with Rx (τ ) = 3δ (τ ).
Compute the power spectral density and autocorrelation func-
tion of the output y(t).
8-10.2 Discrete-Time Random Processes Solution: Application of the Fourier transform to the system
equation gives
The relations obtained in the preceding subsection are all for
continuous-time random processes. If, instead, the zero-mean ( j2π f + 7) Y( f ) = 5X( f ),
WSS random process x[n] is passed through an LTI system with
impulse response h[n] to produce a random process y[n], the where use was made of property #5 in Table 2-4. The system’s
8-11 RANDOM FIELDS 285

frequency response is
Exercise 8-12: x(t) is a white process with Rx (τ ) = 5δ (τ )
Y( f ) 5 and Z t+1
H( f ) = = .
X( f ) j2π f + 7 y(t) = x(τ ) d τ .
t−1
From Eq. (8.144), Sx ( f ) = σ 2 when x(t) is a white WSS with Compute the power spectral density of y(t).
Rx (τ ) = σ 2 δ (τ ). Hence, in the present case, Sx ( f ) = 3, and the
power spectral density of y(t) is, from Eq. (8.152), Answer: Sy ( f ) = 20 sinc2 (2 f ). (See IP ).

2
5 75
Sy ( f ) = |H( f )|2 Sx ( f ) = ×3 = .
j2π f + 7 4π 2 f 2 + 49 8-11 Random Fields
Using entry #3 in Table 2-5, the inverse Fourier transform of
Sy ( f ) is The preceding sections highlighted the properties and relations
75 −7|τ | for 1-D random processes, in both continuous and discrete time.
Ry (τ ) = e . Now, we extend those results to 2-D space, and to distinguish
14 between 1-D and 2-D processes, we refer to the latter as a
random field, instead of a random process, and we also change
our symbols to match the notation we used in earlier chapters to
represent 2-D images.

Example 8-11: Discrete-Time Random Process


8-11.1 Random Field Notation
(a) Continuous-space random field f (x, y) with 2-D continuous
The input x[n] to a discrete-time LTI system given by spatial dimensions x and y.
(b) Discrete-space random field f [n, m] with 2-D discrete spatial
y[n] + 2y[n − 1] = 3x[n] + 4x[n − 1]
dimensions n and m.
is a zero-mean white random process with Rx [n] = 3δ [n]. Com- (c) Probability density and mass functions
pute the power spectral density of y[n].
p( f (x, y) = f ′ (x, y)) = probability density that random field
Solution: Taking the DTFT of the system equation gives f (x, y) at (x, y) has value f ′ (x, y) (continuous space),
− jΩ − jΩ ′
(1 + 2e ) Y(Ω) = (3 + 4e ) X(Ω), p[ f [n, m] = f [n, m]] = probability mass that random field
f [n, m] at [n, m] has value f ′ [n, m] (discrete space).
where use was made of entry #2 in Table 2-7. The system’s
frequency response is then
(d) Mean values for each (x, y) and [n, m]
Y(Ω) 3 + 4e− jΩ Z ∞
H(Ω) = = .
X(Ω) 1 + 2e− jΩ E[ f (x, y)] = f ′ (x, y) p( f ′ (x, y)) d f ′ , (8.162a)
−∞
Given that x[n] is a zero-mean white process with Rx [n] = 3δ [n], (continuous space)
it follows that Sx [Ω] = 3, and by Eq. (8.161a), ∞
E[ f [n, m]] = ∑ f ′ [n, m] p[ f ′ [n, m]], (8.162b)
3 + 4e− jΩ 2 ′
f =−∞
Sy (Ω) = |H(Ω)|2 Sx (Ω) = ×3
1 + 2e− jΩ (discrete space)
3[(3 + 4 cosΩ)2 + 16 sin2 Ω]
= . where p( f ′ (x, y)) is the pdf of variable f ′ (x, y) and p[ f ′ [n, m]] is
(1 + 2 cosΩ)2 + 4 sin2 Ω the pmf of f ′ [n, m].
286 CHAPTER 8 RANDOM VARIABLES, PROCESSES, AND FIELDS

(e) Autocorrelation functions and


π Z π Z
R f (x1 , y1 ; x2 , y2 ) = E[ f (x1 , y1 ) f (x2 , y2 )], (8.163a) 1
E[ f 2 [n, m]] = R f (0, 0) = 2
S f (Ω1 , Ω2 ) dΩ1 dΩ2 .
(continuous space) 4 π −π −π
(discrete-space zero-mean WSS) (8.165b)
R f (n1 , m1 ; n2 , m2 ) = E[ f [n1 , m1 ] f (n2 , m2 )]. (8.163b)
(discrete space)
8-11.3 Filtering WSS Random Fields
Extending the material on 1-D random processes (Section 8-10)
As in 1-D, if the mean values E[ f (x, y)] and E[ f [n, m]] are to 2-D, we have
zero, the autocorrelation functions are equal to the autocovari-
ance functions:
f (x, y) h(x, y) g(x, y) = h(x, y) ∗ ∗ f (x, y),
R f (x1 , y1 ; x2 , y2 ) = K f (x1 , y1 ; x2 , y2 ) (continuous space),
R f [n1 , m1 ; n2 , m2 ] = K f [n1 , m1 ; n2 , m2 ] (discrete space). (continuous space) (8.166a)

f [n, m] h[n, m] g[n, m] = h[n, m] ∗ ∗ f [n, m].


8-11.2 WSS Random Fields
The attributes of 1-D WSS random processes were discussed (discrete space) (8.166b)
earlier in Section 8-9.3. Generalizing to 2-D, a continuous-space
zero-mean random field f (x, y) is WSS if If f (x, y) and f [n, m] are zero-mean WSS random fields, then

(1) E[ f (x, y)] = constant for all (x, y), and Rg (x, y) = h(x, y) ∗ ∗h(−x, −y) ∗ ∗R f (x, y), (8.167a)
Rg [n, m] = h[n, m] ∗ ∗h[−n, −m] ∗ ∗R f [n, m], (8.167b)
(2) R f (x, y) = E[ f (x′ + x, y′ + y) f (x′ , y′ )].
Rg f (x, y) = h(x, y) ∗ ∗R f (x, y), (8.167c)
The corresponding relations for discrete-space random fields are Rg f [n, m] = h[n, m] ∗ ∗R f [n, m], (8.167d)
(1) E[ f [n, m]] = constant for all [n, m], and Sg (µ , ν ) = |H(µ , ν )|2 S f (µ , ν ), (8.167e)
2
Sg (Ω1 , Ω2 ) = |H(Ω1 , Ω2 )| S f (Ω1 , Ω2 ), (8.167f)
(2) R f [n, m] = E[ f [n′ + n, m′ + m] f [n′ , m′ ]].
Sg f (µ , ν ) = H(µ , ν ) S f (µ , ν ), (8.167g)
The power spectral density Sf (µ , ν ) in the continuous-space Sg f (Ω1 , Ω2 ) = H(Ω1 , Ω2 ) S f (Ω1 , Ω2 ). (8.167h)
frequency domain (µ , ν ) is related to R f (x, y) by
Z ∞Z ∞
S f (µ , ν ) = R f (x, y) e− j2π (µ x+ν y) dx dy, Exercise 8-13: A white random field f (x, y) with autocor-
−∞ −∞ relation function R f (x, y) = 2δ (x) δ (y) is filtered by an LSI
(continuous-space zero-mean WSS) (8.164a) system with PSF h(r) = 1r , where r is the radius in (x, y)
space. Compute the power spectral density of the output
and the 2-D equivalent of Eq. (8.140) is random field g(x, y).
Z ∞Z ∞
Answer:
E[ f 2 (x, y)] = R f (0, 0) = S f (µ , ν ) d µ d ν . 2
−∞ −∞ Sg (µ , ν ) = .
µ2 + ν2
(continuous-space zero-mean WSS) (8.164b)
(See IP ).
For a discrete-space WSS random field with zero mean,
∞ ∞
S f (Ω1 , Ω2 ) = ∑ ∑ e− j(Ω1 n+Ω2 m) ,
n=−∞ m=−∞
(discrete-space zero-mean WSS) (8.165a)
8-11 RANDOM FIELDS 287

Summary
Concepts
• A random variable is a number assigned to each random matrix.
outcome. • A random process is a set of random variables indexed
• A probability density function (pdf) is the probability by time.
that a random variable x lies in an interval of length δ x. • A wide-sense stationary (WSS) random process has
• A Gaussian random variable is described by its mean and constant mean and autocorrelation Rx [i, j] = Rx [i − j].
variance. The constant mean is subtracted off.
• A random vector is a vector of random variables. Mean • A random field is a 2-D random process in 2-D space.
and variance generalize to mean vector and covariance

Mathematical Formulae
Conditional probability Covariance matrix
P[E1 ∩ E2 ] Kx = E[xxT] − xxT
P[E1 |E2 ] =
P[E2 ]
Covariance matrix
Interval probability
Z b KAx = AKx AT
P[a ≤ x < b] = p(x′ ) dx′
a Gaussian random vector
Interval probability x ∼ N (x, Kx )
Z bx Z by T −1
e−(1/2)(x−x) Kx (x−x)
P[ax ≤ x < bx , ay ≤ y < by ] = p(x′ , y′ ) dx′ dy′ p(x) =
ax ay (2π )N/2 (det Kx )1/2
Conditional pdf Conditional Gaussian expectation
p(x|y) =
p(x, y) E[x|y] = E[x] + Kx,yK−1
y (y − E[y])
p(y)
Expectation Autocovariance function
Z ∞ Kx [i, j] = E[x[i] x[ j]] − E[x[i]] E[x[ j]]
E[ f (x)] = f (x) = f (x′ ) p(x′ ) dx′
−∞
Autocorrelation function
Variance Rx [i, j] = E[x[i] x[ j]]
σx2 = E[x2 ] − x2
Rx (t1 ,t2 ) = E[x(t1 ) x(t2 )]
Covariance
Wide-sense stationary random process
λx,y = E[xy] − xy
Rx (t1 ,t2 ) = Rx (t1 − t2 ); x(t)] = m
Gaussian pdf
1 2 2
Power spectral density
p(x) = p e−(x−x) /(2σx ) Sy ( f ) = F {Ry (t)} = |H( f )|2 Sx ( f )
2πσx2
Mean vector Cross-spectral density
Syx ( f ) = F {Rx,y (t)} = H( f ) Sx ( f )
E[x] = x = [x1 , . . . , xN ]T
288 CHAPTER 8 RANDOM VARIABLES, PROCESSES, AND FIELDS

Important Terms Provide definitions or explain the meaning of the following terms:
autocovariance function disjoint pmf sample and event spaces
axioms of probability iid random process power spectral density sample function
conditional probability independent probability and trees vector random variable
covariance matrix mutually exclusive random field white random process
cross-covariance function pdf realization wide-sense stationary

PROBLEMS 8.4 Bayes’s Rule. Widgets are made by two factories. Factory
A makes 6000 widgets per year, 2% of which are defective. Fac-
Section 8-2: Conditional Probability tory B makes 4000 widgets per year, 3% of which are defective.
A widget is chosen at random from the 10,000 widgets made in
8.1 Three-Card Monte (a game hucksters play with chumps— a year.
don’t be a chump!). There are three cards. Card #1 is red on (a) Compute P[the chosen widget is defective].
both sides. Card #2 is black on both sides. Card #3 is red on one
(b) Compute P[the chosen widget came from factory A, given
side and black on the other side. The cards are shuffled and one that it is defective].
chosen at random. The top of the chosen card is red. What is
P[bottom of the chosen card is also red]? (The chump bets even 8.5 Bayes’s Rule. Widgets are made by two factories. Factory
money.) A makes 7000 widgets per year, 1% of which are defective. Fac-
tory B makes 3000 widgets per year, 2% of which are defective.
8.2 A Tale of Two Tosses. Two coins have P[heads] as follows: A widget is chosen at random from the 10,000 widgets made in
Coin A has P[heads] = 1/3. Coin B has P[heads] = 3/4. The a year.
results of all flips are independent. We choose a coin at random
(a) Compute P[the chosen widget is defective].
and flip it. Given that it came up heads, compute P[it was
coin A]. Hint: Draw a probability tree. (b) Compute P[the chosen widget came from factory A, given
that it is defective].
8.3 Three Coins in the Fountain (an old movie). Three coins
have P[heads]: Coin A has P[heads] = 2/3. Coin B has Section 8-3: Random Variables
P[heads] = 3/4. Coin C has P[heads] = 4/5. The results of all
flips are independent. Coin A is flipped. Then: 8.6 Random variables x and y have the joint pdf
• If coin A lands heads, we flip coin B. (
cx if 1 < x < y < 2,
p(x, y) =
• If coin A lands tails, we flip coin C. 0 otherwise,
Let H2 denote that the second coin flipped (whatever it is) lands where c is a constant to be determined.
heads.
(a) Compute the constant c in the pdf p(x, y).
(a) Compute P[H2]. (b) Are x and y independent? Explain your answer.
(b) Compute P[Coin A landed heads, given H2]. Now the (c) Compute the marginal pdf p(y).
second coin is flipped n − 1 more times (for a total of n
(d) Compute the conditional pdf p(x|y) at y = 3/2.
flips). Let H2n denote that all n flips of the second coin
land heads. 8.7 Random variables x and y have the joint pdf
(c) Compute P[H2n]. (
cxy if 0 < y < x < 1,
(d) Compute P[Coin A landed heads, given H2n]. p(x, y) =
0 otherwise,
(e) What happens to your answer to (d) as n → ∞? Explain this.
Hint: Draw a probability tree. where c is a constant to be determined.
PROBLEMS 289

(a) Compute the constant c in the pdf p(x, y). (a) Write out the formula for the joint pdf p(x, y).
(b) Are x and y independent? Explain your answer. (b) Show that changing variables from (x, y) to (z, w), where
(c) Compute the marginal pdf p(x).     
z 1 1 1 x
(d) Compute the conditional pdf p(y|x) at x = 1/2. =√
w 2 1 −1 y
8.8 Random variable x has the exponential pdf
( yields two decorrelated (λz,w = 0) random variables z
p(x) = λ e−λ x for x > 0, and w.
0 for x < 0. 8.12 Random variables {x1 , x2 , x3 } are all zero-mean and
R jointly Gaussian, with variances σx2i = 2 and covariances

(a) Confirm −∞ p(x) dx = 1. λxi ,x j = −1 for i 6= j.
(b) Compute the expectation x. (a) Write out the joint pdf p(x1 , x2 , x3 ). Use Eq. (8.85) nota-
(c) Compute the variance σx2 . tion.
(b) Write out the joint marginal pdf p(x1 , x3 ). Use Eq. (8.85)
Section 8-5: Joint Pdfs and Pmfs notation.
(c) Let y = x1 + 2x2 + 3x3. Compute σy2 .
8.9 Random variables m and n have the joint pmf
(d) Show that the conditional pdf p(x1 |x2 = 2, x3 = 3) =
(
c if 0 ≤ n ≤ m ≤ 4, δ (x1 + 5). Hint: Compute the eigenvalues and eigenvectors
p[m, n] = of the covariance matrix.
0 otherwise,
8.13 Prove Eq. (8.111a) and Eq. (8.111b), which are:
where c is a constant to be determined and m and n are integers.
(a) Compute the constant c in the pmf p[m, n]. E[x|y = y′ ] = E[x] + Kx,yK−1 ′
y (y − E[y]),

(b) Are m and n independent? Explain your answer. Kx|y = Kx − Kx,yK−1 T


y Kx,y .
(c) Compute the marginal pmf p[m].
Hint: Define random vector w = x − Kx,yK−1 y y and show that w
(d) Compute the conditional pmf p[n|m] at m = 2. and y are jointly Gaussian and uncorrelated, hence independent,
(e) Compute the conditional mean E[n|m] at m = 2. random vectors.
2 at m = 2.
(f) Compute the conditional variance σn|m
8.10 Random variables m and n have the joint pmf Section 8-9: LTI Filtering of Random Processes
(
p[m, n] = cm if 0 ≤ n ≤ m ≤ 4, 8.14 x(t) is a zero-mean WSS random process with power
0 otherwise, spectral density
1
Sx ( f ) = .
where c is a constant to be determined and m and n are integers. 9 + (2π f )2
(a) Compute the constant c in the pmf p[m, n]. x(t) is passed through the LTI system y(t) = dx
dt + 3x(t).
(b) Compute the marginal pmf p[n]. (a) Compute the power spectral density Sy ( f ).
(c) Compute the conditional pdf p[n|m] at m = 2. (b) Compute the cross-spectral density Syx ( f ).

Section 8-7: Random Vectors 8.15 x(t) is a zero-mean WSS random process with power
spectral density Sx ( f ) = 1. x(t) is passed through the LTI system
dy
8.11 Random variables x and y are zero-mean and jointly dt + 4y(t) = 3x(t).
Gaussian, with variances σx2 = σy2 = 1 and covariance λx,y = ρ (a) Compute the power spectral density Sy ( f ).
for some constant ρ 6= ±1.
(b) Compute the cross-spectral density Syx ( f ).
290 CHAPTER 8 RANDOM VARIABLES, PROCESSES, AND FIELDS

8.16 x(t) is a zero-mean WSS random process with power


spectral density Sx ( f ) = 1. x(t) is passed through an LTI system
with impulse response
(
2e−3t for t > 0,
h(t) =
0 for t < 0.

(a) Compute the power spectral density Sy ( f ).


(b) Compute the cross-spectral density Syx ( f ).

Section 8-11: Random Fields

8.17 f (x, y) is a zero-mean WSS Gaussian random field with


power spectral density S f (µ , ν ) = 1. g(x, y) is f (x, y) filtered by
a brick-wall lowpass filter with cutoff spatial frequency 21 .
(a) Compute the pdf of g(7, 5).
(b) Compute the joint pdf of {g(7, 5), g(2, 4)}.
8.18 f (x, y) is a zero-mean WSS random field with power
spectral density S f (µ , ν ) = 1. g(x, y) is f (x, y) blurred by a
2 2
Gaussian PSF, so g(x, y) = f (x, y) ∗ ∗ e−π (x +y ) .
2
(a) Compute the variance σg(3,5) .
(b) Compute the covariance λg(3,5),g(2,4).
8.19 A fractal random field f (x, y) has a power spectral
density of form S f (ρ , φ ) = c/ρ d for two constants c and d.
Show that the Laplacian operator g(x, y) = ∇2 f (x, y) defined
in Eq. (5.5) “whitens” f (x, y) (produces a white random field
output g(x, y)) if d = 4.
8.20 f [n, m] is a zero-mean WSS white random field
with power spectral density S f (Ω1 , Ω2 ) = 1. g[n, m] is
f [n, m] filtered with a Laplacian, so g[n, m] = ∇2 f [n, m].
∇2 f [n, m] is the discrete-space Laplacian (Eq. (5.9)). g[n, m] =
f [n, m] ∗ ∗ hLaplace [n, m]. Eq. (5.10):
 
0 1 0
hLaplace [n, m] = 1 −4 1 .
0 1 0

Compute the power spectral density of g[n, m].


8.21 f [n, m] is a zero-mean WSS Gaussian random field with
autocorrelation function R f [m, n] = 26−|m|−|n| .
(a) Compute the pdf of f [5, 4]
(b) Compute the joint pdf of { f [5, 4], f [3, 1]}.
Chapter 9
9 Stochastic Denoising
and Deconvolution
Contents
Overview, 292
9-1 Estimation Methods, 292
9-2 Coin-Flip Experiment, 298
(a) Fractal tree image (d) Noisy blurred image
9-3 1-D Estimation Examples, 300 log10(Sf (0,Ω2))
9-4 Least-Squares Estimation, 303 9

9-5 Deterministic versus Stochastic Wiener 8

Filtering, 307
7

9-6 2-D Estimation, 309 5

9-7 Spectral Estimation, 313 4

9-8 1-D Fractals, 314


3

9-9 2-D Fractals, 320 1


−2 −1.5 −1 −0.5 0 0.5
log(Ω2)

9-10 Markov Random Fields, 322 (b) Plot of log10(Sf (0,Ω2)) versus
log10(Ω2) for π/240 ≤ Ω2 ≤ π.
(e) Reconstructed image using
a stochastic Wiener filter
9-11 Application of MRF to Image Segmentation, 327 The average slope is −2.

Problems, 331

Objectives
Learn to:
(c) Blurred image (f ) Reconstructed image using
■ Compute MLE, MAP, and LS estimates for small a deterministic Wiener filter
analytic problems.
This chapter provides a quick review of estimation
■ Estimate parameters of a fractal power spectral theory, including MLE, MAP, LS and LLSE
density from a signal or image. estimators. It then derives stochastic versions of the
deterministic denoising and deconvolution filters
■ Denoise and deconvolve fractal signals and images presented in earlier chapters. Incorporating a priori
using stochastic filters. information, in the form of power spectral density,
is shown to greatly improve the performance of
■ Use thresholding and shrinkage of its wavelet denoising and deconvolution filters on 1-D and 2-D
transform to denoise an image. problems. A very quick presentation of Markov
random fields and the ICM algorithm for image
■ Use the ICM algorithm to segment an image.
segmentation, with examples, are also provided.
Overview All three estimation methods are applied in Section 9-2 to the
coin-flip experiment presented earlier in Section 8-1, obtaining
In a deterministic (non-random) inverse problem, the goal is to three different estimators of P[heads]. This is then followed in
compute an unknown 1-D signal x(t) or x[n], or a 2-D image Sections 9-3 to 9-5 with applications of the estimation methods
f (x, y) or f [n, m], from observation of a corresponding signal to four 1-D estimation problems: (1) estimating the mean of
y(t) or y[n], or corresponding image g(x, y) or g[n, m], wherein a wide-sense stationary (WSS) random process, in the context
the unknown quantity and its observed counterpart are linked of polling, (2) denoising a signal containing additive noise, (3)
by a known model. The inversion process may also take into denoising a signal known to be sparse, and (4) deconvolving
consideration side information about the unknown quantity, a signal from its noisy convolution with a known impulse
such as non-negativity, or a priori information, in the form of response.
a pdf or pmf for it. The solution of an inverse problem may be The treatment is extended to 2-D in Section 9-6, and in
difficult, but a well-posed inverse problem always has a unique Section 9-7 we review the periodogram method for estimating
solution. power spectral densities of random processes and random fields.
An estimation problem is a stochastic (random) version of The periodogram can be used to obtain parametric forms of
the inverse problem. The observation is a 1-D random variable power spectral densities for classes of signals and images, such
or random process, or a 2-D random field. The random character as fractals, which can then be used in the formulations developed
of the observation may be associated with the inherent nature in earlier sections. The procedures are outlined in Sections 9-8
of the observing system itself—an example of which is a laser for signals and 9-9 for images. The last two sections of the
imager and certain radar systems—or due to additive noise chapter are focused on an introduction to Markov random fields
introduced in the system’s receiver. An image generated by a (MRF) and their application to image segmentation. An MRF
monochromatic laser imager or synthetic-aperture radar usually incorporates a priori information (prior knowledge) about the
exhibits a speckle-like randomness superimposed on the true relationships between the value of a given pixel and those of its
image intensity that would have been measured by a wide- neighbors.
bandwidth sensor.
Based on knowledge of the sources and mechanisms respon-
sible for the randomness associated with the observed signal 9-1 Estimation Methods
or image, we can incorporate randomness (stochasticity) in the
model that relates the unknown quantity to the observation by This section introduces three estimation methods: MLE, MAP,
modeling the unknown quantity as a random variable with a and LSE, and even though our ultimate goal is to apply these
characteristic pdf or pmf. This pdf or pmf represents a pri- methods to 2-D images (which we do in later sections), we
ori information about the unknown quantities. Because of the will, for the present, limit the presentation to estimating a
stochastic nature of the inverse problem, we call it an estimation scalar unknown x from a scalar observation yobs of a random
problem. The formulation of an estimation problem usually variable y. In later sections we generalize the formulations to
leads to a likelihood function, and the goal of the estimation random vectors, processes, and fields.
problem becomes to maximize the likelihood function; the value An estimation problem consists of the following ingredients:
of the unknown signal or image that maximizes the likelihood (a) x: the unknown quantity to be estimated, which may have a
function is deemed the solution of the estimation problem. constant value or it may be a random variable with a known
In Section 9-1, we introduce three common approaches to pdf p(x).
estimation:
(b) yobs : the observed value of random variable y.
(a) Maximum Likelihood Estimation (MLE),
(c) p(y | x = x′ ): the conditional pdf of y, given that x = x′ ,
(b) Maximum A Posteriori Probability (MAP) Estimation, provided by a model.
(c) Least-Squares Estimation (LSE). In a typical situation, y and x are related by a model of the form
In all three methods, the observation is presumed to be random y(t) = h(t) ∗ x(t) + υ (t) (continuous time),
in nature, and the unknown quantity also is presumed to be
random but only in MAP and LSE; the unknown quantity is or
presumed to be non-random in MLE. y[n] = h[n] ∗ x[n] + υ [n] (discrete time),

292
9-1 ESTIMATION METHODS 293

where h(t) and h[n] are the continuous-time and discrete-time information about the unknown (input) random variable x in the
impulse responses of the observing system and υ (t) and υ [n] form of its pdf p(x). For example, x may be known to have a
represent random noise added by the measurement process. To Gaussian pdf (Fig. 9-1(a)
obtain a good estimate of x(t) (or x[n]), we need to filter out the
2 2
noise and to deconvolve y(t) (or y[n]). e−[(x−x) /(2σx )]
We now introduce the basic structure of each of the three p(x) = p (Gaussian), (9.1)
2πσx2
estimation methods.
where x = E[x] is the mean value of x and σx2 is its variance. Or x
9-1.1 Maximum Likelihood Estimation (MLE) might be known to be a value of a sparse signal with Laplacian
pdf (Fig. 9-1(b))
Among the three estimation methods, the maximum likelihood √
estimation (MLE) method is the one usually used when the e− 2 |x|/σx
unknown quantity x has a constant value, as opposed to being p(x) = √ (Laplacian). (9.2)
2 σx
a random variable. The basic idea behind MLE is to choose
the value of x that makes what actually happened, namely the Gaussian and Laplacian a priori pdfs will be used later in
observation y = yobs , the most likely outcome. MLE maximizes Sections 9-3 and 9-4.
the likelihood of y = yobs (hence the name MLE) by applying The idea behind MAP estimation is to determine the most
the following recipe: probable value of x, given y = yobs , which requires maximizing
the a posteriori pdf p(x|yobs ). This is in contrast to the MLE
(1) Set y = yobs in the conditional pdf p(y|x) provided by the method introduced earlier, which sought to maximize the likeli-
model to obtain p(yobs |x). hood function p(yobs |x). As we see shortly, in order to maximize
(2) Choose the value of x that maximizes the likelihood func- p(x|yobs ), we need to know not only p(yobs |x), but also the
tion p(yobs |x) and denote it x̂MLE. a priori pdf p(x).
As noted earlier, usually we know or have expressions for
(3) If side information about x is available, such as x is non- p(x) and the likelihood function pobs (y|x), but not for the a
negative or x is bounded within a specified interval, then posteriori pdf p(x|y). To obtain an expression for the latter (so
incorporate that information in the maximization process. we may maximize it), we use the conditional pdf relation given
by Eq. (8.36b) to relate the joint pdf p(x, y) to each of the two
In practice it is often easier to maximize the natural logarithm, conditional pdfs:
ln(p(yobs |x)), rather than to maximize p(yobs |x) itself. Hence-
forth, ln(p(yobs |x)) will be referred to as the log-likelihood p(x, y) = p(y|x) p(x) (9.3a)
function. and
p(x, y) = p(x|y) p(y). (9.3b)
◮ p(yobs |x) = likelihood function
Combining the two relations gives Bayes’s rule for pdfs:
p(x) = a priori pdf
p(x|yobs ) = a posteriori pdf
ln(p(yobs |x) = log-likelihood function p(x)
p(x|y) = p(y|x) . (9.4)
p(y)

9-1.2 Maximum A Posteriori (MAP) Estimation Bayes’s rule relates the a posteriori pdf p(x|yobs ) to the likeli-
If the unknown quantity x is a random variable, the two meth- hood function p(yobs |x).
ods most commonly used to estimate x, given an observation The goal of the MAP estimator is to choose the value of x
y = yobs , are the maximum a posteriori (MAP) estimator and that maximizes the posteriori pdf p(x|yobs ), given the a priori
the least-squares estimator (LSE). This subsection covers MAP pdf p(x) and the likelihood function p(yobs |x). The third pdf in
and the next one covers LSE. Eq. (9.4), namely p(y), has no influence on the maximization
The observation y = yobs is called a posteriori information process because it is not a function of x. Hence, p(y) can be set
because it is about the outcome. This is in contrast to a priori equal to any arbitrary constant value C.
294 CHAPTER 9 STOCHASTIC DENOISING AND DECONVOLUTION

p(x)
0.4
Gaussian pdf
x=0
0.3 σx = 1

0.2

x=0
0.1 σx = 2

0 x
−6 −4 −2 0 2 4 6
(a) Gaussian pdf

p(x)
0.8
Laplacian pdf
0.6
σx = 1

0.4

0.2
σx = 2

0 x
−6 −4 −2 0 2 4 6
(b) Laplacian pdf

Figure 9-1 Zero-mean Gaussian and Laplacian pdfs.

The MAP estimator recipe consists of the following steps: given by Eq. (9.4) and set p(y) = C:

p(x)
p(x|yobs ) = p(yobs |x) . (9.5)
C

(1) Set y = yobs in the expression for the a posteriori pdf p(x|y)
9-1 ESTIMATION METHODS 295

(2) Choose the value of x that maximizes the a posteriori pdf apparent from the integration that both x′ and y′ are treated
p(x|yobs ) and denote it x̂MAP . as random variables, so we need to keep that in mind in what
follows.
(3) If side information about x is available, incorporate that Since expectation is a linear operator,
information in the estimation process.
As noted with the MLE method in the preceding subsection, E[(x − x̂(y))2 ] = E[(x − E[x|y] + ε (y))2]
it is often easier to maximize the natural logarithm of the = E[(x − E[x|y])2] + E[ε 2 (y)]
a posteriori pdf: + 2E[xε (y)] − 2E[E[x|y]ε (y)]. (9.11)
ln(p(x|yobs )) = ln p(yobs |x) + ln p(x) − lnC, (9.6) We now show that the final two terms cancel each other. Using
which is the sum of the logarithms of the likelihood function Eq. (8.36a), namely
p(yobs |x) and the a priori pdf p(x). Subtracting lnC does not p(x′ , y′ ) = p(x′ |y′ ) p(y′ ), (9.12)
affect the process of maximizing the value of x.
the third term in Eq. (9.11) becomes
9-1.3 Least-Squares Estimator (LSE) ZZ
2E[xε (y)] = 2 x′ ε (y′ ) p(x′ , y′ ) dx′ dy′
Whereas the MAP estimator sought to estimate x by maximizing
ZZ
the a posteriori pdf p(x|yobs ) for a given observed value yobs ,
=2 x′ ε (y′ ) p(x′ |y′ ) p(y′ )dx′ dy′
the least-squares estimator (LSE) estimates the most probable
Z Z
value of x by minimizing the mean square error (MSE) between ′ ′
the estimated value x̂ and the random variable x. The MSE is =2 p(y ) ε (y ) x′ p(x′ |y′ ) dx′ dy′
defined as Z
ZZ =2 p(y′ ) ε (y′ ) E[x | y = y′ ] dy′
MSE = E[(x − x̂(y))2 ] = (x′ − x̂(y′ ))2 p(x′ , y′ ) dx′ dy′ . (9.7) = 2E[ε (y) E[x|y]]. (9.13)
Of course, x̂ is a function of y, and as we will show shortly, This result shows that the third and fourth terms in Eq. (9.11)
the value of x estimated by the LSE method, for a given value are identical in magnitude but opposite in sign, thereby cancel-
y = yobs , is given by ing one another out. Hence, the MSE is
Z
x̂LS = E[x | y = yobs ] = x′ p(x′ |yobs ) dx′ , (9.8) MSE = E[(x − x̂(y))2 ] = E[(x − E[x|y])2 ] + E[ε 2(y)]. (9.14)

The MSE is minimum when the term E[ε 2 (y)] = 0, which,


where p(x|yobs ) is the a posteriori pdf given by Eq. (9.4) with in turn, requires that ε (y) = 0. This result proves that the
y = yobs . conditional mean given by Eq. (9.8) is indeed the solution that
We now prove that x̂LS is indeed the value of x that minimizes minimizes the MSE.
the MSE by introducing the perturbation ε (y) as the deviation Readers familiar with mechanics will recognize the result
of x̂(y) from E[x|y]: given by Eq. (9.14) as the parallel-axis theorem: The moment
of inertia of a planar object is minimized when the axis of
x̂(y) = E[x|y] − ε (y). (9.9)
rotation goes through the center of mass of the object. Here,
For y = yobs , E[x|yobs ] is the LS estimator x̂LS given by Eq. (9.8), MSE represents the moment of inertia and x̂(y) represents the
so our task is to demonstrate that the MSE given by Eq. (9.7) is center of mass.
at its minimum when ε (y) = 0. To that end, let us replace x̂(y) Now that we have shown that x̂LS is given by Eq. (9.8), let us
in the MSE expression given by Eq. (9.7) with the perturbed examine how to compute it. Using Bayes’s rule from Eq. (9.4),
definition given by Eq. (9.9):

MSE = E[(x − x̂(y))2 ] = E[(x − E[x|y] + ε (y))2]. (9.10)

From the definition on the right-hand side of Eq. (9.7), it is


296 CHAPTER 9 STOCHASTIC DENOISING AND DECONVOLUTION

Using Bayes's rule from Eq. (9.4), Eq. (9.8) becomes

x̂LS = ∫ x′ p(yobs|x′) [p(x′)/p(y = yobs)] dx′
    = [1/p(y = yobs)] ∫ x′ p(yobs|x′) p(x′) dx′.    (9.15)

The quantity p(y = yobs) can be determined from

p(y = yobs) = ∫ p(yobs|x′) p(x′) dx′,    (9.16)

leading to the final expression

x̂LS = ∫ x′ p(yobs|x′) p(x′) dx′ / ∫ p(yobs|x′) p(x′) dx′.    (9.17)

As noted at the outset of Section 9-1, the pdfs p(x) and p(y|x) are made available by the model describing the random variable x and its relationship to random variable y. Hence, computing x̂LS using Eq. (9.17) should be a straightforward task.

Generalizing to the case where x and y are random vectors and jointly Gaussian, the LS vector-equivalent of Eq. (9.8) is

x̂LS = E[x | y = yobs] = x̄ + Kx,y Ky⁻¹ (yobs − ȳ),    (9.18)
(jointly Gaussian vectors)

where we used Eq. (8.111a). Here, x̄ and ȳ are the mean values of vectors x and y, and Ky and Kx,y are the covariance and cross-covariance matrices defined in Section 8-7.1. We note that x̂LS is linearly related to the observation vector yobs, which will prove useful later.

We should note that the MAP and LS methods are sometimes called Bayesian estimators because they take advantage of a priori information about x in the form of p(x), whereas the MLE method does not use p(x). The possible drawback of the Bayesian methods is that if p(x) is incorrect, their estimates may prove to be inferior to that of the MLE estimate, but if p(x) is correct and applicable, the Bayesian methods are likely to produce more accurate estimates of x than MLE.

Exercise 9-1: How does Eq. (9.18) simplify when random vectors x and y are replaced with random variables x and y?

Answer:

x̂LS = x̄ + (λx,y/σy²)(yobs − ȳ).
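When the pdfs do not lead to a closed form, the two integrals in Eq. (9.17) can be evaluated numerically. The following minimal MATLAB sketch (an illustration, not part of the text; the Gaussian prior, the observation model y = x + v, and the grid are all assumptions chosen for the example) discretizes both integrals with trapz:

   % Numerical evaluation of Eq. (9.17) on a grid of x' values.
   % Assumed model: x ~ N(0,1) a priori, y = x + v with v ~ N(0,0.5).
   xg   = linspace(-6, 6, 2001);                         % grid of x' values
   px   = exp(-xg.^2/2) / sqrt(2*pi);                    % a priori pdf p(x')
   yobs = 1.3;                                           % observed value of y
   sv2  = 0.5;                                           % noise variance
   pyx  = exp(-(yobs - xg).^2/(2*sv2)) / sqrt(2*pi*sv2); % p(yobs | x')
   xLS  = trapz(xg, xg .* pyx .* px) / trapz(xg, pyx .* px)   % Eq. (9.17)

For this Gaussian-Gaussian pair, the closed form is x̂LS = yobs σx²/(σx² + σv²) = 1.3/1.5 ≈ 0.867, which the numerical ratio reproduces to grid accuracy.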

9-1.4 Discrete-Value Estimators

The formulations associated with the three estimation methods covered in the preceding subsections are specific to continuous-value quantities, wherein the symbol x denotes the unknown quantity we wish to estimate and y denotes the observation yobs. Now we consider discrete-value variables m and n, with:

(a) m: the unknown quantity to be estimated, which may have a constant value or it may be a random variable with a known pmf p[m].

(b) n: the observed value of a discrete random variable n.

Also available from the model relating m to n is:

(c) p[n|m]: the conditional pmf.

Occasionally, we may encounter scenarios in which the observation n is a discrete random variable, but the unknown is a continuous-value quantity. In such cases, we continue to use n as the discrete-value observation, but we use x and p(x) instead of m and p[m] for the continuous-value random variable. Transforming from (x, y) notation to [m, n] notation and from integrals to sums leads to the formulations that follow.

A. MLE

m̂MLE = the MLE-estimated value of m, determined by maximizing the likelihood function (or its logarithm):

p[nobs|m]  (both n and m are discrete).    (9.19a)

To maintain notational consistency, if the unknown quantity is continuous, m should be replaced with x:

p[nobs|x]  (x continuous and n discrete).    (9.19b)

B. MAP

m̂MAP = the MAP-estimated value of m, determined by maximizing the a posteriori pdf

p[m|nobs] = p[nobs|m] p[m] / p[nobs]    (9.20a)
(both n and m are discrete)

As noted in connection with Eq. (9.5), p[nobs] exercises no effect on the maximization process, so it can be set to any arbitrary constant value C. Also, if the unknown quantity is continuous, m and p[m] should be replaced with x and p(x):

p(x|nobs) = p[nobs|x] p(x) / p[nobs]  (x continuous and n discrete).    (9.20b)

C. LSE

m̂LS = Σ_{m′=−∞}^{∞} m′ p[nobs|m′] p[m′] / Σ_{m′=−∞}^{∞} p[nobs|m′] p[m′].    (9.21a)
(both m and n are discrete)

This expression, which is the discrete counterpart of the expression given by Eq. (9.17), is applicable only if both m and n are discrete random variables. If the unknown quantity is a continuous random variable and n is a discrete random variable, then m and p[m] should be replaced with x and p(x), respectively, and the summations in Eq. (9.21a) should be converted to integrals, namely

x̂LS = ∫ x′ p[nobs|x′] p(x′) dx′ / ∫ p[nobs|x′] p(x′) dx′.    (9.21b)
(x continuous and n discrete)

Example 9-1: MLE

We are given N independent observations of a random variable y with uniform pdf:

p(y|x) = 1/x for 0 ≤ y ≤ x, and p(y|x) = 0 otherwise.

(The pdf p(y|x) is a rectangle of height 1/x extending from y = 0 to y = x.)

Compute x̂MLE for (a) N = 1 and (b) arbitrary N.

Solution: (a) For N = 1, we are given a single observation, y1, of y. Since p(y|x) = 0 for y > x, we require x̂MLE ≥ y1. The height of the pdf for 0 ≤ y ≤ x is 1/x. This height is maximized by choosing x̂MLE to be as small as possible. The smallest possible value of x̂MLE for which p(y1|x) ≠ 0 is x̂MLE = y1.

(b) Repeating this argument for each of N independent observations {y1, ..., yN}, we require the smallest value of x̂MLE such that x̂MLE ≥ yn for all n. This is x̂MLE = max{y1, ..., yN}. The likelihood function is nonzero in the N-dimensional hypercube 0 ≤ yn ≤ x:

p(y1, ..., yN|x) = 1/x^N for 0 ≤ yn ≤ x, and 0 otherwise,

which is maximized by minimizing x^N subject to 0 ≤ yn ≤ x. Hence x̂MLE = max{y1, ..., yN}.

Concept Question 9-1: What is the essential difference between MLE and MAP estimation?

Exercise 9-2: Determine x̂LS and x̂MAP, given that only the a priori pdf p(x) is known.

Answer: x̂LS = x̄, and x̂MAP = the value of x at which p(x) is largest.

Exercise 9-3: We observe random variable y whose pdf is p(y|λ) = λ e^(−λy) for y > 0. Compute the maximum likelihood estimate of λ.

Answer: The log-likelihood function is

ln p(yobs|λ) = ln λ − λ yobs.

Setting its derivative to zero gives 1/λ − yobs = 0. Solving gives

λ̂MLE(yobs) = 1/yobs.

Exercise 9-4: We observe random variable y whose pdf is p(y|λ) = λ e^(−λy) for y > 0 and λ has the a priori pdf p(λ) = 3λ² for 0 ≤ λ ≤ 1. Compute the maximum a posteriori estimate of λ.

Answer: The a posteriori pdf is

p(λ|yobs) = p(yobs|λ) p(λ) / p(yobs).

Its logarithm is ln λ − λ yobs + 2 ln λ + ln 3 − ln(p(yobs)). Setting its derivative to zero gives 3/λ − yobs = 0. Solving gives

λ̂MAP(yobs) = 3/yobs

if 3/yobs < 1, since λ ≤ 1. Note that λ̂MAP(yobs) = 3 λ̂MLE(yobs); the a priori pdf indicates that λ is likely to be large, which pushes the MAP estimate upward.

9-2 Coin-Flip Experiment

In Section 8-1.4, we described a coin-flip experiment in which a coin is flipped N times. The result of the kth flip, with 1 ≤ k ≤ N, is either a head event, designated by Hk, or a tail event, designated by Tk. The result of any flip has no effect on the result of any other flip, but the coin can be biased towards heads or tails. For any given coin, the probability of heads for any flip is an unknown number a:

P[Hk] = a,
P[Tk] = 1 − a,

and 0 ≤ a ≤ 1. For a fair coin, a = 0.5, but we wish to consider the more general case where the coin can be such that a can assume any desired value between 0 (all tails) and 1 (all heads).

We now use the coin-flip experiment as a vehicle to test the three estimation methods of the preceding section. In the context of the coin-flip experiment, our unknown and observation quantities are:

(1) x = unknown probability of heads, which we wish to estimate on the basis of a finite number of observations, N. If we were to flip the coin an infinite number of times, the estimated value of x should be a, the true probability of heads, but in our experiment N is finite. For the sake of the present exercise, we set N = 10.

(2) nobs = number of heads observed in N = 10 flips.

As noted earlier, the MLE method does not use an a priori pdf for x, but the MAP and LS methods do. Consequently, the three estimation methods result in three different estimates x̂(nobs).

9-2.1 MLE Coin Estimate

For the coin-flip experiment, the MLE likelihood function given by Eq. (9.19b) is the conditional binomial pmf given by Eq. (8.31), with the notation changed from p(n) to p(nobs|x) and a replaced with x. The pmf given by Eq. (8.31) presumes that a is a known quantity, whereas in the present case a is the unknown value of the quantity x. The notation conversion gives

p[nobs|x] = x^nobs (1 − x)^(N−nobs) N! / (nobs! (N − nobs)!).    (9.22)

Here, n is a discrete-value random variable, but x is continuous. The log-likelihood function is

ln(p[nobs|x]) = nobs ln x + (N − nobs) ln(1 − x) + ln N! − ln nobs! − ln(N − nobs)!.    (9.23)

Only the first two terms are functions of x. The log-likelihood function is maximized by setting its partial derivative with respect to x to zero:

0 = nobs ∂(ln x)/∂x + (N − nobs) ∂(ln(1 − x))/∂x = nobs/x − (N − nobs)/(1 − x),    (9.24)

where we used the relations

∂(ln x)/∂x = 1/x

and

∂(ln(1 − x))/∂x = −1/(1 − x).

Solving Eq. (9.24) for x gives

x̂MLE = x = nobs/N,    (9.25)

and for the coin-flip experiment with N = 10, x̂MLE = nobs/10. This result not only satisfies the condition 0 ≤ x ≤ 1, but it is also intuitively obvious. It says that if we flip the coin 10 times and 6 of those flips turn out to be heads, then the maximum likelihood estimate of the probability that any individual flip is a head is 0.6. As we will see shortly, the MAP and LS estimators provide slightly different estimates.

9-2.2 MAP Coin Estimate

Whereas the MLE method does not use a priori probabilistic information about the unknown quantity x, the MAP method uses the pmf p[m] if m is a discrete-value random variable or the pdf p(x) if x is a continuous-value random variable. For the coin-flip experiment, let us assume that x is continuous, with 0 ≤ x ≤ 1, and that we know the coin is biased such that values of x closer to 1 are more likely. Specifically, we are given the a priori pdf

p(x) = 2x for 0 ≤ x ≤ 1, and p(x) = 0 otherwise,    (9.26)

which states that the higher the value of x is, the greater is its pdf. This is in contrast to the uniform pdf p(x) = 1 for 0 ≤ x ≤ 1, which allocates the same probability density to all values of x within the specified range. We note that the expression given by Eq. (9.26) satisfies the condition that

∫₀¹ p(x) dx = 1.

The MAP estimator obtains x̂MAP by maximizing the a posteriori pdf given by Eq. (9.20b):

p(x|nobs) = (1/C) p[nobs|x] p(x)    (9.27)
          = (2x/C) x^nobs (1 − x)^(N−nobs) N! / (nobs! (N − nobs)!),    (9.28)

where we used Eqs. (9.22) and (9.26), and p[nobs] has been set to a constant value C because it has no effect on the maximization process. The natural logarithm of the pdf is

ln(p(x|nobs)) = ln(2N!/(C nobs! (N − nobs)!)) + ln(x^(nobs+1)) + (N − nobs) ln(1 − x)
             = ln(2N!/(C nobs! (N − nobs)!)) + (nobs + 1) ln x + (N − nobs) ln(1 − x).    (9.29)

Only the last two terms are functions of x, so when we take the partial derivative with respect to x and set it equal to zero, we get

0 = (nobs + 1)/x − (N − nobs)/(1 − x),

whose solution for x yields the MAP estimate

x̂MAP = x = (nobs + 1)/(N + 1).    (9.30)

For the specific case where N = 10, x̂MAP = (nobs + 1)/11, which is different from the MLE estimate x̂MLE = nobs/10. Figure 9-2 displays a plot of x̂MLE, x̂MAP, and x̂LS as a function of nobs, all for N = 10. We observe that even when none of the 10 coin flips is a head (i.e., nobs = 0), the MAP estimator predicts x̂MAP = 1/11 ≈ 0.09, but as nobs approaches N = 10, both estimators approach the same limit of 1.

Figure 9-2 Variation of x̂MLE, x̂MAP, and x̂LS with nobs, the number of coin flips that were observed to be heads, for N = 10 coin flips and p(x′) = 2x′ for 0 ≤ x′ ≤ 1.

9-2.3 LS Coin Estimate

Since x is a continuous random variable and n a discrete random variable, the applicable expression for x̂LS is the one given by Eq. (9.21b):

x̂LS = ∫ x′ p[nobs|x′] p(x′) dx′ / ∫ p[nobs|x′] p(x′) dx′.    (9.31)

After inserting the expression for p[nobs|x] given by Eq. (9.22) and the expression for p(x) given by Eq. (9.26) into Eq. (9.31), and then canceling terms that are common to the numerator and denominator but not functions of x, we have

x̂LS = ∫₀¹ (x′)^(nobs+1) (1 − x′)^(N−nobs) dx′ / ∫₀¹ (x′)^nobs (1 − x′)^(N−nobs) dx′.    (9.32)

The integration is somewhat cumbersome, but it leads to the result

x̂LS = (nobs + 1)/(N + 2).    (9.33)

For N = 10, x̂LS = (nobs + 1)/12. The variation of x̂LS with nobs, which is displayed in Fig. 9-2, starts at a value slightly smaller than x̂MAP at nobs = 0 and concludes at x̂LS = 0.92 at nobs = 10.

The same coin-flip experiment has produced three different estimators, depending on the estimation criterion and on whether or not the a priori pdf is used.

Concept Question 9-2: How could we get three different estimates of P[heads] from the same number of heads?

Exercise 9-5: Compute the MAP estimator for the coin-flip experiment assuming the a priori uniform pdf p(x) = 1, for 0 ≤ x ≤ 1.

Answer: x̂MAP = x̂MLE = nobs/N.

Exercise 9-6: Compute the MAP estimator for the coin-flip experiment with N = 10 and the a priori pdf p(x) = 3x², for 0 ≤ x ≤ 1.

Answer: x̂MAP = (nobs + 2)/12.

Exercise 9-7: Compute the MAP estimator for the coin-flip experiment with a triangular a priori pdf p(x) = 2(1 − x) for 0 ≤ x ≤ 1.

Answer: The a posteriori pdf is

p(x|nobs) = p[nobs|x] p(x) / p[nobs].

Its logarithm, excluding constants, is

ln p(x|nobs) = nobs ln x + (N − nobs + 1) ln(1 − x).

Setting its derivative to zero gives

nobs/x − (N − nobs + 1)/(1 − x) = 0.

Solving gives

x̂MAP(nobs) = nobs/(N + 1).

Note that x̂MAP(nobs) < x̂MLE(nobs), since the a priori pdf indicates that x is likely to be small.
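The three coin-flip estimates are simple enough to tabulate directly. The short MATLAB sketch below (an illustration, not part of the text) evaluates Eqs. (9.25), (9.30), and (9.33) for N = 10 and plots them against nobs, reproducing the trends shown in Fig. 9-2:

   N    = 10;  nobs = 0:N;
   xMLE = nobs / N;               % Eq. (9.25)
   xMAP = (nobs + 1) / (N + 1);   % Eq. (9.30), for the prior p(x) = 2x
   xLS  = (nobs + 1) / (N + 2);   % Eq. (9.33)
   plot(nobs, xMLE, 'o-', nobs, xMAP, 's-', nobs, xLS, '^-');
   xlabel('n_{obs}'); ylabel('estimate of x');
   legend('x_{MLE}', 'x_{MAP}', 'x_{LS}', 'Location', 'northwest');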

9-3 1-D Estimation Examples

In Section 9-2, we introduced three different approaches to estimation: MLE, MAP, and LS. We used the coin-flip problem as an example for demonstrating all three approaches, computing different estimates of P[heads] based on observations of the number of heads in ten flips of the coin. Now we apply the three estimation methods to the following discrete-time 1-D estimation scenarios: (1) estimating the mean value of an independent and identically distributed (IID) random process (as described in Section 8-9.1 A), in the context of polling, (2) denoising a random process containing additive white Gaussian noise, given its power spectral density, (3) deconvolving a random process containing additive white Gaussian noise, and (4) deconvolving a sparse random process containing additive white Gaussian noise. We present these 1-D scenarios as a prelude to similar presentations for 2-D images in Section 9-6.

Denoising and deconvolution were presented in Chapter 6, so they should be familiar. The problem of estimating the expectation of an IID Gaussian random process leads to an estimator called the sample mean, which we present in the context of combining poll results.

9-3.1 Polling Scenario

Suppose an election is to take place involving two candidates, A and B. The fraction of voters who will vote for candidate A is the unknown constant µ, which we wish to estimate. The goal of a pollster is to estimate µ from a subset of M polled voters, each of whom tells the pollster for which candidate he or she will vote. If the subset of voters is representative of the entire voting population, and if the choice made by each voter is independent of the choices made by other voters, the number of voters in the subset of M voters who say they will vote for candidate A is a discrete random variable m and its pmf is the binomial pmf defined in Eq. (8.31), namely

p[m] = µ^m (1 − µ)^(M−m) M! / (m! (M − m)!),  m = 0, ..., M.    (9.34)

For large values of M (where M is the number of polled voters), p[m] can be approximated as a Gaussian pdf with a mean Mµ and a variance σm² = Mµ(1 − µ). In shorthand notation,

m ∼ N(Mµ, σm²),    (9.35)

where N is shorthand for the normal (Gaussian) distribution.

Here, random variable m represents the number of voters who indicated they plan to vote for candidate A, out of a total of M polled voters. Instead of m, the pollster can formulate the estimation problem in terms of the normalized random variable z, where

z = m/M.    (9.36)

Furthermore, z can be expressed as

z = µ + ν,    (9.37)

with ν as a zero-mean Gaussian random variable:

ν ∼ N(0, σz²),    (9.38)

and σz² is related to σm² by

σz² = σm²/M² = Mµ(1 − µ)/M² = µ(1 − µ)/M.    (9.39)

If the polling indicates that the vote is likely to be close between the two candidates, then the pollster might set µ ≈ 1/2, in which case σz² ≈ 1/(4M). Additionally, the value of the unknown constant µ is estimated as

µ̂ = zobs = m/M,    (9.40)

where zobs is the observed value of z, namely the number of voters who indicated they plan to vote for candidate A divided by M, the total number of polled voters.

For a single poll, the estimation process is straightforward, but what if we have N different polls, each comprised of M voters, resulting in N estimates of µ? Instead of only one random variable z with only one estimate µ̂, we now have N random variables, z[1] through z[N], and N estimates of µ, namely

µ̂[i] = zobs[i] = m[i]/M,  1 ≤ i ≤ N.    (9.41)

How do we combine the information provided by the N polls to generate a single estimate of µ? As we see next, the answer to the question depends on which estimation method we use.
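The variance model of Eq. (9.39) is easy to verify by simulation. In the MATLAB sketch below (an illustration; the values of µ, M, and the number of simulated polls are assumptions), each poll draws M Bernoulli(µ) voters:

   mu = 0.53;  M = 1000;  Np = 10000;       % true fraction, poll size, trials
   m  = sum(rand(M, Np) < mu, 1);           % heads count m for each poll
   z  = m / M;                              % normalized polls, Eq. (9.36)
   var_z      = var(z)                      % empirical variance of z
   var_theory = mu*(1 - mu)/M               % Eq. (9.39): about 2.49e-4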

9-3.2 MLE Estimate of Sample Mean

The N polling processes can be represented by N random variables z[i] given by

z[i] = µ + ν[i],  1 ≤ i ≤ N,    (9.42)

where µ is the true mean we wish to estimate and ν[i] is the zero-mean Gaussian random variable associated with polling process z[i]. Our first step involves combining the N random variables into an N-dimensional random vector z:

z = [z[1], z[2], ..., z[N]]ᵀ.    (9.43)

Since the various polls are independent processes, we assume that the {z[i]} are independent random variables. Furthermore, we assume that M, the number of polled voters, is sufficiently large to justify describing the random variables {z[i]} by Gaussian pdfs with the same unknown mean µ and variance σz². The joint conditional pdf p(z|µ) of vector z is the product of the marginal pdfs of all N members of z:

p(z|µ) = Π_{i=1}^{N} p(z[i]) = Π_{i=1}^{N} (1/√(2πσz²)) e^(−(z[i]−µ)²/(2σz²))
       = (1/(2πσz²)^(N/2)) Π_{i=1}^{N} e^(−(z[i]−µ)²/(2σz²)).    (9.44)

Setting z[i] = zobs[i], the observed result of the ith poll, and µ = µ̂, the estimate of µ to be computed, in Eq. (9.44) gives the likelihood function. Taking its natural logarithm gives the log-likelihood function:

ln(p(zobs|µ̂)) = −(N/2) ln(2πσz²) − (1/(2σz²)) Σ_{i=1}^{N} (zobs[i] − µ̂)².    (9.45)

The log-likelihood function is maximized by setting its partial derivative with respect to µ̂ to zero:

0 = ∂ ln(p(zobs|µ̂))/∂µ̂ = (1/σz²) Σ_{i=1}^{N} (zobs[i] − µ̂).    (9.46)

Solving for µ̂ gives the MLE estimate of µ:

µ̂MLE = (1/N) Σ_{i=1}^{N} zobs[i] = (1/N) Σ_{i=1}^{N} m[i]/M = (1/(NM)) Σ_{i=1}^{N} m[i],    (9.47)

which is equal to the total number of voters (among all N polls) who selected candidate A, divided by the total number of polled voters, NM. The result given by Eq. (9.47) is called the sample mean of {zobs[i]}. As with the MLE estimator in Eq. (9.25) for the coin-flip problem, this is the "obvious" estimator of the mean µ.

Note that Eq. (9.42) can be interpreted as a set of N observations of a white Gaussian random process with unknown mean µ. So the sample mean can be used to estimate the unknown mean of a white Gaussian random process.

9-3.3 MAP Estimate of Sample Mean

We again have N polls, represented by {z[i]}, but we also have some additional information generated in earlier polls of candidate A versus candidate B. The data generated in earlier polls show that the fraction µ of voters who preferred candidate A varied from one poll to another, and that collectively µ behaves like a Gaussian random variable with a mean µp and a variance σp²:

µ ∼ N(µp, σp²).    (9.48)

Since the information about the statistics of µ is available ahead of the N polls, we refer to the pdf of µ as an a priori pdf:

p(µ) = (1/√(2πσp²)) e^(−(µ−µp)²/(2σp²)).    (9.49)

Our plan in the present subsection is to apply the MAP method outlined in Section 9-1.2, wherein the goal is to maximize the a posteriori pdf p(unknown mean µ | observation vector zobs) = p(µ|zobs). Using the form of Eq. (9.5), p(µ|zobs) can be expressed as

p(µ|zobs) = p(zobs|µ) p(µ)/p(zobs) = p(zobs|µ) p(µ)/C,    (9.50)

where we set p(zobs) = C because it is not a function of the unknown mean µ.

Inserting Eq. (9.44), with z = zobs, and Eq. (9.49) into Eq. (9.50) leads to

p(µ|zobs) = (1/(C(2πσz²)^(N/2))) × (1/√(2πσp²)) e^(−(µ−µp)²/(2σp²)) × Π_{i=1}^{N} e^(−(zobs[i]−µ)²/(2σz²)).    (9.51)

The log-likelihood function is

ln(p(µ|zobs)) = −ln C − (N/2) ln(2πσz²) − (1/2) ln(2πσp²)
              − (1/(2σp²))(µ − µp)² − (1/(2σz²)) Σ_{i=1}^{N} (zobs[i] − µ)².    (9.52)

The a posteriori estimate µ̂MAP is the value of µ that maximizes the log-likelihood function, which is obtained by setting its partial derivative with respect to µ to zero:

0 = ∂/∂µ [−(1/(2σp²))(µ − µp)²] + ∂/∂µ [−(1/(2σz²)) Σ_{i=1}^{N} (zobs[i] − µ)²].

The solution leads to

µ̂MAP = µ = [(σz²/σp²) µp + Σ_{i=1}^{N} zobs[i]] / (N + σz²/σp²).    (9.53)

The MAP estimate is the sample mean of the following: {z[i]}, augmented with (σz²/σp²) copies of µp, for a total of (N + σz²/σp²) "observations." For the polling problem, if M is the number of voters polled in each of the N polls and M′ is the number of voters polled in each of the earlier polls, then (σz²/σp²) = M′/M, so if the previous polls polled more voters, their estimate µp of µ is weighed more heavily.

Example 9-2: Polling Example

Five different polls of 1000 voters each were taken. The fractions of voters supporting candidate A were:

zobs = {0.51, 0.55, 0.53, 0.54, 0.52}.

Compute (a) the MLE estimate µ̂MLE and (b) the MAP estimate µ̂MAP, given that 2000 voters had been polled in an earlier poll in which 60% of voters selected candidate A.

Solution: (a) The MLE estimate is simply the sample mean:

µ̂MLE = (1/5) Σ_{i=1}^{5} zobs[i] = (1/5)(0.51 + 0.55 + 0.53 + 0.54 + 0.52) = 0.53.

(b) The MAP estimate is given by Eq. (9.53):

µ̂MAP = [(σz²/σp²) µp + Σ_{i=1}^{5} zobs[i]] / (N + σz²/σp²).

From the available information, N = 5, µp = 0.6 (earlier poll), M = 1000 voters, and M′ = 2000 voters (earlier a priori information). Also, σz²/σp² = M′/M = 2. Using these values leads to

µ̂MAP = (2 × 0.6 + Σ_{i=1}^{5} zobs[i]) / (5 + 2) = 0.55.

The a priori information biased the estimate by raising it from µ̂MLE = 0.53 to µ̂MAP = 0.55.

Concept Question 9-3: Another term for the sample mean of a bunch of numbers is their what?

Exercise 9-8: Compute the mean and variance of the sample mean of N iid random variables {xi}, each of which has mean x̄ and variance σx².

Answer: From Chapter 8, the variance of the sum of uncorrelated random variables is the sum of their variances, independent random variables are uncorrelated, and σ_ax² = a²σx². Let µ = (1/N) Σ_{i=1}^{N} xi be the sample mean of the {xi}. Then

σµ² = (1/N²) N σx² = σx²/N.

Also, E[µ] = (1/N) Σ_{i=1}^{N} E[xi] = x̄, so the sample mean is an unbiased estimator of x̄. So the sample mean converges to the actual mean x̄ of the {xi} as N → ∞. This is why the sample mean is useful, and why polling works.
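As a quick numerical check of Example 9-2 (a sketch, not from the text), Eqs. (9.47) and (9.53) can be evaluated in a few MATLAB lines:

   zobs  = [0.51 0.55 0.53 0.54 0.52];          % the five poll results
   ratio = 2000/1000;                           % sigma_z^2/sigma_p^2 = M'/M
   muMLE = mean(zobs)                           % Eq. (9.47): 0.53
   muMAP = (ratio*0.6 + sum(zobs))/(numel(zobs) + ratio)   % Eq. (9.53): 0.55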

9-4 Least-Squares Estimation

9-4.1 Gaussian Random Vectors

Generalizing Eq. (9.18) for two jointly Gaussian random vectors x and y (instead of x and yobs), the least-squares estimate x̂LS of vector x, given vector y, is given by the conditional mean

x̂LS = E[x|y] = x̄ + Kx,y Ky⁻¹ (y − ȳ),    (9.54)

where x̄ and ȳ are the means of vectors x and y, respectively, Kx,y is the cross-covariance matrix defined by Eq. (8.75), and Ky is the covariance matrix given by the form of Eq. (8.74). If x and y are zero-mean, Eq. (9.54) simplifies to

x̂LS = Kx,y Ky⁻¹ y = Rx,y Ry⁻¹ y,    (9.55)
(zero-mean jointly Gaussian)

where—because x and y are zero-mean—Kx,y was replaced with the cross-correlation matrix Rx,y and Ky was replaced with the autocorrelation matrix Ry.

Vector x represents the unknown quantity, vector x̂LS represents the LS estimate of x, and the difference between them, e = (x − x̂LS), represents the estimation error. Also, vector y represents the observation. The expectation of the product of the estimation error and the observation is

E[(x − x̂LS) yᵀ] = E[x yᵀ] − E[x̂LS yᵀ]
               = Rx,y − Rx,y Ry⁻¹ E[y yᵀ]
               = Rx,y − Rx,y Ry⁻¹ Ry
               = Rx,y − Rx,y = 0.    (9.56)

◮ The result given by Eq. (9.56) is a statement of orthogonality: The estimation error e = (x − x̂LS) is uncorrelated with the observation y used to produce the LS estimate x̂LS. ◭

9-4.2 Linear Least-Squares Estimation

We now consider a scenario in which the estimator is constrained to be a linear function of the observation. As we shall see shortly, the derived linear least-squares estimate x̂LLSE is equal to the LS estimate for the Gaussian random vectors of the preceding subsection.

Given that x[i] and y[i] are zero-mean, jointly wide-sense-stationary (WSS, defined in Section 8-9.3) random processes, our present goal is to compute the linear least-squares estimate x̂LLSE[n] of x[n] at discrete time n from the infinite set of observations {y[i], −∞ < i < ∞}. The estimator is constrained to be a linear function of the {y[i]}, which means that x̂LLSE[n] and {y[i]} are related by a linear sum of the form

x̂LLSE[n] = Σ_{i=−∞}^{∞} h[i] y[n − i] = h[n] ∗ y[n].    (9.57)

Here, h[n] is an unknown function (filter) that is yet to be determined.

Let us consider the error e[n] between x[n] and the estimate x̂LLSE:

e[n] = x[n] − x̂LLSE[n] = x[n] − Σ_{i=−∞}^{∞} h[i] y[n − i].    (9.58)

Next, let us square the error e[n], take the derivative of its expectation with respect to h[j], and then set it equal to zero:

0 = ∂E[e[n]²]/∂h[j]
  = 2E[e[n] ∂e[n]/∂h[j]]
  = 2E[e[n] ∂(x[n] − Σ_{i=−∞}^{∞} h[i] y[n − i])/∂h[j]]
  = 2E[e[n](−y[n − j])]
  = −2E[e[n] y[n − j]].    (9.59)

◮ The result given by Eq. (9.59) is another statement of orthogonality: When the expectation of the product of two random variables—in this case, the error e[n] and the observation y[n − j]—is zero, it means that those two random variables are uncorrelated. We conclude from this observation that if x[n] and y[n] are jointly Gaussian WSS random processes, then x̂LLSE[n] = x̂LS[n]. ◭

9-4.3 1-D Stochastic Wiener Smoothing Filter

We now use the orthogonality relationship to derive 1-D stochastic versions of the deterministic Wiener deconvolution filter presented in Chapter 6 and the deterministic sparsifying denoising filter presented in Chapter 7. The stochastic versions of these filters assume that the signals are random processes instead of just functions. This allows a priori information about the signals, in the form of their power spectral densities, to be incorporated into the filters. If all of the random processes are white, the stochastic filters reduce to the deterministic filters of Chapters 6 and 7, as we show later.

Another advantage of the stochastic forms of these filters is that the trade-off parameter λ in the Tikhonov and LASSO criteria can now be interpreted as an inverse signal-to-noise ratio, as we also show later. We derive the 1-D forms of the stochastic filters so we may generalize them to 2-D later.

Our task in the present subsection is to determine the filter h[i] introduced in Eq. (9.57). To that end, we insert Eq. (9.58) into Eq. (9.59) and apply the distributive property of the expectation:

0 = E[e[n] y[n − j]]
  = E[(x[n] − Σ_{i=−∞}^{∞} h[i] y[n − i]) y[n − j]]
  = E[x[n] y[n − j]] − Σ_{i=−∞}^{∞} h[i] E[y[n − i] y[n − j]]
  = Rxy[j] − Σ_{i=−∞}^{∞} h[i] Ry[j − i]
  = Rxy[j] − h[j] ∗ Ry[j],    (9.60)

where, in the last step, we used the standard definition of convolution of two discrete-time 1-D signals, Eq. (2.71a). Taking the DTFT of Eq. (9.60) gives

0 = Sxy(Ω) − H(Ω) Sy(Ω).

The solution for H(Ω) is labeled HSDN(Ω):

HSDN(Ω) = Sxy(Ω)/Sy(Ω).    (9.61)

The subscript SDN stands for stochastic denoising Wiener filter. The LS estimate of x[n] is obtained from observation y[n] by applying the recipe:

x̂LLSE[n] = x̂LS[n] = hSDN[n] ∗ y[n],    (9.62a)

with

hSDN[n] = DTFT⁻¹{Sxy(Ω)/Sy(Ω)}.    (9.62b)

Application examples follow in the next two subsections.

9-4.4 Stochastic Wiener Denoising

Let us suppose that y[n] are noisy observations of x[n]:

y[n] = x[n] + ν[n],  −∞ < n < ∞,

where ν[n] is a zero-mean IID random noise process. Also, x[n] and ν[n] are uncorrelated and jointly WSS Gaussian random processes. Our goal is to compute the linear least-squares estimate x̂LLSE[n] at discrete time n from the infinite set of observations {y[i], −∞ < i < ∞}. In order to apply the recipe given by Eq. (9.62), we first need to obtain expressions for Sxy(Ω) and Sy(Ω).

Since x[n] and ν[n] are uncorrelated and WSS, it follows that the cross-correlation is

Rxy[i, j] = E[x[i](x[j] + ν[j])] = E[x[i] x[j]] + E[x[i] ν[j]] = Rx[i − j].    (9.63)

Similarly,

Ry[i, j] = E[(x[i] + ν[i])(x[j] + ν[j])]
        = E[x[i] x[j]] + E[ν[i] x[j]] + E[x[i] ν[j]] + E[ν[i] ν[j]].    (9.64)

Since x[i] and ν[j] are uncorrelated, the second and third terms are zero. Hence,

Ry[i, j] = Rx[i − j] + σv² δ[i − j],    (9.65)

where σv² is the variance of the noise ν[n]. Equations (9.63) and (9.65) show that Rxy[i, j] = Rxy[i − j] and Ry[i, j] = Ry[i − j].

Taking the DTFTs of Eqs. (9.63) and (9.65) gives

Sxy(Ω) = Sx(Ω)    (9.66a)

and

Sy(Ω) = Sx(Ω) + σv².    (9.66b)

The combination of the two results leads to

HSDN(Ω) = Sxy(Ω)/Sy(Ω) = Sx(Ω)/(Sx(Ω) + σv²).    (9.67)

The expression given by Eq. (9.67) represents a stochastic denoising filter: at frequency components Ω for which Sx(Ω) ≫ σv², HSDN(Ω) ≈ 1 and x̂LS[n] ≈ y[n], but at frequency components for which Sx(Ω) ≪ σv², HSDN(Ω) ≈ 0, thereby filtering out the noise at frequencies where the noise drowns out the signal.

Assuming the power spectral density Sx(Ω) and the noise variance σv² are known or can be estimated (as discussed later in Section 9-7), the denoising recipe consists of the following steps (see the sketch at the end of this subsection):

(1) Equation (9.67) is used to compute HSDN(Ω).

(2) The inverse DTFT is applied to obtain

hSDN[n] = DTFT⁻¹{HSDN(Ω)}.

(3) The LS denoised estimate of random process x[n] at time n is

x̂LS[n] = hSDN[n] ∗ yobs[n],    (9.68)

where yobs[n] is the observation. Alternatively, yobs[n] can be transformed to Yobs(Ω), then used to compute

X̂LS(Ω) = HSDN(Ω) Yobs(Ω),

after which an inverse transformation of X̂LS(Ω) yields x̂LS[n].
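The recipe above is easiest to apply with the DTFT evaluated on the N-point DFT grid Ωk = 2πk/N, as in the following MATLAB sketch (an illustration; the random-walk test signal, the noise level, and the power-law model for Sx(Ω) are assumptions):

   N   = 256;  sv2 = 0.01;  C = 1e-3;
   x   = cumsum(0.02*randn(N,1));            % test signal (a random walk)
   y   = x + sqrt(sv2)*randn(N,1);           % noisy observation, y = x + v
   Om  = 2*pi*(0:N-1).'/N;
   Om  = min(Om, 2*pi - Om);                 % fold to |Omega| in [0, pi]
   Om(1) = 2*pi/N;                           % avoid 1/0 at the k = 0 bin
   Sx  = C ./ Om.^2;                         % assumed signal PSD model
   H   = Sx ./ (Sx + sv2);                   % Eq. (9.67) on the DFT grid
   xhat = real(ifft(H .* fft(y)));           % Eq. (9.68) via the DFT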

9-4.5 Stochastic Wiener Deconvolution

Now let y[n] be noisy observations of hblur[n] ∗ x[n]:

y[n] = hblur[n] ∗ x[n] + ν[n],  −∞ < n < ∞,    (9.69)

where hblur[n] is a known impulse response. The goal is to compute the linear least-squares estimate x̂LLSE[n] at time n from the observations {y[i], −∞ < i < ∞}. The noise ν[n] is a zero-mean IID random process, uncorrelated and jointly WSS with x[n], with autocorrelation Rν[n] = σv² δ[n].

Replacing x[n] with hblur[n] ∗ x[n] in the derivation leading to Eq. (9.67) for the smoothing filter for noisy observations, and using Eqs. (8.161a and b), gives

Sy(Ω) = |Hblur(Ω)|² Sx(Ω) + σv²,    (9.70a)
Sxy(Ω) = Hblur*(Ω) Sx(Ω),    (9.70b)

which leads to the stochastic deconvolution (SDC) Wiener filter:

WSDC(Ω) = Hblur*(Ω) Sx(Ω) / (|Hblur(Ω)|² Sx(Ω) + σv²).    (9.71a)

For frequencies Ω for which |Hblur(Ω)|² Sx(Ω) ≫ σv²,

WSDC(Ω) ≈ Hblur*(Ω) Sx(Ω) / (|Hblur(Ω)|² Sx(Ω)) = 1/Hblur(Ω),    (9.71b)

which makes perfect sense for a noiseless deconvolution filter. The Wiener deconvolution filter given by Eq. (9.71a) can be rewritten as

WSDC(Ω) = Hblur*(Ω) Sx(Ω) / (|Hblur(Ω)|² Sx(Ω) + σv²)
        = [|Hblur(Ω)|² Sx(Ω) / (|Hblur(Ω)|² Sx(Ω) + σv²)] × [1/Hblur(Ω)],    (9.72)

which has the nice interpretation of a Wiener denoising filter followed by a noiseless deconvolution filter.

Assuming Hblur(Ω), Sx(Ω), and σv² are known or can be estimated through other means, a reconstructed stochastic Wiener estimate can be computed as

x̂SDC[n] = DTFT⁻¹[WSDC(Ω) Yobs(Ω)],    (9.73)

where Yobs(Ω) is the DTFT of the noisy, convolved observation yobs[n].
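A compact MATLAB sketch of this deconvolution recipe follows, again with the DTFT approximated on the N-point DFT grid, so that circular convolution via the DFT stands in for the infinite-length convolution. The moving-average blur, the power-law model for Sx(Ω), and all parameter values are assumptions made for illustration:

   N   = 256;  sv2 = 1e-4;  C = 1e-3;
   hb  = ones(5,1)/5;                          % assumed blur impulse response
   x   = cumsum(0.02*randn(N,1));              % test signal
   y   = real(ifft(fft(hb,N).*fft(x))) + sqrt(sv2)*randn(N,1);   % Eq. (9.69)
   Om  = 2*pi*(0:N-1).'/N;  Om = min(Om, 2*pi - Om);  Om(1) = 2*pi/N;
   Sx  = C ./ Om.^2;                           % assumed power-law Sx(Omega)
   Hb  = fft(hb, N);                           % Hblur on the DFT grid
   W   = conj(Hb).*Sx ./ (abs(Hb).^2 .* Sx + sv2);   % Eq. (9.71a)
   xhat = real(ifft(W .* fft(y)));             % Eq. (9.73)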

9-4.6 Stochastic MAP Sparsifying Denoising Estimator

We return to the denoising problem:

y[n] = x[n] + ν[n],  1 ≤ n ≤ N,    (9.74)

and we make the following assumptions:

(1) ν[n] is a zero-mean white Gaussian random process with Rν[n] = σv² δ[n].

(2) x[n] and ν[n] are IID and jointly WSS random processes.

(3) x[n] at each time n has a Laplacian a priori pdf given by

p(x[n]) = (1/(√2 σx)) e^(−√2 |x[n]|/σx).    (9.75)

We will now show that these conditions lead to a sparse (mostly zero-valued) estimate of x[n].

To best use the combined information, we form vectors x, y, and ν:

x = [x[1], x[2], ..., x[N]]ᵀ,    (9.76a)
y = [y[1], y[2], ..., y[N]]ᵀ,    (9.76b)
ν = [ν[1], ν[2], ..., ν[N]]ᵀ.    (9.76c)

Using Eq. (9.75), the a priori pdf of unknown vector x is

p(x) = Π_{n=1}^{N} p(x[n]) = Π_{n=1}^{N} (1/(√2 σx)) e^(−√2 |x[n]|/σx).    (9.77)

For y = yobs, where yobs is the observation vector, the conditional pdf p(yobs|x) is the same as the jointly Gaussian pdf of the noise vector ν with ν[n] = yobs[n] − x[n]:

p(yobs|x) = p(ν) = Π_{n=1}^{N} (1/√(2πσv²)) e^(−ν[n]²/(2σv²))
          = Π_{n=1}^{N} (1/√(2πσv²)) e^(−(yobs[n]−x[n])²/(2σv²)).    (9.78)

The MAP estimate x̂MAP[n] at time n is related to the a posteriori pdf p(x|yobs), which is related to p(yobs|x) and p(x) by the vector equivalent of Eq. (9.5):

p(x|yobs) = p(yobs|x) p(x)/C.    (9.79)

Inserting Eqs. (9.77) and (9.78) into Eq. (9.79) and then taking the natural log leads to the MAP log-likelihood function:

ln(p(x|yobs)) = −ln C − (N/2) ln(2πσv²) − (1/(2σv²)) Σ_{n=1}^{N} (yobs[n] − x[n])²
              − N ln(√2 σx) − (√2/σx) Σ_{n=1}^{N} |x[n]|.    (9.80)

The usual procedure for obtaining the MAP estimate x̂MAP[n] involves taking the partial derivative of the log-likelihood function with respect to x[n], equating the result to zero, and then solving for x[n]. In the present case, the procedure is not so straightforward because one of the terms includes the absolute value of x[n]. Upon ignoring the three terms in Eq. (9.80) that do not involve x[n] (because they exercise no impact on minimizing the log-likelihood function), and then multiplying the two remaining terms by (−σv²), we obtain a new cost functional

Λ = (1/2) Σ_{n=1}^{N} (yobs[n] − x[n])² + √2 (σv²/σx) Σ_{n=1}^{N} |x[n]|.    (9.81)

The functional form of Λ is identical to that of the LASSO cost functional given in Eq. (7.106), and so is the form of the solution given by Eq. (7.109):

x̂MAP[n] = yobs[n] − λ for yobs[n] > λ,
x̂MAP[n] = yobs[n] + λ for yobs[n] < −λ,
x̂MAP[n] = 0 for |yobs[n]| < λ,    (9.82)

where λ is the noise-to-signal ratio

λ = √2 σv²/σx.    (9.83)

Concept Question 9-4: How did we use the orthogonality principle of linear prediction in Section 9-4?

Concept Question 9-5: How is the Tikhonov parameter λ interpreted in Section 9-4?

Exercise 9-9: What is the MAP sparsifying estimator when the noise variance σv² is much smaller than σx?

Answer: When σv² ≪ σx, λ → 0, and x̂MAP[n] = yobs[n].

Exercise 9-10: What is the MAP sparsifying estimator when the noise variance σv² is much greater than σx?

Answer: When σv² ≫ σx, λ → ∞, and x̂MAP = 0. This makes sense: the a priori information that x is sparse dominates the noisy observation.
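Equation (9.82) is just a soft threshold, which takes one line of MATLAB (a sketch; the parameter values and test data are assumptions made for illustration):

   sv2    = 0.04;  sx = 0.5;                  % assumed sigma_v^2 and sigma_x
   lambda = sqrt(2)*sv2/sx;                   % Eq. (9.83)
   yobs   = [0.9 -0.05 0.02 -1.4 0.1];        % example noisy observations
   xMAP   = sign(yobs).*max(abs(yobs) - lambda, 0)   % Eq. (9.82): mostly zeros

Only the two observations whose magnitudes exceed λ survive; the rest are set exactly to zero, which is the sparsifying behavior claimed above.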

9-5 Deterministic versus Stochastic Wiener Filtering

Through a speedometer example, we now compare results of speed estimates based on two Wiener filtering approaches, one using a deterministic filter and another using a stochastic filter. The speedometer accepts as input y(t), a noisy measurement of the distance traveled at time t, measured by an odometer, and converts y(t) into an output speed r(t) = dy/dt. [We use the symbol r (for rate) instead of s to denote speed, to avoid notational confusion in later sections.] Sampling y(t) and r(t) at a sampling interval ∆ converts them into discrete-time signals y[n] and r[n]:

y[n] = y(t = n∆),    (9.84a)
r[n] = (y[n] − y[n−1])/∆.    (9.84b)

The observed distance y[n] contains a noise component ν[n],

y[n] = s[n] + ν[n],    (9.85)

where s[n] is the actual distance that the odometer would have measured had it been noise-free, and ν[n] is a zero-mean white Gaussian random process with variance σv². An example of a slightly noisy odometer signal y[n] with σv² = 2 is shown in Fig. 9-3(a), and the application of the differentiator given by Eq. (9.84b) with ∆ = 1 s results in the unfiltered speed ru[n] shown in Fig. 9-3(b). The expression for ru[n] is, from Eq. (9.84b),

ru[n] = (y[n] − y[n−1])/∆
      = (s[n] − s[n−1])/∆ + (ν[n] − ν[n−1])/∆
      = rtrue[n] + rnoise[n].    (9.86)

The goal of the Wiener filter is to estimate the true speed rtrue[n], so for the purpose of the present example, we show in Fig. 9-3(c) a plot of the true speed rtrue[n], against which we will shortly compare the Wiener-filter results.

Figure 9-3 Speedometer example: (a) noisy odometer signal y[n], (b) unfiltered speed ru[n] = (y[n] − y[n−1])/∆, with ∆ = 1 s, (c) true speed rtrue[n] = (s[n] − s[n−1])/∆, with ∆ = 1 s, (d) deterministically estimated speed r̂D[n], and (e) stochastically estimated speed r̂S[n].

9-5.1 Deterministic Wiener Filter

For ∆ = 1 s, the true speed rtrue[n] is related to the true distance s[n] by

rtrue[n] = s[n] − s[n−1].    (9.87)

For a segment of length N, with {n = 1, ..., N}, the N-point DFT of Eq. (9.87) gives

R[k] = S[k] − e^(−j2πk/N) S[k] = (1 − e^(−j2πk/N)) S[k],    (9.88)

where we used property #2 in Table 2-9 to compute the second term of Eq. (9.88). The frequency response function H[k] of the noise-free system described by Eq. (9.87), with the speed R[k] viewed as the input and the distance S[k] as the output, is

H[k] = S[k]/R[k] = 1/(1 − e^(−j2πk/N)).    (9.89)

The deterministic Wiener denoising/deconvolution filter was presented earlier in Section 6-4.4 for 2-D images. Converting the notation from 2-D to 1-D, as well as replacing the symbols in Eqs. (6.32a) and (6.32b) to match our current speedometer problem, we obtain the following expression for the estimated DFT of the true speed rtrue[n]:

R̂D[k] = Y[k] WDDC[k],    (9.90a)

with

WDDC[k] = H*[k]/(|H[k]|² + λ²),    (9.90b)

where Y[k] is the DFT of the noisy observation y[n], WDDC[k] is the deterministic deconvolution Wiener filter, and λ is the trade-off parameter in Tikhonov regularization. Combining Eqs. (9.89), (9.90a), and (9.90b), and multiplying the numerator and denominator by |1 − e^(j2πk/N)|², leads to the deterministic estimate

R̂D[k] = Y[k](1 − e^(−j2πk/N)) / (1 + |1 − e^(−j2πk/N)|² λ²).    (9.91)

To obtain the "best" estimate of the speed r̂D[n] using the deterministic Wiener filter, we need to go through the estimation process multiple times using different values of λ. The process entails the following steps:

(1) Compute the DFT Y[k] of the noisy observations y[n].

(2) Compute R̂D[k] using Eq. (9.91) for various values of λ.

(3) Perform an inverse N-point DFT to compute r̂D[n].

For the speedometer example, the outcome with the "seemingly" best result is the one with λ = 0.1, and its plot is shown in Fig. 9-3(d). It is an improvement over the unfiltered speed ru[n], but it still contains a noticeable noisy component.

9-5.2 Stochastic Wiener Filter

Based on the treatment given in Section 9-4.5, the stochastic Wiener approach to filtering the noisy signal y[n] uses a stochastic denoising/deconvolution Wiener filter WSDC(Ω) as follows:

R̂S(Ω) = Y(Ω) WSDC(Ω),    (9.92a)

with

WSDC(Ω) = H*(Ω) Ss(Ω) / (|H(Ω)|² Ss(Ω) + σv²),    (9.92b)

where H(Ω) is H[k] of Eq. (9.89) with Ω = 2πk/N, Ss(Ω) is the power spectral density of s[n], and σv² is the noise variance. Since the true distance s[n] is an unknown quantity, its power spectral density Ss(Ω) is unknown. In practice, Ss(Ω) is assigned a functional form based on experience with similar random processes. As noted later in Sections 9-7 and 9-8, a practical model for Ss(Ω) is

Ss(Ω) = C/Ω²,    (9.92c)

where C is a constant. Using Eqs. (9.89) and (9.92c) leads to the stochastic estimate

R̂S(Ω) = Y(Ω)(1 − e^(−jΩ)) / (1 + |1 − e^(−jΩ)|² Ω² σv²/C).    (9.93)

Implementation of the stochastic Wiener filter entails the following steps (see the sketch following Exercise 9-11):

(1) For the given observed distance signal y[n], compute its DFT Y[k].

(2) Convert Eq. (9.93) into a numerical format by replacing Ω with 2πk/N everywhere.

(3) Using C′ = σv²/C as a parameter, compute R̂S[k] for various values of C′.

(4) Perform an inverse DFT to obtain r̂S[n] for each value of C′.

(5) Select the value of C′ that appears to provide the best result.

The result for the seemingly best value of C′, which turned out to be C′ = 0.4, is displayed in Fig. 9-3(e). Comparison of the plots for r̂D[n] and r̂S[n] reveals that the stochastic approach provides a much better rendition of the true speed rtrue[n] than does the deterministic approach.

Concept Question 9-6: Why did the stochastic Wiener filter produce a much better estimate than the deterministic Wiener filter in Section 9-5?

Exercise 9-11: What happens to the stochastic Wiener filter when the observation noise strength σv² → 0?

Answer: From Eq. (9.93), letting σv² → 0 makes R̂S(Ω) = Y(Ω)(1 − e^(−jΩ)), whose inverse DTFT is r̂S[n] = y[n] − y[n−1], which is Eq. (9.84b) with ∆ = 1 s.
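The five-step recipe translates directly into MATLAB. In the sketch below (an illustration, not the authors' code; the piecewise-constant speed profile, the noise level, and C′ = 0.4 are assumptions), note that the factor (1 − e^(−jΩ)) zeros the k = 0 DFT bin, so the mean speed is restored afterward from the raw odometer record:

   N  = 256;  sv2 = 2;  Cp = 0.4;                   % Cp = C' = sigma_v^2/C
   rtrue = 10*(mod(0:N-1, 64) < 32).';              % assumed true speed
   s  = cumsum(rtrue);                              % true distance
   y  = s + sqrt(sv2)*randn(N,1);                   % noisy odometer, Eq. (9.85)
   k  = (0:N-1).';
   Om = 2*pi*k/N;  Om = min(Om, 2*pi - Om);         % |Omega_k|, folded
   E  = 1 - exp(-1j*2*pi*k/N);                      % (1 - e^{-j Omega_k})
   RS = fft(y).*E ./ (1 + abs(E).^2 .* Om.^2 * Cp); % Eq. (9.93)
   rS = real(ifft(RS)) + (y(N) - y(1))/(N - 1);     % restore the lost mean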

9-6 2-D Estimation

We now extend the various methods that were introduced in earlier sections for estimating 1-D random processes to estimating 2-D random fields. The derivations of these estimators are direct generalizations of their 1-D counterparts.

9-6.1 2-D MLE Estimate of Sample Mean

Throughout Chapter 8, it was assumed that WSS random fields were zero-mean, a condition realized by subtracting the constant mean from the original random field. To perform the subtraction step, we need to either know the value of the mean f̄ or estimate it from the observation of the random field fobs[n, m]. Extending the result of the 1-D MLE estimate of the mean given by Eq. (9.47) to a 2-D random field of size N × N gives

f̄MLE = (1/N²) Σ_{n=1}^{N} Σ_{m=1}^{N} fobs[n, m].    (9.94)

◮ Estimating the mean f̄ and subtracting it from fobs[n, m] to generate a zero-mean observation random field gobs[n, m] is a required prerequisite step to the application of Wiener filters for image smoothing and deconvolution. ◭

9-6.2 2-D Wiener Filter

As in 1-D, the least-squares (LS) estimate for jointly Gaussian random fields—unknown x[n] and observation y[n]—is equal to the linear least-squares estimate (LLSE) for any pair of linearly related random fields, whether jointly Gaussian or not.

◮ Hence, for convenience and with no loss of generality, we shall assume that all random fields are jointly WSS Gaussian, for estimation purposes. ◭

Given an unknown random field f[n, m], a zero-mean noise random field ν[n, m], and an observation random field g[n, m], with

g[n, m] = f[n, m] + ν[n, m],  1 ≤ n, m ≤ N,    (9.95)

the 2-D version of the 1-D stochastic denoising Wiener filter given by Eq. (9.61) is

HSDN(Ω1, Ω2) = Sfg(Ω1, Ω2)/Sg(Ω1, Ω2).    (9.96)

9-6.3 Stochastic 2-D Wiener Denoising Filter

Moreover, if ν[n, m] is a zero-mean white Gaussian noise random field with variance σv², it follows that

Sfg(Ω1, Ω2) = Sf(Ω1, Ω2)    (9.97a)

and

Sg(Ω1, Ω2) = Sf(Ω1, Ω2) + σv²,    (9.97b)

in which case the expression for HSDN(Ω1, Ω2) becomes

HSDN(Ω1, Ω2) = Sf(Ω1, Ω2)/(Sf(Ω1, Ω2) + σv²).    (9.98)

If the power spectral density Sf(Ω1, Ω2) and the noise variance σv² are known, a 2-D image can be denoised by applying the following recipe:

(1) Equation (9.98) is used to compute HSDN(Ω1, Ω2).

(2) The inverse DSFT is applied to obtain

hSDN[n, m] = DSFT⁻¹{HSDN(Ω1, Ω2)}.    (9.99)

(3) Then, each element of f[n, m] is denoised by using the filtering operation

f̂S[n, m] = hSDN[n, m] ∗∗ gobs[n, m],    (9.100)

where gobs[n, m] is the noisy observation of f[n, m]. As in 1-D, hSDN[n, m] usually is windowed to a finite spatial extent.

9-6.4 Stochastic 2-D Wiener Deconvolution Filter

Let us assume we have a noisy 2-D image g[n, m]. In fact, in addition to the added noise ν[n, m], the true (unknown) image f[n, m] had been blurred by the measurement process. Thus,

g[n, m] = hblur[n, m] ∗∗ f[n, m] + ν[n, m],    (9.101)

where hblur[n, m] is a blur filter with a known PSF, established through a calibration process.

Our goal is to compute the stochastic reconstruction f̂S[n, m] of f[n, m] at each location [n, m] from the noisy observations {gobs[i, j], −∞ ≤ i, j ≤ ∞}. To that end, we introduce the 2-D versions of Eqs. (9.70a and b):

Sfg(Ω1, Ω2) = Hblur*(Ω1, Ω2) Sf(Ω1, Ω2),    (9.102a)
Sg(Ω1, Ω2) = |Hblur(Ω1, Ω2)|² Sf(Ω1, Ω2) + σv²,    (9.102b)

and then we use them to obtain the stochastic deconvolution Wiener filter

WSDC(Ω1, Ω2) = Sfg(Ω1, Ω2)/Sg(Ω1, Ω2)
            = Hblur*(Ω1, Ω2) Sf(Ω1, Ω2) / (|Hblur(Ω1, Ω2)|² Sf(Ω1, Ω2) + σv²).    (9.103)

As before, we assume that we know the power spectral density Sf(Ω1, Ω2), the noise variance σv², and the blur filter hblur[n, m], in which case we follow the recipe:

(1) Obtain Hblur(Ω1, Ω2) from

Hblur(Ω1, Ω2) = DSFT{hblur[n, m]}.    (9.104)

(2) Compute WSDC(Ω1, Ω2) using Eq. (9.103).

(3) Compute wSDC[n, m] from

wSDC[n, m] = DSFT⁻¹{WSDC(Ω1, Ω2)}.    (9.105)

(4) Estimate f̂S[n, m] at each location [n, m] based on observations gobs[n, m] by applying the convolution:

f̂S[n, m] = wSDC[n, m] ∗∗ gobs[n, m].    (9.106)

Does the stochastic Wiener deconvolution filter provide better results than the deterministic Wiener deconvolution filter described in Section 6-4.4? The answer is an emphatic yes, as demonstrated by the following example.

9-6.5 Deterministic versus Stochastic Deconvolution Example

This is a 2-D "replica" of the 1-D speedometer example of Section 9-5. The image in Fig. 9-4(a) is a high-resolution, low-noise MRI image of a human head. We will treat it as the "original" image f[n, m]. To illustrate (a) the effects of blurring caused by the convolution of an original image with the PSF of the imaging system, and (b) the deblurring realized by Wiener deconvolution, we use a disk-shaped PSF with a uniform distribution given by

hblur[n, m] = 1 for m² + n² < 145, and hblur[n, m] = 0 for m² + n² ≥ 145,    (9.107)

which provides a good model of out-of-focus imaging. The circular PSF has a radius of 12 pixels. We also add noise in the form of a zero-mean white Gaussian random field ν[n, m]:

g[n, m] = hblur[n, m] ∗∗ f[n, m] + ν[n, m].    (9.108)

The outcome of the blurring and noise-addition processes is displayed in Fig. 9-4(b).

A. Deterministic Wiener Deconvolution

By extending the 1-D speedometer recipe of Section 9-5.1 to the 2-D MRI image, we obtain the deterministic Wiener estimate f̂D[n, m] as follows:

(1) Hblur[k1, k2] is computed from hblur[n, m] by taking the 2-D DFT of Eq. (9.107).

(2) The deterministic deconvolution Wiener filter

WDDC[k1, k2] = Hblur*[k1, k2]/(|Hblur[k1, k2]|² + λ²)    (9.109)

is computed for several values of the parameter λ.

(3) F̂D[k1, k2] is estimated using

F̂D[k1, k2] = G[k1, k2] WDDC[k1, k2],    (9.110)

where G[k1, k2] is the 2-D DFT of the blurred image g[n, m].

(4) Application of the inverse DFT gives f̂D[n, m]. The "best" resultant image, shown in Fig. 9-4(c), used λ = 0.1.

B. Stochastic Wiener Deconvolution

The 1-D stochastic Wiener solution to the speedometer problem is given in Section 9-5.2. The 2-D image deconvolution procedure using the stochastic Wiener approach is identical to the deterministic procedure outlined in part A, except for the form of the Wiener filter. For the stochastic case, the 2-D equivalent of the expression given by Eq. (9.92c) for the power spectral density is

Sf(Ω1, Ω2) = C/(Ω1² + Ω2²)².    (9.111)

Upon setting Ω1 = 2πk1/N and Ω2 = 2πk2/N in Eqs. (9.111) and (9.103), we obtain the stochastic Wiener deconvolution

filter:

WSDC[k1, k2] = Hblur*[k1, k2] / (|Hblur[k1, k2]|² + (2π/N)⁴ (σv²/C)(k1² + k2²)²).    (9.112)

Figure 9-4 MRI images: (a) original "true" image, (b) image blurred by the imaging system, (c) image deconvolved by the deterministic Wiener filter, and (d) image deconvolved by the stochastic Wiener filter.

Repetition of the recipe given earlier for the deterministic filter, but with WDDC[k1, k2] in Eq. (9.110) replaced by WSDC[k1, k2], led to the image shown in Fig. 9-4(d). The process was optimized by repeating it for several different values of the parameter σv²/C. The "best" result was for σv²/C = 1.

Comparison of the images in Fig. 9-4 leads to two conclusions:

(1) The image in part (c), generated by applying the deterministic Wiener filter, is far superior to the blurred image in part (b).

(2) Of the two filtering approaches, the stochastic approach (image (d)) yields a sharper image than its deterministic counterpart, motivating the stochastic approach.
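The following MATLAB sketch (an illustration, not the authors' code) implements the recipe of part A with the stochastic filter of Eq. (9.112) substituted in. The test-image file name, the noise level, the unit-sum normalization of the PSF, and σv²/C = 1 are all assumptions:

   f  = double(imread('cameraman.tif'))/255;    % any square N x N image
   N  = size(f, 1);
   [mm, nn] = meshgrid(-12:12);                 % disk PSF of Eq. (9.107)
   hb = double(mm.^2 + nn.^2 < 145);  hb = hb/sum(hb(:));  % unit-sum disk
   hpad = zeros(N);  hpad(1:25, 1:25) = hb;
   hpad = circshift(hpad, [-12 -12]);           % center the PSF at the origin
   Hb = fft2(hpad);                             % Hblur[k1, k2]
   g  = real(ifft2(Hb.*fft2(f))) + 0.01*randn(N);   % Eq. (9.108)
   kk = 0:N-1;  kk = min(kk, N - kk);           % folded frequency indices
   [K1, K2] = meshgrid(kk, kk);
   W  = conj(Hb)./(abs(Hb).^2 + (2*pi/N)^4*(K1.^2 + K2.^2).^2);  % Eq. (9.112)
   fS = real(ifft2(W.*fft2(g)));                % deconvolved image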

9-6.6 Stochastic 2-D MAP Sparsifying Estimator

The 2-D analogue of the 1-D sparsifying estimator of Section 9-4.6 can be summarized as follows:

(1) The observation model is

gobs[n, m] = f[n, m] + ν[n, m],    (9.113)

with gobs[n, m] representing the observation, f[n, m] representing the unknown random field, and ν[n, m] representing a white Gaussian random field with known variance Sν(Ω1, Ω2) = σv².

(2) f[n, m] and ν[n, m] are IID, jointly WSS random fields.

(3) Each f[n, m] has a 2-D Laplacian a priori pdf given by

p(f[n, m]) = (1/(√2 σf)) e^(−√2 |f[n,m]|/σf).    (9.114)

The 2-D LASSO cost functional equivalent to the 1-D expression given by Eq. (9.81) is

Λ = (1/2) Σ_{n=1}^{N} Σ_{m=1}^{N} (gobs[n, m] − f[n, m])² + √2 (σv²/σf) Σ_{n=1}^{N} Σ_{m=1}^{N} |f[n, m]|,    (9.115)

and the MAP estimate is

f̂MAP[n, m] = gobs[n, m] − λ for gobs[n, m] > λ,
f̂MAP[n, m] = gobs[n, m] + λ for gobs[n, m] < −λ,
f̂MAP[n, m] = 0 for |gobs[n, m]| < λ,    (9.116)

where λ is the noise-to-signal ratio

λ = √2 σv²/σf.    (9.117)

Concept Question 9-7: Why did the stochastic Wiener deconvolution filter produce a much better estimate than the deterministic Wiener deconvolution filter in Section 9-6?

Exercise 9-12: When σf → ∞, the Laplacian pdf given by Eq. (9.114) approaches the uniform distribution. What does the MAP estimate reduce to when σf → ∞?

Answer: As σf → ∞, the parameter λ → 0, in which case f̂MAP[n, m] → gobs[n, m].

9-7 Spectral Estimation

To apply the 1-D and 2-D stochastic denoising and deconvolution operations outlined in Sections 9-4 to 9-6, we need to know the power spectral densities Sx(Ω) of the 1-D unknown random process x[n] and Sf(Ω1, Ω2) of the 2-D random field f[n, m].

Had x[n] been available, we could have applied the 1-D Nth-order DFT to estimate Sx(Ω) using

Ŝx(Ω = 2πk/N) = (1/N) |Σ_{n=0}^{N−1} x[n] e^(−j2πkn/N)|²,    (9.118)

for k = 0, 1, ..., N − 1. The division by N converts energy spectral density to power spectral density. Similarly, application of the 2-D Nth-order DFT to f[n, m] leads to

Ŝf(Ω1 = 2πk1/N, Ω2 = 2πk2/N) = (1/N²) |Σ_{n=0}^{N−1} Σ_{m=0}^{N−1} f[n, m] e^(−j2π(nk1+mk2)/N)|²,    (9.119)

for k1 = 0, 1, ..., N − 1 and k2 = 0, 1, ..., N − 1.

This estimation method is known as the periodogram spectral estimator.
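In MATLAB, the periodograms of Eqs. (9.118) and (9.119) are one-liners built on fft and fft2 (a sketch; the white-noise test data are placeholders for real measurements):

   N  = 512;
   x  = randn(N, 1);                 % test 1-D random process
   Sx = abs(fft(x)).^2 / N;          % Eq. (9.118), sampled at Omega = 2*pi*k/N
   f  = randn(N);                    % test N x N random field
   Sf = abs(fft2(f)).^2 / N^2;       % Eq. (9.119)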

Other spectral estimation methods exist as well, but the central problem is that x[n] and f[n, m] are the unknown quantities we wish to estimate, so we have no direct way to determine their power spectral densities. However, we can use parametric models to describe Sx(Ω) and Sf(Ω1, Ω2), which can then be used in the stochastic Wiener estimation recipes of Sections 9-4 to 9-6 to obtain the seemingly best outcome. As we shall see in the next section, many images and signals exhibit a fractal-like behavior with corresponding 1-D and 2-D power spectral densities of the form

Sx(Ω) = C/|Ω|^a,  for Ωmin < |Ω| < π,    (9.120a)
Sf(Ω1, Ω2) = C/(Ω1² + Ω2²)^b,  for Ωmin < |Ω1|, |Ω2| < π,    (9.120b)

where a, b, C, and Ωmin are adjustable constant parameters. The expressions given by Eq. (9.120) are known as power laws, an example of which, with b = 2, was used in Section 9-6.5 to deconvolve a blurred MRI image (Fig. 9-4).

Exercise 9-13: Suppose we estimate the autocorrelation function Rx[n] of data {x[n], n = 0, ..., N − 1} using the sample mean over i of {x[i] x[i − n]}, zero-padding {x[n]} as needed. Show that the DFT of R̂x[n] is the periodogram (this is one reason why the periodogram works).

Answer:

R̂x[n] = (1/N) Σ_{i=0}^{N−1} x[i] x[i − n] = (1/N) Σ_{i=0}^{N−1} x[i] x[−(n − i)] = (1/N)(x[n] ∗ x[−n]).

From entries #4 and #6 of Table 2-7, the DTFT of (x[n] ∗ x[−n]) is X(Ω) X(−Ω) = X(Ω) X*(Ω) = |X(Ω)|². Including the factor 1/N, the DTFT of R̂x[n] is |X(Ω)|²/N, which is the periodogram defined in Eq. (9.118).

9-8 1-D Fractals

A fractal is a signal, image, or any object that exhibits self-similarity across different scales. A signal is self-similar if any segment of it resembles the overall signal statistically. The same definition applies to an image in 2-D. Fractals are quite common in nature; examples include certain classes of trees, river deltas, and coastlines (Fig. 9-5). In a perfect fractal, self-similarity exists over an infinite number of scales, but for real objects the similarity is exhibited over a finite number of scales.

Figure 9-5 Fractal patterns in nature.

An example of a fractal signal is shown in Fig. 9-6; note the statistical resemblance between (a) the pattern of the entire signal, extending over the range between 0 and 1 in part (a) of the figure, and (b) the pattern in part (b) of only a narrow segment of the original, extending between 0.4 and 0.5 of the original scale.

9-8.1 Continuous-Time Fractals

Perfect self-similarity implies that a signal x(t) is identically equal to a scaled version of itself, x(at), where a is a positive scaling constant, but statistical self-similarity implies that if x(t) is a zero-mean wide-sense stationary (WSS) random process, then its autocorrelation is self-similar. That is,

Rx(τ) = C Rx(aτ),    (9.121)

where τ is the time shift between x(t) and x(t − τ) in

Rx(τ) = E[x(t) x(t − τ)].    (9.122)

According to Eq. (9.121), within a multiplicative constant C, the variation of the autocorrelation Rx(τ) with the time shift τ is the same as the variation of the autocorrelation of the time-scaled version Rx(aτ) with the scaled time (aτ).

The self-similarity property extends to the power spectral density. According to Eq. (8.139a), the power spectral density Sx(f) of x(t) is related to its autocorrelation function Rx(τ) by

Sx(f) = ∫_{−∞}^{∞} Rx(τ) e^(−j2πfτ) dτ.    (9.123)

Inserting Eq. (9.121) into Eq. (9.123) and then replacing τ with τ′/a leads to

Sx(f) = C ∫_{−∞}^{∞} Rx(aτ) e^(−j2πfτ) dτ = (C/a) ∫_{−∞}^{∞} Rx(τ′) e^(−j2πfτ′/a) dτ′ = (C/a) Sx(f/a).    (9.124)

Using functional analysis, it can be shown that, strictly speaking, the only class of signals that are self-similar are power laws characterized by the form x(t) = Ct^a, where C and a are constants. For a 1-D fractal random process, the power spectral

density has the form

Sx(f) = C/|f|^a.    (9.125)

We should note that a ≥ 0 and Sx(f) = Sx(−f).

Fractal random processes are often characterized using colors:

White Fractal (a = 0): a random process whose power spectral density, as given by Eq. (9.125), has a frequency exponent a = 0. Hence, Sx(f) = C, which means that all frequencies are present and weighted equally, just like white light.

Figure 9-6 Fractal signal w(t): Note the self-similarity between (a) the entire signal and (b) a 10× expanded-scale version of the segment between t = 0.4 s and 0.5 s.

Pink Fractal (a = 1): a random process with Sx(f) = C/|f|. Since higher frequencies (green and blue) are more heavily attenuated, the combination appears pink in color.

Brown Fractal (a = 2): Brownian motion is characterized by Sx(f) = C/f², hence the name brown.

A true fractal signal is not realistic because it has infinite total power:

$$P = \int_{-\infty}^{\infty} \frac{C}{|f|^a}\, df \to \infty$$

for any a ≥ 0 and C ≠ 0. Hence, Sx(f) should be limited to a range of frequencies extending between fmin and fmax. Continuing with the color analogy, fmin has to be greater than zero because, if a > 1,

$$\int_{f_{min}}^{\infty} \frac{C}{f^a}\, df \to \infty \quad (a > 1)$$

unless fmin > 0. This unacceptable condition is known as the infrared catastrophe because, relative to frequencies in the visible part of the spectrum, infrared frequencies are considered to be very low. On the other end of the spectrum, the ultraviolet catastrophe represents the condition when 0 < a < 1 and

$$\int_{0}^{f_{max}} \frac{C}{f^a}\, df \to \infty \quad (0 < a < 1)$$

unless fmax < ∞. Hence, for a realistic fractal random process,

$$S_x(f) = \frac{C}{|f|^a}, \quad f_{min} < |f| < f_{max}. \tag{9.126}$$

Since Sx(f) is symmetrical with respect to f = 0, Sx(f) = Sx(−f), and the total power of the fractal random process is

$$P = 2 \int_{f_{min}}^{f_{max}} S_x(f)\, df = 2 \int_{f_{min}}^{f_{max}} \frac{C}{f^a}\, df = \frac{2C}{1-a}\, (f_{max}^{1-a} - f_{min}^{1-a}). \tag{9.127}$$

The total power also is related to the zero-shift autocorrelation function Rx(0) and to the variance σx² of the zero-mean Gaussian random process x(t) as in

$$P = R_x(0) = \sigma_x^2. \tag{9.128}$$

9-8.2 Discrete-Time Fractals

As noted earlier, the power spectral density of a realistic fractal signal x(t) has to be bandlimited to within a range between fmin and fmax. Hence, x(t) can be converted to a discrete-time fractal random process x[n] by sampling it at a rate exceeding the Nyquist sampling rate of 2fmax samples/s. The self-similarity property of x(t) is equally present in x[n], although it is more difficult to envision it in discrete time.

The functional form of the power spectral density of x[n] is analogous to that given by Eq. (9.125):

$$S_x(\Omega) = \frac{C}{|\Omega|^a}, \quad \Omega_{min} < |\Omega| < \pi, \tag{9.129}$$

for some lower frequency bound Ωmin. Since Sx(Ω) = Sx(−Ω), the average power of x[n] is

$$P_{av} = 2 \times \frac{1}{2\pi} \int_{\Omega_{min}}^{\pi} S_x(\Omega)\, d\Omega = \frac{1}{\pi} \int_{\Omega_{min}}^{\pi} \frac{C}{\Omega^a}\, d\Omega = \frac{C}{\pi(1-a)}\, (\pi^{1-a} - \Omega_{min}^{1-a}). \tag{9.130}$$
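To make Eqs. (9.126) and (9.129) concrete, the short MATLAB sketch below (our illustration, not taken from the text) synthesizes an approximate discrete-time fractal realization by shaping the spectrum of white Gaussian noise; the length N and exponent a are arbitrary choices.

% Sketch: synthesize an approximate fractal x[n] with Sx(Omega) ~ C/|Omega|^a.
N = 1024;                               % number of samples (arbitrary)
a = 2;                                  % spectral exponent (a = 2: brown fractal)
z = randn(1,N);                         % white Gaussian noise, flat PSD
Z = fft(z);
Omega = 2*pi*(0:N-1)/N;                 % DFT frequencies on [0, 2*pi)
Omega = min(Omega, 2*pi-Omega);         % fold to [0, pi]; the spectrum is even
Omegamin = 2*pi/N;                      % lower frequency bound, per Eq. (9.129)
H = 1 ./ max(Omega, Omegamin).^(a/2);   % |H|^2 = 1/|Omega|^a shapes the PSD
x = real(ifft(Z .* H));                 % approximate fractal realization
plot(x), xlabel('n')

Plotting any short segment of x on an expanded scale should resemble the whole record statistically, as in Fig. 9-6.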

9-8.3 Wiener Process

A Wiener process w(t) is a zero-mean non-WSS random process generated by integrating a white Gaussian random process z(t):

$$w(t) = \int_{-\infty}^{t} z(\tau)\, d\tau. \tag{9.131}$$

The Wiener process is a 1-D version of Brownian motion, which describes the motion of particles or molecules in a solution. Often w(t) is initialized with w(0) = 0.

Even though the Wiener process is not WSS—and therefore it does not have a definable power spectral density—we will nonetheless proceed heuristically to obtain an expression for Sw(f). To that end, we start by converting Eq. (9.131) into the differential form

$$\frac{dw}{dt} = z(t). \tag{9.132}$$

Utilizing entry #5 in Table 2-4, the Fourier transform of the system described by Eq. (9.132) is

$$(j2\pi f)\, W(f) = Z(f), \tag{9.133}$$

which leads to the system response function H(f) as

$$H(f) = \frac{W(f)}{Z(f)} = \frac{1}{j2\pi f}\,. \tag{9.134}$$

As noted earlier in the definition of the Wiener process w(t), the random process z(t) is white Gaussian, so the power spectral density of z(t) is constant: Sz(f) = σz². For an LTI system with transfer function H(f), the power spectral density Sw(f) of the Wiener random process w(t) is related to the power spectral density Sz(f) of the white random process z(t) by

$$S_w(f) = |H(f)|^2\, S_z(f), \tag{9.135}$$

which leads to

$$S_w(f) = \frac{\sigma_z^2}{4\pi^2 f^2}\,. \tag{9.136}$$

Despite the non-rigorous analysis leading to Eq. (9.136), the result confirms that the power spectral density of a Wiener process varies as 1/f².

9-8.4 Stochastic Wiener Filtering of Wiener Processes

The plot shown in Fig. 9-7(a) displays a realization of a Wiener process w(t) extending over a duration of 1 s. For simplicity, we treat w(t) as a continuous-time signal, even though in reality it is a discrete-time version of w(t). The power spectral density Ŝw(f) was estimated using the periodogram

$$\hat{S}_w(f) = |W(f)|^2 = \left| \int_{-\infty}^{\infty} w(t)\, e^{-j2\pi f t}\, dt \right|^2. \tag{9.137}$$

Figure 9-7(b) displays a plot of Ŝw(f) as a function of f on a log-log scale over the range 1 < f < 400 Hz. Superimposed onto the actual spectrum (in blue) is a straight line in red whose slope in log-log scale is equivalent to 1/f², confirming the applicability of the brown fractal model described by Eq. (9.136). From the intercept in Fig. 9-7(b), it was determined that σz² = 0.25. Hence, Eq. (9.136) becomes

$$\hat{S}_w(f) = \frac{0.25}{4\pi^2 |f|^2}, \quad (1 < |f| < 400).$$

A. Stochastic Wiener Denoising

To illustrate the utility of the stochastic denoising Wiener filter described in Section 9-4.4, we added random noise ν(t) to w(t) to generate

$$y(t) = w(t) + \nu(t). \tag{9.138}$$

The sampled noisy signal y[n], shown in Fig. 9-7(c), exhibits large fluctuations because the signal-to-noise ratio is SNR = −0.21 dB. The negative sign indicates that the noise contains more energy than the original signal w(t). The corresponding signal-power to noise-power ratio is

$$\frac{P_s}{P_n} = 10^{-0.21/10} = 0.95.$$

The variance of ν(t) is σv² = 0.01.

Sampling y(t) at a sampling interval of 1 ms (corresponding to a sampling rate of 1000 samples/s) converts y(t) in Eq. (9.138) into the discrete-time signal

$$y[n] = w[n] + \nu[n]. \tag{9.139}$$
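The signals of Fig. 9-7 are easy to reproduce. The sketch below (ours; the parameter values σz² = 0.25, σv² = 0.01, and the 1000 samples/s rate are those quoted in the text) generates a Wiener process by accumulating white Gaussian noise, forms the noisy observation of Eq. (9.139), and estimates the periodogram of Eq. (9.137).

% Sketch: Wiener process realization, noisy observation, and periodogram.
fs = 1000; N = fs*1;                   % 1000 samples/s over a 1 s duration
sigz2 = 0.25; sigv2 = 0.01;            % variances used in the text
z = sqrt(sigz2)*randn(1,N);            % white Gaussian process z(t)
w = cumsum(z)/fs;                      % w(t): running integral of z(t), Eq. (9.131)
y = w + sqrt(sigv2)*randn(1,N);        % noisy observation y[n], Eq. (9.139)
Swhat = abs(fft(w)/fs).^2;             % periodogram estimate, Eq. (9.137)
f = (0:N-1)*fs/N;                      % frequency axis in Hz
loglog(f(2:N/2), Swhat(2:N/2))         % slope should be about -2 (brown fractal)
xlabel('f (Hz)'), ylabel('estimated S_w(f)')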

Figure 9-7 (a) Wiener random process w(t), (b) spectrum Sw(f) of w(t) plotted on a log-log scale, (c) noisy signal y[n] = w[n] + ν[n], (d) denoised estimate ŵLS[n] using stochastic filtering, and (e) deterministic estimate ŵD[n], which resembles the noisy Wiener process in (c).

In the discrete-time frequency domain Ω, the power spectral density corresponding to Eq. (9.136) is

$$S_w(\Omega) = \frac{\sigma_z^2}{4\pi^2 \Omega^2} = \frac{1}{16\pi^2 \Omega^2}\,, \tag{9.140}$$

where we used σz² = 0.25. With subscript x in Eq. (9.67) changed to subscript w, and σv² = 0.01, use of Eq. (9.140) in Eq. (9.67) gives

$$H_{SDN}(\Omega) = \frac{S_w(\Omega)}{S_w(\Omega) + \sigma_v^2} = \frac{1/(16\pi^2\Omega^2)}{1/(16\pi^2\Omega^2) + 0.01} = \frac{1}{1 + 1.58\,\Omega^2}\,, \tag{9.141}$$

$$\hat{W}_{LS}(\Omega) = H_{SDN}(\Omega)\, Y(\Omega), \tag{9.142}$$

and

$$\hat{w}_{LS}[n] = \text{DTFT}^{-1}\{ \hat{W}_{LS}(\Omega) \}, \tag{9.143}$$

where Y(Ω) is the DTFT of y[n].

Application of the recipe outlined in Section 9-4.4 led to the plot shown in Fig. 9-7(d). The estimated signal ŵLS[n] bears very close resemblance to the true signal w(t) shown in Fig. 9-7(a), demonstrating the capability of the stochastic Wiener filter as a powerful tool for removing noise from noisy signals.
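In MATLAB, the recipe of Eqs. (9.141)–(9.143) reduces to a few lines if the DFT is used in place of the DTFT (our sketch, continuing the variables of the previous listing).

% Sketch: stochastic Wiener denoising of y[n], Eqs. (9.141)-(9.143).
Omega = 2*pi*(0:N-1)/N;                % DFT frequencies standing in for Omega
Omega = min(Omega, 2*pi-Omega);        % fold to [0, pi]; Sw is even in Omega
Omega(1) = Omega(2);                   % avoid division by zero at Omega = 0
Sw = sigz2 ./ (4*pi^2*Omega.^2);       % fractal PSD model, Eq. (9.140)
HSDN = Sw ./ (Sw + sigv2);             % stochastic denoising filter, Eq. (9.141)
wLS = real(ifft(HSDN .* fft(y)));      % Eqs. (9.142) and (9.143) via the DFT
plot(wLS), xlabel('n')                 % compare with Fig. 9-7(d)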
B. Deterministic Wiener Denoising

In the deterministic approach to noise filtering, signal w(t) is treated as a white random process with Sw(f) = σw². Consequently, the expression for the deterministic denoising filter transfer function becomes

$$H_{DDN}(\Omega) = \frac{S_w(\Omega)}{S_w(\Omega) + \sigma_v^2} = \frac{\sigma_w^2}{\sigma_w^2 + \sigma_v^2}\,. \tag{9.144}$$

Consequently, since HDDN(Ω) is no longer a function of Ω, implementation of the two steps in Eqs. (9.142) and (9.143) leads to the deterministic estimate

$$\hat{w}_D[n] = \left( \frac{\sigma_w^2}{\sigma_w^2 + \sigma_v^2} \right) y[n]. \tag{9.145}$$

Clearly, ŵD[n] is just a scaled version of y[n], and therefore no filtering is performed. The plot shown in Fig. 9-7(e) is identical to that in Fig. 9-7(c) for y[n]. For display purposes, we chose to set the quantity σw²/(σw² + σv²) = 1.

9-8.5 Stochastic Deconvolution of the Wiener Process

To illustrate how the stochastic Wiener deconvolution filter of Section 9-4.5 can be used to deconvolve a random process, we selected the Wiener process w(t) shown in Fig. 9-8(a) and then convolved it with a blurring filter characterized by the rectangle impulse response

$$h_{blur}[n] = \begin{cases} 1000, & 0 < n < 100, \\ 0, & \text{otherwise}. \end{cases} \tag{9.146}$$

The convolved Wiener process z[n] = w[n] ∗ hblur[n] is displayed in Fig. 9-8(b). The convolution process with the rectangle function is equivalent to integration in continuous time or summation in discrete time. That is, at a sampling rate of 1000 samples/s,

$$z[n] = 1000 \sum_{k=n-100}^{n} w[k], \tag{9.147a}$$

or equivalently,

$$z(t) = 1000 \int_{t-0.1}^{t} w(\tau)\, d\tau. \tag{9.147b}$$

To perform the summation (or integration), it is necessary to append w[n] with zeros for −100 ≤ n < 0 (or equivalently, −0.1 ≤ t < 0 for w(t)). The integrated signal z[n] in Fig. 9-8(b) is a smoothed version of the original signal w[n].

Next, white Gaussian noise ν[n] was added to the system output z[n] to produce the noisy observation

$$y[n] = z[n] + \nu[n] = h_{blur}[n] \ast w[n] + \nu[n]. \tag{9.148}$$

The noisy signal, plotted in Fig. 9-8(c), is only slightly different from z[n] of Fig. 9-8(b), because the signal-to-noise ratio is 36.9 dB, corresponding to an average signal power of about 4900 times that of the noise.

The stochastic Wiener deconvolution method uses the filter given by Eq. (9.71a), with subscript x changed to w:

$$W_{SDC}(\Omega) = \frac{H_{blur}^*(\Omega)\, S_w(\Omega)}{|H_{blur}(\Omega)|^2\, S_w(\Omega) + \sigma_v^2}\,. \tag{9.149}$$

Here, Hblur(Ω) is the DTFT of hblur[n] defined by Eq. (9.146), Sw(Ω) is given by Eq. (9.140), and from knowledge of the noise added to z[n], σv² = 0.01. Once WSDC(Ω) has been computed, w[n] can be estimated using the form of Eq. (9.73), namely

$$\hat{w}_S[n] = \text{DTFT}^{-1}[\, W_{SDC}(\Omega)\, Y(\Omega)\,], \tag{9.150}$$

where Y(Ω) is the DTFT of the convolved noisy observation y[n]. Implementation of the stochastic deconvolution process led to the plot in Fig. 9-8(d). The mean-square error between w[n] and its reconstruction ŵS[n] over the interval 0 < n < 1000 is 2.25 × 10⁻⁶.

For the sake of comparison, in part (e) of Fig. 9-8 we again show ŵD[n], the result of reconstructing w[n] using the deterministic Wiener filter. Clearly, the deterministic approach is not effective.
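A corresponding sketch of the deconvolution recipe of Eqs. (9.149) and (9.150) is given below (ours; it reuses w, N, sigv2, and the PSD model Sw from the previous listings, and it approximates the linear convolution of Eq. (9.147a) by a cyclic one, which is adequate for illustration).

% Sketch: stochastic Wiener deconvolution, Eqs. (9.149)-(9.150).
h = zeros(1,N); h(2:100) = 1000;       % rectangle PSF of Eq. (9.146)
zc = real(ifft(fft(h).*fft(w)));       % blurred signal z[n] (cyclic model)
y  = zc + sqrt(sigv2)*randn(1,N);      % noisy observation, Eq. (9.148)
Hb = fft(h);                           % Hblur(Omega) on the DFT grid
WSDC = conj(Hb).*Sw ./ (abs(Hb).^2.*Sw + sigv2);   % Eq. (9.149)
wS = real(ifft(WSDC .* fft(y)));       % reconstruction, Eq. (9.150)
plot(wS), xlabel('n')                  % compare with Fig. 9-8(d)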

Figure 9-8 Wiener deconvolution: (a) original Wiener process w[n], (b) convolved signal z[n] = h[n] ∗ w[n] (rect function of duration 100 and amplitude 1000), (c) y[n] = z[n] + ν[n], where ν[n] is white Gaussian noise with S/N = 36.9 dB, (d) deconvolved reconstruction ŵSW[n] using a stochastic Wiener filter, and (e) deconvolved reconstruction ŵDW[n] using a deterministic filter.

Concept Question 9-8: Why does the reciprocal power form of the power spectral density of a fractal random process need only hold over a finite range of frequencies, as in Eq. (9.126)?

9-9 2-D Fractals

The 1-D fractal concepts introduced in the preceding section generalize readily to 2-D images, which we now demonstrate through an example:

(1) Figure 9-9(a): a 960 × 1280 image f[n,m] of a tree with a fractal-like branching structure.

Figure 9-9 2-D deconvolution example: (a) fractal tree image; (b) plot of log₁₀(Sf(0,Ω₂)) versus log₁₀(Ω₂) for π/240 ≤ Ω₂ ≤ π (the average slope is −2); (c) blurred image; (d) noisy blurred image; (e) reconstructed image using a stochastic Wiener filter; (f) reconstructed image using a deterministic Wiener filter.



(2) Figure 9-9(b): To ascertain the fractal character of the tree, the 2-D power spectral density Sf(Ω1,Ω2) was estimated using

$$\hat{S}_f(\Omega_1, \Omega_2) = \left| \sum_{n=0}^{959} \sum_{m=0}^{1279} f[n,m]\, e^{-j(\Omega_1 n + \Omega_2 m)} \right|^2, \tag{9.151}$$

and then Ŝf(0,Ω2) was plotted in Fig. 9-9(b) on a log-log scale. The average slope is −2, which means that Ŝf(0,Ω2) varies as 1/Ω2². Generalizing to 2-D,

$$S_f(\Omega_1, \Omega_2) = \frac{C}{\Omega_1^2 + \Omega_2^2}\,. \tag{9.152}$$

(3) Figure 9-9(c): To simulate motion blur in the horizontal direction, the image in Fig. 9-9(a) is convolved with a (1 × 151) 2-D PSF given by

$$h_{blur}[n,m] = \begin{cases} 1 & \text{for } 0 \le n \le 150 \text{ and } m = 1, \\ 0 & \text{otherwise}. \end{cases} \tag{9.153}$$

The resultant 960 × 1430 blurred image given by

$$z[n,m] = h_{blur}[n,m] \ast\ast f[n,m] \tag{9.154}$$

is shown in Fig. 9-9(c).

(4) Figure 9-9(d): Addition of a slight amount of noise ν[n,m] to z[n,m] gives

$$g[n,m] = z[n,m] + \nu[n,m] = h_{blur}[n,m] \ast\ast f[n,m] + \nu[n,m]. \tag{9.155}$$

The slightly noisy convolved image g[n,m] is shown in Fig. 9-9(d).

(5) Figure 9-9(e): Generalizing Eq. (9.149) to 2-D gives

$$W_{SDC}(\Omega_1,\Omega_2) = \frac{H_{blur}^*(\Omega_1,\Omega_2)\, S_f(\Omega_1,\Omega_2)}{|H_{blur}(\Omega_1,\Omega_2)|^2\, S_f(\Omega_1,\Omega_2) + \sigma_v^2}\,, \tag{9.156}$$

where Hblur(Ω1,Ω2) is the 2-D DSFT of hblur[n,m] and σv² is the noise variance. Application of the stochastic Wiener reconstruction recipe in 2-D gives

$$\hat{f}_S[n,m] = \text{DSFT}^{-1}[\, W_{SDC}(\Omega_1,\Omega_2)\, G(\Omega_1,\Omega_2)\,], \tag{9.157}$$

where G(Ω1,Ω2) is the 2-D DSFT of the observed image g[n,m]. The reconstructed image is shown in Fig. 9-9(e).

(6) Figure 9-9(f): Deterministic deconvolution involves the same steps represented by Eqs. (9.156) and (9.157) for the stochastic deconvolution process except for one single difference, namely that instead of using Eq. (9.152) for Sf(Ω1,Ω2), the expression used is Sf(Ω1,Ω2) = σf². The absence of the inverse frequency dependency leads to a fuzzier reconstruction than realized by the stochastic deconvolution filter.
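The 2-D recipe of Eqs. (9.156) and (9.157) can be sketched in the same way (our illustration; f is any grayscale image array, the constants C and σv² are set to 1 for simplicity, and cyclic convolution again stands in for linear convolution).

% Sketch: 2-D stochastic Wiener deconvolution, Eqs. (9.156)-(9.157).
[M,N] = size(f);                          % f: grayscale image (placeholder)
h = zeros(M,N); h(1,1:151) = 1;           % horizontal motion-blur PSF, Eq. (9.153)
g = real(ifft2(fft2(h).*fft2(f))) + randn(M,N);   % blur plus slight noise
[W2,W1] = meshgrid(2*pi*(0:N-1)/N, 2*pi*(0:M-1)/M);
W1 = min(W1, 2*pi-W1); W2 = min(W2, 2*pi-W2);     % fold to [0, pi]
Sf = 1 ./ max(W1.^2 + W2.^2, (2*pi/max(M,N))^2);  % Eq. (9.152) with C = 1
Hb = fft2(h); sigv2 = 1;
WSDC = conj(Hb).*Sf ./ (abs(Hb).^2.*Sf + sigv2);  % Eq. (9.156)
fhat = real(ifft2(WSDC .* fft2(g)));      % Eq. (9.157)
imagesc(fhat), colormap(gray)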
9-10 Markov Random Fields

Medical applications—such as ultrasound imaging, MRI, and others—rely on the use of image processing tools to segment the images produced by those imaging sensors into regions of common features. The different regions may belong to different organs or different types of tissue, and the goal of the segmentation process is to facilitate the interpretation of the information contained in the images. Another name for segmentation is classification: assigning each pixel to one of a set of predefined classes on the basis of its own value as well as those of its neighbors. Similar tools are used to segment an image captured by a video camera, an infrared temperature sensor, or a weather radar system.

An important ingredient of the image segmentation process is a parameter estimation technique that models an image as a Markov random field (MRF). The MRF model assigns each image pixel f[n,m] a conditional probability density based, in part, on the values of the pixels in its immediate neighborhood. The purpose of the present section is to introduce the concept and attributes of MRFs and to demonstrate their applications through image examples.

9-10.1 1-D Markov Process

Before we delve into the 2-D case, let us first consider the 1-D case of a Markov random process x[n]. The value of x[n] is continuous, but it is sampled in time in discrete steps, generating the random vector {..., x[0], x[1], ..., x[N], ...}. In Markov language, x[n] at time n is regarded as the present, x[n+1] is regarded as the future, and the values x[n−1], x[n−2], ..., x[0] are regarded as the past. The Markov model assigns a conditional pdf to "the future value x[n+1] based on the present value x[n] but independent of past values," which is equivalent to the mathematical statement

$$p(x[n+1] \mid \{ x[0], x[1], \ldots, x[n] \}) = p(x[n+1] \mid x[n]). \tag{9.158}$$

In 2-D, the "present value of x[n]" becomes the values of the pixels in the neighborhood of pixel f[n,m], and "past values" become pixels outside that neighborhood.

Using the Markov condition encapsulated by Eq. (9.158), and after much algebra, it can be shown that

$$p(x[n] \mid \{ \ldots, x[N], x[N-1], \ldots, x[n+1], x[n-1], \ldots, x[0], \ldots \}) = \frac{p(x[n+1] \mid x[n])\; p(x[n] \mid x[n-1])}{\displaystyle\int p(x'[n+1] \mid x'[n])\; p(x'[n] \mid x'[n-1])\; dx'[n]}\,, \tag{9.159}$$

which states that the conditional pdf of x[n], given all other values {..., x[0], ..., x[N], ...}, is governed by the product of two conditional pdfs, one relating x[n] to its immediate past x[n−1], and another relating x[n] to its immediate future x[n+1]. An MRF generalizes this relationship from 1-D to 2-D using the concepts of neighborhoods and cliques.

9-10.2 Neighborhoods and Cliques

The neighborhood of a pixel at location [n,m] is denoted Δ[n,m] and consists of a set of pixels surrounding pixel [n,m], but excluding it. (For purposes of clarity, we use the symbol Δ, even though the Markov field literature uses ∂.) Figure 9-10 displays the pixel locations included in 3-, 8-, and 15-neighbor systems. Neighborhoods may also be defined in terms of cliques.

A clique is a set of locations such that any two members of the clique adjoin each other, either horizontally, vertically, or diagonally. Figure 9-11 shows 10 cliques, one of which is a self-adjoining clique. A neighborhood can be represented as a disjoint union of cliques of various types.

In a typical image, pixel neighbors are more likely to have similar intensities than distant pixels. Different organs in a medical image or different objects in an imaged scene tend to have smooth features, and the boundaries between them tend to have sharp transitions across them. Image texture, defined as the spatial variability within a given "homogeneous" region, might exhibit different statistical variations in different types of regions (classes). Image segmentation (classification) algorithms differ by the type of sensor used to produce the image, the type of scene, and the intended application. However, most of these algorithms share a common strategy, which we outline in this and the next section. Part of the strategy is to define the "relevant" neighborhood for a particular application, which entails defining the combination of cliques comprising such a neighborhood.

9-10.3 Likelihood Functions

The goal of image segmentation is to classify each pixel into one of K predefined classes. In a medical ultrasound image, the classes pertain to different types of tissue, while in a video image of terrain, the classes might be roads, cars, trees, etc. The segmentation procedure involves several elements, the first of which is the likelihood function of the estimation method used to implement the segmentation.

Given the general image model

$$g[n,m] = f[n,m] + \nu[n,m], \tag{9.160}$$

where g[n,m] is the observed intensity of pixel [n,m], f[n,m] is the noise-free intensity, and ν[n,m] is the additive noise, the estimated value f̂[n,m] is obtained from g[n,m] by maximizing the likelihood function. The likelihood functions for the maximum likelihood estimation (MLE) method and the maximum a posteriori probability (MAP) method are given by

MLE:
$$p(g[n,m] \mid f[n,m]), \tag{9.161a}$$

MAP:
$$p(f[n,m] \mid g[n,m]) = \frac{p(g[n,m] \mid f[n,m])\; p(f[n,m])}{p(g[n,m])}\,. \tag{9.161b}$$

A. MLE Estimate f̂MLE[n,m]

As shown earlier in Sections 9-2 and 9-3 for the 1-D case, the MLE maximization process entails: (a) introducing an appropriate model for the conditional probability of Eq. (9.161a), (b) taking the logarithm of the model expression, (c) computing the derivative of the likelihood function with respect to the unknown quantity f[n,m] and equating it to zero, and (d) solving for f[n,m]. The resultant value of f[n,m] is labeled f̂MLE[n,m]. The computed estimate is then used to assign pixel [n,m] to one of the K classes, in accordance with a class-assignment algorithm (introduced shortly).

B. MAP Estimate f̂MAP[n,m]

MAP requires models for two pdfs, namely the same conditional pdf used in MLE, p(g[n,m]|f[n,m]), as well as the pdf p(f[n,m]). The third pdf, p(g[n,m]) in the denominator of Eq. (9.161b), is not needed and may be set equal to a constant. This is because it disappears in the maximization procedure (taking the log, differentiating with respect to f[n,m], and equating to zero).

The types of models commonly used to describe the two pdfs in the numerator of Eq. (9.161b) are introduced shortly.

Figure 9-10 Examples of pixel neighborhoods: (a) 3 neighbors, (b) 8 neighbors, (c) 15 neighbors.



Figure 9-11 Ten types of pixel cliques: (a) self clique, (b) horizontal clique, (c) vertical clique, (d) left-diagonal clique, (e) right-diagonal clique, (f) L-clique, (g) inverted L-clique, (h) reversed L-clique, (i) inverted-reversed L-clique, (j) square clique. A red star denotes a location in a clique and a 0 denotes a location not in a clique.

9-10.4 pdf p(f[n,m])

The Hammersley-Clifford theorem is sometimes called "the fundamental theorem of random fields." It states that any conditional pdf for a set of random variables in a Markov random field is always given by the Gibbs distribution, provided the following conditions hold:

(1) Positivity: p(f[n,m]) > 0, which usually is true for images. Note that positivity states that the pmf p(f[n,m]) of f[n,m] is always positive, but f[n,m] itself may assume both positive and negative values.

(2) Stationarity: p(f[n,m]) does not vary with position. That is, the same statistics apply across the entire image.

(3) Locality: For a specified neighborhood Δ[n,m], f[n,m] is a Markov random field that obeys the locality condition

$$p(f[n,m] \mid \{ f[n',m'],\ n' \ne n,\ m' \ne m \}) = p(f[n,m] \mid f_\Delta[n,m]), \tag{9.162}$$

where fΔ[n,m] are the values of pixels in the specified neighborhood Δ[n,m]. Thus, the pdf of f[n,m] at location [n,m] depends on only the values of the pixels within the neighborhood Δ[n,m].

The Gibbs distribution is a joint pdf for f[n,m] at all locations [n,m], and it has the general form

$$p(\{ f[n,m] \}) = \frac{1}{Z}\, e^{-U[n,m]}, \tag{9.163}$$

where {f[n,m]} is f[n,m] at all locations [n,m], Z is the partition function that normalizes the pdf so that it integrates to unity, as every pdf should, and U[n,m] is the sum of the potential energy over the cliques c:

$$U[n,m] = \sum_{\text{all cliques } c} U_c[n',m']. \tag{9.164}$$

The summation extends over the cliques defining the specified neighborhood Δ[n,m], and the model for the potential energy of an individual clique depends on the segmentation algorithm.

The Ising model is a popular image-processing algorithm used for identifying boundaries between different classes. The original Ising model, which was developed for characterizing the statistics of ±1 electron spin in ferromagnetic materials, has been adapted to image segmentation by restricting the Markov random field f[n,m] to a few discrete values. Consequently, f[n,m] is described by a pmf, not a pdf. Each f[n,m] interacts only with its specified neighbors, such as its 8 immediate neighbors.

A particular form of the Ising model uses a binary assignment in which f[n,m] can assume either a value of 0 or a value of 1, and it interacts with its two horizontal neighbors and two vertical neighbors (Fig. 9-10(a)). Thus,

$$\Delta[n,m] = \{ [n \pm 1, m],\ [n, m \pm 1] \}. \tag{9.165}$$

The potential energy of pixel [n,m] is defined as

$$U[n,m] = \beta\, s[n,m], \tag{9.166}$$

where β is a model parameter selected through experimentation (as discussed later in Section 9-11.3) and s[n,m] is a dissimilarity index that accounts for how many of the four pixels surrounding pixel [n,m] are different from pixel [n,m]. For example, s[n,m] = 0 for the center pixel in Fig. 9-12(a) and s[n,m] = 4 for the center pixel in Fig. 9-12(b). The full range of the dissimilarity index s[n,m] is between 0, for a pixel surrounded by like pixels, and 4, for a pixel different from all of its four neighbors.

To compute s[n,m] for each pixel in the image, we first have to devise a scheme for assigning a value of 0 or 1 to each pixel. As we shall see in a later section, such an assignment can be realized using maximum likelihood segmentation.
326 CHAPTER 9 STOCHASTIC DENOISING AND DECONVOLUTION

Figure 9-12 Dissimilarity index s[n,m] is equal to the number of immediate horizontal and vertical pixels that are different from the pixel under consideration: (a) all 4 neighbors are the same as the center pixel (s[n,m] = 0); (b) all 4 neighbors are different from the center pixel (s[n,m] = 4); (c) two neighbors are different from the center pixel (s[n,m] = 2).

Consider, for example, the noisy two-class image shown in Fig. 9-13(a). Because of the randomness associated with the pixel values, it is not easy to assign individual pixels to their correct class. When our eyes look at the image, however, we discern the boundaries between the four squares, even though some of the pixels in the predominantly dark squares are bright, and some of those in the predominantly bright squares are dark. The goal of segmentation is to assign each pixel to its correct class by incorporating information about the neighborhood of that pixel, akin to how our eye-brain system perceives the boundaries between the squares in the image of Fig. 9-13(a).

By computing and then plotting the histogram of the observed image g[n,m], as in Fig. 9-13(b), we can divide the range of values of g[n,m] into two segments and assign them values of 0 and 1. [If the image is to be classified into more than two classes, the approach can be extended accordingly, by assigning discrete values commensurate with the distances between the peaks in the distribution.] Using the 0/1 assignment, each pixel in the original image is then assigned a value of 0 or 1. We call this image a binary image.

Once the dissimilarity index s[n,m] has been determined for each pixel, the Ising-Gibbs distribution given by Eq. (9.163) becomes

$$p(f[n,m]) = \frac{1}{Z}\, e^{-\beta s[n,m]}. \tag{9.167}$$

The negative exponential model favors shorter boundaries (smaller s[n,m]) over longer boundaries. Its role in image segmentation is discussed shortly.

Concept Question 9-9: In general terms, what is a Markov random field model of an image?

Concept Question 9-10: How can a Markov random field model be useful?

Figure 9-13 Noisy image of four squares and associated histogram. The noise-free image had only positive values, but the addition of random noise expands the range to negative values and to larger positive values.

9-11 Application of MRF to Image Segmentation

9-11.1 Image Histogram

Consider the 320 × 320 noisy image shown in Fig. 9-14(a). Our goal is to segment the image into three classes: bone, other tissue, and background. To that end, we start by generating the histogram of the noisy image g[n,m]. The histogram displayed in Fig. 9-14(b) encompasses three clusters of pixels: a group concentrated around f[n,m] = f1 = 0, another around f[n,m] = f2 = 150, and a third around f[n,m] = f3 = 300. The three clusters correspond to dark pixels in the background part of the image, pixels of non-bone tissue, and pixels of the bones in the five toes. In the present case, the values of f1 to f3 were extracted from the image histogram, but in practice more precise values are available from calibration experiments performed with the imaging sensor.

9-11.2 MLE Segmentation

The likelihood function for a noisy image g[n,m], modeled as Gaussian with variance σ², is given by

$$p(g[n,m] \mid f[n,m]) = \prod_{n=1}^{N} \prod_{m=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(1/2\sigma^2)(g[n,m]-f[n,m])^2}. \tag{9.168}$$

Here, N = 320. Taking the natural log of both sides gives

$$\ln(p(g[n,m] \mid f[n,m])) = -\frac{N^2}{2} \ln(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{n=1}^{N} \sum_{m=1}^{N} (g[n,m] - f[n,m])^2. \tag{9.169}$$

In the present segmentation, each pixel is assigned to one of three classes:

Class 1: f1 = 0
Class 2: f2 = 150
Class 3: f3 = 300

That is, for each pixel [n,m] with pixel value g[n,m], f[n,m] in Eq. (9.169) can assume only one of the three listed values. Because of the minus sign ahead of the second term in Eq. (9.169), the log of the MLE likelihood function is maximized by minimizing the difference (g[n,m] − f[n,m])². Hence, for pixel [n,m] with value g[n,m], the MLE likelihood is maximized by assigning that pixel the value f1, f2, or f3, depending on which one of them is closest to g[n,m]. Consequently, a pixel with g[n,m] = 50 is closer to f1 = 0 than to f2 = 150, and therefore it gets classified as class 1 (black). Similarly, a pixel with g[n,m] = 100 is closest to f2 = 150, and therefore it gets classified as class 2 (tissue). The MLE segmentation process leads to the image shown in Fig. 9-14(c).

9-11.3 MAP Segmentation

Inserting Eqs. (9.167) and (9.168) into Eq. (9.161b) provides the MAP likelihood function

$$p(f[n,m] \mid g[n,m]) = \frac{p(g[n,m] \mid f[n,m])\; p(f[n,m])}{p(g[n,m])} = \left\{ \prod_{n=1}^{N} \prod_{m=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(1/2\sigma^2)(g[n,m]-f[n,m])^2} \right\} \times \frac{1}{Z}\, e^{-\beta s[n,m]} \times \frac{1}{p(g[n,m])}\,, \tag{9.170}$$

where β is a selectable (trial-and-error) parameter similar to the Tikhonov parameter λ, and s[n,m] is the dissimilarity index of pixel [n,m], computed from the MLE segmented image.

The natural log of the MAP likelihood function is

$$\ln(p(f[n,m] \mid g[n,m])) = -\frac{N^2}{2} \ln(2\pi\sigma^2) - \ln Z - \ln C + \left\{ -\frac{1}{2\sigma^2} \sum_{n=1}^{N} \sum_{m=1}^{N} (g[n,m]-f[n,m])^2 - \beta s[n,m] \right\}, \tag{9.171}$$

where we have replaced p(g[n,m]) with the constant C because p(g[n,m]) has no influence on the maximization process. In Eq. (9.171) we have two groups of terms. The first group—consisting of three terms—has no role in the maximization process, whereas the second group does. Of the second group, let us consider the terms associated with pixel [n,m] and call their combination z[n,m]:

$$z[n,m] = -\frac{1}{2\sigma^2}\,(g[n,m]-f[n,m])^2 - \beta\, s[n,m]. \tag{9.172}$$

We wish to assign pixel f[n,m] one of three values, namely f1 = 0, f2 = 150, or f3 = 300. The assignment also classifies the pixel as background, tissue, or bone. The assignment seeks to maximize the likelihood function, which (because of the minus signs) is accomplished by minimizing the magnitude of z[n,m].
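For a given current segmentation, the three values of z[n,m] in Eq. (9.172) can be evaluated for every pixel simultaneously. The sketch below (ours) performs one such MAP assignment pass; cls holds the current class indices (from an initial MLE pass), and setting beta = 0 reduces the pass to MLE segmentation.

% Sketch: one MAP assignment pass based on Eq. (9.172).
fvals = [0 150 300];                       % class means f1, f2, f3
[M,N] = size(g); K = numel(fvals);
cp = padarray(cls, [1 1], 'replicate');    % pad the label image at the borders
zmag = zeros(M,N,K);
for k = 1:K
    % sk[n,m]: number of the 4 neighbors whose label differs from class k
    sk = (cp(1:M,2:N+1) ~= k) + (cp(3:M+2,2:N+1) ~= k) ...
       + (cp(2:M+1,1:N) ~= k) + (cp(2:M+1,3:N+2) ~= k);
    zmag(:,:,k) = (g - fvals(k)).^2/(2*sigma^2) + beta*sk;  % |z[n,m]| per class
end
[~,cls] = min(zmag, [], 3);                % smallest |z[n,m]| wins
fhat = fvals(cls);                         % segmented image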

Figure 9-14 Three-class segmentation example (f1 = 0, f2 = 150, f3 = 300): (a) noisy x-ray image, (b) histogram of noisy image, (c) MLE segmentation, (d) MAP segmentation with ICM algorithm.

In the absence of the second term in Eq. (9.172), the process would collapse to the MLE segmentation of the previous subsection, but the presence of the term βs[n,m] introduces the degree of similarity/dissimilarity of pixel [n,m] relative to its neighbors into the segmentation decision.

For each pixel [n,m], g[n,m] is the observed value of that pixel in the noisy image, β is a trial-and-error parameter, σ² is the image variance (usually determined through calibration tests), and s[n,m] is the dissimilarity index obtained from the MLE segmentation image. By computing z[n,m] three times—once with f[n,m] in Eq. (9.172) set equal to f1, another with f[n,m] set equal to f2, and a third time with f[n,m] set equal to f3—we obtain three values of z[n,m]. MAP segmentation selects the smallest of the three absolute values of z[n,m] and assigns that pixel to the corresponding class. The outcome is shown in Fig. 9-14(d).

Computationally, a commonly used algorithm for realizing the MAP segmentation is the Iterated Conditional Modes (ICM) algorithm. The algorithm repeats the segmentation process iteratively until it reaches a defined threshold.

Example 9-3: Four-Square Image

Apply the ICM algorithm with β = 300 to segment the 64 × 64 noisy binary image shown in Fig. 9-15(a). The noise level is characterized by a variance σ² = 0.25.

Solution: We start by computing and displaying the image histogram shown in Fig. 9-15(b). Based on the histogram, we select f1 = 1.5 for the darker class and f2 = 2.5 for the brighter class.

Next, we apply MLE segmentation. We assign each pixel [n,m] the value f1 or f2, depending on which one of them is closest in value to the pixel value g[n,m]. The result is displayed in Fig. 9-15(c).

Figure 9-15 Example 9-3: (a) noisy image, (b) histogram p(g[n,m]), (c) MLE segmentation of noisy image, (d) ICM segmentation after 1 iteration, (e) ICM segmentation after 9 iterations.

Finally, we apply MAP segmentation using the ICM algorithm to obtain the image in Fig. 9-15(d). The segmentation can be improved through iterative repetition of the process. The first iteration generates an MLE image, computes s[n,m] for each pixel, and then computes a MAP image. Each subsequent iteration computes a new set of values for s[n,m] and a new MAP image. The image in Fig. 9-15(e) is the result of 9 iterations. Except for a few misidentified pixels—mostly along the boundaries of the four squares—the MAP segmentation procedure provides very good discrimination between the white and black squares.

Concept Question 9-11: What application of Markov random fields was covered in Section 9-11?
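The solution to Example 9-3 can be reproduced by wrapping the assignment pass of the previous sketch in a loop. The self-contained sketch below (ours; g is the 64 × 64 noisy image, and the parameter values are those quoted in the example) performs the MLE initialization followed by 9 ICM iterations.

% Sketch: MLE initialization plus 9 ICM iterations (Example 9-3).
fvals = [1.5 2.5]; sigma = sqrt(0.25); beta = 300;
[M,N] = size(g); K = numel(fvals);
d = zeros(M,N,K);
for k = 1:K, d(:,:,k) = (g - fvals(k)).^2; end
[~,cls] = min(d, [], 3);                   % MLE segmentation, Fig. 9-15(c)
for it = 1:9                               % ICM iterations
    cp = padarray(cls, [1 1], 'replicate');
    for k = 1:K
        sk = (cp(1:M,2:N+1) ~= k) + (cp(3:M+2,2:N+1) ~= k) ...
           + (cp(2:M+1,1:N) ~= k) + (cp(2:M+1,3:N+2) ~= k);
        d(:,:,k) = (g - fvals(k)).^2/(2*sigma^2) + beta*sk;   % Eq. (9.172)
    end
    [~,cls] = min(d, [], 3);               % MAP update
end
imagesc(cls), colormap(gray)               % compare with Fig. 9-15(e)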

Summary

Concepts

• The goal of estimation is to estimate an unknown x from observation yobs of a random variable or process y using a conditional pdf p(y|x), which comes from a model, and possibly an a priori pdf p(x) for x.
• The maximum likelihood estimate x̂MLE(yobs) is the value of x that maximizes the likelihood function p(yobs|x), or equivalently its logarithm.
• The maximum a posteriori estimate x̂MAP(yobs) is the value of x that maximizes the a posteriori pdf p(x|yobs), or equivalently its logarithm. p(x|yobs) is computed from the likelihood function p(yobs|x) and the a priori pdf p(x) using Bayes's rule (see below).
• The least-squares estimate x̂LS(yobs) is the value of x that minimizes the mean square error E[(x − x̂LS(y))²].
• The 2-D versions of these estimators are generalizations of the 1-D ones.
• Using a white power spectral density makes the stochastic Wiener filter reduce to the deterministic Wiener filter.
• A fractal model of power spectral density greatly improves performance.
• A Markov random field (MRF) models each image pixel as a random variable conditionally dependent on its neighboring pixels.
• An MRF model and the ICM algorithm can be used to segment an image.

Mathematical Formulae

Bayes's rule:
$$p(x|y) = p(y|x)\, \frac{p(x)}{p(y)}$$

MAP log-likelihood:
$$\log p(x|y_{obs}) = \log p(y_{obs}|x) + \log p(x) - \log p(y_{obs})$$

Least-squares estimate:
$$\hat{x}_{LS}(y_{obs}) = E[x \mid y = y_{obs}] = \frac{\int x'\, p(y_{obs}|x')\, p(x')\, dx'}{\int p(y_{obs}|x')\, p(x')\, dx'}$$

Gaussian least-squares estimate:
$$\hat{x}_{LS}(y_{obs}) = \bar{x} + K_{x,y} K_y^{-1} (y_{obs} - \bar{y})$$

Sample mean:
$$\hat{x}_{LS}(y_{obs}) = \frac{1}{N} \sum_{n=1}^{N} y_{obs}[n]$$

Least-squares estimation orthogonality principle:
$$E[(x - \hat{x}_{LS}(y))\, y^T] = 0$$

1-D Linear least-squares estimator:
$$\hat{x}_{LLSE}[n] = h[n] \ast y_{obs}[n]$$

1-D Deterministic Wiener deconvolution filter:
$$\hat{x}(\Omega) = \frac{H_{blur}^*(\Omega)}{|H_{blur}(\Omega)|^2 + \sigma_v^2}\; y_{obs}(\Omega)$$

1-D Stochastic Wiener deconvolution filter:
$$\hat{x}(\Omega) = \frac{H_{blur}^*(\Omega)\, S_x(\Omega)}{|H_{blur}(\Omega)|^2\, S_x(\Omega) + \lambda^2}\; y_{obs}(\Omega)$$

Fractal model:
$$S_x(f) = \frac{c}{|f|^a}, \quad \text{where } a = 1 \text{ or } 2$$

Gibbs distribution of a Markov random field:
$$p(\{ f[n,m] \}) = \frac{1}{Z}\, e^{-U[n,m]}$$

Important Terms: Provide definitions or explain the meaning of the following terms: Bayes's rule, deterministic Wiener filter, fractal power spectral density, Gibbs distribution, ICM algorithm, least-squares, Markov random field, maximum a posteriori, maximum likelihood, sample mean, stochastic Wiener filter.

PROBLEMS

Section 9-1: Estimation Methods

9.1 A coin with P[heads] = x is flipped N times. The results of each flip are independent. We observe n0 = number of heads in N flips.
(a) If x is an unknown constant, compute the MLE estimate x̂MLE(n0).
(b) If x is a random variable with a priori pdf
$$p(x) = \begin{cases} 10e^{-10x} & \text{for } 0 \le x \le 1, \\ 0 & \text{otherwise}, \end{cases}$$
derive a quadratic equation for the MAP estimate x̂MAP(n0). Neglect P[x > 1] = e⁻¹⁰ = 0.000045.
(c) If N = 92 and n0 = 20, compute x̂MAP(20).

9.2 A coin with P[heads] = x is flipped N times. The results of flips are independent. We observe n0 = number of heads in N flips.
(a) If x is a random variable with a priori pdf
$$p(x) = \frac{1}{3}\, \delta\!\left(x - \frac{1}{4}\right) + \frac{2}{3}\, \delta\!\left(x - \frac{1}{2}\right)$$
so that P[x = 1/4] = 1/3 and P[x = 1/2] = 2/3, compute an expression for the least-squares estimator x̂LS(n0).
(b) If N = 3 and n0 = 2, compute the least-squares estimate x̂LS(2).

9.3 An exponential random variable y has pdf
$$p(y|x) = \begin{cases} xe^{-xy} & \text{for } y > 0, \\ 0 & \text{for } y < 0. \end{cases}$$
We observe five independent values y0 = {y1, y2, y3, y4, y5} of random variable y. x is an unknown constant. Compute the MLE x̂MLE({y1, y2, y3, y4, y5}) of x.

9.4 An exponential random variable y has pdf
$$p(y|x) = \begin{cases} xe^{-xy} & \text{for } y > 0, \\ 0 & \text{for } y < 0. \end{cases}$$
(a) If x is an unknown constant, compute the MLE estimate x̂MLE(y0).
(b) If x is a random variable with a priori pdf
$$p(x) = \begin{cases} be^{-bx} & \text{for } x > 0, \\ 0 & \text{for } x < 0, \end{cases}$$
where b is a known constant, compute the MAP estimate x̂MAP(y0).
(c) Explain the behavior of x̂MAP(y0) when b → ∞ and when b → 0.

9.5 An exponential random variable y has pdf
$$p(y|x) = \begin{cases} xe^{-xy} & \text{for } y > 0, \\ 0 & \text{for } y < 0. \end{cases}$$
x is a random variable with a priori pdf
$$p(x) = \begin{cases} be^{-bx} & \text{for } x > 0, \\ 0 & \text{for } x < 0, \end{cases}$$
where b is a known constant. Compute the least-squares estimate x̂LS(y0).

Section 9-4: Least-Squares Estimation

9.6 y(t) is a zero-mean WSS Gaussian random process with autocorrelation function Ry(τ) = e^{−|τ|}.
(a) Let y = {y(1), y(2), y(3)}. Determine the joint pdf p(y).
(b) Compute the least-squares estimate ŷ(3)LS(y(2) = 6).
(c) Compute the least-squares estimate ŷ(3)LS(y(2) = 6, y(1) = 4).

9.7 x(t) is a zero-mean WSS white Gaussian random process with autocorrelation Rx(τ) = 4δ(τ). x(t) is input into an LTI system with impulse response
$$h(t) = \begin{cases} 3e^{-2t} & \text{for } t > 0, \\ 0 & \text{for } t < 0. \end{cases}$$
(a) Compute the autocorrelation Ry(τ) of the output random process y(t).
(b) Let y = {y(3), y(7), y(9)}. Determine the joint pdf p(y).
(c) Compute the least-squares estimate ŷ(7)LS(y(5) = 6).

(d) Compute the least-squares estimate ŷ(7)LS(y(5) = 6, y(3) = 4).

9.8 x[n] is a zero-mean non-WSS random process with autocorrelation Rx[i,j] = min[i,j]. Let i > j > k. Show that x̂[i]LS(x[j], x[k]) = x̂[i]LS(x[j]), so x[k] is irrelevant.

9.9 x[n] is an IID Gaussian random process with x[n] ~ N(m, s), where m = E[x[n]] and s = σ²_{x[n]}. We are given observations x0 = {x0[1], x0[2], ..., x0[N]} of {x[1], x[2], ..., x[N]}. The goal is to compute the MLE estimates m̂(x0) and ŝ(x0) of the mean m and the variance s.

Section 9-7: Spectral Estimation

9.10 Section 9-7 showed discrete-space fractal images have power spectral densities

$$S_f(\Omega_1, \Omega_2) = \frac{C}{\Omega_1^2 + \Omega_2^2}\,, \quad 0 < \Omega_{min} < |\Omega_1|, |\Omega_2| < \Omega_{max} < \pi,$$

for some Ωmin and Ωmax. This problem suggests many real-world images also have such power spectral densities. The following program uses a periodogram to estimate the power spectral density of the image in ????.mat and fits a line to the log-log plot of Sf(Ω1,0) versus Ω1 for Ωmin = 0.1π and Ωmax = 0.9π. [Here "????" refers to the .mat files listed below.]

clear;load ???.mat;M=size(X,1);
K=round(M/20);
FX=abs(fft2(X))/M;FX=log10(FX.*FX);
Omega1=log10(2*pi*[K:M/2-K]/M);
P=polyfit(Omega1,FX(1,K:M/2-K),1);
Y=polyval(P,Omega1);
subplot(211),plot(Omega1,FX(1,K:M/2-K),Omega1,Y,'r'),axis tight,grid on

Run this program for the images contained in the following .mat files: (a) clown.mat; (b) letters.mat; (c) sar.mat; (d) mri.mat. What do these plots suggest about the power spectral densities of the images?

Section 9-6: 2-D Estimation Problems

Deblurring due to an out-of-focus camera can be modeled crudely as 2-D convolution with a disk-shaped PSF

$$h[n,m] = \begin{cases} 1 & \text{for } m^2 + n^2 < R^2, \\ 0 & \text{for } m^2 + n^2 > R^2, \end{cases}$$

for some radius of R pixels. The program srefocus.m convolves with h[n,m] the image f[n,m] in the file ????.mat, adds zero-mean white Gaussian noise with variance σv² to the blurred image, and deconvolves the blurred image using each of the following two Wiener filters:

• the deterministic Wiener filter uses power spectral density Sf(Ω1,Ω2) = C;
• the stochastic Wiener filter uses power spectral density Sf(Ω1,Ω2) = C/(Ω1² + Ω2²).

Both filters depend only on the reciprocal signal-to-noise-type ratio σv²/C. In practice, neither C nor σv² is known, so different values of σv²/C would be tried. Use σv²/C = 1 here. The program srefocus.m will be used in the next eight problems. [Here "????" refers to the .mat files used in Problems 9.11 through 9.18.]

9.11 Edit program srefocus.m to deconvolve the image in clown.mat. Use σv²/C = 1.

9.12 Edit program srefocus.m to deconvolve the image in letters.mat. Use σv²/C = 1.

9.13 Edit program srefocus.m to deconvolve the image in xray.mat. Use σv²/C = 1.

9.14 Edit program srefocus.m to deconvolve the image in mri.mat. Use σv²/C = 1. Change the power spectral density to Sf(Ω1,Ω2) = C/(Ω1² + Ω2²)² (uncomment a line).

9.15 Edit program srefocus.m so it merely denoises the noisy clown image. Use σv²/C = 1 and h[n,m] = δ[n,m] (to make a denoising, not a deconvolution, problem) and

• the deterministic Wiener filter uses the power spectral density Sf(Ω1,Ω2) = C;
• the stochastic Wiener filter uses the power spectral density Sf(Ω1,Ω2) = C/(Ω1² + Ω2²).

9.16 Edit program srefocus.m so it merely denoises the noisy letters image. Use σv²/C = 1 and h[n,m] = δ[n,m] (to make a denoising, not a deconvolution, problem) and

• the deterministic Wiener filter uses the power spectral density Sf(Ω1,Ω2) = C;
• the stochastic Wiener filter uses the power spectral density Sf(Ω1,Ω2) = C/(Ω1² + Ω2²).

9.17 Edit program srefocus.m so it merely denoises the noisy XRAY image. Use σv²/C = 1 and h[n,m] = δ[n,m] (to make a denoising, not a deconvolution, problem) and

• the deterministic Wiener filter uses power spectral density Sf(Ω1,Ω2) = C;
• the stochastic Wiener filter uses power spectral density Sf(Ω1,Ω2) = C/(Ω1² + Ω2²).

9.18 Edit program srefocus.m so it merely denoises the noisy MRI image. Use σv²/C = 1 and h[n,m] = δ[n,m] (to make a denoising, not a deconvolution, problem) and

• the deterministic Wiener filter uses the power spectral density Sf(Ω1,Ω2) = C;
• the stochastic Wiener filter uses the power spectral density Sf(Ω1,Ω2) = C/(Ω1² + Ω2²)².

Section 9-10: Markov Random Fields

9.19 Download the image in field.mat and the program icm.m.
(a) Display the image using imagesc(X),colormap(gray).
(b) Display its histogram for 200 bins using hist(X(:),200). Eight pixel values occur more often than nearby values. Identify them. Hint: They are in arithmetic progression.
(c) Change the line f=[] in icm.m to contain these values.
(d) Run icm.m. It uses 9 iterations, σ = 30, and β = 1. (Varying these values seems to make little difference to the final result.) Display the maximum-likelihood segmented and ICM-algorithm segmented images.
Chapter 10
10 Color Image Processing

Contents
Overview, 335
10-1 Color Systems, 335
10-2 Histogram Equalization and Edge Detection, 340
10-3 Color-Image Deblurring, 343
10-4 Denoising Color Images, 346
Problems, 351

Objectives

Learn to:

■ Transform between the RGB and YIQ color systems.

■ Apply image enhancement (histogram equalization and edge detection) to color images.

■ Apply image restoration (deblurring and denoising) to color images.

Chapter-opener figure: (a) blurred Christmas tree; (b) 1-D spectrum log(|GR[k1,0]| + 1) (MATLAB); (c) reconstructed Christmas tree.

This chapter extends the image processing techniques developed in previous chapters for grayscale images to color images, which can be represented in either the RGB color system (suitable for image deblurring) or the YIQ color system (suitable for enhancement, so that colors are not distorted). Examples illustrate all of this.
Overview

So far in this book, we have focused on the processing of grayscale (black-and-white) images, where the intensity of the image is described by a single 2-D function f(x,y) of continuous spatial variables (x,y) or f[n,m] of discrete spatial variables [n,m]. We considered several types of image processing applications, including image denoising, deblurring, and segmentation. Now in this chapter, we consider color images and how to process them. It is important to note that by "color images" we mean true-color images, in which different colors are actually present in the image, not false-color images, in which the grayscale intensity of a black-and-white image is divided into ranges and then assigned to different colors. We use discrete-space images f[n,m], although our treatment is equally applicable to continuous-space images f(x,y).

A color image consists of three channels or components, which taken together contain the information present in the image. The actual content of the three channels depends on the color system used for the image. For example, the RGB color system represents a color image f[n,m] as a triplet of red fR[n,m], green fG[n,m], and blue fB[n,m] images:

$$f[n,m] = \{ f_R[n,m],\ f_G[n,m],\ f_B[n,m] \}. \tag{10.1}$$

It is important to note that it is not true that f[n,m] = fR[n,m] + fG[n,m] + fB[n,m]—the three components are akin to elements of a vector. Instead, the color image consists of a simultaneous display of the 2-D functions fR[n,m], fG[n,m], and fB[n,m], as shown in Fig. 10-1. The color generation schemes used in displays and printers are discussed in Section 10-1.

Figure 10-1 The three-color checkerboard image comprises three single-color images: (a) color checkerboard image, (b) red color fR[n,m], (c) green color fG[n,m], (d) blue color fB[n,m]. Each single-color image comprises one-fourth of the total pixels of the checkerboard (because one-fourth of the pixels are black). In (b), fR[n,m] is non-zero in only the non-black pixels, and a similar arrangement applies to fG[n,m] and fB[n,m].

Other color systems, specifically CMYK (cyan, magenta, yellow, black) and YIQ (luminance, in-phase, quadrature), are presented in Section 10-1. Image enhancement (histogram equalization and edge detection) of color images is presented in Section 10-2, which uses YIQ because use of RGB can result in distortion of the colors of f[n,m]. Deconvolution (deblurring) of color images is presented in Section 10-3, and denoising of color images is covered in Section 10-4. Both use RGB.

10-1 Color Systems

A color system is a representation of a color image using three channels or components. This section presents an overview of the three color systems most commonly used in displays and color printers.

10-1.1 RGB (Additive) Color System

A. Acquiring Color Images

Color images are acquired using three sensors: red, green, and blue. Film images in color photography were acquired by splitting the image into three copies, and then filtering them using red, green, and blue filters to record the red, green, and blue components of the image. TV images, when TV color cameras were used, were also acquired by splitting the image into three copies, and then filtering them using red, green, and blue filters before inputting them into three separate sensors (Fig. 10-2). More recently, three sets of charge-coupled device (CCD) sensors with red, green, and blue filters in front of them are used to acquire the red, green, and blue components of an image.

Hence, image processing of color images consists of processing three RGB components.
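A color image is therefore stored and processed as three stacked arrays. The MATLAB sketch below (ours; 'peppers.png' is a stock MATLAB test image used here only as a placeholder) extracts and displays the three components of Eq. (10.1).

% Sketch: the three RGB channels of a color image, as in Eq. (10.1).
f  = im2double(imread('peppers.png'));  % f is an M x N x 3 array
fR = f(:,:,1); fG = f(:,:,2); fB = f(:,:,3);
subplot(2,2,1), imshow(f),  title('color image f[n,m]')
subplot(2,2,2), imshow(fR), title('red component f_R[n,m]')
subplot(2,2,3), imshow(fG), title('green component f_G[n,m]')
subplot(2,2,4), imshow(fB), title('blue component f_B[n,m]')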

Figure 10-2 Color TV camera with three image orthicon tubes.

Deconvolution of color images requires deconvolution of each RGB component separately, using identical procedures (2-D DFT and Wiener filter) for each RGB component. Image enhancement, in particular edge detection and histogram equalization, is applied to the YIQ representation rather than the RGB representation (the two representations are related by the linear transformation of Eq. (10.4)), in order to ensure that different colors are not enhanced differently. Denoising can use either the RGB or the YIQ representation. All of these tools are illustrated in the remainder of this chapter.

To acquire a color image, most digital cameras and scanners acquire red, green, and blue images separately. Scanners include an (M × 3) array of CCD sensors with an (M × 3) array of color filters in front of them, where M is the width of the scanner in pixels. One row of filters is red, another row is green, and another row is blue (see Fig. 1-6). The image is scanned line by line, and the mth line generates fR[n,m], fG[n,m], and fB[n,m].

B. Displaying Color Images

As noted above, color photography was developed by filtering the lens image using red, green, and blue filters, and then developing in a photo lab three separate images in these colors. Color television with cathode-ray tubes (CRTs) uses three electron guns to "paint" a phosphor screen with red, green, and blue images. A mask behind the screen separates the outputs of the guns, so that the "red" gun illuminates phosphors that glow red when irradiated with electrons, and similarly for green and blue. The phosphors are in groups of three, called pixels, one for each color.

Color televisions and computer monitors with LED displays use three sets of LEDs to depict the red, green, and blue images. Each pixel in the display consists of three closely spaced LEDs, one for each color.

C. RGB

The RGB color system represents color images using Eq. (10.1). It is called an additive color system because non-RGB colors are created by taking a linear combination of {fR[n,m], fG[n,m], fB[n,m]}. The additive process is illustrated by the color cube shown in Fig. 10-3. The coordinates are the weights applied to {fR[n,m], fG[n,m], fB[n,m]} to produce the color shown in the cube. White, which has coordinates (1,1,1), is obtained by combining red, green, and blue in equal strengths. The non-RGB colors are discussed in Subsection 10-1.2, where we discuss the CMYK color scheme.

Since TV screens and monitors are dark when they are not displaying an image, it is natural to create color images on them by displaying weighted sums (i.e., linear combinations) of {fR[n,m], fG[n,m], fB[n,m]}. The lighting used in a dark theater for a play is another example of an additive color system.

Web browsers display web pages using an additive color scheme.
guns, so that the “red” gun illuminates phosphors that glow red Web browsers display web pages using an additive
10-1 COLOR SYSTEMS 337

Red

Magenta Yellow

Blue Green
Cyan

Figure 10-3 Color cube: various colors are linear combina-


tions of red, green, and blue.
Figure 10-4 Relation of CMY colors to RGB colors.

The HTML command <body bgcolor="#RrGgBb"> changes the color of the background of a web page to a linear combination of red, green, and blue, where the weights are the hexadecimal (base-16) numbers (Rr)₁₆ for red, (Gg)₁₆ for green, and (Bb)₁₆ for blue, with (f)₁₆ = 15. So <body bgcolor="#ffff00"> changes the background color to yellow (red + green), since (Rr)₁₆ = (Gg)₁₆ = (ff)₁₆ turns the red and green "lights" to their maximum values (ff)₁₆ = 255. Each hexadecimal weight has 16² = 256 values, so the number of possible colors is (256)³ = 16,777,216. Still, not every conceivable color can be represented as a linear combination of the colors in {fR[n,m], fG[n,m], fB[n,m]}. So why is it that systems other than RGB are used?

10-1.2 CMYK (Subtractive) Color System

The C (cyan), M (magenta), Y (yellow), K (black) color system is used for printing color images on a printer. A CMYK color system was used to print drafts of this book, for example, on a color-laser printer. The toner for laser printers and the inks for color inkjet printers are in CMYK colors.

The colors cyan, magenta, and yellow are generated from red, green, and blue, as shown in Fig. 10-4. Yellow is the sum of red and green, or equivalently, yellow is white minus blue.

The CMY colors are also specified as the off-axis corners of the color cube of Fig. 10-3. In fact, on the color cube, cyan, magenta, and yellow are the complementary colors of red, green, and blue, respectively. For example, cyan has the coordinates (0,1,1) on the color cube and red has coordinates (1,0,0). The sum of all three additive colors is (1,0,0) + (0,1,0) + (0,0,1) = (1,1,1), which is the coordinate for white. Similarly, yellow has coordinates (1,1,0) and blue has coordinates (0,0,1). This is why the CMYK color system is called subtractive—it subtracts colors from white, instead of adding them to black the way RGB does.

10-1.3 Color Printing

A. RGB Printers

Color printers do not use the RGB color system to print text and images, but let us pretend for the time being that they do. When the image of a perfectly red tomato is printed on white paper using an RGB inkjet printer, the printer deposits red ink onto the paper. A similar process occurs with the cartridge of a laser printer, except that the material deposited onto the paper is a fine red powder and its extent across the paper is defined by a magnetically charged toner. The printed tomato appears red because, when illuminated by white light, the green and blue components of the white light are absorbed by the red ink or red powder, leaving only the red component of the incident light to be reflected by the printed image of the tomato. Hence, the perfectly red tomato appears red. However, if the tomato is yellow in color, which is equivalent to the sum of red and green in the RGB system (Fig. 10-4), an RGB inkjet printer would have to deposit both red and green ink onto the paper.

When illuminated by white light, the red ink will absorb the green and blue components of the light and the green ink will absorb the red and blue components, leaving no light to be reflected by the printed page. Hence, the yellow tomato will appear black instead of yellow. Because the ink used in inkjet printers and the powder used in laser toners operate by subtracting their colors from white, they are more compatible with the subtractive CMYK color system than with the additive RGB system.

B. CMYK Printers

In the subtractive CMYK color system (Fig. 10-5), red can be created by a mixture of yellow and magenta inks, green by a mixture of yellow and cyan inks, and so on. Using CMY ink colors solves the problem we encountered earlier with the yellow tomato. We can now print many colors, whether pure or not. Obviously, a yellow tomato can be printed using yellow ink, but what about a perfectly red tomato? Since yellow consists of red and green and magenta consists of red and blue, we can use the combination of yellow and magenta inks to print it since both colors include red. When illuminated by white light, the yellow ink absorbs blue and the magenta ink absorbs green, leaving only the red component of the white light to be reflected.

In CMYK, a color image is depicted as:

$$f[n,m] = \{ f_C[n,m],\ f_M[n,m],\ f_Y[n,m],\ f_K[n,m] \}. \tag{10.2}$$

Figure 10-5 Relation of RGB colors to CMY colors.

Black (K) is included in CMYK because, while Fig. 10-4 shows that cyan + magenta + yellow = black, such an addition of inks requires much ink to produce black, which is a common color in images. Hence, black is included as a separate color in printing. Because CMYK consists of three subtractive colors—yellow, cyan, and magenta—and black, it is often referred to as a four-color printing process. In CMYK, silver is represented using the weights {0, 0, 0, 0.25}, and gold by {0, 0, 0.88, 0.15}. The exact relation between RGB and CMYK color systems is more complicated than we have presented, since the inks used in printing are not exactly cyan-, magenta-, and yellow-colored. We should note that CMYK is not used in image processing, only for image printing.

10-1.4 YIQ Color System

The Y (luminance), I (in-phase), Q (quadrature) color system was developed for television by the NTSC (National Television System Committee) in 1953. It differs from the RGB and CMYK color systems in that intensity is decoupled from color. The reason for decoupling intensity from color was so that images transmitted in color could easily be displayed on black-and-white televisions, which were still in common use throughout the 1950s. This feature also makes the YIQ color system useful for histogram equalization and edge detection, because image intensity (luminance) can be varied without distorting colors, as we show later in Section 10-2.

The YIQ depiction of a color image f[n,m] is given by

$$f[n,m] = \{ f_Y[n,m],\ f_I[n,m],\ f_Q[n,m] \}. \tag{10.3}$$

Even though the symbol fY[n,m] is used in both the YIQ and CMYK color systems, this is the standard notation used in color schemes. The meaning of fY[n,m] (yellow or luminance) should be obvious from the context.

The luminance component fY[n,m] is the intensity of f[n,m] at location [n,m]. The chrominance (color) components {fI[n,m], fQ[n,m]} depict the color of f[n,m] at location [n,m] using Fig. 10-6.

The horizontal axis (blue to orange) ranges over the colors to which the eye is most sensitive. The vertical axis (purple to green) ranges over colors to which the eye is less sensitive. Accordingly, for analog transmission of television signals, less bandwidth is needed to transmit the quadrature-modulated signal fQ[n,m] than the in-phase (un-modulated) signal fI[n,m], with most of the transmission bandwidth reserved for the luminance signal fY[n,m].

Figure 10-6 YIQ chrominance components.

The chrominance components {fI[n,m], fQ[n,m]} can be regarded as rectangular color coordinates. In polar coordinates, these would be hue (tint) for the angular coordinate and saturation (vividness) for the radial coordinate. These are used in the HSI (Hue, Saturation, Intensity) color system noted below.

A. Relating the RGB and YIQ Color Systems

The RGB (Eq. (10.1)) and YIQ (Eq. (10.3)) color system representations of a color image f[n,m] are related by the linear transformation

$$\begin{bmatrix} f_Y[n,m] \\ f_I[n,m] \\ f_Q[n,m] \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ 0.596 & -0.275 & -0.321 \\ 0.212 & -0.523 & 0.311 \end{bmatrix} \begin{bmatrix} f_R[n,m] \\ f_G[n,m] \\ f_B[n,m] \end{bmatrix}. \tag{10.4}$$

Note that the first row of this matrix sums to one, and each of the second and third rows sums to zero. This ensures that: (a) fY[n,m] ≤ 1 if fR[n,m] ≤ 1, fG[n,m] ≤ 1, and fB[n,m] ≤ 1, and (b) the two color components are zero if all three colors are present in equal strengths, corresponding to a white image at location [n,m], in which case fY[n,m] alone suffices to represent the image at [n,m].

The first row of Eq. (10.4) is a formula for transforming the color image f[n,m] depicted using Eq. (10.1) to a grayscale image. The formula for this is

$$f_{gray}[n,m] = 0.299\, f_R[n,m] + 0.587\, f_G[n,m] + 0.114\, f_B[n,m]. \tag{10.5}$$

The luminance image fY[n,m] is the image displayed on black-and-white televisions and monitors.

B. Significance of the YIQ Color System

Of course, black-and-white monitors, let alone televisions, are rare today. The significance of the YIQ color system is that it decouples color from intensity, so we can lighten an image by applying histogram equalization to fY[n,m], while leaving fI[n,m] and fQ[n,m] unaltered. Applying histogram equalization to each of fR[n,m], fG[n,m], and fB[n,m] can alter the colors of the image (see Section 10-2). There is also the computational savings of applying image enhancement to a single image instead of to three images. Similar comments apply to edge detection in color images.

10-1.5 Other Color Systems

There are many other color systems, such as HSI (Hue, Saturation, Intensity) and HSV (Hue, Saturation, Value). For example, the HSI color system is related to the RGB system using

$$f_I[n,m] = \frac{1}{3}\, (f_R[n,m] + f_G[n,m] + f_B[n,m]), \tag{10.6a}$$

$$f_S[n,m] = 1 - \frac{3 \min[f_R[n,m], f_G[n,m], f_B[n,m]]}{f_R[n,m] + f_G[n,m] + f_B[n,m]}\,, \tag{10.6b}$$

$$f_H[n,m] = \cos^{-1}\left[ \frac{\frac{1}{2}\,((f_R - f_G) + (f_R - f_B))}{\sqrt{(f_R - f_G)^2 + (f_R - f_B)(f_G - f_B)}} \right]. \tag{10.6c}$$

We do not discuss or use these color schemes in this book, nor do we discuss color science, which includes topics such as photometry (image color spectra) and colorimetry (color matching). These topics cover issues such as nonlinear response of sensors (including the eye), perception of colors by the eye and brain (psychophysics of color), imperfect ink colors for printing, and nonlinear response of inkjet printers (which print dots of different colors, just as monitors and televisions represent color images using clusters of dots of different colors). Gamma correction is an example of a nonlinear correction that arises in this context. For a good treatment of color science in image processing, see H. J. Trussel and M. J. Vrhel, Fundamentals of Digital Imaging, Cambridge University Press, 2008.
340 CHAPTER 10 COLOR IMAGE PROCESSING

◮ In this book we will transform from the RGB color Answer: From Eq. (10.4), fY [n, m] = 1 and
system to the YIQ color system using Eq. (10.4), then
perform image enhancement using histogram equalization fI [n, m] = fQ [n, m] = 0
on fY [n, m], and convert the result back to the RGB color
system using the inverse of Eq. (10.4). ◭ for all [n, m], as discussed below Eq. (10.4).

The result can then be converted to another color system, such


10-2 Histogram Equalization and Edge
as HSI, if desired. Detection
10-2.1 Histogram Equalization
10-1.6 Reading Images to MATLAB Histogram equalization (HE) was covered in Chapter 5. It alters
the pixel values of an image f [n, m] so that they are evenly
The MATLAB command imread is used to read images into distributed over the entire range of the output display, thereby
3-D MATLAB arrays. For example, A=imread(’f.jpg’); brightening a dim image. In principle, histogram equalization
maps the M × N JPEG image f.jpg into the M × N × 3 3-D can be applied to each of the three RGB components of the
MATLAB array A, in which A(:,:,1) is the M × N array color image represented by Eq. (10.1). The drawback of such an
of fR [m, n], A(:,:,2) is the M × N array of fG [m, n], and approach is that since the colors of the image are mixtures of the
A(:,:,3) is the M × N array of fB [m, n]. For the checkerboard red, green, and blue components of the image, disproportionate
image in Fig. 10-1(a), A(:,:,1) is shown in Fig. 10-1(b), altering of the pixel values will alter the colors of the image. The
A(:,:,2) in Fig. 10-1(c), and A(:,:,3) in Fig. 10-1(d). image will appear brighter, but the colors may get distorted.
Many types of images, such as .jpeg, .tiff and .bmp can be To avoid the color distortion, the RGB components can
handled using imread. A will be in uint8 format, which be mapped to YIQ components using Eq. (10.4), and then
should be converted into double-precision floating-point using histogram equalization can be applied to only the luminance
A=double(A);. The values A[i,j] must be scaled to satisfy (brightness) component fY [n, m]. The process brightens the
0 ≤ A[i, j] ≤ 1, using Eq. (5.1). image but leaves the colors unaltered because the color of each
imagesc(A) depicts the M × N × 3 3-D MATLAB array A pixel is determined by the two chrominance components fI [n, m]
as a color image, as in Fig. 10-1. and fQ [n, m], whose values are not altered. An inverse form of
Eq. (10.4) is then used to transform the YIQ representation back
to the RGB representation. This approach also has the com-
Concept Question 10-1: Why do we use the YIQ color putational advantage in that histogram equalization is applied
system in image processing? Why not just use the RGB to a single component, fY [n, m], instead of to all three RGB
color system exclusively? components.
To compare the result of histogram equalization when applied
(a) directly to an RGB image with (b) the result obtained
Exercise 10-1: What should the output of by converting the RGB image to a YIQ image, followed by
imagesc(ones(3,3,3)) be? histogram equalization applied to the Y component and then
Answer: A 3 × 3 all-white image (the combination of 3 × 3 converting the YIQ image back to RGB, we consider the dark
red, green, and blue images). checkerboard RGB image shown in Fig. 10-7(a).
Application of the histogram equalization method to the three
RGB channels separately leads to the image in Fig. 10-7(b),
Exercise 10-2: An image has whereas the application of histogram equalization to the Y
channel leads to the image in Fig. 10-7(c). Both approaches
fR [n, m] = fG [n, m] = fB [n, m] = 1 generate brighter versions of the checkerboard image, but only
for all [n, m]. What is its YIQ color system representation? the Y histogram equalization preserves the true colors of the
original image.
10-2 HISTOGRAM EQUALIZATION AND EDGE DETECTION 341

Example 10-1: Flag Image in RGB and


YIQ Equalization

Use the image in Fig. 10-8(a) to compare the results of two


histogram equalization experiments, one performed using RGB
and another using YIQ.
Solution: Application of histogram equalization to each of the
three RGB images separately, and then followed by combining
the equalized images into a single color image results in the
brighter, but color-distorted, image shown in Fig. 10-8(b). In
contrast, application of histogram equalization to the Y channel
of the YIQ image (Fig. 10-8(c) preserves the color balance of
(a) Original, dark checkerboard image
the original image. Note the color of the field of stars.

10-2.2 Edge Detection in Color Images


The subject of edge detection as applied to grayscale images
was covered in Section 5-4.2, wherein we demonstrated how the
Sobel edge detector is used to convolve the grayscale image
f [n, m] with two PSFs, hH [n, m] and hV [n, m], to obtain the
horizontal edge image dH [n, m] and the vertical edge image
dV [n, m]:

dH [n, m] = f [n, m] ∗ ∗hH [n, m] (10.7a)


and
(b) Histogram equalization using RGB
dV [n, m] = f [n, m] ∗ ∗hV [n, m]. (10.7b)

The two edge images are then combined to generate the gradient
image g[n, m]:
q
g[n, m] = dH [n, m]2 + dV [n, m]2 . (10.7c)

The Sobel edge-detector algorithm identifies a pixel [n, m] as an


edge if g[n, m] exceeds a pre-specified threshold ∆. Accordingly,
the edge image z[n, m] is given by
(
1 for g[n, m] > ∆,
z[n, m] = (10.8)
0 for g[n, m] < ∆.
(c) Histogram equalization using YIQ
The value of ∆ is application-dependent.
Figure 10-7 Application of histogram equalization to the dark For color images, we can apply edge detection to each of
checkerboard image in (a) results in the image in (b) when the fR [n, m], fG [n, m] and fB [n, m] in an RGB image, or just to
equalization is applied to all three RGB channels and to the
fY [n, m] in its YIQ equivalent. Aside from the computational
image in (c) when applied to the Y channel of the YIQ image.
savings of detecting edges in only one image instead of three,
another reason for using just fY [n, m] is illustrated by the
following example.
342 CHAPTER 10 COLOR IMAGE PROCESSING

Edge detection of checkerboard image


The (24 × 24) checkerboard image in Fig. 10-9(a) is in RGB.
Application of the Sobel edge detector algorithm with threshold
∆ = 0.1 to each color channel separately leads to the images
in parts (b) through (d) of the figure. The three colors generate
different edge images. In contrast, Fig. 10-9(e) displays the edge
image using the YIQ image, which correctly identifies all edges
regardless of the pixel colors.

(a) Original dark flag image

(a) Original checkerboard (b) Red color edge


RGB image detection

(b) After histogram equalization of RGB image

(c) Green color edge (d) Blue color edge


detection detection

(c) After histogram equalization of YIQ image


and conversion back to RGB format
(e) Edge detection using YIQ
Figure 10-8 The application of histogram equalization to
each of the three channels of the RGB image results in color Figure 10-9 Sobel edge detection: (a) original color image,
distortion, but that is not the case when applied to a YIQ image. (b)–(d): edge detection of each RGB color channel separately,
and (e) edge detection using Y channel of YIQ image.
10-3 COLOR-IMAGE DEBLURRING 343

Concept Question 10-2: Why does applying histogram Example 10-2: Deblurring Checkerboard Image
equalization using the RGB color system alter colors?

Exercise 10-3: Should gamma transformation be applied to Consider the 8 × 8 color checkerboard image shown in
images using the RGB or YIQ format? Fig. 10-10(a), which we label f [n, m], as it represents an original
unblurred image of the checkerboard.
Answer: YIQ format, because gamma transformation is
If the original scene was imaged by a sensor characterized by
nonlinear; applying it to an RGB image would distort its
a circular PSF given by
colors.
( √
1 for n2 + m2 ≤ 4,
Exercise 10-4: Should linear transformation be applied to h[n, m] = √ (10.11)
0 for n2 + m2 > 4.
images using the RGB or YIQ color system?
Answer: RGB, since this is the color system used to (a) Generate the convolved image
display the transformed image. Applying linear transfor-
mation to YIQ images may result in gR [n, m] ≥ gmax , g[n, m] = f [n, m] ∗ ∗h[n, m], (10.12)
gG [n, m] ≥ gmax , or gB [n, m] ≥ gmax in Eq. (5.1).
and (b) apply deconvolution to reconstruct the original image.

Solution: (a) A 2-D display of the PSF, when centered at pixel


(5, 5), is shown in Fig. 10-10(b). The PSF approximates a disk
of radius 4. Included in the PSF are pixels [5, 5], [5, 5 ± 4],
and [5 ± 4, 5]. Hence, to include all its pixels, we approximate
10-3 Color-Image Deblurring the PSF as a (9 × 9) square. Application of the convolution
operation given by Eq. (10.12) leads to the blurred image shown
in Fig. 10-10(c). Per Section 5-2.5, if the size of image f [n, m]
When a color scene/image is blurred as a result of convolution is (M × M) and if h[n, m] is of size (L × L), then the size of the
with the sensor’s PSF h[n, m], each of its RGB components is convolved image is (N × N) with
blurred. The blurred image consists of the three components
N = M + L − 1 = 8 + 9 − 1 = 16.
gR [n, m] = fR [n, m] ∗ ∗h[n, m],
gG [n, m] = fG [n, m] ∗ ∗h[n, m], (10.9) Hence, the blurred image in Fig. 10-10(c) is (16 × 16).
gB [n, m] = fB [n, m] ∗ ∗h[n, m]. (b) To deblur the image in Fig. 10-10(c) and recover the
original image, we follow the recipe outlined in Section 6-4.1:
For the YIQ image format, convolving Eq. (10.4) with h[n, m] (1) Transform to the discrete frequency domain:
leads to
     H[k1 , k2 ] = DFT{[h[n, m]},
gY [n, m] 0.299 0.587 0.114 gR [n, m] GR [k1 , k2 ] = DFT{[gR [n, m]}, (10.13a)
 gI [n, m]  = 0.596 −0.275 −0.321 gG [n, m] .
GG [k1 , k2 ] = DFT{[gG [n, m]}, (10.13b)
gQ [n, m] 0.212 −0.523 0.311 gB [n, m]
GB [k1 , k2 ] = DFT{[gB [n, m]}. (10.13c)
(10.10)

Hence, all three of the YIQ components are also blurred, so (2) Since convolution in the spatial domain is equivalent to
there is no particular advantage to transforming the image from multiplication in the spatial frequency domain using (16 × 16)
RGB to YIQ format. In practice deblurring (deconvolution) of a 2-D DFTs:
color image is performed separately on each of gR [n, m], gG [n, m]
and gB [n, m] to obtain the original image components fR [n, m], GR [k1 , k2 ] = FR [k1 , k2 ] H[k1 , k2 ], (10.14)
fG [n, m] and fB [n, m].
The deblurring process is illustrated through two examples. and similar expressions apply to the green and blue images.
344 CHAPTER 10 COLOR IMAGE PROCESSING

1
2
3
4
5
6
7
8
9
n
1 2 3 4 5 6 7 8 9
m
(a) (8 × 8) original checkerboard RGB image (b) Extent of PSF when centered at pixel (5,5)

(c) (16 × 16) blurred checkerboard image (d) (16 × 16) reconstructed checkerboard image

Figure 10-10 Simulation of blurring and deblurring processes: (a) original image, (b) PSF h[n, m], (c) original image blurred by system
PSF, and (d) reconstructed image after removal of blurring.

Hence, for the red channel we use the computed results to obtain convolved image is shown in Fig. 10-10(d). The blurring caused
by the sensor has been completely removed. The reconstructed
GR [k1 , k2 ] image is a (16 × 16) zero-padded version of the original (8 × 8)
FR [k1 , k2 ] = , (10.15)
H[k1 , k2 ] image.
and similar expressions apply to the other two colors.
(3) Compute fR [n, m] for all pixels:

fR [n, m] = DFT−1 {FR [k1 , k2 ]}, (10.16)


Concept Question 10-3: Why didn’t we need regulariza-
and similar expressions apply to fG [n, m] and fB [n, m]. The de- tion in Example 10-2?
10-3 COLOR-IMAGE DEBLURRING 345

Exercise 10-5: Can the colors of a deconvolved image


differ from those of the original image?
Answer: Yes. Regularization can alter the values of the
reconstructed { fR [n, m], fG [n, m], fB [n, m]} image, and this
will affect the colors. However, the effect is usually too
slight to be noticeable.

Exercise 10-6: If the PSFs for different colors are different,


can we still deconvolve the image?
Answer: Yes. Use the proper PSF for each color.
(a) Blurred Christmas tree

Example 10-3: Deblurring of Motion-Blurred 6


Image
4

Horizontal motion by a camera relative to a stationary Christmas 0


tree led to the (691 × 1162) blurry image g[n, m] shown in -2
Fig. 10-11(a). Apply the motion-deblurring technique of Sec- -4
tion 6-6.2 to each of the three RGB images to obtain a deblurred
-6
image f [n, m]. Use λ = 0.01 (varying the value of λ does not
-8
appear to impact the results) and T = 1 s.
-10
Solution:
-12
(1) Image g[n, m] consists of three images, namely gR [n, m],
gG [n, m], and gB [n, m]. Application of the 2-D DSFT, as defined -14
0 20 40 60 80 100
k1
by Eq. (3.73a), to each of the three images leads to three spectral (MATLAB)
images GR (Ω1 , Ω2 ), GG (Ω1 , Ω2 ), and GB (Ω1 , Ω2 ). (b) 1-D spectrum log(|GR[k1,0] + 1)
(2) For a blur PSF of horizontal extent (N + 1) and an image
of horizontal extent M, the blurred image has a horizontal extent
of M + (N + 1) − 1 = M + N. According to Eq. (6.49), motion
blur in the horizontal direction causing an image to increase in
width by N pixels is mathematically equivalent to a filter with a
spatial frequency response H(Ω1 , Ω2 ) given by
 
T sin Ω1 N+1
H(Ω1 , Ω2 ) = 2
e− jΩ1 N/2 . (10.17)
N sin(Ω1 /2)
Here, T is the total recording time and N is the total number
of time shifts that occur during time T . Because the exact value
of T is irrelevant to the deblurring procedure, we set T = 1 s.
In contrast, the value of N is crucial, so we need to extract it
from the spectrum of one of the three color channels. Using the (c) Reconstructed Christmas tree
spectrum GR (Ω1 , Ω2 ) of the red channel, we sample it at
Figure 10-11 Deblurring of motion-blurred image of a
2 π k1 Christmas tree.
Ω1 = (10.18a)
1162
346 CHAPTER 10 COLOR IMAGE PROCESSING

and
Exercise 10-7: If the length of the blur were N + 1 = 6, at
2 π k2
Ω2 = , (10.18b) what Ω1 values would H(Ω1 , Ω2 ) = 0?
691
Answer: The numerator of Eq. (10.17) is zero when
to obtain GR [k1 , k2 ]. Next, we plot the horizontal profile of sin(Ω1 (N + 1)/2) = 0, which occurs when Ω1 (N + 1)/2
|GR [k1 , k2 ]| as a function of k1 at k2 = 0. For display purposes, is a multiple of π , or equivalently when Ω1 = ±kπ /3 for
we offset |GR [k1 , k2 ]| by 1 and plot the logarithm of the sum: integers k. This is independent of the size of the image.
log(|GR [k1 , 0]| + 1). (10.19)

The MATLAB-generated plot of Eq. (10.19) is shown in 10-4 Denoising Color Images
Fig. 10-11(b). It exhibits a periodic sequence of sharp nulls
corresponding to the zeros of sin[Ω1 (N + 1)/2] in the spatial In Chapter 7, we demonstrated how the discrete wavelet trans-
frequency response H(Ω1 , Ω2 ) of the PSF that caused the form can be used to denoise an image by applying the combi-
blurring of the image. The first null is at MATLAB index = 15, nation of thresholding and shrinkage to the noisy image. The
which corresponds to same approach is equally applicable to color images. The recipe
outlined in Section 7-8.2 can be applied to each of the three
k1 = 15 − 1 = 14. RGB channels separately, and then the three modified images
are combined to form the denoised color image. Since noise had
Since the first null of the sine function occurs when its argument been added to each RGB component separately, the denoising
is ±π , it follows that procedure also is applied to each RGB component separately. If
  preserving color is paramount, denoising can be applied to the
Ω1 (N + 1) 2π k1 N + 1 2π × 14 N + 1
π= = = , Y component of YIQ, instead of to the RGB components.
2 1162 2 1162 2 We illustrate the effectiveness of the denoising technique
through the following two examples.
which leads to N = 82.
(3) With H(Ω1 , Ω2 ) fully specified, we now apply the Wiener
filter of Eq. (6.50) to the red-color spectrum to obtain the Example 10-4: Denoising American Flag Image
deblurred version
GR (Ω1 , Ω2 ) H∗ (Ω1 , Ω2 ) To the original image shown in Fig. 10-12(a), a zero-mean white
FR (Ω1 , Ω2 ) =
|H(Ω1 , Ω2 )|2 + λ 2 Gaussian noise random field with σv = 0.5 was added to each
 
T sin Ω1 N+1 of the RGB components. The resultant noisy image is shown
GR (Ω1 , Ω2 ) 2
e jΩ1 N/2 in Fig. 10-12(b), and the associated signal-to-noise ratios are
N sin(Ω1 /2)
=  2   2 . 3.32 dB for the red channel, 1.71 dB for the green, and 2.36 dB
N+1
T sin Ω 1 for the blue. Application of the 2-D Haar transform method
2
+λ2 outlined in Section 7-8 with a threshold/shrinkage factor λ = 3
N sin(Ω1 /2)
led to the denoised image shown in Fig. 10-12(c). Much of the
(10.20)
noise has been removed.
Similar procedures are applied to the two other channels.
(4) Application of the inverse DFT to each of the three color
channels yields fR [n, m], fG [n, m], and fB [n, m], the combination
of which yields the deblurred color image in Fig. 10-11(c). The
motion-caused blurring has essentially been removed. The only
artifact is the black band at the right-hand side of the image,
which is due to the zero-padding used in the DFT (compare with
Fig. 10-10(d)).
10-4 DENOISING COLOR IMAGES 347

Example 10-5: Brightening Toucan Image

The objective of this example is to brighten the toucan image


shown in Fig. 10-13(a) using two different methods, namely the
RGB and YIQ approaches, and then to compare and contrast the
results. The plots in parts (b) to (d) display the histograms of the
R, G, and B channels in the original image.

Method 1: RGB-Equalized Image


Application of the histogram-equalization method of Section
5-3 to the three RGB channels individually led to the image
in Fig. 10-14(a). The underlying histograms are shown in parts
(b)–(d) of the figure. The toucan image is indeed brighter than
(a) Original flag image
that of the original, but because the histogram equalization
was performed separately and independently for the three RGB
channels, the color balance in the RGB-equalized image is not
the same as in the original image. Consequently, some of the
colors got distorted.

Method 2: YIQ-Equalized Image


After (a) converting the original RGB image to YIQ format,
(b) applying histogram-equalization on only the Y channel, and
then (c) converting the YIQ-equalized image to RGB, we end up
with the image and associated histograms shown in Fig. 10-15.
(b) Noisy image The image in Fig. 10-15(a) is not as bright as the RGB-equalized
image in Fig. 10-14(a), but the color balance is preserved in the
YIQ-equalization method.

Concept Question 10-4: Why should wavelet-based de-


noising be applied to images in YIQ format and not RGB
format?

(c) Denoised image

Figure 10-12 Denoising American flag image using 2-D


Haar transform method generates a less noisy image, but at the
expense of high-resolution information.
348 CHAPTER 10 COLOR IMAGE PROCESSING

3500
3000
2500
2000
1500
1000
500
0
50 100 150 200 250

(a) Original toucan image (b) Histogram of red (R) channel in original image

3500 15000
3000
2500
10000
2000
1500
1000 5000
500
0 0
50 100 150 200 250 50 100 150 200 250

(c) Histogram of green (G) channel in original image (d) Histogram of blue (B) channel in original image

Figure 10-13 Original image and associated histograms of its R, G, and B channels.
PROBLEMS 349

3500
3000
2500
2000
1500
1000
500
0
0 50 100 150 200 250

(a) RGB-equalized image (b) Histogram of red (R) channel in RGB-equalized image

2500 12000
10000
2000
8000
1500
6000
1000
4000
500 2000
0 0
0 50 100 150 200 250 0 50 100 150 200 250

(c) Histogram of green (G) channel in RGB-equalized image (d) Histogram of blue (B) channel in RGB-equalized image

Figure 10-14 The RGB-equalized method generates a brighter image, but does not preserve color balance.

Summary
Concepts
• Color images have three components: red, green, and • Image restoration, such as denoising and deconvolution,
blue, each of which is a separate 2-D function. This RGB uses the RGB color system, since each component is
color system represents how images are acquired and blurred or has noise added to it directly. Image enhance-
displayed. ment, such as histogram equalization and edge detection,
• Other color systems include the CMYK system, used uses the YIQ system, since use of the RGB system can
for color printing, and the YIQ system, used for image result in distortion of colors.
enhancement, which is applied only to the Y (luminance) • Other colors can be represented as a linear combination
component. of red, green, and blue, as depicted in the color cube.
350 CHAPTER 10 COLOR IMAGE PROCESSING

2500

2000

1500

1000

500

0
0 50 100 150 200 250

(a) YIQ-equalized image (b) Histogram of red (R) channel in YIQ-equalized image

3500
3500
3000
3000
2500
2500
2000 2000
1500 1500
1000 1000
500 500
0 0
20 40 60 80 100 120 140 160 180 200 20 40 60 80 100 120 140 160 180 200

(c) Histogram of green (G) channel in YIQ-equalized image (d) Histogram of blue (B) channel in YIQ-equalized image

Figure 10-15 The YIQ-equalized method generates a partially brighter image, but does preserve color balance.

Mathematical Formulae
Components of a color image RGB to grayscale
f [n, m] = { fR [n, m], fG [n, m], fB [n, m]} fgray [n, m] = 0.299 fR [n, m] + 0.587 fG[n, m] + 0.114 fB[n, m]
Relation between components of RGB MATLAB commands for color images
and YIQ color systems imread, imagesc
    
fY [n, m] 0.299 0.587 0.114 fR [n, m]
 fI [n, m]  = 0.596 −0.275 −0.321  fG [n, m]
fQ [n, m] 0.212 −0.523 0.311 fB [n, m]

Important Terms Provide definitions or explain the meaning of the following terms:
additive color scheme CMYK color scheme luminance subtractive color scheme
chrominance color cube RGB color scheme YIQ color scheme
PROBLEMS 351

PROBLEMS 10.5 Use sobelc.m to apply Sobel edge detection to the


toucan image in toucan.mat. Display the original image and
Section 10-1: Color Systems the result of edge detection applied to the Y image component.
Use a threshold of 0.5.
10.1 Hang a checkerboard ornament on a Christmas tree.
The goal of this problem is to hang a color checkerboard 10.6 Use sobelc.m to apply Sobel edge detection to the
ornament in the middle of the Christmas tree in MATLAB file plumage image in plumage.mat. Display the original image
xmastree.mat. and the result of edge detection applied to the Y image compo-
nent. Use a threshold of 0.5.
(a) Create in MATLAB a (64 × 64) checkerboard image,
consisting of an (8 × 8) array of alternating red and green
(8 × 8) blocks. Zero-pad this to (691 × 1080) (the size of Section 10-3: Color Image Deblurring
the image in xmastree.mat).
(b) Add the zero-padded checkerboard image to the Christmas 10.7 Deblur motion-blurred Christmas tree. Download
tree image. Display the resulting image. Why did this the (686 × 399) motion-blurred Christmas tree image in
procedure produce a poor result? xmasdarkmotion.mat.
(c) Set the pixel values in the Christmas tree image where (a) Display the motion-blurred Christmas tree image.
the checkerboard image is nonzero to zero. Now add the (b) Display the log of the magnitude of the red component
zero-padded checkerboard image to this image. Display the of xmasdarkmotion.mat. Zero-pad (686 × 399) to
resulting two images. Why did this procedure produce a (686 × 400) to speed up the 2-D FFT. Use the 2-D spectrum
much better result? to find the length N + 1 of the PSF of the motion blur.
10.2 Put the clown at the top of a Christmas tree. The goal of (c) Use λ = 0.01 and T = 1 in motiondeblurcolor.m to
this problem is to put the clown image in clown.mat at the deblur and display the tree.
top of the Christmas tree in MATLAB file xmastree.mat.
10.8 Deblur motion-blurred Christmas tree. Download the
(a) Convert the black-and-white clown image in clown.mat motion-blurred Christmas tree image in xmasmotion.mat.
from YIQ to RGB format. ( fI [n, m] and fQ [n, m] are
(a) Display the motion-blurred Christmas tree image.
both zero; fY [n, m] is the black-and-white image.) Zero-
pad this to (691 × 1080) (the size of the image in (b) Use λ = 0.01, T = 1, and N = 15 in
xmastree.mat). motiondeblurcolor.m to deblur the tree.
(b) Add the zero-padded checkerboard image to the Christmas 10.9 Refocus an out-of-focus image. An out-of-focus image
tree image. Display the resulting image. Why didn’t this can be modeled as the image convolved with a disk-shaped PSF.
work? The program refocuscolor.m convolves an image with a
(c) Set the pixel values in the Christmas tree image where disk PSF and then deconvolves. Use refocuscolor.m to
the zero-padded clown image is nonzero to zero. Now add blur and deblur the (691 × 1080) image in xmastree.mat.
the zero-padded clown image to this image. Display the Uncomment a line in refocuscolor.m and use λ = 0.01
resulting two images. Why did this work? and disk radius R = 20.
10.10 Refocus an out-of-focus image. An out-of-focus image
Section 10-2: Histogram Equalization and Edge can be modeled as the image convolved with a disk-shaped PSF.
Detection The program refocuscolor.m convolves an image with a
disk PSF and then deconvolves. Use refocuscolor.m to
10.3 Apply histogram equalization (HE) to the Christmas tree blur and then deblur the (340 × 453) image in toucan.mat.
image in xmastree.mat using the program histcolor.m. Uncomment a line in refocuscolor.m and use λ = 0.01 and
Display results for HE of RGB and YIQ formats. disk radius R = 11.
10.4 Apply histogram equalization (HE) to the Christmas 10.11 Refocus an out-of-focus image. An out-of-focus image
tree image in xmastreedark.mat using the program can be modeled as the image convolved with a disk-shaped PSF.
histcolor.m. Display results for HE of RGB and YIQ The program refocuscolor.m convolves an image with a
formats. disk PSF and then deconvolves. Use refocuscolor.m to
352 CHAPTER 10 COLOR IMAGE PROCESSING

blur and then deblur the (512 × 512) image in flag.mat. white Gaussian noise with variance σ 2 is added to each
Uncomment a line in refocuscolor.m and use λ = 0.01 RGB component. The image is clipped to (704 × 704) since
and disk radius R = 29. daubdenoisecolor.m requires square images. Uncomment
a line in daubdenoisecolor.m and use λ = 0.05 and
10.12 Refocus an out-of-focus image. An out-of-focus image
σ = 0.05.
can be modeled as the image convolved with a disk-shaped PSF.
The program refocuscolor.m convolves an image with a 10.18 Use db3 transform threshold and shrinkage pro-
disk PSF and then deconvolves. Use refocuscolor.m to gram daubdenoisecolor.m to denoise each RGB com-
blur and then deblur the plumage image in plumage.mat. ponent of the toucan image in toucan.mat. Zero-mean
Uncomment a line in refocuscolor.m and use λ = 0.01 white Gaussian noise with variance σ 2 is added to each
and disk radius R = 18. RGB component. Image is zero-padded to (480 × 480) since
daubdenoisecolor.m requires square images. Uncomment
Section 10-4: Denoising Color Images a line in daubdenoisecolor.m and use λ = 0.05 and
σ = 0.05.
10.13 Use Haar transform threshold and shrinkage program 10.19 Use db3 transform threshold and shrinkage pro-
haardenoisecolor.m to denoise each RGB component of gram daubdenoisecolor.m to denoise each RGB com-
the (8 × 8) checkerboard image in checker.mat. Zero-mean ponent of the plumage image in plumage.mat. Zero-mean
white Gaussian noise with variance σ 2 is added to each RGB white Gaussian noise with variance σ 2 is added to each
component. Uncomment a line in haardenoisecolor.m RGB component. Image is zero-padded to (512 × 512) since
and use λ = 1 and σ = 0.2. daubdenoisecolor.m requires square images. Uncomment
10.14 Use Haar transform threshold and shrinkage program a line in daubdenoisecolor.m and use λ = 0.05 and
haardenoisecolor.m to denoise each RGB component σ = 0.05.
of the Christmas tree image in xmastree.mat. Zero-mean
white Gaussian noise with variance σ 2 is added to each
RGB component. The image is clipped to (704 × 704) since
haardenoisecolor.m requires square images. Uncomment
a line in haardenoisecolor.m and use λ = 0.2 and
σ = 0.1.
10.15 Use Haar transform threshold and shrinkage pro-
gram haardenoisecolor.m to denoise each RGB com-
ponent of the toucan image in toucan.mat. Zero-mean
white Gaussian noise with variance σ 2 is added to each
RGB component. Image is zero-padded to (480 × 480) since
haardenoisecolor.m requires square images. Uncomment
a line in haardenoisecolor.m and use λ = 0.1 and
σ = 0.1.
10.16 Use Haar transform threshold and shrinkage pro-
gram haardenoisecolor.m to denoise each RGB com-
ponent of the plumage image in plumage.mat. Zero-mean
white Gaussian noise with variance σ 2 is added to each
RGB component. Image is zero-padded to (512 × 512) since
haardenoisecolor.m requires square images. Uncomment
a line in haardenoisecolor.m and use λ = 0.2 and
σ = 0.1.
10.17 Use db3 transform threshold and shrinkage program
daubdenoisecolor.m to denoise each RGB component
of the Christmas tree image in xmastree.mat. Zero-mean
Chapter 11
11 Image Recognition

Contents
(a) Image of text
Overview, 354
11-1 Image Classification by Correlation, 354
11-2 Classification by MLE, 357
11-3 Classification by MAP, 358
11-4 Classification of Spatially Shifted Images, 360 (b) Thresholded cross-correlation ρ5(n0,m0)
11-5 Classification of Spatially Scaled Images, 361
11-6 Classification of Rotated Images, 366
11-7 Color Image Classification, 367
11-8 Unsupervised Learning and Classification, 373 (c) Image of text (green) and thresholded cross-correlation (red)
11-9 Unsupervised Learning Examples, 377
11-10 K-Means Clustering Algorithm, 380
Problems, 384 Image recognition amounts to classification of a
given image as one of several given possible
candidate images. Classification can be regarded as
Objectives a discrete estimation problem in which the candi-
date image that maximizes a likelihood function is
Learn to: chosen. If there are no candidate images, unsuper-
vised learning can be used to determine image
■ Use correlation, MLE, and MAP to classify an image. classes from a set of training images.
■ Use cross-correlation to classify a shifted image.

■ Use coordinate transformation (logarithmic or polar)


to classify a spatially scaled or rotated image.

■ Classify color images.

■ Use the SVD to perform unsupervised learning.


Overview similarity between gobs [n, m] and each stored, reference image
fk [n, m] and then selects the class index k that maximizes
Along the very bottom section of the check shown in Fig. 11-1 the similarity. Some algorithms use the correlation between
are three sequences of numbers. The first sequence, labeled gobs [n, m] and each fk [n, m] as the parameter with which to
the “Routing Number,” specifies the identity of the bank that measure the degree of similarity, while other algorithms may use
had issued the check; the second sequence is the “Account measurement parameters that incorporate other image properties
Number” and it identifies the owner of the account; and the as well. Often the observed image gobs [n, m] contains random
last sequence is the “Check Number” itself. When the check is noise, and the shape of the imaged object may be distorted due
inserted inside a check reader, an optical sensor scans across the to some angular rotation or size modification. A robust classifi-
rectangular space containing the three sequences of numbers and cation algorithm needs to be adept at dealing with these types
then displays the sequences on an electronic screen. How does of deviations. These and other aspects of image recognition are
the check reader recognize the individual digits comprising the addressed in this chapter.
three sequences? The answers to this and other image recog- Image recognition algorithms rely on one of three approaches:
nition questions are at the heart of the present and succeeding
chapter. (1) Correlation between the observed image and each of
The objective of an image recognition algorithm is to cor- the stored reference images, in conjunction with a priori
rectly classify an observed image gobs [n, m] into one of L probability information about the frequency of occurrence
possible classes on the basis of stored, true (reference) images of the unknown objects.
fk [n, m] of the L classes, where { k = 1, . . . , L }. If the classes are (2) Unsupervised training using training images with un-
digits printed in a specific font, then the objective is to identify known identities.
the correct digit from among the 10 possible digits between 0
and 9. In text recognition, not only is the number of classes (3) Supervised training using training images with known
(letters in the alphabet) much larger, but the letters may be identities.
printed in lower or upper case and may include more than one
The present chapter covers the first two approaches, and the third
type of font. Other image recognition applications may involve
approach is covered in Chapter 12.
classifying objects on the basis of their shapes and colors.
In some cases, the image recognition algorithm requires
some form of prior training or learning using sample images 11-1 Image Classification by
of the types of objects under consideration. Otherwise, for L
classes of objects with known true (reference) images fk [n, m]— Correlation
with { k = 1, 2, . . . , L }—an algorithm measures the degree of
Consider the 10 numerals shown in Fig. 11-2(a). The numerals
are printed in white, using a standard bank font, against a black
background. In practice, check numbers are read by a reading
head (optical scanner) whose size in pixels is (9 × 1). As the
head moves horizontally across each (9 × 7) numeral, a 1-D sig-
nal of duration 7 samples is generated as the output “signature”
of that numeral. Upon comparing the measured signal with the
10 possible signals stored in the system, the system selects the
numeral (from among 1, 2, . . . , 9, 0) that provides the best match.
An alternative, and more reliable, approach to performing the
numeral recognition problem is to use a 2-D image instead of
a 1-D signal. Let us assume that each numeral in the check
account number is allocated a rectangular space of M × N pixels,
comprising N horizontal pixels and M vertical rows of pixels.
When the optical scanner sweeps across the rectangle containing
Figure 11-1 Example of a bank check with three sequences the bank account number, it images the rectangle and then
of numbers. partitions it into individual images each (M × N) pixels in size.
An expanded view of one such image is shown in Fig. 11-2(b).

354
11-1 IMAGE CLASSIFICATION BY CORRELATION 355

(a) Numerals 1,2,…,9,0 in Bank font (b) Noisy image gobs[n,m] of numeral 3

ρk
6
x 10
14

12

10

0 k
0 1 2 3 4 5 6 7 8 9

(c) Correlation ρk
ΛMLE[k]
6
x 10
15

10

0 k

−5
0 1 2 3 4 5 6 7 8 9

(d) ΛMLE[k]

Figure 11-2 (a) Numerals 1–9 and 0 in bank font, (b) noisy (30 × 18) image gobs [n, m] of numeral 3 with SRN = −1.832 dB, (c)
correlation ρk , and (d) MLE criterion Λ2 [k].
356 CHAPTER 11 IMAGE RECOGNITION

It contains a single numeral, but it is difficult to recognize the


numeral’s identity because the image is quite noisy (the signal- Notation for Training
to-noise ratio is −1.832 dB, corresponding to a signal energy and Classification
only 66% of that of the noise). The image size is (30 × 18) and
the true identity of the numeral is 3, but we assume at this stage Single-Color Images
of the classification example that we do not yet know the true
identity of the numeral in Fig. 11-2(b). fk [n, m] = reference image of class k, { k = 1, . . . , L }
Let us designate the observed image of the unknown numeral (i)
fk [n, m] = ith training image of class k
as gobs [n, m], which consists of the sum of two images, namely L = number of classes
images fk [n, m] and υ [n, m]: gobs [n, m] = observed image of unknown class
gobs [n, m] = fk [n, m] + υ [n, m], (11.1)
M × N = image size
(0 ≤ n ≤ N − 1, 0 ≤ m ≤ M − 1), ρk = correlation between fk [n, m] and gobs [n, m]
where fk [n, m] is the true image of the unknown (but we do
not yet know the value of index k) and υ [n, m] is a random- p[k] = a priori probability of occurrence of class k
noise image. Index k denotes the class of image fk [n, m]: for the E fk = energy of image fk [n, m]
present example, k extends over the range k = 1 for class 1 to
k = 10 for class 10, with class 1 representing numeral 1, class 9 ΛMLE = MLE maximization parameter
representing numeral 9, and class 10 representing numeral 0. k̂MLE = value of k that maximizes ΛMLE
The goal of the classification algorithm is to determine the true
value of index k, from the observed image gobs [n, m], given the ΛMAP = MAP maximization parameter
stored reference images f1 [n, m] through f10 [n, m]. k̂MAP = value of k that maximizes ΛMAP
A common approach to establishing the identity of an un-
known class is to compute the correlation ρk between the Color Images
observed image gobs [n, m] and the reference image fk [n, m] for (i)
all L classes, and the select the value of k associated with the fk [n, m] = ith training image vector for class k
highest correlation. The correlation ρk is given by (i) (i) (i)
= [ fk,R [n, m], fk,G [n, m], fk,B [n, m]]
N−1 M−1
ρk = ∑ ∑ gobs[n, m] fk [n, m]. (11.2) gobs [n, m] = observed image vector
n=0 m=0
Kk [n, m] = class-specific covariance matrix
Since gobs [n, m] and fk [n, m] are both (M × N), computation of
ρk requires MN multiplications and additions for each value of k. Kk0 = joint covariance matrix
If gobs [n, m] and fk [n, m] have sparse wavelet transforms, we can
take advantage of Parseval’s theorem for the wavelet transform
(Eq. (7.14b)) to compute ρk , because multiplications by zero do
not count as computations. The √ Haar transform requires only
subtractions and divisions by 2. section also classifies the observed image correctly, but it does
To determine the identity of the numeral in the noisy image so with a superior margin of error.
shown in Fig. 11-2(b), ρk was computed using Eq. (11.2) for
all 10 values of k, and the results are displayed in Fig. 11-2(c).
Among the 10 correlations, ρ3 of class 3 (corresponding to Exercise 11-1: We have two classes of (1 × 1) images
numeral 3) exhibits the largest value, but it exceeds ρ8 of class 8 f1 [n, m] = 1 and f2 [n, m] = 4. Use correlation to classify
by only a small amount. The correct numeral in image gobs [n, m] gobs [n, m] = 2.
is indeed 3, so the correlation method is successful in classifying Answer: ρ1 = (2)(1) = 2 and ρ2 = (2)(4) = 8. Since
it correctly, but the margin of error is rather slim. As we shall ρ2 > ρ1 , classify gobs [n, m] as k = 2. This is counterintuitive
see shortly, the MLE classification method outlined in the next (2 is closer to 1 than to 4), so see Exercise 11-2.
11-2 CLASSIFICATION BY MLE 357

11-2 Classification by MLE all locations [n, m] of the marginal pdfs p(gobs [n, m]) given by
Eq. (11.5):
For L classes, the observed (M × N) image gobs [n, m] is one of:
N−1 M−1

 f1 [n, m] + υ [n, m]
 class #1,
p({gobs [n, m]}) = ∏ ∏ p(g[n, m])

 n=0 m=0
 f2 [n, m] + υ [n, m] class #2, N−1 M−1
gobs [n, m] = (11.3) 1 2 2



.
.. ..
.
= ∏ ∏ e−(gobs [n,m]− fk [n,m]) /(2σv )
(2πσv2 )NM/2 n=0 m=0


fL [n, m] + υ [n, m] class L, 1 N−1 M−1 2 2
= 2 NM/2
e− ∑n=0 ∑m=0 (gobs [n,m]− fk [n,m]) /(2σv ) . (11.6)
(2πσv )
where υ [n, m] is a (M × N) zero-mean white Gaussian noise
random field with variance σv2 . Following Section 9-3, the likelihood function
p({gobs[n, m]}) is one of the following joint pdfs:

11-2.1 Marginal pdfs p({gobs[n, m]}) =


1
(2πσv2 )NM/2
Accordingly, for each location [n, m], exactly one of the follow-  N−1 M−1
2
−(1/2σv ) ∑n=0 ∑m=0 (gobs [n,m]− f 1 [n,m])2
ing marginal pdfs is the pdf of gobs [n, m]: 
e class #1,


 e−(1/2σv2) ∑N−1 M−1
n=0 ∑m=0 (gobs [n,m]− f 2 [n,m])
2
class #2,
2
N ( f1 [n, m], σv )


class #1, × .
 ..

N ( f2 [n, m], σv2 ) 

class #2, 
gobs [n, m] ∼ (11.4)  −(1/2σ 2) ∑N−1 ∑M−1 (gobs [n,m]− fL [n,m])2
 .
.. .. e v n=0 m=0 class #L,

 .

N ( f [n, m], σ 2 ) (11.7)
L v class L,
and the natural log-likelihood ln[p({gobs[n, m]})] is
where N ( fk [n, m], σv2 ) is a short-hand notation analogous to
that in Eq. (8.86), and it states that the pdf of gobs [n, m] for class k NM 1
ln[p({gobs[n, m]})] = − ln(2πσv2 ) − 2
has the form of the Gaussian pdf given by Eq. (8.38a): 2 2σv

 N−1 M−1 2
1 2 2
∑N−1
 n=0 ∑m=0 (gobs [n, m] − f 1 [n, m]) class #1,
p(gobs [n, m]) = p e−(gobs [n,m]− fk [n,m]) /(2σv ) . (11.5) 
∑ M−1 2
2
2πσv n=0 ∑m=0 (gobs [n, m] − f 2 [n, m]) class #2,
× .

 ..
The form of the marginal pdf (i.e., the pdf specific to pixel [n, m]) 

 N−1 M−1 (g [n, m] − f [n, m])2
given by Eq. (11.5) is a consequence of the fact that υ [n, m] is a ∑n=0 ∑m=0 obs L class #L.
zero-mean random field, as a result of which the mean value of (11.8)
gobs [n, m] is simply fk [n, m].
Notationally, p(gobs [n, m]) is the marginal pdf of pixel [n, m] 11-2.2 MLE of Index k
and p({gobs [n, m]}) is the joint pdf of all pixels in image
gobs [n, m]. Since for class k, image gobs [n, m] is the sum of The maximum likelihood estimate k̂MLE is the value of k that
fk [n, m] and white random noise υ [n, m], and since the amount of maximizes the log-likelihood function given by Eq. (11.8).
noise υ [n, m] added to pixel [n, m] is independent of the amount Since − NM 2
2 log(2πσv ) is added to all terms, and −1/(2σv )
2

of noise added to other pixels, image values {gobs[n, m]} can multiplies all terms, k̂MLE is the value of k that maximizes Λ1 [k]
be regarded as independent if the amount of noise added to defined as
{ fk [n, m]} is significant. This is certainly the case for the image
N−1 M−1
shown in Fig. 11-2(b). [If the noise is insignificant in compari-
son with the signal, the classification task becomes rather trivial;
Λ1 [k] = − ∑ ∑ (gobs [n, m] − fk [n, m])2 . (11.9)
n=0 m=0
the correlation method of the preceding subsection should be
able to correctly classify the observed image with no error.] This expression has an evident interpretation: Choose the value
Hence, the joint pdf p({gobs [n, m]}) is equal to the product over of k such that fk [n, m] is closest to gobs [n, m] in the ℓ2 (sum of
358 CHAPTER 11 IMAGE RECOGNITION

squares) norm defined in Eq. (7.117b). Classified as 0 1 2 3 4 5 6 7 8 9


Computation of k̂MLE can be simplified further by expanding True numeral
Eq. (11.9), since
0 78 4 1 5 1 3 2 2 0 4
N−1 M−1 1 0 77 4 0 6 2 7 3 0 1
2
Λ1 [k] = − ∑ ∑ (gobs[n, m] − fk [n, m]) 2 5 9 69 1 7 1 2 5 1 0
n=0 m=0
3 2 2 0 85 2 3 4 0 1 1
N−1 M−1 N−1 M−1
4 0 3 4 0 70 5 5 3 8 2
=− ∑ ∑ gobs[n, m]2 − ∑ ∑ fk [n, m]2 5 1 4 2 4 5 76 0 4 1 3
n=0 m=0 n=0 m=0
6 0 6 1 0 3 1 83 3 2 1
N−1 M−1
7 0 3 11 2 5 4 1 69 4 1
+2 ∑ ∑ gobs[n, m] fk [n, m]. (11.10)
8 0 0 1 3 7 1 3 6 75 4
n=0 m=0
9 0 3 1 0 3 4 7 2 4 76
The first term is the energy of gobs [n, m], which is independent
of k, so we can ignore it. The second term is the energy E fk of The first row pertains to numeral 0. Of the 100 noisy images
containing an image of numeral 0, 78 were classified correctly,
fk [n, m]. The recipe for computing k̂MLE simplifies to choosing
4 were misclassified as numeral 1, 1 was misclassified as
the value of k that maximizes the MLE criterion ΛMLE [k]:
numeral 2, etc. Each row sums to 100, and similar interpretations
N−1 M−1 apply to the other numerals. A perfect classifier would be a
ΛMLE[k] = 2 ∑ ∑ gobs[n, m] fk [n, m] − E fk . (11.11) (10 × 10) diagonal matrix of 100s. For the noisy images used
n=0 m=0 in this trial, the classification accuracy varied between a low of
69% for numeral 7 and a high of 85% for numeral 3.
In view of Eq. (11.2), ΛMLE [k] is related to the correlation ρk
by
Concept Question 11-1: Why is the maximum likeli-
ΛMLE [k] = 2ρk − E fk . (11.12)
hood (MLE) classifier different from correlation?
Returning to the image of Fig. 11-2(b), computation of the MLE
parameter ΛMLE [k] leads to the stem plot in Fig. 11-2(d), which
Exercise 11-2: We have two classes of (1 × 1) images
correctly selects numeral 3 as the numeral in the noisy image,
with a significant margin between the values of ΛMLE [3] and f1 [n, m] = 1 and f2 [n, m] = 4. Use the MLE classifier to
ΛMLE[8]. classify gobs [n, m] = 2.
Answer: ρ1 = (2)(1) = 2 and ρ2 = (2)(4) = 8.
ΛMLE [1] = 2(2) − 12 = 3 and ΛMLE [2] = 2(8) − 42 = 0.
Since ΛMLE [1] > ΛMLE [2], classify gobs [n, m] as k = 1.
Including energies of the images resulted in a different
classification.

11-2.3 MLE Bank Numeral Classification


11-3 Classification by MAP
We now extend the numeral classification example by perform- Sometimes a priori probabilities p[k] are available for the
ing 1000 trials comprised of 100 noisy images of each of the 10 various fk [n, m]. This is not the case for zip codes or bank
numerals 0 to 9. By selecting σv2 = 106 , we decrease the signal- account numbers (all ten digits are equally likely), but it is
to-noise ratio from −1.832 dB, an image example of which is the case for letters of the alphabet. Figure 11-3 depicts the
shown in Fig. 11-1(b), down to an average of −16.7 dB among frequencies of appearances of a letter in typical text. These
the various trials and numerals. At this ratio, the average signal frequencies function as a priori probabilities p[k] for each letter
energy is only 2% of the noise energy! in letter recognition in an image of text.
The results of the trials using the MLE criterion ΛMLE [k] to The a priori probabilities p[k] can be incorporated into the im-
classify each of the 1000 trials are summarized in the form of age recognition problem by using an MAP formulation instead
the following confusion matrix: of an MLE formulation, by simply multiplying the likelihood
11-3 CLASSIFICATION BY MAP 359

 
2 1
0.14 into one of the two classes defined by the image and
5 7
 
0.12 8 6
. Obtain a classification rule in terms of the values of the
4 3
0.10 four elements of gobs [n, m] using (a) correlation and (b) MLE.

0.08
0.06 Solution:
(a) Classification by correlation: In terms of the notation
0.04 introduced earlier,
0.02  
2 1
f1 [n, m] = (11.14a)
0 5 7
abcdefghijklmnopqrstuvwxyz
and  
8 6
Figure 11-3 Frequencies of appearances of letters, which can f2 [n, m] = . (11.14b)
4 3
be regarded as a priori probabilities p[k].
The energies of f1 [n, m] and f2 [n, m] are

E f1 = 22 + 12 + 52 + 72 = 79, (11.15a)
function in Eq. (11.7) by p[k]. The modification adds ln[p[k]] to 2 2 2 2
the log-likelihood function in Eq. (11.8). Repeating the above E f2 = 8 + 6 + 4 + 3 = 125. (11.15b)
derivation, k̂MAP is computed by choosing the k that maximizes
ΛMAP [k]:
The correlations between gobs [n, m] and each of f1 [n, m] and
f2 [n, m] are
ΛMAP [k] = 2ρk − E fk + 2σv2 ln p[k]. (11.13)
ρ1 = 2g0,0 + g1,0 + 5g0,1 + 7g1,1, (11.16a)
The MAP (maximum a posteriori) classifier is also known
as the minimum error probability (MEP) classifier since it ρ2 = 8g0,0 + 6g1,0 + 4g0,1 + 3g1,1. (11.16b)
minimizes the probability of an incorrect choice of k. Note
that if p[k] = 1/L, so that each class is equally likely, the For a given image g[n, m], we classify it as
MAP classifier reduces to the MLE classifier. Also note that
σv2 functions as a trade-off parameter between the a priori f1 [n, m], if ρ1 > ρ2 (11.17a)
information p[k] and the a posteriori information ρk : the noisier or as
the observations, the larger is σv2 , and the greater the weight f2 [n, m], if ρ1 < ρ2 . (11.17b)
given to a priori information p[k]. The smaller the noise, the
heavier is the weight given to a posteriori (from the noisy data)
information ρk . (b) Classification by MLE: Using Eq. (11.11), the MLE
parameter Λ2 [k] is given by

Λ2 [1] = 4g0,0 + 2g1,0 + 10g0,1 + 14g1,1 − 79, (11.18a)


Example 11-1: (2 × 2) Image Classification Λ2 [2] = 16g0,0 + 12g1,0 + 8g0,1 + 6g1,1 − 125. (11.18b)
Example
Upon subtracting the expression for Λ2 [1] from the expression
for Λ2 [2], we obtain the following classification rule:
Classify noisy image Image of gobs [n, m] is
 
g0,0 g1,0 f1 [n, m], if (12g0,0 + 10g1,0 − 2g0,1 − 8g1,1 − 46) < 0
gobs [n, m] = (11.19a)
g0,1 g1,1
360 CHAPTER 11 IMAGE RECOGNITION

and gobs [n, m]-Reference Format:



f2 [n, m], if (12g0,0 + 10g1,0 − 2g0,1 − 8g1,1 − 46) > 0. 
 f1 [n − n0, m − m0 ] class #1,
(11.19b) 

 f2 [n − n0, m − m0 ] class #2,
gobs [n, m] = .. .. (11.20)

 . .
Concept Question 11-2: Why is the maximum a poste- 

 f [n − n , m − m ] class #L,
riori probability (MAP) classifier different from the maxi- L 0 0
mum likelihood (MLE) classifier? or as

Exercise 11-3: We have two classes of (1 × 1) images fk [n, m]-Reference Format:


f1 [n, m] = 1 and f2 [n, m] = 4, with respective a priori 
probabilities p[k = 1] = 0.1 and p[k = 2] = 0.9. The additive 
 f1 [n, m] class #1,


noise has σv2 = 3. Use the MAP classifier to classify  f2 [n, m] class #2,
gobs [n, m] = 2. gobs [n − n0, m − m0 ] = .. .. (11.21)

 . .


Answer: ρ1 = (2)(1) = 2 and ρ2 = (2)(4) = 8.  f [n, m] class #L.
L
ΛMAP [1] = 2(2) − 12 + 2(3) log(0.1) = −10.8 and
ΛMAP [2] = 2(8) − 42 + 2(3) log(0.9) = −0.63. Since
ΛMAP [2] > ΛMAP [1], classify gobs [n, m] as k = 2. The a
priori probabilities biased the classification towards k = 2. 11-4.2 Classification by Cross-Correlation
In both cases, the objective is to select the correct class through
some appropriate comparison between gobs and fk for all possi-
11-4 Classification of Spatially Shifted ble values of k. One such measure of comparison is the cross-
Images correlation ρk [n0 , m0 ] between gobs and fk when one of them
is shifted by [n0 , m0 ] relative to the other. Of the two formats,
Thus far, we have assumed that the location, size, and orientation the observation-reference format given by Eq. (11.20) is more
of the observed image gobs [n, m] is the same as those of the applicable to search algorithms, so we adopt it in here for
reference images { fk [n, m], k = 1, 2, . . . , L} for all L classes computing the cross-correlation:
of numerals, letters, objects, etc. Now, in this and the next
N−1 M−1
two sections, we examine how to perform image classification
successfully even when the reference images are shifted in
ρk [n0 , m0 ] = ∑ ∑ gobs[n, m] fk [n − n0, m − m0]. (11.22)
n=0 m=0
location, of different size, or different orientation relative to the
observed image. Both images are of size (M × N) pixels.
11-4.1 Unknown Relative Shift

In the absence of a spatial shift between the observation and reference images, all images are arranged so that their origins [0, 0] coincide. In some applications, however, the observed image gobs[n, m] or the reference images fk[n, m] may be shifted by an unknown amount [n0, m0] relative to each other. One such application is the search for occurrences of a specific letter in a block of text, an example of which is presented later in the form of Example 11-2.

Since it is the relative—rather than the absolute—spatial shift that matters, the image classification problem can be formulated with either image treated as the shifted version of the other.

By rearranging the order of the indices of fk, we can cast the summations in the form of a convolution:

ρk[n0, m0] = ∑_{n=0}^{N−1} ∑_{m=0}^{M−1} gobs[n, m] fk[−(n0 − n), −(m0 − m)]
           = gobs[n0, m0] ∗∗ fk[−n0, −m0].   (11.23)

The image representing ρk[n0, m0] is called the cross-correlation between gobs[n, m] and fk[n, m]. Using properties #4 and #5 in Table 3-3, ρk[n, m] can be computed readily using the zero-padded 2-D DFT:

Gobs[k1, k2] = (2M × 2N) DFT{gobs[n, m]},   (11.24a)
Fk[k1, k2] = (2M × 2N) DFT{fk[n, m]},   (11.24b)
ρk[n, m] = DFT⁻¹{Gobs[k1, k2] Fk[2N − k1, 2M − k2]}.   (11.24c)

11-4.3 Classification Recipe for Discrete-Space Images

The recipe for classification when an unknown relative shift [n0, m0] may exist between gobs[n, m] and {fk[n, m], k = 1, 2, . . . , L} consists of the following steps:

1. Compute ρk[n, m] for all values of k: {k = 1, 2, . . . , L}, using Eq. (11.24c).

2. Among the total of (M × N) pixels × L classes = MNL values, identify the combination of pixel location [n, m] and class k that yields the largest value of ρk[n, m].

3. Label that specific pixel location as [n, m] = [n0, m0], and label the identified value of k as the unknown class K.

11-4.4 Classification Recipe for Continuous-Space Images

In analogy with Eq. (11.20), for continuous-space images the classification problem is formulated as

gobs(x, y) =
  f1(x − x0, y − y0)   class #1,
  f2(x − x0, y − y0)   class #2,
  ⋮
  fL(x − x0, y − y0)   class #L,   (11.25)

where the true class is class #K and the variables K, x0, and y0 are all unknown. The cross-correlation for the continuous-space case is given by

ρk(x0, y0) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} gobs(ξ, η) fk(ξ − x0, η − y0) dξ dη,   (11.26)

and the classification recipe proceeds as follows:

1. Given observation image gobs(x, y) and reference images fk(x, y), compute the cross-correlation ρk(x0, y0) for all values of k: {k = 1, 2, . . . , L}, and for all spatial shifts (x0, y0) that offer nonzero overlap between gobs(x, y) and fk(x − x0, y − y0).

2. Identify the combination of shift (x0, y0) and class k that exhibits the largest value of ρk(x0, y0).

Example 11-2: Identifying Letter "e" in Text Image

Figure 11-4(a) displays a (33 × 256) image of two lines of text. Use cross-correlation to identify all the locations of the letter "e" within the text image. In the printed text, each lower-case letter is allocated (9 × 7) pixels, and reference images fk[n, m] are provided for all lower-case letters.

Solution: Application of the recipe outlined in Section 11-4.3 leads to the identification of 7 pixels with cross-correlation values that were much larger than those of other pixels. After examining the distribution of values, all values of the cross-correlation smaller than 800,000 were set to zero, thereby highlighting the high-value pixels shown in Fig. 11-4(b). With the text printed in green in Fig. 11-4(c), the identified pixels in part (b) are now printed in red. Examination of the image confirms that the cross-correlation algorithm has successfully identified all occurrences of the letter "e" in the text image.

Figure 11-4 Identifying the letter "e" in Example 11-2: (a) image of text; (b) thresholded cross-correlation ρ5[n0, m0]; (c) image of text (green) and thresholded cross-correlation (red).
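The discrete-space recipe of Section 11-4.3 maps directly onto a few lines of MATLAB. The following is a minimal sketch, not code from the book companion site: the variable names g and refs are illustrative assumptions, with refs holding the L reference images as an (M × N × L) stack. The cross-correlations are computed with zero-padded FFTs in the spirit of Eq. (11.24), using the fact that, for real-valued images, the reflected spectrum Fk[2N − k1, 2M − k2] equals the complex conjugate of Fk[k1, k2].

    % Minimal sketch of the Section 11-4.3 recipe (assumed inputs):
    % g    : (M x N) observed image
    % refs : (M x N x L) stack of reference images f_k
    [M, N, L] = size(refs);
    G = fft2(g, 2*M, 2*N);                 % zero-padded DFT of g, Eq. (11.24a)
    best = -inf;
    for k = 1:L
        Fk  = fft2(refs(:,:,k), 2*M, 2*N); % Eq. (11.24b)
        rho = real(ifft2(G .* conj(Fk)));  % cross-correlation, Eq. (11.24c)
        [val, idx] = max(rho(:));
        if val > best                      % largest peak over k and [n0, m0]
            best = val;
            [n0, m0] = ind2sub(size(rho), idx);
            K = k;
        end
    end
    fprintf('class K = %d at shift [n0, m0] = [%d, %d]\n', K, n0 - 1, m0 - 1);

Thresholding rho, as done in Example 11-2, would instead retain all peaks above a chosen level rather than only the single largest one.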

 
11-5 Classification of Spatially Scaled Images

We now consider the problem of image classification in which the observed image gobs(x, y) is a spatially scaled version of the reference images fk(x, y), or vice versa. Our presentation is specific to continuous-space observation and reference images, gobs(x, y) and fk(x, y). In a later part of this section, we address how to apply the classification algorithm developed for continuous-space images to discrete-space images. Additionally, to keep the presentation simple and manageable, we assume that the observed images are noise-free.

The topic of spatial scaling of images was presented earlier in the book, in Section 3-2.2A. If gobs(x, y) is a spatially scaled version of fk(x, y), then

gobs(ax x, ay y) = fk(x, y)   (11.27a)

or

gobs(x, y) = fk(x/ax, y/ay).   (11.27b)

The positive-valued constants ax and ay are unknown spatial-scaling factors. If ax > 1, gobs(x, y) is fk(x, y), but magnified in the x direction by ax. Conversely, if ax < 1, gobs(x, y) is fk(x, y), but shrunk in the x direction by ax. Similar variations apply to ay along the y direction. An obvious example of image classification in the presence of spatial scaling is when the relative sizes of letters and numerals in the images to be classified are different from those in the reference images fk(x, y).

The spatial-scaling classification problem can be formulated as

gobs(x, y) =
  f1(x/ax, y/ay)   class #1,
  f2(x/ax, y/ay)   class #2,
  ⋮
  fL(x/ax, y/ay)   class #L,   (11.28)

where the true class is class #K and all images are nonzero only for x, y > 0. Class K and the variables ax and ay are all unknown.

11-5.1 Logarithmic Spatial Transformation

Clearly, because gobs and fk have different spatial scales, the image classification problem cannot be solved using the same cross-correlation approach presented in earlier sections. However, we can still solve the problem—and use cross-correlation—by performing logarithmic warping of the spatial coordinates x and y. To that end, we change variables from (x, y) to (x′, y′):

x′ = ln(x)   ⟺   x = e^{x′},   (11.29a)
y′ = ln(y)   ⟺   y = e^{y′}.   (11.29b)

Next, we define spatially warped images g′obs(x′, y′) and f′k(x′, y′):

g′obs(x′, y′) = gobs(e^{x′}, e^{y′}),   (11.30a)
f′k(x′, y′) = fk(e^{x′}, e^{y′}).   (11.30b)
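As a rough illustration of Eqs. (11.29) and (11.30), the following MATLAB fragment, an assumption of this sketch and not part of the text, approximates a continuous-space image by a finely sampled array defined over x = 1, . . . , N and y = 1, . . . , M (so that x, y > 0), and resamples it onto a uniform grid in (x′, y′) by bilinear interpolation:

    % Sketch: logarithmic warping g'(x',y') = g(e^{x'}, e^{y'}), Eq. (11.30a).
    % img is assumed sampled on x = 1..N (columns) and y = 1..M (rows);
    % P is the number of samples per warped axis.
    function imgW = logwarp(img, P)
        [M, N] = size(img);
        xp = linspace(0, log(N), P);          % uniform x' grid
        yp = linspace(0, log(M), P);          % uniform y' grid
        [Xp, Yp] = meshgrid(xp, yp);
        % Evaluate g at (e^{x'}, e^{y'}) by bilinear interpolation;
        % points outside the sampled grid are set to 0.
        imgW = interp2(double(img), exp(Xp), exp(Yp), 'linear', 0);
    end

Applying the same warp to gobs and to every fk converts the unknown scaling (ax, ay) into an unknown shift (a′x, a′y), which the recipe below then recovers by cross-correlation.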

We also define logarithmically transformed scale factors (a′x, a′y):

a′x = ln(ax)   ⟺   ax = e^{a′x},   (11.31a)
a′y = ln(ay)   ⟺   ay = e^{a′y}.   (11.31b)

Using these transformations, the spatially warped observation image g′obs(x′, y′) can be related to the spatially warped reference image f′k(x′, y′) as follows:

g′obs(x′, y′) = gobs(e^{x′}, e^{y′}) = fk(e^{x′}/ax, e^{y′}/ay)
             = fk(e^{x′}/e^{a′x}, e^{y′}/e^{a′y})
             = fk(e^{(x′ − a′x)}, e^{(y′ − a′y)})
             = f′k(x′ − a′x, y′ − a′y).   (11.32)

Hence, in the logarithmically transformed spatial variables (x′, y′), spatial scaling by (ax, ay) becomes a spatial shift by (a′x, a′y). An example is illustrated in Fig. 11-5.

The problem statement defined by Eq. (11.28) can now be reformulated as

g′obs(x′, y′) =
  f′1(x′ − a′x, y′ − a′y)   class #1,
  f′2(x′ − a′x, y′ − a′y)   class #2,
  ⋮
  f′L(x′ − a′x, y′ − a′y)   class #L,   (11.33)

where now the unknown true class is still class k = K, but the other unknowns are a′x and a′y. The formulation defined by Eq. (11.33) is the same as that of the spatial-shift classification problem given by Eq. (11.25) in the previous section. The cross-correlation between g′obs(x′, y′) and f′k(x′, y′), when the latter is spatially shifted by (x0, y0), is given by the convolution

ρk(x0, y0) = ∫∫ g′obs(x′, y′) f′k(x′ − x0, y′ − y0) dx′ dy′
           = g′obs(x0, y0) ∗∗ f′k(−x0, −y0).   (11.34)

11-5.2 Classification Recipe

The following recipe is for continuous-space images. If the images are in discrete-space form, interpolation can be applied to convert them into continuous-space form using the methods described in Chapter 4.

The recipe for classifying observation image gobs(x, y), given that its coordinates may have an unknown spatial scaling relative to those of reference images fk(x, y), consists of the following steps:

1. Use Eqs. (11.29) and (11.30) to transform all images gobs(x, y) and fk(x, y) to logarithmic format g′obs(x′, y′) and f′k(x′, y′).

2. Use Eq. (11.34) to compute ρk(x0, y0) for each value of k: {k = 1, 2, . . . , L}, and for all spatial shifts (x0, y0) that offer nonzero overlap between g′obs(x′, y′) and f′k(x′ − x0, y′ − y0), using the 2-D CSFT.

3. Identify the combination of shift (x0, y0) and class k that yields the largest value of ρk(x0, y0). Label that combination as (a′x, a′y) and k = class K.

4. With (a′x, a′y) known, the scaling factors (ax, ay) can then be determined using Eq. (11.31).

Example 11-3: Scaled-Image Classification

In Fig. 11-6(a), we are given displays of 10 reference images of bank-font digits fk[n, m], with k = 1 to 9 representing the numerals 1 to 9, respectively, and k = 10 representing 0. Each image is (32 × 20) in size, even though the original size was only (30 × 18); the original image was zero-padded by inserting two rows and two columns of zeros at the top and left of the image. The zero-padding was used to ensure that the logarithmically transformed image had sufficient space to be shifted.

We also are given an observed image gobs[n, m], shown in Fig. 11-6(b), which we know to be a downsized scaled version of one of the 10 digits in Fig. 11-6(a), with a scaling factor of 2 or 4, but we do not know which particular digit it represents, nor do we know the exact value of the scaling factor. [Actually, gobs[n, m] is digit "3" at half-size, but that information is yet to be established by the image classifier.] Classify the unclassified image gobs[n, m].

Solution: In Section 11-5.1, we used a natural-log transformation to convert a spatially scaled continuous-space image gobs(x, y) into a spatially shifted continuous-space image g′obs(x′, y′). For a discrete-space image gobs[n, m], we use base-2 logarithms:

n′ = log2 n   ⟺   n = 2^{n′},   (11.35a)
m′ = log2 m   ⟺   m = 2^{m′}.   (11.35b)

Figure 11-5 (a) The (32 × 20) image f3[n, m] and the (16 × 10) image gobs[n, m]; (b) the same images, f′3[n′, m′] and g′obs[n′, m′], in logarithmic format in base 2 (instead of base e); note that g′obs[n′, m′] = f′3[n′ + 1, m′ + 1]; and (c) the relation between [n, m] and [n′, m′]:

n and m:    1   2   4   8   16   32
n′ and m′:  0   1   2   3    4    5

[Note that 32 pixels in [n, m] space transform into 6 (not 5) pixels in [n′, m′] space.]

Figure 11-6 (a) The (32 × 20) reference images fk[n, m] and (b) the (unclassified) observed image gobs[n, m] of Example 11-3.

Figure 11-7 (a) The (6 × 5) reference images f′k[n′, m′] and (b) the (5 × 4) (unknown) observation image g′obs[n′, m′], in logarithmic format.

The spatially logarithmically transformed image g′obs[n′, m′] in [n′, m′] space is

g′obs[n′, m′] = gobs[2^{n′}, 2^{m′}].   (11.36)

Based on the given information,

gobs[n, m] = fk[n/a, m/a],   for 1 ≤ n/a ≤ 20, 1 ≤ m/a ≤ 32,   (11.37)

with the scaling factor a = 1/2 or 1/4. For convenience, we introduce the inverse scaling factor

b = 1/a = 2^i,   with i = 1 for a = 1/2, and i = 2 for a = 1/4.   (11.38)

Upon using Eq. (11.38) in Eq. (11.37) and then incorporating the scaling relationship into Eq. (11.36), we have

g′obs[n′, m′] = fk[2^i 2^{n′}, 2^i 2^{m′}]
             = fk[2^{n′+i}, 2^{m′+i}]
             = f′k[n′ + i, m′ + i],   (11.39)

where in the last step we used the definition given by Eq. (11.36) to convert fk into its logarithmic equivalent f′k. The result encapsulated by Eq. (11.39) states that in [n′, m′] space the observed image g′obs[n′, m′] is a spatially shifted version of f′k[n′, m′], and the shift is i pixels along each direction. Hence, application of the spatial-shift recipe outlined in Section 11-4.3 should lead to correctly identifying the values of both k and i. The validity of this statement is verified by Figs. 11-7 and 11-8. In Fig. 11-7, we show the reference and observed images in logarithmic format. Visual inspection reveals the close similarity between f′3[n′, m′] and g′obs[n′, m′], and that the spatial shift between them is i = 1, corresponding to a = 1/2.

Computational classification is realized by computing the MLE criterion ΛMLE[k] as defined by Eq. (11.12):

ΛMLE[k] = 2ρ′k − E_{f′k},   (11.40)

where ρ′k is the cross-correlation between g′obs[n′, m′] and f′k[n′, m′], and E_{f′k} is the energy of image f′k[n′, m′]. Because the energies of the 10 digits in [n′, m′] space vary widely (as evidenced by the wide range in the number of white pixels among the 10 digits in Fig. 11-7(a)), use of the MLE criterion ΛMLE[k] provides a better classifier than ρ′k alone.

Figure 11-8 ΛMLE[k] (values on the order of 10⁵) based on the correlation between the observed digit g′obs[n′, m′] and each of the ten digits f′k[n′, m′].

The computed values of ΛMLE[k], plotted in Fig. 11-8, show that ΛMLE is largest for digit "3." Note that because images f′2[n′, m′] and f′3[n′, m′] in Fig. 11-7(a) look similar, their computed MLE criteria (Fig. 11-8) are close in value.

Concept Question 11-3: In classification of spatially shifted images, why not just correlate with all possible shifts of the image?

11-6 Classification of Rotated Images

We now consider the continuous-space and noiseless version of the image classification problem in which the observation gobs(x, y) is a rotated (by an unknown angle θo) version of fK(x, y), or vice versa. As before, K is the unknown true value of k.

11-6.1 Polar-Coordinate Transformation

To incorporate the unknown angular rotation into the image classification problem, we reformulate the problem using polar coordinates (r, θ) instead of Cartesian coordinates (x, y). Per Fig. 3-11(a) and Eq. (3.44), the two pairs of variables are related by

x = r cos θ,   y = r sin θ;   r = √(x² + y²),   θ = tan⁻¹(y/x).   (11.41)

The image classification problem with unknown relative rotation θo can then be reformulated as

g̃obs(r, θ) =
  f̃1(r, θ − θo)   class #1,
  f̃2(r, θ − θo)   class #2,
  ⋮
  f̃L(r, θ − θo)   class #L,   (11.42)

where g̃obs(r, θ) and f̃k(r, θ) are gobs(x, y) and fk(x, y) in polar coordinates, respectively.

The form of Eq. (11.42) is the same as that of Eq. (11.20) for the spatial-shift problem, except that now the variables are (r, θ) instead of (x, y). In fact, the problem is simpler because we have only one unknown spatial variable, namely θo, rather than two. Using the cross-correlation concept that served us well in earlier sections, we define the rotation cross-correlation ρ̃k(r, θ) as

ρ̃k(r, θ) = ∫ g̃obs(r, θ′) f̃k(r, θ′ − θ) dθ′.   (11.43)
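For discrete images, Eq. (11.43) can be approximated by resampling onto a polar grid and correlating along θ with FFTs. The MATLAB sketch below is an illustrative assumption of this text's editor, not code from the book: the function names, grid sizes, and the final summation over r (which collapses ρ̃k(r, θ) into a single score per angle) are all choices of the sketch. It assumes the image is roughly centered on the object.

    % Sketch: rotation matching via polar resampling (Eq. (11.41)) and
    % circular cross-correlation over theta (cf. Eq. (11.43)).
    function rho = rotcorr(gimg, fimg, Nr, Nt)
        gp = topolar(gimg, Nr, Nt);
        fp = topolar(fimg, Nr, Nt);
        % FFT along theta (columns) turns the theta-shift into a circular one;
        % conj() realizes the reversed argument of f in the correlation.
        C = real(ifft( fft(gp, [], 2) .* conj(fft(fp, [], 2)), [], 2 ));
        rho = sum(C, 1);            % score vs. theta = 2*pi*(0:Nt-1)/Nt
    end

    function P = topolar(img, Nr, Nt)
        [M, N] = size(img);  xc = (N + 1)/2;  yc = (M + 1)/2;
        r  = linspace(0, min(M, N)/2 - 1, Nr).';
        th = 2*pi*(0:Nt-1)/Nt;
        % Bilinear interpolation onto the (r, theta) grid, 0 outside the image
        P  = interp2(double(img), xc + r*cos(th), yc + r*sin(th), 'linear', 0);
    end

Evaluating max(rho) for each reference image fk then selects the class and rotation jointly, mirroring the recipe that follows.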

11-6.2 Classification Recipe

1. Transform all images gobs(x, y) and fk(x, y) to polar-coordinate format g̃obs(r, θ) and f̃k(r, θ).

2. Use Eq. (11.43) to compute ρ̃k(r, θ) for every possible combination of k (among L classes) and θ, using the 2-D CSFT.

3. Identify the combination that yields the largest value of ρ̃k(r, θ). Label k = K and θ = θo.

Example 11-4: Rotated Image Classification

The (32 × 20) reference images shown in Fig. 11-6(a) have been clipped to (17 × 17), so only the upper half of each image is shown in Fig. 11-9(a). The 90°-rotated image of one of the numerals, namely numeral "3," is shown in Fig. 11-9(b). Classify the rotated image, presuming that its identity is unknown, as is its degree of rotation.

Figure 11-9 Reference images and rotated observed image of Example 11-4: (a) (17 × 17) clipped reference images; (b) upper half of numeral "3" rotated by 90°.

Solution: The recipe in Subsection 11-6.2 relies on computing the cross-correlation ρ̃k(r, θ) defined by Eq. (11.43). Doing so requires two prerequisite steps, namely using interpolation to convert the reference images and the observation image from discrete-space format to continuous-space format, and then applying Eq. (11.41) to convert the coordinates of the 11 images from Cartesian (x, y) to polar (r, θ). The transformed images are displayed in Fig. 11-10, with the horizontal axis representing θ and the vertical axis representing r. Close examination of the images reveals that the observed image is identical with image "3" of the reference images, except for a 90° shift. Hence, it is not surprising that application of Eq. (11.43) against all 10 reference images, and at many values of θ between 0° and 360°, produces the highest value of the cross-correlation when k = 3 and θ = 90°.

Concept Question 11-4: When is interpolation unnecessary for classification of spatially scaled images?

11-7 Color Image Classification

11-7.1 Notation for Color Images

We now extend the classification methods of the preceding sections to color images consisting of red (R), green (G), and blue (B) channels. The treatment is equally applicable to non-optical sensors, such as imaging radars operating at three different wavelengths, and is extendable to multispectral images comprised of more than three channels.

In Sections 11-1 through 11-3, we considered the scenario where we were given a set of L possible images {fk[n, m], k = 1, 2, . . . , L} and the goal was to classify a noisy image gobs[n, m] into one of the L classes. Now we consider the case where the {fk[n, m]} images must themselves be estimated from a set of I training images {fk^(i)[n, m], k = 1, . . . , L; i = 1, . . . , I}. We will assume that all random variables are jointly Gaussian, so it will be necessary for us to estimate mean vectors and covariance matrices for each class k from these training images.

In the numeral classification example of Section 11-1, we had L = 10 classes of numerals, one observed image gobs[n, m], and one known reference image fk[n, m] for each value of k, with k extending between 1 and 10. With color images, the observed image consists of three channels, and so do all of the training images. Another consideration that we should now address is the fact that in many classification problems the identity of a given class is not unique. If we are dealing with numerals that are always printed using a single type of font, then we need only a single reference image per numeral (1 to 9, plus 0), but if we need to classify numerals that may have been printed using multiple types of fonts, then we would need a much larger set of reference images.

Figure 11-10 (a) The ten reference images (numerals 1 through 9 and 0) and (b) the observed image, displayed in polar coordinates with θ along the horizontal axis and r along the vertical axis. In polar coordinates, the observed image in (b) matches the reference image of "3", shifted by 90°.

Similar within-class variations occur among text letters printed in different fonts, as well as in many object recognition applications.

With these two considerations (3 color channels and within-class variability) in mind, we introduce the following notation:

Number of classes: L, with class index k = 1, 2, . . . , L.

Image size: M × N.

Number of training images per class: I, with training-image index i = 1, 2, . . . , I.

Red channel of the ith training image for class k: fk,R^(i)[n, m].

ith training image vector of 2-D functions of [n, m] for class k:

fk^(i)[n, m] = [fk,R^(i)[n, m], fk,G^(i)[n, m], fk,B^(i)[n, m]]^T.   (11.44)

Mean red-channel training image for class k:

f̄k,R[n, m] = (1/I) ∑_{i=1}^{I} fk,R^(i)[n, m].   (11.45)

Mean training image vector for class k:

f̄k[n, m] = [f̄k,R[n, m], f̄k,G[n, m], f̄k,B[n, m]]^T.   (11.46)

Covariance matrix estimate for class k at location [n, m]:

Kk[n, m] = (1/I) ∑_{i=1}^{I} (fk^(i)[n, m] − f̄k[n, m])(fk^(i)[n, m] − f̄k[n, m])^T
         = (1/I) ∑_{i=1}^{I} [fk^(i)[n, m] (fk^(i)[n, m])^T] − f̄k[n, m] (f̄k[n, m])^T.   (11.47)

The estimated covariance matrix Kk[n, m] accounts for the variability at location [n, m] among the I training images, relative to their mean f̄k[n, m], and for the correlation between different colors in an image.

Location-independent covariance matrix for class k:

Kk = (1/NM) ∑_{n=0}^{N−1} ∑_{m=0}^{M−1} Kk[n, m].   (11.48)

The location-independent covariance matrix Kk is the sample mean over [n, m] of Kk[n, m]. It is used to represent the in-class variability when Kk[n, m] is statistically independent of location [n, m], which is a valid assumption in many applications.
370 CHAPTER 11 IMAGE RECOGNITION

Observed-image model for class k:

gobs[n, m] = fk[n, m] + v[n, m],   (11.49)

where v[n, m] is a vector of length 3 (one for each color), modeled as a zero-mean white Gaussian noise random field with variance σv² for each color component.

Gaussian model for gobs[n, m]:

gobs[n, m] = [gk,R[n, m], gk,G[n, m], gk,B[n, m]]^T ∼ N(f̄k[n, m], Kk0),   (11.50)

where Kk0 is a (3 × 3) joint covariance matrix that accounts for both Kk, the in-class variability of the noise-free training images, and the added noise represented by σv²:

Kk0 = Kk + σv² I3,   (11.51)

where I3 is the (3 × 3) identity matrix.

11-7.2 Likelihood Functions

Even though, for class k, the vector f̄k[n, m] and covariance matrix Kk are both estimated from the vectors of the I training images {fk^(i)[n, m]}, we use them as the "true" values, not just estimates, in the classification algorithms that follow. The goal of the classification algorithm is to determine the value of the class index k from among the L possible classes.

Given an observation image vector gobs[n, m] and an average reference image vector f̄k[n, m] for each class index k, we define the difference image vector Δk[n, m] as

Δk[n, m] = gobs[n, m] − f̄k[n, m]
         = [gk,R[n, m], gk,G[n, m], gk,B[n, m]]^T − [f̄k,R[n, m], f̄k,G[n, m], f̄k,B[n, m]]^T.   (11.52)

By treating f̄k[n, m] as the mean value of gobs[n, m], and in view of the model given by Eq. (11.50), the location-specific marginal pdf p(gobs[n, m]) of the length-3 Gaussian random vector gobs[n, m] for class k is given by

p(gobs[n, m]) = [1 / ((2π)^{3/2} √det(Kk0))] e^{−(1/2)(Δk[n, m])^T Kk0⁻¹ Δk[n, m]}.   (11.53)

The {gobs[n, m]} values at different locations [n, m] are independent random vectors because the noise vectors v[n, m] are location-independent. Accordingly, their joint pdf is

p({gobs[n, m]}) = ∏_{n=0}^{N−1} ∏_{m=0}^{M−1} p(gobs[n, m])
 = [1 / ((2π)^{3NM/2} (det(Kk0))^{NM/2})] ∏_{n=0}^{N−1} ∏_{m=0}^{M−1} e^{−(1/2)(Δk[n, m])^T Kk0⁻¹ Δk[n, m]}
 = [1 / ((2π)^{3NM/2} (det(Kk0))^{NM/2})] e^{−(1/2) ∑_{n=0}^{N−1} ∑_{m=0}^{M−1} (Δk[n, m])^T Kk0⁻¹ Δk[n, m]}.   (11.54)

The joint pdf given by Eq. (11.54) is the likelihood function of the observed image vector gobs[n, m]. An expanded version for the individual classes is given by

p({gobs[n, m]}) = [1 / (2π)^{3NM/2}] ×
  (det(K10))^{−NM/2} ∏_{n=0}^{N−1} ∏_{m=0}^{M−1} e^{−(1/2)(Δ1[n, m])^T K10⁻¹ Δ1[n, m]}   class 1,
  (det(K20))^{−NM/2} ∏_{n=0}^{N−1} ∏_{m=0}^{M−1} e^{−(1/2)(Δ2[n, m])^T K20⁻¹ Δ2[n, m]}   class 2,
  ⋮
  (det(KL0))^{−NM/2} ∏_{n=0}^{N−1} ∏_{m=0}^{M−1} e^{−(1/2)(ΔL[n, m])^T KL0⁻¹ ΔL[n, m]}   class L.   (11.55)

The corresponding natural log-likelihood function is given by

ln(p({gobs[n, m]})) = −(3NM/2) ln(2π) +
  −(NM/2) ln(det(K10)) − (1/2) ∑_{n=0}^{N−1} ∑_{m=0}^{M−1} (Δ1[n, m])^T K10⁻¹ Δ1[n, m]   class 1,
  −(NM/2) ln(det(K20)) − (1/2) ∑_{n=0}^{N−1} ∑_{m=0}^{M−1} (Δ2[n, m])^T K20⁻¹ Δ2[n, m]   class 2,
  ⋮
  −(NM/2) ln(det(KL0)) − (1/2) ∑_{n=0}^{N−1} ∑_{m=0}^{M−1} (ΔL[n, m])^T KL0⁻¹ ΔL[n, m]   class L.   (11.56)

11-7.3 Classification by MLE

The maximum likelihood estimate k̂MLE of class k is the value of k that maximizes the log-likelihood function given by Eq. (11.56). Since the first term, −(3NM/2) ln(2π), is common to all classes, it has no impact on the choice of k. Hence, k̂MLE is the value of k that maximizes the MLE criterion Λ1[k] given by

Λ1[k] = −NM ln(det(Kk0)) − ∑_{n=0}^{N−1} ∑_{m=0}^{M−1} (Δk[n, m])^T Kk0⁻¹ Δk[n, m].   (11.57)

According to Eq. (11.51), the joint covariance matrix Kk0 incorporates two statistical variations: Kk, due to variations among training images for class k, and σv², due to the added noise. If the classification application is such that Kk does not vary with class k (or the variation from class to class is relatively minor), we can then treat Kk as a class-independent covariance matrix K, with a corresponding joint covariance matrix K0 = K + σv² I3. As a consequence, the first term in Eq. (11.57) becomes class-independent and can be removed from the maximization process, which leads us to adjust the definition of the MLE criterion to

Λ1[k] = −∑_{n=0}^{N−1} ∑_{m=0}^{M−1} (Δk[n, m])^T K0⁻¹ Δk[n, m].   (11.58)
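A direct MATLAB implementation of the criterion in Eq. (11.57) is short. The sketch below is an assumption-laden illustration (the variable layout, with class means in an (M × N × 3 × L) array fbar and covariances in a (3 × 3 × L) array Kk0, is a choice of the sketch, not of the text); dropping the log-determinant term recovers Eq. (11.58).

    % Sketch of the MLE criterion of Eq. (11.57) for L classes.
    % gobs : (M x N x 3) observed image
    % fbar : (M x N x 3 x L) class means;  Kk0 : (3 x 3 x L) covariances
    function khat = mle_classify(gobs, fbar, Kk0)
        [M, N, ~, L] = size(fbar);
        Lam = zeros(L, 1);
        for k = 1:L
            D = reshape(gobs - fbar(:,:,:,k), [], 3).';   % 3 x (MN) differences
            Q = sum(D .* (Kk0(:,:,k) \ D), 1);            % per-pixel quadratic forms
            Lam(k) = -M*N*log(det(Kk0(:,:,k))) - sum(Q);  % Eq. (11.57)
        end
        [~, khat] = max(Lam);
    end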

Upon replacing Δk[n, m] with its defining expression given by Eq. (11.52), Eq. (11.58) becomes

Λ1[k] = −∑_{n=0}^{N−1} ∑_{m=0}^{M−1} (Δk[n, m])^T K0⁻¹ Δk[n, m]
      = −∑∑ (gobs[n, m] − f̄k[n, m])^T K0⁻¹ (gobs[n, m] − f̄k[n, m])
      = −∑∑ (gobs[n, m])^T K0⁻¹ gobs[n, m] − ∑∑ (f̄k[n, m])^T K0⁻¹ f̄k[n, m]
        + 2 ∑∑ (gobs[n, m])^T K0⁻¹ f̄k[n, m].   (11.59)

The factor of 2 in the last term comes from noting that a scalar equals its own transpose: the two cross terms (gobs)^T K0⁻¹ f̄k and (f̄k)^T K0⁻¹ gobs are scalars and, since K0⁻¹ is symmetric, are transposes of each other, so they are equal.

The first term in Eq. (11.59) is independent of k. Hence, computation of k̂MLE simplifies to choosing the value of k that maximizes the modified MLE criterion

ΛMLE[k] = 2 ∑_{n=0}^{N−1} ∑_{m=0}^{M−1} (gobs[n, m])^T K0⁻¹ f̄k[n, m] − ∑_{n=0}^{N−1} ∑_{m=0}^{M−1} (f̄k[n, m])^T K0⁻¹ f̄k[n, m].   (11.60)

The expression given by Eq. (11.60) is the vector version of the expression given earlier in Eq. (11.11) for the scalar case.

11-7.4 Classification by MAP

As noted earlier in connection with Fig. 11-3, the probability of occurrence p[k] of the letters of the alphabet varies widely among the 26 letters of the English language. The same may be true for other classification applications. The maximum a posteriori (MAP) classifier takes advantage of this a priori information by multiplying the likelihood function given by Eq. (11.54) by p[k]. This modification leads to a MAP classification criterion ΛMAP[k] given by the same expression as ΛMLE[k], but with an additional term:

ΛMAP[k] = 2 ∑_{n=0}^{N−1} ∑_{m=0}^{M−1} (gobs[n, m])^T K0⁻¹ f̄k[n, m]
          − ∑_{n=0}^{N−1} ∑_{m=0}^{M−1} (f̄k[n, m])^T K0⁻¹ f̄k[n, m] + 2 ln(p[k]).   (11.61)

The MAP classifier (estimator of k) selects the value of k that maximizes ΛMAP[k].

Example 11-5: Color Image Classification

Develop a classification rule for a two-class, (1 × 1) color image classifier for a (1 × 1) image [gR, gG, gB] with no additive noise and equal class probabilities (p[k = 1] = p[k = 2]), given the following 4 training images per class:

Class k = 1:
f1^(1) = [0, 0, 0]^T,  f1^(2) = [4, 0, 0]^T,  f1^(3) = [4, 4, 0]^T,  f1^(4) = [4, 0, 4]^T.   (11.62)

Class k = 2:
f2^(1) = [0, 0, 4]^T,  f2^(2) = [0, 4, 0]^T,  f2^(3) = [0, 4, 4]^T,  f2^(4) = [4, 4, 4]^T.   (11.63)

Solution: The sample means of the two training sets are

f̄1 = (1/4)([0, 0, 0]^T + [4, 0, 0]^T + [4, 4, 0]^T + [4, 0, 4]^T) = [3, 1, 1]^T   (11.64)

and

f̄2 = (1/4)([0, 0, 4]^T + [0, 4, 0]^T + [0, 4, 4]^T + [4, 4, 4]^T) = [1, 3, 3]^T.   (11.65)

The sample covariance for k = 1 is

K1 = (1/4)( [0;0;0][0,0,0] + [4;0;0][4,0,0] + [4;4;0][4,4,0] + [4;0;4][4,0,4] ) − [3;1;1][3,1,1]
   = [ 3  1  1
       1  3 −1
       1 −1  3 ],   (11.66)

and the sample covariance for k = 2 is

K2 = (1/4)( [0;0;4][0,0,4] + [0;4;0][0,4,0] + [0;4;4][0,4,4] + [4;4;4][4,4,4] ) − [1;3;3][1,3,3]
   = [ 3  1  1
       1  3 −1
       1 −1  3 ].   (11.67)

Because K1 = K2, Eq. (11.60) applies. The classification rule is: choose the value of k that maximizes ΛMLE[k] in Eq. (11.60). Setting K1 = K2 = K, the inverse of K is

K⁻¹ = (1/4) [ 2 −1 −1
             −1  2  1
             −1  1  2 ].   (11.68)

For k = 1, ΛMLE[1] of Eq. (11.60) becomes

ΛMLE[1] = 2 [gR, gG, gB] K⁻¹ [3; 1; 1] − [3, 1, 1] K⁻¹ [3; 1; 1] = 2gR − 3,   (11.69)

and for k = 2,

ΛMLE[2] = 2 [gR, gG, gB] K⁻¹ [1; 3; 3] − [1, 3, 3] K⁻¹ [1; 3; 3] = 4gG + 4gB − 2gR − 11.   (11.70)

The classification rule (choose the larger of ΛMLE[1] and ΛMLE[2]) simplifies to the following simple sign test:

• Choose f1 if: −4gR + 4gG + 4gB − 8 < 0,
• Choose f2 if: −4gR + 4gG + 4gB − 8 > 0.

In 3-D space, with axes {gR, gG, gB}, the boundary is a plane that separates the region in which {gR, gG, gB} is assigned to f1 from the region in which it is assigned to f2. The regions are shown in Fig. 11-11.

Figure 11-11 Decision regions for Example 11-5 in (gR, gG, gB) space: the plane −4gR + 4gG + 4gB = 8 separates region f1 from region f2; the training vectors lie at corners of the cube spanning (0, 0, 0) to (4, 4, 4).
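The arithmetic in Example 11-5 is easy to verify numerically. The following assumed MATLAB session (the test pixel g is an arbitrary choice of this sketch) reproduces the sample means, the shared covariance, and the resulting decision:

    % Numerical check of Example 11-5 (illustrative session).
    f1 = [0 4 4 4; 0 0 4 0; 0 0 0 4];   % class-1 training vectors as columns
    f2 = [0 0 0 4; 0 4 4 4; 4 0 4 4];   % class-2 training vectors as columns
    m1 = mean(f1, 2);  m2 = mean(f2, 2);           % Eqs. (11.64)-(11.65)
    K  = (f1 - m1)*(f1 - m1).'/4;                  % Eq. (11.66); equals class-2 value
    Ki = inv(K);
    g  = [1; 3; 2];                                % a test pixel [gR; gG; gB]
    L1 = 2*g.'*Ki*m1 - m1.'*Ki*m1;                 % Eq. (11.60), k = 1
    L2 = 2*g.'*Ki*m2 - m2.'*Ki*m2;                 % Eq. (11.60), k = 2
    if L2 > L1, disp('choose f2'), else, disp('choose f1'), end

For g = [1; 3; 2], the sign test gives −4(1) + 4(3) + 4(2) − 8 = 8 > 0, so f2 is chosen, in agreement with L2 = 7 > L1 = −1.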

     
Example 11-6: (2 × 2) Image Classifier

We are given two (2 × 2) color image classes:

f1,R = [1 2; 3 0],  f1,G = [4 5; 6 0],  f1,B = [7 8; 9 0],
f2,R = [3 2; 1 0],  f2,G = [6 5; 4 0],  f2,B = [9 8; 7 0].   (11.71)

Find the rule for classifying an observed (2 × 2) color image

gR = [gR_{0,0}  gR_{0,1};  gR_{1,0}  gR_{1,1}],
gG = [gG_{0,0}  gG_{0,1};  gG_{1,0}  gG_{1,1}],   (11.72)
gB = [gB_{0,0}  gB_{0,1};  gB_{1,0}  gB_{1,1}].

Assume K1 = K2 = I and equal a priori probabilities.

Solution: The second term in Eq. (11.60) represents the energy Efk of class k. By inspection, Ef1 = Ef2 (f1 and f2 contain the same numbers). Hence the second term in Eq. (11.60) is the same for k = 1 and 2, so we can ignore it. Consequently, Eq. (11.60) simplifies to

ΛMLE[1] = 1gR_{0,0} + 2gR_{0,1} + 3gR_{1,0} + 4gG_{0,0} + 5gG_{0,1} + 6gG_{1,0} + 7gB_{0,0} + 8gB_{0,1} + 9gB_{1,0},   (11.73)
ΛMLE[2] = 3gR_{0,0} + 2gR_{0,1} + 1gR_{1,0} + 6gG_{0,0} + 5gG_{0,1} + 4gG_{1,0} + 9gB_{0,0} + 8gB_{0,1} + 7gB_{1,0}.   (11.74)

Classification Rule: Choose the larger of ΛMLE[1] and ΛMLE[2]. Note that g_{1,1} is not used for any color, as expected, since both classes are zero at that pixel.

Concept Question 11-5: How do we estimate the image vectors f̄k[n, m] from the training image vectors fk^(i)[n, m]?

11-8 Unsupervised Learning and Classification

In unsupervised learning, we are given a set of I training images. We know nothing about what the images are supposed to represent, to what classes they belong, or even what the classes are. The goals of unsupervised learning and classification include:

(1) To identify classes from the training images.

(2) To classify each of the training images.

(3) To provide a simple rule for classifying any new image into one of the classes identified by the training images.

11-8.1 Singular Value Decomposition

Before diving into the mechanics of unsupervised learning and classification, we provide an overview of an important matrix tool known as the singular value decomposition (SVD).

The SVD is used to factorize an (M × N) matrix F into the product of two orthogonal matrices U and V, and one diagonal matrix S:

F = U S V^T.   (11.75)

The four matrices have the following attributes:

(1) F is any (M × N) matrix.

(2) U is an (M × M) orthogonal matrix, which means

U^T U = U U^T = IM,

where IM is an (M × M) identity matrix.

(3) V is an (N × N) orthogonal matrix:

V^T V = V V^T = IN.

(4) S is an (M × N) diagonal matrix of the following form:

(a) If M ≥ N:

S = [ diag[σj]
      0_{M−N, N} ],

where diag[σj] is an (N × N) diagonal matrix of the {σj}, and in this case F and S are called tall matrices.

(b) If M ≤ N:

S = [ diag[σj]   0_{M, N−M} ],

where diag[σj] is an (M × M) diagonal matrix of the {σj}, and in this case F and S are called reclining matrices.

Note that 0_{m,n} is an (m × n) matrix of zeros.

The {σj} are called singular values. The number of singular values is the smaller of M and N. Note that F and S both have the same size, (M × N). More information on the {σj} is presented shortly.

11-8.2 SVD Computation by Eigendecomposition

In this book, we do not prove the existence of the SVD, nor review algorithms for computing it.

 
However, we offer the following simple approach to computing the SVD of F, by noting that

F F^T = (U S V^T)(V S^T U^T) = (U S)(V^T V)(S^T U^T) = U diag[σj²] U^T   (11.76a)

and

F^T F = (V S^T U^T)(U S V^T) = (V S^T)(U^T U)(S V^T) = V diag[σj²] V^T.   (11.76b)

These relationships rely on the fact that U and V are orthogonal matrices: V^T V = U^T U = I.

As demonstrated shortly in the SVD example, the eigenvectors of F F^T constitute the columns of U, the eigenvectors of F^T F constitute the columns of V, and the {σj²} are the nonzero eigenvalues of F F^T and of F^T F. Given V and {σj²}, or U and {σj²}, we can compute U or V as follows:

U = F V diag[σj⁻¹],   (11.77a)
V^T = diag[σj⁻¹] U^T F,   (11.77b)

both of which follow directly from F = U S V^T.

◮ The SVD of a matrix A can be computed using MATLAB by the command [U,S,V]=svd(A); ◭

Example 11-7: Compute SVD

Matrix F is given by

F = [ 96  39 −40
     −72  52  30 ].   (11.78a)

Compute the SVD of F (a) by MATLAB and (b) by eigendecomposition of F F^T.

Solution: (a) The MATLAB command [U,S,V]=svd(F) yields

U = [ 4/5  3/5
     −3/5  4/5 ],   (11.78b)

S = [ 130   0   0
        0  65   0 ],   (11.78c)

V = [ 12/13   0   5/13
        0     1    0
     −5/13    0  12/13 ].   (11.78d)

The two singular values are σ1 = 130 and σ2 = 65. The singular values are usually put in decreasing order, σ1 > σ2 > · · ·, by reordering the columns of U and the rows of V^T.

(b) Using Eq. (11.78a), we compute

F F^T = [ 96  39 −40; −72  52  30 ] [ 96 −72; 39  52; −40  30 ] = [ 12337 −6084; −6084  8788 ].

Postmultiplying F F^T in Eq. (11.76a) by U and using U^T U = I gives

(F F^T) U = U diag[σj²].   (11.78e)

Next, let uj be the jth column of the (M × M) matrix U. Then the jth column of Eq. (11.78e) is

(F F^T) uj = σj² uj.   (11.78f)

This is because postmultiplying U by diag[σj²] multiplies the jth column of U by σj², and U is an orthogonal matrix characterized by ui^T uj = δ[i − j]. Equation (11.78f) states that uj is an eigenvector of F F^T with associated eigenvalue σj², which means that (F F^T − σj² I) is singular and therefore its determinant is zero:

det(F F^T − σj² I) = 0.   (11.78g)

The roots of this quadratic polynomial are σ1² and σ2², which can be computed by inserting the matrix F F^T computed earlier into Eq. (11.78g):

det( [ 12337 −6084; −6084  8788 ] − σj² [ 1 0; 0 1 ] ) = 0,

which leads to

(12337 − σj²)(8788 − σj²) − (−6084)² = 0.

The solution of this quadratic equation gives σ1 = 130 and σ2 = 65.

To find V, premultiply F = U S V^T first by U^T and then by diag[σj⁻¹]. Using U^T U = I, the process gives

Ṽ^T = diag[σj⁻¹] U^T F,   (11.78h)

where Ṽ^T comprises the first M rows of V^T. This is as expected: F = U S V^T uses only the first M rows of V^T. Since V must be orthogonal, the remaining rows of V^T must be chosen to be orthogonal to its first M rows. The remaining rows of V^T can be computed from its first M rows using, say, Gram-Schmidt orthonormalization.

The SVD of F can therefore be computed as follows:

1. Compute the eigenvalues and eigenvectors of F F^T.

2. The eigenvalues are the σi² and the eigenvectors are the ui.

3. Compute the first M rows Ṽ^T of V^T using Ṽ^T = diag[σi⁻¹] U^T F.

4. Compute the remaining rows of V^T using Gram-Schmidt orthonormalization.

Another way to compute the SVD is from

F^T F = (V S^T U^T)(U S V^T) = (V S^T)(U^T U)(S V^T) = V diag[σj², 0_{N−M}] V^T.   (11.78i)

Repeating the earlier argument, V is the matrix of eigenvectors, and the {σj²} are the nonzero eigenvalues, of the (N × N) matrix F^T F. Note that F^T F has N − M zero eigenvalues, since the rank of F^T F is M.

Returning to the example, inserting σ1 into Eq. (11.78f) gives

[ 12337 − 16900     −6084
      −6084     8788 − 16900 ] u1 = [ 0; 0 ],

which has the solution (after normalization so that u1^T u1 = 1)

u1 = [4/5, −3/5]^T.

Similarly, for σ2,

[ 12337 − 4225    −6084
      −6084    8788 − 4225 ] u2 = [ 0; 0 ]

has the solution (after normalization so that u2^T u2 = 1)

u2 = [3/5, 4/5]^T.

Using Eq. (11.78h), Ṽ^T is computed to be

Ṽ^T = [ 12/13  0  −5/13
          0    1     0  ].

The third row of V^T is computed to be orthogonal to the two rows of Ṽ^T using Gram-Schmidt, and is

v3^T = [5/13, 0, 12/13].

11-8.3 Interpretations of SVD

A. Rotations, Reflections, and Scalings

An orthogonal matrix can be interpreted as a rotation and/or a reflection of coordinate axes. A (2 × 2) orthogonal matrix can often be put into the form of Rθ defined in Eq. (3.38), which we repeat here as

Rθ = [ cos θ   sin θ
      −sin θ   cos θ ].   (11.79)

For example, by comparing the entries of U in Eq. (11.78b) with the entries of Rθ, we ascertain that U is a rotation matrix with θ = 36.87°. Similarly, V in Eq. (11.78d) represents a rotation matrix with θ = 22.62°.

Hence, the SVD of a matrix F can be interpreted as: (1) a rotation and/or reflection of axes, followed by (2) a scaling of the rotated axes, followed by (3) another rotation and/or reflection of axes.
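Example 11-7 can be checked numerically in a few lines. The assumed MATLAB session below verifies the factorization and reads off the rotation angle of U; note that svd() may return U and V with the signs of corresponding columns flipped, which leaves the product U*S*V' unchanged.

    % Assumed session verifying Example 11-7 and the rotation interpretation.
    F = [96 39 -40; -72 52 30];
    [U, S, V] = svd(F);
    disp(diag(S).')                    % singular values: 130  65
    disp(norm(U*S*V' - F))             % reconstruction error (near zero)
    theta_U = atan2d(U(1,2), U(1,1))   % about 36.87 degrees, cf. Eq. (11.79)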

B. Expansion in Orthonormal Vectors

Now we introduce another interpretation of the SVD, which will prove particularly useful in unsupervised learning. In the sequel, we assume that M < N, so F is a reclining matrix. Let us define:

(1) fi as the ith column of F, for i = 1, 2, . . . , N:

fi = [fi,1, fi,2, . . . , fi,M]^T;   (11.80)

(2) ui as the ith column of U, for i = 1, 2, . . . , M:

ui = [ui,1, ui,2, . . . , ui,M]^T;   (11.81)

and

(3) υi,j as the (i, j)th element of V. It follows that υj,i is the (i, j)th element of matrix V^T.

With these definitions, the ith column of the equation F = U S V^T can be rewritten as

fi = ∑_{j=1}^{M} uj (σj υi,j) = ci,1 u1 + ci,2 u2 + · · · + ci,M uM,   for i = 1, . . . , N,   (11.82)

where ci,j = σj υi,j. The coefficients {ci,j} are called the coordinates of vector fi in the basis {uj} of the M-dimensional space R^M.

For example, applying Eq. (11.82) to the SVD example of Eq. (11.78) gives

f3 = [−40; 30] = u1 (σ1 υ3,1) + u2 (σ2 υ3,2)
   = 130 (−5/13) [4/5; −3/5] + 65 (0) [3/5; 4/5],   (11.83)

with σ1 υ3,1 = −50 and σ2 υ3,2 = 0.

According to Eq. (11.82), each column of F can be written as a linear combination of the M columns {ui} of the orthogonal matrix U, with the coefficients in the linear combination being {σj υi,j}. Note that the final N − M rows of V^T are not used to compute F, because they multiply the zero part of the reclining matrix S.

The coefficients σj υi,j can be computed using

σj υi,j = (U^T fi)_j,   j = 1, . . . , M;  i = 1, . . . , N,   (11.84)

where (U^T fi)_j is the jth element of the column vector U^T fi. Equation (11.84) provides the (j, i)th element of either U^T F or S V^T. For example, for i = 3 and j = 1,

σ1 υ3,1 = [4/5, −3/5] [−40; 30] = −50;

and for i = 3 and j = 2,

σ2 υ3,2 = [3/5, 4/5] [−40; 30] = 0.

C. Comparison with Orthonormal Functions

The column vectors ui of the orthogonal matrix U are themselves orthonormal vectors. This is because U^T U = IM is equivalent to stating

ui^T uj = 1 if i = j,  and  ui^T uj = 0 if i ≠ j.   (11.85)

Equation (11.82) is the same as the expansion given in Eq. (7.4) for a function x[i], for all i, in orthonormal functions {φi(t)}. Furthermore, Eq. (11.84) is the linear-algebraic equivalent of the formula given by Eq. (7.6) for computing the coefficients {xi} in the orthonormal expansion in Eq. (7.4). The orthonormality of the vectors {ui} in U is the linear-algebraic equivalent of the orthonormality of the basis functions {φi[n]} in Eq. (7.5).

For convenience, we repeat Eqs. (7.4)–(7.6), modified to a finite number N of real-valued basis functions φi[n], coefficients xi, and times n, and with C = 1:

x[n] = ∑_{i=1}^{N} xi φi[n],   (11.86a)

δ[i − j] = ∑_{n=1}^{N} φi[n] φj[n],   (11.86b)

and

xi = ∑_{n=1}^{N} x[n] φi[n].   (11.86c)

11-8.4 Dimensionality Reduction

Let us arrange the singular values {σi} in decreasing order,

σ1 > σ2 > · · · > σM,

which can be accomplished by reordering the columns of U and the rows of V^T. Often there is a threshold singular value σT with the following properties:

(1) σ1 > σ2 > · · · > σT−1 > σT, and

(2) σT ≫ σT+1 > σT+2 > · · · > σM.

Thus, all singular values with index larger than T are much smaller than σT, and they can therefore be ignored in Eq. (11.82). The truncated SVD becomes

fi ≈ ∑_{j=1}^{T} uj (σj υi,j).   (11.87)

The summation in Eq. (11.87) provides a reasonable approximation so long as σT is much larger than the singular values with higher indices. In terms of orthogonal functions, this is analogous to truncating a Fourier series expansion to T terms; the resulting sum is often a good approximation to the function.

The significance of Eq. (11.87) is that each column fi of F can now be approximated well by only T of the M terms in the orthonormal expansion given by Eq. (11.82). Each fi is represented using T coefficients:

fi ≈ ci,1 u1 + ci,2 u2 + · · · + ci,T uT,   (11.88)

which contains only the first T terms in Eq. (11.82).

Replacing the expansion in Eq. (11.82) with the truncated expansion in Eq. (11.87) is called dimensionality reduction. This not only reduces the amount of computation, but also allows visualization of the image classes in a T-D subspace instead of the M-D space R^M. The dimensionality reduction process is illustrated through forthcoming examples.

Concept Question 11-6: Why do we set small singular values to zero?

11-9 Unsupervised Learning Examples

In unsupervised learning, we are given a set of I training images {f^(i)[n, m], i = 1, . . . , I}. Even though we have no information about the contents of the I images, the goal is to partition the images into L classes {fk[n, m], k = 1, . . . , L}, as was done previously in Sections 11-1 to 11-4, but now determined by the images themselves. If we are then given an additional image gobs[n, m], we must provide a simple rule for classifying it into one of the L classes.

We provide two simple examples to illustrate how the SVD is used in unsupervised learning and classification. In the first example, the images are intentionally chosen to be small in size (2 × 2) to allow us to carry out the computations manually. A total of 5 training images and 1 observation image are involved. The experience gained from the first example then allows us to consider a second example comprised of 24 training images, each (3 × 3) in size, and 1 observation image of the same size.

Notation for Unsupervised Training and Classification

f^(i)[n, m] = (M × N) ith training image, with class unknown, {i = 1, . . . , I}
I = number of training images
L = number of classes
gobs[n, m] = observed image of unknown class
N′ = MN = number of pixels per image
f^(i) = ith training image vector, generated by column-unwrapping f^(i)[n, m] into
  [f^(i)[0, 0], . . . , f^(i)[0, M − 1], . . . , f^(i)[1, 0], . . . , f^(i)[1, M − 1], . . . , f^(i)[N − 1, 0], . . . , f^(i)[N − 1, M − 1]]^T
F = [f^(1) f^(2) . . . f^(I)] = training matrix whose columns are the I vectors representing images f^(i)[n, m], {i = 1, 2, . . . , I}; the size of F is N′ × I, where N′ = MN
{σj} = singular values of the SVD representation
U and V = SVD orthogonal matrices
S = SVD diagonal matrix
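A minimal MATLAB sketch of this SVD-based dimensionality reduction, under the assumption that the training images are stacked in an (M × N × I) array named imgs, is as follows (the function name and interface are illustrative):

    % Sketch of Eqs. (11.87)-(11.88): assemble the training matrix,
    % truncate the SVD to T terms, and read off subspace coordinates.
    function [U_T, C] = reduce_dim(imgs, T)
        [M, N, I] = size(imgs);
        F = reshape(imgs, M*N, I);    % column-unwrapped training matrix
        [U, S, V] = svd(F);
        U_T = U(:, 1:T);              % retained orthonormal basis vectors
        C = U_T.' * F;                % (T x I) coordinates c_{i,j}, Eq. (11.84)
    end

Each column of C gives the T coordinates of one training image in the reduced subspace; the same projection U_T.'*g applies to a new observation vector g.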

 
11-9.1 Unsupervised Learning Example with 5 Training Images

In this simple example, we are given I = 5 training images {f^(i)[n, m]}, identified by superscript i, with i = 1, 2, . . . , 5. All 5 images are (2 × 2) in size. We also are given an observation image gobs[n, m]. Our goal is to use the training images to determine the distinct classes contained in the training images, using unsupervised learning, and then to classify gobs[n, m] into one of those classes.

The six (2 × 2) images are given by

f^(1)[n, m] = [1.1  0.1; 0.1  1.0],   (11.89a)
f^(2)[n, m] = [0.1  1.0; 1.0  0.1],   (11.89b)
f^(3)[n, m] = [1.0  0.1; 0.0  0.9],   (11.89c)
f^(4)[n, m] = [0.0  0.9; 1.1  0.1],   (11.89d)
f^(5)[n, m] = [1.1  1.0; 0.9  1.0],   (11.89e)
gobs[n, m] = [1  1; 1  1].   (11.89f)

A. Training Matrix

We unwrap the f^(i)[n, m] by columns, as in MATLAB's F(:), and assemble the unwrapped images into a (4 × 5) training matrix F. The ith column of the training matrix F is the unwrapped vector image f^(i):

F = [f^(1) f^(2) f^(3) f^(4) f^(5)]
  = [ 1.1  0.1  1.0  0.0  1.1
      0.1  1.0  0.0  1.1  0.9
      0.1  1.0  0.1  0.9  1.0
      1.0  0.1  0.9  0.1  1.0 ].   (11.90)

Using the recipe given in Section 11-8.2 or the MATLAB command [U,S,V]=svd(F), we obtain the following matrices:

U = [ 0.54 −0.52 −0.20 −0.63
      0.47  0.56  0.63 −0.26
      0.49  0.48 −0.69  0.25
      0.50 −0.44  0.29  0.69 ],   (11.91a)

S = [ 2.94  0     0     0     0
      0     1.86  0     0     0
      0     0     0.14  0     0
      0     0     0     0.02  0 ],   (11.91b)

V = [ 0.40 −0.49  0.45 −0.63  0.01
      0.36  0.51 −0.38 −0.44 −0.53
      0.35 −0.46 −0.06  0.54 −0.61
      0.34  0.54  0.70  0.31 −0.01
      0.68 −0.01 −0.39  0.17  0.59 ].   (11.91c)

It is apparent from matrix S that both σ1 = 2.94 and σ2 = 1.86 are much larger than σ3 = 0.14 and σ4 = 0.02. Hence, we can truncate the SVD to T = 2 in Eq. (11.88), which gives

f^(1) ≈ u1 (σ1 υ1,1) + u2 (σ2 υ1,2) = (2.94 × 0.40) u1 + (1.86 × (−0.49)) u2 = 1.19 u1 − 0.90 u2,   (11.92a)

where u1 and u2 are the first and second columns of U. Similarly, application of Eq. (11.88) for i = 2, 3, 4, and 5 yields

f^(2) ≈ (σ1 υ2,1) u1 + (σ2 υ2,2) u2 = 1.06 u1 + 0.94 u2,   (11.92b)
f^(3) ≈ (σ1 υ3,1) u1 + (σ2 υ3,2) u2 = 1.04 u1 − 0.86 u2,   (11.92c)
f^(4) ≈ (σ1 υ4,1) u1 + (σ2 υ4,2) u2 = 1.01 u1 + 1.00 u2,   (11.92d)
f^(5) ≈ (σ1 υ5,1) u1 + (σ2 υ5,2) u2 = 2.00 u1 − 0.02 u2.   (11.92e)

All five images assume the form of Eq. (11.88):

f^(i) ≈ ci,1 u1 + ci,2 u2.

With u1 and u2 as orthogonal dimensions, ci,1 and ci,2 are the coordinates of image f^(i) in (u1, u2) space.

B. Subspace Representation

If we regard u1 and u2 in Eq. (11.92) as orthogonal axes, then f^(1), the dimensionally reduced representation of training image 1, has coordinates (1.19, −0.90), as depicted in Fig. 11-12. Similar assignments are made to the other four training images. The five symbols appear to be clustered into three image classes, centered approximately at coordinates {(1, 1), (1, −1), (2, 0)}. We assign:

Class #1: the cluster centered at (1, 1),
Class #2: the cluster centered at (1, −1),
Class #3: the cluster centered at (2, 0).

Given these three clusters, we divide the (u1, u2) domain into the three regions shown in Fig. 11-12.

C. Classification of Observation Image

For the observation image defined by Eq. (11.89f), we need to determine its equivalent coordinates (g1, g2) in (u1, u2) space. We do so by applying the following recipe:

1. Unwrap gobs[n, m] by columns to form vector g, which in the present case gives

g = [1, 1, 1, 1]^T.   (11.93)

Figure 11-12 Depiction of the 2-D subspace spanned by u1 and u2. The blue symbols represent the columns of the training matrix (i.e., the training images f^(i)[n, m]); they cluster into 3 classes. The red symbol represents the observation image gobs[n, m].

2. Use the form of Eq. (11.84)—with fi replaced with g and ci,j = σj υi,j replaced with gj—to compute the coordinates g1 to g4:

[g1; g2; g3; g4] = U^T g
 = [ 0.54  0.47  0.49  0.50
    −0.52  0.56  0.48 −0.44
    −0.20  0.63 −0.69  0.29
    −0.63 −0.26  0.25  0.69 ] [1; 1; 1; 1] = [2.0; 0.08; 0.03; 0.05].   (11.94)

3. From the result given by Eq. (11.94), we deduce that the reduced-coordinate representation of gobs[n, m] in (u1, u2) space is (g1, g2) = (2.0, 0.08).

4. These coordinates place g in Class 3 (Fig. 11-12).

For new observations with coordinates (g1, g2), the class regions in Fig. 11-12 correspond to the following classification scheme:

(a) Class #1 if g2 > 0 and g2 − g1 > −1.5,
(b) Class #2 if g2 < 0 and g2 + g1 < 1.5,
(c) Class #3 if g1 > 1.5 and g1 − 1.5 > g2 > −(g1 − 1.5).

11-9.2 Unsupervised Learning Example with 24 Training Images

Now that we understand how to use the SVD to truncate and classify simple (2 × 2) images, let us consider a slightly more elaborate example involving 24 training images, each (3 × 3) in size. The images are displayed in Fig. 11-13(a), and the goal is to develop a classification scheme on the basis of unsupervised learning. The procedure is summarized as follows:

1. Each of the 24 images is unwrapped by columns into a (9 × 1) vector.

2. The 24 vectors are assembled into a (9 × 24) training matrix F.

3. MATLAB is used to compute the matrices U, S, and V of the SVD of F, namely F = U S V^T.

4. From S, the diagonal elements, corresponding to the singular values σ1 through σ9, are read off and then plotted to determine to how many dimensions the images can be reduced within a reasonable degree of approximation. The plotted values of σi for the images in Fig. 11-13(a) are shown in Fig. 11-13(b). Because σ1 and σ2 are much larger than the remaining singular values, we can approximate the images by reducing the dimensionality from M = 9 down to T = 2.

5. The reduced images can be represented in terms of coefficients similar to the form in Eq. (11.92). These coefficients become coordinates, leading to the 24 "+" symbols in Fig. 11-13(c).

6. Based on the clusters of points in Fig. 11-13(c), the (u1, u2) space is divided into three classes. For an observation image with an observation vector g:

(a) Class #1 if g2 > 0 and g2 − g1 > −2,
(b) Class #2 if g2 < 0 and g2 + g1 < 2,
(c) Class #3 if g1 > 2 and g1 − 2 > g2 > −(g1 − 2).

While the two examples presented in this subsection have served to illustrate the general approach to unsupervised learning and classification, the image scenarios were limited to small-size images, the number of reduced dimensions T was only 2 in both cases, and the clustering of points was fairly obvious, making the development of the classification schemes rather straightforward and convenient. How should we approach the general case, where the training images are larger than (3 × 3), the number of reduced dimensions T is greater than 2, and the points are not clustered into clearly separate groups? The answer is:

(a) Neither the number of available training images nor their size has any effect on the SVD procedure outlined earlier in this subsection. We used small-size images simply for the convenience of writing their matrices on paper.

(b) For a reduced number of dimensions T = 2, it is easy to generate 2-D clusters. For T = 3, we can generate and display 3-D clusters on a 2-D page, but the scales get distorted. For T ≥ 4, we can no longer rely on our visual perspective to identify clusters and define classification regions. We therefore need mathematical tools to help us cluster the training images into any specified number of classes and in any number of dimensions, and to do so in accordance with an objective criterion. One such tool is the K-Means clustering algorithm and its associated Voronoi sets.

Figure 11-13 (a) The 24 training images, each (3 × 3) in size; (b) the singular values σi of the training matrix; (c) depiction of the 2-D subspace spanned by u1 and u2, in which the blue symbols, representing the individual columns of the training matrix, cluster into 3 classes.

11-10 K-Means Clustering Algorithm

11-10.1 Voronoi Sets

Let us start with a review and update of our notation:

I: number of training images
(M × N): size of each image, in pixels
f^(i)[n, m]: ith image
f^(i): (N′ × 1) column vector generated by column-unwrapping image f^(i)[n, m] into a column of length N′ = MN
L: specified number of classes into which the training images are to be clustered
N′ = MN: number of dimensions of the space R^{N′} in which SVD clustering has to occur, assuming no dimensionality reduction is used
T: number of dimensions if dimensionality reduction is justified

For many realistic images, M and N may each be on the order of 1000, and therefore N′ = MN may be on the order of 10⁶. Even if the number of dimensions is reduced to a smaller number T, it remains much larger than the 3-D space we can manage visually. The material that follows is applicable for clustering images using the SVD in N′-dimensional space, as well as in T-dimensional space if dimensionality reduction has been applied.

Given I images, defined in terms of their unwrapped column vectors {f^(i), i = 1, 2, . . . , I}, we wish to cluster them in R^{N′} space into L classes pre-specified by centroids at locations {µp, p = 1, 2, . . . , L}. With each image represented by a point in R^{N′} space, the sets of points obtained by partitioning R^{N′} into regions such that each region consists of the set of points closest to one of the specified centroids are called Voronoi sets (or regions), named for Georgy Voronoy. Figure 11-14 shows a 2-D space divided into 8 Voronoi regions, each identified by a centroid. The boundaries of each region are defined such that every point within that region is closer to the centroid of that region than to any of the other centroids. The distance between two

points, whether in 2-D or N′-D space, is the Euclidean distance defined in accordance with Eq. (7.117b). In the present context, for a test image f^(i) and a centroid µp of region p, defined by

f^(i) = [fi,1, fi,2, . . . , fi,N′]^T   (11.95a)

and

µp = [µp,1, µp,2, . . . , µp,N′]^T,   (11.95b)

the Euclidean distance between them is

ℓi,p = [ ∑_{j=1}^{N′} |fi,j − µp,j|² ]^{1/2}.   (11.96)

After computing ℓi,p for all L regions, image f^(i) is assigned to the region for which ℓi,p is the smallest. The boundaries in Fig. 11-14 are established by exercising such an assignment for every point in the space R^{N′}.

Figure 11-14 Voronoi regions surrounding centroids µ1 through µ8.

11-10.2 K-Means Algorithm

The boundaries of the Voronoi regions, such as those in Fig. 11-14, are established by the locations of the pre-specified centroids {µp, p = 1, 2, . . . , L}. In unsupervised learning and classification, no information is available on the locations of those centroids, so we need an algorithm to create them. The K-Means algorithm is a procedure for clustering I points—i.e., I images defined in terms of their vectors {f^(i), i = 1, 2, . . . , I}—into L clusters of points in R^{N′}, where N′ is the column length of the image vectors. We still have to specify the number of classes L, but not the locations of their centroids. The procedure is iterative and consists of the following steps:

1. Choose an initial set of L clustering centroids {µp^(0), p = 1, 2, . . . , L} at random. The superscript of each µp^(0) denotes that it is the initial choice (zeroth iteration).

2. Compute the L Voronoi sets (regions) using the I available image vectors, by measuring the distance between each image vector f^(i) and each centroid µp^(0) and then assigning f^(i) to the nearest centroid.

3. For each Voronoi set p, use the locations of the image vectors assigned to it to compute their combined centroid, and label it µp^(1) (first iteration). The centroid of a Voronoi set is the mean (average) location of the image vectors contained within that set. For purposes of illustration, if Voronoi set 3 contains the (4 × 1) image vectors f^(2), f^(5), and f^(8), and they are given in SVD format (Eq. (11.82)) by

f^(2) = 4u1 + 3u2 + 6u3 + (0)u4,
f^(5) = 2u1 + (0)u2 + 3u3 + 2u4,
f^(8) = (0)u1 + (0)u2 + 3u3 + 7u4,

where u1 to u4 are the column vectors of matrix U, as defined in Section 11-8.3, then the new centroid µ3^(1) is given by

µ3^(1) = (1/3)[(4 + 2 + 0)u1 + (3 + 0 + 0)u2 + (6 + 3 + 3)u3 + (0 + 2 + 7)u4]
       = 2u1 + u2 + 4u3 + 3u4.

Here, µ3^(1) defines the location of the first iteration of centroid 3.

4. Repeat steps 2 and 3, incrementing the iteration superscript on each µp by 1.

5. Continue the iterative process until the algorithm has converged per a defined criterion regarding the average distance between centroids of successive iterations.
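The five steps above translate into a compact MATLAB loop. The sketch below is a minimal, assumed implementation (variable names, the random initialization, and the fixed iteration count are choices of the sketch); it operates on data arranged with one image vector per row.

    % Minimal sketch of the K-Means iteration of Section 11-10.2.
    % X : (I x Np) matrix whose rows are unwrapped image vectors
    % L : number of classes;  niter : number of iterations to run
    function [mu, idx] = kmeans_sketch(X, L, niter)
        I = size(X, 1);
        mu = X(randperm(I, L), :);        % step 1: random initial centroids
        for it = 1:niter
            % step 2: assign each vector to its nearest centroid, Eq. (11.96)
            d = zeros(I, L);
            for p = 1:L
                d(:, p) = sum((X - mu(p, :)).^2, 2);
            end
            [~, idx] = min(d, [], 2);
            % step 3: recompute each centroid as the mean of its Voronoi set
            for p = 1:L
                if any(idx == p)
                    mu(p, :) = mean(X(idx == p, :), 1);
                end
            end
        end
    end

A production implementation would replace the fixed iteration count with the convergence test of step 5.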

Example 11-8: K-Means Clustering Algorithm

The K-Means algorithm was applied to the N data points shown in Fig. 11-15(a), which were generated using two sets of random points. The algorithm was initialized using the K-Means++ algorithm, which works as follows:

• Choose one data point at random. Call this point x0, the centroid for the first cluster.

• Choose another data point xn at random. The probability that data point xn is chosen is proportional to the squared distance between xn and x0. Specifically,

P[xn] = ||xn − x0||₂² / ∑_{i=1}^{N} ||xi − x0||₂².

Thus, points far away from the first centroid x0 are more likely to be chosen. Point xn is assigned as the centroid for the second cluster.

Results of different iterations of the K-Means algorithm are shown in parts (b) through (e) of Fig. 11-15, after which the algorithm appears to have converged.

The K-Means algorithm was run using the MATLAB command idx=kmeans(X,k), available in MATLAB's Statistics and Machine Learning Toolbox. Here, X is the (N × 2) array of data-point coordinates, k is the number of clusters (which must be set ahead of time; here k=2), and idx is the (N × 1) column vector of indices of the clusters to which the data points, whose coordinates are in each row of X, are assigned. Each element of idx is an integer between 1 and k. For example, the data point whose coordinates are [X(3,1),X(3,2)] is assigned to cluster #idx(3).

kmeans has many options. For information, use help kmeans.

Concept Question 11-7: What is the K-Means algorithm for?
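A setup like that of Fig. 11-15 can be reproduced with a few lines of MATLAB. Everything in the snippet below (the blob locations, the random seed, and the use of the 'Start','plus' option to request K-Means++ initialization) is an assumption of this sketch; kmeans and gscatter require the Statistics and Machine Learning Toolbox.

    % Two synthetic Gaussian blobs, clustered with the built-in kmeans.
    rng(1);
    X = [randn(100,2) + [2 0]; randn(100,2) - [1 0]];
    [idx, C] = kmeans(X, 2, 'Start', 'plus');   % K-Means++ initialization
    gscatter(X(:,1), X(:,2), idx); hold on
    plot(C(:,1), C(:,2), 'kx', 'MarkerSize', 12, 'LineWidth', 2)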

Figure 11-15 (a) Random data points to be classified, and results of K-Means clustering after (b) 1 iteration, (c) 2 iterations, (d) 3 iterations, and (e) 4 iterations; each panel marks the points assigned to cluster 1 and cluster 2, along with the centroids.

Summary

Concepts

• Classification is often performed by choosing the image with the largest correlation with the observed image.

• MLE classification is performed by subtracting each image energy from double the correlation with the observed image.

• MAP classification is performed by subtracting each image energy from double the correlation with the observed image, and adding the logarithm of the a priori probability of the image times double the noise variance.

• Classification of an image with an unknown shift can be performed using cross-correlation instead of correlation. Classification of an image with an unknown scaling can be performed by cross-correlation on logarithmically warped spatial coordinates.

• Jointly Gaussian color images can be classified using labelled training images to estimate the mean and covariance matrix of each image class, and using these estimates in vector versions of the MLE and MAP classifiers.

• Unsupervised learning can be performed using unlabeled training images, by unwrapping each image into a column vector, assembling these into a training matrix, computing the SVD of the training matrix, setting small singular values to zero, and regarding the orthonormal vectors associated with the remaining singular values as subspace bases. Each training image is a point in this subspace. The points can be clustered using the K-Means algorithm, if necessary.

Mathematical Formulae

Correlation: ρk = ∑_{n=0}^{N−1} ∑_{m=0}^{M−1} gobs[n, m] fk[n, m]

MAP classifier: ΛMAP[k] = 2ρk − Efk + 2σv² log p[k]

Cross-correlation for unknown shift: ρk[n, m] = gobs[n, m] ∗∗ fk[−n, −m]

Singular value decomposition (SVD): F = U S V^T

Expansion in orthonormal vectors: fi = ∑_{j=1}^{T} uj (σj υi,j)

Coefficients of orthonormal vectors: ci,j = σj υi,j = (U^T fi)_j

Important Terms (provide definitions or explain the meaning of the following terms): correlation, cross-correlation, dimensionality reduction, K-Means algorithm, orthonormal vectors, recognition, subspace, SVD, training matrix, unsupervised learning

PROBLEMS ρ̃k = ∑ ∑ g̃obs [n, m] f˜k [n, m], where

Section 11-1: Image Classification by Correlation gobs [n, m]


g̃obs [n, m] = q
∑ ∑ g2obs [n, m]
11.1 A problem with using correlation ρk (defined in Eq. and
(11.2)) to compare an observed image gobs [n, m] with each one fk [n, m]
of a set of reference images { fk [n, m]} is that the fk [n, m] with f˜k [n, m] = q .
the largest pixel values tends to have the largest ρk . We can ∑ ∑ fk2 [n, m]
avoid this problem as follows. Define the correlation coefficient
PROBLEMS 385

 
(a) Expand 0 ≤ ∑ ∑(g̃obs [n, m] ± f˜k [n, m])2 and show that f2 [n, m] =
2 5
,
|ρ̃k | ≤ 1. 1 7
 
(b) If gobs [n, m] = a fk [n, m] for any constant a > 0, show that 1 2
ρ̃k = 1. f3 [n, m] = ,
5 7
11.2 In Section 11-2, let the reference images { fk [n, m]} all be
energy-normalized to using
(a) MLE classification;
fk [n, m] (b) MAP classification with a priori probabilities p[1] = 0.1,
f˜k [n, m] = q ,
∑ ∑ fk2 [n, m] p[2] = 0.3, p[3] = 0.6. The noise variance is σv2 = 10.

and the observed image be g[n, m] = f˜k [n, m] + v[n, m], where Section 11-4: Classification of Spatially Shifted
v[n, m] is a zero-mean white Gaussian noise random field with Images
variance σv2 . This is the problem considered in Section 11-2,
except with energy-normalized reference images. Show that the 11.5 Download text.jpg and text.m from the book web-
MLE classifier is now the value of k that maximizes the correla- site. These were used in Example 11-2 to find all occurrences
tion coefficient ρ̃k = ∑ ∑ g̃obs [n, m] f˜k [n, m] between g[n, m] and of “e” in text.jpg. Modify text.m to find all occurrences
fk [n, m]. of “a” in text.jpg. You will need to (i) find a subset of the
image containing “a” instead of “e” and (ii) change the threshold
Section 11-3: Classification by MAP using trial-and-error. Produce a figure like Fig. 11-11(c).
11.6 Download text.jpg and text.m from the book web-
11.3 Determine a rule for classifying the image site. These were used in Example 11-2 to find all occurrences of
  “e” in text.jpg. Modify text.m to find all occurrences of
g0,0 g1,0
gobs [n, m] = “m” in text.jpg. You will need to (i) find a subset of the im-
g0,1 g1,1
age containing “m” instead of “e” and (ii) change the threshold
into one of the three classes using trial and error. Produce a figure like Fig. 11-11(c).
  11.7 To machine-read numerals, such as the routing and
1 2
f1 [n, m] = , account numbers on the check in Fig. 11-1, it is necessary
3 4
  to segment (chop up) the number into individual numerals,
4 3 and then use one of the methods of Sections 11-1 through
f2 [n, m] = ,
2 1 11-3 to identify each individual numeral. Segmentation is often
  performed by the hardware (camera) used to read the number.
3 1
f3 [n, m] = , The program bank1.m reads the image of the bank font
2 4
numerals in Fig. 11-2(a), segments the image into (30 × 18) im-
using ages of each numeral, reassembles the images of numerals back
into the sequence of numerals in Fig. 11-2(a), and machine-
(a) MLE classification;
reads the result, using correlation, into a sequence of numbers
(b) MAP classification with a priori probabilities p[1] = 0.1, that represent the numerals in Fig. 11-2(a).
p[2] = 0.3, p[3] = 0.6. The noise variance is σv2 = 10. Modify bank1.m so that it machine reads the account
11.4 Determine a rule for classifying the image number in Fig. 11-1. This requires assembling the images of
  numerals into the sequence of numerals in the account number
g g1,0 in Fig. 11-1. Plot the computed numerals in a stem plot. Figure
gobs [n, m] = 0,0 11-2(a) is stored in bank.png.
g0,1 g1,1

into one of the three classes 11.8 To machine-read numerals, such as the routing and
  account numbers on the check in Fig. 11-1, it is necessary
7 1 to segment (chop up) the number into individual numerals,
f1 [n, m] = , and then use one of the methods of Sections 11-1 through
5 2
386 CHAPTER 11 IMAGE RECOGNITION

11-3 to identify each individual numeral. Segmentation is often images of each numeral, decimates each numeral by (2 × 2),
performed by the hardware (camera) used to read the number. reassembles the images of decimated numerals into the sequence
The program bank1.m reads the image of the bank font of numerals in Fig. 11-2(a), uses logarithmic spatial scaling
numerals in Fig. 11-2(a), segments the image into (30 × 18) im- to transform the problem into a shifted images problem, and
ages of each numeral, reassembles the images of numerals back machine-reads the result, using correlation, into a sequence of
into the sequence of numerals in Fig. 11-2(a), and machine- numbers that represent the numerals in Fig. 11-2(a)
reads the result, using correlation, into a sequence of numbers Modify scale1.m so that it machine-reads the routing
that represent the numerals in Fig. 11-2(a). number in Fig. 11-1. This requires assembling the images of
Modify bank1.m so that it machine-reads the routing num- numerals into the sequence of numerals in the routing number
ber in Fig. 11-1. This requires assembling the images of nu- in Fig. 11-1. Plot the computed numerals in a stem plot. Figure
merals into the sequence of numerals in the routing number in 11-2(a) is stored in bank.png.
Fig. 11-1. Plot the computed numerals in a stem plot. Figure
11-2(a) is stored in bank.png.
Section 11-7: Classification of Color Images
Section 11-5: Classification of Spatially Scaled
Images 11.11 Show that if there is no randomness in image color
components, i.e., Kk = 0, that Eq. (11.61) for MAP classification
11.9 To machine-read numerals, such as the routing and of color images reduces to Eq. (11.13) for MAP classification of
account numbers on the check in Fig. 11-1, it is necessary grayscale images summed over all three color components.
to segment (chop up) the number into individual numerals,
and then use one of the methods of Sections 11-1 through 11.12 Determine a rule for classifying the (2 × 2) color image
11-3 to identify each individual numeral. Segmentation is often  
g g0,1,R
performed by the hardware (camera) used to read the number. gobs,R = 0,0,R ,
g1,0,R g1,1,R
But the numbers may have a different size from the stored  
numerals, in which case the numbers are spatially scaled. g g0,1,G
gobs,G = 0,0,G ,
The program scale1.m reads the image of the bank font g1,0,G g1,1,G
numerals in Fig. 11-2(a), segments the image into (30 × 18)  
g g0,1,B
images of each numeral, decimates each numeral by (2 × 2), gobs,B = 0,0,B ,
g1,0,B g1,1,B
reassembles the images of decimated numerals into the sequence
of numerals in Fig. 11-2(a), uses logarithmic spatial scaling into one of the two (2 × 2) color image classes
to transform the problem into a shifted images problem, and
machine-reads the result, using correlation, into a sequence of      
1 2 5 6 9 1
numbers that represent the numerals in Fig. 11-2(a) f1,R = , f1,G = , f1,B = ,
3 4 7 8 2 3
Modify scale1.m so that it machine-reads the account      
number in Fig. 11-1. This requires assembling the images of 4 3 8 7 3 2
f2,R = , f2,G = , f2,B = ,
numerals into the sequence of numerals in the account number 2 1 6 5 1 9
in Fig. 11-1. Plot the computed numerals in a stem plot. Figure
11-2(a) is stored in bank.png. using (assume K1 = K2 = I throughout)

11.10 To machine-read numerals, such as the routing and (a) MLE classification;
account numbers on the check in Fig. 11-1, it is necessary (b) MAP classification with a priori probabilities p[1] = 0.2,
to segment (chop up) the number into individual numerals, p[2] = 0.8, and σv2 = 0.
and then use one of the methods of Sections 11-1 through
11-3 to identify each individual numeral. Segmentation is often 11.13 Determine a rule for classifying the (2 × 2) color image
performed by the hardware (camera) used to read the number.  
g g0,1,R
But the numbers may have a different size from the stored gobs,R = 0,0,R ,
numerals, in which case the numbers are spatially scaled. g1,0,R g1,1,R
 
The program scale1.m reads the image of the bank font g g0,1,G
numerals in Fig. 11-2(a), segments the image into (30 × 18) gobs,G = 0,0,G ,
g1,0,G g1,1,G
PROBLEMS 387

 
g
gobs,B = 0,0,B
g0,1,B
, Section 11-8: Unsupervised Learning and
g1,0,B g1,1,B Classification
into one of the two (2 × 2) color image classes
      11.16 Show that the approximations in the truncated expan-
7 1 8 6 9 7 sions of Eq. (11.92)(a-e) do a good job of approximating the
f1,R = , f1,G = , f1,B = ,
2 4 4 2 5 3 columns of the training matrix F in Eq. (11.90).
     
4 2 2 4 3 5
f2,R = , f2,G = , f2,B = ,
1 7 6 8 7 9 Section 11-9: Unsupervised Learning Examples
using (assume K1 = K2 = I throughout)
(a) MLE classification; 11.17 We are given the eight (2 × 2) training images
   
(b) MAP classification with a priori probabilities p[1] = 0.2, (1) 1.1 0.1 (2) 0.9 0.0
p[2] = 0.8, and σv2 = 0. f = , f = ,
0.1 1.1 0.1 0.9
   
11.14 Following Example 11-5, determine an MLE rule for 1.1 0.1 0.9 0.0
f(3) = , f(4) = ,
classifying a (1 × 1) color image [gR , gG , gB ]T into two classes, 0.0 0.9 0.0 0.1
   
determined from the following training images: 0.1 1.1 0.0 0.9
For class #1: f(5) = , f(6) = ,
1.1 0.1 0.9 0.1
           
0 0 8 8 0.1 1.1 0.0 0.9
f(7) = , f(8) = .
f11 = 0 , f21 = 8 , f31 = 8 , f41 = 8 . 0.9 0.0 1.1 0.0
8 8 0 8
Use unsupervised learning to determine classes for these eight
For class #2: training images, and a rule for classifying the observed image
         
4 4 12 12 0.0 1.1
f12 =  4  , f22 = 12 , f32 = 12 , f42 = 12 . gobs = .
1.1 0.0
12 12 4 12

Assume no additive noise and equal a priori probabilities 11.18 We are given the eight (2 × 2) training images
p[1] = p[2] = 0.5.    
(1) 0.9 0.1 (2) 1.1 0.0
11.15 Following Example 11-5, determine an MLE rule for f = , f = ,
0.9 0.1 1.1 0.1
classifying a (1 × 1) color image [gR , gG , gB ]T into two classes,    
0.9 0.1 1.1 0.0
determined from the following training images: f(3) = , f(4) = ,
1.1 0.0 0.9 0.0
For class #1:    
        0.1 0.9 0.0 1.1
1 5 3 3 f(5) = , f(6) = ,
0.1 0.9 0.1 1.1
f11 = 3 , f21 = 3 , f31 = 5 , f41 = 1 .    
5 1 7 5 0.1 0.9 0.0 1.1
f(7) = , f(8) = .
0.0 1.1 0.0 0.9
For class #2:
        Use unsupervised learning to determine classes for these eight
2 6 4 4 training images, and a rule for classifying the observed image
f12 = 4 , f22 = 4 , f32 = 6 , f42 = 2 .  
6 2 8 6 0.0 1.1
gobs = .
0.0 1.1
Assume no additive noise and equal a priori probabilities
p[1] = p[2] = 0.5.
388 CHAPTER 11 IMAGE RECOGNITION

Section 11-10: Clustering


(i) (i)
11.19 We are given a set of N points {( f1 , f2 ), i = 1, . . . , N}
in the plane R 2 . Prove that the point ( f1 , f2 ) minimizing
N
(i) (i)
∑ ||( f1 , f2 ) − ( f1 , f2 )||22
i=1

(i) (i)
is the centroid of the N points {( f1 , f2 ), i = 1, . . . , N}. This is
how centroids are determined in the K-Means algorithm. Note
that the quantity to be minimized is the sum of squared distances
from the points to the centroid, not the sum of distances.
Chapter 12
12 Supervised Learning and
Classification
Contents
Overview, 390 0
12-1 Overview of Neural Networks, 390 1
12-2 Training Neural Networks, 396 2
12-3 Derivation of Backpropagation, 403 3
12-4 Neural Network Training Examples, 404
4
Problems, 408 784 terminals
5

Objectives 7

Learn to: 9

■ Recognize the difference between unsupervised and


supervised learning.
Input terminals Hidden layer Output layer

■ Use a perceptron to implement classification with a


single separating hyperplane. Neural networks and machine learning have become
important tools in signal and image processing and
■ Use a hidden layer of perceptrons to implement more computer science. No book on image or signal
complicated classification. processing is complete without some coverage of
these, which have revolutionized society. A neural
■ Train a neural network using backpropagation. network can be trained, using sets of labeled
training images, to recognize digits or faces. This
chapter presents a very brief introduction to neural
networks for image classification.
Overview or neurons. The network of processors is called a neural net-
work. The act of using the labeled training images to compute
Having covered image classification algorithms based on un- the weights (gains) in the neural network is called training the
supervised training in the preceding chapter, we now switch neural network. The trained neural network accepts as input the
our attention to algorithms that rely on supervised training pixel values of a new image to be classified, and outputs the class
instead. To clearly distinguish between the structures of the two to which the new image belongs.
approaches, we begin this overview with a short summary of the A good example of a supervised learning problem is optical
former and an overview of the latter. character recognition of handwritten zip codes. A zip code
is a sequence of five or nine digits used to identify letter
and package destinations for the U.S. Postal System (USPS).
A. Summary of Unsupervised Learning But handwritten zip codes seldom resemble specific images
In unsupervised learning and classification, we are given a set of digits, because people write digits in different ways and
of I training images, each containing an image of one class (such styles. Supervised learning can be used to recognize handwritten
as a numeral or a letter) from among L possible classes, but zip codes by using as training images a labeled database of
the identities of the image classes are not known to us. The handwritten digits maintained by the U.S. National Institute of
goal is to cluster the I images into L domains and to develop Standards and Technology (NIST) and labeled by the actual
a classification scheme for assigning new observation images to digit each handwritten digit represents, as identified by a human
the correct class. judge. Supervised learning has been applied to optical character
Each (M × N) training image is unwrapped by columns to recognition of handwritten Chinese symbols (with a training
form a column vector of length N ′ = MN. The column vectors, image size of 3 GB). We discuss the zip code application further
denoted f(i) with {i = 1, 2, . . . , I}, are stacked alongside each in Section 12-4.
other to form the (N ′ × I) training matrix F. After computing the
SVD of matrix F, the dimensionality of vectors f(i) is reduced Concept Question 12-1: What is the difference between
from length N ′ to a truncated length T by setting small-valued supervised and unsupervised learning?
singular values σi to zero. The reduced-dimensionality image
vectors f(i) are each expressed as the sum of unit vectors {u j ,
i = 1, 2, . . . , T } with coefficients (coordinates) ci, j as expressed
by Eq. (11.92). The coordinates define the location of f(i) in
12-1 Overview of Neural Networks
RT space. Next, Voronoi domains are established by clustering A neural network is a network of simple processors called
the I images into L classes using the K-Means algorithm. Fi- neurons, as they are perceived to mimic the action of neurons
nally, boundaries are established between the Voronoi domains, in a brain. Neural networks are also called multilayered percep-
thereby creating a scheme for classifying new observation im- trons.
ages. The idea behind neural networks, when they were originally
developed in the 1950s, was that a neural network mimics the
B. Overview of Supervised Learning communication structure in the brain, which consists of about
100 billion neurons, each connected to about one thousand of
In supervised learning, we are given a set of I training images, its neighbors in an elaborate network. It was thought that such
just as in unsupervised learning, except that now we do know a network of processors might mimic a brain in its ability to
(i)
the class of each of the training images. Each training image fk identify images and objects on the basis of previous exposure
is said to be labeled with its proper image class k, so there is to similar images and objects (learning). The analogy to brain
no need to determine the image classes, which we needed to do function is now viewed as “tenuous.”
with unsupervised learning. Labeled training images of object The major advantage of using trained neural networks for
classes, such as faces, are readily available from data collected image classification is that no modeling or understanding of the
from the internet. The goal of supervised training is to use the image classification problem is necessary. The way in which
labeled training images to develop an algorithm for classifying the trained neural network works need not be (and seldom
new observation images. is) understandable to humans. In the remainder of the present
The classification algorithm is implemented using a net- section, we introduce how perceptrons work and how multiple
worked (connected) set of simple processors called perceptrons perceptrons are connected together to form a neural network.

390
12-1 OVERVIEW OF NEURAL NETWORKS 391

Weights
x1 w1
1
Out
1 w0 Σ
x2 w2 0
Weighted Activation
sum function ϕ(∙)

Inputs Perceptron
(a)

Weights
x1 w1
x2 w2
x3 w3
1
Out
1 w0 Σ
x4 w4 0
Weighted Activation
x5 w5 sum function ϕ(∙)
x6 w6

Inputs Perceptron
(b)

Figure 12-1 Perceptrons with (a) 2 inputs and (b) 6 inputs.

Then, in Section 12-2, we discuss how neural networks are A. Components of a Perceptron
trained, using the labeled training images. An example follows
in Section 12-4.

12-1.1 Perceptrons Figure 12-1(a) shows a perceptron with N = 2 inputs. The


added constant w0 usually is viewed as another weight multi-
A perceptron is a simple processor that accepts a set of N plying a constant input of 1.
numbers {xn , n = 1, . . . , N} as inputs, multiplies each input xn by In another example, a perceptron with six inputs is shown in
a weight wn , adds a constant w0 to the weighted sum of inputs, Fig. 12-1(b).
feeds the result into an activation function φ (x) that mimics A perceptron with N inputs {xn , n = 1, . . . , N} is implemented
a step function, and outputs a number y that is close to either 0 by the formula
or 1. The process mimics a biological neuron, which “fires” only
if the weighted sum of its inputs exceeds a certain threshold. y = φ (w0 + w1 x1 + w2 x2 + · · · + wN xN ). (12.1)
Synapses in the brain (which connect neurons) are biological
analogues of the weights in a perceptron. The activation function φ (x) is commonly chosen to be (for
392 CHAPTER 12 SUPERVISED LEARNING AND CLASSIFICATION

ϕ(x) g0,0 12
1
0.8 g1,0 10
1
0.6
0.4
1 −46 Σ y
0.2
0
g0,1 −2
0 x ϕ(∙)
−6 −4 −2 0 2 4 6
g1,1 −8
Figure 12-2 Activation function φ (x) for a = 1 (blue) and
a = 4 (red). The larger the value of a, the more closely φ (x) (a) Perceptron for Example 11-1
resembles a step function.

gG 4
some activation constant a > 0) the sigmoid function
gB 4
( 1
1 1 if x > 0,
φ (x) =
1 + e−ax

0 if x < 0.
(12.2) 1 −8 Σ y
0
gR −4
This choice of activation function is plotted in Fig. 12-2 for ϕ(∙)
a = 1 and a = 4. The similarity to a step function is evident.
(b) Perceptron for Example 11-5
The rationale for choosing this particular mathematical rep-
resentation for the activation function is supported by two Figure 12-3 Perceptrons for implementing the classification
rules of Examples 11-1 and 11-5.
attributes: (1) its resemblance to a step function, and (2) its
derivative has the simple form

= a φ (x) (1 − φ (x)). (12.3)
dx Fig. 12-3(a) with φ (x) set equal to the step function u(x). If
The derivative of the activation function is used in Section 12-2 output y = 1, choose f2 , and if y = 0, choose f1 .
for training the neural network. Similarly, consider the classification rule given in Example
Note that to implement a simple linear combination of the 11-5, which stated:
perceptron’s inputs, the activation function should be chosen
such that φ (x) = x.
Choose f1 : if 4gG + 4gB − 4gR − 8 < 0,
Choose f2 : if 4gG + 4gB − 4gR − 8 > 0.
B. Classification Using a Single Perceptron
Another reason for using perceptrons is that single-stage clas-
The rule can be implemented using the perceptron in Fig.
sification algorithms can each be implemented using a single 12-3(b). If output y = 1, choose f2 , and if y = 0, choose f1 .
perceptron.
By way of example, let us consider the classification rules we
derived earlier in Examples 11-1 and 11-5. The classification
rule given by Eq. (11.19) is: Exercise 12-1: Use a perceptron to implement a digital OR
logic gate (y = x1 + x2 , except 1 + 1 = 1).
Choose f1 : if 12g0,0 + 10g1,0 − 2g0,1 − 8g1,1 − 46 < 0,
Choose f2 : if 12g0,0 + 10g1,0 − 2g0,1 − 8g1,1 − 46 > 0. Answer: Use Fig. 12-1(a) with φ (x) ≈ u(x), weights
w1 = w2 = 1, and any w0 such that −1 < w0 < 0.
This rule can be implemented using the perceptron in
12-1 OVERVIEW OF NEURAL NETWORKS 393

any “reasonable” nonlinear function f (x1 , x2 , . . . , xN ) can be


Exercise 12-2: Use a perceptron to implement a digital
approximated as
AND logic gate (y = x1 x2 ).
!
M N
Answer: Use Fig. 12-1(a) with φ (x) ≈ u(x), weights
w1 = w2 = 1, and any w0 such that −2 < w0 < −1. f (x1 , x2 , . . . , xN ) ≈ ∑ ci φ ∑ wi, j x j , a < xn < b, (12.4)
i=1 j=1

for some constants N, M, ci , wi, j , a, b and some nonlinear func-


12-1.2 Networks of Perceptrons tion φ (·). There are no specific rules for the number of hidden
layers or number of perceptrons in each hidden layer. A very
A. Background rough rule of thumb is that the total number of perceptrons in
hidden layers should be one-fourth the sum of the numbers of
Networks of perceptrons, in which the output of each perceptron
inputs and outputs.
is fanned out and serves as an input into many other perceptrons,
are called neural networks, an example of which is shown in
Fig. 12-4. B. Deep-Learning Neural Networks
The leftmost vertical stack of squares is called the input
layer (of terminals). Each square is a terminal that accepts one Deep-learning neural networks may have dozens of hidden
number as an input and fans it out to the perceptrons in the layers, with each layer consisting of hundreds of perceptrons,
hidden layer to its immediate right. The number of terminals and requiring teraflops of computation. A trained deep-learning
in the input layer equals the number of pixels in the image. neural network may be able to perform surprising feats of
Terminals are also called nodes, and they often are depicted classification, such as spam filtering and face recognition. Much
using circles, making them easy to confuse with perceptrons of the research on neural networks is focused on the use of huge
(which they are not). To distinguish the terminals in the input
layer from the perceptrons in succeeding layers, we depict the
terminals in Fig. 12-4 as squares instead of circles.
Inputs Outputs
The rightmost vertical stack of perceptrons is called the
x1 y1
output layer. Each circle is now a perceptron, which includes
weights and an activation function, which are not shown ex- x2 y2
plicitly in Fig. 12-4. The number of perceptrons in the output x3 y3
layer is usually the number of image classes. The output of the x4 Layer 2 y4
neural network is usually close to 1 for one of the perceptrons Layer 1 Layer 3
in the output layer, and close to zero for all other perceptrons Layer 0 Hidden layers Layer 4
in the output layer. The output-layer perceptron with an output Input layer Output layer
of 1 identifies the classification of the image. In terms of the
formulation in Section 11-1, the Kth perceptron in the output
layer outputting 1 is analogous to K maximizing the log-
likelihood function. There are also other possibilities; the output
layer may output ones and zeros that constitute bits in a binary 1
representation of integer K. Σ
0
The vertical stacks of perceptrons between the input and
output layers are called hidden layers (of perceptrons). The
hidden layers greatly increase the ability of the neural network
to perform classification, partly because there are more weights
and partly because the neural network has a more complicated
structure, including more activation functions. The activation Figure 12-4 A neural network with 4 inputs and 4 outputs.
functions allow classification regions that are more complicated Each blue circle in the hidden and output layers is a perceptron
than those we have seen so far. with associated weights and an activation function (see enlarged
An underlying rationale for the validity of neural networks circle).
is the Universal Approximation Theorem, which states that
394 CHAPTER 12 SUPERVISED LEARNING AND CLASSIFICATION

neural networks for deep learning. Examples 11-1 and 11-5 are examples of binary classifi-
Deep-learning neural networks became practical around 2009 cation problems with a separating hyperplane. Classification
when it became possible to construct them using graphical was accomplished using the single perceptrons in Figs. 12-3(a)
processing units (GPUs). A GPU is a parallel-processing chip and (b), respectively. The AND and OR gates in Exercises 12-1
capable of performing simple computations (like those in a and 12-2 are two other examples of binary classification.
perceptron) in parallel in thousands of parallel computing cores. Many classification problems do not have separating hy-
They were developed for creating images for computer games perplanes separating classification regions. The classification
and similar applications that required computations for rotation regions are Voronoi sets separated by multiple hyperplanes, or
of vertices in 3-D images. An example of a GPU is the NVIDIA more complicated curved boundaries.
GeForce GTX1080 video card. The applicability of GPUs to
neural networks should be evident. E. Classification Using Multiple Hyperplanes
Let us consider the XOR (exclusive OR) logic gate defined by
C. Examples of Tiny Neural Networks
x1 0 0 1 1
For practical applications, even the smallest neural networks
have thousands of perceptrons. For example, a neural network x2 0 1 0 1 .
for reading handwritten zip codes reads each digit using a y 0 1 1 0
camera that generates a (28×28) image of each digit. Hence, the
The classification regions for the XOR gate are shown in
input layer of the neural network has N ′ = 282 = 784 terminals,
Fig. 12-5, which include two hyperplanes.
which is too large to display as a figure. The output layer has
The y = 1 region is not contiguous, so a single perceptron will
L = 10 perceptrons (one for each possible output {0, 1, . . . , 9}).
not be able to classify the input pair (x1 , x2 ) correctly. Instead,
There is usually one hidden layer with about 15 perceptrons.
we use three perceptrons, connected as shown in Fig. 12-5(b).
Hence, in the forthcoming examples, we limit our discussion
With the activation functions φ (·) of all three perceptrons set
to tiny networks designed to demonstrate how neural networks
to act like step functions u(x), the lower perceptron in the hidden
operate, even though their sizes are very small.
layer of Fig. 12-5(b) computes φ (x2 − x1 − 1), whose value is 1
in the region above the line x2 − x1 = 1 (hyperplane 1 in R2 ).
D. Classification with a Single Separating Similarly, the upper perceptron in the hidden layer computes
Hyperplane φ (x1 − x2 − 1), whose value is 1 in the lower right region below
the line x1 − x2 = 1 (hyperplane 2 in R2 ).
The rightmost perceptron (in the output layer) implements an
◮ A single perceptron can be used to classify images in OR logic gate operation. The output is y = 1 if either of its inputs
binary (L = 2 classes) classification problems in which the is 1, and 0 if neither of its inputs is 1. The −0.5 could be replaced
regions are segregated by a separating hyperplane. ◭ by any number b such that −1 < b < 0.

A hyperplane in RN is a surface of the form Example 12-1: Neural Network Classifier

a1 x1 + a2 x2 + · · · + aN xN = b, (12.5)
Implement a neural network classifier for the classification rule
RN
where (x1 , x2 , . . . , xN ) are the coordinates of a point in and developed in Section 11-9.1.
{a1, a2 , . . . , aN } and b are constants. For N = 2, Eq. (12.5) Solution: In Section 11-9.1, we illustrated how to develop an
becomes unsupervised classification rule for an unknown (2 × 2) image
b a1
x2 = − x1 ,  
a2 a2 g[0, 0] g[1, 0]
g[n, m] = , (12.6)
which is a line with slope −a1 /a2 . For N = 3, Eq. (12.5) g[0, 1] g[1, 1]
becomes a plane like the one in Fig. 11-11. A separating
hyperplane divides RN into two classification regions, one for given five (2 × 2) images { f (i) [n, m], i = 1, 2, . . . , 5} with un-
each image class. known classes. The procedure led to coordinates g1 and g2 given
12-1 OVERVIEW OF NEURAL NETWORKS 395

and U is a (4 × 4) matrix computed from the five given images


x2 using the SVD method outlined in Section 11-9.1. In Eq. (12.7),
Hyperplane 1: (UT g)i is the ith element of the column vector (UT g). From
x2 − x1 = 1 Eq. (11.91a), the transpose of U is given by
y=1
1 y=0  
0.54 0.47 0.49 0.50
Hyperplane 2: −0.52 0.56 0.48 −0.44
UT =  . (12.9)
−1 1 x1 − x2 = 1 −0.20 0.63 −0.69 0.29 
x1 −0.63 −0.26 0.25 0.69

Using Eqs. (12.8) and (12.9) in Eq. (12.7) gives


y = 0 −1
y=1 g1 = 0.54 g[0, 0] + 0.47 g[0, 1]
+ 0.49 g[1, 0] + 0.50 g[1, 1], (12.10a)
g2 = −0.52 g[0, 0] + 0.56 g[0, 1]
(a) Classification regions for XOR gate + 0.48 g[1, 0] − 0.44 g[1, 1]. (12.10b)

−1 The solution in Section 11-9.1 led to the existence of three


classes with the following rule:
x1 u(∙) u(x1 − x2 − 1)
(1) Class #1 if: g2 > 0 and g2 − g1 > −1.5, (12.11a)
− (2) Class #2 if: g2 < 0 and g2 + g1 < 1.5, (12.11b)
(3) Class #3 if: g1 > 1.5 and g1 − 1.5 > g2 > −(g1 − 1.5).
−0.5 u(∙) y (12.11c)

To construct a neural network with the appropriate perceptrons,


− it is useful to cast the classification rules in the form of AND
or OR statements. To that end, we introduce variables a1 , a2 ,
x2 u(∙) u(x2 − x1 − 1) and a3 , and we use activation functions that mimic step functions
u(·):
−1
a1 = φ (g2 − g1 + 1.5)
(b) Neural network implementation of XOR gate (
1 if (g2 − g1 + 1.5) > 0,
≈ u(g2 − g1 + 1.5) =
Figure 12-5 Classification regions and neural network for 0 if (g2 − g1 + 1.5) < 0,
XOR gate. Here, φ (·) ≈ u(·) is a step function. (12.12a)
(
1 if g2 > 0,
a2 = φ (g2 ) ≈ (12.12b)
0 if g2 < 0,
and (
by 1 if g2 + g1 − 1.5 > 0,
a3 = φ (g2 + g1 − 1.5) ≈
g1 = (UT g)1 (12.7a) 0 if g2 + g1 − 1.5 < 0.
and (12.12c)
g2 = (UT g)2 , (12.7b)
The steps required for realizing a1 to a3 are shown in Fig. 12-6.
where g is the vector equivalent of unwrapped g[n, m], namely The classification rule for Class #1, as given by Eq. (12.11a),
requires that both a1 AND a2 be 1, which can be realized by
g = [g[0, 0], g[0, 1], g[1, 0], g[1, 1]]T, (12.8)
396 CHAPTER 12 SUPERVISED LEARNING AND CLASSIFICATION

g[0,0] 0.54
0.47 g1
1.5
0.49 g2 − g1 + 1.5
g[0,1] − 1 a1
0.50 −1.5
0
a4 1
−0.52 y1
g[1,0] 0
0.56 g2 g2 1 a2
0.48 0
a5 1

g[1,1] −0.44 y2
0
g2 + g1 − 1.5 1 a3 −0.5
0
−1.5

Figure 12-6 Neural network for Example 12-1.

combining them to obtain


◮ For Class #2, the classification rule is:
a4 = a1 + a2 − 1.5, (12.13a) If y2 = 0, choose Class #2.
and then subjecting a4 to a final activation-function operation
φ (a4 ):
y1 = φ (a4 ) = φ (a1 + a2 − 1.5). (12.13b) For class #3, the classification rule is straightforward:
If y1 = 0 and y2 = 1, then g[n, m] does not belong to either
The AND table shown in Fig. 12-6 confirms that output y1 = 1 Class #1 or Class #2, and therefore it has to belong to Class #3.
only when both a1 AND a2 are 1.
Concept Question 12-2: What is the difference between
◮ For Class #1, the classification rule is: a perceptron and a neural network?

If y1 = 1, choose Class #1.


Concept Question 12-3: A perceptron is a nonlinear
function of a linear combination of several inputs. Why
Next, we consider the rule for Class #2, as defined by does it have this form?
Eq. (12.11b), which also is an AND statement, as it requires that
two simultaneous conditions be satisfied, namely that a2 and Concept Question 12-4: If the input to a neural network
a3 are both zero. Instead of implementing an AND operation is to be classified into one of two classes separated by a
requiring both a2 and a3 to be zero, we can equivalently separating hyperplane, what is the minimum number of
implement an OR operation requiring both a2 and a3 to be 1. neurons that must be in the neural network?
This is made possible by the bottom branch in Fig. 12-6:

y2 = φ (a5 ) = φ (a2 + a3 − 0.5). (12.14)


12-2 TRAINING NEURAL NETWORKS 397

12-2 Training Neural Networks


So far, we have presented miniature-size neural networks for
which the weights and structure could be determined rather
easily. Neural networks for real-world problems are far more
complicated. The weights are determined by using labeled train-
ing images. The procedure for determining the weights is called
training the neural network. We now present this procedure.
There are actually three different procedures for training a
neural network. At the heart of all three procedures is an al- x3 x4
x2
gorithm called backpropagation. This term comes from the fact
x1
that the neural network is first run forward (in increasing layer
number, or left to right), using a labeled training image, and the x0
gradient of the mean-square error with respect to the weights is
then propagated backwards (in decreasing layer number, or right
to left) using the results of the forward run.
The gradient of the mean-square error, computed using back-
propagation, is then used in a steepest-descent (or gradient) Figure 12-7 Contours of equal values of f (x) and the path (in
algorithm, in one of three ways. red) taken by an SD algorithm.
We explain these techniques in the next three subsections.
First, we review steepest descent algorithms. Next, we derive
the backpropagation algorithm, and finally, we present three
procedures for computing the weights from labeled training
images.
The basic iteration of an SD algorithm is

x(k+1) = x(k) − µ ∇ f (x(k) ), (12.16)

12-2.1 Gradient (Steepest Descent) (SD) where x(k) is the estimate of the minimizing x at the kth iteration,
Minimization Algorithm and µ is the step-size, a small discretization length. Vector x(k) is
perturbed by a distance µ in the direction that decreases f (x(k) )
A steepest-descent (SD) algorithm, also known as a gradient the fastest.
or (misleadingly) a gradient-descent algorithm (the gradient The iterative process stops when x(k+1) ≈ x(k) , corresponding
does not descend) is an iterative algorithm for finding the to ∇ f (x) ≈ 0. This may be the location x∗ of the global
minimum value of a differentiable function f (x1 , x2 , . . . , xN ) minimum of f (x), or it may be only a local minimum of f (x).
of N spatial variables x = [x1 , x2 , . . . , xN ]T . The function to be These are both illustrated in Example 12-2 below.
minimized must be real-valued and scalar (not vector-valued), A useful interpretation of SD is to regard f (x) as the elevation
although SD can also be applied to minimize scalar functions at a location x in a bowl-shaped surface. The minimizing value
of vector-valued or complex-valued functions, such as || f (x)||2 . x∗ is at the bottom of the bowl. By taking a short step in the
To maximize f (x), we apply SD to − f (x). The minimum occurs direction in which f (x) is decreasing, we presumably get closer
at x = x∗ , where ∗ denotes the minimizing value, not complex to the bottom x∗ of the bowl. However, we may get caught in a
conjugate (this is standard notation for optimization). local minimum, at which ∇ f (x) ≥ 0, so taking another step in
SD relies on the mathematical fact that the gradient ∇ f (x) any direction would increase f (x).
specifies the (vector) direction in which f (x) increases fastest. Figure 12-7 illustrates how taking short steps in the direction
Recall that, even though f is a scalar, ∇ f is a vector defined as in which f (x) is decreasing may eventually take us to the
 T minimizing value x∗ . The curves are contours (as in a topograph-
∂f ∂f ∂f ical map) of equal “elevations” of f (x) values. This is merely
∇f = , ,..., . (12.15) illustrative; backpropagation uses a 1-D SD algorithm.
∂ x1 ∂ x2 ∂ xN
398 CHAPTER 12 SUPERVISED LEARNING AND CLASSIFICATION

different numeral, the output terminals should change values


Example 12-2: 1-D SD Algorithm accordingly.

The 1-D version of Eq. (12.16) for minimizing a function f (x) 12-2.3 Backpropagation
is
d f (k)
x(k+1) = x(k) − µ (x ), (12.17) A. Overview of Backpropagation
dx
where k is the iteration index. Apply the algorithm given by Backpropagation is an algorithm for “training” (computing the
Eq. (12.17) to compute the minimizing value x∗ for weights) of a neural network. To perform backpropagation, we
are given a set of I labeled training images {f1 , f2 , . . . , fI }, where
f (x) = cos(3x) − sin(x) + 2, −2 < x < 2. (12.18) each fi is an unwrapped training image. That is, the original 2-D
(M × N) ith image fi [n, m] is unwrapped by column into a vector
Solution: The function f (x) is plotted in blue in Fig. 12-8. fi of length N ′ = MN:
Over the range −2 < x < 2, f (x) has its global minimum x∗
fi = [ f1 [i], f2 [i], . . . , f j [i], . . . , fN ′ [i]]T . (12.20)
near x = 1, as well as a local minimum near x = −1. Since
df Element f j [i] is the value of the jth pixel of the unwrapped
f ′ (x) = = −3 sin(3x) − cos(x), |x| < 2, training image i. The neural network has an input vector and
dx an output vector. Image fi has N ′ elements, so the input layer of
the SD algorithm of Eq. (12.17) becomes the neural network should have N ′ terminals. The output of the
last layer consists of L neurons, corresponding to the number of
x(k+1) = x(k) + µ (3 sin(3x(k) ) + cos(x(k) )). (12.19) image classes, and the combination of the L values constitutes
an output vector. The desired variable di is the number denoting
When initialized using x(0) = 0.1 and µ = 0.05, the SD algo- perfect classification of image i with input vector fi , and it is
rithm converged to the (desired) global minimum at x ≈ 1 after called the label of fi . For the zip-code example in which each
15 iterations. Ordered pairs {(x(k) , f (x(k) )), k = 1, . . . , 15} are training image is (28 × 28) in size, the input vector is of length
plotted using red “+” symbols in Fig. 12-8. N ′ = 28 × 28 = 784, but the output vector is only of length
But when initialized using x(0) = −0.2, the SD algorithm L = 10, corresponding to the digits 0 through 9.
converged to the (undesired) local minimum at x ≈ −1. The The goal of backpropagation is to determine the weights in
ordered pairs {(x(k) , f (x(k) )), k = 1, . . . , 15} are plotted using the neural network that minimize the mean-square error E over
green “+” symbols in Fig. 12-8. In both cases the algorithm all I training images.
had smaller updates at locations x where d f /dx was small, as
expected from Eq. (12.17). At locations where d f /dx = 0, the B. Notation
algorithm stopped; these were the global and local minima.
The neural-network notation can be both complicated and con-
12-2.2 Supervised-Training Scenario: fusing, so we devote this subsection to nomenclature, supported
by the configuration shown in Fig. 12-10.
Classifying Numerals 0 to 9 With this notation in hand, we now define the mean-square
Let us consider how a neural network should perform to cor- error E over all I training images as
rectly classify images of numerals 0 to 9. In Fig. 12-9, the
I
input layer consists of 784 terminals, corresponding to the 1
intensity values of (28 × 28) image pixels. The true identity
E=
I ∑ E[i], (12.22)
i−1
of the test input image is the numeral 4. The output layer
consists of 10 terminals designed to correspond to the numerals where
0 to 9. Once the neural network has been properly trained using L L
the backpropagation algorithm described later, the fifth output 1 1
terminal (corresponding to numeral 4) should report an output
E[i] =
2 ∑ (eL,p [i])2 = 2 ∑ (d p[i] − yL,p[i])2 . (12.23)
p=1 p=1
of 1, and all of the other output terminals should report outputs
of zero. If the input image is replaced with an image of a The quantity inside the summation is the difference between the
12-2 TRAINING NEURAL NETWORKS 399

0
−2 −1.5 −1 −0.5 0 0.5 1 1.5 2

Figure 12-8 f (x) (in blue) and {(x(k) , f (x(k) )), k = 1, . . . , 15}, in red when initialized using x(0) = 0.1, and in green when initialized
using x(0) = −0.2.

Designated numeral Output reading


0
0
1
0
2
0
(28 × 28) = 784 pixels 3
0
displaying image 4
NN 1
of numeral “4” 5
0
6
0
7
0
8
0
9
0
Input layer with Output layer
784 terminals

Figure 12-9 When the image of numeral 4 is introduced at the input of a properly trained neural network, the output terminals should all
be zeros, except for the terminal corresponding to numeral 4.

desired output of neuron p of the output layer and the actual image and the L outputs associated with the L classes. That
output of that neuron. The summation is over all L neurons in relationship depends, in turn, on the assignment of weights
the output layer. wℓ,p,q , representing the gain between the output of the qth
neuron in layer (ℓ − 1) and the input to the pth neuron in layer ℓ
(Fig. 12-11). Backpropagation is the tool used to determine
12-2.4 Backpropagation Training Procedure those weights.
The process involves the use of I training images and
The classification accuracy of the neural network depends on one or more iterations. We will outline three backpropagation
the relationship between the N ′ inputs associated with the input
400 CHAPTER 12 SUPERVISED LEARNING AND CLASSIFICATION

f1 w1,1,1
f1
w1,2,1 f1w1,1,1
w3,1,0
f1w1,2,1
w1,1,0 w2,1,0 1

w3,1,1 Σ y3,1
y1,1w2,1,1
f2 w1,1,2 1 y1,1 w2,1,1 1 y2,1 0

f2 f2w1,1,2 Σ Σ w3,2,1
w1,2,2 0 w2,2,1 0
w3,2,0
w3,3,1
f2w1,2,2 y1,2w2,1,2 1
Σ y3,2
f3w1,1,3 y1,1w2,2,1
w3,1,2 0

f3 w1,1,3 1 y1,2 w2,1,2 1 y2,2


f3 f3w1,2,3 Σ Σ w3,2,2
w1,2,3 w2,2,2
0
y1,2w2,2,2 0
w3,3,2 1

w1,2,0 Σ y3,3
w2,2,0
f4w1,1,4 0

w3,3,0
f4 w1,1,4 f4w1,2,4
f4
w1,2,4

Layer 0 Layer 1 Layer 2 Layer 3


(input) (output)

Figure 12-10 Neural network with 4 terminals at input layer, corresponding to image vector f = [ f1 f2 f3 f4 ]T , 2 hidden layers of 2
neurons each, and an output layer with three terminals, corresponding to three image classes.

(0)
procedures with different approaches. The procedures utilize values. The superscript (0) denotes that wℓ,p,q [1] are the initial
relationships extracted from the derivation given later in Section assignments.
12-3.

Weight Initialization
A. Procedure 1: Iterate over k, One Image at a
Time The initial values of the weights w in the neural network should
be small enough such that the linear combination of inputs is
Image i = 1, Iteration k = 0 between 0 and 1. In addition, even when the activation function
(sigmoid function) has a steep slope, the neuron output will be
1. Initialize: For test image i = 1 (chosen at random and labeled roughly proportional to the linear combination (instead of just
(0)
image 1), initialize the weights wℓ,p,q [1] with randomly chosen being 1 or 0) for a wide range of input values.
12-2 TRAINING NEURAL NETWORKS 401

Symbols and Indices To other neurons


i: Training-image index label, i = 1, 2, . . . , I in layer l
1 yl−1, q
I: Total number of training images Σ
fi : Vector representing unwrapped image fi [n, m] 0

f j [i]: Value of jth pixel of vector fi , j = 1, 2, . . . , N ′ wl,p,q


L: Total number of classes, also total number of Neuron q in layer l − 1
outputs wl,p,q yl−1, q
N′: (= MN): Length of training image vectors fi , 1
which also is the number of input terminals at Σ
layer 0 0

k: Iteration index, as in (·)(k)


ℓ: Layer index in neural network, ℓ = 0, . . . , L, Neuron p in layer l
with ℓ = 0 representing the input layer and
ℓ = L representing the output layer Figure 12-11 Neuron q in layer (ℓ − 1) generates output
yℓ−1,q . The contribution of that neuron to neuron p in layer ℓ
L: Total number of layers, excluding the input
is wℓ,p,q yℓ−1,q , where wℓ,p,q is an iteratively adjusted weighting
layer (which contains terminals, not percep- coefficient.
trons)
ℓ, p: pth neuron in ℓth layer, p = 1, 2, . . . , nℓ
nℓ : Total number of neurons in ℓth layer
However, if a weight wℓ,p,q is too small, the output yℓ−1,q of
Inputs, Outputs, and Differences the qth neuron in layer (ℓ − 1) will have little effect on the input
wℓ,p,q yℓ−1,q to the pth neuron in layer ℓ. Also, if the weight is too
nin = N ′ : Number of inputs large, it will dominate all of the other inputs to the pth neuron in
nout = L: Number of outputs layer ℓ. To prevent these extreme situations from occurring, the
wℓ,p,q [i]: Weight (gain) between output of qth neu- initial weights should be chosen appropriately: the more inputs
ron in layer (ℓ − 1) and input to pth into a neuron, the smaller the initial weights should be, and in
neuron in layer ℓ for image i all cases, the initial weights should be randomly distributed.
A common rule of thumb for initializing the weights is to
υℓ,p[i] = Activity of pth neuron in ℓth layer for image i choose a random number between 0 and 1 if there are 50
nℓ −1 perceptrons in the neural network, between 0 and 0.5 if there
are between 50 and 200 perceptrons in the neural network, and
υℓ,p [i] = ∑ wℓ,p,q[i] yℓ−1,q[i], (12.21a)
between 0 and 0.2 if there are more than 200 perceptrons in the
q=0
neural network.
with yℓ−1,0 [i] = 1 for the constant input for each percep-
tron in layer ℓ. 2. Feed forward: Run the neural network forward (in increasing
layer index ℓ).
yℓ,p [i] = output of pth neuron in layer ℓ for image i 3. Measure outputs of all layers: Measure all of the outputs
1 (0)
= φ (υℓ,p [i]) = , (12.21b) {yℓ,p [1], ℓ = 1, 2, . . . , L} of all perceptrons in all layers of the
1 + exp(−aυℓ,p[i]) neural network, not just the output layer.
with a = activation constant. (0)
4. Initialize output gradients: Set δL,p [1] for all perceptrons in
the final layer (ℓ = L) using
d p [i] = Desired output of pth neuron in output layer
(0) (0) (0) (0)
ℓ = L for image i δL,p [1] = −a(d p[1] − yL,p[1]) yL,p [1] (1 − yL,p[1]), (12.24)
eL,p [i] = Error for pth neuron in output layer ℓ = L
(0)
= d p [i] − yL,p [i]. (12.21c) where yL,p [1] is the output of perceptron p of the last layer,
402 CHAPTER 12 SUPERVISED LEARNING AND CLASSIFICATION

(k) (k) (k)


generated in response to test image 1, d p [1] is the desired output δℓ,p [i] = a yℓ,p [i] (1 − yℓ,p[i])
of the pth neuron in the output layer—which is either 1 or 0, nℓ+1
depending on whether image 1 is class p or not, respectively— (k) (k)
and a is a selectable activation constant (see Eq. (12.2)). If image
× ∑ δℓ+1,q[i] wℓ+1,q,p[i], (12.28b)
q=1
1 belongs to class 5 from among 8 classes, for example, then (k) (k) (k)
the desired output d p [1] is 1 for p = 5 and zero for the other 7 ∆ℓ,p,q[i] = δℓ,p [i] yℓ−1,q[i], (12.28c)
outputs (p = 1, 2, 3, 4, 6, 7, and 8). and
(k+1) (k) (k)
5. Backpropagate: in decreasing layer number, generate values wℓ,p,q [i] = wℓ,p,q [i] − µ ∆ℓ,p,q[i]. (12.28d)
(0)
for the local gradients δℓ,p [1], starting at ℓ = L and ending at
ℓ = 1, using the recipe ◮ Procedure 1 computes a local (and hopefully also a
nℓ+1
global) minimum of E[i]. The drawback of this procedure
(0) (0) (0) (0) (0) is that it requires many runs of the neural network and
δℓ,p [1] = a yℓ,p [1] × (1 − yℓ,p[1]) × ∑ δℓ+1,q[1] wℓ+1,q,p[1]. backpropagation to make use of all of the training images.
q=1
(12.25) Hence, the procedure is seldom used in practice. ◭

6. Compute the weight iteration correction:


B. Procedure 2: Iterate over Images, but
(0) (0) (0)
∆ℓ,p,q [1] = δℓ,p [1] yℓ−1,q [1] (12.26) Backpropagate Only Once for Each Image
for all layers and neurons in those layers. The relationship given 1. Perform steps 1–7 of Procedure 1.
by Eq. (12.25) is extracted from the complete derivation of 2. Repeat steps 2–7 of Procedure 1, but with a different training
backpropagation given in Section 12-3. (1)
image. Use the weights wℓ,p,q computed for the previous training
image.
Image 1, Iteration k = 1
3. Repeat the preceding step for all remaining training images.
7. Update all weights: from those used in iteration k = 0 to a
new set for iteration k = 1 using
◮ Procedure 2 differs from Procedure 1 in that the steepest
(1) (0)
wℓ,p,q [1] = wℓ,p,q [1] − µ
(0)
∆ℓ,p,q[1], (12.27) descent (SD) algorithm is run for only a single iteration
(k = 0 to k = 1) for each training image. Procedure 2,
where µ is a small step size. called incremental learning, is often described as a gra-
dient descent algorithm, but this is an incorrect description
8. Repeat steps 2 to 6. because the SD algorithm is run for only a single iteration.
(1)
The computed weights {wℓ,p,q [i]} do not minimize E[i]—
Image 1, Iterations k = 2 to K they only make it slightly smaller than the randomly chosen
(0)
initial weights {wℓ,p,q [i]} do. However, Procedure 2 does
Increment k and repeat steps 2–6 until the process converges to
a predefined set of output thresholds (weights stop changing). get all of the training images involved more quickly than
The final iteration is identified as k = K. Procedure 1, and is more commonly used in practice. ◭

Images 2 to I C. Procedure 3: Stochastic Approach


Repeat the process in steps 1–8 for all remaining images i = 2 This procedure is similar to Procedure 2 except that it uses a
to I. For each image, initialize the neural network with the last stochastic gradient descent algorithm. Given I training images,
iteration weights of the first training image. For any iteration k the images are partitioned into I2 batches of I1 images each,
and image i, the relevant relationships are: selected randomly. Hence, I = I1 I2 . Using index i1 for the
images within a batch, with {i1 = 1, 2, . . . , I1 }, and index i2
(k) (k) (k) (k)
δL,p [i] = −a(d p[i] − yL,p[i]) yL,p [i] (1 − yL,p[i]), (12.28a) for the batch of images, with {i2 = 1, 2, . . . , I2 }, the summation
12-3 DERIVATION OF BACKPROPAGATION 403

(k)
defining the total mean-square error (MSE) in Eq. (12.22) can This relationship, which is used to update the weight wℓ,p,q [i] of
be rewritten as a sum of the error over the I2 batches: (k+1)
iteration k to weight wℓ,p,q [i] in iteration (k + 1), is based on the
1 I2 steepest descent (SD) algorithm, which uses the iterative form
E=
I2 ∑ E[i2 ]. (12.29a)
i2 =1 (k+1) (k) ∂ E[i]
wℓ,p,q [i] = wℓ,p,q [i] − µ (k)
. (12.31)
Here, E[i2 ] is the MSE of batch i2 , which is given by ∂ wℓ,p,q [i]

I1 The second term involves the differential of the mean-square


1
E[i2 ] =
I1 ∑ E[i1 , i2 ], (12.29b) error E[i]. The derivation that follows will show that the second
i2 =1 term in Eq. (12.31) is indeed equal to the second term in
Eq. (12.30).
where E[i1 , i2 ] is the MSE for image i1 in batch i2 . In view We start by defining the local gradient δℓ,p [i] as
of Eq. (12.23), for a neural network whose output layer L is
composed of L neurons, E[i1 , i2 ] is given by ∂ E[i]
δℓ,p [i] = , (12.32)
L
∂ υℓ,p [i]
1
E[i1 , i2 ] =
2 ∑ (d p[i1 , i2 ] − yL,p[i1 , i2 ])2 . (12.29c)
where υℓ,p [i] is the activity of neuron p in layer ℓ defined in
p=1
Eq. (12.21a). Using the chain rule gives
In Procedures 1 and 2, the weights in the neural network are
updated from one iteration to the next, using the expressions ∂ E[i] ∂ E[i] ∂ υℓ,p [i] ∂ υℓ,p [i]
(k)
= = δℓ,p [i] . (12.33)
given by Eq. (11.48) for the local gradients δℓ,p . A similar ∂ wℓ,p,q [i] ∂ υℓ,p [i] ∂ w(k) [i] (k)
∂ wℓ,p,q [i]
ℓ,p,q
process is used in the current procedure, except that now the
weights for each particular neuron are updated using the average Use of the definition given by Eq. (12.21a) leads to
weight for that neuron among I1 training images in one of the I2
batches (in practice, it does not much matter which particular ∂ υℓ,p [i]
(k)
= yℓ−1,q [i], (12.34)
batch is selected). ∂ wℓ,p,q [i]

Concept Question 12-5: What is backpropagation and and Eq. (12.33) simplifies to
what is it used for?
∂ E[i]
(k)
= δℓ,p yℓ−1,q[i]. (12.35)
Concept Question 12-6: What does it mean to train a ∂ wℓ,p,q [i]
neural network?
A second use of the chain rule gives
Concept Question 12-7: Does a gradient (steepest de- ∂ E[i]
nℓ+1
∂ E[i] ∂ υℓ+1,s [i]
scent) algorithm always find the global minimum? δℓ,p [i] =
∂ υℓ,p [i]
= ∑ ∂ υℓ+1,s [i] ∂ υℓ,p [i]
s=1
nℓ+1
∂ υℓ+1,s [i]
= ∑ δℓ+1,s[i] ∂ υℓ,p [i]
. (12.36)
12-3 Derivation of Backpropagation s=1

Next, change ℓ to ℓ + 1 in Eq. (12.21a) and also use Eq. (12.21b)


In the preceding section we outlined how to apply the backprop- to obtain:
agation algorithm in the form of three slightly different iterative
procedures. A critical ingredient is the relationship given by the nℓ
(k)
combination of Eqs. (12.28c and d), namely υℓ+1,s [i] = ∑ wℓ+1,s,q [i] φ (υℓ,p [i]). (12.37)
q=0
(k+1) (k) (k) (k) (k)
wℓ,p,q[i] = wℓ,p,q[i] − µ ∆ℓ,p,q[i] = wℓ,p,q[i] − µ δℓ,p[i] yℓ−1,q[i].   (12.30)

∂υℓ+1,s[i]/∂υℓ,p[i] = w(k)ℓ+1,s,p[i] φ′(υℓ,p[i]).   (12.38)

Upon factoring out φ′(υℓ,p[i]), Eq. (12.36) becomes

δℓ,p[i] = φ′(υℓ,p[i]) Σ_{s=1}^{mℓ+1} δℓ+1,s[i] w(k)ℓ+1,s,p[i].   (12.39)

Using Eq. (12.3) for the derivative of the sigmoid function φ(x) defined in Eq. (12.2), and also in Eq. (12.21b), gives

φ′(υℓ,p[i]) = a(1 − φ(υℓ,p[i])) φ(υℓ,p[i]) = a(1 − yℓ,p[i]) yℓ,p[i].   (12.40)

Hence, we now can compute δℓ,p[i] recursively in decreasing layer number ℓ, starting at output layer ℓ = L, using

δℓ,p[i] = a(1 − yℓ,p[i]) yℓ,p[i] Σ_{s=1}^{mℓ+1} δℓ+1,s[i] w(k)ℓ+1,s,p[i].   (12.41)

We still have to initialize this recursion at ℓ = L. To do that, we apply the chain rule (again!) to the definition given by Eq. (12.32) at ℓ = L:

δL,p[i] = ∂E[i]/∂υL,p[i] = (∂E[i]/∂eL,p[i]) (∂eL,p[i]/∂yL,p[i]) (∂yL,p[i]/∂υL,p[i]).   (12.42)

Each of these partial derivatives can be simplified easily. From the definition given by Eq. (12.23) for E[i], we have

∂E[i]/∂eL,p[i] = eL,p[i].   (12.43)

From the definition for eL,p[i] given by Eq. (12.21c),

∂eL,p[i]/∂yL,p[i] = −1.   (12.44)

From Eqs. (12.2) and (12.3),

∂yL,p[i]/∂υL,p[i] = φ′(υL,p[i]) = a(1 − φ(υL,p[i])) φ(υL,p[i]) = a(1 − yL,p) yL,p.   (12.45)

Substituting these results in Eq. (12.42) gives the initialization. When taking a partial derivative with respect to υL,p[i], only the q = p term is nonzero, and it is δL,p[i]:

δL,p[i] = −a(d_p[i] − yL,p[i])(1 − yL,p) yL,p.   (12.46)

12-4 Neural Network Training Examples

12-4.1 Training Image Selection

It is customary to use only some members of the training set to train the neural network, and then use the remaining members to test the performance of the neural network. If only half of the training set members are used to train, and the other half to test, we can repeat training by exchanging these two classes. This is called cross-validation.

Members of the training set should be selected at random, so that not all training set images are labeled with the same image class.

12-4.2 Identifying the “+” Symbol

In this simple example, we train a neural network to determine the presence or absence of a “+” in a noisy (3 × 3) image, from a set of I labeled training images, all (3 × 3) in size, half of which contain the “+” symbol (in addition to zero-mean Gaussian noise) and half of which contain only noise. Typical examples of the training images are shown in Fig. 12-12.

This problem can be solved using the MLE procedure described in Section 11-2 of the preceding chapter, but the derivation requires the use of likelihood functions and relies heavily on the zero-mean white Gaussian nature of the noise. By using a neural network, we do not need the mathematical analyses associated with the likelihood functions, but we should evaluate the classification accuracy of the neural network to make sure it is acceptable.

The neural network we use in this example has nine terminals (nodes) in its input layer (Fig. 12-13), for the nine pixel values in a (3 × 3) image, nine perceptrons in a single hidden layer, and a single perceptron in the output layer, which (ideally) outputs 1 if the “+” is present and zero if the “+” is not present. The total number of weights is 9² = 81 connecting the input layer to the hidden layer, and an additional 9 weights connecting the hidden layer to the output layer. These do not include the 10 constant inputs in the 10 perceptrons. Hence, the total is 81 + 9 + 10 = 100 weights.

The neural network was trained using 100,000 training images, half of which included the “+” and the other half of which
did not. Each training image included a realization of a zero-mean Gaussian noise field with σ = 0.1. Typical image examples are shown in Fig. 12-12. The weights were all initialized with random numbers drawn from a zero-mean Gaussian distribution with σ = 1. The step size for steepest descent (SD) was set at µ = 0.1, and for the activation function φ(·), the activation parameter was set at a = 7. With such a large value for a, φ(·) resembles a step function. Procedure 2 of Section 12-3 was used.

Figure 12-12 Typical examples of noisy (3 × 3) images (a) with and (b) without the “+” symbol: (a) image of noise plus “+” sign; (b) typical image of noise alone, with no “+” sign.

Figure 12-13 Basic structure of the neural network (Layer 0: input terminals f1 through f9; Layer 1: hidden layer; Layer 2: output). The input image is represented by f = [f1, f2, ..., f9]T. Each blue circle represents a perceptron that weighs the 9 inputs, adds them up, adds a constant (c1 through c10), and goes through an activation function φ(·).

The neural network was tested using 100 additional training images. The results are shown in Fig. 12-14. The correct classification is shown in blue and the neural network output classification is shown in red. The neural network correctly classified the “+” symbol in 98% of the 100 test images.

The performance of the neural network depended heavily on the initial values of the weights wℓ,p,q. For a few initializations, the resulting neural network performed rather poorly, but for most initializations the resulting neural networks correctly classified all test images. A typical neural network correctly classified 98% of the test images.

12-4.3 Zip-Code Reader

For a zip-code reader, each input f[i] is a vector of length 784. Each element of f[i] is a pixel value read by a camera that reads each digit as a (28 × 28) image. The number of pixels is then 28² = 784. Each training image is labeled with a digit (one of {0, 1, ..., 9}) selected by a human judge. Training image f[i], the corresponding output y[i], and the desired output d[i] are given by

f[i] = [f1[i], f2[i], ..., f784[i]]T,   {i = 1, 2, ..., I},   (12.47a)
y[i] = [y1[i], y2[i], ..., y10[i]]T,   (12.47b)

and

d[i] = [d1[i], d2[i], ..., d10[i]]T,   (12.47c)

with

d_j[i] = { 1 for j = correct digit,
           0 for j = incorrect digit.   (12.47d)

Figure 12-14 Performance of the neural network for detection of “+”: correct classification in blue, neural network output in red (two traces of output value versus index of the 100 test images).

A training set of I = 60,000 images of handwritten digits is available at the U.S. National Institute of Standards and Technology (NIST). The zip code digits were handwritten by 250 different people. The neural network, comprised of 784 input terminals, 10 output terminals, and many hidden layers, is illustrated in Fig. 12-15.
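The desired output of Eq. (12.47d) is a length-10 "one-hot" vector. A minimal M/M sketch of its construction, assuming the label digit is stored in a hypothetical variable k, is:

    k = 7;              % hypothetical label digit (0 through 9)
    d = zeros(10,1);    % desired output vector, Eq. (12.47c)
    d(k+1) = 1;         % d_j = 1 only for the correct digit, Eq. (12.47d)
                        % (M/M indexing starts at 1, so digit k maps to entry k+1)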
Figure 12-15 The basic structure of a zip-code reader (784 input terminals, hidden layer, output layer).


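The training computation of Sections 12-3 and 12-4 compresses into a few lines of M/M code. The following is a minimal sketch (not the book's actual program) of one steepest-descent update for the 9-9-1 network of Fig. 12-13, using Eqs. (12.30), (12.41), and (12.46); all variable names, the random stand-in image, and its label are hypothetical.

    a  = 7;  mu = 0.1;                 % activation parameter and step size
    phi = @(v) 1./(1+exp(-a*v));       % sigmoid, Eq. (12.2)
    W1 = randn(9,9); c1 = randn(9,1);  % random initial weights (sigma = 1)
    W2 = randn(1,9); c2 = randn;
    f = randn(9,1);  d = 0;            % stand-ins for one labeled (3 x 3) image
    % Forward pass:
    v1 = W1*f + c1;   y1 = phi(v1);    % hidden layer
    v2 = W2*y1 + c2;  y2 = phi(v2);    % output layer
    % Backward pass:
    d2 = -a*(d - y2)*(1 - y2)*y2;      % output-layer delta, Eq. (12.46)
    d1 = a*(1 - y1).*y1.*(W2'*d2);     % hidden-layer deltas, Eq. (12.41)
    % Weight updates, Eq. (12.30):
    W2 = W2 - mu*d2*y1';  c2 = c2 - mu*d2;
    W1 = W1 - mu*d1*f';   c1 = c1 - mu*d1;

Looping these lines over the labeled training images, repeatedly, implements one form of the training procedures discussed in Section 12-3.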

Summary

Concepts
• The output of a perceptron is the application of an activation function to a weighted sum of its inputs. A perceptron mimics the biological action of a neuron, and perceptrons are often called neurons.
• A common choice for activation function is the sigmoid function (below).
• Classification rules defined by a separating hyperplane can be implemented using a single perceptron.
• A neural network is a network of perceptrons, connected in a way that mimics the connections of neurons in the brain.
• The weights in a neural network are computed using an algorithm called backpropagation, using a set of labelled training images. Backpropagation uses one iteration of a gradient or steepest descent algorithm.
• Computing the weights in a neural network by applying backpropagation using a set of labelled training images is called training the neural network. There are three different ways of performing training.

Mathematical Formulae
Perceptron: y = φ(w0 + w1 x1 + ··· + wN xN)
Sigmoid function: φ(x) = 1/(1 + e^(−ax))
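As a quick numeric illustration of the two formulas above, the following M/M lines evaluate a single perceptron on all four binary input pairs; the weights shown are hypothetical choices (not from the text) that happen to approximate an OR gate:

    a = 7;
    phi = @(x) 1./(1+exp(-a*x));   % sigmoid with activation parameter a
    w = [-0.5 1 1];                % hypothetical weights [w0 w1 w2]
    x = [0 0; 0 1; 1 0; 1 1];      % the four binary input pairs
    y = phi(w(1) + x*w(2:3)')      % outputs: 0.03 0.97 0.97 1.00 (approximately)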

Important Terms: Provide definitions or explain the meaning of the following terms: activation function, backpropagation, gradient, hidden layers, input layer, neural network, neuron, output layer, perceptron, sigmoid function, steepest descent, supervised learning, training.

PROBLEMS

Section 12-1: Overview of Neural Networks

12.1 In Exercise 12-1, you had to determine by inspection the weights of a perceptron so that it implemented an OR gate. Write out a set of nonlinear equations whose solution is the weights in Fig. 12-1(a).

12.2 In Exercise 12-2, you had to determine by inspection the weights of a perceptron so that it implemented an AND gate. Write out a set of nonlinear equations whose solution is the weights in Fig. 12-1(a).

12.3 An image [g0,0 g1,0; g0,1 g1,1] is to be classified as either [1 0; 0 1] or [0 1; 1 0]. Write a set of equations whose solution is the weights replacing those in Fig. 12-3(a).

12.4 An image [g0,0 g1,0; g0,1 g1,1] is to be classified as either [1 0; 0 1] or [0 1; 1 0]. Specify the weights in a perceptron like Fig. 12-3(a) that classifies the image.

12.5 A binary adder implements binary addition (with carry). It has the truth table

    x1     0 0 1 1
    x2     0 1 0 1
    sum    0 1 1 0
    carry  0 0 0 1

where sum = x1 + x2 (mod 2) and carry is the carry (1 + 1 = 10 in base 2). Modify the neural network in Fig. 12-5(b) to implement a binary adder.

12.6 A binary-to-decimal converter accepts as input a 2-bit binary number (x1 x2)₂ and converts it to a decimal number 0,
1, 2 or 3. Design a neural network that accepts as inputs {x1, x2} and outputs {y0, y1, y2, y3}, where if (x1 x2)₂ = K, then yK = 1 and {yk, k ≠ K} are all zero.

12.7 (This problem may require review of Chapter 2.) A 1-D signal x(t) of duration 1 s is sampled at 44100 samples/s, resulting in a discrete-time signal x[n] = x(n/44100) of duration 44100 samples. Design a neural network for determining the presence or absence of a trumpet playing note A (fundamental frequency 440 Hz) by examining the first four harmonics of the trumpet signal. Assume the real parts of their DFT are positive.

12.8 (This problem may require review of Chapter 5.) Design a neural network for edge detection on a 1-D signal {x[n]} of duration N. An edge is at n = n0 if |x[n0] − x[n0 − 1]| > T for some threshold T. Let x[−1] = x[0]. The extension to 2-D edge detection of horizontal or vertical edges is straightforward.

Section 12-2: Training Neural Networks

12.9 The program neuron.m trains a neuron using labeled training vectors. Run the program to train (determine 2 weights of) a neuron to implement an OR gate. This is the same problem as Problem 12.1. The neuron has the form of Fig. 12-1(a). Use 1000 iterations with step size µ = 0.01, a = 7, and initialize all weights with 0.1. Compare the neuron outputs y[i] with the desired output d[j] for j = 1, 2, 3, 4 in a table.

12.10 The program neuron.m trains a neuron using labelled training vectors. Run the program to train (determine 2 weights of) a neuron to implement an AND gate. This is the same problem as Problem 12.2. The neuron has the form of Fig. 12-1(a). Use 1000 iterations with step size µ = 0.01, a = 7, and initialize all weights with 0.1. Compare the neuron outputs y[i] with the desired output d[j] for j = 1, 2, 3, 4 in a table.

12.11 The program neuron.m trains a neuron using labelled training vectors. Run the program to train a neuron to classify a (2 × 2) image as [1 0; 0 1] or [0 1; 1 0]. This is the same problem as Problem 12.3. The neuron has the form of Fig. 12-1(b). Use 1000 iterations with step size µ = 0.01, a = 7, and initialize all weights with 0.1. Compare the neuron outputs y[i] with the desired output d[j] for j = 1, 2 in a table.

Section 12-3: Derivation of Backpropagation

12.12 We clarify the derivation of backpropagation by applying it to a single neuron. Single neurons are discussed in Section 12-1.1 and illustrated in Fig. 12-1. We are given I training input M-vectors x[i] = [x1[i], x2[i], ..., xM[i]]T, where i = 1, ..., I, and I labels {d[i], i = 1, ..., I}, where d[i] is the desired output for training vector x[i]. The goal is to compute weights {wj, j = 0, ..., M} that minimize E[i] = ½(d[i] − y[i])², where y[i] = φ(Σ_{j=0}^{M} wj xj[i]) and x0 = 1 implements the single neuron.
(a) Derive a steepest descent algorithm to compute {wj, j = 0, ..., M} minimizing E[i].
(b) Show that this is the output layer ℓ = L in the backpropagation derivation.

Section 12-4: Neural Network Training Examples

12.13 Program P1213.m creates 100 random points in the square −10 < x1, x2 < 10 and labels them as being inside or outside a circle of radius 8 centered at the origin (so the areas inside and outside the circle are roughly equal: π(8)² = 201 ≈ 200 = ½(20)²). It then trains a neural network with 2 inputs (the x1 and x2 coordinates of each point), 1 output neuron (for inside or outside the circle), and a hidden layer of 10 neurons. It uses 1000 iterations, each running over 100 training points (2 coordinates each), µ = 0.01, and a = 7. Run P1213.m using different initializations until it successfully assigns each training point as being in category #1 (inside) or #2 (outside) the circle. Of course, the neural network doesn't “know” circles; it “learns” this from training.

12.14 Program P1214.m creates 100 random points in the square −10 < x1, x2 < 10 and labels them as being inside or outside a parabola x2 = x1²/10. It then trains a neural network with 2 inputs (the x1 and x2 coordinates of each point), 1 output neuron (for inside or outside the parabola), and a hidden layer of 10 neurons. It uses 1000 iterations, each running over 100 training points (2 coordinates each), µ = 0.01, and a = 7. Run P1214.m using different initializations until it successfully assigns each training point as being in category #1 (inside) or #2 (outside) the parabola. Of course, the neural network doesn't “know” parabolas; it “learns” this from training.

12.15 Program P1215.m creates 100 random points in the square −10 < x1, x2 < 10 and labels them as being inside or outside 4 quarter circles centered on the corners (the areas inside and outside the quarter circles are roughly equal: π(8)² = 201 ≈ 200 = ½(20)²). It then trains a neural network with 2 inputs (the x1 and x2 coordinates of each point), 1 output neuron (for inside or outside the quarter circles), and a hidden layer of 10 neurons. It uses 1000 iterations, each running over 100 training points (2 coordinates each), µ = 0.01, and a = 7. Run P1215.m using different initializations until it successfully
assigns each training point as being in category #1 (inside) or #2 (outside) the circles. Of course, the neural network doesn't “know” circles; it “learns” this from training.
Appendix A
Review of Complex Numbers

A complex number z may be written in the rectangular form

z = x + jy,   (A.1)

where x and y are the real (Re) and imaginary (Im) parts of z, respectively, and j = √−1. That is,

x = Re(z),   y = Im(z).   (A.2)

Note that Im(3 + j4) = 4, not j4.

Alternatively, z may be written in polar form as

z = |z| e^(jθ) = |z| ∠θ,   (A.3)

where |z| is the magnitude of z, θ is its phase angle, and the form ∠θ is a useful shorthand representation commonly used in numerical calculations. By applying Euler's identity,

e^(jθ) = cos θ + j sin θ,   (A.4)

we can convert z from polar form, as in Eq. (A.3), into rectangular form, as in Eq. (A.1):

z = |z| e^(jθ) = |z| cos θ + j |z| sin θ,   (A.5)

which leads to the relations

x = |z| cos θ,   y = |z| sin θ,   (A.6)
|z| = √(x² + y²),   θ = tan⁻¹(y/x).   (A.7)

The two forms of z are illustrated graphically in Fig. A-1. Because in the complex plane a complex number assumes the form of a vector, it is represented by a bold letter.

Figure A-1 Relation between rectangular and polar representations of a complex number z = x + jy = |z|e^(jθ): x = |z| cos θ, y = |z| sin θ, |z| = √(x² + y²), θ = tan⁻¹(y/x).

When using Eq. (A.7), care should be taken to ensure that θ is in the proper quadrant by noting the signs of x and y individually, as illustrated in Fig. A-2. Specifically,

θ = tan⁻¹(y/x)        if x > 0,
    tan⁻¹(y/x) ± π    if x < 0,
    π/2               if x = 0 and y > 0,
    −π/2              if x = 0 and y < 0.

Complex numbers z2 and z4 point in opposite directions and their phase angles θ2 and θ4 differ by 180°, despite the fact that (y/x) has the same value in both cases.

The complex conjugate of z, denoted with a star superscript (or asterisk), is obtained by replacing j (wherever it appears) with −j, so that

z* = (x + jy)* = x − jy = |z| e^(−jθ) = |z| ∠−θ.   (A.8)

The magnitude |z| is equal to the positive square root of the product of z and its complex conjugate:

|z| = √(z z*).   (A.9)

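In MATLAB or MathScript (Appendix B), the angle command performs this quadrant bookkeeping automatically, while the one-argument arctangent does not. A quick check, using the four numbers of Fig. A-2:

    z = [2+3j, -2+3j, -2-3j, 2-3j];    % z1 to z4 of Fig. A-2
    abs(z)                             % 3.6056 for all four
    angle(z)*180/pi                    % 56.31  123.69  -123.69  -56.31 (correct)
    atan(imag(z)./real(z))*180/pi      % 56.31  -56.31   56.31   -56.31 (z2, z3 wrong)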
Figure A-2 Complex numbers z1 = 2 + j3, z2 = −2 + j3, z3 = −2 − j3, and z4 = 2 − j3 have the same magnitude |z| = √(2² + 3²) = 3.61, but their polar angles depend on the polarities of their real and imaginary components: θ1 = tan⁻¹(3/2) = 56.3°, θ2 = 180° − θ1, θ3 = −θ2, and θ4 = −θ1.

We now highlight some of the salient properties of complex algebra.

Equality: If two complex numbers z1 and z2 are given by

z1 = x1 + jy1 = |z1| e^(jθ1),   (A.10a)
z2 = x2 + jy2 = |z2| e^(jθ2),   (A.10b)

then z1 = z2 if and only if (iff) x1 = x2 and y1 = y2 or, equivalently, |z1| = |z2| and θ1 = θ2.

Addition:

z1 + z2 = (x1 + x2) + j(y1 + y2).   (A.11)

Multiplication:

z1 z2 = (x1 + jy1)(x2 + jy2) = (x1 x2 − y1 y2) + j(x1 y2 + x2 y1),   (A.12a)

or

z1 z2 = |z1| e^(jθ1) · |z2| e^(jθ2) = |z1||z2| e^(j(θ1+θ2)) = |z1||z2| [cos(θ1 + θ2) + j sin(θ1 + θ2)].   (A.12b)

Division: For z2 ≠ 0,

z1/z2 = (x1 + jy1)/(x2 + jy2) = [(x1 + jy1)(x2 − jy2)] / [(x2 + jy2)(x2 − jy2)] = [(x1 x2 + y1 y2) + j(x2 y1 − x1 y2)] / (x2² + y2²),   (A.13a)

or

z1/z2 = (|z1| e^(jθ1)) / (|z2| e^(jθ2)) = (|z1|/|z2|) e^(j(θ1−θ2)) = (|z1|/|z2|) [cos(θ1 − θ2) + j sin(θ1 − θ2)].   (A.13b)

Powers: For any positive integer n,

zⁿ = (|z| e^(jθ))ⁿ = |z|ⁿ e^(jnθ) = |z|ⁿ (cos nθ + j sin nθ),   (A.14)

z^(1/2) = ±|z|^(1/2) e^(jθ/2) = ±|z|^(1/2) [cos(θ/2) + j sin(θ/2)].   (A.15)

Useful relations:

−1 = e^(jπ) = e^(−jπ) = 1 ∠180°,   (A.16a)
j = e^(jπ/2) = 1 ∠90°,   (A.16b)
−j = −e^(jπ/2) = e^(−jπ/2) = 1 ∠−90°,   (A.16c)
√j = (e^(jπ/2))^(1/2) = ±e^(jπ/4) = ±(1 + j)/√2,   (A.16d)
√−j = ±e^(−jπ/4) = ±(1 − j)/√2.   (A.16e)

For quick reference, the preceding properties of complex numbers are summarized in Table A-1. Note that if a complex number is given by (a + jb) and b = 1, it can be written either as (a + j1) or simply as (a + j). Thus, j is synonymous with j1.
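The useful relations above are easy to verify numerically; for example, in MATLAB or MathScript (Appendix B), where sqrt returns only the principal (plus-sign) root:

    exp(1j*pi)     % -1.0000 + 0.0000i, Eq. (A.16a) (up to roundoff)
    sqrt(1j)       %  0.7071 + 0.7071i = (1 + j)/sqrt(2), Eq. (A.16d)
    sqrt(-1j)      %  0.7071 - 0.7071i = (1 - j)/sqrt(2), Eq. (A.16e)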
Table A-1 Properties of complex numbers.

Euler's identity: e^(jθ) = cos θ + j sin θ
sin θ = (e^(jθ) − e^(−jθ))/(2j)           cos θ = (e^(jθ) + e^(−jθ))/2
z = x + jy = |z| e^(jθ)                   z* = x − jy = |z| e^(−jθ)
x = Re(z) = |z| cos θ                     |z| = √(z z*) = √(x² + y²)
y = Im(z) = |z| sin θ                     θ = tan⁻¹(y/x)
zⁿ = |z|ⁿ e^(jnθ)                         z^(1/2) = ±|z|^(1/2) e^(jθ/2)
z1 = x1 + jy1                             z2 = x2 + jy2
z1 = z2 iff x1 = x2 and y1 = y2           z1 + z2 = (x1 + x2) + j(y1 + y2)
z1 z2 = |z1||z2| e^(j(θ1+θ2))             z1/z2 = (|z1|/|z2|) e^(j(θ1−θ2))
−1 = e^(jπ) = e^(−jπ) = 1 ∠±180°
j = e^(jπ/2) = 1 ∠90°                     −j = e^(−jπ/2) = 1 ∠−90°
√j = ±e^(jπ/4) = ±(1 + j)/√2              √−j = ±e^(−jπ/4) = ±(1 − j)/√2

Example A-1: Working with Complex Numbers

Given two complex numbers

V = 3 − j4,
I = −(2 + j3),

(a) express V and I in polar form, and find (b) VI, (c) VI*, (d) V/I, and (e) √I.

Solution:
(a)

|V| = √(VV*) = √((3 − j4)(3 + j4)) = √(9 + 16) = 5,
θV = tan⁻¹(−4/3) = −53.1°,
V = |V| e^(jθV) = 5 e^(−j53.1°) = 5 ∠−53.1°,
|I| = √(2² + 3²) = √13 = 3.61.

Since I = (−2 − j3) is in the third quadrant in the complex plane (Fig. A-3),

θI = −180° + tan⁻¹(3/2) = −123.7°,
I = 3.61 ∠−123.7°.

Alternatively, whenever the real part of a complex number is negative, we can factor out a (−1) multiplier and then use Eq. (A.16a) to replace it with a phase angle of either +180° or −180°, as needed. In the case of I, the process is as follows:

I = −2 − j3 = −(2 + j3) = e^(±j180°) · √(2² + 3²) e^(j tan⁻¹(3/2)) = 3.61 e^(j56.3°) e^(±j180°).
Since our preference is to end up with a phase angle within the range between −180° and +180°, we will choose −180°. Hence,

I = 3.61 e^(−j123.7°).

Figure A-3 Complex numbers V = 3 − j4 and I = −2 − j3 in the complex plane (Example A-1).

(b)

VI = (5 ∠−53.1°)(3.61 ∠−123.7°) = (5 × 3.61) ∠(−53.1° − 123.7°) = 18.05 ∠−176.8°.

(c)

VI* = 5 e^(−j53.1°) × 3.61 e^(j123.7°) = 18.05 e^(j70.6°).

(d)

V/I = 5 e^(−j53.1°) / (3.61 e^(−j123.7°)) = 1.39 e^(j70.6°).

(e)

√I = √(3.61 e^(−j123.7°)) = ±√3.61 e^(−j123.7°/2) = ±1.90 e^(−j61.85°).

Exercise A-1: Express the following complex functions in polar form:

z1 = (4 − j3)²,
z2 = (4 − j3)^(1/2).

Answer: z1 = 25 ∠−73.7°, z2 = ±√5 ∠−18.4°. (See IP)

Exercise A-2: Show that √(2j) = ±(1 + j). (See IP)
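The results of Example A-1 are easily checked numerically using the M/M commands of Appendix B (sqrt returns only the principal root):

    V = 3-4j;  I = -(2+3j);
    [abs(V*I) angle(V*I)*180/pi]              % 18.0278  -176.82, part (b)
    [abs(V*conj(I)) angle(V*conj(I))*180/pi]  % 18.0278    70.56, part (c)
    [abs(V/I) angle(V/I)*180/pi]              %  1.3868    70.56, part (d)
    sqrt(I)                                   % 0.8960 - 1.6742i, i.e., 1.90 at -61.85 degrees, part (e)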
Appendix B
MATLAB® and MathScript: A Short Introduction for Use in Image Processing

B-1 Background

“A computer will always do exactly what you tell it to do. But that may not be what you had in mind.” (a quote from the 1950's)

This Appendix is a short introduction to MATLAB and MathScript for this book. It is not comprehensive; only commands directly applicable to signal and image processing are covered. No commands in any of MATLAB's Toolboxes are included, since these commands are not included in basic MATLAB or MathScript. Programming concepts and techniques are not included, since they are not used anywhere in this book.

MATLAB

MATLAB is a computer program developed and sold by the Mathworks, Inc. It is the most commonly used program in signal processing, but it is used in all fields of engineering. “MATLAB” (matrix laboratory) was originally based on a set of numerical linear algebra programs, written in FORTRAN, called LINPACK. So MATLAB tends to formulate problems in terms of vectors and arrays of numbers, and often solves problems by formulating them as linear algebra problems.

The student edition of MATLAB is much cheaper than the professional version of MATLAB. It is licensed for use by all undergraduate and graduate students. Every program on the website for this book will run on it.

MathScript

MathScript is a computer program developed and sold by National Instruments, as a module in LabVIEW. The basic commands used by MATLAB also work in MathScript, but higher-level MATLAB commands, and those in Toolboxes, usually do not work in MathScript. Unless otherwise noted, all MATLAB commands used in this book and website also work in MathScript.

One important exception is colormap(gray). To make this work in MathScript, G=[0:64]/64;gray=G'*[1 1 1]; must be inserted at the beginning of the program.

Instructions on how to acquire a student version of MathScript are included on the website accompanying the book, as part of the student edition of LabVIEW. In the sequel, we use “M/M” to designate “MATLAB or MathScript.”

Freemat and GNU Octave are freeware programs that are mostly compatible with MATLAB.

Getting Started

To install the student version of MathScript included on the website, follow the instructions.

When you run M/M, a prompt >> will appear when it is ready. Then you can type commands. Your first command should be >>cd mydirectory, to change directory to your working directory, which we call “mydirectory” here.

We will use this font to represent typed commands and generated output. You can get help for any command, such as plot, by typing at the prompt help plot.

Some basic things to know about M/M:

• Inserting a semicolon “;” at the end of a command suppresses output; without it M/M will type the results of the computation. This is harmless, but it is irritating to have numbers flying by on your screen.
• Inserting ellipses “...” at the end of a command means it is continued on the next line. This is useful for long commands.
• Inserting “%” at the beginning of a line makes the line a comment; it will not be executed. Comments are used to explain what the program is doing at that point.
• clear eliminates all present variables. Programs should always start with a clear.
• whos shows all variables and their sizes.
• M/M variables are case-sensitive: t and T are different variables.
• save myfile X,Y saves the variables X and Y in the file myfile.mat for use in another session of M/M at another time.
• load myfile loads all variables saved in myfile.mat, so they can now be used in the present session of M/M.
• quit ends the present session of M/M.

.m Files

An M/M program is a list of commands executed in succession. Programs are called “m-files” since their extension is “.m,” or “scripts.”

To write an .m file, at the upper left, click File→New→m-file. This opens a window with a text editor. Type in your commands and then do this: File→Save as→myname.m. Make sure you save it with an .m extension. Then you can run the file by typing its name at the prompt: >>myname. Make sure the file name is not the same as a MATLAB command! Using your own name is a good idea.

You can access previously-typed commands using uparrow and downarrow on your keyboard.

To download a file from a website, right-click on it, select save target as, and use the menu to select the proper file type (specified by its file extension).

B-2 Basic Computation

B-2.1 Basic Arithmetic

• Addition: 3+2 gives ans=5
• Subtraction: 3-2 gives ans=1
• Multiplication: 2*3 gives ans=6
• Division: 6/2 gives ans=3
• Powers: 2^3 gives ans=8
• Others: sin,cos,tan,exp,log,log10
• Square root: sqrt(49) gives ans=7
• Conjugate: conj(3+2j) gives ans=3-2i

Both i and j represent √−1; answers use i. pi represents π. e does not represent 2.71828.

B-2.2 Entering Vectors and Arrays

To enter row vector [1 2 3] and store it in A, type at the prompt A=[1 2 3]; or A=[1,2,3];

To enter the same numbers as a column vector and store it in A, type at the prompt either A=[1;2;3]; or A=[1 2 3];A=A'; Note A=A' replaces A with its transpose. “Transpose” means “convert rows to columns, and vice-versa.”

To enter a vector of consecutive or equally-spaced numbers, follow these examples:

• [2:6] gives ans=2 3 4 5 6
• [3:2:9] gives ans=3 5 7 9
• [4:-1:1] gives ans=4 3 2 1

To enter an array or matrix of numbers, type, for example, B=[3 1 4;1 5 9;2 6 5]; This gives the array B and its transpose B':

B = [3 1 4        B' = [3 1 2
     1 5 9              1 5 6
     2 6 5]             4 9 5]

Other basics of arrays:

• ones(M,N) is an M × N array of “1”
• zeros(M,N) is an M × N array of “0”
• length(X) gives the length of vector X
• size(X) gives the size of array X. For B above, size(B) gives ans=3 3
• A(I,J) gives the (I,J)th element of A. For B above, B(2,3) gives ans=9

B-2.3 Array Operations

Arrays add and subtract point-by-point: X=[3 1 4];Y=[2 7 3];X+Y gives ans=5 8 7. But X*Y generates an error message.

To compute various types of vector products:

• To multiply element-by-element, use X.*Y This gives ans=6 7 12. To divide element-by-element, type X./Y
• To find the inner product of X and Y, which is (3)(2)+(1)(7)+(4)(3)=25, use X*Y'. This gives ans=25
• To find the outer product of X and Y, which is

[(3)(2) (3)(7) (3)(3)
 (1)(2) (1)(7) (1)(3)
 (4)(2) (4)(7) (4)(3)]

use X'*Y This gives the above matrix.

A common problem is when you think you have a row vector when in fact you have a column vector. Check by using size(X); in the present example, the command gives ans=1,3 which tells you that X is a 1 × 3 (row) vector.

• The following functions operate on each element of an array separately, giving another array: sin,cos,tan,exp,log,log10,sqrt. For example, cos([0:3]*pi) gives ans=1 -1 1 -1
• To compute n² for n = 0, 1 ... 5, use [0:5].^2 which gives ans=0 1 4 9 16 25
• To compute 2ⁿ for n = 0, 1 ... 5, use 2.^[0:5] which gives ans=1 2 4 8 16 32

Other array operations include:

• A=[1 2 3;4 5 6];(A(:))' Stacks A by columns into a column vector and transposes the result to a row vector. In the present example, the command gives ans=1 4 2 5 3 6
• reshape(A(:),2,3) Unstacks the column vector to a 2×3 array which, in this case, is the original array A.
• X=[1 4 1 5 9 2 6 5];C=X(2:8)-X(1:7) Takes differences of successive values of X. In the present example, the command gives C=3 -3 4 4 -7 4 -1
• D=[1 2 3]; E=[4 5 6]; F=[D E] This concatenates the vectors D and E (i.e., it appends E after D to get vector F). In the present example, the command gives F=1 2 3 4 5 6
• I=find(A>2) stores in I locations (indices) of elements of vector A exceeding 2. find([3 1 4 1 5]<2) gives ans=2 4
• A(A>2)=0 sets to 0 all values of elements of vector A exceeding 2. For example, A=[3 1 4 1 5];A(A<2)=0 gives A=3 0 4 0 5

M/M indexing of arrays starts with 1, while signal and image indexing starts with 0. For example, the DFT is defined using index n = 0, 1 ... N − 1, for k = 0, 1 ... N − 1. fft(X), which computes the DFT of X, performs fft(X)=X*exp(-j*2*pi*[0:N-1]'*[0:N-1]/N);

B-2.4 Solving Systems of Equations

To solve the linear system of equations

[1 2; 3 4][x; y] = [17; 39]

use A=[1 2;3 4];Y=[17;39];X=A\Y;X' which gives ans=5.000 6.000, the solution [x y]'.

To solve the complex system of equations

[1+2j 3+4j; 5+6j 7+8j][x; y] = [16+32j; 48+64j],

[1+2j 3+4j;5+6j 7+8j]\[16+32j;48+64j] gives ans=[2-2i; 6+2i], which is the solution.

These systems can also be solved using inv(A)*Y, but this is a bad idea, since computing the matrix inverse of A takes much more computation than just solving the system of equations. Computing a matrix inverse can lead to numerical difficulties for large matrices.

B-3 Plotting

B-3.1 Plotting Basics

To plot a function x(t) for a ≤ t ≤ b:

• Generate, say, 100 values of t in a ≤ t ≤ b using T=linspace(a,b,100);
• Generate and store 100 values of x(t) in X
• Plot each computed value of X against its corresponding value of T using plot(T,X)
• If you are making several different plots, put them all on one page using subplot. subplot(324),plot(T,X) divides a figure into a 3-by-2 array of plots, and puts the X vs. T plot into the 4th place in the array (the middle of the rightmost column).

One problem with MathScript that does not arise with MATLAB is that in MathScript subplot(324) opens 6 figures, even if only one or two of them will actually be used for plots. This is inelegant but harmless.

Print out the current figure (the one in the foreground; click on a figure to bring it to the foreground) by typing print. Print the current figure to an encapsulated postscript file myname.eps by typing print -deps2 myname.eps. Type help print for a list of printing options for your computer. For example, use -depsc2 to save a figure in color.

To make separate plots of cos(4t) and sin(4t) for 0 ≤ t ≤ 5 in a single figure, use the following:

T=linspace(0,5,100);X=cos(4*T);Y=sin(4*T);
subplot(211),plot(T,X)
subplot(212),plot(T,Y)

These commands produce a figure containing two plots, cos(4t) above sin(4t), each on the interval 0 ≤ t ≤ 5.

The default is that plot(X,Y) plots each of the 100 ordered pairs (X(I),Y(I)) for I = 1, ..., 100, and connects the points with straight lines. If there are only a few data points to be plotted, they should be plotted as individual ordered pairs, not connected by lines. This can be done using plot(X,Y,'+').

B-3.2 Plotting Problems

Common problems encountered using plot: T and X must have the same lengths; and neither T nor X should be complex; use plot(T,real(X)) if necessary.

The above linspace command generates 100 equally spaced numbers between a and b, including a and b. This is not the same as sampling x(t) with a sampling interval of (b − a)/100. To see why:

• linspace(0,1,10) gives 10 numbers between 0 and 1 inclusive, spaced by 0.111;
• [0:.1:1] gives 11 numbers spaced by 0.1.

Try the following yourself on M/M:

• T=[0:10];X=3*cos(T);plot(T,X) This should be a very jagged-looking plot, since it is only sampled at 11 integers and the samples are connected by lines.
• T=[0:0.1:10];X=3*cos(T);plot(T,X) This should be a much smoother plot since there are now 101 (not 11) samples.
• T=[1:4000];X=cos(2*pi*440*T/8192); sound(X,8192) This is musical note “A.” sound(X,Fs) plays the vector X as sound, at a sampling rate of Fs samples/second.
• plot(X). This should be a blue smear! It is about 200 cycles squished together.
• plot(X(1:100)) This “zooms in” on the first 100 samples of X to see the sinusoid. It is also possible to zoom in by clicking on the figure.

B-3.3 More Advanced Plotting

Plots should be labelled and annotated. Use:

• title('Myplot') adds the title “Myplot”
• xlabel('t') labels the x axis with “t”
• ylabel('x') labels the y axis with “x”.
• $\omega$ produces ω in title, xlabel and ylabel. Similarly for other Greek letters. Note ', not `, should be used everywhere.
• axis tight contracts the plot borders to the limits of the plot itself.
• axis([a b c d]) changes the horizontal axis limits to a ≤ x ≤ b and the vertical axis limits to c ≤ y ≤ d.
• grid on adds grid lines to the plot.
• plot(T,X,'g',T,Y,'r') plots on the same plot (T, X, Y must all have the same lengths) X vs. T in green and Y vs. T in red.

There is much, much more that can be done. Type help plot to see how to do it.

B-4 Image Commands

B-4.1 Reading Images into M/M

X=imread('picture.jpg') reads the (M × N) JPEG image picture.jpg into the (M × N) M/M array X if picture.jpg is a grayscale (black and white) image.

For color images, X=imread('picture.jpg') reads the (M × N) JPEG image picture.jpg into the (M × N × 3) 3-D M/M array X if picture.jpg is an RGB color image. Then X(:,:,1) is the red component of the image, X(:,:,2) is the green component, and X(:,:,3) is the blue component (see Chapter 10).

The following image formats can be handled using imread: JPEG, TIFF, BMP, PNG, HDF, PCX. The extension of the picture filename is used to determine the type of image.

B-4.2 Displaying Images in M/M

An image will be displayed using the default axis setting. This usually alters the aspect ratio of the displayed image. This can be corrected by axis image, which will display the array as being (M × N).

figure,imagesc(X),colormap(gray) displays the (M × N) array X as a grayscale (black-and-white) image. Adding axis off suppresses the numbers along the axes, which are in MATLAB format (see Chapter 3). imagesc(X) scales the array so that its values range from 0 to 1 for display purposes (see Chapter 5). image(X) omits this scaling, so in general it should not be used.

For color images, figure,imagesc(X) displays the (M × N × 3) 3-D MATLAB array X as a color image, with colors determined as discussed above. Despite its name, imagesc(X) does not scale the array so that its values range from 0 to 1 for color images; this must be done separately for each component:

X(:,:,1)=X(:,:,1)/max(max(X(:,:,1)));
X(:,:,2)=X(:,:,2)/max(max(X(:,:,2)));
X(:,:,3)=X(:,:,3)/max(max(X(:,:,3)));

prior to using imagesc(X). This was done for all programs for Chapter 10.

B-4.3 Image Computations in M/M

To process X using M/M, X must be converted from uint8 format to the double-precision, 64-bit, floating-point format used by M/M computation. This can be accomplished by inserting X=double(X) after reading in X.

conv2(X,H) computes the 2-D discrete-space convolution of X and H. If X is (M1 × M2) and H is (L1 × L2) then Y is (N1 × N2), where Ni = Mi + Li − 1 for i = 1, 2 (see Section 3-6.2). conv2(X,H,'valid') computes the 'valid' 2-D discrete-space convolution, defined in Section 7-12.3.

fft2(X,M,N) computes the (M × N) 2-D DFT of the image in array X. If the size of X is smaller than (M × N), it is zero-padded to size (M × N). If M and N are not specified (i.e., fft2(X)), then M and N are set to the size of the image. Section 3-9 discusses some issues in the computation and display of fft2(X,M,N).

real(ifft2(F,M,N)) computes the (M × N) inverse 2-D DFT of F. Even if F is conjugate symmetric, roundoff error will incorrectly make ifft2(F,M,N) a complex array. So the real part of ifft2(F,M,N) should always be computed. Section 3-9 discusses some issues in the computation and display of real(ifft2(X,M,N)).

B-5 Miscellaneous Commands

B-5.1 Rectangular-to-Polar Complex Conversion

If an M/M result is a complex number, then it is presented in its rectangular form a+bj. M/M recognizes both i and j as √−1, so that complex numbers can be entered as 3+2j or 3+2i.

To convert a complex number X to polar form, use abs(X),angle(X) to get its magnitude and phase (in radians), respectively. To get its phase in degrees, use angle(X)*180/pi. Note atan(imag(X)/real(X)) will not give the correct phase, since this formula is only valid if the real part is positive. angle corrects this.
The real and imaginary parts of X are found using real(X)


and imag(X), respectively.

B-5.2 Polynomial Zeros


To compute the zeros of a polynomial, enter its coefficients as a row vector P and use R=roots(P). For example, to find the zeros of 3x³ − 21x + 18 (the roots of 3x³ − 21x + 18 = 0) use P=[3 0 -21 18];R=roots(P);R', giving ans=-3.0000 2.0000 1.0000, which are the roots.
To find the monic (leading coefficient one) polynomial having a given set of numbers for zeros, enter the numbers as a column vector R and use P=poly(R). For example, to find the polynomial having {1, 3, 5} as its zeros, use R=[1;3;5];P=poly(R), giving P=1 -9 23 -15. The polynomial is therefore x³ − 9x² + 23x − 15.
Note that polynomials are stored as row vectors, and roots are stored as column vectors.

B-5.3 Discrete-Time Commands


• stem(N,X) produces a stem plot of X vs. N
• conv(X,Y) convolves X and Y.
• fft(X,N) computes the N-point DFT of X.
• ifft(F) computes the inverse DFT of F. Due to roundoff
error, use real(ifft(F)).
• sinc(X) computes sin(πx)/(πx) for each element.
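As a closing sketch that ties several of these commands together, the following hypothetical script reads a grayscale image, blurs it with a (3 × 3) averaging PSF, and verifies that the zero-padded 2-D DFT route of Section B-4.3 gives the same result as conv2 (picture.jpg and all variable names are stand-ins; in MathScript, define gray as in Section B-1):

    clear
    X = imread('picture.jpg');          % read grayscale image (uint8)
    X = double(X);                      % convert to double for computation
    H = ones(3,3)/9;                    % (3 x 3) averaging PSF (example)
    Y = conv2(X,H);                     % 2-D convolution; size Ni = Mi + Li - 1
    [N1,N2] = size(Y);
    Y2 = real(ifft2(fft2(X,N1,N2).*fft2(H,N1,N2)));   % same result via 2-D DFTs
    max(max(abs(Y-Y2)))                 % should be tiny (roundoff only)
    figure,imagesc(Y),colormap(gray)    % display blurred image
    axis image, axis off                % correct aspect ratio; hide axis numbers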
Index

1-D Fourier transforms, 47–53 antialias filtering, 141


1-D continuous-time signals, 41–43 antialias lowpass filtering, 141–143
1-D continuous-time systems, 43–47 application of MRF to image segmentation, 327–329
1-D discrete-time signals and systems, 59–66 APSs, 4
1-D estimation examples, 300–303 associative property, 45
1-D fractals, 314–320 autocovariance functions, 286
2-D, 2 average, 213, 217, 221
2-D continuous-space Fourier transform, 94–107 average image, 229
2-D continuous-space images, 91–93 axial resolution, 25
2-D convolution, 94
2-D discrete Fourier transform, 119–121 B-spline, 129
2-D discrete space, 113–118 B-splines interpolation, 143–149
2-D discrete-space Fourier transform, 118–119 backpropagation, 397, 398
2-D estimation, 309–313 bandlimited, 53, 107
2-D fractals, 320–322 basis functions, 203, 206
2-D impulse, 91 basis pursuit, 238
2-D sampling theorem, 107–113 basis pursuit denoising, 238
2-D slices, 19 batches, 402
2-D spline interpolation, 149–150 Bayes’s rule, 293
2-D wavelet transform, 228–232 Bayesian estimators, 296
bed of nails, 107
a posteriori, 293 Bessel function, 8, 103
a posteriori pdf, 293 bilateral, 52
a priori, 293 bilateral power spectral density, 281
a priori information, 292 bilinear, 149
a priori pdf, 293 binary, 325
absorption coefficient, 18 binary image, 326
activation constant, 392 binomial probability density function, 265
activation function, 391 black, 335, 337
active pixel sensors, 4 blackbody, 11
additive, 336 blackbody radiation law, 11
additive noise, 181, 184 blurring, 181
Airy disc, 7 box image, 91
aliased, 54, 141 brickwall lowpass filter, 184
aliasing, 73 brightness, 5
analysis filter bank, 216 butterfly diagram, 77
analysis filters, 214
angular frequency, 21 Canny edge detector, 173
angular resolution, 9 CCDs, 4


CDF, 168 constant speed, 196


center of mass, 295 continuous random variables, 261
central limit theorem, 144 continuous space, 285, 361
centroids, 380 continuous-space Fourier transform, 94
characteristic function, 275 continuous-space systems, 93–94
charge-coupled device, 4, 335 continuous-time Fourier transform, 47
circularly shift, 123 continuous-time random process, 278
class index, 354 continuous-time signals, 207
class-assignment algorithm, 323 convolution integral, 45
classification, 322 convolved image, 166
classification by MAP, 358–360 cooled detectors, 12
classification by MLE, 357–358 coordinates, 376
classification of rotated images, 366–367 correlation, 270, 354, 356
classification of spatially scaled images, 361–366 correlation coefficient, 270
classification of spatially shifted images, 360–361 cost function, 193
Classification Rule, 373 covariance, 270
clique, 323 covariance matrix, 273
CMOS, 4 cross-correlation, 360
CMYK, 335 cross-covariance matrix, 273
coin-flip experiment, 298–300 cross-validation, 404
color, 335 CSFT, 94
color cube, 336 CT, 2
color image classification, 367–373 CT scan, 18
color science, 339 CTFT, 47
color systems, 335–340 cumulative distribution function, 168
color-image deblurring, 343–346 cutoff index, 184
cyan, 335, 337
colorimetry, 339
cyclic, 73
commutative property, 45
cyclic convolution, 209–213
comparison of 2-D interpolation methods, 150
complement, 255, 256 Daubechies wavelet function, 203, 225
complex conjugate, 411 deblurring, 160, 181
complex number, 411 decimating, 212
compressed, 206 decimation, 141, 203
compressed sensing, 203, 236–238 decomposed, 76
compressed sensing examples, 242–249 decomposition structure, 218
compression, 203 deconvolution, 160
computation of continuous-time Fourier transform using the deconvolution using the DFT, 80–82
DFT, 82–84 deep learning, 393
computation of the 2-D DFT using MATLAB, 121–124 degree of similarity, 354
computed tomography, 2, 18 denoising, 160, 181, 182
computing solutions to underdetermined equations, 238–240 denoising by lowpass filtering, 183–188
conditional expectation, 270 denoising by thresholding and shrinking, 232–235
conditional pdf, 265 denoising color images, 346–347
conditional pmf, 268 density slicing, 9
conditional probability, 259–261 derivation of backpropagation, 403–404
confusion matrix, 358 detail, 214, 217, 221
conjugate symmetry, 50, 67, 95, 120 detail images, 229
connecting the dots, 129 detector resolution, 9

deterministic, 181, 292 edge thinning, 173, 175


deterministic deconvolution, 309 edge-detection gradient, 173
deterministic estimate, 319 effects of shifts on pdfs and pmfs, 263–265
deterministic versus stochastic Wiener filtering, 307–309 electromagnetic, 3
DFT, 70, 119 EM, 3
difference operator, 171 emission, 11
diffraction pattern, 7 emissivity, 12
dimensionality reduction, 377 energy, 43
direct, 181 energy spectral density, 52, 67
direct and inverse problems, 181–183 estimate, 234
direct problem, 236 estimation, 255, 292
discrete, 69 estimation accuracy, 193
discrete Fourier transform, 70–76, 119 estimation error, 303
discrete random variables, 261 estimation methods, 292–298
discrete space, 113, 285 estimation problem, 292
discrete time, 59 Euler’s identity, 411
discrete-space Fourier transform, 118, 183 even, 49, 77, 93
discrete-space image, 113 even symmetry, 49
discrete-space spatial frequency, 184 even values of n, 77
discrete-time eternal sinusoid, 61 event, 255
discrete-time Fourier transform, 66–70 examples of image interpolation applications, 152–155
discrete-time frequency, 61 excitation, 20
discrete-time (Kronecker) impulse, 61 expansion of signals in orthogonal basis functions, 206–209
discrete-time random process, 278, 279 expectation, 269
discrete-time rectangle function, 69
discrete-time signal, 59 false-color, 335
discrete-time sinc function, 68 false-color image, 90
discrete-time wavelet transforms, 218–223 fast Fourier transform, 70, 76–80, 119
disjoint, 256 fast iterative shrinkage and thresholding algorithm, 242
disk-image, 92 FFT, 70, 119
display dynamic range, 160 field gradient, 20
displaying images, 90–91 fill in the gaps, 129
dissimilarity index, 325 filtering, 213
distribution function, 168 filtering of signals and images, 203
distributive property, 45 finite-impulse response, 68
downsampled average signal, 214 FIR, 68
downsampled detail signal, 214 first iteration, 381
downsampling, 140–141 FISTA, 242
DSFT, 118, 183 focal underdetermined system solver, 240
DSSF, 184 focusing, 23
DTFT, 66 FOCUSS, 240
DTFT of the Smith-Barnwell condition, 221 four-color printing process, 338
duration, 43, 61, 143 fractal, 290
dynamic range, 167 fractal-like, 314
frequency exponent, 315
edge, 171 frequency filtering, 50
edge detection, 160, 171–176 frequency response, 50, 51
edge image, 341 functions of random variables, 269–272
edge indicator, 173 fundamental frequency, 186

fundamental period, 61 image enhancement, 160


image physical size on computer screen, 131
gamma transformation, 162 image pixel area, 131
Gaussian, 266, 279 image plane, 5
Gaussian pdf, 293 image recognition, 354
Gaussian random vector, 275 image reconstruction fidelity, 107
Gaussian random vectors, 275–278 image recording, 196
geometric pmf, 259 image restoration, 160, 181
Gibbs distribution, 325 image sharpening filter, 166
global minimum, 397 image shifting, 152
GPUs, 393 image spatial resolution, 9
gradient, 397 image spectra, 92
gradient descent, 397, 402 image texture, 323
gradient fields, 21 images, 92
gradient threshold, 173 imaginary, 411
graphical processing units, 393 implementation of upsampling using 2-D DFT in MATLAB,
grayscale, 335, 339, 341 137–140
grayscale image, 90 impulse, 41
gyromagnetic ratio, 21 impulse response, 8, 44
impulse train, 54
Haar, 203 in phase, 338
Haar transform, 226 incremental learning, 402
Haar wavelet transform, 213–218 independent and identically distributed, 279, 300
Hammersley-Clifford theorem, 325 independent events, 260
Hamming window, 68, 186, 188 independent random variable, 266, 271
Hankel transform, 103 infrared, 9
hard thresholding, 242 infrared catastrophe, 316
hexagonal sampling, 110 input layer, 393
hidden layers, 393 interpolation, 129, 205
histogram, 168 interpolation using sinc functions, 129–130
histogram equalization, 160, 167–170 intersection, 255, 256
histogram equalization and edge detection, 340–343 interval, 61
histogram-equalized image, 167 interval probability, 262, 263
horizontal-direction vertical-edge detector, 172 inverse, 181
HSI, 339 inverse problem, 236, 292
HSV, 339 IR, 9
hue, 339 IRLS, 239
hyperplane, 394 Ising model, 325
Ising-Gibbs distribution, 326
ICM, 328 ISTA, 241
ideal lowpass filter, 50 Iterated Conditional Modes, 328
iff, 412 iterative reweighted least squares, 239
IID, 279, 300 iterative shrinkage and thresholding algorithm, 241
ill-posed problem, 237
image, 107 jinc function, 106
image array size, 131 joint covariance matrix, 369
image classification by correlation, 354–356 joint pdf, 265, 357
image deconvolution, 191–194 joint pdfs and pmfs, 265–269
image dynamic range, 160 joint probability mass function, 266

jointly Gaussian random variables, 275 LSI, 94, 117


LTI, 41, 43, 44
K-Means clustering algorithm, 380–382 LTI filtering of random processes, 282–285
K-stage decomposition, 221 LU decomposition, 239
knots, 143 luminance, 338
LWIR, 11
L, 43, 44
label, 399 MADs, 78
labeled, 390 magenta, 335, 337
Lanczos interpolation formula, 130 magnetic quantum number, 21
Landweber algorithm, 241–242 magnetic resonance, 19
Landweber iteration, 241 magnetic resonance imaging, 2, 19–23, 236
Laplacian, 164 magnetized, 20
Laplacian operator, 164 MAP, 292, 293, 323
Laplacian pdf, 293 marginal pdf, 357, 369
Laplacian’s spatial frequency response, 165 marginal pdfs, 265, 357
Larmor frequency, 21 marginal pmfs, 268
LASSO cost functional, 234, 238 Markov random fields, 255, 292, 322–326
lateral resolutions, 25 “mask” image, 164
LCD, 5, 91 maximum a posteriori, 293
learning, 354, 390 maximum likelihood estimate, 357
least absolute shrinkage and selection operator, 234, 238 maximum likelihood estimation, 293
least-squares estimation, 303–307 mean, 269
least-squares estimator, 293, 295 mean square error, 295
lens law, 4 mean value, 269
letter recognition, 358 mean vector, 273
likelihood, 255, 293 mean-square error, 399
likelihood function, 292, 370 measurement precision, 193
linear, 43, 44 median filtering, 194
linear convolution, 73 MEP, 359
linear least-squares estimate, 304 mesh plot, 90
linear operator, 269 middle-wave IR, 11
linear programming, 238 minimum error probability, 359
linear shift-invariant, 94, 117 minimum ℓ1 norm, 237
linear time-invariant, 41, 44 MLE, 292, 293, 323, 358
linear transformation, 160 MLE criterion, 370
liquid crystal display, 5, 91 modified functions, 212
local gradient, 403 modified MLE criterion, 371
local minimum, 397 modulated signal, 78
locality, 325 moment of inertia, 295
localization, 21 morphing, 152
location-independent covariance matrix, 369 motion blur, 195
log-likelihood function, 293, 302 motion-blur deconvolution, 195–197
logarithmic, 162 MR, 19
logarithmic warping, 362 MRF, 292, 322
logarithmically transformed scale factors, 363 MRI, 2, 19, 236
long-wave IR, 11 MSE, 295
lowpass filter, 99 multilayered perceptrons, 390
LSE, 292, 293, 295 multiplications and additions, 76, 78

mutually exclusive, 260 overdetermined, 238


MWIR, 11
parallel-axis theorem, 295
N-D CSFT, 275 Parseval’s theorem, 48, 67
N-dimensional continuous-space Fourier transform, 275 partition function, 325
N-dimensional spatial frequency vector, 275 path attenuation, 19, 247
N-point, 71 pdf, 182, 262
near IR, 11 perceptrons, 390
nearest-neighbor, 107, 145 perfect-reconstruction conditions, 219
neighborhood, 323 perfectly red, 337
neural network training examples, 404–406 periodic, 69
neural networks, 390–396 periodogram spectral estimator, 314
neurons, 390 perturbation, 208
NIR, 11 perturbed, 208
NMR, 19 phase shifting, 23
NN, 107, 145 phase spectrum, 98
nodes, 393 photometry, 339
noise, 181 piecewise-Mth-degree polynomial, 223
noise variance, 182 piecewise polynomial, 223
noise-amplification problem, 193
pixel, 113
noisy image, 184
pixel dimensions, 131
non-WSS, 317
pixel value, 168
normal, 266
pixel-value transformation, 160–163, 168
normal distribution, 301
pixels of the DFT image, 243
normalized intensity, 161
pmf, 262
notch filter, 188
point spread function, 8, 94, 173
notch filtering, 188–191
point spread functions, 92
nuclear magnetic resonance, 19
polar form, 411
Nyquist frequency, 54
potential energy, 325
Nyquist sampling rate, 54, 107
Nyquist-sampled version, 108 power laws, 314
power spectral, 281
object plane, 5 probabilistic, 181
observation, 292, 310 probability, 255–259
observed value, 292, 296, 301 probability density function, 182, 262
octaves, 205, 218 probability mass function, 262
odd, 49, 77, 93 probability tree, 257
odd symmetry, 49 projection-slice theorem, 249
odd values of n, 77 pseudo inverses, 238
optical imagers, 3–13 pseudo-inverse solution, 238
original, 133 pseudocolor, 9
original sampled image, 133 PSF, 8, 92, 94, 173, 181
orthogonal, 206, 373 pulse, 41
orthogonal expansion, 206 pulser, 23
orthogonality, 304
orthogonality property, 66, 71 QMF, 203, 219
orthonormal, 207 QMF relation, 219
orthonormal expansion, 221 quadrature, 338
output layer, 393 quadrature mirror filter, 203, 219

radar imagers, 13–18 sampling rate, 53, 107, 129


radar shadow, 16 sampling theorem, 53–59
radial brickwall lowpass filter, 106 SAR, 14
radial discrete-space frequency, 165 SAR PSF, 16
radial frequency, 110 saturation, 339
radially bandlimited, 110 scaling, 218, 223
radio frequency, 20 scaling constant, 314
Radon transform, 19, 247 scene spatial resolution, 9, 14
random, 255 SD, 397, 403
random fields, 255, 285–286 SDC, 306
random processes, 255, 278–282 segment, 171, 322
random variables, 255, 261–263 self-similarity, 314
random vectors, 272–275 separable, 93, 228
range, 13 separating hyperplane, 394
Rayleigh resolution criterion, 9 sharpened image, 164
Rayleigh’s theorem, 48 Shepp-Logan phantom, 230
real, 411 shift invariance, 94
realization, 278 shorthand notation, 275
receive beamforming unit, 23 shot noise, 194
reclining matrix, 373, 375 shrinkage, 203
reconstructed image, 107 shrinking, 232, 233
reconstruction structure, 218 shrnking, 235
recorded, 181 SI, 94
rectangle function, 41 side information, 292, 293
rectangular form, 411 side-looking airborne radar, 14
reflectance, 11 sifting, 41
regularity, 44
sifting property, 91
regularization parameter, 193
signal flow graph, 77
regularized, 193
signal-to-noise, 184
removing interference, 181
sinc, 50
resolution area, 131
sinc function, 69
RF, 20
sinc interpolation formula, 55, 107, 129
RGB, 335
singular value decomposition, 373
ring impulse, 103
rotated, 195, 366 singular values, 373
rotation cross-correlation, 366 sinusoidal interference, 188
rotation matrix, 101 SLAR, 14
rotationally invariant, 102 slice, 18
RV, 273 slices, 2
Smith-Barnwell condition, 203
salt-and-pepper, 194 Smith-Barnwell condition for perfect reconstruction, 220
sample function, 278 SNR, 184
sample mean, 300, 302 Sobel edge detector, 173, 341
sampled image, 107 soft thresholding, 242
sampled signal, 54, 129 SPARSA, 242
sampled version, 107 sparse, 215
samples, 53 sparse reconstruction by separable approximation, 242
sampling interval, 53, 129 sparsification using wavelets of piecewise-polynomial signals,
sampling length, 107 223–228

spatial frequency, 94 thermal infrared imager, 9


spatial frequency response, 95 three subtractive colors, 338
spatial frequency response of the Laplacian operator, 164 threshold, 376
spatial interval, 113 threshold level, 232
spatial resolution, 9 thresholding, 203, 233, 235
spatial sampling, 113 thresholding and shrinking, 235
spatial-scaling factors, 362 thumbnail image, 231
spatially scaled, 361, 363 TI, 43, 44
spatially shifted, 363 Tikhonov regularization, 193
spatially warped images, 362 tiling, 110
spectral estimation, 313–314 time delaying, 23
spectrum, 51 time scaling, 41
spin quantum number, 20 time-invariant, 44
square wave, 186 time-scaled, 43
squared ℓ2 norm, 237 Toeplitz blocks, 246
standard deviation, 270 trade-off parameter, 234
static field, 20 training, 354, 390, 396
statistical self-similarity, 314 training images, 367, 373, 377
steepest descent, 397, 403 training matrix, 378
steepest-descent, 397 training neural networks, 396–403
steering, 23 transducers, 23
steering the beam, 24 transform, 206
stochastic, 292 transmit beamforming unit, 23
stochastic deconvolution, 306, 310 transmit/receive switch, 23
stochastic denoising, 305 transpose, 273
stochastic denoising/deconvolution, 309 tree-structured filter banks, 203–206
stochastic denoising filter, 305 true, 181
stochastic gradient descent, 402 true resolution area, 131
stochastic processes, 278 true value of index k, 356
stochastic Wiener, 306 true-color, 335
stochasticity, 292 truncated SVD, 377
sub–Nyquist-sampled version, 108 TSFBs, 203
subband coding, 203 twiddle factors, 79
subband decomposition, 203 twiddle multiplications, 79
subtractive, 337 TWISTA, 242
summary of image enhancement techniques, 176 two-dimensional, 2
superposition, 44 two-step iterative shrinkage and thresholding algorithm, 242
superposition integral, 45
supervised learning, 390 ultrasound, 23
supervised training, 354 ultrasound imager, 23–27
support, 43, 61, 143 ultraviolet catastrophe, 316
SVD, 373 uncooled detectors, 12
synthesis filters, 214 uncorrelated, 270, 271, 273, 280
synthetic-aperture radar, 14 underdetermined, 236
underdetermined system, 237
tall matrices, 373 unfiltered, 307
terminals, 393 uniform, 272
tetrahedral die, 258 uniform pdf, 299
thermal imagers, 11 union, 255

Universal Approximation Theorem, 393


unknown, 292
unsharp masking, 160, 163–167
unsupervised learning, 373, 390
unsupervised learning and classification, 373–377
unsupervised learning examples, 377–380
unsupervised training, 354
upsampled version, 133
upsampling and downsampling modalities, 130–132
upsampling and interpolation, 133–137
upsampling factor, 133

valid convolution, 245


van Cittert iteration, 241
variance, 270
VE, 171
vertical edge, 171
vertical-direction horizontal-edge detector, 172
voltage outputs, 5
Voronoi sets, 380
voxels, 22

warping, 152
wavelength, 25
wavelet, 218, 223
wavelet transform, 203
wavelet transform matrix, 237
wavenumbers, 94
weak-sense stationary, 280
weighting coefficients, 76
white, 282
white random process, 282
wide-sense stationary, 280
Wiener filter, 193, 310
Wiener process, 317
windowed sinc functions, 130
within-class, 367
WSS, 280, 304

X-ray computed tomography, 18–19

yellow, 335, 337


YIQ, 335

zero-padded functions, 80
zero-padding, 74
zero-stuffing, 203, 205, 212
Andrew E. Yagle and Fawwaz T. Ulaby, University of Michigan

Andrew E. Yagle is professor of Electrical Engineering and Computer


Science at the University of Michigan. He is the recipient of several
research and teaching awards including the NSF Presidential Young
Investigator, ONR Young Investigator, College of Engineering
Teaching Excellence Award, the Eta Kappa Nu Professor of the Year
Award, and the Class of 1938E Distinguished Service Award.
He is a past member of the IEEE Signal Processing Society Board
of Governors, the Image and Multidimensional Signal Processing
Technical Committee, the Digital Signal Processing Technical
Committee, and the Signal Processing Theory and Methods
Technical Committee. He is a past associate editor of the IEEE
Transactions on Signal Processing, IEEE Signal Processing Letters,
Multidimensional Systems and Signal Processing, and the IEEE
Transactions on Image Processing.

Fawwaz T. Ulaby is the Emmett Leith Distinguished Professor of


Electrical Engineering and Computer Science and former Vice
President for Research at the University of Michigan. He is a member
of the National Academy of Engineering and recipient of the IEEE
James H. Mulligan, Jr. Education Medal. His Applied Electromagnetics
textbook is used at over 100 US universities.
