Professional Documents
Culture Documents
Pyramid Methods in Image Processing
Pyramid Methods in Image Processing
Pyramid Methods in Image Processing
Ogden
D igital image processing is being used i n that can perform most of the routine vis- images themselves. Scenes in the world
many domains today. In image enhance- ual tasks that humans do effortlessly. contain objects of many sizes, and these
ment, for example, a variety of methods It is becoming increasingly clear that objects contain features of many sizes.
now exist for removing image degrada- the format used to represent image data Moreover, objects can be at various dis-
tions and emphasizing important image in- can be as critical in image processing as tances from the viewer. As a result, any
formation, and in computer graphics, dig- the algorithms applied to the data. A dig- analysis procedure that is applied only at a
ital images can be generated, modified, and ital image is initially encoded as an array single scale may miss information at other
combined for a wide variety of visual of pixel intensities, but this raw format i s scales. The solution is to carry out analy-
effects. In data compression, images may be not suited to most tasks. Alternatively, an ses at all scales simultaneously.
efficiently stored and transmitted if trans- image may be represented by its Fourier Convolution is the basic operation of
lated into a compact digital code. In ma- transform, with operations applied to the most image analysis systems, and convo-
chine vision, automatic inspection systems transform coefficients rather than to the lution with large weighting functions is a
and robots can make simple decisions based original pixel values. This is appropriate notoriously expensive computation. In a
on the digitized input from a television for some data compression and image en- multiresolution system one wishes to per-
camera. hancement tasks, but inappropriate for form convolutions with kernels of many
But digital image processing is still in a others. The transform representation is par- sizes, ranging from very small to very
developing state. In all of the areas just ticularly unsuited for machine vision and large. and the computational problems
mentioned, many important problems re- computer graphics, where the spatial loca- appear forbidding. Therefore one of the
main to be solved. Perhaps this is most tion of pattem elements is critical. main problems in working with multires-
obvious in the case of machine vision: we Recently there has been a great deal of olution representations is to develop fast
still do not know how to build machines interest in representations that retain spa- and efficient techniques.
tial localization as well as localization i n Members of the Advanced Image Pro-
Abstract: The data structure used to the spatial—frequency domain. This i s cessing Research Group have been actively
represent image information can be critical achieved by decomposing the image into a involved in the development of multireso-
to the successful completion of an image set of spatial frequency bandpass compo- lution techniques for some time. Most of
processing task. One structure that has nent images. Individual samples of a com- the work revolves around a representation
attracted considerable attention is the image ponent image represent image pattern in- known as a "pyramid," which is versatile,
pyramid This consists of a set of lowpass or formation that is appropriately localized, convenient, and efficient to use. We have
bandpass copies of an image, each while the bandpassed image as a whole rep- applied pyramid-based methods to some
representing pattern information of a resents information about a particular fine- fundamental problems in image analysis,
different scale. Here we describe a variety of ness of detail or scale. There is evidence data compression, and image manipulation.
pyramid methods that we have developed that the human visual system uses such a
for image data compression, enhancement, representation, 1 and multiresolution sche-
mes are becoming increasingly popular i n Image pyramids
analysis and graphics.
machine vision and in image processing i n The task of detecting a target pattern that
©1984 RCA Corporation general. may appear at any scale can be approached
Final manuscript received November 12, 1984 The importance of analyzing images at in several ways. Two of these, which in-
Reprint Re-29-6-5 many scales arises from the nature of volve only simple convolutions, are illus-
trated in Fig. 1. Several copies of the pat- convolution with the image reduced i n bottom, or zero level of the pyramid, G 0,
tern can be constructed at increasing scales, scale by a factor of s. This can be substan- is equal to the original image. This is low-
then each is convolved with the image. tial for scale factors in the range 2 to 32, a pass-filtered and subsampled by a factor of
Alternatively, a pattern of fixed size can be commonly used range in image analysis. two to obtain the next pyramid level, G 1.
convolved with several copies of the image The image pyramid is a data structure G1 is then filtered in the same way and
represented at correspondingly reduced re- designed to support efficient scaled convo- subsampled to obtain G2. Further repeti-
solutions. The two approaches yield equi- lution through reduced image representa- tions of the filter/subsample steps generate
valent results, provided critical information tion. It consists of a sequence of copies of the remaining pyramid levels. To be pre-
in the target pattern is adequately repre- an original image in which both sample cise, the levels of the pyramid are obtained
sented. However, the second approach i s density and resolution are decreased i n iteratively as follows. For 0 < l < N:
much more efficient: a given convolution regular steps. An example is shown in Fig. (1)
with the target pattern expanded in scale 2a. These reduced resolution levels of the Gl (i,j) ΣΣ w (m,n) G l-1 (2i+m,2j+n)
by a factor s will require s 4 more arith- pyramid are themselves obtained through a m n
metic operations than the corresponding highly efficient iterative algorithm. The However, it is convenient to refer to this
reduced correspondingly by one octave with Gaussian pyramid could have been ob-
each level. Because of this resemblance t o tained directly by convolving a Gaussian-
the Gaussian density function we refer t o like equivalent weighting function with the
the pyramid of lowpass images as the original image, each value of this bandpass
"Gaussian pyramid." pyramid could be obtained by convolving
Bandpass, rather than lowpass, images a difference of two Gaussians with the
are required for many purposes. These may original image. These functions closely
be obtained by subtracting each Gaussian resemble the Laplacian operators common-
(lowpass) pyramid level from the next- ly used in image processing (Fig. 3b). For
lower level in the pyramid. Because these this reason we refer to the bandpass pyra-
levels differ in their sample density it i s mid as a "Laplacian pyramid."
necessary to interpolate new sample values An important property of the Laplacian
between those in a given level before that pyramid is that it is a complete image
Fig.3. Equivalent weighting functions. level is subtracted from the next-lower representation: the steps used to construct
The process of constructing the Gaus- level. Interpolation can be achieved b y the pyramid may be reversed to recover
sian (lowpass) pyramid is equivalent to reversing the REDUCE process. We call the original image exactly. The top pyra-
convolving the original image with a set this an EXPAND operation. Let Gl,k be mid level, LN, is first expanded and added
of Gaussian-like weighting functions, the image obtained by expanding Gl k to LN-1 to form GN-1 then this array i s
then subsampling, as shown in (a). The times. Then Gl,k = EXPAND [G Gl,k-1] or, to be expanded and added to LN-2 to recover
weighting functions double in size with precise, Gl,0 = Gl, and for k>0, GN-2, and so on. Alternatively, we may
each increase in 1. The corresponding write
functions for the Laplacian pyramid re- (2)
Gl,k(i,j) = 4 Σ Σ Gl,k-1 ( 2i + m , 2 j + n )
semble the difference of two Gaussians, G0 = ∑ Ll,l (4)
as shown in (b). m n 2 2 The pyramid has been introduced here as
Here only terms for which (2i+m)/2 and a data structure for supporting scaled image
process as a standard REDUCE opera- analysis. The same structure is well suited
tion, and simply write (2j+n)/2 are integers contribute to the
sum. The expand operation doubles the for a variety of other image processing
Gl = REDUCE [Gl-1]. tasks. Applications in data compression
size of the image with each iteration, s o
We call the weighting function w(m,n) that Gl,1, is the size of Gl,1, and Gl,1 is the and graphics, as well as in image analysis,
the "generating kernel." For reasons of same size as that of the original image. will be described in the following sections.
computational efficiency this should be Examples of expanded Gaussian pyramid It can be shown that the pyramid-building
small and separable. A five-tap filter was levels are shown in Fig. 2b. procedures described here have significant
used to generate the pyramid in Fig. 2a. The levels of the bandpass pyramid, L 0 , advantages over other approaches to scaled
Pyramid construction is equivalent t o L 1, ...., LN, may now be specified in terms analysis in terms of both computation cost
convolving the original image with a set of of the lowpass pyramid levels as follows: and complexity. The pyramid levels are
Gaussian-like weighting functions. These obtained with fewer steps through repeated
"equivalent weighting functions" for three Ll = Gl—EXPAND [Gl+1] (3) REDUCE and EXPAND operations than i s
successive pyramid levels are shown i n = Gl—Gl+1,1. possible with the standard FFT. Further-
Fig. 3a. Note that the functions double i n more, direct convolution with large equiva-
width with each level. The convolution The first four levels are shown in Fig. 4a. lent weighting functions requires 20- t o
acts as a lowpass filter with the band limit Just as the value of each node in the 30-bit arithmetic to maintain the same ac-
curacy as the cascade of convolutions with image can be exactly reconstructed from it's are segregated by scale in the various pyra-
the small generating kernel using just 8-bit pyramid representation (Eq. 4), the pyramid mid levels, as shown in Fig. 4. As with the
arithmetic. code is complete. Fourier transform, pyramid code elements
There are two reasons for transforming represent pattern components that are res-
an image from one representation to an- tricted in the spatial-frequency domain. But
A compact code other: the transformation may isolate criti- unlike the Fourier transform, pyramid code
The Laplacian pyramid has been described as cal components of the image pattern s o elements are also restricted to local regions
a data structure composed of bandpass they are more directly accessible to analy- in the spatial domain. Spatial as well as
copies of an image that is well suited sis, or the transformation may place the spatial-frequency localization can be critical
for scaled-image analysis. But the pyramid data in a more compact form so that they in the analysis of images that contain
may also be viewed as an image transform- can be stored and transmitted more effi- multiple objects so that code elements will
ation, or code. The pyramid nodes are then ciently. The Laplacian pyramid serves both tend to represent characteristics of single
considered code elements, and the equiva- of these objectives. As a bandpass filter, objects rather than confound the characteris-
lent weighting functions are sampling pyramid construction tends to enhance tics of many objects.
functions that give node values when con- image features, such as edges, which are The pyramid representation also permits
volved with the image. Since the original important for interpretation. These features data compression. 3 Although it has one
third more sample elements than the orig- These examples suggest that the pyra- cost of image processing based on such
inal image, the values of these samples mid is a particularly effective way of repre- pyramids is correspondingly increased.
tend to be near zero, and therefore can be senting image information both for trans- A second class of operations concerns
represented with a small number of bits. mission and analysis. Salient information the estimation of integrated properties
Further data compression can be obtained is enhanced for analysis, and to the extent within local image regions. For example, a
through quantization: the number of dis- that quantization does not degrade analy- texture may often be characterized by local
tinct values taken by samples is reduced sis, the representation is both compact and density or energy measures. Reliable esti-
by binning the existing values. This results robust. mates of image motion also require the
in some degradation when the image i s integration of point estimates of displace-
reconstructed, but if the quantization bins ment within regions of uniform motion. In
are carefully chosen, the degradation will Image analysis such cases early analysis can often be
not be detectable by human observers and Pyramid methods may be applied to anal- formulated as a three-stage sequence of
will not affect the performance of analysis ysis in several ways. Three of these will be standard operations. First, an appropriate
algorithms. outlined here. The first concerns pattern pattern is convolved with the image (or
Figure 5 illustrates an application of the matching and has already been mentioned: images, in the case of motion analysis).
pyramid to data compression for image to locate a particular target pattern that This selects a particular pattern attribute t o
transmission. The original image is shown may occur at any scale within an image, be examined in the remaining two stages.
in Fig. 5a. A Laplacian pyramid represen- the pattern is convolved with each level of Second, a nonlinear intensity transforma-
tation was constructed for this image, then the image pyramid. All levels of the pyra- tion is performed on each sample value.
the values were quantized to reduce the mid combined contain just one third more Operations may include a simple threshold
effective data rate to just one bit per pixel, nodes than there are pixels in the original to detect the presence of the target pattern,
then to one-half bit per pixel. Images recon- image. Thus the cost of searching for a a power function to be used in computing
structed from the quantized data are pattern at many scales is just one third texture energy measures, or the product of
shown in Figs. 5b and 5c. Humans tend t o more than that of searching the original corresponding samples in two images used
be more sensitive to errors in low-frequency image alone. in forming correlation measures for motion
image components than in high-frequency The complexity of the patterns that may analysis. Finally the transformed sample
components. Thus in pyramid compression, be found in this way is limited by the fact values are integrated within local windows
nodes at level zero can be quantized more that not all image scales are represented i n to obtain the desired local property
coarsely than those in higher levels. This i s the pyramid. As defined here, pyramid measures.
fortuitous for compression since three-quart- levels differ in scale by powers of two, or Pattern scale is an important parameter
ers of the pyramid samples are in the zero by octave steps in the frequency domain. of both the convolution and integration
level. Power-of-two steps are adequate when the stages. Pyramid-based processing may be
Data compression through quantization patterns to be located are simple, but com- employed at each of these stages to facili-
may also be important in image analysis t o plex patterns require a closer match be- tate scale selection and to support efficient
reduce the number of bits of precision tween the scale of the pattern as defined i n computation. A flow diagram for this three-
carried in arithmetic operations. For exam- the target array, and the scale of the pat- stage analysis is given in Fig. 6. Analysis
ple, in a study of pyramid-based image tern as it appears in the image. Variants o n begins with the construction of the pyramid
motion analysis it was found that data the pyramid can easily be defined with representation of the image. A feature pat-
could be reduced to just three bits per squareroot-of-two and smaller steps. How- tern is then convolved with each level of the
sample without noticeably degrading the ever, these not on]y have more levels, but pyramid (Stage 1), and the resulting
computed flow field.4 many more samples, and the computational correlation values may be passed through
A related application of pyramids con- An alternative approach is to join image This dilemma can be resolved if each
cerns the construction of image mosaics. components smoothly by averaging pixel image is first decomposed into a set of
This is a common task in certain scientific values within a transition zone centered o n spatial-frequency bands. Then a bandpass
fields and in advertising. The objective i s the join line. The width of the transition mosaic can be constructed in each band
to join a number of images smoothly into a zone is then a critical parameter. If it i s by use of a transition zone that is compar-
larger mosaic so that segment boundar- too narrow, the transition will still be vis- able in width to the wavelengths repres-
ies are not visible. As an example, suppose ible as a somewhat blurred step. If it is too ented in the band. The final mosaic is then
we wish to join the left half of Fig. 10a wide, features from both images will be obtained by summing the component band-
with the right half of Fig. 10b The most visible within the transition zone as in a pass mosaics.
direct method for combining the images i s photographic double exposure. The blur- The computational steps in this "multire-
to catinate the left portion of Fig. 10a with red-edge effect is due to a mismatch of solution splining" procedure are quite sim-
the right portion of Fig. 10b. The result, low frequencies along the mosaic boun- ple when pyramid methods are used.6 To
shown in Fig. 10c, is a mosaic in which dary, while the double-exposure effect i s begin, Laplacian pyramids LA and LB are
the boundary is clearly visible as a sharp due to a mismatch in high frequencies. In constructed for the two original images.
(though generally low-contrast) step in gray general, there is no choice of transition These decompose the images into the re-
level. zone width that can avoid both defects. quired spatial-frequency bands. Let P be the
Conclusions
The pyramid offers a useful image
representation for a number of tasks. It i s
efficient to compute: indeed pyramid
filtering is faster than the equivalent
filtering done with a fast Fourier transform.
The information is also available in a
format that is convenient to use, since the
nodes in each level represent information
that is localized in both space and spatial
frequency.
We have discussed a number of examples
in which the pyramid has proven to be
valuable. Substantial data compression
(similar to that obtainable with transform
methods) can be achieved by pyramid
encoding combined with quantitization and
entropy coding. Tasks such as texture
analysis can be done rapidly and
simultaneously at all scales. Several
different images can be combined to form a
seamless mosaic, or several images of the
same scene with different planes of focus
can be combined to form a single sharply
focused image.
Because the pyramid is useful in so many
tasks, we believe that it can bring some
conceptual unification to the problems of
Fig. 10. Image mosaics. The left half of image (a) is catinated with the right half of representing and manipulating low-level
image (b) to give the mosaic in (c). Note that the boundary between regions is visual information. It offers a flexible,
clearly visible. The mosaic in (d) was obtained by combining images separately in convenient multiresolution format that
each spatial frequency band of their pyramid representations then expanding and matches the multiple scales found in the
summing these bandpass mosaics. visual scenes and mirrors the multiple scales
of processing in the human visual system.
locus of image points that fall on the LCl (i,j) = LAl (i,j)
boundary line, and let R be the region to the If the sample is in P,then
left of P that is to be taken from the left LCl (i,j) = LBl (i,j), References
image. Then the pyramid LC for the Otherwise, 1. H. Wilson and J. Bergen' "A four mechanism model
composite image is defined as: LCl = LCl (i,j) (8) for threshold special vision", Vision Research. Vol.
If the sample is in R, then The levels of LC are then expanded and 19, pp. l9-31, 1979.