Image Fusion
Image fusion is the process by which two or more images are combined into
a single image retaining the important features from each of the original images.
The fusion of images is often required for images acquired from different
instrument modalities or capture techniques of the same scene or objects.
Important applications of the fusion of images include medical imaging,
microscopic imaging, remote sensing, computer vision, and robotics. Fusion
techniques range from the simple method of pixel averaging to more complicated
methods such as principal component analysis and wavelet transform fusion.
Several approaches to image fusion can be distinguished, depending on whether
the images are fused in the spatial domain or transformed into another domain
and their transforms fused.
The human visual system is particularly sensitive to moving light stimuli, so
moving artifacts or time-dependent contrast changes introduced by the fusion
process are highly distracting to the human observer. Hence, in the case of
image sequence fusion, two additional requirements apply. Temporal stability:
the fused image sequence should be temporally stable, i.e. gray level changes in
the fused sequence must only be caused by gray level changes in the input
sequences and must not be introduced by the fusion scheme itself. Temporal
consistency: gray level changes occurring in the input sequences must be present
in the fused sequence without any delay or contrast change.
1.1.1 Introduction
The fusion approaches discussed in the following include:
linear superposition
nonlinear methods
optimization approaches
artificial neural networks
image pyramids
wavelet transform
generic multiresolution fusion scheme
1.1.2 Linear Superposition
A straightforward fusion approach is linear superposition, where the fused
image is built as a weighted average of the input images, with pixel averaging
as the simplest special case.
1.1.3 Nonlinear Methods
Another simple approach to image fusion is to build the fused image by the
application of a simple nonlinear operator such as max or min. If the bright
objects are of interest in all input images, a good choice is to compute the
fused image by a pixel-by-pixel application of the maximum operator.
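As an illustrative sketch (not code from the report), such a pixel-by-pixel
maximum fusion of two registered grey-scale images can be written in a few
lines of Matlab; the file names are hypothetical:

% Pixel-by-pixel maximum fusion of two registered grey-scale images.
A = im2double(imread('input1.png'));   % hypothetical file names
B = im2double(imread('input2.png'));
F = max(A, B);                         % keep the brighter pixel everywhere
imshow(F); title('Fusion by maximum');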
Image algebra defines a set of generic operators for operations on images and
templates, and it has been used in a generic way to combine multisensor images.
1.1.4 Optimization Approaches
1.1.6 Image Pyramids
Image pyramids were initially described for multiresolution image analysis
and as a model for binocular fusion in human vision. A generic image pyramid
is a sequence of images in which each image is constructed by low-pass
filtering and subsampling its predecessor. Due to the subsampling, the image
size is halved in both spatial directions at each level of the decomposition
process, thus leading to a multiresolution signal representation. The
difference between the input image and the filtered image is necessary to
allow an exact reconstruction from the pyramidal representation. The image
pyramid approach thus leads to a signal representation with two pyramids: the
smoothing pyramid containing the averaged pixel values, and the difference
pyramid containing the pixel differences, i.e. the edges. The difference
pyramid can thus be viewed as a multiresolution edge representation of the
input image.
The actual fusion process can be described by a generic multiresolution
fusion scheme which is applicable both to image pyramids and to the wavelet
approach. There are several modifications of the generic pyramid construction
method described above. Some authors propose the computation of nonlinear
pyramids, such as the ratio and contrast pyramids, where the multistage edge
representation is computed by a pixel-by-pixel division of neighboring
resolutions. A further modification is to substitute the linear filters by
morphological nonlinear filters, resulting in the morphological pyramid.
Another type of image pyramid, the gradient pyramid, results if the input
image is decomposed into its directional edge representation using directional
derivative filters.
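As an illustration of this pyramid-based scheme, a minimal Matlab sketch is
given below. It assumes the Image Processing Toolbox, hypothetical file names,
registered grey-scale inputs, a three-level decomposition, averaging at the
coarsest level and a maximum-absolute-value rule for the difference (edge)
levels; it is a sketch of the generic idea, not the project's own code.

% Laplacian-pyramid fusion of two registered grey-scale images.
A = im2double(imread('input1.png'));    % hypothetical file names
B = im2double(imread('input2.png'));
levels = 3;
pa = {A}; pb = {B};
for k = 1:levels                        % smoothing pyramids
    pa{k+1} = impyramid(pa{k}, 'reduce');
    pb{k+1} = impyramid(pb{k}, 'reduce');
end
F = (pa{levels+1} + pb{levels+1}) / 2;  % average the coarsest level
for k = levels:-1:1                     % difference (edge) levels
    ea = imresize(impyramid(pa{k+1}, 'expand'), size(pa{k}));
    eb = imresize(impyramid(pb{k+1}, 'expand'), size(pb{k}));
    da = pa{k} - ea;                    % edges of A at this resolution
    db = pb{k} - eb;                    % edges of B at this resolution
    d  = da .* (abs(da) >= abs(db)) + db .* (abs(da) < abs(db));
    F  = imresize(impyramid(F, 'expand'), size(pa{k})) + d;
end
imshow(F);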
A further problem of the shift-dependent wavelet transform arises when it is
invoked in image sequence fusion. To overcome the shift dependency of the
wavelet fusion scheme, the input images must be decomposed into a shift
invariant representation. There are several ways to achieve this. The
straightforward way is to compute the wavelet transform for all possible
circular shifts of the input signal; in this case not all shifts are necessary,
and it is possible to develop an efficient computation scheme for the resulting
wavelet representation. Another simple approach is to drop the subsampling in
the decomposition process and instead modify the filters at each decomposition
level, resulting in a highly redundant signal representation.
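A minimal sketch of this second, undecimated approach, using the stationary
wavelet transform functions swt2/iswt2 of the Matlab Wavelet Toolbox, is shown
below; the file names, the wavelet and the fusion rule are assumptions, and the
image sizes must be divisible by 2^levels for swt2.

% Shift-invariant wavelet fusion via the undecimated (stationary) DWT.
A = im2double(imread('input1.png'));   % hypothetical file names
B = im2double(imread('input2.png'));
levels = 2;
[ca, cha, cva, cda] = swt2(A, levels, 'db2');
[cb, chb, cvb, cdb] = swt2(B, levels, 'db2');
pick = @(x, y) x .* (abs(x) >= abs(y)) + y .* (abs(x) < abs(y));
ch = pick(cha, chb);                   % fuse detail coefficients by
cv = pick(cva, cvb);                   % maximum absolute value
cd = pick(cda, cdb);
ca = (ca + cb) / 2;                    % average approximation coefficients
F = iswt2(ca, ch, cv, cd, 'db2');
imshow(F);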
Fig. 1 Block Diagram Of Basic Image Fusion Process
AIM OF THE PROJECT
The fusion of images is the process of combining two or more images into a
single image retaining important features from each. Fusion is an important
technique within many disparate fields such as remote sensing, robotics and
medical applications. Wavelet-based fusion techniques have been reasonably
effective in combining perceptually important image features. Shift invariance
of the wavelet transform is important in ensuring robust subband fusion.
Therefore, the novel application of the shift-invariant and directionally
selective Dual-Tree Complex Wavelet Transform (DT-CWT) to image fusion is
introduced here. This technique provides improved qualitative and quantitative
results compared to previous wavelet fusion methods.
The goals for this project have been the following.
One goal has been to search for algorithms that can be used to implement
image fusion for various applications.
A final goal has been to design and implement the wavelet-based fuzzy and
neural approaches using Matlab.
Figures 2.1(a) and 2.1(b) show a pair of multifocus test images that were fused
for a closer comparison of the DWT and DT-CWT methods. Figures 2.1(d) and
2.1(e) show the results of a simple MS method using the DWT and DT-CWT,
respectively. These results are clearly superior to the simple pixel averaging
result shown in 2.1(c): they both retain a perceptually acceptable combination
of the two "in focus" areas from each input image. An edge fusion result is
also shown for comparison (Figure 2.1(f)) [8]. Upon closer inspection, however,
there are residual ringing artifacts in the DWT fused image that are not found
in the DT-CWT fused image. Using more sophisticated coefficient fusion rules
(such as WBV or WA), the DWT and DT-CWT results were much more difficult to
distinguish. However, the above comparison using a simple MS method reflects
the ability of the DT-CWT to retain edge details without ringing.
Figure 2.1: (a) First image of the multifocus test set. (b) Second image of
the multifocus test set. (c) Fused image using average pixel values. (d) Fused
image using DWT with an MS fuse rule. (e) Fused image using DT-CWT with an MS
fuse rule. (f) Fused image using multiscale edge fusion (point representations).
2.2.2 Quantitative Comparisons
Figure 2.2: (a) First image (MR) of the medical test set. (b) Second image
(CT) of the medical test set. (c) Fused image using average pixel values. (d)
Fused image using DWT with an MS fuse rule. (e) Fused image using DT-
CWT with an MS fuse rule. (f) Fused image using multiscale edge fusion
(point representations).
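A natural error measure of this kind, assumed here to be the mean absolute
difference between the ground truth and the fused image, is

\( \xi = \frac{1}{N} \sum_{i,j} \left| I_{gt}(i,j) - I_{fd}(i,j) \right| \)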
where Igt is the cut-and-paste "ground truth" image, Ifd is the fused image and
N is the size of the image. Lower values of the error measure indicate greater
similarity between the images Igt and Ifd and therefore more successful fusion
in terms of quantitatively measurable similarity. Table 2.1 shows the results
for the various methods used. The average pixel value method gives a baseline
result; the PCA method gave an equivalent but slightly worse result. These
methods perform poorly relative to the others, which was expected as they have
no scale selectivity. Results were obtained for the DWT methods using all the
biorthogonal wavelets available within the Matlab (5.0) Wavelet Toolbox.
Similarly, results were obtained for the DT-CWT methods using all the
shift-invariant wavelet filters described in [3]. Results were also calculated
for the SIDWT using the Haar wavelet and the biorthogonal bior2.2 wavelet.
Table 2.1 shows the best results over all filters for each method. For all
filters, the DWT results were worse than their DT-CWT equivalents; similarly,
all the DWT results were worse than their SIDWT equivalents. This demonstrates
the importance of shift invariance in wavelet transform fusion. The DT-CWT
results were also better than the equivalent results using the SIDWT, which
indicates the improvement gained from the added directional selectivity of the
DT-CWT over the SIDWT. The WBV and WA methods performed better than MS with
equivalent transforms, as expected, with WBV performing best in both cases.
All of the wavelet transform results were decomposed to four levels. In
addition, the residual low-pass images were fused using simple averaging, and
the windows for the WA and WBV methods were all set to 3×3.
Table 2.1: Quantitative results for various fusion methods.
There are many different choices of filters to effect the DWT. In order not to
introduce phase distortions, using filters having a linear phase response is a
sensible choice; to retain the perfect reconstruction property, this
necessitates the use of biorthogonal filters. MS fusion results were compared
for all the images in Figures 2.1 and 2.2 using all the biorthogonal filters
included in the Matlab (5.0) Wavelet Toolbox. Likewise, there are also many
different choices of filters to effect the DT-CWT. MS fusion results were
compared for the same image pairs using all the specially designed filters
given in [3]. Qualitatively, all the DWT results gave more ringing artifacts
than the equivalent DT-CWT results. Different choices of DWT filters gave
ringing artifacts at different image locations and scales, whereas the choice
of filters for the DT-CWT did not seem to alter or move the ringing artifacts
found within the fused images. The perceived higher quality of the DT-CWT
fusion results compared to the DWT fusion results was also reflected by the
quantitative comparison.
WAVELET TRANSFORM OVERVIEW
3.1.1 Scaling
(a) w(2t) (b) w(4t) (c) w(8t)
The wavelets are called orthogonal when their inner products are zero. The
smaller the scaling factor, the wider the wavelet. Wide wavelets are comparable
to low-frequency sinusoids, and narrow wavelets are comparable to
high-frequency sinusoids.
3.1.2 Shifting
(a) Wavelet function Ψ(t) (b) Shifted wavelet function Ψ(t-k)
3.1.3 Scale and Frequency
The higher scales correspond to the most “stretched” wavelets. The more
stretched the wavelet, the longer the portion of the signal with which it is being
compared, and thus the coarser the signal features being measured by the
wavelet coefficients. The relation between the scale and the frequency is shown in
Fig. 3.5.
For many signals, the low-frequency content is the most important part; it
gives the signal its identity. The high-frequency content, on the other hand,
imparts detail to the signal. In wavelet analysis, the approximations and
details are obtained by filtering: the approximations are the high-scale,
low-frequency components of the signal, and the details are the low-scale,
high-frequency components. The filtering process is schematically represented
in Fig. 3.6.
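As a small worked example of this filtering (the test signal and the wavelet
are arbitrary choices), one decomposition level in Matlab, together with a
check of the perfect reconstruction discussed later:

% One level of 1-D wavelet filtering: cA holds the approximation
% (low-frequency) part, cD the detail (high-frequency) part.
t = linspace(0, 1, 512);
s = sin(2*pi*5*t) + 0.1*randn(1, 512);   % arbitrary test signal
[cA, cD] = dwt(s, 'db2');
s_rec = idwt(cA, cD, 'db2', 512);        % request the original length
max(abs(s - s_rec))                      % error at machine precision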
Fig. 3.9 Reconstruction using upsampling
In the forward transform, the input data stream x is convolved with the
low-pass and high-pass analysis filters s and t, respectively, then
downsampled by a factor of two, constituting one level of transform:

\( l_i = \sum_{j=-n_L}^{n_L} s_j\, x_{2i+j} \quad\text{and}\quad h_i = \sum_{j=-n_H}^{n_H} t_j\, x_{2i+1+j} \)
Although l and h are two separate output streams, together they have the same
total number of coefficients as the original data. The output stream l,
commonly referred to as the low-pass data, may then have the identical process
applied again repeatedly. The other output stream, h (the high-pass data),
generally remains untouched. The inverse process expands the two separate low-
and high-pass data streams by inserting zeros between every other sample,
convolves the resulting data streams with two new synthesis filters s' and t',
and adds them together to regenerate the original double-length data stream:
\( y_i = \sum_{j=-n_H}^{n_H} t'_j\, l'_{i+j} + \sum_{j=-n_L}^{n_L} s'_j\, h'_{i+j} \), where \( l'_{2i} = l_i,\; l'_{2i+1} = 0 \) and \( h'_{2i+1} = h_i,\; h'_{2i} = 0 \).
To meet the definition of a wavelet transform, the analysis and synthesis
filters s, t, s’ and t’ must be chosen so that the inverse transform perfectly
reconstructs the original data. Since the wavelet transform maintains the same
number of coefficients as the original data, the transform itself does not provide
any compression. However, the structure provided by the transform and the
expected values of the coefficients give a form that is much more amenable to
compression than the original data. Since the filters s, t, s’ and t’ are chosen to be
perfectly invertible, the wavelet transform itself is lossless. Later application of the
quantization step will cause some data loss and can be used to control the degree
of compression. The forward wavelet-based transform uses a 1-D subband
decomposition process; here a 1-D set of samples is converted into the
low-pass subband (Li) and the high-pass subband (Hi). The low-pass subband
represents a downsampled, low-resolution version of the original image. The
high-pass subband represents residual information of the original image,
needed for the perfect reconstruction of the original image from the low-pass
subband.
Fig. 3.12 Subband labeling scheme for a one-level, 2-D wavelet transform
(subbands LL1, HL1, LH1, HH1)
A one-level (K=1), 2-D wavelet transform of an image, with the corresponding
subband notation, is shown in Fig. 3.12. The example is extended to a
three-level (K=3) wavelet expansion in Fig. 3.13. Throughout the discussion, K
represents the highest level of decomposition of the wavelet transform.
Fig. 3.13 Subband labeling scheme for a three-level, 2-D wavelet transform
After the 1-D subband decomposition, the low-pass subband (Li) is further
decomposed into LLi and LHi; similarly, the high-pass subband (Hi) is further
decomposed into HLi and HHi. After one level of transform, the image can be
further decomposed by applying the 2-D subband decomposition to the existing
LLi subband. This iterative process results in multiple "transform levels". In
Fig. 3.14 the first level of transform results in LH1, HL1 and HH1, in
addition to LL1, which is further decomposed into LH2, HL2, HH2 and LL2 at the
second level, and the information of LL2 is used for the third-level
transform. The subband LLi is a low-resolution subband, and the high-pass
subbands LHi, HLi and HHi are the horizontal, vertical and diagonal subbands,
respectively, since they represent the horizontal, vertical and diagonal
residual information of the original image. An example of a three-level
decomposition into subbands of the image CASTLE is illustrated in Fig. 3.15.
Fig. 3.14 The process of the 2-D wavelet transform applied through three
transform levels
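A short sketch of such a three-level (K = 3) decomposition with the Matlab
Wavelet Toolbox follows; the file name and the wavelet are assumptions, and a
grey-scale image is assumed.

% Three-level 2-D wavelet decomposition and extraction of the
% level-3 subbands.
X = im2double(imread('castle.png'));         % hypothetical file name
[C, S] = wavedec2(X, 3, 'bior2.2');          % K = 3 decomposition
LL3 = appcoef2(C, S, 'bior2.2', 3);          % low-resolution subband
[LH3, HL3, HH3] = detcoef2('all', C, S, 3);  % horizontal, vertical and
                                             % diagonal details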
When constructing each wavelet coefficient for the fused image, we have to
determine which source image describes this coefficient better. This
information is kept in the fusion decision map. The fusion decision map has
the same size as the original image, and each value is the index of the source
image which is more informative for the corresponding wavelet coefficient.
Thus we actually make a decision on each coefficient. There are two frequently
used methods in previous research. In order to make the decision on one
coefficient of the fused image, one way is to consider only the corresponding
coefficients in the source images. This is called a pixel-based fusion rule.
The other way is to consider not only the corresponding coefficients but also
their close neighbors, say in a 3×3 or 5×5 window. These are called
window-based fusion rules, and they take into account the fact that there is
usually high correlation among neighboring pixels.
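As a sketch of a window-based rule, suppose cA and cB hold corresponding
subband coefficients of the two source images (e.g. from dwt2); these variable
names and the 3×3 local standard deviation used as the activity measure are
assumptions:

% Window-based fusion of one pair of subbands: the source whose
% coefficients show higher local 3x3 activity supplies the fused
% coefficient; map is the fusion decision map (1 -> A, 2 -> B).
actA = stdfilt(abs(cA), true(3));
actB = stdfilt(abs(cB), true(3));
map = 1 + (actB > actA);
cF = cA;
cF(map == 2) = cB(map == 2);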
In our research, we consider that objects carry the information of interest
and that each pixel or small neighborhood of pixels is just one part of an
object. Thus we propose a region-based fusion scheme. When making the decision
on each coefficient, we consider not only the corresponding coefficients and
their close neighborhoods, but also the regions the coefficients belong to,
since these regions represent the objects of interest. More details of the
scheme are provided in the following.
3.4 PROPOSED SCHEME
A neural network and fuzzy logic approach can be used for sensor fusion. Such
a system belongs to a class of sensor fusion in which features are the inputs
and a decision is the output. Sensor fusion can thus be achieved with the help
of fuzzy or neuro-fuzzy systems, which can be trained with the input data
obtained from the sensors. The basic concept is to associate the given sensory
inputs with some decision outputs. After developing the system, another group
of input data is used to evaluate its performance.
The following algorithm and .M file for pixel-level image fusion using fuzzy
logic illustrate the process of defining membership functions and rules for
the image fusion process using the FIS (Fuzzy Inference System) editor of the
Fuzzy Logic Toolbox in Matlab.
STEP 1
Read the first image into variable M1 and find its size (rows: z1, columns: s1).
Read the second image into variable M2 and find its size (rows: z2, columns: s2).
Variables M1 and M2 are images in matrix form where each pixel value is in the
range 0-255. Use a gray colormap.
Compare the rows and columns of both input images. If the two images are not
of the same size, select portions which are of the same size.
STEP 2
Apply wavelet decomposition and form the spatial decomposition trees.
Convert the images into column form with C = z1*s1 entries.
STEP 3
Create a fuzzy inference system of type Mamdani with the following
specifications:
Name: 'c7'
Type: 'mamdani'
AndMethod: 'min'
OrMethod: 'max'
DefuzzMethod: 'centroid'
ImpMethod: 'min'
AggMethod: 'max'
STEP 4
Decide the number and type of membership functions for both input images by
tuning the membership functions.
The input images in the antecedent are resolved to a degree of membership
ranging from 0 to 255.
Make rules for the input images which resolve the two antecedents to a single
number from 0 to 255.
STEP 5
For num = 1 to C in steps of one, apply fuzzification using the rules
developed above on the corresponding pixel values of the input images; this
gives a fuzzy set represented by a membership function and results in an
output image in column format.
STEP 6
Convert the column form to matrix form and display the fused image.
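A compact .M-file sketch of STEPs 1 to 6 is given below. The membership
functions and rules are illustrative assumptions, since the project's actual
system 'c7' is built interactively in the FIS editor; the file names are
hypothetical, and for brevity the wavelet decomposition of STEP 2 is omitted
so that the rule base is applied directly to the pixel values.

% Pixel-level image fusion with a Mamdani fuzzy inference system.
M1 = double(imread('input1.png'));     % hypothetical file names
M2 = double(imread('input2.png'));
[z1, s1] = size(M1);                   % STEP 1: images assumed equal size
fis = newfis('c7');                    % STEP 3: Mamdani system with
                                       % min/max/centroid defaults
fis = addvar(fis, 'input',  'img1', [0 255]);   % STEP 4: inputs and
fis = addmf(fis, 'input', 1, 'dark',   'gaussmf', [60 0]);   % assumed MFs
fis = addmf(fis, 'input', 1, 'bright', 'gaussmf', [60 255]);
fis = addvar(fis, 'input',  'img2', [0 255]);
fis = addmf(fis, 'input', 2, 'dark',   'gaussmf', [60 0]);
fis = addmf(fis, 'input', 2, 'bright', 'gaussmf', [60 255]);
fis = addvar(fis, 'output', 'fused', [0 255]);
fis = addmf(fis, 'output', 1, 'dark',   'gaussmf', [60 0]);
fis = addmf(fis, 'output', 1, 'bright', 'gaussmf', [60 255]);
% rules: [img1 img2 fused weight connective(1=AND, 2=OR)]
fis = addrule(fis, [2 0 2 1 2; 0 2 2 1 2; 1 1 1 1 1]);
F = evalfis([M1(:) M2(:)], fis);       % STEP 5: fuzzify pixel pairs
F = reshape(F, z1, s1);                % STEP 6: back to matrix form
imshow(uint8(F));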
The corresponding algorithm for the neuro-fuzzy approach, in which the
generated fuzzy inference system is additionally trained using ANFIS, proceeds
as follows.
STEP 1
Read the first image into variable M1 and find its size (rows: z1, columns: s1).
Read the second image into variable M2 and find its size (rows: z2, columns: s2).
Variables M1 and M2 are images in matrix form where each pixel value is in the
range 0-255. Use a gray colormap.
Compare the rows and columns of both input images. If the two images are not
of the same size, select portions which are of the same size.
STEP 2
Apply wavelet decomposition and form the spatial decomposition trees.
Convert the images into column form with C = z1*s1 entries.
STEP 3
Create a fuzzy inference system of type Mamdani with the following
specifications:
Name: 'c7'
Type: 'mamdani'
AndMethod: 'min'
OrMethod: 'max'
DefuzzMethod: 'centroid'
ImpMethod: 'min'
AggMethod: 'max'
STEP 4
Decide the number and type of membership functions for both input images by
tuning the membership functions.
The input images in the antecedent are resolved to a degree of membership
ranging from 0 to 255.
Make rules for the input images which resolve the two antecedents to a single
number from 0 to 255.
STEP 5
For num = 1 to C in steps of one, apply fuzzification using the rules
developed above on the corresponding pixel values of the input images; this
gives a fuzzy set represented by a membership function and results in an
output image in column format.
STEP 6
Start training the generated fuzzy inference system with ANFIS, using the
training data.
Convert the column form to matrix form and display the fused image.
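A corresponding sketch of the ANFIS training step is shown below. Note that
Matlab's anfis function requires a Sugeno-type FIS and training pairs with a
known target output, so this sketch assumes a ground-truth image Igt and
generates an initial Sugeno system with genfis1 rather than the Mamdani system
specified above.

% Neuro-fuzzy variant: train an initial FIS with ANFIS on rows of
% [input1 input2 target], then fuse with the trained system.
% M1, M2, z1, s1 as in STEP 1; Igt is an assumed ground-truth image.
trainData = [M1(:) M2(:) double(Igt(:))];
initFis = genfis1(trainData, 2, 'gaussmf');   % 2 Gaussian MFs per input
fis = anfis(trainData, initFis, 20);          % 20 training epochs
F = reshape(evalfis([M1(:) M2(:)], fis), z1, s1);
imshow(uint8(F));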
QUANTITATIVE COMPARISONS
4.2 ENTROPY
The information entropy of an image is defined as

\( H = -\sum_{i=0}^{L-1} p_i \log_2 p_i \)

where \( p_i \) is the probability (here, the frequency) of each grey-scale
level and L is the number of levels. As an example, a digital image of type
uint8 (unsigned 8-bit integer) has 256 different levels from 0 (black) to 255
(white). It must be noted that in combined images the number of levels is very
large and the grey-level intensity of each pixel is a decimal (double) number,
but the entropy equation is still valid. For images with high information
content the entropy is large. Larger alterations and changes in an image give
larger entropy, and sharp, focused images have more changes than blurred,
misfocused images. Hence, entropy is a measure that can be used to assess the
quality of different aligned images of the same scene.
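As a sketch, the entropy of a grey-scale image can be computed in Matlab from
its normalized histogram; the file name is hypothetical:

% Information entropy from the normalized grey-level histogram.
I = imread('fused.png');        % hypothetical file name, grey-scale
p = imhist(I);                  % histogram of the grey levels
p = p / sum(p);                 % probabilities
p = p(p > 0);                   % drop empty bins so log2 is defined
H = -sum(p .* log2(p))          % entropy in bits per pixel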
The Root Mean Square Error (RMSE) between the reference image I and the fused
image F is defined as

\( \mathrm{RMSE} = \sqrt{\frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left( I(i,j) - F(i,j) \right)^2 } \)

where i and j denote the spatial position of the pixels and M and N are the
dimensions of the images. This measure is appropriate for a pair of images
containing two objects. First a reference, everywhere-in-focus image I is
taken. Then two images are produced from this original image: in one image the
first object is in focus and the second is blurred, while in the other image
the first object is blurred and the second remains in focus. The fused image
should contain both well-focused objects.
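A corresponding Matlab sketch, with hypothetical file names and both images
assumed grey-scale and equally sized:

% RMSE between the reference image I and the fused image F.
I = im2double(imread('reference.png'));
F = im2double(imread('fused.png'));
[M, N] = size(I);
rmse = sqrt(sum((I(:) - F(:)).^2) / (M * N))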
For the cut-and-paste test images the error measure of Section 2.2.2 is used,
where Igt is the cut-and-paste "ground truth" image, Ifd is the fused image
and N is the size of the image; lower values indicate greater similarity
between the images Igt and Ifd and therefore more successful fusion in terms
of quantitatively measurable similarity. Table 1 shows the results for the
various methods used. The average pixel value method, the pixel-based PCA
method and the DWT methods give poor results relative to the others, as
expected. The DT-CWT methods give roughly equivalent results, although the
New-CWT method gave slightly worse results. The results were, however, very
close and should not be taken as conclusive, as this is just one experiment
and the transforms produce essentially the same subband forms. The WBV and WA
methods performed better than MS with equivalent transforms, as expected in
most cases. The residual low-pass images were fused using simple averaging,
and the windows for the WA and WBV methods were all set to 3×3. Table 1 shows
the best results over all filters available for each method.
Due to the limited depth of focus of optical lenses (especially those with
long focal lengths), it is often not possible to obtain an image in which all
relevant objects are 'in focus'. One possibility to overcome this problem is
to take several pictures with different focus points and combine them into a
single frame which contains the focused regions of all input images. The
following result images illustrate this approach.
4.3.3 Medical Imaging
4.4 RESULTS
Fig 4.2 Fusion by Maximum
Fig 4.3 Fusion by Minimum
Fig 4.5 Fusion by averaging
Comparison of fusion by averaging and fusion by maximum
CONCLUSION
In this project, the fusion of images taken by a digital camera was studied
using the Discrete Wavelet Transform (DWT) and fuzzy and neuro-fuzzy
approaches. A pixel-level fusion mechanism was applied to sets of images. All
the results obtained by these methods are valid only for aligned source images
of the same scene. In order to evaluate the results and compare these methods,
two quantitative assessment criteria, information entropy and Root Mean Square
Error, were employed. Experimental results indicated that there are no
considerable differences in performance between the two transform-based
methods. In fact, if the result of fusion at each level of decomposition is
separately evaluated visually and quantitatively in terms of entropy, no
considerable difference is observed (Figs. 5, 6, 7, 9 and 11 and Tables 2 and
4). Although some differences were identified at lower levels, the DWT and the
Laplacian pyramid transform (LPT) demonstrated similar results from level
three of the decomposition. Both techniques reach their best result in terms
of information entropy at a decomposition level of three. The experimental
results in Tables 2 and 4 also indicate that the LPT algorithm reaches its
best quality in terms of entropy at lower levels than the DWT. The RMSE values
presented in Table 6 show that neither LPT nor DWT performs better at all
levels, although the best result belongs to the LPT method. However, comparing
the RMSE results with the quality and entropy of the fused images indicates
that RMSE cannot be used as a proper criterion to evaluate and compare fusion
results. Finally, the experiments showed that the LPT approach executes faster
than the DWT; LPT takes less than half the time of the DWT and, given their
approximately similar performance, is therefore preferred in real-time
applications. Fuzzy and neuro-fuzzy algorithms have also been implemented to
fuse a variety of images, with the results of the proposed fusion process
given in terms of entropy and variance. The fusions have been implemented for
medical images and remote sensing images. It is hoped that the techniques can
be extended to color images and to the fusion of multiple sensor images.
5.1 DWT Fusion
REFERENCES
[1] Shutao Li, James T. Kwok, Ivor W. Tsang, Yaonan Wang, "Fusing images with
different focuses using support vector machines," IEEE Transactions on Neural
Networks, 15(6):1555-1561, Nov. 2004.
[4] P.J. Burt, E.H. Adelson, "The Laplacian pyramid as a compact image code,"
IEEE Transactions on Communications, 31(4):532-540, April 1983.
[5] Shutao Li, James T. Kwok, Yaonan Wang, "Combination of images with diverse
focuses using the spatial frequency," Information Fusion, 2(3):169-176, 2001.
[7] G. Pajares, J.M. de la Cruz, "A wavelet-based image fusion tutorial,"
Pattern Recognition, 37:1855-1872, 2004.
[9] H. Wang, J. Peng, W. Wu, "Fusion algorithm for multisensor images based on
discrete multiwavelet transform," IEE Proceedings: Vision, Image and Signal
Processing, 149(5), October 2002.