Pattern Recognition: Thai V. Hoang, Salvatore Tabbone
Pattern Recognition
journal homepage: www.elsevier.com/locate/pr
Article info

Article history:
Received 24 December 2010
Received in revised form 23 March 2011
Accepted 30 June 2011
Available online 19 July 2011

Keywords:
Invariant pattern representation
Radon transform
Fourier–Mellin transform
Feature extraction
Noise robustness

Abstract

A pattern descriptor invariant to rotation, scaling, translation (RST), and robust to additive noise is proposed by using the Radon, Fourier, and Mellin transforms. The Radon transform converts the RST transformations applied on a pattern image into transformations in the radial and angular coordinates of the pattern's Radon image. These beneficial properties of the Radon transform make it a useful intermediate representation for the extraction of invariant features from pattern images for the purpose of indexing/matching. In this paper, invariance to RST is obtained by applying the 1D Fourier–Mellin and discrete Fourier transforms on the radial and angular coordinates of the pattern's Radon image respectively. The implementation of the proposed descriptor is reasonably fast and correct, based mainly on the fusion of the Radon and Fourier transforms and on a modification of the Mellin transform. Theoretical arguments validate the robustness of the proposed descriptor to additive noise, and empirical evidence on both occlusion/deformation and noisy datasets shows its effectiveness.
© 2011 Elsevier Ltd. All rights reserved.
0031-3203/$ - see front matter © 2011 Elsevier Ltd. All rights reserved.
doi:10.1016/j.patcog.2011.06.020
T.V. Hoang, S. Tabbone / Pattern Recognition 45 (2012) 271–284
applying the Radon transform on an RST-transformed pattern image, the transformation parameters are encoded in the radial (for translation and scaling) and angular (for rotation) coordinates of the obtained Radon image [9]. Current techniques thus usually exploit this encoded information to define invariant features. A pioneering and notable work in this direction is the R-transform proposed in [10] to compute the R-signature. This approach uses an integral function and the discrete Fourier transform for the radial and angular coordinates of the Radon image respectively to get a 1D signature of the pattern image that is invariant to RST transformations. The strength of this approach is its simplicity; however, even if an extension to a 2D signature has been proposed, the obtained signature still has low discriminatory power as there is a loss of information in the compression process from the Radon image to the 1D signature. Recently, there was an effort to apply the 2D Fourier–Mellin transform on the Radon image [11]. Similarly, the Mellin transform and harmonic expansion are applied on the radial and angular coordinates of the Radon image respectively to get a pattern descriptor that is invariant to scaling and rotation. Similar efforts with different strategies to exploit the Radon image have been reported in [12–14]. These methods, however, do not allow the operator O to be a full RST transformation as in the R-signature; the operator O can only be a combination of rotation and scaling. Moving the origin of the coordinate system to the pattern's centroid is a common solution to obtain translation invariance; however, this normalization step may introduce errors when the image is noisy or the pattern is slightly distorted.

Another direction in using the Radon transform for pattern description is to extract pattern features directly from the Radon image, similar to the way the Hough transform [15] is used. For example, pattern primitives in edge form are detected from the Radon image and represented analytically in [16]. Moreover, their spatial relations can be made explicit [17] and this leads to a taxonomy of pattern for its characterization [18]. This approach, however, is quite limited as it requires that the edge primitives have analytical form. Recently, a set of spectral and structural features proposed in [19] is extracted from a slightly modified version of the Radon transform for pattern description. This set of features is not invariant to rotation and, consequently, in the matching step, these features need to be rotated to all possible angles corresponding to potential pattern orientations in order to compute pattern similarity. Long matching time may prevent the application of this approach in real systems. A generalization of the Radon transform, called the trace transform, has also been proposed and used for image description [20,21] but its application is restricted due to high computational complexity.

This paper presents a novel RST invariant pattern descriptor, briefly presented in [22] and called the RFM descriptor, based on the Radon, Fourier, and Mellin transforms. It is invariant not only to rotation and scaling but also to translation by definition. The proposed descriptor is a global one; its computation is essentially different from that of another type of descriptors, local descriptors [23,24], which can work only on textured patterns and fail to describe smooth or non-textured ones. The main idea here is to apply the 1D Fourier–Mellin transform, which is invariant to translation and scaling, on the radial coordinate of the pattern's Radon image. The result of this transform is a representation of the pattern that is shifted horizontally according to the pattern's rotating angle. A further application of the discrete Fourier transform on this representation guarantees a pattern descriptor totally invariant to RST transformations. The proposed descriptor is fast in implementation and is more robust to additive noise than existing descriptors. Experimental results on both occlusion/deformation and noisy databases show the usefulness of the proposed descriptor.

The main contribution of this paper is to adopt and use the three basic transformations (Radon, Fourier, and Mellin) to transform an input pattern image into a descriptor that is totally invariant to RST transformations. Moreover, we apply the 1D Fourier–Mellin transform on the radial coordinate of the pattern's Radon image. We use an implementation of the Mellin transform that is fast, accurate, and can preserve the discriminatory information of the input signal. Additionally, we analyze and provide theoretical evidence for the robustness of the proposed descriptor to additive noise. Finally, we illustrate the usefulness of the proposed descriptor in the presence of occlusion/deformation and additive noise using several datasets.

The remainder of this paper is organized as follows. Section 2 gives some background on the Radon, Fourier, and Mellin transforms and their beneficial properties. Section 3 presents the combined 1D Fourier–Mellin transform and then defines the RFM descriptor. A reasonably fast and accurate implementation of the proposed descriptor is provided in Section 4 and then an analysis of the robustness of the proposed descriptor to additive noise is carried out in Section 5. Experimental results are given in Section 6 and finally conclusions are drawn in Section 7.

2. Basic material

This section reviews the three transformations used in the proposed descriptor, namely the Radon, Fourier, and Mellin transforms. It includes their definitions and their derived beneficial properties.

2.1. The Radon transform

Let $f(x,y) \in \mathbb{R}^2$ be a 2D function and $L(\theta,\rho)$ be a straight line in $\mathbb{R}^2$ represented by

$$L = \{(x,y) \in \mathbb{R}^2 : x\cos\theta + y\sin\theta = \rho\},$$

where $\theta$ is the angle $L$ makes with the y axis and $\rho$ is the distance from the origin to $L$. The Radon transform [25] of $f$, denoted by $R_f$, is a function defined on the space of lines $L(\theta,\rho)$ by the line integral along each line:

$$R_f(L) = R_f(\theta,\rho) = \int_L f(x,y)\,dx\,dy = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,\delta(\rho - x\cos\theta - y\sin\theta)\,dx\,dy. \qquad (1)$$

In the field of shape analysis and recognition, the function $f(x,y)$ is constrained to the following particular definition:

$$f(x,y) = \begin{cases} 1 & \text{if } (x,y) \in D, \\ 0 & \text{otherwise,} \end{cases}$$

where $D$ is the domain of the binary shape represented by $f(x,y)$. In the illustration of the Radon transform given in Fig. 1, the shaded region represents the region $D$. The value of the line integral in Eq. (1) equals the length of the intersection between the line $L$ and the shaded region.

The Radon transform has some properties that are beneficial for pattern recognition as outlined below:

P1 linearity: The Radon transform is linear.

$$R_{f+g}(\theta,\rho) = R_f(\theta,\rho) + R_g(\theta,\rho).$$

P2 periodicity: The Radon transform of $f(x,y)$ is periodic in the variable $\theta$ with period $2\pi$.

$$R_f(\theta,\rho) = R_f(\theta + 2k\pi, \rho), \quad \forall k \in \mathbb{Z}.$$
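As a rough numerical companion to Eq. (1) and property P1, the sketch below approximates the Radon transform of a small binary image by summing pixel values into bins of $\rho = x\cos\theta + y\sin\theta$. This binning scheme, the function name `radon_binned`, and the test shapes are illustrative assumptions, not the implementation used in the paper (Section 4 describes that one).

```python
import numpy as np

def radon_binned(image, thetas, n_rho=None):
    """Approximate R_f(theta, rho) by accumulating pixel values into rho bins.

    Hypothetical helper: a coarse stand-in for the line integral in Eq. (1).
    """
    h, w = image.shape
    # Pixel-centre coordinates, with the origin at the image centre.
    ys, xs = np.mgrid[0:h, 0:w]
    xs = xs - (w - 1) / 2.0
    ys = ys - (h - 1) / 2.0
    half_diag = 0.5 * np.hypot(h, w)
    if n_rho is None:
        n_rho = int(np.ceil(2 * half_diag)) + 1
    edges = np.linspace(-half_diag, half_diag, n_rho + 1)
    sinogram = np.zeros((len(thetas), n_rho))
    for i, theta in enumerate(thetas):
        # Signed distance of every pixel from the line through the origin.
        rho = xs * np.cos(theta) + ys * np.sin(theta)
        sinogram[i], _ = np.histogram(rho.ravel(), bins=edges,
                                      weights=image.ravel())
    return sinogram

thetas = np.deg2rad(np.arange(180))
f = np.zeros((32, 32))
f[8:20, 10:16] = 1.0          # binary shape D_f
g = np.zeros((32, 32))
g[20:28, 4:12] = 1.0          # binary shape D_g
R_f = radon_binned(f, thetas)
R_g = radon_binned(g, thetas)
R_fg = radon_binned(f + g, thetas)

# P1 (linearity): the transform of a sum is the sum of the transforms.
print(np.allclose(R_fg, R_f + R_g))
```

Each row of the result also sums to the total mass of the image, since every pixel falls into exactly one $\rho$ bin per angle.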
Fig. 1. Graphical illustration of the Radon transform of a function $f(x,y)$. The Radon transform is a mapping from the image space $(x,y)$ to the parameter space $(\theta,\rho)$ and can be mathematically represented by a line integral of $f(x,y)$ along all the lines $L$ parameterized by $(\theta,\rho)$ represented in the image space $(x,y)$.

P5 rotation: A rotation of $f(x,y)$ by an angle $\theta_0$ implies a shift in the variable $\theta$ of $R_f(\theta,\rho)$ by a distance $\theta_0$.

$$R_f(\theta,\rho) \rightarrow R_f(\theta + \theta_0, \rho).$$

P6 scaling: A scaling of $f(x,y)$ by a factor $a$ results in a scaling in the variable $\rho$ and the amplitude of $R_f(\theta,\rho)$ by the factors $a$ and $1/a$ respectively.

$$R_f(\theta,\rho) \rightarrow \frac{1}{a} R_f(\theta, a\rho).$$

2.2. The Fourier transform

Let $f(x) \in \mathbb{R}$ be a 1D function. The Fourier transform [3] of $f$, denoted by $F_f$, is a function defined for every real number $\xi$ by

$$F_f(\xi) = \int_{-\infty}^{\infty} f(x)\,e^{-i2\pi\xi x}\,dx. \qquad (2)$$

It is well known that the Fourier transform possesses a shift or translation invariant property. Consider a function $g(x) = f(x - x_0)$, a version of $f(x)$ shifted by $x_0$, then

$$F_g(\xi) = \int_{-\infty}^{\infty} g(x)\,e^{-i2\pi\xi x}\,dx = e^{-i2\pi\xi x_0} F_f(\xi). \qquad (3)$$

Taking the magnitude of the two sides of Eq. (3) results in

$$|F_g(\xi)| = |F_f(\xi)|$$

or, in other words, the magnitude of the Fourier transform of a 1D function is invariant to translation. The discrete version of Eq. (2), defined for a sequence of numbers $\{f(n) : n = 0, \ldots, N-1\}$, has the following definition:

$$DF_f(k) = \sum_{n=0}^{N-1} f(n)\,e^{-(2\pi i/N)kn}, \quad k = 0, \ldots, N-1.$$

$$DF_g(k) = e^{-(2\pi i/N)km}\,DF_f(k). \qquad (4)$$

Let $f(x) \in \mathbb{R}$ be a 1D function. The Mellin transform [4] of $f$, denoted by $M_f$, is a function defined by

$$M_f(s) = \int_0^{\infty} f(x)\,x^{s-1}\,dx, \qquad (5)$$

Thus, except for a constant multiplicative factor $a^{-s}$, the Mellin transform is scale invariant. However, this scaling factor can be easily eliminated by normalization or it can be used to find the relative scale between two functions.

3. An invariant pattern descriptor based on the Radon, Fourier, and Mellin transforms

This section presents a combination of the Fourier and Mellin transforms for 1D signals and its application on the radial coordinate of the pattern's Radon image to obtain a pattern descriptor invariant to RST transformations. A measure of similarity for the proposed pattern descriptor is also defined, allowing the comparison/matching between any pair of images.

3.1. The combined Fourier–Mellin transform for 1D signal

Combinations of the Fourier and Mellin transforms have been proposed in the literature to obtain signal representations that do not vary with rotation/scaling or translation/scaling. For 2D signals, they are first converted from Cartesian coordinates into polar coordinates, then Fourier and Mellin transforms are performed independently on the circular and radial coordinates of the converted signals respectively [26,27]. In this way, the obtained representation is invariant to rotation and scaling. For 1D signals, Fourier and Mellin transforms are performed in sequence directly on the signals [28,29] to obtain signal representations invariant to translation and scaling.

Consider the Fourier transform of a function $g(x) = f(ax - x_0)$, a scaled and translated version of $f(x)$ by parameters $a$ and $x_0$:

$$F_g(\xi) = \int_{-\infty}^{\infty} f(ax - x_0)\,e^{-i2\pi\xi x}\,dx. \qquad (7)$$
Denoting $y = ax - x_0$, Eq. (7) becomes

$$F_g(\xi) = \frac{1}{a}\,e^{-i2\pi(\xi/a)x_0} \int_{-\infty}^{\infty} f(y)\,e^{-i2\pi(\xi/a)y}\,dy = \frac{1}{a}\,e^{-i2\pi(\xi/a)x_0}\,F_f\!\left(\frac{\xi}{a}\right). \qquad (8)$$

Taking the magnitude of the two sides of Eq. (8) results in

$$|F_g(\xi)| = \frac{1}{a}\left|F_f\!\left(\frac{\xi}{a}\right)\right|. \qquad (9)$$

The translation parameter $x_0$ has disappeared in Eq. (9); the remaining scaling parameter $a$ could be eliminated by applying the Mellin transform on both sides of Eq. (9):

$$M_{|F_g|}(s) = \int_0^{\infty} \frac{1}{a}\left|F_f\!\left(\frac{\xi}{a}\right)\right| \xi^{s-1}\,d\xi = a^{s-1}\,M_{|F_f|}(s),$$

or

$$|M_{|F_g|}(s)| = a^{s-1}\,|M_{|F_f|}(s)|. \qquad (10)$$

Therefore, by defining

$$MF_f(s) = |M_{|F_f|}(s)| = \left|\int_0^{\infty} \left|\int_{-\infty}^{\infty} f(x)\,e^{-i2\pi\xi x}\,dx\right| \xi^{s-1}\,d\xi\right| \qquad (11)$$

as the combined 1D Fourier–Mellin transform of a function $f(x)$ and from Eq. (10), $MF_f(s)$ is invariant to translation and scaling, except for a constant multiplicative factor.

By varying the value of $\theta$ and $s$, the two images $MF_{R_J}(\theta,s)$ and $MF_{R_I}(\theta,s)$ are obtained. Moreover, except for a constant multiplicative factor $a^{s-2}$, $MF_{R_J}(\theta,s)$ can be directly obtained by shifting $MF_{R_I}(\theta,s)$ along the $\theta$ axis by a quantity $\theta_0$. Applying the discrete Fourier transform on the angular coordinate of the Fourier–Mellin images $MF_{R_J}(\theta,s)$ and $MF_{R_I}(\theta,s)$, then ignoring the phase information in the coefficients, we have

$$|DF_{MF_{R_J}}(k,s)| = a^{s-2}\,|DF_{MF_{R_I}}(k,s)|. \qquad (13)$$

This demonstrates that, except for a constant multiplicative factor $a^{s-2}$, the proposed descriptor computed on a scaled, rotated, and translated version $J$ of an image $I$ is exactly the same as the descriptor computed on the original image $I$. The factor $a^{s-2}$, however, can be easily eliminated by a normalization step using the DC component as $|DF_{MF_{R_I}}(k,s)/DF_{MF_{R_I}}(0,s)|$, or it can be used to determine the relative scale between any two pattern images of the same category. □

Calculating the proposed RFM descriptor, as described above, does not require any normalization regarding the size, position, or orientation of the patterns; it only requires a normalization in the intensity of the computed descriptor. As a consequence, the proposed descriptor is totally invariant to RST transformations.
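The role of Eq. (13) and of the DC normalization can be illustrated on synthetic data. The sketch below assumes a Fourier–Mellin image is stored as an array with rows indexed by $\theta$ and columns by $s$, models rotation as a circular shift along the rows, and models the factor $a^{s-2}$ as a per-column gain. The array contents, the helper name `rfm_from_fm_image`, and the values of `a` and `s` are made up for illustration.

```python
import numpy as np

def rfm_from_fm_image(fm_image):
    """|DFT| along the angular axis, divided by the DC (k = 0) coefficient.

    fm_image has shape (n_theta, n_s): rows index theta, columns index s.
    Hypothetical helper illustrating the normalization described after Eq. (13).
    """
    spectrum = np.abs(np.fft.fft(fm_image, axis=0))
    return spectrum / spectrum[0:1, :]

rng = np.random.default_rng(0)
fm_I = rng.random((180, 150)) + 0.1      # synthetic stand-in, strictly positive
a = 1.7                                   # assumed scale factor
s = 2.0 + 0.1 * np.arange(150)            # assumed grid of s values
# Image J: shifted by 37 samples in theta, each column scaled by a**(s - 2).
fm_J = np.roll(fm_I, 37, axis=0) * a ** (s - 2.0)

d_I = rfm_from_fm_image(fm_I)
d_J = rfm_from_fm_image(fm_J)
print(np.allclose(d_I, d_J))              # the two descriptors coincide
```

The circular shift only changes the phase of the DFT coefficients, which the magnitude discards, and the per-column gain cancels in the division by the DC term.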
FFT is applied on the radial coordinate of the pattern's Radon image, the DC component is always nonzero and it, in turn, is the value of the Mellin transform's input function $f(x)$ at $x = 0$.

To avoid these problems, an alternative implementation of the Mellin transform proposed in [29], which is called the direct Mellin transform, is adopted for this work. Assuming $f(x)$ is in the form of sampled data with sampling period $T$, expanding Eq. (5)

In the time domain, the modified direct Mellin transform is defined as

$$MM_f(s) = s \int_0^{\infty} f(x)\,x^{s-1}\,dx.$$

It is easy to verify that the modified direct Mellin transform maintains the scaling invariance property and the combined Fourier-modified direct Mellin transform defined as

$$MF_f(s) = MM_{|F_f|}(s) = s \int_0^{\infty} |F_f(\xi)|\,\xi^{s-1}\,d\xi \qquad (20)$$

is invariant to both translation and scaling. Therefore, the 1D Fourier–Mellin transform used in the definition of the RFM descriptor in Section 3.2 is henceforth defined as in Eq. (20), instead of Eq. (11), and implemented through Eq. (19).

To qualitatively illustrate the invariant properties of $MF_{R_I}(\theta,s)$, Fig. 2 provides some example pattern images and their corresponding Radon and Fourier–Mellin images. The first row contains two original pattern images (Fig. 2(a), (b)) and the RST transformed versions (Fig. 2(c)–(e)) of the pattern image in Fig. 2(b). The second row shows the Radon transform of these pattern images and the third row shows the images obtained after performing the 1D Fourier–Mellin transform on the Radon images in the second row using 150 values of t ranging from 2.0 to 16.9 with increment of 0.1 (the intensity of these images has been rescaled to [0,255] for better viewing). It is noteworthy here that,

A quantitative evaluation of the invariant properties of the Fourier–Mellin transform is given in Fig. 3 using the normalized cross correlation between all possible pairs of Fourier–Mellin images $MF_{R_{I_i}}(\theta,s)$ and $MF_{R_{I_j}}(\theta,s)$ from the third row of Fig. 2. Normalized cross correlation is selected for the sake of overcoming the constant multiplicative factor $a^{s-2}$ in Eq. (13). To overcome the remaining rotating parameter, the correlation is calculated for all possible rotating angles, meaning that one of the two Fourier–Mellin images can be circularly shifted along its $\theta$ direction by 180 possible values from 0° to 179° in increments of 1°. Denoting $\varphi$ as the shifting distance, the correlation between images $I_i$ and $I_j$ at $\varphi$ is defined as

$$C_{ij}(\varphi) = \mathrm{corr2}\big(MF_{R_{I_i}}(\theta,s),\, MF_{R_{I_j}}(\theta + \varphi, s)\big),$$

where $\mathrm{corr2}(A,B)$ is the normalized cross correlation function between two 2D input data $A$ and $B$ of size $m \times n$ calculated using the following formula:

$$\mathrm{corr2}(A,B) = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n} (A_{ij} - \bar{A})(B_{ij} - \bar{B})}{\sqrt{\left(\sum_{i=1}^{m}\sum_{j=1}^{n} (A_{ij} - \bar{A})^2\right)\left(\sum_{i=1}^{m}\sum_{j=1}^{n} (B_{ij} - \bar{B})^2\right)}},$$

Fig. 2. Illustration of the invariant properties of the Radon and 1D Fourier–Mellin transforms performed on pattern images. The first row contains two original pattern images (a)–(b) and the RST transformed versions (c)–(e) of the pattern image in (b). The second row shows the Radon transforms $R_{I_1}(\theta,\rho)$ to $R_{I_5}(\theta,\rho)$ of these pattern images and the third row shows the images $MF_{R_{I_1}}(\theta,s)$ to $MF_{R_{I_5}}(\theta,s)$ obtained after performing the 1D Fourier–Mellin transform on the Radon images in the second row using 150 values of t ranging from 2.0 to 16.9 with increment of 0.1.
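The behaviour of Eq. (20) under scaling and translation can be checked numerically on a case where the Fourier magnitude is known in closed form. The sketch below assumes a Gaussian test signal $f(x) = e^{-\pi x^2}$, for which $|F_f(\xi)| = e^{-\pi\xi^2}$, and uses Eq. (9) for $g(x) = f(ax - x_0)$. The integral is approximated with the trapezoidal rule on a truncated grid; the helper name, grid, and parameter values are assumptions chosen for illustration.

```python
import numpy as np

def modified_mellin(mag, xi, s_values):
    """Approximate s * integral_0^inf mag(xi) * xi**(s - 1) d(xi), as in Eq. (20).

    Hypothetical helper: trapezoidal rule on the sampled grid xi.
    """
    out = []
    for s in s_values:
        integrand = mag * xi ** (s - 1.0)
        area = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(xi))
        out.append(s * area)
    return np.array(out)

a = 1.5                                   # assumed scale factor
xi = np.linspace(0.0, 20.0, 200001)       # truncated frequency axis
s_values = np.array([2.0, 3.0, 4.5])

mag_f = np.exp(-np.pi * xi ** 2)          # |F_f| for the Gaussian test signal
# Eq. (9): |F_g(xi)| = (1/a) |F_f(xi/a)|; the shift x0 has no effect on it.
mag_g = np.exp(-np.pi * (xi / a) ** 2) / a

ratio = modified_mellin(mag_g, xi, s_values) / modified_mellin(mag_f, xi, s_values)
print(ratio)                              # compare with a ** (s_values - 1)
```

The ratio matches the constant factor $a^{s-1}$ of Eq. (10): the transform is unchanged by translation and changes under scaling only by a multiplicative constant that a later normalization removes.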
[Figure 3: ten panels plotting $C_{12}(\varphi)$, $C_{13}(\varphi)$, $C_{14}(\varphi)$, $C_{15}(\varphi)$, $C_{23}(\varphi)$, $C_{24}(\varphi)$, $C_{25}(\varphi)$, $C_{34}(\varphi)$, $C_{35}(\varphi)$, and $C_{45}(\varphi)$ for $\varphi$ from 0 to 180, with values between 0 and 1.]

Fig. 3. The normalized cross correlation between all possible pairs of Fourier–Mellin images from the third row of Fig. 2. For each pair of images, 180 correlation values are calculated by circularly shifting one of the two Fourier–Mellin images along its $\theta$ direction by 180 possible values from 0° to 179° in increments of 1°.
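The correlation curves of Fig. 3 follow the definitions of $\mathrm{corr2}$ and $C_{ij}(\varphi)$ given earlier. A minimal sketch, assuming Fourier–Mellin images are stored with $\theta$ along the first axis and using random arrays as stand-ins for real images:

```python
import numpy as np

def corr2(A, B):
    """Normalized cross correlation of two equally sized 2D arrays."""
    A0 = A - A.mean()
    B0 = B - B.mean()
    return np.sum(A0 * B0) / np.sqrt(np.sum(A0 ** 2) * np.sum(B0 ** 2))

def shifted_correlations(fm_i, fm_j):
    """C_ij(phi) for every circular shift phi along the theta axis (axis 0)."""
    n_theta = fm_i.shape[0]
    # np.roll(fm_j, -phi, axis=0)[theta] equals fm_j[theta + phi].
    return np.array([corr2(fm_i, np.roll(fm_j, -phi, axis=0))
                     for phi in range(n_theta)])

rng = np.random.default_rng(1)
fm_i = rng.random((180, 20))               # synthetic stand-in image
# fm_j is fm_i shifted by 25 samples in theta and multiplied by a constant.
fm_j = 3.0 * np.roll(fm_i, 25, axis=0)

c = shifted_correlations(fm_i, fm_j)
best = int(np.argmax(c))
print(best, c[best])
```

Because $\mathrm{corr2}$ subtracts the mean and divides by the norms, the constant gain does not affect the peak value, and the location of the peak recovers the shift.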
5. Robustness to noise
$$R_{\hat{f}}(\theta,\rho) = R_f(\theta,\rho) + R_\eta(\theta,\rho).$$
R Nr
mn Nr 1 nr dr. Eq. (24) can be rewritten as noise respectively, then
Fig. 6. The values of $\mathrm{SNR}_{proj}/\mathrm{SNR}_{image}$ over the domain $D \in [0.3, 0.7]$ and $d \in [0, 0.2]$. A high value of $\mathrm{SNR}_{proj}/\mathrm{SNR}_{image}$ means that the projection in the Radon transform has the property of suppressing salt & pepper noise. (a) $A(\theta)/m_n = 100$. (b) $D = 0.3$, $d = 0.2$.
Fig. 7. (a) Images of 26 Latin characters of size 64 × 64 in Arial bold font used to generate six alphabet datasets (noise-free images). (b) Sample images from the six alphabet datasets generated from the first four character images with six possible values of SNR = {16, 8, 4, 2, 1, 0.5}, corresponding to the six datasets (examples of noisy images).

Fig. 8. (a) Twenty object images from the COIL-20 dataset [40] used to generate six object datasets (noise-free images). (b) Sample images from the six object datasets generated from the four object images with six possible values of SNR = {16, 8, 4, 2, 1, 0.5}, corresponding to the six datasets (examples of noisy images).
The main characteristic that differentiates ExpA and ExpB, besides the semantic content of their images, is the number of intensity levels in the original images: character images have only two levels of intensity whereas object images have multi-level intensity. Noisy images are generated from the corresponding noise-free images by randomly scaling, rotating, translating and then adding white noise to them. Let SNR be the signal-to-noise ratio defined as

$$\mathrm{SNR} = \frac{\sum_{x,y} f^2(x,y)}{\sum_{x,y} \eta^2(x,y)},$$

where $f(x,y)$ is the noise-free image and $\eta(x,y)$ is the added white noise. The value of SNR for each dataset is kept constant and, for each experiment, SNR has six possible values {0.5, 1, 2, 4, 8, 16}, corresponding to the six datasets. Some example images from the six datasets in the two experiments are given in Figs. 7(b) and 8(b).

Comparison of the proposed RFM descriptor with ART, GFD, Zernike, R-signature, and R2DFM descriptors on these noisy datasets has been performed and the obtained results on ExpA and ExpB are given in Figs. 9 and 10 respectively. It is observed from this set of figures that:

1. ART, GFD, Zernike, and R2DFM descriptors are not robust to additive white noise at all; their performance is similarly poor for different levels of noise.
2. R-signature and RFM descriptors have reasonably good performance when the noise is weak (SNR = 16, 8, 4), demonstrating their resistance to additive noise.
3. As SNR decreases (the images get noisier), the precision–recall curves of the R-signature and RFM descriptors generally move downwards. However, the curve of the RFM descriptor is always above that of the R-signature, meaning that the RFM descriptor is more robust to noise than the R-signature.

It can thus be concluded that the proposed RFM descriptor is more robust to additive white noise than all the other compared descriptors on grayscale noisy datasets, and this provides empirical evidence for the analytical results developed in Section 5. The poor performance of ART, GFD, Zernike, and R2DFM descriptors has its root in the normalizations required in their computation and can be explained as follows:

1. To have invariance to translation, the origin of the polar coordinate system needs to be placed at the centroid of the pattern. In the presence of noise, the position of the centroid is shifted arbitrarily according to the actual noise.
2. To have invariance to scaling, the radial axis is normalized by the distance from the origin of the polar coordinate system to the furthest point of the pattern. In the presence of noise, this furthest point might not belong to the actual pattern but to the noise.

Furthermore, it is also evident from the two sets of experiments that the performance of R-signature and RFM descriptors is better on ExpB than on ExpA at each value of SNR, leading to the conclusion that the proposed descriptor performs better on multi-level than on two-level grayscale images. A possible explanation for this comes from the effect of re-sampling and re-quantization when an image is rotated, scaled, and/or translated. Multi-level images have a smooth intensity surface which suffers less deformation due to re-sampling and re-quantization than the deformation occurring at sharp edges in two-level images.

[Figure 9: six precision–recall plots, one per SNR value.]

Fig. 9. The precision–recall curves of different descriptors on the six alphabet datasets. ART, GFD, Zernike, and R2DFM descriptors are not robust to noise while R-signature and RFM descriptors are. As SNR decreases (the images get noisier), the precision–recall curves of the RFM and R-signature descriptors generally move downwards but the curve of the RFM descriptor is always above that of the R-signature, meaning its superiority. (a) SNR = 16. (b) SNR = 8. (c) SNR = 4. (d) SNR = 2. (e) SNR = 1. (f) SNR = 0.5.

[Figure 10: six precision–recall plots (recall on the horizontal axis, precision on the vertical axis), each with curves for ART, GFD, R-signature, Zernike, R2DFM, and RFM.]

Fig. 10. The precision–recall curves of different descriptors on the six object datasets. ART, GFD, Zernike, and R2DFM descriptors are not robust to noise while R-signature and RFM descriptors are. As SNR decreases (the images get noisier), the precision–recall curves of the RFM and R-signature descriptors generally move downwards but the curve of the RFM descriptor is always above that of the R-signature, meaning its superiority. (a) SNR = 16. (b) SNR = 8. (c) SNR = 4. (d) SNR = 2. (e) SNR = 1. (f) SNR = 0.5.

6.2. Binary pattern recognition

6.2.1. Noisy datasets

The robustness of the proposed RFM descriptor to additive salt & pepper noise is demonstrated using six logo datasets generated from the first 25 logo images of the UMD Logo dataset as shown in Fig. 11(b). Each of these six logo datasets has 275 images of 25 categories; each category contains 11 images generated by randomly scaling, rotating, translating the original corresponding logo image and then adding salt & pepper noise to it. Let $d$ be the percentage of pixels flipped by the noise. The value of $d$ for each generated dataset is kept constant and $d$ has six possible values ranging from 0 to 0.1 with increment of 0.02, corresponding to 6 datasets. The first dataset with $d = 0$ is actually a noiseless dataset; its use is intended for checking the invariant properties of the proposed and compared descriptors. The values of $d$ of the other 5 noisy datasets make up an arithmetic progression with common difference 0.02. These 5 datasets are, therefore, used to evaluate the resistance of the proposed and compared descriptors at increasing levels of additive noise. Some sample images from these 6 datasets are given in Fig. 11(b). It should also be noted that the maximum chosen value of $d$, 0.1, satisfies the assumption on the possible values of $d$ in the analysis of the robustness of the proposed descriptor to additive salt & pepper noise in Section 5.

Fig. 11. (a) Twenty-five logo images from the UMD Logo dataset [41] used to generate six logo datasets (noise-free images). (b) Sample images from the six logo datasets generated from the first four logo images with six possible values of $d$ = {0, 0.02, 0.04, 0.06, 0.08, 0.1}, corresponding to the six datasets (examples of noisy images).

The proposed RFM descriptor is again compared with ART, GFD, Zernike, R-signature, and R2DFM descriptors using these 6 datasets and the computed precision–recall curves of these descriptors are depicted in Fig. 12. For the noiseless dataset with $d = 0$ (Fig. 12(a)), all shape descriptors have ideal performance. This demonstrates that the proposed descriptor is totally invariant to translation, rotation, and scaling. When $d \neq 0$ (Fig. 12(b)–(f)), deterioration appears in the performance of all descriptors and their precision–recall curves move downwards. However, the impact of $d$ on those curves differs from one descriptor to another. Among all the descriptors, the proposed descriptor has the best performance for all the 5 noisy datasets while ART and Zernike descriptors have similarly poor performance. It is also observed that:

1. As $d$ increases, the curves of all the descriptors generally move downwards.
2. ART and Zernike descriptors are not robust to noise at all; their performance is similarly poor for different levels of noise.
3. GFD has more resistance to noise than ART and Zernike because its curves are pushed away from the ideal curve (when $d = 0$) by a distance which increases along with the increase in $d$. However, the resistance of GFD is weaker than that of the descriptors defined on the Radon transform.
4. Among the three Radon-based descriptors, the shift in the curve of R-signature and RFM is more regular than that of R2DFM. In addition, and more importantly, the RFM descriptor has the smallest shift in its curve among all descriptors for each value of $d$.
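The two noise models used in Section 6 can be sketched as follows. The white-noise helper rescales a random field so that the ratio $\sum f^2 / \sum \eta^2$ equals a requested SNR, and the salt & pepper helper flips a fraction $d$ of the pixels of a binary image. Gaussian noise, the rounding of $d$ times the pixel count, and the helper names are assumptions for illustration; the paper does not state how its noise was sampled.

```python
import numpy as np

def add_white_noise(image, snr, rng):
    """Return (noisy image, noise) with sum(image**2) / sum(noise**2) == snr."""
    noise = rng.standard_normal(image.shape)   # assumed Gaussian white noise
    noise *= np.sqrt(np.sum(image ** 2) / (snr * np.sum(noise ** 2)))
    return image + noise, noise

def add_salt_and_pepper(binary_image, d, rng):
    """Flip round(d * number_of_pixels) randomly chosen pixels of a 0/1 image."""
    flat = binary_image.ravel().copy()
    n_flip = int(round(d * flat.size))
    idx = rng.choice(flat.size, size=n_flip, replace=False)
    flat[idx] = 1 - flat[idx]
    return flat.reshape(binary_image.shape)

rng = np.random.default_rng(0)
img = np.zeros((64, 64), dtype=int)
img[16:48, 24:40] = 1                          # toy binary pattern

noisy, noise = add_white_noise(img.astype(float), snr=4.0, rng=rng)
flipped = add_salt_and_pepper(img, d=0.1, rng=rng)

print(np.sum(img ** 2) / np.sum(noise ** 2))   # realised SNR
print(np.sum(flipped != img))                  # number of flipped pixels
```

In the experiments described above the noise is added after a random scaling, rotation, and translation of the noise-free image; that geometric step is omitted here.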
[Figure 12: six precision–recall plots (recall on the horizontal axis, precision on the vertical axis), each with curves for ART, GFD, R-signature, Zernike, R2DFM, and RFM.]

Fig. 12. The precision–recall curves of different descriptors on the six logo datasets. For the noiseless dataset $d = 0$ (a), all shape descriptors have ideal performance. When $d \neq 0$ (b)–(f), deterioration appears in the performance of all descriptors and their curves move downwards. However, the impact of $d$ on those curves differs from one descriptor to another. Zernike and the RFM descriptors have the worst and best performance respectively among all descriptors. (a) $d = 0$. (b) $d = 0.02$. (c) $d = 0.04$. (d) $d = 0.06$. (e) $d = 0.08$. (f) $d = 0.1$.
Table 1
The resulting nearest matches by category of the Shapes216 dataset using the proposed RFM descriptor.

Category      1     2     3     4     5     6     7     8     9    10    11    12       %
Bone         12    12    12    12    12    12    12    12    12    12    12    10   98.61
Glass        11    12    11    12    12    11    10    11    11    12     6    12   90.97
Heart        11     8     9     8     9     5     8     9    10     9     7     7   69.44
Misk         10    10    10     6     4    10    10    10     1    10    10    10   70.14
Bird          4     1     9    10    11     6    10    10    10    11     9     7   68.06
Brick        12    12    12    12    11    12    12    12    12    12    12    12   99.30
Camel         6     6     8     9     9     5     7     5    10     8     8     9   62.50
Car          12    12    12    12    12    12    12    12    12    12    12    12     100
Children     12    12    12    12    12    12    12    12    12    12    12    12     100
Classic      11    11    11    12    12    10    11    11    12    12    11    11   93.75
Elephant      4     1     7     7     3     5     6     3     7     6     4     6   40.97
Face         12    12    12    12    12    12    12    12    12    12    12    12     100
Fork          3     2     7     4     5     1     3     5     6     5     8     8   39.58
Fountain     12    12    12    12    12    12    12    12    12    12    12    12     100
Hammer        9    11     9    11    11     8     5     8     8     8     8    10   73.61
Key          11    12     9     9    11     9    12    12    11     9    11    12   88.89
Ray           4    12    12    12    12    12    12     5     5    11    11     5   78.47
Turtle        6     9     6     6     6     5    10     6     7     4     6     5   52.78
%           100 94.44 91.67 90.74 87.04 81.02 76.85 75.93 69.44 62.96 64.35 56.94
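The percentage column of Table 1 appears to be each row's sum divided by 12 × 12 = 144. This reading is inferred from the numbers rather than stated with the table, and the short check below reproduces the printed value for three rows under that assumption.

```python
# Row counts copied from Table 1 (three rows only, for illustration).
rows = {
    "Bone":  [12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 10],
    "Glass": [11, 12, 11, 12, 12, 11, 10, 11, 11, 12, 6, 12],
    "Heart": [11, 8, 9, 8, 9, 5, 8, 9, 10, 9, 7, 7],
}

def row_percentage(counts, per_cell_max=12):
    """Assumed rule: 100 * sum(counts) / (number of cells * per_cell_max)."""
    return round(100.0 * sum(counts) / (len(counts) * per_cell_max), 2)

percentages = {name: row_percentage(c) for name, c in rows.items()}
print(percentages)
```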
[Figure 14: (a) precision–recall plot (recall on the horizontal axis, precision on the vertical axis) with curves for ART, GFD, R-signature, Zernike, R2DFM, and RFM; (b) confusion matrix with actual category on the horizontal axis and predicted category on the vertical axis over the 18 categories of Table 1, color scale from 0 to 140.]

Fig. 14. (a) The precision–recall curves of different descriptors on the Shapes216 dataset. The proposed RFM descriptor has similar performance to ART, GFD, Zernike, and R2DFM and outperforms R-signature. (b) The confusion matrix of the proposed RFM descriptor on the Shapes216 dataset mapped to the full range of the Jet colormap in MATLAB (ranging from blue to red, and passing through the colors cyan, yellow, and orange). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
local descriptors. In our perspective, due to the orthogonality [19] Y.W. Chen, Y.Q. Chen, Invariant description and retrieval of planar shapes
between these two type of descriptors, the combination may using Radon composite features, IEEE Transactions on Signal Processing 56
(101) (2008) 47624771.
potentially boost the performance of pattern recognition and [20] A. Kadyrov, M. Petrou, The Trace transform and its applications, IEEE
image retrieval systems. Transactions on Pattern Analysis and Machine Intelligence 23 (8) (2001)
811828.
[21] M. Petrou, A. Kadyrov, Afne invariant features from the Trace transform,
IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (1) (2004)
Acknowledgments
3044.
[22] T.V. Hoang, S. Tabbone, A geometric invariant shape descriptor based on the
The authors would like to thank the anonymous reviewers for Radon, Fourier, and Mellin transforms, in: Proceedings of the ICPR, 2010,
pp. 20852088.
their valuable and insightful comments and suggestions that help
[23] K. Mikolajczyk, C. Schmid, Scale & afne invariant interest point detectors,
to improve the quality of the paper. International Journal of Computer Vision 60 (1) (2004) 6386.
[24] G.-H. Liu, L. Zhang, Y.-K. Hou, Z.-Y. Li, J.-Y. Yang, Image retrieval based on
multi-texton histogram, Pattern Recognition 43 (7) (2010) 23802389.
References [25] S.R. Deans, The Radon Transform and Some of Its Applications, Krieger
Publishing Company, 1993.
[1] J. Wood, Invariant pattern recognition: a review, Pattern Recognition 29 (1) [26] D. Casasent, D. Psaltis, New optical transforms for pattern recognition,
(1996) 117. Proceedings of the IEEE 65 (1) (1977) 7784.
[2] D. Zhang, G. Lu, Review of shape representation and description techniques, [27] Y. Sheng, J. Duvernoy, Circular Fourier-radial Mellin descriptors for pattern
Pattern Recognition 37 (1) (2004) 119. recognition, Journal of the Optical Society of America A 3 (6) (1986) 885888.
[3] K.B. Howell, Fourier transforms, in: A.D. Poularikas (Ed.), The Transforms and [28] R.A. Altes, The FourierMellin transform and mammalian hearing, Journal of
Applications Handbook, CRC Press & IEEE Press, 2000 (Chapter 2). the Acoustical Society of America 63 (1) (1978) 174183.
[4] J. Bertrand, P. Bertrand, J.P. Ovarlez, The Mellin transform, in: A.D. Poularikas [29] P.E. Zwicke, I. Kiss, A new implementation of the Mellin transform and its
(Ed.), The Transforms and Applications Handbook, CRC Press & IEEE Press, application to radar classication of ships, IEEE Transactions on Pattern
2000 (Chapter 11). Analysis and Machine Intelligence 5 (2) (1983) 191199.
[5] Y.-N. Hsu, H.H. Arsenault, G. April, Rotation-invariant digital pattern recogni- [30] A. Khotanzad, Y.H. Hong, Invariant image recognition by Zernike moments,
tion using circular harmonic expansion, Applied Optics 21 (22) (1982) IEEE Transactions on Pattern Analysis and Machine Intelligence 12 (5) (1990)
40124015. 489497.
[6] D. Zhang, G. Lu, Shape-based image retrieval using generic Fourier descriptor, [31] P.C. Mahalanobis, On the generalised distance in statistics, Proceedings of the
Signal Processing: Image Communication 17 (10) (2002) 825848. National Institute of Sciences of India 2 (1) (1936) 4955.
[7] M.R. Teague, Image analysis via the general theory of moments, Journal of the
[32] W.A. Gotz,
H.J. Druckmuller, A fast digital Radon transforman efcient
Optical Society of America 70 (8) (1980) 920930. means for evaluating the Hough transform, Pattern Recognition 29 (4) (1996)
[8] J. Flusser, T. Suk, B. Zitova, Moments and Moment Invariants in Pattern 711718.
Recognition, John Wiley & Sons, 2009. [33] M.L. Brady, A fast discrete approximation algorithm for the Radon transform,
[9] H. Hjouj, D.W. Kammler, Identication of reected, scaled, translated, and SIAM Journal on Computing 27 (1) (1998) 107119.
rotated objects from their Radon projections, IEEE Transactions on Image [34] A. Averbuch, R. Coifman, D. Donoho, M. Israeli, J. Walden, Fast Slant Stack:
Processing 17 (3) (2008) 301310. A Notion of Radon Transform for Data on A Cartesian Grid Which Is Rapidly
[10] S. Tabbone, L. Wendling, J.-P. Salmon, A new shape descriptor dened on the Computable, Algebraically Exact, Geometrically Faithful, and Invertible,
Radon transform, Computer Vision and Image Understanding 102 (1) (2006) Technical Report, Stanford University, 2001.
4251. [35] S. Derrode, F. Ghorbel, Robust and efcient FourierMellin transform approx-
[11] X. Wang, B. Xiao, J.-F. Ma, X.-L. Bi, Scaling and rotation invariant analysis imations for gray-level image reconstruction and complete invariant descrip-
approach to object recognition based on Radon and FourierMellin trans- tion, Computer Vision and Image Understanding 83 (1) (2001) 5778.
forms, Pattern Recognition 40 (12) (2007) 35033508. [36] A. De Sena, D. Rocchesso, A fast Mellin and scale transform, EURASIP Journal
[12] S.-S. Xiao, Y.-X. Wu, Rotation-invariant texture analysis using Radon and on Advances in Signal Processing 100 (2007).
Fourier transforms, Journal of Physics Conference Series 48 (1) (2007) 1459. [37] G. Robbins, T. Huang, Inverse ltering for linear shift-variant imaging
[13] G. Chen, T.D. Bui, A. Krzyzak, Invariant pattern recognition using Radon, dual- systems, Proceedings of the IEEE 60 (7) (1972) 862872.
tree complex wavelet and Fourier transforms, Pattern Recognition 42 (9) [38] L.H. Johnson, The Shift and Scale Invariant FourierMellin Transform for
(2009) 20132019. Radar Applications, Technical Report, Massachusetts Institute of Technology,
[14] X. Wang, F. xia Guo, B. Xiao, J.-F. Ma, Rotation invariant analysis and 1980.
orientation estimation method for texture classication based on Radon [39] K. Jafari-Khouzani, H. Soltanian-Zadeh, Rotation-invariant multiresolution
transform and correlation analysis, Journal of Visual Communication and texture analysis using Radon and wavelet transforms, IEEE Transactions on
Image Representation 21 (1) (2010) 2932. Image Processing 14 (6) (2005) 783795.
[15] R.O. Duda, P.E. Hart, Use of the Hough transformation to detect lines and [40] S.A. Nene, S.K. Nayar, H. Murase, Columbia Object Image Library (COIL-20),
curves in pictures, Communications of the ACM 15 (1) (1972) 1115. CUCS-005-96, Technical Report, Department of Computer Science, Columbia
[16] V.F. Leavers, J.F. Boyce, The Radon transform and its application to shape University, 1996.
parametrization in machine vision, Image and Vision Computing 5 (2) (1987) [41] D.S. Doermann, E. Rivlin, I. Weiss, Applying algebraic and differential
161166. invariants for logo recognition, Machine Vision and Applications 9 (2)
[17] V.F. Leavers, Use of the Radon transform as a method of extracting informa- (1996) 7386.
tion about shape in two dimensions, Image and Vision Computing 10 (2) [42] T.B. Sebastian, P.N. Klein, B.B. Kimia, Recognition of shapes by editing their
(1992) 99107. shock graphs, IEEE Transactions on Pattern Analysis and Machine Intelligence
[18] V.F. Leavers, Use of the two-dimensional Radon transform to generate a 26 (5) (2004) 550571.
taxonomy of shape for the characterization of abrasive powder particles, IEEE [43] M. Bober, F. Preteux, W.-Y.Y. Kim, Shape descriptors, in: B.S. Manjunat,
Transactions on Pattern Analysis and Machine Intelligence 22 (12) (2000) P. Salembier, T. Sikora (Eds.), Introduction to MPEG 7: Multimedia Content
14111423. Description Language, John Wiley & Sons, 2002, pp. 231260.
Thai V. Hoang received his B.Eng. degree from Hanoi University of Science and Technology, Vietnam, in 2003 and his M.Sc. degree from Korea Advanced Institute of Science
and Technology, Republic of Korea, in 2007, both in Electrical Engineering. Since October 2008, he has been receiving the CNRS's BDI-PED doctoral fellowship and working
towards his Ph.D. degree in Computer Science at LORIA-INRIA Nancy research center in France. His research interests include invariant pattern representation and sparse
representation.
Salvatore Tabbone is a professor in Computer Science at University of Nancy 2 (France). Since 2007, he has been the scientific leader of the QGAR research team at LORIA research
center. His research topics include image and document indexing, content-based image retrieval, and pattern recognition. He is the (co-)author of more than 100 articles in
refereed journals and conferences. He serves as a PC member for numerous national and international conferences.