
Pattern Recognition 45 (2012) 271–284


Invariant pattern recognition using the RFM descriptor


Thai V. Hoang a,b,*, Salvatore Tabbone b

a MICA Center, HUST – CNRS/UMI 2954 – Grenoble INP, Hanoi, Vietnam
b LORIA, CNRS/UMR 7503, Université Nancy 2, 54506 Vandœuvre-lès-Nancy, France

Article history: Received 24 December 2010; Received in revised form 23 March 2011; Accepted 30 June 2011; Available online 19 July 2011.

Keywords: Invariant pattern representation; Radon transform; Fourier–Mellin transform; Feature extraction; Noise robustness.

Abstract

A pattern descriptor invariant to rotation, scaling, and translation (RST), and robust to additive noise, is proposed using the Radon, Fourier, and Mellin transforms. The Radon transform converts the RST transformations applied to a pattern image into transformations in the radial and angular coordinates of the pattern's Radon image. These beneficial properties of the Radon transform make it a useful intermediate representation for extracting invariant features from pattern images for the purpose of indexing/matching. In this paper, invariance to RST is obtained by applying the 1D Fourier–Mellin and discrete Fourier transforms on the radial and angular coordinates of the pattern's Radon image, respectively. The implementation of the proposed descriptor is reasonably fast and accurate, based mainly on a fusion of the Radon and Fourier transforms and on a modification of the Mellin transform. Theoretical arguments validate the robustness of the proposed descriptor to additive noise, and empirical evidence on both occlusion/deformation and noisy datasets shows its effectiveness.

© 2011 Elsevier Ltd. All rights reserved.

* Corresponding author at: LORIA, CNRS/UMR 7503, Bureau B226, Campus Scientifique, BP 239, 54506 Vandœuvre-lès-Nancy, France. Tel.: +33 3 54 95 85 60; fax: +33 3 83 27 83 19.
E-mail addresses: vanthai.hoang@loria.fr (T.V. Hoang), tabbone@loria.fr (S. Tabbone).

1. Introduction

The problem of recognizing image patterns that undergo geometric transformations such as rotation, scaling, and translation is an important topic in pattern recognition and the goal of many research works. Many approaches have been proposed for this problem and they can be classified into three main lines: brute force, image normalization, and invariant features. Brute-force approaches are the most trivial ones, using complete training datasets; for each pattern category, the training dataset contains all its RST-transformed versions. This line of approaches has inherent limitations in both storage requirements and time complexity that make it practically inapplicable. Image normalization is a solution for reducing the size of training datasets: the burden of the RST transformation parameters encoded in input pattern images is alleviated by normalizing the images with respect to their orientation, size, and position. However, despite its efficiency in the recognition stage, normalization of image patterns involves difficult inverse problems that are often ill-conditioned or ill-posed, leading to unreliable normalization results. Approaches using invariant features are based on the idea of describing image patterns by a set of measurable quantities that are insensitive to RST transformations while providing enough discriminatory power for their recognition. Mathematically speaking, if f(x, y) is an image and g(x, y) is another image described as g = O(f), where O is an RST transformation operator, then an invariant I is a functional satisfying I(f) = I(O(f)).

Many pattern descriptors have been proposed in the literature for the extraction of patterns' invariant features [1,2], using techniques that allow the operator O to be a rotation, a scaling, a translation, or a combination of these. Translation and scaling invariance can be obtained by using the Fourier [3] and Mellin [4] transforms respectively; rotation invariance by computing the harmonic expansion [5] or by performing the discrete Fourier transform on the circular coordinate of the pattern image in polar space [6], etc. However, combining several techniques so that the operator O covers full RST transformations while preserving the discriminatory power of the invariant features is challenging and has attracted the attention of many researchers. Most existing methods do not allow the operator O to be a full RST transformation; they usually require normalizations to compensate for whichever RST component the operator O does not cover. For example, methods based on the theory of moments [7,8] usually normalize input pattern images with respect to their centroid position and size: the pattern's centroid is required to coincide with the origin of the coordinate system and the longest distance from this centroid to a pattern point is set to a fixed value. These normalizations usually introduce errors, are sensitive to noise, and thus induce inaccuracy in the later recognition/matching process.

Radon-based methods are different from the others in the sense that the Radon transform is used as an intermediate representation from which invariant features are extracted. By


applying the Radon transform to an RST-transformed pattern image, the transformation parameters become encoded in the radial (for translation and scaling) and angular (for rotation) coordinates of the obtained Radon image [9]. Current techniques thus usually exploit this encoded information to define invariant features. A pioneering and notable work in this direction is the R-transform proposed in [10] to compute the R-signature. This approach uses an integral function and the discrete Fourier transform on the radial and angular coordinates of the Radon image respectively to get a 1D signature of the pattern image that is invariant to RST transformations. The strength of this approach is its simplicity; however, even if an extension to a 2D signature has been proposed, the obtained signature still has low discriminatory power, as there is a loss of information in the compression from the Radon image to the 1D signature. Recently, there was an effort to apply the 2D Fourier–Mellin transform to the Radon image [11]. Similarly, the Mellin transform and harmonic expansion are applied on the radial and angular coordinates of the Radon image respectively to get a pattern descriptor that is invariant to scaling and rotation. Similar efforts with different strategies to exploit the Radon image have been reported in [12–14]. These methods, however, do not allow the operator O to be a full RST transformation as in the R-signature; the operator O can only be a combination of rotation and scaling. Moving the origin of the coordinate system to the pattern's centroid is a common solution to obtain translation invariance; however, this normalization step may introduce errors when the image is noisy or the pattern is slightly distorted.

Another direction in using the Radon transform for pattern description is to extract pattern features directly from the Radon image, similar to the way the Hough transform [15] is used. For example, pattern primitives in edge form are detected from the Radon image and represented analytically in [16]. Moreover, their spatial relations can be made explicit [17], and this leads to a taxonomy of patterns for their characterization [18]. This approach, however, is quite limited as it requires that the edge primitives have an analytical form. Recently, a set of spectral and structural features proposed in [19] is extracted from a slightly modified version of the Radon transform for pattern description. This set of features is not invariant to rotation and, consequently, in the matching step, these features need to be rotated to all possible angles corresponding to potential pattern orientations in order to compute pattern similarity. Long matching times may prevent the application of this approach in real systems. A generalization of the Radon transform, called the trace transform, has also been proposed and used for image description [20,21], but its application is restricted by its high computational complexity.

This paper presents a novel RST-invariant pattern descriptor, briefly introduced in [22] and called the RFM descriptor, that is based on the Radon, Fourier, and Mellin transforms and is invariant not only to rotation and scaling but also to translation by definition. The proposed descriptor is a global one; its computation is essentially different from that of another type of descriptors, local descriptors [23,24], which work only on textured patterns and fail to describe smooth or non-textured ones. The main idea here is to apply the 1D Fourier–Mellin transform, which is invariant to translation and scaling, on the radial coordinate of the pattern's Radon image. The result of this transform is a representation of the pattern that is shifted horizontally according to the pattern's rotation angle. A further application of the discrete Fourier transform on this representation guarantees a pattern descriptor totally invariant to RST transformations. The proposed descriptor is fast in implementation and is more robust to additive noise than existing descriptors. Experimental results on both occlusion/deformation and noisy databases show the usefulness of the proposed descriptor.

The main contribution of this paper is to adopt and combine the three basic transformations (Radon, Fourier, and Mellin) to transform an input pattern image into a descriptor that is totally invariant to RST transformations. Moreover, we apply the 1D Fourier–Mellin transform on the radial coordinate of the pattern's Radon image and use an implementation of the Mellin transform that is fast, accurate, and preserves the discriminatory information of the input signal. Additionally, we analyze and provide theoretical evidence for the robustness of the proposed descriptor to additive noise. Finally, we illustrate the usefulness of the proposed descriptor in the presence of occlusion/deformation and additive noise using several datasets.

The remainder of this paper is organized as follows. Section 2 gives some background on the Radon, Fourier, and Mellin transforms and their beneficial properties. Section 3 presents the combined 1D Fourier–Mellin transform and then defines the RFM descriptor. A reasonably fast and accurate implementation of the proposed descriptor is provided in Section 4, and an analysis of the robustness of the proposed descriptor to additive noise is carried out in Section 5. Experimental results are given in Section 6 and conclusions are drawn in Section 7.
2. Basic material

This section reviews the three transformations used in the proposed descriptor, namely the Radon, Fourier, and Mellin transforms. It includes their definitions and their beneficial properties.

2.1. The Radon transform

Let f(x, y) be a 2D function on R² and let L(θ, ρ) be a straight line in R² represented by

L = {(x, y) ∈ R² : x cos θ + y sin θ = ρ},

where θ is the angle L makes with the y axis and ρ is the distance from the origin to L. The Radon transform [25] of f, denoted by R_f, is a function defined on the space of lines L(θ, ρ) by the line integral along each line:

R_f(L) = R_f(θ, ρ) = ∫_L f(x, y) dℓ = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) δ(ρ − x cos θ − y sin θ) dx dy.   (1)

In the field of shape analysis and recognition, the function f(x, y) is constrained to the following particular definition:

f(x, y) = 1 if (x, y) ∈ D, and 0 otherwise,

where D is the domain of the binary shape represented by f(x, y). In the illustration of the Radon transform given in Fig. 1, the shaded region represents the region D. The value of the line integral in Eq. (1) equals the length of the intersection between the line L and the shaded region.

Fig. 1. Graphical illustration of the Radon transform of a function f(x, y). The Radon transform is a mapping from the image space (x, y) to the parameter space (θ, ρ) and can be mathematically represented by a line integral of f(x, y) along each of the lines L parameterized by (θ, ρ) in the image space (x, y).

The Radon transform has several properties that are beneficial for pattern recognition, as outlined below (a small numerical sketch follows the list):

P1 (linearity): The Radon transform is linear:
R_{f+g}(θ, ρ) = R_f(θ, ρ) + R_g(θ, ρ).

P2 (periodicity): The Radon transform of f(x, y) is periodic in the variable θ with period 2π:
R_f(θ, ρ) = R_f(θ + 2kπ, ρ), for all k ∈ Z.

P3 (semi-symmetry): The Radon transform of f(x, y) is semi-symmetric:
R_f(θ, ρ) = R_f(θ ± π, −ρ).

P4 (translation): A translation of f(x, y) by a vector u = (x₀, y₀) results in a shift in the variable ρ of R_f(θ, ρ) by a distance d = x₀ cos θ + y₀ sin θ, equal to the length of the projection of u onto the line x cos θ + y sin θ = ρ:
R_f(θ, ρ) → R_f(θ, ρ − x₀ cos θ − y₀ sin θ).

P5 (rotation): A rotation of f(x, y) by an angle θ₀ implies a shift in the variable θ of R_f(θ, ρ) by a distance θ₀:
R_f(θ, ρ) → R_f(θ − θ₀, ρ).

P6 (scaling): A scaling of f(x, y) by a factor a results in a scaling of the variable ρ and of the amplitude of R_f(θ, ρ) by the factors a and 1/a respectively:
R_f(θ, ρ) → (1/a) R_f(θ, aρ).
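The Radon transform and the properties above can be checked numerically. The following minimal sketch (an assumption-laden reimplementation using image rotation, not the code used in the paper) computes a discrete Radon image by rotating the image and summing columns, and verifies on a centred disc that the projections do not depend on θ, as expected from P5.

```python
import numpy as np
from scipy.ndimage import rotate

def radon_image(f, thetas=np.arange(0.0, 180.0)):
    """Discrete Radon transform R_f(theta, rho) of a square 2D array f.

    For each angle theta the image is rotated by -theta (so the projection
    direction becomes vertical) and its columns are summed, i.e. f is
    projected onto the rho axis of Eq. (1)."""
    assert f.shape[0] == f.shape[1], "sketch assumes a square image"
    cols = []
    for theta in thetas:
        g = rotate(f, -theta, reshape=False, order=1)
        cols.append(g.sum(axis=0))      # one constant-theta slice R_f(theta, .)
    return np.stack(cols, axis=1)       # rows = rho, columns = theta

if __name__ == "__main__":
    # Binary disc: the shape setting where f(x, y) is 1 on the domain D.
    n = 64
    y, x = np.mgrid[-1:1:n*1j, -1:1:n*1j]
    f = (x**2 + y**2 < 0.5**2).astype(float)
    R = radon_image(f)
    # A centred disc is rotation invariant, so its projections agree for all theta (P5).
    print(np.allclose(R[:, 0], R[:, 90], atol=1.5))
```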

2.2. The Fourier transform

Let f(x) be a 1D function on R. The Fourier transform [3] of f, denoted by F_f, is a function defined for every real number ξ by

F_f(ξ) = ∫_{−∞}^{∞} f(x) e^{−i2πξx} dx.   (2)

It is well known that the Fourier transform possesses a shift or translation invariance property. Consider a function g(x) = f(x − x₀), a version of f(x) shifted by x₀; then

F_g(ξ) = ∫_{−∞}^{∞} g(x) e^{−i2πξx} dx = e^{−i2πξx₀} F_f(ξ).   (3)

Taking the magnitude of the two sides of Eq. (3) results in

|F_g(ξ)| = |F_f(ξ)|,

or, in other words, the magnitude of the Fourier transform of a 1D function is invariant to translation. The discrete version of Eq. (2), defined for a sequence of numbers {f(n) : n = 0, …, N−1}, is

DF_f(k) = Σ_{n=0}^{N−1} f(n) e^{−(2πi/N)kn},   k = 0, …, N−1.

Similarly, the discrete Fourier transform possesses a circular-shift invariance property: a circular shift of the input {f(n)} corresponds to multiplying the output DF_f(k) by a linear phase. Let {g(n)} be the sequence obtained by circularly shifting {f(n)} by a distance m; then

DF_g(k) = e^{−(2πi/N)km} DF_f(k).   (4)

Taking the magnitude of the two sides of Eq. (4) results in

|DF_g(k)| = |DF_f(k)|,

i.e., the magnitude of the discrete Fourier transform of a 1D function is invariant to circular shifting.
2.3. The Mellin transform

Let f(x) be a 1D function on R. The Mellin transform [4] of f, denoted by M_f, is a function defined by

M_f(s) = ∫_0^{∞} f(x) x^{s−1} dx,   (5)

where s = σ + it. The real part of s, σ, is a constant chosen such that the integral in Eq. (5) converges; the imaginary part of s, t, is the transform variable.

Consider a function g(x) = f(ax), a scaling of f by a factor a (a > 0); then

M_g(s) = ∫_0^{∞} g(x) x^{s−1} dx = a^{−s} M_f(s).   (6)

Taking the magnitude of the two sides of Eq. (6) results in

|M_g(s)| = a^{−σ} |M_f(s)|.

Thus, except for the constant multiplicative factor a^{−σ}, the Mellin transform is scale invariant. This scaling factor can be easily eliminated by normalization, or it can be used to find the relative scale between two functions.
3. An invariant pattern descriptor based on the Radon, Fourier, and Mellin transforms

This section presents a combination of the Fourier and Mellin transforms for 1D signals and its application to the radial coordinate of the pattern's Radon image in order to obtain a pattern descriptor invariant to RST transformations. A measure of similarity for the proposed pattern descriptor is also defined, allowing the comparison/matching between any pair of images.

3.1. The combined Fourier–Mellin transform for 1D signals

Combinations of the Fourier and Mellin transforms have been proposed in the literature to obtain signal representations that do not vary with rotation/scaling or translation/scaling. For 2D signals, the signal is first converted from Cartesian to polar coordinates, and the Fourier and Mellin transforms are then performed independently on the circular and radial coordinates of the converted signal respectively [26,27]. In this way, the obtained representation is invariant to rotation and scaling. For 1D signals, the Fourier and Mellin transforms are performed in sequence directly on the signals [28,29] to obtain signal representations invariant to translation and scaling.

Consider the Fourier transform of a function g(x) = f(ax − x₀), a scaled and translated version of f(x) with parameters a and x₀:

F_g(ξ) = ∫_{−∞}^{∞} f(ax − x₀) e^{−i2πξx} dx.   (7)

Denoting y = ax − x₀, Eq. (7) becomes

F_g(ξ) = (1/a) e^{−i2π(ξ/a)x₀} ∫_{−∞}^{∞} f(y) e^{−i2π(ξ/a)y} dy = (1/a) e^{−i2π(ξ/a)x₀} F_f(ξ/a).   (8)

Taking the magnitude of the two sides of Eq. (8) results in

|F_g(ξ)| = (1/a) |F_f(ξ/a)|.   (9)

The translation parameter x₀ has disappeared in Eq. (9); the remaining scaling parameter a can be eliminated by applying the Mellin transform to both sides of Eq. (9):

M_{|F_g|}(s) = ∫_0^{∞} (1/a) |F_f(ξ/a)| ξ^{s−1} dξ = a^{s−1} M_{|F_f|}(s),

or

|M_{|F_g|}(s)| = a^{σ−1} |M_{|F_f|}(s)|.   (10)

Therefore, by defining

MF_f(s) = |M_{|F_f|}(s)| = | ∫_0^{∞} | ∫_{−∞}^{∞} f(x) e^{−i2πξx} dx | ξ^{s−1} dξ |   (11)

as the combined 1D Fourier–Mellin transform of a function f(x), it follows from Eq. (10) that MF_f(s) is invariant to translation and scaling, except for a constant multiplicative factor.

3.2. The proposed RFM descriptor

The attractive invariance properties of the Radon and 1D Fourier–Mellin transforms lead to the proposal of a novel region-based pattern descriptor, called the RFM descriptor. The proposed descriptor of an image I is computed as RFM(I) = |DF_{MF_{R_I}}(k, s) / DF_{MF_{R_I}}(0, s)|, namely:

Step 1: The Radon transform is performed on the image I, giving R_I(θ, ρ).
Step 2: The 1D Fourier–Mellin transform is performed on the radial coordinate of the obtained Radon image, giving MF_{R_I}(θ, s).
Step 3: The magnitude of the discrete Fourier transform is taken on the angular coordinate of the obtained, discretized Fourier–Mellin image, normalized by the DC component: |DF_{MF_{R_I}}(k, s) / DF_{MF_{R_I}}(0, s)|.

The invariance properties of the descriptor described above are proved as follows (an implementation sketch following these three steps is given at the end of this subsection).

Properties. The proposed RFM descriptor is invariant to translation, rotation, and scaling.

Proof. Let J be the image obtained after scaling, rotating, and translating an image I with transformation parameters a, θ₀, and u = (x₀, y₀). Properties P4–P6 of the Radon transform imply

R_J(θ, ρ) = (1/a) R_I(θ − θ₀, aρ − d),   (12)

where d = x₀ cos(θ − θ₀) + y₀ sin(θ − θ₀) is the shift distance depending on θ. Eq. (12) indicates that, except for the constant multiplicative factor 1/a, each constant-θ slice of the Radon image of J, R_J(θ, ·), can be obtained by scaling and translating the constant-(θ − θ₀) slice of the Radon image of I, R_I(θ − θ₀, ·), by a factor a and a distance d.

Applying the 1D Fourier–Mellin transform to R_J(θ, ·) and R_I(θ − θ₀, ·) and using its invariance property and Eq. (10), Eq. (12) becomes

MF_{R_J}(θ, s) = a^{σ−2} MF_{R_I}(θ − θ₀, s).

By varying the values of θ and s, the two images MF_{R_J}(θ, s) and MF_{R_I}(θ, s) are obtained. Moreover, except for the constant multiplicative factor a^{σ−2}, MF_{R_J}(θ, s) can be directly obtained by shifting MF_{R_I}(θ, s) along the θ axis by a quantity θ₀. Applying the discrete Fourier transform on the angular coordinate of the Fourier–Mellin images MF_{R_J}(θ, s) and MF_{R_I}(θ, s) and then ignoring the phase information in the coefficients, we have

|DF_{MF_{R_J}}(k, s)| = a^{σ−2} |DF_{MF_{R_I}}(k, s)|.   (13)

This demonstrates that, except for the constant multiplicative factor a^{σ−2}, the proposed descriptor computed on a scaled, rotated, and translated version J of an image I is exactly the same as the descriptor computed on the original image I. The factor a^{σ−2}, however, can be easily eliminated by a normalization step using the DC component, as in |DF_{MF_{R_I}}(k, s) / DF_{MF_{R_I}}(0, s)|, or it can be used to determine the relative scale between any two pattern images of the same category. □

Calculating the proposed RFM descriptor, as described above, does not require any normalization regarding the size, position, or orientation of the patterns; it only requires a normalization of the intensity of the computed descriptor. As a consequence, the proposed descriptor is totally invariant to RST transformations.
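To make the three steps concrete, the sketch below assembles an RFM descriptor and the similarity of Eq. (14) directly from the definitions. It is a simplified reading of the method: it assumes skimage.transform.radon for Step 1 and a plain quadrature Mellin transform with σ = 1.5 for Step 2, whereas the paper itself relies on the direct Mellin implementation of Section 4 and, optionally, on the Radon–Fourier fusion of Section 4.1.

```python
import numpy as np
from skimage.transform import radon     # Step 1: Radon image R_I(theta, rho)

def rfm_descriptor(image, sigma=1.5, ts=np.arange(2.0, 17.0, 0.1)):
    """RFM descriptor of Section 3.2, following the three steps literally.

    Step 1: Radon transform of the image, one column per angle theta.
    Step 2: 1D Fourier-Mellin transform along the radial coordinate rho of
            every constant-theta slice (plain quadrature version of Eq. (11)).
    Step 3: magnitude of the DFT along the angular coordinate, normalised
            by the DC component."""
    thetas = np.arange(180.0)
    sinogram = radon(image.astype(float), theta=thetas, circle=False)  # (rho, theta)

    spec = np.abs(np.fft.rfft(sinogram, axis=0))     # |F| along rho, per theta
    xi = np.fft.rfftfreq(sinogram.shape[0])[1:]      # drop xi = 0
    dxi = xi[1] - xi[0]
    s = sigma + 1j * ts[:, None, None]
    fm = np.abs((spec[None, 1:, :] * xi[None, :, None] ** (s - 1.0)).sum(axis=1) * dxi)

    dft = np.fft.fft(fm, axis=1)                     # DFT along theta
    return np.abs(dft / dft[:, :1])                  # normalise by the DC component

def similarity(img_a, img_b):
    """Eq. (14): L2 distance between two RFM descriptors (smaller = more similar)."""
    return np.linalg.norm(rfm_descriptor(img_a) - rfm_descriptor(img_b))
```

A call such as similarity(img_a, img_b) then returns the distance used for ranking matches in the experiments of Section 6.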
3.3. Similarity measure

For any two images I and J, represented by RFM(I) and RFM(J) respectively, their measure of similarity is defined as the l2-norm distance between their descriptors:

sim(I, J) = || RFM(I) − RFM(J) ||_2.   (14)

The computation of sim(I, J) is simple and fast, permitting the RFM descriptor to be used in pattern matching problems with large-size datasets. More sophisticated distances, such as the weighted Euclidean distance [30], could be used to reduce the dominance of some of the coefficients in the RFM descriptor. However, as small-value coefficients usually correspond to high-frequency components, meaning that they are more sensitive to additive noise and to sampling/quantization effects, balancing the coefficient contributions would reduce the performance of the descriptor in noisy environments. Moreover, due to the orthogonality of the Fourier basis, there is no correlation among the coefficients of the RFM descriptor, and thus the Mahalanobis distance [31], if employed, reduces to the weighted Euclidean distance.

4. Implementation of the RFM descriptor

From the definition of the RFM descriptor in Section 3.2, its calculation can be separated into three steps: the Radon, 1D Fourier–Mellin, and discrete Fourier transforms. However, due to the interpretation of the Radon transform by means of the Fourier transform, a computational reduction is possible by fusing these two transforms. Moreover, in order to have a fast implementation and high discriminatory power, a modified version of the Mellin transform is used.

4.1. Possible fusion of the Radon and Fourier transforms

The Radon transform can be computed based on recursively defined digital straight lines [32,33], requiring O(N log N) operations for an image of N = n × n pixels. It can also be computed through the 2D Fourier transform with the same complexity by means of the projection-slice theorem [34], which states that

the 1D constant-θ slice of the Radon image, R_f(·, θ), and the 1D radial slice of the 2D Fourier transform, F_f(ξ cos θ, ξ sin θ), form a 1D Fourier transform pair:

F_{R_f(·,θ)}(ξ) = ∫_{−∞}^{∞} [ ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) δ(ρ − x cos θ − y sin θ) dx dy ] e^{−iρξ} dρ
              = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) e^{−i(x cos θ + y sin θ)ξ} dx dy
              = F_f(ξ cos θ, ξ sin θ).

Thus, if one chooses to implement the Radon transform through the 2D Fourier transform, then, due to the successive applications of inverse and forward Fourier transforms, performing the 1D Fourier–Mellin transform on each constant-θ slice of the Radon image is equivalent to directly performing the Mellin transform on the 1D radial slice of the 2D Fourier transform. This equivalence results in a computational reduction with no change in complexity. However, for notational convenience and clarity of presentation, the RFM descriptor in this paper sticks with the definition in Section 3.2. This choice of presentation has no impact on the remaining part; only when the RFM descriptor is implemented through the 2D Fourier transform should the reader pay attention to this fusion for their own computational benefit.
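The projection-slice relation can be checked exactly for θ = 0, where no interpolation is needed: summing the image along one axis is the Radon projection for that direction, and its 1D FFT coincides with the corresponding central row of the 2D FFT. This is a minimal numpy check of the identity, not an implementation of the fused descriptor.

```python
import numpy as np

# Projection-slice check for theta = 0.  For other angles the same identity
# holds up to the interpolation needed to extract a rotated central slice.
rng = np.random.default_rng(1)
img = rng.random((128, 128))

projection = img.sum(axis=0)             # R_f(theta = 0, .) on the pixel grid
slice_1d = np.fft.fft(projection)        # 1D Fourier transform of the slice
central_row = np.fft.fft2(img)[0, :]     # 2D Fourier transform at zero vertical frequency

print(np.allclose(slice_1d, central_row))   # True: Fourier-slice identity
```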
4.2. The modified direct Mellin transform

The Mellin transform, as defined in Eq. (5), has the very attractive property of scale invariance, and it can be implemented optically using an optical scale-invariant correlator [26]. However, there are reported problems with its implementation in today's digital systems [35]. Traditionally, the Mellin transform is implemented by changing the variable x = e^y:

M_f(s) = ∫_0^{∞} f(x) x^{σ−1} e^{it ln x} dx = ∫_{−∞}^{∞} f(e^y) e^{σy} e^{ity} dy.

This is, by definition, the Fourier transform of the distorted function f(e^y) weighted by e^{σy}. For sampled data, the discrete (fast) Mellin transform is implemented through the FFT by re-sampling the data exponentially; exponential sampling means interpolating the data to make them uniformly sampled in the y domain [36]. This process introduces errors into the transform and accentuates the low-frequency components. Additionally, if f(x) is sampled at the Nyquist rate and N is the number of data samples in the x domain, then the number of required data samples in the y domain is M = N ln N [37]. Likewise, when f(x) is nonzero at x = 0, f(e^y) is nonzero at y = −∞ and the implementation is not realizable; in this case, the Mellin transform can be approximated by using a correction term defined from the value of f(x = 0) [37].

Another problem concerns the application of the combined Fourier–Mellin transform of Section 3.1 to feature extraction: even though it is invariant to scaling and translation, it tends to obscure the discriminatory information contained in the input function [38]. Possible reasons are the discarding of the phase information from the output of the Fourier and Mellin transforms and the accentuation of low-frequency components. Moreover, as the FFT is applied on the radial coordinate of the pattern's Radon image, the DC component is always nonzero and it, in turn, is the value of the Mellin transform's input function f(x) at x = 0.

To avoid these problems, an alternative implementation of the Mellin transform proposed in [29], called the direct Mellin transform, is adopted for this work. Assuming f(x) is in the form of sampled data with sampling period T, expanding Eq. (5) gives

M_f(s) = ∫_0^{T} f(x) x^{s−1} dx + ∫_T^{2T} f(x) x^{s−1} dx + ⋯ + ∫_{(N−1)T}^{NT} f(x) x^{s−1} dx.   (15)

The value of f(x) is assumed to be piecewise constant in each interval of length T; then

s M_f(s) = f(0) x^s |_0^T + f(T) x^s |_T^{2T} + ⋯ + f((N−1)T) x^s |_{(N−1)T}^{NT}.   (16)

Denoting f(iT) = f_{i+1} and Δ_k = f_k − f_{k+1}, and assuming without loss of generality that T = 1 and f_N = 0, Eq. (16) becomes

s M_f(s) = f_1 x^s |_0^1 + f_2 x^s |_1^2 + ⋯ + f_N x^s |_{N−1}^{N} = Σ_{k=1}^{N−1} k^s (f_k − f_{k+1}) = Σ_{k=1}^{N−1} k^s Δ_k.   (17)

Substituting s = σ + it into Eq. (17),

(σ + it) M_f(σ + it) = Σ_{k=1}^{N−1} k^{σ+it} Δ_k.   (18)

Because k^{σ+it} = e^{(σ+it) ln k} is bounded for any fixed constant value of σ, the right-hand side of Eq. (18) is bounded, which means that |M_f(s)| converges to zero when |s| increases. This indicates the low-pass filtering behavior of the Mellin transform.

FFT is applied on the radial coordinate of the patterns Radon In the time domain, the modied direct Mellin transform is
image, the DC component is always nonzero and it, in turn, is the dened as
value of the Mellin transforms input function f(x) at x 0. Z 1
To avoid these problems, an alternative implementation of the MMf s s f xxs1 dx:
0
Mellin transform proposed in [29], which is called the direct
Mellin transform, is adopted for this work. Assuming f(x) is in the It is easy to verify that the modied direct Mellin transform
form of sampled data with sampling period T, expanding Eq. (5) maintains the scaling invariance property and the combined

Fourier–modified direct Mellin transform, defined as

MF_f(s) = MM_{|F_f|}(s) = s ∫_0^{∞} |F_f(ξ)| ξ^{s−1} dξ,   (20)

is invariant to both translation and scaling. Therefore, the 1D Fourier–Mellin transform used in the definition of the RFM descriptor in Section 3.2 is henceforth defined as in Eq. (20), instead of Eq. (11), and implemented through Eq. (19).
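The matrix form of Eq. (19) and the modified transform MM are straightforward to implement. The sketch below precomputes the coefficient matrix φ_{i,k} = k^{s_i} once, obtains MM_f(s) as a single matrix–vector product on the difference vector Δ, and cross-checks the result against the closed-form value of s·M_f(s) for the piecewise-constant interpolation of the samples; the test signal and the s grid are arbitrary.

```python
import numpy as np

def direct_mellin_matrix(N, s_values):
    """Coefficient matrix phi[i, k] = k^{s_i} of Eq. (19); it depends only on
    the signal length and the chosen spectral points, so it can be
    precomputed and stored (property 3 above)."""
    k = np.arange(1, N)                               # k = 1 .. N-1
    return k[None, :] ** np.asarray(s_values)[:, None]

def modified_direct_mellin(f, phi):
    """Modified direct Mellin transform MM_f(s) = s M_f(s) = sum_k k^s Delta_k,
    computed as one matrix-vector product per input signal (Eq. (19))."""
    delta = f[:-1] - f[1:]                            # Delta_k = f_k - f_{k+1}, with f_N = 0
    return phi @ delta

# Sampled signal (T = 1), padded so that f_N = 0 as assumed in Eq. (17).
rng = np.random.default_rng(2)
f = np.concatenate([rng.random(63), [0.0]])
N = f.size

s_values = 0.5 + 1j * np.arange(2.0, 17.0, 0.1)       # sigma = 0.5, t from 2.0 to 16.9
phi = direct_mellin_matrix(N, s_values)
mm = modified_direct_mellin(f, phi)

# Closed-form s*M_f(s) for the piecewise-constant interpolation of the samples:
# the i = 0 interval contributes f[0]*(1^s - 0^s) = f[0]; intervals i >= 1
# contribute f[i]*((i+1)^s - i^s).
k = np.arange(1, N)
pk = k[None, :] ** s_values[:, None]
exact = f[0] + (f[None, 1:-1] * (pk[:, 1:] - pk[:, :-1])).sum(axis=1)
print(np.allclose(mm, exact))
```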
To qualitatively illustrate the invariance properties of MF_{R_I}(θ, s), Fig. 2 provides some example pattern images and their corresponding Radon and Fourier–Mellin images. The first row contains two original pattern images (Fig. 2(a), (b)) and RST-transformed versions (Fig. 2(c)–(e)) of the pattern image in Fig. 2(b). The second row shows the Radon transforms of these pattern images, and the third row shows the images obtained after performing the 1D Fourier–Mellin transform on the Radon images of the second row, using 150 values of t ranging from 2.0 to 16.9 with an increment of 0.1 (the intensity of these images has been rescaled to [0, 255] for better viewing). It is noteworthy here that, due to the periodicity and semi-symmetry properties of the Radon transform (P2–P3), the effective range of θ used in the computation is 0–π rad, or 0–180°.

The two images I1 and I2 in Fig. 2(a), (b) are not similar, and as a consequence MF_{R_{I1}}(θ, s) and MF_{R_{I2}}(θ, s) in Fig. 2(k), (l) have different patterns. The images I3, I4, and I5 in Fig. 2(c)–(e) are transformed versions of the image I2 in Fig. 2(b); accordingly, MF_{R_{I3}}(θ, s), MF_{R_{I4}}(θ, s), and MF_{R_{I5}}(θ, s) in Fig. 2(m)–(o) have the same pattern as MF_{R_{I2}}(θ, s) in Fig. 2(l). The images in Fig. 2(l)–(o) clearly demonstrate the scaling and translation invariance of the 1D Fourier–Mellin transform: it is invariant to scaling and translation, and it converts a rotation of the image I into a circular shift in the variable θ of MF_{R_I}(θ, s).

A quantitative evaluation of the invariance properties of the Fourier–Mellin transform is given in Fig. 3, using the normalized cross correlation between all possible pairs of Fourier–Mellin images MF_{R_{Ii}}(θ, s) and MF_{R_{Ij}}(θ, s) from the third row of Fig. 2. Normalized cross correlation is selected in order to overcome the constant multiplicative factor a^{σ−2} in Eq. (13). To overcome the remaining rotation parameter, the correlation is calculated for all possible rotation angles, meaning that one of the two Fourier–Mellin images is circularly shifted along its θ direction by 180 possible values, from 0 to 179 with an increment of 1. Denoting by φ the shifting distance, the correlation between images Ii and Ij at φ is defined as

C_ij(φ) = corr2( MF_{R_{Ii}}(θ, s), MF_{R_{Ij}}(θ − φ, s) ),

where corr2(A, B) is the normalized cross correlation between two 2D arrays A and B of size m × n, calculated using the following formula:

corr2(A, B) = Σ_{i=1}^{m} Σ_{j=1}^{n} (A_{ij} − Ā)(B_{ij} − B̄) / sqrt( Σ_{i=1}^{m} Σ_{j=1}^{n} (A_{ij} − Ā)² · Σ_{i=1}^{m} Σ_{j=1}^{n} (B_{ij} − B̄)² ),

where Ā and B̄ are the mean values of A and B respectively (a small implementation sketch is given below).

The 10 curves C_ij(φ) corresponding to the 10 possible pairs of the five Fourier–Mellin images exhibit two different patterns: C_ij(φ) is peaky only when the two pattern images Ii and Ij are similar (i, j ∈ {2, 3, 4, 5}). The maximum values of C_ij(φ) are 0.6200, 0.6236, 0.6239, 0.6239, 0.9962, 0.9943, 0.9943, 0.9970, 0.9970, and 1.0000 respectively, from left to right and top to bottom in Fig. 3. The non-peaky and peaky maxima exhibit the discriminatory power of the proposed descriptor, and the maxima of nearly 1 mean that the Fourier–Mellin image is invariant to translation and scaling. The value φ* corresponding to the peak in C_ij(φ) gives the difference in orientation (in degrees) between the patterns in Ii and Ij.
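The corr2 measure and the circular-shift scan of Fig. 3 amount to a few lines of numpy; the sketch below assumes the Fourier–Mellin images are stored with the angular coordinate on the second axis (one column per degree).

```python
import numpy as np

def corr2(a, b):
    """Normalised cross correlation between two 2D arrays (the formula above)."""
    a = a - a.mean()
    b = b - b.mean()
    return (a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum())

def rotation_scan(fm_i, fm_j):
    """C_ij(phi): correlate two Fourier-Mellin images for every circular shift
    of the second one along its angular (theta) axis, as done for Fig. 3."""
    return np.array([corr2(fm_i, np.roll(fm_j, phi, axis=1))
                     for phi in range(fm_j.shape[1])])
```

With 180 angular samples at 1° steps, rotation_scan(fm_i, fm_j).argmax() gives an estimate of the orientation difference φ* between the two patterns.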

Fig. 2. Illustration of the invariance properties of the Radon and 1D Fourier–Mellin transforms performed on pattern images (panels: images I1–I5, their Radon images R_{I1}(θ, ρ)–R_{I5}(θ, ρ), and their Fourier–Mellin images MF_{R_{I1}}(θ, s)–MF_{R_{I5}}(θ, s)). The first row contains two original pattern images (a)–(b) and the RST-transformed versions (c)–(e) of the pattern image in (b). The second row shows the Radon transforms of these pattern images, and the third row shows the images obtained after performing the 1D Fourier–Mellin transform on the Radon images of the second row, using 150 values of t ranging from 2.0 to 16.9 with an increment of 0.1.

Fig. 3. The normalized cross correlations C_ij(φ) between all possible pairs of Fourier–Mellin images from the third row of Fig. 2. For each pair of images, 180 correlation values are calculated by circularly shifting one of the two Fourier–Mellin images along its θ direction by 180 possible values, from 0 to 179 with an increment of 1.

5. Robustness to noise

A pattern descriptor based on the Radon transform has the advantage of being robust to additive noise [39]. Suppose the image f(x, y) is corrupted by additive white noise η(x, y) with zero mean and variance σ², giving f̂(x, y) = f(x, y) + η(x, y). The Radon transform of the noisy image f̂(x, y) is obtained by applying the linearity property of the Radon transform (P1):

R_{f̂}(θ, ρ) = R_f(θ, ρ) + R_η(θ, ρ).

Recall that the Radon transform, as illustrated in Fig. 1, consists of line integrals of f(x, y). In the continuous domain, the Radon transform of additive white noise, R_η(θ, ρ), is proportional to the mean value of the noise, which means R_η(θ, ρ) = 0, or

R_{f̂}(θ, ρ) = R_f(θ, ρ).   (21)

Ideal additive white noise therefore has no effect on the Radon transform of the image in the continuous domain. In practice, however, the image represented and processed in a digital computer is not a continuous function but its sampled and quantized version, and Eq. (21) does not hold.

Suppose f(x, y) is a sampled 2D signal of size m × n (0 ≤ x ≤ m, 0 ≤ y ≤ n) whose values are random variables with mean μ and variance σ². The computation of the Radon transform of f(x, y) is assumed to follow the definition: the values of f(x, y) along the lines L(θ, ρ) are summed up, as shown in Fig. 4. The contribution of each pixel to the sum is equal to the length of its intersection with L(θ, ρ); the sum of all the pixels' contributions corresponding to L(θ, ρ) is therefore equal to the length of the segment AB, the intersection of f(x, y) with L(θ, ρ). The sum of f(x, y) along all the lines L(θ, ρ) having the same direction θ can also be interpreted as the projection of f(x, y) onto an axis ρ perpendicular to L(θ, ·). This projection is exactly the line R_f(θ, ·) of the Radon image of f(x, y).

Fig. 4. Illustration of the computation of the Radon transform by definition: for each value of θ, the image function f(x, y) is projected onto an axis ρ which is perpendicular to the line L and makes an angle θ with the x axis. The projection itself forms one line R_f(θ, ·) of the Radon image.

To study this projection, let θ = const and denote n_ρ = |AB|; then the sum of pixel values p_ρ = R_f(θ, ρ) for each line L(θ, ρ) has mean n_ρ μ and variance n_ρ σ². The average of the expected value of p_ρ² is

E_p = (1 / 2N_ρ) ∫_{−N_ρ}^{N_ρ} E{p_ρ²} dρ = (1 / 2N_ρ) ∫_{−N_ρ}^{N_ρ} n_ρ σ² dρ + (1 / 2N_ρ) ∫_{−N_ρ}^{N_ρ} n_ρ² μ² dρ.   (22)

In the above equation, the integral ∫_{−N_ρ}^{N_ρ} n_ρ dρ represents the total area of f(x, y) and is equal to the number of pixels in f(x, y), which is mn. Then, by denoting A(θ) = ∫_{−N_ρ}^{N_ρ} n_ρ² dρ, Eq. (22) is simplified to

E_p = mnσ² / (2N_ρ) + A(θ)μ² / (2N_ρ).   (23)

For the image f(x, y) corrupted by white noise η(x, y), assuming f(x, y) has mean μ_s and variance σ_s², and η(x, y) has mean μ_n = 0 and variance σ_n², we obtain E_s = mnσ_s²/(2N_ρ) + A(θ)μ_s²/(2N_ρ) and E_n = mnσ_n²/(2N_ρ). The signal-to-noise ratios (SNR) of f̂(x, y) and of its projection R_{f̂}(θ, ·) are then

SNR_image = (σ_s² + μ_s²) / σ_n²,

SNR_proj(θ) = E_s / E_n = (mnσ_s² + A(θ)μ_s²) / (mnσ_n²) = (σ_s² + (A(θ)/mn) μ_s²) / σ_n²,

or

SNR_proj(θ) = SNR_image + (A(θ)/mn − 1) · μ_s²/σ_n².   (24)

The SNR is thus increased by the quantity (A(θ)/mn − 1) μ_s²/σ_n² after projecting f̂(x, y) along the direction θ. As the value of A(θ) depends on θ, m, and n, the multiplicative factor (A(θ)/mn − 1) is not constant. Moreover, the value of A(θ)/mn is large, because A(θ) = ∫_{−N_ρ}^{N_ρ} n_ρ² dρ is one order higher than

mn = ∫_{−N_ρ}^{N_ρ} n_ρ dρ. Eq. (24) can therefore be approximated as

SNR_proj(θ) ≈ SNR_image + (A(θ)/mn) · μ_s²/σ_n².

As (A(θ)/mn) · μ_s²/σ_n² is positive and large, corresponding to a large increase in SNR after projection, the Radon transform is very robust to additive white noise. Fig. 5 depicts the values of A(θ)/mn for θ ranging from 0 to 180°, using input images of different sizes. The value of A(θ)/mn depends on the actual size of f(x, y); it attains its maximum in the direction of the longer side and its minimum near the direction of the shorter side of f(x, y). Therefore

min_θ A(θ)/mn ≈ min(m, n),   max_θ A(θ)/mn ≈ max(m, n).

Fig. 5. The value of the multiplicative factor A(θ)/mn in Eq. (24) at different values of θ, computed on images f(x, y) of different sizes. It attains its maximum at θ = 90°, corresponding to the direction of the longer side of f(x, y), and its minimum near θ = 0°.

In the field of shape analysis and recognition, f(x, y) is constrained to have binary values 0 or 1, and the additive noise on f(x, y) takes the form of salt & pepper noise instead of white noise. To model this type of noise, let D and d be the fractions of pixels in f̂(x, y) occupied by the shape region and flipped by the noise, respectively; then

μ_s = D,   σ_s² = D − D²,
μ_n = d(1 − 2D),   σ_n² = d − d²(1 − 2D)².

Using Eq. (23), the signal-to-noise ratios of f̂(x, y) and of its projection R_{f̂}(θ, ·) are

SNR_image = (σ_s² + μ_s²) / (σ_n² + μ_n²) = D/d,

SNR_proj(θ) = (mnσ_s² + A(θ)μ_s²) / (mnσ_n² + A(θ)μ_n²) = D[1 + D(A(θ)/mn − 1)] / ( d[1 + d(1 − 2D)²(A(θ)/mn − 1)] ),

and thus

SNR_proj(θ) / SNR_image = [1 + D(A(θ)/mn − 1)] / [1 + d(1 − 2D)²(A(θ)/mn − 1)].

Clearly, SNR_proj(θ)/SNR_image depends on the size of the input noisy image f̂(x, y) (through A(θ)/mn), on the projection direction θ, on the shape fraction D, and on the noise level d. For an explicit minimum value of SNR_proj(θ)/SNR_image, let us say D ∈ [0.3, 0.7] and d ∈ [0, 0.2]. This is a practically reasonable assumption, as a shape usually occupies around half of the image region (D ≈ 0.5) and the shape image is not too noisy (the highest noise level used in the experiments is d = 0.1, meaning that 10% of the image pixels are flipped, corresponding to the dataset in the bottom row of Fig. 11(b)). Due to the inverse proportionality of SNR_proj(θ)/SNR_image to d, the ratio attains its minimum at the maximum d = 0.2. Moreover, at d = 0.2, since SNR_proj(θ)/SNR_image decreases as D moves away from D = 0.5, the minimum value of SNR_proj(θ)/SNR_image, at a specific value of A(θ)/mn, is reached at the minimum D = 0.3. The value of SNR_proj(θ)/SNR_image for the case A(θ)/mn = 100 over the domain D ∈ [0.3, 0.7] and d ∈ [0, 0.2] is depicted in Fig. 6(a).

Fig. 6. The value of SNR_proj(θ)/SNR_image over the domain D ∈ [0.3, 0.7] and d ∈ [0, 0.2]. A high value of SNR_proj(θ)/SNR_image means that the projection in the Radon transform suppresses salt & pepper noise. (a) A(θ)/mn = 100. (b) D = 0.3, d = 0.2.

Fixing D = 0.3 and d = 0.2, the dependence of SNR_proj(θ)/SNR_image on A(θ)/mn is further shown in Fig. 6(b). It is evident that SNR_proj(θ)/SNR_image > 1, meaning that the projection in the Radon transform has the property of suppressing salt & pepper noise. Additionally, SNR_proj(θ)/SNR_image increases with A(θ)/mn, from 4.1667 at A(θ)/mn = 20 (a very small image) towards its limit

lim_{A(θ)/mn → ∞} SNR_proj(θ)/SNR_image = D / (d(1 − 2D)²) = 9.375,

implying a better suppression of noise in the projections of bigger images.
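The closed-form ratio derived above is easy to evaluate; the following sketch reproduces the two numbers quoted in the text (4.1667 at A(θ)/mn = 20 and the limit 9.375 for D = 0.3, d = 0.2).

```python
def snr_gain(A_over_mn, D, d):
    """SNR_proj(theta) / SNR_image for the salt-and-pepper model derived above."""
    r = A_over_mn - 1.0
    return (1.0 + D * r) / (1.0 + d * (1.0 - 2.0 * D) ** 2 * r)

D, d = 0.3, 0.2
print(snr_gain(20.0, D, d))                              # ~4.1667 (small image)
print(snr_gain(1e9, D, d), D / (d * (1 - 2 * D) ** 2))   # approaches the limit 9.375
```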
6. Experimental results

In order to demonstrate the effectiveness of the proposed RFM descriptor, three experiments on grayscale and binary image datasets have been carried out. The robustness of the proposed descriptor to additive white noise is first demonstrated using two sets of datasets generated from images of 26 Latin characters and from the COIL-20 dataset [40] by adding white noise of different levels to them. Secondly, the proposed descriptor is computed on a set of datasets generated from the UMD Logo dataset [41] by adding salt & pepper noise of different levels to its images; the aim of this experiment is to demonstrate the robustness of the proposed descriptor to salt & pepper noise. Finally, the proposed descriptor is computed and compared with other descriptors on the reference Shapes216 dataset [42], which is used to evaluate its robustness to occlusion and deformation. Thus, the first experiment deals with grayscale patterns and the last two with binary patterns.

The proposed RFM descriptor is compared with the angular radial transform (ART) [43], the generic Fourier descriptor (GFD) [6], Zernike moments [30], the R-signature [10], and the Radon 2D Fourier–Mellin transform (R2DFM) [11]. Except for the R-signature, all other comparison descriptors need normalizations in order to be invariant to RST transformations. Moreover, the R-signature and R2DFM descriptors are also defined on the Radon transform. These descriptors are selected because they are commonly used and have good reported performance. The measure used for comparison among descriptors is the precision–recall curve defined in the information retrieval context.

In computing the precision–recall curve for each dataset, each image in the dataset is used as a model to which all the images in the dataset are compared/matched. The matching is realized using the similarity measure defined in Eq. (14). The obtained matching results are then sorted, or ranked, to determine the nth nearest matches for each model.
other descriptors on the reference Shapes216 dataset [42], which  ExpA: The rst set of six alphabet datasets has been generated
is used to evaluate its robustness to occlusion and deformation. from images of 26 Latin characters as shown in Fig. 7(b). Each
Thus, the rst experiment deals with grayscale patterns and the of these six datasets has 260 images of 26 categories, each
last two ones with binary patterns. category contains 10 images.
The proposed RFM descriptor is compared with angular radial  ExpB: The second set of six object datasets has been generated
transform (ART) [43], generic Fourier descriptor (GFD) [6], Zernike from 20 object images from the COIL-20 dataset [40] as shown
moments [30], R-signature [10], and Radon 2D FourierMellin in Fig. 8(a). Each of these six datasets has 220 images of 20
transforms (R2DFM) [11]. Except for the R-signature, all other categories, each category contains 11 images.

Fig. 7. (a) Images of the 26 Latin characters, of size 64 × 64 in Arial bold font, used to generate the six alphabet datasets (noise-free images). (b) Sample images from the six alphabet datasets, generated from the first four character images with the six possible values of SNR ∈ {16, 8, 4, 2, 1, 0.5} corresponding to the six datasets (examples of noisy images).

Fig. 8. (a) The twenty object images from the COIL-20 dataset [40] used to generate the six object datasets (noise-free images). (b) Sample images from the six object datasets, generated from four object images with the six possible values of SNR ∈ {16, 8, 4, 2, 1, 0.5} corresponding to the six datasets (examples of noisy images).

The main characteristic that differentiates ExpA and ExpB, besides the semantic content of their images, is the number of intensity levels in the original images: character images have only two intensity levels, whereas object images have multi-level intensity. Noisy images are generated from the corresponding noise-free images by randomly scaling, rotating, and translating them and then adding white noise. Let SNR be the signal-to-noise ratio defined as

SNR = Σ_{x,y} f(x, y)² / Σ_{x,y} η(x, y)²,

where f(x, y) is the noise-free image and η(x, y) is the added white noise. The value of SNR for each dataset is kept constant and, for each experiment, SNR takes the six possible values {0.5, 1, 2, 4, 8, 16}, corresponding to the six datasets. Some example images from the six datasets in the two experiments are given in Figs. 7(b) and 8(b).
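Generating one noisy sample at a prescribed SNR, as defined above, can be sketched with scipy.ndimage; the rotation, scaling, and translation ranges below are placeholders (the exact ranges used to build the datasets are not specified here), and the padding/cropping back to a fixed canvas is omitted.

```python
import numpy as np
from scipy.ndimage import rotate, shift, zoom

def noisy_rst_copy(image, snr, rng, angle_range=360.0, scale_range=(0.8, 1.2), max_shift=10):
    """Random RST transform of `image` plus white noise scaled so that
    SNR = sum(f^2) / sum(eta^2) takes the prescribed value (a sketch only)."""
    g = rotate(image, rng.uniform(0.0, angle_range), reshape=False, order=1)
    g = zoom(g, rng.uniform(*scale_range), order=1)
    g = shift(g, rng.uniform(-max_shift, max_shift, size=2), order=1)
    noise = rng.standard_normal(g.shape)
    noise *= np.sqrt((g ** 2).sum() / (snr * (noise ** 2).sum()))
    return g + noise

rng = np.random.default_rng(0)
# Example: one of the six datasets uses SNR = 4 for every generated image, e.g.
# noisy = noisy_rst_copy(character_image, snr=4.0, rng=rng)
```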
The proposed RFM descriptor has been compared with the ART, GFD, Zernike, R-signature, and R2DFM descriptors on these noisy datasets, and the results obtained on ExpA and ExpB are given in Figs. 9 and 10 respectively. From these figures it is observed that:

1. The ART, GFD, Zernike, and R2DFM descriptors are not robust to additive white noise at all; their performance is similarly poor for the different levels of noise.
2. The R-signature and RFM descriptors perform reasonably well when the noise is weak (SNR = 16, 8, 4), demonstrating their resistance to additive noise.
3. As SNR decreases (the images get noisier), the precision–recall curves of the R-signature and RFM descriptors generally move downwards. However, the curves of the RFM descriptor are always above those of the R-signature, indicating that the RFM descriptor is more robust to noise than the R-signature.

It can thus be concluded that the proposed RFM descriptor is more robust to additive white noise than all the other comparison descriptors on grayscale noisy datasets, which provides empirical evidence for the analytical results developed in Section 5. The poor performance of the ART, GFD, Zernike, and R2DFM descriptors has its root in the normalizations required in their computation and can be explained as follows:

1. To have invariance to translation, the origin of the polar coordinate system needs to be placed at the centroid of the pattern. In the presence of noise, the position of the centroid is shifted arbitrarily according to the actual noise.
2. To have invariance to scaling, the radial axis is normalized by the distance from the origin of the polar coordinate system to the furthest pattern point. In the presence of noise, this furthest point may not belong to the actual pattern but to the noise.

Furthermore, it is also evident from the two sets of experiments that the performance of the R-signature and RFM descriptors is better on ExpB than on ExpA at each value of SNR, leading to the conclusion that the proposed descriptor performs better on multi-level than on two-level grayscale images. A possible explanation comes from the effect of re-sampling and re-quantization when an image is rotated, scaled, and/or translated: multi-level images have a smooth intensity surface, which suffers less deformation due to re-sampling and re-quantization than the deformation occurring at the sharp edges of two-level images.

6.2. Binary pattern recognition

6.2.1. Noisy datasets

The robustness of the proposed RFM descriptor to additive salt & pepper noise is demonstrated using six logo datasets generated from the first 25 logo images of the UMD Logo dataset, as shown in Fig. 11(b). Each of these six logo datasets has 275 images of 25 categories; each category contains 11 images generated by randomly scaling, rotating, and translating the original corresponding logo image and then adding salt & pepper noise to it. Let d be the fraction of pixels flipped by the noise; the value of d for each generated dataset is kept constant, and d takes six possible values

1 1 1

0.8 0.8 0.8


precision
precision

precision

0.6 0.6 0.6


ART ART ART
0.4 GFD 0.4 GFD
0.4 GFD
Rsig Rsig Rsig
0.2 Zernike 0.2 Zernike 0.2 Zernike
R2DFM R2DFM R2DFM
RFM RFM RFM
0 0 0
0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1
recall recall recall

1 1 1

0.8 0.8 0.8


precision
precision

precision

0.6 0.6 0.6


ART ART ART
0.4 GFD 0.4 GFD 0.4 GFD
Rsig Rsig Rsig
0.2 Zernike
0.2 Zernike 0.2 Zernike
R2DFM R2DFM R2DFM
RFM RFM RFM
0 0 0
0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1
recall recall recall

Fig. 9. The precisionrecall curves of different descriptors on the six alphabet datasets. ART, GFD, Zernike, and R2DFM descriptors are not robust to noise while R-signature
and RMF descriptors are. As SNR decreases (the images get noisier), the precisionrecall curves of the RMF and R-signature descriptors generally move downwards but the
curves of the RMF descriptor is always above that of the R-signature, meaning its superiority. (a) SNR 16. (b) SNR 8. (c) SNR 4. (d) SNR 2. (e) SNR 1. (f) SNR 0:5.

Fig. 10. The precision–recall curves of the different descriptors on the six object datasets. The ART, GFD, Zernike, and R2DFM descriptors are not robust to noise, while the R-signature and RFM descriptors are. As SNR decreases (the images get noisier), the precision–recall curves of the RFM and R-signature descriptors generally move downwards, but the curves of the RFM descriptor are always above those of the R-signature, indicating its superiority. (a) SNR = 16. (b) SNR = 8. (c) SNR = 4. (d) SNR = 2. (e) SNR = 1. (f) SNR = 0.5.

Fig. 11. (a) The twenty-five logo images from the UMD Logo dataset [41] used to generate the six logo datasets (noise-free images). (b) Sample images from the six logo datasets, generated from the first four logo images with the six possible values of d ∈ {0, 0.02, 0.04, 0.06, 0.08, 0.1} corresponding to the six datasets (examples of noisy images).

ranging from 0 to 0.1 with an increment of 0.02, corresponding to the six datasets. The first dataset, with d = 0, is actually a noiseless dataset; it is used for checking the invariance properties of the proposed and comparison descriptors. The values of d for the other five noisy datasets form an arithmetic progression with common difference 0.02; these five datasets are therefore used to evaluate the resistance of the proposed and comparison descriptors at increasing levels of additive noise. Some sample images from these six datasets are given in Fig. 11(b). It should also be noted that the maximum chosen value of d, 0.1, satisfies the assumption on the possible values of d made in the analysis of the robustness of the proposed descriptor to additive salt & pepper noise in Section 5.

The proposed RFM descriptor is again compared with the ART, GFD, Zernike, R-signature, and R2DFM descriptors using these six datasets, and the computed precision–recall curves of these descriptors are depicted in Fig. 12. For the noiseless dataset with d = 0 (Fig. 12(a)), all shape descriptors have ideal performance. This demonstrates that the proposed descriptor is totally invariant to translation, rotation, and scaling. When d ≠ 0 (Fig. 12(b)–(f)), the performance of all descriptors deteriorates and their precision–recall curves move downwards. However, the impact of d on these curves differs from one descriptor to another. The proposed descriptor has the best performance on all five noisy datasets, while the ART and Zernike descriptors have similarly poor performance. It is also observed that:

1. As d increases, the curves of all the descriptors generally move downwards.
2. The ART and Zernike descriptors are not robust to noise at all; their performance is similarly poor for the different levels of noise.
3. GFD has more resistance to noise than ART and Zernike, because its curves are pushed away from the ideal curve (obtained when d = 0) by a distance that increases with d. However, the resistance of GFD is weaker than that of the descriptors defined on the Radon transform.
4. Among the three Radon-based descriptors, the shift in the curves of the R-signature and RFM is more regular than that of R2DFM. In addition, and more importantly, the RFM descriptor has the smallest shift of its curve among all descriptors for each value of d.

Fig. 12. The precision–recall curves of the different descriptors on the six logo datasets. For the noiseless dataset d = 0 (a), all shape descriptors have ideal performance. When d ≠ 0 (b)–(f), the performance of all descriptors deteriorates and their curves move downwards. However, the impact of d on these curves differs from one descriptor to another; among the remaining descriptors, Zernike and RFM have the worst and best performance respectively. (a) d = 0. (b) d = 0.02. (c) d = 0.04. (d) d = 0.06. (e) d = 0.08. (f) d = 0.1.

The above observations lead to the conclusion that the proposed RFM descriptor is more robust to additive salt & pepper noise than all the other comparison descriptors, which provides empirical evidence for the analytical results developed in Section 5. The poor performance of the ART, GFD, Zernike, and R2DFM descriptors on binary noisy datasets has explanations similar to those given for the grayscale noisy datasets in the previous subsection.

6.2.2. Occlusion and deformation dataset

Fig. 13 shows the set of 216 shapes from the Shapes216 dataset. This dataset is composed of 18 shape categories with 12 samples per category; none of these samples can be obtained by RST-transforming another shape of the same category. The overall nearest matches by category obtained with the proposed RFM descriptor are given in Table 1. The value of each cell in the central region of this table is the number of times the nth nearest match for that category falls in the correct category. The precision over the first 12 nearest matches for each category is given in the last column, and its value varies according to the category. The proposed descriptor obtains:

1. 100% precision in the first 12 nearest matches on Car, Children, Face, and Fountain;
2. high performance on Bone, Brick, and Classic;
3. low performance on Elephant and Fork.

Fig. 13. Images from the Shapes216 dataset [42]. There are 18 shape categories and each category contains 12 shapes; none of these shapes can be obtained by RST-transforming another shape of the same category.

The last row of Table 1 provides the precision of each nearest match over all categories. The categorization is clearly not as good as that reported in [42], where the precision of each nearest match over all categories is (100, 100, 100, 100, 99, 97, 99, 96, 96, 95, 91, 80). However, it should be noted that the proposed RFM descriptor is neither intended nor designed to work solely with binary patterns as in [42]. It is designed, instead, to also work with grayscale patterns under RST transformations in the presence of a certain level of additive noise. Furthermore, methods such as [42] cannot work with noisy patterns and need clean images for feature extraction, which the proposed descriptor does not. A comparison of the proposed RFM descriptor with commonly used descriptors in this setting is given in Fig. 14. The RFM descriptor outperforms the R-signature and is comparable to the ART, GFD, Zernike, and R2DFM descriptors.

Table 1
The resulting nearest matches by category of the Shapes216 dataset using the proposed RFM descriptor (columns 1–12: nth nearest match; last column: precision over the first 12 nearest matches).

Category   |  1   2   3   4   5   6   7   8   9  10  11  12 |   %
Bone       | 12  12  12  12  12  12  12  12  12  12  12  10 | 98.61
Glass      | 11  12  11  12  12  11  10  11  11  12   6  12 | 90.97
Heart      | 11   8   9   8   9   5   8   9  10   9   7   7 | 69.44
Misk       | 10  10  10   6   4  10  10  10   1  10  10  10 | 70.14
Bird       |  4   1   9  10  11   6  10  10  10  11   9   7 | 68.06
Brick      | 12  12  12  12  11  12  12  12  12  12  12  12 | 99.30
Camel      |  6   6   8   9   9   5   7   5  10   8   8   9 | 62.50
Car        | 12  12  12  12  12  12  12  12  12  12  12  12 | 100
Children   | 12  12  12  12  12  12  12  12  12  12  12  12 | 100
Classic    | 11  11  11  12  12  10  11  11  12  12  11  11 | 93.75
Elephant   |  4   1   7   7   3   5   6   3   7   6   4   6 | 40.97
Face       | 12  12  12  12  12  12  12  12  12  12  12  12 | 100
Fork       |  3   2   7   4   5   1   3   5   6   5   8   8 | 39.58
Fountain   | 12  12  12  12  12  12  12  12  12  12  12  12 | 100
Hammer     |  9  11   9  11  11   8   5   8   8   8   8  10 | 73.61
Key        | 11  12   9   9  11   9  12  12  11   9  11  12 | 88.89
Ray        |  4  12  12  12  12  12  12   5   5  11  11   5 | 78.47
Turtle     |  6   9   6   6   6   5  10   6   7   4   6   5 | 52.78
% per rank | 100 94.44 91.67 90.74 87.04 81.02 76.85 75.93 69.44 62.96 64.35 56.94 |


Fig. 14. (a) The precision–recall curves of different descriptors on the Shapes216 dataset. The proposed RFM descriptor has similar performance to ART, GFD, Zernike, and R2DFM and outperforms the R-signature. (b) The confusion matrix of the proposed RFM descriptor on the Shapes216 dataset mapped to the full range of the Jet colormap in MATLAB (ranging from blue to red, and passing through the colors cyan, yellow, and orange). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
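As a hedged illustration of how precision–recall points of this kind can be computed (this is a sketch under stated assumptions, not the authors' implementation), each query is matched against the rest of the dataset, the other shapes of its category are treated as relevant, and precision and recall are recorded at each rank of the retrieved list:

```python
# Illustrative sketch: precision-recall points for one query in a
# category-based retrieval test such as Shapes216. The ranked list is
# assumed to exclude the query itself; the other shapes of the query's
# category are the relevant items.
def pr_curve(ranked_labels, query_label, n_relevant):
    points, hits = [], 0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
        points.append((hits / rank, hits / n_relevant))  # (precision, recall)
    return points

# Toy ranking with 3 relevant items ('Bone') among the 5 retrieved shapes
ranking = ['Bone', 'Glass', 'Bone', 'Bone', 'Key']
for precision, recall in pr_curve(ranking, 'Bone', n_relevant=3):
    print(f"precision={precision:.2f}  recall={recall:.2f}")
```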

A better insight into the performance of the proposed descriptor is obtained from the confusion matrix plotted as an image in Fig. 14(b). The value in the cell of row i and column j corresponds to the number of shape images of actual category j being predicted as shape images of category i. Thus, it is easy to see whether the system is confusing two classes (i.e. commonly mislabeling one as another); a minimal construction sketch is given after the list below. In the case of the Shapes216 dataset, it is observed that:

1. The low performance on Fork is due to its high confusion with Hammer and Key: the values in the cells of column Fork and rows Hammer and Key are high. This is understandable, as the shapes of Hammer and Key are quite similar to the shapes of Fork.
2. There is no high confusion between Elephant and any other single category. However, the low performance on Elephant is due to the diffusion of the values in the column Elephant; noticeable values are in the cells of column Elephant and rows Misk, Bird, Camel, Car, Ray, and Turtle.
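The sketch below shows how such a confusion matrix can be accumulated from (actual, predicted) category pairs; the category labels and pairs are hypothetical and for illustration only, not taken from the experiments:

```python
# Illustrative sketch: accumulate a confusion matrix from
# (actual, predicted) category pairs, with rows indexed by the predicted
# category and columns by the actual category, as in Fig. 14(b).
def confusion_matrix(pairs, categories):
    index = {c: k for k, c in enumerate(categories)}
    matrix = [[0] * len(categories) for _ in categories]
    for actual, predicted in pairs:
        matrix[index[predicted]][index[actual]] += 1  # row = predicted, column = actual
    return matrix

# Hypothetical example with three categories and four classified shapes
categories = ['Fork', 'Hammer', 'Key']
pairs = [('Fork', 'Hammer'), ('Fork', 'Fork'), ('Key', 'Key'), ('Fork', 'Key')]
for row_label, row in zip(categories, confusion_matrix(pairs, categories)):
    print(row_label, row)
```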

7. Conclusions

This paper presents a novel pattern descriptor based on the Radon, Fourier, and Mellin transforms. The proposed descriptor is obtained by applying the 1D Fourier–Mellin and Fourier transforms on the radial and angular coordinates of the pattern's Radon image respectively, and has been proved to be invariant to rotation, scaling, and translation, without the need of any normalization step. The proposed descriptor's computation is reasonably fast and correct, based mainly on the fusion of the Radon and Fourier transforms and on a modification of the Mellin transform. It is proved to be robust to additive noise both theoretically and experimentally. Experimental results show that, when compared to commonly used pattern descriptors, the proposed RFM descriptor has comparable performance on an occlusion/deformation dataset and outperforms them on noisy datasets. Future work will investigate possible combinations of the RFM descriptor with local descriptors. In our perspective, due to the orthogonality between these two types of descriptors, the combination may potentially boost the performance of pattern recognition and image retrieval systems.

Acknowledgments

The authors would like to thank the anonymous reviewers for their valuable and insightful comments and suggestions that helped to improve the quality of the paper.
References

[1] J. Wood, Invariant pattern recognition: a review, Pattern Recognition 29 (1) (1996) 1–17.
[2] D. Zhang, G. Lu, Review of shape representation and description techniques, Pattern Recognition 37 (1) (2004) 1–19.
[3] K.B. Howell, Fourier transforms, in: A.D. Poularikas (Ed.), The Transforms and Applications Handbook, CRC Press & IEEE Press, 2000 (Chapter 2).
[4] J. Bertrand, P. Bertrand, J.P. Ovarlez, The Mellin transform, in: A.D. Poularikas (Ed.), The Transforms and Applications Handbook, CRC Press & IEEE Press, 2000 (Chapter 11).
[5] Y.-N. Hsu, H.H. Arsenault, G. April, Rotation-invariant digital pattern recognition using circular harmonic expansion, Applied Optics 21 (22) (1982) 4012–4015.
[6] D. Zhang, G. Lu, Shape-based image retrieval using generic Fourier descriptor, Signal Processing: Image Communication 17 (10) (2002) 825–848.
[7] M.R. Teague, Image analysis via the general theory of moments, Journal of the Optical Society of America 70 (8) (1980) 920–930.
[8] J. Flusser, T. Suk, B. Zitova, Moments and Moment Invariants in Pattern Recognition, John Wiley & Sons, 2009.
[9] H. Hjouj, D.W. Kammler, Identification of reflected, scaled, translated, and rotated objects from their Radon projections, IEEE Transactions on Image Processing 17 (3) (2008) 301–310.
[10] S. Tabbone, L. Wendling, J.-P. Salmon, A new shape descriptor defined on the Radon transform, Computer Vision and Image Understanding 102 (1) (2006) 42–51.
[11] X. Wang, B. Xiao, J.-F. Ma, X.-L. Bi, Scaling and rotation invariant analysis approach to object recognition based on Radon and Fourier–Mellin transforms, Pattern Recognition 40 (12) (2007) 3503–3508.
[12] S.-S. Xiao, Y.-X. Wu, Rotation-invariant texture analysis using Radon and Fourier transforms, Journal of Physics: Conference Series 48 (1) (2007) 1459.
[13] G. Chen, T.D. Bui, A. Krzyzak, Invariant pattern recognition using Radon, dual-tree complex wavelet and Fourier transforms, Pattern Recognition 42 (9) (2009) 2013–2019.
[14] X. Wang, F. xia Guo, B. Xiao, J.-F. Ma, Rotation invariant analysis and orientation estimation method for texture classification based on Radon transform and correlation analysis, Journal of Visual Communication and Image Representation 21 (1) (2010) 29–32.
[15] R.O. Duda, P.E. Hart, Use of the Hough transformation to detect lines and curves in pictures, Communications of the ACM 15 (1) (1972) 11–15.
[16] V.F. Leavers, J.F. Boyce, The Radon transform and its application to shape parametrization in machine vision, Image and Vision Computing 5 (2) (1987) 161–166.
[17] V.F. Leavers, Use of the Radon transform as a method of extracting information about shape in two dimensions, Image and Vision Computing 10 (2) (1992) 99–107.
[18] V.F. Leavers, Use of the two-dimensional Radon transform to generate a taxonomy of shape for the characterization of abrasive powder particles, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (12) (2000) 1411–1423.
[19] Y.W. Chen, Y.Q. Chen, Invariant description and retrieval of planar shapes using Radon composite features, IEEE Transactions on Signal Processing 56 (10) (2008) 4762–4771.
[20] A. Kadyrov, M. Petrou, The Trace transform and its applications, IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (8) (2001) 811–828.
[21] M. Petrou, A. Kadyrov, Affine invariant features from the Trace transform, IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (1) (2004) 30–44.
[22] T.V. Hoang, S. Tabbone, A geometric invariant shape descriptor based on the Radon, Fourier, and Mellin transforms, in: Proceedings of the ICPR, 2010, pp. 2085–2088.
[23] K. Mikolajczyk, C. Schmid, Scale & affine invariant interest point detectors, International Journal of Computer Vision 60 (1) (2004) 63–86.
[24] G.-H. Liu, L. Zhang, Y.-K. Hou, Z.-Y. Li, J.-Y. Yang, Image retrieval based on multi-texton histogram, Pattern Recognition 43 (7) (2010) 2380–2389.
[25] S.R. Deans, The Radon Transform and Some of Its Applications, Krieger Publishing Company, 1993.
[26] D. Casasent, D. Psaltis, New optical transforms for pattern recognition, Proceedings of the IEEE 65 (1) (1977) 77–84.
[27] Y. Sheng, J. Duvernoy, Circular Fourier-radial Mellin descriptors for pattern recognition, Journal of the Optical Society of America A 3 (6) (1986) 885–888.
[28] R.A. Altes, The Fourier–Mellin transform and mammalian hearing, Journal of the Acoustical Society of America 63 (1) (1978) 174–183.
[29] P.E. Zwicke, I. Kiss, A new implementation of the Mellin transform and its application to radar classification of ships, IEEE Transactions on Pattern Analysis and Machine Intelligence 5 (2) (1983) 191–199.
[30] A. Khotanzad, Y.H. Hong, Invariant image recognition by Zernike moments, IEEE Transactions on Pattern Analysis and Machine Intelligence 12 (5) (1990) 489–497.
[31] P.C. Mahalanobis, On the generalised distance in statistics, Proceedings of the National Institute of Sciences of India 2 (1) (1936) 49–55.
[32] W.A. Götz, H.J. Druckmüller, A fast digital Radon transform – an efficient means for evaluating the Hough transform, Pattern Recognition 29 (4) (1996) 711–718.
[33] M.L. Brady, A fast discrete approximation algorithm for the Radon transform, SIAM Journal on Computing 27 (1) (1998) 107–119.
[34] A. Averbuch, R. Coifman, D. Donoho, M. Israeli, J. Walden, Fast Slant Stack: A Notion of Radon Transform for Data on A Cartesian Grid Which Is Rapidly Computable, Algebraically Exact, Geometrically Faithful, and Invertible, Technical Report, Stanford University, 2001.
[35] S. Derrode, F. Ghorbel, Robust and efficient Fourier–Mellin transform approximations for gray-level image reconstruction and complete invariant description, Computer Vision and Image Understanding 83 (1) (2001) 57–78.
[36] A. De Sena, D. Rocchesso, A fast Mellin and scale transform, EURASIP Journal on Advances in Signal Processing 100 (2007).
[37] G. Robbins, T. Huang, Inverse filtering for linear shift-variant imaging systems, Proceedings of the IEEE 60 (7) (1972) 862–872.
[38] L.H. Johnson, The Shift and Scale Invariant Fourier–Mellin Transform for Radar Applications, Technical Report, Massachusetts Institute of Technology, 1980.
[39] K. Jafari-Khouzani, H. Soltanian-Zadeh, Rotation-invariant multiresolution texture analysis using Radon and wavelet transforms, IEEE Transactions on Image Processing 14 (6) (2005) 783–795.
[40] S.A. Nene, S.K. Nayar, H. Murase, Columbia Object Image Library (COIL-20), CUCS-005-96, Technical Report, Department of Computer Science, Columbia University, 1996.
[41] D.S. Doermann, E. Rivlin, I. Weiss, Applying algebraic and differential invariants for logo recognition, Machine Vision and Applications 9 (2) (1996) 73–86.
[42] T.B. Sebastian, P.N. Klein, B.B. Kimia, Recognition of shapes by editing their shock graphs, IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (5) (2004) 550–571.
[43] M. Bober, F. Preteux, W.-Y.Y. Kim, Shape descriptors, in: B.S. Manjunath, P. Salembier, T. Sikora (Eds.), Introduction to MPEG 7: Multimedia Content Description Language, John Wiley & Sons, 2002, pp. 231–260.

Thai V. Hoang received his B.Eng. degree from Hanoi University of Science and Technology, Vietnam, in 2003 and his M.Sc. degree from Korea Advanced Institute of Science and Technology, Republic of Korea, in 2007, both in Electrical Engineering. Since October 2008, he has been receiving the CNRS's BDI-PED doctoral fellowship and working towards his Ph.D. degree in Computer Science at the LORIA-INRIA Nancy research center in France. His research interests include invariant pattern representation and sparse representation.

Salvatore Tabbone is a professor in Computer Science at the University of Nancy 2 (France). Since 2007, he has been the scientific leader of the QGAR research team at the LORIA research center. His research topics include image and document indexing, content-based image retrieval, and pattern recognition. He is the (co-)author of more than 100 articles in refereed journals and conferences. He serves as a PC member for numerous national and international conferences.
