Learning Mahalanobis Distance For DTW Based Online Signature Verification


Article · June 2011


DOI: 10.1109/ICINFA.2011.5949012


All content following this page was uploaded by Yu Qiao on 22 July 2015.

Learning Mahalanobis Distance for DTW based
Online Signature Verification
Yu Qiao1,2 , Xingxing Wang1 , and Chunjing Xu1,2
1. Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
2. The Chinese University of Hong Kong, Hong Kong, China
{yu.qiao, xx.wang, cj.xu}@siat.ac.cn

Abstract: A signature, a form of handwritten depiction, has been and is still widely used as proof of the writer's identity or intent in human society. An online signature represents the dynamic process of handwriting as a sequence of feature vectors over time. Dynamic time warping (DTW) has been widely adopted to compare sequence data. A basic problem in using DTW for signature verification is how to measure the difference between feature vectors. Most previous studies used the Euclidean distance (ED) for this purpose. However, ED treats each feature equally and cannot take into account the correlations between features. To overcome this problem, this paper proposes the Mahalanobis distance (MD) for signature verification. One key question is how to estimate the covariance matrix used in the MD calculation. We formulate this problem in a learning framework and introduce two criteria for estimating the matrix. The first criterion aims at minimizing the signature difference for the same writer, while the second criterion tries to maximize the signature difference between different writers while minimizing the within-writer signature difference. We carried out experiments on the MCYT biometric database. The experimental results show that the proposed MD-based method achieves better results than the ED-based method.

Index Terms: Mahalanobis distance, Signature verification, Dynamic time warping, Sequence feature.

I. INTRODUCTION

Signatures, handwritten depictions of names, provide a natural way to recognize and identify persons. In human society, signatures have been and are still widely used to give evidence of the provenance of a document or the intention of an individual with regard to that document. Automatic signature verification techniques use computer algorithms to identify the writer or to verify whether a signature is genuine or a forgery. Compared with other biometric methods, such as face, iris and speech, signature verification is already widely accepted in daily life and is protected by law.

There are two types of signatures [1]. The first type is called the offline or static signature, where the signature is scanned and digitized into a computer as a two-dimensional image. The second type is called the online or dynamic signature, which is captured by a digital pen or tablet device. An offline signature can be seen as the result of the signing action, while an online signature can be seen as a record of the dynamic signing process. Generally, online signature verification performs better than offline verification due to the availability of dynamic information. There is research that aims at recovering dynamic information from static handwritten images, but this is still regarded as a hard problem [2], [3]. Recently, due to the popularity of tablets and touch devices, it has become easier and more convenient to record online handwriting data with smart phones, tablet PCs, etc. Partially for this reason, online signature verification currently attracts considerable research interest [4], [5], [6], [7], [8]. Competitions on signature verification were organized in 2004 and 2009 [4], [5].

This paper deals with online signature verification. An online signature can be seen as a sequence of feature vectors sampled over time. Dynamic time warping (DTW) can compare sequences with distortion and is widely used in online signature verification [4], [5], [6], [7], [8]. In DTW, we need a distance measure between feature vectors. The Euclidean distance has been widely used in previous research [6], [7], [8]. However, in the Euclidean distance, each dimension of the feature vector is treated equally and the correlations between different dimensions are ignored. To address this problem, this paper proposes the Mahalanobis distance as a measure of the difference between feature vectors. The key problem in using the Mahalanobis distance is how to estimate the covariance matrix. In this paper, we propose two methods to estimate the covariance from a set of training signatures. The first method aims at minimizing the difference between the signatures of the same writer. The second method tries to maximize the signature difference among different writers while minimizing the within-writer signature difference at the same time. We carried out experiments on the MCYT online signature database [9]. The experimental results show that the Mahalanobis distance based measure achieves better performance than the Euclidean distance based one. We also tested the methods on different feature sets for signature verification.

The remainder of this manuscript is organized as follows. Our signature verification system is given in Section 2. Section 3 shows how to learn the Mahalanobis distance for signature verification. Experimental results are described in Section 4. We conclude the paper in Section 5.
[Figure 1 plots: five stacked panels showing X, Y, Pressure, Azimuth and Elevation against the sample index (0 to 100).]

Fig. 1. An example of signature and its features.

II. OVERVIEW OF ONLINE SIGNATURE VERIFICATION SYSTEM

In a signature verification system, we have a set of registered signatures given by a certain writer. Then, for a test signature, our objective is to examine whether it was written by the same person or not. If it was written by the same person, the signature is called genuine; if not, it is called a forgery. In signature verification, we need to measure the difference between two signatures. The difficulty is that the two signatures may have different lengths, and there may be distortion between them. One of the popular methods to deal with this problem is dynamic time warping. DTW aligns two sequences by minimizing the distortion distance between them. The minimum can be found efficiently by dynamic programming (DP). In the following, we first describe dynamic time warping and then discuss how to use it for signature verification.

A. Features of online signature

An online signature is a sequence of feature vectors recorded over time by a tablet or digital pen device. With a tablet, features such as the positions $x$, $y$, pressure $p$, azimuth angle $\alpha$ and inclination angle $\beta$ can be recorded at each sampling time $t$. An example of these features for a signature is shown in Fig. 1. In this paper, we make use of the first three features $[x, y, p]$ due to their good performance in previous research.

As preprocessing, we subtract the position of the start point from $x$ and $y$. We further calculate the derivatives of the positions (velocity) and of the pressure, denoted by $v_x$, $v_y$ and $\Delta p$. We use $\mathbf{x} = [x, y, v_x, v_y, p, \Delta p]$ as a 6-dimensional feature vector. These features play different roles and have different importance in signature verification. The positions $x$, $y$ determine the shape of the signature, but they are comparatively easy to imitate. On the other hand, the velocity and pressure features $v_x$, $v_y$, $p$, $\Delta p$ are difficult to imitate, but they always vary among the signatures of the same signer.

The above feature vector describes local feature information at a certain time point. There are also global features which can be useful for signature verification. In [10], the authors compared a set of 91 global features for the task of online signature verification. Due to space limitations, the use of global features is outside the scope of this paper, but the combination of local and global features can improve the performance [11].

B. Dynamic time warping

Dynamic time warping, also known as dynamic programming matching, was first proposed by Sakoe and Chiba [12] for speech recognition. Suppose we have two sequences, denoted by $X = [x_1, x_2, ..., x_N]$ and $Y = [y_1, y_2, ..., y_M]$, where $N$ and $M$ are the lengths of the two sequences and $x_i$, $y_j$ denote feature vectors. Note that there may be distortion between $X$ and $Y$. DTW aims to find an optimal matching path between $X$ and $Y$ which minimizes the difference between them. We introduce two warping functions $w_x(k)$ and $w_y(k)$ to represent the matching path, i.e., $x_{w_x(k)}$ is matched to $y_{w_y(k)}$. Let $K$ represent the length of the warping path. The warping functions are monotonically non-decreasing integer functions and should satisfy the following start and end constraints,

$$ w_x(1) = 1, \quad w_y(1) = 1, \quad w_x(K) = N, \quad w_y(K) = M. \tag{1} $$

There also exist other constraints on the warping functions, for example that $w_x(i) - w_x(i-1)$ and $w_y(i) - w_y(i-1)$ should be smaller than a certain value. One may refer to [13] for more details.
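As an illustration of the local feature vector $[x, y, v_x, v_y, p, \Delta p]$ described in Section II-A, the following is a minimal NumPy sketch. The paper does not state how the derivatives are computed; the use of `numpy.gradient` with a unit sampling interval is an assumption made here, and the function name `local_features` is hypothetical.

```python
import numpy as np


def local_features(x, y, p):
    """Build an (n, 6) array with columns [x, y, vx, vy, p, dp].

    x, y, p are 1-D sequences of pen positions and pressure sampled
    at equally spaced time points (at least two samples).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)

    # Preprocessing from the text: subtract the start point from the positions.
    x = x - x[0]
    y = y - y[0]

    # Assumption: derivatives are approximated by finite differences
    # (central inside the sequence, one-sided at both ends).
    vx = np.gradient(x)
    vy = np.gradient(y)
    dp = np.gradient(p)

    return np.column_stack([x, y, vx, vy, p, dp])
```

Each row of the returned array is one feature vector; a signature of $n$ samples therefore becomes an $n \times 6$ sequence that can be passed to a DTW routine.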

Fig. 2. An example of signature and its features. In the left figure, red and black lines denote two signatures, blue lines show the corresponding relation
between their points. The right figure shows the DTW path between two signatures.
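The kind of alignment illustrated in Fig. 2 can be computed with a short dynamic-programming routine. The sketch below is one possible implementation of the recursion in Eq. 4 with the unit step constraint of Eq. 2; it is not the authors' code. The function names `dtw` and `sq_euclidean` are hypothetical, and the tie-breaking rule used when backtracking (diagonal step preferred) is an assumption.

```python
import numpy as np


def sq_euclidean(a, b):
    """Squared Euclidean distance between two feature vectors (Eq. 8)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(diff @ diff)


def dtw(X, Y, dist=sq_euclidean):
    """Return (cost, path) for sequences X (N vectors) and Y (M vectors).

    path is a list of 1-based index pairs (w_x(k), w_y(k)), starting at
    (1, 1) and ending at (N, M).
    """
    N, M = len(X), len(Y)
    # D[i, j] holds DTW(i, j); row 0 and column 0 are an infinite border.
    D = np.full((N + 1, M + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, N + 1):
        for j in range(1, M + 1):
            best_prev = min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
            D[i, j] = best_prev + dist(X[i - 1], Y[j - 1])

    # Backtrack from (N, M) to (1, 1) along the cheapest predecessors.
    path = []
    i, j = N, M
    while i > 0 and j > 0:
        path.append((i, j))
        candidates = [
            (D[i - 1, j - 1], i - 1, j - 1),  # diagonal step, preferred on ties
            (D[i - 1, j], i - 1, j),
            (D[i, j - 1], i, j - 1),
        ]
        _, i, j = min(candidates, key=lambda c: c[0])
    path.reverse()
    return float(D[N, M]), path
```

Passing a different `dist` callable, for example a Mahalanobis distance, changes the local cost without touching the recursion. The double loop costs $O(NM)$ distance evaluations.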

In this paper, we assume that

$$ w_x(i) - w_x(i-1) \le 1, \qquad w_y(i) - w_y(i-1) \le 1. \tag{2} $$

Suppose we have a distance function $d(x, y)$ to measure the difference between $x$ and $y$. The objective of DTW is defined as

$$ DTW(X, Y) = \min_{w_x, w_y} \sum_{k=1}^{K} d(x_{w_x(k)}, y_{w_y(k)}). \tag{3} $$

The above objective function can be optimized by dynamic programming. For simplicity, let $DTW(i, j)$ denote the DTW matching cost between the two subsequences $[x_1, x_2, ..., x_i]$ and $[y_1, y_2, ..., y_j]$, and let $d(i, j) = d(x_i, y_j)$. We can iteratively optimize $DTW(i, j)$ by

$$ DTW(i, j) = \min \begin{cases} DTW(i-1, j) + d(i, j), \\ DTW(i, j-1) + d(i, j), \\ DTW(i-1, j-1) + d(i, j). \end{cases} \tag{4} $$

An example of DTW comparison of two signatures is shown in Fig. 2.

C. Verification

In signature verification tasks, there usually exists more than one registered signature. In our experiments, this number is set to five, since it has been shown that five signatures can give relatively good performance [9], [8], [7]. For a test signature $Y$, we need to estimate a verification score that measures how close this signature is to the five registered signatures $\{X_1, X_2, ..., X_5\}$. Among the five signatures, we select one signature as the template $X_t$, namely the one which has the minimum distance to the others among the five registered ones. We calculate the average DTW distance between the template signature and the other registered signatures among the five,

$$ \bar{d}_{DTW}(\{X_i\}) = \frac{1}{4} \sum_{X_i\,(i \ne t)} DTW(X_i, X_t). \tag{5} $$

We also use DTW to calculate the five distances between the test signature and each of the five registered signatures, and take the average of these five distances,

$$ d(Y, \{X_i\}) = \frac{1}{5} \sum_{i} DTW(X_i, Y). \tag{6} $$

Then we take

$$ \frac{d(Y, \{X_i\})}{\bar{d}_{DTW}(\{X_i\})} \tag{7} $$

as the verification score. Besides average distances, maximum and minimum distances can also be combined to calculate verification scores. More discussion on this can be found in [7].

III. LEARNING MAHALANOBIS DISTANCE

To use DTW, we need a distance function $d(x, y)$ to estimate the difference between two feature vectors $x$, $y$. Perhaps the simplest way is to use the Euclidean distance (ED),

$$ d(x, y) = \|x - y\|^2. \tag{8} $$

ED is simple and has been adopted in many previous works [6], [7], [8]. However, in the Euclidean distance, each dimension of the feature vectors is treated equally and the correlations between the features are ignored. In our problem, the feature vectors are composed of positions, velocity and pressure, which come from different domains and have different measurement units. Moreover, these features are not independent of each other and can be correlated. For example, pressure is related to velocity. To overcome this problem, we use the Mahalanobis distance in place of the Euclidean distance,

$$ d(x, y) = (x - y)^T \Sigma^{-1} (x - y), \tag{9} $$

where $\Sigma$ denotes a full-rank covariance matrix. Note that the Euclidean distance can be seen as a special case of MD
when the identity matrix $I$ is taken as the covariance matrix $\Sigma$. If $\Sigma$ is diagonal, MD equals a weighted sum of the squared feature differences,

$$ d(x, y) = (x - y)^T \Sigma^{-1} (x - y) = \sum_{j=1}^{d} w_j (x^j - y^j)^2, \tag{10} $$

where $d$ is the dimensionality of $x$, the weights $[w_1, w_2, ..., w_d]$ denote the diagonal of $\Sigma^{-1}$, and $x^j$ denotes the $j$-th dimension of $x$.

If $\Sigma$ is not diagonal, we can apply the eigen-decomposition $\Sigma = U^T \Lambda U$, where $U$ consists of the eigenvectors and $\Lambda$ is a diagonal matrix whose diagonal components are the eigenvalues. Then MD can be written as ED with transformed features:

$$ d(x, y) = (x - y)^T \Sigma^{-1} (x - y) = \|Ax - Ay\|^2, \tag{11} $$

where the transformation matrix is $A = \Lambda^{-1/2} U$. It is easy to verify that $A^T A = \Sigma^{-1}$. This formulation speeds up the DTW calculation. In DTW, we need to estimate the distance between each pair of feature vectors $(x, y)$ from the sequences $X$ and $Y$. With Eq. 9, we need a vector-matrix multiplication in each MD calculation. With Eq. 11, we can first transform the feature vectors $x$, $y$ into $Ax$, $Ay$. The MD calculation then reduces to an ED calculation without vector-matrix multiplication.

To use the Mahalanobis distance (Eq. 9), the basic question is how to estimate the matrix $\Sigma$. In classical MD, $\Sigma$ is estimated as the covariance matrix of the whole signature sequence,

$$ \Sigma = \frac{1}{n} \sum_{i=1}^{n} (x_i - m)(x_i - m)^T, \tag{12} $$

where the mean is $m = \sum_{i=1}^{n} x_i / n$. However, Eq. 12 only considers the statistical characteristics of the whole data. We are more interested in a distance metric which is small for signatures of the same writer while remaining large for signatures from different writers.

In the following, we study this problem (the estimation of the matrix $\Sigma$) in a learning framework. Specifically, we develop two criteria which minimize the feature variance for the same writer and (or) maximize the feature difference between different writers. We assume $|\Sigma| = 1$ to avoid scaling factors.

A. Criterion 1: minimization of within variance

Assume we have a set of genuine signatures $\mathcal{X} = \{X_1, X_2, ..., X_Q\}$ of the registered writer for training. We apply dynamic time warping with ED to each pair of signatures $X$, $Y$ in $\mathcal{X}$ and obtain the warping paths $w_x(k)$, $w_y(k)$. The first criterion is to find the matrix $\Sigma$ which minimizes the sum of the DTW distances over every pair of signatures. Mathematically, this can be formulated as

$$ \min_{\Sigma} MSW(\mathcal{X}, \Sigma) = \min_{\Sigma} \sum_{X, Y \in \mathcal{X}} DTW(X, Y). \tag{13} $$

With the warping paths fixed, we can rewrite the sum as

$$ \sum_{X, Y \in \mathcal{X}} DTW(X, Y) = \sum_{X, Y \in \mathcal{X}} \sum_{k} (x_{w_x(k)} - y_{w_y(k)})^T \Sigma^{-1} (x_{w_x(k)} - y_{w_y(k)}). \tag{14} $$

Define the within-writer variance matrix for $\mathcal{X}$ as

$$ S_w = \sum_{X, Y \in \mathcal{X}} \sum_{k} (x_{w_x(k)} - y_{w_y(k)})(x_{w_x(k)} - y_{w_y(k)})^T. \tag{15} $$

Note that $S_w$ has dimensionality $d \times d$.

In the following, we derive the optimal solution of Eq. 13. Recall that $A^T A = \Sigma^{-1}$. Since $(x - y)^T A^T A (x - y) = \mathrm{Tr}(A (x - y)(x - y)^T A^T)$, the objective of Eq. 13 can be written as

$$ MSW(\mathcal{X}, \Sigma) = \mathrm{Tr}(A S_w A^T), \tag{16} $$

where "Tr" denotes the trace of a matrix. Since $|\Sigma| = 1$ implies $|A^T A| = 1$, the Lagrangian function of Eq. 13 is

$$ L(A, \lambda) = \mathrm{Tr}(A S_w A^T) + \lambda (|A^T A| - 1). \tag{17} $$

Taking the derivative of Eq. 17 with respect to $A$, we have

$$ \frac{\partial L(A, \lambda)}{\partial A} = \frac{\partial\, \mathrm{Tr}(A S_w A^T)}{\partial A} + \frac{\partial\, \lambda (|A^T A| - 1)}{\partial A} = 2 A S_w + 2 \lambda |A^T A| A^{-T} = 0. \tag{18} $$

Multiplying Eq. 18 on the left by $A^T$ and using $|A^T A| = 1$ gives $A^T A S_w = -\lambda I$, so $\Sigma = (A^T A)^{-1}$ is proportional to $S_w$, and the constraint $|\Sigma| = 1$ fixes the scale. The optimal covariance matrix of Eq. 13 is therefore

$$ \Sigma_{MSV} = \frac{1}{|S_w|^{1/d}} S_w. \tag{19} $$

B. Criterion 2: maximization of discriminant variance

In this section, a set of genuine signatures $\mathcal{X}$ and a set of forged signatures $\mathcal{Z} = \{Z_1, Z_2, ..., Z_R\}$ are used for training. The second criterion tries to minimize the signature difference within the same writer while maximizing the signature difference between different writers. Formally,

$$ \max_{\Sigma} MSB(\mathcal{X}, \mathcal{Z}, \Sigma) = \max_{\Sigma} \sum_{X \in \mathcal{X}, Z \in \mathcal{Z}} DTW(X, Z), \tag{20} $$

$$ \min_{\Sigma} MSW(\mathcal{X}, \Sigma) = \min_{\Sigma} \sum_{X, Y \in \mathcal{X}} DTW(X, Y). \tag{21} $$

Define the between-writer signature variance matrix of $\mathcal{Z}$ as

$$ S_b = \sum_{X \in \mathcal{X}, Z \in \mathcal{Z}} \sum_{k} (x_{w_x(k)} - z_{w_z(k)})(x_{w_x(k)} - z_{w_z(k)})^T, \tag{22} $$

where $w_x(k)$ and $w_z(k)$ are the DTW matching paths between $X$ and $Z$.
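The MSV estimate of Section III-A (Eqs. 15 and 19) and the transformation matrix of Eq. 11 can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation. The function names are hypothetical, the warping paths are assumed to be given as lists of 1-based index pairs, and $S_w$ is assumed to be positive definite, which requires enough aligned pairs relative to the dimensionality $d$.

```python
import numpy as np


def within_scatter(aligned_pairs):
    """Eq. 15: sum of outer products of DTW-aligned feature differences.

    aligned_pairs is an iterable of (X, Y, path), where X and Y are
    (length, d) arrays and path is a list of 1-based (w_x(k), w_y(k)).
    """
    S = None
    for X, Y, path in aligned_pairs:
        idx = np.asarray(path) - 1                 # convert to 0-based indices
        diff = X[idx[:, 0]] - Y[idx[:, 1]]         # (K, d) aligned differences
        S_pair = diff.T @ diff                     # sum_k diff_k diff_k^T
        S = S_pair if S is None else S + S_pair
    if S is None:
        raise ValueError("at least one aligned pair is required")
    return S


def msv_covariance(S_w):
    """Eq. 19: scale S_w so that the result has unit determinant."""
    d = S_w.shape[0]
    sign, logdet = np.linalg.slogdet(S_w)
    if sign <= 0:
        raise ValueError("S_w must be positive definite")
    return S_w / np.exp(logdet / d)


def whitening_transform(Sigma):
    """Eq. 11: A = Lambda^{-1/2} U with Sigma = U^T Lambda U, so A^T A = Sigma^{-1}."""
    eigvals, eigvecs = np.linalg.eigh(Sigma)       # Sigma = V diag(eigvals) V^T
    # The paper's U corresponds to V^T (eigenvectors as rows).
    return np.diag(eigvals ** -0.5) @ eigvecs.T
```

After computing `A = whitening_transform(msv_covariance(S_w))`, every feature vector can be transformed once by `A`, and plain squared Euclidean distance on the transformed vectors equals the Mahalanobis distance of Eq. 9.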
Then, Eqs. 20 and 21 can be reduced to

$$ \min_{\Sigma} \mathrm{Tr}(A S_w A^T), \tag{23} $$

$$ \max_{\Sigma} \mathrm{Tr}(A S_b A^T). \tag{24} $$

This is a multi-objective problem. We need to convert it into a single-objective one. Basically, there are two choices. One is based on the subtraction of traces,

$$ \min_{\Sigma} \{ \mathrm{Tr}(A S_w A^T) - \alpha\, \mathrm{Tr}(A S_b A^T) \}, \tag{25} $$

where $\alpha$ is a coefficient; the other is based on the ratio of traces,¹

$$ \max_{\Sigma} \frac{\mathrm{Tr}(A S_b A^T)}{\mathrm{Tr}(A S_w A^T)}. \tag{26} $$

¹ Readers may suggest using the trace ratio $\max_{\Sigma} \mathrm{Tr}\big((A S_w A^T)^{-1} A S_b A^T\big)$ as a criterion, which is widely adopted in linear discriminant analysis (LDA). However, it can be proved that this trace ratio is invariant to the transformation $A$ on $\Sigma$.

Eq. 25 can be optimized by using the same technique as for MSV,

$$ \Sigma_{MDV\text{-}ST} = \frac{S_w - \alpha S_b}{|S_w - \alpha S_b|^{1/d}}. \tag{27} $$

However, there is no closed-form solution for Eq. 26. An approximate answer for Eq. 26 was shown in [14] as

$$ \Sigma_{MDV\text{-}RT} = \frac{S_b^{-1} S_w S_b^{-1}}{|S_b^{-1} S_w S_b^{-1}|^{1/d}}. \tag{28} $$

IV. EXPERIMENTS

We used the MCYT signature database [9] to evaluate the performance of the proposed methods. The MCYT database includes signatures of 100 signers. For each signer, there are 25 genuine signatures and 25 skilled forgeries. Forgers are asked to imitate the shape with natural dynamics of the signature images to be forged. The MCYT database is divided into training and testing sets. The training set includes 5 genuine signatures, and the other genuine signatures and 25 forgeries are used for testing. We conducted two experiments for evaluation. In the first experiment, we compared the Mahalanobis distance (MD) with the Euclidean distance (ED) for signature verification. In the second experiment, we tested different feature sets when using the Mahalanobis distance for signature verification.

A. MD vs. ED

There are two methods to estimate the covariance matrices for MD. The first method is to minimize the within-writer signature variance (MSV). The second method aims to maximize the between-writer signature variance and to minimize the within-writer signature variance. We call the second method maximization of discriminative variance (MDV).

In MSV, we used five genuine signatures to estimate a covariance matrix of MD for each signer by using Eq. 19. In MDV, we used five genuine signatures and two forgeries to estimate a covariance matrix for each signer. Note that the training signatures are not used in the testing phase. There are two methods of MDV estimation, namely MDV-ST by Eq. 27 and MDV-RT by Eq. 28. In MDV-ST, $\alpha$ is set to 0.1 and 0.01.

We made comparative signature verification experiments among the four distance methods: Euclidean distance (ED), and Mahalanobis distance with MSV, MDV-RT and MDV-ST. All six features $x, y, v_x, v_y, p, \Delta p$ are used in this experiment. For each method, we calculated the false acceptance rates and false rejection rates using user-independent thresholds. The ROC curves of these methods are depicted in Fig. 3. The equal error rates (EERs) of each method are shown in Fig. 4. In this experiment, the MD-based methods always perform better than the ED-based method. MDV-RT with α = 0.01 achieved the best performance among all the methods compared.

[Figure 3 plot: MCYT DB, 100 signers; False Rejection Rate (%) against False Acceptance Rate (%), both axes from 0 to 10; curves for ED, MSV, MDV-RT, MDV-ST α = 0.01 and MDV-ST α = 0.1.]

Fig. 3. ROC curves of ED, MSV, MDV-RT and MDV-ST.

[Figure 4 plot: bar chart of EER (%) for ED, MSV, MDV-RT, MDV-ST α = 0.01 and MDV-ST α = 0.1; the value labels in the plot are 5.23, 4.28, 4.09, 3.93 and 3.26 (the assignment of values to bars is not recoverable here).]

Fig. 4. EERs of ED, MSV, MDV-RT and MDV-ST.

We conducted experiments to estimate a common covariance matrix for all signers. The error rates of this method are
a bit higher than those of the signer-dependent covariance matrices. We also tried diagonal covariance matrices. The results are a bit worse.

B. Feature set

In total, six types of features are used in our experiments. In practice, different features play different roles in signature verification, and it has been shown in previous research that using a subset of these features can give better performance than using all of them. In the second experiment, we compare different feature subsets for signature verification. The ROC curves and EERs of four different sets are shown in Fig. 5 and Fig. 6, respectively. We also tried other feature sets, but these four sets have relatively good performance. The feature set $y, v_x, v_y, p$ has the best performance, while the results of the other three sets are very close.

[Figure 5 plot: MCYT DB, 100 signers; False Rejection Rate (%) against False Acceptance Rate (%), both axes from 0 to 10; curves for the feature sets "x y dx dy p dp", "y dx dy p dp", "y dx dy" and "y dx dy p".]

Fig. 5. ROC curves of different features sets.

[Figure 6 plot: bar chart of EER (%) for the feature sets "x y dx dy p dp", "y dx dy p dp", "y dx dy" and "y dx dy p"; the value labels in the plot are 3.93, 3.94, 3.73 and 3.78 (the assignment of values to bars is not recoverable here).]

Fig. 6. EERs of different features sets.

V. CONCLUSIONS

Online signature verification is an important biometric method and has wide applications. Dynamic time warping (DTW) has been widely used in online signature verification due to its efficiency and effectiveness. This paper introduced the Mahalanobis distance (MD) to replace the Euclidean distance (ED) in order to improve the performance of DTW-based signature verification. Compared with ED, MD can take into account the correlations among different features and can put weights on different features. We developed two criteria for estimating the covariance matrices in the MD calculation. The objectives are to maximize the between-signer feature variance and (or) minimize the within-signer feature variance. Our experiments on the MCYT database show that the MD-based method can achieve higher recognition rates than the ED-based method.

VI. ACKNOWLEDGMENT

This work is partially supported by the National Natural Science Foundation of China (61002042, 61005011) and the Shenzhen Basic Research Program for Distinguished Young Scholar (JC201005270350A).

REFERENCES

[1] R. Plamondon and G. Lorette, "Automatic signature verification and writer identification–the state of the art," Pattern Recognition, vol. 22, no. 2, pp. 107–131, 1989.
[2] Y. Qiao and M. Yasuhara, "Recovering dynamic information from static handwritten images," IWFHR, pp. 1550–5235, 2004.
[3] Y. Qiao, M. Nishiara, and M. Yasuhara, "A framework toward restoration of writing order from single-stroked handwriting image," IEEE Trans. on PAMI, pp. 1724–1737, 2006.
[4] D.Y. Yeung, H. Chang, Y. Xiong, S. George, R. Kashi, T. Matsumoto, and G. Rigoll, "SVC2004: First international signature verification competition," Biometric Authentication, pp. 1–30, 2004.
[5] V.L. Blankers, C.E. van den Heuvel, K.Y. Franke, and L.G. Vuurpijl, "ICDAR 2009 Signature Verification Competition," in ICDAR, 2009, pp. 1403–1407.
[6] A.K. Jain, F.D. Griess, and S.D. Connell, "On-line signature verification," Pattern Recognition, vol. 35, no. 12, pp. 2963–2972, 2002.
[7] A. Kholmatov and B. Yanikoglu, "Identity authentication using improved online signature verification method," Pattern Recognition Letters, vol. 26, no. 15, pp. 2400–2408, 2005.
[8] M. Faundez-Zanuy, "On-line signature recognition based on VQ-DTW," Pattern Recognition, vol. 40, no. 3, pp. 981–992, 2007.
[9] J. Ortega-Garcia, J. Fierrez-Aguilar, D. Simon, J. Gonzalez, M. Faundez-Zanuy, V. Espinosa, A. Satue, I. Hernaez, J.J. Igarza, C. Vivaracho, et al., "MCYT baseline corpus: a bimodal biometric database," in IEE Proceedings Vision, Image and Signal Processing, 2003, vol. 150, pp. 395–401.
[10] L.L. Lee, T. Berger, and E. Aviczer, "Reliable online human signature verification systems," IEEE Trans. on PAMI, vol. 18, no. 6, pp. 643–647, 1996.
[11] J. Fierrez-Aguilar, L. Nanni, J. Lopez-Penalba, J. Ortega-Garcia, and D. Maltoni, "An on-line signature verification system based on fusion of local and global information," in Audio- and Video-Based Biometric Person Authentication. Springer, 2005, pp. 523–532.
[12] H. Sakoe and S. Chiba, "Dynamic programming algorithm optimization for spoken word recognition," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 26, no. 1, pp. 43–49, 1978.
[13] L. Rabiner and B.H. Juang, "Fundamentals of speech recognition," Prentice Hall, 1993.
[14] T. Hastie and R. Tibshirani, "Discriminant adaptive nearest neighbor classification," IEEE Trans. on PAMI, vol. 18, no. 6, pp. 607–616, 1996.
