Expert System for Speaker Identification using Lip Features with PCA


Anuj Mehra Mahender Kumawat Rajiv Ranjan
anujmehra87@gmail.com mahender_iiitm@yahoo.co.in 009.rajiv@gmail.com
Bipul Pandey Sushil Ranjan Anupam Shukla Ritu Tiwari
pandeybipul@gmail.com sushilranjan007@gmail.com dranupamshukla@gmail.com rt_twr@yahoo.co.in
Department of Information and Communications Technology,
ABV-Indian Institute of Information Technology and Management, Gwalior, India

Abstract—Biometric authentication techniques based on the lips, face, and eyes are more reliable and efficient than conventional authentication techniques such as passwords, tokens, cards, and personal identification numbers. In this research paper, the emphasis is laid on speaker identification based on lip features. We present a detailed comparative analysis of speaker identification using lip features, Principal Component Analysis (PCA), and neural network classifiers. PCA is used for feature extraction from six geometric lip features: the height of the outer corners of the mouth, the width of the outer corners of the mouth, the height of the inner corners of the mouth, the width of the inner corners of the mouth, the height of the upper lip, and the height of the lower lip. These features are then used to train the network with different neural network classifiers, namely Back Propagation (BP), Radial Basis Function (RBF), and Learning Vector Quantization (LVQ). These approaches are applied to the "TULIPS1 database (Movellan, 1995)", a small audiovisual database of 12 subjects saying the first four digits in English. After detailed analysis and evaluation, a maximum accuracy of 91.07% in speaker recognition is obtained using PCA with RBF. Speaker identification has a wide range of applications in audio processing, medical data, finance, array processing, etc.

Keywords—Biometric Authentication; Lip Feature; Principal Component Analysis (PCA); Back Propagation (BP); Radial Basis Function (RBF); Learning Vector Quantization (LVQ).

I. INTRODUCTION

Biometric authentication techniques based on the lips, face, and eyes are more reliable and efficient than conventional authentication techniques and have a wide range of applications [2]. Lip features and their associated intensity information have been successfully demonstrated for speaker identification, and previous research has proposed different techniques that use lip features to authenticate a speaker. Brown et al. [3] use six geometric lip features and two inner-mouth features to extract identity-relevant information; Wark and Sridharan [4] form the grand profile vector (GPV) for an image by concatenating the profiles normal to the contour points; Luettin et al. [5] extract Active Shape Model (ASM) features and model the lip shape and the intensity profile vector along the normals to the model points using the ASM; and Matthews et al. [6] consider the intensity variation inside the outer lip contour. Previous work has also combined ASM features [7] with a Hidden Markov Model (HMM) classifier, and algorithms have been developed that automatically extract lip areas from speaker images [9]. PCA is an approach used to approximate the original data with lower-dimensional feature vectors. In this paper, we propose a comparative analysis of speaker authentication using lip features with BP, LVQ, and RBF combined with PCA, as compared to previous work which mainly uses the HMM [8]. The present work is a novel approach that applies different Artificial Neural Network (ANN) algorithms to speaker identification. The block diagram of the speaker identification process in this work is shown in Fig. 1.

Fig. 1. Block diagram for speaker identification: the "TULIPS1 database" [1] is split into training and testing datasets; features are extracted from each; the different ANNs (BP, RBF, LVQ) are applied; and the networks are simulated to calculate the accuracy (a maximum of 91.07% using PCA and RBF).

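The PCA step mentioned above, which approximates the original data with lower-dimensional feature vectors, can be sketched in a few lines of NumPy. The feature matrix below is a random stand-in for the six geometric lip features; it is not data from the paper.

```python
import numpy as np

# Random stand-in for six geometric lip features (H1, W1, H2, W2, H3, H4)
# measured on 168 training images; real values would come from TULIPS1.
rng = np.random.default_rng(0)
X = rng.normal(size=(168, 6))

# Centre the data and eigendecompose its covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)          # 6 x 6 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order

# Keep the k highest-variance principal components and project onto them.
k = 3
W = eigvecs[:, ::-1][:, :k]             # top-k principal axes as columns
Z = Xc @ W                              # reduced feature vectors, 168 x 3
print(Z.shape)                          # (168, 3)
```

The choice k = 3 is illustrative only; the paper does not state how many components were retained.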
978-1-4244-5874-5/10/$26.00 ©2010 IEEE


II. METHODOLOGY

Here, PCA is used as a preprocessing technique for the lip images. PCA produces feature vectors of reduced dimension, which are then used as input data to the proposed neural network classifiers: the Back Propagation multi-layer neural network (BP-MLNN), the Radial Basis Function network (RBF), and Learning Vector Quantization (LVQ).

A. PCA Preprocessing
PCA is a technique used for dimensionality reduction [12, 15]. The procedure is to first calculate the principal components: the unit vectors computed from the eigenvalues and eigenvectors of the covariance matrix of the original input data, followed by computation of the orthonormal vectors which form the basis of the computed data. The calculated principal components serve as the new set of axes for the input data. The principal component w_1 of the data set X is as follows:

    w_1 = \arg\max_{\|w\|=1} \mathrm{var}\{w^T x\} = \arg\max_{\|w\|=1} E\{(w^T x)^2\}    (1)

B. Back Propagation Multi-Layer Neural Network Structure
In back propagation, feed-forward multi-layer neural networks are used as a classifier for the PCA output data. The partial derivatives of the cost function with respect to the parameters of the network are determined by the back-propagated error signals [12]. The equation of error back propagation is

    \delta_j = -\sum_{i \in P_j} \frac{\partial E}{\partial net_i} \frac{\partial net_i}{\partial y_j} \frac{\partial y_j}{\partial net_j}    (2)

where the first factor represents the error of node i, the second factor is the weight from unit j to i, and the third factor is the derivative of node j's activation function.

C. Radial Basis Function Network
These networks find the input-to-output map using local approximators and require fewer training samples. Because the static Gaussian function is the nonlinearity of the hidden-layer processing elements, each element responds only to the small region of the input space where its Gaussian is centered, which makes these networks extremely fast. This type of neural network classifier gives more efficient and better results when an unsupervised rather than a supervised approach is used to find the centres of the Gaussian functions. In the network [14], an n-dimensional input feature vector is accepted by the input layer, a set of n units holding the input values x_1, x_2, ..., x_n, which feed the l hidden functions. The output of each hidden function is then multiplied by a weighting factor w(i, j), which forms the input to the output layer of the network, y(x):

    y(x) = \sum_{i=1}^{N} w_i \, \Phi(\|x - c_i\|)    (3)

where the approximating function y(x) is represented as a sum of N radial basis functions, each associated with a different centre c_i and weighted by an appropriate coefficient w_i, and ||·|| denotes the Euclidean norm on the input space.

D. Learning Vector Quantization Networks
LVQ [10] is a prototype-based supervised classification algorithm, a learning technique that uses class information to improve the quality of the classifier's decision regions by moving the Voronoi vectors [13]. The computation of the feature map may be viewed in two stages [14]: the first stage is a self-organizing feature map and the second stage is LVQ, which together solve a pattern classification problem.

III. EXPERIMENT AND RESULTS

In this research work, we study and explore the use of neural networks in speaker authentication using lip features. In the first step, the "TULIPS1 database (Movellan, 1995)" (shown in Fig. 2) is selected; it consists of 12 subjects saying the first 4 digits in English, with each digit repeated twice by every speaker. Each spoken digit constitutes 6 instances, and the size of each image is 100 by 75 pixels. In the second step, the database is split into two datasets, i.e. a training and a testing dataset. In this experiment, 7 subjects are chosen for the research work, giving 7 classes with 24 images present in each class; i.e. 168 images are selected for the training dataset and 168 images for the testing dataset. In the next step, PCA is applied to the training dataset for feature extraction, followed by the neural network classifiers (BP, RBF, and LVQ). An output matrix is obtained which consists of the reduced set of images. A target matrix is formed, PCA is likewise applied to the testing images, and the neural networks are simulated, resulting in the calculation of the accuracy (the number of training and testing images matched) in the last step.
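Equation (3) of Section II-C can be sketched directly. The Gaussian form of Φ is an assumption consistent with the "static Gaussian function" described there, and the centres, weights, and spread value (25, as listed for the RBF classifier) are synthetic placeholders rather than trained parameters.

```python
import numpy as np

def rbf_output(x, centres, weights, spread=25.0):
    """Eq. (3): y(x) = sum_i w_i * phi(||x - c_i||).

    The Gaussian phi and the spread value are assumptions; the paper
    states only that phi is centred on c_i (spread 25 appears in the
    RBF statistics table)."""
    r = np.linalg.norm(centres - x, axis=1)  # ||x - c_i|| for every centre
    phi = np.exp(-(r / spread) ** 2)         # static Gaussian nonlinearity
    return weights @ phi                     # weighted sum of N basis fns

# Toy example: 5 centres in the 6-D lip-feature space.
rng = np.random.default_rng(1)
centres = rng.normal(size=(5, 6))
weights = rng.normal(size=5)
x = rng.normal(size=6)
y = rbf_output(x, centres, weights)
print(y)
```

In a trained RBF classifier there would be one such output unit per speaker class, with the weights fitted on the PCA-reduced training features.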
Fig. 2. Speaker speaking one to four.

Fig. 3. Image showing lip features.

As shown in Fig. 3, H1 represents the height of the outer corners of the mouth, W1 the width of the outer corners of the mouth, H2 the height of the inner corners of the mouth, W2 the width of the inner corners of the mouth, H3 the height of the upper lip, and H4 the height of the lower lip.

A. Classification by the Back Propagation Network
PCA along with BP is applied to these lip images for training, and the network is then tested with the rest of the images. In BP, two hidden layers are used, so the BP network is a Multilayer Neural Network (MLNN).

I. STATISTICAL DATA of BP for PCA

  Input Vector Nodes                               6
  Number of Hidden Layers                          2
  Number of neurons (hidden1, hidden2 & output)    20, 25, 7
  Transfer functions (input, hidden & output)      tansigmoid, tansigmoid, linear
  Network Learning rate                            0.01
  Epochs                                           1000

B. Classification by the Radial Basis Function Network
PCA along with RBF is applied to these lip images for training, and the network is then tested with the rest of the images. In RBF, a hidden radial basis layer and an output linear layer are used.

II. STATISTICAL DATA of RBF for PCA

  Number of Radial Basis Layers                    1
  Number of neurons (input, radial basis & output) 6, 125, 7
  Spread                                           25
  Epochs                                           50

Fig. 5. Learning by PCA along with RBF.

C. Classification by the Learning Vector Quantization Network
PCA along with LVQ is applied to these lip images for training, and the network is then tested with the rest of the images. In LVQ, a hidden competitive layer and an output linear layer are used.

III. STATISTICAL DATA of LVQ for PCA

  Number of competitive Layers                     1
  Number of neurons (input, competitive & output)  6, 40, 7
  Transfer function                                Lvq1.0
  Network Learning rate                            0.001
  Epochs                                           1000

Fig. 6. Learning by PCA along with LVQ.
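The "Lvq1.0" transfer function listed for the LVQ classifier corresponds to the classic LVQ1 rule: the prototype nearest to a training sample moves toward it when their classes agree and away otherwise. A minimal sketch with synthetic prototypes, assuming the learning rate of 0.001 from the LVQ statistics table:

```python
import numpy as np

def lvq1_step(prototypes, proto_labels, x, label, lr=0.001):
    """One LVQ1 update: the winning (nearest) prototype moves toward x
    if its class matches, away otherwise. The learning rate follows the
    LVQ table (0.001); all data here is synthetic."""
    j = int(np.argmin(np.linalg.norm(prototypes - x, axis=1)))  # winner
    sign = 1.0 if proto_labels[j] == label else -1.0
    prototypes[j] += sign * lr * (x - prototypes[j])
    return j

# Toy run: two prototypes for two speakers in the 6-D feature space.
prototypes = np.array([np.zeros(6), np.full(6, 5.0)])
labels = np.array([0, 1])
x = np.full(6, 0.1)              # a sample near prototype 0, class 0
winner = lvq1_step(prototypes, labels, x, label=0)
print(winner)                    # -> 0: prototype 0 wins and moves toward x
```

Repeating such updates over the PCA-reduced training features pulls each class's Voronoi vectors toward its samples, shaping the decision regions described in Section II-D.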

Fig. 4. Learning by PCA along with BP.

D. Experimental Results
The results of the recognition tests with BP, RBF, and LVQ are shown in Table IV.
IV. STATISTICAL DATA of DIFFERENT NEURAL NETWORK TECHNIQUES

  Methods            BP                 RBF                LVQ
  Recognition Rate   89.88% (151/168)   91.07% (153/168)   87.5% (147/168)

The results show that the recognition performance of PCA with RBF is better than that of the other methods used.

IV. CONCLUSIONS

It can be concluded from the results section that, among all the classifiers mentioned above, RBF outperforms BP and LVQ when used with PCA. The BP network achieved better accuracy than LVQ through its back propagation mechanism, but it was RBF that achieved the highest accuracy. A second conclusion concerns computational time: RBF took much less time than BP and LVQ because it is fast and requires fewer training samples, as it employs local approximators. Hence, using RBF in lip recognition increases the accuracy and decreases the computational time. In future, work can be done on an identification system that incorporates the features of both lips and speech. Accuracy can be improved by using other feature extraction techniques, and it can also be improved by combining several feature extraction techniques into a unique one.

V. REFERENCES

1. J. R. Movellan, "Visual Speech Recognition with Stochastic Networks" (the "TULIPS1 database"), in G. Tesauro, D. Touretzky and T. Leen (eds.), Advances in Neural Information Processing Systems, vol. 7, MIT Press, Cambridge, 1995.

2. D. A. Reynolds and L. P. Heck, "Automatic Speaker Recognition: Recent Progress, Current Applications, and Future Trends", AAAS 2000 Meeting, Humans, Computers and Speech Symposium, 19 Feb. 2000.

3. C. C. Brown, X. Zhang, R. M. Mersereau and M. Clements, "Automatic speech reading with application to speaker verification", Proc. 2002 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '02), vol. 1, pp. 685-688, Orlando, USA, May 2002.

4. T. Wark and S. Sridharan, "A Syntactic Approach to Automatic Lip Feature Extraction for Speaker Identification", Proc. 1998 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '98), vol. 6, pp. 3693-3696, Seattle, USA, May 1998.

5. J. Luettin, N. A. Thacker and S. W. Beet, "Learning to recognise talking faces", Proc. 13th International Conference on Pattern Recognition, vol. 4, pp. 55-59, Vienna, Austria, Aug. 1996.

6. I. Matthews, T. F. Cootes, J. A. Bangham, S. Cox and R. Harvey, "Extraction of visual features for lipreading", IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 24, no. 2, pp. 198-213, Feb. 2002.

7. L. L. Mok, W. H. Lau, S. H. Leung, S. L. Wang and H. Yan, "Person Authentication Using ASM Based Lip Shape and Intensity Information", Proc. 2004 IEEE International Conference on Image Processing (ICIP 2004), pp. 561-564, Singapore, Oct. 2004.

8. S. L. Wang and A. W. C. Liew, "ICA-Based Lip Feature Representation for Speaker Authentication", Proc. Third International IEEE Conference on Signal-Image Technologies and Internet-Based Systems, pp. 763-767, Dec. 2007.

9. K. L. Sum, W. H. Lau, S. H. Leung, A. W. C. Liew and K. W. Tse, "A New Optimization Procedure for Extracting the Point-Based Lip Contour Using Active Shape Model", Proc. 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 3, pp. 1485-1488.

10. Guangming Dong, Jin Chen, Xuanyang Lei, Zuogui Ning, Dongsheng Wang and Xiongxiang Wang, "Global-Based Structure Damage Detection Using LVQ Neural Network and Bispectrum Analysis", Springer-Verlag, Berlin Heidelberg, pp. 531-537, 2005.

11. Jaakko Hollmén, Volker Tresp and Olli Simula, "A Learning Vector Quantization Algorithm for Probabilistic Models", Proc. 2000 European Signal Processing Conference, vol. II, pp. 721-724.

12. Hui Kong, Xuchun Li, Lei Wang, Earn Khwang Teoh, Jian-Gang Wang and R. Venkateswarlu, "Generalized 2D principal component analysis", Proc. 2005 IEEE International Joint Conference on Neural Networks, vol. 1, Aug. 2005.

13. S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice Hall, Upper Saddle River, NJ, 1994.

14. Harry Wechsler, Vishal Kakkad, Jeffrey Huang, Srinivas Gutta and V. Chen, "Automatic Video-based Person Authentication Using the RBF Network", Proc. First International Conference on Audio- and Video-Based Biometric Person Authentication, pp. 85-92, 1997.

15. Xiaopeng Hong, Hongxun Yao, Yuqi Wan and Rong Chen, "A PCA Based Visual DCT Feature Extraction Method for Lip-Reading", Proc. 2006 International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP '06), pp. 321-326, Pasadena, CA, USA, 2006.