10 1109@cac 2018 8623118
Abstract—Recently, deep learning based on convolutional neural networks (CNN) has achieved state-of-the-art performance in many fields such as image classification, semantic analysis and biometric recognition. Normally, the Softmax activation function is used as the classifier in the last layer of a CNN. However, some studies have replaced the Softmax layer with a support vector machine (SVM) in an artificial neural network architecture and achieved strong results. Inspired by these works, we study the performance of a CNN with a linear SVM classifier on gender recognition using the CASIA-B dataset. In the first model, the input image's descriptors are extracted from the fully connected layers of the pre-trained VGGNet-16 model and used as features to train the SVM. In the second model, we adapt VGGNet-16 with a hinge loss function using an L2 norm to create a new architecture, namely VGGNet-SVM. The results show that the SVM performs better than Softmax in VGGNet-16 on the gait-based gender recognition problem.

Keywords—CNN, SVM, gender recognition, GEI, deep-features

I. INTRODUCTION

Gender is one of the important basic attributes of human beings. Identifying the gender of a subject is an important function of a long-distance intelligent monitoring system, and it can effectively improve the system's understanding of the monitored environment. At present, most gender recognition methods are based on the face, but the acquisition of face images is easily restricted. Because of factors such as distance and resolution, it is not easy for a camera to capture a clear face image. Compared with the face, gait has the advantages of being non-invasive, contact-free and easy to collect, which gives gait recognition an irreplaceable role in long-distance identification [1].

With the development of image processing and recognition technology, silhouette-based gait identification has become the major method. The support vector machine (SVM) plays a key role in machine learning. Generally, SVM classifiers are trained on descriptors of gait feature images to obtain a classification model that estimates the class of an input gait feature image [6-8]. The key to this approach is the selection of image descriptors. Recently, deep learning based on convolutional neural networks (CNN) has achieved state-of-the-art performance in image classification. The advantage of a CNN is that it omits the feature extraction steps of traditional image processing. Some studies have applied deep learning methods to gait analysis [2-5]. What is more, Razavian et al. [9] proposed that the units of the deep fully connected layers of an off-the-shelf CNN can be used as descriptors of the input image, and that an SVM can be trained on the deep features extracted from the pre-trained deep convolutional neural network. Several researchers have used a linear SVM to replace the Softmax layer in a CNN to create a novel architecture, namely CNN-SVM, to address different image recognition tasks [10-12].

Yichuan Tang [11] studied deep learning using linear support vector machines on a face classification task. Inspired by Tang's work, this paper studies the use of a linear SVM in deep learning for gait-based gender recognition. We study the performance of the linear SVM classifier in two models, the deep-features model and the fine-tuning model. In the former model, the SVM is trained on image descriptors extracted from the fully connected layers of the pre-trained deep convolutional model VGGNet-16. In this model, the weights of all layers are frozen, meaning that the pre-trained VGGNet-16 is only used as a tool for feature extraction. In the latter model, our fine-tuning network adds a 2-unit fully connected layer with a hinge loss function using an L2 norm after fc8, and only the weights of the fully connected layers are optimized during training. Figure 1 shows this architecture. To compare our network with a network using Softmax, we evaluate the performance on the CASIA-B dataset. The experimental results show that the proposed architecture achieves a better classification rate than the traditional architecture under normal, carrying, and wearing conditions in five views (54°, 72°, 90°, 108°, 126°).

II. RELATED WORK

A. The VGGNet-16 model

The state-of-the-art deep convolutional model VGGNet-16 [13] is used in this paper. The VGGNet-16 model includes 16 weight layers, consisting of 13 convolutional layers and 3 fully connected layers. The input to VGGNet-16 is an RGB image of 224×224 pixels. The input image is passed through a stack of convolutional layers with 3×3 filters. The convolutional layers are followed by three fully connected layers: the first two have 4096 units each and the third has 1000 units, corresponding to the classification results. The final layer is the Softmax layer. In this work, the VGGNet-16 model is pre-trained on the ImageNet dataset.

[...] extract the 4096-dimensional fc6 and fc7 features and the 1000-dimensional fc8 features as the deep-features to train an SVM, as listed in Table I.

TABLE I. THE DEEP FEATURES.
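The layer counts and feature dimensions given for VGGNet-16 can be checked with a short bookkeeping sketch. The channel widths and pooling positions below follow configuration "D" of the VGG paper [13]. Those details, along with the 3×3 convolutions using stride 1 and padding 1, are assumptions taken from that reference rather than from this paper. The names `VGG16_CFG`, `FC_UNITS` and `trace_shapes` are hypothetical.

```python
# Assumed VGG-16 configuration "D" from Simonyan and Zisserman [13]:
# integers are output channels of a 3x3 convolution, "M" is 2x2 max pooling.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]

# Output sizes of the three fully connected layers, as stated in the text.
FC_UNITS = {"fc6": 4096, "fc7": 4096, "fc8": 1000}


def trace_shapes(size=224, channels=3):
    """Return the (channels, height, width) shape after every layer."""
    shapes = [(channels, size, size)]
    for item in VGG16_CFG:
        if item == "M":
            size //= 2          # 2x2 max pooling with stride 2 halves H and W
        else:
            channels = item     # 3x3 conv, stride 1, padding 1 keeps H and W
        shapes.append((channels, size, size))
    return shapes


shapes = trace_shapes()
num_conv = sum(1 for item in VGG16_CFG if item != "M")
flat_dim = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
print(num_conv, shapes[-1], flat_dim)  # 13 (512, 7, 7) 25088
```

Under these assumptions, the flattened input to fc6 has 25088 values. The deep-features taken from fc6, fc7 and fc8 then have 4096, 4096 and 1000 dimensions respectively.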
Fig. 3. The freeze configuration of fine-tuning VGGNet-SVM
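The freeze configurations can be read as a mapping from each variant to the fully connected layers that still receive weight updates. The following is a minimal sketch under stated assumptions, not the authors' implementation. The variant names are taken from Table IV. The layer name `svm_out` for the added 2-unit layer, the helper names, and the choice to keep `svm_out` trainable in every variant are hypothetical. Convolutional layers are left out because only fully connected weights are optimized.

```python
# Fully connected layers of the fine-tuning network, in forward order.
# "svm_out" is a hypothetical name for the added 2-unit layer after fc8.
FC_LAYERS = ("fc6", "fc7", "fc8", "svm_out")

# Frozen layers per variant (variant names as they appear in Table IV).
FREEZE_CONFIGS = {
    "VGGNet-SVM-Fune_all": (),
    "VGGNet-SVM-Freeze_FC6": ("fc6",),
    "VGGNet-SVM-Freeze_FC6FC7": ("fc6", "fc7"),
    "VGGNet-SVM-Freeze_FC6FC7FC8": ("fc6", "fc7", "fc8"),
}


def trainable_layers(config_name):
    """List the fully connected layers that are updated for a variant."""
    frozen = set(FREEZE_CONFIGS[config_name])
    return [name for name in FC_LAYERS if name not in frozen]


def sgd_step(params, grads, trainable, lr=0.01):
    """Apply one plain SGD update, leaving frozen layers unchanged."""
    updated = {}
    for name, weights in params.items():
        if name in trainable:
            updated[name] = [w - lr * g for w, g in zip(weights, grads[name])]
        else:
            updated[name] = list(weights)  # frozen: copy without change
    return updated
```

For example, `trainable_layers("VGGNet-SVM-Freeze_FC6FC7")` returns `["fc8", "svm_out"]`, so a gradient step in that variant changes only those two layers.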
Fig. 4. The output of loss in training
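The training loss of VGGNet-SVM is described as a hinge loss using an L2 norm. The sketch below assumes this refers to the squared hinge (L2-SVM) objective of Tang [11], with a target of +1 for the true class and -1 for the other class. The exact form used by the authors is not given in this text. The function names, the `margin` parameter and the penalty coefficient are hypothetical.

```python
def squared_hinge_loss(scores, label, margin=1.0):
    """Squared hinge (L2-SVM) loss for one sample.

    scores: raw outputs of the final 2-unit layer, one per class.
    label:  index of the true class.
    Returns (loss, grad), where grad[k] is d(loss)/d(scores[k]).
    """
    loss = 0.0
    grad = []
    for k, score in enumerate(scores):
        target = 1.0 if k == label else -1.0
        slack = max(0.0, margin - target * score)  # zero once the margin holds
        loss += slack ** 2
        grad.append(-2.0 * target * slack)
    return loss, grad


def l2_penalty(weights, coeff=0.5):
    """Weight penalty coeff * ||W||^2 added to the data term (assumed form)."""
    return coeff * sum(w * w for row in weights for w in row)


loss, grad = squared_hinge_loss([2.0, -0.5], label=0)
print(loss, grad[1])  # 0.25 1.0
```

In this example the true class already satisfies the margin and contributes nothing. The other class violates it by 0.5, which gives a loss of 0.25. Unlike the standard hinge loss, the squared form is differentiable at the margin, which is the property Tang [11] highlights for training with backpropagation.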
[...] frozen, the more difficult it is for the loss to converge to a lower value. This is because the ability of the convolutional neural network to modify and adjust features is degraded, so the extracted features are not well adapted to the needs of the task.

When the FC6 layer is frozen, the loss changes sharply after about 50 epochs of training and then gradually stabilizes. Whether the FC8 layer is frozen directly affects whether the loss can converge to a lower value. It can therefore be inferred that the FC6 and FC8 layers play a key role in the feature extraction and adjustment performed by the fully connected layers as a whole.

In Table IV, VGGNet-Fune_all is a method using Softmax with a cross-entropy loss function. The results show that the hinge loss using an L2 norm outperforms the Softmax function with cross-entropy loss. It can also be observed that the more fully connected layers are fine-tuned, the higher the accuracy. This result is attributed to the fact that optimizing the weights of the fully connected layers by backpropagation is very effective, which makes the extracted deep-features more suitable for the new recognition task.

TABLE IV. TEST ACCURACY OF FINE-TUNING MODEL.

Method                         Accuracy (%)
VGGNet-Fune_all                87.10
VGGNet-SVM-Fune_all            89.62
VGGNet-SVM-Freeze_FC6          82.72
VGGNet-SVM-Freeze_FC6FC7       82.22
VGGNet-SVM-Freeze_FC6FC7FC8    77.84

At the same time, we can see that the accuracy drops sharply from VGGNet-Fune_all to VGGNet-SVM-Freeze_FC6 and from VGGNet-SVM-Freeze_FC6FC7 to VGGNet-SVM-Freeze_FC6FC7FC8. This further indicates that, in the VGGNet-SVM structure, the FC6 and FC8 layers play a key role in feature extraction and adjustment. Freezing the parameters of these layers has a great influence on the result, which is consistent with the loss curves.

V. CONCLUSION

In this paper, we study the use of a linear SVM to replace the Softmax function in VGGNet-16. The experiments show that the linear SVM outperforms the Softmax function on the gait-based gender recognition task. By freezing different layers, we find that FC6 and FC8 play a key role in feature extraction and adjustment in the VGGNet-SVM model. What is more, we study the performance of deep-features extracted from the fully connected layers of the pre-trained VGGNet-16. The results show that these features can be used as descriptors of the input GEI to represent gait, and that the best performance is achieved with Deep-features-I, which is extracted from the FC6 layer of VGGNet-16. In future work, we intend to extend our research to other pre-trained CNNs such as AlexNet, ResNet and LeNet.

ACKNOWLEDGMENTS

The research in this paper uses the CASIA Gait Database collected by the Institute of Automation, Chinese Academy of Sciences. This work is supported by the National Natural Science Foundation of China (No. 61503398).

REFERENCES

[1] Jain A K, Ross A, and Prabhakar S, "An introduction to biometric recognition," IEEE Trans. Circuits Syst. Video Technol., vol. 14, 2004, pp. 4-20.
[2] Yeoh T W, Aguirre H E, and Tanaka K, "Clothing-invariant gait recognition using convolutional neural network," Proc. ISPACS (Xiamen, China), 2017, pp. 1-5.
[3] Shiraga K, Makihara Y, Muramatsu D, et al., "GEINet: View-invariant gait recognition using a convolutional neural network," Proc. Int. Conf. on Biometrics (Halmstad, Sweden), 2016, pp. 1-8.
[4] Shukla R, Shukla R, Shukla A, et al., "Gender identification in human gait using neural network," J. Model. Educ & Comput. Sci., vol. 4, 2012, pp. 70-75.
[5] Alotaibi M and Mahmood A, "Improved gait recognition based on specialized deep convolutional neural networks," J. Comput Vision Image Understanding, vol. 164, pp. 103-110.
[6] Yoo J H, Hwang D, and Nixon M S, "Gender classification in human gait using support vector machine," ACIVS, vol. 3708, 2005, pp. 138-145.
[7] Juang L H, Lin S A, and Wu M N, "Gender recognition studying by gait energy image classification," Int. IEEE Symposium on Computer, Consumer and Control (Taichung, Taiwan), 2012, pp. 837-840.
[8] El-Alfy E S M and Binsaadoon A G, "Silhouette-based gender recognition in smart environments using fuzzy local binary patterns and support vector machines," J. Procedia Comput. Sci., vol. 109, 2017, pp. 164-171.
[9] Razavian A S, Azizpour H, Sullivan J, et al., "CNN features off-the-shelf: An astounding baseline for recognition," IEEE Conf. on Comput Vision & Pattern Recognit (Columbus, USA), 2014, pp. 512-519.
[10] Wolfshaar J V D, Karaaba M F, and Wiering M A, "Deep Convolutional Neural Networks and Support Vector Machines for Gender Recognition," Proc. IEEE Symposium Series on Computational Intelligence (Cape Town, South Africa), 2015, pp. 188-195.
[11] Tang Y, "Deep Learning using Linear Support Vector Machines," arXiv preprint, 2013, 1306.0239.
[12] Agarap A F, "An architecture combining convolutional neural network (CNN) and support vector machine (SVM) for image classification," arXiv preprint, 2017, 1712.0354.
[13] Simonyan K and Zisserman A, "Very deep convolutional networks for large-scale image recognition," arXiv preprint, 2014, 1409.1556.
[14] Han J and Bhanu B, "Individual recognition using gait energy image," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, 2006, pp. 316-322.