
IEEE - 43488

A Robust Approach for Gender Recognition using Deep Learning

Shefali Arora, M.P.S Bhatia
Division of Computer Engineering, Netaji Subhas Institute of Technology, Delhi, India
arorashef@gmail.com, bhatia.mps@gmail.com

Abstract: Gender information derived from facial patterns is important in many user interaction applications. This paper proposes an approach based on Convolutional Neural Networks (CNNs) to identify gender from faces. The network is trained using back-propagation and Adam optimization. Using convolution operations, the performance of the proposed CNN is evaluated on the publicly available CASIA face recognition dataset. The proposed network is able to process 640 x 480 pixel face images in very little time. A combination of convolutional and max pooling layers is used, and a classification accuracy of 98.5 percent is achieved in just 50 epochs. The results on the CASIA benchmark show that superior classification performance can be attained by using Convolutional Neural Networks for gender recognition. The accuracy attained is higher than that of earlier methods used for face recognition.

Keywords: facial patterns, gender, CNN, deep learning, computer vision

I. INTRODUCTION

Facial images provide a great deal of the information needed by applications involving human interaction. Computer vision researchers have been working on the development of systems that make use of this reliability of human systems. In this paper, gender classification from facial images is the main focus.

Research shows that the differences between masculinity and femininity can help to improve the performance of biometric systems and computer vision applications. There are various challenges, such as deciphering facial expressions and removing background noise, that have to be dealt with in this process. Thus there is a need to develop robust algorithms that can achieve gender classification with high accuracy.

Conventional approaches to face recognition involve image preprocessing, feature extraction and then classification. Performance is therefore highly dependent on the type of classifier used and the number of features extracted. The advantage of using CNNs is that these multilayer networks achieve feature extraction, dimensionality reduction and classification all in one network[1]. This is done by using convolutional layers, max pooling layers and finally fully connected layers. Accuracy is thus improved to a large extent with the use of CNNs for facial recognition and analysis. CNNs are easier to train than traditional classifiers. They also have fewer parameters than fully connected multilayer perceptron networks and are robust to shape and transformation variations[2].

In this paper, we propose a robust and effective method to classify gender using a CNN. The network architecture has fewer layers and parameters than existing methods. We conduct experiments on 2500 images of the CASIA face recognition dataset, along with image augmentation to provide a good amount of training data to the network. It is observed that, by using Adam optimization and backpropagation, the training accuracy converges in fewer epochs and outperforms various existing methods used in gender recognition from facial patterns.

The remainder of the paper is organized as follows. Section 2 describes earlier work in the field of face recognition and deep learning. Section 3 explains the architecture of convolutional neural networks and the proposed use of this architecture in our gender recognition system. Section 4 provides the experimental results and Section 5 concludes the paper.

II. RELATED WORK

Authors worked on unconstrained face images for gender recognition in the LFW dataset, based on prior knowledge of pose and demographic parameters[3], attaining an accuracy of around 90% using a PHOG descriptor and SVM. Other descriptors, such as Gabor descriptors, were used along with a Fuzzy LDA classifier, which considers facial images to be in more than one class[4].

Gonzalez-Sosa et al.[5] worked on estimation of gender, ethnicity, age and presence of glasses from faces in the LFW dataset, to tackle facial recognition in unconstrained environments. The accuracy attained was 98.2%.

Levi and Hassner developed a convolutional net architecture which proved appropriate for age and gender recognition on the Adience benchmark[6]. Gender estimation using this method gave a mean accuracy of 84.6%. Authors have worked on different architectures of CNN, training strategies and their

9th ICCCNT 2018


July 10-12, 2018, IISC, Bengaluru
Bengaluru, India

application on three popular benchmarks to solve the problem of gender classification[9].

Shan worked on the LFW benchmark using local binary patterns and an SVM classifier to achieve an accuracy of 94.81%[10]. Interest in CNNs for facial recognition was revived and authors continued to work on deeper convolutional neural networks.

Mirjalili and Ross[11] devised techniques in which facial images were modified and then soft biometric utilities like gender were mined. While the gender classifier gives a perturbed result, the face matcher retains its utility. Many such experiments were conducted by modifying existing images to obtain information on soft biometric traits like age and gender. Another experiment on the FERET database gave a classification rate of 94.7% with the help of a CNN. The FERET benchmark has been used in various research works in a controlled environment, as it was easier to use than other datasets with faces in the wild[12]. Other datasets like ImageNet have been used[13] for the problem of gender recognition using deep convolutional neural networks. The problem of overfitting has to be taken care of while using CNNs due to the huge number of model parameters.

Authors obtained a record accuracy of 97.31% on the LFW dataset by using a minimalistic CNN and 10 times fewer training images[14].

Local binary patterns were used along with an AdaBoost classifier on the Labeled Faces in the Wild (LFW) dataset by authors in [15]. This is a challenging dataset, so a robust system was required to identify faces in unconstrained environments. Coco et al.[16] worked on the use of DNNs for recognition of gender and on how using a lower cardinality of training examples can bias the performance of the system.

An approach for gender classification using a single neural network was given by Tivive and Bouzerdoum[20], in which an average classification accuracy of 97% was achieved. Afifi[21] has worked on solving the gender recognition task using a large dataset of hands, as they give important gender-related information.

Biometric identification plays a major role in ensuring the security and privacy of individuals in many applications these days. Various traits like face, fingerprint and iris are used to identify a person. In applications using facial recognition, a single image can give us valuable information like gender, age and ethnicity. The gender recognition task is important as well as challenging. Faces may not be identified due to differences in pose, illumination and expression. It is possible that a biometric system might misidentify an individual in high-risk environments. Making use of soft biometrics (which include attributes like age and gender) can help to obtain valuable information about an individual and solve identification problems. Besides addressing problems like shadowing and lighting, gender recognition can also be used in video surveillance, where faces are detected at a distance without personal cooperation. Thus, high accuracy, and therefore robust algorithms, are required in dealing with such challenging problems.

In this paper, we have been able to solve the task of gender recognition by using deep convolutional networks, which help to attain high accuracy, with results being better than many existing approaches. The proposed architecture using deep convolutional networks is described in the following section.

III. DEEP CONVOLUTIONAL NETWORKS

A. Convolutional Neural Networks

Convolutional neural networks make use of convolution to extract various features from the input[14]. Filters are used for the process of convolution, through which various features of the image are extracted. Given a filter of size k x k, an image of size n x n is convolved with each filter to give one feature map per filter. Thus, for unit stride and no padding, each output feature map is of size (n-k+1) x (n-k+1). The filters start from the top left corner and move from left to right, then downward, to perform convolution.

Fig.1. Convolution applied to layers

These layers are followed by pooling layers, which reduce the resolution of the features. The most commonly used method of pooling is max pooling, which subsamples a given image by picking the maximum value from each window of a matrix[15].

Fig.2. Results after Max pooling

The use of non-linear functions helps in the distinct identification of likely features in each layer. The most commonly used non-linear function is ReLU, or rectified linear unit, which is given by the function[19]:

Y = max(0, x)    (1)
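The convolution, ReLU and max pooling operations of this subsection can be illustrated with a minimal pure-Python sketch. The 6 x 6 input, the 3 x 3 filter values and the helper names (conv2d_valid, relu, max_pool) are assumptions chosen for illustration and are not taken from the proposed network; the sketch assumes unit stride and no padding, so an n x n input and a k x k filter give an (n-k+1) x (n-k+1) feature map.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with unit stride and no padding.

    Like most deep learning libraries, this computes cross-correlation
    (the kernel is not flipped).
    """
    n_rows, n_cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    out = []
    for r in range(n_rows - k_rows + 1):          # vertical filter positions
        row = []
        for c in range(n_cols - k_cols + 1):      # horizontal filter positions
            acc = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


def relu(feature_map):
    """Apply Y = max(0, x) element-wise (Eq. 1)."""
    return [[max(0, x) for x in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(
                feature_map[r * size + i][c * size + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Toy 6 x 6 input and a 3 x 3 vertical-edge filter (illustrative values only).
image = [[(r * 6 + c) % 7 for c in range(6)] for r in range(6)]
kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]

features = conv2d_valid(image, kernel)   # (6 - 3 + 1) x (6 - 3 + 1) = 4 x 4
activated = relu(features)               # same size, negatives set to zero
pooled = max_pool(activated, size=2)     # 4 x 4 -> 2 x 2
```

ReLU leaves the feature-map size unchanged, while each 2 x 2 pooling step halves both spatial dimensions.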


Fig.3. Graph depicting ReLU function

This function sets any negative values in its input to zero and keeps the output the same size as the input. It trains the network faster than other non-linear functions like hyperbolic tangent and sigmoid. The fully connected layers of a CNN combine the weighted outputs of the previous layers to decide the final result.

Fig.4. Fully connected layer

B. Proposed architecture

The network architecture has been proposed so as to avoid overfitting. The model used in the experiment is as follows:

TABLE I. PARAMETERS FOR GENDER RECOGNITION

Parameter                      Value
Height of the image            640
Width of the image             480
Number of training images      1500
Number of validation images    1000
Number of epochs               50
Optimizer                      Adam
Loss                           Categorical crossentropy
Augmentation parameters        Flip left, right

The images input to the network are of dimension 640 x 480 x 1 (where 1 is the number of color channels). The colored images have been resized and converted into grayscale images.

There are 10 convolution layers and max pooling layers, followed by a fully connected layer and a dropout layer. The size of the filters used in each layer is 5 x 5.

The convolution layers are:

1. Convolutional filters of size 5 x 5 are convolved with no stride and no padding, resulting in a reduced volume size of the image. 32 filters are used. This is followed by a rectified linear unit function, which removes any negative values present in the feature maps, and max pooling, which reduces the size further.

2. This layer is followed by 2 convolutional layers with filters of size 5 x 5, which extract features from the output of the previous layers. 64 filters are used. Every layer is combined with a max pooling layer and the activation function used is ReLU, i.e. rectified linear unit, as it is a better option than other activation functions.

3. The output from the previous layers is fed to the next two convolutional layers, which use filters of the same size, 128 in number. This is followed by another four layers, with 256 convolving filters of size 5 and 512 convolving filters of size 5.

A fully connected layer is used at the end of this network with a dropout factor of 0.8. The fully connected layer uses a softmax layer. The use of dropout helps to reduce overfitting, which is quite common in convolutional neural networks. The softmax layer gives the probability with which each image falls in a particular class, i.e. whether the person is male or female.

C. CASIA benchmark

The CASIA Face Image Database version 5.0 (or CASIA-Face V5) consists of 2500 facial images of 500 people. These faces have been captured using a Logitech camera. The people involved are mostly students, waiters, workers etc.
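As an aside on the convolutional stack described under the proposed architecture, the following sketch tabulates the filter counts and the resulting number of trainable parameters per convolutional layer. It is only an illustration under stated assumptions: the itemised description names 1 + 2 + 2 + 4 = 9 layers while the text states 10, so the sketch follows the itemised list; the split of the last four layers into two with 256 filters and two with 512 filters is assumed; and the names STAGES, conv_layer_params and describe are hypothetical.

```python
KERNEL = 5  # 5 x 5 filters throughout, as stated in the text

# (number of conv layers, filters per layer) for each stage.
# Splitting the final "four layers" into 2 x 256 and 2 x 512 is an assumption.
STAGES = [(1, 32), (2, 64), (2, 128), (2, 256), (2, 512)]


def conv_layer_params(in_channels, out_channels, kernel=KERNEL):
    """Weights plus one bias per filter for a single convolutional layer."""
    return kernel * kernel * in_channels * out_channels + out_channels


def describe(stages, in_channels=1):
    """Return (layer index, in channels, out channels, parameter count) rows.

    in_channels=1 corresponds to the single-channel grayscale input.
    """
    rows = []
    index = 1
    for n_layers, n_filters in stages:
        for _ in range(n_layers):
            rows.append((index, in_channels, n_filters,
                         conv_layer_params(in_channels, n_filters)))
            in_channels = n_filters   # next layer sees this layer's feature maps
            index += 1
    return rows


layers = describe(STAGES)
for index, c_in, c_out, n_params in layers:
    print(f"conv{index}: {c_in:>3} -> {c_out:>3} channels, {n_params:,} parameters")
print("total conv parameters:", sum(row[3] for row in layers))
```

Because each filter is shared across all spatial positions, these counts do not depend on the 640 x 480 input size, which is the usual reason convolutional layers need fewer parameters than fully connected layers on the same input.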


Fig.5. Images from CASIA dataset

The database consists of images in BMP format and the resolution of these images is 640 x 480. These images could also be used for various intra-class variations like pose, illumination, distance etc.

In this paper, we have opted for the cropped version of the images. These images have been converted to grayscale using OpenCV. While 1500 of the available 2500 images form the training set, the remaining 1000 form the validation set. Once the model is trained and validated, we can test it by feeding in any new image, and it would accurately classify whether the input face image is of a male or a female. Thus gender classification has been done on the CASIA benchmark.

The figure ahead illustrates the network architecture of convolutional layers used to estimate the gender of a person.

D. Training and testing

The dataset is split into training and cross validation sets. This is because, if we shuffle the training set and test directly on it, the results could look more promising than they really are, and the hypothesis would not generalize. Therefore, cross validation is done on a fixed set of images.

The network is trained using the available images and target labels, which are binary vectors for the classification of males and females. There are two classes; hence there are 2 possible target vectors in this case.

In addition, the network is trained to avoid overfitting by applying a dropout layer with a dropout ratio of 0.8. Further, the weights are optimized by using Adam optimization and a learning rate of 0.001. Image augmentation is used by flipping the images left and right, as CNNs work best if more training data is fed to the network.

The image augmentation used is a just-in-time one; although it increases the time taken per epoch for training, it reduces the overhead incurred for storage. The parameters for augmentation of images are dependent on the dataset used and can be applied based on intuition. The results are highly dependent on the selection and tuning of the parameters of the CNN model.

1 epoch = (Number of training images / Batch size) iterations

The use of proper weight initialization helps to increase the performance. The number of epochs in which training converges is 50, which is much better as compared to existing methods.
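The binary target vectors, the softmax output and the categorical crossentropy loss used in training, together with the epoch relation above, can be sketched as follows. This is a minimal illustration and not the training code used in the experiments: the class order, the example scores and the batch size of 50 are assumptions (the batch size is not stated in the text), and the ceiling in iterations_per_epoch is added to cover a final incomplete batch.

```python
import math

CLASSES = ["female", "male"]  # class order is an assumption for illustration


def one_hot(label):
    """Binary target vector with a 1 at the index of the true class."""
    return [1.0 if name == label else 0.0 for name in CLASSES]


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(scores)                         # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(target, probs, eps=1e-12):
    """Loss for one sample: -sum(target_i * log(prob_i))."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, probs))


def iterations_per_epoch(n_train, batch_size):
    """Number of batches needed to see every training image once."""
    return math.ceil(n_train / batch_size)


probs = softmax([0.3, 2.1])                     # hypothetical network outputs
loss = categorical_crossentropy(one_hot("male"), probs)
steps = iterations_per_epoch(n_train=1500, batch_size=50)  # batch size is hypothetical
```

The loss is small when the probability assigned to the true class is close to 1 and grows as that probability falls, which is what the optimizer exploits during back-propagation.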
E. Dataset Preprocessing and Tools used

The images from the CASIA dataset are preprocessed using an automated Python script to divide them into two categories: male and female. The tools and platform used to further analyze the dataset and predict gender for test images are:

● OpenCV
● Python 3.3
● TensorFlow (tflearn)
● Platform: virtual machine on Google Cloud with 30 GB RAM
● Visualization: accuracy and loss of the model have been visualized using TensorBoard.

OpenCV is a popular library for applications in the field of computer vision and machine learning. With the help of the various functions available in OpenCV, operations such as reading, resizing and displaying images can be performed.
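The grayscale conversion and the just-in-time left-right flipping mentioned in the preceding sections can be sketched without any external library as shown below. The sketch is an illustration under assumptions, not the preprocessing script used in this work: the luminance weights are the commonly used 0.299, 0.587 and 0.114, the flip probability of 0.5 is assumed, and the function names and pixel values are hypothetical. In OpenCV, the corresponding operations are provided by cv2.cvtColor and cv2.flip.

```python
import random


def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) pixels to rows of single intensity values.

    Uses the common luminance weights 0.299, 0.587 and 0.114.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def flip_left_right(image):
    """Mirror an image about its vertical axis."""
    return [list(reversed(row)) for row in image]


def augmented_batches(images, labels, batch_size, rng):
    """Yield batches, flipping each image with probability 0.5 on the fly.

    Flipped copies are created only when a batch is requested, so nothing
    extra is stored on disk.
    """
    order = list(range(len(images)))
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        batch = order[start:start + batch_size]
        yield (
            [flip_left_right(images[i]) if rng.random() < 0.5 else images[i]
             for i in batch],
            [labels[i] for i in batch],
        )


# Two tiny 2 x 2 colour images with made-up pixel values and labels.
colour = [
    [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]],
    [[(10, 10, 10), (200, 200, 200)], [(0, 0, 0), (50, 100, 150)]],
]
gray = [to_grayscale(img) for img in colour]
batches = list(augmented_batches(gray, ["male", "female"], batch_size=1,
                                 rng=random.Random(0)))
```

Generating the flipped images inside the batch generator trades a little extra computation per epoch for not having to store an augmented copy of the dataset.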
Fig.6. Proposed architecture

F. Adam Optimization


Adam optimization is an extension of the classical stochastic gradient descent algorithm. It is used by various researchers because:

● It requires very little hyperparameter tuning
● It is computationally very efficient
● It can work on noisy data
● It combines the advantages of the AdaGrad and RMSProp algorithms, in which a per-parameter learning rate is used, and this further helps to improve performance.
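The per-parameter learning rate mentioned above can be made concrete with a minimal sketch of the Adam update for a single parameter. The learning rate of 0.001 is the value used in this work; the values beta1 = 0.9, beta2 = 0.999 and eps = 1e-8 are the commonly used defaults and are assumed here, as are the toy objective f(x) = x^2 and the function name adam_minimize.

```python
import math


def adam_minimize(grad, x0, steps, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimise a one-dimensional function given its gradient `grad`."""
    x, m, v = x0, 0.0, 0.0
    history = [x]
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g          # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g      # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)             # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)   # step scaled per parameter
        history.append(x)
    return history


# Toy objective f(x) = x^2 with gradient 2x, started at x = 1.0.
trajectory = adam_minimize(lambda x: 2 * x, x0=1.0, steps=200)
```

Dividing by the square root of the running squared-gradient average is what gives each parameter its own effective step size, so early steps have a magnitude close to the learning rate regardless of the raw gradient scale.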

IV. EXPERIMENTAL RESULTS

The accuracy achieved on the CASIA benchmark using the proposed model is 98.5 percent.
Fig.7. Accuracy on training set

Fig.8. Accuracy on validation set

Fig.9. Loss using Adam optimization

These graphs, visualized using TensorBoard, depict the accuracy attained and the loss incurred while training the model. The loss has been calculated by using categorical crossentropy in each epoch.

V. CONCLUSION AND FUTURE WORK

In this paper, we have worked on the gender classification problem by considering 2500 faces present in the CASIA benchmark. Using image augmentation and other techniques, we have been able to achieve a good accuracy, i.e. 98.5 percent, with fewer training images. The number of epochs in which training converges is also smaller. Our results are better than those of many of the existing models for gender recognition.

The paper illustrates the use of Convolutional Neural Networks and the importance of a deep network and hyperparameters in attaining better classification performance on our dataset. Thus our algorithm proves to be quite robust in handling this problem.

As part of future work, we will extend the use of CNNs to the recognition of other biometric traits like fingerprint, iris, palm etc. We will also work on other techniques that could help our deep CNN attain even better accuracy. This includes the selection and tuning of hyperparameters, batch normalization and other methods which could help to attain our objective.

REFERENCES

[1] H. Khalajzadeh, M. Mansouri and M. Teshnehlab, "Face Recognition Using Convolutional Neural Network and Simple Logistic Classifier", Advances in Intelligent Systems and Computing, vol. 223, pp. 197-207, 2013.
[2] S. Xie and H. Hu, "Facial expression recognition with FRR-CNN", Electronics Letters, vol. 53, no. 4, pp. 235-237, 2017.


[3] O. Ayodeji Arigbabu, S. Mumtazah Syed Ahmad, W. Azizun, W. Adnan, S. Yussof and S. Mahmood, "Soft Biometrics: Gender Recognition from Unconstrained Face Images using Local Feature Descriptor", Computer Vision and Pattern Recognition, pp. 1-9, 2017.
[4] Chengjun Liu and H. Wechsler, "Gabor feature based classification using the enhanced fisher linear discriminant model for face recognition," in IEEE Transactions on Image Processing, vol. 11, no. 4, pp. 467-476, Apr 2002.
[5] E. Gonzalez-Sosa, J. Fierrez, R. Vera-Rodriguez and F. Alonso-Fernandez, "Facial Soft Biometrics for Recognition in the Wild: Recent Works, Annotation, and COTS Evaluation," in IEEE Transactions on Information Forensics and Security, vol. 13, no. 8, pp. 2001-2014, Aug. 2018.
[6] G. Levi and T. Hassner, "Age and gender classification using convolutional neural networks," 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Boston, MA, pp. 34-42, 2015.
[7] A. Toshev and C. Szegedy, "DeepPose: Human Pose Estimation via Deep Neural Networks", in IEEE Conference on Computer Vision and Pattern Recognition, pp. 1653-1660, 2014.
[8] D. Reid, S. Samangooei, C. Chen, M. Nixon and A. Ross, "Soft biometrics for surveillance: an overview", in Handbook of Statistics, vol. 31, pp. 327-352, 2013.
[9] G. Antipov, M. Baccouche, S. Berrani and J. Dugelay, "Effective training of convolutional neural networks for face-based gender and age prediction", Pattern Recognition, vol. 72, pp. 15-26, 2017.
[10] C. Shan, "Learning local binary patterns for gender classification on real-world face images", Pattern Recognition Letters, vol. 33, no. 4, pp. 431-437, 2012.
[11] V. Mirjalili and A. Ross, "Soft Biometric Privacy: Retaining Biometric Utility of Face Images while Perturbing Gender", in Proc. of International Joint Conference on Biometrics, 2017, pp. 1-10.
[12] P. Phillips, H. Wechsler, J. Huang and P. Rauss, "The FERET database and evaluation procedure for face-recognition algorithms", Image and Vision Computing, vol. 16, no. 5, pp. 295-306, 1998.
[13] A. Krizhevsky, I. Sutskever and G. Hinton, "ImageNet classification with deep convolutional neural networks", Communications of the ACM, vol. 60, no. 6, pp. 84-90, 2017.
[14] G. Antipov, S. Berrani and J. Dugelay, "Minimalistic CNN-based ensemble model for gender prediction from face images", Pattern Recognition Letters, vol. 70, pp. 59-65, 2016.
[15] G. Huang, M. Ramesh, T. Berg and E. Learned-Miller, "Labeled faces in the wild: A database for studying face recognition in unconstrained environments", University of Massachusetts, Amherst, 2007.
[16] M. D. Coco, P. Carcagnì, M. Leo, P. L. Mazzeo, P. Spagnolo and C. Distante, "Assessment of deep learning for gender classification on traditional datasets," 2016 13th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Colorado Springs, CO, pp. 271-277, 2016.
[17] C. Szegedy et al., "Going deeper with convolutions," 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, pp. 1-9, 2015.
[18] D. Jaswal, S. V and K. Soman, "Image Classification Using Convolutional Neural Networks", International Journal of Scientific and Engineering Research, vol. 5, no. 6, pp. 1661-1668, 2014.
[19] X. Jiang, Y. Pang, X. Li, J. Pan and Y. Xie, "Deep neural networks with Elastic Rectified Linear Units for object recognition", Neurocomputing, vol. 275, pp. 1132-1139, 2018.
[20] F. Tivive and A. Bouzerdoum, "A gender recognition system using shunting inhibitory convolutional neural networks", in International Joint Conference on Neural Networks, 2006.
[21] M. Afifi, "11K Hands: Gender recognition and biometric identification using a large dataset of hand images", Computer Vision and Pattern Recognition, 2018.
