CNN Algorithms For Detection of Human Face Attributes - A Survey

Proceedings of the International Conference on Intelligent Computing and Control Systems (ICICCS 2019)
IEEE Xplore Part Number: CFP19K34-ART; ISBN: 978-1-5386-8113-8
Vallimeena P, Uma Gopalakrishnan, Bhavana B Nair, Sethuraman N Rao
Amrita Center for Wireless Networks & Applications (AmritaWNA)
Amrita School of Engineering, Amritapuri
Amrita Vishwa Vidyapeetham
India
e-mail: anudhas24@gmail.com, umag@am.amrita.edu, bhavanabnair@am.amrita.edu, sethuramanrao@am.amrita.edu
Abstract— In recent years, CNN algorithms have been increasingly applied to computer vision applications such as disaster management systems that use crowd-sourced images. Floods are a frequent natural disaster that threatens human life and property. Research is in progress to find the extent of damage in flood-hit areas by calculating the depth of the water from flood images that contain humans and were captured by smartphone cameras. Algorithms that can detect a human face and its attributes, such as age, gender and ethnicity, in these crowd-sourced images can provide valuable information in such situations. A multitude of CNN algorithms is available for these tasks. They differ in their architecture, which in turn influences the accuracy of the results. In this survey, we compare state-of-the-art CNN algorithms for each of these tasks, namely face detection, age and gender classification, and ethnicity classification. We compare these algorithms with respect to their performance and accuracy so that an appropriate algorithm can be selected for the above application.

Keywords— CNN; VGG; Inception; ResNet; Face Detection; YOLO; Face Classification; Gender; Age-group; Ethnicity.

I. INTRODUCTION

Machine learning is an application of Artificial Intelligence that makes a machine learn from a training dataset and perform predictions on test data. It has a wide range of applications, such as autonomous cars, intelligent transport systems [1], cybersecurity and the classification of human faces. TABLE I summarises the categorization of Machine Learning algorithms based on their type.

TABLE I. TYPES OF MACHINE LEARNING ALGORITHMS

Supervised Learning
• Decision Trees
• Naive Bayes Classification
• Ordinary Least Squares Regression
• Logistic Regression
• Support Vector Machines
• Ensemble Methods

Unsupervised Learning
• Clustering Algorithms
• Principal Component Analysis
• Singular Value Decomposition
• Independent Component Analysis

Reinforcement Learning
• Q-Learning
• State-Action-Reward-State-Action (SARSA)
• Deep Q Network (DQN)
• Deep Deterministic Policy Gradient (DDPG)

Image processing is one of the most popular applications of Machine Learning: using computer vision techniques, it can analyse images and extract rich information from them. An image dataset may contain many objects, such as vehicles, lamp posts, animals and humans, and machine learning algorithms can detect and classify such objects accurately.

However, traditional Machine Learning networks are simple, involve less learning and can be used only for datasets of limited size. Deep learning, in contrast, provides better performance (accuracy and execution time on test data) as the size of the dataset increases, and it learns high-level features from the training dataset. Also, a traditional Machine Learning network needs two different algorithms for feature extraction and classification [4], whereas deep learning uses only one network. The three fundamental deep learning architectures are Convolutional Neural Networks (CNN), Recurrent Neural Networks and Recursive Neural Networks, of which CNN is designed for image processing.

There are various applications that require face detection [5] and classification using CNN architectures. One such application is a flood monitoring system [6]. In a flood monitoring system, water depth is estimated from images using a reference object whose height is known. Many images of flood situations contain humans, so humans are chosen as the reference object, and their average height is determined from their age-group, gender and ethnicity [7]. The flood monitoring system therefore has to be trained to detect human faces and to predict the age-group, gender and ethnicity of each detected face. Similarly, researchers working on other applications can use the results of this paper to choose their CNN architecture.
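The reference-object principle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of [7]: the 8:1 body-to-face proportion is the one this paper adopts for its flood monitoring module, while the assumption that semantic segmentation yields the person's height visible above the water line, and all names (`PersonObservation`, `estimate_water_depth_ft`), are hypothetical.

```python
from dataclasses import dataclass

# Body height is taken as 8 face heights (the proportion used in this paper).
BODY_TO_FACE_RATIO = 8.0


@dataclass
class PersonObservation:
    """Hypothetical per-person measurements gathered from the other modules."""
    face_height_px: float     # height of the face bounding box (face detector)
    visible_height_px: float  # body height above the water line (assumed from segmentation)
    average_height_ft: float  # average height for the predicted age-group/gender/ethnicity


def estimate_water_depth_ft(obs: PersonObservation) -> float:
    """Estimate water depth at a person's position, using the person as a reference object."""
    if obs.face_height_px <= 0:
        raise ValueError("face height must be positive")
    # Full (unsubmerged) height of the person in pixels, from the face box.
    full_height_px = BODY_TO_FACE_RATIO * obs.face_height_px
    # Image scale at the person's position: feet represented by one pixel.
    feet_per_px = obs.average_height_ft / full_height_px
    # Pixels of the body hidden below the water line (never negative).
    submerged_px = max(0.0, full_height_px - obs.visible_height_px)
    return submerged_px * feet_per_px


if __name__ == "__main__":
    # A 30 px face implies a 240 px person; 150 px of the body are visible.
    person = PersonObservation(face_height_px=30, visible_height_px=150, average_height_ft=5.5)
    print(f"{estimate_water_depth_ft(person):.2f} ft")  # prints 2.06 ft
```

Because the scale is computed separately for each person, every usable face yields its own depth estimate; how such estimates would be combined for one image is left open here.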
978-1-5386-8113-8/19/$31.00 ©2019 IEEE 576

The performance of the system varies with the deep learning mechanisms, the CNN architecture and the size of the dataset used, and the accuracy of the system can be further improved with a better training dataset. This work compares different CNN architectures trained to classify face attributes such as age-group, gender and ethnicity. The comparison reports the accuracy achieved by different algorithms, which helps in choosing an architecture for different applications.

This paper is organised as follows. Section II explains the flood monitoring system. Section III starts with an introduction to CNN architectures and then describes each architecture. Section IV compares different face detection algorithms, and Section V compares different algorithms for face classification based on age-group, gender and ethnicity. Section VI discusses the choice of algorithms for the flood monitoring system, and Section VII concludes this work and discusses potential options for its further enhancement.

II. FLOOD MONITORING

A flood monitoring system is proposed to aid rescuers with information such as water depth to help with rescue actions. Estimating water depth helps to continuously monitor the flooded region. Our current research on flood monitoring focuses on images containing humans: humans are used as the reference object, and their average height is used as the reference value to estimate the water depth.

Our flood monitoring system [7] includes five modules, namely face detection, age-group and gender classification, ethnicity classification, semantic segmentation, and water depth estimation.

The face detection algorithm is applied to the input image and gives a bounding box around each human face. According to the standard 'golden ratio', human height is 8 times the height of the face. Using this rule, the height of each person is determined in pixels in the image. Using the age-group, gender and ethnicity predicted for each detected face, the average height of the person is determined in feet. These two values, along with the output of semantic segmentation, are used to estimate the water depth.

Hence, CNN architectures have to be applied in the face detection, age-group and gender classification, and ethnicity classification modules of the flood monitoring system.

III. CNN ARCHITECTURES

ANN (Artificial Neural Networks) are among the most dominant tools for machine learning. They are inspired by the human neural system and consist of three kinds of layers: an input layer, hidden layers and an output layer, each with numerous nodes. Depending on the requirements of applications such as pattern recognition in images and speech recognition, ANN algorithms with a larger number of hidden layers, called deep learning algorithms, have been developed.

CNN [8] is a type of Artificial Neural Network that performs feature extraction and classification within a single deep neural network. When a CNN employs deep learning mechanisms such as Back Propagation, Stochastic Gradient Descent, Learning Rate Decay, Max Pooling, Batch Normalization, Long Short-Term Memory and Transfer Learning, chosen according to the architecture and the application, the system provides better results.

Layers [9] in CNN:

● Convolutional Layer - Filters are applied for feature extraction
● Pooling Layer - Max pooling for dimensionality reduction
● Fully Connected Layer - For classification of objects

Stride and padding are two important parameters of a filter that determine the size of the output image. The stride is the number of pixels by which the filter shifts after each convolution operation, and padding appends extra pixels on all sides of the image. For an n*n input, an f*f filter, padding p and stride s, the output is floor((n + 2p - f)/s) + 1 pixels wide; for example, a 32*32 input with a 5*5 filter, no padding and stride 1 gives a 28*28 output. Max pooling and activation functions are two further important mechanisms of a CNN. Max pooling downsamples the image (reduces its dimensions), and an activation function [10] determines the output of a node from its input. Sigmoid, Tanh, ReLU and SoftMax are the most commonly used activation functions.

One of the first CNN architectures was LeNet [8], designed to recognise handwritten digits in images of size 32*32. However, it is not efficient for high-dimensional images: when large images have to be processed, this architecture lags behind because of the limited computational resources available. Later, a deeper CNN architecture with more layers and a greater number of filters per layer, called AlexNet [8], was developed. It uses 11*11, 5*5 and 3*3 convolutions, has 5 convolutional layers and 3 fully connected layers, and uses ReLU (Rectified Linear Unit) activation units. Then ZFNet [8], an improved version of AlexNet, was developed by tuning the parameters of AlexNet (for example, using 7*7 filters in the first layer) while maintaining the same architecture. It uses ReLU activation units and batch stochastic gradient descent. Among the existing CNN architectures, Inception, VGGNet and ResNet are widely accepted and used, and our survey focuses on face classification algorithms based on these architectures.

A. Inception

As the region of interest in each image varies greatly in size, it is difficult to choose the right filter size. So, filters of multiple sizes (1*1, 3*3, 5*5) and a max pooling operation are applied in parallel, and their outputs are concatenated. Filters of different sizes make the network computationally expensive, so 1*1 convolutions are added before the larger filters to reduce the number of channels. This forms the inception module. Inception v1 (GoogLeNet) [11] has 9 inception modules and is 22 layers deep. It uses two auxiliary classifiers and SoftMax activation units [12]. To increase the accuracy and reduce the computational complexity, Inception v2 was developed: all 5*5 filters are replaced with two 3*3 filters, and then all n*n
filters are replaced with 1*n and n*1 filters. The filter banks are made wider to improve representation and reduce the loss of information [13]. Inception v3 adds the RMSprop optimizer, factorized convolutions, batch normalization and label smoothing. TABLE II summarises the functionality of each of these deep learning mechanisms.

TABLE II. DEEP LEARNING MECHANISMS AND THEIR FUNCTIONALITIES

Mechanism                 Functionality
Optimizer                 Tweaks and changes the parameters to minimize the loss function
Factorized Convolution    Reduces the number of parameters and the computational cost
Normalization             Adjusts (normalizes) the input values to improve performance and stability
Label Smoothing           Prevents overfitting

In Inception v4, the operations before the inception modules (the stem) are modified and new reduction blocks are introduced. Reduction blocks reduce the spatial dimensions of the feature maps (from 35*35 to 17*17 to 8*8) [14]. A hybrid model of Inception and ResNet was then developed by adding residual connections, with a 1*1 convolution after the operations in each block so that the dimensions of its input and output are the same.

B. VGGNet

VGG [15] was developed by the Visual Geometry Group at Oxford as an upgrade of the AlexNet architecture. It has three models: VGG-16 (16 layers), VGG-19 (19 layers) and a model fusion. A stack of small filters covers the same region as one large filter with fewer parameters and more non-linear activations (two stacked 3*3 filters cover a 5*5 region), so the 11*11 and 5*5 filters of AlexNet are replaced with 3*3 filters in VGG, which makes the architecture deeper. As a result, VGG is widely preferred for feature extraction and can be fine-tuned with transfer learning for different applications. The VGG architecture consists of a stack of 3*3 convolutional layers with a stride and padding of 1 pixel, followed by three fully connected layers and a soft-max layer. Max pooling and ReLU activation functions are used throughout.

C. ResNet

ResNet (Residual Neural Network) architectures are built from a stack of residual blocks [16]; common variants have between 18 and 152 layers. Each residual block consists of a neural network segment and an identity shortcut that adds the block's input to its output, and residual blocks with these shortcut links form the residual network. As an architecture becomes deeper, the gradient signal used to update the weights shrinks as it is propagated back to the early layers, which is called the vanishing gradient problem. In addition, adding more layers to a plain network makes optimization harder and lowers its accuracy, which is called the degradation problem. Both challenges are addressed by the identity shortcuts in the residual blocks.

IV. FACE DETECTION

The face is a unique identifier of an individual. Hence, face detection is the primary step in face classification and face recognition algorithms.

Dang and Sharma, 2017 [17], compared traditional machine learning algorithms for face detection and concluded that the Viola-Jones method (Haar Cascade) is the best among them. This section compares the Haar Cascade with other state-of-the-art techniques.

A. Haar Cascade

Haar filters [18] are used to extract features from the images. Feature extraction involves a large number of calculations, so an integral image [19] is used, which gives the sum of any rectangular region with only four look-ups. AdaBoost [20] is used to select the best features from a large set of candidate features. A cascade of classifiers is then used to locate the region of the face in the image.

The classifier is accurate for frontal faces but does not work well for faces at other angles. It is also difficult to tune the parameters of these classifiers.

Fig. 1. Face Detection using Haar Cascade

B. Dlib Detector

Dlib is a toolkit that can be used for face detection. Dlib can detect faces either with Histogram of Oriented Gradients (HOG) features and a Support Vector Machine (SVM), or with a CNN. The HOG and SVM detector works well for frontal faces and, in many cases, can also detect faces that are not perfectly frontal. The CNN detector can detect faces over a wider range of angles. However, when the results of the two models are compared, there is no significant difference in accuracy.

Fig. 2. Face Detection using Dlib
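The Haar Cascade subsection above relies on the integral image [19] to make feature extraction affordable. The sketch below, in plain Python with illustrative function names (it is not OpenCV's API), shows why: after one pass over the image, the sum of any rectangle costs four look-ups, so a two-rectangle Haar feature takes constant time regardless of its size.

```python
from typing import List

Grid = List[List[int]]


def integral_image(img: Grid) -> Grid:
    """Return an (h+1) x (w+1) table: ii[y][x] = sum of img over rows < y and columns < x."""
    h = len(img)
    w = len(img[0]) if h else 0
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current row up to column x
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: Grid, x: int, y: int, w: int, h: int) -> int:
    """Sum of the w x h rectangle whose top-left pixel is (x, y), using four look-ups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii: Grid, x: int, y: int, w: int, h: int) -> int:
    """Two-rectangle (edge) Haar feature: left half minus right half of a w x h window."""
    if w % 2:
        raise ValueError("window width must be even")
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


if __name__ == "__main__":
    # Toy 3x4 "image": bright left columns, dark right columns.
    img = [
        [9, 9, 1, 1],
        [9, 9, 1, 1],
        [9, 9, 1, 1],
    ]
    ii = integral_image(img)
    print(rect_sum(ii, 0, 0, 4, 3))          # 60
    print(two_rect_feature(ii, 0, 0, 4, 3))  # 54 - 6 = 48
```

A large positive value marks a vertical bright-to-dark edge. A real detector evaluates many such features over window positions and scales, and AdaBoost and the cascade decide which ones to compute; OpenCV's `cv2.CascadeClassifier` packages that whole pipeline.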

C. YOLO

YOLO [21] is an object detection algorithm in which the human face can be one of the classes. Conventional face detectors perform single-class detection, for example of the human face only, whereas YOLO can detect multiple classes of objects in an image.

YOLO was proposed as a faster alternative to the region-based R-CNN family of detectors. R-CNN uses a selective search technique to choose candidate regions: from a huge number of possible regions, about 2000 are selected. Selective search does not involve any learning, and the SVM classifies each of the 2000 regions.

To overcome the cost of processing 2000 regions separately, Fast R-CNN was developed; it feeds the complete image into the CNN instead of the 2000 regions. The CNN layers generate a feature map, onto which the candidate regions are projected. Faster R-CNN was later developed to remove the selective search technique, using a learned region proposal network instead.

All the R-CNN methods look only at the proposed regions, not at the complete image. In YOLO, the complete image is divided into a grid, and each grid cell predicts bounding boxes for the objects whose centre falls inside it. Each bounding box is given a probability value for each class of object.

Fig. 3. Face Detection using YOLO

TABLE III. COMPARISON OF FACE DETECTION ALGORITHMS

Face Detection Algorithm    Detection Time (s)    No. of Faces Detected
Haar Cascade                0.202                 7
Dlib                        0.63                  9
YOLO                        1.2                   12

TABLE III compares the three face detection algorithms discussed above based on the time taken to detect the faces and the number of faces detected. All three algorithms were given the same input image, which contains 12 faces in total. The face detection algorithms were executed on macOS with a 1.8 GHz Intel Core i5 processor and 8 GB of memory. Because YOLO uses a larger network with more layers, it takes a longer execution time. Still, it detected all the human faces in different orientations accurately. Therefore, on this test, YOLO detects faces better than Haar Cascade and Dlib.

V. FACE CLASSIFICATION

Human faces convey numerous pieces of information, such as age, gender and ethnicity, so these details about a person in an image can be obtained from the face. One application of estimating age-group, gender and ethnicity is the flood monitoring system, which estimates the water depth using humans as the reference object and determines the average height of a person from their facial features.

A. Based on Age and Gender

The CNN architectures Inception, VGG and ResNet have shown the best results compared to other architectures [8]. TABLE IV compares different CNN architectures trained and tested on different datasets for age-group and gender classification.

TABLE IV. COMPARISON OF CNN ARCHITECTURES BASED ON AGE-GROUP AND GENDER

Architecture                                  Year    Dataset            Accuracy
Inception v3 [22][23]                         2015    Adience Dataset    85%
ResNet [24]                                   2017    IMDB-WIKI          95.3%
GoogLeNet [25]                                2017    Adience Dataset    98%
LMTCNN (Lightweight Multi-Task CNN) [26]      2018    Adience Dataset    85%
VGG [27]                                      2018    MORPH-II           98.6%

Inception v3 was trained on an unconstrained image dataset and gives 85% accuracy. LMTCNN maintains the accuracy of Inception v3 while improving the execution speed and model size, so that it can be implemented on the Android operating system. Both GoogLeNet and VGG produced remarkable accuracy: GoogLeNet achieved 98% accuracy by using a pre-trained network, and VGG in [27] achieved 98.6% accuracy by exploiting transfer learning techniques. Because these results were obtained on different datasets, they are indicative rather than directly comparable. The popular datasets include images of people taken from different camera perspectives and against different backgrounds such as roads and parks, so these datasets can be used for crowdsourcing applications.

B. Based on Ethnicity

Based on the performance of different CNN algorithms in age-group and gender classification, ethnicity classification is performed using the VGG architecture, and the results are compared with a traditional ANN algorithm.

a) ANN

The parameters most commonly considered for ethnicity classification are skin color, forehead area and face shape. Two different experiments have been analysed.

Ethnicity classification in [28] using ANN includes two steps:
extraction of skin color, and normalized forehead area calculation using the Sobel edge detection method. The FERET database with 447 samples (357 for training and 90 for testing) is used, and the experimental results show 82% accuracy.

Ethnicity classification in [29] includes gender identification using PCA, face shape recognition using the active appearance model (AAM) and active shape model (ASM), feature and key point extraction, Euclidean distance calculation, and final classification using SVM. The experiment was tested on the Android platform, and the accuracy is 86.4%.

b) CNN

Ethnicity classification in [28] uses the FERET database with 357 samples and the VGG-16 architecture, with 13 convolutional layers, 3 fully connected layers, and pooling with a 2*2 window and stride 2. The ReLU activation function and the categorical cross-entropy loss function are used. The accuracy on testing is 98.6%. TABLE V compares the ANN and CNN architectures based on ethnicity.

TABLE V. COMPARISON OF ANN AND CNN ARCHITECTURES BASED ON ETHNICITY

Architecture    Dataset    Accuracy
ANN             FERET      82%
CNN (VGG)       FERET      98.6%

In ethnicity classification, the VGG architecture of CNN achieved a higher accuracy of 98.6%, compared with 82% using ANN (TABLE V). Both architectures used the FERET dataset for ethnicity classification. Therefore, in addition to age-group and gender classification, the VGG architecture performs accurately for ethnicity classification as well.

VI. DISCUSSION

From TABLE III, YOLO performs best for face detection. In our flood monitoring system, not every detected human face can be used for water depth estimation; for example, the face of a child carried on an adult's shoulders cannot be considered. Because some detections have to be discarded, the algorithm that detects the maximum number of faces has to be used, so that enough usable faces remain. Hence, YOLO is chosen for face detection in the flood monitoring system.

From TABLE IV and TABLE V, VGG performs well in all the facial-feature-based human classification tasks. For age-group and gender classification, GoogLeNet has also achieved 98% accuracy. Increasing the accuracy of the flood monitoring system helps to continuously monitor a flooded region over time, so that the system can determine whether the water level is increasing or decreasing. Hence, GoogLeNet is used for age-group and gender classification, and the VGG architecture is used for ethnicity classification in the flood monitoring system.

VII. CONCLUSION

We have compared state-of-the-art algorithms for human face detection and for facial-feature-based human classification, and have identified the best algorithm for each task based on performance and accuracy.

As part of future work, an extensive survey of the different layers in each architecture can help in identifying the scope for improving performance. Hybrid models of Inception and ResNet have, in general, proved to give better results than VGG. Hence, there is more scope for hybrid models in different applications.

ACKNOWLEDGMENT

We are extremely grateful to our beloved Chancellor, Dr. Mata Amritanandamayi Devi, also known as Amma, for providing us the guidance, motivation and a supportive environment to work on this project.

REFERENCES

[1] Agartha, R., C. Arunkumar, and K. RagheshKrishnan. "Automatic isolation and classification of vehicles in a traffic video." In 2011 World Congress on Information and Communication Technologies, pp. 357-361. IEEE, 2011.
[2] Kdnuggets.com, ‘The 10 Algorithms Machine Learning Engineers Need to Know’ [Online] Available on: https://www.kdnuggets.com/2016/08/10-algorithms-machine-learning-engineers.html [Accessed on: Jan 2019]
[3] Towardsdatascience.com, ‘Introduction to Various Reinforcement Learning Algorithms. Part I (Q-Learning, SARSA, DQN, DDPG)’ [Online] Available on: https://towardsdatascience.com/introduction-to-various-reinforcement-learning-algorithms-i-q-learning-sarsa-dqn-ddpg-72a5e0cb6287 [Accessed on: Jan 2019]
[4] Towardsdatascience.com, ‘Why Deep Learning over Traditional Machine Learning?’ [Online] Available on: https://towardsdatascience.com/why-deep-learning-is-needed-over-traditional-machine-learning-1b6a99177063 [Accessed on: Jan 2019]
[5] Nehru, Mangayarkarasi, and S. Padmavathi. "Illumination invariant face detection using viola jones algorithm." In 2017 4th International Conference on Advanced Computing and Communication Systems (ICACCS), pp. 1-4. IEEE, 2017.
[6] Narayanan, RamKumar, V. M. Lekshmy, Sethuraman Rao, and Kalyan Sasidhar. "A novel approach to urban flood monitoring using computer vision." In Fifth International Conference on Computing, Communications and Networking Technologies (ICCCNT), pp. 1-7. IEEE, 2014.
[7] Nair, Bhavana B., and Sethuraman N. Rao. "Poster: Flood Monitoring using Computer Vision." In Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services, pp. 165-165. ACM, 2017.
[8] medium.com, ‘CNN Architectures: LeNet, AlexNet, VGG, GoogLeNet, ResNet and more ....’ [Online] Available on: https://medium.com/@sidereal/cnns-architectures-lenet-alexnet-vgg-googlenet-resnet-and-more-666091488df5 [Accessed on: Jan 2019]
[9] towardsdatascience.com, ‘Simple Introduction to Convolutional Neural Networks’ [Online] Available on: https://towardsdatascience.com/simple-introduction-to-convolutional-neural-networks-cdf8d3077bac [Accessed on: Feb 2019]
[10] towardsdatascience.com, ‘A Simple Guide to the Versions of the Inception Network’ [Online] Available on: https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202 [Accessed on: Feb 2019]
[11] towardsdatascience.com, ‘Activation Functions in Neural Networks’ [Online] Available on: https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6 [Accessed on: Feb 2019]
[12] Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. "Going deeper with convolutions." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1-9. 2015.
[13] Szegedy, Christian, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. "Rethinking the inception architecture for computer vision." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2818-2826. 2016.
[14] Szegedy, Christian, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. "Rethinking the inception architecture for computer vision." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2818-2826. 2016.
[15] Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
[16] Huang, Furong, Jordan Ash, John Langford, and Robert Schapire. "Learning deep resnet blocks sequentially using boosting theory." arXiv preprint arXiv:1706.04964 (2017).
[17] Dang, Kirti, and Shanu Sharma. "Review and comparison of face detection algorithms." In 2017 7th International Conference on Cloud Computing, Data Science & Engineering-Confluence, pp. 629-633. IEEE, 2017.
[18] Becominghuman.ai, ‘Face Detection Using opencv With Haar Cascade Classifiers’ [Online] Available on: https://becominghuman.ai/face-detection-using-opencv-with-haar-cascade-classifiers-941dbb25177 [Accessed on: Mar 2019]
[19] computersciencesource.wordpress.com, ‘Computer Vision – The Integral Image’ [Online] Available on: https://computersciencesource.wordpress.com/2010/09/03/computer-vision-the-integral-image/ [Accessed on: Mar 2019]
[20] en.wikipedia.org, ‘AdaBoost’ [Online] Available on: https://en.wikipedia.org/wiki/AdaBoost [Accessed on: Mar 2019]
[21] towardsdatascience.com, ‘R-CNN, Fast R-CNN, Faster R-CNN, YOLO — Object Detection Algorithms’ [Online] Available on: https://towardsdatascience.com/r-cnn-fast-r-cnn-faster-r-cnn-yolo-object-detection-algorithms-36d53571365e [Accessed on: Mar 2019]
[22] Levi, Gil, and Tal Hassner. "Age and gender classification using convolutional neural networks." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 34-42. 2015.
[23] github.com, ‘Age/Gender detection in Tensorflow’ [Online] Available on: https://github.com/dpressel/rude-carnie [Accessed on: Jan 2019]
[24] Zhang, Ke, Ce Gao, Liru Guo, Miao Sun, Xingfang Yuan, Tony X. Han, Zhenbing Zhao, and Baogang Li. "Age group and gender estimation in the wild with deep ror architecture." IEEE Access 5 (2017): 22492-22503.
[25] Liu, Xuan, Junbao Li, Cong Hu, and Jeng-Shyang Pan. "Deep convolutional neural networks-based age and gender classification with facial images." In 2017 First International Conference on Electronics Instrumentation & Information Systems (EIIS), pp. 1-4. IEEE, 2017.
[26] Lee, Jia-Hong, Yi-Ming Chan, Ting-Yen Chen, and Chu-Song Chen. "Joint Estimation of Age and Gender from Unconstrained Face Images using Lightweight Multi-task CNN for Mobile Applications." In 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), pp. 162-165. IEEE, 2018.
[27] Smith, Philip, and Cuixian Chen. "Transfer Learning with Deep CNNs for Gender Recognition and Age Estimation." In 2018 IEEE International Conference on Big Data (Big Data), pp. 2564-2571. IEEE, 2018.
[28] Masood, Sarfaraz, Shubham Gupta, Abdul Wajid, Suhani Gupta, and Musheer Ahmed. "Prediction of human ethnicity from facial images using neural networks." In Data Engineering and Intelligent Computing, pp. 217-226. Springer, Singapore, 2018.
[29] Batsukh, Bat-Erdene, and Ganbat Tsend. "Effective Computer Model For Recognizing Nationality From Frontal Image." arXiv preprint arXiv:1603.04550 (2016).