Salah 2021

The Visual Computer
https://doi.org/10.1007/s00371-021-02108-3
ORIGINAL ARTICLE
A novel approach for human skin detection using convolutional neural network
Khawla Ben Salah1,2 · Mohamed Othmani3 · Monji Kherallah4
Accepted: 6 March 2021

© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2021
Abstract
Human skin detection is an important pre-processing step with a wide range of applications, such as face tracking, skin disease analysis, video surveillance, and web content filtering. It is a challenging problem because the appearance of skin color varies dramatically with factors such as illumination conditions, pose variations, race, aging, and complex backgrounds. Several skin detection methods assume that skin pixels can be separated from background colors by thresholding rules tied to a specific color model. Nevertheless, recognizing skin pixels under the aforementioned factors remains a complex task. In recent years, the success of deep convolutional neural networks (CNN) has strongly influenced the field of computer vision. However, we could find only a few studies that apply deep learning methods to the skin detection problem. This paper presents a novel CNN-based approach for skin detection. Extensive experiments show that the proposed approach exceeds the best results of other state-of-the-art methods.
Keywords Skin detection · Color models · Deep learning · Convolutional neural networks
Khawla Ben Salah (corresponding author)
khawlabensalah8@gmail.com
Mohamed Othmani
mohamed.othmani@yahoo.fr
Monji Kherallah
monji.kherallah@fss.usf.tn
1 National Engineering School of Sfax, University of Sfax, BP 1173, Sfax, Tunisia
2 Research Lab: Math, Earth Sciences, Modeling and Intelligent Systems, Faculty of Sciences of Gafsa, University of Gafsa, Gafsa, Tunisia
3 Faculty of Sciences of Gafsa, University of Gafsa, BP 2100, Gafsa, Tunisia
4 Faculty of Sciences of Sfax, University of Sfax, BP 1173, Sfax, Tunisia

1 Introduction
The detection of human skin, i.e., the process of distinguishing between “skin” and “non-skin” pixels in an image or a video, is an important task for several applications, ranging from face detection [1], semantic filtering of web contents [2], image enhancement [3], gesture analysis [4], surveillance systems [5], dermatology diagnostics [6], and driver fatigue detection [7] to a variety of computational health informatics tasks [8]. Skin detection has proven to be highly complicated: the appearance of skin varies widely with its own properties and is also influenced by a range of other factors (low illumination, camera characteristics, complex background, subject movement, etc.). Previous methods such as [9,10] have tried to characterize skin color in several color models and have suggested skin thresholds in these models. The best-known color models are RGB, YCbCr, and HSV [11–13]. They can be used separately or combined [14]. However, these methods rely strictly on the color distribution of the sample and involve no semantic information, so they do not reach excellent performance. Recently, convolutional neural networks (CNN) have shown very promising results on computer vision tasks such as image classification and object detection [15]. In this study, we present a CNN architecture and a training strategy for skin detection. To reduce false positives, our training strategy consists of patch-based training, which is robust to background clutter and detects skin pixels precisely. We propose an efficient algorithm that detects and extracts skin pixels from the whole RGB image after integrating the trained model. The proposed approach overcomes most of the difficulties in skin detection. We also use color purification methods before the skin color detection step to enhance the effectiveness of results, which was not present in previous studies. To improve the computational efficiency of the skin model, we generate a new dataset from SFA. Three methods for skin color detection in addition to our approach have been tested in this work.
2 Related work
2.1 Color model methods
Different color models are applied in many applications such as TV broadcasting, image processing, computer graphics, and computer vision [16–18]. The most common ones are YCbCr and HSV. The choice of color model is the first step in modeling skin color and in the subsequent classification [19]. According to [20], HSV is classified as a perceptually uniform model and YCbCr as an orthogonal model, and YCbCr is chosen as the most suited for the skin detection problem because orthogonal models are characterized by lower correlation between components.

2.1.1 YCbCr color model
This color model is often used in image compression. The luminance (also called Luma) is represented by the component Y and calculated as a weighted sum of the components R, G, and B [21]:
Y = 0.299 × R + 0.587 × G + 0.114 × B.    (1)
The other two components of this color model represent the chrominance, and they are calculated from Luma:
Cr = R − Y    (2)
Cb = B − Y;    (3)
only the two components Cb and Cr are retained for the application of the thresholding defined as follows [22]:
((Cb >= 77) and (Cb <= 127)),    (4)
and
((Cr >= 133) and (Cr <= 173)).    (5)

2.1.2 HSV color model
HSV is a color model called “natural”, in other words, close to the physiological perception of color by the human eye. It decomposes colors according to physiological criteria as illustrated in Fig. 1:
– H is the hue: corresponds to the color perception;
– S is the saturation: describes the color purity, i.e., whether the color is vivid or dull;
– V is the value: indicates the light quantity in the color, i.e., whether it has a light or dark appearance.

Fig. 1 Illustration of the HSV color model

In the HSV color model, the intensity is represented by the component V. This component must be abandoned in the process of skin detection. Only the H and S components, which represent the chrominance, are maintained. Skin color detection according to the HSV model is realized by converting the original image from the RGB model to the HSV model and respecting the following thresholds [23]:
0 <= H <= 0.25    (6)
0.15 <= S <= 0.9.    (7)

2.1.3 HSCbCr model
HSCbCr represents the combination of the HSV model and the YCbCr model. The H and S components are taken from the HSV model, and the two other components (Cb, Cr) are taken from the YCbCr model. Y and V contain the lighting information, which adds no information to the color of human skin. Therefore, these two components have been excluded.

2.2 Methods based on deep learning
A fair comparison between works based on deep learning for skin detection is difficult due to the unavailability of a common benchmark. A fully convolutional neural network method for skin segmentation was presented in [24]. The authors experimented with many CNN structures to determine the best one. A handcrafted skin dataset was provided in their study, consisting of three well-known datasets: ECU, SFA, and Pratheepan. The first and second datasets were used
during the training phase. Pratheepan was used in the testing phase. The authors compare their work to the state-of-the-art methods (Bayesian [2], FSD [25], LASD [26], FPSD [27], DSPF [28], SPSD [29], patch-VGG [30], patch-NiN [30], Image-VGG [30], and Image-NiN [30]) using the Pratheepan dataset. Table 5 presents the results of the related work mentioned in [24] as well as the results of our approach.

3 Proposed approach
This section presents details of the proposed approach. First, we give a general overview of the approach. After that, we discuss each phase in detail. The same algorithm steps are applied to the color models in order to compare our approach to classical methods, as detailed in Sect. 3.4.

3.1 Overview
In this paper, we attempt to address the problem of skin detection in two steps. The first step of our proposed approach is training a patch-based convolutional neural network (Sect. 3.2), which takes as input patches classified as skin or non-skin. To train our model, we generated a new dataset from the SFA dataset as shown in Fig. 2. For the second step, we propose a skin detection algorithm which scans the whole image after integrating the trained model. This approach (Sect. 3.3) can detect, localize, and extract the human skin region of the input image. All steps of the proposed approach are shown in Fig. 5.

Fig. 2 Generation of a new dataset from SFA. Training set, testing set, and validating set are three folders each containing a set of skin patches and a set of non-skin patches

3.2 Convolutional neural network for skin detection
Convolutional neural network (CNN) is a category of models dedicated to extracting features from 2D inputs (e.g., images) [31]. CNNs have served in many applications such as image processing [32], pattern recognition [33], and several other types of cognitive tasks [34]. A typical CNN includes three types of layers: the convolutional layer, the max-pooling layer, and the fully connected layer. In this study, CNN has been used successfully as a skin classifier. The training strategy used in this work is based on skin and non-skin patches because of its rapid convergence time compared with whole image-based training. The patches go through a stack of multiple convolutions and one max-pooling layer, followed by two fully connected layers and an output layer, as shown in Fig. 3. The convolution layers (1, 2, 3) are convolved with their respective kernel numbers (16, 32, and 64). After the block of convolution layers, the max-pooling layer, also known as a down-sampling layer, is applied to the feature maps. It was employed to minimize computational complexity and control overfitting. The stride for previous layers is set at 1. Padding was used after the first convolution layer, and it was set at 1. Nonlinearity is introduced into the model by equipping all layers with the rectified linear unit (ReLU) activation function. ReLU does not saturate for positive inputs, which gives better gradient propagation than the sigmoid and hyperbolic tangent activation functions. The mathematical form of the ReLU function is as follows:
f(x) = max(0, x)    (8)
where x is the input to a neuron. 1 × 1 filters are employed in the first and the third convolutional layers and a 2 × 2 filter in the second one, since skin patches from 1 × 1 to 35 × 35 are used, which is different from most of the previous models that used larger filters (e.g., 3 × 3 or 5 × 5). For training this architecture, a large dataset, SFA, is used, consisting of approximately 160,992 skin and non-skin patches. Positive samples coincide with skin image patches, and negative ones correspond to non-skin image patches. We were convinced that the SFA dataset is large and diverse enough to detect skin pixels due to the variation in races and colors of these samples. The parameters of each layer of the proposed skin detection CNN model are presented in Table 1. Because we are facing a binary classification problem (skin or non-skin) and the output of our model is a probability (we end our network with a single-unit layer with a sigmoid activation), the best choice was to configure the model with the rmsprop optimizer and the binary cross-entropy loss function. Dropout [35] is a regularization model
with low calculation cost and strong deep learning ability. In dropout, a hyper-parameter, the neuron sampling probability (p), is chosen. While the default value is 0.5, it is not a norm, and thus it must be tested with different data and networks. To prevent overfitting, we inject dropout several times; in particular, each hidden unit in the model must learn to cooperate with different sampled neurons, which makes the neurons more robust and leads them to acquire useful features rather than relying on other neurons to rectify their errors. Finally, the training and testing of the CNN were done with 100 epochs and a batch size of 128 samples. These parameters were changed until the optimum performance was reached (Fig. 4).

Fig. 3 The proposed skin detection CNN model

Fig. 4 An example of a skin pixel map obtained with YCbCr method. a The ground truth image and b the skin pixel map

3.3 The proposed algorithm
The algorithm for human skin detection based on the trained model is explained as follows:
1. Input image is an RGB image from the new generated SFA dataset;
2. Function: Extraction of the mask. Step 0 to step 9 are shown in Fig. 5;
(a) A new image is created with the same dimensions as the original images (512 × 768) but in grayscale, initialized to 0 (black) and named image number 1;
(b) A new grayscale image is created with dimensions 1 × 1, initialized to 255 (white) and named image number 2;
(c) The original image is scanned pixel by pixel, and each pixel is checked as skin or non-skin with the trained CNN model;
(d) If the pixel is skin, image number 2 is pasted on image number 1 at the tested pixel’s coordinates.
3. The mask is smoothed (purification) using morphological operations (thresholding, closing, opening);
4. The skin area is extracted by tracing the mask on the original image. The result is illustrated in Fig. 6;
5. Evaluation of skin detection is performed using the ground truth and the extracted skin map.

3.4 Skin detection using color models
The algorithm used in the detection of human skin in colored images is explained as follows:
1. Input image is an RGB image from the new generated SFA dataset as seen in Fig. 2;
2. The mask is obtained for each image by applying the predefined thresholding;
3. The mask is smoothed (purification) using morphological operations (thresholding, closing, opening);
4. The skin area is extracted by tracing the mask on the original image;
5. Evaluation of skin detection is performed using ground truth images, where the skin pixels have been manually selected. The ground truth is fundamental to generate
four values: true positive (TP), true negative (TN), false positive (FP), and false negative (FN) that are obtained by comparing the ground truth image and the obtained skin map. An example is seen in Fig. 4, where (a) is the ground truth image and (b) is an example of the skin pixel map of the YCbCr method.

Fig. 5 Framework of the proposed skin detection algorithm. The skin map is constructed progressively and displayed in steps 0–9. Each pixel of the image is predicted with the trained model

Table 1 The architecture of the proposed skin detection CNN model
Layers No.  Type           Kernel size  No. kernels  Stride  Output shape
Layer 1     Conv 1         1×1          16           1       1 × 1 × 16
Layer 2     Conv 2         2×2          32           1       2 × 2 × 32
Layer 3     Conv 3         1×1          64           1       2 × 2 × 64
Layer 4     Max pooling 1  2×2          0            –       1 × 1 × 64
Layer 5     Flatten 1      –            –            –       64
Layer 6     Dense 1        –            128          –       128
Layer 7     Dense 2        –            64           –       64
Layer 8     Dense 3        –            1            –       1
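The output shapes listed in Table 1 can be checked with the usual output-size formula for a sliding window, floor((n + 2p − k) / s) + 1. The sketch below is only an illustration under stated assumptions: a 1 × 1 input patch, a padding of 1 applied before the second convolution (as described in Sect. 3.2), and a pooling stride of 2, which is an assumption that does not affect the result for a 2 × 2 input. The function names are hypothetical and not part of the authors' code.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial size after sliding a kernel x kernel window over an n x n input."""
    return (n + 2 * padding - kernel) // stride + 1


def table1_shapes(patch=1):
    """Trace the layer output shapes of Table 1 for a square input patch."""
    shapes = []
    n = conv_output_size(patch, kernel=1)          # Conv 1: 1x1 kernel, 16 kernels
    shapes.append((n, n, 16))
    n = conv_output_size(n, kernel=2, padding=1)   # Conv 2: 2x2 kernel after padding of 1
    shapes.append((n, n, 32))
    n = conv_output_size(n, kernel=1)              # Conv 3: 1x1 kernel, 64 kernels
    shapes.append((n, n, 64))
    n = conv_output_size(n, kernel=2, stride=2)    # Max pooling 2x2 (stride is assumed)
    shapes.append((n, n, 64))
    shapes.append((n * n * 64,))                   # Flatten
    shapes.extend([(128,), (64,), (1,)])           # Dense 1, Dense 2, Dense 3
    return shapes


if __name__ == "__main__":
    for index, shape in enumerate(table1_shapes(), start=1):
        print(f"Layer {index}: {shape}")
```

For a 1 × 1 patch, the traced shapes agree with the Output shape column of Table 1, which suggests that the table describes the network as seen by a single-pixel input.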
Fig. 6 Result of skin detection with CNN approach on an example of SFA dataset
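The mask extraction of Sect. 3.3 (steps 2a to 2d) and the tracing of the mask on the original image (step 4) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the image is a plain nested list of RGB tuples, the morphological purification of step 3 is omitted, and `toy_classifier` is a hypothetical stand-in for the trained CNN.

```python
def extract_mask(image, is_skin):
    """Build a binary mask by classifying every pixel of an RGB image.

    image   : list of rows, each row a list of (R, G, B) tuples
    is_skin : callable that takes one (R, G, B) tuple and returns True or False;
              in the paper this role is played by the trained CNN model
    """
    height, width = len(image), len(image[0])
    mask = [[0] * width for _ in range(height)]   # step (a): black grayscale image
    for y in range(height):                       # step (c): scan pixel by pixel
        for x in range(width):
            if is_skin(image[y][x]):
                mask[y][x] = 255                  # steps (b) and (d): paste a white pixel
    return mask


def apply_mask(image, mask):
    """Keep the pixels where the mask is white and blacken the others (step 4)."""
    return [
        [pixel if m else (0, 0, 0) for pixel, m in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]


def toy_classifier(pixel):
    """Hypothetical stand-in for the trained model: reddish pixels count as skin."""
    r, g, b = pixel
    return r > 150 and r > g > b


if __name__ == "__main__":
    image = [[(200, 150, 120), (10, 10, 10)],
             [(0, 0, 255), (220, 170, 140)]]
    mask = extract_mask(image, toy_classifier)
    print(mask)
    print(apply_mask(image, mask))
```

Passing the classifier as a callable keeps the scanning loop independent of the model, so the same loop could also be driven by one of the color-model rules of Sect. 2.1.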
4 Experimental results
Our algorithm was evaluated on a PC workstation with a 2.6 GHz CPU, 32 GB of memory, and an NVIDIA GeForce GTX 1650 GPU card. All methods were executed using Spyder (Python 3.7). The best model results are obtained in epoch number 84. It takes 31 ms/step and leads to training loss = 0.1026, training binary accuracy = 0.9625, validation loss = 0.0413, and validation binary accuracy = 0.9485 as shown in Fig. 10. The proposed approach was tested on the SFA dataset and the Pratheepan dataset. The SFA dataset [36] is based on images of the FERET and AR face datasets. SFA is made up of the original images, the ground truths for benchmarking skin detection, and the patches of skin and non-skin. The dimensions of patches vary from 1 × 1 pixels to 35 × 35 pixels as shown in Fig. 7. The total number of images is 163,228. Table 2 shows the numbers of the new generated dataset.

Fig. 7 Illustration of samples on SFA dataset

Table 2 Numbers of the new generated dataset
Types of image      Quantity  Quantity used
Original images     1118      876 (a)
Ground truths       1118      876 (a)
Skin samples        3354 (b)  3354 (b)
Non-skins samples   5590 (b)  5590 (b)
Total of images     163,228   162,744
(a) Images have the same dimension (512 × 768)
(b) For each one of the 18 different dimensions

This dataset combines photos of people from various ethnic groups and contains some photos with illumination variation. In addition, there are some photos with complex non-skin but skin-colored regions. Pratheepan [37] is a small dataset that includes 78 images downloaded from Google. The dataset is divided into two subsets: FacePhoto, including 32 single-subject images, and FamilyPhoto, including 46 images with multiple subjects and complex backgrounds.

To evaluate our experiments, we defined five metrics: accuracy, recall, IOU, MCC, and F-measure. Accuracy can be abnormally high despite a small number of true positives (TP) or false negatives (FN). In such cases, F-measure and MCC are excellent quantification measures for binary classification. In skin detection studies, TP is the number of skin pixels correctly classified as skin pixels. TN is the number of non-skin pixels correctly classified as non-skin pixels. FP is the number of non-skin pixels incorrectly classified as skin pixels. FN is the number of skin pixels incorrectly classified as non-skin pixels. A confusion matrix as shown in Table 3 details the values (TP, TN, FP, and FN) used for the evaluation of skin detection.

Table 4 illustrates the comparison between our proposed approach and other color models. The proposed approach achieved an accuracy, a recall, and an IOU of 92.39%, 96.00%, and 0.7601, respectively. It can be noted that the best F-measure and Matthews correlation coefficient (MCC) were achieved by the proposed method, i.e., 95.00% and 81.26%, which demonstrates its excellent capacity for addressing the challenge. The MCC of 81% is high, also indicating that the ground truth and the predicted image have a high correlation. Figure 8 shows the skin detection result obtained using the proposed algorithm. The first column of Fig. 8 represents an original image from the SFA dataset. The second column denotes the result of the YCbCr color model on the same dataset. The third column denotes the result of HSV. The next one presents the result of a combination of HSV and YCbCr. The last column represents the results of the proposed CNN approach. The color-model methods do not match the results of our proposed approach (Fig. 8e). If the hair (e.g., golden hair) or the clothes have the same color as the skin, they are detected as skin regions, resulting in false positives, as shown in Fig. 8b–d. However, the background is correctly classified if its color differs from the skin color. The problem with these methods is that they classify clothes and hair as skin for some images. Our proposed approach succeeds in identifying some parts of the face (lips, eyebrows, eyes, neck), accessories (earrings, glasses, necklace), shadow, clothes, and hair with its different colors as non-skin.

Table 5 shows the skin detection performance on Pratheepan; it compares the methods by accuracy, precision, recall, and F-measure. The proposed method obtains the best precision, recall, and F-measure, while FCNN [24] obtains the highest accuracy. As seen in Fig. 9, the results correlate strongly with the ground truth. The proposed approach succeeds in detecting skin in a real-world scenario, with multiple people and in a natural setting, as shown in Fig. 9 (Fig. 10).

Table 3 Confusion matrix of binary skin detection problem
              Predicted as skin    Predicted as non-skin
GT skin       True positive (TP)   False negative (FN)
GT non-skin   False positive (FP)  True negative (TN)
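The paper does not spell out the formulas behind its metrics, so the sketch below uses the standard textbook definitions of accuracy, precision, recall, F-measure, IOU, and MCC in terms of the confusion-matrix counts of Table 3. It is an illustration under that assumption, with hypothetical function names, and it assumes that none of the denominators is zero.

```python
import math


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def skin_metrics(tp, tn, fp, fn):
    """Standard binary-classification metrics from pixel counts.

    Assumes every denominator is non-zero, i.e. both classes occur
    in the ground truth and in the prediction.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure(precision, recall),
        "iou": tp / (tp + fp + fn),
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
    }


if __name__ == "__main__":
    # Made-up pixel counts, used only to exercise the formulas.
    for name, value in skin_metrics(tp=90, tn=80, fp=20, fn=10).items():
        print(f"{name}: {value:.4f}")
    # Precision and recall reported for the proposed method in Table 5.
    print(f"{f_measure(0.9801, 0.9730):.4f}")
```

Under these definitions, the precision (0.9801) and recall (0.9730) reported for the proposed method in Table 5 give an F-measure of about 0.9765, which agrees with the value listed there.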
Fig. 8 Example of results of human skin detection with different methods on SFA dataset. a Original image, b skin detection with YCbCr, c skin detection with HSV, d skin detection with HSCbCr, and e skin detection with the proposed approach
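The three color-model rules compared in Fig. 8 can be sketched per pixel from the thresholds in Eqs. (4) to (7). Several details here are assumptions rather than statements from the paper: Eqs. (2) and (3) give only the differences R − Y and B − Y, so the sketch uses the common 8-bit form with an offset of 128 and scale factors of 0.713 and 0.564, which places the values in the range where the thresholds of Eqs. (4) and (5) are meaningful; H and S are taken in [0, 1] as returned by Python's `colorsys`; and the HSCbCr rule is assumed to be the conjunction of both rules, since the paper does not list separate thresholds for it.

```python
import colorsys


def rgb_to_cbcr(r, g, b):
    """Chrominance of an 8-bit RGB pixel (offset and scale factors are assumed)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # Eq. (1)
    cb = 128 + 0.564 * (b - y)              # scaled and offset version of Eq. (3)
    cr = 128 + 0.713 * (r - y)              # scaled and offset version of Eq. (2)
    return cb, cr


def is_skin_ycbcr(r, g, b):
    """Thresholds of Eqs. (4) and (5)."""
    cb, cr = rgb_to_cbcr(r, g, b)
    return 77 <= cb <= 127 and 133 <= cr <= 173


def is_skin_hsv(r, g, b):
    """Thresholds of Eqs. (6) and (7); the V component is ignored."""
    h, s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return 0 <= h <= 0.25 and 0.15 <= s <= 0.9


def is_skin_hscbcr(r, g, b):
    """Assumed combination: a pixel must satisfy both rules."""
    return is_skin_hsv(r, g, b) and is_skin_ycbcr(r, g, b)


if __name__ == "__main__":
    for pixel in [(220, 170, 140), (0, 0, 255), (128, 128, 128)]:
        print(pixel, is_skin_ycbcr(*pixel), is_skin_hsv(*pixel), is_skin_hscbcr(*pixel))
```

Any of these functions can serve as the per-pixel decision in step 2 of Sect. 3.4. Because the rules look only at color, skin-colored hair or clothes pass the thresholds as well, which matches the false positives discussed for Fig. 8b–d.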
Fig. 9 Experimental results on Pratheepan dataset. a Original image, b skin detection with the proposed approach, c ground truth image, and d
skin map obtained with the proposed approach
Fig. 10 Loss and accuracy curves (training and validation)
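The quantities plotted in Fig. 10 follow from the configuration described in Sect. 3.2: a sigmoid output, the binary cross-entropy loss, and binary accuracy. The sketch below shows how these are computed for a small batch. It is a plain-Python illustration with made-up probabilities, not the authors' training code; the clipping constant `eps` and the 0.5 decision threshold are assumptions.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    hits = sum(1 for y, p in zip(y_true, y_prob) if (p >= threshold) == bool(y))
    return hits / len(y_true)


if __name__ == "__main__":
    labels = [1, 0, 1, 0]                # 1 = skin patch, 0 = non-skin patch
    probabilities = [0.9, 0.2, 0.4, 0.6]
    print(binary_cross_entropy(labels, probabilities))
    print(binary_accuracy(labels, probabilities))
```

A model that outputs 0.5 for every patch has a loss of ln 2 (about 0.693), so the training and validation losses reported in Sect. 4 (0.1026 and 0.0413) are well below that uninformative baseline.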
Table 4 Performances of different skin detection methods on SFA dataset
         Accuracy  Recall  F-measure  MCC     IOU
YCbCr    0.8898    0.8700  0.9200     0.7655  0.6988
HSV      0.8597    0.8200  0.8900     0.7220  0.6488
HSCbCr   0.8477    0.8000  0.8800     0.7060  0.6323
Ours     0.9239    0.9600  0.9500     0.8126  0.7601
Bold is used to identify which method performs best on each metric

Table 5 Evaluation results on Pratheepan dataset
Methods         Accuracy  Precision  Recall  F-measure
Bayesian [2]    0.8237    0.6881     0.8972  0.7788
FSD [25]        0.8255    0.8077     0.6851  0.7414
LASD [26]       0.8361    0.7954     0.8275  0.8111
FPSD [27]       0.8419    0.7387     0.8991  0.8070
DSPF [28]       0.8521    0.7543     0.8436  0.7964
SPSD [29]       0.8782    0.7659     0.9328  0.8412
Patch-VGG [30]  0.9299    0.8563     0.8750  0.8655
Patch-NiN [30]  0.9334    0.8802     0.8972  0.8886
Image-VGG [30]  0.9313    0.8577     0.9069  0.8816
Image-NiN [30]  0.9484    0.9003     0.8912  0.8957
FCNN [24]       0.9499    0.8480     0.8981  0.8678
Ours            0.9357    0.9801     0.9730  0.9765
Bold is used to identify which method performs best on each metric

5 Conclusion
In this work, a novel approach based on convolutional neural networks for human skin detection is presented. The fundamental objective of this study was to explore the potential of the CNN learning model for skin pixel classification. A comparison is carried out between existing methods and our approach. As the experiments show, the proposed method achieves the best recall and F-measure on both datasets and the best precision on Pratheepan; its accuracy is the highest on SFA and close to the highest on Pratheepan (Tables 4 and 5). These results hold in different conditions: complex background, variation in the illumination level, and ethnicity. The comparison was performed using the SFA and Pratheepan datasets. We trained the CNN with skin and non-skin patches from the SFA dataset instead of a whole-image-based strategy. We found that this training strategy captures skin color as well as its texture. Then, when we integrated the trained model into the global algorithm for human skin detection, it was efficient in rejecting non-skin pixels in various situations.

Declarations
Conflict of interest The authors declared that they have no conflicts of interest to this work.

References
1. Chen, W., Wang, K., Jiang, H., et al.: Skin color modeling for face detection and segmentation: a review and a new approach. Multimed. Tools Appl. 75, 839–862 (2016)
2. Jones, M.J., Rehg, J.M.: Statistical color models with application to skin detection. Int. J. Comput. Vis. 46, 81–96 (2002). https://doi.org/10.1023/A:1013200319198
3. Zafarifar, B., Bellers, E.B., de With, P.H.: Application and evaluation of texture-adaptive skin detection in TV image enhancement. In: IEEE International Conference on Consumer Electronics (ICCE), pp. 88–91 (2013). https://doi.org/10.1109/ICCE.2013.6486807
4. Rautaray, S.S., Agrawal, A.: Vision based hand gesture recognition for human computer interaction: a survey. Artif. Intell. Rev. 43, 1–54 (2015). https://doi.org/10.1007/s10462-012-9356-9
5. Zhang, Z., Gunes, H., Piccardi, M.: Head detection for video surveillance based on categorical hair and skin colour models. In: IEEE International Conference on Image Processing, pp. 1137–1140 (2009)
6. Schaefer, G., Tait, R., Zhu, S.Y.: Overlay of thermal and visual medical images using skin detection and image registration. In: International Conference of the IEEE Engineering in Medicine and Biology Society, NY, vol. 2, pp. 965–967 (2006). https://doi.org/10.1109/IEMBS.2006.259275
7. Devi, M.S., Bajaj, P.R.: Driver fatigue detection based on eye tracking. In: First International Conference on Emerging Trends in Engineering and Technology, pp. 649–652 (2008). https://doi.org/10.1109/ICETET.2008.17
8. Fang, R., Pouyanfar, S., Yang, Y., Chen, S.-C., Iyengar, S.: Computational health informatics in the bigdata age: a survey. ACM Comput. Surv. 49, 12 (2016)
9. Erdem, C.E., Ulukaya, S., Karaali, A., Erdem, A.T.: Combining Haar feature and skin color based classifiers for face detection. In: IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings, pp. 1497–1500 (2011)
10. Zhu, Q., Cheng, K.T., Wu, C.T., Wu, Y.L.: Adaptive learning of an accurate skin-color model. In: IEEE International Conference on Automatic Face and Gesture Recognition, pp. 37–42 (2004)
11. Al-Tairi, Z., Wirza, R., Saripan, M.I., Sulaiman, P.: Skin segmentation using YUV and RGB color spaces. J. Inf. Process. Syst. 10, 283–299 (2014)
12. Rahman, M.A., Edy Purnama, I.K., Purnomo, M.H.: Simple method of human skin detection using HSV and YCbCr color spaces. In: International Conference on Intelligent Autonomous Agents, Networks and Systems, pp. 58–61 (2015)
13. Bin Abdul Rahman, N.A., Wei, K.C., See, J.: RGB-HCbCr skin colour model for human face detection. In: Proceedings of The MMU International Symposium on Information and Communications Technologies, pp. 90–96 (2006)
14. Hajiarbabi, M., Agah, A.: Face detection in color images using skin segmentation. J. Autom. Mob. Robot. Intell. Syst. 8, 41–51 (2014)
15. Li, Y., Wang, Z., Yang, X., et al.: Efficient convolutional hierarchical autoencoder for human motion prediction. Vis. Comput. 35, 1143–1156 (2019). https://doi.org/10.1007/s00371-019-01692-9
16. Ganesan, P., Rajini, V.: YIQ color space based satellite image segmentation using modified FCM clustering and histogram equalization. In: Advances in Electrical Engineering (ICAEE), pp. 9–11 (2014)
17. Ganesan, P., Rajini, V.: Assessment of satellite image segmentation in RGB and HSV color space using image quality measures. In: Advances in Electrical Engineering (ICAEE), pp. 9–11 (2014)
18. Ganesan, P., Rajini, V.: Value based semi automatic segmentation of satellite images using HSV color space, histogram equalization and modified FCM clustering algorithm. In: Green Computing, Communication and Conservation of Energy (ICGCE), p. 77 (2013)
19. Nikolskaia, K., Ezhova, N., Sinkov, A., Medvedev, M.: Skin detection technique based on HSV color model and SLIC segmentation method. In: CEUR Workshop Proceedings, pp. 1323–1355 (2018)
20. Kakumanu, P., Makrogiannis, S., Bourbakis, N.: A survey of skin-color modeling and detection methods. Pattern Recognit. (2007). https://doi.org/10.1016/j.patcog.2006.06.010
21. Sack, H., Meinel, C.: Digitale Kommunikation: Vernetzen, Multimedia, Sicherheit. Springer, Berlin (2009)
22. Chai, D., Ngan, K.N.: Face segmentation using skin-color map in videophone applications. IEEE Trans. Circuits Syst. Video Technol. 9(4), 551–564 (1999)
23. Chitra, S., Balakrishnan, G.: Comparative study for two color spaces HSCbCr and YCbCr in skin color detection. Appl. Math. Sci. 6, 4229–4238 (2012)
24. Ma, C., Shih, H.: Human skin segmentation using fully convolutional neural networks. Nara (2018). https://doi.org/10.1109/GCCE.2018.8574747
25. Tan, W.R., Chan, C.S., Yogarajah, P., Condell, J.: A fusion approach for efficient human skin detection. IEEE Trans. Ind. Inform. 8, 138–147 (2012)
26. Hwang, I., Lee, S.H., Min, B., Cho, N.I.: Luminance adapted skin color modeling for the robust detection of skin areas. In: Proceedings of IEEE ICIP, pp. 2622–2625 (2013)
27. Kawulok, M.: Fast propagation based skin regions segmentation in color images. In: Proceedings of IEEE FG, pp. 1–7 (2013)
28. Kawulok, M., Kawulok, J., Nalepa, J.: Spatial based skin detection using discriminative skin presence features. Pattern Recognit. Lett. 41, 3–13 (2014)
29. Hwang, I., Kim, Y., Cho, N.I.: Skin detection based on multi-seed propagation in a multi-layer graph for regional and color consistency. In: IEEE ICASSP (2017). https://doi.org/10.1109/ICASSP.2017.7952361
30. Kim, Y., Hwang, I., Cho, N.I.: Convolutional neural networks and training strategies for skin detection. In: IEEE ICIP (2017). https://doi.org/10.1109/ICIP.2017.8297017
31. Lecun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
32. Ahmad, J., Muhammad, K., Bakshi, S., Baik, S.W.: Object-oriented convolutional features for fine-grained image retrieval in large surveillance datasets. Future Gener. Comput. Syst. 81, 314–330 (2018)
33. Mudassar, R., Muhammad, S., Mussarat, Y., Attique, K.M., Tanzila, S., Lawrence, F.S.: Appearance based pedestrians’ gender recognition by employing stacked auto encoders in deep learning. Future Gener. Comput. Syst. 88, 28–39 (2018)
34. Hong, T.J., Bhandary, S.V., Sobha, S., Yuki, H., Akanksha, B., Raghavendra, U., et al.: Age-related macular degeneration detection using deep convolutional neural network. Future Gener. Comput. Syst. 87, 127–135 (2018)
35. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)
36. Casati, J.P.B., Moraes, D.R., Rodrigues, E.L.L.: SFA: a human skin image database based on FERET and AR facial images. In: IX Workshop de Visao Computational, Rio de Janeiro (2013)
37. Yogarajah, P., Condell, J., Curran, K., Cheddad, A., McKevitt, P.: A dynamic threshold approach for skin segmentation in color images. In: Proceedings of IEEE ICIP, pp. 2225–2228 (2010)

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Khawla Ben Salah is a PhD student at the National Engineering School of Sfax and received the Ing. Diploma degree in 2014 from University of Manouba (ISAMM). Her research interests are computer vision, deep learning and machine learning.

Mohamed Othmani received B.Sc. degree in computer science engineering from the National School of Engineering, University of Sfax, Tunisia, M.S and PhD degrees in computer systems engineering from the National School of Engineering, University of Sfax, Tunisia. His research interests include computer vision and image analysis. His research activities are centered on deep learning, image and signal processing, 3D object modeling, classification and recognition, theory and tools of neural networks. He is a member of the Research Group on Intelligent Machines (REGIM). He is teaching computer Science at the Faculty of Sciences, University of Gafsa, Tunisia.

Monji Kherallah was born in Sfax, Tunisia. He received Ing. Diploma degree, Ph.D. and HU in electrical engineering, respectively, in 1989, 2008 and 2012, from University of Sfax (ENIS). For fourteen years, he was an engineer in Biotechnology Center of Sfax. Now he is Professor in Faculty of Science of Sfax. He is founder of a professional masters degree “Metrology and Industrial Instrumentation” at the Faculty of Sciences of Sfax. His research interest includes signal and image processing. The techniques used are based on intelligent methods, such as neural network, fuzzy logic, genetic algorithm, etc. In fact his research work is based on machine learning and deep learning. The applications are focused on Arabic document analysis and recognition, biometrics, robotics, etc. He has more than 100 papers, including journal and conference papers and book chapters. His contribution also includes some international projects such as the DAAD project from April 2007 to March 2010 and the EOLES Erasmus+ project from September 2012 to 2015. His teaching skills are around these themes: Analog Electronics, Digital Electronics, Power Electronics, Industrial Maintenance, Sensors and Instrumentation, Pattern Recognition, Artificial Intelligence. He is a member of IEEE and IEEE AESS Tunisia Chapter Chair, 2010 and 2011. He is reviewer of several international journals.
