Face Detection Evaluation
called false negative (FN). Precision gives an idea of how well the detected area matches the ground-truth area. Recall, on the other hand, gives an idea of how well the ground truth overlaps with the detected area. All the measures described below have recall and precision counterparts so that both FP and FN errors are accounted for.
When calculating overlaps, the spatial union of boxes is considered, which makes sure that overlapped areas are not counted twice.

2.1.1 Object Count Accuracy

This measure compares the number of ground-truth objects in the frame with the number of algorithm output boxes. It penalizes the algorithm both for extra and for missing boxes relative to the ground truth. Let G be the set of ground-truth objects in the image and let D be the set of output boxes produced by the algorithm. Accuracy is defined as:
Accuracy = { undefined                        if NG + ND = 0
           { min(NG, ND) / ((NG + ND) / 2)    otherwise

where NG and ND are the number of ground-truth objects and output boxes, respectively, in the image. The measure does not consider the spatial information of these boxes; only the count of boxes in each frame is used. This measure can be useful for evaluating how well an algorithm identifies the number of objects in a given image, irrespective of how close they are to the ground-truth objects. Consider a scenario with 10 ground-truth objects in which algorithm A finds 8 boxes and algorithm B finds 2 boxes; A is clearly better than B at identifying the number of objects in the image. To measure accuracy in terms of area overlap, there are other measures.
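As a minimal sketch, the count-based accuracy above can be computed as follows (the function name and the `None` return for the undefined case are my own choices):

```python
def object_count_accuracy(n_gt, n_det):
    """Object Count Accuracy (Sec 2.1.1): compares only the box counts,
    ignoring all spatial information. Returns None for the undefined
    case NG + ND = 0."""
    if n_gt + n_det == 0:
        return None
    # min(NG, ND) normalized by the average of the two counts
    return min(n_gt, n_det) / ((n_gt + n_det) / 2)
```

For the scenario above, `object_count_accuracy(10, 8)` gives about 0.89 for algorithm A, while `object_count_accuracy(10, 2)` gives about 0.33 for algorithm B.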
Figure 1: Recall and Precision Concept

The measures are organized in increasing order of complexity and accuracy. The first measure, Object Count Accuracy (Sec 2.1.1), is a trivial measure: it simply compares the number of detected objects with the number of ground-truth objects without checking how accurately they overlap. Next, the pixel-based measures, which check the raw pixel overlap between the output boxes and the ground-truth boxes, are defined in Secs 2.1.2 and 2.1.3. Here the entire frame is treated as a bit map, with no distinction made between different objects; if one detected box overlapped another, these measures would make no distinction, since they consider the union of the areas. Here, bigger boxes have an advantage over smaller boxes. The measures discussed in Secs 2.1.4 and 2.1.5 are area-thresholded measures: if the overlap between a ground-truth box and a detected box is greater than a threshold, full credit is given for that box pair. Next, the area-based measures are discussed in Secs 2.1.6 and 2.1.7; these treat the individual boxes equally regardless of size, in contrast to the pixel-based measures, which favor bigger boxes, and they take the individual objects into account rather than ignoring such distinctions. In Sec 2.1.8, the fragmentation measure is discussed; it penalizes algorithms that break an individual ground-truth box into multiple detected boxes. We also propose a set of measures based on the requirement of a one-to-one mapping between each ground-truth box and a detected box. The positional accuracy of the detection output with respect to the ground truth is measured in Sec 2.2.1, a size-based measure is discussed in Sec 2.2.2, and Sec 2.2.3 discusses an orientation-based measure.
Finally, we propose a composite measure in Sec 2.2.4, which is area-based and takes into account the recall, precision and fragmentation.
2.1.2 Pixel-based Recall

This measure captures how well the algorithm minimizes false negatives. It is a pixel-count-based measure. Let UnionG and UnionD be the spatial unions of the boxes in G and D:

UnionG = ∪_{i=1..NG} Gi

where Gi represents the ith ground-truth object in the image, and

UnionD = ∪_{i=1..ND} Di

where Di represents the ith detected object in the image. We define Recall as the ratio of the detected area within the ground truth to the total ground-truth area:

Recall = { undefined                       if |UnionG| = 0
         { |UnionG ∩ UnionD| / |UnionG|    otherwise

where the | | operator denotes the number of pixels in the given area. This measure treats the frame not as a collection of objects but as a binary pixel map (object/non-object; output-covered/not-output-covered). The score increases as the overlap increases and equals 1 for complete overlap.
2.1.3 Pixel-based Precision

This measure captures how well the algorithm minimizes false positives. It is a pixel-count-based measure. With UnionG and UnionD defined as above,

UnionG = ∪_{i=1..NG} Gi
UnionD = ∪_{i=1..ND} Di

we define Precision as the ratio of the detected area within the ground truth to the total detected area:

Precision = { undefined                       if |UnionD| = 0
            { |UnionG ∩ UnionD| / |UnionD|    otherwise

where the | | operator denotes the number of pixels in the given area. This measure treats the frame not as a collection of objects but as a binary pixel map (object/non-object; output-covered/not-output-covered). The score increases as false positives decrease and equals 1 when there are no false positives.
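The two pixel-based measures can be sketched together. Representing each spatial union as a set of pixel coordinates makes the "overlapping areas counted once" property automatic; the box format and function names are my own, with boxes given as (x0, y0, x1, y1) and exclusive upper bounds:

```python
def paint_union(boxes, width, height):
    """Rasterize axis-aligned boxes (x0, y0, x1, y1), exclusive upper
    bounds, into a set of pixel coordinates. Using a set makes the
    spatial union explicit: pixels covered by several boxes count once."""
    pixels = set()
    for x0, y0, x1, y1 in boxes:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                pixels.add((x, y))
    return pixels

def pixel_recall(gt_boxes, det_boxes, width, height):
    """Sec 2.1.2: |UnionG ∩ UnionD| / |UnionG|; None when undefined."""
    union_g = paint_union(gt_boxes, width, height)
    union_d = paint_union(det_boxes, width, height)
    if not union_g:
        return None
    return len(union_g & union_d) / len(union_g)

def pixel_precision(gt_boxes, det_boxes, width, height):
    """Sec 2.1.3: |UnionG ∩ UnionD| / |UnionD|; None when undefined."""
    union_g = paint_union(gt_boxes, width, height)
    union_d = paint_union(det_boxes, width, height)
    if not union_d:
        return None
    return len(union_g & union_d) / len(union_d)
```

For example, a detection covering the left half of a single ground-truth box gives a pixel recall of 0.5 but a pixel precision of 1.0.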
2.1.4 Area-thresholded Recall

In this measure, a ground-truth object is considered detected if the output boxes cover a minimum proportion of its area. Recall is computed as the ratio of the number of detected objects to the total number of ground-truth objects:

Recall = Σ_{i=1..NG} ObjectDetect(Gi) / NG

where

ObjectDetect(Gi) = { 1    if |Gi ∩ UnionD| / |Gi| > OVERLAP_MIN
                   { 0    otherwise

Here OVERLAP_MIN is the minimum proportion of the ground-truth object's area that must be overlapped by the output boxes for the object to count as correctly detected. The ground-truth objects are treated equally regardless of size.

2.1.5 Area-thresholded Precision

This is the counterpart of the measure in Sec 2.1.4. It counts the number of output boxes that significantly cover the ground truth; an output box Di significantly covers the ground truth if a minimum proportion of its area overlaps with UnionG:

Precision = Σ_{i=1..ND} BoxPrecision(Di) / ND

where

BoxPrecision(Di) = { 1    if |Di ∩ UnionG| / |Di| > OVERLAP_MIN
                   { 0    otherwise

Here OVERLAP_MIN is the minimum proportion of the output box's area that must be overlapped by the ground truth for the box to count as precise. Again, the output boxes are treated equally regardless of size.

2.1.6 Area-based Recall

This measure is the average area recall over all ground-truth objects in the image. The recall of an object is the proportion of its area covered by the algorithm's output boxes, and the objects are treated equally regardless of size. We define Recall as the average recall over all objects in the ground truth G:

Recall = Σ_{i=1..NG} ObjectRecall(Gi) / NG

where

ObjectRecall(Gi) = { undefined               if |Gi| = 0
                   { |Gi ∩ UnionD| / |Gi|    otherwise

and the | | operator denotes the number of pixels in the given area. All ground-truth objects contribute equally to the measure, regardless of their size. At one extreme, if an image contains two objects, a large object that was completely detected and a very small object that was missed, Recall will be 50%.

2.1.7 Area-based Precision

This is the counterpart of the measure in Sec 2.1.6, with the output boxes examined instead of the ground-truth objects. Precision is computed for each output box and averaged over the whole image; the precision of a box is the proportion of its area that covers ground-truth objects. We define Precision as the average precision over the algorithm's output boxes D:

Precision = Σ_{i=1..ND} BoxPrecision(Di) / ND

where

BoxPrecision(Di) = { undefined               if |Di| = 0
                   { |Di ∩ UnionG| / |Di|    otherwise

and the | | operator denotes the number of pixels in the given area.
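The recall side of the thresholded and averaged variants can be sketched as follows (the precision counterparts are symmetric, with the roles of G and D swapped; function names and the set-based pixel representation are my own):

```python
def box_pixels(box):
    """Pixel set of one axis-aligned box (x0, y0, x1, y1), exclusive upper bounds."""
    x0, y0, x1, y1 = box
    return {(x, y) for y in range(y0, y1) for x in range(x0, x1)}

def union_pixels(boxes):
    """Spatial union of a list of boxes as a pixel set."""
    pixels = set()
    for box in boxes:
        pixels |= box_pixels(box)
    return pixels

def area_thresholded_recall(gt_boxes, det_boxes, overlap_min=0.4):
    """Sec 2.1.4: a ground-truth box counts as detected only when more than
    OVERLAP_MIN of its area is covered; each box then scores 0 or 1."""
    if not gt_boxes:
        return None
    union_d = union_pixels(det_boxes)
    detected = 0
    for g in gt_boxes:
        pix = box_pixels(g)
        if pix and len(pix & union_d) / len(pix) > overlap_min:
            detected += 1
    return detected / len(gt_boxes)

def area_based_recall(gt_boxes, det_boxes):
    """Sec 2.1.6: average fractional coverage per object; every object
    weighs the same regardless of its size."""
    union_d = union_pixels(det_boxes)
    recalls = [len(box_pixels(g) & union_d) / len(box_pixels(g))
               for g in gt_boxes if box_pixels(g)]
    return sum(recalls) / len(recalls) if recalls else None
```

With one large object fully covered and one small object missed, both functions return 0.5, matching the 50% example above.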
In this measure the output boxes are treated equally regardless of size.

2.1.8 Average Fragmentation

Detection of objects is usually not the final step in a vision system. For example, text extracted from video will go through enhancement, binarization, and finally recognition by an OCR system. Ideally, the extracted text should be in one piece, but a detection algorithm could produce several boxes (e.g., one per word or character) or multiple overlapping boxes, which can increase the difficulty of the next processing step. This measure is intended to penalize an algorithm for producing multiple output boxes that cover a single ground-truth object; multiple detections include both overlapping and non-overlapping boxes. For a ground-truth object Gi, the fragmentation of the output boxes overlapping Gi is measured by:
Frag(Gi) = { undefined                     if N(D ∩ Gi) = 0
           { 1 / (1 + log10(N(D ∩ Gi)))    otherwise

where N(D ∩ Gi) is the number of output boxes in D that overlap with the ground-truth object Gi. For an image, Frag is simply the average fragmentation over all ground-truth objects for which Frag(Gi) is defined. This is a particularly useful metric for face detection.
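A sketch of the fragmentation measure, again using axis-aligned boxes (x0, y0, x1, y1) with exclusive upper bounds (the box format and function names are my own):

```python
import math

def avg_fragmentation(gt_boxes, det_boxes):
    """Sec 2.1.8: penalize splitting one ground-truth object across several
    output boxes. Frag(Gi) = 1 / (1 + log10(k)), where k is the number of
    detections overlapping Gi; objects with k = 0 are undefined and skipped."""
    def overlaps(a, b):
        ax0, ay0, ax1, ay1 = a
        bx0, by0, bx1, by1 = b
        # strict inequalities: touching edges do not count as overlap
        return ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1

    frags = []
    for g in gt_boxes:
        k = sum(1 for d in det_boxes if overlaps(g, d))
        if k > 0:
            frags.append(1.0 / (1.0 + math.log10(k)))
    return sum(frags) / len(frags) if frags else None
```

A single detection per object gives the ideal score of 1; two boxes covering one object drop the score to 1 / (1 + log10 2) ≈ 0.77.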
However, each setting is associated with a cost. While the default settings detect faces in most cases, they might also declare a face when there is none in the image. On the other hand, although the alternative setting might not declare faces when there is no face in the image, it might also miss faces that are present. Figs. 2 and 3 illustrate this tradeoff.
(2c) Results with setting 1 (default)

Figure 2: Results showing the better performance of the default settings.
1. Convert the RGB image into the YCbCr color space. The reason is that segmentation of skin-colored regions becomes robust only if the chrominance component is used in the analysis. Therefore, the luminance component is eliminated as much as possible by choosing the CbCr (chrominance) plane of the YCbCr color space to build the model.

2. Regions of interest are carefully extracted from the image as training pixels. Regions containing human skin pixels as well as non-skin pixels are collected. The mean and covariance of this database characterize the model, which is a unimodal Gaussian; the mean and covariance are estimated using the EM algorithm. (The EM algorithm was implemented as part of another project by me in Fall 2003; the same code was reused.) It can be seen in Figure 4 that the color of human skin pixels is confined to a very small region of the chrominance space, distinct from the non-skin region.
Figure 4: CbCr plane of skin and non-skin regions

Let c = [Cb Cr]^T be the chrominance vector of an input pixel. The probability that the given pixel lies in the skin distribution is given by
p(c | skin) = (1 / (2π |Σs|^(1/2))) exp( −(1/2) (c − μs)^T Σs^(−1) (c − μs) )

where μs and Σs are the mean vector and covariance
(3c) Results with setting 1 (default)

Figure 3: Results showing the false positives produced by the default settings.

Though the default settings produced false positives, the fact that they detect the face accurately when one is present motivated their use. The Haar face detection results shown from now on were obtained using the default settings.
matrix, respectively, of the training pixels. This gives the probability of a pixel's chrominance occurring given that it is a skin pixel. Similarly, we compute p(c | non-skin). The posterior probability that a pixel represents skin given its chrominance vector c is evaluated using Bayes' theorem:

p(skin | c) = p(c | skin) / ( p(c | skin) + p(c | non-skin) )

An input image is analyzed pixel by pixel, evaluating the skin probability at each pixel. This results in a gray-level image in which the gray value gives the probability of the pixel representing skin. This image is thresholded to obtain a binary image. A correct choice of threshold is critical: increasing the threshold increases the chance of losing skin regions exposed to adverse lighting conditions, while the extra regions that get
retained in the image because of the lower threshold can be removed using connected component operators.

3. The image resulting from stage 2 contains a lot of noise, so the image is opened using a disk-shaped structuring element. The effect of the area opening is the removal of small, bright regions in the thresholded image. The size of the structuring element should not exceed that of the smallest face the system is designed to detect. A set of shape-based connected operators is then applied to the remaining components to decide whether they represent a face. These operators rely on basic assumptions about the shape of a face.

4. Compactness. This is defined as the ratio of a component's area to the square of its perimeter:

Compactness = A / P^2

This criterion is maximized for circular objects, and face components exhibit a high value. If a component's compactness exceeds a threshold, it is retained for further analysis; otherwise it is discarded.

5. Solidity. For a connected component, solidity is defined as the ratio of its area to the area of its rectangular bounding box:

Solidity = A / (Dx · Dy)

It measures the area occupancy of a connected component within its min-max box dimensions. Solidity assumes a high value for face components. If a component's solidity exceeds a threshold, it is retained; otherwise it is discarded.

6. Aspect ratio. It is assumed that face components normally have an aspect ratio well within a certain range; if a component's aspect ratio falls outside this range, the component is eliminated:

Aspect Ratio = Dy / Dx

7. Normalization. The remaining unwanted components are removed using the normalized area, the ratio of the area of a connected component to that of the largest component in the image. Connected components below this threshold are eliminated. The connected components that remain at this stage contain the faces.
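The per-pixel skin probability of step 2 (a Gaussian likelihood per class combined with Bayes' rule over the CbCr vector) might be sketched as follows. Equal class priors are assumed, as the posterior formula in the text implies, and the closed-form 2×2 inverse avoids any linear-algebra dependency; the model values used in examples are illustrative, not the trained ones:

```python
import math

def gaussian_pdf_2d(c, mean, cov):
    """Bivariate Gaussian density; cov is a 2x2 matrix [[a, b], [b, d]]."""
    (a, b), (_, d) = cov[0], cov[1]
    det = a * d - b * b
    dx, dy = c[0] - mean[0], c[1] - mean[1]
    # Mahalanobis distance via the closed-form inverse of a 2x2 matrix
    maha = (d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * maha) / (2 * math.pi * math.sqrt(det))

def skin_posterior(c, skin_model, nonskin_model):
    """Bayes' rule on the CbCr vector c, with each model a (mean, cov)
    pair; equal priors cancel out of the ratio."""
    p_skin = gaussian_pdf_2d(c, *skin_model)
    p_non = gaussian_pdf_2d(c, *nonskin_model)
    return p_skin / (p_skin + p_non)
```

Evaluating `skin_posterior` at every pixel yields the gray-level probability image described above, which is then thresholded.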
Figure 5 walks through all the steps for the image shown in Figure 5a.
The parameter settings used for the above image were:

Probability threshold: 0.1
Size of SE (for opening): 17 × 17 pixels, disk-shaped (image size 816 × 616 pixels)
Compactness threshold: 0.025
Solidity threshold: 0.5
Aspect ratio: 0.9 – 2.1
Normalization threshold: 0.35

It is worth mentioning how the parameter settings were finalized: they were derived from scatter plots of the Recall and Precision measures. Again, this is one of the primary uses of empirical evaluation. It is explained in detail in a later section.
5. Ground truth
The following guidelines were used while ground-truthing the images for evaluation. A face is bounded by a rectangle whose area includes the eyebrows, eyes, nose, mouth, and chin. There should be a small but clear space between these facial features and the bounding box. The ears and top of the hair are not included in the face. For clear visualization, the ground-truth images are shown in section 7 along with the results of each of the methods on the test images. One of the major issues with evaluation is the quality of the ground truth: how reliable is it? To account for this ambiguity, care was taken to make the evaluation insensitive to ground-truthing errors. Measures that use area overlaps were made somewhat lenient, in the sense that their contribution to the final score was weighted less than measures such as fragmentation and object count accuracy. This approach of weighting the different metric values will also help in extending the evaluation protocol to different domains.
(5i) Final result

Figure 5: Various stages in face detection using skin color and connected component operators.
(7b) Results with image-specific settings (though one of the faces is fragmented, it is localized properly)

Figure 7: Results explaining the tradeoff between a global setting and a per-image setting of parameters.

The result in Fig. 7a was obtained with the global settings, while the result in Fig. 7b was obtained with the following settings:

Probability threshold: 0.1
Size of SE (for opening): 5 × 5 pixels, disk-shaped (image size 400 × 276 pixels)
Compactness threshold: 0.025
Solidity threshold: 0.5
Aspect ratio: 0.3 – 2.1
Normalization threshold: 0.05

The global result is nevertheless acceptable, because one cannot set the parameters on a per-image basis: face detection should be fully automated, with no user intervention in the parameter settings.
Figure 6: Scatter plot of ATR/ATP against the Normalized Area threshold

A value of 0.35 was chosen for the Normalized Area threshold because it is the value at which ATR does not drop very low for most of the images, while ATP is maintained at the best possible value. A similar plot was made for the Aspect Ratio range against ATR/ATP, and along the same lines the Aspect Ratio range was set to 0.9 – 2.1. Since the range has two values, a lower and a higher threshold, two separate plots would be needed to show the effect of each; because the decision is hard to appreciate from two plots, they are not shown here. It is important to note that this setting might not give the best performance on every image; with a different setting, the performance on a particular image might improve. This is shown in Fig. 7.
7. Evaluation Results
Based on the performance metric values (see the evaluation results in Fig. 9 on the last page), one can conclude the following:

1. The Haar face detector is more robust at detecting a face in the image. This is apparent from the fact that its Area Thresholded Recall is always higher than or equal to that of the skin color-based face detector.

2. Both algorithms produce false positives. However, based on the Precision values, we can infer that the Haar face detector often produces fewer false positives than the skin color-based face detector.

3. On this dataset, when either algorithm detects a face, it detects it whole. This can be seen from the average fragmentation measure for the test images. Again, this is specific to the test data set used; the skin color-based face detector is expected to be prone to fragmentation errors. In fact, when tried on an image from a database from which no images were used in training, the face detection results showed fragmentation.
where

Overlap Ratio = Σ_{i=1..min(NG, ND)} |Gi ∩ Di| / |Gi ∪ Di|

Here, min(NG, ND) indicates the maximum number of one-to-one mappings between ground-truth objects and detected boxes. However, work remains to be done in checking the failure cases of the measure on boundary conditions; initial results have been promising in that it successfully captures the aspects stated above. Finally, the essence of evaluation is to improve the performance of the algorithm. Here we have noticed that the skin color-based face detector does not perform as well as the Haar face detector; in fact, even tweaking the parameters does not yield the best results. This shows that the method based on skin color is not robust.
Figure 8: Results of the skin color-based face detector. The detected face is fragmented, and there are also false positives.

This image was not included in the test set because color is sensitive to the camera used. Since the image comes from a data set on which the classifier was not trained, it would not be fair to test the algorithm on it. Again, this is one of the major drawbacks of the skin color-based face detector: it has to be trained with skin and non-skin pixels from images taken with a camera whose images will be present in the test set. The Haar face detector is not limited by any such constraint. From these results, we can conclude that the Haar face detector performs better than the skin color-based face detector in all aspects.
9. Conclusion
Two face detection algorithms, one based on Haar-like features and the other based on skin color, have been implemented. Both methods have been empirically evaluated and their performance quantified. Based on the results, we can conclude that the Haar face detector outperforms the skin color-based face detector in almost all aspects of the evaluation. Efforts to improve the performance of the skin color-based method have proven futile, probably due to the method's inability to handle challenging situations. Even a cursory investigation reveals that color is not a good feature to rely on: it can vary with lighting, camera, shadows, and other factors. Since the evaluation is not subjective and the performance has been quantified, there is no ambiguity in this conclusion.
8. Future Work
Effort has to be directed toward making the performance evaluation insensitive to ground-truthing errors. This is an extremely difficult task; measures such as Area Thresholded Recall and Precision are steps in this direction, but there is still scope for improvement, and this aspect has to be explored. Another point is that there are probably too many measures. Considering that they cover different aspects of performance, this is acceptable; however, there should also be measures that comprehensively cover all aspects of an algorithm. To this end, we have developed a comprehensive measure that accounts for fragmentation (splits), merges, area overlap, and false positives. It requires a one-to-one mapping of ground-truth and detected objects, and it is an area-based measure that penalizes false detections, missed detections, and spatial fragmentation. For a single image with NG ground-truth objects and ND detected objects, we define the detection composite measure CAM as:
CAM = Overlap Ratio / ((NG + ND) / 2)
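A sketch of the composite measure, again over axis-aligned boxes (x0, y0, x1, y1) with exclusive upper bounds. The report leaves the one-to-one mapping procedure open, so pairing by list index here is my own simplifying assumption:

```python
def composite_measure(gt_boxes, det_boxes):
    """CAM sketch: sum of per-pair Jaccard overlaps |Gi ∩ Di| / |Gi ∪ Di|
    over min(NG, ND) one-to-one pairs (paired by index here), normalized
    by the average of the ground-truth and detection counts. Extra or
    missing boxes inflate the denominator, penalizing false and missed
    detections; splitting one object inflates ND the same way."""
    def pixels(box):
        x0, y0, x1, y1 = box
        return {(x, y) for y in range(y0, y1) for x in range(x0, x1)}

    n_g, n_d = len(gt_boxes), len(det_boxes)
    if n_g + n_d == 0:
        return None
    overlap_ratio = sum(
        len(pixels(g) & pixels(d)) / len(pixels(g) | pixels(d))
        for g, d in zip(gt_boxes, det_boxes)  # min(NG, ND) pairs
    )
    return overlap_ratio / ((n_g + n_d) / 2)
```

A perfect single detection scores 1; a completely missed object scores 0; one object split into two half-boxes scores 1/3, so fragmentation is penalized even though the union covers the object.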
References
[1] Kasturi, R., Goldgof, D., Soundararajan, P., and Manohar, V., "Performance Evaluation Protocol for Text and Face Detection & Tracking in Video Analysis and Content Extraction (VACE-II)," report submitted to the Advanced Research and Development Activity, March 2004.

[2] Viola, P. and Jones, M. J., "Robust real-time object detection," in Proc. of the IEEE Workshop on Statistical and Computational Theories of Vision, 2001.

[3] Kuchi, P., Gabbur, P., Bhat, S., and David, S., "Human Face Detection and Tracking using Skin Color Modeling and Connected Component Operators," IETE Journal of Research, Special Issue on Visual Media Processing, May 2002.
Evaluation Results
9.1 (a) 9.1 (b) 9.1 (c)

      OCA   PBR   PBP   ATR   ATP   ABR   ABP   AF
A-1   1     .97   .81   1     1     .9    .73   1
A-2   .67   .61   .92   .5    1     .49   .92   1

      OCA   PBR   PBP   ATR   ATP   ABR   ABP   AF
A-1   .8    .8    .87   .67   1     .67   .82   1
A-2   .8    .71   .94   .67   1     .58   .96   1

      OCA   PBR   PBP   ATR   ATP   ABR   ABP   AF
A-1   1     1     .98   1     1     .98   .96   1
A-2   1     0     0     0     0     0     0     ND
9.2 (a)–(c); 9.3 (a)–(c); 9.4 (a)–(c)
Figure 9: Results of Evaluation. (a) Ground Truth Image; (b) Results of Haar face detection; (c) Results of skin color-based face detection.

A-1: Haar face detector; A-2: Skin color-based face detector
OCA: Object Count Accuracy; PBR: Pixel-Based Recall; PBP: Pixel-Based Precision; ATR: Area Thresholded Recall; ATP: Area Thresholded Precision; ABR: Area-Based Recall; ABP: Area-Based Precision; AF: Average Fragmentation

OVERLAP_MIN was kept at 40% for all the ATR/ATP measurements.