Professional Documents
Culture Documents
1502 02766
1502 02766
Neural Networks
ABSTRACT
arXiv:1502.02766v3 [cs.CV] 20 Apr 2015
[22, 2] requires a significantly larger number of resizing per where B is the example batch that is used in an iteration
octave, e.g. 8. Note that, unlike R-CNN [9], which uses of stochastic gradient descent and yi is the label of example
SVM classifier to obtain the final score, we removed the SVM xi . The sampling method for selecting examples in B can
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0.0
Figure 2: left) an example image with faces in different in-plane rotations. It also shows output of our
proposed face detector after NMS along with corresponding confidence score for each detection. right)
heat-map for the response of DDFD scores over the image.
0.025
0.02
0.015
0.01
0.005
0
−150 −100 −50 0 50 100 150
in plane rotation in degree
0.02
0.018
0.016
0.014
0.012
0.01
0.008
0.006
0.004
0.002
0
−150 −100 −50 0 50 100 150
pitch in degree
0.01
0.008
0.006
0.004
tection.
Figure 4: Histogram of faces in AFLW dataset
√ √
not scan the image as thorough as fs = √5 0.5 or fs = 7 0.5. based on their top) in-plane, middle) pitch (up and
3
Based on this experiment we use fs = 0.5 for the rest of down) and bottom) yaw(left to right) rotations.
this paper.
Another component of our system is the non-maximum
suppression module (NMS). For this we evaluated two dif- Finally, we examined the effect of a bounding-box regres-
ferent strategies: sion module for improving detector localization. The idea is
to train regressors to predict the difference between the lo-
• NMS-max: we find the window of the maximum cations of the predicted bounding-box and the ground truth.
score and remove all of the bounding-boxes with an At the test time these regressors can be used to estimate the
IOU (intersection over union) larger than an overlap location difference and adjust the predicted bounding-boxes
threshold. accordingly. This idea has been shown to improve localiza-
tion performance in several methods including [5, 29, 4]. To
• NMS-avg: we first filter out windows with confidence
train our bounding-box regressors, we followed the algorithm
lower than 0.2. We then use groupRectangles function
of [9] and Figure 7 shows the performance of our detector
of OpenCV to cluster the detected windows accord-
with and without this module. As shown in this figure,
ing to an overlap threshold. Within each cluster, we
surprisingly, adding a bounding-box regressors degrades the
then removed all windows with score less than 90% of
performance for both NMS strategies. Our analysis revealed
the maximum score of that cluster. Next we averaged
that this is due to the mismatch between the annotations of
the locations of the remaining bounding-boxes to get
the training set and the test set. This mismatch is mostly
the detection window. Finally, we used the maximum
for side-view faces and is illustrated in Figure 8. In addition
score of the cluster as the score of the proposed detec-
to degrading performance of bounding-box regression mod-
tion.
ule, this mismatch also leads to false miss-detections in the
We tested both strategies and Figure 6 shows the perfor- evaluation process.
mance of each strategy for different overlap thresholds. As
shown in this figure, performance of both methods vary sig- 3.1 Comparison with R-CNN
nificantly with the overlap threshold. An overlap threshold R-CNN [9] is one of the current state-of-the-art methods
of 0.3 gives the best performance for NMS-max while, for for object detection. In this section we compare our pro-
NMS-avg 0.2 performs the best. According to this figure, posed detector with R-CNN and its variants.
NMS-avg has better performance compared to NMS-max in We started by fine-tuning AlexNet for face detection using
terms of average precision. the process described in section 2. We then trained a SVM
Figure 5: Effect of scaling factor on precision and Figure 7: Performance of the proposed face detector
recall of the detector. with and without bounding-box regression.
Figure 6: Effect of different NMS strategies and boxes, provided by selective search [36], with the ground
their overlap thresholds. truth.
Figure 9: Comparison of our face detector, DDFD, Figure 11: Comparison of different face detectors on
with different R-CNN face detectors. FDDB dataset.