ABSTRACT

Due to the large amount of noisy data in the person re-identification (ReID) task, ReID models are usually affected by data uncertainty. Deep uncertainty estimation is therefore important for improving model robustness and matching accuracy. To this end, we propose a part-based uncertainty convolutional neural network (PUCNN), which introduces part-based uncertainty estimation into the baseline model. On the one hand, PUCNN improves the model's robustness to noisy data by modeling the feature embedding as a distribution and constraining the part-based uncertainty. On the other hand, PUCNN improves the cumulative matching characteristics (CMC) performance of the model by filtering out low-quality training samples according to the estimated uncertainty score. Experiments on both non-video datasets, the noised Market-1501 and DukeMTMC, and video datasets, PRID2011, iLIDS-VID and MARS, demonstrate that our proposed method achieves encouraging and promising performance.

Index Terms— Person ReID, uncertainty estimation, noise learning, quality filter

1. INTRODUCTION

Person re-identification (ReID) is a non-cooperative task in complex, open environments. The data for ReID is therefore usually of low quality and contains substantial noise, such as occlusion, blur and illumination differences [1]. Low-quality data can lead to incorrect matching, as shown in Fig. 1, where a low-quality positive sample is matched with a negative sample due to the small distance between them.

Fig. 1. A negative pair caused by samples with low uncertainty estimation scores on the Market-1501 ReID benchmark. Different identities are marked in two colors.

Although the effect of low-quality data (e.g., uncertain samples) is hard to eliminate, it can be alleviated by estimating the data uncertainty during the optimization process. By estimating the data uncertainty in ReID, the model's robustness and reliability on noisy data can be improved. Moreover, false positive (FP) samples can be discarded according to their estimated uncertainty, thus improving the matching accuracy of the ReID model. Uncertainty estimation is therefore of vital importance for ReID models.

Existing methods make various efforts to alleviate the effect of low-quality data. Attention-based methods [2, 3] focus on relatively high-quality regions (e.g., image parts) but fail to assess the quality of the entire image. More recently, DistributionNet [4] and the quality aware network (QAN) [5] were proposed to assess data quality by learning the distribution of image features and by set-to-set training, respectively. However, they measure only the global image quality and fail to locate the low-quality regions with large uncertainty.

In this paper, we propose the part-based uncertainty convolutional neural network (PUCNN), which introduces part-based uncertainty estimation into the ReID model, thus involving both global and local data uncertainty in the model optimization. Specifically, PUCNN estimates the data uncertainty by modeling the part features as distributions and constraining the divergence of the part-based uncertainty. Low-quality images are then discarded according to their estimated uncertainty. As a result, PUCNN explicitly accounts for the uncertainty of the input images, which improves its robustness and reliability.

∗ Corresponding author. This work was supported in part by the National Key R&D Program of China under Grant 2020AAA0105200, in part by the National Key R&D Program of China under Grant 2019YFF0303300 and under Subject II No. 2019YFF0303302, in part by the National Natural Science Foundation of China (NSFC) under Grants 61922015, 61773071 and U19B2036, in part by the Beijing Natural Science Foundation Project under Grant Z200002, and in part by the Beijing Nova Programme Interdisciplinary Cooperation Project under Grant Z191100001119140.
Authorized licensed use limited to: IEEE Xplore. Downloaded on September 01,2021 at 06:19:52 UTC from IEEE Xplore. Restrictions apply.
illustrate how to introduce the uncertainty term into the feature embedding. Then we discuss the loss function and the constraint on the uncertainty term (see Fig. 2 for an overview).
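The text above describes introducing an uncertainty term into the feature embedding. As a minimal, framework-free sketch of one common way to do this (the function names, shapes and values below are illustrative, not the paper's actual layers or its equation for the uncertainty term): each part feature is modeled as a Gaussian with a learned mean and log-variance, and training samples are drawn with the reparameterization trick.

```python
import math
import random

def reparameterized_sample(mu, log_var, rng=random):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), per dimension.

    Keeping the noise separate from the parameters (the
    reparameterization trick) is what makes this sampling step
    differentiable in a real framework; here it only illustrates
    the computation.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def part_uncertainty(log_var):
    """Summarize one part's uncertainty as its mean variance."""
    return sum(math.exp(lv) for lv in log_var) / len(log_var)

# Toy part embeddings: a mean vector and a log-variance vector per part.
parts = [
    ([0.2, -0.1, 0.4], [-4.0, -4.0, -4.0]),  # confident part
    ([0.0,  0.3, 0.1], [ 1.0,  1.0,  1.0]),  # uncertain part (e.g. occluded)
]

scores = [part_uncertainty(lv) for _, lv in parts]
assert scores[1] > scores[0]  # the occluded part carries larger variance
```

In the actual model, the mean and variance would come from small per-part prediction heads on the backbone features; the scalar summary here merely stands in for the paper's part uncertainty term.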
| Noisy dataset | BoT-baseline [13] | PCB [11] | DistributionNet [4] | PUCNN (ours) |
|---|---|---|---|---|
| M | 93.38 / 81.94 | 93.64 / 83.47 | 93.81 / 82.90 | 94.68 / 84.21 |
| M+(3, 3)GBlur | 91.29 / 79.88 | 91.53 / 81.62 | 91.21 / 81.37 | 93.16 / 82.49 |
| M+(3, 3),(5, 5)GBlur | 84.78 / 69.36 | 86.73 / 71.55 | 88.85 / 72.73 | 89.57 / 75.72 |
| M+(3, 3),(5, 5),(7, 7)GBlur | 75.61 / 59.72 | 79.89 / 62.53 | 81.29 / 64.54 | 84.65 / 69.74 |
| M+(3, 3)Erode | 86.36 / 71.46 | 88.49 / 72.55 | 89.92 / 76.85 | 90.18 / 76.39 |
| M+RO | 85.42 / 71.76 | 87.97 / 73.91 | 88.56 / 74.26 | 88.70 / 73.57 |
| M+(3, 3)Erode+RO | 72.90 / 52.74 | 77.31 / 58.53 | 79.13 / 62.57 | 81.23 / 70.73 |
| D | 84.72 / 70.36 | 85.13 / 71.42 | 84.21 / 70.36 | 85.68 / 71.78 |
| D+(3, 3)GBlur | 81.03 / 66.48 | 83.28 / 69.34 | 82.73 / 68.39 | 83.47 / 69.09 |
| D+(3, 3),(5, 5)GBlur | 74.92 / 56.12 | 77.91 / 57.61 | 79.41 / 60.34 | 80.52 / 64.53 |
| D+(3, 3),(5, 5),(7, 7)GBlur | 66.17 / 48.27 | 69.78 / 49.49 | 71.63 / 51.88 | 74.67 / 55.96 |
| D+(3, 3)Erode | 80.36 / 65.17 | 80.07 / 64.52 | 81.25 / 66.87 | 82.60 / 68.06 |
| D+RO | 80.90 / 64.47 | 81.82 / 66.16 | 81.58 / 67.49 | 82.74 / 68.75 |
| D+(3, 3)Erode+RO | 61.64 / 43.28 | 62.32 / 44.62 | 68.42 / 47.92 | 70.39 / 51.23 |

Table 1. Rank-1 / mAP comparison on the noised Market-1501 (M) and DukeMTMC (D) datasets.
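The noise settings in Table 1 can be synthesized with standard image operations. Below is a dependency-free sketch on a grayscale image stored as a list of rows, assuming `(k, k)GBlur` denotes a k x k Gaussian blur, `Erode` a k x k morphological erosion (minimum filter), and `RO` a random occlusion; the exact kernel and occlusion parameters used in the paper are not given here, so the defaults are illustrative.

```python
import random

def binomial_kernel(k):
    """Row of Pascal's triangle, normalized -- a discrete Gaussian."""
    row = [1]
    for _ in range(k - 1):
        row = [a + b for a, b in zip([0] + row, row + [0])]
    s = sum(row)
    return [v / s for v in row]

def gaussian_blur(img, k):
    """Separable (k, k) Gaussian blur with clamped borders."""
    kern, r = binomial_kernel(k), k // 2
    h, w = len(img), len(img[0])
    clamp = lambda v, hi: min(max(v, 0), hi - 1)
    # Horizontal pass, then vertical pass.
    tmp = [[sum(kern[i] * img[y][clamp(x + i - r, w)] for i in range(k))
            for x in range(w)] for y in range(h)]
    return [[sum(kern[i] * tmp[clamp(y + i - r, h)][x] for i in range(k))
             for x in range(w)] for y in range(h)]

def erode(img, k):
    """Morphological erosion: k x k minimum filter with clamped borders."""
    r, h, w = k // 2, len(img), len(img[0])
    return [[min(img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                 for dy in range(-r, r + 1) for dx in range(-r, r + 1))
             for x in range(w)] for y in range(h)]

def random_occlusion(img, frac=0.3, rng=random):
    """Zero out a random rectangle with sides about `frac` of the image."""
    h, w = len(img), len(img[0])
    oh, ow = max(1, int(h * frac)), max(1, int(w * frac))
    y0, x0 = rng.randrange(h - oh + 1), rng.randrange(w - ow + 1)
    out = [row[:] for row in img]
    for y in range(y0, y0 + oh):
        for x in range(x0, x0 + ow):
            out[y][x] = 0
    return out
```

A production pipeline would use library primitives instead (e.g. OpenCV's `cv2.GaussianBlur` and `cv2.erode`); the point here is only the shape of the corruption protocol, in which the blur settings are chained ((3, 3), then additionally (5, 5), then (7, 7)) to produce increasing noise levels.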
| Method | Percentage | PRID2011 (CMC-1 / 5 / 10) | iLIDS-VID (CMC-1 / 5 / 10) | MARS (CMC-1 / 5 / 10) |
|---|---|---|---|---|
| Baseline | 0% | 33.5 / 60.2 / 74.3 | 46.7 / 69.5 / 83.1 | 58.4 / 73.6 / 84.1 |
| Random selection | 25% | 33.7 / 59.2 / 73.8 | 46.1 / 70.2 / 83.2 | 58.3 / 73.2 / 84.8 |
| QAN [5] | 25% | 37.8 / 68.7 / 81.4 | 52.4 / 77.8 / 91.6 | 62.5 / 77.4 / 91.3 |
| PUCNN (ours) | 25% | 39.6 / 69.3 / 80.5 | 53.2 / 77.5 / 91.4 | 63.0 / 79.6 / 91.5 |
| Random selection | 50% | 33.5 / 60.4 / 74.6 | 46.5 / 69.3 / 82.8 | 58.0 / 73.4 / 84.0 |
| QAN [5] | 50% | 38.2 / 69.1 / 81.5 | 52.1 / 78.4 / 92.3 | 61.8 / 76.7 / 90.9 |
| PUCNN (ours) | 50% | 39.5 / 69.8 / 80.8 | 52.7 / 78.0 / 92.2 | 62.9 / 78.5 / 91.8 |

Table 2. Comparison of the proposed PUCNN with a quality filter method (QAN [5]): cross-dataset CMC performance after filtering 0%, 25% and 50% of the query samples.
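Table 2 reports CMC after filtering queries by their global uncertainty score, which the text describes as the harmonic mean of the part uncertainty terms. A small sketch of that evaluation protocol with toy data (the paper's actual equation and score ranges are not reproduced; function names and values are illustrative):

```python
from statistics import harmonic_mean

def global_uncertainty(part_scores):
    """Global score as the harmonic mean of per-part uncertainty
    terms; the part scores must be positive."""
    return harmonic_mean(part_scores)

def filter_queries(queries, part_scores, drop_frac):
    """Drop the fraction of queries with the highest global uncertainty."""
    scored = sorted(zip(queries, map(global_uncertainty, part_scores)),
                    key=lambda t: t[1])
    keep = len(queries) - int(len(queries) * drop_frac)
    return [q for q, _ in scored[:keep]]

def cmc_at_k(ranked_gallery_ids, query_ids, k):
    """CMC-k: fraction of queries whose true identity appears among
    the top-k ranked gallery identities."""
    hits = sum(qid in ranking[:k]
               for qid, ranking in zip(query_ids, ranked_gallery_ids))
    return hits / len(query_ids)
```

The harmonic mean is dominated by the smallest part score, so a single confident (low-uncertainty) part keeps the global score low; whether this motivated the paper's choice is not stated in this excerpt.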
noises. The comparative experiments are shown in Table 1. The results show that our proposed PUCNN achieves a significant improvement in both Rank-1 and mAP over the BoT-baseline [13] and PCB [11] on heavily noised datasets. In addition, the higher the noise level, the more pronounced the advantage of our method. Moreover, compared with DistributionNet [4], which focuses on feature noise and label noise, the proposed PUCNN also handles noisy samples better in all but two cases. The main reason is that most samples in person ReID are partly occluded or contain local features that are difficult to discriminate; that is, local uncertainty leads to global uncertainty. By estimating the local uncertainty of the input, the influence of unreliable local parts on the global feature can be weakened. These results demonstrate that the proposed method, based on local data uncertainty modeling, strengthens the constraints and extracts more discriminative features from noisy samples. In addition, noise learning improves the robustness of the person ReID model to uncertain samples.

4.3. Cross-dataset Uncertainty Filter

As shown in Fig. 3, the images in the first two columns are generated by random noise, and those in the other columns are sampled from the Market-1501 dataset. Because the global uncertainty scores reflect the discriminability of the features, that is, the confidence of the PUCNN embedding prediction, they can be used to judge the quality of the input images. Therefore, a quality filter based on global uncertainty scores can naturally be used to filter out confused samples and outliers, so as to improve the performance on the remaining data in person ReID. As described in (3), the global uncertainty score can be obtained by calculating the harmonic mean of the part uncertainty terms. We therefore conducted the experiments reported in Table 2.

To approximate real-world situations, we conducted a group of cross-dataset experiments in which all models were trained on the Market-1501 dataset and tested on three video-based ReID datasets: PRID2011 [8], iLIDS-VID [9] and MARS [10]. We divided the three datasets into gallery and query sets, and removed groups with fewer than 20 samples of the same ID. Subsequently, we filtered out the query samples with high global uncertainty scores (i.e., low quality), setting the filtering percentage to 25% and 50%, respectively, for each method. A 0% filtering percentage refers to the baseline model trained with PCB-RPP, where no samples are filtered.

The comparison of the proposed PUCNN with the two reference quality filter methods is reported in Table 2. At a 25% filtering percentage, the cumulative matching characteristics (CMC) of PUCNN (39.6% CMC-1 on PRID2011) are significantly better than random selection (33.7%) and QAN [5] (37.8%). This is consistent with the discussion above. By filtering out the query samples with high uncertainty scores, the remaining 75% of samples, with lower uncertainty, provide better CMC-1 and CMC-5 performance. Meanwhile, PUCNN also achieves better performance on the iLIDS-VID and MARS datasets. However, when the filtering percentage is set to 50%, the CMCs remain basically unchanged compared with the 25% filtering percentage. This indicates that the confused samples with the highest 25% of uncertainty scores in video-based person ReID interfere with the overall discrimination more than the less confused samples. In summary, compared with traditional quality estimation using additional attributes such as poses and blur levels, the above results show that the proposed PUCNN is an effective filtering method for person ReID that uses only ID labels for training.

5. CONCLUSION

In this paper, we introduced the uncertainty term into the part-based person ReID model for the first time, namely PUCNN. Comprehensive experiments demonstrated that our proposed PUCNN performs better than deterministic models on noised datasets. Moreover, extensive cross-dataset experiments showed that the proposed method can effectively estimate the quality of ReID images and filter out low-quality ones.
6. REFERENCES

[1] Jiaxu Miao, Yu Wu, Ping Liu, Yuhang Ding, and Yi Yang, "Pose-guided feature alignment for occluded person re-identification," in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 542–551.

[2] Wei Li, Xiatian Zhu, and Shaogang Gong, "Harmonious attention network for person re-identification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2285–2294.

[3] Shuangjie Xu, Yu Cheng, Kang Gu, Yang Yang, Shiyu Chang, and Pan Zhou, "Jointly attentive spatial-temporal pooling networks for video-based person re-identification," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 4733–4742.

[4] Tianyuan Yu, Da Li, Yongxin Yang, Timothy M Hospedales, and Tao Xiang, "Robust person re-identification by modelling feature uncertainty," in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 552–561.

[5] Yu Liu, Junjie Yan, and Wanli Ouyang, "Quality aware network for set to set recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 5790–5799.

[6] Liang Zheng, Liyue Shen, Lu Tian, Shengjin Wang, Jingdong Wang, and Qi Tian, "Scalable person re-identification: A benchmark," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1116–1124.

[7] Ergys Ristani, Francesco Solera, Roger Zou, Rita Cucchiara, and Carlo Tomasi, "Performance measures and a data set for multi-target, multi-camera tracking," in European Conference on Computer Vision. Springer, 2016, pp. 17–35.

[8] Martin Hirzer, Csaba Beleznai, Peter M Roth, and Horst Bischof, "Person re-identification by descriptive and discriminative classification," in Scandinavian Conference on Image Analysis. Springer, 2011, pp. 91–102.

[9] Taiqing Wang, Shaogang Gong, Xiatian Zhu, and Shengjin Wang, "Person re-identification by video ranking," in European Conference on Computer Vision. Springer, 2014, pp. 688–703.

[10] Liang Zheng, Zhi Bie, Yifan Sun, Jingdong Wang, Chi Su, Shengjin Wang, and Qi Tian, "MARS: A video benchmark for large-scale person re-identification," in European Conference on Computer Vision. Springer, 2016, pp. 868–884.

[11] Yifan Sun, Liang Zheng, Yi Yang, Qi Tian, and Shengjin Wang, "Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline)," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 480–496.

[12] Guangcong Wang, Jianhuang Lai, Peigen Huang, and Xiaohua Xie, "Spatial-temporal person re-identification," in Proceedings of the AAAI Conference on Artificial Intelligence, 2019, vol. 33, pp. 8933–8940.

[13] Hao Luo, Youzhi Gu, Xingyu Liao, Shenqi Lai, and Wei Jiang, "Bag of tricks and a strong baseline for deep person re-identification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019.

[14] Sara Iodice and Krystian Mikolajczyk, "Partial person re-identification with alignment and hallucination," in Asian Conference on Computer Vision. Springer, 2018, pp. 101–116.

[15] Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell, "Simple and scalable predictive uncertainty estimation using deep ensembles," in Advances in Neural Information Processing Systems, 2017, pp. 6402–6413.

[16] Jiyang Xie, Zhanyu Ma, Jianjun Lei, Guoqiang Zhang, Jing-Hao Xue, Zheng-Hua Tan, and Jun Guo, "Advanced dropout: A model-free methodology for Bayesian dropout optimization," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–1, 2021.

[17] Jiyang Xie, Zhanyu Ma, Guoqiang Zhang, Jing-Hao Xue, Zheng-Hua Tan, and Jun Guo, "Soft dropout and its variational Bayes approximation," in 2019 IEEE 29th International Workshop on Machine Learning for Signal Processing (MLSP), 2019, pp. 1–6.

[18] Terrance DeVries and Graham W Taylor, "Learning confidence for out-of-distribution detection in neural networks," arXiv preprint arXiv:1802.04865, 2018.

[19] Yichun Shi and Anil K Jain, "Probabilistic face embeddings," in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 6902–6911.

[20] Jie Chang, Zhonghao Lan, Changmao Cheng, and Yichen Wei, "Data uncertainty learning in face recognition," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 5710–5719.

[21] Shuya Isobe and Shuichi Arai, "Deep convolutional encoder-decoder network with model uncertainty for semantic segmentation," in 2017 IEEE International Conference on INnovations in Intelligent SysTems and Applications (INISTA). IEEE, 2017, pp. 365–370.

[22] Axel Brando, Jose A Rodríguez-Serrano, Mauricio Ciprian, Roberto Maestre, and Jordi Vitrià, "Uncertainty modelling in deep networks: Forecasting short and noisy series," in Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 2018, pp. 325–340.

[23] Diederik P Kingma and Max Welling, "Auto-encoding variational Bayes," arXiv preprint arXiv:1312.6114, 2013.

[24] Yandong Wen, Kaipeng Zhang, Zhifeng Li, and Yu Qiao, "A discriminative feature learning approach for deep face recognition," in European Conference on Computer Vision. Springer, 2016, pp. 499–515.

[25] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1026–1034.

[26] Xing Fan, Wei Jiang, Hao Luo, and Mengjuan Fei, "SphereReID: Deep hypersphere manifold embedding for person re-identification," Journal of Visual Communication and Image Representation, vol. 60, pp. 51–58, 2019.