Professional Documents
Culture Documents
Meta Soft Label Generation For Noisy Labels
Meta Soft Label Generation For Noisy Labels
galgan@aselsan.com.tr
Abstract—The existence of noisy labels in the dataset causes gradient descent step on the meta objective. Our meta ob-
significant performance degradation for deep neural networks jective based on a basic assumption: the best training label
(DNNs). To address this problem, we propose a Meta Soft distribution should minimize the loss of the small clean meta-
Label Generation algorithm called MSLG, which can jointly
generate soft labels using meta-learning techniques and learn data. Therefore, the meta-data is chosen as the clean subset
DNN parameters in an end-to-end fashion. Our approach adapts of the dataset. In the second stage, the base classifier is
the meta-learning paradigm to estimate optimal label distribution trained on images with their corresponding predicted labels.
by checking gradient directions on both noisy training data and Our framework simultaneously generates soft-label posterior
noise-free meta-data. In order to iteratively update soft labels,
arXiv:2007.05836v2 [cs.CV] 19 Jan 2021
N
1 X M m N n
Lc,2 = KL(fθ (xi )||ŷi ), where αβ X ∂Li (θ̂) ∂ X ∂Lj (θ, ŷ)
N i=1 =− (19)
M N i=1 ∂ θ̂ ∂ ŷ j=1 ∂θ
C
! (14) ŷ (t)
X j fθj (xi )
KL(fθ (xi )||ŷi ) = fθ (xi ) log
j=1 ŷij N M
!
αβ X ∂ 1 X ∂Lm
i (θ̂)
∂Lnj (θ, ŷ)
=− (20)
which produces the following gradients N j=1 ∂ ŷ M i=1 ∂ θ̂ ∂θ
! ŷ (t)
∂Lc,2 fθj (xi )
= 1 + log (15) Let
∂fθj (xi ) ŷij
Lets assume a classification task where true label is 2 (yi2 = ∂Lm
i (θ̂)
∂Lnj (θ, ŷ)
Gij (ŷ) = (21)
1) but noisy label is given as 5 (ỹi5 = 1). Now we consider ∂ θ̂ ∂θ
two updates on ŷi2 and ŷi5
j=2
Then we can rewrite Equation 6 as
• Case ŷi : ŷi2 is initially very small, therefore fθ2 (xi )
2
ŷi . In that case Lc,1 will produce a very small gradients !
N M
(13) while Lc,2 will produce a medium positive gradient (t+1) (t) αβ X ∂ 1 X
ŷ = ŷ + Gij (ŷ) (22)
(15) as desired. N j=1 ∂ ŷ M i=1
j=5
• Case ŷi : ŷi5 initially has peak value; however, due to ŷ (t)
TABLE I: Test accuracy percentages for CIFAR10 dataset with varying level of uniform and feature-dependent noise. Results
are averaged over 4 runs.