Annotate Less But Perform Better: Weakly Supervised Shadow Detection Via Label Augmentation
https://doi.org/10.1007/s00371-024-03278-6
ORIGINAL ARTICLE
Abstract
Shadow detection is essential for scene understanding and image restoration. Existing paradigms for producing shadow detection training data usually rely on densely labeling every image pixel, which leads to a bottleneck when scaling up the number of images. To tackle this problem, this paper labels shadow images with only a few strokes and designs a learning framework for Weakly supervised Shadow Detection, namely WSD. First, it creates two shadow detection datasets with scribble annotations, namely Scr-SBU and Scr-ISTD. Second, it proposes an uncertainty-guided label augmentation scheme based on graph convolutional networks, which propagates the sparse scribble annotations to more reliable regions and thus prevents the model from converging to an undesired local minimum caused by intra-class discontinuity. Finally, it introduces a multi-task learning framework to jointly learn shadow detection and edge detection, which encourages the generated shadow maps to be comprehensive and well aligned with shadow boundaries. Experimental results on benchmark datasets demonstrate that our framework even outperforms existing semi-supervised and fully supervised shadow detectors while requiring only 2% of pixels to be labeled.
Keywords Image segmentation · Shadow detection · Weakly supervised · Graph convolutional network · Uncertainty analysis
H. Chen et al.
Fig. 1 (a) Scribble annotations. (b) Dense labeling. (c) Weak augmentation. (d) Strong augmentation. (e) Baseline. (f) Edge map. (g) Multi-task learning. (h) WSD (ours)
[47]), namely Scr-SBU and Scr-ISTD, with flexible scribbles (Fig. 1a).

Compared to dense labeling (Fig. 1b), scribble annotation can reduce the labeling cost by 20-40 times. Since scribble annotations are generally sparse, they bring a new challenge: deep models trained on them are prone to be trapped in an undesired local minimum caused by intra-class discontinuity, i.e., shadow regions often cover multiple materials with different textures. Fortunately, this problem can be alleviated by exploring more pixel-level training samples. Thanks to the powerful label propagation capability of graph convolutional networks (GCNs) [23, 60], we propose a novel label augmentation scheme, which expands sparse scribbles into denser annotations (Fig. 1c and d) by propagating label information among superpixel nodes and conducting uncertainty analysis on the GCN outputs.

One can observe that even the augmented annotations are still hard to align with the edges of shadows, resulting in poor boundary localization in the shadow maps (Fig. 1e). A recent semi-supervised shadow detector, MTMT-Net [5], introduces a shadow edge detection task to produce features that highlight shadow structures. However, this approach is not suitable for us for two reasons. First, the shadow edge masks in MTMT-Net are extracted from ground-truth (GT) shadow masks, which means the method still relies on pixel-level dense labeling and goes against the original intention of weakly supervised learning. Second, the severe label noise in the GT shadow masks causes the extracted shadow edge masks to be poorly aligned with shadow boundaries, resulting in unsatisfactory shadow edge detection.

To tackle this problem, we introduce a generic edge detection task instead of shadow edge detection. Note that the GT edge maps (Fig. 1f) in our task are produced by a state-of-the-art (SOTA) edge detector, EDTER [39]. When we train the shadow detection and edge detection tasks jointly, edge detection can provide rich edge features for shadow detection. However, many structural details will also exist in the non-shadow regions (Fig. 1g). Inspired by the gated structure-aware loss [61], we adopt it to focus only on shadow regions and ignore the structure of non-shadow regions. In this way, we can generate shadow maps with smooth predictions inside shadow and non-shadow regions and clear boundaries at the shadow edges (Fig. 1h). Extensive experiments show that our WSD outperforms SOTA shadow detectors while labeling only around 2% of pixels. Our three main contributions are summarized as follows.

1. To alleviate the reliance on dense labeling, we present a weakly supervised shadow detection (WSD) framework and create two corresponding shadow detection datasets with scribble annotations. Experimental results demonstrate that our WSD performs comparably to SOTA semi-supervised and fully supervised shadow detectors.
2. We propose an uncertainty-guided label augmentation scheme based on graph convolutional networks to explore more reliable labels from sparse scribble annotations, which avoids the model converging to an undesired local minimum caused by intra-class discontinuity.
3. To solve the problem of poor boundary localization caused by incomplete labels around shadow edges in the augmented annotations, we introduce a multi-task learning framework to train shadow detection and edge detection jointly. It enables us to extract shadow edge features explicitly and produce more accurate shadow boundaries.
Fig. 2 (a) Illustration of our Scr-SBU and Scr-ISTD (shadow image, shadow mask, scribble). (b) Statistics of the ratio of labeled pixels
responding color. This trick ensures that the trained model is highly discriminative for these regions.

4 Label augmentation

Let us define the training set as $S = \{(I_i, Y_i)\}_{i=1}^{N}$, where $I$ is the shadow image, $Y$ is the scribble mask, and $N$ is the number of training images. Since directly training on the sparse annotations will lead to the deep model being trapped in an undesired local minimum, in this section we develop a label augmentation scheme to explore a denser annotation $\hat{Y}$ from the given sparse scribble mask $Y$. Our label augmentation consists of two degrees of schemes, weak augmentation and strong augmentation (see Fig. 3).

It is worth noting that a superpixel is viewed entirely as a shadow or non-shadow region as long as an arbitrary pixel in it is annotated as shadow or non-shadow. In weak augmentation, we can thereby easily expand the annotation by two to three times (the factor is determined by the superpixel size). In strong augmentation, we further propagate the label information from the labeled superpixels to the remaining parts based on a two-layer GCN model. We then perform uncertainty analysis on the GCN outputs to select the most reliable parts of the entire pseudo-masks, which keeps the deep model from overfitting to the many noisy labels.

4.1 Data formulation

We first use SLIC superpixel segmentation [1] to separate a training image into a series of homogeneous regions. These superpixels are then viewed as nodes in a graph G, allowing us to have a better context for the representation. Note that, since we need to train an individual GCN model for each shadow image in the training set, the overall training cost and memory requirements should be considered. Hence, we attempt to design an efficient GCN model in terms of both feature representation and graph construction.

Feature representation. Shadow regions usually have lower brightness, darker color, weaker texture, and shape clusters. Therefore, we encode the illumination (L channel in Lab color space), color histograms (RGB space), texton histograms [8], and the location of each superpixel into the node representation. Compared with features extracted by a deep network [60], hand-crafted features have lower dimensions. Most importantly, we do not need a pre-trained image classifier to extract high-level features for shadow images. The edge representation can be viewed as a relevance between two adjacent nodes, computed by the Gaussian similarity $\sum_{h=1}^{H} \exp(-\|f_i^h - f_j^h\|_2^2 / \sigma_h^2)$, where $f_i^h$ denotes the representation of node $x_i$ on the $h$-th hand-crafted feature and $\sigma_h$ is the corresponding standard deviation.

Graph construction. Even in extremely complex scenes, shadow regions can still easily be distinguished from the image background by our eyes. We found that humans tend to first locate the approximate scope of shadow regions using the prior knowledge of weak brightness, and then discriminate the ambiguous regions by local and global comparison; e.g., two regions of the same material with different brightness indicate that the darker region must belong to shadow. To conduct more robust label propagation, we treat each superpixel in turn as the center, connecting it with the four nearest nodes and 15 random nodes (these sparse connections trade off training cost against inference accuracy) to consider both the local and global context.

4.2 Label propagation

After our weak augmentation, we can assign each superpixel node a corresponding label: shadow, non-shadow, or unlabeled. Next, we pose the strong label augmentation as a graph-learning problem to propagate the label information among all nodes. The procedure can be formulated as:

$\hat{Y} = \mathrm{Sigmoid}\big(\tilde{A}\,\mathrm{ReLU}(\tilde{A} S W_0)\,W_1\big)$,  (1)
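The sparse graph construction above and the propagation rule of Eq. (1), whose terms are defined in the next paragraph, can be sketched as follows. This is a minimal NumPy illustration rather than the authors' implementation: the node count, feature dimension, and random weights are placeholders, and we apply Ã = A + I_n literally as stated in the text, without degree normalization.

```python
import numpy as np

def build_sparse_adjacency(centers, n_nearest=4, n_random=15, seed=0):
    """Connect each superpixel node to its 4 nearest neighbors (local
    context) and 15 random nodes (global context), as described under
    "Graph construction". `centers` holds one (x, y) location per node."""
    rng = np.random.default_rng(seed)
    n = len(centers)
    A = np.zeros((n, n))
    for i in range(n):
        dist = np.linalg.norm(centers - centers[i], axis=1)
        dist[i] = np.inf                               # exclude self
        for j in np.argsort(dist)[:n_nearest]:
            A[i, j] = A[j, i] = 1.0
        others = [j for j in range(n) if j != i]
        for j in rng.choice(others, size=min(n_random, n - 1), replace=False):
            A[i, j] = A[j, i] = 1.0
    return A

def gcn_forward(S, A, W0, W1):
    """Eq. (1): Y = Sigmoid(A~ ReLU(A~ S W0) W1), with A~ = A + I_n."""
    A_tilde = A + np.eye(A.shape[0])
    H = np.maximum(A_tilde @ S @ W0, 0.0)              # ReLU(A~ S W0)
    logits = A_tilde @ H @ W1
    return 1.0 / (1.0 + np.exp(-logits))               # per-node shadow probability

# Toy run: 20 superpixel nodes with 8-dimensional hand-crafted features.
rng = np.random.default_rng(1)
centers = rng.random((20, 2)) * 100
S = rng.standard_normal((20, 8))
A = build_sparse_adjacency(centers)
Y_hat = gcn_forward(S, A, 0.1 * rng.standard_normal((8, 32)),
                    0.1 * rng.standard_normal((32, 1)))
```

The output is one shadow probability per superpixel node, matching the per-node predictions described below.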
Fig. 3 Overview of the label augmentation pipeline: weak augmentation, graph formulation, a 2-layered GCN (hidden and output layers), strong augmentation, and MCDO with T forward passes
where $S = [s_1, s_2, \ldots, s_n]^T \in \mathbb{R}^{n \times d}$ consists of $n$ feature vectors of dimension $d$, $W_0 \in \mathbb{R}^{d \times 32}$ and $W_1 \in \mathbb{R}^{32 \times 1}$ are network parameters, $\tilde{A}$ is the sum of the adjacency matrix $A \in \mathbb{R}^{n \times n}$ and an identity matrix $I_n$, and $\hat{Y} = [\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_n]$ denotes the predicted probabilities of all nodes. The GCN training and inference process is very fast, as the model only has a few thousand parameters. We then introduce a partial cross-entropy loss [42] to keep the label information from the weak augmentation unchanged during training:

$\mathcal{L}_{GCN} = -\big[\tfrac{1}{|S|}\sum_{i \in S} \log \hat{y}_i + \tfrac{1}{|NS|}\sum_{j \in NS} \log(1 - \hat{y}_j)\big]$,  (2)

where $S$ and $NS$ denote the sets of shadow and non-shadow nodes in the result of weak augmentation, respectively. In the inference stage, we can calculate the predicted probabilities for all nodes by Eq. (1) and then generate a corresponding shadow map (Fig. 4b). Specifically, we construct a fully connected graph including all nodes in a shadow image, extract the feature matrix, and then feed it into the well-trained GCN model.

4.3 Noise suppression

One can observe many errors (Fig. 4e) in the GCN outputs, owing to the limited performance of the lightweight GCN model and the limited representation ability of low-level features. According to the memorization effect [59], training with partially incorrect labels often reduces the final model performance. Our goal is therefore to identify the most reliable parts of the GCN outputs.

Our idea is to transfer the label propagation from a binary classification problem (i.e., shadow and non-shadow nodes) into a multi-class classification problem (i.e., reliable shadow, reliable non-shadow, and unreliable nodes). We are inspired by Monte Carlo dropout (MCDO) [18, 22], which can perform uncertainty analysis for an arbitrary CNN model by keeping the dropout layers turned on in the inference stage. Considering the similarity between CNNs and GCNs, we introduce MCDO into the GCN training to conduct such analysis. The identified uncertain parts are regarded as the "unreliable nodes" and then discarded in the subsequent multi-task learning. Specifically, we retain the dropout layer in the inference stage and then perform T forward passes to generate a distribution over predictions:

$D(x) \approx \tfrac{1}{T}\sum_{t=1}^{T} GCN(x, \theta_t)$,  (3)

where $\theta_t$ is the GCN's parameter at the $t$-th pass and $T$ is the number of stochastic forward passes. We compute a node's uncertainty $U(x)$ by the cross-entropy:

$U(x) = -\big[D(x) \log D(x) + (1 - D(x)) \log(1 - D(x))\big]$.  (4)

At this point, a node is marked unreliable if $U(x)$ is greater than the threshold $\tau$. The final augmented annotation mask can be formulated as:

$\hat{Y}(x) = \begin{cases} \mathrm{binarize}(Y(x)), & \text{if } U(x) \le \tau \\ 2, & \text{if } U(x) > \tau, \end{cases}$  (5)

where "2" denotes the uncertain elements (black in Fig. 4c), which are discarded in the rest of the shadow detection training. One can observe from Fig. 4e and f that we obtain a reliable pseudo-mask with fewer label noises. It is worth noting that, despite our label augmentation scheme, there are still only these three kinds of supervision signals. That means training with the pseudo-masks from our weak
Fig. 4 (a) Shadow image. (b) Pseudo shadow mask. (c) Result of noise suppression. (d) Clean shadow mask. (e) Noisy labels in (b). (f) Noisy labels in (c)
or strong label augmentation is still essentially weakly supervised learning.

5 Edge-guided shadow detection

Despite the denser annotations obtained from the strong augmentation, they still struggle to represent the structure of shadows, since the uncertain parts tend to lie along the shadow boundaries. Hence, we introduce the edge detection task into our weakly supervised shadow detection framework, as shown in Fig. 5. Specifically, our shadow detection branch first generates a coarse shadow map $s_c$; the edge map $e$ obtained by the edge detection branch is then fused in to improve the structure of $s_c$ and produce a refined shadow map $s_r$. Finally, we introduce a gated structure-aware loss [61] to force the boundaries of these two shadow maps to comply with image edges.

Unlike recent SOTA shadow detectors [5, 62, 66, 67], which use ResNeXt101 [56] or EfficientNet [41] to extract multi-scale features of shadow images, we use SegNeXt-B [12] as our backbone, which can perform better than recent vision transformers (ViTs) [31, 55, 63] with simple and cheap convolutions. For the feature maps extracted from the different stages of the shared encoder, we first compress and upsample them to the same resolution, and then fuse them and feed them into a 1×1 convolutional kernel to generate the edge map and shadow map for the two tasks separately.

5.2 Multi-task learning

Shadow detection task: For scribble supervision, we use the partial cross-entropy loss $\mathcal{L}_{scribble}$ on the labeled points of the augmented annotations:

$\mathcal{L}_{scribble} = -\tfrac{1}{|A_s|}\sum_{i \in A_s} \log \hat{y}_i - \tfrac{1}{|A_{ns}|}\sum_{j \in A_{ns}} \log(1 - \hat{y}_j)$,  (6)

where $A_s$ and $A_{ns}$ denote the augmented shadow and non-shadow annotations, respectively, and $\hat{y}_i$ is the predicted probability of the $i$-th pixel belonging to shadow.

Edge detection task: We use the SOTA edge detector EDTER [39] to produce edge maps $e'$ for all training images, then use them as GT to train the edge detection branch with a loss $\mathcal{L}_{edge}$:

$\mathcal{L}_{edge}(e, e') = -\sum_{i=1}^{N}\big(e'_i \log e_i + (1 - e'_i)\log(1 - e_i)\big)$.  (7)

Gated structure-aware loss: Although the edge features from the edge detection task help the shadow detection network generate a shadow map with richer structures, they do not limit the scope of the structure that needs to be recovered. For example, too much structural information is retained in non-shadow regions. Inspired by the gated structure-aware loss [61], which introduces a gate for the smoothing loss [48] to place constraints on the scope of structure to be recovered, we employ it to encourage the predicted shadow maps to have consistent intensities inside the shadow regions and distinct
boundaries at the shadow edges. This loss is defined as:

$\mathcal{L}_{gate} = \sum_{w,h}\sum_{p \in \{x, y\}} \phi\big(|\partial_p s_{w,h}|\, e^{-\alpha |\partial_p (G \cdot I)_{w,h}|}\big)$,  (8)

where $\phi(\cdot)$ is the square-root operation, $I_{w,h}$ is the gray-level intensity at location $(w, h)$, $\partial_p$ is the partial derivative, and $G$ is the gate. As shown in Fig. 6, we dilate the predicted shadow map (Fig. 6c) to obtain a gate map (Fig. 6d). Using it, the gated edge detection result (Fig. 6e) helps the network focus on the shadow region and predict sharp boundaries in the shadow map.

Based on the above loss functions, we define the total loss function $\mathcal{L}_{Multi\text{-}task}$ as follows:

$\mathcal{L}_{Multi\text{-}task} = \mathcal{L}_{scribble}(s_c) + \mathcal{L}_{scribble}(s_r) + \alpha_1 \mathcal{L}_{gate}(s_c) + \alpha_2 \mathcal{L}_{gate}(s_r) + \alpha_3 \mathcal{L}_{edge}(e, e')$,  (9)

where $\alpha_1$, $\alpha_2$, and $\alpha_3$ are hyper-parameters balancing the effect of each loss function. We use $\mathcal{L}_{scribble}$ and $\mathcal{L}_{gate}$ on both the coarse shadow map $s_c$ and the refined shadow map $s_r$, and employ $\mathcal{L}_{edge}$ for the edge map $e$. Note that $\mathcal{L}_{scribble}$ does not contradict $\mathcal{L}_{gate}$: $\mathcal{L}_{scribble}$ aims to propagate the annotated labels to the shadow regions (relying on the shadow detection branch), while $\mathcal{L}_{gate}$ constrains $s_c$ and $s_r$ to be well aligned with the edges extracted by the edge detection branch, preventing the shadow labels from propagating to non-shadow regions.

6 Experiments

6.1 Setup

Benchmarks. All experiments in our work are conducted on the following datasets: Scr-SBU, Scr-ISTD, and UCF [65]. Specifically, we first train and test our WSD on the Scr-SBU and Scr-ISTD datasets separately. Next, since the UCF dataset is of limited size and its image scenes are similar to SBU, we also validate the detection performance on it using a model well trained on Scr-SBU.

Metrics. We use the balanced error rate (BER) to evaluate our proposed detector quantitatively:

$BER = \Big(1 - \tfrac{1}{2}\big(\tfrac{TP}{TP + FN} + \tfrac{TN}{TN + FP}\big)\Big) \times 100$.  (10)

We also consider the per-pixel error rates of shadow and non-shadow regions, denoted as "S" and "NS," respectively.

Methods for comparison. We first compare our method with ten recent SOTA shadow detectors: SDTR+ [52], SDCM [68], FDRNet [67], MTMT-Net [5], DSD [62], A+D Net [28], DSC [16], BDRAR [66], scGAN [35], and stacked-CNN [46]. For a more comprehensive comparison, we also compare our method with a weakly supervised salient object detector, WSOD [57], and a weakly supervised camouflaged object detector, WCOD [14]. For a fair comparison, we directly use the evaluation values reported in their publications.

6.2 Implementation details

Our GCN model and shadow detection network are implemented on an NVIDIA RTX 3090 GPU with Python 3.6 and PyTorch 1.7. In detail, our two-layered GCN model has a 32-unit hidden layer and an output layer for binary classification. We use Adam as our gradient optimizer; the learning rate is set to 0.01 to train all graphs from a single image for 600 epochs. In our uncertainty analysis (Sect. 4.3), we set the dropout rate in the hidden layer to p = 0.3, T = 20, and τ = 0.8. In our shadow detection network, we first resize all input images to 416×416. The learning rate is set to 1e-4 with a poly learning rate decay policy. The batch size is set to 8, and the hyper-parameters $\alpha_1$, $\alpha_2$, and $\alpha_3$ in Eq. (9) are set to 0.2, 0.2, and 0.6, respectively. During testing, we also apply
Fig. 6 (a) Input. (b) Edge map. (c) Shadow map. (d) Gated shadow map. (e) Gated edge map
the CRF [25] to further refine the predicted shadow maps, like previous works [5, 62, 67].

6.3 Comparison with SOTA methods

One can observe from Table 1 that our fully supervised shadow detector (i.e., Ours-F) achieves the best BER values among all competitors on the SBU, UCF, and ISTD datasets. Compared to the second-best fully supervised methods, Ours-F reduces the BER values by 6.46%, 2.20%, and 3.47% on SBU, UCF, and ISTD, respectively. Moreover, Ours-W (i.e., WSD) also outperforms the weakly supervised methods, improving BER by 9.35%, 8.78%, and 8.44% on the SBU, UCF, and ISTD datasets, respectively. In addition, we also provide the computational complexity (i.e., FLOPs) and the network parameters of all competing methods. Owing to the usage of SegNeXt-B as our network backbone, this lightweight design does not introduce many training parameters or much computational complexity.

Recall that the shadow scenes in the SBU dataset are more diverse and complex, and there are many noisy labels in the SBU training set; training with them is prone to performance bottlenecks. In this work, however, we use scribbles to mark error-prone regions empirically, and then develop the label augmentation scheme to obtain denser annotations. Moreover, the edge detection-guided multi-task learning framework leads the predictions to align with shadow boundaries. We can therefore outperform existing fully supervised and semi-supervised methods on the SBU and ISTD datasets by a small margin. On the UCF dataset, our detector does not show a clear superiority over SDTR+, because that method learns from more auxiliary training data for better generalizability.

As shown in Fig. 7, we also qualitatively compare our shadow detector with the most recent SOTA methods. One can see that our detector has the best visual performance among all competitors across various scenes. When coping with tiny shadows (e.g., the first two rows), self shadows (e.g., the 3rd and 4th rows), soft shadows (e.g., the 5th row), and common cast shadows (e.g., the last three rows), our method detects them more accurately and with finer structures.

6.4 Ablation study and analysis

As shown in Table 2 and Fig. 8, we conduct extensive experiments to analyze our methods, including the label augmentation scheme ("V2" and "V3"), multi-task learning ("V4"), the loss function ("V5" and "V6"), and robustness analysis ("V7" and "V8"). In this paper, "V6" denotes our final result.

Directly training on scribbles. We directly train the shadow detection branch in Fig. 5 with the sparse scribble annotation and Eq. (6), denoted as "V1". One can observe that "V1" cannot preserve the shadow structure well, owing to the intra-class discontinuity caused by the sparse scribbles.

Effect of label augmentation. We use the pseudo-masks from our weak augmentation and strong augmentation schemes to train the shadow detection branch as in "V1", denoted as "V2" and "V3". Compared with the original scribble annotation, one can observe that both of our proposed label augmentation schemes lead to different degrees of performance improvement. Among them, the effect of strong augmentation is more remarkable: by learning only from the most reliable training samples identified by our strong augmentation scheme, our weakly supervised shadow detection method already achieves performance similar to SOTA approaches.

Effect of multi-task learning. We add the edge detection task to "V3", denoted as "V4". Compared to "V3", we observe from Fig. 8f that the edge detection branch provides richer structural constraints for shadow detection.

Effect of gated structure-aware loss. We use the auxiliary smoothing loss [48] and the gated structure-aware loss [61] to train "V4", denoted as "V5" and "V6", respectively. One can observe from Fig. 8g that simply using the smoothing loss, which enforces smoothness over the whole image, makes the shadow map ambiguous, while the gated structure-aware loss in "V6" tackles this issue.

Effect of different edge maps. We also use other edge detection methods, i.e., RCF [30] or Sobel [24], to extract the edge maps for shadow images, and then use them to train the edge detection network, denoted as "V7" and "V8", respectively. Since Sobel and RCF are more sensitive to
Table 1 Quantitative comparisons of shadow detection methods on three benchmark datasets

| Method | Year | Type | Params (M) | FLOPs (G) | SBU [46] BER↓ | S↓ | NS↓ | UCF [65] BER↓ | S↓ | NS↓ | ISTD [47] BER↓ | S↓ | NS↓ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| stacked-CNN [46] | 2016 | F | – | – | 11.00 | 8.84 | 12.76 | 13.00 | 8.84 | 12.76 | 8.60 | 7.69 | 9.23 |
| scGAN [35] | 2017 | F | – | – | 9.10 | 8.39 | 9.69 | 11.50 | 7.74 | 15.30 | 4.70 | 3.22 | 6.18 |
| BDRAR [66] | 2018 | F | 42.46 | 31.42 | 3.64 | 3.40 | 3.89 | 7.81 | 9.69 | 5.44 | 2.69 | 0.50 | 4.87 |
| DSC [16] | 2018 | F | 122.48 | 61.66 | 5.59 | 9.76 | 1.42 | 10.54 | 18.08 | 3.00 | 3.42 | 3.85 | 3.00 |
| A+D Net [28] | 2018 | F | 54.41 | – | 5.37 | 4.45 | 6.30 | 9.25 | 8.37 | 10.14 | – | – | – |
| DSD [62] | 2019 | F | 58.16 | 46.63 | 3.45 | 3.33 | 3.58 | 7.59 | 9.74 | 5.44 | 2.17 | 1.36 | 2.98 |
| MTMT-Net [5] | 2020 | F | 44.13 | 47.34 | 3.15 | 3.73 | 2.57 | 7.47 | 10.31 | 4.63 | 1.72 | 1.36 | 2.08 |
| FDRNet [67] | 2021 | F | 10.77 | 14.32 | 3.04 | 2.91 | 3.18 | 7.28 | 8.31 | 6.26 | 1.55 | 1.22 | 1.88 |
| SDCM [68] | 2022 | F | 10.95 | 15.04 | 2.94 | – | – | 6.73 | – | – | 1.44 | – | – |
| SDTR+ [52] | 2023 | F | 24.82 | 35.17 | 2.95 | 3.15 | 2.75 | 6.35 | 6.73 | 5.97 | 1.55 | 1.18 | 1.84 |
| Ours-F | – | F | 34.81 | 24.32 | 2.76 (6.46%↑) | 3.04 | 2.48 | 6.21 (2.20%↑) | 6.58 | 5.84 | 1.39 (3.47%↑) | 1.06 | 1.72 |
| Ours-W (WSD) | – | W | – | – | 2.81 (9.35%↑) | 2.79 | 2.83 | 6.65 (8.78%↑) | 7.65 | 5.65 | 1.41 (8.44%↑) | 1.22 | 1.87 |
| WCOD [14] | 2023 | W | 29.52 | 13.46 | 3.10 | 3.08 | 3.12 | 7.29 | 8.12 | 6.46 | 1.54 | 1.36 | 1.72 |
| WSOD [57] | 2023 | W | 48.07 | 26.17 | 3.23 | 3.22 | 3.24 | 7.46 | 8.25 | 6.67 | 1.77 | 1.53 | 2.01 |

"F" and "W" denote the fully and weakly supervised training type, respectively. The best two fully supervised detectors are set in red and blue, and the best two weakly supervised detectors in green and copper. We also provide, in bold, the percentage improvement of our two detectors compared to the best previous detector.
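The BER scores reported in Table 1 follow Eq. (10). A minimal NumPy sketch of the metric (our own illustration, not an official evaluation script):

```python
import numpy as np

def balanced_error_rate(pred, gt):
    """Eq. (10): BER = (1 - 0.5 * (TP/(TP+FN) + TN/(TN+FP))) * 100.

    pred, gt: binary arrays with 1 = shadow, 0 = non-shadow."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    tp = np.sum(pred & gt)
    tn = np.sum(~pred & ~gt)
    fn = np.sum(~pred & gt)
    fp = np.sum(pred & ~gt)
    shadow_err = 1 - tp / (tp + fn)        # per-pixel error on shadow ("S")
    non_shadow_err = 1 - tn / (tn + fp)    # per-pixel error on non-shadow ("NS")
    return 100 * 0.5 * (shadow_err + non_shadow_err)

# Missing one of four shadow pixels gives a 25% shadow error rate,
# a 0% non-shadow error rate, and hence a BER of 12.5.
gt   = np.array([1, 1, 1, 1, 0, 0, 0, 0])
pred = np.array([1, 1, 1, 0, 0, 0, 0, 0])
ber = balanced_error_rate(pred, gt)  # -> 12.5
```

Lower is better, and the balancing makes the score insensitive to the shadow/non-shadow pixel ratio of a dataset.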
Fig. 7 (a) Inputs. (b) A+D. (c) DSC. (d) BDRAR. (e) DSD. (f) MTMT. (g) SegNeXt. (h) Ours. (i) GT
Fig. 8 (a) Input. (b) GT. (c) V1. (d) V2. (e) V3. (f) V4. (g) V5. (h) V6 (WSD). (i) V7. (j) V8
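What separates "V5" from "V6" in the ablation is the gate in the structure-aware loss of Eq. (8). The loss can be sketched as follows; this is our NumPy sketch, not the authors' implementation, and the value α = 10 and the small ε inside the square root are illustration-only assumptions.

```python
import numpy as np

def gated_structure_aware_loss(s, I, G, alpha=10.0, eps=1e-12):
    """Sketch of Eq. (8): edge-aware smoothness on the shadow map s.

    s: (H, W) predicted shadow map; I: (H, W) gray-level image;
    G: (H, W) binary gate (dilated predicted shadow map), so only image
    edges inside the gate may excuse a shadow-map gradient."""
    loss = 0.0
    for axis in (0, 1):                              # p in {x, y}
        ds = np.abs(np.diff(s, axis=axis))           # |d_p s|
        dgi = np.abs(np.diff(G * I, axis=axis))      # |d_p (G . I)|
        # phi is the square root; eps keeps the value finite at 0.
        loss += np.sum(np.sqrt(ds * np.exp(-alpha * dgi) + eps))
    return loss

# A shadow-map step that coincides with an image edge inside the gate is
# penalized far less than the same step over a flat image.
s = np.zeros((4, 4)); s[:, 2:] = 1.0      # shadow map with a vertical step
G = np.ones_like(s)                       # gate covers the whole image here
on_edge = gated_structure_aware_loss(s, s, G)
off_edge = gated_structure_aware_loss(s, np.zeros_like(s), G)
```

Gradients of the shadow map are thus suppressed everywhere except where the gated image itself has an edge, which is exactly the behavior that sharpens "V6" relative to "V5".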
(a) Skin image. (b) Ground truth. (c) Our scribble annotation. (d) Edge map. (e) Superpixel segmentation. (f) Weak label augmentation. (g) Strong label augmentation. (h) Lesion mask
(a) Shadow image. (b) Ground truth. (c) Edge map. (d) Shadow mask
image noise or texture than EDTER, "V7" and "V8" perform worse than "V6".

6.5 Application

As was expected, we obtained impressive skin lesion detection results by training our weakly supervised learning framework.
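The uncertainty-guided selection of Eqs. (3)-(5), which underlies the strong label augmentation used in both the shadow and the skin-lesion experiments, can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: the toy forward function, the simulated dropout, and the use of base-2 logarithms (so that the maximum uncertainty is 1 and the paper's τ = 0.8 is attainable) are our assumptions.

```python
import numpy as np

def mc_dropout_labels(forward, x, T=20, tau=0.8, seed=0):
    """Eqs. (3)-(5): average T stochastic forward passes (dropout kept on),
    score each node by the binary entropy of the mean prediction, and
    assign the placeholder label 2 to nodes whose uncertainty exceeds tau."""
    rng = np.random.default_rng(seed)
    D = np.mean([forward(x, rng) for _ in range(T)], axis=0)      # Eq. (3)
    eps = 1e-12                                                   # guards log(0)
    U = -(D * np.log2(D + eps) + (1 - D) * np.log2(1 - D + eps))  # Eq. (4)
    return np.where(U > tau, 2, (D > 0.5).astype(int))            # Eq. (5)

def noisy_gcn(x, rng):
    """Toy stand-in for a GCN with dropout rate p = 0.3 kept on at inference."""
    keep = (rng.random(x.shape) > 0.3) / 0.7      # inverted-dropout scaling
    return 1.0 / (1.0 + np.exp(-x * keep))

logits = np.array([4.0, -4.0, 0.1])   # confident shadow / non-shadow / ambiguous
labels = mc_dropout_labels(noisy_gcn, logits)
```

The ambiguous node, whose mean prediction stays near 0.5 across passes, receives label 2 and would be discarded from the subsequent training, while the confident nodes keep their binarized labels.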
7 Conclusions 4. Chen, X.D., Wu, W., Yang, W., Qin, H., Wu, X., Mao, X.: Make
segment anything model perfect on shadow detection. IEEE Trans.
Geosci. Remote Sens. 61, 1–13 (2023)
In this work, we create two shadow detection datasets with 5. Chen, Z., Zhu, L., Wan, L., Wang, S., Feng, W., Heng, P.A.: A
scribble annotations, i.e., Scr-SBU and Scr-ISTD. At the multi-task mean teacher for semi-supervised shadow detection. In:
same time, we present a corresponding weakly supervised Proceedings of Computer Vision and Pattern Recognition, 5611–
shadow detection framework. Our method can achieve a 5620. IEEE (2020)
6. Codella, N., Rotemberg, V., Tschandl, P., Celebi, M.E., Dusza, S.,
good trade-off between the labeling cost and detection accu- Gutman, D., Helba, B., Kalloo, A., Liopyris, K., Marchetti, M.,
racy. Specifically, an uncertainty-guided label augmentation et al.: Skin lesion analysis toward melanoma detection 2018: A
scheme is developed to expand the sparse scribble annotation challenge hosted by the international skin imaging collaboration
to more reliable regions and enable sufficient training for a (isic). arXiv preprint arXiv:1902.03368 (2019)
7. Cucchiara, R., Grana, C., Piccardi, M., Prati, A., Sirotti, S.: Improv-
deep model. Furthermore, we introduce a multi-task learn-
ing shadow suppression in moving object detection with hsv color
ing framework to produce shadow maps with rich structures. information. In: Proceedings of Intelligent Transportation Systems,
Extensive experiments demonstrate that our method outper- 334–339. IEEE (2001)
forms existing SOTA semi-supervised and fully supervised 8. Ecins, A., Fermüller, C., Aloimonos, Y.: Shadow free segmenta-
tion in still images using local density measure. In: Proceedings
methods. In the future, we will extend our methods, e.g., label
of International Conference on Computational Photography, 1–8.
augmentation or edge-guided multi-task learning framework IEEE (2014)
to more computer vision tasks ,i.e., semantic segmentation 9. Ge, Y., Zhou, Q., Wang, X., Shen, C., Wang, Z., Li, H.: Point-
and salient object detection. teaching: weakly semi-supervised object detection with point
annotations. In: Proceedings of the AAAI Conference on Artifi-
Author contributions WW was involved in conceptualization, writing— cial Intelligence, 37, 667–675 (2023)
original draft, writing—review & editing. HC helped in validation, 10. Guan, Y.P.: Wavelet multi-scale transform based foreground seg-
methodology, project administration, funding acquisition. X-DC con- mentation and shadow elimination. Open Signal Process. J. 1, 1–6
tributed to visualization, formal analysis, data curation. WY and XM (2008)
helped in supervision. 11. Gulshan, V., Rother, C., Criminisi, A., Blake, A., Zisserman, A.:
Geodesic star convexity for interactive image segmentation. In:
Funding The work was supported by the National Natural Science Proceedings of Computer Vision and Pattern Recognition, 3129–
Foundation of China (61972120), the General Research Project of 3136. IEEE (2010)
Zhejiang Provincial Department of Education (Y202044861), the Zhe- 12. Guo, M.H., Lu, C.Z., Hou, Q., Liu, Z., Cheng, M.M., Hu, S.M.:
jiang Provincial Natural Science Foundation (Regional Innovation Joint Segnext: Rethinking convolutional attention design for semantic
Funds) (LQZSZ24E050001). segmentation. In: Proceedings of Advances in Neural Information
Processing Systems. MIT Press (2022)
Data availability The datasets generated during and/or analyzed dur- 13. Guo, R., Dai, Q., Hoiem, D.: Single-image shadow detection and
ing the current study are available from the corresponding author on removal using paired regions. In: Proceedings of Computer Vision
reasonable request. and Pattern Recognition, 2033–2040. IEEE (2011)
14. He, R., Dong, Q., Lin, J., Lau, R.W.: Weakly-supervised camou-
flaged object detection with scribble annotations. In: Proceedings
Declarations of the AAAI Conference on Artificial Intelligence, 37, 781–789
(2023)
Conflict of interest The authors declare that they have no conflict of 15. Hu, X., Wang, T., Fu, C.W., Jiang, Y., Wang, Q., Heng, P.A.: Revisit-
interest. ing shadow detection: a new benchmark dataset for complex world.
IEEE Trans. Image Process. 30, 1925–1934 (2021)
Ethics approval The work follows appropriate ethical standards in 16. Hu, X., Zhu, L., Fu, C.W., Qin, J., Heng, P.A.: Direction-aware
conducting research and writing the manuscript. This work presents spatial context features for shadow detection. In: Proceedings
computational models trained with publicly available data, for which of Computer Vision and Pattern Recognition, 7454–7462. IEEE
no ethical approval was required. (2018)
17. Huang, X., Hua, G., Tumblin, J., Williams, L.: What character-
izes a shadow boundary under the sun and sky? In: Proceedings
of International Conference on Computer Vision, 898–905. IEEE
(2011)
18. Joshi, I., Kothari, R., Utkarsh, A., Kurmi, V.K., Dantcheva, A.,
References Roy, S.D., Kalra, P.K.: Explainable fingerprint roi segmentation
using monte carlo dropout. In: Proceedings of Winter Conference
1. Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., Süsstrunk, S.: on Applications of Computer Vision, 60–69 (2021)
SLIC superpixels compared to state-of-the-art superpixel methods. 19. Junejo, I.N., Foroosh, H.: Estimating geo-temporal location of
IEEE Trans. Pattern Anal. Mach. Intell. 34(11), 2274–2282 (2012) stationary cameras using shadow trajectories. In: Proceedings of
2. Al-Amaren, A., Ahmad, M.O., Swamy, M.: A low-complexity European Conference on Computer Vision, 318–331. Springer
residual deep neural network for image edge detection. Appl. Intell. (2008)
53(9), 11282–11299 (2022) 20. Kang, S., Kim, J., Jang, I.S., Lee, B.D.: C2shadowgan: cycle-in-
3. Al-Huda, Z., Peng, B., Algburi, R.N.A., Alfasly, S., Li, T.: Weakly cycle generative adversarial network for shadow removal using
supervised pavement crack semantic segmentation based on multi- unpaired data. Appl. Intell. 53(12), 15067–15079 (2023)
scale object localization and incremental annotation refinement.
Appl. Intell. 53(11), 14527–14546 (2023)
123
Annotate less but perform better: weakly supervised...
21. Karsch, K., Hedau, V., Forsyth, D., Hoiem, D.: Rendering synthetic objects into legacy photographs. ACM Trans. Graph. 30(6), 1–12 (2011)
22. Kendall, A., Gal, Y.: What uncertainties do we need in bayesian deep learning for computer vision? In: Proceedings of Advances in Neural Information Processing Systems. MIT Press (2017)
23. Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: Proceedings of International Conference on Learning Representation, 1–14 (2017)
24. Kittler, J.: On the accuracy of the Sobel edge detector. Image Vis. Comput. 1(1), 37–42 (1983)
25. Krähenbühl, P., Koltun, V.: Efficient inference in fully connected crfs with gaussian edge potentials. In: Proceedings of Advances in Neural Information Processing Systems. MIT Press (2011)
26. Lalonde, J.F., Efros, A.A., Narasimhan, S.G.: Detecting ground shadows in outdoor consumer photographs. In: Proceedings of European Conference on Computer Vision, 322–335. Springer (2010)
27. Lalonde, J.F., Efros, A.A., Narasimhan, S.G.: Estimating the natural illumination conditions from a single outdoor image. Int. J. Comput. Vis. 98(2), 123–145 (2012)
28. Le, H., Vicente, T.F.Y., Nguyen, V., Hoai, M., Samaras, D.: A+d net: Training a shadow detector with adversarial shadow attenuation. In: Proceedings of European Conference on Computer Vision, 662–678. Springer (2018)
29. Liu, D., Cui, Y., Yan, L., Mousas, C., Yang, B., Chen, Y.: Densernet: Weakly supervised visual localization using multi-scale feature aggregation. In: Proceedings of the AAAI Conference on Artificial Intelligence, 35, 6101–6109 (2021)
30. Liu, Y., Cheng, M.M., Hu, X., Wang, K., Bai, X.: Richer convolutional features for edge detection. In: Proceedings of Computer Vision and Pattern Recognition, 3000–3009. IEEE (2017)
31. Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of International Conference on Computer Vision, 10012–10022. IEEE (2021)
32. Luo, D., Liu, G., Bavirisetti, D.P., Cao, Y.: Infrared and visible image fusion based on VPDE model and VGG network. Appl. Intell. 53(21), 24739–24764 (2023)
33. Meng, Q., Sinclair, M., Zimmer, V., Hou, B., Rajchl, M., Toussaint, N., Oktay, O., Schlemper, J., Gomez, A., Housden, J., et al.: Weakly supervised estimation of shadow confidence maps in fetal ultrasound imaging. IEEE Trans. Med. Imaging 38(12), 2755–2767 (2019)
34. Mikic, I., Cosman, P.C., Kogut, G.T., Trivedi, M.M.: Moving shadow and object detection in traffic scenes. In: Proceedings of International Conference on Pattern Recognition, P1, 321–324. IEEE (2000)
35. Nguyen, V., Yago Vicente, T.F., Zhao, M., Hoai, M., Samaras, D.: Shadow detection with conditional generative adversarial networks. In: Proceedings of International Conference on Computer Vision, 4510–4518. IEEE (2017)
36. Okabe, T., Sato, I., Sato, Y.: Attached shadow coding: Estimating surface normals from shadows under unknown reflectance and lighting conditions. In: Proceedings of International Conference on Computer Vision, 1693–1700. IEEE (2009)
37. Panagopoulos, A., Samaras, D., Paragios, N.: Robust shadow and illumination estimation using a mixture model. In: Proceedings of Computer Vision and Pattern Recognition, 651–658. IEEE (2009)
38. Pei, J., Tang, H., Wang, W., Cheng, T., Chen, C.: Salient instance segmentation with region and box-level annotations. Neurocomputing 507, 332–344 (2022)
39. Pu, M., Huang, Y., Liu, Y., Guan, Q., Ling, H.: Edter: Edge detection with transformer. In: Proceedings of Computer Vision and Pattern Recognition, 1402–1412. IEEE (2022)
40. Suykens, J.A., Vandewalle, J.: Least squares support vector machine classifiers. Neural Process. Lett. 9(3), 293–300 (1999)
41. Tan, M., Le, Q.: Efficientnet: Rethinking model scaling for convolutional neural networks. In: Proceedings of International Conference on Machine Learning, 6105–6114. PMLR (2019)
42. Tang, M., Djelouah, A., Perazzi, F., Boykov, Y., Schroers, C.: Normalized cut loss for weakly-supervised cnn segmentation. In: Proceedings of Computer Vision and Pattern Recognition, 1818–1827. IEEE (2018)
43. Unal, O., Dai, D., Van Gool, L.: Scribble-supervised lidar semantic segmentation. In: Proceedings of Computer Vision and Pattern Recognition, 2697–2707. IEEE (2022)
44. Vicente, T.F.Y.: Large-scale weakly-supervised shadow detection. Ph.D. thesis, State University of New York at Stony Brook (2018)
45. Vicente, T.F.Y., Hoai, M., Samaras, D.: Leave-one-out kernel optimization for shadow detection and removal. IEEE Trans. Pattern Anal. Mach. Intell. 40(3), 682–695 (2017)
46. Vicente, T.F.Y., Hou, L., Yu, C.P., Hoai, M., Samaras, D.: Large-scale training of shadow detectors with noisily-annotated shadow examples. In: Proceedings of European Conference on Computer Vision, 816–832. Springer (2016)
47. Wang, J., Li, X., Yang, J.: Stacked conditional generative adversarial networks for jointly learning shadow detection and shadow removal. In: Proceedings of Computer Vision and Pattern Recognition, 1788–1797. IEEE (2018)
48. Wang, Y., Yang, Y., Yang, Z., Zhao, L., Wang, P., Xu, W.: Occlusion aware unsupervised learning of optical flow. In: Proceedings of Computer Vision and Pattern Recognition, 4884–4893. IEEE (2018)
49. Wu, L., Cao, X., Foroosh, H.: Camera calibration and geo-location estimation from two shadow trajectories. Comput. Vis. Image Underst. 114(8), 915–927 (2010)
50. Wu, W., Chen, X.D., Yang, W., Yong, J.H.: Exploring better target for shadow detection. Knowl.-Based Syst. 273, 110614 (2023)
51. Wu, W., Wu, X., Wan, Y.: Single-image shadow removal using detail extraction and illumination estimation. Vis. Comput. 38(5), 1677–1687 (2022)
52. Wu, W., Yang, W., Ma, W., Chen, X.D.: How many annotations do we need for generalizing new-coming shadow images? IEEE Trans. Circ. Syst. Video Technol. 1–12 (2023)
53. Wu, W., Zhang, S., Tian, M., Tan, D., Wu, X., Wan, Y.: Learning to detect soft shadow from limited data. Vis. Comput. 38(5), 1665–1675 (2022)
54. Wu, W., Zhang, S., Zhou, K., Yang, J., Wu, X., Wan, Y.: Shadow removal via dual module network and low error shadow dataset. Comput. Graph. 95, 156–163 (2021)
55. Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J.M., Luo, P.: Segformer: Simple and efficient design for semantic segmentation with transformers. In: Proceedings of Advances in Neural Information Processing Systems. MIT Press (2021)
56. Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: Proceedings of Computer Vision and Pattern Recognition, 1492–1500. IEEE (2017)
57. Xu, B., Liang, H., Gong, W., Liang, R., Chen, P.: A visual representation-guided framework with global affinity for weakly supervised salient object detection. IEEE Trans. Circ. Syst. Video Technol. 1–12 (2023)
58. Yang, W., Wu, W., Chen, X.D., Tao, X., Mao, X.: How to use extra training data for better edge detection? Appl. Intell. 1–15 (2023)
59. Ye, S., Chen, D., Han, S., Liao, J.: Learning with noisy labels for robust point cloud segmentation. In: Proceedings of International Conference on Computer Vision, 6443–6452 (2021)
60. Zhang, B., Xiao, J., Jiao, J., Wei, Y., Zhao, Y.: Affinity attention graph neural network for weakly supervised semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 44(11), 8082–8096 (2022)
61. Zhang, J., Yu, X., Li, A., Song, P., Liu, B., Dai, Y.: Weakly-supervised salient object detection via scribble annotations. In: Proceedings of Computer Vision and Pattern Recognition, 12546–12555 (2020)
62. Zheng, Q., Qiao, X., Cao, Y., Lau, R.W.: Distraction-aware shadow detection. In: Proceedings of Computer Vision and Pattern Recognition, 5167–5176. IEEE (2019)
63. Zheng, S., Lu, J., Zhao, H., Zhu, X., Luo, Z., Wang, Y., Fu, Y., Feng, J., Xiang, T., Torr, P.H., et al.: Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In: Proceedings of Computer Vision and Pattern Recognition, 6881–6890. IEEE (2021)
64. Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Learning deep features for discriminative localization. In: Proceedings of Computer Vision and Pattern Recognition, 2921–2929. IEEE (2016)
65. Zhu, J., Samuel, K.G., Masood, S.Z., Tappen, M.F.: Learning to recognize shadows in monochromatic natural images. In: Proceedings of Computer Vision and Pattern Recognition, 223–230. IEEE (2010)
66. Zhu, L., Deng, Z., Hu, X., Fu, C.W., Xu, X., Qin, J., Heng, P.A.: Bidirectional feature pyramid network with recurrent attention residual modules for shadow detection. In: Proceedings of European Conference on Computer Vision, 121–136. Springer (2018)
67. Zhu, L., Xu, K., Ke, Z., Lau, R.W.: Mitigating intensity bias in shadow detection via feature decomposition and reweighting. In: Proceedings of International Conference on Computer Vision, 4702–4711. IEEE (2021)
68. Zhu, Y., Fu, X., Cao, C., Wang, X., Sun, Q., Zha, Z.J.: Single image shadow detection via complementary mechanism. In: Proceedings of the ACM International Conference on Multimedia, 6717–6726 (2022)

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Xiao-Diao Chen received the bachelor's degree from Zhejiang University, Hangzhou, China, in 2000, and the master's and Ph.D. degrees from Tsinghua University, Beijing, China, in 2003 and 2006, respectively. He is currently a faculty member with the School of Computer, Hangzhou Dianzi University, Hangzhou. His research interests include approximation and interpolation methods applied in computer graphics, edge detection, and shadow detection in image processing.

Wen Wu received his B.Sc. degree in electronic and information engineering from Wuhan College of Arts and Science in 2016 and M.Sc. degree in computer applied technology from Hubei University, Wuhan, in 2019. He is currently pursuing the Ph.D. degree in computer science and technology with Hangzhou Dianzi University, Hangzhou, China. From 2019 to 2022, he was a lecturer in computer science with the Xinjiang Institute of Technology, Aksu, China. His research interests include deep learning and computer vision.