Annotate Less But Perform Better: Weakly Supervised Shadow Detection Via Label Augmentation
https://doi.org/10.1007/s00371-024-03278-6
ORIGINAL ARTICLE
Abstract
Shadow detection is essential for scene understanding and image restoration. Existing paradigms for producing shadow detection training data usually rely on densely labeling every image pixel, which leads to a bottleneck when scaling up the number of images. To tackle this problem, this paper labels shadow images with only a few strokes and designs a learning framework for Weakly supervised Shadow Detection, namely WSD. First, it creates two shadow detection datasets with scribble annotations, namely Scr-SBU and Scr-ISTD. Second, it proposes an uncertainty-guided label augmentation scheme based on graph convolutional networks, which propagates the sparse scribble annotations to more reliable regions and thus prevents the model from converging to an undesired local minimum caused by intra-class discontinuity. Finally, it introduces a multi-task learning framework to jointly learn shadow detection and edge detection, which encourages the generated shadow maps to be comprehensive and well aligned with shadow boundaries. Experimental results on benchmark datasets demonstrate that our framework even outperforms existing semi-supervised and fully supervised shadow detectors while requiring only 2% of pixels to be labeled.
Keywords Image segmentation · Shadow detection · Weakly supervised · Graph convolutional network · Uncertainty analysis
H. Chen et al.
Fig. 1 (a) Scribble annotations. (b) Dense labeling. (c) Weak augmentation. (d) Strong augmentation. (e) Baseline. (f) Edge map. (g) Multi-task learning. (h) WSD (ours)
[47]), namely Scr-SBU and Scr-ISTD, with flexible scribbles (Fig. 1a).

Compared to dense labeling (Fig. 1b), scribble annotation can reduce the labeling cost by 20-40 times. Since scribble annotations are generally sparse, they bring a new challenge: deep models trained on them are prone to be trapped in an undesired local minimum caused by intra-class discontinuity, i.e., shadow regions often cover multiple materials with different textures. Fortunately, this problem can be alleviated by exploring more pixel-level training samples. Thanks to the powerful label propagation capability of graph convolutional networks (GCNs) [23, 60], we propose a novel label augmentation scheme, which expands sparse scribbles into denser annotations (Fig. 1c and d) by propagating label information among superpixel nodes and conducting uncertainty analysis on the GCN outputs.

One can observe that even the augmented annotations are still hard to align with the edges of shadows, resulting in poor boundary localization in the shadow maps (Fig. 1e). A recent semi-supervised shadow detector, MTMT-Net [5], introduces a shadow edge detection task to produce features that highlight shadow structures. However, this approach is not suitable for us for two reasons. First, the shadow edge masks in MTMT-Net are extracted from ground-truth (GT) shadow masks, which means the method still relies on pixel-level dense labeling and goes against the original intention of weakly supervised learning. Second, the severe label noise in the GT shadow masks causes the extracted shadow edge masks to be poorly aligned with shadow boundaries, resulting in unsatisfactory shadow edge detection.

To tackle this problem, we introduce a generic edge detection task instead of shadow edge detection. Note that the GT edge maps (Fig. 1f) in our task are produced by a state-of-the-art (SOTA) edge detector, EDTER [39]. When we train the shadow detection and edge detection tasks jointly, edge detection can provide rich edge features for shadow detection. However, many structural details will also exist in the non-shadow regions (Fig. 1g). Inspired by the gated structure-aware loss [61], we adopt it to focus only on shadow regions and ignore the structure of non-shadow regions. In this way, we can generate shadow maps with smooth predictions inside shadow and non-shadow regions and clear boundaries at the shadow edges (Fig. 1h). Extensive experiments show that our WSD outperforms SOTA shadow detectors while labeling only around 2% of pixels. Our three main contributions are summarized as follows.

1. To alleviate the reliance on dense labeling, we present a weakly supervised shadow detection (WSD) framework and create two corresponding shadow detection datasets with scribble annotations. Experimental results demonstrate that our WSD performs comparably to SOTA semi-supervised and fully supervised shadow detectors.
2. We propose an uncertainty-guided label augmentation scheme based on graph convolutional networks to explore more reliable labels from sparse scribble annotations, which avoids the model converging to an undesired local minimum caused by intra-class discontinuity.
3. To solve the problem of poor boundary localization caused by incomplete labels around shadow edges in the augmented annotations, we introduce a multi-task learning framework to train shadow detection and edge detection jointly. It enables us to extract shadow edge features explicitly and produce more accurate shadow boundaries.
Fig. 2 (a) Illustration of our Scr-SBU and Scr-ISTD (shadow image, shadow mask, scribble). (b) Statistics of the ratio of labeled pixels
responding color. This trick ensures that the trained model is highly discriminative for these regions.

4 Label augmentation

Let us define the training set as $S = \{(I_i, Y_i)\}_{i=1}^{N}$, where $I$ is the shadow image, $Y$ is the scribble mask, and $N$ is the number of training images. Since directly training on the sparse annotations will lead to the deep model being trapped in an undesired local minimum, in this section we develop a label augmentation scheme to explore a denser annotation $\hat{Y}$ from the given sparse scribble mask $Y$. Our label augmentation consists of two degrees of schemes, weak augmentation and strong augmentation (see Fig. 3).

It is worth noting that a superpixel is viewed entirely as a shadow or non-shadow region as long as an arbitrary pixel in it is annotated as shadow or non-shadow. In weak augmentation, we can thereby easily expand the annotation by two to three times (the factor is determined by the superpixel size). In strong augmentation, we further propagate the label information from the labeled superpixels to the remaining parts based on a two-layer GCN model. We then perform uncertainty analysis on the GCN outputs to select the most reliable parts of the entire pseudo-masks, which keeps the deep model from overfitting to the many noisy labels.

4.1 Data formulation

We first use SLIC superpixel segmentation [1] to separate a training image into a series of homogeneous regions. These superpixels are then viewed as nodes in a graph G, allowing us to have a better context for the representation. Note that, since we need to train an individual GCN model for each shadow image in the training set, the overall training cost and memory requirements should be considered. Hence, we attempt to design an efficient GCN model in terms of both feature representation and graph construction.

Feature representation. Shadow regions usually have lower brightness, darker color, weaker texture, and shape clusters. Therefore, we encode the illumination (L channel in Lab color space), color histograms (RGB space), texton histograms [8], and the location of each superpixel into the node representation. Compared with features extracted by a deep network [60], hand-crafted features have lower dimensions. Most importantly, we do not need a pre-trained image classifier to extract high-level features for shadow images. The edge representation can be viewed as a relevance between two adjacent nodes, computed by the Gaussian similarity $\sum_{h=1}^{H} \exp(-\|f_i^h - f_j^h\|_2^2 / \sigma_h^2)$, where $f_i^h$ denotes the representation of node $x_i$ on the $h$-th hand-crafted feature and $\sigma_h$ is the corresponding standard deviation.

Graph construction. Even in extremely complex scenes, shadow regions can still easily be distinguished from the image background by our eyes. We found that humans tend to first locate the approximate scope of shadow regions using the prior knowledge of weak brightness, and then discriminate the ambiguous regions by local and global comparison; e.g., two regions of the same material with different brightness indicate that the darker region must belong to shadow. To conduct more robust label propagation, we treat each superpixel in turn as the center, connecting it with the four nearest nodes and 15 random nodes (these sparse connections trade off training cost against inference accuracy) to consider both the local and global context.

4.2 Label propagation

After our weak augmentation, we can assign each superpixel node a corresponding label: shadow, non-shadow, or unlabeled. Next, we pose the strong label augmentation as a graph-learning problem to propagate the label information among all nodes. The procedure can be formulated as:

$\hat{Y} = \mathrm{Sigmoid}\big(\tilde{A}\,\mathrm{ReLU}(\tilde{A} S W_0)\,W_1\big)$,  (1)
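The sparse graph construction above and the propagation rule of Eq. (1), whose terms are defined in the next paragraph, can be sketched as follows. This is a minimal NumPy illustration rather than the authors' implementation: the node count, feature dimension, and random weights are placeholders, and we apply Ã = A + I_n literally as stated in the text, without degree normalization.

```python
import numpy as np

def build_sparse_adjacency(centers, n_nearest=4, n_random=15, seed=0):
    """Connect each superpixel node to its 4 nearest neighbors (local
    context) and 15 random nodes (global context), as described under
    "Graph construction". `centers` holds one (x, y) location per node."""
    rng = np.random.default_rng(seed)
    n = len(centers)
    A = np.zeros((n, n))
    for i in range(n):
        dist = np.linalg.norm(centers - centers[i], axis=1)
        dist[i] = np.inf                               # exclude self
        for j in np.argsort(dist)[:n_nearest]:
            A[i, j] = A[j, i] = 1.0
        others = [j for j in range(n) if j != i]
        for j in rng.choice(others, size=min(n_random, n - 1), replace=False):
            A[i, j] = A[j, i] = 1.0
    return A

def gcn_forward(S, A, W0, W1):
    """Eq. (1): Y = Sigmoid(A~ ReLU(A~ S W0) W1), with A~ = A + I_n."""
    A_tilde = A + np.eye(A.shape[0])
    H = np.maximum(A_tilde @ S @ W0, 0.0)              # ReLU(A~ S W0)
    logits = A_tilde @ H @ W1
    return 1.0 / (1.0 + np.exp(-logits))               # per-node shadow probability

# Toy run: 20 superpixel nodes with 8-dimensional hand-crafted features.
rng = np.random.default_rng(1)
centers = rng.random((20, 2)) * 100
S = rng.standard_normal((20, 8))
A = build_sparse_adjacency(centers)
Y_hat = gcn_forward(S, A, 0.1 * rng.standard_normal((8, 32)),
                    0.1 * rng.standard_normal((32, 1)))
```

The output is one shadow probability per superpixel node, matching the per-node predictions described below.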
Fig. 3 Overview of the label augmentation pipeline: weak augmentation, graph formulation, a 2-layered GCN (hidden and output layers), strong augmentation, and MCDO with T forward passes
where $S = [s_1, s_2, \ldots, s_n]^T \in \mathbb{R}^{n \times d}$ consists of $n$ feature vectors of dimension $d$, $W_0 \in \mathbb{R}^{d \times 32}$ and $W_1 \in \mathbb{R}^{32 \times 1}$ are network parameters, $\tilde{A}$ is the sum of the adjacency matrix $A \in \mathbb{R}^{n \times n}$ and an identity matrix $I_n$, and $\hat{Y} = [\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_n]$ denotes the predicted probabilities of all nodes. The GCN training and inference process is very fast, as the model only has a few thousand parameters. We then introduce a partial cross-entropy loss [42] to keep the label information from the weak augmentation unchanged during training:

$\mathcal{L}_{GCN} = -\big[\tfrac{1}{|S|}\sum_{i \in S} \log \hat{y}_i + \tfrac{1}{|NS|}\sum_{j \in NS} \log(1 - \hat{y}_j)\big]$,  (2)

where $S$ and $NS$ denote the sets of shadow and non-shadow nodes in the result of weak augmentation, respectively. In the inference stage, we can calculate the predicted probabilities for all nodes by Eq. (1) and then generate a corresponding shadow map (Fig. 4b). Specifically, we construct a fully connected graph including all nodes in a shadow image, extract the feature matrix, and then feed it into the well-trained GCN model.

4.3 Noise suppression

One can observe many errors (Fig. 4e) in the GCN outputs, owing to the limited performance of the lightweight GCN model and the limited representation ability of low-level features. According to the memorization effect [59], training with partially incorrect labels often reduces the final model performance. Our goal is therefore to identify the most reliable parts of the GCN outputs.

Our idea is to transfer the label propagation from a binary classification problem (i.e., shadow and non-shadow nodes) into a multi-class classification problem (i.e., reliable shadow, reliable non-shadow, and unreliable nodes). We are inspired by Monte Carlo dropout (MCDO) [18, 22], which can perform uncertainty analysis for an arbitrary CNN model by keeping the dropout layers turned on in the inference stage. Considering the similarity between CNNs and GCNs, we introduce MCDO into the GCN training to conduct such analysis. The identified uncertain parts are regarded as the "unreliable nodes" and then discarded in the subsequent multi-task learning. Specifically, we retain the dropout layer in the inference stage and then perform T forward passes to generate a distribution over predictions:

$D(x) \approx \tfrac{1}{T}\sum_{t=1}^{T} GCN(x, \theta_t)$,  (3)

where $\theta_t$ is the GCN's parameter at the $t$-th pass and $T$ is the number of stochastic forward passes. We compute a node's uncertainty $U(x)$ by the cross-entropy:

$U(x) = -\big[D(x) \log D(x) + (1 - D(x)) \log(1 - D(x))\big]$.  (4)

At this point, a node is marked unreliable if $U(x)$ is greater than the threshold $\tau$. The final augmented annotation mask can be formulated as:

$\hat{Y}(x) = \begin{cases} \mathrm{binarize}(Y(x)), & \text{if } U(x) \le \tau \\ 2, & \text{if } U(x) > \tau, \end{cases}$  (5)

where "2" denotes the uncertain elements (black in Fig. 4c), which are discarded in the rest of the shadow detection training. One can observe from Fig. 4e and f that we obtain a reliable pseudo-mask with fewer label noises. It is worth noting that, despite our label augmentation scheme, there are still only these three kinds of supervision signals. That means training with the pseudo-masks from our weak
Fig. 4 (a) Shadow image. (b) Pseudo shadow mask. (c) Result of noise suppression. (d) Clean shadow mask. (e) Noisy labels in (b). (f) Noisy labels in (c)
or strong label augmentation is still essentially weakly supervised learning.

5 Edge-guided shadow detection

Despite the denser annotations obtained from the strong augmentation, they still struggle to represent the structure of shadows, since the uncertain parts tend to lie along the shadow boundaries. Hence, we introduce the edge detection task into our weakly supervised shadow detection framework, as shown in Fig. 5. Specifically, our shadow detection branch first generates a coarse shadow map $s_c$; the edge map $e$ obtained by the edge detection branch is then fused in to improve the structure of $s_c$ and produce a refined shadow map $s_r$. Finally, we introduce a gated structure-aware loss [61] to force the boundaries of these two shadow maps to comply with image edges.

Unlike recent SOTA shadow detectors [5, 62, 66, 67], which use ResNeXt101 [56] or EfficientNet [41] to extract multi-scale features of shadow images, we use SegNeXt-B [12] as our backbone, which can perform better than recent vision transformers (ViTs) [31, 55, 63] with simple and cheap convolutions. For the feature maps extracted from the different stages of the shared encoder, we first compress and upsample them to the same resolution, and then fuse them and feed them into a 1×1 convolutional kernel to generate the edge map and shadow map for the two tasks separately.

5.2 Multi-task learning

Shadow detection task: For scribble supervision, we use the partial cross-entropy loss $\mathcal{L}_{scribble}$ on the labeled points of the augmented annotations:

$\mathcal{L}_{scribble} = -\tfrac{1}{|A_s|}\sum_{i \in A_s} \log \hat{y}_i - \tfrac{1}{|A_{ns}|}\sum_{j \in A_{ns}} \log(1 - \hat{y}_j)$,  (6)

where $A_s$ and $A_{ns}$ denote the augmented shadow and non-shadow annotations, respectively, and $\hat{y}_i$ is the predicted probability of the $i$-th pixel belonging to shadow.

Edge detection task: We use the SOTA edge detector EDTER [39] to produce edge maps $e'$ for all training images, then use them as GT to train the edge detection branch with a loss $\mathcal{L}_{edge}$:

$\mathcal{L}_{edge}(e, e') = -\sum_{i=1}^{N}\big(e'_i \log e_i + (1 - e'_i)\log(1 - e_i)\big)$.  (7)

Gated structure-aware loss: Although the edge features from the edge detection task help the shadow detection network generate a shadow map with richer structures, they do not limit the scope of the structure that needs to be recovered. For example, too much structural information is retained in non-shadow regions. Inspired by the gated structure-aware loss [61], which introduces a gate for the smoothing loss [48] to place constraints on the scope of structure to be recovered, we employ it to encourage the predicted shadow maps to have consistent intensities inside the shadow regions and distinct
boundaries at the shadow edges. This loss is defined as:

$\mathcal{L}_{gate} = \sum_{w,h}\sum_{p \in \{x, y\}} \phi\big(|\partial_p s_{w,h}|\, e^{-\alpha |\partial_p (G \cdot I)_{w,h}|}\big)$,  (8)

where $\phi(\cdot)$ is the square-root operation, $I_{w,h}$ is the gray-level intensity at location $(w, h)$, $\partial_p$ is the partial derivative, and $G$ is the gate. As shown in Fig. 6, we dilate the predicted shadow map (Fig. 6c) to obtain a gate map (Fig. 6d). Using it, the gated edge detection result (Fig. 6e) helps the network focus on the shadow region and predict sharp boundaries in the shadow map.

Based on the above loss functions, we define the total loss function $\mathcal{L}_{Multi\text{-}task}$ as follows:

$\mathcal{L}_{Multi\text{-}task} = \mathcal{L}_{scribble}(s_c) + \mathcal{L}_{scribble}(s_r) + \alpha_1 \mathcal{L}_{gate}(s_c) + \alpha_2 \mathcal{L}_{gate}(s_r) + \alpha_3 \mathcal{L}_{edge}(e, e')$,  (9)

where $\alpha_1$, $\alpha_2$, and $\alpha_3$ are hyper-parameters balancing the effect of each loss function. We use $\mathcal{L}_{scribble}$ and $\mathcal{L}_{gate}$ on both the coarse shadow map $s_c$ and the refined shadow map $s_r$, and employ $\mathcal{L}_{edge}$ for the edge map $e$. Note that $\mathcal{L}_{scribble}$ does not contradict $\mathcal{L}_{gate}$: $\mathcal{L}_{scribble}$ aims to propagate the annotated labels to the shadow regions (relying on the shadow detection branch), while $\mathcal{L}_{gate}$ constrains $s_c$ and $s_r$ to be well aligned with the edges extracted by the edge detection branch, preventing the shadow labels from propagating to non-shadow regions.

6 Experiments

6.1 Setup

Benchmarks. All experiments in our work are conducted on the following datasets: Scr-SBU, Scr-ISTD, and UCF [65]. Specifically, we first train and test our WSD on the Scr-SBU and Scr-ISTD datasets separately. Next, since the UCF dataset is of limited size and its image scenes are similar to SBU, we also validate the detection performance on it using a model well trained on Scr-SBU.

Metrics. We use the balanced error rate (BER) to evaluate our proposed detector quantitatively:

$BER = \Big(1 - \tfrac{1}{2}\big(\tfrac{TP}{TP + FN} + \tfrac{TN}{TN + FP}\big)\Big) \times 100$.  (10)

We also consider the per-pixel error rates of shadow and non-shadow regions, denoted as "S" and "NS," respectively.

Methods for comparison. We first compare our method with ten recent SOTA shadow detectors: SDTR+ [52], SDCM [68], FDRNet [67], MTMT-Net [5], DSD [62], A+D Net [28], DSC [16], BDRAR [66], scGAN [35], and stacked-CNN [46]. For a more comprehensive comparison, we also compare our method with a weakly supervised salient object detector, WSOD [57], and a weakly supervised camouflaged object detector, WCOD [14]. For a fair comparison, we directly use the evaluation values reported in their publications.

6.2 Implementation details

Our GCN model and shadow detection network are implemented on an NVIDIA RTX 3090 GPU with Python 3.6 and PyTorch 1.7. In detail, our two-layered GCN model has a 32-unit hidden layer and an output layer for binary classification. We use Adam as our gradient optimizer; the learning rate is set to 0.01 to train all graphs from a single image for 600 epochs. In our uncertainty analysis (Sect. 4.3), we set the dropout rate in the hidden layer to p = 0.3, T = 20, and τ = 0.8. In our shadow detection network, we first resize all input images to 416×416. The learning rate is set to 1e-4 with a poly learning rate decay policy. The batch size is set to 8, and the hyper-parameters $\alpha_1$, $\alpha_2$, and $\alpha_3$ in Eq. (9) are set to 0.2, 0.2, and 0.6, respectively. During testing, we also apply
Fig. 6 (a) Input. (b) Edge map. (c) Shadow map. (d) Gated shadow map. (e) Gated edge map
the CRF [25] to further refine the predicted shadow maps, like previous works [5, 62, 67].

6.3 Comparison with SOTA methods

One can observe from Table 1 that our fully supervised shadow detector (i.e., Ours-F) achieves the best BER values among all competitors on the SBU, UCF, and ISTD datasets. Compared to the second-best fully supervised methods, Ours-F reduces the BER values by 6.46%, 2.20%, and 3.47% on SBU, UCF, and ISTD, respectively. Moreover, Ours-W (i.e., WSD) also outperforms the weakly supervised methods, improving BER by 9.35%, 8.78%, and 8.44% on the SBU, UCF, and ISTD datasets, respectively. In addition, we also provide the computational complexity (i.e., FLOPs) and the network parameters of all competing methods. Owing to the usage of SegNeXt-B as our network backbone, this lightweight design does not introduce many training parameters or much computational complexity.

Recall that the shadow scenes in the SBU dataset are more diverse and complex, and there are many noisy labels in the SBU training set; training with them is prone to performance bottlenecks. In this work, however, we use scribbles to mark error-prone regions empirically, and then develop the label augmentation scheme to obtain denser annotations. Moreover, the edge detection-guided multi-task learning framework leads the predictions to align with shadow boundaries. We can therefore outperform existing fully supervised and semi-supervised methods on the SBU and ISTD datasets by a small margin. On the UCF dataset, our detector does not show a clear superiority over SDTR+, because that method learns from more auxiliary training data for better generalizability.

As shown in Fig. 7, we also qualitatively compare our shadow detector with the most recent SOTA methods. One can see that our detector has the best visual performance among all competitors across various scenes. When coping with tiny shadows (e.g., the first two rows), self shadows (e.g., the 3rd and 4th rows), soft shadows (e.g., the 5th row), and common cast shadows (e.g., the last three rows), our method detects them more accurately and with finer structures.

6.4 Ablation study and analysis

As shown in Table 2 and Fig. 8, we conduct extensive experiments to analyze our methods, including the label augmentation scheme ("V2" and "V3"), multi-task learning ("V4"), the loss function ("V5" and "V6"), and robustness analysis ("V7" and "V8"). In this paper, "V6" denotes our final result.

Directly training on scribbles. We directly train the shadow detection branch in Fig. 5 with the sparse scribble annotation and Eq. (6), denoted as "V1". One can observe that "V1" cannot preserve the shadow structure well, owing to the intra-class discontinuity caused by the sparse scribbles.

Effect of label augmentation. We use the pseudo-masks from our weak augmentation and strong augmentation schemes to train the shadow detection branch as in "V1", denoted as "V2" and "V3". Compared with the original scribble annotation, one can observe that both of our proposed label augmentation schemes lead to different degrees of performance improvement. Among them, the effect of strong augmentation is more remarkable: by learning only from the most reliable training samples identified by our strong augmentation scheme, our weakly supervised shadow detection method already achieves performance similar to SOTA approaches.

Effect of multi-task learning. We add the edge detection task to "V3", denoted as "V4". Compared to "V3", we observe from Fig. 8f that the edge detection branch provides richer structural constraints for shadow detection.

Effect of gated structure-aware loss. We use the auxiliary smoothing loss [48] and the gated structure-aware loss [61] to train "V4", denoted as "V5" and "V6", respectively. One can observe from Fig. 8g that simply using the smoothing loss, which enforces smoothness over the whole image, makes the shadow map ambiguous, while the gated structure-aware loss in "V6" tackles this issue.

Effect of different edge maps. We also use other edge detection methods, i.e., RCF [30] or Sobel [24], to extract the edge maps for shadow images, and then use them to train the edge detection network, denoted as "V7" and "V8", respectively. Since Sobel and RCF are more sensitive to
Table 1 Quantitative comparisons of shadow detection methods on three benchmark datasets

| Method | Year | Type | Params (M) | FLOPs (G) | SBU [46] BER↓ | S↓ | NS↓ | UCF [65] BER↓ | S↓ | NS↓ | ISTD [47] BER↓ | S↓ | NS↓ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| stacked-CNN [46] | 2016 | F | – | – | 11.00 | 8.84 | 12.76 | 13.00 | 8.84 | 12.76 | 8.60 | 7.69 | 9.23 |
| scGAN [35] | 2017 | F | – | – | 9.10 | 8.39 | 9.69 | 11.50 | 7.74 | 15.30 | 4.70 | 3.22 | 6.18 |
| BDRAR [66] | 2018 | F | 42.46 | 31.42 | 3.64 | 3.40 | 3.89 | 7.81 | 9.69 | 5.44 | 2.69 | 0.50 | 4.87 |
| DSC [16] | 2018 | F | 122.48 | 61.66 | 5.59 | 9.76 | 1.42 | 10.54 | 18.08 | 3.00 | 3.42 | 3.85 | 3.00 |
| A+D Net [28] | 2018 | F | 54.41 | – | 5.37 | 4.45 | 6.30 | 9.25 | 8.37 | 10.14 | – | – | – |
| DSD [62] | 2019 | F | 58.16 | 46.63 | 3.45 | 3.33 | 3.58 | 7.59 | 9.74 | 5.44 | 2.17 | 1.36 | 2.98 |
| MTMT-Net [5] | 2020 | F | 44.13 | 47.34 | 3.15 | 3.73 | 2.57 | 7.47 | 10.31 | 4.63 | 1.72 | 1.36 | 2.08 |
| FDRNet [67] | 2021 | F | 10.77 | 14.32 | 3.04 | 2.91 | 3.18 | 7.28 | 8.31 | 6.26 | 1.55 | 1.22 | 1.88 |
| SDCM [68] | 2022 | F | 10.95 | 15.04 | 2.94 | – | – | 6.73 | – | – | 1.44 | – | – |
| SDTR+ [52] | 2023 | F | 24.82 | 35.17 | 2.95 | 3.15 | 2.75 | 6.35 | 6.73 | 5.97 | 1.55 | 1.18 | 1.84 |
| Ours-F | – | F | 34.81 | 24.32 | 2.76 (6.46%↑) | 3.04 | 2.48 | 6.21 (2.20%↑) | 6.58 | 5.84 | 1.39 (3.47%↑) | 1.06 | 1.72 |
| Ours-W (WSD) | – | W | – | – | 2.81 (9.35%↑) | 2.79 | 2.83 | 6.65 (8.78%↑) | 7.65 | 5.65 | 1.41 (8.44%↑) | 1.22 | 1.87 |
| WCOD [14] | 2023 | W | 29.52 | 13.46 | 3.10 | 3.08 | 3.12 | 7.29 | 8.12 | 6.46 | 1.54 | 1.36 | 1.72 |
| WSOD [57] | 2023 | W | 48.07 | 26.17 | 3.23 | 3.22 | 3.24 | 7.46 | 8.25 | 6.67 | 1.77 | 1.53 | 2.01 |

"F" and "W" denote the fully and weakly supervised training type, respectively. The best two fully supervised detectors are set in red and blue, and the best two weakly supervised detectors in green and copper. We also provide, in bold, the percentage improvement of our two detectors compared to the best previous detector.
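The BER scores reported in Table 1 follow Eq. (10). A minimal NumPy sketch of the metric (our own illustration, not an official evaluation script):

```python
import numpy as np

def balanced_error_rate(pred, gt):
    """Eq. (10): BER = (1 - 0.5 * (TP/(TP+FN) + TN/(TN+FP))) * 100.

    pred, gt: binary arrays with 1 = shadow, 0 = non-shadow."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    tp = np.sum(pred & gt)
    tn = np.sum(~pred & ~gt)
    fn = np.sum(~pred & gt)
    fp = np.sum(pred & ~gt)
    shadow_err = 1 - tp / (tp + fn)        # per-pixel error on shadow ("S")
    non_shadow_err = 1 - tn / (tn + fp)    # per-pixel error on non-shadow ("NS")
    return 100 * 0.5 * (shadow_err + non_shadow_err)

# Missing one of four shadow pixels gives a 25% shadow error rate,
# a 0% non-shadow error rate, and hence a BER of 12.5.
gt   = np.array([1, 1, 1, 1, 0, 0, 0, 0])
pred = np.array([1, 1, 1, 0, 0, 0, 0, 0])
ber = balanced_error_rate(pred, gt)  # -> 12.5
```

Lower is better, and the balancing makes the score insensitive to the shadow/non-shadow pixel ratio of a dataset.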
Fig. 7 (a) Inputs. (b) A+D. (c) DSC. (d) BDRAR. (e) DSD. (f) MTMT. (g) SegNeXt. (h) Ours. (i) GT
Fig. 8 (a) Input. (b) GT. (c) V1. (d) V2. (e) V3. (f) V4. (g) V5. (h) V6 (WSD). (i) V7. (j) V8
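What separates "V5" from "V6" in the ablation is the gate in the structure-aware loss of Eq. (8). The loss can be sketched as follows; this is our NumPy sketch, not the authors' implementation, and the value α = 10 and the small ε inside the square root are illustration-only assumptions.

```python
import numpy as np

def gated_structure_aware_loss(s, I, G, alpha=10.0, eps=1e-12):
    """Sketch of Eq. (8): edge-aware smoothness on the shadow map s.

    s: (H, W) predicted shadow map; I: (H, W) gray-level image;
    G: (H, W) binary gate (dilated predicted shadow map), so only image
    edges inside the gate may excuse a shadow-map gradient."""
    loss = 0.0
    for axis in (0, 1):                              # p in {x, y}
        ds = np.abs(np.diff(s, axis=axis))           # |d_p s|
        dgi = np.abs(np.diff(G * I, axis=axis))      # |d_p (G . I)|
        # phi is the square root; eps keeps the value finite at 0.
        loss += np.sum(np.sqrt(ds * np.exp(-alpha * dgi) + eps))
    return loss

# A shadow-map step that coincides with an image edge inside the gate is
# penalized far less than the same step over a flat image.
s = np.zeros((4, 4)); s[:, 2:] = 1.0      # shadow map with a vertical step
G = np.ones_like(s)                       # gate covers the whole image here
on_edge = gated_structure_aware_loss(s, s, G)
off_edge = gated_structure_aware_loss(s, np.zeros_like(s), G)
```

Gradients of the shadow map are thus suppressed everywhere except where the gated image itself has an edge, which is exactly the behavior that sharpens "V6" relative to "V5".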
(a) Skin image. (b) Ground truth. (c) Our scribble annotation. (d) Edge map. (e) Superpixel segmentation. (f) Weak label augmentation. (g) Strong label augmentation. (h) Lesion mask
(a) Shadow image. (b) Ground truth. (c) Edge map. (d) Shadow mask
image noise or texture than EDTER, "V7" and "V8" perform worse than "V6".

6.5 Application

As was expected, we obtained impressive skin lesion detection results by training our weakly supervised learning framework.
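The uncertainty-guided selection of Eqs. (3)-(5), which underlies the strong label augmentation used in both the shadow and the skin-lesion experiments, can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: the toy forward function, the simulated dropout, and the use of base-2 logarithms (so that the maximum uncertainty is 1 and the paper's τ = 0.8 is attainable) are our assumptions.

```python
import numpy as np

def mc_dropout_labels(forward, x, T=20, tau=0.8, seed=0):
    """Eqs. (3)-(5): average T stochastic forward passes (dropout kept on),
    score each node by the binary entropy of the mean prediction, and
    assign the placeholder label 2 to nodes whose uncertainty exceeds tau."""
    rng = np.random.default_rng(seed)
    D = np.mean([forward(x, rng) for _ in range(T)], axis=0)      # Eq. (3)
    eps = 1e-12                                                   # guards log(0)
    U = -(D * np.log2(D + eps) + (1 - D) * np.log2(1 - D + eps))  # Eq. (4)
    return np.where(U > tau, 2, (D > 0.5).astype(int))            # Eq. (5)

def noisy_gcn(x, rng):
    """Toy stand-in for a GCN with dropout rate p = 0.3 kept on at inference."""
    keep = (rng.random(x.shape) > 0.3) / 0.7      # inverted-dropout scaling
    return 1.0 / (1.0 + np.exp(-x * keep))

logits = np.array([4.0, -4.0, 0.1])   # confident shadow / non-shadow / ambiguous
labels = mc_dropout_labels(noisy_gcn, logits)
```

The ambiguous node, whose mean prediction stays near 0.5 across passes, receives label 2 and would be discarded from the subsequent training, while the confident nodes keep their binarized labels.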
7 Conclusions 4. Chen, X.D., Wu, W., Yang, W., Qin, H., Wu, X., Mao, X.: Make
segment anything model perfect on shadow detection. IEEE Trans.
Geosci. Remote Sens. 61, 1–13 (2023)
In this work, we create two shadow detection datasets with 5. Chen, Z., Zhu, L., Wan, L., Wang, S., Feng, W., Heng, P.A.: A
scribble annotations, i.e., Scr-SBU and Scr-ISTD. At the multi-task mean teacher for semi-supervised shadow detection. In:
same time, we present a corresponding weakly supervised Proceedings of Computer Vision and Pattern Recognition, 5611–
shadow detection framework. Our method can achieve a 5620. IEEE (2020)
6. Codella, N., Rotemberg, V., Tschandl, P., Celebi, M.E., Dusza, S.,
good trade-off between the labeling cost and detection accu- Gutman, D., Helba, B., Kalloo, A., Liopyris, K., Marchetti, M.,
racy. Specifically, an uncertainty-guided label augmentation et al.: Skin lesion analysis toward melanoma detection 2018: A
scheme is developed to expand the sparse scribble annotation challenge hosted by the international skin imaging collaboration
to more reliable regions and enable sufficient training for a (isic). arXiv preprint arXiv:1902.03368 (2019)
7. Cucchiara, R., Grana, C., Piccardi, M., Prati, A., Sirotti, S.: Improv-
deep model. Furthermore, we introduce a multi-task learn-
ing shadow suppression in moving object detection with hsv color
ing framework to produce shadow maps with rich structures. information. In: Proceedings of Intelligent Transportation Systems,
Extensive experiments demonstrate that our method outper- 334–339. IEEE (2001)
forms existing SOTA semi-supervised and fully supervised 8. Ecins, A., Fermüller, C., Aloimonos, Y.: Shadow free segmenta-
tion in still images using local density measure. In: Proceedings
methods. In the future, we will extend our methods, e.g., label
of International Conference on Computational Photography, 1–8.
augmentation or edge-guided multi-task learning framework IEEE (2014)
to more computer vision tasks ,i.e., semantic segmentation 9. Ge, Y., Zhou, Q., Wang, X., Shen, C., Wang, Z., Li, H.: Point-
and salient object detection. teaching: weakly semi-supervised object detection with point
annotations. In: Proceedings of the AAAI Conference on Artifi-
Author contributions WW was involved in conceptualization, writing— cial Intelligence, 37, 667–675 (2023)
original draft, writing—review & editing. HC helped in validation, 10. Guan, Y.P.: Wavelet multi-scale transform based foreground seg-
methodology, project administration, funding acquisition. X-DC con- mentation and shadow elimination. Open Signal Process. J. 1, 1–6
tributed to visualization, formal analysis, data curation. WY and XM (2008)
helped in supervision. 11. Gulshan, V., Rother, C., Criminisi, A., Blake, A., Zisserman, A.:
Geodesic star convexity for interactive image segmentation. In:
Funding The work was supported by the National Natural Science Proceedings of Computer Vision and Pattern Recognition, 3129–
Foundation of China (61972120), the General Research Project of 3136. IEEE (2010)
Zhejiang Provincial Department of Education (Y202044861), the Zhe- 12. Guo, M.H., Lu, C.Z., Hou, Q., Liu, Z., Cheng, M.M., Hu, S.M.:
jiang Provincial Natural Science Foundation (Regional Innovation Joint Segnext: Rethinking convolutional attention design for semantic
Funds) (LQZSZ24E050001). segmentation. In: Proceedings of Advances in Neural Information
Processing Systems. MIT Press (2022)
Data availability The datasets generated during and/or analyzed dur- 13. Guo, R., Dai, Q., Hoiem, D.: Single-image shadow detection and
ing the current study are available from the corresponding author on removal using paired regions. In: Proceedings of Computer Vision
reasonable request. and Pattern Recognition, 2033–2040. IEEE (2011)
14. He, R., Dong, Q., Lin, J., Lau, R.W.: Weakly-supervised camou-
flaged object detection with scribble annotations. In: Proceedings
Declarations of the AAAI Conference on Artificial Intelligence, 37, 781–789
(2023)
Conflict of interest The authors declare that they have no conflict of 15. Hu, X., Wang, T., Fu, C.W., Jiang, Y., Wang, Q., Heng, P.A.: Revisit-
interest. ing shadow detection: a new benchmark dataset for complex world.
IEEE Trans. Image Process. 30, 1925–1934 (2021)
Ethics approval The work follows appropriate ethical standards in 16. Hu, X., Zhu, L., Fu, C.W., Qin, J., Heng, P.A.: Direction-aware
conducting research and writing the manuscript. This work presents spatial context features for shadow detection. In: Proceedings
computational models trained with publicly available data, for which of Computer Vision and Pattern Recognition, 7454–7462. IEEE
no ethical approval was required. (2018)
17. Huang, X., Hua, G., Tumblin, J., Williams, L.: What character-
izes a shadow boundary under the sun and sky? In: Proceedings
of International Conference on Computer Vision, 898–905. IEEE
(2011)
18. Joshi, I., Kothari, R., Utkarsh, A., Kurmi, V.K., Dantcheva, A.,
References Roy, S.D., Kalra, P.K.: Explainable fingerprint roi segmentation
using monte carlo dropout. In: Proceedings of Winter Conference
1. Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., Süsstrunk, S.: on Applications of Computer Vision, 60–69 (2021)
SLIC superpixels compared to state-of-the-art superpixel methods. 19. Junejo, I.N., Foroosh, H.: Estimating geo-temporal location of
IEEE Trans. Pattern Anal. Mach. Intell. 34(11), 2274–2282 (2012) stationary cameras using shadow trajectories. In: Proceedings of
2. Al-Amaren, A., Ahmad, M.O., Swamy, M.: A low-complexity European Conference on Computer Vision, 318–331. Springer
residual deep neural network for image edge detection. Appl. Intell. (2008)
53(9), 11282–11299 (2022) 20. Kang, S., Kim, J., Jang, I.S., Lee, B.D.: C2shadowgan: cycle-in-
3. Al-Huda, Z., Peng, B., Algburi, R.N.A., Alfasly, S., Li, T.: Weakly cycle generative adversarial network for shadow removal using
supervised pavement crack semantic segmentation based on multi- unpaired data. Appl. Intell. 53(12), 15067–15079 (2023)
scale object localization and incremental annotation refinement.
Appl. Intell. 53(11), 14527–14546 (2023)
123
Annotate less but perform better: weakly supervised...
21. Karsch, K., Hedau, V., Forsyth, D., Hoiem, D.: Rendering synthetic objects into legacy photographs. ACM Trans. Graph. 30(6), 1–12 (2011)
22. Kendall, A., Gal, Y.: What uncertainties do we need in bayesian deep learning for computer vision? In: Proceedings of Advances in Neural Information Processing Systems. MIT Press (2017)
23. Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: Proceedings of International Conference on Learning Representation, 1–14 (2017)
24. Kittler, J.: On the accuracy of the Sobel edge detector. Image Vis. Comput. 1(1), 37–42 (1983)
25. Krähenbühl, P., Koltun, V.: Efficient inference in fully connected crfs with gaussian edge potentials. In: Proceedings of Advances in Neural Information Processing Systems. MIT Press (2011)
26. Lalonde, J.F., Efros, A.A., Narasimhan, S.G.: Detecting ground shadows in outdoor consumer photographs. In: Proceedings of European Conference on Computer Vision, 322–335. Springer (2010)
27. Lalonde, J.F., Efros, A.A., Narasimhan, S.G.: Estimating the natural illumination conditions from a single outdoor image. Int. J. Comput. Vis. 98(2), 123–145 (2012)
28. Le, H., Vicente, T.F.Y., Nguyen, V., Hoai, M., Samaras, D.: A+d net: Training a shadow detector with adversarial shadow attenuation. In: Proceedings of European Conference on Computer Vision, 662–678. Springer (2018)
29. Liu, D., Cui, Y., Yan, L., Mousas, C., Yang, B., Chen, Y.: Densernet: Weakly supervised visual localization using multi-scale feature aggregation. In: Proceedings of the AAAI Conference on Artificial Intelligence, 35, 6101–6109 (2021)
30. Liu, Y., Cheng, M.M., Hu, X., Wang, K., Bai, X.: Richer convolutional features for edge detection. In: Proceedings of Computer Vision and Pattern Recognition, 3000–3009. IEEE (2017)
31. Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of International Conference on Computer Vision, 10012–10022. IEEE (2021)
32. Luo, D., Liu, G., Bavirisetti, D.P., Cao, Y.: Infrared and visible image fusion based on VPDE model and VGG network. Appl. Intell. 53(21), 24739–24764 (2023)
33. Meng, Q., Sinclair, M., Zimmer, V., Hou, B., Rajchl, M., Toussaint, N., Oktay, O., Schlemper, J., Gomez, A., Housden, J., et al.: Weakly supervised estimation of shadow confidence maps in fetal ultrasound imaging. IEEE Trans. Med. Imaging 38(12), 2755–2767 (2019)
34. Mikic, I., Cosman, P.C., Kogut, G.T., Trivedi, M.M.: Moving shadow and object detection in traffic scenes. In: Proceedings of International Conference on Pattern Recognition, P1, 321–324. IEEE (2000)
35. Nguyen, V., Yago Vicente, T.F., Zhao, M., Hoai, M., Samaras, D.: Shadow detection with conditional generative adversarial networks. In: Proceedings of International Conference on Computer Vision, 4510–4518. IEEE (2017)
36. Okabe, T., Sato, I., Sato, Y.: Attached shadow coding: Estimating surface normals from shadows under unknown reflectance and lighting conditions. In: Proceedings of International Conference on Computer Vision, 1693–1700. IEEE (2009)
37. Panagopoulos, A., Samaras, D., Paragios, N.: Robust shadow and illumination estimation using a mixture model. In: Proceedings of Computer Vision and Pattern Recognition, 651–658. IEEE (2009)
38. Pei, J., Tang, H., Wang, W., Cheng, T., Chen, C.: Salient instance segmentation with region and box-level annotations. Neurocomputing 507, 332–344 (2022)
39. Pu, M., Huang, Y., Liu, Y., Guan, Q., Ling, H.: Edter: Edge detection with transformer. In: Proceedings of Computer Vision and Pattern Recognition, 1402–1412. IEEE (2022)
40. Suykens, J.A., Vandewalle, J.: Least squares support vector machine classifiers. Neural Process. Lett. 9(3), 293–300 (1999)
41. Tan, M., Le, Q.: Efficientnet: Rethinking model scaling for convolutional neural networks. In: Proceedings of International Conference on Machine Learning, 6105–6114. PMLR (2019)
42. Tang, M., Djelouah, A., Perazzi, F., Boykov, Y., Schroers, C.: Normalized cut loss for weakly-supervised cnn segmentation. In: Proceedings of Computer Vision and Pattern Recognition, 1818–1827. IEEE (2018)
43. Unal, O., Dai, D., Van Gool, L.: Scribble-supervised lidar semantic segmentation. In: Proceedings of Computer Vision and Pattern Recognition, 2697–2707. IEEE (2022)
44. Vicente, T.F.Y.: Large-scale weakly-supervised shadow detection. Ph.D. thesis, State University of New York at Stony Brook (2018)
45. Vicente, T.F.Y., Hoai, M., Samaras, D.: Leave-one-out kernel optimization for shadow detection and removal. IEEE Trans. Pattern Anal. Mach. Intell. 40(3), 682–695 (2017)
46. Vicente, T.F.Y., Hou, L., Yu, C.P., Hoai, M., Samaras, D.: Large-scale training of shadow detectors with noisily-annotated shadow examples. In: Proceedings of European Conference on Computer Vision, 816–832. Springer (2016)
47. Wang, J., Li, X., Yang, J.: Stacked conditional generative adversarial networks for jointly learning shadow detection and shadow removal. In: Proceedings of Computer Vision and Pattern Recognition, 1788–1797. IEEE (2018)
48. Wang, Y., Yang, Y., Yang, Z., Zhao, L., Wang, P., Xu, W.: Occlusion aware unsupervised learning of optical flow. In: Proceedings of Computer Vision and Pattern Recognition, 4884–4893. IEEE (2018)
49. Wu, L., Cao, X., Foroosh, H.: Camera calibration and geo-location estimation from two shadow trajectories. Comput. Vis. Image Underst. 114(8), 915–927 (2010)
50. Wu, W., Chen, X.D., Yang, W., Yong, J.H.: Exploring better target for shadow detection. Knowl.-Based Syst. 273, 110614 (2023)
51. Wu, W., Wu, X., Wan, Y.: Single-image shadow removal using detail extraction and illumination estimation. Vis. Comput. 38(5), 1677–1687 (2022)
52. Wu, W., Yang, W., Ma, W., Chen, X.D.: How many annotations do we need for generalizing new-coming shadow images? IEEE Trans. Circ. Syst. Video Technol. 1–12 (2023)
53. Wu, W., Zhang, S., Tian, M., Tan, D., Wu, X., Wan, Y.: Learning to detect soft shadow from limited data. Vis. Comput. 38(5), 1665–1675 (2022)
54. Wu, W., Zhang, S., Zhou, K., Yang, J., Wu, X., Wan, Y.: Shadow removal via dual module network and low error shadow dataset. Comput. Graph. 95, 156–163 (2021)
55. Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J.M., Luo, P.: Segformer: Simple and efficient design for semantic segmentation with transformers. In: Proceedings of Advances in Neural Information Processing Systems. MIT Press (2021)
56. Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: Proceedings of Computer Vision and Pattern Recognition, 1492–1500. IEEE (2017)
57. Xu, B., Liang, H., Gong, W., Liang, R., Chen, P.: A visual representation-guided framework with global affinity for weakly supervised salient object detection. IEEE Trans. Circ. Syst. Video Technol. 1–12 (2023)
58. Yang, W., Wu, W., Chen, X.D., Tao, X., Mao, X.: How to use extra training data for better edge detection? Appl. Intell. 1–15 (2023)
59. Ye, S., Chen, D., Han, S., Liao, J.: Learning with noisy labels for robust point cloud segmentation. In: Proceedings of International Conference on Computer Vision, 6443–6452 (2021)
60. Zhang, B., Xiao, J., Jiao, J., Wei, Y., Zhao, Y.: Affinity attention graph neural network for weakly supervised semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 44(11), 8082–8096 (2022)
61. Zhang, J., Yu, X., Li, A., Song, P., Liu, B., Dai, Y.: Weakly-supervised salient object detection via scribble annotations. In: Proceedings of Computer Vision and Pattern Recognition, 12546–12555 (2020)
62. Zheng, Q., Qiao, X., Cao, Y., Lau, R.W.: Distraction-aware shadow detection. In: Proceedings of Computer Vision and Pattern Recognition, 5167–5176. IEEE (2019)
63. Zheng, S., Lu, J., Zhao, H., Zhu, X., Luo, Z., Wang, Y., Fu, Y., Feng, J., Xiang, T., Torr, P.H., et al.: Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In: Proceedings of Computer Vision and Pattern Recognition, 6881–6890. IEEE (2021)
64. Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Learning deep features for discriminative localization. In: Proceedings of Computer Vision and Pattern Recognition, 2921–2929. IEEE (2016)
65. Zhu, J., Samuel, K.G., Masood, S.Z., Tappen, M.F.: Learning to recognize shadows in monochromatic natural images. In: Proceedings of Computer Vision and Pattern Recognition, 223–230. IEEE (2010)
66. Zhu, L., Deng, Z., Hu, X., Fu, C.W., Xu, X., Qin, J., Heng, P.A.: Bidirectional feature pyramid network with recurrent attention residual modules for shadow detection. In: Proceedings of European Conference on Computer Vision, 121–136. Springer (2018)
67. Zhu, L., Xu, K., Ke, Z., Lau, R.W.: Mitigating intensity bias in shadow detection via feature decomposition and reweighting. In: Proceedings of International Conference on Computer Vision, 4702–4711. IEEE (2021)
68. Zhu, Y., Fu, X., Cao, C., Wang, X., Sun, Q., Zha, Z.J.: Single image shadow detection via complementary mechanism. In: Proceedings of the ACM International Conference on Multimedia, 6717–6726 (2022)

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Xiao-Diao Chen received the bachelor's degree from Zhejiang University, Hangzhou, China, in 2000, and the master's and Ph.D. degrees from Tsinghua University, Beijing, China, in 2003 and 2006, respectively. He is currently a faculty member with the School of Computer, Hangzhou Dianzi University, Hangzhou. His research interests include approximation and interpolation methods applied in computer graphics, edge detection, and shadow detection in image processing.

Wen Wu received his B.Sc. degree in electronic and information engineering from Wuhan College of Arts and Science in 2016 and M.Sc. degree in computer applied technology from Hubei University, Wuhan, in 2019. He is currently pursuing the Ph.D. degree in computer science and technology with Hangzhou Dianzi University, Hangzhou, China. From 2019 to 2022, he was a lecturer in computer science with the Xinjiang Institute of Technology, Aksu, China. His research interests include deep learning and computer vision.