A Survey on Physical Adversarial Attack in CV
they are out-of-date, since plenty of novel physical attacks have emerged [29]–[39] in the past two years. Concurrent to our work, [40] and [41] also review physical attacks, but they cover only part of the current literature, i.e., 43 and 46 publications, respectively. In contrast, we review 69 publications on physical attacks. Figure 1 illustrates the number of physical attack papers published yearly since 2016. It is therefore necessary to analyze the newly proposed methods to better track the latest research directions.

In this paper, we comprehensively review the existing physical adversarial attacks on computer vision tasks. Given the statistics of the current literature, we mainly survey the two most prevalent tasks: image recognition and object detection. Note that object tracking and semantic segmentation are similar to object detection, so we include physical adversarial attacks against object tracking and semantic segmentation under object detection. Specifically, we first propose a taxonomy that summarizes the current physical adversarial attacks by the form of the adversarial perturbation, the way the physical attack is realized, the attack settings, and so on. Then, we analyze the different physical adversarial attacks in detail. After reviewing the attacks in each field, we summarize the techniques most frequently used to improve the robustness of physical attacks. Finally, we discuss the open issues of current physical adversarial attacks and promising directions. We hope this survey will provide a comprehensive retrospective and beneficial information on physical adversarial attacks for researchers.

To cover the existing physical attacks as completely as possible, we searched Google Scholar with the keywords "physical attack" and "physical adversarial examples" to obtain the latest papers. In addition, the proceedings of CVPR, ICCV, ECCV, AAAI, IJCAI, ICML, ICLR, and NeurIPS were searched by matching the keywords against paper titles. We manually checked the references of all selected papers to find related literature and avoid omissions, and we excluded papers that are not relevant to physical adversarial attacks. In total, we collected 69 papers on physical adversarial attacks in computer vision.

In summary, our contributions are listed as follows.
• We make a comprehensive survey of physical adversarial attacks on computer vision tasks to provide overall knowledge about this research field.
• We propose a taxonomy that summarizes the existing physical adversarial attacks according to the characteristics of different attacks. We briefly discuss the different physical attacks and outline the techniques most frequently used to improve the robustness of physical attacks.
• We discuss the challenges and research directions for physical adversarial attacks around the following issues: input transformation augmentation; attack algorithm design, including developing physical attacks under black-box settings, transferability, attacking multiple tasks, and so on; and evaluation, including the construction of physical attack benchmark datasets and a uniform physical test criterion.

TERMINOLOGY
x — Clean image.
y — Ground truth of the image.
x̂ — Clean image in the physical world.
x_adv — Adversarial example.
x̂_adv — Adversarial example in the physical world.
δ — Adversarial perturbation.
ε — Maximum allowable magnitude of the perturbation.
f — Neural network.
y_cls — Ground-truth label for the classification task.
y_obj — Ground-truth label for the object detection task.
J_f(·) — Loss function for the neural network f.
∇J_f(·) — Gradient of f with respect to the input x.
||·||_p — L_p norm distance.
d(·,·) — Distance metric.
T — Transformation distribution.
t — A specific transformation.

II. FUNDAMENTAL CONCEPT

In this section, we briefly introduce the fundamental concepts of deep learning and adversarial examples.

A. Deep learning

Deep learning, a data-driven technique, is known for its powerful capability to learn data patterns and features from enormous data, where the learned representations benefit various regression and classification tasks. In general, a deep learning pipeline consists of data preprocessing, architecture design, a training strategy, a loss function, and performance metrics. The core component of deep learning is automatic feature learning driven by deep neural networks (DNNs), whose most representative characteristic is the ability to extract useful features from considerable data in an end-to-end manner.

The architecture of a DNN is determined by the number of neurons and layers and the connections between them. In the past decade, convolutional neural networks (CNNs) were the most commonly used architecture in computer vision tasks and have been maturely deployed in commercial applications. A CNN consists of multiple hidden layers with different connections, where each layer contains several convolution kernels for extracting different feature representations from the input data. Representative CNN architectures in computer vision are ResNet [2], VGG [42], and DenseNet [43], which are employed in image recognition and object detection tasks [2], [44]. Recently, inspired by the incredible success of the transformer architecture on NLP tasks, transformers have gradually been applied to computer vision tasks, giving rise to vision transformers. A vision transformer consists of an encoder and a decoder built from several transformer blocks with the same architecture: the encoder produces the embeddings of the inputs, while the decoder takes all embeddings as input and generates the output with their incorporated contextual information. Despite the architectural discrepancies between different networks, a neural network has a uniform representation form, i.e., f(x), and the concrete form of f(x) depends on the given task. However, despite the great success of DNNs, their underlying mechanism remains unclear, and their heavy dependence on data exposes potential security risks.
[Figure: general pipeline of patch-based physical attacks. Stage 1: optimize the adversarial perturbation in digital space, applying robustness transformations (rotation, resizing, brightness/contrast), a patch applier, and the victim model with a loss function; the optimized patch is then printed out and stuck on the physical object to attack the victim model.]
the direction of the gradient ascent multiple times in small steps. The attack algorithm can be expressed as follows:

$$x_{adv}^{0} = x, \qquad x_{adv}^{N+1} = \mathrm{Clip}_{x,\epsilon}\left\{ x_{adv}^{N} + \alpha \cdot \mathrm{sign}\big(\nabla_{x} J(x_{adv}^{N}, y)\big) \right\}, \tag{5}$$

where α controls the iterative step size, ε is the maximum allowable modification of the image, and Clip_{x,ε}(·) is a function that confines the discrepancy between the adversarial example and the clean image to within ε. In physical attacks, the author printed the adversarial examples and used a phone to capture photos of them. Experiment results suggested that the printed adversarial examples can fool the image recognition model.
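To make the update rule concrete, the following is a minimal PyTorch-style sketch of the iterative attack in Eq. (5); the model, loss, and hyper-parameters are illustrative placeholders, not the original authors' implementation.

```python
import torch
import torch.nn.functional as F

def bim_attack(model, x, y, eps=8/255, alpha=1/255, steps=10):
    """Minimal sketch of Eq. (5): repeat small signed-gradient ascent steps
    and clip the result into the eps-ball around the clean image x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # ascend the loss, then project back into the allowed budget
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv
```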
2) EOT: To construct robust adversarial examples for physical attacks, Athalye et al. [49] proposed a general framework to synthesize adversarial examples that remain robust over a set of transformation distributions, called Expectation Over Transformation (EOT). Specifically, they constructed the following optimization problem:

$$\arg\max_{x'} \; \mathbb{E}_{t \sim T}\Big[ \log \Pr\big(y_t \mid t(x')\big) - \lambda \, \big\| \mathrm{LAB}(t(x')) - \mathrm{LAB}(t(x)) \big\|_2 \Big], \tag{6}$$

where Pr(y_t | t(x')) represents the probability that t(x') is classified as y_t, which is used to guarantee adversarialness; LAB(·) is the color-space conversion from RGB to LAB, which is used to keep the adversarial perturbation perceptually close to the clean image. Note that they also developed an algorithm to optimize the texture of 3D objects in the 3D scenario. In physical attacks, the author printed the 3D object and the adversarial texture with a 3D printer, then wrapped the adversarial texture over the 3D object. Evaluations were conducted on the captured images, and the results suggested that the multi-view captured images can fool the image recognition model.
model. the problem. Specifically, the author proposed to use GAN
TABLE II
PHYSICAL ADVERSARIAL ATTACKS AGAINST IMAGE RECOGNITION TASKS. WE LIST THEM BY TIME AND TASK, ALIGNING WITH THE PAPER'S DISCUSSION ORDER.

TABLE III
PHYSICAL ADVERSARIAL ATTACKS AGAINST OBJECT DETECTION TASKS. WE LIST THEM BY TIME AND TASK, ALIGNING WITH THE PAPER'S DISCUSSION ORDER.
However, the lack of constraints on the perturbation results in the odd appearance of the adversarial patch.

8) ACOsAttack: Since previous physical attacks failed to explore DNN-based applications in real life, Liu et al. [29], [54] proposed optimizing a universal adversarial patch to attack really deployed automatic check-out (ACO) systems. Specifically, the author exploits the model's perceptual and semantic biases to optimize the adversarial patch. Regarding the perceptual bias, the author used hard examples, which convey strong model uncertainty, and extracted a textural patch prior from them in terms of style similarity. To alleviate the heavy dependency on large-scale training data, the author exploited the semantic bias by collecting many prototypes for the different classes, then trained the universal adversarial patch on the prototype dataset. Moreover, the author also adopted EOT to improve the robustness of the adversarial patch. In physical attacks, the author printed the adversarial patch, stuck it on commodities, and then captured them under different environmental conditions (i.e., angles {−30°, −15°, 0°, 15°, 30°} and distances {0.3, 0.5, 0.7, 1} meters). Experiment results suggested that the captured images with the adversarial patch easily deceive the system. Additionally, their attack could degrade the recognition accuracy of prevalent e-commerce platforms (i.e., JD and Taobao) on the test set by 43.75% and 40%.

9) TnTAttack: Recently, Bao et al. [38] pointed out that existing physical adversarial patches are visually conspicuous. To address the issue, the author proposed to exploit a generator to construct naturalistic adversarial patches. Unlike crafting full-pixel adversarial examples with a GAN [109], the author first trains a generator to map a latent variable z (i.e., random noise) to a realistic image patch, which will be embedded in the image. In the second stage, the author fixed the generator's parameters, then used gradient-based optimization guided by an adversarial loss to seek the optimal latent variable z that generates a visually natural adversarial patch. Their approach is based on the fact that the trained generator can map a randomly sampled latent variable z to a naturalistic image, so some latent vector can be mapped to a naturalistic adversarial patch. In this way, the author significantly reduced the number of optimization variables (only a 100-dimensional latent variable z). In physical attacks, the author printed the patch image and stuck it on the front of clothes, then captured a 1-minute video of the person who wore the clothes with the adversarial patch. Experiment results suggested that over 90% of the captured frames can fool the target network.
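A minimal sketch of the second-stage latent optimization described above: the generator is frozen and only the low-dimensional latent z is updated, so the emitted patch stays natural-looking. The `generator`, `victim`, and `apply_patch` callables are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn.functional as F

def optimize_latent_patch(generator, victim, apply_patch, x, y_true,
                          z_dim=100, steps=200, lr=0.05):
    """TnT-style stage 2: search the generator's latent space for a
    naturalistic patch that maximizes the victim's classification loss."""
    z = torch.randn(1, z_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        patch = generator(z)                     # frozen generator
        logits = victim(apply_patch(x, patch))   # embed patch into the image
        loss = -F.cross_entropy(logits, y_true)  # push away from the true label
        opt.zero_grad()
        loss.backward()
        opt.step()
    return generator(z).detach()
```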
10) Copy/Paste Attack: Inspired by the ringlet butterfly's predators being stunned by adversarial "eyespots" on its wings, Casper [60] speculates that DNNs are fooled by interpretable features in the real world. To this end, the author proposed to develop adversarial perturbations that can reveal the weaknesses of the victim network. Specifically, the author used a GAN to create the patch, which is pasted on a target image so that it is classified as a specific target class by the target model. Additionally, EOT is adopted to improve the
the latter refers to the attacked face being misidentified as any face other than the original one. Actually, the dodging and impersonation attacks are analogous to untargeted and targeted attacks, respectively. Specifically, physical attacks cause the facial recognition system to output an incorrect person (impersonation attack) or an undetected result (dodging attack) by modifying the accessories of the face, e.g., an eyeglasses frame [51], [69], a hat [73], a sticker [39], or even makeup [110].

1) Eyeglass attack: Sharif et al. [51] first investigated the physical adversarial attack against face recognition models. To attack the face recognition model, they took a facial accessory (i.e., eyeglasses) as the perturbation carrier. The robustness of the single perturbation is enhanced by training it on a set of images, making it remain effective under different imaging conditions. Additionally, the TV loss [52] is adopted to improve the smoothness of the perturbation, and the NPS loss is designed to ensure the printability of the perturbation. Finally, the author optimizes the perturbation by constructing the following optimization problem:

$$\arg\min_{\delta} \left( \sum_{x \in X} \mathrm{softmaxloss}(x + \delta, y_t) + \kappa_1 \cdot TV(\delta) + \kappa_2 \cdot NPS(\delta) \right), \tag{8}$$

where softmaxloss is the adversarial loss for the face recognition system, and κ1 and κ2 balance the objectives. To conduct the physical experiment, the author printed and cropped out the adversarial eyeglass frame and stuck it on an actual pair of eyeglasses. Then, the author asked the trial participants to wear the eyeglasses and stand at a fixed camera distance in an indoor environment, and collected 30-50 images for each participant. Experiment results suggested that their adversarial eyeglass frame easily deceives face recognition.
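A minimal sketch of the two regularizers in Eq. (8); the tensor shapes and the printable color set are assumptions for illustration.

```python
import torch

def total_variation(delta):
    """TV(delta): penalize differences between neighboring pixels so the
    printed perturbation is smooth. `delta` is assumed (3, H, W)."""
    dh = (delta[:, 1:, :] - delta[:, :-1, :]).abs().sum()
    dw = (delta[:, :, 1:] - delta[:, :, :-1]).abs().sum()
    return dh + dw

def nps(delta, printable_colors):
    """NPS(delta): distance from each perturbation pixel to its closest
    printer-reproducible color; `printable_colors` is an assumed (K, 3)
    tensor of colors measured from a printed calibration chart."""
    px = delta.permute(1, 2, 0).reshape(-1, 1, 3)             # (HW, 1, 3)
    dists = (px - printable_colors.unsqueeze(0)).norm(dim=2)  # (HW, K)
    return dists.min(dim=1).values.sum()
```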
Although the eyeglasses frame generated by [51] deceives facial recognition to some extent, Sharif et al. [69] argued that the appearance of the eyeglasses frame is conspicuous. To craft an inconspicuous eyeglass frame, the author proposed to use a generative model to craft adversarial glasses frames with natural patterns. Specifically, the author first collected enormous images of real glasses frames as the training set, then trained an adversarial generative network (AGN) with the following loss functions, i.e., Equations 9 and 10 for the untargeted and targeted attack, respectively:

$$L_G^{untargeted} = \sum_{z \in Z} \lg\big(1 - D(G(z))\big) - \kappa \Big( \sum_{i \neq gt} F_{y_i}(x + G(z)) - F_{gt}(x + G(z)) \Big), \tag{9}$$

$$L_G^{targeted} = \sum_{z \in Z} \lg\big(1 - D(G(z))\big) - \kappa \Big( F_{y_t}(x + G(z)) - \sum_{i \neq t} F_{y_i}(x + G(z)) \Big), \tag{10}$$

where z is sampled from a distribution Z, and D(·) and G(·) are the outputs of the discriminator and generator, respectively. F_{y_t} and F_{y_i} are the network's softmax outputs for the ground-truth label y_t and another label y_i; lg(·) is the logarithmic function, and κ is the weight that balances the loss terms. By minimizing the loss function, the generator learns to craft realistic glasses frames that are adversarial. To further enhance the robustness of the adversarial glasses in physical attacks, the author introduced the following three methods. The first is to utilize multiple images of the attacker to improve the robustness of the adversarial glasses under different conditions. The second is to collect images of various poses to make the attack robust to pose changes. The third is to use the Polynomial Texture Maps approach [111] to confine the RGB values of the eyeglasses under baseline luminance to values under a specific luminance, making the attack robust to varying illumination conditions. In physical attacks, the author printed out the eyeglasses frame and adversarial pattern, stuck them over the eyeglasses frame, and then recaptured images for evaluation. Experiment results show that their method outperforms [51] by an improvement of 45% under different physical conditions (i.e., head poses covering 13.01° of pitch, 17.11° of yaw, and 4.42° of roll). Figure 6 illustrates the printed adversarial eyeglasses generated by [51] and [69].

[Fig. 6. Examples of adversarial glasses for dodging and impersonation attacks of [51] (a,b) and [69] (c,d).]

Later, Zhang et al. [71] demonstrated that face authentication systems equipped with a DNN-based spoof detection module are still vulnerable to adversarial perturbations. They took inspiration from EOT [49] and trained the perturbation over a series of transformations (i.e., affine, perspective, brightness, gaussian blurring, and hue and saturation). In performing physical attacks, the author displayed the face image with the adversarial perturbation on a screen and then captured it with a cellphone under different views (angles in −20°∼20°, distances from 10 to 50 cm). Experiment results suggested that the target system is fragile to the adversarial face image.

Recently, Singh et al. [70] leveraged the concept of curriculum learning to promote the robustness of the attack to changing brightness against the facial recognition model. In optimization, the author constantly changes the brightness of face images stuck with adversarial eyeglasses, which makes their attack robust to brightness changes. Physical attacks verify the effectiveness of the attack against brightness changes.

2) Sticker: Unlike treating eyeglasses as the perturbation carrier, Pautov et al. [72] proposed a simple but effective method to craft an adversarial sticker to deceive face recognition. To apply the adversarial sticker over the facial area, the author applied the projective transformation over the
approach to craft an adversarial patch that can be placed anywhere in the image (even far from the object) and can suppress all detected objects. Specifically, the author designed an adversarial loss against the YOLO series of detectors and adopted EOT (affine transformations and brightness in HSV color space) to improve physical robustness. PGD [12] is adopted to optimize the adversarial patch. In physical attacks, the author printed out the adversarial patch and recorded videos under natural lighting conditions. YOLOv3 is adopted to evaluate the effect of the physical attacks. Experiment results show that the patch is invariant to location. However, when placing the adversarial patch away from the object, the size of the patch needs to be enlarged.

4) UPC: Previous works mostly craft instance perturbations for rigid or planar objects (e.g., traffic signs), ignoring non-rigid or non-planar objects (e.g., clothes). To address this issue, Huang et al. [48] proposed to craft a universal physical adversarial camouflage (UPC) for non-rigid objects (i.e., persons). Specifically, the author used a natural image as the initialization of the adversarial patch for the purpose of visual inconspicuousness. Moreover, a set of transformations is performed on the adversarial patch to mimic deformable properties. Then, the author devised a specific two-stage attack procedure against Faster RCNN: the RPN attack and the classifier and regressor attack, which are implemented by the following loss functions:

$$\begin{aligned} L_{rpn} &= \mathbb{E}_{p_i \sim P}\big( D(s_i, s_t) + s_i \,\| \vec{d}_i - \nabla\vec{d}_i \|_1 \big), \\ L_{cls} &= \mathbb{E}_{p_i \sim \hat{P}}\, C(p)_y + \mathbb{E}_{p_i \sim P^{\star}}\, L(C(p), y), \\ L_{reg} &= \sum_{p_i \sim P^{\star}} \| R(p) - \nabla\vec{d} \|_2, \end{aligned} \tag{17}$$

where s_i and d⃗_i are the confidence score and coordinates of the i-th bounding box; s_t is the target score, where s_1 and s_0 represent the background and foreground. ∇d⃗_i is a predefined vector that guides the optimization direction. C and R are the prediction outputs of the classifier and the regressor. P is the set of output proposals, P̂ is the set of top-k proposals filtered by confidence score, and P⋆ is the set of proposals that can be detected as the true label y. d⃗ denotes the distortion offset. Additionally, the TV loss is also adopted to produce more natural patterns. In physical attacks, the author printed the adversarial patterns, stuck them over a car (i.e., a non-planar object), and captured pictures from different distances (8∼12 m) and angles (−45°∼45°). Experiment results show that the adversarial camouflage could reduce the detector's performance by a large margin.

5) LPA Attack: Unlike the previous patch-based physical attacks [45], Yang et al. [87] proposed to generate an adversarial license plate to fool the SSD. Specifically, the author devised a mask to guide the perturbation region, optimized with the C&W adversarial loss and a color regularization consisting of NPS and TV losses to ensure adversarialness and eliminate the color discrepancy between the digital and physical worlds. In optimization, EOT is used to enhance physical robustness. In physical attacks, the author created the adversarial license plate and placed it on a real car. Experiment results suggested that the detection performance on the captured images degraded by 76.9% on average under indoor and outdoor conditions.

6) TranslucentPatch: Unlike crafting an adversarial patch and pasting it on the object's surface, Zolfi et al. [88] proposed a novel adversarial attack that crafts an adversarial patch stuck on the camera lens. To avoid the camera lens being fully occluded by the adversarial patch, the author took the image's alpha channel into account during optimization. The author implemented a targeted attack while simultaneously ensuring that the non-attacked objects were not affected, which is implemented by Equations (18) and (19):

$$L_{target\_conf} = \Pr(\mathrm{objectness}) \times \Pr(\mathrm{target\ class}), \tag{18}$$

$$L_{untarget\_conf} = \frac{1}{M} \sum_{\substack{cls \,\in\, \mathrm{image} \\ cls \,\neq\, \mathrm{target}}} \big| \mathrm{conf}(cls, \mathrm{clean}) - \mathrm{conf}(cls, \mathrm{patch}) \big|, \tag{19}$$

where Pr(objectness) and Pr(class) represent the objectness and classification outputs of the detector, respectively. Additionally, the author also takes the Intersection over Union (IoU) as an adversarial loss:

$$L_{IoU} = \mathrm{IoU}_{predict}^{gt}(\mathrm{target}). \tag{20}$$

The predicted bounding box is changed by minimizing L_IoU. The author also adopted NPS to ensure the color of the printed adversarial patches. The author conducts the physical attack by first printing out the adversarial patch on transparent paper, then placing it on the camera's lens and capturing video. The physical attack evaluated on YOLOv2/5 and Faster RCNN achieves an average fooling rate of over 35.48%.
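As a rough illustration of Eqs. (18) and (19), the following sketch computes the two confidence terms; the per-class confidence tensors are assumed shapes, not the authors' detector interface.

```python
import torch

def target_conf(objectness, class_probs, target_idx):
    """Eq. (18): confidence of the target class for each candidate box,
    objectness * class probability. `class_probs` is assumed (N, C)."""
    return objectness * class_probs[:, target_idx]

def untarget_conf(clean_conf, patch_conf, target_idx):
    """Eq. (19): mean absolute confidence change over the non-target
    classes, keeping non-attacked objects unaffected. Both inputs are
    assumed (N, C) per-class confidences on the clean/patched image."""
    keep = torch.ones(clean_conf.shape[1], dtype=torch.bool)
    keep[target_idx] = False
    return (clean_conf[:, keep] - patch_conf[:, keep]).abs().mean()
```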
7) SwitchAttack: Recently, Shapira et al. [89] developed a targeted attack method that generates an adversarial patch pasted on the car hood to attack object detectors (i.e., YOLOv3 and Faster RCNN) in a more realistic monitoring scenario. Specifically, the author devised a tailored projection method to adaptively adjust the position and shape of the adversarial patch according to the camera orientation. To perform the targeted attack, the author considers all candidates that surpass a specific threshold before the NMS component to ensure that the relevant candidates can be classified as the target label. Moreover, TV loss is adopted to improve the physical deployability of the adversarial patch. In physical attacks, the author stuck the printed patch on a toy car, achieving an attack success rate of 95.9% on recaptured images.

8) ScreenAttack: The first step of current patch-based physical attacks is to print out the adversarial patch, which is static in practice and ignores the constantly changing character of the physical world, which may result in a failed attack. To address this challenge, Hoory et al. [90] proposed a dynamic adversarial patch for evading object detection models by displaying the adversarial patch on a digital screen (see Figure 11). The author adopted the same loss function as [47] to train the adversarial patch. Additionally, the author pointed out that the adversarial patch crafted by the loss function of
[47] is easily misdetected as a semantically related category, which makes it inconsistent with background objects. To this end, the author considered a semantically related class (e.g., bus or truck) of the original class (i.e., car) as the target and devised the following loss function:

$$J(\delta, x_{adv}, C) = \alpha \cdot TV(\delta) + \sum_{y_i \in C} L(f(x_{adv}), y_i), \tag{21}$$

where C is the set of semantically related labels, used to avoid being detected as car-related classes. Furthermore, the author split the training set into multiple subsets and trained an adversarial patch for each subset. In physical attacks, the author placed a digital screen on the side and rear of the car, where the adversarial patch is chosen from the patch subsets and displayed dynamically according to the changing environmental conditions. Evaluation results on YOLOv2 suggested that in 90% of video frames with the adversarially perturbed screen, the target object (i.e., the car) cannot be detected.

Xu et al. [92] proposed an adversarial T-shirt attack to avoid the detection of a single-object detector (i.e., for persons) in the physical world. The generated adversarial pattern is attached to the surface of the T-shirt. To mimic cloth deformation in the physical world, the author proposed to apply the Thin Plate Spline (TPS) transformation over the adversarial perturbation. To further enhance the performance of the adversarial attack, the author adopted the following three approaches: 1) approximate the cloth's deformation with TPS; 2) approximate the color discrepancy between digital and physical space with a quadratic polynomial regression; 3) enhance the robustness of the adversarial perturbation with EOT. By performing the above operations, the T-shirt with the adversarial patch degrades the person detector significantly under indoor and outdoor physical conditions.

Concurrent to [92], Wu et al. [93] proposed to devise an adversarial patch to implement the disappearance attack. To this end, the author devised the objectness disappearance loss, which is expressed as follows:

$$L_{obj} = \mathbb{E}_{t \sim T} \sum_{i} \max\big\{ S_i\big(A(x, \delta, t)\big) + 1,\; 0 \big\}^2, \tag{22}$$

where S_i(A(x, δ, t)) is the detector's confidence score for the i-th candidate on the image A(x, δ, t), in which the adversarial patch δ, under a sampled transformation t, is applied to the target object inside the image to construct the adversarial example.
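A minimal sketch of the hinge in Eq. (22); `scores` stands for the detector confidences S_i on the patched image, and the expectation over t ∼ T would be taken by averaging this loss over sampled transformations.

```python
import torch

def disappear_loss(scores):
    """Eq. (22): squared hinge on per-candidate detection scores; driving
    it to zero pushes every candidate's confidence below the detection
    threshold so the target object "disappears" from the detector output."""
    return torch.clamp(scores + 1.0, min=0.0).pow(2).sum()
```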
The TV loss and the following adversarial loss are adopted to optimize the adversarial patch:

$$L = \frac{1}{N} \sum_{i=1}^{N} \max_{j} \Big[ f_{obj}^{j}(x_{adv_i}) \cdot f_{cls}^{j}(x_{adv_i}) \Big], \tag{23}$$

where x_{adv_i} is the i-th image in a batch of size N, and f_{cls}^{j}, f_{obj}^{j} represent the class probability and objectness probability of the j-th candidate box, respectively. In physical attacks under indoor and outdoor conditions, the author recorded one person wearing the adversarial shirt side-by-side with another person wearing an ordinary shirt. Experiment results show that the physically recaptured images could degrade the detection recall of YOLOv4-tiny by approximately 23.80% and 43.14% (for indoor and outdoor, respectively). Moreover, they exhaustively investigate the influence of different transformations, patch sizes, and target classes on attack performance.
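A one-line sketch of Eq. (23); the (N, J) objectness and class-probability tensors are assumed detector outputs for illustration.

```python
import torch

def max_detection_loss(obj_probs, cls_probs):
    """Eq. (23): for each of the N images, take the most confident candidate
    (max over boxes j of objectness * class probability) and average over
    the batch; minimizing it suppresses the strongest person detection."""
    return (obj_probs * cls_probs).max(dim=1).values.mean()
```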
Hu et al. [96] got inspiration from cloth making and proposed to treat the adversarial patch as a whole piece of cloth to be cut. To generate a naturalistic patch, the author utilized a generative model to generate the adversarial patch and then overlapped it with the object. To simulate the physical deformation of the cloth, the author applied the TPS transformation over the adversarial patch. TV loss is adopted to ensure the smoothness of the patch's pattern. They conducted the physical experiment by painting the adversarial patch over cloth and then making the clothes. An adversary wearing the crafted adversarial clothes can bypass the detector with an average attack success rate of 80.75% against different detectors (i.e., YOLOv2/3, Faster RCNN, and Mask RCNN) under different distances and view angles (ranging from 0° to 180°). Figure 13 shows the adversarial patch and the corresponding clothes.

[Fig. 13. Adversarial patch (left) vs. adversarial clothes (right). [96]]

C. Projector's Projection

Rather than generating the adversarial patch directly, Lovisotto et al. [97] presented a novel adversarial attack that modulates a projector's parameters. The author first collected the projectable colors with the following steps: 1) collect an image of the target object as the projection surface S; 2) select a color c_p = [r, g, b], project the color over the projection surface, and record the resulting output O_{c_p}; finally, repeat the above two steps until enough data is collected. The author takes the camera noise and the projector's working mechanism into account to collect colors as realistic as possible. After that, the author fitted a projection model as follows:

$$L_P = \arg\min_{w} \sum_{\forall c_s, c_p} \big\| \mathcal{P}(c_s, c_p) - c_o^{(s,p)} \big\|_1, \tag{24}$$

where P is the projection model and (c_s, c_p, c_o^{(s,p)}) is a training data triple, consisting of the surface color c_s, the color c_p projected onto pixels of color c_s, and the observed output c_o^{(s,p)}; w denotes the parameters of the projection model. To optimize the adversarial examples, the author combined EOT and treated the objectness and classification outputs of the detector as the adversarial loss. The author conducted the physical attack by placing the projector 1 to 12 meters away from the target object at viewing angles of −30°∼30°, achieving a success rate of 100% for the traffic sign classification model and 77% for object detectors (i.e., Mask RCNN and YOLOv3).
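A minimal sketch of fitting the projection model of Eq. (24) with an L1 objective; the small MLP and training loop are illustrative assumptions, not the authors' model.

```python
import torch

def fit_projection_model(cs, cp, co, epochs=500, lr=1e-3):
    """Fit P(c_s, c_p) -> c_o on collected (surface color, projected color,
    observed color) triples by minimizing the L1 error of Eq. (24).
    `cs`, `cp`, `co` are assumed (N, 3) RGB tensors in [0, 1]."""
    P = torch.nn.Sequential(
        torch.nn.Linear(6, 32), torch.nn.ReLU(), torch.nn.Linear(32, 3))
    opt = torch.optim.Adam(P.parameters(), lr=lr)
    inp = torch.cat([cs, cp], dim=1)
    for _ in range(epochs):
        loss = (P(inp) - co).abs().sum(dim=1).mean()  # L1 over each triple
        opt.zero_grad()
        loss.backward()
        opt.step()
    return P
```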
D. Camouflage-based physical adversarial attack

The works discussed above mainly focus on generating an adversarial patch on the 2D image, where each pixel can be modified independently. Recently, researchers have increasingly paid attention to more realistic attacks in 3D space, developing adversarial camouflage-based attacks that focus on modifying the appearance (i.e., shape or texture) of a 3D object.

1) Shape of 3D object: Zeng et al. [98] first proposed to generate adversarial examples against 3D shape classification and a visual question answering (VQA) system. Specifically, they crafted the adversarial example by adopting the neural renderer technique, which allows them to perform 3D transformations to better approximate real-world environments. Concurrent to Zeng [98], Xiao et al. [122] proposed to generate an adversarial object by modifying the mesh of a 3D object, namely MeshAdv. The adversarial mesh is optimized with a total loss consisting of classification loss, objectness loss, and perceptual loss. Moreover, the author extends the TV loss from pixels in the 2D image to vertices of the 3D mesh. However, the limitations of the above methods made them fail to conduct physical attacks.

2) CAMOU: Unlike the works [98], [122] that optimize the object mesh, Zhang et al. [99] proposed to optimize the texture of the 3D object. The author presented a clone network to mimic the entire processing pipeline, containing the adversarial texture rendering and the detector prediction, then applied a white-box attack on the clone network to optimize the adversarial texture. Although the camouflaged vehicle achieved good attack performance on Mask RCNN, the optimized texture is a mosaic-like pattern, which is conspicuous to the human observer.

3) UPC: Huang et al. [48] proposed to optimize the adversarial texture from a patch seed and stick the adversarial patch over part of the vehicle's surface to obtain a good appearance. It is worth noting that the author printed out the adversarial texture and stuck it on a real car for the experiments.

4) ER: Rather than constructing the adversarial patch under the white-box setting, Wu et al. [100] considered a more challenging black-box attack. The author proposed to utilize a genetic algorithm to optimize the adversarial texture. To reduce the number of optimization variables, the author optimizes a small-size adversarial texture (64 × 64), then enlarges or repeats (ER) it to a large scale (2048 × 2048). The adversarial texture is rendered and wrapped over the vehicle in the simulator environment via the Carla [123] Python API, then recorded the adversarial
it is free from lighting changes, enabling them to be widely adopted in the real world.

Zhu et al. [31] proposed a novel adversarial method to attack infrared thermal imaging detectors. The author devised a board with multiple bulbs, where the positions of the bulbs are the optimization variables. To mimic the shining of a bulb, the author applied a gaussian function to the pixels at the bulb's position. Finally, the author treated the combination of the detector's objectness output and the TV loss as the loss function to optimize the adversarial patch. The author conducted the physical attack by placing the bulbs in the optimized positions and photographing the people who held the board. Experiment results show that the adversarial bulb board degrades the detector by 34.48%.

Recently, the authors pointed out that [31] may fail in multi-view environments and proposed to construct adversarial clothes covered with an adversarial texture made of aerogel [32]. More specifically, the author adopted a similar idea to [96] to optimize the adversarial patch image. To simplify the physical implementation, the author optimized a QR-code-like adversarial texture, which requires optimizing only the generation parameters of the QR code texture. The generated QR code texture is tiled to a large shape, and the actual adversarial patch is randomly cropped from the tiled image and pasted onto the target object in the image. TPS and various transformations are used to improve the robustness of the adversarial patch. The author made the clothes by attaching black blocks to the aerogel according to the optimized QR code texture. A participant who wore the adversarial patch could successfully bypass the detector (i.e., YOLOv3), whose detection performance degraded by 64.6%.

F. Against Aerial detection

Unmanned aerial surveillance has gained increasing attention with the development of drone techniques and object detection. Correspondingly, adversarial attacks against unmanned aerial surveillance have aroused increasing attention in military scenarios. The conventional technique against unmanned aerial surveillance is to use a camouflage net, while a recent study shows that a partial imagery patch could deceive unmanned aerial surveillance.

[Fig. 16. Visualization results of different patches (random patch vs. adversarial patch). [129]]

Den Hollander et al. [129] assumed the unmanned aerial surveillance model is YOLOv2 trained on an aerial imagery dataset. To attack the object detector, the author combined the common losses, consisting of the adversarial loss, NPS loss, and objectness loss, and proposed a novel colorfulness metric as a saliency loss to train the adversarial patch. The saliency loss is represented as follows:

$$L_{sal} = \sqrt{\sigma_{rg}^2 + \sigma_{yb}^2} + 0.3 \cdot \sqrt{\mu_{rg}^2 + \mu_{yb}^2}, \qquad rg = R - G, \quad yb = 0.5 \cdot (R + G) - B, \tag{25}$$

where µ and σ are the averages and standard deviations of the auxiliary variables, and R, G, and B denote the patch's color channel values. The author investigated the impact of the patch's size, position, number, and saliency on the adversarial performance. Experiment results demonstrate that the optimized patch could reduce the detector's performance significantly (i.e., by 47%).
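The colorfulness term of Eq. (25) is straightforward to compute; the following is a minimal sketch assuming a (3, H, W) RGB patch tensor.

```python
import torch

def saliency_loss(patch):
    """Eq. (25): colorfulness of the patch, used as a saliency penalty.
    Lower values encourage less salient (less colorful) patches."""
    r, g, b = patch[0], patch[1], patch[2]
    rg = r - g
    yb = 0.5 * (r + g) - b
    std_term = torch.sqrt(rg.var() + yb.var())
    mean_term = torch.sqrt(rg.mean() ** 2 + yb.mean() ** 2)
    return std_term + 0.3 * mean_term
```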
Recently, Du et al. [101] implemented a physical adversarial attack against aerial imagery detection. Considering realistic scenarios, the author designed two types of adversarial patches (see Figure 17) to accommodate different scenarios. Type ON refers to the adversary placing the adversarial patch on top of the car, which is suitable for dynamic scenarios (i.e., the car is driving). By contrast, Type OFF refers to the adversary placing the adversarial patch around the car, which is suitable for stationary scenarios (e.g., a parking lot). The author adopted objectness, NPS, and TV losses to train the adversarial patch. The author conducted the physical attack by printing out the adversarial patches and placing them around or near the target objects under different lighting conditions. Experiment results demonstrated that the printed-out adversarial patch could significantly reduce the objectness score (by 25% to 85%).

[Fig. 17. Patch designs (Type ON and Type OFF). [101]]

Apart from these two adversarial patch attacks against object detectors on aerial imagery datasets, there are a number of attempts to attack remote sensing image recognition and detection [130]–[132].

G. Other adversarial attacks against object detection

Although physical attacks consider more realistic conditions, the basic procedure behind physical attacks is similar to that of digital attacks, i.e., the first step of a physical adversarial attack is to construct the adversarial perturbation in the digital world. Thus, some digital attacks may inspire physical attacks. Wang et al. [133] proposed a dense adversarial generation (DAG) attack, which simultaneously attacks the classification, the objectness, and the classification and regression losses of the RPN module. Chow et al. [134] systematically investigated adversarial objectness attacks against common detectors (i.e., YOLOv3, SSD, and Faster RCNN) on three different benchmark datasets; the devised adversarial patch could be pasted on the image or added into the image. Wang et al. [135] proposed a creative attack that disturbs the non-maximum suppression (NMS) module of the detector, which results in enormous wrong outputs on the generated adversarial examples. Liao et
al. [136] showed that there is no need to generate a global perturbation for the whole image; the author exploited high-level semantic information to generate local perturbations for detectors. To improve the naturalness of the adversarial patch, Pavlitskaya et al. [36] extended the existing GAN-based approaches [95] by attacking all objects that appear in the image to generate inconspicuous patches, and then assessed the trade-off between attack performance and naturalness.

[Fig. 18. Visualization of different sparse attacks against the object detector: SAA [137], DPatch [138], RPAttack [139], and HideAttack [138].]

Unlike searching for a global perturbation over the whole image, research has attempted to implement sparse attacks by perturbing a small number of image pixels, which makes them suitable for extension to physical attacks. Although sparse adversarial attacks have been investigated in image recognition systems [140]–[144], their application to object detection is underexplored. Figure 18 illustrates successful sparse adversarial attacks in the digital world.

To perform a sparse adversarial attack, Bao et al. [137] generated the adversarial patch by simultaneously taking the position, shape, and size of the adversarial patch into account during optimization. Zhao et al. [145] proposed two adversarial patch generation algorithms: exploiting a heatmap to guide the patch application, and a consensus-based algorithm (i.e., one of the model ensemble strategies). The proposed algorithm won seventh place in the Alibaba Tianchi adversarial challenge object detection competition. Wu et al. [138] proposed a diffused patch attack that optimizes adversarial lines embedded in the object's center. The proposed attack won second place in the Tianchi competition. Li et al. [146] extended the universal adversarial perturbation [147] from image recognition tasks to object detection tasks and proposed a universal dense object suppression (U-DOS) algorithm to optimize a universal adversarial perturbation that deceives the detector on most input images. To probe the model's blind spots, Zhu et al. [148] came up with a novel approach that simultaneously models the shape and the pattern of the adversarial patch; they can generate multiple patches with different shapes for a specific image. Huang et al. [139] proposed to generate the adversarial patch and gradually remove the inconsequential perturbations in the crafted patch. In addition, the author devised a weighted strategy to stabilize the model ensemble, avoiding countermeasures among different detectors. The experiment results show that perturbing a small number of pixels could degrade the detector significantly (obtaining a 100% missed detection rate for both YOLOv4 and Faster RCNN).

H. Attack Semantic Segmentation

Semantic segmentation is similar to the object detection task but needs fine-grained classification for every pixel in the image. However, physical attacks against semantic segmentation are less explored, as discussed below.

Nesti et al. [102] proposed to optimize a scene-specific billboard patch to attack semantic segmentation models. The author exploits the Carla simulator [123] to simulate a more realistic environment. Rather than adopting EOT in 2D image space, the author exploited various physical transformations in 3D provided by Carla. More specifically, the author first applied a projective transformation to precisely paste the patch onto the 2D attackable surface with the extracted 3D rotation matrix. The author devised the following loss function to optimize the patch:

$$L_M = \sum_{i \in \Omega} L_{CE}\big(f_i(x_{adv}), y_i\big), \qquad L_{\bar{M}} = \sum_{i \notin \Omega} L_{CE}\big(f_i(x_{adv}), y_i\big), \tag{26}$$

where L_M describes the cumulative cross entropy (CE) over the pixels that have been misclassified with respect to the ground truth y, denoted by Ω, while L_M̄ refers to all the others. The author printed out the billboard patch as a 1 × 2 meter poster, which could successfully hide the target object. Figure 19 illustrates the physical billboard patch.

[Fig. 19. Visualization of a random patch (left) and the optimized patch (right). [102]]
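A minimal sketch of Eq. (26), splitting the per-pixel cross entropy by whether each pixel is already misclassified; the tensor shapes are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def misclassification_losses(logits, y):
    """Eq. (26): L_M sums the cross entropy over pixels already
    misclassified (the set Omega); L_M_bar covers the remaining pixels.
    `logits` is assumed (N, C, H, W); `y` is (N, H, W) labels."""
    ce = F.cross_entropy(logits, y, reduction='none')  # (N, H, W)
    wrong = logits.argmax(dim=1) != y                  # Omega mask
    l_m = ce[wrong].sum()
    l_m_bar = ce[~wrong].sum()
    return l_m, l_m_bar
```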
Unlike for the image recognition and object detection tasks, research on physical adversarial attacks against the semantic segmentation task lacks attention. A possible reason is the task similarity between object detection and semantic segmentation. Additionally, semantic segmentation models are commonly deployed in autonomous driving, and physical adversarial attacks against autonomous driving systems have gained enormous attention [149]–[151], which is beyond the discussion of this paper. These reasons may explain why only a small number of works investigate physical adversarial attacks against semantic segmentation.

Nonetheless, several attempts have explored digital attacks against semantic segmentation. Hendrik Metzen et al. [152] proposed a universal adversarial perturbation to deceive semantic segmentation on most images. Unlike [152], Nakka et al. [153] pointed out that there is no need to perturb the whole image to realize the attack and proposed to optimize an adversarial patch applied to the object to attack the semantic segmentation. Arnab et al. [154] systematically investigated
the adversarial robustness of semantic segmentation with respect to various attacks (e.g., FGSM [10], PGD [12]). Additionally, physical adversarial attacks have recently been applied to various fields, such as depth estimation [103] and object tracking [104], [105].

VI. DISCUSSION AND FUTURE WORK

Although physical adversarial attacks impose potential risks on deployed DNN-based systems, open issues remain to be solved. In this section, from the perspective of the attack pipeline (i.e., input transformation, algorithm design, and evaluation), we summarize the common techniques used to develop physical attacks and then discuss promising methods to further boost the performance of physical attacks. Finally, we discuss potentially beneficial applications of physically deployable adversarial perturbations.

A. Input transformation

The essence of input transformation is the expectation maximization of the objective function over inputs that undergo various transformation operations (e.g., rotation, affine transformation, brightness adjustment). In adversarial attacks, the transformations are exerted on the adversarial examples to make them robust to such transformations or to boost their transferability. Thus, input transformation is a simple but effective way to improve adversarial attacks' performance both in the digital and physical world. However, complex environmental conditions may curb the effectiveness of such transformations. To address this issue, pairs of digital images and their corresponding physically recaptured images are used to boost the physical robustness of the adversarial perturbation. Another solution is to use a network to model the discrepancy between the digital and physical worlds.

To better mimic the physical transformations of the real world, given the strong learning capability of DNNs, one approach is to exploit a network to learn the various physical transformations, such as shadow, brightness, rotation, saturation, and so on. For example, one can use a network to learn the explicit physical transformations from video sequences. Collecting high-quality training data is thus significant for the network's performance. A minimal sketch of the transformation-averaging idea is given below.
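The sketch averages the attack loss over randomly sampled digital transformations; the transformation set and loop are illustrative, not a specific paper's pipeline.

```python
import torch
import torch.nn.functional as F
import torchvision.transforms as T

# Illustrative transformation distribution; real physical attacks would add
# more conditions (shadow, perspective, sensor noise, etc.).
random_transform = T.Compose([
    T.RandomRotation(degrees=15),
    T.RandomAffine(degrees=0, translate=(0.1, 0.1), scale=(0.8, 1.2)),
    T.ColorJitter(brightness=0.3, contrast=0.3),
])

def expected_attack_loss(model, x_adv, y, n=16):
    """Monte-Carlo estimate of E_{t~T}[J(f(t(x_adv)), y)]: the perturbation
    is optimized against this average so it survives the transformations."""
    losses = [F.cross_entropy(model(random_transform(x_adv)), y)
              for _ in range(n)]
    return torch.stack(losses).mean()
```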
B. Attack algorithm design

Although a line of physical attacks has been proposed for different tasks and achieved certain success, many problems still exist, which are discussed as follows.

1) Adversarial patch application: Adversarial patch application refers to how the adversarial perturbation is pasted on the host image/object, which is crucial for adversarial patch attacks. However, most existing works have used a mask to guide the application of the adversarial patch, which curbs the performance of the attacks. The reason may be two-fold: on one side, only a specific area of the image is crucial for the model's decisions, and modifying unspecific areas may have a countereffect; on the other side, a large block of perturbation is conspicuous to the human observer, hurting its stealthiness. Based on the aforementioned discussion, one promising approach is to adaptively adjust the mask's shape, such as using a network to automatically learn the best mask or exploiting a geometric algorithm to optimize the optimal mask.

2) Universal adversarial perturbation via black-box attack: Many existing physical attacks are white-box attacks, and the perturbation is universal by default, which means only one perturbation needs to be optimized to fool the whole dataset. Despite their success, the white-box setting is not practical. Regarding black-box attacks, plenty of works [16], [155]–[157] have been proposed for attacking the target model in the digital world. However, these works are designed to generate a perturbation for each input image, which limits their application in practice. Among the surveyed works, only one was proposed to optimize a universal adversarial perturbation against an image recognition model in the digital world [158].

Therefore, developing physically deployable universal black-box attacks is more realistic but also more difficult. Some algorithms can address the above problem, such as genetic algorithms, differential evolution, particle swarm optimization, and random search. However, optimizing the perturbation then involves solving a high-dimensional optimization problem, which significantly confines the efficiency and effectiveness of the attack. Thus, reducing the optimization dimension is crucial for developing physically deployable black-box attacks. Moreover, other black-box issues may need to be considered, such as time consumption and query efficiency.

3) Transferability: Transferability is one of the significant characteristics of adversarial perturbations: a perturbation generated for a known model can also be used to attack an unknown model. In realistic physical attacks, the adversary may not know which DNN is deployed in the system, which makes the transferability of the adversarial perturbation essential for physical attack performance. However, existing physical attacks mainly focus on the robustness of physical attacks and attack performance on specific known models, leaving the transferability of physical adversarial attacks less explored.

There are enormous works attempting to improve the transferability of digital attacks, including but not limited to input transformation, gradient calibration, feature-level attacks, model ensembles, and generative models. These techniques could be combined with physical attacks to boost their transferability.

4) Simultaneous attack multitask: Most current physical attacks focus on attacking specific tasks, such as image recognition and object detection. However, in the real world, the attacker cannot obtain knowledge about the target victim system, such as whether the system is equipped with an image recognition model or an object detection model. Therefore, one promising direction to address the above problem is to take multiple tasks into account simultaneously during optimization.

Recently, the vision transformer (a.k.a. ViT) provided a novel general framework for vision tasks, exhibiting competitive performance to CNN-based frameworks. With the increasing
application of Vit, physical adversarial attacks against the ViT the scene and parameters (e.g., weather and light conditions).
should arouse the researcher’s attention. For example, one can develop a physical testing environment
5) Robustness of physical attacks: The main problem faced using an open source simulator (e.g., Carla [159] or AirSim
by physical adversarial attacks is the complex physical envi- [160]), then provide the physical testing interface, allowing
ronments, including brightness, lighting, shadow, deformation, the attacker to call the testing procedure with their optimized
distance, occlusion, and the system noise introduced by the adversarial perturbation and report the attack results.
imaging sensor (i.e., camera). The existing approaches en- 2) Uniform physical test criterion: Most physical attacks
counter the following drawbacks when applied to physical conduct physical experiments under stationary conditions,
attacks. First, they generally resit some of these conditions which means that the adversary takes photos of the adversarial
and hard take all underlying physical environment factors into perturbation disturbed object at a specific position and then
account. Second, they cannot cover some physical conditions, evaluates the attack performance of the captured image. How-
such as partial shadow or occlusion of the target object. ever, the target object moves in and out of the viewfinder of
Third, the used transformation (e.g., rotation or brightness the system’s sensor device in the actual physical environment,
change) modifies the whole pixel of the image instead of meaning that the ratio of the perturbed object is dynamically
the target object, which is inconsistent with real physical changing with respect to the fixed viewfinder. In the dynam-
deployment. Finally, there are no works to take the dynamic ically changing process, many factors caused by the back-
changing of the size of the target object into account, which is ground environment would impact the attack performance. The
crucial in the real-world scenario as the perturbed object will stationary test cannot include such a complex scenario. To
move. Therefore, constructing a general and robust physical this end, the dynamic physical test may better represent the
adversarial attack is an open problem for future research. performance of the attack method in the real world. Therefore,
Two promising techniques may be can used to solve the a complete criterion of the physical test process is urgent to
above issues. The first one is to utilize the technique from the boost the development of physical adversarial attacks.
computer graphic, such as the physical renderer, a computer 3) Applications: The adversarial attack is a double-blade
graphics approach that seeks to render images in a way that sword. On the one side, adversarial attacks would lead the
models the flow of light in the real world. The second one DNN-based system to make mistakes, resulting in potential
is to utilize emerging technologies, i.e., neural radiance fields security risks. On the other side, adversarial attacks also
(NeRF), a method for synthesis for novel views of complex have their benefits side, such as 1) the adversarial perturbed
scenes. The abovementioned approaches can be integrated into physical adversarial example could be used as the training
optimizing adversarial attacks to mimic complex real-world data to boost the adversarial robustness of DNN further; 2)
conditions. adversarial perturbation can used to help the DNN system
6) Stealthiness: Most existing successful physical adversar- remain stable in complex environmental conditions [161],
ial attacks pursue higher and more robust attack performance [162]. Therefore, potential applications exploiting adversarial
with the sacrifice of perceptible. One reason is the mode of perturbation to boost the performance of DNN in the real
projecting the perturbation to the sphere of L∞ with a radius world is an interesting research direction.
of a specific magnitude in digital attacks may be unsuitable
for physical attacks as the slight perturbation may be distorted
caused by the device noise or environmental factors, making VII. C ONCLUSION
the attack invalid. Moreover, visually conspicuous adversarial
In this paper, we systemically discussed the physical ad-
perturbation patterns easily observe by humans. Thus, as
versarial attacks in computer vision tasks, including image
we cannot diminish the perturbation magnitude, making the
recognition and object detector tasks. We argue that DNN-
adversarial patterns more natural may be a promising work.
based systems still face enormous potential risks in both digital
and physical environments. These risks should be thoroughly
C. Evaluation studied to avoid unnecessary loss when widely deployed in
the real world. Specifically, we proposed a taxonomy scheme
1) Evaluation benchmark dataset: Although most physical
to categorize the current physical adversarial attack to provide
attack methods were proposed, they are devised for different
a comprehensive overview. By reviewing the existing physical
datasets, resulting in hardly evaluating different methods on
adversarial attack, we summarize the common characteristic of
one uniform dataset. Thus, it is urgent to develop a public
physical adversarial attacks, which may be helpful to future
dataset for evaluating the effectiveness of the attack methods.
physical adversarial attack research. Finally, we discuss the
Meanwhile, the real-world scene is dynamic, and a stationary test cannot cover such a complex scenario. To this end, a dynamic physical test may better represent the performance of an attack method in the real world. Therefore, a complete criterion for the physical test process is urgently needed to boost the development of physical adversarial attacks.
3) Applications: The adversarial attack is a double-edged sword. On one side, adversarial attacks can lead DNN-based systems to make mistakes, resulting in potential security risks. On the other side, adversarial attacks also have beneficial uses: 1) physically perturbed adversarial examples can be used as training data to further boost the adversarial robustness of DNNs; 2) adversarial perturbations can be used to help DNN systems remain stable under complex environmental conditions [161], [162]. Therefore, exploring applications that exploit adversarial perturbations to boost the performance of DNNs in the real world is an interesting research direction.
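As a sketch of point 1) above, physically captured adversarial examples could simply be folded into ordinary training; `adv_loader` is an assumed loader over photographed adversarial examples paired with their correct labels, so the exact data pipeline is hypothetical.

```python
import torch

# Sketch: augmenting a training epoch with physically captured adversarial
# examples so the model learns to classify them correctly (cf. [161], [162]).
def robust_train_epoch(model, clean_loader, adv_loader, optimizer):
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for (x, y), (x_adv, y_adv) in zip(clean_loader, adv_loader):
        logits = model(torch.cat([x, x_adv]))          # batch clean and adversarial data
        loss = loss_fn(logits, torch.cat([y, y_adv]))  # fit both with correct labels
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```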
VII. CONCLUSION

In this paper, we systematically discussed physical adversarial attacks in computer vision tasks, including image recognition and object detection. We argue that DNN-based systems still face enormous potential risks in both digital and physical environments. These risks should be thoroughly studied to avoid unnecessary losses when such systems are widely deployed in the real world. Specifically, we proposed a taxonomy scheme to categorize current physical adversarial attacks and provide a comprehensive overview. By reviewing the existing physical adversarial attacks, we summarized their common characteristics, which may be helpful for future physical adversarial attack research. Finally, we discussed the problems that remain to be solved in physical adversarial attacks and provided some possible research directions.

VIII. ACKNOWLEDGMENTS

This work was supported by the National Natural Science Foundation of China (No. 11725211 and 52005505).
REFERENCES

[1] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[2] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
[3] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, “End-to-end object detection with transformers,” in European conference on computer vision. Springer, 2020, pp. 213–229.
[4] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 10012–10022.
[5] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017.
[6] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.
[7] D. Amodei, S. Ananthanarayanan, R. Anubhai, J. Bai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, Q. Cheng, G. Chen et al., “Deep speech 2: End-to-end speech recognition in english and mandarin,” in International conference on machine learning. PMLR, 2016, pp. 173–182.
[8] A. Gulati, J. Qin, C.-C. Chiu, N. Parmar, Y. Zhang, J. Yu, W. Han, S. Wang, Z. Zhang, Y. Wu et al., “Conformer: Convolution-augmented transformer for speech recognition,” arXiv preprint arXiv:2005.08100, 2020.
[9] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. J. Goodfellow, and R. Fergus, “Intriguing properties of neural networks,” in 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings, 2014. [Online]. Available: http://arxiv.org/abs/1312.6199
[10] I. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” in International Conference on Learning Representations, 2015. [Online]. Available: http://arxiv.org/abs/1412.6572
[11] A. Kurakin, I. Goodfellow, and S. Bengio, “Adversarial machine learning at scale,” arXiv preprint arXiv:1611.01236, 2016.
[12] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,” in 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, 2018.
[13] Y. Dong, F. Liao, T. Pang, H. Su, J. Zhu, X. Hu, and J. Li, “Boosting adversarial attacks with momentum,” 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9185–9193, 2018.
[14] N. Carlini and D. Wagner, “Towards evaluating the robustness of neural networks,” in 2017 IEEE symposium on security and privacy (SP). IEEE, 2017, pp. 39–57.
[15] Z. Huang and T. Zhang, “Black-box adversarial attack with transferable model-based embedding,” in International Conference on Learning Representations, 2020. [Online]. Available: https://openreview.net/forum?id=SJxhNTNYwB
[16] C. Li, H. Wang, J. Zhang, W. Yao, and T. Jiang, “An approximated gradient sign method using differential evolution for black-box adversarial attack,” IEEE Transactions on Evolutionary Computation, pp. 1–1, 2022.
[17] L. Sun, M. Tan, and Z. Zhou, “A survey of practical adversarial example attacks,” Cybersecurity, vol. 1, no. 1, pp. 1–9, 2018.
[18] R. R. Wiyatno, A. Xu, O. Dia, and A. de Berker, “Adversarial examples in modern machine learning: A review,” arXiv preprint arXiv:1911.05268, 2019.
[19] S. Qiu, Q. Liu, S. Zhou, and C. Wu, “Review of artificial intelligence adversarial attack and defense technologies,” Applied Sciences, vol. 9, no. 5, p. 909, 2019.
[20] X. Yuan, P. He, Q. Zhu, and X. Li, “Adversarial examples: Attacks and defenses for deep learning,” IEEE transactions on neural networks and learning systems, vol. 30, no. 9, pp. 2805–2824, 2019.
[21] A. Serban, E. Poll, and J. Visser, “Adversarial examples on object recognition: A comprehensive survey,” ACM Computing Surveys (CSUR), vol. 53, no. 3, pp. 1–38, 2020.
[22] K. Ren, T. Zheng, Z. Qin, and X. Liu, “Adversarial attacks and defenses in deep learning,” Engineering, vol. 6, no. 3, pp. 346–360, 2020.
[23] J. Zhang and C. Li, “Adversarial examples: Opportunities and challenges,” IEEE transactions on neural networks and learning systems, vol. 31, no. 7, pp. 2578–2593, 2019.
[24] Y. Hu, W. Kuang, Z. Qin, K. Li, J. Zhang, Y. Gao, W. Li, and K. Li, “Artificial intelligence security: threats and countermeasures,” ACM Computing Surveys (CSUR), vol. 55, no. 1, pp. 1–36, 2021.
[25] A. Aldahdooh, W. Hamidouche, S. A. Fezza, and O. Déforges, “Adversarial example detection for dnn models: A review and experimental comparison,” Artificial Intelligence Review, pp. 1–60, 2022.
[26] G. R. Machado, E. Silva, and R. R. Goldschmidt, “Adversarial machine learning in image classification: A survey toward the defender’s perspective,” ACM Computing Surveys (CSUR), vol. 55, no. 1, pp. 1–38, 2021.
[27] A. Sharma, Y. Bian, P. Munz, and A. Narayan, “Adversarial patch attacks and defences in vision-based tasks: A survey,” arXiv preprint arXiv:2206.08304, 2022.
[28] H. Ren, T. Huang, and H. Yan, “Adversarial examples: attacks and defenses in the physical world,” International Journal of Machine Learning and Cybernetics, vol. 12, no. 11, pp. 3325–3336, 2021.
[29] J. Wang, A. Liu, X. Bai, and X. Liu, “Universal adversarial patch attack for automatic checkout using perceptual and attentional bias,” IEEE Transactions on Image Processing, vol. 31, pp. 598–611, 2021.
[30] J. Wang, A. Liu, Z. Yin, S. Liu, S. Tang, and X. Liu, “Dual attention suppression attack: Generate adversarial camouflage in physical world,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 8565–8574.
[31] X. Zhu, X. Li, J. Li, Z. Wang, and X. Hu, “Fooling thermal infrared pedestrian detectors in real world using small bulbs,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 4, 2021, pp. 3616–3624.
[32] X. Zhu, Z. Hu, S. Huang, J. Li, and X. Hu, “Infrared invisible clothing: Hiding from infrared detectors at multiple angles in real world,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 13317–13326.
[33] D. Wang, T. Jiang, J. Sun, W. Zhou, Z. Gong, X. Zhang, W. Yao, and X. Chen, “Fca: Learning a 3d full-coverage vehicle camouflage for multi-view physical adversarial attack,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 2, 2022, pp. 2414–2422.
[34] Y. Duan, J. Chen, X. Zhou, J. Zou, Z. He, J. Zhang, W. Zhang, and Z. Pan, “Learning coated adversarial camouflages for object detectors,” in Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI 2022, Vienna, Austria, 23-29 July 2022, L. D. Raedt, Ed., 2022, pp. 891–897.
[35] N. Suryanto, Y. Kim, H. Kang, H. T. Larasati, Y. Yun, T.-T.-H. Le, H. Yang, S.-Y. Oh, and H. Kim, “Dta: Physical camouflage attacks using differentiable transformation network,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 15305–15314.
[36] S. Pavlitskaya, B.-M. Codău, and J. M. Zöllner, “Feasibility of inconspicuous gan-generated adversarial patches against object detection,” arXiv preprint arXiv:2207.07347, 2022.
[37] Y. Nemcovsky, M. Yaakoby, A. M. Bronstein, and C. Baskin, “Physical passive patch adversarial attacks on visual odometry systems,” arXiv preprint arXiv:2207.05729, 2022.
[38] B. G. Doan, M. Xue, S. Ma, E. Abbasnejad, and D. C. Ranasinghe, “Tnt attacks! universal naturalistic adversarial patches against deep neural network systems,” IEEE Transactions on Information Forensics and Security, 2022.
[39] X. Wei, Y. Guo, and J. Yu, “Adversarial sticker: A stealthy attack method in the physical world,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.
[40] X. Wei, B. Pu, J. Lu, and B. Wu, “Physically adversarial attacks and defenses in computer vision: A survey,” arXiv preprint arXiv:2211.01671, 2022.
[41] H. Wei, H. Tang, X. Jia, H. Yu, Z. Li, Z. Wang, S. Satoh, and Z. Wang, “Physical adversarial attack meets computer vision: A decade survey,” arXiv preprint arXiv:2209.15179, 2022.
[42] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[43] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 4700–4708.
[44] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 779–788.
[45] S.-T. Chen, C. Cornelius, J. Martin, and D. H. P. Chau, “Shapeshifter: Robust physical adversarial attack on faster r-cnn object detector,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 2018, pp. 52–68.
[46] K. Eykholt, I. Evtimov, E. Fernandes, B. Li, A. Rahmati, C. Xiao, A. Prakash, T. Kohno, and D. Song, “Robust physical-world attacks on deep learning visual classification,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 1625–1634.
[47] S. Thys, W. Van Ranst, and T. Goedemé, “Fooling automated surveillance cameras: adversarial patches to attack person detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, 2019, pp. 0–0.
[48] L. Huang, C. Gao, Y. Zhou, C. Xie, A. L. Yuille, C. Zou, and N. Liu, “Universal physical camouflage attacks on object detectors,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 720–729.
[49] A. Athalye, L. Engstrom, A. Ilyas, and K. Kwok, “Synthesizing robust adversarial examples,” in International conference on machine learning. PMLR, 2018, pp. 284–293.
[50] D. Song, K. Eykholt, I. Evtimov, E. Fernandes, B. Li, A. Rahmati, F. Tramer, A. Prakash, and T. Kohno, “Physical adversarial examples for object detectors,” in 12th USENIX workshop on offensive technologies (WOOT 18), 2018.
[51] M. Sharif, S. Bhagavatula, L. Bauer, and M. K. Reiter, “Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition,” in Proceedings of the 2016 ACM SIGSAC conference on computer and communications security, 2016, pp. 1528–1540.
[52] A. Mahendran and A. Vedaldi, “Understanding deep image representations by inverting them,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 5188–5196.
[53] S. T. Jan, J. Messou, Y.-C. Lin, J.-B. Huang, and G. Wang, “Connecting the digital and physical world: Improving the robustness of adversarial attacks,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, 2019, pp. 962–969.
[54] A. Liu, J. Wang, X. Liu, B. Cao, C. Zhang, and H. Yu, “Bias-based universal adversarial patch attack for automatic check-out,” in European conference on computer vision. Springer, 2020, pp. 395–410.
[55] A. Kurakin, I. J. Goodfellow, and S. Bengio, “Adversarial examples in the physical world,” in 5th International Conference on Learning Representations. OpenReview.net, 2017, pp. 24–26.
[56] Q. Guo, F. Juefei-Xu, X. Xie, L. Ma, J. Wang, B. Yu, W. Feng, and Y. Liu, “Watch out! motion is blurring the vision of your deep neural networks,” in Proceedings of the 34th International Conference on Neural Information Processing Systems, 2020, pp. 975–985.
[57] W. Feng, B. Wu, T. Zhang, Y. Zhang, and Y. Zhang, “Meta-attack: Class-agnostic and model-agnostic physical adversarial attack,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 7787–7796.
[58] B. Phan, F. Mannan, and F. Heide, “Adversarial imaging pipelines,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 16051–16061.
[59] T. B. Brown, D. Mané, A. Roy, M. Abadi, and J. Gilmer, “Adversarial patch,” Advances in Neural Information Processing Systems, 2017.
[60] S. Casper, M. Nadeau, D. Hadfield-Menell, and G. Kreiman, “Robust feature-level adversaries are interpretability tools,” in Advances in Neural Information Processing Systems, 2022.
[61] Y. Dong, S. Ruan, H. Su, C. Kang, X. Wei, and J. Zhu, “Viewfool: Evaluating the robustness of visual recognition to adversarial viewpoints,” in Advances in Neural Information Processing Systems, A. H. Oh, A. Agarwal, D. Belgrave, and K. Cho, Eds., 2022.
[62] N. Nichols and R. Jasper, “Projecting trouble: Light based adversarial attacks on deep learning classifiers,” Proceedings of the AAAI Fall Symposium on Adversary-Aware Learning Techniques and Trends in Cybersecurity, 2018.
[63] Y. Man, M. Li, and R. Gerdes, “Poster: Perceived adversarial examples,” in IEEE Symposium on Security and Privacy, 2019.
[64] A. Gnanasambandam, A. M. Sherman, and S. H. Chan, “Optical adversarial attack,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 92–101.
[65] A. Sayles, A. Hooda, M. Gupta, R. Chatterjee, and E. Fernandes, “Invisible perturbations: Physical adversarial examples exploiting the rolling shutter effect,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 14666–14675.
[66] R. Duan, X. Mao, A. K. Qin, Y. Chen, S. Ye, Y. He, and Y. Yang, “Adversarial laser beam: Effective physical-world attack to dnns in a blink,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 16062–16071.
[67] K. Kim, J. Kim, S. Song, J.-H. Choi, C. Joo, and J.-S. Lee, “Light lies: Optical adversarial attack,” arXiv preprint arXiv:2106.09908, 2021.
[68] B. Huang and H. Ling, “Spaa: Stealthy projector-based adversarial attacks on deep image classifiers,” in 2022 IEEE Conference on Virtual Reality and 3D User Interfaces (VR). IEEE, 2022, pp. 534–542.
[69] M. Sharif, S. Bhagavatula, L. Bauer, and M. K. Reiter, “A general framework for adversarial examples with objectives,” ACM Transactions on Privacy and Security (TOPS), vol. 22, no. 3, pp. 1–30, 2019.
[70] I. Singh, S. Momiyama, K. Kakizaki, and T. Araki, “On brightness agnostic adversarial examples against face recognition systems,” in 2021 International Conference of the Biometrics Special Interest Group (BIOSIG). IEEE, 2021, pp. 1–5.
[71] B. Zhang, B. Tondi, and M. Barni, “Adversarial examples for replay attacks against cnn-based face recognition with anti-spoofing capability,” Computer Vision and Image Understanding, vol. 197, p. 102988, 2020.
[72] M. Pautov, G. Melnikov, E. Kaziakhmedov, K. Kireev, and A. Petiushko, “On adversarial patches: real-world attack on arcface-100 face recognition system,” in 2019 International Multi-Conference on Engineering, Computer and Information Sciences (SIBIRCON). IEEE, 2019, pp. 0391–0396.
[73] S. Komkov and A. Petiushko, “Advhat: Real-world adversarial attack on arcface face id system,” in 2020 25th International Conference on Pattern Recognition (ICPR). IEEE, 2021, pp. 819–826.
[74] Z. Xiao, X. Gao, C. Fu, Y. Dong, W. Gao, X. Zhang, J. Zhou, and J. Zhu, “Improving transferability of adversarial patches on face recognition with generative models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 11845–11854.
[75] X. Wei, Y. Guo, J. Yu, and B. Zhang, “Simultaneously optimizing perturbations and positions for black-box adversarial patch attacks,” arXiv preprint arXiv:2212.12995, 2022.
[76] A. Zolfi, S. Avidan, Y. Elovici, and A. Shabtai, “Adversarial mask: Real-world adversarial attack against face recognition models,” 2022.
[77] D.-L. Nguyen, S. S. Arora, Y. Wu, and H. Yang, “Adversarial light projection attacks on face recognition systems: A feasibility study,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, 2020, pp. 814–815.
[78] C. Sitawarin, A. N. Bhagoji, A. Mosenia, P. Mittal, and M. Chiang, “Rogue signs: Deceiving traffic sign recognition with malicious ads and logos,” arXiv preprint arXiv:1801.02780, 2018.
[79] A. Liu, X. Liu, J. Fan, Y. Ma, A. Zhang, H. Xie, and D. Tao, “Perceptual-sensitive gan for generating adversarial patches,” in Proceedings of the AAAI conference on artificial intelligence, vol. 33, no. 01, 2019, pp. 1028–1035.
[80] R. Duan, X. Ma, Y. Wang, J. Bailey, A. K. Qin, and Y. Yang, “Adversarial camouflage: Hiding physical-world attacks with natural styles,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 1000–1008.
[81] Y. Zhong, X. Liu, D. Zhai, J. Jiang, and X. Ji, “Shadows can be dangerous: Stealthy and effective physical-world adversarial attack by natural phenomenon,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 15345–15354.
[82] Z. Wang, S. Zheng, M. Song, Q. Wang, A. Rahimpour, and H. Qi, “advpattern: physical-world attacks on deep person re-identification via adversarially transformable patterns,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 8341–8350.
[83] Z. Kong, J. Guo, A. Li, and C. Liu, “Physgan: Generating physical-world-resilient adversarial examples for autonomous driving,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 14254–14263.
[84] Y. Qian, D. Ma, B. Wang, J. Pan, J. Wang, Z. Gu, J. Chen, W. Zhou, and J. Lei, “Spot evasion attacks: Adversarial examples for license plate recognition systems with convolutional neural networks,” Computers & Security, vol. 95, p. 101826, 2020.
[85] Y. Zhao, H. Zhu, R. Liang, Q. Shen, S. Zhang, and K. Chen, “Seeing isn’t believing: Towards more robust adversarial attack against real world object detectors,” in Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, 2019, pp. 1989–2004.
[86] M. Lee and Z. Kolter, “On physical adversarial patches for object detection,” arXiv preprint arXiv:1906.11897, 2019.
[87] K. Yang, T. Tsai, H. Yu, T.-Y. Ho, and Y. Jin, “Beyond digital domain: Fooling deep learning based recognition system in physical world,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 01, 2020, pp. 1088–1095.
[88] A. Zolfi, M. Kravchik, Y. Elovici, and A. Shabtai, “The translucent patch: A physical and universal attack on object detectors,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 15232–15241.
[89] A. Shapira, R. Bitton, D. Avraham, A. Zolfi, Y. Elovici, and A. Shabtai, “Attacking object detector using a universal targeted label-switch patch,” arXiv preprint arXiv:2211.08859, 2022.
[90] S. Hoory, T. Shapira, A. Shabtai, and Y. Elovici, “Dynamic adversarial patch for evading object detection models,” arXiv preprint arXiv:2010.13070, 2020.
[91] D. Y. Yang, J. Xiong, X. Li, X. Yan, J. Raiti, Y. Wang, H. Wu, and Z. Zhong, “Building towards ‘invisible cloak’: Robust physical adversarial attack on yolo object detector,” in 2018 9th IEEE Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON). IEEE, 2018, pp. 368–374.
[92] K. Xu, G. Zhang, S. Liu, Q. Fan, M. Sun, H. Chen, P.-Y. Chen, Y. Wang, and X. Lin, “Adversarial t-shirt! evading person detectors in a physical world,” in European conference on computer vision. Springer, 2020, pp. 665–681.
[93] Z. Wu, S.-N. Lim, L. S. Davis, and T. Goldstein, “Making an invisibility cloak: Real world adversarial attacks on object detectors,” in European Conference on Computer Vision. Springer, 2020, pp. 1–17.
[94] J. Tan, N. Ji, H. Xie, and X. Xiang, “Legitimate adversarial patches: Evading human eyes and detection models in the physical world,” in Proceedings of the 29th ACM International Conference on Multimedia, 2021, pp. 5307–5315.
[95] Y.-C.-T. Hu, B.-H. Kung, D. S. Tan, J.-C. Chen, K.-L. Hua, and W.-H. Cheng, “Naturalistic physical adversarial patch for object detectors,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 7848–7857.
[96] Z. Hu, S. Huang, X. Zhu, F. Sun, B. Zhang, and X. Hu, “Adversarial texture for fooling person detectors in the physical world,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 13307–13316.
[97] G. Lovisotto, H. Turner, I. Sluganovic, M. Strohmeier, and I. Martinovic, “Slap: Improving physical adversarial examples with short-lived adversarial perturbations,” in 30th USENIX Security Symposium (USENIX Security 21), 2021, pp. 1865–1882.
[98] X. Zeng, C. Liu, Y.-S. Wang, W. Qiu, L. Xie, Y.-W. Tai, C.-K. Tang, and A. L. Yuille, “Adversarial attacks beyond the image space,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 4302–4311.
[99] Y. Zhang, P. H. Foroosh, and B. Gong, “Camou: Learning a vehicle camouflage for physical adversarial attack on object detections in the wild,” ICLR, 2019.
[100] T. Wu, X. Ning, W. Li, R. Huang, H. Yang, and Y. Wang, “Physical adversarial attack on vehicle detector in the carla simulator,” arXiv preprint arXiv:2007.16118, 2020.
[101] A. Du, B. Chen, T.-J. Chin, Y. W. Law, M. Sasdelli, R. Rajasegaran, and D. Campbell, “Physical adversarial attacks on an aerial imagery object detector,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022, pp. 1796–1806.
[102] F. Nesti, G. Rossolini, S. Nair, A. Biondi, and G. Buttazzo, “Evaluating the robustness of semantic segmentation for autonomous driving against real-world adversarial patch attacks,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022, pp. 2280–2289.
[103] Z. Cheng, J. Liang, H. Choi, G. Tao, Z. Cao, D. Liu, and X. Zhang, “Physical attack on monocular depth estimation with optimal adversarial patches,” in European Conference on Computer Vision. Springer, 2022, pp. 514–532.
[104] R. R. Wiyatno and A. Xu, “Physical adversarial textures that fool visual object tracking,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 4822–4831.
[105] L. Ding, Y. Wang, K. Yuan, M. Jiang, P. Wang, H. Huang, and Z. J. Wang, “Towards universal physical attacks on single object tracking,” in AAAI, 2021, pp. 1236–1245.
[106] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 1125–1134.
[107] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 2223–2232.
[108] M. Jaderberg, K. Simonyan, A. Zisserman et al., “Spatial transformer networks,” Advances in neural information processing systems, vol. 28, 2015.
[109] C. Xiao, B. Li, J.-Y. Zhu, W. He, M. Liu, and D. Song, “Generating adversarial examples with adversarial networks,” in Proceedings of the 27th International Joint Conference on Artificial Intelligence, 2018, pp. 3905–3911.
[110] Z.-A. Zhu, Y.-Z. Lu, and C.-K. Chiang, “Generating adversarial examples by makeup attacks on face recognition,” in 2019 IEEE International Conference on Image Processing (ICIP). IEEE, 2019, pp. 2516–2520.
[111] T. Malzbender, D. Gelb, and H. Wolters, “Polynomial texture maps,” in Proceedings of the 28th annual conference on Computer graphics and interactive techniques, 2001, pp. 519–528.
[112] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-cam: Visual explanations from deep networks via gradient-based localization,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 618–626.
[113] L. A. Gatys, A. S. Ecker, and M. Bethge, “Image style transfer using convolutional neural networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 2414–2423.
[114] J. Redmon and A. Farhadi, “Yolov3: An incremental improvement,” arXiv preprint arXiv:1804.02767, 2018.
[115] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, “Ssd: Single shot multibox detector,” in European conference on computer vision. Springer, 2016, pp. 21–37.
[116] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 2980–2988.
[117] S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time object detection with region proposal networks,” Advances in neural information processing systems, vol. 28, pp. 91–99, 2015.
[118] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask r-cnn,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 2961–2969.
[119] X. Liu, H. Yang, Z. Liu, L. Song, H. Li, and Y. Chen, “Dpatch: An adversarial patch attack on object detectors,” arXiv preprint arXiv:1806.02299, 2018.
[120] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 2117–2125.
[121] X. Yang, F. Wei, H. Zhang, and J. Zhu, “Design and interpretation of universal adversarial patches in face detection,” in European Conference on Computer Vision. Springer, 2020, pp. 174–191.
[122] C. Xiao, D. Yang, B. Li, J. Deng, and M. Liu, “Meshadv: Adversarial meshes for visual recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 6898–6907.
[123] A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, “Carla: An open urban driving simulator,” in Conference on robot learning. PMLR, 2017, pp. 1–16.
[124] Z. Li, C. Peng, G. Yu, X. Zhang, Y. Deng, and J. Sun, “Light-head r-cnn: In defense of two-stage object detector,” arXiv preprint arXiv:1711.07264, 2017.
[125] H. Kato, Y. Ushiku, and T. Harada, “Neural 3d mesh renderer,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 3907–3916.
[126] M. Tan, R. Pang, and Q. V. Le, “Efficientdet: Scalable and efficient object detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 10781–10790.
[127] J. Byun, S. Cho, M.-J. Kwon, H.-S. Kim, and C. Kim, “Improving the transferability of targeted adversarial examples through object-based diverse input,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 15244–15253.
[128] M. A. Alcorn, Q. Li, Z. Gong, C. Wang, L. Mai, W.-S. Ku, and A. Nguyen, “Strike (with) a pose: Neural networks are easily fooled by strange poses of familiar objects,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 4845–4854.
[129] R. den Hollander, A. Adhikari, I. Tolios, M. van Bekkum, A. Bal, S. Hendriks, M. Kruithof, D. Gross, N. Jansen, G. Perez et al., “Adversarial patch camouflage against aerial detection,” in Artificial Intelligence and Machine Learning in Defense Applications II, vol. 11543. SPIE, 2020, pp. 77–86.
[130] L. Chen, G. Zhu, Q. Li, and H. Li, “Adversarial example in remote sensing image recognition,” arXiv preprint arXiv:1910.13222, 2019.
[131] L. Chen, H. Li, G. Zhu, Q. Li, J. Zhu, H. Huang, J. Peng, and L. Zhao, “Attack selectivity of adversarial examples in remote sensing image scene classification,” IEEE Access, vol. 8, pp. 137477–137489, 2020.
[132] Y. Xu, B. Du, and L. Zhang, “Assessing the threat of adversarial examples on deep neural networks for remote sensing scene classification: Attacks and defenses,” IEEE Transactions on Geoscience and Remote Sensing, vol. 59, no. 2, pp. 1604–1617, 2020.
[133] Y. Wang, K. Wang, Z. Zhu, and F.-Y. Wang, “Adversarial attacks on faster r-cnn object detector,” Neurocomputing, vol. 382, pp. 87–95, 2020.
[134] K.-H. Chow, L. Liu, M. Loper, J. Bae, M. E. Gursoy, S. Truex, W. Wei, and Y. Wu, “Adversarial objectness gradient attacks in real-time object detection systems,” in 2020 Second IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications (TPS-ISA). IEEE, 2020, pp. 263–272.
[135] D. Wang, C. Li, S. Wen, Q.-L. Han, S. Nepal, X. Zhang, and Y. Xiang, “Daedalus: Breaking nonmaximum suppression in object detection via adversarial examples,” IEEE Transactions on Cybernetics, 2021.
[136] Q. Liao, X. Wang, B. Kong, S. Lyu, Y. Yin, Q. Song, and X. Wu, “Fast local attack: Generating local adversarial examples for object detectors,” in 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, 2020, pp. 1–8.
[137] J. Bao, “Sparse adversarial attack to object detection,” arXiv preprint arXiv:2012.13692, 2020.
[138] S. Wu, T. Dai, and S.-T. Xia, “Dpattack: Diffused patch attacks against universal object detection,” arXiv preprint arXiv:2010.11679, 2020.
[139] H. Huang, Y. Wang, Z. Chen, Z. Tang, W. Zhang, and K.-K. Ma, “Rpattack: Refined patch attack on general object detectors,” in 2021 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2021, pp. 1–6.
[140] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami, “The limitations of deep learning in adversarial settings,” in 2016 IEEE European symposium on security and privacy (EuroS&P). IEEE, 2016, pp. 372–387.
[141] F. Croce and M. Hein, “Sparse and imperceivable adversarial attacks,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 4724–4732.
[142] A. Modas, S.-M. Moosavi-Dezfooli, and P. Frossard, “Sparsefool: a few pixels make a big difference,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 9087–9096.
[143] X. Dong, D. Chen, J. Bao, C. Qin, L. Yuan, W. Zhang, N. Yu, and D. Chen, “Greedyfool: Distortion-aware sparse adversarial attack,” Advances in Neural Information Processing Systems, vol. 33, pp. 11226–11236, 2020.
[144] Z. He, W. Wang, J. Dong, and T. Tan, “Transferable sparse adversarial attack,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 14963–14972.
[145] Y. Zhao, H. Yan, and X. Wei, “Object hider: Adversarial patch attack against object detectors,” arXiv preprint arXiv:2010.14974, 2020.
[146] D. Li, J. Zhang, and K. Huang, “Universal adversarial perturbations against object detection,” Pattern Recognition, vol. 110, p. 107584, 2021.
[147] S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard, “Universal adversarial perturbations,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 1765–1773.
[148] Z. Zhu, H. Su, C. Liu, W. Xiang, and S. Zheng, “You cannot easily catch me: A low-detectable adversarial patch for object detectors,” arXiv preprint arXiv:2109.15177, 2021.
[149] Y. Cao, C. Xiao, B. Cyr, Y. Zhou, W. Park, S. Rampazzi, Q. A. Chen, K. Fu, and Z. M. Mao, “Adversarial sensor attack on lidar-based perception in autonomous driving,” in Proceedings of the 2019 ACM SIGSAC conference on computer and communications security, 2019, pp. 2267–2281.
[150] T. Sato, J. Shen, N. Wang, Y. Jia, X. Lin, and Q. A. Chen, “Dirty road can attack: Security of deep learning based automated lane centering under Physical-World attack,” in 30th USENIX Security Symposium (USENIX Security 21), 2021, pp. 3309–3326.
[151] J. Tu, H. Li, X. Yan, M. Ren, Y. Chen, M. Liang, E. Bitar, E. Yumer, and R. Urtasun, “Exploring adversarial robustness of multi-sensor perception systems in self driving,” arXiv preprint arXiv:2101.06784, 2021.
[152] J. Hendrik Metzen, M. Chaithanya Kumar, T. Brox, and V. Fischer, “Universal adversarial perturbations against semantic image segmentation,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 2755–2764.
[153] K. K. Nakka and M. Salzmann, “Indirect local attacks for context-aware semantic segmentation networks,” in European Conference on Computer Vision. Springer, 2020, pp. 611–628.
[154] A. Arnab, O. Miksik, and P. H. Torr, “On the robustness of semantic segmentation models to adversarial attacks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 888–897.
[155] M. Alzantot, Y. Sharma, S. Chakraborty, H. Zhang, C.-J. Hsieh, and M. B. Srivastava, “Genattack: Practical black-box attacks with gradient-free optimization,” in Proceedings of the Genetic and Evolutionary Computation Conference, 2019, pp. 1111–1119.
[156] M. Andriushchenko, F. Croce, N. Flammarion, and M. Hein, “Square attack: a query-efficient black-box adversarial attack via random search,” in European Conference on Computer Vision. Springer, 2020, pp. 484–501.
[157] C. Li, W. Yao, H. Wang, and T. Jiang, “Adaptive momentum variance for attention-guided sparse adversarial attacks,” Pattern Recognition, vol. 133, p. 108979, 2023.
[158] A. Ghosh, S. S. Mullick, S. Datta, S. Das, A. K. Das, and R. Mallipeddi, “A black-box adversarial attack strategy with adjustable sparsity and generalizability for deep image classifiers,” Pattern Recognition, vol. 122, p. 108279, 2022.
[159] A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, “CARLA: An open urban driving simulator,” in Proceedings of the 1st Annual Conference on Robot Learning, 2017, pp. 1–16.
[160] S. Shah, D. Dey, C. Lovett, and A. Kapoor, “Airsim: High-fidelity visual and physical simulation for autonomous vehicles,” in Field and Service Robotics, 2017. [Online]. Available: https://arxiv.org/abs/1705.05065
[161] H. Salman, A. Ilyas, L. Engstrom, S. Vemprala, A. Madry, and A. Kapoor, “Unadversarial examples: Designing objects for robust vision,” Advances in Neural Information Processing Systems, vol. 34, pp. 15270–15284, 2021.
[162] J. Wang, Z. Yin, P. Hu, A. Liu, R. Tao, H. Qin, X. Liu, and D. Tao, “Defensive patches for robust recognition in the physical world,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 2456–2465.