A Survey On Physical Adversarial Attack in CV A SURVEY

JOURNAL OF LATEX CLASS FILES, VOL. 18, NO.
9, SEPTEMBER 2022 1
A Survey on Physical Adversarial Attack in

Computer Vision
Donghua Wang1,2 , Wen Yao2∗ , Tingsong Jiang2∗ , Guijian Tang2,3 , Xiaoqian Chen2
1College of Computer Science and Technology, Zhejiang University, Hangzhou, China
2 Defense Innovation Institute, Chinese Academy of Military Science, Beijing, China
3 College of Aerospace Science and Engineering, National University of Defense Technology, Changsha, China
wangdonghua@zju.edu.cn, wendy0782@126.com, tingsong@pku.edu.cn, tangbanllniu@163.com,
chenxiaoqian@nudt.edu.cn
arXiv:2209.14262v2 [cs.CV] 4 Jan 2023
Abstract—In the past decade, deep learning has dramatically

changed the traditional hand-craft feature manner with strong
feature learning capability, promoting the tremendous improve-
ment of conventional tasks. However, deep neural networks
(DNNs) have been demonstrated to be vulnerable to adversarial
examples crafted by small noise, which is imperceptible to human
observers but can make DNNs misbehave. Existing adversarial
attacks can be divided into digital and physical adversarial
attacks. The former is designed to pursue strong attack per-
formance in lab environments while hardly remaining effective
when applied to the physical world. In contrast, the latter focus on
developing physical deployable attacks, which are more robust
in complex physical environmental conditions (e.g., brightness,
occlusion, etc.). Recently, with the increasing deployment of the
DNN-based system in the real world, enhancing the robustness
of these systems is an emergency, while exploring physical
adversarial attacks exhaustly is the precondition. To this end, this Fig. 1. Count of physical attack literature published yearly.
paper reviews the development of physical adversarial attacks
against DNN-based computer vision tasks (i.e., image recogni-
tion and object detection tasks), which can provide beneficial elaborately designed glasses frames can bypass the facial
information for developing stronger physical adversarial attacks. recognition system; the automatic driving car is susceptible to
For completeness, we will briefly introduce the works that do
not involve physical attacks but are closely related to them. environmental changes (e.g., shadow or brightness), resulting
Specifically, we first proposed a taxonomy to summarize the in out-of-control or misdetection. That poses a potential risk
current physical adversarial attacks. Then we briefly discuss for deploying DNN-based systems in the real world, especially
the existing physical attacks and focus on the technique for for security-sensitive applications.
improving the robustness of physical attacks under complex Since Szegedy et al. [9] first found that the DNN-based
physical environmental conditions. Finally, we discuss the issues
of the current physical adversarial attacks to be solved and give image classifier is susceptible to malicious devised noise, a
promising directions. line of works [10]–[16] have been proposed to explore the
adversarial examples and the corresponding countermeasure
Index Terms—Adversarial examples, Physical adversarial at-
tacks, Security of neural network techniques in different research areas, especially in computer
vision. Accordingly, many surveys [17]–[22] were proposed to
the taxonomy or summarize the development of the adversarial
I. I NTRODUCTION examples. Serban et al. [21] made taxonomy exhaustively for
attack and systemically summarized the existing attack meth-
D EEP neural networks (DNNs) achieves considerable
success in a wide range of research filed, including
computer vision [1]–[4], natural language processing [5], [6],
ods from machine learning to DNNs. [20], [22]–[25] reviewed
the development of adversarial attack and defense methods
and speech recognition [7], [8]. Applications with the core against DNNs at different times. Some survey only focus on
of these advanced techniques have been widely deployed the specific task, such as image classification [26], adversarial
in the real world, such as facial payment, medical image patch [27]. However, these surveys mainly provide an overall
diagnosis, automatic driving, and so on. However, DNNs have view of the adversarial attack and defense in the specific
exposed their security risks to adversarial examples crafted research area or a roughly overall view. Moreover, with the
by small noises that are invisible to humane observers but increasingly maturing of DNNs in commercial deployment,
can mislead the DNNs to output wrong results. For instance, exploring the physical attack is more urgent for enhancing
the robustness of DNN-based systems. Although some works
*Wen Yao and Tingsong Jiang are the corresponding authors. have reviewed the development of physical attacks [22], [28],
JOURNAL OF LATEX CLASS FILES, VOL. 18, NO. 9, SEPTEMBER 2022 2
TERMINOLOGY
they are out-of-date since plenty of novel physical attacks x Clean image.
have emerged [29]–[39] in the past two years. Concurrent y Ground truth of the image.
to our work, although [40], [41] review physical attacks, x̂ Clean image in the physical world.
xadv Adversarial example.
they only review some of the current physical attacks, i.e., x̂adv Adversarial example in the physical world.
43 and 46 publications, respectively. In contrast, we review δ Adversarial perturbation.
69 publications on physical attacks. Figure 1 illustrates the Maximum allowable magnitude of perturbation.
f Neural network.
number of physical attack literature published yearly since ycls Ground truth label for the classification task.
2016. Therefore, it is necessary to analyze the newly proposed yobj Ground truth label for the object detection task.
method for better tracking the latest research direction. Jf (·) Loss function for neural network f.
∇Jf (·) The gradient of the f with respect to the input x.
In this paper, we comprehensively review the existing || · ||p Lp norm distance.
physical adversarial attacks in the computer vision task. Given d(·, ·) Distance metrics.
T Transformation distribution.
the statistical result of the current literature, we mainly survey t A specific transformation.
the two most prevalent tasks: image recognition and object
detection. Note that, object tracking and semantic segmenta-
tion tasks are similar to object detection. Thus we include II. F UNDAMENTAL CONCEPT
the physical adversarial attack against object tracking and
semantic segmentation in object detection. Specifically, we In this section, we briefly introduce the foundation concept
first propose a taxonomy to summarize the current physical of deep learning and adversarial examples.
adversarial attacks from the formal of adversarial perturbation,
the way of physical attack, the settings of physical attack, and
so on. Then, we briefly analyze different physical adversarial A. Deep learning
attacks in detail. After reviewing attacks for each field, we
summarize the most frequently used technique for improving Deep learning, a data-driven technique, is known for its
the robustness of physical attacks. Finally, we discuss the powerful capability to understand data patterns and features
issues of the current physical adversarial attacks to be solved from enormous data, where the learned representations are
and the promising direction. We hope this survey will pro- beneficial to various regressions or classification tasks. In gen-
vide comprehensive retrospect and beneficial information on eral, the deep learning technique consists of data preprocess-
physical adversarial attacks for researchers. ing, architecture design, training strategy, loss function, and
performance metrics. The core component of deep learning
To cover the existing physical attacks as much as possible, is automatic feature learning driven by deep neural networks
we searched the keywords “physical attack” or “physical (DNNs), in which the most representative characteristic is that
adversarial examples” from Google Scholar to obtain the it can extract useful features from considerable data in an end-
latest paper. In addition, conference proceedings in CVPR, to-end manner.
ICCV, ECCV, AAAI, IJCAI, ICML, ICLR, and NeurIPS were
The architecture of DNN is determined by the different
searched based on the keyword toward titles of the papers. We
connections and numbers between the neurons and the layers.
manually checked the references in all selected papers to seek
In the past decade, convolutional neural networks (CNNs)
the relevant literature to avoid leakage. In addition, we exclude
were the most commonly used architecture in computer vision
papers that have not relevant to physical adversarial attacks.
tasks and have been maturely deployed in commercial appli-
Finally, we collected 69 physical adversarial attack papers in
cations. CNN consists of multiple hidden layers with different
computer vision.
connections, where each layer contains several convolution
In summary, our contributions are listed as follows. kernels for extracting different feature representations from
input data. The representative architecture of CNN in computer
• We make a comprehensive survey on physical adversar-
vision is ResNet [2], VGG [42], and DenseNet [43], which
ial attacks in computer vision tasks to provide overall
are employed in image recognition tasks and object detection
knowledge about this research field.
tasks [2], [44]. Recently, inspired by the incredible success
• We propose a taxonomy to summarize the existing phys-
of transformer architecture in the NLP task, the transformer
ical adversarial attacks according to the characteristic of
has gradually been applied to computer vision tasks, called
different attacks. We briefly discuss different physical
vision transformers. Vision transformer consists of an encoder
attacks and outline the most frequently used techniques
and decoder with several transformer blocks of the same ar-
for improving the robustness of physical attacks.
chitecture. The encoder produces the embedding of the inputs,
• We discuss the challenging and research directions for
while the decoder takes all embedding as input to generate the
physical adversarial attacks based on the following is-
output with their incorporated contextual information. Despite
sues: input transformation augmentation; attack algorithm
the architecture discrepancy between different networks, the
design consisting of developing physical attacks under
neural network has a uniform representation formal, i.e., f (x),
black-box settings, transferability, attack multitask, and
and the concrete representation of f (x) depends on the given
so on; evaluation contains physically attack benchmark
tasks. However, despite DNNs achieving great success, the
evaluation datasets construction and uniform physical test
underlying mechanism of DNNs is unclear and exposing the
criterion.
potential security risks as it is overly dependent on the data.
B. Adversarial examples A. Adversary’s knowledge

The concept of the adversarial examples was first proposed 1) White-box attacks: White-box attacks assume the ad-
by Szegedy et al. [9], which refers to a kind of sample versary has full knowledge about the target victim network,
containing carefully crafted noise that is invisible to human including architecture, parameters, and training data. In white-
observers but can mislead the DNN to make the wrong output. box settings, the adversary is permitted to exploit the gradient
Specifically, given an image x and a well-trained network f , with respect to input images of the target victim model
the adversarial examples can be represented as the following to construct the adversarial examples. Therefore, white-box
optimization problem attacks are the most efficient adversarial attacks at present.
2) Black-box attacks: Black-box attacks assume the adver-
sary has no knowledge about the target victim network but
min δ allows the adversarial to query the target model, which makes
s.t.f (x + δ) 6= f (x) (1) it more suitable in practice. Therefore, black-box attacks are
||δ||p ≤ , more challenging than white-box attacks. Nonetheless, there
are some algorithms to perform black-box attacks, such as
where the || · ||p is the Lp -norm, is the maximum magnitude genetic algorithm, evolution evolution algorithm, and differ-
of the δ under the constrain of Lp -norm. L0 , L2 and L∞ are ential algorithm, or exploit the transferability of adversarial
the three commonly used Lp metrics. L0 counts the number examples.
of the non-zero pixels in the adversarial perturbation δ, L2
measures the Euclidean distance between the δ and zero, and B. Adversarial specificity
L∞ measures the maximum value of δ. 1) Non-targeted attack: Non-targeted attacks mislead the
Adversarial examples can be used to perform attacks on the target victim model to output any class except the original.
digital world and the physical world. In digital world attacks, Intuitively, non-target attacks tend to search the adversarial
the adversary aims to explore corner cases that can fool the example along the direction of the nearest decision boundary
DNNs, where the optimization and evaluation settings are ideal to the original sample.
and remain unchanged. By contrast, in physical world attacks, 2) Targeted attacks: Targeted attacks refer to misleading
there is a huge gap between the optimization and evaluation of the target victim model to output a pre-specific class. Although
adversarial examples. The adversarial perturbation is usually object detection has multiple outputs (i.e., bounding boxes,
generated in the digital world in advance and then required to classes, and the confidence score of the detected object), it
be made out (e.g., printed out) at the attack stage. Therefore, can be uniformly represented as the class in targeted attacks
the adversarial perturbation might encounter the following for consistency with classification tasks. Targeted attacks are
loss at the attack stage: on the first hand, loss caused by hard to realize when the distance between the original class
the device noise and color discrepancy between digital and and the target class in the high-dimensional decision boundary
physical caused by the printer. On the other hand, content is far. Therefore, targeted attacks are more challenging than
distortion originated in complex environmental conditions, non-targeted.
such as lighting, shadow, occlusion, brightness changing, ro-
tation, deformation, and so on. Note that, despite some works
attempting to attack the online employed DNN-based systems, C. Computer vision tasks
we do not discuss these works as we are focused on the 1) Image recognition task: Image recognition is a conven-
attacks that can apply to the real world. Based on the above tional task in computer vision designed to classify the input
discussion, developing effective physical adversarial attacks image into a specific class. For a well-trained DNN-based
are more challenging than digital attacks. image classifier f , given an image x, one can get the output
f (x) = ycls , where ycls = y, ycls and y is the predicted class
and ground truth of the input image, respectively, which are
III. TAXONOMY OF PHYSICAL ADVERSARIAL EXAMPLES usually denoted as a scalar.
2) Object detection task: Object detection is another tradi-
In this section, we propose a taxonomy scheme to sys-
tional task in computer vision, designed to locate the object
tematically categorize the existing physical adversarial attack
in the image and simultaneously give the class of the located
methods in computer vision tasks. Specifically, we first intro-
object. For a well-trained DNN-based object detector f , given
duce the basic classification of adversarial examples. Then, we
an image x, one can get the output f (x) = yobj , where yobj
divide all physical adversarial attack methods in the computer
contain three outputs including the bounding box, the confi-
vision tasks into image classification and object detection
dence score, and category, where the bounding box contains
according to the tasks of victim models. We then categorize
two coordinates consists of four scalars, latter two terms are
the existing physical attack methods in terms of the form
scalar value.
of perturbation, the way of attacks, representative techniques
for improving the robustness of physical attacks, and testing
environments according to the implementation details of dif- D. Form of perturbation
ferent attack methods. Finally, we describe three widely used Unlike the attack designed to add perturbation to the whole
techniques for improving physical attack strengths. image, physical adversarial attacks are devised to modify the
object region to mislead the target model, called instance-level TABLE I

attacks. The reason is that it is impossible to modify the back- T RANSFORMATION DISTRIBUTION REPORTED IN [45]–[48].
ground environment (e.g., sky) in the real world. In contrast, Transform Parameters Remark
instance-level attacks concentrate on crafting the perturbation Affine µ = 0, σ = 0.1 Perspective/Deformed Transforms
for the targeted object. Therefore, instance-level attacks are Rotation −15◦ ∼ 15◦ Camera Simulation
Contrast 0.5 ∼ 1.5 Camera Parameters
suitable for performing physical attacks. According to the Scale 0.25 ∼ 1.25 Distance/Resize
form used to modify the target object, the perturbation can be Brightness −0.25 ∼ 0.25 Illumination
categorized as the adversarial patch, adversarial camouflage Translation −0.04 ∼ 0.04 Pattern Location
Cropping −0.7 ∼ 1.0 Photograph/Occlude Simulation
texture, and optical attacks. Random Noise −0.15 ∼ 0.15 Noise
1) Adversarial patch: The adversarial patch is an image
patch that can be stuck on the target object’s surface. Thus, ad-
versarial patch attacks have the advantage of easily performing
physical attacks as the adversary only requires printing out the
patch and then sticking/hanging it on the specific target object.
In optimization, the adversarial patch is pasted on the target
(a) Digital Image
object, which is consistent with physical deployable. However,
inconsistencies exist when the target object is not a plane, as
it simply pastes on the plane target in the 2D image during
optimization. Additionally, the adversarial patch merely works
in the specific view. The above issues curb the usefulness of
(b) Printed Image
adversarial patches in the real world.
2) Adversarial camouflage texture: Adversarial camouflage Fig. 2. Visualization discrepancy between the digital image (a) and its printed
images (b). [50]
texture attacks are designed to modify the appearance of the
3D object by the optimized adversarial texture, which can be
wrapped/painted over the target object’s surface, such as the contrast, the adversary implements non-invasion attacks by
pattern of clothes or the scrawl/painting of the vehicle. Thus, projecting a shadow or emitting a laser toward the target
that perturbation is suitable for non-plane objects, making it object, which is more stealthy and dangerous. Optical attacks
remain consistent well with the deployment in physical attacks. fall into this type.
There are two technology roadmaps to optimize the texture.
One is to optimize a 2D image as the texture of the target
object; The other is to optimize a 3D texture directly with the F. Techniques for enhancing the robustness of physical attacks
help of a physical renderer. 1) Expectation Over Transformation (EOT): EOT is a gen-
3) Optical attacks: Unlike adversarial patch or camouflage eral framework for improving the adversarial robustness of
texture attacks, optical attacks are designed to modulate the physical attack on a given transformation distribution T [49].
light or laser to perform physical attacks. In physical attacks, Essentially, EOT is a data augmentation technique for adver-
the adversary emits the light/laser by the lighting device (e.g., sarial attacks, which takes potential transformation in the real
projector and laser emitter) toward the target object to modify world into account during the optimization, resulting in better
its appearance for attacking the target model. Thus, optical robustness. Table I provides the widely used transformations
attacks are controllable and well stealthy. However, optical reported in [45]–[48]. Specifically, rather than taking the norm
0
attacks are susceptible to environmental light, which curbs of x − x to constrain the solution space, given a distance
their utilization in practice. function d(·, ·), EOT instead aims to constrain the expected
effective distance between the adversarial and original inputs
under a transformation distribution, which expressed as
E. Way of physical adversarial attack
0
According to whether the target object requires to be δ = Et∼T [d(t(x ), t(x))]. (2)
modified, the implementation way of physical attacks can be
categorized into invasion attacks and non-invasion attacks. Note that, this survey categorizes the methods that used the
1) Invasion attacks: Invasion attacks mean that the adver- transformation as EOT for simplicity.
sarial perturbation would modify the object itself. For instance, 2) Non-Printability Score (NPS): One necessary way to
in attack facial recognition, the adversary wears an adversarial perform physical attacks is to print the adversarial perturba-
hat and glasses; In attack pedestrian detection, the adversary tion. Thus, the perturbation will inevitably have some distor-
wears clothes or holds an adversarial patch plane; In attack tion caused by the printer, resulting in attacks failing in some
vehicle detection, the adversary modifies the appearance of cases. For instance, Figure 2 illustrates the digital color and
the vehicle by pasting the adversarial patch or painting the its printed result. To address this issue, non-printability score
adversarial camouflage over the surface of the object to be (NPS) loss [51] is devised as a metric to measure the color
attacked. distance between optimized adversarial perturbation and the
2) Non-invasion attacks: Non-invasion attacks do not re- common printer. In general, the NPS loss can be represented
quire the adversary to modify the target object manually. In as the following loss
1) Stationary physical test: The stationary physical ad-

Lnps =
X
min |p(i,j) − cprint |, (3) versarial test experiment is performed in a static physical
p(i,j) ∈xadv
cprint ∈C environment, which means the adversary object and the sensor
device maintain the static at the same time. For example, to
where the p(i,j) is a pixel at coordinate (i, j) in the adversarial attack the automatic checkout system, the adversarial sticks the
example and cprint is a color in a set of printable colors C. adversarial patch to the surface of the commodity and place
Under the constraints of NPS loss, the optimized adversarial it under the sensor input of checkout systems [29], [54].
perturbation will gradually close to the predefined printable 2) Dynamic physical test: The dynamic physical test is
colors. performed in a dynamic environment, which makes it more
3) Total variation norm (TV loss) : Natural images have challenging. The reason is that more factors will impact
representative characteristics of smooth and consistent patches, perturbation as the changing environment. Thus, the dynamic
where color is only gradually within patches [52]. Hence, physical test includes three different formals 1) the adversarial
finding smooth and consistent perturbations enhances the object is fixed while the sensor device is moving (e.g., drone
plausibility of physical attacks. In addition, due to sampling surveillance); 2) the adversarial object moving while the sensor
noise, extreme differences between adjacent pixels in the device is fixed (e.g., video surveillance); 3) both the adversarial
perturbation are unlikely to be accurately captured by cameras. object and the sensor device are moving simultaneously (e.g.,
Consequently, perturbations that are non-smooth may not automatic driving).
be physically realizable [51]. To address the above issues, According to devised taxonomy scheme, we summarize the
total variation (TV) [52] loss is presented to maintain the existing physical adversarial examples of image recognition
smoothness of perturbation. For a perturbation δ, TV loss is and object detection task in Table II and Table III separately,
defined as and we provide the follow-up update in https://github.com/
winterwindwang/Physical-Adversarial-Attacks-Survey. The
organization of the following paper is that we review the
1
physical adversarial attack against the image recognition
X
T V (δ) = ((δi,j − δi+1,j )2 + (δi,j − δi,j+1 )2 ) 2 , (4)
i,j tasks in Section IV and object detection task in Section V.
In Section VI, we discuss challenges to be solved. Then we
where δi,j is a pixel in δ at coordinates (i, j). T V (δ) is provide some promising research directions for constructing
low when the values of adjacent pixels are close to each better physical adversarial examples. Finally, we make a
other (i.e., the perturbation is smooth) and vice versa [51]. conclusion in Section VII.
Therefore, under the constraint of T V (δ) loss, the perturbation
is gradually smooth and suitable for physical realizability.
IV. P HYSICAL A DVERSARIAL ATTACK ON I MAGE
4) Digital-to-Physical (D2P): In some cases, EOT cannot
R ECOGNITION TASK
cover all physical world transformations, such as the discrep-
ancy between the digital image and the corresponding captured Advanced deep learning techniques have been increasingly
physical image caused by environmental factors. Thus, some deployed in daily life. For instance, facial recognition in au-
researchers [53] proposed to utilize the digital image and tomatic payment, image recognition techniques are employed
the corresponding physical image pair (namely, D2P) to train in auto checkout machines in the shopping center, and road
the adversarial perturbation, reducing such discrepancy. D2P sign recognition in the computer-aid drive of the automated
has been shown the effectiveness in improving the physical vehicle. Therefore, a line of literature is proposed to investigate
robustness of adversarial perturbation. Therefore, D2P can the security of these systems. In this section, we will discuss
be regarded as an effective technique to boost the physical the existing physical attacks on image recognition tasks from
robustness of adversarial attacks. general image recognition, facial recognition, and road sign
recognition.
G. Physical Testing environment

A. General image recognition
The common process of physical adversarial attacks is de-
scribed as follows: first, optimize the adversarial perturbation The early investigation of physical adversarial attacks
in the digital world. Then, prepare the adversarial perturbation against image recognition mainly focuses on verifying the
and paste/paint it over the target object, e.g., printing out the possibility of physical attacks by printing the adversarial
adversarial patch with the color printer and then hanging it image. Later, more realistic settings are explored, such as the
on the object. Finally, place the adversarial perturbed object utilization of adversarial patches [59] or adversarial viewpoints
toward the deployed sensor device (e.g., RGB camera sensor) [61].
to perform attacks. Figure 3 provides an overview of the 1) Iterative Fast Gradient Sign Method (I-FGSM): Unlike
whole process of physical attacks. According to the relation exploring the adversarial examples in the digital world, Ku-
between the adversarial object and the sensor device, the rakin et al. [55] investigated the possibility of adversarial
physical testing environment can be divided into stationary examples in the physical world. Specifically, the author op-
and dynamic. timized the adversarial examples by iterative updates along
Optimization
Robustness Transformation Patch applier Victim model
Loss
Brightness
function
Rotate Resize
Adversarial /contrast
perturbation
Print out Stage 1: Optimize adversarial perturbation in digital space
Picture Physical test set
Stick patch
Victim model
Patch paper
Stage 2:Test the adversarial perturbation in physical space
Fig. 3. Overview of the physical adversarial attack process.
the direction of the gradient ascent multiple times in small 3) D2P: Take further steps than [55], Jan et al. [53]
steps. The attack algorithm can be expressed as follows. analyzed the discrepancy between the digital image and the
physical adversarial image, then proposed a method to model-
x0adv = x,
(5) ing such difference. Specifically, the author first collected enor-
xN +1
N N
adv = Clipx, xadv + αsign(∇x J(xadv , y)) mous digital-physical image pairs by printing digital images
where α controls the iterative step, is the maximum al- and retaking them. Then, the author treated the transformation
lowable modification over the image, Clipx, (·) is a function between the digital and physical worlds as the image-to-
to confine the discrepancy between adversarial examples and image translation and adopted a Pix2Pix model [106] or
clean image in . In physical attacks, the author printed a CycleGAN [107] to learn the translation mapping. Addi-
adversarial examples and used a phone to capture the photo tionally, the EOT is adopted to boost the robustness of the
of adversarial examples. Experiment results suggested that the adversarial perturbation on the simulated physical image (i.e.,
printed adversarial examples can fool the image recognition the generator’s output). In physical attacks, the author printed
model. out the adversarial examples and retook them every 15 degrees
2) EOT: To construct robust adversarial examples for phys- in the range of [-60, 60]. Experiment results suggest that their
ical attacks, Athalye et al. [49] proposed a general frame- attack outperforms the EOT and remains valid under different
work to synthesize robust adversarial examples over a set of visual angles. However, their attack is the full pixel attack,
transformation distributions, which called Expectation Over which is impractical in practice.
Transformation (EOT). Specifically, they constructed the the 4) ABBA: Guo [56] proposed a novel motion attack, which
following optimization problem exploits the blurring effect caused by object motion to fool
the DNNs. Specifically, the author optimized the specific
argmin Et∼T [log P r(yt |t(x0 )) transform parameters for the object and background to create
x0 (6) the blurring effect. The background is extracted by a binary
−λ||LAB(t(x0 )) − LAB(t(x))||2 ], mask generated with a saliency detection model. Moreover, the
where the P r(yt |t(x0 )) represents the probability of t(x0 ) be author used a spatial transformer network [108] to improve
classified to yt , which is used to guarantee the adversarial; the robustness of adversarial examples. In physical attacks,
LAB(·) is the color space convert function between RGB the author captured the object image in the real world and
space to LAB space, which used to ensure the perceptual of generated the adversarial blur examples by the devised method.
adversarial perturbation. Note that, they develop an algorithm Then, the author moved the camera according to the optimized
to optimize the texture of 3D objects in 3D cases scenario. transformation parameters to take the blurred image. Experi-
In physical attacks, the author printed the 3D object and the ment results show that the captured blurring image achieved
adversarial texture with the printer (3D), then wrapped the a success rate of 85.3%.
adversarial texture over the 3D object. Evaluations were con- 5) MetaAttack: Feng et al. [57] observed that the previous
ducted on the captured image, and the results suggested that D2P requires enormous amounts of manually collected data
the multi-view captured images can fool the image recognition and proposed to utilize the meta-learning technique to solve
model. the problem. Specifically, the author proposed to use GAN
TABLE II
P HYSICAL ADVERSARIAL ATTACKS AGAINST IMAGE RECOGNITION TASKS . W E LIST THEM BY TIME AND TASK , ALIGNING WITH THE PAPER ’ S
DISCUSSION ORDER .
Adversarial’s Threat Adversarial Robust Physical

Method Year-Venue Space Remark Code
Knowledge Model Specificty Technique Test Type
I-FGSM [55] 2017-ICLR White-box Classification Targeted/Nontargeted - Static 2D Pixel-wise X
EOT [49] 2018-ICLR White-box Classification Targeted EOT Static 2D Pixel-wise ×
D2P [53] 2019-AAAI White-box Classification Targeted EOT,D2P Static 2D Pixel-wise X
ABBA [56] 2020-NeurIPS White-box Classification Nontargeted EOT Static 2D Pixel-wise X
MetaAttack [57] 2021-ICCV White-box Classification Targeted EOT Static 2D Pixel-wise ×
ISPAttack [58] 2021-CVPR White-box Classification Targeted/Nontargeted - Static 2D Pixel-wise ×
AdvPatch [59] 2017-NeurIPS White-box Classification Targeted EOT Static 2D Patch X
ACOsAttack [54] 2020-ECCV White-box Classification Nontargeted EOT Static 2D Patch X
ACOsAttack2 [29] 2021-TIP White-box Classification Nontargeted EOT Static 2D Patch X
TnTAttack [38] 2022-TIFS White-box Classification Targeted/Nontargeted - Static 2D Patch ×
Copy/PasteAttack [60] 2022-NeurIPS White-box Classification Targeted EOT,TV Static 2D Patch X
ViewFool [61] 2020-NeurIPS White-box Classification Nontargeted - Static 3D Position X
LightAttack [62] 2018-AAAI-S Black-box Classification Targeted/Nontargeted EOT Static 2D/3D Optical ×
ProjectorAttack [63] 2019-S&P White-box Classification Targeted - Static 2D Optical ×
OPAD [64] 2021-ICCV-W White-box Classification Targeted/Nontargeted EOT Static 2D Optical ×
LEDAttack [65] 2021-CVPR White-box Classification Targeted/Nontargeted EOT Static 2D Optical X
AdvLB [66] 2021-CVPR Black-box Classification Nontargeted - Static 2D Optical X
SLMAttack [67] 2021-arXiv White-box Classification Nontargeted - Static 2D Optical ×
SPAA [68] 2022-VR White-box Classification Targeted/Nontargeted D2P Static 3D Optical X
AdvEyeglass [51] 2016-CCS White-box FR Targeted/Nontargeted TV,NPS Static 2D Eyeglasses ×
AdvEyeglass2 [69] 2019-TOPS White-box FR Targeted/Nontargeted Color Adjust Static 2D Eyeglasses X
CLBAAttack [70] 2021-BIOSIG White-box FR Targeted/Nontargeted EOT Static 2D Eyeglasses ×
ReplayAttack [71] 2020-CVIU White-box FR Targeted/Nontargeted EOT,Position Adjust Static 2D Patch ×
ArcFaceAttack [72] 2019-SIBIRCON White-box FR Targeted/Nontargeted TV Static 2D Sticker ×
AdvHat [73] 2020-ICPR White-box FR Nontargeted TV,Bending Static 2D Sticker ×
TAP [74] 2021-CVPR White-box FR Targeted/Nontargeted EOT Static 2D Sticker ×
AdvSticker [39] 2022-TPAMI Black-box FR Targeted/Nontargeted Blending Static 3D Sticker X
SOPP [75] 2022-TPAMI Black-box FR Targeted/Nontargeted - Static 2D Sticker X
AdvMask [76] 2022-ECMLPKDD White-box FR Targeted/Nontargeted Rendering Static 3D Mask X
LPA [77] 2020-CVPR-W White-box FR Targeted/Nontargeted Position/Color Adjust Static 2D Opitcal ×
RP2 [46] 2018-CVPR White-box TSR Targeted/Nontargeted D2P Static 3D Patch ×
RogueSigns [78] 2018-arXiv White-box TSR Targeted EOT Dynamic 2D Pixel-wise X
PS GAN [79] 2019-AAAI White-box TSR Nontargeted - Static 2D Patch ×
AdvCam [80] 2020-CVPR White-box TSR Targeted/Nontargeted EOT Static 2D Pixel-wise X
ShadowAttack [81] 2022-CVPR Black-box TSR Nontargeted EOT Static 2D Optical X
AdvPattern [82] 2019-CVPR White-box ReID Targeted/Nontargeted EOT,TV Static 2D Patch X
PhysGAN [83] 2020-CVPR White-box SteeringModel Nontargeted D2P Dynamic 2D Pixel-wise ×
LPRAttack [84] 2020-C&S Black-box LPR Targeted - Static 2D Patch ×
Venue: Venue with postfix “-S” and “-W” indicates the symposium and workshop. C&S : Computers & Security. CVIU: Computer Vision and Image Understanding.
Threat Model: FR: Face Recognition. TSR: Traffic Sign Recognition. LPR: License Plate Recognition.
- in Robust Technique indicates that there are no specific techniques to enhance physical robustness.
remain physically aggressive when printed out and retaken by

the camera.
7) Adversarial patch (AdvPatch): Concurrent to [49],
Brown et al. [59] first presented a general framework to
construct the adversarial patch, which is robustness under a
[57]
variety of transformations. Given that a small perturbation
[55] [53] [58]
cannot deceive the adversarial defense techniques, the author
Fig. 4. Examples of physical adversarial example for general classification. discarded the constraint on the magnitude of the adversarial
patches and proposed to use a large perturbation to attack
defense techniques. Specifically, they replaced a partial pixel
to generate adversarial examples. The generator’s parameters of the image with the optimized adversarial patch and defined
are trained via the meta-learning technique: for each task, the the patch application operator A(p, x, l, t) to implement the
training set was split into the support set and query set, where replacement. A(δ, x, l, t) means applies the transformation t to
the former is used to train the GAN, and the latter is used the patch δ, and then paste patch δ to the image x at location
to validate it. The generator is trained on multi-tasks and L. To construct the robust adversarial patch, they combine the
fine-tuned on the new tasks in a few-shot learning manner. EOT and propose the following optimization problem
Moreover, EOT is adopted to enhance physical robustness. The
physical experiments suggested that the captured adversarial
image obtained a success rate of 95.2%. δ̂ = argmin Ex∼X,t∼T ,l∼L [log P r(yt |A(δ, x, l, t))]
δ (7)
6) Attack Imaging pipelines: Unlike modifying the pixel +||δ||∞ .
of the RGB images, Phan et al. [58] proposed to modify the
RAW image that outputs from the camera device directly, By solving the above problem, one can obtain the adver-
which makes it robust to compression (i.e., JPEG). They sarial patch that could successfully mislead the DNN-based
demonstrated that perturbations devised for RAW images application to output the specific class yt on the smartphone.
TABLE III
P HYSICAL ADVERSARIAL ATTACKS AGAINST OBJECT DETECTION TASKS . W E LIST THEM BY TIME AND TASK , ALIGNING WITH THE PAPER ’ S DISCUSSION
ORDER .
Adversarial’s Threat Adversarial Robust Physical

Method Year-Venue Space Remark Code
Knowledge Model Specificty Technique Test Type
ShapeShifter [45] 2018-ECMLPKDD White-box OD Targeted/Nontargered EOT Dynamic 2D Pixel-wise X
RP2+ [50] 2018-USENIX-W White-box OD Targeted/Nontargered TV,NPS,D2P Static 2D Patch ×
AdvPatch [47] 2019-CVPR-W White-box OD Nontargered EOT,TV,NPS Dynamic 2D Patch X
NestedAE [85] 2019-CCS White-box OD Targeted/Nontargered D2P, Physical Adjust Dynamic 3D Patch ×
DPatch2 [86] 2019-ICML White-box OD Targeted/Nontargered EOT Static 2D Patch ×
LPAttack [87] 2020-AAAI White-box OD Targeted EOT,TV,NPS Static 2D Pixel-wise ×
TranslucentPatch [88] 2021-CVPR White-box OD Targeted/Nontargered Affine,NPS Static 2D Patch ×
SwitchPatch [89] 2022-arXiv White-box OD Targeted TV Static 2D Patch ×
ScreenAttack [90] 2020-arXiv White-box OD Targeted TV Static 2D Patch ×
Invisible Cloak1 [91] 2018-UEMCON White-box OD Nontargered EOT,TV Static 3D Wearable Patch ×
Adversarial T-Shirt [92] 2020-ECCV White-box OD Targeted EOT,TPS Static 2D Wearable Patch ×
Invisible Cloak2 [93] 2020-ECCV White-box OD Nontargered TPS Static 2D Wearable Patch X
LAP [94] 2021-MM White-box OD Nontargered TV,NPS Static 2D Wearable Patch ×
NaturalisticPatch [95] 2021-ICCV White-box OD Nontargered TV Static 2D Wearable Patch X
AdvTexture [96] 2022-CVPR White-box OD Nontargered EOT,TPS Static 2D Wearable Patch X
SLAP [97] 2021-USENIX Security White-box OD Nontargered EOT,TV Static 2D Optical X
MeshAdv [98] 2021-CVPR White-box OD Nontargered - Static 3D Shape ×
CAMOU [99] 2019-ICLR White-box OD Nontargered EOT Static 2D Camouflage ×
ER [100] 2020-arXiv Black-box OD Nontargered - Static 3D Camouflage ×
UPC [48] 2020-CVPR White-box OD Targeted/Nontargered EOT Static 3D Camouflage X
DAS [30] 2021-CVPR White-box OD Nontargered TV Static 3D Camouflage X
FCA [33] 2022-AAAI White-box OD Nontargered TV Static 3D Camouflage X
CAC [34] 2022-IJCAI White-box OD Targeted EOT Static 3D Camouflage ×
DPA [35] 2022-CVPR White-box OD Nontargered EOT Static 3D Camouflage ×
BulbAttack [31] 2021-AAAI White-box IR OD Nontargered EOT,TV Static 2D Optical ×
QRAttack [32] 2022-CVPR White-box IR OD Nontargered EOT Static 2D Patch ×
AerialAttack [101] 2022-WACV White-box ODAI Targeted EOT,TV,NPS Dynamic 2D Patch X
RWAE [102] 2022-WACV White-box SS Nontargered EOT Static 2D Patch X
MDEAttack [103] 2022-ECCV White-box MDE Nontargered EOT Static 2D Patch ×
PATAttack [104] 2019-ICCV White-box OT Targeted/Nontargered EOT Dynamic 2D Patch ×
MTD [105] 2021-AAAI White-box OT Targeted EOT,TV Dynamic 2D Patch ×
OD: Object Detection. SS: Semantic Segmentation. MDE: Monocular Depth Estimation. IR OD: Infrared Object Detection. ODAI: Object deTection in Aerial Images. OT: Object Tracking.
Venue with postfix “-W” indicates the workshop.
- in Robust Technique indicates that there are no specific techniques to enhance physical robustness.
However, lacking constraint on perturbation results in the odd to construct naturalistic adversarial patches. Unlike crafting
appearance of the adversarial patch. full-pixel adversarial examples with GAN [109], the author
8) ACOsAttack: The previous physical attacks failed to first train a generator to map the latent variable z (i.e., random
explore the DNN-based application in real life, Liu et al. noise) to the realistic image patch, which will be embedded in
[29], [54] proposed optimizing a universal adversarial patch to the image. In the second stage, the author fixed the generator’s
attack the real deployed automatic check-out system (ACOs). parameters, then used the gradient-based optimization method
Specifically, the author exploits models’ perceptual and se- with the guide of adversarial loss to seek the optimal latent
mantic biases to optimize the adversarial patch. Regarding variable z, which generates the visual naturally adversarial
the perceptual bias, the author used hard examples which patch. Their approach is based on the fact that the trained
convey strong model uncertainties and extracted the textural generator could map the randomly sampled latent variable
patch prior from them in terms of the style similarities. z to a naturalistic image, in which one latent space can be
To alleviate the heavy dependency on large-scale training mapped to the naturalistic adversarial patch. In such a way,
data, the author exploited the semantic bias by collecting the author significantly reduced the optimization variable (only
many prototypes with respect to the different classes, then 100-dimension latent variables z). In physical attacks, the
trained the universal adversarial patch on the prototype dataset. author printed the patch image and stuck it on the front of
Moreover, the author also adopted the EOT to improve the clothes, then captured a 1-minute video of the person who
robustness of the adversarial patch. In physical attacks, the wore the clothes with the adversarial patch. Experiment results
author printed the adversarial patch and stuck them on the suggested that over 90% of captured images in frames can fool
commodities, then captured it from different environmental the target network.
conditions (i.e., angles {−30◦ , −15◦ , 0◦ , 15◦ , 30◦ } and dis- 10) Copy/Paste Attack: Inspired by the ringlet butterfly’s
tances {0.3, 0.5, 0.7, 1})(meter). Experiment results suggested predators being stunned by adversarial “eyespots” on its
that the captured images with the adversarial patch easily wings, Casper [60] speculate that the DNNs are fooled by
deceive the system. Additionally, their attack could degrade the interpretable features in the real world. To this end, the author
recognition accuracy of the prevalent E-commerce platforms proposed to develop adversarial perturbation that can reveal
(i.e., JD and Taobao) on the test set by 43.75% and 40%. the weakness of the victim network. Specifically, the author
9) TnTAttack: Recently, Bao [38] et al. pointed out the used GAN to create the patch, which is pasted on the target
existing physical adversarial patches are visually conspicuous. image that is classified as the specific target class by the
To address the issue, the author proposed to exploit a generator target model. Additionally, EOT is adopted to improve the
adversarial patch’s robustness, and TV loss is adopted to

discourage high-frequency patterns. In physical attacks, the
author printed the adversarial patch and pasted it on the
[62] [63] [64]
object. Evaluation results on the recaptured image verify the
effectiveness of their method in the real scenario.
11) ViewFool: Unlike generating the adversarial perturba-
tion by modifying the pixel of images, [61] suggested the
existence of adversarial viewpoints in the real world, which
means that the image taken at a specific view is sufficient to [65] [67] [66] [68]
fool the model. To search such viewpoints, the author treats the Fig. 5. Examples of optical attacks.
coordinate and orientation (i.e., yaw, roll, pitch) of the camera
in the world axis as the optimization variables and exploits the
NeRF to render the object with the searched coordinate and layout, width, and intensity are optimized by random search.
orientation. In physical attacks, the author takes pictures from Recently, Huang et al. [68] proposed a stealthy projector-
adversarial viewpoints and evaluates the attack performance. based adversarial attack (SPAA) method, which formulated
Experiment results suggest that DNN models are vulnerable the physical projector attack as an end-to-end differentiable
to adversarial viewpoints. process. Specifically, the author devised a PCNet to approx-
12) Optical attacks: Unlike adversarial patch attacks, op- imate the project-and-capture process, which could optimize
tical attacks are more stealthy and controllable as the attacker the adversarial perturbation on the projected-captured image
is not required to modify the target object. One representative pairs under the guidance of adversarial loss and stealthiness
characteristic of optical attacks involves the projector-camera loss. In physical attacks, the author projected the adversarial
model, a process for improving the robustness of physical perturbation on the target object 1.5 meters away from the
attacks. It refers to projecting the adversarial perturbation into projector and recaptured it for evaluation. Experiment results
the real world by the projector and recapturing it by the demonstrated that the stealthy adversarial perturbation gener-
camera for evaluation. For this reason, Nichols et al. [62] ated by SPAA could deceive the DNN significantly.
first collected hundreds of real projected and captured image Figure 5 illustrates examples of various optical attack
pairs as the training dataset, then they got inspiration from the methods. However, despite its stealthiness and controllability,
one-pixel attack and adopted the differential evolution (DE) to optical attacks are constrained by many factors, such as
optimize one pixel as the projection point in the low-resolution the object’s surface material, the color’s saturation, and the
image (e.g., 32 × 32). In physical attacks, the author projected scene’s light intensity. However, despite its stealthiness and
the light over the specific area of the object and recaptured controllability, optical attacks are constrained by many factors,
it. Experiment results demonstrate the effectiveness of light such as the object’s surface material, the color’s saturation, and
attacks. the scene’s light intensity.
In contrast to [62], Man et al. [63] performed physical 13) AdvPattern: In contrast to the image recognition task,
attacks by projecting a light source over the whole image. Wang et al. [82] first attempted to implement the physical ad-
Specifically, to ensure the emitted light can impact the image versarial attack against the person re-identification (re-ID). To
to fool the classifier, the author treated the color distribu- improve the physical robustness of the adversarial pattern, they
tion (the means and the standard deviations of the Gaussian devise a degradation function (i.e., a kind of data augmentation
distribution) of light as an optimization problem. In physical technique) that randomly changes the brightness or blurs the
attacks, the author modulated the projector’s color (the value adversary’s images from the augmented generated set, which
of RGB) to emit the light to the displayed image, then took is constructed by a multi-position sampling strategy. Such a
the photo for testing. Cased studies verify the effectiveness of strategy is essentially used to boost the performance of an
physical attacks. Gnanasambandam et al. [64] considered the attack under a multi-camera view. For the physical adversarial
projector factor and devised a transform function consisting experiment, the author printed out the adversarial pattern and
of a matrix and an offset to mimic the process of converting attached it to the adversary’s clothes, and applied the physical
images to the physical scene, where the matrix is used to attack under the condition of distance (4 ∼ 10 meter) and
transfer the image to projector space; and the offset is used angle (12◦ ∼ 47◦ ).
to counteract the background illumination. In optimization, the
transform function is performed first, followed by the gradient- B. Facial Recognition
based algorithm (i.e., PGD [12]). Concurrent to [64], Kim et Facial recognition is a technique to match a human face
al. [67] modulated the phase of the light in the Fourier domain from a digital image or video frame against a database of
using a spatial light modulator (SLM) to yield the adversarial faces. In general, the first step of facial recognition is to
attack, where the modulator’s parameters are optimized with extract the face area of the input image and feed it into
gradient-based algorithms. Sayles et al. [65] modulated a light the facial recognition system to predict/match the concrete
signal to illuminate the object then its surface displays striping person. Thus, the adversarial attack against facial recognition
patterns, which could mislead the classifier to make a mistake can be categorized into dodging and impersonation attacks;
in physical attacks. Duan et al. [66] modulated a laser beam the former refers to the adversary seeking the adversarial
to perform adversarial attacks, where the laser’s wavelength, perturbation to mislead the face to the specific face, while
X
Luntargeted
G = lg(1 − D(G(z)))
z∈Z
X (9)
−κ( Fyi (x + G(z)) − Fgt (x + G(z))),
i6=x
X
Ltargeted
G = lg(1 − D(G(z)))
z∈Z
X (10)
(a) (b) (c) (d) −κ(Fyt (x + G(z)) − Fyi (x + G(z))),
Fig. 6. Examples of adversarial glasses for dodging and impersonation attacks i6=t
of [51] (a,b) and [69] (c,d).
where the z is sampling from a distribution Z, D(·) and G(·)
are the output of discriminator and generator, respectively. Fyt
the latter refers to the attacked face that can be misidentified and Fyi are the network’s softmax output of the ground-truth
as other face except the original one. Actually, the dodging label yt and another label yi . lg(·) is the logarithmic function
and impersonation attack is analogous to the untargeted and and κ is the weight to balance the loss function. By minimizing
targeted attacks, respectively. Specifically, physical attacks the loss function, the generator learns to craft realistic glasses
against the facial recognition system to output the incorrect frames with adversarial. To further enhance the robustness of
person (impersonation attack) or undetected result (dodging the adversarial glasses in physical attacks, the author intro-
attack) by modifying the accessorize of the face, e.g., eye- duce the following three methods. The first one is to utilize
glasses frame [51], [69], hat [73], sticker [39], or even makeup multiple images of the attacker to improve the robustness of
[110]. adversarial glasses under different conditions. The second is
to collect various pose images to make the attacks robust to
1) Eyeglass attack: Sharif et al. [51] first investigated the
pose changing. The third is to use the Polynomial Texture
physical adversarial attack against face recognition models.
Maps approach [111] to confine the RGB value of eyeglass
To attack the face recognition model, they taken the facial
under baseline luminance to values under a specific luminance,
accessory (i.e., eyeglasses) as the perturbation carrier. The
making the attacks robust to varying illumination conditions.
robustness of the single perturbation is enhanced by training
In physical attacks, the author printed out the eyeglasses frame
it on a set of images, making it remain works under different
and adversarial pattern, stuck them over the eyeglasses frame,
imaging conditions. Additionally, the TV loss [52] is adopted
and then recaptured images for evaluation. Experiment results
to improve the smoothness of perturbation and designed the
show that their method outperforms [51] by the improvement
NPS loss to ensure the printability of perturbation. Finally, the
of 45% under different physical conditions (i.e., head pose
author optimizes the perturbation by constructing the following
covered 13.01◦ pitch, 17.11◦ of yaw, 4.42◦ of roll). Figure 6
optimization problem
illustrates the printed adversarial eyeglasses generated by [51]
X and [69].
argmin(( sof tmaxloss(x + δ, yt )) Later, Zhang et al. [71] demonstrated that the face authen-
δ x∈X (8) tication system equipped with the DNN-based spoof detection
+κ1 · T V (δ) + κ2 · N P S(δ)), module is still vulnerable to the adversarial perturbation. They
got inspiration from EOT [49] and train the perturbation over
where sof tmaxloss is the adversarial loss for the face recog- a serial of transformations (i.e., affine, perspective, brightness,
nition system, κ1 and κ2 balance the objectives. To conduct gaussian blurring, and hue and saturation). In performing
the physical experiment, the author printed and cropped out physical attacks, The author displayed the face image with
the adversarial eyeglass frame and stuck it on the actual pair adversarial perturbation on the screen and then captured it by
of eyeglasses. Then, the author asked the trial participant to the cellphone under different views (angles in −20◦ ∼ 20◦ ,
wear eyeglasses and stand at a fixed camera distance in the distances in 10 to 50 cm). Experiment results suggested that
indoor environment and then collected 30-50 images for each the target system is fragile to the adversarial face image.
participant. Experiment results suggested that their adversarial Recently, Singh et al. [70] leveraged the concept of curricu-
eyeglass frame easily deceives face recognition. lum learning to promote the robustness of attack to changing
Although the eyeglasses frame generated by [51] deceive brightness against the facial recognition model. In optimiza-
facial recognition to some extent, Sharif et al. [69] argued tion, the author constantly changes the brightness of face
that the appearance of the eyeglasses frame is conspicuous. To images stuck with adversarial eyeglasses, which makes their
craft an inconspicuous eyeglass frame, the author proposed to attack robust to brightness changes. Physical attacks verify the
use a generative model to craft the adversarial glasses frame effectiveness of the attack against the brightness change.
with a natural pattern. Specifically, the author first collected 2) Sticker: Unlike treating the eyeglasses as the pertur-
enormous images of real glasses frames as the training set; bation carrier, Pautov et al. [72] proposed a simple but
Then trained an adversarial generative network (AGN) with effective method to craft the adversarial sticker to deceive face
the following loss function, i.e., Equation 9 and 10 for the recognition. To apply the adversarial sticker over the facial
untargeted and targeted attack, respectively. area, the author applied the projective transformation over the
used the bend and rotation transformation in the 3D space to

transform the adversarial patch to mimic the deformation of
the sticker on the real face. In physical attacks, the author
sticks the adversarial patch on the attacker’s face according to
Fig. 7. Bend and rotation over the rectangle [73]. The length of the rectangle
the searched coordinate. Experiment results demonstrated that
do not change. the natural image patch stick at the searched areas is sufficient
to degrade the performance of DNN in digital and physical
conditions.
adversarial sticker, then stuck the sticker on the nose or eye Recently, Wei et al. [75] found that the position and the
of the adversary. To optimize the sticker, the author treats pattern of adversarial patches are equally significant to a suc-
the cosine similarity between the image embedded with the cessful attack. Thus, the author optimizes the positions and si-
patch and the original clean image as adversarial loss, which multaneously perturbation against the face recognition model.
is expressed as follows. Specifically, the author used a UNet architecture to generate
the position and the attack step of the existing gradient-based
L(X, δ) = Et∈T ,x∈X [cos(ext , exadv )] , (11) algorithm (e.g., I-FGSM [55]) to reduce the optimization vari-
ables. In optimization, the author adopted the reinforcement
where ext and ex0 are the embedding corresponding to the learning framework to model the above problem, where the
desired person xt and the photo xadv of the attacker with the agent is the UNet, the policy is the sampling strategy in the
applied patch. In addition, TV loss is adopted to constrain agent, the action is specific optimization variables obtained
the visual effect of the adversarial sticker. To avoid the color by policy, and the rewards are generated by querying the
discrepancy between the physical and the digital perturbation, ensemble model with the constructed adversarial examples. In
the author chooses to optimize the sticker on a grayscale. They physical attacks, the author printed the adversarial patch with
conduct the physical attack by printing out various shapes (i.e., photo paper as it can reduce the color discrepancy as much
eyeglasses and stickers) of stickers and stick on a specific as possible. Experiment results on their method achieved an
area of the face. Experiment results show that their sticker average success rate of 66.18% on captured images.
can dramatically deteriorate FaceID models. In the prevalence of the COVID-19 pandemic, wearing
Komkov et al. [73] devised the adversarial sticker with masks has become common sense. Zolfi et al. [76] got
the rectangle shape, which is stuck on the hat to attack inspiration from this and generated a mask-like universal
the FaceID model. To optimize the adversarial sticker for adversarial patch, which makes the attack more stealthy. The
the hat, the author first applied the bends and rotation (see adversary who wears the crafted adversarial mask can bypass
Figure 7) transformation over the sticker to approximate the the face recognition systems in some scenarios, e.g., airport.
actual sticker process, which is realized by performing a Specifically, the author proposed a mask projection method to
differentiable formula on the pixel. The adversarial loss (i.e., convert the faces and masks into UV space to mimic actual
cosine similarity distance) and TV loss are adopted to optimize wearing. The mask image is optimized by minimizing the
the adversarial sticker. In physical attacks, the author stuck the cosine similarity loss and TV loss to ensure adversarial and
adversarial sticker on the forehead area of the hat. Experiment smoothness. In addition, geometric transformations and color-
results show that the person wearing the generated adversarial based augmentations are adopted to improve the robustness
sticker easily bypasses the Face ID system (e.g., ArcFace). of the mask. To perform physical attacks, the author makes
Concurrent to [73], Xiao et al. [74] proposed to improve the mask with two different materials regular paper and white
the transferability of the adversarial patch for face recogni- fabric mask. The author recruited a group of 15 male and
tion systems. The author analyzes the drawback of existing 15 female participants to wear the different devised masks,
transferability methods: sensitive to the initialization (e.g., and the experiment result demonstrated that the model’s per-
MIM [13]) and exhibits severe overfitting to the target model formance declined significantly (i.e., from 74.83% to 5.72%,
when the magnitude of perturbation is substantial. To address 4.61% for paper and fabric mask) on the face wore the
these issues, the author regularizes the adversarial patch in a adversarial mask.
low-dimensional manifold represented by a generative model 3) Light Projection Attack (LPA): Unlike the adversarial
regularization. In optimization, the adversarial patch is bent perturbation patch approaches, Nguyen et al. [77] exploited
and pasted on the eye area of the face. The author conducts the the adversarial light projection to attack the face recognition
physical attack by printing and photographing the adversarial systems. The critical technique of the light project attack
patch stuck in the human eye and achieves successful attacks. is calibrating the adversarial region and color between the
Unlike optimizing the pattern or color of adversarial per- projector and digital images. To address the issue, the author
turbation to perform attacks, Wei et al. [39] claimed that the proposed two calibration approaches (i.e., manual assign and
natural image patch is sufficient to attack facial recognition automatic obtain with a landmark detection) to constrain the
models. Specifically, the author proposes a general framework adversarial area to calibrate the position. To calibrate the
for searching the most sensitive area of the model that is color between the digital adversarial pattern and the physical
susceptible to the sticker. The author first locates the most vul- version of adversarial light emitted by the projector, the author
nerable area by traversing the face image with a patch image, presented a color transformation function γ (Equation 12)
which accelerates the search speed. In optimization, the author to implement the conversion between the camera (C) and
argmin λ||Mx · δ||p + N P S

δ (13)
Exi ∼X V J(f (xi + Ti (Mx · δ)), y),
[72] [110] [74] [70]
where the Mx is the binary mask of the location patches,
and X V is the training dataset consisting of digital and
physical examples. In physical attacks, the author performs the
[73] [77] [76] [39] experiments under various camera distances (i.e., 1.5, 3, and
Fig. 8. Examples of adversarial stickers or light projections.
12 meters) and angles (i.e., 0◦ ∼ 15◦ ) on the adversarial patch.
Additionally, they also conduct dynamic physical attacks by
recording the video while driving the car toward the adversar-
projector (G). ial traffic sign. Experiment results suggested that their method
can dramatically deceive the stop sign recognition model.
Additionally, the author empirically found that 1) adversarial
C(x̂adv + G(γ(h(x)))) = x + h(x) performance is easily impacted by the location of the mask;
⇐⇒ C(x̂adv ) + C(G(γ(h(x)))) = x + h(x) 2) the mask optimized by L1 regularization show more sparse
(12) that represents the most vulnerable of the model, while the L2
⇐⇒ C(G(γ(h(x)))) = h(x)
result in successful attacks at an earlier iterative step.
⇐⇒ γ = (C G)−1 , Concurrent to [46], Sitawarin et al. [78] proposed a general
attack pipeline, which can generate the adversarial logo, traffic
where x̂adv is the physical adversarial example, h(x) repre- sign, and custom adversarial sign. In optimization, the author
sents the algorithm to search adversarial perturbation. Addi- used a circle mask to ensure perturbation is added to the target
tionally, the author pointed out that γ in Lab color space can area (e.g., traffic sign). In addition, a set of transformations is
better approximate the physical domain RGB color space. In used to ensure robustness. Finally, the adversarial perturbation
physical attacks, the author projects the adversarial light on is optimized by minimizing the C&W adversarial loss [14]. To
the adversary’s face to attack the facial recognition model. perform dynamic physical attacks, the author drove a car and
To sum up, the main attack pipeline against facial recog- recorded the video to capture the adversarial sign from far
nition can be briefly described as follows. First, apply the (80 feet away) to near. Experiment results showed their attack
adversarial perturbation precisely over the face area using bend could achieve a 95.74% of attack success rate.
and rotation or 2D/3D conversion via rendered. Meanwhile, 2) PS GAN: Unlike the optimization-based attacks, Liu et
a set of transformations or data augmentation are utilized to al. [79] proposed to utilize the generative model to construct
improve the robustness. Finally, the cosine similarity-based inconspicuous adversarial patches. Specifically, the author
adversarial loss [70], [72], [73], [76], [77] and TV loss or exploited the attention map extracted by the Grad-CAM [112]
NPS loss to ensure the physical deployable. Figure 8 display to guide the paste location of adversarial patches for better
the examples generated by different methods. attack performance. Meanwhile, they minimized the discrep-
ancy between the adversarial patch and the seed patch for
better visual fidelity. Finally, they adopt GAN loss, adversarial
C. Traffic sign recognition loss, and the L2 -norm loss that is calculated by the seed patch
and the output of the generator to optimize the perturbation.
Traffic sign recognition is one of the widely used applica-
In physical attacks, the author printed the adversarial patch
tions of the DNN model, which can be deployed in various
and stuck it on the traffic sign (i.e., “Speed Limit 20”) in the
scenarios, such as road sign recognition in automatic driving
street with varying angles (e.g., {0◦ , 15◦ , 30◦ , −15◦ , −30◦ })
or license plate recognition in auto-checkout parking lots.
and distances (e.g., {1, 3, 5}m). Experiment results suggested
Therefore, a line of work has been proposed to explore the
that most of the captured traffic sign images pasted with
security risk of these models.
adversarial patches are misclassified by the model (from 86.7%
1) RP2 : Eykholt et al. [46] first demonstrated that the road
to 17.2%).
sign recognition model is vulnerable to physical adversarial
3) AdvCam: Take further step than [79], Duan et al. [80]
attack. The author proposed to optimize multiple white-black
proposed a novel approach to camouflage the adversarial
blocks with rectangle shapes. Specifically, the author first
perturbation (i.e., AdvCam) into a natural style that appears
collected considerable physical traffic sign images as the
legitimate to human observers. Specifically, they exploited the
training set to improve the robustness of adversarial patches
style transfer technique [113] to hide the large magnitude
to various physical environmental conditions. Moreover, to
perturbations into customized styles, which makes attacks
eliminate the color discrepancy between the digital adversarial
more stealthy. They adopt the following adversarial loss to
patch and the printed physical adversarial patch, they adopt the
maintain the aggressive
NPS loss to constrain the color of the adversarial patches. Fi-
nally, the author formulated the optimization of such physical (
deployable adversarial patches as the following optimization log(pyadv (xadv )), for targeted attack
Ladv = (14)
formal. − log(py (xadv )), for untargeted attack,
where pyadv (·) is the probability output (softmax on logits) of

the target model f with respect to class yadv . The style loss
and content loss were adopted to ensure style transfer and the
TV loss is introduced to reduce the variations between adjacent
RP2 [46] AdvLogo [78] Ph GAN [79] AdvCam [80] PhysGAN [83] ShadowAttack [81]
pixels. Additionally, the author utilized various physical condi-
Fig. 9. Examples of adversarial examples against traffic sign recognition.
tions (i.e., rotation, scale size, and lighting change) to boost the
robustness of adversarial patterns in the real world. Moreover,
the author experimentally demonstrated that the location and
attacks [84]. Rather than optimize the perturbation over the
shape of the adversarial pattern had limited influence on the
while image pixel, Qian et al. [84] proposed to optimize the
attack performance, e.g., the attack was valid even though the
best position where to paste the adversarial patch. Specifically,
adversarial patch was far from the object. In physical attacks,
the author first cut out characters of the plate and selected
the author prints the adversarial pattern on paper and then
the victim character. Then the position to be pasted with the
photos it at various angles and distances. Experiment results
rectangle black block is optimized by the genetic algorithm.
demonstrate the effectiveness of their approaches in physical
The author reveals that different license plate characters show
attacks. However, although the generated adversarial examples
different attack sensitivity. To perform physical attacks, the
with well stealthy, using the generated whole adversarial object
author first collects the car plate from the real world, then
(i.e., stop sign) to apply an attack is impractical in the real
optimize the optimal position for attack. After that, the author
world.
pastes the black sticker on the real car plate at the optimized
4) PhysGAN: In general, simulating real physical transfor-
location and captures the image for evaluation. Experiment
mations is impossible in the digital world. Unlike adopting
results show the vulnerability of the LPR model to physical
EOT to improve the physical robustness of adversarial patches,
adversarial attacks.
Kong et al. [83] proposed a novel method to learn physical
transformation from videos recorded in the real world by using In summary, the examples of various physical attack against
a generative model (i.e., PhysGAN), which is then used to the traffic sign recognition model is provided in Figure 9.
optimize the physical robust adversarial patch. Specifically,
the author placed the adversarial poster on the target object V. P HYSICAL A DVERSARIAL ATTACK ON O BJECT
of the video frame and fed it into a video feature extractor. D ETECTION TASK
The extracted features are simultaneously fed into the gen-
erator and the final classification head to calculate the loss. Object detection is a technique to locate and classify the
The author maximizes the distance between the clean and object in the image, which is widely applied in video surveil-
adversarial video in the classification output to ensure the lance and automatic driving, where the target object is the
adversarial of generated posters. Combined with the GAN’s vehicle, pedestrian, traffic sign, and so on. Existing object
loss, the trained adversarial poster preserves stealthiness and detectors can be divided into single-stage detectors and two-
can deceive the target model. In physical attacks, the author stage detectors. Single-stage detectors output the detection
printed the adversarial poster and placed it on the side of results directly, and the typically single-stage detectors include
the road, then recorded the video when driving past the the YOLO series [44], [114], SSD [115], RetinaNet [116], and
adversarial poster. Experiment results show that the steer so on. By contrast, the two-stage detector first extracts the
angle classification model is fragile to the physical printed feature with respect to input via a backbone, then performs
adversarial poster. classification and regression over the feature to obtain the cat-
5) ShadowAttack: Recently, Zhong et al. [81] argued that egories and bounding boxes. The typically two-stage detector
the pattern of perturbations generated by prior approaches is includes Faster RCNN [117], Master RCNN [118], and so on.
conspicuous and attention-grabbed for human observers. To Nonetheless, both single-stage and two-stage object detectors
this end, the author proposed to utilize the natural phenomenon have three outputs: bounding boxes, objectness, and category.
(i.e., shadow) to perform physical attacks. Specifically, the Bounding boxes locate the possible object in the image, where
author considers the shadow location and value to create the objectness gives the confidence that the located area contains
shadow. For shadow location, the author fund an appropriate the object, and category give the classification results.
polygon determined by a set of vertices. Regarding the shadow Performing the adversarial attack against object detection
value, the author uses the LAB color space rather than RGB at least requires modifying one of three outputs, making it
to search for the optimal illumination component. Finally, the more challenges than attacking the image recognition tasks.
author finds the best shadow by exploiting the particle swarm The main way to perform a physical adversarial attack against
optimization (PSO) strategy. Additionally, EOT was adopted object detection is to modify the attributes (e.g., appearance) of
to improve the robustness of the shadow attacks. In physical the object, including adversarial patch, adversarial camouflage,
attacks, according to the optimized digital shadow, the author and other novel approaches (e.g., emitting the laser toward
used a triangle board to create a shadow over the traffic sign. the object). Although the pixel-wise adversarial examples can
Experiment results suggested that the shadow attack could also realize physical adversarial attacks by printing the image
achieve the fooling rate of 95% in real-world scenarios. with adversarial perturbation, we ignore them because it is
Additionally, the license plate recognition (LPR) model has unrealistic for the adversary to modify the whole environment
been demonstrated to be vulnerable to physical adversarial in practice.
A. Patch-based physical adversarial attack

Patch-based physical attacks mainly focus on generating the
adversarial patch for the plane object. The most remarkable
advantage is that it is easily deployable in physical attacks as
the adversary can perform attacks by only printing it out and
hanging it. Such a characteristic makes it a suitable carrier for
perturbation in physical attacks.
1) ShapeShifter: Chen et al. [45] first demonstrated that the
EOT could be applied to improve the robustness of adversarial
patches in attacking two-stage object detectors (i.e., Faster Fig. 10. Visualization of adversarial patch hiding the person under the
RCNN [117]). The optimization based on C&W attack [14] is surveillance camera. [47]
formulated as follows.
1 X author captured the video of the adversarial object beginning

argmin Ex∼X,t∼T [ Lfi (Mt (xadv ), y)] away from 9 meters for the disappearance attack. By contrast,
xadv ∈Rh×w×3 m
ri ∈rpn(Mt (xadv )) in the creation attack setting, they placed the adversarial
+c · || tanh(xadv ) − xo ||22 , patch on the wall or cupboard and captured the video from
(15) toward the adversarial object, away from 3 meters. The exper-
iment result shows that the adversarial patch can mislead the
where Mt (xadv ) = (Mt (x, tanh(xadv ) is an operation that
YOLOv2 in most cases, demonstrating its fragility in practice.
transforms an object tanh(xadv ) using t and then paste it into
In addition, the experiment suggests that the indoor attack
the background image x with a binary mask; rpn represent
performs well than the outdoor attack, which can be attributed
the region proposal represented extracted by the first stage of
to the complex environment in the outdoor conditions.
Faster RCNN; the Lfi (Mt (xadv ), y) is the loss function of
The previous works [46], [59] were concentrated on the
classification in the i-th region proposal; the last term is a
no intra-class variety (e.g., stop sign). To this end, Thys et
regularization term that is used to balance the attack strength
al. [47] proposed to generate adversarial patches for intra-
and the quality of the adversarial patch. In physical attacks,
class variety (i.e., persons). They aim to hide the person
the author conducted extensive static and dynamic attacks
who holds the printed adversarial patch cardboard plate and
under multi-view conditions. The experiment demonstrated the
towards the device with the object detector application. In
vulnerability of the object detector and the feasibility of the
optimization, the author designed an algorithm to place the
physical adversarial attack.
adversarial patch in the center of the detected object, then
2) DPATCH: Inspired by the adversarial patch in image
adopted the objectness output as the adversarial loss, which
recognition tasks [59], Liu et al [119] extend adversarial patch
experimentally shows it superior to the classification output;
attack to object detector by simply attack the classification
NPS loss and TV loss are also used to ensure the printability
output of the detector. Although they demonstrated that the
and naturalness of the adversarial patch. To improve the
adversarial patch could successfully implement the targeted
physical robustness of the adversarial patch, the author adopted
attack and non-target attack on to object detector, the validity
a series of random transformations to the optimization process.
of the physical adversarial attack is not verified.
Experiment results demonstrated that the generated adversarial
3) RP2+: Following up the physical adversarial against
patch could hide the person in front of the surveillance camera
image recognition model [46], Eykholt et al. [50] extended
(based on YOLOv2) in most cases. However, there also exists
the RP2 algorithm to attack object detector. Specifically, they
the failure case when rotating the rotation in a large margin.
modify the adversarial loss for adapting to attack object de-
Zhao et al. [85] systemically investigated the adversarial
tector under two strategies creation and disappearance attack.
patch attack against two detectors (i.e., YOLOv3 and Faster
According to the observation that the detector is more tends
RCNN) in both digital and physical environments. Specifically,
to focus more on contextual areas for prediction, they take
the author first exploited the intermediate features to enhance
the object’s position and size into account to constrain the
the adversarial examples. To enhance the physical adversarial
adversarial patch, which is realized by performing rotation (in
attack, the author constructed a physical dataset by collecting
the Z plane) and position (in the X-Y plane). Additionally,
the real-world background and traffic sign image, respectively;
they find the L∞ lead to a very pixelated perturbation that
then, the traffic sign image is embedded into the background
hurts the attack performance, and then they adopt the total
image. The author also adopted the EOT to guarantee the
variation norm (TV loss [52]) instead of L∞ to address this
robustness of the adversarial patch. The author conducted
issue. The final optimization objective is expressed as follows.
the physical attack by placing the camera device inside the
argmin λ · T V (Mx · δ) + N P S car and toward the printed adversarial patch. They evaluated
δ (16) different distances between the car and the adversarial patch
+Exi ∼X V J(f (xi + Ti (Mx · δ)), y), by controlling the car’s speed (i.e., 6km/h ∼ 30km/h), the
where J(·, y) is the adversarial loss function devised for the attack success rate over 72%.
object detector, which considers the classification output. In Unlike generating the adversarial patch to overlap the object
physical attacks under indoor and outdoor conditions, the for misclassified or misdetection, Lee et al. [86] introduced an
approach to craft the adversarial patch that can place anywhere that the detection performance on captured images average
(even far from the object) in the image, which can suppress degraded 76.9% in indoor and outdoor conditions.
all the detected objects. Specifically, the author designed the 6) TranslucentPatch: Unlike crafting the adversarial patch
adversarial loss against the YOLO serial detectors and adopted and pastes on the object’s surface, Zolfi et al. [88] proposed a
the EOT (affine transformation and brightness on HSV color novel adversarial attack that crafts the adversarial patch, which
space) to improve the physical robustness. The PGD [12] is was stuck on the camera lens. To avoid the camera lens being
adopted the optimize the adversarial patch. In physical attacks, overlapped by the adversarial patch, the author took the im-
the author printed out the adversarial patch and recorded age’s alpha channel into account during the optimization. The
the video under natural lighting conditions. The YOLOv3 is author implemented the targeted attack and simultaneously
adopted to evaluate the effect of physical attacks. Experiment maintained the non-attacked objects were not to be affected,
results show that the patch is invariant to location. However, which are implemented by the Equation (18) and (19).
when placing the adversarial patch away from the object, the
size of the patches needs to be enlarged.
4) UPC: Previous works mostly craft the instance pertur- Ltarget conf = P r(objectness) × P r(targetclass), (18)
bations for rigid or planar objects (e.g., traffic sign), ignoring 1 X
the non-rigid or non-planar objects (e.g., clothes). To address Luntarget conf =
M
this issue, Huang et al. [48] proposed to craft the universal cls ∈ image
(19)
physical adversarial camouflage (UPC) for the non-rigid object cls 6= target
(i.e., person). Specifically, the author used the nature image
as the initialization of the adversarial patch for the purpose |conf (cls, clean) − conf (cls, patch)|,
of visual inconspicuous. Moreover, a set of transformations where the P r(objectness) and P r(class) represent the ob-
is performed on the adversarial patch to mimic deformable jectness and classification of detector, respectively. Addition-
properties. Then, the author devised specific two-stage attack ally, the author also takes the Intersection over Union (IOU)
procedures against Faster RCNN: the RPN attack and the as the adversarial loss
classifier and regressor attack, which are implemented by the
following loss function. gt
LIoU = IOUpredcit (target). (20)
The predicted bounding box changed by minimizing the LIoU .
Lrpn =Epi ∼P (D(si , st ) + si ||d~i − ∇d~i ||1 ), The author also adopted the NPS to ensure the color of the
Lcls =Epi ∼P̂ C(p)y + Epi ∼P ? L(C(p), y), printed adversarial patches. The author conducts the physical
(17)
attack by first printing out the adversarial patch on transparent
X
Lreg = ~ 2,
||R(p) − ∇d||
pi ∼P ?
paper, then placing it on the camera’s lens, then capturing the
video. The physical attack evaluated on YOLOv2/5 and Faster
where si and d~i are the confidence score and coordinates RCNN achieves over the average fooling rate of 35.48%.
of i-th bounding box; st is the target score, where s1 and 7) SwitchAttack: Recently, Shapira et al. [89] developed
s0 represent the background and foreground. ∇d~i is the a targeted attack method to generate the adversarial patch
predefined vector that guides the optimization direction. C and that is pasted on the car hood to attack object detectors
R are the prediction output of the classifier and the regressor. (i.e., YOLOv3 and Faster RCNN) in a more realistic monitor
P is the output proposal, P̂ is the top-k proposal filtered by scenario. Specifically, the author devised a tailored projection
confidence score and P ? is the proposal that can be detected method to adaptively adjust the position and shape of the ad-
as true label y. d~ denotes the distortion offset. Additionally, the versarial patch in terms of the camera orientation. To performs
TV loss is also adopted to produce more natural patterns. In the targeted attack, the author considers all candidates that
physical attacks, the author printed the adversarial patterns and surpass the specific threshold before the NMS component to
stuck them over the car (i.e., non-plane objects) and captured ensure the relevant candidates can be classified as the target
the picture from different distances (8 ∼ 12 m) and angles label. Moreover, TV loss is adopted to improve the physically
(−45◦ ∼ 45◦ ). Experiment results show that the adversarial deployable of the adversarial patch. In physical attacks, the
camouflage could reduce the detector’s performance by a large author stuck the printed patch on the toy car, achieving an
margin. attack success rate of 95.9% on recaptured images.
5) LPA Attack: Unlike the previous patch-based physical 8) ScreenAttack: The first step of current patch-based phys-
attacks [45], Yang et al. [87] proposed to generate the adver- ical attacks is to print out the adversarial patch, which is
sarial license plate to fool the SSD. Specifically, the author static in practice, and ignore the constantly changing character
devised a mask to guide the perturbation region, optimized by of the physical world, which may result in a failed attack.
the C&W adversarial loss and color regularization consisting To address this challenge, Hoory et al. [90] proposed a
of NPS and TV loss to ensure adversarially and eliminate the dynamic adversarial patch for evading object detection models
color discrepancy between the digital and physical worlds. In by displaying the adversarial patch with a digital screen (see
optimization, EOT is used to enhance physical robustness. In Figure 11). The author adopted the loss function as the same
physical attacks, the author created the adversarial license plate to [47] to train the adversarial patch. Additionally, the author
and placed it on the real car. Experiment results suggested pointed out that the adversarial patch crafted by loss function
[47] is easily misdetected to the semantically related category, Xu et al. [92] proposed an adversarial T-shirt attack to avoid
which makes it inconsistent with background objects. To this the detection of the single object detector (i.e., person) in the
end, the author considered the semantically-related class (e.g., physical world. The generated adversarial pattern is attached
bus or truck) of the original class (i.e., car) as the target and to the surface of the T-shirt. To mimic the cloth deformation
devised the following loss function in the physical world, the author proposed to apply the
Thin Plate Spline Mapping (TPS) transformation over the
X adversarial perturbation. To further enhance the performance
J(δ, xadv , C) = α · T V (δ) + L(f (xadv ), yi ), (21) of the adversarial attack, the author adopted the following
yi ∈C
three approaches: 1) approximate the cloth’s deformation with
where C is the set of semantically-related labels used to avoid TPS; 2) approximate the color discrepancy between digital
being detected as car-related classes. Furthermore, the author and physical space with a quadratic polynomial regression.
split the train set into multi subsets and trained an adversarial 3) enhance the robustness of the adversarial perturbation
patch for each subset. In physical attacks, the author placed with EOT. By performing the above operations, the T-shirt
a digital screen on the side and rear of the car, where the stuck with the adversarial patch degrades the person detector
adversarial patch is chosen from the patch subsets to dynamic significantly under indoor and outdoor physical conditions.
display in terms of the changing environmental conditions. Concurrent to [92], Wu et al. [93] proposed to devise an
Evaluation results on YOLOv2 suggested that 90% video adversarial patch to implement the disappear attack. To this
frames with the adversarial perturbed screen can not detect end, the author devised the objectness disappear loss, which
the target object (i.e., car). is expressed as follows
X 2
Lobj = Et∼T max {Si (A(x, δ, t)) + 1, 0} , (22)
i
where A(x, δ, t) is the patch apply function, which pastes

the adversarial patch δ on the image with the transformation
Fig. 11. Dynamic attack by display the adversarial patch on the screen. [90] process t; Si indexes the objectness score of i-th proposal.
By minimizing this loss function, the adversarial examples
Recently, the adversarial patch has been applied to attack the with the devised patch could suppress the positive scores.
face detector (i.e., FPN [120]) in the digital domain ( [121]) Additionally, the TV loss is adopted to ensure the visual effect
by sticking on the person’s forehead to conduct the adversarial of the adversarial patch. In physical attacks, the author treated
attack. the printed patch as a sticker of the clothes. Experiment results
suggest that the adversarial patch in the physical world could
B. Wearable adversarial perturbation deceive the YOLOv2 and Faster RCNN. Moreover, the author
Unlike devising the simple plane (non-rigid) adversarial empirically finds that the TPS would detriment the adversarial
perturbation, the non-rigid object show more difficult as that performance.
needs to take the deformable of the adversarial patch into Tan et al. [94] pointed out the existing adversarial wearable
account in physical deployable. Wearable adversarial perturba- patches are conspicuous to humans. Thus, the author proposed
tion approaches [91], [92], [92], [93] were designed to address two-stage strategies to optimize the legitimate adversarial
that issue. Figure 12 illustrates the related wearable patch- patch that can evade both human eyes and detection models
based adversarial attack. in the physical world. In the first stage, the author began
with a random cartoon image and optimized an adversarial
patch that could deceive the detector. In the second stage,
the author adopted the clean cartoon image and the optimized
patch crafted in the first stage to constrain the adversarial patch
to evade human attention. The author adopted the TV loss and
[91] [82] [92] [93] [95] [94] [38] the Euclidean distance between the adversarial patch and the
Fig. 12. Visualization examples of various wearable adversarial patches. cartoon image to ensure visually natural to human observers.
Moreover, NPS loss is used to constrain the patch color inside
Yang et al. [91] extended EOT from 2D space to 3D the printable colors. The author conducted the physical attack
space by building up a pinhole camera model to mimic the by attaching the printed adversarial patch to the white T-shirts.
patch’s transformation in the physical world. In optimization, The case study shows the effectiveness of their adversarial
the author adopted the C&W-based adversarial loss, TV loss, patch against YOLOv2 in the physical attack.
and Euclidean distance between the clean image and the Concurrently to [94], Hu et al. [95] proposed to exploit the
adversarial image with patch to optimize the adversarial patch. learned image manifold of a pre-trained GAN (e.g., BigGAN
In physical attacks, the author holds the laptop where the and StyleGAN) to guide the adversarial patch generation.
screen displays the adversarial patch. Experiment results on the Rather than directly optimizing the adversarial patch from
captured image show that the YOLO’s detection performance scratch, the author treats the image sample sampling from
degraded from 100% to 28%. the generator as the adversarial patch. Then the patch is
(s,p)
applied to the target object inside the image to constructing where P is the projection model, cs , cp , co is the training
the adversarial examples. The TV loss and the following data triple, consists of projecting cp on pixels of color cs , and
(s,p)
adversarial loss are adopted to optimize the adversarial patch the output co ; w is the parameters of the projection model.
N
To optimize the adversarial examples, the author combined the
1 X j j EOT and treated the objectness and classification output of
L= max[fobj (xadvi ) · fcls (xadvi )], (23)
N i=1 j the detector as the adversarial loss. The author conducted the
physical attack by placing the projector away from the target
where the xadvi is the i-th image in a batch with size
j j object from 1 to 12 meters and viewing angles −30◦ ∼ 30◦ ,
N. fcls , fobj represent the class probability and objectness
achieving the success rate 100% for the traffic sign classifi-
probability, respectively. In physical attacks under indoor and
cation model and 77% for object detectors (i.e., Mask RCNN
outdoor conditions, the author recorded one person wearing
and YOLOv3).
the adversarial shirt side-by-side with another person wearing
an ordinary shirt. Experiment results show that the phys- D. Camouflage-based physical adversarial attack
ically recaptured image could degrade the detection recall
The works discussed above are mainly focused on gen-
of YOLOv4tiny by approximately 23.80% and 43.14% (for
erating the adversarial patch on the 2D image, where each
indoor and outdoor, respectively). Moreover, they exhaustly
pixel can be modified independently. Recently, researchers
investigate the influence of different transformations, patch
have increasingly paid attention to more realistic attacks in 3D
sizes, and target classes on attack performance.
space, developing adversarial camouflage-based attacks that
Hu et al. [96] got inspiration from cloth making and
focus on modifying the appearance (i.e., shape or texture) of
proposed to treat the adversarial patch as the whole cloth to
the 3D object.
be cut. To generate a naturalistic patch, the author utilized the
1) Shape of 3D object: Zeng et al. [98] first proposed to
generative model to generate the adversarial patch and then
generate the adversarial examples against 3D shape classifica-
overlap them with the object. To simulate the physical defor-
tion and a visual question answer (VQA) system. Specifically,
mation of the cloth, the author applied the TPS transformation
they crafted the adversarial example by adopting the neural
over the adversarial patch. TV loss is adopted to ensure the
renderer technique, which allows them to perform the 3D
smoothness of the patch’s pattern. They conduct the physical
transformation to better approximate real-world environments.
experiment by painting the adversarial patch over the cloth and
Concurrent to Zeng [98], Xiao et al. [122] proposed to
then making the clothes. The adversary who wore the crafted
generate an adversarial object by modifying the mesh of a 3D
adversarial clothes can bypass the detector with the average
object, namely MeshAdv. The adversarial mesh is optimized
attack success rate of 80.75% against different detectors (i.e.,
on the total loss consisting of classification loss, objectness
YOLOv2/3, Faster RCNN, and Mask RCNN) under different
loss, and perceptual loss. Moreover, the author extends the
distances and view angles (range from 0◦ to 180◦ ). Figure 13
TV loss from pixel-wise in the 2D image to the vertex in the
describe the adversarial patch and the corresponding clothes.
3D image. However, the methods of the above method limited
their utilization, which made them fail to conduct the physical
attack.
2) CAMOU: Unlike the works [98], [122] optimize the ob-
ject mesh, Zhang et al. [99] proposed to optimize the texture of
the 3D object. The author presented a clone network to mimic
the entire processing containing the adversarial texture render
and detector prediction. Then apply the white-box attack on
Fig. 13. Adversarial patch (left) v.s. adversarial clothes (right). [96] the clone network to optimize the adversarial texture. Although
the camouflaged vehicle achieved good attack performance on
Mask RCNN, the optimized texture is a mosaic-like pattern,
C. Projector’s Projection which is conspicuous to the human observer.
Rather than generating the adversarial patch directly, Lovi- 3) UPC: Huang et al. [48] proposed to optimize the
sotto et al. [97] presented a novel adversarial attack by mod- adversarial texture from the patch seed and stick the adver-
ulating the projector’s parameters. The author first collected sarial patch over the partial vehicle’s surface to obtain well
the projectable colors as following step: 1) collect the target appearance. It is worth noting that the author printed out the
image of the target object as the projection surface S; 2) select adversarial texture and stuck it on the real car to experiment.
a color cp = [r, g, b], and the color Pcp over the projection 4) ER: Rather than construct the adversarial patch under
surface, resulting in the output Ocp . Finally, repeat the above the white-box setting, Wu et al. [100] considered a more
two steps until collecting enough data. The author takes the challenging black-box attack. The author proposed to utilize
camera noise and projector’s work mechanism into account to the genetic algorithm to optimize the adversarial texture. To
collect the realistic color as possible. After that, the author reduce the optimization variable, the author optimizes small-
fitted a projection model as follows size adversarial texture (64 × 64), then enlarges or repeats
(ER) to a large scale (2048 × 2048). The adversarial texture is
X
LP = argmin ||P(cs , cp ) − c(s,p) ||1 , (24) rendered wrapped over the vehicle in the simulator environ-
o
w
∀cs ,cp ment via Carla [123] PythonAPI, then recorded the adversarial
examples from the simulator environment and used to query

the detector (i.e., Light head RCNN [124]). They show the
adversarial texture generated by the black-box attack could
significantly reduce the performance (the magnitude of 52%)
of the detector in the simulator environment while ignore UPC [48] Wu [100]
verify the physical attack. DAS [30] FCA [33]
5) DAS: Wang et al. [30] pointed out that the adversar-
ial camouflage generated by the previous works [99], [100]
failed to consider the model’s attention, resulting inferior
transferability. To this end, the author proposed simultaneously
suppressing the model and human attention. To suppress the Random CAMOU [99] CAC [34] DTA [35]
model’s attention, the author proposed dispersing the intensity Fig. 14. Visual appearance of various camouflage texture rendered vehicle.
attention map by disconnecting the connected graph generated
from the attention map. To suppress human attention, the
author adopted the seed content patch as initialization and texture, the author optimizes the 2D image as the texture but
encouraged the adversarial texture to remain similar to the transforms them with the same camera matrix as the vehicle.
seed content. Moreover, the author optimizes the specific part In physical attacks, the author printed out the camouflaged 3D
faces of the 3D model as the textures via neural renderer [125]. object with a 3D printer and captured the photo every 45◦ .
To optimize the adversarial texture, the author constructed a Experiment results suggested that the average performance
dataset sampled from the Carla simulator at different distances reduction of two detectors (i.e., EfficientDetD0 [126] and
and angles. The author conducts the physical attack by printing YOLOv4) model is 47.5%.
out the camouflaged rendered car on paper and then cropping In summary, the camouflage-based adversarial texture has
the camouflage texture and sticking it on the toy car. Experi- attracted increasing attention in recent years because it can
ment results suggested that the average performance reduction better mimic physical transformations (e.g., shadowing and
of four detectors (i.e., YOLOv5, SDD, Faster RCNN, and brightness). For example, recently, Byun et al. [127] proposed
Mask RCNN) is 26.05%. to adopt the neural renderer as the data augmentation technique
6) FCA: Recently, Wang et al. [33] proposed to craft to boost the robustness of adversarial examples. The discussed
the full-coverage adversarial camouflage texture to overcome camouflaged attacks mainly focus on generating the adver-
more complex environment conditions (e.g., multi-view and sarial camouflage texture, while the loss function, especially
occlusion), which are less explored in the current patch-like adversarial loss, has been less explored. Additionally, these
[47] and partial camouflage work [30], [48]. To maintain the works conducted the physical attack by printing a small-size
effectiveness of the small ratio of adversarial camouflaged car model rather than the real car. We argue it is far away from
vehicles in the image, the author used multi-scale IOU loss to a realistic attack on the vehicle. Alcorn et al. [128] argued that
constrain the adversarial texture. Finally, the author combined the reason for the vulnerability of detectors might be attributed
the TV loss and adversarial loss (i.e., including IOU, classifi- to these simulated images being strange to the detector. In
cation, and objections) to optimize the adversarial camouflage other words, the detector was not trained on such simulated
texture. They adopt a similar physical attack process (including data, resulting in bad generalization. Moreover, there are many
datasets and detector) with [30], and the experiment shows issues to be solved, including but not limited to small-size
their method’s effectiveness under complex environments, adversarial objects and the complex physical environment
degrading the average reduction of the detector performance condition (e.g., occlusion and lighting) in the real world.
of 53.99% in physical attacks.
7) CAC: Unlike [33] optimize the full coverage faces of E. Against the infrared detectors
the 3D model that encountered the difficult issue of hard
implementation, Duan et al. [34] proposed to optimize the 2D
texture of the 3D model directly. They proposed to optimize
the texture by minimizing all classification output of the RPN
module at the first stage of Faster RCNN. Although they
adopted various physical settings (e.g., angles, brightness,
etc.), the author trained the texture on the image with pure-
color background. In physical attacks, the author print out the [31]
entire vehicle model with the 3D printer, and the experiment [32]
demonstrates their method’s effectiveness in the physical en- Fig. 15. Adversarial against infrared thermal imaging detector.
vironment under 360◦ free viewpoint settings.
8) DTA: Suryanto et al. [35] pointed out the previous works The existing physical adversarial attack mainly focused on
[30], [33] fail to consider more specific physical environmental the RGB image in visible light, while visible light has its
conditions, e.g., shadow. To address the above issue, the author limitations in dark environments (e.g., darkness); for example,
used a network to approximate the shadowing effect. Rather the image content taken in the evening is hardly identified.
than adopt the neural renderer to optimize the adversarial However, infrared thermal imaging has its advantage because
it is free from lighting changes, enabling them adopted widely as a saliency loss to train the adversarial patch. The saliency
in the real world. loss is represented as follows
Zhu et al. [31] proposed a novel adversarial method to attack
the infrared thermal imaging detector. The author devised
q q
Lsal = 2 + σ 2 + 0.3 ∗
σrg µ2rg + µ2yb
a board with multi bulbs, where the positions of bulbs are yb
(25)
the optimization variables. To mimic the shining of the bulb, rg = R − G, yb = 0.5 ∗ (R + G) − B,
the author applied the gaussian function on the pixel of the where µ and σ are the averages, and the standard deviation
bulb’s position. Finally, the author treated the combination of of the auxiliary variables, R, G, and B denote the patch’s
objectness output of detector and TV loss as their loss function color channel values. The author investigated the impact
to optimize the adversarial patch. The author conducted the of the patch’s size, position, number, and saliency on the
physical attack by placing the bulb in the optimized positions adversarial performance. Experiment results demonstrate that
and photoed the people who held the board. Experiment results the optimized patch could reduce the detector performance
show that the adversarial bulb board degrades the detector by significantly (i.e., 47%).
34.48%. Recently, Du et al. [101] implement the physical adversarial
Recently, the author pointed out that [31] may be failed attack against the aerial imagery detection. Considering the re-
in multi-view environments and proposed to construct the alistic scenario, the author designed two types (see Figure 17)
adversarial clothes covered with adversarial texture made by of adversarial patches to accommodate the different scenarios.
aerogel [32]. More specifically, the author adopted a similar Type ON refers to the adversary placing the adversarial patch
idea with [96] to optimize the adversarial patch image. To on the top of the car, which is suitable for dynamic scenarios
simplify the physical implementation, the author optimized the (i.e., the car is driving). By contrast, Type OFF refers to the
QR code-like adversarial texture, which allows the author only adversary placing the adversarial patch around the car, which is
required to optimize the generation parameters of the QR code suitable for stationary scenarios (e.g., parking lot). The author
texture. The generated QR code texture is tiled to a large shape, adopted objectness, NPS, and TV loss to train the adversarial
and then the actual adversarial patch is randomly cropped patch. The author conducted the physical attack by printing
from the tiled image and then pasted onto the target object out the adversarial patch and placing them around or near the
in the image. The TPS and various transformations are used target objects under different lighting conditions. Experiment
to improve the robustness of the adversarial patch. The author results demonstrated that the printed-out adversarial patch
made the clothes by attaching the black block to the aerogel could significantly reduce objectness score (25% to 85%).
according to the optimized QR code texture. The participant
who wore the adversarial patch could successfully bypass
the detector (i.e., YOLOv3), where the detection performance
degraded by 64.6%.
Type ON Type OFF
F. Against Aerial detection Fig. 17. Patches designs [101].
Unmanned aerial surveillance has gained increasing atten- Apart from two adversarial patch attacks against the object
tion with the development of drone techniques and object detector on the aerial imagery dataset, there are a number of
detection. Correspondingly, the adversarial attack against un- attempts to attack the remote sensing image recognition, and
manned aerial surveillance aroused increasing attention in detection [130]–[132].
the military scenario. The conventional technique against
unmanned aerial surveillance is to use the camouflage net,
G. Other adversarial attacks against object detection
while a recent study shows that a partial imagery patch could
deceive the unmanned aerial surveillance. Although physical attacks consider the more realistic con-
ditions, the basic procedure behind physical attacks is similar
to digital attacks, i.e., the first step of the physical adversarial
attack is to construct the adversarial perturbation in the digital
world. Thus, some digital attacks may inspire physical attacks.
Wang et al. [133] proposed a dense adversarial generation
(DAG) attack, which simultaneously attacks the classification,
objectness, and the classification and regress loss of the
Random patch Adversarial patch RPN module. Chow et al. [134] systemically investigate the
Fig. 16. Visualization result of different patch. [129]. adversarial objectness attack against the common detectors
(i.e., YOLOv3, SSD, and Faster RCNN) on three different
Den Hollander et al. [129] assumed unmanned aerial benchmark datasets, the devised adversarial patch could paste
surveillance is the YOLOv2 trained on the aerial imagery on the image or add into the image. Wang et al. [135] proposed
dataset. To attack the object detector, the author combined the a creative attack by disturbing the non-maximum suppression
common loss consisting of the adversarial loss, NPS loss, and (NMS) module of the detector, which resulted in enormous
the objectness loss and proposed a novel colorfulness metric wrong results on the generated adversarial examples. Liao et
al. [136] showed that there is no need to generate global rate for both YOLOv4 and Faster RCNN).
perturbation for the whole image, the author exploited the
high-level semantic information to generate local perturbation
H. Attack Semantic Segmentation
for detectors. To improve the naturalness of the adversarial
patch, Pavlitskaya et al. [36] extend the existing GAN-based Semantic segmentation is similar to the object detection task
approaches [95] by attacking all object that appears in the but needs fine-grind classification for every pixel in the image.
image to generate inconspicuous patches, and then assessed However, physical attacks against semantic segmentation are
the trade-off between attack performance and naturalness. less explored, which will be discussed as follow.
Nesti et al. [102] proposed to optimized a scene-specific
billboard patch to attack the semantic segmentation models.
The author exploits the Carla simulator [123] to simulate a
more realistic environment of the real world. Rather than adopt
the EOT in the 2D image space, the author exploited various
physical transformations in the 3D dimension provided by
SAA [137] DPatch [138] RPAttack [139]
Carla. More specifically, the author first applied the projective
transformation to precisely paste the patch onto the 2D attack-
able surface with the extracted 3D rotation matrix. The author
devised the following loss function to optimize the patch.
X X
HideAttack [138] HideAttack [138] LM = LCE (fi (xadv ), yi ), LM̄ = LCE (fi (xadv ), yi ),
r∈Ω r6∈Ω
Fig. 18. Visualization of different sparse attack against the object detector.
(26)
where LM describes the cumulative cross entropy (CE) for
Unlike searching the global perturbation for the whole
those pixels that have been misclassified with respect to the
image, the research has attempted to implement the sparse
ground truth y, denoted as Ω; while LM̄ refers to all the others.
attack by perturbing a few numbers of the image pixel,
The author printed out the billboard patch as a 1 × 2 meter
which makes them suitable for extending to physical attacks.
poster, which could successfully hide the target object. Figure
Although the sparse adversarial attack has been investigated
19 illustrates the physical billboard patch.
in image recognition systems [140]–[144], their application
on object detection is unexplored. Figure 18 illustrated the
successfully sparse adversarial attack in the digital world.
To perform the sparse adversarial attack, Bao et al. [137]
generated the adversarial patch by simultaneously considering
the position, shape, and size of the adversarial patch into
optimization. Zhao et al. [145] proposed two adversarial patch
generation algorithms: exploit the heatmap to guide the patch
application and consensus-based algorithm (i.e., one of the Fig. 19. Visualization of Random patch (left) and optimized patch (right).
model ensemble strategies). The proposed algorithm won [102]
seventh place in the Alibaba Tianchi adversarial challenge
object detection competition. Wu et al. [138] proposed a Unlike the image recognition and object detection task, the
diffused patch attack to optimize the adversarial line embedded research on the physical adversarial attack against the semantic
in the object’s center. The proposed attack won second place segmentation task lacks enough attention. The possible reason
in the Tianchi competition. Li et al. [146] extend the universal is the task similarity between object detection and semantic
adversarial perturbation [147] from image recognition tasks to segmentation. Additionally, semantic segmentation models are
object detection tasks and proposed a universal dense object commonly deployed in autonomous driving, and the physical
suppression (U-DOS) algorithm to optimize the universal adversarial attack against the autonomous driving system has
adversarial perturbation to deceive the detector on most input gained enormous attention [149]–[151], which is beyond the
images. To probe the model’s blind spots, Zhu et al. [148] discussion of this paper. Such reason may result in a few
came up with a novel approach that simultaneously models numbers of literature investigating the physical adversarial
the shape and the pattern of the adversarial patch; they can attack against semantic segmentation.
generate multi patches with different shapes for a specific Nonetheless, several attempts have explored the digital
image. Huang et al. [139] proposed to generate the adversarial attack against semantic segmentation. Hendrik Metzen et al.
patch and gradually remove the inconsequential perturbation [152] proposed universal adversarial perturbation to deceive
in the crafted patch. In addition, the author devised a weighted the semantic segmentation on most images. Unlike to [152],
strategy to stabilize the model ensemble, avoiding the coun- Nakka et al. [153] pointed out there is no need to perturb the
termeasure among different detectors. The experiment result whole image to realize the attack and proposed to optimize the
shows that perturbing a few numbers of pixels could reduce adversarial patch applied to the object to attack the semantic
the detector significantly (obtaining 100% missed detection segmentation. Arnab et al. [154] systematically investigated
the adversarial robustness of the semantic segmentation with Based on the aforementioned discussion, one promising
respect to various attack (e.g., FGSM [10], PGD [12]). approach is to adaptive adjust the mask’s shape, such as using
Additionally, physical adversarial attacks have been recently a network to automatically learn the best mask or exploit the
applied to various fields, such as depth estimation [103] and geometry algorithm to optimize the optimal mask.
object tracking [104], [105]. 2) Universal adversarial perturbation via black-box attack:
Many existing physical attacks belong to white-box attacks,
VI. D ISCUSSION AND F UTURE W ORK and the perturbation is defaulted to be universal, which means
it only requires optimizing one perturbation to fool the whole
Although physical adversarial attacks impose potential risks
dataset. Despite their success, the white-box setting is not
for existing deployed DNN-based systems, the issue remains
practical in practice. Regarding black-box attacks, there are
to be solved. In this section, from the perspective of the attack
plenty of works [16], [155]–[157] have been proposed for
pipeline (i.e., input transformation, algorithm design, and
attacking the target model in the digital world. Moreover, these
evaluation), we will summarize the common techniques used
works are designed to generate perturbation for each input
to develop physical attacks and then discuss the promising
image, which limits their application in practice. In surveyed
methods used to boost further the performance of physical at-
works, only one work was proposed to optimize the universal
tacks. Finally, we discuss the potentially beneficial application
adversarial perturbation against the image recognition model
of the physical deployable adversarial perturbation.
in digital work [158].
Therefore, developing physically deployable universal
A. Input transformation black-box attacks is more realistic in practice but more dif-
The essential of input transformation is the expectation ficult. Some algorithms can solve the above problem, such
maximization of the objective function over the inputs that as genetic algorithm, differential evolution, particle swarm
convert on various transformation operations (e.g., rotation, optimization, and random search. However, optimizing the
affine transform, brightness adjust). In adversarial attacks, perturbation revolves to solve a high-dimension optimization
the transformation is exerted on the adversarial examples problem, which significantly confines the efficiency and ef-
to make them robust to such transformation or boost their fectiveness of the attack. Thus, reducing the optimization
transferability. Thus, input transformation is a simple but dimension is crucial for developing the physical deployable
effective way to improve adversarial attacks’ performance black-box attack. Moreover, other issues referring to black-
both in the digital and physical world. However, the complex box may need to be considered, such as time-consuming and
environmental condition may curb the effectiveness of such query efficient.
transformations. To address the issue, the digital and the cor- 3) Transferability: Transferability is one of the significant
responding physical recaptured image pairs are used to boost characteristics of adversarial perturbation, which refers to the
the physical robustness of adversarial perturbation. Another adversarial perturbation generated for the known model and
solution is to use a network to model the discrepancy between can also be used to attack the unknown model. In realistic
the digital and physical worlds. physical attacks, the adversary may not know what the DNN
To better mimic the physical transformation in the real deployed in the system, which makes the transferability of
world, given the stronger learning capability of DNN, one adversarial perturbation essential for physical attack perfor-
approach is to exploit a network to learn the various physical mance. However, the existing physical attacks mainly focus on
transformation, such as shadow, brightness, rotation, satura- the robustness of physical attacks and attack performance on
tion, and so on. For example, one can use a network to learn specific known models, making the transferability of physical
the explicit physical transformation from video sequences. adversarial attacks less explored.
Thus, collecting high-quality training data is significant to the There are enormous works attempts to improve the trans-
network’s performance. ferability of digital attacks, including but not limited to in-
put transformation, gradient calibration, feature-level attack,
model ensemble, and generative model. These techniques
B. Attack algorithm design could be combined with physical attacks to boost their trans-
Although a line of physical attacks has been proposed for ferability.
different tasks and achieved certain success, many problems 4) Simultaneous attack multitask: Most current physical
still exist, which will be discussed as follows. attacks focus on attacking specific tasks, such as image
1) Adversarial patch application: Adversarial patch appli- recognition and object detection. However, in the real world,
cation refers to how to paste the adversarial perturbation on the attacker cannot get knowledge about the target victim
the host image/object, which is crucial for adversarial patch system, such as whether the system is equipped with the
attacks. However, most existing works have used the mask to image recognition model or object detection model. Therefore,
guide the application of the adversarial patch, which curbs the one promising direction to address the above problem is to
performance of the attacks. The reason may include two-fold: take multi-task into account during the optimization simulta-
on one side, only a specific area of the image is crucial for the neously.
model decisions, and modifying unspecific areas may have a Recently, the vision transformer (aka., ViT) provided a novel
countereffect; On another side, a large block of perturbation general framework for vision tasks, exhibiting competitive
is conspicuous to the human observer, hurting its stealthiness. performance to CNN-based framework. With the increasing
application of Vit, physical adversarial attacks against the ViT the scene and parameters (e.g., weather and light conditions).
should arouse the researcher’s attention. For example, one can develop a physical testing environment
5) Robustness of physical attacks: The main problem faced using an open source simulator (e.g., Carla [159] or AirSim
by physical adversarial attacks is the complex physical envi- [160]), then provide the physical testing interface, allowing
ronments, including brightness, lighting, shadow, deformation, the attacker to call the testing procedure with their optimized
distance, occlusion, and the system noise introduced by the adversarial perturbation and report the attack results.
imaging sensor (i.e., camera). The existing approaches en- 2) Uniform physical test criterion: Most physical attacks
counter the following drawbacks when applied to physical conduct physical experiments under stationary conditions,
attacks. First, they generally resit some of these conditions which means that the adversary takes photos of the adversarial
and hard take all underlying physical environment factors into perturbation disturbed object at a specific position and then
account. Second, they cannot cover some physical conditions, evaluates the attack performance of the captured image. How-
such as partial shadow or occlusion of the target object. ever, the target object moves in and out of the viewfinder of
Third, the used transformation (e.g., rotation or brightness the system’s sensor device in the actual physical environment,
change) modifies the whole pixel of the image instead of meaning that the ratio of the perturbed object is dynamically
the target object, which is inconsistent with real physical changing with respect to the fixed viewfinder. In the dynam-
deployment. Finally, there are no works to take the dynamic ically changing process, many factors caused by the back-
changing of the size of the target object into account, which is ground environment would impact the attack performance. The
crucial in the real-world scenario as the perturbed object will stationary test cannot include such a complex scenario. To
move. Therefore, constructing a general and robust physical this end, the dynamic physical test may better represent the
adversarial attack is an open problem for future research. performance of the attack method in the real world. Therefore,
Two promising techniques may be can used to solve the a complete criterion of the physical test process is urgent to
above issues. The first one is to utilize the technique from the boost the development of physical adversarial attacks.
computer graphic, such as the physical renderer, a computer 3) Applications: The adversarial attack is a double-blade
graphics approach that seeks to render images in a way that sword. On the one side, adversarial attacks would lead the
models the flow of light in the real world. The second one DNN-based system to make mistakes, resulting in potential
is to utilize emerging technologies, i.e., neural radiance fields security risks. On the other side, adversarial attacks also
(NeRF), a method for synthesis for novel views of complex have their benefits side, such as 1) the adversarial perturbed
scenes. The abovementioned approaches can be integrated into physical adversarial example could be used as the training
optimizing adversarial attacks to mimic complex real-world data to boost the adversarial robustness of DNN further; 2)
conditions. adversarial perturbation can used to help the DNN system
6) Stealthiness: Most existing successful physical adversar- remain stable in complex environmental conditions [161],
ial attacks pursue higher and more robust attack performance [162]. Therefore, potential applications exploiting adversarial
with the sacrifice of perceptible. One reason is the mode of perturbation to boost the performance of DNN in the real
projecting the perturbation to the sphere of L∞ with a radius world is an interesting research direction.
of a specific magnitude in digital attacks may be unsuitable
for physical attacks as the slight perturbation may be distorted
caused by the device noise or environmental factors, making VII. C ONCLUSION
the attack invalid. Moreover, visually conspicuous adversarial
In this paper, we systemically discussed the physical ad-
perturbation patterns easily observe by humans. Thus, as
versarial attacks in computer vision tasks, including image
we cannot diminish the perturbation magnitude, making the
recognition and object detector tasks. We argue that DNN-
adversarial patterns more natural may be a promising work.
based systems still face enormous potential risks in both digital
and physical environments. These risks should be thoroughly
C. Evaluation studied to avoid unnecessary loss when widely deployed in
the real world. Specifically, we proposed a taxonomy scheme
1) Evaluation benchmark dataset: Although most physical
to categorize the current physical adversarial attack to provide
attack methods were proposed, they are devised for different
a comprehensive overview. By reviewing the existing physical
datasets, resulting in hardly evaluating different methods on
adversarial attack, we summarize the common characteristic of
one uniform dataset. Thus, it is urgent to develop a public
physical adversarial attacks, which may be helpful to future
dataset for evaluating the effectiveness of the attack methods.
physical adversarial attack research. Finally, we discuss the
On the other hand, there is no unified evaluation process
problems that need to be solved in physical adversarial attacks
and criterion for physical attacks, which makes it hard to
and provide some possible research directions.
reproduce the result reported in the publications. However,
the real environment is complex and hard to reproduce as the
changing weather and light condition. Recently, some research VIII. ACKNOWLEDGMENTS
has attempted to use the simulator to develop adversarial
attacks, which gives a promising direction to develop an This work was supported by the National Natural Science
environment for evaluating physical attacks as the simulator Foundation of China (No.11725211 and 52005505).
can provide reproducibility and definite environments by fixing .
R EFERENCES [23] J. Zhang and C. Li, “Adversarial examples: Opportunities and chal-
lenges,” IEEE transactions on neural networks and learning systems,
[1] K. Simonyan and A. Zisserman, “Very deep convolutional networks for vol. 31, no. 7, pp. 2578–2593, 2019.
large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014. [24] Y. Hu, W. Kuang, Z. Qin, K. Li, J. Zhang, Y. Gao, W. Li, and K. Li,
[2] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image “Artificial intelligence security: threats and countermeasures,” ACM
recognition,” in Proceedings of the IEEE conference on computer vision Computing Surveys (CSUR), vol. 55, no. 1, pp. 1–36, 2021.
and pattern recognition, 2016, pp. 770–778. [25] A. Aldahdooh, W. Hamidouche, S. A. Fezza, and O. Déforges, “Adver-
[3] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and sarial example detection for dnn models: A review and experimental
S. Zagoruyko, “End-to-end object detection with transformers,” in comparison,” Artificial Intelligence Review, pp. 1–60, 2022.
European conference on computer vision. Springer, 2020, pp. 213– [26] G. R. Machado, E. Silva, and R. R. Goldschmidt, “Adversarial machine
229. learning in image classification: A survey toward the defender’s per-
[4] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, spective,” ACM Computing Surveys (CSUR), vol. 55, no. 1, pp. 1–38,
“Swin transformer: Hierarchical vision transformer using shifted win- 2021.
dows,” in Proceedings of the IEEE/CVF International Conference on [27] A. Sharma, Y. Bian, P. Munz, and A. Narayan, “Adversarial patch
Computer Vision, 2021, pp. 10 012–10 022. attacks and defences in vision-based tasks: A survey,” arXiv preprint
[5] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. arXiv:2206.08304, 2022.
Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” [28] H. Ren, T. Huang, and H. Yan, “Adversarial examples: attacks and
Advances in neural information processing systems, vol. 30, 2017. defenses in the physical world,” International Journal of Machine
[6] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training Learning and Cybernetics, vol. 12, no. 11, pp. 3325–3336, 2021.
of deep bidirectional transformers for language understanding,” arXiv [29] J. Wang, A. Liu, X. Bai, and X. Liu, “Universal adversarial patch
preprint arXiv:1810.04805, 2018. attack for automatic checkout using perceptual and attentional bias,”
[7] D. Amodei, S. Ananthanarayanan, R. Anubhai, J. Bai, E. Battenberg, IEEE Transactions on Image Processing, vol. 31, pp. 598–611, 2021.
C. Case, J. Casper, B. Catanzaro, Q. Cheng, G. Chen et al., “Deep [30] J. Wang, A. Liu, Z. Yin, S. Liu, S. Tang, and X. Liu, “Dual attention
speech 2: End-to-end speech recognition in english and mandarin,” suppression attack: Generate adversarial camouflage in physical world,”
in International conference on machine learning. PMLR, 2016, pp. in Proceedings of the IEEE/CVF Conference on Computer Vision and
173–182. Pattern Recognition, 2021, pp. 8565–8574.
[8] A. Gulati, J. Qin, C.-C. Chiu, N. Parmar, Y. Zhang, J. Yu, W. Han, [31] X. Zhu, X. Li, J. Li, Z. Wang, and X. Hu, “Fooling thermal infrared
S. Wang, Z. Zhang, Y. Wu et al., “Conformer: Convolution-augmented pedestrian detectors in real world using small bulbs,” in Proceedings
transformer for speech recognition,” arXiv preprint arXiv:2005.08100, of the AAAI Conference on Artificial Intelligence, vol. 35, no. 4, 2021,
2020. pp. 3616–3624.
[9] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. J. [32] X. Zhu, Z. Hu, S. Huang, J. Li, and X. Hu, “Infrared invisible clothing:
Goodfellow, and R. Fergus, “Intriguing properties of neural networks,” Hiding from infrared detectors at multiple angles in real world,” in
in 2nd International Conference on Learning Representations, ICLR Proceedings of the IEEE/CVF Conference on Computer Vision and
2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Pattern Recognition, 2022, pp. 13 317–13 326.
Proceedings, 2014. [Online]. Available: http://arxiv.org/abs/1312.6199 [33] D. Wang, T. Jiang, J. Sun, W. Zhou, Z. Gong, X. Zhang, W. Yao, and
[10] I. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing X. Chen, “Fca: Learning a 3d full-coverage vehicle camouflage for
adversarial examples,” in International Conference on Learning multi-view physical adversarial attack,” in Proceedings of the AAAI
Representations, 2015. [Online]. Available: http://arxiv.org/abs/1412. Conference on Artificial Intelligence, vol. 36, no. 2, 2022, pp. 2414–
6572 2422.
[11] A. Kurakin, I. Goodfellow, and S. Bengio, “Adversarial machine [34] Y. Duan, J. Chen, X. Zhou, J. Zou, Z. He, J. Zhang, W. Zhang, and
learning at scale,” arXiv preprint arXiv:1611.01236, 2016. Z. Pan, “Learning coated adversarial camouflages for object detectors,”
[12] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards in Proceedings of the Thirty-First International Joint Conference on
deep learning models resistant to adversarial attacks,” in 6th Interna- Artificial Intelligence, IJCAI 2022, Vienna, Austria, 23-29 July 2022,
tional Conference on Learning Representations, ICLR 2018, Vancouver, L. D. Raedt, Ed., 2022, pp. 891–897.
BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. [35] N. Suryanto, Y. Kim, H. Kang, H. T. Larasati, Y. Yun, T.-T.-H. Le,
OpenReview.net, 2018. H. Yang, S.-Y. Oh, and H. Kim, “Dta: Physical camouflage attacks
[13] Y. Dong, F. Liao, T. Pang, H. Su, J. Zhu, X. Hu, and J. Li, “Boosting using differentiable transformation network,” in Proceedings of the
adversarial attacks with momentum,” 2018 IEEE/CVF Conference on IEEE/CVF Conference on Computer Vision and Pattern Recognition,
Computer Vision and Pattern Recognition, pp. 9185–9193, 2018. 2022, pp. 15 305–15 314.
[14] N. Carlini and D. Wagner, “Towards evaluating the robustness of neural [36] S. Pavlitskaya, B.-M. Codău, and J. M. Zöllner, “Feasibility of incon-
networks,” in 2017 ieee symposium on security and privacy (sp). IEEE, spicuous gan-generated adversarial patches against object detection,”
2017, pp. 39–57. arXiv preprint arXiv:2207.07347, 2022.
[15] Z. Huang and T. Zhang, “Black-box adversarial attack with [37] Y. Nemcovsky, M. Yaakoby, A. M. Bronstein, and C. Baskin, “Physical
transferable model-based embedding,” in International Conference passive patch adversarial attacks on visual odometry systems,” arXiv
on Learning Representations, 2020. [Online]. Available: https: preprint arXiv:2207.05729, 2022.
//openreview.net/forum?id=SJxhNTNYwB [38] B. G. Doan, M. Xue, S. Ma, E. Abbasnejad, and D. C. Ranasinghe, “Tnt
[16] C. Li, H. Wang, J. Zhang, W. Yao, and T. Jiang, “An approximated attacks! universal naturalistic adversarial patches against deep neural
gradient sign method using differential evolution for black-box adver- network systems,” IEEE Transactions on Information Forensics and
sarial attack,” IEEE Transactions on Evolutionary Computation, pp. Security, 2022.
1–1, 2022. [39] X. Wei, Y. Guo, and J. Yu, “Adversarial sticker: A stealthy attack
[17] L. Sun, M. Tan, and Z. Zhou, “A survey of practical adversarial example method in the physical world,” IEEE Transactions on Pattern Analysis
attacks,” Cybersecurity, vol. 1, no. 1, pp. 1–9, 2018. and Machine Intelligence, 2022.
[18] R. R. Wiyatno, A. Xu, O. Dia, and A. de Berker, “Adversarial [40] X. Wei, B. Pu, J. Lu, and B. Wu, “Physically adversarial attacks and de-
examples in modern machine learning: A review,” arXiv preprint fenses in computer vision: A survey,” arXiv preprint arXiv:2211.01671,
arXiv:1911.05268, 2019. 2022.
[19] S. Qiu, Q. Liu, S. Zhou, and C. Wu, “Review of artificial intelligence [41] H. Wei, H. Tang, X. Jia, H. Yu, Z. Li, Z. Wang, S. Satoh, and Z. Wang,
adversarial attack and defense technologies,” Applied Sciences, vol. 9, “Physical adversarial attack meets computer vision: A decade survey,”
no. 5, p. 909, 2019. arXiv preprint arXiv:2209.15179, 2022.
[20] X. Yuan, P. He, Q. Zhu, and X. Li, “Adversarial examples: Attacks and [42] K. Simonyan and A. Zisserman, “Very deep convolutional networks for
defenses for deep learning,” IEEE transactions on neural networks and large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
learning systems, vol. 30, no. 9, pp. 2805–2824, 2019. [43] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger,
[21] A. Serban, E. Poll, and J. Visser, “Adversarial examples on ob- “Densely connected convolutional networks,” in Proceedings of the
ject recognition: A comprehensive survey,” ACM Computing Surveys IEEE conference on computer vision and pattern recognition, 2017,
(CSUR), vol. 53, no. 3, pp. 1–38, 2020. pp. 4700–4708.
[22] K. Ren, T. Zheng, Z. Qin, and X. Liu, “Adversarial attacks and defenses [44] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look
in deep learning,” Engineering, vol. 6, no. 3, pp. 346–360, 2020. once: Unified, real-time object detection,” in Proceedings of the IEEE
conference on computer vision and pattern recognition, 2016, pp. 779– [66] R. Duan, X. Mao, A. K. Qin, Y. Chen, S. Ye, Y. He, and Y. Yang,
788. “Adversarial laser beam: Effective physical-world attack to dnns in
[45] S.-T. Chen, C. Cornelius, J. Martin, and D. H. P. Chau, “Shapeshifter: a blink,” in Proceedings of the IEEE/CVF Conference on Computer
Robust physical adversarial attack on faster r-cnn object detector,” Vision and Pattern Recognition, 2021, pp. 16 062–16 071.
in Joint European Conference on Machine Learning and Knowledge [67] K. Kim, J. Kim, S. Song, J.-H. Choi, C. Joo, and J.-S. Lee, “Light lies:
Discovery in Databases. Springer, 2018, pp. 52–68. Optical adversarial attack,” arXiv preprint arXiv:2106.09908, 2021.
[46] K. Eykholt, I. Evtimov, E. Fernandes, B. Li, A. Rahmati, C. Xiao, [68] B. Huang and H. Ling, “Spaa: Stealthy projector-based adversarial
A. Prakash, T. Kohno, and D. Song, “Robust physical-world attacks attacks on deep image classifiers,” in 2022 IEEE Conference on Virtual
on deep learning visual classification,” in Proceedings of the IEEE Reality and 3D User Interfaces (VR). IEEE, 2022, pp. 534–542.
conference on computer vision and pattern recognition, 2018, pp. [69] M. Sharif, S. Bhagavatula, L. Bauer, and M. K. Reiter, “A general
1625–1634. framework for adversarial examples with objectives,” ACM Transac-
[47] S. Thys, W. Van Ranst, and T. Goedemé, “Fooling automated surveil- tions on Privacy and Security (TOPS), vol. 22, no. 3, pp. 1–30, 2019.
lance cameras: adversarial patches to attack person detection,” in [70] I. Singh, S. Momiyama, K. Kakizaki, and T. Araki, “On brightness
Proceedings of the IEEE/CVF conference on computer vision and agnostic adversarial examples against face recognition systems,” in
pattern recognition workshops, 2019, pp. 0–0. 2021 International Conference of the Biometrics Special Interest Group
[48] L. Huang, C. Gao, Y. Zhou, C. Xie, A. L. Yuille, C. Zou, and (BIOSIG). IEEE, 2021, pp. 1–5.
N. Liu, “Universal physical camouflage attacks on object detectors,” [71] B. Zhang, B. Tondi, and M. Barni, “Adversarial examples for replay
in Proceedings of the IEEE/CVF Conference on Computer Vision and attacks against cnn-based face recognition with anti-spoofing capabil-
Pattern Recognition, 2020, pp. 720–729. ity,” Computer Vision and Image Understanding, vol. 197, p. 102988,
[49] A. Athalye, L. Engstrom, A. Ilyas, and K. Kwok, “Synthesizing 2020.
robust adversarial examples,” in International conference on machine [72] M. Pautov, G. Melnikov, E. Kaziakhmedov, K. Kireev, and
learning. PMLR, 2018, pp. 284–293. A. Petiushko, “On adversarial patches: real-world attack on arcface-100
[50] D. Song, K. Eykholt, I. Evtimov, E. Fernandes, B. Li, A. Rahmati, face recognition system,” in 2019 International Multi-Conference on
F. Tramer, A. Prakash, and T. Kohno, “Physical adversarial examples Engineering, Computer and Information Sciences (SIBIRCON). IEEE,
for object detectors,” in 12th USENIX workshop on offensive technolo- 2019, pp. 0391–0396.
gies (WOOT 18), 2018. [73] S. Komkov and A. Petiushko, “Advhat: Real-world adversarial attack
[51] M. Sharif, S. Bhagavatula, L. Bauer, and M. K. Reiter, “Accessorize to on arcface face id system,” in 2020 25th International Conference on
a crime: Real and stealthy attacks on state-of-the-art face recognition,” Pattern Recognition (ICPR). IEEE, 2021, pp. 819–826.
in Proceedings of the 2016 acm sigsac conference on computer and [74] Z. Xiao, X. Gao, C. Fu, Y. Dong, W. Gao, X. Zhang, J. Zhou,
communications security, 2016, pp. 1528–1540. and J. Zhu, “Improving transferability of adversarial patches on face
[52] A. Mahendran and A. Vedaldi, “Understanding deep image represen- recognition with generative models,” in Proceedings of the IEEE/CVF
tations by inverting them,” in Proceedings of the IEEE conference on Conference on Computer Vision and Pattern Recognition, 2021, pp.
computer vision and pattern recognition, 2015, pp. 5188–5196. 11 845–11 854.
[53] S. T. Jan, J. Messou, Y.-C. Lin, J.-B. Huang, and G. Wang, “Connecting
[75] X. Wei, Y. Guo, J. Yu, and B. Zhang, “Simultaneously optimizing
the digital and physical world: Improving the robustness of adversarial
perturbations and positions for black-box adversarial patch attacks,”
attacks,” in Proceedings of the AAAI Conference on Artificial Intelli-
arXiv preprint arXiv:2212.12995, 2022.
gence, vol. 33, no. 01, 2019, pp. 962–969.
[76] A. Zolfi, S. Avidan, Y. Elovici, and A. Shabtai, “Adversarial mask:
[54] A. Liu, J. Wang, X. Liu, B. Cao, C. Zhang, and H. Yu, “Bias-
Real-world adversarial attack against face recognition models,” 2022.
based universal adversarial patch attack for automatic check-out,” in
[77] D.-L. Nguyen, S. S. Arora, Y. Wu, and H. Yang, “Adversarial light
European conference on computer vision. Springer, 2020, pp. 395–
projection attacks on face recognition systems: A feasibility study,”
410.
in Proceedings of the IEEE/CVF conference on computer vision and
[55] A. Kurakin, I. J. Goodfellow, and S. Bengio, “Adversarial examples
pattern recognition workshops, 2020, pp. 814–815.
in the physical world,” in 5th International Conference on Learning
Representations. OpenReview.net, 2017, pp. 24–26. [78] C. Sitawarin, A. N. Bhagoji, A. Mosenia, P. Mittal, and M. Chiang,
[56] Q. Guo, F. Juefei-Xu, X. Xie, L. Ma, J. Wang, B. Yu, W. Feng, and “Rogue signs: Deceiving traffic sign recognition with malicious ads
Y. Liu, “Watch out! motion is blluurrrri inngg the vision of your deep and logos,” arXiv preprint arXiv:1801.02780, 2018.
neural networks,” in Proceedings of the 34th International Conference [79] A. Liu, X. Liu, J. Fan, Y. Ma, A. Zhang, H. Xie, and D. Tao,
on Neural Information Processing Systems, 2020, pp. 975–985. “Perceptual-sensitive gan for generating adversarial patches,” in Pro-
[57] W. Feng, B. Wu, T. Zhang, Y. Zhang, and Y. Zhang, “Meta-attack: ceedings of the AAAI conference on artificial intelligence, vol. 33,
Class-agnostic and model-agnostic physical adversarial attack,” in no. 01, 2019, pp. 1028–1035.
Proceedings of the IEEE/CVF International Conference on Computer [80] R. Duan, X. Ma, Y. Wang, J. Bailey, A. K. Qin, and Y. Yang,
Vision, 2021, pp. 7787–7796. “Adversarial camouflage: Hiding physical-world attacks with natural
[58] B. Phan, F. Mannan, and F. Heide, “Adversarial imaging pipelines,” styles,” in Proceedings of the IEEE/CVF conference on computer vision
in Proceedings of the IEEE/CVF Conference on Computer Vision and and pattern recognition, 2020, pp. 1000–1008.
Pattern Recognition, 2021, pp. 16 051–16 061. [81] Y. Zhong, X. Liu, D. Zhai, J. Jiang, and X. Ji, “Shadows can be
[59] T. B. Brown, D. Mané, A. Roy, M. Abadi, and J. Gilmer, “Adversarial dangerous: Stealthy and effective physical-world adversarial attack by
patch,” Advances in Neural Information Processing Systems, 2017. natural phenomenon,” in Proceedings of the IEEE/CVF Conference on
[60] S. Casper, M. Nadeau, D. Hadfield-Menell, and G. Kreiman, “Robust Computer Vision and Pattern Recognition, 2022, pp. 15 345–15 354.
feature-level adversaries are interpretability tools,” in Advances in [82] Z. Wang, S. Zheng, M. Song, Q. Wang, A. Rahimpour, and H. Qi,
Neural Information Processing Systems, 2022. “advpattern: physical-world attacks on deep person re-identification via
[61] Y. Dong, S. Ruan, H. Su, C. Kang, X. Wei, and J. Zhu, “Viewfool: adversarially transformable patterns,” in Proceedings of the IEEE/CVF
Evaluating the robustness of visual recognition to adversarial view- International Conference on Computer Vision, 2019, pp. 8341–8350.
points,” in Advances in Neural Information Processing Systems, A. H. [83] Z. Kong, J. Guo, A. Li, and C. Liu, “Physgan: Generating physical-
Oh, A. Agarwal, D. Belgrave, and K. Cho, Eds., 2022. world-resilient adversarial examples for autonomous driving,” in Pro-
[62] N. Nichols and R. Jasper, “Projecting trouble: Light based adversarial ceedings of the IEEE/CVF Conference on Computer Vision and Pattern
attacks on deep learning classifiers,” Proceedings of the AAAI Fall Recognition, 2020, pp. 14 254–14 263.
Symposium on Adversary-Aware Learning Techniques and Trends in [84] Y. Qian, D. Ma, B. Wang, J. Pan, J. Wang, Z. Gu, J. Chen, W. Zhou,
Cybersecurity, 2018. and J. Lei, “Spot evasion attacks: Adversarial examples for license plate
[63] Y. Man, M. Li, and R. Gerdes, “Poster: Perceived adversarial exam- recognition systems with convolutional neural networks,” Computers &
ples,” in IEEE Symposium on Security and Privacy, no. 2019, 2019. Security, vol. 95, p. 101826, 2020.
[64] A. Gnanasambandam, A. M. Sherman, and S. H. Chan, “Optical [85] Y. Zhao, H. Zhu, R. Liang, Q. Shen, S. Zhang, and K. Chen, “Seeing
adversarial attack,” in Proceedings of the IEEE/CVF International isn’t believing: Towards more robust adversarial attack against real
Conference on Computer Vision, 2021, pp. 92–101. world object detectors,” in Proceedings of the 2019 ACM SIGSAC
[65] A. Sayles, A. Hooda, M. Gupta, R. Chatterjee, and E. Fernandes, Conference on Computer and Communications Security, 2019, pp.
“Invisible perturbations: Physical adversarial examples exploiting the 1989–2004.
rolling shutter effect,” in Proceedings of the IEEE/CVF Conference on [86] M. Lee and Z. Kolter, “On physical adversarial patches for object
Computer Vision and Pattern Recognition, 2021, pp. 14 666–14 675. detection,” arXiv preprint arXiv:1906.11897, 2019.
[87] K. Yang, T. Tsai, H. Yu, T.-Y. Ho, and Y. Jin, “Beyond digital domain: of the IEEE international conference on computer vision, 2017, pp.
Fooling deep learning based recognition system in physical world,” in 2223–2232.
Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, [108] M. Jaderberg, K. Simonyan, A. Zisserman et al., “Spatial transformer
no. 01, 2020, pp. 1088–1095. networks,” Advances in neural information processing systems, vol. 28,
[88] A. Zolfi, M. Kravchik, Y. Elovici, and A. Shabtai, “The translucent 2015.
patch: A physical and universal attack on object detectors,” in Pro- [109] C. Xiao, B. Li, J.-Y. Zhu, W. He, M. Liu, and D. Song, “Generating
ceedings of the IEEE/CVF Conference on Computer Vision and Pattern adversarial examples with adversarial networks,” in Proceedings of the
Recognition, 2021, pp. 15 232–15 241. 27th International Joint Conference on Artificial Intelligence, 2018, p.
[89] A. Shapira, R. Bitton, D. Avraham, A. Zolfi, Y. Elovici, and A. Shabtai, 3905–3911.
“Attacking object detector using a universal targeted label-switch [110] Z.-A. Zhu, Y.-Z. Lu, and C.-K. Chiang, “Generating adversarial
patch,” arXiv preprint arXiv:2211.08859, 2022. examples by makeup attacks on face recognition,” in 2019 IEEE
[90] S. Hoory, T. Shapira, A. Shabtai, and Y. Elovici, “Dynamic ad- International Conference on Image Processing (ICIP). IEEE, 2019,
versarial patch for evading object detection models,” arXiv preprint pp. 2516–2520.
arXiv:2010.13070, 2020. [111] T. Malzbender, D. Gelb, and H. Wolters, “Polynomial texture maps,”
[91] D. Y. Yang, J. Xiong, X. Li, X. Yan, J. Raiti, Y. Wang, H. Wu, in Proceedings of the 28th annual conference on Computer graphics
and Z. Zhong, “Building towards” invisible cloak”: Robust physical and interactive techniques, 2001, pp. 519–528.
adversarial attack on yolo object detector,” in 2018 9th IEEE Annual [112] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and
Ubiquitous Computing, Electronics & Mobile Communication Confer- D. Batra, “Grad-cam: Visual explanations from deep networks via
ence (UEMCON). IEEE, 2018, pp. 368–374. gradient-based localization,” in Proceedings of the IEEE international
[92] K. Xu, G. Zhang, S. Liu, Q. Fan, M. Sun, H. Chen, P.-Y. Chen, conference on computer vision, 2017, pp. 618–626.
Y. Wang, and X. Lin, “Adversarial t-shirt! evading person detectors in a [113] L. A. Gatys, A. S. Ecker, and M. Bethge, “Image style transfer using
physical world,” in European conference on computer vision. Springer, convolutional neural networks,” in Proceedings of the IEEE conference
2020, pp. 665–681. on computer vision and pattern recognition, 2016, pp. 2414–2423.
[93] Z. Wu, S.-N. Lim, L. S. Davis, and T. Goldstein, “Making an invisibility [114] J. Redmon and A. Farhadi, “Yolov3: An incremental improvement,”
cloak: Real world adversarial attacks on object detectors,” in European arXiv preprint arXiv:1804.02767, 2018.
Conference on Computer Vision. Springer, 2020, pp. 1–17. [115] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu,
[94] J. Tan, N. Ji, H. Xie, and X. Xiang, “Legitimate adversarial patches: and A. C. Berg, “Ssd: Single shot multibox detector,” in European
Evading human eyes and detection models in the physical world,” in conference on computer vision. Springer, 2016, pp. 21–37.
Proceedings of the 29th ACM International Conference on Multimedia, [116] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss
2021, pp. 5307–5315. for dense object detection,” in Proceedings of the IEEE international
[95] Y.-C.-T. Hu, B.-H. Kung, D. S. Tan, J.-C. Chen, K.-L. Hua, and W.-H. conference on computer vision, 2017, pp. 2980–2988.
Cheng, “Naturalistic physical adversarial patch for object detectors,” in [117] S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time
Proceedings of the IEEE/CVF International Conference on Computer object detection with region proposal networks,” Advances in neural
Vision, 2021, pp. 7848–7857. information processing systems, vol. 28, pp. 91–99, 2015.
[96] Z. Hu, S. Huang, X. Zhu, F. Sun, B. Zhang, and X. Hu, “Adversarial [118] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask r-cnn,” in
texture for fooling person detectors in the physical world,” in Proceed- Proceedings of the IEEE international conference on computer vision,
ings of the IEEE/CVF Conference on Computer Vision and Pattern 2017, pp. 2961–2969.
Recognition, 2022, pp. 13 307–13 316. [119] X. Liu, H. Yang, Z. Liu, L. Song, H. Li, and Y. Chen, “Dpatch:
[97] G. Lovisotto, H. Turner, I. Sluganovic, M. Strohmeier, and I. Mar- An adversarial patch attack on object detectors,” arXiv preprint
tinovic, “Slap: Improving physical adversarial examples with short- arXiv:1806.02299, 2018.
lived adversarial perturbations,” in 30th USENIX Security Symposium [120] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie,
(USENIX Security 21), 2021, pp. 1865–1882. “Feature pyramid networks for object detection,” in Proceedings of the
[98] X. Zeng, C. Liu, Y.-S. Wang, W. Qiu, L. Xie, Y.-W. Tai, C.-K. Tang, IEEE conference on computer vision and pattern recognition, 2017,
and A. L. Yuille, “Adversarial attacks beyond the image space,” in pp. 2117–2125.
Proceedings of the IEEE/CVF Conference on Computer Vision and [121] X. Yang, F. Wei, H. Zhang, and J. Zhu, “Design and interpretation of
Pattern Recognition, 2019, pp. 4302–4311. universal adversarial patches in face detection,” in European Confer-
[99] Y. Zhang, P. H. Foroosh, and B. Gong, “Camou: Learning a vehicle ence on Computer Vision. Springer, 2020, pp. 174–191.
camouflage for physical adversarial attack on object detections in the [122] C. Xiao, D. Yang, B. Li, J. Deng, and M. Liu, “Meshadv: Adversarial
wild,” ICLR, 2019. meshes for visual recognition,” in Proceedings of the IEEE/CVF
[100] T. Wu, X. Ning, W. Li, R. Huang, H. Yang, and Y. Wang, “Physical Conference on Computer Vision and Pattern Recognition, 2019, pp.
adversarial attack on vehicle detector in the carla simulator,” arXiv 6898–6907.
preprint arXiv:2007.16118, 2020. [123] A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, “Carla:
[101] A. Du, B. Chen, T.-J. Chin, Y. W. Law, M. Sasdelli, R. Rajasegaran, An open urban driving simulator,” in Conference on robot learning.
and D. Campbell, “Physical adversarial attacks on an aerial imagery PMLR, 2017, pp. 1–16.
object detector,” in Proceedings of the IEEE/CVF Winter Conference [124] Z. Li, C. Peng, G. Yu, X. Zhang, Y. Deng, and J. Sun, “Light-
on Applications of Computer Vision, 2022, pp. 1796–1806. head r-cnn: In defense of two-stage object detector,” arXiv preprint
[102] F. Nesti, G. Rossolini, S. Nair, A. Biondi, and G. Buttazzo, “Evalu- arXiv:1711.07264, 2017.
ating the robustness of semantic segmentation for autonomous driving [125] H. Kato, Y. Ushiku, and T. Harada, “Neural 3d mesh renderer,” in
against real-world adversarial patch attacks,” in Proceedings of the Proceedings of the IEEE conference on computer vision and pattern
IEEE/CVF Winter Conference on Applications of Computer Vision, recognition, 2018, pp. 3907–3916.
2022, pp. 2280–2289. [126] M. Tan, R. Pang, and Q. V. Le, “Efficientdet: Scalable and efficient
[103] Z. Cheng, J. Liang, H. Choi, G. Tao, Z. Cao, D. Liu, and X. Zhang, object detection,” in Proceedings of the IEEE/CVF conference on
“Physical attack on monocular depth estimation with optimal adversar- computer vision and pattern recognition, 2020, pp. 10 781–10 790.
ial patches,” in European Conference on Computer Vision. Springer, [127] J. Byun, S. Cho, M.-J. Kwon, H.-S. Kim, and C. Kim, “Improving
2022, pp. 514–532. the transferability of targeted adversarial examples through object-
[104] R. R. Wiyatno and A. Xu, “Physical adversarial textures that fool based diverse input,” in Proceedings of the IEEE/CVF Conference on
visual object tracking,” in Proceedings of the IEEE/CVF International Computer Vision and Pattern Recognition, 2022, pp. 15 244–15 253.
Conference on Computer Vision, 2019, pp. 4822–4831. [128] M. A. Alcorn, Q. Li, Z. Gong, C. Wang, L. Mai, W.-S. Ku, and
[105] L. Ding, Y. Wang, K. Yuan, M. Jiang, P. Wang, H. Huang, and Z. J. A. Nguyen, “Strike (with) a pose: Neural networks are easily fooled
Wang, “Towards universal physical attacks on single object tracking,” by strange poses of familiar objects,” in Proceedings of the IEEE/CVF
in AAAI, 2021, pp. 1236–1245. Conference on Computer Vision and Pattern Recognition, 2019, pp.
[106] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image 4845–4854.
translation with conditional adversarial networks,” in Proceedings of [129] R. den Hollander, A. Adhikari, I. Tolios, M. van Bekkum, A. Bal,
the IEEE conference on computer vision and pattern recognition, 2017, S. Hendriks, M. Kruithof, D. Gross, N. Jansen, G. Perez et al.,
pp. 1125–1134. “Adversarial patch camouflage against aerial detection,” in Artificial
[107] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image Intelligence and Machine Learning in Defense Applications II, vol.
translation using cycle-consistent adversarial networks,” in Proceedings 11543. SPIE, 2020, pp. 77–86.
[130] L. Chen, G. Zhu, Q. Li, and H. Li, “Adversarial example in remote tion,” in Proceedings of the IEEE international conference on computer
sensing image recognition,” arXiv preprint arXiv:1910.13222, 2019. vision, 2017, pp. 2755–2764.
[131] L. Chen, H. Li, G. Zhu, Q. Li, J. Zhu, H. Huang, J. Peng, and L. Zhao, [153] K. K. Nakka and M. Salzmann, “Indirect local attacks for context-
“Attack selectivity of adversarial examples in remote sensing image aware semantic segmentation networks,” in European Conference on
scene classification,” IEEE Access, vol. 8, pp. 137 477–137 489, 2020. Computer Vision. Springer, 2020, pp. 611–628.
[132] Y. Xu, B. Du, and L. Zhang, “Assessing the threat of adversarial exam- [154] A. Arnab, O. Miksik, and P. H. Torr, “On the robustness of semantic
ples on deep neural networks for remote sensing scene classification: segmentation models to adversarial attacks,” in Proceedings of the
Attacks and defenses,” IEEE Transactions on Geoscience and Remote IEEE Conference on Computer Vision and Pattern Recognition, 2018,
Sensing, vol. 59, no. 2, pp. 1604–1617, 2020. pp. 888–897.
[133] Y. Wang, K. Wang, Z. Zhu, and F.-Y. Wang, “Adversarial attacks on [155] M. Alzantot, Y. Sharma, S. Chakraborty, H. Zhang, C.-J. Hsieh, and
faster r-cnn object detector,” Neurocomputing, vol. 382, pp. 87–95, M. B. Srivastava, “Genattack: Practical black-box attacks with gradient-
2020. free optimization,” in Proceedings of the Genetic and Evolutionary
[134] K.-H. Chow, L. Liu, M. Loper, J. Bae, M. E. Gursoy, S. Truex, W. Wei, Computation Conference, 2019, pp. 1111–1119.
and Y. Wu, “Adversarial objectness gradient attacks in real-time object [156] M. Andriushchenko, F. Croce, N. Flammarion, and M. Hein, “Square
detection systems,” in 2020 Second IEEE International Conference on attack: a query-efficient black-box adversarial attack via random
Trust, Privacy and Security in Intelligent Systems and Applications search,” in European Conference on Computer Vision. Springer, 2020,
(TPS-ISA). IEEE, 2020, pp. 263–272. pp. 484–501.
[135] D. Wang, C. Li, S. Wen, Q.-L. Han, S. Nepal, X. Zhang, and Y. Xiang, [157] C. Li, W. Yao, H. Wang, and T. Jiang, “Adaptive momentum variance
“Daedalus: Breaking nonmaximum suppression in object detection via for attention-guided sparse adversarial attacks,” Pattern Recognition,
adversarial examples,” IEEE Transactions on Cybernetics, 2021. vol. 133, p. 108979, 2023.
[136] Q. Liao, X. Wang, B. Kong, S. Lyu, Y. Yin, Q. Song, and X. Wu, [158] A. Ghosh, S. S. Mullick, S. Datta, S. Das, A. K. Das, and R. Mallipeddi,
“Fast local attack: Generating local adversarial examples for object “A black-box adversarial attack strategy with adjustable sparsity and
detectors,” in 2020 International Joint Conference on Neural Networks generalizability for deep image classifiers,” Pattern Recognition, vol.
(IJCNN). IEEE, 2020, pp. 1–8. 122, p. 108279, 2022.
[137] J. Bao, “Sparse adversarial attack to object detection,” arXiv preprint [159] A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun,
arXiv:2012.13692, 2020. “CARLA: An open urban driving simulator,” in Proceedings of the
[138] S. Wu, T. Dai, and S.-T. Xia, “Dpattack: Diffused patch attacks against 1st Annual Conference on Robot Learning, 2017, pp. 1–16.
universal object detection,” arXiv preprint arXiv:2010.11679, 2020. [160] S. Shah, D. Dey, C. Lovett, and A. Kapoor, “Airsim: High-
[139] H. Huang, Y. Wang, Z. Chen, Z. Tang, W. Zhang, and K.-K. Ma, fidelity visual and physical simulation for autonomous vehicles,”
“Rpattack: Refined patch attack on general object detectors,” in 2021 in Field and Service Robotics, 2017. [Online]. Available: https:
IEEE International Conference on Multimedia and Expo (ICME). //arxiv.org/abs/1705.05065
IEEE, 2021, pp. 1–6. [161] H. Salman, A. Ilyas, L. Engstrom, S. Vemprala, A. Madry, and
A. Kapoor, “Unadversarial examples: Designing objects for robust
[140] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and
vision,” Advances in Neural Information Processing Systems, vol. 34,
A. Swami, “The limitations of deep learning in adversarial settings,” in
pp. 15 270–15 284, 2021.
2016 IEEE European symposium on security and privacy (EuroS&P).
[162] J. Wang, Z. Yin, P. Hu, A. Liu, R. Tao, H. Qin, X. Liu, and D. Tao,
IEEE, 2016, pp. 372–387.
“Defensive patches for robust recognition in the physical world,” in
[141] F. Croce and M. Hein, “Sparse and imperceivable adversarial attacks,”
Proceedings of the IEEE/CVF Conference on Computer Vision and
in Proceedings of the IEEE/CVF International Conference on Com-
Pattern Recognition, 2022, pp. 2456–2465.
puter Vision, 2019, pp. 4724–4732.
[142] A. Modas, S.-M. Moosavi-Dezfooli, and P. Frossard, “Sparsefool: a
few pixels make a big difference,” in Proceedings of the IEEE/CVF
conference on computer vision and pattern recognition, 2019, pp.
9087–9096.
[143] X. Dong, D. Chen, J. Bao, C. Qin, L. Yuan, W. Zhang, N. Yu,
and D. Chen, “Greedyfool: Distortion-aware sparse adversarial attack,”
Advances in Neural Information Processing Systems, vol. 33, pp.
11 226–11 236, 2020.
[144] Z. He, W. Wang, J. Dong, and T. Tan, “Transferable sparse adversarial
attack,” in Proceedings of the IEEE/CVF Conference on Computer
Vision and Pattern Recognition, 2022, pp. 14 963–14 972.
[145] Y. Zhao, H. Yan, and X. Wei, “Object hider: Adversarial patch attack
against object detectors,” arXiv preprint arXiv:2010.14974, 2020.
[146] D. Li, J. Zhang, and K. Huang, “Universal adversarial perturbations
against object detection,” Pattern Recognition, vol. 110, p. 107584,
2021.
[147] S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard, “Univer-
sal adversarial perturbations,” in Proceedings of the IEEE conference
on computer vision and pattern recognition, 2017, pp. 1765–1773.
[148] Z. Zhu, H. Su, C. Liu, W. Xiang, and S. Zheng, “You cannot easily
catch me: A low-detectable adversarial patch for object detectors,”
arXiv preprint arXiv:2109.15177, 2021.
[149] Y. Cao, C. Xiao, B. Cyr, Y. Zhou, W. Park, S. Rampazzi, Q. A.
Chen, K. Fu, and Z. M. Mao, “Adversarial sensor attack on lidar-based
perception in autonomous driving,” in Proceedings of the 2019 ACM
SIGSAC conference on computer and communications security, 2019,
pp. 2267–2281.
[150] T. Sato, J. Shen, N. Wang, Y. Jia, X. Lin, and Q. A. Chen, “Dirty road
can attack: Security of deep learning based automated lane centering
under {Physical-World} attack,” in 30th USENIX Security Symposium
(USENIX Security 21), 2021, pp. 3309–3326.
[151] J. Tu, H. Li, X. Yan, M. Ren, Y. Chen, M. Liang, E. Bitar, E. Yumer,
and R. Urtasun, “Exploring adversarial robustness of multi-sensor
perception systems in self driving,” arXiv preprint arXiv:2101.06784,
2021.
[152] J. Hendrik Metzen, M. Chaithanya Kumar, T. Brox, and V. Fischer,
“Universal adversarial perturbations against semantic image segmenta-

A Survey On Physical Adversarial Attack in CV A SURVEY

Uploaded by

Copyright:

Available Formats

You might also like

A Survey On Physical Adversarial Attack in CV A SURVEY

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

A Survey On Physical Adversarial Attack in CV A SURVEY

Uploaded by

Copyright:

Available Formats

JOURNAL OF LATEX CLASS FILES, VOL. 18, NO.

A Survey on Physical Adversarial Attack in

Abstract—In the past decade, deep learning has dramatically

B. Adversarial examples A. Adversary’s knowledge

object region to mislead the target model, called instance-level TABLE I

1) Stationary physical test: The stationary physical ad-

G. Physical Testing environment

Picture Physical test set

Stage 2:Test the adversarial perturbation in physical space

Fig. 3. Overview of the physical adversarial attack process.

Adversarial’s Threat Adversarial Robust Physical

remain physically aggressive when printed out and retaken by

Adversarial’s Threat Adversarial Robust Physical

adversarial patch’s robustness, and TV loss is adopted to

used the bend and rotation transformation in the 3D space to

argmin λ||Mx · δ||p + N P S

where pyadv (·) is the probability output (softmax on logits) of

A. Patch-based physical adversarial attack

1 X author captured the video of the adversarial object beginning

where A(x, δ, t) is the patch apply function, which pastes

examples from the simulator environment and used to query

You might also like