Wild Patterns: Ten Years After The Rise of Adversarial Machine Learning
arXiv:1712.03141v2 [cs.CV] 19 Jul 2018

Abstract
Learning-based pattern classifiers, including deep networks, have shown impressive performance in several application domains, ranging from computer vision to cybersecurity. However, it has also been shown that adversarial input perturbations carefully crafted either at training or at test time can easily subvert their predictions. The vulnerability of machine learning to such wild
patterns (also referred to as adversarial examples), along with the design of suitable countermeasures, has been investigated in the research field of adversarial machine learning. In this work, we provide a thorough overview of the evolution of this research area over the last ten years and beyond, starting from pioneering, earlier work on the security of non-deep learning algorithms, up to more recent work aimed at understanding the security properties of deep learning algorithms, in the context of computer vision and cybersecurity tasks. We report interesting connections between these apparently different lines of work, highlighting common misconceptions related to the security evaluation of machine-learning algorithms. We review the main threat models and attacks defined to this end, and discuss the main limitations of current work, along with the corresponding future challenges towards the design of more secure learning algorithms.
Keywords: Adversarial Machine Learning; Evasion Attacks; Poisoning Attacks; Adversarial Examples; Secure Learning; Deep
Learning
Figure 1: A conceptual representation of the reactive (left) and proactive (right) arms races for pattern recognition and machine learning systems in computer
security [41, 42].
the newly-deployed attacks and designs novel countermeasures against them. However, reactive approaches are clearly not able to prevent the risk of never-before-seen attacks. To this end, the designer should follow a proactive approach to anticipate the attacker by (i) identifying relevant threats against the system under design and simulating the corresponding attacks, (ii) devising suitable countermeasures (if deemed necessary), and (iii) repeating this process before system deployment (Fig. 1, right). In practice, these steps are facilitated by leveraging a thorough model of the attacker, like that discussed in the next section, which helps envisioning and analyzing a number of potential attack scenarios against learning-based systems.

Figure 2: Examples of clean (top) and obfuscated (bottom) spam images [57].

3. Know Your Adversary: Modeling Threats

"If you know the enemy and know yourself, you need not fear the result of a hundred battles." (Sun Tzu, The Art of War, 500 BC)

We discuss here the first golden rule of the proactive security cycle discussed in the previous section, i.e., how to model threats against learning-based systems and thoroughly evaluate their security against the corresponding attacks. To this end, we exploit a framework based on the popular attack taxonomy proposed in [23, 40, 61] and subsequently extended in [7, 30, 33, 41, 42], which enables one to envision different attack scenarios against learning algorithms and deep networks, and to implement the corresponding attack strategies. Notably, these attacks include training-time poisoning and test-time evasion attacks (also recently referred to as adversarial training and test examples) [3–5, 14–16, 26, 29, 31–33, 38, 41, 61]. The framework consists of defining the attacker's goal, knowledge of the targeted system, and capability of manipulating the input data, to subsequently define an optimization problem corresponding to the optimal attack strategy. The solution to this problem provides a way to manipulate input data to achieve the attacker's goal. While this framework only considers attacks against supervised learning algorithms, we refer the reader to similar threat models to evaluate the security of clustering [62–64] and feature selection algorithms [29, 65] under different attack settings.

Notation. In the following, we denote the sample and label spaces with X and Y, respectively, and the training data with D = (xi, yi)_{i=1}^n, with n being the number of training samples. We use L(D, w) to denote the loss incurred by the classifier f : X → Y (parameterized by w) on D. We assume that the classification function f is learned by minimizing an objective function L(D, w) on the training data. Typically, this is an estimate of the generalization error, obtained by the sum of the empirical loss L on D and a regularization term.

3.1. Attacker's Goal

This aspect is defined in terms of the desired security violation, attack specificity, and error specificity, as detailed below.

Security Violation. The attacker may aim to cause: an integrity violation, to evade detection without compromising normal system operation; an availability violation, to compromise the normal system functionalities available to legitimate users; or a privacy violation, to obtain private information about the system, its users or data by reverse-engineering the learning algorithm.

Attack Specificity. It ranges from targeted to indiscriminate, depending on whether the attacker aims to cause misclassification of a specific set of samples (to target a given system user or protected service), or of any sample (to target any system user or protected service).

Error Specificity. It can be specific, if the attacker aims to have a sample misclassified as a specific class; or generic, if the attacker aims to have a sample misclassified as any of the classes different from the true class.^3

^3 In [14], the authors defined targeted and indiscriminate attacks (at test time) depending on whether the attacker aims to cause specific or generic errors. Here we do not follow their naming convention, as it can cause confusion with the interpretation of targeted and indiscriminate attack specificity also introduced in previous work [23, 29, 30, 40–42, 61–64].
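The three dimensions of the attacker's goal lend themselves to a simple data-structure encoding. The sketch below is purely illustrative: all class and field names are our own, not from the paper or any established library.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical encoding of the attacker's goal taxonomy described above.

class SecurityViolation(Enum):
    INTEGRITY = "evade detection without compromising normal operation"
    AVAILABILITY = "compromise functionality available to legitimate users"
    PRIVACY = "obtain private information about the system, users or data"

class AttackSpecificity(Enum):
    TARGETED = "misclassify a specific set of samples"
    INDISCRIMINATE = "misclassify any sample"

class ErrorSpecificity(Enum):
    SPECIFIC = "misclassified as a chosen class"
    GENERIC = "misclassified as any class other than the true one"

@dataclass
class AttackerGoal:
    violation: SecurityViolation
    specificity: AttackSpecificity
    error: ErrorSpecificity

# Example: an evasion attack targeting one user, aiming at a chosen class.
goal = AttackerGoal(SecurityViolation.INTEGRITY,
                    AttackSpecificity.TARGETED,
                    ErrorSpecificity.SPECIFIC)
```

Such an encoding makes it easy to enumerate the attack scenarios of Fig. 4 programmatically, one combination per (violation, specificity, error) triple.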
3.2. Attacker's Knowledge

The attacker can have different levels of knowledge of the targeted system, including: (k.i) the training data D; (k.ii) the feature set X; (k.iii) the learning algorithm f, along with the objective function L minimized during training; and, possibly, (k.iv) its (trained) parameters/hyper-parameters w. The attacker's knowledge can thus be characterized in terms of a space Θ, whose elements encode the components (k.i)-(k.iv) as θ = (D, X, f, w). Depending on the assumptions made on (k.i)-(k.iv), one can describe different attack scenarios.

Perfect-Knowledge (PK) White-Box Attacks. Here the attacker is assumed to know everything about the targeted system, i.e., θPK = (D, X, f, w). This setting allows one to perform a worst-case evaluation of the security of learning algorithms, providing empirical upper bounds on the performance degradation that may be incurred by the system under attack.

Limited-Knowledge (LK) Gray-Box Attacks. One may consider here different settings, depending on the attacker's knowledge about each of the components (k.i)-(k.iv). Typically, the attacker is assumed to know the feature representation X and the kind of learning algorithm f (e.g., the fact that the classifier is linear, or that it is a neural network with a given architecture, etc.), but neither the training data D nor the classifier's (trained) parameters w. The attacker is however assumed to be able to collect a surrogate data set D̂^4 from a similar source (ideally sampling from the same underlying data distribution), and potentially get feedback from the classifier about its decisions to provide labels for such data. This enables the attacker to estimate the parameters ŵ from D̂, by training a surrogate classifier. We refer to this case as LK attacks with Surrogate Data (LK-SD), and denote it with θLK-SD = (D̂, X, f, ŵ).

We refer to the setting in which the attacker does not even know the kind of learning algorithm f as LK attacks with Surrogate Learners (LK-SL), and denote it with θLK-SL = (D̂, X, f̂, ŵ). LK-SL attacks also include the case in which the attacker knows the learning algorithm, but optimizing the attack samples against it may not be tractable or may be too complex. In this case, the attacker can also craft the attacks against a surrogate classifier and test them against the targeted one. This is a common procedure used also to evaluate the transferability of attacks between learning algorithms, as first shown in [38] and subsequently in [15] for deep networks.

Zero-Knowledge (ZK) Black-Box Attacks. Recent work has also claimed that machine learning can be threatened without any substantial knowledge of the feature space, the learning algorithm and the training data, if the attacker can query the system in a black-box manner and get feedback on the provided labels or confidence scores [15, 66–69]. This point however deserves some clarification. First, the attacker knows (as any other potential user) that the classifier is designed to perform some task (e.g., object recognition in images, malware classification, etc.), and has to clearly have an idea of which potential transformations to apply to cause some feature changes; otherwise, no change can be inflicted on the output of the classification function, nor can any useful information be extracted from it. For example, if one attacks a malware detector based on dynamic analysis by injecting static code that will never be executed, there will be no impact at all on the classifier's decisions. This means that, although the exact feature representation may not be known to the attacker, at least she knows (or has to get to know) which kind of features are used by the system (e.g., features based on static or dynamic analysis in malware detection). Thus, knowledge of the feature representation may be partial, but not completely absent. This is even more evident for deep networks trained on images, where the attacker knows that the input features are the image pixels.

Similar considerations hold for knowledge of the training data. If the attacker knows that the classifier is used for a specific task, it is clear that she also knows which kind of data has been used to train it; for example, if a deep network aims to discriminate among classes of animals, then it is clear that it has been trained on images of such animals. Hence, also in this case the attacker effectively has some knowledge of the training data, even if not of the exact training samples.

We thus characterize this setting as θZK = (D̂, X̂, f̂, ŵ). Even if surrogate learners are not necessarily used here [66–69], as well as in pioneering work on black-box attacks against machine learning [20, 70], one may anyway learn a surrogate classifier (potentially on a different feature representation) and check whether the crafted attack samples transfer to the targeted classifier. Feedback from the classifier's decisions on carefully-crafted query samples can then be used to refine the surrogate model, as in [15]. Although the problem of learning a surrogate model while minimizing the number of queries can be cast as an active learning problem, to our knowledge well-established active learning algorithms have not yet been compared against such recently-proposed approaches [15].

3.3. Attacker's Capability

This characteristic depends on the influence that the attacker has on the input data, and on application-specific data manipulation constraints.

Attack Influence. It can be causative, if the attacker can manipulate both training and test data, or exploratory, if the attacker can only manipulate test data. These scenarios are more commonly known as poisoning and evasion attacks [23, 26, 29, 31, 33, 38, 40–42, 61].

Data Manipulation Constraints. Another aspect related to the attacker's capability depends on the presence of application-specific constraints on data manipulation; e.g., to evade malware detection, malicious code has to be modified without compromising its intrusive functionality. This may be done against systems based on static code analysis, by injecting instructions or code that will never be executed [11, 38, 39, 48]. These constraints can be generally accounted for in the definition of the optimal attack strategy by assuming that the initial attack samples Dc can only be modified according to a space of possible modifications Φ(Dc). In some cases, this space can also be mapped in terms of constraints on the feature values of the attack samples; e.g., by imposing that feature values corresponding to occurrences of some instructions in static malware detectors can only be incremented [38, 39, 48].

^4 We use here the hat symbol to denote limited knowledge of a given component.
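The LK-SD procedure described above can be illustrated with a toy sketch: the attacker labels surrogate data by querying a linear target classifier whose trained weights are hidden, trains her own surrogate (here a simple perceptron), and measures how often the surrogate agrees with the target. All models, data, and names below are hypothetical stand-ins, not from the paper.

```python
import random

def target_predict(x):
    # Targeted classifier f with hidden parameters w, b (unknown to the attacker).
    w, b = [2.0, -1.0], 0.5
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

def train_surrogate(queries, epochs=50, lr=0.1):
    # Train a surrogate linear classifier on surrogate data, labeled by
    # querying the target for its decisions (perceptron updates).
    w_hat, b_hat = [0.0, 0.0], 0.0
    labeled = [(x, target_predict(x)) for x in queries]
    for _ in range(epochs):
        for x, y in labeled:
            score = sum(wi * xi for wi, xi in zip(w_hat, x)) + b_hat
            if (1 if score >= 0 else -1) != y:
                w_hat = [wi + lr * y * xi for wi, xi in zip(w_hat, x)]
                b_hat += lr * y
    return w_hat, b_hat

random.seed(0)
surrogate_data = [[random.uniform(-3, 3), random.uniform(-3, 3)] for _ in range(200)]
w_hat, b_hat = train_surrogate(surrogate_data)

def surrogate_predict(x):
    return 1 if sum(wi * xi for wi, xi in zip(w_hat, x)) + b_hat >= 0 else -1

test_points = [[random.uniform(-3, 3), random.uniform(-3, 3)] for _ in range(100)]
agreement = sum(surrogate_predict(x) == target_predict(x)
                for x in test_points) / len(test_points)
```

Attacks crafted against the surrogate can then be tested for transfer against the target, as in the LK-SL and transferability discussion above; a high agreement rate makes transfer more likely.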
Figure 3: Security evaluation curves for two hypothetical classifiers C1 and C2, inspired from the methodology proposed in [41, 42]. The curves report classification accuracy against increasing attack strength. Based only on classification accuracy in the absence of attack, one may prefer C2 to C1. Simulating attacks of increasing strength (e.g., by increasing the level of perturbation in input images) may however reveal that more accurate classifiers may be less robust to adversarial perturbations. Thus, one may finally prefer C1 to C2.

Figure 4: Categorization of attacks against machine learning based on our threat model. With respect to the attacker's goal, attacks may aim at misclassifications that do not compromise normal system operation, misclassifications that compromise normal system operation, or querying strategies that reveal confidential information on the learning model or its users.

3.4. Attack Strategy

Given the attacker's knowledge θ ∈ Θ and a set of manipulated attack samples D′c ∈ Φ(Dc), the attacker's goal can be defined in terms of an objective function A(D′c, θ) ∈ R which measures how effective the attacks D′c are. The optimal attack strategy can thus be given as:

D⋆c ∈ arg max_{D′c ∈ Φ(Dc)} A(D′c, θ)   (1)

We show in Sect. 4 how this high-level formulation encompasses both evasion and poisoning attacks against supervised learning algorithms, although it has also been used to attack clustering [62–64] and feature selection algorithms [29, 65].

3.5. Security Evaluation Curves

It is worth remarking here that, to provide a thorough security evaluation of learning algorithms, one should assess their performance not only under different assumptions on the attacker's knowledge, but also for increasing attack strength, i.e., the attacker's capability Φ(Dc) of manipulating the input data. For example, this can be done by increasing the amount of perturbation used to craft evasion attacks, or the number of poisoning attack points injected into the training data. The resulting security evaluation curve, conceptually represented in Fig. 3, shows the extent to which the performance of a learning algorithm drops more or less gracefully under attacks of increasing strength. This is crucial to enable a fairer comparison among different attack algorithms and defenses in the context of concrete application examples [41, 42], as we will discuss in the remainder of this work.

3.6. Summary of Attacks against Machine Learning

Before delving into the details of specific attacks, we provide in Fig. 4 a simplified categorization of the main attacks against machine-learning algorithms based on the aforementioned threat model, in particular, considering the attacker's goal and main capabilities. The most common attacks, as discussed before, include evasion and poisoning availability attacks (aimed to maximize the test error) [23, 26, 29, 31, 33, 38, 40–42, 61]. More recently, different kinds of poisoning integrity attacks (which manipulate the training data or the trained model to cause specific misclassifications, as defined in [29, 33]) against deep networks have also been studied under the name of backdoor and trojaning attacks [71, 72]. These attacks maliciously manipulate pre-trained network models to create specific backdoor vulnerabilities. The corrupted models are then publicly released, to favor their adoption in proprietary systems (e.g., via fine-tuning or other transfer learning techniques). When this happens, the attacker can activate the backdoors using specific input samples that are misclassified as desired. The underlying idea behind such attacks and their impact on the learning process is conceptually depicted in Fig. 5. All the aforementioned attacks can be successfully staged under different levels of the attacker's knowledge. When knowledge is limited, as in the gray-box and black-box cases, privacy or confidentiality attacks can be staged to gain further knowledge about the target classifier or its users. Although we do not cover such attacks in detail here, we refer the reader to a few practical examples of such attacks reported to date, including model-extraction attacks aimed to steal machine-learning models, and model-inversion and hill-climbing attacks against biometric systems, used to steal the face and fingerprint templates of their users (or any other sensitive information) [30, 66, 73–76].

4. Be Proactive: Simulating Attacks

"To know your enemy, you must become your enemy." (Sun Tzu, The Art of War, 500 BC)

We discuss here how to formalize test-time evasion and training-time poisoning attacks in terms of the optimization problem given in Eq. (1), and consistently with the threat model discussed in Sect. 3. Note that other attacks may also be described, generally, in terms of the aforementioned optimization problem, but we focus here only on those for which this connection is tighter.

4.1. Evasion attacks

Evasion attacks consist of manipulating input data to evade a trained classifier at test time. These include, e.g., manipula-
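The security evaluation curve of Sect. 3.5 can be sketched concretely for a linear classifier, for which a worst-case ℓ∞ evasion of strength eps reduces each sample's margin y(w·x + b) by eps times the ℓ1 norm of the weights. The model and data below are illustrative toys of our own, not taken from the paper.

```python
# Toy security-evaluation-curve sketch: accuracy of a linear classifier under
# worst-case L-inf perturbations of increasing strength.

def accuracy_under_attack(data, w, b, eps):
    # Worst-case L-inf evasion against a linear model: the adversarial margin
    # is y*(w.x + b) - eps * ||w||_1.
    correct = 0
    for x, y in data:
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        adv_margin = y * score - eps * sum(abs(wi) for wi in w)
        correct += 1 if adv_margin > 0 else 0
    return correct / len(data)

w, b = [1.0, -1.0], 0.0
data = [([2.0, 0.5], 1), ([-1.5, 1.0], -1), ([0.6, -0.2], 1), ([-0.3, 0.4], -1)]
# Sweep the attack strength to draw the security evaluation curve.
curve = [(eps, accuracy_under_attack(data, w, b, eps))
         for eps in (0.0, 0.2, 0.5, 1.0)]
```

Plotting accuracy against eps yields exactly the kind of curve shown in Fig. 3: it starts at the clean accuracy and degrades (more or less gracefully) as the attack strength grows.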
Figure 5: Conceptual representation of backdoor (poisoning integrity) attacks. Backdoor/poisoning integrity attacks place mislabeled training points in a region of the feature space far from the rest of the training data. The learning algorithm labels such region as desired, allowing for subsequent intrusions/misclassifications at test time (e.g., a backdoored stop sign mislabeled as a speed-limit sign). The figure annotates the attacks referred to as "backdoor" (Gu et al., BadNets, NIPS Workshop on Machine Learning and Computer Security, 2017) and as "poisoning integrity" (Barreno et al., ASIACCS 2006; Biggio et al., ICML 2012; Biggio et al., IEEE TKDE 2014; Xiao et al., ICML 2015; Muñoz-González et al., AISec 2017; Biggio and Roli, 2018).

Figure 6: Examples of error-specific (left) and error-generic (right) evasion, as reported in [7]. Decision boundaries among the three classes (blue, red and green points) are shown as black lines. In the error-specific case, the initial (blue) sample is shifted towards the green class (selected as target). In the error-generic case, instead, it is shifted towards the red class, as it is the closest class to the initial sample. The gray circle represents the feasible domain, given as an upper bound on the ℓ2 distance between the initial and the manipulated attack sample.
Figure 7: Error-specific evasion results from [7]. Top row: Security evaluation curves reporting accuracy of the given classifiers against an increasing `2 input
perturbation. The right-hand side plots depict a laundry detergent misclassified as a cup when applying the minimum input perturbation required for misclassification,
along with the corresponding magnified noise mask. Bottom row: Images of the laundry detergent perturbed with an increasing level of noise. The manipulations
are only barely visible for perturbation values higher than 150-200 (recall however that these values depend on the image size, as the `2 distance).
geted classifier, a surrogate classifier can be learned on surrogate training data, and used to craft the attack samples, which then transfer with high probability to the targeted model. This was also the first experiment showing that adversarial examples can be transferred, at least in a gray-box setting (training the same algorithm on different data). Notably, Šrndić and Laskov [39] subsequently exploited this attack to show that PDF malware detectors based on nonlinear learning algorithms were also vulnerable to evasion, contrary to what they had supposed in [83].

More recently, we have also exploited the theoretical findings in [84], which connect regularization and robustness in kernel-based classifiers, to provide a theoretically-sound countermeasure for linear classifiers against evasion attacks [48, 77]. These recent developments have enabled a deeper understanding of how to defend against evasion attacks in spam filtering and malware detection, also clarifying (in a formal manner) the intuitive idea of uniform feature weights only heuristically provided in [44, 82]. In particular, we have recently shown how a proper, theoretically-grounded regularization scheme can significantly outperform heuristic approaches in these contexts [48, 77].

Security of Deep Learning. In 2014-2015, Szegedy et al. [2] and subsequent work [3–5] showed that deep networks can be fooled by well-crafted, minimally-perturbed input images at test time, called adversarial examples. These samples are obtained by minimizing the distance of the adversarial sample x′ to the corresponding source sample x under the constraint that the predicted label is different, i.e., min_{x′} d(x, x′) s.t. f(x) ≠ f(x′). Interestingly, the parallel discovery of such gradient-based adversarial perturbations by Szegedy et al. [2] and Biggio et al. [38] started from different premises, as also explained by Ian Goodfellow in one of his popular talks.^6 While we were investigating how to evade detection by learning algorithms in security-related tasks with a clear adversarial nature (like spam and malware detection) [38], Szegedy et al. [2] were trying to interpret and visualize the salient characteristics learned by deep networks. To this end, they started looking for minimal changes to images in the input space that cause misclassifications. They expected to see significant changes to the background of the image or to the structure and aspect of the depicted objects, while it turned out that, quite surprisingly, such modifications were almost imperceptible to the human eye. This discovery has raised an enormous interest in both the computer vision and security communities which, since then, have started proposing novel security assessment methodologies, attacks and countermeasures to mitigate this threat, independently re-discovering many other aspects and phenomena that had already been known, to some extent, in the area of adversarial machine learning. An example is given by the use of surrogate classifiers with smoother decision functions [15] to attack models that either mask gradients or are not end-to-end differentiable [6]. One such defense mechanism, known as distillation [6], has indeed been shown to be vulnerable to attacks based on surrogate classifiers [15], essentially leveraging the idea behind the limited-knowledge evasion attacks that we first discussed in [38]. Iterative attacks based on projected gradients have also been independently re-discovered [104], although they had been used before against other classifiers [7, 38, 39].

^6 Available at: https://youtu.be/CIfsB_EYsVI?t=6m02s

Misconceptions on Evasion Attacks. The main misconception that is worth highlighting here is that adversarial examples should be minimally perturbed. The motivation of this misconception is easy to explain. The notion of adversarial examples was initially introduced to analyze the instability of deep networks [2], i.e., their sensitivity to minimal input perturbations; the goal of the initial work on adversarial examples was not to perform a detailed security assessment of a machine-learning
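For a linear classifier, the minimal-distance formulation min_{x′} d(x, x′) s.t. f(x) ≠ f(x′) admits a closed-form solution under the ℓ2 distance, which the following illustrative sketch computes; the weights are toy values of our own, and deep networks instead require iterative gradient-based optimization.

```python
import math

# Minimally-perturbed adversarial example for a linear classifier: the minimal
# L2 perturbation is -(w.x + b)/||w||^2 * w, i.e., the orthogonal projection
# onto the decision boundary, plus a tiny overshoot to actually cross it.

def predict(x, w, b):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

def minimal_adversarial(x, w, b, overshoot=1e-3):
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm_sq = sum(wi * wi for wi in w)
    scale = -(1 + overshoot) * score / norm_sq
    return [xi + scale * wi for xi, wi in zip(x, w)]

w, b = [3.0, 4.0], -1.0
x = [2.0, 1.0]                      # f(x) = +1, since 3*2 + 4*1 - 1 = 9
x_adv = minimal_adversarial(x, w, b)
perturbation = math.dist(x, x_adv)  # ~ |w.x + b| / ||w||_2 = 9 / 5
```

The perturbation size equals the sample's distance to the decision boundary, which is exactly why minimally-perturbed examples measure sensitivity rather than worst-case security, as discussed next.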
Figure 8: Timeline of evasion attacks (i.e., adversarial examples) in adversarial machine learning, compared to work on the security of deep networks. Related work is highlighted with markers of the same color (as reported in the legend) and connected with dashed lines to highlight independent (but related) findings. The date of publication refers to publication on peer-reviewed conferences and journals. Milestones depicted in the timeline include: pioneering work on adversarial machine learning in 2004-2005 (Dalvi et al., KDD 2004; Lowd & Meek, KDD 2005), contributing minimum-distance evasion of linear classifiers and the notion of adversary-aware classifiers; evasion attacks against linear classifiers in spam filtering (Kolcz et al., CEAS 2009); the distillation defense based on gradient masking (Papernot et al., Euro S&P 2016); black-box evasion attacks with substitute models, breaking distillation with transfer attacks on a smoother surrogate classifier (Papernot et al., ASIACCS 2017); and secure SVMs against adversarial examples in malware detection (Demontis et al., "Yes, Machine Learning Can Be More Secure! A Case Study on Android Malware Detection," IEEE TDSC 2017).
algorithm using security evaluation curves (Fig. 7). Normally, as already discussed in this paper and also in our previous work [38, 41], for the purpose of thoroughly assessing the security of a learning algorithm under attack, given a feasible space of modifications to the input data, it is more reasonable to assume that the attacker will aim to maximize the classifier's confidence on the desired output class, rather than only minimally perturbing the attack samples (cf. Eqs. 2-3). For this reason, while minimally-perturbed adversarial examples can be used to analyze the sensitivity of a learning algorithm, maximum-confidence attacks are more suitable for a thorough security assessment of learning algorithms under attack. In particular, the use of the security evaluation curves described above gives us a clearer understanding of the security of a learning algorithm under more powerful attacks. By increasing the attack strength (i.e., the maximum amount of perturbation applicable to the input data), one can draw a complete security evaluation curve, reporting the evasion rate for each value of attack strength. This guarantees, e.g., that if the noise applied to the input data is not larger than ε, then the classification performance should not drop by more than δ. Conversely, using minimally-perturbed adversarial examples one can only provide guarantees against an average level of perturbation (rather than a worst-case bound).

Error-Generic Poisoning Attacks. In this case, the attacker aims to cause a denial of service, by inducing as many misclassifications as possible (regardless of the classes in which they occur). Poisoning attacks are generally formulated as bilevel optimization problems, in which the outer optimization maximizes the attacker's objective A (typically, a loss function L computed on untainted data), while the inner optimization amounts to learning the classifier on the poisoned training data [26, 29, 31]. This can be made explicit by rewriting Eq. (1) as:

D⋆c ∈ arg max_{D′c ∈ Φ(Dc)} A(D′c, θ) = L(Dval, w⋆) ,   (4)
s.t. w⋆ ∈ arg min_{w′ ∈ W} L(Dtr ∪ D′c, w′) ,   (5)
Figure 9: Conceptual example of how poisoning attacks compromise a linear classifier [26, 29]. Each plot shows the training data (red and blue points) and the
corresponding trained classifier (black solid line). The fraction of poisoning points injected into the training set is reported on top of each plot, along with the
test error (in parentheses) of the poisoned classifier. Poisoning points are optimized through a gradient-based attack algorithm. They are initialized by cloning the
training points denoted with white crosses and flipping their label. The gradient trajectories (black dashed lines) are then followed up to some local optima to obtain
the final poisoning points (highlighted with black circles).
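The gradient-based bilevel attack of Eqs. (4)-(5) and Fig. 9 is beyond a short snippet, but its core effect can be illustrated with a much simpler label-flipping baseline: injecting far-away mislabeled points into the training set shifts the learned decision regions and raises the test error. The nearest-centroid classifier and all data below are illustrative toys of our own, not the attack algorithm from the paper.

```python
# Simplified poisoning sketch: inject label-flipped points into the training
# set of a nearest-centroid classifier and measure the test error increase.

def centroid(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(2)]

def train(data):
    # Inner problem of Eq. (5), reduced to computing per-class centroids.
    pos = centroid([x for x, y in data if y == 1])
    neg = centroid([x for x, y in data if y == -1])
    def predict(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, neg))
        return 1 if d_pos <= d_neg else -1
    return predict

def error(predict, data):
    # Outer objective of Eq. (4): loss on untainted (validation) data.
    return sum(predict(x) != y for x, y in data) / len(data)

train_set = [([2.0, 2.0], 1), ([2.5, 1.5], 1), ([-2.0, -2.0], -1), ([-1.5, -2.5], -1)]
test_set = [([1.8, 2.2], 1), ([-2.2, -1.8], -1)]

clean = train(train_set)
# Poisoning points: far-away samples with flipped labels (cf. Fig. 9).
poisoned = train(train_set + [([-8.0, -8.0], 1), ([-9.0, -7.0], 1)])
err_clean, err_poisoned = error(clean, test_set), error(poisoned, test_set)
```

A gradient-based attack would instead move the poisoning points along the gradient of the outer objective, as sketched by the dashed trajectories in Fig. 9, rather than placing them by hand.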
[Figure: Categorization of defenses.]
Reactive Defenses: 1. timely detection of attacks; 2. frequent retraining; 3. decision verification.
Proactive (Security-by-Design) Defenses, against white-box attacks (no probing): 1. secure/robust learning; 2. attack detection.
Proactive (Security-by-Obscurity) Defenses, against gray-box and black-box attacks (probing): 1. information hiding, randomization; 2. detection of probing attacks. Effect on decision boundaries: noise-specific margin, enclosure of legitimate training classes.
and effective than pure proactive approaches aimed to solely mitigate the risk of potential future attacks [30, 42, 91]. Reactive approaches include: (i) timely detection of novel attacks, (ii) frequent classifier retraining, and (iii) verification of the consistency of classifier decisions against training data and ground-truth labels [42, 51]. In practice, to timely identify and block novel security threats, one can leverage collaborative approaches and honeypots, i.e., online services purposely vulnerable with the specific goal of collecting novel spam and malware samples. To correctly detect recently-reported attacks, the classifier should be frequently retrained on newly-collected data (including them), and novel features and attack detectors may also be considered (see, e.g., the spam arms race discussed in Sect. 2). This procedure should also be automated to some extent to act more readily when necessary, e.g., using automatic drift detection techniques [12, 42, 92]. The correctness of classifier decisions should finally be verified by domain experts. This raises the issue of how to involve humans in the loop in a more coordinated manner, to supervise and verify the correct functionality of learning systems.

5.2. Proactive Defenses

Proactive defenses aim to prevent future attacks. The main ones proposed thus far can be categorized according to the paradigms of security by design and security by obscurity, as discussed in the following.

5.2.1. Security-by-Design Defenses against White-box Attacks

The paradigm of security by design advocates that a system should be designed from the ground up to be secure. Based on this idea, several learning algorithms have been adapted to explicitly take into account different kinds of adversarial data manipulation. These defenses are designed in a white-box setting in which the attacker is assumed to have perfect knowledge of

on game theory have been proposed to overcome these limitations. Zero-sum games have been formulated to learn invariant transformations like feature insertion, deletion and rescaling [35–37].^10 More rigorous approaches have then introduced Nash and Stackelberg games for secure learning, deriving formal conditions for the existence and uniqueness of the game equilibrium, under the assumption that each player knows everything about the opponents and the game [46, 94]. Randomized players [47] and uncertainty on the players' strategies [95] have also been considered to simulate less pessimistic scenarios. Although these approaches seem promising, understanding the extent to which the resulting attack strategies are representative of practical scenarios remains an open issue [96, 97]. Adversarial learning is not a (board) game with well-defined rules and, thus, the objective functions of real-world attackers may not even correspond to those hypothesized in the aforementioned games. It may thus be interesting to verify, reactively, whether real-world attackers behave as hypothesized, and to exploit feedback from the observed attacks to improve the definition of the attack strategy. Another relevant problem of these approaches is their scalability to large datasets and high-dimensional feature spaces, as it may be too computationally costly to generate a sufficient number of attack samples to correctly represent their distribution, i.e., to effectively tackle the curse of dimensionality.

A more efficient approach relies on robust optimization. Robust optimization formulates adversarial learning as a minimax problem in which the inner problem maximizes the training loss by manipulating the training points under worst-case, bounded perturbations, while the outer problem trains the learning algorithm to minimize the corresponding worst-case training loss [3, 84, 98, 104]. An interesting result by Xu et al. [84] has shown that the inner problem can be solved in closed form, at least for linear SVMs, yielding a standard regularized loss formulation that penalizes the classifier parameters using the
the attacked system. There is thus no need to probe the targeted dual norm of the input noise. This means that different reg-
classifier to improve knowledge about its behavior (as instead ularizers amount to hypothesizing different kinds of bounded
done in gray-box and black-box attacks). worst-case noise on the input data. This has effectively estab-
Countering Evasion Attacks. In 2004, Dalvi et al. [19] pro- lished an equivalence between regularized learning problems
posed the first adversary-aware classifier against evasion at- and robust optimization, which has in turn enabled approximat-
tacks, based on iteratively retraining the classifier on the sim- ing computationally-demanding secure learning models (e.g.,
ulated attacks. This is not very different from the idea of ad- game-theoretical ones) with more efficient ones based on regu-
versarial training that has been recently used in deep networks larizing the objective function in a specific manner [48, 77, 78],
to counter adversarial examples [2, 3], or to harden decision
trees and random forests [79]. These defenses are however 10 Note that similar ideas have been exploited also to model uncertainty on
heuristic, with no formal guarantees on convergence and robust- some parameters of the data distribution and learn optimal robust classifiers
ness properties. More theoretically-sound approaches relying against worst-case changes of such parameters [93].
Figure 12: Effect of class-enclosing defenses against blind-spot adversarial examples on multiclass SVMs with RBF kernels, adapted from [7]. The three panels report an SVM-RBF without rejection, with rejection, and with a higher rejection rate. Rejected samples are highlighted with black contours. The adversarial example (black star) is misclassified only by the standard SVM (left plot), while the SVM with rejection correctly identifies it as an adversarial example (middle plot). Rejection thresholds can be modified to increase classifier security by tightening class enclosure (right plot), at the expense of misclassifying more legitimate samples.
also in structured learning [99]. Notably, recent work has also derived an extension of this formulation to nonlinear classifiers [100]. The main effect of the aforementioned techniques is to smooth out the decision function of the classifier, making it less sensitive to worst-case input changes, which in turn means reducing the norm of the input gradients. More direct (and sometimes equivalent) approaches obtain the same effect by penalizing the input gradients using specific regularization terms [101–104].

Another line of defenses against evasion attacks is based on detecting and rejecting samples which are sufficiently far from the training data in feature space (similarly to the defense discussed in Sect. 4.1.1) [7, 8, 12, 105, 106]. These samples are usually referred to as blind-spot evasion points, as they appear in regions of the feature space scarcely populated by training data. Such regions can be assigned to any class during classifier training without any substantial increase in the training loss. In practice, this is a simple consequence of the stationarity assumption underlying many machine-learning algorithms (according to which training and test data come from the same distribution) [107, 108], and such rejection-based defenses simply aim to overcome this issue.

Finally, we point out that classifier ensembles have also been exploited to improve security against evasion attempts (e.g., by implementing rejection-based mechanisms or secure fusion rules) [44, 56, 82, 105, 109, 110] and even against poisoning attacks [111]. They may however worsen security if the base classifiers are not properly combined [56, 105]. Many other heuristic defense techniques have also been proposed, e.g., training neural networks with bounded activation functions and training data augmentation to improve stability to input changes [112, 113].

Effect on Decision Boundaries. We discuss here how the proposed defenses substantially change the way classifiers learn their decision boundaries. Notably, defenses involving retraining on the attack samples and rejection mechanisms achieve security against evasion essentially by countering blind-spot attacks. One potential effect is that the resulting decision functions may tend to enclose the (stationary) training classes more tightly. This in turn may require one to trade off between security against potential attacks and the number of misclassified (stationary) samples at test time, as empirically shown in Sect. 4.1.1 and conceptually depicted in Fig. 12 [7]. The other relevant effect, especially induced by regularization methods inspired by robust optimization, is to provide a noise-specific margin between classes, as conceptually represented in Fig. 13 [77, 78]. These are the two main effects induced by the aforementioned secure learning approaches in feature space.

It is finally worth remarking that, by using a secure learning algorithm, one can counter blind-spot evasion samples, but definitely not adversarial examples whose feature vectors become indistinguishable from those of training samples belonging to different classes. In this case, indeed, no learning algorithm would be able to tell such samples apart [114]. The security properties of learning algorithms should thus be considered independently from those exhibited by the chosen feature representation. Security of features should be considered as an additional, important requirement; features should not only be discriminant, but also robust to manipulation, to avoid straightforward classifier evasion by mimicking the feature values exhibited by legitimate samples. In the case of deep convolutional networks, most of the problems arise from the fact that the learned mapping from input to deep space (i.e., the feature representation) violates the smoothness assumption of learning algorithms: samples that are close in input space may be very far in deep space. In fact, as also reported in Sect. 4.1.1, adversarial examples in deep space become indistinguishable from training samples of other classes for sufficiently-high adversarial input perturbations [7]. Therefore, this vulnerability can only be patched by retraining or re-engineering the deeper layers of the network (and not only the last ones) [2, 7].

Countering Poisoning Attacks. While most work has focused on countering evasion attacks at test time, some white-box defenses have also been proposed against poisoning attacks [24, 25, 111, 115–119]. To compromise a learning algorithm during training, an attack has to exhibit characteristics different from those shown by the rest of the training data (otherwise it would have no impact at all) [111]. Poisoning attacks can thus be regarded as outliers, and countered using data sanitization (i.e., attack detection and removal) [111, 116, 118] and robust learning (i.e., learning algorithms based on robust statistics that
Figure 13: Decision functions for linear SVMs with ℓ2, ℓ∞ and ℓ1 regularization on the feature weights [77, 78] (panels: standard SVM, Infinity-norm SVM, 1-norm SVM). The feasible domain of adversarial modifications (characterizing the equivalent robust optimization problem) is shown for some training points, respectively with ℓ2, ℓ1 and ℓ∞ balls. Note how the shape of these balls influences the orientation of the decision boundaries, i.e., how different regularizers optimally counter specific kinds of adversarial noise.
are intrinsically less sensitive to outlying training samples, e.g., via bounded losses or kernel functions) [25, 117, 119–122].

5.2.2. Security-by-Obscurity Defenses against Black-box Attacks

These proactive defenses, also known as disinformation techniques in [23, 40, 61], follow the paradigm of security by obscurity, i.e., they hide information from the attacker to improve security. These defenses aim to counter gray-box and black-box attacks in which probing mechanisms are used to improve surrogate models or to refine evasion attempts by querying the targeted classifier.

Some examples include [51]: (i) randomizing the collection of training data (collecting at different times and locations); (ii) using classifiers that are difficult to reverse-engineer (e.g., classifier ensembles); (iii) denying access to the actual classifier or training data; and (iv) randomizing the classifier's output to give imperfect feedback to the attacker. The latter approach was first proposed in 2008 [45] as an effective way to hide information about the classification function from the attacker, with recent follow-ups in [8, 47] to counter adversarial examples. However, it is still an open issue to understand whether and to what extent randomization may be used to make it harder for the attacker to learn a proper surrogate model, and to implement privacy-preserving mechanisms [123] against model inversion and hill-climbing attacks [30, 66, 73–76].

Notably, security-by-obscurity defenses may not always be helpful. Gradient masking has been proposed to hide the gradient direction used to craft adversarial examples [6, 9], but it has been shown that it can be easily circumvented with surrogate learners [15, 16, 38], exploiting the same principle behind attacking non-differentiable classifiers (discussed in Sect. 4.1) [78].

"states of nature" and "actions" that a rational agent can implement are perfectly known. Using the words of a famous speech by Donald Rumsfeld, one could argue that machine learning can deal with known unknowns.11 Unfortunately, adversarial machine learning often deals with unknown unknowns. When learning systems are deployed in adversarial environments in the open world, they can misclassify (with high confidence) never-before-seen inputs that are largely different from known training data. We know that unknown unknowns are the real threat in many security problems (e.g., zero-day attacks in computer security). Although they can be mitigated using the proactive approach described in this work, they remain a primary open issue for adversarial machine learning, as modeling attacks relies on known unknowns, while unknown unknowns are unpredictable.

We are firmly convinced that new research paths should be explored to address this fundamental issue, complementary to formal verification and certified defenses [88, 118]. Machine learning algorithms should be able to detect unknown unknowns using robust methods for anomaly or novelty detection, potentially asking for human intervention when required. The development of practical methods for explaining, visualizing and interpreting the operation of machine-learning systems could also help system designers to investigate the behavior of such systems on cases that are not statistically represented by the training data, and to decide whether to trust their decisions on such unknown unknowns or not. These future research paths lie at the intersection of the field of adversarial machine learning and the emerging fields of robust artificial intelligence and interpretability of machine learning [124, 125]. We believe that these directions will help our society gain a more conscious understanding of the potential and limits of modern data-driven machine-learning technologies.
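A minimal numerical sketch of such a novelty-detection mechanism, where test inputs far from the training data are flagged as potential unknown unknowns and deferred to a human, may look as follows (an illustrative example of ours, not a method from the surveyed papers; real systems would rely on more robust anomaly detectors):

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))  # the known training data

def novelty_score(x, X_train):
    # Distance to the nearest training sample: large values indicate
    # regions of the feature space scarcely populated by training data.
    return np.min(np.linalg.norm(X_train - x, axis=1))

# Calibrate the rejection threshold on the training data itself, e.g.,
# as the 95th percentile of leave-one-out nearest-neighbor distances.
loo = [np.sort(np.linalg.norm(X_train - x, axis=1))[1] for x in X_train]
threshold = np.percentile(loo, 95)

def predict_or_reject(x):
    if novelty_score(x, X_train) > threshold:
        return "reject"  # never-before-seen input: defer to a human analyst
    return "classify"    # close enough to known data: trust the classifier

far_away = np.array([8.0, -8.0])  # an input unlike anything seen in training
assert predict_or_reject(far_away) == "reject"
```

The threshold directly encodes the trade-off discussed above: tightening it rejects more unknown inputs at the cost of deferring more legitimate ones.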
timeline of evasion attacks. This work was also partly supported by the EU H2020 project ALOHA, under the European Union's Horizon 2020 research and innovation programme (grant no. 780788).

References

[1] J. Gu, Z. Wang, J. Kuen, L. Ma, A. Shahroudy, B. Shuai, T. Liu, X. Wang, G. Wang, J. Cai, T. Chen, Recent advances in convolutional neural networks, Pattern Recognition 77 (2018) 354–377.
[2] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, R. Fergus, Intriguing properties of neural networks, in: ICLR, 2014.
[3] I. J. Goodfellow, J. Shlens, C. Szegedy, Explaining and harnessing adversarial examples, in: ICLR, 2015.
[4] A. M. Nguyen, J. Yosinski, J. Clune, Deep neural networks are easily fooled: High confidence predictions for unrecognizable images, in: IEEE CVPR, 2015, pp. 427–436.
[5] S.-M. Moosavi-Dezfooli, A. Fawzi, P. Frossard, DeepFool: a simple and accurate method to fool deep neural networks, in: IEEE CVPR, 2016, pp. 2574–2582.
[6] N. Papernot, P. McDaniel, X. Wu, S. Jha, A. Swami, Distillation as a defense to adversarial perturbations against deep neural networks, in: IEEE Symp. Security & Privacy (SP), 2016, pp. 582–597.
[7] M. Melis, A. Demontis, B. Biggio, G. Brown, G. Fumera, F. Roli, Is deep learning safe for robot vision? Adversarial examples against the iCub humanoid, in: ICCV Workshop ViPAR, 2017.
[8] D. Meng, H. Chen, MagNet: a two-pronged defense against adversarial examples, in: 24th ACM Conf. Computer and Comm. Sec. (CCS), 2017.
[9] J. Lu, T. Issaranon, D. Forsyth, SafetyNet: Detecting and rejecting adversarial examples robustly, in: IEEE ICCV, 2017.
[10] X. Li, F. Li, Adversarial examples detection in deep networks with convolutional filter statistics, in: IEEE ICCV, 2017.
[11] K. Grosse, N. Papernot, P. Manoharan, M. Backes, P. D. McDaniel, Adversarial examples for malware detection, in: ESORICS (2), Vol. 10493 of LNCS, Springer, 2017, pp. 62–79.
[12] R. Jordaney, K. Sharad, S. K. Dash, Z. Wang, D. Papini, I. Nouretdinov, L. Cavallaro, Transcend: Detecting concept drift in malware classification models, in: USENIX Sec. Symp., USENIX Assoc., 2017, pp. 625–642.
[13] P. McDaniel, N. Papernot, Z. B. Celik, Machine learning in adversarial settings, IEEE Security & Privacy 14 (3) (2016) 68–72.
[14] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, A. Swami, The limitations of deep learning in adversarial settings, in: 1st IEEE European Symp. Security and Privacy, 2016, pp. 372–387.
[15] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, A. Swami, Practical black-box attacks against machine learning, in: ASIA CCS '17, ACM, 2017, pp. 506–519.
[16] N. Carlini, D. A. Wagner, Towards evaluating the robustness of neural networks, in: IEEE Symp. Security & Privacy (SP), 2017, pp. 39–57.
[17] M. Sharif, S. Bhagavatula, L. Bauer, M. K. Reiter, Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition, in: Conf. Computer and Comm. Security (CCS), ACM, 2016, pp. 1528–1540.
[18] C. Xie, J. Wang, Z. Zhang, Y. Zhou, L. Xie, A. Yuille, Adversarial examples for semantic segmentation and object detection, in: IEEE ICCV, 2017.
[19] N. Dalvi, P. Domingos, Mausam, S. Sanghai, D. Verma, Adversarial classification, in: Int'l Conf. Knowl. Disc. and Data Mining, 2004, pp. 99–108.
[20] D. Lowd, C. Meek, Adversarial learning, in: Int'l Conf. Knowl. Disc. and Data Mining, ACM Press, Chicago, IL, USA, 2005, pp. 641–647.
[21] D. Lowd, C. Meek, Good word attacks on statistical spam filters, in: 2nd Conf. Email and Anti-Spam (CEAS), Mountain View, CA, USA, 2005.
[22] T. Matsumoto, H. Matsumoto, K. Yamada, S. Hoshino, Impact of artificial "gummy" fingers on fingerprint systems, Datensch. und Datensich. 26 (8).
[23] M. Barreno, B. Nelson, R. Sears, A. D. Joseph, J. D. Tygar, Can machine learning be secure?, in: ASIA CCS '06, ACM, 2006, pp. 16–25.
[24] B. Nelson, M. Barreno, F. J. Chi, A. D. Joseph, B. I. P. Rubinstein, U. Saini, C. Sutton, J. D. Tygar, K. Xia, Exploiting machine learning to subvert your spam filter, in: LEET '08, USENIX Assoc., 2008, pp. 1–9.
[25] B. I. Rubinstein, B. Nelson, L. Huang, A. D. Joseph, S.-h. Lau, S. Rao, N. Taft, J. D. Tygar, Antidote: understanding and defending against poisoning of anomaly detectors, in: IMC '09, ACM, 2009, pp. 1–14.
[26] B. Biggio, B. Nelson, P. Laskov, Poisoning attacks against support vector machines, in: 29th ICML, 2012, pp. 1807–1814.
[27] M. Kloft, P. Laskov, Online anomaly detection under adversarial impact, in: 13th AISTATS, 2010, pp. 405–412.
[28] M. Kloft, P. Laskov, Security analysis of online centroid anomaly detection, JMLR 13 (2012) 3647–3690.
[29] H. Xiao, B. Biggio, G. Brown, G. Fumera, C. Eckert, F. Roli, Is feature selection secure against training data poisoning?, in: 32nd ICML, Vol. 37, 2015, pp. 1689–1698.
[30] B. Biggio, I. Corona, B. Nelson, B. Rubinstein, D. Maiorca, G. Fumera, G. Giacinto, F. Roli, Security evaluation of support vector machines in adversarial environments, in: Y. Ma, G. Guo (Eds.), Support Vector Machines Applications, Springer Int'l Publishing, Cham, 2014, pp. 105–153.
[31] S. Mei, X. Zhu, Using machine teaching to identify optimal training-set attacks on machine learners, in: 29th AAAI, 2015.
[32] P. W. Koh, P. Liang, Understanding black-box predictions via influence functions, in: ICML, 2017.
[33] L. Muñoz-González, B. Biggio, A. Demontis, A. Paudice, V. Wongrassamee, E. C. Lupu, F. Roli, Towards poisoning of deep learning algorithms with back-gradient optimization, in: AISec '17, ACM, 2018, pp. 27–38.
[34] G. L. Wittel, S. F. Wu, On attacking statistical spam filters, in: 1st Conf. Email and Anti-Spam (CEAS), 2004.
[35] A. Globerson, S. T. Roweis, Nightmare at test time: robust learning by feature deletion, in: 23rd ICML, Vol. 148, ACM, 2006, pp. 353–360.
[36] C. H. Teo, A. Globerson, S. Roweis, A. Smola, Convex learning with invariances, in: NIPS 20, MIT Press, 2008, pp. 1489–1496.
[37] O. Dekel, O. Shamir, L. Xiao, Learning to classify with missing and corrupted features, Machine Learning 81 (2010) 149–178.
[38] B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Šrndić, P. Laskov, G. Giacinto, F. Roli, Evasion attacks against machine learning at test time, in: ECML PKDD, Part III, Vol. 8190 of LNCS, Springer, 2013, pp. 387–402.
[39] N. Šrndić, P. Laskov, Practical evasion of a learning-based classifier: A case study, in: IEEE Symp. Security and Privacy, SP '14, 2014, pp. 197–211.
[40] M. Barreno, B. Nelson, A. Joseph, J. Tygar, The security of machine learning, Machine Learning 81 (2010) 121–148.
[41] B. Biggio, G. Fumera, F. Roli, Security evaluation of pattern classifiers under attack, IEEE Trans. Knowl. and Data Eng. 26 (4) (2014) 984–996.
[42] B. Biggio, G. Fumera, F. Roli, Pattern recognition systems under attack: Design issues and research challenges, IJPRAI 28 (7) (2014) 1460002.
[43] B. Biggio, G. Fumera, P. Russu, L. Didaci, F. Roli, Adversarial biometric recognition: A review on biometric system security from the adversarial machine-learning perspective, IEEE Signal Proc. Mag. 32 (5) (2015) 31–41.
[44] A. Kolcz, C. H. Teo, Feature weighting for improved classifier robustness, in: 6th Conf. Email and Anti-Spam (CEAS), 2009.
[45] B. Biggio, G. Fumera, F. Roli, Adversarial pattern classification using multiple classifiers and randomisation, in: SSPR 2008, Vol. 5342 of LNCS, Springer, 2008, pp. 500–509.
[46] M. Brückner, C. Kanzow, T. Scheffer, Static prediction games for adversarial learning problems, JMLR 13 (2012) 2617–2654.
[47] S. Rota Bulò, B. Biggio, I. Pillai, M. Pelillo, F. Roli, Randomized prediction games for adversarial machine learning, IEEE Trans. Neural Networks and Learning Systems 28 (11) (2017) 2466–2478.
[48] A. Demontis, M. Melis, B. Biggio, D. Maiorca, D. Arp, K. Rieck, I. Corona, G. Giacinto, F. Roli, Yes, machine learning can be more secure! A case study on android malware detection, IEEE Trans. Dep. and Secure Comp.
[49] P. Laskov, R. Lippmann (Eds.), NIPS Workshop on Machine Learning in Adversarial Environments for Computer Security, 2007.
[50] P. Laskov, R. Lippmann, Machine learning in adversarial environments, Machine Learning 81 (2010) 115–119.
[51] A. D. Joseph, P. Laskov, F. Roli, J. D. Tygar, B. Nelson, Machine Learning Methods for Computer Security (Dagstuhl Perspectives Workshop 12371), Dagstuhl Manifestos 3 (1) (2013) 1–30.
[52] B. M. Thuraisingham, B. Biggio, D. M. Freeman, B. Miller, A. Sinha (Eds.), AISec '17: 10th Workshop on AI and Security, ACM, 2017.
[53] A. D. Joseph, B. Nelson, B. I. P. Rubinstein, J. Tygar, Adversarial Machine Learning, Cambridge University Press, 2018.
[54] I. Corona, G. Giacinto, F. Roli, Adversarial attacks against intrusion detection systems: Taxonomy, solutions and open issues, Information Sciences 239 (2013) 201–225.
[55] X. Han, N. Kheir, D. Balzarotti, PhishEye: Live monitoring of sandboxed phishing kits, in: ACM CCS, 2016, pp. 1402–1413.
[56] I. Corona, B. Biggio, M. Contini, L. Piras, R. Corda, M. Mereu, G. Mureddu, D. Ariu, F. Roli, DeltaPhish: Detecting phishing webpages in compromised websites, in: ESORICS, Vol. 10492 of LNCS, Springer, 2017, pp. 370–388.
[57] B. Biggio, G. Fumera, I. Pillai, F. Roli, A survey and experimental evaluation of image spam filtering techniques, PRL 32 (10) (2011) 1436–1446.
[58] A. Attar, R. M. Rad, R. E. Atani, A survey of image spamming and filtering techniques, Artif. Intell. Rev. 40 (1) (2013) 71–105.
[59] G. Fumera, I. Pillai, F. Roli, Spam filtering based on the analysis of text information embedded into images, JMLR 7 (2006) 2699–2720.
[60] A. O. Thomas, A. Rusu, V. Govindaraju, Synthetic handwritten CAPTCHAs, Pattern Recognition 42 (12) (2009) 3365–3373.
[61] L. Huang, A. D. Joseph, B. Nelson, B. Rubinstein, J. D. Tygar, Adversarial machine learning, in: 4th AISec, Chicago, IL, USA, 2011, pp. 43–57.
[62] B. Biggio, I. Pillai, S. R. Bulò, D. Ariu, M. Pelillo, F. Roli, Is data clustering in adversarial settings secure?, in: AISec '13, ACM, 2013, pp. 87–98.
[63] B. Biggio, K. Rieck, D. Ariu, C. Wressnegger, I. Corona, G. Giacinto, F. Roli, Poisoning behavioral malware clustering, in: AISec '14, ACM, 2014, pp. 27–36.
[64] B. Biggio, S. R. Bulò, I. Pillai, M. Mura, E. Z. Mequanint, M. Pelillo, F. Roli, Poisoning complete-linkage hierarchical clustering, in: SSPR, Vol. 8621 of LNCS, Springer, 2014, pp. 42–52.
[65] F. Zhang, P. Chan, B. Biggio, D. Yeung, F. Roli, Adversarial feature selection against evasion attacks, IEEE Trans. Cyb. 46 (3) (2016) 766–777.
[66] F. Tramèr, F. Zhang, A. Juels, M. K. Reiter, T. Ristenpart, Stealing machine learning models via prediction APIs, in: USENIX Sec. Symp., USENIX Assoc., 2016, pp. 601–618.
[67] W. Xu, Y. Qi, D. Evans, Automatically evading classifiers, in: Annual Network & Distr. Sys. Sec. Symp. (NDSS), The Internet Society, 2016.
[68] P.-Y. Chen, H. Zhang, Y. Sharma, J. Yi, C.-J. Hsieh, ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models, in: AISec '17, ACM, 2017, pp. 15–26.
[69] H. Dang, Y. Huang, E. Chang, Evading classifiers by morphing in the dark, in: ACM CCS '17, ACM, 2017, pp. 119–133.
[70] B. Nelson, B. I. Rubinstein, L. Huang, A. D. Joseph, S. J. Lee, S. Rao, J. D. Tygar, Query strategies for evading convex-inducing classifiers, JMLR 13 (2012) 1293–1332.
[71] T. Gu, B. Dolan-Gavitt, S. Garg, BadNets: Identifying vulnerabilities in the machine learning model supply chain, in: NIPS Workshop Mach. Learn. Comp. Sec., Vol. abs/1708.06733, 2017.
[72] X. Chen, C. Liu, B. Li, K. Lu, D. Song, Targeted backdoor attacks on deep learning systems using data poisoning, ArXiv e-prints abs/1712.05526.
[73] M. Fredrikson, S. Jha, T. Ristenpart, Model inversion attacks that exploit confidence information and basic countermeasures, in: ACM CCS '15, ACM, 2015, pp. 1322–1333.
[74] A. Adler, Vulnerabilities in biometric encryption systems, in: T. Kanade, A. K. Jain, N. K. Ratha (Eds.), 5th Int'l Conf. Audio- and Video-Based Biometric Person Auth., Vol. 3546 of LNCS, Springer, 2005, pp. 1100–1109.
[75] J. Galbally, C. McCool, J. Fierrez, S. Marcel, J. Ortega-Garcia, On the vulnerability of face verification systems to hill-climbing attacks, Patt. Rec. 43 (3) (2010) 1027–1038.
[76] M. Martinez-Diaz, J. Fierrez, J. Galbally, J. Ortega-Garcia, An evaluation of indirect attacks and countermeasures in fingerprint verification systems, Patt. Rec. Lett. 32 (12) (2011) 1643–1651.
[77] A. Demontis, P. Russu, B. Biggio, G. Fumera, F. Roli, On security and sparsity of linear classifiers for adversarial settings, in: SSPR, Vol. 10029 of LNCS, Springer, 2016, pp. 322–332.
[78] P. Russu, A. Demontis, B. Biggio, G. Fumera, F. Roli, Secure kernel machines against evasion attacks, in: AISec '16, ACM, 2016, pp. 59–69.
[79] A. Kantchelian, J. D. Tygar, A. D. Joseph, Evasion and hardening of tree ensemble classifiers, in: ICML, Vol. 48 of JMLR W&CP, 2016, pp. 2387–2396.
[80] A. Athalye, L. Engstrom, A. Ilyas, K. Kwok, Synthesizing robust adversarial examples, in: ICLR, 2018.
[81] P. Fogla, M. Sharif, R. Perdisci, O. Kolesnikov, W. Lee, Polymorphic blending attacks, in: USENIX Sec. Symp., 2006, pp. 241–256.
[82] B. Biggio, G. Fumera, F. Roli, Multiple classifier systems for robust classifier design in adversarial environments, Int'l JMLC 1 (1) (2010) 27–41.
[83] N. Šrndić, P. Laskov, Detection of malicious PDF files based on hierarchical document structure, in: 20th NDSS, The Internet Society, 2013.
[84] H. Xu, C. Caramanis, S. Mannor, Robustness and regularization of support vector machines, JMLR 10 (2009) 1485–1510.
[85] N. Carlini, D. A. Wagner, Adversarial examples are not easily detected: Bypassing ten detection methods, in: AISec '17, ACM, 2017, pp. 3–14.
[86] A. Athalye, N. Carlini, D. A. Wagner, Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples, in: ICML, Vol. 80 of JMLR W&CP, JMLR.org, 2018, pp. 274–283.
[87] Y. Dong, F. Liao, T. Pang, X. Hu, J. Zhu, Boosting adversarial attacks with momentum, in: CVPR, 2018.
[88] X. Huang, M. Kwiatkowska, S. Wang, M. Wu, Safety verification of deep neural networks, in: 29th Int'l Conf. Computer Aided Verification, Part I, Vol. 10426 of LNCS, Springer, 2017, pp. 3–29.
[89] K. Pei, Y. Cao, J. Yang, S. Jana, DeepXplore: Automated whitebox testing of deep learning systems, in: 26th SOSP, ACM, 2017, pp. 1–18.
[90] J. Newsome, B. Karp, D. Song, Paragraph: Thwarting signature learning by training maliciously, in: RAID, LNCS, Springer, 2006, pp. 81–105.
[91] A. Barth, B. I. Rubinstein, M. Sundararajan, J. C. Mitchell, D. Song, P. L. Bartlett, A learning-based approach to reactive security, IEEE Trans. Dependable and Sec. Comp. 9 (4) (2012) 482–493.
[92] L. I. Kuncheva, Classifier ensembles for detecting concept change in streaming data: Overview and perspectives, in: SUEMA, 2008, pp. 5–10.
[93] E. R. Dougherty, J. Hua, Z. Xiong, Y. Chen, Optimal robust classifiers, Pattern Recognition 38 (10) (2005) 1520–1532.
[94] W. Liu, S. Chawla, Mining adversarial patterns via regularized loss minimization, Machine Learning 81 (1) (2010) 69–83.
[95] M. Großhans, C. Sawade, M. Brückner, T. Scheffer, Bayesian games for adversarial regression problems, in: 30th ICML, JMLR W&CP, Vol. 28, 2013, pp. 55–63.
[96] M. Wooldridge, Does game theory work?, IEEE IS 27 (6) (2012) 76–80.
[97] G. Cybenko, C. E. Landwehr, Security analytics and measurements, IEEE Security & Privacy 10 (3) (2012) 5–8.
[98] Z. Qi, Y. Tian, Y. Shi, Robust twin support vector machine for pattern classification, Pattern Recognition 46 (1) (2013) 305–316.
[99] M. A. Torkamani, D. Lowd, On robustness and regularization of structural support vector machines, in: ICML, Vol. 32 of PMLR, 2014, pp. 577–585.
[100] E. Wong, J. Z. Kolter, Provable defenses against adversarial examples via the convex outer adversarial polytope, in: ICML, Vol. 80 of JMLR W&CP, JMLR.org, 2018, pp. 5283–5292.
[101] C. Lyu, K. Huang, H.-N. Liang, A unified gradient regularization family for adversarial examples, in: ICDM, IEEE CS, 2015, pp. 301–309.
[102] J. Sokolić, R. Giryes, G. Sapiro, M. R. D. Rodrigues, Robust large margin deep neural networks, IEEE Trans. Signal Proc. 65 (16) (2017) 4265–4280.
[103] C. J. Simon-Gabriel, Y. Ollivier, B. Schölkopf, L. Bottou, D. Lopez-Paz, Adversarial vulnerability of neural networks increases with input dimension, ArXiv e-prints, 2018.
[104] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, A. Vladu, Towards deep learning models resistant to adversarial attacks, in: ICLR, 2018.
[105] B. Biggio, I. Corona, Z.-M. He, P. Chan, G. Giacinto, D. Yeung, F. Roli, One-and-a-half-class multiple classifier systems for secure learning against evasion attacks at test time, in: MCS, Vol. 9132 of LNCS, Springer, 2015, pp. 168–180.
[106] A. Bendale, T. E. Boult, Towards open set deep networks, in: IEEE CVPR, 2016, pp. 1563–1572.
[107] J. G. Moreno-Torres, T. Raeder, R. Alaiz-Rodríguez, N. V. Chawla, F. Herrera, A unifying view on dataset shift in classification, Patt. Rec. 45 (1) (2012) 521–530.
[108] I. Pillai, G. Fumera, F. Roli, Multi-label classification with a reject option, Patt. Rec. 46 (8) (2013) 2256–2266.
[109] P. Wild, P. Radu, L. Chen, J. Ferryman, Robust multimodal face and fingerprint fusion in the presence of spoofing attacks, Patt. Rec. 50 (2016) 17–25.
[110] B. Biggio, G. Fumera, G. L. Marcialis, F. Roli, Statistical meta-analysis of presentation attacks for secure multibiometric systems, IEEE Trans. Patt. Analysis Mach. Intell. 39 (3) (2017) 561–575.
[111] B. Biggio, I. Corona, G. Fumera, G. Giacinto, F. Roli, Bagging classifiers for fighting poisoning attacks in adversarial classification tasks, in: MCS, Vol. 6713 of LNCS, Springer-Verlag, 2011, pp. 350–359.
[112] V. Zantedeschi, M. Nicolae, A. Rawat, Efficient defenses against adversarial attacks, in: AISec, ACM, 2017, pp. 39–49.
[113] S. Zheng, Y. Song, T. Leung, I. Goodfellow, Improving the robustness of deep neural networks via stability training, in: CVPR, 2016, pp. 4480–4488.
[114] D. Maiorca, B. Biggio, M. E. Chiappe, G. Giacinto, Adversarial detection of flash malware: Limitations and open issues, CoRR abs/1710.10225.
[115] B. Nelson, B. Biggio, P. Laskov, Understanding the risk factors of learning in adversarial environments, in: AISec '11, 2011, pp. 87–92.
[116] G. F. Cretu, A. Stavrou, M. E. Locasto, S. J. Stolfo, A. D. Keromytis, Casting out demons: Sanitizing training data for anomaly sensors, in: IEEE Symp. Security and Privacy, IEEE CS, 2008, pp. 81–95.
[117] C. Liu, B. Li, Y. Vorobeychik, A. Oprea, Robust linear regression against training data poisoning, in: AISec '17, ACM, 2017, pp. 91–102.
[118] J. Steinhardt, P. W. Koh, P. Liang, Certified defenses for data poisoning attacks, in: NIPS, 2017.
[119] M. Jagielski, A. Oprea, B. Biggio, C. Liu, C. Nita-Rotaru, B. Li, Manipulating machine learning: Poisoning attacks and countermeasures for regression learning, in: 39th IEEE Symp. Security and Privacy, 2018.
[120] G. Xu, Z. Cao, B.-G. Hu, J. C. Principe, Robust support vector machines based on the rescaled hinge loss function, Patt. Rec. 63 (2017) 139–148.
[124] T. G. Dietterich, Steps toward robust artificial intelligence, AI Magazine 38 (3), 2017.
[125] Z. Lipton, The mythos of model interpretability, in: ICML Workshop on Human Interpretability of Machine Learning, 2016.

Battista Biggio received the M.Sc. degree (Hons.) in Electronic Engineering and the Ph.D. degree in Electronic Engineering and Computer Science from the University of Cagliari, Italy, in 2006 and 2010. Since 2007, he has been with the Department of Electrical and Electronic Engineering, University of Cagliari, where he is currently an Assistant Professor. In 2011, he visited the University of Tuebingen, Germany, and worked on the security of machine learning against training data poisoning. His research interests include secure machine learning, multiple classifier systems, kernel methods, biometrics and computer security. Dr. Biggio serves as a reviewer for several international conferences and journals. He is a senior member of the IEEE and a member of the IAPR.

Fabio Roli received his Ph.D. in Electronic Engineering from the University of Genoa, Italy. He was a research group member of the University of Genoa ('88-'94), and adjunct professor at the University of Trento ('93-'94). In 1995, he joined the Department of Elec-
trical and Electronic Engineering of the University of
[121] A. Christmann, I. Steinwart, On robust properties of convex risk mini-
mization methods for pattern recognition, JMLR 5 (2004) 1007–1034. Cagliari, where he is now professor of Computer En-
[122] J. Bootkrajang, A. Kaban, Learning kernel logistic regression in the gineering and head of the research laboratory on pattern recognition and appli-
presence of class label noise, Patt. Rec. 47 (11) (2014) 3641 – 3655. cations. His research activity is focused on the design of pattern recognition
[123] B. I. P. Rubinstein, P. L. Bartlett, L. Huang, N. Taft, Learning in a large
function space: Privacy-preserving mechanisms for SVM learning, J. Pri- systems and their applications. He was a very active organizer of international
vacy and Conf. 4 (1) (2012) 65–100. conferences and workshops, and established the popular workshop series on
[124] T. Dietterich, Steps Toward Robust Artificial Intelligence, AI Magazine multiple classifier systems. Dr. Roli is Fellow of the IEEE and of the IAPR.
17