
AI Creative Research Project 1

AI8037-00

Dr. Mujeen Sung


Assistant Professor, Kyung Hee University

Presented by: Sumit Kumar Dam, M.Sc. + Ph.D. Combined (3rd Semester)


Student ID: 2022315499
Date: 19 March 2024
Networking Intelligence Lab, Professor Choong Seon Hong

Outline 2/16

 Topics to be discussed

1. Adversarial Attack
• Definition
• Types of Adversarial Attack
• How to defend against adversarial attacks?
2. Universal adversarial training with class-wise perturbations (ICME 2021)
• Core Contributions
• Why I Chose this Paper?
3. Background and Motivation
• UAP Crafting Algorithm
• Effects of class-wise UAPs
4. Universal Adversarial Training with Class-wise UAPs
• Class-wise UAT Algorithm
5. Results
• Comparison with SOTA UAT
• CIFAR Results
• ImageNet-10 Results
• Analysis and visualization

6. Conclusion

Adversarial Attack (1/2) 3/16

 Definition

At a high level, an adversarial attack introduces carefully crafted changes to input data that cause a machine learning model (often a deep learning model) to make incorrect predictions or classifications, while the changes remain imperceptible (or nearly so) to humans.
To get an idea of what adversarial examples look like, consider the following demonstrations.

Fig. 1. By adding a perturbation, "panda" is classified as "gibbon" [1]. Fig. 2. Stickers and posters on vital traffic signs [2].
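As a hedged illustration of how such a perturbation can be generated, the sketch below implements the Fast Gradient Sign Method from [1]; `model`, `x`, and `y` are placeholder names for an arbitrary classifier, input batch, and label batch, not the setup of any specific experiment.

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, epsilon=8 / 255):
    """Craft adversarial examples with the Fast Gradient Sign Method [1].

    `model`, `x` (input batch in [0, 1]), and `y` (true labels) are placeholders
    for whatever classifier and data are at hand; epsilon bounds the L-inf norm
    of the perturbation.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)   # loss w.r.t. the true labels
    loss.backward()                       # gradient of the loss w.r.t. the input
    x_adv = x + epsilon * x.grad.sign()   # one step in the sign of the gradient
    return x_adv.clamp(0, 1).detach()     # keep pixels in the valid range
```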

[1] Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy. "Explaining and harnessing adversarial examples." arXiv preprint arXiv:1412.6572 (2014).
[2] https://www.analyticsvidhya.com/blog/2022/09/machine-learning-adversarial-attacks-and-defense/

Adversarial Attack (2/2) 4/16

 Types of Adversarial Attacks

These attacks can be further divided into two broad categories: white-box attacks and black-box attacks.
In black-box attacks, attackers do not have direct access to the model internals but may know the type of model and can access its outputs. They generate adversarial examples based on this limited information in the hope that these will transfer to the target model. In contrast, the attacker does have access to the model's parameters in white-box attacks.

Apart from these two, some other attacks are also widely encountered in this field:
 Data Poisoning
 Byzantine Attacks
 Evasion
 Model Extraction

Fig. 3. Illustration of black-box adversarial attacks [3].

[3] Pang, Ren, et al. "Advmind: Inferring adversary intent of black-box attacks." Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2020.

How to defend against adversarial attacks? 5/16

 Adversarial Training

Despite their overwhelming success in a wide range of applications, machine learning models (often deep learning models) are widely recognized
to be vulnerable to adversarial examples. This intriguing phenomenon led to a competition between adversarial attacks and defense techniques. So
far, adversarial training is the most widely used method for defending against adversarial attacks.

Adversarial training is when engineers use adversarial examples to retrain a model to make it more robust against perturbations. It can be summarized in three main points (see the training-loop sketch after this list):
 Creation of Adversarial Examples & Training the Model
 Robustness Improvement
 Trade-offs
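As a minimal sketch of the first two points, the loop below generates adversarial examples on the fly (here with a single FGSM step, one simple choice among many) and trains the model on them; `model`, `loader`, `optimizer`, and the step size are hypothetical placeholders rather than any specific paper's setup.

```python
import torch
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, epsilon=8 / 255):
    """One epoch of adversarial training: perturb each batch, then train on it."""
    model.train()
    for x, y in loader:
        # 1) Create adversarial examples for the current batch (single FGSM step).
        x_pert = x.clone().detach().requires_grad_(True)
        F.cross_entropy(model(x_pert), y).backward()
        x_adv = (x_pert + epsilon * x_pert.grad.sign()).clamp(0, 1).detach()

        # 2) Train the model on the perturbed batch (robustness improvement).
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
```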

Paper Title:
Universal adversarial training with class-wise perturbations

Authors:
Philipp Benz, Chaoning Zhang, Adil Karjauv, In So Kweon

ICME 2021 | Google citations: 28

Presented by: Sumit Kumar Dam
Date: 19 March 2024

Universal adversarial training with class-wise perturbations (ICME 2021) 7/16

 Core Contributions

In this research, the authors find that the generated UAP often does not attack images equally for each class. Given that UAPs contain dominant
semantic features of a certain class [17], this class-wise imbalance [8] is somewhat expected. Moreover, since only a single UAP is adopted for all
samples during adversarial training, there is inevitably a lack of diverse perturbation directions.

To overcome this lack of diverse perturbation directions, the authors propose:
 Class-wise UAT, i.e., replacing the single UAP with class-wise UAPs during training.
 Their class-wise UAT outperforms the SOTA UAT [19] by a considerable margin for both clean accuracy and adversarial robustness against universal attacks.

 Why I Chose this Paper?


 Since this paper deals with UAT and class-wise UAPs, and I am also working on adversarial attacks and model robustness, I found it useful for my future research.

[8] Philipp Benz, Chaoning Zhang, Adil Karjauv, and In So Kweon, “Robustness may be at odds with fairness: An empirical study on class-wise accuracy,” NeurIPS workshop, 2020.
[17] Chaoning Zhang, Philipp Benz, Tooba Imtiaz, and In So Kweon, “Understanding adversarial examples from the mutual influence of images and perturbations,” in CVPR, 2020.

Background and Motivation (1/4) 8/16

 UAP Crafting Algorithm (1/2)

Universal Adversarial Perturbations (UAPs) [16] fool most images with a single perturbation. Assume a classifier parameterized by weights θ and a dataset X from which samples x are drawn; the objective is then to craft a single UAP δ that fools the model on as many samples as possible.
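The objective itself appears only as an image in the original slide; a standard loss-maximization formulation consistent with the surrounding text and with the algorithms of [17, 19] is:

\[
\max_{\delta}\; \mathbb{E}_{x \sim X}\!\left[\mathcal{L}\big(f_\theta(x+\delta),\, y_x\big)\right]
\quad \text{s.t.} \quad \|\delta\|_{\infty} \le \varepsilon ,
\]

where \(\mathcal{L}\) is the classification loss and \(y_x\) the ground-truth label of x.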

Here ε indicates an upper bound for the permissible perturbation magnitude.

[16] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard, “Universal adversarial perturbations,” in CVPR, 2017.

Background and Motivation (2/4) 9/16

 UAP Crafting Algorithm (2/2)

The classical UAP [16] adopted the image-dependent attack method DeepFool [16] to iteratively accumulate the final perturbation. [19, 17] argued that this process is slow and proposed faster, more efficient UAP algorithms.

Here the authors adopt the state-of-the-art algorithm proposed in [17] to craft UAPs, which is shown in Algorithm 1. The UAP was crafted with Algorithm 1 for ε = 8 and 2000 iterations with the Adam optimizer.
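Algorithm 1 itself appears only as an image in the slides; the sketch below is a hedged reconstruction of a loss-maximizing UAP crafting loop in the spirit of [17, 19], using Adam and an L-infinity budget ε as stated above. `model`, `loader`, and the CIFAR-sized perturbation shape are assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def craft_uap(model, loader, epsilon=8 / 255, iters=2000, lr=0.01, device="cpu"):
    """Craft a single universal perturbation by maximizing the loss over batches.

    A hedged sketch in the spirit of the loss-maximization UAP algorithms of
    [17, 19], not a line-by-line reproduction of the paper's Algorithm 1.
    """
    for p in model.parameters():              # the classifier stays frozen
        p.requires_grad_(False)
    model.eval()

    # One shared perturbation for all images (CIFAR-sized here), optimized with Adam.
    delta = torch.zeros(1, 3, 32, 32, device=device, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=lr)

    data_iter = iter(loader)
    for _ in range(iters):
        try:
            x, y = next(data_iter)
        except StopIteration:                 # restart the loader when exhausted
            data_iter = iter(loader)
            x, y = next(data_iter)
        x, y = x.to(device), y.to(device)

        optimizer.zero_grad()
        loss = F.cross_entropy(model((x + delta).clamp(0, 1)), y)
        (-loss).backward()                    # maximize the loss w.r.t. delta
        optimizer.step()

        with torch.no_grad():
            delta.clamp_(-epsilon, epsilon)   # project onto the L-inf ball

    return delta.detach()
```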

[16] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard, “Universal adversarial perturbations,” in CVPR, 2017.
[17] Chaoning Zhang, Philipp Benz, Tooba Imtiaz, and In So Kweon, “Understanding adversarial examples from the mutual influence of images and perturbations,” in CVPR, 2020.
[19] Ali Shafahi, Mahyar Najibi, Zheng Xu, John P Dickerson, Larry S Davis, and Tom Goldstein, “Universal adversarial training.,” in AAAI, 2020.

Background and Motivation (3/4) 10/16

 Effects of class-wise UAPs (1/2)

The authors highlight that UAPs do not attack images from all classes equally, which can ultimately lead to inefficiencies in UAT. To show this, they plot the class-wise accuracies of a ResNet-18 trained on CIFAR for all samples perturbed by a UAP (Fig. 1). The UAP was crafted with Algorithm 1. The plots in Figure 1 show that there is a discrepancy in the class-wise accuracies.

Fig. 1. Class-wise accuracies without (“Clean”) and with UAP (“UAP”) applied. The plots are shown for a standard ResNet-18 on CIFAR-10 (top) and CIFAR-100 (bottom).

Background and Motivation (4/4) 11/16

 Effects of class-wise UAPs (2/2)

By contrast, when one UAP is crafted separately for the samples of each class, the class-wise accuracies of all classes can be decreased significantly. The authors term such perturbations class-wise UAPs. Figure 2 depicts the increased severity of the class-wise UAP.

The class-wise accuracies for the ResNet-18 model trained and evaluated on CIFAR-10 all drop to around or below 10%. The same trend is observed for the ResNet-18 model trained on CIFAR-100.

Fig. 2. Class-wise accuracies without (“Clean”) and with class-wise UAP (“Class-Wise UAP”) applied to the images. The plots are shown for a standard ResNet-18 on CIFAR-10 (top) and CIFAR-100 (bottom).

Universal Adversarial Training with Class-wise UAPs 12/16

 Class-wise UAT Algorithm

The class-wise UAP δ_c has the objective of fooling the model for most samples x ∼ X_c with ground-truth class c while obeying the magnitude constraint ε.
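More formally (the equation is shown as an image in the slide; the form below assumes it mirrors the single-UAP objective above, restricted to the samples X_c of class c):

\[
\max_{\delta_c}\; \mathbb{E}_{x \sim X_c}\!\left[\mathcal{L}\big(f_\theta(x+\delta_c),\, c\big)\right]
\quad \text{s.t.} \quad \|\delta_c\|_{\infty} \le \varepsilon .
\]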

In Algorithm 2, the authors adapt the UAT algorithm introduced in [19] to leverage class-wise UAPs. The resulting algorithm is termed class-wise UAT.
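Algorithm 2 is likewise shown as an image; the sketch below is a hedged reconstruction of how a class-wise UAT loop could alternate between updating one perturbation per class (gradient ascent) and updating the model (gradient descent), in the spirit of the UAT loop of [19]. `model`, `loader`, `model_opt`, the image size, and the learning rates are placeholders, not values from the paper.

```python
import torch
import torch.nn.functional as F

def classwise_uat(model, loader, model_opt, num_classes=10, epochs=10,
                  epsilon=8 / 255, delta_lr=0.01, device="cpu"):
    """Universal adversarial training with one UAP per class (hedged sketch).

    Alternates between a gradient-ascent step on the class-wise perturbations
    and a gradient-descent step on the model, in the spirit of UAT [19].
    """
    # One universal perturbation per class, e.g. for 32x32 RGB (CIFAR-sized) images.
    deltas = torch.zeros(num_classes, 3, 32, 32, device=device, requires_grad=True)
    delta_opt = torch.optim.Adam([deltas], lr=delta_lr)

    model.train()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)

            # 1) Ascent on the class-wise UAPs: each sample gets its class's UAP.
            delta_opt.zero_grad()
            loss_adv = F.cross_entropy(model((x + deltas[y]).clamp(0, 1)), y)
            (-loss_adv).backward()                # maximize the loss w.r.t. deltas
            delta_opt.step()
            with torch.no_grad():
                deltas.clamp_(-epsilon, epsilon)  # project onto the L-inf ball

            # 2) Descent on the model, using the updated perturbations (detached).
            model_opt.zero_grad()
            x_adv = (x + deltas[y].detach()).clamp(0, 1)
            F.cross_entropy(model(x_adv), y).backward()
            model_opt.step()

    return deltas.detach()
```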

[19] Ali Shafahi, Mahyar Najibi, Zheng Xu, John P Dickerson, Larry S Davis, and Tom Goldstein, “Universal adversarial training.,” in AAAI, 2020.

Results (1/4) 13/16

 Comparison with SOTA UAT

The authors compare their method with the results reported in [19] for the Wide-ResNet. The results on CIFAR-10 are shown in Table 1.

This approach outperforms the previous SOTA method UAT [19] by a significant margin: UAT with class-wise UAPs improves the robustness of the model to UAPs by 3%.

Moreover, for the proposed class-wise UAT method, the accuracy drop under the UAP attack is only 0.1%, which suggests that the resulting model is almost immune to the UAP attack.

[19] Ali Shafahi, Mahyar Najibi, Zheng Xu, John P Dickerson, Larry S Davis, and Tom Goldstein, “Universal adversarial training.,” in AAAI, 2020.

Results (2/4) 14/16

 CIFAR Results

In the following, an extended comparison of class-wise UAT with adversarial training using PGD and with UAT [19] is presented. Compared to UAT [19], their class-wise UAT outperforms it by a significant margin for both models. Specifically, for ResNet-18, training with class-wise UAPs improves performance by 4.5%, while for the Wide-ResNet the performance improves by 5%.

[19] Ali Shafahi, Mahyar Najibi, Zheng Xu, John P Dickerson, Larry S Davis, and Tom Goldstein, “Universal adversarial training.,” in AAAI, 2020.

Results (3/4) 15/16

 ImageNet-10 Results

Here, the authors evaluate the efficacy of class-wise UAT on higher-resolution images. They randomly select a subset of 10 classes from the ImageNet dataset ("drake", "beaver", "ballpoint", "beach wagon", "palace", "red wolf", "water buffalo", "barbershop", "chain-link fence", "banana"). The results resemble those on the CIFAR datasets. For example, on WRN-50, class-wise UAT significantly improves the robustness against the UAP attack from 81.3% to 90.4%.

[19] Ali Shafahi, Mahyar Najibi, Zheng Xu, John P Dickerson, Larry S Davis, and Tom Goldstein, “Universal adversarial training.,” in AAAI, 2020.

Results (4/4) 16/16

 Analysis and visualization

In Figure 3, it can be observed that the model trained with UAT [19] shows a more unbalanced class-wise performance: the accuracies of some vulnerable classes, such as "bird", "cat", "deer", and "dog", are relatively low, while the others are relatively high. A similar trend can be observed for the proposed class-wise UAT; however, the robustness of those vulnerable classes increases by a large margin.

Fig. 3. Class-wise accuracies without (top) and with a UAP applied (bottom) for a ResNet-18 model trained with UAT and our proposed class-wise UAT on CIFAR-10.

[19] Ali Shafahi, Mahyar Najibi, Zheng Xu, John P Dickerson, Larry S Davis, and Tom Goldstein, “Universal adversarial training.,” in AAAI, 2020.

Conclusion 17/16

 Conclusion

 In this work, the authors show that a UAP does not attack images from all classes equally, and they identify the SOTA UAT algorithm, which adopts only a single perturbation during training, as a source of unbalanced robustness.
 To address this, they propose class-wise UAT, which outperforms the existing SOTA UAT by a large margin for both clean accuracy and adversarial robustness against universal attacks.
 Due to the enhanced robustness of vulnerable classes, the class-wise UAT also contributes to a more balanced class-wise robustness.

Thank You
