Domain Generalization On Constrained
1 Introduction
For many IoT domains and applications, edge computing reduces bandwidth requirements and avoids unnecessary network communications that may raise critical security threats. Given the success of deep neural networks across a large variety of application domains, deploying state-of-the-art models on edge devices is a growing field of research [22]. However, this deployment faces several challenges of different natures, the most critical ones relating to the training data and hardware constraints.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
A. González-Vidal et al. (Eds.): GIoTS 2022, LNCS 13533, pp. 250–261, 2022.
https://doi.org/10.1007/978-3-031-20936-9_20
Compatibility of Domain Generalization and Model Pruning 251
Fig. 1. Illustration of the scope of our study. Pruning and single domain generalization
techniques are jointly used to train a model on a source domain and test on an unseen
target domain. The model must fit in a constrained MCU.
challenges for modern AI-based IoT systems. For reproducibility purposes and further experiments, our code and experiments are publicly available¹.
2 Background
2.1 Single Domain Generalization
Single domain generalization (hereafter, SDG) is a challenging setting where a model is trained on a single source dataset with the objective of generalizing to unseen but related target datasets. Traditionally, the target domain represents a real-world application with very little available training data (e.g. anomaly detection from sensors). A source domain is selected according to its closeness to the target domain and the ability to gather a sufficient amount of labelled data (e.g. simulated data). The most common way to tackle SDG is data augmentation, for example with a combination of standard input transformations found with an evolutionary algorithm, as in [21]. Adversarial data augmentation is the most popular such approach for SDG: it alternates between training and data augmentation phases, where the dataset is augmented with samples from a fictitious target domain that is "hard" under the current model [14].
As a reference method, we use the work of Xu et al. [23], which recently reached state-of-the-art performance with a scalable approach. For image classification, the authors start from the observation that semantics often relies more on object shapes than on local textures, while local textures are one of the main sources of difference between domains (as for the dogs in Fig. 1). To learn texture-invariant representations, they augment the training dataset with random convolutions that "create an infinite number of new domains" [23]. At each training iteration, images are augmented with probability p, up to three times. Each augmentation convolves the image with a kernel of randomly generated size and values, creating copies of the input image with different textures. Furthermore, they introduce a consistency loss (based on the Kullback-Leibler divergence) to encourage the model to predict the same output for all augmented images. A parameter λ tunes the contribution of the consistency loss to the global loss.
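This augmentation scheme can be sketched as follows; the kernel-size choices, the Gaussian kernel initialization, and the function names are our illustrative assumptions, not the exact implementation of [23]:

```python
import numpy as np

def random_conv_augment(img, kernel_sizes=(1, 3, 5, 7), rng=None):
    """Convolve an HxWxC image with a randomly sized, randomly valued kernel.

    Minimal sketch of random-convolution augmentation; each call produces a
    copy of the image with a different texture but unchanged shapes.
    """
    rng = rng or np.random.default_rng()
    k = int(rng.choice(kernel_sizes))
    # Gaussian kernel scaled by 1/k^2 so output intensities stay in range.
    kernel = rng.normal(0.0, 1.0 / (k * k), size=(k, k))
    pad = k // 2
    h, w, c = img.shape
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.empty_like(img, dtype=float)
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + k, j:j + k, :]
            out[i, j, :] = np.tensordot(kernel, patch, axes=([0, 1], [0, 1]))
    return out

def consistency_loss(prob_list):
    """KL divergence of each augmented prediction from the mean prediction,
    encouraging identical outputs across augmented copies."""
    p_mean = np.mean(prob_list, axis=0)
    eps = 1e-12
    kls = [np.sum(p * (np.log(p + eps) - np.log(p_mean + eps))) for p in prob_list]
    return float(np.mean(kls))
```

The consistency term is zero when all augmented copies receive identical predictions, and is added to the classification loss with weight λ.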
Pruning technique taxonomy (techniques surveyed: [11], [5], [3], [2], [15], [10], [20], [17]):
- Sparsity structure: structured / unstructured; local / global
- Pruning heuristic: magnitude-based / gradient-based / others; iterative scoring; data-agnostic
- Pruning schedule: one-shot / iterative
- Retraining procedure: fine-tuning / weight rewinding / learning rate rewinding
² https://www.tensorflow.org/lite/microcontrollers.
³ https://www.st.com/en/embedded-software/x-cube-ai.html.
254 B. Nguyen et al.
and requires the use of a specific sparse computation library (e.g. [19]) to decrease the model's resource consumption and storage.
We focus our experiments on three common pruning settings. The first is one-shot global unstructured pruning at initialization: global unstructured pruning algorithms are known to be the most efficient at producing sparse neural networks, and one-shot techniques do not increase the training budget. The second is iterative global unstructured pruning, which reduces the loss of accuracy at the cost of a bigger training budget. The third is iterative local structured pruning, since structured methods are easily compatible with standard development platforms.
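The first setting can be sketched in a few lines; the function name and the dict-of-arrays interface are our illustrative choices:

```python
import numpy as np

def global_magnitude_masks(weights, sparsity):
    """One-shot global unstructured pruning: zero out the `sparsity`
    fraction of weights with the smallest magnitude across ALL layers
    (a single global threshold, as opposed to per-layer local pruning).

    `weights` is a {layer_name: ndarray} dict; returns binary masks.
    """
    all_mags = np.concatenate([np.abs(w).ravel() for w in weights.values()])
    k = int(sparsity * all_mags.size)
    if k == 0:
        threshold = -np.inf
    else:
        # k-th smallest magnitude becomes the pruning threshold.
        threshold = np.partition(all_mags, k - 1)[k - 1]
    return {name: (np.abs(w) > threshold).astype(w.dtype)
            for name, w in weights.items()}
```

Applied at initialization, this adds no training cost; the iterative variant simply alternates such a pruning step with retraining.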
domains. The data agnosticism of SynFlow can explain this difference. With enough iterations, and independently of the dataset, SynFlow is designed to satisfy the Maximal Critical Compression axiom, which implies that the algorithm does not prune a parameter if doing so leads to layer collapse while another prunable parameter could avoid it (see [17]). Meanwhile, the SNIP heuristic is designed to discover the connections of the network that are important for training on the source task. Xu et al. [23] relax this task thanks to random convolutions, so SNIP is less relevant and iterative ranking does not improve the network's performance.
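SynFlow's data-agnostic scoring can be illustrated on a toy two-layer linear network: with an all-ones input, the saliency of a weight is |w · ∂R/∂w| where R = 1ᵀ|W₂||W₁|1. The function names and the exponential pruning schedule below are our illustrative choices, not the exact implementation of [17]:

```python
import numpy as np

def synflow_scores(w1, w2):
    """Data-agnostic SynFlow-style saliency for a two-layer linear net
    (w1: hidden x in, w2: out x hidden). No training data is needed."""
    a1, a2 = np.abs(w1), np.abs(w2)
    # dR/d|W1[i,j]| = sum_k |W2[k,i]|  (input is all ones)
    s1 = a1 * a2.sum(axis=0)[:, None]
    # dR/d|W2[k,i]| = sum_j |W1[i,j]|
    s2 = a2 * a1.sum(axis=1)[None, :]
    return s1, s2

def iterative_synflow_prune(w1, w2, sparsity, n_iters=5):
    """Prune to the target sparsity over several rounds, rescoring on the
    masked network each round: iterative scoring avoids layer collapse."""
    m1, m2 = np.ones_like(w1), np.ones_like(w2)
    total = w1.size + w2.size
    for t in range(1, n_iters + 1):
        keep = int(round(total * (1 - sparsity) ** (t / n_iters)))
        s1, s2 = synflow_scores(w1 * m1, w2 * m2)
        flat = np.concatenate([s1.ravel(), s2.ravel()])
        if keep < flat.size:
            thr = np.partition(flat, -keep)[-keep]
            m1, m2 = (s1 >= thr).astype(float), (s2 >= thr).astype(float)
    return m1, m2
```

Because the scores depend only on weight magnitudes and network connectivity, the ranking is unaffected by the (relaxed) source task, unlike SNIP's gradient-based sensitivity.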
three methods, the learning rate is initialized at 10⁻⁴ and is reduced by a factor of 0.1 at epoch 120. The magnitude heuristic is used for pruning. After each pruning step, the network is retrained for 150 epochs.
For all domains and networks, an Occam's hill [18] is observed in Fig. 4: at low sparsity rates, accuracy increases since pruning acts as a regularization process that forces the model to focus on the more important and general aspects of the task [18]. At high sparsity rates, the network's performance collapses, as classically observed. This local gain in generalization is confirmed with weight rewinding, where the network's parameters receive the same number of gradient updates for each sparsity level. Learning rate rewinding outperforms the other methods, in accordance with [15]. However, the large increase in accuracy is mostly due to the additional training iterations (gradient updates) at a high learning rate. For the following experiments, learning rate rewinding will be used.
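The three retraining procedures compared above differ only in which weights are restored and which learning-rate schedule is replayed after each pruning step. A minimal sketch, with names and the config interface being our illustrative assumptions:

```python
def lr_schedule(epoch, base_lr=1e-4):
    """Schedule used in these experiments: base LR, reduced by 0.1 at epoch 120."""
    return base_lr * (0.1 if epoch >= 120 else 1.0)

def retrain_config(method, trained_weights, early_weights, last_epoch=150):
    """Return (weights to restart from, LR per retraining epoch).

    - fine-tuning:        keep trained weights, keep the final (small) LR
    - weight rewinding:   rewind weights to an early checkpoint, replay the schedule
    - LR rewinding:       keep trained weights, but replay the schedule from scratch
    """
    if method == "fine-tuning":
        return trained_weights, [lr_schedule(last_epoch)] * last_epoch
    if method == "weight-rewinding":
        return early_weights, [lr_schedule(e) for e in range(last_epoch)]
    if method == "lr-rewinding":
        return trained_weights, [lr_schedule(e) for e in range(last_epoch)]
    raise ValueError(method)
```

The sketch makes the observation above concrete: learning rate rewinding is the only procedure that keeps the trained weights while retraining with a high learning rate, hence its extra high-LR gradient updates.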
The RealWorld HAR dataset [16] gathers fifteen subjects equipped with smartphones and smartwatches at seven different body positions (head, chest, upper arm, waist, forearm, thigh, and shin) performing eight activities (climbing stairs down and up, jumping, lying, standing, sitting, running/jogging, and walking). From their devices, accelerometer and gyroscope data are sampled at 50 Hz.
We follow the reference procedure of Chang et al. [1]. The accelerometer signals are sampled in fixed-width sliding windows of 3 s with no overlap. A trace is discarded if it includes an activity transition, timestamp noise, or data points without labels. The neural network is trained with the data from one body location (the chest) and then tested on the other body locations.
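The windowing step can be sketched as follows; the function name is ours, and timestamp-noise filtering is omitted for brevity:

```python
import numpy as np

def segment_windows(signal, labels, fs=50, win_sec=3):
    """Cut a [T, channels] signal into non-overlapping 3-s windows at 50 Hz,
    discarding any window that spans an activity transition."""
    win = fs * win_sec  # 150 samples per window
    xs, ys = [], []
    for start in range(0, len(signal) - win + 1, win):
        w_labels = labels[start:start + win]
        if len(set(w_labels)) != 1:      # activity transition: discard
            continue
        xs.append(signal[start:start + win])
        ys.append(w_labels[0])
    if not xs:
        return np.empty((0, win, signal.shape[1])), ys
    return np.stack(xs), ys
```

Each kept window thus carries exactly one activity label, so the classifier sees 150-sample, multi-channel inputs.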
For our experiments, we use a variant of the model proposed in [1] in which
instance normalization layers are replaced with standard batch-normalization
layers. We adapt the technique of Xu et al. [23] with temporal convolutions using random kernels of sizes within [1, 7]. The original-data fraction parameter p and the consistency loss factor λ are fixed at 0.5 and 5, respectively. We keep the SynFlow heuristic since it performs well in all settings of the digit benchmarks. Our results are averaged over three training seeds.
For these experiments, the network is trained for 70 epochs with the Adam optimizer, a batch size of 32, and an initial learning rate of 0.001, which is divided by 2 at epochs 40 and 60. For iterative pruning, the network is retrained for 50 epochs with learning rate rewinding after each pruning step. We also follow the evaluation process of [1] and measure the F1-score with macro-averaging (the mean of all per-class F1 scores).
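The macro-averaged F1-score weights all classes equally, which matters here since the activity classes are imbalanced. A minimal sketch without external libraries:

```python
def macro_f1(y_true, y_pred, classes):
    """Macro-averaged F1: mean of the per-class F1 scores.
    A class absent from both y_true and y_pred contributes 0."""
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)
```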
A first observation from Fig. 7 is the efficiency of our customized version of the method of Xu et al. [23]: for all target domains, random convolutions enable the model to reach a higher F1-score than the classically trained model, despite a lower F1-score on the source domain. Second, we highlight an interesting compatibility between [23] and pruning techniques, since compression ratios of up to 80% and 50% can be reached without loss of accuracy on the source domains for unstructured and structured pruning respectively, although models trained with random convolutions are more impacted by high compression rates, particularly for structured pruning (right).
Figure 7 also shows that pruning improves the generalization capacity: without random convolutions (bottom), the F1-score of the network increases on the target domains at high sparsity rates for all pruning settings. Furthermore, with random convolutions, this increase is also observed in the one-shot unstructured pruning
Fig. 7. Pruning on RealWorld HAR: trained with (top) and without (bottom) ran-
dom convolutions, one-shot at initialization (left) and iterative (centre) unstructured
pruning and iterative structured pruning (right).
setting (top-left) for the body positions farthest from the source domain (thigh and shin versus the chest). On the contrary, for iterative pruning (top-centre and top-right), pruning increases the F1-score on target domains close to the source domain while decreasing it on target domains far from the source domain. This effect can be explained by the additional training iterations (gradient updates) caused by iterative pruning with learning rate rewinding.
5 Conclusion
We experimentally evaluate the impact of pruning techniques in the single
domain generalization setting with state-of-the-art methods and two benchmarks
Acknowledgments. This work benefited from the French Jean Zay supercomputer
thanks to the AI dynamic access program. This collaborative work is partially sup-
ported by the IPCEI on Microelectronics and Nano2022 actions and by the European
project InSecTT (www.insectt.eu: ECSEL Joint Undertaking (876038). The JU receives
support from the European Union’s H2020 program and Au, Sw, Sp, It, Fr, Po, Ir, Fi,
Sl, Po, Nl, Tu. The document reflects only the author’s view and the Commission is
not responsible for any use that may be made of the information it contains.) and by
the French National Research Agency (ANR) in the framework of the Investissements
d’Avenir program (ANR-10-AIRT-05, irtnanoelec).
References
1. Chang, Y., Mathur, A., Isopoussu, A., Song, J., Kawsar, F.: A systematic study
of unsupervised domain adaptation for robust human-activity recognition. Proc.
ACM Interact. Mobile Wearable Ubiquit. Technol. 4(1), 1–3 (2020)
2. Frankle, J., Carbin, M.: The lottery ticket hypothesis: finding sparse, trainable
neural networks. In: International Conference on Learning Representations (2019)
3. Han, S., Pool, J., Tran, J., Dally, W.: Learning both weights and connections for
efficient neural network. Adv. Neural Inf. Proc. Syst. 1, 1135–1143 (2015)
4. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition.
In: Proceedings of the Conference on Computer Vision and Pattern Recognition
(2016)
5. He, Y., Liu, P., Wang, Z., Hu, Z., Yang, Y.: Filter pruning via geometric median for
deep convolutional neural networks acceleration. In: Proceedings of the Conference
on Computer Vision and Pattern Recognition (2019)
6. Howard, A.G., et al.: MobileNets: efficient convolutional neural networks for mobile
vision applications. arXiv preprint arXiv:1704.04861 (2017)
7. Hull, J.J.: A database for handwritten text recognition research. IEEE Trans. Pat-
tern Anal. Mach. Intell. 16(5), 550–554 (1994)
8. Ismail Fawaz, H., et al.: InceptionTime: finding AlexNet for time series classifica-
tion. Data Min. Knowl. Disc. 34, 1–27 (2020)
9. LeCun, Y., et al.: Backpropagation applied to handwritten zip code recognition.
Neural Comput. 1(4), 541–551 (1989)
10. Lee, N., Ajanthan, T., Torr, P.: SNIP: single-shot network pruning based on connection sensitivity. In: International Conference on Learning Representations (2018)
11. Li, H., Kadav, A., Durdanovic, I., Samet, H., Graf, H.P.: Pruning filters for efficient
convnets. In: International Conference on Learning Representations (2017)
12. Munappy, A., Bosch, J., Olsson, H.H., Arpteg, A., Brinne, B.: Data management
challenges for deep learning. In: 2019 45th Euromicro Conference on Software Engi-
neering and Advanced Applications (SEAA). IEEE (2019)
13. Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits
in natural images with unsupervised feature learning. NIPS (2011)
14. Qiao, F., Zhao, L., Peng, X.: Learning to learn single domain generalization. In:
Proceedings of the Conference on Computer Vision and Pattern Recognition (2020)
15. Renda, A., Frankle, J., Carbin, M.: Comparing rewinding and fine-tuning in neural
network pruning. In: International Conference on Learning Representations (2020)
16. Sztyler, T., Stuckenschmidt, H.: On-body localization of wearable devices: an inves-
tigation of position-aware activity recognition. In: 2016 IEEE International Con-
ference on Pervasive Computing and Communications (PerCom). IEEE (2016)
17. Tanaka, H., Kunin, D., Yamins, D.L., Ganguli, S.: Pruning neural networks without
any data by iteratively conserving synaptic flow. Adv. Neural Inf. Proc. Syst. 33,
6377–6389 (2020)
18. Thodberg, H.H.: Improving generalization of neural networks through pruning. Int.
J. Neural Syst. 1(4), 317–326 (1991)
19. Trommer, E., Waschneck, B., Kumar, A.: dCSR: a memory-efficient sparse matrix
representation for parallel neural network inference. In: 2021 IEEE/ACM Interna-
tional Conference On Computer Aided Design (ICCAD). IEEE (2021)
20. Verdenius, S., Stol, M., Forré, P.: Pruning via iterative ranking of sensitivity statis-
tics. arXiv preprint arXiv:2006.00896 (2020)
21. Volpi, R., Murino, V.: Addressing model vulnerability to distributional shifts over
image transformation sets. In: Proceedings of the IEEE/CVF International Con-
ference on Computer Vision (2019)
22. Wang, X., Han, Y., Leung, V.C., Niyato, D., Yan, X., Chen, X.: Convergence of
edge computing and deep learning: a comprehensive survey. IEEE Commun. Surv.
Tutorials 22, 869–904 (2020)
23. Xu, Z., Liu, D., Yang, J., Raffel, C., Niethammer, M.: Robust and generalizable
visual representation learning via random convolutions. In: International Confer-
ence on Learning Representations (2021)