
Gradient Leakage Attacks in Federated Learning:


Research Frontiers, Taxonomy and Future
Directions
Haomiao Yang, Member, IEEE, Mengyu Ge, Graduate Student Member, IEEE, Dongyun Xue, Kunlan Xiang,
Hongwei Li, Senior Member, IEEE, Rongxing Lu, Fellow, IEEE

Abstract—Federated learning (FL) is a distributed deep learning framework that has become increasingly popular in recent years. Essentially, FL enables numerous participants and a parameter server to co-train a deep learning model through shared gradients without revealing the private training data. Recent studies, however, have shown that a potential adversary (either the parameter server or a participant) can recover private training data from the shared gradients; such behavior is called a gradient leakage attack (GLA). In this study, we first present an overview of FL systems and outline the GLA philosophy. We classify the existing GLAs into two paradigms: optimization-based and analytics-based attacks. In particular, the optimization-based approach defines the attack process as an optimization problem, whereas the analytics-based approach defines the attack as a problem of solving multiple linear equations. We present a comprehensive review of the state-of-the-art GLA algorithms, followed by a detailed comparison. Based on the observed shortcomings of the existing optimization-based and analytics-based methods, we devise a new generation-based GLA paradigm. We demonstrate the superiority of the proposed GLA in terms of data reconstruction performance and efficiency, thus posing a greater potential threat to federated learning protocols. Finally, we pinpoint a variety of promising future directions for GLA.

Index Terms—Federated learning, gradient leakage attack, analytics-based attacks, optimization-based attacks, generation-based attacks

I. INTRODUCTION

Conventional machine learning patterns require data to be aggregated at a central location before model training. However, after leaving its original domain, the data can become uncontrollable, leading to data privacy leakage and data security risks. Federated learning (FL) [1] is a distributed machine learning technique proposed by Google. Unlike traditional machine learning, it does not require data to be centralized in a single location, thus enabling distributed collaborative model training while preserving data privacy. Due to its intrinsic characteristics of privacy preservation and decentralization, FL has been widely adopted across various industries.

FL systems mainly consist of a central parameter server and multiple participants. At each communication round, the parameter server sends the model weights from the previous round to the participants, who update the weights based on their private data. Then, each participant sends the gradient of the global model back to the parameter server for aggregation. Throughout this process, the central server has access only to the gradients or model parameters uploaded by the participants, and hence, data privacy is protected. Unfortunately, several studies have shown that gradients still implicitly contain information about the original data, making it possible to steal private data [2], [6], [7]. We call such behavior a gradient leakage attack (GLA).

To better understand GLA, we classify GLAs into two categories: optimization-based attacks and analytics-based attacks. Among the optimization-based methods, the most representative work is deep leakage from gradients (DLG), proposed by Zhu et al. [2]. They consider an honest-but-curious parameter server as the adversary. To recover the original image data from the gradients uploaded by the participants, the adversary first randomly initializes a noisy image and a random label (also called a dummy image and dummy label, respectively). Then, the adversary optimizes the dummy image and dummy label simultaneously to make the gradients generated by the model close to the captured real gradients. When the optimization process converges, the dummy image and dummy label are considered to be the original private data.

The study by Phong et al. [3] is the groundbreaking work on analytics-based attacks. They were the first to demonstrate how to analytically compute the raw input of a single-layer perceptron when only gradients are available. Fan et al. [4] further extended this concept to a single convolutional layer and proposed a closed-form algorithm. Unlike optimization-based methods, analytics-based attacks are fast and accurate, because their core insight is to solve one or more linear systems of equations. In large and complex deep learning models such as ResNet and Transformer-based models, however, analytics-based attacks may face overdetermined and underdetermined cases when solving the linear equations.

In the existing literature on GLA, many research works are relatively independent and may overlap in attack methods. Moreover, almost no work has broadly generalized or classified GLAs. A deeper and more thorough understanding of GLA is indispensable before any type of privacy-preserving FL service can be offered to users. Therefore, a comprehensive summary of state-of-the-art research on GLA is necessary to guide further research directions, including defense strategies.

In this study, we comprehensively investigate the state-of-the-art GLAs in FL. We first review the FL system and outline the GLA philosophy, demonstrating that the latest research efforts enable the definition of a complete taxonomy of methods for reconstructing participants' original private data from gradients. We then give a brief overview

Authorized licensed use limited to: Ewha Womans Univ. Downloaded on June 12,2023 at 16:21:23 UTC from IEEE Xplore. Restrictions apply.
of the existing GLAs, explain the starting point of research in this field, and describe what problems are addressed. Meanwhile, we provide not only a classification of the existing GLAs but also a detailed consideration of their upsides and downsides. In view of the shortcomings of the existing work, such as slow data reconstruction and poor performance on large models, we further propose a novel generation-based attack strategy. Finally, we highlight some purposeful open questions and consider future research directions that we believe will shed new light on the field of GLAs.

II. PRELIMINARIES

A. Federated Learning

FL is a distributed machine learning framework that enables data sharing and collaborative modeling while also ensuring data privacy, security, and legal compliance. In general, there are four phases of FL in each communication round. First, the parameter server distributes the model parameters from the previous round to each participant or a subset of them. In the first communication round, the model parameters are randomly initialized. Second, each selected participant feeds its local data into the global model for training. Given the limited computational resources of the participants, FL protocols typically require only one iteration of stochastic gradient descent by the participants. Third, all participants send the gradients or model weights obtained by local training back to the parameter server. Finally, the parameter server aggregates the gradients or weights from all participants and updates the global model according to Eq. (1), where |n_i| represents the amount of data on the i-th participant, m is the number of participants, |N| is the sum of all |n_i|, η is the learning rate, W_t denotes the global model weights at round t, and ∇W_i is the gradient shared by the i-th participant. The FL process ends when these iterative updates of the model parameters converge.

$$W_{t+1} = W_t - \eta \sum_{i=1}^{m} \frac{|n_i|}{|N|} \nabla W_i \qquad (1)$$

B. GLA Philosophy

In general, any attack that obtains information about the original data from the gradients can be called a gradient leakage attack. Specifically, from the gradients, an attacker can infer properties of the training data, class representatives [5], labels [6], or even the data itself [2]. Whereas the first of these, property inference, is shallow and less threatening [2], the last three attacks can be accurate down to the details of the original training data and thus pose a significant threat to FL services. In this study, we focus on the latter three GLAs.

Fig. 1 presents an overview of an FL system that suffers from GLA. After a participant feeds its local data to the global model and computes the gradient by backpropagation, the gradient is passed back to the parameter server. To reduce the communication overhead, participants may compress the gradients before uploading them. Finally, an adversary can threaten FL by exploiting the gradient information and the global model to reconstruct private data through various GLA algorithms.

Fig. 1. An illustration of the GLA process: The adversary (either the server or the participant) reconstructs the private training data by accessing the shared gradients.

III. THREAT MODEL

GLA is an attack in which the adversary recovers or reconstructs information related to the original training data from the shared gradients. We define the threat model in terms of two properties: adversary capability and adversary knowledge. The capability of the adversary distinguishes active from passive attacks, and the adversary knowledge distinguishes auxiliary-free from auxiliary-based attacks.

a) Adversary capability: An adversary in an active attack scenario is usually highly capable. Specifically, not only does the adversary have access to the gradients and model parameters, but it can also modify the model parameters and even the model structure to make it easier to steal private data. In contrast, passive adversaries are usually honest but curious: they use only the information they can legitimately obtain and secretly snoop on private data from that information.

b) Adversary knowledge: Typically, the adversary has information only about the model parameters and the gradients of the participants. Reconstructing private data from this information alone is called an auxiliary-free attack. To obtain more accurate reconstruction results, some adversaries may exploit additional auxiliary information, including auxiliary datasets, batch normalization statistics of the global model, and pretrained generative adversarial networks (GANs); we refer to these as auxiliary-based attacks. The introduction of auxiliary information has become an important part of current research on FL systems [8], [9].

IV. STATE-OF-THE-ART GLA METHODS

According to the attack techniques and the reconstruction result, all state-of-the-art GLA methods can be divided into two categories: optimization-based and analytics-based attacks. Next, for each type of attack, we briefly review the existing leading attack methods and discuss their strengths and weaknesses. See TABLE I for a comparison of existing works.
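Before turning to the attacks, the server-side aggregation rule of Eq. (1) in Section II can be sketched in a few lines. This is a minimal illustration with hypothetical names (`fedavg_update` and its parameters are ours, not from any FL library), assuming each participant uploads a plain, uncompressed gradient:

```python
import numpy as np

def fedavg_update(w_t, grads, n_samples, lr=0.1):
    """One server-side round per Eq. (1): W_{t+1} = W_t - eta * sum_i (|n_i|/|N|) grad_i.

    w_t       : current global weights (a flat array here, for illustration)
    grads     : list of per-participant gradients, one array per client
    n_samples : list of local dataset sizes |n_i|
    lr        : learning rate eta
    """
    total = sum(n_samples)                                      # |N|
    weighted = sum((n / total) * g for n, g in zip(n_samples, grads))
    return w_t - lr * weighted

# two clients holding 10 and 30 samples
w = np.zeros(4)
new_w = fedavg_update(w, [np.ones(4), 3 * np.ones(4)], [10, 30], lr=0.1)
# weighted gradient = 0.25*1 + 0.75*3 = 2.5, so each entry becomes -0.25
```

The per-client weighting |n_i|/|N| is what lets clients with more data pull the global model harder; the shared `grads` list is exactly the information a GLA adversary observes.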

TABLE I
A comprehensive comparison of the state-of-the-art GLA methods

Method                 Taxonomy      Capability  Knowledge        Batch Size  Shared     Comp [1]  Res [2]  Pre [3]  Open [4]  Year  Remark
Phong et al. [3]       Analytics     Passive     Auxiliary-free   1           Gradients  No        20×20    No       No        2017  Analytics founder
DMU-GAN [5]            Optimization  Active      Auxiliary-free   -           Gradients  No        64×64    No       Yes       2017  Optimization pioneer
DLG [2]                Optimization  Passive     Auxiliary-free   Max: 8      Gradients  No        64×64    No       Yes       2019  Optimization founder
iDLG [6]               Optimization  Passive     Auxiliary-free   Max: 1      Gradients  No        32×32    No       Yes       2020  -
CPL [9]                Optimization  Passive     Auxiliary-based  Max: 8      Gradients  Yes       32×32    No       Yes       2020  Strong assumption
IG [7]                 Optimization  Passive     Auxiliary-free   Max: 100    Gradients  No        32×32    No       Yes       2020  Known labels
Fan et al. [4]         Analytics     Passive     Auxiliary-free   1           Gradients  No        32×32    -        No        2020  -
STG [8]                Optimization  Passive     Auxiliary-based  Max: 48     Gradients  No        224×224  Yes      No        2021  BN statistics
R-gap [11]             Analytics     Passive     Auxiliary-based  2           Gradients  No        32×32    -        Yes       2021  -
Fowl et al. [12]       Analytics     Active      Auxiliary-free   Max: 512    Gradients  No        224×224  Yes      Yes       2021  Strong assumption
LLG [14]               Analytics     Passive     Auxiliary-free   Max: 128    Gradients  No        -        No       Yes       2021  Labels only
Franziska et al. [13]  Analytics     Active      Auxiliary-free   100         Gradients  No        32×32    -        No        2022  Success rate 0.4-0.6
DLM [10]               Optimization  Passive     Auxiliary-free   1           Weights    No        32×32    No       No        2022  -
HCGLA [15]             Optimization  Passive     Auxiliary-based  Max: 16     Gradients  Yes       32×32    No       Yes       2023  -

[1] Comp (compression) indicates whether the work considers a gradient compression scenario.
[2] Res indicates the private image resolution.
[3] Pre (pretraining) indicates whether the attack applies to a federated learning scenario in which the model has been pre-trained.
[4] Open indicates whether the work publishes its code.
"-" indicates that the item is not reported in the original article or is not applicable.
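As a toy illustration of the gradient-matching loop behind DLG [2] and the other optimization-based entries above: the sketch below uses a hypothetical one-neuron regression model with hand-derived gradients, not any paper's implementation, so that the whole of Eq. (2) fits in plain numpy.

```python
import numpy as np

# Toy stand-in for the global model: z = w*x + b with loss L = 0.5*(z - y)^2,
# so dL/dw = (z - y)*x and dL/db = (z - y). All concrete values are made up.
w, b = 0.5, 0.0
x_true, y_true = 2.0, 0.0             # the victim's private (input, target)

def leaked_gradient(x, y):
    r = w * x + b - y                 # residual z - y
    return np.array([r * x, r])       # (dL/dw, dL/db)

g_c = leaked_gradient(x_true, y_true) # gradient captured by the adversary

# DLG-style loop: jointly optimize a dummy input and dummy label so that
# their gradient matches the captured one, i.e. minimize D = ||g_d - g_c||^2
x_d, y_d, eta = 0.5, 0.5, 0.05
for _ in range(5000):
    r = w * x_d + b - y_d
    diff = np.array([r * x_d, r]) - g_c        # g_d - g_c
    # hand-derived dD/dx_d and dD/dy_d for this one-neuron model
    gx = 2 * diff[0] * (r + w * x_d) + 2 * diff[1] * w
    gy = 2 * diff[0] * (-x_d) - 2 * diff[1]
    x_d -= eta * gx
    y_d -= eta * gy

# in this small toy the loop recovers the private (x_true, y_true)
```

Real attacks differ mainly in scale: the model is a deep network, the hand-derived derivatives are replaced by automatic differentiation through the gradient computation, and the dummy variables are whole image batches.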

A. Optimization-based attack method

Optimization-based methods usually achieve data reconstruction by optimizing dummy data according to the gradients so that it approaches the private training data. Hitaj et al. are the pioneers of optimization-based methods [5]. They proposed Deep Models Under the GAN (DMU-GAN), which first used the idea of a GAN, assuming that only two parties are involved and one of them is a malicious client who intends to steal private training data by continuously optimizing the GAN to synthesize samples, until the mimetic gradient can no longer be changed by the other client. The essence is to trick the victim into revealing more detailed information related to the private data by influencing the learning process. Thus, it is an attack that degrades the performance of the global model. Note that while DMU-GAN does employ a generative network, its core mechanism is distinct from the generation-based methods presented later in this article. The primary goal of generation-based methods is to create synthetic data that closely resemble the target data distribution without requiring any optimization process. In contrast, DMU-GAN focuses on deceiving the victim by actively manipulating the learning process, which involves continuous optimization of the GAN in response to the victim's model updates. This iterative process is the main reason for classifying DMU-GAN as an optimization-based method, despite the presence of a generative network.

Although DMU-GAN can generate data similar to the private dataset, it is unable to extract individual data points. Zhu et al. proposed the DLG method [2], which for the first time realized exact reconstruction of private data instead of merely reconstructing similar data. Since then, optimization-based studies have focused on further improving DLG. DLG regards the attack process as a problem of jointly optimizing the input data and labels. In the attack procedure, the adversary first randomly initializes a noisy image as a dummy image and a random label as a dummy label, and then optimizes them by gradient descent to produce a gradient consistent with the captured gradient. Eq. (2) shows the core idea of DLG, where f denotes the global model, g_c is the captured real gradient, η stands for the learning rate, and x_d and y_d represent the dummy input and dummy label, respectively. The four formulas in Eq. (2), from top to bottom, denote: 1. The dummy data and dummy label are fed into the global model to obtain the dummy gradient g_d. 2. The Euclidean distance D between the dummy gradient g_d and the captured real gradient g_c is calculated. 3. x_d is updated by gradient descent to minimize D. 4. y_d is updated by gradient descent to minimize D.

$$\begin{cases} g_d \leftarrow \dfrac{\partial \mathcal{L}(f(x_d, \theta), y_d)}{\partial \theta} \\ D \leftarrow \|g_d - g_c\|^2 \\ x_d \leftarrow x_d - \eta \nabla_{x_d} D \\ y_d \leftarrow y_d - \eta \nabla_{y_d} D \end{cases} \qquad (2)$$

Zhao et al. found that labels can be inferred directly from the gradient of the last fully connected layer without optimization and proposed iDLG [6], thus reducing the search space of the DLG optimization and improving the efficiency of data reconstruction. iDLG, however, can infer labels only when the batch size is 1. Based on iDLG, Geiping et al. proposed IG [7], which assumes that the label information of the private data is known and focuses on optimizing the image. Specifically, IG replaced the Euclidean distance of DLG, which measures the difference between the dummy and real gradients, with cosine similarity. They also analyzed which layers of the neural network model contain the most information about the original data. Having seen the potential of GLA through IG, the recent work on see through the gradients (STG) proposed by Yin et al. [8] first pushed the boundaries toward high-resolution, multi-batch GLA scenarios. They first proposed a method to infer the labels of all images in a batch (although it was limited to the condition that there were no duplicate labels). They then used multiple independent optimization processes to improve the quality of the optimization; in addition, a penalty term was added to the dummy image, and statistical information from the batch normalization layer was used to co-optimize the dummy image to achieve an enhanced reconstruction of the original data batch.

In a real FL scenario, direct transmission of the original full gradient is very resource-consuming, so it is more desirable for participants to compress local gradients before sending them back to the server to reduce communication overhead. Gradient compression inevitably brings information loss, so in DLG, Zhu et al. [2] also noted gradient compression as a good defense measure and reported that when the retained gradient is 70% of the original, the adversary can no longer reconstruct the private data. Considering gradient compression, Wei et al. proposed the client privacy leakage (CPL) method [9]. Specifically, they discarded the use of random noise to initialize dummy images and proposed using real natural images instead, to compensate for some of the information loss caused by gradient compression. CPL, however, requires that the adversary's auxiliary dataset overlap with the private training dataset. In a recent study, Yang et al. proposed HCGLA [15] and implemented GLA for scenarios where gradients are highly compressed. In particular, they address CPL's demanding auxiliary-dataset requirement by first performing a shallow attack, which infers some features of the original data from the compressed gradients, and then using this information to initialize the dummy data. In addition, HCGLA proposes optimization objective functions specific to gradient compression scenarios to improve the accuracy of data reconstruction.

B. Analytics-based attack method

Unlike optimization-based methods, analytics-based methods typically obtain the raw private data by solving a linear system of equations. As a result, these methods reconstruct data faster and more accurately. Phong et al. [3], as the originators of the analytics-based approach, first showed how to compute the original input from the gradient. The core idea is to divide the gradient of the weights by the gradient of the bias to reconstruct the original input. Their work is limited to a single-layer neural network, however, and the batch size can be only 1. Moreover, it requires that the bias term exists and is nonzero. Fan et al. [4] further extended the analytics-based approach to biased attacks; that is, instead of a perfect reconstruction, they used analytical methods to obtain a reconstruction slightly different from the original data. Their work broadened the work of Phong et al. from a single fully connected network to a multilayer convolutional network. Overall, the core idea of these early analytics-based works is to recover the input of a fully connected layer from its gradient.

Both of these methods work only on shallow networks, meaning that the parameter scale of the model is only on the order of tens of thousands and the number of layers is small. To handle complex deep networks in real-world scenarios, Zhu et al. proposed R-gap [11], which formulates a rank analysis method to estimate the feasibility of performing GLA given a network architecture. R-gap can be interpreted as a closed-form recursive procedure to recover raw data from gradients in a deep neural network. R-gap, however, does not address the batch input problem.

Active adversaries were first introduced by Fowl et al. [12], who side-connect a huge fully connected network to the original global model and then directly obtain a verbatim copy of the input data without solving the hard inverse problem. Although this approach has been shown to recover the original data even for batch sizes up to 512 and image resolutions up to 224×224, the additional large network structures added by the adversary have exactly the same weight values, which makes the attack noticeable to participants. To address the inherent detectability of Fowl et al.'s active attack, Franziska et al. [13] proposed the concept of trap weights, in which the adversary modifies only the initial weight parameter distribution; the success rate of the attack is improved by changing that initial distribution.

Recovering labels is another important task in GLA, because most attack methods that recover the original image require recovering the labels first. Therefore, recovering batch labels from gradients, especially when labels are duplicated, is a task worth exploring. Motivated by this, Wainakh et al. proposed a heuristic algorithm, label leakage from gradients (LLG) [14], to solve the problem of duplicate labels. According to the authors, the insight behind LLG is that the number of identical labels in a batch affects the numerical magnitude of the gradient, and by approximately quantizing this value, it is possible to infer the number of identical labels in a batch. The drawback of LLG is that it works only for untrained networks.

V. A COMPREHENSIVE COMPARISON

Table I provides a comprehensive comparison of the existing GLAs. Generally, optimization-based methods are more time-consuming and require far more computational resources than analytics-based methods. The former often require thousands of iterations and, for high-resolution images, run on the order of hours, whereas the latter produce the raw input in milliseconds, regardless of the image resolution. In addition, if the original image can be restored by an analytics-based method, the reconstruction is lossless (except for R-gap [11], due to error accumulation). Despite the high-speed and high-quality nature of analytics-based methods, significantly more research is needed on optimization-based methods, because analytics-based methods tend to have more limitations in implementation. For example, the method proposed by Fowl et al. [12] can recover high-quality raw data, but it is unrealistic to assume that the adversary has the ability to modify the initialization model. Although Franziska et al. [13] weakened this assumption, the client could still identify the adversary by simply examining the distribution of the initial weights.
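The divide-the-gradients trick of Phong et al. [3] described above can be reproduced in a few lines. This is a sketch on a hypothetical single fully connected layer with squared loss (not the paper's exact setup): because dL/dW_ij = delta_i * x_j and dL/db_i = delta_i, dividing a row of the weight gradient by the matching bias gradient recovers the input exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5)            # private input the adversary wants
W = rng.normal(size=(3, 5))       # layer weights
b = rng.normal(size=3)            # bias (must exist and its gradient be nonzero)

# forward pass: z = Wx + b, squared loss against an arbitrary target
z = W @ x + b
target = rng.normal(size=3)
delta = z - target                # dL/dz for L = 0.5*||z - target||^2

grad_W = np.outer(delta, x)       # dL/dW_ij = delta_i * x_j
grad_b = delta                    # dL/db_i  = delta_i

# analytic recovery: divide any row of grad_W by the matching grad_b entry
i = 0                             # any row with grad_b[i] != 0 works
x_rec = grad_W[i] / grad_b[i]     # equals x up to floating-point error
```

No iteration is involved, which is why analytics-based attacks run in milliseconds; the flip side is visible here too: the derivation hinges on a single fully connected layer with a live bias term, exactly the restriction discussed above.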

The methods by Phong et al. [3] and Fan et al. [4] have strong concealment, but they cannot reconstruct data from currently popular neural network models.

Since in optimization-based methods the adversary is essentially passive and difficult to detect, and these methods are less restrictive than analytics-based methods, their range of applications is also wider. Unlike the analytics-based approach, the optimization-based approach defines GLA as a nonconvex optimization problem. Although the L-BFGS optimizer (based on the quasi-Newton method) is able to find the global optimum in DLG [2], follow-up work proved that L-BFGS fails completely for larger networks and high-resolution inputs [8]. Therefore, most GLA methods [7]-[9] choose the Adam optimizer. Reaching a global optimum, however, is not easy with large batch sizes and gradient compression. Among current optimization methods, the research focus has been on improving reconstruction quality in different FL settings. Generally, better reconstruction results can be obtained when auxiliary information is available: additional auxiliary information can reduce the chance that the optimization gets stuck at a local optimum, or place the initialization at a point from which convergence is easier.

VI. GLA BASED ON GENERATION

A common problem with current optimization-based methods is that the optimization demands an enormous amount of computational time. In contrast, analytics-based methods can recover data at the millisecond level, but they have many limitations in practice. It is natural to ask: is there a fast way to implement GLA without being too restrictive? One approach that seems plausible is to use a generative network to directly generate the private training image. However, most current methods using generative networks still involve an optimization process; only the optimization space is reduced compared to traditional GLA. Therefore, instead of starting from an optimization point of view, we would like to input the gradient information and directly generate the original image. We refer to this kind of GLA as a generation-based method.

However, since the shared gradients tend to be large and irregular, it is not practical to flatten and feed them directly into a generative network. Inspired by Phong et al. [3] and Fan et al. [4], we can first quickly recover the input of the last fully connected layer by the analytics-based method. This is feasible in real-world scenarios, because in classification tasks the global model often consists of a feature extractor concatenated with fully connected layers. Once the input of the fully connected layer is computed, we can transform it into the original input feature map, since in the threat model the global model architecture is known to the adversary. After that, we can train a generator whose input is the feature map and whose output is the original image. Note that when there are no duplicate labels in the batch, the feature map of each sample in the batch can be recovered individually. Fig. 2 illustrates how we train the generative model: the generative network is similar to an inverse ResNet, using transposed convolutions for upsampling from the feature maps to obtain the original input image. Eq. (3) gives the loss function of the generative network trained by the adversary, where g_θ(·) represents the generative network, z represents the feature map, c(·) is the feature extraction network, and TV(·) stands for total variation loss.

Fig. 2. Schematic of how the generative network is trained.

$$\mathcal{L} = \|X - g_\theta(z)\|^2 + \|z - c(g_\theta(z))\|^2 + TV(g_\theta(z)) \qquad (3)$$

Once the generative network is trained on the adversary's auxiliary dataset, the generation-based gradient leakage attack can be performed. Fig. 3 demonstrates the attack paradigm of the generation-based method. The adversary first separates the locations of the original batch data corresponding to the gradient matrix of the fully connected layer, following the label-inference process of STG [8], then separates and extracts the feature maps using the analytics-based methods of Phong et al. [3] and Fan et al. [4]. Finally, the inferred feature maps are input into the trained generative network to obtain the reconstructed data.

Fig. 3. Generation-based GLA.

Fig. 4 shows the reconstruction results of the generation-based method and of DLG [2], IG [7], and STG [8] with a batch size of 8 and a gradient compression ratio of 1%. Table II records the numerical metrics of the reconstructed images. We use MSE (mean squared error), PSNR (peak signal-to-noise ratio), SSIM (structural similarity index), and LPIPS (learned perceptual image patch similarity) to evaluate the final reconstruction performance. MSE and PSNR measure the difference between the original and reconstructed images in terms of mean error and signal-to-noise ratio, respectively. SSIM evaluates the structural similarity between the original and reconstructed images, where a score of 1 represents perfect similarity. LPIPS measures the perceptual similarity between images by comparing them based on learned representations of human perception. Even in the gradient compression scenario, the generation-based method not only achieved better data reconstruction quality than the mainstream optimization-based methods, but also completed the reconstruction in only a few seconds.
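A runnable sketch of the composite loss in Eq. (3) follows. The generator `g` and feature extractor `c` below are trivial stand-ins (a nearest-neighbour upsampler and a matching downsampler, purely hypothetical) so that the example is self-contained; in the actual attack they would be the trained generative network and the global model's feature extractor.

```python
import numpy as np

def tv(img):
    # total variation: sum of absolute differences between neighbouring pixels
    return np.abs(np.diff(img, axis=0)).sum() + np.abs(np.diff(img, axis=1)).sum()

def generator_loss(X, z, g, c):
    """Composite loss of Eq. (3): reconstruction + feature consistency + TV prior.
    g: generative network (feature map -> image); c: feature extraction network."""
    out = g(z)
    return (np.sum((X - out) ** 2)        # ||X - g(z)||^2
            + np.sum((z - c(out)) ** 2)   # ||z - c(g(z))||^2
            + tv(out))                    # TV(g(z)) smoothness prior

# toy stand-ins: 2x nearest-neighbour upsample and the matching downsample
g = lambda z: z.repeat(2, axis=0).repeat(2, axis=1)
c = lambda im: im[::2, ::2]

z = np.ones((2, 2))   # "feature map" recovered analytically from the gradient
X = g(z)              # target image a perfect generator would produce
# with a perfect generator the first two terms vanish and TV of a constant is 0
```

The feature-consistency term is what ties the generator to the analytically recovered feature map, while the TV term plays the usual role of an image smoothness prior.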

Overall, because the generation-based method is less restrictive and can be applied to gradient compression scenarios with high reconstruction efficiency, we believe it is one of the promising future research directions.

Fig. 4. Comparison of reconstruction results.

TABLE II
The numeric results of different GLAs when the batch size is 8 and the gradient compression ratio is 1%

                  MSE ↓   PSNR ↑   SSIM ↑  LPIPS ↓  Time (s/batch) ↓
DLG [2]           0.1470  19.1715  0.0143  0.7586   1650
IG [7]            0.0912  10.3980  0.2338  0.6541   13610
STG [8]           0.0601  15.5830  0.3889  0.4188   14852
Generation-based  0.0132  18.7460  0.4543  0.5150   1.72

Another potential approach to enhance the quality of data reconstruction is to combine the generation-based and optimization-based methods. In this method, the image generated by the generation-based approach serves as the initial image, which is then iteratively optimized using an optimization-based approach. This promising combination can further improve the quality of data reconstruction. Fig. 5 displays the reconstruction results of our proposed generation-based method combined with three optimization-based methods at a compression rate of 1%. Meanwhile, Table III provides numerical measurements of the reconstruction results. These results demonstrate that, compared to the plain generation-based method, the combined reconstruction significantly improves image quality, as indicated by the higher SSIM and lower LPIPS values.

Fig. 5. Combined generation-optimization data reconstruction results.

TABLE III
The numeric results of combined generation-optimization data reconstruction

                  MSE ↓   PSNR ↑   SSIM ↑  LPIPS ↓  Time (s/batch) ↓
Generation + DLG  0.0216  16.6650  0.4337  0.3392   1667
Generation + IG   0.0435  13.6186  0.3539  0.5607   13632
Generation + STG  0.0310  15.0836  0.4745  0.3810   14865

In conclusion, the generation-based approach offers a unique set of benefits in terms of speed and scalability compared to optimization-based methods for GLA. By directly generating the private training image through a generative network and bypassing the need for an extensive optimization process, this method significantly reduces the time required for reconstruction. Furthermore, the generation-based approach is less restrictive and can be applied to gradient compression scenarios, which makes it a more versatile option. Moreover, combining the generation-based method with optimization-based techniques can further enhance the quality of data reconstruction, offering a promising direction for future research. Thus, the generation-based approach presents a powerful alternative to optimization-based methods, with superior speed, flexibility, and potential for improving data reconstruction quality in GLA.

VII. FUTURE RESEARCH DIRECTIONS

In terms of GLA, we identified three directions that deserve further exploration.

First, all GLAs assume that the task scenario is a classification problem, with image classification tasks being the majority; these are based on cross-entropy loss. Other mainstream tasks in the image domain, such as object detection and semantic segmentation, use hybrid losses. Most current GLA methods are difficult to apply to hybrid-loss scenarios, however, and thus cannot attack other types of task scenarios effectively. Moreover, in detection and segmentation tasks, models often skip the upstream feature map directly to the downstream part of the network, which is similar to the skip connections mentioned by Fowl et al. [12] and may leak more information. Therefore, understanding how to implement GLA for more task genres is a potential research direction.

Second, the global models in current GLA research lack diversity and basically include only small networks such as LeNet or custom ConvNets with few convolutional and fully connected layers. Even for the most complex and large networks, it is the norm to use only ResNet. Recently, Transformer-based models have become popular in computer vision tasks and have achieved better results than traditional convolutional neural network (CNN) models. Although some works have attempted to extend the existing GLA methods to Transformers, many drawbacks remain. For example, the existing research can attack only single-layer self-attention or shallow Transformers. Therefore, to make the application

Authorized licensed use limited to: Ewha Womans Univ. Downloaded on June 12,2023 at 16:21:23 UTC from IEEE Xplore. Restrictions apply.
7

scenarios of GLA more diverse, extending GLA from CNN to in part by the Key-Area Research and Development Pro-
networks based on Transformer architecture remains a topic gram of Guangdong Province (2020B0101360001), in part
of interest. by Fundamental Research Funds for Chinese Central Uni-
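As a concrete illustration of why shared gradients leak data at all, consider a fully connected layer processing a single input: its weight gradient is the outer product of the error signal and the input, so the input can be recovered exactly by solving trivial linear equations, which is the core of the analytics-based paradigm. A minimal NumPy sketch follows; the layer sizes, weights, and data are hypothetical stand-ins, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setting: one fully connected layer with softmax
# cross-entropy loss, processing a single private input.
d, k = 16, 4                      # input dimension, number of classes
W = rng.normal(size=(k, d))       # layer weights
b = rng.normal(size=k)            # layer bias
x = rng.uniform(size=d)           # the participant's private input
y = 2                             # its private label

# Forward pass and the gradients a participant would share.
logits = W @ x + b
p = np.exp(logits - logits.max())
p /= p.sum()                      # softmax probabilities
delta = p.copy()
delta[y] -= 1.0                   # dL/dlogits for cross-entropy
g_W = np.outer(delta, x)          # dL/dW = delta x^T (rank one)
g_b = delta                       # dL/db = delta

# Analytics-based recovery: row i of g_W equals g_b[i] * x, so any row
# with a nonzero bias gradient yields x by elementwise division.
i = int(np.argmax(np.abs(g_b)))
x_recovered = g_W[i] / g_b[i]

print(bool(np.allclose(x_recovered, x)))  # exact recovery of the input
```

Exact single-input recovery from fully connected layers is one reason modified-model attacks such as Fowl et al. [12] steer inputs into such layers; batched inputs and deeper nonlinear networks are what make the optimization-based and generation-based machinery discussed above necessary.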
Third, GLA must consider more realistic FL scenarios, including gradient transformation, vertical FL, and transfer learning. In practical FL, gradients are often transformed to reduce communication bandwidth or to defend against GLA, for example by clipping, compression, or adding noise; however, most current studies do not consider these issues. In addition, vertical FL has broader application in industry, but little GLA research has addressed it. Finally, instead of training a model from scratch, the popular machine learning paradigm prefers transfer learning strategies. This means that the upstream network does not generate gradients, losing a large amount of information, and current mainstream GLA methods rarely achieve data reconstruction when such essential information is lost. Overall, it makes sense to perform GLA in more realistic scenarios.

We conclude by emphasizing the need for further research into GLA defense strategies. Currently, the two dominant defense methods are adding Gaussian noise to gradients and implementing cryptography-based methods. However, the former comes at the expense of model accuracy, and finding a balance remains challenging. The latter not only requires modifying the model structure but also consumes significant bandwidth and storage resources when updating the global model. Recently, a study discovered that inserting a variational module before the output layer can interfere with gradients, preserving model accuracy while also resisting GLA; however, this method is effective only on fully connected neural networks and small convolutional neural networks. Thus, it is imperative for future research to concentrate on devising robust, secure, and comprehensive countermeasures against gradient leakage attacks.

VIII. CONCLUSION

Gradient leakage in federated learning is a problem that has attracted much attention in recent years. In this study, we provided a comprehensive summary of state-of-the-art gradient leakage attacks. Depending on whether the adversary views the attack as solving a system of linear equations or as an optimization problem, we generalized all current attack paradigms into two taxonomies: optimization-based and analytics-based approaches. We extensively compared the two existing attack paradigms and, based on their shortcomings, proposed a novel generation-based approach that combines the advantages of optimization-based and analytics-based methods to achieve satisfactory results. Finally, we encapsulated four directions worthy of future research based on current open questions in GLA-related research.

ACKNOWLEDGMENTS

This work was supported in part by the National Key R&D Program of China (2022YFB4501200, 2021YFB3101302, 2021YFB3101300), in part by the National Natural Science Foundation of China (62072081, 62072078, U2033212), in part by the Key-Area Research and Development Program of Guangdong Province (2020B0101360001), in part by the Fundamental Research Funds for Chinese Central Universities (ZYGX2020ZB027), in part by the CCF-Ant Group Research Fund and the Sichuan Science and Technology Program (2020JDTD0007), and in part by the Sichuan Science and Technology Program (grant no. 2022ZHCG0037).

REFERENCES

[1] H. B. McMahan et al., "Communication-efficient learning of deep networks from decentralized data," Proc. PMLR, pp. 1273-1282, 2017.
[2] L. Zhu, Z. Liu, and S. Han, "Deep leakage from gradients," Advances in Neural Info. Processing Systems, pp. 14747-14756, 2019.
[3] L. T. Phong et al., "Privacy-preserving deep learning: Revisited and enhanced," Proc. ATIS, pp. 100-110, 2017.
[4] L. Fan et al., "Rethinking privacy preserving deep learning: How to evaluate and thwart privacy attacks," Federated Learning: Privacy and Incentive, vol. 12500, pp. 32-50, 2020.
[5] B. Hitaj, G. Ateniese, and F. Perez-Cruz, "Deep models under the GAN: Information leakage from collaborative deep learning," Proc. ACM CCS, pp. 603-618, 2017.
[6] B. Zhao, K. R. Mopuri, and H. Bilen, "iDLG: Improved deep leakage from gradients," accessed on 8 Jan, 2020, available: https://arxiv.org/pdf/2001.02610.pdf.
[7] J. Geiping et al., "Inverting gradients - how easy is it to break privacy in federated learning?" Advances in Neural Info. Processing Systems, pp. 16937-16947, 2020.
[8] H. Yin et al., "See through gradients: Image batch recovery via GradInversion," Proc. IEEE CVPR, pp. 16337-16346, 2021.
[9] W. Wei et al., "A framework for evaluating gradient leakage attacks in federated learning," accessed on 22 Apr, 2020, available: https://arxiv.org/pdf/2004.10397.pdf.
[10] Z. Zhao, M. Luo, and W. Ding, "Deep leakage from model in federated learning," accessed on 10 Jun, 2022, available: https://arxiv.org/pdf/2206.04887.pdf.
[11] J. Zhu and M. Blaschko, "R-GAP: Recursive gradient attack on privacy," accessed on 15 Oct, 2020, available: https://arxiv.org/pdf/2010.07733.pdf.
[12] L. Fowl et al., "Robbing the fed: Directly obtaining private data in federated learning with modified models," accessed on 25 Oct, 2021, available: https://arxiv.org/pdf/2110.13057.pdf.
[13] F. Boenisch et al., "When the curious abandon honesty: Federated learning is not private," accessed on 6 Dec, 2021, available: https://arxiv.org/pdf/2112.02918.pdf.
[14] A. Wainakh et al., "User label leakage from gradients in federated learning," accessed on 19 May, 2021, available: https://arxiv.org/pdf/2105.09369.pdf.
[15] H. Yang et al., "Using highly compressed gradients in federated learning for data reconstruction attacks," IEEE Trans. Info. Forensics and Security, vol. 18, pp. 818-830, 2023.

BIOGRAPHIES

HAOMIAO YANG is currently a Professor with the School of Computer Science and Engineering and the Center for Cyber Security, University of Electronic Science and Technology of China (UESTC). His research interests include cryptography, cloud security, and cybersecurity for aviation communication.
MENGYU GE is currently pursuing his M.S. degree in the School of Computer Science, UESTC. His research interests include cloud computing, IoT, and AI security.
DONGYUN XUE is currently studying for his M.S. degree in the School of Computer Science, UESTC. His research interests include security and privacy of outsourced data and deep learning.
KUNLAN XIANG is currently pursuing the M.S. degree in the School of Computer Science, UESTC. Her research interests include cloud security and IoT security.


HONGWEI LI is currently the Head and a Professor at the Department of Information Security, School of Computer Science and Engineering, UESTC. His research interests include network security and applied cryptography.
RONGXING LU (Fellow, IEEE) is currently the Mastercard
IoT Research Chair, a University Research Scholar, and an
Associate Professor at the Faculty of Computer Science (FCS),
University of New Brunswick (UNB), Canada. His research
interests include applied cryptography, privacy enhancing tech-
nologies, and the IoT-big data security and privacy.
