
electronics

Review
A Survey of Side-Channel Leakage Assessment
Yaru Wang 1,2 and Ming Tang 1,2, *

1 School of Cyber Science and Engineering, Wuhan University, Wuhan 430072, China;
2018102110055@whu.edu.cn
2 Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education,
Wuhan University, Wuhan 430072, China
* Correspondence: m.tang@126.com; Tel.: +86-153-2741-2612

Abstract: As more threatening side-channel attacks (SCAs) are being proposed, the security of
cryptographic products is seriously challenged. This has prompted both academia and industry to
evaluate the security of these products. The security assessment is divided into two styles: attacking-
style assessment and leakage detection-style assessment. In this paper, we will focus specifically
on the leakage detection-style assessment. Firstly, we divide the assessment methods into Test
Vector Leakage Assessment (TVLA) and its optimizations and summarize the shortcomings of TVLA.
Secondly, we categorize the various optimization schemes for overcoming these shortcomings into
three groups: statistical tool optimizations, detection process optimizations, and decision strategy
optimizations. We provide concise explanations of the motivations and processes behind each scheme,
as well as compare their detection efficiency. Through our work, we conclude that there is no single
optimal assessment scheme that can address all shortcomings of TVLA. Finally, we summarize the
purposes and conditions of all leakage detection methods and provide a detection strategy for actual
leakage detection. Additionally, we discuss the current development trends in leakage detection.

Keywords: leakage assessment technology; side channel attack; TVLA; leakage detection

Citation: Wang, Y.; Tang, M. A Survey of Side-Channel Leakage Assessment. Electronics 2023, 12, 3461. https://doi.org/10.3390/electronics12163461
Academic Editor: Paulo Ferreira
Received: 10 July 2023; Revised: 9 August 2023; Accepted: 11 August 2023; Published: 15 August 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction
The pervasive nature of information technology has permeated all aspects of work and life. As malicious information security incidents like "Eternal Blue" [1] and "Bvp47" [2] continue to emerge, information security has garnered significant attention. Cryptographic products are commonly known as products that utilize cryptography technology, and security is their fundamental attribute. However, various cryptographic analysis technologies, such as traditional cryptographic analysis and SCAs, can impact the security of these products. Traditional cryptographic analysis mainly includes techniques like differential cryptanalysis [3], linear cryptanalysis [4], correlation analysis [5], etc. On the other hand, SCA techniques encompass power analysis attacks [6] (such as simple power analysis (SPA) [7], differential power analysis (DPA) [6], correlation power analysis (CPA) [8], mutual information analysis (MIA) [9], and deep learning-based SCA [10–16]), timing attacks [17], fault-based attacks [18], cache attacks [19], etc.
Consequently, evaluating the security of cryptographic products has become a crucial task. Two popular security certification standards, namely, Common Criteria (CC) [20] and FIPS [21], have been established to assess the security of cryptographic products. These security certification standards employ two distinct assessment methods: evaluation-style testing (also known as attacking-style assessment) and conformance-style testing (also known as leakage detection-style assessment) [22].
The attacking-style assessment mainly uses various SCA techniques to obtain the information of keys and evaluate the products against SCAs. The attacking-style assessors can directly obtain the key by executing SCAs. The evaluation results facilitate the calculation of security metrics for encryption algorithms and the identification of vulnerabilities



within these algorithms and provide guidance for implementing algorithm protection
strategies. The attacking-style assessment provides the advantages of rigorous evaluation
intensity and easily accessible evaluation results. The attacking-style assessment holds
great significance as a method for evaluating side-channel security. However, the use of
the attacking-style assessment in primary security assessments is considered unsuitable
due to its reliance on SCA technologies, which necessitate evaluators with advanced SCA
techniques. This reliance increases both the time and sample complexity of the assessment,
subsequently slowing down the assessment speed. Consequently, it fails to keep pace
with the rapid cycle of product innovation [23–27]. Instead, the leakage detection-style
assessment is proposed as a preliminary method to evaluate the security of cryptographic
products. The leakage detection-style assessment primarily relies on information theory or
statistical hypotheses to analyze side-channel measurements and determine the presence
of any side-channel leakage. The objective of the leakage detection-style assessment is to
evaluate whether the device can pass the testing rather than recover the key itself. This
type of assessment can be categorized into two main types: leakage assessment based on
information theory and assessment based on statistical hypothesis. In recent years, the
leakage assessment based on statistical hypotheses has been widely adopted due to its
superior efficiency and effectiveness [27]. Various works focusing on leakage assessment
using statistical hypotheses have been proposed [23–38].
In this paper, we study the works of side-channel leakage assessment and provide
succinct accounts of motivations and detection processes behind these assessment methods.
We also compare the efficiency of different detection methods and discuss the future
development of leakage assessments.
The main contributions of this paper are as follows:
(1) We analyze the works of side-channel leakage assessment and classify the leakage
detection-style assessment works into two categories: the technology of TVLA and
optimizations of TVLA. Additionally, we identify the shortcomings of TVLA. Due
to TVLA's flaws in its statistical tool, detection process, and decision strategy, we divided TVLA's optimization schemes into three groups: the optimization of statistical
tool, detection process, and decision strategy. Furthermore, we provide a brief de-
scription of the motivation and detection process for each optimization and compare
their detection efficiency.
(2) Due to the lack of a unified and comprehensive leakage detection assessment method
that can address all the TVLA’s shortcomings, as well as the variation in optimization
methods based on detection purposes and conditions, we present a summary on
how to select a suitable leakage detection assessment method depending on specific
detection purposes and conditions. Moreover, considering the current state of leakage
detection assessment, we discuss potential future trends in this field.
The structure of this paper is as follows. Section 2 provides a brief description
of the process, methods, metrics, and shortcomings of the attacking-style assessment.
Section 3 focuses on the leakage detection-style assessment and describes the development
process, metrics, and detection objectives. Section 4 mainly focuses on the leakage detec-
tion assessment based on statistical hypotheses and describes TVLA and its optimization
methods. Section 5 highlights the relationship between the attacking-style assessment and
leakage detection-style assessment. In Section 6, the current status and future development
trends of leakage detection-style assessment are discussed. Finally, Section 7 presents the
conclusion of this review.

2. The Attacking-Style Assessment


2.1. The Assessment Process of Attacking-Style Assessment
The attacking-style assessment primarily utilizes the device’s side-channel information
to recover the key and assess the product’s security against SCAs. The CC [20] standard
serves as a reference criterion for the attacking-style assessment. The estimators can choose
any SCA method from the list of threats to conduct SCAs and assess the product's security level. Figure 1 illustrates the process of the attacking-style assessment.

Figure 1. The process of attacking-style assessment.

In the attacking-style assessment, the evaluator assumes the role of an attacker with prior knowledge of the implementation of cryptographic algorithms.
2.2. The Methods of Attacking-Style Assessment
Because the attacking-style assessment heavily relies on SCA technologies, the evaluators must have a proficient understanding of SCA technologies. SCA techniques can be categorized into two groups: profiled attack (PA) and non-profiled attack (NPA). The attackers using PAs must first construct a device model before conducting SCAs. The current methods for PAs include the Template Attack (TA) [39–41] and the profiled attack based on deep learning [14,42,43]. On the other hand, NPA does not require a model and solely relies on side-channel measurements for conducting SCAs. Common NPA methods include DPA [7], CPA [8], MIA [9], etc.
2.2.1. The Profiled Attack
(1) The Template Attack
Time series are the typical representation of side-channel measurements. The attackers utilize these measurements along with their model to conduct side-channel power analysis. TAs are one of the earliest approaches to PAs [39–41]. In TAs, it is assumed that the attacker has access to the same device as the target device, enabling them to encrypt any plaintext and collect corresponding side-channel power traces. TAs include two stages: the stage of constructing the template and the stage of matching the template.

In the stage of constructing the template, the main objective is to extract trace features and build the template. Assuming we have n traces designated as L = {l_1, l_2, ..., l_n}, we divide L into K groups noted as L_1, L_2, ..., L_K, where L_i is the group corresponding to key k_i. The template of k_i is noted as (µ_i, C_i), where µ_i represents the mean vector and C_i is the covariance matrix.
µ_i = ( Σ_{l_j ∈ L_i} l_j ) / n_i ,   C_i = ( 1 / (n_i − 1) ) Σ_{l_j ∈ L_i} (l_j − µ_i)(l_j − µ_i)′ .   (1)

During the stage of matching the template, the attacker utilizes the traces to calculate the probability between the trace and the templates (µ_i, C_i):

P(trace; (µ_i, C_i)) = exp( −0.5 × (trace − µ_i)′ C_i^{−1} (trace − µ_i) ) / √( (2π)^{n_l} det(C_i) ) .   (2)

If P(trace; (µ_j, C_j)) > P(trace; (µ_i, C_i)) for all i ≠ j, then we get k_guess = k_j.
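A minimal sketch of the two stages, using synthetic traces and assuming Gaussian noise; the group count, trace length, and leakage behaviour are illustrative stand-ins, not a real device:

```python
import numpy as np

rng = np.random.default_rng(0)
n_points = 4                      # trace length (kept tiny for illustration)
K = 3                             # number of key-dependent groups

# Synthetic profiling traces: each group k leaks a different mean vector.
means = rng.normal(0.0, 2.0, size=(K, n_points))
traces = {k: means[k] + rng.normal(0.0, 0.3, size=(200, n_points)) for k in range(K)}

# Stage 1: build a template (mu_i, C_i) per group, as in Eq. (1).
templates = {}
for k, Lk in traces.items():
    mu = Lk.mean(axis=0)
    C = np.cov(Lk, rowvar=False)          # unbiased covariance, (n_i - 1) divisor
    templates[k] = (mu, C)

# Stage 2: match a fresh trace using the multivariate normal density of Eq. (2);
# log-densities are compared, which avoids underflow and preserves the argmax.
def log_density(trace, mu, C):
    d = trace - mu
    sign, logdet = np.linalg.slogdet(C)
    return -0.5 * (d @ np.linalg.solve(C, d) + logdet + len(d) * np.log(2 * np.pi))

fresh = means[1] + rng.normal(0.0, 0.3, size=n_points)   # truly from group 1
k_guess = max(templates, key=lambda k: log_density(fresh, *templates[k]))
print(k_guess)
```

Comparing log-densities rather than the raw probabilities of Eq. (2) is a standard numerical precaution and does not change which template wins.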
(2) The profiled attack based on deep learning
In recent years, deep learning technology has emerged as a popular alternative method
in PAs. Specifically, the PA based on deep learning [14,42,43] utilizes the multi-layer per-
ceptron and convolutional neural networks to construct the templates and conduct SCAs.
The PA based on deep learning requires two independent trace sets [42]. One set is
the training set, which is used to construct the template. Attackers need to have the keys,
plaintexts, and traces for the training set. During the training process, the attackers use a
minimum loss function to train the model, allowing the template to achieve better results.
The other one is the validation set, which is used to carry out attacks.
Compared to traditional approaches like TAs, the PA based on deep learning can
overcome the noise assumption and offers a more efficient and simplified process without
extensive preprocessing. However, there are two shortcomings. Firstly, the metrics of
deep learning are challenging to apply to SCA scenarios and may provide misleading
results [42,44]. Secondly, the effectiveness of PAs based on deep learning will decrease
significantly when facing imbalanced data.

2.2.2. The Non-Profiled Attack


The fundamental assumption of NPAs is that the attacker cannot obtain the same
device as the target device but has access to an unlimited number of side-channel traces.
(1) Differential Power Analysis
The key information is extracted by the distinguisher in NPA. Let X = (X_1, ..., X_D)′ be the input, where X_i represents the i-th group. The attacker collects the traces from the process of encrypting the D groups' data. The trace of X_i is denoted as l_i′ = (l_{i,1}, ..., l_{i,n_l}), where n_l represents the length of the trace. The guessing key space is K = {k_1, ..., k_K}, where |K| is the number of all possible keys. For every key, the input X and the intermediate value V = f(X, k) are considered. For input X and all guessing keys K, the trace matrix L (of size D × n_l) and the intermediate value matrix V (of size D × K) are

L =
| l_{1,1} l_{1,2} ... l_{1,n_l} |
| l_{2,1} l_{2,2} ... l_{2,n_l} |
| ...                           |
| l_{D,1} l_{D,2} ... l_{D,n_l} |

V =
| v_{1,1} v_{1,2} ... v_{1,K} |
| v_{2,1} v_{2,2} ... v_{2,K} |
| ...                         |
| v_{D,1} v_{D,2} ... v_{D,K} |

Map the intermediate value matrix V to the matrix H and calculate the correlation coefficient between h_i of H and l_j of L. The correlation coefficients r_{i,j} are stored in the matrix R (of size K × n_l).

H =
| h_{1,1} h_{1,2} ... h_{1,K} |
| h_{2,1} h_{2,2} ... h_{2,K} |
| ...                         |
| h_{D,1} h_{D,2} ... h_{D,K} |

R =
| r_{1,1} r_{1,2} ... r_{1,n_l} |
| r_{2,1} r_{2,2} ... r_{2,n_l} |
| ...                           |
| r_{K,1} r_{K,2} ... r_{K,n_l} |

r_{i,j} = Σ_{d=1}^{D} (h_{i,d} − h̄_i)(l_{d,j} − l̄_j) / √( Σ_{d=1}^{D} (h_{i,d} − h̄_i)² · Σ_{d=1}^{D} (l_{d,j} − l̄_j)² ) .   (3)
We use correlation coefficients to assess the linear correlation between l_j and h_i, where i = 1, ..., K and j = 1, ..., n_l, in DPA. As r_{i,j} increases, the correspondence between l_j and h_i grows stronger. By identifying the highest value of r_{i,j}, the attacker can successfully retrieve the key.
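Eq. (3) vectorizes naturally; the following sketch computes the full K × n_l matrix R at once for synthetic H and L, with one planted dependency (all sizes and data are illustrative):

```python
import numpy as np

def correlation_matrix(H, L):
    """Pearson correlation between every column h_i of H (D x K)
    and every column l_j of L (D x n_l); returns R of shape K x n_l."""
    Hc = H - H.mean(axis=0)
    Lc = L - L.mean(axis=0)
    num = Hc.T @ Lc
    den = np.sqrt((Hc ** 2).sum(axis=0)[:, None] * (Lc ** 2).sum(axis=0)[None, :])
    return num / den

rng = np.random.default_rng(1)
D, K, n_l = 500, 16, 20
H = rng.normal(size=(D, K))
L = rng.normal(size=(D, n_l))
L[:, 7] += 2.0 * H[:, 3]          # plant a leak: sample point 7 depends on hypothesis 3

R = correlation_matrix(H, L)
i, j = np.unravel_index(np.abs(R).argmax(), R.shape)
print(i, j)                        # the planted (hypothesis, sample-point) pair
```

Centring each column once and using a single matrix product replaces the double loop over (i, j) pairs implied by Eq. (3).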
(2) Correlation Power Analysis
Correlation Power Analysis (CPA) is a variant of DPA that exploits the correlation
between the power traces L and the leakage model F to conduct the attack. Assuming the
leakage model is denoted by F, we map the intermediate value matrix V = f ( X, K ) to the
leakage value noted as GK = F (V ). Here, X represents the input, K denotes the key space,
and f stands for the cryptographic function. Typically, F represents the leakage model
based on either the Hamming weight or Hamming distance. Similarly, GK corresponds to
the Hamming weights or Hamming distances of V.
The correlation coefficient between the power traces L and G_K is as follows:

ρ_{L,G_K} = ( E(L · G_K) − E(L) · E(G_K) ) / ( σ_L · σ_{G_K} ) ,   (4)

where E(L), E(G_K), and E(L · G_K) are the expectations of L, G_K, and L · G_K, and σ_L, σ_{G_K} are the standard deviations of L and G_K, respectively.
The attackers iterate through the space of possible keys, calculating the correlation
coefficient between GK and L to determine whether a key guess is correct. The correlation
coefficient of the correct guess key is higher than that of the incorrect guess key. The
attackers identify the guess key with the maximum correlation coefficient as the correct
key, using the maximum likelihood estimation method.
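To make the CPA procedure concrete, the sketch below recovers a key byte from simulated traces; the 8-bit S-box is a random permutation standing in for a real cipher's S-box, and the single-sample "traces" are Hamming-weight leakage plus Gaussian noise, all synthetic assumptions rather than a real measurement setup:

```python
import numpy as np

rng = np.random.default_rng(2)
SBOX = rng.permutation(256)            # toy 8-bit S-box, stand-in for a real cipher's
HW = np.array([bin(v).count("1") for v in range(256)])

true_key = 0x3C
N = 2000
plaintexts = rng.integers(0, 256, size=N)

# Simulated single-point traces: Hamming weight of S(x XOR k) plus Gaussian noise.
leakage = HW[SBOX[plaintexts ^ true_key]] + rng.normal(0.0, 1.0, size=N)

def pearson(a, b):
    a = a - a.mean()
    b = b - b.mean()
    return (a @ b) / np.sqrt((a @ a) * (b @ b))

# Iterate over all key guesses and keep the one maximising |rho| (Eq. (4)).
scores = np.array([abs(pearson(HW[SBOX[plaintexts ^ k]].astype(float), leakage))
                   for k in range(256)])
k_guess = int(scores.argmax())
print(hex(k_guess))
```

Only the guess equal to the true key produces intermediate values that actually correlate with the simulated leakage; all wrong guesses yield near-zero correlation.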
(3) Mutual Information Analysis
The attacker of MIA uses mutual information or entropy to assess the correlation
between the leakage and the intermediate value or between the intermediate value and
side-channel measurements in [9]. The fundamental concepts of entropy and mutual
information are as follows:
Let X = (X_1, X_2, ..., X_n) be the set of discrete random variables; then the entropy of X is

H(X) = − Σ_{x∈X} p(x) log p(x) ,   (5)

where p(x) is the probability distribution of the variables.
Let X = (X_1, X_2, ..., X_n) and Y = (Y_1, Y_2, ..., Y_n) be sets of discrete random variables; then the conditional entropy H(X|Y) of X under Y is

H(X|Y) = − Σ_{y∈Y} p(y) Σ_{x∈X} p(x|y) log p(x|y) .   (6)

The mutual information between X and Y is I ( X; Y ) = H ( X ) − H ( X |Y ) .


Electronics 2023, 12, 3461 6 of 26

Let K, X, and V be the key, plaintext, and intermediate value, respectively. The correct key is denoted as k_c, and the leakage function L(·) over the intermediate value V = f(X, K) is continuous. Let l_i = L(f(X_i, k_c)), where 1 ≤ i ≤ N. For a given key guess k, the hypothetical intermediate values are obtained as M(X_i, k) = f(X_i, k). The attack is successful when argmax_{k∈K} |D(M(X, k), L)| = k_c, where D is the distinguisher.
In MIA, the distinguisher was proposed as

I(L; M(X, k)) = H(L) − H(L | M(X, k)) .   (7)

Different keys result in different values of mutual information. The mutual information is determined by the conditional entropy H(L | M(X, k)). A stronger correlation between the measured L and M(X, k) indicates that the guessed key k_guess is the correct key.
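A minimal implementation of these definitions for empirical discrete data, using base-2 logarithms and the equivalent identity I(X; Y) = H(X) + H(Y) - H(X, Y); the arrays are illustrative:

```python
import numpy as np
from collections import Counter

def entropy(x):
    """H(X) = -sum p(x) log2 p(x) over the empirical distribution, Eq. (5)."""
    n = len(x)
    return -sum((c / n) * np.log2(c / n) for c in Counter(x).values())

def mutual_information(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), equivalent to H(X) - H(X|Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

x = [0, 0, 1, 1]
y = [0, 1, 0, 1]          # independent of x, so I(X; Y) = 0
print(entropy(x), mutual_information(x, y), mutual_information(x, x))
```

For continuous leakage, as in CMI-style tests, the probabilities would instead be estimated with histograms or kernel density estimators; the discrete form above is the simplest case.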
The development of the attacking-style assessment relies entirely on SCA methods.
Essentially, the attacking-style assessment comprises attacks on the targeted devices. Con-
sequently, the assessment itself is considered as an attack.

2.3. The Metrics of Attacking-Style Assessment


Because the assessment is essentially an attack, the attack metrics serve as the assessment metrics for the attacking-style assessment. Distinguisher scores are commonly employed to sort the candidate keys k_guess in an SCA. The position of the correct key k_c in the sorting results is called the key ranking, noted as rank(k_c | L, X). The metrics based on the key ranking are defined as follows.
(1) The number of samples: Find the minimum positive integer N such that, when the sample size |L| ≥ N,

rank(k_c | L, X) ≤ K̂_c ,   (8)

where K̂_c is the target ranking of k_c. In particular, when K̂_c = 1, the correct key ranks first, i.e., rank(k_c | L, X) = 1.
(2) Success rate: The success rate of the side-channel attack is the probability of
successfully recovering the correct key.
(3) Guessing entropy: The guessing entropy is the mathematical expectation of the key ranking:

GE_{L,X} = E(rank(k_c | L, X)) .   (9)
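The ranking-based metrics above can be computed directly from distinguisher scores; in the sketch below the score arrays are synthetic stand-ins for the outputs of repeated attack experiments:

```python
import numpy as np

def key_rank(scores, k_c):
    """Rank of the correct key k_c (1 = best) after sorting scores descending."""
    order = np.argsort(scores)[::-1]
    return int(np.where(order == k_c)[0][0]) + 1

def guessing_entropy(score_matrix, k_c):
    """Mean rank of k_c over independent experiments (rows of score_matrix), Eq. (9)."""
    return float(np.mean([key_rank(row, k_c) for row in score_matrix]))

scores = np.array([[0.1, 0.9, 0.3],     # k_c = 1 is ranked 1st here
                   [0.8, 0.5, 0.2]])    # k_c = 1 is ranked 2nd here
print(key_rank(scores[0], 1), guessing_entropy(scores, 1))
```

In practice the expectation of Eq. (9) is estimated exactly this way, by averaging the rank over many independent attack runs.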

2.4. The Advantages and Shortcomings of Attacking-Style Assessment


With the advancement of side-channel technology, there has been a rise in the proposal
of various side-channel attack methods. Evaluators can select an appropriate side-channel
attack method to evaluate cryptographic algorithms, considering distinct encryption imple-
mentations and attack conditions. Through evaluating the actual results of side-channel
attacks, one can determine the security level of the encryption implementation. The
attacking-style assessment technique is characterized by its strong attack intensity and extensive evaluation, allowing evaluators to directly acquire information about the key and design weaknesses. The attacking-style assessment enables a thorough comprehension of
system vulnerabilities and weaknesses. Through simulating real-world attack scenarios,
it offers valuable insights into the efficacy of security measures and aids in identifying
potential areas for improvement. It facilitates the identification of previously unknown
vulnerabilities. The capability to analyze attacks in a controlled environment enables the
development and implementation of efficient countermeasures. Thus, the attacking-style
assessment is better suited for conducting comprehensive analysis and evaluation of cryp-
tographic algorithms, subsequently facilitating vulnerability analysis and the establishment
of protective measures.

The attacking-style assessment is widely utilized in the security assessment of cryptographic products. However, there are several limitations to its applicability.
First, because new SCA methods are frequently proposed, it is crucial to periodically
update the list of attack methods. However, CC standards generally apply to high-security
products like bank smart cards and passport ICs. The time and computational complexity
involved in security evaluation make it difficult to keep pace with the innovation cycle of
new security products [26,45,46].
Second, the evaluators of attacking-style assessments must possess exceptional exper-
tise in SCA methods and measurement techniques [23].
Third, due to the different principles of SCAs, evaluators need to perform multiple
SCAs to calculate security metrics. The increasing number of attacks inevitably leads to
an increase in computational and time complexity [31,39]. Therefore, the attacking-style
assessment is not suitable for the primary security evaluation of cryptographic algorithms.
Instead, the leakage detection-style assessment is proposed as a preliminary method to
evaluate the security of cryptographic products.

3. The Leakage Detection-Style Assessment


3.1. The Goals of Leakage Detection-Style Assessment
The leakage detection-style assessment is conducted by the laboratory to provide the
security certification, or conducted by the designers during the design period to highlight
and address the potential issues before the product is launched into the market. Con-
sequently, different stages of assessment have different goals. There are four different
intentions [47]: certifying vulnerability, certifying security, demonstrating an attack, and
highlighting vulnerabilities.
(1) Certifying vulnerability: This involves identifying at least one leakage point in the traces. It is crucial to minimize the false positive rate.
(2) Certifying security: The goal here is to find no leakages after thorough testing. In
this case, the false negatives become a concern. It is important to design the tests with
“statistical power” to ensure a large enough sample size for detecting effects with reasonable
probability. Moreover, all possible intermediates and relevant higher-order combinations
of points should be targeted.
(3) Demonstrating an attack: The objective is to map a leaking point (or tuple) to its
associated intermediate state(s) and perform an attack. The reporting of the outcomes or
projections derived from these attacks is of interest. The false positives are undesirable as
they represent wasted efforts of the attacker.
(4) Highlighting vulnerabilities: The purpose is to map all exploitable leakage points to
intermediate states in order to guide designers in securing the device. This has similarities
to certifying security, as both require exhaustive analysis. The false negatives are of greater
concern than false positives, as false negatives indicate unfixed side-channel vulnerabilities.

3.2. The Process of Leakage Detection-Style Assessment


In leakage detection, the evaluators have the ability to control both input and output
variables and obtain the side-channel measurements. The process of leakage detection-style
assessment primarily encompasses four stages: collecting power traces, categorizing power
traces, calculating the statistical moment, and determining leakage. The process of leakage
detection-style assessment is shown in Figure 2.
Figure 2. The process of leakage detection-style assessment.

Compared with the attacking-style assessment, the leakage detection-style assessment uses the statistical hypothesis or information theory to detect whether there is leakage. The evaluator does not consider specific attack methods, leakage models, or the implementation of encryption algorithms and does not require accurate extraction of leakage characteristics or the key recovery. Consequently, the leakage detection-style assessment becomes an ideal method for primary security assessments due to the lowered technical threshold and reduced evaluation time.
3.3. The Development of Leakage Detection-Style Assessment


In this paper, we categorize the development of leakage detection-style assessment
into three stages: the phase of proposing the leakage assessment concept, the phase of
forming leakage assessment methods, and the phase of optimizing the leakage assessment
methods. Figure 3 provides a visual representation of these development stages.
In the phase of proposing the leakage assessment concept, Coron et al. (2001) proposed
the idea of security assessment [48]. The failed result indicates the presence of side-channel
leakage, whereas the pass result does not imply that there is no leakage but rather indicates
that the leakage has not been identified at the specified confidence level α. In [49], a
framework for the side-channel security assessment was presented, categorizing security
assessments into attacking-style and leakage detection-style assessments. This paper
primarily concentrates on the leakage detection-style assessment.
In the phase of forming leakage assessment methods, two groups of leakage detection-
style assessment methods have emerged based on different theories: the information
theory and the statistical hypothesis. In 2010 and 2011, Chothia proposed two leakage
assessment methods: one based on discrete mutual information (DMI) [50] and the other
based on continuous mutual information (CMI) [51]. Both the DMI and CMI methods
utilize information entropy as a testing tool to assess the possibility of side-channel leakage.
Additionally, Gilbert et al. utilized the statistical hypothesis as a testing tool in their
work [52]. They divided the traces into two groups and performed leakage assessment
on the Advanced Encryption Standard (AES) using a t-test [52]. In 2013, based on the
research by Gilbert et al. [52], Becker et al. proposed the Test Vector Leakage Assessment
(TVLA) technique [53]. They divided the traces into two groups: fixed plaintext traces vs. random plaintext traces. Then, they performed Welch's t-tests on these two groups to detect any mean difference between the trace sets of fixed plaintext and random plaintext. In 2013, Mather et al. conducted a comparison of the detection efficiency between DMI and CMI with TVLA [27]. The results demonstrated that TVLA outperformed DMI and
CMI in terms of detection efficiency. Consequently, the leakage assessment based on the statistical hypothesis has been developed in recent years. In 2015, Moradi et al. summarized the TVLA techniques proposed by Gilbert and Becker in [23] and studied leakage detection in various scenarios. The non-specific TVLA technology has been widely applied in leakage assessment and is commonly regarded as a preliminary assessment technique in both industry and academia due to its simplicity, efficiency, and versatility.

Figure 3. The development of leakage detection-style assessment.

However, to address the shortcomings of TVLA [22–40,47] (a detailed description of TVLA's drawbacks can be found in Section 4), improve the detection efficiency and the reliability of results, and quantify side-channel vulnerability, researchers have proposed a variety of leakage assessment schemes to enhance TVLA.
In the phase of optimizing leakage assessment methods, there are three main aspects to
consider. Firstly, the optimization of statistical tools aims to replace the t-test of TVLA with
alternative statistical tools (such as the paired t-test [26], χ²-tests [28], Hotelling T²-tests [29],
KS tests [30], ANOVA [31], deep learning [36,37], etc.). Secondly, the optimization of the
detection process involves improving the current TVLA detection process or proposing a
new one [32–34]. Finally, the optimization of the decision strategy focuses on introducing a
new decision strategy, such as the HD strategy [35], to determine the stage of leakage.

4. The Leakage Assessment Based on Statistical Hypothesis


This section provides an overview of two perspectives on leakage assessment tech-
nologies: the TVLA technology and the TVLA’s optimization schemes.

4.1. The Test Vector Leakage Assessment


In 2013, Cryptography Research, Inc. (CRI) introduced TVLA as a standardized approach for detecting side-channel leakages. This section describes the detection process and the detection metrics, and discusses the limitations of TVLA.

4.1.1. The TVLA Technology


(1) The detection process of TVLA
The detection process of TVLA is as follows.
In the stage of collecting power traces, N different inputs X are used to collect the traces L from the execution process. Let l = (l_1, ..., l_i, ..., l_{n_l}) be a trace, where l_i represents the measurement at the i-th point, and n_l is the length of the traces.
In the stage of categorizing power traces, the trace set L is divided into two groups: the fixed plaintext trace set L_A and the random plaintext trace set L_B. It is assumed that L_A and L_B obey the normal distributions N(µ_A, σ_A²) and N(µ_B, σ_B²), respectively. The cardinality, sample mean, and sample variance of L_A are denoted as n_A, x̄_A, and s_A², while those of L_B are denoted as n_B, x̄_B, and s_B².


In the stage of calculating the statistical moment, the null hypothesis H0 states that
there is no side-channel leakage, while the alternative hypothesis H1 suggests the presence
of leakage. Welch's t-test is employed to determine the mean difference between L_A and
L_B. Based on the assumption of H0, we calculate the statistical moment T_i and the
probability P_i of accepting H0:

P_i = P(|T_i| > |T_th|). (10)
In the stage of determining leakage, if | Ti | > | Tth |, then the null hypothesis H0 is
rejected, and it can be concluded that there is side-channel leakage.
(2) The statistical tool of TVLA
The statistical tool Welch’s t-test is used to test the mean difference between L A and
L B in TVLA. The null hypothesis H0 and alternative hypothesis H1 in Welch’s t-test [33]
are as follows:
H0: µ_A = µ_B, H1: µ_A ≠ µ_B

The statistical moment T_µ and the degrees of freedom v of the t-test are calculated as follows:

T_µ = (x̄_A − x̄_B) / √(s_A²/n_A + s_B²/n_B),
v = (s_A²/n_A + s_B²/n_B)² / [ (s_A²/n_A)²/(n_A − 1) + (s_B²/n_B)²/(n_B − 1) ]. (11)

The probability P of accepting hypothesis H0 is calculated as follows, using the
probability density function (PDF) of the t distribution:

f(t, v) = [ Γ((v + 1)/2) / (√(πv) · Γ(v/2)) ] · (1 + t²/v)^(−(v+1)/2),
P = 2 ∫_{|T_µ|}^{∞} f(t, v) dt, (12)

where Γ(x) = ∫_0^∞ t^{x−1} e^{−t} dt is the gamma function.
The threshold 4.5 [54,55] is employed to determine the acceptance of H0. If |T_µ| > 4.5,
the hypothesis H0 is rejected. When v > 1000, P = 2 × f(4.5, v) < 10^{−5} [33], and this
implies that the probability of accepting hypothesis H0 is less than 0.00001, while the
probability of rejecting hypothesis H0 is greater than 0.99999, indicating the presence of
side-channel leakage in the device.
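As a worked sketch (illustrative, not from the paper; the function names welch_t and tvla_verdict are invented here), Equation (11) and the 4.5-threshold decision can be written in a few lines of Python:

```python
import math

def welch_t(trace_group_a, trace_group_b):
    """Welch's t-statistic and degrees of freedom (Equation (11));
    no equal-variance assumption is made about the two groups."""
    n_a, n_b = len(trace_group_a), len(trace_group_b)
    mean_a = sum(trace_group_a) / n_a
    mean_b = sum(trace_group_b) / n_b
    var_a = sum((x - mean_a) ** 2 for x in trace_group_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in trace_group_b) / (n_b - 1)
    se_a, se_b = var_a / n_a, var_b / n_b          # squared standard errors
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    v = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, v

def tvla_verdict(trace_group_a, trace_group_b, threshold=4.5):
    """Reject H0 (i.e., report leakage) when |T_mu| exceeds the 4.5 threshold."""
    t, _ = welch_t(trace_group_a, trace_group_b)
    return abs(t) > threshold
```

In an actual assessment this computation would be repeated independently at every sample point of the traces.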
(3) The decision strategy of TVLA
Due to TVLA being a leakage assessment method for univariate data, when the
evaluator obtains the traces of length n L , TVLA is applied to each sample point of the traces.
The assessor can obtain n L detection results, and the assessment decision is obtained by
combining these results. The min-P strategy is a common decision strategy used in TVLA.
Let L = [l_1, . . . , l_{n_L}] be the set of traces, where l_i = (l_{i,1}, . . . , l_{i,j}, . . . , l_{i,N}). The T_µ^i represents the
statistical moment value of l_i at point i.

When TVLA is applied to the long traces, the assessor actually conducts multiple
(n L times) tests. If any of the tests reject H0 , it indicates the presence of side-channel leakage.
In other words, leakage is indicated when the maximum statistical moment satisfies
max_{1≤i≤n_L} |T_µ^i| ≥ th, or equivalently when the minimum p-value is less than the
threshold α_th. This means that the min-P strategy of TVLA uses only one test result (the
minimum p-value) to make a decision regarding long traces.
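A minimal sketch of this decision rule (illustrative name, assuming the per-point p-values have already been computed):

```python
def min_p_decision(p_values, alpha_th=1e-5):
    """min-P strategy: a long trace of n_L points yields n_L p-values;
    leakage is declared if the smallest of them falls below the threshold."""
    return min(p_values) < alpha_th
```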

4.1.2. The Assessment Metrics of TVLA


Due to the fact that TVLA involves the statistical analysis of random variables, it
is possible for the detection results to contain errors. Therefore, it becomes imperative
to evaluate the detection results. Commonly used metrics are employed to assess the
effectiveness of detection methods.
(1) The number of samples: The minimum sample size required to exceed the thresh-
old for statistical moments is an assessment metric used in TVLA. This metric is frequently
utilized to compare the detection effectiveness among various assessment methods. In
identical conditions, a smaller sample size indicates a higher degree of assessment effec-
tiveness [24–31].
(2) The false positive and false negative: The two types of errors commonly encoun-
tered in hypothesis testing are false positives [24] and false negatives [47]. A false positive
occurs when the null hypothesis is true, but it is rejected by a t-test, leading to an incorrect
conclusion. In the context of leakage detection, a false positive refers to a situation where
the device does not have any leaks, yet the TVLA results indicate otherwise. Conversely,
a false negative denotes a Type II error, which occurs when the null hypothesis is false,
but the t-test fails to reject it. The rate of Type II errors is denoted as β. During leakage
assessment, the assessor aims to control the false positive rate at the specified significance
level α.
(3) The effect size: The effect size ζ is an indicator [22,47] employed to assess the
effectiveness of leakage detection.
Cohen’s d [56,57] is a commonly used effect size for comparing differences among
groups, mainly applied to t-tests to compare the standard difference between two means.
Cohen’s d is computed as follows:

d = (x̄_A − x̄_B) / √[ ((n_A − 1)s_A² + (n_B − 1)s_B²) / (n_A + n_B − 2) ], (13)

where x̄_A, x̄_B are the sample means, s_A², s_B² are the sample variances, and n_A, n_B are the
cardinalities. Cohen established thresholds as the criteria for judging the effect size [58].
According to his classification, when |d| ≤ 0.2, the effect size is considered "small", and
when |d| ≥ 0.8, the effect size is considered "large".
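Equation (13) translates directly into code; the following Python sketch (illustrative, not from the paper) computes Cohen's d from two trace groups:

```python
import math

def cohens_d(group_a, group_b):
    """Cohen's d (Equation (13)): mean difference scaled by the pooled
    standard deviation of the two groups."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = sum(group_a) / n_a, sum(group_b) / n_b
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (n_b - 1)
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled)
```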
(4) The power: The power is an essential metric that indicates the ability of the assessor
to detect a difference at the significance level α, and it is noted as 1 − β. The power should
not be less than 75% and is typically required to reach 80% or 90%. The relationship among
the variance σ12 , σ22 , the power 1 − β, the significance level α, the effect size ζ, and the
number of samples N is as follows:
N = 2 · (T_{α/2} + T_β)² · (σ_1² + σ_2²) / ζ², (14)
where ζ = µ1 − µ2 , and Tα/2 and Tβ are the statistical moments. Equation (14) allows us to
obtain any one of the significance level, effect size, or power [47].
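Assuming the quantiles T_{α/2} and T_β in Equation (14) are taken from the standard normal distribution (a common large-sample approximation; the function name is invented here), the trace budget can be sketched as follows:

```python
from statistics import NormalDist

def required_traces(alpha, power, var1, var2, effect):
    """Equation (14): number of samples N needed for significance level
    alpha, power 1 - beta, group variances var1, var2, and raw effect
    size zeta = mu1 - mu2 (normal quantiles approximate T_{alpha/2}, T_beta)."""
    z = NormalDist()
    t_a = z.inv_cdf(1 - alpha / 2)   # two-sided quantile T_{alpha/2}
    t_b = z.inv_cdf(power)           # T_beta for power 1 - beta
    return 2 * (t_a + t_b) ** 2 * (var1 + var2) / effect ** 2
```

Fixing any four of the five quantities yields the fifth, which is how Equation (14) is used in practice.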
(5) The observed power: Because power is a theoretical parameter that cannot be
directly obtained from samples, in order to evaluate the reliability of assessment results,
the observed power (OP) is proposed in [59] for given α and N. The OP can be considered
an approximation of power. If the probability of accepting the null hypothesis P > α, the
assessor can determine the reliability of assessment results by checking whether OP is
greater than 0.8. Additionally, OP can serve as a means to compare the effectiveness of
different assessment methods while keeping the sample size N constant.

4.1.3. The Drawbacks of TVLA


Generally, TVLA is based on hypothesis testing to determine whether the side-channel
measurements expose any secret information, which can help simplify the security as-
sessment process [25,60–62]. However, this complexity reduction is accompanied by an
increase in false positives or false negatives [23,36]. The drawbacks of TVLA are as follows:
(1) Difficulty interpreting negative outcomes
Formally, a statistical hypothesis test can either reject the null hypothesis or fail to
reject it. However, it cannot prove or accept the null hypothesis. If the statistical hypothesis
test fails to reject the null hypothesis, the security assessment agency must demonstrate
that their assessment method is fair. Unfortunately, due to the sample limitations or time
constraints, varying levels of expertise, and poor equipment quality, the fairness of leakage
assessment may be undermined. As a result, when the negative outcomes occur, it becomes
challenging to explain and provide evidence for the fairness of TVLA.
(2) Unreliability of positive outcomes
The TVLA is commonly utilized for univariate analysis. However, in actual detection
scenarios, multiple tests are necessary. To illustrate this, let us assume that the probability
of false positive is denoted as α. The length of traces is represented as nl , and there are
n_l independent tests. Consequently, the probability of rejecting the null hypothesis in at
least one of these tests is given by α_all = 1 − (1 − α)^{n_l}. Ding has emphasized that the threshold of
4.5 for TVLA corresponds to a false positive rate of approximately α ≈ 0.00001. For
instance, if there are 1000 independent tests, then α all = 0.0068. On the other hand, if there
are 1,000,000 independent tests, then α all = 0.9987 in [63]. Therefore, the products that
generate long power traces are more likely to be deemed vulnerable compared to those
with shorter traces. Consequently, the positive outcomes may be unreliable.
(3) Impossibility of achieving exhaustive coverage
Ideally, an evaluator would prefer to eliminate any possible sensitive dependency
across all distributional forms, for all points and tuples of points as a whole, considering all
target functions and intermediate states before determining the security of a target device.
However, even when the best efforts are made, there are still limitations. Moreover, in order
to enhance the scope of detection, extensive tests are required, which lead to an increase
in the type I error rate, computational complexity, and sample complexity. Consequently,
achieving exhaustive coverage through TVLA is not possible.
(4) The multivariate problems of TVLA
In TVLA, it is assumed that the sample points in a trace are independent of each other.
Additionally, TVLA is a test for the univariate analysis. However, in reality, numerous
examples contradict this assumption. This is especially evident in the case of protective
algorithm implementations, where the leakage is mostly observed to be multivariable or
horizontal [29,37,47]. Therefore, it is crucial to consider the correlation between multivariate
variables in leakage assessment.
(5) Few trace groups and dependence on the mean statistical moment
The simplicity and efficiency of TVLA depend on a reduced number of groups, the
fixed-vs-random trace sets, and the mean statistical moment [64]. However, in cases where
the leakage does not manifest in the mean statistical moment [32] and there is a presence of
mean differences among multiple groups, this introduces the risk of both false positives
and false negatives [29–33].

(6) The drawbacks of distribution assumption


The assumption is that the power traces obey a normal distribution in TVLA, while
in reality, many examples that contradict this assumption can be observed. Especially for
the protective algorithm implementations, it is generally necessary to use the combining
functions to preprocess the side-channel traces in TVLA. Additionally, the distribution of
samples does not obey the normal distribution after preprocessing [59].
(7) The shortcomings of certifying vulnerability
The assessor of TVLA can only answer whether there is a side-channel leakage but
cannot provide information regarding the specific location of the leakage or how to exploit it
for key recovery. The results obtained from TVLA are insufficient for certifying vulnerability
or deducing the relationship between the detected leakage and attack [22].

4.2. The Optimizations of TVLA


In order to address the shortcomings of TVLA mentioned above, researchers have
proposed various optimized assessment methods. This section provides a summary of
optimization methods in three aspects: the optimization of statistical tools, the optimization
of assessment processes, and the optimization of decision strategies.

4.2.1. The Optimization of the Statistical Tool


The statistical tools play a crucial role in leakage assessments as they significantly
impact the detection results. The researchers have attempted to enhance the detection
efficiency and reliability of results in TVLA by utilizing alternative statistical tools in-
stead of Welch’s t-test when calculating statistical moments. This section summarizes the
optimization methods for statistical tools.
(1) The paired t-test
The motivation: In [26], Adam Ding found that the environmental noise can adversely
affect the results of t-tests in actual assessments. In the worst-case scenario, a device with
leaks could pass the test solely due to the environmental noise being strong enough to mask
them. In order to mitigate the impact of environmental noise, Adam Ding proposed a side-
channel leakage detection based on paired t-tests in [26], where Welch’s t-test was replaced
with the paired t-tests to eliminate the influence of environmental noise on the results.
The method: In the stage of collecting power traces, a fixed input sequence is proposed
to effectively minimize environmental noise. This sequence consists of the repetition of
ABBA, such as ABBAABBA. . . ABBAABBA. The power traces with minimal environmental
variation are obtained through a paired t-test.
In the stage of categorizing power traces, the power traces are divided into two sets,
noted as L_A = (l_{A,1}, . . . , l_{A,n_A}) and L_B = (l_{B,1}, . . . , l_{B,n_B}), where l_{A,i} represents the
i-th trace of L_A.
In the stage of calculating the statistical moment, assuming n_A = n_B = n, there are
n pairs of traces (l_{A,i}, l_{B,i}). Let D = L_A − L_B = {D_1, . . . , D_n}, where D̄ and s_D²
represent the sample mean and sample variance of D, respectively. Considering the null
hypothesis H0: µ_D = 0, the statistical moment T_p of the paired t-test is as follows:

T_p = D̄ / √(s_D²/n). (15)

In the stage of determining leakage, use the same threshold and decision strategy
as TVLA.
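Equation (15) can be sketched as follows (illustrative, assuming the two trace sets have already been paired in acquisition order so that common environmental noise cancels in the differences):

```python
import math

def paired_t(traces_a, traces_b):
    """Paired t-statistic T_p (Equation (15)) on the per-pair differences
    D_i = l_{A,i} - l_{B,i}."""
    diffs = [a - b for a, b in zip(traces_a, traces_b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    return mean_d / math.sqrt(var_d / n)
```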
(2) χ2 -test
The motivation: In TVLA, comparing two groups and using the simple mean statistical
moment can increase the risk of false negatives [28,49] or fail to detect leakages [40], when
the leakage does not occur at the mean statistical moment. In 2018, Moradi proposed the
side-channel leakage detection based on the χ2 -test [28]. In the χ2 -test, Welch’s t-test was
replaced with the χ2 -test to detect whether multiple trace groups originate from the same
population. Furthermore, in the χ2 -test, the frequency of side-channel measurements is
stored in histograms, and the histograms are analyzed to identify any distribution differences
among the trace groups.
The method: In the stages of collecting power traces and determining leakage, the
same method as TVLA is adopted. During the stage of categorizing power traces, the traces
are divided into r type groups, and each type contains c sets in the χ²-test. The frequencies
of the side-channel measurements are stored in the c sets, and an r × c contingency table
is formed in which F_{i,j} represents the frequency of the i-th type group and the j-th set.
The number of samples is denoted as N = Σ_{i=0}^{r−1} Σ_{j=0}^{c−1} F_{i,j}, and the expected
frequency is E_{i,j} = (Σ_{j=0}^{c−1} F_{i,j}) · (Σ_{i=0}^{r−1} F_{i,j}) / N. In the stage of calculating
the statistical moment, the null hypothesis H0 of the χ²-test states that all power traces
come from the same population. The statistical moment T_{χ²} and the degrees of freedom v
are obtained by (16):

T_{χ²} = Σ_{i=0}^{r−1} Σ_{j=0}^{c−1} (F_{i,j} − E_{i,j})² / E_{i,j},  v = (r − 1)·(c − 1), (16)

The probability P of accepting H0 is obtained by (17):

f(t, v) = [ t^{v/2 − 1} · e^{−t/2} ] / [ 2^{v/2} · Γ(v/2) ],  P = ∫_{T_{χ²}}^{∞} f(t, v) dt, (17)

where Γ(·) is the gamma function.


In the stage of determining leakage, use the same threshold and decision strategy
as TVLA.
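A minimal sketch of Equation (16) on an r × c contingency table (illustrative; the expected frequencies are the usual row-total × column-total / N products):

```python
def chi2_statistic(table):
    """Chi-squared statistic and degrees of freedom (Equation (16)) for an
    r x c contingency table of measurement frequencies F_{i,j}."""
    r, c = len(table), len(table[0])
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(table[i][j] for i in range(r)) for j in range(c)]
    stat = 0.0
    for i in range(r):
        for j in range(c):
            expected = row_sums[i] * col_sums[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    return stat, (r - 1) * (c - 1)
```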
(3) KS test
The motivation: When the leakage does not occur at the mean statistical moment,
TVLA is not the optimal choice. Consequently, Zhou X proposed the side-channel leakage
detection based on the Kolmogorov–Smirnov (KS) test in [30]. The KS test is a non-
parametric statistical test that determines whether two groups of traces originate from the
same population by quantifying the distance between their cumulative distribution functions.
The method: In the stages of collecting power traces, categorizing power traces, and
determining leakage, adopt the same method as TVLA. In the stage of calculating the
statistical moment, the null hypothesis H0 of the KS test assumes that L A and L B come
from the same population. The alternative hypothesis H1 states that L A and L B come from
different populations. The probability P of accepting H0 is as follows:

P = 2 Σ_{j=1}^{∞} (−1)^{j−1} e^{−2j²Z²}, (18)

where Z = D_{n_A,n_B} · (√J + 0.12 + 0.11/√J) and J = n_A·n_B/(n_A + n_B). D_{n_A,n_B}
represents the maximum distance between the cumulative distribution functions of L_A and
L_B: D_{n_A,n_B} = sup_x |L_{A,n_A}(x) − L_{B,n_B}(x)|.
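The distance D_{n_A,n_B} is simply the largest gap between the two empirical CDFs; a brute-force sketch (illustrative, quadratic in the sample size) is:

```python
def ks_distance(xs_a, xs_b):
    """D_{n_A,n_B} = sup_x |F_A(x) - F_B(x)|: the largest gap between the
    two empirical cumulative distribution functions."""
    def ecdf(xs, x):
        return sum(1 for v in xs if v <= x) / len(xs)
    return max(abs(ecdf(xs_a, x) - ecdf(xs_b, x)) for x in xs_a + xs_b)
```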
(4) Hotelling T 2 -test
The motivation: TVLA relies on sampling assumptions, and its detection efficiency
strongly depends on parameters such as the signal-to-noise ratio (SNR), degree of
dependency, and density. The correct interpretation of leakage detection results requires
prior knowledge of these parameters. However, the evaluators often do not have this prior
knowledge, which poses a non-trivial challenge. To address this issue, O. Bronchain
proposed using the Hotelling T²-test instead of Welch's t-test in [36]. Additionally,
they explored the concept of multivariate detection, which can exploit differences between
multiple informative points in a trace more effectively than concurrent univariate t-tests.
The method: In the stages of collecting power traces and categorizing power traces,
adopt the same method as TVLA. In the stage of calculating the statistical moment, the null
hypothesis H0 of the Hotelling T²-test is as follows:

H0: µ_A = µ_B, H1: µ_A ≠ µ_B (19)

Then, the statistical moment T² is

T² = (n_A·n_B / (n_A + n_B)) · (x̄_A − x̄_B)ᵀ S⁻¹ (x̄_A − x̄_B), (20)

where n_A, n_B are the cardinalities of the trace sets L_A and L_B, the length of traces is n_l,
and the pooled covariance matrix S is

S = [ Σ_{i=1}^{n_A} (x_i − x̄_A)(x_i − x̄_A)ᵀ + Σ_{j=1}^{n_B} (x_j − x̄_B)(x_j − x̄_B)ᵀ ] / (n_A + n_B − 2). (21)

The scaled statistical moment λT² follows the Fisher distribution with degrees of freedom
(n_l, n_A + n_B − 2):

λT² ∼_{H0} F(n_l, n_A + n_B − 2), (22)

λ = (n_A + n_B − 1 − n_l) / ((n_A + n_B − 2)·n_l), (23)

Then, the probability of accepting hypothesis H0 is

P = 1 − F_F(λT²), (24)

where F_F(x, v_1, v_2) = I_{v_1 x/(v_1 x + v_2)}(v_1/2, v_2/2), I is the regularized incomplete β function, and v_1
and v_2 are the degrees of freedom.
In the stage of determining leakage, use the same threshold and decision strategy
as TVLA.
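Equations (20) and (21) can be sketched with NumPy as follows (illustrative, not from the paper; rows are traces and columns are the n_l sample points, so a multivariate mean difference is tested in one shot):

```python
import numpy as np

def hotelling_t2(traces_a, traces_b):
    """Hotelling T^2 statistic (Equations (20)-(21)); S is the pooled
    covariance matrix over all n_l sample points."""
    A = np.asarray(traces_a, dtype=float)
    B = np.asarray(traces_b, dtype=float)
    n_a, n_b = len(A), len(B)
    mean_a, mean_b = A.mean(axis=0), B.mean(axis=0)
    S = ((A - mean_a).T @ (A - mean_a) +
         (B - mean_b).T @ (B - mean_b)) / (n_a + n_b - 2)
    diff = mean_a - mean_b
    return (n_a * n_b / (n_a + n_b)) * diff @ np.linalg.solve(S, diff)
```

In the univariate case (n_l = 1) this reduces to the square of the equal-variance t-statistic.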
(5) ANOVA
The motivation: The TVLA and the χ2 -test require a large number of traces to distin-
guish the difference between leakage points and non-leakage points, and the paired t-test
requires careful selection of inputs and low-noise measurements. Wei Yang proposed a
novel leakage detection method using analysis of variance (ANOVA) in [31]. In ANOVA,
the traces are categorized into multiple groups, and variance analysis is employed to
identify the differences among these groups.
The method: The method of collecting power traces and determining leakage is
adopted from TVLA. In the stage of categorizing power traces, the power trace set L is
divided into r groups. Let N and n_i be the cardinalities of L and L_i, respectively, while x̄
represents the sample mean of L and x̄_i represents the sample mean of L_i. In the stage of
calculating the statistical moment, the null hypothesis H0 of the ANOVA test assumes that
all group traces come from the same population; the statistical moment T_F of ANOVA is
then calculated as follows:

T_F = ((N − r)·SS_b) / ((r − 1)·SS_w),  SS_b = Σ_{i=1}^{r} n_i (x̄_i − x̄)²,  SS_w = Σ_{i=1}^{r} Σ_{j=1}^{n_i} (x_{i,j} − x̄_i)², (25)
th = F_{1−α}(r − 1, N − r), (26)
The probability P of accepting hypothesis H0 is presented in Equation (27):

P = ∫_{T_F}^{∞} f(x, v_1, v_2) dx,
f(v_1, v_2, x) = [ Γ((v_1 + v_2)/2) · (v_1/v_2)^{v_1/2} · x^{v_1/2 − 1} ] / [ Γ(v_1/2) · Γ(v_2/2) · (1 + v_1·x/v_2)^{(v_1 + v_2)/2} ], (27)

where v_1 = r − 1 and v_2 = N − r.
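Equation (25) can be sketched as follows (illustrative names; the function returns T_F together with its degrees of freedom):

```python
def anova_f(groups):
    """One-way ANOVA statistic T_F (Equation (25)) over r trace groups,
    with degrees of freedom (r - 1, N - r)."""
    r = len(groups)
    N = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / N
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    return (N - r) * ss_between / ((r - 1) * ss_within), r - 1, N - r
```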
(6) The deep learning leakage assessment
The motivation: TVLA is conducted under the assumption that each sample point
is independent and that the leakage occurs at the mean statistical moment. However, in
reality, many examples contradict this hypothesis. Additionally, unaligned traces can also
impact the detection results. Therefore, TVLA is inadequate for addressing horizontal
and multivariate leakage, as well as unaligned traces. To address this issue, T. Moos
proposed the method of Deep Learning Leakage Assessment (DL-LA) [37].
The method: DL-LA maintains the basic idea of TVLA, which involves discriminating
between two groups of traces, and enhances the side-channel leakage assessment by
training a neural network as a distinguisher to discriminate the two group traces. In
the stage of collecting power traces, the same method of trace collection as TVLA is
implemented. In the stage of categorizing power traces, the trace set L is divided into a
training set and a validation set, which do not intersect. The mean µ and standard deviation
δ of the training set are calculated, and X_i^j = (X_i^j − µ_i)/δ_i is used to standardize both
the training set and the validation set, where j represents the trace and i represents the sample points of a
trace. In the stage of calculating the statistical moment, the assessor begins by training the
neural network distinguisher, using the training set, and then validating its accuracy using
the validation set. If the accuracy of the neural network distinguisher exceeds that of a
random-guess distinguisher, it can be concluded that the neural network distinguisher is
able to discriminate the two groups of traces. The construction of the neural network
distinguisher is as follows.
The network is built using Python’s Keras library, with Tensor Flow serving as the
backend. It comprises four fully connected layers, consisting of neurons with outputs of 120,
90, 50, and 2, respectively. The ReLU function is utilized as the activation function for the
input layer and each inner layer, while softmax functions as the activation function for the
final layer. The four layers are separated by a Batch Normalization layer. Once the neural
network distinguisher is constructed, the assessor proceeds to employ it to conduct the
leakage assessment. The null hypothesis H0 states that the neural network distinguisher
performs no better than random guessing when dividing the traces into two groups, so the
total number of correct classifications over X validation traces follows the binomial
distribution Binom(X, 0.5). The probability that a purely random distinguisher achieves at
least S_X correct classifications is P(X ≥ S_X), and this probability is calculated as follows:

P(X ≥ S_X) = Σ_{q=S_X}^{X} C(X, q) · 0.5^q · 0.5^{X−q} = 0.5^X · Σ_{q=S_X}^{X} C(X, q). (28)

In the stage of determining leakage, the threshold P_th = 10^{−5} is set, and if
P(X ≥ S_X) < P_th, then the hypothesis H0 is rejected, indicating the presence of side-
channel leakage.
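The random-guess baseline of Equation (28) and the threshold comparison can be sketched as follows (illustrative names; note that a small p-value indicates leakage, because the validation accuracy is then implausible under random guessing):

```python
from math import comb

def dlla_p_value(n_validation, n_correct):
    """Equation (28): probability that a purely random distinguisher gets
    at least n_correct of n_validation binary labels right."""
    return 0.5 ** n_validation * sum(comb(n_validation, q)
                                     for q in range(n_correct, n_validation + 1))

def dlla_verdict(n_validation, n_correct, p_th=1e-5):
    """Report leakage when the random-guess explanation is implausible."""
    return dlla_p_value(n_validation, n_correct) < p_th
```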
In summary, the various optimization methods (in Table 1) mentioned above are all
aimed at optimizing the statistical tool of TVLA. However, it should be noted that each
optimization method only addresses certain shortcomings of TVLA. Consequently, there
is currently no optimal statistical tool available that can effectively solve all shortcomings
associated with the t-test used in TVLA. Therefore, when conducting leakage detection,
it is essential to select a statistical tool that is appropriate for the specific environment. If
addressing environmental noise, the paired t-test is recommended in place of the t-test.
For horizontal and multivariate leakage, the Hotelling T 2 -test or DL-LA are suggested.
Furthermore, for multi-group traces or non-mean statistical moment leakage, the χ2 -test,
ANOVA, and KS test are recommended alternatives to the t-test. Hence, it is highly
recommended to select the appropriate statistical tool based on the characteristics of
the detection environment and the nature of the leakage. This ensures accurate and
reliable results.

Table 1. The optimizations of TVLA's statistical tool.

| Tool | For TVLA's Shortcoming | The Comparison Result with t-Test |
|---|---|---|
| The paired t-test [34] | The environmental noise negatively affects the results of TVLA. | The paired t-test performs better than the t-test in a noisy environment. |
| χ²-test [35] | TVLA has only two classifications; the detection results rely on the mean statistical moment. | When the leakage does not occur on the mean statistical moment, the χ²-test is better than the t-test. |
| KS test [37] | The detection results rely on the mean statistical moment. | When the leakage does not occur on the mean statistical moment or the statistical parameters are transformed, the KS test is more robust than the t-test. |
| Hotelling T²-test [36] | TVLA cannot be used for multivariate leakage; TVLA is based on an independence assumption. | For multivariate leakage, compared with the t-test, the Hotelling T²-test can improve the detection efficiency. |
| ANOVA test [23] | TVLA has only two groups. | When the traces are divided into more groups, the detection efficiency of the ANOVA test is better than that of the t-test. |
| DL-LA [14] | TVLA is not suitable for multivariate, horizontal leakage, and unaligned power traces. | For multivariate, horizontal leakage, or unaligned power traces, DL-LA is better than the t-test. |

4.2.2. The Optimization of the Leakage Assessment Process


In TVLA, the efficiency and accuracy of detection results depend on the leakage assess-
ment process. With this in mind, the researchers have proposed a series of suggestions to
accelerate the detection process [32,33] or proposed a novel leakage assessment process [34].
This section primarily scrutinizes the optimization methods of the assessment process.
(1) The optimization of TVLA’s detection process
Melissa Azouaoui studied the literature on leakage assessment and considered the
leakage assessment process of TVLA as the combination or iteration of three steps:
measurement and preprocessing, leakage detection and mapping, and leakage exploitation,
and investigated whether optimality guarantees exist for these steps in [33].
For measurement and preprocessing, the setting up of measurement devices depends
on the expertise [65]. The preprocessing is also similar, and currently the main methods
include filtering the noise [64,66] and aligning the power traces [67,68]. The best methods
of setting up the measurement devices and preprocessing should be as open and repeatable
as possible. Although there are some methods for setting up the measurement devices in
FIPS 140 and ISO, there is currently no guaranteed optimal approach for measurement
and preprocessing.
For leakage detection and mapping, the statistical hypothesis is commonly employed
for comparing the distribution or statistical moment. Despite the existence of numerous
methods for leakage detection, consensus on their fairness and optimality has yet to be
reached. Moreover, the “budget” of traces plays a significant role in leakage detection, yet
there is presently no established threshold for the optimal number of “budget” traces [29].

For leakage exploitation, it is typically divided into three stages: modeling, information
extraction, and information processing. During the modeling stage, the evaluator utilizes
the traces to estimate the optimal model for the implementation. However, as the number
of shares increases in a masking scheme, the cost increases, and the independence of samples
affects the modeling phase. Currently, obtaining the optimal model remains an unresolved
issue, and there is no optimal method.
During the actual leakage assessment, there are risks associated with all the aforemen-
tioned steps, and currently, there is no guarantee for an optimal leakage assessment process.
(2) A novel framework for explainable leakage assessment
Because the detection results of TVLA cannot directly certify exploitability, the current
approach consists of utilizing a specific attack to verify the detection outcomes. Based on
this, Gao Si and Elisabeth Oswald introduced a novel leakage assessment process in [34],
referred to as "the process of Gao Si" in this paper. The leakage assessment process of [34]
is outlined below.
Step 1: Non-specific detection via key-dependent models.
Consider two nested key-dependent models: the full model L_{cf} fits a model as a
function of the key K to the observed data, while the null model L_0 contains only a constant
term, representing the case where there is no dependency on K:

L_{cf}(K_c) = Σ_j β_j µ_j(K_c), j ∈ [0, 2^{16}), (29)

L_0(K) = β_0, (30)
where the coefficients β j are estimated from the traces via least square estimation.
The F-test is used to test H0 (both the L_{cf} and L_0 models explain the observed data
equally well) versus H1 (L_{cf} explains the data better than L_0). If the F-test finds enough
evidence to reject H0, we conclude that this point's leakage relies on K_c, because K_c is a part of
K. Consequently, the measurements depend on K.
Step 2: Degree analysis
Further restricting the degree of the key-dependent model, we determine how large
the key guess must be to exploit an identified leakage. We obtain the model

L_{cr}(K_c) = Σ_j β_j µ_j(K_c), j ∈ [0, 2^{16}), deg(µ_j(K_c)) ≤ g. (31)

The F-test is used again to test H0 (L_{cr} and L_{cf} explain the data equally well) versus
H1 (L_{cf} explains the data better than L_{cr}).
In general, the F-test compares a full model L_f(X) = Σ_j β_j µ_j(X), j ∈ T_f, with a reduced
model L_r(X) = Σ_j β_j µ_j(X), j ∈ J_r ⊂ T_f. The statistical moment of the F-test is as follows:

|F| = [ (RSS_r − RSS_f) / (n_f − n_r) ] / [ RSS_f / (N − n_f) ], (32)

where RSS = Σ_{i=1}^{N} (y^{(i)} − L̃(x^{(i)}))², y^{(i)} represents the measurement of x^{(i)}, and L̃(x^{(i)})
is the prediction of L_{cf} or L_r for x^{(i)}. n_f and n_r are the numbers of terms of L_f and L_r,
and N is the sample size of L. The statistical moment |F| obeys the F distribution with
degrees of freedom (n_f − n_r, N − n_f). The threshold of the F-test is F_th = Q_F(df_1, df_2, 1 − α),
where df_1 = n_f − n_r, df_2 = N − n_f, and Q_F is the quantile function of the central F distribution.
If F ≥ Fth , the null hypothesis H0 is rejected, L f ( X ) explains the data better than
Lr ( X ), and the leakage contains the information of the key. If there is enough evidence
to reject H0 , we obtain that a model with only g or fewer key bytes suffices to explain the
measurements. By successively reducing g, we can therefore determine the maximum key
guess that is required to explain the side-channel measurements.
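The F statistic of Equation (32) is a one-liner once the residual sums of squares of the two nested models are known (illustrative sketch; n_r and n_f denote the numbers of model terms):

```python
def nested_f(rss_reduced, rss_full, n_r, n_f, n_samples):
    """F statistic of Equation (32) comparing a reduced model (n_r terms,
    residual sum of squares rss_reduced) against a full model (n_f terms)
    fitted on n_samples measurements."""
    return (((rss_reduced - rss_full) / (n_f - n_r))
            / (rss_full / (n_samples - n_f)))
```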
Step 3: Subkey identification
By using the technique of further restricting the reduced model, we can narrow down
precisely which specific key bytes are required to explain the identified leakage.
Step 4: Converting to specific attacks
If an evaluation regime requires an evaluator to demonstrate an actual attack targeting
an identified leakage point, a relatively straightforward connection to a concrete profiled
attack can be established.
In summary, although optimization methods exist for the measurement process and
preprocessing process, there is currently no optimal leakage detection process. In actual
leakage detection, the detection process of TVLA is still used to detect leakages. The
process of Gao Si is a new detection process to demonstrate that the discovered leakages
are key-related and can be exploited by attacks. The approach is a small step towards
establishing precise attack vectors for confirmatory attacks.

4.2.3. The Optimization of TVLA’s Decision Strategy


(1) The decision strategy of HC
For long traces, the detection result in TVLA is obtained using the min-P strategy.
This strategy relies solely on the minimum p-value to make a decision about
leakage, disregarding all other p-values. A. A. Ding proposed the higher criticism (HC)
strategy in [35], which takes into account the information from all p-values.
The null hypothesis H0 : there is no leakage point in the trace. The alternative hypothe-
sis H1 : there is at least one leakage point in the trace. Let the length of traces be nl , there
are nl p-values, denoted as p(1), . . . , p(nl ), and the HC strategy is as follows.
Step 1: Sorting p-values in ascending order: p(1) ≤ p(2) ≤ . . . ≤ p(nl ).
Step 2: Calculating the normalized distance HĈ_{n_l,i} of each p-value; the normalized
distance of the HC strategy is given by Formula (33):

HĈ_{n_l,i} = √(n_l) · (i/n_l − p(i)) / √(p(i)·(1 − p(i))). (33)

Step 3: Calculating the test statistic of the HC strategy with (34):

$$\widehat{HC}_{n_l,\max} = \max_{1 \le i \le n_l/2} \widehat{HC}_{n_l,i}. \qquad (34)$$

Step 4: Comparing the test statistic $\widehat{HC}_{n_l,\max}$ with the threshold $th^{HC}_{n_l,\alpha}$ at significance level $\alpha$. If $\widehat{HC}_{n_l,\max} > th^{HC}_{n_l,\alpha}$, then the null hypothesis is rejected, indicating the presence of side-channel leakages. The threshold $th^{HC}_{n_l,\alpha}$ represents the $1-\alpha$ quantile of the statistic $\widehat{HC}_{n_l,\max}$ under the null hypothesis. For large $n_l$, the threshold $th^{HC}_{n_l,\alpha}$ can be approximated using the connection to a Brownian bridge, such as with the calculation formula provided in Li and Siegmund [68].
In summary, the HC strategy can combine multiple leakage points to enhance the
efficiency of leakage detection. Thus, it can serve as a viable alternative to TVLA’s
min-P strategy.
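Steps 1-4 above can be sketched numerically as follows. This is our own minimal illustration: for simplicity the threshold is estimated as an empirical null quantile from simulated uniform p-values rather than via the Brownian-bridge approximation of [68], and the function names are assumptions.

```python
import numpy as np

def hc_statistic(p_values):
    """Eq. (33)/(34): sort the p-values in ascending order, compute the
    normalized distances, and take the maximum over the first half."""
    p = np.clip(np.sort(np.asarray(p_values, dtype=float)), 1e-12, 1 - 1e-12)
    n_l = len(p)
    i = np.arange(1, n_l + 1)
    hc = np.sqrt(n_l) * (i / n_l - p) / np.sqrt(p * (1.0 - p))
    return np.max(hc[: n_l // 2])

def hc_decision(p_values, alpha=0.05, n_null_sims=2000, seed=0):
    """Step 4: reject H0 ('no leakage point') when HC_max exceeds the
    empirical (1 - alpha) quantile of HC_max under H0 (uniform p-values)."""
    rng = np.random.default_rng(seed)
    n_l = len(p_values)
    null = [hc_statistic(rng.uniform(size=n_l)) for _ in range(n_null_sims)]
    threshold = np.quantile(null, 1.0 - alpha)
    observed = hc_statistic(p_values)
    return observed, threshold, bool(observed > threshold)
```

Unlike min-P, even a group of moderately small p-values (none individually below the min-P threshold) can push the HC statistic over its threshold.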

4.3. The Summary of TVLA’s Optimization Schemes


Researchers have proposed various optimization schemes for TVLA, which primarily
aim to address its inherent limitations. Currently, there is no comprehensive and universally
applicable statistical tool and detection process that can effectively address all the identified
limitations. Suitable detection methods for different detection purposes and conditions are
summarized in Figure 4.

Figure 4. The optimization process of leakage assessment.

Therefore, assessors should perform the following process in actual leakage detection.
Firstly, select the leakage detection process based on the purpose of detection. If the aim is only to discover a side-channel leakage, the recommended process is TVLA. If the aim is to detect and utilize the leakages, it is recommended to choose the process of Gao Si.
Secondly, if the TVLA process is chosen, it is recommended to select an appropriate statistical tool based on the evaluator's prior knowledge of the device. If the evaluator has no prior knowledge about the device, a t-test is used initially to detect whether there is a univariate, first-order leakage. If a leakage is detected, the leakage detection process is stopped. Otherwise, the χ²-test, KS test and ANOVA test are used to detect univariate, high-order leakages, while the Hotelling T²-test and DA-LA are used to test for the presence of multivariate or horizontal leakage. If the evaluator has prior knowledge of the device, the leakage type (univariate or multivariate) is determined based on this knowledge. Then, the detection environment (high noise or low noise) and the alignment of the power traces are determined. For univariate, first-order leakage (mean statistical moment) and low-noise environments, the t-test is more efficient. For univariate, first-order leakage and high-noise environments, the paired t-test has better detection efficiency. For univariate leakages that do not occur at the mean statistical moment, the χ²-test, KS test, and ANOVA test have better detection performance than the t-test. For multivariate and horizontal leakages, it is recommended to choose the Hotelling T²-test and DA-LA, with DA-LA being more effective when the traces are unaligned.
Finally, in the determination stage, the HC strategy is recommended.
Although many leakage detection methods have been proposed, TVLA remains the mainstream detection method. Therefore, the t-test is generally regarded as the mainstream detection tool, with other detection tools considered supplementary. However, the current detection methods cannot definitively conclude that there is no leakage; they can only provide evidence that no leakage has been detected.
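The first detection step above (the non-specific fixed-vs-random TVLA with a point-wise Welch's t-test and the conventional |t| > 4.5 threshold) might be sketched as follows; the simulated traces and the function name are our own assumptions.

```python
import numpy as np

def tvla_t_test(traces_fixed, traces_random, threshold=4.5):
    """Point-wise Welch's t-test between the fixed-input and random-input
    trace sets; any sample with |t| above the threshold flags a first-order
    leak (min-P style decision).  Rows are traces, columns are time samples."""
    m1, m2 = traces_fixed.mean(0), traces_random.mean(0)
    v1, v2 = traces_fixed.var(0, ddof=1), traces_random.var(0, ddof=1)
    n1, n2 = len(traces_fixed), len(traces_random)
    t = (m1 - m2) / np.sqrt(v1 / n1 + v2 / n2)
    return t, bool(np.any(np.abs(t) > threshold))
```

In practice the test is run on two independent halves of the measurements and a point is only reported when it exceeds the threshold in both halves.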

5. Quantification of Side Channel Vulnerability


The inability of TVLA’s detection results to quantify the vulnerability of side channels
raises the question of how to establish a relationship between attacking-style assessment
and leakage detection-style assessment. Debapriya Basu Roy et al. made an attempt to derive this relationship between TVLA and SCA. Additionally, SR can be calculated directly from TVLA,
effectively connecting CC and FIPS 140-3 based on the provided intermediate variables and
leakage model in [22].
The derivation process of the relationship between TVLA and SR is as follows. Let $L = f(X,k)$ be the normalized leakage model, where $E(L) = 0$ and $Var(L) = E\left(L^2\right) = 1$. $Y$ represents the measurements, and it can be defined as $Y = \varepsilon L + N$, where $\varepsilon$ is the scale factor and $N \sim \mathcal{N}\left(0,\sigma^2\right)$ represents the noise. Taking the S-box as an example, the $n$-bit Hamming weight model can be expressed as $f(X,k) = \frac{2}{\sqrt{n}}\left(HW(sbox(X \oplus k)) - \frac{n}{2}\right)$ $(\ast)$.


Firstly, link TVLA and NICV. We obtain

$$NICV_2 = \frac{1}{\dfrac{n_1+n_2}{(TVLA)^2 \cdot C} + \left(\dfrac{1}{n_2}-\dfrac{1}{n_1}\right)\left(\sigma_1^2-\sigma_2^2\right) + 1}.$$

If $n_2 = n_1 = \frac{n}{2}$, then $NICV_2 = \frac{1}{\frac{n}{(TVLA)^2}+1}$. If the side-channel traces are divided into $q$ groups, then

$$NICV_q = \frac{q-1}{q}\sum_{i=1}^{q} NICV_2^i \qquad (35)$$

where $NICV_2^i = \dfrac{\frac{n_i \cdot \overline{n}_i}{n}\left(\mu_i-\mu\right)^2}{\frac{1}{n}\sum_{j=1}^{n}\left(Y_j-\mu\right)^2}$, $\overline{n}_i = n - n_i$.
Secondly, link SNR and NICV. The forms of SNR, NICV, and TVLA under the leakage model $(\ast)$ are $SNR = \frac{Var(E(Y|X))}{E(Var(Y|X))}$, $SNR = \frac{\varepsilon^2}{\sigma^2}$, $NICV = \frac{Var(E(Y|X))}{Var(Y)}$, and $NICV = \frac{1}{1+\frac{\sigma^2}{\varepsilon^2}}$. Then

$$NICV = \frac{1}{\frac{1}{SNR}+1}, \qquad TVLA = \frac{\varepsilon \cdot l(X,q)}{\sqrt{\varepsilon^2 + 2\sigma^2}}. \qquad (36)$$
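The link NICV = 1/(1/SNR + 1) can be checked numerically on simulated measurements Y = εL + N under the normalized Hamming-weight model. The sketch below is our own illustration; the function name and simulation parameters are assumptions.

```python
import numpy as np

def snr_nicv(y, labels):
    """Empirical SNR = Var(E[Y|X]) / E[Var(Y|X)] and NICV = Var(E[Y|X]) / Var(Y)
    for one trace sample y, partitioned by the intermediate value `labels`."""
    groups = [y[labels == v] for v in np.unique(labels)]
    weights = np.array([len(g) for g in groups]) / len(y)
    means = np.array([g.mean() for g in groups])
    variances = np.array([g.var() for g in groups])
    signal = np.sum(weights * (means - y.mean()) ** 2)  # Var(E[Y|X])
    noise = np.sum(weights * variances)                 # E[Var(Y|X)]
    return signal / noise, signal / y.var()
```

The NICV/SNR identity holds exactly for the empirical moments by the law of total variance, while the empirical SNR approaches ε²/σ² as the number of traces grows.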

Finally, export SR using SNR. The relationship between SNR and SR is

$$SR = \Phi_{\left[K + \frac{1}{4} SNR \left(K^{\ast\ast} - kk^{T}\right)\right]}\left(\frac{1}{2}\sqrt{SNR}\,k\right), \qquad (37)$$

where $k = \left(k\left(k_c,k_{g_1}\right), \ldots, k\left(k_c,k_{g_{2^n-1}}\right)\right)^{T}$, $k\left(k_c,k_{g_i}\right) = E\left[\left(l(X,k_c) - l\left(X,k_{g_i}\right)\right)^2\right]$,

$$K = \begin{pmatrix} k\left(k_c,k_{g_1},k_{g_1}\right) & \cdots & k\left(k_c,k_{g_1},k_{g_{2^n-1}}\right) \\ \vdots & \ddots & \vdots \\ k\left(k_c,k_{g_{2^n-1}},k_{g_1}\right) & \cdots & k\left(k_c,k_{g_{2^n-1}},k_{g_{2^n-1}}\right) \end{pmatrix},$$

$$k\left(k_c,k_{g_i},k_{g_j}\right) = E\left[\left(l(X,k_c) - l\left(X,k_{g_i}\right)\right)\left(l(X,k_c) - l\left(X,k_{g_j}\right)\right)\right],$$

$$K^{\ast\ast} = \begin{pmatrix} k^{\ast\ast}\left(k_c,k_{g_1},k_{g_1}\right) & \cdots & k^{\ast\ast}\left(k_c,k_{g_1},k_{g_{2^n-1}}\right) \\ \vdots & \ddots & \vdots \\ k^{\ast\ast}\left(k_c,k_{g_{2^n-1}},k_{g_1}\right) & \cdots & k^{\ast\ast}\left(k_c,k_{g_{2^n-1}},k_{g_{2^n-1}}\right) \end{pmatrix},$$

$$k^{\ast\ast}\left(k_c,k_{g_i},k_{g_j}\right) = 4E\left[\left(l(X,k_c) - E(l(X,k_c))\right)^2 \left(l(X,k_c) - l\left(X,k_{g_i}\right)\right)\left(l(X,k_c) - l\left(X,k_{g_j}\right)\right)\right],$$

where $\Phi_{[S]}(\mu)$ is the cumulative distribution function of the multivariate normal distribution with mean vector $\mu$ and covariance matrix $S$, $k_c$ is the correct key, and $k_{g_i}$ with $1 \le i \le 2^n - 1$ denotes the key guesses.
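The definitions of the vector k and matrix K above can be instantiated directly by taking the expectation over uniform X. The sketch below does so for a random 4-bit S-box under the normalized Hamming-weight model; the toy parameters and the function name are our own assumptions, not the construction from the cited work.

```python
import numpy as np

def confusion_vectors(sbox, k_c, n_bits=4):
    """Builds the vector k and matrix K of Eq. (37) from their definitions,
    k(kc, kgi, kgj) = E[(l(X,kc) - l(X,kgi)) (l(X,kc) - l(X,kgj))], using the
    normalized Hamming-weight leakage l(X,k) and a uniform distribution on X."""
    xs = np.arange(1 << n_bits)
    hw = np.array([bin(v).count("1") for v in xs])
    def l(key):
        return (2 / np.sqrt(n_bits)) * (hw[sbox[xs ^ key]] - n_bits / 2)
    # One row of differences l(X,kc) - l(X,g) per wrong key guess g.
    d = np.array([l(k_c) - l(g) for g in xs if g != k_c])
    k_vec = np.mean(d * d, axis=1)     # k(kc, kgi): mean squared difference
    K = d @ d.T / d.shape[1]           # k(kc, kgi, kgj): cross moments
    return k_vec, K
```

By construction, the diagonal of K equals the vector k, and K is symmetric.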
Figure 5 presents the process of hybrid side-channel testing. The process consists of non-specific TVLA → specific TVLA → SR → evaluation results. First, the evaluator performs a non-specific TVLA on the target device, followed by the calculation of SR using specific TVLA. The side-channel vulnerabilities of the target device are assessed without the need for actual attacks. If SR is below the security limit, the device is considered safe; otherwise, it is considered vulnerable to SCA.

Figure 5. The process of hybrid side-channel testing.

Although
Although thismethod
this method aims
aims to
to establish
establishthe
therelationship between
relationship TVLA
between and and
TVLA SR, itSR, it
provides limited bridging between CC and FIPS. However, this method is based on the
provides limited bridging between CC and FIPS. However, this method is based on the
assumptions regarding intermediate variables and leakage models, resulting in difficulties
assumptions regarding intermediate variables and leakage models, resulting in difficul-
in practical detection for evaluators without professional knowledge of these models
tiesand
in practical
values. detection for evaluators without professional knowledge of these models
and values.
6. Discussion
Based on the research on leakage assessment works, we discuss leakage assessment from two perspectives: the current status and the future development trend.
Regarding the current status of leakage assessment, firstly, although many leakage assessment methods have been proposed to address specific shortcomings of TVLA (multivariate or high-order leakage, noise issues, or independence of traces), there is currently no unified standardized method to guide all assessors in conducting the assessment step by step. Secondly, in actual leakage detection, the evaluator or agencies often spend a significant amount of time collecting the traces, regardless of the detection method used. The goal is to evaluate the security of products, identify potential vulnerabilities, and guide further design to achieve the required security level. However, current leakage detection focuses solely on discovering leakages and does not uncover vulnerabilities or guide the design process. From an investment cost perspective, the return obtained from leakage detection is low.
In terms of the future development trend of leakage assessment, researchers are currently attempting to link non-specific leakage detection results with key guessing in order to address the issue of unusable detection results. This approach would allow the detection results to reveal information about the key or the causes of leakage, which can aid designers in constructing attacks or defenses. We therefore anticipate the formation of a unified and fair method for security assessment. Additionally, in recent years, deep learning technology has been applied to leakage detection. Compared to traditional leakage assessment technology, deep learning leakage assessment offers simplicity and a high level of statistical confidence. However, it lacks the ability to provide superior security metrics. Therefore, further exploration is needed on how to effectively apply deep learning technology to side-channel leakage assessment.

7. Conclusions
In this paper, we conducted a comprehensive study on the leakage detection-style
assessment. These leakage detection-style assessment methodologies can be classified into
two categories: TVLA and its optimizations. We identified the drawbacks of TVLA and cat-
egorized the optimization schemes aimed at addressing these limitations into three groups:
statistical tool optimization, detection process optimization, and decision strategy opti-
mization. We gave succinct descriptions of the leakage detection assessment motivations
and detection processes and compared the efficiency of different optimization schemes.
Based on our classification and summary, we concluded that there is no single optimal
scheme that can effectively address all the shortcomings of TVLA. Different optimization
schemes are proposed for specific purposes and detection conditions. We summarized
the purposes and conditions for all TVLA optimizations and proposed a selection strategy
for leakage detection-style assessment schemes. According to the selection strategy, the
leakage detection process should be chosen based on the specific detection purpose. For
discovering side-channel leakages, TVLA is recommended, while the detection process
of Gao Si is suitable for discovering and utilizing leakages. Additionally, the appropriate
statistical tool should be selected. T-tests and paired t-tests are used to detect univariate,
first-order leakage. In low-noise environments, the t-test is more suitable, whereas in
high-noise environments, the paired t-test demonstrates better detection efficiency. If the
leakage does not occur at the mean statistical moment, the χ2 -test, KS test and ANOVA test
outperform the t-test for detecting univariate, high-order leakages. For testing multivariate
or horizontal leakages, the Hotelling T 2 -test and DA-LA are recommended. If the traces
are unaligned, DA-LA is more effective. Lastly, the HC strategy is recommended for the
determination stage. Based on the current status of leakage detection-style assessment, we
discuss the development trend of leakage detection. Researchers are increasingly interested
in linking leakage detection with key guessing to address the issue of unusable detection
results. The aim is to establish a unified and fair method for security assessment.

Author Contributions: Y.W. and M.T. made substantial contributions to the conceptualization and
methodology of the review. Y.W. also contributed to the writing, review, and editing of the manuscript.
All authors have read and agreed to the published version of the manuscript.
Funding: This work was supported in part by the National Key R&D Program of China No. 2022YFB3103800 and the National Natural Science Foundation of China No. 61972295.
Conflicts of Interest: The authors declare that they have no known competing financial interests or
personal relationships that could have appeared to influence the work reported in this paper.

References
1. Li, Y.; Shen, C.; Tian, N. Guiding the Security Protection of Key Information Infrastructure with a Scientific Network Security
Concept. J. Internet Things 2019, 3, 1–4.
2. Cao, S.; Fan, L. NSA’s top backdoor has been exposed by Chinese researchers. Glob. Times 2022. [CrossRef]
3. Biham, E.; Shamir, A. Differential cryptanalysis of DES-like cryptosystems. J. Cryptol. 1991, 4, 3–72.
4. Matsui, M. Linear Cryptanalysis Method for DES Cipher. In Proceedings of the Workshop on the Theory and Application of
Cryptographic Techniques, Lofthus, Norway, 23–27 May 1993; Springer: Berlin/Heidelberg, Germany, 1993; pp. 386–397.
5. Knudsen, L.R. Cryptanalysis of LOKI91. In Advances in Cryptology-AUSCRYPT '92, LNCS 718, Proceedings of the Workshop on the Theory and Application of Cryptographic Techniques, Gold Coast, Queensland, Australia, 13–16 December 1992; Springer: Berlin/Heidelberg, Germany, 1993; pp. 196–208.
6. Kocher, P.; Jaffe, J.; Jun, B. Differential Power Analysis. In Proceedings of the 19th Annual International Cryptology Conference,
Santa Barbara, CA, USA, 15–19 August 1999; Springer: Berlin/Heidelberg, Germany, 1999; pp. 388–397.
7. Mangard, S. A Simple Power Analysis (SPA) Attack on Implementations of the AES Key Expansion. In Proceedings of the
International Conference on Information Security and Cryptology, Seoul, Republic of Korea, 28–29 November 2002; Springer:
Berlin/Heidelberg, Germany, 2002; pp. 343–358.
8. Brier, E.; Clavier, C.; Olivier, F. Correlation Power Analysis with a Leakage Model. In Proceedings of the 6th Interna-
tional Workshop on Cryptographic Hardware and Embedded Systems, Cambridge, MA, USA, 11–13 August 2004; Springer:
Berlin/Heidelberg, Germany, 2004; pp. 16–29.
9. Gierlichs, B.; Batina, L.; Tuyls, P.; Preneel, B. Mutual Information Analysis: A Generic Side-Channel Distinguisher. In Proceedings of the International Workshop on Cryptographic Hardware and Embedded Systems, Washington, DC, USA, 10–13 August 2008; Springer: Berlin/Heidelberg, Germany, 2008; pp. 426–442.
10. Maghrebi, H.; Portigliatti, T.; Prouff, E. Breaking Cryptographic Implementations Using Deep Learning Techniques. In Proceed-
ings of the International Conference on Security, Privacy and Applied Cryptography Engineering, Hyderabad, India, 14–18
December 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 3–26.
11. Cagli, E.; Dumas, C.; Prouff, E. Convolutional Neural Networks with Data Augmentation against Jitter-Based Countermeasure.
In Proceedings of the International Conference on Cryptographic Hardware and Embedded Systems, Taipei, Taiwan, 25–28
September 2017; Springer: Berlin/Heidelberg, Germany, 2017; pp. 45–68.
12. Benadjila, R.; Prouff, E.; Strullu, R.; Cagli, E.; Dumas, C. Deep learning for side-channel analysis and introduction to ASCAD
database. J. Cryptogr. Eng. 2020, 10, 163–188. [CrossRef]
13. Picek, S.; Samiotis, I.P.; Heuser, A.; Kim, J.; Bhasin, S.; Legay, A. On the Performance of Deep Learning for Side-Channel Analysis.
In Proceedings of the IACR Transactions on Cryptographic Hardware and Embedded Systems, Amsterdam, The Netherland,
9–12 September 2018; pp. 281–301.
14. Himanshu, T.; Hanmandlu, M.; Kumar, K.; Medicherla, P.; Pandey, R. Improving CEMA Using Correlation Optimization.
In Proceedings of the 2020 International Conference on Advances in Computing, Communication Control and Networking
(ICACCCN), Greater Noida, India, 18–19 December 2020; pp. 211–216.
15. Agrawal, D.; Archambeault, B.; Rao, J.R.; Rohatgi, P. The EM Side Channel. In Proceedings of the 4th International Workshop on
cryptographic Hardware and Embedded Systems, Redwood Shores, CA, USA, 13–15 August 2002; Springer: Berlin/Heidelberg,
Germany, 2002; pp. 29–45.
16. Kocher, P.C. Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems. In Proceedings of the
16th Annual International Cryptology Conference, Santa Barbara, CA, USA, 18–22 August 1996; Springer: Berlin/Heidelberg,
Germany, 1996; pp. 104–113.
17. Boneh, D.; DeMillo, R.A.; Lipton, R.J. On the Importance of Checking Cryptographic Protocols for Faults. In Proceedings of the
Advances in Cryptology-EUROCRYPT’97, LNCS 1233, International Conference on the Theory and Application of Cryptographic
Techniques, Konstanz, Germany, 11–15 May 1997; Spring: Berlin/Heidelberg, Germany, 1997; pp. 37–51.
18. Bernstein, D.J. Cache-Timing Attacks on AES. 2004. Available online: https://mimoza.marmara.edu.tr/~msakalli/cse466_09/
cache%20timing-20050414.pdf (accessed on 14 August 2023).
19. ISO/IEC JTC 1/SC 27: ISO/IEC 17825; Information Technology—Security Techniques—Testing Methods for the Mitigation of Non-Invasive
Attack Classes against Cryptographic Modules. International Organization for Standardization: Geneva, Switzerland, 2016.
20. FIPS 140–3; Security Requirements for Cryptographic Modules. NIST: Gaithersburg, MD, USA, 2019.
21. Roy, D.B.; Bhasin, S.; Guilley, S.; Heuser, A.; Patranabis, S.; Mukhopadhyay, D. CC meets FIPS: A Hybrid Test Methodology for
First Order Side Channel Analysis. IEEE Trans. Comput. 2019, 68, 347–362. [CrossRef]
22. Schneider, T.; Moradi, A. Leakage Assessment Methodology. In Proceedings of the Cryptographic Hardware and Embedded
Systems CHES 2015, Saint-Malo, France, 13–16 September 2015; Springer: Berlin/Heidelberg, Germany, 2015; pp. 495–513.
23. Standaert, F.X. How (Not) to Use Welch’s t-test in Side Channel Security Evaluations; Report 2016/046; Cryptology ePrint Archive;
Springer: Berlin/Heidelberg, Germany, 2016.
24. Durvaux, F.; Standaert, F.-X. From Improved Leakage Detection to the Detection of Points of Interests in Leakage Traces. In
Proceedings of the 35th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Vienna,
Austria, 8–12 May 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 240–262.

25. Ding, A.A.; Chen, C.; Eisenbarth, T. Simpler, Faster, and More Robust T-Test Based Leakage Detection. In Proceedings of
the Cryptographic Hardware and Embedded Systems—CHES, Graz, Austria, 14–15 April 2016; Springer: Berlin/Heidelberg,
Germany, 2014; pp. 108–125.
26. Mather, L.; Oswald, E.; Bandenburg, J.; Wójcik, M. Does My Device Leak Information? A Priori Statistical Power Analysis of
Leakage Detection Tests. In Proceedings of the International Conference on the Theory and Application of Cryptology and
Information Security, Bengaluru, India, 1–5 December 2013; Springer: Berlin/Heidelberg, Germany, 2013; pp. 486–505.
27. Moradi, A.; Richter, B.; Schneider, T.; Standaert, F.X. Leakage Detection with the χ2 -Test. IACR Trans. Cryptogr. Hardw. Embed.
Syst. 2018, 2018, 209–237. [CrossRef]
28. Bronchain, O.; Schneider, T.; Standaert, F.X. Multi-tuple leakage detection and the dependent signal issue. IACR Trans. Cryptogr.
Hardw. Embed. Syst. 2019, 2019, 318–345. [CrossRef]
29. Zhou, X.; Qiao, K.; Ou, C. Leakage Detection with Kolmogorov-Smirnov Test. Cryptology ePrint Archive, Paper 2019/1478.
Available online: https://eprint.iacr.org/2019/1478 (accessed on 14 August 2023).
30. Yang, W.; Jia, A. Side-channel leakage detection with one-way analysis of variance. Secur. Commun. Netw. 2021, 2021, 6614702.
[CrossRef]
31. Azouaoui, M.; Bellizia, D.; Buhan, I.; Debande, N.; Duval, S.; Giraud, C.; Jaulmes, É.; Koeune, F.; Oswald, E.; Standaert, F.X.; et al.
A Systematic Appraisal of Side Channel Evaluation Strategies? In Proceedings of the Security Standardisation Research: 2020
International Conference on Research in Security Standardisation, SSR 2020, London, UK, 30 November–1 December 2020;
Springer: Berlin/Heidelberg, Germany, 2020; pp. 46–66.
32. Bronchain, O. Worst-Case Side-Channel Security: From Evaluation of Countermeasures to New Designs. Ph.D. Thesis, Catholic
University of Louvain, Louvain-la-Neuve, Belgium, 2022.
33. Gao, S.; Oswald, E. A Novel Completeness Test and its Application to Side Channel Attacks and Simulators. In Proceedings of the
Annual International Conference on the Theory and Applications of Cryptographic Techniques, EUROCRYPT 2022: Advances in
Cryptology—EUROCRYPT 2022, Trondheim, Norway, 30 May–3 June 2022; pp. 254–283.
34. Ding, A.A.; Zhang, L.; Durvaux, F.; Standaert, F.X.; Fei, Y. Towards Sound and Optimal Leakage Detection Procedure. In
Proceedings of the Smart Card Research and Advanced Applications—16th International Conference, CARDIS 2017, Lugano,
Switzerland, 13–15 November 2017; Revised Selected Papers, Volume 10728 of Lecture Notes in Computer Science. Springer:
Berlin/Heidelberg, Germany, 2017; pp. 105–122.
35. Zhang, L.; Mu, D.; Hu, W.; Tai, Y. Machine-learning-based side-channel leakage detection in electronic system-level synthesis.
IEEE Netw. 2020, 34, 44–49. [CrossRef]
36. Moos, T.; Wegener, F.; Moradi, A. DL-LA: Deep Learning Leakage Assessment: A modern roadmap for SCA evaluations. IACR
Trans. Cryptogr. Hardw. Embed. Syst. 2021, 2021, 552–598. [CrossRef]
37. Whitnall, C.; Oswald, E. A Critical Analysis of ISO 17825 Testing Methods for the Mitigation of Non-Invasive Attack Classes
against Cryptographic Modules. In Proceedings of the International Conference on the Theory and Application of Cryptology
and Information Security, Kobe, Japan, 8–12 December 2019; Springer: Cham, Switzerland, 2019; pp. 256–284.
38. Chari, S.; Raoj, R.; Rohatgi, P. Template Attacks. In Proceedings of the Lecture Notes in Computer Science: Volume 2523
Cryptographic Hardware and Embedded Systems-CHES 2002, 4th International Workshop, Redwood Shores, CA, USA, 13–15
August 2002; Revised Papers. Springer: Berlin/Heidelberg, Germany, 2002; pp. 13–28.
39. Rechberger, C.; Oswald, E. Practical Template Attacks. In Proceedings of the 5th International Workshop, WISA 2004, Jeju Island,
Republic of Korea, 23–25 August 2004; Springer: Berlin/Heidelberg, Germany, 2004; pp. 440–456.
40. Choudary, O.; Kuhn, M.G. Efficient Template Attacks. In Proceedings of the Lecture Notes in Computer Science: Volume 8419, Smart Card Research and Advanced Applications, 12th International Conference, CARDIS 2013, Berlin, Germany, 27–29 November 2013; Springer: Berlin/Heidelberg, Germany, 2013; pp. 253–270.
41. Cagli, E.; Dumas, C.; Prouff, E. Convolutional Neural Networks with Data Augmentation against Jitter-Based Countermeasures: Profiling Attacks without Preprocessing. In Proceedings of the Lecture Notes in Computer Science: Volume 10529, Cryptographic Hardware and Embedded Systems-CHES 2017, 19th International Conference, Taipei, Taiwan, 25–28 September 2017; Springer: Berlin/Heidelberg, Germany, 2017; pp. 45–68.
42. Kim, J.; Picek, S.; Heuser, A.; Bhasin, S.; Hanjalic, A. Make some noise. unleashing the power of convolutional neural networks
for profiled side-channel analysis. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2019, 2019, 148–179. [CrossRef]
43. Picek, S.; Heuser, A.; Jovic, A.; Bhasin, S.; Regazzoni, F. The curse of class imbalance and conflicting metrics with machine learning
for side-channel evaluations. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2019, 2019, 209–237. [CrossRef]
44. Danger, J.L.; Duc, G.; Guilley, S.; Sauvage, L. Education and Open Benchmarking on Side-Channel Analysis with the DPA
Contests. In Non-Invasive Attack Testing Workshop; NIST: Gaithersburg, MD, USA, 2011.
45. Standaert, F.X.; Gierlichs, B.; Verbauwhede, I. Partition vs. Comparison Side Channel Distinguishers: An Empirical Evaluation of
Statistical Tests for Univariate Side-Channel Attacks against Two Unprotected CMOS Devices. In Proceedings of the International
Conference on Information Security and Cryptology, ICISC 2008, Seoul, Republic of Korea, 3–5 December 2008; Springer:
Berlin/Heidelberg, Germany, 2009; pp. 253–267.
46. Whitnall, C.; Oswald, E. A Cautionary Note Regarding the Usage of Leakage Detection Tests in Security Evaluation. In
Proceedings of the International Conference on the Theory and Application of Cryptology and Information Security: Advances in
Cryptology—ASIACRYPT 2013, Bengaluru, India, 1–5 December 2013; pp. 486–505.

47. Coron, J.S.; Kocher, E.; Naccache, D. Statistics and Secret Leakage. In Proceedings of the Financial Cryptography: 4th International
Conference, FC 2000, Anguilla, British West Indies, 20–24 February 2000; Springer: Berlin/Heidelberg, Germany, 2001; pp. 157–173.
48. Standaert, F.X.; Malkin, T.G.; Yung, M. A Unified Framework for the Analysis of Side-Channel Key Recovery Attacks. In
Proceedings of the Annual International Conference on the Theory and Applications of Cryptographic Techniques, Cologne,
Germany, 26–30 April 2009; Springer: Berlin/Heidelberg, Germany, 2009; pp. 443–461.
49. Chatzikokolakis, K.; Chothia, T.; Guha, A. Statistical Measurement of Information Leakage. In Proceedings of the Interna-
tional Conference on Tools and Algorithms for the Construction and Analysis of Systems, ETAPS 2010, Paphos, Cyprus,
20–29 March 2010; Springer: Berlin/Heidelberg, Germany, 2010; pp. 390–404.
50. Chothia, T.; Guha, A. A Statistical Test for Information Leaks Using Continuous Mutual Information. In Proceedings of the 2011
IEEE 24th Computer Security Foundations Symposium, Cernay-la-Ville, France, 27–29 June 2011; pp. 177–190.
51. Gilbert Goodwill, B.J.; Jaffe, J.; Rohatgi, P. A Testing Methodology for Side-Channel Resistance Validation. In NIST Non-Invasive
Attack Testing Workshop; NIST: Gaithersburg, MD, USA, 2011; pp. 115–136.
52. Becker, G.T.; Cooper, J.; DeMulder, E.K.; Goodwill, G.; Jaffe, J.; Kenworthy, G.; Kouzminov, T.; Leiserson, A.J.; Marson, M.E.;
Rohatgi, P.; et al. Test Vector Leakage Assessment (TVLA) Methodology in Practice. In Proceedings of the International
Cryptographic Module Conference, Gaithersburg, MD, USA, 24–26 September 2013.
53. Bilgin, B.; Gierlichs, B.; Nikova, S.; Nikov, V.; Rijmen, V. Higher-order threshold implementations. In Proceedings of the Lecture
Notes in Computer Science, Kaoshiung, Taiwan, 7–11 December 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 326–343.
54. De Cnudde, T.; Bilgin, B.; Reparaz, O.; Nikov, V.; Nikova, S. Higher-order threshold implementation of the AES S-box. In
Proceedings of the Smart Card Research and Advanced Applications: 14th International Conference, CARDIS 2015, Bochum,
Germany, 4–6 November 2015; Springer: Berlin/Heidelberg, Germany, 2016; pp. 259–272.
55. Cohen, J. Statistical Power Analysis for the Behavioral Sciences; Routledge: Oxfordshire, UK, 1988.
56. Sawilowsky, S.S. New effect size rules of thumb. J. Mod. Appl. Stat. Methods 2009, 8, 597–599. [CrossRef]
57. Backes, M.; Dürmuth, M.; Gerling, S.; Pinkal, M.; Sporleder, C. Acoustic Side-Channel Attacks on Printers. In Proceedings of the 19th USENIX Security Symposium, Washington, DC, USA, 11–13 August 2010.
58. Wang, Y.; Tang, M.; Wang, P.; Liu, B.; Tian, R. The Levene test based-leakage assessment. Integration 2022, 87, 182–193. [CrossRef]
59. Wagner, M. 700+ Attacks Published on Smart Cards: The Need for a Systematic Counter Strategy. In Proceedings of the
Constructive Side-Channel Analysis and Secure Design—Third International Workshop, COSADE 2012, Darmstadt, Germany,
3–4 May 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 33–38.
60. Bache, F.; Plump, C.; Güneysu, T. Confident Leakage Assessment—A Side-Channel Evaluation Framework Based on Confidence
Intervals. In Proceedings of the Design, Automation & Test in Europe Conference & Exhibition (DATE), IEEE, Dresden, Germany,
19–23 March 2018; pp. 1117–1122.
61. Schneider, T.; Moradi, A. Leakage assessment methodology: Extended version. Cryptogr. Eng. 2016, 6, 85–99. [CrossRef]
62. Yaru, W.; Ming, T. Side channel leakage assessment with the Bartlett and multi-classes F-test. J. Commun. 2022, 42, 35–43.
63. Mangard, S. Hardware Countermeasures against DPA—A Statistical Analysis of Their Effectiveness. In Proceedings of the Topics
in Cryptology–CT-RSA 2004: The Cryptographers’ Track at the RSA Conference 2004, San Francisco, CA, USA, 23–27 February
2004; Springer: Berlin/Heidelberg, Germany, 2004; pp. 222–235.
64. Skorobogatov, S. Synchronization method for SCA and fault attacks. J. Cryptogr. Eng. 2011, 1, 71–77. [CrossRef]
65. Oswald, D.; Paar, C. Improving Side-Channel Analysis with Optimal Linear Transforms. In Proceedings of the Smart Card Research
and Advanced Applications: 11th International Conference, CARDIS 2012, Graz, Austria, 28–30 November 2012; pp. 219–233.
66. Merino Del Pozo, S.; Standaert, F.X. Blind source separation from single measurements using singular spectrum analysis. In
Proceedings of the Cryptographic Hardware and Embedded Systems--CHES 2015: 17th International Workshop, Saint-Malo,
France, 13–16 September 2015; pp. 42–43.
67. van Woudenberg, J.G.; Witteman, M.F.; Bakker, B. Improving Differential Power Analysis by Elastic Alignment. In Proceedings of
the Topics in Cryptology–CT-RSA 2011: The Cryptographers’ Track at the RSA Conference 2011, San Francisco, CA, USA, 14–18
February 2011; Springer: Berlin/Heidelberg, Germany, 2011; pp. 104–119.
68. Li, J.; Siegmund, D. Higher criticism: P-values and criticism. Ann. Stat. 2015, 43, 1323–1350. [CrossRef]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
