Key Updating For Leakage Resiliency With Application To AES Modes of Operation


IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 10, NO. 3, MARCH 2015

Mostafa Taha, Member, IEEE, and Patrick Schaumont, Senior Member, IEEE

Abstract: Side-channel analysis (SCA) exploits the
information leaked through unintentional outputs (e.g., power
consumption) to reveal the secret key of cryptographic modules.
The real threat of SCA lies in the ability to mount attacks
over small parts of the key and to aggregate information over
different encryptions. The threat of SCA can be thwarted by
changing the secret key at every run. Indeed, many contributions
in the domain of leakage resilient cryptography tried to achieve
this goal. However, the proposed solutions were computationally
intensive and were not designed to solve the problem of the
current cryptographic schemes. In this paper, we propose a
generic framework of lightweight key updating that can protect
the current cryptographic standards and evaluate the minimum
requirements for heuristic SCA-security. Then, we propose a
complete solution to protect the implementation of any standard
mode of Advanced Encryption Standard. Our solution maintains
the same level of SCA-security (and sometimes better) as the
state of the art, at a negligible area overhead while doubling
the throughput of the best previous work.
Index Terms: Hardware security (side channels).

I. INTRODUCTION

Side-channel analysis (SCA) is an implementation
attack that targets recovering the key of cryptographic
modules by monitoring side-channel outputs which include,
but are not limited to, electromagnetic radiation, execution
time, acoustic waves, photonic emissions and many more. The
real threat of SCA is that the adversary (Eve) can mount
attacks over small parts of the key and aggregate the
information leakage over different runs to recover the full
secret. SCA attacks are commonly based on three pillars, as
shown in Fig. 1:
1) Sensitive variables affect leakage traces.
2) Eve can calculate hypothetical sensitive variables.
3) She can combine information from different traces.
The design of countermeasures against SCA attacks is
a vast research field. Contributions in this regard fall into
three categories: Hiding, Masking and Leakage Resiliency.

Manuscript received April 26, 2014; revised August 18, 2014 and
October 21, 2014; accepted November 29, 2014. Date of publication
December 18, 2014; date of current version February 2, 2015. This work
was supported in part by the Virginia Tech-Middle East and North Africa
Program, Egypt, and in part by the National Science Foundation under Grant
1115839. The associate editor coordinating the review of this manuscript and
approving it for publication was Prof. Ozgur Sinanoglu.
The authors are with the Department of Electrical and Computer
Engineering, Virginia Tech, Blacksburg, VA 24061 USA (e-mail:
mtaha@vt.edu; schaum@vt.edu).
Color versions of one or more of the figures in this paper are available
online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIFS.2014.2383359

Fig. 1. Pillars of SCA attacks.

Our focus in this paper is to design a countermeasure for
hardware cryptographic modules at a small implementation
cost (area and performance).
Hiding depends on breaking the link between intermediate variables and the observable leakage by minimizing the
signal-to-noise ratio within the trace. This can be achieved
using balanced circuits and/or noise generators. Unfortunately,
cryptographic modules with hiding require more than double
the area (see [1]).
Masking depends on breaking Eve's ability to calculate
hypothetical intermediate variables, by splitting the useful
information into n shares based on random variable(s).
The random variables are generated on-the-fly and discarded
afterwards. Each share is processed independently. The final
outputs (of each share) are combined to retrieve the original
output. Similarly, cryptographic modules supported with
masking require more than double the area (see [2]).
Leakage resiliency depends on using a fresh key for every
execution of the cryptographic module and hence prevents
aggregating information about any secret. Leakage
resiliency is achieved by utilizing a key-updating mechanism
(aka re-keying or key-rolling). Although leakage resilient
primitives can be implemented using unprotected cores, the
overall performance is at least halved (see [3]).
Most contributions in leakage resiliency focused on
designing new cryptographic primitives [4]-[7]; however, the
proposed solutions were computationally intensive and did not
solve the problem of the current cryptographic schemes. Other
contributions focused on supporting a current primitive with
an SCA-secure key-updating scheme (as reviewed in Sec. IV).
The contribution in this paper follows the latter approach.
We propose a heuristically SCA-secure key-updating scheme
for the hardware implementation of AES running in any mode
of operation. We focus on achieving sound security at the
smallest implementation cost (area and performance).
To achieve this goal, we propose a generic framework for lightweight key-updating and evaluate the minimum requirements

1556-6013 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


for SCA-security. Then, we propose a solution that maintains
the same level of SCA-security (and sometimes better) as the
state of the art, at a negligible area overhead while doubling
the throughput of the best previous work.
The rest of the paper is organized as follows. Sec. II
discusses the considered threat model and introduces a brief
background about leakage resiliency. Sec. III highlights the
system overview of our solution, the generic framework for
key-updating and the key-updating minimal requirements.
Sec. IV discusses the proposed solution for AES and its
practical security analysis. Sec. V shows the implementation
details and the comparison with previous work. Sec. VI
concludes the paper.
II. BACKGROUND
The threat considered in this paper is that Eve recovers the
secret key of a hardware implementation of AES. Classical
cryptography assumes that Eve can choose the input plaintext and the output ciphertext. SCA further assumes that
Eve knows the underlying implementation and can capture
the instantaneous power consumption. In the domain of
leakage resiliency, it is also assumed that Eve can run
any polynomial-time function (called leakage function)
on the power consumption to recover some bits of the
secret key.
Leakage resiliency, being a protocol level protection, cannot
protect the underlying implementation against Simple Power
Attacks (SPA), where one execution of the leakage function
can recover the full secret. Hence, the typical assumption is
that the leakage function can recover a small part λ < |k| of
the secret key. This is a reasonable assumption in hardware
modules, where the high parallelism and the measurement
noise prevent any polynomial-time function from recovering
the full secret. Differential Power Analysis (DPA) is
represented by executing the leakage function over different
executions (exactly |k|/λ), until the full secret key is
revealed.
Leakage resiliency depends on changing the secret key
after every execution. The updating function should possess a
minimum set of requirements in order to prevent DPA attacks.
For example, if the updating mechanism is linear or simple
(e.g. a counter), Eve can build her hypothesis based on a key
guess that follows the same updating mechanism, removing
the effect of key-updating entirely. This attack is called the future-computation attack, because it is modeled as if the leakage
function can recover some bits of a key that will show up
in the future. The future-computation attack represents the main
threat addressed by all leakage resilient cryptography. The rest
of this section reviews the two categories of key-updating and
the notable contributions in each one. At the end of each
subsection, we discuss how our solution improves over the
current ones.
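To make the future-computation attack concrete, the following minimal sketch assumes a hypothetical linear update rule (XOR with a public run counter). The rule and the function names are illustrative assumptions, not a scheme from the reviewed literature. It shows that a guess of one master-key byte predicts the same byte of every later key, so traces from different runs can still be combined against a single 256-entry hypothesis space.

```python
def linear_update(key: bytes, run: int) -> bytes:
    """Hypothetical (insecure) update: XOR every key byte with a public counter."""
    return bytes(b ^ (run & 0xFF) for b in key)


def predicted_byte(guess: int, run: int) -> int:
    """Eve's hypothesis for byte 0 of the key used in a given run,
    derived only from her guess of byte 0 of the master key."""
    return guess ^ (run & 0xFF)


master = bytes(range(16, 32))  # arbitrary example master key
runs = range(1, 6)
keys = [linear_update(master, r) for r in runs]

# The correct guess tracks byte 0 of every updated key, so the updating
# mechanism does not stop Eve from aggregating leakage across runs.
correct_guess = master[0]
tracks_all_runs = all(
    predicted_byte(correct_guess, r) == k[0] for r, k in zip(runs, keys)
)
print(tracks_all_runs)
```

A non-linear update with full diffusion breaks this shortcut, because a single byte of a later key then depends on all bytes of the earlier key.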
The two categories of key-updating are stateless and stateful. One mechanism or the other is sufficient for a limited
set of applications. However, the two mechanisms are both
required for a complete and generic solution. For example,
Fig. 2 shows how the two mechanisms complement each other
for the application of data encryption. After exchanging a

Fig. 2. Stateless and stateful key-updating, as shown for the example of data
encryption.

public nonce, a stateless key-updating is used to generate a
pseudorandom secret state. Then, a stateful key-updating is
used to generate fresh running keys (k1 : k ).
A. Stateless Key-Updating
Stateless key-updating assumes that the two communicating
parties share only the secret key and a public variable (nonce)
i.e. there is no shared secret state between them. This updating
mechanism is required whenever there is no synchronization
between the two communicating parties e.g. during initialization of a secret channel. Stateless key-updating provides a
complete solution for applications with single cryptographic
execution e.g. challenge response protocols.
There is no provably secure construction that supports
stateless key-updating [3]. Intuitively speaking, the secret key
cannot be updated to a new key unless a public variable is
used (assuming no synchronization). Once a public variable
interacts with a secret key, SCA will be possible. Some
contributions tried to secure the stateless key-updating mechanism through hiding and masking [8], [9]. Although this
approach limits the implementation overhead exclusively to
the key-updating mechanism, allowing the use of unprotected
cryptographic cores, the overall overhead is still significant
(more than 100% [8]).
On the other hand, leakage resiliency can be used to
minimize the number of instances where a secret key is
being used. This can be achieved using the tree structure
(as proposed by Goldreich, Goldwasser and Micali, known
as GGM structure [10]), where the secret key is updated to a
new secret through a series of sequential randomization steps.
Each step involves processing one bit of a public nonce and is
responsible for randomizing the new key. Hence, after any
step, Eve will face a new secret with no way to combine
the extracted information. The GGM structure was proven
secure against SCA attacks by realizing a pseudorandom
function (PRF) with a fresh random variable per step [11].
Later, Medwed et al. improved the performance of the PRF
by processing 8 bits of the nonce per step, while supporting the
implementation of each step with key-dependent algorithmic
noise [12]. Although these PRFs are SCA-secure, they are
efficient only for developing new cryptographic primitives,
not for protecting the current modes of AES, where the final
output of the PRF is to be protected by a cryptographically-strong
pseudo-random permutation, PRP (AES in some mode
of operation).
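The one-bit-per-step tree walk described above can be sketched as follows. This is only an illustration of the structure: SHA-256 is used as a stand-in length-doubling generator (an assumption made here for brevity), and the names `prg` and `ggm_walk` are hypothetical. It is not the construction of [10], [11] or [12].

```python
import hashlib
from typing import List, Tuple


def prg(seed: bytes) -> Tuple[bytes, bytes]:
    """Stand-in length-doubling generator: 16-byte seed -> two 16-byte children."""
    digest = hashlib.sha256(seed).digest()
    return digest[:16], digest[16:]


def ggm_walk(key: bytes, nonce_bits: List[int]) -> bytes:
    """Walk the binary tree: each nonce bit selects the left or right child."""
    node = key
    for bit in nonce_bits:
        left, right = prg(node)
        node = right if bit else left
    return node


key = bytes(16)  # arbitrary example key
out_a = ggm_walk(key, [0, 1, 1, 0])
out_b = ggm_walk(key, [0, 1, 1, 1])
print(out_a.hex())
print(out_b.hex())
```

Each intermediate node is used with only one public bit, which is the property the tree structure relies on to limit what Eve can combine across traces.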
In contrast, our target is to protect the standard modes of
AES with minimal overhead. Hence, we designed a stateless
function that is only SCA-secured, but not a PRF. The entropy
of the master key is passed over as-is to the encryption keys.
Our view is that SCA-protection is not meant to correct the
entropy of the input key. This can be achieved more efficiently
by improving the cryptographic structure of the cipher. Hence,
our paper and [12] have different design goals, and hence
different security requirements. By removing the need for
extra randomness and keeping only SCA-security, our solution
is 3.2 times faster than the best previous solution for stateless
key-updating (that of [12]).
B. Stateful Key-Updating
Stateful key-updating assumes that the two communicating
parties share a common secret state (other than the key). They
both can update the secret key into a new key without requiring
any external variables. This scheme can provide a complete
solution for synchronized applications e.g. key-fobs.
The first provably secure construction for stateful
key-updating was the alternating structure [6], [11]. In this
structure, two different keys are used in an alternating fashion.
Hence, the computation of a future key depends not only on the
current execution but also on another value that is not currently
within the system. Unfortunately, this structure is inefficient,
as it requires doubling the key size. Also, it assumes that Eve
cannot combine the leakage from the two computing parts,
which is not a realistic assumption. Then, a direct structure
was proposed replacing the alternating structure by using a
fresh random variable at every key-update [5]. Unfortunately,
requiring a fresh random variable at every key-update is not
practical. Later, an efficient direct structure was proposed
using only one random variable under the assumption that
the leakage function is non-adaptive i.e. the leakage function
is fixed and selected prior to or independent of the random
variable [13].
In contrast, some contributions proposed heuristically secure
stateful functions that do not require any source of randomness [9], [14]. In these contributions, a full-featured one-way
function is used to update the secret key.
Although using a one-way key-updating function supports
forward security, SCA-protection is not meant to add forward
security. This can be achieved more efficiently by improving
the cryptographic structure of the underlying cipher. Hence, we
studied the requirements for only SCA-security and proposed
a solution that is 2 times faster than the best previous work
for stateful key-updating (that of [9] and [13]).
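As an illustration of stateful key-updating with a one-way function, the sketch below derives a chain of running keys from a shared secret state. SHA-256 truncated to 128 bits is an assumed stand-in for the one-way function, and the helper names are hypothetical; the exact functions used in [9] and [14] are not reproduced here.

```python
import hashlib
from typing import List


def one_way_update(key: bytes) -> bytes:
    """Stand-in one-way update: SHA-256 of the current key, truncated to 128 bits."""
    return hashlib.sha256(key).digest()[:16]


def running_keys(state: bytes, count: int) -> List[bytes]:
    """Derive `count` running keys from the shared secret state, with no external input."""
    keys = []
    current = state
    for _ in range(count):
        current = one_way_update(current)
        keys.append(current)
    return keys


shared_state = bytes(range(16))  # arbitrary example state known to both parties
device_a = running_keys(shared_state, 4)
device_b = running_keys(shared_state, 4)
print(device_a == device_b)
```

Both parties stay synchronized without exchanging any variable, and recovering a later key does not directly reveal earlier ones because the update is one-way.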
III. FRAMEWORK FOR KEY-UPDATING
The proposed solution at the system level works as follows.
We assume that an application on Device A needs to send
secure data to an application on Device B. Both devices share
a secret key, which we name master key. They can initiate the
channel by exchanging a public nonce, and send the secure
data using any cryptographic primitive (AES) running in a

Fig. 3. Our solution: A tree structure for stateless key-updating and a chain
of whitening functions for stateful key-updating.

mode of operation. Although the black-box security of these
modes is guaranteed by the cryptographic primitive, security is
not guaranteed if Eve can monitor Device A. Here, we target
protecting the master key against any SCA attack.
Device A starts with a stateless key-updating mechanism to
compute a pseudorandom secret state out of the master key
and the nonce. Then, the stateful key-updating is executed, to
compute running keys. Finally, the actual cryptographic mode
is called using the input data and the same previously used
nonce.
Our solution honors the tree structure for the stateless
key-updating. Each step of the tree involves processing a single
bit of the nonce through a lightweight whitening function
(Wt: whitening in the tree). The tree starts from the master
key, and ends with a pseudorandom secret state. For the
stateful key-updating, we use a chain of whitening functions
(Wc: whitening in the chain). Every execution of the whitening function generates a new running key. Our solution is
highlighted in Fig. 3.
A. Assumptions
During the design of the proposed solution, we follow these
assumptions:
1) Parallel Hardware: We assume that all the non-linear
elements (S-boxes) are processed in parallel. Hence, the
system power consumption is the aggregation of all the
leakages. This assumption is required to exploit the key-dependent algorithmic noise, which supports SCA security in
the stateless key-updating.
2) Only Current and Previous Iterations Leak: This is a
very logical assumption, as the power leakage is a physical
quantity. The module as a physical entity does not have any
clue about the next input message block. It is only Eve who
can link future computations to the current leakage using the
algorithm and the future inputs. Although a similar assumption
was used in [3] (Only Current Iterations Leak), we include
leakage of previous iterations. Indeed, the use of Hamming
Distance leakage function may reveal some information about
the previously processed iteration. This assumption does not
exclude future computation attack, but only breaks the direct
(and mysterious) link between future computations and current
leakage.


B. Key-Updating Requirements
For the highlighted tree structure to be lightweight and
secure against SCA, the Wt function is required to have the
following properties (inspired by [8]):
1) Non-linearity with balanced full-diffusion.
2) Resistance to Simple Power Analysis.
3) Resistance to 2-trace Differential Power Analysis.
4) Small area and performance overheads.
Full diffusion means that each bit of a new key depends on
every bit of an old key. Balanced full-diffusion means that
flipping any bit of an old key flips all the bits of a new key
with equal probability. Non-linearity means that one bit of
a new key depends on a non-linear function of the previous
key bits.
The Wc function should possess the same set of requirements, except resistance against 2-trace DPA attacks, which are
prevented by design.
C. Security Analysis
In this section, we show that the key-updating requirements
discussed in the previous section are necessary for a secure
leakage resiliency. The core idea of leakage resiliency is to
limit the use of any secret value to encrypt only one message
block. Thereafter, the secret value has to be updated to a new
secret. That said, leakage resiliency cannot prevent Eve from
attacking the leakage of encrypting one message block (using
means of Simple Power Analysis). However, leakage resilient
cryptographic schemes can prevent Eve from including more
than one leakage trace in any attack, i.e. prevent Differential
Power Analysis.
Also, the key-updating function cannot prevent Eve from
using the partially recovered information, using only one
leakage trace, to reduce the search space of the new secret
value. However, if the partially recovered information is small
(λ < |k|), the key-updating function can prevent Eve from
excluding parts of the new secret value, i.e. Eve cannot make
use of the partially recovered information unless she enumerates
all the search space of the new secret value. In other words,
leakage resiliency can prevent applying the divide-and-conquer
principle across key-updating.
Focusing on the role of key-updating in leakage
resilient cryptographic schemes, high-diffusion was proposed
as the only mathematical condition required for secure
key-updating [8], [15]. Here, we show that this condition is not
sufficient with a counter example, and propose new conditions.
In the next section, we propose a lightweight realization of a
secure key-updating function using the structure of Rijndael
algorithm.
Let the key-updating function be:

$$k'_i = k_i \oplus \bigoplus_{j=1}^{|k|} k_j, \quad \text{for } i = 1 : |k|$$

where $k$ is the old key, $k'$ is the new key, and $k_j$ is one bit of the
key. The function computes the binary xor between a bit from
the old key and the parity of the entire old key. This updating
function fulfills the high-diffusion requirement of [8] and [15]
in their definition that one bit of the new key depends on many
bits of the old key. In fact, this function possesses full-diffusion
in the definition that one bit of the new key depends on all the
bits of the old key. However, this function cannot prevent
DPA attacks.
Note that, if the parity of the old key is one, i.e. an odd number
of ones in its binary representation, the entire key will be
flipped and the parity of the new key will also be one (assuming
the bit-length of the key is even). If the parity is zero, the
new key will equal the old key and the parity will stay zero.
In this case, Eve will put two hypotheses for each key-guess:
one hypothesis with the key-guess flipped between traces, and
the other with a fixed key-guess. Here, Eve can
overcome this kind of leakage resiliency by doubling the number
of hypotheses, e.g. from 256 to 512 for guessing one byte of the
master key. We acknowledge that this counterexample does
not harm the practical instances proposed by [8] and [15].
We only highlight a limitation in the proposed conditions for
security.
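The counterexample can be checked with a short sketch. The keys are modeled as 128-bit integers and the helper names are hypothetical; the update rule itself is the parity-XOR function defined above.

```python
KEY_BITS = 128
FULL_MASK = (1 << KEY_BITS) - 1


def parity(key: int) -> int:
    """Parity of the key: 1 if the number of one-bits is odd, else 0."""
    return bin(key).count("1") & 1


def parity_update(key: int) -> int:
    """XOR every key bit with the parity of the whole old key."""
    return key ^ FULL_MASK if parity(key) else key


def byte_hypotheses(num_traces: int):
    """Eve's hypotheses for one key byte across consecutive traces."""
    for guess in range(256):
        # Even parity: the key (and so the byte) never changes.
        yield [guess] * num_traces
        # Odd parity: the byte is complemented at every update.
        yield [guess ^ (0xFF if t % 2 else 0x00) for t in range(num_traces)]


odd_key = 0b111   # three one-bits, parity 1
even_key = 0b11   # two one-bits, parity 0

flipped = parity_update(odd_key)
print(flipped == odd_key ^ FULL_MASK, parity(flipped))  # fully flipped, parity stays odd
print(parity_update(even_key) == even_key)              # unchanged
print(len(list(byte_hypotheses(4))))                     # number of hypotheses per byte
```

Although every new key bit depends on all old key bits, the update only ever produces the key or its complement, which is why 512 hypotheses per byte suffice.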
To prevent such attack we require that the old key is
processed by a non-linear function before generating a new
secret key. The non-linearity will ensure that Eve cannot make
a hypothesis over a small part of the secret key that affects
the sensitive variable of different traces. Needless to say,
Eve cannot make a hypothesis over the full secret key due to
computational complexity.
Also, in case of recovering a small number of bits of one key
(λ < |k|), the key-updating function should prevent Eve from
excluding any key hypothesis. Keeping in mind that, a key
hypothesis is typically put for a small part of the secret key
(one or two bytes), this requirement means that Eve cannot
map the recovered information from old key to a small part
of the new key. Ideally, one-bit of uncertainty in an old key
should generate two keys with an average Hamming Distance
of 50%. At a finer granularity, one-bit of uncertainty in an old
key should flip each bit of a new key with probability 50%.
We define a key-updating function that has such property as
a balanced function.
1) Extension to Stateless Key-Updating: At the start of
every session, the first execution of Wt will always process
the master key. As we discussed, leakage resiliency cannot
prevent Eve from exploiting the leakage of one trace. Hence,
we require that Wt be protected against simple power
analysis (SPA) attacks.
Also, key-updating protects cryptographic implementations
against DPA attacks only after being initialized to a secure
pseudorandom state, when no public inputs are further
used. However, while initializing new sessions (stateless
key-updating), Wt processes the master key and a public
nonce. Although the tree structure limits the effect of the
public nonce to only one bit at a time, Eve can still mount DPA
attack against the two cases of the public nonce-bit (0 and 1).
Hence, we also require that Wt be protected against DPA
attacks using two differential traces.
If these requirements are met, the tree structure will
guarantee that:
• Each nonce will generate a unique secret state.
If full-diffusion is achieved, different values of the
nonce (by definition) will result in different final
outputs.
• An SCA attack against any step is prevented. If each step is
protected against SPA attacks, the entire structure will be
protected by induction.
2) Extension to Stateful Key-Updating: Once the tree
structure has securely executed, the two communicating parties
will have a common pseudorandom secret state. The previously discussed requirements (non-linearity with balanced
full-diffusion) will prevent DPA attacks across key-updates.
Also, protection against 2-trace DPA attacks is not required
as there are no further inputs.
D. Discussions
1) A Lightweight Tree, Not a GGM: The GGM structure
(the original idea for the tree) is a method to realize secure
pseudorandom functions (PRFs) from sequential steps of
randomization e.g. block-cipher encryptions using plaintexts
of random values. Hence, the final output of GGM is required
to be pseudorandom. Most leakage resilient stateless key-updating schemes used the GGM to achieve protection against both
black-box attacks and side-channel attacks, where the final
output is observable by Eve and used as a key-stream [3], [12].
However, we use a lightweight realization of the tree to
achieve protection against exclusively side-channel attacks.
The final output of the tree cannot be used as a key-stream
for stream ciphers, but only as a key to the underlying
block cipher. As discussed, the output is still secured with a
cryptographically sound block-cipher. The black-box security
of our solution is maintained by the underlying mode.
This domain change allowed two modifications:
1) In our solution, the decision bit (n(i)) selects between
two fixed inputs (all 0s or all 1s) instead of selecting
between two random variables. In this way, we lose the
source of randomization, but we keep the whitening
function as a source of non-linear, balanced diffusion between key-bits, which is the main ingredient of
SCA protection. Here, protection against SCA attacks is
actually improved by allowing only two differential
traces (at n(i) = 0 and n(i) = 1).
2) The whitening function is not required to exhibit strong
black-box security but rather to only prevent future
computation attacks.
For these modifications, we call our structure a tree rather
than a GGM.
2) The Stateful Function is Not Forward Secure: Forward
security is a property of key-agreement protocols requiring
that, if the current key is recovered, all the previous sessions
are still secured. In contrast to previous work that used
one-way functions, our stateful key-updating function is bijective and invertible; hence, it does not add forward security to
the underlying cipher. We believe that forward security can
be achieved more efficiently by improving the cryptographic
structure, not only as a by-product of adding SCA-security.
3) Protocol Level Protection: Our solution is a protocol level protection against SCA, where the final output
depends on the key-updating mechanism. Hence, the two

TABLE I
PREVIOUS WORK

communicating parties have to follow the same key-updating
mechanism, even if one of them is physically secured (e.g.
server). This is not the case for hiding or masking, where the
final output is not affected by the protection mechanism.
IV. APPLICATION TO AES MODES OF OPERATION
AES modes of operation are algorithms used to extend
capabilities of AES to cover plaintext of arbitrary length.
Here, we propose solutions to protect the implementation
of any standard mode. The considered modes are Cipher
Block Chaining (CBC), Cipher Feedback (CFB), Output
Feedback (OFB), and Counter (CTR) modes for data
encryption and Counter with CBC-MAC (CCM),
Galois/Counter (GCM) and Offset Codebook (OCB) modes
for authenticated encryption [16], [17].
These modes assume that Alice and Bob are willing to
exchange some data messages, and that they have a shared
secret key K . For every new message, aka session, they
initiate the mode with a public nonce variable (also called
initialization vector or counter). For CBC and CFB, the nonce
needs to be unpredictable by Eve and unique. For the other
modes, the nonce only needs to be unique. The length of nonce
is fixed to 128 bits for CBC, CFB, OFB and CTR while it is
variable for CCM, GCM and OCB. The maximum number of
bytes to be encrypted in a single message is usually less than
the birthday boundary of AES ($2^{64}$).
Every mode has a different way of connecting the
input/output of the block cipher between different executions,
however, they all have in common that they use the same
secret key for all block cipher executions. Indeed, they employ
a fixed secret key, so that the implementation requires only
one execution of the key-schedule algorithm (see [18]). Direct
application of a key-updating scheme will require re-executing
the key-schedule at every encryption, which is not compatible
with the current implementations. Our key-update mechanism
is supported with an implementation trick to inject running
keys directly instead of round keys. Hence, our solution is
compatible with current implementations and does not require
re-executing the key-schedule.
A. Related Work
Previous contributions that used key-updating schemes with
one public variable are shown in Table I.
One of the early works that used key-updating is the work of
Kocher [19], which is entirely based on DES. Unfortunately,
the scheme has two drawbacks: it does not incorporate a nonce,

Fig. 4. High-level representation of the proposed scheme.

and every key update requires two executions of the underlying DES. Without using nonce, the running keys will be
generated in the same sequence in every session, which makes
it vulnerable to SCA over different sessions. Two recent works
proposed modular multiplication between the secret key and
the nonce as an easy-to-protect key-updating primitive [8], [9].
They used practical countermeasures (e.g., hiding and
masking) to protect the modular multiplication primitive.
The other contributions used the GGM construction, which is
the best practice in leakage resiliency. The randomization
function used at each step was either a full-featured
hash function (SHA-256) [14] or a full-featured block
cipher (AES) [12]. A recent contribution studied the minimum
SP network that can provide heuristic security against
SCA attacks [15].
Most key-updating contributions in the table focus only
on the stateless key-updating. Under the conditions of direct
construction and one public variable, we found only a few
contributions for stateful key-updating. Some contributions
achieve heuristically secure constructions using either hashing
functions or block ciphers [9], [14], [19], and one provides a provable
construction [13].
B. Proposed Solution
Fig. 4 shows a high-level representation of our solution.
The secret key is used as a master key. The master key
and the nonce (n) are processed with a leak-proof key-updating scheme. The key-updating scheme is composed of
two phases. The stateless key-updating protects the master key
against SCA and key-recovery attacks and generates a unique
pseudorandom secret state. The stateful key-updating starts
from the secret state and generates a session key and running
keys. The session key is used in the key-schedule algorithm to
generate round keys as shown in the figure. The running keys
(in groups of two) are used to directly replace the first and last
round keys of each encryption. In the figure, we did not show
the connection between nonce, plaintext and ciphertext for
a specific mode, as our scheme is compatible with any standard
mode.
Also, the Wt and Wc functions are defined as
follows.
1) Definition of Wt: Let $\mathrm{Encr}_k(p)$ denote the application
of the first AddRoundKey and two rounds of AES to the
plaintext p under the key k, i.e. a round-reduced version of

Fig. 5. Replacing the first and last round keys by fresh running keys.

AES. Let n denote a nonce, and n(i) denote bit i of the nonce.
Assuming K is the master key, the stateless key-updating starts
by initializing $K^0 = K$. Then, one step of the tree will be
defined as:

$$K^{i+1} = Wt_{n(i)}(K^i) := \begin{cases} \mathrm{Encr}_{1^{128}}(K^i), & \text{if } n(i) = 1 \\ \mathrm{Encr}_{0^{128}}(K^i), & \text{if } n(i) = 0 \end{cases}$$
i.e. Wt is the application of a round reduced version of AES
to the previous key under the key of all zeros or all ones
(depending on the bit value of the nonce). Note that, the
master key (and later keys) are used as the plaintext, and
a fixed input is used as the key. Also, capital letters K
denote the master key, or any key within the tree, while small
letters k denote running keys. Finally, the pseudorandom secret
state will be $s = K^{|n|-1}$, where |n| is the bit-length of the
nonce n.
2) Definition of Wc: The running key chain starts by
initializing the first running key to the secret state: $k_0 = s$.
Then, each new running key will be generated by applying the Wc function on the previous key. Wc will be a
whitening function realized by Encr with the key fixed to all
zeros:

$$k_{i+1} = Wc(k_i) := \mathrm{Encr}_{0^{128}}(k_i)$$
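The two definitions can be sketched in software as follows. This is a minimal, unprotected illustration and not the authors' hardware design. It assumes that Encr derives its round keys from the fixed key with the standard AES-128 key schedule, that nonce bits are supplied as a list in processing order, and that the tree consumes every supplied bit; all function names are hypothetical.

```python
def gmul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        a = ((a << 1) ^ 0x11B) if a & 0x80 else (a << 1)
        b >>= 1
    return r


def _sbox_entry(x: int) -> int:
    """AES S-box: multiplicative inverse followed by the affine transform."""
    inv = next((y for y in range(1, 256) if gmul(x, y) == 1), 0)
    rot = lambda v, n: ((v << n) | (v >> (8 - n))) & 0xFF
    return inv ^ rot(inv, 1) ^ rot(inv, 2) ^ rot(inv, 3) ^ rot(inv, 4) ^ 0x63


SBOX = [_sbox_entry(x) for x in range(256)]


def expand_key(key: bytes, rounds: int):
    """AES-128 key schedule, truncated to the round keys that are needed."""
    w = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 1
    for i in range(4, 4 * (rounds + 1)):
        t = list(w[i - 1])
        if i % 4 == 0:
            t = [SBOX[b] for b in t[1:] + t[:1]]  # RotWord, then SubWord
            t[0] ^= rcon
            rcon = gmul(rcon, 2)
        w.append([a ^ b for a, b in zip(w[i - 4], t)])
    return [sum(w[4 * r:4 * r + 4], []) for r in range(rounds + 1)]


def encr(key: bytes, block: bytes, rounds: int = 2) -> bytes:
    """First AddRoundKey followed by `rounds` full AES rounds."""
    rk = expand_key(key, rounds)
    s = [b ^ k for b, k in zip(block, rk[0])]
    for r in range(1, rounds + 1):
        s = [SBOX[b] for b in s]                                        # SubBytes
        s = [s[4 * ((c + row) % 4) + row] for c in range(4) for row in range(4)]  # ShiftRows
        mixed = []
        for c in range(4):                                              # MixColumns
            a = s[4 * c:4 * c + 4]
            mixed += [gmul(a[i], 2) ^ gmul(a[(i + 1) % 4], 3) ^ a[(i + 2) % 4] ^ a[(i + 3) % 4]
                      for i in range(4)]
        s = [b ^ k for b, k in zip(mixed, rk[r])]                       # AddRoundKey
    return bytes(s)


ZEROS, ONES = bytes(16), bytes([0xFF] * 16)


def wt(tree_key: bytes, nonce_bit: int) -> bytes:
    """One tree step: the previous key is the plaintext, the fixed value is the key."""
    return encr(ONES if nonce_bit else ZEROS, tree_key)


def wc(running_key: bytes) -> bytes:
    """One chain step: Encr under the all-zero key."""
    return encr(ZEROS, running_key)


def secret_state(master: bytes, nonce_bits) -> bytes:
    key = master
    for bit in nonce_bits:
        key = wt(key, bit)
    return key


master_key = bytes(range(16))                    # arbitrary example master key
state = secret_state(master_key, [1, 0, 1, 1])   # shortened example nonce
k1 = wc(state)
print(state.hex(), k1.hex())
```

Note that, under these definitions, Wc coincides with the Wt step taken for a nonce bit of 0; the two differ only in where they are used.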
3) Interaction With the Underlying Mode of AES: The typical
implementation of any standard AES mode of operation
starts by running the key-schedule algorithm over the secret
key to generate round keys. Then, the round keys are stored
to be used in all AES encryptions.
Here, we use the first running key (which is the secret
state) as a session key. Hence, the key-schedule will run
over k_0 to generate round keys. Then, instead of directly
using round keys in AES encryptions, each group of two
running keys (k_i and k_{i+1}, starting from i = 1) will replace
the first and last round keys of each encryption as shown
in Fig. 5.
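The bookkeeping of the running-key chain can be sketched as below. This is an illustration only: `toy_wc` is a stand-in built from truncated SHA-256 so that the block stays short, and is not the paper's Wc (which is the round-reduced AES under the all-zero key). The names `running_keys` and `key_material` are hypothetical, and the sketch assumes the reading that consecutive encryptions consume non-overlapping pairs (k_1, k_2), (k_3, k_4), and so on.

```python
import hashlib

def toy_wc(key):
    """Stand-in for Wc (NOT the paper's function): truncated SHA-256."""
    return hashlib.sha256(key).digest()[:16]

def running_keys(secret_state, wc=toy_wc):
    """Yield k0 = s, k1 = Wc(k0), k2 = Wc(k1), ..."""
    k = secret_state
    while True:
        yield k
        k = wc(k)

def key_material(secret_state, num_blocks, wc=toy_wc):
    """Return the session key k0 and one (first, last) round-key pair per block."""
    chain = running_keys(secret_state, wc)
    session_key = next(chain)      # k0 feeds the ordinary key schedule
    pairs = []
    for _ in range(num_blocks):
        first = next(chain)        # replaces the first round key
        last = next(chain)         # replaces the last round key
        pairs.append((first, last))
    return session_key, pairs
```

Passing a real implementation of Wc as the `wc` argument leaves the pairing logic unchanged.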
C. Security of the Practical Scheme
In this section, we will show how the proposed key-updating
functions fulfill the required properties in Sec. III-B.

Fig. 6. Probability density function of the Hamming Distance between the input and output in response to a bit-flip.
1) Non-Linearity With Balanced Full-Diffusion:
Non-linearity of the key-updating function is guaranteed by
the S-box layer of two AES rounds. Full diffusion is
expected because the mathematical structure of Rijndael, especially
the ShiftRows and MixColumns steps, ensures that each bit of
the input affects the entire state after two rounds [20]. In order
to confirm that the functions have a full, balanced diffusion, we
conducted a diffusion test.
The diffusion test measures how each bit of the input affects
the output bits. The test involves one million experiments
over Wt. In each experiment, we select a random key and
compute the output of the function Wt at either n(0) = 1
or n(0) = 0 (randomly). Then, we randomly flip one bit of
the key and re-compute the output. Finally, we compute and
record the Hamming Distance between the two outputs. Also,
for individual bit-positions, we accumulate the number of
instances when the bit-value is different between the outputs,
and divide the number by the total number of experiments.
The distribution of the Hamming Distance is shown
in Fig. 6. The average Hamming Distance is 50.16%, with
a 95% confidence interval of ±0.025%. The probability of
flipping individual bits of the output has a minimum value
of 50.03% and a maximum value of 50.33%. This indicates
that all the bits contributed equally to the overall
diffusion.
Note that Wc is essentially Wt with the nonce-bit input
set to n(0) = 0. Hence, the previous results apply equally to
the Wc function.
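The procedure of the diffusion test can be sketched as a small harness. This is a hedged illustration rather than the authors' test code: to keep the block self-contained, `toy_wt` is a truncated SHA-256 stand-in and not the round-reduced AES, so the figures it produces are not the ones reported above. The name `diffusion_test` and its parameters are assumptions.

```python
import hashlib
import random

def toy_wt(bit, key):
    """Stand-in for Wt (NOT round-reduced AES): truncated SHA-256 of bit || key."""
    return hashlib.sha256(bytes([bit]) + key).digest()[:16]

def diffusion_test(wt, trials, seed=0):
    """Flip one random key bit per trial and measure how the 128 output bits react."""
    rng = random.Random(seed)
    total_hd = 0
    flips = [0] * 128                      # per-output-bit flip counters
    for _ in range(trials):
        key = rng.getrandbits(128)
        bit = rng.getrandbits(1)           # random nonce bit n(0)
        flipped = key ^ (1 << rng.randrange(128))
        out_a = int.from_bytes(wt(bit, key.to_bytes(16, "big")), "big")
        out_b = int.from_bytes(wt(bit, flipped.to_bytes(16, "big")), "big")
        diff = out_a ^ out_b
        total_hd += bin(diff).count("1")   # Hamming Distance of this trial
        for pos in range(128):
            flips[pos] += (diff >> pos) & 1
    mean_hd = total_hd / (128 * trials)    # as a fraction of 128 bits
    per_bit = [f / trials for f in flips]
    return mean_hd, min(per_bit), max(per_bit)
```

Calling `diffusion_test(wt, 1_000_000)` with a real implementation of Wt would follow the experiment size described in the text; a balanced function should return three values close to 0.5.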
2) Resistant Against Side Channel Analysis: First of all,
although the master key is used in the data path and the
fixed input is used as the key (which removes the need for a
key schedule for the tree itself), this change is transparent to
SCA, as the two values are XORed with each other.
Under parallel hardware implementations, the system
power consumption of 16 parallel S-boxes at noiseless
measurement is:
\[
L_j = \sum_{i=1}^{16} l\bigl(S(p_j(i) \oplus k(i))\bigr),
\]
where j is the trace number, p_j(i) is the fixed input byte
at trace number j, and k(i) is the secret key byte at location
i ∈ [1 : 16]. Also, S is the S-box function, and l is the leakage
function.
In the following, we study the security of our solution under
the worst case attack, which is the template subset-sum attack.
In this attack, Eve tries to recover all the secret key bytes at
the same time, i.e. tries to find the combination of 16 key bytes
that satisfies the above equation. For the worst attack scenario,
we assume a perfect profiling phase where the leakage of every
output of the S-box has its distinct value, i.e. l(x) = x.
a) Resistant against SPA: Considering SPA attacks (using
only one equation), Eve's problem is to find a subset K
of 16 elements from the set [0 : 255], such that the previous
equality holds. This problem is actually the well-known
subset sum problem, which is NP-complete. Although many
algorithms were proposed to find a correct solution (e.g. the
LLL algorithm [21]), our problem is more complicated. Here,
Eve is required to find all the correct solutions (not only any
correct one), and test them all, including all the permutations,
searching for the correct secret key.
b) Resistant against 2-traces DPA: The problem of
testing all the correct solutions can be eased by considering
two differential traces with a DPA-like attack, i.e. at two
different inputs similar to our tree construction. Now, Eve will
only need to find a subset K that gives the correct result for both
traces. Here, we are interested in computing (or estimating)
the computational complexity of the solution. We define the
computational complexity of the correct solution as the number
of correct candidates K that Eve will have to test in order to
find the correct secret key.
Given that the original problem is NP-complete, we could
not find exact bounds for the computational complexity.
Hence, we tried to estimate it using simulation over a small
part of the key-space. Precisely, we did the following test.
First, we generated N random keys, as the key-space. Then,
we selected a secret key from the key-space and computed
the corresponding power consumption at the output of the
S-box (assuming that l(x) = x) using the two inputs (all
0s and all 1s). Finally, we counted the number of keys in
the key-space that could have the exact power consumption
using the same inputs. We did the previous experiment at
different sizes of the key-space N ∈ [100K : 50M], only
limited by the available memory at our workstation. The
average computational complexity (over 500 experiments for
each value of N) is shown in Fig. 7.
The 95% confidence intervals shown in the figure are
computed with the bootstrapping method, as the probability
density function of the average value did not match any
standard distribution. The figure shows an almost-linear
relationship. We acknowledge that we cannot extrapolate these
numbers to a key-space size of 2^128. However, the figure
suggests a very high computational complexity at the full
key-space.
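The counting experiment described above can be sketched at a much smaller scale. This is a minimal illustration under the stated assumptions of the text (noiseless traces, l(x) = x, fixed inputs of all 0s and all 1s), not the authors' simulation code; the names `leakage` and `count_matching_keys` are hypothetical, and "all 1s" is assumed to mean 0xFF in every byte.

```python
import random
from collections import Counter

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return r

def build_sbox():
    """AES S-box: field inverse (0 maps to 0) followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        s = inv
        for sh in range(1, 5):
            s ^= ((inv << sh) | (inv >> (8 - sh))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = build_sbox()

def leakage(key, fixed_byte):
    """Noiseless leakage with l(x) = x: the sum of the 16 S-box outputs."""
    return sum(SBOX[fixed_byte ^ b] for b in key)

def signature(key):
    """The pair of leakages Eve observes for the two fixed inputs."""
    return leakage(key, 0x00), leakage(key, 0xFF)

def count_matching_keys(n_keys, rng):
    """Draw a key-space of n_keys random keys, pick a secret key from it, and
    count the keys (including the secret) with the same two leakages."""
    keyspace = [rng.getrandbits(128).to_bytes(16, "big") for _ in range(n_keys)]
    counts = Counter(signature(k) for k in keyspace)
    secret = rng.choice(keyspace)
    return counts[signature(secret)]
```

Because both fixed inputs repeat the same byte in every position, any permutation of the key bytes yields the same `signature`, which is one reason the candidate set stays large.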
Our analysis shows that, following a noiseless perfectly
profiled attack, the parallel hardware is SCA-secured. Indeed,
one study showed that the computational complexity is
estimated to become one, assuming a random key, only after
128 equations (differential traces) [22], which is not allowed in
our solution by design.

Fig. 7. The average computational complexity of the solution of a 2-traces DPA attack under different key-space sizes.

TABLE II
COMPARISON BETWEEN THE IMPLEMENTATION OVERHEAD OF THE KEY-UPDATING SCHEMES. s IS THE SECURITY PARAMETER
3) Interaction With the Underlying Mode: Replacing two
round keys with two running keys does not affect the
black-box security of the underlying block cipher (AES), as
the running keys are pseudorandom and unknown. Moreover,
it allows the Electronic Codebook (ECB) mode to generate
indistinguishable ciphertexts.
D. Trading SCA-Security for Performance
It is commonly agreed that if Eve acquires a cryptographic
module and she has adequate resources, she can break the
module one way or the other (e.g. using invasive attacks).
The whole point of an SCA countermeasure is to exclude SCA
from being the weakest point in the chain of security. Indeed,
practical markets will not employ high-cost countermeasures
in low-end devices, e.g. metro tickets.
For this reason, we design our scheme to be flexible, with
the ability to trade some SCA security for better performance.
The reported performance overhead of the stateless
key-updating structures targets the best possible protection
against SCA. Here, our scheme uses one bit of the nonce at
each step of the tree for a maximum of two differential traces.
Indeed, limiting Eve to two differential traces yields
mathematically secure implementations; however, most practical
markets can trade some SCA security for better performance.
The performance of our structures can be improved by
using s bits of the nonce in each step of the tree. Instead
of repeating the nonce bit (0 or 1) over all the fixed input
bits (resulting in all 0s or all 1s), we repeat blocks of s bits
of the nonce. Here, s is a security parameter for trading
marginal SCA-security for performance. Using a security
level of s allows Eve to collect 2^s differential traces. So far,
we designed our scheme at the best SCA-security (s = 1).
However, lower security bounds can be adopted for low-cost
applications. For example, s = 8 was the security level
tolerated in the design of [12]. Also, s = 4 (using the PRESENT
S-box [24]) was the security level tolerated in the design
of [15]. The exact change in SCA-security can only be
measured with leakage quantification using a practical setup
as in [25]. We leave the exact measure of how s affects
SCA-security as future work, because any such results,
although time-consuming to obtain, will be applicable to only
one implementation.
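One way to read the construction with a security parameter s is sketched below: the nonce is cut into s-bit blocks, and each block is repeated until it fills the 128-bit fixed input of one tree step. This is an illustrative interpretation, not the authors' implementation; the function name `fixed_inputs`, the bit ordering, and the requirement that s divides both |n| and 128 are assumptions.

```python
def fixed_inputs(nonce_bits, s):
    """Return one 128-bit fixed input per tree step, built from s-bit nonce blocks."""
    if len(nonce_bits) % s or 128 % s:
        raise ValueError("this sketch assumes s divides both |n| and 128")
    inputs = []
    for start in range(0, len(nonce_bits), s):
        block = nonce_bits[start:start + s]
        bits = block * (128 // s)                    # repeat the block to 128 bits
        value = int("".join(map(str, bits)), 2)
        inputs.append(value.to_bytes(16, "big"))
    return inputs
```

With s = 1 each step sees either all 0s or all 1s, as in the baseline scheme. With s = 8 a 128-bit nonce needs only 16 steps, but each step can take 2^8 = 256 distinct fixed inputs, which is the number of differential traces Eve can collect per step.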
V. IMPLEMENTATION
To enable a round-reduced option in the hardware implementation, we add a mode input. If the mode input is set,
the output is ready after two rounds, otherwise the output is
ready after ten rounds. We implemented the two cores using
Synopsys Design Compiler at UMC 130nm technology, where
the difference was only two gates at 3.7 Gate Equivalent (GE).
All executions of Wt and Wc use only two keys (all 0s
or all 1s), hence the key-schedule algorithm will run only
two times to output and store a total of four round keys.
The Wt function requires two clock cycles, plus two cycles to
load the key and the fixed input (assuming that the fixed input
changes at every step). Therefore, the complete performance
overhead of the stateless key-updating is |n| × 4 clock cycles.
Assuming the use of a 128-bit nonce, which is a fixed value
for most modes, the performance overhead will be 512 clock
cycles. Also, function Wc requires two clock cycles, plus
one cycle to load the key (the input is fixed to all 0s).
Every encryption requires two running keys, hence the total
performance overhead for the stateful key-updating is 6 clock
cycles.
By changing the security parameter s, the performance
overhead is reduced by a factor of s. In this case, the entire
tree structure will consume (|n|/s) × 4 clock cycles. The
performance of Wc will not change as it does not accept any
input.
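The cycle counts quoted in this section follow from a short calculation, written out here for |n| = 128. The symbols C_stateless and C_stateful are introduced only for this illustration and do not appear in the original text.

```latex
\begin{align*}
C_{\text{stateless}}(s) &= \frac{|n|}{s} \times (2 + 2) \\
C_{\text{stateless}}(1) &= 128 \times 4 = 512 \text{ cycles per message} \\
C_{\text{stateless}}(8) &= \frac{128}{8} \times 4 = 64 \text{ cycles per message} \\
C_{\text{stateful}} &= 2 \times (2 + 1) = 6 \text{ cycles per encryption}
\end{align*}
```

The first term in each bracket is the two-round computation and the second is the load time: two cycles for Wt (key and fixed input) and one cycle for Wc (key only).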
A. Comparison
Fig. 8. The implementation overhead of the different techniques used for the stateless key-updating.

Fig. 9. The implementation overhead of the different techniques used for the stateful key-updating.

A comparison between the implementation overhead of the
proposed scheme and that of the previous work is shown
in Table II. In the table, we focus only on the encryption
pass, neglecting the effect of executing the key scheduling
algorithm. Here, we assume that the bit-length of the nonce
is 128 bits. Note that, we do not report any area overhead for
AES related schemes, because they utilize the same underlying
core. The results of [8] are taken at the first-order masked
implementation. The results of the minimum SP network
in [15] are taken from the implementation that is compatible
with AES (128-bit key and 128-bit nonce). For comparison
at small area, we use the currently smallest implementation
of AES in [2] and that of SHA-256 in [26]. For comparison
at fast computation, we use the AES core in [12] and the
SHA-256 core in [27].
Fig. 8 shows the implementation overhead for the stateless
key-updating schemes. The key-updating schemes that
use SHA-256 and AES-Small are not shown in the figure
because of their excessive implementation overhead. The solutions
in [8] and [15] used dedicated updating circuits to achieve
comparable performance overheads. The performance overhead
of our RR-AES structure at s = 8 is only 64 cycles,
which is 3.2 times faster than the best previous solution at no
area overhead (that of [12]).

Fig. 10. Relative throughput of the available re-keying schemes.
The implementation overhead of different techniques used
for the stateful key-updating is shown in Fig. 9. The scheme
that uses SHA-256-Fast is not shown because of its excessive area
overhead. Our solution is two times faster than the currently
best direct constructions of [9] and [13]. The figure also
shows a state-of-the-art masking scheme. The smallest threshold
implementation (to prevent leakage caused by glitches) of AES
requires 8,393 GE of area overhead and works at 266 cycles
per encryption [2]. The threshold implementation is shown on
the stateful key-updating figure, as the performance overhead
of stateless key-updating is a one-time overhead (once per
message), and can be trivialized at long message lengths.
Finally, we compare the relative throughput of the available
solutions. The relative throughput of a protected module is
the ratio of its throughput to the throughput of the
unprotected module. The throughput is the number of message
blocks that are processed per clock cycle. Due to the one-time
overhead of the stateless key-updating, the relative throughput
of protected modules increases with the message size.
Here, we assume that the unprotected AES core (one message
block per 12 cycles) is our reference. Also, we assume
using a serialized implementation for the re-keying schemes,
i.e. re-keying and encryption are done in separate clock
cycles. This assumption supports the no-area-overhead target
of our solutions. Fig. 10 shows the relative throughput of a
no-protection core, the AES-Fast solution from [14], the
combination of the fast solutions from [12] and [9], and our
recommended RR-AES solutions at s = 1 and s = 8. It is clear that
our solution at s = 8 has the highest throughput of all
protected solutions. Also, our solution at s = 1 achieves higher
throughput than the previous best solution (the combination of
[9] and [12]) after 52 message blocks. This means that, for
messages longer than 832 bytes, our RR-AES solution with s = 1
achieves higher throughput and better security guarantees than
the best previous work.
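The relative-throughput comparison can be reproduced with a few lines of arithmetic. This is a sketch under the serialized model described above (a one-time stateless overhead per message plus a fixed stateful overhead per block). The numbers for the proposed scheme are taken from Sec. V. The numbers for the previous best combination are not quoted directly in this section, so they are inferred here from the stated ratios (3.2 × 64 ≈ 205 cycles one-time, 2 × 6 = 12 cycles per block) and should be treated as assumptions. All names are illustrative.

```python
def relative_throughput(blocks, one_time, per_block, base_cycles=12):
    """Throughput of a protected core relative to an unprotected 12-cycle core."""
    protected_cycles = one_time + blocks * (base_cycles + per_block)
    return blocks * base_cycles / protected_cycles

def crossover(a, b, limit=10_000):
    """Smallest message length (in blocks) at which scheme a beats scheme b."""
    for m in range(1, limit + 1):
        if relative_throughput(m, *a) > relative_throughput(m, *b):
            return m
    return None

# (one-time stateless overhead, per-block stateful overhead), in clock cycles
OURS_S1 = (512, 6)          # from Sec. V, s = 1
OURS_S8 = (64, 6)           # from Sec. V, s = 8
PREVIOUS_BEST = (205, 12)   # ASSUMPTION: inferred from the 3.2x and 2x ratios
```

Under these assumptions `crossover(OURS_S1, PREVIOUS_BEST)` evaluates to 52 blocks, i.e. 52 × 16 = 832 bytes, which agrees with the figure quoted in the text.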


VI. CONCLUSION
In this paper, we proposed a lightweight key-updating
framework for efficient leakage resiliency. We proposed the
minimum requirements for heuristically secure structures.
We proposed a complete solution to protect the implementation
of any AES mode of operation. Our solution utilized
two rounds of the underlying AES itself, achieving negligible
area overhead and very small performance overhead.
REFERENCES
[1] K. Tiri et al., "Prototype IC with WDDL and differential routing – DPA
resistance assessment," in Cryptographic Hardware and Embedded
Systems. Berlin, Germany: Springer-Verlag, 2005, pp. 354–365.
[2] A. Moradi, A. Poschmann, S. Ling, C. Paar, and H. Wang, "Pushing
the limits: A very compact and a threshold implementation of AES,"
in Advances in Cryptology. Berlin, Germany: Springer-Verlag, 2011,
pp. 69–88.
[3] F.-X. Standaert, O. Pereira, Y. Yu, J.-J. Quisquater, M. Yung, and
E. Oswald, "Leakage resilient cryptography in practice," in Towards
Hardware-Intrinsic Security. Berlin, Germany: Springer-Verlag, 2010,
pp. 99–134.
[4] Y. Dodis and K. Pietrzak, "Leakage-resilient pseudorandom functions
and side-channel attacks on Feistel networks," in Proc. 30th CRYPTO,
2010, pp. 21–40.
[5] S. Faust, K. Pietrzak, and J. Schipper, "Practical leakage-resilient
symmetric cryptography," in Cryptographic Hardware and Embedded
Systems. Berlin, Germany: Springer-Verlag, 2012, pp. 213–232.
[6] S. Dziembowski and K. Pietrzak, "Leakage-resilient cryptography," in
Proc. IEEE 49th Annu. IEEE Symp. Found. Comput. Sci. (FOCS),
Oct. 2008, pp. 293–302.
[7] D. Martin, E. Oswald, and M. Stam, "A leakage resilient MAC,"
Dept. Comput. Sci., Univ. Bristol, Bristol, U.K., Tech. Rep. 2013/292,
2013. [Online]. Available: http://eprint.iacr.org/
[8] M. Medwed, F.-X. Standaert, J. Großschädl, and F. Regazzoni, "Fresh
re-keying: Security against side-channel and fault attacks for low-cost
devices," in Progress in Cryptology. Berlin, Germany: Springer-Verlag,
2010, pp. 279–296.
[9] B. Gammel, W. Fischer, and S. Mangard, "Generating a session key
for authentication and secure data transfer," U.S. Patent 20 100 316 217,
Dec. 16, 2010.
[10] O. Goldreich, S. Goldwasser, and S. Micali, "How to construct random
functions," J. ACM, vol. 33, no. 4, pp. 792–807, Oct. 1986.
[11] K. Pietrzak, "A leakage-resilient mode of operation," in Advances in
Cryptology. Berlin, Germany: Springer-Verlag, 2009, pp. 462–482.
[12] M. Medwed, F.-X. Standaert, and A. Joux, "Towards super-exponential
side-channel security with efficient leakage-resilient PRFs,"
in Cryptographic Hardware and Embedded Systems. Berlin, Germany:
Springer-Verlag, 2012, pp. 193–212.
[13] Y. Yu and F.-X. Standaert, "Practical leakage-resilient pseudorandom
objects with minimum public randomness," in Topics in Cryptology.
Berlin, Germany: Springer-Verlag, 2013, pp. 223–238.
[14] P. Kocher, "Complexity and the challenges of securing SoCs," in
Proc. 48th ACM/EDAC/IEEE Design Autom. Conf. (DAC), Jun. 2011,
pp. 328–331.
[15] S. Belaïd et al., "Towards fresh re-keying with leakage-resilient PRFs:
Cipher design principles and analysis," J. Cryptograph. Eng., vol. 4,
no. 3, pp. 157–171, Sep. 2014.
[16] M. Dworkin, "NIST special publication 800-38A, recommendation for
block cipher modes of operation: Methods and techniques."
[17] Information Technology, Security Techniques, Authenticated Encryption,
document ISO/IEC 19772:2009, Mar. 2013.
[18] M. Mozaffari-Kermani and A. Reyhani-Masoleh, "Efficient and
high-performance parallel hardware architectures for the AES-GCM," IEEE
Trans. Comput., vol. 61, no. 8, pp. 1165–1178, Aug. 2012.
[19] P. C. Kocher, "Leak-resistant cryptographic indexed key update,"
U.S. Patent 6 539 092, Mar. 25, 2003.
[20] J. Daemen and V. Rijmen, The Design of Rijndael. Secaucus, NJ, USA:
Springer-Verlag, 2002.
[21] A. K. Lenstra, H. W. Lenstra, Jr., and L. Lovász, "Factoring polynomials
with rational coefficients," Math. Ann., vol. 261, no. 4, pp. 515–534,
Dec. 1982.
[22] O. L. Mangasarian and B. Recht, "Probability of unique integer solution
to a system of linear equations," Eur. J. Oper. Res., vol. 214, no. 1,
pp. 27–30, Oct. 2011.
[23] J. Blömer, J. Guajardo, and V. Krummel, "Provably secure masking of
AES," in Selected Areas in Cryptography, vol. 3357. Berlin, Germany:
Springer-Verlag, 2005, pp. 69–83.
[24] A. Bogdanov et al., "PRESENT: An ultra-lightweight block cipher,"
in Cryptographic Hardware and Embedded Systems, vol. 4727. Berlin,
Germany: Springer-Verlag, 2007, pp. 450–466.
[25] B. J. G. Goodwill, J. Jaffe, and P. Rohatgi, "A testing methodology for
side-channel resistance validation," in Proc. NIST Non-Invasive Attack
Testing Workshop, 2011.
[26] X. Cao and M. O'Neill, "Application-oriented SHA-256 hardware design
for low-cost RFID," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS),
May 2012, pp. 1412–1415.
[27] X. Guo et al., "ASIC implementations of five SHA-3 finalists," in
Proc. Design, Autom. Test Eur. Conf. Exhibit. (DATE), Mar. 2012,
pp. 1006–1011.
Mostafa Taha (S'12–M'14) is currently a
Post-Doctoral Fellow with the Department of
Electronics and Communication Engineering,
Worcester Polytechnic Institute, Worcester,
MA, USA. He received the B.E. and M.S. degrees
in electrical engineering from Assiut University,
Assiut, Egypt, in 2004 and 2008, respectively,
and the Ph.D. degree in computer engineering
from the Virginia Polytechnic Institute and State
University, Blacksburg, VA, USA, in 2014. His
research interests include hardware security and
implementation attacks. He served as an academic reviewer for several
conferences in this field, including CHES, COSADE, CARDIS, and HOST,
and several journals, including the IEEE TRANSACTIONS ON COMPUTER-AIDED
DESIGN, the IEEE TRANSACTIONS ON COMPUTERS, the IEEE
TRANSACTIONS ON VERY LARGE SCALE INTEGRATION SYSTEMS, and the
IACR Journal of Cryptographic Engineering. He is a member of the IEEE
and the International Association for Cryptologic Research.

Patrick Schaumont (SM'06) is currently an
Associate Professor of Computer Engineering
with the Virginia Polytechnic Institute and State
University, Blacksburg, VA, USA. He received the
Ph.D. degree in electrical engineering from the
University of California at Los Angeles,
Los Angeles, CA, USA, in 2004. His research
interests include cryptographic engineering and
its applications to embedded computing. He has
served on the Program Committee of international
conferences in this field, such as CHES, DATE,
DAC, IEEE, and HOST. He is an Associate Editor of several journals in
this field, including the IEEE TRANSACTIONS ON COMPUTERS, the IACR
Journal of Cryptographic Engineering, the ACM Transactions on Design
Automation of Electronic Systems, and the ACM Transactions on Embedded
Computing Systems. He is a senior member of the IEEE and the International
Association for Cryptologic Research.
