Professional Documents
Culture Documents
Improving First Order Differential Power Attacks Through Digital Signal Processing
Improving First Order Differential Power Attacks Through Digital Signal Processing
consumption model for the device and construct accordingly Our pre-processing methodology aims at selectively extract-
a selection function to classify the hypothesized consump- ing the useful information from the input traces, which we
tions; v) correlate the hypothetical consumptions predicted will use to correlate each filtered trace with the hypothetical
with the measured ones and select the hypothesis which fits power consumption estimated for each subkey value employ-
best the measured behavior as the correct key candidate. ing the Pearson’s coefficient as the correlation metric [4].
The complete workflow of the DPA methodology is depicted The proposed framework computes the frequency spectrum
in Figure 1. Indeed, the effectiveness of DPA attacks is at- (DFT) of the power traces, designs a passband digital fil-
tributable to the information leakage due to the switching ter (identifying which parts of the spectrum are carrying
activity triggered, at high level, by a known operation that the useful information), filters each trace multiplying its fre-
has an outcome dependent from the specific value of a por- quency spectrum by the spectrum of the designed filter, ap-
tion of the secret key. Since the device information leakage plies the inverse Fourier transform (IDFT) to get back a set
happens mostly in correspondence of the dynamic power dis- of time-domain signals, and applies Pearson’s analysis to the
sipation caused by signal transitions in the underlying cir- filtered power traces. The pre-processing discards the noise
cuits, the consumption hypotheses are usually considering which is not correlated with the switching activity through
the switching activity of the parts of the circuit handling proper signal processing, thus providing traces belonging to
the key values: i.e., functional units and registers [23]. The a simpler model which is both closer to the theoretical one,
usual choice in order to model the consumption in case a and easier to estimate during a possible template attack.
value is stored in a register, is to use the Hamming dis-
tance [4] between the previous and the new value stored by 3.1 Signal Processing of Power Traces
the latches. On the other hand, employing the Hamming
distance as a metric implies that the attacker either knows The frequency spectrum can be generated via a discrete
or is able to guess which values employed in the algorithm Fourier transform (DFT) of the trace samples, and the re-
are actually stored in latches and which are computed on the sulting values are usually presented as amplitude and phase,
fly: this in turn implies the knowledge of some either hard- both plotted against frequency. The amplitude values rep-
ware or software implementation details of the cryptosys- resent the magnitude of each harmonic component, while
tem. Another possibility is to derive the model through the the phase values represent its time shifting. The device in-
construction of a template consumption profile [27] through formation leakage happens mostly in correspondence of the
measuring the behavior of a device for all the key hypothe- dynamic power dissipation caused by signal transitions in
ses, and subsequently correlate it to the functioning of the the underlying circuits.
device under attack. This requires the attacker to be in pos- The most significant part of dynamic power dissipation in
sess of an exact copy of the device under attack where he can CMOS circuits [23, 28, 34] is proportional to capacitance,
arbitrarily choose the key employed in the encryption. This voltage swing and operating frequency of the considered re-
approach enables the attacker to obtain the best model of source. At physical level this dynamic power dissipation
the device, also taking into account the noise produced, al- is driven by the working clock frequency, which provides a
though it is required to collect a significant amount of traces synchronization signal for the start of the switching activi-
in order to properly model both the leakage and the noise ties of the logic gates. Thus, we expect that the harmonics
produced by the device. In the case under exam, we car- composing the working clock signal show a significant mag-
ried out the attack with both the Hamming weight and the nitude with respect to other frequency components. The
Hamming distance model, since we were in full knowledge information leakage is amplitude modulated on these har-
of the software implementation details and we had a clear monics, which act as carriers for the actual value of the
description of the hardware structure. Since the results em- dynamical power consumption of the circuit. In the case
ploying the Hamming distance model were nearly as efficient the circuit is driven by different clock trees with different
as the ones obtained with the Hamming weight, we chose to working frequencies (e.g., external buses are usually driven
present the ones which are obtainable by an attacker who with a clock splitter), multiple spikes in correspondence of
has no knowledge of the detailed information in our possess frequency bands centered around these working frequencies,
(see Section 4). are expected. Since in the time-domain the contribution of
all the harmonics is added together, it is reasonable to as-
sume that the eviction of signal components which are unre-
lated with the useful computation will yield a signal denser
in informative content related to the key manipulation, due
to the elimination of polluting noise.
cross compile toolchain based on GCC, and with the opti- DRAM and the time required for a full filtering campaign
mization level set to -O2 to achieve optimal performances. is approximately 10 minutes. The DPA attack has been
performed employing the Matlab scripts available from the
4.2 Workbench Description OpenSCA toolkit [7] and requires roughly one and a half
minute to perform an attack attempt with all the 10000
The employed oscilloscope is a LeCroy Waverunner Wave- traces.
Pro 7100A with a maximum sampling rate of 20 Gsamples/s
and a 1 GHz analog bandwidth at -3 dB. The employed
active differential probe is a LeCroy AP034 (1 GHz ana- 4.3 Experimental Results
log bandwidth at -3 dB, Common Mode Rejection Ratio
above 80 dB) connected to a 2 Ω shunt inserted on the In order to design the filter for the attack, we first computed
only power supply line available for the Cortex-M3 SoC. All the Fourier transform of a trace, shown in grey in Figure 3.
the measurements have been taken with a 5 Gs/s sampling Subsequently, we checked that the main peak appearing is
frequency in order to fully exploit the oscilloscope and the in correspondence with the core clock frequency of the CPU,
probe warranted bandpass regions and taking a reasonable as expected from the fact that the main consumption contri-
safety margin. The digital to analog converter in the oscil- bution comes from the computing core. After locating both
loscope discretizes the signal over 256 level and saves each the main clock and the significant harmonics generated by
sample to a single signed byte integer value, thus achiev- the clock splitter on the spectrum, we placed the passband
ing a typical 48.16 dB Signal-to-Quantization-Noise Ratio. windows centering them on the peaks and designing them
The measurement for each trace was obtained averaging over to have a 500 kHz width to compensate for environmental
64 executions of the encryption of the same plaintext and fluctuations in the signal, without including the noise from
stored directly on the oscilloscope mass storage for further the frequencies near the clock. The regions of the band-
retrieval. This was done in order to reduce the instrumental pass windows are depicted in black over the spectrum in
measurement noise, which, being well modelled by a random Figure 3. Even though there may be other periodic be-
variable following a Gaussian distribution with zero mean, haviors in the execution of a cryptographic primitive (for
will average out in different realizations. On the other hand, instance, due to the loop based structure of block ciphers),
averaging has no impact on the noise caused by either the the spectral components related to those loops may safely
intrinsic architectural structure of the chip or the imple- be discarded, since the source of the leakage (i.e., the gates
mentative strategies of the attacked algorithm. From now switching activities) is driven by the clock, and thus is not
on, we shall indicate with trace the result of averaging 64 influenced by such iterative structures. The unfiltered versus
A
sampled traces collected while encoding the same plaintext. filtered signal power ratio (GdB = 10 log Aunf iltered
f iltered
, where
The starting and ending points for the measurement were A denotes the sum of the modules of every value in the DFT
determined by a software risen trigger on one General Pur- squared) has an average value of GdB = 4.88 dB over the
pose Input/Output (GPIO) pin of the chip, asserting the traces. Figures 4(a) and 4(b) depict a sample trace respec-
pin before the algorithm started and deasserting it after the tively before and after the pre-processing through filtering.
end of the execution. We acquired a total of 10000 traces, The spikes present in Figure 4(a) are due to the effect of the
each one using 196 kB of disk space, resulting in a total disk auxiliary input-output circuit employed to trigger the start
occupation of 1.9 GB without the use of any compression of the acquisition on the oscilloscope. After applying the
techniques. The trace pre-processing phase has been done filter, both the effects of the trigger and the slow oscillation
with custom scripts run in Matlab-2009a on an Intel, Core, in the original signal are completely removed, together with
2 Quad Q6600 at 2.4 GHz endowed with 4 Gb of DDR2 the high frequency noise. The resulting trace is also free
Unfiltered Power Trace Raw traces
1
100
0.6
0
0.4
−50
0.2
−100
0
0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 5 0 2000 4000 6000 8000 10000
Time [Sample] x10 Number of traces
(a) (a)
Filtered Power Trace Filtered traces
1
100
0.6
0
0.4
−50
0.2
−100
0
0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 5 0 2000 4000 6000 8000 10000
Time [Sample] x10 Number of traces
(b) (b)
Figure 4: A sample trace pre (a), and post (b) fil- Figure 5: Correlation coefficients of the different
tering single-byte key hypotheses, correct key shown in
black: raw traces (a), filtered traces (b)
from any constant DC shift and thus does not need an addi-
tional average removal phase. The post processed trace (see ations w.r.t. the growth of the employed traces. By con-
Figure 4(b)) shows clearer features, and a regular behavior trast, in Figure 5(a) the estimated r value for the correct
which enhances the possibility of locating the time interval key hypothesis emerges using a higher number of traces (ap-
in which the encryption is computed through direct opti- proximately 6000) and does not show the same stationary
cal inspection and provides ground for a simpler temporal behavior as the one obtained through filtered traces.
alignment of the traces.
In order to further evaluate the soundness of the advan-
Figures 5(a) 5(b) depict the result of successfully running tage of applying our pre-processing technique, we analyze
the attack on the filtered and unfiltered traces in terms of the confidence intervals for the estimated values of the cor-
the maximum absolute value of the correlation coefficient for relation coefficient r, comparing the one of the correct key
each key hypothesis (correct one in black) with an increasing (depicted in dotted black) against the one of the best wrong
number of traces, regardless of the confidence level of the guess. In Figure 6 and Figure 7, the dotted grey line repre-
result obtained. The pictures report the result for attack on sents the wrong key hypothesis with the highest probability
the first byte of the key: the attacks on the remaining bytes of being mistaken as the right one, for each attack attempt.
show results within a negligible variation range (less than Figures 6(a) 6(b) depict the width of the confidence inter-
1%). For the sake of clarity, from now on, all the results will vals, with a confidence level of 80%, for the maximum es-
be presented on a single byte attack. timated correlation coefficients, while Figures 7(a) 7(b) are
depicting the same confidence intervals with γ =99.9%. The
It is evident from Figure 5(b) how, after the filtering, the confidence intervals in Figure 6(a) show that the attack is
correlation value related to the correct key hypothesis is set feasible with unfiltered traces, albeit with a thin confidence
apart from the wrong ones starting from a low number of margin. In fact, in the part where the correlation coeffi-
traces (roughly 300) and its value shows almost no fluctu- cient of the correct key dominates steadily the others (from
Raw traces: confidence level 80% Raw traces: confidence level 99.9%
1 1
Correct key Correct key
Best wrong hypothesis Best wrong hypothesis
0.8 0.8
0.6 0.6
0.4 0.4
0.2 0.2
0 0
0 2000 4000 6000 8000 10000 0 2000 4000 6000 8000 10000
Number of traces Number of traces
(a) (a)
Filtered traces: confidence level 80% Filtered traces: confidence level 99.9%
1 1
Correct key Correct key
Best wrong hypothesis Best wrong hypothesis
Peak Correlation Coefficient
0.6 0.6
0.4 0.4
0.2 0.2
0 0
0 2000 4000 6000 8000 10000 0 2000 4000 6000 8000 10000
Number of traces Number of traces
(b) (b)
Figure 6: Correlation analysis of unfiltered (a) Figure 7: Correlation analysis of unfiltered (a)
and filtered (b) traces, assuming a confidence level and filtered (b) traces, assuming a confidence level
γ =80% γ =99%