Daniel Gruss Slides Training PDF
1. Quick Start
3. CPU caches
4. Cache attacks
What to profile?

# ps -A | grep gedit
# cat /proc/pid/maps
00400000-00489000 r-xp 00000000 08:11 396356      /usr/bin/gedit
7f5a96991000-7f5a96a51000 r-xp 00000000 08:11 399365      /usr/lib/x86_64-linux-gnu/libgdk-3.so.0.1400.14
...

Profiling phase

cd ../profiling/generic_low_frequency_example
# put the threshold into spy.c (MIN_CACHE_MISS_CYCLES)
make
./spy
# start the targeted program
sleep 2; ./spy 200 400000-489000 -- 20000 -- -- /usr/bin/gedit
Exploitation phase
cd ../exploitation/generic
# put the threshold into spy.c (MIN_CACHE_MISS_CYCLES)
make
./spy file offset
Information leakage
Shared hardware
Timing differences
Calibration

Steps: time the target function by reading the timestamp counter before and after:

[...]
rdtsc
function()
rdtsc
[...]

Intel, How to Benchmark Code Execution Times on Intel IA-32 and IA-64 Instruction Set Architectures, White Paper, December 2010.
Step 3. Histogram

Find a threshold that is as high as possible, such that most cache hits are below it and no cache miss is below it.
[Animation: profiling shared addresses 0x0, 0x40, 0x80 — the cache starts empty; the attacker flushes the shared line, an event (key A) occurs, and the attacker reloads and measures.]
[Figure: cache organization — a memory address is split into an offset within the 2^b-byte cache line, a cache index selecting one of 2^n cache lines (direct-mapped) or, via a function f, one of 2^n cache sets (set-associative), and a tag that is compared (=?) against the stored tags to decide hit/miss and select the data.]
Caches today
cache-based keylogging
crypto key recovery
Cross-core attacks?
Access-driven attacks
Attacker monitors its own activity to find sets accessed by victim.
Flush+Reload: Gullasch et al. 2011; Yarom and Falkner 2014; Gruss, Spreitzer, et al. 2015
Prime+Probe: Percival 2005; Liu et al. 2015; Maurice, Neumann, et al. 2015
Flush+Reload

The attacker and the victim map the same shared memory into their address spaces; the shared line starts out cached.
1. The attacker flushes the shared line from the cache.
2. The victim loads the data (or not).
3. The attacker reloads the data and measures the access time: a fast reload means the victim accessed the line in between; a slow reload means it did not.
Flush+Reload
Pros: fine granularity (1 line)
Cons: restrictive
1. needs clflush instruction (not available e.g., in JS)
2. needs shared memory
Variants of Flush+Reload, e.g., Flush+Flush (Gruss, Maurice, et al. 2016), which times only the clflush instruction itself.
Prime+Probe

1. The attacker primes a cache set by loading its own data into every way of the set.
2. The victim loads its data, evicting some of the attacker's lines from the set.
3. The attacker probes the set by re-accessing its own data and timing it: a fast access means the victim did not touch the set; a slow access means the victim accessed it and evicted attacker lines.
Prime+Probe

Pros: less restrictive
1. no need for the clflush instruction (not available, e.g., in JS)
2. no need for shared memory

“LRU eviction”:
assume that the cache uses LRU replacement
accessing n addresses from the same cache set evicts an n-way set
eviction from the last level → eviction from the whole hierarchy (it’s inclusive!)
[Table: complex addressing — for 2, 4 and 8 cores, each slice-selection output bit o0, o1, o2 is the XOR (⊕) of a fixed subset of physical address bits 6–37.]
[Animation: LRU replacement — an 8-way cache set holds lines 2 5 8 1 7 6 3 4; eight loads of new addresses (9–16) mapping to the same set evict all eight old lines, one per load.]
[Animation: the same eight loads (9–16) without LRU replacement — the new lines land in varying ways and some old lines survive.]

no LRU replacement → only 75% success rate on Haswell
more accesses → higher success rate, but too slow
[Figure: optimized eviction strategy — access pattern over addresses a1–a9 and time. Fast and effective on Haswell; eviction rate >99.97%.]
0: sender does not access the set → low access time in the receiver
1: sender does access the set → high access time in the receiver

What about transmission errors?
Packets / frames

Organize data in packets / frames:
some data bits
a checksum
a sequence number
→ keep sender and receiver synchronized
→ check whether retransmission is necessary
Exploitation Phase
Monitor exploitable addresses
Profiling Phase
[Animation: the cache starts empty; for each shared address (0x0, 0x40, 0x80, …) the attacker flushes the line, an event is triggered (key A, B or C), and the attacker reloads and measures — cache hits reveal which address each event touches.]
[Figure: resulting cache template — keys g–z on one axis, addresses on the other; pressing n yields cache hits at address 0x7c800.]
Exploitation Phase
Example Attacks

Attack 1
Works on Linux, Windows and OS X
Detect event start and end with sub-microsecond accuracy
Derive text input from timings

[Figure: cache-hit phase and misses over time in cycles (·10^7), marking event start and end]
Attack 2: Keylogging

Linux with GTK: monitor keystrokes of specific keys
Detect groups of keys
Some keys distinct

[Figure: cache template for keys g–z over addresses 0x7c680–0x7cd00]

[Figure: AES T-table cache templates for k0 = 0x00 and k0 = 0x55 (transposed)]
Meltdown

Boot with nopti nokaslr

Meltdown setup
/proc/kallsyms
https://github.com/IAIK/meltdown/tree/master/libkdump
size_t paddr = libkdump_virt_to_phys((size_t)secret);

Improving Meltdown

Preventing Meltdown
Bibliography I
Gruss, Daniel, Clémentine Maurice, Klaus Wagner, and Stefan Mangard (2016).
“Flush+Flush: A Fast and Stealthy Cache Attack”. In: DIMVA’16.
Gruss, Daniel, Raphael Spreitzer, and Stefan Mangard (2015). “Cache Template
Attacks: Automating Attacks on Inclusive Last-Level Caches”. In: USENIX
Security Symposium.
Gullasch, David, Endre Bangerter, and Stephan Krenn (2011). “Cache Games –
Bringing Access-Based Cache Attacks on AES to Practice”. In: S&P’11.
Inci, Mehmet Sinan, Berk Gulmezoglu, Gorka Irazoqui, Thomas Eisenbarth, and
Berk Sunar (2015). “Seriously, get off my cloud! Cross-VM RSA Key Recovery in
a Public Cloud”. In: Cryptology ePrint Archive, Report 2015/898, pp. 1–15.