Daniel Gruss Slides Training PDF
1. Quick Start
3. CPU caches
4. Cache attacks
What to profile?

# ps -A | grep gedit
# cat /proc/pid/maps
00400000-00489000 r-xp 00000000 08:11 396356      /usr/bin/gedit
7f5a96991000-7f5a96a51000 r-xp 00000000 08:11 399365      /usr/lib/x86_64-linux-gnu/libgdk-3.so.0.1400.14
...

Profiling phase

cd ../profiling/generic_low_frequency_example
# put the threshold into spy.c (MIN_CACHE_MISS_CYCLES)
make
./spy
# start the targeted program
sleep 2; ./spy 200 400000-489000 -- 20000 -- -- /usr/bin/gedit
Exploitation phase
cd ../exploitation/generic
# put the threshold into spy.c (MIN_CACHE_MISS_CYCLES)
make
./spy file offset
Information leakage
Shared hardware
Timing differences
Calibration

Steps: time the target function by reading the timestamp counter before and after:

[...]
rdtsc
function()
rdtsc
[...]

Intel, How to Benchmark Code Execution Times on Intel IA-32 and IA-64 Instruction Set Architectures, White Paper, December 2010.
Step 3. Histogram

Find a threshold that is as high as possible, such that most cache hits are below it and no cache miss is below it.
[Animation: profiling shared addresses 0x0, 0x40, 0x80 — the cache starts empty; the attacker flushes the shared line, an event (key A) occurs, and the attacker reloads and measures.]
[Figure: cache organization — a memory address is split into an offset within the 2^b-byte cache line, a cache index selecting one of 2^n cache lines (direct-mapped) or, via a function f, one of 2^n cache sets (set-associative), and a tag that is compared (=?) against the stored tags to decide hit/miss and select the data.]
Caches today
cache-based keylogging
crypto key recovery
Cross-core attacks?
Access-driven attacks
Attacker monitors its own activity to find sets accessed by victim.
Flush+Reload: Gullasch et al. 2011; Yarom and Falkner 2014; Gruss, Spreitzer, et al. 2015
Prime+Probe: Percival 2005; Liu et al. 2015; Maurice, Neumann, et al. 2015
Flush+Reload

The attacker and the victim map the same shared memory into their address spaces; the shared line starts out cached.
1. The attacker flushes the shared line from the cache.
2. The victim loads the data (or not).
3. The attacker reloads the data and measures the access time: a fast reload means the victim accessed the line in between; a slow reload means it did not.
Flush+Reload
Pros: fine granularity (1 line)
Cons: restrictive
1. needs clflush instruction (not available e.g., in JS)
2. needs shared memory
Variants of Flush+Reload, e.g., Flush+Flush (Gruss, Maurice, et al. 2016), which times only the clflush instruction itself.
Prime+Probe

1. The attacker primes a cache set by loading its own data into every way of the set.
2. The victim loads its data, evicting some of the attacker's lines from the set.
3. The attacker probes the set by re-accessing its own data and timing it: a fast access means the victim did not touch the set; a slow access means the victim accessed it and evicted attacker lines.
Prime+Probe

Pros: less restrictive
1. no need for the clflush instruction (not available, e.g., in JS)
2. no need for shared memory

“LRU eviction”:
assume that the cache uses LRU replacement
accessing n addresses from the same cache set evicts an n-way set
eviction from the last level → eviction from the whole hierarchy (it’s inclusive!)
[Table: complex addressing — for 2, 4 and 8 cores, each slice-selection output bit o0, o1, o2 is the XOR (⊕) of a fixed subset of physical address bits 6–37.]
[Animation: LRU replacement — an 8-way cache set holds lines 2 5 8 1 7 6 3 4; eight loads of new addresses (9–16) mapping to the same set evict all eight old lines, one per load.]
[Animation: the same eight loads (9–16) without LRU replacement — the new lines land in varying ways and some old lines survive.]

no LRU replacement → only 75% success rate on Haswell
more accesses → higher success rate, but too slow
[Figure: optimized eviction strategy — access pattern over addresses a1–a9 and time. Fast and effective on Haswell; eviction rate >99.97%.]
0: sender does not access the set → low access time in the receiver
1: sender does access the set → high access time in the receiver

What about transmission errors?
Packets / frames

Organize data in packets / frames:
some data bits
a checksum
a sequence number
→ keep sender and receiver synchronized
→ check whether retransmission is necessary
Exploitation Phase
Monitor exploitable addresses
Profiling Phase
[Animation: the cache starts empty; for each shared address (0x0, 0x40, 0x80, …) the attacker flushes the line, an event is triggered (key A, B or C), and the attacker reloads and measures — cache hits reveal which address each event touches.]
[Figure: resulting cache template — keys g–z on one axis, addresses on the other; pressing n yields cache hits at address 0x7c800.]
Exploitation Phase
Example Attacks

Attack 1
Works on Linux, Windows and OS X
Detect event start and end with sub-microsecond accuracy
Derive text input from timings

[Figure: cache-hit phase and misses over time in cycles (·10^7), marking event start and end]
Attack 2: Keylogging

Linux with GTK: monitor keystrokes of specific keys
Detect groups of keys
Some keys distinct

[Figure: cache template for keys g–z over addresses 0x7c680–0x7cd00]

[Figure: AES T-table cache templates for k0 = 0x00 and k0 = 0x55 (transposed)]
Meltdown

Boot with nopti nokaslr

Meltdown setup
/proc/kallsyms
https://github.com/IAIK/meltdown/tree/master/libkdump
size_t paddr = libkdump_virt_to_phys((size_t)secret);

Improving Meltdown

Preventing Meltdown
Bibliography I
Gruss, Daniel, Clémentine Maurice, Klaus Wagner, and Stefan Mangard (2016).
“Flush+Flush: A Fast and Stealthy Cache Attack”. In: DIMVA’16.
Gruss, Daniel, Raphael Spreitzer, and Stefan Mangard (2015). “Cache Template
Attacks: Automating Attacks on Inclusive Last-Level Caches”. In: USENIX
Security Symposium.
Gullasch, David, Endre Bangerter, and Stephan Krenn (2011). “Cache Games –
Bringing Access-Based Cache Attacks on AES to Practice”. In: S&P’11.
Inci, Mehmet Sinan, Berk Gulmezoglu, Gorka Irazoqui, Thomas Eisenbarth, and
Berk Sunar (2015). “Seriously, get off my cloud! Cross-VM RSA Key Recovery in
a Public Cloud”. In: Cryptology ePrint Archive, Report 2015/898, pp. 1–15.