
Speculative Read-Write Locks
Shady Issa Tiago Lopes Paolo Romano

04/12/18
Layout
• Introduction and Background

• Hardware Read-Write Lock Elision

• Speculative Read-Write Locks

• Evaluation

• Conclusion

4/12/2018 2 Middleware 18
Introduction

• Why are read-write locks important

• or

• HTM and its issues

Hardware Lock Elision

• The main motivation behind HTM adoption in commodity processors

• Allows higher scalability even in legacy applications

Hardware Lock Elision
Thread 1        Thread 2
lock()
X=1
unlock()
                lock()
                Y=1
                unlock()

Hardware Lock Elision
Thread 1        Thread 2
BeginTx()       BeginTx()
X=1             Y=1
EndTx()         EndTx()

Hardware Lock Elision
Thread 1        Thread 2
BeginTx()       BeginTx()
X=1             X=1
EndTx()         EndTx()
ABORT: both transactions write X, so they conflict and cannot both commit
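
The difference between the two previous slides (writes to X and Y versus two writes to X) can be sketched as a set-overlap check. This is only an illustrative model: the function name `transactions_conflict` is hypothetical, and locations are tracked by variable name here, whereas real HTM tracks them at a coarser hardware granularity such as cache lines.

```python
def transactions_conflict(a_reads, a_writes, b_reads, b_writes):
    """Return True if two concurrent transactions touch a common location
    and at least one of the two accesses is a write."""
    a_hits_b = bool(a_writes & (b_reads | b_writes))
    b_hits_a = bool(b_writes & a_reads)
    return a_hits_b or b_hits_a


# Thread 1 writes X, Thread 2 writes Y: both can commit.
disjoint = transactions_conflict(set(), {"X"}, set(), {"Y"})

# Both threads write X: the transactions conflict.
same_location = transactions_conflict(set(), {"X"}, set(), {"X"})

# A reader of X also conflicts with a concurrent writer of X.
read_write = transactions_conflict({"X"}, set(), set(), {"X"})
```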

Hardware Lock Elision

• HTM implementations are best-effort by nature

• Transactions may never commit in hardware even in absence of contention:

• Capacity limitation / Timer interrupts

• A software fallback is always needed!!
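
The retry-then-fallback pattern behind the last bullet can be sketched as follows. This is a minimal simulation, not real HTM code: `Aborted`, `run_elided` and the `try_in_htm` hook are hypothetical names, and the hardware transaction is stood in for by a plain Python callable that may raise.

```python
import threading


class Aborted(Exception):
    """Stands in for a hardware abort (conflict, capacity, interrupt)."""


def run_elided(critical_section, try_in_htm, fallback_lock, max_retries=3):
    """Run the critical section speculatively, then fall back to the lock.

    try_in_htm is a hypothetical hook that runs the section as a
    transaction and raises Aborted if it cannot commit.
    """
    for _ in range(max_retries):
        try:
            return try_in_htm(critical_section), "htm"
        except Aborted:
            continue  # one unit of the retry budget is spent
    # Budget exhausted: take the real lock so the section always completes.
    with fallback_lock:
        return critical_section(), "fallback"


def always_aborts(section):
    # Models a section that never fits in hardware (e.g. capacity limits).
    raise Aborted()


def always_commits(section):
    return section()


lock = threading.Lock()
result_slow = run_elided(lambda: 42, always_aborts, lock)
result_fast = run_elided(lambda: 42, always_commits, lock)
```

A real implementation also has to make speculative executions observe the fallback lock, so that a thread acquiring it aborts concurrent transactions; that part is left out of this sketch.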

Hardware Read-Write Lock Elision

• Executes read-only critical sections without instrumentation

• Spares them from any HTM limitations

• Relies on Suspend/Resume feature to ensure correctness (only in POWER8 processor)

Pascal Felber, Shady Issa, Alexander Matveev, and Paolo Romano. Hardware read-write lock elision. EuroSys ’16
Speculative Read-Write Locks

• Executes read-only critical sections without any instrumentation

• Relies on plain Begin/EndTx API, which is universal to all HTM implementations

Speculative Read-Write Locks
Initially X=0
Thread 1              Thread 2
readlock()
                      BeginTx()
read X (returns 0)    X=1
                      EndTx()
read X (returns 1)
Inconsistent values

Speculative Read-Write Locks
Initially X=0
Thread 1              Thread 2
readlock()
                      BeginTx()
read X (returns 0)    X=1
                      EndTx(): abort if there are active readers
read X (returns 0)
HTM hides updates
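
The timeline above can be replayed with a toy model. This is a sketch under stated assumptions, not the SpRWL implementation: `SimulatedWriterTx` and the `reader_active` flag array are hypothetical names, the write buffer stands in for the isolation that HTM provides, and the commit-time check stands in for "abort if there are active readers".

```python
class SimulatedWriterTx:
    """Toy writer transaction: writes stay private until commit."""

    def __init__(self, memory, reader_active):
        self.memory = memory
        self.reader_active = reader_active  # one flag per reader thread
        self.write_buffer = {}

    def write(self, key, value):
        self.write_buffer[key] = value  # not yet visible to readers

    def end_tx(self):
        # Commit-time check: abort if any reader is in a critical section.
        if any(self.reader_active):
            self.write_buffer.clear()
            return False  # aborted, shared memory untouched
        self.memory.update(self.write_buffer)
        return True


memory = {"X": 0}
reader_active = [False, False]

reader_active[0] = True                        # Thread 1: readlock()
tx = SimulatedWriterTx(memory, reader_active)  # Thread 2: BeginTx()
first_read = memory["X"]                       # Thread 1: read X
tx.write("X", 1)                               # Thread 2: X=1
committed_with_reader = tx.end_tx()            # Thread 2: EndTx()
second_read = memory["X"]                      # Thread 1: read X again
reader_active[0] = False                       # Thread 1: unlock()

# With no active reader left, a retry of the writer can commit.
retry = SimulatedWriterTx(memory, reader_active)
retry.write("X", 1)
committed_after_reader = retry.end_tx()
```

The reader only sets and clears a flag and otherwise reads shared memory directly, which mirrors the claim that read-only critical sections run without instrumentation.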

Speculative Read-Write Locks

[Figure: three panels plotted against the number of threads (4, 8, 14, 28, 56): Throughput (10^5 Tx/s), Readers latency (10^5 cycles) and Writers latency (10^5 cycles). Series: TLE, NoSched, RWait, RSync, SpRWL.]
Number of threads

Scheduling Techniques
• Writers may starve in read-intensive workloads

• frequent activation of the fallback path

• SpRWL integrates two scheduling mechanisms:

• Reader Synchronization

• Writer Synchronization

• They both rely on processor timestamp counters

Reader Synchronization

• Readers try to not kill active writers

• Synchronize the readers' starting point

Reader Synchronization
Thread 1                                  Thread 2
                                          BeginTx()
                                          • flag itself as active
                                          • publish expected end time
readlock()
• inspects the status of other threads
• wait for the writer expected to complete last
• publish that writer

Reader Synchronization
Thread 1                    Thread 3
readlock()                  readlock()
waiting for Thread 2        • inspects the status of other threads
                            • wait with Thread 1 for Thread 2

Reader Synchronization
Thread 1        Thread 3        Thread 2
                                BeginTx()
readlock()
                readlock()
                                EndTx()
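
The reader-side steps from the last three slides can be sketched as below. This is an illustrative model only: `ThreadState`, its fields and `reader_pick_wait_target` are hypothetical names, timestamps are plain integers rather than processor timestamp counters, and the actual waiting loop is omitted. Thread ids 0, 1 and 2 stand for Thread 1, Thread 2 and Thread 3.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThreadState:
    is_active_writer: bool = False
    expected_end: int = 0              # published by a writer at BeginTx()
    waiting_for: Optional[int] = None  # writer id published by a waiting reader


def reader_pick_wait_target(states, me):
    """Return the id of the writer this reader should wait for, or None."""
    # Join a wait that another reader has already published.
    for tid, st in enumerate(states):
        if tid != me and st.waiting_for is not None:
            target = st.waiting_for
            if states[target].is_active_writer:
                states[me].waiting_for = target
                return target
    # Otherwise pick the active writer expected to complete last.
    writers = [(st.expected_end, tid) for tid, st in enumerate(states)
               if st.is_active_writer]
    if not writers:
        return None
    _, target = max(writers)
    states[me].waiting_for = target  # publish that writer
    return target


states = [ThreadState(), ThreadState(), ThreadState()]
states[1] = ThreadState(is_active_writer=True, expected_end=1000)  # Thread 2
thread1_target = reader_pick_wait_target(states, 0)  # Thread 1: readlock()
thread3_target = reader_pick_wait_target(states, 2)  # Thread 3: readlock()

# Two active writers: the reader waits for the one that ends last.
two_writers = [ThreadState(True, 500), ThreadState(True, 900), ThreadState()]
latest_target = reader_pick_wait_target(two_writers, 2)

# No active writer: the reader can start immediately.
idle_target = reader_pick_wait_target([ThreadState(), ThreadState()], 0)
```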

Writer Synchronization

• Avoid wasting retry budget for active readers

• Overlap writers with readers

Writer Synchronization
Thread 1                          Thread 2
readlock()
• publish expected end time       BeginTx()
                                  abort
                                  EndTx()
                                  BeginTx()
                                  abort
                                  EndTx()
unlock()

• overlap between readers and writers

• writers start checking for active readers here
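
One way to read "overlap writers with readers" is that a writer delays its start just enough for its commit to land after the last reader finishes. The sketch below works through that arithmetic. The function `writer_start_delay` and the exact rule are assumptions made for illustration; the slides only state that readers publish their expected end time.

```python
def writer_start_delay(now, writer_duration, reader_expected_ends):
    """Cycles a writer waits before BeginTx().

    Assumption: the writer wants its commit time, now + delay +
    writer_duration, to be no earlier than the latest published reader
    end time, so that it does not burn retries while readers are active.
    """
    if not reader_expected_ends:
        return 0
    latest_reader_end = max(reader_expected_ends)
    return max(0, latest_reader_end - (now + writer_duration))


# Readers end at 300 and 220, the writer needs 50 cycles, and it is now 100:
# starting at 250 lets the writer run alongside the readers and commit at 300.
delay_overlap = writer_start_delay(100, 50, [300, 220])

# The only reader ends at 120, before the writer could commit anyway.
delay_none = writer_start_delay(100, 50, [120])

# No active readers: start right away.
delay_idle = writer_start_delay(100, 50, [])
```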

Speculative Read-Write Locks

[Figure: three panels plotted against the number of threads (4, 8, 14, 28, 56): Throughput (10^5 Tx/s), Readers latency (10^5 cycles) and Writers latency (10^5 cycles). Series: TLE, NoSched, RWait, RSync, SpRWL.]

What was not discussed

• Optimisations of attempting readers in HTM first

• SNZI - although I show results later!!

Evaluation: TPC-C

• In-memory C++ port of TPC-C

• Workload with 35% read-only critical sections:

• Stock Level: 31%, Delivery: 4%, Order Status: 4%, Payment: 43%, New Order: 18%
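
The 35% figure follows from the mix above once the read-only TPC-C transactions are singled out. The short check below assumes, as in the TPC-C specification, that Stock Level and Order Status are the read-only transaction types; the variable names are illustrative.

```python
# Transaction mix from the slide, in percent of the workload.
mix = {
    "Stock Level": 31,
    "Delivery": 4,
    "Order Status": 4,
    "Payment": 43,
    "New Order": 18,
}

# Transaction types that only read the database.
read_only = {"Stock Level", "Order Status"}

total = sum(mix.values())
read_only_share = sum(pct for name, pct in mix.items() if name in read_only)
update_share = total - read_only_share
```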

TPC-C: Intel Broadwell

[Figure: three panels plotted against the number of threads (4, 8, 14, 28, 56): Throughput (10^4 Tx/s), Readers latency (10^6 cycles) and Writers latency (10^6 cycles). Series: TLE, RWL, BRLock, SpRWL, SNZI. Annotation: >3x.]

TPC-C: POWER8

[Figure: three panels plotted against the number of threads (2, 4, 8, 16, 32, 80): Throughput (10^4 Tx/s), Readers latency (10^6 cycles) and Writers latency (10^6 cycles). Series: TLE, RW-LE, RWL, BRLock, SpRWL, SNZI. Annotation: ~2x.]

Conclusions and Future Work

• Broadens the applicability of HTM:

• SpRWL provides an efficient and universal mechanism to elide read-write locks

• SpRWL achieves significant gains compared to both pessimistic and HTM-based solutions

• Self-tuning and integration of scalable counters

• Other scheduling and contention management mechanisms
