Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 227

Spin Locks and Contention

Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Focus so far: Correctness and Progress


Models
Accurate (we never lied to you) But idealized (so we forgot to mention a few things)

Protocols
Elegant Important But nave
Art of Multiprocessor Programming 2

New Focus: Performance


Models
More complicated (not the same as complex!) Still focus on principles (not soon obsolete)

Protocols
Elegant (in their fashion) Important (why else would we pay attention) And realistic (your mileage may vary)
Art of Multiprocessor Programming 3

Kinds of Architectures
SISD (Uniprocessor)
Single instruction stream Single data stream

SIMD (Vector)
Single instruction Multiple data

MIMD (Multiprocessors)
Multiple instruction Multiple data.
Art of Multiprocessor Programming 4

Kinds of Architectures
SISD (Uniprocessor)
Single instruction stream Single data stream

SIMD (Vector)
Single instruction Multiple data

Our space

MIMD (Multiprocessors)
Multiple instruction Multiple data.
(1) Art of Multiprocessor Programming 5

MIMD Architectures

memory
Shared Bus
Distributed

Memory Contention Communication Contention Communication Latency


Art of Multiprocessor Programming 6

Today: Revisit Mutual Exclusion


Think of performance, not just correctness and progress Begin to understand how performance depends on our software properly utilizing the multiprocessor machines hardware And get to know a collection of locking algorithms
Art of Multiprocessor Programming 7 (1)

What Should you do if you cant get a lock?


Keep trying
spin or busy-wait Good if delays are short

Give up the processor


Good if delays are long Always good on uniprocessor

Art of Multiprocessor Programming

8 (1)

What Should you do if you cant get a lock?


Keep trying
spin or busy-wait Good if delays are short

Give up the processor


Good if delays are long Always good on uniprocessor

our focus
Art of Multiprocessor Programming 9

Basic Spin-Lock

. .

CS spin lock critical section Resets lock upon exit

Art of Multiprocessor Programming

10

Basic Spin-Lock
lock introduces sequential bottleneck

. .

CS spin lock critical section Resets lock upon exit

Art of Multiprocessor Programming

11

Basic Spin-Lock
lock suffers from contention

. .

CS spin lock critical section Resets lock upon exit

Art of Multiprocessor Programming

12

Basic Spin-Lock
lock suffers from contention

. .

CS spin lock critical section Resets lock upon exit

Notice: these are distinct phenomena


Art of Multiprocessor Programming 13

Basic Spin-Lock
lock suffers from contention

. .

CS spin lock critical section Resets lock upon exit

Seq Bottleneck no parallelism


Art of Multiprocessor Programming 14

Basic Spin-Lock
lock suffers from contention

. .

CS spin lock critical section Resets lock upon exit

Contention ???
Art of Multiprocessor Programming 15

Review: Test-and-Set
Boolean value Test-and-set (TAS)
Swap true with current value Return value tells if prior value was true or false

Can reset just by writing false TAS aka getAndSet


Art of Multiprocessor Programming 16

Review: Test-and-Set
public class AtomicBoolean { boolean value;
public synchronized boolean getAndSet(boolean newValue) { boolean prior = value; value = newValue; return prior; }

Art of Multiprocessor Programming

17 (5)

Review: Test-and-Set
public class AtomicBoolean { boolean value;
public synchronized boolean getAndSet(boolean newValue) { boolean prior = value; value = newValue; return prior; } Package

java.util.concurrent.atomic

Art of Multiprocessor Programming

18

Review: Test-and-Set
public class AtomicBoolean { boolean value;
public synchronized boolean getAndSet(boolean newValue) { boolean prior = value; value = newValue; return prior; }

Swap old and new values


Art of Multiprocessor Programming 19

Review: Test-and-Set
AtomicBoolean lock = new AtomicBoolean(false) boolean prior = lock.getAndSet(true)

Art of Multiprocessor Programming

20

Review: Test-and-Set
AtomicBoolean lock = new AtomicBoolean(false) boolean prior = lock.getAndSet(true)

Swapping in true is called test-and-set or TAS

Art of Multiprocessor Programming

21 (5)

Test-and-Set Locks
Locking
Lock is free: value is false Lock is taken: value is true

Acquire lock by calling TAS


If result is false, you win If result is true, you lose

Release lock by writing false


Art of Multiprocessor Programming 22

Test-and-set Lock
class TASlock { AtomicBoolean state = new AtomicBoolean(false);
void lock() { while (state.getAndSet(true)) {} } void unlock() { state.set(false); }}
Art of Multiprocessor Programming 23

Test-and-set Lock
class TASlock { AtomicBoolean state = new AtomicBoolean(false);
void lock() { while (state.getAndSet(true)) {} } void unlock() { state.set(false); Lock state }}

is AtomicBoolean
24

Art of Multiprocessor Programming

Test-and-set Lock
class TASlock { AtomicBoolean state = new AtomicBoolean(false);
void lock() { while (state.getAndSet(true)) {} } void unlock() { state.set(false); Keep trying until }}

lock acquired
25

Art of Multiprocessor Programming

Test-and-set Lock
class TASlock { Release lock AtomicBoolean state = by resetting new AtomicBoolean(false); state to false
void lock() { while (state.getAndSet(true)) {} } void unlock() { state.set(false); }}
Art of Multiprocessor Programming 26

Space Complexity
TAS spin-lock has small footprint N thread spin-lock uses O(1) space As opposed to O(n) Peterson/Bakery How did we overcome the W(n) lower bound? We used a RMW operation
Art of Multiprocessor Programming 27

Performance
Experiment
n threads Increment shared counter 1 million times

How long should it take? How long does it take?

Art of Multiprocessor Programming

28

Graph
time
no speedup because of sequential bottleneck

ideal

threads
Art of Multiprocessor Programming 29

Mystery #1
TAS lock

time

Ideal

threads
Art of Multiprocessor Programming

What is going on?


30 (1)

Test-and-Test-and-Set Locks
Lurking stage
Wait until lock looks free Spin while read returns true (lock taken) As soon as lock looks available Read returns false (lock free) Call TAS to acquire lock If TAS loses, back to lurking
Art of Multiprocessor Programming 31

Pouncing state

Test-and-test-and-set Lock
class TTASlock { AtomicBoolean state = new AtomicBoolean(false);
void lock() { while (true) { while (state.get()) {} if (!state.getAndSet(true)) return; } }
Art of Multiprocessor Programming 32

Test-and-test-and-set Lock
class TTASlock { AtomicBoolean state = new AtomicBoolean(false);
void lock() { while (true) { while (state.get()) {} if (!state.getAndSet(true)) return; } }

Wait until lock looks free


Art of Multiprocessor Programming 33

Test-and-test-and-set Lock
class TTASlock { AtomicBoolean state = new AtomicBoolean(false);

void lock() { while (true) { while (state.get()) {} if (!state.getAndSet(true)) return; } }


Art of Multiprocessor Programming

Then try to acquire it

34

Mystery #2
TAS lock

time

TTAS lock Ideal

threads
Art of Multiprocessor Programming 35

Mystery
Both
TAS and TTAS Do the same thing (in our model)

Except that
TTAS performs much better than TAS Neither approaches ideal

Art of Multiprocessor Programming

36

Opinion
Our memory abstraction is broken TAS & TTAS methods
Are provably the same (in our model) Except they arent (in field tests)

Need a more detailed model

Art of Multiprocessor Programming

37

Bus-Based Architectures

cache

cache
Bus

cache

memory
Art of Multiprocessor Programming 38

Bus-Based Architectures
Random access memory (10s of cycles)
cache cache
Bus

cache

memory
Art of Multiprocessor Programming 39

Shared Bus Broadcast medium One broadcaster at a time Processors and memory all snoop
cache cache
Bus

Bus-Based Architectures

cache

memory
Art of Multiprocessor Programming 40

Per-Processor Caches Bus-Based Architectures Small Fast: 1 or 2 cycles Address & state information

cache

cache
Bus

cache

memory
Art of Multiprocessor Programming 41

Jargon Watch
Cache hit
I found what I wanted in my cache Good Thing

Art of Multiprocessor Programming

42

Jargon Watch
Cache hit
I found what I wanted in my cache Good Thing

Cache miss
I had to shlep all the way to memory for that data Bad Thing
Art of Multiprocessor Programming 43

Cave Canem
This model is still a simplification
But not in any essential way Illustrates basic principles

Will discuss complexities later

Art of Multiprocessor Programming

44

Processor Issues Load Request

cache

cache
Bus

cache

memory
Art of Multiprocessor Programming

data
45

Processor Issues Load Request


Gimme data

cache

cache
Bus

cache

memory
Art of Multiprocessor Programming

data
46

Memory Responds

cache
Got your data right here

cache
Bus

cache
Bus

memory
Art of Multiprocessor Programming

data
47

Processor Issues Load Request


Gimme data

data

cache
Bus

cache

memory
Art of Multiprocessor Programming

data
48

Processor Issues Load Request


Gimme data

data

cache
Bus

cache

memory
Art of Multiprocessor Programming

data
49

Processor Issues Load Request


I got data

data

cache
Bus

cache

memory
Art of Multiprocessor Programming

data
50

I got data

Other Processor Responds

data

cache
Bus

cache
Bus

memory
Art of Multiprocessor Programming

data
51

Other Processor Responds

data

cache
Bus

cache
Bus

memory
Art of Multiprocessor Programming

data
52

Modify Cached Data

data

data
Bus

cache

memory
Art of Multiprocessor Programming

data
53 (1)

Modify Cached Data

data

data data
Bus

cache

memory
Art of Multiprocessor Programming

data
54 (1)

Modify Cached Data

data

data
Bus

cache

memory
Art of Multiprocessor Programming

data
55

Modify Cached Data

data

data
Bus

cache

Whats up with the other copies?memory


Art of Multiprocessor Programming

data
56

Cache Coherence
We have lots of copies of data
Original copy in memory Cached copies at processors

Some processor modifies its own copy


What do we do with the others? How to avoid confusion?

Art of Multiprocessor Programming

57

Write-Back Caches
Accumulate changes in cache Write back when needed
Need the cache for something else Another processor wants it

On first modification
Invalidate other entries Requires non-trivial protocol
Art of Multiprocessor Programming 58

Write-Back Caches
Cache entry has three states
Invalid: contains raw seething bits Valid: I can read but I cant write Dirty: Data has been modified
Intercept other load requests Write back to memory before using cache

Art of Multiprocessor Programming

59

Invalidate

data

data
Bus

cache

memory
Art of Multiprocessor Programming

data
60

Invalidate

Mine, all mine!

data

data
Bus Bus

cache

memory
Art of Multiprocessor Programming

data
61

Invalidate
Uh,oh

data cache

data
Bus Bus

cache

memory
Art of Multiprocessor Programming

data
62

Invalidate
Other caches lose read permission

cache

data
Bus

cache

memory
Art of Multiprocessor Programming

data
63

Invalidate
Other caches lose read permission

cache

data
Bus

cache

This cache acquires write permission

memory

data

Art of Multiprocessor Programming

64

Invalidate
Memory provides data only if not present in any cache, so no need to change it now (expensive) data cache cache
Bus

memory
Art of Multiprocessor Programming

data
65 (2)

Another Processor Asks for Data

cache

data
Bus Bus

cache

memory
Art of Multiprocessor Programming

data
66 (2)

Owner Responds
Here it is!

cache

data
Bus Bus

cache

memory
Art of Multiprocessor Programming

data
67 (2)

End of the Day

data

data
Bus

cache

data memory Reading OK, no writing


Art of Multiprocessor Programming 68 (1)

Mutual Exclusion
What do we want to optimize?
Bus bandwidth used by spinning threads Release/Acquire latency Acquire latency for idle lock

Art of Multiprocessor Programming

69

Simple TASLock
TAS invalidates cache lines Spinners
Miss in cache Go to bus

Thread wants to release lock


delayed behind spinners

Art of Multiprocessor Programming

70

Test-and-test-and-set
Wait until lock looks free
Spin on local cache No bus use while lock busy

Problem: when lock is released


Invalidation storm

Art of Multiprocessor Programming

71

Local Spinning while Lock is Busy

busy

busy
Bus

busy

memory
Art of Multiprocessor Programming

busy
72

On Release

invalid

invalid
Bus

free

memory
Art of Multiprocessor Programming

free
73

Everyone misses, rereads

On Release

invalid miss

invalid miss
Bus

free

memory
Art of Multiprocessor Programming

free
74 (1)

Everyone tries TAS

On Release

TAS() invalid

TAS() invalid
Bus

free

memory
Art of Multiprocessor Programming

free
75 (1)

Problems
Everyone misses
Reads satisfied sequentially

Everyone does TAS


Invalidates others caches

Eventually quiesces after lock acquired


How long does this take?
Art of Multiprocessor Programming 76

Measuring Quiescence Time


X = time of ops that dont use the bus Y = time of ops that cause intensive bus traffic
P1 P2 Pn

In critical section, run ops X then ops Y. As long as Quiescence time is less than X, no drop in performance.

By gradually varying X, can determine the exact time to quiesce.


Art of Multiprocessor Programming 77

Quiescence Time
Increses linearly with the number of processors for bus architecture

time
threads
Art of Multiprocessor Programming

78

Mystery Explained
TAS lock

time

TTAS lock Ideal

Art of Multiprocessor Programming

Better than threads TAS but still not as good as ideal


79

Solution: Introduce Delay


If the lock looks free But I fail to get it There must be contention Better to back off than to collide again

time
r2d r1d d

spin lock
80

Art of Multiprocessor Programming

Dynamic Example: Exponential Backoff

time
4d 2d d

spin lock

If I fail to get lock wait random duration before retry Each subsequent failure doubles expected wait
Art of Multiprocessor Programming 81

Exponential Backoff Lock


public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}}
Art of Multiprocessor Programming 82

Exponential Backoff Lock


public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; Fix minimum delay }}}
Art of Multiprocessor Programming 83

Exponential Backoff Lock


public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; Wait until lock looks free }}}
Art of Multiprocessor Programming 84

Exponential Backoff Lock


public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; If we win, return }}}
Art of Multiprocessor Programming 85

Exponential Backoff Lock


public class Backoff implements lock { for{ random duration publicBack void off lock() int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}}
Art of Multiprocessor Programming 86

Exponential Backoff Lock


public class Backoff implements lock { Double delay, within reason public voidmax lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}}
Art of Multiprocessor Programming 87

Spin-Waiting Overhead
TTAS Lock

time

Backoff lock

threads
Art of Multiprocessor Programming 88

Backoff: Other Issues


Good
Easy to implement Beats TTAS lock

Bad
Must choose parameters carefully Not portable across platforms

Art of Multiprocessor Programming

89

Idea
Avoid useless invalidations
By keeping a queue of threads

Each thread
Notifies next in line Without bothering the others

Art of Multiprocessor Programming

90

Anderson Queue Lock


next

idle

flags

Art of Multiprocessor Programming

91

Anderson Queue Lock


next

acquiring getAndIncrement

flags

Art of Multiprocessor Programming

92

Anderson Queue Lock


next

acquiring getAndIncrement

flags

Art of Multiprocessor Programming

93

Anderson Queue Lock


next

acquired

Mine!
flags

Art of Multiprocessor Programming

94

Anderson Queue Lock


next

acquired

acquiring

flags

Art of Multiprocessor Programming

95

Anderson Queue Lock


next

acquired

acquiring

flags

getAndIncrement
F F F F F F F

Art of Multiprocessor Programming

96

Anderson Queue Lock


next

acquired

acquiring

flags

getAndIncrement
F F F F F F F

Art of Multiprocessor Programming

97

Anderson Queue Lock


next

acquired

acquiring

flags

Art of Multiprocessor Programming

98

Anderson Queue Lock


next

released

acquired

flags

Art of Multiprocessor Programming

99

Anderson Queue Lock


next

released

acquired

flags

Yow!

Art of Multiprocessor Programming

100

Anderson Queue Lock


class ALock implements Lock { boolean[] flags={true,false,,false}; AtomicInteger next = new AtomicInteger(0); ThreadLocal<Integer> mySlot;

Art of Multiprocessor Programming

101

Anderson Queue Lock


class ALock implements Lock { boolean[] flags={true,false,,false}; AtomicInteger next = new AtomicInteger(0); ThreadLocal<Integer> mySlot;

One flag per thread

Art of Multiprocessor Programming

102

Anderson Queue Lock


class ALock implements Lock { boolean[] flags={true,false,,false}; AtomicInteger next = new AtomicInteger(0); ThreadLocal<Integer> mySlot;

Next flag to use


Art of Multiprocessor Programming 103

Anderson Queue Lock


class ALock implements Lock { boolean[] flags={true,false,,false}; AtomicInteger next = new AtomicInteger(0); ThreadLocal<Integer> mySlot;

Thread-local variable
Art of Multiprocessor Programming 104

Anderson Queue Lock


public lock() { mySlot = next.getAndIncrement(); while (!flags[mySlot % n]) {}; flags[mySlot % n] = false; }
public unlock() { flags[(mySlot+1) % n] = true; }
Art of Multiprocessor Programming

105

Anderson Queue Lock


public lock() { mySlot = next.getAndIncrement(); while (!flags[mySlot % n]) {}; flags[mySlot % n] = false; }
public unlock() { flags[(mySlot+1) % n] = true; Take next }
Art of Multiprocessor Programming

slot
106

Anderson Queue Lock


public lock() { mySlot = next.getAndIncrement(); while (!flags[mySlot % n]) {}; flags[mySlot % n] = false; }
public unlock() { flags[(mySlot+1) % n] = true; Spin until told }
Art of Multiprocessor Programming

to go
107

Anderson Queue Lock


public lock() { myslot = next.getAndIncrement(); while (!flags[myslot % n]) {}; flags[myslot % n] = false; }
public unlock() { flags[(myslot+1) % n] = true; } Prepare slot for
Art of Multiprocessor Programming

re-use
108

Anderson Queue Lock


public lock() Tell { next thread to mySlot = next.getAndIncrement(); while (!flags[mySlot % n]) {}; flags[mySlot % n] = false; }
public unlock() { flags[(mySlot+1) % n] = true; }
Art of Multiprocessor Programming

go

109

Local Spinning
next

released

acquired

Spin on my bit

flags

Unfortunately many bits share cache line


Art of Multiprocessor Programming 110

False Sharing
next
Result: contention flags

released

acquired

gets cache invalidation on F F of store F account by threads it is not waiting for


111

Spin on my bit Spinning thread

Line Art 1 of Multiprocessor Programming Line 2

The Solution: Padding


next

released

acquired

Spin on my line

flags

Line Art 1 of Multiprocessor Programming Line 2

112

Performance
TTAS
Shorter handover than backoff queue Curve is practically flat Scalable performance

Art of Multiprocessor Programming

113

Anderson Queue Lock


Good
First truly scalable lock Simple, easy to implement Back to FIFO order (like Bakery)

Art of Multiprocessor Programming

114

Anderson Queue Lock


Bad
Space hog One bit per thread one cache line per thread
What if unknown number of threads? What if small number of actual contenders?

Art of Multiprocessor Programming

115

CLH Lock
FIFO order Small, constant-size overhead per thread

Art of Multiprocessor Programming

116

Initially
idle

tail
false
Art of Multiprocessor Programming 117

Initially
idle

tail
false
Art of Multiprocessor Programming

Queue tail
118

Initially
idle

Lock is free
tail
false
Art of Multiprocessor Programming 119

Initially
idle

tail
false
Art of Multiprocessor Programming 120

Purple Wants the Lock


acquiring

tail
false
Art of Multiprocessor Programming 121

Purple Wants the Lock


acquiring

tail
false

true
122

Art of Multiprocessor Programming

Purple Wants the Lock


acquiring

Swap
tail
false

true
123

Art of Multiprocessor Programming

Purple Has the Lock


acquired

tail
false

true
124

Art of Multiprocessor Programming

Red Wants the Lock


acquired acquiring

tail
false

true

true
125

Art of Multiprocessor Programming

Red Wants the Lock


acquired acquiring

Swap
tail
false

true

true
126

Art of Multiprocessor Programming

Red Wants the Lock


acquired acquiring

tail
false

true

true
127

Art of Multiprocessor Programming

Red Wants the Lock


acquired acquiring

tail
false

true

true
128

Art of Multiprocessor Programming

Red Wants the Lock


acquired acquiring

Implicit Linked list tail


false

true

true
129

Art of Multiprocessor Programming

Red Wants the Lock


acquired acquiring

tail
false

true

true
130

Art of Multiprocessor Programming

Red Wants the Lock


acquired acquiring

true

Actually, it spins on cached copy

tail
false

true

true
131

Art of Multiprocessor Programming

Purple Releases
release acquiring

false

Bingo!
true
132

tail
false

false

Art of Multiprocessor Programming

Purple Releases
released acquired

tail
true
Art of Multiprocessor Programming 133

Space Usage
Let
L = number of locks N = number of threads

ALock
O(LN)

CLH lock
O(L+N)
Art of Multiprocessor Programming 134

CLH Queue Lock


class Qnode { AtomicBoolean locked = new AtomicBoolean(true); }

Art of Multiprocessor Programming

135

CLH Queue Lock


class Qnode { AtomicBoolean locked = new AtomicBoolean(true); }

Not released yet

Art of Multiprocessor Programming

136

CLH Queue Lock


class CLHLock implements Lock { AtomicReference<Qnode> tail; ThreadLocal<Qnode> myNode = new Qnode(); public void lock() { Qnode pred = tail.getAndSet(myNode); while (pred.locked) {} }}
Art of Multiprocessor Programming 137

CLH Queue Lock


class CLHLock implements Lock { AtomicReference<Qnode> tail; ThreadLocal<Qnode> myNode = new Qnode(); public void lock() { Qnode pred = tail.getAndSet(myNode); while (pred.locked) {} }}
Art of Multiprocessor Programming

Queue tail
138

CLH Queue Lock


class CLHLock implements Lock { AtomicReference<Qnode> tail; ThreadLocal<Qnode> myNode = new Qnode(); public void lock() { Qnode pred = tail.getAndSet(myNode); while (pred.locked) {} }}
Art of Multiprocessor Programming

Thread-local Qnode
139

CLH Queue Lock


class CLHLock implements Lock { AtomicReference<Qnode> tail; ThreadLocal<Qnode> myNode Swap in my node = new Qnode(); public void lock() { Qnode pred = tail.getAndSet(myNode); while (pred.locked) {} }}
Art of Multiprocessor Programming 140

CLH Queue Lock


class CLHLock implements Lock { AtomicReference<Qnode> tail; ThreadLocal<Qnode> myNode Spin until predecessor = new Qnode(); releases lock public void lock() { Qnode pred = tail.getAndSet(myNode); while (pred.locked) {} }}
Art of Multiprocessor Programming 141

CLH Queue Lock


Class CLHLock implements Lock { public void unlock() { myNode.locked.set(false); myNode = pred; } }

Art of Multiprocessor Programming

142

CLH Queue Lock


Class CLHLock implements Lock { public void unlock() { myNode.locked.set(false); myNode = pred; } }

Notify successor
143

Art of Multiprocessor Programming

CLH Queue Lock


Class CLHLock implements Lock { public void unlock() { myNode.locked.set(false); myNode = pred; } }

Recycle predecessors node


144

Art of Multiprocessor Programming

CLH Queue Lock


Class CLHLock implements Lock { public void unlock() { myNode.locked.set(false); myNode = pred; } } (we dont actually reuse myNode. Code in book shows how its done.)
Art of Multiprocessor Programming 145

CLH Lock
Good
Lock release affects predecessor only Small, constant-sized space

Bad
Doesnt work for uncached NUMA architectures

Art of Multiprocessor Programming

146

NUMA Architecturs
Acronym:
Non-Uniform Memory Architecture

Illusion:
Flat shared memory

Truth:
No caches (sometimes) Some memory regions faster than others
Art of Multiprocessor Programming 147

NUMA Machines

Spinning on local memory is fast


Art of Multiprocessor Programming 148

NUMA Machines

Spinning on remote memory is slow


Art of Multiprocessor Programming 149

CLH Lock
Each thread spins on predecessors memory Could be far away

Art of Multiprocessor Programming

150

MCS Lock
FIFO order Spin on local memory only Small, Constant-size overhead

Art of Multiprocessor Programming

151

Initially
idle

tail
false
Art of Multiprocessor Programming 152

Acquiring
acquiring
(allocate Qnode)
true

tail
false
Art of Multiprocessor Programming 153

Acquiring
acquired

swap
tail
false

true

Art of Multiprocessor Programming

154

Acquiring
acquired

true

tail
false
Art of Multiprocessor Programming 155

Acquired
acquired

true

tail
false
Art of Multiprocessor Programming 156

Acquiring
acquired acquiring

false

tail

swap
Art of Multiprocessor Programming

true
157

Acquiring
acquired acquiring

false

tail
true
Art of Multiprocessor Programming 158

Acquiring
acquired acquiring

false

tail
true
Art of Multiprocessor Programming 159

Acquiring
acquired acquiring

false

tail
true
Art of Multiprocessor Programming 160

Acquiring
acquired acquiring

true

tail
true false
Art of Multiprocessor Programming 161

Acquiring
acquired acquiring

true

Yes!
false true

tail

Art of Multiprocessor Programming

162

MCS Queue Lock


class Qnode { boolean locked = false; qnode next = null; }

Art of Multiprocessor Programming

163

MCS Queue Lock


class MCSLock implements Lock { AtomicReference tail; public void lock() { Qnode qnode = new Qnode(); Qnode pred = tail.getAndSet(qnode); if (pred != null) { qnode.locked = true; pred.next = qnode; while (qnode.locked) {} }}}
Art of Multiprocessor Programming 164

MCS Queue Lock


class MCSLock implements Lock { Make a AtomicReference tail; QNode public void lock() { Qnode qnode = new Qnode(); Qnode pred = tail.getAndSet(qnode); if (pred != null) { qnode.locked = true; pred.next = qnode; while (qnode.locked) {} }}}
Art of Multiprocessor Programming 165

MCS Queue Lock


class MCSLock implements Lock { AtomicReference tail; public void lock() { Qnode qnode = new Qnode(); Qnode pred = tail.getAndSet(qnode); if (pred != null) { qnode.locked = true;add my Node to pred.next = qnode; the tail of while (qnode.locked) {} queue }}}
Art of Multiprocessor Programming 166

MCS Queue Lock


class MCSLock implements Lock { Fix if queue AtomicReference tail; was non-empty public void lock() { Qnode qnode = new Qnode(); Qnode pred = tail.getAndSet(qnode); if (pred != null) { qnode.locked = true; pred.next = qnode; while (qnode.locked) {} }}}
Art of Multiprocessor Programming 167

MCS Queue Lock


class MCSLock implements Lock { AtomicReference tail; Wait until public void lock() { unlocked Qnode qnode = new Qnode(); Qnode pred = tail.getAndSet(qnode); if (pred != null) { qnode.locked = true; pred.next = qnode; while (qnode.locked) {} }}}
Art of Multiprocessor Programming 168

MCS Queue Unlock


class MCSLock implements Lock { AtomicReference tail; public void unlock() { if (qnode.next == null) { if (tail.CAS(qnode, null) return; while (qnode.next == null) {} } qnode.next.locked = false; }}
Art of Multiprocessor Programming 169

MCS Queue Lock


class MCSLock implements Lock { AtomicReference tail; public void unlock() { if (qnode.next == null) { if (tail.CAS(qnode, null) return; while (qnode.next == null) {} } qnode.next.locked = false; Missing successor? }}
Art of Multiprocessor Programming 170

MCS Queue Lock


class MCSLock implements Lock { If really no successor, AtomicReference tail; return public void unlock() { if (qnode.next == null) { if (tail.CAS(qnode, null) return; while (qnode.next == null) {} } qnode.next.locked = false; }}
Art of Multiprocessor Programming 171

MCS Queue Lock


class MCSLock implements Lock { Otherwise wait for AtomicReference tail; successor to catch up public void unlock() { if (qnode.next == null) { if (tail.CAS(qnode, null) return; while (qnode.next == null) {} } qnode.next.locked = false; }}
Art of Multiprocessor Programming 172

MCS Queue Lock


class MCSLock implements Lock { AtomicReference queue; Pass lock to successor public void unlock() { if (qnode.next == null) { if (tail.CAS(qnode, null) return; while (qnode.next == null) {} } qnode.next.locked = false; }}
Art of Multiprocessor Programming 173

Purple Release
releasing swap

false false
Art of Multiprocessor Programming 174

By looking at the queue, I see another thread is releasing activeswap

Purple Release

false false
Art of Multiprocessor Programming 175

By looking at the queue, I see another thread is releasing activeswap

Purple Release

false false I have to wait for that thread to finish


176

Art of Multiprocessor Programming

Purple Release
releasing prepare to spin

true false
Art of Multiprocessor Programming 177

Purple Release
releasing spinning

true false
Art of Multiprocessor Programming 178

Purple Release
releasing spinning

false true false


Art of Multiprocessor Programming 179

Purple Release
releasing Acquired lock

false true false


Art of Multiprocessor Programming 180

Abortable Locks
What if you want to give up waiting for a lock? For example
Timeout Database transaction aborted by user

Art of Multiprocessor Programming

181

Back-off Lock
Aborting is trivial
Just return from lock() call

Extra benefit:
No cleaning up Wait-free Immediate return

Art of Multiprocessor Programming

182

Queue Locks
Cant just quit
Thread in line behind will starve

Need a graceful way out

Art of Multiprocessor Programming

183

Queue Locks
spinning spinning spinning

true

true

true

Art of Multiprocessor Programming

184

Queue Locks
locked spinning spinning

false

true

true

Art of Multiprocessor Programming

185

Queue Locks
locked spinning

false

true

Art of Multiprocessor Programming

186

Queue Locks
locked

false

Art of Multiprocessor Programming

187

Queue Locks
spinning spinning spinning

true

true

true

Art of Multiprocessor Programming

188

Queue Locks
spinning spinning

true

true

true

Art of Multiprocessor Programming

189

Queue Locks
locked spinning

false

true

true

Art of Multiprocessor Programming

190

Queue Locks
spinning

false

true

Art of Multiprocessor Programming

191

Queue Locks
pwned

false

true

Art of Multiprocessor Programming

192

Abortable CLH Lock


When a thread gives up
Removing node in a wait-free way is hard

Idea:
let successor deal with it.

Art of Multiprocessor Programming

193

Initially
idle Pointer to predecessor (or null)

tail
A
Art of Multiprocessor Programming 194

Initially
idle Distinguished available node means lock is free

tail
A
Art of Multiprocessor Programming 195

Acquiring
acquiring

tail
A
Art of Multiprocessor Programming 196

Null predecessor Acquiring means lock not released or acquiring aborted

A
Art of Multiprocessor Programming 197

Acquiring
acquiring

Swap
A
Art of Multiprocessor Programming 198

Acquiring
acquiring

A
Art of Multiprocessor Programming 199

Acquired
locked
Pointer to AVAILABLE means lock is free.

A
Art of Multiprocessor Programming 200

Normal Case
locked spinning spinning

Null means lock is not free & request not aborted


Art of Multiprocessor Programming 201

One Thread Aborts


locked Timed out spinning

Art of Multiprocessor Programming

202

Successor Notices
locked Timed out spinning

Non-Null means predecessor aborted


Art of Multiprocessor Programming 203

Recycle Predecessors Node


locked spinning

Art of Multiprocessor Programming

204

Spin on Earlier Node


locked spinning

Art of Multiprocessor Programming

205

Spin on Earlier Node


released spinning

A The lock is now mine


Art of Multiprocessor Programming 206

Time-out Lock
public class TOLock implements Lock { static Qnode AVAILABLE = new Qnode(); AtomicReference<Qnode> tail; ThreadLocal<Qnode> myNode;

Art of Multiprocessor Programming

207

Time-out Lock
public class TOLock implements Lock { static Qnode AVAILABLE = new Qnode(); AtomicReference<Qnode> tail; ThreadLocal<Qnode> myNode;

Distinguished node to signify free lock


Art of Multiprocessor Programming 208

Time-out Lock
public class TOLock implements Lock { static Qnode AVAILABLE = new Qnode(); AtomicReference<Qnode> tail; ThreadLocal<Qnode> myNode;

Tail of the queue


Art of Multiprocessor Programming 209

Time-out Lock
public class TOLock implements Lock { static Qnode AVAILABLE = new Qnode(); AtomicReference<Qnode> tail; ThreadLocal<Qnode> myNode;

Remember my node
Art of Multiprocessor Programming 210

Time-out Lock
public boolean lock(long timeout) { Qnode qnode = new Qnode(); myNode.set(qnode); qnode.prev = null; Qnode myPred = tail.getAndSet(qnode); if (myPred== null || myPred.prev == AVAILABLE) { return true; }

Art of Multiprocessor Programming

211

Time-out Lock
public boolean lock(long timeout) { Qnode qnode = new Qnode(); myNode.set(qnode); qnode.prev = null; Qnode myPred = tail.getAndSet(qnode); if (myPred == null || myPred.prev == AVAILABLE) { return true; }

Create & initialize node


Art of Multiprocessor Programming 212

Time-out Lock
public boolean lock(long timeout) { Qnode qnode = new Qnode(); myNode.set(qnode); qnode.prev = null; Qnode myPred = tail.getAndSet(qnode); if (myPred == null || myPred.prev == AVAILABLE) { return true; }

Swap with tail


Art of Multiprocessor Programming 213

Time-out Lock
public boolean lock(long timeout) { Qnode qnode = new Qnode(); myNode.set(qnode); qnode.prev = null; Qnode myPred = tail.getAndSet(qnode); if (myPred == null || myPred.prev == AVAILABLE) { return true; } ...

If predecessor absent or released, we are done


214

Art of Multiprocessor Programming

Time-out Lock
locked

spinning

spinning

long start = now(); while (now()- start < timeout) { Qnode predPred = myPred.prev; if (predPred == AVAILABLE) { return true; } else if (predPred != null) { myPred = predPred; } }
Art of Multiprocessor Programming 215

Time-out Lock
long start = now(); while (now()- start < timeout) { Qnode predPred = myPred.prev; if (predPred == AVAILABLE) { return true; } else if (predPred != null) { myPred = predPred; } Keep trying for a }
Art of Multiprocessor Programming

while
216

Time-out Lock
long start = now(); while (now()- start < timeout) { Qnode predPred = myPred.prev; if (predPred == AVAILABLE) { return true; } else if (predPred != null) { myPred = predPred; } } prev field
Art of Multiprocessor Programming

Spin on predecessors
217

Time-out Lock
long start = now(); while (now()- start < timeout) { Qnode predPred = myPred.prev; if (predPred == AVAILABLE) { return true; } else if (predPred != null) { myPred = predPred; } } Predecessor released
Art of Multiprocessor Programming

lock
218

Time-out Lock
long start = now(); while (now()- start < timeout) { Qnode predPred = myPred.prev; if (predPred == AVAILABLE) { return true; } else if (predPred != null) { myPred = predPred; } Predecessor aborted, } advance one
Art of Multiprocessor Programming 219

Time-out Lock
if (!tail.compareAndSet(qnode, myPred)) qnode.prev = myPred; return false; } }

What do I do when I time out?

Art of Multiprocessor Programming

220

Time-out Lock
if (!tail.compareAndSet(qnode, myPred)) qnode.prev = myPred; return false; } }

Do I have a successor? If CAS fails: I do have a successor, tell it about myPred


Art of Multiprocessor Programming 221

Time-out Lock
if (!tail.compareAndSet(qnode, myPred)) qnode.prev = myPred; return false; } }

If CAS succeeds: no successor, simply return false


Art of Multiprocessor Programming 222

Time-Out Unlock
public void unlock() { Qnode qnode = myNode.get(); if (!tail.compareAndSet(qnode, null)) qnode.prev = AVAILABLE; }

Art of Multiprocessor Programming

223

Time-out Unlock
public void unlock() { Qnode qnode = myNode.get(); if (!tail.compareAndSet(qnode, null)) qnode.prev = AVAILABLE; }

If CAS failed: exists successor, notify successor it can enter


Art of Multiprocessor Programming 224

Timing-out Lock
public void unlock() { Qnode qnode = myNode.get(); if (!tail.compareAndSet(qnode, null)) qnode.prev = AVAILABLE; }

CAS successful: set tail to null, no clean up since no successor waiting


Art of Multiprocessor Programming 225

One Lock To Rule Them All?


TTAS+Backoff, CLH, MCS, ToLock Each better than others in some way There is no one solution Lock we pick really depends on:
the application the hardware which properties are important
Art of Multiprocessor Programming 226

This work is licensed under a Creative Commons AttributionShareAlike 2.5 License.


You are free: to Share to copy, distribute and transmit the work to Remix to adapt the work Under the following conditions: Attribution. You must attribute the work to The Art of Multiprocessor Programming (but not in any way that suggests that the authors endorse you or your use of the work). Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license. For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to http://creativecommons.org/licenses/by-sa/3.0/. Any of the above conditions can be waived if you get permission from the copyright holder. Nothing in this license impairs or restricts the author's moral rights.

Art of Multiprocessor Programming

227

You might also like