Barrier Synchronization
Art of Multiprocessor 2
Programming
Simple Video Game
while (true) {
frame.prepare();
frame.display();
}
Two-Phase Rendering
// Display thread
while (true) {
  if (phase) {
    frame[0].display();
  } else {
    frame[1].display();
  }
  phase = !phase;
}
// Prepare thread
while (true) {
  if (phase) {
    frame[1].prepare();
  } else {
    frame[0].prepare();
  }
  phase = !phase;
}
The two threads alternate between even phases and odd phases: in one, frame[0] is displayed while frame[1] is prepared; in the other, frame[1] is displayed while frame[0] is prepared.
Synchronization Problems
• How do threads stay in phase?
• Too early?
– “we render no frame before its time”
• Too late?
– Recycle memory before frame is displayed
Ideal Parallel Computation
[Figure: every thread finishes phase 0, then phase 1, then phase 2, in lockstep.]
Real-Life Parallel Computation
[Figure: one thread stalls (zzz…) while the others keep going, so threads end up in phases 0, 1, and 2 at the same time. Uh, oh.]
Barrier Synchronization
[Figure: a barrier separates phase 0 from phase 1.]
No thread enters the next phase until every thread has left the previous one.
Why Do We Care?
• Mostly of interest to
– Scientific & numeric computation
• Elsewhere
– Garbage collection
– Less common in systems programming
– Still important topic
Duality
• Dual to mutual exclusion
– Include others, not exclude them
• Same implementation issues
– Interaction with caches …
• Invalidation?
• Local spinning?
Example: Parallel Prefix
before: a, b, c, d
after: a, a+b, a+b+c, a+b+c+d
Parallel Prefix
One thread per entry: a, b, c, d
Parallel Prefix: Phase 1
a, b, c, d
Parallel Prefix: Phase 2
a, b, c, d
a, a+b, a+b+c, a+b+c+d
Parallel Prefix
• N threads can compute
– Parallel prefix
– Of N entries
– In log₂ N rounds
• What if system is asynchronous?
– Why we need barriers
Prefix
class Prefix extends Thread {
  int[] a;     // array of input values
  int i;       // thread index
  Barrier b;   // shared barrier
  // initialize fields
  Prefix(int[] a, Barrier b, int i) {
    this.a = a;
    this.b = b;
    this.i = i;
  }
Where Do the Barriers Go?
public void run() {
int d = 1, sum = 0;
while (d < N) {
if (i >= d)
sum = a[i-d];
if (i >= d)
a[i] += sum;
d = d * 2;
}}}
Where Do the Barriers Go?
public void run() {
  int d = 1, sum = 0;
  while (d < N) {
    if (i >= d)
      sum = a[i-d];
    b.await();   // make sure everyone reads before anyone writes
    if (i >= d)
      a[i] += sum;
    b.await();   // make sure everyone writes before anyone reads
    d = d * 2;
  }}}
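The two-barrier loop above can be tried out with the standard library. The following is a minimal sketch, assuming java.util.concurrent.CyclicBarrier as a stand-in for the slides' Barrier class; PrefixDemo and parallelPrefix are hypothetical names introduced only for this illustration.

```java
import java.util.Arrays;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

public class PrefixDemo {
    // Returns the prefix sums of input, using one thread per entry.
    static int[] parallelPrefix(int[] input) {
        final int[] a = input.clone();
        final int n = a.length;
        if (n == 0) return a;                       // CyclicBarrier needs at least one party
        final CyclicBarrier b = new CyclicBarrier(n);
        Thread[] workers = new Thread[n];
        for (int t = 0; t < n; t++) {
            final int i = t;                        // thread index, as in the Prefix class
            workers[t] = new Thread(() -> {
                try {
                    int sum = 0;
                    for (int d = 1; d < n; d *= 2) {
                        if (i >= d) sum = a[i - d];
                        b.await();                  // everyone reads before anyone writes
                        if (i >= d) a[i] += sum;
                        b.await();                  // everyone writes before anyone reads
                    }
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new RuntimeException(e);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parallelPrefix(new int[] {1, 2, 3, 4})));
        // prints [1, 3, 6, 10]
    }
}
```

Removing either await() call lets a fast thread overwrite an entry that a slow thread has not yet read, or read an entry that has not yet been updated, which is why both barriers are needed in each round.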
Barrier Implementations
• Cache coherence
– Spin on locally-cached locations?
– Spin on statically-defined locations?
• Latency
– How many steps?
• Symmetry
– Do all threads do the same thing?
Barriers
public class Barrier {
  AtomicInteger count;   // number of threads not yet arrived
  int size;              // number of threads participating
  // initialization
  public Barrier(int n){
    count = new AtomicInteger(n);
    size = n;
  }
  // principal method
  public void await() {
    if (count.getAndDecrement()==1) {
      count.set(size);             // if I'm last, reset fields for next time
    } else {
      while (count.get() != 0);    // otherwise, wait for everyone else
    }}}
What’s wrong with this protocol?
Reuse
Barrier b = new Barrier(n);
while ( mumble() ) {
  work();      // do work
  b.await();   // synchronize
}              // repeat
Barriers
• A thread spinning in await() is waiting for phase 1 to finish (count to reach 0)
• The last thread arrives: phase 1 is so over
• The last thread resets count to size to prepare for phase 2, while the waiting thread sleeps (ZZZZZ….) and misses the moment when count was 0
Uh-Oh
• The fast thread is now waiting for phase 2 to finish
• The sleeping thread is still waiting for phase 1 to finish
Basic Problem
• One thread “wraps around” to start
phase 2
• While another thread is still waiting for
phase 1
• One solution:
– Always use two barriers
Sense-Reversing Barriers
public class Barrier {
AtomicInteger count;
int size;
volatile boolean sense = false;
threadSense = new ThreadLocal<boolean>…
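The slide's code is cut off here, so the following is only a sketch of how a sense-reversing barrier is commonly completed, not a reconstruction of the missing slides. The class name SenseBarrier, the initial thread-local sense of true, the Thread.yield() in the spin loop, and the selfTest harness are all assumptions made for this illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class SenseBarrier {
    private final AtomicInteger count;            // threads not yet arrived
    private final int size;                       // threads participating
    private volatile boolean sense = false;       // shared sense, flipped once per phase
    private final ThreadLocal<Boolean> threadSense =
        ThreadLocal.withInitial(() -> true);      // assumed: starts opposite to the shared sense

    public SenseBarrier(int n) { count = new AtomicInteger(n); size = n; }

    public void await() {
        boolean mySense = threadSense.get();
        if (count.getAndDecrement() == 1) {       // last to arrive
            count.set(size);                      // reset the counter first ...
            sense = mySense;                      // ... then release the waiters
        } else {
            while (sense != mySense) Thread.yield();
        }
        threadSense.set(!mySense);                // reverse sense for the next phase
    }

    // Hypothetical harness: checks that no thread leaves round r before all n arrived.
    static boolean selfTest(int n, int rounds) {
        SenseBarrier barrier = new SenseBarrier(n);
        AtomicInteger[] arrived = new AtomicInteger[rounds];
        for (int r = 0; r < rounds; r++) arrived[r] = new AtomicInteger();
        AtomicBoolean ok = new AtomicBoolean(true);
        Thread[] ts = new Thread[n];
        for (int t = 0; t < n; t++) {
            ts[t] = new Thread(() -> {
                for (int r = 0; r < rounds; r++) {
                    arrived[r].incrementAndGet();
                    barrier.await();
                    if (arrived[r].get() != n) ok.set(false);
                }
            });
            ts[t].start();
        }
        try {
            for (Thread t : ts) t.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return ok.get();
    }
}
```

Because waiters spin on the sense field instead of on count, resetting count for the next phase can no longer hide the release from a slow thread, so the same barrier object can be reused without reinitializing.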
Combining Tree Barriers
[Figure: a tree of 2-barriers, with one 2-barrier at the root and two 2-barriers below it.]
Combining Tree Barrier
public class Node{
AtomicInteger count; int size;
Node parent; volatile boolean sense;
Remarks
• Everyone spins on sense field
– Local spinning on bus-based architecture (good)
– Network hot-spot on distributed architecture (bad)
• Not really scalable
Tournament Tree Barrier
• If tree nodes have fan-in 2
– Don’t need to call getAndDecrement()
– Winner chosen statically
• At level i
– If i-th bit of id is 0, move up
– Otherwise keep back
Tournament Tree Barriers
[Figure: a tournament tree with a root; at each node one thread is the winner and the other is the loser.]
• Loser spins on own flag
• Winner spins on own flag
• Bingo!
• Sense-reversing: next time use blue flags
Tournament Barrier
class TBarrier {
  volatile boolean flag;   // notifications delivered here
  TBarrier partner;        // other thread at same level
  TBarrier parent;         // parent (winner) or null (loser)
  boolean top;             // am I the root?
  …
}
Tournament Barrier
void await(boolean mySense) {          // current sense
  if (top) {                           // I am the root
    return;
  } else if (parent != null) {         // I am already a winner
    while (flag != mySense) {};        // wait for partner
    parent.await(mySense);             // synchronize upstairs
    partner.flag = mySense;            // inform partner; order is important (why?)
  } else {                             // natural-born loser
    partner.flag = mySense;            // tell partner I'm here
    while (flag != mySense) {};        // wait for notification from partner
  }}}
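The await() method above assumes the tree has already been wired up, which the slides elide. Below is one possible wiring, as a sketch under stated assumptions: the number of threads is a power of two, each thread passes its own id, and the names TournamentBarrier, Node, leaf, and selfTest are hypothetical. The per-node logic follows the slide, with Thread.yield() added to the spin loops.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class TournamentBarrier {
    static final class Node {
        volatile boolean flag;   // notifications delivered here
        Node partner;            // other thread's node at the same level
        Node parent;             // next level up (winner) or null (loser)
        boolean top;             // root marker

        void await(boolean mySense) {
            if (top) return;
            if (parent != null) {                          // statically chosen winner
                while (flag != mySense) Thread.yield();    // wait for partner
                parent.await(mySense);                     // synchronize upstairs
                partner.flag = mySense;                    // then release partner
            } else {                                       // statically chosen loser
                partner.flag = mySense;                    // tell partner I'm here
                while (flag != mySense) Thread.yield();    // wait to be released
            }
        }
    }

    private final Node[] leaf;
    private final ThreadLocal<Boolean> threadSense = ThreadLocal.withInitial(() -> true);

    public TournamentBarrier(int n) {
        if (n < 1 || Integer.bitCount(n) != 1)
            throw new IllegalArgumentException("n must be a power of two");
        int levels = Integer.numberOfTrailingZeros(n);
        Node[][] node = new Node[levels + 1][n];
        for (int lvl = 0; lvl <= levels; lvl++)            // ids still playing at lvl
            for (int id = 0; id < n; id += 1 << lvl) node[lvl][id] = new Node();
        for (int lvl = 0; lvl < levels; lvl++)
            for (int id = 0; id < n; id += 1 << lvl) {
                node[lvl][id].partner = node[lvl][id ^ (1 << lvl)];
                if ((id & (1 << lvl)) == 0)                // bit lvl of id is 0: move up
                    node[lvl][id].parent = node[lvl + 1][id];
            }
        node[levels][0].top = true;
        leaf = node[0];
    }

    public void await(int id) {
        boolean mySense = threadSense.get();
        leaf[id].await(mySense);
        threadSense.set(!mySense);                         // sense-reversing reuse
    }

    // Hypothetical harness: no thread may leave round r before all n arrived.
    static boolean selfTest(int n, int rounds) {
        TournamentBarrier b = new TournamentBarrier(n);
        AtomicInteger[] arrived = new AtomicInteger[rounds];
        for (int r = 0; r < rounds; r++) arrived[r] = new AtomicInteger();
        AtomicBoolean ok = new AtomicBoolean(true);
        Thread[] ts = new Thread[n];
        for (int t = 0; t < n; t++) {
            final int id = t;
            ts[t] = new Thread(() -> {
                for (int r = 0; r < rounds; r++) {
                    arrived[r].incrementAndGet();
                    b.await(id);
                    if (arrived[r].get() != n) ok.set(false);
                }
            });
            ts[t].start();
        }
        try { for (Thread t : ts) t.join(); }
        catch (InterruptedException e) { throw new RuntimeException(e); }
        return ok.get();
    }
}
```

The winner releases its partner only after parent.await() returns. If it released the loser first, the loser could leave the barrier before threads in other subtrees had arrived.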
Remarks
• No need for read-modify-write calls
• Each thread spins on fixed location
– Good for bus-based architectures
– Good for NUMA architectures
Dissemination Barrier
• At round i
– Thread A notifies thread A + 2^i (mod n)
• Requires log n rounds
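The notification pattern can be sketched as follows. This is a simplified variant, not the slides' implementation: instead of sense-reversed boolean flags it uses per-round counters, so that "notified in episode e" becomes "counter ≥ e", which keeps reuse easy to reason about. DisseminationBarrier, signals, episode, and selfTest are hypothetical names, and counter overflow after 2^31 episodes is ignored.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

public class DisseminationBarrier {
    private final int n;
    private final int rounds;                    // ceil(log2 n)
    private final AtomicIntegerArray[] signals;  // signals[r].get(i): notifications thread i got in round r
    private final int[] episode;                 // episode[i] is touched only by thread i

    public DisseminationBarrier(int n) {
        this.n = n;
        int r = 0;
        while ((1 << r) < n) r++;
        rounds = r;
        signals = new AtomicIntegerArray[rounds];
        for (int i = 0; i < rounds; i++) signals[i] = new AtomicIntegerArray(n);
        episode = new int[n];
    }

    public void await(int id) {
        int e = ++episode[id];                            // how many times I have entered
        for (int r = 0; r < rounds; r++) {
            int target = (id + (1 << r)) % n;             // thread A notifies A + 2^r (mod n)
            signals[r].incrementAndGet(target);
            while (signals[r].get(id) < e) Thread.yield(); // wait for my own round-r notification
        }
    }

    // Hypothetical harness: no thread may leave round r before all n arrived.
    static boolean selfTest(int n, int rounds) {
        DisseminationBarrier b = new DisseminationBarrier(n);
        AtomicInteger[] arrived = new AtomicInteger[rounds];
        for (int r = 0; r < rounds; r++) arrived[r] = new AtomicInteger();
        AtomicBoolean ok = new AtomicBoolean(true);
        Thread[] ts = new Thread[n];
        for (int t = 0; t < n; t++) {
            final int id = t;
            ts[t] = new Thread(() -> {
                for (int r = 0; r < rounds; r++) {
                    arrived[r].incrementAndGet();
                    b.await(id);
                    if (arrived[r].get() != n) ok.set(false);
                }
            });
            ts[t].start();
        }
        try { for (Thread t : ts) t.join(); }
        catch (InterruptedException e) { throw new RuntimeException(e); }
        return ok.get();
    }
}
```

Every thread does the same work, and n does not have to be a power of two, but each thread writes to a different thread's slot in every round.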
[Figure: dissemination rounds with notification offsets +1, +2, +4.]
Remarks
• Elegant
• Good source of homework problems
• Not cache-friendly
Ideas So Far
• Sense-reversing
– Reuse without reinitializing
• Combining tree
– Like counters, locks …
• Tournament tree
– Optimized combining tree
• Dissemination barrier
– Intellectually Pleasing (matter of taste)
Which is best for Multicore?
• On a cache coherent multicore chip:
perhaps none of the above…
• Here is another (arguably) better
algorithm …
Static Tree Barrier
[Figure sequence: a static tree with a sense-reversing flag at the root; each node holds a counter (2 for internal nodes, 0 for leaves).]
• Spin until zero …
• My counter is zero, decrement parent
• Spin on done flag
• When the root's counter reaches zero, every thread may proceed (yes!)
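The steps in the static tree figures can be sketched in code. This is an illustrative version under assumptions the slides do not state: threads are laid out as a binary heap (thread i's parent is thread (i-1)/2), each thread passes its own id, and the names StaticTreeBarrier, Node, childCount, and selfTest are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class StaticTreeBarrier {
    private volatile boolean sense = false;      // the single sense-reversing "done" flag
    private final Node[] node;
    private final ThreadLocal<Boolean> threadSense = ThreadLocal.withInitial(() -> true);

    private final class Node {
        final int children;                      // fixed number of children
        final Node parent;                       // null for the root
        final AtomicInteger childCount;          // children that have not arrived yet

        Node(Node parent, int children) {
            this.parent = parent;
            this.children = children;
            this.childCount = new AtomicInteger(children);
        }

        void await() {
            boolean mySense = threadSense.get();
            while (childCount.get() > 0) Thread.yield();   // spin until zero
            childCount.set(children);                      // reset for the next round
            if (parent != null) {
                parent.childCount.getAndDecrement();       // my counter is zero, decrement parent
                while (sense != mySense) Thread.yield();   // spin on done flag
            } else {
                sense = mySense;                           // root: everyone has arrived
            }
            threadSense.set(!mySense);
        }
    }

    public StaticTreeBarrier(int n) {
        node = new Node[n];
        for (int i = 0; i < n; i++) {
            int kids = (2 * i + 1 < n ? 1 : 0) + (2 * i + 2 < n ? 1 : 0);
            node[i] = new Node(i == 0 ? null : node[(i - 1) / 2], kids);
        }
    }

    public void await(int id) { node[id].await(); }

    // Hypothetical harness: no thread may leave round r before all n arrived.
    static boolean selfTest(int n, int rounds) {
        StaticTreeBarrier b = new StaticTreeBarrier(n);
        AtomicInteger[] arrived = new AtomicInteger[rounds];
        for (int r = 0; r < rounds; r++) arrived[r] = new AtomicInteger();
        AtomicBoolean ok = new AtomicBoolean(true);
        Thread[] ts = new Thread[n];
        for (int t = 0; t < n; t++) {
            final int id = t;
            ts[t] = new Thread(() -> {
                for (int r = 0; r < rounds; r++) {
                    arrived[r].incrementAndGet();
                    b.await(id);
                    if (arrived[r].get() != n) ok.set(false);
                }
            });
            ts[t].start();
        }
        try { for (Thread t : ts) t.join(); }
        catch (InterruptedException e) { throw new RuntimeException(e); }
        return ok.get();
    }
}
```

Each node resets its counter before notifying its parent, so all counters are back at their initial values by the time the root flips the flag and the next round can begin.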
Concurrent Programming
• Many data structures combine blocking
& non-blocking methods
• Java™ concurrency package
– skiplists, hash tables, exchangers
– on 10 million desktops.
Progress Conditions
• Deadlock-free:
– Some thread eventually acquires lock.
• Starvation-free:
– Every thread eventually acquires lock.
• Lock-free:
– Some method call returns.
• Wait-free:
– Every method call returns.
• Obstruction-free:
– Every method call returns if it executes in isolation (we will show an example shortly)
List-Based Sets
• Unordered collection of elements
• No duplicates
• Methods
– Add() a new element
– Remove() an element
– Contains() if element is present
Coarse Grained Locking
[Figure: list with nodes a, b, d.]
Optimistic Fine Grained
[Figure: lists with nodes a, b, c, e and a, b, c, d.]
The Simple Snapshot is Obstruction-Free
• Put increasing labels on each entry
• Collect twice
• If both agree,
– We’re done
• Otherwise,
– Try again
[Figure: Collect1 and Collect2 side by side, with labels 1, 22, 1, 7, 13, 18, 12 agreeing in both collects.]
Obstruction-freedom
• In the simple snapshot algorithm:
– The update method is wait-free
– But scan is obstruction-free
• Completes if it executes in isolation (no concurrent updates).
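The collect-twice idea can be sketched as below. This is a minimal illustration, assuming one slot per thread with the slot index passed in explicitly and int values; SimpleSnapshot, Stamped, and sameStamps are names chosen for this sketch rather than taken from the slides.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

public class SimpleSnapshot {
    // An immutable (label, value) pair; the label grows with every update.
    private static final class Stamped {
        final long stamp;
        final int value;
        Stamped(long stamp, int value) { this.stamp = stamp; this.value = value; }
    }

    private final AtomicReferenceArray<Stamped> table;

    public SimpleSnapshot(int n) {
        table = new AtomicReferenceArray<>(n);
        for (int i = 0; i < n; i++) table.set(i, new Stamped(0, 0));
    }

    // Wait-free: a fixed number of steps. Slot `me` is written only by thread `me`.
    public void update(int me, int value) {
        Stamped old = table.get(me);
        table.set(me, new Stamped(old.stamp + 1, value));
    }

    private Stamped[] collect() {
        Stamped[] copy = new Stamped[table.length()];
        for (int i = 0; i < copy.length; i++) copy[i] = table.get(i);
        return copy;
    }

    private static boolean sameStamps(Stamped[] x, Stamped[] y) {
        for (int i = 0; i < x.length; i++)
            if (x[i].stamp != y[i].stamp) return false;
        return true;
    }

    // Obstruction-free: returns once two consecutive collects agree,
    // which is guaranteed only if no update interferes for long enough.
    public int[] scan() {
        Stamped[] oldCopy = collect();
        while (true) {
            Stamped[] newCopy = collect();
            if (sameStamps(oldCopy, newCopy)) {
                int[] result = new int[newCopy.length];
                for (int i = 0; i < result.length; i++) result[i] = newCopy[i].value;
                return result;
            }
            oldCopy = newCopy;                   // otherwise, try again
        }
    }
}
```

A scan running alongside a steady stream of updates may retry forever, which is why scan is obstruction-free and not wait-free.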
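Wait-free contains()
[Figure: list with nodes a, b, c, d, e, each carrying a mark bit; one node is marked 1, the others 0.]
Lazy List-based Set Algorithm
[Figure: the same list with mark bits.]
Lock-free List-Based Set
Logical Removal = Set Mark Bit
[Figure: the same list; the removed node has its mark bit set to 1.]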
So how can this make sense?
• Why have methods with different progress conditions?
• Let us try to understand this…
All make progress    Wait-free   Obstruction-free   Starvation-free
Some make progress   Lock-free                      Deadlock-free
More Formally
• Standard notion of abstract object
• Progress conditions relate to method
calls of an object
– A thread is active if it takes an infinite
number of concrete (machine level) steps
– And is suspended if not.
Maximal vs. Minimal
• Minimal progress
– some call eventually completes
– System matters, not individuals
• Maximal progress
– every call eventually completes.
– Individuals matter
Flags courtesy of
www.theodora.com/flags used
with permission
The “Periodic Table” of Progress Conditions
                   Non-Blocking                     Blocking
Maximal progress   Wait-free     Obstruction-free   Starvation-free
Minimal progress   Lock-free                        Deadlock-free
The Scheduler’s Role
Multiprocessor progress properties:
• Are not about the guarantees a
method's implementation provides.
• Are about scheduling needed to provide
minimal or maximal progress.
Fair Scheduling
• A history is fair if each thread takes an
infinite number of steps
Starvation Freedom
• A method implementation is starvation-free if it guarantees maximal progress
in every fair history.
Dependent Progress
• Dependent progress conditions
– Do not guarantee minimal progress in
every history
• Independent ones do.
• Blocking progress conditions
– deadlock-freedom, starvation-freedom
– are dependent.
Non-blocking Independent
Conditions
• A lock-free method guarantees
– minimal progress
– in every history.
• A wait-free method guarantees
– maximal progress
– in every history.
The “Periodic Table” of Progress Conditions
                   Non-Blocking                     Blocking
Maximal progress   Wait-free     Obstruction-free   Starvation-free
Minimal progress   Lock-free                        Deadlock-free
                   Independent   Dependent
Uniformly Isolating Schedules
• A history is uniformly isolating if any
thread eventually runs by itself for “long
enough”
• Modern systems do this with backoff,
yield, etc.
A Non-blocking Dependent Condition
• A method implementation is obstruction-free if it guarantees
– maximal progress
– in every uniformly isolating history.
Clash-Freedom: the
“Einsteinium” of Progress
• A method implementation is clash-free if
it guarantees
– minimal progress
– in every uniformly isolating history.
• Thm: clash-freedom strictly weaker than
obstruction-freedom
Getting from Minimal to Maximal
[Figure: the periodic table of progress conditions, labeled Non-Blocking / Blocking and Independent / Dependent.]
Maximal Progress Postulate
• Programmers want maximal progress.
• Methods’ progress conditions define
– What we expect from the scheduler
– For example
• Don’t halt in critical section
• Let me run in isolation long enough …
Why Lock-Free is OK
• We all want maximal progress
– Wait-free
• Yet we often write lock-free or deadlock-free lock-based algorithms
• OK if we expect the scheduler to be
benevolent
– Often true (not always!)
The Answer
Shared-Memory Computability
[Figure: shared memory holding the bits 10011.]
Programmers Expect the Best
This work is licensed under a Creative Commons Attribution-
ShareAlike 2.5 License.
• You are free:
– to Share — to copy, distribute and transmit the work
– to Remix — to adapt the work
• Under the following conditions:
– Attribution. You must attribute the work to “The Art of
Multiprocessor Programming” (but not in any way that
suggests that the authors endorse you or your use of the
work).
– Share Alike. If you alter, transform, or build upon this work,
you may distribute the resulting work only under the same,
similar or a compatible license.
• For any reuse or distribution, you must make clear to others the
license terms of this work. The best way to do this is with a link
to
– http://creativecommons.org/licenses/by-sa/3.0/.
• Any of the above conditions can be waived if you get permission
from the copyright holder.
• Nothing in this license impairs or restricts the author's moral
rights.