Concepts of Parallel Programming
Parallel Computing
Alf Wachsmann
Stanford Linear Accelerator Center (SLAC)
alfw@slac.stanford.edu
• What about:
  1 tree takes 30 years to grow big.
  How long do 3 trees need?
  (Some work is inherently sequential: adding more workers does not make it finish sooner.)
[Figures: shared-memory machine with processors 1-8 attached to a common memory; CPU with fetch/execute cycle]
Flynn's taxonomy:
• SISD: Single Instruction, Single Data
• SIMD: Single Instruction, Multiple Data
• MISD: Multiple Instruction, Single Data
• MIMD: Multiple Instruction, Multiple Data
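The SIMD/MIMD distinction can be sketched in plain Python. This is an illustration of the classification, not real vector hardware; all names are made up for the example.

```python
# Illustration of Flynn's classification in plain Python (not actual
# hardware SIMD): SIMD applies one instruction to many data items,
# MIMD applies different instructions to different data.
data = [1, 2, 3, 4]

# SIMD-style: a single instruction (squaring) over multiple data.
simd_result = [x * x for x in data]

# MIMD-style: each "processor" runs its own instruction on its own data.
instructions = [lambda x: x + 10, lambda x: x * 2,
                lambda x: x - 1, lambda x: x ** 2]
mimd_result = [f(x) for f, x in zip(instructions, data)]

print(simd_result)   # [1, 4, 9, 16]
print(mimd_result)   # [11, 4, 2, 16]
```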
[Figure: two CPUs executing over time]
[Code fragment: a do-loop (... end do) followed by subroutine calls "call sub3", "call sub4", ...]
[Figure: message passing over a network. Task 0 on Machine A holds data and calls send(data); task 1 on Machine B calls receive(data).]
Source: http://en2.wikipedia.org/wiki/Class_NC
F(1) = 1
F(2) = 1
F(k+2) = F(k) + F(k+1)
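The recurrence has a sequential dependency chain: each value needs the two before it. A direct Python transcription (indexing assumed 1-based, as in the recurrence):

```python
# Direct transcription of the recurrence:
# F(1) = F(2) = 1, F(k+2) = F(k) + F(k+1).
def fib(n):
    a, b = 1, 1              # F(1), F(2)
    for _ in range(n - 1):
        a, b = b, a + b      # (F(k), F(k+1)) -> (F(k+1), F(k+2))
    return a

print([fib(k) for k in range(1, 9)])  # [1, 1, 2, 3, 5, 8, 13, 21]
```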
  cost = L + N/B   [s]

where L = latency [s], N = number of bytes [byte], and B = bandwidth [byte/s].
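The cost model transcribes to one line of Python; the example numbers below are assumptions for illustration, not measurements.

```python
# Cost model: time to move N bytes over a link with latency L [s]
# and bandwidth B [byte/s].
def transfer_cost(n_bytes, latency_s, bandwidth_bps):
    return latency_s + n_bytes / bandwidth_bps

# Illustrative (assumed) numbers: 1 MB over a 900 MB/s link, 10 us latency.
cost = transfer_cost(1_000_000, 10e-6, 900e6)
print(f"cost = {cost * 1e6:.1f} us")
```

For large messages the N/B term dominates; for small messages the latency L does, which is why many small messages cost far more than one big one.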
[Table: interconnect performance comparison. Infiniband standard (4x HCAs), ~900MB/s, ~10ms; vendors/adapters: QLogic InfiniPath (HT), Mellanox MHGA28-F, Myrinet 10G, Quadrics QM500, Chelsio T210-CX. Sources: http://www.infinibandta.org/ and http://www.mellanox.com/applications/performance_benchma]
Intro. to Parallel Computing – Spring 2007 Concepts of Parallel Computing – A. Wachsmann 37
Synchronization
• "Handshaking" between tasks that are sharing data
• Types of synchronization:
  • Barrier
    • Usually implies that all tasks are involved
    • Each task performs its work until it reaches the barrier. It then stops, or "blocks".
    • When the last task reaches the barrier, all tasks are synchronized.
    • Used in MPI (MPI_Barrier)
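The barrier semantics above can be sketched with Python threads; MPI's MPI_Barrier provides the same behavior across processes.

```python
# Barrier sketch: no task proceeds past barrier.wait() until all
# N_TASKS tasks have reached it.
import threading

N_TASKS = 4
barrier = threading.Barrier(N_TASKS)
passed = []                      # tasks that have passed the barrier
lock = threading.Lock()

def task(rank):
    # ... each task performs its own work here ...
    barrier.wait()               # block until all N_TASKS have arrived
    with lock:
        passed.append(rank)      # runs only after everyone reached the barrier

threads = [threading.Thread(target=task, args=(r,)) for r in range(N_TASKS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(passed))            # [0, 1, 2, 3]
```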
Amdahl's Law:

  speedup = 1 / (P/N + S)

where N = number of processors, P = parallel fraction, and S = 1 - P = serial fraction.
(Figure: http://upload.wikimedia.org/wikipedia/en/7/7a/Amdahl-law.jpg)

Speedup:
      N   P=0.50   P=0.90   P=0.99   P=1.0
     10     1.82     5.26     9.17      10
    100     1.98     9.17    50.25     100
   1000     1.99     9.91    90.99    1000
  10000     1.99     9.99    99.02   10000
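A small sketch that reproduces rows of the table directly from the formula:

```python
# Amdahl's law: speedup = 1 / (P/N + S) with S = 1 - P.
def speedup(P, N):
    return 1.0 / (P / N + (1.0 - P))

# Reproduce the N = 100 row of the table.
for P in (0.50, 0.90, 0.99):
    print(f"P = {P}: speedup = {speedup(P, 100):.2f}")
```

Note how the serial fraction S caps the speedup: even with unlimited processors, speedup can never exceed 1/S (e.g. 10 for P = 0.90).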