Lecture 15

14.4 Non-Blocking Send and Receive, Avoiding Deadlocks
3. One process starts with send followed by receive, the other vice versa
Does this work?
It works OK for small messages only (if sendbuf is smaller than the system message send-buffer).
But what about large messages?
Large messages produce deadlock.
Why? Would using Irecv help?
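Presumably the case under discussion is both processes posting a blocking Send before their Recv. A small message fits into the system buffer and Send returns at once; a large one makes Send block until a matching receive is posted, which never happens while both ranks are stuck in Send. A minimal mpi4py sketch of that pattern (buffer names and the message size are illustrative):

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()              # run with exactly 2 processes
other = 1 - rank
tag = 0

n = 10**7                           # large message, likely to exceed system buffering
sendbuf = np.full(n, rank, dtype=np.float32)
recvbuf = np.empty(n, dtype=np.float32)

# Both ranks block in Send until the data is buffered or received;
# with large n neither ever reaches its Recv: deadlock.
comm.Send([sendbuf, MPI.FLOAT], dest=other, tag=tag)
comm.Recv([recvbuf, MPI.FLOAT], source=other, tag=tag)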
The following is deadlock-free:
if rank == 0:
    request = comm.Isend([sendbuf, MPI.FLOAT], dest=1, tag=tag)
    comm.Recv([recvbuf, MPI.FLOAT], source=1, tag=tag)
    request.Wait()
else:
    request = comm.Isend([sendbuf, MPI.FLOAT], dest=0, tag=tag)
    comm.Recv([recvbuf, MPI.FLOAT], source=0, tag=tag)
    request.Wait()
Is this OK? This produces no deadlock with any system message buffer size: Isend returns immediately, so both processes reach their Recv, and the pending sends can then complete.
Is it deadlock-free? (This is deadlock-free.)
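This also answers the earlier question about Irecv: posting the non-blocking receive first and only then doing the blocking Send is deadlock-free for any message size, because the matching receive already exists when each send starts. A minimal sketch (same two-process setup as above; names illustrative):

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()              # run with exactly 2 processes
other = 1 - rank
tag = 0
sendbuf = np.full(10**7, rank, dtype=np.float32)
recvbuf = np.empty(10**7, dtype=np.float32)

# Post the receive first (non-blocking), then send.
request = comm.Irecv([recvbuf, MPI.FLOAT], source=other, tag=tag)
comm.Send([sendbuf, MPI.FLOAT], dest=other, tag=tag)
request.Wait()                      # complete the receive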
• Data-parallel approach
– Data is divided between processes so that each process does the same operations but with different data
• Control-parallel approach
– Each process has access to all pieces of data, but the processes perform different operations on them
• Data dependencies, e.g.
ax = f
bx + cy = g.
Here y can be computed only after x is known from the first equation; a worked instance follows.
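A tiny sketch of the forced ordering (coefficient values are illustrative):

a, b, c = 2.0, 3.0, 4.0   # illustrative coefficients
f, g = 8.0, 24.0
x = f / a                 # from a*x = f          -> x = 4.0
y = (g - b * x) / c       # from b*x + c*y = g    -> y = 3.0 (needs x first)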
Such requirements are not always easy to meet: some problems are easily parallelisable, others are not.
15.2 Assessing parallel programs
15.2.1 Speedup

S(N, P) := t_seq(N) / t_par(N, P)

• t_par(N, P): time to solve a problem of size N in parallel with P processes
• t_seq(N): time to solve the same problem with the best known sequential algorithm
• 0 < S(N, P) ≤ P
15.2.2 Efficiency
E(N, P) := t_seq(N) / (P · t_par(N, P)).
Presumably, 0 < E(N, P) ≤ 1.
324 // Program Design 15.2 Assessing parallel programs
If σ denotes the inherently sequential fraction of the computation, then (Amdahl's law)

S(N, P) = t_seq / ((σ + (1 − σ)/P) · t_seq) = 1 / (σ + (1 − σ)/P) ≤ 1/σ.

For example, σ = 0.05 caps the speedup at 20, no matter how many processes are used.
• The point is that usually σ is not constant but decreases as N grows
To avoid such problems, the term scaled efficiency is often used:
E_S(N, P) := t_seq(N) / t_par(P · N, P)
• Does the solution time remain the same when the problem size grows with the number of processes?
• 0 < E_S(N, P) ≤ 1
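These definitions translate directly into code; a small sketch for evaluating measured wall-clock timings (function names are illustrative):

def speedup(t_seq, t_par):
    # S(N, P) = t_seq(N) / t_par(N, P)
    return t_seq / t_par

def efficiency(t_seq, t_par, P):
    # E(N, P) = t_seq(N) / (P * t_par(N, P))
    return t_seq / (P * t_par)

def scaled_efficiency(t_seq_N, t_par_PN):
    # E_S(N, P) = t_seq(N) / t_par(P*N, P): the parallel run solves a
    # P-times larger problem; ideally the time stays the same (E_S = 1).
    return t_seq_N / t_par_PN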
16 Parallel preconditioners
16.2 PCG
Recall the preconditioned CG method:
Calculate r(0) = b − A x(0) with given starting vector x(0)
for i = 1, 2, ...
    solve M z(i−1) = r(i−1)            # M^{-1} is called the preconditioner
    ρ_{i−1} = r(i−1)^T z(i−1)
    if i == 1
        p(1) = z(0)
    else
        β_{i−1} = ρ_{i−1} / ρ_{i−2}
        p(i) = z(i−1) + β_{i−1} p(i−1)
    endif
    q(i) = A p(i);  α_i = ρ_{i−1} / (p(i)^T q(i))
    x(i) = x(i−1) + α_i p(i);  r(i) = r(i−1) − α_i q(i)
    check convergence; continue if needed
end
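As a serial reference for the pseudocode above, a minimal NumPy sketch; the preconditioner is passed as a function M_solve that solves M z = r (all names are illustrative):

import numpy as np

def pcg(A, b, M_solve, x0, tol=1e-8, maxiter=500):
    # Preconditioned Conjugate Gradient, following the pseudocode above.
    x = x0.copy()
    r = b - A @ x                        # r(0) = b - A x(0)
    rho_prev, p = None, None
    for i in range(1, maxiter + 1):
        z = M_solve(r)                   # solve M z = r
        rho = r @ z                      # rho_{i-1} = r^T z
        if i == 1:
            p = z.copy()
        else:
            beta = rho / rho_prev        # beta_{i-1} = rho_{i-1} / rho_{i-2}
            p = z + beta * p
        q = A @ p
        alpha = rho / (p @ q)            # alpha_i = rho_{i-1} / p^T q
        x = x + alpha * p
        r = r - alpha * q
        rho_prev = rho
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):   # convergence check
            break
    return x

With M_solve = lambda r: r (identity preconditioner) this reduces to plain CG.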
16.3 Block-Jacobi method
The domain Ω is divided into subdomains Ω_1 and Ω_2, and the preconditioner is the block-diagonal part of A:

M = [ M_1   0  ]
    [  0   M_2 ]

where M_i is the block of A corresponding to the unknowns of Ω_i.
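In code, applying the Block-Jacobi preconditioner amounts to independent solves with the diagonal blocks; a sketch matching the two-subdomain picture above (split is the index at which Ω_2's unknowns start; illustrative):

import numpy as np

def block_jacobi_solve(A, r, split):
    # Solve M z = r with M = diag(M_1, M_2), the block-diagonal part of A.
    z = np.empty_like(r)
    z[:split] = np.linalg.solve(A[:split, :split], r[:split])   # M_1 block
    z[split:] = np.linalg.solve(A[split:, split:], r[split:])   # M_2 block
    return z

Used in the PCG sketch above as M_solve = lambda r: block_jacobi_solve(A, r, split); the two solves are independent, which is what makes the method attractive in parallel.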
16.5 Domain Decomposition Method (DDM)

DDM classification:
• Non-overlapping methods
– Block-Jacobi method
– Additive Average method
• Overlapping methods
Example: a 9 × 9 grid of 81 nodes,

 1  2  3  4  5  6  7  8  9
10 11 12 13 14 15 16 17 18
19 20 21 22 23 24 25 26 27
28 29 30 31 32 33 34 35 36
37 38 39 40 41 42 43 44 45
46 47 48 49 50 51 52 53 54
55 56 57 58 59 60 61 62 63
64 65 66 67 68 69 70 71 72
73 74 75 76 77 78 79 80 81

partitioned into a 3 × 3 array of subdomains:

Ω1 Ω2 Ω3
Ω4 Ω5 Ω6
Ω7 Ω8 Ω9
The Additive Schwarz preconditioner is defined as

M_AS^{-1} := Σ_{i=1}^{P} M_i^{-1}
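With each M_i^{-1} realised as a local solve on subdomain i (restriction to an index set, local solve, extension by zero; the index sets may overlap), the definition becomes, as a sketch with illustrative names:

import numpy as np

def additive_schwarz_solve(A, r, index_sets):
    # M_AS^{-1} r = sum over subdomains of R_i^T A_i^{-1} R_i r:
    # restrict r to subdomain i, solve with the local block, extend by zero.
    z = np.zeros_like(r)
    for idx in index_sets:              # idx: node indices of subdomain i
        A_i = A[np.ix_(idx, idx)]
        z[idx] += np.linalg.solve(A_i, r[idx])
    return z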
16.6 Multilevel methods
• The coarse-grid matrix A_0 can be assembled
  – analytically, or
  – through the formula A_0 = R_0 A R_0^T
• Define
  M_0^{-1} := R_0^T A_0^{-1} R_0 = R_0^T (R_0 A R_0^T)^{-1} R_0
• If the fine grid has uneven density, the coarse grid must adapt to the fine grid, in terms of the number of fine-grid nodes in each coarse cell
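The definition of M_0^{-1} translates directly into code; a sketch with the restriction R_0 stored as a dense matrix for illustration:

import numpy as np

def coarse_solve(A, r, R0):
    # M_0^{-1} r = R0^T (R0 A R0^T)^{-1} R0 r  (coarse-grid correction)
    A0 = R0 @ A @ R0.T                  # A_0 assembled through the formula
    return R0.T @ np.linalg.solve(A0, R0 @ r)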
• the condition number κ(B) of the Additive Schwarz method does not depend on the discretisation parameter n.
16.7 Parallelising PCG

Two operations of PCG need communication:
• scalar products
• the Ax-operation
The first is easily done with MPI_ALLREDUCE (attention is needed only in the case of overlaps); a sketch follows.
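A distributed scalar product in mpi4py, as a sketch (each process holds its local slice of the two vectors; names are illustrative):

from mpi4py import MPI
import numpy as np

def pdot(comm, x_local, y_local):
    # Local partial dot product, then a global sum over all processes.
    local = np.array([x_local @ y_local])
    total = np.zeros(1)
    comm.Allreduce(local, total, op=MPI.SUM)
    return total[0]

With overlapping subdomains the shared entries must be weighted so they are not summed more than once; this is the attention mentioned above.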
Parallelising the Ax-operation depends on the existence of overlap. In the case of no overlap (as in the Block-Jacobi method), the technique of shadow nodes is usually used:
[Figure: a subdomain's local nodes together with an added layer of shadow (ghost) nodes, which hold copies of boundary values owned by neighbouring processes]
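A 1-D sketch of the shadow-node update: each process keeps one extra node at each end of its local array and refreshes these copies from its neighbours before every local matrix-vector product (illustrative; u is the local NumPy array with u[0] and u[-1] as shadows):

from mpi4py import MPI
import numpy as np

def update_shadow(comm, u):
    # u[1:-1] is owned by this process; u[0] and u[-1] are shadow nodes.
    rank, size = comm.Get_rank(), comm.Get_size()
    left = rank - 1 if rank > 0 else MPI.PROC_NULL
    right = rank + 1 if rank < size - 1 else MPI.PROC_NULL
    # Send own edge values, receive neighbours' edges into the shadows.
    comm.Sendrecv(u[1:2], dest=left, recvbuf=u[-1:], source=right)
    comm.Sendrecv(u[-2:-1], dest=right, recvbuf=u[0:1], source=left)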