
Elective Course 4: CSE 327

High Performance Computing (HPC)

Dr. Marwa A. Radad

Lecture 9: Introduction to MPI + MPI communication


Course Outline:
✓ Introduction to High Performance Computing (HPC)
✓ Architecture of parallel computing
✓ Shared memory and Distributed memory in parallel computing
✓ Parallel algorithms
✓ Performance of parallel systems
✓ Introduction to OpenMP
✓ Essentials of OpenMP programming
✓ Introduction to MPI and distributed memory programming
✓ Communication using MPI
Basics of MPI
The message passing paradigm

Distributed-memory architecture:
▪ Each process(or) can only access its dedicated address space
▪ No global shared address space, even if running on a shared-memory machine
▪ Data exchange and communication between processes is done by explicitly passing messages through a communication network

Message passing library:
▪ Should be flexible, efficient and portable
▪ Hides communication hardware and software layers from the application developer

The message passing paradigm
▪ Widely accepted standard in HPC / numerical simulation: the Message Passing Interface (MPI)
▪ Process-based approach: all variables are local!
▪ Same program runs on each processor/machine (SPMD)
▪ The program is written in a sequential language (Fortran/C[++])
▪ Data exchange between processes: send/receive messages via MPI library calls
▪ No automatic workload distribution



The MPI standard
▪ The MPI Forum defines the MPI standard / library subroutine interfaces
▪ Latest standard: MPI 3.1 (2015), 868 pages
▪ MPI 4.0 under development
▪ Successful free implementations (MPICH, mvapich, OpenMPI) and vendor libraries (Intel, Cray, HP, …)
▪ Documents: http://www.mpi-forum.org/



MPI goals and scope
▪ Portability is the main goal: architecture- and hardware-independent code
▪ Fortran and C interfaces (C++ interface deprecated)
▪ Features for supporting parallel libraries
▪ Support for heterogeneous environments (e.g., clusters with compute nodes of different architectures)

[Figure: software stack with the Application on top of MPI, on top of drivers, on top of the hardware]



MPI in a nutshell
The beginner’s MPI toolbox

Architecture
▪ Operating system view: running processes
▪ Developer’s view: library routines (>500 calls, about 20 essential) for
  ▪ coordination
  ▪ communication
  ▪ synchronization
▪ User’s view: the MPI execution environment provides
  ▪ resource allocation
  ▪ parallel program startup
  ▪ other (implementation-dependent) behavior

[Figure: layered view: Application / User on top of the MPI execution environment (library calls), on top of the MPI library, on top of threads / processes / CPUs / network hardware]
Parallel execution in MPI

Program startup
▪ Processes run throughout program execution
▪ MPI startup mechanism:
  ▪ launches tasks/processes
  ▪ establishes the communication context (“communicator”)
▪ MPI point-to-point communication:
  ▪ between pairs of tasks/processes
▪ MPI collective communication:
  ▪ between all processes or a subgroup
  ▪ barrier, reductions, scatter/gather

Program shutdown
▪ Clean shutdown by MPI

C and Fortran interfaces for MPI
▪ Required header files:
  ▪ C: #include <mpi.h>
▪ Library calls:
  ▪ C: error = MPI_Xxxx(...);
    e.g., int MPI_Init(int *argc, char ***argv);
▪ MPI constants (global/common): all upper case in C, e.g., MPI_COMM_WORLD
▪ Arrays:
  ▪ C: indexed from 0



MPI error handling
▪ C routines return an int (may be ignored)
▪ Return value MPI_SUCCESS indicates that all is fine
▪ Default: abort the parallel computation in case of other return values

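A minimal sketch (not part of the original slide) of how return codes can actually be inspected: by default MPI aborts on error, so the example first installs the standard MPI_ERRORS_RETURN error handler on MPI_COMM_WORLD and then translates a possible error code with MPI_Error_string.

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    /* Replace the default "abort on error" handler so that
       erroneous calls return an error code to the caller */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    int err = MPI_Barrier(MPI_COMM_WORLD);
    if (err != MPI_SUCCESS) {
      char msg[MPI_MAX_ERROR_STRING];
      int len;
      MPI_Error_string(err, msg, &len);   /* human-readable error text */
      fprintf(stderr, "MPI error: %s\n", msg);
      MPI_Abort(MPI_COMM_WORLD, err);
    }

    MPI_Finalize();
    return 0;
  }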


Initialization and finalization
▪ Details of MPI startup are implementation defined
▪ First call in an MPI program: initialization of the parallel machine

  int MPI_Init(int *argc, char ***argv);

▪ Last call: clean shutdown of the parallel machine

  int MPI_Finalize();

  Only the “master” process is guaranteed to continue after finalize

▪ Stdout/stderr of each MPI process
  ▪ usually redirected to the console where the program was started
  ▪ many options possible, depending on the implementation
World communicator and rank
▪ MPI_Init() defines the “communicator” MPI_COMM_WORLD comprising all processes

[Figure: MPI_COMM_WORLD containing eight processes with ranks 0–7]



Communicator and rank
▪ Communicator defines a set of processes (MPI_COMM_WORLD: all)
▪ The rank identifies each process within a communicator
▪ Obtain rank:
int rank;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
▪ rank = 0,1,2,…, (number of processes in communicator – 1)
▪ One process may have different ranks if it belongs to different
communicators

▪ Obtain number of processes in communicator:


int size;
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI “Hello World!” in C

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv) {
    int rank, size;

    // Never forget: &argc and &argv are pointers to the original variables!
    MPI_Init(&argc, &argv);

    // A communicator is required for (almost) all MPI calls
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    printf("Hello World! I am %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
  }



Compiling and running the code
▪ Compiling/linking
  ▪ Headers and libs must be found by the compiler
  ▪ Most implementations provide wrapper scripts, e.g., mpicc / mpiCC
  ▪ These behave like normal compilers/linkers
▪ Running
  ▪ Details are implementation specific
  ▪ Startup wrappers: mpirun, mpiexec

  $ mpiCC -o hello hello.cc
  $ mpirun -np 4 ./hello
  Hello World! I am 3 of 4
  Hello World! I am 1 of 4
  Hello World! I am 0 of 4
  Hello World! I am 2 of 4
MPI point-to-point communication
Point-to-point communication: message envelope
▪ Which process is sending the message?
▪ Where is the data on the sending process?
▪ What kind of data is being sent?
▪ How much data is there?
▪ Which process is receiving the message?
▪ Where should the data be left on the receiving process?
▪ How much data is the receiving process prepared to accept?

▪ Sender and receiver must pass their information to MPI separately

MPI point-to-point communication
▪ Processes communicate by sending and receiving messages
▪ An MPI message is an array of elements of a particular data type

[Figure: a message travelling from rank i (sender) to rank j (receiver)]

▪ Data types
  ▪ Basic (predefined)
  ▪ MPI derived types



Predefined data types in MPI (selection)
▪ Data type matching: the same type is required in the send call and in the receive call

[Table: selected MPI predefined data types and the corresponding C types, e.g., MPI_INT ↔ int, MPI_DOUBLE ↔ double; MPI_BYTE represents 8 binary digits]



MPI blocking point-to-point communication
▪ Point-to-point: one sender, one receiver
  ▪ identified by rank
▪ Blocking: after the MPI call returns,
  ▪ the source process can safely modify the send buffer
  ▪ the receive buffer (on the destination process) contains the entire message



Standard blocking send

  int MPI_Send(const void* buf, int count,
               MPI_Datatype datatype, int dest, int tag,
               MPI_Comm comm);

  buf       address of send buffer
  count     # of elements
  datatype  MPI data type
  dest      destination rank
  tag       message tag
  comm      communicator

At completion
▪ The send buffer can be reused as you see fit
▪ The status of the destination is unknown – the message could be anywhere
Buffered send

  MPI_Bsend(buf, count, datatype, dest, tag, comm);

▪ The (blocking) send completes as soon as the message has been copied into a local buffer
▪ Predictable behavior, no synchronization with the receiver
▪ Caveat: comes at the cost of additional copy operations

[Figure: rank 0 calls Bsend; MPI copies the message and the call completes; the MPI layers transfer the message while rank 1 blocks in Recv until it arrives]



Synchronous send

  MPI_Ssend(buf, count, datatype, dest, tag, comm);

▪ Synchronization of source and destination: Ssend completes only after the matching receive has been posted at the destination
▪ Predictable & safe behavior
▪ Problems:
  ▪ Performance: high latency, risk of serialization
  ▪ Source of potential deadlocks
▪ But: can be used for debugging

[Figure: rank 0 blocks in Ssend until rank 1 posts the matching Recv; only then is the message transferred and both calls complete]



Standard blocking receive

  int MPI_Recv(void* buf, int count,
               MPI_Datatype datatype, int source,
               int tag, MPI_Comm comm, MPI_Status *status);

  buf       address of receive buffer
  count     # of elements that fit into the receive buffer
  datatype  MPI data type
  source    sending process rank
  tag       message tag
  comm      communicator
  status    address of status object

At completion
▪ The message has been received successfully
▪ The message length, and possibly the tag and the sender, are still unknown (see the status object)
Source and tag wildcards
▪ MPI_Recv accepts wildcards for the source and tag arguments: MPI_ANY_SOURCE, MPI_ANY_TAG
▪ The actual source and tag values are available in the status object:

  MPI_Status s;
  MPI_Recv(buf, count, datatype, MPI_ANY_SOURCE,
           MPI_ANY_TAG, MPI_COMM_WORLD, &s);
  printf("Received from rank %d with tag %d\n",
         s.MPI_SOURCE, s.MPI_TAG);



Received message length

  int MPI_Get_count(const MPI_Status *status,
                    MPI_Datatype datatype, int *count);

  status    address of status object
  datatype  MPI data type
  count     address of element count variable

▪ Determines the number of elements received:

  int count;
  MPI_Get_count(&s, MPI_DOUBLE, &count);

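A short sketch combining a wildcard receive with MPI_Get_count (assuming MPI has been initialized as in the Hello World example; the buffer size MAXLEN is a hypothetical upper bound chosen by the receiver):

  /* Receive up to MAXLEN doubles from any rank with any tag,
     then query the status object for the actual message length */
  #define MAXLEN 1024
  double buf[MAXLEN];
  MPI_Status s;
  int nrecv;

  MPI_Recv(buf, MAXLEN, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
           MPI_COMM_WORLD, &s);
  MPI_Get_count(&s, MPI_DOUBLE, &nrecv);
  printf("Received %d doubles from rank %d\n", nrecv, s.MPI_SOURCE);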


Requirements for point-to-point communication
For a communication to succeed:
▪ the sender must specify a valid destination rank
▪ the receiver must specify a valid source rank (or MPI_ANY_SOURCE)
▪ the communicator must match (e.g., MPI_COMM_WORLD)
▪ the tags must match (or MPI_ANY_TAG on the receiver)
▪ the message data types must match
▪ the receiver’s buffer must be large enough

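A self-contained sketch of a matched pair of calls that satisfies all of these requirements (rank 0 sends four doubles with tag 99 to rank 1; the values and the tag are arbitrary illustration choices; run with at least 2 processes):

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv) {
    int rank, size;
    double buf[4] = {1.0, 2.0, 3.0, 4.0};

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (size >= 2) {
      if (rank == 0) {
        /* destination 1, tag 99, MPI_COMM_WORLD: must match the receive */
        MPI_Send(buf, 4, MPI_DOUBLE, 1, 99, MPI_COMM_WORLD);
      } else if (rank == 1) {
        MPI_Status st;
        double rbuf[4];   /* large enough for the incoming message */
        MPI_Recv(rbuf, 4, MPI_DOUBLE, 0, 99, MPI_COMM_WORLD, &st);
        printf("Rank 1 got %f ... %f\n", rbuf[0], rbuf[3]);
      }
    }
    MPI_Finalize();
    return 0;
  }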


Beginner’s MPI toolbox
▪ Basic point-to-point communication and support functions:
▪ MPI_Init() let's get going
▪ MPI_Comm_size() how many are we?
▪ MPI_Comm_rank() who am I?
▪ MPI_Send() send data to someone else
▪ MPI_Recv() receive data from some-/anyone
▪ MPI_Get_count() how many items have I received?
▪ MPI_Finalize() finish off

▪ Send/receive buffer may safely be reused after the call has completed
▪ MPI_Send() must have a specific target/tag, MPI_Recv() does not
▪ So far no explicit synchronization!
Example: parallel integration in MPI

Task: calculate ∫_a^b f(x) dx using the (existing) function integrate(x,y)
▪ Split up the interval [a,b] into equal disjoint chunks
▪ Compute partial results in parallel
▪ Collect the global sum at rank 0

  MPI_Init(&argc, &argv);
  MPI_Status status;
  int rank, size;
  double mya, myb, psum;
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  // integration limits
  double a=0., b=2., res=0., tmp;
  // limits for "me"
  mya = a + rank * (b-a)/size;
  myb = mya + (b-a)/size;
  // integrate f(x) over my own chunk
  psum = integrate(mya, myb);

  // rank 0 collects partial results
  if (0 == rank) {
    res = psum;                  // local result
    for (int i = 1; i < size; ++i) {
      MPI_Recv(&tmp,             // receive buffer
               1,                // array length
               MPI_DOUBLE,       // data type
               i,                // rank of source
               0,                // tag (unused here)
               MPI_COMM_WORLD,
               &status);         // status object
      res += tmp;
    }
    printf("Result: %f\n", res);
  } else {                       // ranks != 0 send results to rank 0
    MPI_Send(&psum,              // send buffer
             1,                  // message length
             MPI_DOUBLE,         // data type
             0,                  // rank of destination
             0,                  // tag (unused here)
             MPI_COMM_WORLD);
  }
  MPI_Finalize();
Remarks on the parallel integration example
▪ Gathering results from processes is a very common task in MPI; there are more efficient and elegant ways to do this (see later).
▪ This is a reduction operation (summation). There are more efficient and elegant ways to do this (see later).
▪ The “master” process waits for one receive operation to complete before the next one is initiated. There are more efficient ways…



Remarks on the parallel integration example
▪ “Master-worker” schemes are quite common in MPI programming, but scalability to high process counts may be limited.
▪ Every process has its own res variable, but only the master process actually uses it → it is typical for MPI codes to use more memory than actually needed.



Some useful MPI calls
▪ double MPI_Wtime();
  Returns the current wall-clock time stamp
▪ double MPI_Wtick();
  Returns the resolution of the timer
▪ int MPI_Abort(MPI_Comm comm, int errorcode);
  ▪ “Best effort” attempt to abort all tasks in the communicator and deliver the error code to the calling environment
  ▪ This is a last resort; if possible, shut down the program via MPI_Finalize()

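A small usage sketch (assuming MPI has already been initialized): MPI_Wtime() brackets the code section of interest, and MPI_Wtick() reports the timer resolution.

  double t0 = MPI_Wtime();
  /* ... code section to be timed ... */
  double t1 = MPI_Wtime();
  printf("Section took %f s (timer resolution: %g s)\n", t1 - t0, MPI_Wtick());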


Introduction to collectives in MPI
Collectives in MPI
Collectives: operations involving all ranks of a communicator
▪ All ranks must call the function!
▪ Blocking variants: the buffer can be reused after return
▪ Nonblocking variants exist (since MPI 3.0)
▪ Collectives may or may not synchronize the processes
▪ Collectives cannot interfere with point-to-point communication
  ▪ completely separate modes of operation!



Collectives in MPI
▪ Rules for all collectives
  ▪ Data type matching
  ▪ No tags
  ▪ Count must be exact, i.e., there is only one message length; the buffer must be large enough
▪ Types:
  ▪ Synchronization (barrier)
  ▪ Data movement (broadcast, scatter, gather, all-to-all)
  ▪ Collective computation (reduction)
  ▪ Combinations of data movement and computation (reduction + broadcast)
▪ General assumption: MPI does a better job at collectives than you trying to emulate them with point-to-point calls



Global communication

Barrier
▪ Explicit synchronization of all ranks of the specified communicator

  MPI_Barrier(comm);

▪ Ranks only return from the call after every rank has called the function
▪ MPI_Barrier() is rarely needed
  ▪ Debugging

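When it is needed, a common use is making timings comparable across ranks; a minimal sketch (do_work() is a hypothetical local routine, rank obtained as before):

  MPI_Barrier(MPI_COMM_WORLD);        /* all ranks start timing together */
  double t0 = MPI_Wtime();
  do_work();                          /* hypothetical local computation */
  MPI_Barrier(MPI_COMM_WORLD);        /* wait for the slowest rank */
  if (rank == 0)
    printf("Elapsed: %f s\n", MPI_Wtime() - t0);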


Broadcast
▪ Send buffer contents from one rank (“root”) to all ranks

  MPI_Bcast(buf, count, datatype, root, comm);

▪ No restrictions on which rank is root; often rank 0

Example (root = 1, count = 3, type MPI_INT):

  rank     0        1 (root)   2        3
  buffer            1 2 3

  MPI_Bcast(buffer, 3, MPI_INT, 1, MPI_COMM_WORLD);

  buffer   1 2 3    1 2 3      1 2 3    1 2 3

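A minimal sketch (rank obtained as in the Hello World example; the parameter name nsteps is just an illustration): only the root holds the value before the call, every rank holds it afterwards.

  int nsteps = 0;
  if (rank == 0) nsteps = 1000;               /* only root knows the value */
  MPI_Bcast(&nsteps, 1, MPI_INT, 0, MPI_COMM_WORLD);
  /* now nsteps == 1000 on every rank */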


Scatter
▪ Send the i-th chunk of an array to the i-th rank

  MPI_Scatter(sendbuf, sendcount, sendtype,
              recvbuf, recvcount, recvtype,
              root, comm);

▪ In general, sendcount = recvcount
  ▪ This is the length of one chunk
▪ sendbuf is ignored on non-root ranks because there is nothing to send



Scatter (example: one MPI_INT per rank)

  rank               0     1     2     3
  sendbuf (on root)  1 2 3 4
  recvbuf

  MPI_Scatter(sendbuf, 1, MPI_INT, recvbuf, 1, MPI_INT,
              root, MPI_COMM_WORLD);

  sendbuf (on root)  1 2 3 4
  recvbuf            1     2     3     4

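A corresponding code sketch (assuming rank and size have been obtained and <stdlib.h> is included for malloc): rank 0 owns the full array and each rank receives one element.

  int *sendbuf = NULL;
  int recvval;

  if (rank == 0) {                       /* only the root needs a send buffer */
    sendbuf = malloc(size * sizeof(int));
    for (int i = 0; i < size; ++i)
      sendbuf[i] = i + 1;                /* chunk i goes to rank i */
  }
  MPI_Scatter(sendbuf, 1, MPI_INT, &recvval, 1, MPI_INT,
              0, MPI_COMM_WORLD);
  printf("Rank %d received %d\n", rank, recvval);
  if (rank == 0) free(sendbuf);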


Gather
▪ Receive a message from each rank and place the i-th rank’s message at the i-th position in the receive buffer

  MPI_Gather(sendbuf, sendcount, sendtype,
             recvbuf, recvcount, recvtype,
             root, comm);

▪ In general, sendcount = recvcount
▪ recvbuf is ignored on non-root ranks because there is nothing to receive



Gather (example: one MPI_INT per rank)

  rank               0     1     2     3
  sendbuf            1     2     3     4
  recvbuf (on root)

  MPI_Gather(sendbuf, 1, MPI_INT, recvbuf, 1, MPI_INT,
             root, MPI_COMM_WORLD);

  sendbuf            1     2     3     4
  recvbuf (on root)  1 2 3 4

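The corresponding code sketch (same assumptions as for the scatter example): every rank contributes one element and rank 0 collects them in rank order.

  int myval = rank + 1;                  /* the element contributed by this rank */
  int *recvbuf = NULL;

  if (rank == 0) recvbuf = malloc(size * sizeof(int));   /* only root receives */
  MPI_Gather(&myval, 1, MPI_INT, recvbuf, 1, MPI_INT,
             0, MPI_COMM_WORLD);
  if (rank == 0) {
    for (int i = 0; i < size; ++i)
      printf("recvbuf[%d] = %d\n", i, recvbuf[i]);       /* 1 2 3 ... size */
    free(recvbuf);
  }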


Allgather
▪ Combination of gather and broadcast

  MPI_Allgather(sendbuf, sendcount, sendtype,
                recvbuf, recvcount, recvtype,
                comm);

▪ Why not just use a gather followed by a broadcast instead?
  ▪ The MPI library has more options for optimization
  ▪ General assumption: combined collectives are faster than using separate ones



Allgather (example: one MPI_INT per rank, no root required)

  rank        0          1          2          3
  sendbuf     0          1          2          3
  sendcount   1          1          1          1
  recvcount   1          1          1          1

  MPI_Allgather(...)

  recvbuf     0 1 2 3    0 1 2 3    0 1 2 3    0 1 2 3

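A code sketch for the same situation (assumptions as before): after the call every rank holds the complete array, so no root argument is needed.

  int myval = rank;                       /* each rank contributes its rank number */
  int *all = malloc(size * sizeof(int));

  MPI_Allgather(&myval, 1, MPI_INT, all, 1, MPI_INT, MPI_COMM_WORLD);
  /* all[i] == i on every rank now */
  printf("Rank %d: all[%d] = %d\n", rank, size - 1, all[size - 1]);
  free(all);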


Alltoall
▪ MPI_Alltoall: every rank sends its i-th chunk to the i-th rank

  MPI_Alltoall(sendbuf, sendcount, sendtype,
               recvbuf, recvcount, recvtype,
               comm);



Alltoall (example: one MPI_INT per chunk, no root required)

  rank        0           1           2            3
  sendbuf     0 1 2 3     4 5 6 7     8 9 10 11    12 13 14 15
  sendcount   1           1           1            1
  recvcount   1           1           1            1

  MPI_Alltoall(...)

  recvbuf     0 4 8 12    1 5 9 13    2 6 10 14    3 7 11 15

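A code sketch reproducing the figure (assumptions as before): element j of rank i's send buffer ends up as element i of rank j's receive buffer.

  int *sendbuf = malloc(size * sizeof(int));
  int *recvbuf = malloc(size * sizeof(int));

  for (int j = 0; j < size; ++j)
    sendbuf[j] = rank * size + j;        /* same values as in the figure */
  MPI_Alltoall(sendbuf, 1, MPI_INT, recvbuf, 1, MPI_INT, MPI_COMM_WORLD);
  /* on rank r: recvbuf[i] == i * size + r */
  free(sendbuf);
  free(recvbuf);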


Global operations

Global operations: reduction
▪ Compute results over distributed data

  MPI_Reduce(sendbuf, recvbuf, count,
             datatype, MPI_Op op, root, comm);

▪ The result in recvbuf is only available on the root process
▪ The operation is performed elementwise on all count elements of the arrays
▪ If all ranks require the result, use MPI_Allreduce()
▪ If the 12 predefined operations are not enough, use MPI_Op_create / MPI_Op_free to create your own

Example (count = 4, op = MPI_MAX):

  rank   sendbuf
  0      0 9 2 6
  1      5 1 0 4
  2      8 3 4 5
  3      1 0 6 8

  MPI_Reduce()  →  recvbuf on root: 8 9 6 8   (elementwise maximum)
Global operations – predefined operators

  Name         Operation           Name         Operation
  MPI_SUM      Sum                 MPI_PROD     Product
  MPI_MAX      Maximum             MPI_MIN      Minimum
  MPI_LAND     Logical AND         MPI_BAND     Bit-AND
  MPI_LOR      Logical OR          MPI_BOR      Bit-OR
  MPI_LXOR     Logical XOR         MPI_BXOR     Bit-XOR
  MPI_MAXLOC   Maximum+Position    MPI_MINLOC   Minimum+Position

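A short sketch of MPI_Allreduce with MPI_SUM (assuming rank has been obtained; psum stands in for some locally computed partial result): every rank ends up with the global sum, so no root argument is needed.

  double psum = 1.0 / (rank + 1);        /* stand-in for a local partial result */
  double total;

  MPI_Allreduce(&psum, &total, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
  printf("Rank %d sees total = %f\n", rank, total);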


Example: parallel integration in MPI using collective operations

  MPI_Init(&argc, &argv);
  int rank, size;
  double mya, myb, psum;
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  // integration limits
  double a=0., b=2., res=0.;
  // limits for "me"
  mya = a + rank * (b-a)/size;
  myb = mya + (b-a)/size;
  // integrate f(x) over my own chunk
  psum = integrate(mya, myb);

  /* everyone calls reduce; data is taken from all ranks and
     the result ends up in root's res only */
  MPI_Reduce(&psum,          // send buffer
             &res,           // receive buffer
             1,              // array length
             MPI_DOUBLE,     // data type
             MPI_SUM,        // MPI operation
             0,              // root
             MPI_COMM_WORLD);

  if (0 == rank) printf("Result: %f\n", res);
  MPI_Finalize();

Summary of beginner’s MPI toolbox
▪ Starting up and shutting down the “parallel program” with MPI_Init()
and MPI_Finalize()
▪ MPI task (“process”) identified by rank (MPI_Comm_rank())
▪ Number of MPI tasks: MPI_Comm_size()
▪ Startup process is very implementation dependent
▪ Simple, blocking point-to-point communication with MPI_Send() and
MPI_Recv()
▪ “Blocking” == buffer can be reused as soon as call returns
▪ Message matching

▪ Timing functions



Summary of MPI collective communication
▪ MPI (blocking) collectives
▪ All ranks in communicator must call the function

▪ Communication and synchronization


▪ Barrier, broadcast, scatter, gather, there are more…

▪ Global operations
▪ Reduce, allreduce, there are more…

