10 OpenMP-2
Parallel Programming Techniques and Applications Using
Networked Workstations and Parallel Computers
Chapter 8
Dr. Hamdan Abdellatef
OpenMP
Hands-on tutorial by Tim Mattson:
https://www.youtube.com/playlist?list=PLLX-Q6B8xqZ8n8bwjGdzBJ25X2utwnoEG
https://colab.research.google.com/gist/hamdanabdellatef/033338741f820332dfb61e1116c86fc8/openmp-patterns.ipynb
Recall
▪ Decompose the original problem into tasks, identifying the shared and local data of each task.
Each unit of execution then runs the same SPMD program (the original figure shows this program replicated once per process):

Program SPMD_Emb_Par ()
{
   TYPE *tmp, *func();
   global_array Data(TYPE);
   global_array Res(TYPE);
   int Num = get_num_procs();        // number of units of execution
   int id  = get_proc_id();          // this unit's id
   if (id == 0) setup_problem(N, Data);
   for (int I = id; I < N; I = I + Num) {   // cyclic distribution of the N tasks
      tmp = func(I, Data);
      Res.accumulate(tmp);
   }
}
OpenMP
▪ An accepted standard developed in the late 1990s by a group of industry specialists.
▪ Consists of a small set of compiler directives, augmented with a small set of library routines and environment variables, with Fortran and C/C++ as the base languages.
▪ The compiler directives can specify such things as parallel regions and parallel for loops.
OpenMP
▪ For C/C++, the OpenMP directives are contained in #pragma statements. The OpenMP #pragma statements have the format:

#pragma omp directive_name [clauses ...]

▪ There may be additional parameters (clauses) after the directive name to select different options.
▪ Some directives require code to be specified in a structured block (a statement or statements) that follows the directive; the directive and structured block together form a "construct".
OpenMP
▪ OpenMP uses a thread-based "fork-join" model.
▪ Initially, a single thread, the master thread, executes.
▪ Parallel regions (sections of code) can be executed by multiple threads (a team of threads).

[Figure: the master thread forks a team of threads at each parallel region; the team joins back into the master thread at the end of the region.]
▪ The parallel directive creates a team of threads, with a specified block of code executed by the multiple threads in parallel.
▪ Other directives are used within a parallel construct to specify parallel for loops and different blocks of code for the threads.
Parallel Directive
#pragma omp parallel
structured_block
▪ Creates multiple threads, each one executing the specified structured_block: either a single statement or a compound statement created with { ... }, with a single entry point and a single exit point.
Simple Example
▪ omp_get_num_threads() returns the number of threads currently in the team
▪ omp_get_thread_num() returns the calling thread's number (from 0 to one less than the team size)
▪ The array a[] is a global (shared) array
▪ x and num_threads are declared private to the threads.
Number of threads in a team
▪ Established by either:
▪ a num_threads clause on the parallel directive,
▪ the omp_set_num_threads() library routine, or
▪ the OMP_NUM_THREADS environment variable.
▪ The number of threads available can also be altered automatically to achieve the best use of system resources by a "dynamic adjustment" mechanism.
Work-Sharing
▪ Three constructs in this classification:
sections
for
single
▪ In all cases, there is an implicit barrier at the end of the construct unless a nowait clause is included.
▪ Note that these constructs do not start a new team of threads; that is done by an enclosing parallel construct.
Sections
▪ The sections construct is a non-iterative work-sharing construct:

#pragma omp sections
{
#pragma omp section
structured_block
#pragma omp section
structured_block
...
}

▪ Each section's structured block is executed once, by one of the threads in the team.
For Loop
#pragma omp for
for_loop
▪ Causes the for loop to be divided into parts, with the parts shared among the threads in the team.
▪ The for loop must be of a simple (canonical) form.
▪ The way the for loop is divided can be specified by an additional schedule clause.
▪ Example: the clause schedule(static, chunk_size) causes the for loop to be divided into chunks of chunk_size iterations, allocated to the threads in round-robin fashion.
Single
▪ The directive

#pragma omp single
structured_block

causes the structured_block to be executed by just one thread in the team (not necessarily the master thread); the other threads wait at the implicit barrier at the end of the construct unless a nowait clause is included.
Combined Parallel Work-sharing Constructs
▪ If a parallel directive is followed by a single for directive, the two can be combined into #pragma omp parallel for, with similar effect.
▪ If a parallel directive is followed by a single sections directive, the two can be combined into #pragma omp parallel sections, with similar effect.
Master Directive
▪ The master directive:
#pragma omp master
structured_block
▪ Differs from the directives in the work-sharing group in that there is no implied barrier at the end of the construct (nor at the beginning).
▪ Threads other than the master encountering this directive will ignore it and the associated structured block, and will move on.
Hello OpenMP
#include <iostream>
#include <omp.h>
using namespace std;
int main()
{
    omp_set_num_threads(4);
    #pragma omp parallel
    {
        int ID = omp_get_thread_num();
        cout << "Hello World!" << endl;   // output from the 4 threads may interleave
        cout << ID << endl;
    }
    return 0;
}
Example: Calculating Pi
▪ We know that:

π = ∫₀¹ 4 / (1 + x²) dx

▪ Approximating the integral with num_steps midpoint rectangles of width step = 1/num_steps, where xᵢ = (i − 0.5)·step:

π ≈ Σᵢ 4 / (1 + xᵢ²) · step
#include <cstdio>
#include <omp.h>

#define NUM_THREADS 2

int main()
{
    long num_steps = 100000;               // number of rectangles (value chosen for illustration)
    double step = 1.0 / (double)num_steps;
    double pi, sum = 0.0;
    double start_time, run_time;

    start_time = omp_get_wtime();
    omp_set_num_threads(NUM_THREADS);
    #pragma omp parallel
    {
        double x;                          // private: declared inside the region
        #pragma omp for reduction(+:sum)
        for (long i = 1; i <= num_steps; i++) {
            x = (i - 0.5) * step;          // midpoint of interval i
            sum += 4.0 / (1.0 + x * x);
        }
    }
    pi = step * sum;
    run_time = omp_get_wtime() - start_time;
    printf("\n pi with %ld steps is %lf in %lf seconds\n", num_steps, pi, run_time);
    return 0;
}
Synchronization Constructs
Critical
▪ The critical directive will only allow one thread at a time to execute the associated structured block. When one or more threads reach the critical directive:
▪ they will wait until no other thread is executing the same critical section (one with the same name), and then one thread will proceed to execute the structured block.
▪ The name is optional.
▪ All critical sections with no name map to one unspecified name.
Barrier
▪ When a thread reaches the barrier:
▪ it waits until all threads have reached the barrier, and then they all proceed together.
▪ There are restrictions on the placement of the barrier directive in a program. In particular, all threads must be able to reach the barrier.
Atomic
▪ The atomic directive

#pragma omp atomic
expression_statement

▪ implements a critical section efficiently when the critical section simply updates a variable (adds one, subtracts one, or does some other simple arithmetic operation as defined by expression_statement).
Ordered
▪ Used in conjunction with for and parallel for directives to cause the marked part of an iteration to be executed in the order in which it would have occurred if the loop had been written sequentially.
#include <stdio.h>
#include <omp.h>
#define MAX_THREADS 4