
Programming with Shared Memory
Parallel Programming Techniques and Applications Using
Networked Workstations and Parallel Computers
Chapter 8
Dr. Hamdan Abdellatef
OpenMP
Hands-on tutorial by Tim Mattson:
https://www.youtube.com/playlist?list=PLLX-Q6B8xqZ8n8bwjGdzBJ25X2utwnoEG

https://colab.research.google.com/gist/hamdanabdellatef/033338741f820332dfb61e1116c86fc8/openmp-patterns.ipynb
Recall

Original Problem → Decompose into tasks → Tasks, shared and local data → Code with a parallel Prog. Env. → Corresponding source code → Units of execution + new shared data for extracted dependencies

Program SPMD_Emb_Par ()
{
TYPE *tmp, *func();
global_array Data(TYPE);
global_array Res(TYPE);
int Num = get_num_procs();
int id = get_proc_id();
if (id==0) setup_problem(N, Data);
for (int I = id; I < N; I = I + Num) {
tmp = func(I, Data);
Res.accumulate(tmp);
}
}

(The same SPMD program is replicated on each unit of execution.)

3
OpenMP
▪ An accepted standard developed in the late 1990s by a group of industry specialists.

▪ Consists of a small set of compiler directives, augmented with a small set of library routines and
environment variables, for the base languages Fortran and C/C++.

▪ The compiler directives can specify such things as the parallel block and parallel for.

▪ Several OpenMP compilers available.

4
OpenMP
▪ For C/C++, the OpenMP directives are contained in #pragma statements. The OpenMP #pragma
statements have the format:

#pragma omp directive_name ...

▪ where omp is an OpenMP keyword.

▪ There may be additional parameters (clauses) after the directive name to select different options.

▪ Some directives require code to be specified in a structured block (a statement or statements) that follows
the directive; the directive and the structured block together form a “construct”.

5
OpenMP
▪ OpenMP uses a thread-based “fork-join” model.
▪ Initially, a single master thread executes the program.
▪ Parallel regions (sections of code) can be executed by multiple threads (a team of threads).

[Figure: the master thread forks a team of threads for each parallel region and joins them afterwards.]

▪ The parallel directive creates a team of threads, with a specified block of code executed by the
multiple threads in parallel.
▪ Other directives are used within a parallel construct to specify parallel for loops and different blocks of
code for the threads.
6
Parallel Directive
#pragma omp parallel
structured_block

▪ creates multiple threads, each one executing the specified structured_block, either a single
statement or a compound statement created with { ...} with a single entry point and a single exit
point.

▪ There is an implicit barrier at the end of the construct.

7
Simple Example

▪ omp_get_num_threads() returns the number of threads that are currently being used
▪ omp_get_thread_num() returns the thread number
▪ The array a[] is a global (shared) array
▪ x and num_threads are declared as private to the threads (a sketch of such a program is given below).
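The slide's code listing was an image and is not reproduced in this text; the following is a minimal sketch consistent with the bullets above (the array size N, the requested thread count, and the values written are assumptions, not the original example):

#include <stdio.h>
#include <omp.h>

#define N 8                        /* assumed array size */
int a[N];                          /* global (shared) array */

int main()
{
    int x, num_threads;            /* made private to each thread below */

    omp_set_num_threads(N);        /* ask for one thread per element */

    #pragma omp parallel private(x, num_threads)
    {
        num_threads = omp_get_num_threads();  /* threads currently in use */
        x = omp_get_thread_num();             /* this thread's number */
        a[x] = 10 * num_threads;              /* each thread writes one element */
    }

    for (x = 0; x < N; x++)
        printf("a[%d] = %d\n", x, a[x]);
    return 0;
}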

8
Number of threads in a team
▪ Established by either:

1. num_threads clause after the parallel directive, or

2. omp_set_num_threads() library routine being previously called, or

3. the environment variable OMP_NUM_THREADS being defined,

in the order given, or system dependent if none of the above.

▪ Number of threads available can also be altered automatically to achieve best use of system
resources by a “dynamic adjustment” mechanism.
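A brief sketch of the three mechanisms (the thread counts are arbitrary and the shell command appears only as a comment):

#include <stdio.h>
#include <omp.h>

int main()
{
    /* 1. num_threads clause on the parallel directive */
    #pragma omp parallel num_threads(3)
    printf("clause: thread %d of %d\n",
           omp_get_thread_num(), omp_get_num_threads());

    /* 2. library routine called before the parallel region */
    omp_set_num_threads(4);
    #pragma omp parallel
    printf("routine: thread %d of %d\n",
           omp_get_thread_num(), omp_get_num_threads());

    /* 3. environment variable, set before running the program:
          export OMP_NUM_THREADS=4                               */
    return 0;
}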

9
Work-Sharing
▪ Three constructs in this classification:

sections
for
single

▪ In all cases, there is an implicit barrier at the end of the construct unless a nowait clause is
included.

▪ Note that these constructs do not start a new team of threads. That is done by an enclosing parallel
construct.
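As an illustration of nowait, the sketch below (array sizes and contents arbitrary) lets threads that finish the first loop early move straight on to the second, independent loop without waiting at the end of the first:

#include <stdio.h>
#include <omp.h>

#define N 1000
double a[N], b[N];

int main()
{
    #pragma omp parallel
    {
        /* nowait: no implicit barrier at the end of this loop ... */
        #pragma omp for nowait
        for (int i = 0; i < N; i++)
            a[i] = i * 2.0;

        /* ... so threads move straight on to this independent loop */
        #pragma omp for
        for (int i = 0; i < N; i++)
            b[i] = i + 1.0;
    }
    printf("a[10] = %f, b[10] = %f\n", a[10], b[10]);
    return 0;
}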

10
Sections
▪ The sections construct is a non-iterative work-sharing construct:

#pragma omp sections

▪ Structured blocks are distributed among, and executed by, the threads in a team.

▪ Each structured block is executed once by one of the threads in the team, in the context of its implicit task.
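A minimal sketch of a sections construct inside a parallel region (the printed messages are purely illustrative):

#include <stdio.h>
#include <omp.h>

int main()
{
    #pragma omp parallel          /* team of threads created here */
    {
        #pragma omp sections      /* blocks below shared out among the team */
        {
            #pragma omp section
            printf("section 1 run by thread %d\n", omp_get_thread_num());

            #pragma omp section
            printf("section 2 run by thread %d\n", omp_get_thread_num());
        }                         /* implicit barrier here */
    }
    return 0;
}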

11
For Loop
#pragma omp for
for_loop

▪ causes the iterations of the for loop to be divided into parts and shared among the threads in the team.
▪ The for loop must be of a simple (canonical) form.

▪ The way the for loop is divided can be specified by an additional “schedule” clause.
▪ Example: the clause schedule(static, chunk_size) causes the for loop to be divided into chunks of size
chunk_size, allocated to the threads in a round-robin fashion.

▪ A reduction(op:var) clause can also be used.
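A small sketch combining a schedule clause and a reduction clause (the array size and chunk size are arbitrary):

#include <stdio.h>
#include <omp.h>

#define N 100

int main()
{
    int a[N], sum = 0;

    #pragma omp parallel
    {
        /* iterations handed out in chunks of 8, round robin over the team;
           each thread keeps a private copy of sum, combined with + at the end */
        #pragma omp for schedule(static, 8) reduction(+:sum)
        for (int i = 0; i < N; i++) {
            a[i] = i;
            sum += a[i];
        }
    }
    printf("sum = %d\n", sum);    /* 0 + 1 + ... + 99 = 4950 */
    return 0;
}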

12
Single
▪ The directive

#pragma omp single


structured block

▪ causes the structured block to be executed by one thread only.
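A minimal sketch (the messages are illustrative):

#include <stdio.h>
#include <omp.h>

int main()
{
    #pragma omp parallel
    {
        /* executed by exactly one thread; the others wait at the
           implicit barrier at the end of the single construct */
        #pragma omp single
        printf("single block run by thread %d\n", omp_get_thread_num());

        printf("after single: thread %d\n", omp_get_thread_num());
    }
    return 0;
}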

13
Combined Parallel Work-sharing Constructs
▪ If a parallel directive is followed by a single for directive, the two can be combined, with similar effect, into

#pragma omp parallel for


for_loop

▪ If a parallel directive is followed by a single sections directive, the two can similarly be combined into
#pragma omp parallel sections.
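A short sketch showing both combined forms (the array size and the work done are arbitrary):

#include <stdio.h>
#include <omp.h>

#define N 100

int main()
{
    int a[N];

    /* parallel + for combined into one directive */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = 2 * i;

    /* parallel + sections combined into one directive */
    #pragma omp parallel sections
    {
        #pragma omp section
        printf("first section, a[1] = %d\n", a[1]);

        #pragma omp section
        printf("second section\n");
    }
    return 0;
}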

14
Master Directive
▪ The master directive:
#pragma omp master
structured_block

▪ causes only the master thread to execute the structured block.

▪ It differs from the directives in the work-sharing group in that there is no implied barrier at the end of the
construct (nor at the beginning).

▪ Other threads encountering this directive will ignore it and the associated structured block, and
will move on.
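A minimal sketch (the messages are illustrative):

#include <stdio.h>
#include <omp.h>

int main()
{
    #pragma omp parallel
    {
        /* only thread 0 executes this block; there is no barrier,
           so the other threads do not wait for it */
        #pragma omp master
        printf("master block run by thread %d\n", omp_get_thread_num());

        printf("thread %d continuing\n", omp_get_thread_num());
    }
    return 0;
}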

15
Hello OpenMP
#include <iostream>
#include <omp.h>
using namespace std;

int main()
{
omp_set_num_threads(4);
#pragma omp parallel
{
int ID = omp_get_thread_num();
cout << "Hello World!" << endl;
cout << ID << endl;
}
return 0;
}

16
Example: Calculating Pi
▪ We know that:

∫₀¹ 4/(1+x²) dx = π

▪ We can approximate the integral using rectangles (the midpoint rule).
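Written out, the rectangle (midpoint-rule) approximation that the code on the next slide computes, with N corresponding to num_steps and Δx to step, is:

\pi \approx \sum_{i=1}^{N} \frac{4}{1 + x_i^{2}}\,\Delta x, \qquad \Delta x = \frac{1}{N}, \quad x_i = (i - 0.5)\,\Delta x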

17
#include <iostream>
#include <cstdio>   // for printf
#include <omp.h>
using namespace std;
#define NUM_THREADS 2

static long num_steps = 100000000;


double step;

int main()
{
int i;
double pi, sum = 0.0; double start_time, run_time;

step = 1.0 / (double)num_steps;

start_time = omp_get_wtime();
omp_set_num_threads(NUM_THREADS);
#pragma omp parallel
{
double x;   // private to each thread (declared inside the parallel region)
#pragma omp for reduction (+:sum)   // iterations shared out; partial sums combined with +
for (i = 1; i <= num_steps; i++) {
x = (i - 0.5) * step;
sum += 4.0 / (1.0 + x * x);
}
}
pi = step * sum;
run_time = omp_get_wtime() - start_time;
printf("\n pi with %ld steps is %lf in %lf seconds\n ", num_steps, pi, run_time);
}
18
Synchronization Constructs
Critical
▪ The critical directive will only allow one thread to execute the associated structured block at a time. When
one or more threads reach the critical directive:

#pragma omp critical name


structured_block

▪ they will wait until no other thread is executing the same critical section (one with the same
name), and then one thread will proceed to execute the structured block.
▪ The name is optional.
▪ All critical sections with no name map to the same unspecified name.
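A minimal sketch of an unnamed critical section protecting a shared counter:

#include <stdio.h>
#include <omp.h>

int main()
{
    int counter = 0;

    #pragma omp parallel
    {
        #pragma omp critical   /* unnamed: one thread at a time in this block */
        counter = counter + 1;
    }
    printf("counter = %d\n", counter);   /* equals the number of threads */
    return 0;
}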

20
Barrier
▪ When a thread reaches the barrier

#pragma omp barrier

▪ it waits until all threads have reached the barrier and then they all proceed together.

▪ There are restrictions on the placement of barrier directive in a program. In particular, all threads
must be able to reach the barrier.
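A minimal sketch: every "before" line is printed before any "after" line appears.

#include <stdio.h>
#include <omp.h>

int main()
{
    #pragma omp parallel
    {
        printf("before barrier: thread %d\n", omp_get_thread_num());

        #pragma omp barrier   /* all threads must arrive before any continues */

        printf("after barrier: thread %d\n", omp_get_thread_num());
    }
    return 0;
}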

21
Atomic
▪ The atomic directive

#pragma omp atomic


expression_statement

▪ implements a critical section efficiently when the critical section simply updates a variable (adds
one, subtracts one, or does some other simple arithmetic operation as defined by
expression_statement).
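A minimal sketch in which atomic protects a single shared update (the loop bound and counter are illustrative):

#include <stdio.h>
#include <omp.h>

#define N 1000

int main()
{
    int hits = 0;

    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        #pragma omp atomic    /* the single update below is performed atomically */
        hits += 1;
    }
    printf("hits = %d\n", hits);   /* always N */
    return 0;
}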

22
Ordered
▪ Used in conjunction with for and parallel for directives to cause iterations to be executed in the
order in which they would occur if the loop were written as a sequential loop.
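A minimal sketch (note that the loop directive itself also needs an ordered clause):

#include <stdio.h>
#include <omp.h>

int main()
{
    #pragma omp parallel for ordered
    for (int i = 0; i < 8; i++) {
        int sq = i * i;              /* this part may run out of order */

        #pragma omp ordered          /* this part runs in loop order: 0, 1, 2, ... */
        printf("%d squared is %d\n", i, sq);
    }
    return 0;
}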

23
#include <stdio.h>
#include <omp.h>

#define MAX_THREADS 4

static long num_steps = 100000000; double step;


int main()
{
int i, j; double pi, full_sum = 0.0; double start_time, run_time;
double sum[MAX_THREADS];   // not used in this version

step = 1.0 / (double)num_steps;

for (j = 1; j <= MAX_THREADS; j++) {


omp_set_num_threads(j);
full_sum = 0.0; start_time = omp_get_wtime();
#pragma omp parallel private(i)
{
int id = omp_get_thread_num();
int numthreads = omp_get_num_threads();
double x; double partial_sum = 0;

#pragma omp single
printf(" num_threads = %d", numthreads);

for (i = id; i < num_steps; i += numthreads) {
x = (i + 0.5) * step;
partial_sum += 4.0 / (1.0 + x * x);
}
#pragma omp critical
full_sum += partial_sum;
}

pi = step * full_sum; run_time = omp_get_wtime() - start_time;


printf("\n pi is %f in %f seconds %d threds \n ", pi, run_time, j); 24
}}
Thank You
