CP4253 MAP Unit III
INTRODUCTION:
Shared memory is a common memory address space that can be accessed simultaneously by more than one program. A multicore processor is a kind of shared-memory multiprocessor in which all cores share the same address space. A process can be divided into many small parts, each assigned to a core of the multicore system, so that execution of the process is carried out in parallel on multiple cores.
The smallest sequence of instructions that is scheduled for execution on a core is called a thread. The execution of several such threads in parallel is called multithreading.
OpenMP
OpenMP is an API for shared-memory parallel programming. The "MP" in OpenMP stands for "multiprocessing". OpenMP is designed for systems in which each thread or process can potentially have access to all available memory.
OpenMP identifies parallel regions as blocks of code that may run in parallel.
It supports two types of parallelism:
Thread parallelism
Explicit parallelism
OPENMP EXECUTION MODEL
OpenMP provides a directives-based shared-memory API. In C and C++, this means that there are special preprocessor instructions known as pragmas.
The preprocessor directive #pragma is used to provide additional information to the compiler in C/C++. The compiler uses this to enable special features. Pragmas in C and C++ start with
#pragma
Example: Openmp program
To run the program, we specify the number of threads on the command line. For example, we might run the program with four threads by passing 4 as the command-line argument.
Output
Hello from thread 0 of 4
Hello from thread 1 of 4
Hello from thread 2 of 4
Hello from thread 3 of 4
Explanation:
omp.h is the OpenMP header file.
The strtol() function is used to get the number of threads. It is declared in stdlib.h. Syntax:
long strtol(const char *str, char **endptr, int base);
Where
The first argument is the string to convert.
The last argument is the numeric base in which the string is represented.
The second argument is passed a NULL pointer since it is not used.
The pragma says the program should start a number of threads equal to what was passed in via the command line.
OpenMP pragmas always begin with
#pragma omp
The first directive is a parallel directive, and it specifies that the structured block of code that follows should be executed by multiple threads.
#pragma omp parallel
An OpenMP construct is defined to be a compiler directive plus a block of code:
a single statement or a compound statement with a single entry at the top and a single exit at the bottom.
Branching in or out of a structured block is not allowed
The number of threads that run the following structured block of code will be determined by the run-time system. If no other threads have been started, the system will typically run one thread on each available core.
Usually the number of threads is specified on the command line, so we modify the parallel directive with the num_threads clause.
A clause in OpenMP is just some text that modifies a directive. The num_threads clause can be added to a parallel directive. It allows the programmer to specify the number of threads that should execute the following block:
#pragma omp parallel num_threads(thread_count)
Thread is short for thread of execution. The name is meant to suggest a sequence of
statements executed by a program. Threads are typically started or forked by a process,
and they share most of the resources of the process that starts them, but each thread has
its own stack and program counter.
When a thread completes execution it joins the process that started it. This
terminology comes from diagrams that show threads as directed lines.
In OpenMP, the collection of threads executing the parallel block—the original thread
and the new threads—is called a team, the original thread is called the master, and the
additional threads are called slaves.
When the block of code is completed, that is, when the threads return from the call to Hello, there is an implicit barrier. This means that a thread that has completed the block of code will wait for all the other threads in the team to complete the block.
When all the threads have completed the block, the slave threads will terminate
and the master thread will continue executing the code that follows the block.
Error Checking
It is a very good idea to check for errors while writing code. For example, after the call to strtol:
Check that the value is positive.
Check that the number of threads actually created by the parallel directive is the same as thread_count.
Another source of potential problems is the compiler.
If the compiler doesn't support OpenMP, it will just ignore the parallel directive. However, the attempt to include omp.h and the calls to omp_get_thread_num and omp_get_num_threads will cause errors.
To handle these problems, we can check whether the preprocessor macro _OPENMP is defined. If it is defined, we can include omp.h and make the calls to the OpenMP functions.
We might make the following modifications to our program. Instead of simply including omp.h with the line
#include <omp.h>
we can check for the definition of _OPENMP before trying to include it:
#ifdef _OPENMP
#include <omp.h>
#endif
Also, instead of just calling the OpenMP functions, we can first check whether _OPENMP is defined:
#ifdef _OPENMP
int my_rank = omp_get_thread_num();
int thread_count = omp_get_num_threads();
#else
int my_rank = 0;
int thread_count = 1;
#endif
Here, if OpenMP isn't available, the code will execute with one thread having rank 0.
MEMORY MODEL
OpenMP assumes that there is a place for storing and retrieving data that is available to all threads, called the memory. Each thread may also have a temporary view of memory that it can use instead of memory to store data temporarily when the data need not be seen by other threads.
Data can move between memory and a thread's temporary view, but can never
move between temporary views directly, without going through memory. Each
variable used within a parallel region is either shared or private. The variable names
used within a parallel construct relate to the program variables visible at the point of
the parallel directive, referred to as their "original variables". Each shared variable
reference inside the construct refers to the original variable of the same name. For
each private variable, a reference to the variable name inside the construct refers to a
variable of the same type and size as the original variable, but private to the thread.
That is, it is not accessible by other threads.
There are two aspects of memory system behavior relating to shared memory
parallel programs: coherence and consistency.
Coherence refers to the behavior of the memory system when a single memory
location is accessed by multiple threads.
Consistency refers to the ordering of accesses to different memory locations,
observable from various threads in the system.
Device Data Environment:
When an OpenMP program begins, each device has an initial device data environment. The initial device data environment for the host device is the data environment associated with the initial task region.
Directives that accept data mapping attribute clauses determine how an original variable
is mapped to a corresponding variable in a device data environment. The original variable
is the variable with the same name that exists in the data environment of the task that
encounters the directive.
A flush causes any values of the flushed variables that were captured in the temporary view to be discarded, so that later reads of those variables will come directly from memory. A flush without a list of variable names flushes all variables visible at that point in the program; a flush with a list flushes only the variables in the list. The OpenMP flush operation is the only way in an OpenMP program to guarantee that a value will move between two threads.
To move a value from one thread to a second thread, OpenMP requires these four actions in exactly the following order:
1. the first thread writes the value to the shared variable,
2. the first thread flushes the variable,
3. the second thread flushes the variable, and
4. the second thread reads the variable.
OPENMP DIRECTIVES
Each OpenMP directive starts with #pragma omp.
Syntax:
#pragma omp directive-name [clause[ clause]...] new-line
Where
#pragma omp – Required for all OpenMP C/C++ directives.
directive-name – A valid OpenMP directive; must appear after the pragma and before any clauses.
[clause[ clause]...] – Optional. Clauses can be in any order and repeated as necessary unless otherwise restricted.
new-line – Required. Precedes the structured block which is enclosed by this directive.
General rules:
Directives are case sensitive.
Only one directive name can be specified per directive.
A directive applies to the succeeding structured block or OpenMP construct.
The order in which clauses appear in a directive is not significant.
Parallel Construct
Defines a parallel region: code that will be executed by multiple threads in parallel.
Syntax:
#pragma omp parallel [clause ...] new-line
structured block
Clauses (general attributes):
if – Specifies whether the region should be executed in parallel or in serial.
num_threads – Sets the number of threads in a thread team.
ordered – Required on a parallel for statement if an ordered directive is to be used in the loop.
schedule – Applies to the for directive.
nowait – Overrides the barrier implicit in a directive.
Example:
#include <stdio.h>
#include <omp.h>
int main()
{
    #pragma omp parallel num_threads(4)
    {
        int i = omp_get_thread_num();
        printf("Hello from thread %d\n", i);
    }
    return 0;
}
Output (the order of the lines may vary between runs):
Hello from thread 0
Hello from thread 1
Hello from thread 2
Hello from thread 3
WORK SHARING CONSTRUCTS
A work-sharing construct divides the execution of the enclosed code region
among the members of the team that encounter it.
Work-sharing constructs do not launch new threads
There is no implied barrier upon entry to a work-sharing construct; however, there is an implied barrier at the end of a work-sharing construct.
Restrictions:
A work-sharing construct must be enclosed dynamically within a parallel region in order
for the directive to execute in parallel.
Work-sharing constructs must be encountered by all members of a team or none at all
Successive work-sharing constructs must be encountered in the same order by all
members of a team
1. DO / for Directive
Splits the for loop so that each thread in the current team handles a different portion of the loop
Represents a type of ‘data parallelism’
#pragma omp for [clause ...] newline
schedule (type [,chunk])
ordered
private (list)
firstprivate (list)
lastprivate (list)
shared (list)
reduction (operator: list)
collapse (n)
nowait
for_loop
Clauses:
SCHEDULE: Describes how iterations of the loop are divided among the threads in the
team. The default schedule is implementation dependent.
STATIC
Loop iterations are divided into pieces of size chunk and then statically assigned to
threads. If chunk is not specified, the iterations are evenly (if possible) divided
contiguously among the threads.
DYNAMIC
Loop iterations are divided into pieces of size chunk, and dynamically scheduled among
the threads; when a thread finishes one chunk, it is dynamically assigned another.
The default chunk size is 1.
GUIDED
Iterations are dynamically assigned to threads in blocks as threads request them until no
blocks remain to be assigned. Similar to DYNAMIC except that the block size decreases
each time a parcel of work is given to a thread.
RUNTIME
The scheduling decision is deferred until runtime by the environment variable
OMP_SCHEDULE. It is illegal to specify a chunk size for this clause.
AUTO
The scheduling decision is delegated to the compiler and/or runtime system.
NOWAIT / nowait: If specified, then threads do not synchronize at the end of the parallel loop.
ORDERED: Specifies that the iterations of the loop must be executed as they
would be in a serial program.
COLLAPSE: Specifies how many loops in a nested loop should be collapsed into
one large iteration space and divided according to the schedule clause.
Restrictions:
The DO loop cannot be a DO WHILE loop, or a loop without loop control. Also,
the loop iteration variable must be an integer and the loop control parameters must
be the same for all threads.
Program correctness must not depend upon which thread executes a particular
iteration.
It is illegal to branch (goto) out of a loop associated with a DO/for directive.
The chunk size must be specified as a loop invariant integer expression, as there is
no synchronization during its evaluation by different threads.
ORDERED, COLLAPSE and SCHEDULE clauses may appear once each.
Example
#include <stdio.h>
#include <omp.h>
int main()
{
    #pragma omp parallel
    {
        #pragma omp for
        for (int n = 0; n < 10; ++n)
            printf("%d ", n);
    }
    printf("\n");
    return 0;
}
Output (the order depends on how iterations are assigned to threads):
0 5 6 7 1 8 2 3 4 9
2. Sections directive
Identifies a non-iterative work sharing construct
Identifies code section to be divided among all threads
Each section is executed by a thread
Independent section directives are nested within a SECTIONS directive
#pragma omp sections [clause ... newline
private (list)
firstprivate (list)
lastprivate (list)
reduction (operator: list)
nowait ]
{
#pragma omp section newline
structured_block
#pragma omp section newline
structured_block
}
Clauses:
There is an implied barrier at the end of a SECTIONS directive, unless
the NOWAIT/nowait clause is used
Example
#include <stdio.h>
#include <omp.h>
int main()
{
    #pragma omp parallel sections num_threads(4)
    {
        printf("Hello from thread %d\n", omp_get_thread_num());
        #pragma omp section
        printf("Hello from thread %d\n", omp_get_thread_num());
    }
    return 0;
}
Output (the thread numbers may vary):
Hello from thread 0
Hello from thread 0
3. Single Directive
Specifies that the associated structured block is executed by only one of the threads in the
team
Syntax:
#pragma omp single [clause ...] newline
private (list)
firstprivate (list)
nowait
structured_block
Clauses:
Threads in the team that do not execute the SINGLE directive, wait at the end of
the enclosed code block, unless a NOWAIT/nowait clause is specified.
Restrictions:
It is illegal to branch into or out of a SINGLE block.
Example:
#include <stdio.h>
#include <omp.h>
int main()
{
    #pragma omp parallel num_threads(2)
    {
        #pragma omp single
        // Only a single thread reads the input
        printf("read input\n");
    }
    return 0;
}
//Matrix multiplication
#pragma omp parallel for
for (i = 0; i < row_length_A; i++)
{
    for (k = 0; k < column_length_B; k++)
    {
        sum = 0;
        for (j = 0; j < column_length_A; j++)
        {
            sum += A[i][j] * B[j][k];
        }
        C[i][k] = sum;
    }
}
//Array addition
#pragma omp parallel for
for (i = 0; i < n; i++)
{
    c[i] = a[i] + b[i];
}
The OpenMP directive "omp parallel for" instructs the compiler to execute the code in the for loop in parallel. For multiplication, we can divide matrices A and B into blocks along rows and columns respectively. This allows us to calculate every element of matrix C independently, thereby making the task parallel. (Note that for correctness the inner loop variables k and j and the accumulator sum must be private to each thread.)
Steps to parallelization
The process of parallelizing a sequential program can be broken down into four discrete
steps.
2. Functional Parallelism
OpenMP allows us to assign different threads to different portions of code (functional
parallelism)
Functional Parallelism Example
v = alpha();
w = beta();
x = gamma(v, w);
y = delta();
printf ("%6.2f\n", epsilon(x,y));
We may execute alpha, beta, and delta in parallel.
parallel sections Pragma
Precedes a block of k blocks of code that may be executed concurrently by k threads
Syntax:
#pragma omp parallel sections
section Pragma
Precedes each block of code within the encompassing block preceded by the parallel
sections pragma
May be omitted for first parallel section after the parallel sections pragma
Syntax:
#pragma omp section
Another Approach
Execute alpha and beta in parallel.
Execute gamma and delta in parallel.
sections Pragma
Appears inside a parallel block of code
Has same meaning as the parallel sections pragma
If multiple sections pragmas inside one parallel block, may reduce fork/join costs
HANDLING LOOPS
During parallelization, the looping statements in OpenMP programs must be handled very carefully to avoid producing wrong results. The following section describes suitable ways to handle for loops in OpenMP.
Scheduling Loops
Scheduling means assigning iterations to threads. For example, if a for loop has 10 iterations and 5 OpenMP threads are used in the program, the loop iterations can be partitioned as follows:
Thread 1: iterations 1-2
Thread 2: iterations 3-4
Thread 3: iterations 5-6
Thread 4: iterations 7-8
Thread 5: iterations 9-10
As shown above, iterations one and two are assigned to Thread 1, iterations three and four are assigned to Thread 2, and so on; iterations nine and ten are assigned to Thread 5.
The schedule clause can be applied to the for directive.
Data Dependencies:
If one iteration depends on the results of previous iterations, then the for loop cannot be parallelized correctly. Such a dependency is called a data dependence.
Consider computing the Fibonacci numbers into an array fib, where each element is the sum of the two preceding elements. Let us assume that the computation is carried out by two threads, Thread 1 and Thread 2. Iterations 2, 3 and 4 are allocated to Thread 1 and iterations 5, 6 and 7 are allocated to Thread 2. The computation can be done in any order by the threads.
Consider the situation in which Thread 2 starts computing fib[5] before Thread 1 completes fib[4]. This situation definitely produces a wrong result. Therefore OpenMP cannot parallelize looping statements correctly if the iterations depend on the results of previous iterations.
If the result of previous iterations is used in subsequent iterations, this type of dependence is called a loop-carried dependence.
Scope of Variables
In serial programming, the scope of a variable consists of those parts of a program in which the variable can be used. For example, a variable declared at the beginning of a C function has "function-wide" scope; that is, it can only be accessed in the body of the function. On the other hand, a variable declared at the beginning of a .c file but outside any function has "file-wide" scope. The default scope of a variable can be changed with directives, and OpenMP provides clauses to modify the default scope.