CS8083 Unit III Notes
INTRODUCTION:
Shared memory is a common memory address space that can be accessed simultaneously by more than one program. A multicore processor is a kind of shared-memory multiprocessor in which all cores share the same address space. A process can be divided into many small parts, each assigned to a core of the multicore system, so that execution of the process can be carried out in parallel on multiple cores.
The smallest sequence of instructions that can be scheduled for execution on a core is called a thread. The execution of several such threads in parallel is multithreading.
OpenMP
OpenMP is an API for shared-memory parallel programming. The "MP" in OpenMP stands for "multiprocessing". OpenMP is designed for systems in which each thread or process can potentially have access to all available memory.
OpenMP identifies parallel regions as blocks of code that may run in parallel.
It supports two types of parallelism:
• Thread Parallelism
• Explicit Parallelism
3.1 OPENMP EXECUTION MODEL
OpenMP provides a directive-based shared-memory API. In C and C++, this means that there are special preprocessor instructions known as pragmas.
The preprocessor directive #pragma is used to provide additional information to the compiler in C/C++; the compiler uses this to enable special features. Pragmas in C and C++ start with
#pragma
To run the program, we specify the number of threads on the command line. For
example, we might run the program with four threads and type
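./omp_hello 4
The hello program itself is a sketch consistent with the explanation that follows; the file name omp_hello and the thread function Hello are assumptions, not fixed by these notes.
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

void Hello(void); /* thread function, executed by every thread */

int main(int argc, char* argv[])
{
    /* get the number of threads from the command line */
    int thread_count = strtol(argv[1], NULL, 10);

    #pragma omp parallel num_threads(thread_count)
    Hello();

    return 0;
}

void Hello(void)
{
    int my_rank = omp_get_thread_num();
    int thread_count = omp_get_num_threads();
    printf("Hello from thread %d of %d\n", my_rank, thread_count);
}
It can be compiled with an OpenMP-aware compiler, for example:
gcc -g -Wall -fopenmp -o omp_hello omp_hello.c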
Output
Hello from thread 0 of 4
Hello from thread 1 of 4
Hello from thread 2 of 4
Hello from thread 3 of 4
Explanation:
▪ omp.h is the header file for OpenMP.
▪ In the call to strtol, which converts the command-line argument to a number:
▪ the first argument is a string,
▪ the second argument is passed a NULL pointer since it is not used, and
▪ the last argument is the numeric base in which the string is represented.
▪ The pragma says the program should start a number of threads equal to what was passed in via the command line.
▪ OpenMP pragmas always begin with
# pragma omp
▪ The first directive is a parallel directive, and it specifies that the structured block of code that follows should be executed by multiple threads.
▪ #pragma omp parallel
▪ An OpenMP construct is defined to be a compiler directive plus a block of code:
▪ a single statement, or a compound statement with a single entry at the top and a single exit at the bottom.
▪ Branching in or out of a structured block is not allowed.
▪ The number of threads that run the following structured block of code will be
determined by the run-time system. If there are no other threads started, the
system will typically run one thread on each available core.
▪ Usually the number of threads is specified on the command line, so we modify parallel directives with the num_threads clause.
▪ A clause in OpenMP is just some text that modifies a directive. The num_threads clause can be added to a parallel directive. It allows the programmer to specify the number of threads that should execute the following block:
▪ # pragma omp parallel num_threads(thread_count)
Thread is short for thread of execution. The name is meant to suggest a sequence of
statements executed by a program. Threads are typically started or forked by a process,
and they share most of the resources of the process that starts them, but each thread has
its own stack and program counter.
When a thread completes execution it joins the process that started it. This
terminology comes from diagrams that show threads as directed lines.
In OpenMP, the collection of threads executing the parallel block—the original thread
and the new threads—is called a team, the original thread is called the master, and the
additional threads are called slaves.
When the block of code is completed, that is, when the threads return from the call to Hello, there is an implicit barrier. This means that a thread that has completed the block of code will wait for all the other threads in the team to complete the block.
When all the threads have completed the block, the slave threads will terminate
and the master thread will continue executing the code that follows the block.
Error Checking
It is an exceptionally good idea to check for errors while writing code. For example, after the call to strtol:
▪ Check that the value is positive.
▪ Check that the number of threads actually created by the parallel directive is the same as thread_count.
Another potential source of problems is the compiler:
▪ If the compiler doesn't support OpenMP, it will simply ignore the parallel directive.
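These checks can be put into code; the following is a minimal sketch, and the messages and variable names are illustrative:
#include <stdio.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

int main(int argc, char* argv[])
{
    long thread_count;
    /* check that a thread count was supplied on the command line */
    if (argc != 2) {
        fprintf(stderr, "usage: %s <number of threads>\n", argv[0]);
        exit(1);
    }
    thread_count = strtol(argv[1], NULL, 10);
    /* check that the value returned by strtol is positive */
    if (thread_count <= 0) {
        fprintf(stderr, "thread count must be positive\n");
        exit(1);
    }
    #pragma omp parallel num_threads(thread_count)
    {
    #ifdef _OPENMP
        /* check that the number of threads actually created by the
           parallel directive is the same as thread_count */
        if (omp_get_thread_num() == 0 && omp_get_num_threads() != thread_count)
            fprintf(stderr, "got %d threads instead of %ld\n",
                    omp_get_num_threads(), thread_count);
    #endif
    }
    return 0;
}
If the compiler does not support OpenMP, the _OPENMP macro is not defined, so the checks that rely on the OpenMP runtime are compiled out and the program still builds.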
A flush also causes any values of the flush-set variables that were captured in the temporary view to be discarded, so that later reads of those variables will come directly from memory. A flush without a list of variable names flushes all variables visible at that point in the program; a flush with a list flushes only the variables in the list. The OpenMP flush operation is the only way in an OpenMP program to guarantee that a value will move between two threads.
In order to move a value from one thread to a second thread, OpenMP requires these four actions in exactly the following order (as sketched below):
1. the first thread writes the value to the shared variable,
2. the first thread flushes the variable,
3. the second thread flushes the variable, and
4. the second thread reads the variable.
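A minimal sketch of this four-step protocol; the shared variables data and done are illustrative, not part of the original notes:
#include <stdio.h>
#include <omp.h>

int data = 0, done = 0;   /* shared between the two threads */

int main(void)
{
    #pragma omp parallel num_threads(2)
    {
        if (omp_get_thread_num() == 0) {
            data = 42;                 /* 1. first thread writes the shared variable */
            #pragma omp flush(data)    /* 2. first thread flushes it */
            done = 1;
            #pragma omp flush(done)
        } else {
            int ready = 0;
            while (!ready) {
                #pragma omp flush(done)
                ready = done;
            }
            #pragma omp flush(data)    /* 3. second thread flushes */
            printf("%d\n", data);      /* 4. second thread reads the variable */
        }
    }
    return 0;
}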
General rules:
▪ Directives are case sensitive.
▪ Only one directive name can be specified per directive.
▪ A directive applies to the succeeding structured block or OpenMP construct.
▪ The order in which clauses appear in directives is not significant.
Clause
For general attributes:
Clause        Description
if            Specifies whether a loop should be executed in parallel or in serial.
num_threads   Sets the number of threads in a thread team.
ordered       Required on a parallel for statement if an ordered directive is to be used in the loop.
schedule      Applies to the for directive.
nowait        Overrides the barrier implicit in a directive.
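As an illustration of the if clause, the following sketch parallelizes a loop only when it is large enough to be worth the overhead; the names n, a, b, and c are illustrative:
#include <stdio.h>
#include <omp.h>
#define N 2000
int main(void)
{
    int i, n = N;
    static double a[N], b[N], c[N];
    /* the loop runs in parallel only if n > 1000; otherwise it runs in serial */
    #pragma omp parallel for if(n > 1000)
    for (i = 0; i < n; i++)
        c[i] = a[i] + b[i];
    printf("c[0] = %f\n", c[0]);
    return 0;
}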
Example:
#include<stdio.h>
#include<omp.h>
int main()
{
    #pragma omp parallel num_threads(4)
    {
        int i = omp_get_thread_num();
        printf("Hello from thread %d\n", i);
    }
    return 0;
}
Output (the order of the lines may vary from run to run):
Hello from thread 0
Hello from thread 1
Hello from thread 2
Hello from thread 3
Restrictions:
• A work-sharing construct must be enclosed dynamically within a parallel region in order for the directive to execute in parallel.
• Work-sharing constructs must be encountered by all members of a team or none at all.
• Successive work-sharing constructs must be encountered in the same order by all members of a team.
1. DO / for Directive
Splits the for loop so that each thread in the current team handles a different portion of the loop. This represents a type of 'data parallelism'.
Syntax:
#pragma omp for [clause ...] newline
        schedule (type [, chunk])
        ordered
        collapse (n)
        nowait
    for_loop
Clauses:
SCHEDULE: Describes how iterations of the loop are divided among the threads in the
team. The default schedule is implementation dependent.
STATIC
Loop iterations are divided into pieces of size chunk and then statically assigned to
threads. If chunk is not specified, the iterations are evenly (if possible) divided
contiguously among the threads.
DYNAMIC
Loop iterations are divided into pieces of size chunk, and dynamically scheduled among
the threads; when a thread finishes one chunk, it is dynamically assigned another.
The default chunk size is 1.
GUIDED
Iterations are dynamically assigned to threads in blocks as threads request them until no
blocks remain to be assigned. Similar to DYNAMIC except that the block size decreases
each time a parcel of work is given to a thread.
RUNTIME
The scheduling decision is deferred until run time and is determined by the environment variable OMP_SCHEDULE. It is illegal to specify a chunk size for this clause.
AUTO
The scheduling decision is delegated to the compiler and/or runtime system.
▪ NOWAIT / nowait: If specified, then threads do not synchronize at the end of the parallel loop.
▪ ORDERED: Specifies that the iterations of the loop must be executed as they would be in a serial program.
▪ COLLAPSE: Specifies how many loops in a nested loop should be collapsed into one large iteration space and divided according to the schedule clause, as in the sketch below.
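A sketch combining the collapse, schedule, and nowait clauses; the array a and size N are illustrative:
#include <stdio.h>
#include <omp.h>
#define N 4
int main(void)
{
    static double a[N][N];
    int i, j;
    #pragma omp parallel
    {
        /* the two nested loops are collapsed into one iteration space of
           N*N iterations; nowait removes the barrier at the end of the loop */
        #pragma omp for collapse(2) schedule(static) nowait
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                a[i][j] = i + j;
    }
    printf("a[1][2] = %f\n", a[1][2]);
    return 0;
}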
Restrictions:
• The DO loop cannot be a DO WHILE loop, or a loop without loop control. Also, the loop iteration variable must be an integer and the loop control parameters must be the same for all threads.
• Program correctness must not depend upon which thread executes a particular
iteration.
• It is illegal to branch (goto) out of a loop associated with a DO/for directive.
• The chunk size must be specified as a loop invariant integer expression, as there is
no synchronization during its evaluation by different threads.
• ORDERED, COLLAPSE and SCHEDULE clauses may appear once each.
Example
#include<stdio.h>
#include<omp.h>
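int main()
{
    int i;
    /* a sketch of the loop: the dynamic schedule, chunk size 1, and four
       threads are assumptions, and the exact digit order in the output
       varies from run to run */
    #pragma omp parallel for num_threads(4) schedule(dynamic, 1)
    for (i = 0; i < 10; i++)
        printf("%d", i);
    printf("\n");
    return 0;
}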
Output
0567182349
2. Sections directive
▪ Identifies a non-iterative work-sharing construct
▪ Identifies code sections to be divided among the threads of the team
▪ Each section is executed by a thread
▪ Independent section directives are nested within a SECTIONS directive
#pragma omp sections [clause ...] newline
        private (list)
        firstprivate (list)
        lastprivate (list)
        reduction (operator: list)
        nowait
{
#pragma omp section newline
structured_block
#pragma omp section newline
structured_block
}
Clauses:
• There is an implied barrier at the end of a SECTIONS directive, unless
the NOWAIT/nowait clause is used
Example
#include<stdio.h>
#include<omp.h>
int main()
{
    #pragma omp parallel sections num_threads(4)
    {
        //the section pragma may be omitted for the first section
        printf("Hello from thread %d\n", omp_get_thread_num());
        #pragma omp section
        printf("Hello from thread %d\n", omp_get_thread_num());
    }
    return 0;
}
Output (which thread executes each section is decided by the run-time system):
Hello from thread 0
Hello from thread 0
3. Single Directive
Specifies that the associated structured block is executed by only one of the threads in the
team
Syntax:
#pragma omp single [clause ...] newline
        private (list)
        firstprivate (list)
        nowait
Example:
#include<stdio.h>
#include<omp.h>
int main()
{
    #pragma omp parallel num_threads(2)
    {
        //Only a single thread reads the input
        #pragma omp single
        printf("read input\n");
    }
    return 0;
}
The following loop fragments (with the variables assumed to be declared elsewhere) are used in the discussion below.
//Matrix multiplication
for(i=0; i<row_length_A; i++)
{
    for (k=0; k<column_length_B; k++)
    {
        sum = 0;
        for (j=0; j<column_length_A; j++)
        {
            sum += A[i][j]*B[j][k];
        }
        C[i][k] = sum;
    }
}
//Array addition
for(i=0; i<n; i++)
{
    c[i] = a[i] + b[i];
}
The OpenMP directive "omp parallel for" instructs the compiler to execute the code in the for loop in parallel. For multiplication, we can divide matrices A and B into blocks along rows and columns respectively. This allows us to calculate every element of matrix C independently, thereby making the task parallel.
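A sketch of the parallelized matrix multiplication described above; the matrix names follow the fragment shown earlier, and the dimensions and initial values are assumptions:
#include <stdio.h>
#include <omp.h>
#define N 4
int main(void)
{
    static double A[N][N], B[N][N], C[N][N];
    int i, j, k;
    double sum;
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) {
            A[i][j] = i + j;
            B[i][j] = i - j;
        }
    /* each thread computes a disjoint set of rows of C; j, k, and sum must
       be private so threads do not overwrite each other's copies */
    #pragma omp parallel for private(j, k, sum)
    for (i = 0; i < N; i++)
        for (k = 0; k < N; k++) {
            sum = 0;
            for (j = 0; j < N; j++)
                sum += A[i][j] * B[j][k];
            C[i][k] = sum;
        }
    printf("C[1][1] = %f\n", C[1][1]);
    return 0;
}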
Steps to parallelization
The process of parallelizing a sequential program can be broken down into four discrete
steps.
2. Functional Parallelism
OpenMP allows us to assign different threads to different portions of code (functional
parallelism)
Functional Parallelism Example
v = alpha();
w = beta();
x = gamma(v, w);
y = delta();
printf ("%6.2f\n", epsilon(x,y));
We may execute alpha, beta, and delta in parallel.
section Pragma
Precedes each block of code within the encompassing block introduced by the parallel sections pragma.
May be omitted for the first parallel section after the parallel sections pragma.
Syntax:
#pragma omp section
Another Approach
Execute alpha and beta in parallel.
Execute gamma and delta in parallel.
sections Pragma
Appears inside a parallel block of code.
Has the same meaning as the parallel sections pragma.
If multiple sections pragmas appear inside one parallel block, fork/join costs may be reduced, as in the sketch below.
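A sketch of this approach; alpha, beta, gamma, delta, and epsilon are the hypothetical functions from the example above, given trivial bodies here so the program runs:
#include <stdio.h>
#include <omp.h>

double alpha(void) { return 1.0; }
double beta(void)  { return 2.0; }
double gamma(double v, double w) { return v + w; }
double delta(void) { return 3.0; }
double epsilon(double x, double y) { return x * y; }

int main(void)
{
    double v, w, x, y;
    #pragma omp parallel
    {
        /* first sections block: alpha and beta run in parallel */
        #pragma omp sections
        {
            #pragma omp section
            v = alpha();
            #pragma omp section
            w = beta();
        }   /* implied barrier: v and w are ready here */
        /* second sections block: gamma and delta run in parallel */
        #pragma omp sections
        {
            #pragma omp section
            x = gamma(v, w);
            #pragma omp section
            y = delta();
        }
    }
    printf("%6.2f\n", epsilon(x, y));
    return 0;
}
Because both sections blocks live inside one parallel region, the team of threads is forked and joined only once.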
Scheduling Loops
Scheduling means assigning iterations to threads. For example, if a for loop with 10 iterations is run with 5 OpenMP threads, the loop iterations can be partitioned as follows:
Thread 1    Thread 2    Thread 3    Thread 4    Thread 5
1, 2        3, 4        5, 6        7, 8        9, 10
As shown in the table, iterations one and two are assigned to thread 1, iterations three and four are assigned to thread 2, and so on, with iterations nine and ten assigned to thread 5. The schedule clause can be applied to the for directive.
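The partition in the table corresponds to a static schedule with chunk size 2; a sketch:
#include <stdio.h>
#include <omp.h>
int main(void)
{
    int i;
    /* schedule(static, 2) hands out contiguous chunks of 2 iterations:
       thread 0 gets 1-2, thread 1 gets 3-4, ..., thread 4 gets 9-10 */
    #pragma omp parallel for num_threads(5) schedule(static, 2)
    for (i = 1; i <= 10; i++)
        printf("iteration %d executed by thread %d\n", i, omp_get_thread_num());
    return 0;
}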
Data Dependencies:
If one iteration depends on the results of previous iterations, then the for loop cannot be parallelized correctly. Such a dependency is called a data dependence.
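The discussion below refers to a Fibonacci loop like the following sketch; the array name fib and the loop bounds are assumptions:
#include <stdio.h>
#include <omp.h>
int main(void)
{
    int i, fib[8];
    fib[0] = fib[1] = 1;
    /* fib[i] depends on fib[i-1] and fib[i-2] computed in earlier
       iterations, so this loop has a loop-carried dependence and cannot
       safely be parallelized */
    #pragma omp parallel for num_threads(2)
    for (i = 2; i < 8; i++)
        fib[i] = fib[i-1] + fib[i-2];
    for (i = 0; i < 8; i++)
        printf("%d ", fib[i]);
    printf("\n");
    return 0;
}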
Let us assume that the computation is carried out by two threads, Thread 1 and Thread 2. Iterations 2, 3, and 4 are allocated to Thread 1, and iterations 5, 6, and 7 are allocated to Thread 2. The computations can be done in any order by the threads.
Consider the situation in which Thread 2 starts computing fib[5] before Thread 1 completes fib[4]. This situation definitely produces a wrong result. Therefore OpenMP cannot parallelize looping statements correctly if the iterations depend on the results of previous iterations.
If the result of previous iterations is used in subsequent iterations, this type of dependence is called a loop-carried dependence.
Scope of variable
In serial programming, the scope of a variable consists of those parts of a program in which the
variable can be used. For example, a variable declared at the beginning of a C function has
“function-wide” scope, that is, it can only be accessed in the body of the function. On the other
hand, a variable declared at the beginning of a .c file but outside any function has “file-wide”
scope. In OpenMP, the default scope of a variable inside a parallel region can be changed, and OpenMP provides clauses to modify the default scope, as in the sketch below.
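A sketch of OpenMP scoping clauses; the variables x and y are illustrative:
#include <stdio.h>
#include <omp.h>
int main(void)
{
    int x = 5, y = 0;
    /* x keeps its default shared scope; the private clause gives each
       thread its own uninitialized copy of y */
    #pragma omp parallel num_threads(4) private(y)
    {
        y = omp_get_thread_num();
        printf("x = %d, y = %d\n", x, y);
    }
    return 0;
}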