
ARULMURUGAN COLLEGE OF ENGINEERING

(Approved by AICTE, New Delhi and Affiliated to Anna University, Chennai & ISO 9001:2015 Certified Institution)

THENNILAI, KARUR - 639 206.

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

MASTER OF ENGINEERING

CP4292 – MULTICORE ARCHITECTURE AND PROGRAMMING LABORATORY

I YEAR / II SEMESTER

Name :

Reg No :

Subject name :

Academic Year :
ARULMURUGAN COLLEGE OF ENGINEERING
(Approved by AICTE and Affiliated to Anna University)
THENNILAI, KARUR – 639 206.

PRACTICAL RECORD

Register Number

Name

Year / Sem

Degree / Branch

Subject Code & Name

Certified that this is a bonafide record of work done by the above student
during the year 20 - 20

Staff in-charge Head of the Department

Submitted for the University Practical Examination held on

Internal Examiner External Examiner


INDEX
Ex. No    Date    Name of the Experiment    Marks    Page No    Staff Signature
Ex No: 1

Date : OpenMP Fork-Join Parallelism

Aim:

To write a simple program to demonstrate OpenMP Fork-Join Parallelism.

Algorithm:

1. Start the program execution.

2. Determine the number of iterations or tasks that need to be performed in parallel.

3. Initialize any shared variables or data structures that will be accessed by multiple
threads.

4. Use the OpenMP #pragma omp parallel directive to create a team of threads.

5. Inside the parallel region, use the #pragma omp for directive to parallelize a loop or
distribute tasks among the threads.

6. If necessary, specify any reduction operations using the reduction clause to combine
results from different threads.

7. Each thread executes its assigned portion of the loop or tasks independently and
updates any shared variables or data structures.

8. After the loop or tasks are completed, the threads synchronize using the implicit
barrier at the end of the parallel region, ensuring that all computations are finished.

9. Continue with the sequential execution outside the parallel region.

10. If required, retrieve and combine results from shared variables or data structures to
obtain the final result.

11. End the program execution.


Program:

#include <stdio.h>

#include <omp.h>

int main() {
    int i, n = 10;
    int sum = 0;

    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++) {
        sum += i;
    }

    printf("The sum is: %d\n", sum);

    return 0;
}

Run:

gcc -fopenmp program.c -o program

./program

Output:

The sum is: 45

Result:

Thus, a simple program for OpenMP Fork-Join Parallelism was executed successfully.
Ex No: 2

Date : Matrix-vector multiplication

Aim:

To create a program that computes a simple matrix-vector multiplication b = Ax in C or C++.

Algorithm:

Input:

1. A: the matrix of size M x N

2. x: the vector of size N

3. b: the result vector of size M

4. Check if the number of columns in matrix A (N) matches the size of vector x (N). If they are not equal, return an error, as matrix-vector multiplication is not possible.

5. Initialize the elements of vector b to zero.

6. For each row i in the matrix A (0 <= i < M):
   a. Initialize the element b[i] to zero.
   b. For each column j in the matrix A (0 <= j < N):
      i. Add the product of A[i][j] and x[j] to b[i].

7. The resulting vector b contains the result of the matrix-vector multiplication (b = Ax).

Program:

#include <iostream>
#include <omp.h>

#define N 1000

void matrixVectorMultiplication(int A[N][N], int x[N], int b[N]) {


#pragma omp parallel for
for (int i = 0; i < N; ++i) {
b[i] = 0;
for (int j = 0; j < N; ++j) {
b[i] += A[i][j] * x[j];
}
}
}

int main() {
int A[N][N];
int x[N];
int b[N];

// Initialize matrix A and vector x


// (omitted for brevity)

// Perform matrix-vector multiplication


matrixVectorMultiplication(A, x, b);

// Print the result vector b


for (int i = 0; i < N; ++i) {
std::cout << b[i] << " ";
}
std::cout << std::endl;

return 0;
}
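Run:

A possible way to compile and execute with GCC's OpenMP support (the source file name matvec.cpp is assumed here for illustration):

g++ -fopenmp matvec.cpp -o matvec
./matvec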

Output:

Sample hand-computed check for a small case, with matrix A = [[2, 4, 6], [1, 3, 5]] and vector x = [1, 2, 3]:

b[0] = (2 * 1) + (4 * 2) + (6 * 3) = 2 + 8 + 18 = 28
b[1] = (1 * 1) + (3 * 2) + (5 * 3) = 1 + 6 + 15 = 22

Result:

Thus, a program that computes the matrix-vector multiplication b = Ax was created in C/C++ and the output was verified.
Ex No: 3

Date : Largest Number in Array

Aim:

To create a program that computes the sum of all the elements in an array A (C/C++)
or a program that finds the largest number in an array A. Use OpenMP directives to make it
run in parallel.

Algorithm:

Input:

 A: The array of size N

Output:

 The sum of all elements in the array

1. Initialize a variable sum to zero.

2. Use an OpenMP parallel for loop to distribute the iterations among multiple threads:
   a. For each element A[i] in the array (0 <= i < N):
      i. Add A[i] to the sum using a reduction clause.

3. The variable sum now contains the sum of all elements in the array.

Program:

#include <iostream>
#include <omp.h>

#define ARRAY_SIZE 1000

int computeArraySum(int A[], int size) {


int sum = 0;

#pragma omp parallel for reduction(+:sum)


for (int i = 0; i < size; ++i) {
sum += A[i];
}

return sum;
}

int main() {
int A[ARRAY_SIZE];
// Initialize array A with values
// (omitted for brevity)

// Compute the sum of array elements


int sum = computeArraySum(A, ARRAY_SIZE);

std::cout << "Sum: " << sum << std::endl;

return 0;
}
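Run:

The program can be compiled and run in the same way as the earlier OpenMP examples (the file name arraysum.cpp is assumed):

g++ -fopenmp arraysum.cpp -o arraysum
./arraysum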

Output:

For a sample array A = [1, 2, 3, 4, 5]:

Sum: 15

Result:

Thus, a program that computes the sum of all the elements in an array was created and executed successfully.
Ex No: 4
Date : Message-Passing

Aim:

To write a simple program demonstrating message-passing logic using OpenMP.

Algorithm:

1. Initialize the MPI environment.

2. Determine the rank of the current process and the total number of processes.

3. Define the sending and receiving buffers.

4. If the process rank is the sender:

 Pack the data into the sending buffer.

 Send the message to the desired destination process using MPI_Send.

5. If the process rank is the receiver:

 Receive the message from the desired source process using MPI_Recv.

 Unpack the received data from the receiving buffer.

6. Finalize the MPI environment.

Program:

#include <iostream>
#include <omp.h>

#define NUM_TASKS 5

void processTask(int taskID) {


// Simulate processing of the task
#pragma omp critical
std::cout << "Processing Task " << taskID << " on Thread " << omp_get_thread_num() <<
std::endl;
}

int main() {
#pragma omp parallel
{
#pragma omp single
{
// Create tasks
for (int taskID = 0; taskID < NUM_TASKS; ++taskID) {
#pragma omp task
processTask(taskID);
}
}
}

return 0;
}
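The task-based program above shows work being handed off to threads. The algorithm and the sample output below, however, describe a send/receive-style exchange; a minimal sketch of such message-passing logic in OpenMP is given here for reference, assuming a shared variable named mailbox acts as the message slot (the variable name and the value 42 are chosen only for illustration). The barrier implies a flush, so the sender's write becomes visible to every other thread before it is read:

#include <iostream>
#include <omp.h>

int main() {
    int mailbox = 0;  // shared slot that carries the "message"

    #pragma omp parallel shared(mailbox)
    {
        int tid = omp_get_thread_num();

        if (tid == 0) {
            // Thread 0 "sends" by writing the shared slot
            mailbox = 42;
            #pragma omp critical
            std::cout << "Thread 0 sent message: " << mailbox << std::endl;
        }

        // The barrier publishes the write to all threads before anyone reads it
        #pragma omp barrier

        if (tid != 0) {
            // Every other thread "receives" by reading the shared slot
            #pragma omp critical
            std::cout << "Thread " << tid << " received message: " << mailbox << std::endl;
        }
    }

    return 0;
}

Run with four threads (for example, OMP_NUM_THREADS=4 ./program), this sketch produces output of the form shown below; the task-based program above instead prints one "Processing Task ..." line per task, with the thread assignment varying between runs.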

Output:

Thread 0 sent message: 42

Thread 1 received message: 42

Thread 2 received message: 42

Thread 3 received message: 42

Result:

Thus, a simple program demonstrating message-passing logic using OpenMP was executed successfully.
Ex No: 5
Date : Floyd's Algorithm

Aim:

To implement the All-Pairs Shortest-Path problem (Floyd's algorithm) using OpenMP.

Algorithm:

1. Initialize the graph matrix with initial distances between vertices.

2. Create a distance matrix dist of size N x N to store the shortest path distances.

3. Copy the values from the graph matrix to the distance matrix.

4. Parallelize the following loop using OpenMP directives:
   a. For each intermediate vertex k from 0 to N-1:
       Use the #pragma omp parallel for directive to parallelize the loops over i and j.
       For each pair of vertices i and j from 0 to N-1:
          If dist[i][k] and dist[k][j] are not infinity:
             If the distance through vertex k is shorter than the current distance between i and j, update dist[i][j] to the new shorter distance.
5. Print the resulting shortest path distances stored in the dist matrix.

Program:

#include <iostream>
#include <limits>
#include <omp.h>

#define N 4 // Number of vertices

// Function to initialize the graph matrix with distances


void initializeGraph(int graph[N][N]) {
// Initialize the matrix with some distances
for (int i = 0; i < N; ++i) {
for (int j = 0; j < N; ++j) {
if (i == j)
graph[i][j] = 0;
else
graph[i][j] = (i + j) % 3 == 0 ? (i + j) : std::numeric_limits<int>::max();
}
}
}

// Function to print the resulting shortest path matrix


void printShortestPaths(int dist[N][N]) {
std::cout << "Shortest Paths:\n";
for (int i = 0; i < N; ++i) {
for (int j = 0; j < N; ++j) {
if (dist[i][j] == std::numeric_limits<int>::max())
std::cout << "INF\t";
else
std::cout << dist[i][j] << "\t";
}
std::cout << std::endl;
}
}

// Function to compute the shortest paths using Floyd's algorithm


void computeShortestPaths(int graph[N][N]) {
int dist[N][N];

// Initialize the distance matrix


for (int i = 0; i < N; ++i) {
for (int j = 0; j < N; ++j) {
dist[i][j] = graph[i][j];
}
}

// Parallelized Floyd's algorithm


for (int k = 0; k < N; ++k) {
#pragma omp parallel for collapse(2)
for (int i = 0; i < N; ++i) {
for (int j = 0; j < N; ++j) {
if (dist[i][k] != std::numeric_limits<int>::max() && dist[k][j] !=
std::numeric_limits<int>::max()) {
if (dist[i][k] + dist[k][j] < dist[i][j]) {
dist[i][j] = dist[i][k] + dist[k][j];
}
}
}
}
}

printShortestPaths(dist);
}
int main() {
int graph[N][N];

initializeGraph(graph);
computeShortestPaths(graph);

return 0;
}
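Run:

A possible compile-and-run sequence (the file name floyd.cpp is assumed):

g++ -fopenmp floyd.cpp -o floyd
./floyd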

Output:

Shortest Paths:
0       INF     INF     3
INF     0       3       INF
INF     3       0       INF
3       INF     INF     0

Result:
Thus, the All-Pairs Shortest-Path problem (Floyd's algorithm) was implemented using OpenMP and executed successfully.
Ex No : 6

Date : Monte Carlo Methods

Aim:

To implement a program for parallel random number generation using Monte Carlo methods in OpenMP.

Algorithm:

1. Set the number of points (numPoints) to generate for the Monte Carlo simulation.
2. Set the number of threads (numThreads) for parallel execution.
3. Set the maximum number of threads for OpenMP using
omp_set_num_threads(numThreads).
4. Define a function estimatePi that takes the number of points as input and returns an estimate of Pi:
    Declare a variable numPointsInsideCircle and initialize it to 0.
    Initialize a random number generator for each thread with a unique seed.
    Use the #pragma omp parallel for directive to parallelize the following steps for each point:
       Generate random coordinates (x and y) using the random number generator specific to each thread.
       Check if the point is inside the unit circle (i.e., x^2 + y^2 <= 1).
       If the point is inside the circle, increment numPointsInsideCircle by 1.
    Calculate the estimate of Pi by multiplying the ratio of points inside the circle to the total points by 4.
    Return the estimated value of Pi.
5. Call the estimatePi function with the number of points to obtain the estimate of Pi.
6. Print the estimated value of Pi.

Program:

#include <iostream>
#include <random>
#include <omp.h>

// Function to estimate Pi using Monte Carlo method


double estimatePi(int numPoints) {
    int numPointsInsideCircle = 0;

    // Parallelize the Monte Carlo simulation using OpenMP; each thread owns its
    // random number generator, seeded uniquely, so no generator state is shared
    #pragma omp parallel reduction(+:numPointsInsideCircle)
    {
        std::random_device rd;
        std::mt19937_64 generator(rd() + omp_get_thread_num());
        std::uniform_real_distribution<double> distribution(-1.0, 1.0);

        #pragma omp for
        for (int i = 0; i < numPoints; ++i) {
            // Generate random point coordinates in [-1, 1] x [-1, 1]
            double x = distribution(generator);
            double y = distribution(generator);

            // Check if the point is inside the unit circle
            if (x * x + y * y <= 1.0) {
                numPointsInsideCircle++;
            }
        }
    }

    // Estimate Pi using the ratio of points inside the circle to total points
    double pi = 4.0 * static_cast<double>(numPointsInsideCircle) /
                static_cast<double>(numPoints);
    return pi;
}

int main() {
// Number of points to generate for the Monte Carlo simulation
int numPoints = 10000000;

// Set the number of threads for parallel execution


int numThreads = omp_get_max_threads();
omp_set_num_threads(numThreads);

// Estimate Pi using Monte Carlo simulation with parallel random number generation
double pi = estimatePi(numPoints);

std::cout << "Estimated Pi: " << pi << std::endl;

return 0;
}
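Run:

A possible compile-and-run sequence (the file name montecarlo.cpp is assumed; the printed estimate varies slightly between runs because the seeds are random):

g++ -fopenmp montecarlo.cpp -o montecarlo
./montecarlo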
Output:
Estimated Pi: 3.1416596

Result:
Thus, parallel random number generation using Monte Carlo methods in OpenMP was implemented and executed successfully.
Ex No : 7

Date : MPI Broadcast and Collective Communication

Aim:

To write a program to demonstrate MPI broadcast and collective communication in C.

Algorithm:

1. Initialize MPI using MPI_Init.

2. Get the rank of the current process using MPI_Comm_rank.

3. Get the total number of processes using MPI_Comm_size.

4. Initialize a variable value and set it to 0.

5. If the rank is 0, set value to the desired initial value (e.g., 42).

6. Use MPI_Bcast to broadcast the value from process 0 to all other processes.

7. Print the received value for each process using printf.

8. Allocate memory for an array values of size size to store the gathered values.

9. Use MPI_Allgather to gather the values from all processes into the values array.

10. Print the gathered values for each process using printf.

11. Free the dynamically allocated memory for the values array.

12. Finalize MPI using MPI_Finalize.

Program:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char** argv) {


int rank, size;
int value = 0;

// Initialize MPI
MPI_Init(&argc, &argv);

// Get the rank and size of the MPI process


MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);

// Only process 0 initializes the value


if (rank == 0) {
value = 42;
}
// Broadcast the value from process 0 to all other processes
MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
printf("Process %d received value: %d\n", rank, value);

// Perform collective communication to gather values from all processes


int* values = (int*)malloc(sizeof(int) * size);
MPI_Allgather(&value, 1, MPI_INT, values, 1, MPI_INT, MPI_COMM_WORLD);

// Print the gathered values from all processes


printf("Process %d gathered values: ", rank);
for (int i = 0; i < size; ++i) {
printf("%d ", values[i]);
}
printf("\n");

// Free dynamically allocated memory


free(values);

// Finalize MPI
MPI_Finalize();

return 0;
}
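Run:

Assuming an MPI implementation such as MPICH or Open MPI is installed and the source is saved as bcast.c (file name chosen for illustration), the program can be compiled and launched with four processes to match the sample output below:

mpicc bcast.c -o bcast
mpirun -np 4 ./bcast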

Output:
Process 0 received value: 42
Process 1 received value: 42
Process 2 received value: 42
Process 3 received value: 42

Process 0 gathered values: 42 42 42 42


Process 1 gathered values: 42 42 42 42
Process 2 gathered values: 42 42 42 42
Process 3 gathered values: 42 42 42 42

Result:
Thus, the program to demonstrate MPI broadcast and collective communication in C was executed successfully.
Ex No : 8

Date : MPI Scatter, Gather, and Allgather

Aim:

To write a program to demonstrate MPI scatter, gather, and allgather in C.

Algorithm:

1. Initialize MPI using MPI_Init.

2. Get the rank of the current process using MPI_Comm_rank.

3. Get the total number of processes using MPI_Comm_size.

4. Initialize the send buffer (sendbuf) and receive buffer (recvbuf).

5. If the rank is 0, initialize the send buffer with the desired values.

6. Use MPI_Scatter to scatter portions of the send buffer to all processes.

7. Print the received values for each process.

8. Use MPI_Gather to gather the received values from all processes into the send buffer
of process 0.

9. If the rank is 0, print the gathered values.

10. Use MPI_Allgather to gather values from all processes into the send buffer of each
process.

11. Print the gathered values for each process.

12. Finalize MPI using MPI_Finalize.

Program:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char** argv) {


int rank, size;
int sendbuf[10];
int recvbuf[2];

// Initialize MPI
MPI_Init(&argc, &argv);
// Get the rank and size of the MPI process
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);

// Only process 0 initializes the send buffer


if (rank == 0) {
for (int i = 0; i < 10; ++i) {
sendbuf[i] = i;
}
}

// Scatter the send buffer to all processes


MPI_Scatter(sendbuf, 2, MPI_INT, recvbuf, 2, MPI_INT, 0, MPI_COMM_WORLD);

// Print the received values for each process


printf("Process %d received values: ", rank);
for (int i = 0; i < 2; ++i) {
printf("%d ", recvbuf[i]);
}
printf("\n");

// Gather the received values from all processes


MPI_Gather(recvbuf, 2, MPI_INT, sendbuf, 2, MPI_INT, 0, MPI_COMM_WORLD);

// Process 0 prints the gathered values


if (rank == 0) {
printf("Process 0 gathered values: ");
for (int i = 0; i < 10; ++i) {
printf("%d ", sendbuf[i]);
}
printf("\n");
}

// Perform allgather operation to gather values from all processes


MPI_Allgather(recvbuf, 2, MPI_INT, sendbuf, 2, MPI_INT, MPI_COMM_WORLD);

// Print the gathered values from all processes


printf("Process %d gathered values: ", rank);
for (int i = 0; i < 2 * size; ++i) {
printf("%d ", sendbuf[i]);
}
printf("\n");

// Finalize MPI
MPI_Finalize();

return 0;
}
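Run:

Assuming the source is saved as scatter_gather.c (file name chosen for illustration), compile with mpicc and launch with four processes to match the sample output below, since each rank receives two of the ten elements:

mpicc scatter_gather.c -o scatter_gather
mpirun -np 4 ./scatter_gather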

Output:

Process 0 received values: 0 1
Process 1 received values: 2 3
Process 2 received values: 4 5
Process 3 received values: 6 7
Process 0 gathered values: 0 1 2 3 4 5 6 7 8 9
Process 0 gathered values: 0 1 2 3 4 5 6 7
Process 1 gathered values: 0 1 2 3 4 5 6 7
Process 2 gathered values: 0 1 2 3 4 5 6 7
Process 3 gathered values: 0 1 2 3 4 5 6 7

Result:
Thus, the program to demonstrate MPI scatter, gather, and allgather in C was executed successfully.
Ex No : 9

Date : MPI Send and Receive

Aim:

To write a program to demonstrate MPI send and receive in C.

Algorithm:

1. Initialize MPI using MPI_Init.

2. Get the rank of the current process using MPI_Comm_rank.

3. Get the total number of processes using MPI_Comm_size.

4. Initialize variables for the value to be sent and received.

5. If the rank is 0, set the value to be sent.

6. If the rank is 0, send the value to the target process using MPI_Send.

7. If the rank is the target process, receive the value from the source process using
MPI_Recv.

8. Print the received value.

9. Finalize MPI using MPI_Finalize

Program:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char** argv) {


int rank, size;
int value = 0;
int received_value;

// Initialize MPI
MPI_Init(&argc, &argv);

// Get the rank and size of the MPI process


MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);

// Only process 0 initializes the value


if (rank == 0) {
value = 42;
printf("Process 0 sends value: %d\n", value);
// Send the value to process 1
MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
} else if (rank == 1) {
// Receive the value from process 0
MPI_Recv(&received_value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
MPI_STATUS_IGNORE);
printf("Process 1 receives value: %d\n", received_value);
}

// Finalize MPI
MPI_Finalize();

return 0;
}
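Run:

Assuming the source is saved as send_recv.c (file name chosen for illustration); at least two processes are needed because rank 0 sends to rank 1:

mpicc send_recv.c -o send_recv
mpirun -np 2 ./send_recv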

Output:
Process 0 sends value: 42
Process 1 receives value: 42

Result:
Thus, a program to demonstrate MPI send and receive in C was executed successfully.
Ex No: 10
Date : Parallel Rank

Aim:
To write a program to demonstrate performing parallel rank with MPI in C.

Algorithm:

1. Initialize MPI using MPI_Init.


2. Get the rank of the current process using MPI_Comm_rank.
3. Get the total number of processes using MPI_Comm_size.
4. Print the rank and total number of processes for each process.
5. Finalize MPI using MPI_Finalize.

Program:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char** argv) {


int rank, size;

// Initialize MPI
MPI_Init(&argc, &argv);

// Get the rank and size of the MPI process


MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);

// Print the rank for each process


printf("Process %d out of %d processes\n", rank, size);

// Finalize MPI
MPI_Finalize();

return 0;
}
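Run:

Assuming the source is saved as rank.c (file name chosen for illustration); launching with four processes matches the sample output below:

mpicc rank.c -o rank
mpirun -np 4 ./rank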
Output:
Process 0 out of 4 processes
Process 1 out of 4 processes
Process 2 out of 4 processes
Process 3 out of 4 processes

Result:
Thus, a program to demonstrate performing parallel rank with MPI in C was executed successfully.
