HPC Int I Retest Answer Key
1. The average number of tasks completed by the server over a time period is
called __________. U/ CO1
a) 11 bits b) 16 bits c) 21 bits d) 20 bits
5. In a __________ system, all caches on the bus monitor the bus to determine if they
have a copy of the block of data that is requested on the bus. R/ CO2
a) Coherence b) Recurrence c) Replication d) Snooping
6. Identify the directive that forces threads to wait until all are done.
U/ CO3
a) #pragma omp parallel b) #pragma omp barrier c) #pragma omp critical d) #pragma omp sections
7. All the components of a CPU core can operate at some maximum speed called
__________. R/ CO1
a) Accelerated Performance b) Peak Performance c) High Performance d) Scalable Performance
8. Cache memory works on the principle of locality of reference. R/CO1
9. The OpenMP library function omp_set_num_threads( ) is used to set the
number of threads in upcoming parallel regions. R/ CO1
10. Parallel efficiency is defined as __________. R/ CO2
How many iterations are executed if four threads execute the above program?
If four threads execute the program, the loop is split among the four threads
(100/4 = 25), so each thread executes 25 iterations.
#include <stdio.h>
#include <omp.h>
#include <sys/time.h>
#define N 100  /* assumed; matches the 100-iteration question above */

int A[N][N];
int B[N][N][N];
int C[N][N][N];

int main()
{
    int i, j, k;
    struct timeval tv1, tv2;
    struct timezone tz;
    double elapsed;

    omp_set_num_threads(4);

    /* Initialize A and B; the global array C starts zeroed. */
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            for (k = 0; k < N; k++) {
                A[i][j] = 2;
                B[i][j][k] = 4;
            }
        }
    }

    gettimeofday(&tv1, &tz);
    #pragma omp parallel for private(i,j,k) shared(A,B,C)
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            for (k = 0; k < N; k++) {
                C[i][j][k] += A[i][k] * B[k][j][i];
            }
        }
    }
    gettimeofday(&tv2, &tz);

    elapsed = (double) (tv2.tv_sec - tv1.tv_sec)
            + (double) (tv2.tv_usec - tv1.tv_usec) * 1.e-6;
    printf("Elapsed time = %f seconds.\n", elapsed);
    return 0;
}
#include <stdio.h>
#include <omp.h>
#include <sys/time.h>
#define rows 2  /* assumed: A below has rows*rows = 4 elements */
#define cols 3  /* assumed: B below has rows*cols = 6 elements */

/* Matrix multiplication with column-major storage:
 * A is 2x2, B is 2x3, res = A*B is 2x3,
 * with element (i,j) stored at index i + j*rows. */
int main()
{
    int i, j, k;
    struct timeval tv1, tv2;
    struct timezone tz;
    double elapsed;
    int A[rows*rows];
    int B[rows*cols];
    int res[rows*cols];

    A[0]=1; A[1]=2; A[2]=3; A[3]=4;
    B[0]=5; B[1]=6; B[2]=2; B[3]=3; B[4]=1; B[5]=7;

    gettimeofday(&tv1, &tz);
    /* multiplication in column-major order */
    #pragma omp parallel for private(i,j,k) shared(A,B,res)
    for (i = 0; i < rows; i++) {
        for (j = 0; j < cols; j++) {
            res[i+j*rows] = 0;
            for (k = 0; k < rows; k++) {
                res[i+j*rows] += A[i+k*rows] * B[k+j*rows];
                printf("%d %d %d\n", res[i+j*rows], A[i+k*rows], B[k+j*rows]);
            }
        }
    }
    gettimeofday(&tv2, &tz);

    elapsed = (double) (tv2.tv_sec - tv1.tv_sec)
            + (double) (tv2.tv_usec - tv1.tv_usec) * 1.e-6;
    printf("Elapsed time = %f seconds.\n", elapsed);

    for (i = 0; i < rows*cols; i++) {
        printf("\nres[%d]=%d", i, res[i]);
    }
    return 0;
}
MATRIX MULTIPLICATION:
#include <stdio.h>
#include <omp.h>
#include <sys/time.h>
#define N 100

int A[N][N];
int B[N][N];
int C[N][N];

int main()
{
    int i, j, k;
    struct timeval tv1, tv2;
    struct timezone tz;
    double elapsed;

    omp_set_num_threads(100);

    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            A[i][j] = 2;
            B[i][j] = 2;
        }
    }

    gettimeofday(&tv1, &tz);
    #pragma omp parallel for private(i,j,k) shared(A,B,C)
    for (i = 0; i < N; ++i) {
        for (j = 0; j < N; ++j) {
            for (k = 0; k < N; ++k) {
                C[i][j] += A[i][k] * B[k][j];
            }
        }
    }
    gettimeofday(&tv2, &tz);

    elapsed = (double) (tv2.tv_sec - tv1.tv_sec)
            + (double) (tv2.tv_usec - tv1.tv_usec) * 1.e-6;
    printf("Elapsed time = %f seconds.\n", elapsed);
    return 0;
}
Data can be stored in a computer system in many different ways. CPUs have a
set of registers, which can be accessed without delay. In addition, there are one or
more small but very fast caches holding copies of recently used data items. Main
memory is much slower, but also much larger than cache. Finally, data can be
stored on disk and copied to main memory as needed. Together, these levels form
a complex memory hierarchy.
Understanding the data transfer between the different levels of this hierarchy is
vital for identifying performance bottlenecks.
(2 marks)
Shared memory opens the possibility of immediate access to all data from
all processors without explicit communication. OpenMP is a set of compiler
directives (together with library routines and environment variables) for
programming such shared-memory systems.
Model for OpenMP thread operations: The master thread “forks” a team of
threads, which work on shared memory in a parallel region. After the parallel
region, the threads are “joined,” i.e., terminated or put to sleep, until the next
parallel region starts. The number of running threads may vary among parallel
regions. (3 Marks)