Parallel Assignment (1)
202003047
1. Compare: 1) problem-constrained (PC) scaling, 2) memory-constrained (MC) scaling, and 3) time-constrained (TC) scaling.
1. Problem-Constrained (PC) Scaling
Definition: Problem-constrained scaling holds the problem size fixed while the number of processors increases; this is also known as strong scaling. The goal is to solve the same problem faster.
Characteristics:
Workload: The total workload is fixed, so the workload per processor shrinks as the number of processors increases.
Objective: To evaluate how much the execution time of a fixed problem decreases as more resources are added.
Performance Metric: Fixed-size speedup, S(p) = T(1) / T(p), which is bounded by Amdahl's Law.
Use Cases: Suitable for scenarios where the problem size cannot usefully grow, such as reducing the turnaround time of a given simulation.
Advantages:
Directly answers the practical question of how much faster an existing problem can be solved on a larger machine.
Disadvantages:
Speedup is limited by the serial fraction of the program, and the work per processor shrinks until communication and management overhead dominate.
Not useful when the real goal is to solve larger problems rather than the same problem faster.
2. Memory-Constrained (MC) Scaling
Definition: Memory-constrained scaling increases the problem size along with the number of processors so that memory usage per processor stays constant; the problem is scaled to fill the aggregate memory of the machine. This is the memory-bounded form of weak scaling.
Characteristics:
Workload: The problem size grows with the machine; the workload per processor stays constant only if the computation grows linearly with the memory, and grows otherwise.
Objective: To solve the largest problem that fits in the combined memory of all processors, including problems too large for the memory of a single processor.
Performance Metric: Scaled speedup and efficiency, measured at constant memory use per processor.
Advantages:
Allows solving problems that require more memory than is available on a single processor.
Effective for memory-intensive applications like large databases and scientific simulations.
Disadvantages:
If the computation grows faster than the memory (e.g., O(n^3) work on O(n^2) data), the execution time increases with the machine size, and runs at different processor counts solve different problems, which complicates comparison.
3. Time-Constrained (TC) Scaling
Definition: Time-constrained scaling holds the execution time fixed and grows the problem size as processors are added, so that the largest possible problem is solved within the same time budget; this is the model behind Gustafson's Law.
Characteristics:
Workload: The problem size grows with the number of processors while the execution time is held constant.
Objective: To evaluate how much larger a problem can be solved in the same amount of time as more processors are added.
Performance Metric: Scaled speedup, i.e. the ratio of the time one processor would need for the scaled problem to the fixed parallel execution time (see the formulas after this comparison).
Advantages:
Matches how large machines are often used in practice: users keep an acceptable turnaround time and solve bigger problems, avoiding the pessimism of fixed-size speedup.
Disadvantages:
Diminishing returns as more processors are added due to overhead and communication costs.
Finding the problem size that exactly matches the time budget can be difficult, and measurements at different processor counts correspond to different problems.
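The contrast between fixed-size (PC) and scaled (TC) measurement is captured by two standard formulas, where s is the serial fraction of the work and p the number of processors (communication overhead ignored for simplicity):

Fixed-size speedup (Amdahl's Law):   S_fixed(p)  = 1 / (s + (1 - s) / p)
Scaled speedup (Gustafson's Law):    S_scaled(p) = s + (1 - s) * p

For example, with s = 0.1 and p = 16, S_fixed ≈ 6.4 while S_scaled = 14.5: the same machine looks far more scalable when the problem is allowed to grow with it.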
2. Write pseudocode for summing M numbers with N processors using shared and distributed memory.
Shared memory (assume M is divisible by N; each of the N threads sums its own block of M / N elements and then adds it into a shared total protected by a lock; the distributed-memory version follows below):

master_process:
    global_sum = 0

each_thread (run in parallel on the N processors):
    local_sum = 0
    for j from 0 to (M / N) - 1
        local_sum += subarray[j]      // subarray = this thread's block of numbers
    acquire(lock)                     // critical section: one thread at a time
    global_sum += local_sum
    release(lock)
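The distributed-memory half of the answer is missing from the original; a minimal message-passing sketch in C using MPI, assuming M is divisible by the number of processes and using the value 1.0 for every element so the example is self-contained:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int M = 1000000;      // total number of elements (assumed divisible by nprocs)
    int chunk = M / nprocs;     // elements owned by this process

    double local_sum = 0.0;     // each process sums only its own block
    for (int j = 0; j < chunk; j++)
        local_sum += 1.0;       // placeholder data; a real program would read its block

    double global_sum = 0.0;    // combine the N partial sums at process 0
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %f\n", global_sum);

    MPI_Finalize();
    return 0;
}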
3. Is a program that obtains linear speedup strongly scalable? Explain your answer.
Explanation:
Yes. Strong scalability (strong scaling) means a parallel program can keep its efficiency constant as the number of processors grows while the problem size stays fixed. Linear speedup means S(p) = T(1) / T(p) = p, so the efficiency E(p) = S(p) / p = 1 for every p: efficiency does not degrade as processors are added, which is exactly the definition of strong scalability. Hence a program that obtains linear speedup on a fixed problem size is strongly scalable.
4. Assume the runtime of a program is 100 seconds for a problem of size 1. The program consists of an initialization phase
which lasts for 10 seconds and cannot be parallelized, and a problem solving phase which can be perfectly parallelized
and grows quadratically with increasing problem size.
i. What is the speedup for the program as a function of the number of processors p and the problem size n.
ii. What is the execution time and speedup of the program with problem size 1, if it is parallelized and run on 4
processors?
iii. What is the execution time of the program if the problem size is increased by a factor of 4 and it is run on 4
processors? And on 16 processors? What is the speedup of both measurements?
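These three parts are left unanswered in the original; a worked solution follows directly from the problem statement. The initialization takes 10 s and is serial; the solving phase takes 100 - 10 = 90 s at size 1, parallelizes perfectly, and grows quadratically with the problem size n, so the runtime on p processors is

T(n, p) = 10 + 90 * n^2 / p   (seconds)

i. The speedup relative to one processor on the same problem is
S(n, p) = T(n, 1) / T(n, p) = (10 + 90 n^2) / (10 + 90 n^2 / p).

ii. For n = 1 on p = 4 processors: T(1, 4) = 10 + 90/4 = 32.5 s, so S = 100 / 32.5 ≈ 3.08.

iii. For n = 4 the solving phase takes 90 * 16 = 1440 s sequentially, so T(4, 1) = 1450 s.
On 4 processors: T(4, 4) = 10 + 1440/4 = 370 s, giving S = 1450 / 370 ≈ 3.92.
On 16 processors: T(4, 16) = 10 + 1440/16 = 100 s, giving S = 1450 / 100 = 14.5.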
5. A program P consists of parts A, B and C. Part A is not parallelized while parts B and C are parallelized. The
program is run on a computer with 10 cores. For a given problem size, the execution of the program in 1 of the
cores takes 10, 120, and 53 seconds, for parts A, B, and C, respectively. Answer the following questions:
i. What is the minimum execution time we can attain for each part if we now execute the same program with the
same problem size on 5 of the cores? What is the best speedup we can expect for the entire program?
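This part is also unanswered in the original; assuming parts B and C parallelize perfectly across the 5 cores while part A stays serial:
Part A: 10 s (not parallelized, unchanged).
Part B: 120 / 5 = 24 s.
Part C: 53 / 5 = 10.6 s.
The best total time is 10 + 24 + 10.6 = 44.6 s. The sequential time is 10 + 120 + 53 = 183 s, so the best speedup we can expect is 183 / 44.6 ≈ 4.1.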
6. Design a multi-threaded, superscalar dual-core processor. The processor executes up to two instructions per clock
from one instruction stream on each core (one SIMD instruction + one scalar instruction). It can also switch to
executing the other instruction stream when faced with a stall.
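The original answer (presumably a block diagram) did not survive; a textual sketch derived directly from the specification, with both cores identical:

+---------------------------------------------+
| Core 0 (Core 1 is identical)                |
|                                             |
|  Fetch/Decode: up to 2 instructions/clock   |
|    from the currently active stream         |
|                                             |
|  Context 0 (PC, regs)   Context 1 (PC, regs)|
|         \---- switch on a stall ----/       |
|                                             |
|  Execution units: 1 SIMD ALU + 1 scalar ALU |
+---------------------------------------------+

Each core holds two hardware execution contexts (one per instruction stream); on a stall (e.g., a cache miss) the core switches to the other context to hide the latency, and in each clock it can issue one SIMD and one scalar instruction from the active stream.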
7. Given the following program parallelized with OpenMP:
#include <omp.h>
8. The pseudocode for the sequential algorithm of a 2D grid-based solver (a partial differential equation (PDE) on an
(N+2) x (N+2) grid) is provided below.
i. Write the pseudocode for the shared address space solver.
const int n;      // grid interior is n x n
float* A;         // assume allocated to grid of N+2 x N+2 elements
void solve(float* A) {
    float diff, prev;
    bool done = false;
    while (!done) {                        // sweep until convergence
        diff = 0.f;
        for each interior point (i, j):    // update each point from its 4 neighbors
            prev = A[i,j];
            A[i,j] = 0.2f * (A[i,j] + A[i,j-1] + A[i,j+1] + A[i-1,j] + A[i+1,j]);
            diff += fabs(A[i,j] - prev);
        if (diff / (n*n) < TOLERANCE) done = true;
    }
}
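Only the #include line of the answer to (i) survives; a sketch of the shared address space solver using OpenMP, keeping the structure of the sequential code above (n and TOLERANCE as defined there; as in the usual treatment of this solver, the parallel sweep tolerates non-deterministic ordering of neighbor updates):

#include <omp.h>
#include <math.h>

void solve(float* A) {
    bool done = false;
    while (!done) {
        float diff = 0.f;
        // Grid rows are divided among the threads; each thread's partial
        // diff is combined by the reduction clause.
        #pragma omp parallel for reduction(+:diff)
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= n; j++) {
                float prev = A[i*(n+2) + j];
                A[i*(n+2) + j] = 0.2f * (A[i*(n+2) + j] + A[i*(n+2) + j-1]
                                         + A[i*(n+2) + j+1] + A[(i-1)*(n+2) + j]
                                         + A[(i+1)*(n+2) + j]);
                diff += fabsf(A[i*(n+2) + j] - prev);
            }
        }  // implicit barrier: the sweep is complete before the test below
        if (diff / (float)(n*n) < TOLERANCE)
            done = true;
    }
}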
ii. Compare the shared address space and message passing programming models in terms of computation,
communication, and synchronization.
Computation: Essentially the same in both models; each thread or process performs the computation on its assigned
portion of the data (e.g., its block of grid rows).
Communication: In the shared address space model, communication is implicit, through ordinary loads and stores to
shared data. In the message passing model, communication is explicit, through send/receive operations (e.g.,
exchanging boundary rows between neighboring processes).
Synchronization: In the shared address space model, synchronization is explicit and separate from communication
(locks, barriers). In the message passing model, synchronization is bundled with communication: a receive cannot
complete before the matching send, so the messages themselves order events.
[Fragment of an answer about a multi-core processor diagram; the figure itself is not included.]
Each core appears to support 2 threads (indicated by the two blue sections of the figure).
How many independent pieces of work are needed to run the chip with maximal latency-hiding ability? Given that
each core has 2 threads and there are 8 cores, 8 x 2 = 16 independent pieces of work are needed.
i. Assuming a single worker thread (X10_NTHREADS=1), what is the runtime of this program?
The speedup is not 2X because of the overheads involved in parallelization and the presence of serial components in the task
graph. Tasks S1 and S2, together with the synchronization points before and after the parallel tasks, contribute to the
non-parallelizable portion of the total runtime. This inherent serialization limits the achievable speedup according to
Amdahl's Law.