
PARALLEL AND DISTRIBUTED COMPUTING

CSE4001

DIGITAL ASSIGNMENT-1

SLOT: D2

Bikram Prasad

17BCE2396

1. A multiprocessor consists of 100 processors, each capable of a peak
execution rate of 2 GFlops. What is the performance of the system, measured in
GFlops, when 10% of the code is sequential and 90% of the code is parallelizable?

Using Amdahl's Law,

Speedup = 1 / (f + (1 - f)/p)

where f = fraction of operations that must be performed sequentially, and p =
number of processors.
For the given problem, f = 10% = 0.1 and p = 100.
So, speedup = 1 / (0.1 + 0.9/100) = 9.17.
The peak execution rate of each processor is 2 GFlops.
Peak performance of the system = 2 * 9.17 = 18.34 GFlops.
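
As a quick check (a minimal Python sketch, not part of the original solution), the same numbers follow directly from Amdahl's law:

def amdahl_speedup(f, p):
    # Speedup = 1 / (f + (1 - f) / p), with f the sequential fraction
    # and p the number of processors.
    return 1.0 / (f + (1.0 - f) / p)

f, p = 0.1, 100                  # 10% sequential code, 100 processors
peak_per_processor = 2.0         # GFlops per processor

speedup = amdahl_speedup(f, p)   # ~9.17
print(round(speedup, 2))                       # 9.17
print(round(peak_per_processor * speedup, 2))  # 18.35 (18.34 with the rounded speedup)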

2.An examination paper has 8 questions to be answered and there are 1000
answer books. Each answer takes 3 minutes to correct. If 4 teachers are
employed to correct the papers in a pipeline mode, how much time will be taken
to complete the job of correcting 1000 answer papers? What is the efficiency of
processing? If 8 teachers are employed instead of 4, what is the efficiency? What
is the job completion time in the second case? Repeat with 32 teachers and 4
pipelines.

Here, there are 1000 papers, 8 questions per paper, and it takes 3 minutes to correct each question.

If a single teacher corrects all the papers, time taken = 1000 * 8 * 3 = 24000 minutes.

Case 1: When there are 4 teachers,
Each teacher has to correct 2 questions, so the clock cycle is 2 * 3 = 6 minutes.
With n = 1000 papers and k = 4 pipeline stages, time taken = (n + k - 1) clock cycles = 1003 clock cycles = 1003 * 6 = 6018 minutes.
Efficiency = (time taken by a single teacher) / (pipelined time * number of teachers)

Efficiency = 24000 / (6018 * 4) = 0.997

Case 2: When there are 8 teachers,
Each teacher corrects 1 question, so one clock cycle is 3 minutes.
Time taken = (1000 + 8 - 1) * 3 = 3021 minutes.
Efficiency = 24000 / (3021 * 8) = 0.993
Case 3: 32 teachers, 4 pipelines
Each pipeline has 8 teachers, so each teacher corrects 1 question and the clock cycle is 3 minutes.
250 papers must be corrected in each pipeline.
Time taken for one pipeline with 250 papers and 8 teachers = (250 + 8 - 1) * 3 = 771 minutes.
The time to correct all the papers is also 771 minutes, since the 4 pipelines run simultaneously.
Efficiency = 24000 / (771 * 32) = 0.973
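
The three cases above can be verified with a short Python sketch (illustrative only, assuming the pipeline model used above: stage time = questions per teacher * 3 minutes, and n papers through k stages take n + k - 1 cycles):

QUESTIONS = 8
MINUTES_PER_QUESTION = 3
PAPERS = 1000
SERIAL_TIME = PAPERS * QUESTIONS * MINUTES_PER_QUESTION   # 24000 minutes

def pipeline_case(teachers, pipelines=1):
    stages = teachers // pipelines                        # teachers per pipeline
    papers = PAPERS // pipelines                          # papers per pipeline
    cycle = (QUESTIONS // stages) * MINUTES_PER_QUESTION  # minutes per clock cycle
    time = (papers + stages - 1) * cycle                  # pipelines run in parallel
    return time, SERIAL_TIME / (time * teachers)

for teachers, pipelines in [(4, 1), (8, 1), (32, 4)]:
    time, eff = pipeline_case(teachers, pipelines)
    print(teachers, "teachers,", pipelines, "pipeline(s):", time, "min, efficiency", round(eff, 3))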

3.Distinguish between UMA, NUMA and CC-NUMA parallel computer


architectures. Give one example block diagram of each of these architectures.
What are the advantages and disadvantages of each of these architectures? Do
all of these architectures have a global addressable shared memory? What is
the programming model used by these parallel computers?

UMA (Uniform Memory Access):

Uniform Memory Access has limited bandwidth.
Uniform Memory Access is slower than Non-Uniform Memory Access.
In Uniform Memory Access, a single memory controller is used.
Uniform Memory Access is suitable for general-purpose and time-sharing applications.
In Uniform Memory Access, memory access time is balanced, i.e. equal for all processors.
Three kinds of interconnects are used in Uniform Memory Access: single bus, multiple buses, and crossbar.
Advantages:
A global address space provides a user-friendly programming perspective to memory.
Data sharing between tasks is both fast and uniform due to the proximity of memory to the CPUs.

Disadvantages:
The programmer is responsible for the synchronization constructs that ensure “correct” access to global memory.
It becomes increasingly difficult and expensive to design and produce shared-memory machines with an ever-increasing number of processors.

BLOCK DIAGRAM:-
NUMA (Non-Uniform Memory Access):

Non-Uniform Memory Access has more bandwidth than Uniform Memory Access.
Non-Uniform Memory Access is faster than Uniform Memory Access.
In Non-Uniform Memory Access, multiple memory controllers are used.
Non-Uniform Memory Access is suitable for real-time and time-critical applications.
In Non-Uniform Memory Access, memory access time is not equal.
Two kinds of interconnects are used in Non-Uniform Memory Access: tree and hierarchical.
Advantages:
Less replication of data
Easier programming
Disadvantages:
The cost of hardware routers.
Lack of programming standards for large configurations.
BLOCK DIAGRAM:-

CC-NUMA (Cache Coherent NUMA):

CC-NUMA has more bandwidth than Uniform Memory Access.
This architecture allows fast access to data in local memory and slower access to data in remote memory.
Different memory controllers are used, and they cooperate using directory techniques to maintain cache coherence across the system.
CC-NUMA is suitable for real-time and time-critical applications.
As in NUMA, memory access time is not equal.
The advantage of CC-NUMA over a standard SMP design is that there are several local buses, so no single bus has to carry all the transactions occurring throughout the system.
Advantages:
ccNUMA uses inter-processor communication between cache controllers to keep a
consistent memory image when more than one cache stores the same memory location.

Disadvantages:
ccNUMA may perform poorly when multiple processors attempt to access the same
memory area in rapid succession.

BLOCK DIAGRAM:-

The UMA architecture is based on multiple processors accessing a single shared memory with uniform access time. In NUMA and CC-NUMA, each processor has its own local memory, and a globally shared address space is used for data passing and communication between the processors. In NUMA, a processor can also access the local memory of the other processors. All three parallel computing architectures therefore have a globally addressable shared memory.

The UMA programming model is based on stream processing, implicit parallelism, explicit
parallelism, and non-blocking tasks. NUMA programming maximizes local memory
references; FMA is used for thread binding and memory placement. Non-cache-coherent
NUMA systems become prohibitively complex to program in the standard von Neumann
programming model.
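
As a small illustration of the shared-address-space model described above (a Python threading sketch, not taken from any particular machine's API), every thread reads and writes the same global variable, and the programmer supplies the synchronization:

import threading

counter = 0                  # lives in the single global address space
lock = threading.Lock()      # programmer-supplied synchronization construct

def worker(increments):
    global counter
    for _ in range(increments):
        with lock:           # without the lock, concurrent updates could be lost
            counter += 1

threads = [threading.Thread(target=worker, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)               # 40000: all threads updated the same shared location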
4. What are the major differences between a message passing parallel
computer and a NUMA parallel computer?

Message Passing vs NUMA:

1. Memory: In a message passing computer, each processor has its own exclusive memory. In a NUMA computer, processors share a common memory space, but each processor has a different memory access time (non-uniform memory access).

2. Address space: In message passing, there is no global shared address space. In NUMA, there is a shared address space common to all processors.

3. Communication: In message passing, communication between processors takes place directly through a message-passing paradigm; send and receive variants are the only means of communication. In NUMA, communication takes place by changing common variables in the shared address space, so a write to a location is visible to the reads of all other processors.

4. Hardware support: Message passing requires little hardware support other than a network. NUMA requires hardware support for the shared memory; also, if each processor has an individual cache, a cache coherence mechanism is needed.

5. Emulation: It is difficult to implement NUMA using message passing, whereas a NUMA computer can easily emulate message passing.
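
The contrast in the table can be sketched with Python's multiprocessing module (an illustrative analogy, not a description of any particular parallel machine): message passing communicates only through explicit send/receive, while the shared-memory style communicates by writing a variable that every process can see.

from multiprocessing import Process, Pipe, Value

def mp_worker(conn):
    conn.send("result from worker")   # message passing: explicit send
    conn.close()

def shm_worker(shared):
    with shared.get_lock():           # shared memory: update a common variable
        shared.value += 1

if __name__ == "__main__":
    # Message-passing style: no shared address space, only a channel.
    parent_conn, child_conn = Pipe()
    p = Process(target=mp_worker, args=(child_conn,))
    p.start()
    print(parent_conn.recv())         # explicit receive
    p.join()

    # Shared-memory style: a write is visible to every process.
    shared = Value("i", 0)            # integer placed in shared memory
    procs = [Process(target=shm_worker, args=(shared,)) for _ in range(4)]
    for q in procs:
        q.start()
    for q in procs:
        q.join()
    print(shared.value)               # 4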
