Parallel Computing

Lecture #10

Performance Metrics
Performance

Two key goals to be achieved with the design of parallel applications are:

• Performance – the capacity to reduce the time needed to solve a problem as the computing resources increase

• Scalability – the capacity to increase performance as the size of the problem increases
Need for Performance Metrics

 Determining the best algorithm

 Evaluating hardware platforms

 Examining the benefits of parallelism

Performance Metrics

There are 2 distinct classes of performance metrics:


• Performance metrics for processors/cores – assess the
performance of a processing unit, normally done by measuring the
speed or the number of operations that it does in a certain period
of time
• Performance metrics for parallel applications – assess the
performance of a parallel application, normally done by comparing
the execution time with multiple processing units against the
execution time with just one processing unit
Analytical Modeling – Basics

• A sequential algorithm is evaluated by its runtime.

• The parallel runtime of a program depends on the input size, the number of processors, and the communication parameters of the machine.
Sources of Overhead in Parallel Programs

• If I use two processors, shouldn't my program run twice as fast?

• No – a number of overheads, including wasted computation, communication, idling, and contention, cause degradation in performance.

[Figure: Execution Time – the execution profile of a hypothetical parallel program executing on eight processing elements (P0–P7). The profile indicates time spent performing computation (both essential and excess), interprocessor communication, and idling.]
Sources of Overhead in Parallel Programs

• Interprocess interactions: Processors working on any non-trivial parallel problem will need to talk to each other.

• Idling: Processes may idle because of load imbalance, synchronization, or serial components.

• Excess computation: This is computation not performed by the serial version. This might be because the serial algorithm is difficult to parallelize, or because some computations are repeated across processors to minimize communication.
Performance Metrics

A number of metrics have been used, based on the desired outcome of performance analysis:

• Execution Time

• Total Parallel Overhead

• Speedup

• Efficiency

• Cost
Execution Time

• The serial runtime of a program is the time elapsed between the beginning and the end of its execution on a sequential computer.

• The parallel runtime is the time that elapses from the moment the parallel computation starts to the moment the last processor finishes execution.

• We denote the serial runtime by T_S and the parallel runtime by T_P.
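As an illustration (not part of the original slides), here is a minimal Python sketch of how T_S and T_P might be measured; the work() function and the use of multiprocessing.Pool are assumptions chosen only to make the example runnable.

import time
from multiprocessing import Pool

def work(chunk):
    # Illustrative workload: sum a list of numbers.
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))

    # Serial runtime T_S: one processing element does all of the work.
    t0 = time.perf_counter()
    serial_result = work(data)
    T_S = time.perf_counter() - t0

    # Parallel runtime T_P: from the start of the parallel computation
    # until the last worker finishes.
    p = 4
    chunks = [data[i::p] for i in range(p)]
    t0 = time.perf_counter()
    with Pool(p) as pool:
        parallel_result = sum(pool.map(work, chunks))
    T_P = time.perf_counter() - t0

    print("T_S =", T_S, "T_P =", T_P, "same result:", serial_result == parallel_result)
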
Total Parallel Overhead

• The overheads incurred by a parallel program are encapsulated into a single expression referred to as the overhead function.

• We define the overhead function, or total overhead, of a parallel system as the total time collectively spent by all the processing elements over and above that required by the fastest known sequential algorithm for solving the same problem on a single processing element.
Total Parallel Overhead

• Let T_all be the total time collectively spent by all the processing elements.

• T_S is the serial time.

• Observe that T_all − T_S is then the total time spent by all processors combined in non-useful work. This is called the total overhead.

• The total time collectively spent by all the processing elements is T_all = p T_P (where p is the number of processors).

• The overhead function (T_o) is therefore given by

T_o = p T_P − T_S (1)
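A small sketch of how the overhead function in equation (1) could be evaluated; the runtimes below are hypothetical values, not measurements from the slides.

def total_overhead(T_S, T_P, p):
    # T_all = p * T_P is the total time over all p processing elements;
    # T_o = T_all - T_S is the time spent beyond the best serial algorithm.
    T_all = p * T_P
    return T_all - T_S

# Example: T_S = 10 s, T_P = 3 s on p = 4 processors gives T_o = 2 s.
print(total_overhead(10.0, 3.0, 4))
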
Speedup

• Performance gain is achieved by parallelizing a given application over a sequential implementation.

• Speedup is a measure that captures the relative benefit of solving a problem in parallel.

• What is the benefit from parallelism?

• Speedup (S) is defined as the ratio of the time taken to solve a problem on a single processor to the time required to solve the same problem on a parallel computer with p identical processing elements.

• Formally, speedup S is defined as the ratio of the serial runtime of the best sequential algorithm for solving a problem to the time taken by the parallel algorithm to solve the same problem on p processing elements: S = T_S / T_P.
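The following sketch (with made-up timings) simply restates the definition S = T_S / T_P as code:

def speedup(T_S, T_P):
    # Ratio of the best sequential runtime to the parallel runtime.
    return T_S / T_P

print(speedup(10.0, 3.0))   # roughly 3.33 on p processing elements
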
Performance Metrics: Speedup

• For a given problem, there might be many serial algorithms available. These algorithms may have different asymptotic runtimes (the limiting behavior of the execution time of an algorithm as the size of the problem goes to infinity) and may be parallelizable to different degrees.

• For the purpose of computing speedup, we always consider the best sequential program as the baseline.
Speedup: Example 1

• Consider the problem of adding n numbers by using n processing elements.

• If n is a power of two, we can perform this operation in log n steps by propagating partial sums up a logical binary tree of processors.
Performance Metrics: Example (continued)

• If an addition takes constant time, say, t_c, and communication of a single word takes time t_s + t_w, we have the parallel time T_P = Θ(log n).

• We know that T_S = Θ(n).

• Speedup S is given by S = Θ(n / log n).
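The sketch below (a sequential simulation, not a real parallel program) counts the steps of the tree-based sum to confirm the log n behaviour behind T_P = Θ(log n):

import math

def tree_sum_steps(values):
    steps = 0
    while len(values) > 1:
        # One parallel step: disjoint pairs are added concurrently.
        values = [values[i] + values[i + 1] for i in range(0, len(values), 2)]
        steps += 1
    return values[0], steps

n = 16
total, steps = tree_sum_steps(list(range(n)))
print(total, steps, steps == int(math.log2(n)))   # 120 4 True
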
Efficiency

• Only an ideal parallel system containing p processing elements can deliver a speedup equal to p.
• Efficiency is a measure of the fraction of time for which a processing element is usefully employed; it is defined as the ratio of speedup to the number of processing elements.
• In an ideal parallel system, speedup is equal to p and efficiency is equal to one.
• In practice, speedup is less than p and efficiency is between zero and one, depending on the effectiveness with which the processing elements are utilized.
• Efficiency is denoted by the symbol E. Mathematically, it is given by E = S/p.
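As a sketch with illustrative numbers, efficiency follows directly from speedup and the processor count:

def efficiency(S, p):
    # E = S / p; between zero and one in practice.
    return S / p

print(efficiency(3.33, 4))   # ~0.83: each processing element is usefully busy ~83% of the time
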
Performance Metrics: Efficiency

• Efficiency is a measure of the fraction of time for which a processing element is usefully employed.

• Mathematically, it is given by

E = S / p (2)

• Following the bounds on speedup, efficiency can be as low as 0 and as high as 1.
Performance Metrics: Efficiency Example

• The speedup S of adding n numbers on n processors is given by S = Θ(n / log n).

• Efficiency E is given by

E = Θ(n / log n) / n = Θ(1 / log n)
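A quick numeric check (the sample sizes are chosen arbitrarily) of how E = Θ(1/log n) shrinks as n grows:

import math

for n in [16, 1024, 2**20]:
    S = n / math.log2(n)      # speedup of the n-number sum on n processors
    E = S / n                 # p = n processing elements
    print(n, round(E, 3))     # 0.25, 0.1, 0.05
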
Cost of a Parallel System

• Cost is the product of the parallel runtime and the number of processing elements used (p × T_P).

• Cost reflects the sum of the time that each processing element spends solving the problem.

• A parallel system is said to be cost-optimal if the cost of solving a problem on a parallel computer is asymptotically identical to the serial cost.

• Since E = T_S / (p T_P), for cost-optimal systems, E = O(1).

• Cost is sometimes referred to as work or processor-time product.
Cost of a Parallel System: Example

Consider the problem of adding n numbers on n processors.

• We have T_P = log n (for p = n).

• The cost of this system is given by p T_P = n log n.

• Since the serial runtime of this operation is Θ(n), the algorithm is not cost-optimal.
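A short sketch (sample sizes chosen arbitrarily) showing how the parallel cost n log n pulls away from the Θ(n) serial cost:

import math

for n in [16, 1024, 2**20]:
    parallel_cost = n * math.log2(n)   # p * T_P with p = n, T_P = log n
    serial_cost = n                    # T_S = Θ(n)
    print(n, parallel_cost / serial_cost)   # ratio grows like log n, so not cost-optimal
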
GRANULARITY
Parallel tasks

• Can be specified at various levels of granularity.

• At one extreme, each program in a set of programs can be viewed as one parallel task.

• At the other extreme, individual instructions within a program can be viewed as parallel tasks.
Definition

• The granularity of a task is a measure of the amount of work performed by that task, relative to the communication overhead between multiple processors or processing elements.
• Granularity = computation / communication ratio.
• Impact of granularity on performance:
  • Fine grains, or small tasks, expose more parallelism and hence increase the speedup.
  • However, synchronization overhead, scheduling strategies, etc. can negatively impact the performance of fine-grained tasks.
  • Increasing parallelism alone cannot give the best performance.
• Impact of granularity on performance:
  • In order to reduce the communication overhead, granularity can be increased.
  • Coarse-grained tasks have less communication overhead, but they often cause load imbalance.
  • Optimal performance is achieved between the two extremes of fine-grained and coarse-grained parallelism.
Granularity

The number and size of the tasks into which a problem is decomposed determine the granularity of the decomposition.

 Granularity is a measure of the ratio of computation to communication.

In parallel computing, the granularity (or grain size) of a task is a measure of the amount of work (or computation) which is performed by that task.
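As a toy illustration (the per-task timings are invented), granularity can be read off as a computation-to-communication ratio:

def granularity(compute_time, comm_time):
    # Ratio of useful computation time to communication time for a task.
    return compute_time / comm_time

print(granularity(0.9, 0.1))   # coarse-grained task: ratio 9.0
print(granularity(0.1, 0.9))   # fine-grained task: ratio ~0.11
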
Effect of Granularity on Performance

• n elements on n processors: communication overhead.

• Often, using fewer processors improves the performance of parallel systems.

• Using fewer than the maximum possible number of processing elements to execute a parallel algorithm is called scaling down a parallel system (p < n).

• If there are n inputs and only p processing elements (p < n), we can use the
parallel algorithm designed for n processing elements by assuming n virtual
processing elements and having each of the p physical processing elements
simulate n/p virtual processing elements.

• Since the number of processing elements decreases by a factor of n/p, the computation at each processing element increases by a factor of n/p.

• If virtual processing elements are mapped appropriately onto physical processing elements, the overall communication time does not grow by more than a factor of n/p.
Building Granularity: Example

Consider the problem of adding n numbers on p processing elements such that p < n and both n and p are powers of 2.

• Use the parallel algorithm for n processors, except, in this case, we think of them as virtual processors.

• Each of the p processors is now assigned n/p virtual processors.

• n = 16 and p = 4.

• Virtual processing element i is simulated by the physical processing element labeled i mod p; the numbers to be added are distributed similarly.
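The cyclic mapping described on this slide can be sketched in a few lines of Python (the dictionary layout is just one way to display the assignment):

n, p = 16, 4
assignment = {phys: [i for i in range(n) if i % p == phys] for phys in range(p)}
for phys, virtuals in assignment.items():
    print(f"physical PE {phys} simulates virtual PEs {virtuals}")
# physical PE 0 simulates virtual PEs [0, 4, 8, 12], and so on.
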
Building Granularity: Example

• The first log p of the log n steps of the original algorithm are simulated in (n/p) log p steps on p processing elements.

• The subsequent log n − log p steps do not require any communication.
Building Granularity: Example (continued)

• The overall parallel execution time of this parallel system is Θ((n/p) log p).

• The cost is Θ(n log p), which is asymptotically higher than the Θ(n) cost of adding n numbers sequentially. Therefore, the parallel system is not cost-optimal.
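A rough sketch of the slide's accounting, keeping only the dominant (n/p) log p term (an approximation; lower-order terms from the final steps are ignored):

import math

def simulated_parallel_time(n, p):
    # First log p steps are simulated in (n/p) log p time; the remaining
    # steps contribute only lower-order terms, so they are omitted here.
    return (n // p) * math.log2(p)

for n, p in [(16, 4), (1024, 32)]:
    T_P = simulated_parallel_time(n, p)
    cost = p * T_P                      # Θ(n log p)
    print(n, p, T_P, cost, "serial cost:", n)   # cost exceeds Θ(n), so not cost-optimal
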
