Parallel Algorithm Design - Preliminaries (Material I, Module 3, 09-Jan-2020)
• Key stages
• Identifying portions of the work that can be performed concurrently.
• Mapping the concurrent pieces of work onto multiple processes running in parallel.
• Distributing the input, output, and intermediate data associated with the program.
• Managing accesses to data shared by multiple processors.
• Synchronizing the processors at various stages of the parallel program execution.
3.1. Preliminaries
• Dividing a computation into smaller computations and assigning them to different
processors for parallel execution are the two key steps in the design of parallel algorithms.
• The process of dividing a computation into smaller parts, some or all of which may
potentially be executed in parallel, is called decomposition.
• Tasks are programmer-defined units of computation into which the main computation is
subdivided by means of decomposition.
• Tasks can be of arbitrary size, but once defined, they are regarded as indivisible units of
computation.
Example: Dense matrix - vector multiplication
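The decomposition in this example can be sketched in code. In a row-wise decomposition of y = Ax, each task computes one element of the output vector, so an n x n matrix yields n independent tasks (the function and variable names below are illustrative, not from the original example):

```python
# Sketch: row-wise decomposition of y = A*x into one task per output row.
# Each task is an independent unit of computation, so all n tasks can
# potentially run in parallel.

def make_tasks(A, x):
    """Return one zero-argument task per output row of A*x."""
    def row_task(i):
        return sum(A[i][j] * x[j] for j in range(len(x)))
    # bind i at definition time so each task computes its own row
    return [lambda i=i: row_task(i) for i in range(len(A))]

A = [[1, 2], [3, 4]]
x = [1, 1]
tasks = make_tasks(A, x)
y = [t() for t in tasks]   # tasks are independent: any execution order works
print(y)                   # [3, 7]
```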
• An abstraction used to express such dependencies among tasks and their relative order of
execution is known as a task dependency graph.
• A task-dependency graph is a directed acyclic graph in which the nodes represent tasks and
the directed edges indicate the dependencies amongst them.
• The task corresponding to a node can be executed when all tasks connected to this node by
incoming edges have completed.
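The execution rule above can be sketched directly: a task becomes runnable once all of its predecessors (incoming edges) have completed. The graph below is a made-up diamond-shaped example, not one from the lecture figures:

```python
from collections import deque

# Sketch: executing a task-dependency graph. Edges point from a task
# to the tasks that consume its output; a task is ready once the
# count of its unfinished predecessors drops to zero.

def run_in_dependency_order(successors, num_tasks):
    indegree = [0] * num_tasks
    for u in successors:
        for v in successors[u]:
            indegree[v] += 1
    ready = deque(t for t in range(num_tasks) if indegree[t] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)                      # "execute" task u
        for v in successors.get(u, ()):
            indegree[v] -= 1
            if indegree[v] == 0:             # all dependencies satisfied
                ready.append(v)
    return order

# Diamond graph: task 0 feeds tasks 1 and 2, which both feed task 3.
order = run_in_dependency_order({0: [1, 2], 1: [3], 2: [3]}, 4)
print(order)  # [0, 1, 2, 3]
```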
• The number and size of the tasks into which a problem is decomposed determine the
granularity of the decomposition.
[Figure: the portions of the matrix and of the input and output vectors accessed by Task 1 are highlighted.]
3.1.2 Granularity, Concurrency, and Task-Interaction
• The maximum number of tasks that can be executed simultaneously in a parallel program at
any given time is known as its maximum degree of concurrency
• In most cases, the maximum degree of concurrency is less than the total number of tasks due
to dependencies among the tasks
• For example, the maximum degree of concurrency of the task-dependency graphs in the figures above is four
• In general, for task dependency graphs that are trees, the maximum degree of concurrency is
always equal to the number of leaves in the tree.
• Average degree of concurrency - average number of tasks that can run concurrently over the
entire duration of execution of the program.
• Both the maximum and the average degrees of concurrency usually increase as the granularity
of tasks becomes smaller (finer)
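The tree case mentioned above is easy to check in code: in a tree-shaped task-dependency graph, the leaves are exactly the tasks with no predecessors, and all of them can run at once. A minimal sketch, assuming the tree is given as child-to-parent edges (each task's output feeds exactly one consumer; the example tree is made up):

```python
# Sketch: maximum degree of concurrency of a tree-shaped task graph
# equals its number of leaves.

def max_concurrency_of_tree(edges, num_tasks):
    # edges: child -> parent; the output of each child task feeds its parent
    consumers = set(edges.values())
    # a leaf is a task that no other task feeds, so it can start immediately
    return sum(1 for t in range(num_tasks) if t not in consumers)

# Complete binary tree of 7 tasks: 0 is the root, tasks 3..6 are leaves.
print(max_concurrency_of_tree({1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 2}, 7))  # 4
```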
Degree of concurrency
• A feature of a task-dependency graph that determines the average degree of concurrency for a
given granularity is its critical path
• The longest directed path between any pair of start and finish nodes is known as the critical
path.
• The sum of the weights of nodes along this path is known as the critical path length, where the
weight of a node is the size or the amount of work associated with the corresponding task.
• The ratio of the total amount of work to the critical-path length is the average degree of
concurrency.
• Therefore, a shorter critical path favors a higher degree of concurrency.
• The degree of concurrency also depends on the shape of the task-dependency graph and the
same granularity, in general, does not guarantee the same degree of concurrency.
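The definitions above can be combined into a short computation: the critical-path length is the largest node-weight sum over any start-to-finish path, and the average degree of concurrency is total work divided by that length. A sketch with a made-up weighted diamond graph (all names and numbers are illustrative):

```python
# Sketch: critical-path length of a weighted task-dependency graph and
# the resulting average degree of concurrency = total work / critical path.

def critical_path_length(predecessors, weight):
    memo = {}
    def longest_to(t):
        # longest weighted path from any start node that ends at task t
        if t not in memo:
            preds = predecessors.get(t, [])
            memo[t] = weight[t] + max((longest_to(p) for p in preds), default=0)
        return memo[t]
    return max(longest_to(t) for t in weight)

weight = {0: 10, 1: 10, 2: 10, 3: 10}       # 40 units of total work
predecessors = {1: [0], 2: [0], 3: [1, 2]}  # diamond: 0 -> {1,2} -> 3
cp = critical_path_length(predecessors, weight)
avg = sum(weight.values()) / cp
print(cp, round(avg, 2))  # 30 1.33
```

Shortening the critical path (e.g. lighter tasks along it) raises the average degree of concurrency, matching the bullet above.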
Exercise: find the average degree of concurrency of each of the two task-dependency graphs.
• Another performance limiting factor is the interaction among tasks running on different
physical processors
• The dependencies in a task-dependency graph usually result from the fact that the output of
one task is the input for another.
• For example, in the database query example, tasks share intermediate data; the table generated
by one task is often used by another task as input.
• Another example: matrix-vector multiplication when only one copy of the input vector is available, so all tasks must interact to access that shared copy
• A process is an abstract entity that uses the code and data corresponding to a task to produce the
output of that task within a finite amount of time
• In addition to performing computations, a process may synchronize or communicate with
other processes
• The mechanism by which tasks are assigned to processes for execution is called mapping.
• The task-dependency and task-interaction graphs play an important role in the selection of a
good mapping for a parallel algorithm
• A good mapping should seek to:
• maximize the use of concurrency by mapping independent tasks onto different processes,
• minimize the total completion time, and
• minimize interaction among processes
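One simple mapping that serves the first goal can be sketched as follows: if the tasks are mutually independent and of roughly equal size, distributing them round-robin places them on different processes and balances the load (the function name and task ids below are illustrative, not part of the query-processing example):

```python
# Sketch: round-robin mapping of independent, equal-sized tasks onto
# processes. With no dependencies or interactions, this exposes all
# available concurrency and balances load.

def round_robin_mapping(tasks, num_processes):
    mapping = {p: [] for p in range(num_processes)}
    for i, t in enumerate(tasks):
        mapping[i % num_processes].append(t)
    return mapping

mapping = round_robin_mapping(list(range(8)), 4)
print(mapping)  # {0: [0, 4], 1: [1, 5], 2: [2, 6], 3: [3, 7]}
```

For dependent or interacting tasks, a good mapping must also weigh the task-dependency and task-interaction graphs, as noted above.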
In terms of overheads, the most efficient decomposition-mapping combination is a single task mapped onto a single process: it wastes no time idling or interacting, but it also achieves no speedup.
Mappings of the task graphs onto four processes (for query processing example)