
PARALLEL PROCESSING CONCEPTS

Prof. Shashikant V. Athawale
Assistant Professor | Computer Engineering Department | AISSMS College of Engineering

Presented by:
Dr. Omar H. Salman
AL Iraqia University / Engineering College / Network Department
Contents

• Introduction to Parallel Computing
• Motivating Parallelism
• Scope of Parallel Computing
• Parallel Programming Platforms
• Implicit Parallelism
• Trends in Microprocessor Architectures
• Limitations of Memory System Performance
• Dichotomy of Parallel Computing Platforms
• Physical Organization of Parallel Platforms
• Communication Costs in Parallel Machines
• Scalable design principles
• Architectures: N-wide superscalar architectures
• Multi-core architectures
Introduction to Parallel Computing

A parallel computer is a "collection of processing elements that communicate and co-operate to solve large problems fast".

Processing of multiple tasks simultaneously on multiple processors is called parallel processing.


What is Parallel Computing?

Traditionally, software has been written for serial computation: to be run on a single computer having a single Central Processing Unit (CPU).

In the simplest sense, parallel computing is the simultaneous use of multiple compute resources to solve a computational problem.
Serial Vs Parallel Computing

Figure: Serial execution (fetch/store, compute) versus parallel execution (fetch/store, compute, communicate), illustrated as a cooperative game.
Motivating Parallelism

• The role of parallelism in accelerating computing speeds has been recognized for several decades.
• Its role in providing multiplicity of datapaths and increased access to storage elements has been significant in commercial applications.
• The scalable performance and lower cost of parallel platforms are reflected in the wide variety of applications.
• Developing parallel hardware and software has traditionally been time and effort intensive.
• If one views this in the context of rapidly improving uniprocessor speeds, one is tempted to question the need for parallel computing.
• The continued need for parallelism, however, is the result of a number of fundamental physical and computational limitations on uniprocessor speed.
• The emergence of standardized parallel programming environments, libraries, and hardware has significantly reduced the time to (parallel) solution.
In short

1. Overcome the limits of serial computing
2. Limits on increasing transistor density
3. Limits on data transmission speed
4. Faster turn-around time
5. Solve larger problems
Scope of Parallel Computing
• Parallel computing has a great impact on a wide range of applications:
• Commercial
• Scientific
• Turn-around time should be minimal
• High performance
• Resource management
• Load balancing
• Dynamic library
• Minimum network congestion and latency


Applications
 Commercial computing
- Weather forecasting
- Remote sensors, image processing
- Process optimization, operations research
 Scientific and engineering applications
- Computational chemistry
- Molecular modelling
- Structural mechanics
 Business applications
- E-Governance
- Medical imaging
 Internet applications
- Internet servers
- Digital libraries
Parallel Programming Platforms

• The main objective is to provide sufficient details to the programmer to be able to write efficient code on a variety of platforms.
• Understanding the performance of various parallel algorithms.
Implicit Parallelism

A programming language is said to be implicitly parallel if its compiler or interpreter can recognize opportunities for parallelization and implement them without being told to do so.
Implicitly Parallel Programming Languages

 Microsoft Axum
 MATLAB's M-code
 ZPL
 Laboratory Virtual Instrument Engineering Workbench (LabVIEW)
 NESL
 SISAL
 High-Performance Fortran (HPF)
Dichotomy of Parallel Computing Platforms

 First, we explore a dichotomy based on the logical and physical organization of parallel platforms.
 The logical organization refers to a programmer's view of the platform, while the physical organization refers to the actual hardware organization of the platform.
 The two critical components of parallel computing from a programmer's perspective are ways of expressing parallel tasks and mechanisms for specifying interaction between these tasks.
 The former is sometimes also referred to as the control structure and the latter as the communication model.
Control Structure of Parallel Platforms

Parallel tasks can be specified at various levels of granularity. At one extreme, each program in a set of programs can be viewed as one parallel task. At the other extreme, individual instructions within a program can be viewed as parallel tasks. Between these extremes lies a range of models for specifying the control structure of programs and the corresponding architectural support for them.

Parallelism from a single instruction on multiple processors
Consider the following code segment that adds two vectors:
for (i = 0; i < 1000; i++)
    c[i] = a[i] + b[i];

In this example, the various iterations of the loop are independent of each other; i.e., c[0] = a[0] + b[0];, c[1] = a[1] + b[1];, etc., can all be executed independently of each other. Consequently, if there is a mechanism for executing the same instruction, in this case add, on all the processors with appropriate data, we could execute this loop much faster.
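
One way to exploit this independence is sketched below, assuming a C compiler with OpenMP 4.0+ SIMD support (e.g. gcc -fopenmp); the array size and initial values are illustrative:

/* Sketch: the simd directive asks the compiler to apply the same add
 * instruction to several array elements at once (SIMD execution). */
#include <stdio.h>

#define N 1000

int main(void) {
    double a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2 * i; }

    #pragma omp simd
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];            /* same operation, multiple data */

    printf("c[999] = %f\n", c[999]);   /* expect 2997.000000 */
    return 0;
}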


Figure: A typical SIMD architecture (a) and a typical MIMD architecture (b).
Figure: Executing a conditional statement on an SIMD computer with four processors: (a) the conditional statement; (b) the execution of the statement in two steps.
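
The two-step execution can be mimicked in plain C, as in the sketch below; the particular conditional (if (b == 0) c = a; else c = a / b;), the data values, and the four processing elements are illustrative assumptions, not taken from the figure. Each loop stands in for one lockstep step in which only the "active" processing elements do work:

/* Sketch: SIMD-style two-step execution of a conditional.
 * Step 0: every PE evaluates the condition into an activity mask.
 * Step 1: only PEs with the mask set execute the "if" branch; the rest idle.
 * Step 2: the roles reverse for the "else" branch. */
#include <stdio.h>

#define P 4                       /* four processing elements */

int main(void) {
    int a[P]    = {10, 21, 8, 5};
    int b[P]    = {0, 7, 2, 0};
    int c[P];
    int mask[P];

    for (int i = 0; i < P; i++)   /* step 0: evaluate condition */
        mask[i] = (b[i] == 0);

    for (int i = 0; i < P; i++)   /* step 1: "if" branch on active PEs */
        if (mask[i]) c[i] = a[i];

    for (int i = 0; i < P; i++)   /* step 2: "else" branch on the others */
        if (!mask[i]) c[i] = a[i] / b[i];

    for (int i = 0; i < P; i++)
        printf("c[%d] = %d\n", i, c[i]);   /* prints 10, 3, 4, 5 */
    return 0;
}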
Communication Model of Parallel Platforms
Shared-Address-Space Platforms
Figure: Typical shared-address-space architectures: (a) Uniform-memory-access shared-address-space computer; (b) Uniform-memory-access shared-address-space computer with caches and memories; (c) Non-uniform-memory-access shared-address-space computer with local memory only.
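
A minimal sketch of the shared-address-space model, assuming POSIX threads (compile with -pthread); the array names, sizes, and four-thread count are illustrative. All threads read and write the same arrays through ordinary loads and stores, with no explicit data movement:

/* Sketch: four threads share one address space and each updates its own
 * slice of the shared array c; no messages are exchanged. */
#include <pthread.h>
#include <stdio.h>

#define N 1000
#define NTHREADS 4

static double a[N], b[N], c[N];   /* visible to every thread */

static void *worker(void *arg) {
    long id = (long)arg;
    int chunk = N / NTHREADS;
    for (int i = id * chunk; i < (id + 1) * chunk; i++)
        c[i] = a[i] + b[i];
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2 * i; }

    for (long id = 0; id < NTHREADS; id++)
        pthread_create(&t[id], NULL, worker, (void *)id);
    for (int id = 0; id < NTHREADS; id++)
        pthread_join(t[id], NULL);

    printf("c[999] = %f\n", c[999]);   /* expect 2997.000000 */
    return 0;
}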
Message-Passing Platforms

The logical machine view of a message-passing platform consists of p processing nodes.
Instances include clustered workstations and non-shared-address-space multicomputers.
On such platforms, interactions between processes running on different nodes must be accomplished using messages, hence the name message passing.
This exchange of messages is used to transfer data, work, and to synchronize actions among the processes.
In its most general form, message-passing paradigms support execution of a different program on each of the p nodes.
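
A minimal message-passing sketch, assuming an MPI installation (compile with mpicc, run with mpirun -np 2); the transferred value and message tag are illustrative. Because each process owns a private address space, data moves only through an explicit send/receive pair:

/* Sketch: process 0 sends one integer to process 1 via explicit messages. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;                                   /* data to transfer */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("process 1 received %d from process 0\n", value);
    }

    MPI_Finalize();
    return 0;
}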
Physical Organization of Parallel Platforms

Architecture of an Ideal Parallel Computer

A natural model of an ideal parallel computer is the parallel random access machine (PRAM): p processors sharing a common global memory. PRAMs are classified by how simultaneous memory accesses are handled:
Exclusive-read, exclusive-write (EREW) PRAM. In this class, access to a memory location is exclusive. No concurrent read or write operations are allowed.

Concurrent-read, exclusive-write (CREW) PRAM. In this class, multiple read accesses to a memory location are allowed, but multiple write accesses are serialized.

Exclusive-read, concurrent-write (ERCW) PRAM. Multiple write accesses are allowed to a memory location, but multiple read accesses are serialized.

Concurrent-read, concurrent-write (CRCW) PRAM. This class allows multiple read and write accesses to a common memory location. This is the most powerful PRAM model.
Interconnection Networks for Parallel Computers

▹ Interconnection networks can be classified as static or dynamic. Static networks consist of point-to-point communication links among processing nodes and are also referred to as direct networks. Dynamic networks, in contrast, are built using switches and communication links and are also referred to as indirect networks.

Figure: Classification of interconnection networks: (a) a static network; and (b) a dynamic network.
Network Topology
Linear Arrays

Figure: Linear arrays: (a) with no wraparound links; (b) with wraparound link.
Figure: Two- and three-dimensional meshes: (a) 2-D mesh with no wraparound; (b) 2-D mesh with wraparound link (2-D torus); and (c) a 3-D mesh with no wraparound.
Figure: Construction of hypercubes from hypercubes of lower dimension.
Tree-Based Networks

Figure: Complete binary tree networks: (a) a static tree network; and (b) a dynamic tree network.
Scalable Design Principles

❖ Avoid single points of failure.
❖ Scale horizontally, not vertically.
❖ Push work as far away from the core as possible.
❖ API first.
❖ Cache everything, always.
❖ Provide data only as fresh as needed.
❖ Design for maintenance and automation.
❖ Asynchronous rather than synchronous.
❖ Strive for statelessness.
N-wide Superscalar Architecture:

❖ A superscalar architecture is called N-wide if it supports fetching and dispatching N instructions in every cycle.
Multi-core Architectures:

❖ Many cores fit on a single processor socket.
❖ Also called a chip multiprocessor (CMP).
❖ These cores run in parallel.
❖ The architecture of a multicore processor enables communication between all available cores to ensure that the processing tasks are divided and assigned accurately; a minimal sketch follows below.
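
A minimal sketch, assuming OpenMP (e.g. gcc -fopenmp), of work being divided across the cores of a multi-core processor; the reported counts simply reflect whatever machine runs it:

/* Sketch: query the available cores, then let each thread/core announce
 * itself from inside a parallel region. */
#include <omp.h>
#include <stdio.h>

int main(void) {
    printf("available cores: %d\n", omp_get_num_procs());

    #pragma omp parallel
    {
        printf("hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}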
THANK YOU!