CA - UNIT 4 - PPT
PARALLELISM
1. Parallelism
2. Parallel Processing
3. Amdahl’s Law
4. Flynn's Classification
5. Hardware Multithreading
6. Multicore Processors
7. Shared Memory Multiprocessors
8. Graphics Processing Units
9. Clusters
10. Warehouse Scale Computers
11. Message-Passing Multiprocessors
1. PARALLELISM
• ILP – Instruction-Level Parallelism: how many instructions can be executed simultaneously.
• Two approaches to instruction-level parallelism: hardware and software.
• The hardware level works on dynamic parallelism; the software level works on static parallelism.
• Dynamic parallelism means the processor decides at run time which instructions to execute in parallel.
• Static parallelism means the compiler decides at compile time which instructions to execute in parallel.
4.2 - Single Instruction, Multiple Data (SIMD)
• Executes a single instruction on multiple data values simultaneously using many processors.
• A single control unit does the fetch and decode for all processors.
• SIMD architectures include array processors.
• E.g., ILLIAC-IV, MPP, CM-2, STARAN
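The SIMD model above — one instruction applied to many data elements at once — can be illustrated with a vectorized NumPy operation; on most CPUs NumPy dispatches such element-wise operations to the hardware's SIMD units. This is a minimal sketch, not a description of any specific SIMD machine:

```python
import numpy as np

# One "instruction" (a vectorized add) applied to many data elements at
# once, mirroring the single-control-unit, many-data-paths SIMD model.
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])
c = a + b  # element-wise, data-parallel
print(c)   # [11. 22. 33. 44.]
```

Contrast this with a scalar loop, where a separate instruction would be fetched and decoded for every element.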
4.3 - Multiple Instruction, Single Data (MISD)
• Executes different instructions, all of them operating on the same data stream.
• This structure has not been commercially implemented.
• A systolic array is one example of an MISD architecture.
4.4 - Multiple Instruction, Multiple Data (MIMD)
• Executes multiple instructions simultaneously on multiple data streams.
• Each processor must include its own control unit.
• Built as a shared-memory multiprocessor or a distributed-memory multicomputer.
• E.g., Cray X-MP, IBM 370/168 MP
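The MIMD idea — independent instruction streams on independent data streams — can be sketched with two Python threads running different functions on different data. The function names `sum_stream` and `max_stream` are illustrative, not from the source:

```python
import threading

# MIMD sketch: two independent instruction streams (different functions),
# each operating on its own data stream, running concurrently.
results = {}

def sum_stream(data):           # instruction stream 1
    results["sum"] = sum(data)

def max_stream(data):           # instruction stream 2
    results["max"] = max(data)

t1 = threading.Thread(target=sum_stream, args=([1, 2, 3, 4],))
t2 = threading.Thread(target=max_stream, args=([7, 5, 8],))
t1.start(); t2.start()
t1.join(); t2.join()
print(results["sum"], results["max"])  # 10 8
```

Each thread here has its own control flow, which is what distinguishes MIMD from SIMD's single shared instruction stream.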
4.5 - SINGLE PROGRAM, MULTIPLE DATA (SPMD)
• A subcategory of MIMD.
• Tasks are split up and run simultaneously on multiple processors with different input data in order to obtain results faster.
• Every processor runs the same program, but each operates on its own portion of the data.
5. HARDWARE MULTITHREADING
• Hardware multithreading allows multiple threads to share the functional units of a single processor in an overlapping fashion.
• Two main approaches:
1. Fine-grained multithreading / Interleaved multithreading
2. Coarse-grained multithreading / Blocking multithreading
FINE-GRAINED MULTITHREADING
• The processor switches between threads after each instruction.
• Also called interleaving.
• Switching is done in a round-robin fashion.
• In a pipelined architecture with k stages, if k threads are executed in round-robin order, then there cannot be hazards due to dependencies and the pipeline never stalls.
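The no-stall claim can be seen in a toy model of the issue logic, assuming a round-robin choice among k ready threads (the thread names `T0`..`T3` are illustrative): consecutive pipeline slots always come from different threads, so two instructions from the same thread are at least k cycles apart and cannot interact inside a k-stage pipeline.

```python
from collections import deque

# Toy model of fine-grained multithreading: with k pipeline stages and k
# ready threads, the issue logic picks a different thread each cycle in
# round-robin order.
k = 4
threads = deque([f"T{i}" for i in range(k)])
issue_order = []
for cycle in range(8):      # issue one instruction per cycle
    issue_order.append(threads[0])
    threads.rotate(-1)      # round robin: a different thread next cycle
print(issue_order)  # ['T0', 'T1', 'T2', 'T3', 'T0', 'T1', 'T2', 'T3']

# Any two instructions from the same thread are at least k cycles apart,
# so no dependence hazard can arise inside a k-stage pipeline.
for t in {"T0", "T1", "T2", "T3"}:
    slots = [i for i, x in enumerate(issue_order) if x == t]
    assert all(b - a >= k for a, b in zip(slots, slots[1:]))
```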
COARSE-GRAINED MULTITHREADING
• The processor switches threads only on costly stalls, such as level-2 cache misses.
• Also called blocking multithreading.

6. MULTICORE PROCESSORS
Cache organizations in multicore processors:
• Dedicated L1 cache
• Dedicated L2 cache
• Shared L2 cache
• Shared L3 cache
DEDICATED L1 CACHE
• On-chip cache
• Size: typically 16KB to 64KB per core.
• Divided into an instruction cache and a data cache.
• Holds the data the CPU is most likely to need while completing a particular task.
• Example: ARM11 MPCore
DEDICATED L2 CACHE
• May be located off-chip, unlike the L1 cache.
• Size: 256KB to 8MB.
• Slower but bigger than the L1 cache.
• Holds data that is likely to be accessed by the CPU next.
• Example: AMD Opteron
SHARED L2 CACHE
• The L2 cache is shared among all cores.
• Avoids duplicating cached data across cores, which reduces power consumption.
SOFTWARE PERFORMANCE ISSUES
Kinds of software that can exploit multiple cores:
• Multi-threaded applications
• Multi-process applications
• Multi-instance applications
7. SHARED MEMORY MULTIPROCESSORS
• A parallel processor with a single address space across all processors.
• Processors communicate through shared variables in memory.
Two types:
• Symmetric Shared-Memory Architectures
• Distributed Shared-Memory Architectures
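Communication through shared variables can be sketched with Python threads, which share one address space the way processors in a shared-memory multiprocessor do. The lock stands in for the synchronization hardware a real machine needs to make shared updates safe; this is a minimal illustration, not a model of any specific architecture:

```python
import threading

# Shared-memory sketch: all threads see one address space, so they
# coordinate through a shared variable guarded by a lock.
counter = 0
lock = threading.Lock()

def worker(n):
    global counter
    for _ in range(n):
        with lock:          # serialize updates to the shared variable
            counter += 1

ts = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in ts:
    t.start()
for t in ts:
    t.join()
print(counter)  # 4000
```

Without the lock, concurrent read-modify-write sequences could interleave and lose updates — the classic hazard of the shared-variable model.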
Symmetric Shared-Memory Architectures
• Consist of several processors with a single physical memory shared by all processors through a shared bus.
• The cores have private level-1 caches, while the other caches may or may not be shared between the cores.
Distributed Shared-Memory Architectures
• Consist of multiple independent processing nodes with local memory modules, connected by a general interconnection network.
• Distributed-memory systems are also called clusters or hybrid systems.
• Each node of a cluster has access to shared memory in addition to its own non-shared private memory.
8. GRAPHICS PROCESSING UNIT (GPU)
• Also called a visual processing unit.
• A graphics coprocessor / accelerator.
• Built with hundreds of processing cores.
• Handles a large number of floating-point operations in parallel.
• Used in mobile phones, game consoles, embedded systems, PCs and servers.
• Does not rely on multilevel caches; instead relies on hardware multithreading to hide memory latency.
• Memory size: 4 to 6 GB or less.
Difference Between CPU and GPU
• CPU: a few powerful cores optimized for low-latency serial execution, backed by large multilevel caches.
• GPU: hundreds of simpler cores optimized for high-throughput data-parallel execution, with memory latency hidden by hardware multithreading rather than large caches.
GPU ARCHITECTURE
• The first GPU was the GeForce 256, released by NVIDIA in 1999.

9. CLUSTERS
• A cluster is a set of computers connected over a local-area network that operates as a single large multiprocessor.
Benefits of clusters:
• Performance
• Fault Tolerance
• Scalability
APPLICATIONS OF CLUSTERS
• Amazon
• Facebook
• Google
• Microsoft