
PARALLEL PROCESSOR
Lecturers:
1. NOVERA ISTIQOMAH, M.T. (NVO)
2. MUHAMMAD FARIS RURIAWAN, M.T. (FRW)
1. Learning Outcomes

2. Multiple Processor Organizations

3. Symmetric Multiprocessors

4. Cache Coherence and the MESI Protocol

5. Multithreading and Chip Multiprocessors

6. Clusters

7. Nonuniform Memory Access

8. Vector Computation
LEARNING OUTCOMES

[C3] Able to understand and explain the various types of parallel processors, the main components of the MESI protocol, the difference between implicit and explicit multithreading, clusters, and vector computation.
MULTIPLE PROCESSOR ORGANIZATIONS
TYPES OF PARALLEL PROCESSORS
• Single instruction, single data (SISD) stream: A single processor executes a single instruction stream to operate on data stored in a single memory. Uniprocessors fall into this category.
• Single instruction, multiple data (SIMD) stream: A single machine instruction controls the simultaneous execution of a number of processing elements on a lockstep basis. Each processing element has an associated data memory, so that instructions are executed on different sets of data by different processors.
• Multiple instruction, single data (MISD) stream: A sequence of data is transmitted to a set of processors, each of which executes a different instruction sequence. This structure is not commercially implemented.
• Multiple instruction, multiple data (MIMD) stream: A set of processors simultaneously execute different instruction sequences on different data sets. SMPs, clusters, and NUMA systems fit into this category.
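The SISD/SIMD distinction can be sketched as a toy model: in SISD style a single processor applies each instruction to one datum at a time, while in SIMD style one instruction conceptually drives all processing elements in lockstep. This is only an illustration of the control model, not real vector hardware; the function names are hypothetical.

```python
def sisd_scale(data, factor):
    # SISD: one instruction stream, one datum processed per step.
    out = []
    for x in data:
        out.append(x * factor)
    return out

def simd_scale(data, factor):
    # SIMD (toy): one "instruction" (the lambda) is broadcast to every
    # element at once; map models the lockstep application.
    return list(map(lambda x: x * factor, data))

print(sisd_scale([1, 2, 3], 10))  # [10, 20, 30]
print(simd_scale([1, 2, 3], 10))  # [10, 20, 30]
```

Both produce the same result; the difference the taxonomy captures is how many instruction streams and data streams are active at once.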
PARALLEL PROCESSOR ARCHITECTURE
SYMMETRIC MULTIPROCESSORS (SMP)
CHARACTERISTICS
1. There are two or more similar processors of comparable capability.
2. These processors share the same main memory and I/O facilities and are interconnected by a bus or other internal connection scheme, such that memory access time is approximately the same for each processor.
3. All processors share access to I/O devices, either through the same channels or through different channels that provide paths to the same device.
4. All processors can perform the same functions (hence the term symmetric).
5. The system is controlled by an integrated operating system that provides interaction between processors and their programs at the job, task, file, and data element levels.
ADVANTAGES
• Performance: If the work to be done by a computer can be
organized so that some portions of the work can be done in
parallel, then a system with multiple processors will yield greater
performance than one with a single processor of the same type
(Figure 17.3).


• Availability: In a symmetric multiprocessor, because all processors can perform the same functions, the failure of a single processor does not halt the machine. Instead, the system can continue to function at reduced performance.
• Incremental growth: A user can enhance the performance of a system by adding an additional processor.
• Scaling: Vendors can offer a range of products with different price and performance characteristics based on the number of processors configured in the system.
MULTIPROCESSOR OPERATING SYSTEM DESIGN CONSIDERATIONS
• Simultaneous concurrent processes: OS routines need to be reentrant to allow several processors to execute the same OS code simultaneously. With multiple processors executing the same or different parts of the OS, OS tables and management structures must be managed properly to avoid deadlock or invalid operations.
• Scheduling: Any processor may perform scheduling, so
conflicts must be avoided. The scheduler must assign ready
processes to available processors.
• Synchronization: With multiple active processes having potential access to shared address spaces or shared I/O resources, care must be taken to provide effective synchronization. Synchronization is a facility that enforces mutual exclusion and event ordering.
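The two synchronization facilities named above — mutual exclusion and event ordering — map directly onto standard primitives. A minimal sketch using Python's `threading` module (the worker logic and counts are illustrative, not from the slides):

```python
import threading

counter = 0
lock = threading.Lock()    # mutual exclusion: one updater at a time
ready = threading.Event()  # event ordering: workers wait for a signal

def worker():
    global counter
    ready.wait()           # block until the main thread signals "go"
    for _ in range(100_000):
        with lock:         # critical section: prevents lost updates
            counter += 1

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
ready.set()                # release all workers at once
for t in threads:
    t.join()
print(counter)             # 400000: no updates were lost
```

Without the lock, interleaved read-modify-write sequences could lose increments; without the event, workers would start in an unspecified order relative to the main thread.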
• Memory management: Memory management on a
multiprocessor must deal with all of the issues found on
uniprocessor machines. In addition, the operating system needs
to exploit the available hardware parallelism, such as
multiported memories, to achieve the best performance. The
paging mechanisms on different processors must be coordinated
to enforce consistency when several processors share a page or
segment and to decide on page replacement.
• Reliability and fault tolerance: The operating system should
provide graceful degradation in the face of processor failure. The
scheduler and other portions of the operating system must
recognize the loss of a processor and restructure management
tables accordingly.
CACHE COHERENCE & THE MESI PROTOCOL
HARDWARE SOLUTIONS
1. DIRECTORY PROTOCOLS
 Collect and maintain information about where copies of lines reside.
 There is a centralized controller responsible for keeping the state information up to date.
 Every local action that can affect the global state of a line must be reported to the central controller.
 Effective in large-scale systems that involve multiple buses or some other complex interconnection scheme.
2. SNOOPY PROTOCOLS
 Distribute the responsibility for maintaining cache coherence among all of the cache controllers in a multiprocessor.
 Suited to a bus-based multiprocessor, because the shared bus provides a simple means for broadcasting and snooping.
 Two basic approaches: write invalidate and write update (or write broadcast).
 Performance depends on the number of local caches and the pattern of memory reads and writes.
3. The MESI Protocol
The data cache includes two status bits per tag, so that each line
can be in one of four states:
• Modified: The line in the cache has been modified (different
from main memory) and is available only in this cache.
• Exclusive: The line in the cache is the same as that in main
memory and is not present in any other cache.
• Shared: The line in the cache is the same as that in main
memory and may be present in another cache.
• Invalid: The line in the cache does not contain valid data.
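The four MESI states form a small state machine per cache line. A simplified sketch for one line in one cache, reacting to local accesses and bus-observed (snooped) events — the event names and the transition subset shown are illustrative assumptions, not the full protocol:

```python
# Simplified MESI transitions: (current_state, event) -> next_state.
MESI_TRANSITIONS = {
    ("Invalid",   "local_read_miss_shared"):    "Shared",     # another cache holds it
    ("Invalid",   "local_read_miss_exclusive"): "Exclusive",  # no other copy exists
    ("Invalid",   "local_write"):               "Modified",
    ("Exclusive", "local_write"):               "Modified",   # silent upgrade, no bus traffic
    ("Exclusive", "snoop_read"):                "Shared",
    ("Shared",    "local_write"):               "Modified",   # other copies are invalidated
    ("Shared",    "snoop_write"):               "Invalid",
    ("Modified",  "snoop_read"):                "Shared",     # dirty line written back first
    ("Modified",  "snoop_write"):               "Invalid",    # dirty line written back first
}

def next_state(state, event):
    # Events not listed leave the state unchanged in this sketch.
    return MESI_TRANSITIONS.get((state, event), state)

s = "Invalid"
for ev in ["local_read_miss_exclusive", "local_write", "snoop_read"]:
    s = next_state(s, ev)
print(s)  # Shared: a snooped read demoted the Modified line
```

The two status bits per tag mentioned above are exactly enough to encode these four states.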
MULTITHREADING AND CHIP MULTIPROCESSORS
IMPLICIT & EXPLICIT
1. Process: An instance of a program running on a computer. A process embodies two key characteristics:
— Resource ownership: A process includes a virtual address space to hold the process image; the process image is the collection of program, data, stack, and attributes that define the process.
— Scheduling/execution: The execution of a process follows an execution path (trace) through one or more programs. A process has an execution state (Running, Ready, etc.) and a dispatching priority, and is the entity that is scheduled and dispatched by the operating system.

3. Process switch: An operation that switches the processor from one process to another, by saving all the process control data, registers, and other information for the first and replacing them with the process information for the second.
4. Thread: A dispatchable unit of work within a process. It includes a processor context (which includes the program counter and stack pointer) and its own data area for a stack (to enable subroutine branching). A thread executes sequentially and is interruptible so that the processor can turn to another thread.
5. Thread switch: The act of switching processor control from one thread to another within the same process. Typically, this type of switch is much less costly than a process switch.
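The definitions above can be made concrete: threads within one process share the process's address space (so a write by one thread is visible to the others with no copying), while each thread keeps its own private stack for calls and locals. A minimal sketch with Python's `threading` module; the worker function and names are illustrative:

```python
import threading

shared_log = []            # shared data: one copy per PROCESS, seen by all threads
lock = threading.Lock()

def worker(name, depth):
    # 'name' and 'depth' live on THIS thread's private stack; each thread
    # follows its own execution path (trace) through the same code.
    if depth > 0:
        worker(name, depth - 1)   # private stack frames, one chain per thread
    else:
        with lock:
            shared_log.append(name)  # shared address space: same list object

ts = [threading.Thread(target=worker, args=(f"t{i}", 5)) for i in range(3)]
for t in ts:
    t.start()
for t in ts:
    t.join()
print(sorted(shared_log))  # ['t0', 't1', 't2']
```

Because the threads already share the address space, switching between them needs no memory-map change — one reason a thread switch is much cheaper than a process switch.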
PRINCIPAL APPROACHES
1. Interleaved multithreading/fine-grained multithreading. The processor deals with
two or more thread contexts at a time, switching from one thread to another at
each clock cycle. If a thread is blocked because of data dependencies or memory
latencies, that thread is skipped and a ready thread is executed.
2. Blocked multithreading/coarse-grained multithreading. The instructions of a
thread are executed successively until an event occurs that may cause delay, such
as a cache miss. This event induces a switch to another thread. This approach is
effective on an in-order processor that would stall the pipeline for a delay event
such as a cache miss.
3. Simultaneous multithreading (SMT): Instructions are simultaneously issued from
multiple threads to the execution units of a superscalar processor. This combines
the wide superscalar instruction issue capability with the use of multiple thread
contexts.
4. Chip multiprocessing: The entire processor is replicated on a single chip and each processor handles separate threads. The advantage of this approach is that the available logic area on a chip is used effectively without depending on ever-increasing complexity in pipeline design.
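Approach 1 (interleaved/fine-grained multithreading) can be simulated in a few lines: each cycle the "processor" issues one instruction from the next ready thread context and skips any thread stalled on, say, a cache miss. This is a toy scheduling model under assumed inputs, not a hardware description:

```python
def run_interleaved(threads, stalls):
    # threads: {name: list of instruction mnemonics}
    # stalls:  set of (name, instruction_index) pairs that stall for one cycle
    trace = []
    pc = {name: 0 for name in threads}
    while any(pc[n] < len(ops) for n, ops in threads.items()):
        for name, ops in threads.items():      # round-robin, one issue per thread per pass
            i = pc[name]
            if i >= len(ops):
                continue                       # thread has finished
            if (name, i) in stalls:
                stalls.discard((name, i))      # stalled this cycle: skip, retry later
                continue
            trace.append(f"{name}:{ops[i]}")   # issue the instruction
            pc[name] += 1
    return trace

t = run_interleaved({"A": ["ld", "add"], "B": ["mul", "st"]},
                    stalls={("A", 1)})
print(t)  # ['A:ld', 'B:mul', 'B:st', 'A:add']
```

Note how thread A's stalled `add` is simply skipped while B keeps issuing — the latency is hidden by work from another thread context rather than by stalling the pipeline.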
CLUSTERS

DEFINITION
 An alternative to symmetric multiprocessing as an approach to providing high performance and high availability; particularly attractive for server applications.
 A group of interconnected, whole computers working together as a unified computing resource that can create the illusion of being one machine.
CONFIGURATION

METHODS

ARCHITECTURE
COMPARED TO SMP

CLUSTERS
• Multiple processors
• Dominating the high-performance server market
• Far superior in terms of incremental and absolute scalability
• Superior in terms of availability

SMP
• Multiple processors; SMP schemes have been around far longer
• Easier to manage and configure
• Much closer to the original single-processor model
• Less physical space
• Draws less power
• Well established and stable
NONUNIFORM MEMORY ACCESS
(NUMA)

TERMS
• Uniform memory access (UMA): All processors have access to all parts of main memory using loads and stores. The memory access time of a processor to all regions of memory is the same. The access times experienced by different processors are the same.
• Nonuniform memory access (NUMA): All processors have access to all parts of main memory using loads and stores. The memory access time of a processor differs depending on which region of main memory is accessed. The last statement is true for all processors; however, for different processors, which memory regions are slower and which are faster differ.
• Cache-coherent NUMA (CC-NUMA): A NUMA system in which cache coherence is maintained among the caches of the various processors.
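The UMA/NUMA distinction reduces to one rule: every processor can reach all of memory, but under NUMA the cost depends on whether the addressed region is local to the processor's node. A toy cost model — the specific latency numbers are assumptions chosen only for illustration:

```python
LOCAL_COST, REMOTE_COST = 1, 4   # assumed relative latencies, not real figures

def access_cost(cpu_node, region_node):
    # UMA would return the same cost regardless of the nodes;
    # NUMA makes the cost depend on locality.
    return LOCAL_COST if cpu_node == region_node else REMOTE_COST

# A CPU on node 0 touching node-0 memory vs node-1 memory:
print(access_cost(0, 0), access_cost(0, 1))  # 1 4
```

Which regions are "slow" differs per processor — node 1 sees the mirror image of node 0 — which is exactly the asymmetry the NUMA definition above describes.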
ORGANIZATION

VECTOR COMPUTATION

APPROACHES
Thank you