Professional Documents
Culture Documents
Shared Memory. Distributed Memory. Hybrid Distributed-Shared Memory
Shared Memory. Distributed Memory. Hybrid Distributed-Shared Memory
Distributed Memory.
Hybrid Distributed-Shared Memory.
Shared Memory
Shared memory parallel computers vary widely, but generally
have in common the ability for all processors to access all
memory as global address space.
Multiple processors can operate independently but share the
same memory resources.
Changes in a memory location effected by one processor are
visible to all other processors.
Shared memory machines can be divided into two main classes
based upon memory access times: UMA and NUMA.
Uniform Memory Access (UMA):
Non-Uniform Memory Access (NUMA)
Shared Memory : UMA vs. NUMA
Uniform Memory Access (UMA):
– Most commonly represented today by Symmetric Multiprocessor
(SMP) machines
– Identical processors
– Equal access and access times to memory
– Sometimes called CC-UMA - Cache Coherent UMA. Cache
coherent means if one processor updates a location in shared
memory, all the other processors know about the update. Cache
coherency is accomplished at the hardware level.
M M M
E E E
M B M B M B
O U O U O U
R S R S R S
Y Y Y
Global
GlobalMemory
MemorySystem
System
Processor
Processor Processor
Processor Processor
Processor
AA BB CC
M M M
E E E
M B M B M B
O U O U O U
R S R S R S
Y Y Y
Memory
Memory Memory Memory
Memory Memory
System
System AA System
System BB System
SystemCC
Distributed Memory: Pro and Con
Advantages:
– Memory is scalable with number of processors. Increase
the number of processors and the size of memory
increases proportionately.
– Each processor can rapidly access its own memory
without interference and without the overhead incurred
with trying to maintain cache coherency.
– Cost effectiveness: can use commodity, off-the-shelf
processors and networking.
Disadvantages
– The programmer is responsible for many of the details
associated with data communication between
processors.
– It may be difficult to map existing data structures, based
on global memory, to this memory organization.
– Non-uniform memory access (NUMA) times
• Distributing memory among processing nodes has
2 pluses:
– It’s a great way to save some bandwidth
• With memory distributed at nodes, most accesses are
to local memory within a particular node
• No need for bus communication
– Reduces latency for accesses to local memory
• 3Assumes
CPU A stores 0 into X 0 1 0
that neither cache had value/location X in it
1st
• Both a write-through cache and a write-back cache will
encounter this problem
• If B reads the value of X after Time 3, it will get 1
which is the wrong value!
Coherence in shared memory
programs
• Must have coherence and consistency
• Memory system coherent if:
– Program order preserved (always true in uniprocessor)
• Say we have a read by processor P of location X
• Before the read processor P wrote something to location X
• In the interim, no other processor has written to X
• A read to X should always return the value written by P
– A coherent view of memory is provided
• 1st, processor A writes something to memory location X
• Then, processor B tries to read from memory location X
• Processor B should get the value written by processor A
assuming…
– Enough time has past b/t the two events
– No other writes to X have occurred in the interim
Coherence in shared memory
programs (continued)
• Memory system coherent if: (continued)
– Writes to same location are serialized
• Two writes to the same location by any two processors are
seen in the same order by all processors